This slideshow could not be started. Try refreshing the page or viewing it in another browser.
This Code Review Talk is Excellent but…
“[P]eer code reviews are the single biggest thing you can do to improve your code.”
Code designed by Nina Geometrieva from the Noun Project
About me: Mohammad (Mo) Jangda
– Toronto, Canada
– Code Wrangler at Automattic (WordPress.com)
– Ice Cream Fan
Who, in the audience, has some code review process in their organization or workflow?
Outline
– My history with Code Review
– Why bother?
– How to implement?
– Golden Rules.
My History with Code Review
Early in my career (school, various development jobs, freelance work, open source projects), I always felt like I was missing something.
Saw the Light
First* week on the job:
“So, let’s get you started by having you review this code sent in by a client!”
— Team Lead
Thrown right in and ~50% of my job for 3 years.
* after completing Automattic’s standard support rotation
I learned three things.
Lesson #1
Reviewing someone’s code is really scary!
Lesson #2
Getting your code reviewed is really scary!
Lesson #3
Extremely rewarding!
[P]eer code reviews are the single biggest thing you can do to improve your code. If you’re not doing code reviews right now with another developer, you’re missing a lot of bugs in your code and cheating yourself out of some key professional development opportunities. As far as I’m concerned, my code isn’t done until I’ve gone over it with a fellow developer.
— Jeff Atwood
— http://codinghorror.com
Common Excuses
– We pay our developers a lot of money. They should be writing perfect code. The need to review code means it’s not perfect. Imperfect code means we’re not getting our money’s worth.
– We don’t have time.
– It’s not my feature!
“the average [defect detection] effectiveness [rate] of design and code inspections are 55 and 60 percent.”
— Steve McConnell (Book: “Code Complete”)
Compared with 25-45% for various forms of testing.
(via Jeff Atwood http://blog.codinghorror.com/code-reviews-just-do-it/)
Development projects with code review have significantly fewer bugs, reduced development costs, increased productivity, and early ship dates.
(via Jeff Atwood http://blog.codinghorror.com/code-reviews-just-do-it/)
Warning: code review can be damaging if not done carefully.
Effective Code Review is a part of the development process and the culture of the organization, not just an afterthought.
Effective Code Review needs buy-in from everyone involved.
We always get push-back from developers working on the VIP platform.
Effective Code Review sets goals
Goals: Take Your Pick
– Learning other perspectives
– Becoming a better developer
– Becoming a better communicator
Goals: Take Your Pick
– Better Code Quality!
– Have at least one other person look at the code.
– Have fun!
Have established rules, standards, and process
Even if it’s no rules, standards, and processes.
Should define: what is the scope of the review? What types of things are we looking for? How will the process works. And communicate that with the team
What does a code review entail?
Looking over code to find potential issues or areas for improvement.
Bugs | Best Practices | Design Patterns | Faulty Logic | Confusing Flows | Opportunities for code reuse | Lack of standards/style | etc.
Culture, environment and workflow will change how you adopt it.
Individuals: DIY
– Do what (some) professional writers do!
– Write/Code in the morning => lunch => Edit/Review in the afternoon
– Freewriting?
Individuals: Find a buddy
– Open Source? Find volunteers at meetups/local community/IRC/Twitter.
– Closed source? Find another freelancer and trade.
Individuals: Offer reviews
Review open pull requests on GitHub (send messages to the PR author if you don’t want to ask or discuss publicly).
(May want to be familiar with topic or code in some form before jumping in otherwise start privately.)
Individuals: Pay someone
Airpair / Helpouts / etc.
Teams: Who?
– Gatekeeper
– Peer Review
– Committee Review
– Pair Programming
– By-request
Junior vs Senior developers?
Teams: When?
– Pre Code Review
– Iterative Review
– Pre Commit Review (for centralized source control like SVN)
– Post Commit Review
– Post Deploy Review
Teams: tools?
– Specific code review tools: Crucible, Phabricator.
– Source Control: Github, Trac Code Comments, patches.
– In-person.
– Ad hoc: Email/IM/IRC/Snapchat?
– Review as Pull Request / bugfix (https://github.com/Automattic/site-logo/pull/14)
Tools: Don’t need to get fancy.
Reviewing a simple patch/diff can go a long way.
Code Review at Automattic
Varies by team:
– no code review
– team lead or peer review
– gitflow (someone else has to review and merge)
– Code Review P2
Code Reviews are “egoless”
It’s not a competition about how many mistakes you can find. Or who is a better developer.
Nope.
I’ve been programming for n years. I don’t need someone else to tell me what’s wrong with my code.
I’ve been programming for n years. I can point out all the things wrong with someone’s code without ever looking at it.
Don’t make it personal
Reviewers: The fault doesn’t lie in the developer; it lies in the code.
Reviewees: don’t get defensive about feedback you receive about the code. Fix your mistakes and learn from them.
Remember the goals of the code review: learning and making the code better.
Caveat: If the mistakes are continually repeated in future code, then it’s a problem.
Reviewees: you can only fool your reviewer once. Accept your mistakes, but work to prevent them in the future.
Communication is everything!
How you convey your feedback and discuss the code will help ensure an egoless and effective review
Communication Tip #1
No personal pronouns in feedback.
Bad
path-to-file.php@8237#L786
Why are you not sanitizing your GET value here? You should go through your code again and properly sanitize all instances where you’re interacting with remote data.
Translation
path-to-file.php@8237#L786
You! Yes, You! I’m talking to you! You suck!
Good
path-to-file.php@8237#L786
We should sanitize the GET value here per best practices. Any reason not to? There may other instances where we should take a closer look as well.
Bonus Protip
Wherever you’re inclined to use a pronoun, replace it with “the code” or a coding concept (variable/class/etc.)
“You should be…” => “The code should be…”
Communication Tip #2
Avoid the use of “but”; prefer “and” or no conjunction instead to maintain an air of positivity.
The overall logic is great but the function could use some refactoring.
vs.
The overall logic is great. The function could use some refactoring, though.
Don’t Make Assumptions
Reviewees: optimize for code review; write your best code because you know someone will critique it (i.e. so make it good)
Reviewers: Be critical; ask questions about decisions that were made.
Safe space to ask questions
– Reviewers are okay to ask about code they do not understand. Opinions are okay as well, although, better phrased as question as a means of discussion.
– Reviewees can ask questions ahead of time about things they are unsure of. This can be provide a good starting point for the review.
Avoid Pedantics
Don’t get too caught up in minor details but make sure development best practices and coding standards are being followed.
Automate code style/standards issues, if possible.
Avoid too much emphasis on small patches vs big ones.
Ask a programmer to review 10 lines of code, he'll find 10 issues. Ask him to do 500 lines and he'll say it looks good.
— Giray Özil (@girayozil) February 27, 2013
Not a Silver Bullet
Code review is not going to catch everything. Bugs are inevitable.
Think Positive!
Reviewers: Remember to praise good code and incremental improvements.
Reviewees: Remember to say thank you!
Have fun!
Jokes + GIFs + Emoticons
In Closing
Takes a while to learn the rules. You will forget to follow them. You will fall out of the code review habit. You will hate yourself and your coworkers and peers. But you will become better.
And it will pay off.
Want a cool job?
Automattic is hiring code reviewers, developers (PHP/javascript/node/go/whatever), Happiness Engineers, and more!
Work from anywhere! Unlimited vacations! Cool swag!
Talk to me or http://automattic.com/jobs/
Thanks!
Presentation: The Database Schema
This slideshow could not be started. Try refreshing the page or viewing it in another browser.
The Database Schema
About me: Mohammad (Mo) Jangda
mo@automattic.com | batmoo@gmail.com | @mjangda
– Toronto, ON
– Code Wrangler at Automattic / WordPress.com
– Ice Cream Fan
Outline
– API vs Database
– Core Structure & Table Walkthrough
– Points of Interest
– Pitfalls & Opportunities
Caveats
– Not really touching much on Multisite
– High-level concepts
– Ignoring links
– Assuming MySQL only
– No database tuning advice :)
API vs. Database
A Very Rich API
As devs, we don’t have to worry about schema because WordPress abstracts the database interaction for us.
The closest thing most devs need to worry about is the username and password during the install process.
Why bother?
– Because it’s interesting :)
– Important to understand the underlying architecture of the system
– Easier to debug problems and understand changes and new features
– Can help solve interesting problems
Core Structure
Core Structure
– Single-site: 11 base tables
– Multisite: 17 base tables
=== 9 additional tables for every new blog
=== 500 million tables on WordPress.com
Core Structure
– Hybrid entity/object-oriented and key-value store (posts + meta)
– Some normalization of the schema (terms)
– Unique IDs (primary key) for each entity within a table
wp_{object}s tables
Main tables are modelled after the object they are storing:
– wp_posts
– wp_comments
– wp_users
wp_{object}meta tables
Each object type has a key-value meta store:
– wp_postmeta
– wp_usermeta
– wp_commentmeta
Consistency across these tables allows for a common metadata API
key-value store
A bit like an associative array, in table form. Values relating to objects are identified by a key.
(“NoSQL”)
key-value store
– post_id: 123
– key: _thumbnail_id
– value: 456
key can any valid varchar(255); value can be any primitive or serializable object.
wp_term(s|taxonomy|taxonomy_relationships)
– Terms are handled very differently via three tables
– Somewhat messy and the source of many frustrations
Table: options
Similar key-value store to meta but only one object (blog) and no matching table for that object
Table: options
option_id [bigint(20)]
option_name [varchar(64)]
option_value [longtext]
autoload [varchar(20)]
SELECT * FROM wp_options WHERE autoload = ‘yes’
// get_option( ‘cookies’ )
SELECT * FROM wp_options WHERE option_name = ‘cookies’
Table: commentmeta
Look familiar? :)
http://codex.wordpress.org/Database_Description#Table:_wp_commentmeta
Terms
– Mapping looks something like:
^v
wp_term_taxonomy
^v
wp_term_relationships
^v
wp_(posts|links)
Table: wp_term_taxonomy
http://codex.wordpress.org/Database_Description#Table:_wp_term_taxonomy
Table: wp_term_relationships
http://codex.wordpress.org/Database_Description#Table:_wp_term_taxonomy
#1: Get the term_id from slug
SELECT wp_43654959_term_taxonomy.term_id
FROM wp_43654959_term_taxonomy
INNER JOIN wp_43654959_terms USING (term_id)
WHERE taxonomy = ‘category’
AND wp_43654959_terms.slug IN (‘stuff’)require, wp, WP->main, WP->query_posts, WP_Query->query, WP_Query->get_posts, WP_Tax_Query->get_sql, WP_Tax_Query->clean_query, WP_Tax_Query->transform_query, wpdb->get_col
#2: Get the term_tax_id
SELECT term_taxonomy_id
FROM wp_43654959_term_taxonomy
WHERE taxonomy = ‘category’
AND term_id IN (293)require, wp, WP->main, WP->query_posts, WP_Query->query, WP_Query->get_posts, WP_Tax_Query->get_sql, WP_Tax_Query->clean_query, WP_Tax_Query->transform_query, wpdb->get_col
#3: Get the term object (optional)
SELECT t.*, tt.* FROM wp_43654959_terms AS t INNER JOIN wp_43654959_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy = ‘category’ AND t.slug = ‘stuff’ LIMIT 1
require, wp, WP->main, WP->query_posts, WP_Query->query, WP_Query->get_posts, get_term_by, wpdb->get_row
#4: Get the posts
SELECT SQL_CALC_FOUND_ROWS wp_43654959_posts.ID FROM wp_43654959_posts INNER JOIN wp_43654959_term_relationships ON (wp_43654959_posts.ID = wp_43654959_term_relationships.object_id) WHERE 1=1 AND ( wp_43654959_term_relationships.term_taxonomy_id IN (1) ) AND wp_43654959_posts.post_type = ‘post’ AND (wp_43654959_posts.post_status = ‘publish’ OR wp_43654959_posts.post_status = ‘private’) GROUP BY wp_43654959_posts.ID ORDER BY wp_43654959_posts.post_date DESC LIMIT 0, 10; SELECT FOUND_ROWS()
require, wp, WP->main, WP->query_posts, WP_Query->query, WP_Query->get_posts, wpdb->get_col
Term Frustration: Complicated Queries
Queries usually require one or more JOINs
– get the term_taxonomy_id of the term matching our slug and taxonomy
– get posts which match the term_taxonomy_id via the term_relationships table
Things get even messier when you want to query by multiple terms or taxonomies or a NOT IN
Term Frustrations: #5809-core
Terms with the same slug are bound together. Title or description change impacts the other. Or be forced to use the dreaded `-2` in your slug
https://core.trac.wordpress.org/ticket/5809
(Fixed early November 2014!)
Term Frustrations: No meta!
No key-value/meta store for terms!
Have to rely on hacky workarounds (store encoded/serialized in the description) or plugins (“Taxonomy Meta” or “Meta for Taxonomies”)
Good News!
There are some plans under way to simplify things:
– from 3 tables to 2
– reduce the complexity of queries with back-compat
– pave the way for taxonomy meta and better object modelling
https://make.wordpress.org/core/2014/11/12/an-update-on-the-taxonomy-roadmap/
Points of Interest
API functions for almost anything you want to do
Most API functions have actions and filters to help modifying the resulting queries (e.g. post_clauses)
add_filter( 'posts_where', function( $where ) { // kill the query! $where = ' 1 = 0'; // modify as needed! return $where; } );
$wpdb class for interacting with the database directly
global $wpdb $wpdb->insert( ... ) $wpdb->delete( ... ) $wpdb->get_results( ... ) $wpdb->get_var( ... )
(object + meta pattern)Very consistent naming
(`{object}_id`, `meta_key`, `meta_value`)
* a few inconsistencies (e.g. ID, umeta_id) :)
Dates are stored in local and GMT versions:
post_date and post_date_gmt
post_modified and post_modified_gmt
etc.
Object data columns are usually prefixed with the type
(`post_title` instead of just `title`, `comment_date` instead of just `date`)
* a few exceptions (e.g. ID)
Not all fields are still used:
`comment_karma` in comments
`to_ping` in posts
`term_group` in terms
options used to have a `blog_id` entry until 3.4)
Some fields are odd
(for comment_type default value is “”, which is a comment :))
Database changes don’t happen very often in core (can go several releases without a change).
- $wp_db_version = 27916; + $wp_db_version = 29188;
Difficult to make schema changes without risk of breaking things
– Especially harder on larger multisite installs
– pre_schema_upgrade attempts to handle the upgrade
https://core.trac.wordpress.org/browser/trunk/src/wp-admin/includes/upgrade.php#L2067
Custom database tables through plugins and themes are discouraged
– The schema was designed to be extremely flexible
– Not always perfect (e.g. post2post) but can accommodate a huge number of use cases
– But, tools available if your use case requires them (e.g. dbDelta)
Pitfalls & Opportunities
Pitfalls: Slow/Complex queries
– WP_Query is very powerful; you can do some insane lookups
– At scale, these insane lookups can break your site
– If you’re not careful, these insane lookups can break the site even with very little traffic
Pitfalls: Slow/Complex queries
– Non-indexed or expensive queries can be a problem
– The structure of the taxonomy system lends itself to slow queries
– meta_key- or meta_value-based queries on a really large tables are slow
Pitfalls: Schemaless Meta
– Some might argue that the flexible schema leads to bad application/site design
– Others also argue that less thought put into how meta will be used
Opportunity: Consistency
– Know what exactly to expect between installs
– Your WordPress is the same as my WordPress
Opportunity: Flexibility
– Custom post types!
– Any manner of object can be stored as custom “posts”
– Keyed off the “post_type” column
Opportunity: Flexibility
– wp_term_relationships does not have a strict definition of what an object_id is
– Used for both posts and links!
– Could technically use for users as well
Opportunity: alt use cases
– Using taxonomy as a post-to-post connector (instead of meta)
– https://github.com/mjangda/taxonomy-to-post_type-sync
Opportunity: alt use cases
– Machine-generated objects as post_types
=== Redirects via WP.com Legacy Redirector
=== DNS records on WordPress.com
=== Sitemaps in Comprehensive Sitemaps
$args = array( 'post_name' => $from_url_hash, 'post_title' => $from_url, 'post_type' => self::POST_TYPE, 'post_parent' => $redirect_to, ); wp_insert_post( $args );
Opportunity: alt use cases
– Liveblog entries as comments
=== Completely custom UI for interacting with data
=== All abstracted into its own API (and using the WordPress API) to map data to database fields (WPCOM_Liveblog_Entry, WPCOM_Liveblog_Entry_Query)
Summary
– The schema provides a good object model + key-value store for data
– It is extremely flexible and powerful
– Not something you should really have to worry about or deal with
– It can present problems if we are not careful
– It can reward us very well if we use it as it was intended to be
Hiring, Hiring, Hiring!
– VIP Wranglers!
– Code Wranglers!
– Designers!
– Happiness Engineers!
– Interns!
Thank You! Questions?
mo@automattic.com | batmoo@gmail.com | @mjangda