This slideshow could not be started. Try refreshing the page or viewing it in another browser.
The Database Schema
About me: Mohammad (Mo) Jangda
mo@automattic.com | batmoo@gmail.com | @mjangda
– Toronto, ON
– Code Wrangler at Automattic / WordPress.com
– Ice Cream Fan
Outline
– API vs Database
– Core Structure & Table Walkthrough
– Points of Interest
– Pitfalls & Opportunities
Caveats
– Not really touching much on Multisite
– High-level concepts
– Ignoring links
– Assuming MySQL only
– No database tuning advice :)
API vs. Database
A Very Rich API
As devs, we don’t have to worry about schema because WordPress abstracts the database interaction for us.
The closest thing most devs need to worry about is the username and password during the install process.
Why bother?
– Because it’s interesting :)
– Important to understand the underlying architecture of the system
– Easier to debug problems and understand changes and new features
– Can help solve interesting problems
Core Structure
Core Structure
– Single-site: 11 base tables
– Multisite: 17 base tables
=== 9 additional tables for every new blog
=== 500 million tables on WordPress.com
Core Structure
– Hybrid entity/object-oriented and key-value store (posts + meta)
– Some normalization of the schema (terms)
– Unique IDs (primary key) for each entity within a table
wp_{object}s tables
Main tables are modelled after the object they are storing:
– wp_posts
– wp_comments
– wp_users
wp_{object}meta tables
Each object type has a key-value meta store:
– wp_postmeta
– wp_usermeta
– wp_commentmeta
Consistency across these tables allows for a common metadata API
key-value store
A bit like an associative array, in table form. Values relating to objects are identified by a key.
(“NoSQL”)
key-value store
– post_id: 123
– key: _thumbnail_id
– value: 456
key can any valid varchar(255); value can be any primitive or serializable object.
wp_term(s|taxonomy|taxonomy_relationships)
– Terms are handled very differently via three tables
– Somewhat messy and the source of many frustrations
Table: options
Similar key-value store to meta but only one object (blog) and no matching table for that object
Table: options
option_id [bigint(20)]
option_name [varchar(64)]
option_value [longtext]
autoload [varchar(20)]
SELECT * FROM wp_options WHERE autoload = ‘yes’
// get_option( ‘cookies’ )
SELECT * FROM wp_options WHERE option_name = ‘cookies’
Table: commentmeta
Look familiar? :)
http://codex.wordpress.org/Database_Description#Table:_wp_commentmeta
Terms
– Mapping looks something like:
^v
wp_term_taxonomy
^v
wp_term_relationships
^v
wp_(posts|links)
Table: wp_term_taxonomy
http://codex.wordpress.org/Database_Description#Table:_wp_term_taxonomy
Table: wp_term_relationships
http://codex.wordpress.org/Database_Description#Table:_wp_term_taxonomy
#1: Get the term_id from slug
SELECT wp_43654959_term_taxonomy.term_id
FROM wp_43654959_term_taxonomy
INNER JOIN wp_43654959_terms USING (term_id)
WHERE taxonomy = ‘category’
AND wp_43654959_terms.slug IN (‘stuff’)require, wp, WP->main, WP->query_posts, WP_Query->query, WP_Query->get_posts, WP_Tax_Query->get_sql, WP_Tax_Query->clean_query, WP_Tax_Query->transform_query, wpdb->get_col
#2: Get the term_tax_id
SELECT term_taxonomy_id
FROM wp_43654959_term_taxonomy
WHERE taxonomy = ‘category’
AND term_id IN (293)require, wp, WP->main, WP->query_posts, WP_Query->query, WP_Query->get_posts, WP_Tax_Query->get_sql, WP_Tax_Query->clean_query, WP_Tax_Query->transform_query, wpdb->get_col
#3: Get the term object (optional)
SELECT t.*, tt.* FROM wp_43654959_terms AS t INNER JOIN wp_43654959_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy = ‘category’ AND t.slug = ‘stuff’ LIMIT 1
require, wp, WP->main, WP->query_posts, WP_Query->query, WP_Query->get_posts, get_term_by, wpdb->get_row
#4: Get the posts
SELECT SQL_CALC_FOUND_ROWS wp_43654959_posts.ID FROM wp_43654959_posts INNER JOIN wp_43654959_term_relationships ON (wp_43654959_posts.ID = wp_43654959_term_relationships.object_id) WHERE 1=1 AND ( wp_43654959_term_relationships.term_taxonomy_id IN (1) ) AND wp_43654959_posts.post_type = ‘post’ AND (wp_43654959_posts.post_status = ‘publish’ OR wp_43654959_posts.post_status = ‘private’) GROUP BY wp_43654959_posts.ID ORDER BY wp_43654959_posts.post_date DESC LIMIT 0, 10; SELECT FOUND_ROWS()
require, wp, WP->main, WP->query_posts, WP_Query->query, WP_Query->get_posts, wpdb->get_col
Term Frustration: Complicated Queries
Queries usually require one or more JOINs
– get the term_taxonomy_id of the term matching our slug and taxonomy
– get posts which match the term_taxonomy_id via the term_relationships table
Things get even messier when you want to query by multiple terms or taxonomies or a NOT IN
Term Frustrations: #5809-core
Terms with the same slug are bound together. Title or description change impacts the other. Or be forced to use the dreaded `-2` in your slug
https://core.trac.wordpress.org/ticket/5809
(Fixed early November 2014!)
Term Frustrations: No meta!
No key-value/meta store for terms!
Have to rely on hacky workarounds (store encoded/serialized in the description) or plugins (“Taxonomy Meta” or “Meta for Taxonomies”)
Good News!
There are some plans under way to simplify things:
– from 3 tables to 2
– reduce the complexity of queries with back-compat
– pave the way for taxonomy meta and better object modelling
https://make.wordpress.org/core/2014/11/12/an-update-on-the-taxonomy-roadmap/
Points of Interest
API functions for almost anything you want to do
Most API functions have actions and filters to help modifying the resulting queries (e.g. post_clauses)
add_filter( 'posts_where', function( $where ) { // kill the query! $where = ' 1 = 0'; // modify as needed! return $where; } );
$wpdb class for interacting with the database directly
global $wpdb $wpdb->insert( ... ) $wpdb->delete( ... ) $wpdb->get_results( ... ) $wpdb->get_var( ... )
(object + meta pattern)Very consistent naming
(`{object}_id`, `meta_key`, `meta_value`)
* a few inconsistencies (e.g. ID, umeta_id) :)
Dates are stored in local and GMT versions:
post_date and post_date_gmt
post_modified and post_modified_gmt
etc.
Object data columns are usually prefixed with the type
(`post_title` instead of just `title`, `comment_date` instead of just `date`)
* a few exceptions (e.g. ID)
Not all fields are still used:
`comment_karma` in comments
`to_ping` in posts
`term_group` in terms
options used to have a `blog_id` entry until 3.4)
Some fields are odd
(for comment_type default value is “”, which is a comment :))
Database changes don’t happen very often in core (can go several releases without a change).
- $wp_db_version = 27916; + $wp_db_version = 29188;
Difficult to make schema changes without risk of breaking things
– Especially harder on larger multisite installs
– pre_schema_upgrade attempts to handle the upgrade
https://core.trac.wordpress.org/browser/trunk/src/wp-admin/includes/upgrade.php#L2067
Custom database tables through plugins and themes are discouraged
– The schema was designed to be extremely flexible
– Not always perfect (e.g. post2post) but can accommodate a huge number of use cases
– But, tools available if your use case requires them (e.g. dbDelta)
Pitfalls & Opportunities
Pitfalls: Slow/Complex queries
– WP_Query is very powerful; you can do some insane lookups
– At scale, these insane lookups can break your site
– If you’re not careful, these insane lookups can break the site even with very little traffic
Pitfalls: Slow/Complex queries
– Non-indexed or expensive queries can be a problem
– The structure of the taxonomy system lends itself to slow queries
– meta_key- or meta_value-based queries on a really large tables are slow
Pitfalls: Schemaless Meta
– Some might argue that the flexible schema leads to bad application/site design
– Others also argue that less thought put into how meta will be used
Opportunity: Consistency
– Know what exactly to expect between installs
– Your WordPress is the same as my WordPress
Opportunity: Flexibility
– Custom post types!
– Any manner of object can be stored as custom “posts”
– Keyed off the “post_type” column
Opportunity: Flexibility
– wp_term_relationships does not have a strict definition of what an object_id is
– Used for both posts and links!
– Could technically use for users as well
Opportunity: alt use cases
– Using taxonomy as a post-to-post connector (instead of meta)
– https://github.com/mjangda/taxonomy-to-post_type-sync
Opportunity: alt use cases
– Machine-generated objects as post_types
=== Redirects via WP.com Legacy Redirector
=== DNS records on WordPress.com
=== Sitemaps in Comprehensive Sitemaps
$args = array( 'post_name' => $from_url_hash, 'post_title' => $from_url, 'post_type' => self::POST_TYPE, 'post_parent' => $redirect_to, ); wp_insert_post( $args );
Opportunity: alt use cases
– Liveblog entries as comments
=== Completely custom UI for interacting with data
=== All abstracted into its own API (and using the WordPress API) to map data to database fields (WPCOM_Liveblog_Entry, WPCOM_Liveblog_Entry_Query)
Summary
– The schema provides a good object model + key-value store for data
– It is extremely flexible and powerful
– Not something you should really have to worry about or deal with
– It can present problems if we are not careful
– It can reward us very well if we use it as it was intended to be
Hiring, Hiring, Hiring!
– VIP Wranglers!
– Code Wranglers!
– Designers!
– Happiness Engineers!
– Interns!
Thank You! Questions?
mo@automattic.com | batmoo@gmail.com | @mjangda
Protected: Workshop: Server-side Performance [VIP Workshop 2014]
WordPress 3.6
3.6 is finally out! Glad that I could contribute to a WordPress release again!