WordPress Clustering

Clustering a WordPress installation is not a standard approach and presented many challenges, ranging from replicating files across the copies of WordPress on different server instances to choosing the proper caching mechanism. WordPress, by default, is not designed for a clustered environment and required further customization.

The existing caching plugins use file-based caching, which is not suitable for a clustered environment because the cache resides locally on each server in the cluster and is not available to the other servers. Distributed caching is therefore the only option.

Additionally, the existing caching plugins are not true caching implementations. The purpose of caching is to eliminate database access when serving unchanged pages, that is, cached pages. Excessive database queries place an unnecessary load on the server and slow the response time in serving pages to the visitor. This prompted a custom caching solution.

A plug-in is available that makes use of Memcache, but it is improperly written: connections to the Memcache server are left open and dangling after each use. Proper coding technique dictates that resources be freed immediately after use whenever possible.

Architecture Overview

The approach to achieving the required effects is page caching: a page is served from the cache instead of being generated from the database. When a visitor enters the site, the requested URI is checked against the Memcache server to see whether the page entry is locked. A lock indicates that the page is currently being updated or regenerated. A visitor who lands on a locked page waits for a preset time and then attempts to reload the page via a redirect. This further reduces the load on the server by preventing unnecessary database or Memcache queries.

If the page entry is not locked, a check is made to see whether the page is already cached. On the first access to a page no Memcache entry exists, so the page must be generated from the database. At this point the visitor's session locks the page entry, informing all subsequent visitors requesting this page not to update it or pull it from the cache while the cache is being updated. The generated page is then cached through the use of output buffering.

Additional page information must also be saved in the cache. This information is used in the headers of the cached page, and to maintain the design approach of no database access during cache retrieval, it is retrieved from the cache as well. Finally, the page lock is released and the database-generated page is displayed to the visitor.

On the next request for the page, the cache is checked for a page entry. This time an entry is found, and the item is retrieved along with the associated page information entry. The headers are created for the CDN delivery system and the page is presented to the visitor.
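The lock, hit and miss paths described above can be sketched as follows. This is a minimal illustration in Python rather than the plug-in's own code: a plain dict stands in for the Memcache server, and the names serve, LOCK_SUFFIX and render_page are hypothetical.

```python
import hashlib

LOCK_SUFFIX = "_LOCK"  # hypothetical suffix for lock entries, not from the original


def serve(uri, cache, render_page):
    """Return (source, html) for one visitor request.

    cache is any dict-like store standing in for Memcache;
    render_page(uri) stands in for generating the page from the database.
    """
    key = hashlib.md5(uri.encode("utf-8")).hexdigest()
    lock_key = key + LOCK_SUFFIX

    # 1. Another visitor is regenerating this page: tell the caller to
    #    wait and retry (the redirect described in the text).
    if lock_key in cache:
        return ("retry", None)

    # 2. Cache hit: the page is served with no database work.
    if key in cache:
        return ("cache", cache[key])

    # 3. Cache miss: lock, render from the database, store, unlock.
    #    Note: with a real Memcache server the lock should be taken with an
    #    atomic add; the check-then-set here is only safe in a sketch.
    cache[lock_key] = True
    try:
        html = render_page(uri)
        cache[key] = html
    finally:
        del cache[lock_key]
    return ("database", html)
```

The first call for a URI returns the "database" source and populates the cache; later calls return the "cache" source without invoking render_page again.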

Memcache requires a unique key for each entry. The best practice for generating these keys is to hash the URI (via MD5). This allows the same cache server to service multiple domains without key overlap. Additionally, a key must be generated for each page/post information entry. For this key, the tag _POSTINFO is appended to the page URI before hashing.
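As an illustration of the key scheme, here is a short Python sketch. The function names are hypothetical, and it assumes the URI passed in already includes the host name so that keys from different domains cannot collide.

```python
import hashlib

POSTINFO_TAG = "_POSTINFO"  # tag named in the text


def page_key(uri):
    """MD5 key for the cached page content."""
    return hashlib.md5(uri.encode("utf-8")).hexdigest()


def postinfo_key(uri):
    """MD5 key for the page/post information entry of the same page."""
    return hashlib.md5((uri + POSTINFO_TAG).encode("utf-8")).hexdigest()
```

Both keys are 32-character hexadecimal strings, and the two keys for one URI always differ because the hashed input differs.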

While a page is being served from the cache, there are times when the client will send an If-Modified-Since request header. This cannot be ignored: a 304 Not Modified response is expected, and if the case is not specifically handled, one is sent back to the visitor with no content, only headers, and not the intended headers previously set. One additional header, ETag, is also included in the headers.

The cache check should occur early in the boot process. There is a little-known function, wp_cache_postload, which resides in the drop-in file advanced-cache.php and is activated when the WP_CACHE constant is set to true. This function is invoked very early in the boot process, before any database connection has been established. It is a user-defined function (UDF) that allows a custom cache plug-in to perform initialization and other operations. This function is where the cache check is performed against the page URI.

Keep in mind that caching only occurs for visitors. No caching occurs for administrators/bloggers, comment authors, or anyone logged into the WordPress dashboard. So the first check in wp_cache_postload is to verify that the user is just a visitor.

Once the user has been identified as a visitor, a connection is established with the Memcache server pool, the URL is obtained from the server, and the request is checked against the non-cacheable requests. Non-cacheable requests include images, login, admin and comment pages. All other pages are considered cacheable. An additional check is made on the cookies for the existence of a recent comment post, which is not cacheable. This occurs when a visitor has commented on a post and is now identified as a comment author in a cookie. Until the cookie expires, this visitor will not get cached pages.
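A rough Python sketch of this cacheability check follows. The path and extension lists are illustrative assumptions, since the text does not enumerate them; the cookie prefixes are the ones WordPress uses for logged-in users and comment authors. The function name is_cacheable is hypothetical.

```python
# Illustrative lists only; the original text does not give exact paths.
NON_CACHEABLE_PREFIXES = ("/wp-admin", "/wp-login.php", "/wp-comments-post.php")
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".ico")
# Cookie name prefixes for logged-in users and recent comment authors.
BYPASS_COOKIE_PREFIXES = ("wordpress_logged_in_", "comment_author_")


def is_cacheable(request_uri, cookies):
    """Return True when the request may be served from the page cache.

    cookies is a mapping of cookie name to value for this request.
    """
    # Ignore the query string when inspecting the path.
    path = request_uri.partition("?")[0].lower()
    if path.startswith(NON_CACHEABLE_PREFIXES):
        return False
    if path.endswith(IMAGE_EXTENSIONS):
        return False
    # Logged-in users and comment authors always bypass the cache.
    for name in cookies:
        if name.startswith(BYPASS_COOKIE_PREFIXES):
            return False
    return True
```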

Now that we have the request URI, we hash it into a Memcache key and check for its existence in the Memcache server pool (the lock is checked first, then the content if no lock exists). If the key does not exist, the key is locked and loading continues as normal. If the key exists, the POSTINFO entry is also retrieved. From the POSTINFO, the Last-Modified header is set based on the last modified date of the post. Additional headers are set for the Limelight CDN.

The request is then checked for If-Modified-Since; if it is present, the 304 Not Modified header is returned. Otherwise, a shutdown hook is registered to ensure resources are cleaned up when the script exits, the cached content is output to the browser via an echo statement, and the script exits.
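The conditional-request branch can be illustrated with the Python sketch below. It is one possible reading of the text, not the plug-in's code: the respond function is hypothetical, and it additionally compares the post's last modified date with the date sent by the client, which the text does not spell out.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(request_headers, cached_html, last_modified):
    """Return (status, headers, body) for a cached page.

    last_modified must be a timezone-aware UTC datetime taken from the
    cached post information.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            since_dt = None  # unparseable date: fall through to a full response
        if since_dt is not None:
            if since_dt.tzinfo is None:
                since_dt = since_dt.replace(tzinfo=timezone.utc)
            if last_modified <= since_dt:
                # Headers only, no body.
                return 304, headers, ""
    return 200, headers, cached_html
```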

If no cache item exists, the loading process continues by instantiating an output buffer object known as wptruecache_cache, and the output buffering method is assigned. Hooks are established to clear the cache item for the requested URI when a post is published, edited or deleted, as well as when a comment is added, deleted, edited or moved between the approved and unapproved states.

There may be times when the memcache server becomes unavailable. At that point a determination is needed whether to pull from the database directly or fail over to file-based caching on the server until memcache becomes available. A decision is then needed whether the file cache should be updated every time memcache is updated, that is, whether a mirror copy is kept at the file level. The question arises of the file cache being out of sync with memcache. Keep in mind that the file cache is only a backup of the last memcache content and that memcache takes precedence. So for each memcache pull the file cache item is updated, and for each database pull memcache is updated.

The scenarios:

In general, the visitor checks for memcache connectivity; if not connected, checks for a file-cached item; and if none exists, pulls from the database.
Memcache is available and the item is not cached in memcache: the visitor pulls from the database and populates both memcache and the file cache.
Memcache is available and the item is cached in memcache: the visitor pulls from memcache and updates the file cache.
Memcache is not available and a file cache item is available: the visitor pulls from the file cache.
Memcache is not available and no file cache item is available: the visitor pulls from the database and saves the page in the temporary file cache (this action is written to the Apache error log as a prompt to check memcache). At this point memcache and the file cache are out of sync and will not be back in sync until the next successful memcache connection.
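The scenarios above can be condensed into one lookup routine. The Python sketch below is an illustration under stated assumptions: dicts stand in for memcache and the file cache, memcache is passed as None when no connection is possible, and the names fetch_page, render_from_db and log are hypothetical.

```python
def fetch_page(key, memcache, file_cache, render_from_db, log=print):
    """Return (source, html) following the failover scenarios.

    memcache: dict-like store, or None when memcache is unreachable.
    file_cache: dict-like store standing in for the local file cache.
    render_from_db: callable that generates the page from the database.
    """
    if memcache is not None:
        html = memcache.get(key)
        if html is None:
            # Memcache up, item missing: database, then populate memcache.
            html = render_from_db()
            memcache[key] = html
            source = "database"
        else:
            # Memcache up, item present.
            source = "memcache"
        # Memcache takes precedence; the file cache mirrors it.
        file_cache[key] = html
        return source, html

    # Memcache down: fall back to the local file cache.
    html = file_cache.get(key)
    if html is not None:
        return "file", html

    # Memcache down and no file copy: database, temporary file cache, log.
    html = render_from_db()
    file_cache[key] = html
    log("memcache unavailable; page saved to file cache only: " + key)
    return "database", html
```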

Additional Notes

During an output buffering operation for a visitor in preparation for caching, the cookies were also being cached. This presented a security issue that exposed individual visitor sessions to the world. The correction was to remove the cookies for all visitors who have not posted a comment and are considered cacheable. Administrators and bloggers retain their cookies and are not considered cacheable.

No output buffering occurs for administrators, bloggers and comment authors; these users always pull from the database and never from the cache.

Because database connectivity is not available during cache object checking (wp_cache_postload), the post information is also saved to the cache using a related key. This post information is needed to retrieve the last modified date of the post itself and set the last modified header parameter.

To ensure proper cleanup of any WordPress-allocated resources when the script exits during a cache rendering, a shutdown function is registered. This shutdown function invokes the WordPress shutdown action. This is the same approach used during normal operation after a rendered template page is displayed.

All comment operations (add, edit, delete) invoke the same hook, which clears the cache item for that comment. A new cache item is then generated, reflecting whether or not the comment has been approved. Once a visitor becomes a comment author, that visitor retrieves pages directly from the database and not from the cache until their comment is approved. This ensures that the visitor sees their pending comment status.

A separate hook is invoked when the comment is approved. It simply deletes the post item for this comment from the cache, so that the next time a visitor loads the page the new comment is properly cached.

Caveats

Comment Cookie Time

[from http://shibashake.com/wordpress-theme/w3-total-cache-cookie-is-rejected]

This is a common issue that arises whenever a user makes a comment on your blog. Making a comment causes a cookie to be set, and this disables caching for that user until the cookie is cleared or expired. Here is an explanation by Frederick Townes.

In WordPress 3.0, the default comment cookie expiration time is 30,000,000 seconds, which is about 8,333 hours or 347 days. That is a long expiration time. Unless the user manually clears his browser cookies, he will not get to enjoy the super caching capabilities on your site for about a whole year. A simple way to fix this is to use the comment_cookie_lifetime filter, which is only applied for users who are not logged in. The comment cookie lifetime should be the maximum time a moderator takes to approve or reject a comment. If the moderator approves comments every 24 hours, then the comment cookie lifetime is set to 24 hours.

Single Memcache for Multiple Domains

Memcache supports a single key-value caching mechanism. This is sufficient when using a single memcache instance per server domain, but when you have multiple domains and need to delete the cached content for a specific domain, this becomes impossible, as Memcache cannot distinguish which cached items belong to which domain. The only option is to flush the cache, which clears all cached items for all domains. This became apparent when a comment was approved or updated and the recent comments widget did not update on pages, because at the time only the page containing the actual comment was cleared from the cache. Any other pages containing the recent comments widget remained cached with the earlier content. A quick workaround is to flush the cache when a comment is approved, updated or deleted. Remember that during a cache checking operation there is no DB connectivity (only Memcache connections exist).

Another solution: during admin comment updating, the permalinks of the pages and posts are retrieved and memcache is checked for the existence of the corresponding cache items; any that are found are deleted. A global admin comment update lock is set to prevent visitors from pulling from the cache during the update.

Memcache Failover Issue

There are three ways to establish a connection to the memcache server(s): the addServer, connect and pconnect functions. The addServer function does not establish a connection until it is actually needed, e.g. when the add or get function is called. The connect and pconnect functions establish a connection immediately. The pconnect function establishes a persistent connection that cannot be closed. An important note: although the addServer function provides a clean way of providing failover for a pool of servers, memcache does not replicate across the pool. The replication algorithm must be coded into the script. The addServer connection pool is useful only for reading from the cache, while multiple connect/pconnect calls are required for writing to the cache.

During a read-only cache transaction, the addServer function is preferred, as failover occurs if one of the servers in the pool is not responding. This is still a fast transaction, which is desired: you want to pull the cached page as quickly as you can, display it and leave.

An update cache transaction requires a connection to each available memcache server, as each memcache server must be individually updated with the cache content. Then, when a server fails later, the other servers have the replicated content and there is no need to regenerate the page for the surviving caches. When one cache server is updated, they all are updated.
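The write-to-all, read-from-any pattern can be sketched in Python as follows. FakeServer is a hypothetical in-memory stand-in for a single memcache server, and write_all and read_any are illustrative names; none of this is the PHP Memcache API itself.

```python
class FakeServer:
    """In-memory stand-in for one memcache server (hypothetical)."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def get(self, key):
        if not self.up:
            raise ConnectionError("server down")
        return self.data.get(key)

    def set(self, key, value):
        if not self.up:
            raise ConnectionError("server down")
        self.data[key] = value


def write_all(servers, key, value):
    """Replicate by writing to every reachable server; return the count updated."""
    updated = 0
    for server in servers:
        try:
            server.set(key, value)
            updated += 1
        except ConnectionError:
            continue  # skip dead servers; the others still receive the content
    return updated


def read_any(servers, key):
    """Return the value from the first reachable server that holds it, else None."""
    for server in servers:
        try:
            value = server.get(key)
        except ConnectionError:
            continue  # fail over to the next server in the pool
        if value is not None:
            return value
    return None
```

Because every reachable server receives each write, a read still succeeds when only one server in the pool survives, which is the behaviour the verification step is meant to confirm.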

To verify, set WP_MEMCACHE_SERVERS to your full list, load several pages and ensure that you are reading from the cache. Then modify WP_MEMCACHE_SERVERS to list only the last server, or shut down all but the last server. Reload the page and verify that it is still being pulled from memcache.

Memcache port may not be required

While the default port for the memcache server is 11211, a port value apparently does not need to be provided if the only argument is the IP address. A memcache connection will be established on the appropriate port.

Browsers sending URLs differently requires double caching

It was discovered that IE sends a trailing slash while Firefox leaves it off. The slash determines whether the item is found in the cache, as different keys are generated for each scenario. There are three options available: 1) remove the trailing slash from the URL before hashing, 2) add a trailing slash, if none exists, before hashing, or 3) cache both the trailing-slash and non-slash versions of the page, even if identical, and then delete both versions when a page, post or comment is updated.

The best solution is to remove the trailing slash before hashing, so that all URLs are hashed without a trailing slash before they are checked against the cache.
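A minimal Python sketch of this normalization is shown below. The function names are hypothetical, and keeping the site root as "/" and leaving the query string untouched are assumptions the text does not address.

```python
import hashlib


def normalize_uri(uri):
    """Strip trailing slashes from the path part of a URI before hashing."""
    path, sep, query = uri.partition("?")
    # Keep the site root as "/" so it never collapses to an empty string.
    path = path.rstrip("/") or "/"
    return path + sep + query


def normalized_page_key(uri):
    """MD5 cache key that is identical with or without a trailing slash."""
    return hashlib.md5(normalize_uri(uri).encode("utf-8")).hexdigest()
```

With this in place, a request for /blog/post/ and a request for /blog/post map to the same cache key, so only one copy of the page is stored and only one has to be deleted on update.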

File Caching failover

During the development of this plug-in, the question arose: when the memcache server is offline, should there be a failover, and of what type? A simple file-based caching failover has been implemented. The file cache resides on each web server in the cluster individually and mirrors the content of memcache. When no connection to memcache is possible, file cache retrieval is substituted until memcache is restored.

WordPress Functions of Interest

1. wp_is_large_network

Hacking WordPress

There is a user definable file called my-hacks.php.