Nate True's Weblog: Posts tagged with digg
Posted by natetrue 1 year ago
Cre.ations.net keeps its costs down by using Dreamhost shared hosting as its provider. With high-traffic creations such as the Time Fountain and Turning your pee blue, you might be wondering how cre.ations.net survives the large amounts of traffic that link sites such as Digg and Engadget send our way.
First of all, remember that I wrote the cre.ations.net Content Management System in its entirety. I know it inside and out and that gives me a benefit here. I can make these kinds of decisions for that reason, and it's great!
Once upon a time, before I had ever been on Digg (read: before I had published the Time Fountain), every page on cre.ations.net was dynamically generated using anywhere from 5 to 10 SQL queries. This was fine because when I was writing it I wanted to get the site up and only worry minimally about performance issues (don't get me wrong, I still wrote good, responsible code). So when I published the Time Fountain and it got on Digg, cre.ations.net just went down for a few hours until the traffic subsided. That was unacceptable - I hate when things go down during a traffic surge and cre.ations.net was no different.
So I racked my brain for solutions - sure, I could pay for a dedicated server, but I didn't have the money back then and I sure don't now. Even if I did have the money it would be no guarantee that my site wouldn't go down from traffic.
As a programmer, I usually go for software solutions - throwing money or hardware at it just didn't seem right.
I realized that the vast majority of visitors during a traffic surge aren't logged-in users, and don't need a dynamic page. The only dynamic elements for someone who isn't logged in are the randomized creation list on the right and the view counter on each creation, neither of which needs to be updated per viewer.
So I used .htaccess to run everything through one PHP script. Here are the relevant lines from my .htaccess file:
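# Send every semantic URL (anything not starting with an underscore) through cache.php.
RewriteEngine On
# URLs with parameters, e.g. blog/post/some-title
RewriteRule ^([^_][-a-z]+)/(.*)$ /cache.php?path=$1/$2 [L]
# Single-word URLs, e.g. blog
RewriteRule ^([^_][-a-z]+)$ /cache.php?path=$1 [L]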
Now let me explain a little about the architecture of cre.ations.net. You will have noticed that the URLs aren't your standard /path/to/index.html file URLs. They're semantic, something I really liked on other websites I've seen. The secret is that each URL consists of its first word, which refers to a PHP file, and a series of parameters. So, for example, the URL cre.ations.net/blog/post/how-cre-ations-net-survives-a-digging
would actually translate to calling blog.php with parameters "post" and "how-cre-ations-net-survives-a-digging".
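To make the URL scheme concrete, here is a minimal sketch of that first-word-plus-parameters split. It is not the actual cre.ations.net code: the function name route_path() and the rule that the first word may only contain lowercase letters and hyphens are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not from the real CMS): split a semantic URL path
// into the PHP script named by its first word and the remaining parameters.
function route_path(string $path): array
{
    // Drop empty segments so leading or trailing slashes are harmless.
    $segments = array_values(array_filter(
        explode('/', $path),
        function ($s) { return $s !== ''; }
    ));

    if (count($segments) === 0) {
        return ['script' => null, 'params' => []];
    }

    $first = array_shift($segments);

    // Assumption: only lowercase letters and hyphens are allowed in the
    // first word, so it can never name a file outside the site directory.
    if (!preg_match('/^[a-z][-a-z]*$/', $first)) {
        return ['script' => null, 'params' => []];
    }

    return ['script' => $first . '.php', 'params' => $segments];
}
```

Calling route_path('blog/post/how-cre-ations-net-survives-a-digging') gives the script blog.php with the parameters "post" and "how-cre-ations-net-survives-a-digging".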
That was how it was before the caching script was inserted. Now every request is a call to cache.php, with the URL path handed over as its parameter.
When cache.php is called, it first checks to see whether the user has a login cookie. For speed, it doesn't check whether it's valid. If there's a cookie (or some other don't-cache preconditions are met), the relevant PHP file is called and the page is created dynamically.
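A rough sketch of that first decision might look like the following. The cookie name 'login' and the GET-only rule are assumptions, since the post doesn't say what the cookie is called or what the other don't-cache preconditions are.

```php
<?php
// Assumption: the login cookie is called 'login'. The real name isn't given.
const LOGIN_COOKIE = 'login';

// Hypothetical check: should this request skip the cache and get a
// dynamically generated page?
function should_bypass_cache(array $cookies, string $method): bool
{
    // Presence only. Validating the cookie would cost a database lookup,
    // which is exactly what the cache path is trying to avoid.
    if (isset($cookies[LOGIN_COOKIE]) && $cookies[LOGIN_COOKIE] !== '') {
        return true;
    }

    // Assumed extra precondition: only plain GET requests are cacheable.
    return strtoupper($method) !== 'GET';
}
```

In a real script this would be called as should_bypass_cache($_COOKIE, $_SERVER['REQUEST_METHOD']) before anything else happens.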
What's interesting is what happens if the page request qualifies for caching. First the page URL is converted to a path on disk where the cache file would be stored (I used to store the cache in MySQL, but it's faster not to try to connect before serving the page). If the file exists, its modification time is checked. If the cache is recent enough (less than 5 minutes old), the cache file is simply read out to the user's browser. Only then does it try to connect to MySQL to perform Linkback tracking. View logging is accomplished with another file, which gets a single character ('#', if you must know) added to it every time a cache file is read out.
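Here is a small sketch of that cache-hit path, under a few assumptions: the cache file name is an md5 of the URL path, the view counter lives next to it with a .views suffix, and all of the function names are made up for this example. The Linkback tracking step is left out.

```php
<?php
const CACHE_TTL = 300; // five minutes, as described above

// Hypothetical mapping from URL path to cache file (hashing is an assumption).
function cache_file_for(string $cacheDir, string $path): string
{
    return rtrim($cacheDir, '/') . '/' . md5($path) . '.html';
}

// A cache file is fresh if it exists and is less than CACHE_TTL seconds old.
function is_fresh(string $file, int $now): bool
{
    clearstatcache(true, $file);
    return is_file($file) && ($now - filemtime($file)) < CACHE_TTL;
}

// Log one view by appending a single '#' to the counter file.
function log_view(string $file): void
{
    file_put_contents($file . '.views', '#', FILE_APPEND | LOCK_EX);
}

// The number of views logged so far is simply the counter file's size.
function pending_views(string $file): int
{
    $counter = $file . '.views';
    clearstatcache(true, $counter);
    return is_file($counter) ? filesize($counter) : 0;
}

// Return the cached page (and log a view) if it is fresh, otherwise null.
function serve_if_fresh(string $file, int $now): ?string
{
    if (!is_fresh($file, $now)) {
        return null;
    }
    log_view($file);
    return file_get_contents($file);
}
```

Appending one byte per view means counting views is just a filesize() call, with no database involved until the page is regenerated.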
If the cache file is too old or has not been created yet, the caching script makes use of PHP's Output Buffering mechanism, generating the page dynamically and saving the contents to the cache, and also clearing the view counter.
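The regeneration step could be sketched like this. The function name regenerate(), the .views counter file, and the write-to-temp-then-rename trick are assumptions for illustration; the post only says that output buffering is used and that the view counter is cleared.

```php
<?php
// Hypothetical regeneration step: run the dynamic page inside an output
// buffer, save the result as the new cache file, and reset the view counter.
function regenerate(string $file, callable $render): string
{
    ob_start();
    $render();               // echoes the page, as the dynamic script would
    $html = ob_get_clean();  // grab the buffered page and stop buffering

    // Assumption: write to a temp file and rename it into place, so a
    // concurrent reader never sees a half-written cache file.
    $tmp = $file . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, $html);
    rename($tmp, $file);

    // Clear the view counter (assumed to live next to the cache file).
    file_put_contents($file . '.views', '');

    return $html;
}
```

The returned string is what would be sent to the visitor who triggered the regeneration.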
Now if you're savvy you'll have caught a flaw in the architecture. Don't blame yourself if you didn't - it escaped me for several weeks. Since the script still connects to MySQL on every request (despite not inherently depending on it), won't the MySQL server still go down from too many connections? The answer here is yes - since every request opens an SQL connection behind it, the SQL server goes down, even while the cached page is still happily being served from the cache.
So it's not a problem until it comes time for the cache to be renewed. If I leave the script as I've described, the dynamic page creation will fail and the cache will be replaced with an error message - hardly what we want.
So I altered the CMS to call a special function cache_exit() when any part of page creation fails. That aborts the cache creation process and renews the cached copy of the page for another five minutes, after which we'll try again with page creation. Additionally, the cached copy is read to the poor user whose request failed.
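A sketch of what a cache_exit()-style fallback could look like follows. This is not the real function: the name cache_exit_sketch(), its parameters, and the choice to return the page instead of echoing it and exiting are assumptions made so the example is easy to run.

```php
<?php
// Sketch of a cache_exit()-style fallback (hypothetical, not the CMS code).
// $baseLevel is the output buffer nesting level from before page creation
// started, so only the buffers opened for this page are discarded.
function cache_exit_sketch(string $file, int $baseLevel): ?string
{
    // Throw away the partial or error output so it never reaches the cache.
    while (ob_get_level() > $baseLevel) {
        ob_end_clean();
    }

    // With no old copy on disk there is nothing to fall back on.
    if (!is_file($file)) {
        return null;
    }

    // Renew the stale copy: resetting its modification time makes it count
    // as fresh again, so regeneration is retried once it expires.
    touch($file);

    // Hand the old copy to the visitor whose request failed.
    return file_get_contents($file);
}
```

The real thing would presumably echo the page and stop the script, but the order matters either way: discard the broken output first, then renew and serve the old copy.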
This way, cre.ations.net can keep functioning even when the MySQL server is down, albeit in a somewhat degraded state.
Hopefully I'll be able to test it with some real high traffic soon.