Cal Henderson's all-day talk, "Scaling Fast and Cheap - How We Built Flickr", is packed with information about building a scalable enterprise web application using predominantly open source software, some custom software, and very little commercial software.
Anyone from a beginner to an experienced web application developer would find the talk useful, especially since Cal was open to any questions. However, if you want a talk on the history of Flickr, this is not the talk for you. Currently Flickr runs on about 200 servers and has about 600 terabytes of storage; not counting redundancy, it is about 200 terabytes. There are two data centers, one in Texas and one in Virginia. The development team is about 10 engineers. Operational support is now handled by Yahoo's operations team; previously it was about 4 people.
Keep It Simple
He started his talk with a picture of the godfather of computer science, Donald Knuth, and his famous quote "premature optimization is the root of all evil" (misattributed quote, see comments). The point he drove home was not to waste manpower optimizing the small stuff or the stuff that won't need optimizing. Almost all of the optimization you will need is in the database and in the hardware configuration of your disk storage system.
In that vein, he recommended buying commodity hardware and installing the standard version of Linux, with one exception: use the compiled MySQL binaries from www.mysql.com and don't try to compile your own. Further, run MySQL on a 64-bit machine to get around the memory limits of 32-bit machines.
Don't lock yourself into a hardware platform if you can help it, especially if you are growing fast. If you do, check that the components your vendor uses (disk drives, network cards, etc.) stay compatible with your version of Linux. If you can't, plan ahead for the time it will take to move to a possibly newer version of Linux.
The main software components of Flickr are Linux (I think Red Hat, Debian, or SUSE), the JVM, Smarty, MySQL, Apache 2, and PHP 4. Consistency across your systems is key to ease of maintenance and ease of development.
Use Version Control for Everything
Use source version control (CVS or Subversion) for everything and, as hard as it is, put useful comments in the version control system. I personally like to recall a quote from (I think) Damian Conway: "Document your code as if a homicidal maniac who knows where you live will be taking over development of your code." To help with this, use a simple CVS or Subversion client and let your developers use whichever one they like. Put everything into the version control system: application code, system configuration files (Apache, PHP, etc.), documentation, and so on.
Set standards for naming files, database tables, functions, objects, etc. Don't worry about settling on the perfect one; the greatest benefit is in everyone using the same standard.
Use a bug tracking system like FogBugz, Mantis, RT, or Bugzilla. Get disciplined and fix bugs before doing development on the next release. Fix the easy bugs first (the low-hanging fruit). Categorize your bugs: P1, the production site is down; P2, the staging site is down; P3, the site does not go down but the bug really needs to be fixed to maximize the user experience; P4, it is a bug but no one will ever notice it.
Local, Development, Staging, Production
Application development occurs in four segments: local, development, staging, and production. Local is the developer's own machine, with the components they are working on installed. Development is the in-progress version of the site, the lowest level at which all the components come together. Staging is the almost-live version of the site, the in-house test site. Production is the live site, and it is updated only from staging through a very simple interface: one button click. Scripts deploy the latest version of the staging code to the production site, which also makes it easy to roll production back to a previous version of the code from the staging server. This is important because you will never be able to fully test a web application; there will always be bugs that only reveal themselves on the live site.
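The talk didn't show Flickr's deploy scripts, so here is a minimal sketch of one common way to make "push to production" and "roll back" a single quick step: each deployed copy of the staging code lives in its own versioned directory, and a `current` symlink that the web server serves from is switched atomically. The directory layout, the version names, and the functions `activate_release`, `current_release`, and `rollback` are all hypothetical, and the atomic-swap behaviour assumes a POSIX filesystem.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical layout (not Flickr's): <root>/releases/<version>/ holds each
# deployed copy of the staging code, and <root>/current is a symlink the web
# server serves from. Version names are assumed to sort in release order.


def activate_release(root: Path, version: str) -> None:
    """Point <root>/current at releases/<version>, swapping the link in one rename."""
    target = root / "releases" / version
    if not target.is_dir():
        raise FileNotFoundError(f"no such release: {target}")
    tmp_link = root / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(target, tmp_link)
    # On POSIX, rename() replaces the old symlink atomically, so requests
    # never see a missing "current".
    os.replace(tmp_link, root / "current")


def current_release(root: Path) -> str:
    """Return the version name the current symlink points at."""
    return Path(os.readlink(root / "current")).name


def rollback(root: Path) -> str:
    """Re-activate the release that sorts just before the active one."""
    versions = sorted(p.name for p in (root / "releases").iterdir() if p.is_dir())
    index = versions.index(current_release(root))
    if index == 0:
        raise RuntimeError("no earlier release to roll back to")
    activate_release(root, versions[index - 1])
    return versions[index - 1]


def demo() -> tuple[str, str]:
    """Deploy two releases into a temporary directory, then roll back once."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for version in ("v001", "v002"):
            (root / "releases" / version).mkdir(parents=True)
        activate_release(root, "v001")
        activate_release(root, "v002")
        before = current_release(root)
        after = rollback(root)
        return before, after
```

`demo()` returns `("v002", "v001")`. Because old releases stay on disk, rolling back is just another symlink swap rather than a redeploy; copying the staging code into `releases/` is left out of the sketch.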
Unicode
Flickr supports Unicode, and supporting it in a web application is easy. The hard part is data integrity: what do we do with invalid Unicode? First, set up a data integrity policy for the site. Flickr filters data (comments, titles, etc.) before it is stored to ensure it is valid Unicode.
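Cal didn't spell out Flickr's exact filtering rules, so the following is only a sketch of what a "filter before storing" policy could look like. The specific choices are assumptions, not Flickr's: undecodable UTF-8 bytes become U+FFFD, control characters other than tab, newline, and carriage return are dropped, and text is stored in Unicode normalization form NFC. `clean_user_text` is a hypothetical name.

```python
import unicodedata

# Hypothetical policy (not Flickr's actual rules): replace undecodable bytes
# with U+FFFD, drop control characters other than tab/newline/CR, and store
# text in NFC so visually identical strings compare equal.
_KEPT_CONTROLS = {"\t", "\n", "\r"}


def clean_user_text(raw: bytes) -> str:
    """Turn raw submitted bytes into valid, normalized Unicode before storage."""
    text = raw.decode("utf-8", errors="replace")
    text = "".join(
        ch for ch in text
        if ch in _KEPT_CONTROLS or unicodedata.category(ch) != "Cc"
    )
    return unicodedata.normalize("NFC", text)
```

For example, `clean_user_text(b"bad\xff")` gives `"bad\ufffd"`, and a decomposed `e` plus combining acute accent comes back as the single character `é`. Whether to replace, drop, or reject invalid input is exactly the policy decision the talk says to make up front.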
HTML and SQL Spoofing
Displaying user-entered HTML on your site is a really stupid idea from a security point of view, since there are so many spoofing attacks. If you are going to do it anyway, use an open source library like lib_filter to clean up the input first. SQL injection attacks are another problem: use just-in-time escaping of SQL input, and never grant a database user more permissions than necessary.
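As a rough illustration of both points, here is a small sketch using Python's standard `sqlite3` and `html` modules. Note the swapped-in technique: instead of escaping strings by hand, it uses the driver's parameter binding (`?` placeholders), which achieves the same "escape at the last moment" goal. It also escapes *all* markup on output, which is stricter than lib_filter's approach of allowing some tags. sqlite3 stands in for MySQL, and the `comments` table and function names are hypothetical.

```python
import html
import sqlite3

# sqlite3 stands in for MySQL here. Parameter binding (the "?" placeholders)
# lets the driver quote each value at execution time, serving the same
# purpose as just-in-time escaping.


def add_comment(conn: sqlite3.Connection, photo_id: int, body: str) -> None:
    conn.execute(
        "INSERT INTO comments (photo_id, body) VALUES (?, ?)", (photo_id, body)
    )


def comments_for(conn: sqlite3.Connection, photo_id: int) -> list[str]:
    rows = conn.execute(
        "SELECT body FROM comments WHERE photo_id = ? ORDER BY rowid", (photo_id,)
    )
    return [body for (body,) in rows]


def render_comment(body: str) -> str:
    # Escape everything, so no user HTML survives. A whitelist filter such as
    # lib_filter is what you would need to allow *some* tags.
    return "<p>" + html.escape(body) + "</p>"
```

A comment like `x'); DROP TABLE comments; --` is stored and returned verbatim as data, and `render_comment("<script>alert(1)</script>")` produces `<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>`.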
Email
Even with all the RFCs (561, 822, 1521) defining an email standard, a lot of email still does not adhere to them, especially from mobile phone providers. Flickr uses PEAR's Mail_mimeDecode, iconv, and custom code to parse email. When you send a photo to your Flickr account from a cell phone, a lot of code goes into getting the title, comment, and photo out of the email. Some providers take advantage of the feature to advertise, replacing the subject line of the email with an advertisement or adding icons at the bottom of the message.
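Flickr's own code is PHP, so this is not it. As a comparable sketch, Python's standard `email` package can do the job that Mail_mimeDecode plus iconv do: split the MIME parts, decode base64, and decode RFC 2047 encoded-word subjects and body charsets. The function `extract_photo_post` and the choice to treat the subject as the title and the plain-text body as the comment are assumptions. Real-world provider quirks, such as ad footers, would still need custom handling on top.

```python
from email import policy
from email.parser import BytesParser


def extract_photo_post(raw: bytes) -> tuple[str, str, list[bytes]]:
    """Return (subject, plain-text body, image attachments) from a raw email."""
    # policy.default decodes RFC 2047 encoded-word headers and body charsets.
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    subject = str(msg.get("Subject", "")).strip()
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content().strip() if body_part is not None else ""
    images = [
        part.get_content()  # bytes, already decoded from base64
        for part in msg.iter_attachments()
        if part.get_content_maintype() == "image"
    ]
    return subject, body, images
```

For a `multipart/mixed` message whose subject is `=?utf-8?q?Sunset_=C3=A9?=`, with a text part and a base64 JPEG attachment, this returns the subject `Sunset é`, the stripped text, and the raw image bytes.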
Bottlenecks
Bottlenecks almost always come from one of two things: swapping to disk because of high memory usage, or non-optimized database queries. A CPU bottleneck is possible if your web application does heavy crunching (image or video editing), but it is very rare. Don't look for a bottleneck until one occurs; remember, premature optimization is the root of all evil. Do build some stats gathering into your site to help locate bottlenecks when they appear, e.g., a feature that optionally logs the time taken by queries or other processes. MySQL indexes are "bizarre black magic"; when it comes to optimizing them, consider hiring a MySQL expert.
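Here is a minimal sketch of the "optionally log the time of queries" idea. A thin wrapper around a database connection records wall-clock time per query only when switched on, and can list the queries over a threshold. `TimedConnection`, its `enabled` and `slow_ms` parameters, and the use of sqlite3 in place of MySQL are all illustrative assumptions.

```python
import sqlite3
import time


class TimedConnection:
    """Wrap a DB connection and, when enabled, record each query's wall time.

    Hypothetical helper: sqlite3 stands in for MySQL, and the `enabled`
    switch models "optionally log the time of queries".
    """

    def __init__(self, conn: sqlite3.Connection, enabled: bool = False,
                 slow_ms: float = 100.0) -> None:
        self.conn = conn
        self.enabled = enabled
        self.slow_ms = slow_ms
        self.timings: list[tuple[str, float]] = []  # (sql, elapsed milliseconds)

    def execute(self, sql: str, params: tuple = ()) -> list[tuple]:
        if not self.enabled:
            return self.conn.execute(sql, params).fetchall()
        start = time.perf_counter()
        # Fetch inside the timed region: sqlite3 produces SELECT rows as they
        # are fetched, so timing execute() alone would undercount.
        rows = self.conn.execute(sql, params).fetchall()
        self.timings.append((sql, (time.perf_counter() - start) * 1000.0))
        return rows

    def slow_queries(self) -> list[tuple[str, float]]:
        """Queries at or above the slow_ms threshold, slowest first."""
        slow = [t for t in self.timings if t[1] >= self.slow_ms]
        return sorted(slow, key=lambda t: t[1], reverse=True)
```

Keeping the switch off by default means normal traffic pays almost nothing. You flip it on when you are actually hunting a bottleneck, which fits the "don't optimize until it hurts" advice.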
Site Monitor
When it comes to monitoring the site, Cal was very happy with Ganglia for gathering all kinds of trend statistics (past hour, day, week, month, year) and with Nagios for real-time health monitoring (is everything up and running as it should be?).
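Nagios runs small "plugin" programs and interprets their exit status (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and their first line of output. The toy check below illustrates that contract with a disk-usage test; it is only an illustration, since the standard Nagios plugins already ship a `check_disk`. The thresholds, the function names, and the perfdata label `used` are assumptions.

```python
import shutil

# Nagios plugin convention: exit status 0/1/2/3 = OK/WARNING/CRITICAL/UNKNOWN,
# plus one line of status text (optionally followed by "|" and perfdata).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_pct: float, warn_pct: float, crit_pct: float) -> int:
    """Map a usage percentage onto a Nagios status code."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def check_disk(path: str = "/", warn_pct: float = 80.0,
               crit_pct: float = 90.0) -> tuple[int, str]:
    """Return (exit status, status line) for disk usage at `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    status = classify(used_pct, warn_pct, crit_pct)
    return status, (f"DISK {LABELS[status]} - {path} {used_pct:.1f}% used"
                    f" | used={used_pct:.1f}%;{warn_pct};{crit_pct}")


def main(argv: list[str]) -> int:
    """Print the status line and return the exit code Nagios should see."""
    status, line = check_disk(argv[1] if len(argv) > 1 else "/")
    print(line)
    return status
```

A wrapper script would end with `sys.exit(main(sys.argv))`, so Nagios sees the status as the process exit code.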
Scalability
Cal defined scalability as horizontal scaling: buying more servers, not bigger servers. He concentrated mostly on how to get MySQL to scale. He described the different MySQL storage engines (MyISAM, BDB, InnoDB, and Heap), the pros and cons of each, and how best to use them. The most interesting part was their use of MySQL replication, which does not get you true horizontal scaling. It lets you handle many more reads, but every database still receives every write. Since most web applications do far more reads than writes, this still goes a long way. I won't get into the details of their master/slave database setup. One drawback is replication lag, which you can experience on Flickr with tags. If you add a tag to a photo and then search for photos with that tag a few seconds later, the photo may not be returned, because your search request was handled by a database slave that had not been updated yet.
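The details of Flickr's master/slave setup weren't covered, so here is a generic sketch of the read/write split that replication implies. Writes always go to the master, and reads rotate across the slaves. The optional `fresh=True` is one common workaround for the lag problem, not something from the talk: it sends a read to the master when the caller must see its own recent write. In-memory sqlite3 connections stand in for MySQL servers, and `ReplicatedDB` is a hypothetical name. Nothing here performs replication, which makes a lagging slave easy to demonstrate.

```python
import itertools
import sqlite3


class ReplicatedDB:
    """Send writes to the master and round-robin reads across the slaves.

    Hypothetical router: in-memory sqlite3 connections stand in for MySQL
    servers, and nothing here performs replication itself.
    """

    def __init__(self, master: sqlite3.Connection,
                 slaves: list[sqlite3.Connection]) -> None:
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def write(self, sql: str, params: tuple = ()) -> None:
        # Every write lands on the master; replication would copy it out.
        self.master.execute(sql, params)
        self.master.commit()

    def read(self, sql: str, params: tuple = (), fresh: bool = False) -> list[tuple]:
        # fresh=True reads from the master, trading extra master load for
        # up-to-date results.
        use_master = fresh or self._slaves is None
        conn = self.master if use_master else next(self._slaves)
        return conn.execute(sql, params).fetchall()
```

With one master and one never-updated slave, tagging photo 1 as `sunset` and then searching through `read()` returns nothing, just like the Flickr tag search described above. The same search with `fresh=True` finds the photo.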
True scalability comes with MySQL5 and Oracle RAC ($25K per processor). MySQL5 is not ready for prime time, so the only options are to use Oracle or to write your own code to do this, which gets really complicated fast. Writing your own is not recommended unless you really know what you are doing and have simple database SELECTs. Currently Flickr's architecture does not scale horizontally; they just keep stretching their current architecture.
Storage
I got the impression from the last part of Cal's talk that the image storage system is really not too complicated. He didn't get into the details of the hardware they use, but it grows simply by adding more hardware (it scales horizontally). The location of each image on the storage system is stored in the database. I think they have written their own code or protocol to fetch images from the storage system instead of using FTP or NFS, most likely for speed.
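Here is a minimal sketch of "the location of the images is stored in the database". A table maps each photo ID to the storage server and path holding it. New uploads go to whichever server has the most free space, so adding a server adds capacity without moving existing photos. The schema, the placement rule, the host names, and the `http://` URL format are hypothetical, and Flickr's real fetch protocol was not described.

```python
import sqlite3
from typing import Optional

# Hypothetical schema, placement rule, and URL format, for illustration only.


def init_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE photo_locations ("
        " photo_id INTEGER PRIMARY KEY,"
        " server TEXT NOT NULL,"
        " path TEXT NOT NULL)"
    )


def pick_server(free_bytes: dict[str, int]) -> str:
    """Place a new photo on the storage server with the most free space."""
    return max(free_bytes, key=lambda server: free_bytes[server])


def store_location(conn: sqlite3.Connection, photo_id: int,
                   server: str, path: str) -> None:
    conn.execute(
        "INSERT INTO photo_locations (photo_id, server, path) VALUES (?, ?, ?)",
        (photo_id, server, path),
    )


def photo_url(conn: sqlite3.Connection, photo_id: int) -> Optional[str]:
    """Look up where a photo lives; None if the database has no record of it."""
    row = conn.execute(
        "SELECT server, path FROM photo_locations WHERE photo_id = ?", (photo_id,)
    ).fetchone()
    if row is None:
        return None
    server, path = row
    return f"http://{server}/{path}"
```

Because every lookup goes through the database, the storage tier itself can stay simple: a photo never needs to know which box it is on.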
He ran out of time before he could talk about RSS support and the Flickr API.
Cal Henderson's book, Building Scalable Web Sites, will be available soon.
3 comments:
Fantastic read. Is there a podcast available anywhere of this session?
The "premature optimization" quote is commonly misattributed to Knuth. It was actually said by Tony Hoare, and is known as Hoare's dictum.
Is there a podcast available anywhere of this session?
I did a few quick searches on google and the various podcast search engines and could not find one. If anyone knows of one please let me know and I'll add a link to it.
The "premature optimization" quote is commonly misattributed to Knuth.
Thank you for correcting me (and Cal) on this. There is a discussion on Wikipedia about the misattribution.