I enjoy reading about the architecture of some of the bigger internet-related systems around, be it database sharding, Hadoop usage, choice of languages or development methods. I will try to post some numbers from these adventures regularly. Here is a start, together with links that can be followed for more details.
From Data Center Knowledge, quoting a talk at Structure 2010 by Facebook’s Jonathan Heiliger:
- 400 million users.
- 16 billion minutes spent on Facebook each day.
- 3 billion new photos per month.
- More than a million photos viewed per second.
- 65,000 servers spanning 70 countries and 1000 networks.
- Hundreds of billions of "Internet interactions" per day.
- Traffic peaks at 2 Terabits per second.
Facebook recently released their PHP on steroids, named HipHop, as open source. I listened to their presentation a while ago at the FOSDEM conference in Brussels and was, like many others, impressed - but not as enthusiastic as many others seem to be now.
Some say it's nothing new, because there have been a few PHP compilers before or because there are op-code caches already. What HipHop does, however, is not only compile the code base to C++ but also process the code in several stages - to use as specific a data type as possible, for instance. Facebook engineers say they see a 30-50% performance improvement over PHP that is already boosted by APC. That is indeed a huge deal given the number of application servers they use.
On the other hand, much of the attention it has gotten is almost the same as what APC got. It's seen as a general-purpose performance booster. But as with APC, the results will be disappointing for most people, because on most websites the bulk of the application time is not spent in the PHP code.
Facebook is indeed special compared to most websites. For instance, they generally do no joins of data at the database level. That results in a lot more data in the application as well as more basic application logic.
There are several reasons why they often choose to exclude joins entirely. Among others, joins drain performance on the database servers, which are generally harder to scale. They are also very hard to do when you need to query a whole lot of servers (which can also be sharded by different factors) for each and every type of data you want to join.
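To make the idea of joining in the application concrete, here is a minimal sketch in Python. It is not Facebook's code; the shard layout, the `shard_for` rule and all names are hypothetical, and plain dicts stand in for separate database servers. It shows how the application fetches from each shard and stitches the rows together itself, which is exactly the kind of basic looping logic that ends up in the application layer.

```python
from collections import defaultdict

# Hypothetical shards: plain dicts standing in for separate database servers.
USER_SHARDS = [
    {2: {"name": "Bob"}, 4: {"name": "Dana"}},     # users with even ids
    {1: {"name": "Alice"}, 3: {"name": "Carol"}},  # users with odd ids
]

# Hypothetical photo rows, stored elsewhere and sharded by another key.
PHOTOS = [
    {"photo_id": 10, "owner_id": 1},
    {"photo_id": 11, "owner_id": 2},
    {"photo_id": 12, "owner_id": 1},
]


def shard_for(user_id):
    """Assumed sharding rule: user id modulo the number of shards."""
    return user_id % len(USER_SHARDS)


def fetch_users(user_ids):
    """Group ids by shard, then do one batched lookup per shard."""
    ids_by_shard = defaultdict(list)
    for user_id in set(user_ids):
        ids_by_shard[shard_for(user_id)].append(user_id)

    users = {}
    for shard_index, ids in ids_by_shard.items():
        shard = USER_SHARDS[shard_index]
        for user_id in ids:
            if user_id in shard:
                users[user_id] = shard[user_id]
    return users


def photos_with_owner_names(photos):
    """The 'join': done in application code instead of in SQL."""
    users = fetch_users(photo["owner_id"] for photo in photos)
    joined = []
    for photo in photos:
        owner = users.get(photo["owner_id"])
        joined.append({
            "photo_id": photo["photo_id"],
            "owner_name": owner["name"] if owner else None,
        })
    return joined


if __name__ == "__main__":
    for row in photos_with_owner_names(PHOTOS):
        print(row)
```

The database servers only ever answer simple key lookups here, while the loops and dictionary handling run on the application servers, which are the easier tier to scale out.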
HipHop is made for the giants by a giant: the huge sites with a lot of traffic and large amounts of data to query. Smaller sites will benefit much less from its performance boost, as most of their time is spent in databases, caches, reading from disk etc. The results will also vary greatly depending on how much of the website's code base is mundane (basic constructs like loops, processing with non-dynamic variables etc). Basically, everything that can be rewritten with static functions and variables is a welcome target for the HipHop optimizations.
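A rough back-of-the-envelope sketch of that argument, in Python. The fractions are made-up illustrations, not measurements, and I assume here that the quoted improvement means the time spent executing PHP shrinks by that share while everything else (database, cache, disk) stays the same.

```python
def overall_time_saved(php_fraction, php_time_reduction):
    """Share of total request time saved when only the PHP part gets faster.

    php_fraction: share of request time spent executing PHP (0..1).
    php_time_reduction: share of that PHP time the compiler removes (0..1).
    """
    if not 0.0 <= php_fraction <= 1.0:
        raise ValueError("php_fraction must be between 0 and 1")
    if not 0.0 <= php_time_reduction <= 1.0:
        raise ValueError("php_time_reduction must be between 0 and 1")
    return php_fraction * php_time_reduction


if __name__ == "__main__":
    # Hypothetical sites: one dominated by database waits, one by PHP logic.
    examples = [
        ("database-bound site", 0.2),
        ("PHP-heavy site", 0.8),
    ]
    for label, php_fraction in examples:
        saved = overall_time_saved(php_fraction, 0.4)
        print(f"{label}: {saved:.0%} of total request time saved")
```

With these assumed numbers, the same compiler gain is worth four times as much to the site that spends most of its time in PHP.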
Even with some 10 commodity application servers, I would say that HipHop gives too small a performance boost to justify the time that needs to be spent to learn, test and maintain the framework. For Facebook though, I can certainly see how it's very welcome even with several man-years of development costs, as they have thousands (?) of application servers and have good reasons not to drop PHP in most of the heavy parts of the site.