Clarence Eldefors blog Mostly about the web and technology

8 Sep 2013

Transform a dying LAMP stack to a powerhouse

There are many LAMP stacks around, from a few servers up to a dozen, that have serious trouble with performance, troubleshooting and manageability.

To give a clear example, think of a setup with 10 Apache servers serving static files and PHP, a MySQL database server and an HAProxy load balancer in front. Assume that there are 5 sites with different domains load balanced across the servers. Some common problems in a setup like this, and possible solutions, are:

Problem

Apache performance is mediocre at best.

Solution
Most probably 2-4 of the apache servers can be removed/reused for something else by using a well configured lightweight web server. The two most common choices are nginx and lighttpd, in that order. In my opinion they are both stable and pretty equivalent in performance - but I do like the flexibility in nginx configuration a bit better.

Problem

With no good cache system in use, each and every PHP request comes with a big overhead.

Solution
Changing the plain load balancer into a caching one can come with great performance gains. Varnish is now probably the first choice for most people and is both very stable and performs well. By setting an expiration time from your backend (web) servers you can control exactly how long each file should be cached. Put some time and effort into achieving high hit rates; it is probably the most important change you can make.
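As a minimal sketch of setting the expiration time from the backend, a PHP page can send a Cache-Control header that a shared cache such as Varnish uses to work out how long to keep the response. The helper name cacheControlHeader() and the 300 second lifetime are made up for illustration, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: builds a Cache-Control header line for a shared cache.
// s-maxage applies to shared caches (such as Varnish) only, not to browsers.
function cacheControlHeader(int $ttlSeconds): string
{
    if ($ttlSeconds <= 0) {
        // A zero lifetime tells caches not to reuse the response.
        return 'Cache-Control: private, max-age=0';
    }
    return sprintf('Cache-Control: public, s-maxage=%d', $ttlSeconds);
}

// In a page script this has to run before any output is sent.
if (PHP_SAPI !== 'cli') {
    header(cacheControlHeader(300)); // assumed lifetime: 5 minutes
}
```

Keep in mind that responses setting cookies are normally not cached by Varnish's default configuration, so the header alone may not be enough for every page.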

Problem

No good way to identify bottlenecks or performance troubles in the PHP code. With many recent releases over several sites in the web cluster it's very hard to know the reason for overloaded servers.

Solution
Having a live profiler for PHP on your web servers is a great win in this situation. If you want something free there is XHProf, which was open sourced by Facebook. Unfortunately they did not release the web interfaces that made it useful in production. There are however two open source solutions for that: Xhgui, which logs into a MongoDB database, and XhProf Analytics, which logs into MySQL. Just be careful about where and how you use it, as it quickly fills up your disk with profiler results ;) A good setup could be to log only a percentage of the requests and then to purge old data (with an index for MongoDB and a cronjob for MySQL).
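Logging only a percentage of the requests could look roughly like the sketch below. The function shouldProfile() and the 1% rate are assumptions for illustration; xhprof_enable() and xhprof_disable() come from the XHProf extension and are only called when it is loaded. Storing the result is left out, since that depends on the interface you pick. The nullable type hint assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: decides whether this request should be profiled.
// $samplePercent is 0..100; $roll (0..9999) can be injected for testing.
function shouldProfile(float $samplePercent, ?int $roll = null): bool
{
    $roll = $roll ?? mt_rand(0, 9999);
    return $roll < (int) round($samplePercent * 100);
}

// Profile roughly 1% of the requests (assumed rate), if XHProf is installed.
if (function_exists('xhprof_enable') && shouldProfile(1.0)) {
    xhprof_enable();
    register_shutdown_function(function () {
        $profile = xhprof_disable();
        // Hand $profile over to your storage of choice here.
    });
}
```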

If you don't mind paying a bit there is also a hosted solution called New Relic with a nice intuitive interface. They do more than just PHP as well.

Problem

No good way to get an overview of where errors happen. To read the PHP error logs you need to log into each of the machines.

Solution
Use a centralized logging service.

If you want to roll your own there's a very good solution in Graylog2 together with Logstash for collection. Graylog2 provides a user interface with search, log streams and alerts and Logstash provides an easy to use client for parsing and forwarding log file messages to the Graylog2 server.

For hosted solutions the most mentioned service is loggly. Beware that it might become costly depending on your logging scale.

Problem

MySQL uses MyISAM and it keeps causing locks.

Solution
With row-level locks instead of table-level locks, InnoDB is pretty much a must when you get a lot of writes to some heavily hit tables. It will also in general be faster for an average setup thanks to its buffer pool caching of data, not to mention its clustering of rows by primary key. Be sure that you have a large enough buffer pool (and sufficient memory for it) and it will do miracles for your database reliability.

Other things to consider

Are you using the right database system?
There are many cases where a lot of your database load can be transferred into a more lightweight system. A common situation is that many simple data structures can be moved from a somewhat heavy RDBMS (say MySQL) to a more lightweight NoSQL database such as Redis. This can increase the performance significantly if used in hotspots with simple data structures.

Revisit your database load
Database load is a very common bottleneck and is most often the hardest part of your application to scale. By periodically checking up on what is causing the load you will usually find many easy optimizations. pt-query-digest is an excellent tool for profiling your database load and finding the heavy queries.

Caching efficiency
Measure your performance. See what causes the most load on the servers with XHProf/New Relic, the Varnish CLI tools, pt-query-digest etc. You will surely find things within the top requests and queries that can be cached more efficiently.

This list is far from complete but I think and hope I managed to get the most common and important points down.

25 Oct 2012

PHP: Structure, cooperation and good practices – part 1

PHP land is very often a messy place without standards and structure. Well, it does not have to be that way any more. Over the years a lot of tools, standards and practices have emerged to make it a more structured place - for the good of all of us PHP developers.

Use a standardized autoloading: PSR-0

The benefit of standardized autoloading is, in one word, interoperability. It makes it easier to add source code and libraries from different sources to your project, as the autoloader in your code and in the 3rd party code will be compatible.

An overview of PSR-0 with links to follow for sample code and information is available at: https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-0.md
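To make the mapping concrete, here is a minimal sketch of how PSR-0 turns a class name into a file path: namespace separators become directory separators, and underscores in the class name itself (but not in the namespace) do too. The function name psr0Path() and the src/ base directory are assumptions for illustration, and the type hints assume PHP 7.1 or later; the standard's own sample implementation is behind the link above.

```php
<?php
// Hypothetical helper: maps a fully qualified class name to a PSR-0 relative path.
function psr0Path(string $class): string
{
    $class = ltrim($class, '\\');
    $path = '';

    // Split off the namespace, whose separators map to directories.
    $lastNsPos = strrpos($class, '\\');
    if ($lastNsPos !== false) {
        $namespace = substr($class, 0, $lastNsPos);
        $class = substr($class, $lastNsPos + 1);
        $path = str_replace('\\', '/', $namespace) . '/';
    }

    // Underscores are only converted in the class name part.
    return $path . str_replace('_', '/', $class) . '.php';
}

// Register an autoloader rooted in an assumed src/ directory.
spl_autoload_register(function (string $class): void {
    $file = __DIR__ . '/src/' . psr0Path($class);
    if (is_file($file)) {
        require $file;
    }
});
```

With Composer you would normally not write this yourself, since it generates an autoloader for you.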

Handle libraries with composer

Composer is a great tool for managing libraries. The package information is so flexible it's easy to use it with most libraries already around and completely without fuss to include libraries that added a small and simple composer.json description file to their repository.

If you start up a new project and want to include Monolog and Twig into your application; you just write the requirements up in your local composer.json file to include them like this:

{
    "require": {
        "monolog/monolog": "1.2.*",
        "twig/twig": "v1.9.2"
    }
}

The value for each package is, as you can guess, a version constraint that sets which versions you are OK with. When other packages also need a specific version of the same library, this is very useful for checking that both your project and the other package are compatible with the same dependency version.

To install the packages you first download the composer PHAR file and then you just run "composer.phar install".

The package information comes from the public package repository at http://packagist.org. It is also easy to set up your own repository with Satis (Packagist is also open source, but a bit overweight for simple in-house repositories) and there's support for PEAR as a repo as well. If you want to add zip files or VCS repositories without a composer.json, that's also easily manageable with the package repository type in your own composer.json (see http://getcomposer.org/doc/05-repositories.md#package-2).

To get started, head over to http://getcomposer.org/

Use a coding standard

When working together with other people or companies it's important that everyone writes code the same way. The variable and function names should have the same casing, the spacing and indentation should be the same, opening braces should be at the same place.

The biggest reason is not that one way is the best to write it but that it should be easy to write code and easy to read code. If you learn to read and write code in one way, and have to change every so often during the same workday - your productivity will diminish.

There is a horde of standards out there: PSR-2, Zend, VG etc.

My suggestion is to follow the PSR-2 standard as that has been formed by a wider group of people and companies and is therefore easier to adapt into different types of projects.

24 Oct 2012

PHP, small tips and tricks #1

The purpose of this post series is to give a few tips on how to improve your code. These are not the most important practices, but common things that are often done in a less than elegant way.

For adding days or weeks to a timestamp, don't count your seconds.

DON'T:

$nextDay = $todayTimestamp + 86400;

DO:

$nextDay = strtotime('+1 day', $todayTimestamp);

The first one will fail when you move into or out of daylight saving time. An extra pro is that the second example is a hell of a lot more readable, especially if you add something like 2 weeks and 4 days ('+2 weeks 4 days').
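A small sketch of the difference, assuming the Europe/Stockholm timezone, where the clocks moved forward on 31 March 2013. The date and the variable names $bySeconds and $byStrtotime are picked just for the demo.

```php
<?php
// Assumption for the demo: a European timezone where DST started 31 Mar 2013 at 02:00.
date_default_timezone_set('Europe/Stockholm');

// Midnight on the day the clocks move forward (hour, minute, second, month, day, year).
$todayTimestamp = mktime(0, 0, 0, 3, 31, 2013);

$bySeconds   = $todayTimestamp + 86400;              // always exactly 24 hours later
$byStrtotime = strtotime('+1 day', $todayTimestamp); // same wall clock time the next day

echo date('Y-m-d H:i', $bySeconds), "\n";   // 2013-04-01 01:00
echo date('Y-m-d H:i', $byStrtotime), "\n"; // 2013-04-01 00:00
```

Since that day is only 23 hours long, counting seconds lands one hour into the next day.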

Validate and sanitize with filter_var()

Don't use regexp to filter/match an email or a URL. There are already built-in filters in PHP for it:

Example:

$email = filter_var('bob@example.com', FILTER_VALIDATE_EMAIL);

See documentation on:
filter_var()
Available filters

17 May 2012

Varnish command line tools

Varnish comes with several very useful command line tools that can be a bit hard to get a grasp of. The list below is by no means meant to be exhaustive, but to give an introduction to some tools and use cases.

varnishhist

Easily seen as a geeky graph with little information, varnishhist is actually extremely useful to get an overview of the overall status of your backend servers and varnish.


The pipes (|) are requests served from the cache whereas the hash signs (#) are requests to the backend. The X axis is a logarithmic scale for request times. So the histogram above shows that we have a good amount of cache hits that are served really fast whereas roughly half of the backend requests take a bit more than 0.1 s. Like most of the other command line applications you can filter out the data you need with regex, or show only backend requests or only cache hits. See https://www.varnish-cache.org/docs/trunk/reference/varnishhist.html for a complete list of parameters.

varnishstat

varnishstat can be used in two modes, equally useful. If run with only "varnishstat" you will get a continuously updated list of the counters that fit on your screen. If you want all the values for in-depth analysis you can however use "varnishstat -1" for the current counter values.

Now a couple of important figures from this image:

  • The very first row is the uptime of varnish. This instance had been restarted 1 day, 2 hours and 44 minutes before the screenshot.
  • The counters below that are the average hit rate of varnish. The first row is the timeframe and the second is the average hit rate. If varnishstat is kept open for longer the second timeframe will go up to 100 seconds and the third to 1000 seconds.
  • As for the list of variables below, the values correspond to the total value, followed by the current value and finally the average value. Some rows of obvious interest would be:
    • cache_hit/cache_miss to see if you have monumental miss storms.
    • The relationship between client_conn and client_req to see if connections are being reused. In this case there's only API traffic where very few connections are kept open, so the almost 1:1 ratio is to be seen as normal.
    • The relationship between s_hdrbytes and s_bodybytes is also rather interesting, as you can see how much of your bandwidth is actually being used by the headers. So if s_hdrbytes is a high percentage of your s_bodybytes you might want to consider whether all your headers are actually necessary and useful.
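The ratios mentioned above are easy to compute from the output of "varnishstat -1". The sketch below assumes that each line starts with the counter name followed by its total value, which is how I read the output; the helper names parseCounters() and share() and the sample numbers are made up for illustration, and newer Varnish versions may prefix the counter names.

```php
<?php
// Hypothetical helper: turns "name  total  ..." lines into a name => total map.
function parseCounters(string $output): array
{
    $counters = [];
    foreach (preg_split('/\R/', trim($output)) as $line) {
        if (preg_match('/^(\S+)\s+(\d+)/', $line, $m)) {
            $counters[$m[1]] = (int) $m[2];
        }
    }
    return $counters;
}

// Hypothetical helper: $part as a fraction of $part + $rest (0.0 when both are zero).
function share(int $part, int $rest): float
{
    $total = $part + $rest;
    return $total > 0 ? $part / $total : 0.0;
}

// Illustrative numbers only, not taken from a real server.
$sample = "cache_hit 900 0.50 Cache hits\n"
        . "cache_miss 100 0.05 Cache misses\n"
        . "s_hdrbytes 200 0.10 Total header bytes\n"
        . "s_bodybytes 800 0.40 Total body bytes\n";

$c = parseCounters($sample);
printf("Hit rate: %.1f%%\n", 100 * share($c['cache_hit'], $c['cache_miss']));
printf("Header share of traffic: %.1f%%\n", 100 * share($c['s_hdrbytes'], $c['s_bodybytes']));
```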

varnishtop

Varnishtop is a very handy tool to get filtered information about your traffic. Especially since a lot of high-traffic varnish sites do not have access logs on their backend servers, this can be of great use.

tx are always requests to backends, whereas rx are requests from clients to varnish. The examples below should clarify what I mean.

Some handy examples to work from:

See what requests are most common to the backend servers.
varnishtop -i txurl

See what useragents are the most common from the clients
varnishtop -i RxHeader -C -I ^User-Agent

See what user agents are commonly accessing the backend servers, compare to the previous one to find clients that are commonly causing misses.
varnishtop -i TxHeader -C -I ^User-Agent

See what cookie values are the most commonly sent to varnish.
varnishtop -i RxHeader -I Cookie

See what hosts are being accessed through varnish. Will of course only give you useful information if there are several hosts behind your varnish instance.
varnishtop -i RxHeader -I '^Host:'

See what accept-charsets are used by clients
varnishtop -i RxHeader -I '^Accept-Charset'

varnishlog

varnishlog is yet another powerful tool to log the requests you want to analyze. It's also very useful without parameters when developing your VCL, to see the exact results of your changes in all their verbosity. See https://www.varnish-cache.org/docs/trunk/tutorial/logging.html for the manual and a few examples. You will find it very similar to varnishtop in its syntax.

One useful example for listing all details about requests resulting in a 500 status:
varnishlog -b -m "RxStatus:500"

varnishncsa

varnishncsa is a handy tool for producing Apache/NCSA formatted logs. This is very useful if you want to log the requests to varnish and analyze them with one of the many available log analyzers that read such logs, for instance AWStats.

23 Jul 2011

Send data to include files with dwoo

Sometimes you want to send some data, like a title or an id, to an included template with Dwoo. Even though it is described in the manual, I did not look there first because I mostly don't find what I'm looking for there. So in case anyone else has the same trouble, this will solve it:

{include(file='elements/google-like-button.tpl' url='http://www.eldefors.com')}

This will allow you to use $url in the included template just as if it was a local variable.

More features that one might miss or look too long for in the docs:
Scope: $__.var will always fetch the 'var' variable from the template root. This is useful for loops where the scope is changed.
Loop names: Adding name="loopname" lets you access its variables inside a nested construct by accessing $.foreach.loopname.var (replace foreach with your loop element).
Default view for empty foreach: By adding an {else} before your end tag you can output data for empty variables passed, without an extra element.

What is Dwoo, some might wonder. It is a template system with a syntax similar to Smarty. It has however been rewritten a lot and in my experience works great both performance-wise and feature-wise. It is very easy to extend, and by far the most indispensable feature to me is template inheritance. This is lacking in most PHP templating systems, but with Dwoo you can apply the same thinking as with normal class inheritance. Since many elements are the same on all your pages, and even more so within the same section, you can define blocks which you override (think of them as class methods). You can for instance create a general section template that inherits the base layout template and then let every section page template inherit this.

11 Feb 2011

Alternative Android markets

One good thing about the Android system is the ability to create your own application markets to cover functions unavailable on the default Android Market or to get an application market onto devices not approved by Google. Since I couldn't find any good lists of the markets available I browsed around my Android Market mailbox and searched the web a bit to come up with this list:

SlideMe (4/5)

Website: slideme.org
Number of apps: ~4,000
Payment methods: Paypal, credit and debit cards, amazon payments.
Activity: Good. Top app has more than 150k downloads. Very low compared to the default market but high for an alternative.
Extras: Developers can get payments to a SlideMe Mastercard. Downloadable native client called "SAM"!

Handster (2/5)

Website: handster.com
Number of apps: ~4,000 (>10,000 for all platforms)
Payment methods: Paypal, credit and debit cards.
Activity: Seems low. The public number of ratings shows 0-5 for the most popular apps.
Extras: All platforms at one place.
Comments: Offering locales where they seem to have made automatic translations, at best, makes it seem less reliable.

AndAppstore (1/5)

Website: andappstore.com
Number of apps: ~1,800
Payment methods: Only paypal.
Activity: Fairly low. Around 60 comments posted for their native client.
Extras: Native android client.
Comments: Very basic site. Comments without ratings seem to lower the overall rating. Complaints about non-delivered apps in their client comments. Seemed promising but in the end didn't inspire any trust.

OpenMarket (-/5)

Comments: Specialized for HTC phones and South Africa, which makes it very narrow. Didn't catch more of my interest with its web 2.0 catchphrases and unbrowsable categories.

Amazon appstore (2/5)

Website: amazon.com (?)
Activity: Unknown
Comments: Yearly fee for developers of $99 which is 3 times what you pay for your google market account. First year waived. Needed to sell apps on amazon.com. I still don't see the need or what is so special with this market - unless they are coming with their own device.

PocketGear (2/5)

Website: pocketgear.com
Number of apps: ~4,500 in android section
Payment methods: Paypal, credit and debit cards.
Activity: Ok at best. 7,000 downloads of angry birds compared to millions on default client.
Extras: All platforms.

pdassi (3/5)

Website: android.pdassi.com
Number of apps: ~4,500 in android section
Payment methods: Paypal, credit and debit cards.
Activity: ~13,000 downloads on most popular android application. Other sections much more well used it seems.
Extras: Native android client. All platforms. Good localization for german, dutch and italian.

Comments? Did I miss any? Do you know any devices that ship with an alternative android market?

28 Aug 2010

Free as in promotional trick

MPEG LA delivers the widespread news that H.264 will be royalty free permanently.

This is merely a trick to promote its adoption in HTML 5 and its popularity in the mainstream. Nothing really changes with this statement. H.264 did not become more free in any important aspect.

Firstly it does not change anything for 4 years. The previous license was valid until 2014. We all know that is a long time given current tech progress.

Secondly, only free video broadcasting is included. Should you decide on any alternative delivery method, want to actually create videos, or do anything else, it is not included.

This only makes it free to actually transfer the bits that you already have. A limitation that I would argue should not even be allowed to exist. And even if you still feel safe - MPEG LA can change this license at any time.

28 Aug 2010

Microsoft does not love open source

Microsoft now claims to love open source:

http://www.networkworld.com/news/2010/082310-microsoft-open-source.html

There is a very telling quote from the interview about Ballmer describing Linux as a cancer:

The mistake of equating all open source technology with Linux was "really very early on," Paoli says. "That was really a long time ago," he says. "We understand our mistake."

Reading the original quote it is clear that they still dislike Open Source:

The way the license is written, if you use any open-source software, you have to make the rest of your software open source. If the government wants to put something in the public domain, it should. Linux is not in the public domain. Linux is a cancer that attaches itself in an intellectual property sense to everything it touches. That's the way that the license works.

What they have and had problems with is actually the GPL licenses and how they attribute copyright. This limits Microsoft's ability to make use of open source efforts in their proprietary software.

What Microsoft should be saying is that they support open source only when they can make use of it in their world of closed source. This is not working WITH open source, it is working AGAINST open source.

22 Jul 2010

Android versions in the wild

Google recently released figures on the version numbers of Android phones accessing the Android Market. It just validates what almost all developers already know. All 1.5+ versions are important to support and different screens are just as important.

Users are very fast to let developers know this, though. Already 3 months before the OTA update of Android 2.2 for my Nexus One I got a message from a user about not being able to find an application in the 2.2 market. The error is common: a default manifest file that has limits on API versions. I couldn't have tested the application on a higher API level at development time, so the absence in the untested OS was not really an error. However it shows how easy it is to get feedback from users to correct such errors.

I have also gotten feedback on enabling the 2.2 install device (SD card or phone memory) as well as automatically generated crash reports that are also new in android 2.2. Adding to that it is for an application exclusively for Sweden where no 2.2 device has even been sold as of yet (but of course imported in different shady ways).

The chart below shows how important it is to keep backwards compatibility in your applications, just like for web development and old browsers. Luckily the Android emulators are very good and there's a fully functioning image for each and every API level. There are also other developer emulator images circulating where you can test the functions missing in the default images, such as a paid-app enabled market.

Historic development of Android versions
Source: http://developer.android.com/resources/dashboard/platform-versions.html

22 Jul 2010

Huge server architectures

I enjoy reading about the architecture of some of the bigger internet-related systems around, be it about database sharding, Hadoop usage, choice of languages or development methods. I will continuously try to post some numbers from these adventures. Here is a start, together with links that can be followed for more details.

Facebook

From Data Center Knowledge quoting a talk at Structure 2010 by Facebook’s Jonathan Heiliger.
- 400 million users.
- 16 billion minutes spent on Facebook each day.
- 3 billion new photos per month
- More than a million photos viewed per second.

- Probably over 60,000 servers to date (Data Center Knowledge)
- Thousands of memcache servers. Most likely the biggest memcache cluster (Pingdom).

Akamai

- 65,000 servers spanning 70 countries and 1000 networks.
- Hundreds of billions of "Internet interactions" per day
- Traffic peaks at 2 Terabits per second
Source: Akamai

Google

- Server number estimated at 450,000 in 2006 (High Scalability)
- Server number estimated at 1 million by Gartner (Pandia)