Saturday, October 5, 2013

Why NFS for code sharing is a bad idea on production machines

One of our customers was facing slowness on their website at peak loads. The architecture looked something like this:

1) LAMP stack.
2) A single NFS server hosting all the static content as well as PHP code.
3) 15 application servers behind a load balancer, with the NFS export mounted on all of them.

When we started debugging, we found that the CPU load on the app servers was never high, even at peak loads. But the CPU load on the NFS server would be very high at those times.

So we suspected NFS was the issue, but we were not sure: we were using APC with PHP and apc.stat was set to 0, which means that if the opcode cache for a file is present in APC, Apache should not look that file up on the file system at all. If that holds, then once the APC opcode cache is warmed up (and at peak loads it should be), why were we seeing slowness on the site and high CPU load on the NFS server?
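For reference, this is the setting in question, as a php.ini fragment (a minimal sketch; the rest of the APC configuration is omitted):

```ini
; php.ini -- APC opcode cache, with per-request file stat checks disabled
apc.enabled=1
apc.stat=0
```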

We used a Linux utility called strace to trace all the system calls being made by the Apache processes. We attached strace to one of them and found it was making a huge number of stat and lstat calls, which are the Linux system calls used to check whether a file has changed. In other words, even after setting apc.stat=0 the system was still doing file-system lookups for PHP files. Strange.
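Output like that can be summarized to see which files are being stat'ed most often. A minimal sketch (the trace lines below are canned stand-ins for real output captured with something like `strace -f -p <apache_pid> -e trace=stat,lstat -o apache.trace`; the file paths are assumptions):

```shell
# Canned sample of strace output lines (stand-ins for a real capture).
cat > apache.trace <<'EOF'
stat("/var/www/lib/db.php", {st_mode=S_IFREG|0644, ...}) = 0
lstat("/var/www/lib/db.php", {st_mode=S_IFREG|0644, ...}) = 0
stat("/var/www/index.php", {st_mode=S_IFREG|0644, ...}) = 0
EOF

# Count stat/lstat calls per path, most-stat'ed file first.
awk -F'"' '/^l?stat\(/ { count[$2]++ }
           END { for (f in count) print count[f], f }' apache.trace | sort -rn
# prints: 2 /var/www/lib/db.php
#         1 /var/www/index.php
```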

It turns out this is clearly mentioned in the APC documentation: APC looks a file up regardless of the stat setting if that file was included with a relative path (rather than an absolute one). Most of the includes in the code were relative, which meant apc.stat=0 did not help us :( .
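Offending includes can be hunted down mechanically. A hedged sketch (the directory, file names, and grep pattern are assumptions; the usual fix is to build absolute paths, e.g. with `__DIR__`):

```shell
# Sample file to illustrate the two include styles (path is an assumption).
mkdir -p /tmp/appcode
cat > /tmp/appcode/page.php <<'EOF'
<?php
require_once 'lib/db.php';            // relative: stat'ed on every request
require_once __DIR__ . '/lib/db.php'; // absolute: served from opcode cache
EOF

# List include/require statements whose path does not start with '/'.
grep -rnE --include='*.php' \
  "(include|require)(_once)?[ (]+['\"][^/'\"]" /tmp/appcode
```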

But even if the lookups are happening, isn't NFS supposed to cache the files on each client?

It turns out NFS does not cache the files themselves; it caches only the file metadata (and only for 3 seconds by default, though this is tunable). The point of this metadata cache (called the file attribute cache) is performance: the client does not have to make a network call for every stat or metadata lookup. The cache is given a finite lifetime to avoid staleness, which could have disastrous effects in a shared environment. There are ways to cache file contents too, using FS-Cache, but that is not recommended in a dynamic environment.
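Since the attribute cache lifetime is tunable, mounts that serve only static content can afford to hold attributes longer. A hedged /etc/fstab sketch (the server name, export path, mount point, and the 60-second actimeo value are all assumptions, not our exact setup):

```
# /etc/fstab -- static content exported read-only, attribute cache held for 60s
nfsserver:/export/static  /var/www/static  nfs  ro,actimeo=60  0  0
```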

So the lessons we learnt are these:

1) Never share code over NFS. Never, ever.
2) Use NFS only for sharing static content.
3) Never write to a file shared over NFS. For example, many applications keep debug logs; if the log file lives on NFS, you can imagine how many network calls have to be made just to write log lines during a single request.
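Point 3 is easy to check before it bites. A small sketch (the log directory is an assumption) that warns when a log location sits on an NFS mount:

```shell
# Warn if the debug-log directory lives on NFS (every write = network call).
logdir=/var/log
fstype=$(df -PT "$logdir" | awk 'NR==2 {print $2}')
case "$fstype" in
  nfs*) echo "WARNING: $logdir is on NFS; every log write is a network call" ;;
  *)    echo "OK: $logdir is on a local filesystem ($fstype)" ;;
esac
```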

After doing the above, the response time of the application at peak load improved by 10x, and downtime became history. We were able to run the site with half the number of app servers.

The problem with shared code is that the load eventually lands on the NFS server, and the app servers act as little more than dumb terminals. At peak load you cannot add more servers, as that would slow the environment down even more. It's like putting another straw into a coke bottle that already has 10 straws drawing from it.
