Saturday, October 5, 2013

Why NFS for code sharing is a bad idea on production machines

One of our customers was facing slowness on their website at peak loads. The architecture looked something like this:

1) LAMP stack.
2) A single NFS server hosting all the static content as well as PHP code.
3) 15 application servers hosted behind a load balancer and the NFS server mounted on all of these.

When we started debugging we found that the CPU load on the app server was never high even at peak loads. But the CPU load on NFS server would be very high at those times.

So we suspected NFS to be an issue but were not very sure because we were using APC with PHP and apc.stat was 0 which means that if the opcode cache is present in the APC, Apache would not do a look up in the file system for that file. If the above is true then once the APC opcode cache is warmed up (at peak loads it should be), then why are we seeing slowness in the site and high CPU load on NFS server at peak loads.

We used a linux utility called strace to trace all the system calls that were being made by  apache processes. We attached strace to one of the apache processes and found that it was doing hell lot of stat and lstat which are Linux system calls to find out if a file has changed or not. Which means that even after making apc.stat =0 the system was still doing lookup for PHP files. Strange.

Turns out it has been clearly mentioned in the APC documentation that APC does a look up for files irrespective of stat status if the file has been included with a relative path (and not absolute path). Most of the includes in the code were relative, which means apc.stat=0 did not help us :( .

Even if the look ups are happening isn't NFS supposed to cache the files at each client ?

Turns out NFS does not cache the file rather caches just the file meta-data (that too for 3 secs by default, which can be changed). The reason for caching the meta-data (called file attribute cache) is performance so that client does not need to make frequent network calls to just do a stat or meta-data lookup. The reason this cache has finite time period is to avoid staleness which could have disastrous effects in a shared environment. There are ways of caching the files too using fscache but is not recommended in a dynamic environment.

So the lessons we learnt are these:

1) Never share code on NFS , never ever, ever. 
2) Use NFS for just sharing the static content.
3) Never ever write to a file shared over NFS. For eg many applications have debug logs. If this log is shared then you can imagine how many network calls need to be made to write logs in the request scope.

After doing the above the response time of the application at peak loads reduced by 10X and down time became history. We were able to run the site with half the number of app servers.

The problem with shared code is that, the load eventually  goes down to the NFS and the app servers just act as dumb terminals. At peak loads you cannot add more servers as it would slow down the environment even more.  Its like putting another straw in a coke bottle which already had 10 straws drawing form it.

Friday, October 4, 2013

AWS JAVA client examples for ELB and RDS metrics (Asynchronous)

When I started working on the AWS JAVA client for fetching the ELB and RDS metrics, I could not find many examples on the internet. So here I am putting down some sample code snippets.

So first you need to create a Client object


AWSCredentials credentials = new BasicAWSCredentials(AWS_ACCESS_KEY,AWS_SECRET_KEY);
AmazonCloudWatchAsyncClient client = new AmazonCloudWatchAsyncClient(credentials);

ELB:

Now creating an init object . Note that we are getting an aggregation of 5 mins. Which means that we are asking for one data point every five minutes and since time range is 5 mins, we would get exactly one data point every time we invoke this method

private static GetMetricStatisticsRequest initializeRequestObject(AmazonCloudWatchAsyncClient client){ 
        GetMetricStatisticsRequest request   = new GetMetricStatisticsRequest();
         
        request.setPeriod(60*5); // 5 minutes 
         
        request.setNamespace("AWS/ELB"); 
         
        List<Dimension> dims      = new ArrayList<Dimension>(); 
        Dimension dim       = new Dimension(); 
        dim.setName("LoadBalancerName");  
        dim.setValue("NAME");  
        dims.add(dim); 
         
        Date end = new Date(); 
        request.setEndTime(end); 

        // Back up 5 minutes 
        Date beg = new Date(end.getTime() - 5*60*1000); 
        request.setStartTime(beg); 
        request.setDimensions(dims); 
        return request; 
    } 


Now fetching data for individual metrics:

ELB 4XX:


public static void get5MinELB4XX(AmazonCloudWatchAsyncClient client){ 
        client.setEndpoint("monitoring.ap-southeast-1.amazonaws.com"); // endpoints are listed at http://docs.aws.amazon.com/general/latest/gr/rande.html
        GetMetricStatisticsRequest request = initializeRequestObject(client); 
         
        request.setMetricName("HTTPCode_ELB_4XX"); 
        request.setUnit(StandardUnit.Count); 
         
        List<String> stats  = new ArrayList<String>(); 
        stats.add("Sum"); 
        request.setStatistics(stats); 
         
        client.getMetricStatisticsAsync(request, new AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() { 
             
            @Override 
            public void onSuccess(GetMetricStatisticsRequest arg0,
                    GetMetricStatisticsResult arg1) { 
                List<Datapoint> data = arg1.getDatapoints(); 
                // Do something with this data here 
            } 
             
            @Override 
            public void onError(Exception arg0) {
                // log an error 
            } 
        }); 
        return; 
         
    }

ELB Latency:


public static void get5MinLatency(AmazonCloudWatchAsyncClient client){ 
        client.setEndpoint("monitoring.ap-southeast-1.amazonaws.com"); // endpoints are listed at http://docs.aws.amazon.com/general/latest/gr/rande.html
        GetMetricStatisticsRequest request = initializeRequestObject(client); 
         
        request.setMetricName("Latency"); 
        request.setUnit(StandardUnit.Seconds); 
         
        List<String> stats     = new ArrayList<String>(); 
        stats.add("Average"); 
        stats.add("Maximum"); 
        stats.add("Minimum"); 
        request.setStatistics(stats); 
         
        client.getMetricStatisticsAsync(request, new AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() { 
             
            @Override 
            public void onSuccess(GetMetricStatisticsRequest arg0,
                    GetMetricStatisticsResult arg1) { 
                List<Datapoint> data = arg1.getDatapoints(); 
                // Do something with this data here  
            } 
             
            @Override 
            public void onError(Exception arg0) {
                 // log an error  
            } 
        }); 
        return; 
    }

ELB Request Count:

public static void get5MinRequestCount(AmazonCloudWatchAsyncClient client){ 
        client.setEndpoint("monitoring.ap-southeast-1.amazonaws.com"); // endpoints are listed at http://docs.aws.amazon.com/general/latest/gr/rande.html
        GetMetricStatisticsRequest request = initializeRequestObject(client); 
         
        request.setMetricName("RequestCount"); 
        request.setUnit(StandardUnit.Count); 
         
        List<String> stats = new ArrayList<String>(); 
        stats.add("Sum"); 
        request.setStatistics(stats); 
         
        client.getMetricStatisticsAsync(request, new AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() { 
             
            @Override 
            public void onSuccess(GetMetricStatisticsRequest arg0,
                    GetMetricStatisticsResult arg1) { 
                List<Datapoint> data = arg1.getDatapoints(); 
                // Do something with this data here  
                 
            } 
             
            @Override 
            public void onError(Exception arg0) {
                // log an error  
            } 
        }); 
        return; 
    }

RDS:

CPU Utilization:


public static void get5MinCPUUtilization(AmazonCloudWatchAsyncClient client){ 
        client.setEndpoint("monitoring.ap-southeast-1.amazonaws.com"); // endpoints are listed at http://docs.aws.amazon.com/general/latest/gr/rande.html
        GetMetricStatisticsRequest request = initializeRequestObject(client); 
         
        request.setMetricName("CPUUtilization"); 
        request.setUnit(StandardUnit.Percent); 
         
        List<String> stats     = new ArrayList<String>(); 
        stats.add("Average"); 
        stats.add("Maximum"); 
        stats.add("Minimum"); 
        request.setStatistics(stats); 
         
        client.getMetricStatisticsAsync(request, new AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() { 
             
            @Override 
            public void onSuccess(GetMetricStatisticsRequest arg0,
                    GetMetricStatisticsResult arg1) { 
                List<Datapoint> data = arg1.getDatapoints(); 
                // Do something with this data here  
            } 
             
            @Override 
            public void onError(Exception arg0) {
                // log an error 
            } 
        }); 
        return; 
    }

Similarly for other metrics of RDS you just need to chose the correct metric name and its unit.

Thursday, October 3, 2013

Building Real Time Analytics Engine (counters) with Redis

We were assigned a task of building a REAL TIME analytics API . The data to be analyzed were simple counters for multiple events that happen across the website. For eg how many times a particular listing appeared in the search results, how many times the listing page was viewed, how many times Click to Reveal was pressed, etc.

The problem is simple when you have a finite set of listing. But in our case we had around a million listings and each listing had around 5 events and we had to keep data for each listing for an year.

The first approach would be to use a NoSQL database like Mongodb. To be honest, mongodb for just maintaining the counters seemed like an over kill. Other reasons for not using Mongodb were:


  1. The write throughput would be a cause for concern given the collection level lock in Mongodb.
  2. No support for transactions. It becomes very import in case of an analytics API, as we cannot afford to update one counter and then fail to update the other.
  3. No support for atomic operation on a document. So there was no way to synchronize the counter increments or decrements.
  4. No native support for incrementing/decrementing counters.
So we decided to look for something else and came across Redis (kidding, I always wanted to use Redis, was just looking for an excuse :) ).

The features that tipped the scale in Redis' favor were.

  1. Its crazy fast as the whole db is in memory.
  2. Transaction are supported.
  3. Native support for atomic operations.
  4. Native support for incrementing/decrementing counters.
Two very nice libraries exist for redis analytics namely bitmapist and Minuteman . These make use of Redis bit operations which are really awesome and helps us fit humongous data into a few MBs of memory. But these did not work for us as they work for non repeating data (i.e an event for a user is captured only once). Its binary in nature which does not fit with our counters requirement.

So we implemented our own solution and here are some of my learning from the same.

1) Design for failure: You cannot afford to have an analytics API down. So have a file backed fallback in case of datastore (Redis) being down. Have multiple app servers behind a load balancer which helps in making hot releases without bringing the site down. Write a script which replays these logs when your datastore is up.

2) All non counter data is written to a file. We did not know what to do with this additional data, so just persisted it in a file and pushed it periodically to AWS S3.

3)  Enable AOF for Redis with a flush frequency of 1 sec. This is a must if you dont want to lose data on reboot of Redis :). The AOF file should be periodically pushed to S3 for higher durability.

4) Have RDB snaphots atleats twice a day and push those to S3.

5) When the data size becomes few GBs Redis takes a few minutes (yes minutes) to start as it runs the whole AOF file on startup. The AOF file is nothing but a list of all the DMLs that have happened and it does so serially. Redis would throw an exception until this AOF has been replayed completely. This is the reason why we need a file backed fallback when something like this happens.

6) Use Redis Hashes as they are superbly optimized for space (with high time complexity but this trade off doesnt hurt much).

7) Redis is single threaded. So adding more CPUs is of no use. Throw as much memory as you can at it.

8) No redis library supports transactions (they cannot actually) when you run Redis cluster as its not possible to have a transaction across multiple datastores. Redis cluster works the same way as memcache (i.e data is sharded on key hashes).

9) Needless to say use the smallest possible key:value pairs as even a single "_" in 1M keys adds a couple of hundred MBs to memory footprint.

The request flow of our application is something like this:



Hope this helps someone to make a decision about using Redis for analytics.