Ramblings, a blog by Akash Bhunchal
How Gmail/Google does multiple user logins in the same browser (published 2016-03-08)<div dir="ltr" style="text-align: left;" trbidi="on">
Have you ever wondered how Gmail, or Google in general, lets you log in with two different accounts in the same browser (albeit in different tabs)? For example, a@gmail.com and b@gmail.com can both be logged in to the same browser at the same time.<br />
<br />
Why is this a big deal? Or, to rephrase: what stops any other website from doing the same and allowing multiple user logins in the same browser?<br />
<br />
The culprit is the cookie. For those who don't know how sessions work in web applications: once you visit or log in to a website, the server creates a session and assigns it a unique ID (one per user, in our case). This ID is written to a cookie, which the browser saves locally. Whenever the browser sends a request to the server, it also sends the cookies that the server set earlier, and this is how the server knows who is calling the service: a@gmail.com or b@gmail.com.<br />
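The session mechanism described above can be sketched in a few lines of Python. This is only a minimal illustration: the cookie name SESSIONID, the in-memory SESSIONS store and the login/identify helpers are assumptions made up for this example, not how any particular website does it.

```python
from http.cookies import SimpleCookie
import secrets

# Hypothetical in-memory session store: session ID -> logged-in user.
SESSIONS = {}

def login(user):
    """Create a server-side session and return the Set-Cookie header value."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = user
    cookie = SimpleCookie()
    cookie["SESSIONID"] = session_id
    return cookie["SESSIONID"].OutputString()

def identify(cookie_header):
    """Resolve a Cookie request header back to a user, or None if unknown."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("SESSIONID")
    return SESSIONS.get(morsel.value) if morsel else None

# The browser stores the cookie and echoes the name=value pair back later.
header = login("a@gmail.com")
print(identify(header))
```

Note that with a single cookie name on a single scope, a second login simply overwrites the first cookie in the browser, which is why two users cannot normally coexist.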
<br />
A cookie can be set at two levels (scope):<br />
<br />
<br />
<ul style="text-align: left;">
<li>Domain: every request this browser sends to the domain carries all the cookies that were set for that domain. For example: www.google.com</li>
<li>Path (or URL): a cookie can also be set at the URL level. This means we can set separate cookies for /url1 and /url2 on the same domain. When the browser sends a request for /url1, it does not send the cookies set for /url2.</li>
</ul>
Most websites use a combination of the two.<br />
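To make the path scope concrete, here is a small sketch of the path-match rule that browsers apply (as specified in RFC 6265, section 5.1.4). The jar structure and the cookie names are hypothetical, chosen only for this example.

```python
def path_matches(request_path, cookie_path):
    """Return True if a cookie scoped to cookie_path applies to request_path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end on a "/" boundary, so /url1 does not match /url10.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False

def cookies_for(jar, request_path):
    """Pick the cookie names a browser would send for request_path."""
    return [name for name, cookie_path in jar if path_matches(request_path, cookie_path)]

# Hypothetical cookie jar for one domain: (cookie name, Path attribute).
jar = [("cookie_for_url1", "/url1"), ("cookie_for_url2", "/url2")]
print(cookies_for(jar, "/url1/page"))
```

A cookie scoped to a path is sent for every URL underneath that path, and never for a sibling path.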
<br />
Gmail (or Google) uses the second approach to enable multiple login sessions in the same browser, and the reason it is able to do so is that it is a single-page app. The URL of Gmail remains the same irrespective of the page you are on. For example, the inbox is at <b>https://mail.google.com/mail/u/0/#inbox</b> and sent items are at <b>https://mail.google.com/mail/u/0/#sent</b>.<br />
<br />
Whatever comes after the # (the fragment) is never sent to the server and plays no part in cookie matching. So, effectively, no matter which page you are on, the browser always treats you as being on the same URL and sends only the cookies set for that URL.<br />
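You can check the fragment behaviour with Python's standard URL parser. The helper name request_target is made up for this sketch; it returns the part of the URL that actually goes out in the HTTP request line.

```python
from urllib.parse import urlsplit

def request_target(url):
    """Return the path (plus query, if any) that is sent to the server."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")

inbox = "https://mail.google.com/mail/u/0/#inbox"
sent = "https://mail.google.com/mail/u/0/#sent"

# Both pages share the same request path; only the client-side fragment differs.
print(request_target(inbox), urlsplit(inbox).fragment)
print(request_target(sent), urlsplit(sent).fragment)
```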
<br />
This is how Gmail works. If you are logged in as two users, the URLs are <b>https://mail.google.com/mail/u/0/</b> and <b>https://mail.google.com/mail/u/1/</b>. Google assigns a number (0, 1, 2, ...) to each user session and sets a separate cookie for each, and voila, you can now log in as 2, 3 or 4 different users in the same browser. To test this, just change the number manually in one of the tabs; you will see yourself logged in as the other user.<br />
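If you wanted to copy this scheme in your own application, a rough sketch could look like the following. It mirrors the description in this post rather than Google's actual implementation; the cookie name SESSIONID and both helper functions are assumptions for illustration.

```python
import re
from http.cookies import SimpleCookie

def session_cookie(index, session_id):
    """Build a Set-Cookie value scoped to one account slot, e.g. /mail/u/0/."""
    cookie = SimpleCookie()
    cookie["SESSIONID"] = session_id
    cookie["SESSIONID"]["path"] = f"/mail/u/{index}/"
    return cookie["SESSIONID"].OutputString()

def account_index(path):
    """Extract the account slot number from a request path, or None."""
    match = re.match(r"^/mail/u/(\d+)/", path)
    return int(match.group(1)) if match else None

print(session_cookie(0, "session-of-a"))
print(session_cookie(1, "session-of-b"))
print(account_index("/mail/u/1/"))
```

Because each cookie carries its own Path attribute, the browser keeps both and sends only the one whose path matches the tab's URL.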
<br />
This is possible only with a single-page application like Gmail, where the URL never changes. So if you want your application to support multiple logins, go single page!<br />
<br />
<br />
<br /></div>
AWS Autoscaling with F5 BIG-IP dynamic node add/remove (published 2015-03-07, updated 2015-03-12)<div dir="ltr" style="text-align: left;" trbidi="on">
In continuation of the F5 series (setup <a href="http://akashbhunchal.blogspot.in/2015/01/deploying-f5-big-ip-ha-activepassive.html" target="_blank">F5 BIG-IP HA on AWS EC2</a>, <a href="http://akashbhunchal.blogspot.in/2015/01/creating-virtual-server-on-f5-big-ip-ha.html" target="_blank">setup a Virtual Server</a>), I am going to share my experience with using AWS Autoscaling with F5 Big-IP.<br />
<br />
AWS EC2 has an awesome feature called AutoScaling which one can use to maintain a pool of servers at a desired size or scale the cluster up or down based on CPU utilization, network IO, etc. This gives us the ability to size our cluster according to the traffic resulting in optimal utilization of resources and cost saving.<br />
<br />
The tricky part is adding and removing nodes on F5 as instances come and go. Since F5 needs to know about nodes that are added or removed, we need to integrate AWS Autoscaling with F5.<br />
<br />
I used the F5 REST API to achieve this. AWS Autoscaling publishes events about its scaling activities through SNS. All we need to do is listen :)<br />
<br />
I have created a microservice that listens for these events and makes the corresponding changes on F5. I have published my implementation <a href="https://github.com/akashbhunchal/AWSAutoScalingWithF5" target="_blank">here</a>.<br />
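To give a feel for the listening part, here is a minimal sketch of how such a service could decode an Auto Scaling notification delivered by SNS. It is not the code from the linked repository. The plan_action helper and the returned dictionary shape are assumptions for this example, and the actual F5 REST calls are deliberately left out.

```python
import json

LAUNCH = "autoscaling:EC2_INSTANCE_LAUNCH"
TERMINATE = "autoscaling:EC2_INSTANCE_TERMINATE"

def plan_action(sns_body):
    """Turn an SNS HTTP payload into an add/remove decision, or None."""
    envelope = json.loads(sns_body)
    if envelope.get("Type") != "Notification":
        # e.g. SubscriptionConfirmation, which must be confirmed separately.
        return None
    # SNS wraps the Auto Scaling event as a JSON string in "Message".
    message = json.loads(envelope["Message"])
    event = message.get("Event")
    if event == LAUNCH:
        action = "add"
    elif event == TERMINATE:
        action = "remove"
    else:
        return None
    return {
        "action": action,
        "instance_id": message["EC2InstanceId"],
        "group": message["AutoScalingGroupName"],
    }

# Made-up sample payload, trimmed to the fields used above.
sample = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "Event": LAUNCH,
        "EC2InstanceId": "i-0123456789abcdef0",
        "AutoScalingGroupName": "web-asg",
    }),
})
print(plan_action(sample))
```

From the instance ID, the service would still need to look up the private IP of the instance before it can add or remove the node on F5.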
<br />
Please feel free to use it. A simple thank you would suffice if you find it useful and it saves you some time :)</div>
Creating a Virtual Server on F5 BIG-IP HA Active/Passive (Active/Standby) on AWS EC2 / VPC (published 2015-01-15, updated 2015-03-07)<div dir="ltr" style="text-align: left;" trbidi="on">
Please visit my previous post on how to set up an Active/Standby F5 BIG-IP on AWS <a href="http://akashbhunchal.blogspot.in/2015/01/deploying-f5-big-ip-ha-activepassive.html" target="_blank">here</a>. That post also covers some basic F5 terminology.<br />
<br />
Now it's time to get your load balancer up and running. You can run multiple load-balanced endpoints on a single F5. These are called Virtual Servers on F5.<br />
<br />
<b>1: Prerequisites:</b><br />
<b><br /></b>
<br />
<ol style="text-align: left;">
<li>Make sure that you have your backend servers that you want to load balance ready.</li>
<li>Make sure that the security groups have the required ports open both for F5 as well as backend server subnets.</li>
<li>Make sure that the services you want to load balance on these nodes are running :)</li>
</ol>
<br />
<b>2: Setup Nodes on F5:</b><br />
<br />
<ol style="text-align: left;">
<li>Go to Local Traffic > Nodes > Node List. Click <b>Create</b>.</li>
<li>Give the desired <b>Name</b> and <b>Address</b>.</li>
<li>Set <b>Health Monitors</b> to Node Default.</li>
<li>Click <b>Finished</b>.</li>
</ol>
<div>
If the node is reachable you should see the node status as a Blue Box.</div>
<div>
<br /></div>
<div>
<b>3: Create a Pool on F5:</b></div>
<div>
<ol style="text-align: left;">
<li>Go to Local Traffic > Pools > Pool List. Click <b>Create</b>.</li>
<li>Enter the desired <b>Name</b>.</li>
<li>Select one of the <b>Health Monitors</b>. This is for checking the health of the backend servers.</li>
<li>Select the <b>Load Balancing Method</b>.</li>
<li>Under <b>New Members</b>, select <b>Node List</b> and add the desired nodes to the pool.</li>
<li>Click <b>Finished</b>.</li>
<li>If the pool has been set up properly, it should have a green status.</li>
</ol>
<div>
<b>4: Creating the Virtual Server on F5:</b></div>
</div>
<div>
<ol style="text-align: left;">
<li>Add an additional private IP to the ENI that is on the <b>external</b> VLAN. This becomes your load balancer IP.</li>
<li>Go to Local Traffic > Virtual Servers > Virtual Server List. Click <b>Create</b>.</li>
<li>Enter the desired <b>Name</b> and <b>Type</b>.</li>
<li><b>Source Address</b> is the IP range this VS should accept traffic from. To allow access from all IPs, enter 0.0.0.0/0.</li>
<li><b>Destination Address</b> should be the new private IP that we just created.</li>
<li><b>Service Port</b> is the port your service is running on.</li>
<li><b>Source Address Translation</b> should be <b>Auto Map</b>. Please note that if you don't do this, your Virtual Server will not work and requests will never reach your backend servers.</li>
<li>From <b>Default Pool</b>, select the pool that we created above.</li>
<li>Select the other settings as desired. For this exercise, leave the other settings as they are.</li>
</ol>
<div>
If the VS has been set up properly, it should show green in the status.</div>
</div>
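To see how the three objects created above relate to each other (nodes go into a pool, and the pool sits behind a Virtual Server), here is a toy Python model. It is not F5 code and does not talk to the F5 API. The class names, the addresses and the simple round-robin selection are all assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import cycle

@dataclass(frozen=True)
class Node:
    name: str
    address: str

class Pool:
    """A named group of (node, port) members, picked in round-robin order."""
    def __init__(self, name, members):
        self.name = name
        self.members = list(members)
        self._rotation = cycle(self.members)

    def pick(self):
        return next(self._rotation)

class VirtualServer:
    """A listening IP:port that forwards each request to a pool member."""
    def __init__(self, name, destination, port, pool):
        self.name = name
        self.destination = destination
        self.port = port
        self.pool = pool

    def route(self):
        node, port = self.pool.pick()
        return f"{node.address}:{port}"

# Hypothetical backend addresses in the internal subnet.
web1 = Node("web1", "10.0.2.11")
web2 = Node("web2", "10.0.2.12")
pool = Pool("web_pool", [(web1, 8080), (web2, 8080)])
vs = VirtualServer("web_vs", "10.0.1.20", 80, pool)
print([vs.route() for _ in range(3)])
```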
<div>
<br /></div>
<div>
Hit the VS IP and voila! Your application is load balanced.<br />
<br />
To learn how to integrate AutoScaling with F5 go <a href="http://akashbhunchal.blogspot.in/2015/03/aws-autoscaling-with-f5-big-ip-dynamic.html" target="_blank">here</a> </div>
<div>
<br /></div>
</div>
Deploying F5 BIG-IP HA Active/Passive (Active/Standby) on AWS EC2 / VPC (published 2015-01-14, updated 2015-03-07)<div dir="ltr" style="text-align: left;" trbidi="on">
BIG-IP is a big name in the world of Application Delivery Platforms. It is used primarily as a load balancer and front end for hosting a number of applications. It is modular in nature and offers a variety of modules such as optimized content delivery and application firewall. The full set of features is listed <a href="https://f5.com/products/big-ip" target="_blank">here</a>.<br />
<br />
Until a few years ago, F5 was available only as a hardware box, which one had to buy and wire to switches and machines. F5 now has a cloud offering called BIG-IP VE (VE stands for Virtual Edition). One can now choose to either run the hardware or run the VE in the cloud.<br />
<br />
We had to set up F5 VE for one of our customers on AWS. Coming from a non-networking, non-physical-server background, we found it difficult to understand the F5 networking terminology and map it to AWS, where, as we all know, the network is completely abstracted.<br />
<br />
F5 provides a guide on how to host F5 on EC2, and it's pretty good. It's available <a href="http://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/ve-ec2-setup-11-3-0/2.html" target="_blank">here</a>. The sad part is that it assumes one understands F5 completely, so it is best for people who have hands-on experience running F5 hardware boxes. I followed it and was able to set up the F5, but hit some gotchas that I would like to share in this article. I am also going to cover the basics of F5 and how it works.<br />
<br />
Some terms that one should know:<br />
<br />
<b><span style="font-size: large;">VLAN (Virtual LAN):</span></b><br />
<br />
We all understand what a LAN is. A virtual LAN is used to divide a LAN into further subsections. For example, all the ports on a switch constitute a single broadcast domain, so if one machine sends out a broadcast message, it is placed on all the ports of the switch. This leads to a lot of unnecessary traffic.<br />
<br />
A switch is a layer 2 device and is not aware of the network layer, so all its ports are part of the same network. Suppose we have a very big network with 1000 machines connected via a switch. What if I want to segregate this network further, for example into three groups: SALES, MARKETING and DEVELOPMENT? I want to avoid cross-group traffic, but that is unavoidable with a plain switch because it is not aware of logical subnets (I could create one subnet per group, which is possible but not recommended). So if a machine in SALES is looking for another machine within that group, it sends out an ARP request that is received by every machine on the switch, not just those in the SALES subnet. This causes a lot of unnecessary traffic.<br />
<br />
To avoid this, some switches let us create virtual LANs, which group physical switch ports together into a virtual network. We can now say that ports 1, 2 and 3 belong to VLAN A and ports 4, 5 and 6 belong to VLAN B. There is no longer a single broadcast domain: an ARP request sent by a machine in VLAN A stays within that VLAN (within its ports, to be precise). We can now have a different subnet for each VLAN, and these subnets can talk to each other only through a router. This is usually achieved by adding tags to the ports.<br />
<br />
This way we can reduce a lot of unnecessary traffic by limiting our broadcast domain to a smaller section.<br />
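The effect of VLANs on a broadcast can be shown with a tiny model. This is just an illustration of the idea; the port-to-VLAN mapping reuses the port numbers and VLAN names from the example above, and the function name is made up.

```python
def broadcast_targets(port_vlan, source_port):
    """Return the ports that receive a broadcast sent from source_port."""
    vlan = port_vlan[source_port]
    return sorted(
        port for port, member_of in port_vlan.items()
        if member_of == vlan and port != source_port
    )

# Without VLANs, every port is in one broadcast domain.
flat_switch = {1: "default", 2: "default", 3: "default",
               4: "default", 5: "default", 6: "default"}
# With VLANs, ports 1-3 are in VLAN A and ports 4-6 are in VLAN B.
vlan_switch = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

print(broadcast_targets(flat_switch, 1))
print(broadcast_targets(vlan_switch, 1))
```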
<br />
AWS does not support VLANs. So for us a VPC subnet is as good as a VLAN and can be used as such, but nothing stops us from creating a pseudo VLAN that is smaller than a subnet.<br />
<br />
<b><span style="font-size: large;">Virtual Server:</span></b><br />
<br />
A Virtual Server in F5 is equivalent to an ELB. With an ELB we get a domain name and not an IP, but with F5 we get an IP. A single F5 box can run multiple such load-balanced endpoints, so it can be used for all reverse proxy requirements in a VPC. As the name implies, it's a logical server and not an actual one, identified by an IP (EIP or private IP). Every Virtual Server has a pool of servers that it load balances, similar to the instances behind an ELB. Since each Virtual Server needs its own private IP, the number of VS that we can run on an F5 is limited by the number of ENIs, and of private IPs per ENI, that the instance type allows.<br />
<br />
<b><span style="font-size: large;">Self IP:</span></b><br />
<br />
An F5 box can be part of multiple VLANs. Think of the Self IP as the IP the F5 box uses to identify itself on a VLAN, since a single ENI can have multiple private IPs attached to it, some of which may be used by Virtual Servers or something else. This IP is static and does not migrate on failover.<br />
<br />
<b><span style="font-size: large;">Floating IP:</span></b><br />
<br />
For an HA setup, the VLANs too need to migrate from one box to the other. This is achieved by assigning a floating IP to each VLAN. This IP migrates from one F5 box to the other on failover. The move happens by reassigning the private IP from box A to box B through AWS API calls.<br />
<br />
<b><span style="font-size: large;">Traffic Group: </span></b><br />
<br />
In an HA setup, the entity that moves from one box to the other is the Traffic Group. All the floating IPs and VS IPs are part of it. We can also force the traffic group to move manually through the console.<br />
<br />
Now let's get to the actual setup of an HA cluster:<br />
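The difference between Self IPs and floating IPs during a failover can be captured in a toy model. This sketch does not call AWS or F5. The Box class, the fail_over function and the Box B and floating addresses are assumptions made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Box:
    name: str
    self_ips: set                                     # static, never move
    floating_ips: set = field(default_factory=set)    # the traffic group

def fail_over(active, standby):
    """Move the whole traffic group to the standby box; return (new_active, new_standby)."""
    standby.floating_ips |= active.floating_ips
    active.floating_ips = set()
    return standby, active

# Box A holds the floating IPs of both VLANs plus one Virtual Server IP.
box_a = Box("A", {"10.0.1.2", "10.0.2.2"}, {"10.0.1.10", "10.0.2.10", "10.0.1.20"})
box_b = Box("B", {"10.0.1.3", "10.0.2.3"})

active, standby = fail_over(box_a, box_b)
print(active.name, sorted(active.floating_ips))
print(standby.name, sorted(standby.self_ips))
```

On AWS, the set move in fail_over corresponds to the private IPs being reassigned from one instance's ENI to the other's.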
<br />
<b>1: Prerequisites:</b><br />
<br />
<ol style="text-align: left;">
<li>An AWS account with a VPC that has at least three subnets. For this setup, let's create a VPC with CIDR 10.0.0.0/16 and three subnets: 10.0.0.0/24 (management), 10.0.1.0/24 (external) and 10.0.2.0/24 (internal).</li>
<li>Two Security Groups as mentioned <a href="https://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/ve-ec2-setup-11-3-0/2.html#unique_47656422" target="_blank">here</a></li>
</ol>
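Before creating anything, the subnet layout from the first prerequisite can be sanity-checked with Python's ipaddress module. The check_layout helper is hypothetical, written only for this sketch; it reports subnets that fall outside the VPC or overlap each other.

```python
import ipaddress
from itertools import combinations

def check_layout(vpc_cidr, subnet_cidrs):
    """Return a list of problems found in a VPC/subnet plan (empty if fine)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} is outside the VPC")
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} overlaps {name_b}")
    return problems

layout = {
    "management": "10.0.0.0/24",
    "external": "10.0.1.0/24",
    "internal": "10.0.2.0/24",
}
print(check_layout("10.0.0.0/16", layout))
```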
<div>
<b>2: Launch Box A:</b></div>
<div>
<ol style="text-align: left;">
<li>Go <a href="https://aws.amazon.com/marketplace/seller-profile/ref=dtl_pcp_sold_by?ie=UTF8&id=74d946f0-fa54-4d9f-99e8-ff3bd8eb2745" target="_blank">here</a> . Select the one which suits you.</li>
<li>For the subnet, select the management subnet and assign a private IP (example 10.0.0.2). Add two more network interfaces, one each from the external and internal subnets, and assign one private IP to each (example 10.0.1.2 and 10.0.2.2).</li>
<li>For security group select <b>allow-all-traffic</b> . </li>
<li>Once the machine is launched assign an EIP to the management ENI. This is done so that the management port is accessible over the internet for configuration.</li>
</ol>
<div>
<b>3: Setting up the admin password:</b></div>
</div>
<ol>
<li>Log in to the new instance that you just launched. Use your key pair file (.pem) and the Elastic IP address of your EC2 instance: <b>ssh -i &lt;username&gt;-aws-keypair.pem root@&lt;elastic IP address of EC2 instance&gt;</b></li>
<li>At the command prompt, type <kbd class="userinput" style="background-color: white; font-size: 13px; font-weight: bold;">tmsh modify auth password admin</kbd><span style="background-color: white; font-family: Arial, Helvetica, Verdana; font-size: 13px;">.</span></li>
<li>To ensure that the system retains the password change, type <b>tmsh save sys config</b>, and then press <b>Enter</b>.</li></ol>
<div>
<b>4: VLAN setup:</b></div>
<div>
<ol style="text-align: left;">
<li>Log in at https://&lt;EIP&gt;. Enter the admin username/password that we created in the last step.</li>
<li>A setup wizard will come up. Complete the first 2-3 steps (license activation), then quit the wizard. Don't finish the rest of the steps, as we will do those manually.</li>
<li>Go to Network > VLAN > VLAN List. Click <b>Create</b>.</li>
<li>Enter name <b>internal.</b></li>
<li>Select 1.2 for interface, Tagging <b>Untagged. </b>Click the <b>Add</b> button.</li>
<li>Click <b>Finished.</b></li>
<li>Repeat the same steps as above to create another VLAN by the name <b>external. </b>For interface select 1.1. </li>
</ol>
<div>
<b>5: Self IP setup:</b></div>
</div>
<div>
<ol style="text-align: left;">
<li>Go to Network > Self IPs. Click <b>Create</b>.</li>
<li>Put Name as <b>self_ip_external</b>. IP Address <b>10.0.1.2</b>. Netmask as <b>255.255.255.0</b>. VLAN as <b>external</b>. Port lockdown <b>Allow All. </b>Select the Default Traffic Group.</li>
<li>Do the same for the <b>internal</b> VLAN.</li>
<li>Click <b>Finished</b>.</li>
</ol>
<div>
<b>6: Setup AWS Credentials:</b> Enter AWS credentials under System > Configuration > AWS.</div>
<div>
<br /></div>
<div>
<b>7: Getting ready for HA setup:</b></div>
<div>
<ol style="text-align: left;">
<li>Go to Device Management > Devices > Device Connectivity > Config Sync. Select the <b>external</b> VLAN IP.</li>
<li>Go to Device Management > Devices > Device Connectivity > Failover Network. Click <b>Add</b> under <b>Failover Unicast Configuration</b>. Use the management IP (10.0.0.2) here.</li>
</ol>
</div>
<div>
<b>8: Set up Box B: </b>Follow all the above steps to set up the other box. Needless to say, the IPs will be different for this box :)</div>
<div>
<br /></div>
<div>
<b>9: HA cluster setup:</b></div>
<div>
<ol style="text-align: left;">
<li>On Box A, go to Device Management > Device Trust > Peer List. Click <b>Add</b>. Use the management IP of Box B and its admin username/password. Follow the rest of the steps.</li>
<li>Now both the boxes are paired.</li>
<li>Go to Device Management > Device Groups. Click <b>Create</b>.</li>
<li>Enter any name to identify the device group that will participate in the failover cluster.</li>
<li>Group Type is <b>Sync-Failover</b>.</li>
<li>Drag both IPs from right to left.</li>
<li>Select <b>Full Sync</b> and <b>Network Failover</b>.</li>
<li>You may have to sync the config to Box B once. Go to Device Management > Overview and sync Box A to the group.</li>
<li>Your HA cluster setup is done. One box will show ACTIVE and the other STANDBY.</li>
</ol>
<div>
<b>10: Creating Floating IPs:</b></div>
</div>
<div>
<ol style="text-align: left;">
<li>This has to be done ONLY on Box A.</li>
<li>Through the AWS console, add one more secondary IP to each of the 10.0.1.0/24 and 10.0.2.0/24 subnet ENIs on one of the boxes.</li>
<li>Go to Network > Self IPs. Click <b>Create</b>.</li>
<li>Enter the name as <b>self_ip_floating_internal</b> for the internal VLAN. Select the same values as before (with the new IP that we created above). Select <b>traffic-group-1 (floating)</b> for Traffic Group.</li>
<li>Do the same for the <b>external</b> VLAN.</li>
</ol>
<div>
Now we have the HA setup ready. To test the movement of the VLAN floating IPs, force a failover and watch the AWS console: the floating private IPs move from one box to the other.</div>
</div>
<div>
<br /></div>
<div>
Any Virtual Server that we create will have its IP as part of this default floating traffic group. This group and its failover objects (like Virtual Servers and IPs) can be seen under <b>Device Management > Traffic Groups > Failover Objects</b>.<br />
<br />
<br />
To learn more about creating a Virtual Server go <a href="http://akashbhunchal.blogspot.in/2015/01/creating-virtual-server-on-f5-big-ip-ha.html" target="_blank">here</a>.<br />
To learn how to integrate AutoScaling with F5 go <a href="http://akashbhunchal.blogspot.in/2015/03/aws-autoscaling-with-f5-big-ip-dynamic.html" target="_blank">here</a> </div>
<div>
<br /></div>
</div>
<div>
<b><br /></b></div>
<br />
<br />
<br />
<b><span style="font-size: large;"><br /></span></b></div>
Migrating a single node Cassandra to multi node on AWS (EC2) with Datastax Enterprise Edition and OpsCenter (published 2014-12-03)<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Migration of single node cassandra to an HA cluster:</span></div>
<b id="docs-internal-guid-9cacaf4c-0f87-127b-fa74-17edc911a214" style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">As an example, we will migrate a single node Cassandra cluster to a 4 node DataStax Enterprise Edition Cassandra cluster. More often than not we start with Cassandra on a single node, and when the time comes to scale we move it to a cluster for HA. We also like to have a monitoring layer on top of it: OpsCenter, a GUI tool from DataStax for managing one or more Cassandra clusters.</span></div>
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The first thing that we are going to do is launch a new Ubuntu 14.04 VM in AWS. This machine would act as a template for more machines to be launched in the cluster. Once all the desired applications are installed on this machine we would be creating an AMI out of it.</span></div>
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Once a plain Jane Ubuntu 14.04 has been launched, follow these steps to create your first Cassandra node:</span></div>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Install Oracle Java. </span><a href="http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html" style="text-decoration: none;"><span style="background-color: transparent; color: #1155cc; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: underline; vertical-align: baseline; white-space: pre-wrap;">http://www.webupd8.org/2012/01/install-oracle-java-jdk-7-in-ubuntu-via.html</span></a><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> .</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Install Python 2.6+.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Now install DataStax Cassandra.</span></div>
</li>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">echo "deb http://username:password@debian.datastax.com/enterprise stable main" | sudo tee -a /etc/apt/sources.list.d/datastax.sources.list</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> where username and password are the DataStax account credentials from your registration confirmation email. You need to register to be able to download. Registration is free.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">curl -L https://debian.datastax.com/debian/repo_key | sudo apt-key add - . </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Note: If you have trouble adding the key, use http instead of https.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">sudo apt-get update</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">sudo apt-get install dse-full </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">(Installs only DataStax Enterprise and the DataStax Agent.)</span></div>
</li>
</ol>
</ol>
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">We now have a DataStax Cassandra node as well as DataStax-Agent installed on this machine. Agent is required by the OpsCenter to monitor each cassandra node remotely.</span></div>
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Now we should copy all the Cassandra files to the new machine. You could either attach an empty EBS volume to the new machine and copy all the files from the old machine to this volume, or just detach the old EBS volume from the old machine and attach it to this new machine. After the above activity, suppose all the Cassandra files are in the location </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">/data/cassandra </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. This EBS volume should have good IO throughput. Use a provisioned IOPS volume if possible. The instance should have at least 8 GB of memory and at least 4 CPUs.</span></div>
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Now we need to set up the new machine to use the Cassandra data at its new location. Do the following on the new node:</span></div>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">sudo service dse stop</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Go to /etc/dse/cassandra/cassandra.yaml and configure the following properties:</span></div>
</li>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">cluster_name: 'cluster1'</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> . If you want to change the cluster name, put the new name here. One more step is required to make the change effective; it is covered later in these instructions.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">num_tokens: 256 </span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">data_file_directories: - /data/cassandra</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">commitlog_directory: /data/cassandra/commitlog</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> </span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">saved_caches_directory: /data/cassandra/saved_caches </span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">endpoint_snitch: Ec2Snitch</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> # or the desired one</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">- seeds: "x.x.x.x" </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> Set this to the private IP address of the seed node. Seeds is a list of nodes that a machine contacts at bootstrap to learn the cluster metadata; it is used only at startup. Since this is the first machine, we put its own private IP as the seed.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">listen_address: y.y.y.y</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> Set this to the current node's private IP address.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">rpc_address: y.y.y.y</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> Set this to the current node's private IP address.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">auto_bootstrap: false</span></div>
</li>
</ol>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Go to </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">/etc/dse/dse.yaml </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> and configure the following properties:</span></div>
</li>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">delegated_snitch: org.apache.cassandra.locator.Ec2Snitch </span></div>
</li>
</ol>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Now we need to set up the datacenter name. Open cassandra-rackdc.properties and make the following change:</span></div>
</li>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">dc_suffix=cassandra_dc1 </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. Every node that is part of the same data center should have the same suffix. If you want to create more than one data center within a cluster, give each one a different suffix, such as cassandra_dc2.</span></div>
</li>
</ol>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">sudo service datastax-agent start</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">sudo service dse start</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Now check the status of the node: </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">sudo nodetool status</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. Note that the node will take 1-2 minutes to start, and the command may throw an exception initially. Once it's up, the output should list only one node, with state UN.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Verify that your tables are intact: </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">cqlsh <private ip> -u username -p password</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> (the default for both is cassandra)</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">If you need to update the cluster name then make sure that you have done Step 2a first and then do the following: </span></div>
</li>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">cqlsh> UPDATE system.local SET cluster_name = 'cluster1' where key='local';</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: lower-alpha; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">sudo nodetool flush</span></div>
</li>
</ol>
</ol>
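The cassandra.yaml settings from step 2 can be collected into one fragment. The following is a minimal Python sketch, not part of the original setup: the helper name render_cassandra_overrides and the 10.0.0.11 address are made up for illustration, and the seed_provider nesting is assumed to match the stock cassandra.yaml layout. It only builds text and does not edit any file.

```python
def render_cassandra_overrides(private_ip, seed_ip, auto_bootstrap=False):
    """Return the step-2 settings as a cassandra.yaml-style text fragment.

    Hypothetical helper for illustration; it only builds text and does not
    touch /etc/dse/cassandra/cassandra.yaml.
    """
    lines = [
        "cluster_name: 'cluster1'",
        "num_tokens: 256",
        "data_file_directories:",
        "    - /data/cassandra",
        "commitlog_directory: /data/cassandra/commitlog",
        "saved_caches_directory: /data/cassandra/saved_caches",
        "endpoint_snitch: Ec2Snitch",
        # seeds is nested under seed_provider in the stock cassandra.yaml.
        "seed_provider:",
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
        "      parameters:",
        f'          - seeds: "{seed_ip}"',
        f"listen_address: {private_ip}",
        f"rpc_address: {private_ip}",
        f"auto_bootstrap: {str(auto_bootstrap).lower()}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # First node: it is its own seed and does not bootstrap (example IP).
    print(render_cassandra_overrides("10.0.0.11", "10.0.0.11"))
```

Comparing the printed fragment against the edited file is a quick way to catch a missed property before starting dse.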
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">We now have a single-node Cassandra cluster ready. Take an AMI of the above machine and delete everything under /data/cassandra so that this node is clean. </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Make sure that all the machines launched in this tutorial share the same Security Group and that all traffic within that Security Group is open. This is very important; otherwise the nodes will not be able to communicate with each other</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">.</span></div>
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Launch more machines from the AMI taken from the first machine and follow the above steps. Just change the following:</span></div>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">- seeds: "x.x.x.x" </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> . This should be the private IP of the first machine.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">auto_bootstrap: true</span></div>
</li>
</ol>
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">After you launch each node and start all the services, run </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">nodetool status </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. Every node’s initial status will be UJ, which changes to UN once the node has fully joined the cluster. In this example we have a 4-node Cassandra cluster.</span></div>
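Waiting for every node to move from UJ to UN can be checked from a script. This is a rough Python sketch under stated assumptions: the function names are invented, and the sample text below is a hand-written approximation of nodetool status output (addresses, loads, and IDs are placeholders), so the row pattern may need adjusting for your version.

```python
import re

# Matches node rows such as "UN  10.0.0.11  ..." (status letter, state letter, address).
NODE_ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def node_states(status_output):
    """Map each node address to its two-letter code (e.g. "UN", "UJ")."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_ROW.match(line.strip())
        if match:
            states[match.group(3)] = match.group(1) + match.group(2)
    return states


def all_joined(status_output, expected_nodes):
    """True once the expected number of nodes all report UN."""
    states = node_states(status_output)
    return len(states) == expected_nodes and all(s == "UN" for s in states.values())


# Hand-written sample for illustration only; not captured from a real cluster.
SAMPLE = """\
Datacenter: dc-name
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load    Tokens  Owns   Host ID  Rack
UN  10.0.0.11  1.2 GB  256     25.0%  aaaa     1a
UJ  10.0.0.12  80 KB   256     ?      bbbb     1a
"""

if __name__ == "__main__":
    print(node_states(SAMPLE))
    print(all_joined(SAMPLE, expected_nodes=2))
```

In practice the text passed in would be the captured output of the nodetool status command, polled until all_joined returns True for the 4 nodes.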
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Now we need to change the replication factor of our cluster to 3. Log in to any of the nodes and do the following:</span></div>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Get the datacenter name. </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">cqlsh> use system;select data_center from local;</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">cqlsh> ALTER KEYSPACE <keyspace_name> WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy','<datacenter_name>' : 3 } ;</span></div>
</li>
</ol>
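If several keyspaces need the same change, the statement can be generated rather than typed each time. A small Python sketch, assuming a hypothetical helper name and made-up keyspace and datacenter names; it only builds the CQL string to paste into cqlsh and does not connect to the cluster.

```python
def alter_replication_cql(keyspace, datacenter, replication_factor=3):
    """Build the ALTER KEYSPACE statement for NetworkTopologyStrategy."""
    # Reject characters that would break out of the statement.
    for name in (keyspace, datacenter):
        if "'" in name or ";" in name:
            raise ValueError(f"unexpected character in {name!r}")
    return (
        f"ALTER KEYSPACE {keyspace} WITH REPLICATION = "
        f"{{ 'class' : 'NetworkTopologyStrategy', '{datacenter}' : {replication_factor} }};"
    )


if __name__ == "__main__":
    # "my_keyspace" and "my_dc" are placeholders; use the data_center value
    # returned by the query in step 1.
    print(alter_replication_cql("my_keyspace", "my_dc"))
```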
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Very important: after this, we need to run </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">nodetool -h <private_ip> repair</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">on each node of the cluster. In our experience this can take several days, so run the command under nohup. The cluster can be used without any issues while the repair is running. It took us almost 20 days to repair all the nodes. </span></div>
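One way to launch the long-running repair so that it keeps going after you log out is sketched below in Python. The function names and the repair.log file name are assumptions made for this example; the command itself is the nodetool invocation quoted above, wrapped in nohup.

```python
import subprocess


def repair_command(private_ip):
    """Argument list for the repair command quoted in the text, wrapped in nohup."""
    return ["nohup", "nodetool", "-h", private_ip, "repair"]


def start_repair(private_ip, log_path="repair.log"):
    """Start the repair in the background and return the process handle.

    log_path is an assumed file name; output is appended there so it can be
    inspected after the SSH session that started the repair has ended.
    """
    with open(log_path, "ab") as log:
        return subprocess.Popen(
            repair_command(private_ip),
            stdout=log,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # detach from the launching terminal
        )


if __name__ == "__main__":
    # Only prints the command; call start_repair() on a real node instead.
    print(" ".join(repair_command("10.0.0.11")))
```

Running the repairs one node at a time, and checking the log before moving to the next node, keeps the extra load on the cluster predictable.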
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Now we need to set up OpsCenter to monitor the cluster. Launch a new machine (even an m1.small would do) and install OpsCenter on it: set up the DataStax apt repo on this machine as described above and run </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">sudo apt-get install opscenter</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. This machine should have a public IP. Make sure this server runs in the same Security Group as the Cassandra cluster, or open all traffic between the two Security Groups if it runs in a different one.</span></div>
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Once OpsCenter is installed, execute </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: italic; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">sudo service opscenter start</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;"> .</span></div>
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Opscenter can be accessed via </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: bold; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">https://<IP>/opscenter </span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">. </span></div>
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">To add the cluster to OpsCenter, do the following:</span></div>
<ol style="margin-bottom: 0pt; margin-top: 0pt;">
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">New Cluster -> Manage Existing Cluster.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Enter the IP of any of the nodes. This should be the node's private IP. We are assuming that OpsCenter and the nodes are all in the same VPC and that communication is open between them.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">It may ask for the .pem private key for the node. Provide the key.</span></div>
</li>
<li dir="ltr" style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; list-style-type: decimal; text-decoration: none; vertical-align: baseline;"><div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Done. In a few minutes it will show the cluster status and all the cluster metrics.</span></div>
</li>
</ol>
<b style="font-weight: normal;"><br /></b>
<br />
<div dir="ltr" style="line-height: 1.15; margin-bottom: 0pt; margin-top: 0pt;">
<span style="background-color: transparent; color: black; font-family: Arial; font-size: 15px; font-style: normal; font-variant: normal; font-weight: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">Our cluster with 4 nodes and a replication factor of 3 is now ready, along with OpsCenter to monitor it.</span></div>
<br /></div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com4tag:blogger.com,1999:blog-7255850341729335726.post-54869360757094961912014-07-04T00:22:00.001-07:002014-07-04T00:22:34.724-07:00Difference between NAT vs PROXY vs ROUTER<div dir="ltr" style="text-align: left;" trbidi="on">
To understand the subtle and not so subtle differences between the three, one needs to know the OSI model and where in that model these operate.<br />
<br />
Here is a brief overview of the OSI model.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhl5s9XMRFwXXhbwg9pBDogVgV5O2hgROAQx9897dIlI9m6GVHkdySIyjqRsFVWop_xdTP4RlM5E61X4DEES_kmTxDActzqYi6KIFqsD7FqXqrwfxllHdOdqFOfW5GF_S7qFy9rftlKOAHs/s1600/osi-model-7-layers.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhl5s9XMRFwXXhbwg9pBDogVgV5O2hgROAQx9897dIlI9m6GVHkdySIyjqRsFVWop_xdTP4RlM5E61X4DEES_kmTxDActzqYi6KIFqsD7FqXqrwfxllHdOdqFOfW5GF_S7qFy9rftlKOAHs/s1600/osi-model-7-layers.png" height="320" width="276" /></a></div>
<br />
The left column defines the data unit for each layer. For example, the data unit at the Network layer is a packet and the data unit at the Transport layer is a segment.<br />
<br />
Protocols like TCP and UDP function at the Transport layer, whereas IP and routing work at the Network layer.<br />
<br />
Now coming to what operates in which layer:<br />
<br />
<b><span style="font-size: large;">Router</span></b><br />
<br />
See the diagram below:<br />
<span style="text-align: center;"><br /></span>
<span style="text-align: center;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrtnpCAfUBgwXnQR14ilSrO938BfdRhuJyR0L1Z2CSngR6v2XhfJ-M_xNbJ8xKzM2TxKJJRTECKPUnQ1fbEZ3JFJreTDvRDOMT_74sFlulmqBvifIUhH94icq6SUZintn_bRn8CoxVNOiM/s1600/NATvsPROXYvsROUTER_1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrtnpCAfUBgwXnQR14ilSrO938BfdRhuJyR0L1Z2CSngR6v2XhfJ-M_xNbJ8xKzM2TxKJJRTECKPUnQ1fbEZ3JFJreTDvRDOMT_74sFlulmqBvifIUhH94icq6SUZintn_bRn8CoxVNOiM/s1600/NATvsPROXYvsROUTER_1.png" height="152" width="640" /></a></div>
<span style="text-align: center;"> There are two machines, each on a different network, connected through a router. If machine A wants to communicate with machine B, it needs to create a TCP connection. When the connection is created, both machines A and B are unaware of the presence of a router in between. This means that if we look up the IP:PORT combination of the connection on either machine, it would be </span>10.0.0.1:1234 - 11.0.0.1:4567, i.e. the router is transparent.<br />
<br />
In this case the router does not participate at the Transport (TCP) layer and just acts as a relay for IP packets. All it knows is where to send each packet next. It does not rewrite the addresses or the payload, and it does not require the response to come back through it. In fact, if there is another router somewhere between these networks, which is often the case when two machines connect over the internet, the response may come back through some other route. A router is unaware of the Transport layer protocol (TCP).<br />
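<br />
To make "all it knows is where to send the packet" concrete, here is a minimal sketch of a forwarding decision. It assumes the router holds a small table of network prefixes and picks the most specific match (longest-prefix match). The class name, the routes and the next-hop labels are made up for illustration and are not taken from any real router.<br />
<br />
```java
import java.util.Arrays;
import java.util.List;

/** Toy forwarding table: picks the next hop by longest-prefix match on the destination IP. */
class RoutingTableSketch {

    /** One route: network/prefixLength -> nextHop (all values are illustrative). */
    static final class Route {
        final int network;
        final int prefixLength;
        final String nextHop;

        Route(String network, int prefixLength, String nextHop) {
            this.network = ipToInt(network);
            this.prefixLength = prefixLength;
            this.nextHop = nextHop;
        }
    }

    /** Converts dotted-quad IPv4 text such as "11.0.0.1" to a 32-bit int. */
    static int ipToInt(String dotted) {
        int value = 0;
        for (String part : dotted.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    /** Returns the next hop of the most specific matching route, or null if none matches. */
    static String lookup(List<Route> routes, String destination) {
        int dest = ipToInt(destination);
        Route best = null;
        for (Route r : routes) {
            // A /0 route matches everything; a 32-bit shift would wrap around in Java.
            int mask = r.prefixLength == 0 ? 0 : -1 << (32 - r.prefixLength);
            boolean matches = (dest & mask) == (r.network & mask);
            if (matches && (best == null || r.prefixLength > best.prefixLength)) {
                best = r;
            }
        }
        return best == null ? null : best.nextHop;
    }

    public static void main(String[] args) {
        List<Route> routes = Arrays.asList(
                new Route("10.0.0.0", 8, "interface-a"),
                new Route("11.0.0.0", 8, "interface-b"),
                new Route("0.0.0.0", 0, "upstream-router"));
        // Only the destination address is consulted; ports and TCP state never appear.
        System.out.println(lookup(routes, "11.0.0.1"));
    }
}
```
<br />
Note that the lookup never touches ports or connection state, which is why the router stays invisible to the TCP connection between A and B.<br />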
<br />
<b><span style="font-size: large;">NAT (Network Address Translator)</span></b><br />
<br />
<br />
A NAT is essentially an intelligent router. IPv4 addresses are in short supply, and these days even something as dumb as a fridge has an IP. At the same time, most devices are consumers and not producers, i.e. they do not need a publicly resolvable IP. For example, the workstations inside a building need not have public IPs assigned to them.<br />
<br />
This is where NAT comes into play. What NAT does is hide the machines behind it from the internet. All the machines go through NAT to access the outside world.<br />
<br />
See the image below:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNgUH8GOaegrD72ZOi1TtPhsxniU46F0SL1z2Hg4SSOYe0PkPLdIjVAdQ6ngPlZD4zIYXp7csli4cx0spj0tcuQKiGu6uZYKpO0zyh6k5_2ITlkS4fY6EUPIGEtlp-5gryHEudVx2OkZ6V/s1600/Copy+of+NATvsPROXYvsROUTER_2.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNgUH8GOaegrD72ZOi1TtPhsxniU46F0SL1z2Hg4SSOYe0PkPLdIjVAdQ6ngPlZD4zIYXp7csli4cx0spj0tcuQKiGu6uZYKpO0zyh6k5_2ITlkS4fY6EUPIGEtlp-5gryHEudVx2OkZ6V/s1600/Copy+of+NATvsPROXYvsROUTER_2.png" height="273" width="640" /></a></div>
Machine A still thinks that it is talking to machine B directly (10.0.0.1:1234 - 11.0.0.1:4567). What the NAT does is replace the source IP and port in the headers of the packet it received from A with its own IP (w.x.y.z) and a random port of its own (7897). When the packet reaches machine B, B thinks it came from the NAT's IP.<br />
<br />
Since a response always goes back to the source, machine B sends the response packet to the NAT, at the same port that was written into the header (7897). When the NAT received the packet from A, it assigned it a random port (7897) and recorded the mapping in a table called the NAT table. So when the response comes back from machine B, the NAT does a reverse lookup in that table and forwards the packet to the intended recipient (machine A). This way more than one machine can access the internet through the NAT.<br />
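<br />
The NAT table can be sketched as a pair of maps. This is only a toy model under simplifying assumptions: ports are handed out sequentially instead of randomly, and the key is just the private "ip:port" string, whereas a real NAT also tracks the protocol, the destination and timeouts. All names here are hypothetical.<br />
<br />
```java
import java.util.HashMap;
import java.util.Map;

/** Toy NAT table: maps a public port on the NAT to the private endpoint that owns it. */
class NatTableSketch {
    private final Map<Integer, String> publicPortToPrivate = new HashMap<>();
    private final Map<String, Integer> privateToPublicPort = new HashMap<>();
    private int nextPort;

    NatTableSketch(int firstPort) {
        this.nextPort = firstPort;
    }

    /** Outbound packet: returns the public port for a private "ip:port", allocating on first use. */
    int translateOutbound(String privateEndpoint) {
        Integer existing = privateToPublicPort.get(privateEndpoint);
        if (existing != null) {
            return existing;
        }
        int port = nextPort++; // assumption: sequential allocation keeps the sketch deterministic
        privateToPublicPort.put(privateEndpoint, port);
        publicPortToPrivate.put(port, privateEndpoint);
        return port;
    }

    /** Inbound packet: reverse lookup; null means no mapping exists for that port. */
    String translateInbound(int publicPort) {
        return publicPortToPrivate.get(publicPort);
    }

    public static void main(String[] args) {
        NatTableSketch nat = new NatTableSketch(7897);
        int port = nat.translateOutbound("10.0.0.1:1234");
        System.out.println("10.0.0.1:1234 leaves as w.x.y.z:" + port);
        System.out.println("reply to port " + port + " is forwarded to " + nat.translateInbound(port));
    }
}
```
<br />
A reply arriving on a port with no entry has nowhere to go, which is also why machines behind a NAT are not reachable from outside unless they spoke first.<br />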
<br />
One important point to note here is that the headers carry checksums used to determine whether the packet/segment is valid: the IPv4 header has its own checksum, and the TCP/UDP checksum also covers the source and destination IP addresses. If the NAT changes the source IP:PORT combination, it needs to recalculate the checksum at both the Transport and the Network layer. This leads to some additional work for the NAT.<br />
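<br />
As a rough illustration of that extra work, the sketch below rewrites the source address in a plain 20-byte IPv4 header and recomputes the header checksum using the standard Internet checksum (RFC 1071). It assumes a header without options and leaves out the TCP/UDP checksum, which a NAT would have to refresh in the same way. The class and method names are made up for this example.<br />
<br />
```java
/** Sketch: recompute the IPv4 header checksum after a NAT rewrites the source address. */
class InternetChecksumSketch {

    /** RFC 1071: one's-complement sum of 16-bit big-endian words, returned complemented. */
    static int checksum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0; // pad an odd length with zero
            sum += (hi << 8) | lo;
        }
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16); // fold the carries back in
        }
        return (int) (~sum & 0xFFFF);
    }

    /** Overwrites the source address (bytes 12..15) and refreshes the checksum (bytes 10..11). */
    static void rewriteSource(byte[] header, byte[] newSource) {
        System.arraycopy(newSource, 0, header, 12, 4);
        header[10] = 0; // the checksum field must be zero while the sum is computed
        header[11] = 0;
        int c = checksum(header);
        header[10] = (byte) (c >> 8);
        header[11] = (byte) c;
    }

    public static void main(String[] args) {
        byte[] header = {
            0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00, 0x40, 0x11, 0x00, 0x00,
            10, 0, 0, 1,   // source 10.0.0.1
            11, 0, 0, 1    // destination 11.0.0.1
        };
        rewriteSource(header, new byte[] {(byte) 203, 0, 113, 7}); // stand-in for w.x.y.z
        System.out.printf("new checksum: %02x%02x%n", header[10] & 0xFF, header[11] & 0xFF);
    }
}
```
<br />
A header with a correct checksum in place sums to zero when run through the same function, which is how the receiver validates it.<br />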
<br />
<b><span style="font-size: large;">Proxy:</span></b> <br />
<br />
A proxy works at the Transport layer and is aware of the protocol. It is not transparent in nature. It actually creates two connections, one each with the source and the destination. Machine A does not even know about machine B. For machine A, the proxy is the only thing it is talking to, and it does not care how and where the proxy gets its data.<br />
<br />
See the image below:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixl2QvObjOUN6e52PKgC5WBdXlO2KRzkZ2VCf7MO_lkrFIVscDFJ6RKPVYurQrOg2uRjE0jP-1ApknA7VuITY5cjezPLL923FdO3x3it7Z-E1WdUo-Gkzx_BLvwLIUDO-Lmp1RR_Qh4Ktf/s1600/Copy+of+NATvsPROXYvsROUTER_3.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEixl2QvObjOUN6e52PKgC5WBdXlO2KRzkZ2VCf7MO_lkrFIVscDFJ6RKPVYurQrOg2uRjE0jP-1ApknA7VuITY5cjezPLL923FdO3x3it7Z-E1WdUo-Gkzx_BLvwLIUDO-Lmp1RR_Qh4Ktf/s1600/Copy+of+NATvsPROXYvsROUTER_3.png" height="154" width="640" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
In the above picture Server A does not even know the IP of Server B. All it knows is the IP and port of the proxy server. The proxy server creates two connections, one with Server A and one with Server B. This happens at the Transport layer. Similarly, Server B does not know the IP of Server A; for it, the proxy is the source.<br />
<br />
Examples of proxy servers are load balancers like HAProxy, Nginx, Apache, AWS ELB and F5 BIG-IP. They hide the backend from the outside world and do a lot of nifty stuff like load balancing, optimization, etc.<br />
<br />
The above is a very high level distinction between the three, and in fact the lines have blurred. "Proxy", for one, is loosely used even for a NAT and vice versa. This just gives you a starting point to learn more.</div>
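<br />
The two-connection behaviour can be tried out on a single machine. The following is a minimal sketch, not a real proxy: it runs a one-shot backend ("Server B"), a one-shot relay and a client ("Server A") on loopback sockets, and relays a single line. The backend reports the peer port it sees, which turns out to be the proxy's outbound socket rather than the client's. All class and method names are invented for the example.<br />
<br />
```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

/** One-shot TCP relay on loopback, showing that a proxy holds two separate connections. */
class TcpProxySketch {

    private static void spawn(Runnable task) {
        Thread t = new Thread(task);
        t.setDaemon(true);
        t.start();
    }

    /** Returns { peer port seen by the backend, local port of the proxy's outbound socket }. */
    static int[] roundTrip() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        AtomicInteger proxyOutboundPort = new AtomicInteger(-1);
        try (ServerSocket backend = new ServerSocket(0, 1, loopback);
             ServerSocket proxy = new ServerSocket(0, 1, loopback)) {

            // "Server B": answers one connection with the peer port it observes.
            spawn(() -> {
                try (Socket peer = backend.accept();
                     PrintWriter out = new PrintWriter(peer.getOutputStream(), true)) {
                    out.println(peer.getPort());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });

            // Proxy: connection 1 comes from the client, connection 2 goes to the backend.
            spawn(() -> {
                try (Socket fromClient = proxy.accept();
                     Socket toBackend = new Socket(loopback, backend.getLocalPort());
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(toBackend.getInputStream()));
                     PrintWriter out = new PrintWriter(fromClient.getOutputStream(), true)) {
                    proxyOutboundPort.set(toBackend.getLocalPort());
                    out.println(in.readLine()); // relay the backend's single line to the client
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });

            // "Server A": only ever learns the proxy's address and port.
            try (Socket client = new Socket(loopback, proxy.getLocalPort());
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream()))) {
                int seenByBackend = Integer.parseInt(in.readLine());
                return new int[] {seenByBackend, proxyOutboundPort.get()};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = roundTrip();
        System.out.println("backend saw peer port " + ports[0]
                + ", proxy connected from port " + ports[1]);
    }
}
```
<br />
Real proxies keep relaying in both directions and serve many clients at once; the sketch only shows where the two connections begin and end.<br />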
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com3tag:blogger.com,1999:blog-7255850341729335726.post-77651565690032889052014-06-12T08:34:00.003-07:002014-06-12T08:34:56.947-07:00Jaspersoft Ad Hoc Cache clear and build programatically using Django and Selenium<div dir="ltr" style="text-align: left;" trbidi="on">
Jaspersoft is amazing in what it does, but really sucks when it doesn't work the way you want it to. I am referring to the Ad Hoc Cache behavior in Jaspersoft. Those familiar with it would know that all we can set for the Query Cache is the TTL, which is fine, but the problem comes when one has to clear the cache on demand or programmatically (from a script or an ETL).<br />
<br />
AFAIK, Jaspersoft uses Ehcache with Hibernate to save the results and the query. This has three problems.<br />
<br />
<ol style="text-align: left;">
<li>This query cache is lazy in nature so one has to visit the Report for the Cache to warm up for that query. </li>
<li>Every query has its own expiry time, depending on when it was hit and on the duration of the cache. This means that one report could show old data and another could show new data, depending on which one was cached when. This causes a lot of confusion for the end users.</li>
<li>Another problem is with invalidation of the cache. There is no HTTP endpoint which one can hit to clear the cache. One has to log in as an Admin and then click the "Clear Cache" button.</li>
</ol>
<br />
The above has a lot of limitations. In our case we run an ETL every day to load data into our Redshift DW. This ETL takes a couple of hours to run as it includes some aggregates too. Since the datastore for our application is a data warehouse, queries are not exactly superfast. What we wanted was a system which would invalidate the cache as soon as the ETL finishes and then build (warm up) the cache, so that the end user neither faces the initial slowness of the system nor is shown stale data.<br />
<br />
Since the only option was to go through the browser, we tried to automate that. <a href="http://phantomjs.org/" target="_blank">PhantomJS </a>was our first choice as it runs on Linux and does not require a browser to automate user behavior. This is important as all our production systems are Linux and devoid of any desktop environment.<br />
<br />
But this did not work (at least for us). The security in Jaspersoft is really good and makes use of some hidden ExecutionKeys to ensure that a request is coming from a proper source. We sent all the headers a normal browser adds with our requests, but Jaspersoft did not give out an execution key, without which we could not hit the HTTP endpoint for clearing the cache.<br />
<br />
The only hope now was to use Selenium, which requires a browser and hence a machine with a desktop environment. We had to launch one Windows box (t1.micro) in our cloud just for this purpose.<br />
<br />
The script first clears the cache and then rebuilds it. To launch this script we had to integrate it with a web framework which would accept our on demand calls from the ETL and launch the process.<br />
<br />
We used Django as the web framework for this. The project has both Windows and Linux ports. To run the Linux port you need a desktop environment running there.<br />
<br />
The project is available <a href="https://github.com/akashbhunchal/JaspersoftClearCache" target="_blank">here</a> for download.</div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com1tag:blogger.com,1999:blog-7255850341729335726.post-60649206420175774772014-05-11T00:36:00.002-07:002014-05-11T00:36:41.657-07:00Jaspersoft HTML5 charts remove decimal (where not required) from tooltip<div dir="ltr" style="text-align: left;" trbidi="on">
For those of you who don't know, Jaspersoft uses the Highcharts JS library for its HTML5 (dynamic) charts. We were facing the problem of Jaspersoft adding decimals (XX.00) to all values in the tooltip. This is very disconcerting when the measure is an integer, like Number of Users. <div>
<br /></div>
<div>
We thought that maybe our domain query or our data store was to blame and was returning decimals, but it was not.</div>
<div>
<br /></div>
<div>
Highcharts tooltips have a property called <b>valueDecimals </b>which determines how many decimal places to show. </div>
<div>
<br /></div>
<div>
Just add this line to <b>getCommonSeriesGeneralOptions </b>method in <b>/var/lib/tomcat7/webapps/jasperserver-pro/scripts/adhoc/highchart.datamapper.js </b>and live happily ever after :).</div>
<div>
<br /></div>
<div>
<b>options.tooltip.valueDecimals=0;</b></div>
<div>
<br /></div>
<div>
This will add a decimal only if you give one. So all your floats (coming from data store) will remain floats and integers will remain integers.</div>
<div>
</div>
</div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com0tag:blogger.com,1999:blog-7255850341729335726.post-79920844015987381092014-04-14T04:10:00.003-07:002014-04-14T06:47:26.591-07:00Jaspersoft with AWS Redshift Experince, Learnings and Problems<div dir="ltr" style="text-align: left;" trbidi="on">
This was my first experience with AWS Redshift and Jaspersoft. Both technologies are good and easy to use, but can sometimes throw up issues which are difficult to decode/fix. Here are some of the things that I faced/discovered using both.<br />
<br />
<b>Redshift:</b><br />
<br />
<ol style="text-align: left;">
<li>Performance is decent when your queries do not have joins in them. For example, if one table has 1 billion rows and another has just 50k rows and you join on a column which is not a sort key, the query may take anywhere between 5 and 10 minutes, which for me is a no-go for a dashboard (a user would not wait 5 minutes for a chart to load). This is because Redshift is very disk intensive and its caching of blocks in memory is still not up to the mark, so it has to read a lot of blocks from disk for every query.</li>
<li>Always try to denormalize data as much as possible. Joins are a crime. They will suck the life out of you and the query.</li>
<li>Concurrency is not up to the mark in Redshift. In our experience performance degrades sharply with every additional parallel execution.</li>
<li>A LIKE match would almost always make the query slow (no surprise here, as even an OLTP DB would do the same). But the degradation is significant.</li>
<li>We have a query which, when run, makes the whole cluster unresponsive :) . We still don't know the reason, but it happens.</li>
<li>We used to run VACUUM every day after importing data into the cluster, but on numerous occasions we saw the whole cluster hang because VACUUM stalled on one of the nodes. One has to do a cluster reboot to fix this. Also, the command does not time out, so it may hang your cluster (READ/WRITE) for almost a day if not rebooted. So we moved this command to just once a week so that we do not hit this problem that often.</li>
<li>Always make use of the SORT key in your queries. It works like a charm.</li>
<li>Build aggregates for the charts, as the time taken by queries is not predictable when they run on huge tables. We build aggregates for all the charts and don't run queries on big tables at run time.</li>
</ol>
<b>Jaspersoft:</b><br />
It also has its fair share of issues but most of them are due to Redshift :). The following is for HTML 5 reports which we built using the web wizard and NOT through report designer. <br />
<br />
<ol style="text-align: left;">
<li>CREATE VIEW may throw an error or may not load at all when Redshift is having performance issues or is under a lot of load. We still don't know why, but it happens.</li>
<li>If you change the chart type in VIEW, it may not be reflected in the REPORT. Recreate the report.</li>
<li>There is no support for a single axis chart with multiple measures where the tooltip shows all the measures together. I found a workaround, which is described <a href="http://akashbhunchal.blogspot.in/2014/04/jaspersoft-sharedcommon-tooltip-for-non.html" target="_blank">here</a>.</li>
<li>There is no API to clear the Query Cache. I tried a lot of command line tools like PhantomJS to script it, but it did not work, as every page has an executionKey and Jaspersoft was smart enough to know that it was a scripted attempt :( . I had to write a Selenium test case to do the same. As the cache is in memory and NOT in the DB, I guess one would have to modify the Java code. It uses Ehcache for caching the queries.</li>
<li>One good thing about Jaspersoft is that even if you leave a report before it has completely loaded, the query still runs in the background and populates the cache. This helped me in automating cache warming by simulating user clicks (Selenium) without waiting for the page to load.</li>
<li>There is no eager/preemptive caching of queries; caching is on demand. So one has to write a Selenium test case to warm up the cache. This is much needed for Redshift, which is not good with concurrent queries.</li>
</ol>
Hope this will help somebody.<br />
<b><br /></b></div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com0tag:blogger.com,1999:blog-7255850341729335726.post-13927819086035272512014-04-11T00:09:00.000-07:002014-04-11T01:12:21.795-07:00Jaspersoft Shared/Common Tooltip for non mutli-axis/single axis graphs (HTML5)<div dir="ltr" style="text-align: left;" trbidi="on">
We were facing this problem wherein we needed four measures to be shown on the same graph (line graph). By default jaspersoft has two chart types, single axis and multi axis. The single axis graph shows only one y axis and one has to hover/move from one measure to the other to view the data in the tooltip.<br />
<br />
This was very inconvenient for the end users as they were not able to see all the measures in a single tooltip for the same point on the x axis (date in our case) making it difficult for them to compare the data for all the measures on the graph.<br />
<br />
We tried the multi axis chart, which has a shared tooltip by default, but it shows multiple y axes. That was useless for us: first, all our measures were comparable to each other, and second, it gave a very wrong impression to the end user, since a graph with very low values could appear above a graph with very high values due to the difference in scales. This created a lot of confusion.<br />
<br />
We tried all the forums and blogs but did not get an answer so decided to get our hands dirty. Turns out the fix/change is very easy (it took me 3 days though).<br />
<br />
For those who do not know jaspersoft uses highcharts as the charting library for dynamic (HTML5) charts. The highcharts API has a property called "shared" for "tooltip". If we enable this the tooltip becomes shared. So we just needed to find the js file where it was being set and we found it :) .<br />
<br />
Go to file scripts/adhoc/highchart.datamapper.js, method name "getCommonSeriesGeneralOptions", line number 342 and change<br />
<br />
<b>options.tooltip.shared = HDM.isDualOrMultiAxisChart(extraOptions.chartState.chartType);</b><br />
to<br />
<b>options.tooltip.shared = true;</b><br />
<br />
Beware, this will make the tooltip shared for all the graphs. If you want more granularity, you need to create another method similar to <b> HDM.isDualOrMultiAxisChart() </b>and return true or false accordingly.<br />
<br />
<br /></div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com5tag:blogger.com,1999:blog-7255850341729335726.post-79473131493089053572014-01-12T09:36:00.001-08:002014-01-12T09:36:14.990-08:00AWS JAVA client examples for Auto Scaling metrics (Asynchronous)<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
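Below are a few code snippets for gathering Auto Scaling metrics from CloudWatch using the AWS Java Async Client (AmazonCloudWatchAsyncClient). They are very similar to the other code snippets I have shared. The only thing that took me almost a day to discover was the <b>namespace</b>, which according to the documentation should be "AWS/AutoScaling" but what actually worked for me was "AWS/EC2". The likely reason is that metrics such as CPUUtilization are EC2 metrics aggregated by the AutoScalingGroupName dimension, while "AWS/AutoScaling" holds the group-level metrics such as GroupInServiceInstances.<br />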
<br />
:(<br />
<br />
As always first create the client:<br />
<br />
<br /></div>
<pre style="background-color: lightgrey;">AWSCredentials credentials = <span style="color: blue;">new</span> BasicAWSCredentials(obj.getString(<span style="color: maroon;">"AWS_ACCESS_KEY"</span>),obj.getString(<span style="color: maroon;">"AWS_SECRET_KEY"</span>));
ClientConfiguration config = <span style="color: blue;">new</span> ClientConfiguration();
config.setMaxConnections(<span style="color: maroon;">1</span>); <span style="color: green;">// Cap the number of open HTTP connections per client</span>
AmazonCloudWatchAsyncClient client = <span style="color: blue;">new</span> AmazonCloudWatchAsyncClient(credentials);
client.setConfiguration(config);</pre>
</div>
<br />
Now a utility method to initialize the request object:<br />
<br /></div>
<pre style="background-color: lightgrey;"><span style="color: blue;">private</span> <span style="color: blue;">static</span> GetMetricStatisticsRequest initializeRequestObject(AmazonCloudWatchAsyncClient client,JSONObject groupDetails){
GetMetricStatisticsRequest request = <span style="color: blue;">new</span> GetMetricStatisticsRequest();
request.setPeriod(<span style="color: maroon;">60</span>*<span style="color: maroon;">5</span>); <span style="color: green;">// 5 minutes</span>
request.setNamespace(<span style="color: maroon;">"AWS/EC2"</span>);
List<Dimension> dims = <span style="color: blue;">new</span> ArrayList<Dimension>();
Dimension dim = <span style="color: blue;">new</span> Dimension();
dim.setName(<span style="color: maroon;">"AutoScalingGroupName"</span>);
dim.setValue(groupDetails.getString(<span style="color: maroon;">"NAME"</span>));
dims.add(dim);
Date end = <span style="color: blue;">new</span> Date();
request.setEndTime(end);
<span style="color: green;">// Look back 10 minutes, i.e. two 5-minute periods</span>
Date beg = <span style="color: blue;">new</span> Date(end.getTime() - <span style="color: maroon;">10</span>*<span style="color: maroon;">60</span>*<span style="color: maroon;">1000</span>);
request.setStartTime(beg);
request.setDimensions(dims);
<span style="color: blue;">return</span> request;
}</pre>
</div>
<br />
Let's gather some metrics now:<br />
<br />
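One caveat before the handlers: as far as I know, CloudWatch does not promise to return datapoints in chronological order, so the first element of the list is not necessarily the most recent one. A minimal sketch of picking the latest point by timestamp is shown here. The Point class is a hypothetical stand-in for the SDK's Datapoint (which exposes getTimestamp() and getAverage()), so that the example runs without the AWS SDK on the classpath.<br />
<br />
```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;

/** Sketch: choose the most recent datapoint instead of relying on list order. */
class LatestDatapointSketch {

    /** Hypothetical stand-in for the SDK's Datapoint: a timestamp and an average. */
    static final class Point {
        final Date timestamp;
        final double average;

        Point(Date timestamp, double average) {
            this.timestamp = timestamp;
            this.average = average;
        }
    }

    /** Returns the point with the newest timestamp, or null for an empty list. */
    static Point latest(List<Point> points) {
        Point best = null;
        for (Point p : points) {
            if (best == null || p.timestamp.after(best.timestamp)) {
                best = p;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Point> points = Arrays.asList(
                new Point(new Date(2000L), 40.0),
                new Point(new Date(3000L), 55.0),
                new Point(new Date(1000L), 10.0));
        System.out.println("latest average: " + latest(points).average);
    }
}
```
<br />
With the real SDK the same loop would compare getTimestamp() values on the list returned by getDatapoints().<br />
<br />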
<br /></div>
<pre style="background-color: lightgrey;"> <span style="color: blue;">public</span> <span style="color: blue;">static</span> <span style="color: blue;">void</span> get5MinCPUUtilization(AmazonCloudWatchAsyncClient client, <span style="color: blue;">final</span> JSONObject groupDetails, <span style="color: blue;">final</span> String clientName){
client.setEndpoint(groupDetails.getString(<span style="color: maroon;">"END_POINT"</span>));
GetMetricStatisticsRequest request = initializeRequestObject(client, groupDetails);
request.setMetricName(<span style="color: maroon;">"CPUUtilization"</span>);
request.setUnit(StandardUnit.Percent);
List<String> stats = <span style="color: blue;">new</span> ArrayList<String>();
stats.add(<span style="color: maroon;">"Average"</span>);
stats.add(<span style="color: maroon;">"Maximum"</span>);
stats.add(<span style="color: maroon;">"Minimum"</span>);
request.setStatistics(stats);
client.getMetricStatisticsAsync(request, <span style="color: blue;">new</span> AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() {
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onSuccess(GetMetricStatisticsRequest arg0,
GetMetricStatisticsResult arg1) {
List<Datapoint> data = arg1.getDatapoints();
Double avg = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getAverage() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double min = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMinimum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double max = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMaximum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
}
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onError(Exception arg0) {
}
});
<span style="color: blue;">return</span>;
}
<span style="color: blue;">public</span> <span style="color: blue;">static</span> <span style="color: blue;">void</span> get5MinDiskReadOps(AmazonCloudWatchAsyncClient client, <span style="color: blue;">final</span> JSONObject groupDetails, <span style="color: blue;">final</span> String clientName){
client.setEndpoint(groupDetails.getString(<span style="color: maroon;">"END_POINT"</span>));
GetMetricStatisticsRequest request = initializeRequestObject(client, groupDetails);
request.setMetricName(<span style="color: maroon;">"DiskReadOps"</span>);
request.setUnit(StandardUnit.Count);
List<String> stats = <span style="color: blue;">new</span> ArrayList<String>();
stats.add(<span style="color: maroon;">"Average"</span>);
stats.add(<span style="color: maroon;">"Maximum"</span>);
stats.add(<span style="color: maroon;">"Minimum"</span>);
request.setStatistics(stats);
client.getMetricStatisticsAsync(request, <span style="color: blue;">new</span> AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() {
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onSuccess(GetMetricStatisticsRequest arg0,
GetMetricStatisticsResult arg1) {
List<Datapoint> data = arg1.getDatapoints();
Double avg = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getAverage() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double min = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMinimum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double max = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMaximum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
}
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onError(Exception arg0) {
}
});
<span style="color: blue;">return</span>;
}
<span style="color: blue;">public</span> <span style="color: blue;">static</span> <span style="color: blue;">void</span> get5MinStatusCheckFailed(AmazonCloudWatchAsyncClient client, <span style="color: blue;">final</span> JSONObject groupDetails, <span style="color: blue;">final</span> String clientName){
client.setEndpoint(groupDetails.getString(<span style="color: maroon;">"END_POINT"</span>));
GetMetricStatisticsRequest request = initializeRequestObject(client, groupDetails);
request.setMetricName(<span style="color: maroon;">"StatusCheckFailed"</span>);
request.setUnit(StandardUnit.Count);
List<String> stats = <span style="color: blue;">new</span> ArrayList<String>();
stats.add(<span style="color: maroon;">"Average"</span>);
stats.add(<span style="color: maroon;">"Maximum"</span>);
stats.add(<span style="color: maroon;">"Minimum"</span>);
request.setStatistics(stats);
client.getMetricStatisticsAsync(request, <span style="color: blue;">new</span> AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() {
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onSuccess(GetMetricStatisticsRequest arg0,
GetMetricStatisticsResult arg1) {
List<Datapoint> data = arg1.getDatapoints();
Double avg = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getAverage() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double min = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMinimum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double max = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMaximum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
}
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onError(Exception arg0) {
}
});
<span style="color: blue;">return</span>;
}
<span style="color: blue;">public</span> <span style="color: blue;">static</span> <span style="color: blue;">void</span> get5MinDiskWriteOps(AmazonCloudWatchAsyncClient client, <span style="color: blue;">final</span> JSONObject groupDetails, <span style="color: blue;">final</span> String clientName){
client.setEndpoint(groupDetails.getString(<span style="color: maroon;">"END_POINT"</span>));
GetMetricStatisticsRequest request = initializeRequestObject(client, groupDetails);
request.setMetricName(<span style="color: maroon;">"DiskWriteOps"</span>);
request.setUnit(StandardUnit.Count);
List<String> stats = <span style="color: blue;">new</span> ArrayList<String>();
stats.add(<span style="color: maroon;">"Average"</span>);
stats.add(<span style="color: maroon;">"Maximum"</span>);
stats.add(<span style="color: maroon;">"Minimum"</span>);
request.setStatistics(stats);
client.getMetricStatisticsAsync(request, <span style="color: blue;">new</span> AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() {
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onSuccess(GetMetricStatisticsRequest arg0,
GetMetricStatisticsResult arg1) {
List<Datapoint> data = arg1.getDatapoints();
Double avg = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getAverage() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double min = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMinimum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double max = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMaximum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
}
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onError(Exception arg0) {
}
});
<span style="color: blue;">return</span>;
}
<span style="color: blue;">public</span> <span style="color: blue;">static</span> <span style="color: blue;">void</span> get5MinNetworkOutBytes(AmazonCloudWatchAsyncClient client, <span style="color: blue;">final</span> JSONObject groupDetails, <span style="color: blue;">final</span> String clientName){
client.setEndpoint(groupDetails.getString(<span style="color: maroon;">"END_POINT"</span>));
GetMetricStatisticsRequest request = initializeRequestObject(client, groupDetails);
request.setMetricName(<span style="color: maroon;">"NetworkOut"</span>);
request.setUnit(StandardUnit.Bytes);
List<String> stats = <span style="color: blue;">new</span> ArrayList<String>();
stats.add(<span style="color: maroon;">"Average"</span>);
stats.add(<span style="color: maroon;">"Maximum"</span>);
stats.add(<span style="color: maroon;">"Minimum"</span>);
request.setStatistics(stats);
client.getMetricStatisticsAsync(request, <span style="color: blue;">new</span> AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() {
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onSuccess(GetMetricStatisticsRequest arg0,
GetMetricStatisticsResult arg1) {
List<Datapoint> data = arg1.getDatapoints();
Double avg = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getAverage() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double min = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMinimum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double max = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMaximum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
}
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onError(Exception arg0) {
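log.error(<span style="color: maroon;">"Could not get NetworkOut data for "</span> + groupDetails.getString(<span style="color: maroon;">"NAME"</span>) + <span style="color: maroon;">" for client "</span> + clientName, arg0); <span style="color: green;">// do not swallow the failure silently</span>
}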
});
<span style="color: blue;">return</span>;
}
<span style="color: blue;">public</span> <span style="color: blue;">static</span> <span style="color: blue;">void</span> get5MinNetworkInBytes(AmazonCloudWatchAsyncClient client, <span style="color: blue;">final</span> JSONObject groupDetails, <span style="color: blue;">final</span> String clientName){
client.setEndpoint(groupDetails.getString(<span style="color: maroon;">"END_POINT"</span>));
GetMetricStatisticsRequest request = initializeRequestObject(client, groupDetails);
request.setMetricName(<span style="color: maroon;">"NetworkIn"</span>);
request.setUnit(StandardUnit.Bytes);
List<String> stats = <span style="color: blue;">new</span> ArrayList<String>();
stats.add(<span style="color: maroon;">"Average"</span>);
stats.add(<span style="color: maroon;">"Maximum"</span>);
stats.add(<span style="color: maroon;">"Minimum"</span>);
request.setStatistics(stats);
client.getMetricStatisticsAsync(request, <span style="color: blue;">new</span> AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() {
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onSuccess(GetMetricStatisticsRequest arg0,
GetMetricStatisticsResult arg1) {
List<Datapoint> data = arg1.getDatapoints();
Double avg = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getAverage() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double min = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMinimum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
Double max = data.size() > <span style="color: maroon;">0</span> ? data.get(<span style="color: maroon;">0</span>).getMaximum() : <span style="color: maroon;">0</span><span style="color: maroon;">.0</span>;
}
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onError(Exception arg0) {
log.error(<span style="color: maroon;">"Could not get Autoscaling data for "</span> + groupDetails.getString(<span style="color: maroon;">"NAME"</span>) + <span style="color: maroon;">" for client "</span>+ clientName,arg0);
NotificationMail.sendMail(<span style="color: maroon;">"Could not get Autoscaling data for "</span> + groupDetails.getString(<span style="color: maroon;">"NAME"</span>) + <span style="color: maroon;">" for client "</span>+ clientName, <span style="color: maroon;">"AutoScaling data could not be read"</span>);
}
});
<span style="color: blue;">return</span>;
}
</pre>
</div>
<br />
For some more examples (ELB and RDS metrics) go <a href="http://akashbhunchal.blogspot.in/2013/10/aws-java-client-examples-for-elb-and.html" target="_blank">here</a></div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com1tag:blogger.com,1999:blog-7255850341729335726.post-26374409338192126792014-01-12T09:04:00.001-08:002014-01-12T09:04:23.154-08:00Cloud Based (AWS) Elastic Jmeter Load Testing Application (SWARM)<div dir="ltr" style="text-align: left;" trbidi="on">
In this age of the internet it's imperative for any web-based application to benchmark itself for high concurrency. As an AWS Advanced Technology Partner, our work includes helping enterprises and start-ups embrace AWS for their production as well as testing workloads. A few questions that people have are:<br />
<div>
<br /></div>
<div>
1) Is AWS scalable ? </div>
<div>
2) How many requests/min can an EC2 instance serve ? </div>
<div>
3) What instance class should I choose for my application?</div>
<div>
4) How many instances should I choose for my application?</div>
<div>
5) Does Auto Scaling actually work ? </div>
<div>
<br /></div>
<div>
Turns out there are no simple answers to these questions, as they are very subjective and vary from one application to the other. The only way to find out is to run a load test.</div>
<div>
<br /></div>
<div>
JMeter is almost an industry standard for load testing. We can run a test with the desired concurrency and duration and write our own test cases through the GUI provided with it. It can provide a summary in the form of a raw log file (JTL), a table, or a graph.</div>
<div>
<br /></div>
<div>
All this is good when you want to run a load test from one machine, but what if you want to run a load test from multiple machines? How would you aggregate the data across them?</div>
<div>
<br /></div>
<div>
You must be wondering why we would need to run a load test from multiple machines rather than just one.</div>
<div>
<br /></div>
<div>
Some things that I have learnt from my experience are:</div>
<div>
<br /></div>
<div>
1) The test should always be run in a distributed manner. When running concurrent connections from a single machine, one could easily reach that machine's network/IO limit, which would inflate the measured response time.</div>
<div>
<br /></div>
<div>
2) Since JMeter creates multiple concurrent threads, more threads mean more CPU contention on the load generator, which also inflates the measured response time.</div>
<div>
<br /></div>
<div>
3) You cannot directly target requests per unit time in your load test, as it is a function of the number of concurrent threads and the server response time.<br />
<br />
4) You can simulate only concurrency with JMeter. For example, if you select 100 threads then JMeter would make sure that there are 100 concurrent requests at any given time. JMeter also reuses these threads for maximum performance.<br />
<br />
5) When load testing an application behind an ELB, make sure that either the ELB is pre-warmed (details <a href="http://aws.amazon.com/articles/1636185810492479#pre-warming" target="_blank">here</a>) or you use a ramp-up. Please note that this is required only when the concurrency you are testing for is very high (AWS does not share any numbers). To know whether you are reaching the limits of the ELB, look at the ELB 5XX metric in CloudWatch for your ELB.<br />
<br />
6) To know which part of your stack is the bottleneck, use a profiler. My favorite is New Relic. It has plugins for almost all software.<br />
<br />
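Points 3 and 4 above can be turned into a rough estimate. The sketch below is my own illustration, not something JMeter provides: it assumes a closed loop with no think time, where each thread sends its next request as soon as the previous one returns, so throughput is roughly threads divided by average response time. The class and method names are made up for this example.<br />

```java
// Back-of-the-envelope throughput estimate for a closed-loop load test.
// All names here are illustrative; nothing below is part of JMeter itself.
final class ThroughputEstimate {

    // Closed loop with no think time: throughput = concurrency / response time.
    static double requestsPerSecond(int threads, double avgResponseSeconds) {
        if (threads <= 0 || avgResponseSeconds <= 0) {
            throw new IllegalArgumentException("threads and response time must be positive");
        }
        return threads / avgResponseSeconds;
    }

    // Inverse: threads needed to reach a target rate at a given response time.
    static int threadsFor(double targetRequestsPerSecond, double avgResponseSeconds) {
        return (int) Math.ceil(targetRequestsPerSecond * avgResponseSeconds);
    }

    public static void main(String[] args) {
        // 100 threads against a server answering in 0.5 s on average
        System.out.println(requestsPerSecond(100, 0.5));
        // threads needed for 1000 requests/s at 0.25 s per response
        System.out.println(threadsFor(1000, 0.25));
    }
}
```

Since the response time itself changes under load, treat the result as a starting point for choosing a thread count, not as a guarantee.<br />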
To try out our product please visit <a href="https://swarm.minjar.com/">https://swarm.minjar.com/</a> . </div>
</div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com1tag:blogger.com,1999:blog-7255850341729335726.post-53056001402691394052013-12-05T03:31:00.001-08:002013-12-05T03:31:03.786-08:00AmazonCloudWatchAsyncClient setting maximum concurrent HTTP connections / throttling <div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
I was looking for ways to stop the Amazon CloudWatch async client from opening too many concurrent connections. At our company we monitor the AWS systems of a lot of customers, so the number of metrics being fetched easily reaches the thousands, which leads to network throttling, packet drops and rejection of requests by AWS.<br />
<br />
Turns out there is a way to do this, which was not apparent at first because I was only looking at the API of AmazonCloudWatchAsyncClient. It is exposed as a property of the ClientConfiguration class, and the way to use it is as follows.<br />
<br />
<br /></div>
<pre style="background-color: #c6ccc6;">AWSCredentials credentials = <span style="color: blue;">new</span> BasicAWSCredentials(obj.getString(<span style="color: maroon;">"AWS_ACCESS_KEY"</span>),obj.getString(<span style="color: maroon;">"AWS_SECRET_KEY"</span>));
ClientConfiguration config = <span style="color: blue;">new</span> ClientConfiguration();
config.setMaxConnections(<span style="color: maroon;">1</span>); <span style="color: green;">// Caps the number of open HTTP connections for this client (here a single connection)</span>
AmazonCloudWatchAsyncClient client = <span style="color: blue;">new</span> AmazonCloudWatchAsyncClient(credentials);
client.setConfiguration(config);</pre>
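The same idea of capping work in flight can also be sketched on the application side, independent of the SDK setting above. The helper below is hypothetical (InFlightLimiter is not an AWS class) and uses only java.util.concurrent: a Semaphore limits how many asynchronous calls are outstanding at once, and a slot is given back when a call finishes, whether it succeeded or failed.<br />

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper, not part of the AWS SDK: caps how many async calls are in flight.
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the caller until a slot is free, then starts the call.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> call) {
        permits.acquireUninterruptibly();
        CompletableFuture<T> future;
        try {
            future = call.get();
        } catch (RuntimeException e) {
            permits.release(); // the call never started, so free the slot
            throw e;
        }
        // Give the slot back once the call completes, successfully or not.
        return future.whenComplete((result, error) -> permits.release());
    }

    int availableSlots() {
        return permits.availablePermits();
    }
}
```

In this sketch the thread that submits work is the one that waits, which gives natural back-pressure when thousands of metric requests are queued up.<br />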
</div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com0tag:blogger.com,1999:blog-7255850341729335726.post-54053948487368483412013-11-11T03:06:00.002-08:002013-11-12T00:56:23.649-08:00Varnish infinite redirect loop for naked domain redirect to www<div dir="ltr" style="text-align: left;" trbidi="on">
You might face an infinite redirect loop with Varnish if your backend server does a redirect for a naked domain, or for that matter any domain.<br />
<div>
<br /></div>
<div>
In our setup Varnish was building the cache key from a combination of <b>hostname</b> and <b>url</b>. This becomes a problem when your Varnish server's hostname is not the same as the served domain (which is the case most of the time).</div>
<div>
<br /></div>
<div>
This leads to Varnish caching a 301/302 redirect under the wrong key. If I hit example.com and my backend responds with a 301 to www.example.com, the key generated in Varnish would be hostname_url and not HOST_url. So when I then hit www.example.com I am served the same cached redirect, as the lookup does not take into account the HOST, which is different for example.com and www.example.com. To overcome this, update the VCL to include HOST in the cache key. I have also added <b>req.http.X-Forwarded-Proto</b> to the hash to overcome the https redirect loop that we face with ELB and Varnish when SSL terminates at the ELB and the backend has been coded to redirect from non-secure URLs (http) to secure URLs (https).</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<pre style="background-color: lightgrey;">sub vcl_hash {
    hash_data(req.http.host);
    hash_data(req.url);
    hash_data(req.http.X-Forwarded-Proto);
    return (hash);
}</pre>
<br />
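To see why the extra hash inputs break the loop, here is a toy model in Java. It is only an illustration with made-up names and has nothing to do with Varnish internals: a map plays the role of the cache, one key function ignores the host and scheme, the other uses the same three inputs as the VCL above.<br />

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a cache lookup; the method names are illustrative and not Varnish APIs.
final class RedirectCacheDemo {

    // Key that ignores the Host header and the scheme, as in the looping setup.
    static String urlOnlyKey(String host, String proto, String url) {
        return url;
    }

    // Key built from the same three inputs as the hash_data() calls above.
    static String fullKey(String host, String proto, String url) {
        return host + "|" + url + "|" + proto;
    }

    public static void main(String[] args) {
        Map<String, String> byUrl = new HashMap<>();
        Map<String, String> byFullKey = new HashMap<>();
        String redirect = "301 -> http://www.example.com/";

        // The backend answers the naked domain with a redirect, which gets cached.
        byUrl.put(urlOnlyKey("example.com", "http", "/"), redirect);
        byFullKey.put(fullKey("example.com", "http", "/"), redirect);

        // The follow-up request for www.example.com finds the cached redirect
        // only when the key ignores the host.
        System.out.println("url-only key hit: "
                + byUrl.containsKey(urlOnlyKey("www.example.com", "http", "/")));
        System.out.println("full key hit: "
                + byFullKey.containsKey(fullKey("www.example.com", "http", "/")));
    }
}
```

With the URL-only key the second request is answered with the cached redirect again, which is the loop; with the full key it is a cache miss and goes to the backend.<br />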
If you are facing infinite redirect loop for https pages behind ELB <a href="http://akashbhunchal.blogspot.in/2013/09/https-redirection-with-elb-and-varnish.html" target="_blank">read this</a>.</div>
</div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com0tag:blogger.com,1999:blog-7255850341729335726.post-44057765208945865322013-10-05T01:36:00.000-07:002013-10-05T06:44:22.219-07:00Why NFS for code sharing is a bad idea on production machines<div dir="ltr" style="text-align: left;" trbidi="on">
One of our customers was facing slowness on their website at peak loads. The architecture looked something like this:<br />
<div>
<br /></div>
<div>
1) LAMP stack.</div>
<div>
2) A single NFS server hosting all the static content as well as PHP code.</div>
<div>
3) 15 application servers hosted behind a load balancer and the NFS server mounted on all of these.</div>
<div>
<br /></div>
<div>
When we started debugging we found that the CPU load on the app servers was never high, even at peak loads, but the CPU load on the NFS server would be very high at those times.</div>
<div>
<br /></div>
<div>
So we suspected NFS to be the issue but were not very sure, because we were using APC with PHP and apc.stat was 0, which means that if the opcode cache for a file is present in APC, Apache should not look that file up in the file system. If that is true, then once the APC opcode cache is warmed up (at peak loads it should be), why were we seeing slowness on the site and high CPU load on the NFS server?</div>
<div>
<br /></div>
<div>
We used a Linux utility called <b>strace</b> to trace all the system calls made by the Apache processes. We attached strace to one of the Apache processes and found that it was making a huge number of <b>stat</b> and <b>lstat</b> calls, the Linux system calls that return a file's metadata and so reveal whether the file has changed. This meant that even with apc.stat=0 the system was still doing lookups for PHP files. Strange.</div>
<div>
<br /></div>
<div>
Turns out the APC documentation clearly mentions that APC does a lookup for files, irrespective of the apc.stat setting, if the file has been included with a relative path (and not an absolute path). Most of the includes in the code were relative, which means apc.stat=0 did not help us :( .</div>
<div>
<br /></div>
<div>
Even if the lookups are happening, isn't NFS supposed to cache the files at each client?</div>
<div>
<br /></div>
<div>
Turns out NFS does not cache the file; it caches just the file metadata (and that only for 3 seconds by default, which can be changed). The reason for caching the metadata (called the file attribute cache) is performance, so that the client does not need to make frequent network calls just to do a stat or metadata lookup. The reason this cache has a finite lifetime is to avoid staleness, which could have disastrous effects in a shared environment. There are ways of caching the files too, using <b>fscache</b>, but that is not recommended in a dynamic environment.</div>
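The behaviour of a short-lived attribute cache can be sketched with a small Java model. This is only an illustration of the idea and not how any NFS client is implemented: AttributeCache, the fake server lookup and the injected clock are all assumptions made for the example. Lookups inside the lifetime are answered locally; once it expires, the next stat costs a round trip again.<br />

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Toy model of a client-side attribute cache with a fixed lifetime.
// Illustration only; this is not how any NFS client is implemented.
final class AttributeCache {

    private static final class Entry {
        final long mtime;
        final long fetchedAtMillis;

        Entry(long mtime, long fetchedAtMillis) {
            this.mtime = mtime;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;
    private int serverCalls = 0;

    AttributeCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the (pretend) modification time, asking the server only when
    // the cached entry is missing or older than the lifetime.
    long stat(String path) {
        long now = clock.getAsLong();
        Entry cached = entries.get(path);
        if (cached != null && now - cached.fetchedAtMillis < ttlMillis) {
            return cached.mtime;
        }
        serverCalls++; // stands in for a network round trip to the server
        Entry fresh = new Entry(fetchFromServer(path), now);
        entries.put(path, fresh);
        return fresh.mtime;
    }

    // Placeholder for the real metadata lookup.
    private long fetchFromServer(String path) {
        return path.hashCode();
    }

    int serverCalls() {
        return serverCalls;
    }
}
```

With a 3 second lifetime and many PHP includes per request across 15 app servers, a large share of these lookups still end up on the single NFS server.<br />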
<div>
<br /></div>
<div>
So the lessons we learnt are these:</div>
<div>
<br /></div>
<div>
1) Never share code over NFS. Never, ever.</div>
<div>
2) Use NFS for just sharing the static content.</div>
<div>
3) Never ever write to a file shared over NFS. For example, many applications have debug logs. If such a log is shared, you can imagine how many network calls need to be made to write logs within the request scope.</div>
<div>
<br /></div>
<div>
After doing the above, the response time of the application at peak loads dropped by 10X and downtime became history. We were able to run the site with half the number of app servers.</div>
<div>
<br /></div>
<div>
The problem with shared code is that the load eventually falls on the NFS server and the app servers just act as dumb terminals. At peak loads you cannot add more servers, as that would slow down the environment even more. It's like putting another straw in a coke bottle which already has 10 straws drawing from it.</div>
<div>
</div>
</div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com1tag:blogger.com,1999:blog-7255850341729335726.post-18763818820406763602013-10-04T19:51:00.000-07:002013-10-04T19:59:05.575-07:00AWS JAVA client examples for ELB and RDS metrics (Asynchronous)<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
When I started working on the AWS JAVA client for fetching the ELB and RDS metrics, I could not find many examples on the internet. So here I am putting down some sample code snippets.<br />
<br />
So first you need to create a Client object</div>
<br />
<br />
<pre style="background-color: lightgrey;">AWSCredentials credentials = <span style="color: blue;">new</span> BasicAWSCredentials(AWS_ACCESS_KEY,AWS_SECRET_KEY);
AmazonCloudWatchAsyncClient client = <span style="color: blue;">new</span> AmazonCloudWatchAsyncClient(credentials);</pre>
<br />
<h3 style="text-align: left;">
<b>ELB:</b></h3>
Now we create the request object. Note that we are asking for an aggregation over 5 minutes: the period is five minutes and the time range is also five minutes, so we would normally get a single data point every time we invoke this method.<br />
<br /></div>
<pre style="background-color: lightgrey;"><span style="color: blue;">private</span> <span style="color: blue;">static</span> GetMetricStatisticsRequest initializeRequestObject(AmazonCloudWatchAsyncClient client){
GetMetricStatisticsRequest request = <span style="color: blue;">new</span> GetMetricStatisticsRequest();
request.setPeriod(<span style="color: maroon;">60</span>*<span style="color: maroon;">5</span>); <span style="color: green;">// 5 minutes</span>
request.setNamespace(<span style="color: maroon;">"AWS/ELB"</span>);
List<Dimension> dims = <span style="color: blue;">new</span> ArrayList<Dimension>();
Dimension dim = <span style="color: blue;">new</span> Dimension();
dim.setName(<span style="color: maroon;">"LoadBalancerName"</span>);
dim.setValue(<span style="color: maroon;">"NAME"</span>); <span style="color: green;">// replace NAME with the name of your load balancer</span>
dims.add(dim);
Date end = <span style="color: blue;">new</span> Date();
request.setEndTime(end);
<span style="color: green;">// Back up 5 minutes</span>
Date beg = <span style="color: blue;">new</span> Date(end.getTime() - <span style="color: maroon;">5</span>*<span style="color: maroon;">60</span>*<span style="color: maroon;">1000</span>);
request.setStartTime(beg);
request.setDimensions(dims);
<span style="color: blue;">return</span> request;
}
</pre>
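The relation between the period and the time range can be checked with plain date arithmetic. The helper below is a sketch with made-up names and is not part of the AWS SDK; it only counts how many whole periods fit into a window. What CloudWatch actually returns can differ, for example when no data was recorded in the window.<br />

```java
import java.util.Date;

// Illustrative helper, not part of the AWS SDK.
final class MetricWindow {

    // Start of a window that ends at 'end' and spans 'minutes' minutes.
    static Date startOfWindow(Date end, int minutes) {
        return new Date(end.getTime() - minutes * 60L * 1000L);
    }

    // How many whole periods fit into the window between start and end.
    static long wholePeriods(Date start, Date end, int periodSeconds) {
        long windowSeconds = (end.getTime() - start.getTime()) / 1000L;
        return windowSeconds / periodSeconds;
    }

    public static void main(String[] args) {
        Date end = new Date();
        Date start = startOfWindow(end, 5);
        // Same values as the request above: a 5 minute window with a 5 minute period
        System.out.println(wholePeriods(start, end, 60 * 5));
    }
}
```

Widening the window to an hour while keeping the 5 minute period would give twelve periods, which is why the handlers should not assume a fixed number of data points.<br />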
<br />
Now fetching data for individual metrics:<br />
<br />
ELB 4XX:<br />
<br />
<br /></div>
<pre style="background-color: lightgrey;"><span style="color: blue;">public</span> <span style="color: blue;">static</span> <span style="color: blue;">void</span> get5MinELB4XX(AmazonCloudWatchAsyncClient client){
client.setEndpoint(<span style="color: maroon;">"monitoring.ap-southeast-1.amazonaws.com"</span>); <span style="color: green;">// endpoints are listed at http://docs.aws.amazon.com/general/latest/gr/rande.html</span>
GetMetricStatisticsRequest request = initializeRequestObject(client);
request.setMetricName(<span style="color: maroon;">"HTTPCode_ELB_4XX"</span>);
request.setUnit(StandardUnit.Count);
List<String> stats = <span style="color: blue;">new</span> ArrayList<String>();
stats.add(<span style="color: maroon;">"Sum"</span>);
request.setStatistics(stats);
client.getMetricStatisticsAsync(request, <span style="color: blue;">new</span> AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() {
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onSuccess(GetMetricStatisticsRequest arg0,
GetMetricStatisticsResult arg1) {
List<Datapoint> data = arg1.getDatapoints();
<span style="color: green;">// Do something with this data here</span>
}
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onError(Exception arg0) {
<span style="color: green;">// log an error</span>
}
});
<span style="color: blue;">return</span>;
}</pre>
<br />
ELB Latency:<br />
<br />
<br /></div>
<pre style="background-color: lightgrey;"><span style="color: blue;">public</span> <span style="color: blue;">static</span> <span style="color: blue;">void</span> get5MinLatency(AmazonCloudWatchAsyncClient client){
client.setEndpoint(<span style="color: maroon;">"monitoring.ap-southeast-1.amazonaws.com"</span>); <span style="color: green;">// endpoints are listed at http://docs.aws.amazon.com/general/latest/gr/rande.html</span>
GetMetricStatisticsRequest request = initializeRequestObject(client);
request.setMetricName(<span style="color: maroon;">"Latency"</span>);
request.setUnit(StandardUnit.Seconds);
List<String> stats = <span style="color: blue;">new</span> ArrayList<String>();
stats.add(<span style="color: maroon;">"Average"</span>);
stats.add(<span style="color: maroon;">"Maximum"</span>);
stats.add(<span style="color: maroon;">"Minimum"</span>);
request.setStatistics(stats);
client.getMetricStatisticsAsync(request, <span style="color: blue;">new</span> AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() {
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onSuccess(GetMetricStatisticsRequest arg0,
GetMetricStatisticsResult arg1) {
List<Datapoint> data = arg1.getDatapoints();
<span style="color: green;">// Do something with this data here </span>
}
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onError(Exception arg0) {
<span style="color: green;">// log an error </span>
}
});
<span style="color: blue;">return</span>;
}</pre>
<br />
ELB Request Count:<br />
<br /></div>
<pre style="background-color: lightgrey;"><span style="color: blue;">public</span> <span style="color: blue;">static</span> <span style="color: blue;">void</span> get5MinRequestCount(AmazonCloudWatchAsyncClient client){
client.setEndpoint(<span style="color: maroon;">"monitoring.ap-southeast-1.amazonaws.com"</span>); <span style="color: green;">// endpoints are listed at http://docs.aws.amazon.com/general/latest/gr/rande.html</span>
GetMetricStatisticsRequest request = initializeRequestObject(client);
request.setMetricName(<span style="color: maroon;">"RequestCount"</span>);
request.setUnit(StandardUnit.Count);
List<String> stats = <span style="color: blue;">new</span> ArrayList<String>();
stats.add(<span style="color: maroon;">"Sum"</span>);
request.setStatistics(stats);
client.getMetricStatisticsAsync(request, <span style="color: blue;">new</span> AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() {
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onSuccess(GetMetricStatisticsRequest arg0,
GetMetricStatisticsResult arg1) {
List<Datapoint> data = arg1.getDatapoints();
<span style="color: green;">// Do something with this data here </span>
}
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onError(Exception arg0) {
<span style="color: green;">// log an error </span>
}
});
<span style="color: blue;">return</span>;
}</pre>
<br />
<h3 style="text-align: left;">
<b>RDS:</b></h3>
<div>
CPU Utilization:</div>
<div>
<br /></div>
<div>
<br /></div>
</div>
<pre style="background-color: lightgrey;"><span style="color: blue;">public</span> <span style="color: blue;">static</span> <span style="color: blue;">void</span> get5MinCPUUtilization(AmazonCloudWatchAsyncClient client){
client.setEndpoint(<span style="color: maroon;">"monitoring.ap-southeast-1.amazonaws.com"</span>); <span style="color: green;">// endpoints are listed at http://docs.aws.amazon.com/general/latest/gr/rande.html</span>
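GetMetricStatisticsRequest request = initializeRequestObject(client);
<span style="color: green;">// initializeRequestObject() above is set up for ELB; RDS metrics live in the AWS/RDS</span>
<span style="color: green;">// namespace and are keyed by DBInstanceIdentifier, so override both here</span>
request.setNamespace(<span style="color: maroon;">"AWS/RDS"</span>);
Dimension rdsDim = <span style="color: blue;">new</span> Dimension();
rdsDim.setName(<span style="color: maroon;">"DBInstanceIdentifier"</span>);
rdsDim.setValue(<span style="color: maroon;">"NAME"</span>); <span style="color: green;">// replace NAME with your DB instance identifier</span>
List<Dimension> rdsDims = <span style="color: blue;">new</span> ArrayList<Dimension>();
rdsDims.add(rdsDim);
request.setDimensions(rdsDims);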
request.setMetricName(<span style="color: maroon;">"CPUUtilization"</span>);
request.setUnit(StandardUnit.Percent);
List<String> stats = <span style="color: blue;">new</span> ArrayList<String>();
stats.add(<span style="color: maroon;">"Average"</span>);
stats.add(<span style="color: maroon;">"Maximum"</span>);
stats.add(<span style="color: maroon;">"Minimum"</span>);
request.setStatistics(stats);
client.getMetricStatisticsAsync(request, <span style="color: blue;">new</span> AsyncHandler<GetMetricStatisticsRequest, GetMetricStatisticsResult>() {
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onSuccess(GetMetricStatisticsRequest arg0,
GetMetricStatisticsResult arg1) {
List<Datapoint> data = arg1.getDatapoints();
<span style="color: green;">// Do something with this data here </span>
}
@Override
<span style="color: blue;">public</span> <span style="color: blue;">void</span> onError(Exception arg0) {
<span style="color: green;">// log an error</span>
}
});
<span style="color: blue;">return</span>;
}</pre>
<br />
Similarly, for other RDS metrics you just need to choose the correct metric name and its unit.</div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com0tag:blogger.com,1999:blog-7255850341729335726.post-87727410365295465282013-10-03T10:16:00.000-07:002013-11-03T08:53:29.834-08:00Building Real Time Analytics Engine (counters) with Redis<div dir="ltr" style="text-align: left;" trbidi="on">
We were assigned the task of building a REAL TIME analytics API. The data to be analyzed were simple counters for multiple events that happen across the website. For example: how many times a particular listing appeared in the search results, how many times the listing page was viewed, how many times Click to Reveal was pressed, etc.<br />
<br />
The problem is simple when you have a small set of listings. But in our case we had around a million listings, each listing had around 5 events, and we had to keep data for each listing for a year.<br />
<br />
The first approach would be to use a NoSQL database like MongoDB. To be honest, MongoDB for just maintaining counters seemed like overkill. Other reasons for not using MongoDB were:<br />
<br />
<br />
<ol style="text-align: left;">
<li>The write throughput would be a cause for concern given the coarse-grained write lock in MongoDB at the time.</li>
<li>No support for multi-document transactions. This becomes very important for an analytics API, as we cannot afford to update one counter and then fail to update the other.</li>
<li>Atomicity is limited to a single document, so there was no way to synchronize counter increments or decrements that span several documents.</li>
<li>Counters are not a first-class data type; increments and decrements go through the general-purpose update operators.</li>
</ol>
So we decided to look for something else and came across Redis (kidding, I always wanted to use Redis, was just looking for an excuse :) ).<br />
<br />
The features that tipped the scale in Redis' favor were.<br />
<br />
<ol style="text-align: left;">
<li>It's crazy fast as the whole db is in memory.</li>
<li>Transactions are supported.</li>
<li>Native support for atomic operations.</li>
<li>Native support for incrementing/decrementing counters.</li>
</ol>
Two very nice libraries exist for Redis analytics, namely <a href="http://amix.dk/blog/post/19714" target="_blank">bitmapist</a> and <a href="http://elcuervo.github.io/minuteman/" target="_blank">Minuteman</a>. These make use of Redis bit operations, which are really awesome and help fit humongous data into a few MBs of memory. But these did not work for us, as they are meant for non-repeating data (i.e. an event for a user is captured only once). That model is binary in nature, which does not fit our counters requirement.<br />
<br />
So we implemented our own solution, and here are some of my learnings from it.<br />
<br />
1) Design for failure: You cannot afford to have an analytics API down. So have a file-backed fallback in case the datastore (Redis) is down, and write a script which replays these logs when your datastore is back up. Have multiple app servers behind a load balancer, which helps in making hot releases without bringing the site down.<br />
<br />
2) All non-counter data is written to a file. We did not know what to do with this additional data, so we just persisted it in a file and pushed it periodically to AWS S3.<br />
<br />
3) Enable AOF for Redis with a flush frequency of 1 sec. This is a must if you don't want to lose data on a reboot of Redis :). The AOF file should be periodically pushed to S3 for higher durability.<br />
<br />
4) Have RDB snapshots at least twice a day and push those to S3.<br />
<br />
5) When the data size becomes a few GBs, Redis takes a few minutes (yes, minutes) to start, as it replays the whole AOF file on startup. The AOF file is nothing but a list of all the write commands that have happened, and Redis replays them serially. Redis returns an error to clients until the AOF has been replayed completely. This is the reason why we need a file-backed fallback when something like this happens.<br />
<br />
6) Use Redis hashes, as they are superbly optimized for space (at the cost of higher time complexity, but this trade-off doesn't hurt much).<br />
<br />
7) Redis is single threaded, so adding more CPUs is of no use. Throw as much memory as you can at it.<br />
<br />
8) No Redis library supports transactions (they cannot, actually) when you run a Redis cluster, as it's not possible to have a transaction across multiple datastores. A Redis cluster works the same way as memcache (i.e. data is sharded on key hashes).<br />
<br />
9) Needless to say, use the smallest possible key:value pairs, as every extra character is repeated across millions of keys and adds up in the memory footprint.<br />
<br />
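The file-backed fallback from point 1 can be sketched roughly as below. This is not our production code: CounterStore is a hypothetical interface standing in for a Redis client, InMemoryStore is a stand-in used only so the example runs without Redis, and the tab-separated log format is an assumption. Each listing is modelled as a hash of event counters, in the spirit of point 6.<br />

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "design for failure": count in the store, fall back to a local log when it is down.
final class CounterWithFallback {

    // Hypothetical interface standing in for a Redis client.
    interface CounterStore {
        void increment(String key, String field) throws IOException;
    }

    // In-memory stand-in for Redis hashes, used only for this example.
    static final class InMemoryStore implements CounterStore {
        private final Map<String, Map<String, Long>> hashes = new ConcurrentHashMap<>();
        volatile boolean down = false;

        @Override
        public void increment(String key, String field) throws IOException {
            if (down) {
                throw new IOException("store unavailable");
            }
            hashes.computeIfAbsent(key, k -> new ConcurrentHashMap<>()).merge(field, 1L, Long::sum);
        }

        long get(String key, String field) {
            return hashes.getOrDefault(key, Collections.<String, Long>emptyMap()).getOrDefault(field, 0L);
        }
    }

    private final CounterStore store;
    private final Path fallbackLog;

    CounterWithFallback(CounterStore store, Path fallbackLog) {
        this.store = store;
        this.fallbackLog = fallbackLog;
    }

    // Records one event; if the store fails, the event is appended to the fallback log.
    void record(String listingId, String event) {
        try {
            store.increment(listingId, event);
        } catch (IOException storeDown) {
            append(listingId + "\t" + event);
        }
    }

    private void append(String line) {
        try {
            Files.write(fallbackLog, (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Replays logged events into the store and removes the log; returns how many were replayed.
    // Simplification: a failure halfway through leaves the log in place, so a retry would double count.
    int replay() {
        try {
            if (!Files.exists(fallbackLog)) {
                return 0;
            }
            List<String> lines = Files.readAllLines(fallbackLog, StandardCharsets.UTF_8);
            for (String line : lines) {
                String[] parts = line.split("\t", 2);
                store.increment(parts[0], parts[1]);
            }
            Files.delete(fallbackLog);
            return lines.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real replay script would also need to be safe against partial failures, for example by truncating the log as it goes.<br />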
The request flow of our application is something like this:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://dl.dropboxusercontent.com/s/0tt3vpsgl30vnkb/flow_diagram.png?token_hash=AAGLJIF5qFr46IbCBaIKPLk9akD4XUm0VhvZIBSq8ICbvQ&dl=1" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://dl.dropboxusercontent.com/s/0tt3vpsgl30vnkb/flow_diagram.png?token_hash=AAGLJIF5qFr46IbCBaIKPLk9akD4XUm0VhvZIBSq8ICbvQ&dl=1" /></a></div>
<br />
<br />
Hope this helps someone to make a decision about using Redis for analytics.<br />
<br />
</div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com0tag:blogger.com,1999:blog-7255850341729335726.post-74007254602987365612013-09-19T01:44:00.001-07:002013-11-12T00:57:59.080-08:00HTTPS redirection with ELB and Varnish (infinite redirect loop)<div dir="ltr" style="text-align: left;" trbidi="on">
Yesterday I was trying to deploy Varnish cache in front of my app server and was faced with the issue of infinite redirect loop for some pages. This problem happened when I tried to use HTTPS for pages which are not supposed to be.<br />
<br />
The problem is like this:<br />
<br />
I am terminating SSL at the ELB; behind that is Varnish, and behind that is the app server. It's standard practice that whenever we hit the app server over HTTPS for a page which is not supposed to be HTTPS, it returns a 302 to the same page URL, the only difference being HTTP instead of HTTPS.<br />
<br />
With SSL termination at the ELB, we can get the original scheme from a header which the ELB sets, called <b>X-Forwarded-Proto</b>. So my app was making use of this and doing a redirection from HTTPS -> HTTP. Everything was good.<br />
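<br />
As a rough illustration of that app-side check, here is a minimal Python sketch. It is not my actual app code: the redirect_target function, the plain-dict headers and the HTTPS_PATHS set are all assumptions made for the example.<br />

```python
# Hypothetical set of pages that are meant to be served over HTTPS.
HTTPS_PATHS = {"/login", "/checkout"}


def redirect_target(headers, host, path, https_paths=HTTPS_PATHS):
    """Return the URL for a 302, or None when no redirect is needed."""
    # The ELB terminates SSL, so the app only sees the original scheme
    # through the X-Forwarded-Proto header.
    scheme = headers.get("X-Forwarded-Proto", "http").lower()
    if scheme == "https" and path not in https_paths:
        return "http://" + host + path
    return None


if __name__ == "__main__":
    print(redirect_target({"X-Forwarded-Proto": "https"}, "example.com", "/myurl"))
```

Note that the decision depends only on the header and the path, which matters for what follows.<br />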
<br />
The only problem was Varnish caching the 302. So this is what happened:<br />
<br />
1) User hits https://example.com/myurl.<br />
2) Varnish sends the request to the backend.<br />
3) Backend sends a 302 redirect to http://example.com/myurl.<br />
4) The response (302) gets cached in Varnish.<br />
<br />
Now the problem is that the scheme is not a part of the Varnish hash key, so effectively the mapping inside Varnish is:<br />
<br />
myurl : 302, irrespective of the scheme (https/http). So from now on, even if the user hits http://example.com/myurl, he would always get a 302 to the same URL. Infinite redirect loop.<br />
<br />
The solution is very simple: just make the scheme part of vcl_hash like this:<br />
<br />
sub vcl_hash {<br />
// hash on the URL as usual<br />
hash_data(req.url);<br />
// also hash on the original scheme reported by the ELB<br />
hash_data(req.http.X-Forwarded-Proto);<br />
return(hash);<br />
}<br />
<br />
<br />
Here the scheme (X-Forwarded-Proto) has been made a part of the key. So the internal mapping in Varnish would look like:<br />
<br />
myurl_http : Actual content<br />
myurl_https : 302 redirect<br />
<br />
Problem solved.<br />
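<br />
The difference the scheme makes in the hash key can be seen in a toy simulation. This is plain Python, not Varnish: the backend, fetch and responses functions are invented for illustration, and the cache is just a dictionary.<br />

```python
def backend(scheme, url):
    """Toy app server: HTTPS requests get a 302, HTTP gets the page."""
    if scheme == "https":
        return "302 -> http://example.com" + url
    return "200 content of " + url


def fetch(cache, scheme, url, hash_scheme):
    """Look up the response in a toy cache, calling the backend on a miss."""
    key = (url, scheme) if hash_scheme else (url,)
    if key not in cache:
        cache[key] = backend(scheme, url)
    return cache[key]


def responses(hash_scheme):
    """Hit /myurl over HTTPS first, then follow the redirect over HTTP."""
    cache = {}
    first = fetch(cache, "https", "/myurl", hash_scheme)
    second = fetch(cache, "http", "/myurl", hash_scheme)
    return first, second


if __name__ == "__main__":
    print("url only:    ", responses(hash_scheme=False))
    print("url + scheme:", responses(hash_scheme=True))
```

With the URL-only key the second (HTTP) request is answered with the cached 302, which is the loop. With the scheme in the key it reaches the backend and gets the page.<br />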
<br />
If you are facing an infinite redirect loop for naked domain redirects (example.com -> www.example.com) or otherwise, <a href="http://akashbhunchal.blogspot.in/2013/11/varnish-infinite-redirect-loop-for.html" target="_blank">read this</a>. </div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com1tag:blogger.com,1999:blog-7255850341729335726.post-33061368421999658102013-08-18T10:40:00.001-07:002013-08-18T10:40:37.542-07:00Multi bit rate video streaming with Cloudfront and Flow player<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
Some time back I was trying to get multi-bit rate streaming working with Cloudfront, but could not find a working sample online. The reason you need this is that not all your users have the same bandwidth, and even for those with high bandwidth the ACTUAL bandwidth might fluctuate, as your packets have to travel over the internet, which your ISP does not control.<br />
<br />
The way this works with Cloudfront is that you need to have the same video encoded at different bit rates present in your S3 bucket. Flowplayer (or any other player supporting multi bit rate) keeps switching between the different bit rate videos depending upon the ACTUAL bandwidth. The users would definitely notice the quality of the video changing, but that is better than showing the dreaded "Buffering" wheel :)<br />
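<br />
The switching idea can be sketched in a few lines of Python. This is not Flowplayer's actual algorithm, only a simplified assumption of how a player might choose: take the highest encoded bit rate that the measured bandwidth can sustain. The pick_bitrate function and the sample numbers are made up for illustration.<br />

```python
def pick_bitrate(available_kbps, measured_kbps):
    """Pick the highest encoded bit rate the measured bandwidth can sustain.

    Falls back to the lowest encoding when even that is too heavy, since
    playing something beats playing nothing.
    """
    candidates = sorted(available_kbps)
    if not candidates:
        raise ValueError("need at least one encoded bit rate")
    fitting = [rate for rate in candidates if rate <= measured_kbps]
    return fitting[-1] if fitting else candidates[0]


if __name__ == "__main__":
    encodings = [200, 1000]  # kbps, example values
    for bandwidth in (150, 600, 1500):
        print(bandwidth, "kbps measured ->", pick_bitrate(encodings, bandwidth))
```

A real player re-measures the bandwidth during playback and repeats this choice, which is why the quality changes on the fly.<br />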
<br />
So the first thing that you need is your video encoded at different bit rates. I used ffmpeg for this, but you are free to try something else.<br />
<br />
<span style="background-color: #f3f3f3;"> <b>ffmpeg -i &lt;input_file&gt; -b &lt;bit_rate&gt; -r 30 -ab 384k -ar 44100 -s 624x256 -ac 2 -y &lt;output_file&gt;</b></span><br />
<b><br /></b>
<b>bit_rate</b> : This is the target bit rate, example : 2200k<br />
<b>output_file</b> : This is the output file name, example : sample_2200k.flv<br />
<br />
In this example let's create two files, one with a 200 kbps and one with a 1000 kbps bit rate. Let's name them <b>multi_bit_rate_200k.flv</b> and <b>multi_bit_rate_1000k.flv</b>.<br />
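<br />
If you have more than a couple of bit rates, generating the commands from a script saves some typing. Here is a minimal Python sketch that only builds and prints the two commands. The ffmpeg_command helper and the input.mp4 file name are placeholders made up for the example; the flags are the same ones as in the command above.<br />

```python
import shlex


def ffmpeg_command(input_file, bit_rate, output_file):
    """Build the ffmpeg argument list for one target bit rate.

    The flags mirror the command shown above; nothing is executed here.
    """
    return [
        "ffmpeg", "-i", input_file,
        "-b", bit_rate,
        "-r", "30", "-ab", "384k", "-ar", "44100",
        "-s", "624x256", "-ac", "2", "-y",
        output_file,
    ]


if __name__ == "__main__":
    # "input.mp4" is a placeholder; substitute your own source video.
    for rate in ("200k", "1000k"):
        cmd = ffmpeg_command("input.mp4", rate, "multi_bit_rate_%s.flv" % rate)
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run the encodes, each list could be passed to subprocess.run(cmd, check=True), assuming ffmpeg is installed and on your PATH.<br />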
<br />
Now create a bucket (let's call it <b>videostreamingakash</b>) in S3 and put both the files there.<br />
<br />
We now need to create a streaming distribution for serving these videos. Go <a href="http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/WorkingWithStreamingDistributions.html" target="_blank">here</a> for the steps to create one. While creating the distribution, pick the bucket that we created earlier as the origin.<br />
<br />
<br />
<div>
Now we need to have an HTML file with the following content:</div>
<blockquote class="tr_bq">
<br />
<html><br />
<body><br />
<div class="box black"><a<br />
href=""<br />
style="display:block;width:640px;height:360px;"<br />
id="player"><br />
</a></div></body><br />
</html><br />
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js" type="text/javascript" charset="utf-8"></script><br />
<script src="flowplayer-3.2.11.min.js"></script><br />
<script><br />
flowplayer("player", "http://releases.flowplayer.org/swf/flowplayer-3.2.16.swf", {<br />
clip: {<br />
<br />
urlResolvers: 'bwcheck',<br />
provider: 'rtmp',<br />
autoPlay: false,<br />
scaling: 'fit',<br />
<br />
// available bitrates and the corresponding files. We specify also the video width<br />
// here, so that the player does not use a too large file. It switches to a<br />
// file/stream with larger dimensions when going fullscreen if the available bandwidth permits.<br />
bitrates: [<br />
{<br />
url: "flv:multi_bit_rate_200k", width: 720, bitrate: 200,<br />
// this is the default bitrate, the playback kicks off with this and after that<br />
// Quality Of Service monitoring adjusts to the most appropriate bitrate<br />
isDefault: true<br />
},<br />
{ url: "flv:multi_bit_rate_1000k", width: 720, bitrate: 1000 }<br />
]<br />
},<br />
plugins: {<br />
<br />
// bandwidth check plugin<br />
bwcheck: {<br />
url: "flowplayer.bwcheck-3.2.12.swf",<br />
<br />
// CloudFront uses Adobe FMS servers<br />
serverType: 'fms',<br />
<br />
// we use dynamic switching, the appropriate bitrate is switched on the fly<br />
dynamic: true,<br />
<br />
<b> netConnectionUrl: 'rtmp://s3ht5aoild7wia.cloudfront.net/cfx/st'</b>,<br />
<br />
// show the selected file in the content box. This is not used in real installations.<br />
onStreamSwitchBegin: function (newItem, currentItem) {<br />
$f().getPlugin('content').setHtml("Will switch to: " + newItem.streamName +<br />
" from " + currentItem.streamName);<br />
$('#stream_data').text("Will switch to: " + newItem.streamName +<br />
" from " + currentItem.streamName);<br />
},<br />
onStreamSwitch: function (newItem) {<br />
$f().getPlugin('content').setHtml("Switched to: " + newItem.streamName);<br />
$('#stream_data').text("Switched to: " + newItem.streamName);<br />
}<br />
},<br />
<br />
// RTMP streaming plugin<br />
rtmp: {<br />
url: "http://releases.flowplayer.org/swf/flowplayer.rtmp-3.2.12.swf",<br />
<b>netConnectionUrl: 'rtmp://s3ht5aoild7wia.cloudfront.net/cfx/st'</b> },<br />
<br />
// a content box so that we can see the selected video dimensions. This is not used in real<br />
// installations.<br />
content: {<br />
url: "http://releases.flowplayer.org/swf/flowplayer.content-3.2.8.swf",<br />
top: 0, left: 0, width: 250, height: 150,<br />
backgroundColor: 'transparent', backgroundGradient: 'none', border: 0,<br />
textDecoration: 'outline',<br />
style: {<br />
body: {<br />
fontSize: 14,<br />
fontFamily: 'Arial',<br />
textAlign: 'center',<br />
color: '#ffffff'<br />
}<br />
}<br />
}<br />
}<br />
});<br />
</script> </blockquote>
<div>
<br /></div>
<div>
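Replace <b>netConnectionUrl </b>with your STREAMING distribution. Also, you need to download <b>flowplayer-3.2.11.min.js </b>from the Flowplayer website, along with <b>flowplayer.bwcheck-3.2.12.swf</b> (which the code above loads from a relative path), to make the above work.</div>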
<div>
<br /></div>
<div>
If you look at the block:</div>
<div>
bitrates: [</div>
<div>
{</div>
<div>
url: "flv:multi_bit_rate_200k", width: 720, bitrate: 200,</div>
<div>
// this is the default bitrate, the playback kicks off with this and after that</div>
<div>
// Quality Of Service monitoring adjusts to the most appropriate bitrate</div>
<div>
isDefault: true</div>
<div>
},</div>
<div>
{ url: "flv:multi_bit_rate_1000k", width: 720, bitrate: 1000 }</div>
<div>
]</div>
<div>
We have declared the two videos with their bitrates. Flowplayer uses this for switching the streams.</div>
<div>
<br /></div>
<div>
Just open the HTML in the browser of your choice and you are done :) .</div>
<div>
<br /></div>
<div>
<br /></div>
<br />
<br />
<br /></div>
Akash Bhunchalhttp://www.blogger.com/profile/11207979421679230417noreply@blogger.com0