Tuesday, March 8, 2016

How gmail/google does multiple user logins in the same browser

Have you ever wondered how Gmail, or Google in general, is able to log you in with two different accounts in the same browser (albeit in different tabs)? For example, a@gmail.com and b@gmail.com can be logged in in the same browser at the same time.

Now why is this a big deal? Or to rephrase it: what stops any other website from doing the same and allowing multiple user logins in the same browser?

The culprit is the cookie. For those who don't know how sessions work in web applications: once you visit/log in to a website, a session is created on the server and assigned a unique ID (one per user in our case). This ID is written to a cookie, which the browser saves locally. When a request is sent from the browser to the server, the browser also sends along the cookies that the server set earlier, and this is how the server understands who is calling the service: is it a@gmail.com or b@gmail.com?

A cookie can be set at two levels (scope):


  • Domain: This means that every request originating from this browser for this domain sends along all the cookies which were set for the domain. For example: www.google.com.
  • Path (or URL): A cookie can also be scoped to a path. What this means is we can set separate cookies for /url1 and /url2 (on the same domain). When the browser sends a request for /url1, it does not send the cookies set for /url2 to the server.
Most websites use a combination of the two.
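
To make this concrete, here is a tiny sketch using Python's standard http.cookies module (the cookie name and values are made up) showing what the two kinds of Set-Cookie headers look like:

    from http.cookies import SimpleCookie

    # Domain-scoped cookie: sent with every request to this domain.
    domain_cookie = SimpleCookie()
    domain_cookie["SID"] = "session-for-a"
    domain_cookie["SID"]["domain"] = "www.example.com"
    domain_cookie["SID"]["path"] = "/"

    # Path-scoped cookie: sent only with requests under /url1.
    path_cookie = SimpleCookie()
    path_cookie["SID"] = "session-for-b"
    path_cookie["SID"]["path"] = "/url1"

    # Prints headers roughly like:
    #   Set-Cookie: SID=session-for-a; Domain=www.example.com; Path=/
    #   Set-Cookie: SID=session-for-b; Path=/url1
    print(domain_cookie.output())
    print(path_cookie.output())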

Gmail uses the second approach to enable multiple login sessions in the same browser, and the reason it is able to do that is that it is a single-page app. The URL of Gmail remains the same irrespective of the page you are on. For example, the Inbox is https://mail.google.com/mail/u/0/#inbox and Sent items is https://mail.google.com/mail/u/0/#sent.

Now whatever comes after the # (the fragment) is never sent to the server as part of the URL. So effectively, no matter which page you are on, the browser thinks you are on the same URL and sends only the cookies set for that path.

This is how Gmail works. If you are logged in as two users, the URLs would be https://mail.google.com/mail/u/0/ and https://mail.google.com/mail/u/1/. Google assigns this number 0, 1, 2 to each user session and sets a separate path-scoped cookie for each, and voila, you can now log in as 2, 3, 4 different users in the same browser. To test this, just change the number manually in one of the tabs; you will see yourself logged in as the other user.
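
Gmail's actual implementation is of course not public, but here is a minimal sketch of the idea in Flask (the route names, cookie name, and the in-memory session store are all made up for illustration):

    from flask import Flask, request, make_response

    app = Flask(__name__)

    # Toy session store: user index -> email. A real app would keep proper
    # server-side sessions keyed by the cookie value.
    sessions = {}

    @app.route("/login/<email>")
    def login(email):
        user_index = len(sessions)   # 0 for the first login, 1 for the next...
        sessions[user_index] = email
        resp = make_response("Logged in as %s, visit /mail/u/%d/" % (email, user_index))
        # Scope the cookie to this user's path only: the browser will send it
        # back only for requests under /mail/u/<n>/.
        resp.set_cookie("SID", "session-for-" + email, path="/mail/u/%d/" % user_index)
        return resp

    @app.route("/mail/u/<int:n>/")
    def inbox(n):
        # Only the cookie scoped to /mail/u/<n>/ arrives here.
        return "Inbox of %s, cookie seen: %s" % (sessions.get(n), request.cookies.get("SID"))

Log in twice, open /mail/u/0/ and /mail/u/1/ in two tabs, and each tab carries only its own session cookie.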

This is possible only with a single-page application like Gmail, where the URL never changes. So if you want your application to support multiple logins, go single page!



Saturday, March 7, 2015

AWS Autoscaling with F5 Big IP dynamic node add/remove

In continuation of the F5 series (setting up F5 BIG-IP HA on AWS EC2, setting up a Virtual Server), I am going to share my experience using AWS Auto Scaling with F5 BIG-IP.

AWS EC2 has an awesome feature called Auto Scaling, which one can use to maintain a pool of servers at a desired size or scale the cluster up or down based on CPU utilization, network IO, etc. This gives us the ability to size our cluster according to the traffic, resulting in optimal utilization of resources and cost savings.

The tricky part is adding/removing nodes on the F5 as instances get added or removed. Since the F5 needs to know about the nodes being added and removed, we need to integrate AWS Auto Scaling with F5.

I used the F5 REST API to achieve this. AWS Auto Scaling publishes events about scaling activities through SNS. All we need to do is listen :)
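
Conceptually, the listener looks something like the sketch below. This is not the published implementation; the pool name, credentials, and exact iControl REST paths are assumptions based on how BIG-IP's REST API is commonly used, so treat it as a starting point:

    import json

    import boto3
    import requests

    BIGIP = "https://<big-ip-management-ip>"   # placeholder
    AUTH = ("admin", "<password>")             # placeholder credentials
    POOL = "~Common~web_pool"                  # assumed pool name
    PORT = 80                                  # assumed service port

    ec2 = boto3.client("ec2")

    def lookup_private_ip(instance_id):
        # Resolve the private IP of the instance mentioned in the event.
        resp = ec2.describe_instances(InstanceIds=[instance_id])
        return resp["Reservations"][0]["Instances"][0]["PrivateIpAddress"]

    def handle_sns_notification(raw_body):
        # SNS wraps the Auto Scaling event as a JSON string in "Message".
        message = json.loads(json.loads(raw_body)["Message"])
        event = message.get("Event", "")
        node_ip = lookup_private_ip(message["EC2InstanceId"])

        if event == "autoscaling:EC2_INSTANCE_LAUNCH":
            # Register the new node and add it to the pool.
            requests.post(BIGIP + "/mgmt/tm/ltm/node", auth=AUTH, verify=False,
                          json={"name": node_ip, "address": node_ip})
            requests.post(BIGIP + "/mgmt/tm/ltm/pool/" + POOL + "/members",
                          auth=AUTH, verify=False,
                          json={"name": "%s:%d" % (node_ip, PORT)})
        elif event == "autoscaling:EC2_INSTANCE_TERMINATE":
            # Remove the pool member first, then the node itself.
            requests.delete(BIGIP + "/mgmt/tm/ltm/pool/" + POOL +
                            "/members/~Common~%s:%d" % (node_ip, PORT),
                            auth=AUTH, verify=False)
            requests.delete(BIGIP + "/mgmt/tm/ltm/node/~Common~" + node_ip,
                            auth=AUTH, verify=False)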

I have created a micro service which listens for such events and does the needful on F5. I have published my implementation here.

Please feel free to use it. A simple thank you would suffice if you find it useful and it saves you some time :)

Thursday, January 15, 2015

Creating a Virtual Server on F5 BIG IP HA Active/Passive (Active/Standby) on AWS EC2 / VPC

Please visit my previous blog on how to set up an Active/Standby F5 BIG-IP on AWS here. That blog also covers some basics of F5 terminology.

Now it's time to get your load balancer up and running. You can run multiple load-balanced endpoints on a single F5; these are called Virtual Servers.

1: Prerequisites:


  1. Make sure the backend servers that you want to load balance are ready.
  2. Make sure that the security groups have the required ports open both for F5 as well as backend server subnets.
  3. Make sure that the services you want to load balance on these nodes are running :)

2: Setup Nodes on F5:

  1. Go to Local Traffic > Nodes > Node List. Click Create.
  2. Give the desired Name and Address.
  3. Leave Health Monitors as Node Default.
  4. Click Finished.
If the node is reachable you should see the node status as a blue box.

3: Create a Pool on F5:
  1. Go to Local Traffic > Pools > Pool List. Click Create.
  2. Enter the desired Name.
  3. Select one of the Health Monitors. This is used to check the health of the backend servers.
  4. Select the Load Balancing Method.
  5. Under New Members, select Node List and add the desired nodes to the pool.
  6. Click Finished.
  7. If the pool has been set up properly, it should have a green status.
4: Creating the Virtual Server on F5:
  1. Add an additional private IP to the ENI which is on the external VLAN. This becomes your load balancer IP.
  2. Go to Local Traffic > Virtual Servers > Virtual Server List. Click Create.
  3. Enter the desired Name and Type.
  4. Source Address is the IP range this VS should accept. To allow access from all IPs enter 0.0.0.0/0.
  5. Destination Address should be the new private IP that we just created.
  6. Service Port is the port your service is running on.
  7. Source Address Translation should be Auto Map. Please note that if you don't do this your Virtual Server will not work and requests will never reach your backend servers.
  8. From Default Pool select the pool that we created above.
  9. For this exercise, leave the other settings as they are.
If the VS has been set up properly, it should show a green status.

Hit the VS IP and voila! Your application is load balanced.
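
If you would rather script these clicks, the same objects can be created through the BIG-IP REST API. Here is a rough sketch (the names, IPs, and credentials are placeholders, and the payloads are the commonly documented iControl REST ones, so verify them against your version):

    import requests

    BIGIP = "https://<management-ip>"   # placeholder
    s = requests.Session()
    s.auth = ("admin", "<password>")    # placeholder credentials
    s.verify = False                    # BIG-IP ships a self-signed cert

    # Sections 2-3 equivalent: a pool with an HTTP monitor and two members.
    s.post(BIGIP + "/mgmt/tm/ltm/pool", json={
        "name": "web_pool",
        "monitor": "http",
        "loadBalancingMode": "round-robin",
        "members": [{"name": "10.0.2.10:80"}, {"name": "10.0.2.11:80"}],
    })

    # Section 4 equivalent: the virtual server on the secondary private IP,
    # with SNAT Auto Map so return traffic comes back through the F5.
    s.post(BIGIP + "/mgmt/tm/ltm/virtual", json={
        "name": "vs_web",
        "destination": "/Common/10.0.1.100:80",
        "mask": "255.255.255.255",
        "source": "0.0.0.0/0",
        "pool": "web_pool",
        "sourceAddressTranslation": {"type": "automap"},
    })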

To learn how to integrate Auto Scaling with F5, go here.

Wednesday, January 14, 2015

Deploying F5 BIG IP HA Active/Passive (Active/Standby) on AWS EC2 / VPC

BIG-IP is a big name in the world of Application Delivery Platforms. It is used primarily as a load balancer/front end for hosting a number of applications. It is modular in nature and has a variety of modules for optimized content delivery, application firewalling, etc. The full set of features is listed here.

A few years back F5 was a hardware-only box which one had to buy and wire to switches/machines. They have now come up with a cloud offering called BIG-IP VE (VE stands for Virtual Edition). One can now choose either to run their hardware or to run the VE in the cloud.

We had to set up F5 VE for one of our customers on AWS. Coming from a non-networking, non-physical-server background, it was difficult for us to understand the F5 networking terminology and map it to AWS, which as we all know is completely abstracted.

F5 provides documentation on how to host F5 on EC2, and it's pretty good; it's available here. But the sad part is that it assumes one understands F5 completely, and it is best suited for people who have hands-on experience running F5 hardware boxes. I followed it and was able to set up the F5, but with some gotchas which I would like to share with you in this article. I am also going to brief you on the basics of F5 and how it works.

Some terms that one should know:

VLAN (Virtual LAN):

We all understand what a LAN is. A virtual LAN is used to create further subsections of a LAN. For example, in the case of a switch, all the ports on it constitute a single broadcast domain, so if one machine sends out a broadcast message it is placed on all the ports of the switch. This leads to a lot of unnecessary traffic.

Since a switch is a layer 2 device and is not aware of the Network layer, all the ports are part of the same network. Suppose we have a very big network in which 1000 machines are connected via a switch. What if I want to segregate this network further, say into three groups like SALES, MARKETING, and DEVELOPMENT? I want to avoid cross-group traffic, which is unavoidable with a switch, as it is not aware of logical subnets (even if I create one for each group, which is possible but not recommended). So if a machine in SALES is looking for another machine within that group, it sends out an ARP request which is received by all the machines on the switch, not just the SALES subnet. This causes a lot of unnecessary traffic.

To avoid this, some switches come with a facility to create virtual LANs. It allows us to group ports (physical switch ports) together into a virtual network. So now we can say that ports 1, 2, 3 belong to VLAN A and ports 4, 5, 6 belong to VLAN B. There is no longer a single broadcast domain: if an ARP request is sent by a machine in VLAN A, it stays within that VLAN (those ports, to be precise). Now we can have different subnets for each VLAN, and these subnets can only talk to each other through a router. This is usually achieved by adding tags to the ports.

This way we can reduce a lot of unnecessary traffic by limiting our broadcast domain to a smaller section.

AWS does not support VLANs. So for us a VPC subnet is as good as a VLAN and can be used as such, though nothing stops us from creating a pseudo VLAN which is smaller than a subnet.

Virtual Server:

A Virtual Server in F5 is equivalent to an ELB, except that with an ELB we get a domain name and not an IP, whereas with F5 we get an IP. A single F5 box can run multiple such load-balanced endpoints, so a single F5 box can serve all the reverse proxy requirements in a VPC. As the name implies, it is a logical server and not an actual one, identified by an IP (EIP or private IP). Every Virtual Server has a pool of servers which it load balances; this is similar to the instances behind an ELB. Since multiple private IPs can be attached to a single ENI, the number of Virtual Servers we can run on an F5 is limited by the number of private IPs and ENIs the instance can have.

Self IP:

An F5 box can be part of multiple VLANs. Think of a Self IP as the IP the F5 box uses to identify itself on a VLAN, since a single ENI could have multiple private IPs attached to it which may be used by a Virtual Server or something else. This IP is static in nature and does not migrate in case of failover.

Floating IP:

For an HA setup, the VLAN IPs too need to migrate from one box to the other. This is achieved by assigning a floating IP to each VLAN. This IP migrates from one F5 box to the other in case of failover. The IP movement happens by reassigning the private IP from box A to box B through AWS API calls.
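
Under the hood this is just a secondary private IP reassignment. Done by hand with boto3 it would look roughly like this (the ENI ID, IP, and region are placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Move the floating IP to the ENI of the box that is taking over.
    # AllowReassignment lets the IP be taken away from the old (failed) box.
    ec2.assign_private_ip_addresses(
        NetworkInterfaceId="eni-0123456789abcdef0",   # external-VLAN ENI of box B
        PrivateIpAddresses=["10.0.1.100"],            # the floating IP
        AllowReassignment=True,
    )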

Traffic Group: 

In the case of an HA setup, the entity that moves from one box to the other is the Traffic Group. All the floating IPs and VS IPs are part of it. We can also force the movement of the traffic group manually through the console.

Now let's get to the actual setup of an HA cluster:

1: Prerequisites:

  1. An AWS account with a VPC with at least three subnets. For this setup let's create a VPC with CIDR 10.0.0.0/16 and three subnets: 10.0.0.0/24 (management), 10.0.1.0/24 (external), 10.0.2.0/24 (internal).
  2. Two Security Groups as mentioned here.
2: Launch Box A:
  1. Go here. Select the AMI which suits you.
  2. For subnet, select the management subnet and assign a private IP (for example 10.0.0.2). Add two more network interfaces, one each from the external and internal subnets, and assign one private IP to each (for example 10.0.1.2 and 10.0.2.2).
  3. For security group select allow-all-traffic.
  4. Once the machine is launched, assign an EIP to the management ENI. This is done so that the management port is accessible over the internet for configuration.
3: Setting up the admin password:
  1. Log in to the instance that you just launched. Use your key pair (.pem file) and the elastic IP address of your EC2 instance: $ ssh -i <username>-aws-keypair.pem root@<elastic IP address of EC2 instance>.
  2. At the command prompt, type tmsh modify auth password admin.
  3. To ensure that the system retains the password change, type tmsh save sys config, and then press Enter.
4: VLAN setup:
  1. Log in at https://<EIP>. Enter the admin username/password that we created in the last step.
  2. A setup wizard will come up. Complete the first 2-3 steps (license activation), then quit the wizard. Don't finish the rest of the steps, as we will be doing those manually.
  3. Go to Network > VLAN > VLAN List. Click Create.
  4. Enter the name internal.
  5. For Interface select 1.2 and for Tagging select Untagged. Click the Add button.
  6. Click Finished.
  7. Repeat the same steps to create another VLAN by the name external. For the interface select 1.1.
5: Self IP setup:
  1. Go to Network > Self IPs. Click Create.
  2. Set Name to self_ip_external, IP Address to 10.0.1.2, Netmask to 255.255.255.0, VLAN to external, Port Lockdown to Allow All. Select the default Traffic Group.
  3. Click Finished.
  4. Do the same for the internal VLAN.
6: Setup AWS Credentials: Enter AWS credentials under System > Configuration > AWS.

7: Getting ready for HA setup:
  1. Go to Device Management > Devices > Device Connectivity > Config Sync. Select the external VLAN IP.
  2. Go to Device Management > Devices > Device Connectivity > Failover Network. Click Add under Failover Unicast Configuration. Use the management IP (10.0.0.2) here.
8: Set up Box B: Follow all the above steps to set up the other box. Needless to say, the IPs will be different for this box :)

9: HA cluster setup:
  1. On Box A go to Device Management > Device Trust > Peer List. Click Add. Use the management IP of Box B and the admin username/password. Follow the rest of the steps.
  2. Now both the boxes are paired.
  3. Go to Device Management > Device Groups. Click Create.
  4. Put in any name to identify the device group which will participate in the failover cluster.
  5. Group Type is Sync-Failover.
  6. Drag both devices from right to left.
  7. Select Full Sync and Network Failover.
  8. You may have to sync the config once to Box B: go to Device Management > Overview and sync Box A to the group once.
  9. Your HA cluster setup is done. One box will show ACTIVE and the other one STANDBY.
10: Creating Floating IPs:
  1. This has to be done ONLY on Box A.
  2. Add one more secondary IP to each of the 10.0.1.0/24 and 10.0.2.0/24 subnet ENIs on one of the boxes through the AWS console.
  3. Go to Network > Self IPs. Click Create.
  4. Enter the name self_ip_floating_internal for the internal VLAN. Select the same values as before (with the new IP that we created above). Select traffic-group-1 (floating) for Traffic Group.
  5. Do the same for the external VLAN.
Now we have the HA setup ready. To test the movement of the VLAN floating IPs, force a failover and observe in the AWS console: the floating private IPs move from one box to the other.

Any Virtual Server that we create will have its IP as part of this default floating traffic group. The group and its failover objects (like Virtual Servers and IPs) can be seen under Device Management > Traffic Groups > Failover Objects.


To learn more about creating a Virtual Server go here.
To learn how to integrate Auto Scaling with F5, go here.

Wednesday, December 3, 2014

Migrating a single node Cassandra to multi node on AWS (EC2) with Datastax Enterprise Edition and OpsCenter

Migration of a single-node Cassandra to an HA cluster:


As an example, we will migrate a single-node Cassandra cluster to a 4-node DataStax Enterprise Edition Cassandra cluster. More often than not we start with Cassandra on a single node, and when the time comes to scale we move it to a cluster for HA. We also like to have a monitoring layer on top of it: OpsCenter, a GUI tool from DataStax to manage one or more Cassandra clusters.


The first thing that we are going to do is launch a new Ubuntu 14.04 VM in AWS. This machine will act as a template for the other machines to be launched in the cluster. Once all the desired applications are installed on it, we will create an AMI out of it.


Once a plain-Jane Ubuntu 14.04 instance has been launched, follow these steps to create your first Cassandra node:
  1. Install Python 2.6+.
  2. Now install DataStax Cassandra.
    1. echo "deb http://username:password@debian.datastax.com/enterprise stable main" | sudo tee -a /etc/apt/sources.list.d/datastax.sources.list where username and password are the DataStax account credentials from your registration confirmation email. You need to register to be able to download. Registration is free.
    2. curl -L https://debian.datastax.com/debian/repo_key | sudo apt-key add - . Note: If you have trouble adding the key, use http instead of https.
    3. sudo apt-get update
    4. sudo apt-get install dse-full (Installs only DataStax Enterprise and the DataStax Agent.)


We now have a DataStax Cassandra node as well as the DataStax Agent installed on this machine. The agent is required by OpsCenter to monitor each Cassandra node remotely.


Now we need to copy all the Cassandra files to the new machine. You could either attach an empty EBS volume to the new machine and copy all the files from the old machine to it, or just detach the EBS volume from the old machine and attach it to the new machine. Suppose that after this, all the Cassandra files are at /data/cassandra. This EBS volume should have good IO throughput; use a Provisioned IOPS volume if possible. The instance should have at least 8 GB of memory and at least 4 CPUs.


Now we need to set up the new machine to use the Cassandra data at the new location. Do the following on the new node:
  1. sudo service dse stop
  2. Go to /etc/dse/cassandra/cassandra.yaml and configure the following properties:
    1. cluster_name: 'cluster1' . In case you want to change the cluster name, then put the new name here. One more step is required to make this effective, which is covered in the following instructions.
    2. num_tokens: 256
    3. data_file_directories:  - /data/cassandra
    4. commitlog_directory: /data/cassandra/commitlog
    5. saved_caches_directory: /data/cassandra/saved_caches
    6. endpoint_snitch: Ec2Snitch # or the desired one
    7. - seeds: “x.x.x.x”. Set this to the private IP of the current machine (the primary/seed node). Seeds is a list of servers which a machine contacts at bootstrap to learn metadata about the cluster; it is used only at startup. Since this is the first machine, we put its own IP as the seed.
    8. listen_address: y.y.y.y. Set this to the private IP of the current node.
    9. rpc_address: y.y.y.y. Set this to the private IP of the current node.
    10. auto_bootstrap: false
  3. Go to /etc/dse/dse.yaml  and configure the following properties:
    1. delegated_snitch: org.apache.cassandra.locator.Ec2Snitch
  4. Now we need to set up the data center name. Edit cassandra-rackdc.properties and make the following change:
    1. dc_suffix=cassandra_dc1. Every node which is part of the same data center should have the same suffix. If you want to create more than one data center within the cluster, provide a different suffix like cassandra_dc2, etc.
  5. sudo service datastax-agent start
  6. sudo service dse start
  7. Now check the status of the node: sudo nodetool status. Note that it will take 1-2 minutes to start and may throw an exception initially. Once it's up it should show only one node in the list with state UN (Up/Normal).
  8. Verify that your tables are intact: cqlsh <private ip> -u cassandra -p cassandra (the default credentials). See the verification sketch after this list.
  9. If you need to update the cluster name then make sure that you have done Step 2a first and then do the following:
    1. cqlsh> UPDATE system.local SET cluster_name = 'cluster1' where key='local';
    2. sudo nodetool flush
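
For step 8, here is a small verification sketch using the DataStax Python driver (pip install cassandra-driver). The IP and credentials are placeholders; if you haven't enabled authentication, drop the auth_provider:

    from cassandra.cluster import Cluster
    from cassandra.auth import PlainTextAuthProvider

    auth = PlainTextAuthProvider(username="cassandra", password="cassandra")
    cluster = Cluster(["<private ip>"], auth_provider=auth)
    session = cluster.connect()

    # Confirm the cluster name and data center the node reports.
    for row in session.execute("SELECT cluster_name, data_center FROM system.local"):
        print(row.cluster_name, row.data_center)

    # List the keyspaces to confirm your tables survived the move.
    for keyspace in cluster.metadata.keyspaces:
        print(keyspace)

    cluster.shutdown()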


Now we have the single-node Cassandra ready. Take an AMI of the above machine; on each machine launched from this AMI, delete everything under /data/cassandra so that the node starts clean. Make sure that all the machines launched in this tutorial share the same Security Group and that all traffic within that Security Group is open. This is very important; otherwise the nodes will not be able to communicate with each other.


Launch more machines from the AMI taken from the first machine and follow the above steps. Just change the following:
  1. - seeds: “x.x.x.x”. This should be the private IP of the first machine.
  2. auto_bootstrap: true


When you launch each node and start all the services, run nodetool status. Every node's initial status will be UJ (Up/Joining), which changes to UN (Up/Normal) once the node has joined the cluster completely. In this example we have a 4-node Cassandra cluster.


Now we need to change the replication factor of our cluster to 3. Log in to any of the boxes and do the following:
  1. Get the data center name: cqlsh> use system; select data_center from local;
  2. cqlsh> ALTER KEYSPACE <keyspace_name> WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', '<datacenter_name>' : 3 };


Very Important: After this we need to run nodetool -h <private_ip> repair on each machine of the cluster. In our experience this may take days, so run the command under nohup. The cluster can be used without any issues even while this is running. It took us almost 20 days to repair all the nodes.


Now we need to set up OpsCenter to monitor the cluster. Launch a new machine (even an m1.small would do) and install OpsCenter on it. Set up the DataStax apt repo as described above on this new machine and run sudo apt-get install opscenter. This machine should have a public IP. Make sure you run this server in the same Security Group as the Cassandra cluster, or open all traffic between the SGs if running in a different SG.


Once OpsCenter is installed, execute sudo service opscenter start.


OpsCenter can be accessed via https://<IP>/opscenter.


To add the cluster to OpsCenter, do the following:
  1. New Cluster -> Manage Existing Cluster.
  2. Enter the IP of any one of the nodes. This should be the private IP of the node. We are assuming that OpsCenter and the nodes are all in the same VPC and communication is open between them.
  3. It may ask for the .pem private key for the node. Provide the key.
  4. Done. In a few minutes it will show the cluster status and all the cluster metrics.


Our cluster with 4 nodes and a replication factor of 3 is now ready, along with OpsCenter to monitor it.

Friday, July 4, 2014

Difference between NAT vs PROXY vs ROUTER

To understand the subtle and not-so-subtle differences between the three, one needs to know the OSI model and where in that model each of these operates.

Here is a brief overview of the OSI model.


The left column defines the data unit for each layer. For example, the smallest data block at the Network layer is a packet, and the smallest data block at the Transport layer is a segment.

Protocols like TCP and UDP function at the Transport layer, whereas IP and routing work at the Network layer.

Now coming to what operates in which layer:

Router

See the diagram below:


There are two machines, each on a different network, connected through a router. If machine A wants to communicate with machine B, it needs to create a TCP connection. When the connection is created, both machines A and B are unaware of the presence of a router in between. What this means is that on both machines, if we look up the IP:PORT combination, it would be 10.0.0.1:1234 - 11.0.0.1:4567, i.e. the router is transparent.

In this case the router does not participate at the Transport (TCP) layer and just acts as a relay for packets. All it knows is where to send each packet. It does not modify the packets and does not require the response to come back through it. In fact, if there is another router somewhere in between these networks, which is often the case if we assume two machines connecting over the internet, the response may come back through some other route. The router operates at the Network layer and is unaware of the Transport layer (TCP) protocol.

NAT (Network Address Translator)


Now a NAT is nothing but an intelligent router. We all know that IPv4 addresses are in short supply, and these days even something as dumb as a fridge has an IP. It is also a fact that most devices are consumers and not producers, i.e. they do not need a publicly resolvable IP. For example, all the workstations inside a building need not have a public IP assigned to them.

This is where NAT comes into play. What a NAT does is hide the machines behind it from the internet. All the machines go through the NAT to access the outside world.

 See the image below:

Machine A still thinks that it is talking to machine B directly (10.0.0.1:1234 - 11.0.0.1:4567). What the NAT does is replace the source IP:PORT header of the packet it receives from A with its own IP (w.x.y.z) and a random port of its own (7897). When the packet reaches machine B, B thinks it came from the NAT's IP.

Since a response always goes back to the source, B sends the response packet back to the NAT at the same port that was written into the header (7897). When the NAT received the packet from A, it assigned it that random port (7897) and kept the mapping in a table called the NAT table. So when the response comes back from machine B, the NAT just does a reverse lookup in the same table and forwards it to the intended recipient (machine A). This way more than one machine can access the internet through the NAT.
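
The NAT table is essentially a two-way port mapping. Here is a toy sketch of the bookkeeping (ignoring port collisions, timeouts, and the checksum recalculation discussed below):

    import random

    NAT_IP = "w.x.y.z"
    nat_table = {}  # NAT port -> (private IP, private port)

    def outbound(src_ip, src_port, dst_ip, dst_port):
        # Rewrite the outgoing packet's source and remember the mapping.
        nat_port = random.randint(1024, 65535)
        nat_table[nat_port] = (src_ip, src_port)
        return (NAT_IP, nat_port, dst_ip, dst_port)

    def inbound(dst_port):
        # Reverse lookup: find the original sender for a response packet.
        return nat_table[dst_port]

    # Machine A (10.0.0.1:1234) talks to B (11.0.0.1:4567) through the NAT.
    packet = outbound("10.0.0.1", 1234, "11.0.0.1", 4567)
    print(packet)               # ('w.x.y.z', <random port>, '11.0.0.1', 4567)
    print(inbound(packet[1]))   # ('10.0.0.1', 1234), i.e. back to machine A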

One important point to note here is that at each layer of the OSI model there is a checksum to determine if the packet/segment is valid. The same applies here: if the NAT changes the source IP:PORT combination, it needs to recalculate the checksums at both the Transport and Network layers. This is some additional work for the NAT.

Proxy:  

A proxy works at the Transport layer and above and is aware of the protocol. It is not transparent in nature: it actually creates two connections, one each with the source and the destination. Machine A does not even know about machine B. For machine A, the proxy is the only thing it is talking to, and it does not care how and where the proxy gets its data.

See the image below:

In the above picture, server A does not even know the IP of server B. All it knows is the IP and port of the proxy server. The proxy server creates two connections, one with server A and one with server B. This happens at the Transport layer. Similarly, server B does not know the IP of server A; for it, the proxy is the source.

Examples of proxy servers are load balancers like HAProxy, Nginx, Apache, AWS ELB, and F5 BIG-IP. They hide the backend from the outside world and do a lot of nifty stuff like load balancing, optimization, etc.

The above is a very high-level distinction between the three, and the lines have in fact blurred. "Proxy" is loosely used even for a NAT and vice versa. This just gives you a starting point to learn more.

Thursday, June 12, 2014

Jaspersoft Ad Hoc Cache clear and build programatically using Django and Selenium

Jaspersoft is amazing in what it does, but really sucks when it doesn't work the way you want it to. I am referring to the Ad Hoc cache behavior in Jaspersoft. Those familiar with it will know that all we can set for the query cache is the TTL, which is fine, but the problem comes when one has to clear the cache on demand or programmatically (from a script or an ETL).

AFAIK, Jaspersoft uses Ehcache with Hibernate to save the queries and their results. This has a few problems.

  1. The query cache is lazy in nature, so one has to visit a report for the cache to warm up for that query.
  2. Every query has its own expiry time depending on when it was hit and the configured cache duration. This means one report could show old data while another shows new data, depending on which was cached when. This causes a lot of confusion for the end users.
  3. There is no HTTP endpoint one can hit to invalidate the cache. One has to log in as an admin and click the "Clear Cache" button.

The above imposes a lot of limitations. In our case we run an ETL every day to load data into our Redshift DW. The ETL takes a couple of hours to run, as it includes some aggregates too. Since the datastore for our application is a data warehouse, queries are not exactly super fast. What we wanted was a system which would invalidate the cache as soon as the ETL finishes and then build (warm up) the cache, so that end users neither face the initial slowness of the system nor are shown stale data.

Since the only option was to go through the browser, we tried to automate that. PhantomJS was our first choice, as it runs on Linux and does not need a real browser or display to automate user behavior. This is important, as all our production systems are Linux and devoid of any desktop environment.

But this did not work (at least for us). The security in Jaspersoft is really good and makes use of hidden execution keys to ensure that a request is coming from a proper source. We tried all the headers a normal browser adds to our requests, but Jaspersoft would not hand out an execution key, without which we could not hit the Clear Cache HTTP endpoint.

The only hope now was to use Selenium, which requires a real browser and hence a machine with a desktop environment. We had to launch one Windows box (t1.micro) in our cloud just for this purpose.

The script first clears the cache and then rebuilds it. To launch the script on demand, we had to wrap it in a web framework which accepts calls from the ETL and kicks off the process.
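
For reference, the skeleton of such a Selenium script looks roughly like the sketch below. The login URL and field IDs match typical Jaspersoft installs, but the cache page URL and button locator are placeholders you would have to discover by inspecting your own instance:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        # Log in as an administrator (j_username/j_password are the usual
        # Spring Security field names; verify them on your install).
        driver.get("http://<jasperserver-host>/jasperserver-pro/login.html")
        driver.find_element(By.ID, "j_username").send_keys("superuser")
        driver.find_element(By.ID, "j_password").send_keys("<password>")
        driver.find_element(By.ID, "submitButton").click()

        # Open the Ad Hoc cache admin page and press "Clear Cache".
        # Both the URL and the button locator below are placeholders.
        driver.get("http://<jasperserver-host>/jasperserver-pro/<adhoc-cache-page>")
        driver.find_element(By.ID, "<clear-cache-button-id>").click()
    finally:
        driver.quit()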

We used Django as the web framework, and the source code is available here. It has both Windows and Linux ports. To run the Linux port you need a desktop environment running on that machine.

Project is available here for download.