Wednesday, December 3, 2014

Migrating a single node Cassandra to multi node on AWS (EC2) with Datastax Enterprise Edition and OpsCenter

Migrating a single-node Cassandra deployment to an HA cluster:


As an example, we will migrate a single-node Cassandra cluster to a 4-node DataStax Enterprise Edition Cassandra cluster. More often than not we start with Cassandra on a single node, and when the time comes to scale we move it to a cluster for HA. We also want a monitoring layer on top of it: OpsCenter, a GUI tool from DataStax for managing one or more Cassandra clusters.


The first thing we are going to do is launch a new Ubuntu 14.04 VM in AWS. This machine will act as a template for the other machines launched in the cluster: once all the desired applications are installed on it, we will create an AMI from it.


Once a plain-jane Ubuntu 14.04 instance has been launched, follow these steps to create your first Cassandra node (a consolidated shell session is shown after the list):
  1. Install Python 2.6+.
  2. Now install DataStax Cassandra.
    1. echo "deb http://username:password@debian.datastax.com/enterprise stable main" | sudo tee -a /etc/apt/sources.list.d/datastax.sources.list, where username and password are the DataStax account credentials from your registration confirmation email. You need to register to be able to download; registration is free.
    2. curl -L https://debian.datastax.com/debian/repo_key | sudo apt-key add -. Note: if you have trouble adding the key, use http instead of https.
    3. sudo apt-get update
    4. sudo apt-get install dse-full (Installs only DataStax Enterprise and the DataStax Agent.)
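
For convenience, here is the same install sequence as a single shell session (substitute your own DataStax credentials for username and password):

    # Add the DataStax Enterprise apt repository (credentials come from the registration email)
    echo "deb http://username:password@debian.datastax.com/enterprise stable main" \
        | sudo tee -a /etc/apt/sources.list.d/datastax.sources.list

    # Import the repository signing key (switch to http if https gives trouble)
    curl -L https://debian.datastax.com/debian/repo_key | sudo apt-key add -

    # Install DataStax Enterprise and the DataStax Agent
    sudo apt-get update
    sudo apt-get install dse-full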


We now have a DataStax Cassandra node, as well as the DataStax Agent, installed on this machine. The agent is required by OpsCenter to monitor each Cassandra node remotely.


Now we should copy all the Cassandra files to the new machine. You can either attach an empty EBS volume to the new machine and copy all the files from the old machine to it, or simply detach the EBS volume from the old machine and attach it to the new one. For the rest of this tutorial, assume all the Cassandra files end up in /data/cassandra. This EBS volume should have good IO throughput; use a provisioned-IOPS volume if possible. The instance should have at least 8 GB of memory and at least 4 CPUs.
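
If you go the copy route, a minimal sketch with rsync follows; old-cassandra-host, the key path, and the source directory /var/lib/cassandra (the package default) are assumptions to adapt to your setup, and Cassandra should be stopped on the old node first so the files are consistent:

    # On the new machine: pull the data directory from the old node.
    # Host name, key path, and source directory are placeholders.
    sudo mkdir -p /data/cassandra
    sudo rsync -avz -e "ssh -i ~/.ssh/mykey.pem" \
        ubuntu@old-cassandra-host:/var/lib/cassandra/ /data/cassandra/
    # Make sure the cassandra user ends up owning the files
    sudo chown -R cassandra:cassandra /data/cassandra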


Now we need to set up the new machine with the Cassandra data at its new location. Do the following on the new node:
  1. sudo service dse stop
  2. Go to /etc/dse/cassandra/cassandra.yaml and configure the following properties (a consolidated fragment of these edits is shown after this list):
    1. cluster_name: 'cluster1'. If you want to change the cluster name, put the new name here. One more step is required to make this effective, which is covered in step 9 below.
    2. num_tokens: 256
    3. data_file_directories:  - /data/cassandra
    4. commitlog_directory: /data/cassandra/commitlog
    5. saved_caches_directory: /data/cassandra/saved_caches
    6. endpoint_snitch: Ec2Snitch # or the desired one
    7. - seeds: "x.x.x.x". Set this to the private IP address of the current machine. Seeds are a list of servers that a node contacts at bootstrap to learn metadata about the cluster; they are used only at startup. Since this is the first machine, we put its own IP as the seed.
    8. listen_address: y.y.y.y. Set this to the current node's private IP address.
    9. rpc_address: y.y.y.y. Set this to the current node's private IP address.
    10. auto_bootstrap: false
  3. Go to /etc/dse/dse.yaml  and configure the following properties:
    1. delegated_snitch: org.apache.cassandra.locator.Ec2Snitch
  4. Now we need to set up the datacenter name. Go to cassandra-rackdc.properties (in the same directory as cassandra.yaml) and make the following change:
    1. dc_suffix=cassandra_dc1. Every node that is part of the same datacenter should have the same suffix. If you want to create more than one datacenter within a cluster, give each one a different suffix, e.g. cassandra_dc2. Note that with Ec2Snitch the suffix is appended to the region-derived datacenter name, so the full name you see later (e.g. in nodetool status) will include the region.
  5. sudo service datastax-agent start
  6. sudo service dse start
  7. Now find out the status of the node: sudo nodetool status. Note that it will take 1-2 minutes to start and may throw an exception initially. Once it's up, it should show only one node in the list, with state UN (Up/Normal).
  8. Verify that your tables are intact: cqlsh <private ip> -u <username> -p <password> (the default credentials are cassandra/cassandra).
  9. If you need to update the cluster name, make sure that you have done step 2a first, and then do the following:
    1. cqlsh> UPDATE system.local SET cluster_name = 'cluster1' where key='local';
    2. sudo nodetool flush
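
Pulling the cassandra.yaml edits together, the relevant fragment looks roughly like this (x.x.x.x is the seed's private IP, y.y.y.y is this node's private IP, and the seed_provider block shown is the standard one the - seeds line lives in):

    cluster_name: 'cluster1'
    num_tokens: 256
    data_file_directories:
        - /data/cassandra
    commitlog_directory: /data/cassandra/commitlog
    saved_caches_directory: /data/cassandra/saved_caches
    endpoint_snitch: Ec2Snitch
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "x.x.x.x"
    listen_address: y.y.y.y
    rpc_address: y.y.y.y
    auto_bootstrap: false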


Now we have the single-node Cassandra ready. Take an AMI of the above machine; on every new machine launched from this AMI, delete everything under /data/cassandra so that the node starts clean. Make sure that all the machines launched in this tutorial share the same Security Group, and that all traffic within that Security Group is open. This is very important; otherwise the nodes will not be able to communicate with each other.


Launch more machines from the AMI taken from the first machine and follow the same steps as above, changing only the following (the resulting fragment is shown after this list):
  1. - seeds: "x.x.x.x". This should be the private IP of the first machine.
  2. auto_bootstrap: true
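
On each additional node, the cassandra.yaml delta relative to the fragment above is just the following (listen_address and rpc_address still vary per machine, as before):

    - seeds: "x.x.x.x"      # private IP of the first (seed) machine
    auto_bootstrap: true    # let the new node stream data as it joins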


As you launch each node and start all the services, run nodetool status. Each node's initial status will be UJ (Up/Joining), which changes to UN (Up/Normal) once the node has fully joined the cluster. In this example we end up with a 4-node Cassandra cluster.
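
For illustration only, a healthy 4-node cluster looks roughly like this in nodetool status; the datacenter name, addresses, loads, ownership figures, and host IDs below are made-up placeholders, and the exact columns vary by version:

    Datacenter: us-east_cassandra_dc1
    =================================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address     Load     Tokens  Owns   Host ID  Rack
    UN  10.0.0.11   1.1 GB   256     25.3%  ...      1a
    UN  10.0.0.12   1.0 GB   256     24.9%  ...      1b
    UN  10.0.0.13   1.2 GB   256     25.6%  ...      1a
    UN  10.0.0.14   1.0 GB   256     24.2%  ...      1b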


Now we need to change the replication factor of our cluster to 3. Log in to any of the boxes and do the following:
  1. Get the datacenter name: cqlsh> use system; select data_center from local;
  2. cqlsh> ALTER KEYSPACE <keyspace_name> WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', '<datacenter_name>' : 3 };
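
As a concrete (hypothetical) example, if your application keyspace is named mykeyspace and step 1 returned us-east_cassandra_dc1, the statement would be:

    cqlsh> ALTER KEYSPACE mykeyspace
           WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'us-east_cassandra_dc1' : 3 };

Repeat this for every application keyspace whose data you want replicated.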


Very important: after this, we need to run nodetool -h <private_ip> repair on each node of the cluster. In our experience this can take days, so run the command with nohup, as shown below. The cluster can be used without any issues even while this is running. It took us almost 20 days to repair all the nodes.
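
A minimal way to run the repair detached (the log path is arbitrary):

    # Survives the SSH session ending; check ~/repair.log for progress.
    # Repeat on each node of the cluster, one at a time.
    nohup nodetool -h <private_ip> repair > ~/repair.log 2>&1 &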


Now we need to set up OpsCenter to monitor the cluster. Launch a new machine (even an m1.small will do) and install OpsCenter on it. Set up the DataStax apt repo on this new machine as described above and run sudo apt-get install opscenter. This machine should have a public IP. Make sure you run this server in the same Security Group as the Cassandra cluster, or open all traffic between the SGs if running in a different SG.


Once OpsCenter is installed, execute sudo service opscenter start.
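
In short, on the OpsCenter machine (this assumes the DataStax apt repository and key were added exactly as shown earlier):

    sudo apt-get update
    sudo apt-get install opscenter
    sudo service opscenter start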


OpsCenter can be accessed via http://<IP>:8888/ (by default OpsCenter listens on port 8888; use https only if you have configured SSL).


To add the cluster to OpsCenter, do the following:
  1. New Cluster  -> Manage Existing Cluster.
  2. Enter the private IP of any of the nodes. We are assuming that OpsCenter and the nodes are all in the same VPC and that communication is open between them.
  3. It may ask for the .pem private key for the node. Provide the key.
  4. Done. In a few minutes it will show the cluster status and all the cluster metrics.


Our cluster with 4 nodes and a replication factor of 3 is now ready, along with OpsCenter to monitor it.