Monday, April 14, 2014

Jaspersoft with AWS Redshift Experince, Learnings and Problems

This was my first experience with AWS Redshift and Jaspersoft. Both the technologies are good and easy to use but can sometimes throw up issues which are difficult to decode/fix . Here are some of the things that I faced/discovered using both.

Redshift:

  1. Has a decent performance when the queries that you are making do not have joins in them. For eg if there is one table which has 1 Bn rows and another which has just 50k rows and you do a join on a column which is not a sort key, then the query may take anywhere between 5-10min which for me is a no go for a dashboard (a user would not wait for 5 mins for a chart to load). This is because Redshift is very disk intensive and the caching of blocks in memory in Redshift is still not up to the mark so it has to read a lot of blocks on disk for every query.
  2. Always try to denormalize data as much as possible. Joins are a crime. It will suck the life out of you and the query.
  3. Concurrency is not upto the mark in redshift. The performance goes down exponentially with every new parallel execution.
  4. A LIKE match would almost always make the query slow (no surprise here as even an OLTP DB would do the same). But the degradation is significant.
  5. We have a query which when run makes the whole cluster unresponsive :) . We still dont know the reason but it happens. Again no clue why.
  6. We used to run VACCUM everyday after importing data into the cluster, but on numerous occasion we saw that the whole cluster would hang because VACCUM stalled on one of the nodes. One has to do a cluster reboot to fix this. Also the command does not time out so it may hang your cluster (READ/WRITE) for almost a day if not rebooted. So we moved this command to just once a week so that we do not hit this problem that often.
  7. Always make use of the SORT key in your queries. It works like a charm.
  8. Build aggregates for the charts as time taken by queries is not predictable when run on huge tables. We build aggregates for all the charts and dont run queries on big tables at run time.
Jaspersoft:
It also has its fair share of issues but most of them are due to Redshift :). The following is for HTML 5 reports which we built using the web wizard and NOT through report designer.

  1. CREATE VIEW may throw an error or may not load at all when redshift is having performance issues or there is a lot of load on redshift, still dont know the reason why but it happens.
  2. If you change the chart type in VIEW , it may not reflect in the REPORT. Recreate the report.
  3. There is no support for having a single axis chart with multiple measures with tooltip showing all the measures together. I found a work around and is present here
  4. There is no API to clear the Query Cache. I tried a lot of command line tools like phantomjs to script it but it did not work as every page has an executionKey and jaspersoft was smart enough to know that it was a scripted attempt :( . I had to write a selenium test case to do the same. As the cache is in memory and NOT DB I guess one would have to modify the JAVA code. It uses EH Cache for caching the queries.
  5. One Good thing about jaspersoft is that even if you leave a report without it completely loading the query still works in the background and populates the cache. This helped me in automating cache warming by simulating user clicks (selenium) without waiting for the page to load.
  6. There is no eager/ preemptive caching of queries and is on demand. So one has to write a selenium test case to warm up the cache. This is a much needed thing for redshift which is not good with concurrent queries.
Hope this will help somebody.

Friday, April 11, 2014

Jaspersoft Shared/Common Tooltip for non mutli-axis/single axis graphs (HTML5)

We were facing this problem wherein we needed four measures to be shown on the same graph (line graph). By default jaspersoft has two chart types, single axis and multi axis. The single axis graph shows only one y axis and one has to hover/move from one measure to the other to view the data in the tooltip.

This was very inconvenient for the end users as they were not able to see all the measures in a single tooltip for the same point on the x axis (date in our case) making it difficult for them to compare the data for all the measures on the graph.

We tried using multi axis chart which by default has a shared tooltip, but shows myltiple y axis which we did not need  as it was useless for us as  one, all measures were comparable to each other and two, it gave  a very wrong impression to the end user as a graph with a very low value could appear over a graph with a very high values due to difference in scales. This created a lot of confusion.

We tried all the forums and blogs but did not get an answer so decided to get our hands dirty. Turns out the fix/change is very easy (it took me 3 days though).

For those who do not know jaspersoft uses highcharts as the charting library for dynamic (HTML5) charts. The highcharts API has a property called "shared" for "tooltip". If we enable this the tooltip becomes shared. So we just needed to find the js file where it was being set and we found it :) .

Go to file scripts/adhoc/highchart.datamapper.js, method name "getCommonSeriesGeneralOptions", line number 342 and change

options.tooltip.shared = HDM.isDualOrMultiAxisChart(extraOptions.chartState.chartType);
to
options.tooltip.shared = true;

Beware, this will make the tooltip shared for all the graphs. If you want more granularity then u need to create another method similar to  HDM.isDualOrMultiAxisChart() and return true or false accordingly.