Thursday, August 15, 2013

Faster Rendering of Charts in Oracle Application Express

I was recently helping a customer whom I had met at the ilOUG Tech Days 2013.  This customer had some questions about line charts and how to easily create multi-series charts without having to write a query per series.  This customer was kind enough to create a test case on apex.oracle.com.  When running his test case, I quickly noticed a separate problem with his charts - they were glacially sloooooowww.  While the page itself would render quickly, it would take a significant amount of time to render the chart and it would sometimes even stall the browser.  Not cool.

Nothing looked odd about his queries for the chart series, so I proceeded to manually run the queries in SQL Commands, and that's where the problem became apparent:  he was attempting to render far too much data.  His queries were returning thousands of data points.  Why is that a problem?  Well, you first have to understand how basic chart rendering works in Oracle Application Express:
  1. A page may be rendered with one or more chart regions
  2. For each chart on the page, an AJAX call is made back to the server to fetch the chart definition and data
  3. Using JavaScript APIs, the chart is rendered within each region
It was step #2 that was the problem for this customer.  The more data points you have, the longer it takes to transmit them from the server back to the browser.  And the more data you bring back, the more the JavaScript APIs must digest - which takes even longer.

To demonstrate this, I created a sample application using the average daily temperature for my fair city, Columbus, Ohio, USA.  I have a simple table populated with the average daily temperature from 01-JAN-1995 to 27-JUL-2013 - 6,783 rows in all.

I've disabled the option to show all 6,783 data points in a chart - I'll save you from stalling your browser.  But here is a screen shot of the fully plotted data with all 6,783 data points:


As you can see from the Firebug output, it took a long time to load - 11.93 seconds to load 45.9KB of chart data.  And then it probably took another 15 seconds in the browser to parse, digest, and render the corresponding chart.

So how do we improve this situation?  It's simple really - just reduce the amount of data.  As I told this customer, if he reduced the data points by 50% or even 80%, he would still get a fairly equivalent visualization of the same underlying data, but with fewer data points and faster load times for his end users.

To achieve this, we can use the MOD (modulus) function in SQL.  The original query I was using in the chart series was:

select null link,
       date_recorded,
       temperature_c
  from weather_data
 order by date_recorded

To get every other row, I can use the SQL analytic function ROW_NUMBER to get back an ordered set of row numbers, and then use the time-tested trick of the MOD function in the WHERE clause to only include the rows where the result of the modulus function is 0:

with wd as
(select date_recorded,
        temperature_c,
        row_number() over (order by date_recorded) as r
   from weather_data)
select null link,
       date_recorded, 
       temperature_c
  from wd
 where mod(r,2) = 0
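
The effect of this filter is easy to verify outside of APEX.  Here is a minimal sketch, using a tiny stand-in for the WEATHER_DATA table in SQLite via Python's sqlite3 module (SQLite spells MOD(r, 2) as r % 2, and needs version 3.25+ for window functions - everything else is the same shape as the query above):

```python
import sqlite3

# In-memory database with a small stand-in for the WEATHER_DATA table.
conn = sqlite3.connect(":memory:")
conn.execute("create table weather_data (date_recorded text, temperature_c real)")
conn.executemany(
    "insert into weather_data values (?, ?)",
    [(f"2013-01-{d:02d}", 20.0 + d) for d in range(1, 11)],  # 10 rows
)

# Number the rows in date order, then keep only every other one.
sample = conn.execute("""
    with wd as
      (select date_recorded,
              temperature_c,
              row_number() over (order by date_recorded) as r
         from weather_data)
    select date_recorded, temperature_c
      from wd
     where r % 2 = 0
""").fetchall()

print(len(sample))   # 5 - half of the original 10 rows survive
print(sample[0])     # ('2013-01-02', 22.0)
```

The sampled result keeps every second reading in date order, which is exactly what the chart series will plot.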

And now to make this completely variable, I added a select list item to the page and a dynamic action to refresh the chart when the select list item is changed.  The final query is:

with wd as
(select date_recorded,
        temperature_c,
        row_number() over (order by date_recorded) as r
   from weather_data)
select null link,
       date_recorded, 
       temperature_c
  from wd
 where mod(r,nvl(:P1_SKIP_ROWS,3)) = 0

Running this same sample application but now with a divisor of 10 in the modulus function (hence, returning only 1/10th of the data) will give us a chart that looks like:


The basic shape of the chart is the same, just less dense, with fewer data points.  As Firebug showed, only 5.6KB of data came back, in 1.43 seconds, and it took far less time to render in the browser.

This isn't the solution for all problems.  If you need an exact chart, or you're looking for anomalies in the data, then this solution may not work for you.  However, in many cases where charts are used in Web applications, the amount of data used to render them can be dramatically reduced without any meaningful loss - with only positive results for your end users.
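
If anomalies do matter, one alternative worth sketching (not covered in the post above, so treat the details as an assumption) is to split the series into buckets with NTILE and keep each bucket's MIN and MAX - extremes then survive no matter which rows they land on, unlike every-Nth-row sampling.  Again in SQLite via Python's sqlite3 module, against the same WEATHER_DATA shape:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table weather_data (date_recorded text, temperature_c real)")
rows = [(f"2013-01-{d:02d}", 20.0) for d in range(1, 9)]
rows[3] = ("2013-01-04", 45.0)   # a single hot spike that MOD-sampling could drop
conn.executemany("insert into weather_data values (?, ?)", rows)

# Assign each row to one of 4 buckets in date order, then keep each
# bucket's coldest and hottest reading.
extremes = conn.execute("""
    with wd as
      (select date_recorded,
              temperature_c,
              ntile(4) over (order by date_recorded) as bucket
         from weather_data)
    select bucket,
           min(temperature_c) as lo,
           max(temperature_c) as hi
      from wd
     group by bucket
     order by bucket
""").fetchall()

print(extremes)   # bucket 2 reports hi = 45.0, so the spike is kept
```

This returns two values per bucket instead of one value per Nth row, so the payload is still small, but no spike can fall between the sampled points.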

Feel free to try this yourself in this sample application.



5 comments:

Doitordie said...

Awesome, was looking for something like this a few months back.

Alan Arentsen said...

Hi Joel!
Thanks for this interesting blogpost. It immediately drew my attention.

Your solution is probably enough for most customers. But one disadvantage is that, in this way, you may lose important data, which results in a graph representing something different from the 'so called' truth.

In a science environment often filters are used for smoothing data. Such as e.g. the Savitzky-Golay filter. (http://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter_for_smoothing_and_differentiation)

During filtering, data is also lost, depending on the kind of filter that you use. I think it can also really improve speed.

However it's a lot more work :(

kind regards
Alan Arentsen

Alan Warren said...

Hi Joel,

Interesting post, but I don't think transmission of the data was in and of itself a bottleneck.

As seen at https://getfirebug.com/wiki/index.php/Net_Panel (Under Request Timeline), the purple time component is time spent waiting for the server to begin sending a response, then the green part is time on the wire. So the majority of the 11.93s latency was on the server side.

Since you've identified that the query isn't the bottleneck I'd suggest that the code serializing the resultset into json/xml or whatever is causing the slowdown on the serverside.

Of course, once you've got the data on the wire there's still the issue of javascript performance on the client side to parse and process the data. It might be possible to lighten the parsing load by using json instead of xml -- which I'm just guessing at due to the hefty JS competition between Firefox and Chrome.

Cheers,
Alan

Roel Hartman said...

Two years ago I stumbled upon the exact same issue and used a slightly different approach to minimize the number of records/ points returned. See
<a href="http://roelhartman.blogspot.nl/2011/10/analytic-function-to-rescue-again.html">this blog post</a>. The nice thing is that in that construct the user can enter a parameter for the number of records/points he wants to see - and wants to wait for.

Joel R. Kallman said...

Alan,

Interesting observation. I thought about the same thing last night. Earlier today, I asked Hilary on our team to take a look at this for APEX 5.0. Not only could the XML stream be condensed (too many spaces and newlines, which add up when you have 6K points), but there is definitely something server-side which is taking an unnecessarily long amount of time.

Thanks again for pointing this out.

Joel