Nothing looked odd about his queries for the chart series, so I proceeded to manually run the queries in SQL Commands, and that's where the problem became apparent: he was attempting to render far too much data. His queries were returning thousands of data points. Why is that a problem? Well, you first have to understand how basic chart rendering works in Oracle Application Express:
- A page may be rendered with one or more chart regions
- For each chart on the page, an AJAX call is made back to the server to fetch the chart definition and data
- Using JavaScript APIs, the chart is rendered within each region
To demonstrate this, I created a sample application using the average daily temperature for my fair city, Columbus, Ohio, USA. I have a simple table populated with the average daily temperature from 01-JAN-1995 to 27-JUL-2013 - 6,783 rows in all.
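The post doesn't show the table itself, but a minimal sketch of what it might look like follows. The table and column names are taken from the queries below; the data types and the sample row are assumptions, not the actual definition used in the sample application:

-- Hypothetical DDL for the sample table; column names match the chart queries,
-- but the real structure in the sample application may differ.
create table weather_data (
  date_recorded  date          not null,
  temperature_c  number(5,2)
);

-- One row per day, for example:
insert into weather_data (date_recorded, temperature_c)
values (to_date('01-JAN-1995', 'DD-MON-YYYY'), -2.8);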
I've disabled the option to show all 6,783 data points in a chart - to spare you from stalling your browser. But here is a screenshot of the fully plotted data with all 6,783 data points:
As you can see from the Firebug output, it took a long time to load - 11.93 seconds to fetch 45.9KB of chart data. And then it probably took another 15 seconds in the browser to parse, digest, and render the corresponding chart.
So how do we improve this situation? It's simple really - just reduce the amount of data. As I told this customer, if he reduced the data points by 50% or even 80%, he would still get a fairly equivalent visualization of the same underlying data, but with fewer data points and faster load times for end users.
To achieve this, we can use the MOD (modulus) function in SQL. The original query I was using in the chart series was:
select null link, date_recorded, temperature_c
  from weather_data
 order by date_recorded
To get every other row, I can use the SQL analytic function ROW_NUMBER to get back an ordered set of row numbers, and then use the time-tested trick of the MOD function in the WHERE clause to only include the rows where the result of the modulus function is 0:
with wd as
  (select date_recorded, temperature_c,
          row_number() over (order by date_recorded) as r
     from weather_data)
select null link, date_recorded, temperature_c
  from wd
 where mod(r, 2) = 0
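To see what the MOD filter actually keeps, here is a small standalone illustration (not part of the sample application) that numbers six rows and retains only the even-numbered ones:

-- Standalone illustration: generate 6 rows, keep only even row numbers (2, 4, 6).
with wd as (
  select level as r
    from dual
 connect by level <= 6
)
select r
  from wd
 where mod(r, 2) = 0;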
And to make the divisor completely variable, I added a select list item to the page and a dynamic action to refresh the chart whenever the select list value changes. The final query is:
with wd as
  (select date_recorded, temperature_c,
          row_number() over (order by date_recorded) as r
     from weather_data)
select null link, date_recorded, temperature_c
  from wd
 where mod(r, nvl(:P1_SKIP_ROWS, 3)) = 0
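Before wiring the divisor into the chart, a quick way to sanity-check the reduction is to compare row counts for a given divisor. A rough sketch, using a literal divisor of 10 to match the example below:

-- Row counts for the full data set versus a divisor of 10 (roughly 1/10th of the rows kept).
with wd as (
  select date_recorded, temperature_c,
         row_number() over (order by date_recorded) as r
    from weather_data
)
select count(*)                                   as all_rows,
       count(case when mod(r, 10) = 0 then 1 end) as kept_rows
  from wd;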
Running this same sample application but now with a divisor of 10 in the modulus function (hence, returning only 1/10th of the data) will give us a chart that looks like:
The basic shape of the chart is the same, just less dense, with fewer data points. As Firebug showed, 5.6KB of data came back in 1.43 seconds, and it took far less time to render in the browser.
This isn't the solution for every problem. If you need an exact chart, or you're looking for anomalies in the data, then this approach may not work for you. However, in many cases where charts are used in web applications, the amount of data used to render them can be dramatically reduced without any meaningful loss of fidelity, and with only positive results for your end users.
Feel free to try this yourself in this sample application.
5 comments:
Awesome, was looking for something like this a few months back.
Hi Joel!
Thanks for this interesting blogpost. It immediately drew my attention.
Your solution is probably enough for most customers. But one disadvantage is that, this way, you will lose important data, which results in a graph representing something different from the 'so-called' truth.
In a science environment often filters are used for smoothing data. Such as e.g. the Savitzky-Golay filter. (http://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter_for_smoothing_and_differentiation)
Data is also lost during filtering, depending on the kind of filter that you use. I think it can really improve speed as well.
However it's a lot more work :(
kind regards
Alan Arentsen
Hi Joel,
Interesting post, but I don't think transmission of the data was in and of itself a bottleneck.
As seen at https://getfirebug.com/wiki/index.php/Net_Panel (Under Request Timeline), the purple time component is time spent waiting for the server to begin sending a response, then the green part is time on the wire. So the majority of the 11.93s latency was on the server side.
Since you've identified that the query isn't the bottleneck, I'd suggest that the code serializing the result set into JSON/XML or whatever is causing the slowdown on the server side.
Of course, once you've got the data on the wire there's still the issue of JavaScript performance on the client side to parse and process the data. It might be possible to lighten the parsing load by using JSON instead of XML -- which I'm just guessing at, given the hefty JS performance competition between Firefox and Chrome.
Cheers,
Alan
Two years ago I stumbled upon the exact same issue and used a slightly different approach to minimize the number of records/points returned. See this blog post (http://roelhartman.blogspot.nl/2011/10/analytic-function-to-rescue-again.html). The nice thing is that in that construct the user can enter a parameter for the number of records/points he wants to see - and wants to wait for.
Alan,
Interesting observation. I thought about the same thing last night. Earlier today, I asked Hilary on our team to take a look at this for APEX 5.0. Not only could the XML stream be condensed (too many spaces and newlines, which add up when you have 6K points), but there is definitely something server-side that is taking an unnecessarily long time.
Thanks again for pointing this out.
Joel