Monday, March 14, 2011

Rewrite it in .NET

A few weeks ago I received a call from a customer who had an urgent issue. While this customer had gone the proper route and was working with Oracle Support, they really needed an immediate resolution. I've also worked directly with this customer since 1996 and have respect for them. They're judicious with my time - they usually only call when the situation is dire.

This customer had an Internet facing application that was previously written using an older version of Oracle Portal. To get a "quick win", they rewrote the application in Oracle Application Express - it's a simple lookup application that also uses Oracle Text. Once they released this application, during the peak periods of the day they would witness hundreds of database sessions blocking each other. The cause of the problem? Why - Oracle Application Express, of course. The director had called me to "give one last try using Oracle Application Express" before they rewrote it in .NET. He knew how to push my buttons! I dropped what I was doing, reviewed their Service Request with Oracle Support, and immediately arranged to visit them onsite that afternoon.

Once I was onsite, the DBA showed me numerous sessions on their production system, all executing the SQL statement:


DELETE FROM wwv_flow_data
WHERE flow_instance = :b1
AND item_id IN (
SELECT ID
FROM wwv_flow_page_plugs
WHERE flow_id = :b3
AND page_id = :b2
AND plug_source_type IN
('SIMPLE_CHART', 'UPDATABLE_SQL_QUERY',
'DBMSSQL_CURSOR',
'FUNCTION_RETURNING_DBMSSQL_CURSOR',
'FUNCTION_RETURNING_SQL_QUERY_CACHED',
'FUNCTION_RETURNING_SQL_QUERY', 'STRUCTURED_QUERY',
'SQL_QUERY', 'DYNAMIC_QUERY'))

I looked at this statement and I immediately knew that this was a Reset Pagination process. The scope of this statement is for a single logical APEX session - meaning that there are Internet users growing impatient with the performance of the application and clicking "Refresh" in their browsers. [When a page request is executing in the APEX engine, if the end-user clicks Reload in their browser, it will not preempt the database session currently processing their request - it will initiate a new request to a new database session, concurrently competing for the same finite resources as the first.] And why would an end-user click "Refresh" in their browser? Most likely, it's a poor performing application.

Without getting into too many details of Oracle Text, this customer had implemented their search interface such that it could easily result in a broad wildcard expansion search, i.e., the query would be expanded to essentially include all indexed search terms in their Oracle Text CONTAINS clause. To prove my point, I asked them to isolate the query that one of these long-running sessions was performing and issue the same query from their command-line client. The same query took between 30 and 60 seconds - from the command line! Now imagine if you're some Internet user trying to get some work done. You enter your search terms and click Go - the natural tendency after a few seconds, let alone 60 seconds, is to try the search again. And that's what was happening - end-users were growing impatient and repeatedly clicking Go.

The SQL statement from above (DELETE FROM wwv_flow_data...) was due to a Reset Pagination process they were performing after clicking the Go button. The pagination information for the user in the session table of APEX would be deleted, the APEX engine intentionally does not COMMIT, and then their slow-running query would run - all the while maintaining the row-lock until the query completes. The second session comes along (from the impatient user), the DELETE for the pagination information is issued, but now it's blocked from the first session.

My simple advice to them was to improve the performance of the application, as it was essentially unusable. My recommendations included:

  • Inspect the search terms being provided, and if all of the search terms were to be eliminated because of stop words in their Oracle Text stopword list or the search terms didn't meet a minimum length, raise an error and don't issue the query.

  • Add a PREFIX_INDEX and SUBSTRING_INDEX to their Oracle Text index, since they were performing numerous wildcard queries

  • Disable the Go button after it was clicked. This doesn't make the query run faster, but it can help to reduce the number of concurrently running sessions, emanating from an impatient user.

  • Change the pagination of their Classic SQL Report to "Row Ranges X to Y (with next and previous links)." Again, this won't make the query run any faster, but this is one of the fastest pagination styles of Classic Reports - simply because it reduces the number of row fetches performed by the APEX Reporting Engine.

This customer implemented these changes over the next couple of days and they haven't had any issues since (and this was over 2 months ago).

The customer presented this as "an APEX issue." And somehow rewriting it in .NET was going to correct all these problems. But guess what? If they rewrote it in .NET or PHP or Ruby or Java or Formspider or whatever, the original query that took 30 - 60 seconds to execute is going to continue to take 30 - 60 seconds to execute in all of these environments, and their end-users were going to continue to be impatient. In my recommendations above, only the last one is specific to Application Express, and that one is simply a nice-to have. It was the first and second recommendations which fixed their application. Was APEX the cause of their performance problem? I say "no" - and in fact, it did a good job of exposing their application problem.

1 comment:

Dougie Dorko said...

Very interesting... I'm sure that very few people are aware of the fact that allowing the user to re-submit the page acts in this way.

Another way to have found the same problem would have been to produce a 10046 trace of the session by adding '&P_TRACE=YES' to the end of the URL.

After interpreting the trace, you would have seen that same SQL Query as the top resource provider.

Lots of people don't use this because they don't know what to do with a trace file once they get it, and worse yet often don't have access to the server to retrieve the trace.

Since APEX runs on the DB Tier, wouldn't it be great if APEX would not only produce the trace file, but grab it from the UDUMP directory and serve it to you as a download as well?

Food for thought! :-)