Loading...
[-]

WITH

Using WITH allows Oracle to get the result set once, store it and reuse it. The SQL is also shorter. Let’s generate a random sequence as an inline view using the WITH clause and then use it twice. If the query were being recalculated we’d see that the random numbers would be different. However there is no recalculation, even without the use of the undocumented MATERIALIZE hint:

EXEC dbms_random.initialize(5);   

WITH rnd AS
     (SELECT     LEVEL lvl, CHR (LEVEL + 64) letter,
                 TRUNC (DBMS_RANDOM.VALUE (1, 5)) rnd
            FROM DUAL
      CONNECT BY LEVEL <= 5)
SELECT a.*, b.rnd
  FROM rnd a, rnd b
WHERE a.lvl = b.lvl;

WITH - rand runs once

The random numbers match which shows there was no recalculation. Compare that the alternate style of writing the query with two embedded subqueries which yields two different random sequences. That tells us that Oracle has had to calculate the values twice:

EXEC dbms_random.initialize(5);
SELECT a.*, b.rnd
  FROM (SELECT     LEVEL lvl, CHR (LEVEL + 64) letter,
                   TRUNC (DBMS_RANDOM.VALUE (1, 5)) rnd
              FROM DUAL
        CONNECT BY LEVEL <= 5) a,
       (SELECT     LEVEL lvl, CHR (LEVEL + 64) letter,
                   TRUNC (DBMS_RANDOM.VALUE (1, 5)) rnd
              FROM DUAL
        CONNECT BY LEVEL <= 5) b
WHERE a.lvl = b.lvl;

RAND runs twice

Using WITH is both neater and the database needs to do less work. Samples run on 9i.

MIN and MAX together = bad execution plan

To demonstrate the MIN and MAX functions let’s create a global temporary table with an index and then add some data to the table. Note that you must create indexes on temporary tables when they are empty so you’ll have to TRUNCATE the table if there is anything in it. Let’s create the table with an index on the id column and then check that the index was created:

CREATE GLOBAL TEMPORARY TABLE rnd_message(id NUMBER(10), dte DATE)
ON COMMIT DELETE ROWS;

CREATE INDEX ix_msg_01 ON rnd_message(id);

SELECT index_name, table_name, column_name, column_position
  FROM ALL_IND_COLUMNS
 WHERE index_owner = SYS_CONTEXT('USERENV', 'CURRENT_USER');

Index Created

Then we’ll randomly generate a 1,000,000 rows for the table:

EXEC dbms_random.initialize(123);

INSERT INTO rnd_message
SELECT lvl, dte FROM
    (
    SELECT LEVEL lvl, TRUNC(SYSDATE)-7 + dbms_random.value(0,8) dte
      FROM dual
    CONNECT BY LEVEL <= 1000000
    );

Since this is a temporary table we can't gather statistics for it using dbms_stats but it doesn't matter for this example. Let's have a look at some plans, first just a MIN:

SELECT MIN(id) FROM rnd_message;

MIN/MAX scan

Which uses a MIN/MAX scan. So now let's try both a MIN and a MAX:

SELECT MIN(id), MAX(id) FROM rnd_message;

Full Table Scan

Wow! So the optimizer has decided it needs a value from the start of the table and one from the end - so may as well just scan the whole table! With big tables this can be quite slow. If you want to force two MIN/MAX scans instead of a full table scan or an index fast full scan then you'll have to write something like this:

SELECT (SELECT MIN(id) val FROM rnd_message) mn,
       (SELECT MAX(id) val FROM rnd_message) mx
  FROM dual;

Which yields the expected 2 x MIN/MAX scans, which is vastly better.

2 x Min/Max

Using autotrace we can find the consistent gets for the two methods to see how they perform:

So there's a big difference even here, 2480 gets compared to only 9 for a temporary table with only a million records so on production sized tables you can expect even worse performance. Be careful when using both min and max or you might get unexpectedly poor performance. Tested on both 9i and 11gR1 - similar execution plans are generated for both.

LEAD and LAG functions

The lead and lag functions offer the power to peek into adjacent rows and pull values from next or previous rows into the current row for processing. They are a convenient way of accessing multiple rows and cut processing time compared to the alternative of self-joins. Here’s a simple dataset to work with:

CREATE GLOBAL TEMPORARY TABLE dat(lvl NUMBER, letter CHAR(1))
ON COMMIT DELETE ROWS;

INSERT INTO dat
SELECT lvl, letter FROM
    (
    SELECT LEVEL lvl, CHR(LEVEL+64) letter
      FROM DUAL
    CONNECT BY LEVEL <= 5
    );

SELECT * FROM dat;

Dataset for testing lead and lag functions

Now let's apply the LEAD and LAG, jumping only a single row jump ",1" and use (ORDER by lvl ASC) for the window specification which provides the definition of next and previous rows for the functions. Note that the resultant data can be independently sorted in the usual way with a ORDER BY clause. First LEAD and we can see that the 1 A row can see forward to the next row which contains a B:

SELECT lvl, letter, LEAD(letter,1) OVER (ORDER BY lvl ASC) next
  FROM dat;

LEAD

Going the other way we can look backwards with LAG and now see that the 2 B row can see the A in the previous row.

SELECT lvl, letter, LAG(letter,1) OVER (ORDER BY lvl ASC) prev
  FROM dat;

LAG

One possible use of these functions is to test a large numbers of rows to determine whether or not they are a complete sequence. Deleting the middle row out of our sequence yields:

DELETE FROM dat WHERE lvl = 3;

SELECT * FROM dat;

Sequence with missing element at 3

Now we can use a LEAD function to easily check the next row against the current row to see whether or not the sequence is being maintained:

SELECT * FROM
(
    SELECT lvl, letter, LEAD(letter,1) OVER (ORDER BY lvl ASC) next
      FROM dat
)
WHERE CHR(ASCII(letter)+1) != next;

Of course it isn't in this case and we find that there is a problem after row 2 where there is a jump:

Identified the problem after row 2

An alternate way to find gaps in what is expected to be a sequence is to perform an outer join against a generated sequence and that's the technique used in Weekly report on transactions.

Use your CPU and GPU together

Although my previous CUDA posts have looked at the comparative performance between the CPU and GPU, an interesting approach is to split your problem up and use both chips at the same time. Even though your task might not be ideally suited to the GPU, you can still cut your total elapsed time by offloading some work to it. The trick is to figure out what parts to do where, and do it automatically – similar to the problem of Automatic parallelization, which the Wikipedia describes as “a grand challenge”, but with the added complication of heterogeneous systems.

Below is the abstract and a link to the full paper on an attempt to automatically do exactly that:

Qilin: Exploiting Parallelism on Heterogenous Multiprocessors with Adaptive Mapping

Chi-Keung Luk
Sunpyo Hong
Hyesoon Kim

Abstract

Heterogenous multiprocessors are increasingly important in the multi-core era due to their potential for high peformance and energy efficiency. In order for software to fully realize this potential, the step that maps computations to processing elements must be as automated as possible. However, the state-of-the-art approach is to rely on the programmer to specify the mapping manually and statically. This approach is not only labor intensive but also not adaptable to changes in runtime environments like problem sizes and hardware/software configurations. In this study, we propose adaptive mapping, a fully automatic technique to map computations to processing elements on a CPU+GPU machine. We have implemented it in our experimental programming system called Qilin. Our results show that, by judiciously distributing works over the CPU and GPU, automatic adaptive mapping achieves a 25% reduction in execution time and a 20% reduction in energy consumption than static mappings on average for a set of important computation benchmarks. We also demonstrate that our technique is able to adapt to changes in the input problem size and system configuration.

Complete paper (PDF 0.3MB)

Does COALESCE short-circuit?

No, on 9i it evaluates and fails on a subquery it had no business looking at since the first argument isn’t NULL:

SQL> SELECT COALESCE('Not NULL',
    (SELECT 'True short circuit' FROM dual
      UNION
     SELECT 'should ignore this' FROM dual))
  FROM dual;  2    3    4    5
    (SELECT 'True short circuit' FROM dual
     *
ERROR at line 2:
ORA-01427: single-row subquery returns more than one row

Maybe this is a compile time issue and the database doesn’t actually run the subquery? Let’s try asking for the AVG of 100,000 random numbers in the ignored/short circuited part of the COALESCE to see what happens:

SQL> set timi on
SQL>
SQL> SELECT COALESCE('Not NULL',
  2      (SELECT TO_CHAR(AVG(TRUNC(dbms_random.value(1,5)))) rnd
  3         FROM (
  4              SELECT     LEVEL num
  5                FROM     dual
  6              CONNECT BY LEVEL <= 100000 + 1
  7              ))
  8  ) no_short_circuit
  9    FROM DUAL;

NO_SHORT_CIRCUIT
----------------------------------------
Not NULL

Elapsed: 00:00:01.45

1.45 seconds to evaluate what boils down to "SELECT 'Not NULL' FROM dual". Let's look at one last method - testing how many values from a random sequence are used by various calls to COALESCE:

EXEC dbms_random.initialize(0);

SELECT TRUNC (DBMS_RANDOM.VALUE (1, 10)) rnd0,
       TRUNC (DBMS_RANDOM.VALUE (1, 10)) rnd1,
       TRUNC (DBMS_RANDOM.VALUE (1, 10)) rnd2,
       TRUNC (DBMS_RANDOM.VALUE (1, 10)) rnd3
  FROM dual;

EXEC dbms_random.initialize(0);

SELECT COALESCE(TRUNC (DBMS_RANDOM.VALUE (1, 10)),
                TRUNC (DBMS_RANDOM.VALUE (1, 10))) rnd
  FROM dual;

SELECT TRUNC (DBMS_RANDOM.VALUE (1, 10)) rnd FROM dual;

EXEC dbms_random.initialize(0);

SELECT COALESCE(TRUNC (DBMS_RANDOM.VALUE (1, 10)),
                TRUNC (DBMS_RANDOM.VALUE (1, 10)),
                TRUNC (DBMS_RANDOM.VALUE (1, 10))) rnd
  FROM dual;

SELECT TRUNC (DBMS_RANDOM.VALUE (1, 10)) rnd FROM dual;

First we seed the sequence then use four values in a query to see what the random sequence looks like for this seed. Then we test a COALESCE with two calls to DBMS_RANDOM and three calls to DBMS_RANDOM. If COALESCE did short circuit there should have been only one call to DBMS_RANDOM, so only one value used from the random sequence. The very next call to DBMS_RANDOM in both cases should retrieve 8 which is the second value in the sequence.

PL/SQL procedure successfully completed.

      RND0       RND1       RND2       RND3
---------- ---------- ---------- ----------
         1          8          3          2

PL/SQL procedure successfully completed.

       RND
----------
         1

       RND
----------
         3

PL/SQL procedure successfully completed.

       RND
----------
         1

       RND
----------
         2

Instead what comes back are the third and fourth values in the random sequence which is consistent with every call to DBMS_RANDOM being executed and no short circuiting happening.

Three different tests and COALESCE doesn't appear to do any kind of short circuit on 9i.

Edit: After reading Jeff's comment I ran the test cases above on 10g and 11gR1 and they both passed all three test cases. So the answer to the question which started this post "Does COALESCE short-circuit?" is: It depends on the version of Oracle, no for 9i, yes for 10g and 11gR1.

Aggregate functions ignore nulls

Generate a sequence of numbers with a NULL to test:

WITH seq AS
(
    SELECT num
    FROM   (
            SELECT     LEVEL num
            FROM       dual
            CONNECT BY LEVEL <= 9 + 1
            )
    UNION
    SELECT null
    FROM dual
)
SELECT * FROM seq;

Sequence with NULL

Let's try a few aggregate functions:

WITH seq AS
(
    SELECT num
    FROM   (
            SELECT     LEVEL num
            FROM       dual
            CONNECT BY LEVEL <= 9 + 1
            )
    UNION
    SELECT null
    FROM dual
)
SELECT COUNT(*), COUNT(num), SUM(num), AVG(num)
  FROM seq;

Aggregates

So we can see that COUNT(*) counts the number of rows returned, whereas COUNT, SUM and AVG of num all ignore the NULL. Having the aggregates ignore NULLs makes the following kind of query where you are classifying and aggregating different values in a single column:

EXEC dbms_random.initialize(123);

WITH seq AS
(
    SELECT TRUNC(dbms_random.value(1,4)) rnd, num
    FROM   (
            SELECT     LEVEL num
            FROM       dual
            CONNECT BY LEVEL <= 9 + 1
            )
    UNION
    SELECT null, null
    FROM dual
)
SELECT * FROM seq;

Sequence to classify

Let's say we want the average of the num column for each of the posible values 1,2,3 in the rnd column. Since aggregates ignore NULLs we can easily do this:

EXEC dbms_random.initialize(123);

WITH seq AS
(
    SELECT TRUNC(dbms_random.value(1,4)) rnd, num
    FROM   (
            SELECT     LEVEL num
            FROM       dual
            CONNECT BY LEVEL <= 9 + 1
            )
    UNION
    SELECT null, null
    FROM dual
)
SELECT AVG(CASE WHEN rnd = 1 THEN num ELSE null END) one,
       AVG(CASE WHEN rnd = 2 THEN num ELSE null END) two,
       AVG(CASE WHEN rnd = 3 THEN num ELSE null END) three
  FROM seq;

Classified and averaged

Note that the dbms_random.initialize call is made to seed the random number generator to force the same sequence for each sample. Not sure if this sequence is reliable across instances or versions. This sequence was generated on 9i.

SELECT *
  FROM product_component_version;

Oracle 9i 9.2.0.8.0 64 bit Solaris

Weekly report on transactions

When running an OLTP system you are dealing with timestamped transactions or messages which you may like to report on. Let’s say you want to check whether the message rates broken down by hour have been consistent over the last week. So let’s start with some sample message times generated randomly:

EXEC dbms_random.initialize(123);

CREATE GLOBAL TEMPORARY TABLE rnd_message(dte DATE)
ON COMMIT DELETE ROWS;

INSERT INTO rnd_message
SELECT dte FROM
    (
    SELECT TRUNC(SYSDATE)-7 + dbms_random.value(0,8) dte
      FROM dual
    CONNECT BY LEVEL <= 100
    );

which yields:

Messages times to report on

Now we could run a simple group by and then scroll up and down to see what is going on

SELECT TO_CHAR(dte,'hh24') hh, TO_CHAR(dte,'dd') dd, COUNT(*) cnt
  FROM rnd_message
 WHERE TRUNC(dte) >= TRUNC(SYSDATE-7)
GROUP BY TO_CHAR(dte,'hh24'), TO_CHAR(dte,'dd')
ORDER BY dd, hh;

but it is inconvenient, difficult to see patterns and the groups are sparse. Also the gaps are hard to spot:

Sparse result

It may well be that the lack of messages is more interesting than their presence. So to show the gaps more clearly we'll need to do an outer join against a generated sequence like this:

SELECT '01' dd, hr, SUM(msg.cnt) "0"
  FROM
(
SELECT * FROM
    (
    SELECT LEVEL-1 hr
    FROM dual
    CONNECT BY LEVEL <= 24
    )
) hrs,
(
SELECT TO_CHAR(dte,'hh24') hh, TO_CHAR(dte,'dd') dd, COUNT(*) cnt
  FROM rnd_message
 WHERE TRUNC(dte) >= TRUNC(SYSDATE-7)
GROUP BY TO_CHAR(dte,'hh24'), TO_CHAR(dte,'dd')
) msg
WHERE hrs.hr = msg.hh (+)
  AND '01' = msg.dd(+)
GROUP BY hr
ORDER BY hr;

so doing that just for today gives:

Outer join against created sequence

That's nice and the gaps are clearly visible but the patterns across days are still difficult to see. What we'd really like is a side-by-side matrix with rows showing hours and columns being days. To do that we need to convert rows into columns or pivot the data. Pivot is available in 11g but for older versions we can use a series of case statements to achieve the desired effect:

SELECT hr,
       SUM(CASE WHEN dd=TO_CHAR(SYSDATE-7, 'dd') THEN cnt ELSE NULL END) "-7",
       SUM(CASE WHEN dd=TO_CHAR(SYSDATE-6, 'dd') THEN cnt ELSE NULL END) "-6",
       SUM(CASE WHEN dd=TO_CHAR(SYSDATE-5, 'dd') THEN cnt ELSE NULL END) "-5",
       SUM(CASE WHEN dd=TO_CHAR(SYSDATE-4, 'dd') THEN cnt ELSE NULL END) "-4",
       SUM(CASE WHEN dd=TO_CHAR(SYSDATE-3, 'dd') THEN cnt ELSE NULL END) "-3",
       SUM(CASE WHEN dd=TO_CHAR(SYSDATE-2, 'dd') THEN cnt ELSE NULL END) "-2",
       SUM(CASE WHEN dd=TO_CHAR(SYSDATE-1, 'dd') THEN cnt ELSE NULL END) "-1",
       SUM(CASE WHEN dd=TO_CHAR(SYSDATE-0, 'dd') THEN cnt ELSE NULL END) "0"
  FROM
(
SELECT * FROM
    (
    SELECT LEVEL-1 hr
    FROM dual
    CONNECT BY LEVEL <= 24
    )
) hrs,
(
SELECT TO_CHAR(dte,'hh24') hh, TO_CHAR(dte,'dd') dd, COUNT(*) cnt
  FROM rnd_message
 WHERE TRUNC(dte) >= TRUNC(SYSDATE-7)
GROUP BY TO_CHAR(dte,'hh24'), TO_CHAR(dte,'dd')
) msg
WHERE hrs.hr = msg.hh (+)
GROUP BY hr
ORDER BY hr;

which yields a nice matrix showing the days laid out side by side:

Completed pivoted view

So we’ve used an outer join against a generated CONNECT BY sequence and then CASE to PIVOT the result yielding a nice, easy to read, report on transactions for the week broken down by hour.

AJAX Push – Comet

After reading what underdog discovered about Yahoo Finance’s streamerapi I decided to get try to the data directly into an HTML page in IE8.

Go to Yahoo finance and with IE8’s debugger set a breakpoint in the yfs_concat.js file on the loadUrl (214) with a watch on url to get the streamerapi call.

Usually this is minified, fortunately Yahoo must have been debugging when I grabbed this screenshot.

Open that URL in IE8 and the status bar starts filling so we can see something is happening. Right click and view source to see the stream of script commands.

View Source

Monitoring the feed with wireshark

GET /streamer/1.0?s=DX-Y.NYB,AUDUSD=X,EURUSD=X,USDJPY=X,GBPUSD=X,CLF10.NYM,GCZ09.CMX,USD=X,^GSPC,^IXIC,^DJI&
  k=c10,l10,p20,t10&callback=parent.yfs_u1f&mktmcb=parent.yfs_mktmcb&gencallback=parent.yfs_gencb HTTP/1.1
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap,
  application/x-shockwave-flash, */*
Accept-Language: en-AU,zh-TW;q=0.5
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727;
  .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)
Accept-Encoding: gzip, deflate
Host: streamerapi.finance.yahoo.com
Connection: Keep-Alive
Cookie: B=dpcg5595g36bh&b=3&s=qb; adx=c83020@1259630968@1; YSC=0; Q=q1=AACAAAAAAAAAAA--&q2=SxeGOg--;
  PRF=&t=EURUSD=X+GCZ09.CMX+CLF10.NYM+DX-Y.NYB+EURUSD%3DX+AUDUSD%3DX+AUDUSD%3DX+AUDUSD&cp=interactive&
  cd=symbol%3Agcz09.cmx_@range%3A5d_@indicator%3Avolume_@charttype%3Aline_@crosshair%3Aon_@ohlcvalues%3A0_@
  logscale%3Aon_@source%3Aundefined; TT=tick1=0&tick2=0&tick3=1

HTTP/1.1 200 OK
Cache-Control: private
Connection: Keep-Alive
Server: Yahoo! Finance Push Server - v1.0.3
Content-Type: text/html

<html><head><script type='text/javascript'> document.domain='finance.yahoo.com';</script> </head><body></body>
<script>try{parent.yfs_u1f({"USD=X":{l10:"1.00",c10:"0.00",p20:"0.00"}});}catch(e){}</script>
<script>try{parent.yfs_u1f({"^IXIC":{l10:"2214.47",c10:"+13.42",p20:"+0.61"}});}catch(e){}</script>
<script>try{parent.yfs_u1f({"^DJI":{l10:"10497.42",c10:"+45.42",p20:"+0.43"}});}catch(e){}</script>
<script>try{parent.yfs_u1f({"GBPUSD=X":{l10:"1.6386",c10:"+0.0117",p20:"+0.72"}});}catch(e){}</script>
<script>try{parent.yfs_u1f({"AUDUSD=X":{l10:"0.9012",c10:"-0.0048",p20:"-0.52"}});}catch(e){}</script>

shows Yahoo opens an HTTP connection, sends a head and body tags and then slowly sends a series of script tags without ever sending a tag or closing the connection. However I have noted that I reliably lose the connection at GMT 1030 so I assume Yahoo is reseting their servers. The yfs_mktmcb (market time callback?) callback is called every 300s so it acts as a heartbeat, you could use setTimeout and if you don’t get a response within 300+x seconds then reconnect.

After seeing the stream running I moved to the next step. Initially I used an XMLHttpRequest from a local file (IE8 ignores cross domain issues when source HTML is local) but unfortunately since the streamer never completes readyState stays 3 forever and neither responseText nor responseBody ever become available.

I then tried a change of direction, using an html file with a hidden iframe containing the streamerapi call, which is how Yahoo does it, but cross domain scripting restrictions prevented that from working even from localhost. Searching turned up a similar problem on Experts Exchange but the answer didn’t appear on their website, looking on the Google cached version of the page (click cache then scroll to the bottom) instead I found that if you are creating a HTA or HTML Application then you can set your iframe up with “application=yes” and it will conveniently ignore cross domain issues.

Google's cached view of Experts Exchange

Armed with that information I was put together a small HTA debugging tool which consumes the stream, shows the scripted commands as they arrive and changes the window title to the most recent price of the “AUDUSD”. You can download it here (may need to unblock to get it running). Since HTAs are trusted applications which happen to be written in HTML you can do things like write to disk, this could be used to stream the data to a file or you could process the stream and build an HTA ticker application.

A similar technique might work to consume LightStreamer and other push technologies but I haven’t tried.

Arbitrary Web Slices

I was initially impressed by the concept of Web Slices but since webpages have to explicitly support this, and most don’t, it ends up fairly useless. I started thinking that arbitrary slices would be much more useful and all it would take would be an AJAX call from jQuery. Recently someone asked me if it would be possible to extract the Perth weather forecast from the BOM and present the data in a HTA or Win7 gadget. A quick inspection of the source:

view-source

shows that the relevant data is within a <div id=”content”> tag. Turns out the entire code needed to extract the data and present it in a page is simply:

<html>
<head>
<title>Perth BOM Forecast</title>
<style>
body{ overflow:hidden; } // Hide scrollbars
</style>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script>
var url = "http://www.bom.gov.au/weather/wa/forecasts/perth.shtml";

function loadForecast() {
	$.ajax({ url: url, context: document.body, success: function(html){
		$bom = $(html).find("DIV#content");
        $(this).find("DIV#content").html($bom.html());
	}});
}

$(document).ready(function() {
	loadForecast();
	window.resizeTo(590, 700);
	setInterval ( loadForecast, 60*60*1000 );
});
</script>
<body>
<div id="content">
</div>
</body>
</html>

Which, when put into an HTA, looks like this:

BOM HTA

The reason for using a HTA is that, unlike HTML, it is full trust and can make the necessary ActiveX cross-site AJAX call without any warnings. A Win7 gadget behaves similarly.

However the data is returned in an inconvenient format so I used some RegEx to parse it into an array of objects and then put that into an HTA which you can download here.  (may need to unblock to get it running).

Cheesy Exchange Rates

Cheesy Design has released an exchange rate plug-in for Desktop Sidebar http://www.desktopsidebar.com This plug-in is modelled on the Share Market stocks plug-in that comes with Desktop Sidebar.

The binary can be downloaded from Here . The binary gets its extension renamed to .zip when downloading using IE8. Simply rename it back again to .dspackage to install in Desktop Sidebar.

The source code can be obtained from Here where the code is maintained on the Source Forge subversion repository.