Loading...
[-]

Javascript STDEV

After my previous post which was a Shootout between SQL Server, Matlab and C# on Standard Deviation (STDEV) I thought I might try doing it in Javascript to see how it performed on speed and accuracy.  I needed to make an ADODB connection to the database using ActiveX on OBDC through a System DSN and then calculate stdev.  Since Javascript doesn’t have a native implementation and since it had performed well in C# I decided to use the Alglib implementation of stdev:

<HTML>
  <HEAD>
  <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
  <script language=javascript>
var x = new Array();
var mean = 0.0;
var val = 0;

function genRandom() {
	var startTime=new Date() ;

	var sum = 0;
	for (var i = 0; i < 1000000; i++) {
		val=Math.floor(Math.random()*10) + 1000000000;
		x.push(val);
		sum += val;
	}
	mean = sum / x.length;

	var endTime=new Date();
	$("#timing0").html(endTime-startTime + " ms");
}

function connectDb() {
	$("#test").html("");

	var ConnDB = new ActiveXObject("ADODB.Connection");
	ConnDB.ConnectionString="dsn=matlab_user";
	ConnDB.Open();
	var Rs = new ActiveXObject("ADODB.Recordset");
	Rs.CacheSize = 100; // 15% faster than default size of 1
	var sum=0;

	var startTime=new Date() ;
	Rs.Open("select value from data",ConnDB,0,1,1);
	//Rs.Open("data",ConnDB,0,1,512);
	while(!Rs.EOF) {
		val = Rs.Fields(0).Value;  // Early Binding: http://msdn.microsoft.com/en-us/library/aa496013(SQL.80).aspx
		sum += val;
		x.push(val);
		Rs.MoveNext;
	}
	Rs.Close();

	var endTime=new Date();
	$("#timing1").html(endTime-startTime + " ms");

	mean = sum/x.length;
}  

function test_stdev() {
	var startTime=new Date() ;
	var loopCount = 100;
	for (var i = 0; i< loopCount; i++) {
		$("#test").html(stdev() + "<br/>");
	}
	var endTime=new Date();
	$("#timing2").html(Math.floor((endTime-startTime)/loopCount) + " ms");

	x = new Array();
}

function stdev() {
	var variance = 0.0;
	var n = x.length;
	var v1 = 0.0;
	var v2 = 0.0;
	var stddev;

	if (n != 1)	{
		for (var i = 0; i <= n - 1; i++)
		{
			v1 = v1 + (x[i] - mean) * (x[i] - mean);
			v2 = v2 + (x[i] - mean);
		}

		v2 = v2*v2 / n;
		variance = (v1 - v2) / (n-1);
		if (variance < 0) { variance = 0; }
		stddev = Math.sqrt(variance);
	}
	return stddev;
}

  </script>
  </HEAD>
  <BODY>   

  <P><div id=test> </div></P>
  <P><INPUT id="button0" type="button" value="GenRandom" name="button0" onclick="genRandom();">
     <INPUT id="button1" type="button" value="GetData" name="button1" onclick="connectDb();">
     <INPUT id="button2" type="button" value="Calculate" name="button2" onclick="test_stdev();"></P>
  Random: <span id="timing0"></span><br/>
  ADODB: <span id="timing1"></span><br/>
  Calc : <span id="timing2"></span><br/>

  </BODY>
</HTML>

To make the GetData button in this code to work you’ll need:

  • A database with a date table containing a column called value
  • A System DSN to connect to the database (Mine is OS authenticated so no need for uid or pwd)

Running it under IE8 (which supports the ActiveX necessary to make the database connection) I found that it took around 18 seconds to get the data from SQL Server (originally took 30 seconds but I was able to cut that nearly in half with Early Binding) and then a further second to calculate the standard deviation. It was able to come up with the correct STDEV for this particular dataset returning 2.916476466727062 – the same as C# Alglib and Matlab returned in my shootout post.

Then I got to thinking that IE8 isn’t particularly fast, how would this run in The fastest browser on Earth, Chrome? A quick test on SunSpider showed some promise for Chrome where it took only 560.8ms to complete the test while IE8 dragged its feet taking nearly 10x as long at 5807.4ms.

However Chrome doesn’t support ActiveX … or does it? Well no, but you can sort of fake it with IE Tab Classic which is presumably just calling the IE DLLs but I gave it a shot anyway. As expected it turned in figures comparable to IE8, 18 seconds on data retrieval and 1 on calculation.

How about just generating random numbers in the code and calculating the stdev on that. Sure I couldn’t compare the final STDEV anymore since the data set was different (although you could get the same sequence by using a seed on a PRNG like Mersenne Twister or similar) but I didn’t care about that too much, I only wanted performance figures for the calculation. Because IE8 is so slow to calculate did a single run of STDEV with this result:

IE8 Standard Deviation

As expected, it doesn’t matter where the random sequence came from IE8 still takes about a second to calculate stdev. Chrome turns out to be significantly faster so I ran it 100 times and took the average:

IE8 Standard Deviation

Chrome is the clear winner, taking about 1/15th of the time that IE8 does to do the same math. But that’s not the really surprising result – the careful reader with an extremely good memory will recall from my previous post that Matlab took 95ms to calculate the STDEV in an exported (BuilderNE) .NET DLL – so at only 76 ms Chrome is faster at running custom Javascript to calculate STDEV than Matlab is when running the built-in function written by the vendor! Is Chrome amazingly fast or is Matlab slow?

Shootout between SQL Server, Matlab and C# on Standard Deviation (STDEV)

Let’s say we have an mathematically intensive application written in C# with SQL Server as the backing database. When designing the application it can be convenient to call on the wealth of libraries and rather convenient IDE in a product such as Matlab. My normal thinking would be to first design and test the algorithms in Matlab and then once the design is complete, to rewrite the code in C# and SQL, possibly making use of the SQLCLR to write C# stored procedures. We could also make use of existing math/stats libraries where possible. However it turns out that Matlab has a tool called BuilderNE which can generate C# dlls which can allow your app to conveniently interface with the Matlab code. Using this you could directly call your algorithms inside Matlab and run the code you’ve designed without re-implementation. Now obviously you’ll be paying a performance penalty, but how much of one? I set out to find out. I decided to test a simple function, Standard Deviation over an entire population (STDEV), in a series of scenarios to see how the various options stacked up.

Note that while STDEVP might seem like an easy enough algorithm, there are several ways to implement it and each, due to floating point precision, will give different answers. For my C# implementation I’ve used the Welford algorithm as described on Wikipedia under Rapid calculation methods. You might also be interested in this blog post: Comparing three methods of computing standard deviation.

  1. Matlab
  2. SQL Server
    1. Native Implementation of STDEV
    2. Call .NET DLL exported from Matlab with SQLCLR
    3. Naive custom C# user defined aggregate
    4. Improved C# user defined aggregate
  3. C# Application
    1. C.1) Retrieve STDEV from database
    2. Call Matlab DLL
    3. Custom C# implementation
    4. Alglib’s implementation

All tests run an an Intel E4500 with 2GB RAM.

First we need to generate some sample test data for the comparison, enough that it takes an easily measureable amount of time to complete the calculation. So 1 million 7-digit random numbers. Note that I’ve named my database Matlab for this comparison.

use Matlab;

CREATE TABLE [dbo].[data](
	[value] [numeric](10, 0) NOT NULL
) ON [PRIMARY]

DECLARE @Index INT ;
SET @Index = 1 ;
WHILE @Index <= 1000000 BEGIN
    INSERT data (value) VALUES (ROUND(RAND() * 10,0) + 1000000000) ;
    SET @Index = @Index + 1 ;
END

A) Matlab

Set up an ODBC connection called Matlab_user, retrieve the data and then run the stddev over the values. Here's the Matlab code:

conn = database('Matlab_user','','');
setdbprefs('DataReturnFormat','numeric');
sqlquery = 'SELECT value FROM data'; 

tElapsed_fetch = zeros(1,10);
tElapsed = zeros(1,10);
values = zeros(1,1000000);

for n = 1:10
    tStart = tic; values = fetch(conn, sqlquery); tElapsed_fetch(n) = toc(tStart);
    tStart = tic; std(values,1); tElapsed(n) = toc(tStart);
end

format long g

fetch = round(tElapsed_fetch*1000)'
calc = round(tElapsed*1000)'
total = fetch+calc

As expected, being outside the database means a big performance penalty with most of the time spent in data retrieval.

Average performance: 13250 ms (13217 ms fetch, 33 ms calculation)

B) SQL Server

B.1) The first and simplest test is simply to use SQL Server's native STDEV function:

DECLARE @Index INT ;
SET @Index = 1 ;
WHILE @Index <= 10 BEGIN
  SET STATISTICS TIME ON
  SELECT STDEV(value)
    FROM data;
  SET STATISTICS TIME OFF
  SET @Index = @Index + 1 ;
END;

Average performance: 555 ms

Note if you want to reproduce this test on your own SQL Server instance but are getting strange timing results you may to look at SQL Server timing values may be incorrect when you use utilities or technologies that change CPU frequencies.

B.2) Call .NET DLL exported from Matlab with SQLCLR

It easy to generate a C# DLL using BuilderNE (>> deploytool) so why not put it directly into the database as an user defined aggregate function? The problem with doing this is that it requires all the referenced DLLs to be loaded into the database. In (2) we needed to use MWArray so let's try loading that into SQL Server:

CREATE ASSEMBLY MWArray from 'C:\Program Files\MATLAB\R2008b\toolbox\dotnetbuilder\bin\win32\v2.0\MWArray.dll'
WITH PERMISSION_SET = SAFE;

But we get an error message:

Msg 10301, Level 16, State 1, Line 1
Assembly 'MWArray' references assembly 'system.drawing, version=2.0.0.0, culture=neutral, publickeytoken=b03f5f7f11d50a3a.', which is not present in
the current database. SQL Server attempted to locate and automatically load the referenced assembly from the same location where referring assembly came from,
but that operation has failed (reason: 2(failed to retrieve text for this error. Reason: 15105)). Please load the referenced assembly into the current
database and retry your request.

Now while it is possible to load system.drawing into the database it is not recommended and isn't generally a viable option, see here for the method and a spirited discussion on the merits. While I was able to force the assemblies into the database:

USE Matlab;

ALTER DATABASE Matlab SET TRUSTWORTHY ON;

CREATE ASSEMBLY [system.drawing]
  FROM 'C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll'
  WITH PERMISSION_SET = UNSAFE;

CREATE ASSEMBLY MWArray from 'C:\Program Files\MATLAB\R2008b\toolbox\dotnetbuilder\bin\win32\v2.0\MWArray.dll'
  WITH PERMISSION_SET = UNSAFE;

CREATE ASSEMBLY stddev from 'C:\Users\\Documents\MATLAB\stddev\distrib\stddev.dll'
  WITH PERMISSION_SET = UNSAFE;

Then trying to build in VS2008 will yield this error message:

Metadata file '.\Visual Studio 2008\Projects\stdevp_csharp\stdevp_csharp\obj\sqlclr\MWArray.dll' could not
be opened -- 'Error importing module 'ManagedCPPAPI.netmodule' of assembly
'.\Visual Studio 2008\Projects\stdevp_csharp\stdevp_csharp\obj\sqlclr\MWArray.dll'
-- The system cannot find the file specified.'

Which you can resolve by copying the file ManagedCPPAPI.netmodule across to the specified directory. However that is as far as I got, eventually being prevented from making further progress by a runtime error.

Average performance: N/A (Runtime Failure)

B.3) Naive custom C# user defined aggregate

Take the C# implementation from C.3) and load it into the database as a user defined aggregate. The process to create a UDA with VS2008 is:

  • New Project -> Visual C# -> Database -> SQL Server Project
  • On the project name in the solution explorer, right click -> Add -> Aggregate...

You'll get a template for an aggregate function which I've altered as below:

[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined,
    IsInvariantToNulls = true,
    IsInvariantToDuplicates = false,
    IsInvariantToOrder = true,
    MaxByteSize = -1) //maximum size in bytes of persisted value
    ]
public struct stdevp_cs_preload : IBinarySerialize
{
    private int [] a;
    private int i;

    private double stdevp_csharp_inner(int[] value)
    {
        double[] average = { 0, 0 };
        double stdev = 0;

        for (int i = 1; i <= value.Length; i++)
        {
            average[i % 2] = average[(i + 1) % 2] + (value[i - 1] - average[(i + 1) % 2]) / i;
            stdev += (value[i - 1] - average[(i + 1) % 2]) * (value[i - 1] - average[i % 2]);
        }

        return Math.Sqrt(stdev / (value.Length-1));
    }

    public void Init()
    {
        a = new int[1000000];
        i = 0;
    }

    public void Accumulate(SqlInt32 value)
    {
        if (value.IsNull)
        {
            return;
        }

        a[i++] = value.Value;
    }

    public void Merge(stdevp_cs_preload Group)
    {
        // TODO

    }

    public SqlDouble Terminate()
    {
        return new SqlDouble(stdevp_csharp_inner(a));
    }

    public void Read(BinaryReader r)
    {
        i = r.ReadInt32();

        a = new int[i];
        for (int j = 0; j < i; j++)
        {
            a[j] = r.ReadInt32();
        }
    }
    public void Write(BinaryWriter w)
    {
        w.Write(i);
        for (int j = 0; j < i; j++)
        {
            w.Write(a[j]);
        }
    }
}

Now you'll probably have noted that this creates an array with a million elements, rather conveniently the precise size of tested data. Of course this isn't practical or deployable and you'd need to use dynamic memory allocation depending on the size of the dataset. But this is only a simple test and two things stop me from writing the proper code, dynamic allocation will only slow the code so the time achieved is a minimum baseline and there's a much better way of writing the code so no need to spend time on this version. Later on in this post you'll see that this approach is not even close to being in contention in terms of performance but if you really wanted to you could investigate optimizing the memory allocation using stackalloc which creates a pinned (fast) block of temporary memory, that gets deleted (popped off the stack) on exit.

The large memory allocation exceeds the typical MaxByteSize of 8000 so I've had to use -1 for unlimited and this prevents deployment from within Visual Studio and you need to manually deploy.

Average performance: 1180 ms, or more than twice as long as the native STDEV.

B.4) Improved C# user defined aggregate

Since the user defined aggregate functions are called for every row there's no need to store the data, nor is there a requirement for a loop. Simply taking the sample algorithm and putting the part from the loop into the Accumulate function greatly reduces the amount of memory needed and allows the processing to start before all the data is gathered and should yield superior performance. Here's the modified code:

[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined,
    IsInvariantToNulls = true,
    IsInvariantToDuplicates = false,
    IsInvariantToOrder = true,
    MaxByteSize = 28) //maximum size in bytes of persisted value
    ]
public struct stdevp_cs : IBinarySerialize
{
    private double[] a;
    private double q;
    private int i;

    public void Init()
    {
        a = new double[2];
        a[0] = 0; a[1] = 0;

        q = 0;
        i = 0;
    }

    public void Accumulate(SqlInt32 value)
    {
        if (value.IsNull)
        {
            return;
        }

        i++;

        a[i % 2] = a[(i + 1) % 2] + (value.Value - a[(i + 1) % 2]) / i;
        q += (value.Value - a[(i + 1) % 2]) * (value.Value - a[i % 2]);
    }

    public void Merge(stdevp_cs Group)
    {
        // TODO
    }

    public SqlDouble Terminate()
    {
        return new SqlDouble(Math.Sqrt(q / i-1));
    }

    public void Read(BinaryReader r)
    {
        i = r.ReadInt32();
        q = r.ReadDouble();
        a = new double[2];
        a[0] = r.ReadDouble();
        a[1] = r.ReadDouble();
    }
    public void Write(BinaryWriter w)
    {
        w.Write(i);
        w.Write(q);
        w.Write(a[0]);
        w.Write(a[1]);
    }
}

Average performance: 969 ms, which compared to the built-in STDEV's performance of 555 ms, is roughly 75% slower.

C) C# Application (using Stopwatch class for timing)

C.0) Retrieve values from database

In order to run any C# stdev implementation we'll need to retrieve the data:

double[] a = (from p in db.datas
              select (double)p.value).ToArray();

Average performance: 451 ms

Note: Linq to SQL is not the fastest way to access a database and you could potentially cut this time by using something more low-level like a DataReader or DataTable. A good comparison of the various C# database access approaches can be found here Entity Framework and LINQ to SQL Performance. My preference is Linq to SQL for convenience and good although not optimal performance.

C.1) Retrieve STDEV from database

Linq won't translate a STDEV function call across so first create a view on the data with STDEV:

CREATE VIEW dbo.stdev_data(value) AS
SELECT STDEV(value) value
  FROM Matlab.dbo.data

Then create a "Linq to SQL" class to connect to the database:

  • On the project name in the solution explorer, right click -> Add -> New Item...
  • Data->Linq to SQL Classes
  • Browse Server Explorer and drag view onto DataClasses1.dbml

then call it:

DataClasses1DataContext db = new DataClasses1DataContext();

double stdev = (from p in db.stdev_datas
                select (double)p.value).Single();

Average performance: 517 ms

C.2) Call Matlab DLL

Let's reuse the .NET DLL created with BuilderNE for B.2), add a reference to it in out project, along with the MWArray and then call it:

using MathWorks.MATLAB.NET.Utility;
using MathWorks.MATLAB.NET.Arrays;
/* ... */
double test_matlab(ref double[] value)
{
    Stddev oStddev = new Stddev();

    MWNumericArray values = new MWNumericArray(value.Length, 1, value);

    return oStddev.std(values).ToArray().Cast().First();
}

Notes: This option requires the Matlab runtime (MCR) which is fairly significant at 257 MB and may limit your deployment options.

 Directory of C:\Program Files\MATLAB\R2008b\toolbox\compiler\deploy\win32

17/09/2008  08:15 PM       270,489,744 MCRInstaller.exe

One time MCR initialization: 2025 ms

Average performance: 95 ms

C.3) Custom C# implementation

Here's a simple implementation of Welford from the Wikipedia in C#:

static double stdevp_csharp(ref double[] value, ref double mean, ref double variance, bool fcompensate)
{
    double[] average = {0.0,0.0};
    double stdev = 0;

    mean = 0; // adjust values

    if ((fcompensate) && (value.Length > 1000))
    {
        int n = (int)(value.Length * 0.1);
        for (int i = 0; i <= (n - 1); i++)
        {
            mean = mean + value[i];
        }
        mean = mean / n;
    }

    for (int i = 1; i <= value.Length; i++)
    {
        average[i % 2] = average[(i+1) % 2] + (value[i-1] - mean - average[(i+1) % 2]) / i;
        stdev += (value[i-1] - mean - average[(i+1) % 2]) * (value[i-1] - mean - average[i % 2]);
    }

    mean = average[value.Length % 2];
    variance = stdev / (value.Length - 1);

    return Math.Sqrt(variance);
}

Average performance: 21 ms

C.4) Alglib's implementation

Reimplement the Alglib calculatemoments function which uses a corrected two-pass algorithm) to remove the skew and kurtosis:

// Copyright (c) 2007, Sergey Bochkanov (ALGLIB project), GNU GPL: http://www.fsf.org/licensing/licenses
public static void calc_stddev(ref double[] x, int n, ref double stddev, ref double mean, bool fcompensate)
{
    int i;
    double v1 = 0;
    double v2 = 0;
    double variance;
    double sum = 0;

    if (fcompensate)
    {
        for (i = 0; i <= (n - 1); i++)
        {
            sum = sum + x[i];
        }
        mean = sum / n;
    }

    //
    // Variance (using corrected two-pass algorithm)
    //
    if (n != 1)
    {
        v1 = 0; v2 = 0;
        for (i = 0; i <= n - 1; i++)
        {
            v1 = v1 + (x[i] - mean) * (x[i] - mean);
            v2 = v2 + (x[i] - mean);
        }

        v2 = v2*v2 / n;
        variance = (v1 - v2) / (n-1);
        if ((double)(variance) < (double)(0))
        {
            variance = 0;
        }
        stddev = Math.Sqrt(variance);
    }
}

Average performance: 5 ms

Conclusion

Summary (Performance in ms):

Matlab Native 13250
SQL Server   Native 555
Matlab N/A
C# Naive 1180
C# Improved   969
C# SQL 517
Matlab 451+95=546  
C# Welford 451+21=472
C# Alglib 451+5=456

The dataset was chosen to be particularly tricky, essentially random numbers between 0-9 with a big number added to them. We know that the STDEV of numbers between 0-9 would likely be less than five so it is interesting to measure the accuracy of the various algorithms:

Summary (Accuracy):

SQL 502.056914348508  
Matlab 2.91647646672706
C# Welford 2.916476466717
C# Welford (uncompensated)   2.91647680582741
C# Alglib 2.91647646672706
C# Alglib (uncompensated) 502.056914348508

So the native implementation of STDEV on SQL Server is not particularly accurate in this tough test case, coming up with a value over 500 for something we know can't possibly be more than 5 and is likely less than 3. Other algorithms agree on a number around 2.9 except for the uncompensated Alglib variant which gets the same number as SQL Server.

Timings are all for compensated algorithms. Compensating is subtracting the precalculated mean from each value to decrease their magnitude and thus decrease rounding errors. There are surprising results in those tables, particularly notable is that both of the C# algorithms outperform the database built-in STDEV function both in accuracy and speed despite needing to pay a performance penalty of 451 ms to retrieve the data. The Alglib function returns the same value as Matlab in under 10% of the time that Matlab needs and the uncompensated version returns the same wrong value that SQL Server's native STDEV comes up with. To answer the question of whether or not Matlab is competitive on time, I would say that for this test it is. Most of the time spent is retrieving the dataset into the C# app for processing and the cost of actually calculating the STDEV is small relative to that. For problems where there is more to be calculated the performance ratio of 19:1 between Matlab's 95ms and Alglib 5ms will be a bigger consideration. So we found out what we probably could have guessed from the start - if you aren't doing too much math then the time taken in I/O and database retrieval will drown out the differences in the performance of the algorithm and Matlab is an option. But the really surprising result for me was that for this sample size it is quicker and more accurate to take the data out of the database and process it in C# rather than use the native database functions and using SQLCLR just further slows things down.

Notes: To boost the accuracy of SQL Server's STDEV you'd need to precalc the Average, subtract it from each value and then run stdev. This would slow the algorithm further and make it less competitive on performance although boost the accuracy. Matlab, SQL Server and the C# app were all running on the same machine.

Chrome Extension – Transperth Live Train Times

Install: Transperth Live Train Times Extension (8kb)Source: ArbitraryWebSlice2.zip

Chrome Extension IconAfter finding how simple it was to build the Arbitrary Web Slice Chrome Extension for the BOM website I decided to try another site. This time I wanted to get the TransPerth live train times because the full site is slow to load, the refresh reloads the whole page and some of the images don’t work. I wanted to just capture this part, but with working images rather than broken links:

By taking a slice I could address all of these issues and get the functionality I wanted. For this project, although still small, I expanded my horizons just a little and incorporated some manipulation of the returned data, an options page, some local caching via the localStorage and a very minor use of the Chrome API to load a new tab with the full page.

The code in the extension is split across three files:

  1. popup.html : Performs the core page retrieval and slicing action, showing the train times
  2. options.html : Allows the user to select a station (list of stations also sliced)
  3. common.js : Functions/variables common to both the popup and options

The popup.html file contains the most interesting code, with options.html performing largely similar tasks. The three core functions in popup.html are:

  1. addHandlers : When the sliced HTML is on the page we first delete the original onclick handler and add our own click actions here to do a custom refresh. Note: chrome.tabs.create is the call to the Chrome API which creates a new tab
  2. storeMain : Once the HTML has been retrieved:
    • Start with the <h2> with “Live Departure Board” and searching back up the tree grab the contents of first (0-indexed) table, then go one step up to capture the <table> too rather than just the contents.
    • Check that something was retrieved and if not assume no service
    • Search and replace to fix the broken images, then store the html to localStorage
  3. $(document).ready : When the page loads this runs and:
    • Checks to see if there is something in localStorage and if yes whether it is the station we are currently interested in

      • True: Grab the HTML from storage, display it so the user can immediately see what old times were, then fire the click handler to start the refresh process to get new times
      • False: Tell the user which station we are retrieving and then run the slicing code

When reviewing the code you might spot the unusual use of setTimeout to invoke the slicing rather than direct invokation, this gives us the ability to provide a small amount of feedback to tell the user that something is going on, with an AJAX spinner in this case. Credit to author of the Coffee, Code and Chocolate which is where I learned this method. Here’s the javascript from the popup.html file:

function addHandlers() {
    $("input").removeAttr("onclick")
              .click(function() {
                        $input = $("body").find("input").parent()
                                          .attr("align","left")
                                          .html("&nbsp;<img src=\"ajaxinfo_spinner.gif\"/>Refreshing Times...");
                        takeSlice();
                    });

    $("a").click(function() {chrome.tabs.create({url: url})});
}

function store(html) {
    storeOptions(html);
    return storeMain(html);
}

function storeMain(html) {
    var $slice = $(html).find("h2:contains('Live Departure Board')").parents("table:eq(0)").parent();
    if ($slice.html() == null) {
        return "No services currently scheduled for " + station_name;
    }

    $("<a href='#'>Transperth: Live Train Times</a>").appendTo($slice);

    localStorage["last_slice"] = ($slice.html()).replace(/www.dev.transperth/g,"www.transperth");
    localStorage["last_slice_stn"] = station_name;

    return localStorage["last_slice"];
}

$(document).ready(function() {
    setup();

    if ( (typeof(localStorage["last_slice"]) != "undefined") && (station_name == localStorage["last_slice_stn"])) {
        $("body").html(localStorage["last_slice"]);
        addHandlers();
        setTimeout(function() {$("input").click();}, CONST_TIMEOUT_MS);
    } else {
        $("span").html("Loading times for " + station_name);
        setTimeout(takeSlice, CONST_TIMEOUT_MS);
    }

})

Here’s the completed extension:

Notes: The actual slicing action is not shown but performed by an AJAX call takeSlice() in the common.js file. The station options are sliced from the page every time the popup.html loads and stored in localStorage["options"] to keep them fresh. Lastly this extension is also a content script which fixes the broken images on the Transperth train times page.

Known Issues: Because jquery is loaded from Google’s CDN if you get a cache miss there could be a long period where jQuery is downloaded without feedback to the user that something is happening leading to a poor user experience.

Disclaimer: This is not a Transperth product and is not endorsed by them. There is no association between this site, this extension and Transperth.

Chrome Extension – Arbitrary Web Slices

Chrome Extension IconTo see how Chrome Extensions work I revisited my Arbitrary Web Slices post.  A great starting point was Google’s own Hello World extension.  Indeed all I really needed to do was start with that, cut and paste code from my other post, add some icons and it was done.

Let’s start with the barely edited (from Google’s sample) manifest.json file:

{
  "name": "Perth BOM Forecast",
  "version": "1.0",
  "description": "Slice the 7 day forecast out of the Bureau of Meterology webpage",
  "icons": {"128": "bomlogo.jpg" },
  "browser_action": {
    "default_icon": "icon.png",
    "popup": "popup.html"
  },
  "permissions": [
    "http://www.bom.gov.au/"
  ]
}

The only addition there is to add “icons” which sets the icon shown on the extensions page. The most important modification is to set the permissions to allow us to use cross site scripting so we can get to the BOM page. For more information on what goes into manifest files take a look a Google’s help.

Next we take a look at the very simple code used to take the slice we want from the page and display it.

1)  Import jQuery from the Google CDN
2)  Make an AJAX call to the BOM website
3)  Display the results

Here’s the popup.html code which does the work:

<style>
body{ overflow:hidden; } // Hide scrollbars
</style>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script>
var url = "http://www.bom.gov.au/weather/wa/forecasts/perth.shtml";

function loadForecast() {
	$.ajax({ url: url, context: document.body, success: function(html){
		$bom = $(html).find("DIV#content");
        $(this).find("DIV#content").html($bom.html());
	}});
}

$(document).ready(function() {
	setTimeout(loadForecast,1);
});
</script>
<body>
<div id="content">
Loading...
</div>
</body>

The loadForecast function takes the slice and loads it. Note that instead of calling loadForecast directly I’ve used a setTimeout with a 1 ms timeout to cause Chrome to render the partially completed page with a loading message to let the user know something is going on. For a page which takes a bit longer to load you might want consider a proper ProgressBar or an AJAX spinner.

That’s all that you need! You can download the files, then use the “Load unpacked extension…” to load the unpacked version or alternately simply click here to load the packed version. Feel free to take this simple example and create your own WebSlice chrome extension. Don’t forget to publish your work in the extensions gallery.

You might also be interested in: Chrome Extension – Transperth Live Train Times

C++ pointers to non-static functions

Something that you don’t see very often in C++ are pointers to non-static class functions. Here’s a simple non functional demonstration of using a pointer to a non-static function:

class A
{
public:
      void Test(int& in)
      {
      }
};
 
int _tmain(int argc, _TCHAR* argv[])
{
void (A::*f)(int&); // Declare f
      f = &A::Test;       // Assign f
      A* obj1 = new A();  // Create object of type A
      int iTest = 23;     // Create an integer that we can get a reference to
      (obj1->*f)(iTest);  // Call the fuction pointed to by f
      delete obj1;
      return 0;
}

So what are their uses? If you have very tight performance constraints they can be used to replace case statements. Here’s an example of using an enumerated type to determine what object is created from an object factory using pointers to non-static class functions:

class Shape
{
};
 
class Square : public Shape
{
};
 
class Circle : public Shape
{
};
 
class Triangle : public Shape
{
};
 
class ShapeFactory
{
public:
      Shape* CreateSquare(int& color)
      {
            return new Square();
      }
      Shape* CreateCircle(int& color)
      {
            return new Circle();
      }
      Shape* CreateTriangle(int& color)
      {
            return new Triangle();
      }
};
 
typedef enum {
      SQUARE,
      CIRCLE,
      TRIANGLE
} ShapeType;
 
typedef Shape* (ShapeFactory::*Functions)(int&);
 
int _tmain(int argc, _TCHAR* argv[])
{
      Functions f[3] = {&ShapeFactory::CreateSquare, &ShapeFactory::CreateCircle, &ShapeFactory::CreateTriangle};
 
      ShapeFactory* factory = new ShapeFactory();
 
      ShapeType type = CIRCLE;
 
      int color = 23;
      Shape* shape = (factory->*f[type])(color);
      delete shape;
      delete factory;
 
      return 0;
}

CheesyFinance.HTA

Following on from my earlier post on AJAX Push – Comet I decided to add some pure javascript graphing, a little logging to disk, some robustness to handle dropped connections, some pull AJAX to get the prices which don’t seem to stream (JPY) and some pretty tooltips.

This is the result of those changes:

Cheesy Finance HTA

The dynamic colouring of the prices as they move up and down is done using a nice little plugin which, when compressed down with a javascript minifier is only 957 bytes so I’ve included it inline in the code.

The code is available for download in a single HTA file which you may need to unblock to get running.

Handling all mouse movements at the top Main Form level in C#

In a C# WinForms application, mouse movement events always go to the child control that the mouse moves over. Sometimes you might want to track mouse movements at the Form level instead of handling them at the child control level. To do this you need to handle the raw windows messages at the application level. Here’s the source code for doing this:

public partial class MainForm : Form
{
      [SecurityPermission(SecurityAction.LinkDemand, Flags = SecurityPermissionFlag.UnmanagedCode)]
      public class TestMessageFilter : IMessageFilter
      {
            const int WM_MOUSEMOVE = 0x200;
            static int count = 0;
            public bool PreFilterMessage(ref Message m)
            {
                  // Blocks all the messages relating to the left mouse button.
                  if (m.Msg == WM_MOUSEMOVE)
                  {
                        System.Diagnostics.Trace.WriteLine("Mouse Moved " + (++count).ToString());
                        return true;
                  }
                  return false;
            }
      }
 
      public MainForm()
      {
            InitializeComponent();
            Application.AddMessageFilter(new TestMessageFilter());
      }

The WM_MOUSEMOVE constant is from C/C++ programming and is part of the Win32 API. It’s declared in Winuser.h which can be found in the Windows SDK if you have Visual C++ installed.

How to show a Child Window from a Console window in C#

An amusing little example I came up with of how to open a window that is a child window of a console window. Just in case I need it in the future. Also useful for spawning a window from an OpenGL app, or hanging a child window off a separate thread.

// This is an example of how to open a Child Window from a Console
// Window in C#.
// By John Stewien 2010
 
using System;
using System.Text;
using System.Threading;
using System.Runtime.InteropServices;
using System.Windows.Forms;
 
namespace ConsoleApplication {
 
  /// <summary>
  /// A simple form with a CheckBox that can be used to verify the
  /// responsiveness of the Form.
  /// </summary>
  public class Form1 : Form {
    public Form1() {
      CheckBox checkBox1 = new System.Windows.Forms.CheckBox();
      checkBox1.AutoSize = true;
      checkBox1.Text = "Check / Uncheck to test responsiveness";
      Controls.Add(checkBox1);
      Text = "Form1";
    }
  }
 
  class Program {
 
    // Import the required Windows API functions
 
    [DllImport("User32.dll", CharSet = CharSet.Auto)]
    static extern IntPtr FindWindow(char[] lpClassName, char[] lpWindowName);
 
    [DllImport("User32.dll", CharSet = CharSet.Auto)]
    static extern int SetWindowLongW(HandleRef hWnd, int nIndex, IntPtr dwNewLong);
    const int GWLP_HWNDPARENT = -8;
 
    // Store these parameters as statics as they are called from static methods
 
    static IntPtr consoleHWnd;
    static Form1 form;
 
    /// <summary>
    /// Method for running the form, called from a thread
    /// </summary>
    static void RunForm() {
      form = new Form1();
      SetWindowLongW(new HandleRef(form, form.Handle), GWLP_HWNDPARENT, consoleHWnd);
      Application.Run(form);
    }
 
    // Delegate for invoking a method with no parameters
    delegate void SimpleDelegate();
 
    static void Main(string[] args) {
 
      // Set the Console Title and then find its HWND
      Console.Title = "Child Window On Console Test";
      consoleHWnd = FindWindow("ConsoleWindowClass".ToCharArray(), Console.Title.ToCharArray());
 
      // Run the Form
      Thread thread = new Thread(new ThreadStart(RunForm));
      thread.Start();
 
      // Do some text stuff on the Console to show it is working
      Console.WriteLine("Enter Text. Enter \"exit\" to exit.");
      string text = "";
      while (text != "exit")
        text = Console.ReadLine();
 
      // Close the Form so we can exit
      form.Invoke(new SimpleDelegate(form.Close));
    }
  }
}

C# WPF Anonymous Delegate

I had a WPF application with some background tasks that intermittently updated their status properties. These properties were bound to some WPF controls, so the updates had to be dispatched on the main window thread. This is simple enough, you just call Dispatcher.Invoke on a delegate.
 
You can do it like this:

private delegate void SetDelegate(DependencyProperty prop, object value);

void CountThread() {
  int count = 0;
  while (true) {
    this.Dispatcher.Invoke( new SetDelegate(SetValue), CountProperty, (count++).ToString());
    Thread.Sleep(1000);
  }
}

But I prefer to use anonymous methods cast to an Action like this:

void CountThread() {
  int count = 0;
  while (true) {
    this.Dispatcher.Invoke(
      // Anonymous delegate that calls SetValue on the Window thread
      (Action)delegate() {
        SetValue(CountProperty, (count++).ToString());
    });
    Thread.Sleep(1000);
  }
}

Here’s a complete C# application for demonstrating an anonymous delegate. Also shows how to run a WPF application using just C# code. Create a WPF application, delete all the .xaml and .cs files, create a .cs file, copy the contents below to that file, compile and run.

// Simple application for demonstrating an anonymous delegate by John Stewien 2010

using System;
using System.Windows;
using System.Threading;
using System.Windows.Controls;
using System.Windows.Data;

namespace AnonDelegateTest {

  /// <summary>
  /// Application that runs the Window
  /// </summary>
  public class App : Application {
    /// <summary>
    /// Application Entry Point.
    /// </summary>
    [System.STAThreadAttribute()]
    public static void Main() {
      App app = new App();
      app.Run(new Window1());
    }
  }

  /// <summary>
  /// Main Application Window
  /// </summary>
  public class Window1 : Window {
    public Window1() {

      // Set the data context and add a Textbox with a binding to Count
      this.DataContext = this;
      this.Width = this.Height = 300;
      this.Title = "Anonymous Delegate Example";

      TextBox textbox = new TextBox();
      textbox.SetBinding(TextBox.TextProperty, new Binding("Count"));
      this.AddChild(textbox);
     
      // Create and run the thread to increment Count
      Thread countThread = new Thread(new ThreadStart(CountThread));
      countThread.IsBackground = true;
      countThread.Start();
    }

    /// <summary>
    /// DependencyProperty for binding to the textbox
    /// </summary>
    public static readonly DependencyProperty CountProperty =
    DependencyProperty.Register("Count", typeof(string), typeof(Window1), new UIPropertyMetadata("0"));

    /// <summary>
    /// Increment the count value on a separate thread
    /// </summary>
    void CountThread() {
      int count = 0;
      while (true) {
        this.Dispatcher.Invoke(
          // Anonymous delegate that calls SetValue on the Window thread
          (Action)delegate() {
          SetValue(CountProperty, (count++).ToString());
        });
        Thread.Sleep(1000);
      }
    }
  }
}

Getting a recursive directory listing in C++

I had a collection of 3D Studio Max files that were in separate directories with their textures. I had an open source 3DS file loader (lib3ds), and I wanted to iterate over all the files in the directories and print out a list of specs for all the files. To do this I wrote a quick routine to pull in all the filenames and then load each one. Below is the simple function I wrote for retrieving a recursive list of files from a directory. Included is a main function to test it. To use it in a Unicode app, change std::string to std::wstring and std::wcout.

#include <windows.h>
#include <string>
#include <iostream>
#include <list>

// This recursively gets all the filenames that match the filter, and adds
// them to the std list passed in
void GetFileListing(std::list<std::string>& listing, std::string directory, std::string fileFilter, bool recursively=true)
{
  // If we are going to recurse over all the subdirectories, first of all
  // get all the files that are in this directory that match the filter
  if (recursively)
    GetFileListing(listing, directory, fileFilter, false);

  directory += "\\";

  WIN32_FIND_DATA FindFileData;
  HANDLE hFind = INVALID_HANDLE_VALUE;

  // Setup the filter according to whether we are getting the directories
  // or just the files
  std::string filter = directory + (recursively ? "*" : fileFilter);

  // Find the first file in the directory.
  hFind = FindFirstFile(filter.c_str(), &FindFileData);

  if (hFind == INVALID_HANDLE_VALUE)
  {
    DWORD dwError = GetLastError();
    if (dwError!=ERROR_FILE_NOT_FOUND)
    {
      std::cout << "Invalid file handle for filter "<<filter<<". Error is " << GetLastError() << std::endl;
    }
  }
  else
  {
    // Add the first file found to the list
    if (!recursively)
    {
      listing.push_back(directory + std::string(FindFileData.cFileName));
    }

    // List all the other files in the directory.
    while (FindNextFile(hFind, &FindFileData) != 0)
    {
      if (!recursively)
      {
        listing.push_back(directory + std::string(FindFileData.cFileName));
      }
      else
      {
        // If we found a directory then recurse into it
        if ((FindFileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)>0 && FindFileData.cFileName[0]!='.')
        {
          GetFileListing(listing, directory + std::string(FindFileData.cFileName), fileFilter);
        }
      }
    }

    DWORD dwError = GetLastError();
    FindClose(hFind);
    if (dwError != ERROR_NO_MORE_FILES)
    {
      std::cout << "FindNextFile error. Error is "<< dwError << std::endl;
    }
  }
}

int main(int argc, char* argv[])
{
  std::list<std::string> listing;
  GetFileListing(listing, ".", "*.3ds");
  for(std::list<std::string>::iterator it = listing.begin(); it!=listing.end();++it)
  {
    std::cout << *it << std::endl;
  }
  getchar();
}