Let’s say we have an mathematically intensive application written in C# with SQL Server as the backing database. When designing the application it can be convenient to call on the wealth of libraries and rather convenient IDE in a product such as Matlab. My normal thinking would be to first design and test the algorithms in Matlab and then once the design is complete, to rewrite the code in C# and SQL, possibly making use of the SQLCLR to write C# stored procedures. We could also make use of existing math/stats libraries where possible. However it turns out that Matlab has a tool called BuilderNE which can generate C# dlls which can allow your app to conveniently interface with the Matlab code. Using this you could directly call your algorithms inside Matlab and run the code you’ve designed without re-implementation. Now obviously you’ll be paying a performance penalty, but how much of one? I set out to find out. I decided to test a simple function, Standard Deviation over an entire population (STDEV), in a series of scenarios to see how the various options stacked up.
Note that while STDEVP might seem like an easy enough algorithm, there are several ways to implement it and each, due to floating point precision, will give different answers. For my C# implementation I’ve used the Welford algorithm as described on Wikipedia under Rapid calculation methods. You might also be interested in this blog post: Comparing three methods of computing standard deviation.
- Matlab
- SQL Server
- Native Implementation of STDEV
- Call .NET DLL exported from Matlab with SQLCLR
- Naive custom C# user defined aggregate
- Improved C# user defined aggregate
- C# Application
- C.1) Retrieve STDEV from database
- Call Matlab DLL
- Custom C# implementation
- Alglib’s implementation
All tests run an an Intel E4500 with 2GB RAM.
First we need to generate some sample test data for the comparison, enough that it takes an easily measureable amount of time to complete the calculation. So 1 million 7-digit random numbers. Note that I’ve named my database Matlab for this comparison.
use Matlab;
CREATE TABLE [dbo].[data](
[value] [numeric](10, 0) NOT NULL
) ON [PRIMARY]
DECLARE @Index INT ;
SET @Index = 1 ;
WHILE @Index <= 1000000 BEGIN
INSERT data (value) VALUES (ROUND(RAND() * 10,0) + 1000000000) ;
SET @Index = @Index + 1 ;
END
A) Matlab
Set up an ODBC connection called Matlab_user, retrieve the data and then run the stddev over the values. Here's the Matlab code:
conn = database('Matlab_user','','');
setdbprefs('DataReturnFormat','numeric');
sqlquery = 'SELECT value FROM data';
tElapsed_fetch = zeros(1,10);
tElapsed = zeros(1,10);
values = zeros(1,1000000);
for n = 1:10
tStart = tic; values = fetch(conn, sqlquery); tElapsed_fetch(n) = toc(tStart);
tStart = tic; std(values,1); tElapsed(n) = toc(tStart);
end
format long g
fetch = round(tElapsed_fetch*1000)'
calc = round(tElapsed*1000)'
total = fetch+calc
As expected, being outside the database means a big performance penalty with most of the time spent in data retrieval.
Average performance: 13250 ms (13217 ms fetch, 33 ms calculation)
B) SQL Server
B.1) The first and simplest test is simply to use SQL Server's native STDEV function:
DECLARE @Index INT ;
SET @Index = 1 ;
WHILE @Index <= 10 BEGIN
SET STATISTICS TIME ON
SELECT STDEV(value)
FROM data;
SET STATISTICS TIME OFF
SET @Index = @Index + 1 ;
END;
Average performance: 555 ms
Note if you want to reproduce this test on your own SQL Server instance but are getting strange timing results you may to look at SQL Server timing values may be incorrect when you use utilities or technologies that change CPU frequencies.
B.2) Call .NET DLL exported from Matlab with SQLCLR
It easy to generate a C# DLL using BuilderNE (>> deploytool) so why not put it directly into the database as an user defined aggregate function? The problem with doing this is that it requires all the referenced DLLs to be loaded into the database. In (2) we needed to use MWArray so let's try loading that into SQL Server:
CREATE ASSEMBLY MWArray from 'C:\Program Files\MATLAB\R2008b\toolbox\dotnetbuilder\bin\win32\v2.0\MWArray.dll'
WITH PERMISSION_SET = SAFE;
But we get an error message:
Msg 10301, Level 16, State 1, Line 1
Assembly 'MWArray' references assembly 'system.drawing, version=2.0.0.0, culture=neutral, publickeytoken=b03f5f7f11d50a3a.', which is not present in
the current database. SQL Server attempted to locate and automatically load the referenced assembly from the same location where referring assembly came from,
but that operation has failed (reason: 2(failed to retrieve text for this error. Reason: 15105)). Please load the referenced assembly into the current
database and retry your request.
Now while it is possible to load system.drawing into the database it is not recommended and isn't generally a viable option, see here for the method and a spirited discussion on the merits. While I was able to force the assemblies into the database:
USE Matlab;
ALTER DATABASE Matlab SET TRUSTWORTHY ON;
CREATE ASSEMBLY [system.drawing]
FROM 'C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Drawing.dll'
WITH PERMISSION_SET = UNSAFE;
CREATE ASSEMBLY MWArray from 'C:\Program Files\MATLAB\R2008b\toolbox\dotnetbuilder\bin\win32\v2.0\MWArray.dll'
WITH PERMISSION_SET = UNSAFE;
CREATE ASSEMBLY stddev from 'C:\Users\\Documents\MATLAB\stddev\distrib\stddev.dll'
WITH PERMISSION_SET = UNSAFE;
Then trying to build in VS2008 will yield this error message:
Metadata file '.\Visual Studio 2008\Projects\stdevp_csharp\stdevp_csharp\obj\sqlclr\MWArray.dll' could not
be opened -- 'Error importing module 'ManagedCPPAPI.netmodule' of assembly
'.\Visual Studio 2008\Projects\stdevp_csharp\stdevp_csharp\obj\sqlclr\MWArray.dll'
-- The system cannot find the file specified.'
Which you can resolve by copying the file ManagedCPPAPI.netmodule across to the specified directory. However that is as far as I got, eventually being prevented from making further progress by a runtime error.
Average performance: N/A (Runtime Failure)
B.3) Naive custom C# user defined aggregate
Take the C# implementation from C.3) and load it into the database as a user defined aggregate. The process to create a UDA with VS2008 is:
- New Project -> Visual C# -> Database -> SQL Server Project
- On the project name in the solution explorer, right click -> Add -> Aggregate...
You'll get a template for an aggregate function which I've altered as below:
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined,
IsInvariantToNulls = true,
IsInvariantToDuplicates = false,
IsInvariantToOrder = true,
MaxByteSize = -1) //maximum size in bytes of persisted value
]
public struct stdevp_cs_preload : IBinarySerialize
{
private int [] a;
private int i;
private double stdevp_csharp_inner(int[] value)
{
double[] average = { 0, 0 };
double stdev = 0;
for (int i = 1; i <= value.Length; i++)
{
average[i % 2] = average[(i + 1) % 2] + (value[i - 1] - average[(i + 1) % 2]) / i;
stdev += (value[i - 1] - average[(i + 1) % 2]) * (value[i - 1] - average[i % 2]);
}
return Math.Sqrt(stdev / (value.Length-1));
}
public void Init()
{
a = new int[1000000];
i = 0;
}
public void Accumulate(SqlInt32 value)
{
if (value.IsNull)
{
return;
}
a[i++] = value.Value;
}
public void Merge(stdevp_cs_preload Group)
{
// TODO
}
public SqlDouble Terminate()
{
return new SqlDouble(stdevp_csharp_inner(a));
}
public void Read(BinaryReader r)
{
i = r.ReadInt32();
a = new int[i];
for (int j = 0; j < i; j++)
{
a[j] = r.ReadInt32();
}
}
public void Write(BinaryWriter w)
{
w.Write(i);
for (int j = 0; j < i; j++)
{
w.Write(a[j]);
}
}
}
Now you'll probably have noted that this creates an array with a million elements, rather conveniently the precise size of tested data. Of course this isn't practical or deployable and you'd need to use dynamic memory allocation depending on the size of the dataset. But this is only a simple test and two things stop me from writing the proper code, dynamic allocation will only slow the code so the time achieved is a minimum baseline and there's a much better way of writing the code so no need to spend time on this version. Later on in this post you'll see that this approach is not even close to being in contention in terms of performance but if you really wanted to you could investigate optimizing the memory allocation using stackalloc which creates a pinned (fast) block of temporary memory, that gets deleted (popped off the stack) on exit.
The large memory allocation exceeds the typical MaxByteSize of 8000 so I've had to use -1 for unlimited and this prevents deployment from within Visual Studio and you need to manually deploy.
Average performance: 1180 ms, or more than twice as long as the native STDEV.
B.4) Improved C# user defined aggregate
Since the user defined aggregate functions are called for every row there's no need to store the data, nor is there a requirement for a loop. Simply taking the sample algorithm and putting the part from the loop into the Accumulate function greatly reduces the amount of memory needed and allows the processing to start before all the data is gathered and should yield superior performance. Here's the modified code:
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Format.UserDefined,
IsInvariantToNulls = true,
IsInvariantToDuplicates = false,
IsInvariantToOrder = true,
MaxByteSize = 28) //maximum size in bytes of persisted value
]
public struct stdevp_cs : IBinarySerialize
{
private double[] a;
private double q;
private int i;
public void Init()
{
a = new double[2];
a[0] = 0; a[1] = 0;
q = 0;
i = 0;
}
public void Accumulate(SqlInt32 value)
{
if (value.IsNull)
{
return;
}
i++;
a[i % 2] = a[(i + 1) % 2] + (value.Value - a[(i + 1) % 2]) / i;
q += (value.Value - a[(i + 1) % 2]) * (value.Value - a[i % 2]);
}
public void Merge(stdevp_cs Group)
{
// TODO
}
public SqlDouble Terminate()
{
return new SqlDouble(Math.Sqrt(q / i-1));
}
public void Read(BinaryReader r)
{
i = r.ReadInt32();
q = r.ReadDouble();
a = new double[2];
a[0] = r.ReadDouble();
a[1] = r.ReadDouble();
}
public void Write(BinaryWriter w)
{
w.Write(i);
w.Write(q);
w.Write(a[0]);
w.Write(a[1]);
}
}
Average performance: 969 ms, which compared to the built-in STDEV's performance of 555 ms, is roughly 75% slower.
C) C# Application (using Stopwatch class for timing)
C.0) Retrieve values from database
In order to run any C# stdev implementation we'll need to retrieve the data:
double[] a = (from p in db.datas
select (double)p.value).ToArray();
Average performance: 451 ms
Note: Linq to SQL is not the fastest way to access a database and you could potentially cut this time by using something more low-level like a DataReader or DataTable. A good comparison of the various C# database access approaches can be found here Entity Framework and LINQ to SQL Performance. My preference is Linq to SQL for convenience and good although not optimal performance.
C.1) Retrieve STDEV from database
Linq won't translate a STDEV function call across so first create a view on the data with STDEV:
CREATE VIEW dbo.stdev_data(value) AS
SELECT STDEV(value) value
FROM Matlab.dbo.data
Then create a "Linq to SQL" class to connect to the database:
- On the project name in the solution explorer, right click -> Add -> New Item...
- Data->Linq to SQL Classes
- Browse Server Explorer and drag view onto DataClasses1.dbml
then call it:
DataClasses1DataContext db = new DataClasses1DataContext();
double stdev = (from p in db.stdev_datas
select (double)p.value).Single();
Average performance: 517 ms
C.2) Call Matlab DLL
Let's reuse the .NET DLL created with BuilderNE for B.2), add a reference to it in out project, along with the MWArray and then call it:
using MathWorks.MATLAB.NET.Utility;
using MathWorks.MATLAB.NET.Arrays;
/* ... */
double test_matlab(ref double[] value)
{
Stddev oStddev = new Stddev();
MWNumericArray values = new MWNumericArray(value.Length, 1, value);
return oStddev.std(values).ToArray().Cast().First();
}
Notes: This option requires the Matlab runtime (MCR) which is fairly significant at 257 MB and may limit your deployment options.
Directory of C:\Program Files\MATLAB\R2008b\toolbox\compiler\deploy\win32
17/09/2008 08:15 PM 270,489,744 MCRInstaller.exe
One time MCR initialization: 2025 ms
Average performance: 95 ms
C.3) Custom C# implementation
Here's a simple implementation of Welford from the Wikipedia in C#:
static double stdevp_csharp(ref double[] value, ref double mean, ref double variance, bool fcompensate)
{
double[] average = {0.0,0.0};
double stdev = 0;
mean = 0; // adjust values
if ((fcompensate) && (value.Length > 1000))
{
int n = (int)(value.Length * 0.1);
for (int i = 0; i <= (n - 1); i++)
{
mean = mean + value[i];
}
mean = mean / n;
}
for (int i = 1; i <= value.Length; i++)
{
average[i % 2] = average[(i+1) % 2] + (value[i-1] - mean - average[(i+1) % 2]) / i;
stdev += (value[i-1] - mean - average[(i+1) % 2]) * (value[i-1] - mean - average[i % 2]);
}
mean = average[value.Length % 2];
variance = stdev / (value.Length - 1);
return Math.Sqrt(variance);
}
Average performance: 21 ms
C.4) Alglib's implementation
Reimplement the Alglib calculatemoments function which uses a corrected two-pass algorithm) to remove the skew and kurtosis:
// Copyright (c) 2007, Sergey Bochkanov (ALGLIB project), GNU GPL: http://www.fsf.org/licensing/licenses
public static void calc_stddev(ref double[] x, int n, ref double stddev, ref double mean, bool fcompensate)
{
int i;
double v1 = 0;
double v2 = 0;
double variance;
double sum = 0;
if (fcompensate)
{
for (i = 0; i <= (n - 1); i++)
{
sum = sum + x[i];
}
mean = sum / n;
}
//
// Variance (using corrected two-pass algorithm)
//
if (n != 1)
{
v1 = 0; v2 = 0;
for (i = 0; i <= n - 1; i++)
{
v1 = v1 + (x[i] - mean) * (x[i] - mean);
v2 = v2 + (x[i] - mean);
}
v2 = v2*v2 / n;
variance = (v1 - v2) / (n-1);
if ((double)(variance) < (double)(0))
{
variance = 0;
}
stddev = Math.Sqrt(variance);
}
}
Average performance: 5 ms
Conclusion
Summary (Performance in ms):
| Matlab |
Native |
13250 |
| SQL Server |
Native |
555 |
| Matlab |
N/A |
| C# Naive |
1180 |
| C# Improved |
969 |
| C# |
SQL |
517 |
| Matlab |
451+95=546 |
| C# Welford |
451+21=472 |
| C# Alglib |
451+5=456 |
The dataset was chosen to be particularly tricky, essentially random numbers between 0-9 with a big number added to them. We know that the STDEV of numbers between 0-9 would likely be less than five so it is interesting to measure the accuracy of the various algorithms:
Summary (Accuracy):
| SQL |
502.056914348508 |
| Matlab |
2.91647646672706 |
| C# Welford |
2.916476466717 |
| C# Welford (uncompensated) |
2.91647680582741 |
| C# Alglib |
2.91647646672706 |
| C# Alglib (uncompensated) |
502.056914348508 |
So the native implementation of STDEV on SQL Server is not particularly accurate in this tough test case, coming up with a value over 500 for something we know can't possibly be more than 5 and is likely less than 3. Other algorithms agree on a number around 2.9 except for the uncompensated Alglib variant which gets the same number as SQL Server.
Timings are all for compensated algorithms. Compensating is subtracting the precalculated mean from each value to decrease their magnitude and thus decrease rounding errors. There are surprising results in those tables, particularly notable is that both of the C# algorithms outperform the database built-in STDEV function both in accuracy and speed despite needing to pay a performance penalty of 451 ms to retrieve the data. The Alglib function returns the same value as Matlab in under 10% of the time that Matlab needs and the uncompensated version returns the same wrong value that SQL Server's native STDEV comes up with. To answer the question of whether or not Matlab is competitive on time, I would say that for this test it is. Most of the time spent is retrieving the dataset into the C# app for processing and the cost of actually calculating the STDEV is small relative to that. For problems where there is more to be calculated the performance ratio of 19:1 between Matlab's 95ms and Alglib 5ms will be a bigger consideration. So we found out what we probably could have guessed from the start - if you aren't doing too much math then the time taken in I/O and database retrieval will drown out the differences in the performance of the algorithm and Matlab is an option. But the really surprising result for me was that for this sample size it is quicker and more accurate to take the data out of the database and process it in C# rather than use the native database functions and using SQLCLR just further slows things down.
Notes: To boost the accuracy of SQL Server's STDEV you'd need to precalc the Average, subtract it from each value and then run stdev. This would slow the algorithm further and make it less competitive on performance although boost the accuracy. Matlab, SQL Server and the C# app were all running on the same machine.