Category Archives: CUDA

Use your CPU and GPU together

Although my previous CUDA posts have compared CPU and GPU performance, an interesting approach is to split your problem up and use both chips at the same time. Even though your task might not be ideally suited to the GPU, you can still cut your total elapsed time by [...]

Comments break the CUDA preprocessing?

After adding comments at various points, the template code suddenly stopped working. After some searching, I discovered that this was the problem code:

cutilSafeCall( cudaMemcpy( d_idata, h_idata, mem_size, // Copies nothing to memory
                           cudaMemcpyHostToDevice) );

and this was the solution:

cutilSafeCall( cudaMemcpy( d_idata, h_idata, mem_size,
                           cudaMemcpyHostToDevice) ); // Correctly copies the data

You are probably [...]
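
For comparison, here is a minimal host-only C++ sketch of the two call forms above. It rests on several assumptions: `CHECK_CALL` is a hypothetical stand-in for `cutilSafeCall` (it is not the SDK's definition), and `copy_bytes` is a hypothetical stand-in for `cudaMemcpy` that simply copies on the host. In a standards-conforming C++ preprocessor, comments are replaced by a space before macros are expanded, so both forms should copy the data. If the first form fails in a CUDA build, that would point at the toolchain rather than at the language rules, though this sketch cannot confirm what any particular nvcc version does.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the SDK's cutilSafeCall (NOT the SDK definition).
// It evaluates the call once and aborts if the result is non-zero.
#define CHECK_CALL(call)                                          \
    do {                                                          \
        int status_ = (call);                                     \
        if (status_ != 0) {                                       \
            std::fprintf(stderr, "call failed: %s\n", #call);     \
            std::abort();                                         \
        }                                                         \
    } while (0)

// Hypothetical stand-in for cudaMemcpy: copies on the host, returns 0 on success.
static int copy_bytes(void* dst, const void* src, std::size_t n, int /*kind*/) {
    std::memcpy(dst, src, n);
    return 0;
}

// Form 1: a // comment sits in the middle of the macro's argument list.
bool copy_with_inner_comment() {
    const float src[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float dst[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    CHECK_CALL( copy_bytes( dst, src, sizeof(src), // comment inside the call
                            0) );
    return std::memcmp(dst, src, sizeof(src)) == 0;
}

// Form 2: the comment follows the complete macro invocation.
bool copy_with_trailing_comment() {
    const float src[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float dst[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    CHECK_CALL( copy_bytes( dst, src, sizeof(src),
                            0) ); // comment after the call
    return std::memcmp(dst, src, sizeof(src)) == 0;
}
```

Calling both functions from a small `main` and comparing their return values is a quick way to check how a given compiler treats the two forms.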

Getting Started with CUDA (3/3) – Pageable and pinned memory

I’d added GPU-based timing to my template code and found out that most of the time was spent copying data back and forth between the host and the device. The “Bandwidth Test” in the SDK gave roughly similar results, although it mentioned something about pageable memory. But the big problem was the theoretical performance [...]

Getting Started with CUDA (2/3) – How is the GPU spending its time?

I had minimally modified the supplied SDK template code to measure CPU vs GPU performance and found that, for the simple test code (1 float multiplication), the E8400 CPU with a claimed 24 Gflops was handily outperforming a GPU with a theoretical max of 504 Gflops. Where was all [...]

Getting Started with CUDA (1/3) – SDK template

What are the capabilities of Nvidia’s CUDA running on the GPU, and how does it compare to CPU performance? I bought a GeForce 9800GT and set about finding out, starting off by installing the CUDA drivers, toolkit and SDK from the CUDA Zone.
The first thing I noticed was that on my Vista64 machine the [...]