Loading...
[-]

Comments break the CUDA preprocessing?

After adding comments to various points the template code suddenly stopped working. After some searching I discovered that this was the problem code:

cutilSafeCall( cudaMemcpy( d_idata, h_idata, mem_size, // Copies nothing to memory
   cudaMemcpyHostToDevice) );

and this was the solution:

cutilSafeCall( cudaMemcpy( d_idata, h_idata, mem_size,
   cudaMemcpyHostToDevice) ); // Correctly copies the data

You are probably thinking: hey aren’t those identical? Is there a typo, did he copy the wrong sample? But no, the only difference between the working and not is the position of the comment. The VC++ compiler doesn’t normally falter on this kind of code but a *.cu file goes through several phases before it is passed to the host compiler. These phases are documented in “The CUDA Compiler Driver NVCC” PDF (only available with the SDK) and the first of which is –cuda which will “compile all .cu input files to .cu.c/.cu.cpp output.”  According to the documentation the flag –dryrun will “not execute the compilation commands generated by nvcc. Instead, list them”, so let’s try that:

PS C:\CUDA\SDK\projects\template> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2009 NVIDIA Corporation
Built on Sat_May__2_05:54:12_PDT_2009
Cuda compilation tools, release 2.2, V0.2.1221
PS C:\CUDA\SDK\projects\template> nvcc -cuda -dryrun -I..\..\common\inc template.cu
#$ _SPACE_=
#$ _MODE_=DEVICE
#$ _HERE_=C:\CUDA\bin
#$ _THERE_=C:\CUDA\bin
#$ TOP=C:\CUDA\bin/..
#$ PATH=C:\CUDA\bin/../extools/bin;C:\CUDA\bin/../open64/bin;C:\CUDA\bin/../bin;C:\CUDA\b
n/../lib;C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE;C:\Program Files
x86)\Microsoft Visual Studio 9.0\VC\BIN;C:\Program Files (x86)\Microsoft Visual Studio 9.
\Common7\Tools;C:\Windows\Microsoft.NET\Framework\v3.5;C:\Windows\Microsoft.NET\Framework
v2.0.50727;C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\VCPackages;C:\Program Fi
es\\Microsoft SDKs\Windows\v6.0A\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\W
em;C:\Windows\system32\WindowsPowerShell\v1.0\;C:\CUDA\bin;c:\Program Files (x86)\Microso
t SQL Server\100\Tools\Binn\;c:\Program Files\Microsoft SQL Server\100\Tools\Binn\;c:\Pro
ram Files\Microsoft SQL Server\100\DTS\Binn\;c:\Program Files (x86)\Microsoft SQL Server\
00\Tools\Binn\VSShell\Common7\IDE\;c:\Program Files (x86)\Microsoft SQL Server\100\DTS\Bi
n\;c:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\;C:\s
sint
#$ INCLUDES="-IC:\CUDA\bin/../include" "-IC:\CUDA\bin/../include/cudart"
#$ LIBRARIES=  "/LIBPATH:C:\CUDA\bin/../lib" cudart.lib
#$ CUDAFE_FLAGS=
#$ OPENCC_FLAGS=
#$ PTXAS_FLAGS=
#$ PATH=C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN;C:\CUDA\bin/../extools/
in;C:\CUDA\bin/../open64/bin;C:\CUDA\bin/../bin;C:\CUDA\bin/../lib;C:\Program Files (x86)
Microsoft Visual Studio 9.0\Common7\IDE;C:\Program Files (x86)\Microsoft Visual Studio 9.
\VC\BIN;C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\Tools;C:\Windows\Micro
oft.NET\Framework\v3.5;C:\Windows\Microsoft.NET\Framework\v2.0.50727;C:\Program Files (x8
)\Microsoft Visual Studio 9.0\VC\VCPackages;C:\Program Files\\Microsoft SDKs\Windows\v6.0
\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\system32\WindowsP
werShell\v1.0\;C:\CUDA\bin;c:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\;c:
Program Files\Microsoft SQL Server\100\Tools\Binn\;c:\Program Files\Microsoft SQL Server\
00\DTS\Binn\;c:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\I
E\;c:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\;c:\Program Files (x86)\Micro
oft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\;C:\sysint
#$ cl -D__CUDA_ARCH__=100 -nologo -E -TP -DCUDA_NO_SM_13_DOUBLE_INTRINSICS -DCUDA_FLOAT_M
TH_FUNCTIONS -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_11_ATOMIC_INTRINSICS  "-IC:\C
DA\bin/../include" "-IC:\CUDA\bin/../include/cudart"   -I. -D__CUDACC__ -C  -I "../../com
on/inc" -FI "cuda_runtime.h" > "C:\Users\Andrew\AppData\Local\Temp/tmpxft_00000774_000000
0-3_template.cpp1.ii" "template.cu"
Internal error
PS C:\CUDA\SDK\projects\template>

Unfortunately on my Vista 64 system that results in the “Internal error” as shown above. However there are some other options which allow us to see into the nvcc compilation process. The –keep command will “Keep all intermediate files that are generated during internal compilation steps.” and provides some good insight into what’s going on. We can get the .cubin file which tells us how much shared memory and how many registers our kernel uses, also the ptx file which is the kernel in assembly code and lastly the .cu.cpp files along with quite a few .i preprocessed files from various stages. Indeed, so many files are generated that nvcc provides a –clean command to clean up provided you pass the exact same parameters. Here’s a list of the generated files:

PS C:\CUDA\SDK\projects\template> nvcc -cuda -keep -clean -I..\..\common\inc template.cu
PS C:\CUDA\SDK\projects\template> $before = dir | sort LastWriteTime,Name
PS C:\CUDA\SDK\projects\template> nvcc -cuda -keep -I..\..\common\inc template.cu
template.cu
template.cudafe1.gpu
template.cudafe2.gpu
template.cudafe1.cpp
PS C:\CUDA\SDK\projects\template> $after = dir | sort LastWriteTime,Name
PS C:\CUDA\SDK\projects\template> (compare-object $before $after -syncWindow ($before.count/2))
 | select inputobject

InputObject
-----------
template.cpp1.ii
template.cudafe1.c
template.cudafe1.gpu
template.cudafe1.stub.c
template.cudafe1.stub.h
template.cpp2.i
template.cudafe2.stub.h
template.cudafe2.c
template.cudafe2.gpu
template.cudafe2.stub.c
template.cpp3.i
template.hash
template.ptx
template.linkinfo
template.sm_10.cubin
template.fatbin.c
template.cudafe1.cpp
template.cu.cpp

PS C:\CUDA\SDK\projects\template> nvcc -cuda -keep -clean -I..\..\common\inc template.cu
PS C:\CUDA\SDK\projects\template> dir

    Directory: C:\CUDA\SDK\projects\template

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         2/07/2009   7:52 PM       5739 template.cu
-a---        24/02/2009   4:40 PM       2033 template.sln
-a---        24/02/2009   4:40 PM      22247 template.vcproj
-a---        24/02/2009   4:40 PM       2752 template_gold.cpp
-a---        24/02/2009   4:40 PM       3183 template_kernel.cu
-a---         2/07/2009   7:47 PM      11264 template_vc90.ncb
-a---        24/02/2009   4:40 PM       2044 template_vc90.sln
-a---        24/02/2009   4:40 PM      22401 template_vc90.vcproj

PS C:\CUDA\SDK\projects\template>

But still the question remains of how exactly those files were generated and with what commands? The –dryrun switch should tell us that but doesn’t. To find out I used SysInternals’ excellent Process Monitor tool to monitor what was going on as the command ran and dump the data to XML. Then I was able to use PoSH, with VsVars32 and Get-SysInternals scripts, to extract the relevant command lines:

PS C:\CUDA\SDK\projects\template> get-sysinternals
# Long output deleted (Script uses write-host which doesn't use streams and cannot be redirected) ...
PS C:\CUDA\SDK\projects\template> $env:path += ";c:\sysint"
PS C:\CUDA\SDK\projects\template> C:\util\vsvars32.ps1
Visual Studio 2008 Windows PowerShell
PS C:\CUDA\SDK\projects\template> procmon /Quiet /Minimized /NoFilter /BackingFile local.pml
PS C:\CUDA\SDK\projects\template> nvcc -cuda -keep -I..\..\common\inc template.cu
tmpxft_000010b4_00000000-3_template.cudafe1.gpu
tmpxft_000010b4_00000000-8_template.cudafe2.gpu
tmpxft_000010b4_00000000-3_template.cudafe1.cpp
PS C:\CUDA\SDK\projects\template> procmon /Terminate | out-null
PS C:\CUDA\SDK\projects\template> procmon /OpenLog local.pml /SaveAs local.xml | out-null
PS C:\CUDA\SDK\projects\template> ([xml](gc local.xml)).procmon.processlist.SelectNodes("
process[ProcessName='cmd.exe']/CommandLine") | % {$_.InnerText} | get-unique
template.cu
"C:\Windows\system32\cmd.exe"
"C:\Windows\System32\cmd.exe" /k ""C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\
vcvarsall.bat"" x86
cmd /c ""C:\\Users\\Andrew\\AppData\\Local\\Temp/tmpxft_000010b4_00000000-1.bat" "
C:\Windows\system32\cmd.exe /c reg query "HKLM\SOFTWARE\Microsoft\Microsoft SDKs\Windows"
 /v "CurrentInstallFolder"
C:\Windows\system32\cmd.exe /c cl -D__CUDA_ARCH__=100 -nologo -E -TP -DCUDA_NO_SM_13_DOUB
LE_INTRINSICS -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_
11_ATOMIC_INTRINSICS  "-IC:\CUDA\bin/../include" "-IC:\CUDA\bin/../include/cudart"   -I.
-D__CUDACC__ -C  -I "../../common/inc" -FI "cuda_runtime.h" > "C:\Users\Andrew\AppData\Lo
cal\Temp/tmpxft_000010b4_00000000-6_template.cpp1.ii" "template.cu"
C:\Windows\system32\cmd.exe /c cudafe --m64 --microsoft_version=1500 --msvc_target_versio
n=1500 --compiler_bindir "C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/BIN/../..
" --sdk_dir "C:/Program Files//Microsoft SDKs/Windows/v6.0A/" --diag_error=host_device_li
mited_call --diag_error=ms_asm_decl_not_allowed -tused --gen_c_file_name "C:\Users\Andrew
\AppData\Local\Temp/tmpxft_000010b4_00000000-3_template.cudafe1.c" --stub_file_name "C:\U
sers\Andrew\AppData\Local\Temp/tmpxft_000010b4_00000000-3_template.cudafe1.stub.c" --stub
_header_file_name "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_00000000-3_template
.cudafe1.stub.h" --gen_device_file_name "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010
b4_00000000-3_template.cudafe1.gpu" --include_file_name "C:\Users\Andrew\AppData\Local\Te
mp/tmpxft_000010b4_00000000-5_template.fatbin.c" "C:\Users\Andrew\AppData\Local\Temp/tmpx
ft_000010b4_00000000-6_template.cpp1.ii"
C:\Windows\system32\cmd.exe /c cl -D__CUDA_ARCH__=100 -nologo -E -TC -DCUDA_NO_SM_13_DOUB
LE_INTRINSICS -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_
11_ATOMIC_INTRINSICS  "-IC:\CUDA\bin/../include" "-IC:\CUDA\bin/../include/cudart"   -I.
-D__CUDACC__ -C  -I "../../common/inc" > "C:\Users\Andrew\AppData\Local\Temp/tmpxft_00001
0b4_00000000-7_template.cpp2.i" "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_00000
000-3_template.cudafe1.gpu"
C:\Windows\system32\cmd.exe /c cudafe --m64 --microsoft_version=1500 --msvc_target_versio
n=1500 --compiler_bindir "C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/BIN/../..
" --sdk_dir "C:/Program Files//Microsoft SDKs/Windows/v6.0A/" --c --gen_c_file_name "C:\U
sers\Andrew\AppData\Local\Temp/tmpxft_000010b4_00000000-8_template.cudafe2.c" --stub_file
_name "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_00000000-8_template.cudafe2.stu
b.c" --stub_header_file_name "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_00000000
-8_template.cudafe2.stub.h" --gen_device_file_name "C:\Users\Andrew\AppData\Local\Temp/tm
pxft_000010b4_00000000-8_template.cudafe2.gpu" --include_file_name "C:\Users\Andrew\AppDa
ta\Local\Temp/tmpxft_000010b4_00000000-5_template.fatbin.c" "C:\Users\Andrew\AppData\Loca
l\Temp/tmpxft_000010b4_00000000-7_template.cpp2.i"
C:\Windows\system32\cmd.exe /c cl -D__CUDA_ARCH__=100 -nologo -E -TC -DCUDA_NO_SM_13_DOUB
LE_INTRINSICS -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_
11_ATOMIC_INTRINSICS  "-IC:\CUDA\bin/../include" "-IC:\CUDA\bin/../include/cudart"   -I.
-D__GNUC__ -D__CUDABE__  -I "../../common/inc" > "C:\Users\Andrew\AppData\Local\Temp/tmpx
ft_000010b4_00000000-9_template.cpp3.i" "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010
b4_00000000-8_template.cudafe2.gpu"
C:\Windows\system32\cmd.exe /c filehash -s "" "C:\Users\Andrew\AppData\Local\Temp/tmpxft_
000010b4_00000000-9_template.cpp3.i" > "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b
4_00000000-10_template.hash"
C:\Windows\system32\cmd.exe /c nvopencc  -TARG:compute_10 -m64 "C:\Users\Andrew\AppData\L
ocal\Temp/tmpxft_000010b4_00000000-9_template.cpp3.i"  -o "C:\Users\Andrew\AppData\Local\
Temp/tmpxft_000010b4_00000000-4_template.ptx"
C:\Windows\system32\cmd.exe /c ptxas --key="8c2645991a519b06"  -arch=sm_10 --link-info "t
emplate.linkinfo" "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_00000000-4_template
.ptx"  -o "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_00000000-11_template.sm_10.
cubin"
C:\Windows\system32\cmd.exe /c erase "C:\Users\Andrew\AppData\Local\Temp\tmpxft_00000c88_
00000000"*
C:\Windows\system32\cmd.exe /c fatbin --key="8c2645991a519b06" --source-name="template.cu
" --usage-mode="" --embedded-fatbin="C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_0
0000000-5_template.fatbin.c" "--image=profile=compute_10,file=C:\Users\Andrew\AppData\Loc
al\Temp/tmpxft_000010b4_00000000-4_template.ptx" "--image=profile=sm_10,file=C:\Users\And
rew\AppData\Local\Temp/tmpxft_000010b4_00000000-11_template.sm_10.cubin"
C:\Windows\system32\cmd.exe /c bin2c "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_
00000000-11_template.sm_10.cubin" --name __deviceText_$sm_10$ --type longlong --static --
const --padd 0 > "C:\Users\Andrew\AppData\Local\Temp/tmpxft_00001408_00000000-0"
C:\Windows\system32\cmd.exe /c bin2c "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_
00000000-4_template.ptx" --name __deviceText_$compute_10$ --type longlong --static --cons
t --padd 0 > "C:\Users\Andrew\AppData\Local\Temp/tmpxft_00001408_00000000-1"
C:\Windows\system32\cmd.exe /c erase "C:\Users\Andrew\AppData\Local\Temp\tmpxft_00001408_
00000000"*
C:\Windows\system32\cmd.exe /c cudafe++ --m64 --microsoft_version=1500 --msvc_target_vers
ion=1500 --compiler_bindir "C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/BIN/../
.." --sdk_dir "C:/Program Files//Microsoft SDKs/Windows/v6.0A/" --diag_error=host_device_
limited_call --diag_error=ms_asm_decl_not_allowed --parse_templates --gen_c_file_name "C:
\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_00000000-3_template.cudafe1.cpp" --stub_
file_name "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_00000000-3_template.cudafe1
.stub.c" --stub_header_file_name "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_0000
0000-3_template.cudafe1.stub.h" "C:\Users\Andrew\AppData\Local\Temp/tmpxft_000010b4_00000
000-6_template.cpp1.ii"
C:\Windows\system32\cmd.exe /c cl -D__CUDA_ARCH__=100 -nologo -E -TP -DCUDA_NO_SM_13_DOUB
LE_INTRINSICS -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_
11_ATOMIC_INTRINSICS  "-IC:\CUDA\bin/../include" "-IC:\CUDA\bin/../include/cudart"   -I.
-I "../../common/inc" > "template.cu.cpp" "C:\Users\Andrew\AppData\Local\Temp/tmpxft_0000
10b4_00000000-3_template.cudafe1.cpp"
C:\Windows\system32\cmd.exe /c erase "C:\Users\Andrew\AppData\Local\Temp\tmpxft_000010b4_
00000000"*
PS C:\CUDA\SDK\projects\template>

It turns out the the second command is the one that introduces the error calling “cl.exe –E –C” where –E preprocesses the files and –C retains comments during preprocessing.  When this option is used cl.exe moves the comment to the start of the line, taking the original template.cu line 123 and generating the template.cpp1.ii line 53716 in one of the intermediate preprocessed files:

PS C:\CUDA\SDK\projects\template> dir template.* | select-string "Copies nothing"

template.cpp1.ii:53716:    // Copies nothing to memory__cudaSafeCall (cudaMemcpy( h_odata
, d_odata, sizeof( float) * num_threads, cudaMemcpyDeviceToHost), "template.cu", 123);
template.cu:122:    cutilSafeCall( cudaMemcpy( h_odata, d_odata, sizeof( float) * num_thr
eads, // Copies nothing to memory

Which results in the code being dropped from the .cu.cpp file and leads to the problem I started with – nothing is copied to memory. Of course it was the first command which caused the issue in this case and the –dryrun command had gotten that far. But with the procmon dump from above we now know precisely how everything is built and have a means to debug any other strange occurrences.

Post a Comment

Your email is never shared. Required fields are marked *

*
*