why you must use pinned transfers to compare CUDA device bandwidth

correct me if i'm wrong, but NVIDIA's bandwidthTest program, included in the CUDA SDK, appears to take a single timing measurement for each result it reports. if there's any noise in the measurements, those single numbers may be misleading. i wrote my own bandwidth test program that takes 100 measurements and dumps the raw data for analysis. with my program at least, there is plenty of noise if you measure unpinned (pageable) rather than pinned (page-locked) host memory transfers.
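to make the idea concrete, here is a minimal sketch of that kind of measurement loop. it is not the program used for the plots in this post: the function name `time_device_to_host`, the 32 MiB buffer size, and the choice of CUDA events for timing are all assumptions made for illustration. it times 100 device-to-host copies into pageable memory, then 100 into page-locked memory, and prints one line per transfer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

// Hypothetical helper: time `reps` device-to-host copies of `bytes` bytes.
// Returns one elapsed time in milliseconds per copy.
// `pinned` selects page-locked host memory (cudaMallocHost) over malloc.
std::vector<float> time_device_to_host(bool pinned, size_t bytes, int reps) {
    void* host = nullptr;
    void* dev = nullptr;
    if (pinned) {
        CHECK(cudaMallocHost(&host, bytes));
    } else {
        host = std::malloc(bytes);
        if (host == nullptr) {
            std::fprintf(stderr, "malloc failed\n");
            std::exit(EXIT_FAILURE);
        }
    }
    CHECK(cudaMalloc(&dev, bytes));
    CHECK(cudaMemset(dev, 0, bytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    std::vector<float> ms(reps);
    for (int i = 0; i < reps; ++i) {
        CHECK(cudaEventRecord(start, 0));
        CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));
        CHECK(cudaEventRecord(stop, 0));
        CHECK(cudaEventSynchronize(stop));
        CHECK(cudaEventElapsedTime(&ms[i], start, stop));
    }

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(dev));
    if (pinned) {
        CHECK(cudaFreeHost(host));
    } else {
        std::free(host);
    }
    return ms;
}

int main() {
    const size_t bytes = 32u << 20;  // 32 MiB per transfer (arbitrary choice)
    const int reps = 100;            // 100 measurements per mode
    for (int pinned = 0; pinned <= 1; ++pinned) {
        std::vector<float> ms = time_device_to_host(pinned != 0, bytes, reps);
        for (float t : ms) {
            std::printf("%s,%f\n", pinned ? "pinned" : "unpinned", t);
        }
    }
    return 0;
}
```

built with something like `nvcc -o bwtimes bwtimes.cu` (the file name is made up), this prints `mode,milliseconds` lines that can be fed to R or any other plotting tool. keeping every sample, rather than averaging inside the program, is what lets you see the spread.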

below are some plots to show it. each plot shows the fraction of transfers that completed in <= the amount of time on the x-axis (an empirical cumulative distribution). so the closer a line is to being purely vertical, the less variation there is in the transfer times for that device, and the farther left a line is, the faster the transfers for that device. the plots compare transfer times on my two systems, maul and vader, to transfer times on a friend's system. note that i've restricted the plots so the y-axis only goes to 0.99 (99%), so that outliers don't squish the x-axis.
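written out, the curve for one device is the standard empirical distribution function. the symbols here are mine: t_i is the i-th measured transfer time and n is the number of measurements for that device (100, assuming every device got the full set).

```latex
\hat{F}(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\, t_i \le t \,\}, \qquad n = 100
```

so a point at height 0.9 means 90 of the 100 transfers finished within that time, and stopping the y-axis at 0.99 leaves out only the stretch from the second-slowest transfer to the slowest one.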
[plot: device.to.host_unpinned.png, device-to-host transfer times with unpinned host memory]

in this plot, the green and orange lines are all over each other, but in the next one...

[plot: device.to.host_pinned.png, device-to-host transfer times with pinned host memory]

...they separate neatly (up to about the 90% mark).

if you want my measurement code or the R code to analyze its results, shoot me an email or comment.

About this Entry

This page contains a single entry by mason published on January 18, 2010 11:46 PM.