Correct me if I'm wrong, but NVIDIA's bandwidthTest program, included in the CUDA SDK, takes single timing measurements and reports them. If there's any noise in the measurements, these single reports may be misleading. I wrote my own bandwidth test program that takes 100 measurements and dumps the raw data for analysis. With my program at least, there is plenty of noise if you measure unpinned (pageable) rather than pinned (page-locked) transfers.
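To make the idea concrete, here is a minimal sketch of a repeated-measurement bandwidth test. It is not the program described above, just an illustration under stated assumptions: the helper names `measure_h2d` and `summarize`, the 32 MiB buffer size, and the choice of host-to-device copies are all made up for this example. It times 100 copies each from pageable (`malloc`) and pinned (`cudaMallocHost`) host memory using CUDA events, then prints the spread of the results so noise is visible instead of hidden behind a single number.

```cuda
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

struct Stats { double min, max, mean, stddev; };

// Hypothetical helper: min, max, mean and sample standard deviation.
Stats summarize(const std::vector<double>& x) {
    Stats s{0.0, 0.0, 0.0, 0.0};
    if (x.empty()) return s;
    s.min = *std::min_element(x.begin(), x.end());
    s.max = *std::max_element(x.begin(), x.end());
    for (double v : x) s.mean += v;
    s.mean /= x.size();
    if (x.size() > 1) {
        double ss = 0.0;
        for (double v : x) ss += (v - s.mean) * (v - s.mean);
        s.stddev = std::sqrt(ss / (x.size() - 1));
    }
    return s;
}

// Hypothetical helper: time `runs` host-to-device copies of `bytes` bytes
// and return one GB/s figure per run.
std::vector<double> measure_h2d(size_t bytes, int runs, bool pinned) {
    void* host = nullptr;
    void* dev = nullptr;
    if (pinned) {
        CHECK(cudaMallocHost(&host, bytes));   // page-locked host memory
    } else {
        host = malloc(bytes);                  // ordinary pageable memory
        if (!host) { fprintf(stderr, "malloc failed\n"); exit(EXIT_FAILURE); }
    }
    memset(host, 0, bytes);                    // touch every page up front
    CHECK(cudaMalloc(&dev, bytes));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    std::vector<double> gbps;
    gbps.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        CHECK(cudaEventRecord(start));
        CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));
        CHECK(cudaEventRecord(stop));
        CHECK(cudaEventSynchronize(stop));
        float ms = 0.0f;
        CHECK(cudaEventElapsedTime(&ms, start, stop));
        gbps.push_back(static_cast<double>(bytes) / (ms * 1e-3) / 1e9);
    }

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(dev));
    if (pinned) CHECK(cudaFreeHost(host)); else free(host);
    return gbps;
}

int main() {
    const size_t bytes = 32u << 20;  // 32 MiB, an arbitrary choice
    const int runs = 100;
    const bool modes[] = {false, true};
    for (bool pinned : modes) {
        Stats s = summarize(measure_h2d(bytes, runs, pinned));
        printf("%-8s min %.2f  max %.2f  mean %.2f  stddev %.2f GB/s\n",
               pinned ? "pinned" : "pageable", s.min, s.max, s.mean, s.stddev);
    }
    return 0;
}
```

Compiled with something like `nvcc -o bwtest bwtest.cu`, the two output lines can be compared directly: a large gap between min and max, or a large standard deviation, is the kind of noise a single measurement would not reveal. Writing each per-run value to a file instead of summarizing would be closer to the raw-data approach mentioned above.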
Continue reading why you must use pinned transfers to compare CUDA device bandwidth.