SiSoftware Sandra's processor tests run fairly quickly, but its memory and cache tests are extremely time-consuming.
Test (SiSoftware Sandra Pro Business 2010) | Dual Intel Nehalem-EP Xeon X5570 2.93GHz | Dual Intel Westmere-EP Xeon X5670 2.93GHz | Dual Intel Westmere-EP Xeon X5680 3.33GHz | Dawning I840-H, quad Intel Dunnington Xeon X7460 2.66GHz (tested with Sandra 2009) | DELL PowerEdge M910, quad Intel Nehalem-EX Xeon E7540 2.0GHz | Quad Intel Nehalem-EX Xeon X7560 2.27GHz
---|---|---|---|---|---|---
Memory Bandwidth Benchmark | | | | | |
Aggregate Memory Performance | 38GB/s | 35GB/s | 35.2GB/s | n/a | 33.86GB/s | 65.76GB/s
Int Buff'd iSSE2 Memory Bandwidth | 38GB/s | 35GB/s | 35.2GB/s | 3.49GB/s | 33.86GB/s | 65.76GB/s
Float Buff'd iSSE2 Memory Bandwidth | 38GB/s | 35GB/s | 35.18GB/s | 3.49GB/s | 33.85GB/s | 65.77GB/s
Memory Latency Benchmark (Random) | | | | | |
Memory (Random Access) Latency (lower is better) | 80ns | 83ns | 82ns | n/a | 192ns | 149ns (min)
Speed Factor (lower is better) | 55.50 | 57.00 | 64.60 | n/a | 98.10 | 94.50
Internal Data Cache | 4 clocks | 4 clocks | 4 clocks | n/a | 4 clocks | 3-4 clocks
L2 On-board Cache | 11 clocks | 10 clocks | 10 clocks | n/a | 10 clocks | 9-10 clocks
L3 On-board Cache | 49 clocks | 57 clocks | 60 clocks | n/a | 84 clocks | 66-70 clocks
Memory Latency Benchmark (Linear) | | | | | |
Memory (Linear Access) Latency (lower is better) | 7ns | 7ns | 7ns | n/a | 41ns | 36ns (min)
Speed Factor (lower is better) | 4.80 | 5.10 | 5.50 | n/a | 20.70 | 20.20
Internal Data Cache | 4 clocks | 4 clocks | 4 clocks | n/a | 4 clocks | 3-4 clocks
L2 On-board Cache | 10 clocks | 11 clocks | 11 clocks | n/a | 10 clocks | 9-10 clocks
L3 On-board Cache | 13 clocks | 13 clocks | 13 clocks | n/a | 34 clocks | 27-28 clocks
Cache and Memory Benchmark | | | | | |
Cache/Memory Bandwidth | 142GB/s | 183.26GB/s | 195.6GB/s | n/a | 315GB/s | 510.58GB/s
Speed Factor (lower is better) | 21.20 | 31.00 | 35.20 | n/a | 34.80 | 26.90
Internal Data Cache | 471GB/s | 663.51GB/s | 744.49GB/s | n/a | 919.66GB/s | 1.3TB/s
L2 On-board Cache | 295.4GB/s | 537.88GB/s | 611GB/s | n/a | 749GB/s | 909.27GB/s
L3 On-board Cache | 112GB/s | 146.33GB/s | 159GB/s | n/a | 336.6GB/s | 571.35GB/s
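Clearly, although both platforms use Nehalem-EX, the official quad Xeon X7560 platform delivers roughly double the memory bandwidth of the M910 comparison platform, reaching 65.76GB/s. That is 18.8 times the previous-generation Dunnington and 1.87 times the dual-socket X5680. In its four-socket configuration, the M910 uses only one of each processor's two memory controllers, and the results show that this has a huge impact.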
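The ratios quoted above can be checked directly against the Int Buff'd iSSE2 row of the table. The sketch below simply divides the X7560 figure by each other platform's figure; the dictionary keys and the `ratio` helper are illustrative names, not part of Sandra.

```python
# Int Buff'd iSSE2 memory bandwidth in GB/s, copied from the Sandra table.
INT_BW_GBS = {
    "2S Xeon X5680": 35.2,
    "4S Xeon X7460 (Dunnington)": 3.49,
    "4S Xeon E7540 (M910, one controller per CPU)": 33.86,
    "4S Xeon X7560": 65.76,
}


def ratio(new: float, old: float) -> float:
    """How many times larger `new` is than `old`."""
    return new / old


x7560 = INT_BW_GBS["4S Xeon X7560"]
for name, bw in INT_BW_GBS.items():
    if name != "4S Xeon X7560":
        print(f"X7560 vs {name}: {ratio(x7560, bw):.2f}x")
```

This prints about 1.87x for the X5680, 18.84x for Dunnington and 1.94x for the M910, so "double" versus the M910 is a slight rounding up.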
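The test machine (the quad X7560 platform) uses both memory controllers, as a typical Nehalem-EX machine should. This not only doubles memory bandwidth but also cuts memory latency by roughly 20%, although latency is still noticeably higher than on the dual-socket platforms. The slight reduction in L1/L2/L3 latency is probably related to processor clock speed rather than to the memory controllers.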
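To see where the "roughly 20%" comes from, the sketch below computes the latency reduction from the M910 to the X7560 platform using the table's figures (the X7560 values are Sandra's minimum readings). The helper name `percent_reduction` is hypothetical; the comparison against the dual X5680 uses that platform's random-access figure.

```python
# Memory latency in ns, copied from the Sandra table.
M910_RANDOM_NS, X7560_RANDOM_NS = 192, 149
M910_LINEAR_NS, X7560_LINEAR_NS = 41, 36
X5680_RANDOM_NS = 82


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100


print(f"random-access drop: {percent_reduction(M910_RANDOM_NS, X7560_RANDOM_NS):.1f}%")
print(f"linear-access drop: {percent_reduction(M910_LINEAR_NS, X7560_LINEAR_NS):.1f}%")
# Even with both controllers, the 4S platform's random latency stays well above 2S.
print(f"X7560 vs dual X5680 (random): {X7560_RANDOM_NS / X5680_RANDOM_NS:.2f}x")
```

The random-access test drops about 22.4% and the linear test about 12.2%, so the roughly 20% figure matches the random-access result most closely. The X7560's random latency remains about 1.82 times that of the dual X5680.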
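Finally, the Nehalem-EX platform's total L3 bandwidth reaches 571.35GB/s, compared with 159GB/s for the dual-socket Westmere-EP Xeon X5680. The higher bandwidth reflects Nehalem-EX's larger number of CPUs and cores per CPU, and it also shows the strength of the ring bus.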
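One rough way to separate "more CPUs and cores" from per-core gains is to normalize the aggregate L3 bandwidth by total core count. The sketch below assumes Intel's published core counts (8 cores per Xeon X7560, 6 per Xeon X5680), which are not stated in the table, and assumes Sandra's aggregate figure spreads evenly across all cores. The per-GHz column is an even cruder normalization by core clock only. The `Platform` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str
    sockets: int
    cores_per_socket: int  # assumption: Intel's published spec, not from the table
    ghz: float
    l3_gbs: float  # aggregate L3 bandwidth reported by Sandra, GB/s

    @property
    def total_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    def l3_per_core(self) -> float:
        """Aggregate L3 bandwidth divided evenly across all cores."""
        return self.l3_gbs / self.total_cores


ex = Platform("4x Xeon X7560", sockets=4, cores_per_socket=8, ghz=2.27, l3_gbs=571.35)
ep = Platform("2x Xeon X5680", sockets=2, cores_per_socket=6, ghz=3.33, l3_gbs=159.0)

print(f"aggregate L3 ratio: {ex.l3_gbs / ep.l3_gbs:.2f}x")
for p in (ex, ep):
    per_core = p.l3_per_core()
    print(f"{p.name}: {per_core:.2f} GB/s per core, {per_core / p.ghz:.2f} GB/s per core per GHz")
```

Under these assumptions, the 3.59x aggregate gap shrinks to about 17.85 versus 13.25GB/s per core, so most of the gain comes from the higher core count. The remaining per-core advantage, despite the X7560's lower clock, is at least consistent with the article's point about the ring bus.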