mri_ca_register timing info
- using subject 'ernie'
- using the 'dev' build as of 18 March 2012
- commandline:
      mri_ca_register \
        -nobigventricles \
        -T transforms/talairach.lta \
        -align-after \
        -mask brainmask.mgz \
        norm.mgz \
        /autofs/cluster/freesurfer/centos6_x86_64/dev/average/RB_all_2008-03-26.gca \
        transforms/talairach.m3z
- the Opteron was a seychelles node (node0355), running CentOS 5
- the Intel Xeon was the machine 'monster', running CentOS 6
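To repeat the OpenMP rows on another machine, the runs can be scripted. This is a minimal sketch, assuming bash (for the `SECONDS` builtin); `time_threads` is a hypothetical helper, not part of FreeSurfer, and the thread counts 1, 2, 4 and 8 are chosen only for illustration. It sets the standard OpenMP variable `OMP_NUM_THREADS` for each run. Builds compiled without `-fopenmp` (the NA rows) ignore that variable.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FreeSurfer): run a command once per OpenMP
# thread count and print the wall-clock runtime of each run.
time_threads() {
  local n
  for n in 1 2 4 8; do                 # thread counts chosen for illustration
    SECONDS=0                          # bash builtin: counts seconds from here
    OMP_NUM_THREADS=$n "$@" || return  # standard OpenMP variable; stop on failure
    printf 'threads=%d runtime=%dh %02dm\n' \
      "$n" $((SECONDS / 3600)) $((SECONDS % 3600 / 60))
  done
}

# Example with the command line above (each run rewrites talairach.m3z):
# time_threads mri_ca_register -nobigventricles -T transforms/talairach.lta \
#   -align-after -mask brainmask.mgz norm.mgz \
#   /autofs/cluster/freesurfer/centos6_x86_64/dev/average/RB_all_2008-03-26.gca \
#   transforms/talairach.m3z
```

Because every run writes the same transforms/talairach.m3z, keep a copy of an earlier result if it is needed for comparison.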
||'''processor'''||'''gcc version'''||'''compiler flags'''||'''OMP threads'''||'''mri_ca_register runtime'''||
||2GHz AMD Opteron||3.4.6||-O3 -msse2 -mfpmath=sse||NA||12 hours, 46 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||3.4.6||-O3 -msse2 -mfpmath=sse||NA||3 hours, 8 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.1.2||-O3 -msse2 -mfpmath=sse||NA||3 hours, 10 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-O3 -msse2 -mfpmath=sse||NA||1 hour, 56 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||1||1 hour, 58 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||2||1 hour, 14 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||3||57 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||4||50 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||5||44 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||6||41 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||7||40 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||8||38 minutes||
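The OpenMP rows of the table can be reduced to speedup S(n) = T(1)/T(n) and parallel efficiency E(n) = S(n)/n, where T(n) is the runtime with n threads. This is a small bash/awk sketch; the runtimes are the table's values converted by hand to minutes (1 hour, 58 minutes = 118), and `speedup_table` is an illustrative name, not a FreeSurfer tool.

```bash
#!/usr/bin/env bash
# Speedup S(n) = T(1)/T(n) and parallel efficiency E(n) = S(n)/n from
# "threads minutes" pairs on stdin; the first line must be the 1-thread run.
# speedup_table is an illustrative name, not a FreeSurfer tool.
speedup_table() {
  awk 'NR == 1 { t1 = $2 }
       { s = t1 / $2
         printf "%d threads: %3d min, speedup %.2fx, efficiency %.0f%%\n",
                $1, $2, s, 100 * s / $1 }'
}

# OpenMP rows of the table (gcc 4.4.5, -fopenmp build), converted to minutes
speedup_table <<'EOF'
1 118
2 74
3 57
4 50
5 44
6 41
7 40
8 38
EOF
```

For these runtimes it reports, for example, a 1.59x speedup (80% efficiency) at 2 threads and a 3.11x speedup (39% efficiency) at 8 threads.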
observations
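- the Nehalem is about 4x faster than the AMD Opteron with the same gcc 3.4.6 build and flags (3 hours, 8 minutes vs. 12 hours, 46 minutes), although the two machines also differ in clock speed (3.3GHz vs. 2GHz) and OS
- on the Nehalem, switching to gcc 4.4.5 alone (same flags) saves about 1 hour, 12 minutes compared with gcc 3.4.6 (1 hour, 56 minutes vs. 3 hours, 8 minutes)
- the -ftree-vectorize -msse4.1 flags make no measurable difference over -msse2 (1 hour, 58 minutes with one OpenMP thread vs. 1 hour, 56 minutes without OpenMP)
- adding OpenMP threads gives a real but sublinear improvement: 2 threads run about 1.6x faster than 1 thread, 4 threads about 2.4x and 8 threads about 3.1x, with little gain beyond 6 threads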
- asegstatsdiff comparisons show minimal differences in results
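One way to quantify the diminishing returns from extra threads is the Karp-Flatt metric, e(n) = (1/S(n) - 1/n) / (1 - 1/n). If Amdahl's law holds, e(n) estimates the serial fraction of the run and 1/e the speedup ceiling. This is an interpretation added here, not something measured on this page. It assumes every thread behaves like an equal processor, which may not hold if threads share physical cores. `karp_flatt` is an illustrative helper name, and the runtimes are the OpenMP rows of the table in minutes.

```bash
#!/usr/bin/env bash
# Karp-Flatt metric from "threads minutes" pairs on stdin (first line = 1 thread):
#   e(n) = (1/S(n) - 1/n) / (1 - 1/n),  where 1/S(n) = T(n)/T(1)
# Under Amdahl's law e(n) estimates the serial fraction and 1/e the speedup ceiling.
# karp_flatt is an illustrative helper name.
karp_flatt() {
  awk 'NR == 1 { t1 = $2; next }   # baseline run; e(1) is undefined (0/0)
       { n = $1
         e = ($2 / t1 - 1 / n) / (1 - 1 / n)
         printf "%d threads: serial fraction %.2f, Amdahl ceiling %.1fx\n", n, e, 1 / e }'
}

karp_flatt <<'EOF'
1 118
2 74
3 57
4 50
5 44
6 41
7 40
8 38
EOF
```

For these runtimes e(n) stays between about 0.22 and 0.25, which under the stated assumptions implies a ceiling of roughly 3.9x to 4.6x for this build. That is consistent with the flattening seen past 6 threads.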