Differences between revisions 4 and 5

Notes on FreeSurfer code optimization

This page is for free-form notes on ways to optimize the FreeSurfer code base, from simple fixes to larger-scale problems.

Format is: name:, <short label> - description

* nick:, -ffast-math - Try the -ffast-math flag of gcc v4.x. Prior experiments with this on the AMD compiler showed output differences in recon-all, but perhaps selective use of this flag is possible.

* nick:, SSE math lib - Replace instances of sin, cos, log and exp with routines optimized for the SSE instructions found on Intel processors. See http://gruntthepeon.free.fr/ssemath/. Tried this, but ran into problems and gave up. Some wrangling with it could make it possible to optionally build with this lib via #ifdefs. Actually, it's not a lib but a header file.

* Richard:, MRI data structure - The pointer-chasing implied by ***slices is horrific for the CPU caches (and a non-starter on the GPU). The 'chunking' alternative is much better, but needs to be used uniformly.

* Richard:, MATRIX data type - If this is only used for 4x4 affine transformations, it should be coded as such. If used more generally too, then a separate 'Affine' class should be considered (this exists for the GPU already in the file affinegpu.cu)

* Richard:, MRI data structure - Do we really need support for all of UCHAR, SHORT, LONG and FLOAT? On the GPU, manipulating data types which aren't 4-byte aligned is slow, and I imagine that modern CPUs face similar difficulties. They do save some RAM, but it's only a factor of four; if we want to edit bigger volumes or long sequences, we should be thinking about better data structures, not trying to 'cram down' the existing ones.
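
Sketch for the -ffast-math note: one plausible source of the recon-all output differences is that -ffast-math lets the compiler reassociate floating-point sums, and floating-point addition is not associative. The snippet below only illustrates that effect by hand; the function names are made up for this page and nothing here is taken from the FreeSurfer sources.

```c
#include <assert.h>

/* Evaluate (a + b) + c. The volatile temporary pins the evaluation order. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

/* Evaluate a + (b + c), the order a reassociating compiler might pick. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}

/* Same three inputs, two different answers. */
void demo_reassociation(void)
{
    /* 1e20 - 1e20 is exactly 0, so adding 1.0 afterwards gives 1.0. */
    assert(sum_left(1e20, -1e20, 1.0) == 1.0);
    /* -1e20 + 1.0 rounds back to -1e20, so the 1.0 is lost entirely. */
    assert(sum_right(1e20, -1e20, 1.0) == 0.0);
}
```

If selective use is tried, comparing results per source file with and without the flag would show which files are sensitive to this kind of reordering.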
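
Sketch for the ***slices note: a minimal contiguous-volume layout in the spirit of the 'chunking' alternative. Volume and its functions are hypothetical stand-ins written for this page, not the actual MRI struct or its API. The point is that one allocation plus index arithmetic replaces three dependent pointer loads per voxel.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a volume; not the FreeSurfer MRI struct. */
typedef struct {
    int width, height, depth;
    float *chunk;   /* width*height*depth voxels in one contiguous block */
} Volume;

Volume *volume_alloc(int width, int height, int depth)
{
    Volume *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->width = width;
    v->height = height;
    v->depth = depth;
    v->chunk = calloc((size_t)width * height * depth, sizeof *v->chunk);
    if (v->chunk == NULL) { free(v); return NULL; }
    return v;
}

/* Linear index with x varying fastest, then y, then z. */
size_t volume_index(const Volume *v, int x, int y, int z)
{
    assert(x >= 0 && x < v->width);
    assert(y >= 0 && y < v->height);
    assert(z >= 0 && z < v->depth);
    return ((size_t)z * v->height + y) * v->width + x;
}

/* Whole-volume passes become a single linear sweep over memory. */
float volume_sum(const Volume *v)
{
    size_t n = (size_t)v->width * v->height * v->depth;
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i) s += v->chunk[i];
    return s;
}

void volume_free(Volume *v)
{
    if (v != NULL) { free(v->chunk); free(v); }
}
```

The same flat buffer can be copied to a GPU in one transfer, which is not possible with a tree of row pointers.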
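
Sketch for the MATRIX note: what a dedicated fixed-size type for 4x4 affine transformations might look like on the CPU side. The Affine struct and function names below are hypothetical and are not the interface of affinegpu.cu. A fixed-size struct avoids heap allocation and lets the compiler unroll the loops.

```c
#include <assert.h>

/* Hypothetical fixed-size affine transform, row-major, last row 0 0 0 1. */
typedef struct { float m[4][4]; } Affine;

Affine affine_identity(void)
{
    Affine a;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            a.m[i][j] = (i == j) ? 1.0f : 0.0f;
    return a;
}

Affine affine_translation(float tx, float ty, float tz)
{
    Affine a = affine_identity();
    a.m[0][3] = tx;
    a.m[1][3] = ty;
    a.m[2][3] = tz;
    return a;
}

/* r = a * b, so r applies b first and then a. */
Affine affine_multiply(const Affine *a, const Affine *b)
{
    Affine r;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k) s += a->m[i][k] * b->m[k][j];
            r.m[i][j] = s;
        }
    return r;
}

/* Transform a 3D point; only the top three rows are needed. */
void affine_apply(const Affine *a, const float in[3], float out[3])
{
    assert(a->m[3][3] == 1.0f);
    for (int i = 0; i < 3; ++i)
        out[i] = a->m[i][0] * in[0] + a->m[i][1] * in[1]
               + a->m[i][2] * in[2] + a->m[i][3];
}
```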
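
Sketch for the voxel type note: the 'factor of four' worked out for a concrete case. The enum and function names are hypothetical and only mirror the type names used in the note. For a 256x256x256 volume, UCHAR storage is 16 MiB and FLOAT storage is 64 MiB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags mirroring the names in the note above. */
typedef enum { VOX_UCHAR, VOX_SHORT, VOX_LONG, VOX_FLOAT } VoxelType;

/* Bytes per voxel. sizeof(long) is platform dependent (4 or 8). */
size_t voxel_size(VoxelType t)
{
    switch (t) {
    case VOX_UCHAR: return sizeof(unsigned char);
    case VOX_SHORT: return sizeof(short);
    case VOX_LONG:  return sizeof(long);
    case VOX_FLOAT: return sizeof(float);
    }
    assert(0 && "unknown voxel type");
    return 0;
}

/* Total bytes of voxel storage for a width x height x depth volume. */
size_t volume_bytes(int width, int height, int depth, VoxelType t)
{
    assert(width > 0 && height > 0 && depth > 0);
    return (size_t)width * height * depth * voxel_size(t);
}
```

On typical platforms, volume_bytes(256, 256, 256, VOX_UCHAR) is 16777216 and the VOX_FLOAT figure is four times that, so dropping the narrow types costs memory in proportion but removes a type switch from every voxel access.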
