THIS PAGE HAS BEEN MOVED TO SHAREPOINT!

Please refer to this site/make edits here for the most updated information: https://partnershealthcare.sharepoint.com/sites/LCN/SitePages/Archived-Notes.aspx#notes-on-freesurfer-code-optimization




Notes on FreeSurfer code optimization

This page is for free-form entry of notes on ways to optimize the FreeSurfer code base, whether they are simple things or notes on larger-scale problems.

Format is: name:, <short label> - description

* nick:, -ffast-math - Try the -ffast-math flag of gcc v4.x. Prior experiments with this on the AMD compiler showed output differences in recon-all, but perhaps selective use of this flag is possible.
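
 A minimal sketch of what selective use might look like, assuming GCC 4.4 or later and its optimize attribute to confine fast-math to a single hot routine; the function below is hypothetical, not existing FreeSurfer code:

{{{
/* Restrict fast-math to one routine instead of the whole build.
 * The optimize attribute is a GCC extension (4.4 and later). */
#include <math.h>

__attribute__((optimize("fast-math")))
static void scale_and_exp(float *dst, const float *src, int n, float scale)
{
  int i;
  for (i = 0; i < n; i++)
    dst[i] = expf(scale * src[i]);  /* candidate for vectorization under fast-math */
}
}}}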

* nick:, SSE math lib - Replace instances of sin, cos, log and exp with routines optimized for the SSE instructions found on Intel processors. See http://gruntthepeon.free.fr/ssemath/. Tried this, but ran into problems and gave up. Some wrangling with it could make it possible to optionally build with this lib via #ifdefs. Actually, it's not a lib but a header file.
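
 A hedged sketch of how that header could be made optional at build time; the FS_USE_SSE_MATH macro and the wrapper name are made up for illustration:

{{{
#ifdef FS_USE_SSE_MATH
#include <xmmintrin.h>
#include "sse_mathfun.h"   /* provides sin_ps, cos_ps, log_ps, exp_ps on __m128 */

/* exp of four packed floats via the SSE header */
static inline void exp4f(float out[4], const float in[4])
{
  _mm_storeu_ps(out, exp_ps(_mm_loadu_ps(in)));
}
#else
#include <math.h>

/* portable scalar fallback */
static inline void exp4f(float out[4], const float in[4])
{
  int i;
  for (i = 0; i < 4; i++)
    out[i] = expf(in[i]);
}
#endif
}}}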

* Richard:, MRI data structure - The pointer-chasing implied by ***slices is horrific for the CPU caches (and a non-starter on the GPU). The 'chunking' alternative is much better, but needs to be used uniformly.
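
 To illustrate the difference, a sketch (field names are illustrative, not the actual MRI struct members): three dependent pointer loads per voxel versus one index computation into a contiguous chunk:

{{{
#include <stddef.h>

typedef struct {
  int width, height, depth;
  float ***slices;   /* slices[z][y][x]: three dependent loads per voxel */
  float *chunk;      /* one contiguous allocation */
} VolumeSketch;

/* pointer-chasing layout */
static inline float get_slices(const VolumeSketch *v, int x, int y, int z)
{
  return v->slices[z][y][x];
}

/* chunked layout: index arithmetic only, cache- and GPU-friendly */
static inline float get_chunk(const VolumeSketch *v, int x, int y, int z)
{
  return v->chunk[((size_t)z * v->height + y) * v->width + x];
}
}}}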

* Richard:, MATRIX data type - If this is only used for 4x4 affine transformations, it should be coded as such. If it is also used more generally, then a separate 'Affine' class should be considered (this already exists for the GPU in the file affinegpu.cu).
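
 For illustration, a sketch of a dedicated fixed-size affine type on the CPU side (analogous to the GPU class in affinegpu.cu); the names here are hypothetical. Because the storage is a fixed-size member rather than separately allocated rows, the compiler can keep it in cache and the type maps directly onto the GPU:

{{{
typedef struct {
  float m[4][4];   /* row-major 4x4; bottom row (0,0,0,1) for affine maps */
} AffineSketch;

/* apply to a homogeneous point (x, y, z, 1) */
static void affine_apply(const AffineSketch *a, const float in[4], float out[4])
{
  int r, c;
  for (r = 0; r < 4; r++) {
    out[r] = 0.0f;
    for (c = 0; c < 4; c++)
      out[r] += a->m[r][c] * in[c];
  }
}
}}}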

* Richard:, Boundary conditions - MRIconvolve1d and MRImean handle out-of-range accesses differently: MRImean effectively returns zero, while MRIconvolve1d uses the [x|y|z]i pointers, which clamp to the edge of the range. There are probably other places where this happens. A uniform treatment would be best.
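
 One way to make the treatment uniform would be a single shared clamp helper (the helper name is made up):

{{{
/* clamp-to-edge boundary policy, shared by all filters */
static inline int clamp_index(int i, int n)
{
  if (i < 0)  return 0;
  if (i >= n) return n - 1;
  return i;
}

/* example use:
 *   val = MRIFvox(mri, clamp_index(x, mri->width),
 *                      clamp_index(y, mri->height),
 *                      clamp_index(z, mri->depth));
 */
}}}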

* Richard:, Data structure memory management - Data structures which allocate RAM, or which are allocated as arrays, should always carry their lengths with them. I'm thinking particularly of GCA_SAMPLE arrays here, but I imagine there are other examples.
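
 A sketch of the idea with a generic buffer (type and field names are illustrative; the same pattern would apply to GCA_SAMPLE arrays):

{{{
#include <stdlib.h>

/* a buffer that carries its own length */
typedef struct {
  float  *data;
  size_t  len;    /* number of valid elements in data */
} FloatArraySketch;

static FloatArraySketch float_array_alloc(size_t len)
{
  FloatArraySketch a;
  a.data = (float *)calloc(len, sizeof(float));
  a.len  = (a.data != NULL) ? len : 0;
  return a;
}
}}}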

* Richard:, MRI data structure - Do we really need support for all of UCHAR, SHORT, LONG and FLOAT? On the GPU, manipulating datatypes which aren't 4-byte aligned is slow, and I imagine that modern CPUs face similar difficulties. They do save some RAM, but it's only a factor of four; if we want to edit bigger volumes or long sequences, we should be thinking about better datastructures, not trying to 'cram down' the existing ones.

* Richard:, const correctness - It would be very useful if arguments could be declared const whenever possible.
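
 A small illustration (the function is hypothetical): declaring read-only pointer arguments const documents intent and lets the compiler reject accidental writes:

{{{
static float mean_intensity(const float *vals, int n)  /* vals is never written */
{
  float sum = 0.0f;
  int i;
  for (i = 0; i < n; i++)
    sum += vals[i];
  return (n > 0) ? sum / (float)n : 0.0f;
}
}}}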

* Richard:, Array ordering - Within an MRI structure we have mri->slices[z][y][x], but within a GCA there's gca->nodes[xn][yn][zn], and a GCAmorph has gcam->nodes[i][j][k]. These should be made consistent - I'm pretty sure that this difference is the reason for the horrible performance of GCAmri: whatever the order of the loop nest, one of the structures is going to be traversed cache-incoherently. As a note, the MRI ordering is good for CUDA.
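
 For reference, a sketch of the loop nest that traverses the MRI ordering coherently (x innermost, since slices[z][y][x] makes x the fastest-varying index); mixing a structure indexed [x][y][z] into the same nest forces one of the two to be walked incoherently:

{{{
/* illustrative only: walks memory sequentially for the slices[z][y][x] layout */
static void clear_volume(float ***slices, int width, int height, int depth)
{
  int x, y, z;
  for (z = 0; z < depth; z++)
    for (y = 0; y < height; y++)
      for (x = 0; x < width; x++)   /* innermost index is contiguous */
        slices[z][y][x] = 0.0f;
}
}}}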

* Richard:, strcpy - Only an optimisation in the sense that safe code is optimised as compared to insecure code. I can see strcpy calls littered all over the place (e.g. in mriio.c), which is begging for trouble when someone decides to use lengthy identifiers. I think I spotted some other 'unsafe' routines too (e.g. sprintf).
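
 A sketch of the bounded alternative (buffer size, function name, and path are illustrative):

{{{
#include <stdio.h>

#define FNAME_LEN 1024

static void build_fname(char dst[FNAME_LEN], const char *subject_dir)
{
  /* snprintf (C99) never writes past FNAME_LEN and always NUL-terminates,
   * unlike sprintf/strcpy */
  snprintf(dst, FNAME_LEN, "%s/mri/orig.mgz", subject_dir);
}
}}}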

* Richard:, File formats - We should consider using more 'standard' file formats. There are freely available formats such as NetCDF and HDF5 which offer the ability to store arbitrary data along with rich metadata. Partial reading is also supported. These might be preferable to maintaining mriio.c. I already use NetCDF for some GCAmorph output, since I found the headers lurking in the depths of the source tree. However, it seems that NetCDF is going to use HDF5 as its backend in the future, so we might as well go for HDF5. As a side note, I can load my NetCDF files straight into VisIt and do all sorts of things there that I've not figured out how to do with tkmedit yet.
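
 As a rough illustration of how little code a volume write takes with the HDF5 C API (file name, dataset path, and attribute are made up; error checking omitted; link with -lhdf5):

{{{
#include <hdf5.h>

static void write_volume_h5(const float *data, hsize_t depth, hsize_t height,
                            hsize_t width, float voxel_size_mm)
{
  hsize_t dims[3] = { depth, height, width };
  hid_t file  = H5Fcreate("vol.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
  hid_t space = H5Screate_simple(3, dims, NULL);
  hid_t dset  = H5Dcreate2(file, "/volume", H5T_NATIVE_FLOAT, space,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

  /* metadata travels with the data as an attribute */
  hid_t ascalar = H5Screate(H5S_SCALAR);
  hid_t attr    = H5Acreate2(dset, "voxel_size_mm", H5T_NATIVE_FLOAT, ascalar,
                             H5P_DEFAULT, H5P_DEFAULT);
  H5Awrite(attr, H5T_NATIVE_FLOAT, &voxel_size_mm);

  H5Aclose(attr); H5Sclose(ascalar);
  H5Dclose(dset); H5Sclose(space); H5Fclose(file);
}
}}}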

* Richard:, Version control - CVS is really, really, really old. Subversion would be the obvious choice for migration, since it's designed to be as similar to CVS as possible while fixing the latter's most glaring flaws. I've also started experimenting with Mercurial (and bridging to CVS), and that could be interesting. Everyone would have their own local copy of the repository, from which they could 'push' back to the main repository. It's a distributed system, designed to make branching easier. We would want to prune the repository to only have source files, though - none of the recon-all data. You can also have web frontends, which can show nice diffs and maintain bug databases.