Differences between revisions 1 and 15 (spanning 14 versions)
Revision 1 as of 2018-03-13 00:18:42
Size: 872
Editor: BevinBrett
Comment:
Revision 15 as of 2021-04-20 21:04:48
Size: 3541
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
= THIS PAGE HAS BEEN MOVED TO SHAREPOINT! =
Please refer to this site/make edits here for the most updated information: https://partnershealthcare.sharepoint.com/sites/LCN/SitePages/Morpho-Optimization-Project.aspx

----
<<BR>>
<<BR>>
<<BR>>


## page was renamed from MorphoOptimizationProject_BetterSerialCode
Parent: MorphoOptimizationProject
Line 4: Line 16:
Each face had a normal vector and area stored in the nx,ny,nz,orig_area members of the FACE struct. ==== mris_fix_topology has a hot loop, which does unnecessary calculations of the face normals ====
mrisComputeOptimalRetessellation and mrisComputeRandomRetessellation have a similar structure
Line 6: Line 19:
mris_fix_topology has a hot loop _:_: it loops over a set of patches, or iterates on one patch. For each patch it calls
Line 8: Line 21:
 ___: mrisComputeOptimalRetessellation and mrisComputeRandomRetessellation have a similar structure _:_:_: mrisDefectPatchFitness, which calls
Line 10: Line 23:
___: ___: a computeDefectContext is constructed here _:_:_:_: mrisComputeDefectLogLikelihood, which calls
Line 12: Line 25:
___:___: then it loops over a set of patches, or iterates on one patch. For each patch it calls _:_:_:_:_: mrisComputeDefectMRILogUnlikelihood, which
Line 14: Line 27:
___:___:___: mrisDefectPatchFitness, which calls _:_:_:_:_:_: does an expensive computation all the face normals for ALL the faces
Line 16: Line 29:
___:___:___: ___: mrisComputeDefectLogLikelihood, which calls _:_:_:_:_:_: does two other expensive steps, which only use a few of the face normals
Line 18: Line 31:
___:___:___: ___: ___: mrisComputeDefectMRILogUnlikelihood, which ==== To simplify, the original code does this ====
_: loop
Line 20: Line 34:
___:___:___: ___: ___:___: does an expensive computation all the face normals for ALL the faces _:_: for all fno normal[fno] = f (inputs)
Line 22: Line 36:
___:___:___: ___: ___:___: does two other expensive steps, which only use a few of the face normals _:_: for a few fno use normal[fno]

_: end loop

_: change some inputs

_: use some normal[fno]

''Since only a few of the face normals are used, it is a waste of time to calculate all of them every time around the loop! ''

==== This is replaced by code that does ====
_: loop

_:_: for all fno: normal[fno].deferred = true

_:_: for a few fno: use normal[fno], computing it from f (inputs) if is deferred, and changing it to being no-longer deferred

_: end loop

_: compute any remaining deferred normal[fno] before their inputs can change

_: change some inputs

_: use some normal[fno]

==== How can we test it gets the right answers? ====
When checking code is enabled, the normal[fno] is computed from the inputs twice - once when it is deferred, and again when it is undeferred. It is checked that these two values are the same.

==== How do we make sure that the undeferring is done? ====
If we left the __FACE nx,ny,nz,orig_area__ members unchanged, then it would be hard to find all the users. We could just rename them, but it is easier to place them in a struct and have a function called to get a pointer to the struct, and have the function do the undeferral. This is a standard data hiding technique, implemented here in C. Now all the places that need to be changed are compilation errors.

'''Why put the structs into a separate vector?'''

If we coded this as __MRIS__ has a vector of __FACE__ which has one __FaceNormCacheEntry__ member, then the loop that sets deferred = true would bring in one fno per cache line fetch. Since there are about 100,000 faces, this would bring in about 100,000 cache lines. If instead we code this as MRIS has a vector of __FaceNormCacheEntry,__ then the same loop will bring in about 3 fno per cache line, which will run about three times faster.

Further measurement showed that the code was faster if the __deferred__ flags were separated from the __nx,ny,nz,orig_area__ because the loop would go even faster - 64 fno per cache line. Every other use of them would requires 2 cache lines but apparently there is sufficiently good locality of reference to nearby fno's that the memory traffic is improved by densely packing the deferred flags into an 8 bit char rather than into the 32 bits needed to keep the float data in the FaceNormCacheEntry aligned.

THIS PAGE HAS BEEN MOVED TO SHAREPOINT!

Please refer to this site/make edits here for the most updated information: https://partnershealthcare.sharepoint.com/sites/LCN/SitePages/Morpho-Optimization-Project.aspx





Parent: MorphoOptimizationProject

A variety of techniques have been used.

Deferring calculations until needed

mris_fix_topology has a hot loop, which does unnecessary calculations of the face normals

mrisComputeOptimalRetessellation and mrisComputeRandomRetessellation have a similar structure

_:_: it loops over a set of patches, or iterates on one patch. For each patch it calls

_:_:_: mrisDefectPatchFitness, which calls

_:_:_:_: mrisComputeDefectLogLikelihood, which calls

_:_:_:_:_: mrisComputeDefectMRILogUnlikelihood, which

_:_:_:_:_:_: does an expensive computation all the face normals for ALL the faces

_:_:_:_:_:_: does two other expensive steps, which only use a few of the face normals

To simplify, the original code does this

_: loop

_:_: for all fno normal[fno] = f (inputs)

_:_: for a few fno use normal[fno]

_: end loop

_: change some inputs

_: use some normal[fno]

Since only a few of the face normals are used, it is a waste of time to calculate all of them every time around the loop!

This is replaced by code that does

_: loop

_:_: for all fno: normal[fno].deferred = true

_:_: for a few fno: use normal[fno], computing it from f (inputs) if is deferred, and changing it to being no-longer deferred

_: end loop

_: compute any remaining deferred normal[fno] before their inputs can change

_: change some inputs

_: use some normal[fno]

How can we test it gets the right answers?

When checking code is enabled, the normal[fno] is computed from the inputs twice - once when it is deferred, and again when it is undeferred. It is checked that these two values are the same.

How do we make sure that the undeferring is done?

If we left the FACE nx,ny,nz,orig_area members unchanged, then it would be hard to find all the users. We could just rename them, but it is easier to place them in a struct and have a function called to get a pointer to the struct, and have the function do the undeferral. This is a standard data hiding technique, implemented here in C. Now all the places that need to be changed are compilation errors.

Why put the structs into a separate vector?

If we coded this as MRIS has a vector of FACE which has one FaceNormCacheEntry member, then the loop that sets deferred = true would bring in one fno per cache line fetch. Since there are about 100,000 faces, this would bring in about 100,000 cache lines. If instead we code this as MRIS has a vector of FaceNormCacheEntry, then the same loop will bring in about 3 fno per cache line, which will run about three times faster.

Further measurement showed that the code was faster if the deferred flags were separated from the nx,ny,nz,orig_area because the loop would go even faster - 64 fno per cache line. Every other use of them would requires 2 cache lines but apparently there is sufficiently good locality of reference to nearby fno's that the memory traffic is improved by densely packing the deferred flags into an 8 bit char rather than into the 32 bits needed to keep the float data in the FaceNormCacheEntry aligned.

MorphoOptimizationProject_BetterSerialCode_DeferCalculations (last edited 2021-09-22 09:46:34 by DevaniCordero)