Using oprofile

This is easily done, at least on Linux.

For example, to profile mris_fix_topology:

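  • rm -rf oprofile_data (discard the results of any previous run)

  • operf -g -t ./mris_fix_topology ... (record samples with call graphs, -g, kept separately for each thread, -t)

  • opreport --callgraph (print the report)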

The resulting table is a little harder to read, but it is basically a list of hotspots. Each entry lists some of the hotspot's callers, then the hotspot itself (slightly less indented), then some of the functions it calls. Usually you only need to look at the first few hotspots, because they matter most. The % column (percentage of samples) tells you how important the less-indented hotspot is compared with the others.

Inlining often makes functions disappear from the report, sometimes even very large functions that have only one caller. The NOINLINE macro in include/base.h can be used to prevent this.

However, the oprofile report does not do a good job of showing execution that is spread over many functions, so once you have driven out the hotspots this way, you need a better tool...

Using ROMP

To enable ROMP, edit include/romp_support.h to #define ROMP_SUPPORT_ENABLED, then rebuild. With ROMP enabled, executables print statistics to stderr as they exit. They also try to write .csv files containing the statistics into the /tmp/ROMP_statsFiles directory.

The resulting indented display shows statistics for each scope or parallel loop that has been annotated with ROMP macros: approximately how much elapsed time was spent in it, and approximately how well the available CPUs were used.

Using Intel VTune

TBD

MorphoOptimizationProject_profiling_execution (last edited 2021-09-22 09:54:56 by DevaniCordero)