Differences between revisions 6 and 42 (spanning 36 versions)
Revision 6 as of 2012-11-09 17:24:22
Size: 4099
Editor: jbernal
Comment:
Revision 42 as of 2018-07-25 11:20:26
Size: 8023
Comment:
= Longitudinal Statistics =
LME Matlab tools. Author: Jorge Luis Bernal Rusiel, 2012. jbernal@nmr.mgh.harvard.edu or jbernal0019@yahoo.es
If you use these tools in your analysis please cite:
Bernal-Rusiel J.L., Greve D.N., Reuter M., Fischl B., Sabuncu M.R., 2012. Statistical Analysis of Longitudinal Neuroimage Data with Linear Mixed Effects Models, NeuroImage, doi:10.1016/j.neuroimage.2012.10.065.
These Matlab tools are freely distributed and intended to help neuroimaging researchers analyse longitudinal neuroimaging (LNI) data. The statistical analysis of this type of data is arguably more challenging than that of the cross-sectional or time series data traditionally encountered in the neuroimaging field, because the timing of the measurement occasions and the underlying biological process under study are usually not under full experimental control.
There are two aspects of longitudinal data that require correct modeling: the mean response over time and the covariance among repeated measures on the same individual. I hope these tools can serve this modeling purpose, as they provide functionality for exploratory data visualization, model specification, model selection, parameter estimation, inference and power analysis, including sample size estimation. They are especially targeted at Freesurfer data but can be used with any other data, as long as the data are loaded into Matlab and put into the appropriate format. Here are some recommendations about how to use these tools.
== Preparing your data ==
There are two types of analyses that can be done: univariate and mass-univariate. The first step is to load your data into Matlab. If you are working with Freesurfer, univariate data (e.g. hippocampal volume) can be loaded using Qdec tables. The Qdec directory contains some simple example scripts for reading and writing Freesurfer's Qdec tables.
In order to read mass-univariate data you should use the following scripts:
 * fs_read_label.m
 * fs_read_surf.m
 * fs_read_Y.m
The last two depend on Freesurfer's Matlab scripts, so you need to have the Freesurfer software package installed and its matlab subdirectory included in Matlab's search path.
Beforehand, the mass-univariate data are generated in Freesurfer by running variants of the following commands. The first assembles your thickness data into a single lh.thickness.mgh file; the second smooths the cortical thickness maps with FWHM = 10 mm (note the --cortex and --noreshape options):
mris_preproc --qdec qdec.table.dat --target study_average --hemi lh --meas thickness --out lh.thickness.mgh
mri_surf2surf --hemi lh --s study_average --sval lh.thickness.mgh --tval lh.thickness_sm10.mgh --fwhm-trg 10 --cortex --noreshape
Then you can load the smoothed cortical thickness data lh.thickness_sm10.mgh into Matlab using fs_read_Y.m, e.g.:
[Y,mri] = fs_read_Y('lh.thickness_sm10.mgh');
You should also read the spherical surface (lh.sphere) and cortex label (lh.cortex.label) of the target subject (study_average above). For example, with fsaverage as the target, building the path from the FREESURFER_HOME environment variable (Matlab does not expand shell variables such as $FsDir inside strings):
lhsphere = fs_read_surf(fullfile(getenv('FREESURFER_HOME'),'subjects','fsaverage','surf','lh.sphere'));
lhcortex = fs_read_label(fullfile(getenv('FREESURFER_HOME'),'subjects','fsaverage','label','lh.cortex.label'));
Once you have your data in Matlab you need to build your design matrix. For computational efficiency, these tools require the data ordered according to time for each individual, with subjects stacked one after another (that is, your design matrix needs to have all the repeated assessments of the first subject, then all of the second, and so on). You can use the script sortData for this. For example, if you have your covariates in a Qdec table, you can use the following code:
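Qdec = fReadQdec('qdec.table.dat');   % read the Qdec table
Qdec = rmQdecCol(Qdec,1);             % remove the first column
sids = Qdec(:,1);                     % subject IDs, taken from the new first column
Qdec = rmQdecCol(Qdec,1);             % remove the subject ID column
M = Qdec2num(Qdec);                   % convert the remaining columns to a numeric matrix
[M,Y,ni] = sortData(M,2,Y,sids);      % sort by subject and by time (column 2 of M); ni = time points per subject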
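The required ordering (all scans of one subject together, in time order) can be illustrated outside Matlab with a small, hypothetical Python sketch. The helper order_by_subject_and_time is not part of Freesurfer and is not guaranteed to match sortData in every detail; it assumes each scan is a record with a subject ID and a scan time, keeps subjects in order of first appearance, sorts each subject's scans by time, and returns the number of time points per subject (analogous to ni above).

```python
from itertools import groupby


def order_by_subject_and_time(rows, subject_key, time_key):
    """Hypothetical helper (not Freesurfer's sortData).

    Groups scans by subject, in order of each subject's first appearance,
    and sorts each subject's scans by time. Returns (ordered_rows, ni),
    where ni[k] is the number of time points of the k-th subject.
    """
    first_seen = {}
    for index, row in enumerate(rows):
        first_seen.setdefault(row[subject_key], index)
    ordered = sorted(rows, key=lambda r: (first_seen[r[subject_key]], r[time_key]))
    # Rows of the same subject are now contiguous, so groupby can count them.
    ni = [len(list(group)) for _, group in groupby(ordered, key=lambda r: r[subject_key])]
    return ordered, ni


# Hypothetical scans: subject ID and years since baseline.
scans = [
    {"subject": "s2", "years": 1.0},
    {"subject": "s1", "years": 2.5},
    {"subject": "s2", "years": 0.0},
    {"subject": "s1", "years": 0.0},
    {"subject": "s3", "years": 0.0},
]
ordered, ni = order_by_subject_and_time(scans, "subject", "years")
print([(r["subject"], r["years"]) for r in ordered], ni)
# [('s2', 0.0), ('s2', 1.0), ('s1', 0.0), ('s1', 2.5), ('s3', 0.0)] [2, 2, 1]
```

Subject s3 has a single scan and still gets its own entry in ni, which matters for models that can use single-time-point subjects.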
Longitudinal Statistics

This page describes ways of analyzing longitudinal data after processing it using the longitudinal stream in Freesurfer.

Longitudinal data are more complex than cross-sectional data, as repeated measures are correlated within each subject. The strength of this correlation will depend on the time separation between scans. In addition, extra care must be taken when the data exhibit significant between-subject variation in number of time points and between-scan intervals (imperfect timing). A statistical analysis should then consider these data features in order to obtain valid statistical inferences.

Freesurfer currently comes with (at least) three different frameworks for the analysis of longitudinal data:

  1. Simplified repeated measures ANOVA (ignores correlation and timing of the measurement occasions)

  2. Direct analysis of atrophy rates or percent changes (ignores correlation and single time points)

  3. Linear mixed effects models <-- recommended (but more complex)


Simplified Repeated Measures ANOVA

This method can be used to check for differences between individual time points or compare time point differences across groups. For two time points it simplifies to a PairedAnalysis.
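For two time points, the paired analysis reduces each subject to a single difference and tests whether the mean difference is zero. A minimal plain-Python sketch of the paired t statistic is shown below; the function name paired_t and the example values are hypothetical, and this is not how mri_glmfit is implemented.

```python
import math


def paired_t(tp1, tp2):
    """Paired t statistic for one measure at two time points.

    tp1[i] and tp2[i] belong to the same subject i. Returns (t, df).
    """
    if len(tp1) != len(tp2) or len(tp1) < 2:
        raise ValueError("need two equally long lists covering at least 2 subjects")
    diffs = [b - a for a, b in zip(tp1, tp2)]  # within-subject change tp2 - tp1
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean / (sd / math.sqrt(n)), n - 1


# Hypothetical thickness values (mm) for four subjects at two time points.
t, df = paired_t([3.0, 2.5, 2.8, 3.1], [2.9, 2.3, 2.7, 3.0])
print(round(t, 6), df)  # -5.0 3
```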

Advantages:

  • Included in mri_glmfit.
  • Does not assume any specific trend in the mean response over time and thus can capture complex trajectories.
  • Can make use of different multiple comparisons methods that come with mri_glmfit.

Disadvantages:

  • Does NOT consider the correlation among the repeated measures, and thus, there is a significant reduction in statistical power.
  • Does NOT consider the timing of the measurement occasions which may result in a further reduction in power.
  • Can only be applied to balanced data (all subjects have their scans acquired at the same set of measurement occasions) with a small number of repeated measures (<=3).

For details see: RepeatedMeasuresAnova


Analysis of Rates or Percent Changes

To analyze, e.g., annualized percent changes or atrophy rates for two or more time points, one can run a two-stage model. This avoids dealing with the longitudinal correlation. The two stages are:

  1. First, simplify the statistic to a single number for each subject (e.g. the difference of two time points, the slope of the fitted line, or the annualized percent change).
  2. Then analyze the obtained summary measure across subjects or groups with a standard GLM.
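The two stages can be sketched in plain Python, assuming each subject's scan times (e.g. in years) and measurements are available as lists. This is only an illustration of the idea, not long_mris_slopes or mri_glmfit: stage 1 fits a least-squares slope per subject, and stage 2 compares two groups with a pooled-variance t test, which equals the test of the group regressor in a GLM with an intercept and a 0/1 group indicator. The helper names and data are hypothetical.

```python
import math


def ols_slope(times, values):
    """Stage 1: least-squares slope of one subject's measure over time."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need matching lists with at least two time points")
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    return sxy / sxx


def two_group_t(a, b):
    """Stage 2: pooled-variance two-sample t statistic on per-subject slopes.

    Same t as for the group regressor in a GLM with an intercept and a
    0/1 group indicator. Returns (t, df).
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    se = math.sqrt(ss / df * (1 / na + 1 / nb))
    return (ma - mb) / se, df


# Hypothetical data: (scan times in years, measurements) per subject.
group_a = [([0.0, 1.0, 2.0], [3.0, 2.9, 2.8]), ([0.0, 2.0], [2.6, 2.2])]
group_b = [([0.0, 1.5], [2.9, 2.9]), ([0.0, 1.0, 3.0], [3.1, 3.15, 3.25])]
slopes_a = [ols_slope(t, y) for t, y in group_a]
slopes_b = [ols_slope(t, y) for t, y in group_b]
print(slopes_a, slopes_b, two_group_t(slopes_a, slopes_b))
```

Note that every slope enters stage 2 with the same weight, whether it came from two scans or from four.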

This model is quite simple and can be an option if all subjects have the same number of time points, approximately equally spaced. Linear fits to each subject's data are often meaningful, as longitudinal change can be assumed to be almost linear within a short time frame in several applications.

Advantages:

  • Can deal with different numbers of time points and differently spaced time points (but does not model the resulting difference in variability).
  • Works on ROI stats (e.g. aseg.stats or aparc.stats) and on cortical maps (e.g. thickness).
  • The second stage can be performed with QDEC (simple GUI) or directly with mri_glmfit.
  • The second stage analysis can make use of different multiple comparisons methods that come with mri_glmfit.
  • Scripts are available (long_mris_slopes and long_stats_slopes); no Matlab needed.
  • For the simple case of two time points and simple differences, this model reduces to a paired analysis, but it can additionally compute (symmetrized) percent changes.
  • Includes code for intersecting cortex labels (across time and across subjects) to make sure that all non-cortex vertices are excluded.
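For reference, one common way to define these subject-level summaries from a linear fit is given below. This is an assumption for illustration; the exact definitions used by long_mris_slopes and long_stats_slopes are documented on LongitudinalTwoStageModel. Here t_j and y_j are a subject's scan times and measurements, j = 1, ..., n:

```latex
\begin{aligned}
\text{rate} &= \frac{\sum_{j=1}^{n} (t_j - \bar t)(y_j - \bar y)}{\sum_{j=1}^{n} (t_j - \bar t)^2},
\qquad \text{avg} = \bar y,\\[4pt]
\text{pc1} &= 100\,\frac{\text{rate}}{y_1},
\qquad \text{spc} = 100\,\frac{\text{rate}}{\text{avg}},\\[4pt]
\text{two time points:}\quad \text{spc} &= 100\,\frac{(y_2 - y_1)/(t_2 - t_1)}{(y_1 + y_2)/2}.
\end{aligned}
```

The symmetrized percent change divides by the average of the measurements rather than by the first one, so neither scan is treated as the baseline.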

Disadvantages:

  • Does NOT model the correlation among the repeated measures, and thus, there is a significant reduction in statistical power.
  • Does NOT account for the different certainty of within-subject slopes depending on the number of time points, and therefore it has the highest propensity for false positives (type I family-wise error in the mass-univariate setting).
  • Difficult to model non-linear temporal behaviour.
  • Difficult to deal with time-varying covariates (slopes would need to be fit to those for each subject to reduce them to a single number).
  • Cannot include information from subjects with only a single time point and thus the results are likely to be biased and have less statistical power.
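The varying certainty of the within-subject slopes can be made concrete with the standard least-squares result, assuming independent residuals with variance σ² within a subject:

```latex
\operatorname{Var}(\hat\beta_{1}) = \frac{\sigma^2}{\sum_{j=1}^{n} (t_j - \bar t)^2}
```

For example, two scans one year apart give a denominator of 0.5 and a slope variance of 2σ², while four yearly scans (t = 0, 1, 2, 3) give a denominator of 5 and a variance of 0.2σ², ten times smaller. A standard second-stage GLM nevertheless gives both slopes the same weight.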

The linear mixed effects model overcomes these limitations and should be used if subjects have different numbers of time points (or for more complex modeling).

For details see: LongitudinalTwoStageModel


Linear Mixed Effects Model

A Linear Mixed Effects (LME) model is the most powerful and principled approach. We recommend this approach.
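For reference, the standard formulation of a linear mixed effects model for subject i with n_i scans is shown below (see LinearMixedEffectsModels for how the Matlab tools specify the design). X_i holds the fixed-effects covariates (e.g. intercept, time, group), and Z_i the random-effects covariates, typically a subset of the columns of X_i such as intercept and time; b_i and e_i are assumed independent:

```latex
\begin{aligned}
Y_i &= X_i \beta + Z_i b_i + e_i, \qquad i = 1, \dots, m,\\
b_i &\sim \mathcal{N}(0, D), \qquad e_i \sim \mathcal{N}(0, \sigma^2 I_{n_i}),\\
\operatorname{Cov}(Y_i) &= Z_i D Z_i^{\top} + \sigma^2 I_{n_i}.
\end{aligned}
```

Because n_i and the scan times in X_i and Z_i may differ between subjects, unequal timing and subjects with a single time point fit into the same model, and the correlation among repeated measures is modeled through D and σ².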

Advantages:

  • Works for both stats (univariate) and surface analysis (mass-univariate).
  • Can handle unequal timing and different number of time points across subjects (missing data).
  • Even subjects with only a single time point can be included in these models (make sure they are also run through the longitudinal stream, available with FS 5.2, to avoid a bias due to different image processing).
  • Appropriately models the temporal correlation.
  • Can model different variances across measurement occasions.
  • Our mass-univariate method can deal very well with the spatial correlation among measurements on the cortex and is very fast by working with spatial regions in which the correlation structure is relatively constant.
  • Can be used to model complex longitudinal behavior (e.g. quadratic, or piecewise linear trajectories) and time-varying covariates.
  • It seems to have become the consensus among statisticians that LME models are the right mechanism to study longitudinal data, and reviewers may request them for journal publications.
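To see how such a model encodes a temporal correlation that depends on each subject's own scan times, here is a minimal Python sketch of the marginal covariance Z D Zᵀ + σ²I for a random intercept and random slope model. This model choice and all parameter values are assumptions for illustration, not necessarily what the Matlab tools use by default.

```python
import math


def marginal_covariance(times, d00, d01, d11, sigma2):
    """V = Z D Z^T + sigma2 * I for one subject, where row j of Z is [1, t_j].

    D = [[d00, d01], [d01, d11]] is the covariance of the random intercept
    and random slope; sigma2 is the residual variance. The scan times may
    differ in number and spacing between subjects.
    """
    n = len(times)
    V = [[0.0] * n for _ in range(n)]
    for j, t in enumerate(times):
        for k, s in enumerate(times):
            # [1, t] D [1, s]^T written out term by term
            V[j][k] = d00 + d01 * (t + s) + d11 * t * s
        V[j][j] += sigma2
    return V


def correlation(V, j, k):
    """Correlation between scans j and k implied by covariance matrix V."""
    return V[j][k] / math.sqrt(V[j][j] * V[k][k])


# Hypothetical variance components and scan times (years) for one subject.
V = marginal_covariance([0.0, 1.0, 4.0], d00=1.0, d01=0.0, d11=0.1, sigma2=0.5)
print(round(correlation(V, 0, 1), 3), round(correlation(V, 0, 2), 3))  # 0.645 0.464
```

With these values, the correlation with the baseline scan drops as the gap grows (1 year versus 4 years), which is the kind of structure the two simpler frameworks ignore.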

Disadvantages:

  • More complicated to use (e.g. requires distinguishing random effects from fixed effects).
  • Currently, our implementation is in Matlab only.
  • Currently only offers FDR for multiple comparisons correction.

LinearMixedEffectsModels allow ROI analysis as well as advanced longitudinal analysis for cortical maps. Here we only discuss how to prepare your data for that analysis. The analysis itself is performed in Matlab.

Similar to regular (cross-sectional) processing, ROI stats data is contained in stats files (cf. the ROI tutorial). You could, e.g., open the stats text files in each tpN.long.templateID/stats/ dir, containing statistics such as volume of subcortical structures or thickness averages for cortical regions. These statistics can be fed into any external statistical package to run whatever analysis you are interested in. Helpful commands to grab the data from all subjects and time points and create a single table are asegstats2table and aparcstats2table.

For example, to create a table with subcortical ROIs from all subjects and all time points you would run:

asegstats2table --qdec-long long.qdec.table.dat --stats aseg.stats --tablefile aseg.table.txt

This will automatically grab the stats from the longitudinal directories (tpN.long.templateID/stats/) and create a table (rows: subject/time points, columns: structures). Similarly you can use aparcstats2table for surface ROI analysis.

To run LinearMixedEffectsModels on surface maps, you need to map all the data to a template (usually fsaverage) and smooth the data:

mris_preproc --qdec-long long.qdec.table.dat --target fsaverage --hemi lh --meas thickness --out lh.thickness.stack.mgh
mri_surf2surf --hemi lh --s fsaverage --sval lh.thickness.stack.mgh --tval lh.thickness.stack.fwhm10.mgh --fwhm-trg 10 --cortex --noreshape
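For orientation, the full width at half maximum of a Gaussian kernel relates to its standard deviation as follows, so --fwhm-trg 10 corresponds to a standard deviation of roughly 4.25 mm (assuming the Gaussian kernel that a FWHM specification implies):

```latex
\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.355\,\sigma
\quad\Longrightarrow\quad
\sigma = \frac{10\ \text{mm}}{2.355} \approx 4.25\ \text{mm}
```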

For details see: LinearMixedEffectsModels


MartinReuter

LongitudinalStatistics (last edited 2018-07-25 12:06:32 by MorganFogarty)