|
Size: 18071
Comment:
|
Size: 11526
Comment:
|
| Deletions are marked like this. | Additions are marked like this. |
| Line 1: | Line 1: |
| #acl LcnGroup:read,write,delete,revert All: | #acl LcnGroup:read,write,delete,revert All:read |
| Line 3: | Line 3: |
| LME Matlab tools. Author: Jorge Luis Bernal Rusiel, 2012. jbernal[at]nmr.mgh.harvard.edu or jbernal0019[at]yahoo.es | This page describes ways of analyzing longitudinal data after processing it using the [[LongitudinalProcessing|longitudinal stream]] in Freesurfer. |
| Line 5: | Line 5: |
| If you use these tools in your analysis please cite: | Longitudinal data are more complex than cross-sectional data, as repeated measures are correlated within each subject. The strength of this correlation will depend on the time separation between scans. In addition, extra care must be taken when the data exhibit significant between-subject variation in number of time points and between-scan intervals (imperfect timing). A statistical analysis should then consider these data features in order to obtain valid statistical inferences. |
| Line 7: | Line 7: |
| Bernal-Rusiel J.L., Greve D.N., Reuter M., Fischl B., Sabuncu M.R., 2012. Statistical Analysis of Longitudinal Neuroimage Data with Linear Mixed Effects Models, NeuroImage, doi:10.1016/j.neuroimage.2012.10.065. | Freesurfer currently comes with (at least) three different frameworks for the analysis of longitudinal data: |
| Line 9: | Line 9: |
| These Matlab tools are freely distributed and intended to help neuroimaging researchers when analysing longitudinal neuroimaging (LNI) data. The statistical analysis of such type of data is arguable more challenging than the cross-sectional or time series data traditionally encountered in the neuroimaging field. This is because the timing associated with the measurement occasions and the underlying biological process under study are not usually under full experimental control. | 1. Simplified [[RepeatedMeasuresAnova|repeated measures ANOVA]] (ignores correlation and timing of the measurement occasions) 1. [[LongitudinalTwoStageModel|Direct analysis of atrophy]] rates or percent changes (ignores correlation and single time points) 1. [[LinearMixedEffectsModels|Linear mixed effects models]] <-- '''recommended''' (but more complex) |
| Line 11: | Line 13: |
| There are two aspects of longitudinal data that require correct modeling: The mean response over time and the covariance among repeated measures on the same individual. I hope these tools can serve for such modeling purpose as they provide functionality for exploratory data visualization, model specification, model selection, parameter estimation, inference and power analysis including sample size estimation. They are specially targeted to be used with Freesurfer's data but can be used with any other data as long as they are loaded into Matlab and put into the appropriate format. Here are some recommendations about how to use these tools. | <<TableOfContents(2)>> |
| Line 13: | Line 15: |
| <<TableOfContents>> | ---- |
| Line 15: | Line 17: |
| == Preparing your data == There are two types of analyses that can be done: univariate and mass-univariate. The first step is to load your data into Matlab. If you are working with Freesurfer then univariate data (eg. Hippocampus' volume) can be loaded using Qdec tables. There are, under the Qdec directory, some simple example scripts for reading and writing Freesurfer's Qdec tables. |
== Simplified Repeated Measures ANOVA == This method can be used to check for differences between individual time points or compare time point differences across groups. For two time points it simplifies to a PairedAnalysis. |
| Line 18: | Line 20: |
| In order to read mass-univariate data you should use the following scripts: | '''Advantages:''' |
| Line 20: | Line 22: |
| ''fs_read_label.m '' | * Included in mri_glmfit. * Does not assume any specific trend in the mean response over time and thus can capture complex trajectories. * Can make use of different multiple comparisons methods that come with mri_glmfit. |
| Line 22: | Line 26: |
| ''fs_read_surf.m '' | '''Disadvantages:''' |
| Line 24: | Line 28: |
| ''fs_read_Y.m '' | * Does NOT consider the correlation among the repeated measures, and thus, there is a significant reduction in statistical power. * Does NOT consider the timing of the measurement occasions which may result in a further reduction in power. * Can only be applied to balanced data (all subject have their scans acquired at the same set of measurement occasions) with a small number of repeated measures (<=3). |
| Line 26: | Line 32: |
| The last two depend on Freesurfer's scripts so you need to have installed Freesurfer software package and included the Freesurfer's matlab subdirectory in your Matlab's search path. | For details see: RepeatedMeasuresAnova |
| Line 28: | Line 34: |
| Previously, the mass-univariate data can be generated in Freesurfer by running variants of the following commands: | ---- |
| Line 30: | Line 36: |
| '''mris_preproc''' --qdec qdec.table.dat --target study_average --hemi lh --meas thickness --out lh.thickness.mgh (assembles your thickness data into a single lh.thickness.mgh file) | == Analysis of Rates or Percent Changes == To analyze, e.g. annualized percent change or atrophy rates for 2 or more time points, one can run a two stage model. This avoids dealing with the longitudinal correlation. The two stages are: |
| Line 32: | Line 39: |
| '''mri_surf2surf''' --hemi lh --s study_average --sval lh.thickness.mgh --tval lh.thickness_sm10.mgh --fwhm-trg 10 --cortex --noreshape (smooths the cortical thickness maps with FWHM=10 mm. Note the --cortex and --noreshape options) | 1. First, simplify the statistic to a single number for each subject (the difference of two time points, or the slope of the fitting line, or the annualized percent change, etc...). 1. Then analyze the obtained summary measure across subjects or groups with a standard GLM. |
| Line 34: | Line 42: |
| Then you can load the cortical thickness data lh.thickness_sm10.mgh into Matlab using fs_read_Y.m | This model is quite simple and can be an option if all subjects have the same number of time points, approximately equally spaced. Linear fits into each subject data are often meaningful, as longitudinal change can be assumed to be almost linear within a short time frame in several applications. |
| Line 36: | Line 44: |
| eg. | '''Advantages:''' |
| Line 38: | Line 46: |
| ''[Y,mri] = fs_read_Y('lh.thickness_sm10.mgh'); '' | * Can deal with differently many and differently spaced time points (but does not model the difference in variability). * Works on ROI stat (e.g. aseg.stats or aparc.stats) and on cortical maps (e.g. thickness). * The second stage can be performed with QDEC (simple GUI) or directly with mri_glmfit. * The second stage analysis can make use of different multiple comparisons methods that come with mri_glmfit. * Scripts are available ( long_mris_slopes and long_stats_slopes ), no matlab needed. * For the simple case of two time points and when looking at simple differences this model simplifies to a paired analysis, but can additionally compute (symmetrized) percent changes. * Includes code for intersecting cortex labels (across time and across subjects) to make sure that all non-cortex vertices are excluded. |
| Line 40: | Line 54: |
| You should also read the spherical surface (lh.sphere) and cortex label (lh.cortex.label) of study_average. | '''Disadvantages:''' |
| Line 42: | Line 56: |
| eg. | * Does NOT model the correlation among the repeated measures, and thus, there is a significant reduction in statistical power. * Does NOT account for different certainty of within subject slopes depending on the number of time points and therefore it has the highest propensity to false positives (type I family wise error in the mass-univariate setting). * Difficult to model non-linear temporal behaviour. * Difficult to deal with time varying co-variates (slopes would need to be fit into those for each subject to reduce these to a single number). * Cannot include information from subjects with only a single time point and thus the results are likely to be biased and have less statistical power. |
| Line 44: | Line 62: |
| ''lhsphere = fs_read_surf('$FsDir/freesurfer/subjects/fsaverage/surf/lh.sphere'); '' | The linear mixed effects model overcomes these limitations and should be used if subjects have differently many time points (or for more complex modeling). |
| Line 46: | Line 64: |
| ''lhcortex = fs_read_label('$FsDir/freesurfer/subjects/fsaverage/label/lh.cortex.label'); '' | For details see: LongitudinalTwoStageModel |
| Line 48: | Line 66: |
| Once you have your data in Matlab you need to build your design matrix. For computational efficiency reasons, these tools require the data ordered according to time for each individual (that is, your design matrix needs to have all the repeated assessments for the first subject, then all for the second and so on). You can use the script: | ---- |
| Line 50: | Line 68: |
| ''sortData '' | == Linear Mixed Effects Model == A Linear Mixed Effects (LME) model is the most powerful and principled approach. We recommend this approach. |
| Line 52: | Line 71: |
| For example, if you have your covariates in a Qdec table then you can use the following code | '''Advantages:''' |
| Line 54: | Line 73: |
| ''Qdec = fReadQdec('qdec.table.dat'); '' | * Works for both stats (univariate) and surface analysis (mass-univariate). * Can handle unequal timing and different number of time points across subjects (missing data). * Even subjects with only a single time point can be included into these models (make sure they also run through the longitudinal stream, available with version FS 5.2, to avoid a bias due to different image processing) . * Appropriately models the temporal correlation. * Can model different variances across measurement occasions. * Our mass-univariate method can deal very well with the spatial correlation among measurements on the cortex and is very fast by working with spatial regions in which the correlation structure is relative constant. * Can be used to model complex longitudinal behavior (e.g. quadratic, or piecewise linear trajectories) and time-varying covariates. * It seems to have become the consensus among statisticians that LME models are the right mechanism to study longitudinal data and they may be requested in journal publications by the reviewers. |
| Line 56: | Line 82: |
| ''Qdec = rmQdecCol(Qdec,1); '' | '''Disadvantages:''' |
| Line 58: | Line 84: |
| ''sids = Qdec(:,1); '' | * More complicated use (e.g. requires distinguishing mixed effects from fixed effects ...). * Currently, our implementation is in Matlab only. * Currently only offers FDR for multiple comparisons correction. |
| Line 60: | Line 88: |
| ''Qdec = rmQdecCol(Qdec,1); '' | [[LinearMixedEffectsModels]] allow ROI analysis as well as advanced longitudinal analysis for cortical maps. Here we only discuss how to prepare your data for that analysis. The analysis itself is performed in matlab. |
| Line 62: | Line 90: |
| ''M = Qdec2num(Qdec); '' | Similar to regular (cross sectional) processing, ROI stats data is contained in stats files (cf. the ROI tutorial). You could, e.g., open the stats text files in each {{{tpN.long.templateID/stats/}}} dir, containing statistics such as volume of subcortical structures or thickness averages for cortical regions. These statistics can be fed into any external statistical packages to run whatever analysis you are interested in. Helpful commands to grab the data from all subjects and time points and create a single table are {{{asegstats2table}}} and {{{aparcstats2table}}}. |
| Line 64: | Line 92: |
| ''[M,Y,ni] = sortData(M,2,Y,sids); '' | For example to create a table with subcoritical ROI's from all subjects and all time points you would run this : |
| Line 66: | Line 94: |
| A design matrix X can then be built from matrix M to represent a desired longitudinal model. | {{{ asegstats2table --qdec-long long.qdec.table.dat --stats aseg.stats --tablefile aseg.table.txt }}} |
| Line 68: | Line 98: |
| == Model specification == Analysis of longitudinal data should always start by simple graphical displays of the data. A natural way is through the use of time plots. A time plot is a scatter-plot with the responses on the vertical axis and the measurement times on the horizontal axis. In general it is usually more informative to display a time plot of the smoothed mean response over time. This graphical display can be quite enlightening and can provide the basis for choosing an appropriate model for the analysis of change over time. The toolbox provides the following two functions: |
This will automatically grab the stats from the longitudinal directories ({{{tpN.long.templateID/stats/}}}) and create a table (rows: subject/time points, columns: structures). Similarly you can use {{{aparcstats2table}}} for surface ROI analysis. |
| Line 71: | Line 100: |
| ''lme_timePlot '' | To run [[LinearMixedEffectsModels]] on surface maps, you need to map all the data to a template (usually fsaverage) and smooth the data: {{{ mris_preproc --qdec-long long.qdec.table.dat --target fsaverage --hemi lh --meas thickness --out lh.thickness.stack.mgh mri_surf2surf --hemi lh --s fsaverage --sval lh.thickness.stack.mgh --tval lh.thickness.stack.fwhm10.mgh --fwhm-trg 10 --cortex --noreshape }}} |
| Line 73: | Line 106: |
| ''lme_lowessPlot '' | For details see: LinearMixedEffectsModels |
| Line 75: | Line 108: |
| In the mass-univariate case it is more difficult to determine the trend in the mean response over time since it may vary across different regions of the cortex. You can plot the trends in different regions (eg.from the Freesurfer's parcelation) and select the most complex trend for your model (eg. quadratic in time over linear in time, etc...). We recommend to order the columns of your design matrix in the following way: First, the intercept term (which is a column of ones); second, the time covariate; third, any time-varying covariates (eg. time^2); fourth, the group covariates of interest (eg. a binary variable indicating whether the subject is an Alzheimer's patient or not) and their interactions with the time-varying covariates; finally any other nuisance time-invariant covariate (eg. gender). See bellow in section 7 for an example of a model for the design matrix. | ---- |
| Line 77: | Line 110: |
| == Parameter estimation == A very flexible and versatile approach for analyzing longitudinal continuous data is the linear mixed effects (LME) regression paradigm. This paradigm can provide parsimonious models for both the trend in the mean response over time and the covariance among repeated measures on the same individual. The toolbox provides three types of tools for fitting LME models: |
== Longitudinal QDEC Table == |
| Line 80: | Line 112: |
| === a)Univariate === ''lme_fit_FS '' |
QDEC is a graphical program to perform simple statistical analysis of cross sectional data. A QDEC table is a simple table in text format that contains subject ID's (one subject per row) and different co-variables per column (e.g. age, gender, diagnosis, …). The first row contains a header, where the first column header is '''fsid''' and the other columns are named according to their content. It is described in the [[ QdecGroupAnalysis_freeview | QDEC Group Analysis tutorial ]]. Note that QDEC currently cannot perform longitudinal statistics directly! |
| Line 83: | Line 114: |
| ''lme_fit_EM '' | For the analysis of longitudinal data ( [[LongitudinalStatistics]] ) several command line tools require a 'longitudinal QDEC table'. This table is based on the QDEC table format with an additional 2nd column '''fsid-base''' that groups and assigns several time point to their subject. |
| Line 85: | Line 116: |
| ''lme_fit_NR '' | To get the longitudinal data ready for statistical analysis (LongitudinalTwoStageModel or LinearMixedEffectsModels) you need to create a table (space separated as a text file) in the following format: ||fsid ||fsid-base ||years ||... || ||OAS2_0001_MR1 ||OAS2_0001 ||0 || || ||OAS2_0001_MR2 ||OAS2_0001 ||1.25 || || ||OAS2_0004_MR1 ||OAS2_0004 ||0 || || ||OAS2_0004_MR2 ||OAS2_0004 ||1.47 || || ||... || || || || |
| Line 87: | Line 124: |
| === b)Mass-univariate === ''lme_mass_fit_vw '' |
|
| Line 90: | Line 125: |
| ''lme_mass_fit '' | where the first column is called '''fsid''' (containing all time points of all subjects) and the second column is '''fsid-base''' containing the within-subject template (=base) name, to group time points within subject. You can have many more columns such as gender, age, group, etc. Make sure there is a column containing an accurate time variable (optimally measured in years if you are interested in annualized change) such as age or the duration of the study (time from inital scan). Here we use '''years''' to measure the time from baseline scan (=tp1). You can see in the table that the two subjects OAS2_0001 and OAS2_0004 each have two time points (MR1, MR2) that are not equally spaced (approx 15 and 18 months apart). |
| Line 92: | Line 127: |
| === c)Novel mass-univariate tools (spatiotemporal models) === Spatiotemporal models are more powerful to detect effects in your data and usually require less computation time than traditional vertex-wise mass-univariate tools. Here you should first compute initial covariance component estimates using: |
Note, the '''fsid''' column contains the original subject/time point id's, not the longitudinal names. The command-line scripts know that this is a longitudinal table (because of the parameter, usually '''--qdec-long''' and existing '''fsid-base''' column) and will process the data from the longitudinal directories automatically. For example: {{{ long_mris_slopes --qdec ./qdec/long.qdec.table.dat --meas thickness --hemi lh --do-avg --do-rate --do-pc1 --do-spc --do-stack --do-label --time years --qcache fsaverage }}} is a tool for the LongitudinalTwoStageModel and expects a longitudinal QDEC table (even though the flag is only --qdec). It will automatically grab the data from the longitudinal subN_tp1.long.subNtemplate directories and compute within subject atrophy rates etc. |
| Line 95: | Line 135: |
| ''lme_mass_fit_init '' or'' lme_mass_fit_EMinit '' | Also: {{{ mris_preproc --qdec-long qdec.table.dat --target study_average --hemi lh --meas thickness --out lh.thickness.mgh }}} is a tool usually used for mri_glmfit. It iterates over the subjects from the QDEC table, maps them to the study_average (usually fsaverage) and stacks them into a single file. In this example it takes a longitudinal QDEC table (--qdec-long) and then takes the data from the longitudinal directories, to map and stack them and get them ready for the LinearMixedEffectsModels (usually you would do a smoothing step with mri_surf2surf after the mris_preproc is finished, see the LME description). |
| Line 97: | Line 141: |
| these estimates can then be used to segment the brain into homogeneous regions with similar random effects covariance parameters by | |
| Line 99: | Line 142: |
| ''lme_mass_!RgGrow'' The spatiotemporal model can then be fitted by using ''lme_mass_fit_Rgw '' together with the previous segmentation and initial covariance estimates. == Model selection == Here a likelihood ratio test can be used to compare a model with q random effects against a model with q+1 random effects using === a)Univariate === ''lme_LR '' === b)Mass-univariate === ''lme_mass_LR'' (a correction for multiple comparisons is then required) == Inference == === a)Univariate === ''lme_F '' === b)Mass-univariate === ''lme_mass_F'' (a correction for multiple comparisons is then required) At this time the only tools provided for multiple comparisons correction are ''lme_mass_FDR'' (Original Benjamini and Hochberg FDR procedure) ''lme_mass_FDR2'' (A powerful two-stage FDR procedure) == Power analysis == Only univariate tools are provided: ''lme_plannedPower '' ''lme_plannedSampleSize '' ''lme_realizedPower '' == Example data analyses == Here are two example analyses using ADNI data processed with Freesurfer === a)Univariate === In this case the interest was in determining the differences in mean hippocampal volume change over time among four groups of individuals: healthy controls (HC), stable mild cognitive impairment patients (sMCI), MCI patients who converted to Alzheimer's disease (cMCI) during the follow-up period and patients with Alzheimer's disease (AD) at baseline. A Qdec table file qdec.table.dat was used to process in Freesurfer all time point scans from all subjects included in the study. It contains the following columns: 1) Freesurfer's fsid (fsid) 2) subject ID (sID, in this case an integer between one and the total number of subjects in the study) 3) time from baseline (time) 4) group (HC=1, sMCI=2, cMCI=3, AD=4) 5) Apoe4 (E4:carriers=1, non-carriers=0) 6) Gender (female=1, male=0) 7) Age at baseline (!BslAge) 8) Education (in years). The qdec.table.dat file was then used to collect volumetric data for all subjects within the Freesurfer's Qdec interface. This resulted in a new Qdec table which had the previous columns plus three new columns: 9) left Hippocampus volume 10) right Hippocampus volume 11) total intracranial volume (ICV). The data was loaded into Matlab using: ''Qdec = fReadQdec('qdec.table.dat'); '' ''Qdec = rmQdecCol(Qdec,1); '' ''sID = Qdec(:,1); '' ''Qdec = rmQdecCol(Qdec,1); '' ''M = Qdec2num(Qdec); '' ''Y = M(:,7:9); '' ''M = M(:,1:6); '' All the data was ordered according to time for each subject using ''[M,Y,ni] = sortData(M,1,Y,sID); '' where M,Y,ni are now the ordered assessment variables, the ordered data and the vector with the number of repeated measures for each subject respectively. The mean response trends over time were then plotted ''lme_lowessPlot(M(:,1),Y(:,1)+Y(:,2),0.70,M(:,2)); '' These plots revealed a linear trajectory over time for the four groups. So, the following LME model was proposed: Y,,ij,, = ß,,1,, + ß,,2,,*t,,ij,, +ß,,3,,*sMCI,,i,, + ß,,4,,*sMCI,,i,, *t,,ij,, + ß,,5,,*cMCI,,i,, + ß,,6,,*cMCI,,i,, *t,,ij,, + ß,,7,,*AD,,i,, + ß,,8,,*AD,,i,, *t,,ij,, + ß,,9,,*E4,,i,, + ß,,10,,*E4,,i,, *t,,ij,, + ß,,11,,*Gender,,i,, + ß,,12,,*!BslAge,,i,, + ß,,13,,*Education,,i,, + ß,,14,,*ICV,,i,, + b,,1i,, + b,,2i,, + e,,ij,, Thus, our design matrix X was built from matrix M to represent the previous model. The design matrix X then had 14 columns: 1) intercept (all ones) 2) time (t,,ij,,) 3) one for sMCI and zero otherwise (sMCI,,i,,) 4) colum 3) .* time 5) one for cMCI and zero otherwise (cMCI,,i,,) 6) colum 5) .* time 7) one for AD and zero otherwise (AD,,i,,) 8) colum 7) .* time 9) E4 10) E4 .* time 11) Gender 12) !BslAge 13) Education 14) ICV (converted to liters). The parameters of this model were estimated using ''total_hipp_vol_stats = lme_fit_FS(X,[1 2],Y(:,1)+Y(:,2),ni); '' Note that this model contains two random effects corresponding to the intercept and slope terms (b,,1i,,, b,,2i,,). This model can be compared with the model with only a single random effect corresponding to the intercept (b,,1i,,): ''total_hipp_vol_stats_1RF = lme_fit_FS(X,[1],Y(:,1)+Y(:,2),ni); '' using the likelihood ratio test: ''lr = lme_LR(total_hipp_vol_stats.lreml,total_hipp_vol_stats_1RF.lreml,1); '' The previous test was significant (pval<0.0001) indicating that the model with two random effects is significantly better than the model with a single random effect. To test if there was any difference in the rate of change over time among the four groups the following contrast was used ''C = [zeros(3,3) [1 0 0 0 0; -1 0 1 0 0; 0 0 -1 0 1] zeros(3,6)]; '' and the F-test ''F_C = lme_F(total_hipp_vol_stats,C); '' The p-value of the test was in ''F_C.pval''. === b)Mass-univariate === In this case the interest was in determining regionally specific differences in cortical thickness atrophy rate over time among the previous four groups of individuals: HC, sMCI, cMCI and AD. Thus we had the same Qdec table of the previous example. The following Freesurfer's comands were used to generate the spatial cortical thickness data: '''mris_preproc''' --qdec qdec.table.dat --target fsaverage --hemi lh --meas thickness --out lh.thickness.mgh '''mri_surf2surf''' --hemi lh --s fsaverage --sval lh.thickness.mgh --tval lh.thickness_sm10.mgh --fwhm-trg 10 --cortex --noreshape Then the spatially smoothed data lh.thickness_sm10.mgh were loaded into Matlab using ''[Y,mri] = fs_read_Y('lh.thickness_sm10.mgh'); '' As in the previous example, the data were ordered according to time for each individual: ''Qdec = fReadQdec('qdec.table.dat'); '' ''Qdec = rmQdecCol(Qdec,1); '' ''sID = Qdec(:,1); '' ''Qdec = rmQdecCol(Qdec,1); '' ''M = Qdec2num(Qdec); '' ''[M,Y,ni] = sortData(M,1,Y,sID); '' The fsaverage's spherical surface (lh.sphere) and cortex label (lh.cortex.label) were read with: ''lhsphere = fs_read_surf('$FsDir/freesurfer/subjects/fsaverage/surf/lh.sphere'); '' ''lhcortex = fs_read_label('$FsDir/freesurfer/subjects/fsaverage/label/lh.cortex.label'); '' A priori, we expected a linear model of cortical thickness atrophy over time. However, in order to account for any possible consistent non-linearity across vertices we carried out a model selection procedure starting with a quadratic model over time. So, the following LME model was initially proposed: Y,,ij,, = ß,,1,, + ß,,2,,*t,,ij,, + ß,,3,,*t²,,ij,, + ß,,4,,*sMCI,,i,, + ß,,5,,*sMCI,,i,,*t,,ij,, + ß,,6,,*sMCI,,i,,*t²,,ij,, + ß,,7,,*cMCI,,i,, + ß,,8,,*cMCI,,i,,*t,,ij,, + ß,,9,,*cMCI,,i,,*t²,,ij,, + ß,,10,,*AD,,i,, + ß,,11,,*AD,,i,,*t,,ij,, + ß,,12,,*AD,,i,,*t²,,ij,, + ß,,13,,*E4,,i,, + ß,,14,,*E4,,i,,*t,,ij,, + ß,,15,,*Gender,,i,, + ß,,16,,*!BslAge,,i,, + ß,,17,,*Education,,i,, + b,,1i,, + b,,2i,, + e,,ij,, Thus, our design matrix X was built from matrix M to represent the previous model.Then initial vertex-wise temporal covariance estimates were computed with: ''[lhTh0,lhRe] = lme_mass_fit_EMinit(X,[1 2],Y,ni,lhcortex,3); '' These covariance estimates were segmented into homogeneous regions using: ''[lhRgs,lhRgMeans] = lme_mass_!RgGrow(lhsphere,lhRe,lhTh0,lhcortex,2,95); '' Here both lhTh0 and lhRgMeans maps were overlaid onto lhsphere and visually compared each other to ensure that they were similar enough (the essential spatial organization of the initial covariance estimates was not lost after the segmentation procedure): ''surf.faces = lhsphere.tri; '' ''surf.vertices = lhsphere.coord'; '' ''figure; p1 = patch(surf); '' ''set(p1,'facecolor','interp','edgecolor','none','facevertexcdata',Th0(1,:)'); '' ''figure; p2 = patch(surf); set(p2,'facecolor','interp','edgecolor','none','facevertexcdata',lhRgMeans(1,:)'); '' ''The spatiotemporal LME model was fitted using: '' ''lhstats = lme_mass_fit_Rgw(X,[1 2],Y,ni,lhTh0,lhRgs,lhsphere); '' Note that there is no need to pass the lhcortex label into the previous estimation function since non-cortex vertices (codified as 0 in lhRgs segmentation vector) are not automatically considered for estimation. If it is desired to compare the previous LME model with the one with a single random effect (or the one with three random effects) the same segmentation must be used to estimate the parameters for both models. Here, in order to ensure validity of both models, the segmentation obtained for the most complex model must be used. This is because the same permisibility values for the segmentation procedure result in less and larger regions for the model with lesser number of random effects and using different segmentations for different models will invalid the subsequent likelihood ratio test. For instance, to compare the previous LME model with the one with a single random effect for the intercept term, the following steps can be followed: i) Fit the model with one random effect using the segmentation obtained from the previous model: ''lhTh0_1RF = lme_mass_fit_EMinit(X,[1],Y,ni,lhcortex,3); '' ''lhstats_1RF = lme_mass_fit_Rgw(X,[1],Y,ni,lhTh0_1RF,lhRgs,lhsphere); '' ii) Apply the likelihood ratio test: ''LR_pval = lme_mass_LR(lh_stats,lhstats_1RF,1); '' iii) Correct for multiple comparisons: ''dvtx = lme_mass_FDR2(LR_pval,ones(1,length(LR_pval)),lhcortex,0.05,1); '' In our case most cortex vertices survived the above correction (''length(dvtx)'' > 80% of ''length(lhcortex)'') and therefore we assumed that the model with two random effects was significantly better than the model with a single random effect. We then tested the null hypothesis of no group differences in the quadratic term: ''CM.C = [zeros(3,5) [1 0 0 0 0 0 0;-1 0 0 1 0 0 0;0 0 0 -1 0 0 1] zeros(3,5)];'' ''F_lhstats = lme_mass_F(lhstats,CM); '' and multiple comparisons were corrected with: ''dvtx = lme_mass_FDR2(F_lhstats.pval,F_lhstats.sgn,lhcortex,0.05,0); '' As dvtx was empty, which means that no vertex survived the multiple comparison correction, then all quadratic terms were removed from the model. Thus a linear model over time with two random effects, as in the univariate case, was then fitted using the above functions (lme_mass_fit_EMinit, lme_mass_!RgGrow and lme_mass_fit_Rgw) and the null hypothesis of no group differences in the rate of change over time among the four groups was tested with the following contrast: ''CM.C = [zeros(3,3) [1 0 0 0 0; -1 0 1 0 0; 0 0 -1 0 1] zeros(3,5)]; '' ''F_lhstats = lme_mass_F(lhstats,CM); '' The Freesurfer significance map was then written to disc for visualization and post-processing in Freesurfer (eg. using tksurfer tool): ''fs_write_fstats(F_lhstats,mri,'sig.mgh','sig'); '' Alternatively, a correction for multiple comparisons can be carried out directly here with ''[detvtx,sided_pval,pth] = lme_mass_FDR2(F_lhstats.pval,F_lhstats.sgn,lhcortex,0.05,0); '' then the sided p-values can be saved with ''mri1 = mri;<<BR>>'' ''mri1.volsz(4) = 1;'' ''fs_write_Y(sided_pval,mri1,'spval.mgh'); '' and finally the Freesurfer's '''mri_surfcluster''' cmd can be used with spval.mgh, lh.cortex.label and pth to generate the set of detected clusters and their anatomical coordinates. |
---- MartinReuter |
Longitudinal Statistics
This page describes ways of analyzing longitudinal data after processing it using the longitudinal stream in Freesurfer.
Longitudinal data are more complex than cross-sectional data, as repeated measures are correlated within each subject. The strength of this correlation will depend on the time separation between scans. In addition, extra care must be taken when the data exhibit significant between-subject variation in number of time points and between-scan intervals (imperfect timing). A statistical analysis should then consider these data features in order to obtain valid statistical inferences.
Freesurfer currently comes with (at least) three different frameworks for the analysis of longitudinal data:
Simplified repeated measures ANOVA (ignores correlation and timing of the measurement occasions)
Direct analysis of atrophy rates or percent changes (ignores correlation and single time points)
Linear mixed effects models <-- recommended (but more complex)
Contents
Simplified Repeated Measures ANOVA
This method can be used to check for differences between individual time points or compare time point differences across groups. For two time points it simplifies to a PairedAnalysis.
Advantages:
- Included in mri_glmfit.
- Does not assume any specific trend in the mean response over time and thus can capture complex trajectories.
- Can make use of different multiple comparisons methods that come with mri_glmfit.
Disadvantages:
- Does NOT consider the correlation among the repeated measures, and thus, there is a significant reduction in statistical power.
- Does NOT consider the timing of the measurement occasions which may result in a further reduction in power.
Can only be applied to balanced data (all subject have their scans acquired at the same set of measurement occasions) with a small number of repeated measures (<=3).
For details see: RepeatedMeasuresAnova
Analysis of Rates or Percent Changes
To analyze, e.g. annualized percent change or atrophy rates for 2 or more time points, one can run a two stage model. This avoids dealing with the longitudinal correlation. The two stages are:
- First, simplify the statistic to a single number for each subject (the difference of two time points, or the slope of the fitting line, or the annualized percent change, etc...).
- Then analyze the obtained summary measure across subjects or groups with a standard GLM.
This model is quite simple and can be an option if all subjects have the same number of time points, approximately equally spaced. Linear fits into each subject data are often meaningful, as longitudinal change can be assumed to be almost linear within a short time frame in several applications.
Advantages:
- Can deal with differently many and differently spaced time points (but does not model the difference in variability).
- Works on ROI stat (e.g. aseg.stats or aparc.stats) and on cortical maps (e.g. thickness).
- The second stage can be performed with QDEC (simple GUI) or directly with mri_glmfit.
- The second stage analysis can make use of different multiple comparisons methods that come with mri_glmfit.
- Scripts are available ( long_mris_slopes and long_stats_slopes ), no matlab needed.
- For the simple case of two time points and when looking at simple differences this model simplifies to a paired analysis, but can additionally compute (symmetrized) percent changes.
- Includes code for intersecting cortex labels (across time and across subjects) to make sure that all non-cortex vertices are excluded.
Disadvantages:
- Does NOT model the correlation among the repeated measures, and thus, there is a significant reduction in statistical power.
- Does NOT account for different certainty of within subject slopes depending on the number of time points and therefore it has the highest propensity to false positives (type I family wise error in the mass-univariate setting).
- Difficult to model non-linear temporal behaviour.
- Difficult to deal with time varying co-variates (slopes would need to be fit into those for each subject to reduce these to a single number).
- Cannot include information from subjects with only a single time point and thus the results are likely to be biased and have less statistical power.
The linear mixed effects model overcomes these limitations and should be used if subjects have differently many time points (or for more complex modeling).
For details see: LongitudinalTwoStageModel
Linear Mixed Effects Model
A Linear Mixed Effects (LME) model is the most powerful and principled approach. We recommend this approach.
Advantages:
- Works for both stats (univariate) and surface analysis (mass-univariate).
- Can handle unequal timing and different number of time points across subjects (missing data).
- Even subjects with only a single time point can be included into these models (make sure they also run through the longitudinal stream, available with version FS 5.2, to avoid a bias due to different image processing) .
- Appropriately models the temporal correlation.
- Can model different variances across measurement occasions.
- Our mass-univariate method can deal very well with the spatial correlation among measurements on the cortex and is very fast by working with spatial regions in which the correlation structure is relative constant.
- Can be used to model complex longitudinal behavior (e.g. quadratic, or piecewise linear trajectories) and time-varying covariates.
- It seems to have become the consensus among statisticians that LME models are the right mechanism to study longitudinal data and they may be requested in journal publications by the reviewers.
Disadvantages:
- More complicated use (e.g. requires distinguishing mixed effects from fixed effects ...).
- Currently, our implementation is in Matlab only.
- Currently only offers FDR for multiple comparisons correction.
LinearMixedEffectsModels allow ROI analysis as well as advanced longitudinal analysis for cortical maps. Here we only discuss how to prepare your data for that analysis. The analysis itself is performed in matlab.
Similar to regular (cross sectional) processing, ROI stats data is contained in stats files (cf. the ROI tutorial). You could, e.g., open the stats text files in each tpN.long.templateID/stats/ dir, containing statistics such as volume of subcortical structures or thickness averages for cortical regions. These statistics can be fed into any external statistical packages to run whatever analysis you are interested in. Helpful commands to grab the data from all subjects and time points and create a single table are asegstats2table and aparcstats2table.
For example to create a table with subcoritical ROI's from all subjects and all time points you would run this :
asegstats2table --qdec-long long.qdec.table.dat --stats aseg.stats --tablefile aseg.table.txt
This will automatically grab the stats from the longitudinal directories (tpN.long.templateID/stats/) and create a table (rows: subject/time points, columns: structures). Similarly you can use aparcstats2table for surface ROI analysis.
To run LinearMixedEffectsModels on surface maps, you need to map all the data to a template (usually fsaverage) and smooth the data:
mris_preproc --qdec-long long.qdec.table.dat --target fsaverage --hemi lh --meas thickness --out lh.thickness.stack.mgh mri_surf2surf --hemi lh --s fsaverage --sval lh.thickness.stack.mgh --tval lh.thickness.stack.fwhm10.mgh --fwhm-trg 10 --cortex --noreshape
For details see: LinearMixedEffectsModels
Longitudinal QDEC Table
QDEC is a graphical program to perform simple statistical analysis of cross sectional data. A QDEC table is a simple table in text format that contains subject ID's (one subject per row) and different co-variables per column (e.g. age, gender, diagnosis, …). The first row contains a header, where the first column header is fsid and the other columns are named according to their content. It is described in the QDEC Group Analysis tutorial. Note that QDEC currently cannot perform longitudinal statistics directly!
For the analysis of longitudinal data ( LongitudinalStatistics ) several command line tools require a 'longitudinal QDEC table'. This table is based on the QDEC table format with an additional 2nd column fsid-base that groups and assigns several time point to their subject.
To get the longitudinal data ready for statistical analysis (LongitudinalTwoStageModel or LinearMixedEffectsModels) you need to create a table (space separated as a text file) in the following format:
fsid |
fsid-base |
years |
... |
OAS2_0001_MR1 |
OAS2_0001 |
0 |
|
OAS2_0001_MR2 |
OAS2_0001 |
1.25 |
|
OAS2_0004_MR1 |
OAS2_0004 |
0 |
|
OAS2_0004_MR2 |
OAS2_0004 |
1.47 |
|
... |
|
|
|
where the first column is called fsid (containing all time points of all subjects) and the second column is fsid-base containing the within-subject template (=base) name, to group time points within subject. You can have many more columns such as gender, age, group, etc. Make sure there is a column containing an accurate time variable (optimally measured in years if you are interested in annualized change) such as age or the duration of the study (time from inital scan). Here we use years to measure the time from baseline scan (=tp1). You can see in the table that the two subjects OAS2_0001 and OAS2_0004 each have two time points (MR1, MR2) that are not equally spaced (approx 15 and 18 months apart).
Note, the fsid column contains the original subject/time point id's, not the longitudinal names. The command-line scripts know that this is a longitudinal table (because of the parameter, usually --qdec-long and existing fsid-base column) and will process the data from the longitudinal directories automatically.
For example:
long_mris_slopes --qdec ./qdec/long.qdec.table.dat --meas thickness --hemi lh --do-avg --do-rate --do-pc1 --do-spc --do-stack --do-label --time years --qcache fsaverage
is a tool for the LongitudinalTwoStageModel and expects a longitudinal QDEC table (even though the flag is only --qdec). It will automatically grab the data from the longitudinal subN_tp1.long.subNtemplate directories and compute within subject atrophy rates etc.
Also:
mris_preproc --qdec-long qdec.table.dat --target study_average --hemi lh --meas thickness --out lh.thickness.mgh
is a tool usually used for mri_glmfit. It iterates over the subjects from the QDEC table, maps them to the study_average (usually fsaverage) and stacks them into a single file. In this example it takes a longitudinal QDEC table (--qdec-long) and then takes the data from the longitudinal directories, to map and stack them and get them ready for the LinearMixedEffectsModels (usually you would do a smoothing step with mri_surf2surf after the mris_preproc is finished, see the LME description).
