## RMINC

RMINC is a package for the R statistical environment designed to read and write MINC volumes as well as perform optimized statistical routines at every voxel of a set of files. Supports getting and writing volumes, running voxel-wise linear models, correlations, correcting for multiple comparisons with False Discovery Rate control, and more.

## Installation

The easiest way to get up and running is by using devtools:

devtools::install_github("Mouse-Imaging-Centre/RMINC"
, upgrade_dependencies = FALSE)

or use the most recent development version

devtools::install_github("Mouse-Imaging-Centre/RMINC@develop"
, upgrade_dependencies = FALSE)

see the INSTALL file for more details. Particularly you will need the c library libminc, available through the minc-toolkit.

## Getting Help

The best place for function specific help is the manual, which is best accessed through ?/help interfaces from within R, or via the reference tab of this page. If you notice any bugs, or anything that can be improved in RMINC, we’re pretty responsive on our issues page.

## Walk Throughs

These are quite old, so buyer beware. The tutorials are much more up to date. I highly recommend Analyzing Anatomy with Hierarchical Atlases instead of the ROI based analysis, and How RMINC Parallel Works instead of the parallel examples.

### ROI based anatomical analyses

This section presents a simplified ROI based analysis of neuroanatomy.

Before you start it is import to make sure:

1. the label file you use accurately segments out your final non-linear model otherwise your analysis will be meaningless
2. you use the Jacobian determinants that capture the absolute differences (in our case the log-determinant-scaled )
3. you use little/no blurring, for instance for files with a 56 micron resolution use 100 micron blur (in our case the log-determinant-scaled-fwhm0.1 )

To analyze individual structures in the brain, you will need two things: a set of registered images and an atlas that segments the final average of your registration pipeline into classified labels. This is known as a classified atlas. (Information about how you can align a segmented set of labels to your data set can be found here: Atlas to Atlas Segmentation) The main idea is the following: Through the registration pipeline we have information about the change in volume of the voxels for each of the individual mouse brains. This is captured in the Jacobian determinant files. Using the classified atlas, we can integrate the information captured in the Jacobian determinants to find out the volume of the structures for each of the individual input files.

The Jacobian determinant fields are fairly noisy if you use them unprocessed. For this reason we blur them before we do analysis on them. The amount of blurring you want to use depends on what kind of changes you want to capture. When looking at neuroanatomy it is important to use only a small amount of blurring, because you want to avoid smoothing out information around borders of structures. The mouse brains we generally look at have a resolution of 56 micron, and we do structural analysis on Jacobian determinants that have been blurred using a 100 micron kernel.

A second thing to keep in mind is that the registration pipeline produces two types of Jacobian determinants. One that reflects relative changes (overall scaling or brain size differences have been taking out) and absolute changes which still contain the brain size differences. When analyzing neuroanatomical structures you want to use the determinants that reflect the absolute changes. Currently the naming convention for those files is as follows:

FILENAME_BASE-log-determinant-scaled-fwhm_KERNEL_.mnc

Here is an example of how to do the analysis in R

library(RMINC)

# There is a variable that was loaded into R containing the genotype and the location of the Jacobian determinants called filenames, it contains the following:

filenames

 scaled_jacobians                              genotype
1 01_wt-log-determinant-scaled-fwhm0.1.mnc           wt
2 02_wt-log-determinant-scaled-fwhm0.1.mnc           wt
3 03_wt-log-determinant-scaled-fwhm0.1.mnc           wt
4 04_wt-log-determinant-scaled-fwhm0.1.mnc           wt
5 05_mut-log-determinant-scaled-fwhm0.1.mnc         mut
6 06_mut-log-determinant-scaled-fwhm0.1.mnc         mut
7 07_mut-log-determinant-scaled-fwhm0.1.mnc         mut
8 08_mut-log-determinant-scaled-fwhm0.1.mnc         mut

# The first thing to do now is to get the volume information of all the structures, the atlas with the labels is called resampled_atlas.mnc

volumes <- anatGetAll(filenames$scaled_jacobians, "resampled_atlas.mnc") # This atlas contains information about both left and right structures, but in this case we will combine the information of left and right structures and only look at combined volumes: volumes_combined <- anatCombineStructures(volumes) # Run a linear model to get information about f-statistics and t-statistics on the structures. The first argument is the formula to be used, in this case we want to look at differences between the genotype, the second argument is the data which is stored in the variable filenames, and lastly the actual volume information anatLm(~ genotype, filenames, volumes_combined) # The output will be a list of all the structures with a number of columns for the values of the f-statistics and t-statistics. Next to find out which of these changes survive multiple comparison corrections: anatFDR(anatLm(~ genotype, filenames, volumes_combined)) N: 20 P: 2 Beginning vertex loop: 62 3 Done with vertex loop Computing FDR threshold for all columns Computing threshold for F-statistic Computing threshold for (Intercept) Computing threshold for genotypewt Multidimensional MINC volume Columns: F-statistic (Intercept) genotypewt NULL Degrees of Freedom: c(1, 18) 18 18 FDR Thresholds: F-statistic (Intercept) genotypewt 0.01 NaN 25.36429 NaN 0.05 20.80325 25.36429 6.172145 0.1 17.94643 25.36429 4.542356 0.15 14.99036 25.36429 3.871739 0.2 14.99036 25.36429 3.871739 There were 11 warnings (use warnings() to see them) Which will give a table indicating what is significant at which FDR thresholds. In this case, all structures that have a t-statistic of at least 6.172145 or at most -6.172145 are significant at a 5% FDR threshold, all structures that have a t-statistic of at least 4.542356 or at most -4.542356 are significant at a 10% FDR threshold, etc. There are no structures significantly different at a 1% FDR threshold as indicated by the NaN. Analyzing differences in neuroanatomy with multiple sets of labels In addition to the procedure described above (which uses a single atlas and jacobian determinants for each individual brain) we can generate a unique set of labels for each individual brain. This can be done using MAGeT. Once you have a set of labels for each brain, analysis can proceed as described above, but with one major difference: the anatGetAll command must be called with a different set of arguments. Let’s look at this function more closely: anatGetAll function (filenames, atlas, method = "jacobians", defs = "/projects/mice/jlerch/cortex-label/c57_brain_atlas_labels.csv", dropLabels = FALSE, side = "both") The default values for method, defs, dropLabels and side are set for users at MICe who do 56 micron ex-vivo registrations. In these cases, the anatGetAll call is as shown above: volumes <- anatGetAll(filenames$scaled_jacobians, "resampled_atlas.mnc")

For labels generated using MAGeT, you would need the following:

• filenames: Instead of the scaled_jacobian files, you would instead include the labels for each brain generated by MAGeT.
• atlas: This argument is NOT USED when each file has its own set of labels. Unfortunately, you still need to specify something or the function will fail. An example of what to specify is the labels for the first file – filenames$labels[1] • method: method = “labels” must be specified • defs: This is critical if the label definitions for the files you are looking at differ from the standard set of 62, described in Dorr, et. al. This will also need to be specified for users who wish use the set described in Dorr et.al, but are not using the machines at MICe. • dropLabels and side can continue to use the defaults. So, putting it all together, here is what your anatGetAll call and following analysis would look like: #anatGetAll call, with slightly different arguments volumes <- anatGetAll(filenames$labels, filenames$labels[1], method="labels", defs="brain_label_mappings.csv") #combining structures, anatLm, anatFDR proceed as above: volumes_combined <- anatCombineStructures(volumes, defs="brain_label_mappings.csv") anatLm(~ genotype, filenames, volumes_combined) anatFDR(anatLm(~ genotype, filenames, volumes_combined)) ### Doing analyses in parallel These instructions are largely superseded by the parallel tutorial. pMincApply, just as mincApply, can be used to run any function on all voxels in your input files. The difference with mincApply, is that pMincApply can be parallelized (hence the p). You can use snowfall to run it locally using multiple cores on your machine, or sge to submit jobs to a batch system. Here is a short example of how to use pMincApply to run a function on your input files that is not implemented by any of the standard MINC functions. # load the RMINC library library(RMINC) # load the mapping of your input files and in this case genotype # the aim of this example is to test the variance of the Jacobian determinants between wild types and mutants gf <- read.csv("filenames_and_genotype.csv") # Figure out how your function works on your data. For example test # it on a single voxel: voxel <- mincGetVoxel(gf$jacobians, 0,0,0)

# Run the function you want to use
# the "car" library provides "leveneTest"
library(car)
leveneTest(voxel, gf$Strain, center=mean) Levene's Test for Homogeneity of Variance (center = mean) Df F value Pr(>F) group 2 1.5514 0.2188 74  # in this example we will extract the F value, and the # p value which can be done as so: leveneTest(voxel, gf$Strain, center=mean)[1,2]
leveneTest(voxel, gf$Strain, center=mean)[1,3] # also it's smart to encapsulate the return value by # as.numeric to make sure it's returned as a number # write a function that will be passed on to pMincApply. This # function will directly refer to variables in the gf object # this function will return the F-value and p value from the Levene's Test # which comes from row 1, column 2 (as a numeric value) and column 3 leveneTestForRMINC <- function(x, gf) { fout <- as.numeric(leveneTest(x, gf$Strain, center=mean)[1,2])
pout <- as.numeric(leveneTest(x, gf$Strain, center=mean)[1,3]) return(c(fout, pout)) } ### Local Parallel Run parallel on your local machine. it can be useful to set the environment variable MINC_CACHE_MB=1 or some other small number to save RAM outLeveneTest <- pMincApply(gf$jacobians
, leveneTestForRMINC
, gf = gf
, local = TRUE
, batches = 4
, global=c("gf","leveneTestForRMINC")
, packages="car")

# write your stats out to file
mincWriteVolume(outLeveneTest, "f_values.mnc", 1)
mincWriteVolume(outLeveneTest, "p_values.mnc", 2)

### Cluster Parallel

Instead of running the pMincApply in parallel on your machine you can also use a cluster. Clusters are configured and jobs are managed by the excellent batchtools package. See the parallel tutorial above for more details.

outLeveneTest <-
pMincApply(gf\$jacobians
, leveneTestForRMINC
, gf = gf
, packages="car")