Split a minc apply job into batches and process it either locally or on a true grid computing setup. These functions aim to provide an abstract and customizable interface for job scheduling based on the batchtools package. The basic steps of the apply are to:
1. create a registry with qMincRegistry where jobs are coordinated and results are deposited
2. map a function over batches of voxels in a collection of minc volumes with qMincMap, generating submission scripts for the queue scheduler and submitting the jobs
3. collect the results from each batch with qMincReduce, reorganizing the voxel results as necessary to reproduce the original order, and then collating the results into a usable object
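The three steps above can be illustrated without a cluster. The following is a minimal base R sketch of the split, map, and reorder idea only; it is not RMINC's implementation, and the data, the mask, and the names `voxels`, `voxel_fun`, and `batches` are hypothetical stand-ins.

```r
# Hypothetical stand-in data: 10 "voxels" (rows) across 3 "volumes" (columns).
set.seed(1)
voxels <- matrix(rnorm(30), nrow = 10, ncol = 3)

# Only voxels inside the (hypothetical) mask are processed.
mask <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
masked_idx <- which(mask)

# 1. Split the masked voxel indices into batches. They are interleaved here
#    to show that batch order need not match voxel order.
n_batches <- 4
batch_id <- rep_len(seq_len(n_batches), length(masked_idx))
batches <- split(masked_idx, batch_id)

# 2. Map: apply the function to every voxel in each batch.
voxel_fun <- function(x) mean(x)
batch_results <- lapply(batches, function(idx) {
  apply(voxels[idx, , drop = FALSE], 1, voxel_fun)
})

# 3. Reduce: write each batch's results back to the original voxel positions,
#    leaving NA for voxels outside the mask.
result <- rep(NA_real_, nrow(voxels))
for (b in seq_along(batches)) {
  result[batches[[b]]] <- batch_results[[b]]
}
```

Keeping the voxel indices alongside each batch is what allows the final vector to be reassembled in the original order regardless of which batch finishes first.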
Interfaces are provided to perform all three steps at once, either through qMincApply or the more general pMincApply. By default qMincApply waits for the cluster to finish processing all jobs. Alternatively, the jobs can be submitted and the R session closed, and the results can still be retrieved once the jobs have finished.
qMincApply(filenames, fun, ..., mask = NULL, batches = 4, tinyMask = FALSE, slab_sizes = NULL, resources = list(), packages = c("RMINC"), registry_dir = getwd(), registry_name = "qMincApply_registry", temp_dir = getwd(), cores = 1, wait = TRUE, cleanup = TRUE, clobber = FALSE, collate = simplify2minc, conf_file = getOption("RMINC_BATCH_CONF"))

qMincRegistry(registry_name = "qMincApply_registry", packages = c("RMINC"), registry_dir = getwd(), clobber = FALSE, resources = list(), conf_file = getOption("RMINC_BATCH_CONF"))

qMincMap(registry, filenames, fun, ..., mask = NULL, slab_sizes = NULL, batches = 4, tinyMask = FALSE, temp_dir = getwd(), cores = 1)

qMincReduce(registry, ignore_incompletes = FALSE, wait = FALSE, collate = simplify2minc)
Argument | Description |
---|---|
filenames | Paths to the minc files to apply across |
fun | An arbitrary R function to be applied |
... | extra arguments to pass down through |
mask | The mask used to select voxels to apply to |
batches | The number of batches to divide the job into. Ignored for multicore jobs, where the number of batches is set to the number of cores. |
tinyMask | Shrink the mask for testing |
slab_sizes | A 3 element vector indicating how large a chunk of data to read from each minc file at a time; defaults to one slice along the first dimension. |
resources | A list of resources to request from the queueing system, common examples including memory, walltime, and nodes see |
packages | packages to be loaded for each job in a registry |
registry_dir | where batchtools should create the registry |
registry_name | a name for the registry |
temp_dir | A directory to store files needed for the parallelization and job management |
cores | the number of cores to parallelize across for each worker, defaults to 1 but higher numbers may be useful for batchtools multicore or systems like SciNet that do not allocate single core jobs. |
wait | Whether to wait for your results or return a registry object to be checked on later |
cleanup | Whether to empty the registry after a successful run; defaults to TRUE |
clobber | Whether to overwrite an existing registry at |
collate | A function to collate the returned list into another object type. |
conf_file | A batchtools config file, defaults to |
registry | a pre-existing job registry |
ignore_incompletes | Whether to reduce the results with |
If qMincApply is called with wait = TRUE, or if qMincReduce is called, the results are returned after collation with collate.

If qMincApply is called with wait = FALSE, or if qMincRegistry or qMincMap is called, a batchtools registry is returned that can be used to query job states, kill jobs, and collect results.
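The two return modes can be combined into a submit-now, collect-later pattern. The following is a hedged sketch based only on the signatures listed above: the wrapper names `submit_voxel_means` and `collect_voxel_means`, the inputs `files` and `mask_file`, and the choice of `mean` as the voxel function are all hypothetical, and the wrappers are defined but not called here because running them requires RMINC, minc files, and a configured batchtools backend.

```r
# Sketch only: `files` and `mask_file` are hypothetical inputs
# (paths to minc volumes and to a mask).
submit_voxel_means <- function(files, mask_file) {
  # With wait = FALSE, qMincApply returns the registry instead of results.
  RMINC::qMincApply(files, mean, mask = mask_file, batches = 4, wait = FALSE)
}

collect_voxel_means <- function(registry) {
  # qMincReduce waits for outstanding jobs when wait = TRUE, then returns
  # the results after collation with the default collate function.
  RMINC::qMincReduce(registry, wait = TRUE)
}
```

The registry returned by the first wrapper is what would be passed to the second once the jobs have been submitted.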
RMINC's batching facilities are inherited with little modification from the batchtools package, mostly just providing handy wrappers to handle registry creation, batching, submission, and reduction. The abstractions provided are very leaky, and it is worth learning about batchtools to handle more complex situations. Formerly one could set the parallelization method from this function; this option has been removed. Controlling how and where to execute the parallel jobs is now handled by the conf_file argument.
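As an illustration of selecting the backend through a configuration file, the sketch below writes a one-line batchtools config to a temporary location and registers it through the `RMINC_BATCH_CONF` option that `conf_file` reads by default. The file location, the choice of the multicore backend, and `ncpus = 2` are assumptions made for this example; a real cluster would more likely use a scheduler-specific backend and template as described in the batchtools documentation.

```r
# Hypothetical location for the config file.
conf_path <- file.path(tempdir(), "batchtools.conf.R")

# A batchtools config file is an R script; setting `cluster.functions`
# chooses where jobs run. The multicore backend is an assumed example.
writeLines(
  "cluster.functions <- batchtools::makeClusterFunctionsMulticore(ncpus = 2)",
  conf_path
)

# conf_file defaults to getOption("RMINC_BATCH_CONF"), so setting the option
# makes later calls pick up this file without passing conf_file explicitly.
options(RMINC_BATCH_CONF = conf_path)
```

The same path could instead be passed directly as `conf_file` to qMincApply or qMincRegistry.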
qMincRegistry: registry
qMincMap: map
qMincReduce: reduce