Monte Carlo computation of posterior probability distribution conditional on given data
Source: R/learn.R
Usage
learn(
data,
metadata,
auxdata = NULL,
outputdir = NULL,
nsamples = 3600,
nchains = 60,
nsamplesperchain = 60,
parallel = TRUE,
seed = NULL,
cleanup = TRUE,
appendtimestamp = TRUE,
appendinfo = TRUE,
output = "directory",
subsampledata = NULL,
startupMCiterations = 3600,
minMCiterations = 0,
maxMCiterations = +Inf,
maxhours = +Inf,
ncheckpoints = NULL,
relerror = 0.05,
prior = missing(data),
thinning = NULL,
plottraces = TRUE,
showKtraces = FALSE,
showAlphatraces = FALSE,
hyperparams = list(ncomponents = 64, minalpha = -4, maxalpha = 4, byalpha = 1,
Rshapelo = 0.5, Rshapehi = 0.5, Rvarm1 = 3^2, Cshapelo = 0.5, Cshapehi = 0.5,
Cvarm1 = 3^2, Dshapelo = 0.5, Dshapehi = 0.5, Dvarm1 = 3^2, Lshapelo = 0.5,
Lshapehi = 0.5, Lvarm1 = 3^2, Bshapelo = 1, Bshapehi = 1, Dthreshold = 1,
tscalefactor = 2)
)
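As a rough illustration of the signature above, the following sketch assembles a call that overrides a few defaults. The file names mydata.csv and mymetadata.csv are hypothetical, and so are the chosen values for parallel, seed, and maxhours. The run_it switch is an assumption of this sketch, not part of the function: with run_it left at FALSE the code only builds the argument list, so it can be read and run without the package installed.

```r
# Hypothetical file names: replace with real paths to your own csv files.
args <- list(
  data = "mydata.csv",
  metadata = "mymetadata.csv",
  nsamples = 3600,       # documented default
  nchains = 60,          # documented default
  parallel = 4,          # integer: create and use 4 parallel workers
  seed = 16,             # fix the random number generator seed
  maxhours = 1,          # approximate time limit, in hours
  output = "directory"   # return the output directory name
)

# Safety switch (an assumption of this sketch, not an argument of learn()):
# keep FALSE to only build the argument list.
run_it <- FALSE

# Run only if requested and if a function named learn() is available.
if (run_it && exists("learn", mode = "function")) {
  outdir <- do.call(learn, args)
  print(outdir)
}
```

Passing the arguments through a list and do.call() keeps the settings in one place, which can be convenient when the same configuration is reused for several datasets.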
Arguments
- data
A dataset, given as a data.frame or as a file path to a csv file.
- metadata
A metadata object, given either as a data.frame or as a file path to a csv file.
- auxdata
A larger dataset, given as a data.frame or as a file path to a csv file. Such a dataset would be too large to use in the Monte Carlo sampling, but can be used to calculate hyperparameters.
- outputdir
Character: path to the folder where the output should be saved. If omitted, a directory is created that has the same name as the data file but with the suffix "_output_".
- nsamples
Integer: number of desired Monte Carlo samples. Default 3600.
- nchains
Integer: number of Monte Carlo chains. Default 60.
- nsamplesperchain
Integer: number of Monte Carlo samples per chain. Default 60.
- parallel
Logical: use pre-existing parallel workers from package doParallel. Or integer: create and use that many parallel workers. Default TRUE.
- seed
Integer: use this seed for the random number generator. If missing or NULL (default), do not set the seed.
- cleanup
Logical: remove diagnostic files at the end of the computation? Default TRUE.
- appendtimestamp
Logical: append a timestamp to the name of the output directory outputdir? Default TRUE.
- appendinfo
Logical: append information about dataset and Monte Carlo parameters to the name of the output directory outputdir? Default TRUE.
- output
Character: if 'directory', return the output directory name as VALUE; if the string 'learnt', return the 'learnt' object containing the parameters obtained from the Monte Carlo computation. Any other value: VALUE is NULL.
- subsampledata
Integer: use only a subset of this many datapoints for the Monte Carlo computation.
- startupMCiterations
Integer: number of initial (burn-in) Monte Carlo iterations. Default 3600.
- minMCiterations
Integer: minimum number of Monte Carlo iterations to be done. Default 0.
- maxMCiterations
Integer: do at most this many Monte Carlo iterations. Default Inf.
- maxhours
Numeric: approximate time limit, in hours, for the Monte Carlo computation to last. Default Inf.
- ncheckpoints
Integer: number of datapoints to use for checking when the Monte Carlo computation should end. If NULL (default), this is equal to the number of variates + 2. If Inf, use all datapoints.
- relerror
Numeric: desired maximal relative error of calculated probabilities with respect to their variability with new data.
- prior
Logical: Calculate the prior distribution?
- thinning
Integer: thin out the Monte Carlo samples by this value. If NULL (default): let the diagnostics decide the thinning value.
- plottraces
Logical: save plots of the Monte Carlo traces of diagnostic values? Default TRUE.
- showKtraces
Logical: save plots of the Monte Carlo traces of the K parameter? Default FALSE.
- showAlphatraces
Logical: save plots of the Monte Carlo traces of the Alpha parameter? Default FALSE.
- hyperparams
List: hyperparameters of the prior.
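
The parallel argument described above can reuse pre-existing doParallel workers. The sketch below shows one common way to create and register such workers with the parallel and doParallel packages before a call with parallel = TRUE. It is an assumption of this sketch that workers registered this way are the ones the function picks up. The worker count nworkers is hypothetical, and the call to learn() is left as a comment with its data arguments elided.

```r
# Sketch: register doParallel workers ahead of a call with parallel = TRUE.
nworkers <- 2        # hypothetical worker count for this sketch
registered <- FALSE  # records whether registration succeeded

if (requireNamespace("doParallel", quietly = TRUE)) {
  # Create a cluster of worker processes and register it as the foreach backend.
  cl <- parallel::makeCluster(nworkers)
  doParallel::registerDoParallel(cl)

  # Check how many workers the registered backend reports.
  registered <- foreach::getDoParWorkers() == nworkers

  # A call such as the following would then be made here (arguments elided):
  # learn(data = ..., metadata = ..., parallel = TRUE)

  # Shut the workers down and fall back to sequential execution.
  parallel::stopCluster(cl)
  foreach::registerDoSEQ()
}
```

Passing an integer to parallel instead, as the argument description states, asks the function to create that many workers itself, which avoids the manual setup and teardown.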