Monte Carlo computation of posterior probability distribution conditional on given data

Usage

learn(
  data,
  metadata,
  auxdata = NULL,
  outputdir = NULL,
  nsamples = 3600,
  nchains = 60,
  nsamplesperchain = 60,
  parallel = NULL,
  seed = NULL,
  cleanup = TRUE,
  appendtimestamp = TRUE,
  appendinfo = TRUE,
  output = "directory",
  subsampledata = NULL,
  startupMCiterations = 3600,
  minMCiterations = 0,
  maxMCiterations = +Inf,
  maxhours = +Inf,
  ncheckpoints = NULL,
  relerror = 0.05,
  prior = missing(data) || is.null(data),
  thinning = NULL,
  plottraces = TRUE,
  showKtraces = FALSE,
  showAlphatraces = FALSE,
  hyperparams = list(
    ncomponents = 64,
    minalpha = -4, maxalpha = 4, byalpha = 1,
    Rshapelo = 0.5, Rshapehi = 0.5, Rvarm1 = 3^2,
    Cshapelo = 0.5, Cshapehi = 0.5, Cvarm1 = 3^2,
    Dshapelo = 0.5, Dshapehi = 0.5, Dvarm1 = 3^2,
    Bshapelo = 1, Bshapehi = 1,
    Dthreshold = 1,
    tscalefactor = 4.266,
    avoidzeroW = FALSE,
    initmethod = "precluster"
  )
)
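
The call below is a minimal sketch of how the signature above might be used, not an example taken from the package. The file names "mydata.csv" and "mymetadata.csv" are hypothetical placeholders, and the values chosen for parallel, seed, and maxhours are arbitrary. The call is guarded so that the snippet also runs in a session where learn() has not been loaded.

```r
# Arguments for a hypothetical run; all other arguments keep their defaults.
# The two file names are placeholders, not files shipped with any package.
args <- list(
  data = "mydata.csv",          # dataset, given as a path to a csv file
  metadata = "mymetadata.csv",  # metadata, given as a path to a csv file
  parallel = 4,                 # integer: use this many cores
  seed = 16,                    # set the random number generator seed
  maxhours = 2,                 # approximate time limit, in hours
  output = "directory"          # return the output directory name
)

# Call learn() only if a function with that name is available in the session.
if (exists("learn", mode = "function")) {
  outdir <- do.call(learn, args)
  print(outdir)
} else {
  message("learn() not found; load the package that provides it first.")
}
```

Collecting the arguments in a list and passing them with do.call() keeps the configuration in one place, so it can be saved or reused across runs.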

Arguments

data: A dataset, given as a data.frame or as a file path to a csv file.
metadata: A metadata object, given either as a data.frame or as a file path to a csv file.
auxdata: A larger dataset, given as a data.frame or as a file path to a csv file. Such a dataset would be too large to use in the Monte Carlo sampling, but can be used to calculate hyperparameters.
outputdir: Character: path to folder where the output should be saved. If omitted, a directory is created that has the same name as the data file but with suffix "_output_".
nsamples: Integer: number of desired Monte Carlo samples. Default 3600.
nchains: Integer: number of Monte Carlo chains. Default 60.
nsamplesperchain: Integer: number of Monte Carlo samples per chain. Default 60.
parallel: Logical, NULL, or positive integer. TRUE: use roughly half of the available cores; FALSE: use serial computation; NULL: change nothing (use the pre-registered condition); integer: use this many cores. Default NULL.
seed: Integer: use this seed for the random number generator. If missing or NULL (default), do not set the seed.
cleanup: Logical: remove diagnostic files at the end of the computation? Default TRUE.
appendtimestamp: Logical: append a timestamp to the name of the output directory outputdir? Default TRUE.
appendinfo: Logical: append information about dataset and Monte Carlo parameters to the name of the output directory outputdir? Default TRUE.
output: Character: if "directory", return the name of the output directory; if "learnt", return the 'learnt' object containing the parameters obtained from the Monte Carlo computation; for any other value, return NULL. Default "directory".
subsampledata: Integer: use only a subset of this many datapoints for the Monte Carlo computation.
startupMCiterations: Integer: number of initial (burn-in) Monte Carlo iterations. Default 3600.
minMCiterations: Integer: minimum number of Monte Carlo iterations to be done. Default 0.
maxMCiterations: Integer: maximum number of Monte Carlo iterations to be done. Default Inf.
maxhours: Numeric: approximate time limit, in hours, for the Monte Carlo computation to last. Default Inf.
ncheckpoints: Integer: number of datapoints to use for checking when the Monte Carlo computation should end. If NULL (default), this is equal to number of variates + 2. If Inf, use all datapoints.
relerror: Numeric: desired maximal relative error of calculated probabilities with respect to their variability with new data. Default 0.05.
prior: Logical: calculate the prior distribution? Defaults to TRUE when data is missing or NULL, and to FALSE otherwise.
thinning: Integer: thin out the Monte Carlo samples by this value. If NULL (default): let the diagnostics decide the thinning value.
plottraces: Logical: save plots of the Monte Carlo traces of diagnostic values? Default TRUE.
showKtraces: Logical: save plots of the Monte Carlo traces of the K parameter? Default FALSE.
showAlphatraces: Logical: save plots of the Monte Carlo traces of the Alpha parameter? Default FALSE.
hyperparams: List: hyperparameters of the prior.
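
The four accepted forms of the parallel argument can be illustrated with a small helper. This is a sketch of the documented semantics only: resolve_ncores() is a hypothetical function, not part of the documented API, and it makes no claim about how learn() handles the argument internally. It assumes that "roughly half of available cores" can be approximated by integer division of parallel::detectCores() by 2.

```r
# Hypothetical helper: map a 'parallel' value to a number of cores.
# Returns NULL when nothing should be changed.
resolve_ncores <- function(parallel = NULL) {
  if (is.null(parallel)) {
    # NULL: leave any pre-registered setup untouched
    return(NULL)
  }
  if (isTRUE(parallel)) {
    # TRUE: roughly half of the available cores, but at least one
    available <- parallel::detectCores()
    if (is.na(available)) available <- 1L
    return(max(1L, available %/% 2L))
  }
  if (isFALSE(parallel)) {
    # FALSE: serial computation
    return(1L)
  }
  if (is.numeric(parallel) && length(parallel) == 1 &&
      is.finite(parallel) && parallel >= 1 && parallel == round(parallel)) {
    # positive integer: use this many cores
    return(as.integer(parallel))
  }
  stop("'parallel' must be NULL, TRUE, FALSE, or a positive integer")
}

resolve_ncores(FALSE)  # serial: one core
resolve_ncores(3)      # exactly three cores
```

Validating the argument up front in this way gives an early, readable error for values such as character strings or negative numbers.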

Value

The name of the directory containing the output files, the 'learnt' object, or NULL, depending on the argument output.
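
Because the returned value can take three forms, calling code may want to check which one it received. The helper below is a minimal sketch: describe_learn_value() is a hypothetical name, and the check assumes that a directory name is a single character string while a 'learnt' object is anything else that is not NULL. The internal structure of the 'learnt' object is not described here, so the helper does not inspect it.

```r
# Hypothetical helper: classify the value returned by learn().
describe_learn_value <- function(value) {
  if (is.null(value)) {
    "none"        # 'output' was neither "directory" nor "learnt"
  } else if (is.character(value) && length(value) == 1) {
    "directory"   # name of the directory with the output files
  } else {
    "learnt"      # assumed: any other non-NULL value is the 'learnt' object
  }
}

describe_learn_value(NULL)
describe_learn_value("mydata_output_")  # placeholder directory name
```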