Monte Carlo computation of posterior probability distribution conditional on given data

Usage

learn(
  data,
  metadata,
  auxdata = NULL,
  outputdir = NULL,
  nsamples = 3600,
  nchains = 60,
  nsamplesperchain = 60,
  parallel = TRUE,
  seed = NULL,
  cleanup = TRUE,
  appendtimestamp = TRUE,
  appendinfo = TRUE,
  output = "directory",
  subsampledata = NULL,
  startupMCiterations = 3600,
  minMCiterations = 0,
  maxMCiterations = +Inf,
  maxhours = +Inf,
  ncheckpoints = NULL,
  relerror = 0.05,
  prior = missing(data),
  thinning = NULL,
  plottraces = TRUE,
  showKtraces = FALSE,
  showAlphatraces = FALSE,
  hyperparams = list(ncomponents = 64, minalpha = -4, maxalpha = 4, byalpha = 1,
    Rshapelo = 0.5, Rshapehi = 0.5, Rvarm1 = 3^2,
    Cshapelo = 0.5, Cshapehi = 0.5, Cvarm1 = 3^2,
    Dshapelo = 0.5, Dshapehi = 0.5, Dvarm1 = 3^2,
    Lshapelo = 0.5, Lshapehi = 0.5, Lvarm1 = 3^2,
    Bshapelo = 1, Bshapehi = 1, Dthreshold = 1, tscalefactor = 2)
)

Arguments

data

A dataset, given as a data.frame or as a file path to a csv file.

metadata

A metadata object, given either as a data.frame or as a file path to a csv file.

auxdata

A larger dataset, given as a data.frame or as a file path to a csv file. Such a dataset would be too large to use in the Monte Carlo sampling, but can be used to calculate hyperparameters.

outputdir

Character: path to folder where the output should be saved. If omitted, a directory is created that has the same name as the data file but with suffix "_output_".

nsamples

Integer: number of desired Monte Carlo samples. Default 3600.

nchains

Integer: number of Monte Carlo chains. Default 60.

nsamplesperchain

Integer: number of Monte Carlo samples per chain. Default 60.
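
The defaults listed under Usage satisfy nsamples = nchains * nsamplesperchain (3600 = 60 * 60). The sketch below only checks that arithmetic and works out one alternative split. Whether learn() requires or enforces this relation is an assumption, as this page does not say so, and the `_alt` names are hypothetical.

```r
# Default values from the Usage section
nsamples <- 3600
nchains <- 60
nsamplesperchain <- 60

# The defaults are mutually consistent
stopifnot(nchains * nsamplesperchain == nsamples)

# Hypothetical alternative: same total number of samples, fewer chains
nchains_alt <- 8
nsamplesperchain_alt <- nsamples / nchains_alt
print(nsamplesperchain_alt)  # 450
```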

parallel

Logical or integer. If logical: use pre-existing parallel workers from package doParallel. If integer: create and use that many parallel workers. Default TRUE.
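
As a minimal sketch of the "pre-existing workers" case, the helper below creates a cluster, registers it with doParallel, and then calls learn() with parallel = TRUE. The helper name `run_with_registered_workers` and its `nworkers` argument are hypothetical. It also assumes that workers registered through `doParallel::registerDoParallel()` are what learn() picks up, and that the package providing learn() is attached.

```r
# Hypothetical helper: register doParallel workers, then let learn() reuse them.
run_with_registered_workers <- function(data, metadata, nworkers = 2, ...) {
  # Create the workers and make sure they are stopped when the function exits
  cl <- parallel::makeCluster(nworkers)
  on.exit(parallel::stopCluster(cl), add = TRUE)

  # Register the workers as the parallel backend
  doParallel::registerDoParallel(cl)

  # parallel = TRUE: use the workers registered above (assumed behaviour)
  learn(data = data, metadata = metadata, parallel = TRUE, ...)
}
```

Passing an integer instead, for example parallel = 4, leaves the creation of the workers to learn() itself, as described above.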

seed

Integer: use this seed for the random number generator. If missing or NULL (default), do not set the seed.

cleanup

Logical: remove diagnostic files at the end of the computation? Default TRUE.

appendtimestamp

Logical: append a timestamp to the name of the output directory outputdir? Default TRUE.

appendinfo

Logical: append information about dataset and Monte Carlo parameters to the name of the output directory outputdir? Default TRUE.

output

Character: if "directory" (default), return the name of the output directory; if "learnt", return the learnt object containing the parameters obtained from the Monte Carlo computation. For any other value, return NULL.

subsampledata

Integer: use only a subset of this many datapoints for the Monte Carlo computation.

startupMCiterations

Integer: number of initial (burn-in) Monte Carlo iterations. Default 3600.

minMCiterations

Integer: minimum number of Monte Carlo iterations to be done. Default 0.

maxMCiterations

Integer: maximum number of Monte Carlo iterations to be done. Default Inf.

maxhours

Numeric: approximate time limit, in hours, for the Monte Carlo computation to last. Default Inf.

ncheckpoints

Integer: number of datapoints to use for checking when the Monte Carlo computation should end. If NULL (default), this is equal to the number of variates + 2. If Inf, use all datapoints.

relerror

Numeric: desired maximal relative error of calculated probabilities with respect to their variability with new data. Default 0.05.

prior

Logical: calculate the prior distribution? Default: TRUE if data is missing, FALSE otherwise.

thinning

Integer: thin out the Monte Carlo samples by this value. If NULL (default): let the diagnostics decide the thinning value.

plottraces

Logical: save plots of the Monte Carlo traces of diagnostic values? Default TRUE.

showKtraces

Logical: save plots of the Monte Carlo traces of the K parameter? Default FALSE.

showAlphatraces

Logical: save plots of the Monte Carlo traces of the Alpha parameter? Default FALSE.

hyperparams

List: hyperparameters of the prior.

Value

Name of directory containing output files, or learnt object, or NULL, depending on argument output.
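
A call using the arguments documented above might look like the following sketch. The file names and output folder are hypothetical placeholders, and the non-default values are illustrative choices rather than recommendations. The call is guarded so that it runs only when learn() is available (that is, the package providing it is attached) and the input files exist.

```r
# Hypothetical file names; replace with your own files
datafile <- "mydata.csv"
metadatafile <- "mymetadata.csv"

# Argument names are taken from the Usage section
args <- list(
  data = datafile,
  metadata = metadatafile,
  outputdir = "myanalysis",  # hypothetical output folder
  seed = 16,                 # set the random number generator seed
  maxhours = 1,              # approximate time limit of one hour
  output = "directory"       # return the name of the output directory
)

# Run only if learn() is available and both input files exist
if (exists("learn", mode = "function") &&
    all(file.exists(datafile, metadatafile))) {
  outdir <- do.call(learn, args)
  print(outdir)
}
```

Setting output = "learnt" in the list would instead return the learnt object, according to the description of the output argument.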