33 Code design
Before starting, let’s agree on some terminology in order not to get confused in the discussion below.
- We shall call task a repetitive inference problem with a specified set of units and variates. For instance, a task could be the consecutive prediction of the urgency of incoming patients, given their means of transportation. We assume that the details of the variates, such as their domain, are well specified. A set of data about other patients may also be available; we call these data “training data”.
- We shall call application or instance of the task a single inference about a specific new unit, for example a new incoming patient.
33.1 Range of use of the code
The concrete formulae discussed in chapter 32 can be put into code, for use in different tasks involving only nominal variates. Software of this kind can in principle be written to allow for some or all of the versatility discussed in §§ 27.3–27.4, for example the possibility of taking care (in a first-principled way!) of partially missing training data. But the more versatile we make the software, the more memory, processing power, and computation time it will require.
Roughly speaking, more versatility corresponds to calculations of the joint probability

\[ P({\color[RGB]{68,119,170}Z_{1}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z_{1}} \mathbin{\mkern-0.5mu,\mkern-0.5mu} {\color[RGB]{68,119,170}Z_{2}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z_{2}} \mathbin{\mkern-0.5mu,\mkern-0.5mu} \dotsb \nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I}_{\textrm{d}}) \]

for more values of the quantities \(\color[RGB]{68,119,170}Z_1, Z_2, \dotsc\). For instance, if data about unit #4 are missing, then we need to calculate the joint probability above for several (possibly all) values of \(\color[RGB]{68,119,170}Z_4\). If data about two units are missing, then we need to do an analogous calculation for all possible combinations of values; and so on.
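As a small illustration of this growth (the unit numbers, number of values, and the enumeration below are invented for the example, not part of the prototype), the combinations required for two units with missing data could be enumerated in R as follows:

```r
## Invented example: units 4 and 7 have missing data, and each missing entry
## can take one of three values; the joint probability must then be computed
## for every combination of those values.
values <- c("low", "medium", "high")
combinations <- expand.grid(Z4 = values, Z7 = values)
nrow(combinations)   # 9 = 3^2 combinations, growing exponentially with the missing units
```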
For our prototype, let’s forgo versatility about units used as training data. Recall that \({\color[RGB]{68,119,170}Z}\) denotes all (nominal) variates of the population. From now on we abbreviate the set of training data as

\[ \mathsfit{\color[RGB]{34,136,51}data}\coloneqq (\color[RGB]{34,136,51} Z_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{N} \land \dotsb \land Z_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_2 \land Z_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{1} \color[RGB]{0,0,0}) \]

where \(\color[RGB]{68,119,170}z_N, \dotsc, z_2, z_1\) are specific values, stored in some training dataset. No values are missing.
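Concretely, such training data could be stored as a plain table with one row per unit and one column per variate, and no missing entries; the variate names and values below are invented, chosen to match the three-variate example used later:

```r
## Invented complete training dataset: one row per unit, one column per variate
traindata <- data.frame(
  A = c("yes", "no", "yes", "no"),
  B = c("low", "low", "high", "high"),
  C = c("red", "blue", "red", "red")
)
```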
Since the training \(\mathsfit{\color[RGB]{34,136,51}data}\) are given and fixed in a task, we omit the suffix “\({}_{N+1}\)” that we have often used to indicate a “new” unit. So “\(\color[RGB]{68,119,170}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z\)” simply refers to the variate \({\color[RGB]{68,119,170}Z}\) in a new application of the task.
We allow for full versatility in every new instance. This means that we can specify, on the spot at each new instance, which variates are predictands and which (if any) are predictors. For example, if the population has three variates \({\color[RGB]{68,119,170}Z}=({\color[RGB]{68,119,170}A}\mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}B}\mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}C})\), our prototype can calculate, at each new application, inferences such as the following (all of which, as sketched after the list, come from one and the same joint distribution):
\(P({\color[RGB]{68,119,170}B}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso\nonscript\:\vert\nonscript\:\mathopen{}\mathsfit{\color[RGB]{34,136,51}data}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}_{\textrm{d}})\): any one predictand variate, no predictors
\(P({\color[RGB]{68,119,170}A}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}C}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso\nonscript\:\vert\nonscript\:\mathopen{}\mathsfit{\color[RGB]{34,136,51}data}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}_{\textrm{d}})\): any two predictand variates, no predictors
\(P({\color[RGB]{68,119,170}A}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}B}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}C}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso\nonscript\:\vert\nonscript\:\mathopen{}\mathsfit{\color[RGB]{34,136,51}data}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}_{\textrm{d}})\): all three variates
\(P({\color[RGB]{68,119,170}B}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso\nonscript\:\vert\nonscript\:\mathopen{}{\color[RGB]{68,119,170}A}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{\color[RGB]{34,136,51}data}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}_{\textrm{d}})\): any one predictand variate, any other one predictor
\(P({\color[RGB]{68,119,170}B}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso\nonscript\:\vert\nonscript\:\mathopen{} {\color[RGB]{68,119,170}A}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}C}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{\color[RGB]{34,136,51}data}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}_{\textrm{d}})\): any one predictand variate, any other two predictors
\(P({\color[RGB]{68,119,170}A}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}C}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso\nonscript\:\vert\nonscript\:\mathopen{}{\color[RGB]{68,119,170}B}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{\color[RGB]{34,136,51}data}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}_{\textrm{d}})\): any two predictand variates, any other one predictor
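Each of the inferences above is obtained from the joint distribution of \(({\color[RGB]{68,119,170}A}\mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}B}\mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}C})\) by the standard rules of marginalization and conditioning, spelled out here for clarity. For instance, the first inference in the list is just a marginalization over the two remaining variates:

\[ P({\color[RGB]{68,119,170}B}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}b\nonscript\:\vert\nonscript\:\mathopen{}\mathsfit{\color[RGB]{34,136,51}data}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}_{\textrm{d}}) = \sum_{a}\sum_{c} P({\color[RGB]{68,119,170}A}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}a\mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}B}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}b\mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}C}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}c\nonscript\:\vert\nonscript\:\mathopen{}\mathsfit{\color[RGB]{34,136,51}data}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}_{\textrm{d}}) \]

The cases with predictors are handled analogously, by ratios of such sums; § 33.2 below shows an explicit example.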
33.2 Code design and computations needed
To enjoy the versatility discussed above, the code needs to compute the joint probability \(P({\color[RGB]{68,119,170}Z}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{\color[RGB]{34,136,51}data}\nonscript\:\vert\nonscript\:\mathopen{}\mathsfit{I}_{\textrm{d}})\) of formula (33.2), for all values \({\color[RGB]{68,119,170}z}\). That formula is just a rewriting of formula (33.1) for \(L=N+1\), simplified by using the property of the factorial
\[(a+1)! = (a+1) \cdot a!\]
But the computation of formula (33.2) (for all values of \({\color[RGB]{68,119,170}z}\)) must be done only once for a given task. For a new application we only need to combine these already-computed probabilities via sums and fractions. For example, in the three-variate case above, if in a new application we need to forecast \(\color[RGB]{238,102,119}A\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}a\) given \(\color[RGB]{204,187,68}C\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}c\), then we calculate

\[ P({\color[RGB]{238,102,119}A}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}a\nonscript\:\vert\nonscript\:\mathopen{}{\color[RGB]{204,187,68}C}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}c\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{\color[RGB]{34,136,51}data}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}_{\textrm{d}}) = \frac{\sum_{b} P({\color[RGB]{238,102,119}A}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}a\mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}B}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}b\mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{204,187,68}C}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}c\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{\color[RGB]{34,136,51}data}\nonscript\:\vert\nonscript\:\mathopen{}\mathsfit{I}_{\textrm{d}})}{\sum_{a'}\sum_{b} P({\color[RGB]{238,102,119}A}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}a'\mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{68,119,170}B}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}b\mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{204,187,68}C}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}c\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{\color[RGB]{34,136,51}data}\nonscript\:\vert\nonscript\:\mathopen{}\mathsfit{I}_{\textrm{d}})} \]

where all \(P(\color[RGB]{238,102,119}A\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso \color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\color[RGB]{68,119,170}B\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso \color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\color[RGB]{204,187,68}C\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\dotso \color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{\color[RGB]{34,136,51}data}\nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I}_{\textrm{d}})\) are already computed.
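Here is a minimal R sketch of this kind of combination, assuming, purely for illustration, that the already-computed joint probabilities are stored in a three-dimensional array jointprobs with one dimension per variate; the array contents and variate values are placeholders:

```r
## Sketch: P(A = a | C = "c1", data, I_d) from precomputed joint probabilities,
## stored here in a placeholder array with one dimension per variate.
jointprobs <- array(
  runif(2 * 3 * 2),                     # placeholder numbers, not real probabilities
  dim = c(2, 3, 2),
  dimnames = list(A = c("a1", "a2"),
                  B = c("b1", "b2", "b3"),
                  C = c("c1", "c2"))
)

slice <- jointprobs[, , "c1"]           # fix C at its observed value: a matrix over (A, B)
numer <- rowSums(slice)                 # sum over B for each value of A
condprobs <- numer / sum(numer)         # normalize: divide by the sum over A as well
condprobs                               # P(A = a | C = "c1", data, I_d) for each value a
```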
Our prototype software must therefore include two main functions, which we can call as follows:
buildagent() (see code) - computes \(\color[RGB]{34,136,51}\#{\color[RGB]{68,119,170}z}\) for all values \({\color[RGB]{68,119,170}z}\), as well as the multiplicative factors
\[ \frac{ \bigl(2^{k} -1 \bigr)! }{ \bigl(2^{k} + N \bigr)! \cdot {\bigl(\frac{2^{k}}{M} - 1\bigr)!}^M } \]
for all \(k\), in (33.2). This computation is done once and for all in a given task, using the training \(\mathsfit{\color[RGB]{34,136,51}data}\) and the metadata \(\mathsfit{I}_{\textrm{d}}\) provided. The result can be stored in an array or similar object, which we shall call an agent-class object.

infer() - computes, at each new application, the probability of the specified predictand values given the specified predictor values (if any), by combining, through sums and fractions, the probabilities pre-computed by buildagent() and stored in the agent-class object, as in the example above.
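As an illustration of the counting step only (not the actual buildagent() code; the dataset is the invented one used earlier), the counts \(\color[RGB]{34,136,51}\#{\color[RGB]{68,119,170}z}\) for all value combinations can be obtained in R with a single call to table():

```r
## Counting step only (illustrative): #z for every combination of variate values
traindata <- data.frame(A = c("yes", "no", "yes", "no"),
                        B = c("low", "low", "high", "high"),
                        C = c("red", "blue", "red", "red"))
counts <- table(traindata)      # array with one dimension per variate, zero cells included
counts["yes", "low", "red"]     # number of training units with A="yes", B="low", C="red"
```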
We shall also include five additional functions for convenience:
guessmetadata() - builds a preliminary metadata file, encoding the background information \(\mathsfit{I}_{\textrm{d}}\), from some dataset.
decide() - makes a decision according to expected-utility maximization (chapter 3), using the probabilities calculated with infer() together with given utilities; a generic sketch of this maximization is shown after this list.
rF() - draws one or more possible full-population frequency distributions \(\boldsymbol{f}\), according to the updated degree of belief \(\mathrm{p}(F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{\color[RGB]{34,136,51}data}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}_{\textrm{d}})\).
plotFsamples1D() - plots, as a generalized scatter plot, the possible full-population marginal frequency distributions for a single (not joint) predictand variate. If required, it also plots the final probability obtained with infer().
mutualinfo() - calculates the mutual information (§ 18.5) between any two sets of variates.
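As an illustration of the kind of computation behind decide(), here is a generic expected-utility maximization in R; it is a sketch, not the actual course implementation, and the probabilities and utilities are invented numbers:

```r
## Generic expected-utility maximization (illustrative only)
probs <- c(urgent = 0.2, nonurgent = 0.8)            # P(value | data, I_d), invented numbers
utils <- rbind(                                       # utility[decision, value], invented numbers
  treat   = c(urgent = 10,  nonurgent = -2),
  dismiss = c(urgent = -50, nonurgent =  1)
)
expected <- utils %*% probs                           # expected utility of each decision
rownames(expected)[which.max(expected)]               # decision with maximal expected utility
```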
33.3 Code optimization
The formulae of chapter 32, if used as written, easily lead to two kinds of computational problems. First, they generate overflows and NaNs, owing to the factorials and their divisions. Second, the products over variates may involve so many terms as to require a long computation time. In the end we would have to wait a long time just to receive a string of NaNs.
The first problem is dealt with by rewriting the formulae in terms of logarithms, and by renormalizing numerators and denominators of fractions. See for example the lines defining auxalphas in the buildagent() function, and the line that redefines counts one last time in the infer() function.
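To illustrate the first remedy (a sketch under invented values of N, M, and the range of k, not the actual auxalphas code), the multiplicative factors of § 33.2 can be computed through lgamma(), since lgamma(x) equals log((x − 1)!), and then kept and renormalized in log-space:

```r
## Log-space computation of the multiplicative factors (illustrative sketch)
N  <- 1000     # number of training units (invented)
M  <- 24       # number of possible value combinations (invented)
ks <- 0:20     # range of k (invented)

## log of  (2^k - 1)! / [ (2^k + N)! * ((2^k/M - 1)!)^M ]  via lgamma(x) = log((x-1)!)
logfactors <- lgamma(2^ks) - lgamma(2^ks + N + 1) - M * lgamma(2^ks / M)

## renormalize so the largest factor becomes 1, avoiding under/overflow
## when the factors are later exponentiated and combined
factors <- exp(logfactors - max(logfactors))
```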
The second problem is dealt with by reorganizing the sums as multiples of identical summands; see the lines working with freqscounts in the buildagent() function.
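To illustrate the idea behind the second remedy (again a sketch, not the actual freqscounts code; the counts and the summand g below are invented), a sum with one term per value combination, where each term depends only on the count of that combination, can be reorganized as a sum over the distinct count values weighted by their multiplicities:

```r
## Grouping identical summands (illustrative sketch)
counts <- c(0, 0, 3, 0, 1, 0, 0, 1, 2, 0, 0, 0)   # invented counts #z, one per combination
g <- function(n) lgamma(n + 0.5)                   # invented summand depending only on the count

## naive: one term per value combination
naive <- sum(g(counts))

## grouped: one term per *distinct* count value, weighted by how often it occurs
multiplicities <- table(counts)
grouped <- sum(multiplicities * g(as.numeric(names(multiplicities))))

all.equal(naive, grouped)   # TRUE: same result with far fewer evaluations of g()
```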