19  Third connection with machine learning

Published 2023-11-16

In chapter 11 we made a second tentative connection between the notions about probability explored up to that point and notions from machine learning. We considered the possibility that a machine-learning algorithm is like an agent that has some built-in background information (corresponding to the algorithm’s architecture), has received pieces of information (corresponding to the data about perfectly known instances of the task, and possibly partial data about a new instance), and is assessing a piece of information not previously known (other, still unknown aspects of the new task instance):

\[ \mathrm{P}(\underbracket[0ex]{\color[RGB]{238,102,119}\mathsfit{D}_{N+1}}_{\mathclap{\color[RGB]{238,102,119}\text{outcome?}}} \nonscript\:\vert\nonscript\:\mathopen{} \color[RGB]{34,136,51}\underbracket[0ex]{\mathsfit{D}_N \land \dotsb \land \mathsfit{D}_2 \land \mathsfit{D}_1}_{\mathclap{\color[RGB]{34,136,51}\text{training data?}}} \color[RGB]{0,0,0}\land \underbracket[0ex]{\color[RGB]{204,187,68}\mathsfit{I}}_{\mathrlap{\color[RGB]{204,187,68}\uparrow\ \text{architecture?}}}) \]
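
By the rules for probabilities reviewed in the earlier chapters, such a probability is (whenever the denominator below is not zero) a ratio of probabilities conditional on the background information alone; this is only a reminder of how it is determined, not yet a practical recipe for computing it:

\[ \mathrm{P}(\mathsfit{D}_{N+1} \nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{D}_N \land \dotsb \land \mathsfit{D}_1 \land \mathsfit{I}) = \frac{\mathrm{P}(\mathsfit{D}_{N+1} \land \mathsfit{D}_N \land \dotsb \land \mathsfit{D}_1 \nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})}{\mathrm{P}(\mathsfit{D}_N \land \dotsb \land \mathsfit{D}_1 \nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})} \]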

The correspondence for training data and architecture seems somewhat convincing; the one for the outcome needs more exploration.

Having introduced the notion of quantity in chapters 12 and 13, we recognize that “training data” are nothing else but quantities with given values. So a datum \(\mathsfit{D}_i\) can be expressed by a sentence like \(Z_i\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_i\), where \(Z_i\) is a quantity and \(z_i\) one of its values; for instance a quantity called label with the value “Saitama”:

label = “Saitama”
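
As a small aside, in code such a datum is naturally represented as a quantity together with one of its values. A minimal sketch in Python, with names chosen purely for illustration:

```python
# The datum D_i, i.e. the sentence "Z_i = z_i", represented as a
# quantity-value pair (the quantity name and value are illustrative):
datum = ("label", "Saitama")   # (Z_i, z_i)

# Data about several perfectly known instances are then just a
# collection of such quantity=value pairs, one per instance:
training_data = [datum]        # extended with one pair per known instance
```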

We can therefore rewrite the correspondence above as follows:

\[ \mathrm{P}(\underbracket[0ex]{\color[RGB]{238,102,119}Z_{N+1} \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{N+1}}_{\mathclap{\color[RGB]{238,102,119}\text{outcome?}}} \nonscript\:\vert\nonscript\:\mathopen{} \color[RGB]{34,136,51}\underbracket[0ex]{ Z_N \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_N \mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}Z_2 \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_2 \mathbin{\mkern-0.5mu,\mkern-0.5mu}Z_1 \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_1}_{\mathclap{\color[RGB]{34,136,51}\text{training data?}}} \color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\underbracket[0ex]{\color[RGB]{204,187,68}\mathsfit{I}}_{\mathrlap{\color[RGB]{204,187,68}\uparrow\ \text{architecture?}}}) \]
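
To see what calculating such a probability can amount to, here is a minimal sketch in Python. Purely for illustration we assume that the background information \(\mathsfit{I}\) assigns a joint probability distribution to three binary quantities with values “healthy”/“ill”, echoing the “next-three-patients” scenario recalled below; all numbers are made up. The conditional probability is computed as a ratio of a joint probability to a marginal one:

```python
from itertools import product

# Toy sketch: the background information I is assumed, purely for
# illustration, to assign a joint probability distribution to three
# binary quantities Z1, Z2, Z3 (think "next three patients").
values = ('healthy', 'ill')

def joint(z):
    """P(Z1=z[0], Z2=z[1], Z3=z[2] | I): made-up numbers that depend
    only on how many of the three values are 'ill'."""
    prob_per_tuple = {0: 0.2, 1: 0.1, 2: 0.1, 3: 0.2}  # sums to 1 over all 8 tuples
    return prob_per_tuple[sum(v == 'ill' for v in z)]

def conditional(new_value, observed):
    """P(Z_{N+1}=new_value | Z_N=z_N, ..., Z_1=z_1, I), computed as a
    ratio of a joint probability to a marginal one, summing out the
    quantities beyond the (N+1)th."""
    n = len(observed)
    numerator = sum(joint(observed + (new_value,) + rest)
                    for rest in product(values, repeat=3 - n - 1))
    denominator = sum(joint(observed + rest)
                      for rest in product(values, repeat=3 - n))
    return numerator / denominator

print(conditional('ill', ()))              # P(Z1=ill | I)                  = 0.5
print(conditional('ill', ('ill',)))        # P(Z2=ill | Z1=ill, I)          ≈ 0.6
print(conditional('ill', ('ill', 'ill')))  # P(Z3=ill | Z1=ill, Z2=ill, I)  ≈ 0.667
```

With these made-up numbers the probability of “ill” for the next patient rises from 0.5 (no data) to 0.6 and then to about 0.67 as more “ill” cases are observed: exactly the kind of “learning from examples” behaviour we are trying to capture.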

This is the kind of inference that we explored in the “next-three-patients” scenario of § 17.3 and in some of the sections following it. In chapter 27, after a review of conventional machine-learning methods and terminology, we shall discuss with more care what these inferences are about, what kind of information they use, and how they can be concretely calculated.

But we have been speaking of “task instances” and “instances of quantities” quite vaguely so far. These are important notions: the whole idea of “learning from examples” hinges on them. In the next few chapters we shall therefore make them more rigorous. The theory that makes them rigorous is Statistics. As a bonus we shall find out that a rigorous analysis of the notion of “instances” also leads to concrete formulae for calculating the probabilities discussed above.