19 Third connection with machine learning
\[ \DeclarePairedDelimiters{\set}{\{}{\}} \DeclareMathOperator*{\argmax}{argmax} \]
In chapter 11 we made a second tentative connection between the notions of probability explored up to that point and notions from machine learning. We considered the possibility that a machine-learning algorithm is like an agent that has some built-in background information (corresponding to the algorithm’s architecture), has received pieces of information (corresponding to the data about perfectly known instances of the task, and possibly partial data about a new instance), and is assessing a piece of information not previously known (other partial aspects of a new task instance):
\[ \mathrm{P}(\underbracket[0ex]{\color[RGB]{238,102,119}\mathsfit{D}_{N+1}}_{\mathclap{\color[RGB]{238,102,119}\text{outcome?}}} \nonscript\:\vert\nonscript\:\mathopen{} \color[RGB]{34,136,51}\underbracket[0ex]{\mathsfit{D}_N \land \dotsb \land \mathsfit{D}_2 \land \mathsfit{D}_1}_{\mathclap{\color[RGB]{34,136,51}\text{training data?}}} \color[RGB]{0,0,0}\land \underbracket[0ex]{\color[RGB]{204,187,68}\mathsfit{I}}_{\mathrlap{\color[RGB]{204,187,68}\uparrow\ \text{architecture?}}}) \]
The correspondence for training data and architecture seems somewhat convincing; the one for the outcome needs more exploration.
Having introduced the notion of quantity in chapters 12 and 13, we recognize that “training data” are just quantities whose values the agent has learned. So a datum \(\mathsfit{D}_i\) can be expressed by a sentence like \(Z_i\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_i\), where
- \(i\) is the instance: \(1,2,\dotsc,N, N+1\).
- \(Z_i\), a quantity, describes the type of data at instance \(i\), for example “128 × 128 image with 24-bit colour depth, with a character label”.
- \(z_i\) is the value of the quantity \(Z_i\) at instance \(i\), for example one specific image together with its label.
We can therefore rewrite the correspondence above as follows:
\[ \mathrm{P}(\underbracket[0ex]{\color[RGB]{238,102,119}Z_{N+1} \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{N+1}}_{\mathclap{\color[RGB]{238,102,119}\text{outcome?}}} \nonscript\:\vert\nonscript\:\mathopen{} \color[RGB]{34,136,51}\underbracket[0ex]{ Z_N \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_N \mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}Z_2 \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_2 \mathbin{\mkern-0.5mu,\mkern-0.5mu}Z_1 \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_1}_{\mathclap{\color[RGB]{34,136,51}\text{training data?}}} \color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\underbracket[0ex]{\color[RGB]{204,187,68}\mathsfit{I}}_{\mathrlap{\color[RGB]{204,187,68}\uparrow\ \text{architecture?}}}) \]
This is the kind of inference that we explored in the “next-three-patients” scenario of § 17.4 and in some of the subsequent sections. In chapter 27, after a review of conventional machine-learning methods and terminology, we shall discuss with more care what these inferences are about, what kind of information they use, and how they can be concretely calculated.
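To make the correspondence a little more tangible, recall that a conditional probability of this kind can be written as a ratio of joint probabilities, provided the denominator is non-zero:

\[ \mathrm{P}(Z_{N+1} \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{N+1} \nonscript\:\vert\nonscript\:\mathopen{} Z_N \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_N \mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}Z_1 \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_1 \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}) = \frac{\mathrm{P}(Z_{N+1} \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{N+1} \mathbin{\mkern-0.5mu,\mkern-0.5mu}Z_N \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_N \mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}Z_1 \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_1 \nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})}{\mathrm{P}(Z_N \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_N \mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}Z_1 \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_1 \nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})} \]

The following minimal Python sketch applies this ratio to a toy case. Everything specific in it is an assumption made purely for illustration and is not taken from the text: the three binary quantities, the invented joint probabilities stored in `joint` (standing in for the background information \(\mathsfit{I}\)), and the helper name `predictive`. It is not meant to anticipate the calculation methods developed in later chapters.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint probabilities P(Z1=z1, Z2=z2, Z3=z3 | I) for three binary
# quantities. The weights are invented: each sequence gets a weight that
# depends only on how many 1s it contains.
weights = {0: 4, 1: 1, 2: 1, 3: 4}
sequences = list(product((0, 1), repeat=3))
total = sum(weights[sum(seq)] for seq in sequences)
joint = {seq: Fraction(weights[sum(seq)], total) for seq in sequences}


def predictive(joint, observed):
    """Return P(Z_{N+1}=z | Z_1..Z_N = observed, I) for every value z.

    `joint` maps full sequences (z1, z2, ...) to their joint probability.
    The result is the ratio of joint probabilities, with any quantities
    after Z_{N+1} summed out.
    """
    observed = tuple(observed)
    n = len(observed)
    numerators = {}
    for seq, prob in joint.items():
        # keep only sequences that agree with the observed values
        if n < len(seq) and seq[:n] == observed:
            numerators[seq[n]] = numerators.get(seq[n], 0) + prob
    # summing the numerators over z gives P(Z_1..Z_N = observed | I)
    denominator = sum(numerators.values())
    if denominator == 0:
        raise ValueError("observed values have zero probability, "
                         "or no quantity is left to predict")
    return {value: num / denominator for value, num in numerators.items()}


print(predictive(joint, ()))      # {0: Fraction(1, 2), 1: Fraction(1, 2)}
print(predictive(joint, (1, 1)))  # {0: Fraction(1, 5), 1: Fraction(4, 5)}
```

With these invented numbers, the probability that the next value is 1 rises from 1/2 before any observation to 4/5 after two observed 1s: a miniature picture of what "learning from training data" could mean under the correspondence above.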
In the last few sections we have often spoken about “instances”, “instances of similar quantities”, “task instances”, and similar expressions. What do we mean by “instance”, more exactly? It is time to make this and related notions more precise: the whole idea of “learning from examples” hinges on them. In the next few chapters we shall therefore make these ideas more rigorous and quantifiable. Statistics is the theory that deals with these ideas. As a bonus we shall find out that a rigorous analysis of the notion of “instances” also leads to concrete formulae for calculating the kind of probabilities discussed in the present chapter.