29 Inferences from frequencies
29.1 If the population frequencies were known
Let’s now see how the exchangeability of an agent’s degrees of belief allows it to calculate probabilities about the units of a population. We shall do this calculation in two steps. First, in the case where the agent knows the joint frequency distribution (§§21.2, 21.3, 23.2) for the full population. Second, in the more general case where the agent lacks this population-frequency information.
When the full-population frequency distribution is known, the calculation of probabilities is very intuitive and analogous to the stereotypical “drawing balls from an urn”. We shall rely on this intuition; keep in mind, however, that the probabilities are not assigned “by intuition”, but are actually fully determined by two basic pieces of knowledge or assumptions: exchangeability and known population frequencies. Some simple proof sketches of this will also be given.
We consider an infinite population with any number of variates. For concreteness we assume these variates to have finite, discrete domains; but the formulae we obtain can easily be generalized to other kinds of variates. In this and the following chapters we shall often use the simplified income dataset (file income_data_nominal_nomissing.csv) and its underlying population as an example. This population has nine nominal variates; the variates, their domain sizes, and their possible values are listed at this link.
Notation recap
We shall mainly use the notation introduced in § 27.3:
All population variates, jointly, are denoted \({\color[RGB]{68,119,170}Z}\). In the case of the income dataset, for instance, the variate \({\color[RGB]{68,119,170}Z}\) stands for the joint variate with nine components:
\[ \begin{aligned} {\color[RGB]{68,119,170}Z}&\coloneqq(\color[RGB]{68,119,170} \mathit{workclass} \mathbin{\mkern-0.5mu,\mkern-0.5mu} \mathit{education} \mathbin{\mkern-0.5mu,\mkern-0.5mu} \mathit{marital\_status} \mathbin{\mkern-0.5mu,\mkern-0.5mu} \mathit{occupation} \mathbin{\mkern-0.5mu,\mkern-0.5mu} {}\\ &\qquad \color[RGB]{68,119,170}\mathit{relationship} \mathbin{\mkern-0.5mu,\mkern-0.5mu} \mathit{race} \mathbin{\mkern-0.5mu,\mkern-0.5mu} \mathit{sex} \mathbin{\mkern-0.5mu,\mkern-0.5mu} \mathit{native\_country} \mathbin{\mkern-0.5mu,\mkern-0.5mu} \mathit{income} \color[RGB]{0,0,0}) \end{aligned} \]
When we write \(\color[RGB]{68,119,170}Z \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z\), the symbol \(\color[RGB]{68,119,170}z\) stands for some definite joint values, for instance \(({\color[RGB]{68,119,170}{\small\verb;Without-pay;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;Doctorate;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;Ireland;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;>50K;}})\).
In applications where the agent wants to infer the values of some predictand variates, given the observation of predictor variates, the former are denoted \({\color[RGB]{68,119,170}Y}\), the latter \({\color[RGB]{68,119,170}X}\). In the income problem, for instance, the agent (some USA census agency) would like to infer the \(\color[RGB]{68,119,170}\mathit{income}\) variate of a person from the other eight demographic characteristics \(\color[RGB]{68,119,170}\mathit{workclass} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathit{education} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb\) of that person. So in this inference problem we define
\[ \begin{aligned} {\color[RGB]{68,119,170}Y}&\coloneqq{\color[RGB]{68,119,170}\mathit{income}} \\[1ex] {\color[RGB]{68,119,170}X}&\coloneqq({\color[RGB]{68,119,170}\mathit{workclass} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathit{education} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathit{sex} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathit{native\_country}}) \end{aligned} \]
We shall, however, also consider slightly different inference problems, for example with \(({\color[RGB]{68,119,170}\mathit{race} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathit{sex}})\) as predictand and the remaining seven variates \(({\color[RGB]{68,119,170}\mathit{workclass} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathit{income}})\) as predictors.
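As a concrete illustration – a minimal sketch in Python, not part of the original text, which assumes the pandas library and that the CSV columns are named exactly like the variates above – the dataset could be split into predictand and predictor variates like this:

```python
import pandas as pd

# Load the simplified income dataset (file name as given in the text;
# column names are assumed to match the variate names used above).
data = pd.read_csv("income_data_nominal_nomissing.csv")

# Income problem: Y := income, X := the remaining eight variates.
Y = data["income"]
X = data.drop(columns=["income"])

# A different inference problem: (race, sex) as predictand,
# the remaining seven variates as predictors.
Y_alt = data[["race", "sex"]]
X_alt = data.drop(columns=["race", "sex"])
```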
Often we shall use red for quantities that are not known in the problem, and green for quantities that are known.
29.2 Knowing the full-population frequency distribution
Now suppose that the agent knows the full-population joint frequency distribution. Let’s make clearer what this means. In the income problem, for instance, consider these two different joint values for the joint variate \({\color[RGB]{68,119,170}Z}\):
\[ \begin{aligned} {\color[RGB]{68,119,170}z^{*}}&\coloneqq( {\small\verb;Private;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;HS-grad;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;Married-civ-spouse;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;Machine-op-inspct;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{} \\ &\qquad{\small\verb;Husband;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;White;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;Male;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;United-States;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;<=50K;} ) \\[2ex] {\color[RGB]{68,119,170}z^{**}}&\coloneqq( {\small\verb;Self-emp-not-inc;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;HS-grad;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;Married-civ-spouse;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{} \\ &\qquad {\small\verb;Farming-fishing;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;Husband;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;White;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;Male;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;United-States;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\small\verb;<=50K;} ) \end{aligned} \]
The agent knows that the value \(\color[RGB]{68,119,170}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z^{*}\) occurs in the full population of interest (in this case all 340 million or so USA citizens, considered over a short period of time) with a relative frequency of \(0.860 369\%\); it also knows that the value \(\color[RGB]{68,119,170}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z^{**}\) occurs with a relative frequency of \(0.260 058\%\). We write this as follows:
\[ f({\color[RGB]{68,119,170}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z^{*}}) = 0.860 369\% \ , \qquad f({\color[RGB]{68,119,170}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z^{**}}) = 0.260 058\% \]
The agent knows the frequencies not only of the two particular joint values \(\color[RGB]{68,119,170}z^{*}\), \(\color[RGB]{68,119,170}z^{**}\), but of all possible joint values, that is, of all possible combinations of values of the individual variates. In the income example there are 54 001 920 possible combinations, and therefore just as many relative frequencies. All these frequencies together form the full-population frequency distribution for \({\color[RGB]{68,119,170}Z}\), which we denote collectively by \(\boldsymbol{f}\) (note the boldface). Let’s also introduce the quantity \(F\), which stands for the full-population frequency distribution, whatever it may be. Knowledge that the frequencies are \(\boldsymbol{f}\) is then expressed by the sentence \(F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\).
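As a quick check of this count – a small sketch in Python, not part of the original text; the domain sizes below are an assumption about the nine variates, chosen to match the figure quoted above – the number of possible joint values is just the product of the domain sizes:

```python
from math import prod

# Assumed domain sizes of the nine nominal variates
# (workclass, education, marital_status, occupation,
#  relationship, race, sex, native_country, income).
domain_sizes = [7, 16, 7, 14, 6, 5, 2, 41, 2]

n_joint_values = prod(domain_sizes)
print(n_joint_values)  # 54001920 possible joint values z,
                       # hence just as many frequencies f(Z=z)
```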
In other problems, these hypothetically known frequencies would likewise refer to the full population of units, possibly including past, present, and future units if the population spans an unlimited time range.
29.3 Inference about a single unit
Now imagine that the agent, given the information \(F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\) about the frequencies and some background information \(\mathsfit{I}\), must infer all \({\color[RGB]{68,119,170}Z}\) variates for a specific unit \(u\). In the income case, it would be an inference about a specific USA citizen. This unit \(u\) could have any particular combination of variate values; in the income case it could have any one of the 54 001 920 possible combined values. The agent must assign a probability to each of these possibilities.1 Which probability values should it assign?
Intuitively we would say that the probability for a particular value \(\color[RGB]{68,119,170}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z\) should be equal to the frequency of that value in the full population:

\[ \mathrm{P}({\color[RGB]{68,119,170}Z}_{u}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z} \nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}) = f({\color[RGB]{68,119,170}Z}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z}) \]
For instance, the probabilities that unit \(u\) has the values \(\color[RGB]{68,119,170}z^{*}\) or \(\color[RGB]{68,119,170}z^{**}\) above are
\[ \begin{aligned} &\mathrm{P}({\color[RGB]{68,119,170}Z}_{u}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z^{*}} \nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}) = f({\color[RGB]{68,119,170}Z}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z^{*}}) = 0.860 369\% \\[1ex] &\mathrm{P}({\color[RGB]{68,119,170}Z}_{u}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z^{**}} \nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}) = f({\color[RGB]{68,119,170}Z}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z^{**}}) = 0.260 058\% \end{aligned} \]
This intuition is the same as in drawing balls from a collection, where the balls may carry different sets of labels and we know the proportion of balls with each possible label set.
But the equality above can actually be proven mathematically in this specific case: it follows from the assumption of exchangeability. Let’s examine a very simple case to get an idea of how this proof works.
Exact calculation of the probabilities in a simple case
Suppose we have three rocks from our Mars-prospecting collection. They are marked #1, #2, #3. They look alike, but we know that two of them have haematite, so \(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\) for them, and one doesn’t, so \(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\) for that rock. This background information – let’s call it \(\mathsfit{K}_{\textsf{3}}\) – is a simple case of a finite population with three units and a binary variate \(R\). We know that the frequency distribution for this population is
\[f(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) = 2/3 \qquad f(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}) = 1/3\]
Our information \(F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\) about the frequencies corresponds to the following composite sentence:
\[ F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\ \Longleftrightarrow\ (R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}) \lor (R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) \lor (R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) \]
Given \(F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\), we know that \(F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\) is true: \(\mathrm{P}( F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}})=1\), which means
\[ \mathrm{P}\bigl[(R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}) \lor (R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) \lor (R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) \nonscript\:\big\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}, \mathsfit{K}_{\textsf{3}}\bigr] = 1 \]
Now use the or-rule, considering that the three or-ed sentences are mutually exclusive:
\[ \begin{aligned} 1&=\mathrm{P}\bigl[(R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}) \lor (R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) \lor (R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) \nonscript\:\big\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}\bigr] \\[2ex] &= \mathrm{P}(R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) +{} \\&\qquad \mathrm{P}(R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) +{} \\&\qquad \mathrm{P}(R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 
0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) \end{aligned} \]
According to our background information \(\mathsfit{K}_{\textsf{3}}\), our degrees of belief are exchangeable. This means that the three probabilities summed up above must all have the same value, because in each of them \({\color[RGB]{102,204,238}{\small\verb;Y;}}\) appears twice and \({\color[RGB]{204,187,68}{\small\verb;N;}}\) once. But if we are summing the same value thrice, and the sum is \(1\), then that value must be \(1/3\). Hence we find that
\[ \begin{aligned} &\mathrm{P}(R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) = 1/3 \\ &\mathrm{P}(R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) = 1/3 \\ &\mathrm{P}(R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu} R_{3}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) = 1/3 \\[1ex] &\text{all other probabilities are zero} \end{aligned} \]
Now let’s find the probability that a rock, say #1, has haematite (\({\color[RGB]{102,204,238}{\small\verb;Y;}}\)), given that we haven’t observed any other rocks: \(\mathrm{P}(R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}})\). This is a marginal probability (§ 16.1), so it’s given by the sum
\[ \begin{aligned} \mathrm{P}(R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) &= \sum_{i={\color[RGB]{102,204,238}{\small\verb;Y;}}}^{{\color[RGB]{204,187,68}{\small\verb;N;}}}\sum_{j={\color[RGB]{102,204,238}{\small\verb;Y;}}}^{{\color[RGB]{204,187,68}{\small\verb;N;}}} \mathrm{P}(R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_2\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}i \mathbin{\mkern-0.5mu,\mkern-0.5mu}R_3\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}j \nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) \\[1ex] &= \mathrm{P}(R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_2\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_3\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) + {} \\ &\qquad \mathrm{P}(R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_2\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_3\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) + {} \\ &\qquad \mathrm{P}(R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_2\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_3\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) + {} \\ &\qquad \mathrm{P}(R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_2\mathclose{}\mathord{\nonscript\mkern 
0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_3\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) \\[1ex] &= 0 + 1/3 + 1/3 + 0 \\[1ex] &= 2/3 \end{aligned} \]
which is indeed equal to \(f(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}})\).
This simple example gives you an idea of why our intuition for equating – in specific circumstances – probability with full-population frequency is actually a mathematical theorem: it follows from (1) knowledge of the full-population frequencies and (2) exchangeability.
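The same calculation can be checked by brute force – a small Python sketch, not part of the original text: list all possible assignments of values to the three rocks, give equal probability (as exchangeability demands) to the three assignments compatible with \(F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\) and zero to the rest, then marginalize:

```python
from itertools import product

# All 2^3 possible assignments of values to (R1, R2, R3).
assignments = list(product("YN", repeat=3))

# Only assignments with two 'Y' and one 'N' are compatible with F=f.
compatible = [a for a in assignments if a.count("Y") == 2]

# Exchangeability: the compatible assignments all get the same probability.
prob = {a: (1 / len(compatible) if a in compatible else 0.0)
        for a in assignments}

# Marginal probability that rock #1 has haematite: sum over R2 and R3.
p_r1_yes = sum(p for a, p in prob.items() if a[0] == "Y")
print(p_r1_yes)  # 0.666... = 2/3, equal to f(R=Y)
```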
29.4 Inference about several units
Let’s continue with the Mars-prospecting example of the previous section, with just three rocks. We found that the probability that rock #1 has haematite (\({\color[RGB]{102,204,238}{\small\verb;Y;}}\)) was \(2/3\), given that we hadn’t observed any other rocks. This probability was equal to the frequency of \({\color[RGB]{102,204,238}{\small\verb;Y;}}\)-rocks in the collection.
Now suppose that we observe rock #2, and it turns out to have haematite (\(R_2\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\)). What is the probability that rock #1 has haematite?
The probability we are asking about is \(\mathrm{P}(R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}})\), and it can be calculated with the usual rules. The result is again the frequency of \({\color[RGB]{102,204,238}{\small\verb;Y;}}\)-rocks, but now with respect to the new situation: two unobserved rocks remain in front of us, one of which must contain haematite while the other doesn’t. The probability is therefore \(1/2\), a value different from the \(2/3\) we found before:
\[ \mathrm{P}(R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) = 2/3 \qquad \mathrm{P}(R_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} R_{2}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}_{\textsf{3}}) = 1/2 \]
This situation is quite general: in a collection of many rocks, the probabilities for new observations change according to information about previous observations (and also about subsequent ones, if already known).
But consider now the case \(\mathsfit{K}\) of a large collection of 3 000 000 rocks, 2 000 000 of which have haematite while the rest don’t.2 The population’s relative frequencies are exactly as in the case with three rocks, and for the probability that rock #1 contains haematite we still have
2 Note how this scenario becomes very similar to that of coin tosses.
\[ \mathrm{P}(R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}) = f(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) = \frac{2 000 000}{3 000 000} = 2/3 \]
Now suppose we examine rock #2 and find haematite. What is the probability that rock #1 also contains haematite? In this case we find
\[ \mathrm{P}(R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} R_2\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}) = \frac{1 999 999}{2 999 999} \approx 2/3 \]
with an absolute error of only about \(0.000 000 1\). That is, the probability and frequency are almost the same as before examining rock #2. The reason is clear: the number of rocks is so large that observing a few of them practically doesn’t change the composition and proportions of the whole collection.
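Numerically – a Python sketch, not part of the original text – the updated probability is just the frequency of haematite rocks among those that remain unobserved, and the contrast between the three-rock collection and the large one is immediate:

```python
def p_r1_given_r2_yes(n_yes, n_total):
    """P(R1=Y | R2=Y, F=f): frequency of 'Y' among the remaining rocks."""
    return (n_yes - 1) / (n_total - 1)

# Three-rock collection: 2 rocks with haematite out of 3.
print(2 / 3)                        # P(R1=Y | F=f)        = 0.667
print(p_r1_given_r2_yes(2, 3))      # P(R1=Y | R2=Y, F=f)  = 0.5

# Large collection: 2 000 000 rocks with haematite out of 3 000 000.
print(2_000_000 / 3_000_000)                    # = 0.6666667 (2/3)
print(p_r1_given_r2_yes(2_000_000, 3_000_000))  # = 1999999/2999999 ≈ 0.6666666
```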
The joint probability that rock #2 contains haematite and rock #1 also does is therefore, by the and-rule,
\[ \begin{aligned} \mathrm{P}(R_2\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}) &= \mathrm{P}(R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} R_2\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}) \cdot \mathrm{P}(R_2\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}) \\[1ex] &\approx f(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) \cdot f(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) \end{aligned} \]
the approximation being the better, the larger the collection of rocks.
It is easy to see that this will hold for more observations, and for different and more complex variate domains, as long as the number of units considered is small enough compared with the population size. For instance
\[ \mathrm{P}(R_4\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_3\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_2\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}\mathbin{\mkern-0.5mu,\mkern-0.5mu}R_1\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{K}) \approx f(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) \cdot f(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}) \cdot f(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{204,187,68}{\small\verb;N;}}) \cdot f(R\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;Y;}}) \]
where \(\boldsymbol{f}\) is the initial frequency distribution for the population.
This situation applies to more general populations: if the full-population frequencies are known, the agent’s beliefs are exchangeable, and the population is practically infinite, then the joint probability that some units have a particular set of values is equal to the product of the frequencies of those values.
The formula above solves our initial problem – how to calculate and encode the joint probability distribution for the full population? – although it does so only in the case where the full-population frequencies \(\boldsymbol{f}\) are known. In this case the probability distribution is encoded in \(\boldsymbol{f}\) itself (which can be represented as a multidimensional array), and any desired joint probability can be calculated from it just by multiplication.
In the income example from § 29.2, the probability that two units (citizens) #\(a\), #\(c\) have value \(\color[RGB]{68,119,170}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z^{**}\) and one unit #\(b\) has value \(\color[RGB]{68,119,170}Z \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z^{*}\) is
\[ \begin{aligned} \mathrm{P}( {\color[RGB]{68,119,170}Z}_{a}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z^{**}} \mathbin{\mkern-0.5mu,\mkern-0.5mu} {\color[RGB]{68,119,170}Z}_{b}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z^{*}} \mathbin{\mkern-0.5mu,\mkern-0.5mu} {\color[RGB]{68,119,170}Z}_{c}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z^{**}} \nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}) &\approx f({\color[RGB]{68,119,170}Z}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z^{**}}) \cdot f({\color[RGB]{68,119,170}Z}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z^{*}}) \cdot f({\color[RGB]{68,119,170}Z}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{68,119,170}z^{**}}) \\[1ex] &= 0.260 058\% \cdot 0.860 369\% \cdot 0.260 058\% \\ &= 0.000 005 818 7\% \end{aligned} \]
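The arithmetic in this example can be reproduced directly – a Python sketch, not part of the original text; the dictionary below contains just the two frequencies quoted in § 29.2, written as fractions rather than percentages:

```python
# Full-population frequencies quoted in § 29.2 (as fractions, not %).
f = {
    "z*":  0.00860369,   # f(Z = z*)
    "z**": 0.00260058,   # f(Z = z**)
}

# Joint probability that units a and c have value z** and unit b has value z*,
# using the product-of-frequencies approximation for a practically infinite population.
p = f["z**"] * f["z*"] * f["z**"]
print(f"{100 * p:.10f} %")  # 0.0000058187 %, as in the text
```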
29.5 No learning when full-population frequencies are known
Imagine an agent with exchangeable beliefs \(\mathsfit{I}\) and knowledge \(F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\) of the full-population frequencies, who also has observed several units with values (possibly some identical) \(\color[RGB]{68,119,170}Z_{u'}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z' \mathbin{\mkern-0.5mu,\mkern-0.5mu}Z_{u''}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'' \mathbin{\mkern-0.5mu,\mkern-0.5mu}Z_{u'''}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z''' \mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb\). What is this agent’s degree of belief that a new unit #\(u\) has value \(\color[RGB]{68,119,170}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z\)?
From our basic formula for this question,
\[ \begin{aligned} &\mathrm{P}(\color[RGB]{238,102,119} Z_{u}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z \color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{}\color[RGB]{34,136,51} Z_{u'}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z' \mathbin{\mkern-0.5mu,\mkern-0.5mu} Z_{u''}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'' \mathbin{\mkern-0.5mu,\mkern-0.5mu} Z_{u'''}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z''' \mathbin{\mkern-0.5mu,\mkern-0.5mu} \dotsb \color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}) \\[2ex] &\qquad{}=\frac{ \mathrm{P}(\color[RGB]{238,102,119} Z_{u}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z \color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu} \color[RGB]{34,136,51}Z_{u'}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z' \mathbin{\mkern-0.5mu,\mkern-0.5mu} Z_{u''}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'' \mathbin{\mkern-0.5mu,\mkern-0.5mu} Z_{u'''}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z''' \mathbin{\mkern-0.5mu,\mkern-0.5mu} \dotsb \color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}) }{ \sum_{\color[RGB]{170,51,119}z} \mathrm{P}( \color[RGB]{238,102,119}Z_{u}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\color[RGB]{170,51,119}z \color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu} \color[RGB]{34,136,51}Z_{u'}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z' \mathbin{\mkern-0.5mu,\mkern-0.5mu} Z_{u''}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'' \mathbin{\mkern-0.5mu,\mkern-0.5mu} Z_{u'''}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z''' \mathbin{\mkern-0.5mu,\mkern-0.5mu} \dotsb \color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}) } \\[2ex] &\qquad{}\approx\frac{ f({\color[RGB]{238,102,119}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z}) \cdot f({\color[RGB]{68,119,170}{\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'}}) \cdot f({\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z''}) \cdot f({\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'''}) \cdot \dotsb }{ \sum_{\color[RGB]{170,51,119}z} f({\color[RGB]{238,102,119}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\color[RGB]{170,51,119}z}) \cdot f({\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'}) \cdot f({\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 
0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z''}) \cdot f({\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'''}) \cdot \dotsb } \\[2ex] &\qquad{}=\frac{ f({\color[RGB]{238,102,119}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z}) \cdot \cancel{f({\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'})} \cdot \cancel{f({\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z''})} \cdot \cancel{f({\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'''})} \cdot \cancel{\dotsb} }{ \underbracket[0.2ex]{\sum_{\color[RGB]{170,51,119}z} f({\color[RGB]{238,102,119}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\color[RGB]{170,51,119}z})}_{{}=1} \cdot \cancel{f({\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'})} \cdot \cancel{f({\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z''})} \cdot \cancel{f({\color[RGB]{34,136,51}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z'''})} \cdot \cancel{\dotsb} } \\[2ex] &\qquad{}= f({\color[RGB]{238,102,119}Z\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z}) \\[3ex] &\qquad{}\equiv \mathrm{P}(\color[RGB]{238,102,119}Z_{u}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z \color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{} F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I}) \end{aligned} \]
so the information from the units \(u'\), \(u''\), and so on is irrelevant to this agent. In other words, this agent’s inferences about some units are not affected by the observation of other units.
The reason for this irrelevance is that the agent already knows the full-population frequencies, so observing the values of some units gives it no new information about those frequencies.
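This cancellation can also be checked numerically – a small Python sketch, not part of the original text, using the product-of-frequencies approximation and a made-up frequency distribution: whatever values have been observed on other units, the conditional probability for the new unit comes out equal to the plain frequency.

```python
from math import prod

# A made-up full-population frequency distribution over three values.
f = {"a": 0.5, "b": 0.3, "c": 0.2}

# Values observed on some other units (their content is irrelevant to the result).
observed = ["b", "a", "c", "c", "b"]

def p_new_unit(z, observed, f):
    """P(Z_u=z | observed units, F=f), via the product-of-frequencies formula."""
    joint = {zz: f[zz] * prod(f[v] for v in observed) for zz in f}
    return joint[z] / sum(joint.values())

for z in f:
    print(z, round(p_new_unit(z, observed, f), 10), f[z])
    # the conditional probability equals f(Z=z): no learning from the observations
```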
Obviously this is not what we desire. But it is not a problem: the crucial point is that knowledge of full-population frequencies is only a hypothetical, idealized situation. In the next chapter we shall see that learning occurs when we go beyond this idealization.