22 Subpopulations and conditional frequencies

Published

2024-09-28

22.1 Subpopulations

When we have a statistical population with a joint variate, it is often of interest to focus on a subset of units that share the same value of a particular variate.

Consider for instance the following population, related to the glass-forensics example we encountered before:

units: glass fragments (collected at specific locations)
variate: the joint variate \((\mathit{Ca}, \mathit{Si}, \mathit{Type})\) consisting of three simple variates:
- weight fraction of \(\mathit{Ca}\)lcium in the fragment (ordinal variate), with three possible values \(\set{{\small\verb;low;}, {\small\verb;medium;}, {\small\verb;high;}}\)
- weight fraction of \(\mathit{Si}\)licon in the fragment (ordinal variate), with three possible values \(\set{{\small\verb;low;}, {\small\verb;medium;}, {\small\verb;high;}}\)
- \(\mathit{Type}\) of glass fragment (nominal variate), with five possible values \(\{{\small\verb;building_windows_float_processed;}\), \({\small\verb;building_windows_non_float_processed;}\), \({\small\verb;containers;}\), \({\small\verb;tableware;}\), \({\small\verb;headlamps;}\}\)

Table 22.1: Simplified glass-fragment population data

unit	\(\mathit{Ca}\)	\(\mathit{Si}\)	\(\mathit{Type}\)
1	\({\small\verb;low;}\)	\({\small\verb;high;}\)	\({\small\verb;headlamps;}\)
2	\({\small\verb;low;}\)	\({\small\verb;medium;}\)	\({\small\verb;building_windows_non_float_processed;}\)
3	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;tableware;}\)
4	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;building_windows_float_processed;}\)
5	\({\small\verb;low;}\)	\({\small\verb;medium;}\)	\({\small\verb;headlamps;}\)
6	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;containers;}\)
7	\({\small\verb;low;}\)	\({\small\verb;medium;}\)	\({\small\verb;building_windows_non_float_processed;}\)
8	\({\small\verb;low;}\)	\({\small\verb;high;}\)	\({\small\verb;tableware;}\)
9	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;tableware;}\)
10	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;tableware;}\)

Let’s say we are interested only in units that have the \(\mathit{Type}\) variate equal to \({\small\verb;tableware;}\). Discarding all others we obtain a new, smaller population with four units:

Table 22.2: Selection according to \(\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;}\)

unit	\(\mathit{Ca}\)	\(\mathit{Si}\)	\(\boldsymbol{\nonscript\:\vert\nonscript\:\mathopen{}}\mathit{Type}\)
3	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;tableware;}\)
8	\({\small\verb;low;}\)	\({\small\verb;high;}\)	\({\small\verb;tableware;}\)
9	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;tableware;}\)
10	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;tableware;}\)

where a bar “\(\boldsymbol{\nonscript\:\vert\nonscript\:\mathopen{}}\)” indicates the variate used for the selection.

As another example, we could be interested instead in those units that have both \(\mathit{Ca}\) and \(\mathit{Si}\) variates equal to \({\small\verb;medium;}\). We obtain a smaller population with five units:

Table 22.3: Selection according to \(\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;medium;}\) and \(\mathit{Si}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;medium;}\)

unit	\(\boldsymbol{\nonscript\:\vert\nonscript\:\mathopen{}}\mathit{Ca}\)	\(\boldsymbol{\nonscript\:\vert\nonscript\:\mathopen{}}\mathit{Si}\)	\(\mathit{Type}\)
3	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;tableware;}\)
4	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;building_windows_float_processed;}\)
6	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;containers;}\)
9	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;tableware;}\)
10	\({\small\verb;medium;}\)	\({\small\verb;medium;}\)	\({\small\verb;tableware;}\)

Populations formed in this way are called subpopulations of the original one. They are statistical populations in their own right. The notion of “subpopulation” is a relative one: a population can often be regarded as a subpopulation of some larger population having additional variates.
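The two selections above can also be reproduced programmatically. The following is a minimal sketch in Python, assuming the population of table 22.1 is stored as a dictionary from unit number to variate values; the names `population` and `subpopulation` are illustrative choices, not notation used in the text.

```python
# Population of table 22.1: unit number -> values of the joint variate (Ca, Si, Type).
population = {
    1: {"Ca": "low", "Si": "high", "Type": "headlamps"},
    2: {"Ca": "low", "Si": "medium", "Type": "building_windows_non_float_processed"},
    3: {"Ca": "medium", "Si": "medium", "Type": "tableware"},
    4: {"Ca": "medium", "Si": "medium", "Type": "building_windows_float_processed"},
    5: {"Ca": "low", "Si": "medium", "Type": "headlamps"},
    6: {"Ca": "medium", "Si": "medium", "Type": "containers"},
    7: {"Ca": "low", "Si": "medium", "Type": "building_windows_non_float_processed"},
    8: {"Ca": "low", "Si": "high", "Type": "tableware"},
    9: {"Ca": "medium", "Si": "medium", "Type": "tableware"},
    10: {"Ca": "medium", "Si": "medium", "Type": "tableware"},
}


def subpopulation(pop, **selection):
    """Keep only the units whose variates equal all the selected values."""
    return {
        unit: values
        for unit, values in pop.items()
        if all(values[variate] == value for variate, value in selection.items())
    }


tableware = subpopulation(population, Type="tableware")
print(sorted(tableware))  # units of table 22.2: [3, 8, 9, 10]

medium_medium = subpopulation(population, Ca="medium", Si="medium")
print(sorted(medium_medium))  # units of table 22.3: [3, 4, 6, 9, 10]
```

The result of `subpopulation` has the same structure as `population`, which mirrors the fact that a subpopulation is a statistical population in its own right and can itself be selected from again.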

Exercise

From the population of table 22.1:
- Construct the marginal population with variate \(\mathit{Ca}\)
- Report the frequency distribution for the marginal population above (remember that \(\mathit{Ca}\) has three possible values)
- Construct the subpopulation with variate \(\mathit{Si}\) equal to \({\small\verb;high;}\)
- Construct the subpopulation with variate \(\mathit{Type}\) equal to \({\small\verb;headlamps;}\) and the variate \(\mathit{Si}\) equal to \({\small\verb;medium;}\)

Check your understanding of the reasoning behind the notions of marginal population and subpopulation with this exercise:

From the population of table 22.1, construct the subpopulation with variate \(\mathit{Type}\) equal to either \({\small\verb;headlamps;}\) or \({\small\verb;tableware;}\).

22.2 Conditional frequencies

Given a statistical population with joint variates \({\color[RGB]{34,136,51}X}, {\color[RGB]{238,102,119}Y}\) (and possibly others), we define the conditional frequency of the value \({\color[RGB]{238,102,119}y}\) of \({\color[RGB]{238,102,119}Y}\), given or “conditional on” the value \({\color[RGB]{34,136,51}x}\) of \({\color[RGB]{34,136,51}X}\), as the frequency of the value \({\color[RGB]{238,102,119}y}\) in the subpopulation selected by \({\color[RGB]{34,136,51}X\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x}\). This frequency is usually written

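\[ f({\color[RGB]{238,102,119}Y\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y} \nonscript\:\vert\nonscript\:\mathopen{} {\color[RGB]{34,136,51}X\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x}) \]

where \(f\) is the symbol for the joint frequency of the population.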

Consider for instance the glass-fragment population of table 22.1. The conditional frequency of \({\color[RGB]{238,102,119}\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;}}\) given \({\color[RGB]{34,136,51}\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;}}\) is the (marginal) frequency of \(\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;}\) in the subpopulation of table 22.2, from which we find

\[ f({\color[RGB]{238,102,119}\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;}} \nonscript\:\vert\nonscript\:\mathopen{} {\color[RGB]{34,136,51}\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;}}) = \frac{1}{4} \]

The collection of these conditional frequencies for all values of \({\color[RGB]{238,102,119}Y}\) constitutes the conditional frequency distribution of \({\color[RGB]{238,102,119}Y}\) conditional on \({\color[RGB]{34,136,51}X\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x}\). In our example this distribution has three conditional frequencies:

\[\begin{aligned} &f({\color[RGB]{238,102,119}\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;}} \nonscript\:\vert\nonscript\:\mathopen{} {\color[RGB]{34,136,51}\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;}}) = \frac{1}{4} \\ &f({\color[RGB]{238,102,119}\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;medium;}} \nonscript\:\vert\nonscript\:\mathopen{} {\color[RGB]{34,136,51}\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;}}) = \frac{3}{4} \\ &f({\color[RGB]{238,102,119}\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;high;}} \nonscript\:\vert\nonscript\:\mathopen{} {\color[RGB]{34,136,51}\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;}}) = 0 \end{aligned}\]

which sum up to \(1\) as they should.
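The same distribution can be obtained by counting. Here is a minimal sketch in Python, assuming the rows of table 22.1 are typed in as tuples; the variable names are illustrative, and exact fractions are used so that the values can be compared directly with those above.

```python
from collections import Counter
from fractions import Fraction

# Table 22.1 as (Ca, Si, Type) tuples, in unit order 1..10.
units = [
    ("low", "high", "headlamps"),
    ("low", "medium", "building_windows_non_float_processed"),
    ("medium", "medium", "tableware"),
    ("medium", "medium", "building_windows_float_processed"),
    ("low", "medium", "headlamps"),
    ("medium", "medium", "containers"),
    ("low", "medium", "building_windows_non_float_processed"),
    ("low", "high", "tableware"),
    ("medium", "medium", "tableware"),
    ("medium", "medium", "tableware"),
]
CA_VALUES = ("low", "medium", "high")

# Ca values in the subpopulation selected by Type = tableware (table 22.2).
selected = [ca for ca, si, glass_type in units if glass_type == "tableware"]
counts = Counter(selected)

# Conditional frequency of every possible Ca value, including those that never occur.
cond_freq = {value: Fraction(counts[value], len(selected)) for value in CA_VALUES}

for value, freq in cond_freq.items():
    print(value, freq)  # low 1/4, medium 3/4, high 0

print(sum(cond_freq.values()))  # 1
```

Iterating over `CA_VALUES` rather than over the observed values keeps the zero frequency of `high` in the distribution.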

Conditional on a value of a variate

It doesn’t make sense to speak of the conditional frequency distribution of \(Y\) “conditional on \(X\)”. Conditional frequencies and frequency distributions are always conditional on some value of a variate. If we consider all possible values of \(Y\) and of \(X\) we obtain a collection of frequencies that is not a distribution: the frequencies sum to \(1\) separately for each value of \(X\), so the whole collection generally does not.

A conditional frequency can be calculated as the ratio of a joint frequency to a marginal frequency, in a way analogous to conditional probabilities (§ 17.1):

\[ f({\color[RGB]{238,102,119}Y\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y} \nonscript\:\vert\nonscript\:\mathopen{} {\color[RGB]{34,136,51}X\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x}) = \frac{f({\color[RGB]{238,102,119}Y\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{34,136,51}X\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x})}{f({\color[RGB]{34,136,51}X\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x})} = \frac{f({\color[RGB]{238,102,119}Y\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{34,136,51}X\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x})}{ \sum_{\color[RGB]{238,102,119}y} f({\color[RGB]{238,102,119}Y\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{34,136,51}X\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x})} \]
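As a check, this formula can be applied to the earlier example by counting units directly in table 22.1. One unit out of ten (unit 8) has both \(\mathit{Ca}\) equal to \({\small\verb;low;}\) and \(\mathit{Type}\) equal to \({\small\verb;tableware;}\), and four units out of ten have \(\mathit{Type}\) equal to \({\small\verb;tableware;}\), so

\[\begin{aligned} &f({\color[RGB]{238,102,119}\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;}} \mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{34,136,51}\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;}}) = \frac{1}{10} \\ &f({\color[RGB]{34,136,51}\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;}}) = \frac{4}{10} \\ &f({\color[RGB]{238,102,119}\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;}} \nonscript\:\vert\nonscript\:\mathopen{} {\color[RGB]{34,136,51}\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;}}) = \frac{1/10}{4/10} = \frac{1}{4} \end{aligned}\]

in agreement with the value found from the subpopulation of table 22.2.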

Exercises

Calculate the conditional frequency distributions corresponding to the subpopulations of tables 22.2 and 22.3. For example, for table 22.2 this means calculating

\[\begin{aligned} &f(\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathit{Si}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;} \nonscript\:\vert\nonscript\:\mathopen{} \mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;})\ , \\ &f(\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathit{Si}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;medium;} \nonscript\:\vert\nonscript\:\mathopen{} \mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;})\ , \\ &\dotsc\ , \\ &f(\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;high;} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathit{Si}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;high;} \nonscript\:\vert\nonscript\:\mathopen{} \mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;}) \end{aligned}\]

22.3 Associations

The analysis of subpopulations and conditional frequencies is important because it often reveals peculiar associations¹ among different variates and groups of variates. Let’s illustrate what we mean by “association” with an example.

¹ In everyday language this is the same as “correlation”. The term “association” is used in statistics to avoid confusion with the Pearson correlation coefficient (see § 18.5)

If we extract the subpopulation having variate \(\mathit{Type}\) equal to \({\small\verb;headlamps;}\) from the population of table 22.1,

Table 22.4: Selection according to \(\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;headlamps;}\)

unit	\(\mathit{Ca}\)	\(\mathit{Si}\)	\(\boldsymbol{\nonscript\:\vert\nonscript\:\mathopen{}}\mathit{Type}\)
1	\({\small\verb;low;}\)	\({\small\verb;high;}\)	\({\small\verb;headlamps;}\)
5	\({\small\verb;low;}\)	\({\small\verb;medium;}\)	\({\small\verb;headlamps;}\)

we notice that all units have variate \(\mathit{Ca}\) equal to \({\small\verb;low;}\). In terms of conditional frequencies, this means

\[ \begin{aligned} &f(\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;} \nonscript\:\vert\nonscript\:\mathopen{} \mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;headlamps;}) = 1 \\ &f(\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;medium;} \nonscript\:\vert\nonscript\:\mathopen{} \mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;headlamps;}) = 0 \\ &f(\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;high;} \nonscript\:\vert\nonscript\:\mathopen{} \mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;headlamps;}) = 0 \end{aligned} \]

It is therefore impossible to observe other values of \(\mathit{Ca}\) in this new population.²

² We are not claiming that this fact will be true if new units are considered; this important question will be discussed later.

On the other hand, if we extract the subpopulation having variate \(\mathit{Ca}\) equal to \({\small\verb;low;}\) we obtain

Table 22.5: Selection according to \(\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;}\)

unit	\(\boldsymbol{\nonscript\:\vert\nonscript\:\mathopen{}}\mathit{Ca}\)	\(\mathit{Si}\)	\(\mathit{Type}\)
1	\({\small\verb;low;}\)	\({\small\verb;high;}\)	\({\small\verb;headlamps;}\)
2	\({\small\verb;low;}\)	\({\small\verb;medium;}\)	\({\small\verb;building_windows_non_float_processed;}\)
5	\({\small\verb;low;}\)	\({\small\verb;medium;}\)	\({\small\verb;headlamps;}\)
7	\({\small\verb;low;}\)	\({\small\verb;medium;}\)	\({\small\verb;building_windows_non_float_processed;}\)
8	\({\small\verb;low;}\)	\({\small\verb;high;}\)	\({\small\verb;tableware;}\)

with conditional frequencies such as

\[ \begin{aligned} &f(\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;headlamps;} \nonscript\:\vert\nonscript\:\mathopen{} \mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;}) = \frac{2}{5} \\[1ex] &f(\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;} \nonscript\:\vert\nonscript\:\mathopen{} \mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;}) = \frac{1}{5} \end{aligned} \]

and so on. The reverse is therefore not true: if \(\mathit{Ca}\) is equal to \({\small\verb;low;}\), that does not mean that it’s impossible to observe other \(\mathit{Type}\) values besides \({\small\verb;headlamps;}\). Note especially how these frequencies differ:

\[ \begin{gathered} f(\mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;} \nonscript\:\vert\nonscript\:\mathopen{} \mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;headlamps;}) = 1 \\[1ex] f(\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;headlamps;} \nonscript\:\vert\nonscript\:\mathopen{} \mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;}) = \frac{2}{5} \end{gathered} \]

In the original population we have, figuratively speaking, the following interesting association:

\[ \mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;headlamps;}\ \mathrel{\color[RGB]{34,136,51}\Rightarrow}\ \mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;} \qquad\text{\small but}\qquad \mathit{Ca}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;low;}\ \mathrel{\color[RGB]{238,102,119}\nRightarrow}\ \mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;headlamps;} \]
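The asymmetry between the two conditional frequencies can be checked by counting. The following is a minimal sketch in Python, assuming only the \(\mathit{Ca}\) and \(\mathit{Type}\) columns of table 22.1 are typed in; the variable names are illustrative.

```python
from fractions import Fraction

# (Ca, Type) pairs for units 1..10 of table 22.1; Si is not needed here.
units = [
    ("low", "headlamps"),
    ("low", "building_windows_non_float_processed"),
    ("medium", "tableware"),
    ("medium", "building_windows_float_processed"),
    ("low", "headlamps"),
    ("medium", "containers"),
    ("low", "building_windows_non_float_processed"),
    ("low", "tableware"),
    ("medium", "tableware"),
    ("medium", "tableware"),
]

# Counts of units satisfying each condition, and both at once.
n_low = sum(1 for ca, glass_type in units if ca == "low")
n_headlamps = sum(1 for ca, glass_type in units if glass_type == "headlamps")
n_both = sum(1 for ca, glass_type in units if ca == "low" and glass_type == "headlamps")

# Same numerator, different denominators: the size of the conditioning subpopulation.
f_low_given_headlamps = Fraction(n_both, n_headlamps)
f_headlamps_given_low = Fraction(n_both, n_low)

print(f_low_given_headlamps)  # 1
print(f_headlamps_given_low)  # 2/5
```

Both frequencies share the same numerator, the number of units with both values; they differ only in which subpopulation size appears in the denominator.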

This kind of association is often useful. Suppose for instance that you are asked to pick a unit with \(\mathit{Ca}\) equal to \({\small\verb;low;}\) in the original population; but it’s difficult to measure a unit’s \(\mathit{Ca}\) value, while it’s easy to measure its \(\mathit{Type}\) value. Then you could instead search for a unit having \(\mathit{Type}\) equal to \({\small\verb;headlamps;}\) (easier search), and you would be sure that the unit you found also has \(\mathit{Ca}\) equal to \({\small\verb;low;}\).

The example above, where some values of a variate completely exclude some values of another, is a special one. More often we find that there are small or large changes in the frequency distribution of some variate, depending on the subpopulation considered.

Exercise

Calculate the (marginal) frequency distribution for the \(\mathit{Ca}\) variate for the glass-fragment population of table 22.1. Is the value \({\small\verb;low;}\) more frequent than \({\small\verb;medium;}\)? or vice versa?
Calculate the frequency distribution for \(\mathit{Ca}\), conditional on \(\mathit{Type}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\small\verb;tableware;}\) (see table 22.2). How does this frequency distribution differ from the one you calculated above? Come up with possible ways to exploit this difference in concrete applications.

Associations can be very counter-intuitive

It is usually best to assess associations by explicitly calculating all relevant conditional frequencies, rather than jumping to intuitive conclusions after having examined just a few. Here’s an example.

Consider the statistical population defined as follows:

units: all repairs done by a repair company on a particular kind of electronic component, which is extremely delicate and usually very difficult to repair. The population has 26 units (every unit actually represents a batch of 100 repairs, so the population really refers to 2600 repairs).
variate: a joint variate consisting of three binary ones:
- \(\mathit{\color[RGB]{102,204,238}Location}\) of the repair procedure, with values \({\color[RGB]{102,204,238}{\small\verb;onsite;}}\) and \({\color[RGB]{102,204,238}{\small\verb;remote;}}\);
- repair \(\mathit{\color[RGB]{34,136,51}Method}\), with values \({\color[RGB]{34,136,51}{\small\verb;old;}}\) and \({\color[RGB]{34,136,51}{\small\verb;new;}}\), representing a traditional repair method and one introduced more recently;
- \(\mathit{\color[RGB]{238,102,119}Outcome}\) of the repair procedure, with values \(\color[RGB]{238,102,119}{\small\verb;success;}\) and \(\color[RGB]{238,102,119}{\small\verb;fail;}\).

Table 22.6: Repairs (each row is one unit, representing 100 repairs).
file repair_data.csv

\(\mathit{\color[RGB]{238,102,119}Outcome}\)	\(\mathit{\color[RGB]{34,136,51}Method}\)	\(\mathit{\color[RGB]{102,204,238}Location}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;remote;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;success;}\)	\({\color[RGB]{34,136,51}{\small\verb;old;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)
\(\color[RGB]{238,102,119}{\small\verb;fail;}\)	\({\color[RGB]{34,136,51}{\small\verb;new;}}\)	\({\color[RGB]{102,204,238}{\small\verb;onsite;}}\)

The repair company claims that, in this population, the \({\color[RGB]{34,136,51}{\small\verb;new;}}\) repair method is more effective than the \({\color[RGB]{34,136,51}{\small\verb;old;}}\). Can you back up their claim?

Exercise (one of the most fun of the course!)

Use the population data above. The calculations can be done with any tools you like.

Examine the whole population first:
1. Calculate the frequency distribution of the \(\mathit{\color[RGB]{238,102,119}Outcome}\) variate, conditional on \(\mathit{\color[RGB]{34,136,51}Method}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{34,136,51}{\small\verb;new;}}\) (note that we are disregarding the \(\mathit{\color[RGB]{102,204,238}Location}\)).
2. Calculate the frequency distribution of the \(\mathit{\color[RGB]{238,102,119}Outcome}\) variate, conditional on \(\mathit{\color[RGB]{34,136,51}Method}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{34,136,51}{\small\verb;old;}}\).
3. Compare the two conditional frequency distributions above. Which of the two repair methods seems more effective?
  Are the claims of the repair company justified?
Now examine the repairs that have been done \({\color[RGB]{102,204,238}{\small\verb;onsite;}}\):
1. Before doing any calculations, what do you expect to find? Should the \({\color[RGB]{34,136,51}{\small\verb;new;}}\) repair method be more effective than the \({\color[RGB]{34,136,51}{\small\verb;old;}}\) one, for onsite repairs?
2. Calculate the frequency distribution of the \(\mathit{\color[RGB]{238,102,119}Outcome}\) variate, conditional on \(\mathit{\color[RGB]{34,136,51}Method}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{34,136,51}{\small\verb;new;}}\) and \(\mathit{\color[RGB]{102,204,238}Location}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;onsite;}}\).
3. Calculate the frequency distribution of the \(\mathit{\color[RGB]{238,102,119}Outcome}\) variate, conditional on \(\mathit{\color[RGB]{34,136,51}Method}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{34,136,51}{\small\verb;old;}}\) and \(\mathit{\color[RGB]{102,204,238}Location}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;onsite;}}\).
4. Compare the two conditional frequency distributions for this \({\color[RGB]{102,204,238}{\small\verb;onsite;}}\) case. Which of the two repair methods seems more effective?
  How do you explain this result in the light of what you found in step 3 for the whole population?
Now examine the repairs that have been done \({\color[RGB]{102,204,238}{\small\verb;remote;}}\)ly:
1. Before doing any calculations, what do you expect to find? Should the \({\color[RGB]{34,136,51}{\small\verb;new;}}\) repair method be more effective than the \({\color[RGB]{34,136,51}{\small\verb;old;}}\) one, for repairs done remotely?
2. Calculate the frequency distribution of the \(\mathit{\color[RGB]{238,102,119}Outcome}\) variate, conditional on \(\mathit{\color[RGB]{34,136,51}Method}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{34,136,51}{\small\verb;new;}}\) and \(\mathit{\color[RGB]{102,204,238}Location}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;remote;}}\).
3. Calculate the frequency distribution of the \(\mathit{\color[RGB]{238,102,119}Outcome}\) variate, conditional on \(\mathit{\color[RGB]{34,136,51}Method}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{34,136,51}{\small\verb;old;}}\) and \(\mathit{\color[RGB]{102,204,238}Location}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{102,204,238}{\small\verb;remote;}}\).
4. Compare the two conditional frequency distributions for this \({\color[RGB]{102,204,238}{\small\verb;remote;}}\) case. Which of the two repair methods seems more effective?
  How do you explain this result in the light of what you found in step 3 for the whole population and in step 4 for the onsite case?
Summarize and explain all your findings.
Can the repair company claim that the \({\color[RGB]{34,136,51}{\small\verb;new;}}\) repair method is better than the \({\color[RGB]{34,136,51}{\small\verb;old;}}\)?
Suppose you need to send an electronic component for repair to this company.
1. If you could choose both the \(\mathit{\color[RGB]{102,204,238}Location}\) and the \(\mathit{\color[RGB]{34,136,51}Method}\) of the repair, which would you choose? why?
2. If you could only choose the repair \(\mathit{\color[RGB]{34,136,51}Method}\), but have no control over the \(\mathit{\color[RGB]{102,204,238}Location}\), which method would you choose? why?
Is there other information, missing from the description of the population, that should be known before answering the questions above?
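For those who prefer to do the calculations above in code, here is a minimal sketch of a helper in Python. It rests on assumptions not stated in the text: that the data are available as rows with column names `Outcome`, `Method`, `Location` (the actual layout of `repair_data.csv` may differ), and the function name `conditional_distribution` is purely illustrative. The rows used to show the call are made up and are not the data of table 22.6.

```python
import csv
import io
from collections import Counter
from fractions import Fraction


def conditional_distribution(rows, target, **given):
    """Frequency distribution of `target` in the subpopulation selected by `given`."""
    selected = [
        row[target]
        for row in rows
        if all(row[variate] == value for variate, value in given.items())
    ]
    if not selected:
        raise ValueError("the selected subpopulation is empty")
    counts = Counter(selected)
    # Only values that actually occur in the subpopulation get an entry.
    return {value: Fraction(n, len(selected)) for value, n in counts.items()}


# Made-up toy rows (NOT the data of table 22.6), only to show how the helper is called.
toy_csv = io.StringIO(
    "Outcome,Method,Location\n"
    "success,new,onsite\n"
    "fail,new,remote\n"
    "success,old,onsite\n"
    "success,new,onsite\n"
)
rows = list(csv.DictReader(toy_csv))

dist = conditional_distribution(rows, "Outcome", Method="new")
print(dist["success"], dist["fail"])  # 2/3 1/3 for the toy rows
```

Conditioning on two variates at once only requires passing both as keyword arguments, for example `Method="new", Location="onsite"`. To use the real data, the `io.StringIO` object would be replaced by the opened file, assuming it has a header row with the column names above.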

Study reading

§§2.2–2.4 and 2.7–2.10 of Risk Assessment and Decision Analysis with Bayesian Networks
The role of exchangeability in inference (this can be a difficult reading; try to get the main message)
Simpson’s paradox