2  Framework

Published

2024-02-14

2.1 What does the intro problem tell us?

Let’s approach the “accept or discard?” problem of the previous chapter 1 in an intuitive way.

We’re jumping the gun here, because we haven’t learned the method to solve this problem yet!

First, what happens if we accept the component?

We must try to make sense of the \(10\%\) probability that the component fails within a year. For the moment let’s use an intuitive imagination trick to make sense of this. Imagine that this situation is repeated 100 times. In 10 of these repetitions the accepted electronic component is sold and fails within a year after selling. In the remaining 90 repetitions, the component is sold and works fine for at least a year. Later on we’ll approach this in a more rigorous way, where the idea of “imaginary repetitions” is not needed.

In each of the 10 imaginary repetitions in which the component fails early, the manufacturer loses \(\color[RGB]{238,102,119}11\$\). That’s a total loss of \({\color[RGB]{204,187,68}10} \cdot {\color[RGB]{238,102,119}11\$} = {\color[RGB]{238,102,119}110\$}\). In each of the 90 imaginary repetitions in which the component doesn’t fail early, the manufacturer gains \(\color[RGB]{34,136,51}1\$\). That’s a total gain of \({\color[RGB]{204,187,68}90} \cdot {\color[RGB]{34,136,51}1\$} = {\color[RGB]{34,136,51}90\$}\). So over all 100 imaginary repetitions the manufacturer gains

\[ {\color[RGB]{204,187,68}10}\cdot ({\color[RGB]{238,102,119}-11\$}) + {\color[RGB]{204,187,68}90}\cdot {\color[RGB]{34,136,51}1\$} = {\color[RGB]{238,102,119}-20\$} \]

that is, the manufacturer has not gained, but lost \(20\$\)! That’s an average of \(0.2\$\) lost per repetition.


Now let’s examine the second choice: what happens if we discard the component instead?

In this case we don’t need to invoke imaginary repetitions, but even if we do, it’s clear that the manufacturer doesn’t gain or lose anything – that is, the “gain” is \(0\$\) – in each and all of the repetitions.

The conclusion is that if in a situation like this we accept the component, then we’ll lose \(0.2\$\) on average; whereas if we discard it, then on average we’ll lose (or gain) \(0\$\) on average.

Obviously the best, or “least worst”, decision to make is to discard the component.

Exercises
  1. Now that we have an idea of the general reasoning, check what happens with different values of the probability of failure and of the failure cost: is it still best to discard? For instance, try with

    1. failure probability 10% and failure cost 5$;
    2. failure probability 5% and failure cost 11$;
    3. failure probability 10%, failure cost 11$, non-failure gain 2$.

    Feel free to get wild and do plots.

  2. Identify the failure probability at which accepting the component doesn’t lead to any loss or any gain, so it doesn’t matter whether we discard or accept. (You can solve this as you prefer: analytically with an equation, visually with a plot, by trial & error on several cases, or whatnot.)

  3. Consider the special case with failure probability 0% and failure cost 10$. This means no new component will ever fail. To decide in such a case we do not need imaginary repetitions; but confirm that we arrive at the same logical conclusion whether we reason through imaginary repetitions or not.

  4. Consider this completely different problem:

    A patient is examined by a brand-new medical diagnostics AI system.

    The AI first performs some clinical tests on the patient. The tests give an uncertain forecast of whether the patient has a particular disease or not.

    Then the AI decides whether the patient should be dismissed without treatment, or treated with a particular medicine.

    If the patient is dismissed, then the life expectancy doesn’t increase or decrease if the disease is not present, but it decreases by 10 years if the disease is actually present. If the patient is treated, then the life expectancy decreases by 1 year if the disease is not present (owing to treatment side-effects), but also if the disease is present (because it cures the disease, so the life expectancy doesn’t decrease by 10 years; but it still decreases by 1 year owing to the side effects).

    For this patient, the clinical tests indicate that there is a \(10\%\) probability that the patient has the disease.

    Should the diagnostic AI dismiss or treat the patient? Find differences and similarities, even numerical, with the assembly-line problem.


From the solution of the problem and from the exploring exercises, we gather some instructive points:

  • Is it enough if we simply know that the component is less likely to fail than not? in other words, if we simply know that the probability of failure is less than 50% but don’t know its precise value?

    Obviously not. We found that if the failure probability is \(10\%\) then it’s best to discard; but if it’s \(5\%\) then it’s best to accept. In both cases the probability of failure was less than 50%, but the decisions were different. Moreover, we found that the probability affected amount of loss if one made the non-optimal decision. Therefore:

    Knowledge of precise probabilities is absolutely necessary for making the best decision

  • Is it enough if we simply know that failure leads to a loss, and non-failure leads to a gain, without knowing the precise amounts of loss and gain?

    Obviously not. In the exercise we found that if the failure cost is \(11\$\) then it’s best to discard; but if it’s \(5\$\) then it’s best to accept. It’s also best to accept if the failure cost is \(11\$\) but the non-failure gain is \(2\$\). Therefore:

    Knowledge of the precise gains and losses is absolutely necessary for making the best decision

  • Is this kind of decision situation only relevant to assembly lines and sales?

    By all means not. We found a clinical situation that’s exactly analogous: there’s uncertainty, there are gains and losses (of lifetime rather than money), and the best decision depends on both.

2.2 Our focus: decision-making, inference, and data science

Every data-driven engineering project is unique, with its unique difficulties and problems. But there are also problems common to all engineering projects.

In the scenarios we explored above, we found an extremely important problem-pattern. There is a decision or choice to make (and “not deciding” is not an option, or it’s just another kind of choice). Making a particular decision will lead to some consequences, some leading to something desirable, others leading to something undesirable. The decision is difficult because its consequences are not known with certainty, given the information and data available in the problem. We may lack information and data about past or present details, about future events and responses, and so on. This is what we call a problem of decision-making under uncertainty or under risk1, or simply a “decision problem” for short.

1 We’ll avoid the word “risk” because it has several different technical meanings in the literature, some even contradictory.

This problem-pattern appears literally everywhere. But our explored scenarios also suggest that this problem-pattern has a sort of systematic solution method.

In this course we’re going to focus on decision problems and their systematic solution method. We’ll learn a framework and some abstract notions that allow us to frame and analyse this kind of problem, and we’ll learn a universal set of principles to solve it. This set of principles goes under the name of Decision Theory.

But what do decision-making under uncertainty and Decision Theory have to do with data and data science? The three are profoundly and tightly related on many different planes:

  • Data science is based on the laws of Decision Theory. These laws are similar to what the laws of physics are to a rocket engineer. Failure to account for these fundamental laws leads at best to sub-optimal solutions, at worst to disasters.

  • Machine-learning algorithms, in particular, are realizations or approximations of the rules of Decision Theory. This is clear, for instance, considering that a machine-learning classifier is actually choosing among possible output labels or classes.

  • The rules of Decision Theory are also the foundations upon which artificial-intelligence agents, which must make optimal inferences and decisions, are built.

  • We saw that probability values are essential to a decision problem. How do we find them? Data play an important part in their calculation. In our intro example, the failure probability must come from observations or experiments on similar electronic components.

  • We saw that also the values of gains and losses are essential. Data play an important part in their calculation as well.

These five planes will constitute the major parts and motivations of the present course.


There are other important aspects in engineering problems, besides the one of making decisions under uncertainty. For instance the discovery or the invention of new technologies and solutions. Aspects such as these two can barely be planned or decided. Their drive and direction, however, rest on a strive for improvement and optimization – and the fundamental laws of decision theory tell us what’s optimal and what’s not.

Artificial intelligence is proving to be a valuable aid in these more creative aspects too. This kind of use of AI is outside the scope of the present notes. Some aspects of this creativity-assisting use, however, do fall within the domain of the present notes. A pattern-searching algorithm, for example, can be optimized by means of the method we are going to study.

2.3 Our goal: optimality, not “success”

What should we demand from a systematic method for solving decision problems?

By definition, in a decision problem under uncertainty there is generally no method to determine the decision that surely leads to the desired consequence – if such a method existed, then the problem would not have any uncertainty! Therefore, if there is a method to deal with decision problems, its goal cannot be the determination of the successful decision. This also means that a priori we cannot blame an engineer for making an unsuccessful decision in a situation of uncertainty. Then what should be the goal of such a method?

Imagine two persons, Henry and Tina, who must choose between a “heads-bet” or a “tails-bet” before a coin is tossed; the bets work as follows:

  • “heads-bet”: if the coin lands heads, the person wins a small amount of money; but if it lands tails, they lose a large amount of money

  • “tails-bet”: if the coin lands tails, the person wins a small amount of money; if it lands heads, they lose the same small amount of money

Exercise

Which bet would you choose? why?

Henry chooses the heads-bet. Tina chooses the tails-bet. The coin comes down heads. So Henry wins the small amount of money, while Tina loses the same small amount.

What would we say about their decisions?

Henry’s decision was lucky, and yet irrational: he risked losing much more money than he could win. Tina’s decision was unlucky, and yet rational: she wasn’t risking to lose more than she could win. Said otherwise, the two bets had the same winning prospects, but the heads-bet had more risk of loss than the tails-bet.

We expect that any person making Henry’s decision in similar, future bets will eventually lose more money than any person making Tina’s decision.

This example shows two points:

  • “success” is generally not a good criterion to judge a decision under uncertainty; success can be the pure outcome of luck, not of smarts

  • even if there is no method to determine which decision is successful, there is a method to determine which decision is rational or optimal, given the particular gains, losses, and uncertainties involved in the decision problem

We had a glimpse of this method in our introductory scenarios.

Let us emphasize, however, that we are not giving up on “success”, or trading it for “optimality”. Indeed we’ll find that Decision Theory automatically leads to the successful decision in problems where uncertainty is not present or is irrelevant. It’s a win-win. It’s important to keep this point in mind:

Note

Aiming to find the solutions that are successful can make us fail to find those that are optimal when the successful ones cannot be determined.

Aiming to find the solutions that are optimal makes us automatically find those that are successful when those can be determined.

We shall later witness this fact with our own eyes, and will take it up again in the discussion of some misleading techniques to evaluate machine-learning algorithms.

2.4 Decision Theory

So far we have mentioned that Decision Theory has the following features:

  • it tells us what’s optimal and, when possible, what’s successful

  • it takes into consideration decisions, consequences, costs and gains

  • it is able to deal with uncertainties

What other kinds of features should we demand from it, in order to be applied to as many kinds of decision problems as possible, and to be relevant for data science?

  • If we find an optimal decision in regards to some outcome, it may still happen that the decision can be realized in several ways that are equivalent in regard to the outcome, but inequivalent in regard to time or resources. In the assembly-line scenario, for example, the decision discard could be carried out by burning, recycling, and so on. We thus face a decision within a decision. In general, a decision problem may involve several decision sub-problems, in turn involving decision sub-sub-problems, and so on.

  • In data science, a common engineering goal is to design and build an automated or AI-based device capable of making an optimal decision in a specific kind of uncertain situations. Think for instance of an aeronautic engineer designing an autopilot system, or a software company designing an image classifier.

Decision Theory turns out to meet these two demands too, thanks to the following features:

  • it is susceptible to recursive, sequential, and modular application

  • it can be used not only for human decision-makers, but also for automated or AI devices


Decision Theory has a long history, going back to Leibniz in the 1600s and partly even to Aristotle in the −300s, and appearing in its present form around 1920–1960. What’s remarkable about it is that it is not only a framework, but the framework we must use. A logico-mathematical theorem shows that any framework that does not break basic optimality and rationality criteria has to be equivalent to Decision Theory. In other words, any “alternative” framework may use different technical terminology and rewrite mathematical operations in a different way, but it boils down to the same notions and operations of Decision Theory. So if you wanted to invent and use another framework, then either (a) it would lead to some irrational or illogical consequences, or (b) it would lead to results identical to Decision Theory’s. Many frameworks that you are probably familiar with, such as optimization theory or Boolean logic, are just specific applications or particular cases of Decision Theory.

Thus we list one more important characteristic of Decision Theory:

  • it is normative

Normative contrasts with descriptive. The purpose of Decision Theory is not to describe, for example, how human decision-makers typically make decisions. Because human decision-makers typically make irrational, sub-optimal, or biased decisions. That’s exactly what we want to avoid and improve!

Study reading

Who says that Decision Theory should be normative? – this is a respectable scientific question. If you found yourself wondering and doubting about this, then congratulations: that’s how a scientist should think!

Later on we’ll examine material and arguments about this statement. If you like, feel free to already skim through the following works, as a starting point: