We shall compare the results obtained in some numerical simulations by using

- a Machine-Learning Classifier trained to make the largest number of successful guesses

- a prototype “Optimal Predictor Machine” trained to make the optimal decision

For the moment we treat both as “black boxes”; that is, we don’t yet study how they calculate their outputs (although you may already have a good guess at how the Optimal Predictor Machine works).

Their operation is implemented in the Python function defined here:

In [7]:
import random

def hitsvsgain(ntrials, chooseAtrueA, chooseAtrueB, chooseBtrueB, chooseBtrueA, probsA=(0.5,)):
    ## Draw one of the given probabilities at random (with replacement) for each trial
    probsArepeated = random.choices(probsA, k=ntrials)
    
    ## "Magic" parameters used in making the optimal decision
    threshold1 = (chooseAtrueA - chooseAtrueB + chooseBtrueB - chooseBtrueA)
    threshold2 = (chooseBtrueB - chooseAtrueB)
    
    ## Initialize total "hits" and gains
    ## 'mlc' refers to the Machine-Learning Classifier
    ## 'opm' refers to the Optimal Predictor Machine
    mlchits = 0
    mlcgain = 0
    opmhits = 0
    opmgain = 0
    
    ##
    ## Loop through the trials and their probabilities
    for probabilityA in probsArepeated:
        ## Output of the MLC, based on the current probability
        if probabilityA > 0.5:
            mlcchoice = 'A'
        elif probabilityA < 0.5:
            mlcchoice = 'B'
        else:
            mlcchoice = random.choice(['A', 'B']) # A or B with 50%/50% prob.
        
        ## Output of the OPM, based on the current probability
        ## try to understand where this inequality comes from
        if threshold1 * probabilityA > threshold2:
            opmchoice = 'A'
        elif threshold1 * probabilityA < threshold2:
            opmchoice = 'B'
        else:
            opmchoice = random.choice(['A', 'B']) # A or B with 50%/50% prob.
        
        ##
        ## Correct answer for the current trial
        trueitem = random.choices(['A', 'B'], weights=[probabilityA, 1-probabilityA], k=1)[0]
        
        ##
        ## MLC: add one "hit" if correct guess, and add gain/loss
        if mlcchoice == trueitem:
            mlchits += 1 # one success
            if trueitem == 'A':
                mlcgain += chooseAtrueA
            else:
                mlcgain += chooseBtrueB
        else:
            if trueitem == 'B':
                mlcgain += chooseAtrueB
            else:
                mlcgain += chooseBtrueA
        
        ##
        ## OPM: add one "hit" if correct guess, and add gain/loss
        if opmchoice == trueitem:
            opmhits += 1 # one success
            if trueitem == 'A':
                opmgain += chooseAtrueA
            else:
                opmgain += chooseBtrueB
        else:
            if trueitem == 'B':
                opmgain += chooseAtrueB
            else:
                opmgain += chooseBtrueA
    
    ## end of loop
    ##
    ## Output total number of hits and total gain or loss produced
    print('\nTrials:', ntrials)
    print('Machine-Learning Classifier: successes', mlchits, '(', format(mlchits/ntrials*100, '.3f'), '%)',
          '| total gain', mlcgain)
    print('Optimal Predictor Machine:   successes', opmhits, '(', format(opmhits/ntrials*100, '.3f'), '%)',
          '| total gain', opmgain)
    print('\n')

The function above has 6 arguments:

- `ntrials`: how many simulations of guesses to make
- `chooseAtrueA`: utility gained by guessing `A` when the successful guess is indeed `A`
- `chooseAtrueB`: utility gained by guessing `A` when the successful guess is `B` instead
- `chooseBtrueB`: utility gained by guessing `B` when the successful guess is indeed `B`
- `chooseBtrueA`: utility gained by guessing `B` when the successful guess is `A` instead
- `probsA`: a sequence (list or tuple) of probabilities, each between `0` and `1`, that the successful guess is `A`; one of them is drawn at random for each trial, and the corresponding probability for `B` is one minus the drawn value. If this argument is omitted, every trial uses probability `0.5` (not very interesting)
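
The two "magic" parameters in the function body can be folded into a single cutoff on the probability of `A`, which makes it easier to see how a given set of utilities shifts the Optimal Predictor Machine's decisions. The sketch below is not part of the original function: `probability_cutoff` is a hypothetical helper, and it assumes that `threshold1` is positive, which holds when each correct guess is worth more than the wrong guess for the same true item.

```python
def probability_cutoff(chooseAtrueA, chooseAtrueB, chooseBtrueB, chooseBtrueA):
    """Probability of A above which the OPM rule in hitsvsgain() picks 'A'.

    Hypothetical helper (an assumption for illustration), valid only
    when threshold1 > 0.
    """
    # Same "magic" parameters as in hitsvsgain()
    threshold1 = chooseAtrueA - chooseAtrueB + chooseBtrueB - chooseBtrueA
    threshold2 = chooseBtrueB - chooseAtrueB
    if threshold1 <= 0:
        raise ValueError("cutoff is only meaningful when threshold1 > 0")
    # threshold1 * p > threshold2  is equivalent to  p > threshold2 / threshold1
    return threshold2 / threshold1


# Illustrative utilities: a wrong 'A' guess is heavily penalized,
# so the cutoff lies well above 0.5 (about 0.917).
print(round(probability_cutoff(+1, -11, 0, 0), 3))
```

A cutoff far from `0.5` signals utilities for which the two classifiers will often disagree.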


## Example 1: electronic component

Let's apply our two classifiers to the *Accept or discard?* problem. Call `A` the alternative in which the component won't fail before one year, and should therefore be accepted *if this alternative were known at the time of the decision*. Call `B` the alternative in which the component will fail within a year, and should therefore be discarded *if this alternative were known at the time of the decision*. (Remember that the crucial point here is that the classifiers *don't* have this information at the moment of making the decision.)

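We simulate this decision for 100000 components ("trials"), assuming that the probability of a component not failing (alternative `A`) can be `0.05`, `0.20`, `0.80`, or `0.95`; the probabilities of failure are then `0.95`, `0.80`, `0.20`, `0.05`. Accepting a component that works yields `+1`, accepting one that fails yields `-11`, and discarding a component gains or loses nothing: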


In [8]:
hitsvsgain(ntrials=100000,
           chooseAtrueA=+1, chooseAtrueB=-11,
           chooseBtrueB=0, chooseBtrueA=0,
           probsA=[0.05, 0.20, 0.80, 0.95])


Trials: 100000
Machine-Learning Classifier: successes 87410 ( 87.410 %) | total gain -26140
Optimal Predictor Machine:   successes 72535 ( 72.535 %) | total gain 9475




Which classifier makes the most *successful* guesses?

Which classifier gives the highest monetary gain?

## Example 2: find Aladdin! (image recognition)

A typical use of machine-learning classifiers is for image recognition: for instance, the classifier guesses whether a particular subject is present in the image or not.

Intuitively one may think that "guessing successfully" should be the best goal here. But exceptions to this may be more common than one thinks. Consider the following scenario:

> Bianca has a computer folder with 10000 photos. Some of these include her beloved cat Aladdin, who sadly passed away recently. She would like to select all photos that include Aladdin and save them in a separate "Aladdin" folder. Doing this by hand would take too long, if at all possible; so Bianca wants to employ a machine-learning classifier.
> 
> For Bianca it's important that no photo with Aladdin goes missing, so she would be very sad if any photo with him weren't correctly recognized; on the other hand she doesn't mind if some photos without him end up in the "Aladdin" folder -- she can delete them herself afterwards.

Let's apply our two classifiers to this image-recognition problem and compare them, using again the `hitsvsgain()` function. We call `A` the case where Aladdin is present in a photo, and `B` where he isn't. To reflect Bianca's preferences, let's use these "emotional utilities":

- `chooseAtrueA = +2`: Aladdin is correctly recognized
- `chooseBtrueA = -2`: Aladdin is not recognized and the photo goes missing
- `chooseBtrueB = +1`: absence of Aladdin is correctly recognized
- `chooseAtrueB = -1`: a photo without Aladdin ends up in the "Aladdin" folder

and let's say that the photos may have probabilities `0.3`, `0.4`, `0.6`, `0.7` of including Aladdin:


In [9]:
hitsvsgain(ntrials=10000,
           chooseAtrueA=+2, chooseAtrueB=-1,
           chooseBtrueB=1, chooseBtrueA=-2,
           probsA=[0.3, 0.4, 0.6, 0.7])


Trials: 10000
Machine-Learning Classifier: successes 6513 ( 65.130 %) | total gain 4565
Optimal Predictor Machine:   successes 5994 ( 59.940 %) | total gain 5443




Again we see that the machine-learning classifier makes more successful guesses than the optimal predictor machine, but the latter yields a higher "emotional utility".
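
Because each simulation is random, the totals above fluctuate from run to run. As a cross-check, the expected hit rate and gain per trial can be computed exactly from the utilities and probabilities. The following is a minimal sketch, assuming every value in `probsA` is equally likely and none falls exactly on a decision tie; `expected_per_trial` is a hypothetical helper, not part of the notebook's code.

```python
def expected_per_trial(chooseAtrueA, chooseAtrueB, chooseBtrueB, chooseBtrueA, probsA):
    """Exact expected [hit rate, gain] per trial for both decision rules.

    Hypothetical helper: assumes each probability in probsA is equally
    likely and ignores ties (a tie is resolved as 'B' here, unlike the
    random tie-break in hitsvsgain()).
    """
    threshold1 = chooseAtrueA - chooseAtrueB + chooseBtrueB - chooseBtrueA
    threshold2 = chooseBtrueB - chooseAtrueB
    totals = {'mlc': [0.0, 0.0], 'opm': [0.0, 0.0]}  # [hit rate, gain]
    for p in probsA:
        choices = {
            'mlc': 'A' if p > 0.5 else 'B',
            'opm': 'A' if threshold1 * p > threshold2 else 'B',
        }
        for name, choice in choices.items():
            if choice == 'A':
                hit = p  # guess is right when the true item is A
                gain = p * chooseAtrueA + (1 - p) * chooseAtrueB
            else:
                hit = 1 - p  # guess is right when the true item is B
                gain = p * chooseBtrueA + (1 - p) * chooseBtrueB
            totals[name][0] += hit / len(probsA)
            totals[name][1] += gain / len(probsA)
    return totals


# Utilities and probabilities of the Aladdin example
result = expected_per_trial(+2, -1, 1, -2, [0.3, 0.4, 0.6, 0.7])
for name, (hit_rate, gain) in result.items():
    print(name, 'hit rate', format(hit_rate, '.3f'), '| gain per trial', format(gain, '.3f'))
```

Under these assumptions the expected gain per trial is about `0.45` for the machine-learning classifier and `0.55` for the optimal predictor machine, that is, roughly 4500 and 5500 over 10000 trials, close to the simulated totals above.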

You may sensibly object that this result could depend on the peculiar utilities or probabilities chosen for this example. The next exercise helps answer your objection.

## Now play and experiment!

- Is there any case in which the optimal predictor machine yields a strictly lower utility than the machine-learning classifier?
    + Try using different utilities, for instance using `±5` instead of `±2`, or whatever other values you please.
    + Try using different probabilities as well.
    
- As in the previous exercise, try to understand what's happening. Consider this question: *how many photos including Aladdin did each classifier miss?*
    
    Modify the `hitsvsgain()` function to output this result.

- Do the comparison using the following utilities: `chooseAtrueA=+1, chooseAtrueB=-1, chooseBtrueB=1, chooseBtrueA=-1`. What's the result? What does this tell you about the relationship between the machine-learning classifier and the optimal predictor machine?
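
One possible starting point for the exercise on missed photos is a stripped-down simulation that only counts the trials in which the true item is `A` but a classifier chose `B`. This is a sketch, not the intended solution: `count_missed_A` and its `seed` argument are hypothetical additions, and the decision rules are copied from `hitsvsgain()`.

```python
import random


def count_missed_A(ntrials, chooseAtrueA, chooseAtrueB, chooseBtrueB, chooseBtrueA,
                   probsA=(0.5,), seed=None):
    """Count trials where the true item is 'A' but the classifier chose 'B'.

    Hypothetical helper mirroring the decision rules of hitsvsgain();
    `seed` is an added argument for reproducible runs.
    """
    rng = random.Random(seed)
    threshold1 = chooseAtrueA - chooseAtrueB + chooseBtrueB - chooseBtrueA
    threshold2 = chooseBtrueB - chooseAtrueB

    def decide(score):
        # Positive score -> 'A', negative -> 'B', zero -> 50%/50% tie-break
        if score > 0:
            return 'A'
        if score < 0:
            return 'B'
        return rng.choice(['A', 'B'])

    counts = {'total_A': 0, 'mlc': 0, 'opm': 0}
    for _ in range(ntrials):
        p = rng.choice(probsA)
        mlcchoice = decide(p - 0.5)
        opmchoice = decide(threshold1 * p - threshold2)
        trueitem = 'A' if rng.random() < p else 'B'
        if trueitem == 'A':
            counts['total_A'] += 1
            if mlcchoice == 'B':
                counts['mlc'] += 1  # MLC missed an 'A'
            if opmchoice == 'B':
                counts['opm'] += 1  # OPM missed an 'A'
    return counts


print(count_missed_A(10000, +2, -1, 1, -2, probsA=(0.3, 0.4, 0.6, 0.7), seed=1))
```

Both classifiers see the same simulated photos here, so the two miss counts can be compared directly.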
