38 The prototype Optimal Predictor Machine makes decisions
It is straightforward to implement decision-making in our prototype Optimal Predictor Machine. Let’s continue with the example from chapter 35.
38.1 Initialization and build of OPM agent
Load the necessary libraries and functions, including the decide() function, and train the agent as we did previously:
library('extraDistr')
library('foreach')
source('tplotfunctions.R')
source('guessmetadata.R')
source('buildagent.R')
source('infer.R')
source('decide.R')
source('mutualinfo.R')
source('rF.R')
source('plotFsamples1D.R')
options(repr.plot.width = 6*sqrt(2), repr.plot.height = 6)
opmall <- buildagent(metadata = 'meta_income_data_example.csv',
                     data = 'train-income_data_example.csv')
38.2 Decision matrix
We use the targeted-advertisement scenario of § 37.2, with the following utility matrix for the three ad-types:
adutilities <- matrix(
    c(-1,  3,
       2,  2,
       3, -1),
    nrow = 3, byrow = TRUE,
    dimnames = list(ad_type = c('A','B','C'), income = c('<=50K', '>50K')))
print(adutilities)
       income
ad_type <=50K >50K
      A    -1    3
      B     2    2
      C     3   -1
38.3 Example application
First let’s apply the principle of maximal expected utility step-by-step.
Consider the example from § 37.2. The agent calculates the probabilities for the predictand income from the given predictor values:
userpredictors <- list(workclass = 'Private', education = 'Bachelors',
                       marital_status = 'Never-married',
                       occupation = 'Prof-specialty',
                       relationship = 'Not-in-family', race = 'White',
                       sex = 'Female', native_country = 'United-States')
probs <- infer(agent = opmall, predictand = 'income',
               predictor = userpredictors)
print(probs)
income
   <=50K     >50K
0.833333 0.166667
Find the expected utilities of the three possible ad-types by matrix multiplication:
adutilities %*% probs
ad_type     [,1]
      A -0.33333
      B  2.00000
      C  2.33333
And we see that ad-type C is optimal.
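The matrix product above can also be written out as an explicit weighted sum, which may make the principle easier to follow. The sketch below is self-contained: it assumes the probabilities are exactly 5/6 and 1/6 (matching the printed 0.833333 and 0.166667) and stores them in a plain named vector, which may differ from the object that infer() actually returns. The names expectedutils and bestad are illustrative only.

```r
## Utility matrix copied from the example above
adutilities <- matrix(
    c(-1,  3,
       2,  2,
       3, -1),
    nrow = 3, byrow = TRUE,
    dimnames = list(ad_type = c('A', 'B', 'C'), income = c('<=50K', '>50K')))

## Assumption: exact values behind the printed 0.833333 and 0.166667
probs <- c('<=50K' = 5/6, '>50K' = 1/6)

## Expected utility of each ad-type: sum over income levels of
## utility times probability
expectedutils <- sapply(rownames(adutilities), function(ad) {
    sum(adutilities[ad, ] * probs[colnames(adutilities)])
})
print(expectedutils)

## The explicit sums agree with the matrix product
stopifnot(isTRUE(all.equal(unname(expectedutils),
                           as.vector(adutilities %*% probs))))

## Ad-type with the highest expected utility
bestad <- names(which.max(expectedutils))
print(bestad)
```

Note that which.max() returns only the first maximum, so it does not handle ties between equally good decisions.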
The function decide() does the previous calculations. It outputs a list with elements:
EUs: the expected utilities of the decisions, sorted from highest to lowest
optimal: one decision unsystematically chosen among the optimal ones (if more than one)
optimalad <- decide(utils = adutilities, probs = probs)
print(optimalad)
$EUs
C B A
2.33333 2.00000 -0.33333
$optimal
[1] "C"
38.4 Performance on test set
Finally let’s apply our prototype agent to a test set, as a demonstration, and see how much utility it yields. This procedure will be discussed in more detail in § 40.
Load the test dataset; M is the number of test data:
testdata <- read.csv('test-income_data_example.csv', header = TRUE,
                     na.strings = '', stringsAsFactors = FALSE, tryLogical = FALSE)
M <- nrow(testdata)
We build the analogue of a “confusion matrix” (§ 40), which tells us how many times the agent chooses each of the three ad-types at each of the two income levels.
confusionmatrix <- adutilities * 0L

## Use a for-loop for clarity
for(i in 1:M){
    userpredictors <- testdata[i, colnames(testdata) != 'income']
    probs <- infer(agent = opmall, predictand = 'income',
                   predictor = userpredictors)
    decision <- decide(utils = adutilities, probs = probs)$optimal
    trueincome <- testdata[i, 'income']

    confusionmatrix[decision, trueincome] <- confusionmatrix[decision, trueincome] + 1L
}
print(confusionmatrix)
       income
ad_type <=50K >50K
      A   769 2149
      B 11768 5093
      C 12961 1174
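Once the decisions and the true income values are collected in two vectors, the same kind of tally can be obtained in one step with table(). The sketch below uses six made-up test units purely for illustration; the vectors decisions and trueincomes are hypothetical and are not produced by the loop above.

```r
adtypes <- c('A', 'B', 'C')
incomes <- c('<=50K', '>50K')

## Made-up decisions and true incomes for six hypothetical test units
decisions   <- c('C', 'B', 'C', 'A', 'B', 'C')
trueincomes <- c('<=50K', '<=50K', '<=50K', '>50K', '>50K', '>50K')

## Fixing the factor levels keeps rows and columns that have zero counts
tally <- table(ad_type = factor(decisions, levels = adtypes),
               income = factor(trueincomes, levels = incomes))
print(tally)
```

Declaring the levels explicitly matters here: without them, an ad-type that is never chosen would be missing from the table, and the element-wise product with the utility matrix would no longer line up.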
The total utility yield is the sum of the element-wise product of the confusionmatrix and adutilities matrices:
totalyield <- sum(adutilities * confusionmatrix)
averageyield <- totalyield/M
cat('\nTotal yield =', totalyield,
'\nAverage yield =', averageyield, '\n')
Total yield = 77109
Average yield = 2.27366
Note that:
This yield is higher than what would be obtained by just choosing the neutral ad-type B for all test units (the average yield would be exactly 2).
This yield is also higher than would be obtained by always choosing ad-type C, targeting the majority of units, which have income = '<=50K'. This strategy would yield 2.00737.
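These comparisons can be checked directly from the printed confusion matrix, because its column sums give the number of test units at each true income level. The sketch below re-enters the counts by hand and assumes that every test unit was tallied exactly once, so that M equals the sum of all entries. The names incomecounts and fixedyields are illustrative.

```r
adutilities <- matrix(
    c(-1,  3,
       2,  2,
       3, -1),
    nrow = 3, byrow = TRUE,
    dimnames = list(ad_type = c('A', 'B', 'C'), income = c('<=50K', '>50K')))

## Counts copied from the confusion matrix printed above
confusionmatrix <- matrix(
    c(  769, 2149,
      11768, 5093,
      12961, 1174),
    nrow = 3, byrow = TRUE, dimnames = dimnames(adutilities))

## Assumption: each test unit was tallied exactly once
M <- sum(confusionmatrix)
averageyield <- sum(adutilities * confusionmatrix) / M

## Number of test units at each true income level
incomecounts <- colSums(confusionmatrix)

## Average yield if the same ad-type were shown to every test unit
fixedyields <- drop(adutilities %*% incomecounts) / M

print(round(c(agent = averageyield, fixedyields), 5))
```

The agent's average yield of 2.27366 exceeds the 2 and 2.00737 obtained by always choosing B or C respectively; always choosing A would give a slightly negative average yield on this test set.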