- All previous predictors and predictands known (supervised learning)
\[
\begin{aligned}
&\mathrm{P}(\color[RGB]{238,102,119}
Y_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y
\color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{}
\color[RGB]{34,136,51}
X_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x
\, \mathbin{\mkern-0.5mu,\mkern-0.5mu}\,
Y_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{N}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
X_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{N}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}
Y_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{1}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
X_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{1}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I})
\\[2ex]
&\qquad{}=
\frac{
\mathrm{P}(\color[RGB]{238,102,119}
Y_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}
\color[RGB]{34,136,51}
X_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x
\, \mathbin{\mkern-0.5mu,\mkern-0.5mu}\,
Y_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{N}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
X_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{N}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}
Y_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{1}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
X_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{1}
\color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})
}{
\sum_{\color[RGB]{170,51,119}y}
\mathrm{P}(\color[RGB]{238,102,119}
Y_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{170,51,119}y}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}
\color[RGB]{34,136,51}
X_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x
\, \mathbin{\mkern-0.5mu,\mkern-0.5mu}\,
Y_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{N}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
X_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{N}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}
Y_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{1}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
X_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{1}
\color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})
}
\end{aligned}
\]
- “Guess all variates” (unsupervised learning, generative algorithms):
\[
\mathrm{P}(\color[RGB]{238,102,119}
Z_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z
\color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{}
\color[RGB]{34,136,51}
Z_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{N}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}
Z_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{1}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I})
=
\frac{
\mathrm{P}(\color[RGB]{238,102,119}
Z_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}
\color[RGB]{34,136,51}
Z_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{N}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}
Z_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{1}
\color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})
}{
\sum_{\color[RGB]{170,51,119}z}
\mathrm{P}(
\color[RGB]{238,102,119}
Z_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{170,51,119}z}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\color[RGB]{34,136,51}
Z_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{N}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}
Z_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}z_{1}
\color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})
}
\]
- Previous predictors known, previous predictands unknown (unsupervised learning, clustering)
\[
\begin{aligned}
&\mathrm{P}(\color[RGB]{238,102,119}
Y_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y
\color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{}
\color[RGB]{34,136,51}
X_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x
\, \mathbin{\mkern-0.5mu,\mkern-0.5mu}\,
X_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{N}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb\mathbin{\mkern-0.5mu,\mkern-0.5mu}
X_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{1}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I})
\\[2ex]
&\quad{}=
\frac{
\sum_{\color[RGB]{204,187,68}y_{N}, \dotsc, y_{1}}
\mathrm{P}(\color[RGB]{238,102,119}
Y_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}
\color[RGB]{34,136,51}
X_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x
\color[RGB]{0,0,0}\, \mathbin{\mkern-0.5mu,\mkern-0.5mu}\,
\color[RGB]{204,187,68}
Y_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{N}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\color[RGB]{34,136,51}
X_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{N}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb\mathbin{\mkern-0.5mu,\mkern-0.5mu}
\color[RGB]{204,187,68}
Y_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{1}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\color[RGB]{34,136,51}
X_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{1}
\color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})
}{
\sum_{{\color[RGB]{170,51,119}y}, \color[RGB]{204,187,68}y_{N}, \dotsc, y_{1}}
\mathrm{P}(\color[RGB]{238,102,119}
Y_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}{\color[RGB]{170,51,119}y}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}
\color[RGB]{34,136,51}
X_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x
\color[RGB]{0,0,0}\, \mathbin{\mkern-0.5mu,\mkern-0.5mu}\,
\color[RGB]{204,187,68}
Y_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{N}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\color[RGB]{34,136,51}
X_{N}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{N}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb\mathbin{\mkern-0.5mu,\mkern-0.5mu}
\color[RGB]{204,187,68}
Y_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{1}
\color[RGB]{0,0,0}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\color[RGB]{34,136,51}
X_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{1}
\color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})
}
\end{aligned}
\]
All these formulae, even for hybrid tasks, involve sums and ratios of only one distribution:
\[\boldsymbol{
\mathrm{P}(\color[RGB]{68,119,170}
Y_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{\text{new}}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
X_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{\text{new}}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}\dotsb \mathbin{\mkern-0.5mu,\mkern-0.5mu}
Y_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{1}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
X_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{1}
\color[RGB]{0,0,0}\nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})
}
\]
and if the problem is exchangeable, for instance without time dependence or memory effects, the distribution can be calculated in a simpler way:
\[
\begin{aligned}
&\mathrm{P}\bigl(
\color[RGB]{68,119,170}Y_{\text{new}} \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
X_{\text{new}} \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
\dotsb
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
Y_{1} \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{1}
\mathbin{\mkern-0.5mu,\mkern-0.5mu}
X_{1} \mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{1}
\color[RGB]{0,0,0}\pmb{\nonscript\:\big\vert\nonscript\:\mathopen{}} \mathsfit{I}\bigr)
\\[2ex]
&\qquad{}=
\sum_{\boldsymbol{f}}
f({\color[RGB]{68,119,170}Y_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y \mathbin{\mkern-0.5mu,\mkern-0.5mu}X_{\text{new}}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x}) \cdot
\, \dotsb\, \cdot
f({\color[RGB]{68,119,170}Y_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y_{1} \mathbin{\mkern-0.5mu,\mkern-0.5mu}X_{1}\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x_{1}})
\cdot
\mathrm{P}(F\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}\boldsymbol{f}\nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I})
\end{aligned}
\]
Possibly there is a final decision about the output (if a single output is required), using some utilities and the principle of maximal expected utility:
\[
\mathsfit{\color[RGB]{204,187,68}D}_{\text{optimal}} =
\argmax_{\mathsfit{\color[RGB]{204,187,68}D}} \sum_{\color[RGB]{238,102,119}y} \mathrm{U}(\mathsfit{\color[RGB]{204,187,68}D}\mathbin{\mkern-0.5mu,\mkern-0.5mu}{\color[RGB]{238,102,119}Y\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y} \nonscript\:\vert\nonscript\:\mathopen{} \mathsfit{I}) \cdot
\mathrm{P}({\color[RGB]{238,102,119}Y\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}y} \nonscript\:\vert\nonscript\:\mathopen{} {\color[RGB]{34,136,51}X\mathclose{}\mathord{\nonscript\mkern 0mu\textrm{\small=}\nonscript\mkern 0mu}\mathopen{}x} \mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{\color[RGB]{34,136,51}data}\mathbin{\mkern-0.5mu,\mkern-0.5mu}\mathsfit{I})
\]