class: center, middle, inverse, title-slide # Supervised learning ###
Data Analytics for Psychology and Business
### April 2019 --- layout: true <div class="my-footer"> <span style="text-align:center"> <span> <img src="https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_image/by-sa.png" height=14 style="vertical-align: middle"/> </span> <a href="https://cdsbasel.github.io/dataanalytics/"> <span style="padding-left:82px"> <font color="#7E7E7E"> cdsbasel.github.io/dataanalytics/ </font> </span> </a> <a href="https://cdsbasel.github.io/dataanalytics/"> <font color="#7E7E7E"> Data Analytics for Psychology and Business | April 2019 </font> </a> </span> </div> --- class: center, middle <a><h1>Fitting</h1></a> <font color = "gray"><h1>Evaluation</h1></font> <font color = "gray"><h1>Tuning</h1></font> --- .pull-left45[ # Fitting <p style="padding-top:1px"></p> Models are actually <high>families of models</high>, with every parameter combination specifying a different model. To fit a model means to <high>identify</high> from the family of models <high>the specific model that fits the data best</high>. ] .pull-right45[ <br><br> <p align = "center"> <img src="image/curvefits.png" height=480px><br> <font style="font-size:10px">adapted from <a href="https://www.explainxkcd.com/wiki/index.php/2048:_Curve-Fitting">explainxkcd.com</a></font> </p> ] --- # Which of these models is better? Why? <img src="SupervisedLearning_files/figure-html/unnamed-chunk-2-1.png" width="90%" style="display: block; margin: auto;" /> --- # Which of these models is better? Why? <img src="SupervisedLearning_files/figure-html/unnamed-chunk-3-1.png" width="90%" style="display: block; margin: auto;" /> --- # Loss function .pull-left45[ Possible <high>the most important concept</high> in statistics and machine learning. The loss function defines some <high>summary of the errors committed by the model</high>. <p style="padding-top:7px"> `$$\Large Loss = f(Error)$$` <p style="padding-top:7px"> <u>Two purposes</u> <table style="cellspacing:0; cellpadding:0; border:none;"> <tr> <td> <b>Purpose</b> </td> <td> <b>Description</b> </td> </tr> <tr> <td bgcolor="white"> Fitting </td> <td bgcolor="white"> Find parameters that minimize loss function. </td> </tr> <tr> <td> Evaluation </td> <td> Calculate loss function for fitted model. </td> </tr> </table> ] .pull-right45[ <img src="SupervisedLearning_files/figure-html/unnamed-chunk-4-1.png" width="90%" style="display: block; margin: auto;" /> ] --- class: center, middle <high><h1>Regression</h1></high> <font color = "gray"><h1>Decision Trees</h1></font> <font color = "gray"><h1>Random Forests</h1></font> --- # Regression .pull-left45[ In [regression](https://en.wikipedia.org/wiki/Regression_analysis), the criterion `\(Y\)` is modeled as the <high>sum</high> of <high>features</high> `\(X_1, X_2, ...\)` <high>times weights</high> `\(\beta_1, \beta_2, ...\)` plus `\(\beta_0\)` the so-called the intercept. <p style="padding-top:10px"></p> `$$\large \hat{Y} = \beta_{0} + \beta_{1} \times X_1 + \beta_{2} \times X2 + ...$$` <p style="padding-top:10px"></p> The weight `\(\beta_{i}\)` indiciates the <high>amount of change</high> in `\(\hat{Y}\)` for a change of 1 in `\(X_{i}\)`. Ceteris paribus, the <high>more extreme</high> `\(\beta_{i}\)`, the <high>more important</high> `\(X_{i}\)` for the prediction of `\(Y\)` <font style="font-size:12px">(Note: the scale of `\(X_{i}\)` matters too!).</font> If `\(\beta_{i} = 0\)`, then `\(X_{i}\)` <high>does not help</high> predicting `\(Y\)` ] .pull-right45[ <img src="SupervisedLearning_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" /> ] --- # Regression loss .pull-left45[ <p> <font style="font-size:24"><b> Mean Squared Error (MSE)</b></font><br><br><high>Average squared distance</high> between predictions and true values?<br> $$ MSE = \frac{1}{n}\sum_{i \in 1,...,n}(Y_{i} - \hat{Y}_{i})^{2}$$ <br><font style="font-size:24"><b> Mean Absolute Error (MAE)</b></font><br><br><high>Average absolute distance</high> between predictions and true values?<br> $$ MAE = \frac{1}{n}\sum_{i \in 1,...,n} \lvert Y_{i} - \hat{Y}_{i} \rvert$$ </p> ] .pull-right45[ <img src="SupervisedLearning_files/figure-html/unnamed-chunk-6-1.png" width="90%" style="display: block; margin: auto;" /> ] --- # Fitting .pull-left45[ There are two fundamentally different ways to find the set of parameters that minimizes loss. <font style="font-size:24"><b> Analytically </b> In rare cases, the parameters can be <high>directly calculated</high>, e.g., using the <i>normal equation</i>: `$$\boldsymbol \beta = (\boldsymbol X^T\boldsymbol X)^{-1}\boldsymbol X^T\boldsymbol y$$` <font style="font-size:24"><b> Numerically </b> In most cases, parameters need to be found using a <high>directed trial and error</high>, e.g., <i>gradient descent</i>: `$$\boldsymbol \beta_{n+1} = \boldsymbol \beta_{n}+\gamma \nabla F(\boldsymbol \beta_{n})$$` ] .pull-right45[ <p align = "center"> <img src="image/gradient.png" height=420px><br> <font style="font-size:10px">adapted from <a href="https://me.me/i/machine-learning-gradient-descent-machine-learning-machine-learning-behind-the-ea8fe9fc64054eda89232d7ffc9ba60e">me.me</a></font> </p> ] --- .pull-left45[ # Fitting <p style="padding-top:1px"></p> There are two fundamentally different ways to find the set of parameters that minimizes loss. <font style="font-size:24"><b> Analytically </b> In rare cases, the parameters can be <high>directly calculated</high>, e.g., using the <i>normal equation</i>: `$$\boldsymbol \beta = (\boldsymbol X^T\boldsymbol X)^{-1}\boldsymbol X^T\boldsymbol y$$` <font style="font-size:24"><b> Numerically </b> In most cases, parameters need to be found using a <high>directed trial and error</high>, e.g., <i>gradient descent</i>: `$$\boldsymbol \beta_{n+1} = \boldsymbol \beta_{n}+\gamma \nabla F(\boldsymbol \beta_{n})$$` ] .pull-right45[ <br><br2> <p align = "center"> <img src="image/gradient1.gif" height=250px><br> <font style="font-size:10px">adapted from <a href="https://dunglai.github.io/2017/12/21/gradient-descent/ ">dunglai.github.io</a></font><br> <img src="image/gradient2.gif" height=250px><br> <font style="font-size:10px">adapted from <a href="https://dunglai.github.io/2017/12/21/gradient-descent/ ">dunglai.github.io</a></font> </p> ] --- # 2 types of supervised problems .pull-left5[ There are two types of supervised learning problems that can often be approached using the same model. <font style="font-size:24px"><b>Regression</b></font> Regression problems involve the <high>prediction of a quantitative feature</high>. E.g., predicting the cholesterol level as a function of age. <font style="font-size:24px"><b>Classification</b></font> Classification problems involve the <high>prediction of a categorical feature</high>. E.g., predicting the type of chest pain as a function of age. ] .pull-right4[ <p align = "center"> <img src="image/twotypes.png" height=440px><br> </p> ] --- # Logistic regression .pull-left45[ In [logistic regression](https://en.wikipedia.org/wiki/Logistic_regression), the class criterion `\(Y \in (0,1)\)` is modeled also as the <high>sum of feature times weights</high>, but with the prediction being transformed using a <high>logistic link function</high>: <p style="padding-top:10px"></p> `$$\large \hat{Y} = Logistic(\beta_{0} + \beta_{1} \times X_1 + ...)$$` <p style="padding-top:10px"></p> The logistic function <high>maps predictions to the range of 0 and 1</high>, the two class values. <p style="padding-top:10px"></p> $$ Logistic(x) = \frac{1}{1+exp(-x)}$$ ] .pull-right45[ <img src="SupervisedLearning_files/figure-html/unnamed-chunk-7-1.png" width="90%" style="display: block; margin: auto;" /> ] --- # Logistic regression .pull-left45[ In [logistic regression](https://en.wikipedia.org/wiki/Logistic_regression), the class criterion `\(Y \in (0,1)\)` is modeled also as the <high>sum of feature times weights</high>, but with the prediction being transformed using a <high>logistic link function</high>: <p style="padding-top:10px"></p> `$$\large \hat{Y} = Logistic(\beta_{0} + \beta_{1} \times X_1 + ...)$$` <p style="padding-top:10px"></p> The logistic function <high>maps predictions to the range of 0 and 1</high>, the two class values. <p style="padding-top:10px"></p> $$ Logistic(x) = \frac{1}{1+exp(-x)}$$ ] .pull-right45[ <img src="SupervisedLearning_files/figure-html/unnamed-chunk-8-1.png" width="90%" style="display: block; margin: auto;" /> ] --- # Classification loss - two ways .pull-left45[ <font style="font-size:24px"><b>Distance</b></font> Logloss is <high>used to fit the parameters</high>, alternative distance measures are MSE and MAE. `$$\small LogLoss = -\frac{1}{n}\sum_{i}^{n}(log(\hat{y})y+log(1-\hat{y})(1-y))$$` `$$\small MSE = \frac{1}{n}\sum_{i}^{n}(y-\hat{y})^2, \: MAE = \frac{1}{n}\sum_{i}^{n} \lvert y-\hat{y} \rvert$$` <font style="font-size:24px"><b>Overlap</b></font> Does the <high>predicted class match the actual class</high>. Often preferred for <high>ease of interpretation</high>. `$$\small Loss_{01}=\frac{1}{n}\sum_i^n I(y \neq \lfloor \hat{y} \rceil)$$` ] .pull-right45[ <img src="SupervisedLearning_files/figure-html/unnamed-chunk-9-1.png" width="90%" style="display: block; margin: auto;" /> ] --- # Confusion matrix .pull-left45[ The confusion matrix <high>tabulates prediction matches and mismatches</high> as a function of the true class. The confusion matrix permits specification of a number of <high>helpful performance metrics</high>. <br> <u> Confusion matrix </u> <table style="cellspacing:0; cellpadding:0; border:none;"> <tr> <td> </td> <td> <eq><b>ŷ = 1</b></eq> </td> <td> <eq><b>ŷ = 0</b></eq> </td> </tr> <tr> <td bgcolor="white"> <eq><b>y = 1</b></eq> </td> <td bgcolor="white"> <font color="#6ABA9A"> True positive (TP)</font> </td> <td bgcolor="white"> <font color="#EA4B68"> False negative (FN)</font> </td> </tr> <tr> <td> <eq><b>y = 0</b></eq> </td> <td> <font color="#EA4B68"> False positive (FP)</font> </td> <td> <font color="#6ABA9A"> True negative (TN)</font> </td> </tr> </table> ] .pull-right45[ <b>Accuracy</b>: Of all cases</i>, what percent of predictions are correct? `$$\small Acc. = \frac{TP + TN}{ TP + TN + FN + FP} = 1-Loss_{01}$$` <p style="padding-top:10px"></p> <b>Sensitivity</b>: Of the truly Positive cases</i>, what percent of predictions are correct? `$$\small Sensitivity = \frac{TP}{ TP +FN }$$` <b>Specificity</b>: Of the truly Negative cases</i>, what percent of predictions are correct? <p style="padding-top:10px"></p> `$$\small Specificity = \frac{TN}{ TN + FP }$$` ] --- # Confusion matrix .pull-left45[ The confusion matrix <high>tabulates prediction matches and mismatches</high> as a function of the true class. The confusion matrix permits specification of a number of <high>helpful performance metrics</high>. <br> <u> Confusion matrix </u> <table style="cellspacing:0; cellpadding:0; border:none;"> <tr> <td> </td> <td> <eq><b>"Default"</b></eq> </td> <td> <eq><b>"Repay"</b></eq> </td> </tr> <tr> <td bgcolor="white"> <eq><b>Default</b></eq> </td> <td bgcolor="white"> <font color="#6ABA9A"> TP = 3</font> </td> <td bgcolor="white"> <font color="#EA4B68"> FN = 1</font> </td> </tr> <tr> <td> <eq><b>Repay</b></eq> </td> <td> <font color="#EA4B68"> FP = 1</font> </td> <td> <font color="#6ABA9A"> TN = 2</font> </td> </tr> </table> ] .pull-right45[ <b>Accuracy</b>: Of all cases</i>, what percent of predictions are correct? `$$\small Acc. = \frac{TP + TN}{ TP + TN + FN + FP} = 1-Loss_{01}$$` <p style="padding-top:10px"></p> <b>Sensitivity</b>: Of the truly Positive cases</i>, what percent of predictions are correct? `$$\small Sensitivity = \frac{TP}{ TP +FN }$$` <b>Specificity</b>: Of the truly Negative cases</i>, what percent of predictions are correct? <p style="padding-top:10px"></p> `$$\small Specificity = \frac{TN}{ TN + FP }$$` ] --- class: center, middle <font color = "gray"><h1>Fitting</h1></font> <a><h1>Evaluation</h1></a> <font color = "gray"><h1>Tuning</h1></font> --- # Hold-out data .pull-left45[ Model performance must be evaluated as true prediction on an <high>unseen data set</high>. The unseen data set can be <high>naturally</high> occurring, e.g., using 2019 stock prizes to evaluate a model fit using 2018 stock prizes. More commonly unseen data is created by <high>splitting the available data</high> into a training set and a test set. ] .pull-right45[ <p align = "center"> <img src="image/testdata.png" height=430px> </p> ] --- # Training Training a model means to <high>fit the model</high> to data by finding the parameter combination that <high>minizes some error function</high>, e.g., mean squared error (MSE). <p align = "center" style="padding-top:30px"> <img src="image/training_flow.png" height=350px> </p> --- # Test To test a model means to <high>evaluate the prediction error</high> for a fitted model, i.e., for a <high>fixed parameter combination</high>. <p align = "center" style="padding-top:30px"> <img src="image/testing_flow.png" height=350px> </p> --- .pull-left4[ <br><br> # Overfitting Occurs when a model <high>fits data too closely</high> and therefore <high>fails to reliably predict</high> future observations. In other words, overfitting occurs when a model <high>'mistakes' random noise for a predictable signal</high>. More <high>complex models</high> are more <high>prone to overfitting</high>. ] .pull-right5[ <br><br><br> <p align = "center" style="padding-top:0px"> <img src="image/overfitting.png"> </p> ] --- # Overfitting <img src="SupervisedLearning_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> --- class: center, middle <high><h1>Regression</h1></high> <font color = "gray"><h1>Decision Trees</h1></font> <font color = "gray"><h1>Random Forests</h1></font> --- # Regularized regression .pull-left45[ Penalizes regression loss for having large `\(\beta\)` values using the <high>lambda λ tuning parameter</high> and one of several penalty functions. $$Regularized \;loss = \sum_i^n (y_i-\hat{y}_i)^2+\lambda \sum_j^p f(\beta_j)) $$ <table style="cellspacing:0; cellpadding:0; border:none;"> <tr> <td bgcolor="white"> <b>Name</b> </td> <td bgcolor="white"> <b>Function</b> </td> <td bgcolor="white"> <b>Description</b> </td> </tr> <tr> <td bgcolor="white"> <i>Lasso</i> </td> <td bgcolor="white"> |β<sub>j</sub>| </td> <td bgcolor="white"> Penalize by the <high>absolute</high> regression weights. </td> </tr> <tr> <td bgcolor="white"> <i>Ridge</i> </td> <td bgcolor="white"> β<sub>j</sub><sup>2</sup> </td> <td bgcolor="white"> Penalize by the <high>squared</high> regression weights. </td> </tr> <tr> <td bgcolor="white"> <i>Elastic net</i> </td> <td bgcolor="white"> |β<sub>j</sub>| + β<sub>j</sub><sup>2</sup> </td> <td bgcolor="white"> Penalize by Lasso and Ridge penalties. </td> </tr> </table> ] .pull-right45[ <p align = "center"> <img src="image/bonsai.png"><br> <font style="font-size:10px">from <a href="https://www.mallorcazeitung.es/leben/2018/05/02/bonsai-liebhaber-mallorca-kunst-lebenden/59437.html">mallorcazeitung.es</a></font> </p> ] --- .pull-left45[ # Regularized regression <p style="padding-top:1px"></p> Despite <high>superficial similarities</high>, Lasso and Ridge show very different behavior. <p style="padding-top:10px"></p> <b>Ridge</b> By penalizing the most extreme βs most strongly, Ridge leads to (relatively) more <high>uniform βs</high>. <p style="padding-top:10px"></p> <b>Lasso</b> By penalizing all βs equally, irrespective of magnitude, Lasso drives some βs to 0 resulting effectively in <high>automatic feature selection</high>. ] .pull-right45[ <br> <p align = "center"> <font style="font-size:40"><i>Ridge</i></font><br> <img src="image/ridge.png" height=210px><br> <font style="font-size:10px">from <a href="https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf">James et al. (2013) ISLR</a></font> </p> <p align = "center"> <font style="font-size:40"><i>Lasso</i></font><br> <img src="image/lasso.png" height=210px><br> <font style="font-size:10px">from <a href="https://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf">James et al. (2013) ISLR</a></font> </p> ] --- class: center, middle <font color = "gray"><h1>Regression</h1></font> <high><h1>Decision Trees</h1></high> <font color = "gray"><h1>Random Forests</h1></font> --- # CART .pull-left45[ CART is short for <high>Classification and Regression Trees</high>, which are often just called <high>Decision trees</high>. In [decision trees](https://en.wikipedia.org/wiki/Decision_tree), the criterion is modeled as a <high>sequence of logical TRUE or FALSE questions</high>. <br><br> ] .pull-right45[ <p align = "center" style="padding-top:0px"> <img src="image/tree.png"> </p> ] --- # Classificiation trees .pull-left45[ Classification trees (and regression trees) are created using a relatively simple <high>three-step algorithm</high>. <u>Algorithm</u> 1 - <high>Split</high> nodes to maximize <b>purity gain</b> (e.g., Gini gain). 2 - <high>Repeat</high> until pre-defined threshold (e.g., `minsplit`) splits are no longer possible. 3 - <high>Prune</high> tree to reasonable size. ] .pull-right45[ <p align = "center" style="padding-top:0px"> <img src="image/tree.png"> </p> ] --- # Node splitting .pull-left45[ Classification trees attempt to <high>minize node impurity</high> using, e.g., the <high>Gini coefficient</high>. `$$\large Gini(S) = 1 - \sum_j^kp_j^2$$` Nodes are <high>split</high> using the variable and split value that <high>maximizes Gini gain</high>. `$$Gini \; gain = Gini(S) - Gini(A,S)$$` with `$$Gini(A, S) = \sum \frac{n_i}{n}Gini(S_i)$$` ] .pull-right45[ <p align = "center" style="padding-top:0px"> <img src="image/splitting.png"> </p> ] --- # Pruning trees .pull-left45[ Classification trees are <high>pruned</high> back such that every split has a purity gain of at least <high><mono>cp</mono></high>, with `cp` typically set to `.01`. Minimize: <br> $$ \large `\begin{split} Loss = & Impurity\,+\\ &cp*(n\:terminal\:nodes)\\ \end{split}` $$ ] .pull-right45[ <p align = "center" style="padding-top:0px"> <img src="image/splitting.png"> </p> ] --- # Pruning trees .pull-left45[ Classification trees are <high>pruned</high> back such that every split has a purity gain of at least <high><mono>cp</mono></high>, with `cp` typically set to `.01`. Minimize: <br> $$ \large `\begin{split} Loss = & Impurity\,+\\ &cp*(n\:terminal\:nodes)\\ \end{split}` $$ ] .pull-right45[ <p align = "center"> <img src="image/cp.png"> </p> ] --- # Regression trees .pull-left45[ Trees can also be used to perform regression tasks. Instead of impurity, regression trees attempt to <high>minimize within-node variance</high> (or maximize node homogeneity): `$$\large SSE = \sum_{i \in S_1}(y_i - \bar{y}_1)^2+\sum_{i \in S_2}(y_i - \bar{y}_2)^2$$` <u>Algorithm</u> 1 - <high>Split</high> nodes to maximize <b>homogeneity gain</b>. 2 - <high>Repeat</high> until pre-defined threshold (e.g., `minsplit`) splits are no longe possible. 3 - <high>Prune</high> tree to reasonable size. ] .pull-right45[ <p align = "center" style="padding-top:0px"> <img src="image/splitting_regr.png"> </p> ] --- class: center, middle <font color = "gray"><h1>Regression</h1></font> <font color = "gray"><h1>Decision Trees</h1></font> <high><h1>Random Forests</h1></high> --- .pull-left45[ # Random Forest <p style="padding-top:1px"></p> In [Random Forest](https://en.wikipedia.org/wiki/Random_forest), the criterion is modeled as the <high>aggregate prediction of a large number of decision trees</high> each based on different features. <br> <u>Algorithm</u> 1 - <high>Repeat</high> *n* times 1 - <high>Resample</high> data 2 - <high>Grow</high> non-pruned decision tree Each split <high>consider only <i>m</i> features</high> 2 - <high>Average</high> fitted values ] .pull-right45[ <br> <p align = "center" style="padding-top:0px"> <img src="image/rf.png"> </p> ] --- # Random Forest .pull-left45[ <p style="padding-top:1px"></p> Random forests make use of important machine learning elements, <high>resampling</high> and <high>averaging</high> that together are also referred to as <high>bagging</high>. <table style="cellspacing:0; cellpadding:0; border:none;"> <col width="30%"> <col width="70%"> <tr> <td bgcolor="white"> <b>Element</b> </td> <td bgcolor="white"> <b>Description</b> </td> </tr> <tr> <td bgcolor="white"> <i>Resampling</i> </td> <td bgcolor="white"> Creates new data sets that vary in their composition thereby <high>deemphasizing idiosyncracies</high> of the available data. </td> </tr> <tr> <td bgcolor="white"> <i>Averaging</i> </td> <td bgcolor="white"> Combining predictions typically <high>evens out idiosyncracies</high> of the models created from single data sets. </td> </tr> </table> ] .pull-right45[ <p align = "center" style="padding-top:0px"> <img src="image/tree_crowd.png"> </p> ] --- # Random Forest .pull-left45[ <p style="padding-top:1px"></p> Random forests make use of important machine learning elements, <high>resampling</high> and <high>averaging</high> that together are also referred to as <high>bagging</high>. <table style="cellspacing:0; cellpadding:0; border:none;"> <col width="30%"> <col width="70%"> <tr> <td bgcolor="white"> <b>Element</b> </td> <td bgcolor="white"> <b>Description</b> </td> </tr> <tr> <td bgcolor="white"> <i>Resampling</i> </td> <td bgcolor="white"> Creates new data sets that vary in their composition thereby <high>deemphasizing idiosyncracies</high> of the available data. </td> </tr> <tr> <td bgcolor="white"> <i>Averaging</i> </td> <td bgcolor="white"> Combining predictions typically <high>evens out idiosyncracies</high> of the models created from single data sets. </td> </tr> </table> ] .pull-right45[ <p align = "center"> <img src="image/mtry_parameter.png"> </p> ] --- class: center, middle <font color = "gray"><h1>Fitting</h1></font> <font color = "gray"><h1>Evaluation</h1></font> <a><h1>Tuning</h1></a> --- # Tuning .pull-left45[ All machine learning models are equipped with tuning parameters that <high> control model complexity<high>. These tuning parameters can be identified using a <high>validation set</high> created from the traning data. <u>Logic</u> 1 - Create separate test set. 2 - Fit model using various tuning parameters. 3 - Select tuning leading to best prediction on validation set. 4 - Refit model to entire training set (training + validation). ] .pull-right45[ <p align = "center" style="padding-top:0px"> <img src="image/validation.png" height=430px> </p> ] --- # Resampling methods .pull-left4[ Resampling methods automatize and generalize model tuning. <table style="cellspacing:0; cellpadding:0; border:none;"> <col width="30%"> <col width="70%"> <tr> <td bgcolor="white"> <b>Method</b> </td> <td bgcolor="white"> <b>Description</b> </td> </tr> <tr> <td bgcolor="white"> <i>k-fold cross-validation</i> </td> <td bgcolor="white"> Splits the data in k-pieces, use <high>each piece once</high> as the validation set, while using the other one for training. </td> </tr> <tr> <td bgcolor="white"> <i>Bootstrap</i> </td> <td bgcolor="white"> For <i>B</i> bootstrap rounds <high>sample</high> from the data <high>with replacement</high> and split the data in training and validation set. </td> </tr> </table> ] .pull-right5[ <p align = "center" style="padding-top:0px"> <img src="image/resample1.png"> </p> ] --- # Resampling methods .pull-left4[ Resampling methods automatize and generalize model tuning. <table style="cellspacing:0; cellpadding:0; border:none;"> <col width="30%"> <col width="70%"> <tr> <td bgcolor="white"> <b>Method</b> </td> <td bgcolor="white"> <b>Description</b> </td> </tr> <tr> <td bgcolor="white"> <i>k-fold cross-validation</i> </td> <td bgcolor="white"> Splits the data in k-pieces, use <high>each piece once</high> as the validation set, while using the other one for training. </td> </tr> <tr> <td bgcolor="white"> <i>Bootstrap</i> </td> <td bgcolor="white"> For <i>B</i> bootstrap rounds <high>sample</high> from the data <high>with replacement</high> and split the data in training and validation set. </td> </tr> </table> ] .pull-right5[ <p align = "center" style="padding-top:0px"> <img src="image/resample2.png"> </p> ] --- # Resampling methods .pull-left4[ Resampling methods automatize and generalize model tuning. <table style="cellspacing:0; cellpadding:0; border:none;"> <col width="30%"> <col width="70%"> <tr> <td bgcolor="white"> <b>Method</b> </td> <td bgcolor="white"> <b>Description</b> </td> </tr> <tr> <td bgcolor="white"> <i>k-fold cross-validation</i> </td> <td bgcolor="white"> Splits the data in k-pieces, use <high>each piece once</high> as the validation set, while using the other one for training. </td> </tr> <tr> <td bgcolor="white"> <i>Bootstrap</i> </td> <td bgcolor="white"> For <i>B</i> bootstrap rounds <high>sample</high> from the data <high>with replacement</high> and split the data in training and validation set. </td> </tr> </table> ] .pull-right5[ <p align = "center" style="padding-top:0px"> <img src="image/resample3.png"> </p> ] --- .pull-left45[ # <i>k</i>-fold cross validation for Ridge and Lasso <p style="padding-top:1px"></p> <b>Goal</b> Use 10-fold cross-validation to identify <high>optimal regularization parameters</high> for a regression model. <b>Consider</b> `\(\alpha \in 0, .5, 1\)` and `\(\lambda \in 1, 2, ..., 100\)` ] .pull-right45[ <br><br><br> <p align = "center"> <img src="image/lasso_process.png" height=460px> </p> ] --- class: middle, center <h1><a href=https://cdsbasel.github.io/dataanalytics/menu/materials.html>Materials</a></h1>