StatQuest Guide to Machine Learning by Josh Starmer

This project is all about Josh Starmer's StatQuest Guide to Machine Learning. I first learned about Josh from his YouTube videos. Although his videos are extremely entertaining, make no mistake: Josh is an expert in his field and masterfully breaks down each topic into digestible pieces. I took some time to code along with the examples in the book. The code that I wrote for this project is linear in nature and not functionalized or optimized. This is because I learned some of these topics as I was coding them and wanted to lay out the logic plainly so that I could be sure to master them.

Continuous Probability Distribution

Gaussian distributions, characterized by their bell-shaped curve, describe many naturally occurring patterns in data. The distribution is defined by the data's mean and variance; approximately 68% of the data falls within one standard deviation of the mean, 95% within two, and 99.7% within three. Its applications in machine learning include anomaly detection, feature scaling, and dimensionality reduction.

Gaussian Distribution Formula
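
The 68/95/99.7 rule above can be checked directly. Here is a quick sketch in Python (my own toy code, not from the book), using the standard library's error function to build the Gaussian CDF:

```python
import math

def gaussian_pdf(x, mean, sd):
    """Probability density of a Gaussian with the given mean and standard deviation."""
    coeff = 1.0 / (sd * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2 * sd ** 2))

def gaussian_cdf(x, mean, sd):
    """Cumulative probability P(X <= x), written in terms of the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Probability of falling within k standard deviations of the mean
for k in (1, 2, 3):
    p = gaussian_cdf(k, 0, 1) - gaussian_cdf(-k, 0, 1)
    print(f"within {k} sd: {p:.4f}")  # 0.6827, 0.9545, 0.9973
```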

Decision Trees

Decision trees, used for both classification and regression, are a cornerstone of machine learning. These models repeatedly split the data to maximize the homogeneity of each node, providing a robust method for handling non-linear data. Classification trees predict discrete categories, while regression trees predict continuous values, and both are foundational to numerous ensemble methods.

Classification Trees

Regression Trees
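
To make the splitting idea concrete, here is a small sketch of how a classification tree picks a threshold: try every midpoint between sorted feature values and keep the one with the lowest weighted Gini impurity. The dosage/effectiveness data is a made-up toy example of mine, not taken from the book:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return the (threshold, weighted Gini) pair with the lowest impurity."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= threshold]
        right = [l for v, l in pairs if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Toy data: drug dosage vs. whether the drug was effective
dosage = [3, 5, 10, 15, 20, 25]
effective = ["no", "no", "yes", "yes", "yes", "no"]
print(best_split(dosage, effective))  # (7.5, 0.25)
```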

Sum of Squared Residuals/Mean Squared Error

When designing a machine learning model, you must decide how to measure the trustworthiness of the model's outputs. Typically, a loss function is used to measure the difference between the model output and the observed data. Two of the simplest functions that are important to note are the Sum of Squared Residuals and the Mean Squared Error.

SSR and MSE
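
Both measures are simple enough to write out directly. A minimal sketch of my own, in the same plain style as the rest of the project:

```python
def ssr(observed, predicted):
    """Sum of Squared Residuals: total squared distance from the data to the model."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def mse(observed, predicted):
    """Mean Squared Error: the SSR divided by the number of observations."""
    return ssr(observed, predicted) / len(observed)

observed = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.5]
print(ssr(observed, predicted))  # 0.5
print(mse(observed, predicted))  # 0.1666...
```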

Discrete Probability Distributions

The Binomial Distribution is used in statistics to estimate probabilities when collecting more information would be cost prohibitive or time consuming. If we use a histogram to summarize discrete data, the Binomial Distribution can help fill in the blanks when bins are empty by providing a mathematical expression for the probabilities. The Binomial Distribution is used for determining probabilities of discrete data with binary outcomes. When determining the probabilities of events that happen in discrete units of time, use the Poisson Distribution.

Binomial
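
As a quick sketch of my own, the Binomial probability mass function can be computed with Python's standard library:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials, each succeeding with probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# e.g. probability of exactly 2 heads in 3 fair coin flips
print(binomial_pmf(2, 3, 0.5))  # 0.375
```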

Poisson
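
The Poisson probability mass function is just as short; another toy sketch of mine:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k events in an interval, given an average rate lam per interval)."""
    return lam ** k * exp(-lam) / factorial(k)

# e.g. probability of exactly 2 events when 3 are expected on average
print(poisson_pmf(2, 3))  # 0.2240...
```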

Gradient Descent

A collection of derivatives of the same function, each taken with respect to a different parameter, is called a gradient. Gradient Descent is used to find the optimal parameters that fit a model to the data. Gradient descent begins with a guess and then iteratively updates the parameters until the cost function is minimized.

Gradient Descent

Gradient Descent - Two Parameters
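 
Here is a plain, unoptimized sketch of two-parameter gradient descent: fitting a line's intercept and slope by stepping against the partial derivatives of the SSR. The three data points are a made-up toy set of mine:

```python
# Fit y = intercept + slope * x by minimizing the SSR with gradient descent
xs = [0.5, 2.3, 2.9]
ys = [1.4, 1.9, 3.2]

intercept, slope = 0.0, 1.0   # initial guesses
learning_rate = 0.01

for step in range(10000):
    # Partial derivatives of the SSR with respect to each parameter
    d_intercept = sum(-2 * (y - (intercept + slope * x)) for x, y in zip(xs, ys))
    d_slope = sum(-2 * x * (y - (intercept + slope * x)) for x, y in zip(xs, ys))
    # Step each parameter in the direction that lowers the SSR
    intercept -= learning_rate * d_intercept
    slope -= learning_rate * d_slope

print(round(intercept, 3), round(slope, 3))  # 0.949 0.641
```

With this small, well-conditioned data set the loop converges to the same answer as the least-squares formulas, which is a handy sanity check.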

Naive Bayes

The Naive Bayes algorithm is a simple yet notably effective method of classification, used to predict the label of given data. It is called naive because it assumes the features are independent of one another given the class. Naive Bayes is extremely fast compared to more sophisticated models and is easy to implement. Here is another example of how I used simple coding without much optimization in order to master the topic.

Gaussian Naive Bayes
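
To lay the logic out plainly, here is a toy Gaussian Naive Bayes sketch of mine: for each class, estimate a prior plus the mean and standard deviation of each feature, then score new data by summing log probabilities. The popcorn/soda numbers are invented for illustration, not the book's data:

```python
import math
from collections import defaultdict

def gaussian_pdf(x, mean, sd):
    """Gaussian likelihood of a single feature value."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def train(features, labels):
    """Per class: a prior probability plus the (mean, sd) of each feature."""
    by_class = defaultdict(list)
    for row, label in zip(features, labels):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / (len(col) - 1)
            stats.append((mean, math.sqrt(var)))
        model[label] = (len(rows) / len(features), stats)
    return model

def predict(model, row):
    """Pick the class with the highest log(prior) + sum of log likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        score += sum(math.log(gaussian_pdf(x, m, s)) for x, (m, s) in zip(row, stats))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data (hypothetical): grams of popcorn and ml of soda consumed,
# labeled by whether the person loves the film franchise
features = [[24, 750], [28, 800], [20, 700], [2, 120], [4, 90], [0, 150]]
labels = ["loves", "loves", "loves", "does not", "does not", "does not"]

model = train(features, labels)
print(predict(model, [25, 760]))  # loves
```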

Multinomial Naive Bayes
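
A Multinomial Naive Bayes sketch works the same way but uses word counts, with Laplace (add-one) smoothing so unseen words never get zero probability. The spam/normal messages below are a made-up toy example of mine:

```python
import math

def train_multinomial(docs, labels, alpha=1.0):
    """Per class: a prior plus smoothed per-word probabilities from word counts."""
    vocab = set(w for doc in docs for w in doc)
    model = {}
    for c in set(labels):
        class_docs = [doc for doc, l in zip(docs, labels) if l == c]
        counts = {w: alpha for w in vocab}          # Laplace smoothing
        for doc in class_docs:
            for w in doc:
                counts[w] += 1
        total = sum(counts.values())
        word_probs = {w: n / total for w, n in counts.items()}
        model[c] = (len(class_docs) / len(docs), word_probs)
    return model

def classify(model, doc):
    """Pick the class with the highest log(prior) + sum of log word probabilities."""
    scores = {}
    for c, (prior, word_probs) in model.items():
        scores[c] = math.log(prior) + sum(
            math.log(word_probs[w]) for w in doc if w in word_probs)
    return max(scores, key=scores.get)

# Toy messages, tokenized into word lists
normal = [["dear", "friend", "lunch"], ["dear", "friend", "money"]]
spam = [["money", "money", "money"], ["dear", "money", "money"]]
docs = normal + spam
labels = ["normal"] * 2 + ["spam"] * 2

model = train_multinomial(docs, labels)
print(classify(model, ["dear", "money", "money"]))  # spam
```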

Regularization

Regularization is a technique used to prevent overfitting by introducing a penalty to the loss term. Adding this penalty may reduce accuracy on the training data, but it increases the model's ability to generalize to unseen data.

Regularization
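
As a sketch of Ridge (L2) regularization: fit a line by gradient descent on the SSR plus lambda times the slope squared (the intercept is conventionally left unpenalized). The data points and hyperparameters are my own toy choices; a larger penalty visibly shrinks the slope toward zero:

```python
# Ridge regression on one feature: minimize SSR + lam * slope**2
xs = [0.5, 2.3, 2.9]
ys = [1.4, 1.9, 3.2]

def fit_ridge(xs, ys, lam, lr=0.01, steps=20000):
    intercept, slope = 0.0, 0.0
    for _ in range(steps):
        d_intercept = sum(-2 * (y - (intercept + slope * x)) for x, y in zip(xs, ys))
        # The penalty adds 2 * lam * slope to the slope's derivative
        d_slope = (sum(-2 * x * (y - (intercept + slope * x)) for x, y in zip(xs, ys))
                   + 2 * lam * slope)
        intercept -= lr * d_intercept
        slope -= lr * d_slope
    return intercept, slope

# A larger penalty shrinks the slope toward zero
for lam in (0.0, 1.0, 10.0):
    intercept, slope = fit_ridge(xs, ys, lam)
    print(lam, round(slope, 3))
```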

Sensitivity and Specificity

Sensitivity and specificity are two statistical measures of the performance of a binary classification test. Sensitivity (also known as Recall or the True Positive Rate) measures the proportion of actual positives that are correctly classified. Specificity (also known as the True Negative Rate) measures the proportion of actual negatives that are correctly classified.

Sensitivity and Specificity
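
These two measures are only a few lines of code. A quick sketch of mine, with made-up predictions:

```python
def sensitivity_specificity(actual, predicted, positive="yes"):
    """Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    return tp / (tp + fn), tn / (tn + fp)

actual    = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "no",  "yes", "no", "yes", "no", "no", "yes"]
print(sensitivity_specificity(actual, predicted))  # (0.75, 0.75)
```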

Support Vector Machines

Support Vector Machines are a set of supervised learning methods used for classification, regression, and outlier detection. SVMs used for binary classification are renowned for their effectiveness in high-dimensional spaces.

Support Vector Machine
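
A full SVM solver is beyond a few lines, but a soft-margin linear SVM can be sketched with sub-gradient descent on the hinge loss (this is in the style of the Pegasos algorithm, not necessarily the book's approach; the data and hyperparameters are my own toy choices):

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=1000):
    """Soft-margin linear SVM via sub-gradient descent on the hinge loss.
    Labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:   # inside the margin (or misclassified): push it out
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:            # margin satisfied: only apply regularization shrinkage
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Two well-separated toy clusters
points = [[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
print([predict(w, b, p) for p in points])
```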