Popular Methods for Machine Learning

Machine learning involves programmatically training applications to predict outcomes. Numerous algorithms and techniques are available for addressing prediction problems, and in this discussion we will explore four popular methods: Logistic Regression, Decision Trees, Random Forests, and Neural Networks.

Logistic Regression

Logistic Regression is a supervised machine learning algorithm for binary classification: it models the probability that an observation belongs to one of two classes, such as a Boolean outcome (true or false). Despite historical roots reaching back to the 19th century, the method remains popular in machine learning today. The resulting function has an "S"-shaped graph, using the logistic (sigmoid) function to constrain predicted probabilities between 0 and 1.
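To make the "S"-shaped curve concrete, here is a minimal Python sketch of the logistic (sigmoid) function; the function name and sample values are illustrative, not part of the original example.

```python
# A minimal sketch of the logistic (sigmoid) function, which squashes any
# real-valued score into a probability between 0 and 1.
import numpy as np

def sigmoid(z):
    """Map a raw linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Scores far below 0 map near 0, scores far above 0 map near 1,
# tracing out the characteristic "S"-shaped curve.
for z in (-6, -2, 0, 2, 6):
    print(f"sigmoid({z:+d}) = {sigmoid(z):.3f}")
```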


For instance, consider predicting the likelihood that a person is approved for a mortgage based on their credit score and income. In the chart of colored data points, credit score and income are plotted, with the credit score mean-centered: scores above the population average become positive values and scores below it become negative values (the raw score itself cannot go below the minimum of 300). This dataset is used to predict mortgage approval, where red data points denote applicants who were approved and blue points denote applicants whose applications were declined.

Chart legend: red dots = application approved; blue dots = application denied.

Essentially, logistic regression tries to find a line that divides the data into two regions: one containing customers likely to be approved and the other containing those likely to be declined. As a supervised learning algorithm, it adjusts the position of that line on the graph until the predicted probabilities best match the observed outcomes. This is the underlying goal and intuition of logistic regression, sketched in code below.
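The following is a hedged sketch of the mortgage-approval example using scikit-learn. The applicant data below is made up purely for illustration; the course's actual dataset and preprocessing are not reproduced here.

```python
# Fit a logistic regression model on hypothetical mortgage-application data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical applicants: [mean-centered credit score, annual income in $1000s]
X = np.array([
    [-150, 35], [-90, 42], [-40, 50], [10, 55],
    [60, 75], [120, 90], [180, 110], [220, 130],
])
# 1 = approved (red dots), 0 = denied (blue dots)
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

model = LogisticRegression(max_iter=1000).fit(X, y)

# Predicted probability of approval for a new, made-up applicant
new_applicant = np.array([[50, 60]])
print("P(approved) =", model.predict_proba(new_applicant)[0, 1])

# The learned coefficients define the line dividing the two regions
print("coefficients:", model.coef_, "intercept:", model.intercept_)
```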

Decision Tree

A decision tree is a prevalent machine learning technique with an easily interpretable model that takes the form of a flowchart-like tree structure. This supervised learning approach takes the data and builds branches within a tree to separate the data into distinct groups.

The fundamental concept is to select a variable and create branches based on a decision at each node. The tree continues to split on different variables until it reaches the desired depth or a minimum amount of data per node, typically using a greedy algorithm to choose the split at each level.

For instance, consider predicting rain from three inputs: atmospheric pressure, wind speed, and temperature. At the initial decision point, the tree checks whether the temperature is greater than 70 degrees, splitting the data into two branches. On the left branch it evaluates wind speed: if wind speed exceeds 2 mph, the path ends with a prediction of "no rain"; otherwise it goes on to check air pressure. The same logical progression occurs on the right side of the tree. The algorithm systematically evaluates conditions, progressively splitting until each path reaches a leaf node.
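Here is a hedged sketch of the rain-prediction example with scikit-learn's DecisionTreeClassifier. The weather readings are invented for illustration, and the 70-degree and 2-mph thresholds described above are not hard-coded; the tree learns its own splits from this made-up data.

```python
# Train a small decision tree on hypothetical weather readings.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [atmospheric pressure (hPa), wind speed (mph), temperature (F)]
X = np.array([
    [1012, 1, 75], [1005, 3, 68], [1020, 2, 80], [998, 5, 65],
    [1015, 1, 72], [1002, 4, 60], [1018, 2, 78], [995, 6, 62],
])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # 1 = rain, 0 = no rain

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned flowchart-like structure of splits
print(export_text(tree, feature_names=["pressure", "wind_speed", "temperature"]))
```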


Random Forests

Random Forests, a popular machine learning method, extend the decision tree idea. A forest trains many decision trees, each on a different random sample of the data; each tree casts a "vote," and the final prediction is the one that receives the most votes across all trees.
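A minimal sketch of this voting idea follows, reusing the same made-up weather features as the decision tree example; the specific reading passed to the forest is hypothetical.

```python
# Train a Random Forest (an ensemble of decision trees) on made-up weather data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Features: [atmospheric pressure (hPa), wind speed (mph), temperature (F)]
X = np.array([
    [1012, 1, 75], [1005, 3, 68], [1020, 2, 80], [998, 5, 65],
    [1015, 1, 72], [1002, 4, 60], [1018, 2, 78], [995, 6, 62],
])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # 1 = rain, 0 = no rain

# Each tree is trained on a bootstrap sample; the forest takes a majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

sample = np.array([[1000, 4, 66]])  # a new, hypothetical reading
print("majority-vote prediction:", forest.predict(sample)[0])
print("share of trees voting 'rain':", forest.predict_proba(sample)[0, 1])
```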


Neural Networks

Primarily used in deep learning and inspired by biological neurons, neural networks have achieved substantial success in recent years. The algorithm, among the most capable in machine learning, consists of nodes (neurons) that take inputs, apply transformations, and pass signals onward, enabling complex models that produce the desired outputs. Advances in computation (GPUs) and algorithms (backpropagation) have made it practical to train networks with many more layers, resulting in more complex models.
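To illustrate how neurons transform inputs and pass signals forward, here is a minimal sketch of a single forward pass through a tiny two-layer network in plain NumPy. The weights are random placeholders; in practice they would be learned with backpropagation.

```python
# One forward pass through a tiny two-layer neural network.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """A common nonlinearity applied inside each hidden neuron."""
    return np.maximum(0, z)

def sigmoid(z):
    """Squashes the final score into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])                  # one input example, 3 features

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer: 1 neuron

hidden = relu(W1 @ x + b1)              # each neuron transforms its inputs
output = sigmoid(W2 @ hidden + b2)      # and passes the signal forward

print("predicted probability:", output[0])
```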

Applications of neural networks include simple classification, face recognition, computer vision, and speech recognition.

Conclusion 

Numerous other machine learning methods exist beyond those covered here, including Boosting, Support Vector Machines (SVM), LASSO, Ridge regression, weighted regression, and kernel regression. Model selection means choosing the appropriate method for the intended goal, evaluating model performance on validation datasets, and considering data quality alongside the chosen method.

References
Graph and Examples: AI For Business Specialization [https://www.coursera.org/specializations/ai-for-business-wharton]
