Welcome back. This is the third and final installment of the machine learning algorithms series. In this one, we'll cover Decision Trees and Random Forests.
You can think of decision trees as flow charts made up of a lot of If...Then...Else statements. An example could look something like this:
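To show the flow chart idea in code, here is a tiny hand-written sketch. The function name classify_fruit, the feature values ("round", "crisp", and so on), and the rules themselves are all made up for illustration. A real decision tree would learn its questions from data rather than have them typed in by hand.

```python
def classify_fruit(shape: str, texture: str) -> str:
    """Hand-written flow chart. The rules are illustrative, not learned from data."""
    # First question: is the fruit round?
    if shape == "round":
        # Second question (hypothetical): what is the texture?
        if texture == "crisp":
            return "apple"
        else:
            return "mango"
    else:
        # In this toy example, anything that is not round is called a mango.
        return "mango"


print(classify_fruit("round", "crisp"))
print(classify_fruit("oval", "soft"))
```

Each if statement plays the role of one box in the flow chart, and each return is one of the final answers at the bottom.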
Decision trees work by finding ways to split your data based on the different features of the dataset. Going back to the fruit example, a decision tree could start by asking whether the shape of the fruit is round. If the answer is yes, it could then ask what the texture of the fruit is, and so on. Eventually it would be able to separate the data and tell you which features are best at distinguishing a mango from an apple. The machine learning aspect comes in because all of this is done automatically, and the computer works out which features are most important. For our example, we only used a few features that separate mangos and apples, but imagine if we had 100 different features. The decision tree would be able to consider all of them and determine where the best split occurs.
One issue with decision trees is that they can sometimes be too good at fitting the data they were trained on, which causes overfitting: the tree picks up quirks of the training data that don't hold up anywhere else. Remember, the point of these algorithms is to eventually use them on a new set of data to make predictions. One way to reduce overfitting is to use the random forest algorithm.
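Going back to how a tree decides where to split, here is a minimal sketch of one common way to score a split, called Gini impurity. The toy FRUITS data and the function names are made up for illustration. The sketch also groups rows by every value of a feature at once, which is a simplification: many real implementations only make two-way splits.

```python
from collections import Counter

# Hypothetical toy data: (features, label) pairs with made-up values.
FRUITS = [
    ({"shape": "round", "texture": "smooth", "color": "red"}, "apple"),
    ({"shape": "round", "texture": "smooth", "color": "green"}, "apple"),
    ({"shape": "round", "texture": "smooth", "color": "red"}, "apple"),
    ({"shape": "oval", "texture": "smooth", "color": "green"}, "mango"),
    ({"shape": "oval", "texture": "smooth", "color": "yellow"}, "mango"),
    ({"shape": "oval", "texture": "smooth", "color": "red"}, "mango"),
]


def gini(labels):
    """Gini impurity: 0.0 means every label in the group is the same."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def split_impurity(rows, feature):
    """Weighted Gini impurity after grouping the rows by one feature's value."""
    groups = {}
    for features, label in rows:
        groups.setdefault(features[feature], []).append(label)
    total = len(rows)
    return sum(len(group) / total * gini(group) for group in groups.values())


def best_feature(rows, candidates):
    """Return the candidate feature whose split leaves the purest groups."""
    return min(candidates, key=lambda feature: split_impurity(rows, feature))


for name in ["shape", "texture", "color"]:
    print(name, round(split_impurity(FRUITS, name), 3))
print("best split:", best_feature(FRUITS, ["shape", "texture", "color"]))
```

In this made-up data, shape separates the two fruits perfectly, texture is the same for every fruit and so tells you nothing, and color lands somewhere in between. A tree would pick the feature with the lowest score, then repeat the same process inside each group.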
Instead of using just one decision tree on your dataset, a random forest builds a whole bunch of decision trees from the same data. For example, one tree might use the size and color of the fruit while another tree might use color and texture. The random aspect comes from the fact that each tree works with a randomly chosen set of features from your data and makes its decision using only those. The forest repeats this process many times and then combines the results of all the trees, both to make a prediction and to give you importance weights for the features. The main takeaway is that a single decision tree looks at all of the features and then chooses the best split. A random forest randomly selects some of the features and then chooses the best split among those, which makes your model much less likely to overfit your data! So exciting.
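The recipe above can be sketched in plain Python. This is a heavily simplified illustration, not a full random forest: every "tree" here asks only one question (often called a stump), the ROWS data is made up, and all of the function names are hypothetical. It also resamples the rows for each tree (bootstrapping), which is part of the usual random forest recipe even though the description above focuses on the random choice of features.

```python
import random
from collections import Counter

# Hypothetical toy data: (features, label) pairs with made-up values.
ROWS = [
    ({"shape": "round", "texture": "smooth", "color": "red"}, "apple"),
    ({"shape": "round", "texture": "smooth", "color": "green"}, "apple"),
    ({"shape": "round", "texture": "smooth", "color": "red"}, "apple"),
    ({"shape": "oval", "texture": "smooth", "color": "green"}, "mango"),
    ({"shape": "oval", "texture": "smooth", "color": "yellow"}, "mango"),
    ({"shape": "oval", "texture": "smooth", "color": "red"}, "mango"),
]
FEATURES = ["shape", "texture", "color"]


def gini(labels):
    """Gini impurity: 0.0 means every label in the group is the same."""
    counts = Counter(labels)
    return 1.0 - sum((n / len(labels)) ** 2 for n in counts.values())


def group_labels(rows, feature):
    """Collect the labels seen for each value of one feature."""
    groups = {}
    for features, label in rows:
        groups.setdefault(features[feature], []).append(label)
    return groups


def fit_stump(rows, candidates):
    """One-question tree: pick the best candidate feature, store a label per value."""
    def impurity(feature):
        groups = group_labels(rows, feature)
        return sum(len(g) / len(rows) * gini(g) for g in groups.values())

    feature = min(candidates, key=impurity)
    answers = {
        value: Counter(labels).most_common(1)[0][0]
        for value, labels in group_labels(rows, feature).items()
    }
    # Fallback for feature values this stump never saw during training.
    fallback = Counter(label for _, label in rows).most_common(1)[0][0]
    return feature, answers, fallback


def predict_stump(stump, features):
    feature, answers, fallback = stump
    return answers.get(features[feature], fallback)


def fit_forest(rows, feature_names, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw as many rows as we have, with replacement.
        sample = [rng.choice(rows) for _ in rows]
        # Each tree only gets to look at a random subset of the features.
        candidates = rng.sample(feature_names, n_features)
        forest.append(fit_stump(sample, candidates))
    return forest


def predict_forest(forest, features):
    """Majority vote across all the trees in the forest."""
    votes = Counter(predict_stump(stump, features) for stump in forest)
    return votes.most_common(1)[0][0]


forest = fit_forest(ROWS, FEATURES)
print(Counter(stump[0] for stump in forest))  # how often each feature was chosen
print(predict_forest(forest, {"shape": "round", "texture": "smooth", "color": "red"}))
```

Because no single tree sees every row and every feature, the quirks that one tree latches onto tend to get outvoted by the others. In practice you would reach for a library implementation rather than write this by hand.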