We can make predictions with machine learning by generalizing our data’s pertinent characteristics. Summarizing diverse datasets provides insight that can help produce more relevant generalizations.
Data predictions provide probabilities of future outcomes by mining and analyzing existing data, also called training data. Effective prediction is a mix of engineering, statistics, and intuition. Summarization can help by shaping this intuition. In the generalization phase, we test our training data against new data, called test data, to calculate if our model is good enough to be used in real life. These two processes simplify large multidimensional datasets, so machine learning predictions can be applied to them. This article describes how summarization leads to generalization and then prediction through a real estate example.