Data Mining Mehmed Kantardzic (good english books to read .txt) 📖
- Author: Mehmed Kantardzic
Book online «Data Mining Mehmed Kantardzic (good english books to read .txt) 📖». Author Mehmed Kantardzic
Simplicity and easy implementation are the main reasons why AdaBoost is very popular. It can be combined with any classifiers including neural networks, decision trees, or nearest neighbor classifiers. The algorithm requires almost no parameters to tune, and is still very effective even for the most complex classification problems, but at the same time it could be sensitive to noise and outliers.
Ensemble-learning approach showed all advantages in one very famous application, Netflix $1 million competition. The Netflix prize required substantial improvement in the accuracy of predictions on how much someone is going to love a movie based on his or her previous movie preferences. Users’ rating for movies was 1 to 5 stars; therefore, the problem was classification task with five classes. Most of the top-ranked competitors have used some variations of ensemble learning, showing its advantages in practice. Top competitor BellKor team explains ideas behind its success: “Our final solution consists of blending 107 individual predictors. Predictive accuracy is substantially improved when blending multiple predictors. Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a single technique. Consequently, our solution is an ensemble of many methods.”
8.5 REVIEW QUESTIONS AND PROBLEMS
1. Explain the basic idea of ensemble learning, and discuss why the ensemble mechanism is able to improve prediction accuracy of a model.
2. Designing an ensemble model, there are factors that directly affect the accuracy of the ensemble. Explain those factors and approaches to each of them.
3. Bagging and boosting are very famous ensemble approaches. Both of them generate a single predictive model from each different training set. Discuss the differences between bagging and boosting, and explain the advantages and disadvantages of each of them.
4. Propose the efficient boosting approach for a large data set.
5. In the bagging methodology, a subset is formed by samples that are randomly selected with replacement from training samples. On average, a subset contains approximately what percentage of training samples?
6. In Figure 8.7, draw a picture of the next distribution D4.
Figure 8.7. AdaBoost iterations.
7. In equation (2) of the AdaBoost algorithm (Fig. 8.8), replaces the term of . Explain how and why this change influences the AdaBoost algorithm.
Figure 8.8. AdaBoost algorithm.
Figure 8.9. Top competitors in 2007/2008 for Netflix award.
8. Consider the following data set, where there are 10 samples with one dimension and two classes:
Training samples:
(a) Determine ALL the best one-level binary decision trees.
(e.g., IF f1 ≤ 0.35, THEN Class is 1, and IF f1 > 0.35, THEN Class is −1. The accuracy of that tree is 80%)
(b) We have the following five training data sets randomly selected from the above training samples. Apply the bagging procedure using those training data sets.
(i) Construct the best one-level binary decision tree from each training data set.
(ii) Predict the training samples using each constructed one-level binary decision tree.
(iii) Combine outputs predicted by each decision tree using voting method
(iv) What is the accuracy rate provided by bagging?
Training Data Set 1:
Training Data Set 2:
Training Data Set 3:
Training Data Set 4:
Training Data Set 5:
(c) Applying the AdaBoost algorithm (Fig. 8.8) to the above training samples, we generate the following initial one-level binary decision tree from those samples:
To generate the next decision tree, what is the probability (D2 in Fig. 8.8) that each sample is selected to the training data set? (αt is defined as an accuracy rate of the initial decision tree on training samples.)
9. For classifying a new sample into four classes: C1, C2, C3, and C4, we have an ensemble that consists of three different classifiers: Classifier 1, Classifiers 2, and Classifier 3. Each of them has 0.9, 0.6, and 0.6 accuracy rate on training samples, respectively. When the new sample, X, is given, the outputs of the three classifiers are as follows:
Each number in the above table describes the probability that a classifier predicts the class of a new sample as a corresponding class. For example, the probability that Classifier 1 predicts the class of X as C1 is 0.9.
When the ensemble combines predictions of each of them, as a combination method:
(a) If the simple sum is used, which class is X classified as and why?
(b) If the weight sum is used, which class is X classified as and why?
(c) If the rank-level fusion is used, which class is X classified as and why?
10. Suppose you have a drug discovery data set, which has 1950 samples and 100,000 features. You must classify chemical compounds represented by structural molecular features as active or inactive using ensemble learning. In order to generate diverse and independent classifiers for an ensemble, which ensemble methodology would you choose? Explain the reason for selecting that methodology.
11. Which of the following is a fundamental difference between bagging and boosting?
(a) Bagging is used for supervised learning. Boosting is used with unsupervised clustering.
(b) Bagging gives varying weights to training instances. Boosting gives equal weight to all training instances.
(c) Bagging does not take the performance of previously built models into account when building a new model. With boosting each new model is built based upon the results of previous models.
(d) With boosting, each model has an equal weight in the classification of new instances. With bagging, individual models are given varying weights.
8.6 REFERENCES FOR FURTHER STUDY
Brown, G., Ensemble Learning, in Encyclopedia of Machine Learning, C. Sammut and Webb G. I., eds., Springer Press, New York, 2010.
“Ensemble learning refers to the procedures employed to train multiple learning machines and combine their outputs, treating them as a committee of decision makers.” The principle is that the committee decision, with individual predictions combined appropriately, should have better overall accuracy, on average, than any individual committee member. Numerous empirical and theoretical studies have demonstrated that ensemble models very often attain higher accuracy than single models. Ensemble methods constitute some of
Comments (0)