Other
Read books online » Other » Data Mining Mehmed Kantardzic (good english books to read .txt) 📖

Book online «Data Mining Mehmed Kantardzic (good english books to read .txt) 📖». Author Mehmed Kantardzic



1 ... 69 70 71 72 73 74 75 76 77 ... 193
Go to page:
classes.

Figure 6.10. Transformation of a decision tree into decision rules. (a) Decision tree; (b) decision rules.

For our trained decision tree in Figure 6.8, the corresponding decision rules will be If     Attribute1 = A and Attribute2 < = 70 Then Classification = CLASS1 (2.0/0); IfAttribute1 = A and Attribute2 > 70 Then Classification = CLASS2 (3.4/0.4); IfAttribute1 = B Then Classification = CLASS1 (3.2/0); IfAttribute1 = C and Attribute3 = True Then Classification = CLASS2 (2.4/0); IfAttribute1 = C and Attribute3 = False Then Classification = CLASS1 (3.0/0).

Rewriting the tree to a collection of rules, one for each leaf in the tree, would not result in a simplified model. The number of decision rules in the classification model can be extremely large and pruning of rules can improve readability of the model. In some cases, the antecedents of individual rules may contain irrelevant conditions. The rules can be generalized by deleting these superfluous conditions without affecting rule-set accuracy. What are criteria for deletion of rule conditions? Let rule R be

If A then Class C

and a more general rule R’ could be

If A’ then Class C

where A’ is obtained by deleting one condition X from A (A = A’∪X). The evidence for the importance of condition X must be found in the training samples. Each sample in the database that satisfies the condition A’ either satisfies or does not satisfy the extended conditions A. Also, each of these cases does or does not belong to the designated Class C. The results can be organized into a contingency 2 × 2 table: Class COther ClassesSatisfies condition XY1E1Does not satisfy condition XY2E2

There are Y1 + E1 cases that are covered by the original rule R, where R misclassifies E1 of them since they belong to classes other than C. Similarly, Y1 + Y2 + E1 + E2 is the total number of cases covered by rule R’, and E1 + E2 are errors. The criterion for the elimination of condition X from the rule is based on a pessimistic estimate of the accuracy of rules R and R’. The estimate of the error rate of rule R can be set to Ucf(Y1 + E1, E1), and for that of rule R’ to Ucf(Y1 + Y2 + E1 + E2, E1 + E2). If the pessimistic error rate of rule R’ is no greater than that of the original rule R, then it makes sense to delete condition X. Of course, more than one condition may have to be deleted when a rule is generalized. Rather than looking at all possible subsets of conditions that could be deleted, the C4.5 system performs greedy elimination: At each step, a condition with the lowest pessimistic error is eliminated. As with all greedy searches, there is no guarantee that minimization in every step will lead to a global minimum.

If, for example, the contingency table for a given rule R is given in Table 6.3, then the corresponding error rates are

1. for initially given rule R:

2. for a general rule R’ without condition X:

TABLE 6.3. Contingency Table for the Rule R Class COther ClassesSatisfies condition X81Does not satisfy condition X70

Because the estimated error rate of the rule R’ is lower than the estimated error rate for the initial rule R, a rule set pruning could be done by simplifying the decision rule R and replacing it with R’.

One complication caused by a rule’s generalization is that the rules are no more mutually exclusive and exhaustive. There will be the cases that satisfy the conditions of more than one rule, or of no rules. The conflict resolution schema adopted in C4.5 (detailed explanations have not been given in this book) selects one rule when there is “multiple-rule satisfaction.” When no other rule covers a sample, the solution is a default rule or a default class. One reasonable choice for the default class would be the class that appears most frequently in the training set. C.4.5 uses a modified strategy and simply chooses as the default class the one that contains the most training samples not covered by any rule.

The other possibility of reducing the complexity of decision rules and decision trees is a process of grouping attribute values for categorical data. A large number of values cause a large space of data. There is a concern that useful patterns may not be detectable because of the insufficiency of training data, or that patterns will be detected but the model will be extremely complex.. To reduce the number of attribute values it is necessary to define appropriate groups. The number of possible splitting is large: for n values, there exist 2n−1 − 1 nontrivial binary partitions. Even if the values are ordered, there are n − 1 “cut values” for binary splitting. A simple example, which shows the advantages of grouping categorical values in decision-rules reduction, is given in Figure 6.11.

Figure 6.11. Grouping attribute values can reduce decision-rules set.

C4.5 increases the number of grouping combinations because it does not include only binary categorical data, but also n-ary partitions. The process is iterative, starting with an initial distribution where every value represents a separate group, and then, for each new iteration, analyzing the possibility of merging the two previous groups into one. Merging is accepted if the information gain ratio (explained earlier) is nondecreasing. A final result may be two or more groups that will simplify the classification model based on decision trees and decision rules.

C4.5 was superseded in 1997 by a commercial system C5.0. The changes include

1. a variant of boosting technique, which constructs an ensemble of classifiers that are then voted to give a final classification, and

2. new data types such as dates, work with “not applicable” values, concept of variable misclassification costs, and mechanisms to pre-filter attributes.

C5.0 greatly improves scalability of both decision trees and rule sets, and enables successful applications with large real-world data sets. In

1 ... 69 70 71 72 73 74 75 76 77 ... 193
Go to page:

Free ebook «Data Mining Mehmed Kantardzic (good english books to read .txt) 📖» - read online now

Comments (0)

There are no comments yet. You can be the first!
Add a comment