Data Mining Mehmed Kantardzic (good english books to read .txt) 📖
- Author: Mehmed Kantardzic
Book online «Data Mining Mehmed Kantardzic (good english books to read .txt) 📖». Author Mehmed Kantardzic
13.3.1 Representation
The first step in the GA is to represent the solution alternative (a value for the input feature) in a coded-string format. Typically, the string is a series of features with their values; each feature’s value can be coded with one from a set of discrete values called the allele set. The allele set is defined according to the needs of the problem, and finding the appropriate coding method is a part of the art of using GAs. The coding method must be minimal but completely expressive. We will use a binary vector as a chromosome to represent real values of the single variable x. The length of the vector depends on the required precision, which, in this example, is selected as 1. Therefore, we need a minimum five-bit code (string) to accommodate the range with required precision:
For this example, the mapping from a real number to a binary code is defined by the relation (because a = 0):
Opposite mapping, from the binary code to the real value of the argument, is also unique:
and it will be used only for checking the intermediate results of optimization. For example, if we want to transform the value x = 11 into a binary string, the corresponding code will be 01011. On the other hand, code 11001 represents the decimal value x = 25.
13.3.2 Initial Population
The initialization process is very simple: We randomly create a population of chromosomes (binary codes) with the given length. Suppose that we decide that the parameter for the number of strings in the population is equal to four. Then one possible randomly selected population of chromosomes is
13.3.3 Evaluation
The evaluation function for binary vectors representing chromosomes is equivalent to the initial function f(x) where the given chromosome represents the binary code for the real value x. As noted earlier, the evaluation function plays the role of the environment, rating potential solutions in terms of their fitness. For our example, four chromosomes CR1 to CR4 correspond to values for input variable x:
Consequently, the evaluation function would rate them as follows:
The results of evaluating the chromosomes initially generated may be given in a tabular form, and they are represented in Table 13.2. The expected reproduction column shows “the evaluated quality” of chromosomes in the initial population. Chromosomes CR2 and CR4 are more likely to be reproduced in the next generation than CR1 and CR3.
TABLE 13.2. Evaluation of the Initial Population
13.3.4 Alternation
In the alternation phase, the new population is selected based on the population evaluated in the previous iteration. Clearly, the chromosome CR4 in our example is the best of the four chromosomes, since its evaluation returns the highest value. In the alternation phase, an individual may be selected depending on its objective-function value or fitness value. For maximization problems, the higher the individual’s fitness is, the more probable that it can be selected for the next generation. There are different schemes that can be used in the selection process. In the simple GA we proposed earlier, the roulette wheel selection technique, an individual is selected randomly depending on a computed probability of selection for each individual. The probability of selection is computed by dividing the individual’s fitness value by the sum of fitness values of the corresponding population, and these values are represented in column 5 of Table 13.2.
In the next step we design the roulette wheel, which is, for our problem, graphically represented in Figure 13.4.
Figure 13.4. Roulette wheel for selection of the next population.
Using the roulette wheel, we can select chromosomes for the next population. Suppose that the randomly selected chromosomes for the next generation are CR1, CR2, CR2, CR4 (the selection is in accordance with the expected reproduction—column 6 in Table 13.2). In the next step, these four chromosomes undergo the genetic operations: crossover and mutation.
13.3.5 Genetic Operators
Crossover is not necessarily applied to all pairs of selected individuals. A choice is made depending on a specified probability called PC, which is typically between 0.5 and 1. If crossover is not applied (PC = 0), the offspring are simply a duplication of the parents. For the process of crossover it is necessary to determine the percentage of the population that will perform the crossover. For our particular problem we use the following parameters of the GA:
1. population size, pop-size = 4 (the parameter was already used).
2. probability of crossover, PC = 1.
3. probability of mutation, PM = 0.001 (the parameter will be used in a mutation operation).
A value of 1 for the probability of crossover translates into a 100% crossover—all chromosomes will be included in the crossover operation.
The second set of parameters in this phase of a GA is the random selection of parents for crossover and positions in the strings where the crossover will be performed. Suppose that these are randomly selected pairs: CR1–CR2 and CR2–CR4, and crossover is after the third position in the strings for both pairs. Then the selected strings
will become, after crossover, a new population:
The second operator that can be applied in every iteration of a GA is mutation. For our example, the mutation operator has a probability of 0.1%, which means that in the 1000 transformed bits, a mutation will be performed only once. Because we transformed only 20 bits (one population of 4 × 5 bits is transformed into another), the probability that a mutation will occur is very small. Therefore, we can assume that the strings
Comments (0)