## Find-S Algorithm: Finding maximally specific hypotheses

Now after learning the concept of general-to-specific ordering of hypotheses, Now its time to use this partial ordering to organize the search for a hypothesis, that is consistent with the observed training examples. One way is to begin with the most specific possible hypothesis in H, then generalize this hypothesis each time it fails to cover an observed positive training example.

FIND-S algorithm is used for this purpose. Here are the steps for find-s algorithm.

To illustrate this algorithm, assume the learner is given the sequence of training examples from the EnjoySport task

1. The first step of FIND-S is to initialize h to the most specific hypothesis in H h — (Ø, Ø, Ø, Ø, Ø, Ø)
2. First training example x1 = < Sunny, Warm, Normal, Strong ,Warm ,Same>, EnjoySport = +ve. Observing the first training example, it is clear that hypothesis h is too specific. None of the “Ø” constraints in h are satisfied by this example, so each is replaced by the next more general constraint that fits the example h1 = < Sunny, Warm, Normal, Strong ,Warm, Same>.
3. Consider the second training example x2 = < Sunny, Warm, High, Strong, Warm, Same>, EnjoySport = +ve. The second training example forces the algorithm to further generalize h, this time substituting a “?” in place of any attribute value in h that is not satisfied by the new example. Now h2 =< Sunny, Warm, ?, Strong, Warm, Same>
4. Consider the third training example x3 =< Rainy, Cold, High, Strong, Warm, Change>,EnjoySport = — ve. The FIND-S algorithm simply ignores every negative example. So the hypothesis remain as before, so h3 =< Sunny, Warm, ?, Strong, Warm, Same>
5. Consider the fourth training example x4 =<Sunny,Warm,High,Strong, Cool,Change>, EnjoySport =+ve. The fourth example leads to a further generalization of h as h4 =< Sunny, Warm, ?, Strong, ?, ?>
6. So the final hypothesis is < Sunny, Warm, ?, Strong, ?, ?>

The above example, can be illustrated with the below figure.

The search begins (ho) with the most specific hypothesis in H, then considers increasingly general hypotheses (hl through h4) as mandated by the training examples. The search moves from hypothesis to hypothesis, searching from the most specific to progressively more general hypotheses along one chain of the partial ordering. At each step, the hypothesis is generalized only as far as necessary to cover the new positive example. Therefore, at each stage the hypothesis is the most specific hypothesis consistent with the training examples observed up to this point.

The key property of the FIND-S algorithm —

• FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples
• FIND-S algorithm’s final hypothesis will also be consistent with the negative examples provided the correct target concept is contained in H, and provided the training examples are correct.

There are several questions still left unanswered, such as:
1. Has the learner converged to the correct target concept ?. Although FIND-S will find a hypothesis consistent with the training data, it has no way to determine whether it has found the only hypothesis in H consistent with the data (i.e., the correct target concept), or whether there are many other consistent hypotheses as well.
2. Why prefer the most specific hypothesis ?. In case there are multiple hypotheses consistent with the training examples, FIND-S will find the most specific. It is unclear whether we should prefer this hypothesis over, say, the most general, or some other hypothesis of intermediate generality.
3. Are the training examples consistent ?. In most practical learning problems there is some chance that the training examples will contain at least some errors or noise. Such inconsistent sets of training examples can severely mislead FIND-S, given the fact that it ignores negative examples.
4. What if there are several maximally specific consistent hypotheses?. There can be several maximally specific hypotheses consistent with the data. Find S finds only one

More

More

More

More

More

More
Top