Raita algorithm

In computer science, the Raita algorithm is a string-searching algorithm that improves on the performance of the Boyer-Moore-Horspool algorithm. Like the Boyer-Moore string-search algorithm, it preprocesses the pattern being searched for. It differs from the Boyer-Moore-Horspool algorithm in the order in which the characters of the pattern are compared with the text. The algorithm was published by Timo Raita in 1992.

Description

The Raita algorithm searches for a pattern "P" in a given text "T" by comparing characters of the pattern with characters of the text. A window of the same length as "P" is slid over "T", and at each window position the comparison proceeds as follows.

1. First, the last character of the pattern is compared with the rightmost character of the window.

2. If they match, the first character of the pattern is compared with the leftmost character of the window.

3. If they match again, the middle character of the pattern is compared with the middle character of the window.

If all three comparisons succeed, the remaining characters are compared, from the second character to the last but one. Whether the window matches or a mismatch occurs at any stage, the window is then shifted according to the bad character shift function computed in the preprocessing phase. This function is similar to the one proposed in the Boyer-Moore algorithm: as in the Boyer-Moore-Horspool algorithm, the shift is looked up using the text character aligned with the rightmost position of the window.
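The comparison order described above can be sketched as a small predicate that tests one window. This is a minimal illustration, not part of Raita's published code: the name raita_window_matches and its signature are hypothetical, and patterns of length 1 or 2 are handled by skipping the interior comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if pattern p (length m >= 1) equals the
   window w of the same length, probing characters in the order
   last, first, middle, and only then the interior. */
static int raita_window_matches(const char *p, size_t m, const char *w)
{
    assert(m >= 1);
    if (p[m - 1] != w[m - 1])   /* step 1: last character */
        return 0;
    if (p[0] != w[0])           /* step 2: first character */
        return 0;
    if (p[m / 2] != w[m / 2])   /* step 3: middle character */
        return 0;
    if (m <= 2)                 /* no interior characters to compare */
        return 1;
    /* remaining comparison: second character through last but one */
    return memcmp(p + 1, w + 1, m - 2) == 0;
}
```

The three probes are cheap rejections: most windows that do not match are discarded before the longer interior comparison is attempted.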

C Code for Raita Algorithm

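#include <stdio.h>
#include <string.h>

/* Assumed definitions: the original listing leaves these to the caller. */
#define ASIZE 256                                /* alphabet size */
#define OUTPUT(pos) printf("%d\n", (int)(pos))   /* report a match position */

/* Preprocessing: build the bad character shift table */
void preBmBc(char *x, int m, int bmBc[]) {
   int i;

   for (i = 0; i < ASIZE; ++i)
      bmBc[i] = m;
   for (i = 0; i < m - 1; ++i)
      bmBc[(unsigned char)x[i]] = m - i - 1;
}

void RAITA(char *x, int m, char *y, int n) {
   int j, bmBc[ASIZE];
   unsigned char c;
   char firstCh, *secondCh, middleCh, lastCh;

   if (m == 0)
      return;
   if (m == 1) {
      /* Single-character pattern: scan with memchr, then stop */
      char *match_ptr = y;
      while (match_ptr < y + n) {
         match_ptr = memchr(match_ptr, x[0], n - (match_ptr - y));
         if (match_ptr == NULL)
            break;
         OUTPUT(match_ptr - y);
         match_ptr++;
      }
      return;
   }

   /* Preprocessing */
   preBmBc(x, m, bmBc);
   firstCh = x[0];
   secondCh = x + 1;
   middleCh = x[m/2];
   lastCh = x[m - 1];

   /* Searching */
   j = 0;
   while (j <= n - m) {
      /* Index the shift table with an unsigned value */
      c = (unsigned char)y[j + m - 1];
      /* Compare last, first and middle characters, then the interior */
      if (lastCh == y[j + m - 1] && firstCh == y[j] &&
          middleCh == y[j + m/2] &&
          memcmp(secondCh, y + j + 1, m - 2) == 0)
         OUTPUT(j);
      j += bmBc[c];
   }
}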

Example

Pattern: abddb

Text: abbaabaabddbabadbb

Preprocessing stage:

  a b d
  4 3 1

Every other character has a shift of 5, the length of the pattern.

 Attempt 1:
 abbaabaabddbabadbb
 ....b
 Shift by 4 (bmBc[a])

The last character of the pattern is compared with the rightmost character of the window. It is a mismatch, so the window is shifted by 4, the value computed for "a" in the preprocessing stage.

 Attempt 2:
 abbaabaabddbabadbb
     A.d.B
 Shift by 3 (bmBc[b])

Here the last and first characters of the pattern match, but the middle character is a mismatch. The window is therefore shifted by 3, the preprocessed value for "b", the rightmost character of the window.

 Attempt 3:
 abbaabaabddbabadbb
        ABDDB
 Shift by 3 (bmBc[b])

An exact match is found here, but the algorithm continues until the window can no longer be moved.

 Attempt 4:
 abbaabaABDDBabadbb
           ....b
 Shift by 4 (bmBc[a])

At this stage the shift is 4, but the window cannot be moved 4 positions without running past the end of the text, so the algorithm terminates. Capital letters mark the exact match of the pattern in the text.
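The attempts in this example can be reproduced with a short tracing routine. The sketch below is illustrative only: the struct attempt record and the function raita_trace are hypothetical names introduced here, and the routine assumes a pattern of at least one character.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define ALPHABET (UCHAR_MAX + 1)

/* Hypothetical trace record: where a window started, the shift applied
   afterwards, and whether the window matched the pattern. */
struct attempt {
    size_t start;
    size_t shift;
    int matched;
};

/* Searches for x (length m >= 1) in y (length n) and stores up to cap
   attempts in out. Returns the total number of attempts made, which may
   exceed cap. */
static size_t raita_trace(const char *x, size_t m, const char *y, size_t n,
                          struct attempt *out, size_t cap)
{
    size_t bmBc[ALPHABET];
    size_t i, j = 0, count = 0;

    assert(m >= 1);
    /* Bad character table: default shift m, overridden for every pattern
       character except the last one. */
    for (i = 0; i < ALPHABET; ++i)
        bmBc[i] = m;
    for (i = 0; i + 1 < m; ++i)
        bmBc[(unsigned char)x[i]] = m - i - 1;

    while (j + m <= n) {
        unsigned char c = (unsigned char)y[j + m - 1];
        /* Probe order: last, first, middle, then the interior. */
        int ok = x[m - 1] == y[j + m - 1] &&
                 x[0] == y[j] &&
                 x[m / 2] == y[j + m / 2] &&
                 (m <= 2 || memcmp(x + 1, y + j + 1, m - 2) == 0);
        if (count < cap) {
            out[count].start = j;
            out[count].shift = bmBc[c];
            out[count].matched = ok;
        }
        ++count;
        j += bmBc[c];   /* shift by the rightmost window character */
    }
    return count;
}
```

Called with the pattern abddb and the text abbaabaabddbabadbb, this routine makes four attempts, starting at offsets 0, 4, 7 and 10 with shifts 4, 3, 3 and 4, and only the attempt at offset 7 is a match.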

Complexity

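1. The preprocessing stage takes O(m + σ) time, where "m" is the length of the pattern "P" and σ is the size of the alphabet, since the shift table has one entry per character.

2. The searching stage takes O(mn) time in the worst case, where "n" is the length of the text "T".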
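The gap between the worst and best cases can be made concrete by counting character comparisons. The following is a sketch under stated assumptions: raita_count_comparisons is a hypothetical instrumented variant written for this illustration, and it replaces memcmp with an explicit loop so that each interior comparison can be counted.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define ALPHABET (UCHAR_MAX + 1)

/* Hypothetical instrumented search: returns the number of character
   comparisons made while searching for x (length m >= 1) in y (length n). */
static size_t raita_count_comparisons(const char *x, size_t m,
                                      const char *y, size_t n)
{
    size_t bmBc[ALPHABET];
    size_t i, j = 0, comparisons = 0;

    assert(m >= 1);
    for (i = 0; i < ALPHABET; ++i)
        bmBc[i] = m;
    for (i = 0; i + 1 < m; ++i)
        bmBc[(unsigned char)x[i]] = m - i - 1;

    while (j + m <= n) {
        const char *w = y + j;
        ++comparisons;                       /* last character */
        if (x[m - 1] == w[m - 1]) {
            ++comparisons;                   /* first character */
            if (x[0] == w[0]) {
                ++comparisons;               /* middle character */
                if (x[m / 2] == w[m / 2]) {
                    /* interior: second character through last but one */
                    for (i = 1; i + 1 < m; ++i) {
                        ++comparisons;
                        if (x[i] != w[i])
                            break;
                    }
                }
            }
        }
        j += bmBc[(unsigned char)w[m - 1]];
    }
    return comparisons;
}
```

For the pattern aaaaa in a text of twenty a's, every one of the 16 windows matches and the shift is always 1, giving 16 × 6 = 96 comparisons, which grows in proportion to mn. For the same pattern in a text of twenty b's, each window is rejected after one comparison and shifted by 5, so only 4 comparisons are made.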

Algorithm [1]

References

  1. Raita, T. (1992). Tuning the Boyer-Moore-Horspool string searching algorithm. Software: Practice and Experience, 22(10), 879-884.
This article is issued from Wikipedia - version of the 10/27/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.