Subsequence
In mathematics, a subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements. For example, the sequence is a subsequence of obtained after removal of elements , , and . The relation of one sequence being the subsequence of another is a preorder.
The subsequence should not be confused with substring which can be derived from the above string by deleting substring . The substring is a refinement of the subsequence.
Common subsequence
Given two sequences X and Y, a sequence Z is said to be a common subsequence of X and Y, if Z is a subsequence of both X and Y. For example, if
- and
then a common subsequence of X and Y could be
This would not be the longest common subsequence, since Z only has length 3, and the common subsequence has length 4. The longest common subsequence of X and Y is .
Applications
Subsequences have applications to computer science,[1] especially in the discipline of bioinformatics, where computers are used to compare, analyze, and store DNA, RNA, and protein sequences.
Take two sequences of DNA containing 37 elements, say:
- SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
- SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA
The longest common subsequence of sequences 1 and 2 is:
- LCS(SEQ1,SEQ2) = CGTTCGGCTATGCTTCTACTTATTCTA
This can be illustrated by highlighting the 27 elements of the longest common subsequence into the initial sequences:
- SEQ1 = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA
- SEQ2 = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA
Another way to show this is to align the two sequences, i.e., to position elements of the longest common subsequence in a same column (indicated by the vertical bar) and to introduce a special character (here, a dash) in one sequence when two elements in the same column differ:
- SEQ1 = ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-
- | || ||| ||||| | | | | || | || | || | |||
- SEQ2 = -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA
Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases: adenine, guanine, cytosine and thymine.
Theorems
- Every infinite sequence of real numbers has an infinite monotone subsequence (This is a lemma used in the proof of the Bolzano–Weierstrass theorem).
- Every bounded infinite sequence in Rn has a convergent subsequence (This is the Bolzano–Weierstrass theorem).
- For every integers r and s, every finite sequence of length at least (r − 1)(s − 1) + 1 contains a monotonically increasing subsequence of length r or a monotonically decreasing subsequence of length s (This is the Erdős–Szekeres theorem).
See also
- Subsequential limit - the limit of some subsequence.
- Limit superior and limit inferior
- Longest increasing subsequence problem
Notes
- ↑ In computer science, string is often used as a synonym for sequence, but it is important to note that substring and subsequence are not synonyms. Substrings are consecutive parts of a string, while subsequences need not be. This means that a substring of a string is always a subsequence of the string, but a subsequence of a string is not always a substring of the string, see: Gusfield, Dan (1999) [1997]. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. USA: Cambridge University Press. p. 4. ISBN 0-521-58519-8.
This article incorporates material from subsequence on PlanetMath, which is licensed under the Creative Commons Attribution/Share-Alike License.