MUMmer
MUMmer is a bioinformatics software system for sequence alignment. It is based on the suffix tree data structure. It has been used to compare different genome assemblies to one another, which allows scientists to determine how a genome has changed. The name "MUMmer" comes from "Maximal Unique Matches", or MUMs.
The original algorithms in the MUMmer software package were designed by Art Delcher, Simon Kasif and Steven Salzberg. MUMmer was the first whole-genome comparison system developed in bioinformatics. It was originally applied to the comparison of two related strains of bacteria.
The MUMmer software is open source. The system is maintained primarily by Steven Salzberg and Arthur Delcher at the Center for Computational Biology at Johns Hopkins University.
MUMmer is a highly cited bioinformatics system in the scientific literature. According to Google Scholar, as of early 2013 the original MUMmer paper (Delcher et al., 1999)[1] had been cited 691 times; the MUMmer 2 paper (Delcher et al., 2002)[2] had been cited 455 times; and the MUMmer 3.0 article (Kurtz et al., 2004)[5] had been cited 903 times.
Overview
MUMmer is a fast algorithm for aligning entire genomes. It was first published in 1999 and has since been released in four major versions.
Versions of MUMmer
MUMmer 1
MUMmer 1, or simply MUMmer, consists of three parts. First, a suffix tree is built to find the MUMs. Second, the MUMs are ordered with a longest increasing subsequence (or longest common subsequence) computation. Finally, an alignment step closes the gaps between the ordered MUMs.
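As a rough illustration of the first two steps, the following Python sketch finds MUMs by brute force instead of with a suffix tree, then keeps the longest chain of MUMs that occur in the same order in both sequences. The function names `find_mums` and `longest_increasing_chain` and the `min_len` parameter are hypothetical and are not part of MUMmer. The quadratic search is only suitable for short strings, and the chaining step assumes that MUMs do not overlap.

```python
def find_mums(a, b, min_len=1):
    """Brute-force search for maximal unique matches (illustration only).

    Returns (start_in_a, start_in_b, length) tuples, ordered by start_in_a.
    """
    def count(text, pattern):
        # Count occurrences of pattern in text, including overlapping ones.
        n, start = 0, text.find(pattern)
        while start != -1:
            n += 1
            start = text.find(pattern, start + 1)
        return n

    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip pairs that could still be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right for as long as the characters agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            match = a[i:i + k]
            # Keep the match only if it is unique in both sequences.
            if k >= min_len and count(a, match) == 1 and count(b, match) == 1:
                mums.append((i, j, k))
    return mums


def longest_increasing_chain(mums):
    """Longest run of MUMs whose positions increase in both sequences."""
    mums = sorted(mums)  # order by position in the first sequence
    if not mums:
        return []
    best = [1] * len(mums)   # best[x]: longest chain ending at mums[x]
    prev = [-1] * len(mums)  # back-pointers used to rebuild the chain
    for x in range(len(mums)):
        for y in range(x):
            if mums[y][1] < mums[x][1] and best[y] + 1 > best[x]:
                best[x] = best[y] + 1
                prev[x] = y
    end = max(range(len(mums)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(mums[end])
        end = prev[end]
    return chain[::-1]


a, b = "ABCDxEFGH", "ABCDyEFGH"
print(longest_increasing_chain(find_mums(a, b, min_len=2)))
# [(0, 0, 4), (5, 5, 4)]
```

In this example the two MUMs are "ABCD" and "EFGH"; the single differing character between them is left as a gap for the third step.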
Interruptions between consecutive MUMs in the alignment are known as gaps. Other alignment algorithms are used to fill these gaps. The gaps fall into the following four classes:[4]
- A SNP interruption – when the two sequences are compared, a single character differs.
- An insertion – when the two sequences are compared, a subsequence appears in only one of them. At that position the other sequence has an empty gap.
- A highly polymorphic region – when the two sequences are compared, there is a subsequence in which every single character differs.
- A repeat – the repetition of a sequence. Since MUMs can only contain unique sequences, the gap can be one repetition of one of the MUMs.
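The four classes above can be told apart, at least roughly, from the coordinates of two neighbouring MUMs. The sketch below is a simplified heuristic and not MUMmer's own procedure: the function name `classify_gap` is hypothetical, MUMs are assumed to be `(start_in_a, start_in_b, length)` tuples, treating overlapping neighbours as a sign of a repeat is an assumption of this sketch, and any gap with unmatched characters on both sides is simply labelled a polymorphic region.

```python
def classify_gap(left, right):
    """Classify the gap between two neighbouring MUMs (heuristic sketch).

    Each MUM is a (start_in_a, start_in_b, length) tuple, and `left`
    precedes `right` in the first sequence.
    """
    left_a, left_b, left_len = left
    right_a, right_b, _ = right

    # Number of unmatched characters between the two MUMs in each sequence.
    gap_a = right_a - (left_a + left_len)
    gap_b = right_b - (left_b + left_len)

    if gap_a < 0 or gap_b < 0:
        # The MUMs overlap in one sequence (assumed here to indicate
        # a repeat with different copy numbers in the two sequences).
        return "repeat"
    if gap_a == 0 and gap_b == 0:
        return "no gap"
    if gap_a == 1 and gap_b == 1:
        return "SNP"
    if gap_a == 0 or gap_b == 0:
        # Extra characters are present in only one of the sequences.
        return "insertion"
    return "polymorphic region"


print(classify_gap((0, 0, 4), (5, 5, 4)))   # SNP
print(classify_gap((0, 0, 4), (4, 7, 4)))   # insertion
print(classify_gap((0, 0, 4), (4, 1, 4)))   # repeat
print(classify_gap((0, 0, 4), (10, 9, 4)))  # polymorphic region
```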
MUMmer 2
This version was redesigned to require less memory and to increase speed and accuracy. It also allows the alignment of larger genomes.
The main improvement was a reduction in the amount of memory needed to store the suffix tree, achieved by adopting the suffix-tree representation developed by Kurtz.
MUMmer 3
According to Stefan Kurtz and his co-authors, “the most significant technical improvement in MUMmer 3.0, is a complete rewrite of the suffix-tree code, based on the compact suffix-tree representation of”[5] the tree described in the article “Reducing the space requirement of suffix trees”.[6]
MUMmer 4
According to Guillaume Marçais and his team, MUMmer 4 brings further implementation improvements and introduces query parallelism. “MUMmer4 now includes options to save and load the suffix array for a given reference.”[7] This allows the suffix array to be built once, saved, and loaded again in later runs instead of being rebuilt.
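The idea of building an index once and reloading it later can be sketched in Python as follows. This is only an illustration of the concept: the JSON file layout, the naive suffix-array construction, and the function names `build_suffix_array`, `save_index`, `load_index` and `find_all` are all assumptions made for the example and do not reflect MUMmer's actual file format or options.

```python
import json
import os
import tempfile


def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix text.
    return sorted(range(len(text)), key=lambda i: text[i:])


def save_index(path, text, suffix_array):
    # Store the reference and its suffix array together (hypothetical format).
    with open(path, "w") as handle:
        json.dump({"text": text, "sa": suffix_array}, handle)


def load_index(path):
    with open(path) as handle:
        data = json.load(handle)
    return data["text"], data["sa"]


def find_all(text, suffix_array, pattern):
    """Return every start position of pattern, using binary search."""
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # All suffixes that begin with pattern are adjacent in the array.
    while lo < len(suffix_array) and text.startswith(pattern, suffix_array[lo]):
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)


reference = "GATTACAGATTACA"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reference.index.json")
    # First run: build the index and save it.
    save_index(path, reference, build_suffix_array(reference))
    # Later run: load the saved index instead of rebuilding it.
    text, suffix_array = load_index(path)
    print(find_all(text, suffix_array, "GATTA"))  # [0, 7]
```

For a large reference, the construction step dominates the running time, which is why reusing a saved index across many queries is attractive.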
Software - Open Source
MUMmer is open-source software and is available online.
Related Sequence Alignments
Other sequence alignment methods and tools include:
- Edit distance
- BLAST
- Bowtie
- BWA
- BLAT
- Mauve
- LASTZ
References
- Delcher, A. L.; Kasif, S.; Fleischmann, R. D.; Peterson, J.; White, O.; Salzberg, S. L. (1999). "Alignment of whole genomes". Nucleic Acids Research. 27 (11): 2369–2376. doi:10.1093/nar/27.11.2369. PMC 148804. PMID 10325427.
- Delcher, A. L.; Phillippy, A.; Carlton, J.; Salzberg, S. L. (2002). "Fast algorithms for large-scale genome alignment and comparison". Nucleic Acids Research. 30 (11): 2478–2483. doi:10.1093/nar/30.11.2478. PMC 117189. PMID 12034836.
- Delcher, A.; Harmon, D.; Kasif, S.; White, O.; Salzberg, S. (1999). "Improved microbial gene identification with GLIMMER". Nucleic Acids Research. 27 (23): 4636–4641. doi:10.1093/nar/27.23.4636. PMC 148753. PMID 10556321.
- Delcher, A.; Kasif, S.; Fleischmann, R.; Peterson, J.; White, O.; Salzberg, S. (1999). "Alignment of whole genomes". Nucleic Acids Research. 27 (11): 2369–2376. doi:10.1093/nar/27.11.2369. PMC 148804. PMID 10325427.
- Kurtz, S.; Phillippy, A.; Delcher, A.; Smoot, M.; Shumway, M.; Antonescu, C.; Salzberg, S. (2004). "Versatile and open software for comparing large genomes". Genome Biology. 5 (2): R12. doi:10.1186/gb-2004-5-2-r12. PMC 395750. PMID 14759262.
- Kurtz, S. (1999). "Reducing the space requirement of suffix trees". Software: Practice and Experience. 29 (13): 1149–1171. doi:10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O.
- Marçais, G.; Delcher, A.; Phillippy, A.; Coston, R.; Salzberg, S.; Zimin, A. (2018). "MUMmer4: A fast and versatile genome alignment system". PLOS Computational Biology. 14 (1): e1005944. Bibcode:2018PLSCB..14E5944M. doi:10.1371/journal.pcbi.1005944. PMC 5802927. PMID 29373581.