In the latest issue of Nature Biotechnology, MIT and Harvard University researchers describe a new algorithm that drastically reduces the time it takes to find a particular gene sequence in a database of genomes. Moreover, the more genomes it's searching, the greater the speedup it affords, so its advantages will only compound as more data is generated.
In initial trials, the new algorithm is between 2 and 4 times faster than BLAST, with a search time that scales with the amount of unique sequence in the database, and it achieves an accuracy of 96%. This is a major advance for genomics, and possibly also for other big-data applications that require comparisons among highly similar elements, because the researchers have built a system that builds its index directly on the compressed data.
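To make the idea of searching compressed data more concrete, here is a minimal Python sketch. It is not the researchers' algorithm: it assumes the simplest possible form of compression, in which identical sequences are stored only once, and it uses plain substring matching in place of BLAST-style alignment. The function names `build_compressed_index` and `search` and the sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_compressed_index(records):
    """Store each distinct sequence once, with the IDs of every record that carries it.

    Assumption: exact-duplicate removal stands in for the real compression scheme.
    """
    index = defaultdict(list)
    for record_id, sequence in records:
        index[sequence].append(record_id)
    return dict(index)


def search(index, query):
    """Scan only the distinct sequences, then expand each hit to all records sharing it.

    Assumption: str.find (first exact match) stands in for a real alignment search.
    """
    hits = []
    for sequence, record_ids in index.items():
        position = sequence.find(query)
        if position != -1:
            hits.extend((record_id, position) for record_id in record_ids)
    return sorted(hits)


# Hypothetical toy database: three records share one sequence, one differs.
records = [
    ("genome_a", "ACGTACGTTAGC"),
    ("genome_b", "ACGTACGTTAGC"),
    ("genome_c", "ACGTACGTTAGC"),
    ("genome_d", "ACGTTCGTTAGC"),
]

index = build_compressed_index(records)
matches = search(index, "CGTTAG")
```

In this toy example the search scans two distinct sequences instead of four records, and the work saved grows as more near-identical genomes are added, which is the intuition behind the compounding speedup described above.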