Seq is a programming language created in 2019.
#777on PLDB | 5Years Old |
git clone https://github.com/seq-lang/seq
A High-Performance Language for Bioinformatics. Here, we introduce Seq, the first language tailored specifically to bioinformatics, which marries the ease and productivity of Python with C-like performance. Seq is a subset of Python—and in many cases a drop-in replacement—yet also incorporates novel bioinformatics- and computational genomics-oriented data types, language constructs and optimizations. Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications. On equivalent CPython code, Seq attains a performance improvement of up to two orders of magnitude, and a 175× improvement once domain-specific language features and optimizations are used. With parallelism, we demonstrate up to a 650× improvement. Compared to optimized C++ code, which is already difficult for most biologists to produce, Seq frequently attains up to a 2× improvement, and with shorter, cleaner code. Thus, Seq opens the door to an age of democratization of highly-optimized bioinformatics software.
from sys import argv
from genomeindex import *
# index and process 20-mers
def process(kmer: k20, index: GenomeIndex[k20]):
prefetch index[kmer], index[~kmer]
hits_fwd = index[kmer]
hits_rev = index[~kmer]