High-performance FASTA & FASTQ parsing for Nim.
Wraps Heng Li's battle-tested kseq.h and
adds a native Nim parser, gzip support, paired-end and interleaved pair reads,
and a rich suite of sequence utilities.
Built on the kseq.h C library via FFI.
The pointer-based readFQPtr iterator reuses a
single buffer across records for near-zero allocation overhead.
Reads .fastq.gz or plain .fastq;
the library decompresses on the fly. Stdin is also supported
by passing "-" as the filename.
readFQPair, readFQPairPtr, and
readFQInterleavedPairPtr cover both two-file and
single-stream paired FASTQ workflows, with optional mate-name
validation.
findPrimerMatches finds primer binding sites
using IUPAC ambiguity codes, with configurable mismatch
thresholds.
Simple iteration (readFQ), pointer-speed (readFQPtr),
low-level control (readFastx), separate-file pair
iterators, and interleaved zero-copy pair iteration with cached
field lengths.
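To make the idea behind IUPAC-aware primer matching concrete, here is a minimal standalone sketch in plain Nim. It is not the implementation of findPrimerMatches and does not use readfx. The names iupacSet and findSites are hypothetical helpers introduced only for illustration. The sketch assumes uppercase input and scans only the forward strand.

```nim
# Illustrative sketch only: NOT readfx's findPrimerMatches.
# iupacSet and findSites are hypothetical names; input is assumed uppercase.

proc iupacSet(code: char): set[char] =
  ## Bases permitted by one IUPAC symbol.
  ## Unknown symbols leave `result` as the empty set, so they never match.
  case code
  of 'A': result = {'A'}
  of 'C': result = {'C'}
  of 'G': result = {'G'}
  of 'T', 'U': result = {'T'}
  of 'R': result = {'A', 'G'}
  of 'Y': result = {'C', 'T'}
  of 'S': result = {'C', 'G'}
  of 'W': result = {'A', 'T'}
  of 'K': result = {'G', 'T'}
  of 'M': result = {'A', 'C'}
  of 'B': result = {'C', 'G', 'T'}
  of 'D': result = {'A', 'G', 'T'}
  of 'H': result = {'A', 'C', 'T'}
  of 'V': result = {'A', 'C', 'G'}
  of 'N': result = {'A', 'C', 'G', 'T'}
  else: discard

proc findSites(sequence, primer: string, maxMismatches = 0): seq[int] =
  ## 0-based start positions where `primer` aligns to `sequence`
  ## with at most `maxMismatches` mismatching positions.
  if primer.len == 0 or primer.len > sequence.len:
    return
  for start in 0 .. sequence.len - primer.len:
    var mismatches = 0
    for i in 0 ..< primer.len:
      if sequence[start + i] notin iupacSet(primer[i]):
        inc mismatches
        if mismatches > maxMismatches:
          break  # this window cannot qualify any more
    if mismatches <= maxMismatches:
      result.add start

# 'R' stands for A or G, so "ACRT" matches "ACGT" at positions 0 and 8.
echo findSites("ACGTTGCAACGT", "ACRT")  # @[0, 8]
```

Raising maxMismatches widens the set of accepted sites. A production matcher would typically also check the reverse complement.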
One import gives you everything — parsers, types, and utilities.
import readfx

# --- Iterate over FASTQ records (simplest API) ---
for record in readFQ("sample.fastq.gz"):
  echo record.name, " (", record.sequence.len, " bp)"

# --- Quality-trim then report GC content ---
for record in readFQ("sample.fastq.gz"):
  var r = record
  qualityTrim(r, minQual = 20)       # trim 3' bases with Phred < 20
  echo r.name, " GC=", gcContent(r)  # GC fraction 0.0–1.0

# --- High-throughput: pointer-based (no string copies) ---
for record in readFQPtr("sample.fastq.gz"):
  echo $record.name, ": ", record.sequenceLen

# --- Paired-end reads ---
for pair in readFQPair("R1.fastq.gz", "R2.fastq.gz"):
  echo "R1: ", pair.read1.name, " R2: ", pair.read2.name

# --- Paired-end pointer mode (zero-copy) ---
for pair in readFQPairPtr("R1.fastq.gz", "R2.fastq.gz", checkNames = true):
  echo pair.read1.sequenceLen + pair.read2.sequenceLen

# --- Interleaved paired-end pointer mode ---
for pair in readFQInterleavedPairPtr("reads.interleaved.fastq.gz", checkNames = true):
  echo pair.read1.sequenceLen + pair.read2.sequenceLen
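
Because "-" is accepted as a filename, the same iterators can sit in the middle of a shell pipeline. The sketch below counts records and bases read from stdin. It uses only readFQ and the record.sequence field shown above; the program name in the usage line is hypothetical.

```nim
import readfx

# Count records and total bases from a FASTQ stream on stdin.
var records = 0
var bases = 0
for record in readFQ("-"):  # "-" selects stdin, as described above
  inc records
  bases += record.sequence.len
echo records, " records, ", bases, " bp"
```

Assuming the program is compiled to a binary named countreads, it could be run as `cat sample.fastq | ./countreads`.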
Every use case has an ideal API. Pick based on your throughput and convenience needs.
readFQ: FQRecord (Nim strings). Safe to store.
Best for most programs.
readFQPtr: FQRecordPtr (reused C pointers).
Ideal for streaming tens of millions of records.
readFQPair: FQPair
with optional name validation.
readFQPairPtr: FQPairPtr records with optional name validation.
readFQInterleavedPairPtr: FQPairPtr
with scratch-backed read1 pointers valid until the next
iterator advance.
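
Since the pointer-based iterators reuse their buffers, anything that must outlive the current iteration has to be copied into a Nim string first. The following is a minimal sketch of that pattern, using only the readFQPtr fields that appear in the quick-start code (name and sequenceLen). The 150 bp cutoff and the file name are arbitrary examples.

```nim
import readfx

# Keep the names of long reads after the loop has finished.
var longReadNames: seq[string]
for record in readFQPtr("sample.fastq.gz"):
  if record.sequenceLen >= 150:
    # `$` copies the C string into a Nim string, so the stored value
    # stays valid after the iterator reuses its buffer.
    longReadNames.add($record.name)
echo longReadNames.len, " reads of at least 150 bp"
```

Storing the record itself, or any of its raw pointers, would leave references to memory that is overwritten on the next advance. When most records need to be kept, readFQ is likely the simpler choice.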
Everything you need to go from installation to production.
FQRecord, FQRecordPtr, FQPair, FQPairPtr,
SeqComp, Bufio, Interval —
all types explained.
nimble docs.