nim library

ReadFX

High-performance FASTA & FASTQ parsing for Nim. Wraps Heng Li's battle-tested kseq.h and adds a native Nim parser, gzip support, paired-end and interleaved pair reads, and a rich suite of sequence utilities.

$ nimble install readfx
Maximum throughput
Built on Heng Li's kseq.h C library via FFI. The pointer-based readFQPtr iterator reuses a single buffer across records for near-zero allocation overhead.
🗜️
Transparent gzip
Pass .fastq.gz or plain .fastq — the library decompresses on the fly. Stdin is also supported by passing "-" as the filename.
🔀
Paired-end reads
readFQPair, readFQPairPtr, and readFQInterleavedPairPtr cover both two-file and single-stream paired FASTQ workflows, with optional mate-name validation.
🧬
Sequence utilities
Reverse complement, GC content, nucleotide composition, quality trimming, low-quality masking, and subsequence extraction — all in one import.
🔬
IUPAC primer matching
findPrimerMatches finds primer binding sites using IUPAC ambiguity codes, with configurable mismatch thresholds.
🎛️
Flexible parsing APIs
Choose the right tradeoff: string-based convenience (readFQ), pointer-speed (readFQPtr), low-level control (readFastx), separate-file pair iterators, and interleaved zero-copy pair iteration with cached field lengths.
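
To make the idea behind IUPAC primer matching concrete, here is a minimal standalone sketch that uses only the Nim standard library. It is not readfx's implementation and does not call findPrimerMatches; the names iupacBases and findMatchesSketch, and the maxMismatches parameter, are hypothetical and chosen for illustration only.

```nim
import std/strutils

# Hypothetical helper: the set of concrete bases an IUPAC code stands for.
proc iupacBases(code: char): set[char] =
  case code.toUpperAscii
  of 'A': result = {'A'}
  of 'C': result = {'C'}
  of 'G': result = {'G'}
  of 'T', 'U': result = {'T'}
  of 'R': result = {'A', 'G'}
  of 'Y': result = {'C', 'T'}
  of 'S': result = {'C', 'G'}
  of 'W': result = {'A', 'T'}
  of 'K': result = {'G', 'T'}
  of 'M': result = {'A', 'C'}
  of 'B': result = {'C', 'G', 'T'}
  of 'D': result = {'A', 'G', 'T'}
  of 'H': result = {'A', 'C', 'T'}
  of 'V': result = {'A', 'C', 'G'}
  of 'N': result = {'A', 'C', 'G', 'T'}
  else: result = {}

# Hypothetical sketch: 0-based start positions where `primer` (IUPAC codes)
# matches `sequence` with at most `maxMismatches` mismatching positions.
proc findMatchesSketch(sequence, primer: string; maxMismatches = 0): seq[int] =
  if primer.len == 0 or primer.len > sequence.len:
    return
  for start in 0 .. sequence.len - primer.len:
    var mismatches = 0
    for i in 0 ..< primer.len:
      if sequence[start + i].toUpperAscii notin iupacBases(primer[i]):
        inc mismatches
        if mismatches > maxMismatches:
          break  # this window already exceeds the threshold
    if mismatches <= maxMismatches:
      result.add start

echo findMatchesSketch("ACGTAGGT", "ACGT", maxMismatches = 1)  # @[0, 4]
```

This brute-force scan costs time proportional to sequence length times primer length, which is usually acceptable for short primers. The library's own matcher may use a different strategy and a different signature, so check the API reference before relying on any detail shown here.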

Quick start

One import gives you everything — parsers, types, and utilities.

import readfx

# --- Iterate over FASTQ records (simplest API) ---
for record in readFQ("sample.fastq.gz"):
  echo record.name, " (", record.sequence.len, " bp)"

# --- Quality-trim then report GC content ---
for record in readFQ("sample.fastq.gz"):
  var r = record
  qualityTrim(r, minQual = 20)           # trim 3' bases with Phred < 20
  echo r.name, "  GC=", gcContent(r)    # GC fraction 0.0–1.0

# --- High-throughput: pointer-based (no string copies) ---
for record in readFQPtr("sample.fastq.gz"):
  echo $record.name, ": ", record.sequenceLen

# --- Paired-end reads ---
for pair in readFQPair("R1.fastq.gz", "R2.fastq.gz"):
  echo "R1: ", pair.read1.name, "  R2: ", pair.read2.name

# --- Paired-end pointer mode (zero-copy) ---
for pair in readFQPairPtr("R1.fastq.gz", "R2.fastq.gz", checkNames = true):
  echo pair.read1.sequenceLen + pair.read2.sequenceLen

# --- Interleaved paired-end pointer mode ---
for pair in readFQInterleavedPairPtr("reads.interleaved.fastq.gz", checkNames = true):
  echo pair.read1.sequenceLen + pair.read2.sequenceLen
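
The checkNames option above validates that the two mates of a pair belong together. The page does not spell out the exact rule, so the following is only a sketch of one common convention, assuming mates share a name apart from an optional trailing "/1" or "/2" suffix. The helpers mateBaseName and matesAgree are hypothetical and not part of readfx.

```nim
import std/strutils

# Hypothetical helper: drop a trailing "/1" or "/2" mate suffix, if present.
func mateBaseName(name: string): string =
  if name.len > 2 and (name.endsWith("/1") or name.endsWith("/2")):
    name[0 ..< name.len - 2]
  else:
    name

# Hypothetical check: two read names belong to the same pair when their
# base names are identical.
func matesAgree(name1, name2: string): bool =
  mateBaseName(name1) == mateBaseName(name2)

echo matesAgree("read7/1", "read7/2")  # true
echo matesAgree("read7/1", "read8/2")  # false
```

A check of this kind catches R1 and R2 files that have drifted out of sync, for example after one file was filtered independently of the other.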

Choose your parser

Every use case has an ideal API. Pick based on your throughput and convenience needs.

readFQ
Recommended
Yields FQRecord (Nim strings). Safe to store. Best for most programs.
readFQPtr
Fastest
Yields FQRecordPtr (reused C pointers). Ideal for streaming tens of millions of records. Because the buffer is reused, copy any field you need to keep beyond the current iteration.
readFastx
Low-level
Native Nim proc. Full control over the parse loop and I/O stream.
readFQPair
Paired-end
Reads R1 & R2 in lockstep. Yields FQPair with optional name validation.
readFQPairPtr
Paired-end Fastest
Reads R1 & R2 in lockstep. Yields reused pointer-based FQPairPtr records with optional name validation.
readFQInterleavedPairPtr
Interleaved Fastest
Reads one interleaved FASTQ stream. Yields FQPairPtr; read1 is backed by a scratch buffer, so its pointers stay valid only until the iterator next advances.
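
The difference between the string-based and pointer-based parsers comes down to buffer ownership. The sketch below simulates a reused-buffer iterator in plain Nim to show why a pointer record has to be copied before it is stored. It does not use readfx: RecordView, reusedViews, toOwned, and demo are all hypothetical names, and the real FQRecordPtr layout may differ.

```nim
# Hypothetical view type: points into a buffer owned by the iterator.
type RecordView = object
  data: ptr UncheckedArray[char]
  length: int

# Simulated parser: every record is written into the same buffer,
# so each yielded view aliases the previous one.
iterator reusedViews(records: openArray[string]): RecordView =
  var buffer = newString(256)  # single buffer reused for every record
  for rec in records:
    doAssert rec.len <= buffer.len
    for i in 0 ..< rec.len:
      buffer[i] = rec[i]
    yield RecordView(data: cast[ptr UncheckedArray[char]](addr buffer[0]),
                     length: rec.len)

# Copy the viewed bytes into an owned Nim string that is safe to keep.
proc toOwned(v: RecordView): string =
  result = newString(v.length)
  for i in 0 ..< v.length:
    result[i] = v.data[i]

proc demo(): tuple[stale: string, copies: seq[string]] =
  var firstView: RecordView
  var index = 0
  for view in reusedViews(["ACGT", "TTGA"]):
    if index == 0:
      firstView = view                      # keeps only the pointer
    else:
      result.stale = toOwned(firstView)     # now reads the second record
    result.copies.add toOwned(view)         # owned copy, safe to store
    inc index

let outcome = demo()
echo outcome.stale   # TTGA: the saved view no longer shows the first record
echo outcome.copies  # @["ACGT", "TTGA"]
```

The saved view silently changes content once the iterator advances, while the owned copies stay intact. That is the trade-off between the pointer-based iterators and readFQ: avoiding the copy is faster, but each record is only usable inside the current loop iteration.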

Documentation

Everything you need to go from installation to production.

📖
Tutorial
Step-by-step introduction for newcomers. What is FASTQ? How do I read it? What can I do with the records?
⚙️
Parsing Methods
Deep-dive into the full parser family, when to use each, and the performance trade-offs involved.
🗂️
Data Structures
FQRecord, FQRecordPtr, FQPair, FQPairPtr, SeqComp, Bufio, Interval — all types explained.
🛠️
Utilities
Reverse complement, GC content, quality trimming, base masking, composition analysis, and more.
🔍
API Reference
Full symbol index generated directly from source doc-comments by nimble docs.
💻
Source Code
Browse the repository, open issues, or contribute on GitHub at Quadram Institute Bioscience.