ReadFX

High-performance FASTA and FASTQ parsing for Nim. Wraps Heng Li's battle-tested kseq.h and adds a native Nim parser, gzip support, paired-end and interleaved read iterators, and a suite of sequence utilities.

$ nimble install readfx

⚡

Maximum throughput

Built on Heng Li's kseq.h C library via FFI. The pointer-based readFQPtr iterator reuses a single buffer across records for near-zero allocation overhead.

🗜️

Transparent gzip

Pass .fastq.gz or plain .fastq — the library decompresses on the fly. Stdin is also supported by passing "-" as the filename.

🔀

Paired-end reads

readFQPair, readFQPairPtr, and readFQInterleavedPairPtr cover both two-file and single-stream paired FASTQ workflows, with optional mate-name validation.
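To make the idea of mate-name validation concrete, here is a minimal sketch in plain Nim. It assumes mates share a name apart from an optional trailing "/1" or "/2" suffix; the helpers `stripMateSuffix` and `mateNamesMatch` are hypothetical names for illustration, and the rule ReadFX actually applies may differ.

```nim
import std/strutils

# Hypothetical helper: drop a trailing "/1" or "/2" mate suffix, if present.
proc stripMateSuffix(name: string): string =
  if name.endsWith("/1") or name.endsWith("/2"):
    result = name[0 ..< name.len - 2]
  else:
    result = name

# Hypothetical check: two reads are treated as mates when their
# suffix-free names are identical.
proc mateNamesMatch(name1, name2: string): bool =
  stripMateSuffix(name1) == stripMateSuffix(name2)

echo mateNamesMatch("read7/1", "read7/2")   # names agree
echo mateNamesMatch("read7/1", "read8/2")   # files are out of sync
```

A check of this kind is cheap and catches R1/R2 files that have drifted out of step, for example after one file was filtered independently.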

🧬

Sequence utilities

Reverse complement, GC content, nucleotide composition, quality trimming, low-quality masking, and subsequence extraction — all in one import.
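As a rough illustration of what two of these utilities compute, the following plain-Nim sketch implements reverse complement and GC fraction from scratch. The names `revCompSketch` and `gcFractionSketch` are hypothetical stand-ins, not the ReadFX procs, and the handling of ambiguous bases (mapped to 'N' here) is an assumption.

```nim
# Illustrative stand-ins, not the ReadFX implementations.
proc complementBase(b: char): char =
  case b
  of 'A', 'a': 'T'
  of 'C', 'c': 'G'
  of 'G', 'g': 'C'
  of 'T', 't': 'A'
  else: 'N'          # assumption: anything else becomes N

proc revCompSketch(s: string): string =
  # Write each complemented base into the mirrored position.
  result = newString(s.len)
  for i, b in s:
    result[s.len - 1 - i] = complementBase(b)

proc gcFractionSketch(s: string): float =
  # Fraction of G/C bases in the range 0.0 to 1.0; empty input gives 0.0.
  if s.len == 0:
    return 0.0
  var gc = 0
  for b in s:
    if b in {'G', 'C', 'g', 'c'}:
      inc gc
  result = gc / s.len

echo revCompSketch("ACGTT")          # AACGT
echo gcFractionSketch("GGCCAATT")    # 0.5
```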

🔬

IUPAC primer matching

findPrimerMatches finds primer binding sites using IUPAC ambiguity codes, with configurable mismatch thresholds.
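The matching idea can be sketched in plain Nim as follows. This is not the findPrimerMatches implementation, and its signature is not reproduced here; `iupacBases` and `findMatchesSketch` are hypothetical names. The sketch assumes an uppercase primer, a sequence of plain A/C/G/T, and a simple sliding-window comparison.

```nim
# Hypothetical sketch: expand an IUPAC code into the bases it allows.
proc iupacBases(code: char): set[char] =
  case code
  of 'A': result = {'A'}
  of 'C': result = {'C'}
  of 'G': result = {'G'}
  of 'T': result = {'T'}
  of 'R': result = {'A', 'G'}
  of 'Y': result = {'C', 'T'}
  of 'S': result = {'C', 'G'}
  of 'W': result = {'A', 'T'}
  of 'K': result = {'G', 'T'}
  of 'M': result = {'A', 'C'}
  of 'B': result = {'C', 'G', 'T'}
  of 'D': result = {'A', 'G', 'T'}
  of 'H': result = {'A', 'C', 'T'}
  of 'V': result = {'A', 'C', 'G'}
  of 'N': result = {'A', 'C', 'G', 'T'}
  else: result = {}

# Slide the primer along the sequence and report every start position
# where at most maxMismatches positions disagree.
proc findMatchesSketch(primer, sequence: string, maxMismatches = 0): seq[int] =
  if primer.len == 0 or primer.len > sequence.len:
    return
  for start in 0 .. sequence.len - primer.len:
    var mismatches = 0
    for i, code in primer:
      if sequence[start + i] notin iupacBases(code):
        inc mismatches
        if mismatches > maxMismatches:
          break
    if mismatches <= maxMismatches:
      result.add start

echo findMatchesSketch("ACRT", "TTACGTACATGG")   # @[2, 6]
```

Here R stands for A or G, so the primer matches both ACGT at position 2 and ACAT at position 6.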

🎛️

Flexible parsing APIs

Choose the right tradeoff: string-based convenience (readFQ), pointer-based speed (readFQPtr), low-level control (readFastx), separate-file pair iterators, and zero-copy iteration over interleaved pairs with cached field lengths.

Quick start

One import gives you everything — parsers, types, and utilities.

```nim
import readfx

# --- Iterate over FASTQ records (simplest API) ---
for record in readFQ("sample.fastq.gz"):
  echo record.name, " (", record.sequence.len, " bp)"

# --- Quality-trim then report GC content ---
for record in readFQ("sample.fastq.gz"):
  var r = record
  qualityTrim(r, minQual = 20)           # trim 3' bases with Phred < 20
  echo r.name, "  GC=", gcContent(r)    # GC fraction 0.0–1.0

# --- High-throughput: pointer-based (no string copies) ---
for record in readFQPtr("sample.fastq.gz"):
  echo $record.name, ": ", record.sequenceLen

# --- Paired-end reads ---
for pair in readFQPair("R1.fastq.gz", "R2.fastq.gz"):
  echo "R1: ", pair.read1.name, "  R2: ", pair.read2.name

# --- Paired-end pointer mode (zero-copy) ---
for pair in readFQPairPtr("R1.fastq.gz", "R2.fastq.gz", checkNames = true):
  echo pair.read1.sequenceLen + pair.read2.sequenceLen

# --- Interleaved paired-end pointer mode ---
for pair in readFQInterleavedPairPtr("reads.interleaved.fastq.gz", checkNames = true):
  echo pair.read1.sequenceLen + pair.read2.sequenceLen
```
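
For readers unfamiliar with the quality-trim step in the quick start, this plain-Nim sketch shows one simple way 3' trimming by Phred score can work. It assumes Phred+33 encoding and removes trailing bases until one meets the threshold; `trim3PrimeSketch` is a hypothetical name, and the algorithm qualityTrim actually uses may differ.

```nim
# Hypothetical sketch of 3' quality trimming, assuming Phred+33 encoding
# and that sequence and quality have the same length.
proc trim3PrimeSketch(sequence, quality: string, minQual: int,
                      offset = 33): tuple[bases, quals: string] =
  var keep = quality.len
  # Walk back from the 3' end while the last kept base is below minQual.
  while keep > 0 and ord(quality[keep - 1]) - offset < minQual:
    dec keep
  result = (sequence[0 ..< keep], quality[0 ..< keep])

# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
let trimmed = trim3PrimeSketch("ACGTAC", "IIII##", minQual = 20)
echo trimmed.bases   # ACGT
```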

Full tutorial →

Choose your parser

Every use case has an ideal API. Pick based on your throughput and convenience needs.

readFQ

Recommended

Yields FQRecord (Nim strings). Safe to store. Best for most programs.

readFQPtr

Fastest

Yields FQRecordPtr (reused C pointers). Ideal for streaming tens of millions of records. Copy any field you need to keep beyond the current iteration.
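
The consequence of reused pointers can be modelled without the library. The toy iterator below (hypothetical, not ReadFX internals) writes every record name into one shared buffer and yields a cstring into it, so the caller has to copy with `$` before the next iteration overwrites the bytes. The quick-start call `$record.name` does the same thing.

```nim
# Toy model of a reused record buffer; not ReadFX code.
iterator reusedNames(names: seq[string]): cstring =
  var buf = newStringOfCap(64)   # one buffer shared by every record
  for n in names:
    buf.setLen(0)
    buf.add(n)                   # overwrite the buffer contents
    yield cstring(buf)           # a pointer into buf, not a copy

var kept: seq[string]
for name in reusedNames(@["read1", "read2", "read3"]):
  kept.add($name)                # `$` copies the bytes into a Nim string

echo kept
```

Storing the yielded cstring itself instead of its copy would leave every stored entry pointing at whatever the buffer holds last.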

readFastx

Low-level

Native Nim proc. Full control over the parse loop and I/O stream.

readFQPair

Paired-end

Reads R1 & R2 in lockstep. Yields FQPair with optional name validation.

readFQPairPtr

Paired-end Fastest

Reads R1 & R2 in lockstep. Yields reused pointer-based FQPairPtr records with optional name validation.

readFQInterleavedPairPtr

Interleaved Fastest

Reads one interleaved FASTQ stream. Yields FQPairPtr; read1 is backed by a scratch buffer, so its pointers stay valid only until the iterator advances.
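
In an interleaved stream, records alternate R1, R2, R1, R2. The following plain-Nim sketch pairs consecutive items to show the shape of that iteration. It uses strings in place of FASTQ records, and `interleavedPairsSketch` is a hypothetical name, not part of ReadFX.

```nim
# Toy sketch: pair consecutive records from one interleaved stream.
iterator interleavedPairsSketch(records: seq[string]): tuple[read1, read2: string] =
  # An odd record count means the last read has no mate.
  if records.len mod 2 != 0:
    raise newException(ValueError, "interleaved input has an unpaired record")
  var i = 0
  while i < records.len:
    yield (read1: records[i], read2: records[i + 1])
    inc i, 2

for pair in interleavedPairsSketch(@["a/1", "a/2", "b/1", "b/2"]):
  echo pair.read1, " + ", pair.read2
```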

Compare parsers in depth →
