Parsing Methods

Comparison at a Glance

Method	Returns	Memory	Speed	Ease of use	Use case
`readFQ`	`FQRecord`	Higher	Good	Excellent	General use
`readFQPtr`	`FQRecordPtr`	Low	Excellent	Moderate	High-throughput streaming
`readFastx`	fills `FQRecord`	Custom	Excellent	Requires setup	Custom I/O workflows
`readFQPair`	`FQPair`	Moderate	Good	Excellent	Paired-end reads

`readFQ`

signature

iterator readFQ*(path: string): FQRecord

Yields FQRecord objects backed by Nim strings. Records are completely safe to store, copy, or pass to any function after the loop body returns — the data is owned by the record, not shared with an internal buffer.

nim

import readfx

for record in readFQ("sample.fastq.gz"):
  echo record.name, " (", record.sequence.len, " bp)"

# Collect all names
var names: seq[string]
for record in readFQ("sample.fastq.gz"):
  names.add(record.name)

When to use

This is the right default for most programs. Use readFQ unless profiling shows that memory allocation is a bottleneck.

Internally, readFQ is built on top of readFQPtr and converts the C pointers to Nim strings on each yield. The extra copy is usually negligible.

`readFQPtr`

signature

iterator readFQPtr*(path: string): FQRecordPtr

Yields pointer-based records using Heng Li's kseq.h C library directly via FFI. The underlying buffer is reused on every iteration — the pointer returned on iteration N becomes invalid when iteration N+1 begins.

nim

import readfx

for record in readFQPtr("sample.fastq.gz"):
  # Safe: convert to string immediately
  echo $record.name, ": ", len($record.sequence)

Pointer invalidation

Never store an FQRecordPtr or its fields across iterations. If you need to retain data, convert it explicitly:

nim

# ✓ Safe — data copied to Nim string immediately
var names: seq[string]
for record in readFQPtr("sample.fastq.gz"):
  names.add($record.name)

# ✗ WRONG: p dangles as soon as the next iteration begins
var p: ptr char
for record in readFQPtr("sample.fastq.gz"):
  p = record.name   # the buffer is reused; dereferencing p later is undefined behaviour

When to use

Use readFQPtr when processing tens or hundreds of millions of records, and only after profiling has confirmed that readFQ's string allocation is a bottleneck. Whenever field data must outlive the current iteration, convert it to a string with $.

`readFastx`

signature

proc readFastx*[T](f: var Bufio[T], r: var FQRecord): bool

The low-level native-Nim parser. Rather than an iterator, it is a procedure that fills a caller-owned FQRecord and returns true while records remain, false at end-of-file.

You open the file yourself with xopen[GzFile] and manage the stream lifecycle with defer: f.close().

nim

import readfx

var record: FQRecord
var f = xopen[GzFile]("sample.fastq.gz")
defer: f.close()

while f.readFastx(record):
  echo record.name, " (", record.sequence.len, " bp)"

Because you control the loop, you can interleave reads from multiple streams, add custom break conditions, or mix FASTX I/O with other file handles.

Status codes

After a call to readFastx, record.status contains a diagnostic value:

Value	Meaning
`> 0`	Sequence length — record parsed successfully
`-1`	End of file
`-2`	Stream error
`-3`	Other parsing error
`-4`	Sequence and quality length mismatch (FASTQ)
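
The codes in the table can be turned into readable messages with a small helper. The sketch below is illustrative only: describeStatus is a hypothetical name, not part of readfx, and it takes a plain int rather than assuming the exact type of record.status. The table does not define a value of 0, so the sketch reports it as unknown.

```nim
# Hypothetical helper (not part of readfx): maps the status codes
# listed in the table above to human-readable messages.
proc describeStatus(status: int): string =
  if status > 0:
    # Positive values carry the sequence length.
    return "ok: parsed a record of " & $status & " bp"
  case status
  of -1: "end of file"
  of -2: "stream error"
  of -3: "other parsing error"
  of -4: "sequence and quality length mismatch"
  else: "unknown status " & $status

when isMainModule:
  for code in [150, -1, -2, -3, -4]:
    echo code, " -> ", describeStatus(code)
```

A caller could log describeStatus(record.status) after the read loop ends to distinguish a clean end of file from a truncated or malformed input.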

When to use

Use readFastx for custom parsing workflows, for example interleaving reads from two sources, implementing a merge sort over multiple files, or building a streaming pipeline that can pause and resume.

`readFQPair`

signature

iterator readFQPair*(path1, path2: string, checkNames: bool = false): FQPair

Reads two FASTQ files in lockstep. On each iteration it yields an FQPair with fields read1 and read2, both of type FQRecord.

nim

import readfx

for pair in readFQPair("sample_R1.fastq.gz", "sample_R2.fastq.gz"):
  echo "R1: ", pair.read1.name, "  (", pair.read1.sequence.len, " bp)"
  echo "R2: ", pair.read2.name, "  (", pair.read2.sequence.len, " bp)"

Error handling

If one file contains more records than the other, an IOError is raised.
With checkNames = true, the iterator strips common suffixes (/1, /2, 1, 2) from read names before comparing them. If they do not match, a ValueError is raised.

nim

for pair in readFQPair("R1.fastq.gz", "R2.fastq.gz", checkNames = true):
  # Raises ValueError if read1.name ≠ read2.name (after suffix stripping)
  discard
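
To make the name check concrete, here is a minimal sketch of a suffix-insensitive comparison. It is not the readfx implementation: stripMateSuffix and sameFragment are hypothetical names, and the sketch handles only the /1 and /2 suffixes, which is an assumption narrower than the full list given above.

```nim
import std/strutils

# Hypothetical helper (not the readfx implementation): drops a trailing
# "/1" or "/2" so that both mates of a fragment compare equal.
proc stripMateSuffix(name: string): string =
  if name.endsWith("/1") or name.endsWith("/2"):
    name[0 ..< name.len - 2]
  else:
    name

# True when the two read names refer to the same fragment.
proc sameFragment(name1, name2: string): bool =
  stripMateSuffix(name1) == stripMateSuffix(name2)

when isMainModule:
  echo sameFragment("read42/1", "read42/2")  # true
  echo sameFragment("read42/1", "read43/2")  # false
```

Comparing the stripped names rather than the raw ones is what lets correctly paired reads with different mate suffixes pass the check while out-of-sync files are still caught.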

Stdin support

path1 may be "-" to read R1 from standard input. The two files cannot both be read from stdin.

When to use

Use readFQPair for any paired-end sequencing pipeline: Illumina R1/R2 files, mate-pair libraries, or any scenario where two FASTQ files must be consumed in sync.

Implementation notes

readFQ and readFQPtr (and therefore readFQPair) delegate to Heng Li's kseq.h C library via Nim's FFI. kseq.h is a header-only, zero-copy streaming parser optimised for maximum throughput on POSIX systems.
readFastx is a pure-Nim implementation located in readfx/nimklib.nim, using the Bufio buffered I/O layer also defined in the main module.
All four methods auto-detect gzip vs. plain text by inspecting the magic bytes and support the FASTA (>) and FASTQ (@) formats interchangeably.
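
The detection described in the last note can be sketched in a few lines. This illustrates the general technique, not the library's code: isGzip, FastxKind, and fastxKind are hypothetical names. It relies on two facts: a gzip stream begins with the bytes 0x1f 0x8b, and a FASTA or FASTQ record begins with > or @ respectively.

```nim
# Hypothetical helpers (not readfx code) illustrating format sniffing.

proc isGzip(head: string): bool =
  ## A gzip stream starts with the two magic bytes 0x1f 0x8b.
  head.len >= 2 and head[0] == '\x1f' and head[1] == '\x8b'

type FastxKind = enum
  fkFasta, fkFastq, fkUnknown

proc fastxKind(firstLine: string): FastxKind =
  ## Classifies already-decompressed text by its first character.
  if firstLine.len == 0: fkUnknown
  elif firstLine[0] == '>': fkFasta
  elif firstLine[0] == '@': fkFastq
  else: fkUnknown

when isMainModule:
  echo isGzip("\x1f\x8b\x08\x00")   # true
  echo isGzip("@read1")             # false
  echo fastxKind(">chr1")           # fkFasta
  echo fastxKind("@read1")          # fkFastq
```

In practice the gzip check runs on the raw bytes of the file, while the record-type check runs on the text after any decompression.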
