Comparison at a Glance
| Method | Returns | Memory | Speed | Ease of use | Use case |
|---|---|---|---|---|---|
| readFQ | FQRecord | Higher | Good | Excellent | General use |
| readFQPtr | FQRecordPtr | Low | Excellent | Moderate | High-throughput streaming |
| readFastx | fills FQRecord | Custom | Excellent | Requires setup | Custom I/O workflows |
| readFQPair | FQPair | Moderate | Good | Excellent | Paired-end reads |
readFQ
iterator readFQ*(path: string): FQRecord
Yields FQRecord objects backed by Nim strings. Records are
completely safe to store, copy, or pass to any function after the loop body
returns — the data is owned by the record, not shared with an internal buffer.
import readfx
for record in readFQ("sample.fastq.gz"):
  echo record.name, " (", record.sequence.len, " bp)"

# Collect all names
var names: seq[string]
for record in readFQ("sample.fastq.gz"):
  names.add(record.name)
Use readFQ unless
profiling shows that memory allocation is a bottleneck.
Internally, readFQ is built on top of readFQPtr and
converts the C pointers to Nim strings on each yield. The extra copy is usually
negligible.
readFQPtr
iterator readFQPtr*(path: string): FQRecordPtr
Yields pointer-based records using Heng Li's kseq.h C library
directly via FFI. The underlying buffer is reused on every
iteration — the pointer returned on iteration N becomes
invalid when iteration N+1 begins.
import readfx
for record in readFQPtr("sample.fastq.gz"):
  # Safe: convert to string immediately
  echo $record.name, ": ", len($record.sequence)
Never store an FQRecordPtr or its fields across iterations.
If you need to retain data, convert it explicitly:
# ✓ Safe: data copied to a Nim string immediately
var names: seq[string]
for record in readFQPtr("sample.fastq.gz"):
  names.add($record.name)

# ✗ WRONG: the stored pointer refers to a buffer that is reused
var p: ptr char
for record in readFQPtr("sample.fastq.gz"):
  p = record.name  # invalid as soon as the next iteration begins
Use readFQPtr only when profiling shows that readFQ's string allocation is a
bottleneck. Whenever field data must outlive the current iteration, convert it
to a string with $.
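The buffer-reuse hazard can be reproduced without readfx. The following is a minimal pure-Nim sketch, not the library's implementation: ReusingReader and next are hypothetical names standing in for a parser that overwrites one internal buffer per record, as the kseq.h-backed iterator is described as doing above.

```nim
type
  ReusingReader = object
    buf: array[64, char]  # one buffer shared by every record
    names: seq[string]    # stand-in for the records in a file (each < 64 chars)
    pos: int

proc next(r: var ReusingReader): cstring =
  ## Copies the next name into the shared buffer and returns a pointer to it.
  ## Returns nil once all names are consumed.
  if r.pos >= r.names.len:
    return nil
  let name = r.names[r.pos]
  inc r.pos
  for i, c in name:
    r.buf[i] = c
  r.buf[name.len] = '\0'
  result = cast[cstring](addr r.buf[0])

var reader = ReusingReader(names: @["read1", "read2"])

let first = reader.next()  # points into reader.buf
let firstCopy = $first     # owned Nim string, analogous to $record.name
discard reader.next()      # the shared buffer now holds "read2"

echo firstCopy  # read1
echo first      # read2: the old pointer now sees the next record's data
```

The copy taken with $ keeps its value, while the raw pointer silently changes meaning. With a real parser the buffer may also be reallocated, so a stale pointer can be worse than merely out of date.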
readFastx
proc readFastx*[T](f: var Bufio[T], r: var FQRecord): bool
The low-level native-Nim parser. Rather than an iterator, it is a procedure
that fills a caller-owned FQRecord and returns true
while records remain, false at end-of-file.
You open the file yourself with xopen[GzFile] and manage the
stream lifecycle with defer: f.close().
import readfx

proc main() =
  var record: FQRecord
  var f = xopen[GzFile]("sample.fastq.gz")
  defer: f.close()  # defer needs an enclosing scope; Nim does not support it at top level
  while f.readFastx(record):
    echo record.name, " (", record.sequence.len, " bp)"

main()
Because you control the loop, you can interleave reads from multiple streams, add custom break conditions, or mix FASTX I/O with other file handles.
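As an illustration of that flexibility, here is a hedged sketch that walks two streams in lockstep with a record cap and an early exit. It uses only the calls shown in this section (xopen[GzFile], readFastx, close); countMatchingLengths and its parameters are hypothetical names introduced for the example, not part of readfx.

```nim
import readfx

proc countMatchingLengths(pathA, pathB: string, maxRecords: int): int =
  ## Hypothetical helper: reads two files side by side and counts the leading
  ## record pairs whose sequences have the same length.
  var fa = xopen[GzFile](pathA)
  defer: fa.close()
  var fb = xopen[GzFile](pathB)
  defer: fb.close()

  var ra, rb: FQRecord
  # Stop at the cap, or as soon as either stream has no more records.
  while result < maxRecords and fa.readFastx(ra) and fb.readFastx(rb):
    if ra.sequence.len != rb.sequence.len:
      break  # custom break condition
    inc result
```

A call such as countMatchingLengths("a.fastq.gz", "b.fastq.gz", 1000) would inspect at most 1000 pairs. Both records are caller-owned, so no data is copied beyond what readFastx writes into them.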
Status codes
After a call to readFastx, record.status contains
a diagnostic value:
| Value | Meaning |
|---|---|
| > 0 | Sequence length (record parsed successfully) |
| -1 | End of file |
| -2 | Stream error |
| -3 | Other parsing error |
| -4 | Sequence and quality length mismatch (FASTQ) |
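One way to act on these values is a small lookup that turns a status into a message. The sketch below is pure Nim and mirrors the table above; describeStatus is a hypothetical helper, and it assumes record.status can be passed as an int. The table does not define a value of 0, so the sketch reports it as undocumented rather than guessing.

```nim
proc describeStatus(status: int): string =
  ## Hypothetical helper: maps a status value to a human-readable message.
  if status > 0:
    result = "ok: " & $status & " bp"
  else:
    case status
    of -1: result = "end of file"
    of -2: result = "stream error"
    of -3: result = "other parsing error"
    of -4: result = "sequence and quality length mismatch"
    else: result = "undocumented status " & $status

echo describeStatus(150)  # ok: 150 bp
echo describeStatus(-4)   # sequence and quality length mismatch
```

After the readFastx loop ends, checking that the status is -1 rather than -2, -3, or -4 distinguishes a clean end of file from a truncated or malformed input.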
readFQPair
iterator readFQPair*(path1, path2: string, checkNames: bool = false): FQPair
Reads two FASTQ files in lockstep. On each iteration it yields an
FQPair with fields read1 and read2,
both of type FQRecord.
import readfx
for pair in readFQPair("sample_R1.fastq.gz", "sample_R2.fastq.gz"):
  echo "R1: ", pair.read1.name, " (", pair.read1.sequence.len, " bp)"
  echo "R2: ", pair.read2.name, " (", pair.read2.sequence.len, " bp)"
Error handling
- If one file contains more records than the other, an IOError is raised.
- With checkNames = true, the iterator strips common suffixes (/1, /2, 1, 2) from read names before comparing them. If they do not match, a ValueError is raised.
for pair in readFQPair("R1.fastq.gz", "R2.fastq.gz", checkNames = true):
  # Raises ValueError if read1.name ≠ read2.name (after suffix stripping)
  discard
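To make the name check concrete, here is a pure-Nim sketch of a suffix-stripping comparison. It is an illustration only, not readfx's actual logic: stripMateSuffix and sameFragment are hypothetical helpers, and the sketch handles just the /1 and /2 suffixes.

```nim
import std/strutils

proc stripMateSuffix(name: string): string =
  ## Hypothetical helper: drops a trailing "/1" or "/2" if present.
  if name.endsWith("/1") or name.endsWith("/2"):
    result = name[0 ..< name.len - 2]
  else:
    result = name

proc sameFragment(name1, name2: string): bool =
  ## True when both names refer to the same fragment after stripping.
  stripMateSuffix(name1) == stripMateSuffix(name2)

echo sameFragment("read7/1", "read7/2")  # true
echo sameFragment("read7/1", "read8/2")  # false
```

A check of this kind catches files that have fallen out of sync, for example when one mate file was filtered or sorted independently of the other.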
path1 may be "-" to read R1 from standard input.
Both files cannot be stdin simultaneously.
Implementation notes
- readFQ and readFQPtr (and therefore readFQPair) delegate to Heng Li's kseq.h C library via Nim's FFI. kseq.h is a header-only, zero-copy streaming parser optimised for maximum throughput on POSIX systems.
- readFastx is a pure-Nim implementation located in readfx/nimklib.nim, using the Bufio buffered I/O layer also defined in the main module.
- All four methods auto-detect gzip vs. plain text by inspecting the magic bytes and support the FASTA (>) and FASTQ (@) formats interchangeably.
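The two detection steps in the last point can be sketched in pure Nim. This is a conceptual illustration under stated assumptions, not the code readfx or its underlying libraries use: looksGzipped and fastxKind are hypothetical helpers. The gzip magic bytes are 0x1f 0x8b, and the format is inferred from the first character of the first header line.

```nim
type
  FastxKind = enum
    fkUnknown, fkFasta, fkFastq

proc looksGzipped(header: string): bool =
  ## True when the data starts with the gzip magic bytes 0x1f 0x8b.
  header.len >= 2 and header[0] == '\x1f' and header[1] == '\x8b'

proc fastxKind(firstLine: string): FastxKind =
  ## Classifies decompressed text by its leading character.
  if firstLine.len == 0:
    result = fkUnknown
  elif firstLine[0] == '>':
    result = fkFasta
  elif firstLine[0] == '@':
    result = fkFastq
  else:
    result = fkUnknown

echo looksGzipped("\x1f\x8b\x08")  # true
echo fastxKind(">chr1 description")  # fkFasta
echo fastxKind("@read1")             # fkFastq
```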