FQRecord
The primary record type. Returned by readFQ, filled by
readFastx, and used as the element type inside
FQPair. The name, comment, sequence, and quality fields are
ordinary Nim strings, so records are safe to store, copy, and manipulate.
type FQRecord* = object
name*: string # Sequence identifier (up to first whitespace)
comment*: string # Optional free-text after name (may be empty)
sequence*: string # Nucleotide or amino-acid sequence
quality*: string # Phred quality string (empty for FASTA records)
status*, lastChar*: int # Internal parsing state — see below
Status codes
Relevant only when using readFastx directly. readFQ
and readFQPair handle these internally.
| Value | Meaning |
|---|---|
| > 0 | Sequence length (record parsed successfully) |
| -1 | End of file |
| -2 | Stream error |
| -3 | Other parsing error |
| -4 | Sequence and quality length mismatch in FASTQ |
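As a rough illustration of the table above, the status codes could be turned into readable messages as follows. The describeStatus helper is hypothetical and not part of readfx; only the numeric codes come from the table.

```nim
# Hypothetical helper: translate an FQRecord.status value into text.
# Only the numeric codes are taken from the status table.
proc describeStatus(status: int): string =
  if status > 0:
    result = "ok, sequence length " & $status
  else:
    case status
    of -1: result = "end of file"
    of -2: result = "stream error"
    of -3: result = "other parsing error"
    of -4: result = "sequence/quality length mismatch"
    else: result = "unknown status " & $status

echo describeStatus(150)
echo describeStatus(-4)
```

With readFastx, a helper like this would typically be consulted after the read loop ends, to tell a clean end of file apart from an error.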
for record in readFQ("sample.fastq.gz"):
echo record.name # "read_001"
echo record.comment # "length=150" or ""
echo record.sequence # "ACGT..."
echo record.quality # "IIII..." (empty for FASTA)
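Since quality is a plain string, per-base scores can be derived from it with ordinary string code. The sketch below assumes Phred+33 (Sanger) encoding, which the text above does not state, and meanPhred is a hypothetical helper rather than a readfx procedure.

```nim
# Mean Phred score of a quality string.
# Assumption: Phred+33 encoding (ASCII code minus 33 gives the score).
proc meanPhred(quality: string): float =
  if quality.len == 0:
    return 0.0               # FASTA records carry no quality string
  var total = 0
  for ch in quality:
    total += ord(ch) - 33    # subtract the assumed offset
  result = total / quality.len

echo meanPhred("IIII")       # 'I' is ASCII 73, so every score is 40
```

Inside a readFQ loop this would be called as meanPhred(record.quality).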
FQRecordPtr
A pointer-based record for high-performance streaming via the
kseq.h C library. The fields point directly into a
reused internal buffer — they are invalidated on the next
iteration.
type FQRecordPtr* = object
name*: ptr char # null-terminated C string
comment*: ptr char # null-terminated C string (may be nil)
sequence*: ptr char # null-terminated C string
quality*: ptr char # null-terminated C string (nil for FASTA)
Convert a field to a Nim string with the $ operator:
$record.name. This copies the data — safe to keep after
the loop body.
for record in readFQPtr("sample.fastq.gz"):
let name = $record.name # ✓ copied, safe to keep
let seq = $record.sequence # ✓ copied, safe to keep
echo name, ": ", seq.len
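The reason for copying can be shown without readfx at all. The following is a minimal sketch in which a fixed array stands in for the reused C buffer; the buf variable and the fill procedure are illustrative assumptions, not the library's internals.

```nim
# A fixed array stands in for the parser's reused internal buffer.
var buf: array[32, char]

proc fill(s: string) =
  # Overwrite the buffer with s and null-terminate it (s must be < 32 chars).
  for i, c in s:
    buf[i] = c
  buf[s.len] = '\0'

fill("read_001")
let view = cast[cstring](addr buf[0])  # points into the buffer, no copy
let kept = $view                       # $ copies the bytes into a Nim string

fill("read_002")                       # simulates the next iteration

doAssert kept == "read_001"            # the copy is unaffected
doAssert $view == "read_002"           # the pointer now sees the new record
```

The same reasoning applies to every FQRecordPtr field: anything that must outlive the current iteration has to be copied first.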
FQRecord vs FQRecordPtr
| | FQRecord | FQRecordPtr |
|---|---|---|
| Memory model | Allocates Nim strings per record | Reuses a single C buffer |
| Safe to store | Yes | No; copy with $ first |
| Iterator | readFQ | readFQPtr |
| Typical throughput | Good | Excellent |
| Recommended for | Most programs | High-throughput streaming pipelines |
Prefer readFQ / FQRecord by default. Switch to
readFQPtr only if profiling shows that allocation overhead
is measurable.
FQPair
A paired-end record containing two FQRecord objects.
Yielded by readFQPair.
type FQPair* = object
read1*: FQRecord # Forward read (R1)
read2*: FQRecord # Reverse read (R2)
for pair in readFQPair("R1.fastq.gz", "R2.fastq.gz"):
echo pair.read1.name, " / ", pair.read2.name
echo pair.read1.sequence.len, " bp + ", pair.read2.sequence.len, " bp"
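A common sanity check on paired-end data is that both mates carry the same identifier. The sketch below is one possible way to do that on the name strings alone; baseName and sameFragment are hypothetical helpers, and the "/1" and "/2" suffix convention is an assumption about the input files, not something readfx enforces.

```nim
import strutils

# Hypothetical helper: strip a trailing "/1" or "/2" mate suffix, if present.
proc baseName(name: string): string =
  if name.endsWith("/1") or name.endsWith("/2"):
    result = name[0 ..< name.len - 2]
  else:
    result = name

# True when two read names refer to the same fragment.
proc sameFragment(name1, name2: string): bool =
  baseName(name1) == baseName(name2)

doAssert sameFragment("read_001/1", "read_001/2")
doAssert sameFragment("read_001", "read_001")
doAssert not sameFragment("read_001/1", "read_002/2")
```

In a readFQPair loop this would be called as sameFragment(pair.read1.name, pair.read2.name).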
SeqComp
Nucleotide composition statistics returned by the
composition() procedure.
type SeqComp* = object
A*, C*, G*, T*: int # Per-base counts
N*: int # Count of ambiguous bases
Other*: int # Non-ACGTN characters
GC*: float # GC fraction (0.0–1.0)
import readfx, strutils
for record in readFQ("sample.fastq.gz"):
let c = composition(record)
echo record.name,
" A=", c.A, " C=", c.C, " G=", c.G, " T=", c.T,
" N=", c.N,
" GC=", (c.GC * 100).formatFloat(ffDecimal, 1), "%"
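To make the fields concrete, here is a standalone sketch of how such counts could be computed from a plain string. It is not the library's implementation: the Comp type and countBases are hypothetical, and two choices are assumptions, namely that lowercase bases are counted with their uppercase forms and that the GC fraction is taken over A+C+G+T only.

```nim
# Hypothetical stand-in mirroring the SeqComp fields.
type Comp = object
  A, C, G, T, N, Other: int
  GC: float

proc countBases(s: string): Comp =
  for ch in s:
    case ch
    of 'A', 'a': inc result.A
    of 'C', 'c': inc result.C
    of 'G', 'g': inc result.G
    of 'T', 't': inc result.T
    of 'N', 'n': inc result.N
    else: inc result.Other
  # Assumption: GC fraction is computed over unambiguous bases only.
  let acgt = result.A + result.C + result.G + result.T
  if acgt > 0:
    result.GC = (result.C + result.G) / acgt

let c = countBases("ACGTNX")
doAssert c.A == 1 and c.N == 1 and c.Other == 1
doAssert c.GC == 0.5
```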
Bufio[T]
A generic buffered reader used internally by readFastx.
Typically instantiated as Bufio[GzFile] via
xopen[GzFile](path). Handles both plain and gzip files
through the same interface.
type Bufio*[T] = tuple[fp: T, buf: string, st, en, sz: int, EOF: bool]
var record: FQRecord
var f = xopen[GzFile]("sample.fastq.gz")
defer: f.close()
while f.readFastx(record):
echo record.name
Most programs do not need to touch Bufio directly beyond calling
xopen, readFastx, and close.
For line-oriented access, readLine is also available.
Interval[S,T]
A genomic interval for use with the built-in interval tree. Useful for overlap queries on annotation data alongside sequence parsing.
type Interval*[S,T] = tuple[st, en: S, data: T, max: S]
Build an interval set, call index() to prepare it, then
query with the overlap() iterator:
var ivs: seq[Interval[int, string]]
ivs.add((st: 100, en: 200, data: "gene_A", max: 0))
ivs.add((st: 150, en: 300, data: "gene_B", max: 0))
ivs.sort()
ivs.index()
for iv in ivs.overlap(120, 180):
echo iv.data # "gene_A", "gene_B"
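For intuition about what an overlap query returns, the example above can be reproduced with a naive linear scan. This is only a sketch: the Iv type and the overlaps procedure are hypothetical, the half-open [st, en) convention is an assumption, and the built-in interval tree exists precisely to avoid scanning every interval like this.

```nim
# Simplified stand-in for Interval[int, string], without the max field.
type Iv = tuple[st, en: int, data: string]

# Assumption: intervals are half-open, [st, en).
proc overlaps(a: Iv, st, en: int): bool =
  a.st < en and st < a.en

let ivs: seq[Iv] = @[
  (st: 100, en: 200, data: "gene_A"),
  (st: 150, en: 300, data: "gene_B"),
  (st: 400, en: 500, data: "gene_C")]

var hits: seq[string]
for iv in ivs:
  if iv.overlaps(120, 180):
    hits.add iv.data

doAssert hits == @["gene_A", "gene_B"]
```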
Performance notes
Benchmarks in the benchmark/ directory show that with
--opt:speed --gc:arc, the performance difference between
object and tuple layouts becomes negligible. Recommended compile flags
for production (on Nim 2.x, --mm:arc is the current spelling of the
deprecated --gc:arc):
nim c --opt:speed --gc:arc myprogram.nim