Data Structures

`FQRecord`

The primary record type. Yielded by `readFQ`, filled by `readFastx`, and used as the element type inside `FQPair`. All fields are ordinary Nim strings, so a record is safe to store, copy, and modify.

nim

type FQRecord* = object
  name*:     string  # Sequence identifier (up to first whitespace)
  comment*:  string  # Optional free-text after name (may be empty)
  sequence*: string  # Nucleotide or amino-acid sequence
  quality*:  string  # Phred quality string (empty for FASTA records)
  status*, lastChar*: int  # Internal parsing state — see below

Status codes

These codes matter only when calling `readFastx` directly, where they appear in the record's `status` field. `readFQ` and `readFQPair` handle them internally.

| Value | Meaning |
|-------|---------|
| `> 0` | Sequence length (record parsed successfully) |
| `-1`  | End of file |
| `-2`  | Stream error |
| `-3`  | Other parsing error |
| `-4`  | Sequence and quality length mismatch in FASTQ |
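
As a sketch of how calling code might interpret these values, the helper below maps a status to a message. `describeStatus` is a hypothetical name, not part of readfx; it only restates the table above, and a value of `0` (not covered by the table) falls through to the catch-all branch.

```nim
# Hypothetical helper (not part of readfx): turns a status value
# from the table above into a human-readable message.
proc describeStatus(status: int): string =
  if status > 0:
    result = "ok, sequence length " & $status
  else:
    case status
    of -1: result = "end of file"
    of -2: result = "stream error"
    of -3: result = "other parsing error"
    of -4: result = "sequence and quality length mismatch"
    else: result = "unknown status " & $status

echo describeStatus(150)
echo describeStatus(-4)
```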

nim — basic usage

for record in readFQ("sample.fastq.gz"):
  echo record.name       # "read_001"
  echo record.comment    # "length=150" or ""
  echo record.sequence   # "ACGT..."
  echo record.quality    # "IIII..." (empty for FASTA)

`FQRecordPtr`

A pointer-based record for high-performance streaming via the `kseq.h` C library. The fields point directly into a reused internal buffer, so they are invalidated as soon as the iterator advances to the next record.

nim

type FQRecordPtr* = object
  name*:     ptr char  # null-terminated C string
  comment*:  ptr char  # null-terminated C string (may be nil)
  sequence*: ptr char  # null-terminated C string
  quality*:  ptr char  # null-terminated C string (nil for FASTA)

Convert a field to a Nim string with the `$` operator, for example `$record.name`. This copies the data, so the result is safe to keep after the loop body.

nim

for record in readFQPtr("sample.fastq.gz"):
  let name = $record.name       # ✓ copied, safe to keep
  let seq  = $record.sequence   # ✓ copied, safe to keep
  echo name, ": ", seq.len
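
The hazard can be reproduced without readfx. The sketch below uses a fixed array as a stand-in for the parser's reused buffer and a `cstring` view in place of the `ptr char` fields; `buf` and `fill` are illustrative names only.

```nim
# Stand-in for a reused parser buffer; none of these names come from readfx.
var buf: array[8, char]

proc fill(s: string) =
  ## Overwrite the buffer with `s`, NUL-padded (`s` must be shorter than `buf`).
  for i in 0 ..< buf.len:
    buf[i] = if i < s.len: s[i] else: '\0'

fill("ACGT")
let view = cast[cstring](addr buf[0])  # borrowed pointer, no copy
let kept = $view                       # `$` copies into a Nim string

fill("TTGA")                           # simulate the next iteration
doAssert kept == "ACGT"                # the copy is unaffected
doAssert $view == "TTGA"               # the borrowed view now shows new data
```

Anything that must outlive the current iteration should go through the copy, as `kept` does here.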

FQRecord vs FQRecordPtr

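|                    | `FQRecord`                       | `FQRecordPtr`                       |
|--------------------|----------------------------------|-------------------------------------|
| Memory model       | Allocates Nim strings per record | Reuses a single C buffer            |
| Safe to store      | Yes                              | No; must copy with `$` first        |
| Iterator           | `readFQ`                         | `readFQPtr`                         |
| Typical throughput | Good                             | Excellent                           |
| Recommended for    | Most programs                    | High-throughput streaming pipelines |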

Rule of thumb: Start with `readFQ` / `FQRecord`. Switch to `readFQPtr` only if profiling shows allocation overhead is measurable.

`FQPair`

A paired-end record containing two `FQRecord` objects. Yielded by `readFQPair`.

nim

type FQPair* = object
  read1*: FQRecord   # Forward read (R1)
  read2*: FQRecord   # Reverse read (R2)

nim — usage

for pair in readFQPair("R1.fastq.gz", "R2.fastq.gz"):
  echo pair.read1.name, " / ", pair.read2.name
  echo pair.read1.sequence.len, " bp + ", pair.read2.sequence.len, " bp"
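
A common sanity check on paired records is that both mates carry the same fragment name. The sketch below is one possible way to do that on plain strings; `stripMateSuffix` and `sameFragment` are hypothetical helpers, and the trailing `/1` and `/2` convention is an assumption about the input, not something readfx is known to enforce.

```nim
import std/strutils

proc stripMateSuffix(name: string): string =
  ## Drop a trailing "/1" or "/2" if present (an assumed naming convention).
  if name.endsWith("/1") or name.endsWith("/2"):
    name[0 ..< name.len - 2]
  else:
    name

proc sameFragment(name1, name2: string): bool =
  ## True when both mate names refer to the same fragment.
  stripMateSuffix(name1) == stripMateSuffix(name2)

doAssert sameFragment("read_001/1", "read_001/2")
doAssert not sameFragment("read_001", "read_002")
```

Inside a `readFQPair` loop this could be called as `sameFragment(pair.read1.name, pair.read2.name)`.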

`SeqComp`

Nucleotide composition statistics returned by the `composition()` procedure.

nim

type SeqComp* = object
  A*, C*, G*, T*: int   # Per-base counts
  N*:             int   # Count of ambiguous bases
  Other*:         int   # Non-ACGTN characters
  GC*:            float # GC fraction (0.0–1.0)
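
To make the fields concrete, here is a minimal sketch of how such counts could be computed. It is not readfx's implementation: `CompSketch` and `countBases` are hypothetical names, and the sketch assumes case-insensitive counting with the GC fraction taken over A+C+G+T only. The library's `composition()` may define these details differently.

```nim
type CompSketch = object   # hypothetical mirror of SeqComp
  a, c, g, t, n, other: int
  gc: float

proc countBases(s: string): CompSketch =
  ## Tally each character, then derive the GC fraction.
  for ch in s:
    case ch
    of 'A', 'a': inc result.a
    of 'C', 'c': inc result.c
    of 'G', 'g': inc result.g
    of 'T', 't': inc result.t
    of 'N', 'n': inc result.n
    else: inc result.other
  let acgt = result.a + result.c + result.g + result.t
  if acgt > 0:   # avoid dividing by zero on empty or all-N input
    result.gc = (result.g + result.c) / acgt

let comp = countBases("ACGTNNxg")
doAssert comp.g == 2 and comp.n == 2 and comp.other == 1
```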

nim — usage

import readfx, strutils

for record in readFQ("sample.fastq.gz"):
  let c = composition(record)
  echo record.name,
       "  A=", c.A, " C=", c.C, " G=", c.G, " T=", c.T,
       "  N=", c.N,
       "  GC=", (c.GC * 100).formatFloat(ffDecimal, 1), "%"

`Bufio[T]`

A generic buffered reader used internally by `readFastx`. Typically instantiated as `Bufio[GzFile]` via `xopen[GzFile](path)`. Handles both plain and gzip files through the same interface.

nim

type Bufio*[T] = tuple[fp: T, buf: string, st, en, sz: int, EOF: bool]

nim — typical use with readFastx

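# The Nim manual does not support `defer` at module top level,
# so the read loop lives inside a proc.
proc printNames(path: string) =
  var record: FQRecord
  var f = xopen[GzFile](path)
  defer: f.close()              # runs when the proc exits
  while f.readFastx(record):
    echo record.name

printNames("sample.fastq.gz")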

Note: You rarely need to interact with `Bufio` directly beyond calling `xopen`, `readFastx`, and `close`. For line-oriented access, `readLine` is also available.

`Interval[S,T]`

A genomic interval for use with the built-in interval tree. Useful for overlap queries on annotation data alongside sequence parsing.

nim

type Interval*[S,T] = tuple[st, en: S, data: T, max: S]

Build an interval set, call `index()` to prepare it, then query with the `overlap()` iterator:

nim

var ivs: seq[Interval[int, string]]
ivs.add((st: 100, en: 200, data: "gene_A", max: 0))
ivs.add((st: 150, en: 300, data: "gene_B", max: 0))

ivs.sort()
ivs.index()

for iv in ivs.overlap(120, 180):
  echo iv.data   # "gene_A", "gene_B"
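
Conceptually, an overlap query returns every interval that intersects the query range; the index lets the library avoid scanning all of them. The linear-scan sketch below shows only the test itself. `Iv` and `naiveOverlap` are hypothetical, and treating intervals as half-open `[st, en)` is an assumption, so check the library's behaviour at the boundaries before relying on it.

```nim
type Iv = tuple[st, en: int, data: string]   # simplified, hypothetical

iterator naiveOverlap(ivs: seq[Iv], st, en: int): Iv =
  ## Yield every interval that intersects the query range [st, en).
  for iv in ivs:
    if iv.st < en and st < iv.en:   # half-open overlap test (assumed)
      yield iv

let genes: seq[Iv] = @[(st: 100, en: 200, data: "gene_A"),
                       (st: 150, en: 300, data: "gene_B"),
                       (st: 400, en: 500, data: "gene_C")]

var hits: seq[string]
for iv in naiveOverlap(genes, 120, 180):
  hits.add iv.data
doAssert hits == @["gene_A", "gene_B"]
```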

Performance notes

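Benchmarks in the `benchmark/` directory show that with `--opt:speed --gc:arc`, the performance difference between object and tuple layouts becomes negligible. Recommended compile flags for production are shown below; on Nim 2.x the `--gc:` switch is deprecated in favour of `--mm:`, so use `--mm:arc` there instead.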

shell

nim c --opt:speed --gc:arc myprogram.nim
