What are FASTA and FASTQ?

FASTA is the simplest sequence format. Each record is a header line (starting with >) followed by the nucleotide or amino-acid sequence, which may sit on a single line or be wrapped across several.

fasta
>read_001 some optional comment
ACGTACGTACGTACGTACGT
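
Because the sequence can wrap across lines, a parser has to keep appending lines until it meets the next header. The following is a minimal standard-library sketch of that idea, not ReadFX's own parser; the `FastaEntry` type and `parseFasta` proc are hypothetical names used only for illustration.

```nim
import std/strutils

type FastaEntry = object   # hypothetical type, for illustration only
  name, comment, sequence: string

proc parseFasta(text: string): seq[FastaEntry] =
  ## Split FASTA text into entries, joining wrapped sequence lines.
  for rawLine in text.splitLines:
    let line = rawLine.strip()
    if line.len == 0:
      continue
    if line[0] == '>':
      # Header: the name runs up to the first whitespace, the rest is the comment
      let parts = line[1 .. ^1].split(maxsplit = 1)
      var entry = FastaEntry()
      if parts.len > 0: entry.name = parts[0]
      if parts.len > 1: entry.comment = parts[1].strip()
      result.add entry
    elif result.len > 0:
      # Sequence line: append to the most recent entry
      result[^1].sequence.add line

let entries = parseFasta(">read_001 some optional comment\nACGTACGTAC\nGTACGTACGT\n")
echo entries[0].name, " ", entries[0].sequence.len
```

The two wrapped lines end up in a single 20-base sequence, which is the behaviour you would expect from any FASTA reader.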

FASTQ extends FASTA with per-base quality scores. Every record spans four lines:

fastq
@read_001 some optional comment
ACGTACGTACGTACGTACGT
+
IIIIIIIIIIIIIIIIIIII

Line 3 starts with + and is usually just that single character (the format also allows the header to be repeated after it). Line 4 encodes Phred quality scores as ASCII characters: subtract 33 from the character code to get the integer score (e.g., 'I' = ASCII 73 − 33 = Q40, meaning a 1-in-10,000 base-call error probability). ReadFX handles the conversion for you.
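
The arithmetic can be checked by hand. This sketch assumes the Phred+33 encoding described above; `phredScore` and `errorProbability` are hypothetical helper names, not part of ReadFX.

```nim
import std/math

proc phredScore(c: char): int =
  ## Phred+33: the score is the ASCII code minus 33.
  ord(c) - 33

proc errorProbability(q: int): float =
  ## P(error) = 10^(-Q/10)
  pow(10.0, -q.float / 10.0)

echo phredScore('I')                     # 40
echo errorProbability(phredScore('I'))   # approximately 1e-4
```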

Files are almost always gzip-compressed (.fastq.gz). ReadFX decompresses transparently — you don't need to gunzip first.

Installation

Install the library via Nimble:

shell
nimble install readfx

Or add it to your project's .nimble file:

nimble
requires "readfx >= 0.2.0"

Then in your source:

nim
import readfx

That single import brings in all types, parsers, and utilities.

Reading your first file

The easiest way to iterate over records is readFQ. It yields one FQRecord per sequence — think of it as a Nim for loop over the file.

nim
import readfx

for record in readFQ("sample.fastq.gz"):
  echo record.name, " (", record.sequence.len, " bp)"

readFQ works equally well on FASTA files and on uncompressed files. To read from standard input, pass "-" as the path:

nim
for record in readFQ("-"):
  echo record.name

Compile with optimisations: for production use, add --opt:speed --gc:arc to your compile command (on Nim 2.x the second flag is spelled --mm:arc). This can double throughput with no code changes.

Understanding the FQRecord

Every record returned by readFQ is an FQRecord object with four string fields:

| Field | Type | Description |
|-------|------|-------------|
| name | string | Sequence identifier (everything up to the first space on the header line) |
| comment | string | Optional free-text after the name (may be empty) |
| sequence | string | Nucleotide or amino-acid sequence |
| quality | string | Phred quality string (empty for FASTA records) |

nim
import readfx

for record in readFQ("sample.fastq.gz"):
  echo "Name:     ", record.name
  echo "Comment:  ", record.comment
  echo "Sequence: ", record.sequence
  echo "Length:   ", record.sequence.len
  echo "Quality:  ", record.quality     # empty if FASTA
  echo "---"

Because the fields are plain Nim strings, you can use the entire Nim standard library on them: strutils, sequtils, re, and so on.
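
As a small illustration of that point, the sketch below applies a few standard-library calls to a plain string. The literal stands in for `record.sequence`, so the snippet runs without any input file.

```nim
import std/[strutils, sequtils]

let sequence = "ACGTNNACGTGATCACGT"   # stands in for record.sequence

echo sequence.count('N')              # number of ambiguous bases
echo sequence.contains("GATC")        # does the read contain a motif?
echo sequence.toLowerAscii            # lower-case copy
echo sequence.allIt(it in {'A', 'C', 'G', 'T', 'N'})   # only expected symbols?
```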

Working with sequences

ReadFX exports a collection of sequence-manipulation procedures that operate directly on FQRecord objects (or plain strings).

Reverse complement

nim
let rc = revCompl("ATGCCC")   # → "GGGCAT"

var rec = FQRecord(name: "read_001", sequence: "ATGCCC", quality: "IIIIII")  # any FQRecord will do
revCompl(rec)                  # modify in place (also reverses quality)
let copy = revCompl(rec)       # get a new reversed copy
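
To make the operation concrete, here is a standard-library sketch of what reverse-complementing involves. It is not ReadFX's implementation: `complement`, `reverseComplementSketch` and `reverseString` are hypothetical names, and the sketch assumes upper-case DNA only.

```nim
proc complement(base: char): char =
  ## Watson-Crick complement; anything unrecognised becomes 'N'.
  case base
  of 'A': 'T'
  of 'C': 'G'
  of 'G': 'C'
  of 'T': 'A'
  else: 'N'

proc reverseComplementSketch(sequence: string): string =
  ## Walk the input once, writing complements from the far end backwards.
  result = newString(sequence.len)
  for i, base in sequence:
    result[sequence.high - i] = complement(base)

proc reverseString(s: string): string =
  ## Quality strings are only reversed, never complemented.
  result = newString(s.len)
  for i, c in s:
    result[s.high - i] = c

echo reverseComplementSketch("ATGCCC")   # GGGCAT
echo reverseString("IIII5#")             # #5IIII
```

Reversing the quality string alongside the sequence keeps each score attached to the base it describes.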

GC content

nim
let gc = gcContent(record)   # returns a float, 0.0–1.0
echo "GC: ", (gc * 100).int, "%"
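
For reference, GC content is simply the share of G and C bases in the sequence. The sketch below computes it on a plain string; `gcFraction` is a hypothetical helper, and whether ReadFX counts ambiguous bases such as N in the denominator is not something this sketch can tell you.

```nim
import std/strutils

proc gcFraction(sequence: string): float =
  ## Fraction of G and C bases (case-insensitive); 0.0 for an empty sequence.
  if sequence.len == 0:
    return 0.0
  var gc = 0
  for base in sequence:
    if base in {'G', 'C', 'g', 'c'}:
      inc gc
  gc / sequence.len   # int / int yields a float in Nim

# formatFloat rounds to one decimal instead of truncating like .int does
echo formatFloat(gcFraction("ATGCCC") * 100, ffDecimal, 1), "%"   # 66.7%
```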

Nucleotide composition

nim
let comp = composition(record)
echo "A=", comp.A, " C=", comp.C, " G=", comp.G, " T=", comp.T
echo "N=", comp.N, " GC=", comp.GC
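
If you only need per-base counts of a plain string, the standard library's `CountTable` gives a comparable tally. This is an independent sketch rather than the way `composition` is implemented, and the literal stands in for `record.sequence`.

```nim
import std/tables

let sequence = "ACGTNNACGT"            # stands in for record.sequence
let counts = toCountTable(sequence)    # CountTable[char]

for base in ['A', 'C', 'G', 'T', 'N']:
  echo base, "=", counts[base]         # a missing key counts as 0

# GC expressed as a fraction of all bases, including N
let gcShare = (counts['G'] + counts['C']) / sequence.len
echo "GC=", gcShare
```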

Quality trimming

Remove low-quality bases from the 3′ end. The record's sequence and quality strings are both shortened in place.

nim
var r = record           # make a mutable copy
qualityTrim(r, 20)       # trim bases with Phred score < 20
echo r.sequence.len      # may be shorter now
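
The idea behind 3′ trimming can be sketched on two plain strings. This version assumes the simplest possible rule (drop trailing bases until one meets the threshold) and Phred+33 encoding; ReadFX's `qualityTrim` may use a different algorithm, and `trimTail` is a hypothetical name.

```nim
proc trimTail(sequence, quality: var string, minQual: int) =
  ## Drop bases from the 3' end while their Phred score is below minQual.
  ## Assumes Phred+33 encoding and equal-length strings.
  var keep = min(sequence.len, quality.len)
  while keep > 0 and ord(quality[keep - 1]) - 33 < minQual:
    dec keep
  sequence.setLen(keep)
  quality.setLen(keep)

var sequence = "ACGTACGTAC"
var quality  = "IIIIIII+#!"   # last three bases are Q10, Q2 and Q0
trimTail(sequence, quality, 20)
echo sequence, " ", quality   # ACGTACG IIIIIII
```

Shortening both strings together is what keeps the record internally consistent after trimming.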

Masking low-quality bases

nim
maskLowQuality(r, 20)              # replace Q<20 with 'N'
maskLowQuality(r, 15, maskChar='X')  # or any other character
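
Masking differs from trimming in that the read keeps its length. A minimal sketch of the idea on plain strings, assuming Phred+33 encoding; `maskBelow` is a hypothetical helper and not the library's implementation.

```nim
proc maskBelow(sequence: var string, quality: string,
               minQual: int, maskChar = 'N') =
  ## Replace every base whose Phred score is below minQual with maskChar.
  for i in 0 ..< min(sequence.len, quality.len):
    if ord(quality[i]) - 33 < minQual:
      sequence[i] = maskChar

var sequence = "ACGTAC"
maskBelow(sequence, "II#II!", 20)   # positions 2 and 5 are Q2 and Q0
echo sequence                       # ACNTAN
```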

Subsequence extraction

nim
let first50 = subSequence(record, 0, 50)   # first 50 bases
let fromPos = subSequence(record, 10)      # from base 10 to end
let trimmed = trimStart(record, 5)         # remove 5 bases from 5' end
let clipped  = trimEnd(record, 3)          # remove 3 bases from 3' end
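
On plain strings the same extractions are ordinary Nim slices, as long as the sequence and quality strings are cut identically. The sketch below uses 0-based positions and a start-plus-length convention, which is an assumption of this example and not a statement about the argument order of `subSequence`; `window` is a hypothetical helper.

```nim
proc window(s: string, start, length: int): string =
  ## Up to `length` characters beginning at 0-based `start`, clamped to the string.
  if length <= 0 or start >= s.len:
    return ""
  let first = max(start, 0)
  let count = min(length, s.len - first)
  s[first ..< first + count]

let sequence = "ACGTACGTAC"
let quality  = "IIIIIIII55"

echo window(sequence, 0, 4), " ", window(quality, 0, 4)   # ACGT IIII
echo sequence[2 .. ^1]                                     # GTACGTAC (drop 2 from the 5' end)
echo sequence[0 ..< (sequence.len - 3)]                    # ACGTACG (drop 3 from the 3' end)
```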

Full worked example

Putting it all together: read a file, filter short reads, trim quality, report per-record statistics.

nim
import readfx, strutils

const minLength = 50
const minQual   = 20

for record in readFQ("sample.fastq.gz"):
  # Skip short sequences
  if record.sequence.len < minLength:
    continue

  # Work on a mutable copy
  var r = record
  qualityTrim(r, minQual)

  # Skip reads that became too short after trimming
  if r.sequence.len < minLength:
    continue

  let comp = composition(r)
  let avg  = avgQuality(r)

  echo r.name,
       "\tlen=", r.sequence.len,
       "\tGC=",  (comp.GC * 100).formatFloat(ffDecimal, 1), "%",
       "\tQ=",   avg.formatFloat(ffDecimal, 1)

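The Q value reported in the example above is an average quality. One common definition is the arithmetic mean of the per-base Phred scores, sketched below under the assumption of Phred+33 encoding. Whether `avgQuality` uses exactly this definition is an assumption; `meanPhred` is a hypothetical name.

```nim
proc meanPhred(quality: string): float =
  ## Arithmetic mean of Phred+33 scores; 0.0 for an empty string.
  if quality.len == 0:
    return 0.0
  var total = 0
  for c in quality:
    total += ord(c) - 33
  total / quality.len

echo meanPhred("IIII")   # 40.0
echo meanPhred("I5")     # 30.0 (Q40 and Q20)
```

Note that averaging Phred scores directly is more optimistic than averaging the underlying error probabilities, so treat the figure as a rough summary.
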
Paired-end reads

Illumina sequencing typically produces two files per sample: R1 (forward reads) and R2 (reverse reads). The readFQPair iterator reads both files in lockstep, yielding an FQPair containing both mates.

nim
import readfx

for pair in readFQPair("sample_R1.fastq.gz", "sample_R2.fastq.gz"):
  echo "R1: ", pair.read1.name, "  (", pair.read1.sequence.len, " bp)"
  echo "R2: ", pair.read2.name, "  (", pair.read2.sequence.len, " bp)"

If the files have different numbers of records, an IOError is raised. Enable name validation to catch mismatched or scrambled files:

nim
for pair in readFQPair("R1.fastq.gz", "R2.fastq.gz", checkNames = true):
  # raises ValueError if read names don't match
  # strips /1 /2 suffixes automatically before comparing
  discard
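
The name comparison described in the comments can be pictured as follows. This is a sketch of the idea only: `mateKey` and `sameFragment` are hypothetical helpers, and the library's actual rules for normalising names may differ.

```nim
import std/strutils

proc mateKey(name: string): string =
  ## Read name with a trailing /1 or /2 mate suffix removed.
  result = name
  if result.endsWith("/1") or result.endsWith("/2"):
    result.setLen(result.len - 2)

proc sameFragment(name1, name2: string): bool =
  ## True when both mates carry the same base name.
  mateKey(name1) == mateKey(name2)

echo sameFragment("read_001/1", "read_001/2")   # true
echo sameFragment("read_001/1", "read_002/2")   # false
```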

Choosing the right parser

ReadFX exposes three single-file parsers. Here's the mental model:

| Iterator | Returns | When to use |
|----------|---------|-------------|
| readFQ | FQRecord | Start here. Yields safe Nim strings; records can be stored in sequences, passed to functions, etc. |
| readFQPtr | FQRecordPtr | Processing tens of millions of reads where allocation overhead is measurable. Do not store pointers across iterations. |
| readFastx | fills FQRecord | Custom I/O workflows, e.g. interleaving reads from two different sources in a single loop. |

readFQPtr warning: The C-level buffer backing FQRecordPtr fields is reused on every iteration. Converting to a Nim string with $record.name is safe, but storing the raw pointer is not. If in doubt, use readFQ.
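
The hazard is easy to reproduce without ReadFX. In the sketch below a fixed array stands in for the parser's reused buffer and `load` is a hypothetical stand-in for "advance to the next record"; none of these names come from the library.

```nim
var buffer: array[16, char]   # stands in for the parser's reused C buffer

proc load(text: string) =
  ## Overwrite the shared buffer, as a parser would on each iteration.
  for i in 0 ..< buffer.len:
    buffer[i] = '\0'
  for i in 0 ..< min(text.len, buffer.len - 1):
    buffer[i] = text[i]

load("read_001")
let rawName = cast[cstring](addr buffer[0])   # raw pointer into the buffer
let savedName = $rawName                      # $ copies the bytes into a Nim string

load("read_002")                              # the next iteration reuses the buffer
echo savedName   # still read_001
echo rawName     # now shows read_002
```

The copied string survives the next iteration, while the raw pointer silently starts showing the following record's data.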

Performance tips

Next steps

- Parsing methods: detailed comparison of all four iterators with benchmarks and use-case guidance.
- Data structures: full type definitions for FQRecord, FQPair, SeqComp, Bufio, and Interval.
- Utilities reference: every sequence and quality procedure with signatures and examples.
- API index: generated symbol index for the full library, searchable by name.