readfx

ReadFX: A Nim library for bioinformatics sequence file parsing

This module provides efficient parsing and manipulation of FASTA/FASTQ format sequence files commonly used in bioinformatics.

Features:

  • Fast FASTA/FASTQ sequence parsing (supports gzipped files)
  • Buffered I/O for efficient file reading
  • Interval tree implementation for genomic interval operations

Example:

import readfx

# Read a FASTQ file
for record in readFQ("sample.fastq.gz"):
  echo "Sequence name: ", record.name
  echo "Sequence: ", record.sequence
  echo "Quality: ", record.quality

readFQPtr: Iterator for reading FASTQ files, returning pointers to record data

Note: The pointers are reused between iterations, so do not store them; copy any field you need to keep (for example with `$`) before the next iteration. To read from stdin, pass "-" as the path parameter.

Args: path: Path to the FASTQ file (supports gzipped files)

Returns: An iterator yielding FQRecordPtr objects

Example:

for rec in readFQPtr("sample.fastq.gz"):
  echo $rec.name
  echo $rec.sequence
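
The reuse caveat can be illustrated without the library. The following is a minimal sketch, assuming a reader that overwrites one internal buffer per record; reusedNames is a hypothetical stand-in written for this illustration and is not part of readfx. It shows why each field should be copied with `$` inside the loop body.

```nim
iterator reusedNames(names: openArray[string]): cstring =
  ## Hypothetical stand-in for a pointer-yielding reader:
  ## one buffer is overwritten for every item.
  var buf = newString(64)
  for n in names:
    buf.setLen(n.len)
    for i, c in n:
      buf[i] = c
    yield cstring(buf)       # the same storage is handed out each time

var kept: seq[string]
for p in reusedNames(["read1", "read2", "read3"]):
  kept.add $p                # `$` copies the bytes into a fresh Nim string
echo kept
```

Storing p itself instead of $p would leave every stored element pointing at the same buffer, which is the situation the note warns about.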

readFQ: Iterator for reading FASTQ files, returning copies of record data

Unlike readFQPtr, which returns pointers to the underlying data, this iterator copies each record's strings, so the yielded records remain valid after the next iteration.

Args: path: Path to the FASTQ file (supports gzipped files)

Returns: An iterator yielding FQRecord objects with copied data

Example:

for rec in readFQ("sample.fastq.gz"):
  echo rec.name
  echo rec.sequence

readFQPair: Iterator for reading paired-end FASTQ files synchronously

Reads two FASTQ files in parallel, yielding pairs of corresponding records. The files must have the same number of sequences in the same order.

Args:

  • path1: Path to the first FASTQ file (R1, forward reads)
  • path2: Path to the second FASTQ file (R2, reverse reads)
  • checkNames: Whether to verify that read names match (default: false)

Returns: An iterator yielding FQPair objects with synchronized reads

Raises:

  • IOError: If files cannot be opened or have mismatched lengths
  • ValueError: If checkNames is true and read names don't match

Example:

for pair in readFQPair("sample_R1.fastq.gz", "sample_R2.fastq.gz"):
  echo "Forward: ", pair.read1.name
  echo "Reverse: ", pair.read2.name
  # processReadPair is a placeholder for your own downstream proc
  processReadPair(pair.read1.sequence, pair.read2.sequence)
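
The pairing contract described above (lockstep iteration, IOError on a length mismatch, ValueError on a name mismatch when checkNames is true) can be sketched over in-memory records. This is only an illustration: Rec, RecPair and pairUp are hypothetical names, names are compared for exact equality as an assumption, and the length check happens up front here, whereas a streaming reader would presumably only notice the mismatch when one file runs out.

```nim
type
  Rec = tuple[name, sequence: string]    # hypothetical stand-in for a FASTQ record
  RecPair = tuple[read1, read2: Rec]     # hypothetical stand-in for FQPair

iterator pairUp(r1, r2: seq[Rec]; checkNames = false): RecPair =
  ## Walk both inputs in lockstep; error types mirror the documented contract.
  if r1.len != r2.len:
    raise newException(IOError, "inputs have different numbers of records")
  for i in 0 ..< r1.len:
    if checkNames and r1[i].name != r2[i].name:
      raise newException(ValueError, "read names differ at record " & $i)
    yield (read1: r1[i], read2: r2[i])

let fwd = @[(name: "r1", sequence: "ACGT"), (name: "r2", sequence: "GGCC")]
let rev = @[(name: "r1", sequence: "TTAA"), (name: "rX", sequence: "CCGG")]

var n = 0
for pair in pairUp(fwd, rev):            # names are not checked by default
  inc n
echo "pairs: ", n

var mismatch = false
try:
  for pair in pairUp(fwd, rev, checkNames = true):
    discard
except ValueError as e:
  mismatch = true
  echo "name mismatch: ", e.msg
```

With the real iterator, the same try/except shape around the for loop would be the natural place to handle the two documented exceptions.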

Formats a sequence record as a FASTA or FASTQ string

Args:

  • name: Sequence name/identifier
  • comment: Sequence comment (optional)
  • sequence: The sequence string
  • quality: Quality scores (empty for FASTA format)

Returns: Formatted FASTA/FASTQ string
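
The rule that an empty quality string selects FASTA output can be sketched with the standard library alone. formatRecord is a hypothetical name used for illustration, and details such as trailing newlines are assumptions; the library's own output may differ.

```nim
proc formatRecord(name, comment, sequence: string; quality = ""): string =
  ## Hypothetical helper: FASTA when quality is empty, FASTQ otherwise.
  let header = if comment.len > 0: name & " " & comment else: name
  if quality.len == 0:
    result = ">" & header & "\n" & sequence
  else:
    result = "@" & header & "\n" & sequence & "\n+\n" & quality

echo formatRecord("seq1", "", "ACGT")
echo formatRecord("read1", "lane=1", "ACGT", "IIII")
```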

fafmt: Print a FASTA record, splitting the sequence into lines of N characters (the width parameter, default 60)

Returns: Formatted FASTA string
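
Splitting a sequence into fixed-width lines can be sketched as follows. This is a standalone illustration, not the library's code: wrapFasta is a hypothetical name, and it takes a name and a sequence string rather than a record object.

```nim
proc wrapFasta(name, sequence: string; width = 60): string =
  ## Hypothetical helper: FASTA text with at most `width` bases per line.
  doAssert width > 0
  result = ">" & name
  var i = 0
  while i < sequence.len:
    let stop = min(i + width, sequence.len)
    result.add '\n'
    result.add sequence[i ..< stop]
    i = stop

echo wrapFasta("seq1", "ACGTACGTAC", 4)
```

The last line is simply shorter when the sequence length is not a multiple of the width.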

`$`: Convert an FQRecordPtr to a string (FASTA or FASTQ format)

Returns: Formatted FASTA/FASTQ string

Types

Bufio[T] = tuple[fp: T, buf: string, st, en, sz: int, EOF: bool]
GzFile = gzFile
Interval[S; T] = tuple[st, en: S, data: T, max: S]

Procs

proc `$`(rec: FQRecord): string {....raises: [], tags: [], forbids: [].}
proc `$`(rec: FQRecordPtr): string {....raises: [], tags: [], forbids: [].}
proc close[T](f: var Bufio[T]): int {.discardable.}
proc eof[T](f: Bufio[T]): bool {.noSideEffect.}
proc fafmt(rec: FQRecord; width: int = 60): string {....raises: [], tags: [],
    forbids: [].}
proc index[S, T](a: var seq[Interval[S, T]]): int {.discardable.}
proc kseq_init(fp: gzFile): ptr kseq_t {.
    header: "/Users/telatina/git/readfx/readfx/kseq.h", importc: "kseq_init",
    ...raises: [], tags: [], forbids: [].}
proc kseq_read(seq: ptr kseq_t): int {.header: "/Users/telatina/git/readfx/readfx/kseq.h",
                                       importc: "kseq_read", ...raises: [],
                                       tags: [], forbids: [].}
proc kseq_rewind(seq: ptr kseq_t) {.header: "/Users/telatina/git/readfx/readfx/kseq.h",
                                    importc: "kseq_rewind", ...raises: [],
                                    tags: [], forbids: [].}
proc open[T](f: var Bufio[T]; fn: string; mode: FileMode = fmRead;
             sz: int = 0x00010000): int {.discardable.}
proc read[T](f: var Bufio[T]; buf: var string; sz: int; offset: int = 0): int {.
    discardable.}
proc readByte[T](f: var Bufio[T]): int
proc readFastx[T](f: var Bufio[T]; r: var FQRecord): bool {.discardable.}
proc readLine[T](f: var Bufio[T]; buf: var string): bool {.discardable.}
proc readUntil[T](f: var Bufio[T]; buf: var string; dret: var char;
                  delim: int = -1; offset: int = 0): int {.discardable.}
proc sort[S, T](a: var seq[Interval[S, T]])
proc xopen[T](fn: string; mode: FileMode = fmRead; sz: int = 0x00010000): Bufio[
    T]

Iterators

iterator overlap[S, T](a: seq[Interval[S, T]]; st: S; en: S): Interval[S, T] {.
    noSideEffect.}
iterator readFQ(path: string): FQRecord {....raises: [], tags: [], forbids: [].}
iterator readFQPair(path1: string; path2: string; checkNames: bool = false): FQPair {.
    ...raises: [IOError, IOError, ValueError], tags: [], forbids: [].}
iterator readFQPtr(path: string): FQRecordPtr {....raises: [], tags: [],
    forbids: [].}
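
The overlap iterator listed above takes a sequence of intervals and a query range. As a reference for what such a query returns, here is a naive linear-scan sketch over a simplified tuple. It is not the library's algorithm: Iv and naiveOverlap are hypothetical names, the max field is omitted (it presumably supports the tree index built by index), and half-open [st, en) coordinates are an assumption.

```nim
type
  Iv = tuple[st, en: int, data: string]  # hypothetical, simplified interval

iterator naiveOverlap(a: seq[Iv]; st, en: int): Iv =
  ## Linear scan; assumes half-open [st, en) coordinates.
  for x in a:
    if x.st < en and st < x.en:
      yield x

let genes = @[(st: 10, en: 20, data: "geneA"),
              (st: 15, en: 30, data: "geneB"),
              (st: 40, en: 50, data: "geneC")]

var hits: seq[string]
for x in naiveOverlap(genes, 20, 40):
  hits.add x.data
echo hits
```

Under the half-open assumption, geneA (which ends at 20) and geneC (which starts at 40) only touch the query [20, 40) and are not reported.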