readfx

ReadFX: A Nim library for bioinformatics sequence file parsing

This module provides efficient parsing and manipulation of FASTA/FASTQ format sequence files commonly used in bioinformatics.

Features:

  • Fast FASTA/FASTQ sequence parsing (supports gzipped files)
  • Buffered I/O for efficient file reading
  • Interval tree implementation for genomic interval operations

Example:

import readfx

# Read a FASTQ file
for record in readFQ("sample.fastq.gz"):
  echo "Sequence name: ", record.name
  echo "Sequence: ", record.sequence
  echo "Quality: ", record.quality

Initialize a kseq parser handle from an open gzFile stream.

Args: fp: Open gzip/plain-text stream handle

Returns: Pointer to an initialized parser state

Reset parser state to the beginning of the input stream.

Args: seq: Parser state previously created with kseq_init

Read the next FASTA/FASTQ record from the parser stream.

Args: seq: Parser state previously created with kseq_init

Returns: Record sequence length on success, or a negative status code on EOF/error

Iterator for reading FASTQ files, returning pointers to record data

Note: The pointers are reused between iterations, so don't store them. For stdin input, use "-" as the path parameter. Cached lengths are available on each FQRecordPtr field and exclude the trailing NUL terminator.

Args: path: Path to the FASTQ file (supports gzipped files)

Returns: An iterator yielding FQRecordPtr objects

Raises: IOError: If the input stream cannot be opened

Example:

for rec in readFQPtr("sample.fastq.gz"):
  echo $rec.name
  echo $rec.sequence
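
Because the pointers are reused, any value that must outlive the current iteration has to be copied first. The following is a minimal sketch of that pattern, assuming the `$` conversion shown in the example above yields an independent Nim string; the file name `demo_ptr.fastq` and its contents are placeholders written on the fly for illustration.

```nim
import readfx

# Placeholder input so the sketch is self-contained.
writeFile("demo_ptr.fastq", "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")

var names: seq[string]
for rec in readFQPtr("demo_ptr.fastq"):
  # `$` copies the pointed-to characters into a new string, so the stored
  # value does not depend on the buffer that the iterator reuses.
  names.add($rec.name)

echo "Records read: ", names.len
```

Storing `rec.name` itself, without the conversion, would keep a pointer into storage that the next iteration overwrites.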

Iterator for reading FASTQ files, returning copies of record data

This iterator creates copies of the strings, unlike readFQPtr, which returns pointers to the underlying data.

Args: path: Path to the FASTQ file (supports gzipped files)

Returns: An iterator yielding FQRecord objects with copied data

Raises: IOError: Propagated from readFQPtr if the input stream cannot be opened

Example:

for rec in readFQ("sample.fastq.gz"):
  echo rec.name
  echo rec.sequence
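
Since readFQ yields copies, records can be kept after the loop ends. A small sketch of that, assuming FQRecord can be stored in a seq like any other value type; the file name `demo_copy.fastq`, its contents, and the length cutoff are placeholders chosen for illustration.

```nim
import readfx

# Placeholder input so the sketch is self-contained.
writeFile("demo_copy.fastq", "@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n")

var kept: seq[FQRecord]
for rec in readFQ("demo_copy.fastq"):
  # Each record carries its own strings, so it can be stored directly.
  if rec.sequence.len >= 4:
    kept.add rec

for rec in kept:
  echo rec.name, "\t", rec.sequence.len
```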

Iterator for reading paired-end FASTQ files synchronously with pointers

Reads two FASTQ files in parallel, yielding pairs of corresponding records. Pointer fields are reused between iterations; convert to strings if data must be retained after the next yield. Cached lengths are available on both read1 and read2.

Args:

  • path1: Path to the first FASTQ file (R1, forward reads)
  • path2: Path to the second FASTQ file (R2, reverse reads)
  • checkNames: Whether to verify that read names match (default: false)

Returns: An iterator yielding FQPairPtr objects with synchronized reads

Raises:

  • IOError: If files cannot be opened or have mismatched lengths
  • ValueError: If checkNames is true and read names don't match

Iterator for reading interleaved paired-end FASTQ files with pointers.

Reads one interleaved FASTQ stream and yields each adjacent R1/R2 record pair as FQPairPtr. read1 is copied into scratch storage so that both records remain valid until the next yield. Scratch-backed pointers are always NUL-terminated.

Args:

  • path: Path to the interleaved FASTQ file (supports gzipped files; "-" reads from stdin)
  • checkNames: Whether to verify that read names match after mate suffix normalization (default: false)

Returns: An iterator yielding FQPairPtr objects with cached field lengths

Raises:

  • IOError: If the input stream cannot be opened or ends with an incomplete pair
  • ValueError: If input is not FASTQ or checkNames detects a mismatch
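
As a hedged usage sketch for the interleaved iterator: both mates are read inside the loop body, where the description above says they remain valid, and are copied with `$` before the next yield. The file name `demo_interleaved.fastq` and its contents are placeholders; the field names `read1`, `read2`, and `sequence` follow the other examples on this page.

```nim
import readfx

# Placeholder interleaved input: one R1/R2 pair.
writeFile("demo_interleaved.fastq",
  "@frag1/1\nACGT\n+\nIIII\n@frag1/2\nTTGA\n+\nIIII\n")

var seen: seq[(string, string)]
for pair in readFQInterleavedPairPtr("demo_interleaved.fastq"):
  # Copy both sequences while the pointers are still valid.
  seen.add(($pair.read1.sequence, $pair.read2.sequence))

echo "Pairs read: ", seen.len
```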

Iterator for reading paired-end FASTQ files synchronously

Reads two FASTQ files in parallel, yielding pairs of corresponding records. The files must have the same number of sequences in the same order.

Args:

  • path1: Path to the first FASTQ file (R1, forward reads)
  • path2: Path to the second FASTQ file (R2, reverse reads)
  • checkNames: Whether to verify that read names match (default: false)

Returns: An iterator yielding FQPair objects with synchronized reads

Raises:

  • IOError: If files cannot be opened or have mismatched lengths
  • ValueError: If checkNames is true and read names don't match

Example:

for pair in readFQPair("sample_R1.fastq.gz", "sample_R2.fastq.gz"):
  echo "Forward: ", pair.read1.name
  echo "Reverse: ", pair.read2.name
  processReadPair(pair.read1.sequence, pair.read2.sequence)
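
The documented exceptions can be handled around the loop. This is a sketch only, assuming identical read names in both files pass the checkNames test; the file names `demo_R1.fastq` and `demo_R2.fastq` and their contents are placeholders.

```nim
import readfx

# Placeholder paired input: one record per file, same read name.
writeFile("demo_R1.fastq", "@frag1\nACGT\n+\nIIII\n")
writeFile("demo_R2.fastq", "@frag1\nTTGA\n+\nIIII\n")

var pairs = 0
try:
  for pair in readFQPair("demo_R1.fastq", "demo_R2.fastq", checkNames = true):
    echo pair.read1.name, " / ", pair.read2.name
    inc pairs
except ValueError as e:
  # Raised when checkNames is true and the names differ.
  echo "Read names do not match: ", e.msg
except IOError as e:
  # Raised when a file cannot be opened or the files are out of step.
  echo "Could not read the input files: ", e.msg

echo "Pairs read: ", pairs
```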

Formats a sequence record as a FASTA or FASTQ string

Args:

  • name: Sequence name/identifier
  • comment: Sequence comment (optional)
  • sequence: The sequence string
  • quality: Quality scores (empty for FASTA format)

Returns: Formatted FASTA/FASTQ string

Print a FASTA record, splitting the sequence into lines of N characters.

Returns: Formatted FASTA string

Returns: Formatted FASTA/FASTQ string

Convert an FQRecordPtr to a string (FASTA or FASTQ format)

Returns: Formatted FASTA/FASTQ string
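
The fixed-width splitting described for the FASTA formatter above can be illustrated in plain Nim. This is a sketch of the general technique, not the library's implementation: `wrapFasta` is a hypothetical helper, and details such as the trailing newline are assumptions.

```nim
proc wrapFasta(name, sequence: string; width = 60): string =
  ## Hypothetical helper: a header line followed by the sequence
  ## split into lines of at most `width` characters.
  doAssert width > 0, "width must be positive"
  result = ">" & name & "\n"
  var i = 0
  while i < sequence.len:
    let stop = min(i + width, sequence.len)
    result.add sequence[i ..< stop]
    result.add '\n'
    i = stop

stdout.write wrapFasta("seq1", "ACGTACGTAC", width = 4)
```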

Open a buffered reader over a gzip/plain file handle.

Args:

  • f: Buffer object to initialize
  • fn: Input filename ("-" means stdin)
  • mode: File mode (only fmRead is supported)
  • sz: Internal buffer size in bytes

Returns: 0 on success; may raise on open failure

Create and open a buffered reader in one call.

Args:

  • fn: Input filename ("-" means stdin)
  • mode: File mode (only fmRead is supported)
  • sz: Internal buffer size in bytes

Returns: Initialized Bufio[T] instance

Close a buffered reader and its underlying file handle.

Args: f: Buffered reader to close

Returns: Close status from the underlying file handle

Report whether the buffered reader has reached end-of-file.

Args: f: Buffered reader

Returns: true when no more bytes can be read

Read a single byte from the buffered reader.

Args: f: Buffered reader

Returns: Byte value (0..255) on success, or:

  • -1: EOF
  • -2: stream read error

Read up to sz bytes from the buffered reader into buf.

Args:

  • f: Buffered reader
  • buf: Destination buffer
  • sz: Number of bytes to read
  • offset: Write position inside buf

Returns: Number of bytes written into buf

Read from buffered stream until a delimiter or EOF.

Args:

  • f: Buffered reader
  • buf: Destination buffer
  • dret: Delimiter character found
  • delim: Delimiter mode:
      • -1: read line
      • -2: read field (space/tab/newline)
      • other: read until the given byte value
  • offset: Write position inside buf

Returns: Number of bytes appended, or negative status code:

  • -1: EOF
  • -2: stream read error
  • -3: internal buffered-state error

Read one line from buffered stream.

Args:

  • f: Buffered reader
  • buf: Line destination buffer

Returns: true if a line was read, false on EOF/error
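
The buffered reader calls above combine into a simple read loop. The sketch below assumes that GzFile is a valid type argument for xopen and that readLine replaces the contents of the destination buffer on each call; `countLines` and the file name `demo_lines.txt` are hypothetical names used for illustration.

```nim
import readfx

proc countLines(path: string): int =
  ## Hypothetical helper: count lines using the buffered reader.
  var f = xopen[GzFile](path)
  defer: discard f.close()
  var line: string
  # readLine returns false on EOF or error, which ends the loop.
  while f.readLine(line):
    inc result

# Placeholder input so the sketch is self-contained.
writeFile("demo_lines.txt", "alpha\nbeta\ngamma\n")
echo countLines("demo_lines.txt")
```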

Read one FASTA/FASTQ record from a buffered stream.

Args:

  • f: Buffered reader
  • r: Record object reused for output

Returns: true if one record was parsed, false on EOF/error

Notes: On failure, r.status is set to a negative code:

  • -1: EOF
  • -2: stream read error
  • -3: parser/stream state error
  • -4: FASTQ sequence and quality length mismatch
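
When a read loop ends, the status code tells a clean end of input apart from a failure. A minimal sketch of turning the codes listed above into messages; `describeStatus` is a hypothetical helper and not part of the library.

```nim
proc describeStatus(status: int): string =
  ## Hypothetical helper: map the documented negative status codes to text.
  case status
  of -1: "end of file"
  of -2: "stream read error"
  of -3: "parser/stream state error"
  of -4: "FASTQ sequence and quality length mismatch"
  else: "unknown status " & $status

# After a loop such as `while f.readFastx(r): ...` has finished,
# any status other than -1 would indicate a problem worth reporting.
echo describeStatus(-4)
```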

Sort intervals in-place by their start coordinate.

Args: a: Interval sequence to sort

Build interval-tree auxiliary maxima for overlap queries.

Args: a: Interval sequence (sorted automatically if needed)

Returns: Height of the implicit interval tree

Iterate intervals that overlap the query range [st, en).

Args:

  • a: Indexed interval sequence
  • st: Query start coordinate (inclusive)
  • en: Query end coordinate (exclusive)

Yields: Intervals that overlap the query range
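
The three interval operations are meant to be used together: fill a sequence, index it, then query. The sketch below assumes intervals are built as tuples matching the Interval type listed under Types, with `max` initialised to 0 and filled in by index; the coordinates and payloads are made-up values. Results are sorted before printing because the yield order is not stated here.

```nim
import std/algorithm
import readfx

# Three intervals with string payloads.
var ivs: seq[Interval[int, string]]
ivs.add((st: 10, en: 20, data: "a", max: 0))
ivs.add((st: 30, en: 40, data: "b", max: 0))
ivs.add((st: 18, en: 35, data: "c", max: 0))

# Build the auxiliary maxima needed for overlap queries.
ivs.index()

# Query the half-open range [15, 25).
var hits: seq[string]
for iv in ivs.overlap(15, 25):
  hits.add iv.data

echo sorted(hits)
```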

Types

Bufio[T] = tuple[fp: T, buf: string, st, en, sz: int, EOF: bool]
GzFile = gzFile
Interval[S; T] = tuple[st, en: S, data: T, max: S]

Procs

proc `$`(rec: FQRecord): string {....raises: [], tags: [], forbids: [].}
proc `$`(rec: FQRecordPtr): string {....raises: [], tags: [], forbids: [].}
proc close[T](f: var Bufio[T]): int {.discardable.}
proc eof[T](f: Bufio[T]): bool {.noSideEffect.}
proc fafmt(rec: FQRecord; width: int = 60): string {....raises: [], tags: [],
    forbids: [].}
proc index[S, T](a: var seq[Interval[S, T]]): int {.discardable.}
proc kseq_init(fp: gzFile): ptr kseq_t {.
    header: "/Users/telatina/git/readfx/readfx/kseq.h", importc: "kseq_init",
    ...raises: [], tags: [], forbids: [].}
proc kseq_read(seq: ptr kseq_t): int {.header: "/Users/telatina/git/readfx/readfx/kseq.h",
                                       importc: "kseq_read", ...raises: [],
                                       tags: [], forbids: [].}
proc kseq_rewind(seq: ptr kseq_t) {.header: "/Users/telatina/git/readfx/readfx/kseq.h",
                                    importc: "kseq_rewind", ...raises: [],
                                    tags: [], forbids: [].}
proc open[T](f: var Bufio[T]; fn: string; mode: FileMode = fmRead;
             sz: int = 0x00010000): int {.discardable.}
proc read[T](f: var Bufio[T]; buf: var string; sz: int; offset: int = 0): int {.
    discardable.}
proc readByte[T](f: var Bufio[T]): int
proc readFastx[T](f: var Bufio[T]; r: var FQRecord): bool {.discardable.}
proc readLine[T](f: var Bufio[T]; buf: var string): bool {.discardable.}
proc readUntil[T](f: var Bufio[T]; buf: var string; dret: var char;
                  delim: int = -1; offset: int = 0): int {.discardable.}
proc sort[S, T](a: var seq[Interval[S, T]])
proc xopen[T](fn: string; mode: FileMode = fmRead; sz: int = 0x00010000): Bufio[
    T]

Iterators

iterator overlap[S, T](a: seq[Interval[S, T]]; st: S; en: S): Interval[S, T] {.
    noSideEffect.}
iterator readFQ(path: string): FQRecord {....raises: [IOError], tags: [],
    forbids: [].}
iterator readFQInterleavedPairPtr(path: string; checkNames: bool = false): FQPairPtr {.
    ...raises: [IOError, ValueError, IOError], tags: [], forbids: [].}
iterator readFQPair(path1: string; path2: string; checkNames: bool = false): FQPair {.
    ...raises: [IOError, ValueError], tags: [], forbids: [].}
iterator readFQPairPtr(path1: string; path2: string; checkNames: bool = false): FQPairPtr {.
    ...raises: [IOError, IOError, ValueError], tags: [], forbids: [].}
iterator readFQPtr(path: string): FQRecordPtr {....raises: [IOError], tags: [],
    forbids: [].}