ReadFX: A Nim library for bioinformatics sequence file parsing
This module provides efficient parsing and manipulation of FASTA/FASTQ format sequence files commonly used in bioinformatics.
Features:
- Fast FASTA/FASTQ sequence parsing (supports gzipped files)
- Buffered I/O for efficient file reading
- Interval tree implementation for genomic interval operations
Example:
    import readfx

    # Read a FASTQ file
    for record in readFQ("sample.fastq.gz"):
      echo "Sequence name: ", record.name
      echo "Sequence: ", record.sequence
      echo "Quality: ", record.quality
Initialize a kseq parser handle from an open gzFile stream.
Args: fp: Open gzip/plain-text stream handle
Returns: Pointer to an initialized parser state
Reset parser state to the beginning of the input stream.
Args: seq: Parser state previously created with kseq_init
Read the next FASTA/FASTQ record from the parser stream.
Args: seq: Parser state previously created with kseq_init
Returns: Record sequence length on success, or a negative status code on EOF/error
Iterator for reading FASTQ files, returning pointers to record data
Note: The pointers are reused between iterations, so don't store them. For stdin input, use "-" as the path parameter. Cached lengths are available on each FQRecordPtr field and exclude the trailing NUL terminator.
Args: path: Path to the FASTQ file (supports gzipped files)
Returns: An iterator yielding FQRecordPtr objects
Raises: IOError: If the input stream cannot be opened
Example:
    for rec in readFQPtr("sample.fastq.gz"):
      echo $rec.name
      echo $rec.sequence
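The note about reused pointers can be illustrated without the library. The following is a minimal sketch, assuming a parser that keeps one shared buffer and hands out `cstring` views into it; `buf` and `fill` are hypothetical stand-ins, not readfx code.

```nim
# Hypothetical stand-in for a parser's reusable record buffer (not readfx code).
var buf: array[16, char]

proc fill(s: string) =
  ## Overwrite the shared buffer with `s`, padding the rest with NUL bytes.
  doAssert s.len < buf.len
  for i in 0 ..< buf.len: buf[i] = '\0'
  for i, c in s: buf[i] = c

fill("ACGT")
let p = cast[cstring](addr buf[0])  # pointer view into the shared buffer
let kept = $p                       # `$` copies the bytes into a Nim string

fill("TTGA")                        # the "next record" overwrites the buffer
doAssert $p == "TTGA"               # the pointer now shows the new record
doAssert kept == "ACGT"             # the earlier copy is unaffected
```

Converting a pointer field with `$` before the next iteration is therefore the safe way to retain it.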
Iterator for reading FASTQ files, returning copies of record data
This iterator creates copies of the strings, unlike readFQPtr which returns pointers to the underlying data.
Args: path: Path to the FASTQ file (supports gzipped files)
Returns: An iterator yielding FQRecord objects with copied data
Raises: IOError: Propagated from readFQPtr if the input stream cannot be opened
Example:
    for rec in readFQ("sample.fastq.gz"):
      echo rec.name
      echo rec.sequence
Iterator for reading paired-end FASTQ files synchronously with pointers
Reads two FASTQ files in parallel, yielding pairs of corresponding records. Pointer fields are reused between iterations; convert to strings if data must be retained after the next yield. Cached lengths are available on both read1 and read2.
Args:
- path1: Path to the first FASTQ file (R1, forward reads)
- path2: Path to the second FASTQ file (R2, reverse reads)
- checkNames: Whether to verify that read names match (default: false)
Returns: An iterator yielding FQPairPtr objects with synchronized reads
Raises:
- IOError: If either file cannot be opened or the two files contain different numbers of records
- ValueError: If checkNames is true and read names don't match
Iterator for reading interleaved paired-end FASTQ files with pointers.
Reads one interleaved FASTQ stream and yields each adjacent R1/R2 record pair as FQPairPtr. read1 is copied into scratch storage so that both records remain valid until the next yield. Scratch-backed pointers are always NUL-terminated.
Args:
- path: Path to the interleaved FASTQ file (supports gzipped files; "-" reads from stdin)
- checkNames: Whether to verify that read names match after mate suffix normalization (default: false)
Returns: An iterator yielding FQPairPtr objects with cached field lengths
Raises:
- IOError: If the input stream cannot be opened or ends with an incomplete pair
- ValueError: If input is not FASTQ or checkNames detects a mismatch
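The exact normalization applied by checkNames is not spelled out here. As a rough sketch, assuming the common convention where mates carry a trailing `/1` or `/2`, a name comparison could look like this; `stripMateSuffix` and `matesMatch` are hypothetical names, not part of readfx.

```nim
import std/strutils

proc stripMateSuffix(name: string): string =
  ## Assumed convention: drop a trailing "/1" or "/2" mate marker.
  result = name
  if result.endsWith("/1") or result.endsWith("/2"):
    result.setLen(result.len - 2)

proc matesMatch(name1, name2: string): bool =
  ## Compare two read names after removing the assumed mate markers.
  stripMateSuffix(name1) == stripMateSuffix(name2)

doAssert matesMatch("read7/1", "read7/2")
doAssert not matesMatch("read7/1", "read8/2")
```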
Iterator for reading paired-end FASTQ files synchronously
Reads two FASTQ files in parallel, yielding pairs of corresponding records. The files must have the same number of sequences in the same order.
Args:
- path1: Path to the first FASTQ file (R1, forward reads)
- path2: Path to the second FASTQ file (R2, reverse reads)
- checkNames: Whether to verify that read names match (default: false)
Returns: An iterator yielding FQPair objects with synchronized reads
Raises:
- IOError: If either file cannot be opened or the two files contain different numbers of records
- ValueError: If checkNames is true and read names don't match
Example:
    for pair in readFQPair("sample_R1.fastq.gz", "sample_R2.fastq.gz"):
      echo "Forward: ", pair.read1.name
      echo "Reverse: ", pair.read2.name
      # processReadPair stands for your own procedure; it is not provided by this module
      processReadPair(pair.read1.sequence, pair.read2.sequence)
Formats a sequence record as a FASTA or FASTQ string
Args:
- name: Sequence name/identifier
- comment: Sequence comment (optional)
- sequence: The sequence string
- quality: Quality scores (empty for FASTA format)
Returns: Formatted FASTA/FASTQ string
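The rule that an empty quality string selects FASTA output can be sketched as follows. This is an illustration under assumptions, not the module's implementation: `formatRecord` is a hypothetical name, and details such as a trailing newline may differ in the real formatter.

```nim
proc formatRecord(name, comment, sequence: string; quality = ""): string =
  ## Hypothetical helper: FASTA when `quality` is empty, FASTQ otherwise.
  let header = if comment.len > 0: name & " " & comment else: name
  if quality.len == 0:
    ">" & header & "\n" & sequence
  else:
    "@" & header & "\n" & sequence & "\n+\n" & quality

doAssert formatRecord("r1", "", "ACGT") == ">r1\nACGT"
doAssert formatRecord("r1", "lane=1", "ACGT", "IIII") == "@r1 lane=1\nACGT\n+\nIIII"
```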
Format a FASTA record, splitting the sequence into lines of N characters.
Returns: Formatted FASTA string
Convert a FQRecordPtr to a string (FASTA or FASTQ format)
Returns: Formatted FASTA/FASTQ string
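Splitting a FASTA sequence into fixed-width lines, as described above, can be sketched like this. `wrapFasta` is a hypothetical helper, and the default width of 60 is an assumption; the module exports `DefaultFastaWidth`, whose value is not stated here.

```nim
proc wrapFasta(name, sequence: string; width = 60): string =
  ## Hypothetical helper: FASTA text with at most `width` residues per line.
  doAssert width > 0
  result = ">" & name
  var i = 0
  while i < sequence.len:
    result.add '\n'
    result.add sequence[i ..< min(i + width, sequence.len)]
    i += width

doAssert wrapFasta("r1", "ACGTACGTAC", 4) == ">r1\nACGT\nACGT\nAC"
```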
Open a buffered reader over a gzip/plain file handle.
Args:
- f: Buffer object to initialize
- fn: Input filename ("-" means stdin)
- mode: File mode (only fmRead is supported)
- sz: Internal buffer size in bytes
Returns: 0 on success; may raise on open failure
Create and open a buffered reader in one call.
Args:
- fn: Input filename ("-" means stdin)
- mode: File mode (only fmRead is supported)
- sz: Internal buffer size in bytes
Returns: Initialized Bufio[T] instance
Close a buffered reader and its underlying file handle.
Args: f: Buffered reader to close
Returns: Close status from the underlying file handle
Report whether buffered reader has reached end-of-file.
Args: f: Buffered reader
Returns: true when no more bytes can be read
Read a single byte from the buffered reader.
Args: f: Buffered reader
Returns: Byte value (0..255) on success, or:
- -1: EOF
- -2: stream read error
Read up to sz bytes from the buffered reader into buf.
Args:
- f: Buffered reader
- buf: Destination buffer
- sz: Number of bytes to read
- offset: Write position inside buf
Returns: Number of bytes written into buf
Read from buffered stream until a delimiter or EOF.
Args:
- f: Buffered reader
- buf: Destination buffer
- dret: Delimiter character found
- delim: Delimiter mode:
  - `-1`: read line
  - `-2`: read field (space/tab/newline)
  - other: read until the given byte value
- offset: Write position inside buf
Returns: Number of bytes appended, or negative status code:
- -1: EOF
- -2: stream read error
- -3: internal buffered-state error
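The three delimiter modes can be illustrated on an in-memory string. This is a simplified sketch, not the buffered implementation: `SepLine`, `SepField`, `isDelimiter`, and `readUntilSketch` are hypothetical names, and line mode here treats only `\n` as the terminator, which is an assumption.

```nim
const
  SepLine = -1   # read up to end of line
  SepField = -2  # read up to space, tab, or newline

proc isDelimiter(c: char; delim: int): bool =
  ## Decide whether `c` ends the current token under the given mode.
  case delim
  of SepLine: c == '\n'
  of SepField: c in {' ', '\t', '\n'}
  else: delim == ord(c)

proc readUntilSketch(data: string; pos: var int; delim: int): string =
  ## Collect bytes from data[pos..] until a delimiter or the end of data.
  while pos < data.len and not isDelimiter(data[pos], delim):
    result.add data[pos]
    inc pos
  if pos < data.len: inc pos  # consume the delimiter itself

var pos = 0
let data = "id1\tACGT\nid2"
doAssert readUntilSketch(data, pos, SepField) == "id1"
doAssert readUntilSketch(data, pos, SepLine) == "ACGT"
doAssert readUntilSketch(data, pos, ord(';')) == "id2"
```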
Read one line from buffered stream.
Args:
- f: Buffered reader
- buf: Line destination buffer
Returns: true if a line was read, false on EOF/error
Read one FASTA/FASTQ record from a buffered stream.
Args:
- f: Buffered reader
- r: Record object reused for output
Returns: true if one record was parsed, false on EOF/error
Notes: On failure, r.status is set to a negative code:
- -1: EOF
- -2: stream read error
- -3: parser/stream state error
- -4: FASTQ sequence and quality length mismatch
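When a read fails, callers usually need to tell a clean end of input from a real error. A small sketch of mapping the status codes listed above to messages follows; `describeStatus` and `isError` are hypothetical helpers, not part of the module.

```nim
proc describeStatus(status: int): string =
  ## Translate the documented negative status codes into text.
  case status
  of -1: "end of file"
  of -2: "stream read error"
  of -3: "parser/stream state error"
  of -4: "FASTQ sequence and quality length mismatch"
  else: "unknown status " & $status

proc isError(status: int): bool =
  ## EOF (-1) is the normal way for a read loop to finish; lower codes are failures.
  status < -1

doAssert describeStatus(-4) == "FASTQ sequence and quality length mismatch"
doAssert not isError(-1)
doAssert isError(-2)
```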
Sort intervals in-place by their start coordinate.
Args: a: Interval sequence to sort
Build interval-tree auxiliary maxima for overlap queries.
Args: a: Interval sequence (sorted automatically if needed)
Returns: Height of the implicit interval tree
Iterate intervals that overlap the query range [st, en).
Args:
- a: Indexed interval sequence
- st: Query start coordinate (inclusive)
- en: Query end coordinate (exclusive)
Yields: Intervals that overlap the query range
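The half-open overlap rule for a query [st, en) can be sketched with a plain linear scan over start-sorted intervals. This swaps in a simpler technique than the indexed interval tree documented above, and the `Interval` tuple, `overlaps`, and `overlapScan` are hypothetical names for illustration only.

```nim
import std/algorithm

type Interval = tuple[st, en: int]  # hypothetical half-open interval [st, en)

proc overlaps(a: Interval; st, en: int): bool =
  ## Two half-open ranges overlap when each starts before the other ends.
  a.st < en and st < a.en

proc overlapScan(ivs: seq[Interval]; st, en: int): seq[Interval] =
  ## Linear scan over start-sorted intervals; stops once starts reach `en`.
  for iv in ivs:
    if iv.st >= en: break
    if overlaps(iv, st, en): result.add iv

var ivs: seq[Interval] = @[(st: 30, en: 40), (st: 5, en: 10), (st: 8, en: 20)]
ivs.sort(proc (x, y: Interval): int = cmp(x.st, y.st))
doAssert overlapScan(ivs, 10, 30) == @[(st: 8, en: 20)]
```

Because the end coordinate is exclusive, the interval (5, 10) does not overlap a query starting at 10, and the interval (30, 40) does not overlap a query ending at 30.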
Procs
proc `$`(rec: FQRecordPtr): string {....raises: [], tags: [], forbids: [].}
proc kseq_rewind(seq: ptr kseq_t) {.header: "/Users/telatina/git/readfx/readfx/kseq.h", importc: "kseq_rewind", ...raises: [], tags: [], forbids: [].}
Iterators
iterator readFQInterleavedPairPtr(path: string; checkNames: bool = false): FQPairPtr {. ...raises: [IOError, ValueError, IOError], tags: [], forbids: [].}
iterator readFQPair(path1: string; path2: string; checkNames: bool = false): FQPair {. ...raises: [IOError, ValueError], tags: [], forbids: [].}
iterator readFQPairPtr(path1: string; path2: string; checkNames: bool = false): FQPairPtr {. ...raises: [IOError, IOError, ValueError], tags: [], forbids: [].}
iterator readFQPtr(path: string): FQRecordPtr {....raises: [IOError], tags: [], forbids: [].}
Exports
FQPair, Strand, FQPairPtr, SeqComp, FQRecord, FQRecordPtr, revCompl, composition, revCompl, maskLowQuality, trimEnd, qualIntToChar, avgQuality, avgQuality, qualityTrim, revCompl, filtPolyX, subSequence, gcContent, gcContent, trimStart, trimQuality, qualCharToInt, findOligoMatches, findPrimerMatches, matchIUPAC, DefaultFastaWidth, DefaultBufferSize, flush, DefaultCompressionLevel, close, writeRecord, fileDestination, stdoutDestination, writeRecord, FastxFormat, FastxDestinationKind, fastxWriter, FastxWriter, FastxDestination