ReadFX: A Nim library for bioinformatics sequence file parsing
This module provides efficient parsing and manipulation of FASTA/FASTQ format sequence files commonly used in bioinformatics.
Features:
- Fast FASTA/FASTQ sequence parsing (supports gzipped files)
- Buffered I/O for efficient file reading
- Interval tree implementation for genomic interval operations
Example:
    import readfx

    # Read a FASTQ file
    for record in readFQ("sample.fastq.gz"):
      echo "Sequence name: ", record.name
      echo "Sequence: ", record.sequence
      echo "Quality: ", record.quality
Iterator for reading FASTQ files, returning pointers to record data
Note: The pointers are reused between iterations, so don't store them. For stdin input, use "-" as the path parameter.
Args: path: Path to the FASTQ file (supports gzipped files)
Returns: An iterator yielding FQRecordPtr objects
Example:
    for rec in readFQPtr("sample.fastq.gz"):
      echo $rec.name
      echo $rec.sequence
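Because the pointers are reused, any value that must outlive the current iteration has to be copied first. The following is a minimal sketch of that pattern, assuming (as in the example above) that `$rec.name` yields an independent Nim string. The `collectNames` helper and the file path are hypothetical and not part of ReadFX.

```nim
import readfx

proc collectNames(path: string): seq[string] =
  ## Hypothetical helper: gather all read names from a FASTQ file.
  for rec in readFQPtr(path):
    # `$` converts the pointed-to data into a Nim string, so the stored
    # value stays valid after the iterator reuses its buffers.
    result.add($rec.name)

when isMainModule:
  # "sample.fastq.gz" is a placeholder path.
  let names = collectNames("sample.fastq.gz")
  echo names.len, " reads"
```

If every field of every record needs to be kept, readFQ is likely the simpler choice, since it already returns copies.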
Iterator for reading FASTQ files, returning copies of record data
This iterator creates copies of the strings, unlike readFQPtr which returns pointers to the underlying data.
Args: path: Path to the FASTQ file (supports gzipped files)
Returns: An iterator yielding FQRecord objects with copied data
Example:
    for rec in readFQ("sample.fastq.gz"):
      echo rec.name
      echo rec.sequence
Iterator for reading paired-end FASTQ files synchronously
Reads two FASTQ files in parallel, yielding pairs of corresponding records. The files must have the same number of sequences in the same order.
Args:
- path1: Path to the first FASTQ file (R1, forward reads)
- path2: Path to the second FASTQ file (R2, reverse reads)
- checkNames: Whether to verify that read names match (default: false)
Returns: An iterator yielding FQPair objects with synchronized reads
Raises:
- IOError: If files cannot be opened or have mismatched lengths
- ValueError: If checkNames is true and read names don't match
Example:
    for pair in readFQPair("sample_R1.fastq.gz", "sample_R2.fastq.gz"):
      echo "Forward: ", pair.read1.name
      echo "Reverse: ", pair.read2.name
      processReadPair(pair.read1.sequence, pair.read2.sequence)
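The documented exceptions can be handled around the loop. This is a sketch under stated assumptions: it uses the `checkNames` parameter and the `IOError`/`ValueError` exceptions listed above, while the `countEqualLengthPairs` helper and the file names are hypothetical.

```nim
import readfx

proc countEqualLengthPairs(r1Path, r2Path: string): int =
  ## Hypothetical helper: count pairs whose two reads have the same length.
  ## Exceptions raised by readFQPair propagate to the caller.
  for pair in readFQPair(r1Path, r2Path, checkNames = true):
    if pair.read1.sequence.len == pair.read2.sequence.len:
      inc result

when isMainModule:
  try:
    # Placeholder paths for a paired-end run.
    echo countEqualLengthPairs("sample_R1.fastq.gz", "sample_R2.fastq.gz")
  except ValueError as e:
    # Raised when checkNames is true and read names differ.
    stderr.writeLine "Read names do not match: ", e.msg
  except IOError as e:
    # Raised when a file cannot be opened or the files differ in length.
    stderr.writeLine "Could not read input: ", e.msg
```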
Formats a sequence record as a FASTA or FASTQ string
Args:
- name: Sequence name/identifier
- comment: Sequence comment (optional)
- sequence: The sequence string
- quality: Quality scores (empty for FASTA format)
Returns: Formatted FASTA/FASTQ string
Print FASTA record, splitting the sequence into lines of N characters
Returns: Formatted FASTA string
Convert a FQRecordPtr to a string (FASTA or FASTQ format)
Returns: Formatted FASTA/FASTQ string
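To illustrate what splitting a FASTA sequence into lines of N characters means, here is a self-contained sketch in plain Nim. It does not use or reproduce the ReadFX procs: `wrapFasta` and its `width` parameter are hypothetical names chosen for this example, and the library's own output may differ in details such as how the comment is rendered.

```nim
proc wrapFasta(name, sequence: string; width: int = 60): string =
  ## Hypothetical helper, not part of ReadFX: format a FASTA record with
  ## the sequence broken into lines of at most `width` characters.
  doAssert width > 0, "width must be positive"
  result = ">" & name & "\n"
  var i = 0
  while i < sequence.len:
    # Take the next chunk, shortened at the end of the sequence.
    let stop = min(i + width, sequence.len)
    result.add sequence[i ..< stop]
    result.add '\n'
    i = stop

when isMainModule:
  # Prints the header line, then ACGT, ACGT, AC on separate lines.
  stdout.write wrapFasta("seq1", "ACGTACGTAC", width = 4)
```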
Procs
proc `$`(rec: FQRecordPtr): string {....raises: [], tags: [], forbids: [].}
proc kseq_rewind(seq: ptr kseq_t) {.header: "/Users/telatina/git/readfx/readfx/kseq.h", importc: "kseq_rewind", ...raises: [], tags: [], forbids: [].}
Iterators
iterator readFQPair(path1: string; path2: string; checkNames: bool = false): FQPair {. ...raises: [IOError, IOError, ValueError], tags: [], forbids: [].}
iterator readFQPtr(path: string): FQRecordPtr {....raises: [], tags: [], forbids: [].}