This workshop focuses on showing a simplified workflow to mine phages in a metagenome.

Starting from reads, we need to do some quality filtering and assembly to get contigs.

In a simplified workflow:

[Figure: simplified workflow]

We will start from the co-assembly of three samples and use:

  1. A viral mining tool (geNomad) to predict viral sequences in a metagenome. It produces a FASTA file with viral sequences and prophages.
  2. A program to check the quality of the predictions (CheckV). It reports, for each prediction, the bacterial contamination, the completeness, and a score based on marker genes.
  3. A dereplication step to cluster the viral sequences into vOTUs (viral Operational Taxonomic Units).
  4. SeqFu to rename the sequences and make them Anvi’o friendly.
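To illustrate what the renaming step achieves, here is a minimal Python sketch. It is not how SeqFu works internally: the function names `rename_fasta` and `is_simple_name`, the `vOTU` prefix, and the example headers are all hypothetical. It assumes that an "Anvi’o friendly" name contains only letters, digits, underscores, and hyphens, and that provirus headers may contain a `|` character.

```python
import re


def is_simple_name(name):
    """True if the name uses only letters, digits, '_' and '-' (assumed Anvi'o-safe set)."""
    return re.fullmatch(r"[A-Za-z0-9_-]+", name) is not None


def rename_fasta(fasta_text, prefix="vOTU"):
    """Rename FASTA records to prefix_1, prefix_2, ... in order of appearance.

    Returns the renamed FASTA text and a mapping {new_name: old_name},
    so that the original identifiers can still be traced back.
    """
    out_lines = []
    mapping = {}
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            count += 1
            new_name = f"{prefix}_{count}"
            header = line[1:].strip()
            # The identifier is the first word of the header; comments are dropped.
            mapping[new_name] = header.split()[0] if header else ""
            out_lines.append(f">{new_name}")
        elif line.strip():
            out_lines.append(line.strip())
    return "\n".join(out_lines) + "\n", mapping


if __name__ == "__main__":
    # Hypothetical input: one provirus-style header and one plain contig name.
    example = ">k141_12|provirus_100_2000 some description\nACGT\nGG\n>k141_7\nTTAA\n"
    renamed, mapping = rename_fasta(example)
    print(renamed, end="")
    for new_name, old_name in mapping.items():
        print(new_name, old_name, is_simple_name(old_name), sep="\t")
```

Keeping the mapping makes it possible to link each renamed vOTU back to the corresponding row of the quality report.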

To keep our Anvi’o love high, we can also:

  1. Map the reads back to the vOTUs with minimap2 and samtools to estimate their abundance in the original samples.
  2. Use Anvi’o to generate a visualization focused on the vOTUs, building a CONTIGS.db database based only on the candidate vOTUs.
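One way to turn the mapping results into abundance estimates is to compare the mean depth of each vOTU across samples. The sketch below assumes that each sample's BAM file has been summarized with `samtools coverage`, whose tabular output includes the `#rname` and `meandepth` columns. The function names and the toy numbers are hypothetical and only serve as an illustration.

```python
import csv
import io


def mean_depth_per_votu(coverage_tsv):
    """Parse the tabular output of `samtools coverage` into {contig: mean depth}."""
    reader = csv.DictReader(io.StringIO(coverage_tsv), delimiter="\t")
    return {row["#rname"]: float(row["meandepth"]) for row in reader}


def abundance_table(per_sample):
    """Turn {sample: {votu: depth}} into {votu: {sample: depth}}.

    A vOTU missing from a sample is reported with a depth of 0.0.
    """
    votus = sorted({v for depths in per_sample.values() for v in depths})
    return {v: {s: per_sample[s].get(v, 0.0) for s in per_sample} for v in votus}


if __name__ == "__main__":
    header = ("#rname\tstartpos\tendpos\tnumreads\tcovbases\t"
              "coverage\tmeandepth\tmeanbaseq\tmeanmapq\n")
    # Toy values, made up for the example (not real results).
    sample_a = header + "vOTU_1\t1\t1000\t50\t900\t90\t7.5\t36.1\t60\n"
    sample_b = header + "vOTU_2\t1\t500\t10\t400\t80\t3.25\t35\t60\n"
    table = abundance_table({
        "A": mean_depth_per_votu(sample_a),
        "B": mean_depth_per_votu(sample_b),
    })
    for votu, depths in table.items():
        print(votu, depths)
```

Mean depth is only one possible abundance proxy. It is not normalized for sequencing depth, so samples with very different read counts are not directly comparable.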

EBAME settings

We will use VMs provided by Biosphere. A setup script makes the datasets and databases available automatically. See EBAME Setup.

