Every miner use a mix of signals to try to identify a contig as viral. The enormous diversity, the fact that there are very short phage genomes, and lack of universal markers, makes this process very challenging.

Now that we have some predicted viruses, let’s look at their completeness! CheckV is a tool developed to assess the quality and completeness of viral genomes assembled from viromes/metagenomes.

CheckV, similarily to geNomad, is also a pipeline we will run end-to-end to:

  • Remove host contamination on proviruses (uses HMMs),
  • Estimate genome completeness
  • Predict full genomes (Direct terminal repeats, Proviruses, Inverted terminal repeats)

Installation

We can once again install using mamba/conda as below:

mamba create -n checkv -c conda-forge -c bioconda checkv
conda activate checkv

You will find the pre-downloaded database in $VIROME/checkv-db-v1.5/

:information_source: How to download the checkV database at home?

If you are running this tutorial outside of EBAME VM, you will need to download the checkV database

  checkv download_database ./

Don’t forget to update your checkV script to point to your database localisation!

Running CheckV

Now remember to activate the environment and invoke the help message (similar to geNomad above…).

:pencil2: Try to create to run checkV on the predicted viruses from geNomad.

checkv end_to_end ~/genomad-out/genomad_votus.fna ~/checkv-out -d $VIROME/checkv-db-v1.5/ -t 8

The parameters are <ASSEMBLY>, <OUTPUT_DIR>, -d <DB_DIR>, -t for the number of threads.

Now you can check the output files.

With ls and find you can list them, and if you find a table you can inspect it interactively with vd (visidata, press q to quit).

In the summary columns, AAI stands for “average amino acid identity”,

To list the files

ls -lh ~/checkv-out/

To inspect the summary table:

vd checkv-out/quality_summary.tsv 

Similarily, you can check the other TSV files.


Previous submodule:
Next submodule: