Every miner use a mix of signals to try to identify a contig as viral. The enormous diversity, the fact that there are very short phage genomes, and lack of universal markers, makes this process very challenging.
Now that we have some predicted viruses, let’s look at their completeness! CheckV is a tool developed to assess the quality and completeness of viral genomes assembled from viromes/metagenomes.
CheckV, similarily to geNomad, is also a pipeline we will run end-to-end to:
We can once again install using mamba/conda as below:
mamba create -n checkv -c conda-forge -c bioconda checkv
conda activate checkv
You will find the pre-downloaded database in $VIROME/checkv-db-v1.5/
How to download the checkV database at home?
If you are running this tutorial outside of EBAME VM, you will need to download the checkV database
checkv download_database ./
Don’t forget to update your checkV script to point to your database localisation!
Now remember to activate the environment and invoke the help message (similar to geNomad above…).
Try to create to run checkV on the predicted viruses from geNomad.
checkv end_to_end ~/genomad-out/genomad_votus.fna ~/checkv-out -d $VIROME/checkv-db-v1.5/ -t 8
The parameters are <ASSEMBLY>
, <OUTPUT_DIR>
, -d <DB_DIR>
, -t
for the number of threads.
Now you can check the output files.
With ls
and find
you can list them, and if you find a table you can inspect it interactively with vd
(visidata, press q
to quit).
In the summary columns, AAI stands for “average amino acid identity”,
To list the files
ls -lh ~/checkv-out/
To inspect the summary table:
vd checkv-out/quality_summary.tsv
Similarily, you can check the other TSV files.