Considering the vast diversity in the “non cellular world”, you can tailor this step to be “mobilome mining”, “virus mining” etc. We are usually mostly interested in phages, but you will see there is more to it.

before genomad

The first step is to have a set of contigs, which can be obtained from a metagenome assembly, and flag those contigs we believe to be viral (or part of the mobilome). We will use geNomad to spot viral contigs in the metagenome co-assembly

before checkv

The lack of a universal marker gene for viruses makes the identification of viral sequences a challenging task, and historically false positives were plaguing the mininig step. geNomad is doing a great job in giving good results.

checkV is a tools that using gene annotations can spot good quality viruses. This is an excellent step, but at the moment the database used is outdated so while you will be able to get a useful table to assess the quality of well annotated contigs, it’s not wise to blindly trash contigs not crowned by checkV: they might still be of interest especially in environmental samples. In our tutorial we will use it to see how it works, and we will see that with gut samples it’s doing a nice job in spite of its limitations.


Next submodule: