
Validation Metagenomics Workflows Cheat Sheet (DRAFT) by [deleted]




Shotgun metagenomics is a powerful platform for characterizing human microbiomes. However, to translate such survey data into consumer-relevant products or services, a robust metagenomics workflow is critical. We present a tool, spike-in DNA, for assessing the performance of metagenomics workflows. The spike-in is genomic DNA from two organisms, Aliivibrio fischeri and Rhodopseudomonas palustris, mixed in a 4:1 ratio and added to samples before DNA extraction. With a valid workflow, the ratio of the relative abundances of these organisms in the output (A. fischeri to R. palustris) should be close to 4. This expectation was tested in samples of varying diversity (n = 110), and the mean ratio was 4.73 (99% CI [4.0, 5.24]). We anticipate that this tool will be a relevant community resource for assessing the quality of shotgun metagenomics workflows and thereby enable robust characterization of microbiomes.
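The ratio check described above reduces to dividing two relative abundances from the same profile. The sketch below is a minimal illustration, assuming the workflow's taxonomic profiler reports relative abundances as a mapping from species name to fraction; the function names `spike_in_ratio` and `deviation_from_expected` and the species-name keys are hypothetical, not part of any published tool.

```python
# Hypothetical helpers: assumes relative abundances arrive as {species name: fraction}.
# Expected ratio: A. fischeri DNA is spiked in at 4 parts per 1 part R. palustris.
EXPECTED_RATIO = 4.0


def spike_in_ratio(rel_abundance: dict[str, float],
                   numerator: str = "Aliivibrio fischeri",
                   denominator: str = "Rhodopseudomonas palustris") -> float:
    """Return the observed A. fischeri : R. palustris relative-abundance ratio."""
    num = rel_abundance.get(numerator, 0.0)
    den = rel_abundance.get(denominator, 0.0)
    if den <= 0:
        # Without R. palustris reads the ratio is undefined, which itself signals a failed run.
        raise ValueError(f"{denominator} not detected; ratio is undefined")
    return num / den


def deviation_from_expected(observed: float, expected: float = EXPECTED_RATIO) -> float:
    """Absolute deviation of an observed ratio from the expected 4:1 input ratio."""
    return abs(observed - expected)
```

For a profile in which A. fischeri is at 8% and R. palustris at 2%, `spike_in_ratio` returns 4.0. Because both values come from the same profile, the ratio does not depend on how the rest of the community is distributed.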

Stage 1

1. Break the workflow into discrete modules, e.g., DNA extraction and library preparation.

2. Add spike-in genomic DNA to the sample of interest at the first step of the module. For instance, if library preparation is the module, add the spike-in before tagmentation.

3. Identify the module with the maximum deviation from the expected ratio and iterate on it to improve its performance.

4. Reassemble the modules and use the spike-in to validate the entire workflow (Stage 2).
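Step 3 of Stage 1 can be sketched as a comparison of per-module mean ratios against the expected value of 4. This assumes each module was run with replicate spike-ins and that deviation is measured as the absolute difference between the replicate mean and 4; both the metric and the `worst_module` name are illustrative choices, not taken from the source.

```python
# Hypothetical helper: picks the module whose mean spike-in ratio is furthest from 4.
EXPECTED_RATIO = 4.0


def worst_module(module_ratios: dict[str, list[float]],
                 expected: float = EXPECTED_RATIO) -> tuple[str, float]:
    """Return (module name, |mean ratio - expected|) for the worst-performing module."""
    if not module_ratios:
        raise ValueError("no modules supplied")
    deviations: dict[str, float] = {}
    for module, ratios in module_ratios.items():
        if not ratios:
            raise ValueError(f"module {module!r} has no replicate ratios")
        mean_ratio = sum(ratios) / len(ratios)
        deviations[module] = abs(mean_ratio - expected)
    # The module with the largest deviation is the one to iterate on first.
    name = max(deviations, key=deviations.get)
    return name, deviations[name]
```

For example, replicate ratios of 5.1, 5.3 and 4.9 for DNA extraction versus 4.1, 3.9 and 4.2 for library preparation would point to DNA extraction (deviation of about 1.1).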

Stage 2

1. Assess variance in the spike-in ratio using the experimental design outlined (Figure 1).

2. The spike-in ratio should be closest to the expected value when the spike-in genomic DNA is added to sterile saline and processed through the workflow.

3. Test spike-in performance in samples of varying complexity, and ensure that these samples include the microbiomes of interest. Defining the acceptable variance is left to the operator's discretion. Based on all the samples described here, we defined the acceptable range as 4 to 5.4 (99% confidence interval).
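The acceptance window in Stage 2 can be checked in a few lines. The source does not say how its 99% confidence interval was computed (the intervals reported here are not symmetric about the mean ratio of 4.73), so this sketch assumes a normal approximation for the CI of the mean ratio; for small sample sizes a t-quantile, e.g. from `scipy.stats.t.ppf`, would be more appropriate. The function names are hypothetical, and the default window of 4 to 5.4 is taken from the text above.

```python
import math
import statistics


def ratio_summary(ratios: list[float], confidence: float = 0.99) -> tuple[float, float, float]:
    """Mean spike-in ratio and a normal-approximation CI for that mean (an assumption)."""
    if len(ratios) < 2:
        raise ValueError("need at least two ratios to estimate variance")
    mean = statistics.fmean(ratios)
    sem = statistics.stdev(ratios) / math.sqrt(len(ratios))  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 2.576 for 99%
    return mean, mean - z * sem, mean + z * sem


def within_acceptable_range(ratio: float, low: float = 4.0, high: float = 5.4) -> bool:
    """True if a ratio lies inside the operator-defined acceptance window (inclusive)."""
    return low <= ratio <= high
```

Because the normal approximation is symmetric by construction, it will not reproduce an asymmetric interval; a bootstrap over samples is one alternative if the ratio distribution is skewed.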

Figure 1

Stage 3

The spike-in can be used for per-run QC as follows:

1. Include triplicates of the spike-in alone, added to sterile saline, as a positive control.

2. A pooled sample can be created by mixing the samples of interest. Add the spike-in genomic DNA to this pool in triplicate and process it through the workflow. Calculate spike-in performance as outlined in Stage 2.
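Per-run QC can be sketched as a pass/fail check of both triplicate sets against the Stage 2 window. The rule that a run passes only when both the saline positive control and the spiked pool have mean ratios inside 4 to 5.4 is an assumption for illustration; the source leaves the exact acceptance criterion to the operator, and `run_passes_qc` is a hypothetical name.

```python
import statistics

# Acceptance window from Stage 2; operators may choose their own.
ACCEPTABLE_LOW, ACCEPTABLE_HIGH = 4.0, 5.4


def run_passes_qc(saline_controls: list[float], pooled_spikes: list[float],
                  low: float = ACCEPTABLE_LOW,
                  high: float = ACCEPTABLE_HIGH) -> dict[str, bool]:
    """Check the mean ratio of each triplicate set; the run passes only if both do (assumed rule)."""
    results: dict[str, bool] = {}
    for label, ratios in (("saline_control", saline_controls), ("pooled_sample", pooled_spikes)):
        if len(ratios) != 3:
            raise ValueError(f"{label}: expected triplicate ratios, got {len(ratios)}")
        results[label] = low <= statistics.fmean(ratios) <= high
    # Evaluated before the key is added, so only the two per-set results are combined.
    results["run_passes"] = all(results.values())
    return results
```

For example, saline controls of 4.2, 4.4 and 4.1 with pooled spikes of 5.8, 6.0 and 5.9 would flag the pooled sample (mean 5.9) and fail the run, pointing to a sample-matrix effect rather than a problem with the workflow itself.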