GutMock_sup_rep1

NS Amplicon · 16S · Generated 2026-05-11 14:49

Run Summary

Metric	Value
Sample ID	GutMock_sup_rep1
Amplicon	16S
Top organism	Bacteroides fragilis (21.1%)
Species detected	12
Shannon diversity (H')	2.15
Simpson diversity (1-D)	0.863
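The two diversity indices follow standard definitions. The sketch below recomputes them from the Abundance (%) column of the Abundance Table. It assumes that Shannon's H' uses the natural logarithm and that Simpson's index is reported as 1 - sum(p_i^2) (the Gini-Simpson form). With the table's rounded percentages these assumptions reproduce the reported 2.15 and 0.863. The helper names are hypothetical, and the pipeline's own implementation is not shown in this report.

```python
import math

# Relative abundances (%) copied from the Abundance Table of this report.
abundances_pct = {
    "Bacteroides fragilis": 21.11,
    "Faecalibacterium prausnitzii": 18.62,
    "Veillonella sp.": 13.34,
    "Veillonella parvula": 13.33,
    "Escherichia coli": 10.42,
    "Prevotella corporis": 8.32,
    "Clostridioides difficile": 5.02,
    "Roseburia hominis": 2.64,
    "Fusobacterium nucleatum": 2.63,
    "Bifidobacterium adolescentis": 2.61,
    "Lactobacillus fermentum": 1.14,
    "Akkermansia muciniphila": 0.82,
}

def proportions(pct):
    """Convert percentages to proportions that sum to 1."""
    total = sum(pct.values())
    return [v / total for v in pct.values()]

def shannon(p):
    """Shannon index H' = -sum(p_i * ln p_i), natural log assumed."""
    return -sum(x * math.log(x) for x in p if x > 0)

def simpson(p):
    """Gini-Simpson index 1 - D, where D = sum(p_i^2)."""
    return 1.0 - sum(x * x for x in p)

p = proportions(abundances_pct)
print(f"H' = {shannon(p):.2f}")   # H' = 2.15
print(f"1-D = {simpson(p):.3f}")  # 1-D = 0.863
```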

QC Metric	Value	Status
Raw reads	72,830
After QC	63,638	PASS
Retention	87.4%	PASS
Mean read length	1,439 bp	PASS
Rarefaction	Depth saturation	PASS

Taxonomic Composition

Abundance Table

Organism	Abundance (%)	Est. Reads	Confidence
Bacteroides fragilis	21.11%	13,200	0.77
Faecalibacterium prausnitzii	18.62%	11,644	1.00
Veillonella sp.	13.34%	8,344	0.99
Veillonella parvula	13.33%	8,335	0.99
Escherichia coli	10.42%	6,517	0.99
Prevotella corporis	8.32%	5,206	1.00
Clostridioides difficile	5.02%	3,138	1.00
Roseburia hominis	2.64%	1,650	1.00
Fusobacterium nucleatum	2.63%	1,647	0.90
Bifidobacterium adolescentis	2.61%	1,631	1.00
Lactobacillus fermentum	1.14%	710	0.33
Akkermansia muciniphila	0.82%	515	1.00
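The Abundance (%) values match each species' Est. Reads divided by the total estimated reads across the 12 listed species (62,537). This total is slightly below the 63,638 reads that passed QC. The sketch below shows that relationship using the table's own counts. The variable names are illustrative only.

```python
# Est. Reads column of the Abundance Table.
est_reads = {
    "Bacteroides fragilis": 13200,
    "Faecalibacterium prausnitzii": 11644,
    "Veillonella sp.": 8344,
    "Veillonella parvula": 8335,
    "Escherichia coli": 6517,
    "Prevotella corporis": 5206,
    "Clostridioides difficile": 3138,
    "Roseburia hominis": 1650,
    "Fusobacterium nucleatum": 1647,
    "Bifidobacterium adolescentis": 1631,
    "Lactobacillus fermentum": 710,
    "Akkermansia muciniphila": 515,
}
AFTER_QC = 63638  # "After QC" row of the QC table

# Relative abundance = a species' estimated reads / all assigned reads.
assigned = sum(est_reads.values())
rel_abundance = {sp: 100.0 * n / assigned for sp, n in est_reads.items()}

print(assigned)                                         # 62537
print(f"{rel_abundance['Bacteroides fragilis']:.2f}%")  # 21.11%
print(f"{100.0 * assigned / AFTER_QC:.1f}% of QC-passing reads")  # 98.3% of QC-passing reads
```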

Confidence score: Mean posterior probability from the EM algorithm, indicating how unambiguously reads map to this species. Higher values mean more reliable identification.
≥ 0.90: High confidence · 0.70–0.89: Moderate · < 0.70: Low confidence
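The legend's thresholds can be written as a small lookup. Scores are shown to two decimals, so the sketch treats anything from 0.70 up to (but not including) 0.90 as Moderate. That boundary handling is an assumption. `confidence_tier` is a hypothetical helper, not part of the pipeline.

```python
def confidence_tier(score):
    """Map an EM confidence score to the report's legend tiers
    (hypothetical helper; boundary handling is assumed)."""
    if score >= 0.90:
        return "High"
    if score >= 0.70:
        return "Moderate"
    return "Low"

# Confidence values for a few species from the Abundance Table.
for species, score in [("Bacteroides fragilis", 0.77),
                       ("Fusobacterium nucleatum", 0.90),
                       ("Lactobacillus fermentum", 0.33)]:
    print(f"{species}: {score:.2f} -> {confidence_tier(score)}")
```

Applied to this sample, the most abundant organism, Bacteroides fragilis, falls in the Moderate tier. Lactobacillus fermentum is the only species in the Low tier.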

Rarefaction Analysis

What is rarefaction? A rarefaction curve shows how the number of detected species increases as more reads are sampled. It is used to assess whether sequencing depth is sufficient to capture the full diversity of the sample. A curve that plateaus (flattens) indicates adequate coverage — further sequencing would find few additional species. The shaded region shows the 90% confidence interval from repeated subsampling.

The rarefaction curve has plateaued, indicating that sequencing depth is sufficient to capture the majority of species in this sample. Additional sequencing is unlikely to reveal new taxa.
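A rarefaction curve of this kind can be sketched as follows. At each of several increasing depths, reads are subsampled repeatedly without replacement, and the distinct species in each subsample are counted. The sketch makes several assumptions. It uses a hypothetical six-species community rather than this sample's reads. It also uses a fixed number of subsampling iterations, and it approximates the 90% interval with a simple 5th to 95th percentile band. The report does not state the pipeline's actual depths, iteration count, or interval method.

```python
import random

def rarefaction_curve(read_labels, depths, n_iter=20, seed=0):
    """For each depth, return (depth, mean richness, ~5th pct, ~95th pct)."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        # Richness (number of distinct species) of each random subsample,
        # drawn without replacement from the full read list.
        richness = sorted(
            len(set(rng.sample(read_labels, depth))) for _ in range(n_iter)
        )
        # Approximate percentile band from the sorted richness values.
        lo = richness[int(0.05 * (n_iter - 1))]
        hi = richness[int(0.95 * (n_iter - 1))]
        curve.append((depth, sum(richness) / n_iter, lo, hi))
    return curve

# Hypothetical community: read counts per species (10,000 reads total).
counts = {"sp1": 6000, "sp2": 2500, "sp3": 1000, "sp4": 400, "sp5": 80, "sp6": 20}
reads = [sp for sp, n in counts.items() for _ in range(n)]

for depth, mean, lo, hi in rarefaction_curve(reads, [10, 50, 200, 1000, 8000]):
    print(f"{depth:>5} reads: {mean:.1f} species (band {lo}-{hi})")
```

Richness rises quickly at low depth and flattens once the rarest taxa (sp5, sp6) are reliably sampled. That flattening is the plateau the report uses to judge depth saturation.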

Methods

Sequencing reads were processed with the NS Amplicon pipeline. Raw reads were quality-filtered with fastplong using amplicon-specific length thresholds (16S: 1,200–1,800 bp; ITS: 200–1,200 bp).
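The length thresholds define a window for each amplicon type. The sketch below applies that rule in plain Python to hypothetical read lengths. It is not fastplong's implementation. fastplong also performs quality filtering, so the reported retention may reflect more than the length window alone. The sketch assumes inclusive bounds.

```python
# Amplicon-specific read-length windows from the Methods
# (inclusive bounds are an assumption).
LENGTH_WINDOWS = {"16S": (1200, 1800), "ITS": (200, 1200)}

def length_filter(read_lengths, amplicon):
    """Keep reads whose length falls inside the amplicon's window and
    return (kept_lengths, retention_percent)."""
    lo, hi = LENGTH_WINDOWS[amplicon]
    kept = [n for n in read_lengths if lo <= n <= hi]
    retention = 100.0 * len(kept) / len(read_lengths) if read_lengths else 0.0
    return kept, retention

# Hypothetical 16S read lengths (bp).
lengths = [1450, 1520, 980, 1380, 2100, 1500, 1410, 1475]
kept, retention = length_filter(lengths, "16S")
print(len(kept), f"{retention:.1f}%")  # 6 75.0%

# Retention as reported in the QC table: After QC / Raw reads.
print(f"{100 * 63638 / 72830:.1f}%")   # 87.4%
```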

Filtered reads were converted to k-mer frequency vectors, reduced to a low-dimensional embedding with UMAP, and grouped with HDBSCAN density-based clustering. A draft consensus sequence was generated for each cluster with SPOA, then polished with Racon (2 rounds) and Medaka (1 round).
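The first step of that clustering turns each read into a k-mer frequency vector. The sketch below implements this step without external libraries. The report does not state the k-mer size, so k = 4 is an assumption, as is skipping windows that contain non-ACGT bases.

```python
from itertools import product

def kmer_frequency_vector(seq, k=4):
    """Return the normalized frequency of every ACGT k-mer (4**k entries,
    lexicographic order). Windows containing other characters are skipped."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # None means the window had a non-ACGT base
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]

# Hypothetical short reads; real 16S reads here are ~1,200-1,800 bp.
reads = ["ACGTACGTTGCA", "acgtacgttgcn"]
vectors = [kmer_frequency_vector(r) for r in reads]
print(len(vectors[0]))  # 256
```

In a full workflow these vectors would be stacked into a matrix, embedded with UMAP (for example, umap-learn's `UMAP(...).fit_transform`), and clustered with HDBSCAN (`fit_predict`). Those steps are left out so that the sketch has no dependencies.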

Polished consensus sequences were aligned to reference databases (SILVA 138.2 for 16S rRNA; UNITE v10.0 for ITS) with minimap2. Species-level abundances were estimated with an expectation-maximization (EM) algorithm that probabilistically resolves multi-mapped reads, weighting each consensus sequence by the size of its cluster.
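The sketch below is a minimal EM that splits ambiguous clusters among candidate species, weighting each cluster by its size as the Methods describe. Several parts of it are assumptions for illustration. Each cluster is given as its size plus one likelihood per candidate species. The convergence settings are arbitrary. The per-species confidence is defined here as the size-weighted mean posterior over the clusters that list that species. Both functions are hypothetical, and the pipeline's exact formulation may differ.

```python
def em_abundance(clusters, max_iter=200, tol=1e-10):
    """Estimate species proportions from multi-mapped clusters with EM.

    clusters: list of (size, hits) pairs; size is the cluster's read count and
    hits maps each candidate species to an alignment likelihood (hypothetical
    input format). Returns (theta, posteriors).
    """
    species = sorted({s for _, hits in clusters for s in hits})
    total = sum(size for size, _ in clusters)
    theta = {s: 1.0 / len(species) for s in species}  # uniform start

    def e_step(theta):
        # Posterior of each candidate species, proportional to
        # theta[s] * likelihood; clusters with no positive hit are skipped.
        posts = []
        for _, hits in clusters:
            z = sum(theta[s] * lik for s, lik in hits.items())
            posts.append({s: theta[s] * lik / z for s, lik in hits.items()} if z > 0 else {})
        return posts

    for _ in range(max_iter):
        posteriors = e_step(theta)
        # M-step: each species' proportion is its share of the
        # cluster-size-weighted posterior mass.
        new_theta = {s: 0.0 for s in species}
        for (size, _), post in zip(clusters, posteriors):
            for s, r in post.items():
                new_theta[s] += size * r / total
        converged = max(abs(new_theta[s] - theta[s]) for s in species) < tol
        theta = new_theta
        if converged:
            break
    return theta, e_step(theta)

def mean_posterior(clusters, posteriors):
    """Hypothetical confidence: size-weighted mean posterior of each species
    over the clusters that list it as a candidate."""
    num, den = {}, {}
    for (size, _), post in zip(clusters, posteriors):
        for s, r in post.items():
            num[s] = num.get(s, 0.0) + size * r
            den[s] = den.get(s, 0.0) + size
    return {s: num[s] / den[s] for s in num}

clusters = [
    (100, {"A": 1.0}),           # unambiguous
    (50, {"A": 1.0, "B": 1.0}),  # equally good hits to A and B
    (50, {"B": 1.0}),            # unambiguous
]
theta, post = em_abundance(clusters)
conf = mean_posterior(clusters, post)
print({s: round(v, 3) for s, v in theta.items()})  # {'A': 0.667, 'B': 0.333}
print({s: round(v, 3) for s, v in conf.items()})   # {'A': 0.889, 'B': 0.667}
```

The shared cluster is split 2:1 in favor of A because A has more unambiguous support. B's lower confidence reflects that half of its candidate reads are shared with A.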