GutMock_sup_rep1

NS Amplicon · 16S · Generated 2026-05-11 14:49
Run Summary

Metric                     Value
Sample ID                  GutMock_sup_rep1
Amplicon                   16S
Top organism               Bacteroides fragilis (21.1%)
Species detected           12
Shannon diversity (H')     2.15
Simpson diversity (1-D)    0.863

QC metric           Value               Status
Raw reads           72,830
After QC            63,638              PASS
Retention           87.4%               PASS
Mean read length    1,439 bp            PASS
Rarefaction         Depth saturation    PASS
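
Both diversity values can be recomputed from the Est. reads column of the abundance table. The Python sketch below assumes the standard definitions, Shannon H' = -sum(p_i * ln p_i) with the natural logarithm and Simpson 1 - D with D = sum(p_i^2), where p_i is each species' share of the listed reads. The report does not state the log base or estimator the pipeline uses, so treat this as an illustration rather than the pipeline's own code.

```python
import math

# Estimated reads per species, copied from the abundance table in this report.
EST_READS = {
    "Bacteroides fragilis": 13200,
    "Faecalibacterium prausnitzii": 11644,
    "Veillonella sp.": 8344,
    "Veillonella parvula": 8335,
    "Escherichia coli": 6517,
    "Prevotella corporis": 5206,
    "Clostridioides difficile": 3138,
    "Roseburia hominis": 1650,
    "Fusobacterium nucleatum": 1647,
    "Bifidobacterium adolescentis": 1631,
    "Lactobacillus fermentum": 710,
    "Akkermansia muciniphila": 515,
}


def proportions(counts):
    """Convert read counts to relative abundances p_i that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i); the natural log is an assumption."""
    return -sum(p * math.log(p) for p in proportions(counts) if p > 0)


def gini_simpson(counts):
    """Simpson diversity 1 - D, with D = sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in proportions(counts))


if __name__ == "__main__":
    reads = list(EST_READS.values())
    print(f"Listed reads: {sum(reads):,}")
    print(f"Shannon H'  = {shannon(reads):.2f}")
    print(f"Simpson 1-D = {gini_simpson(reads):.3f}")
```

Run as a script, this gives H' of about 2.15 and 1-D of about 0.863, consistent with the summary table. The listed reads sum to 62,537, slightly fewer than the 63,638 reads after QC, and the Abundance (%) column matches proportions of that 62,537 total (for example, 13,200 / 62,537 is about 21.11%).
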
Taxonomic Composition

Abundance Table
Organism                      Abundance (%)  Est. reads  Confidence
Bacteroides fragilis          21.11%         13,200      0.77
Faecalibacterium prausnitzii  18.62%         11,644      1.00
Veillonella sp.               13.34%         8,344       0.99
Veillonella parvula           13.33%         8,335       0.99
Escherichia coli              10.42%         6,517       0.99
Prevotella corporis           8.32%          5,206       1.00
Clostridioides difficile      5.02%          3,138       1.00
Roseburia hominis             2.64%          1,650       1.00
Fusobacterium nucleatum       2.63%          1,647       0.90
Bifidobacterium adolescentis  2.61%          1,631       1.00
Lactobacillus fermentum       1.14%          710         0.33
Akkermansia muciniphila       0.82%          515         1.00
Confidence score: the mean posterior probability from the EM algorithm, indicating how unambiguously reads map to this species. Higher values indicate a more reliable identification.
≥ 0.90         High confidence
0.70 – 0.89    Moderate confidence
< 0.70         Low confidence
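
The banding rule in the legend can be written as a small function. The Python sketch below assumes the confidence score is a plain, unweighted mean of per-read posterior probabilities for the species; the report does not say how reads are aggregated or whether cluster sizes act as weights. Scores between 0.89 and 0.90 are treated as Moderate, since the legend's bands are given to two decimal places.

```python
def species_confidence(posteriors):
    """Mean posterior probability over the reads assigned to one species.

    Assumes an unweighted mean; the pipeline's exact aggregation is not documented.
    """
    if not posteriors:
        raise ValueError("no posteriors supplied")
    return sum(posteriors) / len(posteriors)


def confidence_band(score):
    """Map a confidence score in [0, 1] to the report's three bands."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.90:
        return "High"
    if score >= 0.70:
        return "Moderate"
    return "Low"


if __name__ == "__main__":
    # Scores copied from the abundance table.
    for name, score in [("Bacteroides fragilis", 0.77),
                        ("Fusobacterium nucleatum", 0.90),
                        ("Lactobacillus fermentum", 0.33)]:
        print(f"{name}: {score:.2f} -> {confidence_band(score)}")
```

Applied to the table, this marks Bacteroides fragilis (0.77) as moderate and Lactobacillus fermentum (0.33) as low confidence; every other listed species falls in the high band.
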
Rarefaction Analysis

What is rarefaction? A rarefaction curve shows how the number of detected species increases as more reads are sampled. It is used to assess whether sequencing depth is sufficient to capture the full diversity of the sample. A curve that plateaus (flattens) indicates adequate coverage: further sequencing would find few additional species. In the rarefaction plot, the shaded region shows the 90% confidence interval from repeated subsampling.
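
The subsampling procedure described here can be sketched as follows. This Python version assumes reads are drawn without replacement, that richness is the number of distinct species among the drawn reads, and that the 90% interval is the 5th to 95th percentile of richness across iterations. The pipeline's number of iterations, depth steps, and interval method are not stated in this report, and the toy counts in the example are invented.

```python
import random


def rarefaction_curve(counts, depths, iterations=100, seed=0):
    """Estimate species richness at each sequencing depth by repeated subsampling.

    counts: reads per species; depths: subsample sizes to evaluate.
    Returns (depth, mean_richness, p5, p95) tuples, where p5..p95 spans
    roughly 90% of the subsampled richness values.
    """
    rng = random.Random(seed)
    # One entry per read, labelled by species index, so sampling draws reads.
    pool = [species for species, n in enumerate(counts) for _ in range(n)]
    curve = []
    for depth in depths:
        if not 0 < depth <= len(pool):
            raise ValueError(f"depth {depth} outside 1..{len(pool)}")
        richness = sorted(
            len(set(rng.sample(pool, depth))) for _ in range(iterations)
        )
        p5 = richness[int(0.05 * (iterations - 1))]
        p95 = richness[int(0.95 * (iterations - 1))]
        curve.append((depth, sum(richness) / iterations, p5, p95))
    return curve


if __name__ == "__main__":
    # Toy community of 96 reads from five species (invented for illustration).
    toy_counts = [50, 30, 10, 5, 1]
    for depth, mean, lo, hi in rarefaction_curve(toy_counts, [1, 10, 25, 50, 96]):
        print(f"depth {depth:>3}: mean {mean:.2f} species (90% range {lo}-{hi})")
```

At full depth every species is always observed, so the interval collapses to a single value; the rare singleton species is what keeps the curve rising at intermediate depths.
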

The rarefaction curve has plateaued, indicating that sequencing depth is sufficient to capture the majority of species in this sample. Additional sequencing is unlikely to reveal new taxa.
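
One simple way to turn "the curve has plateaued" into a check is to measure how many new species the mean curve gains per additional 1,000 reads over its final segment. The Python sketch below is a hypothetical heuristic: the function name and the 0.1-species threshold are illustrative choices, not the criterion behind the report's Depth saturation status.

```python
def has_plateaued(curve, max_gain_per_1000=0.1):
    """Hypothetical saturation check on a rarefaction curve.

    curve: (depth, mean_richness, ...) tuples sorted by increasing depth.
    Returns True when the final segment gains fewer than `max_gain_per_1000`
    species per 1,000 additional reads.
    """
    if len(curve) < 2:
        raise ValueError("need at least two depths")
    (d0, m0, *_), (d1, m1, *_) = curve[-2], curve[-1]
    if d1 <= d0:
        raise ValueError("depths must increase")
    gain_per_1000 = (m1 - m0) / (d1 - d0) * 1000
    return gain_per_1000 < max_gain_per_1000
```

Comparing only the last two points is sensitive to subsampling noise; averaging the slope over several final depths, or requiring the upper confidence bound to flatten as well, would be more robust.
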

Methods

Sequencing reads were processed using the NS Amplicon pipeline. Raw reads were quality-filtered using fastplong with amplicon-specific length thresholds (16S: 1200-1800 bp; ITS: 200-1200 bp).
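
In the pipeline this step is performed by fastplong, which also applies quality filtering. The Python sketch below only illustrates the amplicon-specific length window on a plain, unwrapped four-line FASTQ stream; it assumes both bounds are inclusive and does not reproduce fastplong's quality criteria or options.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

# (min_bp, max_bp) per amplicon, from the thresholds in the Methods text.
# Treating both bounds as inclusive is an assumption.
LENGTH_WINDOWS = {"16S": (1200, 1800), "ITS": (200, 1200)}

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped FASTQ stream (four lines per record)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def length_filter(records: Iterable[FastqRecord], amplicon: str) -> Iterator[FastqRecord]:
    """Keep only reads whose length falls inside the amplicon's window."""
    min_bp, max_bp = LENGTH_WINDOWS[amplicon]
    for record in records:
        if min_bp <= len(record[1]) <= max_bp:
            yield record


if __name__ == "__main__":
    demo_reads = [("@read_full_length", 1450), ("@read_too_short", 900)]
    demo = io.StringIO("".join(f"{h}\n{'A' * n}\n+\n{'I' * n}\n" for h, n in demo_reads))
    for header, seq, _ in length_filter(parse_fastq(demo), "16S"):
        print(header, len(seq))
```
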

Filtered reads were converted to k-mer frequency vectors, reduced to a low-dimensional embedding with UMAP, and clustered with HDBSCAN, a density-based clustering algorithm. A draft consensus sequence was generated for each cluster with SPOA and then polished with Racon (2 rounds) and Medaka (1 round).
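
The first step, turning each read into a k-mer frequency vector, can be sketched in plain Python. The value k = 5 and the normalisation to relative frequencies are assumptions, since the report does not give the pipeline's k-mer size; the resulting vectors would then be passed to UMAP (for example through the umap-learn package) and HDBSCAN, whose parameters are likewise not stated here.

```python
from functools import lru_cache
from itertools import product
from typing import Dict, List


@lru_cache(maxsize=None)
def kmer_index(k: int) -> Dict[str, int]:
    """Map each of the 4**k ACGT k-mers to a fixed column (lexicographic order)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def kmer_vector(sequence: str, k: int = 5) -> List[float]:
    """Relative k-mer frequencies of one read; k=5 is an assumed default."""
    index = kmer_index(k)
    counts = [0] * len(index)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        column = index.get(seq[i:i + k])
        if column is not None:  # skip k-mers containing N or other ambiguity codes
            counts[column] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


if __name__ == "__main__":
    vec = kmer_vector("ACGTACGTTGCA", k=2)
    print(len(vec), [round(v, 3) for v in vec[:4]])
```

Normalising by the total k-mer count makes reads of different lengths comparable. Reads from the same taxon tend to share k-mer composition and therefore lie close together after the UMAP projection, which is what allows a density-based method such as HDBSCAN to group them.
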

Polished consensus sequences were aligned against reference databases (SILVA 138.2 for 16S rRNA; UNITE v10.0 for ITS) using minimap2. Species-level abundances were estimated using an Expectation-Maximization algorithm that probabilistically resolves multi-mapped reads, weighted by cluster sizes.
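
The EM step can be sketched as follows. This Python version uses a hypothetical input format: for each polished consensus sequence, a dict of candidate species with a positive alignment likelihood (how such likelihoods would be derived from the minimap2 alignments is not stated in the report), plus the number of reads in its cluster as a weight. The E-step splits each cluster among its candidate species in proportion to current abundance times likelihood; the M-step re-estimates abundances from those cluster-size-weighted fractions.

```python
from typing import Dict, List, Sequence, Tuple

Hits = Dict[str, float]  # species -> alignment likelihood (> 0) for one cluster


def em_abundance(
    hits: Sequence[Hits],
    cluster_sizes: Sequence[int],
    max_iter: int = 1000,
    tol: float = 1e-8,
) -> Tuple[Dict[str, float], List[Dict[str, float]]]:
    """Estimate species abundances from multi-mapped clusters by EM.

    Returns (abundance per species, posterior per cluster and species).
    """
    species = sorted({s for h in hits for s in h})
    theta = {s: 1.0 / len(species) for s in species}  # start from uniform
    total_reads = float(sum(cluster_sizes))
    posteriors: List[Dict[str, float]] = []
    for _ in range(max_iter):
        # E-step: posterior that each cluster belongs to each candidate species.
        posteriors = []
        for h in hits:
            norm = sum(theta[s] * lik for s, lik in h.items())
            posteriors.append({s: theta[s] * lik / norm for s, lik in h.items()})
        # M-step: abundance = cluster-size-weighted share of posterior mass.
        new_theta = {s: 0.0 for s in species}
        for size, post in zip(cluster_sizes, posteriors):
            for s, r in post.items():
                new_theta[s] += size * r
        new_theta = {s: v / total_reads for s, v in new_theta.items()}
        converged = max(abs(new_theta[s] - theta[s]) for s in species) < tol
        theta = new_theta
        if converged:
            break
    return theta, posteriors


if __name__ == "__main__":
    # Toy example: one ambiguous cluster shared by species A and B.
    clusters = [{"A": 1.0}, {"B": 1.0}, {"A": 1.0, "B": 1.0}]
    sizes = [60, 20, 20]
    abundance, post = em_abundance(clusters, sizes)
    print({s: round(v, 3) for s, v in abundance.items()})  # {'A': 0.75, 'B': 0.25}
```

The ambiguous cluster ends up split 75:25 between A and B, mirroring the 60:20 ratio of reads that map uniquely to each. The per-cluster posteriors returned alongside the abundances are the quantities a mean-posterior confidence score would be averaged from.
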