GutMock_sup_rep1
Run Summary
| Metric | Value |
|---|---|
| Sample ID | GutMock_sup_rep1 |
| Amplicon | 16S |
| Top organism | Bacteroides fragilis (21.1%) |
| Species detected | 12 |
| Shannon diversity (H') | 2.15 |
| Simpson diversity (1-D) | 0.863 |
| QC Metric | Value | Status |
|---|---|---|
| Raw reads | 72,830 | |
| After QC | 63,638 | PASS |
| Retention | 87.4% | PASS |
| Mean read length | 1,439 bp | PASS |
| Rarefaction | Depth saturation | PASS |
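The retention figure follows directly from the two read counts in the table. A minimal check, assuming retention is defined as reads after QC divided by raw reads (the variable names are illustrative, not taken from the pipeline):

```python
# Read counts copied from the QC table above.
raw_reads = 72_830
reads_after_qc = 63_638

# Assumed definition: fraction of raw reads that survive quality filtering.
retention = reads_after_qc / raw_reads

print(f"Retention: {retention:.1%}")  # Retention: 87.4%
```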
Taxonomic Composition
| Organism | Abundance (%) | Est. Reads | Conf. |
|---|---|---|---|
| Bacteroides fragilis | 21.11% | 13,200 | 0.77 |
| Faecalibacterium prausnitzii | 18.62% | 11,644 | 1.00 |
| Veillonella sp. | 13.34% | 8,344 | 0.99 |
| Veillonella parvula | 13.33% | 8,335 | 0.99 |
| Escherichia coli | 10.42% | 6,517 | 0.99 |
| Prevotella corporis | 8.32% | 5,206 | 1.00 |
| Clostridioides difficile | 5.02% | 3,138 | 1.00 |
| Roseburia hominis | 2.64% | 1,650 | 1.00 |
| Fusobacterium nucleatum | 2.63% | 1,647 | 0.90 |
| Bifidobacterium adolescentis | 2.61% | 1,631 | 1.00 |
| Lactobacillus fermentum | 1.14% | 710 | 0.33 |
| Akkermansia muciniphila | 0.82% | 515 | 1.00 |
Confidence: ≥ 0.90 = high; 0.70–0.89 = moderate; < 0.70 = low.
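The diversity indices in the Run Summary can be reproduced from the estimated read counts above. The sketch below assumes Shannon diversity uses the natural logarithm and that Simpson diversity is the Gini-Simpson form, one minus the sum of squared proportions. The report does not state either convention, but both choices give values close to the reported 2.15 and 0.863.

```python
import math

# Estimated reads per organism, copied from the Taxonomic Composition table.
est_reads = {
    "Bacteroides fragilis": 13_200,
    "Faecalibacterium prausnitzii": 11_644,
    "Veillonella sp.": 8_344,
    "Veillonella parvula": 8_335,
    "Escherichia coli": 6_517,
    "Prevotella corporis": 5_206,
    "Clostridioides difficile": 3_138,
    "Roseburia hominis": 1_650,
    "Fusobacterium nucleatum": 1_647,
    "Bifidobacterium adolescentis": 1_631,
    "Lactobacillus fermentum": 710,
    "Akkermansia muciniphila": 515,
}

total_assigned = sum(est_reads.values())
proportions = [n / total_assigned for n in est_reads.values()]

# Shannon diversity H' = -sum(p * ln p), natural log assumed.
shannon = -sum(p * math.log(p) for p in proportions)

# Simpson diversity in the 1 - D form, where D = sum(p^2).
simpson = 1.0 - sum(p * p for p in proportions)

print(f"H' = {shannon:.2f}, 1-D = {simpson:.3f}")
```

The abundance percentages in the table are relative to the sum of the estimated reads rather than to the post-QC read count, which is why the proportions are normalized by `total_assigned` here.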
Rarefaction Analysis
The rarefaction curve has plateaued, indicating that sequencing depth is sufficient to capture the majority of species in this sample. Additional sequencing is unlikely to reveal new taxa.
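One common way to examine saturation is the analytic rarefaction expectation: the expected number of species observed when `depth` reads are drawn without replacement from the assigned reads. The sketch below applies that formula to the estimated read counts from the composition table. It is an illustration under that assumption, not necessarily how the pipeline computes its curve, and `expected_species` is a hypothetical helper name.

```python
from math import comb

# Estimated reads per organism, in the order of the composition table.
est_reads = [13_200, 11_644, 8_344, 8_335, 6_517, 5_206,
             3_138, 1_650, 1_647, 1_631, 710, 515]

def expected_species(counts, depth):
    """Expected number of species seen in a random subsample of `depth` reads.

    For each species with c reads out of N, the probability of missing it
    entirely is C(N - c, depth) / C(N, depth); the expectation sums the
    complementary probabilities over all species.
    """
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    denom = comb(total, depth)
    return sum(1 - comb(total - c, depth) / denom for c in counts)

for depth in (100, 1_000, 5_000, sum(est_reads)):
    print(depth, round(expected_species(est_reads, depth), 3))
```

Under this model the curve reaches essentially all 12 species well before the full depth, which is consistent with the plateau described above.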
Methods
Sequencing reads were processed using the NS Amplicon pipeline. Raw reads were quality-filtered using fastplong with amplicon-specific length thresholds (16S: 1200-1800 bp; ITS: 200-1200 bp).
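The amplicon-specific length thresholds amount to a simple window check per read. The sketch below is plain Python rather than a fastplong invocation. It assumes the bounds are inclusive, which the report does not state, and `passes_length_filter` is a hypothetical helper name.

```python
# Length windows in bp, copied from the Methods text.
LENGTH_WINDOWS_BP = {
    "16S": (1200, 1800),
    "ITS": (200, 1200),
}

def passes_length_filter(read_length, amplicon):
    """Return True if the read length falls inside the amplicon's window.

    Bounds are treated as inclusive (an assumption).
    """
    lower, upper = LENGTH_WINDOWS_BP[amplicon]
    return lower <= read_length <= upper

# The mean read length reported for this sample sits inside the 16S window.
print(passes_length_filter(1439, "16S"))  # True
```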
Filtered reads were clustered using UMAP dimensionality reduction on k-mer frequency vectors followed by HDBSCAN density-based clustering. Draft consensus sequences were generated per cluster using SPOA, then polished with Racon (2 rounds) and Medaka (1 round).
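The clustering input described above is a k-mer frequency vector per read. A minimal pure-Python sketch of that featurization is shown below. The value of k is not given in the report, so `k=4` is a placeholder, and the function ignores k-mers containing ambiguous bases and does not merge reverse complements, both of which are simplifying assumptions. In the pipeline, vectors of this kind would then be passed to UMAP and HDBSCAN.

```python
from itertools import product

def kmer_frequencies(seq, k=4):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    The vector has 4**k entries ordered lexicographically over A, C, G, T.
    k-mers containing any other character (for example N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)

    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        position = index.get(seq[start:start + k])
        if position is not None:
            counts[position] += 1

    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]

vector = kmer_frequencies("ACGTACGTTAGC", k=4)
print(len(vector), sum(vector))
```

Normalizing by the total k-mer count keeps reads of different lengths comparable before dimensionality reduction.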
Polished consensus sequences were aligned against reference databases (SILVA 138.2 for 16S rRNA; UNITE v10.0 for ITS) using minimap2. Species-level abundances were estimated using an Expectation-Maximization algorithm that probabilistically resolves multi-mapped reads, with each cluster's contribution weighted by its size.
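The abundance estimation step can be illustrated with a small Expectation-Maximization sketch. It assumes each cluster has a list of candidate taxa from alignment and a size, and it treats all candidates of a cluster as equally likely apart from the current abundance estimate, so alignment scores are ignored. The function name, the input layout, and the fixed iteration count are illustrative assumptions, not the pipeline's actual implementation.

```python
def em_abundance(candidates, sizes, n_iter=100):
    """Estimate taxon abundances from ambiguously assigned clusters.

    candidates[i] lists the distinct taxa that cluster i aligns to;
    sizes[i] is the number of reads in cluster i.
    """
    taxa = sorted({t for cand in candidates for t in cand})
    theta = {t: 1.0 / len(taxa) for t in taxa}  # uniform starting point
    total = float(sum(sizes))

    for _ in range(n_iter):
        expected = {t: 0.0 for t in taxa}
        # E-step: split each cluster's reads among its candidate taxa in
        # proportion to the current abundance estimates.
        for cand, size in zip(candidates, sizes):
            denom = sum(theta[t] for t in cand)
            for t in cand:
                expected[t] += size * theta[t] / denom
        # M-step: renormalize expected read counts into abundances.
        theta = {t: expected[t] / total for t in taxa}
    return theta

# Toy input: one cluster unique to A, one unique to B, one shared by both.
toy = em_abundance([["A"], ["A", "B"], ["B"]], sizes=[60, 20, 20])
print({t: round(v, 3) for t, v in toy.items()})  # {'A': 0.75, 'B': 0.25}
```

In the toy input the shared cluster is assigned mostly to A because A already has more uniquely mapped reads, which is the behavior the probabilistic resolution is meant to provide.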