NS Bio
Analysis pipelines

Production-ready pipelines for microbial genomics

End-to-end analysis from raw sequencing reads to annotated, client-ready reports. Every pipeline runs on sovereign UK infrastructure, produces reproducible results, and generates self-contained HTML reports with no external dependencies.

Pipeline 1 of 2 · Whole-genome

Bacterial whole-genome analysis

The bacterialp pipeline processes Oxford Nanopore whole-genome sequencing data for bacterial isolates. It takes raw FASTQ files and produces a comprehensive per-sample report covering species identification, genome assembly, antimicrobial resistance, MLST typing, and species-specific characterisation.

The pipeline is designed for clinical and public-health microbiology workflows where turnaround time and traceability matter. Each stage runs automatically, and the final report is a single self-contained HTML file that can be opened in any browser, saved as PDF, or archived into a patient record.

Pipeline stages

#  Stage                     Tool
1  Quality filtering         fastp / fastplong
2  Taxonomic classification  Kraken2
3  De novo assembly          Dragonflye (Flye)
4  Genome completeness       BUSCO
5  Gene annotation           Bakta
6  AMR detection             ABRicate + PlasmidFinder
7  MLST typing               mlst
8  Species-specific typing   See below

Species-specific tools

When Kraken2 identifies a known clinically relevant species, the pipeline automatically runs the matching species-specific typing tool:

Species                       Tool          Detects
Staphylococcus aureus         SCCmec        SCCmec cassette type, methicillin resistance
Klebsiella pneumoniae         Kleborate     Capsule (K-locus), virulence factors, AMR profile
Salmonella enterica           SeqSero2      Serovar and serotype prediction from genome
Pseudomonas aeruginosa        PASTY         O-antigen serogroup typing
Escherichia coli              ECTyper       O- and H-antigen serotyping (e.g. O157:H7)
Non-tuberculous Mycobacteria  NTM-Profiler  NTM species ID, drug-resistance profile
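
The dispatch rule described in this section can be sketched as a lookup keyed on the top Kraken2 species. This is an illustrative sketch only, not the pipeline's actual code: the function name `select_typing_tool`, the prefix matching of subspecies names, and the genus-based rule for non-tuberculous mycobacteria are all assumptions made for the example.

```python
from typing import Optional

# Species-to-tool table copied from the text above.
SPECIES_TOOLS = {
    "Staphylococcus aureus": "SCCmec",
    "Klebsiella pneumoniae": "Kleborate",
    "Salmonella enterica": "SeqSero2",
    "Pseudomonas aeruginosa": "PASTY",
    "Escherichia coli": "ECTyper",
}

# Assumption: NTM are recognised by genus name, with M. tuberculosis excluded.
# A real implementation would likely use a curated species or taxid list.
NTM_GENUS = "Mycobacterium"
NTM_EXCLUDED = ("Mycobacterium tuberculosis",)


def select_typing_tool(top_species: str) -> Optional[str]:
    """Return the typing tool for a species name, or None if none applies."""
    name = " ".join(top_species.split())  # collapse runs of whitespace
    for species, tool in SPECIES_TOOLS.items():
        # Accept the exact species or a longer name such as a subspecies.
        if name == species or name.startswith(species + " "):
            return tool
    if name.startswith(NTM_GENUS + " ") and not name.startswith(NTM_EXCLUDED):
        return "NTM-Profiler"
    return None
```

Under these assumptions, `select_typing_tool("Salmonella enterica subsp. enterica")` selects SeqSero2, and a species outside the table returns `None`, so stage 8 would simply be skipped.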

What the report includes

General summary: QC pass/fail badges, top species, BUSCO completeness, MLST type
Quality filtering: Read statistics before and after filtering, with length and quality distributions
Taxonomic classification: Top species identified by Kraken2 with read-level confidence
Genome completeness: BUSCO stacked-bar chart with completeness, duplication, and fragmentation metrics
Gene annotation: Bakta output with genome statistics and annotated feature counts
AMR genes: Resistance class, coverage, plasmid association, with CSV export
MLST typing: Sequence type and full allele profile from PubMLST schemes
Species-specific results: Serotype, serogroup, SCCmec type, etc., where applicable
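
As a rough illustration of how the "top species" entry in the taxonomic classification section could be derived, the sketch below parses a Kraken2 report. It assumes the standard six-column, tab-separated report layout (percentage, clade reads, direct reads, rank code, taxid, indented name); the function name `top_species` is hypothetical, and how the pipeline itself selects species is not stated in the source.

```python
from typing import List, Tuple


def top_species(report_text: str, n: int = 3) -> List[Tuple[str, float, int]]:
    """Return up to n species-rank rows, ordered by clade-level read count.

    Each result is (species name, percentage of reads, clade read count).
    """
    rows = []
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 6:
            continue  # skip blank lines or reports in a different layout
        percent, clade_reads, _direct, rank, _taxid, name = fields
        if rank == "S":  # species rank only; genus, strain etc. are ignored
            rows.append((name.strip(), float(percent), int(clade_reads)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]
```

Counting at the clade level means reads assigned to strains beneath a species still contribute to that species' total.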

Example reports

Each report below was generated from publicly available Oxford Nanopore whole-genome sequencing data from NCBI SRA. Click to open the full interactive report.

Pipeline 2 of 2 · Amplicon

16S / ITS amplicon analysis

The NS Amplicon pipeline processes Oxford Nanopore 16S rRNA and ITS amplicon sequencing data for microbial community profiling. It takes raw FASTQ files through quality filtering, read clustering, consensus polishing, and taxonomic classification to produce per-sample abundance reports with diversity metrics and rarefaction analysis.

Amplicon vs whole-genome

Where the bacterial WGS pipeline asks "what is this isolate?", the amplicon pipeline asks "what is in this mixed community, and in what proportions?"

WGS sequences entire genomes from cultured isolates: a single organism at deep resolution, clinically definitive. Amplicon sequencing targets a single conserved marker region (the 16S rRNA gene for bacteria and archaea, ITS for fungi) in a mixed sample, giving the membership and relative abundance of the taxa detected without the need for culture. The trade-off is genus-level rather than strain-level resolution.

Pipeline stages

#  Stage                     Tool
1  Quality filtering         fastplong
2  Read clustering           UMAP + HDBSCAN
3  Consensus polishing       SPOA + Racon + Medaka
4  Taxonomic classification  minimap2 + SILVA / UNITE
5  Abundance estimation      Expectation-Maximisation
6  Diversity & rarefaction   Shannon, Simpson, rarefaction
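
The source names Expectation-Maximisation for stage 5 but does not describe the model. One common formulation, shown here purely as an assumed sketch, treats each read as drawn from a mixture of taxa and alternates between assigning reads fractionally to candidate taxa and re-estimating the mixture proportions. The function name `em_abundance` and the input format are hypothetical.

```python
from typing import Dict, List


def em_abundance(
    read_likelihoods: List[Dict[str, float]],
    n_iter: int = 100,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """Estimate relative abundances from ambiguous read-to-taxon matches.

    read_likelihoods[i][t] is a non-negative score for read i originating
    from taxon t (for example, derived from alignment identity).
    """
    taxa = sorted({t for read in read_likelihoods for t in read})
    if not taxa:
        return {}
    # Start from equal abundances for all candidate taxa.
    abundance = {t: 1.0 / len(taxa) for t in taxa}

    for _ in range(n_iter):
        totals = {t: 0.0 for t in taxa}
        n_used = 0
        for read in read_likelihoods:
            # E-step: posterior weight of each candidate taxon for this read.
            weights = {t: abundance[t] * lik for t, lik in read.items()}
            norm = sum(weights.values())
            if norm == 0.0:
                continue  # no usable candidate for this read
            n_used += 1
            for t, w in weights.items():
                totals[t] += w / norm
        if n_used == 0:
            return abundance
        # M-step: new abundance is the mean posterior weight across reads.
        updated = {t: totals[t] / n_used for t in taxa}
        delta = max(abs(updated[t] - abundance[t]) for t in taxa)
        abundance = updated
        if delta < tol:
            break
    return abundance
```

The effect is that reads matching several reference sequences equally well are shared out in proportion to the evidence from unambiguous reads, rather than being split evenly or discarded.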

Diversity metrics explained

The amplicon pipeline reports three standard ecological measures that together describe the structure of the microbial community in each sample:

Shannon index (H′): Combines richness (how many taxa) and evenness (how equally they are distributed). Higher values indicate more diverse communities. Sensitive to rare taxa.
Simpson index (1−D): The probability that two reads drawn at random come from different taxa. Weighted toward dominant taxa, making it more robust to sequencing noise.
Rarefaction: Subsamples reads at increasing depths to assess whether sequencing captured the full community. A plateauing curve indicates saturation.
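
The three measures can be sketched from a table of per-taxon read counts. This is a minimal illustration using the textbook definitions, not the pipeline's code: the natural logarithm for H′, drawing with replacement for 1−D, and the number of subsampling repeats are assumptions, and the function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def _proportions(counts: Dict[str, int]) -> List[float]:
    """Convert read counts to proportions, dropping taxa with zero reads."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return [c / total for c in counts.values() if c > 0]


def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i)."""
    return sum(-p * math.log(p) for p in _proportions(counts))


def simpson_index(counts: Dict[str, int]) -> float:
    """Gini-Simpson index 1 - D, where D = sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in _proportions(counts))


def rarefaction_curve(
    counts: Dict[str, int],
    depths: Sequence[int],
    n_reps: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean number of taxa seen when subsampling reads without replacement."""
    rng = random.Random(seed)
    # Expand counts into one label per read so reads can be sampled directly.
    reads = [taxon for taxon, c in counts.items() for _ in range(c)]
    curve = []
    for depth in depths:
        depth = min(depth, len(reads))  # cannot sample more reads than exist
        richness = [len(set(rng.sample(reads, depth))) for _ in range(n_reps)]
        curve.append(sum(richness) / n_reps)
    return curve
```

For two taxa with equal read counts, H′ equals ln 2 and 1−D equals 0.5. A rarefaction curve that stops rising well before the full read depth suggests that additional sequencing would reveal few new taxa.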

What the report includes

Run summary: QC metrics, read retention, species count, and diversity indices at a glance
Taxonomic composition: Interactive donut chart and abundance table with CSV export
Rarefaction analysis: Curve with confidence bands and saturation assessment
Methods: Full methods description for reproducibility and publication

Data sovereignty

All pipeline processing is performed on sovereign UK infrastructure. Sequencing data never leaves NS Bio’s self-hosted environment. Reports are self-contained HTML files with no external dependencies — no CDN requests, no analytics, no third-party fonts loaded at runtime. Every font and image is embedded directly in the file.
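
One way to produce a file of this kind is to embed binary assets as base64 data URIs and then check that nothing in the markup points at a remote host. The sketch below is an assumed illustration, not NS Bio's report generator: `image_data_uri`, `build_report`, and `has_external_references` are hypothetical names, and the check is a simple heuristic that inspects only `src` and `href` attributes (it does not look inside CSS `url()` or `@import` rules).

```python
import base64
import html
import re

# Matches src/href attributes whose value starts with http://, https:// or //.
EXTERNAL_REF = re.compile(
    r"""(?:src|href)\s*=\s*["']?\s*(?:https?:)?//""", re.IGNORECASE
)


def image_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URI that can sit inside the HTML."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_report(title: str, body_html: str, image_bytes: bytes) -> str:
    """Assemble a single-file HTML page with one embedded image."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en"><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head><body>"
        f'<img alt="figure" src="{image_data_uri(image_bytes)}">'
        f"{body_html}</body></html>"
    )


def has_external_references(document: str) -> bool:
    """Return True if any src or href attribute points at a remote URL."""
    return EXTERNAL_REF.search(document) is not None
```

A check like `has_external_references` could run as a final gate before a report is released, so that a stray CDN link is caught before the file reaches a client.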

Get in touch

If you’d like to discuss running one of these pipelines on your data, or need a custom analysis workflow, contact us at enquiries@ns-bio.co.uk.