Pipelines — NS Bio

Pipeline 01 · Whole-genome

Bacterial whole-genome analysis.

The bacterialp pipeline takes raw Oxford Nanopore FASTQ files and produces a single self-contained HTML report — species identification, assembly, AMR profile, MLST, and species-specific typing in one document.

Designed for clinical and public-health microbiology workflows where turnaround time and traceability matter. The final report can be opened in any browser, saved as PDF, or archived into a patient record.

Oxford Nanopore · Assembly · AMR · MLST · Phylogeny

See the output

Six worked examples from public Nanopore data.

Click any card to open the full interactive report — exactly what you receive.

Live reports

PASS

Staphylococcus aureus

MRSA · SCCmec type IVa

N50 2.89 Mb · MLST ST22 · ERR14835505

PASS

Klebsiella pneumoniae

Kleborate · K-locus KL46

N50 5.35 Mb · MLST ST277 · SRR38388358

PASS

Salmonella enterica

Serovar Agona · SeqSero2

N50 4.94 Mb · MLST ST13 · SRR38360442

PASS

Pseudomonas aeruginosa

PASTY · O-serogroup O11 · carbapenemase+

N50 7.18 Mb · MLST ST308 · SRR38324899

PASS

Escherichia coli

ECTyper · O6:H1

N50 5.13 Mb · MLST ST73 · SRR38074175

PASS

Mycobacterium avium

NTM-Profiler · subsp. hominissuis

N50 2.78 Mb · SRR34136513

Under the hood

Eight stages, fully automated.

From quality filtering to species-specific typing — every stage runs without manual intervention, with full provenance captured for the audit trail.

01 Quality filtering

Trims sequencing adapters, removes low-quality bases, and discards reads that fall below length or quality thresholds — clean input for everything downstream.

fastp · fastplong

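
As a rough illustration of what this stage does, the sketch below filters FASTQ records by length and mean quality. It is not fastp or fastplong code: the function names and the `min_len` and `min_q` thresholds are assumptions chosen for the example, not the pipeline's settings. Mean quality is averaged over error probabilities, since a plain average of Phred scores overstates read quality.

```python
import math


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean read quality, averaged in error-probability space."""
    if not quality:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def keep_read(seq: str, quality: str, min_len: int = 1000, min_q: float = 10.0) -> bool:
    """True if the read passes both the length and the mean-quality threshold."""
    return len(seq) >= min_len and mean_phred(quality) >= min_q


def filter_fastq(lines, min_len: int = 1000, min_q: float = 10.0):
    """Yield (header, sequence, quality) for passing reads from 4-line FASTQ records."""
    stripped = (line.rstrip("\n") for line in lines)
    # Group the stream into consecutive blocks of four lines (one FASTQ record each).
    for header, seq, _plus, qual in zip(*[stripped] * 4):
        if keep_read(seq, qual, min_len, min_q):
            yield header, seq, qual
```

Passing an open file handle as `lines` streams the records without loading the whole file into memory.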
02 Taxonomic classification

Identifies the species in the sample by matching k-mers against a curated reference database. Confirms the expected organism and flags contamination early.

Kraken2

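
The contamination check can be sketched as a small parser over a Kraken2-style report (tab-separated: percentage, clade reads, direct reads, rank code, taxid, indented name). The function names and the 5% secondary-species threshold are hypothetical, chosen for illustration rather than taken from the pipeline.

```python
def species_hits(report_lines):
    """Return (species name, percent of reads) pairs, highest percentage first."""
    hits = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        pct, _clade, _direct, rank, _taxid, name = fields[:6]
        if rank == "S":  # species-level rows only
            hits.append((name.strip(), float(pct)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def check_sample(report_lines, expected, max_secondary_pct=5.0):
    """Compare the top species with the expected one and list likely contaminants."""
    hits = species_hits(report_lines)
    top = hits[0][0] if hits else None
    contaminants = [name for name, pct in hits[1:] if pct >= max_secondary_pct]
    return {
        "top_species": top,
        "matches_expected": top == expected,
        "contaminants": contaminants,
    }
```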
03 De novo assembly

Reconstructs the bacterial genome from filtered long reads, including chromosomal and plasmid contigs. Optimised for Nanopore-only or hybrid input.

Dragonflye · Flye

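
The example cards report N50 for each assembly. For readers unfamiliar with the metric, a minimal sketch of the standard definition follows: the length of the contig at which the running total, counted from the longest contig down, first reaches half of the assembly size. The function name is illustrative.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum reaches half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly
```

An N50 close to the expected genome size is consistent with the chromosome assembling into a single contig.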
04 Genome completeness

Scores the assembly against expected single-copy orthologs to measure completeness, duplication, and fragmentation. Surfaces incomplete or fragmented assemblies before they reach the report.

BUSCO

05 Gene annotation

Predicts and labels coding sequences, rRNA, tRNA, and other genomic features. Produces a structured annotation suitable for downstream comparative work.

Bakta

06 AMR detection

Screens the assembly for known antimicrobial resistance genes and assesses plasmid context — with resistance class, coverage, and identity reported per hit.

ABRicate · PlasmidFinder

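
Per-hit coverage and identity are what make AMR calls filterable. The sketch below keeps hits above two cut-offs from a tab-separated table. The column names are modelled on ABRicate's output and should be checked against the version in use; the `min_cov` and `min_id` defaults are assumptions for the example, not the pipeline's thresholds.

```python
import csv
import io


def filter_hits(tsv_text, min_cov=80.0, min_id=90.0):
    """Keep AMR hits whose coverage and identity both meet the thresholds."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        coverage = float(row["%COVERAGE"])
        identity = float(row["%IDENTITY"])
        if coverage >= min_cov and identity >= min_id:
            kept.append({
                "gene": row["GENE"],
                "class": row["RESISTANCE"],
                "coverage": coverage,
                "identity": identity,
            })
    return kept
```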
07 MLST typing

Assigns a sequence type by matching the alleles of the scheme's housekeeping genes (canonically seven) against PubMLST schemes, the standard genotyping framework for epidemiology.

mlst

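
Conceptually, sequence typing is a table lookup: the allele numbers called at each locus form a profile, and the profile maps to a sequence type. The sketch below shows that lookup with a toy table. The function name and the profile values are illustrative placeholders, not real PubMLST data, and this is not how the mlst tool is implemented internally.

```python
def assign_st(alleles, scheme_loci, profiles):
    """Return the sequence type for an allele profile, or None if it is not in the table."""
    key = tuple(alleles.get(locus) for locus in scheme_loci)
    if None in key:
        return None  # at least one locus has no allele call
    return profiles.get(key)


# Toy seven-locus scheme; allele numbers and ST labels are made up for the example.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")
TOY_PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-toy-1",
    (1, 2, 1, 1, 3, 1, 1): "ST-toy-2",
}
```

A profile that is absent from the table, or that has a missing locus, returns `None`, which corresponds to a novel or untypeable isolate.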
08 Species-specific typing

When Kraken2 identifies a clinically relevant species, the pipeline automatically branches into the appropriate typing tool: serotyping, capsule typing, SCCmec typing, and more. See the cards below for the full set.

Auto-selected per species

Stage 08 detail

Species-specific typing tools.

When a known clinically relevant species is identified, the pipeline automatically runs the appropriate typing tool for that organism.

Staphylococcus aureus

SCCmec

SCCmec cassette type, methicillin resistance status.

Klebsiella pneumoniae

Kleborate

Capsule (K-locus), virulence factors, full AMR profile.

Salmonella enterica

SeqSero2

Serovar and serotype prediction directly from the genome.

Pseudomonas aeruginosa

PASTY

O-antigen serogroup typing.

Escherichia coli

ECTyper

O- and H-antigen serotyping (e.g. O157:H7).

Non-tuberculous Mycobacteria

NTM-Profiler

NTM species identification and drug-resistance profile.
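
The species-to-tool branching listed above can be pictured as a simple dispatch table. This is a minimal sketch under stated assumptions: the tool names come from the cards, while the function name, the prefix matching, and the rule used to recognise non-tuberculous mycobacteria are simplifications invented for the example.

```python
TYPING_TOOLS = {
    "Staphylococcus aureus": "SCCmec",
    "Klebsiella pneumoniae": "Kleborate",
    "Salmonella enterica": "SeqSero2",
    "Pseudomonas aeruginosa": "PASTY",
    "Escherichia coli": "ECTyper",
}


def select_typing_tool(species_name):
    """Return the typing tool for a classified species, or None if no branch applies."""
    # Prefix match so that subspecies labels still hit the species entry.
    for species, tool in TYPING_TOOLS.items():
        if species_name.startswith(species):
            return tool
    # Simplified NTM rule (assumption): any Mycobacterium other than M. tuberculosis.
    if species_name.startswith("Mycobacterium") and not species_name.startswith(
        "Mycobacterium tuberculosis"
    ):
        return "NTM-Profiler"
    return None
```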

Deliverable

What the report includes.

Eight sections, all in one HTML file. Interactive charts, sortable tables, and CSV exports throughout.

General summary

QC pass/fail badges, top species, BUSCO completeness, MLST type.

Quality filtering

Read stats before and after filtering, with length and quality distributions.

Taxonomic classification

Top species identified by Kraken2 with read-level confidence.

Genome completeness

BUSCO stacked-bar chart with completeness, duplication, and fragmentation metrics.

Gene annotation

Bakta output with genome statistics and annotated feature counts.

AMR genes

Resistance class, coverage, plasmid association — with CSV export.

MLST typing

Sequence type and full allele profile from PubMLST schemes.

Species-specific results

Serotype, serogroup, SCCmec type and similar, where applicable.

Pipeline 02 · Amplicon

16S / ITS amplicon analysis.

The NS Amplicon pipeline profiles mixed microbial communities from Oxford Nanopore 16S rRNA or ITS sequencing — what is in the sample, and in what proportions?

Raw FASTQ files are taken through quality filtering, read clustering, consensus polishing, and taxonomic classification to produce per-sample abundance reports with diversity metrics and rarefaction curves.

16S rRNA · ITS · Nanopore · Diversity · SILVA / UNITE

Amplicon vs whole-genome

WGS asks "what is this isolate?" and gives deep, strain-level resolution from a single cultured organism.

Amplicon sequencing asks "what is in this mixed community, and at what relative abundance?" No culture is needed, but resolution is limited to genus level.

See the output

Worked example: gut mock community.

A 12-species mock community sequenced on Nanopore. Click to open the full report — exactly what you receive.

Live report

PASS

Gut mock community

12 taxa identified · Shannon 2.15

16S rRNA · mock · GutMock_sup_rep1

Under the hood

Six stages, optimised for Nanopore amplicons.

From raw long reads to polished consensus sequences and diversity metrics — tuned for the error profile of Oxford Nanopore amplicon data.

01 Quality filtering

Trims adapters and length-filters reads to within the expected amplicon size window, removing chimeras and off-target sequences.

fastplong

02 Read clustering

Embeds reads in low-dimensional sequence space and groups them into clusters that each represent a distinct underlying taxon — resilient to Nanopore error rates.

UMAP · HDBSCAN

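
Before reads can be embedded and clustered, each one has to become a numeric vector. This page does not say how that is done; k-mer frequency profiles are a common choice and are assumed here purely for illustration. The sketch covers only that featurisation step, and the UMAP embedding and HDBSCAN clustering that would follow are not shown.

```python
from itertools import product


def kmer_profile(seq, k=3):
    """Normalised k-mer frequency vector over the ACGT alphabet (4**k entries)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skips k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]
```

Reads from the same taxon should yield similar profiles despite scattered sequencing errors, which is what makes density-based clustering on the embedded vectors workable.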
03 Consensus polishing

Builds a high-accuracy consensus from each cluster using three-stage polishing — turning many noisy long reads into a single near-perfect reference sequence.

SPOA · Racon · Medaka

04 Taxonomic classification

Aligns each polished consensus against SILVA (16S) or UNITE (ITS) reference databases to assign taxonomy to the lowest reliable rank.

minimap2 · SILVA · UNITE

05 Abundance estimation

Resolves multi-mapped reads to fractional taxon abundances using expectation-maximisation, producing accurate proportions even with closely related taxa.

Expectation-Maximisation
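
To make the idea concrete, here is a minimal expectation-maximisation sketch for multi-mapped reads. It assumes every candidate alignment of a read is equally likely a priori, so a read is shared among its candidate taxa in proportion to the current abundance estimates. The function name and iteration count are illustrative; the pipeline's actual model may weight alignments differently.

```python
def em_abundance(read_hits, n_iter=50):
    """Estimate taxon proportions from per-read lists of candidate taxa."""
    reads = [set(hits) for hits in read_hits if hits]  # drop unassigned reads
    taxa = sorted({taxon for hits in reads for taxon in hits})
    if not taxa:
        return {}
    abundance = {taxon: 1.0 / len(taxa) for taxon in taxa}
    for _ in range(n_iter):
        totals = dict.fromkeys(taxa, 0.0)
        # E-step: split each read across its candidates by current abundance.
        for hits in reads:
            weight = sum(abundance[taxon] for taxon in hits)
            for taxon in hits:
                totals[taxon] += abundance[taxon] / weight
        # M-step: new abundance is the share of reads assigned to each taxon.
        abundance = {taxon: totals[taxon] / len(reads) for taxon in taxa}
    return abundance
```

With 8 reads unique to taxon A, 2 unique to B, and 10 ambiguous between them, the estimate settles at 0.8 for A and 0.2 for B: the ambiguous reads are divided in the same 4:1 ratio as the unique ones.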
06 Diversity & rarefaction

Calculates Shannon and Simpson diversity indices and produces rarefaction curves to assess whether sequencing depth captured the full community.

Shannon (H′) · Simpson (1−D) · Rarefaction
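
Both indices are short formulas over the per-taxon read counts. The sketch below uses the standard definitions, with the natural logarithm assumed for Shannon (the report may use a different base); the function names are illustrative.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index 1 - D, where D = sum(p ** 2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)
```

For a perfectly even community of 12 taxa, the natural-log Shannon index would be ln 12, about 2.48, which is its maximum for that richness.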

Reading the report

Diversity metrics, briefly explained.

Three standard ecological measures, reported per sample, that together describe community structure.

Shannon index (H′)

Combines richness and evenness. Higher = more diverse. Sensitive to rare taxa.

Simpson index (1−D)

Probability that two randomly chosen reads come from different taxa. Weighted toward dominant taxa, so robust to noise from rare ones.

Rarefaction

Subsamples reads at increasing depths. A plateauing curve indicates saturation.
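
Rarefaction can be sketched as repeated random subsampling without replacement: at each depth, draw that many reads and count how many distinct taxa appear. The function name, the replicate count, and the fixed seed are assumptions for the example.

```python
import random


def rarefaction_curve(counts, depths, n_rep=10, seed=0):
    """Mean observed richness at each subsampling depth.

    counts maps taxon name to read count; depths beyond the total are skipped.
    """
    rng = random.Random(seed)
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            continue
        richness = [len(set(rng.sample(pool, depth))) for _ in range(n_rep)]
        curve.append((depth, sum(richness) / n_rep))
    return curve
```

If the mean richness stops rising as depth increases, deeper sequencing is unlikely to reveal many more taxa.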

Deliverable

What the report includes.

Run summary

QC metrics, read retention, species count, and diversity at a glance.

Taxonomic composition

Interactive donut chart and abundance table with CSV export.

Rarefaction analysis

Curve with confidence bands and saturation assessment.

Methods

Full methods description, suitable for reproducibility and publication.

Sovereignty

All processing on sovereign UK infrastructure.

Sequencing data never leaves NS Bio's self-hosted environment. Reports are self-contained HTML files with no CDN requests, no analytics, and no third-party fonts loaded at runtime — every asset is embedded directly in the file. A report opened offline in five years' time will render exactly as it did on the day it was delivered.
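
A recipient who wants to confirm that a report is self-contained could scan it for assets that point at remote hosts. The sketch below is one rough way to do that with Python's standard library; it is not part of the NS Bio tooling, the names are hypothetical, and it inspects only `src` and `href` attributes, so URLs inside stylesheets or scripts are not detected.

```python
from html.parser import HTMLParser


class ExternalRefFinder(HTMLParser):
    """Collect src/href attributes that would trigger a network request."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            return  # ordinary hyperlinks are navigation, not assets loaded at runtime
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith(
                ("http://", "https://", "//")
            ):
                self.external.append((tag, value))


def external_assets(html_text):
    """Return (tag, url) pairs for remotely hosted assets referenced in the HTML."""
    finder = ExternalRefFinder()
    finder.feed(html_text)
    return finder.external
```

An empty list is the expected result for a file whose images, fonts, scripts, and styles are all embedded.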

Want to run one of these on your data?

Initial conversations are free and confidential. We'll discuss your sample type, sequencing platform, and turnaround — followed by a written quote within a week.

Start a conversation → Discuss a custom pipeline

Production-ready pipelines for microbial genomics
