NS Bio
Analysis pipelines

Production-ready pipelines for microbial genomics

From raw sequencing reads to annotated, client-ready reports. Every pipeline runs on sovereign UK infrastructure and produces a single self-contained HTML report — no external dependencies, no cloud egress, fully reproducible.

Pipeline 01 · Whole-genome

Bacterial whole-genome analysis.

The bacterialp pipeline takes raw Oxford Nanopore FASTQ files and produces a single self-contained HTML report — species identification, assembly, AMR profile, MLST, and species-specific typing in one document.

Designed for clinical and public-health microbiology workflows where turnaround time and traceability matter. The final report can be opened in any browser, saved as PDF, or archived into a patient record.

Oxford Nanopore · Assembly · AMR · MLST · Phylogeny
Under the hood

Eight stages, fully automated.

From quality filtering to species-specific typing — every stage runs without manual intervention, with full provenance captured for the audit trail.

  1. Quality filtering

    Trims sequencing adapters, removes low-quality bases, and discards reads that fall below length or quality thresholds — clean input for everything downstream.

  2. Taxonomic classification

    Identifies the species in the sample by matching k-mers against a curated reference database. Confirms the expected organism and flags contamination early.

  3. De novo assembly

    Reconstructs the bacterial genome from filtered long reads, including chromosomal and plasmid contigs. Optimised for Nanopore-only or hybrid input.

  4. Genome completeness

    Scores the assembly against expected single-copy orthologs to measure completeness, duplication, and fragmentation. Surfaces incomplete or fragmented assemblies before they reach the report.

  5. Gene annotation

    Predicts and labels coding sequences, rRNA, tRNA, and other genomic features. Produces a structured annotation suitable for downstream comparative work.

  6. AMR detection

    Screens the assembly for known antimicrobial resistance genes and assesses plasmid context — with resistance class, coverage, and identity reported per hit.

  7. MLST typing

    Assigns a sequence type by matching alleles at the housekeeping loci (usually seven) against the relevant PubMLST scheme. MLST is the standard genotyping framework for epidemiology.

  8. Species-specific typing

    When Kraken2 identifies a clinically relevant species, the pipeline automatically branches into the appropriate typing tool: serotyping, capsule typing, SCCmec typing, and more. The full set is listed below.

    Auto-selected per species
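Stage 01 is the easiest stage to make concrete. The sketch below parses four-line FASTQ records and keeps reads that pass a minimum length and a minimum mean quality. It is a minimal illustration under stated assumptions, not the pipeline's implementation: the function names and default thresholds (1,000 bp, Q10) are hypothetical, adapter trimming is omitted, and mean quality is computed by averaging per-base error probabilities (a common convention for Nanopore reads) rather than averaging raw Phred scores.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Read:
    name: str
    seq: str
    qual: str  # Phred+33 encoded quality string


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield reads from four-line FASTQ records (header, sequence, '+', quality).

    A truncated final record raises RuntimeError (Python 3.7+ generator semantics).
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).strip()
        next(it)  # '+' separator line, ignored
        qual = next(it).strip()
        yield Read(header.strip()[1:], seq, qual)


def mean_quality(qual: str) -> float:
    """Mean read quality on the Phred scale, via the mean per-base error probability."""
    if not qual:
        return 0.0
    errors = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(errors) / len(errors))


def filter_reads(reads: Iterable[Read], min_len: int = 1000, min_q: float = 10.0) -> Iterator[Read]:
    """Keep reads at least min_len bases long with mean quality >= min_q (hypothetical defaults)."""
    for read in reads:
        if len(read.seq) >= min_len and mean_quality(read.qual) >= min_q:
            yield read
```

Averaging error probabilities matters: a two-base read with qualities Q10 and Q20 has a mean quality of about Q12.6, not Q15, because the low-quality base dominates the expected number of errors.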
Stage 08 detail

Species-specific typing tools.

When a known clinically relevant species is identified, the pipeline automatically runs the appropriate typing tool for that organism.

Staphylococcus aureus
SCCmec

SCCmec cassette type, methicillin resistance status.

Klebsiella pneumoniae
Kleborate

Capsule (K-locus), virulence factors, full AMR profile.

Salmonella enterica
SeqSero2

Serotype prediction, including the antigenic formula, directly from the genome.

Pseudomonas aeruginosa
PASTY

O-antigen serogroup typing.

Escherichia coli
ECTyper

O- and H-antigen serotyping (e.g. O157:H7).

Non-tuberculous mycobacteria
NTM-Profiler

NTM species identification and drug-resistance profile.
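The auto-selection described for stage 08 amounts to a lookup from the species call to a typing tool. The sketch below shows one way to express it. The function name, the reduction of a species call to its first two words, and the rule used for non-tuberculous mycobacteria (any Mycobacterium species outside a short, non-exhaustive list of M. tuberculosis complex members) are assumptions for illustration, not the pipeline's actual logic.

```python
from typing import Optional

# Species-to-tool table taken from the species-specific typing list; keys are "Genus species".
TYPING_TOOLS = {
    "Staphylococcus aureus": "SCCmec",
    "Klebsiella pneumoniae": "Kleborate",
    "Salmonella enterica": "SeqSero2",
    "Pseudomonas aeruginosa": "PASTY",
    "Escherichia coli": "ECTyper",
}

# Illustrative, non-exhaustive list of M. tuberculosis complex species,
# which this sketch excludes from the NTM branch.
MTB_COMPLEX = {
    "Mycobacterium tuberculosis",
    "Mycobacterium bovis",
    "Mycobacterium africanum",
    "Mycobacterium canettii",
    "Mycobacterium microti",
}


def select_typing_tool(species_call: str) -> Optional[str]:
    """Return the typing tool for a species call, or None if none applies (hypothetical helper)."""
    # Reduce e.g. "Salmonella enterica subsp. enterica" to "Salmonella enterica".
    binomial = " ".join(species_call.split()[:2])
    if binomial in TYPING_TOOLS:
        return TYPING_TOOLS[binomial]
    if binomial.startswith("Mycobacterium ") and binomial not in MTB_COMPLEX:
        return "NTM-Profiler"
    return None
```

Returning None rather than raising keeps species without a dedicated tool on the normal path, in line with the report's species-specific section appearing only where applicable.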

Deliverable

What the report includes.

Eight sections, all in one HTML file. Interactive charts, sortable tables, and CSV exports throughout.

General summary
QC pass/fail badges, top species, BUSCO completeness, MLST type.
Quality filtering
Read stats before and after filtering, with length and quality distributions.
Taxonomic classification
Top species identified by Kraken2 with read-level confidence.
Genome completeness
BUSCO stacked-bar chart with completeness, duplication, and fragmentation metrics.
Gene annotation
Bakta output with genome statistics and annotated feature counts.
AMR genes
Resistance class, coverage, plasmid association — with CSV export.
MLST typing
Sequence type and full allele profile from PubMLST schemes.
Species-specific results
Serotype, serogroup, SCCmec type and similar, where applicable.
Pipeline 02 · Amplicon

16S / ITS amplicon analysis.

The NS Amplicon pipeline profiles mixed microbial communities from Oxford Nanopore 16S rRNA or ITS sequencing — what is in the sample, and in what proportions?

Raw FASTQ files are taken through quality filtering, read clustering, consensus polishing, and taxonomic classification to produce per-sample abundance reports with diversity metrics and rarefaction curves.

16S rRNA · ITS · Nanopore · Diversity · SILVA / UNITE
Amplicon vs whole-genome

Whole-genome sequencing (WGS) asks: what is this isolate? It gives deep, strain-level resolution from a single cultured organism.

Amplicon sequencing asks: what is in this mixed community, and at what relative abundance? No culture is needed, but resolution is typically limited to genus level.

See the output

Worked example: gut mock community.

A 12-species mock community sequenced on Nanopore. The full report is exactly what you receive.

Under the hood

Six stages, optimised for Nanopore amplicons.

From raw long reads to polished consensus sequences and diversity metrics — tuned for the error profile of Oxford Nanopore amplicon data.

  1. Quality filtering

    Trims adapters, keeps only reads within the expected amplicon size window, and removes chimeras and off-target sequences.

  2. Read clustering

    Embeds reads in low-dimensional sequence space and groups them into clusters that each represent a distinct underlying taxon — resilient to Nanopore error rates.

  3. Consensus polishing

    Builds a high-accuracy consensus from each cluster using three-stage polishing — turning many noisy long reads into a single near-perfect reference sequence.

  4. Taxonomic classification

    Aligns each polished consensus against SILVA (16S) or UNITE (ITS) reference databases to assign taxonomy to the lowest reliable rank.

  5. Abundance estimation

    Uses expectation-maximisation to resolve multi-mapped reads into fractional taxon abundances, producing accurate proportions even when taxa are closely related.

    Expectation-Maximisation
  6. Diversity & rarefaction

    Calculates Shannon and Simpson diversity indices and produces rarefaction curves to assess whether sequencing depth captured the full community.

    Shannon (H′) · Simpson (1−D) · Rarefaction
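Abundance estimation can be illustrated with a small expectation-maximisation loop. The sketch assumes each read has already been mapped to a set of compatible taxa and, as a simplification, treats every compatible taxon as an equally good match; production tools typically also weight by alignment quality. The function name and convergence settings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def em_abundance(read_hits: List[Set[str]], max_iter: int = 1000, tol: float = 1e-10) -> Dict[str, float]:
    """Estimate relative taxon abundances from multi-mapped reads by EM."""
    hits = [h for h in read_hits if h]  # drop reads with no compatible taxon
    if not hits:
        return {}
    taxa = sorted(set().union(*hits))
    theta = {t: 1.0 / len(taxa) for t in taxa}  # start from uniform abundances
    for _ in range(max_iter):
        # E-step: split each read across its compatible taxa in proportion to current abundances.
        expected = defaultdict(float)
        for h in hits:
            total = sum(theta[t] for t in h)
            for t in h:
                expected[t] += theta[t] / total
        # M-step: new abundance = expected read count / number of reads.
        new_theta = {t: expected[t] / len(hits) for t in taxa}
        delta = max(abs(new_theta[t] - theta[t]) for t in taxa)
        theta = new_theta
        if delta < tol:
            break
    return theta
```

With six reads unique to taxon A, two unique to B, and two compatible with both, the estimate converges to A = 0.75 and B = 0.25: the shared reads end up split 3:1, mirroring the unique evidence. If a taxon has no unique reads at all, its abundance is driven towards zero rather than receiving an arbitrary half-share.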
Reading the report

Diversity metrics, briefly explained.

Three standard ecological measures, reported per sample, that together describe community structure.

Shannon index (H′)
Combines richness and evenness. Higher = more diverse. Sensitive to rare taxa.
Simpson index (1−D)
Probability that two reads drawn at random come from different taxa. Weighted toward dominant taxa, which makes it robust to noise.
Rarefaction
Subsamples reads at increasing depths and counts the taxa observed at each. A curve that plateaus indicates saturation: deeper sequencing would reveal few additional taxa.
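All three measures can be computed from per-taxon read counts. With pᵢ the proportion of reads assigned to taxon i, the sketch below uses H′ = −Σ pᵢ ln pᵢ (natural logarithm; some tools use base 2) and 1 − D with D = Σ pᵢ², which corresponds to drawing the two reads with replacement (a variant for sampling without replacement also exists). Rarefaction is done by random subsampling without replacement. Function names, the number of replicates, and the seed are illustrative assumptions, not the report's exact settings.

```python
import math
import random
from typing import Dict, List, Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts: Sequence[int]) -> float:
    """Simpson index 1 - D, with D = sum(p_i ** 2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def rarefaction_curve(counts: Dict[str, int], depths: Sequence[int], reps: int = 20, seed: int = 0) -> List[float]:
    """Mean number of taxa observed when subsampling each depth, averaged over reps draws."""
    rng = random.Random(seed)
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]  # one entry per read
    curve = []
    for depth in depths:
        depth = min(depth, len(pool))  # cannot subsample more reads than exist
        observed = [len(set(rng.sample(pool, depth))) for _ in range(reps)]
        curve.append(sum(observed) / reps)
    return curve
```

For a perfectly even two-taxon sample, H′ = ln 2 ≈ 0.693 and 1 − D = 0.5; both fall to 0 when a single taxon accounts for every read.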
Deliverable

What the report includes.

Run summary
QC metrics, read retention, species count, and diversity at a glance.
Taxonomic composition
Interactive donut chart and abundance table with CSV export.
Rarefaction analysis
Curve with confidence bands and saturation assessment.
Methods
Full methods description, suitable for reproducibility and publication.
Sovereignty

All processing on sovereign UK infrastructure.

Sequencing data never leaves NS Bio's self-hosted environment. Reports are self-contained HTML files with no CDN requests, no analytics, and no third-party fonts loaded at runtime — every asset is embedded directly in the file. A report opened offline in five years' time will render exactly as it did on the day it was delivered.
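One way to produce a report that makes no runtime requests is to inline every asset: stylesheets and scripts go inside style and script tags, and binary assets such as figures (or fonts, via @font-face) are base64-encoded into data: URIs. The sketch below shows the pattern with hypothetical function names; it is an illustration, not NS Bio's report generator.

```python
import base64
import html
from typing import Dict


def data_uri(payload: bytes, mime: str) -> str:
    """Encode bytes as a data: URI so the browser needs no network request."""
    return f"data:{mime};base64,{base64.b64encode(payload).decode('ascii')}"


def build_report(title: str, css: str, js: str, figures: Dict[str, bytes]) -> str:
    """Assemble a single HTML document with CSS, JS and PNG figures embedded inline."""
    # A literal closing script tag inside inline JS would end the script element early.
    if "</script" in js.lower():
        raise ValueError("inline script must not contain a closing </script> tag")
    images = "\n".join(
        f'<img alt="{html.escape(name)}" src="{data_uri(png, "image/png")}">'
        for name, png in figures.items()
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title><style>{css}</style></head>"
        f"<body>\n{images}\n<script>{js}</script></body></html>"
    )
```

Because the whole document is one string, it can be archived or opened offline without losing anything; the trade-off is file size, since base64 inflates binary assets by roughly a third.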

Want to run one of these on your data?

Initial conversations are free and confidential. We'll discuss your sample type, sequencing platform, and turnaround, then follow up with a written quote within a week.