API Reference

Top-level functions

ncountr — Python pipeline for Nanostring nCounter data analysis.

class ncountr.NanostringExperiment(raw_counts, pos_counts, neg_counts, hk_counts, sample_meta=<factory>, lane_info=<factory>, normalized=None, qc_results=None, de_results=None)[source]

Container for a parsed Nanostring nCounter experiment.

Parameters:
  • raw_counts (pd.DataFrame) – Endogenous gene counts, genes (rows) x samples (columns).

  • pos_counts (pd.DataFrame) – Positive control counts, controls (rows) x samples (columns).

  • neg_counts (pd.DataFrame) – Negative control counts, controls (rows) x samples (columns).

  • hk_counts (pd.DataFrame) – Housekeeping gene counts, genes (rows) x samples (columns).

  • sample_meta (pd.DataFrame) – Per-sample metadata (index = sample ID).

  • lane_info (pd.DataFrame) – Per-sample lane attributes (FovCount, FovCounted, BindingDensity, etc.).

  • normalized (pd.DataFrame | None) – Normalized count matrix (set after calling normalize).

  • qc_results (pd.DataFrame | None) – QC results per sample (set after calling qc).

  • de_results (pd.DataFrame | None) – DE results (set after calling de).

  • samples (list[str]) – Ordered sample IDs.

raw_counts: DataFrame
pos_counts: DataFrame
neg_counts: DataFrame
hk_counts: DataFrame
sample_meta: DataFrame
lane_info: DataFrame
normalized: DataFrame | None = None
qc_results: DataFrame | None = None
de_results: DataFrame | None = None
property samples: list[str]

Return ordered sample IDs from raw_counts columns.

property genes: list[str]

Return gene names from raw_counts index.

property n_samples: int
property n_genes: int
ncountr.read_rcc(rcc_dirs, *, file_pattern='*.RCC', sample_id_pattern='(\\d+)', sample_id_field='ID', sample_id_from='field', sample_meta=None)[source]

Read RCC files from one or more directories into a NanostringExperiment.

Parameters:
  • rcc_dirs (str, Path, or list thereof) – Directory or directories containing .RCC files.

  • file_pattern (str) – Glob pattern to match RCC files within each directory.

  • sample_id_pattern (str) – Regex applied to extract a clean sample ID. The first capture group is used.

  • sample_id_field (str) – Which field in the <Sample_Attributes> section holds the sample ID. Only used when sample_id_from="field".

  • sample_id_from (str) – Where to extract the sample ID from. "field" (default) uses the sample_id_field from the RCC file. "filename" applies the regex to the filename instead — useful when internal IDs are inconsistent across files.

  • sample_meta (dict[str, dict] | None) – Optional per-sample metadata, keyed by sample ID.

Return type:

NanostringExperiment
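
As a rough illustration of how sample_id_pattern and sample_id_from interact, the sketch below applies a regex to either a field value or a file name and keeps the first capture group. The helper extract_sample_id, the example strings, and the fallback to the unmodified text when nothing matches are assumptions made for illustration; they are not taken from ncountr's implementation.

```python
import re
from pathlib import Path


def extract_sample_id(text: str, pattern: str = r"(\d+)") -> str:
    """Return the first capture group of `pattern` found in `text`.

    Hypothetical helper: falls back to the unmodified text when the
    pattern does not match (the library's behaviour may differ).
    """
    match = re.search(pattern, text)
    return match.group(1) if match else text


# sample_id_from="field": the regex is applied to the value of the ID field.
field_value = "Patient 042 baseline"  # made-up field content
id_from_field = extract_sample_id(field_value)

# sample_id_from="filename": the regex is applied to the file name instead.
rcc_path = Path("20240101_run1_S17_03.RCC")  # made-up file name
id_from_filename = extract_sample_id(rcc_path.name, r"_S(\d+)_")
```

Only the first capture group is kept, so a pattern with several groups should place the ID in the first one.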

ncountr.parse_rcc(filepath)[source]

Parse a single Nanostring RCC file.

Parameters:

filepath (str or Path) – Path to a .RCC file.

Returns:

Keys: sample (sample attributes), lane (lane attributes), counts (dict of (CodeClass, GeneName) -> count).

Return type:

dict
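
The returned layout can be pictured with a small stand-in dictionary. Everything below is made up for illustration (sample ID, lane values, probe names, counts), and storing attribute values as strings is an assumption; only the three top-level keys and the (CodeClass, GeneName) tuple keys come from the description above.

```python
from collections import defaultdict

# Made-up stand-in for the dict returned by parse_rcc(); real files carry
# many more attributes and probes.
parsed = {
    "sample": {"ID": "S01"},
    "lane": {"FovCount": "280", "FovCounted": "275"},
    "counts": {
        ("Endogenous", "STAT1"): 512,
        ("Endogenous", "IRF7"): 88,
        ("Positive", "POS_A(128)"): 20431,
        ("Negative", "NEG_A(0)"): 6,
        ("Housekeeping", "GAPDH"): 9120,
    },
}

# Group the flat (CodeClass, GeneName) -> count mapping by code class.
by_class = defaultdict(dict)
for (code_class, gene), count in parsed["counts"].items():
    by_class[code_class][gene] = count

endogenous = by_class["Endogenous"]
```

Splitting by code class like this mirrors the separate raw_counts, pos_counts, neg_counts and hk_counts tables held by NanostringExperiment.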

ncountr.qc(experiment, *, fov_ratio_threshold=0.75, pos_r2_threshold=0.95, neg_sd=2.0)[source]

Run QC checks and store results on the experiment.

Parameters:
  • experiment (NanostringExperiment)

  • fov_ratio_threshold (float) – Minimum acceptable FOV ratio.

  • pos_r2_threshold (float) – Minimum R-squared for positive control linearity.

  • neg_sd (float) – Number of standard deviations above mean for negative background.

Returns:

QC results indexed by sample.

Return type:

pd.DataFrame
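
The three thresholds can be read as three per-sample checks. The sketch below is one plausible reading, not ncountr's code: it assumes the FOV ratio is FovCounted / FovCount, that linearity is the squared Pearson correlation between log2 positive-control counts and log2 concentrations, and that the background level is the negative-control mean plus neg_sd sample standard deviations. All function names and numbers are illustrative.

```python
import math
import statistics


def fov_ratio(fov_counted: int, fov_count: int) -> float:
    """Fraction of attempted fields of view that were counted."""
    return fov_counted / fov_count


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between x and y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def background_threshold(neg_counts: list[float], neg_sd: float = 2.0) -> float:
    """Mean of the negative controls plus neg_sd sample standard deviations."""
    return statistics.fmean(neg_counts) + neg_sd * statistics.stdev(neg_counts)


# Made-up values for a single lane.
passes_fov = fov_ratio(270, 280) >= 0.75

# Assumed positive-control concentrations (fM) and made-up counts,
# compared on a log2 scale.
conc = [128, 32, 8, 2, 0.5, 0.125]
counts = [16384, 4096, 1024, 256, 64, 16]
pos_r2 = r_squared([math.log2(c) for c in conc], [math.log2(c) for c in counts])
passes_linearity = pos_r2 >= 0.95

threshold = background_threshold([4, 6, 8, 10, 12], neg_sd=2.0)
```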

ncountr.normalize(experiment, *, method='pos_hk', neg_bg=None)[source]

Normalize raw counts and store the result on the experiment.

Parameters:
  • experiment (NanostringExperiment)

  • method (str) – "pos_only" — positive control normalization only. "pos_hk" — positive control + housekeeping normalization. "pos_hk_bg" — positive control + housekeeping + background subtraction.

  • neg_bg (pd.Series or dict, optional) – Per-sample negative background values. Required for "pos_hk_bg". If not provided, computed from experiment.neg_counts.

Returns:

Normalized count matrix (genes x samples).

Return type:

pd.DataFrame
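
To make the "positive control normalization" step concrete, here is a minimal sketch of a common nCounter convention: take the geometric mean of the positive controls in each sample and scale every sample towards the cross-sample average. Whether ncountr computes its factors exactly this way is not stated in this reference, so treat the formula and all numbers as assumptions.

```python
import statistics

# Made-up positive-control counts per sample (one entry per control probe).
pos_counts = {
    "S1": [100.0, 400.0],
    "S2": [400.0, 1600.0],
}

# Geometric mean of the positive controls in each sample.
geo = {sid: statistics.geometric_mean(vals) for sid, vals in pos_counts.items()}

# Scale every sample towards the average geometric mean across samples.
target = statistics.fmean(list(geo.values()))
pos_factor = {sid: target / g for sid, g in geo.items()}

# Apply the factor to a made-up endogenous count.
raw_stat1 = {"S1": 50.0, "S2": 200.0}
normalized_stat1 = {sid: raw_stat1[sid] * pos_factor[sid] for sid in raw_stat1}
```

In this toy input S2 was measured four times "brighter" than S1, and the scaling brings both samples to the same normalized value.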

ncountr.de(experiment, *, group_a, group_b, counts=None, test='mannwhitneyu', correction='fdr_bh', store=True)[source]

Run differential expression between two sample groups.

Parameters:
  • experiment (NanostringExperiment)

  • group_a (list[str]) – Sample IDs for group A. Log2FC is computed as log2(mean_a + 1) - log2(mean_b + 1), so positive values mean higher expression in group A.

  • group_b (list[str]) – Sample IDs for group B.

  • counts (pd.DataFrame, optional) – Count matrix to use. Defaults to experiment.normalized if available, otherwise experiment.raw_counts.

  • test (str) – Statistical test: "mannwhitneyu" or "ttest".

  • correction (str) – Multiple testing correction method (passed to statsmodels.stats.multitest.multipletests).

  • store (bool) – If True, store results on experiment.de_results.

Returns:

Columns: gene, log2FC, mean_a, mean_b, pvalue, padj.

Return type:

pd.DataFrame
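
The log2FC formula quoted for group_a and group_b can be checked by hand. The helper name and the counts below are made up; only the formula itself comes from the parameter description.

```python
import math
import statistics


def log2_fold_change(values_a: list[float], values_b: list[float]) -> float:
    """log2(mean_a + 1) - log2(mean_b + 1), as in the de() description."""
    mean_a = statistics.fmean(values_a)
    mean_b = statistics.fmean(values_b)
    return math.log2(mean_a + 1) - math.log2(mean_b + 1)


# Made-up normalized counts for one gene in two groups.
group_a_counts = [6.0, 8.0]  # mean 7 -> log2(7 + 1) = 3
group_b_counts = [1.0, 1.0]  # mean 1 -> log2(1 + 1) = 1
lfc = log2_fold_change(group_a_counts, group_b_counts)  # 3 - 1 = 2
```

The pseudocount of 1 keeps the value finite when a group mean is zero, and swapping the two groups flips the sign.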

ncountr.score_gene_set(experiment, *, gene_set, counts=None, samples=None, method='zscore_mean')[source]

Score samples for a gene set.

Parameters:
  • experiment (NanostringExperiment)

  • gene_set (str or list[str]) – A built-in gene set name (e.g. "IFN_JAKSTAT") or an explicit list of gene names.

  • counts (pd.DataFrame, optional) – Count matrix (genes x samples). Defaults to normalized or raw counts.

  • samples (list[str], optional) – Subset of samples to score. Defaults to all.

  • method (str) – Scoring method. Currently "zscore_mean" (z-score each gene across samples, then take the mean z-score per sample).

Returns:

Score per sample.

Return type:

pd.Series
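
The "zscore_mean" description translates into two steps, sketched here with plain lists. The gene names and values are made up, and the use of the sample standard deviation is an assumption; the library may standardize differently.

```python
import statistics

# Made-up expression values: gene -> one value per sample (S1, S2, S3).
sample_ids = ["S1", "S2", "S3"]
expr = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 4.0, 6.0],
}


def zscore(values: list[float]) -> list[float]:
    """Z-score one gene across samples (sample standard deviation assumed)."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]


# Step 1: z-score each gene across samples.
z = {gene: zscore(vals) for gene, vals in expr.items()}

# Step 2: mean z-score over the genes of the set, one score per sample.
scores = {
    sid: statistics.fmean([z[gene][i] for gene in z])
    for i, sid in enumerate(sample_ids)
}
```

Because each gene is standardized first, a highly expressed gene such as GENE_B does not dominate the score.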

ncountr.get_gene_set(name)[source]

Return a built-in gene set by name.

Parameters:

name (str) – Case-insensitive gene set name. Use list_gene_sets() to see available names.

Raises:

KeyError – If the name is not found.

Return type:

list[str]
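
A case-insensitive lookup that raises KeyError could look like the sketch below. The registry contents and the helper name lookup_gene_set are hypothetical; the members of the real built-in sets are not listed in this reference.

```python
# Hypothetical registry; the real built-in sets and their members are not
# listed in this reference.
_GENE_SETS = {
    "IFN_JAKSTAT": ["STAT1", "STAT2", "JAK1"],
    "DEMO_SET": ["GENE_A", "GENE_B"],
}


def lookup_gene_set(name: str) -> list[str]:
    """Case-insensitive lookup that raises KeyError for unknown names."""
    by_upper = {key.upper(): genes for key, genes in _GENE_SETS.items()}
    try:
        return list(by_upper[name.upper()])  # copy so callers cannot mutate the registry
    except KeyError:
        raise KeyError(f"Unknown gene set: {name!r}") from None


genes = lookup_gene_set("ifn_jakstat")
```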

ncountr.list_gene_sets()[source]

Return names of all built-in gene sets.

Return type:

list[str]

ncountr.to_anndata(experiment)[source]

Convert a NanostringExperiment to an AnnData object.

Uses normalized counts as the main matrix (X) when available, otherwise falls back to raw counts. Raw counts are always stored in adata.layers["raw"].

Parameters:

experiment (NanostringExperiment) – A parsed (and optionally normalized) experiment.

Returns:

Samples in .obs, genes in .var.

Return type:

anndata.AnnData

Raises:

ImportError – If anndata is not installed.

Data container

class ncountr.experiment.NanostringExperiment(raw_counts, pos_counts, neg_counts, hk_counts, sample_meta=<factory>, lane_info=<factory>, normalized=None, qc_results=None, de_results=None)[source]

Container for a parsed Nanostring nCounter experiment.

Parameters:
  • raw_counts (pd.DataFrame) – Endogenous gene counts, genes (rows) x samples (columns).

  • pos_counts (pd.DataFrame) – Positive control counts, controls (rows) x samples (columns).

  • neg_counts (pd.DataFrame) – Negative control counts, controls (rows) x samples (columns).

  • hk_counts (pd.DataFrame) – Housekeeping gene counts, genes (rows) x samples (columns).

  • sample_meta (pd.DataFrame) – Per-sample metadata (index = sample ID).

  • lane_info (pd.DataFrame) – Per-sample lane attributes (FovCount, FovCounted, BindingDensity, etc.).

  • normalized (pd.DataFrame | None) – Normalized count matrix (set after calling normalize).

  • qc_results (pd.DataFrame | None) – QC results per sample (set after calling qc).

  • de_results (pd.DataFrame | None) – DE results (set after calling de).

  • samples (list[str]) – Ordered sample IDs.

raw_counts: DataFrame
pos_counts: DataFrame
neg_counts: DataFrame
hk_counts: DataFrame
sample_meta: DataFrame
lane_info: DataFrame
normalized: DataFrame | None = None
qc_results: DataFrame | None = None
de_results: DataFrame | None = None
property samples: list[str]

Return ordered sample IDs from raw_counts columns.

property genes: list[str]

Return gene names from raw_counts index.

property n_samples: int
property n_genes: int

I/O

Parse Nanostring RCC files into a NanostringExperiment.

ncountr.io.rcc.parse_rcc(filepath)[source]

Parse a single Nanostring RCC file.

Parameters:

filepath (str or Path) – Path to a .RCC file.

Returns:

Keys: sample (sample attributes), lane (lane attributes), counts (dict of (CodeClass, GeneName) -> count).

Return type:

dict

ncountr.io.rcc.read_rcc(rcc_dirs, *, file_pattern='*.RCC', sample_id_pattern='(\\d+)', sample_id_field='ID', sample_id_from='field', sample_meta=None)[source]

Read RCC files from one or more directories into a NanostringExperiment.

Parameters:
  • rcc_dirs (str, Path, or list thereof) – Directory or directories containing .RCC files.

  • file_pattern (str) – Glob pattern to match RCC files within each directory.

  • sample_id_pattern (str) – Regex applied to extract a clean sample ID. The first capture group is used.

  • sample_id_field (str) – Which field in the <Sample_Attributes> section holds the sample ID. Only used when sample_id_from="field".

  • sample_id_from (str) – Where to extract the sample ID from. "field" (default) uses the sample_id_field from the RCC file. "filename" applies the regex to the filename instead — useful when internal IDs are inconsistent across files.

  • sample_meta (dict[str, dict] | None) – Optional per-sample metadata, keyed by sample ID.

Return type:

NanostringExperiment

Export utilities for writing results to disk.

ncountr.io.export.to_anndata(experiment)[source]

Convert a NanostringExperiment to an AnnData object.

Uses normalized counts as the main matrix (X) when available, otherwise falls back to raw counts. Raw counts are always stored in adata.layers["raw"].

Parameters:

experiment (NanostringExperiment) – A parsed (and optionally normalized) experiment.

Returns:

Samples in .obs, genes in .var.

Return type:

anndata.AnnData

Raises:

ImportError – If anndata is not installed.

ncountr.io.export.export_counts(experiment, output_dir, *, prefix='nanostring')[source]

Write raw and normalized count matrices to CSV.

Returns a dict mapping description to output path.

Return type:

dict[str, Path]

ncountr.io.export.export_qc(experiment, output_dir, *, prefix='nanostring')[source]

Write QC results to CSV.

Return type:

Path | None

ncountr.io.export.export_de(experiment, output_dir, *, prefix='nanostring')[source]

Write DE results to CSV.

Return type:

Path | None

Download nCounter RCC files from NCBI GEO.

ncountr.io.geo.fetch_geo(accession, output_dir='.', *, quiet=False)[source]

Download and extract RCC files from a GEO accession.

Looks for the GSE*_RAW.tar supplement file, downloads it, and extracts any .RCC (or .RCC.gz) files into output_dir.

Parameters:
  • accession (str) – GEO series accession (e.g. "GSE275334").

  • output_dir (str or Path) – Directory to extract RCC files into. A subdirectory named after the accession will be created.

  • quiet (bool) – Suppress progress output.

Returns:

Path to the directory containing extracted RCC files.

Return type:

Path
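
The extraction step described above (keep .RCC members, decompress .RCC.gz members) can be sketched with the standard library alone. This example does not download anything: it builds a tiny stand-in archive locally. The helper names, the placeholder accession GSE000000, and the choice to flatten member paths are assumptions, not ncountr's behaviour.

```python
import gzip
import io
import tarfile
import tempfile
from pathlib import Path


def extract_rcc(tar_path: Path, out_dir: Path) -> list[Path]:
    """Extract .RCC and .RCC.gz members of a tar archive into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            name = Path(member.name).name  # drop any directory component
            lower = name.lower()
            if not member.isfile() or not lower.endswith((".rcc", ".rcc.gz")):
                continue
            data = tar.extractfile(member).read()
            if lower.endswith(".gz"):
                data = gzip.decompress(data)
                name = name[:-3]  # strip the ".gz" suffix
            target = out_dir / name
            target.write_bytes(data)
            written.append(target)
    return written


def _add(tar: tarfile.TarFile, name: str, payload: bytes) -> None:
    """Add an in-memory file to the archive (demo helper)."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


# Build a small stand-in for a GSE*_RAW.tar archive and extract it.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "GSE000000_RAW.tar"  # placeholder accession
    with tarfile.open(archive, "w") as tar:
        _add(tar, "GSM1_a.RCC", b"<Header>\n")
        _add(tar, "GSM2_b.RCC.gz", gzip.compress(b"<Header>\n"))
        _add(tar, "notes.txt", b"ignored")
    extracted = sorted(p.name for p in extract_rcc(archive, Path(tmp) / "GSE000000"))
```

Writing each member under its base name only is a deliberate choice here: it avoids following directory components stored inside an untrusted archive.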

Core analysis

Quality control checks for Nanostring nCounter data.

ncountr.core.qc.qc(experiment, *, fov_ratio_threshold=0.75, pos_r2_threshold=0.95, neg_sd=2.0)[source]

Run QC checks and store results on the experiment.

Parameters:
  • experiment (NanostringExperiment)

  • fov_ratio_threshold (float) – Minimum acceptable FOV ratio.

  • pos_r2_threshold (float) – Minimum R-squared for positive control linearity.

  • neg_sd (float) – Number of standard deviations above mean for negative background.

Returns:

QC results indexed by sample.

Return type:

pd.DataFrame

Normalization methods for Nanostring nCounter data.

ncountr.core.normalize.normalize(experiment, *, method='pos_hk', neg_bg=None)[source]

Normalize raw counts and store the result on the experiment.

Parameters:
  • experiment (NanostringExperiment)

  • method (str) – "pos_only" — positive control normalization only. "pos_hk" — positive control + housekeeping normalization. "pos_hk_bg" — positive control + housekeeping + background subtraction.

  • neg_bg (pd.Series or dict, optional) – Per-sample negative background values. Required for "pos_hk_bg". If not provided, computed from experiment.neg_counts.

Returns:

Normalized count matrix (genes x samples).

Return type:

pd.DataFrame

ncountr.core.normalize.get_scaling_factors(experiment)[source]

Compute and return scaling factors without modifying the experiment.

Returns:

{"pos": {sid: factor}, "hk": {sid: factor}}.

Return type:

dict

Parameters:

experiment (NanostringExperiment)
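
The nested {"pos": ..., "hk": ...} layout lends itself to per-sample inspection. The sketch below uses made-up factors and assumes both factors are multiplicative and applied in sequence, which this reference does not confirm.

```python
# Made-up factors in the documented {"pos": {...}, "hk": {...}} layout.
factors = {
    "pos": {"S1": 1.25, "S2": 0.8},
    "hk": {"S1": 0.5, "S2": 2.0},
}

# Made-up raw counts for one gene.
raw = {"S1": 160.0, "S2": 100.0}

# Assumption: both factors are multiplicative and applied in sequence.
combined = {sid: factors["pos"][sid] * factors["hk"][sid] for sid in raw}
scaled = {sid: raw[sid] * combined[sid] for sid in raw}
```

Looking at the combined factor per sample before normalizing is a cheap way to spot samples that would be rescaled unusually strongly.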

Differential expression analysis for Nanostring nCounter data.

ncountr.core.de.de(experiment, *, group_a, group_b, counts=None, test='mannwhitneyu', correction='fdr_bh', store=True)[source]

Run differential expression between two sample groups.

Parameters:
  • experiment (NanostringExperiment)

  • group_a (list[str]) – Sample IDs for group A. Log2FC is computed as log2(mean_a + 1) - log2(mean_b + 1), so positive values mean higher expression in group A.

  • group_b (list[str]) – Sample IDs for group B.

  • counts (pd.DataFrame, optional) – Count matrix to use. Defaults to experiment.normalized if available, otherwise experiment.raw_counts.

  • test (str) – Statistical test: "mannwhitneyu" or "ttest".

  • correction (str) – Multiple testing correction method (passed to statsmodels.stats.multitest.multipletests).

  • store (bool) – If True, store results on experiment.de_results.

Returns:

Columns: gene, log2FC, mean_a, mean_b, pvalue, padj.

Return type:

pd.DataFrame
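
The default correction, "fdr_bh", is the statsmodels name for the Benjamini-Hochberg procedure. The pure-Python sketch below shows what that adjustment does to a list of p-values; it is meant for intuition only, and the function name and input values are illustrative.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for four genes.
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each p-value is multiplied by n / rank, and the running minimum guarantees that a gene with a smaller raw p-value never ends up with a larger adjusted one.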

Gene set / pathway scoring.

ncountr.core.pathway.score_gene_set(experiment, *, gene_set, counts=None, samples=None, method='zscore_mean')[source]

Score samples for a gene set.

Parameters:
  • experiment (NanostringExperiment)

  • gene_set (str or list[str]) – A built-in gene set name (e.g. "IFN_JAKSTAT") or an explicit list of gene names.

  • counts (pd.DataFrame, optional) – Count matrix (genes x samples). Defaults to normalized or raw counts.

  • samples (list[str], optional) – Subset of samples to score. Defaults to all.

  • method (str) – Scoring method. Currently "zscore_mean" (z-score each gene across samples, then take the mean z-score per sample).

Returns:

Score per sample.

Return type:

pd.Series

Datasets

Built-in gene sets and cell type markers.

ncountr.datasets.get_gene_set(name)[source]

Return a built-in gene set by name.

Parameters:

name (str) – Case-insensitive gene set name. Use list_gene_sets() to see available names.

Raises:

KeyError – If the name is not found.

Return type:

list[str]

ncountr.datasets.list_gene_sets()[source]

Return names of all built-in gene sets.

Return type:

list[str]

ncountr.datasets.get_cell_markers(cell_type=None)[source]

Return cell type marker genes.

Parameters:

cell_type (str, optional) – If given, return markers for that cell type. If None, return the full dictionary.

Return type:

dict[str, list[str]] | list[str]

Plotting

QC summary plots.

ncountr.plotting.qc_plots.plot_qc(experiment, *, output=None, fov_threshold=0.75)[source]

Generate a 4-panel QC summary figure.

Panels: A) FOV ratio, B) Positive control linearity, C) Negative background, D) Housekeeping gene totals (raw + pos-normalized).

Parameters:
  • experiment (NanostringExperiment)

  • output (str or Path, optional) – Save figure to this path.

  • fov_threshold (float) – FOV ratio threshold line.

Return type:

matplotlib.figure.Figure

Differential expression plots.

ncountr.plotting.de_plots.plot_volcano(de_results, *, highlight_genes=None, highlight_label='Highlighted', highlight_color='gold', padj_threshold=0.05, log2fc_threshold=0.0, label_top_n=15, output=None, title=None)[source]

Generate a volcano plot from DE results.

Significant genes (padj < threshold) are colored red (up) or blue (down). An optional set of genes of interest can be highlighted with colored markers on top, useful for visualizing pathway genes (e.g. IFN/JAK-STAT) or custom gene lists.

Parameters:
  • de_results (pd.DataFrame) – Must contain columns: gene, log2FC, pvalue, padj.

  • highlight_genes (list[str], optional) – Genes to highlight with colored markers. Can be any gene list of interest (pathway genes, custom markers, etc.).

  • highlight_label (str) – Legend label for highlighted genes.

  • highlight_color (str) – Color for highlighted gene markers.

  • padj_threshold (float) – Significance threshold for coloring.

  • log2fc_threshold (float) – Minimum absolute log2FC to color significant genes (default 0).

  • label_top_n (int) – Number of top genes to label by p-value.

  • output (str or Path, optional) – Save figure to this path.

  • title (str, optional) – Figure title.

Return type:

matplotlib.figure.Figure
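
The coloring rule described above can be expressed as a small classifier. This is a sketch under assumptions: the comparison against log2fc_threshold is taken to be inclusive, and a log2FC of exactly zero is treated as not significant. The function name and the example rows are made up.

```python
def classify(log2fc: float, padj: float,
             padj_threshold: float = 0.05, log2fc_threshold: float = 0.0) -> str:
    """Return "up", "down", or "ns" for one gene."""
    if padj < padj_threshold and abs(log2fc) >= log2fc_threshold:
        if log2fc > 0:
            return "up"    # drawn in red
        if log2fc < 0:
            return "down"  # drawn in blue
    return "ns"


# Made-up DE rows: (gene, log2FC, padj).
rows = [("GENE_A", 1.8, 0.001), ("GENE_B", -0.9, 0.02), ("GENE_C", 2.5, 0.30)]
labels = {gene: classify(lfc, padj) for gene, lfc, padj in rows}
```

Raising log2fc_threshold leaves small but significant changes uncolored, for example classify(-0.9, 0.02, log2fc_threshold=1.0) gives "ns".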

Pathway / gene set scoring plots.

ncountr.plotting.pathway_plots.plot_pathway_scores(scores, groups, *, group_colors=None, output=None, title='Pathway Score', ylabel='Pathway score (z-scored)')[source]

Box + strip plot of pathway scores by group.

Parameters:
  • scores (pd.Series) – Per-sample scores (index = sample ID).

  • groups (dict[str, list[str]]) – Group name → list of sample IDs.

  • group_colors (dict, optional) – Group name → color.

  • output (str or Path, optional) – Save figure to this path.

  • title (str) – Figure title.

  • ylabel (str) – Y-axis label.

Return type:

matplotlib.figure.Figure

Heatmap plotting utilities.

ncountr.plotting.heatmaps.plot_heatmap(data, *, zscore=True, vmin=-2, vmax=2, cmap='RdBu_r', title='', xlabel_rotation=45, ylabel_fontsize=8, output=None, figsize=None)[source]

Plot a heatmap of expression data (genes x samples).

Parameters:
  • data (pd.DataFrame) – Genes (rows) x samples (columns).

  • zscore (bool) – Z-score each row across columns.

  • vmin (float) – Lower limit of the color scale.

  • vmax (float) – Upper limit of the color scale.

  • cmap (str) – Matplotlib colormap.

  • title (str)

  • output (str or Path, optional)

  • xlabel_rotation (int)

  • ylabel_fontsize (int)

  • figsize (tuple[float, float] | None)

Return type:

matplotlib.figure.Figure

Cross-platform validation

Cross-platform correlation analysis.

ncountr.crossplatform.correlation.per_sample_correlation(nanostring, external, *, shared_genes=None, method='spearman')[source]

Compute per-sample correlation between two expression matrices.

Parameters:
  • nanostring (pd.DataFrame) – Nanostring expression (genes x samples).

  • external (pd.DataFrame) – External expression (genes x samples).

  • shared_genes (list[str], optional) – Genes to include. If None, uses intersection of indices.

  • method (str) – "spearman" or "pearson".

Returns:

One row per sample with columns: sample, r, pvalue.

Return type:

pd.DataFrame
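
For a single sample, the default "spearman" method amounts to ranking the shared genes on each platform and correlating the ranks. The sketch below implements that with the standard library; the function names and the four made-up values are illustrative, and the p-value column is not reproduced.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up expression of four shared genes in one sample on each platform.
nanostring_sample = [1.0, 2.0, 3.0, 4.0]
external_sample = [10.0, 20.0, 30.0, 25.0]
r = spearman(nanostring_sample, external_sample)
```

Because only ranks enter the calculation, the two platforms may report expression on different scales.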

ncountr.crossplatform.correlation.per_gene_correlation(nanostring, external, *, shared_samples=None, method='spearman', min_samples=4)[source]

Compute per-gene correlation across shared samples.

Parameters:
  • nanostring (pd.DataFrame) – Nanostring expression (genes x samples).

  • external (pd.DataFrame) – External expression (genes x samples).

  • shared_samples (list[str], optional) – Samples to include. If None, uses intersection.

  • method (str) – "spearman" or "pearson".

  • min_samples (int) – Minimum samples with variation required.

Returns:

One row per gene with columns: gene, r, pvalue.

Return type:

pd.DataFrame

DE direction concordance between platforms.

ncountr.crossplatform.concordance.de_concordance(de_a, de_b, *, gene_col_a='gene', gene_col_b='gene', lfc_col_a='log2FC', lfc_col_b='log2FC', padj_col_a='padj', padj_col_b='padj', gene_mapping=None, gene_flags=None, flag_col_name='is_flagged')[source]

Compare DE results between two platforms.

Parameters:
  • de_a (pd.DataFrame) – DE result table from the first platform.

  • de_b (pd.DataFrame) – DE result table from the second platform.

  • gene_col_a (str) – Column in de_a holding gene names.

  • gene_col_b (str) – Column in de_b holding gene names.

  • lfc_col_a (str) – Column in de_a holding log2 fold-changes.

  • lfc_col_b (str) – Column in de_b holding log2 fold-changes.

  • padj_col_a (str) – Column in de_a holding adjusted p-values.

  • padj_col_b (str) – Column in de_b holding adjusted p-values.

  • gene_mapping (dict, optional) – Mapping from de_a gene names to de_b gene names.

  • gene_flags (dict[str, bool], optional) – Per-gene boolean flag (e.g., IFN gene membership).

  • flag_col_name (str) – Name of the flag column in output.

Returns:

Per-gene concordance with columns: gene, lfc_a, lfc_b, padj_a, padj_b, same_direction, <flag_col_name>.

Return type:

pd.DataFrame

ncountr.crossplatform.concordance.concordance_summary(conc_df)[source]

Compute concordance statistics.

Parameters:

conc_df (pd.DataFrame) – Output of de_concordance().

Returns:

Keys: overall_rate, n_concordant, n_total, lfc_spearman_r, lfc_spearman_p.

Return type:

dict
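
The same_direction column and the overall_rate key can be illustrated with plain dictionaries. This is a sketch under assumptions: genes missing from the second platform are skipped, a log2FC of zero counts as "not positive", and all gene names and values are made up.

```python
# Made-up log2 fold-changes per gene on two platforms.
lfc_a = {"GENE_A": 1.2, "GENE_B": -0.7, "GENE_C": 0.4}
lfc_b = {"gene_a": 0.9, "gene_b": 0.3, "gene_c": 0.1}

# Optional mapping from platform A names to platform B names.
gene_mapping = {"GENE_A": "gene_a", "GENE_B": "gene_b", "GENE_C": "gene_c"}

rows = []
for gene, a in lfc_a.items():
    mapped = gene_mapping.get(gene, gene)
    if mapped not in lfc_b:
        continue  # gene not measured on platform B
    b = lfc_b[mapped]
    rows.append({"gene": gene, "lfc_a": a, "lfc_b": b,
                 "same_direction": (a > 0) == (b > 0)})

# Summary statistics in the spirit of concordance_summary().
n_total = len(rows)
n_concordant = sum(row["same_direction"] for row in rows)
overall_rate = n_concordant / n_total
```

Here GENE_B changes sign between platforms, so two of the three genes are concordant.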

Cell composition proxy from marker gene expression.

ncountr.crossplatform.composition.marker_composition_proxy(nanostring_counts, cell_proportions, markers, *, samples=None)[source]

Correlate Nanostring marker gene expression with cell proportions.

Parameters:
  • nanostring_counts (pd.DataFrame) – Nanostring expression (genes x samples).

  • cell_proportions (pd.DataFrame) – Cell type proportions (samples x cell types).

  • markers (dict[str, list[str]]) – Cell type name → list of marker gene names.

  • samples (list[str], optional) – Subset of samples. If None, uses intersection.

Returns:

Per-cell-type correlation: cell_type, markers_used, matched_column, spearman_r, spearman_p.

Return type:

pd.DataFrame