API Reference
Top-level functions
ncountr — Python pipeline for Nanostring nCounter data analysis.
- class ncountr.NanostringExperiment(raw_counts, pos_counts, neg_counts, hk_counts, sample_meta=<factory>, lane_info=<factory>, normalized=None, qc_results=None, de_results=None)[source]
Container for a parsed Nanostring nCounter experiment.
- Parameters:
- raw_counts
Endogenous gene counts, genes (rows) x samples (columns).
- Type:
pd.DataFrame
- pos_counts
Positive control counts, controls (rows) x samples (columns).
- Type:
pd.DataFrame
- neg_counts
Negative control counts, controls (rows) x samples (columns).
- Type:
pd.DataFrame
- hk_counts
Housekeeping gene counts, genes (rows) x samples (columns).
- Type:
pd.DataFrame
- sample_meta
Per-sample metadata (index = sample ID).
- Type:
pd.DataFrame
- lane_info
Per-sample lane attributes (FovCount, FovCounted, BindingDensity, etc.).
- Type:
pd.DataFrame
- normalized
Normalized count matrix (set after calling normalize).
- Type:
pd.DataFrame | None
- qc_results
QC results per sample (set after calling qc).
- Type:
pd.DataFrame | None
- de_results
DE results (set after calling de).
- Type:
pd.DataFrame | None
- ncountr.read_rcc(rcc_dirs, *, file_pattern='*.RCC', sample_id_pattern='(\\d+)', sample_id_field='ID', sample_id_from='field', sample_meta=None)[source]
Read RCC files from one or more directories into a NanostringExperiment.
- Parameters:
rcc_dirs (str, Path, or list thereof) – Directory or directories containing .RCC files.
file_pattern (str) – Glob pattern to match RCC files within each directory.
sample_id_pattern (str) – Regex applied to extract a clean sample ID. The first capture group is used.
sample_id_field (str) – Which field in the <Sample_Attributes> section holds the sample ID. Only used when sample_id_from="field".
sample_id_from (str) – Where to extract the sample ID from. "field" (default) uses the sample_id_field from the RCC file. "filename" applies the regex to the filename instead, which is useful when internal IDs are inconsistent across files.
sample_meta (dict[str, dict] | None) – Optional per-sample metadata, keyed by sample ID.
- Return type:
NanostringExperiment
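The "first capture group" rule for sample_id_pattern can be illustrated with the standard re module. This is a minimal sketch, not ncountr's own code: the helper name extract_sample_id, the use of re.search (rather than re.match), and the file name are assumptions made for illustration.

```python
import re


def extract_sample_id(text, pattern=r"(\d+)"):
    """Return the first capture group of `pattern` in `text`, or None if nothing matches.

    Hypothetical helper: ncountr may apply the regex differently internally.
    """
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Hypothetical RCC file name, used only for illustration.
filename = "20230114_run7_Sample042_03.RCC"

# The default pattern grabs the first run of digits, which here is the date.
default_id = extract_sample_id(filename)

# A more specific pattern anchors the capture group to the sample number.
sample_id = extract_sample_id(filename, pattern=r"Sample(\d+)")
```

With file names like this one, a pattern anchored on a literal prefix is usually safer than the bare digit pattern, because the first digit run is not necessarily the sample number.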
- ncountr.qc(experiment, *, fov_ratio_threshold=0.75, pos_r2_threshold=0.95, neg_sd=2.0)[source]
Run QC checks and store results on the experiment.
- Parameters:
experiment (NanostringExperiment)
fov_ratio_threshold (float) – Minimum acceptable FOV ratio.
pos_r2_threshold (float) – Minimum R-squared for positive control linearity.
neg_sd (float) – Number of standard deviations above mean for negative background.
- Returns:
QC results indexed by sample.
- Return type:
pd.DataFrame
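To make the thresholds concrete, the sketch below computes a FOV ratio and a negative-background level with pandas. It assumes the FOV ratio is FovCounted / FovCount and that the background is the per-sample mean plus neg_sd sample standard deviations of the negative controls; the data, the column names of the result, and these formulas are illustrative assumptions, not a description of ncountr's internals.

```python
import pandas as pd

# Hypothetical lane attributes for three samples (index = sample ID).
lane_info = pd.DataFrame(
    {"FovCount": [280, 280, 280], "FovCounted": [275, 196, 280]},
    index=["S1", "S2", "S3"],
)

# Hypothetical negative control counts, controls (rows) x samples (columns).
neg_counts = pd.DataFrame(
    {"S1": [4, 6, 8], "S2": [10, 10, 10], "S3": [2, 5, 11]},
    index=["NEG_A", "NEG_B", "NEG_C"],
)

fov_ratio_threshold = 0.75
neg_sd = 2.0

# Assumed definition: fraction of requested fields of view that were counted.
fov_ratio = lane_info["FovCounted"] / lane_info["FovCount"]
fov_pass = fov_ratio >= fov_ratio_threshold

# Assumed definition: mean + neg_sd * SD of the negative controls per sample.
neg_background = neg_counts.mean(axis=0) + neg_sd * neg_counts.std(axis=0)

qc_sketch = pd.DataFrame(
    {"fov_ratio": fov_ratio, "fov_pass": fov_pass, "neg_background": neg_background}
)
```

In this toy data, S2 falls below the 0.75 FOV ratio threshold while S1 and S3 pass.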
- ncountr.normalize(experiment, *, method='pos_hk', neg_bg=None)[source]
Normalize raw counts and store the result on the experiment.
- Parameters:
experiment (NanostringExperiment)
method (str) – Normalization method:
"pos_only": positive control normalization only.
"pos_hk": positive control + housekeeping normalization.
"pos_hk_bg": positive control + housekeeping + background subtraction.
neg_bg (pd.Series or dict, optional) – Per-sample negative background values. Needed for "pos_hk_bg"; if not provided, it is computed from experiment.neg_counts.
- Returns:
Normalized count matrix (genes x samples).
- Return type:
pd.DataFrame
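Positive control normalization is commonly done by scaling each sample so that the geometric mean of its positive controls matches the average across samples. The sketch below shows that approach under the assumption that ncountr follows it; the function name geomean_scaling_factors and the data are hypothetical, and all counts are assumed to be strictly positive.

```python
import numpy as np
import pandas as pd

# Hypothetical positive control counts, controls (rows) x samples (columns).
pos_counts = pd.DataFrame(
    {"S1": [1000.0, 250.0, 60.0], "S2": [2000.0, 500.0, 120.0]},
    index=["POS_A", "POS_B", "POS_C"],
)

# Hypothetical endogenous counts, genes (rows) x samples (columns).
raw_counts = pd.DataFrame(
    {"S1": [100.0, 10.0], "S2": [200.0, 20.0]},
    index=["GENE1", "GENE2"],
)


def geomean_scaling_factors(control_counts):
    """Per-sample factor = mean of all geometric means / this sample's geometric mean."""
    # Geometric mean per sample, computed in log space (requires counts > 0).
    geo = np.exp(np.log(control_counts).mean(axis=0))
    return geo.mean() / geo


pos_factors = geomean_scaling_factors(pos_counts)

# Multiplying by a Series indexed by sample ID scales each column by its factor.
pos_normalized = raw_counts * pos_factors
```

Here S2 was counted at twice the depth of S1, so the factors (1.5 and 0.75) bring both samples to the same scale. The same function applied to hk_counts would give housekeeping factors in a "pos_hk"-style scheme.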
- ncountr.de(experiment, *, group_a, group_b, counts=None, test='mannwhitneyu', correction='fdr_bh', store=True)[source]
Run differential expression between two sample groups.
- Parameters:
experiment (NanostringExperiment)
group_a (list[str]) – Sample IDs for the first group (group A).
group_b (list[str]) – Sample IDs for the second group (group B). Log2FC is computed as log2(mean_a + 1) - log2(mean_b + 1), i.e. positive values mean higher in group A.
counts (pd.DataFrame, optional) – Count matrix to use. Defaults to experiment.normalized if available, otherwise experiment.raw_counts.
test (str) – Statistical test: "mannwhitneyu" or "ttest".
correction (str) – Multiple testing correction method (passed to statsmodels.stats.multitest.multipletests).
store (bool) – If True, store results on experiment.de_results.
- Returns:
Columns: gene, log2FC, mean_a, mean_b, pvalue, padj.
- Return type:
pd.DataFrame
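The Log2FC formula stated above, log2(mean_a + 1) - log2(mean_b + 1), can be reproduced directly with pandas and NumPy. This sketch covers only the fold-change column, not the statistical test or the multiple testing correction; the count matrix and sample IDs are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical count matrix, genes (rows) x samples (columns).
counts = pd.DataFrame(
    {"A1": [7.0, 30.0], "A2": [7.0, 32.0], "B1": [1.0, 31.0], "B2": [1.0, 31.0]},
    index=["GENE1", "GENE2"],
)
group_a = ["A1", "A2"]
group_b = ["B1", "B2"]

# Per-gene mean within each group.
mean_a = counts[group_a].mean(axis=1)
mean_b = counts[group_b].mean(axis=1)

# The +1 pseudocount keeps the log defined when a group mean is zero.
log2fc = np.log2(mean_a + 1) - np.log2(mean_b + 1)
```

GENE1 has group means of 7 and 1, giving log2(8) - log2(2) = 2, so it is higher in group A; GENE2 has equal means and a fold change of 0.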
- ncountr.score_gene_set(experiment, *, gene_set, counts=None, samples=None, method='zscore_mean')[source]
Score samples for a gene set.
- Parameters:
experiment (NanostringExperiment)
gene_set (str or list[str]) – A built-in gene set name (e.g. "IFN_JAKSTAT") or an explicit list of gene names.
counts (pd.DataFrame, optional) – Count matrix (genes x samples). Defaults to normalized or raw counts.
samples (list[str], optional) – Subset of samples to score. Defaults to all.
method (str) – Scoring method. Currently "zscore_mean" (z-score each gene across samples, then take the mean z-score per sample).
- Returns:
Score per sample.
- Return type:
pd.Series
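The "zscore_mean" description (z-score each gene across samples, then average per sample) can be sketched in a few lines of pandas. Two details are assumptions here rather than documented behavior: genes absent from the matrix are silently dropped, and the sample standard deviation (ddof=1, the pandas default) is used.

```python
import pandas as pd

# Hypothetical count matrix, genes (rows) x samples (columns).
counts = pd.DataFrame(
    {"S1": [1.0, 10.0, 5.0], "S2": [2.0, 20.0, 5.0], "S3": [3.0, 30.0, 9.0]},
    index=["GENE1", "GENE2", "GENE3"],
)

# Hypothetical gene set; "MISSING" is not on the panel.
gene_set = ["GENE1", "GENE2", "MISSING"]

# Assumption: genes not present in the matrix are ignored.
present = [g for g in gene_set if g in counts.index]
sub = counts.loc[present]

# Z-score each gene (row) across samples.
z = sub.sub(sub.mean(axis=1), axis=0).div(sub.std(axis=1), axis=0)

# Mean z-score per sample (column) is the gene set score.
scores = z.mean(axis=0)
```

Because each gene is standardized before averaging, a highly expressed gene such as GENE2 does not dominate the score.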
- ncountr.get_gene_set(name)[source]
Return a built-in gene set by name.
- Parameters:
name (str) – Case-insensitive gene set name. Use list_gene_sets() to see available names.
- Raises:
KeyError – If the name is not found.
- Return type:
- ncountr.to_anndata(experiment)[source]
Convert a NanostringExperiment to an AnnData object.
Uses normalized counts as the main matrix (X) when available, otherwise falls back to raw counts. Raw counts are always stored in adata.layers["raw"].
- Parameters:
experiment (NanostringExperiment) – A parsed (and optionally normalized) experiment.
- Returns:
Samples in .obs, genes in .var.
- Return type:
AnnData
- Raises:
ImportError – If anndata is not installed.
Data container
- class ncountr.experiment.NanostringExperiment(raw_counts, pos_counts, neg_counts, hk_counts, sample_meta=<factory>, lane_info=<factory>, normalized=None, qc_results=None, de_results=None)[source]
Container for a parsed Nanostring nCounter experiment.
- Parameters:
- raw_counts
Endogenous gene counts, genes (rows) x samples (columns).
- Type:
pd.DataFrame
- pos_counts
Positive control counts, controls (rows) x samples (columns).
- Type:
pd.DataFrame
- neg_counts
Negative control counts, controls (rows) x samples (columns).
- Type:
pd.DataFrame
- hk_counts
Housekeeping gene counts, genes (rows) x samples (columns).
- Type:
pd.DataFrame
- sample_meta
Per-sample metadata (index = sample ID).
- Type:
pd.DataFrame
- lane_info
Per-sample lane attributes (FovCount, FovCounted, BindingDensity, etc.).
- Type:
pd.DataFrame
- normalized
Normalized count matrix (set after calling normalize).
- Type:
pd.DataFrame | None
- qc_results
QC results per sample (set after calling qc).
- Type:
pd.DataFrame | None
- de_results
DE results (set after calling de).
- Type:
pd.DataFrame | None
I/O
Parse Nanostring RCC files into a NanostringExperiment.
- ncountr.io.rcc.read_rcc(rcc_dirs, *, file_pattern='*.RCC', sample_id_pattern='(\\d+)', sample_id_field='ID', sample_id_from='field', sample_meta=None)[source]
Read RCC files from one or more directories into a NanostringExperiment.
- Parameters:
rcc_dirs (str, Path, or list thereof) – Directory or directories containing .RCC files.
file_pattern (str) – Glob pattern to match RCC files within each directory.
sample_id_pattern (str) – Regex applied to extract a clean sample ID. The first capture group is used.
sample_id_field (str) – Which field in the <Sample_Attributes> section holds the sample ID. Only used when sample_id_from="field".
sample_id_from (str) – Where to extract the sample ID from. "field" (default) uses the sample_id_field from the RCC file. "filename" applies the regex to the filename instead, which is useful when internal IDs are inconsistent across files.
sample_meta (dict[str, dict] | None) – Optional per-sample metadata, keyed by sample ID.
- Return type:
NanostringExperiment
Export utilities for writing results to disk.
- ncountr.io.export.to_anndata(experiment)[source]
Convert a NanostringExperiment to an AnnData object.
Uses normalized counts as the main matrix (X) when available, otherwise falls back to raw counts. Raw counts are always stored in adata.layers["raw"].
- Parameters:
experiment (NanostringExperiment) – A parsed (and optionally normalized) experiment.
- Returns:
Samples in .obs, genes in .var.
- Return type:
AnnData
- Raises:
ImportError – If anndata is not installed.
- ncountr.io.export.export_counts(experiment, output_dir, *, prefix='nanostring')[source]
Write raw and normalized count matrices to CSV.
Returns a dict mapping description to output path.
- ncountr.io.export.export_qc(experiment, output_dir, *, prefix='nanostring')[source]
Write QC results to CSV.
- ncountr.io.export.export_de(experiment, output_dir, *, prefix='nanostring')[source]
Write DE results to CSV.
Download nCounter RCC files from NCBI GEO.
- ncountr.io.geo.fetch_geo(accession, output_dir='.', *, quiet=False)[source]
Download and extract RCC files from a GEO accession.
Looks for the GSE*_RAW.tar supplement file, downloads it, and extracts any .RCC (or .RCC.gz) files into output_dir.
- Parameters:
- Returns:
Path to the directory containing extracted RCC files.
- Return type:
Path
Core analysis
Quality control checks for Nanostring nCounter data.
- ncountr.core.qc.qc(experiment, *, fov_ratio_threshold=0.75, pos_r2_threshold=0.95, neg_sd=2.0)[source]
Run QC checks and store results on the experiment.
- Parameters:
experiment (NanostringExperiment)
fov_ratio_threshold (float) – Minimum acceptable FOV ratio.
pos_r2_threshold (float) – Minimum R-squared for positive control linearity.
neg_sd (float) – Number of standard deviations above mean for negative background.
- Returns:
QC results indexed by sample.
- Return type:
pd.DataFrame
Normalization methods for Nanostring nCounter data.
- ncountr.core.normalize.normalize(experiment, *, method='pos_hk', neg_bg=None)[source]
Normalize raw counts and store the result on the experiment.
- Parameters:
experiment (NanostringExperiment)
method (str) – Normalization method:
"pos_only": positive control normalization only.
"pos_hk": positive control + housekeeping normalization.
"pos_hk_bg": positive control + housekeeping + background subtraction.
neg_bg (pd.Series or dict, optional) – Per-sample negative background values. Needed for "pos_hk_bg"; if not provided, it is computed from experiment.neg_counts.
- Returns:
Normalized count matrix (genes x samples).
- Return type:
pd.DataFrame
- ncountr.core.normalize.get_scaling_factors(experiment)[source]
Compute and return scaling factors without modifying the experiment.
- Returns:
{"pos": {sid: factor}, "hk": {sid: factor}}.
- Return type:
dict
- Parameters:
experiment (NanostringExperiment)
Differential expression analysis for Nanostring nCounter data.
- ncountr.core.de.de(experiment, *, group_a, group_b, counts=None, test='mannwhitneyu', correction='fdr_bh', store=True)[source]
Run differential expression between two sample groups.
- Parameters:
experiment (NanostringExperiment)
group_a (list[str]) – Sample IDs for the first group (group A).
group_b (list[str]) – Sample IDs for the second group (group B). Log2FC is computed as log2(mean_a + 1) - log2(mean_b + 1), i.e. positive values mean higher in group A.
counts (pd.DataFrame, optional) – Count matrix to use. Defaults to experiment.normalized if available, otherwise experiment.raw_counts.
test (str) – Statistical test: "mannwhitneyu" or "ttest".
correction (str) – Multiple testing correction method (passed to statsmodels.stats.multitest.multipletests).
store (bool) – If True, store results on experiment.de_results.
- Returns:
Columns: gene, log2FC, mean_a, mean_b, pvalue, padj.
- Return type:
pd.DataFrame
Gene set / pathway scoring.
- ncountr.core.pathway.score_gene_set(experiment, *, gene_set, counts=None, samples=None, method='zscore_mean')[source]
Score samples for a gene set.
- Parameters:
experiment (NanostringExperiment)
gene_set (str or list[str]) – A built-in gene set name (e.g. "IFN_JAKSTAT") or an explicit list of gene names.
counts (pd.DataFrame, optional) – Count matrix (genes x samples). Defaults to normalized or raw counts.
samples (list[str], optional) – Subset of samples to score. Defaults to all.
method (str) – Scoring method. Currently "zscore_mean" (z-score each gene across samples, then take the mean z-score per sample).
- Returns:
Score per sample.
- Return type:
pd.Series
Datasets
Built-in gene sets and cell type markers.
- ncountr.datasets.get_gene_set(name)[source]
Return a built-in gene set by name.
- Parameters:
name (str) – Case-insensitive gene set name. Use list_gene_sets() to see available names.
- Raises:
KeyError – If the name is not found.
- Return type:
Plotting
QC summary plots.
- ncountr.plotting.qc_plots.plot_qc(experiment, *, output=None, fov_threshold=0.75)[source]
Generate a 4-panel QC summary figure.
Panels: A) FOV ratio, B) Positive control linearity, C) Negative background, D) Housekeeping gene totals (raw + pos-normalized).
- Parameters:
experiment (NanostringExperiment)
output (str or Path, optional) – Save figure to this path.
fov_threshold (float) – FOV ratio threshold line.
- Return type:
matplotlib.figure.Figure
Differential expression plots.
- ncountr.plotting.de_plots.plot_volcano(de_results, *, highlight_genes=None, highlight_label='Highlighted', highlight_color='gold', padj_threshold=0.05, log2fc_threshold=0.0, label_top_n=15, output=None, title=None)[source]
Generate a volcano plot from DE results.
Significant genes (padj < threshold) are colored red (up) or blue (down). An optional set of genes of interest can be highlighted with colored markers on top, useful for visualizing pathway genes (e.g. IFN/JAK-STAT) or custom gene lists.
- Parameters:
de_results (pd.DataFrame) – Must contain columns: gene, log2FC, pvalue, padj.
highlight_genes (list[str], optional) – Genes to highlight with colored markers. Can be any gene list of interest (pathway genes, custom markers, etc.).
highlight_label (str) – Legend label for highlighted genes.
highlight_color (str) – Color for highlighted gene markers.
padj_threshold (float) – Significance threshold for coloring.
log2fc_threshold (float) – Minimum absolute log2FC to color significant genes (default 0).
label_top_n (int) – Number of top genes to label by p-value.
output (str or Path, optional) – Save figure to this path.
title (str, optional) – Figure title.
- Return type:
matplotlib.figure.Figure
Pathway / gene set scoring plots.
- ncountr.plotting.pathway_plots.plot_pathway_scores(scores, groups, *, group_colors=None, output=None, title='Pathway Score', ylabel='Pathway score (z-scored)')[source]
Box + strip plot of pathway scores by group.
- Parameters:
- Return type:
matplotlib.figure.Figure
Heatmap plotting utilities.
- ncountr.plotting.heatmaps.plot_heatmap(data, *, zscore=True, vmin=-2, vmax=2, cmap='RdBu_r', title='', xlabel_rotation=45, ylabel_fontsize=8, output=None, figsize=None)[source]
Plot a heatmap of expression data (genes x samples).
- Parameters:
- Return type:
matplotlib.figure.Figure
Cross-platform validation
Cross-platform correlation analysis.
- ncountr.crossplatform.correlation.per_sample_correlation(nanostring, external, *, shared_genes=None, method='spearman')[source]
Compute per-sample correlation between two expression matrices.
- Parameters:
- Returns:
One row per sample with columns: sample, r, pvalue.
- Return type:
pd.DataFrame
- ncountr.crossplatform.correlation.per_gene_correlation(nanostring, external, *, shared_samples=None, method='spearman', min_samples=4)[source]
Compute per-gene correlation across shared samples.
- Parameters:
nanostring (pd.DataFrame) – Nanostring expression matrix (genes x samples).
external (pd.DataFrame) – Expression matrix from the other platform (genes x samples).
shared_samples (list[str], optional) – Samples to include. If None, uses intersection.
method (str) – "spearman" or "pearson".
min_samples (int) – Minimum samples with variation required.
- Returns:
One row per gene with columns: gene, r, pvalue.
- Return type:
pd.DataFrame
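A per-gene Spearman correlation over the intersection of samples can be sketched with pandas alone, using the fact that Spearman's r equals the Pearson correlation of the ranks. The matrices and the helper spearman_r are hypothetical, p-values are omitted (scipy.stats.spearmanr would provide them), and the min_samples filter is not reproduced.

```python
import pandas as pd

# Hypothetical expression matrices, genes (rows) x samples (columns).
nanostring = pd.DataFrame(
    {"S1": [1.0, 5.0], "S2": [2.0, 4.0], "S3": [3.0, 2.0], "S4": [4.0, 1.0]},
    index=["GENE1", "GENE2"],
)
external = pd.DataFrame(
    {"S1": [10.0, 1.0], "S2": [20.0, 2.0], "S3": [40.0, 3.0],
     "S4": [80.0, 4.0], "S5": [0.0, 0.0]},
    index=["GENE1", "GENE2"],
)

# Restrict to samples and genes present on both platforms.
shared_samples = [s for s in nanostring.columns if s in external.columns]
shared_genes = [g for g in nanostring.index if g in external.index]


def spearman_r(x, y):
    """Spearman correlation as the Pearson correlation of (average) ranks."""
    return x.rank().corr(y.rank())


per_gene_r = pd.Series(
    {
        g: spearman_r(nanostring.loc[g, shared_samples], external.loc[g, shared_samples])
        for g in shared_genes
    }
)
```

GENE1 increases monotonically on both platforms (r = 1) even though the scales differ, while GENE2 moves in opposite directions (r = -1); S5 is ignored because it exists only in the external matrix.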
DE direction concordance between platforms.
- ncountr.crossplatform.concordance.de_concordance(de_a, de_b, *, gene_col_a='gene', gene_col_b='gene', lfc_col_a='log2FC', lfc_col_b='log2FC', padj_col_a='padj', padj_col_b='padj', gene_mapping=None, gene_flags=None, flag_col_name='is_flagged')[source]
Compare DE results between two platforms.
- Parameters:
de_a (pd.DataFrame) – DE result tables from each platform.
de_b (pd.DataFrame) – DE result tables from each platform.
gene_col_a (str) – Column holding gene names.
gene_col_b (str) – Column holding gene names.
lfc_col_a (str) – Column holding log2 fold-changes.
lfc_col_b (str) – Column holding log2 fold-changes.
padj_col_a (str) – Column holding adjusted p-values.
padj_col_b (str) – Column holding adjusted p-values.
gene_mapping (dict, optional) – Mapping from de_a gene names to de_b gene names.
gene_flags (dict[str, bool], optional) – Per-gene boolean flag (e.g., IFN gene membership).
flag_col_name (str) – Name of the flag column in output.
- Returns:
Per-gene concordance with columns: gene, lfc_a, lfc_b, padj_a, padj_b, same_direction, <flag_col_name>.
- Return type:
pd.DataFrame
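Direction concordance can be illustrated by joining two DE tables on gene and comparing the signs of their log2 fold-changes. This is a sketch under assumptions: an inner join on the gene column, sign equality as the definition of same_direction, and made-up DE tables. It also derives the overall_rate, n_concordant and n_total quantities listed for concordance_summary, assuming overall_rate is simply n_concordant / n_total.

```python
import numpy as np
import pandas as pd

# Hypothetical DE tables from two platforms.
de_a = pd.DataFrame(
    {"gene": ["G1", "G2", "G3"], "log2FC": [1.2, -0.5, 0.8], "padj": [0.01, 0.20, 0.03]}
)
de_b = pd.DataFrame(
    {"gene": ["G1", "G2", "G4"], "log2FC": [0.7, 0.4, -1.0], "padj": [0.02, 0.50, 0.01]}
)

# Inner join keeps only genes measured on both platforms (G1 and G2 here).
merged = de_a.merge(de_b, on="gene", suffixes=("_a", "_b"))
merged = merged.rename(columns={"log2FC_a": "lfc_a", "log2FC_b": "lfc_b"})

# Assumed definition: concordant when both fold-changes have the same sign.
merged["same_direction"] = np.sign(merged["lfc_a"]) == np.sign(merged["lfc_b"])

n_total = len(merged)
n_concordant = int(merged["same_direction"].sum())
overall_rate = n_concordant / n_total
```

In this example G1 is up on both platforms and G2 flips sign, so one of the two shared genes is concordant.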
- ncountr.crossplatform.concordance.concordance_summary(conc_df)[source]
Compute concordance statistics.
- Parameters:
conc_df (pd.DataFrame) – Output of de_concordance().
- Returns:
Keys: overall_rate, n_concordant, n_total, lfc_spearman_r, lfc_spearman_p.
- Return type:
dict
Cell composition proxy from marker gene expression.
- ncountr.crossplatform.composition.marker_composition_proxy(nanostring_counts, cell_proportions, markers, *, samples=None)[source]
Correlate Nanostring marker gene expression with cell proportions.
- Parameters:
nanostring_counts (pd.DataFrame) – Nanostring expression (genes x samples).
cell_proportions (pd.DataFrame) – Cell type proportions (samples x cell types).
markers (dict[str, list[str]]) – Cell type name → list of marker gene names.
samples (list[str], optional) – Subset of samples. If None, uses intersection.
- Returns:
Per-cell-type correlation: cell_type, markers_used, matched_column, spearman_r, spearman_p.
- Return type:
pd.DataFrame