SCTransform (Single-Cell Transform) is a normalization method primarily used in scRNA-seq data analysis. It was developed to address limitations in standard normalization approaches when dealing with single-cell data.
You can check how to apply SCTransform on your single-cell data in Seurat’s tutorial for SCTransform.
In this post, we will cover:
- what SCTransform is
- how it differs from standard normalisation and scaling
- how it works
So if you are ready… let’s dive in!
What is SCTransform?
SCTransform is an advanced normalization and transformation method specifically designed for single-cell RNA sequencing data. It is an alternative to traditional methods like log-normalization and scaling. The method is implemented in the Seurat package and uses a statistical model to perform normalization, accounting for both technical noise and biological variation in a more sophisticated way than simpler normalization techniques.
In a nutshell, SCTransform:
- uses a regularized negative binomial regression model to normalize gene expression data
- accounts for both sequencing depth (library size) and gene-specific effects simultaneously
- applies variance stabilization transformation to make the data more suitable for downstream analysis
- automatically removes technical variation while preserving biological variation
Squidtip
SCTransform has become particularly popular in the Seurat package for single-cell analysis, as it often produces more robust results than previous normalization methods, especially when dealing with complex or noisy single-cell datasets.
Why use SCTransform versus standard methods?
Before we dig into the differences between SCTransform and standard methods, let’s have a look at what standard methods are.
Typically, sequencing data is both normalised and scaled.
Standard normalization methods (like log-normalization):
Typically, we first normalize by library size (total UMI counts per cell) and then apply a log transformation to make the counts more “human-friendly” (as otherwise they would be really small numbers).
In scRNAseq, this standard normalisation doesn’t effectively handle the relationship between mean expression and variance. Moreover, it can struggle with zero-inflation (many genes have zero counts in many cells) which is very common in scRNAseq datasets.
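For reference, this is roughly what the standard approach looks like in Seurat: a minimal sketch using NormalizeData with its default "LogNormalize" method and scale factor.

# Standard log-normalization: for each cell, counts are divided by the cell's total
# counts, multiplied by a scale factor (10,000 by default), then log1p-transformed
seurat_object <- NormalizeData(seurat_object,
                               normalization.method = "LogNormalize",
                               scale.factor = 10000)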
Standard scaling methods (like z-score):
Scaling usually means centering the data around the mean and scaling by standard deviation. This way, we make sure that all genes are comparable, which is important for dimensionality reduction, for example. Using z-score scaling means each gene will have a mean expression of 0 across all cells and a standard deviation of 1.
This standard approach treats all features equally regardless of their expression characteristics and doesn’t account for the count-based nature of sequencing data.
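In Seurat this corresponds to ScaleData; here is a minimal sketch, with the underlying z-score written out as a comment:

# Identify variable features, then centre each of them at mean 0 and unit variance
seurat_object <- FindVariableFeatures(seurat_object)
seurat_object <- ScaleData(seurat_object)
# Conceptually, for each gene g and cell c:
# scaled[g, c] = (expr[g, c] - mean(expr[g, ])) / sd(expr[g, ])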
Squidtip
Normalization addresses differences between cells by adjusting for sequencing depth or other technical variations, ensuring that gene expression values are comparable across cells.
Scaling addresses differences between genes by adjusting for varying ranges in expression levels across genes, making sure that no single gene with very high expression dominates the analysis.
Ok, so what does SCTransform do differently?
SCTransform offers several advantages for scRNA-seq data:
- Better handles the mean-variance relationship in count data
- Effectively normalizes across cells with vastly different library sizes
- Improves detection of variable genes
- Reduces the need for explicit regression of technical covariates
- Preserves biological heterogeneity while removing technical noise
- Performs better for downstream tasks like clustering and differential expression
How does SCTransform work? (no maths explanation)
In single-cell RNA-seq data, there's a strong relationship between a gene's average expression and its variance: highly expressed genes naturally show more variability, which can skew downstream analysis.
This is the problem SCTransform tackles.
SCTransform uses a negative binomial regression where:
- Each gene count (Y) is modeled as a negative binomial distribution
- The expected count for a gene in a cell depends on:
  - The cell's sequencing depth
  - The gene's overall expression level
  - Gene-specific technical factors
The Math Steps:
- For each gene, it fits a model: log(expected count) = β₀ + β₁ × log(sequencing depth) + other factors, where β₀ captures the gene's overall expression level (a toy version of this per-gene fit is sketched after this list)
- It estimates parameters like the dispersion parameter (θ), which controls how much variance there is beyond what’s expected from the mean
- It uses a regularization approach where genes with similar expression levels share information to improve estimation
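As a toy illustration of the per-gene fit (not the actual sctransform implementation, which additionally regularizes the parameters across genes of similar abundance), one could fit a negative binomial regression for a single simulated gene against sequencing depth with MASS::glm.nb:

library(MASS)
set.seed(1)
# Simulated toy data: 100 cells with varying sequencing depth, counts for one gene
seq_depth   <- round(runif(100, 1000, 20000))                 # total UMIs per cell
gene_counts <- rnbinom(100, mu = seq_depth * 1e-4, size = 2)  # simulated counts
# Per-gene model: log(expected count) = beta0 + beta1 * log10(sequencing depth)
fit <- glm.nb(gene_counts ~ log10(seq_depth))
coef(fit)   # beta0 (intercept) and beta1 (depth effect)
fit$theta   # estimated dispersion parameter theta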
Squidtip
The SCTransform method applies a regularized negative binomial regression to model the gene expression data. For each gene, we get an estimated “real expression” – it’s a corrected gene expression value that takes into account factors like the cell’s sequencing depth, the gene’s overall expression level and gene-specific technical factors.
After fitting the model, SCTransform calculates Pearson residuals:
Wait, what are Pearson residuals?
Pearson residuals are a way to measure how much each data point deviates from what we’d expect based on our model. The basic formula:
Pearson residual = (Observed value – Expected value) / √(Expected variance)
What does this mean?
- If a gene has exactly the expression level we’d expect based on the model → residual = 0
- If a gene is expressed higher than expected → positive residual
- If a gene is expressed lower than expected → negative residual
So, if we go back to our scRNA-seq example, a residual is defined as:
Residual = (observed count – expected count) ÷ √(expected variance)
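For example, with made-up numbers: under the negative binomial model the expected variance is μ + μ²/θ, so if the model expects 4 counts of a gene in a cell with θ = 2 and we actually observe 10, the residual works out as follows:

observed <- 10
mu       <- 4    # expected count predicted by the model
theta    <- 2    # dispersion parameter
expected_variance <- mu + mu^2 / theta       # NB variance: 4 + 16/2 = 12
(observed - mu) / sqrt(expected_variance)    # ~1.73: higher than expected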
These residuals become our normalised values and they’re useful for several reasons:
- Standardized scale: All genes end up on the same scale regardless of their expression level
- Variance stabilization: Unlike raw counts, these residuals have roughly the same variance across all expression levels. The transformation ensures that genes with different expression levels can be compared fairly – the variance no longer depends on the mean expression level.
- Interpretation: Positive residuals mean higher-than-expected expression; negative means lower-than-expected. A residual of +2 means “this gene is expressed about 2 standard deviations higher than expected”.
So, in practice for single-cell data:
- The model predicts how many counts we expect for each gene in each cell based on technical factors
- The residual tells us whether actual expression is higher or lower than this technical baseline
- This helps separate biological signal (what we care about) from technical noise.
Think of it as “normalizing” the data while accounting for the unique characteristics of gene expression counts rather than just dividing by something like the total counts in each cell.
The Pearson residuals are useful for downstream processing (PCA, neighbor finding, dimensionality reduction, integration… but not differential expression). For differential gene expression analysis, the Seurat team recommends using corrected counts (stored in the counts layer of the SCT assay, as explained below). A vignette with this exact workflow is here.
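As a minimal sketch of that recommended workflow (assuming a recent Seurat version that provides PrepSCTFindMarkers; the cluster labels are purely illustrative):

# Re-correct counts across SCT models before running differential expression
seurat_object <- PrepSCTFindMarkers(seurat_object)
# Differential expression on the SCT assay, which draws on the corrected counts
markers <- FindMarkers(seurat_object, assay = "SCT",
                       ident.1 = "cluster1", ident.2 = "cluster2")  # illustrative idents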
To visualise gene expression, we use the scale.data layer of the SCT assay, except for visualisations of differential expression results such as FeaturePlot or VlnPlot, where it makes more sense to show corrected counts rather than Pearson residuals. See this post for more information!
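For example, a quick sketch of plotting a gene from the SCT assay (the gene name is just a placeholder):

DefaultAssay(seurat_object) <- "SCT"
# FeaturePlot and VlnPlot pull from the data layer (log1p of corrected counts) by default
FeaturePlot(seurat_object, features = "GENE_OF_INTEREST")
VlnPlot(seurat_object, features = "GENE_OF_INTEREST")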
How SCTransform is applied and stored in a Seurat object
You can run SCTransform on a Seurat object using:
seurat_object <- SCTransform(seurat_object)
What gets computed:
- The model fits parameters for each gene
- Pearson residuals are calculated for each gene in each cell
- Variance of each gene is estimated
Where is what?
The Pearson residuals are stored in a new “SCT” assay within the Seurat object. You can access this with seurat_object[["SCT"]].
The raw count data remains untouched in the “RNA” assay. By default, subsequent analyses (PCA, UMAP, clustering) will use the SCT assay.
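For instance, a typical post-SCTransform workflow might look like this (the number of dimensions is just illustrative):

seurat_object <- SCTransform(seurat_object)
# Downstream steps run on the SCT assay by default
seurat_object <- RunPCA(seurat_object)
seurat_object <- FindNeighbors(seurat_object, dims = 1:30)
seurat_object <- FindClusters(seurat_object)
seurat_object <- RunUMAP(seurat_object, dims = 1:30)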
The SCT assay will now have 3 layers:
seurat_object[["SCT"]]@counts
contains the corrected counts that have been adjusted for sequencing depth. Note that Seurat uses pearson residuals for all downstream tasks and not the corrected counts.seurat_object[["SCT"]]@data
containslog1p(corrected counts).
This is the layer that should be used for visualization and differential expression analysis.seurat_object[["SCT"]]@scale.data
contains scaled Pearson residuals (only for variable features), which are centered and scaled further for dimensional reduction.
Additionally:
Variable features selected by SCTransform are found in: seurat_object[["SCT"]]@var.features
Model parameters are stored within the SCT assay itself, e.g. seurat_object[["SCT"]]@SCTModel.list in recent Seurat versions.
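A minimal sketch of how one might access these pieces (using the slot-style accessors that match the v4-style notation above):

# Pull out each layer of the SCT assay
corrected_counts <- GetAssayData(seurat_object, assay = "SCT", slot = "counts")
log_counts       <- GetAssayData(seurat_object, assay = "SCT", slot = "data")
residuals        <- GetAssayData(seurat_object, assay = "SCT", slot = "scale.data")
# Variable features selected by SCTransform
head(VariableFeatures(seurat_object, assay = "SCT"))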
And that is the end of this tutorial!
In this post, I explained what SCTransform is and how it works, with no math. Hope you found it useful!
Squidtastic!
You made it till the end! Hope you found this post useful.
If you have any questions, or if there are any more topics you would like to see here, leave me a comment down below.
Otherwise, have a very nice day and… see you in the next one!