Setting thresholds for differential gene expression (DGE) analysis is crucial and depends on several factors. In essence, for a list of genes, we are trying to define what counts as biologically meaningful versus just statistically significant.

The question is…

How big (or small) does the fold change need to be for a gene to be upregulated or downregulated?

And how small does the p-value need to be for the change to be statistically significant?

In this blogpost, we’ll cover how to choose log2FC and p-value thresholds for DGE analysis.

Let’s dive in!

Core threshold parameters: log2FC and p-value

When analysing differential gene expression results, there are two main parameters we need to consider when calling differentially expressed genes (DEGs):

  • LogFC (log fold change) measures effect size, i.e. how much gene expression changes between groups. A high logFC means a big difference, but not necessarily a reliable one.
  • P-value measures statistical significance: how likely it is to see an effect at least this large from random variation (noise) alone if there were no real difference. It accounts for variability within groups and sample size.

So, back to our questions:

How big (or small) does the fold change need to be for a gene to be upregulated or downregulated?

And how small does the p-value need to be for the change to be statistically significant?

Squidtip

So, a gene can have:

  • High logFC but a high p-value → Big difference, but lots of noise or small sample size.

  • Low logFC but a low p-value → Small difference, but very consistent and statistically reliable.

  • High logFC and low p-value → Jackpot! Likely a strong and reliable biological signal.

Check out this other blogpost if you’d like to know more about p-values and log2FC in DGE analysis.
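To make the combinations above concrete, here is a minimal Python sketch that sorts genes into these categories. The gene names, values, and the cutoffs (|log2FC| ≥ 1, p < 0.05) are made up for illustration; they are not from a real dataset, and the function name is hypothetical.

```python
def classify_gene(log2fc, pvalue, lfc_cutoff=1.0, p_cutoff=0.05):
    """Place one gene into an effect-size / significance category."""
    big_change = abs(log2fc) >= lfc_cutoff
    significant = pvalue < p_cutoff
    if big_change and significant:
        return "large and significant"
    if big_change:
        return "large but not significant"
    if significant:
        return "small but significant"
    return "small and not significant"

# Hypothetical results: gene -> (log2FC, p-value)
results = {
    "geneA": (3.2, 0.40),     # big difference, but noisy
    "geneB": (0.3, 0.001),    # small difference, very consistent
    "geneC": (-2.5, 0.0004),  # big and consistent
    "geneD": (0.1, 0.80),     # nothing going on
}

labels = {gene: classify_gene(lfc, p) for gene, (lfc, p) in results.items()}
for gene, label in labels.items():
    print(gene, "->", label)
```

Note that the sign of the logFC is ignored here on purpose: a strongly downregulated gene counts as a big change too.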

How to choose log2FC threshold?

The fold change, usually reported in log base 2, measures the difference in expression between conditions or groups. A log2FC of 1 means a 2× increase, −1 means a 2× decrease (expression is halved), 2 means a 4× increase, and so on.

log2FC thresholds vary across studies because there’s no single correct value. Like many things, it depends on context (I know, it’s literally the worst answer, but what can we do).

Common values are:

  • |log₂FC| ≥ 1 (2-fold change) – most common, good balance
  • |log₂FC| ≥ 0.5 (1.4-fold change) – more sensitive, catches subtle changes
  • |log₂FC| ≥ 2 (4-fold change) – very stringent, only large effects
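If you want to double-check what a log2FC cutoff means in plain fold-change terms, the conversion is just a power of 2. A small Python sketch (the function names are my own, not from any package):

```python
import math

def log2fc_to_fold_change(log2fc):
    """Convert a log2 fold change back to a plain expression ratio."""
    return 2 ** log2fc

def fold_change_to_log2fc(fold_change):
    """Convert a plain expression ratio to a log2 fold change."""
    return math.log2(fold_change)

# The common cutoffs listed above, expressed as fold changes
for cutoff in (0.5, 1, 2):
    print(f"|log2FC| >= {cutoff} -> at least {log2fc_to_fold_change(cutoff):.2f}-fold")

# A negative log2FC is a ratio below 1: -1 gives 0.5, i.e. expression is halved
halved = log2fc_to_fold_change(-1)
print(halved)
```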

So how can you decide what is best for your data?

Here are a few guidelines that might help decide on a log2FC threshold:

  • Biological context: An important factor is what you’re studying, and how big a change needs to be for it to be interesting to you. Disease studies might use ≥1, while drug studies might need ≥2. Sometimes a small fold change is still meaningful: for example, small changes in expression of transcription factors or signaling genes can have big downstream effects!
  • Sample size: Smaller studies need higher thresholds to avoid false positives.
  • Noise level: Noisy data often needs higher thresholds, as small log2FC may not be reliable. In clean, high-depth data (like bulk RNA-seq), even small FCs can be real. In sparse data (like single-cell), higher log2FC cutoffs can reduce false positives.

How to choose a p-value threshold?

Observing a big change in gene expression is a good start, but is it statistically significant? The p-value helps us answer that question. The smaller the p-value, the less likely it is that the change we’re observing is due to chance alone. You can read more about p-values, multiple testing, and confidence intervals in this blogpost.

As you might already know, we usually take p-value thresholds of 0.01 or 0.05. 

Actually, when we do DGE analysis, it’s best practice to correct for multiple testing and use the p-adjusted value (or FDR) instead, since we’re testing thousands of genes simultaneously.

Standard FDR thresholds are:

  • FDR < 0.05 (5% false discovery rate) – most common
  • FDR < 0.01 (1% false discovery rate) – more stringent

Squidtip

The FDR controls the expected proportion of false discoveries among your “significant” genes. With thousands of genes tested at once, the raw p-value would give way too many false positives. Read more about multiple testing correction here.
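To see what the correction does in practice, here is a minimal sketch of the Benjamini-Hochberg procedure in plain Python. This is an assumption on my side: it is one common way of computing adjusted p-values, but your DGE tool may use a different method or add filtering steps on top. The p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five genes
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = benjamini_hochberg(pvals)

n_raw = sum(p < 0.05 for p in pvals)  # hits using raw p-values
n_fdr = sum(p < 0.05 for p in padj)   # hits using adjusted p-values
print(padj)
print(n_raw, "genes pass p < 0.05;", n_fdr, "pass FDR < 0.05")
```

In this toy example, fewer genes pass the FDR cutoff than the raw p-value cutoff, and the gap gets much bigger when you test thousands of genes.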

So, how do I choose log2FC and p-value thresholds for my dataset?

These would be my general tips on how to decide on a threshold for your dataset.

Step 1: Consider Your Study Design

  • Sample size: More samples = can use lower thresholds
  • Biological variability: High variability = need higher thresholds
  • Expected effect sizes: Known large effects = can be more stringent

Step 2: Look at Your Data Distribution

A great way of visualising differential gene expression results is with a volcano plot, which shows log₂FC on the x-axis against -log₁₀(p-value) on the y-axis. Visualising the DGE results this way can help you find where natural breaks occur.

Check out my tutorial on volcano plots for differential gene expression analysis here!
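If you’d like to see what goes into a volcano plot, here is a rough Python sketch that computes the coordinates and a colour label per gene. It only prepares the numbers; you can pass them to whatever plotting library you like. The data, cutoffs, and function name are illustrative assumptions, and the sketch assumes every p-value is greater than 0 (the log of 0 is undefined).

```python
import math

def volcano_coordinates(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return gene -> (x, y, label) with x = log2FC and y = -log10(p)."""
    points = {}
    for gene, (log2fc, p) in results.items():
        y = -math.log10(p)  # small p-values end up high on the plot
        if p < p_cutoff and log2fc >= lfc_cutoff:
            label = "up"
        elif p < p_cutoff and log2fc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"  # not significant, or change too small
        points[gene] = (log2fc, y, label)
    return points

# Hypothetical results: gene -> (log2FC, p-value)
results = {
    "geneA": (2.0, 0.001),
    "geneB": (-1.5, 0.01),
    "geneC": (0.2, 0.5),
    "geneD": (1.8, 0.3),
}

points = volcano_coordinates(results)
for gene, (x, y, label) in points.items():
    print(f"{gene}: x={x:.2f}, y={y:.2f}, {label}")
```

The two cutoffs become a pair of vertical lines (at ±log2FC cutoff) and one horizontal line (at -log10 of the p-value cutoff) on the plot.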

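An MA plot can also be useful: it shows the log₂FC (M, on the y-axis) against the average expression (A, on the x-axis), so you can see whether the large fold changes come mostly from lowly expressed genes.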
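For reference, the two MA plot axes can be computed like this. It is a simplified sketch working from one mean expression value per group; the pseudocount of 1 (to avoid taking the log of 0) is my own assumption, and real tools may define the x-axis slightly differently.

```python
import math

def ma_values(mean_a, mean_b, pseudocount=1.0):
    """Return (M, A): M is the log2 ratio of B over A, A is the average log2 expression."""
    log_a = math.log2(mean_a + pseudocount)
    log_b = math.log2(mean_b + pseudocount)
    m = log_b - log_a        # y-axis: log2 fold change
    a = (log_a + log_b) / 2  # x-axis: average log2 expression
    return m, a

# Hypothetical mean expression in control (3) and treated (15) samples
m, a = ma_values(3, 15)
print(m, a)
```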

Step 3: Consider Downstream Applications

Essentially, what do you want your DGE results for? Do you need very few, but high-confidence hits? Or are you just exploring conditions and don’t mind being more lenient?

With validation experiments you can be more liberal initially, and then narrow down the list with more stringent thresholds. If you need DEGs to make a clinical decision, then you probably need high confidence. For pathway analysis you want to strike a balance between coverage and precision: with too few genes, your pathway enrichment analysis will have very little to work with.

Squidtip

As a practical recommendation, I would say it’s best to start conservative, then relax:

  1. Begin with standard thresholds (|log₂FC| ≥ 1, FDR < 0.05)
  2. Check if results make biological sense
  3. Adjust based on validation or prior knowledge
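The steps above can be sketched in a few lines of Python: apply the standard cutoffs first, then check how the number of hits changes as you tighten or relax them. The results table is made up, and the function name is hypothetical.

```python
def call_degs(results, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Return (up, down) lists of genes passing both cutoffs."""
    up = [g for g, (lfc, padj) in results.items()
          if padj < fdr_cutoff and lfc >= lfc_cutoff]
    down = [g for g, (lfc, padj) in results.items()
            if padj < fdr_cutoff and lfc <= -lfc_cutoff]
    return up, down

# Hypothetical results: gene -> (log2FC, adjusted p-value)
results = {
    "g1": (2.4, 0.001),
    "g2": (1.2, 0.03),
    "g3": (0.7, 0.004),
    "g4": (-1.1, 0.008),
    "g5": (-3.0, 0.2),
    "g6": (0.2, 0.6),
}

# Step 1: standard thresholds
up, down = call_degs(results)
print("standard:", up, down)

# Steps 2-3: see how sensitive the gene list is to the choice of cutoffs
counts = {}
for lfc_cutoff in (0.5, 1.0, 2.0):
    for fdr_cutoff in (0.05, 0.01):
        u, d = call_degs(results, lfc_cutoff, fdr_cutoff)
        counts[(lfc_cutoff, fdr_cutoff)] = (len(u), len(d))
        print(f"|log2FC| >= {lfc_cutoff}, FDR < {fdr_cutoff}: {len(u)} up, {len(d)} down")
```

If the list barely changes across reasonable cutoffs, your conclusions are probably robust. If it changes a lot, that is worth knowing before you move on to downstream analysis.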

And that is the end of this blogpost!

In this post, I explained how to determine the thresholds for DGE analysis, namely for log2FC and p-value. Hope you found it useful!

Squidtastic!

You made it to the end!

If you have any questions, or if there are any more topics you would like to see here, leave me a comment down below.

Otherwise, have a very nice day and… see you in the next one!