Why do genes with the highest logFC not have the lowest p-value?

That’s a really good and very common question in differential gene expression analysis! It feels intuitive that the larger the difference in expression (log fold change, or logFC), the more significant it should be (i.e., the smaller the p-value), but that’s not always the case. Here’s why:

LogFC and p-values measure different things

LogFC (log fold change) measures effect size: how much a gene’s expression changes between groups. A high logFC means a big difference, but not necessarily a reliable one.
P-value measures statistical significance: how likely we would be to see a difference at least this large if there were no real difference between the groups and only random variation (noise) were at play. It accounts for variability within groups and sample size.
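If you like formulas, here is a rough sketch of the two quantities. It assumes a simple two-group comparison (groups A and B) tested with a Welch t-test, which is only an illustrative assumption: differential expression tools such as DESeq2, edgeR or limma use more elaborate models, although the same logic applies. The symbols (group means, variances s² and sample sizes n) are introduced here just for illustration.

```latex
\log_2 \mathrm{FC} = \log_2\!\left(\frac{\bar{x}_B}{\bar{x}_A}\right),
\qquad
t = \frac{\bar{x}_B - \bar{x}_A}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
```

The logFC only uses the group means. The test statistic behind the p-value also divides by the within-group variances and sample sizes, so the same difference in means can give a very different p-value.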



Squidtip

So, a gene can have:

High logFC but a high p-value → Big difference, but lots of noise or small sample size.
Low logFC but a low p-value → Small difference, but very consistent and statistically reliable.
High logFC and low p-value → Jackpot! Likely a strong and reliable biological signal.
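The three scenarios above can be sketched as a tiny classifier. This is only an illustration: the function name, the gene names, the values and the cutoffs (|log2FC| of at least 1 and p-value below 0.05) are all assumptions, and the cutoffs are common but arbitrary conventions.

```python
def classify_gene(log2fc, p_value, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene by effect size and significance (hypothetical cutoffs)."""
    big_change = abs(log2fc) >= lfc_cutoff
    significant = p_value < p_cutoff
    if big_change and significant:
        return "large and reliable"
    if big_change:
        return "large but noisy"
    if significant:
        return "small but consistent"
    return "neither"


# Made-up (log2FC, p-value) pairs, one per scenario
examples = {
    "geneA": (3.2, 0.40),
    "geneB": (0.4, 0.001),
    "geneC": (2.8, 0.0005),
}

labels = {gene: classify_gene(lfc, p) for gene, (lfc, p) in examples.items()}
for gene, label in labels.items():
    print(gene, label)
```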

Why a gene with high logFC might not have a low p-value

There are several reasons why a gene with a high fold change between two groups of interest may not have a low (significant) p-value:

High variability (noise) within the groups:

If a gene has a big change in average expression (high logFC), but the values within each group are very spread out (high variance), the p-value can be high.
In other words: we see a big difference, but the data is so noisy that we can’t be confident it’s real.

Small sample size:

With fewer samples, it’s harder to distinguish signal from noise.
You could get a big fold change just by chance.

Outliers:

A gene might have a big fold change driven by one or two extreme values. These outliers inflate the logFC, but they also inflate the variance within the group, so the p-value can still be non-significant.
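Here is a minimal sketch of these three effects using made-up expression values. It assumes a plain Welch t-statistic (difference in means divided by its standard error) as a stand-in for whatever test your differential expression tool runs, and the function names are hypothetical. Every comparison has the same group means (10 vs 40), so the log2FC is identical; only the noise, the sample size or an outlier changes. A larger t-statistic corresponds to a smaller p-value.

```python
import math
import statistics


def log2fc(group_a, group_b):
    """log2 of the ratio of group means (B over A)."""
    return math.log2(statistics.mean(group_b) / statistics.mean(group_a))


def welch_t(group_a, group_b):
    """Welch t-statistic: difference in means divided by its standard error."""
    mean_diff = statistics.mean(group_b) - statistics.mean(group_a)
    std_err = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_diff / std_err


# Made-up values; all scenarios have means of 10 (A) and 40 (B)
scenarios = {
    "low noise": ([9, 10, 11, 10], [39, 40, 41, 40]),
    "high noise": ([2, 18, 5, 15], [10, 70, 25, 55]),
    # Same scatter as "high noise", but twice as many samples
    "high noise, more samples": ([2, 18, 5, 15] * 2, [10, 70, 25, 55] * 2),
    # The whole difference comes from a single extreme value in B
    "one outlier": ([9, 10, 11, 10], [9, 10, 11, 130]),
}

results = {
    name: (log2fc(a, b), welch_t(a, b)) for name, (a, b) in scenarios.items()
}
for name, (lfc, t) in results.items():
    print(f"{name}: log2FC = {lfc:.2f}, t = {t:.2f}")
```

The log2FC column is the same in every row, while the t-statistic drops with noise, recovers somewhat with more samples, and collapses when a single outlier drives the difference.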

Give me an example!

Imagine measuring the height difference between two species of plants:

🌱 Species A: [10, 12, 11, 13]
🌿 Species B: [20, 22, 21, 23]

There is a large difference in their average height, so the logFC is high. The difference is also consistent: the variance within each group is low, which gives a low p-value.

Now imagine:

🌱 Species A: [10, 5, 15, 8]
🌿 Species B: [20, 30, 10, 22]

There is still a big difference in the average height, so the log2FC will still be high. However, the heights are widely scattered within each species (high variability), so the p-value will be much higher.
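If you want to check the two plant datasets yourself, here is a small sketch. It computes the log2FC and a p-value from an exact permutation test on the difference in means. The permutation test is my assumption, chosen because it only needs the Python standard library; it is not what differential expression tools use, and the function names are hypothetical.

```python
import math
from itertools import combinations


def log2fc(group_a, group_b):
    """log2 of the ratio of group means (B over A)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2(mean_b / mean_a)


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def mean_diff(sum_b):
        return sum_b / n_b - (total - sum_b) / n_a

    observed = abs(mean_diff(sum(group_b)))
    extreme = 0
    n_splits = 0
    # Try every way of relabelling n_b of the pooled values as "species B"
    for idx in combinations(range(len(pooled)), n_b):
        n_splits += 1
        sum_b = sum(pooled[i] for i in idx)
        # Small tolerance guards against floating-point ties
        if abs(mean_diff(sum_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


consistent = ([10, 12, 11, 13], [20, 22, 21, 23])
scattered = ([10, 5, 15, 8], [20, 30, 10, 22])

lfc_consistent = log2fc(*consistent)
lfc_scattered = log2fc(*scattered)
p_consistent = permutation_p_value(*consistent)
p_scattered = permutation_p_value(*scattered)

print(f"consistent: log2FC = {lfc_consistent:.2f}, p = {p_consistent:.3f}")
print(f"scattered:  log2FC = {lfc_scattered:.2f}, p = {p_scattered:.3f}")
```

With only four plants per species, the smallest p-value this exact test can return is 2/70 (about 0.029). The consistent dataset reaches that floor, while the scattered dataset gets a larger p-value even though its log2FC is slightly higher.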



Squidtip

By the way, in this post I mainly talk about differential gene expression results, but you can also use heatmaps to display your pathway enrichment analysis results. In that case, each row is a different biological pathway.

So, when do you get both high logFC and low p-value?

We get a high logFC and low p-value when there’s a large difference in means and low variability within groups. These are often the most biologically interesting genes!

A great way of visualising differential gene expression results is with a volcano plot. Check out my tutorial on volcano plots for differential gene expression analysis here!

And that is the end of this tutorial!

In this post, I explained the difference between log2FC and p-value, and why differential gene expression analysis doesn’t always give a gene both a high log2FC and a low p-value.

Squidtastic!

You made it till the end! Hope you found this post useful.

If you have any questions, or if there are any more topics you would like to see here, leave me a comment down below.

Otherwise, have a very nice day and... see you in the next one!
