How to interpret MA plots






In this blogpost, we will go over the basics of MA plots, a very useful visualisation for genomics and transcriptomics data, and how to interpret them. This is the first part of the MA plot series: in the second part, you can follow an easy step-by-step tutorial to create and customise an MA plot in R.

So if you are ready… let’s dive in!



Check out my YouTube tutorial to learn how to interpret MA plots!


What is an MA plot?

Often, when doing gene or protein expression analysis, we want to compare two samples or conditions. An MA plot is a great way to visualise gene or protein expression across two groups and get an overall understanding of your data.

Let’s start by defining what an MA plot is and how to interpret it.

An MA plot is a scatterplot used mainly in gene expression analysis (like RNA-Seq data). It helps you compare two samples and see how many and which genes are differentially expressed.

An MA plot shows the average log expression (A) on the x-axis and the log ratio (M) on the y-axis. As you probably already guessed, the name “MA plot” comes from the axes. For a gene with expression x1 in one sample and x2 in the other:
- M (y-axis): Here, M stands for “minus”, because the log ratio is a difference of logs: M = log2(x1/x2) = log2(x1) - log2(x2). It indicates how much a gene’s expression changed (this is the log2 fold change).
- A (x-axis): Here, A stands for “average”: the average log expression across the two samples or conditions, A = (log2(x1) + log2(x2))/2.
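
If you like to see things in code, the two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `ma_values` and the example numbers are made up for this post, and real tools usually work on normalised counts rather than raw values.

```python
import math


def ma_values(sample1, sample2):
    """Return (M, A) for one gene, given its expression in two samples.

    Both values must be greater than 0, because log2 is undefined otherwise.
    """
    log_s1 = math.log2(sample1)
    log_s2 = math.log2(sample2)
    m = log_s1 - log_s2        # M: log ratio ("minus")
    a = (log_s1 + log_s2) / 2  # A: average log expression
    return m, a


# Hypothetical gene: expression of 8 in one sample and 2 in the other.
m, a = ma_values(8.0, 2.0)
print(f"M = {m}, A = {a}")  # M = 2.0, A = 2.0
```

So this gene would sit at x = 2 and y = 2 on the plot.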

Ok, but what does this mean? Let’s see how to interpret an MA plot with an example.

SquidTip!
MA plots are particularly useful in genomics for comparing gene expression between conditions, but they’re also valuable for any method comparison study where you want to visualize the relationship between the magnitude of measurements and their differences.


How to interpret an MA plot?

Imagine we are comparing RNA-seq expression values between two conditions: disease and control samples.

An MA plot would look like this:

Each point (dot) on the plot represents a gene. On the x-axis, we have the average log expression, so genes that are highly expressed across both conditions (disease and control) sit towards the right, and genes with a very low average expression sit towards the left.

Easy, right?

Now let’s have a look at the y axis. The y axis, as we mentioned, represents log fold change. Essentially, it is telling us how much the expression changes between the two samples. Let’s look at a few scenarios:

  • If a gene has an average expression of 1 in control samples, and an average expression of 2 in the “disease” samples: log2FC = log2(2/1) = log2(2) = 1.
  • If a gene has an average expression of 1 in control samples, and an average expression of 0.5 in the “disease” samples: log2FC = log2(0.5/1) = log2(0.5) = -1.
  • If a gene has no expression change between conditions (average expression of 2 in both cases), then: log2FC = log2(2/2) = log2(1) = 0.

In summary:
- If a gene is upregulated, log2FC > 0.
- If a gene is downregulated, log2FC < 0.
- If a gene is not differentially expressed, log2FC ~ 0.
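
The three scenarios above can be reproduced with a short Python sketch. The function names and the `tolerance` parameter are hypothetical choices for this example (the post only says log2FC is “~ 0” for unchanged genes), and the labels describe direction only, not statistical significance.

```python
import math


def log2_fold_change(disease, control):
    """log2 of the disease/control expression ratio (both must be > 0)."""
    return math.log2(disease / control)


def direction(log2fc, tolerance=0.1):
    """Label the direction of change.

    `tolerance` is an arbitrary cut-off for "roughly zero", chosen for
    illustration only.
    """
    if log2fc > tolerance:
        return "upregulated"
    if log2fc < -tolerance:
        return "downregulated"
    return "unchanged"


# (disease, control) pairs taken from the scenarios in the text.
for disease, control in [(2, 1), (0.5, 1), (2, 2)]:
    fc = log2_fold_change(disease, control)
    print(disease, control, fc, direction(fc))
```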

If we go back to our plot, you can see that the blue dotted line marks a log2FC of 0. Genes (points) above the line are upregulated in disease compared to control samples, and genes below the line are downregulated in disease compared to control samples.

Easy!

So essentially, MA plots allow us to compare two groups by visualising the relationship between the magnitude of measurements and their differences between conditions. In other words, we’re plotting how much the genes are expressed, and how big the difference in expression is between the two conditions or samples.

Ok, so let’s have a look at a few use cases and general tips when interpreting results using an MA plot.


When to use an MA plot

As we already mentioned, MA plots are widely used to compare gene expression between conditions, for example, using microarray or RNA-seq data. They are very popular for showing bulk RNA-seq differential expression results (e.g. from DESeq2, edgeR).

The key here is that you are comparing two groups, whether that is treated vs untreated samples or two different methods. For example, you might want to check whether Oxford Nanopore and PacBio long-read sequencing give similar results when you sequence the same sample.

MA plots are also great to assess overall expression trends. In general we expect a sort of funnel shape: low-expression genes towards the left show more variability, that is, higher and lower Ms. As we move towards the right, where genes are more highly expressed on average across both conditions, the variability is lower and there are fewer genes with big differences between conditions.
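
To get an intuition for why low-expression genes spread out more, here is a small arithmetic sketch in Python. The counts are made up, and real RNA-seq noise is not simply a fixed number of extra reads, so take this as a simplified illustration rather than a noise model.

```python
import math


def m_value(count1, count2):
    """Log2 ratio (M) between two positive counts."""
    return math.log2(count1 / count2)


# Assume the same small difference of 4 reads between the two samples.
extra_reads = 4

low = m_value(4 + extra_reads, 4)         # lowly expressed gene: 8 vs 4
high = m_value(1000 + extra_reads, 1000)  # highly expressed gene: 1004 vs 1000

print(f"low-expression gene:  M = {low:.3f}")
print(f"high-expression gene: M = {high:.3f}")
```

The same 4-read difference doubles the lowly expressed gene (M = 1) but barely moves the highly expressed one (M close to 0).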

So by plotting an MA plot, the overall shape can already give you clues about whether something is off with your experiment or dataset.
- It helps you see if most genes are unchanged (points near M = 0) or if there are genes with large fold changes.
- It’s good for general quality control. For example, after you normalise your data, you should see that most of your points are centered around an M of 0 (as most genes won’t be differentially expressed).
- MA plots are also great to check that there is no systematic bias between conditions. In other words, you expect some genes to be upregulated and some to be downregulated in one sample versus another. If all genes seem to be upregulated in one condition and it’s not something you expect based on biological differences, then there may be technical biases you need to correct for.
- Of course, MA plots are also great to spot interesting genes that are very different between conditions, and also very highly expressed. You can add the gene annotation to your plot too, which we will learn in the R tutorial!
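
The centering check from the list above can also be done numerically. Below is a minimal Python sketch, assuming you already have a list of M values. The function name `looks_centred`, the `max_shift` cut-off of 0.5 and the example values are all invented for illustration, so don’t read them as a recommended threshold.

```python
from statistics import median


def looks_centred(m_values, max_shift=0.5):
    """Return True if the median M is within `max_shift` of 0.

    `max_shift` is an arbitrary, illustrative cut-off.
    """
    return abs(median(m_values)) <= max_shift


# Made-up M values: some genes up, some down, most near 0.
balanced = [-1.2, -0.3, -0.1, 0.0, 0.1, 0.2, 1.5]
# The same genes with every M shifted upwards, mimicking a systematic bias.
shifted = [m + 0.8 for m in balanced]

print(median(balanced), looks_centred(balanced))
print(median(shifted), looks_centred(shifted))
```

A median far from 0 does not tell you why the data are shifted, only that it is worth looking at your normalisation before interpreting individual genes.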

When not to use an MA plot? If you’re comparing more than two groups, use other methods (like PCA plots or heatmaps).


MA plot vs volcano plot

You might be wondering when to use an MA plot instead of a volcano plot. If you’ve worked with gene expression data, you’ve probably heard of volcano plots before, which are another great way of showing differential gene expression results.

Though MA plots and volcano plots are both used in differential expression analysis, they show information in different ways. Here’s a simple side-by-side comparison:

| Feature | MA plot | Volcano plot |
| --- | --- | --- |
| X-axis | Mean expression level (A = average) | Log2 fold change |
| Y-axis | Log2 fold change (M = log ratio) | Statistical significance (e.g. -log10(p-value)) |
| Purpose | See expression changes relative to expression level | See which changes are big and statistically significant |
| Main use | Assess data quality, normalization, expression bias | Highlight the most important hits (e.g. significantly up/downregulated genes) |

So essentially, an MA plot focuses on the biology and expression trends (“Are changes happening mostly in low or high expressed genes?”); whereas a volcano plot focuses on finding important hits (“Which genes are changing a lot and are statistically significant?”).

It’s important to know that we often consider a gene with M higher than 1 or lower than -1 as up- or downregulated. However, a large fold change is not necessarily statistically significant! If a gene has an M higher than 1, its expression is more than double in one condition versus the other, but you don’t know whether that difference is significant. For that, we would use a volcano plot, another way of visualising differential gene expression results, which I cover in another video and blogpost. A volcano plot shows the log2FC (M) against the p-value, allowing us to zoom in on specific genes that are significantly differentially expressed.
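
Here is a small Python sketch of that distinction. The gene names, M values and p-values are invented, and the 0.05 p-value cut-off is an assumption for this example (the post itself only mentions the M > 1 or M < -1 rule of thumb).

```python
def classify(m, p_value, m_cutoff=1.0, p_cutoff=0.05):
    """Separate "large fold change" from "statistically significant".

    `p_cutoff` is an assumed, commonly used threshold, not one from this post.
    """
    large_change = abs(m) > m_cutoff
    significant = p_value < p_cutoff
    if large_change and significant:
        return "large and significant"
    if large_change:
        return "large but not significant"
    if significant:
        return "significant but small"
    return "neither"


# Hypothetical results: gene -> (M, p-value).
results = {
    "geneA": (2.3, 0.001),
    "geneB": (1.4, 0.40),
    "geneC": (0.2, 0.003),
}

for gene, (m, p) in results.items():
    print(gene, classify(m, p))
```

On an MA plot, geneA and geneB would both stand out above M = 1, and only the p-values tell them apart.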

Nice!

We are now ready to create an MA plot in R! Click here to follow my step-by-step tutorial and create your own MA plots with ggplot.

SquidTip!
You might be interested in this other blogpost on how to interpret a volcano plot. Or check out this easy step-by-step tutorial to create volcano plots in R.


Squidtastic! And that’s the end of this tutorial.

In this post, we covered how to interpret MA plots. Hope you found it useful!

As you saw, MA plots are a great way to compare two samples and see which genes are differentially expressed. As key takeaways, when looking at an MA plot, don’t forget about:

  1. Centering: The horizontal line at M = 0 represents no change between conditions
  2. Spread: Wide spread in M values at low A values often indicates technical noise
  3. Bias: Systematic deviation from M = 0 may indicate technical bias
  4. Significance: Consider both fold change magnitude and statistical significance


Squidtastic!

You made it till the end! Hope you found this post useful.

If you have any questions, or if there are any more topics you would like to see here, leave me a comment down below.

Otherwise, have a very nice day and… see you in the next one!