Adding statistical significance asterisks to seaborn plots.

Bastian Eichenberger
4 min read · Apr 2, 2021

Most scientific publications use asterisks (*) to denote statistical significance in graphs. An example is shown below. This is rather simple in graphical programs such as Prism. Like many others, I prefer to create my graphs in Python using the matplotlib / seaborn packages, where there is no box to tick that adds the asterisks for you. However, I’ll present three workarounds that have worked for me.

Example of statistical annotations with asterisks. Source: Picard, Mol Vis 2008; 14:928–941

Step 1: Calculating statistical significance

Before we can start adding asterisks to plots, we have to calculate the underlying values. These are typically p-values, to which there is a great introduction in this Medium post. In Python, we can use the scipy.stats module to run the statistical test of choice. Here, we’ll use the t-test to compare the means of two independent samples of scores.

import scipy.stats
stat, pvalue = scipy.stats.ttest_ind(a, b)

Here, a and b are the two samples we want to compare. The test has to be repeated for every pair of samples (every component of the plot to be compared). If your data is in a pandas.DataFrame, this is rather straightforward.

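import pandas as pd
import scipy.stats

df = pd.read_csv("data.csv")
x_values = df["x_column"].unique()
pvalues = []
for x in x_values:
    # hue1 and hue2 are the two hue levels to compare;
    # "y_column" is a placeholder for the column holding the measured values
    a = df.loc[(df["x_column"] == x) & (df["hue_column"] == hue1), "y_column"]
    b = df.loc[(df["x_column"] == x) & (df["hue_column"] == hue2), "y_column"]
    stat, pvalue = scipy.stats.ttest_ind(a, b)
    pvalues.append(pvalue)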
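
As a self-contained sketch of the loop above, the snippet below builds a small synthetic DataFrame and collects one p-value per x category. The column names (group, condition, score), the hue levels, and the helper pvalues_per_category are hypothetical and only serve as an illustration; swap in your own.

```python
import numpy as np
import pandas as pd
import scipy.stats


def pvalues_per_category(df, x_col, y_col, hue_col, hue1, hue2):
    """Run an independent two-sample t-test between two hue levels per x category."""
    pvalues = {}
    for x in df[x_col].unique():
        subset = df[df[x_col] == x]
        a = subset.loc[subset[hue_col] == hue1, y_col]
        b = subset.loc[subset[hue_col] == hue2, y_col]
        stat, pvalue = scipy.stats.ttest_ind(a, b)
        pvalues[x] = pvalue
    return pvalues


# Synthetic data (made up for illustration): in group "A" both conditions are
# drawn from the same distribution, in group "B" the treated mean is shifted.
rng = np.random.default_rng(seed=0)
rows = []
for group, shift in [("A", 0.0), ("B", 3.0)]:
    for condition, offset in [("control", 0.0), ("treated", shift)]:
        for value in rng.normal(loc=10.0 + offset, scale=1.0, size=30):
            rows.append({"group": group, "condition": condition, "score": value})
df = pd.DataFrame(rows)

pvalues = pvalues_per_category(df, "group", "score", "condition", "control", "treated")
print(pvalues)
```

Returning a dictionary keyed by category, rather than a plain list, keeps each p-value attached to the x value it belongs to, which helps when the plotting order differs from the order in the data.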

Step 2: Converting p-values into asterisks

After getting our p-values, we’ll want to convert them into the format we care about, i.e. asterisks. The conversion will depend on your specific definition of significance, but in all cases it’s as straightforward as writing a few if statements.

def convert_pvalue_to_asterisks(pvalue):
    if pvalue <= 0.0001:
        return "****"
    elif pvalue <= 0.001:
        return "***"
    elif pvalue <= 0.01:
        return "**"
    elif pvalue <= 0.05:
        return "*"
    return "ns"

Simply repeat this for all p-values you calculated and start plotting!
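
Repeating the conversion for a whole list can be a one-liner. The sketch below uses the same thresholds as the function above, written as a lookup table so that the cutoffs are easy to adjust. The example p-values are made up for illustration.

```python
def convert_pvalue_to_asterisks(pvalue):
    """Map a p-value to asterisks, checking the strictest threshold first."""
    thresholds = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]
    for cutoff, label in thresholds:
        if pvalue <= cutoff:
            return label
    return "ns"


# Example p-values (made up for illustration).
pvalues = [0.00005, 0.0005, 0.005, 0.03, 0.2]
pvalue_asterisks = [convert_pvalue_to_asterisks(p) for p in pvalues]
print(pvalue_asterisks)  # ['****', '***', '**', '*', 'ns']
```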

Step 3: Plot the asterisks

The final step, plotting, can be done in one of three ways. Which one to choose mainly depends on how complicated your graphs are and how polished they should end up looking.

Option 1. Using a vector graphics editor

I would recommend using a vector graphics editor such as Adobe Illustrator or Affinity Designer for very complex graphs containing multiple hues, comparisons, etc. Simply save your plot as a PDF using matplotlib.pyplot.savefig. PDFs can be opened by most vector graphics editors, allowing you to change all components of matplotlib plots or add new ones.

More specifically, this allows you to add asterisks to your liking based on the values you got before.
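
A minimal sketch of the export step could look like the following. The file name and the bar values are placeholders. Setting pdf.fonttype to 42 is an optional assumption on my part: it is commonly suggested so that text is embedded as TrueType and stays editable as text in the graphics editor, but it is worth checking how your editor handles it.

```python
import matplotlib
import matplotlib.pyplot as plt

# Optional assumption: embed TrueType fonts (type 42) instead of the default
# Type 3 fonts, so labels can be edited as text rather than as shapes.
matplotlib.rcParams["pdf.fonttype"] = 42

fig, ax = plt.subplots()
ax.bar(["control", "treated"], [1.0, 1.8])  # placeholder data
ax.set_ylabel("score")

# The ".pdf" extension tells matplotlib to write a vector PDF.
fig.savefig("plot_for_editing.pdf", bbox_inches="tight")
plt.close(fig)
```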

Processing in vector graphics software (here: Affinity Designer) to customize annotations to one’s liking.

Option 2. Manually using matplotlib

matplotlib isn’t just capable of plotting data points but also custom text and lines. We can use this to our advantage by leveraging the matplotlib.pyplot.text function, which takes the x and y locations of the to-be-created text as well as what you want to write (here the asterisks). Even though you can make this arbitrarily complicated, I would recommend using this for simple 2-hue graphs only. Taken together, it’ll look something like this.

y_position = df[y].max() * 1.2
for idx, pval in enumerate(pvalue_asterisks):
    plt.text(x=idx, y=y_position, s=pval)

Note the somewhat dynamic position in y, where I use the maximum value of my data plus some headroom to place the text. Similarly, the idx value works well when the x positions increase in steps of one. You might have to change it to suit your needs.
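
Since matplotlib can draw custom lines as well as text, the same approach extends to a bracket connecting the two bars being compared. The sketch below is one possible way to do it. The helper add_significance_bracket, its height parameter, and the bar values are hypothetical names and data chosen for illustration.

```python
import matplotlib.pyplot as plt


def add_significance_bracket(ax, x1, x2, y, text, height=0.02):
    """Draw a bracket from x1 to x2 at height y with centered text above it.

    `height` is the length of the bracket ends as a fraction of the y-axis span.
    """
    y_min, y_max = ax.get_ylim()
    h = height * (y_max - y_min)
    # Four points: up from x1, across to x2, and back down.
    ax.plot([x1, x1, x2, x2], [y, y + h, y + h, y], color="black", linewidth=1)
    # Center the label horizontally and let it sit on top of the bracket.
    ax.text((x1 + x2) / 2, y + h, text, ha="center", va="bottom")


fig, ax = plt.subplots()
values = [1.0, 1.8]  # placeholder data
ax.bar([0, 1], values)
add_significance_bracket(ax, 0, 1, max(values) * 1.1, "**")
# Text does not influence autoscaling, so leave some room at the top.
ax.set_ylim(top=max(values) * 1.3)
```

Centering the text with ha="center" avoids the slight rightward offset you get when the left edge of the asterisks is placed at the x position.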

Output by adding manual text annotations.

Option 3. Semi-automatically using statannot

Finally, for slightly more advanced bar and violin plots, there is an awesome package called statannot that does most of the heavy lifting.

statannot leverages scipy.stats and produces more polished annotations that would require a lot more code to create manually.

import statannot

statannot.add_stat_annotation(
    ax,
    data=df,
    x=x,
    y=y,
    hue=hue,
    box_pairs=[
        (("Biscoe", "Male"), ("Torgersen", "Female")),
        (("Dream", "Male"), ("Dream", "Female")),
    ],
    test="t-test_ind",
    text_format="star",
    loc="outside",
)

Output by using statannot and selecting the samples a t-test should be performed on.

Conclusion

Taken together, even though there isn’t a simple tick box to check like in some premium products, there are various options for adding statistical annotations in Python. Which one fits will depend on your specific graphic and use case, but the options above have served me well in all the graphs I had to plot. Let me know how you do it!
