Adding statistical significance asterisks to seaborn plots.
Most scientific publications use asterisks (*) to denote statistical significance in graphs. An example is shown below. This is rather simple in graphical programs such as Prism. Like many others, I prefer to create my graphs in Python using the matplotlib / seaborn packages, where there is no box to tick for this. However, I’ll present three workarounds that have worked for me.
Step 1: Calculating statistical significance
Before we can start adding asterisks to plots, we have to calculate the underlying values. These are typically p-values, to which there is a great introduction in this Medium post. In Python, we can use the scipy.stats module to run the statistical test of choice. Here, we’ll use the t-test to compare the means of two independent samples of scores.
import scipy.stats
stat, pvalue = scipy.stats.ttest_ind(a, b)
Here, a and b are the samples we want to compare. This has to be repeated for every comparison you want to show in the plot. Typically, if your data is in a pandas.DataFrame, this is rather straightforward.
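import pandas as pd
import scipy.stats
df = pd.read_csv("data.csv")
x_values = df["x_column"].unique()
pvalues = []
for x in x_values:
    # hue1 and hue2 are the two hue levels to compare; "y_column" is a placeholder for the column holding the measured values
    stat, pvalue = scipy.stats.ttest_ind(
        df[(df["x_column"] == x) & (df["hue_column"] == hue1)]["y_column"],
        df[(df["x_column"] == x) & (df["hue_column"] == hue2)]["y_column"],
    )
    pvalues.append(pvalue)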
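To make the loop above concrete, here is a minimal sketch on synthetic data. The column names (group, condition, score) and the helper pvalues_per_group are made up for illustration and not part of any library. The random data stands in for a real CSV file.

```python
import numpy as np
import pandas as pd
import scipy.stats


def pvalues_per_group(df, x, y, hue, hue1, hue2):
    """Return {x_value: p-value} comparing y between two hue levels.

    Hypothetical helper: one independent two-sample t-test per x value.
    """
    pvalues = {}
    for x_value, subset in df.groupby(x, sort=False):
        a = subset.loc[subset[hue] == hue1, y]
        b = subset.loc[subset[hue] == hue2, y]
        stat, pvalue = scipy.stats.ttest_ind(a, b)
        pvalues[x_value] = pvalue
    return pvalues


# Synthetic data: group "A" has a real difference between conditions, group "B" does not.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": ["A"] * 40 + ["B"] * 40,
    "condition": (["control"] * 20 + ["treated"] * 20) * 2,
    "score": np.concatenate([
        rng.normal(0.0, 1.0, 20), rng.normal(3.0, 1.0, 20),  # A: shifted mean
        rng.normal(0.0, 1.0, 20), rng.normal(0.0, 1.0, 20),  # B: same mean
    ]),
})

pvalues = pvalues_per_group(df, "group", "score", "condition", "control", "treated")
print(pvalues)
```

Returning a dictionary keyed by the x value keeps each p-value attached to the comparison it belongs to, which helps when the plotting order differs from the order in the data.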
Step 2: Converting p-values into asterisks
After getting our p-values, we’ll want to convert them into the format we care about, i.e. asterisks. The conversion will depend on your specific definition of significance, but in all cases it’s as straightforward as writing a few if statements.
def convert_pvalue_to_asterisks(pvalue):
    if pvalue <= 0.0001:
        return "****"
    elif pvalue <= 0.001:
        return "***"
    elif pvalue <= 0.01:
        return "**"
    elif pvalue <= 0.05:
        return "*"
    return "ns"
Simply repeat this for all p-values you calculated and start plotting!
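As a small sketch of that repetition, the function can be applied to a whole list of p-values with a list comprehension. The p-values here are made up purely for illustration, and the function is repeated so the snippet runs on its own.

```python
def convert_pvalue_to_asterisks(pvalue):
    """Map a p-value to the usual asterisk notation ("ns" = not significant)."""
    if pvalue <= 0.0001:
        return "****"
    elif pvalue <= 0.001:
        return "***"
    elif pvalue <= 0.01:
        return "**"
    elif pvalue <= 0.05:
        return "*"
    return "ns"


# Made-up p-values, one per comparison in the plot
pvalues = [0.00005, 0.0004, 0.008, 0.03, 0.2]
pvalue_asterisks = [convert_pvalue_to_asterisks(p) for p in pvalues]
print(pvalue_asterisks)  # ['****', '***', '**', '*', 'ns']
```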
Step 3: Plot the asterisks
The final step, plotting, can be done in one of three ways. Which one you pick will mainly depend on how complicated your graphs are and how polished they should end up looking.
Option 1. Using a vector graphics editor
I would recommend using a vector graphics editor such as Adobe Illustrator or Affinity Designer for very complex graphs containing multiple hues, comparisons, etc. Simply save your plot as a PDF using matplotlib.pyplot.savefig. PDFs can be opened by most vector graphics editors, allowing you to change any component of a matplotlib plot or add new ones.
More specifically, this allows you to add asterisks to your liking based on the values you got before.
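A minimal sketch of the export step could look like the following. The bar heights and the file name plot.pdf are placeholders. Setting pdf.fonttype to 42 is my own suggestion rather than a requirement: it embeds TrueType fonts, which in my experience tends to keep text editable as text in the editor.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Embed TrueType fonts so labels are more likely to stay editable as text
matplotlib.rcParams["pdf.fonttype"] = 42

fig, ax = plt.subplots()
ax.bar(["control", "treated"], [1.0, 2.5])  # made-up values
ax.set_ylabel("score")

output_path = Path("plot.pdf")  # placeholder file name
fig.savefig(output_path, bbox_inches="tight")
```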
Option 2. Manually using matplotlib
matplotlib isn’t just capable of plotting data points but also custom text and lines. We can use this to our advantage by leveraging the matplotlib.pyplot.text function. It takes the x and y locations of the to-be-created text as well as what you want to write (here the asterisks). Even though you can make this arbitrarily complicated, I would recommend using it for simple 2-hue graphs only. Taken together, it’ll look something like this.
y_position = df[y].max() * 1.2
for idx, pval in enumerate(pvalue_asterisks):
    plt.text(x=idx, y=y_position, s=pval)
Note the somewhat dynamic position in y, where I use the maximum value of my data plus some headroom to place the asterisks. Similarly, the idx value works well when the x positions are spaced in increments of one. You might have to change both to suit your needs.
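If you also want the familiar bracket between the two compared bars, one possible extension is to draw a line with the plotting function and centre the text above it. The sketch below assumes two bars at x positions 0 and 1 with made-up heights. The helper annotate_pair is hypothetical and not part of matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt


def annotate_pair(ax, x1, x2, y, text, height=0.05):
    """Draw a bracket from x1 to x2 at height y and centre `text` above it.

    Hypothetical helper for illustration only.
    """
    ax.plot([x1, x1, x2, x2], [y, y + height, y + height, y],
            color="black", linewidth=1)
    return ax.text((x1 + x2) / 2, y + height, text, ha="center", va="bottom")


fig, ax = plt.subplots()
means = [1.0, 2.5]  # made-up bar heights
ax.bar([0, 1], means)

label = annotate_pair(ax, 0, 1, max(means) * 1.1, "**")
ax.set_ylim(top=max(means) * 1.3)  # leave room for the annotation
```

Centring the text with ha="center" keeps the asterisks aligned with the bracket regardless of how many there are.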
Option 3. Semi-automatically using statannot
Finally, for slightly more advanced bar and violin plots, there is an awesome package called statannot that does most of the heavy lifting. statannot leverages scipy.stats and produces more beautiful annotations that would require a lot more code to do manually.
import statannot

statannot.add_stat_annotation(
    ax,
    data=df,
    x=x,
    y=y,
    hue=hue,
    box_pairs=[
        (("Biscoe", "Male"), ("Torgersen", "Female")),
        (("Dream", "Male"), ("Dream", "Female")),
    ],
    test="t-test_ind",
    text_format="star",
    loc="outside",
)
Conclusion
Taken together, even though there isn’t a simple tick box to check like in some premium products, there are various options for adding statistical annotations in Python. Which one fits will depend on your specific graphic and use case, but the options above have served me well in all the graphs I have had to plot. Let me know how you do it!