I think this is a simple question, but I still can't seem to think of a simple solution. I have a set of data of molecular abundances, with values ranging over many orders of magnitude. I want to represent these abundances with boxplots (box-and-whiskers plots), and I want the boxes to be calculated on a log scale because of the wide range of values.
I know I can just calculate the log10 of the data and send it to matplotlib's boxplot, but this does not retain the logarithmic scale in plots later.
So my question is basically this: When I have calculated a boxplot based on the log10 of my values, how do I convert the plot afterward to be shown on a logarithmic scale instead of linear with the log10 values? I can change tick labels to partly fix this, but I have no clue how I get logarithmic scales back to the plot.
Or is there a more direct way to plot this? Perhaps a different package that already has this option built in?
Many thanks for the help.
The yscale() and xscale() functions take a single parameter, the type of scale to apply. To convert an axis to a logarithmic scale, pass the string "log" (or a matplotlib.scale.LogScale object) to yscale or xscale.
The logarithmic scale in Matplotlib: the scale means the graduations or tick marks along an axis. They can be any of: matplotlib.scale.LinearScale: these are just numbers, like 1, 2, 3.
To create a boxplot on a log scale in base R, use the log argument of the boxplot function. Take care to name the y-axis in that argument (log = "y"), because the values of the boxplot are plotted on the y-axis.
Bottom line: A logarithmic axis can only plot positive values. There simply is no way to put negative values or zero on a logarithmic axis.
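Because of this, it can help to drop zeros, negatives and NaNs before taking log10 or switching to a log axis. A minimal sketch, assuming the abundances are in a plain list or array; the helper name positive_only and the sample values are made up for illustration:

```python
import numpy as np

def positive_only(values):
    """Return the strictly positive, finite entries of `values` as a float array."""
    arr = np.asarray(values, dtype=float)
    # NaN and infinities fail isfinite; zeros and negatives fail the > 0 test.
    return arr[np.isfinite(arr) & (arr > 0)]

# Hypothetical abundances containing values a log axis cannot show.
abundances = [0.0, 1e-3, -2.0, 5.0, 1e4, float("nan")]
cleaned = positive_only(abundances)
print(cleaned)
```

Whether dropping such points is acceptable depends on the data; zeros in abundance measurements often mean "below detection limit" and may deserve separate treatment.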
I'd advise against doing the boxplot on the raw values and setting the y-axis to logarithmic, because the boxplot function is not designed to work across orders of magnitude and you may get too many outliers (depending on your data, of course).
Instead, you can plot the logarithm of the data and manually adjust the y-labels.
Here is a very crude example:
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data spanning six orders of magnitude (1e-3 to 1e3)
np.random.seed(42)
values = 10 ** np.random.uniform(-3, 3, size=100)

fig = plt.figure(figsize=(9, 3))

# Panel 1: boxplot of log10(values) on a linear axis, relabelled in original units
ax = plt.subplot(1, 3, 1)
ax.boxplot(np.log10(values))
ax.set_yticks(np.arange(-3, 4))
ax.set_yticklabels(10.0**np.arange(-3, 4))
ax.set_title('log')

# Panel 2: boxplot of the raw values, shown on a logarithmic y-axis
ax = plt.subplot(1, 3, 2)
ax.boxplot(values)
ax.set_yscale('log')
ax.set_title('raw')

# Panel 3: raw values again, with whiskers at the 5th and 95th percentiles
ax = plt.subplot(1, 3, 3)
ax.boxplot(values, whis=[5, 95])
ax.set_yscale('log')
ax.set_title('5%')

plt.show()
The middle panel ('raw') shows the box plot computed on the raw values. This leads to many outliers, because the maximum whisker length is computed as a multiple (default: 1.5) of the interquartile range (the box height), which does not scale across orders of magnitude.
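To make the effect of the 1.5 × IQR rule concrete, here is a rough sketch that counts the points falling outside the whisker limits, once on the raw values and once on their log10. The helper count_fliers is written for illustration only and uses np.percentile, so its quartiles may differ slightly from the ones boxplot draws:

```python
import numpy as np

def count_fliers(data, whis=1.5):
    """Count points below q1 - whis*IQR or above q3 + whis*IQR."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return int(np.sum((data < low) | (data > high)))

# Synthetic log-uniform data, similar in spirit to the example above.
rng = np.random.default_rng(42)
values = 10 ** rng.uniform(-3, 3, size=100)

raw_fliers = count_fliers(values)            # limits computed in linear space
log_fliers = count_fliers(np.log10(values))  # limits computed in log space
print("outliers on raw values:", raw_fliers)
print("outliers on log10 values:", log_fliers)
```

For data like this, the raw-value limits sit only a little above the upper quartile in log terms, so a sizeable share of the large values ends up flagged, while the log-space limits comfortably cover the whole range.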
Alternatively, you could specify to draw the whiskers for a given percentile range:
ax.boxplot(values, whis=[5, 95])
In this case a fixed fraction of the points (5%) is drawn as outliers above and below the whiskers.
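Another option, which gets closer to what the question asks for, is to compute the box statistics on log10 of the data, transform them back to the original units, and draw them on a true logarithmic axis. The sketch below assumes matplotlib.cbook.boxplot_stats and Axes.bxp are acceptable for this; the helper to_linear is a hypothetical name introduced here, and the data are synthetic:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook

def to_linear(stat):
    """Map one boxplot_stats dict from log10 units back to the original units."""
    out = dict(stat)
    for key in ("mean", "med", "q1", "q3", "cilo", "cihi", "whislo", "whishi"):
        out[key] = float(10.0 ** stat[key])
    out["fliers"] = 10.0 ** np.asarray(stat["fliers"], dtype=float)
    out["iqr"] = out["q3"] - out["q1"]
    return out

rng = np.random.default_rng(42)
values = 10 ** rng.uniform(-3, 3, size=100)

# Quartiles, whiskers and fliers are computed in log space...
log_stats = cbook.boxplot_stats(np.log10(values))
# ...and then converted back so they can be drawn on a log-scaled axis.
stats = [to_linear(s) for s in log_stats]

fig, ax = plt.subplots(figsize=(3, 3))
ax.bxp(stats)          # draw boxes from precomputed statistics
ax.set_yscale("log")   # real logarithmic axis with log-spaced ticks
ax.set_title("log-space stats, log axis")
fig.canvas.draw()      # use plt.show() instead when working interactively
```

Note that the back-transformed "mean" is the geometric mean of the data rather than the arithmetic mean, which only matters if the mean marker is switched on.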
You can use plt.yscale:
plt.boxplot(data); plt.yscale('log')