
Change color of bars for a data selection in a seaborn histogram (or plt)

Let's say I have a dataframe like:

import numpy as np
import pandas as pd

X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)

a = pd.DataFrame({"X3": X3, "X2": X2})

and I am doing the following plotting routine:

import matplotlib.pyplot as plt
import seaborn as sns

f, axes = plt.subplots(2, 2, gridspec_kw={"height_ratios": (.10, .30)}, figsize=(13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(x=a[c], ax=axes[0, i])   # box plot on the top row
    sns.distplot(a[c], ax=axes[1, i])    # histogram with KDE on the bottom row
    axes[1, i].set(yticklabels=[], xlabel='', ylabel='')

plt.tight_layout()
plt.show()

which yields:

[image: box plots above histograms of X3 and X2]

Now I want to be able to perform a data selection on the dataframe a. Let's say something like:

b = a[(a['X2'] <4)]

and highlight the selection from b in the posted histograms. For example, if the first row of b is [32:0] for X3 and [0:5] for X2, the desired output would be:

[image: desired output, with the selected bars highlighted]

Is it possible to do this with the above for loop and with seaborn? Many thanks!

EDIT: I am also happy with a matplotlib solution, if easier.

EDIT2:

If it helps, what I want is similar to the following:

b = a[(a['X3'] >38)]

f, axes = plt.subplots(2, 2,  gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))

for i, c in enumerate(a.columns):
    sns.boxplot(x=a[c], ax=axes[0, i])
    sns.distplot(a[c], ax=axes[1, i])    # full data
    sns.distplot(b[c], ax=axes[1, i])    # selection, drawn on top

    axes[1, i].set(yticklabels=[], xlabel='', ylabel='')

plt.tight_layout()
plt.show()

which yields the following:

[image: histograms of a with the distribution of b overlaid]

However, I would like to simply colour those bars of the first plot differently. I also thought about setting the ylim to the height of the blue plot so that the orange one does not distort the shape of the blue distribution, but that is not feasible: in reality I have about 10 histograms to show, and setting ylim would be much the same as sharey=True, which I'm trying to avoid so that I can show the true shape of each distribution.

La Cordillera asked Nov 07 '25 08:11


2 Answers

I think I found a solution, taking inspiration from the previous answer and this video:

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(2021)
X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)

a = pd.DataFrame({"X3": X3, "X2":X2})
b = a[(a['X3'] < 30)]


hist_idx=[]

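for i, c in enumerate(a.columns):
    # bin edges of the full data (21 edges for 20 bins)
    bin_ = np.histogram(a[c], bins=20)[1]
    # indices of the edges that lie inside the range spanned by the selection b
    hist = np.where(np.logical_and(bin_ <= b[c].max(), bin_ > b[c].min()))
    hist_idx.append(hist)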
    

f, axes = plt.subplots(2, 2,  gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))

for i, c in enumerate(a.columns):
    sns.boxplot(x=a[c], ax=axes[0, i])
    axes[1, i].hist(a[c], bins = 20)
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')
    
# Bin edge j is the right edge of bar j - 1, so recolour that bar
for it, index in enumerate(hist_idx):
    for j in index[0]:
        axes[1, it].patches[j - 1].set_fc("red")


plt.tight_layout()
plt.show()

which yields the following for b = a[(a['X3'] < 30)]:

[image: histograms with the selected bars in red]

or for b = a[(a['X3'] > 36)]:

[image: histograms with the selected bars in red]
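
A variation on the same idea, as a minimal sketch: `ax.hist` already returns the bin edges and the bar patches, so the selection can be binned with those same edges and every bar that holds at least one selected value recoloured. The helper name `highlight_selection` and its parameters are hypothetical, and the rule "at least one selected value" is an assumption about what should count as highlighted; it is not the exact rule used in the code above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def highlight_selection(ax, values, selected, bins=20, color="red"):
    """Draw a histogram of `values` and recolour bars holding a selected value.

    Hypothetical helper for illustration, not part of seaborn or matplotlib.
    """
    # ax.hist returns the counts, the bin edges and the bar patches in bin order
    counts, edges, patches = ax.hist(values, bins=bins)
    # Count the selected values per bin, reusing exactly the same edges
    selected_counts, _ = np.histogram(selected, bins=edges)
    for patch, n_selected in zip(patches, selected_counts):
        if n_selected > 0:
            patch.set_facecolor(color)
    return selected_counts


np.random.seed(2021)
a = pd.DataFrame({
    "X3": np.random.normal(34, 2, 200),
    "X2": np.random.normal(10, 3, 200),
})
b = a[a["X3"] > 36]

fig, axes = plt.subplots(1, 2, figsize=(13, 3))
per_column = {
    c: highlight_selection(axes[i], a[c], b[c]) for i, c in enumerate(a.columns)
}
fig.tight_layout()
```

Call `plt.show()` afterwards to display the figure. Because the bars are recoloured in place rather than overlaid, each histogram keeps its own y-axis and shape.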

Thought I'd leave it here - although niche, might help someone in the future!

La Cordillera answered Nov 10 '25 09:11


I created the following code with the understanding that the intent of your question is to add a different color to the histogram based on the data extracted under certain conditions. Use np.histogram() to get an array of frequencies and an array of bins. Get the index of the value closest to the value of the first row of data extracted for a certain condition. Change the color of the histogram with that retrieved index. The same method can be used to deal with the other graph.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(2021)
X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)

a = pd.DataFrame({"X3": X3, "X2":X2})

f, axes = plt.subplots(2, 2,  gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(x=a[c], ax=axes[0, i])
    sns.distplot(a[c], ax = axes[1,i])
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')

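b = a[(a['X2'] < 4)]
# Use as many bins as distplot drew, so that indices match the plotted bars
hist3, bins3 = np.histogram(X3, bins=len(axes[1, 0].patches))
# Compare the selected value with the bin edges, not with the bin counts
idx = np.abs(bins3 - b['X3'].head(1).values[0]).argmin()

# Recolour the bars to the left of that edge in the X3 histogram
for k in range(idx):
    axes[1, 0].patches[k].set_color("red")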

plt.tight_layout()
plt.show()

[image: resulting figure]
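
To make the index lookup concrete, here is a small sketch of how a value maps to a bar. `np.histogram` returns one more edge than counts, and bar `k` spans `edges[k]` to `edges[k + 1]`. The helper `bin_index` is a hypothetical name; it finds the bin that contains the value, which can differ by one from a nearest-edge lookup.

```python
import numpy as np

np.random.seed(2021)
X3 = np.random.normal(34, 2, 200)

# 10 counts and 11 edges: bar k covers edges[k] .. edges[k + 1]
counts, edges = np.histogram(X3, bins=10)


def bin_index(value, edges):
    """Index of the histogram bin that contains `value` (hypothetical helper)."""
    # side="right" places a value equal to a left edge in the bin starting there
    idx = np.searchsorted(edges, value, side="right") - 1
    # np.histogram closes the last bin on the right, so clip the maximum into it
    return int(np.clip(idx, 0, len(edges) - 2))


value = X3[0]
k = bin_index(value, edges)
```

The resulting `k` can be used directly as an index into the bar patches of a histogram drawn with the same edges.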

r-beginners answered Nov 10 '25 10:11


