
Why does using "==" return a Series instead of bool in pandas?

I just can't figure out what "==" means on the second line:
- It is not a test, there is no if statement...
- It is not a variable declaration...

I've never seen this before. The thing is, data.categ == cat is a pandas Series and not a test...

for cat in data["categ"].unique():
    subset = data[data.categ == cat] # Create the subsample
    print("-"*20)
    print('Category: ' + cat)
    print("mean:\n",subset['montant'].mean())
    print("median:\n",subset['montant'].median())
    print("mode:\n",subset['montant'].mode())
    print("VAR:\n",subset['montant'].var())
    print("STD:\n",subset['montant'].std())
    plt.figure(figsize=(5,5))
    subset["montant"].hist(bins=30) # Build the histogram
    plt.show() # Display the histogram

Xomuama asked Apr 20 '20 16:04


2 Answers

It is testing each element of data.categ for equality with cat. That produces a vector of True/False values (a boolean Series). This is passed as an indexer to data[], which returns the rows of data that correspond to the True values in the vector.

To summarize, the whole expression returns the subset of rows from data where the value of data.categ equals cat.
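
To see the two steps separately, here is a minimal sketch. The column names categ and montant come from the question; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: only the column names come from the question.
data = pd.DataFrame({
    "categ": ["a", "b", "a", "c"],
    "montant": [10.0, 20.0, 30.0, 40.0],
})
cat = "a"

# `==` is an ordinary expression, not a statement: applied to a Series it
# compares element-wise and returns a boolean Series with the same index.
mask = data.categ == cat
print(mask.tolist())          # [True, False, True, False]

# Indexing the DataFrame with that mask keeps only the rows marked True.
subset = data[mask]
print(subset.index.tolist())  # [0, 2]
```

So no if statement is needed: the comparison result is just a value that gets handed to data[...].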

(It seems possible that the whole operation could be done more elegantly using data.groupby('categ').apply(someFunc).)
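
A rough sketch of that grouped approach, assuming the same categ and montant columns and invented values. It uses agg rather than apply because only standard summary statistics are computed here; the mode and the histograms from the question would still need separate handling.

```python
import pandas as pd

# Hypothetical data: only the column names come from the question.
data = pd.DataFrame({
    "categ": ["a", "b", "a", "b"],
    "montant": [10.0, 20.0, 30.0, 60.0],
})

# One row per category, one column per statistic.
stats = data.groupby("categ")["montant"].agg(["mean", "median", "var", "std"])
print(stats)
```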

Dave Costa answered Sep 28 '22 03:09


It creates a boolean Series that is True at the indexes where data.categ is equal to cat. With this boolean mask you can filter your DataFrame; in other words, subset will contain all records where categ is the value stored in cat.

Here is an example using numeric data:

import numpy as np
import pandas as pd

np.random.seed(0)
a = np.random.choice(np.arange(2), 5)
b = np.random.choice(np.arange(2), 5)
df = pd.DataFrame(dict(a = a, b = b))


df[df.a == 0].head()

#   a   b
# 0 0   0
# 2 0   0
# 4 0   1

df[df.a == df.b].head()

#   a   b
# 0 0   0
# 2 0   0
# 3 1   1

jcaliz answered Sep 28 '22 04:09