I just can't figure out what "==" means on the second line of the code below:
- It is not a test, there is no if statement...
- It is not a variable declaration...
I've never seen this before. The thing is, data.categ == cat
is a pandas Series and not a test...
import pandas as pd
import matplotlib.pyplot as plt

# data is assumed to be a DataFrame with a "categ" and a "montant" column
for cat in data["categ"].unique():
    subset = data[data.categ == cat]  # Create the sub-sample for this category
    print("-" * 20)
    print(f"Category: {cat}")
    print("mean:\n", subset["montant"].mean())
    print("median:\n", subset["montant"].median())
    print("mode:\n", subset["montant"].mode())
    print("variance:\n", subset["montant"].var())
    print("std dev:\n", subset["montant"].std())
    plt.figure(figsize=(5, 5))
    subset["montant"].hist(bins=30)  # Create the histogram
    plt.show()  # Display the histogram
It is testing each element of data.categ for equality with cat. That produces a vector of True/False values (a boolean Series, one value per row). This is passed as an indexer to data[], which returns the rows from data that correspond to the True values in the vector.
To summarize, the whole expression returns the subset of rows from data where the value of data.categ equals cat.
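A minimal sketch of those two steps, assuming a small made-up DataFrame that reuses the question's column names (categ and montant); the values themselves are invented for illustration:

```python
import pandas as pd

# Hypothetical data using the question's column names
data = pd.DataFrame({
    "categ": ["A", "B", "A", "C", "B"],
    "montant": [10.0, 20.0, 30.0, 40.0, 50.0],
})
cat = "A"

# Step 1: element-wise comparison -> boolean Series, one value per row
mask = data.categ == cat
print(mask.dtype)     # bool
print(mask.tolist())  # [True, False, True, False, False]

# Step 2: use the boolean Series as an indexer -> keep only the True rows
subset = data[mask]
print(subset.index.tolist())      # [0, 2]
print(subset["montant"].tolist())  # [10.0, 30.0]
```

Writing data[data.categ == cat] simply does both steps in one expression.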
(It seems the whole operation could probably be done more elegantly with data.groupby('categ').apply(someFunc).)
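The answer leaves someFunc unspecified, so here is one possible sketch of the groupby idea. It swaps apply(someFunc) for agg with pandas' built-in reductions, uses invented data with the question's column names, and leaves out the histograms:

```python
import pandas as pd

# Hypothetical data reusing the question's column names
data = pd.DataFrame({
    "categ": ["A", "B", "A", "C", "B"],
    "montant": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# One group per distinct value of "categ"; the same reductions as the
# question's loop, computed for every category in a single call
stats = data.groupby("categ")["montant"].agg(["mean", "median", "var", "std"])
print(stats)

# mode() may return several values per group, so collect them as lists
modes = {name: group.mode().tolist()
         for name, group in data.groupby("categ")["montant"]}
print(modes)
```

agg fits here because each statistic is a plain reduction to one number per group; apply is more useful when a custom function needs to return something more complex per group.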
It creates a boolean Series, aligned on the DataFrame's index, that is True wherever data.categ equals cat. With this boolean mask you can filter your DataFrame; in other words, subset will contain all the records whose categ is the value stored in cat.
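To see that the mask carries the DataFrame's index (which is how each True/False lines up with a row), here is a small sketch with a made-up, non-default string index:

```python
import pandas as pd

# Made-up data with a string index instead of the default 0..n-1
data = pd.DataFrame(
    {"categ": ["A", "B", "A"], "montant": [1.5, 2.5, 3.5]},
    index=["x", "y", "z"],
)
cat = "A"

# The comparison returns a boolean Series that shares the DataFrame's index
mask = data.categ == cat
print(mask.index.equals(data.index))  # True
print(mask.tolist())                  # [True, False, True]

# Filtering keeps the rows whose index label maps to True
subset = data[mask]
print(subset.index.tolist())          # ['x', 'z']
print(subset["montant"].tolist())     # [1.5, 3.5]
```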
Here is an example using numeric data:
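import numpy as np
import pandas as pd

# Explicit values (instead of random draws) so the output below is reproducible
a = np.array([0, 1, 0, 1, 0])
b = np.array([0, 0, 0, 1, 1])
df = pd.DataFrame(dict(a=a, b=b))

df[df.a == 0].head()  # rows where column a is 0
#    a  b
# 0  0  0
# 2  0  0
# 4  0  1

df[df.a == df.b].head()  # rows where a and b are equal
#    a  b
# 0  0  0
# 2  0  0
# 3  1  1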