I want to get a list of all distinct or unique values of one variable in a dataframe that coincide with a specific value of another variable in that dataframe.
In Stata I would use something like:
levelsof ID1 if ID2 == i
How do I do this in Python?
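Stata's levelsof is roughly equivalent to pandas's unique(): both give the distinct values of a variable. The main differences are that levelsof lists the values in sorted order and skips missing values by default, while unique() returns an array in order of first appearance and includes NaN.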
>>> import pandas as pd
>>> df = pd.DataFrame({'id1': [0, 0, 1, 1, 2, 2],
...                    'id2': [5, 5, 5, 6, 6, 6]})
>>> df
   id1  id2
0    0    5
1    0    5
2    1    5
3    1    6
4    2    6
5    2    6
>>> df.loc[df['id2'] == 5, 'id1'].unique()
array([0, 1])
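To get closer to what levelsof prints (sorted values, missing values left out), one option is to drop NaN and sort the result of unique(). The following is only a minimal sketch: the helper name levels_of and its mask parameter are hypothetical, not part of pandas, and the sample data is made up for illustration.

```python
import pandas as pd

def levels_of(df, column, mask=None):
    """Hypothetical helper: sorted distinct non-missing values of `column`.

    `mask` is an optional boolean Series playing the role of Stata's `if`.
    """
    values = df[column] if mask is None else df.loc[mask, column]
    return sorted(values.dropna().unique().tolist())

# Illustrative data, deliberately unsorted in id1
df = pd.DataFrame({'id1': [2, 0, 1, 1, 2, 0],
                   'id2': [5, 5, 5, 6, 6, 6]})

print(levels_of(df, 'id1', df['id2'] == 5))  # prints [0, 1, 2]
```

Returning a plain Python list keeps the result easy to loop over, much like the macro that levelsof fills in Stata.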
Say your columns are ID1 and ID2, and the DataFrame is df. Then
df.ID1[df.ID2 == i]
will give all the values of the first column where the second one is i.
Following that, you can do
df.ID1[df.ID2 == i].value_counts()
to get a breakdown,
df.ID1[df.ID2 == i].unique()
to get unique values,
df.ID1[df.ID2 == i].describe()
to get a description, and so forth (I don't know what levelsof is exactly).
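Put together, the calls above might look like the sketch below. The data and the value of i are assumptions chosen for illustration. Bracket indexing (df['ID1']) is used instead of attribute access (df.ID1) because it also works when a column name clashes with a DataFrame method or contains spaces.

```python
import pandas as pd

# Illustrative data; the column names ID1 and ID2 follow the question
df = pd.DataFrame({'ID1': [0, 0, 1, 1, 2, 2],
                   'ID2': [5, 5, 5, 6, 6, 6]})
i = 5  # assumed value of ID2 to filter on

# All ID1 values in rows where ID2 equals i
subset = df.loc[df['ID2'] == i, 'ID1']

counts = subset.value_counts()  # how often each ID1 value occurs
levels = subset.unique()        # distinct ID1 values, in order of appearance
summary = subset.describe()     # count, mean, std, min, quartiles, max

print(levels)
print(counts)
```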