I have a DataFrame ("df") equivalent to:
Cat Data
x 0.112
x 0.112
y 0.223
y 0.223
z 0.112
z 0.112
In other words, I have a category column and a data column; the data value does not vary within a category, but it may repeat across different categories (e.g. categories 'x' and 'z' share the same value, 0.112). This means I need to select one data point from each category, rather than just subsetting on unique values of "Data".
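For reference, an equivalent frame can be built like this (a minimal sketch; the column names and values are just those from the sample above):

import pandas as pd

df = pd.DataFrame({
    'Cat':  ['x', 'x', 'y', 'y', 'z', 'z'],
    'Data': [0.112, 0.112, 0.223, 0.223, 0.112, 0.112],
})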
The way I've done it is like this:
import pandas as pd

aLst = []
bLst = []
for i in df.index:
    # remember the first row index seen for each category
    if df.loc[i, 'Cat'] not in aLst:
        aLst += [df.loc[i, 'Cat']]
        bLst += [i]
new_series = pd.Series(df.loc[bLst, 'Data'])
Then I can do whatever I want with it. But the problem is that this seems like a clunky, un-Pythonic way of doing things. Any suggestions?
I think you need drop_duplicates:

# by column Cat
print(df.drop_duplicates(['Cat']))
Cat Data
0 x 0.112
2 y 0.223
4 z 0.112
Or:
# by columns Cat and Data
print(df.drop_duplicates(['Cat', 'Data']))
Cat Data
0 x 0.112
2 y 0.223
4 z 0.112
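If what you actually want is just one Data value per category (like new_series in the question), a minimal follow-up sketch, assuming Cat should become the index:

# keep one row per category, then pull out Data indexed by Cat
new_series = df.drop_duplicates('Cat').set_index('Cat')['Data']

# an equivalent groupby alternative: take the first Data value per category
new_series = df.groupby('Cat')['Data'].first()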