Python Pandas subset column x values based on unique values in column y

Tags: python, slice, indexing, pandas, subset

I have a dataframe ("df") equivalent to:
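The original table is missing from this copy of the question. A minimal reconstruction that is consistent with the output shown in the answer below might look like this; the number of rows per category is an assumption, and only the column names and the values 0.112 and 0.223 come from the post:

```python
import pandas as pd

# Hypothetical reconstruction of the missing frame: two rows per category is
# an assumption, chosen so the first x, y and z rows sit at index 0, 2 and 4.
df = pd.DataFrame({
    "Cat": ["x", "x", "y", "y", "z", "z"],
    "Data": [0.112, 0.112, 0.223, 0.223, 0.112, 0.112],
})
print(df)
```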

In other words, I have a category column and a data column. The data values do not vary within a category, but they may repeat between categories (e.g. categories 'x' and 'z' both have the value 0.112). This means I need to select one data point from each category, rather than just subsetting on unique values of "Data".

The way I've done it is like this:

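    aLst = []  # categories seen so far
    bLst = []  # index label of the first row of each category
    for i in df.index:
        if df.loc[i,'Cat'] not in aLst:
            aLst += [df.loc[i,'Cat']]
            bLst += [i]

    # one 'Data' value per category, taken from the first row of each
    new_series = pd.Series(df.loc[bLst,'Data'])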

Then I can do whatever I want with it. The problem is that this seems like a clunky, un-Pythonic way of doing things. Any suggestions?

asked Nov 18 '16 15:11

Cole Robertson

1 Answer

I think you need drop_duplicates:

    # by column Cat
    print(df.drop_duplicates(['Cat']))
      Cat   Data
    0   x  0.112
    2   y  0.223
    4   z  0.112

Or:

    # by columns Cat and Data
    print(df.drop_duplicates(['Cat','Data']))
      Cat   Data
    0   x  0.112
    2   y  0.223
    4   z  0.112
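Both calls return a DataFrame, while the question builds a Series of "Data" values. One way to get that Series might be to select the column after dropping duplicates. The sketch below assumes a six-row sample frame (not from the original post) that matches the output above, and also shows a groupby alternative that indexes the result by category:

```python
import pandas as pd

# Assumed sample frame, built only to match the output shown above.
df = pd.DataFrame({
    "Cat": ["x", "x", "y", "y", "z", "z"],
    "Data": [0.112, 0.112, 0.223, 0.223, 0.112, 0.112],
})

# Keep the first row of each category, then select the Data column.
# The result is a Series that keeps the original row labels (0, 2, 4).
new_series = df.drop_duplicates("Cat")["Data"]

# Alternative: first Data value per category, indexed by Cat instead.
per_cat = df.groupby("Cat")["Data"].first()

print(new_series)
print(per_cat)
```

The first form keeps the row labels that the loop in the question collects in bLst. The groupby form is handy if you want to look values up by category name.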

answered Sep 22 '22 00:09

jezrael
