Pandas isin with multiple columns

Tags:

pandas

I want to select all rows in a DataFrame whose (DEVICE, READING) combination appears in a list of pairs. I have tried two approaches, and neither works as expected.

My dataframe looks something like this:

Timestamp | DEVICE | READING | VALUE
1 | DEV1 | READ1 | randomvalue
2 | DEV1 | READ2 | randomvalue
3 | DEV2 | READ1 | randomvalue
4 | DEV2 | READ2 | randomvalue
5 | DEV3 | READ1 | randomvalue

and I have a list (ls) like this:

ls = [['DEV1', 'READ1'], ['DEV1', 'READ2'], ['DEV2', 'READ1']]

In this scenario I want to remove rows 4 and 5.

My first approach was:

df = df[df['DEVICE'].isin([pair[0] for pair in ls]) &
        df['READING'].isin([pair[1] for pair in ls])]

The problem with this one is that it does not remove row 4: DEV2 and READ2 each appear somewhere in the list, so both checks pass even though the pair [DEV2, READ2] is not in it.
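To make the failure concrete, here is a minimal sketch that rebuilds the sample frame from the table above (the VALUE strings are placeholders, and the variable names are my own) and applies the two independent isin checks. Each column is tested on its own, so any device from the list combined with any reading from the list passes:

```python
import pandas as pd

# Sample frame rebuilt from the question; VALUE is a placeholder string.
df = pd.DataFrame({
    "Timestamp": [1, 2, 3, 4, 5],
    "DEVICE": ["DEV1", "DEV1", "DEV2", "DEV2", "DEV3"],
    "READING": ["READ1", "READ2", "READ1", "READ2", "READ1"],
    "VALUE": ["randomvalue"] * 5,
})
ls = [["DEV1", "READ1"], ["DEV1", "READ2"], ["DEV2", "READ1"]]

devices = [pair[0] for pair in ls]   # DEV1, DEV1, DEV2
readings = [pair[1] for pair in ls]  # READ1, READ2, READ1

# Two independent membership tests: this accepts the cross product
# of listed devices and listed readings, not only the listed pairs.
mask = df["DEVICE"].isin(devices) & df["READING"].isin(readings)
kept = df.loc[mask, "Timestamp"].tolist()
print(kept)  # [1, 2, 3, 4] -> row 4 (DEV2, READ2) slips through
```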

My second approach was:

df = df[df[['DEVICE', 'READING']].isin({'DEVICE':  [pair[0] for pair in ls],
                                        'READING': [pair[1] for pair in ls]})]

This one selects the correct rows, but it does not remove the other rows. Instead it sets every other cell to NaN, including the VALUE column, which I want to keep. It also does not combine the two conditions, so row 4 looks like 4 | DEV2 | NaN | NaN.
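A likely reason for the NaN cells: DataFrame.isin returns a boolean DataFrame with one flag per cell, not one flag per row, and indexing a frame with a boolean frame masks cells instead of dropping rows. The sketch below (sample data rebuilt from the question, names are mine) shows the shape of that result. Reducing it with all(axis=1) does give a row mask, but each column is still checked independently, so it is not a pair lookup:

```python
import pandas as pd

# Sample frame rebuilt from the question; VALUE is a placeholder string.
df = pd.DataFrame({
    "Timestamp": [1, 2, 3, 4, 5],
    "DEVICE": ["DEV1", "DEV1", "DEV2", "DEV2", "DEV3"],
    "READING": ["READ1", "READ2", "READ1", "READ2", "READ1"],
    "VALUE": ["randomvalue"] * 5,
})
ls = [["DEV1", "READ1"], ["DEV1", "READ2"], ["DEV2", "READ1"]]

# With a dict, isin checks each named column against its own list.
cell_mask = df[["DEVICE", "READING"]].isin({
    "DEVICE": [pair[0] for pair in ls],
    "READING": [pair[1] for pair in ls],
})
print(cell_mask.shape)  # (5, 2): one boolean per cell

# Collapsing to one boolean per row makes it usable as a row filter,
# but the per-column logic is unchanged.
row_mask = cell_mask.all(axis=1)
print(row_mask.tolist())
```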

What would be the easiest or best way to solve this problem? Can you help me?

~Fabian


asked Nov 09 '18 00:11

PythonF

2 Answers

You can convert the list to a list of tuples, turn the required columns of the DataFrame into one tuple per row, and then use isin:

l = [['DEV1', 'READ1'], ['DEV1', 'READ2'], ['DEV2', 'READ1']]
l = [tuple(i) for i in l]
df[df[['DEVICE', 'READING']].apply(tuple, axis=1).isin(l)]

You get

   Timestamp DEVICE READING        VALUE
0          1   DEV1   READ1  randomvalue
1          2   DEV1   READ2  randomvalue
2          3   DEV2   READ1  randomvalue
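For anyone who wants to run this end to end, here is a self-contained sketch of the same tuple idea. It is a variant, not the exact code above: it builds the per-row tuples with zip instead of apply(tuple, axis=1), which avoids a Python-level call per row and is usually quicker on large frames. The sample data is rebuilt from the question and the names row_keys and wanted are my own.

```python
import pandas as pd

# Sample frame rebuilt from the question; VALUE is a placeholder string.
df = pd.DataFrame({
    "Timestamp": [1, 2, 3, 4, 5],
    "DEVICE": ["DEV1", "DEV1", "DEV2", "DEV2", "DEV3"],
    "READING": ["READ1", "READ2", "READ1", "READ2", "READ1"],
    "VALUE": ["randomvalue"] * 5,
})
pairs = [["DEV1", "READ1"], ["DEV1", "READ2"], ["DEV2", "READ1"]]
wanted = [tuple(p) for p in pairs]  # lists are unhashable, tuples are not

# One (DEVICE, READING) tuple per row, aligned with df's index.
row_keys = pd.Series(list(zip(df["DEVICE"], df["READING"])), index=df.index)

# A row is kept only if its whole pair is in the list.
filtered = df[row_keys.isin(wanted)]
print(filtered)
```

Because the mask is applied to the original frame, the VALUE column and the original column order are kept as they are.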


answered Oct 11 '22 00:10

Vaishali

You can use a multi-index to solve this problem.

values = [['DEV1', 'READ1'], ['DEV1', 'READ2'], ['DEV2', 'READ1']]
# DataFrame.loc requires tuples for multi-index lookups
index_values = [tuple(v) for v in values]

filtered = df.set_index(['DEVICE', 'READING']).loc[index_values].reset_index()
print(filtered)

  DEVICE READING  Timestamp        VALUE
0   DEV1   READ1          1  randomvalue
1   DEV1   READ2          2  randomvalue
2   DEV2   READ1          3  randomvalue
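One caveat with the .loc lookup: as far as I know, recent pandas versions raise a KeyError if any requested pair is missing from the index, and set_index/reset_index moves the key columns to the front. A related sketch, assuming you would rather get a boolean mask, is MultiIndex.isin, which ignores pairs that do not occur. The sample data is rebuilt from the question, and the extra ('DEV9', 'READ1') pair is a made-up entry to show the tolerant behaviour.

```python
import pandas as pd

# Sample frame rebuilt from the question; VALUE is a placeholder string.
df = pd.DataFrame({
    "Timestamp": [1, 2, 3, 4, 5],
    "DEVICE": ["DEV1", "DEV1", "DEV2", "DEV2", "DEV3"],
    "READING": ["READ1", "READ2", "READ1", "READ2", "READ1"],
    "VALUE": ["randomvalue"] * 5,
})

# ('DEV9', 'READ1') is hypothetical and does not occur in df.
index_values = [("DEV1", "READ1"), ("DEV1", "READ2"),
                ("DEV2", "READ1"), ("DEV9", "READ1")]

# Build a MultiIndex from the two key columns without touching df itself.
keys = pd.MultiIndex.from_frame(df[["DEVICE", "READING"]])

# isin returns one boolean per row; absent pairs simply never match.
mask = keys.isin(index_values)
filtered = df[mask]
print(filtered)
```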

answered Oct 10 '22 22:10

Matthias Ossadnik
