
What happens when you compare 2 pandas Series

I ran into unexpected behavior in pandas when comparing two Series, and I would like to know whether it is intended or a bug.

Suppose I run:

import pandas as pd
x = pd.Series([1, 1, 1, 0, 0, 0], index=['a', 'b', 'c', 'd', 'e', 'f'], name='Value')
y = pd.Series([0, 2, 0, 2, 0, 2], index=['c', 'f', 'a', 'e', 'b', 'd'], name='Value')

x > y

yields:

a     True
b    False
c     True
d    False
e    False
f    False
Name: Value, dtype: bool

which isn't what I wanted. I expected the indexes to be aligned before the comparison, but the values appear to be compared by position instead. I have to align them explicitly to get the desired result:

x > y.reindex_like(x)

yields:

a     True
b     True
c     True
d    False
e    False
f    False
Name: Value, dtype: bool

Which is what I expected.

What's worse, if I run:

x + y

I get:

a    1
b    1
c    1
d    2
e    2
f    2
Name: Value, dtype: int64

So with arithmetic operators the indexes are aligned, but with comparison operators they are not. Is my observation accurate? Is this intended for some purpose?
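To make the contrast concrete, here is a minimal sketch that reproduces both results side by side. It assumes the first output above comes from a purely positional comparison, which is an inference from the values shown rather than something confirmed here. The names `positional` and `aligned_sum` are illustrative only.

```python
import pandas as pd

x = pd.Series([1, 1, 1, 0, 0, 0], index=['a', 'b', 'c', 'd', 'e', 'f'], name='Value')
y = pd.Series([0, 2, 0, 2, 0, 2], index=['c', 'f', 'a', 'e', 'b', 'd'], name='Value')

# Assumption: compare the underlying arrays element by element,
# ignoring the index labels entirely (position 0 with position 0, etc.).
positional = pd.Series(x.to_numpy() > y.to_numpy(), index=x.index, name='Value')

# Addition aligns on labels first, so x['a'] is added to y['a'], and so on.
aligned_sum = x + y

print(positional)
print(aligned_sum)
```

The positional result pairs `x['b']` with the second element of `y` (label `f`), which is why `b` comes out `False` even though `y['b']` is 0.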

Thanks,

-PiR

piRSquared asked Aug 21 '14 20:08


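1 Answer

Bug or not, I would suggest building a DataFrame from the two Series and comparing the columns inside it. The DataFrame constructor aligns both Series on their index labels, so the comparison is made label by label.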

import pandas as pd
x = pd.Series([1, 1, 1, 0, 0, 0], index=['a', 'b', 'c', 'd', 'e', 'f'], name='Value_x')
y = pd.Series([0, 2, 0, 2, 0, 2], index=['c', 'f', 'a', 'e', 'b', 'd'], name='Value_y')

df = pd.DataFrame({"Value_x":x, "Value_y":y})
df['Value_x'] > df['Value_y']

Out[3]:

a     True
b     True
c     True
d    False
e    False
f    False
dtype: bool
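If building a DataFrame feels heavy, two other options may be worth trying. This is a sketch, assuming a reasonably recent pandas version: the flex comparison method `Series.gt` aligns on index labels before comparing, and `Series.align` returns both Series reindexed to a common index. The variable names `result_gt`, `x_aligned`, `y_aligned` and `result_align` are illustrative.

```python
import pandas as pd

x = pd.Series([1, 1, 1, 0, 0, 0], index=['a', 'b', 'c', 'd', 'e', 'f'], name='Value')
y = pd.Series([0, 2, 0, 2, 0, 2], index=['c', 'f', 'a', 'e', 'b', 'd'], name='Value')

# Option 1: the flex method aligns y to x by label, then compares.
result_gt = x.gt(y)

# Option 2: align explicitly, then use the plain operator on
# two Series that now share an identical index.
x_aligned, y_aligned = x.align(y)
result_align = x_aligned > y_aligned

print(result_gt)
print(result_align)
```

Note that in newer pandas versions, as far as I know, `x > y` on Series whose indexes are not identical raises a `ValueError` instead of silently comparing by position, so explicit alignment is needed either way.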
firelynx answered Oct 19 '22 09:10