I have a DataFrame loaded from a CSV file, like the one below. I'd like to compare the values of two columns and generate a third column that is True where the values are the same and False where they differ. How can I do this comparison with pandas in Python?
one two
1 a
2 b
3 a
4 b
5 5
6 6
7 7
8 8
9 9
10 10
If the values in column two are mixed (real strings and ints), a direct comparison is all you need:
df['three'] = df.one == df.two
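To illustrate when the direct comparison is enough, here is a minimal sketch. The frames `mixed` and `as_text` are made up for this example and are not the question's data. The first stores real ints next to strings, the second stores every value as text.

```python
import pandas as pd

# Hypothetical frame: column "two" holds real Python ints next to strings.
mixed = pd.DataFrame({
    "one": [1, 2, 5, 6],
    "two": pd.Series(["a", "b", 5, 6], dtype=object),
})
mixed["three"] = mixed.one == mixed.two

# Hypothetical frame: the same values, but every entry in "two" is a string.
as_text = pd.DataFrame({
    "one": [1, 2, 5, 6],
    "two": pd.Series(["a", "b", "5", "6"], dtype=object),
})
as_text["three"] = as_text.one == as_text.two

print(mixed)
print(as_text)
```

In the second frame the int 5 is compared with the string '5', which is not equal, so every row is False. That is the case where a numeric conversion is needed.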
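But if the values are not mixed, that is, the dtype of the first column is int and the second is object (which obviously means strings), you need to_numeric to convert the second column before comparing. With the parameter errors='coerce', to_numeric returns NaN for non-numeric values, and NaN never compares equal to a number, so those rows come out False. This assumes column one contains no NaN values: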
print (pd.to_numeric(df.two, errors='coerce'))
0     NaN
1     NaN
2     NaN
3     NaN
4     5.0
5     6.0
6     7.0
7     8.0
8     9.0
9    10.0
Name: two, dtype: float64
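The coerced rows end up False because NaN does not compare equal to any number. A small sketch of that behaviour, using a made-up two-row Series `raw` rather than the question's data:

```python
import pandas as pd

# Hypothetical input: one non-numeric string and one numeric string.
raw = pd.Series(["a", "5"], dtype=object)

# 'a' cannot be parsed and becomes NaN; '5' becomes 5.0.
converted = pd.to_numeric(raw, errors="coerce")

# Element-wise comparison: 1 vs NaN, then 5 vs 5.0.
matches = pd.Series([1, 5]) == converted
print(matches)
```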
df['three'] = df.one == pd.to_numeric(df.two, errors='coerce')
print (df)
   one two  three
0    1   a  False
1    2   b  False
2    3   a  False
3    4   b  False
4    5   5   True
5    6   6   True
6    7   7   True
7    8   8   True
8    9   9   True
9   10  10   True
The same result with Series.eq, the method form of ==:
df['three'] = df.one.eq(pd.to_numeric(df.two, errors='coerce'))
print (df)
   one two  three
0    1   a  False
1    2   b  False
2    3   a  False
3    4   b  False
4    5   5   True
5    6   6   True
6    7   7   True
7    8   8   True
8    9   9   True
9   10  10   True
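For completeness, a self-contained sketch that starts from CSV text instead of an existing DataFrame. The whitespace separator and the inline `csv_text` string are assumptions made for this example, since the question does not show the actual file or its delimiter.

```python
import io

import pandas as pd

# The question's sample data as text; the whitespace delimiter is an assumption.
csv_text = """one two
1 a
2 b
3 a
4 b
5 5
6 6
7 7
8 8
9 9
10 10
"""

df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")

# Column "two" is read as text because of "a" and "b". Coerce it to numbers
# (non-numeric entries become NaN), then compare element-wise with column "one".
df["three"] = df["one"].eq(pd.to_numeric(df["two"], errors="coerce"))
print(df)
```

With a real file you would pass its path to read_csv in place of the StringIO object and set sep to match the file.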