I understand that a pandas DataFrame can test a logical condition against its values.
Here's the code:
import pandas as pd
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows.
# 0o24204 is the octal literal originally written as 024204 (decimal 10372).
data = pd.DataFrame([
    {'a': 'I have data', 'b': 'no more complexe', 'c': 0o24204},
    {'a': 'audoausd', 'b': '2048rafaf', 'c': 29313},
    {'a': 'koplak ente gan', 'b': 'ente g bisa koplak', 'c': 29313},
], columns=['a', 'b', 'c'])
Now we have the following dataframe:
                 a                   b      c
0      I have data    no more complexe  10372
1         audoausd           2048rafaf  29313
2  koplak ente gan  ente g bisa koplak  29313
Test the condition on column c and save the result to a variable:
c = data.c > 20000
This sets c to the following value:
0 False
1 True
2 True
Name: c, dtype: bool
Test the condition on column b and save the result to a variable:
b = data.b.str.contains('koplak')
The value of b:
0 False
1 False
2 True
Name: b, dtype: bool
And the same for column a:
a = data.a.str.contains('koplak')
The value of a:
0 False
1 False
2 True
Name: a, dtype: bool
When I combine all of these values with a & b & c, the result is:
0 False
1 False
2 True
dtype: bool
Hard-coding this does not scale when many columns are involved, so I tried putting all of the column conditions in a list:
logic = [a, b, c]
How do I combine all the items automatically to get the a & b & c result?
a & b & c
is equivalent to
import functools
print(functools.reduce(lambda x,y: x & y, [a, b, c]))
which yields
0 False
1 False
2 True
dtype: bool
Unlike my original answer below (suggesting np.logical_and.reduce), I am confident functools.reduce(lambda x,y: x & y, [a, b, c]) will faithfully return the same Series as a & b & c.
(In Python2.7, reduce is a builtin function. functools.reduce is the same function as reduce. In Python3, reduce was removed from the builtins and only functools.reduce remains. So to future-proof your code, use functools.reduce.)
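As a minimal sketch of applying this to the logic list from the question: the Series below are hypothetical stand-ins that copy the a, b and c values shown earlier, and operator.and_ is used as the function form of & in place of the lambda.

```python
import functools
import operator

import pandas as pd

# Hypothetical stand-ins for the boolean Series a, b and c from the question.
a = pd.Series([False, False, True])
b = pd.Series([False, False, True])
c = pd.Series([False, True, True])
logic = [a, b, c]

# operator.and_(x, y) is the function form of x & y, so this folds
# the list into ((a & b) & c) no matter how many items it holds.
combined = functools.reduce(operator.and_, logic)
print(combined.tolist())
```

Because the list is folded pairwise, the same line should keep working if more column conditions are appended to logic.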
Edit: Using np.logical_and.reduce([logic]) may not work in all situations. Here is a counterexample:
import pandas as pd
import numpy as np
x = pd.Series([True,True,False,False], index=[1,2,3,4])
y = pd.Series([True,True,False,False], index=[1,2,3,4])
print(x & y)
prints
1 True
2 True
3 False
4 False
dtype: bool
but np.logical_and.reduce([x,y]) raises a ValueError:
print(np.logical_and.reduce([x,y]))
File "/data1/unutbu/.virtualenvs/dev/local/lib/python2.7/site-packages/pandas-0.13.0_98_gd9b0c1f-py2.7-linux-i686.egg/pandas/core/generic.py", line 665, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
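For comparison, here is a small sketch of the functools.reduce approach on the same x and y. It only relies on the & operator of Series, so it is expected to match x & y and keep the index; operator.and_ is again used as a stand-in for the lambda.

```python
import functools
import operator

import pandas as pd

x = pd.Series([True, True, False, False], index=[1, 2, 3, 4])
y = pd.Series([True, True, False, False], index=[1, 2, 3, 4])

# Fold the list with &; the result is a Series, not a bare array.
result = functools.reduce(operator.and_, [x, y])
print(result.equals(x & y))
```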