I have a simple dataframe as follows:
Last Known Date ConfigredValue ReferenceValue
0 24-Jun-17 False FALSE
1 25-Jun-17 FALSE FALSE
2 26-Jun-17 TRUE FALSE
3 27-Jun-17 FALSE FALSE
4 28-Jun-17 false FALSE
If I run the following command
df=df[df['ConfigredValue']!=df['ReferenceValue']]
then I get the following:
0 24-Jun-17 False FALSE
2 26-Jun-17 TRUE FALSE
4 28-Jun-17 false FALSE
But I want the filter to be case-insensitive (something like case=False).
I want the following output:
2 26-Jun-17 TRUE FALSE
Please suggest how to filter the data case-insensitively (case=False).
The simplest approach is to convert the two columns to lower (or upper) case before checking for equality:
df=df[df['ConfigredValue'].str.lower()!=df['ReferenceValue'].str.lower()]
or
df=df[df['ConfigredValue'].str.upper()!=df['ReferenceValue'].str.upper()]
output:
Out:
Last Known Date ConfigredValue ReferenceValue
2 2 26-Jun-17 TRUE FALSE
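For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the frame from the question by hand and assumes both columns hold plain strings; if pandas parsed them as booleans while loading the data, the .str accessor would not apply and the values would need converting to strings first. The names mask and filtered are just illustrative.

```python
import pandas as pd

# Rebuild the question's data by hand; both value columns are assumed to be strings.
df = pd.DataFrame({
    "Last Known Date": ["24-Jun-17", "25-Jun-17", "26-Jun-17", "27-Jun-17", "28-Jun-17"],
    "ConfigredValue": ["False", "FALSE", "TRUE", "FALSE", "false"],
    "ReferenceValue": ["FALSE"] * 5,
})

# Normalise both sides to lower case, then keep the rows where they differ.
mask = df["ConfigredValue"].str.lower() != df["ReferenceValue"].str.lower()
filtered = df[mask]

print(filtered)  # only the row with index 2 (TRUE vs FALSE) remains
```

Building the mask separately keeps the original df untouched, which helps if you want to try the other variants on the same data.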
In this particular case, you can simply compare the string lengths: TRUE (4 characters) and FALSE (5 characters) differ in length, and the length is the same whether the string is upper or lower case:
df[df['ConfigredValue'].str.len()!=df['ReferenceValue'].str.len()]
output:
Out:
Last Known Date ConfigredValue ReferenceValue
2 2 26-Jun-17 TRUE FALSE
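Note that the length shortcut only works because the columns contain nothing but variants of TRUE and FALSE. The sketch below uses made-up values ("Yes" and "NOP" are hypothetical, not from the question) to show how the two masks can disagree once other strings of equal length appear.

```python
import pandas as pd

# Hypothetical values, chosen so that the last pair differs but has equal length.
s1 = pd.Series(["TRUE", "true", "Yes"])
s2 = pd.Series(["FALSE", "TRUE", "NOP"])

# Length comparison: treats the last pair as equal because both are 3 characters long.
len_mask = s1.str.len() != s2.str.len()

# Case-insensitive comparison: correctly flags the last pair as different.
lower_mask = s1.str.lower() != s2.str.lower()

print(len_mask.tolist())
print(lower_mask.tolist())
```

So str.len() is probably best kept for data you know is strictly true/false; str.lower() is the safer general choice.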
str.title() was also suggested in @0p3n5ourcE's answer; here's a vectorized version of it:
df[df['ConfigredValue'].str.title()!=df['ReferenceValue'].str.title()]
Benchmarking the speed shows that str.len() is only marginally faster (about 479 µs versus about 496 µs for the others):
In [35]: timeit df[df['ConfigredValue'].str.lower()!=df['ReferenceValue'].str.lower()]
1000 loops, best of 3: 496 µs per loop
In [36]: timeit df[df['ConfigredValue'].str.upper()!=df['ReferenceValue'].str.upper()]
1000 loops, best of 3: 496 µs per loop
In [37]: timeit df[df['ConfigredValue'].str.title()!=df['ReferenceValue'].str.title()]
1000 loops, best of 3: 495 µs per loop
In [38]: timeit df[df['ConfigredValue'].str.len()!=df['ReferenceValue'].str.len()]
1000 loops, best of 3: 479 µs per loop
Better to replace the existing false values (in any case) with 'FALSE' using the case=False parameter of str.replace, i.e.
df['ConfigredValue'] = df['ConfigredValue'].str.replace('false','FALSE',case=False)
df=df[df['ConfigredValue']!=df['ReferenceValue']]
Output:
Last Known_Date ConfigredValue ReferenceValue
2 2 26-Jun-17 TRUE FALSE