I have a DataFrame that I built while parsing data from a mainframe ("Big Iron"... it still exists). I want to remove some rows using a regular expression, but I don't know how that works in pandas.
24 | DRFT.146.856 | Dollar- | (60.00) | DEBITS- | 0.00 | CREDITSDRA- | 0.00
25 | 0616-21.01 | 2407 | WAYZAT | TMCD | JUNE | 16,DRA | 2013
26 | AND | CORRECTION | JOURNAL00 | <DB> | KLRETY | CATEGORYDRA- | *
27 | DRFT.146.867 | Dollar- | (200.00) | DEBITS- | 0.00 | CREDITSDRA- | 0.00
28 | DRFT.146.922 | Dollar- | (25.00) | DEBITS- | 0.00 | CREDITSDRA- | 0.00
29 | DRFT.146.963 | Dollar- | (100.00) | DEBITS- | 0.00 | CREDITSDRA- | 0.00
30 | DRFT.146.964 | Dollar- | (100.00) | DEBITS- | 0.00 | CREDITSDRA- | 0.00
The rows of concern are 25 and 26, where the data does not follow the pattern of the other rows. Any clues?
A couple of possible contenders:
In [11]: df[2].str.contains('Dollar')
Out[11]:
0 True
1 False
2 False
3 True
4 True
5 True
6 True
Name: 2, dtype: bool
In [12]: df[3].str.startswith('(')
Out[12]:
0 True
1 False
2 False
3 True
4 True
5 True
6 True
Name: 3, dtype: bool
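Since the question asks specifically about regular expressions, the same checks can be written as patterns. This is only a sketch: the frame is rebuilt by hand from a few of the sample rows, and it assumes every column holds strings and uses the integer column labels 0-7. The patterns themselves are guesses at what a "good" row looks like, not something the data guarantees.

```python
import pandas as pd

# A few sample rows rebuilt by hand from the question. Keeping every
# value as a string is an assumption about how the data was parsed.
rows = [
    ["24", "DRFT.146.856", "Dollar-", "(60.00)", "DEBITS-", "0.00", "CREDITSDRA-", "0.00"],
    ["25", "0616-21.01", "2407", "WAYZAT", "TMCD", "JUNE", "16,DRA", "2013"],
    ["26", "AND", "CORRECTION", "JOURNAL00", "<DB>", "KLRETY", "CATEGORYDRA-", "*"],
    ["27", "DRFT.146.867", "Dollar-", "(200.00)", "DEBITS-", "0.00", "CREDITSDRA-", "0.00"],
]
df = pd.DataFrame(rows)  # columns are labelled 0..7

# Series.str.match anchors the pattern at the start of each string;
# the trailing $ makes it cover the whole field.
# This accepts a parenthesised amount such as "(60.00)".
is_amount = df[3].str.match(r"\(\d+\.\d{2}\)$")

# The draft-number column can be checked the same way.
is_draft = df[1].str.match(r"DRFT\.\d+\.\d+$")
```

Note that `str.contains` also treats its argument as a regular expression by default, so `df[2].str.contains('Dollar')` above is already a regex search.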
Doing this kind of thing is always a bit of a dark art (as there is usually a lot of data and some could look very similar to the good data)...
In [13]: df[df[3].str.startswith('(')]
Out[13]:
0 1 2 3 4 5 6 7
0 24 DRFT.146.856 Dollar- (60.00) DEBITS- 0.00 CREDITSDRA- 0
3 27 DRFT.146.867 Dollar- (200.00) DEBITS- 0.00 CREDITSDRA- 0
4 28 DRFT.146.922 Dollar- (25.00) DEBITS- 0.00 CREDITSDRA- 0
5 29 DRFT.146.963 Dollar- (100.00) DEBITS- 0.00 CREDITSDRA- 0
6 30 DRFT.146.964 Dollar- (100.00) DEBITS- 0.00 CREDITSDRA- 0
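Because some bad rows may resemble good ones, it can help to require more than one condition and to look at what gets rejected before discarding it. A minimal sketch, again assuming string columns labelled 0-7 and a hand-built subset of the sample rows; the names `good`, `rejected` and `clean` are only illustrative.

```python
import pandas as pd

# Hand-built subset of the sample data (all values assumed to be strings).
rows = [
    ["24", "DRFT.146.856", "Dollar-", "(60.00)", "DEBITS-", "0.00", "CREDITSDRA-", "0.00"],
    ["25", "0616-21.01", "2407", "WAYZAT", "TMCD", "JUNE", "16,DRA", "2013"],
    ["26", "AND", "CORRECTION", "JOURNAL00", "<DB>", "KLRETY", "CATEGORYDRA-", "*"],
    ["27", "DRFT.146.867", "Dollar-", "(200.00)", "DEBITS-", "0.00", "CREDITSDRA-", "0.00"],
]
df = pd.DataFrame(rows)

# Combine two boolean masks with & so that a single look-alike field
# is not enough for a row to pass.
good = df[2].str.contains(r"^Dollar-$") & df[3].str.match(r"\(\d+\.\d{2}\)$")

# ~ inverts the mask: these are the rows that would be dropped.
rejected = df[~good]

# Keep the matching rows and renumber the index from 0.
clean = df[good].reset_index(drop=True)
```

Printing `rejected` before throwing it away is a cheap way to check that the patterns are not discarding rows you wanted to keep.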