This is actually a follow up question of here. I had not been clear in my previous question, and since it has been answered, I felt it was better to post a new question instead.
I have a dataframe like below:
Column1 Column2 Column3 Column4 Column5
5FQ 1.047 S$55.3 UG44.2 as of 02/Jun/2016 S$8.2 mm
600 (1.047) S$23.3 AG5.6 as of 02/Jun/2016 S$58 mm
KI2 1.695 S$5.35 RR59.5 as of 02/Jun/2016 S$705 mm
88G 0.0025 S$(5.3) NW44.2 as of 02/Jun/2016 S$112 mm
60G 5.63 S$78.4 UG21.2 as of 02/Jun/2016 S$6.21 mm
90F (5.562) S$(88.3) IG46.2 as of 02/Jun/2016 S$8 mm
I am trying to use regex
to drop all the words and letters, only keeping the numbers. However, if the number is enclosed within a ()
, I would like to make it a negative number instead.
Desired output
Column1 Column2 Column3 Column4 Column5
5 1.047 55.3 44.2 8.2
600 -1.047 23.3 5.6 58
2 1.695 5.35 59.5 705
88 0.0025 -5.3 44.2 112
60 5.63 78.4 21.2 6.21
90 -5.562 -88.3 46.2 8
Would this be possible? I've tried playing around with this code, but was not sure what the appropriate regex
combination should be.
df.apply(lambda x: x.astype(str).str.extract(r'(\d+\.?\d*)', expand=True).astype(np.float))
r1 = r'\((\d+\.?\d*)\)'
r2 = r'(-?\d+\.?\d*)'
df.stack().str.replace(r1, r'-\1', 1) \
.str.extract(r2, expand=False).unstack()
UPDATE: $1,005A
--> 1005
(example in 1st row, column Column3
)
In [131]: df
Out[131]:
Column1 Column2 Column3 Column4 Column5
0 5FQ 1.047 $1,005A UG44.2 as of 02/Jun/2016 S$8.2 mm
1 600 (1.047) S$23.3 AG5.6 as of 02/Jun/2016 S$58 mm
2 KI2 1.695 S$5.35 RR59.5 as of 02/Jun/2016 S$705 mm
3 88G 0.0025 S$(5.3) NW44.2 as of 02/Jun/2016 S$112 mm
4 60G 5.63 S$78.4 UG21.2 as of 02/Jun/2016 S$6.21 mm
5 90F (5.562) S$(88.3) IG46.2 as of 02/Jun/2016 S$8 mm
In [132]: to_replace = [r'\(([\d\.]+)\)', r'.*?([\d\.\,\-]+).*', ',']
In [133]: vals = [r'-\1', r'\1', '']
In [134]: df.replace(to_replace=to_replace, value=vals, regex=True)
Out[134]:
Column1 Column2 Column3 Column4 Column5
0 5 1.047 1005 44.2 8.2
1 600 -1.047 23.3 5.6 58
2 2 1.695 5.35 59.5 705
3 88 0.0025 -5.3 44.2 112
4 60 5.63 78.4 21.2 6.21
5 90 -5.562 -88.3 46.2 8
OLD answer:
Yet another solution, which uses only DataFrame.replace() method:
In [28]: to_replace = [r'\(([\d\.]+)\)', r'.*?([\d\.-]+).*']
In [29]: vals = [r'-\1', r'\1']
In [30]: df.replace(to_replace=to_replace, value=vals, regex=True)
Out[30]:
Column1 Column2 Column3 Column4 Column5
0 5 1.047 55.3 44.2 8.2
1 600 -1.047 23.3 5.6 58
2 2 1.695 5.35 59.5 705
3 88 0.0025 -5.3 44.2 112
4 60 5.63 78.4 21.2 6.21
5 90 -5.562 -88.3 46.2 8
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With