I have a column Column1 in a pandas dataframe which is of type str, with values in the following form:
import pandas as pd
df = pd.read_table("filename.dat")
type(df["Column1"].iloc[0])  # outputs 'str'
print(df["Column1"].iloc[0])
which outputs '1/350'. So, this is currently a string. I would like to convert it into a float.
I tried this:
df["Column1"] = df["Column1"].astype('float64', raise_on_error = False)
But this didn't change the values into floats.
This also failed:
df["Column1"] = df["Column1"].convert_objects(convert_numeric=True)
And this failed:
df["Column1"] = df["Column1"].apply(pd.to_numeric, args=('coerce',))
How do I convert all the values of column "Column1" into floats? Could I somehow use regex to remove the parentheses?
EDIT:
The line
df["Meth"] = df["Meth"].apply(eval)
works, but only if I use it twice, i.e.
df["Meth"] = df["Meth"].apply(eval)
df["Meth"] = df["Meth"].apply(eval)
Why would this be?
You need to evaluate the expression (e.g. '1/350') in order to get the result, for which you can use Python's eval() function.
By wrapping pandas' apply() function around it, you can then execute eval() on every value in your column. Example:
df["Column1"].apply(eval)
As you're interpreting literals, you could in principle also use the ast.literal_eval function as noted in the docs. Update: This won't work here, as literal_eval() only evaluates literals plus limited additions and subtractions, so a division such as '1/350' is rejected (source).
Remark: as mentioned in other answers and comments on this question, the use of eval() is not without risks, as you're basically executing whatever input is passed in. In other words, if your input contains malicious code, you're giving it a free pass.
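If the eval() risk above is a concern, one possible way around it is the standard library's fractions.Fraction, which parses "numerator/denominator" strings without executing anything. This is only a minimal sketch and not part of the original answer; the sample data and the Column1_float column name are made up for illustration.

```python
from fractions import Fraction

import pandas as pd

# Hypothetical sample data standing in for the real column.
df = pd.DataFrame({"Column1": ["1/350", "3/4", "10/4"]})

# Fraction() parses "numerator/denominator" strings without running code;
# float() then turns the exact fraction into a floating-point number.
df["Column1_float"] = df["Column1"].apply(lambda s: float(Fraction(s)))

print(df["Column1_float"].tolist())
```

Anything that is not a valid number or fraction string makes Fraction() raise a ValueError instead of being executed.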
Alternative option:
# Define a custom div function
def div(a, b):
    return int(a) / int(b)
# Split each string and pass the values to div
df_floats = df['col1'].apply(lambda x: div(*x.split('/')))
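The same split-and-divide idea can also be written with pandas' vectorized string methods instead of apply(). The following is a sketch under the assumption that every value is a clean "a/b" string; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; 'col1' mirrors the column name used above.
df = pd.DataFrame({"col1": ["1/350", "3/4", "10/4"]})

# Split each string on "/" into two columns (0 = numerator, 1 = denominator)
# and convert both columns to float.
parts = df["col1"].str.split("/", expand=True).astype(float)

# Divide the numerator column by the denominator column element-wise.
df_floats = parts[0] / parts[1]

print(df_floats.dtype)
```

This avoids calling a Python function per row, although I have not timed it against the versions in this answer.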
Second alternative in case of unclean data:
By using regular expressions, we can strip any non-digit characters that appear before the numerator or after the denominator.
# Define a custom div function (unchanged)
def div(a, b):
    return int(a) / int(b)
# We'll import the re module and define a precompiled pattern
import re
regex = re.compile(r'\D*(\d+)/(\d+)\D*')
df_floats = df['col1'].apply(lambda x: div(*regex.findall(x)[0]))
We'll lose a bit of performance, but the upside is that even with input like '!erefdfs?^dfsdf1/350dqsd qsd qs d', we still end up with the value of 1/350.
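For unclean data like this, pandas' own Series.str.extract can pull out the two digit groups in one step. This is a rough sketch rather than the method timed in this answer, and the sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical unclean data, including the noisy example from above.
df = pd.DataFrame(
    {"col1": ["!erefdfs?^dfsdf1/350dqsd qsd qs d", "3/4", "x10/4y"]}
)

# Capture the first "digits/digits" occurrence in each string;
# the result has column 0 (numerator) and column 1 (denominator).
parts = df["col1"].str.extract(r"(\d+)/(\d+)").astype(float)

# Divide numerator by denominator element-wise.
df_floats = parts[0] / parts[1]

print(df_floats.tolist())
```

Rows without any "digits/digits" match come back from str.extract as missing values, so they would need separate handling.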
Performance:
When timing the options on a dataframe with 100,000 rows, the second option (using the user-defined div function) clearly wins:
eval: 1 loop, best of 3: 1.41 s per loop
div: 10 loops, best of 3: 159 ms per loop
re: 1 loop, best of 3: 275 ms per loop
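To reproduce a comparison like this yourself, something along the following lines should do. It is only a sketch: the benchmark data is generated here (10,000 rows rather than 100,000 to keep it quick), the candidates and timings names are my own, and the absolute numbers will differ by machine and by Python and pandas version.

```python
import re
import timeit

import pandas as pd


def div(a, b):
    return int(a) / int(b)


pattern = re.compile(r"\D*(\d+)/(\d+)\D*")

# Hypothetical benchmark data: 10,000 fraction strings.
df = pd.DataFrame({"col1": [f"{i}/350" for i in range(1, 10_001)]})

# The three approaches from this answer, wrapped as zero-argument callables.
candidates = {
    "eval": lambda: df["col1"].apply(eval),
    "div": lambda: df["col1"].apply(lambda x: div(*x.split("/"))),
    "re": lambda: df["col1"].apply(lambda x: div(*pattern.findall(x)[0])),
}

# Run each approach three times and keep the best wall-clock time.
timings = {}
for name, func in candidates.items():
    timings[name] = min(timeit.repeat(func, number=1, repeat=3))
    print(f"{name}: {timings[name] * 1000:.1f} ms")
```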