I am trying to change the type of the numbers in a column of a DataFrame from integer to floating point. It should be straightforward to do this, but it's not working: the data type remains integer. What am I missing?
In [2]: using DataFrames
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
Out [2]: 4x2 DataFrame
| Row | A | B |
|-----|---|-----|
| 1 | 1 | "M" |
| 2 | 2 | "F" |
| 3 | 3 | "F" |
| 4 | 4 | "M" |
In [3]: df[:,:A] = float64(df[:,:A])
Out [3]: 4-element DataArray{Float64,1}:
1.0
2.0
3.0
4.0
In [4]: df
Out [4]: 4x2 DataFrame
| Row | A | B |
|-----|---|-----|
| 1 | 1 | "M" |
| 2 | 2 | "F" |
| 3 | 3 | "F" |
| 4 | 4 | "M" |
In [5]: typeof(df[:,:A])
Out [5]: DataArray{Int64,1} (constructor with 1 method)
The reason this happens is mutation and conversion. If you have two vectors
a = [1, 2, 3]
b = [4, 5, 6]
you can make x refer to one of them with assignment.
x = a
Now x and a refer to the same vector [1, 2, 3]. If you then assign b to x
x = b
you have now changed x to refer to the same vector as b refers to.
You can also mutate vectors by copying over the values in one vector to the other. If you do
x[:] = a
you copy over the values in vector a to the vector that x refers to, which is now b, so you have two separate vectors that both contain [1, 2, 3].
Then there is also conversion. If you copy a value of one type into a vector with a different element type, Julia will attempt to convert the value to the vector's element type.
x[1] = 5.0
This gives you the vector [5, 2, 3] because Julia converted the Float64 value 5.0 to the Int value 5. If you tried
x[1] = 5.5
Julia will throw an InexactError() because 5.5 can't be losslessly converted to an integer.
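The three cases above (rebinding, mutation, conversion) can be checked in one place. This is only a minimal sketch using base Julia, with the vectors written out as literals so that it should behave the same on recent Julia versions; the variable threw is a hypothetical helper added for the last check.

```julia
a = [1, 2, 3]
b = [4, 5, 6]

x = a              # assignment: x is now another name for a's vector
@assert x === a    # same object, not a copy

x = b              # rebinding: x now names b's vector; a is untouched
@assert x === b
@assert a == [1, 2, 3]

x[:] = a           # mutation: copy a's values into the vector x (and b) refer to
@assert b == [1, 2, 3]
@assert x !== a    # still two distinct vectors

x[1] = 5.0         # conversion: 5.0 is stored as the Int 5
@assert b == [5, 2, 3]
@assert eltype(x) == Int

# 5.5 has no exact Int representation, so this assignment is expected to throw
threw = try
    x[1] = 5.5
    false
catch err
    err isa InexactError
end
@assert threw
```

The point to notice is that x[:] = ... and x[1] = ... never change the element type of the vector; only assigning a different vector to the name x can do that.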
When it comes to DataFrames things work the same as long as you realize a DataFrame is a collection of named references to vectors. So what you are doing when constructing the DataFrame in this call
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
is that you create the vector [1, 2, 3, 4], and the vector ["M", "F", "F", "M"]. You then construct a DataFrame with references to these two new vectors.
Later when you do
df[:,:A] = float64(df[:,:A])
you first create a new vector by converting the values in the vector [1, 2, 3, 4] into Float64. You then mutate the vector referred to with df[:A] by copying over the values in the Float64 vector back into the Int vector, which causes Julia to convert the values back to Int.
What Colin T Bower's answer
df[:A] = float64(df[:A])
does is that, rather than mutating the vector referred to by the DataFrame, it changes the reference so that it points to the new vector with the Float64 values.
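The difference between the two calls can be imitated without DataFrames at all. The sketch below is only a model, assuming a recent Julia (1.x): a Dict of vectors stands in for the "collection of named references to vectors", and Float64.(v) stands in for float64(v), which newer Julia versions no longer provide. The names cols and original are hypothetical and are not part of any DataFrames API.

```julia
# A stand-in for a DataFrame: column names mapped to vectors (hypothetical model).
cols = Dict{Symbol,Vector}(:A => [1, 2, 3, 4], :B => ["M", "F", "F", "M"])

original = cols[:A]   # keep a handle on the Int vector created above

# Like df[:, :A] = float64(df[:, :A]): write Float64 values INTO the existing
# Int vector, so each value is converted back to Int on the way in.
cols[:A][:] = Float64.(cols[:A])
@assert cols[:A] === original
@assert eltype(cols[:A]) == Int

# Like df[:A] = float64(df[:A]): make the name :A refer to a NEW Float64 vector.
cols[:A] = Float64.(cols[:A])
@assert cols[:A] !== original
@assert eltype(cols[:A]) == Float64
@assert eltype(original) == Int   # the old vector itself never changed type
```

In this model the first form can only ever change values, while the second replaces the whole column, which mirrors the behaviour described above.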
I hope this makes sense.
Try this:
df[:A] = float64(df[:A])
This works for me on Julia v0.3.5 with DataFrames v0.6.1.
This is quite interesting though. Notice that:
df[:, :A] = [2.0, 2.0, 3.0, 4.0]
will change the contents of the column to [2, 2, 3, 4], but leaves the type as Int64, while
df[:A] = [2.0, 2.0, 3.0, 4.0]
will also change the type.
I just had a quick look at the manual and couldn't see any reference to this behaviour (admittedly it was a very quick look). But I find this unintuitive enough that it is perhaps worth filing an issue.