I am trying to change the type of the numbers in a DataFrame column from integer to floating point. This should be straightforward, but it isn't working: the data type remains integer. What am I missing?
In  [2]: using DataFrames
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
Out [2]: 4x2 DataFrame
| Row | A | B   |
|-----|---|-----|
| 1   | 1 | "M" |
| 2   | 2 | "F" |
| 3   | 3 | "F" |
| 4   | 4 | "M" |
In  [3]: df[:,:A] = float64(df[:,:A])
Out [3]: 4-element DataArray{Float64,1}:
 1.0
 2.0
 3.0
 4.0
In  [4]: df
Out [4]: 4x2 DataFrame
| Row | A | B   |
|-----|---|-----|
| 1   | 1 | "M" |
| 2   | 2 | "F" |
| 3   | 3 | "F" |
| 4   | 4 | "M" |
In  [5]: typeof(df[:,:A])
Out [5]: DataArray{Int64,1} (constructor with 1 method)
A DataFrame is a two-dimensional, mutable data structure used for handling tabular data. Unlike Arrays and Matrices, a DataFrame can hold columns of different data types. The DataFrames package in Julia provides the DataFrame object, which holds and manipulates tabular data in a flexible and convenient way.
The reason this happens is mutation and conversion. If you have two vectors
a = [1, 2, 3]
b = [4, 5, 6]
you can make x refer to one of them with assignment.
x = a
Now x and a refer to the same vector [1, 2, 3]. If you then assign b to x 
x = b
you have now changed x to refer to the same vector as b refers to. 
You can also mutate vectors by copying over the values in one vector to the other. If you do
x[:] = a
you copy the values of vector a into the vector that x refers to, which is b, so now you have two vectors containing [1, 2, 3].
Then there is also conversion. If you copy a value of one type into a vector with a different element type, Julia will attempt to convert the value to the vector's element type.
x[1] = 5.0
This gives you the vector [5, 2, 3], because Julia converted the Float64 value 5.0 to the Int value 5. If you tried
x[1] = 5.5
Julia would throw an InexactError, because 5.5 can't be losslessly converted to an integer.
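The three ideas above (rebinding, mutation, and conversion) can be checked in one short script. This is only a minimal sketch that uses plain vectors and nothing from DataFrames; the variable `threw` is a name introduced here for illustration.

```julia
a = [1, 2, 3]
b = [4, 5, 6]

x = a                  # binding: x and a now name the same vector
@assert x === a

x = b                  # rebinding: x now names b's vector, a is untouched
@assert x === b
@assert a == [1, 2, 3]

x[:] = a               # mutation: copy a's values into the vector x (and b) name
@assert b == [1, 2, 3]
@assert x !== a        # still two distinct vectors with equal contents

x[1] = 5.0             # conversion: 5.0 is stored as the Int 5
@assert b == [5, 2, 3]
@assert eltype(x) == Int

# 5.5 has no exact Int representation, so the assignment fails
threw = try
    x[1] = 5.5
    false
catch err
    err isa InexactError
end
@assert threw
```

Note that the element type of the vector never changes in any of these steps. Only rebinding a name to a different vector can give you a different element type.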
DataFrames work the same way once you realize that a DataFrame is a collection of named references to vectors. When you construct the DataFrame with this call
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])
you create the vector [1, 2, 3, 4] and the vector ["M", "F", "F", "M"], and the DataFrame holds references to these two new vectors.
Later when you do
df[:,:A] = float64(df[:,:A])
you first create a new vector by converting the values of [1, 2, 3, 4] to Float64. You then mutate the vector that df[:A] refers to by copying the Float64 values back into the Int vector, which makes Julia convert them back to Int.
What Colin T Bower's answer
df[:A] = float64(df[:A])
does is that, rather than mutating the vector referred to by the DataFrame, it changes the reference to point to the vector with the Float64 values.
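To make the difference concrete without depending on any particular DataFrames version, here is a rough sketch that models a DataFrame as a plain Dict from column names to vectors. The names `cols` and `original` are hypothetical stand-ins introduced for this illustration; a real DataFrame is more than a Dict, but the reference semantics described above are the same.

```julia
# Hypothetical stand-in for a DataFrame: column names mapped to vectors
cols = Dict{Symbol,Any}(:A => [1, 2, 3, 4], :B => ["M", "F", "F", "M"])
original = cols[:A]    # keep a handle on the Int vector

# Mutation, analogous to df[:, :A] = ...: the Float64 values are copied
# into the existing Int vector and converted back to Int on the way in
cols[:A][:] = map(Float64, cols[:A])
@assert cols[:A] === original
@assert eltype(cols[:A]) == Int

# Rebinding, analogous to df[:A] = ...: the name :A now refers to a new vector
cols[:A] = map(Float64, cols[:A])
@assert cols[:A] !== original
@assert eltype(cols[:A]) == Float64
@assert eltype(original) == Int    # the old vector itself never changed type
```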
I hope this makes sense.
Try this:
df[:A] = float64(df[:A])
This works for me on Julia v0.3.5 with DataFrames v0.6.1.
This is quite interesting though. Notice that:
df[:, :A] = [2.0, 2.0, 3.0, 4.0]
will change the contents of the column to [2,2,3,4], but leaves the type as Int64, while 
df[:A] = [2.0, 2.0, 3.0, 4.0]
will also change the type.
I just had a quick look at the manual and couldn't see any reference to this behaviour (admittedly it was a very quick look). But I find it unintuitive enough that it may be worth filing an issue.
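For readers on a current setup: the snippets in this thread target Julia v0.3.5 and DataFrames v0.6.1, and neither `float64` nor `df[:A]` is available in recent releases. The sketch below assumes a recent DataFrames.jl (1.x), where, as far as I understand the documentation, the row selector makes the distinction explicit: `:` assigns in place and `!` replaces the column. Treat it as an illustration under that assumption, not as something tested against the versions discussed above.

```julia
using DataFrames   # assumes DataFrames.jl 1.x, not the v0.6.1 used in this thread

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

# `:` writes into the existing column, so the values are converted to Int
df[:, :A] = [2.0, 2.0, 3.0, 4.0]
@assert df.A == [2, 2, 3, 4]
@assert eltype(df.A) == Int

# `!` replaces the column with the new vector, so the element type changes
df[!, :A] = float.(df.A)
@assert eltype(df.A) == Float64
```

In those versions, `df.A = float.(df.A)` should be an equivalent way to replace the column.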