
Julia: converting column type from Integer to Float64 in a DataFrame

I am trying to change the type of the numbers in a column of a DataFrame from integer to floating point. This should be straightforward, but it is not working: the column's element type remains an integer. What am I missing?

In  [2]: using DataFrames
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

Out [2]: 4x2 DataFrame
| Row | A | B   |
|-----|---|-----|
| 1   | 1 | "M" |
| 2   | 2 | "F" |
| 3   | 3 | "F" |
| 4   | 4 | "M" |

In  [3]: df[:,:A] = float64(df[:,:A])

Out [3]: 4-element DataArray{Float64,1}:
 1.0
 2.0
 3.0
 4.0

In  [4]: df

Out [4]: 4x2 DataFrame
| Row | A | B   |
|-----|---|-----|
| 1   | 1 | "M" |
| 2   | 2 | "F" |
| 3   | 3 | "F" |
| 4   | 4 | "M" |

In  [5]: typeof(df[:,:A])

Out [5]: DataArray{Int64,1} (constructor with 1 method)
Pooya asked Feb 27 '15 01:02



2 Answers

This happens because of the difference between assignment and mutation, combined with automatic conversion. If you have two vectors

a = [1, 2, 3]
b = [4, 5, 6]

you can make x refer to one of them with assignment.

x = a

Now x and a refer to the same vector [1, 2, 3]. If you then assign b to x

x = b

you have now changed x to refer to the same vector as b refers to.

You can also mutate vectors by copying over the values in one vector to the other. If you do

x[:] = a

you copy the values in vector a into the vector that x refers to, which is b. So now you have two separate vectors that both contain [1, 2, 3].
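The difference between rebinding a name and mutating a vector can be checked with `===`, which tests whether two names refer to the same object. This is only a minimal sketch in plain Julia; the helper names `same_as_a` and `same_as_b` are made up for illustration.

```julia
a = [1, 2, 3]
b = [4, 5, 6]

x = a                   # assignment: x is now another name for the vector a refers to
same_as_a = x === a     # true, same object

x = b                   # assignment again: x is rebound, a and b are untouched
same_as_b = x === b     # true, x now names b's vector

x[:] = a                # mutation: a's values are copied into the vector x (and b) refer to

# b has been overwritten through x, but a and b are still two distinct vectors
println(b)
println(a === b)
```

After the last step `b` holds the same values as `a`, yet `a === b` is still false because only the contents were copied.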

Then there is also conversion. If you copy a value of one type into a vector with a different element type, Julia will attempt to convert the value to the vector's element type.

x[1] = 5.0

This gives you the vector [5, 2, 3] because Julia converted the Float64 value 5.0 to the Int value 5. If you tried

x[1] = 5.5

Julia will throw an InexactError because 5.5 can't be losslessly converted to an integer.
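Both cases can be tried in one go. The following is a small sketch in plain Julia; the variable `failed` is just an illustrative name for recording whether the second assignment raised the expected error.

```julia
x = [1, 2, 3]           # a Vector of Int

x[1] = 5.0              # 5.0 has an exact Int representation, so it is stored as 5

# 5.5 has no exact Int representation, so the assignment throws
failed = try
    x[1] = 5.5
    false
catch err
    err isa InexactError
end

println(x)              # the vector still holds Ints
println(failed)
```

Note that the failed assignment leaves the vector unchanged, so the element type and the first value stay as they were after the successful conversion.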

When it comes to DataFrames things work the same as long as you realize a DataFrame is a collection of named references to vectors. So what you are doing when constructing the DataFrame in this call

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

is that you create the vector [1, 2, 3, 4], and the vector ["M", "F", "F", "M"]. You then construct a DataFrame with references to these two new vectors.
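The "named references to vectors" idea can be made visible in a recent DataFrames.jl. This sketch assumes DataFrames.jl 1.x, which is not the version discussed in this thread; there the constructor copies its columns by default, so `copycols = false` is passed to make the DataFrame hold the very vector it was given.

```julia
using DataFrames

v = [1, 2, 3, 4]

# copycols = false asks the constructor to store v itself rather than a copy
df = DataFrame(A = v, B = ["M", "F", "F", "M"], copycols = false)

holds_same_vector = df.A === v        # the column is the vector v
bang_is_reference = df[!, :A] === v   # `!` hands back the stored vector
colon_is_copy     = df[:, :A] !== v   # `:` hands back a fresh copy

println((holds_same_vector, bang_is_reference, colon_is_copy))
```

With the default `copycols = true` the first two checks would be false, because the DataFrame would refer to its own copy of `v`.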

Later when you do

df[:,:A] = float64(df[:,:A])

you first create a new vector by converting the values in the vector [1, 2, 3, 4] into Float64. You then mutate the vector referred to with df[:A] by copying over the values in the Float64 vector back into the Int vector, which causes Julia to convert the values back to Int.

What Colin T Bowers' answer

df[:A] = float64(df[:A])

does is, rather than mutating the vector referred to by the DataFrame, change the reference so that it refers to the vector with the Float64 values.
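For readers on a current setup, here is a sketch of the same distinction, assuming DataFrames.jl 1.x and Julia 1.x rather than the versions used in this thread. There `float64` and the `df[:A]` form are no longer available; `float.(...)` is used for the conversion, `df[:, :A] = ...` assigns in place, and `df[!, :A] = ...` replaces the column. The two `eltype_after_*` names are only illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

# In-place assignment: the Float64 values are copied into the existing Int
# column, so they are converted back and the element type does not change.
df[:, :A] = float.(df[:, :A])
eltype_after_inplace = eltype(df.A)

# Column replacement: the DataFrame now refers to the new Float64 vector.
df[!, :A] = float.(df[!, :A])
eltype_after_replace = eltype(df.A)

println(eltype_after_inplace)
println(eltype_after_replace)
```

Writing `df.A = float.(df.A)` should behave like the `df[!, :A]` form, since it also swaps the column rather than writing into it.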

I hope this makes sense.

Mr Alpha answered Oct 12 '22 06:10


Try this:

df[:A] = float64(df[:A])

This works for me on Julia v0.3.5 with DataFrames v0.6.1.

This is quite interesting though. Notice that:

df[:, :A] = [2.0, 2.0, 3.0, 4.0]

will change the contents of the column to [2, 2, 3, 4] but leave the element type as Int64, while

df[:A] = [2.0, 2.0, 3.0, 4.0]

will also change the type.

I just had a quick look at the manual and couldn't see any reference to this behaviour (admittedly it was a very quick look). But I find this unintuitive enough that it is perhaps worth filing an issue.

Colin T Bowers answered Oct 12 '22 05:10