Julia: converting column type from Integer to Float64 in a DataFrame

Tags:

dataframe

julia

I am trying to change the type of the numbers in a column of a DataFrame from integer to floating point. This should be straightforward, but it isn't working: the data type remains integer. What am I missing?

In  [2]: using DataFrames
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

Out [2]: 4x2 DataFrame
| Row | A | B   |
|-----|---|-----|
| 1   | 1 | "M" |
| 2   | 2 | "F" |
| 3   | 3 | "F" |
| 4   | 4 | "M" |

In  [3]: df[:,:A] = float64(df[:,:A])

Out [3]: 4-element DataArray{Float64,1}:
 1.0
 2.0
 3.0
 4.0

In  [4]: df

Out [4]: 4x2 DataFrame
| Row | A | B   |
|-----|---|-----|
| 1   | 1 | "M" |
| 2   | 2 | "F" |
| 3   | 3 | "F" |
| 4   | 4 | "M" |

In  [5]: typeof(df[:,:A])

Out [5]: DataArray{Int64,1} (constructor with 1 method)

asked Feb 27 '15 01:02

Pooya

2 Answers

The reason this happens is mutation and conversion. If you have two vectors

a = [1, 2, 3]
b = [4, 5, 6]

you can make x refer to one of them with assignment.

x = a

Now x and a refer to the same vector [1, 2, 3]. If you then assign b to x

x = b

you have now changed x to refer to the same vector as b refers to.

You can also mutate vectors by copying over the values in one vector to the other. If you do

x[:] = a

you copy the values of vector a into the vector that x refers to, which is b. Now you have two separate vectors that both contain [1, 2, 3].
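To make the difference between rebinding a name and mutating a vector concrete, here is a minimal sketch. It assumes a current Julia (1.x) and uses explicit literals, and the `@assert` lines are only there to spell out what each step does.

```julia
a = [1, 2, 3]
b = [4, 5, 6]

x = a                    # assignment: x is now another name for the vector a
@assert x === a          # same object, not a copy

x = b                    # rebinding: x now names the vector b; a is untouched
@assert x === b
@assert a == [1, 2, 3]

x[:] = a                 # mutation: copy the values of a into the vector x refers to (b)
@assert b == [1, 2, 3]   # the contents of b changed...
@assert b !== a          # ...but a and b are still two distinct vectors
```

The `===` operator checks object identity, which is what separates "same vector" from "equal contents".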

Then there is also conversion. If you copy a value of one type into a vector with a different element type, Julia will attempt to convert the value to the vector's element type.

x[1] = 5.0

This gives you the vector [5, 2, 3] because Julia converted the Float64 value 5.0 to the Int value 5. If you tried

x[1] = 5.5

Julia would throw an InexactError because 5.5 can't be losslessly converted to an integer.
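Both conversion cases can be checked directly. The snippet below is a small sketch assuming a current Julia (1.x); the variable `err` is just an illustrative name for capturing the exception.

```julia
x = [1, 2, 3]            # element type is Int

x[1] = 5.0               # 5.0 has an exact Int representation, so it is converted
@assert x == [5, 2, 3]
@assert eltype(x) == Int

# 5.5 has no exact Int representation, so the assignment throws
err = try
    x[1] = 5.5
    nothing
catch e
    e
end
@assert err isa InexactError
@assert x == [5, 2, 3]   # the failed assignment left x unchanged
```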

DataFrames work the same way, once you realize that a DataFrame is a collection of named references to vectors. So what you are doing when constructing the DataFrame in this call

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

is that you create the vector [1, 2, 3, 4], and the vector ["M", "F", "F", "M"]. You then construct a DataFrame with references to these two new vectors.

Later when you do

df[:,:A] = float64(df[:,:A])

you first create a new vector by converting the values in the vector [1, 2, 3, 4] to Float64. You then mutate the vector that df[:A] refers to by copying the values of the Float64 vector back into the Int vector, which makes Julia convert them back to Int.

What Colin T Bowers' answer

df[:A] = float64(df[:A])

does is that, rather than mutating the vector referred to by the DataFrame, it changes the reference to point to the new vector with the Float64 values.
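The same distinction can be reproduced without DataFrames at all. The sketch below uses a plain `Dict` from column names to vectors as a hypothetical stand-in for a DataFrame. The names `cols` and `original` are made up for illustration, and it assumes a current Julia (1.x), where `Float64.(v)` takes the place of the older `float64(v)`.

```julia
# Hypothetical stand-in for a DataFrame: named references to vectors
cols = Dict{Symbol,Any}(:A => [1, 2, 3, 4], :B => ["M", "F", "F", "M"])
original = cols[:A]

# In place: Float64 values are copied into the existing Int vector
# and converted back to Int on the way in
cols[:A][:] = Float64.(cols[:A])
@assert cols[:A] === original
@assert eltype(cols[:A]) == Int

# Rebinding: the name :A now refers to a brand-new Float64 vector
cols[:A] = Float64.(cols[:A])
@assert cols[:A] !== original
@assert eltype(cols[:A]) == Float64
@assert cols[:A] == [1.0, 2.0, 3.0, 4.0]
```

The first form corresponds to `df[:,:A] = ...` in the question, the second to `df[:A] = ...`.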

I hope this makes sense.

answered Oct 12 '22 06:10

Mr Alpha

Try this:

df[:A] = float64(df[:A])

This works for me on Julia v0.3.5 with DataFrames v0.6.1.

This is quite interesting though. Notice that:

df[:, :A] = [2.0, 2.0, 3.0, 4.0]

will change the contents of the column to [2, 2, 3, 4] but leave the element type as Int64, while

df[:A] = [2.0, 2.0, 3.0, 4.0]

will also change the type.
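For readers on a newer setup: the snippets above target Julia v0.3.5 and DataFrames v0.6.1. As far as I know, `float64(...)` and the single-index form `df[:A]` are no longer accepted by current releases. The sketch below assumes Julia 1.x with DataFrames.jl 1.x installed, where the row selector `:` writes into the existing column and `!` replaces the column itself. Treat it as an illustration of the same distinction and not as a tested recipe for every version.

```julia
using DataFrames   # assumption: DataFrames.jl 1.x

df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"])

# `:` as the row selector writes into the existing Int column in place
df[:, :A] = [2.0, 2.0, 3.0, 4.0]
@assert df.A == [2, 2, 3, 4]
@assert eltype(df.A) == Int

# `!` as the row selector replaces the column with the new vector
df[!, :A] = Float64.(df[!, :A])
@assert eltype(df.A) == Float64
@assert df.A == [2.0, 2.0, 3.0, 4.0]
```

Under the same assumptions, `df.A = Float64.(df.A)` should have the same effect as the `!` form.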

I just had a quick look at the manual and couldn't see any reference to this behaviour (admittedly it was a very quick look). But I find this unintuitive enough that perhaps it is worth filing an issue.

answered Oct 12 '22 05:10

Colin T Bowers
