
How do I convert the type of a Dataframe column in Julia?

Tags: julia

I have some badly formatted data. Specifically, I have numeric columns in which some elements contain spurious text (e.g. "8 meters" instead of "8"). I want to read the data in with readtable, make the necessary fixes, and then convert the column to Float64 so that it behaves correctly (comparisons, etc.).

There seems to have been a macro called @transform that would do the conversion, but it has been deleted. How do I do this now?

My best solution at the moment is to clean up the data, write it out as a CSV file, and then re-read it using readtable with eltypes specified. But that is horrible.

What else can I do?

asked Mar 11 '14 19:03 by user492922


1 Answer

There is no need to go via a CSV file. You can change or update the DataFrame directly.

using DataFrames
# Let's make up some data
df=DataFrame(A=rand(5),B=["8", "9 meters", "4.5", "3m", "12.0"])

# And then make a function to clean the data
function fixdata(arr)
    result = DataArray(Float64, length(arr))
    reg = r"[0-9]+\.?[0-9]*"  # digits, at most one decimal point, optional digits
    for i = 1:length(arr)
        m = match(reg, arr[i])
        if m == nothing
            result[i] = NA
        else
            result[i] = float64(m.match)
        end
    end
    result
end

# Then just apply the function to the column to clean the data
# and then replace the column with the cleaned data.
df[:B] = fixdata(df[:B])
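
The code above uses the DataFrames API from the time of the question. In current Julia and DataFrames.jl, `DataArray`, `NA` and `float64` are no longer available; `missing` and `parse(Float64, ...)` take their place. A minimal sketch of the same cleaning function under that assumption follows. It keeps the name `fixdata` from above, the sample vector `raw` is made up for illustration, and it only needs Base Julia:

```julia
# Extract the first number found in each string.
# Entries that contain no digits become `missing`.
function fixdata(arr::AbstractVector{<:AbstractString})
    reg = r"[0-9]+\.?[0-9]*"
    result = Vector{Union{Missing, Float64}}(undef, length(arr))
    for (i, s) in enumerate(arr)
        m = match(reg, s)
        result[i] = m === nothing ? missing : parse(Float64, m.match)
    end
    return result
end

# Hypothetical sample data, including one entry with no number at all
raw = ["8", "9 meters", "4.5", "3m", "12.0", "n/a"]
cleaned = fixdata(raw)
```

With a recent DataFrames.jl, the column can then be replaced in place with `df.B = fixdata(df.B)`. Note that this regex ignores signs and exponents, so values such as "-3" or "1e5" would need a more permissive pattern.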
answered Sep 30 '22 21:09 by Mr Alpha