Convert Dict to DataFrame in Julia

Suppose I have a Dict defined as follows:

x = Dict{AbstractString,Array{Integer,1}}("A" => [1,2,3], "B" => [4,5,6])

I want to convert this to a DataFrame object (from the DataFrames module). Constructing a DataFrame has a similar syntax to constructing a dictionary. For example, the above dictionary could be manually constructed as a data frame as follows:

DataFrame(A = [1,2,3], B = [4,5,6])

I haven't found a direct way to get from a dictionary to a data frame but I figured one could exploit the syntactic similarity and write a macro to do this. The following doesn't work at all but it illustrates the approach I had in mind:

macro dict_to_df(x)
    typeof(eval(x)) <: Dict || throw(ArgumentError("Expected Dict"))
    return quote
        DataFrame(
            for k in keys(eval(x))
                @eval ($k) = $(eval(x)[$k])
            end
        )
    end
end

I also tried writing this as a function, which does work when all dictionary values have the same length:

function dict_to_df(x::Dict)
    s = "DataFrame("
    for k in keys(x)
        v = x[k]
        if typeof(v) <: AbstractString
            v = string('"', v, '"')
        end
        s *= "$(k) = $(v),"
    end
    s = chop(s) * ")"
    return eval(Meta.parse(s))
end

Is there a better, faster, or more idiomatic approach to this?

asked Nov 30 '15 00:11

Alex A.


1 Answer

Another method is to pass the columns and their names as two vectors:

DataFrame(Any[values(x)...], Symbol[Symbol(k) for k in keys(x)])

Getting the argument types right was a bit tricky, since they determine which constructor gets called. To list the available DataFrame constructors, I used methods(DataFrame).

The DataFrame(a=[1,2,3]) way of creating a DataFrame uses keyword arguments. To use splatting (...) for keyword arguments the keys need to be symbols. In the example x has strings, but these can be converted to symbols. In code, this is:

DataFrame(;[Symbol(k)=>v for (k,v) in x]...)
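
To see what the splat is doing independently of DataFrames, here is a minimal sketch. The helper collect_kwargs is hypothetical (it is not part of DataFrames or Base); it simply gathers whatever keyword arguments it receives so they can be inspected.

```julia
# Hypothetical helper: gathers the keyword arguments it receives into a Dict.
collect_kwargs(; kwargs...) = Dict(kwargs)

x = Dict("A" => [1, 2, 3], "B" => [4, 5, 6])

# Convert each string key to a Symbol, then splat the pairs after the
# semicolon so that each pair becomes one keyword argument.
kw = collect_kwargs(; [Symbol(k) => v for (k, v) in x]...)
```

Each pair turns into one keyword argument, which is why the one-liner above is equivalent to writing DataFrame(A = ..., B = ...) by hand.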

Finally, things would be cleaner if x had been keyed by symbols from the start. The code would then be:

x = Dict{Symbol,Array{Integer,1}}(:A => [1,2,3], :B => [4,5,6])
df = DataFrame(;x...)
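
As a further sketch, assuming a reasonably recent DataFrames.jl release (1.x), the constructor also accepts a dictionary directly, with either string or symbol keys, so no splatting is needed. The version requirement is an assumption here; older releases may behave differently.

```julia
using DataFrames

x = Dict("A" => [1, 2, 3], "B" => [4, 5, 6])

# Pass the dictionary straight to the constructor. The column vectors
# must all have the same length. For a plain Dict the columns come out
# sorted by name, because a Dict has no defined key order.
df = DataFrame(x)
```

If column order matters, one of the explicit constructors above gives more control over it.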
answered Nov 11 '22 04:11

Dan Getz