
Preallocate a data frame of known size in Julia

Tags: julia

When I'm running simulations, I like to initialize a big, empty array and fill it up as the simulation iterates through to the end. I do this with something like res = Array(Real,(n_iterations,n_parameters)). However, it would be nice to have named columns, which I think means using a DataFrame. Yet when I try to do something like res_df = convert(DataFrame,res) it throws an error. I would like a more concise approach than doing something like res_df = DataFrame(a=Array(Real,N),b=Array(Real,N),c=Array(Real,N),....) as suggested by the answers to: julia create an empty dataframe and append rows to it

Will Townes asked Feb 23 '15 03:02


1 Answer

To preallocate a data frame, you must preallocate its columns. You can create three columns full of missing values with [fill(missing, 10000) for _ in 1:3], but that doesn't preallocate anything useful: those vectors have element type Missing, so they can only hold the single value missing and can't be changed to hold other values later. One way around this is to use Vector constructors whose element type can hold either Missing or Float64:

julia> DataFrame([Vector{Union{Missing, Float64}}(missing, 10000) for _ in 1:3], [:a, :b, :c])
10000×3 DataFrame
   Row │ a         b         c
       │ Float64?  Float64?  Float64?
───────┼──────────────────────────────
     1 │  missing   missing   missing
     2 │  missing   missing   missing
   ⋮   │    ⋮         ⋮         ⋮
 10000 │  missing   missing   missing
                     9997 rows omitted

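Note that rather than Real, this uses the concrete type Float64, which will have significantly better performance: Real is an abstract type, so a Vector{Real} has to store its elements as boxed references instead of a packed block of floats.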
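To connect this back to the simulation use case in the question, here is a minimal sketch of filling such a preallocated data frame row by row. The sizes, the column names, and the `simulate_step` function are hypothetical stand-ins, not part of the original answer; only the preallocation pattern is taken from it.

```julia
using DataFrames

# Hypothetical sizes and column names, chosen only for illustration.
n_iterations = 5
colnames = [:a, :b, :c]

# Preallocate one Union{Missing, Float64} column per name, all set to missing.
res = DataFrame(
    [Vector{Union{Missing, Float64}}(missing, n_iterations) for _ in colnames],
    colnames,
)

# Hypothetical stand-in for one simulation step; returns one value per column.
simulate_step(i) = (a = 1.0 * i, b = 2.0 * i, c = 3.0 * i)

for i in 1:n_iterations
    vals = simulate_step(i)
    for name in colnames
        res[i, name] = vals[name]  # overwrite the preallocated cell in place
    end
end
```

Rows the simulation has not reached yet stay missing, which makes a partially filled result easy to spot. If that marker is not needed, plain Float64 columns (for example `zeros(n_iterations)`) should work the same way. A matrix that has already been filled can also be given column names afterwards with `DataFrame(mat, [:a, :b, :c])`.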

(this answer was edited to reflect DataFrames v1.0 syntax)

mbauman answered Nov 13 '22 23:11