When I'm running simulations, I like to initialize a big, empty array and fill it up as the simulation iterates through to the end. I do this with something like res = Array(Real,(n_iterations,n_parameters)). However, it would be nice to have named columns, which I think means using a DataFrame. Yet when I try to do something like res_df = convert(DataFrame,res), it throws an error. I would like a more concise approach than doing something like res_df = DataFrame(a=Array(Real,N),b=Array(Real,N),c=Array(Real,N),....), as suggested by the answers to: julia create an empty dataframe and append rows to it
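If the results really do fit in a single matrix, one way to get named columns from it (a sketch assuming DataFrames 1.x, a concrete Matrix{Float64} instead of an array of Real, and hypothetical column names :a, :b, :c) is the DataFrame(matrix, names) constructor, which copies the matrix columns into a new data frame:

```julia
using DataFrames

n_iterations, n_parameters = 5, 3

# Preallocate a concrete Float64 matrix. (The question's Array(Real, (n, m))
# is pre-1.0 Julia syntax; today it would be Array{Real}(undef, n, m).)
res = Matrix{Float64}(undef, n_iterations, n_parameters)

# Hypothetical stand-in for the simulation: fill every cell.
for i in 1:n_iterations, j in 1:n_parameters
    res[i, j] = i * j
end

# Build the DataFrame and name the columns in one step.
# DataFrame(res, :auto) would instead generate the names x1, x2, x3.
res_df = DataFrame(res, [:a, :b, :c])
```

This keeps the fast, concrete matrix during the simulation and only pays for the copy once at the end.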
To preallocate a data frame, you must preallocate its columns. You can create three columns full of missing values by simply doing [fill(missing, 10000) for _ in 1:3], but that doesn't actually allocate any storage for values: those vectors have element type Missing, so they can only ever hold one value, missing, and can't be changed to hold other values later. One way to preallocate properly is to use Vector constructors whose element type is Union{Missing, Float64}, so that each entry can hold either missing or a Float64:
julia> DataFrame([Vector{Union{Missing, Float64}}(missing, 10000) for _ in 1:3], [:a, :b, :c])
10000×3 DataFrame
   Row │ a         b         c
       │ Float64?  Float64?  Float64?
───────┼──────────────────────────────
     1 │ missing   missing   missing
     2 │ missing   missing   missing
   ⋮   │    ⋮         ⋮         ⋮
 10000 │ missing   missing   missing
                      9997 rows omitted
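To see the difference described above, here is a small sketch (assuming DataFrames 1.x; the variable names are illustrative) checking that a fill(missing, n) vector rejects a Float64 while the Union{Missing, Float64} columns accept one:

```julia
using DataFrames

n = 10_000

# fill(missing, n) returns a Vector{Missing}; its only possible value is
# `missing`, so there is no room to store a number later.
only_missing = fill(missing, n)
println(eltype(only_missing))  # prints Missing

try
    only_missing[1] = 1.0       # a Float64 cannot be converted to Missing
catch err
    println("assignment failed with ", typeof(err))
end

# Union{Missing, Float64} columns start out all-missing but accept Float64s.
df = DataFrame([Vector{Union{Missing, Float64}}(missing, n) for _ in 1:3], [:a, :b, :c])
df[1, :a] = 0.5
println(df[1, :a], " ", df[2, :a])  # prints 0.5 missing
```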
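Note that rather than Real, this uses the concrete type Float64, which will have significantly better performance: Real is an abstract type, so an array of Real has to store each element as a pointer to a separately boxed value, whereas Float64 values can be stored inline.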
(this answer was edited to reflect DataFrames v1.0 syntax)
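If every cell is guaranteed to be written during the run, a variant worth considering (an addition, not part of the answer above) is to allocate uninitialized concrete Float64 columns with undef and skip missing entirely. Alternatively, keep the Union{Missing, Float64} columns and call disallowmissing! from DataFrames afterwards to narrow them to Float64. A sketch assuming DataFrames 1.x, with hypothetical parameter names and a hypothetical loop body standing in for the simulation:

```julia
using DataFrames

n_iterations = 1_000
params = [:a, :b, :c]  # hypothetical parameter names

# Float64 is concrete (stored inline); Real is abstract (stored boxed).
println(isconcretetype(Float64), " ", isconcretetype(Real))  # prints true false

# Option 1: uninitialized concrete columns; contents are arbitrary until written.
res_df = DataFrame([Vector{Float64}(undef, n_iterations) for _ in params], params)
for i in 1:n_iterations
    res_df[i, :a] = i          # Int is converted to Float64 on assignment
    res_df[i, :b] = 2i
    res_df[i, :c] = sqrt(i)
end

# Option 2: start from all-missing Union columns, then narrow the types.
res_u = DataFrame([Vector{Union{Missing, Float64}}(missing, n_iterations) for _ in params], params)
for i in 1:n_iterations
    res_u[i, :a] = i
    res_u[i, :b] = 2i
    res_u[i, :c] = sqrt(i)
end
disallowmissing!(res_u)  # column eltypes become Float64 once no missing remain
```

The undef version is only safe when the loop is certain to fill every cell; the missing-initialized version makes any cell the simulation never reached easy to detect with ismissing.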