I have the following function in Julia, which uses Arrow.jl to read an Arrow file from disk and process the data:
function getmembershipsdays(fromId, toId)
    memberships = Arrow.Table("HouseholdMemberships.arrow") |> DataFrame
    filter!([:IndividualId] => id -> id >= fromId && id <= toId, memberships)
    ...
end
> Error: ERROR: LoadError: MethodError: no method matching
> deleteat!(::Arrow.Primitive{Int64,Array{Int64,1}}, ::Array{Int64,1})
The DataFrame has the following structure:
123226x10 DataFrame
Row | MembershipId | IndividualId | HouseholdId | ...
| Int64 | Int64 | Int64 |
The rest of the code in the function, which steps through the DataFrame, works, but I get this error as soon as I add the filter condition. It is as if the DataFrame columns are not converted to the underlying Julia types.
If I do
m = filter([:IndividualId] => id -> id >= fromId && id <= toId, memberships)
then it works. How do I filter in place?
You are using memory-mapping, which means that you cannot resize, in place, a DataFrame created from an Arrow.jl source. This is the cost you pay for super-fast, zero-copy creation of data frames from an Arrow source.
What you need to do is allocate a new data frame instead (for example, by replacing filter! with filter in your example). See https://bkamins.github.io/julialang/2020/11/06/arrow.html for some more examples (in particular, how to avoid memory mapping by using an IO source instead of a file name as the source).
PS. Note that id >= fromId && id <= toId can be written simply as fromId <= id <= toId.