How would you go about applying a function to some or all columns of a Julia DataFrame, columnwise? The use case I'm trying to tackle is simple type parsing and processing. For example, I would like to parse the columns of this example data frame from strings to ints:
using DataFrames
df = DataFrame(a = ["1", "2", "3"], b = ["4", "5", "6"])
# something like this works but destroys the structure of the dataframe
[parse.(Int64, col) for col in eachcol(df)]
In the future, I would like to have a data frame with many columns of different types and modify only a selection of its columns. However, I'm still stuck at the simple case where I want to apply the function to all columns.
It is not entirely clear what you want to achieve. From your comment I assume you want to take a data frame as the source and get a data frame as the result. If this is the case, here are the options.
The basic one is to use mapcols (creates a new data frame) or mapcols! (operates in-place). Here is an example of mapcols on your query:
julia> mapcols(col -> parse.(Int, col), df)
3×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 4 │
│ 2 │ 2 │ 5 │
│ 3 │ 3 │ 6 │
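To make the difference between the two variants concrete, here is a minimal sketch. It assumes a reasonably recent DataFrames.jl; the variable names parsed and type_before are only illustrative.

```julia
using DataFrames

df = DataFrame(a = ["1", "2", "3"], b = ["4", "5", "6"])

# mapcols returns a new data frame; df itself is left untouched
parsed = mapcols(col -> parse.(Int, col), df)
type_before = eltype(df.a)   # still String at this point

# mapcols! replaces the columns of df itself
mapcols!(col -> parse.(Int, col), df)
```

After the last call, df holds integer columns as well, so the in-place variant is the one to use when you do not need the original strings any more.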
A more general set of functions is transform (creates a new data frame) and transform! (operates in place). They add new columns to your data frame, or replace an existing column when the output name matches it:
julia> transform(df, :a => ByRow(x -> parse(Int, x)) => :a)
3×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ String │
├─────┼───────┼────────┤
│ 1 │ 1 │ 4 │
│ 2 │ 2 │ 5 │
│ 3 │ 3 │ 6 │
julia> transform(df, [:a, :b] .=> ByRow(x -> parse(Int, x)) .=> [:a, :b])
3×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 4 │
│ 2 │ 2 │ 5 │
│ 3 │ 3 │ 6 │
Please refer to the documentation of DataFrames.jl for the details (as they are long - the function has many options, but you can start here and here).
There are a few things to note here:
- transform is more general than mapcols
- you have to specify the name of the output column (if you omit it, the name is auto-generated by merging the source column name and the function name)
- transform can be applied to a subset of columns, as you can see; however, note that in order to apply the same transformation to multiple columns we used broadcasting with the .=> notation

(Note that there are select and select! functions that do almost the same, but they do not keep the columns of the old data frame by default.)
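The points above can be sketched as follows. This is only an illustration under a few assumptions: to_int is a hypothetical helper, the extra column c is made up for the example, and the renamecols keyword assumes a DataFrames.jl version that provides it (it is not used in the examples above).

```julia
using DataFrames

df = DataFrame(a = ["1", "2", "3"], b = ["4", "5", "6"], c = [1.0, 2.0, 3.0])

# Hypothetical named helper, so that the auto-generated column name is readable
to_int(x) = parse(Int, x)

# No output name given: the result is added as an extra column whose name is
# built from the source column name and the function name; :a itself is kept
t1 = transform(df, :a => ByRow(to_int))

# renamecols = false reuses the source names, so :a and :b are replaced
t2 = transform(df, [:a, :b] .=> ByRow(to_int); renamecols = false)

# select does the same work but keeps only the columns it was asked for
s = select(df, [:a, :b] .=> ByRow(to_int); renamecols = false)
```

Here t2 still contains the untouched column c, while s contains only a and b.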
Finally, in practice it is also fully OK to write something like:
julia> foreach(n -> df[!, n] = parse.(Int, df[!, n]), names(df))
julia> df
3×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 4 │
│ 2 │ 2 │ 5 │
│ 3 │ 3 │ 6 │
(this modifies your data frame in-place as you can see)
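For the case mentioned in the question, a data frame with columns of different types where only some should be modified, the same loop can be restricted to a selection. A minimal sketch, assuming a DataFrames.jl version where names accepts a type as its second argument; the column names id, score and qty are made up for the example.

```julia
using DataFrames

df = DataFrame(id = ["1", "2", "3"], score = [0.5, 1.5, 2.5], qty = ["4", "5", "6"])

# names(df, T) returns the names of the columns whose element type is a subtype of T
string_cols = names(df, AbstractString)

# Parse only the string columns in place; the Float64 column is left alone
for col in string_cols
    df[!, col] = parse.(Int, df[!, col])
end
```

The same list of names could also be passed to transform or select instead of looping, if you prefer to get a new data frame back.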