
How to apply a function columnwise to a Julia DataFrame

How would you go about applying a function to some or all columns in a Julia data frame, columnwise? The use case I'm trying to tackle is simple type parsing and processing. For example, I would like to parse the columns of this example data frame from strings to ints:

using DataFrames

df = DataFrame(a = ["1", "2", "3"], b = ["4", "5", "6"])

# something like this works but destroys the structure of the dataframe
[parse.(Int64, col) for col in eachcol(df)]

In the future, I would like to be able to have a data frame with many columns of different types and modify only a selection of them. However, I'm still stuck at the simple case where I want to apply the function to all columns.

ElBrocas asked May 19 '20 02:05



1 Answer

It is not entirely clear what you want to achieve. From your comment I assume you want to take a data frame as the source and get a data frame as the result. If this is the case, here are the options.

The basic one is to use mapcols (creates a new data frame) or mapcols! (operates in-place). Here is an example of mapcols on your query:

julia> mapcols(col -> parse.(Int, col), df)
3×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 4     │
│ 2   │ 2     │ 5     │
│ 3   │ 3     │ 6     │
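
For the in-place variant, a minimal sketch could look like the following. It assumes a recent DataFrames.jl release and reuses the example data from the question:

```julia
using DataFrames

df = DataFrame(a = ["1", "2", "3"], b = ["4", "5", "6"])

# mapcols! replaces every column of df with the result of the function,
# so df itself is modified and no new data frame is returned to keep around
mapcols!(col -> parse.(Int, col), df)

# both columns now have an integer element type
eltype(df.a), eltype(df.b)
```

Note that the function passed to mapcols and mapcols! receives a whole column, which is why parse is broadcast with the dot syntax.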

A more general set of functions is transform (creates a new data frame) and transform! (operates in place). They add new columns to your data frame:

julia> transform(df, :a => ByRow(x -> parse(Int, x)) => :a)
3×2 DataFrame
│ Row │ a     │ b      │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 1     │ 4      │
│ 2   │ 2     │ 5      │
│ 3   │ 3     │ 6      │

julia> transform(df, [:a, :b] .=> ByRow(x -> parse(Int, x)) .=> [:a, :b])
3×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 4     │
│ 2   │ 2     │ 5     │
│ 3   │ 3     │ 6     │

Please refer to the documentation of DataFrames.jl for the details (they are long, as the function has many options).

There are a few things to note here:

  • as transform is more general than mapcols, you have to specify the name of the output column (if you omit it, the name is auto-generated by joining the source column name and the function name)
  • as you can see, you can apply transform to a subset of columns; note, however, that in order to apply the same transformation to multiple columns we used broadcasting with the .=> notation.

(note that there are select and select! functions that do almost the same but do not keep the columns of the old data frame by default)
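
The naming rule and the difference between transform and select can be seen in a small sketch. It assumes DataFrames.jl 1.x; to_int is a hypothetical helper that is not part of the original answer and is only defined so that the auto-generated column name is readable:

```julia
using DataFrames

df = DataFrame(a = ["1", "2", "3"], b = ["4", "5", "6"])

# hypothetical named helper (an assumption for illustration)
to_int(x) = parse(Int, x)

# no output name given: the result goes into a new column named after
# the source column and the function, and :a and :b are kept unchanged
t = transform(df, :a => ByRow(to_int))

# renamecols=false reuses the source column names instead of generating new ones
r = transform(df, [:a, :b] .=> ByRow(to_int); renamecols=false)

# select returns only the columns that were requested
s = select(df, :a => ByRow(to_int) => :a)

names(t), names(r), names(s)
```

With an anonymous function there is no meaningful function name to reuse, so giving the output name explicitly (or passing renamecols=false) is usually the more readable choice.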

Finally, in practice it is also fully OK to write something like:

julia> foreach(n -> df[!, n] = parse.(Int, df[!, n]), names(df))

julia> df
3×2 DataFrame
│ Row │ a     │ b     │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 4     │
│ 2   │ 2     │ 5     │
│ 3   │ 3     │ 6     │

(this modifies your data frame in-place as you can see)
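
For the case mentioned in the question, where only some columns of a mixed-type data frame should be parsed, the same loop can be restricted to a selection of columns. This is a sketch under stated assumptions: the column c is made up for illustration, and names(df, AbstractString), which picks columns by element type, needs a reasonably recent DataFrames.jl release:

```julia
using DataFrames

# hypothetical mixed-type data: only :a and :b hold strings
df = DataFrame(a = ["1", "2", "3"], b = ["4", "5", "6"], c = [1.5, 2.5, 3.5])

# names(df, AbstractString) lists the columns whose element type is a string type,
# so the Float64 column :c is left untouched
for n in names(df, AbstractString)
    df[!, n] = parse.(Int, df[!, n])
end

df
```

An explicit list such as ["a", "b"] works in place of names(df, AbstractString) when the columns to modify are known in advance.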

Bogumił Kamiński answered Sep 29 '22 07:09