 

How to separate a DataFrame column in two given a delimiter?

Tags:

julia

Given a DataFrame df in Julia:

using DataFrames
df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])

How can I create columns Y1 and Y2 by splitting column Y at the "|" delimiter?

For example, with R's tidyverse I'd do:

separate(df, Y, c("Y1", "Y2"), sep = "\\|")
Vitor asked Oct 23 '25 14:10



1 Answer

As far as I know, there is no built-in function that does this.

Two relatively terse ways to do it come to mind:

julia> df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])
3×2 DataFrame
│ Row │ X    │ Y      │
│     │ Char │ String │
├─────┼──────┼────────┤
│ 1   │ 'A'  │ a|b    │
│ 2   │ 'B'  │ a|c    │
│ 3   │ 'C'  │ b|b    │

julia> data = split.(df.Y, '|')
3-element Array{Array{SubString{String},1},1}:
 ["a", "b"]
 ["a", "c"]
 ["b", "b"]

julia> foreach(enumerate([:Y1, :Y2])) do (i, n)
           df[!, n] = getindex.(data, i)
       end

julia> df
3×4 DataFrame
│ Row │ X    │ Y      │ Y1        │ Y2        │
│     │ Char │ String │ SubStrin… │ SubStrin… │
├─────┼──────┼────────┼───────────┼───────────┤
│ 1   │ 'A'  │ a|b    │ a         │ b         │
│ 2   │ 'B'  │ a|c    │ a         │ c         │
│ 3   │ 'C'  │ b|b    │ b         │ b         │
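To see what the first approach does step by step, here is a Base-only sketch with no DataFrame involved, reusing the example strings from the question. `split.` produces one vector of pieces per row, and `getindex.(parts, i)` pulls out the i-th piece of every row. The `String.` conversion at the end is an optional extra, assuming you want plain `String` columns rather than the `SubString` views shown in the output above.

```julia
# Example values taken from column Y in the question.
ys = ["a|b", "a|c", "b|b"]

# Broadcast split over the vector: one Vector{SubString{String}} per row.
parts = split.(ys, '|')

# Broadcast getindex to take the i-th piece from every row.
y1 = getindex.(parts, 1)
y2 = getindex.(parts, 2)

# Optional: copy the SubString views into standalone Strings.
y1_str = String.(y1)
y2_str = String.(y2)
```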

or

julia> df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])
3×2 DataFrame
│ Row │ X    │ Y      │
│     │ Char │ String │
├─────┼──────┼────────┤
│ 1   │ 'A'  │ a|b    │
│ 2   │ 'B'  │ a|c    │
│ 3   │ 'C'  │ b|b    │

julia> hcat(df, DataFrame(reduce(vcat, permutedims.(split.(df.Y, '|'))), [:Y1, :Y2]))
3×4 DataFrame
│ Row │ X    │ Y      │ Y1        │ Y2        │
│     │ Char │ String │ SubStrin… │ SubStrin… │
├─────┼──────┼────────┼───────────┼───────────┤
│ 1   │ 'A'  │ a|b    │ a         │ b         │
│ 2   │ 'B'  │ a|c    │ a         │ c         │
│ 3   │ 'C'  │ b|b    │ b         │ b         │
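The one-liner in the second approach packs several steps together. A Base-only sketch of the matrix it builds before passing it to `DataFrame`: `permutedims` turns each row's 2-element vector into a 1×2 matrix, and `reduce(vcat, ...)` stacks those rows into a 3×2 matrix whose columns become Y1 and Y2.

```julia
# Example values taken from column Y in the question.
ys = ["a|b", "a|c", "b|b"]

# Each split result is a 2-element vector; permutedims makes it a 1×2 row.
rows = permutedims.(split.(ys, '|'))

# Stack the 1×2 rows vertically into a 3×2 matrix.
m = reduce(vcat, rows)
```

This also shows why the approach assumes every string splits into the same number of pieces: `vcat` throws an error if the rows have different lengths.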

EDIT

DataFrames.jl now offers a simpler way to do this (the current DataFrames.jl release at the time of this edit is 1.3):

julia> df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])
3×2 DataFrame
 Row │ X     Y      
     │ Char  String 
─────┼──────────────
   1 │ A     a|b
   2 │ B     a|c
   3 │ C     b|b

julia> transform!(df, :Y => ByRow(x -> split(x, '|')) => [:Y1, :Y2])
3×4 DataFrame
 Row │ X     Y       Y1         Y2        
     │ Char  String  SubStrin…  SubStrin… 
─────┼────────────────────────────────────
   1 │ A     a|b     a          b
   2 │ B     a|c     a          c
   3 │ C     b|b     b          b
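The `transform!` call above assumes every value of Y contains exactly one `|`. A value such as `"a|c|d"` or `"b"` yields a different number of pieces than target columns, which is likely to raise an error. Below is a minimal Base-Julia sketch of a more forgiving split, loosely modelled on tidyr's `separate`. The `split_fixed` name is hypothetical and not part of DataFrames.jl. It uses `split`'s `limit` keyword so that extra pieces stay merged in the last column, and it pads short rows with `missing`.

```julia
# Hypothetical helper, not part of DataFrames.jl or Base: split each string
# into at most `n` pieces and return `n` column vectors.
function split_fixed(strs::AbstractVector{<:AbstractString}, delim, n::Integer)
    # Pre-fill every output column with `missing` so short rows stay padded.
    cols = [Vector{Union{String, Missing}}(missing, length(strs)) for _ in 1:n]
    for (row, s) in enumerate(strs)
        # `limit = n` merges any pieces beyond the (n-1)-th into the last one.
        pieces = split(s, delim; limit = n)
        for (j, p) in enumerate(pieces)
            cols[j][row] = String(p)
        end
    end
    return cols
end

# Rows with one, two and no delimiters.
y1, y2 = split_fixed(["a|b", "a|c|d", "b"], '|', 2)
```

With the question's `df`, the results could then be attached as new columns, for example `df.Y1, df.Y2 = split_fixed(df.Y, '|', 2)`.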

and with DataFramesMeta.jl it would be:

julia> df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])
3×2 DataFrame
 Row │ X     Y      
     │ Char  String 
─────┼──────────────
   1 │ A     a|b
   2 │ B     a|c
   3 │ C     b|b

julia> @rtransform!(df, $[:Y1, :Y2]=split(:Y, '|'))
3×4 DataFrame
 Row │ X     Y       Y1         Y2        
     │ Char  String  SubStrin…  SubStrin… 
─────┼────────────────────────────────────
   1 │ A     a|b     a          b
   2 │ B     a|c     a          c
   3 │ C     b|b     b          b

which I think is quite clean.

Bogumił Kamiński answered Oct 25 '25 10:10



