Given a DataFrame df in Julia:
using DataFrames
df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])
How can I create columns Y1 and Y2 by splitting column Y at the "|" delimiter?
E.g., in R tidyverse I'd do:
separate(df, Y, c("Y1", "Y2"), sep = "\\|")
As far as I know, there is no built-in function that does this.
Two relatively terse ways to do it come to mind. Either:
julia> df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])
3×2 DataFrame
│ Row │ X    │ Y      │
│     │ Char │ String │
├─────┼──────┼────────┤
│ 1   │ 'A'  │ a|b    │
│ 2   │ 'B'  │ a|c    │
│ 3   │ 'C'  │ b|b    │
julia> data = split.(df.Y, '|')
3-element Array{Array{SubString{String},1},1}:
 ["a", "b"]
 ["a", "c"]
 ["b", "b"]
julia> foreach(enumerate([:Y1, :Y2])) do (i, n)
           df[!, n] = getindex.(data, i)
       end
julia> df
3×4 DataFrame
│ Row │ X    │ Y      │ Y1        │ Y2        │
│     │ Char │ String │ SubStrin… │ SubStrin… │
├─────┼──────┼────────┼───────────┼───────────┤
│ 1   │ 'A'  │ a|b    │ a         │ b         │
│ 2   │ 'B'  │ a|c    │ a         │ c         │
│ 3   │ 'C'  │ b|b    │ b         │ b         │
or
julia> df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])
3×2 DataFrame
│ Row │ X    │ Y      │
│     │ Char │ String │
├─────┼──────┼────────┤
│ 1   │ 'A'  │ a|b    │
│ 2   │ 'B'  │ a|c    │
│ 3   │ 'C'  │ b|b    │
julia> hcat(df, DataFrame(reduce(vcat, permutedims.(split.(df.Y, '|'))), [:Y1, :Y2]))
3×4 DataFrame
│ Row │ X    │ Y      │ Y1        │ Y2        │
│     │ Char │ String │ SubStrin… │ SubStrin… │
├─────┼──────┼────────┼───────────┼───────────┤
│ 1   │ 'A'  │ a|b    │ a         │ b         │
│ 2   │ 'B'  │ a|c    │ a         │ c         │
│ 3   │ 'C'  │ b|b    │ b         │ b         │
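To see what the two approaches above do to the data, here is a minimal sketch using only Base Julia (no DataFrames.jl). The variable names `y`, `parts`, `y1`, `y2` and `m` are illustrative and not part of the original answer; it assumes every string contains exactly one "|".

```julia
# Pure Base Julia; no packages needed.
y = ["a|b", "a|c", "b|b"]

# Broadcasting split over the vector gives one vector of pieces per row.
parts = split.(y, '|')

# First approach: take the i-th piece of every row to form one column.
y1 = getindex.(parts, 1)
y2 = getindex.(parts, 2)

# Second approach: turn each row's pieces into a 1×2 matrix,
# then stack those rows vertically into a 3×2 matrix.
m = reduce(vcat, permutedims.(parts))
```

The first approach builds the new columns one at a time, while the second builds a single matrix whose columns are then named when it is wrapped in a DataFrame.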
DataFrames.jl now lets you do this in a simpler way (the current DataFrames.jl release at the time of writing is 1.3):
julia> df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])
3×2 DataFrame
 Row │ X     Y      
     │ Char  String 
─────┼──────────────
   1 │ A     a|b
   2 │ B     a|c
   3 │ C     b|b
julia> transform!(df, :Y => ByRow(x -> split(x, '|')) => [:Y1, :Y2])
3×4 DataFrame
 Row │ X     Y       Y1         Y2        
     │ Char  String  SubStrin…  SubStrin… 
─────┼────────────────────────────────────
   1 │ A     a|b     a          b
   2 │ B     a|c     a          c
   3 │ C     b|b     b          b
and with DataFramesMeta.jl it would be:
julia> df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])
3×2 DataFrame
 Row │ X     Y      
     │ Char  String 
─────┼──────────────
   1 │ A     a|b
   2 │ B     a|c
   3 │ C     b|b
julia> @rtransform!(df, $[:Y1, :Y2]=split(:Y, '|'))
3×4 DataFrame
 Row │ X     Y       Y1         Y2        
     │ Char  String  SubStrin…  SubStrin… 
─────┼────────────────────────────────────
   1 │ A     a|b     a          b
   2 │ B     a|c     a          c
   3 │ C     b|b     b          b
which I think is quite clean.
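In all of the outputs above the new columns hold SubString values. If plain String columns are preferred, one possible variation (a sketch, assuming DataFrames.jl 1.x is installed; the name `out` is illustrative) is to convert the pieces inside the row function:

```julia
using DataFrames

df = DataFrame(X = ['A', 'B', 'C'], Y = ["a|b", "a|c", "b|b"])

# String.(...) converts each SubString piece to a String before
# the pieces are spread across the two target columns.
out = transform(df, :Y => ByRow(x -> String.(split(x, '|'))) => [:Y1, :Y2])
```

Here `transform` without the `!` returns a new data frame and leaves `df` unchanged.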