R: Split Variable Column into multiple (unbalanced) columns by comma

Tags: split, r

I have a dataset of 25 variables and over 2 million observations. One of my variables combines several "categories" in a single comma-separated string, and I want to split it so that each category gets its own column (similar to what split does in Stata). For example:

# Name      Age     Number               Events                      First 
# Karen      24        8         Triathlon/IM,Marathon,10k,5k         0
# Kurt       39        2         Half-Marathon,10k                    0 
# Leah       18        0                                              1

And I want it to look like:

# Name   Age  Number Events_1        Events_2     Events_3     Events_4      First
# Karen   24    8     Triathlon/IM    Marathon       10k         5k             0
# Kurt    39    2     Half-Marathon   10k            NA          NA             0 
# Leah    18    0     NA              NA             NA          NA             1

I have looked through Stack Overflow but have not found anything that works (everything gives me an error of some sort). Any suggestions would be greatly appreciated.

Note: It may not be important, but the largest number of categories any one person has is 19, so I would need to create Events_1:Events_19.

Comment: Previous Stack Overflow answers have suggested the separate function, but it does not seem to work with my dataset. When I run it, the program finishes without changing anything: there is no output and no error message. Other suggestions from other threads gave me error messages. However, I finally got it to work using the cSplit function. Thanks for the help!

925

asked Jul 23 '15 02:07

Kfruge

1 Answer

From Ananda's splitstackshape package:

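library(splitstackshape)
# Splits "Events" on commas and adds as many Events_* columns as the longest row needs
cSplit(df, "Events", sep=",")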
#    Name Age Number First      Events_1 Events_2 Events_3 Events_4
#1: Karen  24      8     0  Triathlon/IM Marathon      10k       5k
#2:  Kurt  39      2     0 Half-Marathon      10k       NA       NA
#3: Leah   18      0     1            NA       NA       NA       NA

Or with tidyr:

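library(tidyr)
# Target column names must be listed up front; fill="right" pads short rows with NA without a warning
separate(df, 'Events', paste("Events", 1:4, sep="_"), sep=",", extra="drop", fill="right")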
#   Name Age Number               Events_1 Events_2 Events_3 Events_4 First
#1 Karen  24      8           Triathlon/IM Marathon      10k       5k     0
#2  Kurt  39      2          Half-Marathon      10k     <NA>     <NA>     0
#3 Leah   18      0                     NA     <NA>     <NA>     <NA>     1

With the data.table package:

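library(data.table)
# setDT() converts df by reference; tstrsplit() splits and transposes, then the original column is dropped
setDT(df)[, paste0("Events_", 1:4) := tstrsplit(Events, ",")][, -"Events", with=FALSE]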
#    Name Age Number First               Events_1 Events_2 Events_3 Events_4
#1: Karen  24      8     0           Triathlon/IM Marathon      10k       5k
#2:  Kurt  39      2     0          Half-Marathon      10k       NA       NA
#3: Leah   18      0     1                     NA       NA       NA       NA

Data

df <- structure(list(Name = structure(1:3, .Label = c("Karen", "Kurt", 
"Leah "), class = "factor"), Age = c(24L, 39L, 18L), Number = c(8L, 
2L, 0L), Events = structure(c(3L, 2L, 1L), .Label = c("               NA", 
"         Half-Marathon,10k", "     Triathlon/IM,Marathon,10k,5k"
), class = "factor"), First = c(0L, 0L, 1L)), .Names = c("Name", 
"Age", "Number", "Events", "First"), class = "data.frame", row.names = c(NA, 
-3L))
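
The tidyr and data.table calls above hard-code four target columns, while the question mentions rows with up to 19 categories. As a rough sketch only (base R, no packages), the column count can be derived from the data instead. The toy data frame below is an assumption: it stores Events as plain character strings with an empty string for "no events", which may differ from how the real dataset encodes missing values.

```r
# Toy data mirroring the question (assumed layout: character column, "" for no events)
df <- data.frame(
  Name   = c("Karen", "Kurt", "Leah"),
  Age    = c(24L, 39L, 18L),
  Number = c(8L, 2L, 0L),
  Events = c("Triathlon/IM,Marathon,10k,5k", "Half-Marathon,10k", ""),
  First  = c(0L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Split each entry on literal commas; trimws() guards against stray padding
pieces <- strsplit(trimws(df$Events), ",", fixed = TRUE)

# The longest row decides how many Events_* columns are needed
n_max <- max(lengths(pieces))

# Indexing past the end of a vector yields NA, which pads short rows to n_max
wide <- do.call(rbind, lapply(pieces, function(p) p[seq_len(n_max)]))
colnames(wide) <- paste0("Events_", seq_len(n_max))

# Drop the original column and attach the new ones
result <- cbind(df[setdiff(names(df), "Events")],
                as.data.frame(wide, stringsAsFactors = FALSE))
print(result)
```

On 2 million rows the packaged functions are likely to be faster than this loop over rows; the sketch is meant only to show how the number of columns can follow the data rather than a fixed 1:4.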

156

answered Oct 09 '22 17:10

Pierre L
