Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I split a variable into multiple variables based on the maximum number of separating characters in the column? [duplicate]

Tags:

dataframe

r

I have a variable called month that is a list of values separated by a | character (ex. "1 | 2 | 3"). The number of values in each cell is not the same, so some have "1 | 2" and others have "1 | 2 | 3 | 4". I need to separate month into month1, month2, etc. all the way to the maximum number of values separated by | in the column so that each value has its own column. Some rows should have NA for the columns that exceed their number of values in the original month column. I have no idea where to begin, but I've provided the a minimal reproducible example of what the variable/dataframe look like below.

id <- c("Lars", "Susan", "Juan")
month <- c("1 | 2", "1 | 2 | 3", "1 | 2 | 3 | 4")
df <- data.frame(id,month)
like image 470
connor monie Avatar asked Sep 01 '25 15:09

connor monie


1 Answers

You can use read.delim() with fill = TRUE, constructing the column names afterward.

res <- read.delim(text= paste(id, month, sep = "|"), 
           sep = "|", fill = TRUE, header = FALSE)
names(res) <- c("id", paste0("month", 1:(ncol(res)-1)))
print(res)
     id month1 month2 month3 month4
1  Lars      1      2     NA     NA
2 Susan      1      2      3     NA
3  Juan      1      2      3      4

If the maximum number of columns isn't achieved in the first five input lines, this may go wrong:

The number of data columns is determined by looking at the first five lines of input (or the whole input if it has less than five lines), or from the length of ‘col.names’ if it is specified and is longer. This could conceivably be wrong if ‘fill’ or ‘blank.lines.skip’ are true, so specify ‘col.names’ if necessary (as in the ‘Examples’).

In that case you could specify col.names = c("id", paste0("month", 1:maxmonths)) as part of your read.delim() call. (Assuming you know maxmonths in advance - it would also be possible to run through month first to find the max value ... something like max(lengths(strsplit(months, "|"))))

like image 194
Ben Bolker Avatar answered Sep 04 '25 04:09

Ben Bolker