I have a variable called month
that is a list of values separated by a |
character (ex. "1 | 2 | 3"). The number of values in each cell is not the same, so some have "1 | 2" and others have "1 | 2 | 3 | 4". I need to separate month
into month1
, month2
, etc. all the way to the maximum number of values separated by |
in the column so that each value has its own column. Some rows should have NA
for the columns that exceed their number of values in the original month
column. I have no idea where to begin, but I've provided the a minimal reproducible example of what the variable/dataframe look like below.
id <- c("Lars", "Susan", "Juan")
month <- c("1 | 2", "1 | 2 | 3", "1 | 2 | 3 | 4")
df <- data.frame(id,month)
You can use read.delim()
with fill = TRUE
, constructing the column names afterward.
res <- read.delim(text= paste(id, month, sep = "|"),
sep = "|", fill = TRUE, header = FALSE)
names(res) <- c("id", paste0("month", 1:(ncol(res)-1)))
print(res)
id month1 month2 month3 month4
1 Lars 1 2 NA NA
2 Susan 1 2 3 NA
3 Juan 1 2 3 4
If the maximum number of columns isn't achieved in the first five input lines, this may go wrong:
The number of data columns is determined by looking at the first five lines of input (or the whole input if it has less than five lines), or from the length of ‘col.names’ if it is specified and is longer. This could conceivably be wrong if ‘fill’ or ‘blank.lines.skip’ are true, so specify ‘col.names’ if necessary (as in the ‘Examples’).
In that case you could specify col.names = c("id", paste0("month", 1:maxmonths))
as part of your read.delim()
call. (Assuming you know maxmonths
in advance - it would also be possible to run through month
first to find the max value ... something like max(lengths(strsplit(months, "|")))
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With