I have a dataframe like this:
I would like to split the second column into many columns based on "?". However it is not easy because there are other question marker in the sting. So the only additional common it that every substring contains the 'http'.
How is it possible to split them? The number of columns in the output example is just an example I don't know exactly how many could be generated.
Example of input data:
df_in <- data.frame(x = c('x1','x2','x3','x4'),
y = c('http://example1.com?https://example2.com', 'NA', 'http://example3.com?id=1234?https://example4/com?http://example6.com', 'http://example5.com'))
the dataframe as printed in console:
df_in
x y
x1 http://example1.com?https://example2.com
x2 NA
x3 http://example3.com?id=1234?https://example4/com?http://example6.com
x4 http://example5.com
Example of expected output:
df_out <- data.frame(x = c('x1','x2','x3','x4'),
col1 = c('http://example1.com', 'NA', 'http://example3.com?id=1234', 'http://example5.com'),
col2 = c('https://example2.com', 'NA', 'https://example4/com', 'NA'),
col3 = c('NA', 'NA', 'https://example6/com', 'NA'))
The output as printed in the console:
x col1 col2 col3
x1 http://example1.com https://example2.com NA
x2 NA NA NA
x3 http://example3.com?id=1234 https://example4/com https://example6/com
x4 http://example5.com NA NA
We can use separate
from tidyr
to separate the column 'y' into multiple columns by separating at the ?
that is before the http
library(tidyr)
df_in %>%
separate(y, into = paste0("col", 1:3), sep="[?](?=http)")
# x col1 col2 col3
#1 x1 http://example1.com https://example2.com <NA>
#2 x2 NA <NA> <NA>
#3 x3 http://example3.com?id=1234 https://example4/com http://example6.com
#4 x4 http://example5.com <NA> <NA>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With