Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Two conditions for split a column

Tags:

regex

r

I have a dataframe like this:

I would like to split the second column into many columns based on "?". However it is not easy because there are other question marker in the sting. So the only additional common it that every substring contains the 'http'.

How is it possible to split them? The number of columns in the output example is just an example I don't know exactly how many could be generated.

Example of input data:

 df_in <- data.frame(x = c('x1','x2','x3','x4'),
                     y = c('http://example1.com?https://example2.com', 'NA', 'http://example3.com?id=1234?https://example4/com?http://example6.com', 'http://example5.com'))

the dataframe as printed in console:

 df_in
  x                                                                    y
 x1                             http://example1.com?https://example2.com
 x2                                                                   NA
 x3 http://example3.com?id=1234?https://example4/com?http://example6.com
 x4                                                  http://example5.com

Example of expected output:

df_out <- data.frame(x = c('x1','x2','x3','x4'),
                     col1 = c('http://example1.com', 'NA', 'http://example3.com?id=1234', 'http://example5.com'),
                     col2 = c('https://example2.com', 'NA', 'https://example4/com', 'NA'),
                     col3 = c('NA', 'NA', 'https://example6/com', 'NA'))

The output as printed in the console:

 x                        col1                 col2                 col3
 x1         http://example1.com https://example2.com                   NA
 x2                          NA                   NA                   NA
 x3 http://example3.com?id=1234 https://example4/com https://example6/com
 x4         http://example5.com                   NA                   NA
like image 224
user8831872 Avatar asked Feb 12 '18 07:02

user8831872


Video Answer


1 Answers

We can use separate from tidyr to separate the column 'y' into multiple columns by separating at the ? that is before the http

library(tidyr)
df_in %>%
     separate(y, into = paste0("col", 1:3), sep="[?](?=http)")
#   x                        col1                 col2                col3
#1 x1         http://example1.com https://example2.com                <NA>
#2 x2                          NA                 <NA>                <NA>
#3 x3 http://example3.com?id=1234 https://example4/com http://example6.com
#4 x4         http://example5.com                 <NA>                <NA>
like image 181
akrun Avatar answered Nov 02 '22 23:11

akrun