Assigning results of strsplit to multiple columns of data frame

I am trying to split a character column of a data frame into three separate columns.

My data is something like:

> df <- data.frame(filename = c("Author1 (2010) Title of paper", 
                                "Author2 et al (2009) Title of paper",
                                "Author3 & Author4 (2004) Title of paper"),
                   stringsAsFactors = FALSE)

I would like to split those three pieces of information (author, year, title) into three separate columns, so that the result would be:

> df
                                 filename            author year          title
1           Author1 (2010) Title of paper           Author1 2010 Title of paper
2     Author2 et al (2009) Title of paper     Author2 et al 2009 Title of paper
3 Author3 & Author4 (2004) Title of paper Author3 & Author4 2004 Title of paper

I have used strsplit to split each filename into a vector of three elements:

 df$temp <- strsplit(df$filename, " \\(|\\) ")

But now I can't find a way to put each element into a separate column. I can access a specific piece of information like this:

> df$temp[[2]][1]
[1] "Author2 et al"

but I can't find how to put it into the new columns:

> df$author <- df$temp[[]][1]
Error
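A minimal base-R sketch of what that last line was reaching for (an illustration, not part of the original question): [[ needs a subscript, so df$temp[[]] cannot mean "element 1 of every component". Applying the [ extractor to each list element with sapply does that instead. The column names author, year and title are taken from the desired output above.

```r
# Sample data from the question
df <- data.frame(filename = c("Author1 (2010) Title of paper",
                              "Author2 et al (2009) Title of paper",
                              "Author3 & Author4 (2004) Title of paper"),
                 stringsAsFactors = FALSE)

# Split each filename on " (" or ") "; the result is a list of character vectors
df$temp <- strsplit(df$filename, " \\(|\\) ")

# `[[` needs an index, so df$temp[[]] fails; instead apply the `[` extractor
# to every list element and keep position 1, 2 or 3
df$author <- sapply(df$temp, `[`, 1)
df$year   <- sapply(df$temp, `[`, 2)
df$title  <- sapply(df$temp, `[`, 3)

df$temp <- NULL  # drop the helper list column
df
```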
iNyar asked Dec 01 '22 00:12


1 Answer

You could try tstrsplit from the development version of data.table:

library(data.table) # v1.9.5+
setDT(df)[, c('author', 'year', 'title') := tstrsplit(filename, ' \\(|\\) ')]
df
#                                  filename             author year
#1:           Author1 (2010) Title of paper           Author1  2010
#2:     Author2 et al (2009) Title of paper     Author2 et al  2009
#3: Author3 & Author4 (2004) Title of paper Author3 & Author4  2004
#             title
#1:  Title of paper
#2:  Title of paper
#3:  Title of paper

Edit: Included OP's split pattern to remove the white spaces.
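For comparison, a base-R sketch without data.table (an alternative, not part of the answer above): do.call(rbind, ...) stacks the split pieces into a character matrix with one row per filename, and cbind attaches its three columns to the data frame. It assumes every filename splits into exactly three pieces; rbind recycles shorter vectors, so a malformed filename would produce misaligned rows. Converting year to integer is an optional extra step.

```r
df <- data.frame(filename = c("Author1 (2010) Title of paper",
                              "Author2 et al (2009) Title of paper",
                              "Author3 & Author4 (2004) Title of paper"),
                 stringsAsFactors = FALSE)

# One row per filename, one column per piece (author, year, title)
parts <- do.call(rbind, strsplit(df$filename, " \\(|\\) "))
colnames(parts) <- c("author", "year", "title")

# Attach the matrix columns as character (not factor) columns
df <- cbind(df, parts, stringsAsFactors = FALSE)

# The split leaves year as text; convert it if a number is wanted
df$year <- as.integer(df$year)
df
```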

akrun answered Dec 05 '22 03:12