 

Convert sentences to words in R

Tags:

r

I have a data frame, input, with the following columns:

Id  Comment
xc545   Ronald is a great person 
g6548   Hero worship is bad

I need the output, result, in this form:

Id  Words 
xc545   Ronald
xc545   is
xc545   a
xc545   great
xc545   person
g6548   Hero
g6548   worship
g6548   is
g6548   bad

I need an R statement to do this.

Here is what I tried:

result <- lapply(input, function(x) strsplit(x[2], " "))

However, this returns only one record.
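The attempt goes wrong because lapply on a data frame iterates over its columns, not its rows: x[2] picks the second element of Id and of Comment, so only the second row's comment ever gets split. A minimal sketch of the row-wise intent, assuming the comments are plain character strings separated by single spaces, pairs each Id with its Comment using Map and binds the pieces (the names pieces and result are illustrative):

```r
input <- data.frame(Id = c("xc545", "g6548"),
                    Comment = c("Ronald is a great person", "Hero worship is bad"),
                    stringsAsFactors = FALSE)

# lapply(input, f) would call f once per column; Map walks the rows instead,
# pairing each Id with its Comment
pieces <- Map(function(id, comment) {
  words <- strsplit(comment, " ", fixed = TRUE)[[1]]
  # the single id is recycled to one row per word
  data.frame(Id = id, Words = words, stringsAsFactors = FALSE)
}, input$Id, input$Comment)

# bind the per-row data frames into one long data frame
result <- do.call(rbind, unname(pieces))
print(result)
```

unname() drops the Id-based list names that Map attaches, so rbind produces plain 1..n row names.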

asked Jun 04 '13 15:06 by user1882633



3 Answers

A data.table solution, inspired by this one:

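library(data.table)
dt <- as.data.table(df)  # df is the question's data frame
# split each Comment on single spaces; unlist() yields one Words column per Id
dt[, list(Words = unlist(strsplit(as.character(Comment), " ", fixed = TRUE))), by = Id]
      Id   Words
1: xc545  Ronald
2: xc545      is
3: xc545       a
4: xc545   great
5: xc545  person
6: g6548    Hero
7: g6548 worship
8: g6548      is
9: g6548     bad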
answered Nov 06 '22 21:11 by agstudy



Suppose DF is your data frame; one possibility is:

> List <- strsplit(as.character(DF$Comment), " ")
> data.frame(Id=rep(DF$Id, sapply(List, length)), Words=unlist(List))
     Id   Words
1 xc545  Ronald
2 xc545      is
3 xc545       a
4 xc545   great
5 xc545  person
6 g6548    Hero
7 g6548 worship
8 g6548      is
9 g6548     bad

Note that this only works when the words are separated by exactly one space.
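If the comments may contain repeated spaces, tabs, or leading blanks, a minimal sketch of the same rep/unlist idea can split on any whitespace run instead. It assumes R 3.2.0 or later (for trimws() and lengths()), and the messy sample comments are made up for illustration:

```r
DF <- data.frame(Id = c("xc545", "g6548"),
                 Comment = c("  Ronald is  a great person ", "Hero\tworship is bad"),
                 stringsAsFactors = FALSE)

# trimws() drops leading/trailing blanks (a leading blank would otherwise
# produce an empty first word); "[[:space:]]+" treats any run of spaces
# or tabs as a single separator
List <- strsplit(trimws(DF$Comment), "[[:space:]]+")

# repeat each Id once per word of its comment
result <- data.frame(Id = rep(DF$Id, lengths(List)),
                     Words = unlist(List),
                     stringsAsFactors = FALSE)
print(result)
```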

answered Nov 06 '22 21:11 by Jilber Urbina



Using scan, tapply and stack:

d <- read.table(text='Id  Comment
xc545   "Ronald is a great person"
g6548   "Hero worship is bad"', header=TRUE, as.is=TRUE)

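# scan() splits each comment on whitespace; tapply groups by the sorted Id values, so g6548 comes first
stack(tapply(d$Comment, d$Id, function(x) scan(text=x, what='')))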
#    values   ind
# 1    Hero g6548
# 2 worship g6548
# 3      is g6548
# 4     bad g6548
# 5  Ronald xc545
# 6      is xc545
# 7       a xc545
# 8   great xc545
# 9  person xc545
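Two details of this approach may matter in practice: tapply sorts the Ids, and scan() treats ' and " as quote characters by default, so a comment containing an apostrophe can be split incorrectly. A minimal sketch, assuming the same d as above, keeps the Ids in their original order and turns off quote handling:

```r
d <- data.frame(Id = c("xc545", "g6548"),
                Comment = c("Ronald is a great person", "Hero worship is bad"),
                stringsAsFactors = FALSE)

# factor levels in order of first appearance, so tapply keeps the input order
ids <- factor(d$Id, levels = unique(d$Id))

# quote = "" stops scan() from treating ' or " inside a comment as quoting;
# quiet = TRUE suppresses the "Read N items" message
words <- tapply(d$Comment, ids,
                function(x) scan(text = x, what = "", quote = "", quiet = TRUE))

res <- stack(words)
print(res)
```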
answered Nov 06 '22 21:11 by Matthew Plourde
