I have a data frame, input, with the following columns:
Id Comment
xc545 Ronald is a great person
g6548 Hero worship is bad
I need the output in the following form (Result):
Id Words
xc545 Ronald
xc545 is
xc545 a
xc545 great
xc545 person
g6548 Hero
g6548 worship
g6548 is
g6548 bad
I need an R statement to do this.
Here is what I tried:
result<-lapply(input,function(x)strsplit(x[2]," "))
However, this returns only one record.
A data.table solution inspired by this one:
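library(data.table)
dt = data.table(df)  # df is the input data frame; Comment must be character, not factor
# Split each comment on single spaces and return one row per word, grouped by Id
dt[, .(Words = unlist(strsplit(Comment, " ", fixed = TRUE))), by = Id]
Id Words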
1: xc545 Ronald
2: xc545 is
3: xc545 a
4: xc545 great
5: xc545 person
6: g6548 Hero
7: g6548 worship
8: g6548 is
9: g6548 bad
Suppose DF is your data.frame, a possibility could be:
> List <- strsplit(DF$Comment, " ")
> data.frame(Id=rep(DF$Id, sapply(List, length)), Words=unlist(List))
Id Words
1 xc545 Ronald
2 xc545 is
3 xc545 a
4 xc545 great
5 xc545 person
6 g6548 Hero
7 g6548 worship
8 g6548 is
9 g6548 bad
Note that my answer is only valid when consecutive words are separated by exactly one space.
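If the comments may contain runs of spaces or tabs, one possible variation is to trim each comment and split on a whitespace regular expression instead. This is only a sketch under that assumption; the irregularly spaced example data below is made up for illustration and is not from the question:

```r
# Hypothetical example data with irregular spacing (not from the question)
DF <- data.frame(Id = c("xc545", "g6548"),
                 Comment = c("Ronald  is a   great person", " Hero worship is\tbad"),
                 stringsAsFactors = FALSE)

# Trim leading/trailing whitespace, then split on one or more whitespace characters
List <- strsplit(trimws(DF$Comment), "[[:space:]]+")

# Repeat each Id once per word and flatten the word lists
result <- data.frame(Id = rep(DF$Id, lengths(List)),
                     Words = unlist(List),
                     stringsAsFactors = FALSE)
result
```

Here lengths(List) does the same job as sapply(List, length); both trimws and lengths need R 3.2.0 or later.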
Using scan, tapply and stack:
d <- read.table(text='Id Comment
xc545 "Ronald is a great person"
g6548 "Hero worship is bad"', header=TRUE, as.is=TRUE)
stack(tapply(d$Comment, d$Id, function(x) scan(text=x, what='')))
# values ind
# 1 Hero g6548
# 2 worship g6548
# 3 is g6548
# 4 bad g6548
# 5 Ronald xc545
# 6 is xc545
# 7 a xc545
# 8 great xc545
# 9 person xc545
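The result above is sorted by Id and uses stack's default column names, values and ind. If the original row order and the Id/Words names from the question matter, one possible adjustment is to fix the factor levels before tapply and rename the columns afterwards. A sketch, assuming the same data frame d; the names ids, words and res are only illustrative:

```r
d <- read.table(text='Id Comment
xc545 "Ronald is a great person"
g6548 "Hero worship is bad"', header=TRUE, as.is=TRUE)

# Keep Ids in order of first appearance rather than alphabetical order
ids <- factor(d$Id, levels = unique(d$Id))

# Split each comment on whitespace; quiet = TRUE suppresses scan's "Read N items" message
words <- tapply(d$Comment, ids, function(x) scan(text = x, what = '', quiet = TRUE))

# stack() returns columns values/ind; reorder and rename them to Id/Words
res <- setNames(stack(words)[, c("ind", "values")], c("Id", "Words"))
res
```

Note that Id comes back as a factor here; wrap it in as.character() if a character column is needed.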