
When I try to melt my data frame with mixed data types, I get NAs. How can I best resolve this?

Tags: r, reshape

My goal and context

I have a data frame in R that I want to melt using the reshape2 library. There are two reasons.

  1. I want to plot the score for each user for each question in a bar chart using ggplot.

  2. I want to put this data into Excel so I can see, per user, the sentiment, score, and mixed values for each question (motivated, attitudeBefore, etc.). My intention was to use melt, then cast to put the data into wide format for easy Excel importing.

My problem

When I try to run melt, I get a warning and end up with NAs in my resulting molten data frame.

Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = c(0.148024, 0.244452, -0.00421,  :
invalid factor level, NAs generated
2: In `[<-.factor`(`*tmp*`, ri, value = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,  :
invalid factor level, NAs generated

I think it's because my measure variables mix a factor column (sentiment) with numeric columns (score, mixed), and melt stacks them all into a single value column.
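The warning itself can be reproduced in base R without any reshaping: assigning a value that is not one of a factor's existing levels produces NA. The following is a minimal sketch; the vectors `f` and `g` are made up for illustration and are not part of the original data.

```r
# A small factor, similar in spirit to the `sentiment` column
f <- factor(c("positive", "negative", "unknown"))

# 0.148024 is not one of the levels, so R warns
# ("invalid factor level, NA generated") and stores NA
f[2] <- 0.148024
is.na(f[2])

# A plain character vector has no fixed set of levels,
# so the number is simply coerced to a string
g <- c("positive", "negative", "unknown")
g[2] <- 0.148024
g[2]
```

This suggests the NAs come from numeric values being written into a value column that is still a factor.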

My questions

I have two questions as a result.

Question 1: Is there a workaround for this in R?

Question 2: Is there a better way for me to structure my data to avoid this problem?

Code

Here's my code for creating the data frame.

words <- data.frame(read.delim("sentiments-test-subset-no-text.txt", header=FALSE))
names(words) <- c("level", "question", "user", "sentiment", "score", "mixed")
words$user <- as.factor(words$user)
words.m <- melt(words, id.vars=c("user", "level"), measure.vars=c("sentiment", "score", "mixed"))

I'm pretty new to reshape and melt but I think that's what I want in the last line.

Data

The data in human-readable format looks like this.

experimental    motivated   1   positive    0.148024    0
experimental    motivated   2   positive    0.244452    0
experimental    motivated   3   negative       -0.004210    0
experimental    motivated   4   unknown         0.000000    0
experimental    attitudeBefore  1   negative       -0.241500    0
experimental    attitudeBefore  2   neutral         0.000000    0
experimental    attitudeBefore  3   neutral         0.000000    0
experimental    attitudeBefore  4   unknown         0.000000    0

dput dump

dput below.

structure(list(level = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), .Label = "experimental", class = "factor"), question = structure(c(2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("attitudeBefore", "motivated"
), class = "factor"), user = structure(c(1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"), 
sentiment = structure(c(3L, 3L, 1L, 4L, 1L, 2L, 2L, 4L), .Label = c("negative", 
"neutral", "positive", "unknown"), class = "factor"), score = c(0.148024, 
0.244452, -0.00421, 0, -0.2415, 0, 0, 0), mixed = c(0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("level", "question", 
"user", "sentiment", "score", "mixed"), row.names = c(NA, -8L
), class = "data.frame")
Irwin asked Oct 05 '22 10:10


1 Answer

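It looks like you might simply have the wrong package loaded. reshape and reshape2 are different packages that both provide a melt function, and only the reshape version produces this warning with your data.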

library(reshape2)
words.m <- melt(words, id.vars=c("user", "level"), measure.vars=c("sentiment", "score", "mixed"))
# no problem

detach(package:reshape2)

# using reshape instead of reshape2
library(reshape)
words.m <- melt(words, id.vars=c("user", "level"), measure.vars=c("sentiment", "score", "mixed"))
# Warning messages:
# 1: In `[<-.factor`(`*tmp*`, ri, value = c(3L, 3L, 1L, 4L, 1L, 2L, 2L,  :
#   invalid factor level, NAs generated
# 2: In `[<-.factor`(`*tmp*`, ri, value = c(3L, 3L, 1L, 4L, 1L, 2L, 2L,  :
#   invalid factor level, NAs generated
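If both packages end up attached in the same session, whichever was loaded last masks the other's melt. A hedged sketch of how one might check which melt is visible and call the reshape2 version explicitly is shown here; it assumes reshape2 is installed, and it rebuilds `words` by hand from the dput output in the question.

```r
library(reshape2)

# Rebuild the question's data frame (same values as the dput output)
words <- data.frame(
  level     = factor(rep("experimental", 8)),
  question  = factor(rep(c("motivated", "attitudeBefore"), each = 4)),
  user      = factor(rep(1:4, times = 2)),
  sentiment = factor(c("positive", "positive", "negative", "unknown",
                       "negative", "neutral", "neutral", "unknown")),
  score     = c(0.148024, 0.244452, -0.00421, 0, -0.2415, 0, 0, 0),
  mixed     = rep(0L, 8)
)

# Name of the package whose melt() is currently found first
melt_source <- environmentName(environment(melt))
melt_source

# Calling through the namespace bypasses any masking by reshape
words.m <- reshape2::melt(words, id.vars = c("user", "level"),
                          measure.vars = c("sentiment", "score", "mixed"))
nrow(words.m)
```

Using the `reshape2::` prefix makes the call independent of the order in which the packages were attached.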

If reshape2 is not available on your system, you can install it from CRAN:

 install.packages("reshape2")
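On the question of structuring the data differently: melt stacks all measure variables into one value column, so mixing a factor with numerics forces a single common type (typically character). One possible way around that, sketched here under the assumption that reshape2 is installed, is to keep sentiment as an identifier and melt only the numeric columns. The names `words.num` and `words.wide` are made up for this illustration.

```r
library(reshape2)

# Rebuild the question's data frame (same values as the dput output)
words <- data.frame(
  level     = factor(rep("experimental", 8)),
  question  = factor(rep(c("motivated", "attitudeBefore"), each = 4)),
  user      = factor(rep(1:4, times = 2)),
  sentiment = factor(c("positive", "positive", "negative", "unknown",
                       "negative", "neutral", "neutral", "unknown")),
  score     = c(0.148024, 0.244452, -0.00421, 0, -0.2415, 0, 0, 0),
  mixed     = rep(0L, 8)
)

# Keep `sentiment` as an id variable so `value` holds numbers only
words.num <- reshape2::melt(
  words,
  id.vars      = c("user", "level", "question", "sentiment"),
  measure.vars = c("score", "mixed")
)

# Wide layout: one row per user, one column per question/measure pair
words.wide <- reshape2::dcast(
  words.num,
  user + level ~ question + variable,
  value.var = "value"
)
words.wide
```

The long form (`words.num`) is the shape ggplot generally expects, while the wide form (`words.wide`) could be written out with write.csv for Excel.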
Ricardo Saporta answered Oct 13 '22 10:10