Plotting order for ggplot groups with repeated factors [duplicate]

I'm playing around with some text analysis and trying to display the top words in each book, ranked by tf-idf (a numeric value, referred to as IDF below). I've largely been following along with the tidytext book, Text Mining with R, but using the Harry Potter books.

Some of the top words (by IDF) are shared between books (e.g. Lupin or Griphook), and when plotting, the order uses the max IDF for that word. For example, griphook is a key word in both Sorcerer's Stone and Deathly Hallows. It has a value of .0007 in Deathly Hallows but only .0002 in Sorcerer's Stone, yet it is plotted as the top word for Sorcerer's Stone.

[Image: ggplot output]

hp.plot <- hp.words %>%
  arrange(desc(tf_idf)) %>%
  mutate(word = factor(word, levels = rev(unique(word))))

##For correct ordering of books
hp.plot$book <- factor(hp.plot$book, levels = c('Sorcerer\'s Stone', 'Chamber of Secrets',
                                                 'Prisoner of Azkhaban', 'Goblet of Fire',
                                                 'Order of the Phoenix', 'Half-Blood Prince',
                                                 'Deathly Hallows'))

hp.plot %>%
  group_by(book) %>% 
  top_n(10) %>% 
  ungroup %>%
  ggplot(aes(x=word, y=tf_idf, fill = book, group = book)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "tf-idf") +
  facet_wrap(~book, scales = "free") +
  coord_flip()
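
The ordering problem can be reproduced without the full dataset. The sketch below is a base-R stand-in for the arrange() and factor() step above, run on a made-up four-row data frame (the words and tf_idf values are hypothetical, not taken from the real data). It shows that the factor ends up with a single level order shared by every facet.

```r
# Toy data (hypothetical values): "griphook" appears in both books
toy <- data.frame(
  book   = c("Sorcerer's Stone", "Sorcerer's Stone",
             "Deathly Hallows", "Deathly Hallows"),
  word   = c("griphook", "quirrell", "griphook", "hallows"),
  tf_idf = c(0.0002, 0.0005, 0.0007, 0.0003),
  stringsAsFactors = FALSE
)

# Base-R equivalent of arrange(desc(tf_idf)) followed by
# factor(word, levels = rev(unique(word)))
toy <- toy[order(-toy$tf_idf), ]
toy$word <- factor(toy$word, levels = rev(unique(toy$word)))

# One level order for the whole data frame, not one per book
global_levels <- levels(toy$word)
print(global_levels)
```

Because unique() keeps the first occurrence of each word after sorting, a repeated word is positioned by its largest value. Here griphook is the last level (drawn at the top after coord_flip()) in both facets, even though quirrell has the higher value within Sorcerer's Stone.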

And here's an image of the dataframe for your reference.

I've tried sorting beforehand but that doesn't seem to work. Any ideas?

Edit: CSV is here

GeorgeR90 asked Nov 16 '25 22:11



1 Answer

The reorder() function will reorder a factor by a specified variable (see ?reorder).
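
As a small illustration of what reorder() does when a level appears more than once, here is a sketch on a made-up data frame (the words and values are hypothetical). By default reorder() summarises the values for each level with mean() and sorts the levels by that summary.

```r
# Toy data (hypothetical values): "griphook" appears twice
toy <- data.frame(
  book   = c("Sorcerer's Stone", "Sorcerer's Stone",
             "Deathly Hallows", "Deathly Hallows"),
  word   = c("griphook", "quirrell", "griphook", "hallows"),
  tf_idf = c(0.0002, 0.0005, 0.0007, 0.0003),
  stringsAsFactors = FALSE
)

# reorder() sorts levels by a per-level summary of tf_idf (default FUN = mean)
toy$word <- reorder(toy$word, toy$tf_idf)

word_levels <- levels(toy$word)
word_scores <- attr(toy$word, "scores")  # the per-level means used for sorting
print(word_levels)
print(word_scores)
```

Note that the result is still one set of levels for the whole column: griphook is placed by the mean of its two values, so a word shared between books gets the same relative position in every facet.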

Inserting mutate(word = reorder(word, tf_idf)) after ungroup() in your last block before plotting should reorder by tf_idf. I don't have a sample of your data, but using the janeaustenr package, this does the same:

library(tidytext)
library(janeaustenr)
library(dplyr)

book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE) %>%
  ungroup()

total_words <- book_words %>% 
  group_by(book) %>% 
  summarize(total = sum(n))

book_words <- left_join(book_words, total_words, by = "book")

book_words <- book_words %>%
  bind_tf_idf(word, book, n) 


library(ggplot2)
book_words %>% 
  group_by(book) %>%
  top_n(10, tf_idf) %>% 
  ungroup() %>% 
  mutate(word = reorder(word, tf_idf)) %>% 
  ggplot(aes(x = word, y = tf_idf, fill = book, group = book)) + 
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = "tf-idf") +
  facet_wrap(~book, scales = "free") +
  coord_flip()
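
If the same word appears in several facets and each facet needs its own order, one possible approach (a sketch, not part of the code above) is to reorder a combined word-and-book key instead of the word itself, then strip the book suffix from the axis labels. The key column, the "___" separator and the strip_suffix() helper are hypothetical names chosen for this example, and the data is made up.

```r
# Toy data (hypothetical values): "griphook" appears in both books
toy <- data.frame(
  book   = c("Sorcerer's Stone", "Sorcerer's Stone",
             "Deathly Hallows", "Deathly Hallows"),
  word   = c("griphook", "quirrell", "griphook", "hallows"),
  tf_idf = c(0.0002, 0.0005, 0.0007, 0.0003),
  stringsAsFactors = FALSE
)

# One key per (word, book) pair, so each row has its own factor level
toy$key <- paste(toy$word, toy$book, sep = "___")
toy$key <- reorder(toy$key, toy$tf_idf)
key_levels <- levels(toy$key)

# Helper for axis labels: drop everything from the separator onwards
strip_suffix <- function(x) sub("___.*$", "", x)
print(strip_suffix(key_levels))

# Plot only if ggplot2 is installed; call print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = key, y = tf_idf, fill = book)) +
    geom_col(show.legend = FALSE) +
    scale_x_discrete(labels = strip_suffix) +
    labs(x = NULL, y = "tf-idf") +
    facet_wrap(~book, scales = "free") +
    coord_flip()
}
```

With free scales each facet only shows its own keys, so griphook sits below quirrell in Sorcerer's Stone and above hallows in Deathly Hallows. The tidytext package offers reorder_within() and scale_x_reordered() for the same pattern; check their help pages for the exact arguments in your installed version.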
jdb answered Nov 18 '25 12:11
