
I have a dataframe of words and I would like to filter out rows that have numbers in the word column in R

Tags: r, filter, dplyr

So I have a df with a list of words and their frequencies. I would like to filter out the rows where the word is a number. Since the column is mostly text, however, R treats every entry as a character, including the numbers.

I attempted:

test <- test %>%
  filter(word == as.character(word))

But this did not work.

test <- structure(list(word = c("data", "summit", "research", "program", 
"analysis", "study", "evaluation", "minority", "experience", "department", 
"statistical", "Experience", "business", "design", "education", 
"response", "7", "sampling", "learning", "5"), n = c(213L, 
131L, 101L, 98L, 90L, 84L, 82L, 82L, 76L, 72L, 65L, 63L, 60L, 
58L, 58L, 58L, 56L, 55L, 50L, 50L)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

Additionally, is there a way to make all entries lower case? I would like to end up with a df that has no rows where the word is a number and has all entries in lower case (which would later be grouped).

Johnny Thomas asked Dec 27 '25 14:12



1 Answer

You can do:

test %>%
 mutate(word = tolower(word)) %>%
 filter(!grepl("[^A-Za-z]", word))

   word            n
   <chr>       <int>
 1 data          213
 2 summit        131
 3 research      101
 4 program        98
 5 analysis       90
 6 study          84
 7 evaluation     82
 8 minority       82
 9 experience     76
10 department     72
11 statistical    65
12 experience     63
13 business       60
14 design         58
15 education      58
16 response       58
17 sampling       55
18 learning       50
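
For readers who want to see why the original attempt kept every row, and how the later grouping might look, here is a minimal base R sketch. It assumes a small stand-in for `test` (a subset of the rows from the question) and uses base R functions rather than dplyr, so it is an illustration of the same idea and not the exact code above. The `clean`, `clean_digits` and `grouped` names are made up for this example.

```r
# Small stand-in for the `test` data from the question (a subset of its rows).
test <- data.frame(
  word = c("data", "Experience", "experience", "7", "learning", "5"),
  n    = c(213L, 63L, 76L, 56L, 50L, 50L),
  stringsAsFactors = FALSE
)

# The original attempt compares each word with itself converted to character.
# The column is already character, so the comparison is TRUE for every row
# and nothing is filtered out.
attempt_keeps_all <- all(test$word == as.character(test$word))

# Lower-case first, then keep only words made up entirely of letters.
# "[^a-z]" matches any character that is not a lower-case ASCII letter.
test$word <- tolower(test$word)
clean <- test[!grepl("[^a-z]", test$word), ]

# Variant (an assumption about what may be wanted): drop a row only when the
# word contains a digit, which would keep hyphenated or accented words.
clean_digits <- test[!grepl("[0-9]", test$word), ]

# Group the now-identical lower-case words and sum their counts.
grouped <- aggregate(n ~ word, data = clean, FUN = sum)
print(grouped)
```

The stricter `[^a-z]` pattern also removes words containing punctuation or accented letters, so the digit-only variant may be the safer choice if such words appear in the real data.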
tmfmnk answered Dec 31 '25 17:12



