
Checking if a word exists in the English dictionary in R

I'm performing text analysis on multiple resumes to generate a word cloud, using the wordcloud package together with the tm package to preprocess the document corpus in R.

The problems I'm facing are:

  1. Checking whether a word in the corpus has a meaning, i.e. whether it belongs to the English dictionary.

  2. How to mine/process multiple resumes together.

  3. Checking for tech terms like R, Java, Eclipse, etc.

Appreciate the help.

asked Jul 07 '17 05:07 by test user



2 Answers

I've faced similar issues before, so here are solutions to your problems:

1. The qdapDictionaries package is a collection of dictionaries and word lists for use with the 'qdap' package; its GradyAugmented dataset is a vector of English words.

library(qdapDictionaries)

#create custom function
is.word  <- function(x) x %in% GradyAugmented # or use any dataset from package

#use this function to filter words, df = dataframe from corpus
df <- df[which(is.word(df$terms)),]
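
A minimal sketch of the filter above on toy data. To keep it runnable without extra packages, a small hypothetical vector `word_list` stands in for `GradyAugmented`, and `df` with its `terms`/`freq` columns is made up for illustration. Lowercasing before the lookup is an assumption that the word list is stored in lowercase.

```r
# Hypothetical stand-in for qdapDictionaries::GradyAugmented (assumption).
word_list <- c("analysis", "data", "manager", "project")

# Toy term-frequency table; in practice this would come from the corpus.
df <- data.frame(
  terms = c("Data", "analysis", "xyzzq", "project", "mngr"),
  freq  = c(12, 7, 1, 5, 2),
  stringsAsFactors = FALSE
)

# TRUE when the lowercased term appears in the word list.
is.word <- function(x, dictionary = word_list) tolower(x) %in% dictionary

# Keep only the rows whose term is a known word.
known <- df[is.word(df$terms), , drop = FALSE]
print(known$terms)
```

Logical indexing is enough here; `which()` is only needed if `is.word()` could return `NA`, which `%in%` never does.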

2. Use VCorpus(DirSource(...)) from the tm package to create your corpus from the directory containing all resumes

library(tm)

resumeDir <- "path/all_resumes/"
myCorpus <- VCorpus(DirSource(resumeDir))
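
The other snippets in this answer assume a data frame `df` with a `terms` column. One possible way to get there from a corpus is sketched below. The two resume strings and the `freq` column are made up, and `VectorSource()` replaces `DirSource()` only so the example runs without files on disk.

```r
library(tm)

# Two made-up resume snippets stand in for files read via DirSource().
docs <- c(
  resume1 = "Experienced Java developer, worked with Eclipse and Git.",
  resume2 = "Data analyst using R and Java for reporting."
)
corpus <- VCorpus(VectorSource(docs))

# Basic cleaning: lowercase, drop punctuation, collapse whitespace.
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)

# By default tm drops terms shorter than 3 characters;
# wordLengths = c(1, Inf) keeps one-letter terms such as "r".
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))

# Sum counts across all resumes and build the term table.
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
df <- data.frame(terms = names(freq), freq = unname(freq),
                 stringsAsFactors = FALSE)
head(df)
```

Keeping short terms matters for the tech-term check in point 3, since "r" would otherwise be removed before it can be matched.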

3. Create a custom dictionary file, such as my_dict.csv, containing tech terms.

#read custom dictionary
tech_dict <- read.csv("path/to/my_dict.csv", stringsAsFactors = FALSE)
#create tech function; read.csv returns a data frame, so match against its
#first column (assumed here to hold the terms)
is.tech <- function(x) x %in% tech_dict[[1]]
#filter
tech_df <- df[which(is.tech(df$terms)),]
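
A small self-contained sketch of the tech-term filter. The CSV contents, the column name `term`, and the toy `df` are all hypothetical. The dictionary is read from an inline string instead of a file, and both sides are lowercased so that "R" in the dictionary matches "r" in the corpus.

```r
# Hypothetical contents of my_dict.csv: a single column named "term".
tech_dict <- read.csv(text = "term\nR\nJava\nEclipse\nGit",
                      stringsAsFactors = FALSE)

# Toy term-frequency table standing in for the corpus-derived df.
df <- data.frame(
  terms = c("java", "eclipse", "manager", "r", "team"),
  freq  = c(4, 2, 6, 3, 5),
  stringsAsFactors = FALSE
)

# Normalise the dictionary once, then match case-insensitively.
tech_terms <- tolower(trimws(tech_dict$term))
is.tech <- function(x) tolower(x) %in% tech_terms

# Keep only the rows whose term is a listed tech term.
tech_df <- df[is.tech(df$terms), , drop = FALSE]
print(tech_df$terms)
```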

Hope this helps.

answered Sep 22 '22 10:09 by parth


Try the dictionary R package (disclaimer: I am the maintainer of this library).

Example

Here we get the definition of the word "hello":

library(dictionary)

word <- "hello"
word_info <- define(word)

word_info$meanings
## [[1]]
##   partOfSpeech
## 1  exclamation
## 2         noun
## 3         verb
##                                                                                definitions
## 1                used as a greeting or to begin a phone conversation., hello there, Katie!
## 2 an utterance of ‘hello’; a greeting., she was getting polite nods and hellos from people
## 3                            say or shout ‘hello’., I pressed the phone button and helloed

answered Sep 22 '22 10:09 by stevec