I'm performing some text analysis on multiple resumes to generate a word cloud using the wordcloud package, along with the tm package for preprocessing the corpus of documents in R.
The problems I'm facing are:
1. Checking whether the words in the corpus have meaning, i.e. whether they belong to an English dictionary.
2. How to mine/process multiple resumes together.
3. Checking for tech terms like R, Java, Eclipse, etc.
Any help is appreciated.
I've faced similar issues before, so here are solutions to your problems:
1. The qdapDictionaries package is a collection of dictionaries and word lists for use with the 'qdap' package.
library(qdapDictionaries)
#create custom function
is.word <- function(x) x %in% GradyAugmented # or use any dataset from package
#use this function to filter words; df is a data frame of corpus terms, with the terms in df$terms
df <- df[which(is.word(df$terms)),]
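To make the `df` in that snippet concrete, here is a minimal base-R sketch, assuming `df` is a term-frequency data frame with a `terms` column. The small `english_words` vector is a hypothetical stand-in for `GradyAugmented`, so the example runs without qdapDictionaries installed, and `term_freq_df()` is a helper made up for illustration.

```r
# Hypothetical stand-in for qdapDictionaries::GradyAugmented (a character
# vector of lowercase English words); use the real word list in practice.
english_words <- c("experienced", "developer", "with", "skills", "in", "and")

# Hypothetical helper: build a term-frequency data frame from raw text.
# "+" and "#" are kept inside tokens so terms like "c++" or "c#" survive.
term_freq_df <- function(texts) {
  tokens <- unlist(strsplit(tolower(texts), "[^a-z+#]+"))
  tokens <- tokens[nzchar(tokens)]
  counts <- table(tokens)
  data.frame(terms = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Same idea as is.word() above, but checking against the stand-in list.
is.word <- function(x) x %in% english_words

resume_text <- "Experienced developer with skills in Java and Eclipse; xyzzq"
df <- term_freq_df(resume_text)
df <- df[which(is.word(df$terms)), ]
df$terms
```

Strict dictionary filtering like this can also discard tech terms such as "java" or "eclipse", which is one reason to keep a separate tech dictionary.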
2. Use VCorpus(DirSource(...)) to create your corpus from the directory containing all the resumes.
library(tm)
resumeDir <- "path/all_resumes/"
#one document per file in the directory
myCorpus <- VCorpus(DirSource(resumeDir))
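If you want to see roughly what processing all resumes together amounts to, this base-R sketch reads every `.txt` file in a directory and builds one combined term-frequency table. It assumes plain-text resumes (PDF or Word files would need converting first); `corpus_term_freq()`, the temporary directory and the sample file contents are hypothetical, for the demo only.

```r
# Hypothetical helper: combined term frequencies over every .txt file in dir.
corpus_term_freq <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  # One string per resume, similar to one document per file in
  # VCorpus(DirSource(dir)).
  docs <- vapply(files,
                 function(f) paste(readLines(f, warn = FALSE), collapse = " "),
                 character(1))
  tokens <- unlist(strsplit(tolower(docs), "[^a-z+#]+"))
  tokens <- tokens[nzchar(tokens)]
  counts <- sort(table(tokens), decreasing = TRUE)
  data.frame(terms = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Demo: write two small made-up resumes to a temporary directory.
resumeDir <- file.path(tempdir(), "all_resumes")
dir.create(resumeDir, showWarnings = FALSE)
writeLines("Java developer. Eclipse, Java, SQL.", file.path(resumeDir, "a.txt"))
writeLines("R and Java analyst.", file.path(resumeDir, "b.txt"))

freq <- corpus_term_freq(resumeDir)
head(freq)
```

For real work the tm route is more convenient, since the corpus can then be cleaned with helpers such as `tm_map()` and `removeWords()` before counting.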
3. Create a custom dictionary file, such as my_dict.csv, containing tech terms.
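#read custom dictionary (assumes a header row, with the terms in the first column)
tech_dict <- read.csv("path/to/my_dict.csv", stringsAsFactors = FALSE)
#create tech function; compare against the column of terms, not the whole data frame
is.tech <- function(x) x %in% tech_dict[[1]]
#filter
tech_df <- df[which(is.tech(df$terms)),]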
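One thing to watch: tm's `TermDocumentMatrix()` and `DocumentTermMatrix()` drop terms shorter than three characters by default (the `wordLengths` control option defaults to `c(3, Inf)`) and lowercase terms by default, so a one-letter term like "r" may never reach `df` unless you relax that, e.g. `control = list(wordLengths = c(1, Inf))`. The sketch below shows the tech-term filter on its own in base R, assuming the dictionary CSV has a header row and the terms in its first column; the file contents and the example `df` are made up.

```r
# Made-up tech dictionary: a CSV with a header row, one term per line.
dict_path <- tempfile(fileext = ".csv")
writeLines(c("term", "R", "Java", "Eclipse", "SQL"), dict_path)

tech_dict <- read.csv(dict_path, stringsAsFactors = FALSE)
# Compare against the column's values, not the data frame itself, and
# lowercase both sides because tm lowercases corpus terms by default.
tech_terms <- tolower(tech_dict[[1]])
is.tech <- function(x) tolower(x) %in% tech_terms

# Example term data frame, shaped like the df used above.
df <- data.frame(terms = c("java", "developer", "r", "eclipse", "team"),
                 freq = c(3L, 1L, 2L, 1L, 4L), stringsAsFactors = FALSE)
tech_df <- df[which(is.tech(df$terms)), ]
tech_df
```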
Hope this helps.
Try the dictionary R package (disclaimer: I am the maintainer of this R library).
Here we get the definition of the word "hello":
word <- "hello"
word_info <- define(word)
word_info$meanings
## [[1]]
## partOfSpeech
## 1 exclamation
## 2 noun
## 3 verb
## definitions
## 1 used as a greeting or to begin a phone conversation., hello there, Katie!
## 2 an utterance of ‘hello’; a greeting., she was getting polite nods and hellos from people
## 3 say or shout ‘hello’., I pressed the phone button and helloed