I'm building a spam classifier using the Naive Bayes classifier from the Julia TextAnalysis.jl package.
The text pre-processing functions (such as remove_corrupt_utf8!(sd), where sd is a StringDocument) can only be applied to the package's own document types, not to plain strings.
Is there any way to convert the StringDocument back into a string so that I can put it back into my DataFrame?
Current code:
#global messageLis = []
for row in eachrow(data)
    message = row.v2
    #push!(messageLis, message)
    StringDoc = StringDocument(message)
    remove_corrupt_utf8!(StringDoc)  # remove corrupt characters (if any) from the message so that the model doesn't fail
    # convert StringDoc back into a string here, so that the preprocessed text ends up in the DataFrame itself
end
Any help would be appreciated.
Use the text function to access the processed string held by a StringDocument:
julia> str = StringDocument("here are some punctuations !!!...");
julia> prepare!(str, strip_punctuation)
julia> text(str)
"here are some punctuations "
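Applied to the loop in the question, the same idea could look roughly like the sketch below. It assumes data is a DataFrame whose v2 column holds the messages, as in the question; the sample rows and the helper name clean_message are made up for illustration, and strip_punctuation is included only to mirror the example above.

```julia
using DataFrames
using TextAnalysis

# Hypothetical sample data; in the question, `data` is loaded elsewhere.
data = DataFrame(v1 = ["ham", "spam"],
                 v2 = ["Hello there!!!", "WIN a prize... now"])

# Hypothetical helper: wrap the string in a StringDocument, preprocess it
# in place, then pull the processed string back out with `text`.
function clean_message(message::AbstractString)
    sd = StringDocument(String(message))
    remove_corrupt_utf8!(sd)          # drop corrupt UTF-8 characters, if any
    prepare!(sd, strip_punctuation)   # optional extra preprocessing step
    return text(sd)                   # back to a plain string
end

# Broadcast over the column and write the cleaned strings back.
data.v2 = clean_message.(data.v2)
```

Broadcasting a small function over the column avoids mutating rows inside a loop, but assigning to row.v2 inside the original eachrow loop should work as well.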