Can I convert a StringDocument back into a string? (TextAnalysis.jl)

Tags: nlp, julia

I'm building a spam classifier with the Naive Bayes classifier from the Julia TextAnalysis.jl package.

The text preprocessing functions (such as remove_corrupt_utf8!(sd), where sd is a StringDocument) can only be applied to the package's own Document types, not to plain strings.
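To illustrate the point above, a minimal sketch, assuming TextAnalysis.jl is installed: remove_corrupt_utf8! is called on a StringDocument wrapper rather than on the string itself. The sample message and the variable names raw and sd are made up for illustration.

```julia
using TextAnalysis

# Hypothetical sample message containing one invalid UTF-8 byte (\xff).
raw = "free entry\xff in a weekly comp"
println(isvalid(raw))   # false: \xff is not a valid UTF-8 sequence

# The preprocessing functions expect a Document, so wrap the string first.
sd = StringDocument(raw)

# Cleans the text stored inside `sd` in place. Julia strings are immutable,
# so `raw` itself is left unchanged.
remove_corrupt_utf8!(sd)
```

After the call, the cleaned text lives inside sd, not in raw, which is why a way to get a plain string back out of the document is needed.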

Is there any way to convert this StringDocument back into a string so I can put it back into my dataframe?

Current code:

using DataFrames, TextAnalysis

#global messageLis = []
for row in eachrow(data)
    message = row.v2
    #push!(messageLis, message)
    StringDoc = StringDocument(message)
    remove_corrupt_utf8!(StringDoc) # remove corrupt characters (if any) from the message so that the model doesn't fail
    # TODO: convert StringDoc back into a string so that the text is preprocessed in the dataframe itself
end

Any help would be appreciated.

PseudoCodeNerd, asked Mar 10 '26 07:03


1 Answer

Call text(doc) to get the processed string back out of the document:

julia> using TextAnalysis

julia> str = StringDocument("here are some punctuations !!!...");

julia> prepare!(str, strip_punctuation)

julia> text(str)
"here are some punctuations "
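Applied to the question's dataframe, one way to write the cleaned text back is to wrap the per-message steps in a small helper and broadcast it over the column. This is a sketch, assuming DataFrames.jl and TextAnalysis.jl are installed; the helper name clean_message and the toy data frame are hypothetical, and only the column name v2 comes from the question.

```julia
using DataFrames
using TextAnalysis

# Hypothetical helper: wrap a raw message in a StringDocument, run the
# in-place preprocessing step, then pull the cleaned string back out.
function clean_message(message::AbstractString)
    doc = StringDocument(message)
    remove_corrupt_utf8!(doc)   # mutates `doc` in place
    return text(doc)            # returns the document's current text as a string
end

# Toy stand-in for the spam dataset; the second message has an invalid byte.
data = DataFrame(v1 = ["ham", "spam"],
                 v2 = ["Hello there", "Win a prize\xff now"])

# Replace the message column with the cleaned strings.
data.v2 = clean_message.(data.v2)
```

Inside the original for row in eachrow(data) loop, the equivalent is assigning row.v2 = text(StringDoc) after the preprocessing call; broadcasting the helper over the column avoids the explicit loop.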
David Varela, answered Mar 12 '26 11:03

