<p>Is there any way to preserve the punctuation marks !, ?, " and ' from my text documents using <code>CountVectorizer</code> or <code>TfidfVectorizer</code> parameters in scikit-learn?</p>

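<p>Customize the <code>token_pattern</code> parameter when you instantiate the vectorizer. The default pattern, <code>(?u)\b\w\w+\b</code>, only matches words of two or more characters, so punctuation is dropped; adding each mark as a regex alternative keeps it as its own token. For example:</p> <pre class="prettyprint"><code>from sklearn.feature_extraction.text import CountVectorizer

# Default word pattern, plus !, ?, " and ' as single-character tokens
vect = CountVectorizer(token_pattern=r"(?u)\b\w\w+\b|!|\?|\"|\'") </code></pre>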
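<p>To see which tokens such a pattern produces before fitting a vectorizer, you can apply it directly with the standard <code>re</code> module. The sketch below assumes the vectorizer's default behaviour of lowercasing the text and collecting every match of <code>token_pattern</code>; the helper name <code>preview_tokens</code> is hypothetical and is not part of scikit-learn.</p>

```python
import re

# Same pattern as in the answer: words of 2+ characters, or one of ! ? " '
TOKEN_PATTERN = r"(?u)\b\w\w+\b|!|\?|\"|\'"


def preview_tokens(text):
    """Hypothetical helper: approximate the vectorizer's default tokenization."""
    # Assumption: default settings, i.e. lowercase the text, then keep
    # every non-overlapping match of the token pattern, in order.
    return re.findall(TOKEN_PATTERN, text.lower())


print(preview_tokens('Hello world! Is this "fine"?'))
# ['hello', 'world', '!', 'is', 'this', '"', 'fine', '"', '?']
print(preview_tokens("Don't panic!"))
# ['don', "'", 'panic', '!']
```

<p>Note that single-character words, such as the "t" in "Don't", are still dropped because the word branch of the pattern requires at least two characters.</p>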

How to preserve punctuation marks in scikit-learn's CountVectorizer or TfidfVectorizer?


answered Sep 19 '22 00:09

elyase


Tags:

python

punctuation

nltk

scikit-learn

countvectorizer

Suhairi Suhaimin
