How do I tokenize a string sentence in NLTK?

Tags: python, tokenize, nlp, nltk

I am using NLTK and want to create my own custom texts, just like the default ones in nltk.book. So far, I have only got as far as building one by hand:

my_text = ['This', 'is', 'my', 'text']

I'd like to find a way to input my text as a plain string instead:

my_text = "This is my text, this is a nice way to input text."

Which method, from Python or from NLTK, allows me to do this? And more importantly, how can I discard punctuation symbols?

850

asked Feb 24 '13 23:02

diegoaguilar

1 Answer

This is actually on the main page of nltk.org:

>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
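Note that word_tokenize keeps punctuation such as the final '.' as separate tokens. For the punctuation part of the question, here is a minimal sketch using only Python's standard re module. The function name tokenize_without_punctuation and the regular expression are my own assumptions for illustration, not something taken from the NLTK docs; the pattern keeps runs of word characters and allows one internal apostrophe so contractions stay whole.

```python
import re

# Hypothetical helper (not part of NLTK): keep runs of word characters,
# optionally followed by an apostrophe and more word characters, so that
# contractions like "didn't" survive while commas and periods are dropped.
WORD_PATTERN = re.compile(r"\w+(?:'\w+)?")


def tokenize_without_punctuation(text):
    """Return the word tokens of `text`, skipping punctuation symbols."""
    return WORD_PATTERN.findall(text)


my_text = "This is my text, this is a nice way to input text."
tokens = tokenize_without_punctuation(my_text)
print(tokens)
```

If you would rather stay inside NLTK, nltk.tokenize.RegexpTokenizer accepts a pattern in the same way, and the resulting list of tokens can be passed to nltk.Text(tokens) to get an object like the ones in nltk.book. Which pattern is right depends on what you count as a word, so treat the one above as a starting point.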

186

answered Sep 25 '22 19:09

Pavel Anossov
