>>> t = Tokenizer(num_words=3)
>>> l = ["Hello, World! This is so&#$ fantastic!", "There is no other world like this one"]
>>> t.fit_on_texts(l)
>>> t.word_index
{'fantastic': 6, 'like': 10, 'no': 8, 'this': 2, 'is': 3, 'there': 7, 'one': 11, 'other': 9, 'so': 5, 'world': 1, 'hello': 4}
I'd have expected t.word_index to contain just the top 3 words. What am I doing wrong?
num_words is simply your vocabulary size, i.e. the maximum number of words to keep. We need to be careful when selecting this parameter, because it affects the performance of the model. By default, num_words is None, meaning all words are kept. A common choice is num_words = len(tokenizer.word_index) + 1, since word indices start at 1 and index 0 is reserved.
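To see where the "+ 1" comes from, here is a minimal sketch (the word_index dict below is a hypothetical example, not taken from a fitted Tokenizer): indices start at 1 and 0 is reserved for padding, so any lookup table such as an Embedding layer needs one extra row.

```python
# Hypothetical word_index like the one Tokenizer.fit_on_texts produces
word_index = {'world': 1, 'this': 2, 'is': 3, 'hello': 4}

# Indices start at 1 and 0 is reserved (e.g. for padding), so an
# Embedding layer needs len(word_index) + 1 rows to cover them all.
vocab_size = len(word_index) + 1
print(vocab_size)  # 5
```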
By default, all punctuation is removed, turning the texts into space-separated sequences of words (words may include the ' character).
Keras Tokenizer Class
The Tokenizer class of Keras is used for vectorizing a text corpus. Each text input is converted either into a sequence of integers or into a vector that has a coefficient for each token, for example in the form of binary values.
There is nothing wrong with what you are doing. word_index is computed the same way no matter how many most-frequent words you will use later (as you may see here). So when you call any transformative method, Tokenizer will use only the three most common words, while at the same time keeping the counts of all words, even though it's obvious it will not use them later.
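The behaviour can be illustrated with a simplified pure-Python sketch (this is an approximation of the Tokenizer logic, not the Keras source): fitting indexes every word, while the transformation drops any word whose index is not strictly below num_words, so num_words=3 keeps only indices 1 and 2.

```python
from collections import Counter

def simple_words(text):
    # Crude stand-in for Keras's lowercasing and punctuation filters
    return "".join(c if c.isalnum() or c == "'" else " "
                   for c in text.lower()).split()

def fit_word_index(texts):
    """Index ALL words, most frequent first, starting at 1
    (roughly what fit_on_texts does)."""
    counts = Counter()
    for text in texts:
        counts.update(simple_words(text))
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}

def texts_to_sequences(texts, word_index, num_words):
    """Only words with index strictly below num_words survive;
    index 0 is reserved, so num_words=3 keeps indices 1 and 2."""
    return [[word_index[w] for w in simple_words(text)
             if w in word_index and word_index[w] < num_words]
            for text in texts]

texts = ["Hello, World! This is so&#$ fantastic!",
         "There is no other world like this one"]
wi = fit_word_index(texts)
print(len(wi))                                  # 11: the full vocabulary
print(texts_to_sequences(texts, wi, num_words=3))  # only 'world' and 'this' kept
```

So word_index always holds all 11 words, but the transformed sequences contain only the two most common ones, which is exactly what the real Tokenizer does with num_words=3.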