Stanford POS tagger in Java usage

Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: � (U+FFFD, decimal: 65533)
Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: � (U+FFFD, decimal: 65533)
Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: � (U+FFFD, decimal: 65533)
Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: � (U+FFFD, decimal: 65533)
Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: � (U+FFFD, decimal: 65533)
Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: � (U+FFFD, decimal: 65533)
Mar 9, 2011 1:22:06 PM edu.stanford.nlp.process.PTBLexer next
WARNING: Untokenizable: � (U+FFFD, decimal: 65533)

These are the warnings I get when I assign POS tags to sentences that I read from a file. The first few sentences are tagged without any problem, but after some sentences have been read the "Untokenizable" warning starts to appear. I'm using v2.0 (2009) of the POS tagger with the left3words model.

asked Mar 09 '11 08:03

KNsiva

1 Answer

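I agree with Yuval: this is a character encoding problem. The most common case, though, is a file in a single-byte encoding such as ISO-8859-1 that the tagger is trying to read as UTF-8. Bytes that are not valid UTF-8 are decoded as the replacement character U+FFFD, which the tokenizer then reports as untokenizable. See the discussion of U+FFFD on Wikipedia.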
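To illustrate the mismatch, here is a minimal sketch in plain Java. It does not use the tagger's API at all; the class name `EncodingDemo` and its helper methods are made up for this example, and it assumes the input really is ISO-8859-1. It shows that Latin-1 bytes decoded as UTF-8 produce U+FFFD, and that reading with the matching charset keeps the text intact.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper class for illustration; not part of the Stanford tagger.
public class EncodingDemo {

    // Decode raw bytes with the given charset. The String constructor
    // substitutes U+FFFD for byte sequences that are invalid in that charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Count occurrences of the replacement character U+FFFD in the text.
    static long countReplacementChars(String text) {
        return text.chars().filter(c -> c == 0xFFFD).count();
    }

    // Read a file with an explicit charset instead of relying on a default.
    // ISO-8859-1 maps every byte to a character, so decoding cannot fail.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        return Files.readAllLines(file, charset);
    }

    public static void main(String[] args) {
        // "café au lait" encoded as ISO-8859-1: the é is the single byte 0xE9.
        byte[] latin1 = "caf\u00e9 au lait".getBytes(StandardCharsets.ISO_8859_1);

        // Wrong charset: 0xE9 is not a valid UTF-8 sequence here.
        String wrong = decode(latin1, StandardCharsets.UTF_8);
        System.out.println("U+FFFD count as UTF-8: " + countReplacementChars(wrong));

        // Matching charset: the original text comes back unchanged.
        String right = decode(latin1, StandardCharsets.ISO_8859_1);
        System.out.println("Decoded as ISO-8859-1: " + right);
    }
}
```

If the file does turn out to be ISO-8859-1, either read it with that charset before handing sentences to the tagger, or convert the file to UTF-8 once up front.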

answered Nov 15 '22 22:11

Christopher Manning

Tags: java, stanford-nlp, pos-tagger
