apostrophe turning into \x92

mycorpus.txt

Human where's machine interface for lab abc computer applications   
A where's survey of user opinion of computer system response time

stopwords.txt

let's
ain't
there's

The following code

corpus = set()
for line in open("path\\to\\mycorpus.txt"):
    corpus.update(set(line.lower().split()))
print corpus

stoplist = set()
for line in open("C:\\Users\\Pankaj\\Desktop\\BTP\\stopwords_new.txt"):
    stoplist.add(line.lower().strip())
print stoplist

gives the following output

set(['a', "where's", 'abc', 'for', 'of', 'system', 'lab', 'machine', 'applications', 'computer', 'survey', 'user', 'human', 'time', 'interface', 'opinion', 'response'])
set(['let\x92s', 'ain\x92t', 'there\x92s'])

Why is the apostrophe turning into \x92 in the second set?

asked Mar 22 '13 06:03

Pankaj Singhal

1 Answer

Code point 92 (hex) in the windows-1252 encoding corresponds to Unicode code point 2019 (hex), which is 'RIGHT SINGLE QUOTATION MARK'. This looks very much like an apostrophe and is likely to be the actual character that you have in stopwords.txt. Judging from the way Python has interpreted it, I'd guess the file has been encoded in windows-1252, or in another encoding that shares its code point values for ASCII and for ’. Python 2 reads the file as raw bytes, so when the set is printed, that single non-ASCII byte is shown as the escape \x92.

' vs ’
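
A minimal sketch of the point above, assuming the stopword file really is encoded in windows-1252 (Python names this codec cp1252). The helper name load_stoplist and its parameters are hypothetical and not part of the original post. It decodes the bytes explicitly and then replaces the typographic quote with a plain ASCII apostrophe:

```python
# -*- coding: utf-8 -*-
import io

# The byte 0x92 as it would be stored in a windows-1252 encoded file.
raw = b"let\x92s"

# Decoding with the cp1252 codec yields U+2019 RIGHT SINGLE QUOTATION MARK.
text = raw.decode("cp1252")

# Replace the typographic quote with a plain ASCII apostrophe (U+0027).
normalized = text.replace(u"\u2019", u"'")


def load_stoplist(path, encoding="cp1252"):
    """Hypothetical helper: read one stopword per line, decode the file
    with the given encoding, and normalize U+2019 to an ASCII apostrophe.
    The default encoding is an assumption based on the \\x92 byte."""
    stoplist = set()
    with io.open(path, encoding=encoding) as handle:
        for line in handle:
            word = line.strip().lower().replace(u"\u2019", u"'")
            if word:  # skip blank lines
                stoplist.add(word)
    return stoplist
```

If the file turns out to use a different encoding, pass it through the encoding argument instead. Retyping the apostrophes in stopwords.txt as plain ' characters would avoid the problem at the source.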

answered Sep 21 '22 14:09

CB Bailey

Tags:

python

python-2.7

apostrophe
