Converting unicode data in a DataFrame to strings

I am having trouble with a DataFrame obtained by reading an xls file. Every value in the DataFrame has the type 'unicode', and I can't do anything with it. I want to convert the values to str. Also, if possible, I'd like to know why this happens. I heard something about 'external data', and I know that the column and index names are also shown with the 'u' unicode prefix. I know almost nothing about encodings either, so I would be really grateful if someone could explain a bit about this as well.

I'm using Python 2 and I tried to solve it column by column with functions such as

.astype(str) 
.astype(basestring)
.apply(str)

and

.str.decode('iso-8859-1').str.encode('utf-8')

(I read this last one here and just added it to my code to try something else). I also tried

unicodedata.normalize('NFKD', df_bolsa[l]).encode('ascii','ignore')

but this last one cannot be used with a Series. I hope someone can help me clarify this matter. Thank you very much in advance!

asked Feb 23 '17 17:02

emilio.molina

2 Answers

You can use the following code.

for column in df:
    # Encode each text column from unicode to UTF-8 byte strings (str in Python 2)
    df[column] = df[column].str.encode('utf-8')
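
The loop above applies the `.str` accessor to every column, which pandas rejects for columns that do not hold strings (numeric columns, for example). A minimal sketch of a per-value variant follows, written to run on both Python 2 and Python 3. The helper name `encode_text_columns` and the sample data are assumptions made for illustration; they are not part of pandas or of the original question.

```python
import pandas as pd

# Text type on both major versions: `unicode` on Python 2, `str` on Python 3.
text_type = type(u"")


def encode_text_columns(df, encoding="utf-8"):
    """Return a copy of df with every text value encoded to bytes.

    Hypothetical helper for illustration; not a pandas function.
    """
    def encode_value(value):
        # Only text is encoded; numbers, NaN and None pass through unchanged.
        if isinstance(value, text_type):
            return value.encode(encoding)
        return value

    out = df.copy()
    for column in out.columns:
        out[column] = out[column].map(encode_value)
    return out


# Made-up sample data: one text column and one numeric column.
df = pd.DataFrame({"name": [u"caf\xe9", u"ni\xf1o"], "count": [1, 2]})
encoded = encode_text_columns(df)
```

On Python 2 the encoded values are plain `str` objects, which is what the question asks for; on Python 3 they are `bytes`. Mapping over every column is slower than a vectorized call, so on large frames it may be worth restricting the loop to the text columns.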

answered Sep 17 '22 23:09

emilio.molina

In case it helps others, this version worked for me. I was getting the following error while loading my DataFrame into an Oracle database: "UnicodeDecodeError: 'ascii' codec can't decode byte 0xea in position 2: ordinal not in range(128)"

I am on Python 2.7.

for column in df:
    df[column] = df[column].astype(str).str.decode('utf-8')
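
The error quoted above is the one Python raises when a byte string containing non-ASCII bytes is decoded with the ASCII codec, which Python 2 does implicitly whenever byte strings and unicode are mixed. The sketch below reproduces the message and shows the two directions: `decode` turns bytes into text, `encode` turns text into bytes. The sample bytes are an assumption chosen so that byte 0xea sits at position 2; they are not the data from this answer.

```python
# UTF-8 bytes for "ab" followed by one three-byte character (U+AC00).
raw = b"ab\xea\xb0\x80"

try:
    raw.decode("ascii")          # ASCII only covers bytes 0-127
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)           # same wording as the error in this answer

text = raw.decode("utf-8")       # bytes -> text (unicode in Python 2)
round_trip = text.encode("utf-8")  # text -> bytes (str in Python 2)
```

Which direction is needed depends on what the column already holds: `decode` if the values are byte strings, `encode` if they are already unicode, as in the original question.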

answered Sep 19 '22 23:09

MEdwin

Tags: python, pandas, python-unicode, python-2.7, unicode-string
