Python df.to_excel() storing numbers as text in excel. How to store as Value?

Tags:

I am scraping table data from google finance through pd.read_html and then saving that data to excel through df.to_excel() as seen below:

    dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
    xlWriter = pd.ExcelWriter(output.xlsx, engine='xlsxwriter')

    for i, df in enumerate(dfs):
        df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
    xlWriter.save()

However, the numbers that are saved to excel are stored as text with the little green triangle in the corner of the cell. When moving over this data to excel, how do I store them as actual values and not text?

864

asked Dec 10 '16 22:12

gluc7

3 Answers

In addition to the other solutions where the string data is converted to numbers when creating or using the dataframe it is also possible to do it using options to the xlsxwriter engine:

# Versions of Pandas >= 1.3.0:
writer = pd.ExcelWriter('output.xlsx',
                        engine='xlsxwriter',
                        engine_kwargs={'options': {'strings_to_numbers': True}})

# Versions of Pandas < 1.3.0:
writer = pd.ExcelWriter('output.xlsx',
                        engine='xlsxwriter',
                        options={'strings_to_numbers': True})

From the docs:

strings_to_numbers: Enable the worksheet.write() method to convert strings to numbers, where possible, using float() in order to avoid an Excel warning about "Numbers Stored as Text".

180

answered Oct 20 '22 14:10

jmcnamara

Consider converting numeric columns to floats since the pd.read_html reads web data as string types (i.e., objects). But before converting to floats, you need to replace hyphens to NaNs:

import pandas as pd
import numpy as np

dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL' +
                   '&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
workbook = xlWriter.book

for i, df in enumerate(dfs):
    for col in df.columns[1:]:                  # UPDATE ONLY NUMERIC COLS 
        df.loc[df[col] == '-', col] = np.nan    # REPLACE HYPHEN WITH NaNs
        df[col] = df[col].astype(float)         # CONVERT TO FLOAT   

    df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))

xlWriter.save()

answered Oct 20 '22 16:10

Parfait

That is probably because the Data Types of those columns where the warning is showing are objects and not Numeric Types, such as int or float.

In order to check the Data Types of each column of the DataFrame, use dtypes, such as

print(df.dtypes)

In my case, the column that was stored as object instead of a numeric value, was PRECO_ES

DF dtypes

As, in my particular case, the decimal numbers are relevant, I have converted it, using astype, to float, as following

df['PRECO_ES'] = df['PRECO_ES'].astype(float)

If we check again the Data Types, we get the following

DF column changed to float

Then, all you have to do is export the DataFrame to Excel

#Export the DataFRame (df) to XLS
xlsFile = "Preco20102019.xls"
df.to_excel(xlsFile)

#Export the DataFRame (df) to CSV
csvFile = "Preco20102019.csv"
df.to_csv(csvFile)

If I then open the Excel file, I can see that the warning is not showing anymore, as the values are stored as numeric and not as text

Excel file without the warning

answered Oct 20 '22 16:10

Gonçalo Peres

Related questions
                            
                                Bit masking in Python
                            
                                Python: how to store a numpy multidimensional array in PyTables?
                            
                                How do I unpack a list with fewer variables?
                            
                                Problems with Jinja2: TemplateNotFound: index.html
                            
                                Using the SQLAlchemy ORM inside an Alembic migration: how do I?
                            
                                simple graphics for python
                            
                                Replace all text between 2 strings python
                            
                                Showing Pandas data frame as a table
                            
                                How to assign to a Django PointField model attribute?
                            
                                Django email with smtp.gmail SMTPAuthenticationError 534 Application-specific password required
                            
                                Percentage Overlap of Two Lists
                            
                                PySpark add a column to a DataFrame from a TimeStampType column
                            
                                Python Click command names
                            
                                Generate random number outside of range in python
                            
                                understanding python xgboost cv
                            
                                how to hide "py4j.java_gateway:Received command c on object id p0"?
                            
                                Can a method within a class be generator?
                            
                                Simple multithread for loop in Python
                            
                                Pass parameters to schedule
                            
                                Is it possible for Airflow scheduler to first finish the previous day's cycle before starting the next?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python df.to_excel() storing numbers as text in excel. How to store as Value?

Tags:

python

html

pandas

dataframe

excel