What can I do to prevent pandas from converting my string values to float. The column Billing Doc. and Sales Order contain number 10-11 digit numbers which are to be stored in MySQL table inside a column which has a datatype of CHAR(15). When I execute the following script, I see .0 at the end of each number. I want to treat them as string/char in our db.
The Billing Doc. field contains numbers like 3206790137, 3209056079, 3209763880, 3209763885, 3206790137 who is stored in DB as 3206790137.0, 3209056079.0, 3209763880.0, 3209763885.0, 3206790137.0. The column data type for Billing doc in database is CHAR(15).
def insert_billing(df):
df = df.where((pd.notnull(df)), None)
for row in df.to_dict(orient="records"):
bill_item = row['Bill.Item']
bill_qty = row['Billed Qty']
bill_doct_date = row['Billi.Doc.Date']
bill_doc = row['Billing Doc.']
bill_net_value = row['Billi.Net Value']
sales_order = row['Sales Order']
import_date = DT.datetime.now().strftime('%Y-%m-%d')
query = "INSERT INTO sap_billing(" \
"bill_item, " \
"bill_qty, " \
"bill_doc_date, " \
"bill_doc, " \
"bill_net_value, " \
"sales_order, " \
"import_date" \
") VALUES (" \
"\"{}\", \"{}\", \"{}\", \"{}\"," \
"\"{}\", \"{}\", \"{}\"" \
") ON DUPLICATE KEY UPDATE " \
"bill_qty = VALUES(bill_qty), " \
"bill_doc_date = VALUES(bill_doc_date), " \
"bill_net_value = VALUES(bill_net_value), " \
"import_date = VALUES(import_date) " \
"".format(
bill_item,
bill_qty,
bill_doct_date,
bill_doc,
bill_net_value,
sales_order,
import_date
)
query = query.replace('\"None\"', 'NULL')
query = query.replace('(None', '(NULL')
query = query.replace('\"NaT\"', 'NULL')
query = query.replace('(NaT', '(NULL')
try:
q1 = gesdb_connection.execute(query)
except Exception as e:
print(bill_item, bill_doc, sales_order, e)
if __name__ == "__main__":
engine_str = 'mysql+mysqlconnector://root:abc123@localhost/mydb'
file_name = "tmp/dataload/so_tracking.XLSX"
df = pd.read_excel(file_name)
if df.shape[1] == 35 and compare_columns(list(df.columns.values)) == 1:
insert_billing(df)
else:
print("Incorrect column count, column order or column headers.\n")
When I create a simple df and print it the problem does not show up.
import pandas as pd
df = pd.DataFrame({'Sales Order': [1217252835, 1217988754, 1219068439],
'Billing Doc.': [3222102723, 3209781889, 3214305818]})
>>> df
Billing Doc. Sales Order
0 3222102723 1217252835
1 3209781889 1217988754
2 3214305818 1219068439
However, when I read through excel and then print it, the column is read as float64.
file_name = "tmp/dataload/so_tracking.XLSX"
df = pd.read_excel(file_name)
print(df['Billing Doc.'])
680 3.252170e+09
681 3.252170e+09
682 3.252170e+09
683 3.252170e+09
684 3.252170e+09
685 3.252170e+09
686 3.252170e+09
687 3.252170e+09
688 3.252170e+09
689 3.252170e+09
690 3.252170e+09
.
.
.
694 3.251601e+09
695 3.251631e+09
696 3.252013e+09
697 NaN
698 3.252272e+09
699 3.252360e+09
700 3.252474e+09
.
.
Name: Billing Doc., dtype: float64
I found the solution myself, posting here to document it.
df = pd.read_excel(file_name, converters={'Billing Doc.' : str})
print(df['Billing Doc.'])
695 3251631331
696 3252012614
697 NaN
698 3252272451
699 3252359504
700 3252473894
701 NaN
702 NaN
703 NaN
704 3252652940
705 NaN
706 NaN
707 NaN
708 NaN
Name: Billing Doc., dtype: object
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With