Why does pandas remove leading zero when writing to a csv?

Question

I have a dataframe that has a column called 'CBG' with numbers as a string value.

    CBG             acs_total_persons   acs_total_housing_units
0   010010211001    1925                1013
1   010030114011    2668                1303
2   010070100043    930                 532

When I write it to a csv file, the leading 'O' are removed:

combine_acs_merge.to_csv(new_out_csv, sep=',')
>>> CBG: [0: 10010221101, ...]

It's already a string; how can I keep the leading zero from being removed in the .csv file

Karn Kumar · Accepted Answer

Lets take an example:

Below is your example DataFrame:

>>> df
    col1   num
0    One   011
1    two  0123
2  three  0122
3   four  0333

Considering the num as an int which you can convert to str().

>>> df["num"] = df["num"].astype(str)
>>> df.to_csv("datasheet.csv")

Output:

$ cat datasheet.csv

You will find the leading zeros are intacted..

,col1,num
0,One,011
1,two,0123
2,three,0122
3,four,0333

OR, if you reading the data from csv first then use belwo..

pd.read_csv('test.csv', dtype=str)

However, if your column CBG already str then it should be straight forward..

>>> df = pd.DataFrame({'CBG': ["010010211001", "010030114011", "010070100043"],
...                    'acs_total_persons': [1925, 2668, 930],
...                    'acs_total_housing_units': [1013, 1303, 532]})
>>>
>>> df
            CBG  acs_total_housing_units  acs_total_persons
0  010010211001                     1013               1925
1  010030114011                     1303               2668
2  010070100043                      532                930
>>> df.to_csv("CBG.csv")

result:

$ cat CBG.csv
,CBG,acs_total_housing_units,acs_total_persons
0,010010211001,1013,1925
1,010030114011,1303,2668
2,010070100043,532,930

manwithfewneeds · Answer

Pandas doesn't strip padded zeros. You're liking seeing this when opening in Excel. Open the csv in a text editor like notepad++ and you'll see they're still zero padded.

katsu · Answer

When reading a CSV file pandas tries to convert values in every column to some data type as it sees fit. If it sees a column which contains only digits it will set the dtype of this column to int64. This converts "010010211001" to 10010211001.

If you don't want any data type conversions to happen specify dtype=str when reading in the CSV file. As per pandas documentation for read_csv https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html:

dtype : Type name or dict of column -> type, optional

    Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’} Use str or object
    together with suitable na_values settings to preserve and not interpret dtype. If
    converters are specified, they will be applied INSTEAD of dtype conversion.

Why does pandas remove leading zero when writing to a csv?

Tags:

python

string

python-3.x

pandas

csv

gwydion93

3 Answers

Below is your example DataFrame:

$ cat datasheet.csv

Karn Kumar

manwithfewneeds

katsu

Recent Activity

Donate For Us

Why does pandas remove leading zero when writing to a csv?

Tags:

python

string

python-3.x

pandas

csv

gwydion93

3 Answers

Below is your example DataFrame:

$ cat datasheet.csv

Karn Kumar

manwithfewneeds

katsu

Related questions

Recent Activity

Donate For Us