I have a dataframe that has a column called 'CBG' with numbers as a string value.
CBG acs_total_persons acs_total_housing_units
0 010010211001 1925 1013
1 010030114011 2668 1303
2 010070100043 930 532
When I write it to a csv file, the leading 'O' are removed:
combine_acs_merge.to_csv(new_out_csv, sep=',')
>>> CBG: [0: 10010221101, ...]
It's already a string; how can I keep the leading zero from being removed in the .csv
file
Lets take an example:
>>> df
col1 num
0 One 011
1 two 0123
2 three 0122
3 four 0333
Considering the num
as an int which you can convert to str()
.
>>> df["num"] = df["num"].astype(str)
>>> df.to_csv("datasheet.csv")
Output:
You will find the leading zeros are intacted..
,col1,num
0,One,011
1,two,0123
2,three,0122
3,four,0333
OR, if you reading the data from csv first then use belwo..
pd.read_csv('test.csv', dtype=str)
However, if your column CBG
already str
then it should be straight forward..
>>> df = pd.DataFrame({'CBG': ["010010211001", "010030114011", "010070100043"],
... 'acs_total_persons': [1925, 2668, 930],
... 'acs_total_housing_units': [1013, 1303, 532]})
>>>
>>> df
CBG acs_total_housing_units acs_total_persons
0 010010211001 1013 1925
1 010030114011 1303 2668
2 010070100043 532 930
>>> df.to_csv("CBG.csv")
result:
$ cat CBG.csv
,CBG,acs_total_housing_units,acs_total_persons
0,010010211001,1013,1925
1,010030114011,1303,2668
2,010070100043,532,930
Pandas doesn't strip padded zeros. You're liking seeing this when opening in Excel. Open the csv in a text editor like notepad++ and you'll see they're still zero padded.
When reading a CSV file pandas tries to convert values in every column to some data type as it sees fit. If it sees a column which contains only digits it will set the dtype of this column to int64. This converts "010010211001" to 10010211001.
If you don't want any data type conversions to happen specify dtype=str when reading in the CSV file. As per pandas documentation for read_csv https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html:
dtype : Type name or dict of column -> type, optional
Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32, ‘c’: ‘Int64’} Use str or object
together with suitable na_values settings to preserve and not interpret dtype. If
converters are specified, they will be applied INSTEAD of dtype conversion.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With