
How to save a CSV from dataframe, to keep zeros left in column with numbers?

Tags: python, pandas, csv

In Python 3 and pandas I have a dataframe with a column cpf with codes

candidatos_2014.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 26245 entries, 0 to 1063
Data columns (total 7 columns):
uf                 26245 non-null object
cargo              26245 non-null object
nome_completo      26245 non-null object
cpf                26245 non-null object
nome_urna          26245 non-null object
partido_eleicao    26245 non-null object
situacao           26245 non-null object
dtypes: object(7)
memory usage: 1.6+ MB

The codes are numbers like these: "00229379273", "84274662268", "09681949153", "53135636534"...

I saved as CSV

candidatos_2014.to_csv('candidatos_2014.csv')

I use Ubuntu and LibreOffice. But when I opened the file the cpf column does not show the leading zeros:

"229379273", "9681949153"

Please, is there a way to save a CSV that keeps zeros to the left in a column that only has numbers?

Reinaldo Chaves asked Feb 21 '18 10:02



3 Answers

Specify the dtype as string when reading the CSV file:

# if you are reading data with leading zeros, load every column as text
import pandas as pd
candidatos_2014 = pd.read_csv('candidatos_2014.csv', dtype=str)

Or, if the data is generated in Python, convert the column to string before saving:

# make sure the cpf column is text before writing the file
candidatos_2014['cpf'] = candidatos_2014['cpf'].astype(str)
candidatos_2014.to_csv('candidatos_2014.csv')
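To see the difference the dtype argument makes, here is a minimal round-trip sketch. It uses an in-memory buffer instead of a real file, and the three-row frame is made up from the sample codes in the question, so treat it as an illustration rather than the asker's actual data.

```python
import io

import pandas as pd

# Hypothetical sample built from the codes quoted in the question.
df = pd.DataFrame({"cpf": ["00229379273", "84274662268", "09681949153"]})

# Write the CSV to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Default read: pandas infers an integer column and the zeros are lost.
default_read = pd.read_csv(io.StringIO(csv_text))

# Read with dtype=str: every column is kept as text, zeros included.
string_read = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(default_read["cpf"].tolist())
print(string_read["cpf"].tolist())
```

Passing a dict such as dtype={'cpf': str} should work the same way if only one column needs to stay as text.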
Sociopath answered Oct 13 '22 21:10


First, check whether the CSV file itself is really missing the leading zeros, for example by opening it in a plain text editor. If the zeros are in the file but you open it in Excel or another spreadsheet, the values can still be displayed without them, because the spreadsheet interprets the column as numeric. In that case, go to the Data menu and choose Import from Text. Excel's import utility lets you define each column's data type, so you can mark the column as text.

I am sure that it should be similar in other apps.
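If you would rather check the file from Python than from a text editor, something like the sketch below should do. It writes a small made-up frame to a temporary directory (the file name is borrowed from the question, the rest is an assumption) and reads the raw lines back with plain open(), so no spreadsheet or CSV parser gets a chance to reinterpret the values.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row sample using codes from the question.
df = pd.DataFrame({"cpf": ["00229379273", "09681949153"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "candidatos_2014.csv")
    df.to_csv(path, index=False)

    # Read the file as plain text, exactly as it is stored on disk.
    with open(path, encoding="utf-8") as handle:
        raw_lines = handle.read().splitlines()

print(raw_lines)
```

If the zeros show up in these raw lines, the file is fine and the problem is only in how the spreadsheet imports it.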

Hope it helps!

Ivan S. answered Oct 13 '22 22:10


TLDR: you don't have to do anything if your pandas columns are type object

I feel like both answers here, but especially the accepted answer, are confusing. The short answer is that, if the dtype of your column is object, then pandas will write it with leading zeros. There's nothing to do.
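A quick way to convince yourself of this is to compare what to_csv produces for a text column and for an integer column. The two frames below are made up for illustration. Calling to_csv without a path returns the CSV as a string.

```python
import pandas as pd

# Same codes stored as text (object dtype) and as integers (hypothetical data).
as_text = pd.DataFrame({"cpf": ["00229379273", "09681949153"]})
as_int = pd.DataFrame({"cpf": [229379273, 9681949153]})

# With no path argument, to_csv returns the CSV content as a string.
text_lines = as_text.to_csv(index=False).splitlines()
int_lines = as_int.to_csv(index=False).splitlines()

print(text_lines)  # zeros are written as-is
print(int_lines)   # zeros were never in the data, so none are written
```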

If, like me, you came here because you didn't know that for sure and the leading zeros were gone when you opened the CSV, then follow Ivan S's advice -- take a look at the file you wrote to verify, but you should see the leading zeros there.

If you do, then both answers give guidance on how to read the data back in preserving leading zeros.

If you don't, then the datatype wasn't correct in pandas when you saved the CSV. Just changing that column using astype wouldn't restore the zeros. You'd also need to use str.zfill as described in this SO answer.
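For that last case, a rough sketch of the repair could look like this. It assumes every code should be 11 characters wide, which matches the examples in the question but is still an assumption about your data.

```python
import pandas as pd

# Hypothetical column that was already loaded as integers, zeros gone.
df = pd.DataFrame({"cpf": [229379273, 84274662268, 9681949153]})

# Assumed fixed width, taken from the 11-character codes in the question.
CPF_WIDTH = 11

# Convert to text first, then pad on the left with zeros up to the width.
df["cpf"] = df["cpf"].astype(str).str.zfill(CPF_WIDTH)

print(df["cpf"].tolist())
```

Note that zfill can only restore the zeros if you know the intended width. If the codes had varying lengths, the lost zeros could not be recovered this way.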

Joe Germuska answered Oct 13 '22 21:10