I am importing study data into a Pandas data frame using <code>read_csv</code>. My subject codes are six-digit numbers that encode, among other things, the day of birth. For some of my subjects this results in a code with a leading zero (e.g. "010816"). When I import into Pandas, the leading zero is stripped off and the column is formatted as <code>int64</code>. Is there a way to import this column unchanged, maybe as a string? I tried using a custom converter for the column, but it does not work: it seems as if the custom conversion takes place before Pandas converts to int.

How to keep leading zeros in a column when reading CSV with Pandas?

2 Answers

As indicated in this question/answer by Lev Landau, a simple solution is to use the converters option of the read_csv function for the column in question.

converters={'column_name': lambda x: str(x)}

For more read_csv options, see the pandas.io.parsers.read_csv documentation.

Let's say I have a CSV file projects.csv like the one below:

project_name,project_id
Some Project,000245
Another Project,000478

For example, the code below trims the leading zeros:

from pandas import read_csv

# project_id is inferred as an integer column, so the leading zeros are lost
dataframe = read_csv('projects.csv')
print(dataframe)

Result:

me@ubuntu:~$ python test_dataframe.py
      project_name  project_id
0     Some Project         245
1  Another Project         478
me@ubuntu:~$

Solution code example:

from pandas import read_csv

# the converter receives each raw field as text and returns it unchanged
dataframe = read_csv('projects.csv', converters={'project_id': lambda x: str(x)})
print(dataframe)

Required result:

me@ubuntu:~$ python test_dataframe.py
      project_name project_id
0     Some Project     000245
1  Another Project     000478
me@ubuntu:~$
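For anyone who wants to try this without creating a file, here is a minimal sketch of the same comparison. It assumes an in-memory string as a stand-in for projects.csv (the `CSV_TEXT` name is mine, not from the answer) and passes `str` directly as the converter, which should behave like the lambda above.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for projects.csv
CSV_TEXT = "project_name,project_id\nSome Project,000245\nAnother Project,000478\n"

# Default parsing: project_id is inferred as integers, so the zeros are lost
default_df = pd.read_csv(io.StringIO(CSV_TEXT))

# With a converter, each raw field is passed through str and kept as text
converted_df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"project_id": str})

print(default_df["project_id"].tolist())    # [245, 478]
print(converted_df["project_id"].tolist())  # ['000245', '000478']
```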

Update, in case it helps others:

To have all columns as str, one can do this (from the comment):

pd.read_csv('sample.csv', dtype = str)
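A rough sketch of what that looks like in practice, assuming made-up sample data (the column names and values are hypothetical). One thing to keep in mind is that genuinely numeric columns come back as strings too, so they may need converting afterwards, for instance with pd.to_numeric.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are made up for illustration
CSV_TEXT = "prefix,serial,quantity\n007,000123,5\n042,004560,12\n"

all_str_df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

# Every value is a string now, including the numeric-looking quantity column
print(all_str_df.iloc[0].tolist())  # ['007', '000123', '5']

# Convert back only the columns that should be numeric
all_str_df["quantity"] = pd.to_numeric(all_str_df["quantity"])
print(all_str_df["quantity"].sum())  # 17
```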

To read only selected columns as str, one can do this:

# list of column names that need to be strings
lst_str_cols = ['prefix', 'serial']
# use a dictionary comprehension to build the dict of dtypes
dict_dtypes = {x: 'str' for x in lst_str_cols}
# pass the dict as the dtype argument
pd.read_csv('sample.csv', dtype=dict_dtypes)
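As a self-contained sketch of the selective approach (again with hypothetical in-memory data instead of sample.csv), the listed columns keep their zeros while any column not in the dict is still type-inferred as usual:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for sample.csv
CSV_TEXT = "prefix,serial,quantity\n007,000123,5\n042,004560,12\n"

str_cols = ["prefix", "serial"]
dtypes = {col: str for col in str_cols}

df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=dtypes)

print(df["serial"].tolist())  # ['000123', '004560']
print(df["quantity"].dtype)   # int64
```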

answered Sep 21 '22 18:09

baltasvejas

Here is a shorter, robust, and fully working solution:

Simply define a mapping (dictionary) between variable names and desired data types:

dtype_dic = {'subject_id': str,
             'subject_number': 'float'}

Use that mapping with pd.read_csv():

df = pd.read_csv(yourdata, dtype=dtype_dic)

Et voilà!
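To make this concrete, a small sketch applying such a mapping to data shaped like the question's subject codes. The CSV content is invented for illustration ("010816" is the example code from the question, the rest is assumed), and an in-memory buffer takes the place of yourdata.

```python
import io

import pandas as pd

# Hypothetical data; "010816" is the example code from the question
CSV_TEXT = "subject_id,subject_number\n010816,1\n230595,2\n"

dtype_dic = {"subject_id": str, "subject_number": "float"}
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=dtype_dic)

print(df["subject_id"].tolist())      # ['010816', '230595']
print(df["subject_number"].tolist())  # [1.0, 2.0]
```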

answered Sep 23 '22 18:09

ℕʘʘḆḽḘ

Tags: python, types, pandas, csv

user1802883
