Pandas read_csv dtype read all columns but few as string

Tags: python, pandas, csv

I'm using pandas to read a bunch of CSVs. I pass a dict to the dtype parameter to tell pandas which columns to read as strings instead of the default:

    dtype_dic = {'service_id': str, 'end_date': str, ...}
    feedArray = pd.read_csv(feedfile, dtype=dtype_dic)

In my scenario, all the columns except a few specific ones are to be read as strings. So instead of defining several columns as str in dtype_dic, I'd like to set just my chosen few as int or float. Is there a way to do that?

This runs in a loop over various CSVs with differing columns, so converting specific columns after reading the whole CSV as strings (dtype=str) would not be easy, because I would not immediately know which columns a given CSV has. (I'd rather spend that effort on defining all the columns in the dtype dict!)

Edit: If there is no way to do this at the CSV reading stage itself, then processing a list of column names to convert to numbers, without erroring out when a column isn't present in that CSV, would also be a valid solution.

Note: this sounds like a previously asked question, but the answers there went down a very different path (bool related) which doesn't apply to this question. Please don't mark as duplicate!

asked Apr 06 '18 03:04

Nikhil VJ

2 Answers

EDIT - sorry, I misread your question. Updated my answer.

You can read the entire csv as strings then convert your desired columns to other types afterwards like this:

    import pandas as pd

    # read every column as a string
    df = pd.read_csv('/path/to/file.csv', dtype=str)

    # example df; yours will be from pd.read_csv() above
    df = pd.DataFrame({'A': ['1', '3', '5'], 'B': ['2', '4', '6'], 'C': ['x', 'y', 'z']})

    # convert only the chosen columns to their target types
    types_dict = {'A': int, 'B': float}
    for col, col_type in types_dict.items():
        df[col] = df[col].astype(col_type)
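Since the question mentions files with differing columns, the conversion loop can be guarded so that columns absent from a given file are skipped rather than raising a KeyError. The following is only a minimal sketch: the column names, the in-memory CSV text, and the extra 'Z' entry are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical conversions wanted across all files; 'Z' is deliberately
# absent from the sample data below.
types_dict = {'A': int, 'B': float, 'Z': int}

# Stand-in for one CSV file in the loop (illustrative data only).
csv_text = "A,B,C\n1,2.5,007\n3,4.5,008\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

for col, col_type in types_dict.items():
    # Convert only the columns that this particular file actually has.
    if col in df.columns:
        df[col] = df[col].astype(col_type)
```

Column C is left as a string, so values such as '007' keep their leading zeros.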

Another approach, if you really want to specify the proper types for all columns when reading the file and not change them afterwards: read in just the column names (no rows), then use those to fill in which columns should be strings:

    # nrows=0 reads only the header row
    col_names = pd.read_csv('file.csv', nrows=0).columns
    types_dict = {'A': int, 'B': float}
    # every column not listed above is read as a string
    types_dict.update({col: str for col in col_names if col not in types_dict})
    pd.read_csv('file.csv', dtype=types_dict)
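For the looping case in the question, the header-first idea can be wrapped in a small helper. This is a sketch under assumptions, not a definitive implementation: the helper name read_csv_mostly_str, the NUMERIC_TYPES mapping, and the throwaway sample files are all hypothetical. It builds the dtype dict only from columns that are present in each file, so a numeric column missing from one CSV is never referenced.

```python
import os
import tempfile

import pandas as pd

# Hypothetical mapping of the few columns that should NOT be strings.
NUMERIC_TYPES = {'A': int, 'B': float}


def read_csv_mostly_str(path, numeric_types=NUMERIC_TYPES):
    """Read a CSV with every column as str except those in numeric_types."""
    # First pass: nrows=0 parses only the header row.
    col_names = pd.read_csv(path, nrows=0).columns
    # Use the numeric type when the column is listed, otherwise str.
    dtypes = {col: numeric_types.get(col, str) for col in col_names}
    # Second pass: read the data with the complete dtype mapping.
    return pd.read_csv(path, dtype=dtypes)


# Demo with two throwaway files that have different columns.
tmp_dir = tempfile.mkdtemp()
samples = {
    'first.csv': 'A,B,C\n1,2,007\n3,4,008\n',
    'second.csv': 'A,D\n5,01\n6,02\n',  # no B or C in this file
}
frames = {}
for name, text in samples.items():
    path = os.path.join(tmp_dir, name)
    with open(path, 'w') as fh:
        fh.write(text)
    frames[name] = read_csv_mostly_str(path)
```

Each file is opened twice, but the first pass stops at the header, so the extra cost should be small.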

answered Sep 23 '22 06:09

Nathan

I recently encountered the same issue, though I only have one csv file so I don't need to loop over files. I think this solution can be adapted into a loop as well.

Here is the solution I used. pandas' read_csv has a parameter called converters, which overrides dtype for the columns it names, so you can take advantage of that.

For example, assume that our data.csv file contains only float64 columns, except A and B, which are string columns. You can read this file using:

    df = pd.read_csv('data.csv', dtype='float64', converters={'A': str, 'B': str})

The code gives warnings that the converters override the dtype for the two columns A and B, but the result is as desired.
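A self-contained version of this can be tried without a file on disk. The sketch below uses made-up in-memory data and assumes the warning is a pandas.errors.ParserWarning, which it silences; if a different warning category is raised in your pandas version, the filter simply has no effect.

```python
import io
import warnings

import pandas as pd

# Illustrative data: A and B should stay strings, C and D are numeric.
csv_text = "A,B,C,D\n007,x,1.5,2\n008,y,2.5,3\n"

with warnings.catch_warnings():
    # Assumption: the "converter and dtype both specified" message
    # is issued as a ParserWarning.
    warnings.simplefilter('ignore', pd.errors.ParserWarning)
    df = pd.read_csv(
        io.StringIO(csv_text),
        dtype='float64',                   # default for every column
        converters={'A': str, 'B': str},   # exceptions kept as strings
    )
```

Because the str converter receives the raw field text, values such as '007' keep their leading zeros.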

As for looping over several CSV files, all one needs to do is figure out which columns are the exceptions to put in converters. This is easy if the files have a similar pattern of column names; otherwise it gets tedious.

answered Sep 26 '22 06:09

Muhammet Coskun
