
How to force pandas read_csv to use float32 for all float columns?

Because

  • I don't need double precision
  • My machine has limited memory and I want to process bigger datasets
  • I need to pass the extracted data (as a matrix) to BLAS libraries, and BLAS calls for single precision are about 2x faster than their double-precision equivalents.

Note that not all columns in the raw csv file have float types. I only need to set float32 as the default for float columns.
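To see why this matters for memory, here is a small sketch (the column name and sizes are made up) showing that a float32 column takes exactly half the space of a float64 one:

```python
import numpy as np
import pandas as pd

# 1000 float64 values take 8 bytes each; float32 takes 4 bytes each.
df64 = pd.DataFrame({"x": np.zeros(1000, dtype=np.float64)})
df32 = df64.astype({"x": np.float32})

print(df64["x"].memory_usage(index=False))  # 8000 bytes
print(df32["x"].memory_usage(index=False))  # 4000 bytes
```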

Fabian asked May 27 '15

2 Answers

Try:

import numpy as np
import pandas as pd

# Sample 100 rows of data to determine dtypes.
df_test = pd.read_csv(filename, nrows=100)

float_cols = [c for c in df_test if df_test[c].dtype == "float64"]
float32_cols = {c: np.float32 for c in float_cols}

df = pd.read_csv(filename, engine='c', dtype=float32_cols)

This first reads a sample of 100 rows of data (modify as required) to determine the type of each column.

It then creates a list of the columns whose dtype is 'float64', and uses a dictionary comprehension to build a dictionary with these columns as the keys and 'np.float32' as the value for each key.

Finally, it reads the whole file using the 'c' engine (required for assigning dtypes to columns) and then passes the float32_cols dictionary as a parameter to dtype.

>>> df = pd.read_csv(filename, nrows=100)
>>> df
   int_col  float1 string_col  float2
0        1     1.2          a     2.2
1        2     1.3          b     3.3
2        3     1.4          c     4.4

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 4 columns):
int_col       3 non-null int64
float1        3 non-null float64
string_col    3 non-null object
float2        3 non-null float64
dtypes: float64(2), int64(1), object(1)

>>> df32 = pd.read_csv(filename, engine='c', dtype={c: np.float32 for c in float_cols})
>>> df32.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 4 columns):
int_col       3 non-null int64
float1        3 non-null float32
string_col    3 non-null object
float2        3 non-null float32
dtypes: float32(2), int64(1), object(1)
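The same flow can be reproduced as a fully self-contained sketch by reading from an in-memory CSV (the csv_data string here stands in for the real file on disk):

```python
import io
import numpy as np
import pandas as pd

# In-memory stand-in for the CSV file; column names are illustrative.
csv_data = (
    "int_col,float1,string_col,float2\n"
    "1,1.2,a,2.2\n"
    "2,1.3,b,3.3\n"
    "3,1.4,c,4.4\n"
)

# Step 1: sample rows to detect which columns parse as float64.
df_test = pd.read_csv(io.StringIO(csv_data), nrows=100)
float_cols = [c for c in df_test if df_test[c].dtype == "float64"]
float32_cols = {c: np.float32 for c in float_cols}

# Step 2: read the full data with those columns forced to float32.
df = pd.read_csv(io.StringIO(csv_data), engine='c', dtype=float32_cols)
print(df.dtypes)
```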
Alexander answered Sep 22 '22


Here's a solution which does not depend on .join and does not require reading the file twice:

float64_cols = df.select_dtypes(include='float64').columns
mapper = {col_name: np.float32 for col_name in float64_cols}
df = df.astype(mapper)

Or for kicks as a one-liner:

df = df.astype({c: np.float32 for c in df.select_dtypes(include='float64').columns}) 
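As a quick sanity check of the one-liner on a toy frame (column names are made up), only the float64 column is converted, while int and object columns are left untouched:

```python
import numpy as np
import pandas as pd

# Toy frame with one int, one float64, and one object column.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["x", "y"]})

# Downcast only the float64 columns to float32.
df = df.astype({c: np.float32 for c in df.select_dtypes(include='float64').columns})
print(df.dtypes)
```

Note that unlike the accepted answer, this converts after the full file is already in memory as float64, so it does not reduce peak memory usage during the read itself.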
jorijnsmit answered Sep 26 '22