Note that not all columns in the raw CSV file have float types; I only need to set float32 as the default for the float columns.
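Loading a CSV file in Python is much faster than loading an Excel file, reportedly by around 100 times, though the exact factor depends on the file and the library used. Prefer CSVs where you can.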
The sep parameter defaults to a comma (,), so if we don't pass sep to read_csv(), pandas assumes the file is comma-delimited.
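To illustrate the default delimiter, here is a minimal sketch that parses two small in-memory strings. The sample data and variable names are made up for the example and do not come from the original file.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration.
comma_text = "a,b\n1,2\n3,4\n"
semicolon_text = "a;b\n1;2\n3;4\n"

# No sep given: pandas assumes a comma delimiter.
df_comma = pd.read_csv(io.StringIO(comma_text))

# Any other delimiter has to be named explicitly.
df_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")

# Without sep=";" the semicolon-delimited text is read as a single column.
df_wrong = pd.read_csv(io.StringIO(semicolon_text))

print(list(df_comma.columns))      # ['a', 'b']
print(list(df_semicolon.columns))  # ['a', 'b']
print(list(df_wrong.columns))      # ['a;b']
```

A single column whose name contains the delimiter, as in the last case, is usually the first sign that sep needs to be set.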
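infer_datetime_format: If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns and, if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x. (This parameter is deprecated since pandas 2.0, where a strict version of this inference is the default.)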
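A small sketch of date parsing in read_csv, using hypothetical in-memory data. It passes only parse_dates, which is enough on recent pandas versions; on older versions, infer_datetime_format=True could be added to the same call to request the faster path described above.

```python
import io

import pandas as pd

# Hypothetical sample data with one consistently formatted date column.
csv_text = "when,value\n2021-01-05,1.5\n2021-02-10,2.5\n"

# parse_dates names the columns to convert to datetimes while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])

print(df.dtypes)
print(df["when"].dt.month.tolist())  # [1, 2]
```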
Try:
    import numpy as np
    import pandas as pd

    # Sample 100 rows of data to determine dtypes.
    df_test = pd.read_csv(filename, nrows=100)

    float_cols = [c for c in df_test if df_test[c].dtype == "float64"]
    float32_cols = {c: np.float32 for c in float_cols}

    df = pd.read_csv(filename, engine='c', dtype=float32_cols)
This first reads a sample of 100 rows of data (modify as required) to determine the type of each column.
It then creates a list of the columns whose dtype is 'float64', and uses a dictionary comprehension to build a dictionary with these columns as the keys and 'np.float32' as the value for each key.
Finally, it reads the whole file using the 'c' engine (the default engine; older pandas versions required it for assigning dtypes to columns) and passes the float32_cols dictionary as the dtype argument.
    >>> df = pd.read_csv(filename, nrows=100)
    >>> df
       int_col  float1 string_col  float2
    0        1     1.2          a     2.2
    1        2     1.3          b     3.3
    2        3     1.4          c     4.4

    >>> df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 3 entries, 0 to 2
    Data columns (total 4 columns):
    int_col       3 non-null int64
    float1        3 non-null float64
    string_col    3 non-null object
    float2        3 non-null float64
    dtypes: float64(2), int64(1), object(1)

    >>> df32 = pd.read_csv(filename, engine='c', dtype={c: np.float32 for c in float_cols})
    >>> df32.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 3 entries, 0 to 2
    Data columns (total 4 columns):
    int_col       3 non-null int64
    float1        3 non-null float32
    string_col    3 non-null object
    float2        3 non-null float32
    dtypes: float32(2), int64(1), object(1)
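The two-pass approach can be wrapped in a small helper. The sketch below is one possible packaging, not part of the original answer: the function name read_csv_float32, the sample_rows parameter, and the sample data are all hypothetical. It also shows a limitation worth knowing: a column that looks like integers within the sampled rows is not added to the dtype mapping, so it still comes back as float64 if floats appear later in the file.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def read_csv_float32(path, sample_rows=100):
    """Read a CSV, storing columns that look like floats in the sample as float32."""
    # First pass: let pandas infer dtypes from a small sample.
    sample = pd.read_csv(path, nrows=sample_rows)
    float_cols = sample.select_dtypes(include="float64").columns
    # Second pass: read everything, overriding only the sampled float columns.
    return pd.read_csv(path, dtype={c: np.float32 for c in float_cols})


# Hypothetical data: "late" looks like an integer column in the first two rows.
csv_text = "x,late,name\n1.5,1,a\n2.5,2,b\n3.5,0.5,c\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample.csv")
    with open(csv_path, "w") as handle:
        handle.write(csv_text)
    # A deliberately small sample, so the limitation is visible.
    df = read_csv_float32(csv_path, sample_rows=2)

print(df.dtypes)  # x is float32, late is float64
```

If this matters for your data, increasing the sample size reduces the risk, at the cost of a slower first pass.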
Here's a solution that neither depends on .join nor requires reading the file twice:
    # df is a DataFrame that has already been loaded, e.g. with pd.read_csv.
    float64_cols = df.select_dtypes(include='float64').columns
    mapper = {col_name: np.float32 for col_name in float64_cols}
    df = df.astype(mapper)
Or for kicks as a one-liner:
df = df.astype({c: np.float32 for c in df.select_dtypes(include='float64').columns})
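As a quick check of what the conversion buys, here is a minimal sketch on a hypothetical frame that mirrors the columns shown earlier. It stands in for the result of pd.read_csv and compares the storage of one float column before and after the cast.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the result of pd.read_csv.
df = pd.DataFrame(
    {
        "int_col": [1, 2, 3],
        "float1": [1.2, 1.3, 1.4],
        "string_col": ["a", "b", "c"],
        "float2": [2.2, 3.3, 4.4],
    }
)

# Bytes used by the column's values (index excluded) before the cast.
before = df["float1"].memory_usage(index=False)

df = df.astype({c: np.float32 for c in df.select_dtypes(include="float64").columns})

# Bytes used after the cast: float32 needs 4 bytes per value instead of 8.
after = df["float1"].memory_usage(index=False)

print(df.dtypes)
print(before, after)  # 24 12
```

Because astype runs after loading, the float64 data exists in memory first. Passing dtype to read_csv avoids that intermediate copy, which is the trade-off against reading the file twice.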