I am reading data from a CSV file into a DataFrame, trying to remove all rows that contain NaNs, and then converting the data from float64 to float32. I have tried various solutions I've found online, but nothing seems to work. Any thoughts?
I think this does what you want:
import numpy as np
import pandas as pd
pd.read_csv('Filename.csv').dropna().astype(np.float32)
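A self-contained sketch of the same chain, assuming a small made-up CSV read from memory via io.StringIO in place of 'Filename.csv'. It shows that the row with a missing value is dropped and every remaining column ends up as float32:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for Filename.csv; the empty field is read as NaN
csv_text = """a,b,c
1.5,2.0,3.0
4.0,,6.0
7.0,8.0,9.0
"""

# Read, drop every row containing at least one NaN, then downcast to float32
df = pd.read_csv(io.StringIO(csv_text)).dropna().astype(np.float32)

print(df.dtypes)           # every column is float32
print(df.index.tolist())   # [0, 2]: the row with the missing 'b' value was dropped
```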
To drop only the rows in which every value is NaN (keeping rows that have just some NaN values), do this:
pd.read_csv('Filename.csv').dropna(how='all').astype(np.float32)
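To make the difference between how='any' (the default) and how='all' concrete, here is a small sketch on a hypothetical DataFrame built directly instead of read from a file. Note that a row which survives how='all' still contains NaN, and astype(np.float32) keeps that NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has one NaN, row 2 is entirely NaN
df = pd.DataFrame({"a": [1.0, 3.0, np.nan], "b": [2.0, np.nan, np.nan]})

any_dropped = df.dropna(how="any")  # default: drop rows with at least one NaN
all_dropped = df.dropna(how="all")  # drop only rows where every value is NaN

print(any_dropped.index.tolist())  # [0]
print(all_dropped.index.tolist())  # [0, 1]

# The partly-NaN row 1 survives how='all'; its NaN becomes a float32 NaN
as_f32 = all_dropped.astype(np.float32)
```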
To replace each NaN with a number instead of dropping rows, do this:
pd.read_csv('Filename.csv').fillna(1e6).astype(np.float32)
(I replaced NaN with 1,000,000 just as an example.)
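A quick sketch of the fillna variant on a hypothetical two-row DataFrame. The sentinel 1e6 is below 2**24, so it is stored exactly in float32; a much larger or more precise sentinel could be rounded by the downcast:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one NaN in each column
df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

# Replace each NaN with the example sentinel 1e6, then downcast to float32
filled = df.fillna(1e6).astype(np.float32)

print(filled.isna().sum().sum())  # 0: no NaN left, and no rows were dropped
print(filled.loc[0, "b"])         # 1000000.0
```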
You can also specify the dtype when you read the csv file:
dtype : Type name or dict of column -> type
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
pd.read_csv(my_file, dtype={col: np.float32 for col in ['col_1', 'col_2']})
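Setting the dtype at read time only controls the column types; it does not remove missing values, so dropna() is still needed to get rid of NaN rows. A minimal sketch, assuming a hypothetical in-memory CSV with the column names col_1 and col_2 from the line above:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV with a missing value in col_2 on the second row
csv_text = """col_1,col_2
0.1,0.2
0.3,
"""

# Parse the listed columns directly as float32 instead of converting afterwards
df = pd.read_csv(io.StringIO(csv_text), dtype={col: np.float32 for col in ["col_1", "col_2"]})
print(df.dtypes)  # col_1 and col_2 are float32; the empty field is a float32 NaN

# The dtype argument does not drop the NaN row; dropna() still does that
clean = df.dropna()
print(len(df), len(clean))  # 2 1
```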
Example:
import numpy as np
import pandas as pd
df_out = pd.DataFrame(np.random.random([5,5]), columns=list('ABCDE'))
df_out.iat[1,0] = np.nan
df_out.iat[2,1] = np.nan
df_out.to_csv('my_file.csv')
df = pd.read_csv('my_file.csv', dtype={col: np.float32 for col in list('ABCDE')})
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 6 columns):
Unnamed: 0 5 non-null int64
A 4 non-null float32
B 4 non-null float32
C 5 non-null float32
D 5 non-null float32
E 5 non-null float32
dtypes: float32(5), int64(1)
memory usage: 180.0 bytes
>>> df.dropna(axis=0, how='any')
Unnamed: 0 A B C D E
0 0 0.176224 0.943918 0.322430 0.759862 0.028605
3 3 0.723643 0.105813 0.884290 0.589643 0.913065
4 4 0.654378 0.400152 0.763818 0.416423 0.847861
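The int64 "Unnamed: 0" column in the output above is the DataFrame index that to_csv writes by default. If every column should end up float32, one option is to read that column back as the index with index_col=0 (writing with to_csv(..., index=False) would also avoid it). A sketch of the same example, assuming an in-memory buffer instead of my_file.csv:

```python
import io

import numpy as np
import pandas as pd

df_out = pd.DataFrame(np.random.random([5, 5]), columns=list("ABCDE"))
df_out.iat[1, 0] = np.nan
df_out.iat[2, 1] = np.nan

buf = io.StringIO()
df_out.to_csv(buf)  # the index is written as an unnamed first column
buf.seek(0)

# index_col=0 turns that first column back into the index instead of "Unnamed: 0"
df = pd.read_csv(buf, index_col=0, dtype={col: np.float32 for col in list("ABCDE")})
clean = df.dropna(axis=0, how="any")

print(clean.dtypes)            # A through E, all float32
print(clean.index.tolist())    # [0, 3, 4]
```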