Is it possible to set a custom length for the object data type in Python pandas? For example, in my test data frame, one column with dtype=object increases its size by ~60%, though the values in this column are just "Y" or "N".
"Passing memory_usage='deep' will enable a more accurate memory usage report, that accounts for the full usage of the contained objects"
df.info(memory_usage='deep')
dtypes: datetime64[ns], float64(8), int16(2), int8(4), object(1)
memory usage: 14.7 MB
df.info()
dtypes: datetime64[ns], float64(8), int16(2), int8(4), object(1)
memory usage: 9.2+ MB
This looks very memory inefficient, but I couldn't find any option or data type that could reduce the size (for example, something like int8 instead of int64).
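To see where the memory goes, memory_usage(deep=True) gives a per-column byte count (a minimal check, using the same frame as in the df.info() calls above):

# Bytes per column; the single object column accounts for the gap
# between the shallow (9.2+ MB) and deep (14.7 MB) reports.
df.memory_usage(deep=True)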
The best way to deal with that is to use Categoricals. With only two distinct values, pandas stores a compact int8 code per row instead of a full Python string object.
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.choice(['Y', 'N'], size=10**6)})
df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 1 columns):
A 1000000 non-null object
dtypes: object(1)
memory usage: 62.9 MB
df['A'] = df['A'].astype('category')
df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 1 columns):
A 1000000 non-null category
dtypes: category(1)
memory usage: 976.8 KB
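The 976.8 KB is consistent with one int8 code per row (10**6 bytes ≈ 976.6 KB) plus a small lookup table holding the two category values. A quick way to confirm the underlying storage (a minimal check on the same df; the values in the comments are what recent pandas versions should print):

# Categoricals store an integer code per row plus each unique value once;
# with only two categories, the codes fit in int8.
print(df['A'].cat.codes.dtype)        # int8
print(list(df['A'].cat.categories))   # ['N', 'Y']

One caveat: assigning a value that isn't already a category raises an error, so new values have to be registered first via df['A'].cat.add_categories(...).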