 

reduce memory dedicated to pandas dtype=object

Is it possible to set a custom length for the object data type in Python pandas? For example, in my test data frame, one column with dtype=object increases its size by ~60%, even though the values in this column are just "Y" or "N".

"Passing memory_usage='deep' will enable a more accurate memory usage report, that accounts for the full usage of the contained objects"

df.info(memory_usage='deep')

dtypes: datetime64[ns], float64(8), int16(2), int8(4), object(1)
memory usage: 14.7 MB

df.info()

dtypes: datetime64[ns], float64(8), int16(2), int8(4), object(1)
memory usage: 9.2+ MB

This looks very memory inefficient, but I couldn't find any option or data type that would reduce the size (the way int8 can replace int64, for example).
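To see where the overhead comes from, here is a quick sketch (the column name `flag` is illustrative, not from the original frame): each cell of an object column stores an 8-byte pointer to a full Python str object, and only `deep=True` counts the strings themselves.

```python
import numpy as np
import pandas as pd

# A single "Y"/"N" object column, similar in spirit to the asker's frame
df = pd.DataFrame({'flag': np.random.choice(['Y', 'N'], size=10**6)})

# Shallow count: just the 8-byte pointers; deep count adds each str object
shallow = df['flag'].memory_usage(index=False)            # 8 bytes per row
deep = df['flag'].memory_usage(index=False, deep=True)    # much larger
print(shallow, deep)
```

The gap between the two numbers is exactly the "hidden" size of the Python string objects that `memory_usage='deep'` reveals.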

asked Dec 23 '22 by Grinvydas Kareiva

1 Answer

The best way to deal with this is to use Categoricals. With only two distinct values, pandas stores the category codes as int8.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.choice(['Y', 'N'], size=10**6)})
df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 1 columns):
A    1000000 non-null object
dtypes: object(1)
memory usage: 62.9 MB

df['A'] = df['A'].astype('category')

df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 1 columns):
A    1000000 non-null category
dtypes: category(1)
memory usage: 976.8 KB
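As a sanity check (a sketch, not part of the original answer), you can confirm that the categorical is backed by an int8 code array via the `.cat` accessor:

```python
import numpy as np
import pandas as pd

# Rebuild the answer's frame and convert, then inspect the internals
df = pd.DataFrame({'A': np.random.choice(['Y', 'N'], size=10**6)})
cat = df['A'].astype('category')

# With only two categories, each per-row code fits in a single byte
print(cat.cat.codes.dtype)          # int8
print(list(cat.cat.categories))     # ['N', 'Y']
```

pandas picks the smallest integer type that can index the categories, so the codes stay int8 until the number of categories outgrows it.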
answered Dec 28 '22 by ayhan