Why is a transposed pandas dataframe smaller than its original?

Tags: python, pandas
I have a pandas dataframe whose size I checked with sys.getsizeof.

sys.getsizeof(df)
# output: 136

If I transpose it I get

sys.getsizeof(df.T)
# output: 341

If I transpose twice I get

sys.getsizeof(df.T.T)
# output: 136

How is pandas managing the memory?

UPDATE:

I then used df.memory_usage() instead, which gave the results below. This confused me even more, because the copy of the transposed frame is smaller in memory than the transposed frame itself. Is this related to the datatypes of the objects? Or maybe to the column and index strings?

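import pandas as pd

# author_count, earliest_year, latest_year and total_reviews are defined elsewhere (not shown)
df = pd.DataFrame({"Total Unique Authors": author_count,
                   "Earliest Year": [earliest_year],
                   "Latest Year": [latest_year],
                   "Total Reviews": [total_reviews]})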
print(df.memory_usage().sum())
print(df.copy().memory_usage().sum())
print(df.T.memory_usage().sum())
print(df.T.copy().memory_usage().sum())

OUTPUT

112
112
224
64

Joe B asked Dec 28 '25 11:12


1 Answer

From the documentation for sys.getsizeof:

Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.
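
The last sentence of that quote can be seen with plain built-in objects. The following is only a minimal sketch (the names payload and container are made up for illustration): the list reports just its own header and pointer slot, not the large string it refers to.

```python
import sys

# A list holding one large string. getsizeof counts only the list's own
# header and pointer slot, not the string object it points to.
payload = "x" * 10_000
container = [payload]

shallow = sys.getsizeof(container)   # size of the list object itself
referenced = sys.getsizeof(payload)  # size of the string it refers to

print(shallow)     # small compared to the string
print(referenced)  # at least 10,000 bytes
```

The exact numbers depend on the Python build, so none are shown here. For a DataFrame the result additionally depends on how pandas chooses to implement its size reporting, which is why the figure can differ between versions.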

However, I cannot reproduce your finding:

import sys
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10, 3))
print(sys.getsizeof(df))
print(sys.getsizeof(df.T))

leads to

344
344

As coldspeed commented, df.info() or df.memory_usage() is more helpful than sys.getsizeof.
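
As a rough sketch of how those two calls can be used (the small frame below is hypothetical, not the data from the question): memory_usage() returns per-column byte counts including the index, and deep=True asks pandas to also inspect object values such as Python strings.

```python
import pandas as pd

# Hypothetical data: one integer column and one string column.
df = pd.DataFrame({"n": [1, 2, 3], "s": ["a", "bb", "ccc"]})

shallow = df.memory_usage()        # per-column bytes, index included
deep = df.memory_usage(deep=True)  # also inspects object values, e.g. strings

print(shallow)
print(deep)

# info() prints dtypes plus a memory summary; "deep" requests the
# same deeper inspection as deep=True above.
df.info(memory_usage="deep")
```

Comparing the shallow and deep figures per column gives a hint about where the memory goes. Whether the string column differs between the two depends on how the installed pandas version stores strings.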

GenError answered Dec 31 '25 02:12
