I am trying to call df.set_index in such a way that the dtype of the column I call set_index on becomes the dtype of the new index. Unfortunately, in the following example, set_index changes the dtype.
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': pd.Series(np.array([-1, 0, 1, 2], dtype=np.int8))})
df['ignore'] = df['a']
assert (df.dtypes == np.int8).all()  # fine
df2 = df.set_index('a')
assert df2.index.dtype == df['a'].dtype, df2.index.dtype  # fails: index is int64
Is it possible to avoid this behavior? My pandas version is 0.23.3.
Similarly,
new_idx = pd.Index(np.array([-1, 0, 1, 2]), dtype=np.dtype('int8'))
assert new_idx.dtype == np.dtype('int64')  # passes: the requested int8 was ignored
Even though the documentation for the dtype parameter says: "If an actual dtype is provided, we coerce to that dtype if it's safe. Otherwise, an error will be raised."
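For what it's worth, this limitation is specific to older pandas: before pandas 2.0, the numeric index classes were fixed at 64 bits, so any narrower dtype was silently upcast. If upgrading is an option, recent pandas keeps the requested dtype. A minimal check, assuming pandas 2.0 or later (on older versions both prints show int64):

```python
import numpy as np
import pandas as pd

# On pandas >= 2.0, Index supports arbitrary NumPy numeric dtypes,
# so the int8 array is kept as int8 instead of upcast to int64.
idx = pd.Index(np.array([-1, 0, 1, 2], dtype=np.int8))
print(idx.dtype)

# set_index likewise preserves the column's dtype on recent pandas.
df = pd.DataFrame({'a': np.array([-1, 0, 1, 2], dtype=np.int8), 'b': range(4)})
print(df.set_index('a').index.dtype)
```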
Despite my bloviating in the comments above, this might suffice to get an appropriate index that is both low-memory and starts from -1.
pandas.RangeIndex takes start and stop parameters, like range.
df = df.set_index(pd.RangeIndex(-1, len(df) - 1))
print(df.index, df.index.dtype, sep='\n')
This should be very memory efficient. Even though it is still of dtype int64 (which you probably want), it takes up very little memory.
pd.RangeIndex(-1, 4000000).memory_usage()
84
And the memory usage does not grow with the length of the range:
for i in range(1, 1000000, 100000):
    print(pd.RangeIndex(-1, i).memory_usage())
84
84
84
84
84
84
84
84
84
84
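The reason the figure never changes is that RangeIndex is lazy: it stores only its start, stop, and step and computes labels on demand, so its footprint is constant regardless of length. A quick comparison against a materialized integer index (the exact byte counts vary by pandas version):

```python
import numpy as np
import pandas as pd

n = 1_000_000

# RangeIndex: only start/stop/step are stored; labels are computed on the fly.
lazy = pd.RangeIndex(-1, n - 1)

# A regular integer Index materializes all n values (8 bytes each for int64).
materialized = pd.Index(np.arange(-1, n - 1))

print(lazy.memory_usage())          # a few dozen bytes, independent of n
print(materialized.memory_usage())  # roughly 8 * n bytes
```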