
Best way to set index name in Python Pandas DataFrame

When creating a blank dataframe in Pandas there appears to be at least 2 ways to set the index name.

import pandas as pd

df = pd.DataFrame(columns=['col1', 'col2'])
df.index.name = 'index name'



df = pd.DataFrame(columns=['index name', 'col1', 'col2'])
df.set_index('index name', inplace=True)

Is one preferred over the other? Is there a third way to do it in 1 line of code instead of 2?

SamCie asked Mar 04 '23 05:03


1 Answer

I think the best approach here is method chaining:

The pandas core team now encourages the use of method chaining. This is a style of programming in which you chain together multiple method calls into a single statement. This allows you to pass intermediate results from one method to the next rather than storing the intermediate results using variables.

One solution is DataFrame.rename_axis:

df = pd.DataFrame(columns=['col1', 'col2']).rename_axis('index name')

Or change your second solution to drop inplace and chain the call:

df = pd.DataFrame(columns=['index name', 'col1', 'col2']).set_index('index name')
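
As a quick check, here is a minimal sketch that builds the empty frame with both one-liners and confirms they end up with the same index name and columns. The third variant, passing a named pd.Index to the constructor, is not part of the answer above; it is an extra option I am assuming you may find useful, and the names df_a, df_b and df_c are only illustrative.

```python
import pandas as pd

# Option 1: name the (empty) default index with rename_axis.
df_a = pd.DataFrame(columns=['col1', 'col2']).rename_axis('index name')

# Option 2: create the index column, then promote it with set_index.
df_b = pd.DataFrame(columns=['index name', 'col1', 'col2']).set_index('index name')

# Option 3 (assumption, not from the answer): pass an empty, named Index
# directly to the constructor.
df_c = pd.DataFrame(columns=['col1', 'col2'],
                    index=pd.Index([], name='index name'))

for df in (df_a, df_b, df_c):
    print(df.index.name, list(df.columns))
```

All three give an empty frame with columns col1 and col2. The index dtype may differ between them, so check it if that matters for later appends.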

Using inplace is not recommended, as the linked article explains:

The pandas core team discourages the use of the inplace parameter, and eventually it will be deprecated (which means "scheduled for removal from the library"). Here's why:

inplace won't work within a method chain.
The use of inplace often doesn't prevent copies from being created, contrary to what the name implies.
Removing the inplace option would reduce the complexity of the pandas codebase.
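
The first point is easy to see in a small sketch: with inplace=True the method returns None, so there is nothing left to chain onto. The variable names result and chained, and the index name 'renamed index', are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(columns=['index name', 'col1', 'col2'])

# With inplace=True, set_index mutates df and returns None,
# so a further chained call would fail on None.
result = df.set_index('index name', inplace=True)
print(result)

# Without inplace, each call returns a new DataFrame, so calls chain.
chained = (
    pd.DataFrame(columns=['index name', 'col1', 'col2'])
    .set_index('index name')
    .rename_axis('renamed index')
)
print(chained.index.name)
```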

jezrael answered Mar 16 '23 17:03