Is it possible, when creating a DataFrame from a list, to set the index to the values of one of the columns?
import pandas as pd
tmp = [['a', 'a1'], ['b', 'b1']]
df = pd.DataFrame(tmp, columns=["First", "Second"])
  First Second
0     a     a1
1     b     b1
And how I'd like it to look:
  First Second
a     a     a1
b     b     b1
We can set a specific column, or several columns, as the index of a pandas DataFrame. To do so, pass a single column label or a list of column labels to the DataFrame.set_index() method.
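As a minimal sketch of the two cases just described, reusing the question's data (the variable names `single` and `multi` are only illustrative):

```python
import pandas as pd

tmp = [["a", "a1"], ["b", "b1"]]
df = pd.DataFrame(tmp, columns=["First", "Second"])

# Single column label: "First" becomes the index and is dropped from the columns.
single = df.set_index("First")

# List of column labels: both columns become levels of a MultiIndex.
multi = df.set_index(["First", "Second"])

print(single)
print(multi.index.nlevels)
```

Note that set_index returns a new DataFrame by default, so df itself is unchanged here.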
The Series.rename() method changes the index labels of a Series, or changes the name of the Series object itself.
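A short sketch of both uses of Series.rename, on a made-up Series (the data and the names `renamed` and `relabeled` are assumptions for illustration):

```python
import pandas as pd

s = pd.Series([1, 2], index=["a", "b"], name="First")

# A scalar argument changes the Series name and leaves the index labels alone.
renamed = s.rename("Primary")

# A mapping (or a function) changes the index labels and leaves the name alone.
relabeled = s.rename({"a": "x", "b": "y"})

print(renamed.name)
print(list(relabeled.index))
```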
DataFrame.set_index() sets the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The new index can replace the existing index or expand on it.
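To illustrate "replace" versus "expand", here is a hedged sketch using the append parameter of set_index, plus one array-like input (the labels "x" and "y" are made up):

```python
import pandas as pd

tmp = [["a", "a1"], ["b", "b1"]]
df = pd.DataFrame(tmp, columns=["First", "Second"])

# Replace: the default 0..n-1 index is discarded in favour of "First".
replaced = df.set_index("First")

# Expand: append=True keeps "First" and adds "Second" as a second index level.
expanded = replaced.set_index("Second", append=True)

# Array-like input of the correct length: the values themselves become the index.
by_array = df.set_index(pd.Index(["x", "y"]))

print(expanded.index.names)
print(by_array)
```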
Use DataFrame.reset_index() to reset the index of the updated DataFrame. By default, it adds the current row index as a new column called 'index' (or the index's name, if it has one), and it creates a new row index as a range of numbers starting at 0.
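A small sketch of that default behaviour, alongside the drop=True variant that discards the old index instead of keeping it (the labels 'idx1' and 'idx2' are just sample values):

```python
import pandas as pd

tmp = [["a", "a1"], ["b", "b1"]]
df = pd.DataFrame(tmp, columns=["First", "Second"], index=["idx1", "idx2"])

# Default: the old (unnamed) index becomes a column called "index".
restored = df.reset_index()

# drop=True: the old index is thrown away rather than kept as a column.
dropped = df.reset_index(drop=True)

print(restored)
print(dropped)
```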
Convert the column to a list before assigning it to the index:
df.index = list(df["First"])
>>> pd.DataFrame(tmp, columns=["First", "Second"]).set_index('First', drop=False)
      First Second
First
a         a     a1
b         b     b1
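Since the question asks about doing this at creation time, another option is the index argument of the DataFrame constructor. This is only a sketch, and the list comprehension that pulls the first element of each row is my own assumption about how the labels would be derived:

```python
import pandas as pd

tmp = [["a", "a1"], ["b", "b1"]]

# Build the row labels from the first element of each inner list
# and pass them straight to the constructor.
df = pd.DataFrame(tmp, columns=["First", "Second"], index=[row[0] for row in tmp])

print(df)
```

Unlike set_index('First', drop=False), this leaves the index unnamed, which matches the layout asked for in the question.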
set_axis
To set arbitrary values as the index, best practice is to use set_axis:
df = df.set_axis(['idx1', 'idx2'])
#      First Second
# idx1     a     a1
# idx2     b     b1
set_index (list vs array)
It's also possible to pass arbitrary values to set_index, but note the difference between passing a list vs array:
list: set_index assigns these columns as the index:
df.set_index(['First', 'First'])
#             Second
# First First
# a     a         a1
# b     b         b1
array (Series/Index/ndarray): set_index assigns these values as the index:
df = df.set_index(pd.Series(['First', 'First']))
#       First Second
# First     a     a1
# First     b     b1
Note that passing arrays to set_index is very contentious among the devs and may even get deprecated.
Why not just modify df.index directly?
Directly modifying attributes is fine and is used often. Error checking is the same either way, since both forms raise on a length mismatch, e.g.:
df = df.set_axis(['idx1', 'idx2', 'idx3'])
# ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements
df.index = ['idx1', 'idx2', 'idx3']
# ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements
The advantage of methods is that they can be chained, e.g.:
df.some_method().set_axis(['idx1', 'idx2']).another_method()
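For a concrete version of such a chain, here is a sketch that swaps in real pandas methods for the placeholders (rename and sort_index are my own picks, not part of the original answer):

```python
import pandas as pd

tmp = [["a", "a1"], ["b", "b1"]]

result = (
    pd.DataFrame(tmp, columns=["First", "Second"])
    .set_axis(["idx1", "idx2"], axis=0)     # set arbitrary row labels
    .rename(columns={"Second": "Value"})    # rename a column mid-chain
    .sort_index(ascending=False)            # sort by the new row labels
)

print(result)
```

Each step returns a new DataFrame, so no intermediate variable is needed.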