
Pandas recalculate index after a concatenation

Tags:

python

pandas

People also ask

How do you reset the index of a data frame?

Use the DataFrame.reset_index() method to reset the index of the updated DataFrame. By default, it inserts the current row index as a new column (named 'index' if the index itself is unnamed) and creates a new row index of consecutive integers starting at 0.
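A minimal sketch of the default behavior, using a small made-up frame (the column name `a` and the values are purely illustrative). Filtering keeps the original row labels, and `reset_index()` then moves those labels into an `index` column:

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30, 40]})
filtered = df[df["a"] > 15]  # filtering keeps the original labels 1, 2, 3

out = filtered.reset_index()  # default: old labels become a column
print(list(out.columns))      # ['index', 'a']
print(out["index"].tolist())  # [1, 2, 3]
print(out.index.tolist())     # [0, 1, 2]
```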

Does pd.concat use the index?

The concat() function preserves the indices of its inputs, so the result can contain duplicate labels. To check that the indices in the result of pd.concat() do not overlap, pass verify_integrity=True; pd.concat() then raises a ValueError if there are duplicate index values.
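A short sketch of both points, assuming two small illustrative frames that share the default index 0, 1. The duplicated labels also make label-based lookups such as `.loc[0]` return more than one row:

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]})   # index 0, 1
right = pd.DataFrame({"a": [3, 4]})  # index 0, 1 again

combined = pd.concat([left, right])
print(combined.index.tolist())  # [0, 1, 0, 1]: labels are kept, so they repeat
print(len(combined.loc[0]))     # 2: label 0 now matches two rows

# With verify_integrity=True, overlapping labels raise a ValueError instead
try:
    pd.concat([left, right], verify_integrity=True)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```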

What does reset_index () do in pandas?

The reset_index() method resets the index back to the default integer index (0, 1, 2, ...). By default, it keeps the old index values in a column named "index"; to discard them instead, pass drop=True.
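A minimal sketch of the `drop` parameter on an illustrative filtered frame: with `drop=True` the old labels are thrown away rather than inserted as a column.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30, 40]})
filtered = df[df["a"] > 15]  # index is [1, 2, 3]

dropped = filtered.reset_index(drop=True)
print(list(dropped.columns))   # ['a'] (no extra "index" column)
print(dropped.index.tolist())  # [0, 1, 2]
```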

How do I reindex a DataFrame in Python?

The reindex() method conforms a DataFrame to a new set of row or column labels; specify which axis to reindex (or use the index= or columns= keywords). Labels in the new index that are not present in the original DataFrame are filled with NaN by default.
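A small sketch of `reindex()` on both axes, using a made-up frame with string labels. Missing labels get NaN unless `fill_value` is given:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])

# Rows: "w" is not in the original index, so its value becomes NaN
rows = df.reindex(["z", "x", "w"])
print(rows["a"].tolist())  # [3.0, 1.0, nan]

# Columns: "b" is a new label, so the whole column is NaN
cols = df.reindex(["a", "b"], axis="columns")
print(list(cols.columns))  # ['a', 'b']

# fill_value replaces the NaN default for missing labels
filled = df.reindex(["z", "x", "w"], fill_value=0)
print(filled["a"].tolist())  # [3, 1, 0]
```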


If your index is autogenerated and you don't want to keep it, you can use the ignore_index option:

train_df = pd.concat(train_class_df_list, ignore_index=True)

This will autogenerate a new index for you, and my guess is that this is exactly what you are after.
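To see the difference, here is a sketch with a hypothetical stand-in for `train_class_df_list` (one small frame per class; the column names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for train_class_df_list: one frame per class
train_class_df_list = [
    pd.DataFrame({"x": [0.1, 0.2], "label": ["cat", "cat"]}),
    pd.DataFrame({"x": [0.3, 0.4, 0.5], "label": ["dog", "dog", "dog"]}),
]

default = pd.concat(train_class_df_list)
train_df = pd.concat(train_class_df_list, ignore_index=True)

print(default.index.tolist())   # [0, 1, 0, 1, 2]
print(train_df.index.tolist())  # [0, 1, 2, 3, 4]
```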


After vertical concatenation, if you get an index of [0, n) followed by [0, m), all you need to do is call reset_index:

train_df = train_df.reset_index(drop=True)

(you can do this in place using inplace=True).
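One pitfall worth a sketch (the frames here are illustrative): without `inplace`, `reset_index()` returns a new frame that must be assigned back, while with `inplace=True` it modifies the frame and returns `None`, so its result should not be assigned.

```python
import pandas as pd

# Default: a new frame is returned, so assign it back
train_df = pd.concat([pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]})])
train_df = train_df.reset_index(drop=True)
print(train_df.index.tolist())  # [0, 1, 2, 3]

# inplace=True: the frame itself is modified and the call returns None,
# so writing other = other.reset_index(..., inplace=True) would set other to None
other = pd.concat([pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]})])
result = other.reset_index(drop=True, inplace=True)
print(result)                # None
print(other.index.tolist())  # [0, 1, 2, 3]
```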


>>> import pandas as pd
>>> pd.concat([
...     pd.DataFrame({'a': [1, 2]}),
...     pd.DataFrame({'a': [1, 2]})]).reset_index(drop=True)
   a
0  1
1  2
2  1
3  2

This should work:

train_df.reset_index(inplace=True, drop=True)

Set drop=True so the old index is discarded instead of being added as an extra column in your DataFrame.