I was confused by this, which is very simple, but I didn't immediately find the answer on Stack Overflow:

- df.set_index('xcol') makes the column 'xcol' (an existing column of df) become the index.
- df.reindex(myList), however, takes the index labels from outside the DataFrame, for example from a list named myList that we defined somewhere else.

However, df.reindex(myList) also changes values to NaN: rows whose labels are not in myList are dropped, and labels in myList that are not in the current index get rows full of NaN. If you only want to relabel the existing rows, a simple alternative is:

df.index = myList

(this requires myList to have the same length as df).

I hope this post clarifies it! Additions to this post are also welcome!
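A minimal sketch of the three operations side by side. The data is made up for illustration; 'xcol' and myList only reuse the names from the question:

```python
import pandas as pd

# Hypothetical data: 'xcol' and myList reuse the names from the question
df = pd.DataFrame({'xcol': ['p', 'q', 'r'], 'val': [10, 20, 30]})
myList = [2, 0, 5]

# set_index: the values of the existing column 'xcol' become the index labels
by_column = df.set_index('xcol')

# reindex: look up each label of myList in df.index (0, 1, 2);
# rows 2 and 0 come back in that order, label 5 does not exist -> NaN
conformed = df.reindex(myList)

# direct assignment: relabel the existing rows positionally, values untouched
# (myList must have exactly len(df) elements)
relabeled = df.copy()
relabeled.index = myList

print(by_column, conformed, relabeled, sep='\n\n')
```

Note how reindex keeps each row attached to its original label (label 2 still carries 30), whereas assigning to df.index leaves the rows where they are and only renames them (label 2 now carries 10).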


Difference between df.reindex() and df.set_index() methods in pandas

2 Answers

You can see the difference on a simple example. Let's consider this dataframe:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df)
#    a  b
# 0  1  3
# 1  2  4

The index labels are therefore 0 and 1.

If you use set_index with column 'a', the index labels become 1 and 2, and df.set_index('a').loc[1,'b'] gives 3.

Now, if you use reindex with the same labels 1 and 2, as in df.reindex([1,2]), then df.reindex([1,2]).loc[1,'b'] gives 4.0.
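The two lookups can be checked directly. A minimal sketch on the same df; it also shows why the result is 4.0 rather than 4: the NaN that reindex inserts for the missing label forces column 'b' to a float dtype.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# set_index('a'): row positions are unchanged, only the labels change,
# so label 1 now points at the first row, where b is 3
print(df.set_index('a').loc[1, 'b'])

# reindex([1, 2]): labels are looked up in the ORIGINAL index (0, 1);
# label 1 is the second row (b is 4), label 2 does not exist -> NaN
r = df.reindex([1, 2])
print(r.loc[1, 'b'], r.loc[2, 'b'])

# the NaN turns column 'b' from an integer dtype into float64
print(r['b'].dtype)
```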

What happened is that set_index replaced the previous index labels (0, 1) with (1, 2), the values from column 'a', without touching the order of the values in column 'b':

df.set_index('a')
#    b
# a
# 1  3
# 2  4

whereas reindex changes the index but keeps the values in column 'b' attached to the labels they had in the original df (label 2 did not exist there, so it gets NaN):

df.reindex(df.a.values).drop(columns='a')  # equivalent to df.reindex([1, 2]).drop(columns='a')
#      b
# 1  4.0
# 2  NaN
# drop(columns='a') just removes column 'a', which doesn't matter for this example

In short, reindex selects and reorders rows by label without changing the values associated with each label (labels that did not exist before get NaN), while set_index replaces the index with the values of a column without touching the order of the other values in the DataFrame.
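To illustrate the summary, a short sketch on the same df: reindexing with the labels in reverse order moves entire rows, each keeping its own values. It also uses the fill_value parameter of reindex (not used in the answer above), which substitutes a default instead of NaN for labels missing from the original index.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Reversing the label order moves whole rows: each label keeps its own values
reordered = df.reindex([1, 0])

# fill_value is used instead of NaN for label 2, which is not in df.index
filled = df.reindex([1, 2], fill_value=0)

print(reordered)
print(filled)
```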


answered Oct 10 '22 22:10

Ben.T

Just to add: the undo of set_index is (more or less) the reset_index method:

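import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df)

# move column 'a' into the index
df.set_index('a', inplace=True)
print(df)

# move the index back into a regular column 'a'
df.reset_index(inplace=True, drop=False)
print(df)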
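The "more or less" matters when the original index is not the default one. A minimal sketch with a hypothetical index ['x', 'y']: after the round trip the columns are back, but the original labels are gone, because set_index discards the old index (unless append=True is passed) and reset_index then creates a fresh RangeIndex.

```python
import pandas as pd

# Hypothetical non-default index, to show what "more or less" means
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['x', 'y'])

# set_index drops the labels 'x' and 'y'; reset_index cannot bring them back
round_trip = df.set_index('a').reset_index()

# columns 'a' and 'b' are restored, but the index is now 0, 1
print(round_trip)
print(round_trip.columns.tolist(), round_trip.index.tolist())
```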

answered Oct 10 '22 21:10

prosti

Tags: python, indexing, python-3.x, pandas, reindex

Ricardo Guerreiro
