Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get index where value changes in pandas dataframe column

I am trying to expand my pandas skills. I have a pandas dataframe that looks like this :

df

      Group 1     Group 2            Product ID
0   Products      International      X11
1   Products      International      X11
2   Products      Domestic           X11
3   Products      Domestic           X23
4   Services      Professional       X23
5   Services      Professional       X23
6   Services      Analytics          X25

I am trying to use some pandas functionality to get the index where the values of Group 1 and Group 2 change. I understand that I will probably have to go column by column, and append these indices into different lists.

I have referenced this question Find index where elements change value pandas dataframe which was the closest similar question that I can find.

I am trying to get an output like this:

 Group 1 changes = [0,4]
 Group 2 changes = [0,2,4,6]

Is there any sort of built in functionality that pandas has that can quickly reference if two values in a column are the same, and then grab that index?

All of my data is sorted by group, so shouldn't run into any problems if the solution does involve iterating row by row.

Any help is much appreciated!

like image 621
Warthog1 Avatar asked Feb 07 '18 20:02

Warthog1


People also ask

What is Reset_index () in pandas?

Pandas DataFrame reset_index() Method The reset_index() method allows you reset the index back to the default 0, 1, 2 etc indexes. By default this method will keep the "old" idexes in a column named "index", to avoid this, use the drop parameter.

Does ILOC use index?

iloc[] is an index-based to select rows and/or columns in pandas. It accepts a single index, multiple indexes from the list, indexes by a range, and many more. One of the main advantages of DataFrame is its ease of use.

What does .values in pandas do?

The values property is used to get a Numpy representation of the DataFrame. Only the values in the DataFrame will be returned, the axes labels will be removed. The values of the DataFrame. A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.


2 Answers

Use

In [91]: df.ne(df.shift()).apply(lambda x: x.index[x].tolist())
Out[91]:
Group 1             [0, 4]
Group 2       [0, 2, 4, 6]
Product ID       [0, 3, 6]
dtype: object

In [92]: df.ne(df.shift()).filter(like='Group').apply(lambda x: x.index[x].tolist())
Out[92]:
Group 1          [0, 4]
Group 2    [0, 2, 4, 6]
dtype: object

Also for dict,

In [107]: {k: s.index[s].tolist() for k, s in df.ne(df.shift()).filter(like='Group').items()}
Out[107]: {'Group 1': [0L, 4L], 'Group 2': [0L, 2L, 4L, 6L]}
like image 166
Zero Avatar answered Sep 28 '22 04:09

Zero


This is one non-pandas solution. I like it because it is intuitive and requires no understanding of the large pandas library.

changes = {}

for col in df.columns:
    changes[col] = [0] + [idx for idx, (i, j) in enumerate(zip(df[col], df[col][1:]), 1) if i != j]

# {'Group 1': [0, 4], 'Group 2': [0, 2, 4, 6], 'Product ID': [0, 3, 6]}
like image 24
jpp Avatar answered Sep 28 '22 04:09

jpp