How to make a column based on previous values in dataframe

Tags: python, pandas

I have a data frame:

user_id      url
111          google.com
111          youtube.com
111          youtube.com
111          google.com
111          stackoverflow.com
111          google.com
222          twitter.com
222          google.com
222          twitter.com

I want to create a column that indicates whether the user has already visited the same URL in an earlier row.

Desired output:

user_id      url                 target
111          google.com          0
111          youtube.com         0
111          youtube.com         1
111          google.com          1
111          stackoverflow.com   0
111          google.com          1
222          twitter.com         0
222          google.com          0
222          twitter.com         1

I can do this with a loop, but that is not very elegant. Is there a way to do it with pandas directly?

ldevyataykina asked Dec 16 '20 21:12

2 Answers

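Use duplicated, which marks every row that repeats an earlier row. By default it compares all columns, which here means user_id and url: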

df['target'] = df.duplicated().astype(int)
print(df)

Output

   user_id                url  target
0      111         google.com       0
1      111        youtube.com       0
2      111        youtube.com       1
3      111         google.com       1
4      111  stackoverflow.com       0
5      111         google.com       1
6      222        twitter.com       0
7      222         google.com       0
8      222        twitter.com       1
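
If the real data frame has more columns than the two in the question, comparing whole rows may no longer find any repeats. A minimal sketch of one way to handle that, assuming a hypothetical extra column visit_no that is not part of the question, is to restrict the comparison with the subset parameter:

```python
import pandas as pd

# Data from the question, plus a hypothetical extra column (visit_no)
# that makes every row unique when all columns are compared.
df = pd.DataFrame({
    "user_id": [111, 111, 111, 111, 111, 111, 222, 222, 222],
    "url": [
        "google.com", "youtube.com", "youtube.com", "google.com",
        "stackoverflow.com", "google.com",
        "twitter.com", "google.com", "twitter.com",
    ],
    "visit_no": range(9),  # assumption for illustration only
})

# Compare only user_id and url; the first occurrence of each pair is False,
# every later occurrence is True.
df["target"] = df.duplicated(subset=["user_id", "url"]).astype(int)

print(df["target"].tolist())  # [0, 0, 1, 1, 0, 1, 0, 0, 1]
```

With only the two columns from the question, plain df.duplicated() gives the same result.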
Dani Mesejo answered Nov 15 '22 06:11

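Alternatively, number each visit within its (user_id, url) group with cumcount and flag every visit after the first:

df['target'] = df.groupby(['user_id', 'url']).cumcount().gt(0).astype(int)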

   user_id                url  target
0      111         google.com       0
1      111        youtube.com       0
2      111        youtube.com       1
3      111         google.com       1
4      111  stackoverflow.com       0
5      111         google.com       1
6      222        twitter.com       0
7      222         google.com       0
8      222        twitter.com       1
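
To see why this works, it may help to look at the intermediate counter before it is turned into a flag. The sketch below rebuilds the question's data and keeps the counter in a column named prior_visits (a name chosen here for illustration, not part of the original answer):

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "user_id": [111, 111, 111, 111, 111, 111, 222, 222, 222],
    "url": [
        "google.com", "youtube.com", "youtube.com", "google.com",
        "stackoverflow.com", "google.com",
        "twitter.com", "google.com", "twitter.com",
    ],
})

# cumcount numbers the rows of each (user_id, url) group 0, 1, 2, ...
# in their original order, so it equals the number of earlier visits.
df["prior_visits"] = df.groupby(["user_id", "url"]).cumcount()

# Any count above zero means the URL was seen before for that user.
df["target"] = df["prior_visits"].gt(0).astype(int)

print(df["prior_visits"].tolist())  # [0, 0, 1, 1, 0, 2, 0, 0, 1]
print(df["target"].tolist())        # [0, 0, 1, 1, 0, 1, 0, 0, 1]
```

Keeping the counter is useful if the number of earlier visits matters and not just whether there was one.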
wwnde answered Nov 15 '22 07:11