I have a data frame:
user_id url
111 google.com
111 youtube.com
111 youtube.com
111 google.com
111 stackoverflow.com
111 google.com
222 twitter.com
222 google.com
222 twitter.com
I want to create a column that shows whether this user has already visited this URL in an earlier row (1) or not (0).
Desired output:
user_id url target
111 google.com 0
111 youtube.com 0
111 youtube.com 1
111 google.com 1
111 stackoverflow.com 0
111 google.com 1
222 twitter.com 0
222 google.com 0
222 twitter.com 1
I can do this with a loop, but it isn't very elegant. Is there a way to do it with pandas?
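For reference, a loop-based version might look like the following minimal sketch. It is not the asker's code. It assumes the data sits in a DataFrame named df with columns user_id and url, and the helper name flag_repeat_visits is hypothetical. It keeps a set of (user_id, url) pairs seen so far and emits 1 when the current pair is already in it.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "user_id": [111, 111, 111, 111, 111, 111, 222, 222, 222],
    "url": ["google.com", "youtube.com", "youtube.com", "google.com",
            "stackoverflow.com", "google.com", "twitter.com", "google.com",
            "twitter.com"],
})


def flag_repeat_visits(frame: pd.DataFrame) -> list[int]:
    """Hypothetical loop baseline: 1 if this (user_id, url) pair appeared in an earlier row."""
    seen = set()
    flags = []
    for user_id, url in zip(frame["user_id"], frame["url"]):
        pair = (user_id, url)
        flags.append(1 if pair in seen else 0)
        seen.add(pair)  # remember the pair for later rows
    return flags


df["target"] = flag_repeat_visits(df)
print(df)
```

The vectorized answers below produce the same column without the explicit Python loop.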
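Use DataFrame.duplicated with subset=['user_id', 'url']. It marks every occurrence of a pair after the first one as True (keep='first' is the default), and astype(int) turns that into 0/1:
df['target'] = df.duplicated(subset=['user_id', 'url']).astype(int)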
print(df)
Output
user_id url target
0 111 google.com 0
1 111 youtube.com 0
2 111 youtube.com 1
3 111 google.com 1
4 111 stackoverflow.com 0
5 111 google.com 1
6 222 twitter.com 0
7 222 google.com 0
8 222 twitter.com 1
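With only user_id and url in the frame, duplicated() with no arguments already compares exactly those two columns. If the frame carries other columns, the comparison should be restricted with subset; otherwise rows that differ only in an extra column are never flagged. A small sketch, where the visited_at column is hypothetical sample data added for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [111, 111, 111, 222, 222],
    "url": ["google.com", "youtube.com", "google.com", "twitter.com", "twitter.com"],
    # hypothetical extra column: every value is unique, so no full row repeats
    "visited_at": pd.date_range("2024-01-01", periods=5, freq="D"),
})

# Comparing all columns: no row is an exact duplicate, so nothing is flagged
all_cols = df.duplicated().astype(int)

# Comparing only the (user_id, url) pair gives the intended "visited before" flag
df["target"] = df.duplicated(subset=["user_id", "url"]).astype(int)

print(all_cols.tolist())
print(df)
```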
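Alternatively, use groupby with cumcount, which numbers the rows within each (user_id, url) group starting from 0, so any count greater than 0 marks a repeat visit:
df['target'] = df.groupby(['user_id', 'url']).cumcount().gt(0).astype(int)
print(df)
Output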
user_id url target
0 111 google.com 0
1 111 youtube.com 0
2 111 youtube.com 1
3 111 google.com 1
4 111 stackoverflow.com 0
5 111 google.com 1
6 222 twitter.com 0
7 222 google.com 0
8 222 twitter.com 1
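To see why this works, the sketch below (built on the question's sample data) prints the intermediate cumcount() values. The counter restarts at 0 for each new (user_id, url) pair, so gt(0) is True exactly on repeat visits, and the resulting flag matches the duplicated(subset=...) approach.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [111, 111, 111, 111, 111, 111, 222, 222, 222],
    "url": ["google.com", "youtube.com", "youtube.com", "google.com",
            "stackoverflow.com", "google.com", "twitter.com", "google.com",
            "twitter.com"],
})

# 0 for the first row of each (user_id, url) group, 1 for the second, and so on
prior_visits = df.groupby(["user_id", "url"]).cumcount()
print(prior_visits.tolist())

# Any positive count means the pair was already seen in an earlier row
df["target"] = prior_visits.gt(0).astype(int)

# Same flag as the duplicated-based answer
assert df["target"].tolist() == df.duplicated(subset=["user_id", "url"]).astype(int).tolist()
print(df)
```

A side benefit of cumcount is that the raw counter equals the number of earlier visits to that URL by that user, in case you ever need more than a 0/1 flag.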