How to make a column based on previous values in dataframe

Tags:

python

pandas

I have a data frame:

user_id      url
111          google.com
111          youtube.com
111          youtube.com
111          google.com
111          stackoverflow.com
111          google.com
222          twitter.com
222          google.com
222          twitter.com

I want to add a column that indicates whether the same user has already visited this URL in an earlier row.

Desired output:

user_id      url                 target
111          google.com          0
111          youtube.com         0
111          youtube.com         1
111          google.com          1
111          stackoverflow.com   0
111          google.com          1
222          twitter.com         0
222          google.com          0
222          twitter.com         1

I can do this with a loop, but that is not very elegant. Is there a way to do it with pandas?

asked Dec 16 '20 21:12

ldevyataykina

2 Answers

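Use DataFrame.duplicated, which flags every row that repeats an earlier row. Since the frame contains only user_id and url, a repeated row is a repeat visit by the same user: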

df['target'] = df.duplicated().astype(int)
print(df)

Output

   user_id                url  target
0      111         google.com       0
1      111        youtube.com       0
2      111        youtube.com       1
3      111         google.com       1
4      111  stackoverflow.com       0
5      111         google.com       1
6      222        twitter.com       0
7      222         google.com       0
8      222        twitter.com       1
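For anyone who wants to run this end to end, here is a minimal, self-contained sketch that rebuilds the sample frame from the question. Passing subset=["user_id", "url"] is my own addition, not part of the answer above. It assumes a repeat visit is defined by those two columns only, which matters if the real frame has extra columns such as a timestamp.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [111, 111, 111, 111, 111, 111, 222, 222, 222],
    "url": [
        "google.com", "youtube.com", "youtube.com", "google.com",
        "stackoverflow.com", "google.com",
        "twitter.com", "google.com", "twitter.com",
    ],
})

# Assumption: only user_id and url decide whether a row is a repeat visit.
# duplicated() marks every occurrence after the first (keep="first" is the default).
df["target"] = df.duplicated(subset=["user_id", "url"]).astype(int)

print(df)
```

With only the two columns from the question, this gives the same result as calling df.duplicated() without arguments.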

answered Nov 15 '22 06:11

Dani Mesejo

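Alternatively, number the occurrences within each (user_id, url) group with cumcount and flag every occurrence after the first:

df['target'] = df.groupby(['user_id', 'url']).cumcount().gt(0).astype(int)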

   user_id                url  target
0      111         google.com       0
1      111        youtube.com       0
2      111        youtube.com       1
3      111         google.com       1
4      111  stackoverflow.com       0
5      111         google.com       1
6      222        twitter.com       0
7      222         google.com       0
8      222        twitter.com       1
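A possible reason to prefer the groupby approach is that cumcount also tells you how many earlier visits there were, not just whether there was one. The sketch below rebuilds the sample frame from the question. The prior_visits column name is hypothetical and only there for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [111, 111, 111, 111, 111, 111, 222, 222, 222],
    "url": [
        "google.com", "youtube.com", "youtube.com", "google.com",
        "stackoverflow.com", "google.com",
        "twitter.com", "google.com", "twitter.com",
    ],
})

# cumcount numbers the rows inside each (user_id, url) group starting at 0,
# so the value equals the number of earlier visits to that URL by that user.
df["prior_visits"] = df.groupby(["user_id", "url"]).cumcount()  # hypothetical column name

# Any row with at least one earlier visit gets target = 1.
df["target"] = df["prior_visits"].gt(0).astype(int)

# Sanity check: the flag agrees with the duplicated-based approach.
dup_flag = df.duplicated(subset=["user_id", "url"]).astype(int)
matches_duplicated = bool((df["target"] == dup_flag).all())

print(df)
```

If you only need the 0/1 flag, duplicated is the shorter option. The cumcount version is handy when the count itself is useful.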

answered Nov 15 '22 07:11

wwnde
