Odd issue with .isin() and strings (Python/Pandas)

Tags: python, regex, pandas

I'm having a strange problem with the Pandas .isin() method. I'm working on a project in which I need to identify bad passwords by length, common word/password lists, etc. (don't worry, the data is from a public source). One of the checks is whether someone is using part of their name as their password. I'm using .isin() to test for that, but it's giving me weird results. To show:

# Extracting first and last names into their own columns
# (raw strings keep the regex escape '\.' from triggering an invalid-escape warning)
users['first_name'] = users.user_name.str.extract(r'^(.+)\.', expand=False)
users['last_name'] = users.user_name.str.extract(r'\.(.+)', expand=False)

# Flagging the users with passwords that match their names
users['uses_name'] = (users['password'].isin(users.first_name)) | (users['password'].isin(users.last_name))

# Looking at the new data
print(users[users['uses_name']][['password','user_name','first_name','last_name','uses_name']].head())

The output of this is:

   password            user_name first_name  last_name uses_name
7    murphy          noreen.hale     noreen       hale      True
11  hubbard      milford.hubbard    milford    hubbard      True
22  woodard        jenny.woodard      jenny    woodard      True
30     reid         rosanna.reid    rosanna       reid      True
58   golden  rosalinda.rodriquez  rosalinda  rodriquez      True

Mostly it's good; milford.hubbard is using 'hubbard' as the password, etc. But then we have several examples like the first one: Noreen Hale is being flagged even though her password is "murphy", which matches neither her first nor her last name. The same goes for rosalinda.rodriquez with "golden".

I can't for the life of me figure out what is causing this. Does anyone know why this is happening, and how to fix it?

asked Mar 05 '18 23:03

tq343

2 Answers

Since you need to check, row by row, whether one column's string is contained in another column's string, vectorisation isn't much of an option here. As such, you could use what is (possibly) the fastest alternative at your disposal, a list comprehension:

# True where the password appears anywhere inside that row's user_name
df['uses_name'] = [
    pwd in name for name, pwd in zip(df.user_name, df.password)
]

Or, if you dislike loops, you can hide them with np.vectorize:

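import numpy as np

def f(name, pwd):
    # Substring test for a single (user_name, password) pair
    return pwd in name

# np.vectorize still loops in Python; it only hides the loop
v = np.vectorize(f)
df['uses_name'] = v(df.user_name, df.password)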

df
   password            user_name  uses_name
7    murphy          noreen.hale      False
11  hubbard      milford.hubbard       True
22  woodard        jenny.woodard       True
30     reid         rosanna.reid       True
58   golden  rosalinda.rodriquez      False

Since first_name and last_name are both extracted from user_name, I don't think you need those two columns here; checking the password against user_name directly is enough.
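One thing to keep in mind is that `pwd in name` is a substring test, so it also flags passwords that are only a fragment of the name. If an exact match against the first or last name is what you want, an element-wise comparison is a possible alternative. The following is only a sketch on made-up rows (the third password, "ann", is hypothetical and chosen to show the difference):

```python
import pandas as pd

# Hypothetical sample rows, loosely modelled on the output above.
df = pd.DataFrame({
    "user_name": ["noreen.hale", "milford.hubbard", "rosanna.reid"],
    "password": ["murphy", "hubbard", "ann"],
})

# Split "first.last" into two columns on the first dot.
parts = df["user_name"].str.split(".", n=1, expand=True)
first_name, last_name = parts[0], parts[1]

# Exact match: compares each password only with the names on its own row.
exact = df["password"].eq(first_name) | df["password"].eq(last_name)

# Substring match: the list comprehension from this answer.
substring = [pwd in name for name, pwd in zip(df["user_name"], df["password"])]

print(exact.tolist())  # [False, True, False]
print(substring)       # [False, True, True]
```

Which of the two is right depends on whether a password like "ann" for rosanna.reid should count as "using part of the name".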

answered Sep 18 '22 17:09

cs95

Regarding the reason why this happens:

If you do users['password'].isin(users.first_name), you are asking, for each row of users['password'], whether that value is equal to ANY of the elements in the first_name column, not just the one on the same row. The same applies to the last_name check. Therefore I assume that some other user in your data has murphy as a first or last name.
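A minimal way to see this for yourself, assuming a made-up three-row frame (the bill.murphy row and the passwords are invented for illustration, not taken from your data):

```python
import pandas as pd

# Hypothetical data: noreen.hale's password happens to be another user's last name.
users = pd.DataFrame({
    "user_name": ["noreen.hale", "milford.hubbard", "bill.murphy"],
    "password": ["murphy", "hubbard", "x7!kq"],
})
users["last_name"] = users["user_name"].str.split(".").str[1]

# isin() tests membership in the whole column, so row 0 matches bill.murphy's last name.
col_wise = users["password"].isin(users["last_name"])

# Element-wise equality compares each password only with the name on its own row.
row_wise = users["password"] == users["last_name"]

print(col_wise.tolist())  # [True, True, False]
print(row_wise.tolist())  # [False, True, False]
```

Running something like users[users.last_name == 'murphy'] on the real data should confirm whether that is what is going on.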

answered Sep 19 '22 17:09

DZurico
