
Check if string exists in a text file

Tags: python, regex
So I've got:

import re

def CheckUserExists(user):
    with open("C:/~/database.txt", 'r') as file:
        if re.search(user, file.read()):
            return True
        else:
            return False

username = input("Please enter your Username: ")
if CheckUserExists(username) == True:
    print("You exist!")
else:
    print("This user does not exist...")

However, if you enter, for example, the letter 'a' and there is a user called 'brain', the search picks up the 'a' and returns True. How do I search for whole words?

I have looked here: How to check in Python if string is in a text file and print the line? However, I don't understand this piece of code:

re.search("\b{0}\b".format(w),line)
rexasarus asked Oct 15 '14 20:10


2 Answers

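The regular expression \b matches the empty string at a word boundary, where a word is \w+. With ASCII matching that is [A-Za-z0-9_]+; in Python 3, \w on str patterns also matches Unicode letters and digits unless you pass re.ASCII.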
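To make the word-boundary behaviour concrete, here is a minimal sketch. The helper name contains_word is made up for illustration and is not part of the question's code. It also shows one thing worth checking in the snippet you quoted: in an ordinary string literal "\b" is the backspace character, so the pattern needs to be a raw string.

```python
import re

# In a normal string literal "\b" is the backspace character (0x08),
# so the regex engine never sees a word-boundary token.
print("\b" == "\x08")   # True
print(len(r"\b"))       # 2: a backslash followed by the letter b

def contains_word(word, text):
    """Hypothetical helper: True if `word` occurs in `text` as a whole word."""
    # Raw string keeps \b intact; re.escape() keeps `word` literal.
    pattern = r"\b{0}\b".format(re.escape(word))
    return re.search(pattern, text) is not None

print(contains_word("a", "brain"))        # False: the 'a' is inside a word
print(contains_word("brain", "brain\n"))  # True
```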

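If you have one name per line (with no other whitespace around the names), you can search by line with ^{0}$ and the re.M (re.MULTILINE) flag, which makes ^ and $ match at the start and end of every line rather than only at the start and end of the whole string.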

That would look like this:

import re

def CheckUserExists(user):
    with open("C:/~/database.txt", 'r') as file:
        # re.escape() keeps regex metacharacters in the name literal;
        # re.M makes ^ and $ match at the start and end of each line
        if re.search('^{0}$'.format(re.escape(user)), file.read(), flags=re.M):
            return True
        else:
            return False

username = input("Please enter your Username: ")
if CheckUserExists(username): # it's redundant to check if == True here
    print("You exist!")
else:
    print("This user does not exist...")
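If you want to try the whole-line pattern without touching a real file, a rough sketch along these lines should work. The helper user_in_text and the sample contents are invented for the example; only the pattern and the re.M flag come from the code above.

```python
import re

def user_in_text(user, text):
    """Hypothetical helper: True if `user` is exactly one whole line of `text`."""
    pattern = "^{0}$".format(re.escape(user))
    return re.search(pattern, text, flags=re.M) is not None

# Made-up stand-in for the file contents, one name per line.
sample = "brain\nalice\nbob\n"

print(user_in_text("a", sample))      # False: no line is exactly 'a'
print(user_in_text("alice", sample))  # True
print(user_in_text("a.ice", sample))  # False: re.escape() makes '.' literal
```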

A comment and another answer suggest it, but if you do

if user in file.read()

you may get false positives, because in tests for a substring: 'a' in 'brain' is True.

Russia Must Remove Putin answered Oct 22 '22 17:10


To check whether a space-separated word exists in a file:

with open(filename) as file:
    found = (word in file.read().split())

Or the same, but reading line by line instead of loading the whole file into memory:

with open(filename) as file:
    found = any(word in line.split() for line in file)

If the file contains one word (or user) per line:

with open(filename) as file:
    found = any(word == line.strip() for line in file)

You don't need regular expressions in simple cases. If there could be multiple words per line, with arbitrary punctuation between them, then you could use the regex that you've linked:

import re

matched = re.compile(r"\b" + re.escape(word) + r"\b").search
with open(filename) as file:
    found = any(matched(line) for line in file)

The \b regular expression matches a word boundary (the start or the end of a word). Word characters are letters, digits, and the underscore. re.escape() is used in case word contains regex metacharacters such as *.
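As a small illustration of why re.escape() matters here, the following sketch runs the same search with and without escaping. The helper any_line_has_word, its escape parameter, and the sample text are assumptions made for the demo; io.StringIO simply stands in for an open text file.

```python
import io
import re

def any_line_has_word(lines, word, escape=True):
    """Hypothetical helper: True if any line contains `word` as a whole word."""
    body = re.escape(word) if escape else word
    matched = re.compile(r"\b" + body + r"\b").search
    return any(matched(line) for line in lines)

# Made-up contents; iterating a StringIO yields lines like a file does.
text = "brain jfs\nalice\n"

print(any_line_has_word(io.StringIO(text), "a"))                  # False
print(any_line_has_word(io.StringIO(text), "jfs"))                # True
print(any_line_has_word(io.StringIO(text), "j.s"))                # False: '.' is literal
print(any_line_has_word(io.StringIO(text), "j.s", escape=False))  # True: '.' matches 'f'
```

One caveat: \b only fires next to a word character, so a search term that begins or ends with punctuation may not be found by this pattern even when escaped.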

jfs answered Oct 22 '22 18:10