
How to check whether files in a folder and its subfolders contain a particular string

  • I have a folder that contains files and subfolders
  • The subfolders contain files as well
  • I need to find the files in which one string is present and another string is absent
  • All the files are .txt files
  • Specifically, the string 20210624 must be present in the file and the string 20210625 must not be
  • The output should be the names of the matching files
import os
match_str = ['20210624']
not_match_str =  ['20210625']
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith((".txt")):
             ## search files with match_str `20210624`  and not_match_str `20210625`

Can I do this using os.walk?

sim asked Jun 24 '21 11:06


4 Answers

You can set the recursive keyword argument of glob.glob() to True so that the ** pattern matches files in the folder itself, in its subfolders, and so on.

from glob import glob

path = 'C:\\Users\\User\\Desktop'
for file in glob(path + '\\**\\*.txt', recursive=True):
    with open(file) as f:
        text = f.read()
        if '20210624'  in text and '20210625' not in text:
            print(file)

If you want only the file names to be printed rather than the entire paths, then:

from glob import glob

path = 'C:\\Users\\User\\Desktop'
for file in glob(path + '\\**\\*.txt', recursive=True):
    with open(file) as f:
        text = f.read()
        if '20210624'  in text and '20210625' not in text:
            print(file.split('\\')[-1])

To use os.walk() instead, filter the file names with the str.endswith() method (as you have done in your post), like so:

import os

for path, _, files in os.walk('C:\\Users\\User\\Desktop'):
    for file in files:
        if file.endswith('.txt'):
            with open(os.path.join(path, file)) as f:
                text = f.read()
                if '20210624'  in text and '20210625' not in text:
                    print(file)
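
For large files it may be preferable not to read the whole file into memory. The following is a minimal sketch of the same os.walk() approach that scans each file line by line and stops as soon as the unwanted string shows up. The function names has_first_but_not_second and find_txt_files are made up for illustration, and the UTF-8 encoding with errors="ignore" is an assumption about the data. A string that is split across two lines would not be found this way, which should not matter for date stamps such as 20210624.

```python
import os

def has_first_but_not_second(filepath, wanted, unwanted):
    """Return True if `wanted` occurs in the file and `unwanted` never does."""
    found = False
    # Assumption: the files are (mostly) UTF-8; undecodable bytes are skipped.
    with open(filepath, encoding="utf-8", errors="ignore") as f:
        for line in f:
            if unwanted in line:
                return False  # the file is disqualified, stop reading
            if wanted in line:
                found = True
    return found

def find_txt_files(root, wanted, unwanted):
    """Collect the paths of all .txt files below `root` that pass the check."""
    hits = []
    for path, _, files in os.walk(root):
        for file in files:
            if file.endswith(".txt"):
                full = os.path.join(path, file)
                if has_first_but_not_second(full, wanted, unwanted):
                    hits.append(full)
    return hits
```

Calling find_txt_files('C:\\Users\\User\\Desktop', '20210624', '20210625') would then return the full paths; os.path.basename() can be applied to each one if only the file names are needed.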

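And to limit the search to a maximum depth of subdirectories:

import os

levels = 2
root = 'C:\\Users\\User\\Desktop'
total = root.count('\\') + levels

for path, dirs, files in os.walk(root):
    if path.count('\\') >= total:
        # Maximum depth reached: empty dirs in place so os.walk() does not
        # descend further. A break here would end the whole walk and skip
        # sibling folders that have not been visited yet.
        dirs[:] = []
    for file in files:
        if file.endswith('.txt'):
            print(os.path.join(path, file))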

Ann Zen answered Oct 16 '22 19:10


You can achieve this with pathlib and glob.

import pathlib
path = pathlib.Path(path)
maybe_valids = list(path.glob("*20210624*.txt"))
valids = [elem for elem in maybe_valids if "20210625" not in elem.name]
print(valids)

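The maybe_valids list collects every .txt file whose name contains "20210624", while valids keeps only those whose names don't contain "20210625". Note that this tests the file names, not the file contents, and that path.glob() with this pattern looks only in the top-level folder; path.rglob() also searches the subfolders.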
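
Since the question asks about strings inside the files, here is a hedged sketch of how the pathlib approach could be adapted to check the contents and to descend into subfolders. The function name matching_txt_files and its default arguments are made up for illustration, and reading the files as UTF-8 with errors="ignore" is an assumption.

```python
from pathlib import Path

def matching_txt_files(root, wanted="20210624", unwanted="20210625"):
    """Return the names of .txt files under `root` whose text contains
    `wanted` but not `unwanted`."""
    matches = []
    # rglob() searches `root` and all of its subfolders.
    for p in Path(root).rglob("*.txt"):
        if not p.is_file():
            continue  # skip directories that happen to end in .txt
        # Assumption: UTF-8 text; undecodable bytes are ignored.
        text = p.read_text(encoding="utf-8", errors="ignore")
        if wanted in text and unwanted not in text:
            matches.append(p.name)
    return matches
```

Appending p instead of p.name would keep the full path of each match.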

crissal answered Oct 16 '22 17:10


Continue from here -

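if name.endswith((".txt")):
   # open() needs the full path, so join root (from os.walk) with name
   with open(os.path.join(root, name), mode='r') as f:
      a = f.read()  # read once and reuse the result
   if match_str[0] in a and not_match_str[0] not in a:
      print(name)  # match_str is present and not_match_str is absent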

If you have more than one match_str, you can loop over the list and test each string against the text that was read. Similarly, you can use the not in operator to check for not_match_str.
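
As a rough sketch of that idea, the loops can be written with all() and any(). The helper name passes is made up for illustration, and it assumes that every string in match_str must be present while none of the strings in not_match_str may appear; if a single match is enough, all() would become any().

```python
def passes(text, match_str, not_match_str):
    """True if every string in match_str occurs in text
    and no string in not_match_str does."""
    return (all(s in text for s in match_str)
            and not any(s in text for s in not_match_str))
```

Inside the os.walk() loop this would be called as passes(a, match_str, not_match_str), where a is the text read from the file.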

PCM answered Oct 16 '22 18:10


You can get the file names with a short shell pipeline. grep -l lists the files that contain the pattern, and grep -L lists those that do not:

find . -name "*.txt" | xargs grep -l "20210624" | xargs grep -L "20210625"

shdxiang answered Oct 16 '22 19:10