
Grep and Python

Tags: python, regex, grep

I need a way to search a file with a regular expression from the Unix command line, the way grep does. For example, when I type on the command line:

python pythonfile.py 'RE' 'file-to-be-searched' 

the script should search the file for the regular expression 'RE' and print the matching lines.

Here's the code I have:

    import re
    import sys

    search_term = sys.argv[1]
    f = sys.argv[2]

    for line in open(f, 'r'):
        if re.search(search_term, line):
            print line,
            if line == None:
                print 'no matches found'

But when I enter a word that isn't present in the file, 'no matches found' is not printed.

David asked Dec 17 '09 13:12


2 Answers

The natural question is why not just use grep?! But assuming you can't...

    import re
    import sys

    file = open(sys.argv[2], "r")

    for line in file:
        if re.search(sys.argv[1], line):
            print line,

Things to note:

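  • re.search is used instead of re.match so the pattern can match anywhere in the line, not only at the start
  • the trailing comma after print (Python 2 syntax) suppresses the extra newline, since each line already ends with one; in Python 3 the equivalent is print(line, end='')
  • sys.argv[0] is the script name, so the arguments start at index 1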
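
As for the 'no matches found' message: the check in the question never fires because iterating over a file never yields None, and the test sits inside the branch that only runs when a line has already matched. One way to handle it is to remember whether anything matched and report after the loop. The following is a minimal Python 3 sketch under that assumption; the names matching_lines and main are made up for illustration.

```python
import re
import sys


def matching_lines(pattern, lines):
    """Yield each line that contains a match for the regular expression."""
    regex = re.compile(pattern)  # compile once rather than on every line
    for line in lines:
        if regex.search(line):
            yield line


def main(argv):
    # argv[1] is the pattern and argv[2] the file, as in the question
    found = False
    with open(argv[2], "r") as handle:
        for line in matching_lines(argv[1], handle):
            sys.stdout.write(line)  # the line already ends with a newline
            found = True
    if not found:
        print("no matches found")
    return 0 if found else 1  # same exit status convention as grep


# Only run as a command when given exactly a pattern and a file name
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

The flag is set inside the loop and tested once after it, so the message appears only when the whole file has been read without a hit.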

This doesn't handle multiple arguments (like grep does) or expand wildcards (like the Unix shell would). If you wanted this functionality you could get it using the following:

    import re
    import sys
    import glob

    for arg in sys.argv[2:]:
        for file in glob.iglob(arg):
            for line in open(file, 'r'):
                if re.search(sys.argv[1], line):
                    print line,

Nick Fortescue answered Sep 25 '22 18:09


Concise and memory efficient:

    #!/usr/bin/env python
    # file: grep.py
    import re, sys, collections

    collections.deque(map(sys.stdout.write,(l for l in sys.stdin if re.search(sys.argv[1],l))),maxlen=0)

It works like egrep (without too much error handling), e.g.:

cat input-file | grep.py "RE" 

And here is the one-liner:

cat input-file | python -c "import re,sys,collections;collections.deque(map(sys.stdout.write,(l for l in sys.stdin if re.search(sys.argv[1],l))),maxlen=0)" "RE" 

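Note that collections.deque is needed in Python 3 because map now returns a lazy iterator: nothing is written until something consumes it. A deque created with maxlen=0 consumes every item and stores none of them.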
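
The laziness is easy to see in isolation. This small Python 3 sketch uses a list's append method as a stand-in for sys.stdout.write so the calls can be counted; the variable names are only for illustration.

```python
import collections

seen = []

# In Python 3, map() builds a lazy iterator: append has not been called yet
lazy = map(seen.append, ["a", "b", "c"])
calls_before = len(seen)

# A deque with maxlen=0 pulls every item through the iterator and keeps none
drained = collections.deque(lazy, maxlen=0)
calls_after = len(seen)

print(calls_before, calls_after, len(drained))  # prints: 0 3 0
```

A plain for loop over the generator would have the same effect and is usually easier to read; the deque form mainly helps keep everything in one expression.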

Giancarlo Sportelli answered Sep 24 '22 18:09