 

How to find position of word in file?

Tags:

python

For example, I have a file and the word "test". The file is partially binary but contains the string "test". How can I find the position (index) of the word in the file without loading the whole file into memory?

asked Aug 08 '11 10:08 by bdfy



3 Answers

You cannot find the position of text within a file without opening the file. It is like asking someone to read a newspaper without opening their eyes.

The first part of your question, finding the position, is relatively simple as long as the file fits in memory:

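with open('Path/to/file', 'rb') as f:     # binary mode, since the file is partially binary
    content = f.read()                     # note: this reads the whole file into memory
    print(content.find(b'test'))           # -1 if the word is not present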
answered Oct 26 '22 05:10 by Pankaj Parashar



You can use memory-mapped files and regular expressions.

Memory-mapped file objects behave like both bytearray and file objects. You can use mmap objects in most places where a bytearray is expected; for example, you can use the re module to search through a memory-mapped file. When the mapping is writable, you can change a single byte by doing obj[index] = 97, or change a subsequence by assigning to a slice: obj[i1:i2] = b'...'. You can also read and write data starting at the current file position, and seek() through the file to different positions.
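To illustrate the bytearray-like and file-like behaviour just described, here is a minimal Python 3 sketch. It writes a small sample file to a temporary location (the sample bytes and the `demo_mmap` helper are hypothetical, for illustration only), maps it read-only, and then uses `mmap.find()`, slicing, `seek()` and `read()` on the same object.

```python
import mmap
import os
import tempfile


def demo_mmap(data: bytes, word: bytes):
    """Hypothetical helper: map a temporary file and inspect it both ways."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
        with open(path, 'rb') as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mf:
                pos = mf.find(word)            # bytearray-like: search without read()
                if pos == -1:
                    return None                # word not present
                sliced = mf[pos:pos + len(word)]   # slicing returns bytes
                mf.seek(pos)                   # file-like: move the cursor
                read_back = mf.read(len(word)) # and read from that position
                return pos, sliced, read_back
    finally:
        os.remove(path)


print(demo_mmap(b'\x00\x01binary\xfftest\x00', b'test'))  # prints (9, b'test', b'test')
```

The mapping is opened with `mmap.ACCESS_READ`, so only the pages that are actually touched are loaded by the operating system; the file is never read into a Python bytes object as a whole.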

Example

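import re
import mmap

with open('path/filename', 'rb') as f:
    # Map the whole file read-only; the OS pages it in on demand,
    # so the file is not read into memory up front
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mf:
        m = re.search(b'pattern', mf)  # in Python 3 the pattern must be bytes
        if m:
            print(m.start(), m.end())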
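`re.search()` only reports the first match. If every occurrence is needed, `re.finditer()` also accepts the mmap object. The sketch below assumes the word is given as bytes and wraps it in `re.escape()` so characters such as `.` are not treated as regex syntax; the `find_all_mmap` name is hypothetical.

```python
import mmap
import os
import re
import tempfile


def find_all_mmap(path, word: bytes):
    """Hypothetical helper: return the offsets of every (non-overlapping)
    occurrence of `word` in the file at `path`."""
    if os.path.getsize(path) == 0:
        return []  # mmap cannot map an empty file
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mf:
            return [m.start() for m in re.finditer(re.escape(word), mf)]


# Usage on a small temporary file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'test\x00\xfftest..test')
print(find_all_mmap(path, b'test'))  # prints [0, 6, 12]
os.remove(path)
```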
answered Oct 26 '22 05:10 by Nick Dandoulakis



Try this:

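import os

SEARCH_WORD = b'test'            # bytes, because the file is opened in binary mode
file_dmp_path = 'path/to/file'   # placeholder path
bsize = 1024 * 1024              # read 1 MiB per block; must be >= len(SEARCH_WORD)

fsize = os.path.getsize(file_dmp_path)
word_len = len(SEARCH_WORD)
positions = []
with open(file_dmp_path, 'rb') as file:
    while True:
        start = file.tell()
        block = file.read(bsize)
        p = block.find(SEARCH_WORD)
        while p > -1:
            positions.append(start + p)          # absolute offset of the match
            p = block.find(SEARCH_WORD, p + 1)
        if file.tell() >= fsize:
            break
        # step back so a word split across two blocks is not missed
        file.seek(file.tell() - word_len + 1)
print(positions)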
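The seek-based loop above needs a real file on disk. A variant that needs neither `os.path.getsize()` nor `seek()` keeps the last `len(word) - 1` bytes of each block as a carry-over instead, so it also works on any binary stream (for example `io.BytesIO` or a pipe). This is a sketch; the `iter_positions` name and the 64 KiB default block size are arbitrary choices, not part of the original answer.

```python
import io


def iter_positions(stream, word: bytes, block_size: int = 64 * 1024):
    """Hypothetical helper: yield the absolute offset of every occurrence of
    `word` in a binary stream, holding at most about block_size + len(word)
    bytes in memory at a time."""
    if not word:
        raise ValueError("word must be non-empty")
    keep = len(word) - 1   # bytes carried over so split matches are not missed
    offset = 0             # stream offset of buf[0]
    buf = b""
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        buf += chunk
        p = buf.find(word)
        while p != -1:
            yield offset + p
            p = buf.find(word, p + 1)   # p + 1 also reports overlapping matches
        # drop everything except the last `keep` bytes; a full match cannot
        # fit in them, so nothing is reported twice
        cut = max(len(buf) - keep, 0)
        offset += cut
        buf = buf[cut:]


# Usage: block_size=4 forces the second "test" (offset 7) to straddle two reads
data = b"\x00\xfftest\x01test\x02"
print(list(iter_positions(io.BytesIO(data), b"test", block_size=4)))  # prints [2, 7]
```

Because it is a generator, the caller can stop after the first hit with `next(iter_positions(f, b"test"), None)` without scanning the rest of the file.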
answered Oct 26 '22 04:10 by Israel Alberto RV
