Python equivalent of unix "strings" utility

Question

I'm trying to write a script that extracts strings from an executable binary and saves them to a file. Making the file newline-separated isn't an option, since the strings themselves may contain newlines. That also rules out the Unix "strings" utility: it prints every string separated by newlines, so there's no way to tell from its output which strings contained newlines. I'm therefore hoping to find a Python function or library that implements the same functionality as "strings" but gives me the strings as variables, so that I can avoid the newline issue.

Thanks!

Zero Piraeus · Accepted Answer

Here's a generator that yields all the strings of printable characters >= min (4 by default) in length that it finds in filename:

import string

def strings(filename, min=4):
    # Read raw bytes: text mode with errors="ignore" silently drops
    # undecodable bytes, which can merge two separate strings into one.
    with open(filename, "rb") as f:
        result = ""
        for b in f.read():
            c = chr(b)  # Python 3: iterating over bytes yields ints
            if c in string.printable:
                result += c
                continue
            if len(result) >= min:
                yield result
            result = ""
        if len(result) >= min:  # catch result at EOF
            yield result

Which you can iterate over:

for s in strings("something.bin"):
    print(repr(s))  # do something with s

... or store in a list:

sl = list(strings("something.bin"))
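Since the question is about saving the strings without losing track of embedded newlines, one possible way to store them is as a JSON array, which escapes newlines and other control characters. This is only a sketch: `save_strings` and `load_strings` are hypothetical helper names, not part of any library, and they accept any iterable of strings.

```python
import json

def save_strings(found, out_path):
    """Write an iterable of strings to out_path as a JSON array."""
    with open(out_path, "w", encoding="utf-8") as out:
        # json.dump escapes "\n", "\t", etc., so each string stays intact.
        json.dump(list(found), out)

def load_strings(in_path):
    """Read back the list of strings written by save_strings."""
    with open(in_path, encoding="utf-8") as f:
        return json.load(f)
```

Any format that escapes or length-prefixes each string would serve the same purpose; JSON is used here only because it is in the standard library.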

I've tested this very briefly, and it seems to give the same output as the Unix strings command for the arbitrary binary file I chose. However, it's pretty naïve (for a start, it reads the whole file into memory at once, which might be expensive for large files), and is very unlikely to approach the performance of the Unix strings command.
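If memory use is a concern, a variant along the following lines reads the file in binary mode in fixed-size chunks instead of all at once. It is a minimal sketch, not a tuned implementation: `strings_chunked`, `min_len` and `chunk_size` are names chosen here for illustration, and the set of printable bytes is assumed to be the same `string.printable` used above (which includes newlines and other whitespace).

```python
import string

# Byte values treated as printable; assumed to mirror string.printable.
PRINTABLE = frozenset(string.printable.encode("ascii"))

def strings_chunked(filename, min_len=4, chunk_size=64 * 1024):
    """Yield runs of printable bytes at least min_len long, decoded as ASCII."""
    run = bytearray()
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means EOF
                break
            for b in chunk:  # iterating over bytes yields ints in Python 3
                if b in PRINTABLE:
                    run.append(b)
                    continue
                if len(run) >= min_len:
                    yield run.decode("ascii")
                run.clear()
    if len(run) >= min_len:  # catch a run that ends at EOF
        yield run.decode("ascii")
```

Because the current run is carried across chunk boundaries, a string that straddles two chunks is still returned whole.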

Sylvain Leroux · Answer

To quote man strings:

STRINGS(1)                   GNU Development Tools                  STRINGS(1)

NAME
       strings - print the strings of printable characters in files.

[...]
DESCRIPTION
       For each file given, GNU strings prints the printable character
       sequences that are at least 4 characters long (or the number given with
       the options below) and are followed by an unprintable character.  By
       default, it only prints the strings from the initialized and loaded
       sections of object files; for other types of files, it prints the
       strings from the whole file.

You could achieve a similar result by using a regex that matches at least 4 printable characters. Something like this:

>>> import re

>>> content = "hello,\x02World\x88!"
>>> re.findall("[^\x00-\x1F\x7F-\xFF]{4,}", content)
['hello,', 'World']

Please note that this solution requires the entire file content to be loaded into memory.
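One way to apply the same idea to a file on disk without reading it into a Python string first is to run a bytes regex over a memory-mapped file. The sketch below assumes a non-empty file and uses the byte range `\x20-\x7e`, which is the set left over after excluding the ranges in the pattern above; `strings_mmap` and `PRINTABLE_RUN` are names made up for this example.

```python
import mmap
import re

# Runs of 4 or more bytes in the printable ASCII range (space through tilde).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

def strings_mmap(filename, pattern=PRINTABLE_RUN):
    """Yield each run matched by pattern in filename, decoded as ASCII."""
    with open(filename, "rb") as f:
        # Assumption: the file is non-empty; mmap raises ValueError otherwise.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The re module accepts bytes-like objects such as mmap.
            for match in pattern.finditer(mm):
                yield match.group().decode("ascii")
```

Note that this character class excludes newlines and tabs, so a string containing them is split into separate matches; widen the class if those characters should count as printable.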
