
How can I fix "TypeError: cannot serialize '_io.BufferedReader' object" error when trying to multiprocess

I'm trying to switch the threading in my code to multiprocessing to measure its performance and, hopefully, achieve better brute-forcing potential, since my program is meant to brute-force password-protected .zip files. But whenever I try to run the program I get this:

BruteZIP2.py -z "Generic ZIP.zip" -f  Worm.txt
Traceback (most recent call last):
  File "C:\Users\User\Documents\Jetbrains\PyCharm\BruteZIP\BruteZIP2.py", line 40, in <module>
    main(args.zip, args.file)
  File "C:\Users\User\Documents\Jetbrains\PyCharm\BruteZIP\BruteZIP2.py", line 34, in main
    p.start()
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object

I did find threads with the same issue, but both were unanswered/unsolved. I also tried inserting a Pool above p.start(), as I believe this is caused by the fact that I am on a Windows-based machine, but it didn't help. My code is as follows:

  import argparse
  from multiprocessing import Process
  import zipfile

  parser = argparse.ArgumentParser(description="Unzips a password protected .zip by performing a brute-force attack using either a word list, password list or a dictionary.", usage="BruteZIP.py -z zip.zip -f file.txt")
  # Creates -z arg
  parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")
  # Creates -f arg
  parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of the word list/password list/dictionary.")
  args = parser.parse_args()


  def extract_zip(zip_file, password):
      try:
          zip_file.extractall(pwd=password)
          print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
      except Exception:
          # A wrong password raises an exception; report it and move on to the next candidate.
          print(f"Incorrect password: {password.decode('utf-8')}")


  def main(zip, file):
      if zip is None or file is None:
          # If the args are not used, it displays how to use them to the user.
          print(parser.usage)
          exit(0)
      zip_file = zipfile.ZipFile(zip)
      # Opens the word list/password list/dictionary in "read binary" mode.
      txt_file = open(file, "rb")
      for line in txt_file:
          password = line.strip()
          p = Process(target=extract_zip, args=(zip_file, password))
          p.start()
          p.join()


  if __name__ == '__main__':
      # BruteZIP.py -z zip.zip -f file.txt.
      main(args.zip, args.file)

As I said before, I believe this is happening mainly because I am on a Windows-based machine. I shared my code with a few others on Linux-based machines, and they had no problem running it.

My main goal here is to get 8 processes/pools running to maximize the number of attempts compared to threading, but since I cannot find a fix for the TypeError: cannot serialize '_io.BufferedReader' object message, I am unsure how to proceed. Any assistance would be appreciated.

Asked Feb 03 '19 by Arszilla

1 Answer

File handles don't serialize well. But you can send the name of the zip file instead of the ZipFile handle (a string pickles fine between processes). Also avoid zip as a variable name, since it shadows the built-in; I've used zip_filename instead:

p = Process(target=extract_zip, args=(zip_filename, password))

then:

def extract_zip(zip_filename, password):
    try:
        zip_file = zipfile.ZipFile(zip_filename)
        zip_file.extractall(pwd=password)

The other problem is that your code won't run in parallel because of this:

      p.start()
      p.join()

p.join() waits for the process to finish before the loop continues, so the passwords are still tried one at a time. You have to store the Process objects and join them all at the end.
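A minimal sketch of that start-all-then-join-all pattern (the worker function here is a stand-in, not the actual extract_zip):

```python
import multiprocessing


def work(n):
    # Stand-in for extract_zip: any CPU-bound task would go here.
    return n * n


if __name__ == "__main__":
    processes = []
    for n in range(8):
        p = multiprocessing.Process(target=work, args=(n,))
        p.start()                # start every process first...
        processes.append(p)      # ...remembering each one
    for p in processes:
        p.join()                 # ...then wait for all of them at the end
    print("all workers finished")
```

This way all eight processes run concurrently instead of one after another.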

This may cause other problems: spawning one process per password can overwhelm your machine and stops helping beyond some point. Consider a multiprocessing.Pool instead, to limit the number of workers.

Trivial example is:

with multiprocessing.Pool(5) as p:
    print(p.map(f, [1, 2, 3, 4, 5, 6, 7]))  # f: any one-argument function

Adapted to your example:

with multiprocessing.Pool(5) as p:
    p.starmap(extract_zip, [(zip_filename, line.strip()) for line in txt_file])

(starmap unpacks each tuple into two separate arguments to match your extract_zip signature, as explained in Python multiprocessing pool.map for multiple arguments.)
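Putting the pieces together, here is a sketch of the whole program under these changes; the worker count of 8 matches your stated goal, and the command-line wiring is left minimal rather than repeating the argparse setup from the question:

```python
import multiprocessing
import sys
import zipfile


def extract_zip(zip_filename, password):
    try:
        # Each worker opens its own handle from the filename,
        # so nothing unpicklable ever crosses a process boundary.
        with zipfile.ZipFile(zip_filename) as zf:
            zf.extractall(pwd=password)
        print(f"[+] Password for the .zip: {password.decode('utf-8')}")
    except Exception:
        pass  # wrong password; move on to the next candidate


def main(zip_filename, wordlist_filename):
    # Read the candidate passwords in binary mode, as in the question.
    with open(wordlist_filename, "rb") as txt_file:
        candidates = [(zip_filename, line.strip()) for line in txt_file]
    # A pool of 8 workers tries passwords concurrently without
    # spawning one process per candidate.
    with multiprocessing.Pool(8) as pool:
        pool.starmap(extract_zip, candidates)


if __name__ == "__main__":
    # e.g. BruteZIP2.py "Generic ZIP.zip" Worm.txt
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
```

Note that on a correct guess the other workers keep running through their remaining candidates; stopping the pool early on success is a further refinement.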

Answered Nov 14 '22 by Jean-François Fabre