
TypeError: expected str, bytes or os.PathLike object, not FileStorage while reading PDF files using Flask

I am trying to read a PDF file in a Flask application, using pdfminer to extract the PDF text.

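import io

from flask import Flask, request
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage

app = Flask(__name__)

@app.route('/getfile', methods=['POST'])
def getfile():
    request_data = request.files['file']  # a FileStorage object, not a path
    rsrcmgr = PDFResourceManager()
    retstr = io.StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    fp = open(request_data, 'rb')  # raises TypeError: open() needs a path
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0
    caching = True
    pagenos = set()

    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages,
                                  password=password,
                                  caching=caching,
                                  check_extractable=True):
        interpreter.process_page(page)

    text = retstr.getvalue()

    fp.close()
    device.close()
    retstr.close()
    return text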

Unfortunately, it throws this error:

 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [11/Apr/2018 16:07:53] "GET /hello HTTP/1.1" 200 -
[2018-04-11 16:07:55,720] ERROR in app: Exception on /getfile [POST]
Traceback (most recent call last):
  File "c:\users\rb287jd\appdata\local\programs\python\python36\lib\site-packages\flask\app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "c:\users\rb287jd\appdata\local\programs\python\python36\lib\site-packages\flask\app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "c:\users\rb287jd\appdata\local\programs\python\python36\lib\site-packages\flask\app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "c:\users\rb287jd\appdata\local\programs\python\python36\lib\site-packages\flask\_compat.py", line 33, in reraise
    raise value
  File "c:\users\rb287jd\appdata\local\programs\python\python36\lib\site-packages\flask\app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "c:\users\rb287jd\appdata\local\programs\python\python36\lib\site-packages\flask\app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:/Users/RB287JD/Documents/Programs/flask_1.py", line 27, in getfile
    fp = open(request_data, 'rb').decode("utf-8")
TypeError: expected str, bytes or os.PathLike object, not FileStorage
127.0.0.1 - - [11/Apr/2018 16:07:55] "POST /getfile HTTP/1.1" 500 -

How do I read an uploaded PDF file inside Flask? P.S. I don't want to hard-code a file location anywhere in the code; I want to process the upload on the fly.

asked Apr 11 '18 at 10:04 by dhinar1991

1 Answer

request.files['file'] is an instance of Werkzeug's FileStorage class (see also http://flask.pocoo.org/docs/0.12/api/#flask.Request.files), not a file path, so fp = open(request_data, 'rb') fails. A FileStorage object has a stream attribute that usually points to an open temporary file holding the uploaded data, and you can pass that stream to PDFPage.get_pages() in place of a file returned by open().
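To see the difference concretely, here is a minimal sketch that builds a Werkzeug FileStorage by hand, the same type Flask puts in request.files. The filename report.pdf and the placeholder bytes are assumptions for illustration only (they are not a real PDF). It shows that open() rejects the object with the same TypeError as in the traceback, while its stream attribute can be read and rewound like an ordinary binary file.

```python
import io

from werkzeug.datastructures import FileStorage

# Simulate what Flask puts in request.files['file']: a FileStorage wrapping
# an open binary stream. The bytes are a placeholder, not a complete PDF.
upload = FileStorage(
    stream=io.BytesIO(b"%PDF-1.4\n% placeholder body\n"),
    filename="report.pdf",  # hypothetical name
    content_type="application/pdf",
)

# open() only accepts a path (str, bytes or os.PathLike) or a file
# descriptor, so passing the FileStorage raises TypeError.
try:
    open(upload, "rb")
except TypeError as exc:
    open_error = str(exc)
else:
    open_error = None

# .stream is already an open binary file-like object: it supports read()
# and seek(), which is what a PDF parser needs.
header = upload.stream.read(5)
upload.stream.seek(0)  # rewind so the next reader starts at byte 0

print(open_error)
print(header)
```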

So, something like:

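import io

from flask import Flask, request
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFPageInterpreter, PDFResourceManager
from pdfminer.pdfpage import PDFPage

app = Flask(__name__)

@app.route('/getfile', methods=['POST'])
def getfile():
    file = request.files['file']  # FileStorage wrapping the uploaded data
    rsrcmgr = PDFResourceManager()
    retstr = io.StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0
    caching = True
    pagenos = set()

    # Parse pages straight from the upload's open binary stream; no path needed.
    for page in PDFPage.get_pages(file.stream, pagenos, maxpages=maxpages,
                                  password=password,
                                  caching=caching,
                                  check_extractable=True):
        interpreter.process_page(page)

    text = retstr.getvalue()

    device.close()
    retstr.close()
    return text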
answered Sep 28 '22 at 19:09 by Marco Pantaleoni
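The question also asks how to handle the file on the fly, without any file location in the code. One way to exercise such an endpoint is Flask's test client, which can post an in-memory io.BytesIO as multipart form data. The sketch below assumes Flask 1.0 or later and uses a hypothetical /pdf-header route that only checks the %PDF- signature and size, so it runs without pdfminer; the placeholder bytes are not a renderable PDF.

```python
import io

from flask import Flask, abort, jsonify, request

app = Flask(__name__)


@app.route("/pdf-header", methods=["POST"])
def pdf_header():
    # Hypothetical endpoint: it inspects the upload directly from
    # request.files['file'].stream; nothing is written to disk.
    upload = request.files.get("file")
    if upload is None or upload.filename == "":
        abort(400, "no file part named 'file'")
    data = upload.stream.read()
    if not data.startswith(b"%PDF-"):
        abort(400, "upload does not look like a PDF")
    return jsonify(filename=upload.filename, size=len(data))


if __name__ == "__main__":
    # The test client sends an in-memory BytesIO as multipart/form-data,
    # so no file path is involved at any point.
    client = app.test_client()
    pdf_bytes = b"%PDF-1.4\n% placeholder body, not a renderable PDF\n"
    resp = client.post(
        "/pdf-header",
        data={"file": (io.BytesIO(pdf_bytes), "sample.pdf")},
        content_type="multipart/form-data",
    )
    print(resp.status_code, resp.get_json())
```

Posting real PDF bytes to the getfile route works the same way; only the URL and the contents of the BytesIO change.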