
Process request thread error with Flask Application?

This might be a long shot, but here's the error that I'm getting:

  File "/home/MY NAME/anaconda/lib/python2.7/SocketServer.py", line 596, in process_request_thread
    self.finish_request(request, client_address)
  File "/home/MY NAME/anaconda/lib/python2.7/SocketServer.py", line 331, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/MY NAME/anaconda/lib/python2.7/SocketServer.py", line 654, in __init__
    self.finish()
  File "/home/MY NAME/anaconda/lib/python2.7/SocketServer.py", line 713, in finish
    self.wfile.close()
  File "/home/MY NAME/anaconda/lib/python2.7/socket.py", line 283, in close
    self.flush()
  File "/home/MY NAME/anaconda/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe

I have built a Flask application that takes addresses as input, performs some string formatting and manipulation, then sends them to Bing Maps to be geocoded (through the external geopy module).

I'm using this application to clean very large data sets. It works for inputs of up to roughly 1,500 addresses (entered one per line): each address is processed, sent to Bing Maps to be geocoded, and returned. After around 1,500 addresses, the application becomes unresponsive. If this happens while I'm at work, my proxy reports a TCP error; on a non-work computer the page simply doesn't load. If I restart the application, it works perfectly again. Because of this, I'm forced to run my program in batches of about 1,000 addresses (to be safe, since I don't yet know the exact number at which it crashes).
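
The manual batching workaround described above could be automated. This is only a minimal sketch, assuming the input is the same newline-separated text; `iter_batches` and the default batch size of 1,000 are illustrative and not part of the original application:

```python
def iter_batches(text, batch_size=1000):
    """Yield lists of at most batch_size non-empty address lines."""
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    for start in range(0, len(lines), batch_size):
        yield lines[start:start + batch_size]


if __name__ == '__main__':
    # 2,500 made-up addresses, only to show how the input is split.
    sample = '\n'.join('%d broadway' % n for n in range(1, 2501))
    for batch in iter_batches(sample):
        # Each batch would be submitted to /clean as its own request.
        print(len(batch))
```

Each batch can then be joined with '\n' and posted separately, which keeps every individual request short.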

Does anyone have any idea what might be causing it?

My first thought was that I was hitting my daily Bing API key limit (30,000 requests), but that can't be right, as I rarely use more than 15,000 requests per day.

My second thought was that it's because I'm still using the standard Flask server to run my application. Would switching to gunicorn or uWSGI solve this?

My third thought was that it was getting overloaded by the number of requests. I tried sleeping the program for 15 seconds or so after the first 1,000 addresses, but that didn't solve anything.

If anyone needs further clarification please let me know.

Here is my code for the backend of the Flask Application. I'm getting the input from this function:

@app.route("/clean", methods=['POST'])
def dothing():
    addresses = request.form['addresses']
    return cleanAddress(addresses)

Here is the cleanAddress function. It's a bit cluttered right now, with all of the if statements that check for specific typos in the address, but I plan on moving much of this code into functions in another file and passing the address through those functions to clean it up a bit.

def cleanAddress(addresses):

    counter = 0

    # nested helper function to fix addresses such as '30 w 60th'
    def check_st(address):
        if 'broadway' in address:
            return address
        has_th_st_nd_rd = re.compile(r'(?P<number>[\d]{1,4}(th|st|nd|rd)\s)(?P<following>.*)')
        has_number = has_th_st_nd_rd.search(address)
        if has_number is not None:
            if re.match(r'(street|st|floor)', has_number.group('following')):   
                return address
            else:
                new_address = re.sub('(?P<number>[\d]{1,4}(st|nd|rd|th)\s)', r'\g<number>street ', address, 1)
                return new_address
        else:
            return address

    addresses = addresses.split('\n')
    cleaned = []
    success = 0
    fail = 0
    cleaned.append('<body bgcolor="#FACC2E"><center><img src="http://goglobal.dhl-usa.com/common/img/dhl-express-logo.png" alt="Smiley face" height="100" width="350"><br><p>')

    cleaned.append('<br><h3>Note: Everything before the first comma is the Old Address. Everything after the first comma is the New Address</h3>')
    cleaned.append('<p><h3>To format the output in Excel, split the columns using "," as the delimiter. </h3></p>')
    cleaned.append('<p><h2><font color="red">Old Address </font> <font color="black">New Address </font></h2></p>')

    for address in addresses:
        dirty = address.strip()
        if ',' in address:
            dirty = dirty.replace(',', '')
        cleaned.append('<font color="red">' + dirty + ', ' + '</font>')

        address = address.lower()
        address = re.sub('[^A-Za-z0-9#]+', ' ', address).lstrip()

        pattern = r"\d+.* +(\d+ .*(" + "|".join(patterns) + "))"
        address = re.sub(pattern, "\\1", address)

        address = check_st(address) 


        if 'one ' in address:
            address = address.replace('one', '1')
        if 'two' in address:
            address = address.replace('two', '2')
        if 'three' in address:
            address = address.replace('three', '3')
        if 'four' in address:
            address = address.replace('four', '4')
        if 'five' in address:
            address = address.replace('five', '5')
        if 'eight' in address:
            address = address.replace('eight', '8')
        if 'nine' in address:
            address = address.replace('nine', '9')
        if 'fith' in address:
            address = address.replace('fith', 'fifth')
        if 'aveneu' in address:
            address = address.replace('aveneu', 'avenue')
        if 'united states of america' in address:
            address = address.replace('united states of america', '')
        if 'ave americas' in address:
            address = address.replace('ave americas', 'avenue of the americas')
        if 'americas avenue' in address:
            address = address.replace('americas avenue', 'avenue of the americas')
        if 'avenue of americas' in address:
            address = address.replace('avenue of americas', 'avenue of the americas')
        if 'avenue of america ' in address:
            address = address.replace('avenue of america ', 'avenue of the americas ')
        if 'ave of the americ' in address:
            address = address.replace('ave of the americ', 'avenue of the americas')
        if 'avenue america' in address:
            address = address.replace('avenue america', 'avenue of the americas')
        if 'americaz' in address:
            address = address.replace('americaz', 'americas')
        if 'ave of america' in address:
            address = address.replace('ave of america', 'avenue of the americas')
        if 'amrica' in address:
            address = address.replace('amrica', 'americas')
        if 'americans' in address:
            address = address.replace('americans', 'americas')
        if 'walk street' in address:
            address = address.replace('walk street', 'wall street')
        if 'northend' in address:
            address = address.replace('northend', 'north end')
        if 'inth' in address:
            address = address.replace('inth', 'ninth')
        if 'aprk' in address:
            address = address.replace('aprk', 'park')
        if 'eleven' in address:
            address = address.replace('eleven', '11')
        if ' av ' in address:
            address = address.replace(' av ', ' avenue ')
        if 'avnue' in address:
            address = address.replace('avnue', 'avenue')
        if 'ofthe americas' in address:
            address = address.replace('ofthe americas', 'of the americas')
        if 'aj the' in address:
            address = address.replace('aj the', 'of the')
        if 'fifht' in address:
            address = address.replace('fifht', 'fifth')
        if 'w46' in address:
            address = address.replace('w46', 'w 46')
        if 'w42' in address:
            address = address.replace('w42', 'w 42')
        if '95st' in address:
            address = address.replace('95st', '95th st')
        if 'e61 st' in address:
            address = address.replace('e61 st', 'e 61st')
        if 'driver information' in address:
            address = address.replace('driver information', '')
        if 'e87' in address:
            address = address.replace('e87', 'e 87')
        if 'thrd avenus' in address:
            address = address.replace('thrd avenus', 'third avenue')
        if '3r ' in address:
            address = address.replace('3r ', '3rd ')
        if 'st ates' in address:
            address = address.replace('st ates', '')
        if 'east52nd' in address:
            address = address.replace('east52nd', 'east 52nd')
        if 'authority to leave' in address:
            address = address.replace('authority to leave', '')
        if 'sreet' in address:
            address = address.replace('sreet', 'street')
        if 'w47' in address:
            address = address.replace('w47', 'w 47')
        if 'signature required' in address:
            address = address.replace('signature required', '')
        if 'direct' in address:
            address = address.replace('direct', '')
        if 'streetapr' in address:
            address = address.replace('streetapr', 'street')
        if 'steet' in address:
            address = address.replace('steet', 'street')
        if 'w39' in address:
            address = address.replace('w39', 'w 39')
        if 'ave of new york' in address:
            address = address.replace('ave of new york', 'avenue of the americas')
        if 'avenue of new york' in address:
            address = address.replace('avenue of new york', 'avenue of the americas')
        if 'brodway' in address:
            address = address.replace('brodway', 'broadway')
        if 'w 31 ' in address:
            address = address.replace('w 31 ', 'w 31st ')
        if 'w 34 ' in address:
            address = address.replace('w 34 ', 'w 34th ')
        if 'w38' in address:
            address = address.replace('w38', 'w 38')
        if 'broadeay' in address:
            address = address.replace('broadeay', 'broadway')
        if 'w37' in address:
            address = address.replace('w37', 'w 37')
        if '35street' in address:
            address = address.replace('35street', '35th street')
        if 'eighth avenue' in address:
            address = address.replace('eighth avenue', '8th avenue')
        if 'west 33' in address:
            address = address.replace('west 33', 'west 33rd')
        if '34t ' in address:
            address = address.replace('34t ', '34th ')
        if 'street ave' in address:
            address = address.replace('street ave', 'ave')
        if 'avenue of york' in address:
            address = address.replace('avenue of york', 'avenue of the americas')
        if 'avenue aj new york' in address:
            address = address.replace('avenue aj new york', 'avenue of the americas')
        if 'avenue ofthe new york' in address:
            address = address.replace('avenue ofthe new york', 'avenue of the americas')
        if 'e4' in address:
            address = address.replace('e4', 'e 4')
        if 'avenue of nueva york' in address:
            address = address.replace('avenue of nueva york', 'avenue of the americas')
        if 'avenue of new york' in address:
            address = address.replace('avenue of new york', 'avenue of the americas')
        if 'west end new york' in address:
            address = address.replace('west end new york', 'west end avenue')

        #print address    
        address = address.split(' ')
        for pattern in patterns:
            try:
                if address[0].isdigit():
                    continue
                else:
                    location = address.index(pattern) + 1
                    number_location = address[location]
                    #print address[location]
                    #if 'th' in address[location + 1] or 'floor' in address[location + 1] or '#' in address[location]:
                    #    continue
            except (ValueError, IndexError):
                continue
            if number_location.isdigit() and len(number_location) <= 4:
                address = [number_location] + address[:location] + address[location+1:]
                break
        address = ' '.join(address)

        if '#' in address:
            address = address.replace('#', '')


        #print (address)


        i = 0
        for char in address:
            if char.isdigit():
                address = address[i:]
                break
            i += 1


        #print (address)

        if 'plz' in address:
            address = address.replace('plz', 'plaza ', 1)
        if 'hstreet' in address:
            address = address.replace('hstreet', 'h street')
        if 'dstreet' in address:
            address = address.replace('dstreet', 'd street')
        if 'hst' in address:
            address = address.replace('hst', 'h st')
        if 'dst' in address:
            address = address.replace('dst', 'd st')
        if 'have' in address:
            address = address.replace('have', 'h ave')
        if 'dave' in address:
            address = address.replace('dave', 'd ave')
        if 'havenue' in address:
            address = address.replace('havenue', 'h avenue')
        if 'davenue' in address:
            address = address.replace('davenue', 'd avenue')



        #print address

        regex = r'(.*)(' + '|'.join(patterns) + r')(.*)'
        address = re.sub(regex, r'\1\2', address).lstrip() + " nyc"

        print (address)

        if 'americasas st' in address:
            address = address.replace('americasas st', 'americas')

        try:

            clean = geolocator.geocode(address)
            x = clean.address
            address, city, zipcode, country = x.split(",")
            address = address.lower()
            if 'first' in address:
                address = address.replace('first', '1st')
            if 'second' in address:
                address = address.replace('second', '2nd')
            if 'third' in address:
                address = address.replace('third', '3rd')
            if 'fourth' in address:
                address = address.replace('fourth', '4th')
            if 'fifth' in address:
                address = address.replace('fifth', '5th')
            if ' sixth a' in address:
                address = address.replace('ave', '')
                address = address.replace('avenue', '')
                address = address.replace(' sixth', ' avenue of the americas')
            if ' 6th a' in address:
                address = address.replace('ave', '')
                address = address.replace('avenue', '')
                address = address.replace(' 6th', ' avenue of the americas')
            if 'seventh' in address:
                address = address.replace('seventh', '7th')
            if 'fashion' in address:
                address = address.replace('fashion', '7th')
            if 'eighth' in address:
                address = address.replace('eighth', '8th')
            if 'ninth' in address:
                address = address.replace('ninth', '9th')
            if 'tenth' in address:
                address = address.replace('tenth', '10th')
            if 'eleventh' in address:
                address = address.replace('eleventh', '11th')


            zipcode = zipcode[3:]
            to_write = str(address) + ", " + str(zipcode.lstrip()) + ", " + str(clean.latitude) + ", " + str(clean.longitude)
            to_find = str(address)

            #print to_write

            # returns 'can not be cleaned' if street address has no numbers
            if any(i.isdigit() for i in str(address)):
                with open('/home/MY NAME/Address_Database.txt', 'a+') as database:
                    if to_find not in database.read():
                        database.write(dirty + '|' + to_write + '\n')
                if 'ncy rd' in address:
                    cleaned.append('<font color="red"> Can not be cleaned </font> <br>')
                    fail += 1
                elif 'nye rd' in address:
                    cleaned.append('<font color="red"> Can not be cleaned </font> <br>')
                    fail += 1
                elif 'nye c' in address:
                    cleaned.append('<font color="red"> Can not be cleaned </font> <br>')
                    fail += 1                    
                else:
                    cleaned.append(to_write + '<br>')
                    success += 1
            else:
                cleaned.append('<font color="red"> Can not be cleaned </font> <br>')
                fail += 1
        except AttributeError:
            cleaned.append('<font color="red"> Can not be cleaned </font> <br>')
            fail += 1
        except ValueError:
            cleaned.append('<font color="red"> Can not be cleaned </font> <br>')
            fail += 1
        except GeocoderTimedOut as e:
            cleaned.append('<font color="red"> Can not be cleaned </font> <br>')
            fail += 1

    total = success + fail
    percent = float(success) / float(total) * 100
    percent = round(percent, 2)
    print percent
    cleaned.append('<br>Accuracy: ' + str(percent) + ' %')
    cleaned.append('</p></center></body>')

    return "\n".join(cleaned)
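
The long chain of `if ... in address` checks above could be collapsed into a lookup table. This is only a sketch: `REPLACEMENTS` and `apply_replacements` are hypothetical names, and just a handful of the pairs from the function are shown:

```python
# Hypothetical table of (typo, correction) pairs, applied in order.
# Only a few of the entries from cleanAddress are reproduced here.
REPLACEMENTS = [
    ('fith', 'fifth'),
    ('aveneu', 'avenue'),
    ('brodway', 'broadway'),
    ('sreet', 'street'),
    ('walk street', 'wall street'),
]


def apply_replacements(address, replacements=REPLACEMENTS):
    """Apply each (old, new) pair in order, like the chained if blocks."""
    for old, new in replacements:
        # str.replace leaves the string unchanged when old is absent,
        # so the separate 'in' check is not needed.
        address = address.replace(old, new)
    return address


if __name__ == '__main__':
    print(apply_replacements('1 walk sreet'))
```

Order still matters, exactly as it does with the if statements: 'sreet' has to be corrected before 'walk street' can match.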

UPDATE: I have switched to running the application with gunicorn, and this solves the issue when I'm accessing the application from my home network. However, I am still receiving the TCP error from my work proxy. I am not getting any error message in my console; the browser just displays the TCP error. I can tell that the tool is still working in the background, because a print statement in the loop shows that each address is still being geocoded. Could it be that my work network doesn't like the page loading for a long period of time and so displays the proxy error page?
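
If the proxy is giving up because no bytes arrive while the whole batch is being geocoded, one option is to emit each result as soon as it is ready. This is a rough sketch under assumptions: `geocode` is a stand-in callable that returns a cleaned string or None (the real `geolocator.geocode` returns a location object), and `stream_cleaned` and `fake_geocode` are hypothetical names:

```python
def stream_cleaned(addresses, geocode):
    """Yield one HTML fragment per address instead of one big string."""
    yield '<p>'
    for raw in addresses.split('\n'):
        dirty = raw.strip().replace(',', '')
        if not dirty:
            continue
        try:
            result = geocode(dirty)
        except Exception:
            # Assumption: every geocoder failure is reported the same way.
            result = None
        if result is None:
            yield dirty + ', Can not be cleaned<br>\n'
        else:
            yield dirty + ', ' + result + '<br>\n'
    yield '</p>'


def fake_geocode(address):
    """Stand-in geocoder for the demo; fails when there are no digits."""
    if any(char.isdigit() for char in address):
        return address.upper()
    return None


if __name__ == '__main__':
    for chunk in stream_cleaned('30 w 60th\nno number here', fake_geocode):
        print(chunk)
```

In Flask, a generator like this can be passed to `Response(...)` so that chunks are sent as they are produced. Whether the proxy forwards them promptly depends on its own buffering, so this is something to test rather than a guaranteed fix.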

asked Aug 11 '16 14:08 by Harrison



2 Answers

It sounds like the process is running out of file handles (the default limit is 1024 for regular users). You can check the limit by running grep 'open' /proc/<webapp pid>/limits and the number of currently open file handles with ls -1 /proc/<webapp pid>/fd | wc -l.

I think your code is not sending a correct response, which causes the connections to remain open until the process eventually runs out of open file handles (an open socket is a file on POSIX systems).

You can confirm what state the connections are in with netstat -an | grep <webapp port> when you see the issue. It should show a list of 1k+ IPs and ports and their state.

I would guess they are in the TIME_WAIT state, which would indicate that the client is not closing the connection correctly and that it is left to the kernel to clean them up later.
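
The same descriptor checks can also be made from inside the Python process. This is a minimal sketch, assuming Linux with /proc mounted; `open_fd_count` and `fd_soft_limit` are illustrative helper names:

```python
import os
import resource


def open_fd_count(pid='self'):
    """Count entries in /proc/<pid>/fd; return None if it is unavailable.

    When pid is 'self', the total includes the descriptor that is
    briefly opened to list the directory.
    """
    try:
        return len(os.listdir('/proc/%s/fd' % pid))
    except OSError:
        return None


def fd_soft_limit():
    """Return the soft limit on open files for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == '__main__':
    print('open: %s, soft limit: %s' % (open_fd_count(), fd_soft_limit()))
```

Logging these two numbers every few hundred addresses would show whether the open count climbs toward the limit before the application stops responding.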

Try:

from flask import make_response

@app.route("/clean", methods=['POST'])
def dothing():
    addresses = request.form['addresses']
    resp = make_response(cleanAddress(addresses), 200)
    return resp
answered Oct 28 '22 15:10 by danny


I had a similar problem, and running the application behind a proper web server solved it. I used uWSGI with nginx.

answered Oct 28 '22 13:10 by slysid