
Cloud Vision API Client threw an OS Error "too many open files"

I get a "Too many open files" error when I run label detection with the Cloud Vision API client for Python.
When I asked about this problem on GitHub before posting here, the maintainer advised me that it is a general Python issue rather than a problem with the API.
Even after this advice, I still do not understand why Python raised "Too many open files".
My logging shows that urllib3 raised the errors, although I do not import that package explicitly.
What am I doing wrong? Please help me.
My environment is:

  • Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-112-generic x86_64)
  • Python 3.5.2
  • google-cloud-vision (0.31.1)

The error logs:

[2018-05-25 20:18:46,573] {label_detection.py:60} DEBUG - success open decile_data/image/src/00000814.jpg
[2018-05-25 20:18:46,573] {label_detection.py:62} DEBUG - success convert image to types.Image
[2018-05-25 20:18:46,657] {requests.py:117} DEBUG - Making request: POST https://accounts.google.com/o/oauth2/token
[2018-05-25 20:18:46,657] {connectionpool.py:824} DEBUG - Starting new HTTPS connection (1): accounts.google.com
[2018-05-25 20:18:46,775] {connectionpool.py:396} DEBUG - https://accounts.google.com:443 "POST /o/oauth2/token HTTP/1.1" 200 None
[2018-05-25 20:18:47,803] {label_detection.py:60} DEBUG - success open decile_data/image/src/00000815.jpg
[2018-05-25 20:18:47,803] {label_detection.py:62} DEBUG - success convert image to types.Image
[2018-05-25 20:18:47,896] {requests.py:117} DEBUG - Making request: POST https://accounts.google.com/o/oauth2/token
[2018-05-25 20:18:47,896] {connectionpool.py:824} DEBUG - Starting new HTTPS connection (1): accounts.google.com
[2018-05-25 20:18:47,902] {_plugin_wrapping.py:81} ERROR - AuthMetadataPluginCallback "<google.auth.transport.grpc.AuthMetadataPlugin object at 0x7fcd94eb7dd8>" raised exception!
Traceback (most recent call last):
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/util/ssl_.py", line 313, in ssl_wrap_socket
OSError: [Errno 24] Too many open files

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/connectionpool.py", line 601, in urlopen
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/connectionpool.py", line 346, in _make_request
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/connection.py", line 326, in connect
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/util/ssl_.py", line 315, in ssl_wrap_socket
urllib3.exceptions.SSLError: [Errno 24] Too many open files

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/requests/adapters.py", line 440, in send
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/connectionpool.py", line 639, in urlopen
  File "/home/ishiyama/tensorflow/lib/python3.5/site-packages/urllib3/util/retry.py", line 388, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='accounts.google.com', port=443): Max retries exceeded with url: /o/oauth2/token (Caused by SSLError(OSError(24, 'Too many open files'),))

The script that produced the errors above is as follows:

# -*- coding: utf-8 -*-
""" Detecting labels of images using Google Cloud Vision. """

import argparse
import csv
from datetime import datetime
import os
import logging
from pathlib import Path
import sys
from google.cloud import vision
from google.cloud.vision import types


logger = logging.getLogger(__name__)


def get_commandline_args():
    parser = argparse.ArgumentParser(
        description='Detecting labels of images using Google Cloud Vision.')

    parser.add_argument('--image-dir',
                        type=str,
                        required=True,
                        help='Directory in which images are saved.')
    parser.add_argument('--output-path',
                        type=str,
                        required=True,
                        help='Path of output file. This is saved as CSV.')
    parser.add_argument('--max-results',
                        type=int,
                        required=False,
                        default=5,
                        help=('Maximum number of resulting labels.'
                              ' Default is 5.'))
    # store_true is used because type=bool would treat any non-empty
    # string, even 'False', as True.
    parser.add_argument('--debug',
                        action='store_true',
                        help=('Run in debug mode.'
                              ' If set, this script will run on 3 files only.'
                              ' Default is False.'))
    return parser.parse_args()


def load_image(path):
    """ Load an image in a form compatible with the Google Cloud Vision client API.

    Args:
        path (str): a path of an image.

    Returns:
        img : an object which is google.cloud.vision.types.Image.

    Raises:
        IOError: if 'open' fails to load the image.
    """
    with open(path, 'rb') as f:
        content = f.read()
    logger.debug('success open {}'.format(path))
    img = types.Image(content=content)
    logger.debug('success convert image to types.Image')

    return img


def detect_labels_of_image(path, max_results):
    _path = Path(path)
    client = vision.ImageAnnotatorClient()
    image = load_image(path=str(_path))
    execution_time = datetime.now()
    response = client.label_detection(image=image, max_results=max_results)
    labels = response.label_annotations
    for label in labels:
        record = (str(_path), _path.name, label.description,
                  label.score, execution_time.strftime('%Y-%m-%d %H:%M:%S'))
        yield record


def main():
    args = get_commandline_args()

    file_handler = logging.FileHandler(filename='label_detection.log')
    logging.basicConfig(
        level=logging.DEBUG,
        format='[%(asctime)s] {%(filename)s:%(lineno)s} %(levelname)s - %(message)s',
        handlers=[file_handler]
    )

    image_dir = args.image_dir

    with open(args.output_path, 'w') as fout:

        writer = csv.writer(fout, lineterminator='\n')
        header = ['path', 'filename', 'label', 'score', 'executed_at']
        writer.writerow(header)

        image_file_lists = os.listdir(image_dir)
        image_file_lists.sort()
        if args.debug:
            image_file_lists = image_file_lists[:3]

        for filename in image_file_lists:
            path = os.path.join(image_dir, filename)
            try:
                # The generator runs lazily, so iterate inside the try block
                # to catch errors raised while the request is being made.
                for record in detect_labels_of_image(path, args.max_results):
                    writer.writerow(record)
            except Exception as e:
                logger.warning(e)
                logger.warning('skipped processing {} due to above exception.'.format(path))


if __name__ == '__main__':
    main()

katsuya asked May 26 '18 17:05


2 Answers

You are not hitting Google's limits. My guess is that you are hitting the maximum number of open files allowed per process. You can check this while the script is running: use a tool such as 'lsof' to list all the files the process has open. I expect you will see many open IPv4 and IPv6 connections. If so, read on.
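As a rough in-process alternative to 'lsof', a sketch like the one below could log how many descriptors the script holds next to its soft limit. It assumes Linux, where /proc/self/fd is available; the helper names open_fd_count and fd_usage are hypothetical and not part of any library. Listing the directory briefly uses one descriptor itself, so the absolute count may be off by one.

```python
import os
import resource


def open_fd_count():
    """Count the file descriptors currently open in this process (Linux only)."""
    # Each entry in /proc/self/fd is one open descriptor: files, sockets, pipes.
    return len(os.listdir('/proc/self/fd'))


def fd_usage():
    """Return (open_count, soft_limit) for the current process."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fd_count(), soft_limit
```

Logging fd_usage() once per image inside the loop should show the count climbing steadily if connections are being left open.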

You create a client for each image, which means a new secure, authenticated connection is opened for each image.

Move the line 'client = vision.ImageAnnotatorClient()' out of that function and make the client global, so that one open connection is reused for every image. This should solve your problem.
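A minimal sketch of that change, under some assumptions: it passes the client in as an argument rather than using a module-level global, which has the same effect of creating it only once. The function names detect_labels and label_directory are hypothetical; the Vision calls are the ones already used in the question's script.

```python
import os


def detect_labels(client, image, max_results=5):
    """Yield (description, score) pairs for one image using a shared client."""
    response = client.label_detection(image=image, max_results=max_results)
    for label in response.label_annotations:
        yield label.description, label.score


def label_directory(image_dir, max_results=5):
    """Label every file in image_dir, reusing one client for all requests."""
    # Imported here so detect_labels can be exercised without the library installed.
    from google.cloud import vision
    from google.cloud.vision import types

    client = vision.ImageAnnotatorClient()  # created once, not once per image
    for filename in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, filename)
        with open(path, 'rb') as f:
            image = types.Image(content=f.read())
        for description, score in detect_labels(client, image, max_results):
            yield path, description, score
```

Because the client is built outside the per-image function, the number of open connections should stay flat no matter how many images are processed.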

srikar jagannadh answered Oct 20 '22 17:10


If you genuinely need that many files open at once, you can raise the maximum number of files each process may open.

You can do that with:

ulimit -n your_number

For example:

ulimit -n 5000
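
The same adjustment can probably be made from inside the script with the standard resource module, as in the sketch below. It assumes a Unix-like system (tested reasoning applies to Linux), and raise_open_file_limit is a hypothetical helper name. Note that this only raises the ceiling; if descriptors are leaking, the error will return once the new limit is reached.

```python
import resource


def raise_open_file_limit(target):
    """Raise the soft RLIMIT_NOFILE toward target; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may not exceed the hard limit, so cap the request.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft  # already at least as high as requested; never lower it
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling raise_open_file_limit(5000) at the start of main() would mirror the shell example, limited by whatever hard limit the system allows.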

Akshat Bajaj answered Oct 20 '22 18:10