 

psutil in Apache Spark

I'm using PySpark 1.5.2. After I call .collect(), I get this warning: UserWarning: Please install psutil to have better support with spilling

Why is this warning shown?

How can I install psutil?

wannik asked Dec 29 '15 02:12



2 Answers

pip install psutil

If you need to install it specifically for Python 2 or Python 3, use pip2 or pip3; psutil supports both major versions. The package is published on PyPI under the name psutil.
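Since pip and PySpark may point at different Python interpreters, it can help to confirm that psutil is importable from the interpreter you actually run. The snippet below is only a minimal sketch; the helper name psutil_available is made up for illustration and is not part of PySpark or psutil.

```python
import sys


def psutil_available():
    """Return True if psutil can be imported by this interpreter.

    Hypothetical helper for illustration; not part of PySpark or psutil.
    """
    try:
        import psutil  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


# Show which interpreter is running and whether it can see psutil.
print("Interpreter:", sys.executable)
print("psutil importable:", psutil_available())
```

If this reports False even after installing, running the install through the same interpreter (python -m pip install psutil) is one way to make sure the package lands in the right place.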

Cassidy Laidlaw answered Nov 12 '22 13:11



You can clone or download the psutil project from the following link: https://github.com/giampaolo/psutil.git

Then run setup.py to install psutil, for example with python setup.py install.

In 'spark/python/pyspark/shuffle.py' you can see the following code:

def get_used_memory():
    """ Return the used memory in MB """
    if platform.system() == 'Linux':
        for line in open('/proc/self/status'):
            if line.startswith('VmRSS:'):
                return int(line.split()[1]) >> 10

    else:
        warnings.warn("Please install psutil to have better "
                      "support with spilling")
        if platform.system() == "Darwin":
            import resource
            rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            return rss >> 20
        # TODO: support windows

    return 0

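So I guess the warning is raised when your OS is not Linux: on Linux the function reads the memory usage from /proc/self/status, while on other systems it warns and suggests psutil.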
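For comparison, here is a rough sketch of how the same measurement could be taken with psutil once it is installed. This is an illustration under assumptions, not a copy of the Spark source: the function name get_used_memory_mb is hypothetical, and it returns 0 when psutil is missing so the snippet runs either way.

```python
import os


def get_used_memory_mb():
    """Return this process's resident memory in MB, or 0 if psutil is missing.

    Hypothetical helper for illustration; not the function from shuffle.py.
    """
    try:
        import psutil
    except ImportError:
        # Without psutil there is no portable way to read the value here.
        return 0
    # memory_info().rss is the resident set size in bytes.
    rss_bytes = psutil.Process(os.getpid()).memory_info().rss
    return rss_bytes >> 20  # shift by 20 bits converts bytes to MB


print("Used memory (MB):", get_used_memory_mb())
```

Because psutil reports the resident set size on Linux, macOS and Windows alike, a check like this does not need the per-OS branches shown above.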

Dazhuang answered Nov 12 '22 13:11
