
PyInstaller with Pandas creates over 500 MB exe

I am trying to create an exe file using PyInstaller 3.2.1. As a test, I tried to build an exe from the following code:

import pandas as pd
print('hello world')

After a considerable amount of time (15+ minutes) I ended up with a dist folder of 620 MB and a build folder of 150 MB. I work on Windows using Python 3.5.2 |Anaconda custom (64-bit). It may be worth noting that the mkl files account for almost 300 MB of the dist folder. I run PyInstaller using 'pyinstaller.exe foo.py'. I tried using --exclude-module to exclude some dependencies, but still ended up with huge files. Whether I use onefile or onedir makes no difference.

I am aware that the exe must contain some important files, but is it normal for it to be almost 1 GB? I can provide the warning log or anything else that could help solve the matter.

P.S. In parallel, my coworker created an exe from the same sample script and ended up with less than 100 MB. The difference is that he is not using Anaconda. Could that be the cause?

Any help will be appreciated.

dylan_fan asked May 10 '17 08:05


4 Answers

PyInstaller creates a big executable from conda packages and a small executable from pip packages. From this simple Python code:

from pandas import DataFrame as df
print('h')

I get a 203 MB executable using conda packages and a 30 MB executable using pip packages. Still, conda is a nice replacement for plain virtualenv: I can develop with conda and Jupyter and then create a script 'mycode.py' (a Jupyter notebook can be downloaded as a .py file into myfolder).

My final solution is as follows. If you do not have it, install Miniconda, then open Anaconda Prompt from the Windows Start Menu and run:

    cd myfolder
    conda create -n exe python=3
    activate exe
    pip install pandas pyinstaller pypiwin32
    echo hiddenimports = ['pandas._libs.tslibs.timedeltas'] > %CONDA_PREFIX%\Lib\site-packages\PyInstaller\hooks\hook-pandas.py
    pyinstaller -F mycode.py

Here I create a new environment 'exe'. pypiwin32 is needed by PyInstaller but is not installed automatically, and hook-pandas.py is needed to build with pandas. Also, importing only a submodule does not reduce the size of the executable file, so I do not need this:

from pandas import DataFrame as df

but I can just use the usual code:

import pandas as pd

Also, non-ASCII (national) characters in paths can cause errors, so it is best to use a Windows user account with an English name for the development tools.
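
As an alternative to overwriting hook-pandas.py inside site-packages, the same hidden import can be passed to PyInstaller at build time. This is a minimal sketch, assuming PyInstaller is installed in the active environment. The helper names `build_args` and `run_build` are hypothetical; `--onefile`, `--hidden-import` and `PyInstaller.__main__.run` are part of PyInstaller itself.

```python
def build_args(script, hidden_imports=()):
    """Return the PyInstaller argument list for a one-file build."""
    args = ["--onefile"]  # long form of the -F flag used above
    for name in hidden_imports:
        # Same effect as listing the module in hiddenimports in a hook file.
        args.append("--hidden-import=" + name)
    args.append(script)
    return args


def run_build(script, hidden_imports=()):
    """Run PyInstaller in-process (hypothetical helper, not called here)."""
    import PyInstaller.__main__  # lazy import: requires pip install pyinstaller

    PyInstaller.__main__.run(build_args(script, hidden_imports))
```

Calling `run_build("mycode.py", ["pandas._libs.tslibs.timedeltas"])` should then be roughly equivalent to the `echo ... hook-pandas.py` and `pyinstaller -F mycode.py` steps, without modifying the installed package.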

abicorios answered Oct 23 '22 18:10


This is probably because the Anaconda version of numpy is built using mkl.

If you want to reduce the size of the distributable, you could use a separate virtual environment for building, with the packages installed through pip instead of conda.

Maarten Fabré answered Oct 23 '22 17:10


Here's a way to keep using conda and still avoid mkl. Install numpy before installing pandas, using this alternate command:
conda install -c conda-forge numpy

This avoids mkl and uses an OpenBLAS package in its place. A full explanation is given in an issue on the conda-forge/numpy-feedstock GitHub repo.
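
To check whether mkl still ends up in the build, the files PyInstaller collected can be summed by name. This is a rough sketch, assuming a one-folder (onedir) build so that the bundled files are visible under dist; `size_by_match` is a hypothetical helper, and matching on the substring "mkl" in file names is only a heuristic.

```python
import os


def size_by_match(root, needle="mkl"):
    """Return (matching_bytes, total_bytes) for all files under root."""
    matching = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            total += size
            # Heuristic: count a file as an mkl file if its name contains the needle.
            if needle in name.lower():
                matching += size
    return matching, total
```

For example, `size_by_match("dist")` returns the bytes taken by mkl files and the total size of the folder, which can be compared before and after switching to the conda-forge numpy.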

Nikhil VJ answered Oct 23 '22 17:10


A simple solution while working with Anaconda:

-Make a new environment inside Anaconda Navigator. (The new environment is free of the large number of packages that cause the problem.)

-Open a terminal and use pip install to add the packages you need. (Make sure you are in the new environment.)

-Run pyinstaller.

I reduced my .exe from 300 MB to 30 MB.
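
The three steps above can also be scripted. The sketch below is only an illustration under assumptions: it swaps the Anaconda Navigator environment for one created with Python's standard-library venv module, and the names `venv_python`, `build_commands`, `run_all` and the environment folder `exe-env` are hypothetical.

```python
import os
import subprocess
import sys


def venv_python(env_dir):
    """Path of the interpreter inside a venv (layout differs on Windows)."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def build_commands(env_dir, script, packages):
    """Commands for: create a clean env, pip install, run PyInstaller."""
    py = venv_python(env_dir)
    return [
        [sys.executable, "-m", "venv", env_dir],       # new, empty environment
        [py, "-m", "pip", "install", *packages],       # only the needed packages
        [py, "-m", "PyInstaller", "--onefile", script],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

A build would then be started with something like `run_all(build_commands("exe-env", "mycode.py", ["pandas", "pyinstaller"]))`. Running pip and PyInstaller through the environment's own interpreter ensures that only the packages installed in that environment are bundled.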

JSBY answered Oct 23 '22 18:10