
How to install textract in python3

sudo python3 -m pip install textract
sudo apt-get install textract
pip install textract
sudo apt-get install swig

I am trying to install textract for Python 3 with the commands above, but the installation does not complete and fails with the following error.

x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DSPHINXBASE_EXPORTS -DPOCKETSPHINX_EXPORTS -DSPHINX_DLL -DHAVE_CONFIG_H -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sphinxbase -Ideps/sphinxbase/include/android -I/usr/include/python2.7 -c swig/sphinxbase/ad_wrap.c -o build/temp.linux-x86_64-2.7/swig/sphinxbase/ad_wrap.o -Wno-unused-label -Wno-strict-prototypes -Wno-parentheses -Wno-unused-but-set-variable -Wno-unused-variable -Wno-unused-result -Wno-sign-compare -Wno-misleading-indentation
  x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DSPHINXBASE_EXPORTS -DPOCKETSPHINX_EXPORTS -DSPHINX_DLL -DHAVE_CONFIG_H -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sphinxbase -Ideps/sphinxbase/include/android -I/usr/include/python2.7 -c deps/sphinxbase/src/libsphinxad/ad_pulse.c -o build/temp.linux-x86_64-2.7/deps/sphinxbase/src/libsphinxad/ad_pulse.o -Wno-unused-label -Wno-strict-prototypes -Wno-parentheses -Wno-unused-but-set-variable -Wno-unused-variable -Wno-unused-result -Wno-sign-compare -Wno-misleading-indentation
  deps/sphinxbase/src/libsphinxad/ad_pulse.c:44:30: fatal error: pulse/pulseaudio.h: No such file or directory
  compilation terminated.
  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Jay Pratap Pandey asked Nov 25 '17 06:11




2 Answers

Follow these steps:

  1. Download the textract source archive from: https://pypi.python.org/pypi/textract

  2. Install pdfminer3k: pip3 install pdfminer3k

  3. Extract (untar) the downloaded archive

  4. cd into the extracted directory

  5. Run: python3 setup.py install
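
The steps above can be sketched as a small shell helper. This is only a sketch under assumptions: the function names `sdist_dir` and `install_textract_from_source` are made up for illustration, the archive is assumed to follow the usual `<name>-<version>.tar.gz` layout and to unpack into a directory of the same name, and `X.Y.Z` stands for whatever version was downloaded. Depending on the setup, the install commands may need `sudo` or a virtual environment.

```shell
# Hypothetical helper: derive the directory a source archive unpacks into
# from its file name. Assumes the usual <name>-<version>.tar.gz layout.
sdist_dir() {
    basename "$1" .tar.gz
}

# Hypothetical helper: run steps 2-5 for an archive that is already downloaded.
install_textract_from_source() {
    archive="$1"
    pip3 install pdfminer3k || return 1      # step 2
    tar -xzf "$archive" || return 1          # step 3: unpack into the current directory
    # Steps 4-5 run in a subshell so the caller's working directory is unchanged.
    ( cd "$(sdist_dir "$archive")" && python3 setup.py install )
}

# Usage (not run automatically; X.Y.Z is a placeholder for the real version):
#   install_textract_from_source ~/Downloads/textract-X.Y.Z.tar.gz
```

Only function definitions are executed when this is sourced, so nothing is installed until `install_textract_from_source` is called explicitly.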

bkira answered Jan 02 '23 20:01


The build fails because the pulse/pulseaudio.h header is missing. Install the PulseAudio development package first: libpulse-dev on Ubuntu, or pulseaudio-libs-devel on Fedora.

  • On Ubuntu: sudo apt-get install libpulse-dev
  • On Fedora: sudo dnf install pulseaudio-libs-devel

At least, this worked for me.
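
To connect this fix to the error in the question, here is a minimal sketch that pulls the missing header name out of a build log and picks the matching install command. The function names `missing_header` and `pulse_dev_install_cmd` are hypothetical, and only the two distributions named in this answer are covered. Note also that the include path `/usr/include/python2.7` in the log suggests the failing build ran under Python 2.7, so retrying with `python3 -m pip` is probably what is wanted.

```shell
# Hypothetical helper: read a build log on stdin and print the header named in
# a gcc "fatal error: <header>: No such file or directory" line.
missing_header() {
    sed -n 's/.*fatal error: \([^:]*\): No such file or directory.*/\1/p'
}

# Hypothetical helper: map a distribution ID (the ID field of /etc/os-release)
# to the install command given in this answer.
pulse_dev_install_cmd() {
    case "$1" in
        ubuntu) echo "sudo apt-get install libpulse-dev" ;;
        fedora) echo "sudo dnf install pulseaudio-libs-devel" ;;
        *)      echo "unknown distribution: $1" >&2; return 1 ;;
    esac
}

# Usage (not run automatically):
#   . /etc/os-release                 # sets ID, e.g. "ubuntu" or "fedora"
#   pulse_dev_install_cmd "$ID"       # prints the command to run
#   python3 -m pip install textract   # then retry the install
```

The helper only prints the command rather than running it, so the package installation stays an explicit, reviewable step.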

Sazedul Islam Sazid answered Jan 02 '23 19:01