
Parallel Document Conversion ODT > PDF Libreoffice

I am converting hundreds of ODT files to PDF, and doing them one after the other takes a long time. My CPU has multiple cores. Is there a way to parallelize (not sure if I'm using the right word) batch document conversion with LibreOffice from the command line, for example with a bash or Python script? So far I have been calling the following from bash or Python:

libreoffice --headless --convert-to pdf *appsmergeme.odt

OR

subprocess.call('cd $HOME; libreoffice --headless --convert-to pdf *appsmergeme.odt', shell=True)

Thank you!

Tim

timlev asked Feb 27 '13 09:02




3 Answers

You can run LibreOffice as a daemon/service. The following link may help: Daemonize the LibreOffice service

Another possibility is to use unoconv. "unoconv is a command line utility that can convert any file format that OpenOffice can import, to any file format that OpenOffice is capable of exporting."
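As a rough illustration of the unoconv route, the sketch below builds and runs one conversion command from Python. The helper names unoconv_command and convert_with_unoconv are made up for this example; the only unoconv option used is -f (output format). It assumes unoconv is installed and on the PATH, and it does not parallelize anything by itself.

```python
import shutil
import subprocess


def unoconv_command(src, fmt="pdf"):
    """Return the argument list for converting one file with unoconv."""
    return ["unoconv", "-f", fmt, src]


def convert_with_unoconv(src, fmt="pdf"):
    """Run unoconv on one file and return its exit code."""
    # Assumption: unoconv is installed; fail early with a clear message if not.
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv was not found on the PATH")
    return subprocess.call(unoconv_command(src, fmt))
```

By default unoconv writes the result next to the source file, so convert_with_unoconv("report.odt") would be expected to produce report.pdf in the same directory.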

Pancho Jay answered Oct 25 '22 22:10


Since the asker already mentioned Python as an acceptable option:

import glob
import os
import subprocess
from multiprocessing.dummy import Pool    # thread pool with the multiprocessing API

HOME = os.path.expanduser("~")

def worker(fname, dstdir=HOME):
    # Each call starts its own libreoffice process; the PDF is written to cwd.
    subprocess.call(["libreoffice", "--headless", "--convert-to", "pdf", fname],
                    cwd=dstdir)

# Pool() defaults to one worker thread per CPU core.
with Pool() as pool:
    pool.map(worker, glob.iglob(os.path.join(HOME, "*appsmergeme.odt")))

A thread pool (multiprocessing.dummy) is sufficient here instead of a process pool, because subprocess.call() spawns a new process for each file anyway, and those processes provide the real parallelism.

The command and the working directory (cwd) are passed to subprocess.call() directly, so there is no need to start a shell for each file just for that. Furthermore, os.path keeps the path handling portable across platforms.
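A slightly more defensive variant of the same idea is sketched below. It collects the exit code of each conversion so failures can be listed afterwards, and it gives every LibreOffice instance its own temporary profile directory via -env:UserInstallation. The function names build_command, convert and convert_all are invented for this example, and the separate profile is an assumption: some setups reportedly need it for concurrent instances, others do not. A non-zero exit code is also only a rough failure signal.

```python
import glob
import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(src, outdir, profile_dir, binary="libreoffice"):
    """Return the argument list for converting one file to PDF."""
    return [
        binary,
        # Assumption: one profile per instance avoids clashes on a shared profile.
        "-env:UserInstallation=" + Path(profile_dir).as_uri(),
        "--headless",
        "--convert-to", "pdf",
        "--outdir", outdir,
        src,
    ]


def convert(src, outdir, run=subprocess.call):
    """Convert one file and return (src, exit code)."""
    # The throwaway profile directory is deleted once the conversion returns.
    with tempfile.TemporaryDirectory(prefix="lo_profile_") as profile_dir:
        return src, run(build_command(src, outdir, profile_dir))


def convert_all(files, outdir, workers=None, run=subprocess.call):
    """Convert files concurrently; return the sources with a non-zero exit code."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda f: convert(f, outdir, run), files)
        return [src for src, code in results if code != 0]


if __name__ == "__main__":
    home = os.path.expanduser("~")
    failed = convert_all(glob.glob(os.path.join(home, "*appsmergeme.odt")), home)
    print("failed:", failed)
```

The run parameter exists only so the command runner can be swapped out, for instance to do a dry run that prints the commands instead of executing them.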

Chickenmarkus answered Oct 25 '22 23:10


This thread is old. I tested LibreOffice 4.4 and can confirm that several instances can run concurrently. See my script:

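for odt in test*odt; do
    echo "$odt"
    soffice --headless --convert-to pdf "$odt" &   # run the conversion in the background
    ps -ef | grep ffice                            # show the soffice processes started so far
done
wait   # block until all background conversions have finished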
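The loop above starts one background process per file, all at once, which may be too many for hundreds of documents. A rough Python equivalent that caps the number of simultaneous processes is sketched here. The function name run_in_batches and the max_parallel parameter are made up for this example; subprocess.Popen plays the role of `&`, and Popen.wait() the role of the shell's wait.

```python
import subprocess


def run_in_batches(commands, max_parallel=4, popen=subprocess.Popen):
    """Start commands in the background, at most max_parallel at a time.

    Returns the exit codes in the same order as commands.
    """
    exit_codes = []
    for start in range(0, len(commands), max_parallel):
        batch = commands[start:start + max_parallel]
        # Popen returns immediately, like `cmd &` in the shell.
        procs = [popen(cmd) for cmd in batch]
        # Wait for the whole batch before starting the next one.
        exit_codes.extend(p.wait() for p in procs)
    return exit_codes
```

For the files in the loop above, commands could be built as [["soffice", "--headless", "--convert-to", "pdf", f] for f in glob.glob("test*odt")]. Waiting for a whole batch is simple but leaves cores idle while the slowest file in each batch finishes; a thread pool avoids that.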
Danny Li answered Oct 25 '22 23:10