JODConverter (Java OpenDocument Converter) is a widely used tool that automates document conversions. unoconv is a Python tool with a similar purpose. This article explains why you should consider switching to JODConverter’s Collabora Online backend, or talking to Collabora Online directly.
The formats these tools support include OpenDocument, PDF, HTML, Microsoft Office formats (DOC/DOCX/RTF, XLS/XLSX, PPT/PPTX) and many others. They can be used as a Java/Python library, a command-line tool, or a web application. Newer versions of JODConverter include a backend that uses Collabora Online instead of driving LibreOffice directly.
What are the benefits of using Collabora Online for document conversion?
- Improved performance compared to the startup-convert-shutdown approach
- The REST API is more reliable than starting LibreOffice in server mode and communicating via remote UNO
- More secure, because the conversion happens in an isolated environment. This layered approach protects your infrastructure (from outer to inner layers):
  - It is easy to run in a virtual machine or Docker container
  - Document data is isolated into per-document chroots
  - Seccomp-bpf: inside that chroot (almost) no system calls are allowed
  - Extremely sparse filesystem inside the chroot: no shell etc.
Benefit | JODConverter | unoconv | Collabora Online |
---|---|---|---|
Many file formats | Yes | Yes | Yes |
Single startup cost | No | No | Yes |
Standard REST API | No | No | Yes |
Easy isolation into VM / docker | No | No | Yes |
Document isolation | No | No | Yes |
Syscall filter | No | No | Yes |
Sparse filesystem | No | No | Yes |
This means you get both improved performance and better security when converting documents with Collabora Online.
Performance
The first chart shows how Collabora Online performs compared to JODConverter’s LibreOffice backend and unoconv as the number of threads varies, measured in documents converted per second:
You can see that Collabora Online not only starts with better performance, but also scales better as you use more threads. (We compare curl invocations for Collabora Online with Java command-line invocations of JODConverter and Python command-line invocations of unoconv.)
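To make the metric concrete, here is a minimal Python sketch of how "documents converted per second" at a given thread count could be measured. It is not the benchmark harness used for the chart: `measure_throughput` and `fake_convert` are hypothetical names, and `fake_convert` only sleeps as a stand-in for a real conversion call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(convert, documents, threads):
    """Return how many documents per second `convert` handles with `threads` workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for every conversion to finish before the clock stops.
        results = list(pool.map(convert, documents))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed


def fake_convert(path):
    """Hypothetical stand-in for a real conversion; replace with an actual call."""
    time.sleep(0.01)
    return path + ".pdf"
```

Calling `measure_throughput(fake_convert, documents, n)` for several values of `n` gives one data point per thread count, which is the shape of comparison the chart describes.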
Building
If you want to try out JODConverter with its Collabora Online backend:
git clone https://github.com/sbraconnier/jodconverter
cd jodconverter
sh gradlew build -x integTest distZip
cd build/distributions
unzip jodconverter-cli-*.zip
cd jodconverter-cli-*/
Running
- Example:
bin/jodconverter-cli -c https://localhost:9980/ -f pdf README.txt
- The input format is detected automatically; -f determines the output format.
- The URL is your Collabora Online server URL, that is, the https:// value from the installation guide.
Using the Collabora Online REST API directly
- If you are not already using JODConverter, you can use the REST API directly, for example:
curl -F "data=@test.txt" https://localhost:9980/cool/convert-to/pdf > out.pdf
curl -F "data=@test.txt" https://localhost:9980/cool/convert-to/png > out.png
- Alternatively, you can pass the output format as a form field, for example:
curl -F "data=@test.txt" -F "format=pdf" https://localhost:9980/cool/convert-to > out.pdf
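The same request can be made from a program. The following is a minimal Python sketch using only the standard library, assuming the endpoint and the `data` form field shown in the curl examples above. The function names (`build_convert_url`, `encode_multipart`, `convert`) are hypothetical helpers, not part of any Collabora Online client library.

```python
import os
import urllib.request
import uuid


def build_convert_url(server, fmt):
    """Build the endpoint URL, e.g. https://localhost:9980/cool/convert-to/pdf."""
    return server.rstrip("/") + "/cool/convert-to/" + fmt


def encode_multipart(field, filename, payload):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        "--" + boundary + "\r\n"
        'Content-Disposition: form-data; name="' + field + '"; '
        'filename="' + filename + '"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = ("\r\n--" + boundary + "--\r\n").encode("utf-8")
    return head + payload + tail, "multipart/form-data; boundary=" + boundary


def convert(server, path, fmt, context=None):
    """POST the file at `path` to the server and return the converted bytes."""
    with open(path, "rb") as source:
        body, content_type = encode_multipart(
            "data", os.path.basename(path), source.read()
        )
    request = urllib.request.Request(
        build_convert_url(server, fmt),
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": content_type},
    )
    # `context` is an optional ssl.SSLContext, e.g. for a self-signed certificate.
    with urllib.request.urlopen(request, context=context) as response:
        return response.read()
```

With a server running at the URL used above, `convert("https://localhost:9980", "test.txt", "pdf")` would return the PDF bytes, which can then be written to `out.pdf`. With a self-signed certificate the call fails verification unless a suitable `context` is passed.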
Supported formats
Supported input formats:
Documents | Input formats |
---|---|
Writer documents | sxw (view), odt and fodt (edit) |
Calc documents | sxc (view), ods and fods (edit) |
Impress documents | sxi (view), odp and fodp (edit) |
Draw documents | sxd (view), odg and fodg (edit) |
Chart documents | odc (edit) |
Text master documents | sxg (view), odm (edit) |
Text template documents | stw (view), ott (edit) |
Writer master document templates | otm (edit) |
Spreadsheet template documents | stc (view), ots (edit) |
Presentation template documents | sti (view), otp (edit) |
Drawing template documents | std (view), otg (edit) |
Base documents | odb (edit) |
Extensions | oxt (edit) |
MS Word | doc and dot (edit) |
MS Excel | xls (edit) |
MS PowerPoint | ppt (edit) |
OOXML wordprocessing | docx and docm (edit), dotx and dotm (view) |
OOXML spreadsheet | xltx and xltm (view), xlsx and xlsb and xlsm (edit) |
OOXML presentation | pptx, pptm, potx, potm (edit) |
Other | wpd, pdb, hwp, wps, wri, wk1, cgm, dxf, emf, wmf, cdr, vsd, pub, vss, lrf, gnumeric, mw, numbers, p65, pdf, jpg, jpeg, gif, png, etc (view) |
Other | dif, slk, csv, dbf, oth, rtf, txt, etc (edit) |
Supported output formats for Writer/Calc/Impress:
Documents | Output formats |
---|---|
Writer | doc for MS Word 97, docm for MS Word 2007 XML VBA, docx for MS Word 2007 XML, fodt for OpenDocument Text Flat XML, html for HTML (StarWriter), odt for writer8, ott for writer8_template, pdf for writer_pdf_Export, rtf for Rich Text Format, txt for Text, xhtml for XHTML Writer File, png for writer_png_Export |
Calc | csv for Text – txt – csv (StarCalc), fods for OpenDocument Spreadsheet Flat XML, html for HTML (StarCalc), ods for calc8, ots for calc8_template, pdf for calc_pdf_Export, xhtml for XHTML Calc File, xls for MS Excel 97, xlsm for Calc MS Excel 2007 VBA XML, xlsx for Calc MS Excel 2007 XML, png for calc_png_Export |
Impress | fodp for OpenDocument Presentation Flat XML, html for impress_html_Export, odg for impress8_draw, odp for impress8, otp for impress8_template, pdf for impress_pdf_Export, potm for Impress MS PowerPoint 2007 XML Template, pot for MS PowerPoint 97 Vorlage, pptm for Impress MS PowerPoint 2007 XML VBA, pptx for Impress MS PowerPoint 2007 XML, pps for MS PowerPoint 97 Autoplay, ppt for MS PowerPoint 97, svg for impress_svg_Export, swf for impress_flash_Export, xhtml for XHTML Impress File, png for impress_png_Export |
Draw | fodg for draw_ODG_FlatXML, html for draw_html_Export, odg for draw8, pdf for draw_pdf_Export, svg for draw_svg_Export, swf for draw_flash_Export, xhtml for XHTML Draw File, png for draw_png_Export |
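A caller may want to reject an unsupported target format before sending a document to the server. The sketch below copies the extensions from the output table above into a lookup. The names `OUTPUT_FORMATS` and `supports_output` are hypothetical, and the sets reflect this table only, so they may not match what a particular server version actually offers.

```python
# Output extensions per document type, copied from the table above.
OUTPUT_FORMATS = {
    "writer": {"doc", "docm", "docx", "fodt", "html", "odt", "ott", "pdf",
               "rtf", "txt", "xhtml", "png"},
    "calc": {"csv", "fods", "html", "ods", "ots", "pdf", "xhtml", "xls",
             "xlsm", "xlsx", "png"},
    "impress": {"fodp", "html", "odg", "odp", "otp", "pdf", "potm", "pot",
                "pptm", "pptx", "pps", "ppt", "svg", "swf", "xhtml", "png"},
    "draw": {"fodg", "html", "odg", "pdf", "svg", "swf", "xhtml", "png"},
}


def supports_output(kind, fmt):
    """Return True if the table lists `fmt` as an output format for `kind`."""
    return fmt.lower() in OUTPUT_FORMATS.get(kind.lower(), set())
```

For example, `supports_output("calc", "docx")` is false because the table lists no word-processing output for spreadsheets, while `supports_output("draw", "svg")` is true.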
Trusting the local Online HTTP certificate from Java
This is only needed if you have a self-signed certificate for your Online installation.
- get the certificate:
openssl s_client -connect localhost:9980 2>&1 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
- paste it into a file named certfile.txt
- import it into the Java key store (password is changeit by default):
keytool -importcert -keystore $JAVA_HOME/jre/lib/security/cacerts -alias mycert -file certfile.txt
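Depending on the value of $JAVA_HOME, you may need to run keytool with root/Administrator privileges. On Java 9 and later there is no jre subdirectory, so the key store is at $JAVA_HOME/lib/security/cacerts.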
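The Java key store only affects Java clients. If you call the REST API from Python instead, a comparable step is to build an SSL context that trusts the exported certificate. This is a minimal sketch: `make_trust_context` is a hypothetical helper, and it assumes the certificate saved in certfile.txt is in PEM format and names the host you connect to, since hostname checking stays enabled.

```python
import ssl


def make_trust_context(cafile=None):
    """Return a verifying SSL context.

    With `cafile` set, only the certificates in that PEM file are trusted;
    with None, the system's default CA certificates are used.
    """
    return ssl.create_default_context(cafile=cafile)
```

The result of `make_trust_context("certfile.txt")` can be passed as the `context` argument of `urllib.request.urlopen`.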
Conclusions
- Using JODConverter already? Consider switching to its safer Collabora Online backend.
- Using Collabora Online via JODConverter or unoconv? Consider switching to our simple REST conversion API (Java sample code, Python sample code).
- Using another tool? Evaluate whether a standard Collabora Online solution meets your performance and conversion needs.