Adds support for exporting wiki pages to PDF on the client-side using the web browser.
Recommended
Type: XAR
Category: Application
Developed by: XWiki Development Team
Active Installs: 44
Rating: 2 Votes
License: GNU Lesser General Public License 2.1
Bundled With: XWiki Standard (14.10+)
Compatibility: XWiki 14.2+
Installable with the Extension Manager

Description

Uses paged.js along with the CSS Paged Media Module and the CSS Generated Content for Paged Media Module to export wiki pages to PDF using the browser's print-to-PDF feature.

This application provides:

  • a "PDF" export format on the Export Modal, which by default replaces the old PDF export based on Apache Formatting Objects Processor (FOP)
  • an improved PDF Export Options modal that allows the user to select the PDF template
  • a default PDF template for basic needs (including support for multi-page export)
  • a template provider to help create new PDF templates
  • an administration section to configure various things, such as the list of PDF templates the end user can select from or the tool used to generate the PDF
  • a PDF export job that renders the selected XWiki pages on the server-side in a background (daemon) thread
  • components to print web pages to PDF on the server-side using a headless Chrome web browser running inside a Docker container

History

Originally, the XWiki PDF export feature was developed to work server-side. However, as XWiki evolved, more and more features were implemented in JavaScript, and the server-side PDF export cannot capture changes made to the HTML DOM by JavaScript (that would require a JavaScript engine running on the server-side, and it's not easy to integrate one that executes every JavaScript framework properly). We therefore decided to rewrite the PDF export feature, and this extension is the result.

How it works

The Front-end

  • The user opens the "Export" modal using the "More Actions > Export" page menu and then selects "PDF" from the list of export formats.
  • If the current page is a nested page (i.e. it can have child pages), the user gets the "Export Tree Modal", where they can select the pages to export. Otherwise they go directly to the "PDF Export Options" modal.
  • The user chooses the PDF export options and then clicks the "Export" button.
  • The JavaScript click listener on the "Export" button makes an HTTP request to start the PDF export job on the back-end, passing the collected data (the list of pages to export, the PDF template, whether to generate the cover page and the table of contents, etc.); the HTTP response includes the id of the scheduled job.
  • The JavaScript code then makes subsequent HTTP requests to get the status of the PDF export job, passing the id received when the job was started, until the job ends (either successfully or with a failure).
  • 14.4.3+, 14.6+ The user can click the "Cancel" button to cancel the running PDF export job. This sends an HTTP request to the back-end that sets the cancel flag on the job status; the PDF export job doesn't stop immediately, but as soon as it reads the cancel flag.
  • When the JavaScript code detects (based on the job status) that the PDF export job has finished, it has two options:
    • if the job status specifies a PDF file, which is the case when the PDF is generated server-side, it redirects the user to that file
    • otherwise it loads the PDF template in a hidden iframe, passing the id of the finished PDF export job, waits for everything to load and be ready for print, and then calls window.print(), which opens the browser's print modal that the user can use to save the result as PDF
  • The PDF template uses the status of the PDF export job specified in the HTTP request to generate the HTML that is going to be printed to PDF
    • it uses paged.js to split the HTML content into print pages and to generate the PDF cover page, table of contents, and page header and footer
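To make the polling step concrete, here is a minimal Python sketch of the client side of this exchange. It is not the actual front-end code (which is JavaScript): the function names are hypothetical, `fetch_status` stands in for whatever performs the HTTP GET and decodes the JSON, and we assume the job is over once its state is "FINISHED". The query parameters are the ones visible in the Troubleshooting section.

```python
import time
from typing import Callable
from urllib.parse import urlencode


def job_status_url(page_url: str, job_id: list) -> str:
    """Build the URL polled to read the PDF export job status (hypothetical helper)."""
    query = urlencode({
        "outputSyntax": "plain",
        "sheet": "XWiki.PDFExport.WebHome",
        "data": "jobStatus",
        # The job id is a list such as ["export", "pdf", "1663658402005-493"];
        # urlencode escapes the "/" separators as %2F.
        "jobId": "/".join(job_id),
    })
    return f"{page_url}?{query}"


def wait_for_job(fetch_status: Callable[[str], dict], url: str,
                 interval: float = 1.0, max_polls: int = 600) -> dict:
    """Poll the job status until the job reports the FINISHED state (assumed terminal)."""
    for _ in range(max_polls):
        status = fetch_status(url)
        if status.get("state") == "FINISHED":
            return status
        time.sleep(interval)
    raise TimeoutError("the PDF export job did not finish in time")


def next_step(status: dict) -> str:
    """Decide what to do once the job has finished."""
    if status.get("failed"):
        return "show-error"
    if status.get("pdfFileURL"):
        # Server-side generation: send the user to the generated PDF file.
        return "redirect:" + status["pdfFileURL"]
    # Client-side generation: load the PDF template in a hidden iframe, then window.print().
    return "print-in-hidden-iframe"
```

Passing the fetch function in keeps the loop testable without a running wiki; in a real client it would wrap an HTTP request that carries the user's session cookies.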

The Back-end

  • The PDF export job simply iterates over the list of wiki pages to export and renders each of them to HTML, collecting the results without aggregating them (aggregation is done later by the PDF template).
  • The rendering results are exposed on the job status (to be read by the PDF template), but they are accessible only to the user who triggered the export.
  • If the configuration says that the PDF should be generated server-side, the PDF export job uses a dedicated component to generate the PDF using a headless Chrome web browser and saves the PDF file as a temporary resource, exposing its reference on the job status.
    • The temporary resource name is a Java UUID, but the resource reference has a "fileName" parameter that defaults to the title of the wiki page from which the PDF export was triggered; this "fileName" parameter also appears in the URL of the temporary PDF file and is used as the default file name when downloading the PDF.
  • The PDF printer component is responsible for downloading the Docker image, creating the Docker container and connecting to the headless Chrome web browser running inside it.
  • The PDF printer uses a separate browser context for each export, copying the cookies from the original request that triggered the PDF export so that the user is authenticated.
  • The PDF printer tells Chrome to open the PDF template and waits for everything to be ready before calling the Chrome API to save the web page as PDF, returning the generated PDF file to the PDF export job.
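The division of labour described above (render each page, aggregate later, restrict access to the results) can be modelled in a few lines. This is a hypothetical Python sketch, not XWiki's Java implementation: `ExportJobStatus`, `run_export_job` and the `render` callback are illustrative names, and users are plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ExportJobStatus:
    """Hypothetical job status: one HTML rendering per page, kept separate."""
    triggered_by: str
    results: Dict[str, str] = field(default_factory=dict)

    def get_results(self, user: str) -> Dict[str, str]:
        # Only the user who triggered the export may read the rendering results.
        if user != self.triggered_by:
            raise PermissionError(f"{user} did not trigger this export")
        return self.results


def run_export_job(pages: List[str], render: Callable[[str], str],
                   user: str) -> ExportJobStatus:
    status = ExportJobStatus(triggered_by=user)
    for page in pages:
        # No aggregation here: the PDF template combines the results later.
        status.results[page] = render(page)
    return status
```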

PDF Export Options

pdf-export-options.png

The "PDF Export Options" modal allows you to:

  • Select the PDF template to use. The list of PDF templates you can choose from is configured in the dedicated administration section.
  • Specify whether or not to generate:
    • the cover page
    • the table of contents
    • the header (on each printed page, except for the cover page and the table of contents)
    • the footer (on each printed page, except for the cover page and the table of contents)

Note that if the table of contents, the header or the footer is checked, then the Paged.js JavaScript library is used for the print layout; in some edge cases, for very specific content, this can lead to a timeout when performing the export.
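The rule in the note above boils down to a single condition. The following Python sketch restates it; the `PdfExportOptions` class and its field names are hypothetical stand-ins for the four checkboxes of the modal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfExportOptions:
    """The four checkboxes of the PDF Export Options modal (field names are hypothetical)."""
    cover: bool
    toc: bool
    header: bool
    footer: bool


def uses_paged_js(options: PdfExportOptions) -> bool:
    # Paged.js handles the print layout as soon as the table of contents, the header
    # or the footer is requested; the cover page is not part of this condition.
    return options.toc or options.header or options.footer
```

So, if an export hits the timeout edge case, unchecking the table of contents, header and footer is a reasonable first thing to try.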

PDF Templates

Default PDF Template

This application provides a default PDF template that supports:

  • cover page, showing the title, version, last author and modification date of the wiki page from where the export was triggered (the current wiki page)
  • table of contents, showing:
      • either the headings from the exported wiki page, up to level 3, when a single wiki page is exported
      • or the aggregated headings (up to level 3) from all exported wiki pages, including the wiki page titles, for multi-page export

    In both cases the table of contents shows the print page number where each heading appears, and provides internal links to them.

  • page header, showing:
    • either the title of the wiki page from where the export was triggered (for single wiki page export)
    • or the title of the wiki page that provided the content from the current print page (for multi-page export)
  • page footer, showing the print page number and count

When multiple wiki pages are exported, the content of each wiki page starts on a new print page in the generated PDF.
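The table of contents rules above can be sketched as follows. This is a hypothetical Python model of the structure only (page numbers and links are computed by paged.js in the real template); in particular, we assume that in a multi-page export each page title becomes a top-level entry with that page's headings nested one level below it.

```python
from typing import List, NamedTuple, Tuple


class Heading(NamedTuple):
    level: int  # 1 for <h1>, 2 for <h2>, ...
    text: str


MAX_TOC_LEVEL = 3  # the default template lists headings up to level 3


def build_toc(pages: List[Tuple[str, List[Heading]]]) -> List[Tuple[int, str]]:
    """Return (depth, label) entries; `pages` holds (page title, headings) pairs."""
    if len(pages) == 1:
        _, headings = pages[0]
        return [(h.level, h.text) for h in headings if h.level <= MAX_TOC_LEVEL]
    toc: List[Tuple[int, str]] = []
    for title, headings in pages:
        toc.append((1, title))  # assumption: the page title is a top-level entry
        toc.extend((h.level + 1, h.text) for h in headings if h.level <= MAX_TOC_LEVEL)
    return toc
```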

Custom PDF Templates

You can create your own PDF Template by creating a new page and then selecting the "PDF Template" template:

pdf-template-select.png

This leads you to the creation page for your custom template:

pdf-template-edit.png

Once saved, you can see how it looks:

pdf-template-view.png

And you can edit the properties in place:

pdf-template-inplace.png

You can also inject CSS or JavaScript using Skin Extension xobjects. If you edit your custom PDF Template in object mode, you'll see a pre-filled SSX xobject:

pdf-template-ssx.png

Administration Section

pdf-export-adminSection.png

The administration section allows you to:

  • 14.9+ check the state of the PDF generator
  • set the list of PDF templates that users can select from on the PDF Export Options modal; note that leaving the list of templates empty effectively disables the browser-based PDF export; the same happens if the current user doesn't have view access to any of the configured PDF templates
  • 14.9+ select and configure the PDF generator
  • 14.9+ set the page ready timeout, that is the number of seconds to wait for the web page to be ready for print before aborting the PDF export
  • 14.10+ set the maximum content size, in kilobytes (KB), that can be included in a single PDF export
  • 14.10+ set whether to replace the old PDF export based on Apache Formatting Objects Processor (FOP) or not; the label displayed on the Export Modal depends on this setting: "PDF" vs. "PDF (Web)"
  • 14.10.3+ disable the maximum content size by setting its value to 0
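The template rule from the second item above can be expressed as a filter. In this Python sketch `can_view` is a hypothetical stand-in for XWiki's view-right check and the template reference is a made-up example.

```python
from typing import Callable, List


def selectable_templates(configured: List[str], can_view: Callable[[str], bool]) -> List[str]:
    """Templates the current user would see in the PDF Export Options modal."""
    return [template for template in configured if can_view(template)]


def pdf_export_available(configured: List[str], can_view: Callable[[str], bool]) -> bool:
    # No configured template, or none the user can view: the export is effectively disabled.
    return bool(selectable_templates(configured, can_view))
```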

PDF Generator

There are multiple ways in which the PDF can be generated and the application provides configuration options in a dedicated administration section (but also in xwiki.properties) to choose what's best for you.

User Browser

The first option (14.8+, and the default) is to generate the PDF on the client side, using the user's own web browser. This has the advantage of working out of the box, because it doesn't depend on any external service (like Docker or a remote headless Chrome), but it has the downside that different users (with different web browsers or different versions of the same web browser) can get different results.

For XWiki versions older than 14.8, you can opt for client-side PDF generation using the following global configuration (in xwiki.properties):

# [Since 14.4.3]
# [Since 14.6RC1]
# Use the user's browser to generate the PDF instead of a headless Chrome browser instance on the server-side (Docker).
export.pdf.serverSide=false

The PDF export job request also has a property to force the client-side generation for a custom export:

#set ($pdfExportJobRequest = $services.export.pdf.createRequest())
## Tell the PDF export job we want to generate the PDF on the client side.
#set ($discard = $pdfExportJobRequest.setServerSide(false))
## The PDF export job will only render the XWiki pages on the server side. Once the job is done you'll have to redirect
## the user to the print preview page with the job id in the query string.
#set ($pdfExportJob = $services.export.pdf.execute($pdfExportJobRequest))

Chrome Docker Container

The PDF can also be generated on the server-side using a headless Chrome web browser running inside a Docker container. The application takes care of:

  • pulling the right Docker image (when not found locally)
  • creating the container and starting it (if there's no existing container available)
  • stopping the container at the end when XWiki shuts down (if the container was created by XWiki)

The requirements for this are:

  • Docker 20.10+ must be installed on the machine running XWiki (the servlet engine) if XWiki is not itself inside a Docker container (see the following section). This is because, in this case (XWiki running outside Docker, on the same machine as the Docker daemon), the Chrome browser running inside a Docker container needs to access the XWiki instance running on the Docker host. This is possible thanks to the host-gateway magic host name, introduced in Docker 20.10, which we use when creating the Chrome container like this: --add-host=host.xwiki.internal:host-gateway.
  • the OS user running XWiki (e.g. "tomcat") must be allowed to use Docker (e.g. on Linux this usually means adding the user to the "docker" group so that it has access to the Docker socket)
  • internet access to pull the Docker image

Docker out of Docker

If XWiki is also running inside a Docker container then:

  • you need to bind-mount the Docker socket so that XWiki can communicate with the Docker daemon in order to manage the headless Chrome container
  • you should create a Docker network, add the XWiki container to that network and configure XWiki to use it for the headless Chrome container so that they can communicate (XWiki needs to access the Chrome container for remote debugging and the Chrome container needs to be able to load XWiki pages)
    # Tell XWiki which Docker network to use to communicate with the headless Chrome container.
    export.pdf.dockerNetwork=xwiki-network
  • you have to specify in the XWiki configuration the host that the Chrome container can use to access XWiki (usually the network alias of the XWiki container or its IP address):
    # The host that the Chrome container uses to access XWiki.
    export.pdf.xwikiHost=xwiki-container

Note that in this case you can use an older version of Docker: since XWiki and Chrome are in the same network, they can talk to each other using their network aliases or IP addresses, so there's no need to rely on the magic host-gateway provided by Docker 20.10+.

Reusable Docker Container

If for some reason the machine running XWiki doesn't have internet access but it has Docker installed then you have the option to (re)use an existing Docker container with the headless Chrome web browser:

# Specify the name of the Docker container to reuse.
export.pdf.chromeDockerContainerName=headless-chrome-pdf-printer

In this case you are responsible for creating the headless Chrome container using a proper image. XWiki will be responsible for starting and stopping the Chrome container as needed. The requirements for this are:

  • Docker must be installed on the machine running XWiki (the servlet engine). No specific version of Docker is needed (from the point of view of XWiki), but you need to make sure that the Chrome container you create (for XWiki to reuse) can access the XWiki instance (specified using the export.pdf.xwikiHost configuration). Be aware that if XWiki runs on the same host as the Docker daemon (rather than inside its own Docker container) then you probably need to:
    • either set export.pdf.xwikiHost=host.docker.internal, if you are on Windows or macOS and have Docker 18.03+
    • or create the Chrome container with --add-host=host.xwiki.internal:host-gateway, if you are on Linux and have Docker 20.10+ (which supports the magic host-gateway)
  • the OS user running XWiki (e.g. "tomcat") must be allowed to use Docker (e.g. on Linux this usually means adding the user to the "docker" group so that it has access to the Docker socket)

If XWiki is also running inside a Docker container then check out the Docker out of Docker section above.
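If you create the container yourself, a small script can assemble the command so that the container name and image match the XWiki configuration. This is a sketch under assumptions: we use `docker create` because XWiki starts and stops the container itself; the name and image are the defaults from the Configuration Options section; the Chrome flags passed to the image are assumptions, so check them against the documentation of the image you use, along with any network or port settings your setup needs.

```python
import shlex
from typing import List


def chrome_container_command(name: str = "headless-chrome-pdf-printer",
                             image: str = "zenika/alpine-chrome:latest",
                             debugging_port: int = 9222,
                             xwiki_on_linux_host: bool = True) -> List[str]:
    """Build a `docker create` command for a Chrome container that XWiki can reuse."""
    cmd = ["docker", "create", "--name", name]
    if xwiki_on_linux_host:
        # Lets Chrome reach XWiki on the Docker host via host.xwiki.internal (Docker 20.10+).
        cmd.append("--add-host=host.xwiki.internal:host-gateway")
    cmd.append(image)
    # Assumed Chrome flags (appended after the image) that expose remote debugging.
    cmd += ["--no-sandbox",
            "--remote-debugging-address=0.0.0.0",
            f"--remote-debugging-port={debugging_port}"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(chrome_container_command()))
```

On Windows or macOS you would pass `xwiki_on_linux_host=False` and set export.pdf.xwikiHost=host.docker.internal instead.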

Remote Chrome

If you don't want to rely on Docker, or you don't want to give XWiki access to Docker for security reasons, but you still want to perform the PDF export on the server side then you also have the option to connect to a remote Chrome instance:

# Specify the Chrome host and port so that we can connect for remote debugging.
export.pdf.chromeHost=172.17.0.3
export.pdf.chromeRemoteDebuggingPort=9222
# Specify how the remote Chrome instance can access the XWiki instance in order to load XWiki pages (print preview).
export.pdf.xwikiHost=172.17.0.2

Note that "remote" could also mean local if you use Docker containers like this:

  • run XWiki in a Docker container
  • run headless Chrome in a Docker container
  • put both containers in the same Docker network
  • configure chromeHost and xwikiHost (see above) either using the container IPs or their network aliases
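Before pointing XWiki at a remote Chrome, it can help to check, from the machine running XWiki, that the debugging port answers. The sketch below queries Chrome's DevTools `/json/version` HTTP endpoint; the host and port mirror export.pdf.chromeHost and export.pdf.chromeRemoteDebuggingPort from the example above. It only tests the XWiki-to-Chrome direction, not whether Chrome can load pages from export.pdf.xwikiHost.

```python
import json
from urllib.request import urlopen


def devtools_version_url(chrome_host: str, port: int = 9222) -> str:
    # Chrome's remote debugging HTTP endpoint that describes the running browser.
    return f"http://{chrome_host}:{port}/json/version"


def check_remote_chrome(chrome_host: str, port: int = 9222, timeout: float = 5.0) -> str:
    """Return the browser version reported by Chrome; raises OSError if unreachable."""
    with urlopen(devtools_version_url(chrome_host, port), timeout=timeout) as response:
        return json.load(response)["Browser"]
```

For example, `print(check_remote_chrome("172.17.0.3"))` should print a HeadlessChrome version string when the configuration above is reachable.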

Configuration Options

The following configuration options can be set from xwiki.properties:

# [Since 14.4.3]
# [Since 14.6RC1]
# Whether the PDF export should be performed server-side, e.g. using a headless Chrome web browser running inside a
# Docker container, or client-side, using the user's web browser instead; defaults to client-side PDF generation
# starting with 14.8
export.pdf.serverSide=false

# The host running the headless Chrome web browser, specified either by its name or by its IP address. This allows you
# to use a remote Chrome instance, running on a separate machine, rather than a Chrome instance running in a Docker
# container on the same machine; defaults to empty value, meaning that by default the PDF export is done using the
# Chrome instance running in the specified Docker container.
export.pdf.chromeHost=

# The port number used for communicating with the headless Chrome web browser.
export.pdf.chromeRemoteDebuggingPort=9222

# The host name or IP address that the headless Chrome browser should use to access the XWiki instance (i.e. the print
# preview page); defaults to "host.xwiki.internal" which means the host running the Docker daemon; if XWiki runs itself
# inside a Docker container then you should use the assigned network alias, provided both containers (XWiki and Chrome)
# are in the same Docker network.
export.pdf.xwikiHost=host.xwiki.internal

# The Docker image used to create the Docker container running the headless Chrome web browser.
export.pdf.chromeDockerImage=zenika/alpine-chrome:latest

# The name of the Docker container running the headless Chrome web browser. This is especially useful when reusing an
# existing container.
export.pdf.chromeDockerContainerName=headless-chrome-pdf-printer

# The name or id of the Docker network to add the Chrome Docker container to; this is useful when XWiki itself runs
# inside a Docker container and you want to have the Chrome container in the same network in order for them to
# communicate. The default value "bridge" represents the default Docker network.
export.pdf.dockerNetwork=bridge

# [Since 14.9]
# The number of seconds to wait for the web page to be ready (for print) before timing out.
export.pdf.pageReadyTimeout=60

# [Since 14.10]
# The maximum content size, in kilobytes (KB), that a user is allowed to export to PDF; in order to compute the content
# size we sum the size of the HTML rendering of each of the XWiki documents included in the export; the size of external
# resources, such as images, style sheets and JavaScript code, is not taken into account; 0 means no limit (14.10.3+).
export.pdf.maxContentSize=100

# [Since 14.10]
# The maximum number of PDF exports that can be executed in parallel (each PDF export needs a separate thread).
export.pdf.threadPoolSize=3

# [Since 14.10]
# Whether to replace or not the old PDF export based on Apache Formatting Objects Processor (FOP).
export.pdf.replaceFOP=true
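The content size rule of export.pdf.maxContentSize can be sketched as below. Two details are assumptions rather than documented behaviour: that 1 KB means 1024 bytes of UTF-8 encoded HTML, and that a size exactly equal to the limit is still allowed.

```python
from typing import List


def content_size_kb(rendered_html: List[str]) -> float:
    # Sum of the HTML renderings only; images, style sheets and JavaScript are not counted.
    # Assumption: 1 KB = 1024 bytes of UTF-8 encoded HTML.
    return sum(len(html.encode("utf-8")) for html in rendered_html) / 1024


def within_limit(rendered_html: List[str], max_content_size_kb: int = 100) -> bool:
    if max_content_size_kb == 0:  # 0 disables the limit
        return True
    return content_size_kb(rendered_html) <= max_content_size_kb
```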

Script Service

The application provides a script service that can be used to perform custom PDF exports:

## Create a PDF export job request based on the current servlet request.
#set ($pdfExportJobRequest = $services.export.pdf.createRequest())

## Customize the PDF export job request:
#set ($discard = $pdfExportJobRequest.setDocuments($documentReferenceList))
#set ($discard = $pdfExportJobRequest.setTemplate($templateDocumentReference))
#set ($discard = $pdfExportJobRequest.setWithCover(true))
#set ($discard = $pdfExportJobRequest.setWithToc(false))
#set ($discard = $pdfExportJobRequest.setWithHeader(true))
#set ($discard = $pdfExportJobRequest.setWithFooter(false))
#set ($discard = $pdfExportJobRequest.setWithTitle(true))
#set ($discard = $pdfExportJobRequest.setServerSide(true))
#set ($discard = $pdfExportJobRequest.setFileName('myCool.pdf'))

## Trigger the PDF export job and wait for it to finish.
#set ($pdfExportJob = $services.export.pdf.execute($pdfExportJobRequest))
#set ($discard = $pdfExportJob.join())

## Get the PDF file reference from the job status.
#set ($pdfExportJobStatus = $pdfExportJob.status)
#set ($pdfFileReference = $pdfExportJobStatus.getPDFFileReference())
#if ($services.resource.temporary.exists($pdfFileReference))
  #set ($pdfFileURL = $services.resource.temporary.getURL($pdfFileReference))

  ## Redirect the user to the generated PDF file.
  #set ($discard = $response.sendRedirect($pdfFileURL))
#end

Troubleshooting

If the PDF export fails then you should first check if the PDF export job starts:

  • open the Network tab of the browser's developer tools
  • reload the page you want to export as PDF
  • open the Export modal and choose "PDF", then select the pages to export and click "Export"
  • clear the request log from the Network tab
  • click on "Export" (from PDF Export Options modal) and check the HTTP request log

Normally, you should see:

  • a first request to /xwiki/bin/get/PageToExport/ that schedules the PDF export job and returns the job status as JSON:
    {"id":["export","pdf","1663658402005-493"],"state":"NONE","canceled":false,"progress":{"offset":0.0}}
    • if this request fails then it probably means that the PDF export job didn't start. Check the HTTP response (might include a stacktrace) and the request parameters (see if they look normal)
  • once the PDF export job is scheduled the front-end starts making HTTP requests to fetch the job status until the job finishes; thus you should see multiple requests like this:
    /xwiki/bin/get/PageToExport/?outputSyntax=plain&sheet=XWiki.PDFExport.WebHome&data=jobStatus&jobId=export%2Fpdf%2F1663658402005-493

    The response is the job status as JSON:

    {
     "id":["export","pdf","1663658402005-493"],
     "state":"FINISHED",
     "canceled":false,
     "progress":{"offset":1.0},
     "pdfFileURL":"/xwiki/tmp/export/document%3Axwiki%3APageToExport.WebHome/pdf/9888d576-2858-4209-af26-5e88d9a1ebab.pdf",
     "failed":false
    }

    If the job failed, you should see "failed": true in the JSON.

  • At the end, if the PDF export job is successful then you should see a request to the generated PDF file that looks like this:
    /xwiki/tmp/export/document%3Axwiki%3APageToExport.WebHome/pdf/9888d576-2858-4209-af26-5e88d9a1ebab.pdf
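When you copy the JSON responses from the Network tab, a short script can summarize them. This Python sketch relies only on the fields shown in the examples above (id, state, canceled, progress, pdfFileURL, failed); treating "FINISHED" as the only terminal state is an assumption.

```python
import json


def diagnose(status_json: str) -> str:
    """Summarize a PDF export job status returned by the jobStatus requests."""
    status = json.loads(status_json)
    job_id = "/".join(status["id"])
    if status.get("state") != "FINISHED":
        offset = status.get("progress", {}).get("offset", 0.0)
        return f"{job_id}: not finished (progress {offset:.0%})"
    if status.get("failed"):
        return f"{job_id}: failed, check the job log"
    if status.get("canceled"):
        return f"{job_id}: canceled"
    # No pdfFileURL means the PDF is printed client-side by the browser.
    return f"{job_id}: done, PDF at {status.get('pdfFileURL', '(client-side print)')}"
```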

If the PDF export job started but failed then you should check the job log that you can find in:

<permanentDirectory>/jobs/status/export/pdf/<timestamp>/log.xml

<permanentDirectory> is the configured permanent directory, while for <timestamp> you should either check the most recent log or take the timestamp from the front-end HTTP requests (the jobId parameter). Inside the log.xml file, look for a Java stacktrace close to the end of the file.
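Locating the right log can be scripted. In this Python sketch the directory layout is taken from the path above, and we assume the <timestamp> folder is named after the last segment of the jobId, as the example requests suggest; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import unquote


def job_log_path(permanent_dir: str, job_id_param: str) -> Path:
    """Map the (URL-encoded) jobId query parameter to the job's log.xml file."""
    # "export%2Fpdf%2F1663658402005-493" -> ["export", "pdf", "1663658402005-493"]
    segments = unquote(job_id_param).split("/")
    return Path(permanent_dir, "jobs", "status", *segments, "log.xml")


def latest_job_log(permanent_dir: str) -> Optional[Path]:
    """Return the most recently modified PDF export job log, if any."""
    logs = Path(permanent_dir, "jobs", "status", "export", "pdf").glob("*/log.xml")
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)
```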

If the job log doesn't include enough information to explain the problem, then you should enable debug logs:

  • go to the Logging administration section
  • filter the loggers by org.xwiki.export.pdf
  • set the log level to debug for the first entry
  • perform the PDF export again and check the new job log; it should contain more detailed information

Prerequisites & Installation Instructions

We recommend using the Extension Manager to install this extension (make sure that the text "Installable with the Extension Manager" is displayed at the top of this page to know whether this extension can be installed with the Extension Manager). Note that installing extensions while offline is currently not supported and would require a complex manual method.

You can also use the following manual method, which is useful if this extension cannot be installed with the Extension Manager or if you're using an old version of XWiki that doesn't have the Extension Manager:

  1. Log in to the wiki with a user having Administration rights
  2. Go to the Administration page and select the Import category
  3. Follow the on-screen instructions to upload the downloaded XAR
  4. Click on the uploaded XAR and follow the instructions
  5. You'll also need to install all dependent Extensions that are not already installed in your wiki

See the PDF Generator section above for the different generation modes that exist and how to configure them. You should also check the requirements of the mode you've chosen (or of the default mode if you haven't changed any configuration).

Dependencies

Dependencies for this extension (org.xwiki.platform:xwiki-platform-export-pdf-ui 15.0):
