Batch Import Application

Last modified by Mohamed Boussaa on 2024/07/29 00:14

cogWizard to perform an Excel / CSV import with Batch Import API in an application created with App Within Minutes.
TypeXAR
Category
Developed by

Anca Paula Luca, Ludovic Dubost, slauriere, mouhb

Active Installs61
Rating
0 Votes
LicenseGNU Lesser General Public License 2.1

Installable with the Extension Manager

Description

This application provides a default interface for the Batch Import API. This interface allows to import data from a file into an application created with App Within Minutes.

This application needs the Batch Import API to be installed and its features (import of Excel and/or CSV) depend on the configuration of the Batch Import API.

How to use

Go on the Home page of the application created with AppWithinMinutes (from the "Applications" panel on the left) and add, at the end of the URL ?sheet=BatchImport.BatchImportSheet . This will launch the import wizard in the context of the current application, and any data related to the wizard (like the source file attachments and persisted configuration) will be stored in the application WebHome.

for versions 1.4+:
However, the method above will cause issue . Starting with Batch Import version 1.4, there is an alternative for creating an import configuration for an application created with AppWithinMinutes, which is the following: with an advanced user, create a terminal page under the homepage of the AppWithinMinutes application (with any name). Save this page and then, in view mode, add ?sheet=BatchImport.BatchImportSheet at the end of the URL of this page. This will launch the import wizard in the context of the current application, and any data related to the wizard (like the source file attachments and persisted configuration) will be stored in this terminal configuration page created under the root of the application. On subsequent imports (if any), the ?sheet=BatchImport.BatchImportSheet parameter doesn't need to be explicitly added at the end of the URL, the import wizard will be displayed whenever this terminal configuration page is viewed. Also, after the first initialization of this page (upload of a file and configuration of a mapping, even without completing the import) this terminal configuration page created for this import can be moved elsewhere in the wiki and used from there.

Then, regardless of the used method, once the import wizard is displayed, follow the steps of the wizard to perform your batch import of data.

Configuration

Each BatchImport can be configured either visually via the included wizard (but not all configuration options are present visually), or directly by editing the attached BatchImportClass object. The table below describes all configuration options.

OptionDescription
Class nameName of the target class to be used for creating objects from column values.
MappingThis option defines the mapping between column names in the input file and class properties. Note that data types should match, i.e. the columns that you choose for numeric, date or boolean fields should be convertible to those types. A property can be mapped to column "Constant:value": in that case, each property value will be set to "value". A property can also be mapped to "Computed:rowIndex": in that case, each property value will be set to the current row index. The mapping "doc.file" can be used to attach a file to each target document, or to convert an Office file to a document.
Metadata file nameName of the file to be imported. 
File importWhen this option is activated, the BatchImport will use the Office Importer to convert the files present in column mapped to "doc.file" into XWiki documents.
Filter stylesThis option has an effect only when the option "File import" is activated. It controls the filtering of styles during the conversion of Office documents to XWiki pages.
File upload

When this option is activated, the files referred in the column mapped to "doc.file" get attached to the target documents.

XWiki 2.4+

For developers: additional settings, not configurable from the UI, allow to control how the values mapped to doc.file are interpreted upon import:

  • docfileislist - whether the value should be interpreted as a list, using the configured list separator of the import, false by default.
  • clearfilenames - whether a cleanup should be performed on the file names before import as attachments of pages, true by default. Setting this to false imports the files with the exact name as it appears in the input file. Starting with version 2.5 this option is also available for configuration from the UI.

These 2 new properties can be configured only when controlling the usage of the Batch Import API and need to be set directly in the BatchImportConfiguration object, as a map entry (there is no API of the configuration object to set them.

SpaceName of the space where the documents should be stored. This option is not required.
Data file nameDirectory or zip file on the server where the referenced files are stored.
Data file prefixThe path inside the zip file named by the data file name where the files are stored.
Field for tagsName of the column to be used for associating tags to each document.
Note: for now, a limitation exists where the items named here are not imported as tags unless they are also mapped to another property in the class and imported on that property.
List separatorIf a file column is mapped on an application field of type list with "multiple select" checked, the import system will try to interpret the value in the field as a potential list of values rather than a single value. The splitting of the value will be done using this character as enumeration separator. By default, it is ',' (comma).
Default date formatBy default, if a column in the file is mapped on an application field of type date, the values in the column will try to be interpreted using the format specified in the date field. However, if you have values that are using a different format, you can specify here the fallback format to be used for values which cannot be interpreted using the format specified in the date field. Note that the format applies to individual values, not to a whole column (if a value on a column cannot be interpreted using the date format of the field, only that value will be interpreted using the format specified here, not all the values on that column). The format should given as a pattern by the rules described in the Java SimpleDataFormat class documentation.
Clear namesWhether the values found in the file that are used for space and document names should be cleaned before being used: whitespaces are removed and special characters as well. Activate this option if characters that might break the document URLs (e.g. slashes, backslashes, pluses, etc) might appear in the values of the columns chosen for document names and document spaces. Note that, if this option is activated, the deduplication options apply to the cleaned names.
XWiki 2.5+ Clear file namesWhether the values found in the file that are used for attachment names should be cleaned before being used: whitespaces are removed and special characters as well.
Honor empty valuesIf the file to be imported contains empty cell values, these values can either be ignored or taken into account and, in that case, override not empty values in the target document, when it exists already. This affects document properties (title, parent, content) and object properties. Activate this option if empty values should override not empty ones. This option is available since version 2.3. In previous versions, empty values were always honoured for document data, and ignored for object properties.

Aside from these configurations, some additional parameters of the import are available only when using directly the API, as they need to be setup programatically in the configuration object passed to the import function. See the documentation of the Batch Import API for more information about these.

Deleting data (imported) from an application

If you need to delete all the data in the application (produced by the import or not):

  • you can use ?sheet=BatchImport.BatchImportDeleteSheet at the end of the URL of the application home page or 
  • on recent versions of XWiki deleting all the data of the application is a standard feature for an application created with App Within Minutes, so you can use that button from the application homepage.

Note that all users which have the right to create documents in the application space or delete documents in the application space will be able to perform import or delete of data: batch import wizard does not enforce any rights, it only provides automation of import / delete.

Importing values to XWiki objects lists

If the source file (excel or csv) contains columns mapped to properties that are lists in the XWiki class, depending on how the lists are configured in these XWiki classes, there may be a need for a transformation of the imported texts from values of the lists to keys of those same lists. In order to do so, an additional extension can be installed in order to obtain this behaviour on import, the Batch import list identifier post-processor .

Example

Here is an example of the import, performed on data based on https://www.danceireland.ie/content/files/Resource-room-library-catalogue.xls, during which we will import a library catalogue from excel to an application in the wiki. Since the initial excel file was designed for print, we touched it a bit to make it suitable for import. The modified version is here .

Define app structure

First, we create the application with a text field for the author, a number field for the number of copies in store, a date field for the publication year (whose format we configure to display only year), and a boolean field to mark whether the item can be loaned or can only be consulted in the library. We will store the title of the publication in the document title, and the id of the publication in the page name (since id is unique and page name is also an identifier of the item).

createApplication.png

Import XLS file in app

After creating the application page, as explained in the "How to use" section above, we proceed to the import by adding at the end of the URL ?sheet=BatchImport.BatchImportSheet . In the first step of the wizard, we upload the xls file and select it:

selectFile.png

selectFile-Uploaded.png

The application doesn't support Excel files of newer standard with .xlsx extension and won't show them in the list. It will only show in the list Excel files of Excel 1997-2003 standard with the .xls extension.

Map the file columns

In the second step of the wizard, we can choose the mapping between the fields of the XWiki class of the application and the columns of the file to import (note that, compared to the initial xls, we moved the DVD sheet in the first position, so we're importing DVDs now):

columnMapping.png

Preview the mapping

Then, a preview of the mapping is displayed (the first ten rows), to show how will the data from the file be processed and parsed according to the types of the fields of the XWiki Class (since conversion needs to be possible automatically): 

mappingPreview.png

Data deduplication & overwrite

Upon confirming of the mapping, we will be able to choose the overwrite options for the import: since, in the mapping step, we chose a file column to be mapped on the document name, we need to specify now how should the import behave when two rows in the file have the same value for the document name column. Also, if a page with the same name already exists in the wiki, we need to specify how should the import wizard behave in such situation.

overwriteOptions.png

Import simulation

The next step is an import simulation for the whole source file, showing what will happen during the actual import:

importSimulation.png

If you encounter a 504 Time-out error page at this step, please use smaller file to import, or use the Batch Import API application to perform the import.

Perform the import

And finally, after the import simulation we can actually perform the import (using the button at the bottom of the import simulation log) and see the data imported in the application, on the application home table:

importedData.png

If you encounter a 504 Time-out error page at this step, it doesn't mean that the import fails, just that it is running in the background. Please look at the logs on your server or use a tool like Admin Tools Application to check when the import will be complete.

Import a CSV file

To show error handling and some special configurations of the import, we will now import the "Guides and Publications" from the excel in the same application and, since it has pretty irregular dates, we will import it as a CSV file this time. 

If the excel import does not work as expected, especially for numbers and dates, CSV export and import of the CSV file in the wiki can help, since it forces the serialization to string of some values which otherwise will be displayed differently than they are stored by in the excel file, thus misleading the user about the actual data format.

First we export the Guides and Publications sheet in a CSV file from the Excel tool we use (resulting this file) and then we upload it in the wizard as we did for the previous file:

importCsv.png

Upon column mapping we can see that the old mapping was stored, as much as possible, and displayed as defaults for this new import (the column names for author and publication date differ from the DVDs to Guides and Publication tab).

mappingSave.png

Since the dates of the Guides and publications are irregular, as we can see in this screenshot of the excel tool:

excelData.png

... we will also set the date format in the column mapping screen, thus helping the import tool parse these "non-conform" dates:

dateFormat.png

On the mapping preview step we only see the first 10 lines, so we cannot see the dates being parsed correctly as "MMM-yy":

mappingPreviewGP.png

... but on the simulation step we can see that the only errors are coming from 4 rows which really have unparsable dates:

importSimulationWithError.png

Note that these rows with errors are completely ignored on import, but could be imported in a second import, removing the date mapping and configuring overwrite so that the existing documents from the previous import are not modified, or they could be corrected in the initial excel so that they can be imported along with the rest of the file.

After the perform of the import in the last step of the wizard, we can see that the couple of dates in the "MMM-yy" format we saw in the initial excel are properly parsed to the corresponding dates in the wiki:

importedDateFormat.png

Finally, in the home page of our library application, we can see all the data, DVD and Guides and Publications together:

allImportedData.png

Prerequisites & Installation Instructions

We recommend using the Extension Manager to install this extension (Make sure that the text "Installable with the Extension Manager" is displayed at the top right location on this page to know if this extension can be installed with the Extension Manager). Note that installing Extensions when being offline is currently not supported and you'd need to use some complex manual method.

You can also use the following manual method, which is useful if this extension cannot be installed with the Extension Manager or if you're using an old version of XWiki that doesn't have the Extension Manager:

  1. Log in the wiki with a user having Administration rights
  2. Go to the Administration page and select the Import category
  3. Follow the on-screen instructions to upload the downloaded XAR
  4. Click on the uploaded XAR and follow the instructions
  5. You'll also need to install all dependent Extensions that are not already installed in your wiki
  • Go to wiki Administration -> Localization and add BatchImport.Translations to the i18n bundles.

If you want to import excel files, don't forget to install properly the Excel Plugin.

Release Notes

v2.6

v2.5

v2.4

v2.3

v2.2

v2.1

v2.0

v1.4

v1.1

  • Update parent POM to XWiki Commons 4.5.4

Dependencies

Dependencies for this extension (org.xwiki.contrib:xwiki-batchimport-application 2.6):

Get Connected