Batch Import Application
Wizard to perform an Excel / CSV import with Batch Import API in an application created with App Within Minutes. |
Type | XAR |
Category | |
Developed by | Anca Paula Luca, Ludovic Dubost, slauriere, mouhb |
Active Installs | 61 |
Rating | |
License | GNU Lesser General Public License 2.1 |
Description
This application provides a default interface for the Batch Import API. This interface allows to import data from a file into an application created with App Within Minutes.
How to use
Go on the Home page of the application created with AppWithinMinutes (from the "Applications" panel on the left) and add, at the end of the URL ?sheet=BatchImport.BatchImportSheet . This will launch the import wizard in the context of the current application, and any data related to the wizard (like the source file attachments and persisted configuration) will be stored in the application WebHome.
for versions 1.4+:
However, the method above will cause issue . Starting with Batch Import version 1.4, there is an alternative for creating an import configuration for an application created with AppWithinMinutes, which is the following: with an advanced user, create a terminal page under the homepage of the AppWithinMinutes application (with any name). Save this page and then, in view mode, add ?sheet=BatchImport.BatchImportSheet at the end of the URL of this page. This will launch the import wizard in the context of the current application, and any data related to the wizard (like the source file attachments and persisted configuration) will be stored in this terminal configuration page created under the root of the application. On subsequent imports (if any), the ?sheet=BatchImport.BatchImportSheet parameter doesn't need to be explicitly added at the end of the URL, the import wizard will be displayed whenever this terminal configuration page is viewed. Also, after the first initialization of this page (upload of a file and configuration of a mapping, even without completing the import) this terminal configuration page created for this import can be moved elsewhere in the wiki and used from there.
Then, regardless of the used method, once the import wizard is displayed, follow the steps of the wizard to perform your batch import of data.
Configuration
Each BatchImport can be configured either visually via the included wizard (but not all configuration options are present visually), or directly by editing the attached BatchImportClass object. The table below describes all configuration options.
Option | Description |
---|---|
Class name | Name of the target class to be used for creating objects from column values. |
Mapping | This option defines the mapping between column names in the input file and class properties. Note that data types should match, i.e. the columns that you choose for numeric, date or boolean fields should be convertible to those types. A property can be mapped to column "Constant:value": in that case, each property value will be set to "value". A property can also be mapped to "Computed:rowIndex": in that case, each property value will be set to the current row index. The mapping "doc.file" can be used to attach a file to each target document, or to convert an Office file to a document. |
Metadata file name | Name of the file to be imported. |
File import | When this option is activated, the BatchImport will use the Office Importer to convert the files present in column mapped to "doc.file" into XWiki documents. |
Filter styles | This option has an effect only when the option "File import" is activated. It controls the filtering of styles during the conversion of Office documents to XWiki pages. |
File upload | When this option is activated, the files referred in the column mapped to "doc.file" get attached to the target documents. XWiki 2.4+ For developers: additional settings, not configurable from the UI, allow to control how the values mapped to doc.file are interpreted upon import:
These 2 new properties can be configured only when controlling the usage of the Batch Import API and need to be set directly in the BatchImportConfiguration object, as a map entry (there is no API of the configuration object to set them. |
Space | Name of the space where the documents should be stored. This option is not required. |
Data file name | Directory or zip file on the server where the referenced files are stored. |
Data file prefix | The path inside the zip file named by the data file name where the files are stored. |
Field for tags | Name of the column to be used for associating tags to each document. Note: for now, a limitation exists where the items named here are not imported as tags unless they are also mapped to another property in the class and imported on that property. |
List separator | If a file column is mapped on an application field of type list with "multiple select" checked, the import system will try to interpret the value in the field as a potential list of values rather than a single value. The splitting of the value will be done using this character as enumeration separator. By default, it is ',' (comma). |
Default date format | By default, if a column in the file is mapped on an application field of type date, the values in the column will try to be interpreted using the format specified in the date field. However, if you have values that are using a different format, you can specify here the fallback format to be used for values which cannot be interpreted using the format specified in the date field. Note that the format applies to individual values, not to a whole column (if a value on a column cannot be interpreted using the date format of the field, only that value will be interpreted using the format specified here, not all the values on that column). The format should given as a pattern by the rules described in the Java SimpleDataFormat class documentation. |
Clear names | Whether the values found in the file that are used for space and document names should be cleaned before being used: whitespaces are removed and special characters as well. Activate this option if characters that might break the document URLs (e.g. slashes, backslashes, pluses, etc) might appear in the values of the columns chosen for document names and document spaces. Note that, if this option is activated, the deduplication options apply to the cleaned names. |
XWiki 2.5+ Clear file names | Whether the values found in the file that are used for attachment names should be cleaned before being used: whitespaces are removed and special characters as well. |
Honor empty values | If the file to be imported contains empty cell values, these values can either be ignored or taken into account and, in that case, override not empty values in the target document, when it exists already. This affects document properties (title, parent, content) and object properties. Activate this option if empty values should override not empty ones. This option is available since version 2.3. In previous versions, empty values were always honoured for document data, and ignored for object properties. |
Aside from these configurations, some additional parameters of the import are available only when using directly the API, as they need to be setup programatically in the configuration object passed to the import function. See the documentation of the Batch Import API for more information about these.
Deleting data (imported) from an application
If you need to delete all the data in the application (produced by the import or not):
- you can use ?sheet=BatchImport.BatchImportDeleteSheet at the end of the URL of the application home page or
- on recent versions of XWiki deleting all the data of the application is a standard feature for an application created with App Within Minutes, so you can use that button from the application homepage.
Note that all users which have the right to create documents in the application space or delete documents in the application space will be able to perform import or delete of data: batch import wizard does not enforce any rights, it only provides automation of import / delete.
Importing values to XWiki objects lists
If the source file (excel or csv) contains columns mapped to properties that are lists in the XWiki class, depending on how the lists are configured in these XWiki classes, there may be a need for a transformation of the imported texts from values of the lists to keys of those same lists. In order to do so, an additional extension can be installed in order to obtain this behaviour on import, the Batch import list identifier post-processor .
Example
Here is an example of the import, performed on data based on https://www.danceireland.ie/content/files/Resource-room-library-catalogue.xls, during which we will import a library catalogue from excel to an application in the wiki. Since the initial excel file was designed for print, we touched it a bit to make it suitable for import. The modified version is .
Define app structure
First, we create the application with a text field for the author, a number field for the number of copies in store, a date field for the publication year (whose format we configure to display only year), and a boolean field to mark whether the item can be loaned or can only be consulted in the library. We will store the title of the publication in the document title, and the id of the publication in the page name (since id is unique and page name is also an identifier of the item).
Import XLS file in app
After creating the application page, as explained in the "How to use" section above, we proceed to the import by adding at the end of the URL ?sheet=BatchImport.BatchImportSheet . In the first step of the wizard, we upload the xls file and select it:
Map the file columns
In the second step of the wizard, we can choose the mapping between the fields of the XWiki class of the application and the columns of the file to import (note that, compared to the initial xls, we moved the DVD sheet in the first position, so we're importing DVDs now):
Preview the mapping
Then, a preview of the mapping is displayed (the first ten rows), to show how will the data from the file be processed and parsed according to the types of the fields of the XWiki Class (since conversion needs to be possible automatically):
Data deduplication & overwrite
Upon confirming of the mapping, we will be able to choose the overwrite options for the import: since, in the mapping step, we chose a file column to be mapped on the document name, we need to specify now how should the import behave when two rows in the file have the same value for the document name column. Also, if a page with the same name already exists in the wiki, we need to specify how should the import wizard behave in such situation.
Import simulation
The next step is an import simulation for the whole source file, showing what will happen during the actual import:
Perform the import
And finally, after the import simulation we can actually perform the import (using the button at the bottom of the import simulation log) and see the data imported in the application, on the application home table:
Import a CSV file
To show error handling and some special configurations of the import, we will now import the "Guides and Publications" from the excel in the same application and, since it has pretty irregular dates, we will import it as a CSV file this time.
First we export the Guides and Publications sheet in a CSV file from the Excel tool we use (resulting
) and then we upload it in the wizard as we did for the previous file:Upon column mapping we can see that the old mapping was stored, as much as possible, and displayed as defaults for this new import (the column names for author and publication date differ from the DVDs to Guides and Publication tab).
Since the dates of the Guides and publications are irregular, as we can see in this screenshot of the excel tool:
... we will also set the date format in the column mapping screen, thus helping the import tool parse these "non-conform" dates:
On the mapping preview step we only see the first 10 lines, so we cannot see the dates being parsed correctly as "MMM-yy":
... but on the simulation step we can see that the only errors are coming from 4 rows which really have unparsable dates:
Note that these rows with errors are completely ignored on import, but could be imported in a second import, removing the date mapping and configuring overwrite so that the existing documents from the previous import are not modified, or they could be corrected in the initial excel so that they can be imported along with the rest of the file.
After the perform of the import in the last step of the wizard, we can see that the couple of dates in the "MMM-yy" format we saw in the initial excel are properly parsed to the corresponding dates in the wiki:
Finally, in the home page of our library application, we can see all the data, DVD and Guides and Publications together:
Prerequisites & Installation Instructions
We recommend using the Extension Manager to install this extension (Make sure that the text "Installable with the Extension Manager" is displayed at the top right location on this page to know if this extension can be installed with the Extension Manager). Note that installing Extensions when being offline is currently not supported and you'd need to use some complex manual method.
You can also use the following manual method, which is useful if this extension cannot be installed with the Extension Manager or if you're using an old version of XWiki that doesn't have the Extension Manager:
- Log in the wiki with a user having Administration rights
- Go to the Administration page and select the Import category
- Follow the on-screen instructions to upload the downloaded XAR
- Click on the uploaded XAR and follow the instructions
- You'll also need to install all dependent Extensions that are not already installed in your wiki
- Go to wiki Administration -> Localization and add BatchImport.Translations to the i18n bundles.
Release Notes
v2.6
v2.5
v2.4
v2.3
v2.2
v2.1
v2.0
v1.4
v1.1
- Update parent POM to XWiki Commons 4.5.4
Dependencies
Dependencies for this extension (org.xwiki.contrib:xwiki-batchimport-application 2.6):