LLM Internal Inference Server (BETA)

Last modified by Admin on 2026/03/19 00:30

An inference server for the LLM application running in XWiki itself.
Type: JAR
Category: API
Developed by

Matéo Munoz, Ludovic Dubost, Michael Hamann, Paul Pantiru

License: GNU Lesser General Public License 2.1
Compatibility

XWiki 16.2.0+, but not 16.4.0, 16.4.1 or 16.5.0 due to XCOMMONS-3088.


Installable with the Extension Manager

Description

This extension provides an inference server for the LLM Application that runs inside XWiki itself. It is based on DJL (Deep Java Library) and currently supports running embedding models with PyTorch. To enable it, add a server configuration with an empty URL in the administration of the LLM Application, then configure embedding models to use this server. Currently, this extension only supports Linux on x86-64 CPUs.

The embedding model and the PyTorch implementation (around 500MB) are stored in the cache/djl.ai/ directory inside XWiki's permanent directory. This can be changed by setting the DJL_CACHE_DIR system property or environment variable as explained in DJL's documentation on cache management. If the system property is empty, this extension sets it to be inside XWiki's permanent directory to avoid surprises and improve compatibility with different kinds of setups of XWiki.
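The defaulting behaviour described above can be sketched roughly as follows. This is an illustration only, not the extension's actual code: the class name `DjlCacheDirDefault`, its methods, and the example permanent directory path are all hypothetical, and only the `DJL_CACHE_DIR` property name and the `cache/djl.ai` location come from the description above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; the real extension may implement this differently.
public final class DjlCacheDirDefault {
    // Property name taken from the description above.
    static final String PROPERTY = "DJL_CACHE_DIR";

    private DjlCacheDirDefault() {
    }

    // Returns the configured value if it is non-empty,
    // otherwise <permanentDir>/cache/djl.ai.
    static String resolve(String configured, Path permanentDir) {
        if (configured != null && !configured.isEmpty()) {
            return configured;
        }
        return permanentDir.resolve("cache").resolve("djl.ai").toString();
    }

    // Applies the default to the JVM system property and returns the value in effect.
    static String applyDefault(Path permanentDir) {
        String value = resolve(System.getProperty(PROPERTY), permanentDir);
        System.setProperty(PROPERTY, value);
        return value;
    }

    public static void main(String[] args) {
        // Assumed location of XWiki's permanent directory, for illustration only.
        Path permanentDir = Paths.get("/var/lib/xwiki/data");
        System.out.println(applyDefault(permanentDir));
    }
}
```

To choose a different location yourself, one option is to pass the system property when starting the JVM that runs XWiki, for example `-DDJL_CACHE_DIR=/path/of/your/choice`.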

Prerequisites & Installation Instructions

We recommend using the Extension Manager to install this extension. To check whether this is possible, make sure that the text "Installable with the Extension Manager" is displayed at the top right of this page.

You can also install manually, which involves dropping the JAR file and all its dependencies into the WEB-INF/lib folder and restarting XWiki.


If you had already configured a server with an empty URL before installing this extension, restart XWiki after the installation; otherwise, the newly available server type is not recognized.


Dependencies

Dependencies for this extension (org.xwiki.contrib.llm:application-ai-llm-models-internal 0.8):

  • org.xwiki.contrib.llm:application-ai-llm-models-api 0.8
  • ai.djl.pytorch:pytorch-engine 0.28.0
  • ai.djl.pytorch:pytorch-native-cpu:linux-x86_64 2.2.2
  • ai.djl.pytorch:pytorch-jni 2.2.2-0.28.0
  • ai.djl.pytorch:pytorch-model-zoo 0.28.0
  • ai.djl:api 0.28.0
  • ai.djl.huggingface:tokenizers 0.28.0
