LLM Internal Inference Server (BETA)
| An inference server for the LLM application running in XWiki itself. |
| Type | JAR |
| Category | API |
| Developed by | Matéo Munoz, Ludovic Dubost, Michael Hamann, Paul Pantiru |
| License | GNU Lesser General Public License 2.1 |
| Compatibility | XWiki 16.2.0+, but not 16.4.0, 16.4.1 or 16.5.0 due to XCOMMONS-3088. |
Description
This extension provides an inference server for the LLM Application that runs inside XWiki itself. It is based on DJL (Deep Java Library) and can currently run embedding models using PyTorch. To enable it, add a server configuration with an empty URL in the administration of the LLM Application, then configure embedding models to use this server. Currently, this extension only supports Linux on x86-64 CPUs.
The embedding model and the PyTorch implementation (around 500 MB) are stored in the cache/djl.ai/ directory inside XWiki's permanent directory. To change this location, set the DJL_CACHE_DIR system property or environment variable as explained in DJL's documentation on cache management. If the system property is empty, this extension sets it to a location inside XWiki's permanent directory, to avoid surprises and improve compatibility with different kinds of XWiki setups.
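The fallback described above can be sketched roughly as follows. This is only an illustration and not the extension's actual code: the class name, the method names and the example permanent directory /var/lib/xwiki/data are hypothetical, and the sketch only looks at the system property, leaving the handling of the environment variable to DJL.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: the class and method names are not part of the extension.
final class DjlCacheDirSketch {
    // Name of the system property documented by DJL for its cache location.
    static final String PROPERTY = "DJL_CACHE_DIR";

    // Default location: cache/djl.ai inside the given permanent directory.
    static Path defaultCacheDir(Path permanentDir) {
        return permanentDir.resolve("cache").resolve("djl.ai");
    }

    // Sets the system property only when it is unset or empty,
    // then returns the value that is in effect afterwards.
    static String applyDefault(Path permanentDir) {
        String current = System.getProperty(PROPERTY);
        if (current == null || current.isEmpty()) {
            current = defaultCacheDir(permanentDir).toString();
            System.setProperty(PROPERTY, current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Assumed permanent directory, used for illustration only.
        Path permanentDir = Paths.get("/var/lib/xwiki/data");
        System.out.println(PROPERTY + " = " + applyDefault(permanentDir));
    }
}
```

With this approach, a value passed explicitly to the JVM (for example with -DDJL_CACHE_DIR=/some/other/path) is left untouched, and the default only applies when nothing was configured.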
Prerequisites & Installation Instructions
We recommend using the Extension Manager to install this extension. To check whether this is possible, make sure that the text "Installable with the Extension Manager" is displayed at the top right of this page.
You can also use the manual method, which involves dropping the JAR file and all its dependencies into the WEB-INF/lib folder and restarting XWiki.
After installing this extension, you'll need to restart XWiki if you had already configured a server with an empty URL, as otherwise the newly available server type isn't recognized.
Dependencies
Dependencies for this extension (org.xwiki.contrib.llm:application-ai-llm-models-internal 0.8):
- org.xwiki.contrib.llm:application-ai-llm-models-api 0.8
- ai.djl.pytorch:pytorch-engine 0.28.0
- ai.djl.pytorch:pytorch-native-cpu:linux-x86_64 2.2.2
- ai.djl.pytorch:pytorch-jni 2.2.2-0.28.0
- ai.djl.pytorch:pytorch-model-zoo 0.28.0
- ai.djl:api 0.28.0
- ai.djl.huggingface:tokenizers 0.28.0