LLM Application (BETA)
Description | LLM Artificial Intelligence for search, content generation and editing in XWiki.
Type | XAR
Category | Application
Developed by | Matéo Munoz, Ludovic Dubost, Michael Hamann, Paul Pantiru
Active Installs | 0
License | GNU Lesser General Public License 2.1
Compatibility | 14.10 and above; 16.2.0 and above starting with version 0.3
Description
The LLM Application extension for XWiki is a tool that enriches the functionalities of XWiki, using LLM AI to provide a chat with an LLM assistant. In combination with the Index for the LLM Application, it can also provide answers based on the content of the wiki or other applications.
A dedicated chat UI can also be embedded in other applications that can authenticate with the help of the token-based authentication for the LLM Application.
Acknowledgment
Using the extension
To start using the interface, the first step is to add at least one working configuration (detailed here). Once that is done, there are two ways to launch the chat:
- LLM Application 0.3+ By opening the "LLM Application" entry in the Applications Panel.
- By editing any page in the wiki, and clicking on this button:
(Yes, this is the same icon as the "paste" function in the toolbar, since the LLM Application does not have a dedicated button icon to this day. If you already have the "paste" button activated in the toolbar, the LLM Application will still work.)
The interface should appear with an output similar to the example image below, and you should be able to work with it. If the button appears to be disabled, please refer to the Configuration section.
Features
The LLM Application bundles the following features:
- Interactive chat interface with the LLM models you want.
- Change models, allowing user flexibility to switch as needed.
- Quick task functionality: a menu including every task defined by users in the Prompt database.
- Option to incorporate model responses into an ongoing edited page or copy directly to the clipboard for later use.
- LLM Application 0.2+ User prompt and System prompt database available to users, to create custom tasks and configure them.
- Configuration – users can add as many different models as they need and modify parameters.
- LLM Application 0.3+ Augment chat requests with additional context from the Index for the LLM Application if that extension is installed.
Chat interface
The interface is organized to facilitate the use of the platform. It consists of two drop-down menus: one allows the selection of LLM models and the other lets you choose a specific prompt that corresponds to a specific task.
As you can see in the image above, models are described as follows: modelName (configurationTheyComeFrom). So you can have two models with the same name coming from two different configurations.
The prompt selector simply displays the labels of the prompts (which should describe the action they perform). More details on the use of prompts can be found in the Quick task section.
There are three buttons with specific functions: start a new conversation, submit a request, and stop an ongoing request if needed. The exchanges that occur afterward are visible in a specific window, where the user's messages and the model's responses appear in the form of discussion bubbles for better visualization.
LLM Application 0.2+ Only the submit button remains; the other buttons have been simplified into a dropdown menu like this:
This menu has a "New Conversation" entry that simply resets everything in the window (it erases the previous messages and the context in the background). At the bottom there is one new feature: Advanced Settings. Advanced Settings allows you to do two things: change the temperature (i.e., the level of creativity of the model) and edit the system prompt on the go. You can see the interface right below.
Change models
When you change models, the conversation you had with the previous model(s) is kept for the next exchanges. So you can use one model and switch to another without any consequences.
Quick task
This feature saves time and streamlines the user's experience, making the process more efficient and straightforward. Quick task eliminates unnecessary back-and-forth dialogue and delivers direct results that correspond to the chosen task. Simply put, the user chooses the task they want, and the system produces the desired output without further interaction. This enhanced user interface simplifies complex processes into a quick, one-step task. When you use one, the interface shows the corresponding prompt that was used in the background above your message (see the image below).
LLM Application <0.2 There is only a fixed list of quick tasks (Summarize, Completion, Translate, Auto-tagging).
LLM Application 0.2+ Since 0.2, the extension provides a database that you can fill with as many tasks as you want. They are then listed in the menu, where you can use them.
Interaction with the XWiki page
To interact with the edited page, you have two options.
First, you can use the include button located on the assistant's response. This will include the message in the edited page at the position of the cursor.
One limitation of the include button for now is that it cannot write to the XWiki source. This means that if you want the model to write text in XWiki syntax, the only choice you have is to copy/paste it into the source editor. That is why we implemented a copy button, so you can use any response in the way you want.
The copy button copies the message to the clipboard; what you do with it is up to you.
LLM Application 0.2+ Prompt Database
The user and system prompt database represents a structured manner of managing and organizing prompts for enhanced interaction and functionality.
Here, you can see that each entry has the following fields (a sketch after the list illustrates how the main fields could map onto a chat request):
- Title: defines the name of the prompt. This is the value displayed in the Quick Task menu.
- Description: provides a description of what the prompt is about.
- LLM Application 0.2.1+ Active: if checked, the prompt is displayed in the Quick Task menu; otherwise it is not.
- System prompt: The system prompt is the initial set of instructions that sets the boundaries for an AI conversation: what rules the assistant should follow, what topics to avoid, how the assistant should format responses, and more.
- User prompt: A text or instruction provided to guide the model's response generation. It sets the context and provides guidelines for producing relevant output.
- Temperature: Number between 0 and 2 describing the creativity of the model for its answer (0 being deterministic and 2 most creative).
- LLM Application 0.2.1+ Default: if checked the prompt will be used as default.
- Database list: not displayed in the database view; this has no purpose for users.
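To make these fields more concrete, here is a minimal sketch of how a prompt database entry's system prompt, user prompt and temperature could be combined into an OpenAI-style /chat/completions payload. The dictionary keys and the build_chat_payload helper are hypothetical names used for illustration only, not the extension's internal code.

```python
# Illustrative sketch only: the field names and payload shape are assumptions,
# not the LLM Application's internal implementation.
prompt_entry = {
    "title": "Summarize",
    "system_prompt": "You are a helpful assistant that summarizes wiki pages.",
    "user_prompt": "Summarize the following text:",
    "temperature": 0.2,
}

def build_chat_payload(entry: dict, selected_text: str, model_id: str) -> dict:
    """Combine a quick-task entry with the user's text into a chat request body."""
    return {
        "model": model_id,
        "temperature": entry["temperature"],
        "messages": [
            {"role": "system", "content": entry["system_prompt"]},
            {"role": "user", "content": f"{entry['user_prompt']}\n\n{selected_text}"},
        ],
    }

print(build_chat_payload(prompt_entry, "Some page content...", "gpt-4"))
```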
Configuration
To configure the extension correctly, you will have to follow these steps.
As you can see in this example, you can have as many configurations as you want. In your case, the configuration will initially be empty.
Server name
The server name stands for the name you want to give to the configuration. There are no rules on the name you give, just keep in mind not to give the same name to two different configurations, otherwise the first one (in the order of declaration) will not be considered anymore.
URL prefix
The URL prefix is where you define the prefix used to reach the final endpoints (/models and /chat/completions). It is really important to have a URL respecting the following rules (a small sanity-check sketch follows the list):
- It should respect this template: "http://.../" (or "https://.../"), where "..." has to be replaced with your real address.
- For example, to use the OpenAI models you should use the URL prefix "https://api.openai.com/v1/".
- It has to end with a "/" for requests to be valid.
- The endpoints "http://.../models" and "http://.../chat/completions" have to exist, otherwise it cannot work.
- LLM Application 0.5+ Leave the URL empty to create a server entry for running a model on the CPU of the XWiki server. Currently, this is only supported for embedding models on Linux with x86-64 CPU.
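Assuming the server exposes an OpenAI-compatible API as described above, a small script like the following sketch can check that a prefix follows these rules and that the /models endpoint answers. The PREFIX and TOKEN values are placeholders you would replace with your own.

```python
import json
import urllib.request

PREFIX = "https://api.openai.com/v1/"  # your URL prefix; must end with "/"
TOKEN = ""                             # fill in only if the server requires authentication

# The prefix must use http(s) and end with "/" so the endpoint paths append cleanly.
assert PREFIX.startswith(("http://", "https://")) and PREFIX.endswith("/")

headers = {"Authorization": f"Bearer {TOKEN}"} if TOKEN else {}
request = urllib.request.Request(PREFIX + "models", headers=headers)
with urllib.request.urlopen(request) as response:
    data = json.load(response)

# OpenAI-compatible servers list the available model IDs under "data".
print([model["id"] for model in data.get("data", [])])
```

If this prints the model IDs you expect, the prefix is usable, and these are the IDs you can reference in the configuration.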
LLM Application <0.3
Configuration file
The configuration file has to be filled with at least one model ID, and the models have to be compatible with the /chat/completions endpoint! For example, with the OpenAI configuration you can see that the 'Configuration File' field is filled with "gpt-4,gpt-3.5-turbo". In the end, the interface will display these two models in the selection menu.
Token
The token field is used in case you need a token to authenticate yourself when making a request, so it is not always necessary to fill it in. In the example image above you can see that the OpenAI configuration has a token, but the LocalAI one (a private server with LLM models stored locally) does not need any, so its token field is left empty.
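When a token is needed, OpenAI-compatible servers typically expect it as a Bearer token in the Authorization header. The sketch below shows how such a token is commonly used in a /chat/completions request; the PREFIX, TOKEN and MODEL_ID values are placeholders, and this is an illustration of the authentication mechanism rather than the extension's own code.

```python
import json
import urllib.request

PREFIX = "https://api.openai.com/v1/"  # your URL prefix
TOKEN = "sk-..."                       # the value you would put in the Token field
MODEL_ID = "gpt-3.5-turbo"             # a model served by this prefix

payload = {
    "model": MODEL_ID,
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
request = urllib.request.Request(
    PREFIX + "chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Omit this header entirely for servers that need no authentication.
        "Authorization": f"Bearer {TOKEN}",
    },
    method="POST",
)
with urllib.request.urlopen(request) as response:
    reply = json.load(response)

print(reply["choices"][0]["message"]["content"])
```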
Can Stream box
The 'Can Stream' option is here in case you want to use models that cannot answer through a streaming API. In that case, you will have to create a configuration with those models and leave the 'Can Stream' box unchecked. You will then have to wait for the response to be finalized before receiving it all at once.
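For reference, on OpenAI-compatible servers the difference between the two modes is essentially the "stream" flag in the request body: with streaming enabled the answer arrives incrementally as "data: {...}" server-sent-event lines, while without it the full JSON response arrives at once. The following is a minimal sketch with placeholder values, not the extension's implementation.

```python
import json
import urllib.request

PREFIX = "https://api.openai.com/v1/"  # placeholder prefix
TOKEN = "sk-..."                       # placeholder token

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Write a haiku about wikis."}],
    "stream": True,  # with False, the whole answer is returned in a single JSON body
}
request = urllib.request.Request(
    PREFIX + "chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": f"Bearer {TOKEN}"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    for raw_line in response:
        line = raw_line.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip empty keep-alive lines between events
        chunk = line[len("data:"):].strip()
        if chunk == "[DONE]":
            break
        delta = json.loads(chunk)["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)
```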
LLM Application 0.2+ Group Allowed
This field allows you to choose which groups of users are allowed to use each configuration. If a configuration has no group defined, nobody will be able to use it.
LLM Application 0.3+
Model Configuration
The model and group list aren't part of the server configuration anymore. Instead, you now need to configure each model separately. This is to provide more control for each individual model and to allow providing different versions of a model with different context collections configured. You can access the model configuration below the chat interface in the LLM application.
Every model that shall be available for users or for embedding in the Index for the LLM Application needs to be configured. The name that you set when creating the model configuration is the name that is displayed to the user when selecting the model. Make sure to choose a descriptive name that is understood by the users like "Fast, with context", "Slow, no context", etc. For each model, you can configure the following properties:
- Server name: the server that provides the model, as configured in the general configuration.
- Type: The type of model; for chat models, select "Large language model", for embedding models "Embeddings model".
- Model ID: This is the ID of the model on the server. This doesn't influence the name that is displayed to users of the chat interface. LLM Application 0.5+ For local inference on the XWiki server (server with empty URL), this is the name of the PyTorch text embedding model on Huggingface. A "Default" model with sentence-transformers/all-MiniLM-L6-v2 is provided as an example; other models could work, too, but are untested.
- Number of Dimensions: For embedding models, this is the number of dimensions that the embedding has, e.g., 768. You can leave this empty for chat models.
- Context size: The maximum context size in terms of tokens. Used to limit the chat history to this context size.
- Allow guests to access the model: Whether guest users can use this model. Use this carefully as this might incur high costs if you don't pay attention.
- Group: The user groups that are allowed to use this model. Note that for embedding models it is important to allow both the users that are creating documents in the Index for the LLM Application and the users who perform queries by using the chat interface.
- Filters: Provides configuration for various filters that can be provided by extensions; extensions can add further filters as documented in the LLM Models API documentation.
- Context: when the Index for the LLM Application is installed, the context can be configured. The current chat message is embedded using the embedding model configured for the selected collections and the most similar content chunks from these collections are provided to the LLM as context. This allows the LLM to more accurately answer user requests based on relevant content from the collections. While this can reduce hallucinations and wrong responses, the LLM could still make up facts including references to context that doesn't actually exist, and even when the provided context is relevant it might ignore or misunderstand it and could still provide wrong answers. LLM Application 0.7+ In addition to the embedding-based similarity search, a regular keyword-based search is executed. The results of both searches are combined. The number of results can be configured individually for both.
- Collections: the collections that shall be queried, empty if the request shouldn't be augmented with context.
- Search results limit: how many results shall be considered. Note that a result is just a single chunk, not a whole document. There could be several chunks of the same document.
- LLM Application 0.7+ Keyword search results limit (number): the maximum number of results returned by the keyword search can now be configured in addition to the similarity search results limit (number).
- Context prompt: The prompt that is used to instruct the model to use the context. The text {{search_results}} is replaced by the actual search results (a sketch after this list illustrates how the placeholders could be filled in).
- LLM Application 0.7+ Chunk Template: The template that is used for displaying a search result in the context. The placeholders {{url}}, {{index}} and {{content}} are replaced by the URL, index (1-based position in the result list), and content of the chunk, respectively. If the template is empty, a default template will be used.
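To illustrate the placeholder mechanism of the Context prompt and Chunk Template, here is a minimal sketch. The prompt texts and search results below are made up for the example, and the rendering logic is only an approximation of the idea, not the extension's actual code.

```python
# Hypothetical context prompt and chunk template using the documented placeholders.
context_prompt = "Answer using only the following search results:\n{{search_results}}"
chunk_template = "[{{index}}] {{url}}\n{{content}}\n"

# Made-up search result chunks; in the extension these come from the configured collections.
search_results = [
    {"url": "https://wiki.example.org/Main/Page1", "content": "First relevant chunk."},
    {"url": "https://wiki.example.org/Main/Page2", "content": "Second relevant chunk."},
]

rendered_chunks = []
for index, result in enumerate(search_results, start=1):
    rendered = (
        chunk_template
        .replace("{{index}}", str(index))     # 1-based position in the result list
        .replace("{{url}}", result["url"])
        .replace("{{content}}", result["content"])
    )
    rendered_chunks.append(rendered)

# The joined chunks replace {{search_results}} in the context prompt sent to the LLM.
print(context_prompt.replace("{{search_results}}", "\n".join(rendered_chunks)))
```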
Error
If you have an empty selection menu, or if you have a System message stating that an error happened when trying to get the model list, look carefully at your configuration and make sure you did every step described above correctly.
LLM Application 0.3+ Whenever an extension has been installed, upgraded or uninstalled, the list of LLMs is unfortunately empty until the next restart of XWiki. This is a known bug in XWiki. The only known workaround is to restart the wiki after the extension operation.
Prerequisites & Installation Instructions
We recommend using the Extension Manager to install this extension (make sure that the text "Installable with the Extension Manager" is displayed at the top right of this page to know whether this extension can be installed with the Extension Manager). Note that installing extensions while offline is currently not supported and you'd need to use some complex manual method.
You can also use the following manual method, which is useful if this extension cannot be installed with the Extension Manager or if you're using an old version of XWiki that doesn't have the Extension Manager:
- Log in to the wiki with a user having Administration rights
- Go to the Administration page and select the Import category
- Follow the on-screen instructions to upload the downloaded XAR
- Click on the uploaded XAR and follow the instructions
- You'll also need to install all dependent Extensions that are not already installed in your wiki
Dependencies
Dependencies for this extension (org.xwiki.contrib.llm:application-ai-llm-models-ui 0.7):