Solr Search Application

Last modified by Admin on 2017/11/29 19:51

Allows searching on the wiki using Apache Solr

Type: XAR
Developed by: XWiki Development Team
License: GNU Lesser General Public License 2.1
Bundled With: XWiki Enterprise

Installable with the Extension Manager

Description

searchPage.png

Starting with XWiki 5.1 the default search engine is based on Apache Solr.

XWiki used Lucene as the default search engine up until version 5.1 RC1. You can enable Solr on these older versions from the Search administration section. You also have to re-index the content of your wiki manually, because the Solr search module did not support automatic indexing prior to version 5.1.

User Interface

By default the search returns document results sorted by relevance. Depending on the selected result type (see the Result Type facet below), you can also sort the results by:

  • document title, last modification date or last author, if the results are documents

    searchDocumentSort.png

  • file name, file size, upload date or uploader, if the results are attachments

    searchAttachmentSort.png

Besides documents and attachments you can also search for objects and object properties:

searchObjectResult.png

searchObjectPropertyResult.png

Each search result highlights the places where the search keywords have been found. Only one match is displayed initially but you can view all the matches by clicking on the "Highlight all matches" link.

searchHighlighting.png

On the right side you have the search facets that will help you drill down the search results. The displayed facets are always relative to the current search results. You can see that the list changes when you select a facet.

searchFacets

The following facets are available:

  • Result Type: filters results based on their type (documents, attachments, objects and object properties); this determines the index fields that are used and the available sort fields; the document type is selected by default
  • Wiki: filters the results from the selected wiki; this facet is currently displayed only on the main wiki and only if you have multiple wikis
  • Location: filters the results based on their location

     searchLocationFacet.png

  • Language: filters the document results that match the selected language; in case of attachments, objects and object properties, it filters the results based on the language of the document that holds them

     searchLanguageFacet

  • Last Author: filters the document results based on their last author

     searchUserFacet

  • Creator: filters the document results based on their creator
  • Last Modification Date: filters the results based on the last modification date of the corresponding document

     searchDateFacet

    The date facets (last modification date, creation date and upload date) display a list of predefined date intervals and offer the possibility to specify a custom interval. If you specify only one end point of the interval, the filter matches all dates after/before the specified date.

  • Creation Date: filters the results based on the creation date of the corresponding document
  • Object Type: filters results based on the type of object (XClass) they have. E.g. "documents that have Blog Posts", "Panel objects", "properties of Java Script Extension"

     searchObjectTypeFacet

  • File Type: filters results based on the attachment file type. E.g. "documents that have attached images", "text attachments"

     searchFileTypeFacet.png

    File types are grouped by category. You can select both an entire category and a specific file type. Categories can be expanded/collapsed.

  • Uploaded By: filters the results based on the user that uploaded the attachments
  • Upload Date: filters the results based on the date when the attachments were uploaded
  • File Size: filters the results based on the size (in bytes) of the attachments

     searchFileSizeFacet.png

    You can choose between 4 ranges:

    • tiny (less than 10KB)
    • small (between 10KB and 500KB)
    • medium (between 500KB and 5MB)
    • large (more than 5MB)

The number of displayed facet values is limited to 5 by default but you can see the rest of the values by clicking on the link following the facet values (each click will show 5 more values). You can select multiple values and the selection is preserved if you submit a new search query.

At the bottom of the search results you can find a link to an RSS feed that provides the most recent results that match the current search query and filters. It contains the same type of information that was included in the old Lucene search RSS feed.

Starting with 7.1M1 the search UI is responsive: it adapts to the screen size. On small screens (phones) the list of search facets is displayed collapsed, before the search results. You can expand it with one tap.

searchPageMobile.png

Search Syntax

By default, the Solr search engine used in XWiki parses the search query (what you type in the search input) using the Extended DisMax Query Parser. See the Solr documentation for the full syntax. Here are some examples:

  • Search for word "foo" in the title field
    title:foo
  • Search for phrase "foo bar" in the title field
    title:"foo bar"
  • Search for phrase "foo bar" in the title field AND the phrase "quick fox" in the content field
    title:"foo bar" AND doccontent:"quick fox"
  • Search for either the phrase "foo bar" in the title field AND the phrase "quick fox" in the content field, or the word "fox" in the title field
    (title:"foo bar" AND doccontent:"quick fox") OR title:fox
  • Search for word "foo" and not "bar" in the title field
    title:foo -title:bar
  • Search for any word that starts with "foo" in the title field
    title:foo*
  • Search for any word that starts with "foo" and ends with "bar" in the title field
    title:foo*bar
  • Note that Solr doesn't support suffix matching: title:*foo
  • Range query
    date:[20020101 TO 20030101]
    number:[* TO 100]
    number:[100 TO *]
    number:[* TO *]
  • Pure negative queries (all clauses prohibited) are allowed
    // finds all documents where hidden is not false
    -hidden:false

    // finds all documents without a value for field
    -field:[* TO *]
  • Boosting fields
    (title:foo OR title:bar)^1.5 (doccontent:foo OR doccontent:bar)

You should also check the schema of the XWiki Solr Index to see what fields can be used in the Solr queries. If you want to query the XWiki Solr Index programmatically then see the Solr Query API available in XWiki.
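When a query like the ones above is assembled from user input rather than typed by hand, characters that have a special meaning in the query syntax need to be escaped first. The following is a minimal sketch, not part of XWiki: the helper names `escapeTerm` and `fieldQuery` are hypothetical, and the list of special characters is assumed from the standard Lucene/Solr query syntax.

```javascript
// Characters assumed to be special in the Lucene/Solr query syntax.
const SPECIAL_CHARS = /[+\-&|!(){}\[\]^"~*?:\\\/]/g;

// Escape a single term by prefixing each special character with a backslash.
function escapeTerm(text) {
  return text.replace(SPECIAL_CHARS, '\\$&');
}

// Build a "field:value" clause. Multiple words become a quoted phrase,
// in which only quotes and backslashes are escaped.
function fieldQuery(field, text) {
  const words = text.trim().split(/\s+/);
  if (words.length > 1) {
    const phrase = words.join(' ').replace(/(["\\])/g, '\\$1');
    return `${field}:"${phrase}"`;
  }
  return `${field}:${escapeTerm(words[0])}`;
}

// Combine two clauses, as in the title/doccontent example above.
const query = [
  fieldQuery('title', 'foo bar'),
  fieldQuery('doccontent', 'quick fox')
].join(' AND ');
```

Here `query` reproduces the third example in the list, built from two plain strings.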

Search Debug Mode

If you want to debug the search you can add &debug=true to the search URL query string. You'll get the following information:

  • the query parser used
  • the parsed query (see which index fields are used and what is their priority/boost)
  • the filter queries (see which filters/facets are applied and their values)
  • processing time by search component
  • the score for each search result and the way it was computed

searchDebug.png
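If you build search links in a script, the debug parameter can be appended with the standard URL API. This is a small sketch; the search URL used below is a made-up example and `withDebug` is a hypothetical helper name.

```javascript
// Return the given search URL with debug=true added to its query string.
function withDebug(searchUrl) {
  const url = new URL(searchUrl);
  url.searchParams.set('debug', 'true');
  return url.toString();
}

// Hypothetical search URL; substitute the URL of your own search page.
const debugUrl = withDebug('http://localhost:8080/xwiki/bin/view/Main/Search?text=foo');
```

Using `searchParams.set` keeps the existing parameters (here `text=foo`) and avoids adding the flag twice.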

Search UI Options

Since 7.1 it's possible to enable/disable highlighting and faceting. Both are expensive operations, so disabling them when you don't need them can noticeably speed up the search UI.

searchOptions.png

Search Administration Section

The default folder that stores the Solr index is <permanent folder>/solr but you can change it by adding the property solr.embedded.home to the file "WEB-INF/xwiki.properties".

In the "Solr search administration" section, choose the action you wish to perform on your wiki:

  • add documents to the Solr index
  • remove documents from the Solr index
  • re-index the wiki

SolrActions.png

Then click on "Apply".

If you have programming rights you may also limit the list of documents that will be affected by the selected action using a custom query. You can use either XWiki Query Language (XWQL) or Hibernate Query Language (HQL).

SolrCustomQuery.png

The documents are indexed asynchronously, in a background thread, so the "Apply" button only triggers the selected action. You can see how many documents are in the indexing queue and the estimated remaining time. You can use the search function right away but the search results will contain only documents that have been indexed so far.

SolrIndexQueueStatus.png

Advanced Search Suggest Sources

The Search Suggest feature retrieves live search results from various configurable sources. Each source specifies the search engine to use and the search query. The search doesn't perform well, at least on Solr, if only the search query is used: each query is different (because the input text is different), so the cache is not used efficiently. It is better to rely on the filter cache, but for this we need to be able to specify the filter query.

Starting with 5.4.2 (and 6.0M1) you are able to specify more advanced search parameters in the search query and they will be passed directly to the search engine. As an example, the following statement from the 'query' property of a Search Suggest Source

type:DOCUMENT AND (title:(__INPUT__*) OR name:(__INPUT__*))

can be written as

fq=type:DOCUMENT
qf=title^2 name

This way

  • you will be using the filter query which is the same for all search requests to this Search Suggest Source so it will be cached by Solr
  • you will be able to specify the boost for each field you want to search in
  • the query statement used is '__INPUT__' by default, if not specified.

In order to preserve backward compatibility with existing Solr Search Suggest Sources, we use the following convention:

 A line that doesn't start with 'xxx=' specifies the query statement; in other words, existing Solr Search Suggest Sources are specifying only the query statement.

For example:

foo __INPUT__* bar
fq=type:DOCUMENT
qf=title^2 name

means the query statement is 'foo __INPUT__* bar', which is equivalent to:

q=foo __INPUT__* bar
fq=type:DOCUMENT
qf=title^2 name

See the Solr Search Query API documentation for details on what parameters you can pass to the search engine.
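The line convention described above can be sketched as a small parser. This is an illustration only, not the code XWiki uses: `parseSuggestQuery` is a hypothetical name, and letting an explicit 'q=' line take precedence over a bare statement line is an assumption of this sketch.

```javascript
// Parse the 'query' property of a Search Suggest Source into parameters.
// Lines of the form 'name=value' are parameters; any other non-empty line
// is treated as (part of) the query statement.
function parseSuggestQuery(source) {
  const params = {};
  const statement = [];
  for (const rawLine of source.split('\n')) {
    const line = rawLine.trim();
    if (line === '') continue;
    const match = /^(\w+(?:\.\w+)*)=(.*)$/.exec(line);
    if (match) {
      // A parameter such as 'fq' may appear more than once.
      (params[match[1]] = params[match[1]] || []).push(match[2]);
    } else {
      statement.push(line);
    }
  }
  if (!params.q) {
    // Assumption: fall back to the bare statement, then to '__INPUT__'.
    params.q = [statement.length > 0 ? statement.join(' ') : '__INPUT__'];
  }
  return params;
}

const legacy = parseSuggestQuery('foo __INPUT__* bar\nfq=type:DOCUMENT\nqf=title^2 name');
const explicit = parseSuggestQuery('q=foo __INPUT__* bar\nfq=type:DOCUMENT\nqf=title^2 name');
const defaults = parseSuggestQuery('fq=type:DOCUMENT\nqf=title^2 name');
```

With this sketch `legacy` and `explicit` carry the same parameters, and `defaults.q` is the default statement '__INPUT__'.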

For Developers

What is Solr?

Apache Solr is a search engine. You index a set of documents (e.g. wiki pages) and then you ask Solr to return the set of documents that match the user query.

What is a search index?

The fastest way to retrieve pages in a book related to a keyword is by scanning the index at the back of a book, not by searching every word of every page of the book. If we translate this to XWiki, performing a database search is not the best way to search for some keyword. There is a better way.

This type of index is called an inverted index, because it inverts a page-centric data structure (page->words) to a keyword-centric data structure (word->pages).

bookIndex.jpeg
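The inversion from page->words to word->pages can be shown in a few lines. This is a toy sketch for illustration (page names and texts are made up); a real index such as Solr's also stores positions, frequencies and more.

```javascript
// Build a word -> set of page names map from a page name -> text map.
function buildInvertedIndex(pages) {
  const index = new Map();
  for (const [pageName, text] of Object.entries(pages)) {
    // Very naive tokenization: lower-case runs of letters and digits.
    for (const word of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
      if (!index.has(word)) {
        index.set(word, new Set());
      }
      index.get(word).add(pageName);
    }
  }
  return index;
}

// Hypothetical pages used only for this example.
const pages = {
  'Main.WebHome': 'Welcome to the wiki',
  'Sandbox.WebHome': 'The sandbox is a wiki playground'
};
const index = buildInvertedIndex(pages);
const pagesWithWiki = [...index.get('wiki')];
```

Looking up a keyword is now a single map access (`index.get('wiki')`) instead of a scan over every page.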

How does Solr represent data?

  • In Solr, a document is the unit of search and index.
  • An index consists of one or more documents, and a document consists of one or more fields.
  • In database terminology, a document corresponds to a table row, and a field corresponds to a table column.

You can view the Solr index as a database with a single table.

Solr Schema

The Solr schema describes the list of document fields, their type and how to index and search each of them.

Fields can use basic field types (int, boolean, date, string) or complex field types which combine one tokenizer with multiple filters. E.g.:

<analyzer>
 <tokenizer class="solr.StandardTokenizerFactory"/>
 <filter class="solr.SnowballPorterFilterFactory"/>
 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
  1. Input: "flip flipped flipping"
  2. After Tokenizer: "flip", "flipped", "flipping"
  3. After Snowball: "flip", "flip", "flip"
  4. After Remove Duplicates: "flip"
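
The four steps above can be mimicked with a small pipeline. This is a sketch only: `toyStem` is a deliberately naive suffix stripper standing in for the real Snowball stemmer, and the duplicate removal is a plain set-based simplification of what the Solr filter does.

```javascript
// Step 2: split the input into lower-case tokens.
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || [];

// Step 3: toy stemmer (NOT Snowball). Strips a trailing "ing" or "ed" and,
// if something was stripped, collapses a doubled final consonant.
function toyStem(token) {
  const stripped = token.replace(/(ing|ed)$/, '');
  if (stripped === token) {
    return token;
  }
  return stripped.replace(/([^aeiou])\1$/, '$1');
}

// Step 4: keep the first occurrence of each token.
const removeDuplicates = (tokens) => [...new Set(tokens)];

// The whole chain: tokenizer followed by the two filters.
const analyze = (text) => removeDuplicates(tokenize(text).map(toyStem));

const terms = analyze('flip flipped flipping');
```

Applying the same chain at index time and at query time is what lets a search for "flipping" match a page that only contains "flipped".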

The text is analysed at index time but also at query time. Each field type needs to specify the index analyzer and the query analyzer. Most of the time they are the same, but there are cases where we want them to be different.

Defining a field

Here's what a field declaration looks like:

<field name="id" type="text" indexed="true" stored="true" multiValued="true"/>
  • name: the name of the field
  • type: the field type (controls how the field is analysed at index / query time)
  • indexed: whether the field should be added to the inverted index or not
  • stored: whether the original value of this field should be stored or not (required for highlighting)
  • multiValued: can this field have multiple values?

You cannot search for a field that is not indexed and you cannot access from the search result the value of a field that was not stored.

Example of a field that is not stored: title_sort. At the moment, XWiki's Solr schema doesn't have fields that are not indexed.

Solr Search Relevancy

Solr (actually Lucene under the hood) uses a scoring model known as tf.idf. This model involves a number of scoring factors:

  • Term Frequency (tf): The frequency with which a term appears in a document. Given a search query, the higher the term frequency, the higher the document score.
  • Inverse Document Frequency (idf): The rarer a term is across all documents in the index, the higher its contribution to the score.
  • Coordination Factor (coord): The more query terms that are found in a document, the higher its score.
  • Field length (fieldNorm): The more words a field contains, the lower the score of a match in that field. This factor penalizes documents with longer field values.

In addition to the scoring factors mentioned above, the primary method of modifying document scores is boosting. There are two kinds of boosts: index-time and query-time. Index-time boosts are applied when adding documents, and apply to the entire document or to specific fields. Query-time boosts are applied when constructing a search query, and apply to specific fields.

Query boosts are applied by appending the caret character ^ followed by a positive number to query clauses.

title:foo OR (title:foo AND title:bar)^2.0 OR title:"foo bar"^10
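
To see how the scoring factors and query boosts described above interact, here is a simplified sketch. It is loosely modelled on Lucene's classic TF-IDF similarity; the exact formulas (square root for tf, logarithm for idf, inverse square root for the field norm) are assumptions of this sketch, and query normalization is left out entirely.

```javascript
const tf = (freq) => Math.sqrt(freq);
const idf = (docFreq, numDocs) => 1 + Math.log(numDocs / (docFreq + 1));
const fieldNorm = (numTerms) => 1 / Math.sqrt(numTerms);

// clauses: [{ term, boost }], docTokens: tokens of one field of one document,
// allDocs: the token arrays of every document in the (toy) index.
function score(clauses, docTokens, allDocs) {
  let sum = 0;
  let matched = 0;
  for (const { term, boost = 1 } of clauses) {
    const freq = docTokens.filter((t) => t === term).length;
    if (freq === 0) continue;
    matched++;
    const docFreq = allDocs.filter((d) => d.includes(term)).length;
    sum += tf(freq) * idf(docFreq, allDocs.length) ** 2 * boost * fieldNorm(docTokens.length);
  }
  // coord: fraction of the query clauses found in this document.
  return (matched / clauses.length) * sum;
}

// Hypothetical title fields of three documents.
const docs = [
  ['foo', 'bar'],
  ['foo', 'baz', 'qux', 'quux'],
  ['baz', 'qux']
];
const clauses = [{ term: 'foo' }, { term: 'bar' }];
const scores = docs.map((d) => score(clauses, d, docs));
```

In this toy index the first document ranks highest: it matches both terms (coord), "bar" is rare (idf) and its field is short (fieldNorm). Passing `{ term: 'foo', boost: 2 }` raises the score of every document that contains "foo".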

Solr Request Handler and Components

  • Request handlers are responsible for accepting search queries, performing searches and returning the results.
  • They actually delegate the work to a series of components
    •  query, facet, moreLikeThis, highlight, stats, debug, Spell Checking, suggester
  • Request handlers and components are configured in solrconfig.xml. This configuration can be overridden using query parameters.

Solr and XWiki

  • Solr is the default search engine
  • We index wiki pages, attachments, objects and object properties
  • The index is updated by:
    • Start-up sync between the database and the Solr index
    • saving/deleting XWiki entities
    • manual trigger from the Solr Search Administration Section
  • Solr can be used embedded (default) or as an external service
  • Exposed through the QueryManager (like hql or xwql)
  • Used by search suggest and the main search page

Search UI Configuration

Starting with 5.4.2 (and 6.0M1) all configuration parameters for the Solr Search UI can be found in Main.SolrSearchConfig. This simplifies the process of customizing the search UI. 

For example you can find in it the default weights used for the various search fields:

{{velocity output="false"}}
#set ($__defaultSolrConfig = {
  'queryFields': {
    'DOCUMENT': 'title^10.0 name^10.0
                 doccontent^2.0
                 objcontent^0.4 filename^0.4 attcontent^0.4 doccontentraw^0.4
                 author_display^0.08 creator_display^0.08
                 comment^0.016 attauthor_display^0.016 spaces^0.016',
    'ATTACHMENT': 'filename^5.0 attcontent attauthor_display^0.2',
    'OBJECT': 'objcontent',
    'OBJECT_PROPERTY': 'propertyvalue'
  },
[...]
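
Each 'queryFields' value follows the "field^boost" notation, with fields separated by whitespace. The sketch below parses such a string into a field-to-weight map, which can be handy when reviewing or generating a configuration. `parseQueryFields` is a hypothetical helper, and treating a missing boost as 1 is an assumption based on the usual Solr default.

```javascript
// Parse a queryFields string such as 'filename^5.0 attcontent' into
// an object mapping each field name to its numeric boost.
function parseQueryFields(queryFields) {
  const weights = {};
  for (const entry of queryFields.trim().split(/\s+/)) {
    const [field, boost] = entry.split('^');
    // Assumption: a field without an explicit boost has weight 1.
    weights[field] = boost === undefined ? 1 : parseFloat(boost);
  }
  return weights;
}

// The ATTACHMENT value from the configuration shown above.
const attachmentWeights = parseQueryFields('filename^5.0 attcontent attauthor_display^0.2');
```

For the ATTACHMENT value this shows at a glance that a match in the file name weighs five times more than a match in the attachment content.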

It also allows application developers to easily create a dedicated search page for their application data. As an example, we updated the FAQ application to use the new configuration parameters:

{{include reference="XWiki.SearchCode"/}}

{{velocity output="false"}}
#if ($searchEngine == 'solr')
  ## Customize the Solr Search UI for the FAQ application.
  #set ($solrConfig = {
    'queryFields': 'title^3 property.FAQCode.FAQClass.answer',
    'facetFields': ['creator', 'creationdate', 'author', 'date', 'mimetype', 'attauthor', 'attdate', 'attsize'],
    'filterQuery': [
      'type:DOCUMENT',
      "wiki:$xcontext.database",
      "space_exact:$doc.space",
      'class:FAQCode.FAQClass'
    ]
  })
#end
{{/velocity}}

{{velocity}}
{{include reference="$searchPage"/}}
{{/velocity}}

Sorting on Object Properties

#set ($solrConfig = {
  ...
  'sortFields': {
    'DOCUMENT': {
      'property.XWiki.AverageRatingsClass.averagevote_sortFloat': 'desc'
    },
    ...

See the XWiki Solr Index documentation for details.
Don't forget to add a translation (alias) for the new sort field in Main.SolrTranslations.

Faceting on Object Properties

The Solr Search Query API describes how you can add a facet that is based on an object property. Starting with 5.4.2 (and 6.0M1) you can achieve the same using the Search UI configuration page, Main.SolrSearchConfig:

...
'facetFields': ['type', ..., 'attsize', 'property.Blog.BlogPostClass.publishDate_date'],
...

For example, if you wish to add a "Tag" facet for filtering on tags, you can add the property.XWiki.TagClass.tags_string facet field value as in:

...
  'facetFields': ['type', 'wiki', 'space_facet', 'locale', 'author', 'creator', 'date',
    'creationdate', 'class', 'name_exact', 'mimetype', 'attauthor', 'attdate', 'attsize', 'property.XWiki.TagClass.tags_string'],
...

The facet displayer used for an XClass property is determined based on the property type, and can also be configured using either the facetDisplayers or the facetDisplayersByPropertyType configuration parameter. Note that you can even create your own custom facet displayer. Take a look at the existing facet displayers like Main.SolrUserFacet or Main.SolrFileSizeFacet. A facet displayer is a wiki page, with the displayer code in the page content.

Translations

Translations for the default Solr search page can be found in Main.SolrTranslations.

JSON Service

You can send requests to the Solr search service (6.3M2+) from your JavaScript code to get search results in JSON format:

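require(['jquery'], function($) {
  // URL of the Solr suggest service page, using the 'get' action.
  var solrServiceURL = new XWiki.Document('SuggestSolrService', 'XWiki').getURL('get');
  $.post(solrServiceURL, {
    outputSyntax: 'plain',
    media: 'json',
    // One search parameter per line, as for a Search Suggest Source.
    query: [
      'q=*__INPUT__*',
      'fq=type:DOCUMENT',
      'fq=class:XWiki.XWikiUsers',
      'qf=property.XWiki.XWikiUsers.last_name^10 property.XWiki.XWikiUsers.first_name^5 name^2.5'
    ].join('\n'),
    // The text typed by the user.
    input: $('.userPicker').val()
  }).done(function(results) {
    // 'results' is the array of matching items described below.
    console.log(results);
  });
});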

The response is an array with items that include all the information indexed by Solr. An item looks like this:

{
 "id": "xwiki:XWiki.Admin_",
 "hidden": false,
 "wiki": "xwiki",
 "spaces": ["XWiki"],
 "name": "Admin",
 "locale": "",
 "language": "",
 "type": "DOCUMENT",
 "fullname": "XWiki.Admin",
 "title_": "Profile of Administrator ",
 "doccontentraw_": "",
 "doccontent_": "",
 "version": "3.1",
 "doclocale": "",
 "locales": ["", "en"],
 "lang": ["", "en"],
 "author": "xwiki:XWiki.Admin",
 "author_display": "Administrator",
 "creator": "xwiki:XWiki.Admin",
 "creator_display": "Administrator",
 "creationdate": 1108463850000,
 "date": 1414057869000,
 "property.Dashboard.UserDashboardPreferencesClass.displayOnMainPage_boolean": [false],
 "object.Dashboard.UserDashboardPreferencesClass__": ["false"],
 ...
}

So you have access to all the document metadata, including objects and attachments.
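
Once the items are available, they can be reduced to just the fields a UI needs. The following is a sketch that relies only on field names visible in the sample item above; `summarize` is a hypothetical helper and the sample array is a trimmed copy of that item.

```javascript
// Reduce the items returned by the service to a compact summary.
function summarize(items) {
  return items
    .filter((item) => item.type === 'DOCUMENT')
    .map((item) => ({
      reference: `${item.wiki}:${item.fullname}`,
      // Fall back to the page name when the title is empty.
      title: (item.title_ || item.name).trim(),
      lastAuthor: item.author_display
    }));
}

// Trimmed copy of the sample item shown above.
const sampleItems = [{
  wiki: 'xwiki',
  name: 'Admin',
  type: 'DOCUMENT',
  fullname: 'XWiki.Admin',
  title_: 'Profile of Administrator ',
  author_display: 'Administrator'
}];
const summary = summarize(sampleItems);
```

Keeping this step a pure function makes it easy to reuse in the callback that receives the service response.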

The service has two main parameters: query and input. See the Advanced Search Suggest Sources section for more details on how to use them.

Miscellaneous

  •  You can restrict the list of wikis that are searchable by default from the main wiki by defining the following Velocity variable in a page that includes Main.SolrSearch:
    #set ($wikisSearchableFromMainWiki = ["wiki1", "wiki2", "wiki3"])

Prerequisites & Installation Instructions

We recommend using the Extension Manager to install this extension. To know whether this extension can be installed that way, check that the text "Installable with the Extension Manager" is displayed at the top right of this page. Note that installing extensions while offline is currently not supported; in that case you need to use a more complex manual method.

You can also use the following manual method, which is useful if this extension cannot be installed with the Extension Manager or if you're using an old version of XWiki that doesn't have the Extension Manager:

  1. Log in to the wiki as a user with Administration rights
  2. Go to the Administration page and select the Import category
  3. Follow the on-screen instructions to upload the downloaded XAR
  4. Click on the uploaded XAR and follow the instructions
  5. You'll also need to install all dependent Extensions that are not already installed in your wiki

Dependencies

Dependencies for this extension (org.xwiki.platform:xwiki-platform-search-solr-ui 9.10.1):

Tags: solr search
Created by Eduard Moraru on 2013/02/05 17:42
    
