Offers XML and HTML/XHTML manipulation and cleaning APIs
Developed by: XWiki Development Team
License: GNU Lesser General Public License 2.1
Bundled with: XWiki Standard



  • XML Utility methods
  • HTML Utility methods
  • HTML Cleaner: cleans HTML and produces valid XHTML 1.1 content or, if configured (14.0-rc-1+), XHTML 5 content.
  • Factory to create optimised XMLReader instances. It adds a level of indirection over using javax.xml.parsers.SAXParserFactory directly. For example, it checks whether Xerces is the parser in use and, if so, configures it to cache parsed DTD grammars for better performance.
  • XMLAttributeValue class (12.8-rc-1+) to help add values to an XML attribute.
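
The indirection described for the XMLReader factory can be sketched with the JDK alone. This is a minimal illustration, not the module's actual factory: the class name ReaderFactorySketch and both method names are hypothetical, the Xerces check is a simple class-name heuristic, and the Xerces-specific grammar caching is left out.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/** Hypothetical sketch of a factory that hides SAXParserFactory from callers. */
final class ReaderFactorySketch {
    private ReaderFactorySketch() {
    }

    /** Creates a namespace-aware XMLReader through the standard JAXP factory. */
    static XMLReader createXMLReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            // Callers get a single unchecked failure instead of JAXP details.
            throw new IllegalStateException("Cannot create an XMLReader", e);
        }
    }

    /** Heuristic (an assumption): guesses from the class name whether the reader is Xerces-based. */
    static boolean looksLikeXerces(XMLReader reader) {
        return reader.getClass().getName().contains("xerces");
    }
}
```

Because callers only see createXMLReader(), parser-specific tuning such as a grammar cache could be added in one place when looksLikeXerces(...) returns true, without touching the calling code.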

HTML Cleaning

The HTML Cleaner works in two stages: it first uses the HTMLCleaner library to produce valid XML, then applies a series of transformations that turn the resulting XML into valid XHTML 1.1 content (see the test suite).


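import java.io.StringReader;
import org.junit.Assert;
import org.xwiki.component.embed.EmbeddableComponentManager;
import org.xwiki.xml.html.HTMLCleaner;
import org.xwiki.xml.html.HTMLUtils;

// Initialize the component manager so that component instances can be looked up
EmbeddableComponentManager componentManager = new EmbeddableComponentManager();
componentManager.initialize(this.getClass().getClassLoader());

// Clean an HTML fragment and serialize the resulting DOM document as a string
HTMLCleaner cleaner = componentManager.getInstance(HTMLCleaner.class);
String xhtml = HTMLUtils.toString(cleaner.clean(new StringReader("this <b>is</b> bold")));
Assert.assertEquals("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
    + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"\">\n"
    + "<html><head></head><body>"
    + "<p>this <strong>is</strong> bold</p>"
    + "</body></html>\n", xhtml);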
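
The "parse to XML, then apply a series of transformations" design can be illustrated without any XWiki classes. The sketch below is an assumption-laden stand-in, not the module's implementation: FilterChainSketch, DomFilter, rename and clean are hypothetical names, the input must already be well-formed XML (the real cleaner also repairs broken HTML), and the only transformation shown is renaming elements, such as the b to strong rewrite visible in the example above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of a DOM filter chain; not the XWiki HTMLCleaner. */
final class FilterChainSketch {
    /** Hypothetical filter contract: each filter mutates the DOM in place. */
    interface DomFilter {
        void filter(Document document);
    }

    /** Returns a filter that renames every element called {@code from} to {@code to}. */
    static DomFilter rename(String from, String to) {
        return document -> {
            // Copy the matches first: the NodeList is live and shrinks as nodes are renamed.
            NodeList live = document.getElementsByTagName(from);
            List<Node> matches = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                matches.add(live.item(i));
            }
            for (Node node : matches) {
                document.renameNode(node, null, to);
            }
        };
    }

    /** Parses well-formed XML, runs the filters in order, and serializes the result. */
    static String clean(String wellFormedXml, List<DomFilter> filters) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(wellFormedXml)));
            for (DomFilter filter : filters) {
                filter.filter(document);
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cleaning failed", e);
        }
    }
}
```

Keeping each transformation as a separate filter means new rules can be added or reordered without changing the parsing or serialization steps.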

To use the HTML Cleaner, you need the following dependency in your Maven pom.xml (available in Maven's Central Repository):
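
A declaration along these lines should work, assuming the module is published under the org.xwiki.commons group as xwiki-commons-xml. Both the coordinates and the commons.version property are assumptions to verify against the release you use:

```xml
<dependency>
  <!-- Assumed coordinates; check them against your XWiki Commons release -->
  <groupId>org.xwiki.commons</groupId>
  <artifactId>xwiki-commons-xml</artifactId>
  <!-- Placeholder property: define it in your pom or replace it with a concrete version -->
  <version>${commons.version}</version>
</dependency>
```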

