XML Module
Last modified by Thomas Mortagne on 2013/07/17 15:19
Offers XML and HTML/XHTML manipulation and cleaning APIs |
Type | JAR |
License | GNU Lesser General Public License 2.1 |
Bundled With | XWiki Enterprise, XWiki Enterprise Manager |
Description
Features
- XML Utility methods
- HTML Utility methods
- HTML Cleaner: cleans HTML and produces valid XHTML 1.1 content
- Factory for creating optimised XMLReader instances. It adds a level of indirection over using javax.xml.parsers.SAXParserFactory directly. For example, it checks whether the underlying parser is Xerces and, if so, configures it to cache parsed DTD grammars for better performance.
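The indirection described in the last item can be sketched with plain JAXP. This is a minimal illustration and not the module's actual factory: the class name `XMLReaderFactorySketch` and its methods are hypothetical, and the Xerces-specific grammar caching is only marked by a comment because it depends on Xerces classes that may not be on the classpath.

```java
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of a single entry point for obtaining XMLReader instances.
public class XMLReaderFactorySketch {

    // Assumption: a Xerces-based reader can be recognised by its implementation class name.
    public static boolean isXerces(XMLReader reader) {
        return reader.getClass().getName().toLowerCase().contains("xerces");
    }

    // Callers use this method instead of touching SAXParserFactory themselves.
    public static XMLReader createXMLReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        if (isXerces(reader)) {
            // Parser-specific tuning, such as caching parsed DTD grammars,
            // would be applied here. It is left out of this sketch.
        }
        return reader;
    }

    // Small usage helper: counts the elements of an XML string with a reader from the factory.
    public static int countElements(String xml) throws Exception {
        AtomicInteger count = new AtomicInteger();
        XMLReader reader = createXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                count.incrementAndGet();
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count.get();
    }
}
```

Keeping the parser setup in one place means that any tuning is applied consistently, and the calling code does not need to know which SAX implementation is in use.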
HTML Cleaning
The HTML Cleaner works in two stages: it first uses the HTMLCleaner library to produce valid XML, then applies a series of transformations that turn the resulting XML into valid XHTML 1.1 content (see the test suite).
Example:
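import java.io.StringReader;
import org.junit.Assert;
import org.xwiki.component.embed.EmbeddableComponentManager;
import org.xwiki.xml.html.HTMLCleaner;
import org.xwiki.xml.html.HTMLUtils;

// Initialize the XWiki component manager so that component instances can be looked up
EmbeddableComponentManager componentManager = new EmbeddableComponentManager();
componentManager.initialize(this.getClass().getClassLoader());
// Look up the HTML Cleaner component and clean an HTML fragment
HTMLCleaner cleaner = componentManager.lookup(HTMLCleaner.class);
String xhtml = HTMLUtils.toString(cleaner.clean(new StringReader("this <b>is</b> bold")));
// The fragment is wrapped in a complete document and <b> is replaced by <strong>
Assert.assertEquals("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
+ "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
+ "<html><head></head><body>"
+ "<p>this <strong>is</strong> bold</p>"
+ "</body></html>\n", xhtml);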
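The example output shows `<b>` being replaced by `<strong>`. As an illustration of what one transformation in the cleaning chain might look like, here is a minimal sketch using only the JDK's DOM API. It is not the module's implementation: the class `BoldToStrongSketch` and its methods are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch of a single DOM transformation step.
public class BoldToStrongSketch {

    // Parses a well-formed XML string into a DOM document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Renames every <b> element to <strong>, modifying the document in place.
    public static void replaceBoldWithStrong(Document document) {
        NodeList bolds = document.getElementsByTagName("b");
        // Copy the matches first so that renaming does not interfere with iteration.
        List<Node> snapshot = new ArrayList<>();
        for (int i = 0; i < bolds.getLength(); i++) {
            snapshot.add(bolds.item(i));
        }
        for (Node bold : snapshot) {
            document.renameNode(bold, null, "strong");
        }
    }
}
```

A chain of small steps like this one, each fixing a single construct, is easier to test in isolation than one large rewrite of the document.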
To use the HTML Cleaner, you need the following dependency in your Maven pom.xml (available in Maven's Central Repository):
<dependency>
  <groupId>org.xwiki.commons</groupId>
  <artifactId>xwiki-commons-xml</artifactId>
  <version>3.2-milestone-3</version>
</dependency>