Introduction

The iText® XMLWorker is a package for transforming XML files to PDF. Parsing XML was already possible with iText, and many developers used that capability to convert simple HTML/XHTML snippets to PDF, but its CSS support was limited. The new XMLWorker offers better CSS support. Its initial target is XHTML with CSS2 as produced by WYSIWYG editors such as TinyMCE or CKEditor. It does not end there: XMLWorker can parse any kind of XML and apply CSS to it, although this requires a specific implementation of the XML tags and/or CSS styles.

Current Limitations

The XMLWorker was initially created to parse snippets, and absolute positioning is not yet supported. As a result, it is currently not possible to, for instance, surround everything with borders. For the current CSS limitations, see CSS Support.

Parsing XHTML/CSS snippets

XHTML snippets can be parsed with the default implementation for converting HTML to PDF. See Examples for usage.

Supported tags

Tag | Comment | Supported attributes
xml | if available, used for parsing the charset | encoding
html | ignored |
head | ignored |
title | if a document is available, the title is set with document.addTitle(title) |
meta | parses http-equiv="Content-Type" and the charset | http-equiv, content
script | ignored |
style | parsed and added to CSS processing |
link | CSS is parsed and added to global styles | type, href
body | direct content in body is added |
a | supported | href, name
br | supported |
div | direct content in div is added |
h1 to h6 | supported |
p | supported |
span | supported |
img | supported | src, width, height
hr | supported |
ul, ol, li | supported |
dfn, dl, dt | supported |
table | supported; nested tables not supported | width
tr | supported |
td, th | supported | width, rowspan, colspan
thead, tfoot, tbody | supported |
caption | caption element of a table is supported |
sub | supported |
sup | supported |
small, big | supported |
b, strong | supported |
u, ins | supported |
i, cite, em, var, dfn, address | supported |
pre, tt, code, kbd, samp | supported |
s, strike, del | supported |
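
The tag list above can serve as a pre-flight check before a snippet is handed to the parser. The following is a minimal sketch of that idea, not part of XMLWorker: the class name SupportedTagCheck and the method unlistedTags are hypothetical, and the regular expression is a deliberately naive scan for opening tags rather than a real XML parser.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; it is not part of the XMLWorker API. */
class SupportedTagCheck {

    // Tag names copied from the "Supported tags" list, including the tags
    // that are listed as ignored (html, head, script).
    private static final Set<String> LISTED_TAGS = new HashSet<>(Arrays.asList(
        "html", "head", "title", "meta", "script", "style", "link", "body",
        "a", "br", "div", "h1", "h2", "h3", "h4", "h5", "h6", "p", "span",
        "img", "hr", "ul", "ol", "li", "dfn", "dl", "dt", "table", "tr", "td",
        "th", "thead", "tfoot", "tbody", "caption", "sub", "sup", "small",
        "big", "b", "strong", "u", "ins", "i", "cite", "em", "var", "address",
        "pre", "tt", "code", "kbd", "samp", "s", "strike", "del"));

    // Matches the name of an opening tag such as "<p", "<h1" or "<img".
    // Closing tags, comments and the <?xml ...?> declaration do not match.
    private static final Pattern OPEN_TAG = Pattern.compile("<([A-Za-z][A-Za-z0-9]*)");

    /** Returns the tag names in the snippet that do not appear in the list, in order of first use. */
    static Set<String> unlistedTags(String snippet) {
        Set<String> unlisted = new LinkedHashSet<>();
        Matcher m = OPEN_TAG.matcher(snippet);
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (!LISTED_TAGS.contains(name)) {
                unlisted.add(name);
            }
        }
        return unlisted;
    }
}
```

Calling SupportedTagCheck.unlistedTags("<p>Hi <blink>x</blink></p>") returns a set containing only blink. What the parser does with such a tag depends on its configuration, so the result is best treated as a warning rather than an error.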

Known issues

The implementation is not yet complete. A couple of areas still need to be fixed or improved and are being worked on.
Not all CSS may behave as expected: CSS is large, and not every possible combination has been implemented and tested.
JavaScript is ignored entirely at the moment.
The character encoding of the provided snippet's content is not yet taken into account; we are working on that.
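
Until the encoding issue is resolved, one possible workaround is to decode the bytes yourself and hand the parser a Reader that already uses the right charset. The sketch below assumes the caller knows the snippet's encoding (UTF-8 is only an example) and that the parser accepts a java.io.Reader, as in the Examples section. The class SnippetReaders and its methods are hypothetical names, not XMLWorker API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical helper for illustration; it is not part of the XMLWorker API. */
class SnippetReaders {

    /**
     * Wraps the raw bytes in a Reader that decodes them with an explicit charset.
     * new InputStreamReader(in) without a charset falls back to the platform default.
     */
    static Reader open(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    /** Drains a reader into a String; used here only to make the effect visible. */
    static String readAll(Reader reader) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this in place, the reader passed to the parser could be created as SnippetReaders.open(bis, StandardCharsets.UTF_8) instead of new InputStreamReader(bis).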

CSS Support

n = not supported, f = fully supported, s = partially supported

Columns:
Property: the CSS property (CSS2/3)
Text: CSS properties applicable to text
Tables: CSS properties applicable to tables (table, td, tr)
List: CSS properties applicable to lists (ul, ol, li)
Image: CSS properties applicable to images (img)
Property | Text | Tables | List | Image
background | | | |
background-attachment | n | n | n |
background-color | f | f | n |
background-image | n | n | n |
background-position | n | n | n |
background-repeat | n | n | n |
border | n | f | n | n
border-bottom | n | f | n | n
border-bottom-color | n | f | n | n
border-bottom-style | n | s | n | n
border-bottom-width | n | f | n | n
border-color | n | f | n | n
border-collapse | | n - always collapsed | |
border-left | n | f | n | n
border-left-color | n | f | n | n
border-left-style | n | s | n | n
border-left-width | n | f | n | n
border-right | n | f | n | n
border-right-color | n | f | n | n
border-right-style | n | s | n | n
border-right-width | n | f | n | n
border-spacing | | n | |
border-style | n | s | n | n
border-top | n | f | n | n
border-top-color | n | f | n | n
border-top-style | n | s | n | n
border-top-width | n | f | n | n
border-width | n | f | n | n
bottom | n | n | n | n
caption-side | | f | |
clear | n | n | n | n
clip | n | n | n | n
color | f | | |
content | n | n | n | n
counter-increment | n | n | n |
counter-reset | n | n | n |
cursor | n | n | n |
direction | n | n | n |
display | n | n | n | n
empty-cells | | f | |
float | n | n | n | n
font | f | | |
font-family | f | | |
font-size | f | | |
font-style | f | | |
font-variant | n | | |
font-weight | f | | |
height | n | f | n |
left | n | n | n |
letter-spacing | f | | |
line-height | f | | |
list-style | | | f |
list-style-image | | | f |
list-style-position | | | f |
list-style-type | | | f |
margin | f | f | s (not on li) | n
margin-bottom | f | f | f | n
margin-left | f | f | f (not on li) | n
margin-right | f | f | s (not on li) | n
margin-top | f | f | f |
max-height | n | n | n |
max-width | n | n | n |
min-height | n | n | n |
min-width | n | n | n |
orphans | n | n | n |
outline | n | n | n |
outline-color | n | n | n |
outline-style | n | n | n |
outline-width | n | n | n |
overflow | n | n | n |
padding | f | f | s | n
padding-bottom | f | f | f |
padding-left | f | f | f (not on li) |
padding-right | f | f | f (not on li) |
padding-top | f | f | f |
page-break-after | s - only value always | s - only value always | s - only value always | s - only value always
page-break-before | s - only value always | s - only value always | s - only value always | s - only value always
page-break-inside | n | n | n |
position | n | n | n | n
quotes | n | n | n |
right | n | n | n | n
table-layout | | s | |
text-align | f | | |
text-decoration | f | | |
text-indent | f | | |
text-shadow | n | | |
text-transform | n | | |
top | n | n | n | n
unicode-bidi | n | n | n |
vertical-align | f | f | n |
visibility | n | n | n | n
white-space | n | n | n |
widows | n | n | n |
width | n | f | n |
word-spacing | n | n | n |
z-index | n | n | n |
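
The support levels above can also be consulted in code, for instance to warn an author that a style will be dropped. The sketch below is a hypothetical helper, not XMLWorker API. It encodes only an excerpt of the Text column, and it splits a style attribute on ';' and ':' in a simplified way that does not handle every valid CSS value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration; it is not part of the XMLWorker API. */
class TextCssSupport {

    // Excerpt of the Text column: 'f' = fully supported, 'n' = not supported.
    private static final Map<String, Character> TEXT = new HashMap<>();
    static {
        TEXT.put("color", 'f');
        TEXT.put("font-family", 'f');
        TEXT.put("font-size", 'f');
        TEXT.put("font-variant", 'n');
        TEXT.put("letter-spacing", 'f');
        TEXT.put("text-align", 'f');
        TEXT.put("text-shadow", 'n');
        TEXT.put("text-transform", 'n');
        TEXT.put("white-space", 'n');
        TEXT.put("word-spacing", 'n');
    }

    /**
     * Returns the properties of a style attribute that the excerpt marks as
     * anything other than fully supported. Properties missing from the
     * excerpt are skipped because their level is unknown here.
     */
    static List<String> notFullySupported(String styleAttribute) {
        List<String> result = new ArrayList<>();
        for (String declaration : styleAttribute.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // not a "property: value" pair
            }
            String property = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            Character level = TEXT.get(property);
            if (level != null && level != 'f') {
                result.add(property);
            }
        }
        return result;
    }
}
```

For the style "color: red; text-transform: uppercase" the method returns a list containing only text-transform.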

Notes

When page-break-before or page-break-after is used inside tags that become a single element in the PDF (such as lists or tables), the result of adding a new page is unpredictable.

Examples

Default XHTML/CSS processing (Java):

The quick way

	// create a document to write to
	final Document doc = new Document();
	PdfWriter.getInstance(doc, new FileOutputStream("out.pdf"));
	// make sure it's open
	doc.open();
	// read the html from somewhere
	BufferedInputStream bis = new BufferedInputStream(new FileInputStream("snippet.html"));
	// the helper runs the default XHTML/CSS processing
	XMLWorkerHelper helper = new XMLWorkerHelper();
	// parse and listen for elements to add to the document
	helper.parseXHtml(new ElementHandler() {
		public void addAll(final List<Element> currentContent) throws DocumentException {
			for (Element e : currentContent) {
				doc.add(e);
			}
		}
		public void add(final Element e) throws DocumentException {
			doc.add(e);
		}
	}, new InputStreamReader(bis));
	doc.close();

The extended setup

	// create a document to write to
	final Document doc = new Document();
	PdfWriter.getInstance(doc, new FileOutputStream("out.pdf"));
	// make sure it's open
	doc.open();
	// read the html from somewhere
	BufferedInputStream bis = new BufferedInputStream(new FileInputStream("snippet.html"));
	// setup default config
	XMLWorkerConfigurationImpl config = new XMLWorkerConfigurationImpl();
	config.tagProcessorFactory(new Tags().getHtmlTagProcessorFactory())
		.cssResolver(new StyleAttrCSSResolver())
		.acceptUnknown(true);
	XMLWorkerHelper helper = new XMLWorkerHelper();
	// parse and listen for elements to add to the document
	helper.parseXHtml(new ElementHandler() {
		public void addAll(final List<Element> currentContent) throws DocumentException {
			for (Element e : currentContent) {
				doc.add(e);
			}
		}
		public void add(final Element e) throws DocumentException {
			doc.add(e);
		}
	}, new InputStreamReader(bis), config);
	doc.close();

Demo

An online demo is available: input is entered through a TinyMCE editor, and a PDF is created from that input.

Plans for the future