Interface Summary
Interface	Description
CharSequenceBuffer
Tag	Tag returned by `TagTokenizer`.
TagProcessorContext	Defines a set of methods that allows `TagRule`s to interact with the `TagProcessor`.
TagRule	User defined rule for processing `Tag`s encountered by the `TagProcessor`.
TagTokenizer.TokenHandler	Handler that will receive callbacks as 'tags' and 'text' are encountered.

Class Summary
Class	Description
BasicBlockRule<T>	`TagRule` helper class for dealing with blocks surrounded by an opening and closing tag.
BasicRule	Basic implementation of `TagRule`.
CustomTag	A CustomTag provides a mechanism to manipulate the contents of a Tag.
State	Acts as a registry of `TagRule`s to apply whilst the `TagProcessor` is processing the document in this particular state.
StateTransitionRule
TagProcessor	Copies a document from a source to a destination, applying rules on the way to extract content and/or transform the content.
TagTokenizer	Splits a chunk of HTML into 'text' and 'tag' tokens, for easy processing.

Enum Summary
Enum	Description
Tag.Type	Type of tag.
TagTokenizer.Token

Package org.sitemesh.tagprocessor Description

The SiteMesh TagProcessor

This package is for processing tag-like markup languages, that is, anything built from angle brackets: HTML, XHTML, WML, XML and other SGML dialects.

Strengths:

Speed: Rather than attempting to parse the entire page into tags, attributes, etc., it skims through the document, parsing only partially, until it comes across a tag that it is interested in. Only then does it do a more thorough parse, and only of that tag.
Tolerance: Because it does not attempt to build a tree, it is not affected by malformed or unbalanced tags. It simply treats these as text and keeps going.
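Both strengths come from scanning lazily. The following is a minimal sketch of the "skim, then parse only the interesting tag" idea in plain Java. The class and method names (`SkimParser`, `attributesOf`) and the naive attribute regex are hypothetical and are not part of the SiteMesh API; the sketch only shows how a cheap name check can gate the more expensive attribute parse, and how a stray `<` is tolerated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of "skim, then parse only the interesting tags"; not SiteMesh code. */
public class SkimParser {

    // Naive attribute matcher: name=value with double-quoted, single-quoted or bare values.
    private static final Pattern ATTR = Pattern.compile(
            "([a-zA-Z_:][-a-zA-Z0-9_:.]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");

    /** Returns the attributes of each tag whose lower-case name is in {@code interesting}. */
    public static List<Map<String, String>> attributesOf(String html, Set<String> interesting) {
        List<Map<String, String>> result = new ArrayList<>();
        int lt = html.indexOf('<');
        while (lt >= 0) {
            // Skim: read only the tag name that follows '<'.
            int nameEnd = lt + 1;
            while (nameEnd < html.length() && Character.isLetterOrDigit(html.charAt(nameEnd))) {
                nameEnd++;
            }
            String name = html.substring(lt + 1, nameEnd).toLowerCase(Locale.ROOT);
            if (interesting.contains(name)) {
                int gt = html.indexOf('>', nameEnd);
                if (gt < 0) {
                    break; // unterminated tag: leave the rest as text
                }
                // Thorough parse, limited to this one tag's attribute text.
                Map<String, String> attrs = new LinkedHashMap<>();
                Matcher m = ATTR.matcher(html.substring(nameEnd, gt));
                while (m.find()) {
                    String value = m.group(2) != null ? m.group(2)
                            : m.group(3) != null ? m.group(3) : m.group(4);
                    attrs.put(m.group(1).toLowerCase(Locale.ROOT), value);
                }
                result.add(attrs);
            }
            // Uninteresting tags (and stray '<' characters) are never parsed further.
            lt = html.indexOf('<', lt + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><meta name=\"author\" content='Joe'><p class=x>a < b</p><meta name=kw>";
        // prints [{name=author, content=Joe}, {name=kw}]
        System.out.println(attributesOf(html, Set.of("meta")));
    }
}
```

Note that the `<p>` tag and the bare `<` in `a < b` are passed over after a single name check; only the two `meta` tags pay for attribute parsing.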

It offers two APIs:

Low level: TagTokenizer

The TagTokenizer scans through a document and fires events as it encounters Tags of interest. Anything that does not qualify as a Tag will be treated as a Text token.

This is similar to the approach taken by the SAX API for XML processing.
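To make the event-driven model concrete, here is a simplified, SAX-style tokenizer sketch in Java. It is not the SiteMesh `TagTokenizer`: the names `MiniTokenizer`, `Handler`, `tag` and `text` are hypothetical and only loosely modeled on `TagTokenizer.TokenHandler`. It assumes a tag starts with `<` followed by a letter, `/` or `!`, and ends at the next `>`; everything else, including an unterminated `<`, is reported as text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified tokenizer illustrating the SAX-style approach; not the SiteMesh API. */
public class MiniTokenizer {

    /** Callback interface; method names are assumptions for this sketch. */
    public interface Handler {
        void tag(String rawTag);
        void text(String text);
    }

    /** Scans the input once, firing a callback for each text run and each tag. */
    public static void tokenize(CharSequence input, Handler handler) {
        StringBuilder text = new StringBuilder();
        int i = 0;
        int n = input.length();
        while (i < n) {
            char c = input.charAt(i);
            if (c == '<' && i + 1 < n && looksLikeTagStart(input.charAt(i + 1))) {
                int end = indexOf(input, '>', i + 1);
                if (end >= 0) {
                    // Flush any pending text before reporting the tag.
                    if (text.length() > 0) {
                        handler.text(text.toString());
                        text.setLength(0);
                    }
                    handler.tag(input.subSequence(i, end + 1).toString());
                    i = end + 1;
                    continue;
                }
            }
            // Anything that does not qualify as a tag is accumulated as text.
            text.append(c);
            i++;
        }
        if (text.length() > 0) {
            handler.text(text.toString());
        }
    }

    private static boolean looksLikeTagStart(char c) {
        return Character.isLetter(c) || c == '/' || c == '!';
    }

    private static int indexOf(CharSequence s, char target, int from) {
        for (int j = from; j < s.length(); j++) {
            if (s.charAt(j) == target) {
                return j;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        tokenize("<p>a < b</p><unclosed", new Handler() {
            public void tag(String rawTag) { events.add("TAG " + rawTag); }
            public void text(String t) { events.add("TEXT " + t); }
        });
        // prints: TAG <p>, TEXT a < b, TAG </p>, TEXT <unclosed (one per line)
        events.forEach(System.out::println);
    }
}
```

As with SAX, the caller never receives a tree; it reacts to a flat stream of events and keeps whatever state it needs.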

High level: TagProcessor

The TagProcessor is built on top of the TagTokenizer and acts as a registry for TagRules and TextFilters. It also supports multiple States, allowing different rules to be applied in different sections of the document.
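The idea of states as rule registries can be sketched as follows. This is a minimal, self-contained Java illustration, not the SiteMesh `TagProcessor`, `State` or `TagRule` API: `MiniProcessor`, `Rule`, `addRule` and `changeState` are hypothetical names, and a crude regex stands in for the tokenizer. Text and tags without a rule are copied through unchanged; a rule can rewrite its tag and switch the active state.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified processor illustrating states as rule registries; not the SiteMesh API. */
public class MiniProcessor {

    /** A rule handles one matched tag; it may write output and switch the processor's state. */
    public interface Rule {
        void process(String rawTag, boolean closing, MiniProcessor processor, StringBuilder out);
    }

    /** A state is simply a registry of rules keyed by lower-case tag name. */
    public static class State {
        private final Map<String, Rule> rules = new HashMap<>();

        public State addRule(String tagName, Rule rule) {
            rules.put(tagName.toLowerCase(Locale.ROOT), rule);
            return this;
        }

        Rule ruleFor(String tagName) {
            return rules.get(tagName);
        }
    }

    // Crude stand-in for a tokenizer: an opening or closing tag with a letter-initial name.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    private State current;

    public MiniProcessor(State initial) {
        this.current = initial;
    }

    public void changeState(State next) {
        this.current = next;
    }

    /** Copies source to a new string; text and tags without a rule pass through unchanged. */
    public String process(String source) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(source);
        int last = 0;
        while (m.find()) {
            out.append(source, last, m.start()); // copy the text before this tag
            String name = m.group(2).toLowerCase(Locale.ROOT);
            Rule rule = current.ruleFor(name); // only the active state's rules apply
            if (rule != null) {
                rule.process(m.group(), !m.group(1).isEmpty(), this, out);
            } else {
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(source, last, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        State normal = new State();
        State raw = new State();
        // In the normal state, <b> becomes <strong>, and <pre> switches to the raw state.
        normal.addRule("b", (tag, closing, p, out) -> out.append(closing ? "</strong>" : "<strong>"));
        normal.addRule("pre", (tag, closing, p, out) -> {
            out.append(tag);
            if (!closing) p.changeState(raw);
        });
        // In the raw state, only </pre> has a rule, which switches back to normal.
        raw.addRule("pre", (tag, closing, p, out) -> {
            out.append(tag);
            if (closing) p.changeState(normal);
        });
        String result = new MiniProcessor(normal).process("<b>x</b><pre><b>y</b></pre><b>z</b>");
        // prints <strong>x</strong><pre><b>y</b></pre><strong>z</strong>
        System.out.println(result);
    }
}
```

Because each state carries its own registry, the `<b>` inside `<pre>` is left alone without the `b` rule needing to know anything about `pre`.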