Interface | Description |
---|---|
CharSequenceBuffer | |
Tag | Tag returned by TagTokenizer. |
TagProcessorContext | Defines a set of methods that allow TagRules to interact with the TagProcessor. |
TagRule | User-defined rule for processing Tags encountered by the TagProcessor. |
TagTokenizer.TokenHandler | Handler that will receive callbacks as 'tags' and 'text' are encountered. |

Class | Description |
---|---|
BasicBlockRule<T> | TagRule helper class for dealing with blocks surrounded by an opening and closing tag. |
BasicRule | Basic implementation of TagRule. |
CustomTag | A CustomTag provides a mechanism to manipulate the contents of a Tag. |
State | Acts as a registry of TagRules to apply whilst the TagProcessor is processing the document in this particular state. |
StateTransitionRule | |
TagProcessor | Copies a document from a source to a destination, applying rules on the way to extract and/or transform the content. |
TagTokenizer | Splits a chunk of HTML into 'text' and 'tag' tokens, for easy processing. |

Enum | Description |
---|---|
Tag.Type | Type of tag. |
TagTokenizer.Token | |

This package is for processing tag-like markup languages (things with angle brackets): HTML, XHTML, WML, XML and other SGML dialects.
Strengths:
It has 2 APIs you can use:
The TagTokenizer scans through a document and fires events as it encounters Tags of interest. Anything that does not qualify as a Tag will be treated as a Text token. This is a similar approach to the SAX API for XML processing.
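
The event-driven style described here can be sketched in a few lines of plain Java. This is a minimal illustration of the idea only, not this package's TagTokenizer: the class `MiniTagTokenizer`, its `Handler` interface and the `events` helper are hypothetical names invented for the example, and the scanning is far cruder than a real tokenizer (no attributes, comments or quoting).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the package's TagTokenizer. */
class MiniTagTokenizer {

    /** Callback interface (assumed shape): one method per token kind. */
    interface Handler {
        void tag(String raw);
        void text(String raw);
    }

    /** Scans the input once and fires a callback for each tag or text run. */
    static void tokenize(String input, Handler handler) {
        int pos = 0;
        while (pos < input.length()) {
            int open = input.indexOf('<', pos);
            int close = (open < 0) ? -1 : input.indexOf('>', open);
            if (open < 0 || close < 0) {
                // No complete tag remains, so the rest is treated as text.
                handler.text(input.substring(pos));
                return;
            }
            if (open > pos) {
                handler.text(input.substring(pos, open));
            }
            handler.tag(input.substring(open, close + 1));
            pos = close + 1;
        }
    }

    /** Convenience wrapper that records the fired events as strings. */
    static List<String> events(String input) {
        final List<String> out = new ArrayList<>();
        tokenize(input, new Handler() {
            @Override public void tag(String raw) { out.add("TAG:" + raw); }
            @Override public void text(String raw) { out.add("TEXT:" + raw); }
        });
        return out;
    }

    public static void main(String[] args) {
        for (String event : events("<b>hi</b> there")) {
            System.out.println(event);
        }
    }
}
```

As with SAX, the caller never sees a document tree; it only reacts to callbacks in document order, which keeps memory use flat for large inputs.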
The TagProcessor is built on top of the TagTokenizer and acts as a registry for TagRules and TextFilters. It also supports multiple States, allowing different rules to be applied in different sections of the document.
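
To make the rule-registry and state idea concrete, here is a small self-contained sketch. It is an assumption-laden illustration, not this package's TagProcessor, State or TagRule: `MiniTagProcessor`, `RuleSet`, `Rule`, `Context` and `demo` are hypothetical names, text filters are left out, and tag names are extracted naively.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the package's TagProcessor. */
class MiniTagProcessor {

    /** What a rule is allowed to do: write output or switch the active state. */
    interface Context {
        void write(String s);
        void changeState(RuleSet next);
    }

    /** A rule handles one raw tag, for example "<b>" or "</pre>". */
    interface Rule {
        void process(String rawTag, Context context);
    }

    /** Plays the role of a state: the rules active in one section of the document. */
    static class RuleSet {
        private final Map<String, Rule> rules = new HashMap<>();
        void addRule(String tagName, Rule rule) { rules.put(tagName.toLowerCase(), rule); }
        Rule ruleFor(String tagName) { return rules.get(tagName.toLowerCase()); }
    }

    private RuleSet current;

    MiniTagProcessor(RuleSet initial) { current = initial; }

    /** Copies input to output, letting the current state's rules handle matching tags. */
    String process(String input) {
        final StringBuilder out = new StringBuilder();
        Context ctx = new Context() {
            @Override public void write(String s) { out.append(s); }
            @Override public void changeState(RuleSet next) { current = next; }
        };
        int pos = 0;
        while (pos < input.length()) {
            int open = input.indexOf('<', pos);
            int close = (open < 0) ? -1 : input.indexOf('>', open);
            if (open < 0 || close < 0) {
                out.append(input, pos, input.length()); // trailing text
                break;
            }
            out.append(input, pos, open);               // text before the tag
            String raw = input.substring(open, close + 1);
            Rule rule = current.ruleFor(nameOf(raw));
            if (rule != null) rule.process(raw, ctx); else out.append(raw);
            pos = close + 1;
        }
        return out.toString();
    }

    /** Naive tag-name extraction: drops '<', '>', a leading or trailing '/', and attributes. */
    static String nameOf(String raw) {
        String inner = raw.substring(1, raw.length() - 1).trim();
        if (inner.startsWith("/")) inner = inner.substring(1).trim();
        if (inner.endsWith("/")) inner = inner.substring(0, inner.length() - 1).trim();
        int space = inner.indexOf(' ');
        return space < 0 ? inner : inner.substring(0, space);
    }

    /** Rewrites b to strong, except inside pre blocks, where a second state applies. */
    static String demo(String input) {
        final RuleSet normal = new RuleSet();
        final RuleSet verbatim = new RuleSet();
        normal.addRule("b", (raw, ctx) -> ctx.write(raw.startsWith("</") ? "</strong>" : "<strong>"));
        normal.addRule("pre", (raw, ctx) -> {
            ctx.write(raw);
            if (!raw.startsWith("</")) ctx.changeState(verbatim);
        });
        verbatim.addRule("pre", (raw, ctx) -> {
            ctx.write(raw);
            if (raw.startsWith("</")) ctx.changeState(normal);
        });
        return new MiniTagProcessor(normal).process(input);
    }
}
```

Calling `MiniTagProcessor.demo(...)` on a fragment that mixes `b` and `pre` elements shows the effect of states: the `b` rule fires outside `pre`, while inside `pre` the second rule set has no `b` rule, so those tags are copied through unchanged.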
Copyright © 2015. All Rights Reserved.