Ignore Tags

Thunderstone Search Appliance Manual

Syntax: one or more pairs of strings

All data between the specified begin and end tag pairs is stripped from the HTML before the text is extracted (i.e. links are unaffected). The tags are simple strings, not patterns or REX expressions, and case is ignored. This is useful for excluding boilerplate or otherwise unwanted portions of HTML documents. Tag pairs should not nest or overlap within a document. Documents with no begin tag are unaffected. If a document has no end tag after its last begin tag, the HTML from that begin tag to the end of the document is still discarded.
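
The stripping rules above can be illustrated with a short Python sketch. This is not the appliance's implementation, only a model of the described behavior. The function name `strip_ignored` and the sample tag strings are hypothetical. The sketch also assumes that the begin and end tags themselves are removed along with the data between them, and that multiple pairs are applied one after another; the manual does not state either point.

```python
import re


def strip_ignored(html, tag_pairs):
    """Model of the Ignore Tags rules (illustrative only, not appliance code).

    tag_pairs is a list of (begin, end) literal strings.
    Assumption: the tags themselves are removed together with the data
    between them, and pairs are applied in the order given.
    """
    for begin, end in tag_pairs:
        # re.escape makes both tags literal strings rather than patterns.
        # The lazy ".*?" stops at the first end tag after a begin tag; if no
        # end tag follows, "\Z" lets the match run to the end of the document.
        pattern = re.compile(
            re.escape(begin) + r".*?(?:" + re.escape(end) + r"|\Z)",
            re.IGNORECASE | re.DOTALL,
        )
        html = pattern.sub("", html)
    return html


pairs = [("<!-- nav -->", "<!-- /nav -->")]  # hypothetical tag strings

# Case is ignored, so "<!-- NAV -->" matches the begin tag.
page = "<p>Keep</p><!-- NAV --><ul>menu</ul><!-- /nav --><p>Body</p>"
print(strip_ignored(page, pairs))       # <p>Keep</p><p>Body</p>

# No end tag after the last begin tag: the rest of the document is discarded.
print(strip_ignored("<p>Keep</p><!-- nav -->rest", pairs))  # <p>Keep</p>

# No begin tag: the document is unchanged.
print(strip_ignored("<p>Plain</p>", pairs))                 # <p>Plain</p>
```

The sketch models only the text that reaches extraction; it does not model link handling, which the setting leaves unaffected.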