Syntax: REX expression, Replace expression, field to search, where to store it
This provides an alternate means of setting both the HTML fields
(Modified, Title, Description, etc.) and any Additional Fields. It
allows getting page information from non-default places by searching
and optionally replacing the data. New blank rows will be provided as
rows are used. See below for examples.
REX Search - Allows you to specify a REX expression to narrow
down which contents of the From Field will be used. Leave it
empty to use the entire field.
Note that a REX Search MUST be specified for the following
From Field types:
HTML
HTML, raw output
Text
To use the entire contents of one of these fields, specify .+ as the
REX Search.
Replace - Replace can be used to specify a subset of the
value to be stored in the To Field (or a subset of the match, if
you're using REX Search). It uses sandr replacement string syntax.
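As a rough illustration of how the search and replace steps combine, the sketch below uses Python's re module as a stand-in. It is not REX or sandr syntax, and the function name extract, the sample value, and the patterns are all hypothetical. The search step narrows the source field to a match, and the replace step keeps only part of that match.

```python
import re

def extract(source, search=None, replace=None):
    """Mimic a search-then-replace rule with Python regexes (not REX)."""
    if not search:
        # An empty search uses the entire field.
        return source
    match = re.search(search, source)
    if match is None:
        # Assumption for this sketch: no match means nothing is stored.
        return None
    if replace is None:
        # No replacement given: keep the whole match.
        return match.group(0)
    # Keep only the part of the match selected by the replacement template.
    return match.expand(replace)

source = "Report; author=Jane Doe; dept=Finance"
whole_match = extract(source, r"author=[^;]+")        # "author=Jane Doe"
name_only = extract(source, r"author=([^;]+)", r"\1")  # "Jane Doe"
```

The same two-step idea applies to an actual rule: the search decides what part of the From Field is considered, and the replacement decides what part of that is stored.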
From Field - specifies what the source field is for the data.
HTML - the raw HTML source of the page. After matching, HTML tags are removed and HTML entities are resolved.
HTML, raw output - the raw HTML source of the page. Content is left as-is, with tags in place.
Text - the text of the page, after HTML rendering has been applied.
Title - the HTML title of the page.
All Meta - the contents of all meta headers specified in the HTML page.
Meta Field -> - the contents of a specific meta field, specified in the next input box, From Meta Field.
Keywords - the contents of the keywords meta header.
Description - the contents of the description meta header.
Mime Type - the MIME type of the page. This may have been derived from the Content-Type header, a <META HTTP-EQUIV> tag, or the URL extension, depending on what is available.
URL - the URL of the page.
URL Decoded - the decoded version of the URL. Any %XX 'URL-safe' sequences in the URL are replaced with their real characters. E.g. Pre%20%2D%20Expense%20Report.doc is decoded into Pre - Expense Report.doc.
URL Protocol - the URL's protocol, e.g. http.
URL Host - the host (without port number) from the URL.
URL Host and Port - the host (and port number if given) from the URL.
URL Path - the file path from the URL.
URL Path Decoded - the file path from the URL, URL-decoded.
URL Anchor - the anchor from the URL (if any), i.e. the part after the # (pound sign). May not be available if already stripped.
URL Query - the query string from the URL (if any), i.e. the part after the ? (question mark).
URL Query Var -> - the value of the URL query-string variable named in From Meta Field, URL-decoded.
Referrer's Data - the value of a referring page's field. Store refs is required for this. The field selected will be the same field being populated.
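To make the URL-based From Field types more concrete, here is a minimal sketch that splits a URL into comparable pieces using Python's standard urllib.parse module. It only approximates the descriptions above and is not the Appliance's own parser. The function name url_fields and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, unquote, parse_qs

def url_fields(url, query_var=None):
    """Approximate the URL-derived From Field values for one URL."""
    parts = urlsplit(url)
    fields = {
        "URL": url,
        "URL Decoded": unquote(url),            # %XX sequences decoded
        "URL Protocol": parts.scheme,
        "URL Host": parts.hostname,             # host without port
        "URL Host and Port": parts.netloc,      # host plus port if given
        "URL Path": parts.path,
        "URL Path Decoded": unquote(parts.path),
        "URL Anchor": parts.fragment,           # part after '#'
        "URL Query": parts.query,               # part after '?'
    }
    if query_var is not None:
        # parse_qs URL-decodes the values; keep the first one if present.
        values = parse_qs(parts.query).get(query_var, [])
        fields["URL Query Var"] = values[0] if values else None
    return fields

fields = url_fields(
    "http://www.example.com:8080/docs/Pre%20%2D%20Expense%20Report.doc?id=42&q=a%20b#top",
    query_var="q",
)
```

With this example URL, the decoded path ends in "Pre - Expense Report.doc", matching the decoding example given for URL Decoded.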
From Meta Field - If Meta Field -> or URL Query Var -> is given as
the From Field, this field is used to specify which meta field's or
query var's contents to use as data. Leave blank otherwise.
Entering text in this field will force the use of Meta Field -> if
From Field is set to anything besides Meta Field -> or URL Query Var ->.
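The interaction between From Field and From Meta Field described above can be sketched as a small decision function. This is only an illustration of the stated rule. The function name effective_from_field and its return shape are hypothetical.

```python
def effective_from_field(from_field, from_meta_field):
    """Return (source type, name) after applying the From Meta Field rule."""
    name = from_meta_field.strip()
    if not name:
        # Blank From Meta Field: the selected From Field is used as-is.
        return from_field, None
    if from_field in ("Meta Field ->", "URL Query Var ->"):
        # The name selects which meta field or query variable to read.
        return from_field, name
    # Text in From Meta Field forces the Meta Field -> source.
    return "Meta Field ->", name

forced = effective_from_field("Title", "author")
kept = effective_from_field("URL Query Var ->", "id")
plain = effective_from_field("Title", "")
```

In this sketch, selecting Title while also typing "author" into From Meta Field yields the author meta field as the source, which is why the field should be left blank for other From Field types.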
To Field - specifies where information should be stored.
Modified, Title, Description, Keywords, Depth, and Body - Override the standard fields extracted from the content.
Authorization URL - Populates the URL used when checking this result for Results Authorization. Please see the Allow Authorization URL section (3.6.57) for more details.
Category - To populate the category via Data From Field, all the possible category names must be entered in the Category setting. Using one or more Data From Field rules to set Category will cause the Appliance to ignore the Categories' URL Patterns and instead set category membership based on these Data From Field rules.
Note: due to the way categories are stored, if categories are added, reordered, or removed after content has been walked, then a New walk will need to be performed to update the content's categories. Renaming categories does not require a rewalk.
Additional Links - This target allows you to use Data From Field to create links that will be walked. These links are subject to the normal indexing rules; for example, they will be rejected if they match exclusions.
Use of this Data From Field target has no effect on the existing links found on the current URL. The links generated by this target will be added to the standard set of links on the page.
Subfetch - This causes the Search Appliance to take the value(s) it finds and fetch them as URL(s). Each URL can be absolute, or relative to the current URL.
Nothing is changed by the subfetch itself, but any further Data From Field rules will use the fetched document(s) as the source of their content. Please see the Subfetch example below for a situation where this could be used.
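Since Subfetch values may be absolute or relative to the current URL, the following sketch shows how such values would typically be resolved, using Python's standard urljoin. It illustrates ordinary relative-URL resolution and is not the Appliance's own code. The function name subfetch_urls and the example URLs are hypothetical.

```python
from urllib.parse import urljoin

def subfetch_urls(current_url, values):
    """Resolve each extracted value against the URL of the current page."""
    return [urljoin(current_url, value.strip()) for value in values]

resolved = subfetch_urls(
    "http://www.example.com/products/item.html",
    [
        "details/specs.html",                  # relative to current directory
        "/shared/meta.xml",                    # relative to the site root
        "http://other.example.com/info.html",  # already absolute
    ],
)
```

Here the first value resolves under /products/, the second under the site root, and the third is left unchanged.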
Additional Fields - If this profile has any Additional Fields, they will be available as a target To Field.
If you just added the name of a new Additional Field, you will need to hit Update for the new Additional Field to appear in the To Field list.
Append - If set to Y, then the Data From Field content will be
appended to the field's existing data instead of overwriting it.
Date-type targets, such as Modified, do not support Append.
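The difference between overwriting and appending can be sketched as follows. This is an illustration only and makes two assumptions that the text above does not state: appended values are joined with a single space, and a date-type target with Append set simply gets overwritten. The function name store and the DATE_TARGETS set are hypothetical.

```python
# Date-type targets named in the text; others may exist.
DATE_TARGETS = {"Modified"}

def store(fields, to_field, value, append=False, separator=" "):
    """Write value into fields[to_field], appending when allowed."""
    existing = fields.get(to_field)
    if append and existing and to_field not in DATE_TARGETS:
        # Append = Y: keep the existing data and add the new content.
        # Assumption: values are joined with a single space.
        fields[to_field] = existing + separator + value
    else:
        # Append = N, an empty field, or a date-type target (assumed
        # here to fall back to overwriting).
        fields[to_field] = value
    return fields

fields = {"Keywords": "budget", "Modified": "2020-01-01"}
store(fields, "Keywords", "travel", append=True)
store(fields, "Modified", "2021-06-30", append=True)
```

After these two calls, Keywords holds both values while Modified holds only the new date.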