todocpart

Purpose

Converts the argument to the part of the document in which it is found, such as a section, a table element, a page, or a hyperlink.

Syntax

todocpart(section_name, $group)

Arguments

The function takes two required arguments. The first argument, section_name, specifies the document part to extract. It takes one of the values listed in the table below.

| Value | Comments |
|---|---|
| section | Document’s section. |
| section_level | Section’s level (coincides with the heading’s level). |
| heading | Section’s heading. |
| heading_level | Heading’s level. |
| table | The whole table text, including the table’s name. |
| table_name | Table’s name. |
| row_text | Row’s text (the values of all cells in the row, separated by a space). |
| row_name | Row’s name (the value of the leftmost cell of the row). |
| col_text | Column’s text (the values of all cells in the column, separated by a space). |
| col_name | Column’s name (the value of the top cell of the column). |
| cell_text | Cell’s text. |
| cell_unit | Cell’s units (if specified). |
| cell_factor | Cell’s scale factor. |
| table_num | Table’s number. |
| row_num | Row’s number. |
| col_num | Column’s number. |
| page | Text of the page where the argument was found. |
| page_num | Number of the page where the argument was found. |
| hyperlink | Internet hyperlink. |

The second argument is a reference to a named group. The function also takes the following optional named parameters:

| Parameter | Comments |
|---|---|
| match:=range/arguments | If the arguments are discontinuous and are found in several sentences, only those sentences appear in the result. With match:=range, the argument is converted to the whole text fragment from the first of those sentences to the last one. |
| first:=<numeral> | If the argument is omitted, the parameter is treated as a range of values. Otherwise, it specifies the offset of the start position. |
| last:=<numeral> | If the argument is omitted, the parameter is treated as a range of values. Otherwise, it specifies the offset of the end position. |
| separator:=<string> | Custom separator. If it is not specified, the default separator ";" is used. |
| table_level:=<numeral> | Table level of the elements to search for. By default, the level is not set. |
| nested:=<string> | Specifies whether to search within nested tables, outside them, or both. Takes the values "yes", "no", or "any", respectively. Set to "any" by default. |
| has_nested:=<string> | Specifies whether the table has nested tables. Takes the values "yes", "no", or "any". Set to "any" by default. |
| parent_table:=<string> | Specifies whether the output for the parent table (the table one level up) should be shown. Takes the values "yes", "no", or "any". Set to "no" by default. |
| ocr_confidence | Passed in place of section_name, as in todocpart(ocr_confidence, $m). Returns an integer equal to the minimum OCR recognition confidence score among the words included in the argument. |
| default:=<string> | Value assigned to the attribute if the result is empty. |
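The two match modes can be illustrated with a small Python model. This is only a sketch of the behavior described for the match parameter, not the engine's implementation: the function name convert_sentences, the list-of-sentences representation, and the assumption that match:=arguments is the default are all hypothetical.

```python
def convert_sentences(sentences, hit_indexes, match="arguments"):
    """Model of match:=arguments / match:=range (hypothetical helper).

    sentences   -- the document text split into sentences
    hit_indexes -- indexes of the sentences that contain parts of the argument
    """
    hits = sorted(set(hit_indexes))
    if not hits:
        return []
    if match == "range":
        # Whole fragment from the first sentence with a hit to the last one.
        return sentences[hits[0]:hits[-1] + 1]
    # Assumed default: only the sentences that actually contain a hit.
    return [sentences[i] for i in hits]


sentences = ["S0.", "S1.", "S2.", "S3.", "S4."]
only_hits = convert_sentences(sentences, [1, 3])
whole_range = convert_sentences(sentences, [1, 3], match="range")
print(only_hits)    # sentences 1 and 3 only
print(whole_range)  # sentences 1 through 3, including the gap
```

With a discontinuous argument found in sentences 1 and 3, the model keeps only those two sentences in the first call and also includes sentence 2 in the second.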

Notes

  • The section value may be accompanied by the field parameter, which takes the values body (converts to the section’s body text), heading (converts to the section’s heading), and any (converts to both the body and the heading). By default, field:=any.

  • The hyperlink value finds hyperlinks only in HTML pages. To use it, connect the node to an already executed parent Internet source node.

  • The hyperlink value may be accompanied by the field parameter, which takes the values text (converts to the reference name) and url (converts to the hyperlink’s URL). By default, field:=text.

  • The named parameters that search for table elements coincide with the parameters of the totable() function. Thus, the parameters first and last specify the offsets of the start and end positions relative to the argument. By default, first:=0, last:=0.

  • The parameters first and last operate in two modes: for table and table_name they are counted within the document (the previous or following table or table name in the text); in all other cases they are counted within a table.

  • When the first and last parameters are used with a discontinuous argument (or when arguments are omitted), duplicate elements found by the search query are not removed. The range from first to last is output for the first found result, then for the second found result, and so on. Intersecting sets are kept to make the results easier to analyze.
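The offset behavior described in the notes on first and last can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the engine's implementation: the helper name window_rows, the zero-based row indexes, and the clipping of ranges at the table boundaries are all hypothetical.

```python
def window_rows(row_names, hit_rows, first=0, last=0):
    """Model of first/last offsets over table rows (hypothetical helper).

    For every row that contains a hit, collect the rows from hit+first to
    hit+last inclusive. Overlapping ranges are NOT deduplicated.
    """
    results = []
    for hit in hit_rows:
        start = max(hit + first, 0)                  # assumed: clip at table top
        stop = min(hit + last, len(row_names) - 1)   # assumed: clip at table bottom
        if start <= stop:
            results.extend(row_names[start:stop + 1])
    return results


rows = ["row 1", "row 2", "row 3", "row 4", "row 5"]

# Hit in the fifth row, first:=-1, last:=-1 -> the fourth row only.
previous = window_rows(rows, [4], first=-1, last=-1)

# Hits in the second and third rows with a range of -1..1:
# the two ranges intersect, and the shared rows appear twice.
overlapping = window_rows(rows, [1, 2], first=-1, last=1)

print(previous)
print(overlapping)
```

The second call shows why duplicates remain in the output: each found result produces its own range, and the ranges are concatenated as they are.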

Returned Value

The function returns text, except for section_level, heading_level, and ocr_confidence, for which it returns an integer.

Examples

todocpart(row_name, $m, first:=-1, last:=-1) returns the name of the row preceding the row that contains the argument $m. For example, if the argument is found in the fifth row, the attribute contains the name of the fourth row.

todocpart(section, $m, field:=body) returns the body text of the section in which the named group $m is found.

XPDL-rule:

rule: r1
{
    query: {docpart(ocr, confidence:<100)}:m
    result: Match = $m
    attribute: OCR = todocpart(ocr_confidence, $m)
}

Result: The attribute of the numeric column OCR displays the recognition confidence of words that are identified by the OCR module as unreliable.

XPDL-rule:

rule: r1
{
    query: {sentence()}:m
    result: Match = $m
    attribute: OCR = todocpart(ocr_confidence, $m)
}

Result: The attribute of the numeric column OCR displays the minimum word confidence of each sentence found by the rule.
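The per-sentence minimum reported by ocr_confidence can be modeled as follows. This is only an illustrative sketch: the helper name min_ocr_confidence, the confidence values, the 0 to 100 scale, and the None result for an empty argument are assumptions, not documented behavior.

```python
def min_ocr_confidence(word_confidences):
    """Return the lowest word confidence, or None if there are no words.

    Hypothetical model of todocpart(ocr_confidence, $m): the result for a
    match is the minimum confidence among the words it contains.
    """
    if not word_confidences:
        return None
    return min(word_confidences)


# Made-up per-word confidence scores for two matched sentences.
matches = {
    "clean sentence": [100, 100, 100],
    "smudged sentence": [100, 62, 97],
}
scores = {text: min_ocr_confidence(conf) for text, conf in matches.items()}
print(scores)
```

A single poorly recognized word is enough to lower the value reported for the whole sentence, which is what makes the minimum useful for flagging unreliable matches.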

XPDL-rule:

rule: r1
{
    query: {docpart(page)}:m
    result: Match = $m
    attribute: PageText = todocpart(page, $m)
    attribute: PageNumber = todocpart(page_num, $m, separator:=",")
}

Result:

todocpart(page, $m) returns the text of the page for the positions of the named group $m.

todocpart(page_num, $m) returns the number of the page for the positions of the named group $m.

XPDL-rule:

rule: r1
{
    query: {docpart(hyperlink, "age")}:m
    result: Match = $m
    attribute: URL = todocpart(hyperlink, $m, field:=url)
    attribute: RefName = todocpart(hyperlink, $m, field:=text)
}

Result:

todocpart(hyperlink, $m, field:=url) converts the named group $m to the hyperlink’s URL.

todocpart(hyperlink, $m, field:=text) converts the named group $m to the hyperlink’s name.