todocpart
Arguments
Takes two required arguments. The first argument, section_name, specifies the part of the document to extract. It takes one of the values listed in the table below.
Value | Comments
section | Document’s section.
section_level | Section’s level (coincides with the heading’s level).
heading | Section’s heading.
heading_level | Heading’s level.
table | The whole table text, including the table’s name.
table_name | Table’s name.
row_text | Row’s text (all cells’ values separated by a space).
row_name | Row’s name (the value of the leftmost cell of the row).
col_text | Column’s text (all cells’ values of the column separated by a space).
col_name | Column’s name (the value of the top cell of the column).
cell_text | Cell’s text.
cell_unit | Cell’s units (if specified).
cell_factor | Cell’s scale factor.
table_num | Table’s number.
row_num | Row’s number.
col_num | Column’s number.
page | Returns the text of the page where the argument was found.
page_num | Returns the number of the page where the argument was found.
hyperlink | Internet hyperlink.
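The row and column values above can be illustrated with a small model. This is a minimal sketch, assuming a table is held as a list of rows of cell strings; the helper names row_text, row_name, col_text and col_name simply mirror the values and are hypothetical Python functions, not the product's API. Indices are 0-based here, which may differ from row_num/col_num, and including the top cell in col_text is an assumption.

```python
from typing import List

# Hypothetical model (not the product's API): a table is a list of rows,
# each row a list of cell strings. Indices are 0-based in this sketch.
Table = List[List[str]]


def row_text(table: Table, row_num: int) -> str:
    # All cells' values of the row, separated by a space.
    return " ".join(table[row_num])


def row_name(table: Table, row_num: int) -> str:
    # Value of the leftmost cell of the row.
    return table[row_num][0]


def col_text(table: Table, col_num: int) -> str:
    # All cells' values of the column, separated by a space
    # (assumption: the top cell is included).
    return " ".join(row[col_num] for row in table)


def col_name(table: Table, col_num: int) -> str:
    # Value of the top cell of the column.
    return table[0][col_num]


table = [["Item", "Qty", "Price"],
         ["Apple", "3", "1.20"],
         ["Pear", "5", "0.80"]]
print(row_text(table, 1))  # Apple 3 1.20
print(row_name(table, 2))  # Pear
print(col_text(table, 1))  # Qty 3 5
print(col_name(table, 2))  # Price
```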
The second argument is a reference to a named group. The function also takes the following optional named parameters:
Parameter | Comments
match:=range/arguments | If the arguments are discontinuous and are extracted from several sentences, only those sentences appear in the result. When match:=range is set, the result is extended to the whole text fragment from the first of these sentences to the last one.
first:=<numeral> | If the argument is omitted, first and last are treated as a range of values. Otherwise, first specifies the offset of the start position relative to the argument.
last:=<numeral> | If the argument is omitted, first and last are treated as a range of values. Otherwise, last specifies the offset of the end position relative to the argument.
separator:=<string> | Specifies a custom separator. If it is not specified, the default separator ";" is used.
table_level:=<numeral> | Specifies the table level of the elements to search for. By default, the level is not set.
nested:=<string> | Specifies whether to search inside nested tables ("yes"), outside them ("no"), or both ("any"). Set to "any" by default.
has_nested:=<string> | Specifies whether the table must contain nested tables ("yes"), must not contain them ("no"), or either ("any"). Set to "any" by default.
parent_table:=<string> | Specifies whether the output for the parent table (the table one level up) should be shown. Takes "yes"/"no"/"any" values. Set to "no" by default.
ocr_confidence | Returns an integer equal to the minimum OCR recognition confidence score among the words included in the argument.
default:=<string> | Specifies the value assigned to the attribute if the result is empty.
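The effect of match:=, separator:= and default:= can be sketched as follows. This is a hypothetical model, not the product's implementation: the function extract and its arguments are invented for illustration, the document is assumed to be pre-split into sentences, and two details are assumptions because the text above does not state them: the matching sentences are joined with the separator, and sentences inside a match:=range fragment are joined with a space.

```python
from typing import List, Optional


def extract(sentences: List[str], hits: List[int], match: str = "arguments",
            separator: str = ";", default: Optional[str] = None) -> Optional[str]:
    """Hypothetical model of match:=, separator:= and default:= (not the product's API).

    sentences: the document split into sentences
    hits: indices of the sentences in which the argument was found
    """
    if not hits:
        # default:= supplies the value when the result is empty.
        return default
    if match == "range":
        # match:=range: whole fragment from the first matching sentence to the last one.
        # Assumption: sentences inside the fragment are joined with a space.
        return " ".join(sentences[min(hits):max(hits) + 1])
    # Otherwise only the sentences containing the argument are kept.
    # Assumption: they are joined with the separator (";" unless overridden).
    return separator.join(sentences[i] for i in sorted(set(hits)))


sentences = ["Total: 10.", "See notes.", "Tax: 2."]
print(extract(sentences, [0, 2]))                 # Total: 10.;Tax: 2.
print(extract(sentences, [0, 2], match="range"))  # Total: 10. See notes. Tax: 2.
print(extract(sentences, [], default="n/a"))      # n/a
```

In this model the only difference between the two match modes is whether the sentence between the two hits ("See notes.") is included.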
Notes
- The section value may be accompanied by the field parameter, which takes one of three values: body (the section’s body text), heading (the section’s heading), or any (both body and heading). By default, field:=any.
- The hyperlink value finds hyperlinks only in HTML pages. To use it, the node must be connected to an already executed parent Internet source node.
- The hyperlink value may be accompanied by the field parameter, which takes the value text (the link’s reference name) or url (the link’s URL). By default, field:=text.
- The named parameters that search for table elements are the same as the parameters of the totable() function. In particular, first and last specify the offsets of the start and end positions relative to the argument. By default, first:=0 and last:=0.
- The first and last parameters work in two modes: for table and table_name, they navigate the document (the previous or following table or table name in the text); for all other values, they navigate within a table.
- When first and last are used with a discontinuous argument (or when arguments are omitted), duplicate elements found by the search query are not removed: the range from first to last is built for the first result found, then for the second, and so on. Overlapping sets are kept to make the results easier to analyze.
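The offset behavior of first and last, including the fact that overlapping ranges are not deduplicated, can be modeled with a short sketch. The function offset_ranges is hypothetical and not part of the product; clipping the window at the start and end of the sequence is an assumption, since the text does not say what happens when an offset runs past the first or last element.

```python
from typing import List


def offset_ranges(items: List[str], found: List[int],
                  first: int = 0, last: int = 0) -> List[str]:
    """Hypothetical model of first:=/last:= (not the product's API).

    items: elements being navigated (tables of the document for table and
           table_name, otherwise elements of a table, e.g. its rows)
    found: positions of the found arguments, in the order they were found
    """
    result: List[str] = []
    for pos in found:
        # Window from first to last relative to this result.
        # Assumption: the window is clipped at the boundaries.
        start = max(pos + first, 0)
        end = min(pos + last, len(items) - 1)
        # Each result's range is appended in turn; overlaps are NOT removed.
        result.extend(items[start:end + 1])
    return result


rows = ["r0", "r1", "r2", "r3", "r4"]
print(offset_ranges(rows, [3]))                      # ['r3'] (first:=0, last:=0)
print(offset_ranges(rows, [1, 2], first=0, last=1))  # ['r1', 'r2', 'r2', 'r3']
```

In the second call, r2 appears twice because the ranges built for the two results overlap, matching the note that intersecting sets are kept.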