Using Word Classes
The function term() is used to search for items from a given list of words (a word class).
A word class is an entry in a dictionary of the WordClasses category. Such an entry may contain any number of words to search for. The words could simply be listed in the query itself, but using word classes offers several advantages:
- Concise notation;
- Ability to re-use the same word list;
- All changes made in the list automatically apply to all queries that use it.
Syntax
Arguments
The function requires a word class name and takes any number of list names from a dictionary of the WordClasses category.
To narrow the search, an optional part-of-speech argument (POS-tag) can be given. In that case the function returns only those word class matches that have the indicated POS-tag.
The list of valid values for the argument part of speech is included in the table below.
Part of speech value | Short Notation | Example
noun | | event, sister’s, cats …
verb | | talked, admit, moving, goes …
adverb | advb | suddenly, more, yet, now …
adjective | adjc | big, important, hard-working …
particle | prcl | not, to …
pronoun | pron | me, he, their, who, nothing …
numeral | nmrl | three, fifth …
Examples of word classes from the Default dictionary of the WordClasses category can be found in the picture below. More information on working with word classes can be found in the chapter Working with dictionaries.
Parameters
The function also takes a number of named parameters listed in the table below.
Parameter | Valid values | Description | Example
stem | yes/no | yes: searches for all word forms of the word list elements; no: matches only the forms indicated in the word list. Default: yes | term(costs, stem:=no) will find "costs", but not "cost"
pos | yes/no | yes: searches by the POS-tag stated in the word list; no: ignores the POS-tag stated in the word list. Default: yes | term(months, pos:=no) will match "may" both in "he was born in May" and in "a minister may face trial".
match | range | The whole fragment of text between the first and the last argument, including punctuation, is extracted. |
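The effect of the stem and pos parameters can be illustrated with a small Python model. This is only a sketch of the behaviour described in the table, not the engine's implementation: the Token type, the hand-written LEMMAS table, the sample word classes, the Python term() function, and the case-insensitive comparison are all assumptions made for illustration.

```python
from typing import NamedTuple

class Token(NamedTuple):
    text: str  # surface form as it appears in the text
    pos: str   # hand-assigned POS tag, e.g. "noun" or "verb"

# Tiny hand-written lemma table standing in for real morphological analysis.
LEMMAS = {"costs": "cost", "cost": "cost", "may": "may"}

def lemma(word):
    word = word.lower()
    return LEMMAS.get(word, word)

def term(tokens, word_class, stem=True, pos=True):
    """Return the surface forms of tokens that match an entry of word_class.

    word_class is a list of (word, POS tag) pairs (a hypothetical layout).
    stem=True compares lemmas, so any word form of an entry matches;
    stem=False requires the exact form listed in the word class.
    pos=True also requires the token's tag to equal the entry's tag.
    """
    hits = []
    for tok in tokens:
        for word, tag in word_class:
            if stem:
                same_word = lemma(tok.text) == lemma(word)
            else:
                same_word = tok.text.lower() == word.lower()
            same_pos = tok.pos == tag if pos else True
            if same_word and same_pos:
                hits.append(tok.text)
                break
    return hits

# Hypothetical word classes, each entry stored with its POS tag.
MONTHS = [("may", "noun")]
COSTS = [("costs", "noun")]

born = [Token("born", "verb"), Token("May", "noun")]
trial = [Token("minister", "noun"), Token("may", "verb"), Token("face", "verb")]
prices = [Token("cost", "noun"), Token("costs", "noun")]

print(term(trial, MONTHS))             # pos check on: the verb "may" is skipped
print(term(trial, MONTHS, pos=False))  # pos check off: "may" is matched
print(term(prices, COSTS, stem=False)) # only the exact form "costs"
```

With the defaults, the month word class matches "May" only where it is tagged as a noun; switching pos off also picks up the modal verb, as in the table's example.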
Example
Task Example: Searching for Dates
To find dates in the format <month> <day>, <year> (e.g. "August 30, 2016"), the query should define a sequence of a month name, one or two digits, a comma, and four digits:
phrase(0, lemma(noun, january, february, march, april, may, june, july, august, september, october, november, december), length(1, 2, char(digit)), char(comma), length(4, char(digit)))
However, the list of month names is likely to be needed again, so it is more convenient to create a word class Months. This also makes the query more concise:
phrase(term(noun, months), length(1, 2, char(digit)), char(comma), length(4, char(digit)))
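For readers who think in regular expressions, the pattern these queries describe is roughly comparable to the Python sketch below. It is an approximation only: a regular expression cannot check the part of speech, the case-insensitive matching is an assumption, and the names MONTHS, DATE_RE, and find_dates are illustrative.

```python
import re

# Plays the role of the Months word class in this sketch.
MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# Month name, one or two digits, a comma, then four digits.
DATE_RE = re.compile(
    r"\b(?:%s) \d{1,2}, \d{4}\b" % "|".join(MONTHS),
    re.IGNORECASE,
)

def find_dates(text):
    """Return every substring of text that looks like '<month> <day>, <year>'."""
    return DATE_RE.findall(text)

print(find_dates("The meeting on August 30, 2016 was moved to May 5, 2017."))
```

As with the word class, the month list is defined once and can be reused by any other pattern that needs it.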