Using Word Classes

The term() function searches for words from a given word list (a word class).

A word class is an entry in a dictionary of the WordClasses category. Such an entry may contain any number of words to search for. You can, of course, list those words directly in a query, but using word classes offers several advantages:

  1. Concise notation;

  2. Ability to re-use the same word list;

  3. All changes made in the list automatically apply to all queries that use it.
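The reuse advantage listed above can be illustrated with a small Python sketch. This is only a toy model, not PDL and not the product's implementation: the `word_classes` dictionary and the `find_terms` helper are hypothetical names introduced for illustration.

```python
# Toy model of a WordClasses dictionary: class name -> list of words.
# All names here are hypothetical and only illustrate the idea of reuse.
word_classes = {
    "months": ["january", "february", "march"],
}

def find_terms(text, class_name):
    """Return the tokens of `text` that belong to the named word class."""
    words = set(word_classes[class_name])
    return [tok for tok in text.lower().split() if tok in words]

# Two "queries" that refer to the same list by name.
q1 = find_terms("Sales rose in March", "months")
q2 = find_terms("The April report", "months")

# Editing the list once changes the result of every query that uses it.
word_classes["months"].append("april")
q2_after = find_terms("The April report", "months")
```

Because both queries refer to the list by name, the single `append` call is enough to make the second query start matching "april".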

Syntax

term([part_of_speech,] word_class_name, …)

Arguments

The function requires at least one word class name and accepts any number of word class names from a dictionary of the WordClasses category.

To narrow the search, you can specify an optional part of speech argument (POS tag). In that case the function returns only those word class matches that have the indicated POS tag.

The list of valid values for the argument part of speech is included in the table below.

| Part of speech value | Short notation | Example |
|---|---|---|
| noun | | event, sister’s, cats … |
| verb | | talked, admit, moving, goes … |
| adverb | advb | suddenly, more, yet, now … |
| adjective | adjc | big, important, hard-working … |
| particle | prcl | not, to … |
| pronoun | pron | me, he, their, who, nothing … |
| numeral | nmrl | three, fifth … |

Examples of word classes from the Default dictionary of the WordClasses category can be found in the picture below. More information on working with word classes can be found in the chapter Working with dictionaries.

[Image: pdl term 1]

Parameters

The function also takes a number of named parameters listed in the table below.

| Parameter | Valid values | Description | Example |
|---|---|---|---|
| stem | yes/no | yes: searches for all word forms of the word list elements; no: matches only the forms indicated in the word list. Default: yes. | term(costs, stem:=no) will find "costs", but not "cost". |
| pos | yes/no | yes: searches by the POS tag stated in the word list; no: ignores the POS tag stated in the word list. Default: yes. | term(months, pos:=no) will match "may" both in "he was born in May" and in "a minister may face trial". |
| match | range | The whole fragment of text between the first and the last argument, including punctuation, is extracted. | |
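The effect of the stem and pos parameters can be modelled with a short Python sketch. It is a toy model under stated assumptions, not the product's implementation: tokens and word class entries are annotated by hand with a lemma and a POS tag, and the `Token` type and the Python `term` function are hypothetical stand-ins.

```python
from typing import NamedTuple

class Token(NamedTuple):
    text: str   # surface form as it appears in the text
    lemma: str  # dictionary (base) form
    pos: str    # part-of-speech tag

def term(tokens, word_class, stem=True, pos=True):
    """Return surface forms of tokens that match any entry of word_class."""
    hits = []
    for tok in tokens:
        for entry in word_class:
            if stem:
                # Compare base forms, so every word form of the entry matches.
                same_word = tok.lemma == entry.lemma
            else:
                # Compare only the exact form listed in the word class.
                same_word = tok.text.lower() == entry.text.lower()
            # With pos=True the POS tag stored with the entry must match too.
            same_pos = (tok.pos == entry.pos) if pos else True
            if same_word and same_pos:
                hits.append(tok.text)
                break
    return hits

# Hypothetical word classes; every entry carries its own lemma and POS tag.
MONTHS = [Token("may", "may", "noun")]
COSTS = [Token("costs", "cost", "noun")]

# Hand-annotated tokens standing in for the output of a linguistic analyzer.
may_tokens = [Token("May", "may", "noun"), Token("may", "may", "verb")]
cost_tokens = [Token("cost", "cost", "noun"), Token("costs", "cost", "noun")]

months_default = term(may_tokens, MONTHS)             # POS tag must match
months_any_pos = term(may_tokens, MONTHS, pos=False)  # POS tag ignored
costs_default = term(cost_tokens, COSTS)              # all word forms
costs_exact = term(cost_tokens, COSTS, stem=False)    # listed form only
```

In this model, turning pos off lets the verb "may" match the month entry, and turning stem off restricts the match to the exact form "costs", which mirrors the examples in the table.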

Examples

term(noun, months) returns all word forms of elements from the Months word class if they are nouns, e.g. "November", "November’s", "Novembers".

term(company_postfix, match:=range) returns all elements from the company_postfix word class including punctuation, e.g. "A.G", "Advisors, LLC", "Agency, Inc.", etc.

term(company_postfix) returns all elements from the company_postfix word class without punctuation, e.g. "A G", "Advisors LLC", "Agency Inc.", etc.
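The difference between the default result and match:=range in the two examples above can be sketched in Python. This is a hedged illustration only: `find_entry` is a hypothetical helper, the sample sentence is made up, and the sketch assumes that punctuation between the words of an entry is ignored during matching.

```python
import re

def find_entry(text, entry, match_range=False):
    """Find a multi-word entry in text, ignoring punctuation between words."""
    words = list(re.finditer(r"\w+", text))
    target = entry.lower().split()
    n = len(target)
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        if [m.group().lower() for m in window] == target:
            if match_range:
                # Whole span from the first to the last matched word,
                # punctuation in between included.
                return text[window[0].start():window[-1].end()]
            # Default: only the matched words, joined by spaces.
            return " ".join(m.group() for m in window)
    return None

text = "Smith Advisors, LLC was founded in 2001."
plain = find_entry(text, "Advisors LLC")
ranged = find_entry(text, "Advisors LLC", match_range=True)
```

Here `plain` holds the matched words without the comma, while `ranged` is the original fragment of the text with the comma preserved.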

Task Example: Searching for Dates

To find dates in the format <month> <day>, <year> (e.g. "August 30, 2016"), the query should define a sequence of a month name, one or two digits, a comma, and four more digits:

phrase(0, lemma(noun, january, february, march, april, may, june, july, august, september, october, november, december), length(1, 2, char(digit)), char(comma), length(4, char(digit)))

However, the list of month names is likely to be needed again, so it is more convenient to create a Months word class. This also makes the query more concise:

phrase(term(noun, months), length(1, 2, char(digit)), char(comma), length(4, char(digit)))

[Image: pdl term 2]
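For comparison, the same date pattern can be approximated outside PDL with a Python regular expression. This is a rough sketch, not an equivalent of the query above: it is case-sensitive, matches only the base forms of month names, and the names `MONTHS`, `DATE_RE`, and `find_dates` are introduced here for illustration.

```python
import re

# The reusable list of month names, playing the role of the Months word class.
MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# Month name, one or two digits, a comma, then exactly four digits.
DATE_RE = re.compile(rf"\b(?:{'|'.join(MONTHS)}) \d{{1,2}}, \d{{4}}\b")

def find_dates(text):
    """Return all substrings of text that look like '<month> <day>, <year>'."""
    return DATE_RE.findall(text)

dates = find_dates("The meeting on August 30, 2016 was moved to May 5, 2017.")
```

As with the word class, the month list is defined once and can be reused in other patterns.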