word

Purpose

The function searches for tokens by criteria such as capitalization, morphological category, token type, and alphabet. It duplicates the functionality of the functions char(), lemma(), stem(), form(), and case(), allowing you to write a query more concisely.

Syntax

word()

word([word_parameters,]argument)

Arguments

The function accepts a single argument. When used without arguments, the function matches any token.

The first argument, word_parameters, specifies a part of speech, a modifier, capitalization, alphabet, etc.

The function also accepts the following optional named parameters:

Parameter            Comments
sentpart             Is used to set a particular syntactic role.
length               Is used to find tokens of a certain length.
ocr                  Is used to find tokens that were recognized by the PolyAnalyst OCR module.
modality             Is used to switch on/off the search for tokens expressing modality.
negate:=yes/no/any   Is used to switch on/off the search for negative contexts.
junk                 Is used to find junk tokens, such as ones containing non-alphabetic characters or an unusually high percentage of consonants.
nojunk               Excludes junk tokens, such as ones containing non-alphabetic characters or an unusually high percentage of consonants.
case                 Is used to make the search case-sensitive.
alphabet             Is used to set a specific alphabet.

Several parameters can be combined using the symbols "_" (AND) and "|" (OR), for example, word(noun_upper|adjective) or word(noun_upper_adjective).
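The way "_" and "|" combine can be modeled outside PolyAnalyst with a short Python sketch. It is not the product's implementation: the function names and the idea of representing a token as a set of feature labels are assumptions made for illustration. The sketch also assumes that "_" binds tighter than "|", which is inferred from the example word(noun_title|verb) = case(title, partofspeech(noun)) or partofspeech(verb).

```python
def parse_word_parameters(expr: str) -> list[list[str]]:
    """Split a combined parameter string into OR-groups of AND-ed features.

    Hypothetical helper: "|" separates alternatives, "_" joins features
    that must all hold within one alternative.
    """
    return [group.split("_") for group in expr.split("|")]


def matches(expr: str, token_features: set[str]) -> bool:
    """Return True if the token satisfies at least one AND-group in full."""
    return any(
        all(feature in token_features for feature in group)
        for group in parse_word_parameters(expr)
    )


# "noun_title|verb" reads as (noun AND title) OR verb.
groups = parse_word_parameters("noun_title|verb")
title_noun = matches("noun_title|verb", {"noun", "title"})
lower_noun = matches("noun_title|verb", {"noun", "lower"})
```

Under these assumptions, a title-case noun satisfies noun_title|verb, while a lowercase noun does not, because it fails the title requirement and is not a verb.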

It is possible to set arguments using the named parameters lemma/stem/form, for example, word(lemma:=start).

If there is a conflict between the first argument and the named parameter, the named parameter takes precedence.

Returned Value

Documents matching the query.

Examples

word() = . finds any token.

word(abc) = stem(abc) finds "abc" in any form.

word(noun) = partofspeech(noun) = lemma(noun) finds all nouns.

word(title) = case(title) finds any word in title case.

word(noun_title) = case(title, partofspeech(noun)) finds all nouns in title case.

word(noun_title|verb) = case(title, partofspeech(noun)) or partofspeech(verb) finds all nouns in title case and all verbs.

word(verb, do, negate:=yes) = negate(partofspeech(verb, do)) finds negative contexts with the verb "do" ("never did", "doesn’t do").

word(noun, length:>=3) = length(3, partofspeech(noun)) finds a noun consisting of three or more characters.

word(noun_mixed|verb|alnum, length:>=4, negate:=yes) = negate(length(4, lemma(verb) or case(mixed, lemma(noun)) or char(alnum))) finds tokens of four or more characters in a negative context that are nouns in mixed case, verbs, or tokens consisting of alphabetic and numeric characters.

word(title, "abc", sentpart:=subject) = sentpart(subject, case(title, abc)) finds the word "abc" in title case acting as the subject.