APPENDIX
- Examples of Simple XPDL Rules
- Examples of Hierarchical XPDL Rules
Example 1
The rule extracts heart rate values. The query matches the words "heart rate", "pulse rate", "pulse", or "HR" (upper case only), followed by a form of the verb "be", a colon, or "of", followed by a two- or three-digit number.
The heart rate value forms a named group "heart_rate" and goes into the output as "HeartRate".
XPDL rule
Rule fragment
rule: heart_rate
{
    query: phrase(0, phrase(0, heart or pulse, rate) or case(upper, [HR]) or pulse,
        be or ":" or of,
        {length(2, 3, char(num))}:heart_rate)
    result: HeartRate = $heart_rate
}
Text
Result
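For readers without an XPDL engine at hand, the matching logic of Example 1 can be approximated with a regular expression. The Python sketch below is only an illustration and not part of XPDL: the function name `extract_heart_rates`, the explicit list of "be" forms (XPDL presumably resolves these through morphology), and the sample sentence in the usage note are assumptions.

```python
import re

# Approximation of the Example 1 query:
#   keyword ("heart rate", "pulse rate", "pulse", or upper-case "HR"),
#   then a form of "be", a colon, or "of",
#   then a two- or three-digit number captured as the group "heart_rate".
HEART_RATE = re.compile(
    r"\b(?:(?i:(?:heart|pulse)\s+rate|pulse)|HR)\b\s*"
    r"(?i:\b(?:is|was|are|were|be|been|being|am|of)\b|:)\s*"
    r"(?P<heart_rate>\d{2,3})\b"
)


def extract_heart_rates(text):
    """Return one {"HeartRate": value} record per match, mirroring the result line."""
    return [{"HeartRate": m.group("heart_rate")} for m in HEART_RATE.finditer(text)]
```

With the made-up input `"Heart rate is 72 bpm; HR: 110."`, the function returns `[{"HeartRate": "72"}, {"HeartRate": "110"}]`. Lower-case "hr" is rejected because only the `HR` alternative is case-sensitive, which mirrors `case(upper, [HR])` in the rule.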
Example 2
The rule extracts International Standard Serial Numbers (ISSN), which are unique codes used by publishers to identify serial publications. The rule looks for the word "ISSN" followed by a sequence of four digits, a dash, and then either another four digits or three digits and "x".
The ISSN forms a named group "issn" and goes into the output as "Number", while the attribute "Type" receives a constant value "ISSN".
XPDL rule
Rule fragment
rule: ISSN
{
    query: sfollow(case("ISSN"), not char(rb),
        {phrase(0, length(4, 4, char(digit)), char("-"),
            length(4, 4, char(digit)) or phrase(0, length(3, 3, char(digit)), "x"))}:issn)
    result: Number = $issn
    attribute: Type = "ISSN"
}
Text
Result
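The pattern in Example 2 also maps naturally onto a regular expression. The following Python sketch is an approximation under stated assumptions: `extract_issn` is a hypothetical name, the exact semantics of `sfollow` and `not char(rb)` are not reproduced (only an optional colon and whitespace are allowed between "ISSN" and the number), both "x" and "X" are accepted, and the sample numbers in the usage note are made up.

```python
import re

# "ISSN" (case-sensitive), optional colon/whitespace, then NNNN-NNNN or NNNN-NNNx.
ISSN_RULE = re.compile(r"\bISSN\b[:\s]*(?P<issn>\d{4}-(?:\d{4}|\d{3}[xX]))(?!\w)")


def extract_issn(text):
    """Return records shaped like the rule's result and attribute lines."""
    return [
        {"Number": m.group("issn"), "Type": "ISSN"}
        for m in ISSN_RULE.finditer(text)
    ]
```

For the input `"ISSN 1234-5678; ISSN: 1234-567x"` the function returns two records, with `Number` set to `"1234-5678"` and `"1234-567x"` respectively and `Type` set to `"ISSN"` in both.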
Example 3
The rule extracts geoadministrative entities from their context. It looks for the phrase "road to" followed by one to five repetitions of either of the following elements:
- Title-cased word which is unknown to Morphology Dictionary and consists of alphabetic characters
- Preposition de/do/du/da
The name of a geoadministrative entity forms a named group "geoadm", which goes into the output as "Name", while the attribute "Category" receives a constant value "Geolocation".
XPDL rule
Rule fragment
rule: road_to
{
    query: phrase(0, road, to,
        {repeat(1, 5, char(alpha, case(title, unknownword(Morphology))) or orn(de, do, du, da))}:geoadm)
    confidence: 0.7
    result: Name = $geoadm
    attribute: Category = "Geolocation"
}
Text
Result
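The context-based matching of Example 3 can be sketched as a token scan. This Python example is illustrative only: `KNOWN_WORDS` is a tiny hypothetical stand-in for the Morphology Dictionary lookup, `extract_road_to` and `is_name_token` are invented names, and the sample sentence in the usage note is made up.

```python
import re

# Hypothetical stand-in for "known to the Morphology Dictionary".
KNOWN_WORDS = {"The", "We", "Monday"}
PREPOSITIONS = {"de", "do", "du", "da"}


def is_name_token(token):
    """Preposition de/do/du/da, or an alphabetic title-cased word not in the dictionary."""
    if token in PREPOSITIONS:
        return True
    return token.isalpha() and token.istitle() and token not in KNOWN_WORDS


def extract_road_to(text, max_len=5):
    """Find "road to" followed by one to max_len name tokens."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    results = []
    i = 0
    while i < len(tokens) - 1:
        if tokens[i].lower() == "road" and tokens[i + 1].lower() == "to":
            start = i + 2
            j = start
            while j < len(tokens) and j - start < max_len and is_name_token(tokens[j]):
                j += 1
            if j > start:  # at least one repetition is required
                results.append({
                    "Name": " ".join(tokens[start:j]),
                    "Category": "Geolocation",
                    "confidence": 0.7,
                })
                i = j
                continue
        i += 1
    return results
```

For `"We took the road to Rio de Janeiro yesterday."` the scan stops at the lower-case word "yesterday" and yields a single record with `Name` equal to `"Rio de Janeiro"`.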
Example 4
The rule extracts positive and negative feedback on customer service. The head rule "service_quality" looks for the words "service" or "customer service" and filters out all texts that do not contain them. It has two child rules: "positive" and "negative".
The rule "positive" looks for a sequence of a positive adjective (e.g. "excellent") and the head rule match. The sequence forms a named group "m" and goes into the output as "Match", which has the attributes "Evaluation" (evaluative adjective) and "Object" (object of evaluation).
The rule "negative" looks for a sequence of a negative adjective (e.g. "terrible") and the head rule match. Just as in its sister rule, the sequence forms a named group "m" and goes into the output as "Match", which has the attributes "Evaluation" (evaluative adjective) and "Object" (object of evaluation).
XPDL rule
Rule fragment
rule: service_quality
{
    // get only texts with keywords
    query: {phrase(0, optional(customer), service)}:obj
    rule: positive
    {
        // get keywords in positive context: good, excellent, great
        // good customer service
        query: {phrase(0, {possible(orn(good, excellent, great))}:eval, $obj)}:m
        result: Match = $m
        attribute: Evaluation = $eval
        attribute: Object = $obj
    }
    rule: negative
    {
        // get keywords in negative context: bad, terrible, horrible
        // bad customer service
        query: {phrase(0, {possible(orn(bad, terrible, horrible))}:eval, $obj)}:m
        result: Match = $m
        attribute: Evaluation = $eval
        attribute: Object = $obj
    }
}
Text
Result
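The head-rule/child-rule structure of Example 4 can be imitated in Python as a cheap filter followed by two narrower patterns. This is a sketch under assumptions, not an XPDL implementation: `extract_feedback` and `CHILD_RULES` are hypothetical names, the adjective is treated as required (the precise meaning of `possible(...)` is not reproduced), an extra `rule` key is added to show which child rule fired, and the sample text is invented.

```python
import re

OBJECT = r"(?P<obj>(?:customer\s+)?service)"
HEAD_FILTER = re.compile(r"\b(?:customer\s+)?service\b", re.IGNORECASE)
CHILD_RULES = {
    "positive": ("good", "excellent", "great"),
    "negative": ("bad", "terrible", "horrible"),
}


def extract_feedback(text):
    """Apply the head filter first, then each child rule to the surviving text."""
    if not HEAD_FILTER.search(text):
        return []  # the head rule discards texts without the keywords
    results = []
    for rule_name, adjectives in CHILD_RULES.items():
        pattern = re.compile(
            r"\b(?P<m>(?P<eval>" + "|".join(adjectives) + r")\s+" + OBJECT + r")\b",
            re.IGNORECASE,
        )
        for m in pattern.finditer(text):
            results.append({
                "rule": rule_name,
                "Match": m.group("m"),
                "Evaluation": m.group("eval"),
                "Object": m.group("obj"),
            })
    return results
```

For `"The hotel had excellent customer service but terrible service at the bar."` the sketch returns one positive record (`Match` = `"excellent customer service"`) and one negative record (`Match` = `"terrible service"`).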
Example 5
This rule extracts facts about bankruptcy. The query in the head rule "bankruptcy_context" looks for bankruptcy-related context, which can be a noun (assigned to the named group "noun") or an adjective (assigned to the named group "adj"). The rule acts as a filter that excludes texts unrelated to bankruptcy, which increases execution speed. The head rule has two child rules: "noun" and "adjective".
The rule "noun" looks for company or organization names that occur in the noun context found by the head rule. The rule outputs the query match as "Match" and the name of the bankrupt company or organization as attribute "Company".
The rule "adjective" looks for company or organization names that occur in the adjective context found by the head rule. It has a child exception rule "negative_context", which excludes future-tense cases, that is, cases where a company or organization has not yet become bankrupt. This rule outputs the query match as "Match" and the name of the bankrupt company or organization as attribute "Company".
Please note that the rule requires a parent Entity Extraction node.
XPDL rule
Rule fragment
rule: bankruptcy_context
{ // gets only texts with key words and phrases: bankruptcy, bankruptcy filing etc.
    query: {bankruptcy or phrase(0, "bankruptcy", "filing"|"auction"|"protection")}:noun
        or
        {bankrupt insolvent}:adj
    rule: noun
    { // Find companies and organizations in bankruptcy context with noun keywords
        query: {phrase(0,
            // Solyndra filed for bankruptcy
            phrase(2, {entity(companies|organizations)}:bankrupt, "file for"|"declare", $noun)
            or
            // bankruptcy of Solyndra LLC
            phrase($noun, [of], {entity(companies|organizations) / lemma(genitive)}:bankrupt)
            or
            // the Solyndra's bankruptcy
            phrase(0, lemma (genitive, {entity(companies)}:bankrupt), $noun)
            )}:m
        result: Match = $m
        attribute: Company = toentity(companies|organizations, $bankrupt, field:=Name)
    }
    rule: adjective
    // Find companies and organizations in bankruptcy context with adjective keywords
    {
        query: // bankrupt Solyndra
            {phrase({$adj}:context,
                optional(repeat(1, 4, lemma (noun_nominative adjective participle present))),
                {entity(companies|organizations)}:bankrupt)
            or
            // Solyndra, a bankrupt solar panel manufacturer
            phrase(3, {entity(companies|organizations)}:bankrupt,
                [a]|[an],
                $adj,
                "company"|"firm"|"group"|"maker"|"manufacturer"|"producer",
                not(entity(companies|organizations)))}:m
        rule_except: negative_context
        { // exclude future tense to leave out companies that are not bankrupt yet
            query: phrase(0, "soon-to-be"|lemma(verb, "will"), $m)
            result: Match = $m
            attribute: Company = toentity(companies|organizations, $bankrupt, field:=Name)
        }
    }
}
Text
Result
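The interplay between the "adjective" rule and its exception rule in Example 5 can be illustrated with a deliberately reduced Python sketch. It covers only the "bankrupt Solyndra" pattern and the "soon-to-be" / "will" exception. Everything here is an assumption made for illustration: `extract_bankrupt_adjective` is a hypothetical name, the `companies` list stands in for the output of the parent Entity Extraction node, the intervening-word limit is a simple word count rather than a morphological check, and "Acme Corp" in the usage note is an invented company.

```python
import re


def extract_bankrupt_adjective(text, companies):
    """Match "<bankrupt|insolvent> [up to 4 words] <company>", minus future-tense cases.

    `companies` is assumed to come from an upstream entity extraction step.
    """
    results = []
    for name in companies:
        pattern = re.compile(
            # Optional future-tense marker, captured so the match can be rejected.
            r"(?P<prefix>\b(?:soon-to-be|will)\s+)?"
            # The adjective, up to four intervening words, then the company name.
            r"(?P<m>\b(?:bankrupt|insolvent)\s+(?:\w+\s+){0,4}?"
            + re.escape(name)
            + r")(?!\w)",
            re.IGNORECASE,
        )
        for m in pattern.finditer(text):
            if m.group("prefix"):
                continue  # exception rule: the company is not bankrupt yet
            results.append({"Match": m.group("m"), "Company": name})
    return results
```

Given the text `"Bankrupt Solyndra sold its plant. The soon-to-be bankrupt Acme Corp denied the reports."` and the company list `["Solyndra", "Acme Corp"]`, only the Solyndra match survives; the Acme Corp match is dropped by the exception check.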
Example 6
This rule extracts relationships between a company and its founder. The query in the rule "filter_texts" matches all texts that contain the relationship participants, that is, entities of the type People, Company, or Organization. The rule has a child rule "key_words".
The rule "key_words" looks for words and phrases that may refer to the company founders in the texts matched by the head rule ("filter_texts"). For shorter notation, the rule calls a global macro "founder" which lists some of those words. The rule has a child filter rule "singular".
The filter rule "singular" filters matches of the upper-level rule to keep only those that contain singular nouns. The rule has a child rule "founder_of".
The rule "founder_of" describes a pattern where a person name (stored in the named group "person") is followed by optional elements (e.g. the verb "to be" or the adverb "currently"), the word "founder" or its synonym (stored in the named group "founder_np"), the preposition "of" or "at", and a company name (stored in the named group "company"). All three elements are collected in the named group "founder".
The rule outputs the concatenated match elements "person", "founder_np" and "company" as "Match", the name of the founder as "Founder" and the name of the founded company as "Company".
Please note that the rule requires a parent Entity Extraction node.
XPDL rule
Rule fragment
// nouns that designate a founder
macro: founder() = orn([founder], [cofounder], "co-founder")
rule: filter_texts
// get only texts with main components: people, companies, organizations
{
    query: {entity(people, field:= name)}:person or {entity(companies) or entity(organizations)}:company
    rule: key_words
    // in the texts with main components find key words and phrases: founder, founding partner etc.
    {
        query: {macro(founder) or
            phrase(0, [founding] or macro(founder), [and], orn ("partner", "member", "president", "chairman", "chair", [CEO], phrase("board", "member")))}:founder_np
            and
            ($person or $company)
        rule_filter: singular
        // leave only texts with keywords for "founder" in singular
        {
            query: $founder_np & lemma(singular)
            rule: founder_of
            // find main components in the text with keywords
            // Patterns:
            // Michael Bloomberg is the founder of Bloomberg LP
            // Michael Bloomberg, founder of Bloomberg LP
            // Michael Bloomberg is also the founder of Bloomberg LP
            // Michael Bloomberg, founder and CEO of Bloomberg LP
            {
                query: phrase({$person}:founder,
                    optional([is] or [was] or [known as] or char("(") or lemma(verb, become)),
                    optional(also or currently),
                    optional([a] or [the]),
                    {$founder_np}:founder,
                    [of] or [at],
                    {$company}:founder)
                result: Match = concat($founder:person, " ", $founder:founder_np, " ", $founder:company)
                attribute: Founder = toentity(people, $founder:person, field:=name)
                attribute: Company = toentity(companies|organizations, $founder:company, field:=name)
            }
        }
    }
}
Text
Result
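The "founder_of" pattern of Example 6 can be approximated in Python once the people and company names are known. The sketch below is hypothetical and simplified: `extract_founders` and `FOUNDER_NP` are invented names, the `people` and `companies` lists stand in for the parent Entity Extraction node, the "[founding] ..." variant of the keyword phrase is omitted, only a few forms of "become" are listed, and the singular filter is imitated by listing singular forms only. The test sentences are the patterns quoted in the rule's comments.

```python
import re

# Singular founder keywords, optionally followed by "and <role>".
FOUNDER_NP = (
    r"(?:founder|cofounder|co-founder)"
    r"(?:\s+and\s+(?:partner|member|president|chairman|chair|CEO|board\s+member))?"
)


def extract_founders(text, people, companies):
    """Match <person> [verb] [adverb] [article] <founder_np> of|at <company>."""
    if not people or not companies:
        return []  # both participants are required by the pattern
    person_alt = "|".join(re.escape(p) for p in people)
    company_alt = "|".join(re.escape(c) for c in companies)
    pattern = re.compile(
        r"(?P<person>" + person_alt + r")[,\s]*"
        r"(?:(?:is|was|known as|became)\s+|\(\s*)?"
        r"(?:(?:also|currently)\s+)?"
        r"(?:(?:a|the)\s+)?"
        r"(?P<founder_np>" + FOUNDER_NP + r")\s+"
        r"(?:of|at)\s+"
        r"(?P<company>" + company_alt + r")"
    )
    results = []
    for m in pattern.finditer(text):
        results.append({
            # Mirrors concat($founder:person, " ", $founder:founder_np, " ", $founder:company)
            "Match": " ".join([m.group("person"), m.group("founder_np"), m.group("company")]),
            "Founder": m.group("person"),
            "Company": m.group("company"),
        })
    return results
```

For "Michael Bloomberg is the founder of Bloomberg LP" the `Match` value is `"Michael Bloomberg founder Bloomberg LP"`: as in the rule, optional elements such as "is the" are matched but left out of the concatenated output. The plural "founders" does not match, which loosely corresponds to the "singular" filter rule.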