Macros Support
Note: This section covers a relatively advanced topic that is relevant when writing large and complicated rule sets. It can be skipped without loss of continuity.
Macros are useful if the rule file contains multiple repetitions of the same (or almost the same) rule or query. Instead of copy-pasting the same fragment multiple times, you can declare a macro that holds the repeated rules or queries and then call that macro wherever you want to use them. Macros also let you make a change in a single place and have it applied at every location where the macro is called.
Macro Declaration Syntax
A macro declaration starts with the trigger macro: followed by the macro name and a comma-separated list of arguments in parentheses. If a macro has no arguments, the parentheses are left empty. The body of a macro is enclosed in curly brackets. Macro names are user-defined and may consist only of alphanumeric characters and underscores; special characters and punctuation (percent or at signs, dots, commas, spaces, brackets, etc.) are not allowed.
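If a macro contains only a PDL query, a simplified declaration is allowed: the macro name and its parentheses are followed by an equals sign and the query itself (see Figure 4).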
Macros can be declared either at the top of a rule file (global macros, which can be called by any rule in the file) or inside a rule before its query section (local macros, which can be called by the rule in which they are declared and by its child rules).
A macro declaration must precede the call of the macro in the file. After a macro is declared, it can be called and used.
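As a minimal sketch, the two declaration forms might look as follows when placed at the top of a rule file as global macros. The macro names sector_query and sector_query_short and the label sector are hypothetical and chosen only for illustration; the query functions are the ones used in the examples later in this section.

```xpdl
/* Standard declaration: the body is enclosed in curly brackets */
macro: sector_query()
{
query: {phrase(0, orn(software, insurance), form(company, firm))}:sector
}

/* Simplified declaration: possible because the body is a single PDL query */
macro: sector_query_short() = {phrase(0, orn(software, insurance), form(company, firm))}:sector
```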
Macro Call Syntax
A macro call starts with the word macro followed by parentheses that contain the macro name and, if the macro takes arguments, a comma-separated list of arguments, for example macro(abbrev_query).
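The following sketch shows a call in the query position, assuming a hypothetical global macro named sector_query; the rule name, the label sector, and the result column are likewise illustrative. The declaration comes first because it must precede the call in the file.

```xpdl
macro: sector_query() = {phrase(0, orn(software, insurance), form(company, firm))}:sector

rule: sector_mentions
{
/* The call below is replaced by the macro body when macros are expanded */
query: macro(sector_query)
result: Sector = $sector
}
```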
Note that macros do not reduce execution time, but they do reduce typing and the repetition of identical rule fragments, which improves rule readability.
Example. Extracting company names and abbreviations
Let us consider a ruleset in Figure 1 that extracts company names.
Rule fragment
rule: full_comp_names
{
/* For example, Citigroup Inc, Guggenheim Securities LLC, American Airlines */
query: {phrase(0, repeat(case(title|upper|mixed, lemma(noun|adjective))),
case(title|upper, form(holding, corp, venture, inc, llc, plc, airlines, partners, petroleum)))}:comp
/* exclude contexts such as "Agreement to form(Joint Venture" */
rule_except: exclude_not_companies
{
query: $comp & orn("joint venture", "managing partners", "bank holding company act")
result: Company = $comp
}
}
/*For example, software firm Epic*/
rule: short_comp_names
{
query: {phrase(0, orn(financial, "real estate", software, insurance), form( company, firm),
repeat(case(title, lemma(noun|adjective))))}:comp
result: Company = $comp
}
The first rule ("full_comp_names") matches company names that consist of several capitalized words followed by a word for the company’s type (such as "Inc.", "LLC" or "Holding"). It therefore matches formal company names, such as "Geysers Power Company, LLC" or "Morgan Stanley Capital Group Inc.". The nested exception rule "exclude_not_companies" is necessary to exclude incorrect matches found by the upper-level rule.
The second rule extracts informal company names (such as "Tesco" rather than "Tesco PLC"). To reduce the number of errors, the rule relies on surrounding context. The rule matches phrases that describe a company’s business sector ("insurance company", "software firm", etc.) followed by title-cased words (assuming that they may be a company name).
The next step is to extract company name abbreviations that sometimes follow a company’s name. Abbreviations may appear in a variety of forms ("Foster Wheeler <FWC>…", "Foster Wheeler, or FWC…", "Foster Wheeler, known as FWC…", "Foster Wheeler ("FWC")…", etc.) and if the abbreviation extraction part is added to the query that extracts company names, the query may become unreadable. So we need a separate rule for abbreviation extraction.
As shown in Figure 2, the abbreviation rule has to be added to both "full_comp_names" and "short_comp_names" because abbreviations may follow both formal and informal company names.
Rule fragment
rule: full_comp_names
{
/* For example, Citigroup Inc, Guggenheim Securities LLC, American Airlines */
query: {phrase(0, repeat(case(title|upper|mixed, lemma(noun|adjective))),
case(title|upper, form(holding, corp, venture, inc, llc, plc, airlines, partners, petroleum)))}:comp
/* exclude contexts such as "Agreement to form(Joint Venture" */
rule_except: exclude_not_companies
{
query: $comp & orn("joint venture", "managing partners", "bank holding company act")
result: Company = $comp
rule: abbreviations
{
query: {phrase($comp, phrase(0, "(", optional(""""), {length(2,3, case(upper))}:abbr, optional(""""), ")") /* FWC, "FWC" */
or phrase(0, orn("or", "known as", "referred to as", aka), {length(2,3, case(upper))}:abbr) /* or FWC, known as FWC */
)}:comp_full
result: Company = $comp_full:comp
attribute: Abbreviation = $comp_full:abbr
}
}
}
/*For example, software firm Epic*/
rule: short_comp_names
{
query: {phrase(0, orn(financial, "real estate", software, insurance), form( company, firm),
repeat(case(title, lemma(noun|adjective))))}:comp
result: Company = $comp
rule: abbreviations
{
query: {phrase($comp, phrase(0, "(", optional(""""), {length(2,3, case(upper))}:abbr, optional(""""), ")") /* FWC, "FWC" */
or phrase(0, orn("or", "known as", "referred to as", aka), {length(2,3, case(upper))}:abbr) /* or FWC, known as FWC */
)}:comp_full
result: Company = $comp_full:comp
attribute: Abbreviation = $comp_full:abbr
}
}
The rules already look quite bulky and complicated, and the situation might get even worse if more rules with some other context were added where the abbreviation extraction rule would have to be copy-pasted again.
To remove the repetitive statements, consider defining a macro. Figure 3 shows how to define a macro that holds a PDL query.
Rule fragment
macro: abbrev_query()
{
query: {phrase($comp, phrase(0, "(", optional(""""), {length(2,3, case(upper))}:abbr, optional(""""), ")") /* FWC, "FWC" */
or phrase(0, orn("or", "known as", "referred to as", aka), {length(2,3, case(upper))}:abbr) /* or FWC, known as FWC */
)}:comp_full
}
Figure 3 shows the standard macro declaration, but because the macro contains only a PDL query, the simplified declaration may be used as well (Figure 4).
Rule fragment
macro: abbrev_query() = {phrase($comp, phrase(0, "(", optional(""""), {length(2,3, case(upper))}:abbr, optional(""""), ")") /* FWC, "FWC" */
or phrase(0, orn("or", "known as", "referred to as", aka), {length(2,3, case(upper))}:abbr) /* or FWC, known as FWC */)}:comp_full
After a macro is declared, it must be called wherever it is to be used. Figure 5 shows how to call a macro.
Rule fragment
macro: abbrev_query() = {phrase($comp, phrase(0, "(", optional(""""), {length(2,3, case(upper))}:abbr, optional(""""), ")") /* FWC, "FWC" */
or phrase(0, orn("or", "known as", "referred to as", aka), {length(2,3, case(upper))}:abbr) /* or FWC, known as FWC */)}:comp_full
rule: full_comp_names
{
/* For example, Citigroup Inc, Guggenheim Securities LLC, American Airlines */
query: {phrase(0, repeat(case(title|upper|mixed, lemma(noun|adjective))),
case(title|upper, form(holding, corp, venture, inc, llc, plc, airlines, partners, petroleum)))}:comp
/* exclude contexts such as "Agreement to form(Joint Venture" */
rule_except: exclude_not_companies
{
query: $comp & orn("joint venture", "managing partners", "bank holding company act")
result: Company = $comp
rule: abbreviations
{
query: macro(abbrev_query)
result: Company = $comp_full:comp
attribute: Abbreviation = $comp_full:abbr
}
}
}
/*For example, software firm Epic*/
rule: short_comp_names
{
query: {phrase(0, orn(financial, "real estate", software, insurance), form( company, firm),
repeat(case(title, lemma(noun|adjective))))}:comp
result: Company = $comp
rule: abbreviations
{
query: macro(abbrev_query)
result: Company = $comp_full:comp
attribute: Abbreviation = $comp_full:abbr
}
}
After removing the repeated queries, the rules are more readable and better organized. A further step is to declare a macro that holds a whole XPDL rule, as shown in Figure 6.
Rule fragment
macro: abbreviations()
{
rule: abbreviations
{
query: {phrase($comp, phrase(0, "(", optional(""""), {length(2,3, case(upper))}:abbr, optional(""""), ")") /* FWC, "FWC" */
or phrase(0, orn("or", "known as", "referred to as", aka), {length(2,3, case(upper))}:abbr) /* or FWC, known as FWC */)}:comp_full
result: Company = $comp_full:comp
attribute: Abbreviation = $comp_full:abbr
}
}
rule: full_comp_names
{
/* For example, Citigroup Inc, Guggenheim Securities LLC, American Airlines */
query: {phrase(0, repeat(case(title|upper|mixed, lemma(noun|adjective))),
case(title|upper, form(holding, corp, venture, inc, llc, plc, airlines, partners, petroleum)))}:comp
/* exclude contexts such as "Agreement to form(Joint Venture" */
rule_except: exclude_not_companies
{
query: $comp & orn("joint venture", "managing partners", "bank holding company act")
result: Company = $comp
macro(abbreviations)
}
}
/*For example, software firm Epic*/
rule: short_comp_names
{
query: {phrase(0, orn(financial, "real estate", software, insurance), form( company, firm),
repeat(case(title, lemma(noun|adjective))))}:comp
result: Company = $comp
macro(abbreviations)
}
Note: Macros are a special kind of syntax intended to make the rules easier to read and maintain, but they do not affect execution speed. They are expanded prior to execution, so the resulting code is equivalent to what it would have been like without macros.
Passing Parameters to Macros
Macros with arguments are more powerful than macros without arguments, as they can replace rules and queries that are similar but not exactly the same.
Let us consider a slightly modified ruleset in Figure 7 related to company name extraction.
Rule fragment
rule: full_comp_names
{
/* for example, Citigroup Inc, Guggenheim Securities LLC, American Airlines */
query: {phrase(0, repeat(case(title|mixed|upper, lemma(noun|adjective))),
case(title|upper, form(holding, corp, venture, "inc.", llc, plc, airlines, partners, petroleum)))}:comp
/* exclude contexts such as "Agreement to form(Joint Venture ...", "as the Managing Partners and beneficial owners..." */
rule_except: exclude_not_companies
{
query: $comp & orn("joint venture", "managing partners", "bank holding company act")
result: Company = $comp
rule: abbreviations
{
/* Foster Wheeler Corp (FWC) */
query: {phrase($comp, phrase(0, "(", {length(2,4, case(upper))}:info, ")"))}:comp_full
result: Company = $comp_full:comp
attribute: Abbreviation = $comp_full:info
}
rule: location
{
/* Astrata Group Inc. (New York) */
query: {phrase($comp, phrase(0, "(", {knownword(GeoAdministrative)}:info, ")"))}:comp_full
result: Company = $comp_full:comp
attribute: Location = $comp_full:info
}
rule: stock_symbol
{
/* Isonics Corp (NASDAQ: ISON) */
query: {phrase($comp, phrase(0, "(", phrase(NASDAQ or NYSE, {case(upper)}:info), ")"))}:comp_full
result: Company = $comp_full:comp
attribute: Stock Symbol = $comp_full:info
}
}
}
These rules extract company names together with additional information that often follows a company name in parentheses: an abbreviation ("AstraZeneca Plc. (AZN)"), a company location ("Astrata Group Inc. (New York)"), or a stock symbol ("Isonics Corp (NASDAQ: ISON)"). Locations, abbreviations, and stock symbols are displayed in the "Location", "Abbreviation", and "Stock Symbol" columns, respectively.
The ruleset is run on the following text:
The rules produce the output shown in Figure 8.
Notice that the rules "abbreviations", "location", and "stock_symbol" are almost identical except for the rule name (highlighted in red in the figure above), the query part that matches the information in parentheses (highlighted in green), and the attribute name (highlighted in blue). These parts can become macro parameters, so that different values are assigned to them each time the macro is called. An updated ruleset with a macro that uses parameters is shown in Figure 9.
To define a macro that takes arguments, parameters should be listed in parentheses that follow the macro name in the macro definition. Macro parameters can refer to the rule name, query, subquery, confidence value, result, attribute name, or value.
When calling a macro, a list of arguments should be stated in parentheses after the name of the macro. The number of arguments stated when calling a macro must match the number of parameters in the macro definition. When the macro is expanded, each use of a parameter in its body is replaced by the value of the corresponding argument.
Rule fragment
macro: additional_info(rule_name, info_in_bracket, attribute_name)
{
rule: rule_name
{
query: {phrase($comp, phrase(0, "(", info_in_bracket, ")"))}:comp_full
result: Company = $comp_full:comp
attribute: attribute_name = $comp_full:info
}
}
rule: full_comp_names
/* for example, Citigroup Inc, Guggenheim Securities LLC, American Airlines */
{
query: {phrase(0, repeat(case(title|mixed|upper, lemma(noun|adjective))),
case(title|upper, form(holding, corp, venture, "inc.", llc, plc, airlines, partners, petroleum)))}:comp
rule_except: exclude_not_companies
{
query: $comp & orn("joint venture", "managing partners", "bank holding company act")
result: Company = $comp
macro(additional_info, abbreviation, {length(2,4, case(upper))}:info, Abbreviation)
macro(additional_info, location, {knownword(GeoAdministrative)}:info, Location)
macro(additional_info, stock_symbol, phrase(NASDAQ or NYSE, {case(upper)}:info), Stock Symbol)
}
}
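To make the substitution concrete, here is a sketch of what the first call in the fragment above would presumably become after expansion, assuming each parameter is replaced textually by its argument as described earlier: rule_name becomes abbreviation, info_in_bracket becomes the length query, and attribute_name becomes Abbreviation. It is an illustration derived from the macro body, not actual tool output.

```xpdl
/* Assumed result of expanding:
   macro(additional_info, abbreviation, {length(2,4, case(upper))}:info, Abbreviation) */
rule: abbreviation
{
query: {phrase($comp, phrase(0, "(", {length(2,4, case(upper))}:info, ")"))}:comp_full
result: Company = $comp_full:comp
attribute: Abbreviation = $comp_full:info
}
```

The other two calls would expand the same way, differing only in the rule name, the query inside the parentheses, and the attribute name.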
Figure 10 illustrates this macro’s expansion in detail.
Rule fragment
macro: info_complémentaire(rule_name, info, attribute_name)
{
rule: rule_name
{
query: {phrase($comp, phrase(0, "(" or "/", info, ")" or "/"))}:comp_full
result: Entreprise = $comp_full
attribute: attribute_name = $comp_full:info
}
}
rule: full_comp_names
{
query: {phrase(0, repeat(case(title|mixed, lemma(noun|adjective))),
case(title|upper, form(airlines, assurance, développement, holding,
immobilier, "inc.")))}:comp
result: Entreprise = $comp
macro(info_complémentaire, abbreviation, {length(2,4, case(upper))}:info, Abbreviation)
macro(info_complémentaire, lieu, {knownword(GeoAdministrative)}:info, Lieu)
}