An introduction to nodes
Nodes constitute the fundamental building blocks of analysis. Nodes are the items that you add to a project’s flowchart to instruct PolyAnalyst to carry out a particular data processing operation.
A node represents a generic data processing action, such as import data or create statistical model. Once a node has carried out its purpose, the node also represents the results of the process, such as the imported dataset or the statistical model.
Speaking the language of PolyAnalyst
While nodes can be used individually, the real power of PolyAnalyst stems from using multiple nodes in sequence to achieve a larger analytical objective. Nodes can be linked together to form a chain of operations, in which the output of one node serves as the input to a subsequent node.
Using a language metaphor, nodes are like words that you can string together to form a sentence. In this sense, learning to use PolyAnalyst is similar to learning a second language. The challenge is translating your analytical objectives into this language. Learning the language requires learning the words (nodes), the definitions (each node’s role), and syntactic structure (how to place words in sequence).
The nodes available to you vary immensely in complexity. Speaking fluently takes some practice. It helps to start gradually by focusing on a few key node "words" such as import, filter, append, and analyze. You do not need to learn about all of the available nodes to get started.
The basics of using a node
Using a node generally involves the following steps, which we will explore in greater detail.
- Add a node to the flowchart.
- Connect the node to other nodes.
- Change the node’s settings.
- Carry out the node’s operation.
- Inspect the results of the node.
Input - Action - Output
Every node in PolyAnalyst accepts some form of input, performs some type of operation, and provides some type of output.
For example, consider a node that imports data from a database into PolyAnalyst:
- The node’s input - the data stored within the database.
- The node’s action - the process of importing the data.
- The node’s output - the imported data.
Now consider a node that removes some columns from a dataset:
- The node’s input - the dataset upon which to operate. This dataset is generally the output from some other node.
- The node’s action - the process of removing certain columns from the dataset.
- The node’s output - a new dataset consisting of the remaining columns.
Consider a node that adds a new column to a dataset:
- The node’s input - the dataset upon which to operate. This dataset is the output from some other node.
- The node’s action - the process of adding a column to the input dataset.
- The node’s output - a new dataset consisting of all the columns from the input dataset and the new column.
Finally, consider a node that trains a statistical model:
- The node’s input - the dataset upon which to train. This dataset is the output from some other node.
- The node’s action - the process of training the statistical model.
- The node’s output - the model object.
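The input - action - output pattern can be illustrated outside PolyAnalyst with a short Python sketch. This is only a conceptual analogy: the function names (`import_rows`, `drop_columns`, `add_column`), the sample data, and the representation of a dataset as a list of dictionaries are all assumptions made for illustration and are not part of PolyAnalyst.

```python
# Conceptual sketch only: these functions are hypothetical stand-ins,
# not PolyAnalyst's API. A dataset is modeled as a list of dict rows.

def import_rows():
    """Action: import data. The external source is hard-coded here."""
    return [
        {"id": 1, "price": 10.0, "qty": 3, "note": "a"},
        {"id": 2, "price": 4.5, "qty": 2, "note": "b"},
    ]

def drop_columns(rows, names):
    """Action: remove columns. Output: a new dataset with the remaining columns."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]

def add_column(rows, name, compute):
    """Action: add a column. Output: all input columns plus the new one."""
    return [{**row, name: compute(row)} for row in rows]

# Each step takes the previous step's output as its input.
imported = import_rows()
trimmed = drop_columns(imported, {"note"})
extended = add_column(trimmed, "total", lambda r: r["price"] * r["qty"])
print(extended)
```

Each function returns a new dataset rather than modifying its input, mirroring the way each node above produces its own output while the upstream node's result stays available.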
A node represents both a particular operation and the results of that operation. For example, a node that imports data represents both the action of importing data as well as the imported data.
A node does not produce output until it has been executed (until it has been instructed to carry out, and has completed, its operation). Up until the time the node is executed, the node only represents its operation. Once executed, the node represents both the operation and its result.
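The distinction between a node's operation and its result can be modeled with a small Python class. This is a minimal sketch under stated assumptions: the `Node` class, its `execute` method, and its `result` property are hypothetical names chosen for illustration and do not describe how PolyAnalyst is implemented.

```python
class Node:
    """Hypothetical model of a node: an operation plus, once executed, its result."""

    def __init__(self, name, operation):
        self.name = name
        self.operation = operation  # callable: takes the input, returns the output
        self.executed = False
        self._result = None

    def execute(self, input_data=None):
        """Carry out the operation and store its result on the node."""
        self._result = self.operation(input_data)
        self.executed = True
        return self._result

    @property
    def result(self):
        """The output is available only after the node has been executed."""
        if not self.executed:
            raise RuntimeError(f"node '{self.name}' has not been executed yet")
        return self._result

importer = Node("import", lambda _: [{"id": 1}, {"id": 2}])
counter = Node("count rows", lambda rows: len(rows))

importer.execute()
row_count = counter.execute(importer.result)
```

Before `execute` is called, the object holds only its operation, and asking for `result` raises an error. Afterward, the same object carries both the operation and its output.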
A completed analysis essentially consists of a collection of node results. For example, at the end of an analysis, you may have produced a few datasets and a few models and charts. You may also have generated certain reports. This collection of results constitutes your deliverable.
Configuring nodes
When a node is first added to the flowchart, it is generic. A generic node is not yet capable of execution (performing its operation).
You can choose to leave nodes in this generic state. In other words, you can add nodes to your flowchart without using them. Such nodes do not consume significant resources, although they may make your flowchart look cluttered. This can be acceptable at the start of an analysis, when you may be working in a blueprinting mode: focusing on the general sequence of operations and planning to fill in the details later.
To execute a node (that is, to have it carry out its operation), you must first configure it. Once configured, the node is no longer considered generic.
The operation performed by a node is referred to as node execution. Instructing PolyAnalyst to carry out a node’s operation is referred to as executing a node.
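The difference between a generic node and a configured one can be sketched in Python as follows. The `FilterNode` class, its `column` and `minimum` settings, and the `configure` method are invented for this illustration. They are assumptions, not PolyAnalyst settings or API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FilterNode:
    """Hypothetical node that keeps rows whose value meets a minimum."""
    column: Optional[str] = None     # unset while the node is generic
    minimum: Optional[float] = None  # unset while the node is generic

    @property
    def is_generic(self):
        # The node stays generic until every required setting is supplied.
        return self.column is None or self.minimum is None

    def configure(self, column, minimum):
        self.column = column
        self.minimum = minimum

    def execute(self, rows):
        if self.is_generic:
            raise ValueError("configure the node before executing it")
        return [row for row in rows if row[self.column] >= self.minimum]

rows = [{"qty": 1}, {"qty": 5}, {"qty": 9}]
node = FilterNode()
was_generic = node.is_generic   # True: no settings have been supplied yet
node.configure("qty", 5)
kept = node.execute(rows)       # rows with qty of at least 5
```

In this sketch, an unconfigured node can sit in a flowchart without doing any work, and attempting to execute it fails until its settings have been supplied.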
For example, once a node configured to import a specific spreadsheet has been executed, the imported dataset (representing the contents of that spreadsheet) is the output of the node. The output can then be used by other nodes as input.