SPSS Modeler: Advanced Stream Automation

If you’re new to scripting in Modeler, please read the Basic Stream Automation article before this one!

In this article, we’ll talk through two more advanced topics:

Setting properties of nodes
Looping through nodes to set connection properties on multiple nodes at once

These are extremely common use cases, and combining them lets you truly automate a stream by setting node properties dynamically. For example, you can have the stream connect to databases automatically or adjust values throughout the stream based on stream parameters.

Setting Node Properties
A node property is any setting you would change when you open a node. For a flat file source node this might be the source file path; for a Filler node it might be the expression you write to fill values. Node properties can be set via scripting with the function ‘setPropertyValue’, which takes two arguments: the property name and the value.

To set a property, first find the node using one of the find functions, such as findByType. You can then call the setPropertyValue function after referencing the node:

import modeler.api
stream = modeler.script.stream()
# Find the node labelled 'Export_1'; None means "any node type"
exportNode1 = stream.findByType(None, 'Export_1')
# exportNode1.setPropertyValue(<property>, <value>)

We can then pass in the property and value that we want to set for the node. In this case, exportNode1 is a database source node, which has a property called “query”: the SQL query that is run against the connected database.

exportNode1.setPropertyValue("query", "select * from table")

Remember to put the property name in quotes (single and double quotes both work in Python), along with the property value if it is a string. Running this script sets the query in the source node.

You can look up the full list of properties for any node, along with examples, in the IBM documentation linked at the bottom of this article.

In the documentation’s property tables, the values in the ‘databasenode properties’ column can be passed as the first argument of setPropertyValue, and the ‘Data type’ column gives the expected data type of the value in the second argument.
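Because setPropertyValue sets one property per call, a small helper that loops over a dictionary of property names and values keeps multi-property changes in one place. This is a sketch under assumptions: apply_properties is a hypothetical helper, not part of the Modeler API, and StandInNode is a hypothetical stand-in that mimics setPropertyValue and getPropertyValue only so the snippet runs outside Modeler. Inside Modeler you would pass a real node such as exportNode1.

```python
class StandInNode(object):
    """Hypothetical stand-in for a Modeler node, so this runs outside Modeler."""

    def __init__(self, node_type, label):
        self.node_type = node_type
        self.label = label
        self._props = {}

    def setPropertyValue(self, name, value):
        self._props[name] = value

    def getPropertyValue(self, name):
        # Returns None for properties that were never set (stand-in behaviour).
        return self._props.get(name)


def apply_properties(node, properties):
    """Hypothetical helper: set every name/value pair in `properties` on `node`."""
    for name in sorted(properties):
        node.setPropertyValue(name, properties[name])
    return node


# Inside Modeler, `node` would come from stream.findByType(None, 'Export_1').
node = StandInNode("database", "Export_1")
apply_properties(node, {
    "datasource": "ODBC1",
    "query": "select * from table",
})
print(node.getPropertyValue("query"))  # select * from table
```

Keeping the values in a dictionary also makes it easy to load them from one place (for example, stream parameters) rather than scattering setPropertyValue calls through the script.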

Looping/Modifying Multiple Nodes Simultaneously
Selecting multiple nodes of the same type or name can be extremely useful if you want to perform the same operation on all of them at once. One of the most common use cases is to set database credentials for all source or export nodes so that they do not need to be entered manually each time the stream is run. We’ll walk through this example and show how the method can be used.

Instead of findByType, which finds a single node, we can use findAll to find all nodes of a given type and/or name. Like findByType, it takes two arguments: a node type and a node name, in that order. The following example finds all database source nodes (type “database”) regardless of name, so we pass None for the name:

dbnodes = stream.findAll("database", None)

This will create an object called dbnodes (any name can be used here) that contains a list of all database source nodes regardless of their name. If we wanted to search solely by name, we could pass in None for the type and a string for the name:

dbnodes = stream.findAll(None, "node_name")

You can also search for a specific type and name by putting values in both arguments.
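To make the role of None concrete, here is a small pure-Python simulation of the three search patterns. StandInStream and StandInNode are hypothetical classes written only for this illustration; they mimic the matching rules described above (None matches any type or any name) and are not the real modeler.api implementation. The stand-in findByType returns the first match, mirroring the single-node lookup used earlier; returning None when nothing matches is an assumption of the stand-in.

```python
class StandInNode(object):
    """Hypothetical stand-in for a Modeler node (type and label only)."""

    def __init__(self, node_type, label):
        self.node_type = node_type
        self.label = label


class StandInStream(object):
    """Hypothetical stand-in mimicking the findAll/findByType matching rules."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def _matches(self, node, node_type, label):
        # None acts as a wildcard for either argument.
        type_ok = node_type is None or node.node_type == node_type
        label_ok = label is None or node.label == label
        return type_ok and label_ok

    def findAll(self, node_type, label):
        return [n for n in self.nodes if self._matches(n, node_type, label)]

    def findByType(self, node_type, label):
        # First match only; None if nothing matches (stand-in assumption).
        matches = self.findAll(node_type, label)
        return matches[0] if matches else None


stream = StandInStream([
    StandInNode("database", "Export_1"),
    StandInNode("database", "customers"),
    StandInNode("variablefile", "customers"),
])

by_type = stream.findAll("database", None)         # both database nodes
by_name = stream.findAll(None, "customers")        # database + variablefile
by_both = stream.findAll("database", "customers")  # exactly one node
print("%d %d %d" % (len(by_type), len(by_name), len(by_both)))  # 2 2 1
```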

Once this collection of nodes has been created, we can loop through it and perform operations on each node. This can be done with a simple for loop:

for dbnode in dbnodes:
    pass  # operations on each dbnode go here

This lets you refer to each node as ‘dbnode’ within the loop; any variable name can be used in its place. Inside the loop, we can then set properties of each node. To set the data source, username, and password of every database node, combine the functions described above:

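import modeler.api
stream = modeler.script.stream()
dbnodes = stream.findAll("database", None)
for dbnode in dbnodes:
    # Indented lines run once for each database source node
    dbnode.setPropertyValue("datasource", "ODBC1")
    dbnode.setPropertyValue("username", "USER1")
    dbnode.setPropertyValue("password", "PASSWORD")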

Note that Python uses indentation to define blocks: every statement inside the loop must be indented to the same level, and the first line that returns to the outer indentation marks the end of the loop.
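Indentation also controls nesting when the loop contains a condition. In the sketch below, only database nodes whose label starts with a given prefix receive credentials; the statements under the if are indented one level deeper than the loop body. The stand-in classes and set_credentials_by_prefix are hypothetical, the "Export_" prefix is purely illustrative, and getLabel() is assumed to mirror the node-label accessor in Modeler’s scripting API.

```python
class StandInNode(object):
    """Hypothetical stand-in for a Modeler node."""

    def __init__(self, node_type, label):
        self.node_type = node_type
        self._label = label
        self.props = {}

    def getLabel(self):
        return self._label

    def setPropertyValue(self, name, value):
        self.props[name] = value


class StandInStream(object):
    """Hypothetical stand-in: findAll with None as a wildcard."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def findAll(self, node_type, label):
        return [n for n in self.nodes
                if (node_type is None or n.node_type == node_type)
                and (label is None or n.getLabel() == label)]


def set_credentials_by_prefix(stream, prefix, datasource, username, password):
    """Set credentials only on database nodes whose label starts with prefix."""
    updated = []
    for dbnode in stream.findAll("database", None):
        # Loop body: one level of indentation.
        if dbnode.getLabel().startswith(prefix):
            # Inside the if: a second level of indentation.
            dbnode.setPropertyValue("datasource", datasource)
            dbnode.setPropertyValue("username", username)
            dbnode.setPropertyValue("password", password)
            updated.append(dbnode.getLabel())
    # Back at function level: runs once, after the loop has finished.
    return updated


stream = StandInStream([
    StandInNode("database", "Export_1"),
    StandInNode("database", "Staging_1"),
])
print(set_credentials_by_prefix(stream, "Export_", "ODBC1", "USER1", "PASSWORD"))
# ['Export_1']
```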
Using code like this will allow Modeler to automatically set its database credentials. This enables you to call the stream from an external source (like Modeler Batch or C&DS) and have the stream run on its own without any user interaction, connecting to databases as it needs.
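Hard-coding a password in the script, as in the loop earlier in this section, works but exposes the credential to anyone who opens the stream. A variation, sketched here under assumptions, reads the values from stream parameters so the calling job can supply them at run time. It assumes the stream object provides getParameterValue (as described in IBM’s scripting guide) and that a missing parameter comes back as None; StandInStream is a hypothetical substitute so the snippet runs outside Modeler, and the parameter names db_datasource, db_user, and db_password are illustrative only.

```python
class StandInNode(object):
    """Hypothetical stand-in for a Modeler database node."""

    def __init__(self, node_type, label):
        self.node_type = node_type
        self.label = label
        self.props = {}

    def setPropertyValue(self, name, value):
        self.props[name] = value


class StandInStream(object):
    """Hypothetical stand-in exposing findAll and getParameterValue."""

    def __init__(self, nodes, parameters):
        self.nodes = list(nodes)
        self.parameters = dict(parameters)

    def findAll(self, node_type, label):
        return [n for n in self.nodes
                if (node_type is None or n.node_type == node_type)
                and (label is None or n.label == label)]

    def getParameterValue(self, name):
        return self.parameters.get(name)


# Illustrative mapping: node property -> stream parameter holding its value.
CREDENTIAL_PARAMETERS = {
    "datasource": "db_datasource",
    "username": "db_user",
    "password": "db_password",
}


def set_credentials_from_parameters(stream):
    """Copy credential parameters onto every database source node."""
    # Resolve every parameter first so no node is left half-configured.
    values = {}
    for prop, param in CREDENTIAL_PARAMETERS.items():
        value = stream.getParameterValue(param)
        if not value:
            raise ValueError("stream parameter '%s' is not set" % param)
        values[prop] = value
    count = 0
    for dbnode in stream.findAll("database", None):
        for prop in values:
            dbnode.setPropertyValue(prop, values[prop])
        count += 1
    return count


stream = StandInStream(
    [StandInNode("database", "Export_1"), StandInNode("database", "Export_2")],
    {"db_datasource": "ODBC1", "db_user": "USER1", "db_password": "PASSWORD"},
)
print(set_credentials_from_parameters(stream))  # 2
```

Reading all parameters before touching any node means a missing value stops the script with a clear error instead of leaving some nodes updated and others not. Inside Modeler, the same function would be called with the real stream, e.g. set_credentials_from_parameters(modeler.script.stream()).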
Beyond the loop shown here, the rest of the Python language (conditionals, functions, and so on) is also available within the scripting tool!

IBM’s documentation is extremely useful for expanding your knowledge of scripting and stream automation. You can find it here:
ftp://public.dhe.ibm.com/software/analytics/spss/documentation/modeler/18.1/en/ModelerScriptingAutomation.pdf
