RetryRunner plugin
The RetryRunner plugin implements retry logic to improve task execution reliability.
The primary use case for RetryRunner is to make Nornir task execution as reliable as possible by utilizing queuing, retries, connection splaying and exponential backoff mechanisms.
RetryRunner Architecture
RetryRunner helps to control the rate at which connections are established by limiting the number of connector workers.
For example, if num_connectors is 5, then at any point in time only 5 workers are establishing
connections to devices; even if there are 100 devices, RetryRunner connects to only 5 of them
at a time. This is very helpful when the connection rate needs to be limited due to operational
restrictions such as AAA (TACACS, RADIUS) server load.
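The effect of capping connector workers can be illustrated with a minimal sketch. It does not use RetryRunner or open any network connections: fake_connect, the host names and the thread pool are stand-ins chosen for illustration, assuming only that a fixed-size pool of workers performs the connection step.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NUM_CONNECTORS = 5  # mirrors the num_connectors example above
hosts = [f"host-{i}" for i in range(100)]  # hypothetical host names

active = 0  # connections being established right now
peak = 0    # highest value `active` ever reached
lock = threading.Lock()

def fake_connect(host: str) -> str:
    """Stand-in for opening a device connection; performs no network I/O."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.001)  # pretend the handshake takes a moment
    with lock:
        active -= 1
    return host

# The pool size caps how many connections are set up at the same time
with ThreadPoolExecutor(max_workers=NUM_CONNECTORS) as pool:
    connected = list(pool.map(fake_connect, hosts))

print(f"connected to {len(connected)} hosts, peak concurrency {peak}")
```

All 100 hosts are processed, but the recorded peak never exceeds the pool size, which is the property that protects AAA servers from a burst of simultaneous logins.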
When a new task starts and no connection exists to the device that the task makes use of,
RetryRunner attempts to connect to the device, retrying up to connect_retry times.
Once the connection is established, the task is handed over to worker threads for execution;
workers retry the task up to task_retry times if it fails.
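The exact delay formula RetryRunner applies between attempts is not spelled out here, so the following is only a sketch of the general exponential backoff with splay technique. It assumes the backoff doubles on every attempt and that the splay is a uniform random value; retry_delay_ms is a hypothetical helper, not part of the plugin.

```python
import random

def retry_delay_ms(attempt: int, backoff_ms: int, splay_ms: int,
                   rng: random.Random) -> float:
    """Hypothetical delay in milliseconds before retry number `attempt` (1-based)."""
    # Assumed formula: the backoff doubles with every failed attempt
    exponential = backoff_ms * 2 ** (attempt - 1)
    # A random splay in [0, splay_ms] spreads hosts apart in time
    return exponential + rng.uniform(0, splay_ms)

rng = random.Random(0)  # seeded only to make the sketch repeatable
delays = [
    retry_delay_ms(n, backoff_ms=1000, splay_ms=100, rng=rng)
    for n in (1, 2, 3)
]
for n, delay in enumerate(delays, start=1):
    print(f"attempt {n}: wait about {delay:.0f} ms")
```

With a backoff of 1000 ms and a splay of 100 ms, the three delays fall in the ranges 1000-1100, 2000-2100 and 4000-4100 ms under this assumed formula.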
Connection parameters such as timeouts or usage of SSH keys are handled by Nornir Connection plugins. RetryRunner calls Nornir to start the connection; further connection establishment details are controlled by the Connection plugin itself.
Sample Usage
Instruct Nornir to use RetryRunner on instantiation and run your tasks:
from nornir import InitNornir

NornirObj = InitNornir(
    runner={
        "plugin": "RetryRunner",
        "options": {
            "num_workers": 100,
            "num_connectors": 10,
            "connect_retry": 3,
            "connect_backoff": 1000,
            "connect_splay": 100,
            "task_retry": 3,
            "task_backoff": 1000,
            "task_splay": 100
        }
    }
)
Sample code to demonstrate usage of RetryRunner, DictInventory and ResultSerializer plugins:
import yaml
import pprint
from nornir import InitNornir
from nornir.core.task import Result, Task
from nornir_netmiko import netmiko_send_command, netmiko_send_config
from nornir_salt.plugins.functions import ResultSerializer

inventory_data = '''
hosts:
  R1:
    hostname: 192.168.1.151
    platform: ios
    groups: [lab]
  R2:
    hostname: 192.168.1.153
    platform: ios
    groups: [lab]
  R3:
    hostname: 192.168.1.154
    platform: ios
    groups: [lab]

groups:
  lab:
    username: cisco
    password: cisco
'''

inventory_dict = yaml.safe_load(inventory_data)

NornirObj = InitNornir(
    runner={
        "plugin": "RetryRunner",
        "options": {
            "num_workers": 100,
            "num_connectors": 10,
            "connect_retry": 3,
            "connect_backoff": 1000,
            "connect_splay": 100,
            "task_retry": 3,
            "task_backoff": 1000,
            "task_splay": 100
        }
    },
    inventory={
        "plugin": "DictInventory",
        "options": {
            "hosts": inventory_dict["hosts"],
            "groups": inventory_dict["groups"],
            "defaults": inventory_dict.get("defaults", {})
        }
    },
)

def _task_group_netmiko_send_commands(task, commands):
    # run each command as a separate subtask
    for command in commands:
        task.run(
            task=netmiko_send_command,
            command_string=command,
            name=command
        )
    return Result(host=task.host)

# run single task
result1 = NornirObj.run(
    task=netmiko_send_command,
    command_string="show clock"
)

# run grouped tasks, connection_name tells RetryRunner which connection to open
result2 = NornirObj.run(
    task=_task_group_netmiko_send_commands,
    commands=["show clock", "show run | inc hostname"],
    connection_name="netmiko"
)

# run another single task
result3 = NornirObj.run(
    task=netmiko_send_command,
    command_string="show run | inc hostname"
)

NornirObj.close_connections()

# Print results
formed_result1 = ResultSerializer(result1, add_details=True)
pprint.pprint(formed_result1, width=100)

formed_result2 = ResultSerializer(result2, add_details=True)
pprint.pprint(formed_result2, width=100)

formed_result3 = ResultSerializer(result3, add_details=True)
pprint.pprint(formed_result3, width=100)
Connections handling
Warning
Parent or grouped tasks must explicitly specify the connection plugin through the connection_name
task parameter, for example netmiko, napalm, scrapli or scrapli_netconf. Specifying connection_name
is not required if the task has a CONNECTION_NAME global variable defined within it. Without
connection_name, RetryRunner skips the connection retry logic, jumphost connection logic and
credentials retry logic, and connections to all hosts are initiated simultaneously, up to the
number set by the num_workers option.
This restriction stems from the fact that Nornir tasks have no built-in way to communicate
the set of connection plugins that a task will use. By convention, a task may contain a
CONNECTION_NAME global parameter to identify the name(s) of the connection plugin(s) it uses.
CONNECTION_NAME can be a single connection name or a comma-separated list of connection plugin
names that the task and its subtasks use. RetryRunner honors this parameter and tries
to establish all specified connections before starting the task.
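As an illustration of the comma-separated form, the value can be split into individual plugin names as sketched below. The connection_names helper is hypothetical and only shows how such a string maps to a list; it is not taken from the RetryRunner source.

```python
from typing import List

# Same format as described above: one name or a comma-separated list
CONNECTION_NAME = "scrapli_netconf, netmiko, scrapli"

def connection_names(value: str) -> List[str]:
    """Split a CONNECTION_NAME string into clean plugin names (hypothetical helper)."""
    # Strip surrounding whitespace and ignore empty items such as a trailing comma
    return [name.strip() for name in value.split(",") if name.strip()]

names = connection_names(CONNECTION_NAME)
print(names)
```

A single name such as "netmiko" yields a one-item list, so both forms can be handled by the same loop over connections.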
Alternatively, the inline task parameter connection_name can be provided on task run.
However, only the parent (main or grouped) task supports task parameters; subtasks do not support them.
As a result, if a subtask uses a connection plugin different from the one specified in the parent
task's connection_name parameter or CONNECTION_NAME variable, that subtask's connection is not
handled by RetryRunner's connection establishment logic. Instead, the connection is established
when the subtask starts, in parallel across hosts up to the number set by the num_workers option.
Sample task that uses different connection plugins for subtasks:
from nornir.core.task import Result, Task
from nornir_scrapli.tasks import netconf_get_config
from nornir_scrapli.tasks import send_command as scrapli_send_command
from nornir_netmiko.tasks import netmiko_send_command

# inform RetryRunner to establish these connections
CONNECTION_NAME = "scrapli_netconf, netmiko, scrapli"

def task(task: Task) -> Result:
    task.run(
        name="Pull Configuration Using Scrapli Netconf",
        task=netconf_get_config,
        source="running"
    )
    task.run(
        name="Pull Configuration using Netmiko",
        task=netmiko_send_command,
        command_string="show run",
        enable=True
    )
    task.run(
        name="Pull Configuration using Scrapli",
        task=scrapli_send_command,
        command="show run"
    )
    return Result(host=task.host)
RetryRunner task parameters
RetryRunner supports a number of task parameters to influence its behavior on a per-task basis. These parameters can be supplied to the task as key/value arguments to override RetryRunner options supplied on Nornir object instantiation.
RetryRunner task parameters description:
run_connect_retry - number of connection attempts
run_task_retry - number of attempts to run task
run_creds_retry - list of connection credentials and parameters to retry while connecting to device
run_num_workers - number of threads for tasks execution
run_num_connectors - number of threads for device connections
run_reconnect_on_fail - if True, re-establish connection on task failure
run_task_stop_errors - list of glob patterns to stop retrying if seen in task exception string
connection_name - name of connection plugin to use to initiate connection to device
Note
The task retry count is the smaller of the run_connect_retry and run_task_retry counters,
i.e. task_retry is set to min(run_connect_retry, run_task_retry).
Warning
Only main/parent tasks support RetryRunner task parameters; subtasks do not support them.
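The note on retry counts has a practical consequence that a short worked example makes visible. The function below simply restates the min() rule from the note; effective_task_retry is a hypothetical name used for illustration.

```python
def effective_task_retry(run_connect_retry: int, run_task_retry: int) -> int:
    """Restates the rule from the note: the smaller counter wins."""
    return min(run_connect_retry, run_task_retry)

# Asking for 5 task retries with only 3 connection retries yields 3
print(effective_task_retry(run_connect_retry=3, run_task_retry=5))

# Disabling connection retries also disables task retries
print(effective_task_retry(run_connect_retry=0, run_task_retry=3))
```

Under this rule, raising run_task_retry alone has no effect once it exceeds run_connect_retry, so both values need to be raised together.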
Sample code to use RetryRunner task parameters:
import yaml
from nornir import InitNornir
from nornir.core.task import Result, Task
from nornir_netmiko import netmiko_send_command

inventory_data = '''
hosts:
  R1:
    hostname: 192.168.1.151
    platform: ios
    groups: [lab]

groups:
  lab:
    username: foo
    password: bar

defaults:
  data:
    credentials:
      local_creds:
        username: nornir
        password: nornir
      dev_creds:
        username: devops
        password: foobar
'''

inventory_dict = yaml.safe_load(inventory_data)

NornirObj = InitNornir(
    runner={
        "plugin": "RetryRunner"
    },
    inventory={
        "plugin": "DictInventory",
        "options": {
            "hosts": inventory_dict["hosts"],
            "groups": inventory_dict["groups"],
            "defaults": inventory_dict.get("defaults", {})
        }
    },
)

# run task without retrying - simulate QueueRunner behavior
result1 = NornirObj.run(
    task=netmiko_send_command,
    command_string="show clock",
    run_connect_retry=0,
    run_task_retry=0,
)

# run task one by one - simulate SerialRunner behavior but with retrying
result2 = NornirObj.run(
    task=netmiko_send_command,
    command_string="show clock",
    run_num_workers=1,
    run_num_connectors=1,
)

# retry credentials if login fails but without retrying connection establishment
result3 = NornirObj.run(
    task=netmiko_send_command,
    command_string="show clock",
    run_creds_retry=["local_creds", "dev_creds"],
    run_connect_retry=0,
)
Connecting to hosts behind jumphost
RetryRunner implements logic to connect to hosts behind bastion hosts (jumphosts).
To connect to devices behind a jumphost, define the jumphost parameters in the host's inventory data:
hosts:
  R1:
    hostname: 192.168.1.151
    platform: ios
    username: test
    password: test
    data:
      jumphost:
        hostname: 10.1.1.1
        port: 22
        password: jump_host_password
        username: jump_host_user
Note
Only Netmiko (connection_name="netmiko") and Ncclient (connection_name="ncclient") tasks
support connecting to hosts behind jumphosts using the above inventory data.
Retrying different credentials
RetryRunner is capable of trying several sets of credentials while connecting to a device.
Credentials are tried in sequence, starting with the host's inventory username and
password parameters and moving on to the connection parameters supplied in the creds_retry
RetryRunner option.
The credentials retry logic is implemented using the conn_open task plugin: the content of the
creds_retry list is passed as the reconnect argument to the conn_open task.
Items of the creds_retry list are tried sequentially until a connection is successfully established
or the list runs out of items. If no connection is established after all creds_retry items have
been tried, the connection attempt is considered unsuccessful, the host is queued back to the
connectors queue, and the process repeats on the next try.
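The sequence described above can be sketched in a few lines. This is not the conn_open implementation: first_working_credentials and fake_connect are hypothetical names, and the connection attempt is simulated so that the example runs without any devices.

```python
from typing import Callable, Dict, List, Optional

Creds = Dict[str, str]

def first_working_credentials(
    main: Creds,
    creds_retry: List[Creds],
    try_connect: Callable[[Creds], bool],
) -> Optional[Creds]:
    """Return the first credentials that connect, or None if all of them fail."""
    # Host inventory credentials go first, then creds_retry items in order
    for creds in [main, *creds_retry]:
        if try_connect(creds):
            return creds
    # Every item failed: the caller would queue the host for another try
    return None

def fake_connect(creds: Creds) -> bool:
    """Simulated device that accepts only the devops account (no network I/O)."""
    return creds == {"username": "devops", "password": "foobar"}

main = {"username": "foo", "password": "bar"}
creds_retry = [
    {"username": "nornir", "password": "nornir"},
    {"username": "devops", "password": "foobar"},
]
winner = first_working_credentials(main, creds_retry, fake_connect)
print(winner)
```

Here the main credentials and the first retry item are rejected, and the second retry item succeeds. A None result corresponds to the case where the connection attempt as a whole is counted as unsuccessful.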
Sample inventory with retry credentials:
hosts:
  R1:
    hostname: 192.168.1.151
    platform: ios
    groups: [lab]
    data:
      credentials:
        local_creds:
          username: admin
          password: admin

groups:
  lab:
    username: foo
    password: bar

defaults:
  data:
    credentials:
      local_creds:
        username: nornir
        password: nornir
      dev_creds:
        username: devops
        password: foobar
credentials can be defined within the defaults data section as well as inside host or groups
data; the order of preference is host -> groups -> defaults.
Credentials definitions are not merged across different data sections; they are searched
in host -> groups -> defaults order and the first one encountered is used.
Sample code to use creds_retry:
from nornir import InitNornir

NornirObj = InitNornir(
    runner={
        "plugin": "RetryRunner",
        "options": {
            "creds_retry": ["local_creds", "dev_creds"]
        }
    }
)
Credentials are tried in the sequence defined in the creds_retry option
as soon as the connection using the main credentials fails to establish.
API Reference
- class nornir_salt.plugins.runners.RetryRunner.RetryRunner(num_workers: int = 100, num_connectors: int = 20, connect_retry: int = 3, connect_backoff: int = 5000, connect_splay: int = 100, task_retry: int = 1, task_backoff: int = 5000, task_splay: int = 100, reconnect_on_fail: bool = True, task_timeout: int = 600, creds_retry: Optional[list] = None, task_stop_errors: Optional[list] = None)
RetryRunner is a Nornir runner plugin that strives to make task execution as reliable as possible.
- Parameters
num_workers – number of threads for tasks execution
num_connectors – number of threads for device connections
connect_retry – number of connection attempts
connect_backoff – exponential backoff timer in milliseconds
connect_splay – random interval between 0 and splay for each connection in milliseconds
task_retry – number of attempts to run task
task_backoff – exponential backoff timer in milliseconds
task_splay – random interval between 0 and splay before task start in milliseconds
reconnect_on_fail – boolean, default True, perform reconnect to host on task failure
task_timeout – int, seconds to wait for task to complete before closing all queues and stopping connectors and workers threads, default 600
creds_retry – list of connection credentials and parameters to retry while connecting to device
task_stop_errors – list of glob patterns to stop retrying if seen in task exception string; these patterns are not applicable to errors encountered during connection establishment. The *validation error* pattern is always included in this list.
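To show how glob patterns such as those in task_stop_errors behave against an exception string, here is a minimal sketch using Python's standard fnmatch module. Whether RetryRunner itself uses fnmatch is an assumption, and should_stop_retrying is a hypothetical helper written for this example.

```python
from fnmatch import fnmatchcase
from typing import List

def should_stop_retrying(exception_text: str, task_stop_errors: List[str]) -> bool:
    """Return True if any glob pattern matches the exception text (hypothetical helper)."""
    # The *validation error* pattern is always part of the list, per the description above
    patterns = ["*validation error*", *task_stop_errors]
    return any(fnmatchcase(exception_text, pattern) for pattern in patterns)

stop_errors = ["*Authentication failed*"]  # example user-supplied pattern

print(should_stop_retrying("pydantic validation error for model", stop_errors))
print(should_stop_retrying("Authentication failed for user foo", stop_errors))
print(should_stop_retrying("Socket timeout while reading", stop_errors))
```

The leading and trailing asterisks matter: a glob must match the whole string, so a pattern without them would only match an exception whose text is exactly the pattern.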