
Aborting ansible playbook if a host is unreachable

I'm wondering if there is any decent way to require that all hosts a set of tasks is supposed to run on are actually reachable.

I'm currently trying to get it to handle an update that could be painful if not all relevant nodes are updated in sync.

asked Sep 19 '14 09:09 by Pierre Andersson




3 Answers

I was about to post a question when I saw this one. The answer Duncan suggested does not work, at least in my case, where the host is unreachable. All my playbooks specify a max_fail_percentage of 0.

But Ansible will happily execute all the tasks on the hosts that it is able to reach. What I really wanted was this: if any host is unreachable, don't run any of the tasks.

What I found is a simple solution that might be considered hacky, and I am open to better answers.

As the first step of running a playbook, Ansible gathers facts for all the hosts. If a host is not reachable, no facts are gathered for it. I wrote a simple play at the very beginning of my playbook that uses a fact. If a host is unreachable, that task fails with an "undefined variable" error. The task is just a dummy and will always pass if all hosts are reachable.

See below my example:

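- name: Check Ansible connectivity to all hosts
  hosts: host_all
  user: "{{ remote_user }}"
  become: "{{ sudo_required }}"   # 'become' replaces the older 'sudo' keyword (Ansible 1.9+)
  become_user: root
  connection: ssh # or paramiko
  max_fail_percentage: 0
  tasks:
    # Reads a gathered fact for every host in the group. The lookup is
    # undefined for any host whose facts could not be gathered.
    - name: check connectivity to hosts (Dummy task)
      shell: echo " {{ hostvars[item]['ansible_hostname'] }}"
      with_items: "{{ groups['host_all'] }}"
      register: cmd_output

    - name: debug ...
      debug: var=cmd_output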

If a host is unreachable, you will get an error like the one below:

TASK: [c.. ***************************************************** 
fatal: [172.22.191.160] => One or more undefined variables: 'dict object' has no attribute 'ansible_hostname'
fatal: [172.22.191.162] => One or more undefined variables: 'dict object' has no attribute 'ansible_hostname'

FATAL: all hosts have already failed -- aborting

Note: If your host group is not called host_all, you must change the dummy task to reflect that name.
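The same fact-based idea can also be written with an explicit check instead of relying on the undefined-variable error. The following is only a sketch under assumptions, not the play from this answer: it assumes a reasonably recent Ansible, reuses the host_all group name from above, and the play name, task name, and message text are made up for illustration.

```yaml
# Sketch: fail explicitly when any host in the group has no gathered facts.
- name: Abort if facts are missing for any host (illustrative name)
  hosts: host_all            # same group name as in the example above
  any_errors_fatal: true     # a failure on one host should end the play for all
  tasks:
    - name: Require a gathered hostname fact for every host in the group
      assert:
        that:
          - hostvars[item]['ansible_hostname'] is defined
        msg: "No facts for {{ item }}; it was probably unreachable"
      with_items: "{{ groups['host_all'] }}"
```

The assert makes the intent visible in the output: the failure message names the host without facts rather than reporting a missing dictionary attribute.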

answered Nov 15 '22 10:11 by Zoro_77


You can combine any_errors_fatal: true or max_fail_percentage: 0 with gather_facts: false, and then run a task that will fail if the host is offline. Something like this at the top of the playbook should do what you need:

- hosts: all
  gather_facts: false
  max_fail_percentage: 0
  tasks:
    - action: ping

A bonus is that this also works with the -l SUBSET option for limiting matching hosts.
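The any_errors_fatal variant mentioned above could look roughly like this. It is a minimal sketch rather than a tested recipe: the play and task names are placeholders, and how unreachable hosts count toward the abort condition has differed between Ansible releases, so it is worth checking on the version you run.

```yaml
# Connectivity gate, meant to sit before any play that changes the hosts.
- name: Verify that every targeted host is reachable
  hosts: all                # placeholder; use the same pattern as the later plays
  gather_facts: false       # skip implicit fact gathering so ping is the first contact
  any_errors_fatal: true    # an error on any host should end the play for all hosts
  tasks:
    - name: Check connectivity
      ping:
```

Because the play targets the normal inventory pattern, running it with -l SUBSET should restrict the check to the same subset as the rest of the playbook.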

answered Nov 15 '22 09:11 by wilkystyle


You can add max_fail_percentage to your playbook, something like this:

- hosts: all_boxes
  max_fail_percentage: 0
  roles:
    - common
  pre_tasks:
    - include: roles/common/tasks/start-time.yml
    - include: roles/common/tasks/debug.yml

This way you can decide how much failure you want to tolerate. Here is the relevant section from the Ansible Documentation:

By default, Ansible will continue executing actions as long as there are hosts in the group that have not yet failed. In some situations, such as with the rolling updates described above, it may be desirable to abort the play when a certain threshold of failures have been reached. To achieve this, as of version 1.3 you can set a maximum failure percentage on a play as follows:

- hosts: webservers
  max_fail_percentage: 30
  serial: 10

In the above example, if more than 3 of the 10 servers in the group were to fail, the rest of the play would be aborted.

Note: The percentage set must be exceeded, not equaled. For example, if serial were set to 4 and you wanted the task to abort when 2 of the systems failed, the percentage should be set at 49 rather than 50.
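The arithmetic in that note can be spelled out in a play. This is an illustrative sketch only: the webservers group comes from the documentation excerpt, while the task is a hypothetical placeholder.

```yaml
- hosts: webservers
  serial: 4                  # process the group in batches of 4 hosts
  # 2 failures out of 4 is exactly 50%. The threshold must be exceeded,
  # not equaled, so a value of 50 would not abort at 2 failures; 49 does.
  # 1 failure out of 4 is 25%, which stays below 49, so the play continues.
  max_fail_percentage: 49
  tasks:
    - name: Placeholder task (hypothetical)
      ping:
```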

answered Nov 15 '22 10:11 by Duncan Lock