
Monitor a cluster of nodes

I have more than 10 nodes in a cluster. I have installed a Hadoop stack on the cluster using Cloudera (YARN, HBase, Hue, Hadoop FS, Spark, Flink). Is there an easy way to gather global statistics for all of the nodes (CPU usage, memory usage and network usage) and read them with Python? I want to use Python so that I am completely free to specify the plots and can keep a uniform plotting style in my report. Which software can I use to accomplish this? It does not have to be distributed; a simple library would be sufficient.

www.data-blogger.com asked Nov 08 '22 12:11


1 Answer

I would suggest considering Ansible for this purpose. Here's a simple playbook that collects the load average from each host listed in the inventory file and appends it to a local file:

- hosts: all
  remote_user: your_user
  tasks:
  - name: collect load average
    shell: cat /proc/loadavg
    register: cluster_node_la

  - name: write to local disk
    # stdout is the raw output string; stdout_lines would be rendered as a list
    lineinfile: dest=/tmp/cluster_stat create=yes line="{{ ansible_fqdn }}:{{ cluster_node_la.stdout }}"
    delegate_to: 127.0.0.1

You can run it as follows: ansible-playbook -i ansible_inventory stats-playbook.yml --forks=1

  • ansible_inventory is the file containing a list of your hosts
  • stats-playbook.yml is the file printed above
  • --forks=1 processes one host at a time, which avoids concurrent writes to the local file

Depending on how you want to store the collected data, the implementation may differ, but the general idea should be clear. There are plenty of other ways to solve this in Ansible.
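Since the question asks about reading the numbers from Python, here is a minimal sketch of how the collected file could be parsed. It assumes each line of /tmp/cluster_stat has the form "fqdn:<contents of /proc/loadavg>"; the function names parse_cluster_stat and mean_load are hypothetical helpers, not part of Ansible. The parser only looks for the first three decimal numbers after the host name, so it also tolerates the load average being wrapped in list brackets.

```python
import re

# /proc/loadavg starts with three decimal numbers: the 1, 5 and 15 minute averages.
_DECIMAL = re.compile(r"\d+\.\d+")


def parse_cluster_stat(lines):
    """Map each host name to a list of (1min, 5min, 15min) load-average tuples.

    Assumed line format (not guaranteed by Ansible): "fqdn:<loadavg text>".
    Lines without a ':' or without three decimal numbers are skipped.
    """
    samples = {}
    for raw in lines:
        host, sep, rest = raw.strip().partition(":")
        if not sep:
            continue
        values = _DECIMAL.findall(rest)
        if len(values) < 3:
            continue
        row = tuple(float(v) for v in values[:3])
        samples.setdefault(host, []).append(row)
    return samples


def mean_load(samples):
    """Average the 1-minute load over all samples of each host."""
    return {
        host: sum(row[0] for row in rows) / len(rows)
        for host, rows in samples.items()
    }


if __name__ == "__main__":
    # Made-up example lines; in practice read them with open("/tmp/cluster_stat").
    demo = [
        "node1.example.com:0.20 0.18 0.12 1/80 11206",
        "node2.example.com:1.05 0.90 0.75 3/91 20412",
    ]
    print(mean_load(parse_cluster_stat(demo)))
```

The resulting dictionaries are plain Python data, so they can be passed to whatever plotting library you use for the report.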

Besides, Ansible has a Python API, so you can do most things directly from Python. For example, this is how you can collect the configuration (facts) of your cluster:

# Note: ansible.runner is the Ansible 1.x Python API; it was removed in Ansible 2.0.
import pprint

import ansible.runner
import ansible.inventory

inventory_file = 'ansible_inventory'  # see ansible inventory files
inventory = ansible.inventory.Inventory(inventory_file)

runner = ansible.runner.Runner(
   module_name='setup',
   module_args='',
   pattern='all',
   inventory=inventory
)

cluster_facts = runner.run()
pprint.pprint(cluster_facts)
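To turn the collected facts into something plottable, a small helper along these lines might work. This is only a sketch: it assumes the result has the shape that the Ansible 1.x Runner returned as far as I recall, a dict with 'contacted' and 'dark' keys, where each contacted host carries an 'ansible_facts' dict. The function names summarize_facts and unreachable_hosts are hypothetical, and the sample data below is made up.

```python
def summarize_facts(results):
    """Pick a few hardware facts per reachable host.

    Assumes results looks like
    {"contacted": {host: {"ansible_facts": {...}}}, "dark": {host: {...}}};
    verify this against the Ansible version you actually run.
    """
    summary = {}
    for host, result in results.get("contacted", {}).items():
        facts = result.get("ansible_facts", {})
        summary[host] = {
            "vcpus": facts.get("ansible_processor_vcpus"),
            "mem_total_mb": facts.get("ansible_memtotal_mb"),
            "mem_free_mb": facts.get("ansible_memfree_mb"),
        }
    return summary


def unreachable_hosts(results):
    """Return the sorted names of hosts that could not be contacted."""
    return sorted(results.get("dark", {}))


if __name__ == "__main__":
    # Made-up sample in the assumed result shape.
    sample = {
        "contacted": {
            "node1.example.com": {
                "ansible_facts": {
                    "ansible_processor_vcpus": 8,
                    "ansible_memtotal_mb": 32000,
                    "ansible_memfree_mb": 12000,
                }
            }
        },
        "dark": {"node2.example.com": {"failed": True, "msg": "timeout"}},
    }
    print(summarize_facts(sample))
    print(unreachable_hosts(sample))
```

Note that the setup module reports a snapshot of mostly static facts (CPU count, total and free memory), so for usage over time you would still need to sample repeatedly, as in the playbook above.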
ffeast answered Nov 14 '22 21:11