Aerospike Community Edition: what should I do to `aerospike.conf` to set up a cluster?

Tags:

aerospike

I'm trying to set up a three-node Aerospike cluster on Ubuntu 14.04. Apart from the IP address and hostname, the machines are identical. I installed Aerospike and the management console on each machine, per the documentation.

I then edited the network/service and network/heartbeat sections in /etc/aerospike/aerospike.conf:

network {
    service {
        address any
        port 3000
        access-address 10.0.1.11  # 10.0.1.12 and 10.0.1.13 on the other two nodes
    }

    heartbeat {
        mode mesh
        port 3002
        mesh-seed-address-port 10.0.1.11 3002
        mesh-seed-address-port 10.0.1.12 3002
        mesh-seed-address-port 10.0.1.13 3002
        interval 150
        timeout 10
    }

[...]

}

When I run `sudo service aerospike start` on each of the nodes, the service runs but the nodes do not form a cluster. If I try to add another node in the management console, it informs me: "Node 10.0.1.12:3000 cannot be monitored here as it belongs to a different cluster."

Can you see what I'm doing wrong? What changes should I make to aerospike.conf on each of the nodes in order to set up an Aerospike cluster instead of three isolated instances?

Alex Woolford asked Feb 10 '23 14:02


1 Answer

Your configuration appears correct.

Check whether each host can open a TCP connection to every other host on ports 3001 (fabric) and 3002 (heartbeat).

nc -z -w5 <host> 3001; echo $?
nc -z -w5 <host> 3002; echo $?

If not, I would first suspect the firewall configuration.
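If netcat is not installed, the same check can be approximated in Python. This is a minimal sketch, not an Aerospike tool: can_connect and check_peers are hypothetical helper names, and the 5-second timeout simply mirrors the -w5 used above.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_peers(peers, ports=(3001, 3002), timeout=5.0):
    """Map every (host, port) pair to whether it accepted a connection."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host in peers
        for port in ports
    }
```

On 10.0.1.11, for example, check_peers(["10.0.1.12", "10.0.1.13"]) returns a dictionary in which any False value points at a port that is blocked or not listening.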

Update 1:

The netcat commands returned 0, so let's try to get more info.

Run the following on each node and provide the output:

asinfo -v service
asinfo -v services
asadm -e info

Update 2:

After inspecting the output in the gists, the output of asadm -e "info net" showed that all nodes had the same Node ID.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node               Node                        Fqdn               Ip   Client     Current      HB        HB   
   .                 Id                           .                .    Conns        Time    Self   Foreign   
h      *BB9000000000094   hadoop01.woolford.io:3000   10.0.1.11:3000       15   174464730   37129         0   
Number of rows: 1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node               Node                        Fqdn               Ip   Client     Current      HB        HB   
   .                 Id                           .                .    Conns        Time    Self   Foreign   
h      *BB9000000000094   hadoop03.woolford.io:3000   10.0.1.13:3000        5   174464730   37218         0   
Number of rows: 1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node               Node                        Fqdn               Ip   Client     Current      HB        HB   
   .                 Id                           .                .    Conns        Time    Self   Foreign   
h      *BB9000000000094   hadoop02.woolford.io:3000   10.0.1.12:3000        5   174464731   37203         0   
Number of rows: 1

The Node ID is constructed from the fabric port (3001, which is BB9 in hex) followed by the MAC address in reverse byte order. Another red flag was that "HB Self" was non-zero: it is expected to be zero in a mesh configuration (in a multicast configuration it is normally non-zero, since nodes receive their own heartbeat messages).

Because all of the Node IDs are the same, this would indicate that all of the MAC addresses are the same (though it is possible to change the Node IDs using rack aware). Heartbeats that appear to have originated from the local node (that is, heartbeats carrying the local Node ID) are ignored, which is why the nodes never see each other.
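As a sanity check on the layout described above, here is a minimal Python sketch that splits a Node ID into its fabric port and MAC address. It assumes the last 12 hex digits are the six MAC bytes in reverse order and everything before them is the port; decode_node_id is a hypothetical helper, not part of the Aerospike tools.

```python
def decode_node_id(node_id):
    """Split a Node ID into (fabric_port, mac_address).

    Assumed layout: <port in hex><6 MAC bytes in reverse order>.
    A leading '*' (as printed by asadm) is ignored.
    """
    node_id = node_id.lstrip("*")
    port_hex, mac_hex = node_id[:-12], node_id[-12:]
    port = int(port_hex, 16)
    # Undo the byte reversal to recover the usual MAC notation.
    mac_bytes = bytes.fromhex(mac_hex)[::-1]
    mac = ":".join("{:02x}".format(b) for b in mac_bytes)
    return port, mac
```

For the ID shown in the tables, decode_node_id("*BB9000000000094") gives port 3001 and the MAC 94:00:00:00:00:00, identical on all three nodes.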

Update 3:

The MAC addresses are all unique, which contradicts the previous conclusion. A reply provided the name of the interface in use, em1, which is not a name Aerospike looks for: it looks for interfaces named eth#, bond#, or wlan#. I assume that, because the name was not one of the expected three, Aerospike could not read the MAC address and every node ended up with the same Node ID. If so, I would expect the following warning in the logs:

Tried eth,bond,wlan and list of all available interfaces on device.Failed to retrieve physical address with errno %d %s

In such scenarios, the network-interface-name parameter may be used to tell Aerospike which interface to use for Node ID generation. This parameter also determines which interface's IP address is advertised to client applications.

network {
    service {
        address any
        port 3000
        access-address 10.0.1.11  # 10.0.1.12 and 10.0.1.13 on the other two nodes
        network-interface-name em1  # Needed for Node ID
    }

    [...]
}
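To see in advance whether a host is likely to need network-interface-name, one can list its interfaces and flag any whose name does not fit the eth#, bond#, or wlan# patterns mentioned above. This is only a sketch: the regular expression is my reading of those patterns (a prefix followed by a decimal index), not Aerospike's actual lookup code, and the helper names are hypothetical.

```python
import re
import socket

# Assumed interpretation of "eth#, bond#, wlan#": prefix plus a number.
EXPECTED_NAME = re.compile(r"^(eth|bond|wlan)\d+$")


def is_expected_interface(name):
    """Return True if the name fits one of the assumed patterns."""
    return EXPECTED_NAME.match(name) is not None


def unexpected_interfaces(names):
    """Return the names (loopback excluded) that fit none of the patterns."""
    return [n for n in names if n != "lo" and not is_expected_interface(n)]


def local_interface_names():
    """List interface names on this host (Unix only)."""
    return [name for _index, name in socket.if_nameindex()]
```

Calling unexpected_interfaces(local_interface_names()) on a host like the ones in the question would return ["em1"], which suggests setting network-interface-name explicitly.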

Update 4:

As of the 3.6.0 release, these device names are discovered automatically. See AER-4026 in the release notes.

kporter answered May 10 '23 17:05