Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Wildfly 9 - mod_cluster on TCP

We are currently testing to move from Wildfly 8.2.0 to Wildfly 9.0.0.CR1 (or CR2 built from snapshot). The system is a cluster using mod_cluster and is running on VPS what in fact prevents it from using multicast.

On 8.2.0 we have been using the following configuration of the modcluster that works well:

      <mod-cluster-config proxy-list="1.2.3.4:10001,1.2.3.5:10001" advertise="false" connector="ajp">
          <dynamic-load-provider>
              <load-metric type="cpu"/>
          </dynamic-load-provider>
      </mod-cluster-config>

Unfortunately, on 9.0.0 proxy-list was deprecated and the start of the server will finish with an error. There is a terrible lack of documentation, however after a couple of tries I have discovered that proxy-list was replaced with proxies that are a list of outbound-socket-bindings. Hence, the configuration looks like the following:

      <mod-cluster-config proxies="mc-prox1 mc-prox2" advertise="false" connector="ajp">
          <dynamic-load-provider>
              <load-metric type="cpu"/>
          </dynamic-load-provider>
      </mod-cluster-config>

And the following should be added into the appropriate socket-binding-group (full-ha in my case):

    <outbound-socket-binding name="mc-prox1">
        <remote-destination host="1.2.3.4" port="10001"/>
    </outbound-socket-binding>
    <outbound-socket-binding name="mc-prox2">
        <remote-destination host="1.2.3.5" port="10001"/>
    </outbound-socket-binding>

So far so good. After this, the httpd cluster starts registering the nodes. However I am getting errors from load balancer. When I look into /mod_cluster-manager, I see a couple of Node REMOVED lines and there are also many many errors like:

ERROR [org.jboss.modcluster] (UndertowEventHandlerAdapter - 1) MODCLUSTER000042: Error MEM sending STATUS command to node1/1.2.3.4:10001, configuration will be reset: MEM: Can't read node

In the log of mod_cluster there are the equivalent warnings:

manager_handler STATUS error: MEM: Can't read node

As far as I understand, the problem is that although wildfly/modcluster is able to connect to httpd/mod_cluster, it does not work the other way. Unfortunately, even after an extensive effort I am stuck.

Could someone help with setting mod_cluster for Wildfly 9.0.0 without advertising? Thanks a lot.

like image 822
TomS Avatar asked May 21 '15 13:05

TomS


3 Answers

I ran into the Node Removed issue to. I managed to solve it by using the following as instance-id

<subsystem xmlns="urn:jboss:domain:undertow:2.0" instance-id="${jboss.server.name}">

I hope this will help someone else to ;)

like image 128
JustM Avatar answered Oct 05 '22 12:10

JustM


There is no need for any unnecessary effort or uneasiness about static proxy configuration. Each WildFly distribution comes with xsd sheets that describe xml subsystem configuration. For instance, with WildFly 9x, it's:

WILDFLY_DIRECTORY/docs/schema/jboss-as-mod-cluster_2_0.xsd

It says:

<xs:attribute name="proxies" use="optional">
  <xs:annotation>
    <xs:documentation>List of proxies for mod_cluster to register with defined by outbound-socket-binding in socket-binding-group.</xs:documentation>
  </xs:annotation>
  <xs:simpleType>
    <xs:list itemType="xs:string"/>
  </xs:simpleType>
</xs:attribute>

The following setup works out of box

  1. Download wildfly-9.0.0.CR1.zip or build with ./build.sh from sources
  2. Let's assume you have 2 boxes, Apache HTTP Server with mod_cluster acting as a load balancing proxy and your WildFly server acting as a worker. Make sure botch servers can access each other on both MCMP enabled VirtualHost's address and port (Apache HTTP Server side) and on WildFly AJP and HTTP connector side. The common mistake is to binf WildFLy to localhost; it then reports its addess as localhost to the Apache HTTP Server residing on a dofferent box, which makes it impossible for it to contact WildFly server back. The communication is bidirectional.
  3. This is my configuration diff from the default wildfly-9.0.0.CR1.zip.

328c328
< <mod-cluster-config advertise-socket="modcluster" connector="ajp" advertise="false" proxies="my-proxy-one">
---
> <mod-cluster-config advertise-socket="modcluster" connector="ajp">
384c384
< <subsystem xmlns="urn:jboss:domain:undertow:2.0" instance-id="worker-1">
---
> <subsystem xmlns="urn:jboss:domain:undertow:2.0">
435c435
< <socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:102}">
---
> <socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
452,454d451
< <outbound-socket-binding name="my-proxy-one">
< <remote-destination host="10.10.2.4" port="6666"/>
< </outbound-socket-binding>
456c453
< </server>
---
> </server>

Changes explanation

  • proxies="my-proxy-one", outbound socket binding name; could be more of them here.
  • instance-id="worker-1", the name of the worker, a.k.a. JVMRoute.
  • offset -- you could ignore, it's just for my test setup. Offset does not apply to outbound socket bindings.
  • <outbound-socket-binding name="my-proxy-one"> - IP and port of the VirtualHost in Apache HTTP Server containing EnableMCPMReceive directive.

Conclusion

Generally, these MEM read / node error messages are related to network problems, e.g. WildFly can contact Apache, but Apache cannot contact WildFly back. Last but not least, it could happen that the Apache HTTP Server's configuration uses PersistSlots directive and some substantial enviroment conf change took place, e.g. switch from mpm_prefork to mpm_worker. In this case, MEM Read error messages are not realted to WildFly, but to the cached slotmem files in HTTPD/cache/mod_custer that need to be deleted. I'm certain it's network in your case though.

like image 34
Michal Karm Babacek Avatar answered Oct 05 '22 12:10

Michal Karm Babacek


After a couple of weeks I got back to the problem and found the solution. The problem was - of course - in configuration and had nothing in common with the particular version of Wildfly. Mode specifically:

There were three nodes in the domain and three servers in each node. All nodes were launched with the following property:

-Djboss.node.name=nodeX

...where nodeX is the name of a particular node. However, it meant that all three servers in the node get the same name, which is exactly what confused the load balancer. As soon as I have removed this property, everything started to work.

like image 24
TomS Avatar answered Oct 05 '22 12:10

TomS