
What's the meaning of NodeDiscoveryType as TOKEN_AWARE in Astyanax client?

I found the TOKEN_AWARE enum value in the Astyanax client for Cassandra, in com.netflix.astyanax.connectionpool.NodeDiscoveryType, and I am trying to understand what it does.

package com.netflix.astyanax.connectionpool;

public enum NodeDiscoveryType {
    /**
     * Discover nodes exclusively from doing a ring describe
     */
    RING_DESCRIBE,

    /**
     * Discover nodes exclusively from an external node discovery service
     */
    DISCOVERY_SERVICE,

    /**
     * Intersect ring describe and nodes from an external service. This solve
     * the multi-region ring describe problem where ring describe returns nodes
     * from other regions.
     */
    TOKEN_AWARE,

    /**
     * Use only nodes in the list of seeds
     */
    NONE
}

Suppose I have a 24-node cross-colo cluster, with 12 nodes in the PHX colo/datacenter and 12 nodes in the SLC colo/datacenter.

And I am connecting to Cassandra using Astyanax client as follows:

private CassandraAstyanaxConnection() {
    context = new AstyanaxContext.Builder()
        .forCluster(ModelConstants.CLUSTER)
        .forKeyspace(ModelConstants.KEYSPACE)
        .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
            .setPort(9160)
            .setMaxConnsPerHost(40)
            .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
        )
        .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
            .setCqlVersion("3.0.0")
            .setTargetCassandraVersion("1.2")
            .setDiscoveryType(NodeDiscoveryType.TOKEN_AWARE))
        .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
        .buildKeyspace(ThriftFamilyFactory.getInstance());

    context.start();
    keyspace = context.getEntity();

    emp_cf = ColumnFamily.newColumnFamily(
        ModelConstants.COLUMN_FAMILY, 
        StringSerializer.get(), 
        StringSerializer.get());
}

Can anyone explain the difference between TOKEN_AWARE in NodeDiscoveryType and TOKEN_AWARE in ConnectionPoolType?

Thanks for the help.

Updated Code

Below is the code I am using after making the changes:

private CassandraAstyanaxConnection() {

    context = new AstyanaxContext.Builder()
    .forCluster(ModelConstants.CLUSTER)
    .forKeyspace(ModelConstants.KEYSPACE)
    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
        .setPort(9160)
        .setMaxConnsPerHost(40)
        .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
        .setLocalDatacenter("phx")
    )
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        .setCqlVersion("3.0.0")
        .setTargetCassandraVersion("1.2")
        .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
    .buildKeyspace(ThriftFamilyFactory.getInstance());

    context.start();
    keyspace = context.getEntity();

    emp_cf = ColumnFamily.newColumnFamily(
        ModelConstants.COLUMN_FAMILY, 
        StringSerializer.get(), 
        StringSerializer.get());
}

You mentioned in your example that you would use these two settings together:

    .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
    .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)

Is that right? I believe the TOKEN_AWARE ConnectionPoolType uses RING_DESCRIBE by default, so setting the discovery type again seems redundant. Please correct me if I am wrong.

arsenal asked May 09 '13 02:05


1 Answer

When it comes to node discovery, TOKEN_AWARE for NodeDiscoveryType and TOKEN_AWARE for ConnectionPoolType are interrelated, which makes them somewhat confusing.

The effective NodeDiscoveryType is determined as follows (and usually not by what you pass to setDiscoveryType()):

  • If you've provided seeds via setSeeds and ConnectionPoolType is TOKEN_AWARE, then NodeDiscoveryType is RING_DESCRIBE.
  • If you've provided seeds via setSeeds and ConnectionPoolType is anything other than TOKEN_AWARE, then the value you passed to setDiscoveryType is used. This is the only case in which your configured NodeDiscoveryType takes effect.
  • If you did not provide seeds via setSeeds and ConnectionPoolType is TOKEN_AWARE, then NodeDiscoveryType is TOKEN_AWARE.
  • If you did not provide seeds via setSeeds and ConnectionPoolType is anything other than TOKEN_AWARE, then NodeDiscoveryType is DISCOVERY_SERVICE.
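The four rules above can be read as a small decision function. The following is only a sketch of that logic, not Astyanax code: the class DiscoveryTypeResolver, the method resolve, and the two local enums are hypothetical stand-ins that mirror the names used in this answer.

```java
// Illustrative stand-ins only; these are NOT the Astyanax classes.
final class DiscoveryTypeResolver {
    enum NodeDiscoveryType { RING_DESCRIBE, DISCOVERY_SERVICE, TOKEN_AWARE, NONE }
    enum ConnectionPoolType { TOKEN_AWARE, ROUND_ROBIN }

    // Returns the discovery type that ends up in effect, following the four rules above.
    static NodeDiscoveryType resolve(boolean seedsProvided,
                                     ConnectionPoolType poolType,
                                     NodeDiscoveryType configured) {
        boolean tokenAwarePool = poolType == ConnectionPoolType.TOKEN_AWARE;
        if (seedsProvided) {
            // Seeds + token-aware pool forces RING_DESCRIBE; otherwise the configured value wins.
            return tokenAwarePool ? NodeDiscoveryType.RING_DESCRIBE : configured;
        }
        // No seeds: the configured value is ignored in both branches.
        return tokenAwarePool ? NodeDiscoveryType.TOKEN_AWARE
                              : NodeDiscoveryType.DISCOVERY_SERVICE;
    }

    public static void main(String[] args) {
        // Seeds plus a token-aware pool, as in the updated code from the question.
        System.out.println(resolve(true, ConnectionPoolType.TOKEN_AWARE, NodeDiscoveryType.TOKEN_AWARE));
    }
}
```

Read this way, the configured value only matters in one of the four branches, which is why setting setDiscoveryType often has no visible effect.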

Node Discovery

Now that we've determined how NodeDiscoveryType is set, let's see how it impacts actually discovering nodes. Node discovery boils down to which implementation of HostSupplier (i.e. Supplier<List<Host>>) is used.

  • If NodeDiscoveryType (from above) is DISCOVERY_SERVICE then you must provide a HostSupplier (via withHostSupplier).
  • If NodeDiscoveryType (from above) is RING_DESCRIBE then use RingDescribeHostSupplier.
  • If NodeDiscoveryType (from above) is TOKEN_AWARE and HostSupplier is set (via withHostSupplier) then use FilteringHostSupplier with RingDescribeHostSupplier.
  • If NodeDiscoveryType (from above) is TOKEN_AWARE and no HostSupplier is set then use RingDescribeHostSupplier.
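As a rough illustration of the selection above, here is a sketch that models hosts as plain strings. HostSupplierSketch, select, and intersect are hypothetical names, and the intersection is an assumption based on the TOKEN_AWARE Javadoc ("intersect ring describe and nodes from an external service"); the real FilteringHostSupplier may handle edge cases differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical model of the host-supplier choice; not the Astyanax implementation.
final class HostSupplierSketch {
    enum DiscoveryType { DISCOVERY_SERVICE, RING_DESCRIBE, TOKEN_AWARE }

    // Keeps only the ring hosts that the external supplier also reports.
    static Supplier<List<String>> intersect(Supplier<List<String>> ring,
                                            Supplier<List<String>> external) {
        return () -> {
            Set<String> allowed = new HashSet<>(external.get());
            List<String> result = new ArrayList<>();
            for (String host : ring.get()) {
                if (allowed.contains(host)) {
                    result.add(host);
                }
            }
            return result;
        };
    }

    // Picks a supplier following the four bullets above; 'external' may be null.
    static Supplier<List<String>> select(DiscoveryType type,
                                         Supplier<List<String>> ring,
                                         Supplier<List<String>> external) {
        switch (type) {
            case DISCOVERY_SERVICE:
                if (external == null) {
                    throw new IllegalStateException("DISCOVERY_SERVICE needs a HostSupplier");
                }
                return external;
            case RING_DESCRIBE:
                return ring;
            case TOKEN_AWARE:
                return external != null ? intersect(ring, external) : ring;
            default:
                throw new IllegalArgumentException("Unknown discovery type: " + type);
        }
    }

    public static void main(String[] args) {
        Supplier<List<String>> ring = () -> Arrays.asList("phx1", "phx2", "slc1");
        Supplier<List<String>> external = () -> Arrays.asList("phx1", "phx2");
        System.out.println(select(DiscoveryType.TOKEN_AWARE, ring, external).get());
    }
}
```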

RingDescribe and using the local DC

Based on the configuration you've supplied, you'll end up with RingDescribeHostSupplier, which allows connections to all nodes in the ring unless you've specified a datacenter. So, when setting up your AstyanaxContext, you might want to call setLocalDatacenter on ConnectionPoolConfigurationImpl with the desired DC. That ensures hosts from the other DCs are not in the connection pool and that your requests stay local.

.withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
        .setPort(9160)
        .setMaxConnsPerHost(40)
        .setLocalDatacenter("phx")
        .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
    )

ConnectionPoolType

You also might want to set ConnectionPoolType to TOKEN_AWARE. When that value is left unset, it will default to ROUND_ROBIN (using the nodes from the node discovery work described above). TOKEN_AWARE ConnectionPoolType will "keep track of which hosts have which tokens and attempt to direct traffic intelligently".

I'd do something like this for Astyanax configuration, unless you are providing a HostSupplier.

.withAstyanaxConfiguration(new AstyanaxConfigurationImpl()      
        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
        .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)
    )
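To make the difference between the two pool types concrete, here is a deliberately simplified sketch. It assumes numeric tokens and a single owner per token (no replication), and TokenAwareSketch with its methods is a hypothetical name rather than part of Astyanax. A round-robin pool cycles through the known hosts, while a token-aware pool looks up the host that owns the key's token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified illustration; real token-aware pools also deal with replicas and failover.
final class TokenAwareSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final List<String> hosts = new ArrayList<>();
    private int next = 0;

    void addHost(long token, String host) {
        ring.put(token, host);
        hosts.add(host);
    }

    // Token-aware choice: the host with the smallest token >= keyToken, wrapping around the ring.
    String ownerOf(long keyToken) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no hosts known");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyToken);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    // Round-robin choice: ignores the key and cycles through hosts in insertion order.
    String roundRobin() {
        if (hosts.isEmpty()) {
            throw new IllegalStateException("no hosts known");
        }
        String host = hosts.get(next % hosts.size());
        next = (next + 1) % hosts.size();
        return host;
    }

    public static void main(String[] args) {
        TokenAwareSketch pool = new TokenAwareSketch();
        pool.addHost(100L, "cdb03");
        pool.addHost(200L, "cdb04");
        System.out.println("token-aware: " + pool.ownerOf(150L));
        System.out.println("round-robin: " + pool.roundRobin());
    }
}
```

The point of the token-aware lookup is that the request goes straight to a node that holds the data, instead of to an arbitrary coordinator that then has to forward it.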

Pool Optimizations

Another consideration is optimizing pool usage with Astyanax's "latency awareness" setting on ConnectionPoolConfigurationImpl, though YMMV on the exact values. For example:

.setLatencyScoreStrategy(new SmaLatencyScoreStrategyImpl(10000,10000,100,0.50))
// The constructor takes:
//  UpdateInterval: 10000 : Will re-sort hosts per token partition every 10 seconds
//  ResetInterval: 10000 : Will clear the latency every 10 seconds
//  WindowSize: 100 : Uses last 100 latency samples
//  BadnessThreshold: 0.50 : Will sort hosts if a host is more than 100% 
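
For intuition about the WindowSize and ResetInterval parameters, the sketch below shows a simple moving average over the most recent latency samples. SmaLatencySketch is a hypothetical class written for this illustration; it does not reproduce SmaLatencyScoreStrategyImpl and leaves out the update scheduling and the badness threshold.

```java
import java.util.ArrayDeque;

// Hypothetical illustration of a windowed simple moving average of latencies.
final class SmaLatencySketch {
    private final int windowSize;
    private final ArrayDeque<Long> samples = new ArrayDeque<>();
    private long sum = 0;

    SmaLatencySketch(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    // Records a sample and drops the oldest one once the window is full.
    void addSample(long latency) {
        samples.addLast(latency);
        sum += latency;
        if (samples.size() > windowSize) {
            sum -= samples.removeFirst();
        }
    }

    // Average of the samples currently in the window, or 0.0 if there are none.
    double score() {
        return samples.isEmpty() ? 0.0 : (double) sum / samples.size();
    }

    // Clears all samples, analogous to what a periodic reset interval would trigger.
    void reset() {
        samples.clear();
        sum = 0;
    }

    public static void main(String[] args) {
        SmaLatencySketch host = new SmaLatencySketch(100);
        host.addSample(12);
        host.addSample(18);
        System.out.println("score = " + host.score());
    }
}
```

A pool could keep one such score per host and prefer the hosts with the lowest score when several hosts can serve the same token range.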

See Astyanax Configuration

TL;DR

In summary, set NodeDiscoveryType to RING_DESCRIBE (if you aren't using a HostSupplier) and ConnectionPoolType to TOKEN_AWARE. Additionally, use setLocalDatacenter to keep requests local to the DC, and consider the latency awareness settings.

Matt Self answered Sep 24 '22 19:09