I found the TOKEN_AWARE enum value in the Astyanax client for Cassandra, in com.netflix.astyanax.connectionpool.NodeDiscoveryType, and am trying to understand what it does.
    package com.netflix.astyanax.connectionpool;

    public enum NodeDiscoveryType {
        /**
         * Discover nodes exclusively from doing a ring describe
         */
        RING_DESCRIBE,

        /**
         * Discover nodes exclusively from an external node discovery service
         */
        DISCOVERY_SERVICE,

        /**
         * Intersect ring describe and nodes from an external service. This solve
         * the multi-region ring describe problem where ring describe returns nodes
         * from other regions.
         */
        TOKEN_AWARE,

        /**
         * Use only nodes in the list of seeds
         */
        NONE
    }
Suppose I have a 24-node cross-colo cluster, with 12 nodes in the PHX colo/datacenter and 12 nodes in the SLC colo/datacenter.
And I am connecting to Cassandra using Astyanax client as follows:
    private CassandraAstyanaxConnection() {
        context = new AstyanaxContext.Builder()
            .forCluster(ModelConstants.CLUSTER)
            .forKeyspace(ModelConstants.KEYSPACE)
            .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
                .setPort(9160)
                .setMaxConnsPerHost(40)
                .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
            )
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                .setCqlVersion("3.0.0")
                .setTargetCassandraVersion("1.2")
                .setDiscoveryType(NodeDiscoveryType.TOKEN_AWARE))
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
            .buildKeyspace(ThriftFamilyFactory.getInstance());

        context.start();
        keyspace = context.getEntity();

        emp_cf = ColumnFamily.newColumnFamily(
            ModelConstants.COLUMN_FAMILY,
            StringSerializer.get(),
            StringSerializer.get());
    }
Can anyone explain the difference between TOKEN_AWARE of NodeDiscoveryType and TOKEN_AWARE of ConnectionPoolType?
Thanks for the help.
Updated Code
Below is the code I am using so far after making the changes:
    private CassandraAstyanaxConnection() {
        context = new AstyanaxContext.Builder()
            .forCluster(ModelConstants.CLUSTER)
            .forKeyspace(ModelConstants.KEYSPACE)
            .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
                .setPort(9160)
                .setMaxConnsPerHost(40)
                .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
                .setLocalDatacenter("phx")
            )
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                .setCqlVersion("3.0.0")
                .setTargetCassandraVersion("1.2")
                .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
            .buildKeyspace(ThriftFamilyFactory.getInstance());

        context.start();
        keyspace = context.getEntity();

        emp_cf = ColumnFamily.newColumnFamily(
            ModelConstants.COLUMN_FAMILY,
            StringSerializer.get(),
            StringSerializer.get());
    }
You mentioned in your example that you would use these two together, right?

    .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
    .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)

But I believe the TOKEN_AWARE ConnectionPoolType uses RING_DESCRIBE by default, so it doesn't make sense to add it again. Am I right? Correct me if I am wrong.
When it comes to "node discovery", TOKEN_AWARE for NodeDiscoveryType and TOKEN_AWARE for ConnectionPoolType are interrelated, and the relationship between them is somewhat confusing.
Now that we've determined how NodeDiscoveryType is set, let's see how it impacts actually discovering nodes. Node discovery boils down to which implementation of HostSupplier (i.e. Supplier<List<Host>>) is used. A HostSupplier can be provided explicitly (via withHostSupplier). With TOKEN_AWARE discovery, if a HostSupplier has been specified (via withHostSupplier), then FilteringHostSupplier is used with RingDescribeHostSupplier.

Based on the configuration you've supplied, you'll end up with RingDescribeHostSupplier. RingDescribeHostSupplier allows connections to all nodes in the ring unless you've specified a datacenter. So, when setting up your AstyanaxContext using ConnectionPoolConfigurationImpl, you might want to call setLocalDatacenter with the desired DC. That will ensure that hosts from the other DCs are not in the connection pool and that your requests are local.
    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
        .setPort(9160)
        .setMaxConnsPerHost(40)
        .setLocalDatacenter("phx")
        .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
    )
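To make the discovery rules above concrete, here is a rough, self-contained model of how the discovery type and the local datacenter might combine to decide which hosts reach the pool. It is only a sketch: DiscoverySketch, Host, ringDescribe and discover are hypothetical names invented for illustration and are not Astyanax classes, and the behavior is inferred from the enum Javadoc and the description above rather than copied from the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only: none of these types are Astyanax classes.
final class DiscoverySketch {

    enum DiscoveryType { RING_DESCRIBE, DISCOVERY_SERVICE, TOKEN_AWARE, NONE }

    /** A node as a ring describe might report it (hypothetical shape). */
    static final class Host {
        final String name;
        final String datacenter;

        Host(String name, String datacenter) {
            this.name = name;
            this.datacenter = datacenter;
        }
    }

    /** Ring describe: every node in the ring, or only the local DC when one is set. */
    static List<String> ringDescribe(List<Host> ring, String localDc) {
        List<String> names = new ArrayList<>();
        for (Host host : ring) {
            if (localDc == null || localDc.equals(host.datacenter)) {
                names.add(host.name);
            }
        }
        return names;
    }

    /** Picks the host list according to the discovery type (external may be null). */
    static List<String> discover(DiscoveryType type, List<Host> ring,
                                 List<String> external, List<String> seeds,
                                 String localDc) {
        switch (type) {
            case RING_DESCRIBE:
                return ringDescribe(ring, localDc);
            case DISCOVERY_SERVICE:
                if (external == null) {
                    throw new IllegalStateException("an external host list is required");
                }
                return new ArrayList<>(external);
            case TOKEN_AWARE: {
                List<String> fromRing = ringDescribe(ring, localDc);
                if (external != null) {
                    fromRing.retainAll(external); // intersect ring and external list
                }
                return fromRing;
            }
            default: // NONE: only the configured seeds
                return new ArrayList<>(seeds);
        }
    }

    public static void main(String[] args) {
        List<Host> ring = Arrays.asList(
                new Host("phx-a", "phx"), new Host("phx-b", "phx"),
                new Host("slc-a", "slc"), new Host("slc-b", "slc"));
        List<String> seeds = Arrays.asList("phx-a");
        // No local DC: nodes from both datacenters are eligible.
        System.out.println(discover(DiscoveryType.TOKEN_AWARE, ring, null, seeds, null));
        // Local DC set to "phx": only the PHX nodes remain.
        System.out.println(discover(DiscoveryType.TOKEN_AWARE, ring, null, seeds, "phx"));
    }
}
```

In this model, TOKEN_AWARE discovery without an external host list behaves exactly like RING_DESCRIBE, which matches the observation that the configuration in the question ends up on the ring describe path.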
You also might want to set ConnectionPoolType to TOKEN_AWARE. When that value is left unset, it will default to ROUND_ROBIN (using the nodes from the node discovery work described above). TOKEN_AWARE ConnectionPoolType will "keep track of which hosts have which tokens and attempt to direct traffic intelligently".
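As a rough illustration of the difference between the two pool types, the sketch below contrasts round-robin selection with token-based selection over the same set of hosts. It is a simplified model under stated assumptions: PoolRoutingSketch and its methods are hypothetical names (not Astyanax classes), tokens are plain integers passed in directly instead of being hashed from a row key by a partitioner, and replication is ignored so each token has exactly one owner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative model only: not the Astyanax connection pool implementation.
final class PoolRoutingSketch {
    // Maps each node's token to the host that owns the range ending at that token.
    private final TreeMap<Integer, String> tokenOwners = new TreeMap<>();
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    PoolRoutingSketch(Map<Integer, String> tokenToHost) {
        tokenOwners.putAll(tokenToHost);
        hosts = new ArrayList<>(tokenOwners.values());
    }

    /** ROUND_ROBIN style: cycle through the hosts regardless of the key. */
    String roundRobin() {
        return hosts.get(Math.floorMod(next.getAndIncrement(), hosts.size()));
    }

    /** TOKEN_AWARE style: pick the host whose token range contains the key's token. */
    String tokenAware(int keyToken) {
        Map.Entry<Integer, String> owner = tokenOwners.ceilingEntry(keyToken);
        if (owner == null) {
            owner = tokenOwners.firstEntry(); // wrap around the ring
        }
        return owner.getValue();
    }

    public static void main(String[] args) {
        Map<Integer, String> ring = new TreeMap<>();
        ring.put(0, "host-a");
        ring.put(100, "host-b");
        ring.put(200, "host-c");
        PoolRoutingSketch pool = new PoolRoutingSketch(ring);
        System.out.println(pool.roundRobin());    // host-a
        System.out.println(pool.roundRobin());    // host-b
        System.out.println(pool.tokenAware(150)); // host-c
        System.out.println(pool.tokenAware(250)); // host-a (wraps around)
    }
}
```

The point of the token-based variant is that a request can go straight to a node that owns the data, instead of landing on an arbitrary node that then has to forward it.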
I'd do something like this for Astyanax configuration, unless you are providing a HostSupplier.
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
        .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)
    )
Another consideration would be optimizing the pool usage with Astyanax "latency awareness" on ConnectionPoolConfigurationImpl, but YMMV on the settings. e.g. :
.setLatencyScoreStrategy(new SmaLatencyScoreStrategyImpl(10000,10000,100,0.50))
// The constructor takes:
// UpdateInterval: 10000 : Will resort hosts per token partition every 10 seconds
// ResetInterval: 10000 : Will clear the latency every 10 seconds
// WindowSize: 100 : Uses last 100 latency samples
// BadnessThreshold: 0.50 : Will sort hosts if a host is more than 100%
See Astyanax Configuration
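To give a feel for what a simple-moving-average latency score does with the window size and reset interval described above, here is a minimal sketch. It is not the SmaLatencyScoreStrategyImpl source: SmaLatencySketch and its methods are hypothetical names, the timers behind the update and reset intervals are left out (the caller invokes sortedHosts and reset directly), and the badness threshold is not modeled at all.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: not the Astyanax latency score strategy.
final class SmaLatencySketch {
    private final int windowSize;
    private final Map<String, Deque<Long>> samples = new LinkedHashMap<>();

    SmaLatencySketch(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Records one latency sample, keeping only the last windowSize samples per host. */
    void record(String host, long latencyMicros) {
        Deque<Long> window = samples.computeIfAbsent(host, h -> new ArrayDeque<>());
        window.addLast(latencyMicros);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
    }

    /** Simple moving average over the current window; 0.0 when there are no samples. */
    double score(String host) {
        Deque<Long> window = samples.get(host);
        if (window == null || window.isEmpty()) {
            return 0.0;
        }
        long sum = 0;
        for (long value : window) {
            sum += value;
        }
        return (double) sum / window.size();
    }

    /** What an "update" tick would do in this model: order hosts from fastest to slowest. */
    List<String> sortedHosts() {
        List<String> hosts = new ArrayList<>(samples.keySet());
        hosts.sort(Comparator.comparingDouble(this::score));
        return hosts;
    }

    /** What a "reset" tick would do in this model: forget all recorded latencies. */
    void reset() {
        for (Deque<Long> window : samples.values()) {
            window.clear();
        }
    }

    public static void main(String[] args) {
        SmaLatencySketch scores = new SmaLatencySketch(2);
        scores.record("host-a", 10);
        scores.record("host-a", 20);
        scores.record("host-a", 30); // the oldest sample (10) falls out of the window
        scores.record("host-b", 5);
        System.out.println(scores.score("host-a")); // 25.0
        System.out.println(scores.sortedHosts());   // [host-b, host-a]
    }
}
```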
In summary, set NodeDiscoveryType to RING_DESCRIBE (if you aren't using a HostSupplier) and ConnectionPoolType to TOKEN_AWARE. Additionally, use setLocalDatacenter to keep requests local to the dc and consider the latency awareness settings.