Adding nodes:
You might want to consider adding a new node if you have:
- Reached data capacity: your data has outgrown the node's hardware capacity.
- Reached traffic capacity: your application needs faster responses with lower latency.
- Run out of operational headroom: you need more resources for node repair, compaction, and other resource-intensive operations.
Adding Nodes: Best Practices
Vnodes: For clusters using vnodes, you can grow the cluster incrementally, adding nodes as more capacity is needed.
Two Minute Rule:
- Wait a period of time before adding each additional node (applies to both single-token and vnode clusters).
- Follow the 'two minute rule': leave at least two minutes between nodes.
- This ensures the range announcement is known to all nodes before the next one begins entering the cluster.
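As a rough sketch, a staggered start from an operator workstation could look like the following (node4 and node5 are hypothetical host names; adjust the start command to your install):
$ ssh node4 'sudo service dse start'
$ sleep 120   # two minute rule: let the ring learn the new node before adding the next
$ ssh node5 'sudo service dse start'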
Bootstrapping: Adding capacity to the cluster without downtime
- The node announces itself to the ring using the seed nodes.
- Can be a long-running process.
- Time depends on the size of the data.
Bootstrapping process:
- Calculate the ranges of the new node and notify the ring of these pending ranges.
- Calculate the nodes that currently own these ranges and will no longer own them once the bootstrap completes.
- Stream the data from these nodes to the bootstrapping node.
- We can monitor the bootstrapping process with nodetool netstats (see the example after this list).
- Join the new node to the ring so that it can serve traffic
- Length of time it takes to join depends on the amount of data to be streamed.
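While the node is joining, streaming progress can be watched from any node in the ring, for example:
$ nodetool netstats   # shows the streams flowing to the bootstrapping node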
[Figure: Bootstrapping Process]
Bootstrapping Steps: Adding a node to the cluster
Step 1: Install Cassandra on the new nodes, but do not start Cassandra.
Step 2: Depending on the snitch used in the cluster, set the properties in either the cassandra-topology.properties or the cassandra-rackdc.properties file (see the examples below):
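For example, with GossipingPropertyFileSnitch the new node's location goes in its own cassandra-rackdc.properties (DC1 and RAC1 are placeholder names):
dc=DC1
rack=RAC1
With PropertyFileSnitch, the new node is instead added to cassandra-topology.properties on every node, using the IP=DC:RACK format.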
Step 3: Set the following properties in the cassandra.yaml file (a sample excerpt follows this list):
cluster_name: same cluster name as on the other nodes
listen_address: the new node's IP address
seed_provider: same seed node details as on the other nodes
auto_bootstrap: should be set to true
endpoint_snitch: same as specified on the other nodes
other non-default settings: data, commitlog, cdc, and similar directories in cassandra.yaml and the snitch configuration files
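A minimal cassandra.yaml excerpt might look like this (cluster name, IP addresses, and snitch are placeholder values; use the ones already in use in your cluster):
cluster_name: 'MyCluster'
listen_address: 10.0.0.5            # this node's IP
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.0.0.2"  # same seed list as the rest of the cluster
auto_bootstrap: true
endpoint_snitch: GossipingPropertyFileSnitch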
Step 4: Start the DSE Cassandra service:
$ sudo service dse start
Step 5: Use nodetool status to verify that the node is fully bootstrapped and all other nodes are up (UN) and not in any other state.
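For example:
$ nodetool status   # the new node should show UN; UJ means it is still joining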
Step 6: After all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove the keys that no longer belong to those nodes.
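For example, on each pre-existing node (ks1 is a hypothetical keyspace name):
$ nodetool cleanup        # clean all keyspaces
$ nodetool cleanup ks1    # or limit cleanup to a single keyspace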
What if Bootstrap fails?
Two scenarios:
1. Bootstrap node couldn't even connect to the cluster.
- Check the log file for errors.
- Change the configuration and try again.
2. Streaming portion fails:
- The node exists in the cluster in a joining state.
- Run nodetool rebuild to re-bootstrap the data (see the example after this list).
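For example, on the stuck node (DC1 is a placeholder source datacenter name):
$ nodetool rebuild DC1    # stream the data again from the DC1 datacenter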
Nodetool cleanup:
- Perform cleanup after a bootstrap on the other nodes
- Reads all SSTables to make sure there are no out-of-range tokens for that particular node.
- If you don't run cleanup, the out-of-range data will get picked up through compaction over time.
How the cleanup process works:
- It creates a new SSTable and copies only the data belonging to that node from the old SSTable.
- It ignores the irrelevant data in the old SSTable.
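Cleanup progress can be watched like any other compaction, for example:
$ nodetool compactionstats   # cleanup operations show up alongside compactions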