Sunday, 3 May 2020

Bootstrapping: Adding nodes to an existing cluster.


Adding nodes:

You might want to consider adding a new node if you have:
  • Reached data capacity: your data has outgrown the node's hardware capacity.
  • Reached traffic capacity: your application needs faster responses with lower latency.
  • Run out of operational headroom: you need more resources for node repair, compaction, and other resource-intensive operations.


Adding Nodes: Best Practices
Vnodes: For vnode clusters, you can grow the cluster incrementally, adding nodes one (or a few) at a time as capacity is needed.


Two Minute Rule:
  • Wait a period of time before adding each additional node (applies to both single-token and vnode clusters).
  • Follow the 'two minute rule'.
  • This ensures the range announcement is known to all nodes before the next node begins entering the cluster.


Bootstrapping: Adding capacity to the cluster without downtime
  • The node announces itself to the ring using the seed nodes.
  • Bootstrapping can be a long-running process.
  • The time taken depends on the amount of data to be streamed.


Bootstrapping process:
  • Calculate the ranges of the new node and notify the ring of these pending ranges.
  • Calculate which nodes currently own these ranges and will no longer own them once the bootstrap completes.
  • Stream the data from those nodes to the bootstrapping node.
  • We can monitor the bootstrapping process with nodetool netstats (see the example after this list).
  • Join the new node to the ring so that it can serve traffic.
  • The length of time it takes to join depends on the amount of data to be streamed.
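
A minimal example of monitoring a bootstrap in progress (the exact output varies by version, but these are standard nodetool invocations):

  $ nodetool netstats     # shows Mode: JOINING and per-file streaming progress while bootstrapping
  $ nodetool status       # the bootstrapping node appears in the UJ (Up/Joining) state until it finishes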



Bootstrapping Steps: Adding a node to the cluster

Step 1: Install Cassandra on the new node, but do not start the Cassandra service.

Step 2: Depending on the snitch used in the cluster, set the properties in either the
           cassandra-topology.properties or the cassandra-rackdc.properties file.
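
           For example, with GossipingPropertyFileSnitch the cassandra-rackdc.properties
           file holds the node's datacenter and rack (the DC1/RAC1 values below are
           illustrative placeholders and must match your cluster's topology):

           # cassandra-rackdc.properties
           dc=DC1
           rack=RAC1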

Step 3: Set the following properties in the cassandra.yaml file:

           cluster_name    : the same cluster name as on the other nodes
           listen_address  : the new node's IP address
           seed_provider   : the same seed node details as on the other nodes
           auto_bootstrap  : should be set to true
           endpoint_snitch : the same snitch as specified on the other nodes
           other non-default settings : data, commitlog, and cdc directories in
                                        cassandra.yaml and the snitch configuration files
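
           A minimal sketch of the relevant cassandra.yaml entries; the cluster name,
           IP addresses, and snitch below are illustrative placeholders and must match
           what the existing nodes use:

           cluster_name: 'MyCluster'
           listen_address: 10.0.0.14            # this node's own IP address
           auto_bootstrap: true
           endpoint_snitch: GossipingPropertyFileSnitch
           seed_provider:
             - class_name: org.apache.cassandra.locator.SimpleSeedProvider
               parameters:
                 - seeds: "10.0.0.10,10.0.0.11"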

 Step 4: Start the DSE Cassandra service:
          
            $ sudo service dse start    

 Step 5: Use nodetool status to verify that the new node has fully bootstrapped and that
            all nodes are Up/Normal (UN) and not in any other state.

 Step 6: After all new nodes are running, run nodetool cleanup on each of the
            previously existing nodes to remove the keys that no longer belong
            to those nodes.
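
            For example, assuming the application keyspace is named killrvideo (the
            keyspace used in the snapshot examples later in these notes), run on each
            pre-existing node:

            $ nodetool cleanup               # clean up all keyspaces on this node
            $ nodetool cleanup killrvideo    # or limit the cleanup to a single keyspace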



What if Bootstrap fails?

Two scenarios:

  1. The bootstrapping node couldn't even connect to the cluster:
  • Check the log file for errors.
  • Change the configuration and try again.

  2. The streaming portion fails:
  • The node remains in the cluster in the joining state.
  • Use nodetool rebuild to re-bootstrap the data (see the sketch below).
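
  A sketch of the recovery commands. <datacenter_name> is a placeholder for the source
  datacenter to stream from; nodetool bootstrap resume is an alternative that is only
  available in newer versions (Cassandra 2.2+ / recent DSE releases):

  $ nodetool rebuild -- <datacenter_name>    # re-stream this node's data from an existing datacenter
  $ nodetool bootstrap resume                # or resume an interrupted bootstrap (newer versions)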

      
Nodetool cleanup:

  • Perform cleanup on the other nodes after a bootstrap.
  • Reads all SSTables to make sure no data is kept for tokens that are out of range for that particular node.
  • If you don't run cleanup, the leftover data will eventually get picked up through compaction over time.


How the cleanup process works:
  • It creates a new SSTable and copies only the data belonging to that node from the old SSTable.
  • It ignores the irrelevant (out-of-range) data in the old SSTable.
      $ nodetool [options] cleanup -- <keyspace> (<table>)


Using the nodetool Utility

The nodetool utility is a command-line interface for managing a cluster.
  • A command-line interface for monitoring Cassandra
  • Also used for performing routine database operations
  • Run directly from an operating Cassandra node
Note: If the node from which you issue the command is the intended target, you do not need the -h option to identify the target; otherwise, for remote invocation, identify the target node, or nodes, using -h.
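
For example (the IP address below is a placeholder; remote invocation assumes JMX is reachable on the target, default port 7199):

  $ nodetool status                          # run locally on the target node
  $ nodetool -h 10.0.0.12 -p 7199 status     # run against a remote node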

How does it work?
  • It is a command-line wrapper around JMX (Java Management Extensions).
  • It communicates over JMX to perform the operational and monitoring tasks exposed by MBeans.
  • JMX is a Java technology that supplies tools for managing and monitoring Java applications and services.

What do the commands operate on?
The nodetool commands can be categorized into these groups: 1) Cluster 2) Node 3) Backup 4) Storage 5) Compaction 6) Network


nodetool status:
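
The output looks roughly like this (the addresses, load, and ownership values are purely illustrative, and the exact columns vary slightly by version):

  Datacenter: dc1
  ===============
  Status=Up/Down
  |/ State=Normal/Leaving/Joining/Moving
  --  Address    Load      Tokens  Owns   Host ID      Rack
  UN  10.0.0.10  1.2 GiB   256     33.4%  (host ID)    rack1
  UN  10.0.0.11  1.1 GiB   256     33.3%  (host ID)    rack1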

Type of commands: Cluster
  • These commands deal with cluster-wide information.
  • The information is from the point of view of the node the command is run on, i.e. how that node sees the state of the cluster.
1) nodetool status

  • Provides information about the cluster, such as the state, load, and IDs.
  • Who's up and who's down.
  • This is probably the most used nodetool command.

$ nodetool <options> status <keyspace>


2) nodetool repair

  • Starts a repair process from the point of view of that node.
  • Repairs one or more keyspaces or tables.

$ nodetool <options> repair
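
For example (killrvideo is the keyspace used elsewhere in these notes; -pr restricts the repair to this node's primary ranges):

  $ nodetool repair killrvideo         # repair a single keyspace
  $ nodetool repair -pr killrvideo     # repair only the primary ranges owned by this node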


Type of commands: Node
  • The scope of these commands is the individual node.

1) nodetool info
  • Provides node information such as load and uptime
  • Status of the JVM
  • Use the -T flag to display all tokens

$ nodetool <options> info ( -T | --tokens )

2) nodetool tpstats
  • Provides usage statistics of thread pools
  • How many completed, how many pending, which ones are blocked.
  • A high number of pending tasks for any pool can indicate performance problems
  • Shows mutation drops (critical for troubleshooting a node)
  • A heavily used command during troubleshooting

$ nodetool <options> tpstats
Columns: Active, Pending, Completed, Blocked, All time blocked
Pool names: ReadStage, RequestResponseStage, MutationStage, ReadRepairStage, GossipStage,
            AntiEntropyStage, MigrationStage, MemoryMeter, MemtablePostFlusher, FlushWriter,
            CommitLogArchiver, InternalResponseStage, HintedHandoff
Dropped message types: RANGE_SLICE, READ_REPAIR, PAGED_RANGE, BINARY, READ, MUTATION,
                       _TRACE, REQUEST_RESPONSE, COUNTER_MUTATION
3) nodetool tablehistograms
  • The nodetool tablehistograms command provides statistics about a table
  • Includes read/write latency, partition size, column count, and number of SSTables
  • The report is incremental, not cumulative.
  • It covers all the operations since the last time nodetool tablehistograms was run in the current session.
  • The use of the metrics-core library makes the output more informative and easier to understand
  • These statistics could be used to plot a frequency function.
$ nodetool <options> tablehistograms -- <keyspace>.<table>

Type of commands: Backup

1) nodetool snapshot
  • We can take a snapshot of one or more keyspaces, or of a table, to back up data.
  • Cassandra flushes the node before taking the snapshot.
  • It takes the snapshot and stores the data in the snapshots directory of each keyspace in the data directory.
  • If you do not specify a snapshot name using the -t option, Cassandra names the snapshot directory using the timestamp of the snapshot, like 398276394348934.
$ nodetool <options> snapshot

Example: snapshot all keyspaces
$ nodetool snapshot

Example: snapshot a single keyspace
$ nodetool snapshot -t 2016.05.17 killrvideo

Example: snapshot multiple keyspaces
$ nodetool snapshot mykeyspace killrvideo

Example: snapshot a single table
$ nodetool snapshot --table users killrvideo

Example: snapshot different tables from different keyspaces
$ nodetool snapshot -kt Cycling.Cyclist_Name,KillrVideo.Users
2) nodetool clearsnapshot:
  • Removes one or more snapshots.
$ nodetool <options> clearsnapshot -t <snapshot_name> -- <keyspace>

3) nodetool listsnapshots:
  • Lists snapshot names, size on disk, and true size.
  $ nodetool <options> listsnapshots

Type of commands: Storage
1) nodetool cleanup:
  • Used to get rid of old data on nodes after bootstrap operations.
$ nodetool <options> cleanup -- <keyspace> (<table> ...)
2) nodetool flush:
  • The command flushes everything in the memtables out to SSTables on disk; the corresponding commit log segments can then be recycled.
  • You can specify a keyspace followed by one or more tables that you want to flush from the memtable to SSTables on disk.

$ nodetool <options> flush -- <keyspace> ( <table> ... )
Type of commands: Compaction
1) nodetool compact:
  • Forces a major compaction on one or more tables for size-tiered compaction.
  • Acts differently for different compaction strategies.

$ nodetool <options> compact <keyspace> (<table> ...)
2) nodetool compactionstats:
  • Provides statistics about a compaction
  • Not something you would just run all the time.
  • This is a JMX statistic you could also pull in using OpsCenter
  • See what is currently compacting

$ nodetool <options> compactionstats -H
* The -H flag converts bytes to a human-readable form: KB, MB, GB, or TB
Type of commands: Network

1) nodetool proxyhistograms:
  • Provides a histogram of network statistics
  • The output of this command shows the full request latency recorded by the coordinator.
  • Includes the percentile rank of the read and write latency values for inter-node communication
  • Typically, you use the command to see if requests encounter a slow node.

       $ nodetool <options> proxyhistograms


2) nodetool netstats:
  • Provides network information about the host.
  • If you do not include a hostname or IP address in the command, the default is the connected host.

$ nodetool <options> netstats -H

Saturday, 2 May 2020

Removing a node from the cluster.

Why do we remove nodes from the cluster?

  • To reduce the capacity of the cluster.
  • To decommission due to operational requirements, e.g. a DSE upgrade, swapping instances...
  • The node is offline and will never come back online, e.g. hardware issues...

While the node is being removed:

  • Other nodes need to pick up the removed node's data.
  • The cluster needs to know the node is gone. 

There are three options to remove a node from the cluster, depending on the context:

  • nodetool decommission
  • nodetool removenode
  • nodetool assassinate


Decommission: Removing a live node from the cluster. 

Decommissioning a node assigns the ranges of the old node to other nodes in the cluster. That node's data gets streamed evenly among the other active nodes.

Decommissioning transfers the data from the decommissioned node to the other active nodes in the cluster. With vnodes, the rebalance happens automatically.

After running the nodetool decommission command:

  • The node has left the ring, i.e. it will no longer be shown in nodetool status.
  • The DSE services will still be running. Run "sudo service dse stop" to stop them.
  • The data will not be deleted from the decommissioned node.
  • Not deleting the data may cause data resurrection issues if the node is later re-added.


$ nodetool decommission 

Note: Monitor decommission progress with nodetool netstats and nodetool status.
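
For example, while the decommission is running (the grep is just a convenience filter):

  $ nodetool netstats | grep -i mode     # shows Mode: LEAVING while data is streaming out
  $ nodetool status                      # on other nodes the decommissioning node shows as UL (Up/Leaving)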




Streaming data to active nodes


                    


Removenode: Used to remove a dead (down) node from the cluster.

Check the status of the node being removed with nodetool status. If the node is down and has no chance of coming back, remove it using the nodetool removenode command.

Removing a dead node from the cluster reassigns the token ranges that the dead node was responsible for to other nodes in the cluster, and populates those nodes with the data that the dead node had been responsible for. This process is initiated by the nodetool removenode command.

Because the node is dead, the data it was responsible for has to be streamed from the remaining replicas in the cluster, which happens when the nodetool removenode command is run.

We can check the removal progress using the nodetool removenode status command.

$ nodetool removenode $host-id

Note: host-id can be obtained from nodetool status.
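
The removal can be monitored and, if the re-replication stalls, forced (use force with caution, since it skips the remaining streaming):

  $ nodetool removenode status     # shows the progress of the token range re-replication
  $ nodetool removenode force      # last resort: complete the removal without waiting for streaming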



Assassinate:

The nodetool assassinate command is primarily intended to remove problematic nodes from the cluster when the nodetool decommission or nodetool removenode commands cannot complete successfully.

Use this as a last resort, when the node is offline and will never come back. It forcefully removes a dead node without re-replicating any data, and it makes the remaining nodes in the cluster aware that the node is gone.

Once the assassinated node is marked as left, its token ranges are owned by the other active nodes. Run nodetool repair on the remaining nodes to fix the data replication.

$ nodetool assassinate $ip-address