Apache Cassandra (Single & Multi-Node/Cluster) Setup & Testing Instructions for Windows (2012-R2 x64)


Pre-requisites:

1.  Ensure you have the syndeia-cloud-3.4_cassandra_zookeeper_kafka_setup.zip (or latest service pack) downloaded to your home directory (or home directory's Downloads folder) from the download/license instructions sent out by our team.  

(info)  Note: the .ZIP already contains a top-level folder for its contents, so there is no need to pre-create a separate folder for it.  

2.  Review Apache Cassandra's recommendations, ie: (Open|Oracle)JDK/JRE, memory, FS selection, params, etc. in Deployment.  

(info)  Note:  Syndeia Cloud can be deployed on a different machine than Cassandra, but these steps will mostly focus on a single-node deployment.  

Single Node Setup Instructions

1. Deploy a new standard Windows image to a physical or virtual machine (VM), install via a Deployment Kit unattend.txt answer file, or install manually from media.

2. Setup forward & reverse DNS records on your DNS server (consult your IT admin/sysadmin if required) and set the hostname and primary DNS suffix on the machine itself if necessary.  

3. If using a firewall, ensure the following ports are accessible (consult your local network admin if required): TCP ports 7000, 7001, 7199, 9042, 9142, 9160 (for details on what each port is used for see http://cassandra.apache.org/doc/latest/faq/index.html#what-ports & https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/security/secFirewallPorts.html#secFirewallPorts__firewall_table).  
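(info) If you are using the built-in Windows Firewall, one way to open these ports is with a single inbound rule created from an elevated Command Prompt, for example (the rule name is arbitrary; adjust the scope/profile to your environment):

netsh advfirewall firewall add rule name="Apache Cassandra" dir=in action=allow protocol=TCP localport=7000,7001,7199,9042,9142,9160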

(info) Note: If required by your IT department, perform any other standard configuration (ie: create a separate admin account, set the timezone, date & time (or set them to synchronize with an NTP server), disable root logins, change the default SSH port, install Fail2Ban, enable & configure the local firewall, etc.)



Download, Install & Configure JRE

3. Download & install Oracle JRE 1.8.0_221 from https://www.oracle.com/technetwork/java/javase/downloads/server-jre8-downloads-2133154.html

(info) Note: Oracle archives older versions when a newer version is released, so if the version is not available at the link above, check https://www.oracle.com/technetwork/java/javase/downloads/java-archive-javase8-2177648.html instead

4. Set the System Environment Variable JAVA_HOME=C:\Progra~1\Java\jre1.8.0_221 (for steps on how to set this permanently, see Appendix B2.2)

(info) Note: we use the Windows 8.3 short-name form "Progra~1" of "Program Files" because spaces in the path appear to cause issues for Apache Kafka when run via Cygwin.
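(info) For reference, one way to set this permanently from an elevated Command Prompt (assuming the default JRE install path shown above) is:

setx /M JAVA_HOME C:\Progra~1\Java\jre1.8.0_221

Note that setx only affects newly started processes, so open a fresh Command Prompt (or reboot, per the next step) for the change to take effect; Appendix B2.2 remains the authoritative reference.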

5. Reboot



Download, Install & Run Apache Cassandra

6.  Download Apache Cassandra v3.11.1 under "Older Supported Releases" from https://cassandra.apache.org/download/ (direct link: https://archive.apache.org/dist/cassandra/3.11.1/apache-cassandra-3.11.1-bin.tar.gz).  Other versions may work, but this is the version that was tested against for Syndeia Cloud v3.3.  

(info) Note: Windows by default doesn't know how to read .tar.gz "tarballs" (an extension that comes from "tape archive" compressed with "GNU Zip").  You will need to launch a Cygwin Terminal and extract the tarball to C:\cygwin64\opt, ex:  cd /opt; tar -xvf /cygdrive/c/Users/Administrator/Downloads/apache-cassandra-3.11.1-bin.tar.gz  -OR- alternatively, you can install a utility such as the (open source) 7-Zip (un)archiver, see http://www.7-zip.org/ (http://www.7-zip.org/a/7z1604-x64.exe).

7. Open an Administrator Command Prompt (CMD.EXE) in C:\cygwin64\opt\apache-cassandra-3.11.1 (Note: it is highly recommended you enable QuickEdit Mode and increase the scrollback history and window size to make it easier to grab log messages later if needed; see Appendix B2.3 on how to do this)

8.  Extract apache-cassandra-3.11.1-bin.tar.gz to C:\cygwin64\opt (if not already done per the note above) and, from C:\cygwin64\opt, make a symlink to it, ie:  mklink /d apache-cassandra-current c:\cygwin64\opt\apache-cassandra-3.11.1 

9.  Set ACL permissions, ie:  set Administrators to have "Full-Control" on the extracted Apache Cassandra 3.11.1 folder recursively:  icacls C:\cygwin64\opt\apache-cassandra-3.11.1 /grant Administrators:F /T /L /C  

10.  Edit C:\cygwin64\opt\apache-cassandra-3.11.1\conf\cassandra.yaml & confirm the following settings (typically you should only have to change cluster_name, authenticator, authorizer, write_request_timeout_in_ms, and batch_size_fail_threshold_in_kb):  

cluster_name: 'YourClusterName'
num_tokens: 256
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
seed_provider:
 - class_name: org.apache.cassandra.locator.SimpleSeedProvider
   parameters:
        - seeds: "127.0.0.1"
listen_address: 127.0.0.1
rpc_address: 127.0.0.1
write_request_timeout_in_ms: 20000
batch_size_fail_threshold_in_kb: 200

(info) Note1:  For a quick start, the above settings will get you up and running; however, for any production deployment you may wish to configure additional settings to enhance security (ie: changing the default cassandra superuser password, enabling encryption, etc.) & performance (ie: setting the data & commitlog directories, swap/page file settings, etc.).  See Appendix B2.11 for more details.  

(warning) If you frequently deal with large artifact sizes, you may want to bump batch_size_fail_threshold_in_kb even higher than the 200 (KB) shown above (the Cassandra default = 50 (KB)).
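(info) After saving cassandra.yaml, you can quickly double-check the non-default values from a Cygwin Terminal, for example:

grep -E '^(cluster_name|authenticator|authorizer|write_request_timeout_in_ms|batch_size_fail_threshold_in_kb):' /opt/apache-cassandra-3.11.1/conf/cassandra.yaml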

11. To install and start Cassandra as a standard NT service that starts automatically on boot, please perform the following steps: 

11.1.  Patch cassandra.bat to allow it to install via the "LEGACY" method (see Appendix B2.14). 

11.2.  Download and extract the Apache Commons Daemon:  download https://archive.apache.org/dist/commons/daemon/binaries/windows/commons-daemon-1.1.0-bin-windows.zip and extract it to /opt/ (it will be used later for other services), then copy the extracted folder commons-daemon-1.1.0-bin-windows as daemon into /opt/apache-cassandra-current/bin/, ie: from the Cygwin Terminal:  cp -r /opt/commons-daemon-1.1.0-bin-windows /opt/apache-cassandra-current/bin/daemon  ((warning) note the destination folder name daemon, which is the folder name cassandra.bat expects).  

(warning) If you are NOT on 64-bit Windows, you will also need to edit the set PATH_PRUNSRV line in Cassandra's bin/cassandra.bat file so it does not point to the 64-bit binary in %CASSANDRA_HOME%\bin\daemon\amd64\, ie: remove the amd64\ at the end.  

11.3.  Install as an NT service:  from the bin\ directory in an Administrator Command Prompt, run .\cassandra.bat LEGACY -INSTALL

11.4.  Set service to auto-start:  Run sc config cassandra start= auto

11.5.  Start Cassandra as an NT service:  run net start cassandra or sc.exe start cassandra, or start it from services.msc (the Services MMC applet)

(info)  Note: If for whatever reason you wish/need to run the service manually, you can run .\cassandra.bat LEGACY from a CMD.EXE or Cygwin Terminal window. To stop it, hit ^C (<CTRL> + C) in the same window (see Appendix B2.4 for sample startup output).  

Windows Service Check: Cassandra
C:\> sc queryex cassandra

SERVICE_NAME: cassandra
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

C:\>


12. To examine the log file, in a Cygwin Terminal you can use less /opt/apache-cassandra-current/logs/system.log.  To follow the log, you can use tail -f /opt/apache-cassandra-current/logs/system.log .  You should see output similar to the following (abridged) text (for the full text of an example successful startup, see Appendix A1.1):

$ less /opt/apache-cassandra-current/logs/system.log
[...]
INFO  [main] 2019-04-05 13:55:43,277 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
INFO  [main] 2019-04-05 13:55:43,613 Config.java:481 - Node configuration:[allocate_tokens_for_keyspace=null; authenticator=PasswordAuthenticator; authorizer=CassandraAuthorizer; auto_boo
tstrap=true; auto_snapshot=true; back_pressure_enabled=false; back_pressure_strategy=org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9, factor=5, flow=FAST}; batch_size_fail_t
hreshold_in_kb=50; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_address=null; broadcast_rpc_address=null; buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; 
[...]
INFO  [main] 2019-04-05 13:55:43,613 DatabaseDescriptor.java:367 - DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
[...]
INFO  [main] 2019-04-05 13:55:43,886 CassandraDaemon.java:471 - Hostname: cassandra.mycompany.com
INFO  [main] 2019-04-05 13:55:43,887 CassandraDaemon.java:478 - JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.8.0_131
[...]
INFO  [main] 2019-04-05 13:55:50,097 QueryProcessor.java:163 - Preloaded 328 prepared statements
INFO  [main] 2019-04-05 13:55:50,098 StorageService.java:617 - Cassandra version: 3.11.1
INFO  [main] 2019-04-05 13:55:50,098 StorageService.java:618 - Thrift API version: 20.1.0
INFO  [main] 2019-04-05 13:55:50,098 StorageService.java:619 - CQL supported versions: 3.4.4 (default: 3.4.4)
INFO  [main] 2019-04-05 13:55:50,099 StorageService.java:621 - Native protocol supported versions: 3/v3, 4/v4, 5/v5-beta (default: 4/v4)
INFO  [main] 2019-04-05 13:55:50,134 IndexSummaryManager.java:85 - Initializing index summary manager with a memory pool size of 98 MB and a resize interval of 60 minutes
INFO  [main] 2019-04-05 13:55:50,142 MessagingService.java:753 - Starting Messaging Service on cassandra.mycompany.com/127.0.0.1:7000 (eth0)
INFO  [main] 2019-04-05 13:55:50,168 StorageService.java:706 - Loading persisted ring state
INFO  [main] 2019-04-05 13:55:50,169 StorageService.java:819 - Starting up server gossip
INFO  [main] 2019-04-05 13:55:50,224 TokenMetadata.java:479 - Updating topology for cassandra.mycompany.com/127.0.0.1
INFO  [main] 2019-04-05 13:55:50,225 TokenMetadata.java:479 - Updating topology for cassandra.mycompany.com/127.0.0.1
[...]
INFO  [main] 2019-04-05 13:55:50,392 StorageService.java:2268 - Node localhost/127.0.0.1 state jump to NORMAL
INFO  [main] 2019-04-05 13:55:50,404 AuthCache.java:172 - (Re)initializing CredentialsCache (validity period/update interval/max entries) (2000/2000/1000)
INFO  [main] 2019-04-05 13:55:50,406 Gossiper.java:1655 - Waiting for gossip to settle...
INFO  [main] 2019-04-05 13:55:58,408 Gossiper.java:1686 - No gossip backlog; proceeding
INFO  [main] 2019-04-05 13:55:58,470 NativeTransportService.java:70 - Netty using native Epoll event loop
[...]
INFO  [main] 2019-04-05 13:55:58,520 Server.java:156 - Starting listening for CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
INFO  [main] 2019-04-05 13:55:58,623 ThriftServer.java:116 - Binding thrift service to localhost/127.0.0.1:9160
INFO  [Thread-2] 2019-04-05 13:55:58,629 ThriftServer.java:133 - Listening for thrift clients...

13. Open a new Cygwin Terminal window and cd to the directory where you installed Cassandra, ie:  cd /opt/apache-cassandra-3.11.1/bin 
14. In the new terminal window, run nodetool status; you should see output similar to the following:

$ nodetool status
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  206.93 KB  256          100.0%            41ab853b-5d48-4c4e-8d59-40e165acadae  rack1


$ 

15. Validate correct operation and create an archive image to use as a new base image if the node needs to be rebuilt or if you wish to create a cluster.  

(info)  Before making the image you may wish to first stop and optionally disable the service temporarily to prevent auto-start on boot (see Appendix B2.5 for more details on how to safely clone a Windows machine).  
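(info) For example, from an elevated Command Prompt you could stop the service and temporarily disable auto-start before imaging, then revert afterwards:

net stop cassandra
sc config cassandra start= disabled
REM ... take the image, then revert:
sc config cassandra start= auto
net start cassandra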



Multi-Node (Cluster) Setup Instructions

Enabling your single-node deployment for cluster operation

If you followed the steps in the previous section to deploy a single-node for Cassandra, you will need to make a few adjustments so the Cassandra ports are no longer bound to localhost and are accessible from other cluster nodes. 

(warning) If you have not already done so, you may wish to secure the cassandra superuser account by changing its password from the default, especially if you will be binding to a public interface on the internet (see Appendix B2.11 on how to do this).  

16. Make a backup of cassandra.yaml & open it for editing.   
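(info) For example, from a Cygwin Terminal:

cp /opt/apache-cassandra-3.11.1/conf/cassandra.yaml /opt/apache-cassandra-3.11.1/conf/cassandra.yaml.bak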

17.  Review the pre-reqs in "Initializing a multiple node cluster (single datacenter)" https://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/initSingleDS.html and make the following changes to bind the node to external interfaces: 

seed_provider:
 - class_name: org.apache.cassandra.locator.SimpleSeedProvider
   parameters:
        # where w.x.y.z = the IP address of the node,
        - seeds: "w.x.y.z"
[...]

listen_address: # set to IP of node or leave blank to pickup OS provided IP/FQDN
rpc_address:    # set to IP of node or leave blank to pickup OS provided IP/FQDN

(warning) IMPORTANT: Be aware that if you have other Cassandra server(s) with the same cluster_name as this one, this node will attempt to join that cluster when the service is restarted, which may cause issues that will be difficult to troubleshoot later.  If you do not wish for this to occur, you will need to first update cluster_name in the system keyspace via CQLSH (on ALL Cassandra server(s) in the cluster you are renaming); see Appendix B2.13 for details on how to do this.  
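(info) For reference, a commonly used sequence for updating the stored cluster_name looks like the following (Appendix B2.13 remains the authoritative procedure); run it on each node in the cluster being renamed before changing cluster_name in cassandra.yaml and restarting:

$ cqlsh <node_FQDN_or_IP> -u cassandra -p <password>
cqlsh> UPDATE system.local SET cluster_name = 'YourNewClusterName' WHERE key = 'local';
cqlsh> exit
$ nodetool flush system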

18. Update any firewall configuration on the OS and/or externally (ie: AWS or your cloud provider), then take a new image (suffix the name with "cluster-enabled").  

(warning) IMPORTANT:  If you have configured Cassandra as a service, disable the cassandra service from auto-starting before taking a new image and revert it once the image is taken.  

(info) Note: if JanusGraph is installed, you will also need to update the syndeia_cloud_graph configuration (see "Enabling your single-node deployment for cluster operation" in the Multi-Node section of the JanusGraph page).  Once you have done that, you can return here to take an image before proceeding through the remaining sections, and then return to the JanusGraph page.

19. Restart the Cassandra service.



Adding new nodes to an existing single-node

20. Deploy another instance of your new Cassandra image and make any appropriate changes to the cloned MAC address, if necessary (ex: in the VM settings and/or udev, if used).  

21. Setup forward & reverse DNS records on your DNS server (consult your IT admin/sysadmin if required) and set the hostname and primary DNS suffix on the machine itself.  

22. RDP to the IP (or the FQDN of the new node if DNS has already propagated).

(info) Note: If using Fail2Ban, update the sender line in /etc/fail2ban/jail.local to root@<new_Cassandra_node_FQDN>. Restart the fail2ban service (sudo systemctl restart fail2ban)

23. Clear out the contents of /var/lib/cassandra/{commitlog,data,hints,saved_caches}.  

(info)  If you are just adding 1 or 2 nodes and wish to remain with the default SimpleStrategy Replication Strategy and SimpleSnitch for the endpoint_snitch, you can probably just take an image here to use this as a new baseline.  If you are planning to add more nodes, you may wish to consider evolving the other settings below as well.   

(warning) These directories must be empty for a node to join the cluster and auto_bootstrap: false should only be added in cassandra.yaml on the seed nodes you have chosen.  Per the “Prerequisites” section of https://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/initSingleDS.html, normally one would elect a subset of the nodes to be seeds (usually 2-3 per datacenter is sufficient), however be aware there currently is a regression bug in Cassandra v3.6 ~ v3.11.1 that prevents non-seed nodes from starting, the workaround currently is to set all nodes as seeds, ex: seeds: <node1_IP>, <node2_IP>, ... <nodeN_IP> (see https://issues.apache.org/jira/browse/CASSANDRA-13851)
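(info) To clear out the directories from step 23 in a Cygwin Terminal (assuming the paths above are where your node's data actually lives), one approach is:

rm -rf /var/lib/cassandra/{commitlog,data,hints,saved_caches}/*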

24.  Start Cassandra; keyspaces should eventually be streamed in. 

25.  (Optional) Change the Keyspace Strategy:  optionally change the Snitch, alter the keyspace RF, and run nodetool repair on each node (see https://docs.datastax.com/en/archived/cassandra/3.x/cassandra/operations/opsChangeKSStrategy.html for a summary of the process).

  1. (Optionally) Change the Snitch:  see "Switching Snitches":  https://docs.datastax.com/en/archived/cassandra/3.x/cassandra/operations/opsSwitchSnitch.html

    (info) By default, cassandra.yaml is configured with endpoint_snitch: SimpleSnitch.  This "treats Strategy order as proximity" and is limited to single-"datacenter" (DC) deployments.  At some point in the future, if your needs grow, you can set GossipingPropertyFileSnitch, which uses the cassandra-rackdc.properties file to define a given node's membership in a DC and rack (and falls back to the cassandra-topology.properties file, if present, for compatibility).  If you decide to enable GossipingPropertyFileSnitch, please ensure you rename cassandra-topology.properties out of the way (ex: suffix it with .template), as this file is included by default.  

  2. Alter the RF:  verify the Cassandra “datacenter” name (default = dc1) from the CLI by running nodetool status and then from Cassandra CQLSH increment the RF for your Syndeia Cloud keyspace(s):

    -- where <datacenter_name> = the name of the datacenter as shown via nodetool status, and <total_number_of_nodes> = total # of nodes in the cluster
    ALTER KEYSPACE syndeia_cloud_auth WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', '<datacenter_name>' : <total_number_of_nodes> };
    ALTER KEYSPACE syndeia_cloud_confluence WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', '<datacenter_name>' : <total_number_of_nodes> };
    ALTER KEYSPACE syndeia_cloud_store WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', '<datacenter_name>' : <total_number_of_nodes> };
  3. Run nodetool repair:  Run nodetool repair --full <keyspace_name> on each node (for more details see https://docs.datastax.com/en/dse/5.1/cql/cql/cql_using/cqlUpdateKeyspaceRF.html)

26. Repeat steps 20 ~ 25 for each additional cluster node.  Optionally take another image with the newly emptied /var/lib/cassandra/{commitlog,data,hints,saved_caches} directories and altered RF as a new baseline.  As your deployment size grows beyond 2 nodes and you need to begin differentiating between seed and non-seed nodes, you may wish to take additional node images.

(info) Note: this is a simplified view of adding nodes in a single-datacenter (DC) scenario.  As your data usage increases and your cluster grows, you may wish to consider more sophisticated multi-DC topologies.  For further information, please consult the Cassandra documentation.  

(info)  If you have configured Cassandra as a service, enable the cassandra service to auto-start.



Validating Cassandra Operation (or Cluster Replication) for 1-node (or multiple nodes)

27. To validate Cassandra operation (or cluster replication), we will create a sample keyspace and tables, insert test data, create a user (role), grant permissions, and perform a basic query with a Consistency Level (CL) of ONE against the server (or against each node separately if a cluster has been set up).  To do this, connect via CQLSH on the server (or node 1 if testing a cluster) and perform the following steps.  
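(info) For example, to connect with CQLSH (with PasswordAuthenticator enabled, the out-of-the-box superuser is cassandra/cassandra, unless you have already changed it per Appendix B2.11):

cqlsh <node1_FQDN_or_IP> 9042 -u cassandra -p cassandra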

27.1. In CQLSH create a new sample keyspace with a Replication Factor (RF) equal to the total # of nodes you have (see Appendix B2.6 for sample CQL code on how to do this for a single server or multi-node cluster).  

27.2. In CQLSH create new sample tables (see Appendix B2.7 for sample CQL code on how to do this).

27.3. In CQLSH insert test data into the new tables (see Appendix B2.8 for sample CQL code on how to do this).

27.4. In CQLSH create a new login user (role) with a password and GRANT ALL PERMISSIONS to the keyspace created earlier (see Appendix B2.9 for sample CQL code on how to do this). 
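(info) As an illustrative sketch only (the keyspace, table, and role names below are hypothetical; see Appendices B2.6 ~ B2.9 for the official CQL), the combined sequence for a single node (RF = 1) might look like:

CREATE KEYSPACE test_ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
CREATE TABLE test_ks.test_table (id uuid PRIMARY KEY, name text);
INSERT INTO test_ks.test_table (id, name) VALUES (uuid(), 'test-row');
CREATE ROLE test_user WITH PASSWORD = 'ChangeMe123!' AND LOGIN = true;
GRANT ALL PERMISSIONS ON KEYSPACE test_ks TO test_user;

For a cluster, substitute a replication_factor equal to your total number of nodes (or use NetworkTopologyStrategy, as in step 25).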

28. Exit CQLSH and run nodetool status until you see Owns show 100.0% for your server (or for all nodes if testing on a cluster).

29. If testing only a single node, skip ahead and perform only steps 33-34.

(info) Note: For cluster testing, the steps below assume you are testing with a 3-node cluster.

30. Stop the Cassandra service on node 1. If you run nodetool status on node 2 or 3, you should now see node 1 shown as down (DN):

$ nodetool status
Datacenter: dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID                               Rack
DN  192.168.1.2   487.87 KiB 256          100.0%            377d1403-ca85-45ae-a1ca-60e09a75425b  rack1
UN  192.168.1.3   644.57 KiB 256          100.0%            141c21db-4f79-476b-b818-ee6d2da16d7d  rack1
UN  192.168.1.4   497.5 KiB  256          100.0%            3d4b81b6-5ccd-4b0b-b126-17d5eed3b634  rack1

31. On the test node (ex: node 2), ensure the Cassandra service is running; if not, start it.
32. On the other node (ex: node 3), stop the Cassandra service.
33. On the test node (ex: node 2), connect via CQLSH or via DataStax DevCenter on your machine (see Appendix B2.10 on where to obtain this), set the Consistency Level (CL) to ONE (CONSISTENCY ONE;) (see Appendix B2.12 for more details on Consistency Level), and issue a simple SELECT * FROM <keyspace>.<table_name>; (see the example query after step 35).  
34. Verify that you get 1 row back.
35. Repeat steps 31 ~ 34 but switch the nodes, ie: restart the service on node 3, stop the service on node 2, and issue the query on node 3.
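(info) Using the hypothetical keyspace/table from the sketch in step 27, the check in step 33 might look like:

cqlsh> CONSISTENCY ONE;
cqlsh> SELECT * FROM test_ks.test_table;

You should get back the single test row inserted in step 27.3.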


