Apache Cassandra (Single & Multi-Node/Cluster) Setup & Testing Instructions for Windows (2012-R2 x64)

Single Node Setup Instructions

1. Deploy a new standard Windows 2012-R2 x64 image on a physical or virtual machine (VM), install from a Deployment Kit unattend.txt script, or install manually from media.
2. Set up forward & reverse DNS records on your DNS server (consult your IT admin/sysadmin if required) and set the hostname and primary DNS suffix on the machine itself if necessary (see Appendix B2.1).
3. If using a firewall, ensure the following ports are accessible (consult your local network admin if required): TCP ports 7000, 7001, 7199, 9042, 9142, 9160 (for details on what each port is used for see http://cassandra.apache.org/doc/latest/faq/index.html#what-ports & https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/security/secFirewallPorts.html#secFirewallPorts__firewall_table).  

(info) Note: If required by your IT department, perform any other standard configuration (ex: create a separate admin account with a strong password; set the timezone, date & time, or set it to synchronize with an NTP server; etc.) and server hardening (ex: disable default administrator logins, change the default remote access port (ex: RDP), install anti-brute-force software (ex: IPBan, RDPGuard), enable & configure the local firewall, etc.).
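The port list in step 3 can be spot-checked with a short script. The following is only a minimal sketch using Python's standard socket module; the function names are illustrative and not part of Cassandra. Note that a port only shows as open when something is listening on it, so a meaningful check has to be run from another machine once Cassandra is up.

```python
import socket

# TCP ports listed in step 3 above.
CASSANDRA_PORTS = [7000, 7001, 7199, 9042, 9142, 9160]


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, filtered by a firewall, or timed out.
        return False
    conn.close()
    return True


def check_ports(host, ports=CASSANDRA_PORTS):
    """Map each port to True (reachable) or False (closed or filtered)."""
    return dict((port, port_is_open(host, port)) for port in ports)
```

For example, check_ports("cassandra.mycompany.com") returns a dictionary keyed by port number (the hostname here is just the sample FQDN used elsewhere in this document). Ports that Cassandra is not configured to listen on, such as 9160 when Thrift is disabled, will report False even if the firewall allows them.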



Download, Install & Configure JRE

4. Download & install Oracle JRE 1.8.0_151 from http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
5. Set the System JAVA_HOME=C:\Program Files\Java\jre1.8.0_151 Environment Variable (for steps on how to set this permanently, see Appendix B2.2)
6. Reboot
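After the reboot, it can be worth confirming that JAVA_HOME really points at a JRE. The sketch below is one possible check, assuming only that the JRE keeps its executable under a bin subfolder; find_java is a hypothetical helper name, not an Oracle or Cassandra tool.

```python
import os


def find_java(java_home):
    """Return the path to the java executable under java_home, or None."""
    if not java_home:
        return None
    # java.exe on Windows; plain "java" is checked as a fallback.
    for name in ("java.exe", "java"):
        candidate = os.path.join(java_home, "bin", name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Calling find_java(os.environ.get("JAVA_HOME")) from a new prompt should return a path ending in bin\java.exe; a result of None suggests the variable is unset or points at the wrong folder.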



Download & Install Python

7. Download & Install Python 2.7.X from https://www.python.org/downloads/windows/ (https://www.python.org/ftp/python/2.7.14/python-2.7.14.amd64.msi).  This is required for CQLSH (Cassandra Query Language Shell).  

(warning) IMPORTANT: When running through the wizard, be sure to select the option to add Python to your PATH.  After the installer finishes, you will need to log out & log back in for the change to the PATH environment variable to take effect.

(info) Note:  If you want tab-completion support, you will also need to install the pyreadline module once the Python install has finished, ie: pip install pyreadline.



Download, Install & Run Apache Cassandra

8. Download the stable release of Apache Cassandra v3.0.15 under "Older Supported Releases" from http://cassandra.apache.org/download/ (http://www.apache.org/dyn/closer.lua/cassandra/3.0.15/apache-cassandra-3.0.15-bin.tar.gz).  Other versions may work, but this is the version that was tested against for Syndeia Cloud v3.2.

(info) Note: Because Windows by default cannot read .tar.gz "tarballs" (the extension denotes a "tape archive" that has been "GNU Zip" compressed), you will need to download & install a utility such as the open-source 7-Zip archiver, see http://www.7-zip.org/ (http://www.7-zip.org/a/7z1604-x64.exe).
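Since Python was installed in step 7, its standard tarfile module is another way to unpack the archive. This is a minimal sketch rather than a recommended procedure; extract_tarball is an illustrative name, and it should only be used on archives from a trusted source because extractall writes whatever paths the archive contains.

```python
import tarfile


def extract_tarball(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return the member names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names
```

A call such as extract_tarball(r"C:\Downloads\apache-cassandra-3.0.15-bin.tar.gz", r"C:\Program Files") would cover the extraction in step 9. The download folder shown is an assumption, and writing to C:\Program Files requires running Python from an Administrator prompt.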

9. Extract apache-cassandra-3.0.15-bin.tar.gz to C:\Program Files
10. In Windows Explorer, navigate to C:\Program Files\apache-cassandra-3.0.15\.
11. Edit C:\Program Files\apache-cassandra-3.0.15\conf\cassandra.yaml & verify the following settings are set:

cluster_name: 'YourClusterName'
num_tokens: 256
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
seed_provider:
 - class_name: org.apache.cassandra.locator.SimpleSeedProvider
   parameters:
        - seeds: "127.0.0.1"
listen_address: localhost
rpc_address: localhost

(info) Note1:  FQDN = Fully Qualified Domain Name, ex: cassandra.mycompany.com 

(info) Note2:  For a quick start, the above settings will get you up and running, however for any production deployment scenarios you may wish to implement other settings to enhance security & performance (see Appendix B2.11 for more details)

(warning) If you frequently deal with large artifact sizes, you may also want to raise batch_size_fail_threshold_in_kb from its default of 50 (KB) to, for example, 100.

12. Open an Administrator Command Prompt (CMD.EXE) in C:\Program Files\apache-cassandra-3.0.15.  (Note: it is highly recommended that you enable QuickEdit Mode and increase the scrollback history and window size to make it easier to grab log messages later if needed.  See Appendix B2.3 on how to do this.)

13. To run the Cassandra service, type .\bin\cassandra.bat LEGACY 

(info) Note: If you wish to ensure the service runs on startup, you can either:
a. place a (number-prefixed) shortcut to the .BAT file in your Startup folder (ex: %AllUsersProfile%\Microsoft\Windows\Start Menu\Programs\StartUp), or
b. if you wish to manage this as a standard Windows NT service, use the Apache Commons Daemon (https://archive.apache.org/dist/commons/daemon/binaries/windows/commons-daemon-1.1.0-bin-windows.zip) to run Cassandra as a service and set it to Auto.  To do this, extract the .ZIP to Cassandra's bin folder with the top-level folder renamed from commons-daemon-1.1.0-bin-windows to daemon.  If you are on 64-bit Windows, you will first need to edit the set PATH_PRUNSRV line in Cassandra's bin\cassandra.bat file to point to the 64-bit binary in %CASSANDRA_HOME%\bin\daemon\amd64\.  Then run .\bin\cassandra.bat LEGACY INSTALL to install it as a service (see http://commons.apache.org/proper/commons-daemon/ for more details).

14. You should see a lot of output similar to the following (abridged) text (for the full text of an example successful startup, see Appendix B2.4):

C:\Program Files\apache-cassandra-3.0.15>.\bin\cassandra.bat LEGACY
WARNING! Powershell script execution unavailable.
   Please use 'powershell Set-ExecutionPolicy Unrestricted'
   on this user-account to run cassandra with fully featured
   functionality on this platform.
Starting with legacy startup options
Starting Cassandra Server
INFO  17:30:52 Configuration location: file:/C:/Program%20Files%20(x86)/apache-cassandra-3.0.15/conf/cassandra.yaml
INFO  17:30:52 Node configuration:[allocate_tokens_for_keyspace=null; authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=true; auto_snapshot=true; batch_size_fail_threshold_in_kb=50; batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; broadcast_address=null; broadcast_rpc_address=null; buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; 
[...]
INFO  17:30:52 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
[...]
INFO  17:30:52 Hostname: SYND-W2012R2
INFO  17:30:52 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.8.0_151
[...]
INFO  17:30:56 Cassandra version: 3.0.15
INFO  17:30:56 Thrift API version: 20.1.0
INFO  17:30:56 CQL supported versions: 3.4.0 (default: 3.4.0)
INFO  17:30:56 Initializing index summary manager with a memory pool size of 101 MB and a resize interval of 60 minutes
INFO  17:30:56 Loading persisted ring state
INFO  17:30:56 Starting up server gossip
INFO  17:30:56 Updating topology for localhost/127.0.0.1
INFO  17:30:56 Updating topology for localhost/127.0.0.1
[...]
INFO  17:30:56 Node localhost/127.0.0.1 state jump to NORMAL
INFO  17:30:56 Netty using Java NIO event loop
[...]
INFO  17:30:56 Starting listening for CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
INFO  17:30:56 Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it

15. Open a new Administrator CMD.EXE prompt window in C:\Program Files\apache-cassandra-3.0.15\bin
16. In the CMD.EXE prompt window, run nodetool status.  You should see output similar to the following:

C:\Program Files\apache-cassandra-3.0.15\bin>nodetool status
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  206.93 KB  256          100.0%            41ab853b-5d48-4c4e-8d59-40e165acadae  rack1


C:\Program Files\apache-cassandra-3.0.15\bin>
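If you want to check this output from a script (for example, while waiting for nodes to come up), the node rows can be picked out by their two-letter status code. The following is a rough sketch that assumes the column layout shown above; the function names are illustrative and this is not an official nodetool interface.

```python
def is_node_row(fields):
    """True if the first field is a status code such as UN or DN."""
    # First letter: U/D (up/down). Second: N/L/J/M (normal/leaving/joining/moving).
    return (len(fields) >= 2 and len(fields[0]) == 2
            and fields[0][0] in "UD" and fields[0][1] in "NLJM")


def parse_nodetool_status(output):
    """Return a list of (status, address) tuples from 'nodetool status' text."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        if is_node_row(fields):
            nodes.append((fields[0], fields[1]))
    return nodes


def nodes_not_up_normal(output):
    """Return the addresses of nodes whose status is anything other than UN."""
    return [addr for status, addr in parse_nodetool_status(output)
            if status != "UN"]
```

Feeding the captured text of nodetool status to nodes_not_up_normal() gives an empty list when every node reports UN, as in the single-node output above.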

17. At this point you may wish to create an archive image to use as a new base image if the node needs to be rebuilt or if you wish to create a cluster (see Appendix B2.5 for more details on how to safely clone a Windows machine).  




Multi-Node (Cluster) or Dedicated Node Setup Instructions

Enabling your single-node deployment for cluster operation or external access

If you followed the steps in the previous section to deploy a single node with Cassandra, you will need to make a few adjustments so that the Cassandra ports are no longer bound to localhost and are accessible externally or from other cluster nodes.

(warning) If you have not already done so, you may wish to secure the cassandra superuser account's password and change it from the default, especially if you will be binding to a public interface on the internet (see Appendix B2.11 on how to do this).  

11. Make a backup of cassandra.yaml & open it for editing.   
12. Change the following settings:  

seed_provider:
 - class_name: org.apache.cassandra.locator.SimpleSeedProvider
   parameters:
        # where w.x.y.z = the IP address of the node,
        - seeds: "w.x.y.z"
[...]

listen_address: # set to IP of node or leave blank to pickup OS provided IP/FQDN
rpc_address:    # set to IP of node or leave blank to pickup OS provided IP/FQDN

(warning) IMPORTANT: Be aware that if you have other Cassandra server(s) with the same cluster_name as this one, this node will attempt to join them when the service is restarted, which may cause issues that will be difficult to troubleshoot later.  If you do not wish for this to occur, you will need to first update cluster_name in the system keyspace via CQLSH (on ALL Cassandra server(s) in the cluster you are renaming), see Appendix B2.13 for details on how to do this.

13. Save cassandra.yaml
14. Restart the Cassandra service, ex: net stop cassandra && net start cassandra or sc \\localhost stop cassandra && sc \\localhost start cassandra (if it doesn't start, review your changes for errors).
15. Update any firewall configurations on the OS and/or externally (ex: AWS or your cloud provider).



Adding new nodes to an existing single-node deployment

16. Deploy another instance of your Cassandra base image and make any appropriate changes to the cloned MAC address, if necessary (ex: in the VM settings and/or udev, if used).
17. Set up forward & reverse DNS records on your DNS server (consult your IT admin/sysadmin if required) and set the hostname and primary DNS suffix on the machine itself.
18. RDP to the IP (or the FQDN of the new node if DNS has already propagated).
19. Follow "Initializing a multiple node cluster (single datacenter)" https://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/initSingleDS.html

(info) Note1: in the provided example cassandra.yaml, rpc_address is shown set to 0.0.0.0, however leaving this blank will let it pick up the address automatically.  

(info) Note2: in steps 3a and 7, to stop Cassandra installed via a tarball on a Windows system, hit ^C (<CTRL> + C) in the CMD.EXE window used to start Cassandra.  To start cassandra, run .\bin\cassandra.bat LEGACY from the Cassandra directory.  If installed as a standard Windows NT service, use the net or sc command or the services.msc MMC applet.  

(warning) IMPORTANT: Pay special attention to steps 3b & 4: the data dir must be empty for a node to join the cluster, and auto_bootstrap: false should only be added in cassandra.yaml on seed nodes.  Per the “Prerequisites” section, normally one would elect a subset of the nodes to be seeds (usually 2-3 per datacenter is sufficient).  However, be aware that there is currently a regression bug in Cassandra v3.6 ~ v3.11.1 that prevents non-seed nodes from starting.  The current workaround is to set all nodes as seeds, ex: seeds: <node1_IP>, <node2_IP>, ... <nodeN_IP> (see https://issues.apache.org/jira/browse/CASSANDRA-13851).

20. Repeat steps 16 ~ 19 for each additional cluster node.
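When every node is listed as a seed, the same seeds line has to appear in cassandra.yaml on all nodes, so generating it once avoids copy errors. A small sketch, assuming the comma-delimited quoted string form used by the default cassandra.yaml; seeds_entry is a hypothetical helper and the IP addresses in the example are the sample ones used elsewhere in this document.

```python
def seeds_entry(node_ips):
    """Join node IPs into one comma-delimited, quoted seeds line."""
    # Drop empty entries and stray whitespace before joining.
    cleaned = [ip.strip() for ip in node_ips if ip.strip()]
    return '- seeds: "%s"' % ",".join(cleaned)
```

For example, seeds_entry(["192.168.1.2", "192.168.1.3", "192.168.1.4"]) produces the line to paste under parameters: in the seed_provider block, keeping the indentation already used in the file.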



Validating Cassandra Operation (or Cluster Replication) for 1-node (or multiple nodes)

21. To validate Cassandra operation (or cluster replication), we create a sample keyspace, tables, insert test data, create a user (role), grant permissions and perform a basic query with a Consistency Level (CL) of ONE to the server (or each node separately if a cluster has been setup).  To do this, connect via CQLSH on the server (or node 1 if testing a cluster) and perform the following steps.  

21.1. In CQLSH, create a new sample keyspace with a Replication Factor (RF) equal to the total number of nodes you have (see Appendix B2.6 for sample CQL code on how to do this for a single server or multi-node cluster).

21.2. In CQLSH create new sample tables (see Appendix B2.7 for sample CQL code on how to do this).

21.3. In CQLSH insert test data into the new tables (see Appendix B2.8 for sample CQL code on how to do this).

21.4. In CQLSH create a new login user (role) with a password and GRANT ALL PERMISSIONS to the keyspace created earlier (see Appendix B2.9 for sample CQL code on how to do this). 

22. Exit CQLSH and run nodetool status until you see Owns show 100.0% for your server (or for all nodes if testing on a cluster).

23. If testing only 1 node, skip ahead and perform only steps 27-28.

(info) Note: For cluster testing, the steps below assume you are testing with a 3-node cluster.

24. Stop the Cassandra service on node 1.  If you run nodetool status on node 2 or 3, you should now see the stopped node shown as down (DN):

$ nodetool status
Datacenter: dc1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID                               Rack
DN  192.168.1.2   487.87 KiB 256          100.0%            377d1403-ca85-45ae-a1ca-60e09a75425b  rack1
UN  192.168.1.3   644.57 KiB 256          100.0%            141c21db-4f79-476b-b818-ee6d2da16d7d  rack1
UN  192.168.1.4   497.5 KiB  256          100.0%            3d4b81b6-5ccd-4b0b-b126-17d5eed3b634  rack1

25. On the test node (ex: 2), ensure the Cassandra service is running, if not start it.
26. On the other node (ex: 3), stop the Cassandra service.
27. On the test node (ex: 2), connect via CQLSH or via DataStax DevCenter on your machine (see Appendix B2.10 on where to obtain this), set the Consistency Level (CL) to ONE (CONSISTENCY ONE;) (see Appendix B2.12 for more details on Consistency Level), and issue a simple SELECT * FROM <keyspace>.<table_name>;
28. Verify that you get 1 row back.
29. Repeat steps 25 ~ 28 but switch the nodes, ie: restart service on node 3, stop service on node 2, and issue query on node 3.
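The reason the query in steps 27-28 still returns a row with two of three nodes stopped is that, with RF equal to the number of nodes, every node holds a replica and CL ONE needs only one replica to answer. The sketch below illustrates that arithmetic for a single datacenter and only the ONE, QUORUM and ALL levels; it is a simplified model for reasoning about the test, not how Cassandra itself is implemented.

```python
def replicas_required(consistency_level, replication_factor):
    """Number of replicas that must respond for the request to succeed."""
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: %s" % consistency_level)


def query_can_succeed(consistency_level, replication_factor, replicas_up):
    """True if enough replicas are up to satisfy the consistency level."""
    needed = replicas_required(consistency_level, replication_factor)
    return replicas_up >= needed
```

With RF = 3 and only the test node running, query_can_succeed("ONE", 3, 1) is True, while the same call with "QUORUM" is False because a quorum of 3 replicas is 2. This is why the test explicitly sets CONSISTENCY ONE before issuing the SELECT.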

