Apache Kafka Setup & Testing Instructions for Windows (2012-R2 x64)

Single Node Setup Instructions

1. Ensure you have the syndeia-cloud-3.5_cassandra_zookeeper_kafka_setup.zip (or latest service pack) downloaded to your home directory (or home directory's Downloads folder) from the download/license instructions sent out by our team.  

(info)  Note: the .ZIP will pre-create a separate folder for its contents when extracted so there is no need to pre-create a separate folder for it.  

2. Review Kafka's recommendations, ie: (Open|Oracle)JDK/JRE, memory, file system selection, parameters, etc. (see http://kafka.apache.org/documentation/#java through the Monitoring section for more details; keep in mind, however, that most of it was written with cluster deployments in mind).

3. If using a firewall, ensure the following port is accessible (consult your local network admin if required): TCP port 9092 (the port Kafka listens on for client connections).
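Once the broker is running, the firewall rule can be checked from another machine. The following is a minimal sketch, assuming Bash's built-in /dev/tcp redirection and the coreutils timeout command are available in your Cygwin install; the helper name port_open and the host name in the example are illustrative and not part of Kafka or the setup package.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# port_open is a hypothetical helper name used only for this example.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example (my-Kafka-server.domain.tld is a placeholder host name):
#   port_open my-Kafka-server.domain.tld 9092 && echo "9092 reachable" || echo "9092 blocked"
```

A failed check only shows that the connection could not be opened; it does not distinguish a firewall block from a stopped broker.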

4. Ensure you have Apache Commons Daemon installed from https://archive.apache.org/dist/commons/daemon/binaries/windows/commons-daemon-1.2.3-bin-windows.zip

(info) Note: if you followed the note in step 11 of "Download, Install, & Run Apache Cassandra", you should already have this installed.

(warning) To avoid any issues on Windows, it is recommended you do NOT extract it to a path that contains any spaces in any of the directory names.  
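The no-spaces recommendation can be checked before extracting. This is a small sketch only; path_has_no_spaces is a hypothetical helper name and the example path is a placeholder.

```shell
# Return 0 if the given path contains no space characters, 1 otherwise.
# path_has_no_spaces is a hypothetical helper name for this example.
path_has_no_spaces() {
  case "$1" in
    *" "*) return 1 ;;
    *)     return 0 ;;
  esac
}

# Example (placeholder path):
#   path_has_no_spaces 'C:\commons-daemon' && echo "path OK"
```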



Download, Install, Configure & Run Apache Kafka

1. Launch a Cygwin Terminal; it should start in your home directory.

2. Download Kafka 2.13-3.2.1 from http://kafka.apache.org/downloads#3.2.1 (ie: wget https://archive.apache.org/dist/kafka/3.2.1/kafka_2.13-3.2.1.tgz)

3. Use tar to extract the .tgz file to /opt/, ie: KAFKA_build_ver=2.13-3.2.1; tar -xvzf kafka_${KAFKA_build_ver}.tgz -C /opt/ (where ${KAFKA_build_ver} is the version you downloaded, ex: 2.13-3.2.1)

4. Create/update a symlink to the current version & pre-create the PID dir, ie:  winln -fs /opt/kafka_${KAFKA_build_ver} /opt/kafka-current && mkdir -p /var/run/kafka
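Steps 2-4 all derive their file names from the same build string, so they can be tied together. The sketch below assumes the archive.apache.org URL layout shown in step 2; kafka_tarball_url is a hypothetical helper name, not something shipped with Kafka or the setup package.

```shell
# Build the archive.apache.org download URL from a build string such as
# "2.13-3.2.1" (<Scala version>-<Kafka version>).
# kafka_tarball_url is a hypothetical helper name for this example.
kafka_tarball_url() {
  local build_ver="$1"
  local kafka_ver="${build_ver#*-}"   # strip the Scala prefix: 2.13-3.2.1 -> 3.2.1
  printf 'https://archive.apache.org/dist/kafka/%s/kafka_%s.tgz\n' "$kafka_ver" "$build_ver"
}

# Example, run from the home directory in a Cygwin Terminal:
#   KAFKA_build_ver=2.13-3.2.1
#   wget "$(kafka_tarball_url "$KAFKA_build_ver")"
#   tar -xvzf "kafka_${KAFKA_build_ver}.tgz" -C /opt/
#   winln -fs "/opt/kafka_${KAFKA_build_ver}" /opt/kafka-current && mkdir -p /var/run/kafka
```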

5. Edit /opt/kafka_${KAFKA_build_ver}/config/server.properties and change:

  • log.dirs=/tmp/kafka-logs (on ~L60) to log.dirs=C:\\cygwin64\\opt\\kafka-current\\logs
  • In the Log Retention Policy section:

    # The minimum age of a log file to be eligible for deletion due to age
    log.retention.hours=-1
    
    #  Add this property at the end of your properties file.
    log.cleaner.enable=false

    (warning) Note, -1 sets retention to infinity for the Kafka "logs" (database), so monitor storage as you would any other database (see KB article Kafka, Windows, Logs, and KAFKA-1194 for more details on limitations of Kafka on Windows)
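The edits in step 5 can also be applied from the Cygwin Terminal. This is a sketch under stated assumptions: GNU sed (with -i) is available, the file still contains the stock log.dirs and log.retention.hours lines, and the Windows path is written with doubled backslashes, the usual escaping in a Java properties file. The helper name apply_kafka_props is hypothetical.

```shell
# Apply the three server.properties changes from step 5 to the given file.
# apply_kafka_props is a hypothetical helper name; GNU sed (-i) is assumed.
apply_kafka_props() {
  local props="$1"
  cp "$props" "$props.bak"            # keep a backup of the original file

  # Four backslashes in the sed replacement produce two in the file, which a
  # Java .properties parser reads as one literal backslash.
  sed -i 's|^log\.dirs=.*|log.dirs=C:\\\\cygwin64\\\\opt\\\\kafka-current\\\\logs|' "$props"
  sed -i 's|^log\.retention\.hours=.*|log.retention.hours=-1|' "$props"

  # Update log.cleaner.enable if present, otherwise append it at the end.
  if grep -q '^log\.cleaner\.enable=' "$props"; then
    sed -i 's|^log\.cleaner\.enable=.*|log.cleaner.enable=false|' "$props"
  else
    printf '\nlog.cleaner.enable=false\n' >> "$props"
  fi
}

# Example:
#   apply_kafka_props "/opt/kafka_${KAFKA_build_ver}/config/server.properties"
```

Review the result (for example with diff against the .bak copy) before installing the service.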

6. To install the Kafka service, run the following from the /bin/Apache_Commons subdirectory in the Cygwin Terminal:

./Apache_Commons/install_as_Commons_service_windows.bash "kafka" "Apache Kafka" "Kafka + Commons Daemon service wrapper" "c:\\cygwin64\\opt\\kafka-current" "C:\\cygwin64\\opt\\kafka-current\\libs\\*" "c:\\cygwin64\\opt\\kafka-current" "kafka.Kafka" "main" "C:\\cygwin64\\opt\\kafka-current\\config\\server.properties" "java.lang.System" "exit" "0" "1024" "1024" \
                "-Dserver" \
                "-XX:+UseG1GC" \
                "-XX:MaxGCPauseMillis=20" \
                "-XX:InitiatingHeapOccupancyPercent=35" \
                "-XX:+ExplicitGCInvokesConcurrent" \
                "-Djava.awt.headless=true" \
                "-Xloggc:C:\\cygwin64\\opt\\kafka-current\\logs\\kafkaServer-gc.log" \
                "-Dverbose:gc" \
                "-XX:+PrintGCDetails" \
                "-XX:+PrintGCDateStamps" \
                "-XX:+PrintGCTimeStamps" \
                "-XX:+UseGCLogFileRotation" \
                "-XX:NumberOfGCLogFiles=10" \
                "-XX:GCLogFileSize=100M" \
                "-Dcom.sun.management.jmxremote" \
                "-Dcom.sun.management.jmxremote.authenticate=false" \
                "-Dcom.sun.management.jmxremote.ssl=false" \
                "-Dkafka.logs.dir=C:\\cygwin64\\opt\\kafka-current\\logs" \
                "-Dlog4j.configuration=file:C:\\cygwin64\\opt\\kafka-current\\config\\log4j.properties"

To start the service, run sc start kafka (or start it through the NT Service Manager Applet, ie:  services.msc)

7. If the service starts successfully, you are returned to the command prompt.  To check the status of the Kafka service, use either the services.msc Control Panel applet or run sc queryex kafka from an Administrator "Command Prompt" (CMD.EXE) (launch via Start→Run or, if using the GUI, right-click the icon and select "Run as Administrator").  The service has started if "4  RUNNING" shows up in the STATE field of the output:

C:\>sc queryex kafka
 
SERVICE_NAME: kafka
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0
        PID                : 8337
        FLAGS              :
 
C:\>
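The same check can be scripted from the Cygwin Terminal by extracting the STATE field. A minimal sketch, assuming the sc queryex output layout shown above; service_state is a hypothetical helper name.

```shell
# Print the state name (e.g. RUNNING, STOPPED) from "sc queryex" output on stdin.
# service_state is a hypothetical helper name for this example.
service_state() {
  # Windows tools emit CRLF line endings, so drop the carriage returns first.
  tr -d '\r' | awk '$1 == "STATE" { print $4; exit }'
}

# Example, from a Cygwin Terminal:
#   [ "$(sc queryex kafka | service_state)" = "RUNNING" ] && echo "Kafka service is running"
```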

8. To examine the log file, you can look at c:\cygwin64\opt\kafka_2.13-3.2.1\logs\server.log or in Cygwin Terminal use less /opt/kafka_2.13-3.2.1/logs/server.log.  To follow the log, you can use tail -f /opt/kafka_2.13-3.2.1/logs/server.log . 

    You should see output similar to the following (abridged) text:

$ less /opt/kafka_2.13-3.2.1/logs/server.log
[...]
[2019-04-05 20:07:30,673] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2019-04-05 20:07:31,000] INFO starting (kafka.server.KafkaServer)
[2019-04-05 20:07:31,001] INFO Connecting to zookeeper on localhost:2181 (kafka.server.KafkaServer)
[2019-04-05 20:07:31,015] INFO [ZooKeeperClient] Initializing a new session to localhost:2181. (kafka.zookeeper.ZooKeeperClient)
[2019-04-05 20:07:31,032] INFO Client environment:zookeeper.version=3.6.3-6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,032] INFO Client environment:host.name=kafka.domain.tld (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,032] INFO Client environment:java.version=1.8.0_332 (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,032] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,032] INFO Client environment:java.home=/usr/lib/jvm/java-8-oracle/jre (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,032] INFO Client environment:java.class.path=/opt/kafka_2.13-3.2.1/bin/../libs/aopalliance-repackaged-2.5.0-b32.jar:/opt/kafka_2.13-3.2.1/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/commons-lang3-3.5.jar:/opt/kafka_2.13-3.2.1/bin/../libs/connect-api-1.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/connect-file-1.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/connect-json-1.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/connect-runtime-1.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/connect-transforms-1.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/guava-20.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/hk2-api-2.5.0-b32.jar:/opt/kafka_2.13-3.2.1/bin/../libs/hk2-locator-2.5.0-b32.jar:/opt/kafka_2.13-3.2.1/bin/../libs/hk2-utils-2.5.0-b32.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jackson-annotations-2.9.4.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jackson-core-2.9.4.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jackson-databind-2.9.4.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jackson-jaxrs-base-2.9.4.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jackson-jaxrs-json-provider-2.9.4.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jackson-module-jaxb-annotations-2.9.4.jar:/opt/kafka_2.13-3.2.1/bin/../libs/javassist-3.20.0-GA.jar:/opt/kafka_2.13-3.2.1/bin/../libs/javassist-3.21.0-GA.jar:/opt/kafka_2.13-3.2.1/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka_2.13-3.2.1/bin/../libs/javax.inject-1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/javax.inject-2.5.0-b32.jar:/opt/kafka_2.13-3.2.1/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/javax.ws.rs-api-2.0.1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jersey-client-2.25.1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jersey-common-2.25.1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jersey-container-servlet-2.25.1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jersey-container-servlet-core-2.25.1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jersey-guava-2.25.1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jersey-media-jaxb-2.25.1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jersey-server-2.25.1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jetty-client-9.2.24.v20180105.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jetty-continuation-9.2.24.v20180105.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jetty-http-9.2.24.v20180105.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jetty-io-9.2.24.v20180105.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jetty-security-9.2.24.v20180105.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jetty-server-9.2.24.v20180105.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jetty-servlet-9.2.24.v20180105.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jetty-servlets-9.2.24.v20180105.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jetty-util-9.2.24.v20180105.jar:/opt/kafka_2.13-3.2.1/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka_2.13-3.2.1/bin/../libs/kafka_2.13-3.2.1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/kafka_2.13-3.2.1-sources.jar:/opt/kafka_2.13-3.2.1/bin/../libs/kafka_2.13-3.2.1-test-sources.jar:/opt/kafka_2.13-3.2.1/bin/../libs/kafka-clients-1.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/kafka-log4j-appender-1.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/kafka-streams-1.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/kafka-streams-examples-1.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/kafka-streams-test-utils-1.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/kafka-tools-1.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/log4j-1.2.17.jar:/opt/kafka_2.13-3.2.1/bin/../libs/lz4-java-1.4.jar:/opt/kafka_2.13-3.2.1/bin/../libs/maven-artifact-3.5.2.jar:/opt/kafka_2.13-3.2.1/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka_2.13-3.2.1/bin/../libs/reflections-0.9.11.jar:/opt/kafka_2.13-3.2.1/bin/../libs/rocksdbjni-5.7.3.jar:/opt/kafka_2.13-3.2.1/bin/../libs/scala-library-2.11.12.jar:/opt/kafka_2.13-3.2.1/bin/../libs/scala-logging_2.11-3.7.2.jar:/opt/kafka_2.13-3.2.1/bin/../libs/scala-reflect-2.11.12.jar:/opt/kafka_2.13-3.2.1/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka_2.13-3.2.1/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka_2.13-3.2.1/bin/../libs/snappy-java-1.1.7.1.jar:/opt/kafka_2.13-3.2.1/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka_2.13-3.2.1/bin/../libs/zkclient-0.10.jar:/opt/kafka_2.13-3.2.1/bin/../libs/zookeeper-3.5.10.jar (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,032] INFO Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,032] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,033] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,033] INFO Client environment:os.name=Windows Server 2016 (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,033] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,033] INFO Client environment:os.version=10.0 (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,033] INFO Client environment:user.name=kafka (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,033] INFO Client environment:user.home=/home/kafka (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,033] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,034] INFO Initiating client connection, connectString=localhost:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@49ec71f8 (org.apache.zookeeper.ZooKeeper)
[2019-04-05 20:07:31,047] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-04-05 20:07:31,048] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-04-05 20:07:31,053] INFO Socket connection established to localhost/127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-04-05 20:07:31,059] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x169eeac4d530002, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2019-04-05 20:07:31,062] INFO [ZooKeeperClient] Connected. (kafka.zookeeper.ZooKeeperClient)
[2019-04-05 20:07:31,283] INFO Cluster ID = k7If0XwzQS6n-iqqQz6ztr (kafka.server.KafkaServer)
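Rather than reading the whole log, the key lines can be picked out with grep. The sketch below relies only on strings that appear in the sample output above and on the "[timestamp] LEVEL" line layout; kafka_log_summary is a hypothetical helper name.

```shell
# Summarise a Kafka server.log: report whether the ZooKeeper session was
# established and how many ERROR or FATAL lines were logged.
# kafka_log_summary is a hypothetical helper name for this example.
kafka_log_summary() {
  local log="$1"
  if grep -qF '[ZooKeeperClient] Connected.' "$log"; then
    echo "zookeeper: connected"
  else
    echo "zookeeper: NOT connected"
  fi
  # grep -c prints the number of matching lines (0 when there are none).
  echo "errors: $(grep -cE '\] (ERROR|FATAL) ' "$log")"
}

# Example:
#   kafka_log_summary /opt/kafka-current/logs/server.log
```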

9.  Validate correct operation and create/update an archive image to use as a new base image if the node needs to be rebuilt or if you wish to create a cluster. 

(info)  Before making the image you may wish to first stop the service and optionally disable it temporarily to prevent auto-start on boot, ie:  sc stop kafka followed by sc config kafka start= disabled (the value is "disabled", and sc expects a space after "start=").



Multi-Node (Cluster) Setup Instructions (Adding nodes to an existing single-node SC deployment)

11. Deploy another instance of your Kafka base image.

12. Make any appropriate changes for the MAC address (ex: in the VM settings, if used).

13. Set up forward & reverse DNS records on your DNS server (consult your IT admin/sysadmin if required) and set the hostname and primary DNS suffix on the machine itself.

14. RDP to the IP (or the FQDN of the new node if DNS has already propagated) & launch Cygwin Terminal.

15A. On the ORIGINAL Zookeeper (ZK) node:  ensure you have already re-bound the Zookeeper port to an external interface that the NEW Kafka node can reach, and note the FQDN of the ORIGINAL Zookeeper node.

15B. On the NEW Kafka node:  update the Kafka config file (/opt/kafka-current/config/server.properties), ie: change zookeeper.connect (on ~L125) from localhost / 127.0.0.1 to the name of the ORIGINAL Zookeeper node, ex:

zookeeper.connect=my-ZK-server.domain.tld:2181
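The change in step 15B can be made with sed instead of an editor. A minimal sketch, assuming GNU sed and a file that already contains a zookeeper.connect line; set_zookeeper_connect is a hypothetical helper name and my-ZK-server.domain.tld is the placeholder used above.

```shell
# Point zookeeper.connect in a server.properties file at the given host:port.
# set_zookeeper_connect is a hypothetical helper name; GNU sed (-i) is assumed.
set_zookeeper_connect() {
  local props="$1" zk="$2"
  cp "$props" "$props.bak"            # keep a backup of the original file
  # The pattern ends in "=" so zookeeper.connection.timeout.ms is left alone.
  sed -i "s|^zookeeper\.connect=.*|zookeeper.connect=${zk}|" "$props"
  grep '^zookeeper\.connect=' "$props"   # print the resulting line for review
}

# Example:
#   set_zookeeper_connect /opt/kafka-current/config/server.properties my-ZK-server.domain.tld:2181
```

The Kafka service has to be restarted (sc stop kafka, then sc start kafka) before the new setting takes effect.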

(info) Note, it is possible to list additional failover Zookeeper servers as comma-separated entries, but that is beyond the scope of this document; see https://kafka.apache.org/32/documentation.html#brokerconfigs_zookeeper.connect for more details.

(info) Note, if your topology evolution involves moving Kafka COMPLETELY off the original single-node SC deployment, ex: you are moving Kafka to a managed service provider (ex:  Amazon Managed Streaming for Apache Kafka (MSK)), you will want to disable both the local Apache Zookeeper and Kafka services and update kafka_native = "http://localhost:9092" in all SC application.conf files under each service in /opt/syndeia-cloud/current/... on the SC node(s) to point to Kafka on the managed service provider, ex:

shopt -s globstar; for i in /opt/icx/syndeia-cloud-current/**/conf/application.conf; do sudo -u syndeia-cloud sed -i 's#kafka_native = "http://localhost:9092"#kafka_native = "http://my-Kafka-server.domain.tld:9092"#' "$i"; done

15C. On the ORIGINAL Kafka node:  migrate / reassign topic partitions by following https://kafka.apache.org/32/documentation.html#basic_ops_cluster_expansion
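The cluster-expansion procedure linked in step 15C starts from a JSON file that lists the topics to move. The sketch below generates that file; write_topics_to_move is a hypothetical helper name, and the topic names and broker IDs in the example are placeholders that must be replaced with the values from your own cluster.

```shell
# Write the topics-to-move JSON file read by kafka-reassign-partitions.sh
# for the topic names given as arguments.
# write_topics_to_move is a hypothetical helper name for this example.
write_topics_to_move() {
  local out="$1"; shift
  local entries="" t
  for t in "$@"; do
    # Append a comma separator before every entry except the first.
    entries="${entries:+$entries, }{\"topic\": \"$t\"}"
  done
  printf '{"topics": [%s], "version": 1}\n' "$entries" > "$out"
}

# Example (topic names and broker IDs 0,1 are placeholders):
#   write_topics_to_move topics-to-move.json my-topic-a my-topic-b
#   /opt/kafka-current/bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
#       --topics-to-move-json-file topics-to-move.json --broker-list "0,1" --generate
```

The --generate mode only proposes a reassignment; the linked documentation describes how to review and execute it.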

16. Repeat steps 11-15 for each additional cluster node.



Validating Kafka Operation (or Cluster Replication) for 1-node (or multiple nodes)

17. To validate Kafka operation, we create a sample test topic, start a producer, put some messages into the queue, start a consumer script, and validate we see them.  To do this, perform the following steps:  

17.1.  Open two new Cygwin terminal windows

17.2.  Follow steps 3-5 of "Quickstart" https://kafka.apache.org/32/documentation.html#quickstart .  Messages typed into the producer side should show up on the consumer side.

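(info)  Note, you will first need to cd into the Kafka directory to run steps 3-5, ie:  /opt/kafka-current/bin.  The Quickstart commands are written as bin/<script>.sh, so from inside the bin directory drop the leading bin/ (or cd to /opt/kafka-current instead and run them as written).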

17.3.  To quit, press ^C (Ctrl+C) in both the producer and consumer terminals.
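The interactive check in step 17 can also be run unattended. The following is a sketch, not a replacement for the Quickstart: it assumes the Kafka .sh tools run under your Cygwin environment, that the broker listens on localhost:9092, and it uses the Quickstart's topic name quickstart-events. kafka_smoke_test is a hypothetical helper name.

```shell
# Create a topic, publish one unique message, and confirm it can be read back.
# Returns 0 on success. kafka_smoke_test is a hypothetical helper name.
kafka_smoke_test() {
  local bin_dir="${1:-/opt/kafka-current/bin}"
  local broker="${2:-localhost:9092}"
  local topic="${3:-quickstart-events}"
  local msg="smoke-test-$$"           # unique per shell process

  "$bin_dir/kafka-topics.sh" --create --if-not-exists --topic "$topic" \
      --bootstrap-server "$broker" >/dev/null || return 1

  # The console producer reads messages from stdin, one per line.
  printf '%s\n' "$msg" | "$bin_dir/kafka-console-producer.sh" \
      --topic "$topic" --bootstrap-server "$broker" >/dev/null || return 1

  # Read the topic from the start; give up after 10 s without new messages.
  "$bin_dir/kafka-console-consumer.sh" --topic "$topic" --from-beginning \
      --bootstrap-server "$broker" --timeout-ms 10000 2>/dev/null | grep -qxF "$msg"
}

# Example:
#   kafka_smoke_test && echo "Kafka round trip OK" || echo "Kafka round trip FAILED"
```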

