JanusGraph Setup & Testing Instructions for Windows (2012-R2 x64)

Single Node Setup Instructions



Pre-requisites

1.  Ensure you have the syndeia-cloud-3.3.${build_number}_janusgraph_setup.zip downloaded to your home directory (or home directory's Downloads folder) from the download/license instructions sent out by our team.  

(info)  Note: the .ZIP will pre-create a separate folder for its contents when extracted so there is no need to pre-create a separate folder for it.  

2.  Ensure you satisfy JanusGraph's pre-requisites, ie: have (Open|Oracle)JDK/JRE + Apache Cassandra (& optionally Elasticsearch) installed.  CPU, memory, and disk space requirements depend on how much data you expect to store, query, and graph.  

3.  If using a firewall, ensure the following port is accessible (consult your local network admin if required): TCP port 8182 (the port the Gremlin Server listens on for client connections).  

(info) Note: If required by your IT department, perform any other standard configuration, server hardening (ie: enabling & configuring local firewall, etc.)
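
Once JanusGraph is running (later in this guide), the firewall rule can be checked from a client machine.  The following is a minimal sketch, assuming a bash build with /dev/tcp network redirections enabled and the coreutils timeout command; port_open is an illustrative helper name, not part of JanusGraph or the Syndeia Cloud setup package.

```shell
# port_open HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT could be opened, non-zero otherwise.
# Assumption: "bash" is on the PATH and supports /dev/tcp redirections.
port_open() {
  local host=$1 port=$2 wait_s=${3:-3}
  # The child bash opens file descriptor 3 on the socket and closes it on exit.
  timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (hypothetical host name):
#   port_open janusgraph.mycompany.com 8182 && echo "8182 reachable"
```

A non-zero result only says the connection failed; it does not distinguish a blocked port from a stopped service.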

4. Ensure you have Apache Commons Daemon installed from https://archive.apache.org/dist/commons/daemon/binaries/windows/commons-daemon-1.2.3-bin-windows.zip

(info) Note, if you followed the Note from step 10 from Download, Install, & Run Apache Cassandra, you should already have this installed.

(warning) To avoid any issues on Windows, it is recommended you do NOT extract it to a path that contains any spaces in any of the directory names.  



Downloading & Extracting JanusGraph

1. Launch a Cygwin Terminal; it should start in your home dir. 

2. Download JanusGraph 0.3.1 from https://github.com/JanusGraph/janusgraph/releases/tag/v0.3.1 + http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe (this is required by JanusGraph when run on Windows), ie: wget https://github.com/JanusGraph/janusgraph/releases/download/v0.3.1/janusgraph-0.3.1-hadoop2.zip; wget http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe

(info) Note, releases currently seem to be all suffixed with “...-hadoop2...” despite being applicable for all databases, even Cassandra, so download the one suffixed with “...-hadoop2.zip”.

3. Download Syndeia Cloud's JanusGraph setup package from the Downloads page.  

4. Unzip the JanusGraph package into /opt, ie:  JG_build_ver=0.3.1-hadoop2; unzip janusgraph-${JG_build_ver}.zip -d /opt/
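
Steps 2 and 4 can be sketched together as below.  JG_build_ver and the URL pattern are taken from the steps above; jg_download_url is an illustrative function name, and the pattern is assumed to hold only for this release naming scheme.

```shell
# Build version as used in step 4 (release number plus the -hadoop2 suffix).
JG_build_ver=0.3.1-hadoop2

# jg_download_url BUILD_VER
# Prints the GitHub release download URL for the given build version.
jg_download_url() {
  local build_ver=$1
  # Strip the -hadoop2 suffix to get the release tag number, e.g. 0.3.1
  local release=${build_ver%-hadoop2}
  printf 'https://github.com/JanusGraph/janusgraph/releases/download/v%s/janusgraph-%s.zip\n' \
    "$release" "$build_ver"
}

# Usage (run from your home directory in the Cygwin terminal):
#   wget "$(jg_download_url "$JG_build_ver")"
#   unzip "janusgraph-${JG_build_ver}.zip" -d /opt/
#   ls "/opt/janusgraph-${JG_build_ver}/bin/gremlin.bat"   # confirm the extraction
```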

(info) Note, on Windows, HortonWorks' winutils.exe is "required" to avoid an error when gremlin-server.bat or gremlin.bat is run.  To avoid this error, cd to your Janusgraph bin dir and download winutils.exe (you may need to also copy in MSVCR100.DLL from $JAVA_HOME\bin too), ie:  cd /opt/janusgraph-${JG_build_ver}; wget http://public-repo-1.hortonworks.com/hdp-win-alpha/winutils.exe; cp $JAVA_HOME/bin/msvcr100.dll /opt/janusgraph-${JG_build_ver};

(warning) IMPORTANT!  There is currently a bug in JanusGraph's bin\gremlin.bat file that prevents proper CLI script execution mode.  It requires the following patch to be applied to that file.  The patch is shown below in diff format: the first line gives the approximate line numbers in each file, lines prefixed with < are the original lines to remove, and lines prefixed with > are the edited lines to add.

113,119c113,134
< SET strg=
< 
< FOR %%X IN (%*) DO (
< CALL :concat %%X %1 %2
< )
< 
< java %JAVA_OPTIONS% %JAVA_ARGS% -cp %CP% org.apache.tinkerpop.gremlin.groovy.jsr223.ScriptExecutor %strg%
---
> REM 2019-11-21, BKM:  Below changes have been made to avoid the "Error: Could not find or load main class org.apache.tinkerpop.gremlin.console.Console" error message during CLI execution mode
> REM --------------
> REM SET CP=C:\cygwin64\opt\janusgraph-0.3.1-hadoop2\conf;C:\cygwin64\opt\janusgraph-0.3.1-hadoop2\lib\*.jar
> 
> SET JAVA_OPTIONS=-server ^
> -Duser.working_dir=c:\Users\Administrator ^
> -Dtinkerpop.ext=c:\cygwin64\opt\janusgraph-0.3.1-hadoop2\ext ^
> -Dlog4j.configuration=conf\log4j-console.properties ^
> -Dgremlin.log4j.level=WARN ^
> -javaagent:C:\cygwin64\opt\janusgraph-0.3.1-hadoop2\lib\jamm-0.3.0.jar ^
> -Dgremlin.io.kryoShimService=org.janusgraph.hadoop.serialize.JanusGraphKryoShimService 
> 
> REM SET strg=
> SET strg=%1 %2
> REM FOR %%X IN (%*) DO (
> REM CALL :concat %%X %1 %2
> REM )
> 
> REM java %JAVA_OPTIONS% %JAVA_ARGS% -cp %CP% org.apache.tinkerpop.gremlin.groovy.jsr223.ScriptExecutor %strg%
> echo %JAVA_OPTIONS% %JAVA_ARGS% -cp %CP% org.apache.tinkerpop.gremlin.console.Console %strg%
> java %JAVA_OPTIONS% %JAVA_ARGS% -cp %CP% org.apache.tinkerpop.gremlin.console.Console %strg%
> REM --------------
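
Before and after editing, it may help to back up gremlin.bat and check whether the patch is already in place.  The sketch below keys off the Console launch line that the patch adds; gremlin_bat_patched is an illustrative helper, and it assumes that line does not otherwise occur in an unpatched file.

```shell
# gremlin_bat_patched FILE
# Returns 0 if FILE contains the "console.Console %strg%" launch line added by
# the patch above, non-zero otherwise.
gremlin_bat_patched() {
  grep -qF 'org.apache.tinkerpop.gremlin.console.Console %strg%' "$1"
}

# Usage:
#   cd /opt/janusgraph-0.3.1-hadoop2/bin
#   cp -p gremlin.bat gremlin.bat.orig     # keep a backup before editing
#   gremlin_bat_patched gremlin.bat && echo "patch already applied"
```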

5. Unzip Syndeia Cloud's JanusGraph setup packages into /opt/icx, ie: unzip syndeia-cloud-3.3_janusgraph_setup.zip -d /opt/icx/



JanusGraph Keyspace Setup, Service Setup & Start:

6. cd into the bin dir and run the Syndeia Cloud JanusGraph setup script, ie:  cd /opt/icx/syndeia-cloud-3.3_janusgraph_setup/bin; ./syndeia-cloud-3.3_janusgraph_setup_windows.bash.  This will:  

- Create/update janusgraph-current symlink to specified version, default = 0.3.1
- Create a new syndeia_admin superuser in Cassandra with a password you specify
- Create new syndeia_cloud_graph and syndeia_cloud_graph_config keyspaces
- Grant syndeia_admin all permissions on both keyspaces:
GRANT ALL PERMISSIONS ON KEYSPACE syndeia_cloud_graph TO syndeia_admin
GRANT ALL PERMISSIONS ON KEYSPACE syndeia_cloud_graph_config TO syndeia_admin
- Install & run a Groovy JanusGraph setup script to set storage params for your graph and build indexes
- Create a renamed copy of the file /opt/janusgraph-<release_ver>/conf/janusgraph-cql-configurationgraph.properties as janusgraph-cql-configurationgraph-syndeia.properties with: 

graph.graphname=syndeia_cloud_graph_config
Add storage.username=syndeia_admin
Add storage.password=<password_specified>
storage.hostname=<your_Cassandra_host>

- If you installed Elasticsearch on the same machine, add index.search.backend=elasticsearch and index.search.hostname=localhost to use Elasticsearch for search indexing
- Create a renamed copy of the file /opt/janusgraph-<release_ver>/conf/gremlin-server/gremlin-server-configuration.yaml as gremlin-server-configuration-syndeia.yaml and set ConfigurationManagementGraph to point to janusgraph-cql-configurationgraph-syndeia.properties.  
- Install service file for JanusGraph service
- Start JanusGraph service
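
After the script finishes, the generated properties file can be sanity-checked.  This is a minimal sketch that only looks for the four keys listed above; props_missing_keys is an illustrative helper and not part of the setup package.

```shell
# props_missing_keys FILE
# Prints each expected key that has no "key=" line in FILE.
# Returns 0 when all keys are present, 1 otherwise.
props_missing_keys() {
  local file=$1 key missing=0
  for key in graph.graphname storage.username storage.password storage.hostname; do
    if ! grep -q "^${key}=" "$file"; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   props_missing_keys /opt/janusgraph-current/conf/janusgraph-cql-configurationgraph-syndeia.properties
```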

(info) Note: you may be prompted for sudo authentication.  You will also be prompted to set your syndeia_admin password and to enter the FQDN of your Cassandra host.  

(warning)  Avoid any of the following special characters: \?*[]+#&.{}$ when setting your syndeia_admin password.
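
A candidate password can be screened against the character list above before running the setup script.  The sketch below uses a grep bracket expression that lists those characters literally; password_is_safe is an illustrative name, and the check covers only the characters named in the warning.

```shell
# password_is_safe PASSWORD
# Returns 1 if PASSWORD contains any of: \ ? * [ ] + # & . { } $
# Returns 0 otherwise.
password_is_safe() {
  # Inside the bracket expression every character is literal; "]" comes first
  # so that it is treated as a member of the set rather than the closing bracket.
  if printf '%s' "$1" | grep -q '[]\?*[+#&.{}$]'; then
    return 1
  fi
  return 0
}

# Usage:
#   password_is_safe 'MyCandidatePassword1' || echo "contains a forbidden character"
```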



Managing JanusGraph:

7. To check the status of the JanusGraph service, use either the services.msc Control Panel applet or run sc "\\localhost" queryex janusgraph from an Administrator "Command Prompt" (CMD.EXE; launch it via Start→Run or, if using the GUI, right-click its icon and select "Run as Administrator").  The service has started if "4 RUNNING" shows up in the STATE field of the output:

 

C:\>sc "\\localhost" queryex janusgraph

SERVICE_NAME: janusgraph
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, NOT_PAUSABLE, ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0
        PID                : 8336
        FLAGS              :

C:\>
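
For scripting from the Cygwin terminal, the STATE field can be extracted from the sc output.  This is a minimal sketch based on the output layout shown above; service_state is an illustrative helper, and it assumes sc.exe is on the Cygwin PATH.

```shell
# service_state
# Reads "sc queryex" output on stdin and prints the state name from the first
# STATE line (e.g. RUNNING, STOPPED, START_PENDING).
service_state() {
  # sc.exe emits CRLF line endings, so strip carriage returns first.
  tr -d '\r' | awk '$1 == "STATE" && !seen { print $4; seen = 1 }'
}

# Usage (Cygwin terminal):
#   sc '\\localhost' queryex janusgraph | service_state
```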

8. To stop/start the Janusgraph services, use either the services.msc Control Panel applet or run sc "\\localhost" <action> janusgraph; where <action> = one of start|stop from the Command Prompt (CMD.EXE).  The service will be issued a start|stop command (query the status a few seconds afterwards to confirm successful start), ex:  

C:\>sc "\\localhost" start janusgraph

SERVICE_NAME: janusgraph
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 2  START_PENDING
                                (NOT_STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x7d0
        PID                : 8336
        FLAGS              :
C:\>

(info)  Note: If you wish to ensure the services run on startup, use either the services.msc Control Panel applet or run sc "\\localhost" config janusgraph start= auto depend= cassandra (note that sc requires the space after each "=").  For alternative ways to set up non-native Windows services (to auto-start), see Setting up Services to Start on Boot.  
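
Since the service first reports START_PENDING, the follow-up status query can be automated with a small polling loop.  This is a sketch only: wait_until and janusgraph_running are illustrative helpers, and the retry count and delay in the usage line are arbitrary.

```shell
# wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or MAX_TRIES attempts have been made.
# Returns 0 on success, 1 if every attempt failed.
wait_until() {
  local tries=$1 delay=$2 n=1
  shift 2
  while ! "$@"; do
    if [ "$n" -ge "$tries" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# janusgraph_running: true once sc reports the RUNNING state.
# Assumption: sc.exe is on the Cygwin PATH.
janusgraph_running() {
  sc '\\localhost' queryex janusgraph | tr -d '\r' | grep -q 'STATE.*RUNNING'
}

# Usage:
#   sc '\\localhost' start janusgraph
#   wait_until 10 3 janusgraph_running && echo "JanusGraph is running"
```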

9. To view the logs for JanusGraph, use less /opt/janusgraph-0.3.1-hadoop2/log/gremlin-server.log.  To follow the log files, you can use tail -f /opt/janusgraph-0.3.1-hadoop2/log/gremlin-server.log.  You should see output similar to the following (abridged) text:

(info) Note, for your convenience you may wish to create a symlink to /opt/janusgraph-0.3.1-hadoop2/log/ from /var/log, ie: ln -nfs /opt/janusgraph-0.3.1-hadoop2/log/ /var/log/janusgraph

$ less /var/log/janusgraph/gremlin-server.log
[...]
2987 [main] INFO  com.datastax.driver.core.NettyUtil  - Found Netty's native epoll transport in the classpath, using it
4827 [main] INFO  com.datastax.driver.core.policies.DCAwareRoundRobinPolicy  - Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
4835 [main] INFO  com.datastax.driver.core.Cluster  - New Cassandra host janusgraph.domain.com/localhost:9042 added
9043 [main] INFO  org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration  - Set default timestamp provider MICRO
11325 [main] INFO  org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration  - Generated unique-instance-id=2d3877742385-janusgraph-domain-com1
11341 [main] INFO  com.datastax.driver.core.ClockFactory  - Using java.lang.System clock to generate timestamps.
11902 [main] INFO  com.datastax.driver.core.policies.DCAwareRoundRobinPolicy  - Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
11902 [main] INFO  com.datastax.driver.core.Cluster  - New Cassandra host janusgraph.domain.com/localhost:9042 added
12251 [main] INFO  org.janusgraph.diskstorage.Backend  - Initiated backend operations thread pool of size 8
27969 [main] INFO  org.janusgraph.diskstorage.Backend  - Configuring total store cache size: 208138508
33098 [main] INFO  org.janusgraph.diskstorage.log.kcvs.KCVSLog  - Loaded unidentified ReadMarker start time 2019-03-18T01:55:02.918Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@28d6290
35668 [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - Initialized Gremlin thread pool.  Threads in pool named with pattern gremlin-*
36019 [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - Initialized GremlinExecutor and preparing GremlinScriptEngines instances.
40510 [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - Initialized gremlin-groovy GremlinScriptEngine and registered metrics
40532 [main] INFO  org.apache.tinkerpop.gremlin.server.op.OpLoader  - Adding the standard OpProcessor.
40540 [main] INFO  org.apache.tinkerpop.gremlin.server.op.OpLoader  - Adding the session OpProcessor.
41030 [main] INFO  org.apache.tinkerpop.gremlin.server.op.OpLoader  - Adding the traversal OpProcessor.
41076 [main] INFO  org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor  - Initialized cache for TraversalOpProcessor with size 1000 and expiration time of 600000 ms
41097 [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - idleConnectionTimeout was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
41098 [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - keepAliveInterval was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
41247 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v3.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
41248 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v3.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
41298 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v3.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
41299 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
41306 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v1.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
41311 [main] WARN  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - The org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0 serialization class is deprecated.
41314 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v1.0+gryo-lite with org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0
41315 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v1.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
41325 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v2.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0
41347 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v1.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0
41350 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - application/json already has org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0 configured - it will not be replaced by org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0.
41437 [gremlin-server-boss-1] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Gremlin Server configured with worker thread pool of 1, gremlin pool of 4 and boss thread pool of 1.
41438 [gremlin-server-boss-1] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Channel started at port 8182.
181192 [metrics-logger-reporter-thread-1] INFO  org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics  - type=GAUGE, name=org.apache.tinkerpop.gremlin.server.GremlinServer.gremlin-groovy.sessionless.class-cache.average-load-penalty, value=2.450843818E9
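
Rather than reading the whole log, the two things that usually matter can be checked directly: whether the server reached the "Channel started at port" line, and whether any WARN or ERROR entries were logged.  The helper names below are illustrative, and the patterns assume the log format shown above.

```shell
# gremlin_server_ready LOGFILE
# Returns 0 if LOGFILE contains the "Channel started at port" line.
gremlin_server_ready() {
  grep -qF 'Channel started at port' "$1"
}

# log_problems LOGFILE
# Prints WARN and ERROR entries (lines with "] WARN " or "] ERROR " after the
# thread name); prints nothing if there are none.
log_problems() {
  grep -E '\] (WARN|ERROR) ' "$1"
}

# Usage:
#   gremlin_server_ready /opt/janusgraph-0.3.1-hadoop2/log/gremlin-server.log && echo "server is up"
#   log_problems /opt/janusgraph-0.3.1-hadoop2/log/gremlin-server.log
```

Not every WARN entry indicates a fault; the deprecated-serializer warning in the sample output above, for instance, appears in a normal startup.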

10.  Validate correct operation and create/update an archive image to use as a new base image if the node needs to be rebuilt or if you wish to create a cluster. 

(info)  Before making the image you may wish to first stop and optionally disable the service temporarily to prevent auto-start on boot, ie:  sc "\\localhost" config janusgraph start= disabled (note that sc requires the space after "=").  



Multi-Node (Cluster) Setup Instructions

Enabling your single-node deployment for cluster operation

11. Ensure you first enable Cassandra for cluster operation.   

12. Update Janusgraph's syndeia_cloud_graph configuration by doing the following: 

12A.  Launch a Cygwin terminal.

12B.  Using an editor of your choice, save the below Groovy code to a file (ex: /opt/janusgraph-current/conf/syndeia-cloud-3.3_janusgraph_multi-node_init.groovy). 

(lightbulb) ex. via vim:  vim /opt/janusgraph-current/conf/syndeia-cloud-3.3_janusgraph_multi-node_init.groovy; in vim type :set paste (for paste mode), hit i (for insert mode), copy and paste in the below code, then hit Esc and type :wq (write and quit)

:remote connect tinkerpop.server conf/remote.yaml session
:remote console
map = new HashMap();
map.put('storage.backend', 'cql'); // should return: ==>null
map.put('storage.hostname', 'cassandra.mydomain.com');
ConfiguredGraphFactory.updateConfiguration('syndeia_cloud_graph', map);
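
As an alternative to pasting into an editor, the same file can be written from the Cygwin terminal with a here-document.  This sketch reproduces the Groovy lines above and substitutes your Cassandra host name; write_multinode_init is an illustrative helper name.

```shell
# write_multinode_init FILE CASSANDRA_HOST
# Writes the Groovy init script shown above to FILE, using CASSANDRA_HOST as
# the storage.hostname value.
write_multinode_init() {
  local file=$1 host=$2
  cat > "$file" <<EOF
:remote connect tinkerpop.server conf/remote.yaml session
:remote console
map = new HashMap();
map.put('storage.backend', 'cql');
map.put('storage.hostname', '${host}');
ConfiguredGraphFactory.updateConfiguration('syndeia_cloud_graph', map);
EOF
}

# Usage:
#   write_multinode_init /opt/janusgraph-current/conf/syndeia-cloud-3.3_janusgraph_multi-node_init.groovy cassandra.mydomain.com
```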

12C.  (warning) IMPORTANT:  Ensure the JAVA_HOME environment variable is properly set to point to where Java is installed via an 8.3 Windows path, ex:  C:\Progra~1\Java\jre1.8.0_151 (note, your exact Java version may differ).  If not, set it via export, single-quoting the value so that bash does not strip the backslashes, ex: export JAVA_HOME='C:\Progra~1\Java\jre1.8.0_151'

Administrator@MY-JG-SERVER ~
$ export JAVA_HOME='C:\Progra~1\Java\jre1.8.0_151'

Administrator@MY-JG-SERVER ~
$ echo $JAVA_HOME
C:\Progra~1\Java\jre1.8.0_151
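
A quick guard against the most common mistake (an unset value or a long path containing spaces) could look like the sketch below.  java_home_ok is an illustrative helper; it checks only for emptiness and spaces, not that Java actually exists at that path.

```shell
# java_home_ok PATH
# Returns 0 if PATH is non-empty and contains no spaces, 1 otherwise.
java_home_ok() {
  case $1 in
    '' | *' '*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage:
#   java_home_ok "$JAVA_HOME" || echo "JAVA_HOME is unset or contains spaces"
#
# On Cygwin, cygpath -d prints the short (8.3) form of an existing path, ex:
#   cygpath -d 'C:\Program Files\Java\jre1.8.0_151'
```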

12D.  cd to JanusGraph's bin directory (ie: cd /opt/janusgraph-current/bin).

12E.  Run gremlin.bat with the Groovy script created earlier, ex:  ./gremlin.bat -e .\\\\conf\\\\syndeia-cloud-3.3_janusgraph_multi-node_init.groovy.  You should see something like the following:    

Administrator@MY-JG-SERVER /opt/janusgraph-current/bin
$ ./gremlin.bat -e .\\\\conf\\\\syndeia-cloud-3.3_janusgraph_multi-node_init.groovy

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/janusgraph-0.3.1-hadoop2/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/janusgraph-0.3.1-hadoop2/lib/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14:02:47 WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

(info) Java apps tend to not reset the TTY terminal correctly in Cygwin's terminal, so if after running the above you do not see any of your keystrokes being output, ie:  your typing is "invisible", type "stty sane" <Enter>  (you may need to hit ^C and then Enter a few times to ensure you're at a fresh prompt).


Adding new nodes to an existing single-node

13. Deploy another instance of your Janusgraph base image.

14. Make any appropriate changes for the MAC address (ex: in the VM settings and/or udev, if used).

15. Set up forward & reverse DNS records on your DNS server (consult your IT admin/sysadmin if required) and set the hostname and primary DNS suffix on the machine itself.  On Windows this is done via System Properties → Computer Name → Change (the primary DNS suffix is under More...); on a Linux node the equivalent is sudo hostnamectl set-hostname <new_JanusGraph_node_FQDN>, where FQDN = Fully Qualified Domain Name, ex: janusgraph2.mycompany.com

16. SSH to the IP (or the FQDN of the new node if DNS has already propagated).

17. Verify the Cassandra “datacenter” name (default = dc1) from the CLI by running nodetool status and then from Cassandra CQLSH increment the Replication Factor (RF) for your JanusGraph keyspace(s):

ALTER KEYSPACE syndeia_cloud_graph WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', '<datacenter_name>' : <total_number_of_nodes> };
-- where <datacenter_name> = the name of the datacenter as shown via nodetool status, and <total_number_of_nodes> = total # of nodes (in the cluster)
ALTER KEYSPACE syndeia_cloud_graph_config WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', '<datacenter_name>' : <total_number_of_nodes> };
-- where <datacenter_name> = the name of the datacenter as shown via nodetool status, and <total_number_of_nodes> = total # of nodes (in the cluster)
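
To avoid hand-editing the placeholders, the two statements can be generated from the datacenter name and node count.  This is a minimal sketch; rf_cql is an illustrative helper, and the datacenter name and node count in the usage lines are example values only.

```shell
# rf_cql DATACENTER NODE_COUNT
# Prints the two ALTER KEYSPACE statements above with the placeholders filled in.
rf_cql() {
  local dc=$1 nodes=$2 ks
  for ks in syndeia_cloud_graph syndeia_cloud_graph_config; do
    printf "ALTER KEYSPACE %s WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', '%s' : %s };\n" \
      "$ks" "$dc" "$nodes"
  done
}

# Usage (example values; take the datacenter name from nodetool status):
#   rf_cql datacenter1 3 > rf.cql
#   cqlsh -u syndeia_admin -f rf.cql <your_Cassandra_host>
```

Cassandra's documentation generally recommends running a repair (nodetool repair) on the affected keyspaces after raising the replication factor, so that existing data is copied to the new replicas.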

18. Follow  "Things to Consider in a Multi-Node JanusGraph Cluster" https://docs.janusgraph.org/0.3.1/things-to-consider-in-a-multi-node-janusgraph-cluster.html

19. Repeat steps 13 ~ 18 for each additional cluster node.



Validating JanusGraph Operation for 1-node (or multiple nodes)

20. To validate JanusGraph operation, we start a Gremlin client and issue commands to check the edges and vertices in our graph.  To do this, perform the following steps:  

20.1.  Open a new terminal window

20.2.  Go to the bin folder in JanusGraph installation and type the following command:  ./gremlin.bat.  You should see output similar to the below:    

$ ./gremlin.bat 

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/janusgraph-0.3.1-hadoop2/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/janusgraph-0.3.1-hadoop2/lib/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
plugin activated: janusgraph.imports
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
20:19:16 WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.tinkergraph
gremlin> 

20.3.  Type the following commands, substituting your appropriate values as required:  

:remote connect tinkerpop.server conf/remote.yaml session
:remote console
graph = ConfiguredGraphFactory.open('syndeia_cloud_graph');
// should return: ==>standardjanusgraph[cql:[cassandra.mydomain.com]]
g = graph.traversal();
g.V();
g.E();

(info) The last 2 commands above should not return any results since the graph (syndeia_cloud_graph) is empty - no vertices or edges.

20.4.  Type :quit   

