Backup & Restore Methods for Syndeia Cloud Keyspace in Cassandra
Overview
Cassandra has multiple methods to backup and restore data depending on the scenario and the amount of data you wish to backup/restore.
Note, this document will NOT discuss the backup and restore for the application software itself (ie: all the software/dependencies you installed or symlinked into /opt) or the configuration for it. Use whatever conventional tool(s) you are comfortable with to backup/restore that.
This document will go over the simplest & most portable method (using the CQL COPY ... TO
/ COPY ... FROM
CQL commands), but be aware that other methods (ie: sstableloader
with nodetool snapshot
+ schema & token export/import or 3rd party ETL tools) are available too but have their own caveats. Refer to the Apache Cassandra and DataStax documentation (see ICX KB article "How to Backup & Restore Syndeia Cloud", https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/migrating.html & https://web.archive.org/web/20161227010514/http://datascale.io/cloning-cassandra-clusters-fast-way/) for further details.
There are three main types of data that we need to be concerned about when backing up & restoring your database:
- Schema: the schema is a description of what type(s) of data each table contains, think of it like column headers to table(s) of items
- Data: this is actual application table data.
- Metadata: this is supporting meta-data used by the database software, ex: Cassandra node token assignments.
For the method we are discussing here, we only need to be concerned about the first two.
Backup
The following will backup the Syndeia Cloud database schema + keyspace tables automatically via a shell script.
1. Copy and paste the below into a new file named syndeia_cloud-3.4_backup.bash
(set the password for syndeia_admin
on L4 (default is myPw
)):
Note L3-4 are environment variables that can be optionally picked up from your shell if you export them out beforehand, ie: export version=3.4; syndeia_admin=foo
, useful for automation or any (scheduled) cron
job)
#!/bin/env bash cassandra_jg_host=${1:-localhost} version=${version:-3.4} syndeia_admin_pw=${syndeia_admin_pw:-myPw} SC_HOME=${SC_HOME:-/opt/icx/syndeia-cloud-current} keyspaces=(syndeia_cloud_auth syndeia_cloud_devops syndeia_cloud_graph syndeia_cloud_graph_config syndeia_cloud_store) tables=(syndeia_cloud_auth.auto_key / syndeia_cloud_auth.config / syndeia_cloud_auth.messages / syndeia_cloud_auth.metadata / syndeia_cloud_auth.offsetstore / syndeia_cloud_auth.snapshots / syndeia_cloud_auth.users / syndeia_cloud_auth.users_by_external_id / syndeia_cloud_auth.users_by_external_key / syndeia_cloud_auth.users_by_user_name / syndeia_cloud_graph.edgestore / syndeia_cloud_graph.edgestore_lock_ / syndeia_cloud_graph.graphindex / syndeia_cloud_graph.graphindex_lock_ / syndeia_cloud_graph.janusgraph_ids / syndeia_cloud_graph.system_properties / syndeia_cloud_graph.system_properties_lock_ / syndeia_cloud_graph.systemlog / syndeia_cloud_graph.txlog / syndeia_cloud_graph_config.edgestore / syndeia_cloud_graph_config.edgestore_lock_ / syndeia_cloud_graph_config.graphindex / syndeia_cloud_graph_config.graphindex_lock_ / syndeia_cloud_graph_config.janusgraph_ids / syndeia_cloud_graph_config.system_properties / syndeia_cloud_graph_config.system_properties_lock_ / syndeia_cloud_graph_config.systemlog / syndeia_cloud_graph_config.txlog / syndeia_cloud_store.artifact_types / syndeia_cloud_store.artifact_types_by_external_id / syndeia_cloud_store.artifact_types_by_external_key / syndeia_cloud_store.artifact_types_by_repository_key / syndeia_cloud_store.artifacts / syndeia_cloud_store.artifacts_by_container_key / syndeia_cloud_store.artifacts_by_external_id / syndeia_cloud_store.artifacts_by_external_key / syndeia_cloud_store.artifacts_by_key / syndeia_cloud_store.artifacts_by_type_key / syndeia_cloud_store.auto_key / syndeia_cloud_store.config / syndeia_cloud_store.container_types / syndeia_cloud_store.container_types_by_external_id / syndeia_cloud_store.container_types_by_external_key / syndeia_cloud_store.container_types_by_repository_key / syndeia_cloud_store.containers / syndeia_cloud_store.containers_by_external_id / syndeia_cloud_store.containers_by_external_key / syndeia_cloud_store.containers_by_repository_key / syndeia_cloud_store.messages / syndeia_cloud_store.metadata / syndeia_cloud_store.offsetstore / syndeia_cloud_store.relation_types / syndeia_cloud_store.relation_types_by_external_id / syndeia_cloud_store.relation_types_by_external_key / syndeia_cloud_store.relation_types_by_repository_key / syndeia_cloud_store.relations / syndeia_cloud_store.relations_by_container_key / syndeia_cloud_store.relations_by_external_id / syndeia_cloud_store.relations_by_external_key / syndeia_cloud_store.relations_by_key / syndeia_cloud_store.relations_by_source_key / syndeia_cloud_store.relations_by_target_key / syndeia_cloud_store.repositories / syndeia_cloud_store.repositories_by_uri / syndeia_cloud_store.repository_types / syndeia_cloud_store.snapshots) LC_NUMERIC=C # TIMEFORMAT='{"real":%R,"user":%U,"sys":%S}' TIMEFORMAT='{"real":%R}' if [[ ! -d SC_backups ]]; then mkdir SC_backups fi; echo "$(date) - Dump schema for ${keyspaces}..." for keyspace in ${keyspaces[@]} do; dst_filename="SC_backups/${keyspace}_schema_backup.cql" all_keyspace_filenames="${all_keyspace_filenames} ${keyspace}_schema_backup.cql" echo "$(date) - Dump schema for ${keyspace}..." sudo cqlsh -u syndeia_admin -p ${syndeia_admin_pw} -e "DESCRIBE KEYSPACE ${keyspace};" "${cassandra_jg_host}" > ${dst_filename} > /dev/null if [[ $? -ne 0 ]]; then echo "$(date) x Error, could not export schema ${keyspace}" exit 1 else echo "$(date) - Successfully exported schema ${keyspace}" fi; cat done; cat "${all_keyspace_filenames}" > SC_backups/syndeia-cloud-${version}_concatenated_schema_backup.cql > /dev/null echo "$(date) - Export data for tables..." for table in ${tables[@]} do; dst_filename="SC_backups/syndeia-cloud-${version}_backup_${table}.csv" # Generate MD5 hash for NULL values during each table export md5hash=$(date -Iseconds | md5sum | cut -f1 -d' ') echo "${md5hash}" > SC_backups/${table}_md5hash_for_NULL.txt echo "$(date) - Export data for ${table} to ${dst_filename} using md5hash=${md5hash} for NULL..." CQL_to_backup_table="COPY ${table} TO '${dst_filename}' WITH HEADER = TRUE AND NULL='${md5hash}'; " sudo cqlsh -u syndeia_admin -p ${syndeia_admin_pw} -e "${CQL_to_backup_table}" "${cassandra_jg_host}" > /dev/null if [[ $? -ne 0 ]]; then echo "$(date) x Error, could not export table ${table}" exit 2 else echo "$(date) - Successfully exported table ${table}" fi; done;
2. Make the script executable, ie: chmod ug+x syndeia_cloud-3.4_backup.bash
3. To use the script, run the following command to from the CLI: syndeia_cloud-3.4_backup.bash
<node_FQDN>
where <node_FQDN> is the node Fully Qualified Domain Name (ex: cassandra.mycompany.com) or if unspecified, localhost
by default.
For each successfully exported schema and table, you should see output similar to the following (note, the # of exported rows):
Mon May 10 16:57:49 EDT 2021 - Successfully exported schema syndeia_cloud_auth Using 3 child processes Starting copy of syndeia_cloud_auth.users with columns [id, key, email, provider_id, provider_key, activated, avatar_url, created_by, created_date, first_name, last_name, modified_by, modified_date, permissions, roles]. Processed: 11 rows; Rate: 19 rows/s; Avg. rate: 4 rows/s 11 rows exported to 1 files in 2.698 seconds.
Restore
The following will restore the Syndeia Cloud database schema + keyspace tables automatically via a shell script.
1. Copy and paste the below into a new file named syndeia_cloud-3.4_restore.bash
(set the password for syndeia_admin
on L4 (default is myPw
)):
Note L3-4 are environment variables that can be optionally picked up from your shell if you export them out beforehand, ie: export version=3.4; syndeia_admin=foo
, useful for automation)
#!/bin/env bash cassandra_jg_host=${1:-localhost} version=${version:-3.4} syndeia_admin_pw=${syndeia_admin_pw:-myPw} SC_HOME=${SC_HOME:-/opt/icx/syndeia-cloud-current} keyspaces=(syndeia_cloud_auth syndeia_cloud_devops syndeia_cloud_graph syndeia_cloud_graph_config syndeia_cloud_store) tables=(syndeia_cloud_auth.auto_key / syndeia_cloud_auth.config / syndeia_cloud_auth.messages / syndeia_cloud_auth.metadata / syndeia_cloud_auth.offsetstore / syndeia_cloud_auth.snapshots / syndeia_cloud_auth.users / syndeia_cloud_auth.users_by_external_id / syndeia_cloud_auth.users_by_external_key / syndeia_cloud_auth.users_by_user_name / syndeia_cloud_graph.edgestore / syndeia_cloud_graph.edgestore_lock_ / syndeia_cloud_graph.graphindex / syndeia_cloud_graph.graphindex_lock_ / syndeia_cloud_graph.janusgraph_ids / syndeia_cloud_graph.system_properties / syndeia_cloud_graph.system_properties_lock_ / syndeia_cloud_graph.systemlog / syndeia_cloud_graph.txlog / syndeia_cloud_graph_config.edgestore / syndeia_cloud_graph_config.edgestore_lock_ / syndeia_cloud_graph_config.graphindex / syndeia_cloud_graph_config.graphindex_lock_ / syndeia_cloud_graph_config.janusgraph_ids / syndeia_cloud_graph_config.system_properties / syndeia_cloud_graph_config.system_properties_lock_ / syndeia_cloud_graph_config.systemlog / syndeia_cloud_graph_config.txlog / syndeia_cloud_store.artifact_types / syndeia_cloud_store.artifact_types_by_external_id / syndeia_cloud_store.artifact_types_by_external_key / syndeia_cloud_store.artifact_types_by_repository_key / syndeia_cloud_store.artifacts / syndeia_cloud_store.artifacts_by_container_key / syndeia_cloud_store.artifacts_by_external_id / syndeia_cloud_store.artifacts_by_external_key / syndeia_cloud_store.artifacts_by_key / syndeia_cloud_store.artifacts_by_type_key / syndeia_cloud_store.auto_key / syndeia_cloud_store.config / syndeia_cloud_store.container_types / syndeia_cloud_store.container_types_by_external_id / syndeia_cloud_store.container_types_by_external_key / syndeia_cloud_store.container_types_by_repository_key / syndeia_cloud_store.containers / syndeia_cloud_store.containers_by_external_id / syndeia_cloud_store.containers_by_external_key / syndeia_cloud_store.containers_by_repository_key / syndeia_cloud_store.messages / syndeia_cloud_store.metadata / syndeia_cloud_store.offsetstore / syndeia_cloud_store.relation_types / syndeia_cloud_store.relation_types_by_external_id / syndeia_cloud_store.relation_types_by_external_key / syndeia_cloud_store.relation_types_by_repository_key / syndeia_cloud_store.relations / syndeia_cloud_store.relations_by_container_key / syndeia_cloud_store.relations_by_external_id / syndeia_cloud_store.relations_by_external_key / syndeia_cloud_store.relations_by_key / syndeia_cloud_store.relations_by_source_key / syndeia_cloud_store.relations_by_target_key / syndeia_cloud_store.repositories / syndeia_cloud_store.repositories_by_uri / syndeia_cloud_store.repository_types / syndeia_cloud_store.snapshots) LC_NUMERIC=C # TIMEFORMAT='{"real":%R,"user":%U,"sys":%S}' TIMEFORMAT='{"real":%R}' echo "$(date) - Restore schema for ${keyspaces}..." src_filename="SC_backups/syndeia-cloud-${version}_concatenated_schema_backup.cql" sudo cqlsh -u syndeia_admin -p ${syndeia_admin_pw} -e "SOURCE '${src_filename}';" "${cassandra_jg_host}" if [[ $? -ne 0 ]]; then echo "$(date) x Error, could not import concatenated schema ${src_filename}" exit 1 else echo "$(date) - Successfully imported concatenated schema ${src_filename}" fi; echo "$(date) - Import data for tables..." for table in ${tables[@]} do; src_filename="SC_backups/syndeia-cloud-${version}_backup_${table}.csv" # Read MD5 hash for NULL values during each table export md5hash=$(cat SC_backups/${table}_md5hash_for_NULL.txt) echo "$(date) - Import data for ${table} from ${dst_filename}..." CQL_to_restore_table="COPY ${table} FROM '${src_filename}' WITH HEADER = TRUE AND WITH NULL='${md5hash}'; " sudo cqlsh -u syndeia_admin -p ${syndeia_admin_pw} -e "${CQL_to_restore_table}" "${cassandra_jg_host}" if [[ $? -ne 0 ]]; then echo "$(date) x Error, could not import table ${table}" exit 2 else echo "$(date) - Successfully imported table ${table}" fi; done; echo "$(date) - Done!"
2. Make the script executable, ie: chmod ug+x syndeia_cloud-3.4_restore.bash
3. To use the script, run the following command to from the CLI: syndeia_cloud-3.4_restore.bash
<node_FQDN>
where <node_FQDN> is the node Fully Qualified Domain Name (ex: cassandra.mycompany.com) or if unspecified, localhost
by default.
For each successfully imported schema and table, you should see output similar to the following (note, the # of exported rows):
Mon May 10 16:57:49 EDT 2021 - Successfully imported schema syndeia_cloud_auth Using 3 child processes Starting copy of syndeia_cloud_auth.users with columns [id, activated, avatar_url, created_by_user_key, created_timestamp, description, external_id, external_key, first_name, key, last_modified_by_user_key, last_modified_timestamp, last_name, name, other_info, permissions, profiles, roles, user_name]. Processed: 11 rows; Rate: 6 rows/s; Avg. rate: 10 rows/s 11 rows imported from 1 files in 1.048 seconds (0 skipped).
Troubleshooting
Import Errors
Q1: Sometimes I get the following timeout error and nothing gets imported (see below), how do I resolve this?
Failed to import 20 rows: OperationTimedOut - errors={<Host: x.x.x.x dc1>: ConnectionException('Host has been marked down or removed',)}, last_host=y.y.y.y, will retry later, attempt 1 of 5 Failed to import 20 rows: OperationTimedOut - errors={'y.y.y.y': 'Client request timeout. See Session.execute[\_async](timeout)'}, last\_host=y.y.y.y, will retry later, attempt 1 of 5 No records inserted in 90 seconds, aborting Processed: 0 rows; Rate: 0 rows/s; Avg. rate: 0 rows/s 0 rows imported from 1 files in 1 minute and 30.180 seconds (0 skipped).
A1: Rerun the import command or create/set the following ~/.cassandra/cqlshrc
parameters from the account used to run cqlsh and re-run the import command
[connection] request_timeout=6000 # default is 10 seconds, set to None to disable client_timeout=3600 # default is 10 seconds, set to None to disable
Q2: Sometimes I get the following "Pickling" error (see below), how do I resolve this?
PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed
A2: Suffix the following to the end of your COPY commands: WITH MINBATCHSIZE=1 AND MAXBATCHSIZE=1 AND PAGESIZE=10