Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Port enhancements from SC 3.3 doc

...

Cassandra has multiple methods to backup and restore data depending on the scenario and the amount of data you wish to backup/restore.

(info) Note, this document will not discuss the backup and restore for the application software itself (ie: all the software/dependencies you installed or symlinked into /opt) or the configuration for it.  Use whatever conventional tool(s) you are comfortable with to backup/restore that.  

This document will go over the simplest & most portable methodi method (using the CQL COPY ... TO / COPY ... FROM CQL commands), but be aware that other methods (ie: sstableloader with nodetool snapshot + schema & token export/import or 3rd party ETL tools) may be necessary if your situation is more complexare available too but have their own caveats. Refer to the Apache Cassandra and DataStax documentation (https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/migrating.html & https://web.archive.org/web/20161227010514/http://datascale.io/cloning-cassandra-clusters-fast-way/) for further details.

There are three main types of data that we need to be concerned about when backing up & restoring your database:

  • Schema: the schema is a description of what type(s) of data each table contains, think of it like column headers to table(s) of items
  • Data: this is actual application table data.
  • Metadata: this is supporting meta-data used by the database software, ex: Cassandra node token assignments.

For the method we are discussing here, we only need to be concerned about the first two.  

...

Backup

Schema

1. To The following will backup the Syndeia Cloud database schema , run the following command to from the CLI: cqlsh -u <superuser> -e 'DESCRIBE KEYSPACE syndeia' <node_FQDN> > syndeia-cloud_v<version_#>_backup_schema.cql

    where <superuser> is a Cassandra superuser account, <node_FQDN> is the node Fully Qualified Domain Name (ex: cassandra.mycompany.com), and <version_#> is the version of Syndeia Cloud. You will be prompted for your Cassandra password.

Data

2. To backup the Syndeia Cloud database data, run the below to export data from each respective syndeia keyspace table to the specified .CSV file, where <version_#> is the version of Syndeia Cloud.

Code Block
languagesql
themeRDark
    COPY syndeia.artifact_types TO 'syndeia-cloud_v<version#>_backup_syndeia.artifact_types.csv' WITH HEADER = TRUE;
    COPY syndeia.artifacts TO 'syndeia-cloud_v<version#>_backup_syndeia.artifacts.csv' WITH HEADER = TRUE;
    COPY syndeia.auth_tokens TO 'syndeia-cloud_v2019-01-02_syndeia.auth_tokens.csv' WITH HEADER = TRUE;
    COPY syndeia.auto_key TO 'syndeia-cloud_v<version#>_backup_syndeia.auto_key.csv' WITH HEADER = TRUE;
    COPY syndeia.container_versions TO 'syndeia-cloud_v<version#>_backup_syndeia.container_versions.csv' WITH HEADER = TRUE;
    COPY syndeia.containers TO 'syndeia-cloud_v<version#>_backup_syndeia.containers.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_artifacts TO 'syndeia-cloud_v<version#>_backup_syndeia.deleted_artifacts.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_containers TO 'syndeia-cloud_v<version#>_backup_syndeia.deleted_containers.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_relations TO 'syndeia-cloud_v<version#>_backup_syndeia.deleted_relations.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_relations TO 'syndeia-cloud_v<version#>_backup_syndeia.deleted_relations.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_repositories TO 'syndeia-cloud_v<version#>_backup_syndeia.deleted_repositories.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_users TO 'syndeia-cloud_v<version#>_backup_syndeia.deleted_users.csv' WITH HEADER = TRUE;
    COPY syndeia.passwords TO 'syndeia-cloud_v<version#>_backup_syndeia.passwords.csv' WITH HEADER = TRUE;
    COPY syndeia.relation_types TO 'syndeia-cloud_v<version#>_backup_syndeia.relation_types.csv' WITH HEADER = TRUE;
    COPY syndeia.relations TO 'syndeia-cloud_v<version#>_backup_syndeia.relations.csv' WITH HEADER = TRUE;
    COPY syndeia.repositories TO 'syndeia-cloud_v<version#>_backup_syndeia.repositories.csv' WITH HEADER = TRUE;
    COPY syndeia.user_container_access TO 'syndeia-cloud_v<version#>_backup_syndeia.user_container_access.csv' WITH HEADER = TRUE;
    COPY syndeia.users TO 'syndeia-cloud_v<version#>_backup_syndeia.users.csv' WITH HEADER = TRUE;
    

For each successfully exported table, you should see output similar to the following (note, the # of exported rows):

Code Block
languagesql
themeRDark
    cassandra@cqlsh> COPY syndeia.users TO 'syndeia-cloud_v2018-02-02_backup_syndeia.users.csv' WITH HEADER = TRUE;
    Using 3 child processes

    Starting copy of syndeia.users with columns [id, key, email, provider_id, provider_key, activated, avatar_url, created_by, created_date, first_name, last_name, modified_by, modified_date, permissions, roles].
    Processed: 11 rows; Rate:      19 rows/s; Avg. rate:       4 rows/s
    11 rows exported to 1 files in 2.698 seconds.
    cassandra@cqlsh> 
    

Restore

Schema

3. To restore the Syndeia Cloud database schema, run the following command from the CLI: cqlsh -u <superuser> -f syndeia-cloud_v<version_#>_backup_schema.cql <node_FQDN>

    where <superuser> is a Cassandra superuser account, <node_FQDN> is the node Fully Qualified Domain Name (ex: cassandra.mycompany.com), and <version_#> is the version of Syndeia Cloud. You will be prompted for your Cassandra password.

Data

4. To restore the Syndeia Cloud database data, run the below to import data into each respective syndeia keyspace table from the specified .CSV file, where <version_#> is the version of Syndeia Cloud.

Code Block
languagesql
themeRDark
    COPY syndeia.artifact_types FROM 'syndeia-cloud_v<version#>_backup_syndeia.artifact_types.csv' WITH HEADER = TRUE;
    COPY syndeia.artifacts FROM 'syndeia-cloud_v<version#>_backup_syndeia.artifacts.csv' WITH HEADER = TRUE;
    COPY syndeia.auth_tokens FROM 'syndeia-cloud_v2019-01-02_syndeia.auth_tokens.csv' WITH HEADER = TRUE;
    COPY syndeia.auto_key FROM 'syndeia-cloud_v<version#>_backup_syndeia.auto_key.csv' WITH HEADER = TRUE;
    COPY syndeia.container_versions FROM 'syndeia-cloud_v<version#>_backup_syndeia.container_versions.csv' WITH HEADER = TRUE;
    COPY syndeia.containers FROM 'syndeia-cloud_v<version#>_backup_syndeia.containers.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_artifacts FROM 'syndeia-cloud_v<version#>_backup_syndeia.deleted_artifacts.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_containers FROM 'syndeia-cloud_v<version#>_backup_syndeia.deleted_containers.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_relations FROM 'syndeia-cloud_v<version#>_backup_syndeia.deleted_relations.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_relations FROM 'syndeia-cloud_v<version#>_backup_syndeia.deleted_relations.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_repositories FROM 'syndeia-cloud_v<version#>_backup_syndeia.deleted_repositories.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_users FROM 'syndeia-cloud_v<version#>_backup_syndeia.deleted_users.csv' WITH HEADER = TRUE;
    COPY syndeia.passwords FROM 'syndeia-cloud_v<version#>_backup_syndeia.passwords.csv' WITH HEADER = TRUE;
    COPY syndeia.relation_types FROM 'syndeia-cloud_v<version#>_backup_syndeia.relation_types.csv' WITH HEADER = TRUE;
    COPY syndeia.relations FROM 'syndeia-cloud_v<version#>_backup_syndeia.relations.csv' WITH HEADER = TRUE;
    COPY syndeia.repositories FROM 'syndeia-cloud_v<version#>_backup_syndeia.repositories.csv' WITH HEADER = TRUE;
    COPY syndeia.user_container_access FROM 'syndeia-cloud_v<version#>_backup_syndeia.user_container_access.csv' WITH HEADER = TRUE;
    COPY syndeia.users FROM 'syndeia-cloud_v<version#>_backup_syndeia.users.csv' WITH HEADER = TRUE;
    

For each successfully imported + keyspace tables automatically via a shell script.

1. Copy and paste the below into a new file named syndeia_cloud-3.3_backup.bash (set the password for syndeia_admin on L4 (default is myPw)):  

(info) Note L3-4 are environment variables that can be optionally picked up from your shell if you export them out beforehand, ie:  export version=3.4; syndeia_admin=foo , useful for automation or any (scheduled) cron job)

Code Block
languagebash
themeRDark
linenumberstrue
#!/bin/env bash
cassandra_jg_host=${1:-localhost}
version=${version:-3.4}
syndeia_admin_pw=${syndeia_admin_pw:-myPw}
SC_HOME=${SC_HOME:-/opt/icx/syndeia-cloud-current}
keyspaces=(syndeia_cloud_auth syndeia_cloud_devops syndeia_cloud_graph syndeia_cloud_graph_config syndeia_cloud_store)
tables=(syndeia_cloud_store.snapshots \
syndeia_cloud_store.metadata \
syndeia_cloud_store.offsetstore \
syndeia_cloud_store.auto_key \
syndeia_cloud_store.messages \
syndeia_cloud_store.repositories \
syndeia_cloud_store.repositories_by_uri \
syndeia_cloud_store.containers \
syndeia_cloud_store.containers_by_repository_key \
syndeia_cloud_store.containers_by_external_key \
syndeia_cloud_store.containers_by_external_id \
syndeia_cloud_store.artifacts \
syndeia_cloud_store.artifacts_by_key \
syndeia_cloud_store.artifacts_by_container_key \
syndeia_cloud_store.artifacts_by_external_key \
syndeia_cloud_store.artifacts_by_external_id \
syndeia_cloud_store.artifacts_by_type_key \
syndeia_cloud_store.relations \
syndeia_cloud_store.relations_by_container_key \
syndeia_cloud_store.config \
syndeia_cloud_store.relations_by_key \
syndeia_cloud_store.relations_by_external_id \
syndeia_cloud_store.relations_by_external_key \
syndeia_cloud_store.relations_by_source_key \
syndeia_cloud_store.relations_by_target_key \
syndeia_cloud_store.repository_types \
syndeia_cloud_store.container_types \
syndeia_cloud_store.container_types_by_repository_key \
syndeia_cloud_store.container_types_by_external_key \
syndeia_cloud_store.container_types_by_external_id \
syndeia_cloud_store.artifact_types \
syndeia_cloud_store.artifact_types_by_repository_key \
syndeia_cloud_store.artifact_types_by_external_key \
syndeia_cloud_store.artifact_types_by_external_id \
syndeia_cloud_store.relation_types \
syndeia_cloud_store.relation_types_by_repository_key \
syndeia_cloud_store.relation_types_by_external_key \
syndeia_cloud_store.relation_types_by_external_id \
syndeia_cloud_auth.snapshots \
syndeia_cloud_auth.metadata \
syndeia_cloud_auth.offsetstore \
syndeia_cloud_auth.messages \
syndeia_cloud_auth.config \
syndeia_cloud_auth.auto_key \
syndeia_cloud_auth.users \
syndeia_cloud_auth.users_by_external_id \
syndeia_cloud_auth.users_by_external_key \
syndeia_cloud_auth.users_by_user_name)
LC_NUMERIC=C
# TIMEFORMAT='{"real":%R,"user":%U,"sys":%S}'
TIMEFORMAT='{"real":%R}'
if [[ ! -d SC_backups ]]; then
    mkdir SC_backups
fi;
echo "$(date) - Dump schema for ${keyspaces}..."
for keyspace in ${keyspaces[@]} do;
    dst_filename="SC_backups/${keyspace}_schema_backup.cql"
    all_keyspace_filenames="${all_keyspace_filenames} ${keyspace}_schema_backup.cql"
    echo "$(date)   - Dump schema for ${keyspace}..."
    sudo cqlsh -u syndeia_admin -p ${syndeia_admin_pw} -e "DESCRIBE KEYSPACE ${keyspace};" "${cassandra_jg_host}" > ${dst_filename} > /dev/null
    if [[ $? -ne 0 ]]; then
        echo "$(date) x Error, could not export schema ${keyspace}"
        exit 1
    else
        echo "$(date) - Successfully exported schema ${keyspace}"
    fi;
	cat 
done;
cat "${all_keyspace_filenames}" > SC_backups/syndeia-cloud-${version}_concatenated_schema_backup.cql > /dev/null
echo "$(date) - Export data for tables..."
for table in ${tables[@]} do;
    dst_filename="SC_backups/syndeia-cloud-${version}_backup_${table}.csv"
	# Generate MD5 hash for NULL values during each table export
	md5hash=$(date -Iseconds | md5sum | cut -f1 -d' ')
	echo "${md5hash}" > SC_backups/${table}_md5hash_for_NULL.txt
    echo "$(date)  - Export data for ${table} to ${dst_filename} using md5hash=${md5hash} for NULL..."
    CQL_to_backup_table="COPY ${table} TO '${dst_filename}' WITH HEADER = TRUE AND NULL='${md5hash}'; "
    sudo cqlsh -u syndeia_admin -p ${syndeia_admin_pw} -e "${CQL_to_backup_table}" "${cassandra_jg_host}" > /dev/null
    if [[ $? -ne 0 ]]; then
        echo "$(date) x Error, could not export table ${table}"
        exit 2
    else
        echo "$(date) - Successfully exported table ${table}"
    fi;
done;

2.  Make the script executable, ie:  chmod ug+x syndeia_cloud-3.4_backup.bash

3.  To use the script, run the following command to from the CLI: syndeia_cloud-3.4_backup.bash <node_FQDN>

      where <node_FQDN> is the node Fully Qualified Domain Name (ex: cassandra.mycompany.com) or if unspecified, localhost by default.  

     For each successfully exported schema and table, you should see output similar to the following (note, the # of exported rows):

Code Block
languagesql
themeRDark
Mon May 10 16:57:49 EDT 2021 - Successfully exported schema syndeia_cloud_auth

Using 3 child processes

Starting copy of syndeia_cloud_auth.users with columns [id, key, email, provider_id, provider_key, activated, avatar_url, created_by, created_date, first_name, last_name, modified_by, modified_date, permissions, roles].
Processed: 11 rows; Rate:      19 rows/s; Avg. rate:       4 rows/s
11 rows exported to 1 files in 2.698 seconds.


...

Restore

The following will restore the Syndeia Cloud database schema + keyspace tables automatically via a shell script.

1. Copy and paste the below into a new file named syndeia_cloud-3.4_restore.bash (set the password for syndeia_admin on L4 (default is myPw)):  

(info) Note L3-4 are environment variables that can be optionally picked up from your shell if you export them out beforehand, ie:  export version=3.4; syndeia_admin=foo , useful for automation)

Code Block
languagebash
themeRDark
linenumberstrue
#!/bin/env bash
cassandra_jg_host=${1:-localhost}
version=${version:-3.4}
syndeia_admin_pw=${syndeia_admin_pw:-myPw}
SC_HOME=${SC_HOME:-/opt/icx/syndeia-cloud-current}
keyspaces=(syndeia_cloud_auth syndeia_cloud_devops syndeia_cloud_graph syndeia_cloud_graph_config syndeia_cloud_store)
tables=(syndeia_cloud_store.snapshots \
syndeia_cloud_store.metadata \
syndeia_cloud_store.offsetstore \
syndeia_cloud_store.auto_key \
syndeia_cloud_store.messages \
syndeia_cloud_store.repositories \
syndeia_cloud_store.repositories_by_uri \
syndeia_cloud_store.containers \
syndeia_cloud_store.containers_by_repository_key \
syndeia_cloud_store.containers_by_external_key \
syndeia_cloud_store.containers_by_external_id \
syndeia_cloud_store.artifacts \
syndeia_cloud_store.artifacts_by_key \
syndeia_cloud_store.artifacts_by_container_key \
syndeia_cloud_store.artifacts_by_external_key \
syndeia_cloud_store.artifacts_by_external_id \
syndeia_cloud_store.artifacts_by_type_key \
syndeia_cloud_store.relations \
syndeia_cloud_store.relations_by_container_key \
syndeia_cloud_store.config \
syndeia_cloud_store.relations_by_key \
syndeia_cloud_store.relations_by_external_id \
syndeia_cloud_store.relations_by_external_key \
syndeia_cloud_store.relations_by_source_key \
syndeia_cloud_store.relations_by_target_key \
syndeia_cloud_store.repository_types \
syndeia_cloud_store.container_types \
syndeia_cloud_store.container_types_by_repository_key \
syndeia_cloud_store.container_types_by_external_key \
syndeia_cloud_store.container_types_by_external_id \
syndeia_cloud_store.artifact_types \
syndeia_cloud_store.artifact_types_by_repository_key \
syndeia_cloud_store.artifact_types_by_external_key \
syndeia_cloud_store.artifact_types_by_external_id \
syndeia_cloud_store.relation_types \
syndeia_cloud_store.relation_types_by_repository_key \
syndeia_cloud_store.relation_types_by_external_key \
syndeia_cloud_store.relation_types_by_external_id \
syndeia_cloud_auth.snapshots \
syndeia_cloud_auth.metadata \
syndeia_cloud_auth.offsetstore \
syndeia_cloud_auth.messages \
syndeia_cloud_auth.config \
syndeia_cloud_auth.auto_key \
syndeia_cloud_auth.users \
syndeia_cloud_auth.users_by_external_id \
syndeia_cloud_auth.users_by_external_key \
syndeia_cloud_auth.users_by_user_name)
LC_NUMERIC=C
# TIMEFORMAT='{"real":%R,"user":%U,"sys":%S}'
TIMEFORMAT='{"real":%R}'
echo "$(date) - Restore schema for ${keyspaces}..."
src_filename="SC_backups/syndeia-cloud-${version}_concatenated_schema_backup.cql"
sudo cqlsh -u syndeia_admin -p ${syndeia_admin_pw} -e "SOURCE '${src_filename}';" "${cassandra_jg_host}"
if [[ $? -ne 0 ]]; then
    echo "$(date) x Error, could not import concatenated schema ${src_filename}"
    exit 1
else
    echo "$(date) - Successfully imported concatenated schema ${src_filename}"
fi;
echo "$(date) - Import data for tables..."
for table in ${tables[@]} do;
    src_filename="SC_backups/syndeia-cloud-${version}_backup_${table}.csv"
	# Read MD5 hash for NULL values during each table export
	md5hash=$(cat SC_backups/${table}_md5hash_for_NULL.txt)
    echo "$(date)  - Import data for ${table} from ${dst_filename}..."
    CQL_to_restore_table="COPY ${table} FROM '${src_filename}' WITH HEADER = TRUE AND WITH NULL='${md5hash}'; "
    sudo cqlsh -u syndeia_admin -p ${syndeia_admin_pw} -e "${CQL_to_restore_table}" "${cassandra_jg_host}"
    if [[ $? -ne 0 ]]; then
        echo "$(date) x Error, could not import table ${table}"
        exit 2
    else
        echo "$(date) - Successfully imported table ${table}"
    fi;
done;
echo "$(date) - Done!"


2.  Make the script executable, ie:  chmod ug+x syndeia_cloud-3.4_restore.bash

3.  To use the script, run the following command to from the CLI: syndeia_cloud-3.4_restore.bash <node_FQDN>

      where <node_FQDN> is the node Fully Qualified Domain Name (ex: cassandra.mycompany.com) or if unspecified, localhost by default.  

     For each successfully imported schema and table, you should see output similar to the following (note, the # of imported exported rows):

Code Block
languagesql
themeRDark
    cassandra@cqlsh> COPY syndeia.users FROM 'syndeia-cloud_v2018-02-02_backup_syndeia.users.csv' WITH HEADER = TRUE;
  Mon May 10 16:57:49 EDT 2021 - Successfully imported schema syndeia_cloud_auth

Using 3 child processes

  

Starting copy of syndeia_cloud_auth.users with columns [id, activated, avatar_url, created_by_user_key, emailcreated_timestamp, description, providerexternal_id, providerexternal_key, activatedfirst_name, avatar_urlkey, createdlast_modified_by_user_key, createdlast_modified_datetimestamp, firstlast_name, last_name, modifiedother_byinfo, modified_datepermissions, permissionsprofiles, roles, user_name].
   
Processed: 11 rows; Rate:       6 rows/s; Avg. rate:      10 rows/s
    11 rows imported from 1 files in 1.048 seconds (0 skipped).
    cassandra@cqlsh> 
    


...

Troubleshooting

Import Errors

...