Overview

Cassandra offers several ways to back up and restore data; the right one depends on your scenario and on how much data you need to back up or restore.

This document covers the simplest and most portable method, the cqlsh COPY ... TO and COPY ... FROM commands. Be aware that other methods (e.g. sstableloader with nodetool snapshot plus schema and token export/import, or third-party ETL tools) may be necessary if your situation is more complex. Refer to the Apache Cassandra and DataStax documentation (https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/migrating.html & https://web.archive.org/web/20161227010514/http://datascale.io/cloning-cassandra-clusters-fast-way/) for further details.

There are three main types of data to consider when backing up and restoring your database:

Schema: a description of the type(s) of data each table contains; think of it as the column headers of each table.
Data: the actual application table data.
Metadata: supporting data used by the database software, e.g. node token assignments.

The method discussed here only involves the first two.

Backup

Schema

1. To back up the Syndeia Cloud database schema, run the following command from the CLI: cqlsh -u <superuser> -e 'DESCRIBE KEYSPACE syndeia' <node_FQDN> > syndeia-cloud_v<version_#>_backup_schema.cql

where <superuser> is a Cassandra superuser account, <node_FQDN> is the node Fully Qualified Domain Name (ex: cassandra.mycompany.com), and <version_#> is the version of Syndeia Cloud. You will be prompted for your Cassandra password.
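
Because the redirect creates the output file even when cqlsh fails (for example on a wrong password), a quick sanity check of the result can be useful. The following is a minimal sketch, not part of cqlsh: the function name schema_backup_looks_ok is hypothetical, and it assumes the DESCRIBE output contains a CREATE KEYSPACE syndeia statement.

```shell
# Hypothetical helper (not part of cqlsh): succeeds only if the file is
# non-empty and contains a CREATE KEYSPACE statement for the syndeia keyspace.
schema_backup_looks_ok() {
    file="$1"
    [ -s "$file" ] && grep -q 'CREATE KEYSPACE syndeia' "$file"
}

# Example (not run here; substitute your own file name):
# schema_backup_looks_ok "syndeia-cloud_v<version_#>_backup_schema.cql" || echo "schema backup looks incomplete"
```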

Data

2. To back up the Syndeia Cloud database data, run the commands below from a cqlsh session to export each syndeia keyspace table to its own .csv file, where <version#> is the version of Syndeia Cloud.

    COPY syndeia.artifact_types TO 'syndeia-cloud_v<version#>_backup_syndeia.artifact_types.csv' WITH HEADER = TRUE;
    COPY syndeia.artifacts TO 'syndeia-cloud_v<version#>_backup_syndeia.artifacts.csv' WITH HEADER = TRUE;
    COPY syndeia.auth_tokens TO 'syndeia-cloud_v<version#>_backup_syndeia.auth_tokens.csv' WITH HEADER = TRUE;
    COPY syndeia.auto_key TO 'syndeia-cloud_v<version#>_backup_syndeia.auto_key.csv' WITH HEADER = TRUE;
    COPY syndeia.container_versions TO 'syndeia-cloud_v<version#>_backup_syndeia.container_versions.csv' WITH HEADER = TRUE;
    COPY syndeia.containers TO 'syndeia-cloud_v<version#>_backup_syndeia.containers.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_artifacts TO 'syndeia-cloud_v<version#>_backup_syndeia.deleted_artifacts.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_containers TO 'syndeia-cloud_v<version#>_backup_syndeia.deleted_containers.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_relations TO 'syndeia-cloud_v<version#>_backup_syndeia.deleted_relations.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_repositories TO 'syndeia-cloud_v<version#>_backup_syndeia.deleted_repositories.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_users TO 'syndeia-cloud_v<version#>_backup_syndeia.deleted_users.csv' WITH HEADER = TRUE;
    COPY syndeia.passwords TO 'syndeia-cloud_v<version#>_backup_syndeia.passwords.csv' WITH HEADER = TRUE;
    COPY syndeia.relation_types TO 'syndeia-cloud_v<version#>_backup_syndeia.relation_types.csv' WITH HEADER = TRUE;
    COPY syndeia.relations TO 'syndeia-cloud_v<version#>_backup_syndeia.relations.csv' WITH HEADER = TRUE;
    COPY syndeia.repositories TO 'syndeia-cloud_v<version#>_backup_syndeia.repositories.csv' WITH HEADER = TRUE;
    COPY syndeia.user_container_access TO 'syndeia-cloud_v<version#>_backup_syndeia.user_container_access.csv' WITH HEADER = TRUE;
    COPY syndeia.users TO 'syndeia-cloud_v<version#>_backup_syndeia.users.csv' WITH HEADER = TRUE;
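
Typing one statement per table is error-prone, so the list above could also be generated. The following is a minimal sketch under stated assumptions: the helper name gen_copy_to is hypothetical, the table list is copied from the commands above, and every file is assumed to follow the syndeia-cloud_v<version#>_backup_syndeia.<table>.csv pattern.

```shell
# Tables of the syndeia keyspace, as listed in the COPY commands above.
TABLES="artifact_types artifacts auth_tokens auto_key container_versions containers deleted_artifacts deleted_containers deleted_relations deleted_repositories deleted_users passwords relation_types relations repositories user_container_access users"

# Hypothetical helper: prints one COPY ... TO statement per table.
# $1 is the Syndeia Cloud version string used in the file names.
gen_copy_to() {
    version="$1"
    for table in $TABLES; do
        printf "COPY syndeia.%s TO 'syndeia-cloud_v%s_backup_syndeia.%s.csv' WITH HEADER = TRUE;\n" \
            "$table" "$version" "$table"
    done
}

# Example (not run here; 3.2 is a made-up version and backup_data.cql a made-up file name):
# gen_copy_to 3.2 > backup_data.cql
# cqlsh -u <superuser> -f backup_data.cql <node_FQDN>
```

Relative file names in COPY are resolved against the directory cqlsh was started from, so run it from the directory where the backup files should be written.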

For each successfully exported table, you should see output similar to the following (note the number of exported rows):

    cassandra@cqlsh> COPY syndeia.users TO 'syndeia-cloud_v2018-02-02_backup_syndeia.users.csv' WITH HEADER = TRUE;
    Using 3 child processes

    Starting copy of syndeia.users with columns [id, key, email, provider_id, provider_key, activated, avatar_url, created_by, created_date, first_name, last_name, modified_by, modified_date, permissions, roles].
    Processed: 11 rows; Rate:      19 rows/s; Avg. rate:       4 rows/s
    11 rows exported to 1 files in 2.698 seconds.
    cassandra@cqlsh>
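
The "rows exported" figure can be cross-checked against the file itself. The following is a rough sketch, with a hypothetical helper name csv_row_count: it counts lines and subtracts the header line, so it assumes the file was written WITH HEADER = TRUE and that no field contains an embedded newline (otherwise the count will be too high).

```shell
# Hypothetical helper: approximate number of data rows in an exported CSV.
# Assumes one header line and no newlines inside quoted fields.
csv_row_count() {
    lines=$(wc -l < "$1" | tr -d ' ')
    echo $((lines - 1))
}

# Example (not run here):
# csv_row_count 'syndeia-cloud_v2018-02-02_backup_syndeia.users.csv'
```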

Restore

Schema

3. To restore the Syndeia Cloud database schema, run the following command from the CLI: cqlsh -u <superuser> -f syndeia-cloud_v<version_#>_backup_schema.cql <node_FQDN>

where <superuser> is a Cassandra superuser account, <node_FQDN> is the node Fully Qualified Domain Name (ex: cassandra.mycompany.com), and <version_#> is the version of Syndeia Cloud. You will be prompted for your Cassandra password.

Data

4. To restore the Syndeia Cloud database data, run the commands below from a cqlsh session to import each .csv file into its syndeia keyspace table, where <version#> is the version of Syndeia Cloud.

    COPY syndeia.artifact_types FROM 'syndeia-cloud_v<version#>_backup_syndeia.artifact_types.csv' WITH HEADER = TRUE;
    COPY syndeia.artifacts FROM 'syndeia-cloud_v<version#>_backup_syndeia.artifacts.csv' WITH HEADER = TRUE;
    COPY syndeia.auth_tokens FROM 'syndeia-cloud_v<version#>_backup_syndeia.auth_tokens.csv' WITH HEADER = TRUE;
    COPY syndeia.auto_key FROM 'syndeia-cloud_v<version#>_backup_syndeia.auto_key.csv' WITH HEADER = TRUE;
    COPY syndeia.container_versions FROM 'syndeia-cloud_v<version#>_backup_syndeia.container_versions.csv' WITH HEADER = TRUE;
    COPY syndeia.containers FROM 'syndeia-cloud_v<version#>_backup_syndeia.containers.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_artifacts FROM 'syndeia-cloud_v<version#>_backup_syndeia.deleted_artifacts.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_containers FROM 'syndeia-cloud_v<version#>_backup_syndeia.deleted_containers.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_relations FROM 'syndeia-cloud_v<version#>_backup_syndeia.deleted_relations.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_repositories FROM 'syndeia-cloud_v<version#>_backup_syndeia.deleted_repositories.csv' WITH HEADER = TRUE;
    COPY syndeia.deleted_users FROM 'syndeia-cloud_v<version#>_backup_syndeia.deleted_users.csv' WITH HEADER = TRUE;
    COPY syndeia.passwords FROM 'syndeia-cloud_v<version#>_backup_syndeia.passwords.csv' WITH HEADER = TRUE;
    COPY syndeia.relation_types FROM 'syndeia-cloud_v<version#>_backup_syndeia.relation_types.csv' WITH HEADER = TRUE;
    COPY syndeia.relations FROM 'syndeia-cloud_v<version#>_backup_syndeia.relations.csv' WITH HEADER = TRUE;
    COPY syndeia.repositories FROM 'syndeia-cloud_v<version#>_backup_syndeia.repositories.csv' WITH HEADER = TRUE;
    COPY syndeia.user_container_access FROM 'syndeia-cloud_v<version#>_backup_syndeia.user_container_access.csv' WITH HEADER = TRUE;
    COPY syndeia.users FROM 'syndeia-cloud_v<version#>_backup_syndeia.users.csv' WITH HEADER = TRUE;
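
Before starting the import, it can help to confirm that every expected backup file is present, so the restore does not stop halfway through. The following is a minimal sketch: the helper name list_missing_backups is hypothetical, the table list is copied from the commands above, and every file is assumed to follow the syndeia-cloud_v<version#>_backup_syndeia.<table>.csv pattern.

```shell
# Tables of the syndeia keyspace, as listed in the COPY commands above.
TABLES="artifact_types artifacts auth_tokens auto_key container_versions containers deleted_artifacts deleted_containers deleted_relations deleted_repositories deleted_users passwords relation_types relations repositories user_container_access users"

# Hypothetical helper: prints the name of every expected backup file
# that is absent from directory $2. $1 is the version string.
list_missing_backups() {
    version="$1"
    dir="$2"
    for table in $TABLES; do
        file="syndeia-cloud_v${version}_backup_syndeia.${table}.csv"
        [ -f "$dir/$file" ] || echo "$file"
    done
}

# Example (not run here; 3.2 is a made-up version):
# list_missing_backups 3.2 .
```

No output means all seventeen files were found.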

For each successfully imported table, you should see output similar to the following (note the number of imported rows):

    cassandra@cqlsh> COPY syndeia.users FROM 'syndeia-cloud_v2018-02-02_backup_syndeia.users.csv' WITH HEADER = TRUE;
    Using 3 child processes

    Starting copy of syndeia.users with columns [id, key, email, provider_id, provider_key, activated, avatar_url, created_by, created_date, first_name, last_name, modified_by, modified_date, permissions, roles].
    Processed: 11 rows; Rate:       6 rows/s; Avg. rate:      10 rows/s
    11 rows imported from 1 files in 1.048 seconds (0 skipped).
    cassandra@cqlsh>

Troubleshooting

Import Errors

Q1: Sometimes I get the following timeout error and nothing gets imported (see below), how do I resolve this?

Failed to import 20 rows: OperationTimedOut - errors={<Host: x.x.x.x dc1>: ConnectionException('Host has been marked down or removed',)}, last_host=y.y.y.y,  will retry later, attempt 1 of 5
Failed to import 20 rows: OperationTimedOut - errors={'y.y.y.y': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=y.y.y.y,  will retry later, attempt 1 of 5
No records inserted in 90 seconds, aborting
Processed: 0 rows; Rate:       0 rows/s; Avg. rate:       0 rows/s
0 rows imported from 1 files in 1 minute and 30.180 seconds (0 skipped).

A1: Either rerun the import command, or raise the timeouts by creating/setting the following parameters in ~/.cassandra/cqlshrc (under the account used to run cqlsh) and then rerun the import command:

[connection]
request_timeout=6000 
# default is 10 seconds, set to None to disable
client_timeout=3600 
# default is 10 seconds, set to None to disable
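
If the account has no cqlshrc yet, the file can be created from the shell. The following is a cautious sketch, with a hypothetical helper name write_cqlshrc: it writes the two settings shown above and refuses to touch an existing file, since that file may already hold other settings that should be edited by hand.

```shell
# Hypothetical helper: creates a cqlshrc containing the timeout settings
# from A1. Returns non-zero and changes nothing if the file already exists.
write_cqlshrc() {
    target="$1"
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target; edit it by hand" >&2
        return 1
    fi
    mkdir -p "$(dirname "$target")"
    cat > "$target" <<'EOF'
[connection]
request_timeout=6000
client_timeout=3600
EOF
}

# Example (not run here):
# write_cqlshrc "$HOME/.cassandra/cqlshrc"
```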

Q2: Sometimes I get the following "Pickling" error (see below), how do I resolve this?

PicklingError: Can't pickle <class 'cqlshlib.copyutil.ImmutableDict'>: attribute lookup cqlshlib.copyutil.ImmutableDict failed

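A2: Append the following options to the end of your COPY commands: MINBATCHSIZE=1 AND MAXBATCHSIZE=1 AND PAGESIZE=10. The commands in this document already contain a WITH clause, so join the extra options to it with AND rather than adding a second WITH, e.g. ... WITH HEADER = TRUE AND MINBATCHSIZE=1 AND MAXBATCHSIZE=1 AND PAGESIZE=10;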

Backup & Restore Methods for Syndeia Cloud Keyspace in Cassandra