Backup and Restore of VMware Blockchain nodes

It’s always a good idea to take a backup of your VMware Blockchain client and replica nodes prior to any kind of maintenance activity.

Client node backups can be set to run on a schedule, while replica node backups are more of a manual process. Client node backups can also be run manually should the need arise.

Replica Backup and Restore

Backup the Replica Nodes

Before getting started, I took a couple of screenshots to document that there was only one contract present under the Alice user, accounting for a single IOU. This is from the IOU QuickStart application that I deployed, as noted in my previous post, Deploying a test DAML application on VMware Blockchain 1.6.0.1.

When backing up the replicas, you start out by stopping any applications that might be accessing the Blockchain. In my case, this is just the damlnavigator application.

Once any client applications are stopped, you’ll need to stop all containers with the exception of the agent and operator containers on any client nodes by SSH’ing to each and running the following command:

curl -X POST 127.0.0.1:8546/api/node/management?action=stop

You can validate that all containers are stopped by running sudo docker ps. You should see output similar to the following:

CONTAINER ID   IMAGE                                             COMMAND                  CREATED        STATUS          PORTS                    NAMES
add90e1994ab   ae2a0236ff92                                      "/operator/operator_…"   2 months ago   Up 16 minutes                            operator
6898c4ba6f18   harbor.corp.vmw/vmwblockchain/agent:1.6.0.1.266   "java -jar node-agen…"   2 months ago   Up 16 minutes   0.0.0.0:8546->8546/tcp   agent

If you have more than one client node, you should only have an operator container running on one of them and the sudo docker ps output would only show the agent container running on the other client nodes.
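
If you’d rather not SSH to each client node just to confirm this, here’s a minimal sketch that checks the container list on each one from an external system, in the same style as the backup loop used later in this post. It assumes the same vmbc credentials and a single client node at 192.168.100.35; add any additional client node IPs to the list.

# list running containers on each client node; only agent (and operator) should remain
for node in 192.168.100.35
do
  echo "--- $node ---"
  sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node 'sudo docker ps --format "{{.Names}}: {{.Status}}"'
done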

With the client nodes stopped, you should take a backup of their database. It might not seem intuitive to do so since this is a replica node backup we’re working on, but you will find that you need to restore the client nodes to the same point-in-time state as the replicas. I like to perform these steps from an external system, per the following commands:

sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@192.168.100.35 "sudo tar cvzf client.tgz /mnt/data/db;sudo chown vmbc:users client.tgz"
sshpass -p '<vmbc_user_password>' scp -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@192.168.100.35:client.tgz .
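
These two commands create a .tgz archive of the /mnt/data/db directory on the client node (192.168.100.35 in my environment) and then copy the resulting client.tgz file to the local system. If you have more than one client node, repeat them against each one.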

Once the Blockchain containers are stopped on all client nodes, you can “wedge” the replica nodes. This essentially prevents new transactions from occurring in the Blockchain. From the client node where the operator container is running, execute the following command:

sudo docker exec -it operator sh -c './concop wedge stop'

You should see output similar to the following:

{"succ":true}

You don’t want to proceed until all of the replica nodes are wedged. The output of the previous command might make you think that it’s done, but there is another command you can run against the operator container to get a more detailed view of where the wedge operation stands:

sudo docker exec -it operator sh -c './concop wedge status'
	
{"192.168.100.31":true,"192.168.100.32":true,"192.168.100.33":true,"192.168.100.34":true}

As there are only four replica nodes in my Blockchain installation, the output from the previous command shows me that they are all wedged now and it’s safe to proceed to the next step.

Similar to what was done for the client nodes, you’ll need to stop all containers except the agent container on all replica and full-copy client nodes. You can SSH to each node and run the following command:

curl -X POST 127.0.0.1:8546/api/node/management?action=stop

And just as was done on the client nodes, you can validate that only the agent container is running on the replica and full-copy client nodes by running sudo docker ps. You should see output similar to the following on each node.

CONTAINER ID   IMAGE                                             COMMAND                  CREATED        STATUS          PORTS                    NAMES
5dce10507743   harbor.corp.vmw/vmwblockchain/agent:1.6.0.1.266   "java -jar node-agen…"   2 months ago   Up 30 minutes   0.0.0.0:8546->8546/tcp   agent

One final check to do before moving on is to make sure that all replica nodes are stopped on the same BlockID. This can be done by running the following commands on each replica node:

image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord")
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastReachableBlockID

You should see output similar to the following (and with the same BlockID values on all replica nodes):

{
  "lastBlockID": "320"
}
{
  "lastReachableBlockID": "320"
}
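
If you would rather not run these commands by hand on every replica, here is a minimal sketch that performs the same check from an external system, assuming the same replica node IPs and vmbc credentials used in the backup loop that follows:

# print the last BlockID and last reachable BlockID for each replica node
for node in 192.168.100.31 192.168.100.32 192.168.100.33 192.168.100.34
do
  echo "--- $node ---"
  sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node 'image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord"); sudo docker run -i --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID; sudo docker run -i --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastReachableBlockID'
done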

With the replica nodes sufficiently quiesced, you can proceed with backing up the rocksdb database on each node. I like to do this from one central location to make less work for myself.

for node in 192.168.100.31 192.168.100.32 192.168.100.33 192.168.100.34
do 
  sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node "sudo tar cvzf replica-$node.tgz /mnt/data/rocksdbdata;sudo chown vmbc:users replica-$node.tgz"
  sshpass -p '<vmbc_user_password>' scp -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node:replica-$node.tgz .
done

This small script loops through each replica node, creates a .tgz file of the /mnt/data/rocksdbdata directory and then uses scp to copy the file (replica-<ip_address>.tgz) to the local system.

The backup is now complete and we just need to restart everything on the nodes.

Start out by SSH’ing to each replica and full-copy client node and issue the following command to start all containers:

curl -X POST 127.0.0.1:8546/api/node/management?action=start

As with the process to stop the containers, you can use the sudo docker ps command to validate that all containers are up and running again:

CONTAINER ID   IMAGE                                             COMMAND                  CREATED        STATUS                          PORTS                                                                                                      NAMES
55b5e3c63f12   96e1e024f557                                      "/go/bin/agent-linux"    2 months ago   Up 50 seconds (healthy)         0.0.0.0:5775->5775/udp, 0.0.0.0:6831-6832->6831-6832/udp, 0.0.0.0:5778->5778/tcp                           jaeger-agent
1328c3b7af1e   def4a8d834f5                                      "/concord/concord-en…"   2 months ago   Up 51 seconds (healthy)         0.0.0.0:3501->3501/tcp, 3501-3505/udp, 0.0.0.0:50051->50051/tcp, 3502-3505/tcp, 127.0.0.1:5458->5458/tcp   concord
4896943fb83b   6b3f9670fd94                                      "/bin/bash /opt/wave…"   2 months ago   Up 52 seconds                                                                                                                              wavefront-proxy
abd1f875147b   4697884441e0                                      "/doc/daml/entrypoin…"   2 months ago   Up 53 seconds (healthy)         0.0.0.0:55000->55000/tcp                                                                                   daml_execution_engine
32a61957f4f0   cb15bf57b0aa                                      "tini -- /bin/entryp…"   2 months ago   Up 54 seconds (healthy)         5140/tcp, 24224/tcp                                                                                        fluentd
5dce10507743   harbor.corp.vmw/vmwblockchain/agent:1.6.0.1.266   "java -jar node-agen…"   2 months ago   Up 11 minutes                   0.0.0.0:8546->8546/tcp                                                                                     agent
070ae96c04e2   870ee38129f8                                      "/entrypoint.sh tele…"   2 months ago   Up 49 seconds (healthy)         8092/udp, 8125/udp, 8094/tcp, 0.0.0.0:9273->9273/tcp                                                       telegraf

From an SSH session to the client node where the operator container is running, you can now unwedge the replica nodes:

sudo docker exec -it operator sh -c './concop unwedge'

You should see output similar to the following:

{"succ":true}

And as with the wedge that was issued earlier, you can use the following to get more detail on whether all nodes are truly unwedged:

sudo docker exec -it operator sh -c './concop wedge status'
	
{"192.168.100.31":false,"192.168.100.32":false,"192.168.100.33":false,"192.168.100.34":false}

The last thing to do is to start all containers on the client nodes. Issue the following on each via an SSH session:

curl -X POST 127.0.0.1:8546/api/node/management?action=start

sudo docker ps on each client node should show output similar to the following:

CONTAINER ID   IMAGE                                             COMMAND                  CREATED        STATUS                        PORTS                                                                              NAMES
add90e1994ab   ae2a0236ff92                                      "/operator/operator_…"   2 months ago   Up 53 minutes                                                                                                    operator
41077d06d4cf   3382e600c110                                      "/clientservice/clie…"   2 months ago   Up 20 seconds (healthy)       0.0.0.0:50505->50505/tcp                                                           clientservice
f09352787a2f   870ee38129f8                                      "/entrypoint.sh tele…"   2 months ago   Up 21 seconds (healthy)       8092/udp, 8125/udp, 8094/tcp, 0.0.0.0:9273->9273/tcp                               telegraf
0b4ace21eb4f   f1edf3cb8810                                      "/cre/cre_server"        2 months ago   Up 22 seconds                                                                                                    cre
cb9ae083f8b4   96e1e024f557                                      "/go/bin/agent-linux"    2 months ago   Up 23 seconds (healthy)       0.0.0.0:5775->5775/udp, 0.0.0.0:6831-6832->6831-6832/udp, 0.0.0.0:5778->5778/tcp   jaeger-agent
79b054c4aa96   418c1f4894c2                                      "/bin/sh -c '/doc/da…"   2 months ago   Up 24 seconds (healthy)       0.0.0.0:6865->6865/tcp                                                             daml_ledger_api
d95e81a1b4aa   6b3f9670fd94                                      "/bin/bash /opt/wave…"   2 months ago   Up 25 seconds                                                                                                    wavefront-proxy
16aa15d0f24f   01e47563f112                                      "/doc/daml/scripts/d…"   2 months ago   Up 26 seconds (healthy)       5432/tcp                                                                           daml_index_db
c9bdc55ed1d0   cb15bf57b0aa                                      "tini -- /bin/entryp…"   2 months ago   Up 26 seconds (healthy)       5140/tcp, 24224/tcp                                                                fluentd
6898c4ba6f18   harbor.corp.vmw/vmwblockchain/agent:1.6.0.1.266   "java -jar node-agen…"   2 months ago   Up 54 minutes                 0.0.0.0:8546->8546/tcp                                                             agent

Create a Transaction

To make sure the restore process works as expected, I created a new transaction with the intent of rolling it back via a restore from my earlier backup.

You can see that the Alice user now has two contracts that are responsible for two IOUs.

I also thought to check the BlockID value on the replica nodes to validate that it went up (from the value of 320 earlier):

image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord")
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastReachableBlockID

I saw the following output on all replica nodes:

{
  "lastBlockID": "322"
}
{
  "lastReachableBlockID": "322"
}

Restore the Replica Nodes

The first part of the restore is the same as in the backup section: stop any applications accessing the Blockchain, then stop all containers except the agent and operator containers on the client nodes.
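
As a reminder, that is the same stop command used during the backup, run via an SSH session on each client node:

curl -X POST 127.0.0.1:8546/api/node/management?action=stop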

Next, restore the client node database from the backup taken earlier. Before restoring it, the original must be removed via the following command, executed in an SSH session to each client node:

sudo rm -rf /mnt/data/db/

The following commands were run from the same remote system where the backup was saved earlier. If you have more than one client node, you will need to run these commands against each one of them:

sshpass -p '<vmbc_user_password>' scp -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET client.tgz vmbc@192.168.100.35:
sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@192.168.100.35 'sudo tar xvzf client.tgz -C /'

Once more, you’ll need to stop all containers except the agent container on all replica and full-copy client nodes. You can SSH to each node and run the following command:

curl -X POST 127.0.0.1:8546/api/node/management?action=stop

You can validate that only the agent container is running on the replica and full-copy client nodes by running sudo docker ps.

Similar to what was done on the client node, you will want to remove the existing rocksdb database on all replica nodes via the following command:

sudo rm -rf /mnt/data/rocksdbdata
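
If you prefer to do this from the same remote system used for the backups, here’s a minimal sketch in the same style as the backup loop, assuming the same replica node IPs and vmbc credentials:

# remove the existing rocksdb database on each replica node
for node in 192.168.100.31 192.168.100.32 192.168.100.33 192.168.100.34
do
  sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node 'sudo rm -rf /mnt/data/rocksdbdata'
done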

From the remote system where the rocksdb database was copied earlier, the following commands can be used to restore this data to each replica node:

for node in 192.168.100.31 192.168.100.32 192.168.100.33 192.168.100.34
do 
  sshpass -p '<vmbc_user_password>' scp -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET replica-$node.tgz vmbc@$node:replica.tgz
  sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node 'sudo tar xvzf replica.tgz -C /'
done

You can now SSH to each replica and full-copy client node and start all containers via the following command:

curl -X POST 127.0.0.1:8546/api/node/management?action=start

And you can use sudo docker ps to validate that all containers are running.

At this point, you can check that the BlockID value on each replica node is back to what it was when the backup was taken (320 in my case):

image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord")
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastReachableBlockID
{
  "lastBlockID": "320"
}
{
  "lastReachableBlockID": "320"
}

The last thing to do is to start all containers on the client nodes via the following command:

curl -X POST 127.0.0.1:8546/api/node/management?action=start

And again, use the sudo docker ps command on the client nodes to ensure that all containers are running.

Within the damlnavigator application, I was able to validate that the Alice user only sees one contract and one IOU, as was the case before the backup was taken.

Client Backup and Restore

Backup the Client Node(s)

You’ve already seen the manual client backup and restore process in the replica backup and restore section of this post. Simply copying the database folder/file structure off the system and back on is all that is needed. There is also an automated process that can be implemented to create regular backups.

The only prerequisite for this to work is a writeable NFS share where the backups can be stored remotely from the client nodes. You’ll want to set the ownership on this folder to 999:999 so that the postgres user in the daml_index_db container can write to it.
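
For example, on a Linux-based filer that exposes the share used later in this section at /mnt/vol1/bc-client-backups (as in my environment), the ownership could be set with something like:

sudo chown -R 999:999 /mnt/vol1/bc-client-backups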

On each client node, mount your NFS share to the /mnt/client-backups directory:

sudo mount -t nfs 10.10.20.60:/mnt/vol1/bc-client-backups /mnt/client-backups/

10.10.20.60 is my NFS filer and /mnt/vol1/bc-client-backups is the share/folder where I’ll be storing the client node backups. /mnt/client-backups is a pre-existing folder on each client node. You can use the df command to see that the share is mounted successfully:

Filesystem                               Size  Used Avail Use% Mounted on
10.10.20.60:/mnt/vol1/bc-client-backups  191G  119G   73G  63% /mnt/client-backups

You should create an /etc/fstab entry similar to the following to ensure that the share is re-mounted at boot:

10.10.20.60:/mnt/vol1/bc-client-backups         /mnt/client-backups nfs defaults,rw     0       0
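
You can test the new entry without rebooting by unmounting the share and letting mount re-read it from /etc/fstab:

sudo umount /mnt/client-backups
sudo mount /mnt/client-backups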

To better assure that the backup process will work, we’ll create a temporary daml_index_db container and test writing to the /mnt/client-backups mount as the postgres user.

You’ll need to find the image id value for the daml-index-db container image:

sudo docker images |egrep 'REPO|index'

REPOSITORY                                      TAG           IMAGE ID       CREATED         SIZE
harbor.corp.vmw/vmwblockchain/daml-index-db     1.6.0.1.266   01e47563f112   5 months ago    358MB

With the image id value of 01e47563f112, we can start a second daml_index_db container by issuing a command similar to the following:

sudo docker run -it --rm --name=daml_index_db_mount_test -e PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/11/bin -e GOSU_VERSION=1.12 -e LANG=en_US.utf8 -e PG_MAJOR=11 -e PG_VERSION=11.12-1.pgdg90+1 -e PGDATA=/var/lib/postgresql/data -e POSTGRES_USER=indexdb -e POSTGRES_PASSWORD=indexdb -e POSTGRES_MULTIPLE_SCHEMAS=daml_ledger_api -e POSTGRES_CONFIG_FILE= -v /config/daml-index-db:/config/daml-index-db:rw -v /config/generic:/config/generic:rw -v /mnt/data/db:/var/lib/postgresql/data:rw  -v /config/pgbackrest:/etc/pgbackrest -v /mnt/client-backups:/mnt/client-backups --network=blockchain-fabric --expose=5432 --entrypoint /bin/bash 01e47563f112  

Be sure to replace 01e47563f112 with the value for your own daml-index-db image.

You’ll be at a prompt within the temporary daml_index_db container. From here issue the following commands to test writing to the /mnt/client-backups mount as the postgres user:

su - postgres

touch /mnt/client-backups/somefile
ls /mnt/client-backups/

somefile

rm /mnt/client-backups/somefile

You can type exit to exit this container and stop it.

Now that we’re sure that the postgres user can write to the /mnt/client-backups mount, we need to restart all containers on the client node.

curl -X POST 127.0.0.1:8546/api/node/management?action=stop
curl -X POST 127.0.0.1:8546/api/node/management?action=start

You can use the sudo docker ps command to validate that the containers were recently restarted (note the STATUS column value):

CONTAINER ID   IMAGE                                             COMMAND                  CREATED        STATUS                        PORTS                                                                              NAMES
add90e1994ab   ae2a0236ff92                                      "/operator/operator_…"   2 months ago   Up 2 days                                                                                                        operator
41077d06d4cf   3382e600c110                                      "/clientservice/clie…"   2 months ago   Up 11 seconds (healthy)       0.0.0.0:50505->50505/tcp                                                           clientservice
f09352787a2f   870ee38129f8                                      "/entrypoint.sh tele…"   2 months ago   Up 12 seconds (healthy)       8092/udp, 8125/udp, 8094/tcp, 0.0.0.0:9273->9273/tcp                               telegraf
0b4ace21eb4f   f1edf3cb8810                                      "/cre/cre_server"        2 months ago   Up 13 seconds                                                                                                    cre
cb9ae083f8b4   96e1e024f557                                      "/go/bin/agent-linux"    2 months ago   Up 14 seconds (healthy)       0.0.0.0:5775->5775/udp, 0.0.0.0:6831-6832->6831-6832/udp, 0.0.0.0:5778->5778/tcp   jaeger-agent
79b054c4aa96   418c1f4894c2                                      "/bin/sh -c '/doc/da…"   2 months ago   Up 15 seconds (healthy)       0.0.0.0:6865->6865/tcp                                                             daml_ledger_api
d95e81a1b4aa   6b3f9670fd94                                      "/bin/bash /opt/wave…"   2 months ago   Up 165 seconds                                                                                                   wavefront-proxy
16aa15d0f24f   01e47563f112                                      "/doc/daml/scripts/d…"   2 months ago   Up 17 seconds (healthy)       5432/tcp                                                                           daml_index_db
c9bdc55ed1d0   cb15bf57b0aa                                      "tini -- /bin/entryp…"   2 months ago   Up 18 seconds (healthy)       5140/tcp, 24224/tcp                                                                fluentd
6898c4ba6f18   harbor.corp.vmw/vmwblockchain/agent:1.6.0.1.266   "java -jar node-agen…"   2 months ago   Up 2 days                     0.0.0.0:8546->8546/tcp                                                             agent

You will also need to manually restart the agent container with the sudo docker restart agent command.

I stumbled upon an issue with the file system permissions in the daml_index_db container while conducting this exercise. The /var/lib/postgresql folder is owned by the postgres user and the root group. This causes no issues for normal operations, but during the restore process the postgres user attempts to set that group ownership to root and is blocked by the operating system. The workaround is to set the group ownership to postgres prior to configuring the backup.

Launch a shell into the daml_index_db container:

docker exec -it daml_index_db bash

Issue the following command to ensure that the postgres group owns all files/folders under /var/lib/postgresql:

chown -R postgres:postgres /var/lib/postgresql

Type exit to get out of the daml_index_db container.

Checking the status of the backup configuration on the client node, you should see that it is disabled:

curl -X GET 127.0.0.1:8546/api/backup/status | jq

{
  "execution_status_code": 0,
  "last_run_start_time": "1970-01-01T00:00:00Z[UTC]",
  "last_run_end_time": "1970-01-01T00:00:00Z[UTC]",
  "backup_state": {
    "state": "DISABLED",
    "state_change_time": "1970-01-01T00:00Z[UTC]"
  },
  "backup_configuration": {
    "schedule_frequency": "",
    "retention_days": 0,
    "enabled": false
  },
  "in_progress": false,
  "next_run_time": "1970-01-01T00:00:00Z[UTC]"
}

To create a backup job, you must POST a backup configuration to the client node’s API. The following is a sample of what a backup configuration could look like:

{
    "retention_days": 33,
    "schedule_frequency": "DAILY"
}
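
The POST in the next step reads this configuration from a file named client-backup.json, so save it to a file on the client node first. For example:

cat > client-backup.json << 'EOF'
{
    "retention_days": 33,
    "schedule_frequency": "DAILY"
}
EOF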

And the command to POST the backup configuration should look similar to the following:

curl -X POST 127.0.0.1:8546/api/backup/ -H "Content-Type: application/json" -d @client-backup.json

You should see output similar to the following:

{
  "message":"ClientBackup: Scheduled backup next run at: 2022-10-25T19:00Z[UTC]",
  "execution_message": "2022-10-25 14:48:20.963 P00   INFO: stanza-create command begin 2.38: --exec-id=342-e5784a67 --log-level-console=info --pg1-path=/var/lib/postgresql/data --repo1-path=/mnt/client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf --stanza=daml-indexdb\n2022-10-25 14:48:21.600 P00   INFO: stanza-create for stanza 'daml-indexdb' on repo1\n2022-10-25 14:48:21.701 P00   INFO: stanza-create command end: completed successfully (753ms)\n",
  "execution_error": "",
  "execution_status_code": 0,
  "last_run_start_time": "1970-01-01T00:00:00Z[UTC]",
  "last_run_end_time": "1970-01-01T00:00:00Z[UTC]",
  "backup_state": {
    "state": "ENABLED",
    "state_change_time": "2022-10-25T14:48:23.035841Z[UTC]"
  },
  "backup_configuration": {
    "schedule_frequency": "DAILY",
    "retention_days": 33,
    "enabled": true
  },
  "in_progress": false,
  "next_run_time": "2022-10-25T19:00:00Z[UTC]"
}

You can also query the status of the backup configuration on the client node and see nearly identical output to what is posted above:

curl -X GET 127.0.0.1:8546/api/backup/status | jq

{
  "execution_message": "2022-10-25 14:48:20.963 P00   INFO: stanza-create command begin 2.38: --exec-id=342-e5784a67 --log-level-console=info --pg1-path=/var/lib/postgresql/data --repo1-path=/mnt/client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf --stanza=daml-indexdb\n2022-10-25 14:48:21.600 P00   INFO: stanza-create for stanza 'daml-indexdb' on repo1\n2022-10-25 14:48:21.701 P00   INFO: stanza-create command end: completed successfully (753ms)\n",
  "execution_error": "",
  "execution_status_code": 0,
  "last_run_start_time": "1970-01-01T00:00:00Z[UTC]",
  "last_run_end_time": "1970-01-01T00:00:00Z[UTC]",
  "backup_state": {
    "state": "ENABLED",
    "state_change_time": "2022-10-25T14:48:23.035841Z[UTC]"
  },
  "backup_configuration": {
    "schedule_frequency": "DAILY",
    "retention_days": 33,
    "enabled": true
  },
  "in_progress": false,
  "next_run_time": "2022-10-25T19:00:00Z[UTC]"
}

A backup should be scheduled for a few hours from the current time and subsequent backups will be run on a daily basis. In this example, the command was run at 2022-10-25 14:48:20 UTC and the next scheduled backup is set for 2022-10-25 19:00:00 UTC.

If you examine the contents of the share/folder on your NFS filer, you will see that there is a new folder with a name that is a UUID value. This UUID value is unique to each client node (you will actually see it as part of the client node VM’s name in the vSphere client). Under this folder there is a backup folder and an archive folder with the backup contents.

find /mnt/vol1/bc-client-backups

/mnt/vol1/bc-client-backups
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/backup
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/backup/daml-indexdb
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/backup/daml-indexdb/backup.info
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/backup/daml-indexdb/backup.info.copy
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/archive
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/archive/daml-indexdb
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/archive/daml-indexdb/archive.info.copy
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/archive/daml-indexdb/archive.info

You can also examine the contents of the backup.info or archive.info files to see metadata on the database and backups:

cat /mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/backup/daml-indexdb/backup.info

[backrest]
backrest-format=5
backrest-version="2.38"

[db]
db-catalog-version=201809051
db-control-version=1100
db-id=1
db-system-id=7127334954729418817
db-version="11"

[db:history]
1={"db-catalog-version":201809051,"db-control-version":1100,"db-system-id":7127334954729418817,"db-version":"11"}

[backrest]
backrest-checksum="23c61e32fcefc4d5a12a95516df5ee918694b097"


cat /mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/archive/daml-indexdb/archive.info

[backrest]
backrest-format=5
backrest-version="2.38"

[db]
db-id=1
db-system-id=7127334954729418817
db-version="11"

[db:history]
1={"db-id":7127334954729418817,"db-version":"11"}

[backrest]
backrest-checksum="c7f1a8008ca064c91f4487151280f696675135a5"

I waited until after 19:00:00 UTC to see how things had changed:

curl -X GET 127.0.0.1:8546/api/backup/status | jq

{
  "execution_message": "2022-10-25 19:00:01.329 P00   INFO: backup command begin 2.38: --archive-timeout=600 --compress-type=lz4 --exec-id=32610-ab86462c --log-level-console=info --pg1-path=/var/lib/postgresql/data --process-max=10 --repo1-path=/mnt/client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf --repo1-retention-full=33 --repo1-retention-full-type=time --stanza=daml-indexdb --type=full\n2022-10-25 19:00:02.042 P00   INFO: execute non-exclusive pg_start_backup(): backup begins after the next regular checkpoint completes\n2022-10-25 19:00:03.645 P00   INFO: backup start archive = 000000010000000000000005, lsn = 0/5000028\n2022-10-25 19:00:03.645 P00   INFO: check archive for prior segment 000000010000000000000004\n2022-10-25 19:00:11.711 P00   INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive\n2022-10-25 19:00:12.213 P00   INFO: backup stop archive = 000000010000000000000005, lsn = 0/50000F8\n2022-10-25 19:00:12.314 P00   INFO: check archive for segment(s) 000000010000000000000005:000000010000000000000005\n2022-10-25 19:00:12.668 P00   INFO: new backup label = 20221025-190001F\n2022-10-25 19:00:12.832 P00   INFO: full backup size = 39.6MB, file total = 1641\n2022-10-25 19:00:12.833 P00   INFO: backup command end: completed successfully (11506ms)\n2022-10-25 19:00:12.833 P00   INFO: expire command begin 2.38: --exec-id=32610-ab86462c --log-level-console=info --repo1-path=/mnt/client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf --repo1-retention-full=33 --repo1-retention-full-type=time --stanza=daml-indexdb\n2022-10-25 19:00:12.871 P00   INFO: repo1: time-based archive retention not met - archive logs will not be expired\n2022-10-25 19:00:12.871 P00   INFO: expire command end: completed successfully (38ms)\n",
  "execution_error": "",
  "execution_status_code": 0,
  "last_run_start_time": "2022-10-25T19:00:00.038502Z[UTC]",
  "last_run_end_time": "2022-10-25T19:00:13.316444Z[UTC]",
  "backup_state": {
    "state": "ENABLED",
    "state_change_time": "2022-10-25T14:48:23.035841Z[UTC]"
  },
  "backup_configuration": {
    "schedule_frequency": "DAILY",
    "retention_days": 33,
    "enabled": true
  },
  "in_progress": false,
  "next_run_time": "2022-10-26T06:00:00Z[UTC]"
}

You can see that the last backup ran at 2022-10-25 19:00:00 UTC.

If you check the contents of the share/directory on your NFS filer, you’ll see that the number of objects has jumped from a handful to thousands.
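
A quick way to see this from the filer is to count the objects with the same find command used earlier:

find /mnt/vol1/bc-client-backups | wc -l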

One last step for the client node backup process is to record some database metadata (table row counts) so that it can be compared after a restore.

sudo docker exec -it daml_index_db bash

root@16aa15d0f24f:/# su - postgres
postgres@16aa15d0f24f:~$ psql -U indexdb -d daml_ledger_api

psql (11.13 (Debian 11.13-1.pgdg90+1))
Type "help" for help.

daml_ledger_api=# 

select count(1) from configuration_entries;
select count(1) from flyway_schema_history;
select count(1) from metering_parameters;
select count(1) from package_entries;
select count(1) from packages;
select count(1) from parameters;
select count(1) from participant_command_completions;
select count(1) from participant_events;
select count(1) from participant_events_consuming_exercise;
select count(1) from participant_events_create;
select count(1) from participant_events_create_filter;
select count(1) from participant_events_divulgence;
select count(1) from participant_events_non_consuming_exercise;
select count(1) from participant_metering;
select count(1) from participant_migration_history_v100;
select count(1) from participant_user_rights;
select count(1) from participant_users;
select count(1) from party_entries;
select count(1) from string_interning;
select count(1) from transaction_metering;

You’ll want to save the output and then type exit a few times to get out of this container.
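
If you’d rather capture these counts to a file in one pass, here is a minimal sketch that runs the same queries non-interactively from the client node. It assumes the daml_index_db container and indexdb credentials shown above; the table list is abbreviated (extend it with the full list of tables from the queries above) and the output file name is just an example.

# capture row counts for comparison after a restore
for t in configuration_entries packages parameters participant_events party_entries
do
  sudo docker exec -u postgres daml_index_db psql -U indexdb -d daml_ledger_api -c "select count(1) from $t;"
done > /tmp/daml_index_db_counts.txt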

Restore the Client Node(s)

Again, we’re going to stop all containers except the agent and operator containers on the client node:

curl -X POST 127.0.0.1:8546/api/node/management?action=stop

We need to remove the contents of the /mnt/data/db folder (as the root user) to ensure that there is no conflicting data after the restore:

sudo bash -c 'rm -rf /mnt/data/db/*'

Launch a temporary daml_index_db container:

sudo docker run -it --rm --name=daml_index_db_mount_test -e PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/11/bin -e GOSU_VERSION=1.12 -e LANG=en_US.utf8 -e PG_MAJOR=11 -e PG_VERSION=11.12-1.pgdg90+1 -e PGDATA=/var/lib/postgresql/data -e POSTGRES_USER=indexdb -e POSTGRES_PASSWORD=indexdb -e POSTGRES_MULTIPLE_SCHEMAS=daml_ledger_api -e POSTGRES_CONFIG_FILE= -v /config/daml-index-db:/config/daml-index-db:rw -v /config/generic:/config/generic:rw -v /mnt/data/db:/var/lib/postgresql/data:rw  -v /config/pgbackrest:/etc/pgbackrest -v /mnt/client-backups:/mnt/client-backups --network=blockchain-fabric --expose=5432 --entrypoint /bin/bash 01e47563f112

Switch to the postgres user and use the pgbackrest command to check on the status of the backup:

su - postgres

pgbackrest info

stanza: daml-indexdb
    status: ok
    cipher: none

    db (current)
        wal archive min/max (11): 000000010000000000000004/000000010000000000000005

        full backup: 20221025-190001F
            timestamp start/stop: 2022-10-25 19:00:01 / 2022-10-25 19:00:12
            wal start/stop: 000000010000000000000005 / 000000010000000000000005
            database size: 39.6MB, database backup size: 39.6MB
            repo1: backup set size: 7.3MB, backup size: 7.3MB

Use the pgbackrest command again to restore from the most recent backup:

pgbackrest --stanza=daml-indexdb --log-level-console=info restore

2022-10-25 19:28:40.869 P00   INFO: restore command begin 2.38: --exec-id=12-6934bf90 --log-level-console=info --pg1-path=/var/lib/postgresql/data --process-max=10 --repo1-path=/mnt/client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf --stanza=daml-indexdb
2022-10-25 19:28:40.926 P00   INFO: repo1: restore backup set 20221025-190001F, recovery will start at 2022-10-25 19:00:01
2022-10-25 19:28:46.506 P00   INFO: write /var/lib/postgresql/data/recovery.conf
2022-10-25 19:28:46.663 P00   INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2022-10-25 19:28:46.668 P00   INFO: restore size = 39.6MB, file total = 1641
2022-10-25 19:28:46.669 P00   INFO: restore command end: completed successfully (5802ms)

Type exit a few times to get out of the temporary daml_index_db container. Issue the following command to start all containers on the client node:

curl -X POST 127.0.0.1:8546/api/node/management?action=start
