It’s always a good idea to take a backup of your VMware Blockchain client and replica nodes prior to any kind of maintenance activity.
Client node backups can be set to run on a schedule, while replica node backups are more of a manual process. Client node backups can also be run manually should the need arise.
Replica Backup and Restore
Backup the Replica Nodes
Before getting started, I took a couple of screenshots to document that there was only one contract present under the Alice user that accounted for a single IOU being present. This is from the IOU QuickStart application that I deployed, as noted in my previous post, Deploying a test DAML application on VMware Blockchain 1.6.0.1.


When backing up the replicas, you start out by stopping any applications that might be accessing the Blockchain. In my case, this is just the damlnavigator application.
Once any client applications are stopped, you’ll need to stop all containers with the exception of the agent and operator containers on any client nodes by SSH’ing to each and running the following command:
curl -X POST 127.0.0.1:8546/api/node/management?action=stop
You can validate that all containers are stopped by running sudo docker ps. You should see output similar to the following:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
add90e1994ab ae2a0236ff92 "/operator/operator_…" 2 months ago Up 16 minutes operator
6898c4ba6f18 harbor.corp.vmw/vmwblockchain/agent:1.6.0.1.266 "java -jar node-agen…" 2 months ago Up 16 minutes 0.0.0.0:8546->8546/tcp agent
If you have more than one client node, you should only have an operator container running on one of them, and the sudo docker ps output would only show the agent container running on the other client nodes.
With the client nodes stopped, you should take a backup of their database. It might not seem intuitive since this is a replica node backup we’re working on, but you will find that you need to restore the client nodes to the same point-in-time state as the replicas. I like to perform these steps from an external system, per the following commands:
sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@192.168.100.35 "sudo tar cvzf client.tgz /mnt/data/db;sudo chown vmbc:users client.tgz"
sshpass -p '<vmbc_user_password>' scp -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@192.168.100.35:client.tgz .
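My environment only has one client node; if you have more than one, the same commands could be wrapped in a loop, similar to what is done for the replicas below. A rough sketch, where 192.168.100.36 is a hypothetical second client node:
# Sketch: back up the database on each client node (192.168.100.36 is a hypothetical second client node)
for node in 192.168.100.35 192.168.100.36
do
sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node "sudo tar cvzf client-$node.tgz /mnt/data/db;sudo chown vmbc:users client-$node.tgz"
sshpass -p '<vmbc_user_password>' scp -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node:client-$node.tgz .
done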
Once the Blockchain containers are stopped on all client nodes, you can “wedge” the replica nodes. This essentially prevents new transactions from occurring in the Blockchain. From the client node where the operator container is running, execute the following command:
sudo docker exec -it operator sh -c './concop wedge stop'
You should see output similar to the following:
{"succ":true}
You don’t want to proceed until all of the replica nodes are wedged. The output of the previous command might make you think that it’s done, but there is another command you can run against the operator container to get a more detailed view of where the wedge operation stands:
sudo docker exec -it operator sh -c './concop wedge status'
{"192.168.100.31":true,"192.168.100.32":true,"192.168.100.33":true,"192.168.100.34":true}
As there are only four replica nodes in my Blockchain installation, the output from the previous command shows me that they are all wedged now and it’s safe to proceed to the next step.
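If you’d rather not re-run the status command by hand, a small polling loop can wait until every replica reports true. This is just a sketch and assumes the jq utility is available on the client node:
# Sketch: poll the wedge status until every replica reports "true" (assumes jq is installed)
while sudo docker exec operator sh -c './concop wedge status' | jq -e 'to_entries | any(.value == false)' > /dev/null
do
echo "Waiting for all replica nodes to wedge..."
sleep 10
done
echo "All replica nodes are wedged."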
Similar to what was done for the client nodes, you’ll need to stop all containers except the agent container on all replica and full-copy client nodes. You can SSH to each node and run the following command:
curl -X POST 127.0.0.1:8546/api/node/management?action=stop
And just as was done on the client nodes, you can validate that only the agent container is running on the replica and full-copy client nodes by running sudo docker ps. You should see output similar to the following on each node:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5dce10507743 harbor.corp.vmw/vmwblockchain/agent:1.6.0.1.266 "java -jar node-agen…" 2 months ago Up 30 minutes 0.0.0.0:8546->8546/tcp agent
One final check to do before moving on is to make sure that all replica nodes are stopped at the same BlockID. This can be done by running the following commands on each replica node:
image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord")
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastReachableBlockID
You should see output similar to the following (and with the same BlockID values on all replica nodes):
{
"lastBlockID": "320"
}
{
"lastReachableBlockID": "320"
}
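Rather than logging in to each replica separately, you could also collect the values remotely and compare them side by side. A rough sketch using the same sshpass approach as the backup commands below:
# Sketch: gather the last BlockID from every replica node for comparison (run from an external system)
for node in 192.168.100.31 192.168.100.32 192.168.100.33 192.168.100.34
do
echo "--- $node ---"
sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node 'image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord"); sudo docker run -i --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID'
done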
With the replica nodes sufficiently quiesced, you can proceed with backing up the rocksdb database on each node. I like to do this from one central location to make less work for myself.
for node in 192.168.100.31 192.168.100.32 192.168.100.33 192.168.100.34
do
sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node "sudo tar cvzf replica-$node.tgz /mnt/data/rocksdbdata;sudo chown vmbc:users replica-$node.tgz"
sshpass -p '<vmbc_user_password>' scp -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node:replica-$node.tgz .
done
This small script loops through each replica node, creates a .tgz file of the /mnt/data/rocksdbdata directory, and then uses scp to copy the file (replica-<ip_address>.tgz) to the local system.
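Before moving on, it doesn’t hurt to sanity-check the archives that were just copied down:
# Quick sanity check: confirm each archive exists and can be read by tar
ls -lh client.tgz replica-*.tgz
for f in client.tgz replica-*.tgz
do
tar tzf $f > /dev/null && echo "$f OK"
done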
The backup is now complete and we just need to restart everything on the nodes.
Start out by SSH’ing to each replica and full-copy client node and issue the following command to start all containers:
curl -X POST 127.0.0.1:8546/api/node/management?action=start
As with the process to stop the containers, you can use the sudo docker ps command to validate that all containers are up and running again:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
55b5e3c63f12 96e1e024f557 "/go/bin/agent-linux" 2 months ago Up 50 seconds (healthy) 0.0.0.0:5775->5775/udp, 0.0.0.0:6831-6832->6831-6832/udp, 0.0.0.0:5778->5778/tcp jaeger-agent
1328c3b7af1e def4a8d834f5 "/concord/concord-en…" 2 months ago Up 51 seconds (healthy) 0.0.0.0:3501->3501/tcp, 3501-3505/udp, 0.0.0.0:50051->50051/tcp, 3502-3505/tcp, 127.0.0.1:5458->5458/tcp concord
4896943fb83b 6b3f9670fd94 "/bin/bash /opt/wave…" 2 months ago Up 52 seconds wavefront-proxy
abd1f875147b 4697884441e0 "/doc/daml/entrypoin…" 2 months ago Up 53 seconds (healthy) 0.0.0.0:55000->55000/tcp daml_execution_engine
32a61957f4f0 cb15bf57b0aa "tini -- /bin/entryp…" 2 months ago Up 54 seconds (healthy) 5140/tcp, 24224/tcp fluentd
5dce10507743 harbor.corp.vmw/vmwblockchain/agent:1.6.0.1.266 "java -jar node-agen…" 2 months ago Up 11 minutes 0.0.0.0:8546->8546/tcp agent
070ae96c04e2 870ee38129f8 "/entrypoint.sh tele…" 2 months ago Up 49 seconds (healthy) 8092/udp, 8125/udp, 8094/tcp, 0.0.0.0:9273->9273/tcp telegraf
From an SSH session to the client node where the operator container is running, you can now unwedge the replica nodes:
sudo docker exec -it operator sh -c './concop unwedge'
You should see output similar to the following:
{"succ":true}
And as with the wedge that was issued earlier, you can use the following to get more detail on whether all nodes are truly unwedged:
sudo docker exec -it operator sh -c './concop wedge status'
{"192.168.100.31":false,"192.168.100.32":false,"192.168.100.33":false,"192.168.100.34":false}
The last thing to do is to start all containers on the client nodes. Issue the following on each via an SSH session:
curl -X POST 127.0.0.1:8546/api/node/management?action=start
Running sudo docker ps on each client node should show output similar to the following:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
add90e1994ab ae2a0236ff92 "/operator/operator_…" 2 months ago Up 53 minutes operator
41077d06d4cf 3382e600c110 "/clientservice/clie…" 2 months ago Up 20 seconds (healthy) 0.0.0.0:50505->50505/tcp clientservice
f09352787a2f 870ee38129f8 "/entrypoint.sh tele…" 2 months ago Up 21 seconds (healthy) 8092/udp, 8125/udp, 8094/tcp, 0.0.0.0:9273->9273/tcp telegraf
0b4ace21eb4f f1edf3cb8810 "/cre/cre_server" 2 months ago Up 22 seconds cre
cb9ae083f8b4 96e1e024f557 "/go/bin/agent-linux" 2 months ago Up 23 seconds (healthy) 0.0.0.0:5775->5775/udp, 0.0.0.0:6831-6832->6831-6832/udp, 0.0.0.0:5778->5778/tcp jaeger-agent
79b054c4aa96 418c1f4894c2 "/bin/sh -c '/doc/da…" 2 months ago Up 24 seconds (healthy) 0.0.0.0:6865->6865/tcp daml_ledger_api
d95e81a1b4aa 6b3f9670fd94 "/bin/bash /opt/wave…" 2 months ago Up 25 seconds wavefront-proxy
16aa15d0f24f 01e47563f112 "/doc/daml/scripts/d…" 2 months ago Up 26 seconds (healthy) 5432/tcp daml_index_db
c9bdc55ed1d0 cb15bf57b0aa "tini -- /bin/entryp…" 2 months ago Up 26 seconds (healthy) 5140/tcp, 24224/tcp fluentd
6898c4ba6f18 harbor.corp.vmw/vmwblockchain/agent:1.6.0.1.266 "java -jar node-agen…" 2 months ago Up 54 minutes 0.0.0.0:8546->8546/tcp agent
Create a Transaction
To make sure the restore process works as expected, I created a new transaction with the intent of rolling it back via a restore from my earlier backup.



You can see that the Alice user now has two contracts that are responsible for two IOUs.
I also checked the BlockID value on the replica nodes to validate that it had gone up from the earlier value of 320:
image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord")
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastReachableBlockID
I saw the following output on all replica nodes:
{
"lastBlockID": "322"
}
{
"lastReachableBlockID": "322"
}
Restore the Replica Nodes
The first part of the restore is the same as the backup section: stop any applications accessing the Blockchain, then stop all containers except for the agent and operator containers on the client nodes.
I restored the client node database from the backup taken earlier. Before restoring the client database, the original must be removed via the following command executed via an SSH session to each client node:
sudo rm -rf /mnt/data/db/
The following commands were run from the same remote system where the backup was saved earlier. If you have more than one client node, you will need to run these commands against each one of them:
sshpass -p '<vmbc_user_password>' scp -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET client.tgz vmbc@192.168.100.35:
sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@192.168.100.35 'sudo tar xvzf client.tgz -C /'
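If you have more than one client node and used a per-node naming scheme for the backups (for example client-<ip_address>.tgz, as in the earlier sketch), the restore can be looped in the same way as the replicas. Again, 192.168.100.36 is a hypothetical second client node:
# Sketch: restore a per-node client backup to each client node (192.168.100.36 is hypothetical)
for node in 192.168.100.35 192.168.100.36
do
sshpass -p '<vmbc_user_password>' scp -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET client-$node.tgz vmbc@$node:client.tgz
sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node 'sudo tar xvzf client.tgz -C /'
done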
Once more, you’ll need to stop all containers except the agent container on all replica and full-copy client nodes. You can SSH to each node and run the following command:
curl -X POST 127.0.0.1:8546/api/node/management?action=stop
You can validate that only the agent container is running on the replica and full-copy client nodes by running sudo docker ps.
Similar to what was done on the client node, you will want to remove the existing rocksdb database on all replica nodes via the following command:
sudo rm -rf /mnt/data/rocksdbdata
From the remote system where the rocksdb database was copied earlier, the following commands can be used to restore this data to each replica node:
for node in 192.168.100.31 192.168.100.32 192.168.100.33 192.168.100.34
do
sshpass -p '<vmbc_user_password>' scp -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET replica-$node.tgz vmbc@$node:replica.tgz
sshpass -p '<vmbc_user_password>' ssh -t -o StrictHostKeyChecking=no -o PreferredAuthentications=password -o LogLevel=QUIET vmbc@$node 'sudo tar xvzf replica.tgz -C /'
done
You can now SSH to each replica and full-copy client node and start all containers via the following command:
curl -X POST 127.0.0.1:8546/api/node/management?action=start
And you can use sudo docker ps to validate that all containers are running.
At this point, you can check that the BlockID value on each replica node is back to what it was when the backup was taken (320 in my case):
image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord")
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastReachableBlockID
{
"lastBlockID": "320"
}
{
"lastReachableBlockID": "320"
}
The last thing to do is to start all containers on the client nodes via the following command:
curl -X POST 127.0.0.1:8546/api/node/management?action=start
And again, use the sudo docker ps command on the client nodes to ensure that all containers are running.
Within the damlnavigator application, I was able to validate that the Alice user only sees one contract and one IOU, as was the case before the backup was taken:


Client Backup and Restore
Backup the Client Node(s)
You’ve already seen the manual client backup and restore process in the replica backup and restore section of this post. Simply copying the database folder/file structure off the system and back on is all that is needed. There is also an automated process that can be implemented to create regular backups.
The only prerequisite for this to work is that you need a writeable NFS share to store the backups remotely from the client nodes. You’ll want to set the ownership on this folder to 999:999 so that the postgres user in the daml_index_db container can write to it.
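On my NFS filer, that looks something like the following (the export path is specific to my environment; 999 is the UID/GID of the postgres user inside the container):
# Run on the NFS filer: make the export writeable by UID/GID 999 (postgres inside the daml_index_db container)
sudo chown 999:999 /mnt/vol1/bc-client-backups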
On each client node, mount your NFS share to the /mnt/client-backups directory:
sudo mount -t nfs 10.10.20.60:/mnt/vol1/bc-client-backups /mnt/client-backups/
10.10.20.60 is my NFS filer and /mnt/vol1/bc-client-backups is the share/folder where I’ll be storing the client node backups. /mnt/client-backups is a pre-existing folder on each client node. You can use the df command to see that the share is mounted successfully:
Filesystem Size Used Avail Use% Mounted on
10.10.20.60:/mnt/vol1/bc-client-backups 191G 119G 73G 63% /mnt/client-backups
You should create an /etc/fstab entry similar to the following to ensure that the share is re-mounted at boot:
10.10.20.60:/mnt/vol1/bc-client-backups /mnt/client-backups nfs defaults,rw 0 0
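You can test the new entry without rebooting. A quick sketch, assuming the share isn’t in use yet:
# Unmount the share and let /etc/fstab mount it again to prove the entry works
sudo umount /mnt/client-backups
sudo mount -a
df -h /mnt/client-backups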
To better ensure that the backup process will work, a temporary daml_index_db container will be created and we’ll test writing to the /mnt/client-backups mount as the postgres user.
You’ll need to find the image id value for the daml-index-db container image:
sudo docker images |egrep 'REPO|index'
REPOSITORY TAG IMAGE ID CREATED SIZE
harbor.corp.vmw/vmwblockchain/daml-index-db 1.6.0.1.266 01e47563f112 5 months ago 358MB
With the image id value of 01e47563f112, we can start a second daml_index_db container by issuing a command similar to the following:
sudo docker run -it --rm --name=daml_index_db_mount_test -e PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/11/bin -e GOSU_VERSION=1.12 -e LANG=en_US.utf8 -e PG_MAJOR=11 -e PG_VERSION=11.12-1.pgdg90+1 -e PGDATA=/var/lib/postgresql/data -e POSTGRES_USER=indexdb -e POSTGRES_PASSWORD=indexdb -e POSTGRES_MULTIPLE_SCHEMAS=daml_ledger_api -e POSTGRES_CONFIG_FILE= -v /config/daml-index-db:/config/daml-index-db:rw -v /config/generic:/config/generic:rw -v /mnt/data/db:/var/lib/postgresql/data:rw -v /config/pgbackrest:/etc/pgbackrest -v /mnt/client-backups:/mnt/client-backups --network=blockchain-fabric --expose=5432 --entrypoint /bin/bash 01e47563f112
Be sure to replace 01e47563f112 with the value for your own daml-index-db image.
You’ll be at a prompt within the temporary daml_index_db container. From here, issue the following commands to test writing to the /mnt/client-backups mount as the postgres user:
su - postgres
touch /mnt/client-backups/somefile
ls /mnt/client-backups/
somefile
rm /mnt/client-backups/somefile
You can type exit to exit this container and stop it.
Now that we’re sure that the postgres user can write to the /mnt/client-backups mount, we need to restart all containers on the client node.
curl -X POST 127.0.0.1:8546/api/node/management?action=stop
curl -X POST 127.0.0.1:8546/api/node/management?action=start
You can use the sudo docker ps command to validate that the containers were recently restarted (note the STATUS column values):
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
add90e1994ab ae2a0236ff92 "/operator/operator_…" 2 months ago Up 2 days operator
41077d06d4cf 3382e600c110 "/clientservice/clie…" 2 months ago Up 11 seconds (healthy) 0.0.0.0:50505->50505/tcp clientservice
f09352787a2f 870ee38129f8 "/entrypoint.sh tele…" 2 months ago Up 12 seconds (healthy) 8092/udp, 8125/udp, 8094/tcp, 0.0.0.0:9273->9273/tcp telegraf
0b4ace21eb4f f1edf3cb8810 "/cre/cre_server" 2 months ago Up 13 seconds cre
cb9ae083f8b4 96e1e024f557 "/go/bin/agent-linux" 2 months ago Up 14 seconds (healthy) 0.0.0.0:5775->5775/udp, 0.0.0.0:6831-6832->6831-6832/udp, 0.0.0.0:5778->5778/tcp jaeger-agent
79b054c4aa96 418c1f4894c2 "/bin/sh -c '/doc/da…" 2 months ago Up 15 seconds (healthy) 0.0.0.0:6865->6865/tcp daml_ledger_api
d95e81a1b4aa 6b3f9670fd94 "/bin/bash /opt/wave…" 2 months ago Up 165 seconds wavefront-proxy
16aa15d0f24f 01e47563f112 "/doc/daml/scripts/d…" 2 months ago Up 17 seconds (healthy) 5432/tcp daml_index_db
c9bdc55ed1d0 cb15bf57b0aa "tini -- /bin/entryp…" 2 months ago Up 18 seconds (healthy) 5140/tcp, 24224/tcp fluentd
6898c4ba6f18 harbor.corp.vmw/vmwblockchain/agent:1.6.0.1.266 "java -jar node-agen…" 2 months ago Up 2 days 0.0.0.0:8546->8546/tcp agent
You will also need to manually restart the agent container with the sudo docker restart agent command.
I stumbled upon an issue with the file system permissions in the daml_index_db container while conducting this exercise. The /var/lib/postgresql folder is owned by the postgres user and the root group. This causes no issues for normal operations, but during the restore process the postgres user attempts to set that group ownership to root and is blocked by the operating system. The workaround is to set the group ownership to postgres prior to configuring the backup.
Launch a shell into the daml_index_db container:
sudo docker exec -it daml_index_db bash
Issue the following command to ensure that the postgres group owns all files/folders under /var/lib/postgresql:
chown -R postgres:postgres /var/lib/postgresql
Type exit to get out of the daml_index_db container.
Checking the status of the backup configuration on the client node, you should see that it is disabled:
curl -X GET 127.0.0.1:8546/api/backup/status | jq
{
"execution_status_code": 0,
"last_run_start_time": "1970-01-01T00:00:00Z[UTC]",
"last_run_end_time": "1970-01-01T00:00:00Z[UTC]",
"backup_state": {
"state": "DISABLED",
"state_change_time": "1970-01-01T00:00Z[UTC]"
},
"backup_configuration": {
"schedule_frequency": "",
"retention_days": 0,
"enabled": false
},
"in_progress": false,
"next_run_time": "1970-01-01T00:00:00Z[UTC]"
}
To create a backup job, you must POST a backup configuration to the client node’s API. The following is a sample of what a backup configuration could look like:
{
"retention_days": 33,
"schedule_frequency": "DAILY"
}
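One way to create the client-backup.json file referenced in the next command is with a heredoc on the client node:
# Write the backup configuration to client-backup.json
cat > client-backup.json <<'EOF'
{
  "retention_days": 33,
  "schedule_frequency": "DAILY"
}
EOF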
And the command to POST the backup configuration should look similar to the following:
curl -X POST 127.0.0.1:8546/api/backup/ -H "Content-Type: application/json" -d @client-backup.json
You should see output similar to the following:
{
"message":"ClientBackup: Scheduled backup next run at: 2022-10-25T19:00Z[UTC]",
"execution_message": "2022-10-25 14:48:20.963 P00 INFO: stanza-create command begin 2.38: --exec-id=342-e5784a67 --log-level-console=info --pg1-path=/var/lib/postgresql/data --repo1-path=/mnt/client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf --stanza=daml-indexdb\n2022-10-25 14:48:21.600 P00 INFO: stanza-create for stanza 'daml-indexdb' on repo1\n2022-10-25 14:48:21.701 P00 INFO: stanza-create command end: completed successfully (753ms)\n",
"execution_error": "",
"execution_status_code": 0,
"last_run_start_time": "1970-01-01T00:00:00Z[UTC]",
"last_run_end_time": "1970-01-01T00:00:00Z[UTC]",
"backup_state": {
"state": "ENABLED",
"state_change_time": "2022-10-25T14:48:23.035841Z[UTC]"
},
"backup_configuration": {
"schedule_frequency": "DAILY",
"retention_days": 33,
"enabled": true
},
"in_progress": false,
"next_run_time": "2022-10-25T19:00:00Z[UTC]"
}
You can also query the status of the backup configuration on the client node and see nearly identical output to what is posted above:
curl -X GET 127.0.0.1:8546/api/backup/status | jq
{
"execution_message": "2022-10-25 14:48:20.963 P00 INFO: stanza-create command begin 2.38: --exec-id=342-e5784a67 --log-level-console=info --pg1-path=/var/lib/postgresql/data --repo1-path=/mnt/client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf --stanza=daml-indexdb\n2022-10-25 14:48:21.600 P00 INFO: stanza-create for stanza 'daml-indexdb' on repo1\n2022-10-25 14:48:21.701 P00 INFO: stanza-create command end: completed successfully (753ms)\n",
"execution_error": "",
"execution_status_code": 0,
"last_run_start_time": "1970-01-01T00:00:00Z[UTC]",
"last_run_end_time": "1970-01-01T00:00:00Z[UTC]",
"backup_state": {
"state": "ENABLED",
"state_change_time": "2022-10-25T14:48:23.035841Z[UTC]"
},
"backup_configuration": {
"schedule_frequency": "DAILY",
"retention_days": 33,
"enabled": true
},
"in_progress": false,
"next_run_time": "2022-10-25T19:00:00Z[UTC]"
}
A backup should be scheduled for a few hours from the current time and subsequent backups will be run on a daily basis. In this example, the command was run at 2022-10-25 14:48:20 UTC and the next scheduled backup is set for 2022-10-25 19:00:00 UTC.
If you examine the contents of the share/folder on your NFS filer, you will see that there is a new folder with a name that is a UUID value. This UUID value is unique to each client node (you will actually see it as part of the client node VM’s name in the vSphere client). Under this folder there is a backup folder and an archive folder with the backup contents.
find /mnt/vol1/bc-client-backups
/mnt/vol1/bc-client-backups
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/backup
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/backup/daml-indexdb
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/backup/daml-indexdb/backup.info
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/backup/daml-indexdb/backup.info.copy
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/archive
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/archive/daml-indexdb
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/archive/daml-indexdb/archive.info.copy
/mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/archive/daml-indexdb/archive.info
You can also examine the contents of the backup.info or archive.info files to see metadata on the database and backups:
cat /mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/backup/daml-indexdb/backup.info
[backrest]
backrest-format=5
backrest-version="2.38"
[db]
db-catalog-version=201809051
db-control-version=1100
db-id=1
db-system-id=7127334954729418817
db-version="11"
[db:history]
1={"db-catalog-version":201809051,"db-control-version":1100,"db-system-id":7127334954729418817,"db-version":"11"}
[backrest]
backrest-checksum="23c61e32fcefc4d5a12a95516df5ee918694b097"
cat /mnt/vol1/bc-client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf/archive/daml-indexdb/archive.info
[backrest]
backrest-format=5
backrest-version="2.38"
[db]
db-id=1
db-system-id=7127334954729418817
db-version="11"
[db:history]
1={"db-id":7127334954729418817,"db-version":"11"}
[backrest]
backrest-checksum="c7f1a8008ca064c91f4487151280f696675135a5"
I waited until after 19:00:00 UTC to see how things had changed:
curl -X GET 127.0.0.1:8546/api/backup/status | jq
{
"execution_message": "2022-10-25 19:00:01.329 P00 INFO: backup command begin 2.38: --archive-timeout=600 --compress-type=lz4 --exec-id=32610-ab86462c --log-level-console=info --pg1-path=/var/lib/postgresql/data --process-max=10 --repo1-path=/mnt/client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf --repo1-retention-full=33 --repo1-retention-full-type=time --stanza=daml-indexdb --type=full\n2022-10-25 19:00:02.042 P00 INFO: execute non-exclusive pg_start_backup(): backup begins after the next regular checkpoint completes\n2022-10-25 19:00:03.645 P00 INFO: backup start archive = 000000010000000000000005, lsn = 0/5000028\n2022-10-25 19:00:03.645 P00 INFO: check archive for prior segment 000000010000000000000004\n2022-10-25 19:00:11.711 P00 INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive\n2022-10-25 19:00:12.213 P00 INFO: backup stop archive = 000000010000000000000005, lsn = 0/50000F8\n2022-10-25 19:00:12.314 P00 INFO: check archive for segment(s) 000000010000000000000005:000000010000000000000005\n2022-10-25 19:00:12.668 P00 INFO: new backup label = 20221025-190001F\n2022-10-25 19:00:12.832 P00 INFO: full backup size = 39.6MB, file total = 1641\n2022-10-25 19:00:12.833 P00 INFO: backup command end: completed successfully (11506ms)\n2022-10-25 19:00:12.833 P00 INFO: expire command begin 2.38: --exec-id=32610-ab86462c --log-level-console=info --repo1-path=/mnt/client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf --repo1-retention-full=33 --repo1-retention-full-type=time --stanza=daml-indexdb\n2022-10-25 19:00:12.871 P00 INFO: repo1: time-based archive retention not met - archive logs will not be expired\n2022-10-25 19:00:12.871 P00 INFO: expire command end: completed successfully (38ms)\n",
"execution_error": "",
"execution_status_code": 0,
"last_run_start_time": "2022-10-25T19:00:00.038502Z[UTC]",
"last_run_end_time": "2022-10-25T19:00:13.316444Z[UTC]",
"backup_state": {
"state": "ENABLED",
"state_change_time": "2022-10-25T14:48:23.035841Z[UTC]"
},
"backup_configuration": {
"schedule_frequency": "DAILY",
"retention_days": 33,
"enabled": true
},
"in_progress": false,
"next_run_time": "2022-10-26T06:00:00Z[UTC]"
}
You can see that the last backup ran at 2022-10-25 19:00:00 UTC.
If you check the contents of the share/directory on your NFS filer, you’ll see that the number of objects has jumped from a handful to thousands.
One last step for the client node backup process is to export the database metadata so that it can be compared later after a restore.
sudo docker exec -it daml_index_db bash
root@16aa15d0f24f:/# su - postgres
postgres@16aa15d0f24f:~$ psql -U indexdb -d daml_ledger_api
psql (11.13 (Debian 11.13-1.pgdg90+1))
Type "help" for help.
daml_ledger_api=#
select count(1) from configuration_entries;
select count(1) from flyway_schema_history;
select count(1) from metering_parameters;
select count(1) from package_entries;
select count(1) from packages;
select count(1) from parameters;
select count(1) from participant_command_completions;
select count(1) from participant_events;
select count(1) from participant_events_consuming_exercise;
select count(1) from participant_events_create;
select count(1) from participant_events_create_filter;
select count(1) from participant_events_divulgence;
select count(1) from participant_events_non_consuming_exercise;
select count(1) from participant_metering;
select count(1) from participant_migration_history_v100;
select count(1) from participant_user_rights;
select count(1) from participant_users;
select count(1) from party_entries;
select count(1) from string_interning;
select count(1) from transaction_metering;
You’ll want to save the output and then type exit a few times to get out of this container.
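If you’d rather not paste each query interactively, the same counts can be captured in one pass and saved to a file for later comparison. A sketch (pre-restore-counts.txt is just a name I picked):
# Sketch: run the same count queries non-interactively and save the output for comparison after the restore
for t in configuration_entries flyway_schema_history metering_parameters package_entries packages parameters participant_command_completions participant_events participant_events_consuming_exercise participant_events_create participant_events_create_filter participant_events_divulgence participant_events_non_consuming_exercise participant_metering participant_migration_history_v100 participant_user_rights participant_users party_entries string_interning transaction_metering
do
echo -n "$t: "
sudo docker exec daml_index_db su - postgres -c "psql -U indexdb -d daml_ledger_api -t -A -c 'select count(1) from $t;'"
done > pre-restore-counts.txt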
Restore the Client Node(s)
Again, we’re going to stop all containers except the agent and operator containers on the client node:
curl -X POST 127.0.0.1:8546/api/node/management?action=stop
We need to remove the contents of the /mnt/data/db folder (as the root user) to ensure that there is no conflicting data after the restore:
sudo bash -c 'rm -rf /mnt/data/db/*'
Launch a temporary daml_index_db container:
sudo docker run -it --rm --name=daml_index_db_mount_test -e PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/postgresql/11/bin -e GOSU_VERSION=1.12 -e LANG=en_US.utf8 -e PG_MAJOR=11 -e PG_VERSION=11.12-1.pgdg90+1 -e PGDATA=/var/lib/postgresql/data -e POSTGRES_USER=indexdb -e POSTGRES_PASSWORD=indexdb -e POSTGRES_MULTIPLE_SCHEMAS=daml_ledger_api -e POSTGRES_CONFIG_FILE= -v /config/daml-index-db:/config/daml-index-db:rw -v /config/generic:/config/generic:rw -v /mnt/data/db:/var/lib/postgresql/data:rw -v /config/pgbackrest:/etc/pgbackrest -v /mnt/client-backups:/mnt/client-backups --network=blockchain-fabric --expose=5432 --entrypoint /bin/bash 01e47563f112
Switch to the postgres user and use the pgbackrest command to check on the status of the backup:
su - postgres
pgbackrest info
stanza: daml-indexdb
status: ok
cipher: none
db (current)
wal archive min/max (11): 000000010000000000000004/000000010000000000000005
full backup: 20221025-190001F
timestamp start/stop: 2022-10-25 19:00:01 / 2022-10-25 19:00:12
wal start/stop: 000000010000000000000005 / 000000010000000000000005
database size: 39.6MB, database backup size: 39.6MB
repo1: backup set size: 7.3MB, backup size: 7.3MB
Use the pgbackrest command again to restore from the most recent backup:
pgbackrest --stanza=daml-indexdb --log-level-console=info restore
2022-10-25 19:28:40.869 P00 INFO: restore command begin 2.38: --exec-id=12-6934bf90 --log-level-console=info --pg1-path=/var/lib/postgresql/data --process-max=10 --repo1-path=/mnt/client-backups/a1b9dae0-a16a-472b-bfd3-fb76c241ffdf --stanza=daml-indexdb
2022-10-25 19:28:40.926 P00 INFO: repo1: restore backup set 20221025-190001F, recovery will start at 2022-10-25 19:00:01
2022-10-25 19:28:46.506 P00 INFO: write /var/lib/postgresql/data/recovery.conf
2022-10-25 19:28:46.663 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2022-10-25 19:28:46.668 P00 INFO: restore size = 39.6MB, file total = 1641
2022-10-25 19:28:46.669 P00 INFO: restore command end: completed successfully (5802ms)
Type exit a few times to get out of the temporary daml_index_db container. Issue the following command to start all containers on the client node:
curl -X POST 127.0.0.1:8546/api/node/management?action=start
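To close the loop, you can re-run the same count queries used before the restore and diff the results against the saved output. A sketch, assuming you captured the earlier counts to pre-restore-counts.txt as in the example above:
# Sketch: re-run the counts after the restore and compare against the pre-restore output
for t in $(cut -d: -f1 pre-restore-counts.txt)
do
echo -n "$t: "
sudo docker exec daml_index_db su - postgres -c "psql -U indexdb -d daml_ledger_api -t -A -c 'select count(1) from $t;'"
done > post-restore-counts.txt
diff pre-restore-counts.txt post-restore-counts.txt && echo "Counts match"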