Part 2: Failure and Recovery
Post by Martin Hayes
In Part 1, we got to the point where we configured and manually kicked off a ‘Protection Job’. Next up, we want to see how to recover from what we will call an orchestrated ‘mishap’, as opposed to a disaster per se. We will cover inter-cluster recovery, where we actually lose our entire OpenShift cluster for whatever reason, in a future post. For now though, we are going to see what happens when we introduce some human error!

As per the diagram above we have our Namespace or ‘Project’ on OpenShift with our application Pod running inside, creatively named ‘ppdm-test-pod’.

Of course I have been busy developing my application, and I have written a file to the mounted volume, which is backed by PowerStore. We will see this in more detail in the video demo, when we run through the process end to end. This will serve as a simple example of data persistence post recovery.

Navigating back to the Pod details in the GUI, we can verify the mount path for the storage volume and the associated Persistent Volume Claim (PVC). This is the one we attached in the last post, ‘ppdm-claim-1’. Note: the path ‘mnt/storage’ is where we have written our demo text file.

Delving a little deeper into the PVC details we can see the name of the Persistent Volume that has been created on PowerStore and the associated Storage Class.
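For anyone who prefers the command line to the GUI, the same Pod-to-PVC-to-Persistent-Volume chain can be read from the JSON that `oc get ... -o json` returns. What follows is only a minimal sketch, not what the console does under the hood: the helper names (`oc_get_json`, `pvc_mounts`, `pvc_summary`) are my own, and it assumes the `oc` CLI is installed and already logged in to the cluster.

```python
import json
import subprocess


def oc_get_json(kind, name, namespace):
    """Fetch one object as a dict via `oc get <kind> <name> -n <ns> -o json`.

    Assumes `oc` is on the PATH and logged in (hypothetical helper).
    """
    result = subprocess.run(
        ["oc", "get", kind, name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def pvc_mounts(pod):
    """Map each PVC claim name used by the pod to the paths it is mounted at."""
    # Volumes backed by a PVC carry a persistentVolumeClaim.claimName entry.
    claim_by_volume = {
        volume["name"]: volume["persistentVolumeClaim"]["claimName"]
        for volume in pod["spec"].get("volumes", [])
        if "persistentVolumeClaim" in volume
    }
    mounts = {}
    for container in pod["spec"].get("containers", []):
        for mount in container.get("volumeMounts", []):
            claim = claim_by_volume.get(mount["name"])
            if claim is not None:
                mounts.setdefault(claim, []).append(mount["mountPath"])
    return mounts


def pvc_summary(pvc):
    """Pull out the fields the PVC details page shows: phase, PV and Storage Class."""
    return {
        "phase": pvc.get("status", {}).get("phase"),
        "volume": pvc.get("spec", {}).get("volumeName"),
        "storage_class": pvc.get("spec", {}).get("storageClassName"),
    }
```

Calling `pvc_mounts(oc_get_json("pod", "ppdm-test-pod", "<your-project>"))` should list ‘ppdm-claim-1’ against its mount path, and `pvc_summary(oc_get_json("pvc", "ppdm-claim-1", "<your-project>"))` should show the Persistent Volume name that also appears on PowerStore.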

Moving on over to PowerStore we can see that the ‘PersistentVolume’ ‘ocp11-f0e672d7f6’ is present as expected.

Orchestrated Failure
Before we orchestrate or demo a failure by deleting the OpenShift project, let’s ensure we have a copy on PPDM from which to recover. We did this in the last post, but let’s just confirm.

Next let’s go ahead and delete the namespace/project by navigating to the project and using the GUI to ‘Delete Project’.

Confirm ‘Delete Project’ when prompted.

Wait a couple of minutes while the namespace/project and its associated entities terminate and clear down.
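If you would rather not keep refreshing the console, the wait can be scripted as a simple polling loop. This is a rough sketch under a few assumptions: `oc` is installed and logged in, the function names are hypothetical, and a non-zero exit from `oc get namespace` is taken to mean "not found" (it could equally be an expired login, so treat the result with care).

```python
import subprocess
import time


def namespace_exists(name):
    """True while `oc get namespace <name>` still finds the project.

    Assumption: any non-zero exit code is treated as "gone".
    """
    result = subprocess.run(
        ["oc", "get", "namespace", name],
        capture_output=True, text=True,
    )
    return result.returncode == 0


def wait_until_gone(name, exists=namespace_exists, timeout=300.0,
                    interval=5.0, sleep=time.sleep):
    """Poll until the namespace disappears; return False if we time out first."""
    deadline = time.monotonic() + timeout
    while exists(name):
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
    return True
```

The `exists` and `sleep` parameters are only there so the loop can be exercised without a cluster; in normal use `wait_until_gone("<your-project>")` is all that is needed.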

No OpenShift project means our Pod/application has also been deleted, as have our Persistent Volume Claim (PVC) and Persistent Volume. Note that ‘ocp11-f0e672d7f6’ has disappeared.

What about on PowerStore itself? We can see here that it is also gone! Where once we had 8 volumes present, we now have 7. The CSI API has unbound the claim on the volume and PowerStore has deleted ‘ocp11-f0e672d7f6’.

Net result: everything is gone, the Project/Namespace, the PVC, the volume on PowerStore and, by definition, our application. A bit of a mini disaster if you deleted the namespace in error… it happens to the best of us!
Have no fear… PPDM and DDVE to the rescue.
Policy Driven Recovery via PPDM
Of course we have everything backed up in our DDVE instance, fully orchestrated by PPDM. Let’s head back over to the PPDM console and perform a full recovery.
Navigate to the ‘Restore’ menu and then to ‘Assets’.

The process is really very straightforward. Note that you are presented with the option to recover from multiple point-in-time copies of the data (dependent on the length of your retention policy). I want to recover the latest copy. Select the namespace to recover and then click ‘Restore’.

Run through the menu. We will restore to the original cluster (In an upcoming blog we will restore to an alternate OpenShift Cluster on different hardware).

We will choose to restore everything, including cluster-scoped resources such as role bindings and custom resource definitions (CRDs).

For the restore type we will ‘Restore to a New Namespace’, giving it the name of ‘ppdm-restored’.

We have only a copy of a single PVC to restore, so we will select that copy. Click ‘Next’.

Skipping through a couple of screenshots until we get to the last step (everything will be covered in the video demo). Make sure everything looks OK and then click ‘Restore’.

Navigate over to the jobs pane and monitor the status of the restore.

You can drill a little deeper into the Job to monitor its progress. There is a bit going on behind the scenes in terms of the cproxy pod deployment, so be patient (this process will be the subject of another blog, when we dig into what actually happens in the background). This will also be a little clearer in the video.

Finally, after a couple of minutes, the PPDM console indicates that everything has completed successfully.

The ‘proof is in the pudding’, as they say, so let’s verify what has actually happened. Have I recovered my application workload/pod?
Verification
Back in the OpenShift Console, we can see that the ‘ppdm-restored’ Project has been created and that the pod ‘ppdm-test-pod’ has been re-created and deployed into this namespace.

Navigating into the Pod terminal itself, let’s see if I can see the text file that I created earlier. Let’s ‘cat’ the file to have a peek inside and make sure I’m telling the truth… sure enough, here is our original file and content.
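The same check can be done from a workstation rather than the Pod terminal, by running `cat` inside the restored Pod with `oc exec` and comparing the output with what was written before the deletion. Again, just a sketch: the function names are mine, the file name `demo.txt` and its absolute path are placeholders I have made up (use whatever you actually wrote to the mounted volume), and `oc` is assumed to be installed and logged in.

```python
import subprocess


def cat_command(namespace, pod, path):
    """Build the `oc exec` command that prints a file from inside a pod."""
    return ["oc", "exec", pod, "-n", namespace, "--", "cat", path]


def read_file_in_pod(namespace, pod, path):
    """Run `cat` in the pod and return the file content as text."""
    result = subprocess.run(
        cat_command(namespace, pod, path),
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def content_survived(expected, actual):
    """Compare contents, ignoring only trailing-newline differences."""
    return expected.rstrip("\n") == actual.rstrip("\n")


# Hypothetical usage (file name and path are placeholders, not from the demo):
#   restored = read_file_in_pod("ppdm-restored", "ppdm-test-pod",
#                               "/mnt/storage/demo.txt")
#   print(content_survived("text written before the deletion", restored))
```

Comparing against a copy of the text kept outside the cluster is the simplest option here; for larger files a checksum taken before the deletion would do the same job.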

What about our Persistent Volume Claim (PVC)? As we can see, this has also been recovered and re-attached to our Pod.

Double-clicking on ‘ppdm-claim-1’, we can see it is bound and that a net new Persistent Volume, ‘ocp11-c0857aec4d’, has been created.
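To make the "net new volume" point explicit, here is a small sketch that takes the restored PVC as a dict (for example, the parsed output of `oc get pvc ppdm-claim-1 -n ppdm-restored -o json`) and checks that it is bound to a Persistent Volume other than the one that was deleted. The function name is hypothetical; the field names `status.phase` and `spec.volumeName` are the standard Kubernetes PVC fields.

```python
def restored_to_new_volume(pvc, original_volume):
    """True if the PVC is Bound to a Persistent Volume other than the original."""
    bound = pvc.get("status", {}).get("phase") == "Bound"
    volume = pvc.get("spec", {}).get("volumeName")
    return bound and volume is not None and volume != original_volume


# Values taken from the walkthrough: the deleted PV and the restored PVC state.
restored_pvc = {
    "spec": {"volumeName": "ocp11-c0857aec4d"},
    "status": {"phase": "Bound"},
}
print(restored_to_new_volume(restored_pvc, "ocp11-f0e672d7f6"))
```

A different volume name is exactly what we expect here, since the original volume was removed from PowerStore when the project was deleted and the restore provisions a fresh one through CSI.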

And finally… back over on PowerStore, we can see our net new volume, provisioned via CSI, where our restored data has been written.
