Earlier in this series, I discussed Microsoft SQL database high availability, but when deploying a fully distributed vRealize Automation environment, you also need to protect the Postgres database on the vRealize Automation appliance.

In the past, when you wanted high availability for the vCloud Automation Center/vRealize Automation internal database, you needed an external instance of the VMware vFabric Postgres vRA database. As of September 1, 2014, VMware vFabric Postgres reached End of Availability and is no longer available as a standalone product.

As of vRealize Automation 6.x, VMware developed a way to run the database instance located in the VMware vRealize Automation appliance in high availability (HA) mode, without incurring additional licensing costs.

To set up Postgres database high availability and configure streaming replication, follow the steps below:

Prerequisites

  • The appliances are freshly deployed and no configuration has been performed;
  • The replication channel is not encrypted;
  • A DNS entry exists for the database VIP, for example: dbVIP.domain.local.
    The IP address of this DNS entry should point to the primary appliance;
  • Two VMware vRealize Automation appliances, freshly deployed and resolvable through DNS;
  • Administrator access to VMware vSphere vCenter, in order to add new disks to the vRealize Automation Appliance.
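
The DNS prerequisite can be checked from either appliance before you start. The following is only a minimal sketch, assuming getent and awk are available on the appliance; the function names, the host name, and the IP address 192.0.2.10 are placeholders for illustration and do not come from VMware.

```shell
#!/bin/bash
# Hypothetical pre-flight DNS check. Function names, host name, and IP are
# placeholders; substitute the values for your environment.

# Print the first address the system resolver returns for a name
# (prints nothing if the name does not resolve).
resolve_ip() {
    getent hosts "$1" | awk 'NR==1 {print $1}'
}

# Succeed only when NAME resolves to EXPECTED_IP.
check_dns() {
    local name="$1" expected="$2" actual
    actual="$(resolve_ip "$name")" || true
    if [ "$actual" = "$expected" ]; then
        echo "OK: $name -> $actual"
    else
        echo "MISMATCH: $name -> ${actual:-unresolved} (expected $expected)"
        return 1
    fi
}

# Example (assumed values):
#   check_dns dbVIP.domain.local 192.0.2.10
```

Run the check against the database VIP name with the IP address of the primary appliance; a MISMATCH line means the DNS entry still needs to be corrected.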

Configuring the database VIP

To configure the database VIP, perform these steps on both vRealize Automation appliances:
  1. Power off the vRealize Automation appliance;
  2. Add a 20 GB disk to the appliance through VMware vCenter Server;
  3. Power on the appliance;
  4. Open the VAMI (https://[vRA-IP or DNS name]:5480) and verify that SSH is enabled;
  5. Download the 2108923_dbCluster.zip file;
  6. Copy the 2108923_dbCluster.zip file to the appliance;
  7. Connect to the appliance using SSH as root;
  8. Extract the tar file by running this command:
    # tar xvf 2108923_dbCluster.tar
  9. Locate the disk added in Step 2 by running this command:
    # parted -l

    Error:
    /dev/sdd: unrecognized disk label
    Sector size (logical/physical): 512B/512B
    Note: For a fresh VMware vRealize Automation 6.2 appliance deployment, this should be /dev/sdd.
    This value varies depending on the version of VMware vRealize Automation appliance deployed.
  10. Run the configureDisk.sh script to configure the disk added in Step 2:
    # ./configureDisk.sh /dev/sdd
  11. Run the pgClusterSetup.sh script to prepare the appliance database for clustering:
    # ./pgClusterSetup.sh [-d] db_fqdn [-w] db_pass [-r] replication_password [-p] postgres_password
    [-d] Database load balancer fully qualified domain name
    [-w] Database password (will set password to this value)
    [-r] Replication password (Optional: will use Database password if not set)
    [-p] Postgres password (Optional: will use Database password if not set)


    Note: Replace the example password changeMe1! with your own, and pass the FQDN of the database VIP DNS entry to -d. For example:
    # ./pgClusterSetup.sh -d dbCluster.domain.local -w changeMe1! -r changeMe1! -p changeMe1!

    Updating vRealize Automation to utilize database cluster fully qualified domain name
    Finished
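
Because the device name of the new disk varies between appliance versions, it can help to extract it from the parted output instead of reading it by eye. The helper below is a rough sketch under that assumption; find_unlabeled_disks is a made-up name, and it only matches the "unrecognized disk label" message shown in the parted output above.

```shell
#!/bin/bash
# Hypothetical helper: list devices that parted reports as having no disk label.
# It reads parted output on stdin, so it can be tried without touching real disks.
find_unlabeled_disks() {
    grep -o '/dev/[A-Za-z0-9]*: unrecognized disk label' | cut -d: -f1
}

# Example on the appliance (2>&1 in case parted writes the message to stderr):
#   parted -l 2>&1 | find_unlabeled_disks
# If exactly one device is printed, that is the candidate to pass to
# configureDisk.sh. Review the result before running the script.
```

If the helper prints more than one device, or none, fall back to inspecting the full parted output manually.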

Configuring the database replication on appliance B (replica)

  1. Connect to appliance B using SSH as root;
  2. Configure replication as user postgres:
    ./run_as_replica -h primary_appliance_fqdn -b -W -U replicate
    [-U] The user who will perform replication. In this article, this user is replicate
    [-W] Prompt for the password of the user performing replication
    [-b] Take a base backup from the master. This option destroys the current contents of the data directory
    [-h] Hostname of the master database server. Port 5432 is assumed
    # su - postgres
    /opt/vmware/vpostgres/current/share/run_as_replica -h app1.domain.local -b -W -U replicate

    1. Enter the replicate user's password when prompted;
    2. Type yes after verifying the thumbprint of the primary machine;
    3. Enter the postgres user's password;
    4. Type yes when prompted with: Type yes to enable WAL archiving on primary;
    5. Type yes when prompted with: WARNING: the base backup operation will replace the current contents of the data directory. Please confirm by typing yes:
  3. Perform the steps in the Validate replication section to validate replication.
    Note: Appliance A is the primary (master) and appliance B is the replica.
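
Since run_as_replica assumes port 5432 on the primary, a quick reachability test from appliance B can rule out firewall or DNS problems before the base backup starts. This is a minimal sketch, assuming bash and the coreutils timeout command are present on the appliance; port_open is an illustrative name. It only tests that a TCP connection can be opened, not that replication is permitted.

```shell
#!/bin/bash
# Hypothetical reachability check; port_open is an illustrative name.
# Uses bash's /dev/tcp pseudo-device and the coreutils timeout command.
# Usage: port_open HOST [PORT]   (PORT defaults to 5432)
port_open() {
    local host="$1" port="${2:-5432}"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (host name taken from the examples in this article):
#   port_open app1.domain.local && echo "primary reachable on 5432"
```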

Performing a test failover (appliance A to appliance B)

  1. Connect to appliance A using SSH as root.
  2. Stop the vpostgres service by running this command:
    # service vpostgres stop
    Stopping VMware vPostgres: Last login: Mon Apr 27 19:49:26 UTC 2015 on pts/0
    ok
  3. Connect to appliance B using SSH as root.
  4. Promote the replica database to master as the postgres user by running these commands:
    # su - postgres
    /opt/vmware/vpostgres/current/share/promote_replica_to_primary
    server promoting
  5. Connect to appliance A using SSH as root.
  6. Configure database replication as user postgres by running these commands:
    # su - postgres
    /opt/vmware/vpostgres/current/share/run_as_replica -h app2.domain.local -b -W -U replicate

    1. Enter the replicate user's password when prompted.
    2. Type yes after verifying the thumbprint of the primary machine when prompted.
    3. Enter the postgres user's password when prompted.
    4. Type yes when prompted with: WARNING: the base backup operation will replace the current contents of the data directory. Please confirm by typing yes:
  7. Perform the steps in the Validate replication section to validate replication.
    Note: Appliance B is now the primary (master) and appliance A is the replica.

Performing a test failback (appliance B to appliance A)

  1. Connect to appliance B using SSH as root.
  2. Stop the vpostgres service by running this command:
    # service vpostgres stop
    Stopping VMware vPostgres: Last login: Mon Apr 27 19:49:26 UTC 2015 on pts/0
    ok
  3. Connect to appliance A using SSH as root.
  4. Promote the replica database to master as user postgres by running these commands:
    # su - postgres
    /opt/vmware/vpostgres/current/share/promote_replica_to_primary
    server promoting
  5. Connect to appliance B using SSH as root.
  6. Configure database replication as user postgres by running these commands:
    # su - postgres
    /opt/vmware/vpostgres/current/share/run_as_replica -h app1.domain.local -b -W -U replicate

    1. Enter the replicate user's password when prompted.
    2. Type yes when prompted with: WARNING: the base backup operation will replace the current contents of the data directory. Please confirm by typing yes:
  7. Appliance A is now the primary (master) and appliance B is the replica.
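
The test failover and the test failback are the same procedure with the roles of the two appliances swapped. As a memory aid, the sketch below prints the commands for a given direction without executing anything; print_swap_plan is a hypothetical helper name, and the host names are the example names used in this article.

```shell
#!/bin/bash
# Hypothetical dry-run helper: prints the role-swap steps, executes nothing.
# Usage: print_swap_plan OLD_PRIMARY NEW_PRIMARY
print_swap_plan() {
    local old="$1" new="$2"
    local share=/opt/vmware/vpostgres/current/share
    # Each line is prefixed with the appliance on which the command is run.
    cat <<EOF
[$old] # service vpostgres stop
[$new] # su - postgres
[$new] $share/promote_replica_to_primary
[$old] # su - postgres
[$old] $share/run_as_replica -h $new -b -W -U replicate
EOF
}

# Failover (A to B):  print_swap_plan app1.domain.local app2.domain.local
# Failback (B to A):  print_swap_plan app2.domain.local app1.domain.local
```

Printing the plan rather than running it keeps the interactive prompts of run_as_replica (passwords, thumbprint, base backup confirmation) in the operator's hands.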

 

Performing failover

Note: VMware vRealize Automation will be offline from step 1 until step 7 has been completed.

  1. If the appliance hosting the master database is still functional, perform steps 2 and 3.
    If the appliance hosting the master database is no longer accessible, go to step 4.
  2. Connect to appliance A, containing the primary (master) database, using SSH as root.
  3. Stop the vpostgres service by running this command:
    # service vpostgres stop
    Stopping VMware vPostgres: Last login: Mon Apr 27 19:49:26 UTC 2015 on pts/0
    ok
  4. Promote the replica database to master database.
  5. Connect to appliance B, containing the replica database, using SSH as root.
  6. Promote the replica database to master as user postgres by running these commands:
    # su - postgres
    /opt/vmware/vpostgres/current/share/promote_replica_to_primary
    server promoting
  7. Modify the IP address of the DNS Entry to point at the new primary appliance.
    On all VMware vRealize Automation appliances, log in as root and run the command:
    # service network restart
  8. Rebuild the replica database on appliance A.
  9. Connect to appliance A using SSH as root.
  10. Configure database replication as user postgres using these commands (appliance B is now the primary, so point -h at appliance B):
    # su - postgres
    /opt/vmware/vpostgres/current/share/run_as_replica -h app2.domain.local -b -W -U replicate

    1. Enter the replicate user's password when prompted.
    2. Type yes when prompted with:WARNING: the base backup operation will replace the current contents of the data directory. Please confirm by typing yes:
  11. Appliance B is now the primary (master) and appliance A is the replica.
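
The decision in step 1 (stop vpostgres on the old primary only if that appliance is still reachable) can be sketched as follows. This is an illustration under stated assumptions, not part of the VMware procedure: the function name is made up, and it assumes key-based SSH access as root so that no password prompt is needed.

```shell
#!/bin/bash
# Hypothetical sketch of the decision in step 1; the function name and the
# SSH options are assumptions, not part of the VMware procedure.
stop_old_primary_if_reachable() {
    local host="$1"
    # BatchMode avoids hanging on a password prompt; key-based root SSH is assumed.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true 2>/dev/null; then
        # Old primary is reachable: stop its database before promoting the replica.
        ssh -o BatchMode=yes "root@$host" 'service vpostgres stop'
    else
        echo "skip: $host is not reachable, go straight to promoting the replica"
    fi
}

# Example (host name taken from the examples in this article):
#   stop_old_primary_if_reachable app1.domain.local
```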

Validate replication

  1. Connect to the appliance with the primary (master) database using SSH.
  2. Verify that the WAL processes are running. You should see a wal writer and a wal sender process in the output of this command:
    # ps -ef |grep wal
    postgres 4784 4779 0 21:42 ? 00:00:00 postgres: wal writer process
    postgres 20901 4779 0 22:49 ? 00:00:00 postgres: wal sender process replicate 10.26.36.64(55887) streaming 0/70000B8
  3. Validate that the master is ready for read-write connections by running these commands:
    # su - postgres
    /opt/vmware/vpostgres/current/bin/psql vcac
    SELECT pg_is_in_recovery();
    You see output similar to:
    vcac=# SELECT pg_is_in_recovery();
     pg_is_in_recovery
    -------------------
     f
    (1 row)
  4. Quit psql by running this command:
    \q
  5. Connect to the appliance with the replica database using SSH.
  6. Validate that the replica is read-only by running these commands:
    # su - postgres
    /opt/vmware/vpostgres/current/bin/psql vcac
    SELECT pg_is_in_recovery();
    The command should return:
    vcac=# SELECT pg_is_in_recovery();
     pg_is_in_recovery
    -------------------
     t
    (1 row)
  7. Quit psql by running this command:
    \q
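
The validation steps above can also be run non-interactively. The sketch below is one possible way to do that, assuming it is run as the postgres user on the appliance; the function names are illustrative, while the psql path and the vcac database name are the ones used in this article.

```shell
#!/bin/bash
# Hypothetical helpers for the checks above; function names are illustrative.
PSQL=/opt/vmware/vpostgres/current/bin/psql   # path used in this article

# Map the value returned by pg_is_in_recovery() to a role name.
role_from_recovery_flag() {
    case "$1" in
        f) echo "primary" ;;
        t) echo "replica" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Run as the postgres user on an appliance: prints primary or replica.
# -A (unaligned) and -t (tuples only) make psql print just "t" or "f".
db_role() {
    role_from_recovery_flag "$("$PSQL" -At -c 'SELECT pg_is_in_recovery();' vcac)"
}

# Succeed if ps output on stdin contains a WAL sender in streaming state.
# The [r] keeps the pattern from matching the grep process itself.
has_streaming_sender() {
    grep -q 'wal sende[r] process.* streaming'
}

# Example on the primary:
#   db_role                          # expected to print: primary
#   ps -ef | has_streaming_sender && echo "replica is streaming"
```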