1.1 What’s New
In the following descriptions, a term refers to any word or group of words that may be a language keyword, a user-supplied value, a literal, and so on. A term's exact meaning depends upon the context in which it is used.
• Italic font introduces a new term, typically in the sentence that defines it for the first time.
• Fixed-width (mono-spaced) font is used for terms that must be given literally such as SQL commands, specific table and column names used in the examples, programming language keywords, etc. For example, SELECT * FROM emp;
• Italic fixed-width font is used for terms for which the user must substitute values in actual usage. For example, DELETE FROM table_name;
• Square brackets [ ] denote that one or none of the enclosed terms may be substituted. For example, [ a | b ] means choose one of “a” or “b”, or neither of the two.
• Braces {} denote that exactly one of the enclosed alternatives must be specified. For example, { a | b } means that exactly one of “a” or “b” must be specified.
• Ellipses ... denote that the preceding term may be repeated. For example, [ a | b ] ... means that you may have the sequence “b a a b a”.
• Traditionally, a cluster is a single instance of Postgres managing multiple databases. In this document, the term cluster refers to a Failover Manager cluster. A Failover Manager cluster consists of a Master agent, one or more Standby agents, and an optional Witness agent that reside on servers in a cloud or on a traditional network and communicate using the JGroups toolkit.

Figure 2.1 illustrates a Failover Manager cluster that employs a virtual IP address. You can use a load balancer in place of a virtual IP address if you provide your own fencing script to re-configure the load balancer in the event of a failure. For more information about using Failover Manager with a virtual IP address, see Section 3.6. For more information about using a fencing script, see Section 3.5.1.
2.2 Prerequisites
• If you provide a value in the script.notification property, you can leave the user.email field blank; an SMTP server is not required. If an event occurs, Failover Manager invokes the script (if provided), and sends a notification email to any email addresses specified in the user.email parameter of the cluster properties file.

Unless specified with the -sourcenode option, a recovery.conf file is copied from a random standby node to the stopped master during switchover. You should ensure that the paths within the recovery.conf files on your standby nodes are consistent before performing a switchover. For more information about the -sourcenode option, please see Section 4.1.4.

Please note that Failover Manager does not support automatic reconfiguration of the standby databases after a failover if you use replication slots to manage your WAL segments. If you use replication slots, you should set the auto.reconfigure parameter to false, and manually reconfigure the standby servers in the event of a failover.

You must modify the pg_hba.conf file on the Master and Standby nodes, adding entries that allow communication between all of the nodes in the cluster; a sample appears at the end of this section. In the entries, efm specifies the name of a valid database user. By default, the pg_hba.conf file resides in the data directory, under your Postgres installation. After modifying the pg_hba.conf file, you must reload the configuration file on each node for the changes to take effect.

If a Master node reboots, Failover Manager may detect the database is down on the Master node and promote a Standby node to the role of Master. If this happens, the Failover Manager agent on the (rebooted) Master node will not get a chance to write the recovery.conf file; the rebooted Master node will return to the cluster as a second Master node. To prevent this, start the Failover Manager agent before starting the database server. The agent will start in idle mode, and check to see if there is already a master in the cluster. If there is a master node, the agent will verify that a recovery.conf file exists, and the database will not start as a second master.

If a Linux firewall (i.e., iptables) is enabled on the host of a Failover Manager node, you may need to add rules to the firewall configuration that allow tcp communication between the Failover Manager processes in the cluster. Opening a small range of ports (for example, 7800 through 7810) allows Failover Manager to connect via the port that corresponds to the port specified in the cluster properties file.

The database user specified in the efm.properties file must have sufficient privileges to invoke the Postgres functions that Failover Manager uses on its behalf (for example, the functions that report the transaction log location and recovery state).
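The snippets below sketch the pg_hba.conf entries, reload command, and firewall rule described above. The database name (edb), node addresses, and data directory are placeholders that you must replace with values from your own environment; efm is the database user discussed above.

# pg_hba.conf entries on the Master node, one per cluster node:
host edb efm 10.0.1.11/32 md5
host edb efm 10.0.1.12/32 md5
host edb efm 10.0.1.13/32 md5

Reload the configuration after editing, for example:

pg_ctl reload -D data_directory

A sample iptables rule that opens ports 7800 through 7810 for agent-to-agent tcp communication:

iptables -I INPUT -p tcp --dport 7800:7810 -j ACCEPT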
• The cluster_name.properties file contains parameters that specify connection properties and behaviors for your Failover Manager cluster. Modifications to property settings are applied when Failover Manager starts.

The following properties are the minimal properties required to configure a Failover Manager cluster (a sketch of a minimal properties file appears at the end of this section). If you are configuring a production system, please see Section 3.5.1 for a complete list of properties.

Only one of the database-control properties is needed. If you provide the service name, EFM will use a service command to control the database server when necessary; if you provide the location of the Postgres bin directory, EFM will use pg_ctl to control the database server.

When configuring a production cluster, the auto.allow.hosts and stable.nodes.file properties can be either true or false depending on your system configuration and usage. Set them both to true to simplify startup if you're configuring an EFM test cluster.

The cluster_name.nodes file is read at startup to tell an agent how to find the rest of the cluster or, in the case of the first node started, can be used to simplify authorization of subsequent nodes.

Please note that the Failover Manager agent will not verify the content of the efm.nodes file; the agent expects that some of the addresses in the file cannot be reached (e.g., that another agent hasn't been started yet). For more information about the efm.nodes file, see Section 3.5.2.

Copy the efm.properties and efm.nodes files to the /etc/edb/efm-3.3 directory on the other nodes in your sample cluster. After copying the files, change the file ownership so the files are owned by efm:efm. The efm.properties file can be the same on every node, except for the following properties:

• Modify the bind.address property to use the IP address of the local node.
• Set is.witness to true if the node is a witness node. If the node is a witness node, the properties relating to a local database installation will be ignored.

On any node, start the Failover Manager agent. The agent is named efm-3.3; you can use your platform-specific service command to control the service. For example, on a CentOS or RHEL 7.x host use the command:

systemctl start efm-3.3
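For reference, the minimal property set described above might look like the following sketch. Every value shown (user, password hash, port, service name, paths, addresses) is a placeholder that must match your own installation; Section 3.5.1 remains the authoritative reference.

db.user=efm
db.password.encrypted=516b36fb8031da17cfbc010f7d09359c
db.port=5444
db.database=edb
db.service.owner=enterprisedb
# Provide only one of db.service.name / db.bin:
db.service.name=edb-as-10
# db.bin=/usr/edb/as10/bin
db.recovery.conf.dir=/var/lib/edb/as10/data
user.email=admin@example.com
bind.address=10.0.1.11:7800
is.witness=false
auto.allow.hosts=true
stable.nodes.file=true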
1. Use the edb-repo package to create the repository configuration file. You can download and invoke the edb-repo file, or use rpm or yum to create the repository. Assume superuser privileges and use either rpm or yum to create the EnterpriseDB repository configuration file.
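For example, a sketch of the rpm invocation, assuming a downloaded package file named edb-repo-latest.noarch.rpm (the actual file name depends on the version you download):

rpm -Uvh edb-repo-latest.noarch.rpm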
2. Use your choice of editor to modify the repository configuration file, enabling the [enterprisedb-tools] and the [enterprisedb-dependencies] entries. To enable a repository, change the value of the enabled parameter to 1 and replace the user name and password placeholders in the baseurl specification with your user name and the repository password.

Then, you can use the yum install command to install Failover Manager; an example follows this step. When you install an RPM package that is signed by a source that is not recognized by your system, yum may ask for your permission to import the key to your local server. If prompted, and you are satisfied that the packages come from a trustworthy source, enter a y, and press Return to continue.

Failover Manager must be installed by root. During the installation process, the installer will also create a user named efm that has sufficient privileges to invoke scripts that control the Failover Manager service for clusters owned by enterprisedb or postgres.

If you are using Failover Manager to monitor a cluster owned by a user other than enterprisedb or postgres, see Section 3.4, Extending Failover Manager Permissions.
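For example, to install Failover Manager version 3.3, assuming the package follows the edb-efm33 naming convention used by EFM 3.x packages (verify the package name against the repository before installing):

yum install edb-efm33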
1. Modify the cluster properties file on each node. For detailed information about modifying the cluster properties file, see Section 3.5.1.
2. Modify the cluster members file on each node. For detailed information about the cluster members file, see Section 3.5.2.
3.1.1 Installation Locations
To install Failover Manager, you must also have credentials that allow access to the EnterpriseDB repository. To request credentials for the repository, visit the EnterpriseDB Advanced Downloads page.

The following steps will walk you through using the EnterpriseDB apt repository to install Failover Manager. When using the commands, replace the username and password with the credentials provided by EnterpriseDB.

sh -c 'echo "deb https://username:password@apt.enterprisedb.com/$(lsb_release -cs)-edb/ $(lsb_release -cs) main" > /etc/apt/sources.list.d/edb-$(lsb_release -cs).list'
To install Failover Manager, you must also have credentials that allow access to the EnterpriseDB repository. To request credentials for the repository, visit the Advanced Downloads page.

You can use the zypper package manager to install a Failover Manager agent on an SLES 12 host. zypper will attempt to satisfy package dependencies as it installs a package, but requires access to specific repositories that are not hosted at EnterpriseDB.

The commands create the repository configuration files in the /etc/zypp/repos.d directory. Then, use the following command to refresh the metadata on your SLES host to include the EnterpriseDB repository:

zypper refresh

When prompted, provide credentials for the repository, and specify a to always trust the provided key, and update the metadata to include the EnterpriseDB repository.

You must also install SUSEConnect to register the host with SUSE, allowing access to SUSE repositories:

zypper install SUSEConnect
SUSEConnect -r registration_number -e user_id
SUSEConnect -p PackageHub/12/x86_64
SUSEConnect -p sle-sdk/12/x86_64
During the Failover Manager installation, the installer creates a user named efm. efm does not have sufficient privileges to perform management functions that are normally limited to the database owner or operating system superuser.
• When performing management functions requiring database superuser privileges, efm invokes the efm_db_functions script.
• When performing management functions requiring operating system superuser privileges, efm invokes the efm_root_functions script.
• The efm_db_functions or efm_root_functions scripts perform management functions on behalf of the efm user.

The sudoers file contains entries that allow the user efm to control the Failover Manager service for clusters owned by postgres or enterprisedb. You can modify a copy of the sudoers file to grant permission to manage Postgres clusters owned by other users to efm.

# Copyright EnterpriseDB Corporation, 2014-2018. All Rights
# Reserved.
#
# Do not edit this file. Changes to the file may be overwritten
# during an upgrade.
#
# This file assumes you are running your efm cluster as user
# 'efm'. If not, then you will need to copy this file.
# Allow user 'efm' to sudo efm_db_functions as either 'postgres'
# or 'enterprisedb'. If you run your db service under a
# non-default account, you will need to copy this file to grant
# the proper permissions and specify the account in your efm
# cluster properties file by changing the 'db.service.owner'
# property.
efm ALL=(postgres) NOPASSWD: /usr/edb/efm-3.3/bin/efm_db_functions
efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-3.3/bin/efm_db_functions
# Allow user 'efm' to sudo efm_root_functions as 'root' to
# write/delete the PID file, validate the db.service.owner
# property, etc.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-3.3/bin/efm_root_functions
# Allow user 'efm' to sudo efm_address as root for VIP tasks.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-3.3/bin/efm_address
# relax tty requirement for user 'efm'
Defaults:efm !requiretty

If you are using Failover Manager to monitor clusters that are owned by users other than postgres or enterprisedb, make a copy of the efm-33 file, and modify the content to allow the user to access the efm_functions script to manage their clusters.

If an agent cannot start because of permission problems, make sure the default /etc/sudoers file contains the following line at the end of the file:

#includedir /etc/sudoers.d

By default, Failover Manager uses sudo to securely manage access to system functionality. If you choose to configure Failover Manager to run without sudo access, please note that root access is still required for some operations (for example, installing the Failover Manager RPM package and performing setup tasks).

To run Failover Manager without sudo, you must select a database process owner that will have privileges to perform management functions on behalf of Failover Manager. The user could be the default database superuser (for example, enterprisedb or postgres) or another privileged user. After selecting the user:
1. Use the selected user's identity to copy the cluster properties template and the cluster members template into a directory to which the user has write access:

su - enterprisedb

cp /etc/edb/efm-3.3/efm.properties.in directory/cluster_name.properties

cp /etc/edb/efm-3.3/efm.nodes.in directory/cluster_name.nodes

Then, modify the cluster properties file, providing the name of the user in the db.service.owner property. You must also ensure that the db.service.name property is blank; without sudo, you cannot run services without root access.

Use the following command to start or stop an agent:

/usr/edb/efm-3.3/bin/runefm.sh start|stop directory/cluster_name.properties

Where directory/cluster_name.properties specifies the full path and name of the cluster properties file. Please note that the full path to the properties file must be provided whenever the non-default user is controlling agents or using the efm script.

Failover Manager uses a binary named manage-vip that resides in /usr/edb/efm-3.3/bin/secure/ to perform VIP management operations without sudo privileges. This script uses setuid to acquire the privileges needed to manage Virtual IP addresses.
• For security reasons, we recommend against modifying the access privileges of the /usr/edb/efm-3.3/bin/secure/ directory or the manage-vip script.
The efm.properties file contains the properties of the individual node on which it resides, while the efm.nodes file contains a list of the current Failover Manager cluster members. By default, the installer places the files in the /etc/edb/efm-3.3 directory.

The Failover Manager installer creates a file template for the cluster properties file named efm.properties.in in the /etc/edb/efm-3.3 directory. After completing the Failover Manager installation, you must make a working copy of the template before modifying the file contents. For example, the following command copies the efm.properties.in file, creating a properties file named efm.properties:

cp /etc/edb/efm-3.3/efm.properties.in /etc/edb/efm-3.3/efm.properties

Please note: By default, Failover Manager expects the cluster properties file to be named efm.properties. If you name the properties file something other than efm.properties, you must modify the service script or unit file to instruct Failover Manager to use a different name.

After creating the cluster properties file, add (or modify) configuration parameter values as required. For detailed information about each property, see Section 3.5.1.1.

The property files are owned by root. The Failover Manager service script expects to find the files in the /etc/edb/efm-3.3 directory. If you move the property file to another location, you must create a symbolic link that specifies the new location.

3.5.1.1 Specifying Cluster Properties

Use the properties in the efm.properties file to specify connection, administrative, and operational details for Failover Manager.

The db.user specified must have sufficient privileges to invoke selected PostgreSQL commands on behalf of Failover Manager. For more information, please see Section 2.2.

Use the db.service.owner property to specify the name of the operating system user that owns the cluster that is being managed by Failover Manager. This property is not required on a dedicated witness node.

Specify the name of the database service in the db.service.name property if you use the service or systemctl command when starting or stopping the service. You should use the same service control mechanism (pg_ctl, service, or systemctl) each time you start or stop the database service. If you use the pg_ctl program to control the service, specify the location of the pg_ctl program in the db.bin property.

Use the db.recovery.conf.dir property to specify the location to which a recovery file will be written on the Master node of the cluster, and a trigger file is written on a Standby. This property is not required on a dedicated witness node.

# Specify the location of the db recovery.conf file on the node.
# On a standby node, the trigger file location is read from the
# file in this directory. After a failover, the recovery.conf
# files on remaining standbys are changed to point to the new
# master db (a copy of the original is made first). On a master
# node, a recovery.conf file will be written during failover and
# promotion to ensure that the master node can not be restarted
# as the master database.

Use the jdbc.sslmode property to instruct Failover Manager to use SSL connections; by default, SSL is disabled.

# Use the jdbc.sslmode property to enable ssl for EFM
# connections. Setting this property to anything but 'disable'
# will force the agents to use 'ssl=true' for all JDBC database
# connections (to both local and remote databases).
# Valid values are:
#
# disable - Do not use ssl for connections.
# verify-ca - EFM will perform CA verification before allowing
# the certificate.
# require - Verification will not be performed on the server
# certificate.
jdbc.sslmode=disable

Use the user.email property to specify an email address (or multiple email addresses) that will receive any notifications sent by Failover Manager.

Use the notification.level property to specify the minimum severity level at which Failover Manager will send user notifications or when a notification script is called. For a complete list of notifications, please see Section 7.

Use the script.notification property to specify the path to a user-supplied script that acts as a notification service; the script will be passed a message subject and a message body. The script will be invoked each time Failover Manager generates a user notification.

# Absolute path to script run for user notifications.
#
# This is an optional user-supplied script that can be used for
# notifications instead of email. This is required if not using
# email notifications. Either/both can be used. The script will
# be passed two parameters: the message subject and the message
# body.

The bind.address property specifies the IP address and port number of the agent on the current node of the Failover Manager cluster.

# This property specifies the ip address and port that jgroups
# will bind to on this node. The value is of the form
# <ip>:<port>.
# Note that the port specified here is used for communicating
# with other nodes, and is not the same as the admin.port below,
# used only to communicate with the local agent to send control
# signals.
# For example, <provide_your_ip_address_here>:7800

Use the admin.port property to specify a port on which Failover Manager listens for administrative commands.

Set the is.witness property to true to indicate that the current node is a witness node. If is.witness is true, the local agent will not check to see if a local database is running.

The Postgres pg_is_in_recovery() function is a boolean function that reports the recovery state of a database. The function returns true if the database is in recovery, or false if the database is not in recovery. When an agent starts, it connects to the local database and invokes the pg_is_in_recovery() function. If the server responds true, the agent assumes the role of standby; if the server responds false, the agent assumes the role of master. If there is no local database, the agent will assume an idle state.

The local.period property specifies the number of seconds between attempts to contact the database server.
The local.timeout property specifies how long an agent will wait for a positive response from the local database server.
The local.timeout.final property specifies how long an agent will wait after the final attempt to contact the database server on the current node. If a response is not received from the database within the number of seconds specified by the local.timeout.final property, the database is assumed to have failed.

For example, given the default values of these properties, a check of the local database happens once every 10 seconds. If an attempt to contact the local database does not come back positive within 60 seconds, Failover Manager makes a final attempt to contact the database. If a response is not received within 10 seconds, Failover Manager declares database failure and notifies the administrator listed in the user.email property. These properties are not required on a dedicated witness node.

Use the remote.timeout property to specify how many seconds an agent waits for a response from a remote database server (i.e., how long a standby agent waits to verify that the master database is actually down before performing failover).

Use the node.timeout property to specify the number of seconds that an agent will wait for a response from a node when determining if a node has failed. The node.timeout property value specifies a timeout value for agent-to-agent communication; other timeout properties in the cluster properties file specify values for agent-to-database communication.

# The total amount of time in seconds to wait before determining
# that a node has failed or been disconnected from this node.
#
# The value of this property must be the same across all agents.

Use the stop.isolated.master property to instruct Failover Manager to shut down the database if a master agent detects that it is isolated. When true (the default), Failover Manager will stop the database before invoking the script specified in the script.master.isolated property.

Use the stop.failed.master property to instruct Failover Manager to attempt to shut down a master database if it cannot reach the database. If true, Failover Manager will run the script specified in the script.db.failure property after attempting to shut down the database.

Use the pingServer property to specify the IP address of a server that Failover Manager can use to confirm that network connectivity is not a problem.

Use the auto.allow.hosts property to instruct the server to use the addresses specified in the .nodes file of the first node started to update the allowed host list. Enabling this property (setting auto.allow.hosts to true) can simplify cluster start-up.

Use the stable.nodes.file property to instruct the server to not rewrite the nodes file when a node joins or leaves the cluster. This property is most useful in clusters with unchanging IP addresses.

The db.reuse.connection.count property allows the administrator to specify the number of times Failover Manager reuses the same database connection to check the database health. The default value is 0, indicating that Failover Manager will create a fresh connection each time. This property is not required on a dedicated witness node.

Use the auto.failover property to enable or disable automatic failover when the master fails.

# Whether or not failover will happen automatically when the master
# fails. Set to false if you want to receive the failover notifications
# but not have EFM actually perform the failover steps.
# The value of this property must be the same across all agents.

Use the auto.reconfigure property to instruct Failover Manager to enable or disable automatic reconfiguration of remaining Standby servers after the primary standby is promoted to Master. Set the property to true to enable automatic reconfiguration (the default) or false to disable automatic reconfiguration. This property is not required on a dedicated witness node.

# After a standby is promoted, failover manager will attempt to
# update the remaining standbys to use the new master. Failover
# manager will back up recovery.conf, change the host parameter
# to point to the new master db.

Please note: If you are using replication slots to manage your WAL segments, automatic reconfiguration is not supported; you should set auto.reconfigure to false. For more information, see Section 2.2.

Use the promotable property to indicate that a node should not be promoted. To override the setting, use the efm set-priority command at runtime; for more information about the efm set-priority command, see Section 5.3.

Use the minimum.standbys property to specify the minimum number of standby nodes that will be retained on a cluster; if the standby count drops to the specified minimum, a replica node will not be promoted in the event of a failure of the master node.

Use the recovery.check.period property to specify the number of seconds that Failover Manager will wait before checking to see if a database is out of recovery.

Use the auto.resume.period property to specify the number of seconds (after a monitored database fails, and an agent has assumed an idle state, or when starting in IDLE mode) during which an agent will attempt to resume monitoring that database.

Failover Manager provides support for clusters that use a virtual IP. If your cluster uses a virtual IP, provide the host name or IP address in the virtualIp property; specify the corresponding prefix in the virtualIp.prefix property. If virtualIp is left blank, virtual IP support is disabled.

The specified virtual IP address is assigned only to the master node of the cluster. If you specify virtualIp.single=true, the same VIP address will be used on the new master in the event of a failover. Specify a value of false to provide a unique IP address for each node of the cluster.

# These properties specify the IP and prefix length that will be
# remapped during failover. If you do not use a VIP as part of
# your failover solution, leave the virtualIp property blank to
# disable Failover Manager support for VIP processing (assigning,
# releasing, testing reachability, etc).
#
# If you specify a VIP, the interface and prefix are required.
#
# If you specify a host name, it will be resolved to an IP address
# when acquiring or releasing the VIP. If the host name resolves
# to more than one IP address, there is no way to predict which
# address Failover Manager will use.
#
# By default, the virtualIp and virtualIp.prefix values must be
# the same across all agents. If you set virtualIp.single to
# false, you can specify unique values for virtualIp and
# virtualIp.prefix on each node.
#
# If you are using an IPv4 address, the virtualIp.interface value
# should not contain a secondary virtual ip id (do not include
# ":1", etc).Set the check.vip.before.promotion property to false to indicate that Failover Manager will not check to see if a VIP is in use before assigning it to a a new master in the event of a failure. Please note that this could result in multiple nodes broadcasting on the same VIP address; unless the master node is isolated or can be shut down via another process, you should set this property to true.Provide a script name after the script.load.balancer.attach property to identify a script that will be invoked when a node should be attached to the load balancer. Use the script.load.balancer.detach property to specify the name of a script that will be invoked when a node should be detached from the load balancer. Include the %h placeholder to represent the IP address of the node that is being attached or removed from the cluster.# Absolute path to load balancer scripts
# The attach script is called when a node should be attached to
# the load balancer, for example after a promotion. The detach
# script is called when a node should be removed, for example
# when a database has failed or is about to be stopped. Use %h to
# represent the IP/hostname of the node that is being
# attached/detached.
#
# Example:
# script.load.balancer.attach=/somepath/attachscript %h

script.fence specifies the path to an optional user-supplied script that will be invoked during the promotion of a standby node to master node.

# absolute path to fencing script run during promotion
#
# This is an optional user-supplied script that will be run
# during failover on the standby database node. If left blank,
# no action will be taken. If specified, EFM will execute this
# script before promoting the standby.
#
# Parameters can be passed into this script for the failed master
# and new primary node addresses. Use %p for new primary and %f
# for failed master. On a node that has just been promoted, %p
# should be the same as the node's efm binding address.
#
# Example:
# script.fence=/somepath/myscript %p %f
#
# NOTE: FAILOVER WILL NOT OCCUR IF THIS SCRIPT RETURNS A NON-ZERO EXIT CODE.

Use the script.post.promotion property to specify the path to an optional user-supplied script that will be invoked after a standby node has been promoted to master.

# Absolute path to fencing script run after promotion
#
# This is an optional user-supplied script that will be run after
# failover on the standby node after it has been promoted and
# is no longer in recovery. The exit code from this script has
# no effect on failover manager, but will be included in a
# notification sent after the script executes.
#
# Parameters can be passed into this script for the failed master
# and new primary node addresses. Use %p for new primary and %f
# for failed master. On a node that has just been promoted, %p
# should be the same as the node's efm binding address.
#
# Example:
# script.post.promotion=/somepath/myscript %f %p

Use the script.resumed property to specify an optional path to a user-supplied script that will be invoked when an agent resumes monitoring of a database.

Use the script.db.failure property to specify the complete path to an optional user-supplied script that Failover Manager will invoke if an agent detects that the database that it monitors has failed.

Use the script.master.isolated property to specify the complete path to an optional user-supplied script that Failover Manager will invoke if the agent monitoring the master database detects that the master is isolated from the majority of the Failover Manager cluster. This script is called immediately after the VIP is released (if a VIP is in use).

Use the script.remote.pre.promotion property to specify the path and name of a script that will be invoked on any agent nodes not involved in the promotion when a node is about to promote its database to master.

Use the script.remote.post.promotion property to specify the path and name of a script that will be invoked on any non-master nodes after a promotion occurs.

Use the script.custom.monitor property to provide the name and location of an optional script that will be invoked at regular intervals (specified in seconds by the custom.monitor.interval property).

Use custom.monitor.timeout to specify the maximum time that the script will be allowed to run; if script execution does not complete within the time specified, Failover Manager will send a notification.

Set custom.monitor.safe.mode to true to instruct Failover Manager to report non-zero exit codes from the script, but not promote a standby as a result of an exit code.

Use the sudo.command property to specify a command that will be invoked by Failover Manager when performing tasks that require extended permissions. Use this option to include command options that might be specific to your system authentication.

Use the sudo.user.command property to specify a command that will be invoked by Failover Manager when executing commands that will be performed by the database owner.

Use the lock.dir property to specify an alternate location for the Failover Manager lock file; the file prevents Failover Manager from starting multiple (potentially orphaned) agents for a single cluster on the node.

Use the log.dir property to specify the location to which agent log files will be written; Failover Manager will attempt to create the directory if the directory does not exist.

After enabling the UDP or TCP protocol on a Failover Manager host, you can enable logging to syslog. Use the syslog.protocol parameter to specify the protocol type (UDP or TCP) and the syslog.port parameter to specify the listener port of the syslog host. The syslog.facility value may be used as an identifier for the process that created the entry; the value must be between LOCAL0 and LOCAL7.

Use the file.log.enabled and syslog.enabled properties to specify the type of logging that you wish to implement. Set file.log.enabled to true to enable logging to a file; enable the UDP protocol or TCP protocol and set syslog.enabled to true to enable logging to syslog. You can enable logging to both a file and syslog.

Use the jgroups.loglevel and efm.loglevel parameters to specify the level of detail logged by Failover Manager. The default value is INFO. For more information about logging, see Section 6, Controlling Logging.

Use the jvm.options property to pass JVM-related configuration information.
The default setting specifies the amount of memory that the Failover Manager agent will be allowed to use.
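For example, the following entry caps the agent's Java heap; the 128 MB value is shown here only as an illustrative limit, not a documented default:

jvm.options=-Xmx128m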
3.5.1.2 Encrypting Your Database Password

Failover Manager requires you to encrypt your database password before including it in the cluster properties file. Use the efm utility (located in the /usr/edb/efm-3.3/bin directory) to encrypt the password. When encrypting a password, you can either pass the password on the command line when you invoke the utility, or use the EFMPASS environment variable.

If you include the --from-env option, you must export the value you wish to encrypt before invoking the encryption utility. For example:

export EFMPASS=password

If you do not include the --from-env option, Failover Manager will prompt you to enter the database password twice before generating an encrypted password for you to place in your cluster property file. When the utility shares the encrypted password, copy and paste the encrypted password into the cluster property files.

Please note: Many Java vendors ship their version of Java with full-strength encryption included, but not enabled due to export restrictions. If you encounter an error that refers to an illegal key size when attempting to encrypt the database password, you should download and enable a Java Cryptography Extension (JCE) that provides an unlimited policy for your platform.

The following example demonstrates using the encrypt utility to encrypt a password for the acctg cluster:

# efm encrypt acctg
This utility will generate an encrypted password for you to place in your EFM cluster property file:
/etc/edb/efm-3.3/acctg.properties
Please enter the password and hit enter:
Please enter the password again to confirm:
The encrypted password is: 516b36fb8031da17cfbc010f7d09359c
Please paste this into your acctg.properties file
db.password.encrypted=516b36fb8031da17cfbc010f7d09359c

If an error occurs when starting the Failover Manager service, please see the startup log (located in /var/log/efm-3.3/startup-efm.log) for more information.

The following example demonstrates using the --from-env option when encrypting a password. Before invoking the efm encrypt command, set the value of EFMPASS to the password (1safepassword):

export EFMPASS=1safepassword
# efm encrypt acctg --from-env

The encrypted password (7ceecd8965fa7a5c330eaa9e43696f83) is returned as a text value; when using a script, you can check the exit code of the command to confirm that the command succeeded. A successful execution returns 0.
3.5.2 The Cluster Members File

Each node in a Failover Manager cluster has a cluster members file. When an agent starts, it uses the file to locate other cluster members. The Failover Manager installer creates a file template for the cluster members file named efm.nodes.in in the /etc/edb/efm-3.3 directory. After completing the Failover Manager installation, you must make a working copy of the template:

cp /etc/edb/efm-3.3/efm.nodes.in /etc/edb/efm-3.3/efm.nodes

By default, Failover Manager expects the cluster members file to be named efm.nodes. If you name the cluster members file something other than efm.nodes, you must modify the Failover Manager service script to instruct Failover Manager to use the new name.

The cluster members file on the first node started can be empty; this node will become the Membership Coordinator. On each subsequent node, the cluster members file must contain the address and port number of the Membership Coordinator. Each entry in the cluster members file must be listed in an address:port format, with multiple entries separated by white space.

The Membership Coordinator will update the contents of the efm.nodes file to match the current members of the cluster. As agents join or leave the cluster, the efm.nodes files on other agents are updated to reflect the current cluster membership. If you invoke the efm stop-cluster command, Failover Manager does not modify the file.

If the Membership Coordinator leaves the cluster, another node will assume the role. You can use the efm cluster-status command to find the address of the Membership Coordinator. If a node joins or leaves a cluster while an agent is down, you must manually ensure that the file includes at least the current Membership Coordinator.

If you know the IP addresses and ports of the nodes that will be joining the cluster, you can include the addresses in the cluster members file at any time. At startup, any addresses that do not identify cluster members will be ignored unless the auto.allow.hosts property (in the cluster properties file) is set to true. For more information, see Section 4.1.2.

If the stable.nodes.file property is set to true, the Membership Coordinator will not update the .nodes file when cluster members join or leave the cluster; this behavior is most useful when the IP addresses of cluster members do not change often. For information about modifying cluster properties, see Section 3.5.1.1.
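For illustration, an efm.nodes file that lists two existing members in the address:port format described above might contain a single line such as the following (the addresses are placeholders drawn from the status examples later in this document):

172.19.13.105:7800 172.19.13.113:7800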
3.6 Using Failover Manager with Virtual IP Addresses

Failover Manager uses the efm_address script to assign or release a virtual IP address. By default, the script resides in the /usr/edb/efm-3.3/bin directory.

In the script's arguments, interface_name matches the name specified in the virtualIp.interface property in the cluster properties file, and IPv4_addr or IPv6_addr matches the name specified in the virtualIp property in the cluster properties file.

You must invoke the efm_address script as the root user. The efm user is created during the installation, and is granted privileges in the sudoers file to run the efm_address script. For more information about the sudoers file, see Section 3.4, Extending Failover Manager Permissions.

Please note: the virtualIp.prefix specifies the number of significant bits in the virtual IP address.

When instructed to ping the VIP from a node, use the command defined by the pingServerCommand property.

Run the efm_address add4 command on the Master node to assign the VIP, and then confirm the assignment with the ip address command. To release the address on the master node, use the efm_address del command, and confirm that the address has been released with ip address.
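A sketch of the sequence described above, assuming an interface named eth0 and a VIP of 10.0.1.50 with a 24-bit prefix (all values are placeholders, and the exact argument order should be confirmed against the efm_address script installed on your system):

# /usr/edb/efm-3.3/bin/efm_address add4 eth0 10.0.1.50/24
# ip address

# /usr/edb/efm-3.3/bin/efm_address del eth0 10.0.1.50/24
# ip address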
By default, some of the efm commands described in Section 5.3 must be invoked by efm or by an OS superuser; an administrator can selectively permit users to invoke these commands by adding the user to the efm group.

If the cluster properties file for the node specifies that is.witness is true, the node will start as a Witness node.

If the node is not a dedicated Witness node, Failover Manager will connect to the local database and invoke the pg_is_in_recovery() function. If the server responds false, the agent assumes the node is a Master node, and assigns a virtual IP address to the node (if applicable). If the server responds true, the Failover Manager agent assumes that the node is a Standby server. If the server does not respond, the agent will start in an idle state.
1. Unless auto.allow.hosts is set to true, use the efm allow-node command to add the IP address of the new node to the Failover Manager allowed node host list. When invoking the command, specify the cluster name and the IP address of the new node:

efm allow-node cluster_name ip_address

For more information about using the efm allow-node command or controlling a Failover Manager service, see Section 5.

2. Install a Failover Manager agent and configure the cluster properties file on the new node. For more information about modifying the properties file, see Section 3.5.1.
3. When the new node joins the cluster, Failover Manager will send a notification to the administrator email provided in the user.email property, and/or will invoke the specified notification script.

If your Failover Manager cluster includes more than one Standby server, you can use the efm set-priority command to influence the promotion priority of a Standby node. Invoke the command on any existing member of the Failover Manager cluster, and specify a priority value after the IP address of the member. For example, the following command instructs Failover Manager that the acctg cluster member that is monitoring 10.0.1.9:7800 is the primary Standby (1):

efm set-priority acctg 10.0.1.9 1

You can invoke efm promote on any node of a Failover Manager cluster to start a manual promotion of a Standby database to Master database:

efm promote cluster_name

Manual promotion should only be performed during a maintenance window for your database cluster. If you do not have an up-to-date Standby database available, you will be prompted before continuing. To start a manual promotion, assume the identity of efm or the OS superuser, and invoke the command.

Include the -switchover option to reconfigure the original Master as a Standby. If you include the -switchover keyword, the cluster must include a master node and at least one standby, and the nodes must be in sync.

Include the -sourcenode keyword to specify the node from which the recovery.conf file will be copied to the master.
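For example, a switchover for the acctg cluster that pulls the recovery.conf file from a specific standby might resemble the following sketch (the node address is a placeholder):

efm promote acctg -switchover -sourcenode 10.0.1.9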
• During a manual promotion, the Master agent releases the virtual IP address before creating a recovery.conf file in the directory specified by the db.recovery.conf.dir property. The Master agent remains running, and assumes a status of Idle.

Please note that this command instructs the service to ignore the value specified in the auto.failover parameter in the cluster properties file.

When you stop an agent, Failover Manager will remove the node's address from the cluster members list on all of the running nodes of the cluster, but will not remove the address from the Failover Manager Allowed node host list. Until you invoke the efm disallow-node command (removing the node's address from the Allowed node host list), you can use the service efm-3.3 start command to restart the node at a later date without first running the efm allow-node command again.

To stop a Failover Manager cluster, connect to any node of a Failover Manager cluster, assume the identity of efm or the OS superuser, and invoke the command:

efm stop-cluster cluster_name

The command will cause all Failover Manager agents to exit. Terminating the Failover Manager agents completely disables all failover functionality.

Please Note: when you invoke the efm stop-cluster command, all authorized node information is lost from the Allowed node host list.

The efm disallow-node command removes the IP address of a node from the Failover Manager Allowed node host list. Assume the identity of efm or the OS superuser on any existing node (that is currently part of the running cluster), and invoke the efm disallow-node command, specifying the cluster name and the IP address of the node:

efm disallow-node cluster_name ip_address

The efm disallow-node command will not stop a running agent; the service will continue to run on the node until you stop the agent (for information about controlling the agent, see Section 5). If the agent or cluster is subsequently stopped, the node will not be allowed to rejoin the cluster, and will be removed from the failover priority list (and will be ineligible for promotion).

After invoking the efm disallow-node command, you must use the efm allow-node command to add the node to the cluster again. For more information about using the efm utility, see Section 5.3.
You can use either the Failover Manager efm cluster-status command or the PEM Client graphical interface to check the current status of a monitored node of a Failover Manager cluster. The cluster-status command returns a report that contains information about the status of the Failover Manager cluster. To invoke the command, enter:

efm cluster-status efm
Cluster Status: efm
Agent Type Address Agent DB VIP
-----------------------------------------------------
Witness 172.19.12.170 UP N/A
Master 172.19.13.105 UP UP 172.19.13.107*
Standby 172.19.13.113 UP UP 172.19.13.106
Standby 172.19.14.106 UP UP 172.19.13.108
Allowed node host list:
172.19.12.170 172.19.13.113 172.19.13.105 172.19.14.106
Membership coordinator: 172.19.12.170
Standby priority host list:
172.19.13.113 172.19.14.106
Promote Status:
DB Type Address XLog Loc Info
-------------------------------------------------------
Master 172.19.13.105 0/31000140
Standby 172.19.13.113 0/31000140
Standby 172.19.14.106 0/31000140
Standby database(s) in sync with master. It is safe to promote.

The Cluster Status section provides an overview of the status of the agents that reside on each node of the cluster:

Cluster Status: efm
Agent Type Address Agent DB VIP
-----------------------------------------------------
Witness 172.19.12.170 UP N/A
Master 172.19.13.105 UP UP 172.19.13.107*
Standby 172.19.13.113 UP UP 172.19.13.106
Standby 172.19.14.106 UP UP 172.19.13.108
The Allowed node host list and Standby priority host list provide an easy way to tell which nodes are allowed to join the cluster, and the promotion order of the nodes. The IP address of the Membership coordinator is also displayed in the report:

Allowed node host list:
172.19.12.170 172.19.13.113 172.19.13.105 172.19.14.106
Membership coordinator: 172.19.12.170
Standby priority host list:
172.19.13.113 172.19.14.106

The Promote Status section of the report is the result of a direct query from the node on which you are invoking the cluster-status command to each database in the cluster; the query also returns the transaction log location of each database.

Promote Status:
DB Type Address XLog Loc Info
-------------------------------------------------------
Master 172.19.13.105 0/31000140
Standby 172.19.13.113 0/31000140
Standby 172.19.14.106 0/31000140
Standby database(s) in sync with master. It is safe to promote.

If a database is down (or if the database has been restarted, but the resume command has not yet been invoked), the state of the agent that resides on that host will be Idle. If an agent is idle, the cluster status report will include a summary of the condition of the idle node:

Agent Type Address Agent DB VIP
-----------------------------------------------------
Idle 172.19.18.105 UP UP 172.19.13.105
The Streaming Replication Analysis Dashboard (shown in Figure 4.1) displays statistical information about activity for any monitored server on which streaming replication is enabled. The dashboard header identifies the status of the monitored server (either Replication Master or Replication Slave), and displays the date and time that the server was last started, the date and time that the page was last updated, and a current count of triggered alerts for the server.
• To run a Failover Manager agent for both of these database clusters, use the efm.properties.in template to create two properties files. Each cluster properties file must have a unique name. For this example, we create acctg.properties and sales.properties to match the acctg and sales database clusters.

Within each cluster properties file, the db.port parameter should specify a unique value for each cluster, while the db.user and db.database parameters may have the same value or a unique value.

When creating the cluster properties file for each cluster, the db.recovery.conf.dir parameters must also specify values that are unique for each respective database cluster.

The virtualIp parameter value is determined by the virtual IP addresses being used and may or may not be the same for both acctg.properties and sales.properties.

After creating the acctg.properties and sales.properties files, create a service script or unit file for each cluster that points to the respective property files; this step is platform specific. If you are using RHEL 6.x or CentOS 6.x, see Section 4.3.1; if you are using RHEL 7.x or CentOS 7.x, see Section 4.3.2.

4.3.1 RHEL 6.x or CentOS 6.x

If you are using RHEL 6.x or CentOS 6.x, you should copy the efm-3.3 service script to a new file with a name that is unique for each cluster, and then use the new service scripts to start the agents.

4.3.2 RHEL 7.x or CentOS 7.x

If you are using RHEL 7.x or CentOS 7.x, you should copy the efm-3.3 unit file to a new file with a name that is unique for each cluster. For example, if you have two clusters (named acctg and sales), the unit file names might be efm-acctg.service and efm-sales.service.

Then, edit the CLUSTER variable within each unit file, changing the specified cluster name from efm to the new cluster name. You must also update the value of the PIDfile parameter to specify the new cluster name. Then, use the new service scripts to start the agents; a sketch of these edits follows this section.
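The following sketch illustrates the cluster-specific copies described above, using hypothetical efm-acctg names; all paths and values are illustrative and must match your own installation.

On RHEL/CentOS 6.x, copy the service script and start the agent under the new name:

cp /etc/init.d/efm-3.3 /etc/init.d/efm-acctg
service efm-acctg start

On RHEL/CentOS 7.x, copy the unit file, then edit the cluster name and PID file inside the copy before starting the agent:

cp /usr/lib/systemd/system/efm-3.3.service /usr/lib/systemd/system/efm-acctg.service

Within efm-acctg.service, the edited values might read:

CLUSTER=acctg
PIDfile=/var/run/efm-3.3/acctg.pid

systemctl start efm-acctg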
• A configuration file named efm.properties that contains the properties used by the Failover Manager service. Each node of a replication scenario must contain a properties file that provides information about the node.
• A cluster members file named efm.nodes that contains a list of the cluster members. Each node of a replication scenario must contain a cluster members list.

The commands that control the Failover Manager service are platform-specific; for information about controlling Failover Manager on a RHEL 6.x or CentOS 6.x host, see Section 5.1. If you are using RHEL 7.x or CentOS 7.x, see Section 5.2.
On RHEL 6.x and CentOS 6.x, Failover Manager runs as a Linux service named (by default) efm-3.3 that is located in /etc/init.d. Each database cluster monitored by Failover Manager will run a copy of the service on each node of the replication cluster.

Use the service commands shown after this section to control a Failover Manager agent that resides on a RHEL 6.x or CentOS 6.x host.

The start command starts the Failover Manager agent on the current node. The local Failover Manager agent monitors the local database and communicates with Failover Manager on the other nodes. You can start the nodes in a Failover Manager cluster in any order.

The status command returns the status of the Failover Manager agent on which it is invoked. You can invoke the status command on any node to instruct Failover Manager to return status information.
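For example (assuming the default service name efm-3.3):

service efm-3.3 start
service efm-3.3 stop
service efm-3.3 status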
On RHEL 7.x and CentOS 7.x, Failover Manager runs as a Linux service named (by default) efm-3.3.service that is located in /usr/lib/systemd/system. Each database cluster monitored by Failover Manager will run a copy of the service on each node of the replication cluster.

Use the systemctl commands shown after this section to control a Failover Manager agent that resides on a RHEL 7.x or CentOS 7.x host.

The start command starts the Failover Manager agent on the current node. The local Failover Manager agent monitors the local database and communicates with Failover Manager on the other nodes. You can start the nodes in a Failover Manager cluster in any order.

The status command returns the status of the Failover Manager agent on which it is invoked. You can invoke the status command on any node to instruct Failover Manager to return status and server startup information.
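For example (assuming the default unit name efm-3.3):

systemctl start efm-3.3
systemctl stop efm-3.3
systemctl status efm-3.3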
Failover Manager provides the efm utility to assist with cluster management. The RPM installer adds the utility to the /usr/edb/efm-3.3/bin directory when you install Failover Manager.

efm allow-node cluster_name

Invoke the efm allow-node command to allow the specified node to join the cluster. When invoking the command, provide the name of the cluster and the IP address of the joining node.

efm cluster-status cluster_name

Invoke the efm cluster-status command to display the status of a Failover Manager cluster. For more information about the cluster status report, see Section 4.2.1.

efm cluster-status-json cluster_name

Invoke the efm cluster-status-json command to display the status of a Failover Manager cluster in json format. While the format of the displayed information is different than the display generated by the efm cluster-status command, the information source is the same.

{
"nodes": {
"172.16.144.176": {
"type": "Witness",
"agent": "UP",
"db": "N\/A",
"vip": "",
"vip_active": false
},
"172.16.144.177": {
"type": "Master",
"agent": "UP",
"db": "UP",
"vip": "",
"vip_active": false,
"xlog": "2\/77000220",
"xloginfo": ""
},
"172.16.144.180": {
"type": "Standby",
"agent": "UP",
"db": "UP",
"vip": "",
"vip_active": false,
"xlog": "2\/77000220",
"xloginfo": ""
}
},
"allowednodes": [
"172.16.144.177",
"172.16.144.160",
"172.16.144.180",
"172.16.144.176"
],
"membershipcoordinator": "172.16.144.177",
"failoverpriority": [
"172.16.144.180"
],
"minimumstandbys": 0,
"missingnodes": [],
"messages": []
}

efm disallow-node cluster_name ip_address

Invoke the efm disallow-node command to remove the specified node from the allowed hosts list, and prevent the node from joining a cluster. Provide the name of the cluster and the IP address of the node when calling the efm disallow-node command. This command must be invoked by efm, a member of the efm group, or root.

efm encrypt cluster_name [--from-env]

Invoke the efm encrypt command to encrypt the database password before including the password in the cluster properties file. Include the --from-env option to instruct Failover Manager to use the value specified in the EFMPASS environment variable, and execute without user input. For more information, see Section 3.5.1.2.

efm promote cluster_name [-switchover [-sourcenode address] [-quiet]]

Invoke the efm promote command to promote a standby node. Include the -switchover clause to promote a standby node, and reconfigure a master node as a standby node. Include the -sourcenode keyword, and specify a node address to indicate the node whose recovery.conf file will be copied to the old master node (making it a standby). Include the -quiet keyword to suppress notifications during the switchover process. Please note that this command instructs the service to ignore the value specified in the auto.failover parameter in the cluster properties file.

efm resume cluster_name

Invoke the efm resume command to resume monitoring a previously stopped database. This command must be invoked by efm, a member of the efm group, or root.

efm set-priority cluster_name ip_address priority

Invoke the efm set-priority command to assign a failover priority to a standby node. The value specifies the order in which the new node will be used in the event of a failover. This command must be invoked by efm, a member of the efm group, or root.

priority is an integer value of 1 to n, where n is the number of standby nodes in the list. Specify a value of 1 to indicate that the new node is the primary standby, and will be the first node promoted in the event of a failover. A priority of 0 instructs Failover Manager to not promote the standby.

efm stop-cluster cluster_name

Invoke the efm stop-cluster command to stop Failover Manager on all nodes. This command instructs Failover Manager to connect to each node on the cluster and instruct the existing members to shut down. The command has no effect on running databases, but when the command completes, there is no failover protection in place.

Please note: when you invoke the efm stop-cluster command, all authorized node information is removed from the Allowed node host list.

efm upgrade-conf cluster_name [-source directory]

Invoke the efm upgrade-conf command to copy the configuration files from an existing Failover Manager installation, and add parameters required by a Failover Manager 3.3 installation. Provide the name of the previous cluster when invoking the utility. This command must be invoked with root privileges. If you are upgrading from a Failover Manager configuration that does not use sudo, include the -source flag and specify the name of the directory in which the configuration files reside when invoking upgrade-conf.
Failover Manager writes and stores one log file per agent and one startup log per agent in /var/log/efm-3.3 (the log file names include cluster_name, the name of the cluster).

You can control the level of detail written to the agent log by modifying the jgroups.loglevel and efm.loglevel parameters in the cluster properties file:

# Logging levels for JGroups and EFM.
# Valid values are: TRACE, DEBUG, INFO, WARN, ERROR
# Default value: INFO
# It is not necessary to increase these values unless debugging a
# specific issue. If nodes are not discovering each other at
# startup, increasing the jgroups level to DEBUG will show
# information about the TCP connection attempts that may help
# diagnose the connection failures.

For example, if you set the efm.loglevel parameter to WARN, Failover Manager will only log messages at the WARN level and above (WARN and ERROR).
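A sketch of the corresponding property entries, assuming you want JGroups connection debugging while leaving agent logging at its default:

jgroups.loglevel=DEBUG
efm.loglevel=INFO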
By default, Failover Manager log files are rotated daily, compressed, and stored for a week. You can modify the file rotation schedule by changing settings in the log rotation file (/etc/logrotate.d/efm-3.3). For more information about modifying the log rotation schedule, consult the logrotate man page.

To allow a connection to syslog, edit the /etc/rsyslog.conf file and uncomment the protocol you wish to use. You must also ensure that the UDPServerRun or TCPServerRun entry associated with the protocol includes the port number to which log entries will be sent. After modifying the syslog configuration file, restart the rsyslog service to enable the connections.

After modifying the rsyslog.conf file on the Failover Manager host, you must modify the Failover Manager properties to enable logging. Use your choice of editor to modify the properties file (/etc/edb/efm-3.3/efm.properties.in), specifying the type of logging that you wish to implement. You must also specify syslog details for your system: use the syslog.protocol parameter to specify the protocol type (UDP or TCP), and the syslog.port parameter to specify the listener port of the syslog host. The syslog.facility value may be used as an identifier for the process that created the entry; the value must be between LOCAL0 and LOCAL7. For more information, consult the syslog man page.
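A minimal sketch, assuming UDP syslog on the default port; the rsyslog listener directives, restart command, and property values shown are illustrative:

# /etc/rsyslog.conf -- uncomment the UDP listener entries
$ModLoad imudp
$UDPServerRun 514

# restart the service to enable the connections
systemctl restart rsyslog

# /etc/edb/efm-3.3/efm.properties.in -- syslog details named above
syslog.protocol=UDP
syslog.port=514
syslog.facility=LOCAL1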
Failover Manager will send e-mail notifications and/or invoke a notification script when a notable event occurs that affects the cluster. If you have configured Failover Manager to send an email notification, you must have an SMTP server running on port 25 on each node of the cluster. Use the notification properties in the cluster properties file to configure notification behavior for Failover Manager. The body of each notification identifies the affected node and cluster; for example:

EFM node: 10.0.1.11
Cluster name: acctg
Database name: postgres
VIP: ip_address (Active|Inactive)
Database health is not being monitored.

Each notification is assigned a severity level:

• INFO indicates an informational message about the agent and does not require any manual intervention (for example, Failover Manager has started or stopped).
• WARNING indicates that an event has happened that requires the administrator to check on the system (for example, failover has occurred).
• SEVERE indicates that a serious event has happened and requires the immediate attention of the administrator (for example, failover was attempted, but was unable to complete).

The severity level designates the urgency of the notification. A notification with a severity level of SEVERE requires user attention immediately, while a notification with a severity level of INFO will call your attention to operational information about your cluster that does not require user action. Notification severity levels are not related to logging levels; all notifications are sent regardless of the log level detail specified in the configuration file. You can use the notification.level property to specify the minimum severity level that will trigger a notification; for more information, see Section 3.5.1.1.
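For example, to suppress INFO notifications and receive only WARNING and SEVERE notifications, the property named above might be set as follows:

notification.level=WARNING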
Notifications sent by Failover Manager include the following subjects and messages:

• Agent exited for cluster cluster_name
• Starting auto resume check for cluster cluster_name: The agent on this node will check every auto.resume.period seconds to see if it can resume monitoring the failed database. The cluster should be checked during this time, and the agent stopped if the database will not be started again. See the agent log for more details.
• No standby agent in cluster for cluster cluster_name
• Standby agent failed for cluster cluster_name: A standby agent on cluster_name has left the cluster, but the coordinator has detected that the standby database is still running.
• Standby database failed for cluster cluster_name
• Standby agent cannot reach database for cluster cluster_name
• This node is no longer connected to the majority of the cluster cluster_name. Because this node is part of a subset of the cluster, failover will not be attempted. Current nodes that are visible are: node_address
• Witness failure for cluster cluster_name
• One or more nodes isolated from network for cluster cluster_name
• The standby EFM agent tried to promote itself, but detected that the master DB is still running on node_address. This usually indicates that the master EFM agent has exited. Failover has NOT occurred.
• The standby EFM agent tried to promote itself, but could not detect whether or not the master DB is still running on node_address. Failover has NOT occurred.
• The standby EFM agent tried to promote itself, but could not because the virtual IP address (VIP_address) appears to still be assigned to another node. Promoting under these circumstances could cause data corruption. Failover has NOT occurred.
• The standby EFM agent tried to promote itself, but could not because the well-known server (server_address) could not be reached. This usually indicates a network issue that has separated the standby agent from the other agents. Failover has NOT occurred.
• An agent has detected that the master database is no longer available in cluster cluster_name, but there are no standby nodes available for failover.
• A potential failover situation was detected for cluster cluster_name. Automatic failover has been disabled for this cluster, so manual intervention is required.
• Failover has completed on cluster cluster_name
• The lock file for cluster cluster_name has been removed from: path_name on node node_address. This lock prevents multiple agents from monitoring the same cluster on the same node. Please restore this file to prevent accidentally starting another agent for the cluster.
• A recovery.conf file for cluster cluster_name has been found at: path_name on master node node_address. This may be problematic should you attempt to restart the DB on this node.
• The path provided for the trigger_file parameter in the recovery.conf file is not writable by the db_service_owner user. Failover Manager will not be able to promote the database if needed.
• Promotion has not occurred for cluster cluster_name
• Standby not reconfigured after failover in cluster cluster_name: The auto.reconfigure property has been set to false for this node. The node has not been reconfigured to follow the new master node after a failover.
• Could not resume replay for cluster cluster_name: Could not resume replay for the standby being promoted; manual intervention may be required. Error: error_description. This error is returned if the server encounters an error when invoking replay during the promotion of a standby.
• Your remote.timeout value (value) is higher than your local.timeout value (value). If the local database takes too long to respond, the local agent could assume that the database has failed though other agents can connect. While this will not cause a failover, it could force the local agent to stop monitoring, leaving you without failover protection.
• No standbys available for promotion in cluster cluster_name: The current number of standby nodes in the cluster has dropped to the minimum number: number. There cannot be a failover unless another standby node is added or made promotable.
• Custom monitor timeout for cluster cluster_name
• Custom monitor 'safe mode' failure for cluster cluster_name: The following custom monitor script has failed, but is being run in "safe mode": script_name. Output: script_results
• Unable to connect to DB on node_address
• The master agent can no longer reach the local database running at node_address. Other nodes are able to access the database remotely, so the master will not release the VIP and/or create a recovery.conf file. The master agent will become idle until the resume command is run to resume monitoring the database.
• Post-promotion script script_name failed to execute successfully. Exit Value: exit_code. Results: script_results
• Post-database failure script script_name failed to execute successfully. Exit Value: exit_code. Results: script_results
• Master isolation script script_name failed to execute successfully. Exit Value: exit_code. Results: script_results
• The trigger file file_name could not be created on node. Could not promote standby. Error details: message_details
• There was an error creating the recovery.conf file on master node node_address during promotion. Promotion has continued, but requires manual intervention to ensure that the old master node cannot be restarted. Error details: message_details
• An unexpected error has occurred for cluster cluster_name
• Master database being fenced off for cluster cluster_name: The master database has been isolated from the majority of the cluster. The cluster is telling the master agent at ip_address to fence off the master database to prevent two masters when the rest of the failover manager cluster promotes a standby.
• Could not assign VIP to node node_address
• Agent is timing out for cluster cluster_name
• Resume timed out for cluster cluster_name
• Internal state mismatch for cluster cluster_name
• The master database is no longer available in cluster cluster_name, but there are not enough standby nodes available for failover.
• Database in wrong state on node_address
• Database connection failure for cluster cluster_name
• Standby custom monitor failure for cluster cluster_name: The following custom monitor script has failed on a standby node. The agent will stop monitoring the local database. Script output: script_results
• Master custom monitor failure for cluster cluster_name: Script output: script_results
• Load balancer attach script script_name failed to execute successfully. Exit Value: exit_code. Results: script_results
• Load balancer detach script script_name failed to execute successfully. Exit Value: exit_code. Results: script_results

Please note: In addition to sending notices to the administrative email address, all notifications are recorded in the cluster log file (/var/log/efm-3.3/cluster_name.log).
Failover Manager supports a very specific and limited set of failover scenarios; the conditions under which failover can occur are described below.

Failover Manager also supports a no auto-failover mode for situations in which you want Failover Manager to monitor and detect failover conditions, but not perform an automatic failover to a Standby. In this mode, a notification is sent to the administrator when failover conditions are met. To disable automatic failover, modify the cluster properties file, setting the auto.failover parameter to false (see Section 3.5.1.1).
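For example, the relevant entry in the cluster properties file would be:

# disable automatic failover; Failover Manager will notify but not promote
auto.failover=false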
If no agent can reach the virtual IP address or the database server, Failover Manager starts the failover process. The Standby agent on the most up-to-date node runs a fencing script (if applicable), promotes the Standby database to Master, and assigns the virtual IP address to the Standby node. Any additional Standby nodes are configured to replicate from the new Master unless auto.reconfigure is set to false. If applicable, the agent runs a post-promotion script.
1. If the cluster has more than one Standby node, use the efm set-priority command to set the node's failover priority to 1.
2. Invoke the efm promote -switchover command to promote the node to its original role of Master node. For more information about the command, please see Section 5.3.
After returning the Standby database to a healthy state, invoke the efm resume command to return the Standby to the cluster.
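Taken together, a minimal sketch of the sequence; the cluster name acctg and the node address are illustrative:

# return the node to the cluster, make it first in the failover list,
# then switch roles so it resumes its original role of Master
efm resume acctg
efm set-priority acctg 172.16.144.177 1
efm promote acctg -switchover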
If this scenario has occurred because the Master has been isolated from the network, the Master agent will detect the isolation, release the virtual IP address, and create the recovery.conf file. Failover Manager will then perform the previously listed steps on the remaining nodes of the cluster.
Note: If there is only one Master and one Standby remaining, there is no failover protection in the case of a Master node failure. In the case of a Master database failure, the Master and Standby agents can agree that the database failed and proceed with failover.
1. Install Failover Manager 3.3 on each node of the cluster.
2. After installing Failover Manager, invoke the efm upgrade-conf utility to create the .properties and .nodes files for Failover Manager 3.3. The Failover Manager installer adds the upgrade utility (efm upgrade-conf) to the /usr/edb/efm-3.3/bin directory. To invoke the utility, assume root privileges, and invoke the command:

efm upgrade-conf cluster_name

The efm upgrade-conf utility locates the .properties and .nodes files of pre-existing clusters and copies the parameter values to a new configuration file for use by Failover Manager. The utility saves the updated copy of the configuration files in the /etc/edb/efm-3.3 directory.
3. Modify the .properties and .nodes files for EFM 3.3, specifying any new preferences; Version 3.3 of Failover Manager adds new configuration properties. Use your choice of editor to specify these and any additional properties in the properties file (located in the /etc/edb/efm-3.3 directory) before starting the service for that node. For detailed information about property settings, see Section 3.5.
5. Start the new Failover Manager service (efm-3.3) on each node of the cluster. For more information about starting the service, see Section 4.1.1.

If you are using a Failover Manager configuration without sudo, include the -source flag and specify the name of the directory in which the configuration files reside when invoking upgrade-conf. If the directory is not the configuration default directory, the upgraded files will be created in the directory from which the upgrade-conf command was invoked. For more information, see Section 3.4.1.

The following example demonstrates invoking the upgrade utility to create the .properties and .nodes files for a Failover Manager installation.
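A minimal sketch, assuming the cluster is named acctg:

# as root
/usr/edb/efm-3.3/bin/efm upgrade-conf acctg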
When your updates are complete, you can use the efm set-priority command to add the old Master to the front of the standby list, and then perform a switchover to return the cluster to its original state. For more information about efm set-priority, see Section 5.3.
In the example that follows, we will use a .pgpass file to enable md5 authentication for the replication user – this may or may not be the safest authentication method for your environment. For more information about the supported authentication options, please see the PostgreSQL core documentation at:
• The Master node resides on 146.148.46.44
• The Standby node resides on 107.178.217.178
• The replication user name is edbrepuser.

Connect to the Master node of the replication scenario, and modify the pg_hba.conf file (located in the data directory under your Postgres installation), adding connection information for the replication user (in our example, edbrepuser).

Modify the postgresql.conf file (located in the data directory, under your Postgres installation), adding the required replication parameters and values to the end of the file.

With your choice of editor, create a .pgpass file in the home directory of the enterprisedb user. The .pgpass file holds the password of the replication user in plain-text form; if you are using a .pgpass file, you should ensure that only trusted users have access to it. The server will enforce restrictive permissions on the .pgpass file; use the following command to set the file permissions:

chmod 600 .pgpass

You must stop the database server before replacing the data directory on the Standby node with the data directory of the Master node. After deleting the existing data directory, move into the bin directory and use the pg_basebackup utility to copy the data directory of the Master node to the Standby:

cd /opt/edb/as10/bin
./pg_basebackup -R -D /opt/edb/as10/data --host=146.148.46.44 --port=5444 --username=edbrepuser --password

The call to pg_basebackup specifies the IP address of the Master node and the name of the replication user created on the Master node. For more information about the options available with the pg_basebackup utility, see the PostgreSQL core documentation at:

After copying the data directory, change ownership of the directory to the database superuser (enterprisedb).

With your choice of editor, create a file named recovery.conf (in the /opt/edb/as10/data directory) that includes:

standby_mode = on
primary_conninfo = 'host=146.148.46.44 port=5444 user=edbrepuser sslmode=prefer sslcompression=1 krbsrvname=postgres'
trigger_file = '/opt/edb/as10/data/mytrigger'
restore_command = '/bin/true'
recovery_target_timeline = 'latest'

The primary_conninfo parameter specifies connection information for the replication user on the Master node of the replication scenario.

Modify the postgresql.conf file (located in the data directory, under the Postgres installation), specifying the required values at the end of the file.

If you connect to the Standby with the psql client and query the pg_is_in_recovery() function, the server will reply that it is in recovery.
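A minimal sketch of the exchange; the t result indicates that the server is in recovery, that is, acting as a standby:

SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)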
1. Place a server.crt and server.key file in the data directory (under your Advanced Server installation). You can purchase a certificate signed by an authority, or create your own self-signed certificate. For information about creating a self-signed certificate, see the PostgreSQL core documentation at:
2.
3. Modify the pg_hba.conf file on each node of the Failover Manager cluster, adding the following line to the beginning of the file:

hostnossl    all    all    all    reject

The line instructs the server to reject any connections that are not using SSL authentication; this enforces SSL authentication for any connecting clients. For information about modifying the pg_hba.conf file, see the PostgreSQL core documentation at:
4. After placing the server.crt and server.key file in the data directory, convert the certificate to a form that Java understands; a sketch of the conversion and import commands follows this procedure. $JAVA_HOME is the home directory of your Java installation. The certificate from each database server must be imported into the trusted certificates file of each agent. Note that the location of the cacerts file may vary on each system; for more information, visit:

You can use the keytool command to review a list of the available certificates or retrieve information about a specific certificate. For more information about using the keytool command, enter:

keytool -help
6.
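The conversion and import referenced in step 4 might look like the following sketch; the DER conversion via openssl, the alias efmserver, and the cacerts path are illustrative assumptions:

# convert the PEM certificate to DER form so keytool can read it
openssl x509 -in server.crt -out server.der -outform der

# import the certificate into the Java trusted certificates file
keytool -import -keystore $JAVA_HOME/lib/security/cacerts -alias efmserver -file server.der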
12 Inquiries