"EPAS 11 Guide"

EDB Postgres Failover Manager Guide

EDB Postgres Failover Manager Version 3.4

January 23, 2019

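1 Introduction

EDB Postgres Failover Manager (EFM) is a high-availability module from EnterpriseDB that enables a Postgres Master node to fail over automatically to a Standby node in the event of a software or hardware failure on the Master.

This guide provides information about installing, configuring, and using Failover Manager 3.4.

This document uses the term Postgres to mean either the PostgreSQL or the EDB Postgres Advanced Server database. For more information about using EDB Postgres products, please visit the EnterpriseDB website at: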

http://www.enterprisedb.com/documentation

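1.1 What's New

The following changes have been made to EDB Postgres Failover Manager to create version 3.4:

• Failover Manager now allows you to use the master.shutdown.as.failure property to indicate that a shutdown of the agent on the master node should be treated as a failure. A notification has been added that alerts you when the master.shutdown.as.failure property is set to true. For more information, see Section 3.5.1.

• Agent exit notifications are now sent at the WARNING level. This helps draw attention to cases in which an agent fails to restart (for example, after a machine reboot). For more information, see Section 7.

• Failover Manager now retries the check that confirms a VIP is not in use during promotion. For more information, see Section 3.6.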

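1.2 Typographical Conventions Used in this Guide

Certain typographical conventions are used in this manual to clarify the meaning and usage of various commands, statements, programs, examples, etc. This section provides a summary of these conventions.

In the following descriptions, a term refers to any word or group of words that are language keywords, user-supplied values, literals, etc. A term's exact meaning depends upon the context in which it is used.

• Italic font introduces a new term, typically in the sentence that defines it for the first time.

• Fixed-width (mono-spaced) font is used for terms that must be given literally, such as SQL commands, specific table and column names used in the examples, programming language keywords, etc. For example, SELECT * FROM emp;

• Italic fixed-width font is used for terms for which the user must substitute values in actual usage. For example, DELETE FROM table_name;

• A vertical pipe | denotes a choice between the terms on either side of the pipe. A vertical pipe is used to separate two or more alternative terms within square brackets (optional choices) or braces (one mandatory choice).

• Square brackets [ ] denote that one or none of the enclosed terms may be substituted. For example, [ a | b ] means choose one of "a" or "b" or neither of the two.

• Braces { } denote that exactly one of the enclosed alternatives must be specified. For example, { a | b } means exactly one of "a" or "b" must be specified.

• Ellipses ... denote that the preceding term may be repeated. For example, [ a | b ] ... means that you may have the sequence "b a a b a".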

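2 Failover Manager - Overview

An EDB Postgres Failover Manager (EFM) cluster is made up of Failover Manager processes that reside on the following hosts on a network:

• A Master node - The Master node is the primary database server that is servicing database clients.

• One or more Standby nodes - A Standby node is a streaming replication server associated with the Master node.

• A Witness node - The Witness node confirms assertions of either the Master or a Standby in a failover scenario. A cluster does not need a dedicated witness node if the cluster contains three or more nodes. If you do not have a third cluster member that is a database host, you can add a dedicated Witness node.

Traditionally, a cluster is a single instance of Postgres managing multiple databases. In this document, the term cluster refers to a Failover Manager cluster. A Failover Manager cluster consists of a Master agent, one or more Standby agents, and an optional Witness agent that reside on servers in a cloud or on a traditional network and communicate using the JGroups toolkit.

Figure 2.1 - An FM scenario employing a virtual IP address.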

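When a non-witness agent starts, it connects to the local database and checks the state of the database:

• If the agent cannot reach the database, it starts in idle mode.

• If it finds that the database is in recovery, the agent assumes the role of standby.

• If the database is not in recovery, the agent assumes the role of master.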
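The startup decision described in the list above can be sketched as a small function. This is only an illustration of the rule, not Failover Manager's own code; the function name and its two parameters are hypothetical.

```python
from typing import Optional


def initial_agent_role(db_reachable: bool, in_recovery: Optional[bool]) -> str:
    """Return the role a non-witness agent assumes at startup.

    Hypothetical helper: db_reachable says whether the agent could connect
    to its local database; in_recovery is the recovery state reported by
    the database (ignored when the database cannot be reached).
    """
    if not db_reachable:
        return "idle"      # the agent cannot reach the database
    if in_recovery:
        return "standby"   # the database is in recovery
    return "master"        # the database is up and not in recovery
```

For example, initial_agent_role(True, False) yields "master", while an unreachable database always yields "idle" regardless of the second argument.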

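In the event of a failover, Failover Manager attempts to ensure that the promoted standby is the most up-to-date standby in the cluster. Please note that data loss is possible if the standby node is not in sync with the master node.

JGroups provides technology that allows Failover Manager to create clusters whose member nodes can communicate with each other and detect node failures. For more information about JGroups, visit the official project site at: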

http://www.jgroups.org

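Figure 2.1 illustrates a Failover Manager cluster that employs a virtual IP address. You can use a load balancer in place of a virtual IP address if you provide your own fencing script to reconfigure the load balancer in the event of a failure. For more information about using Failover Manager with a virtual IP address, see Section 3.6. For more information about using a fencing script, see Section 3.5.1.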

2.1 Supported Platforms

Failover Manager 3.4 is supported on EDB Postgres Advanced Server or PostgreSQL (version 9.3 and above) installations running on:

• CentOS 6.x and 7.x

• Red Hat Enterprise Linux 6.x and 7.x

• Oracle Enterprise Linux 6.x and 7.x

• Red Hat Enterprise Linux (IBM Power 8 Little Endian or ppc64le) 7.x

• Debian 9

• SLES 12

• Ubuntu 18.04

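2.2 Prerequisites

Before configuring a Failover Manager cluster, you must satisfy the prerequisites described below.

Install Java 1.8 (or later)

Before using Failover Manager, you must first install Java (version 1.8 or later). Failover Manager is tested with OpenJDK, and we strongly recommend installing that version of Java. Installation instructions for Java are platform specific. For more information, visit: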

https://openjdk.java.net/install/

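Provide an SMTP Server

You can receive notifications from Failover Manager through a user-defined notification script, by email, or both.

• If you are using email notifications, an SMTP server must be running on each node of the Failover Manager scenario.

• If you provide a value in the script.notification property, you can leave the user.email field blank; an SMTP server is not required.

If an event occurs, Failover Manager invokes the script (if provided), and sends a notification email to any email addresses specified in the user.email parameter of the cluster properties file. For more information about using an SMTP server, visit: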

https://access.redhat.com/site/documentation

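For more information, see Section 3.5.1.1.

Configure Streaming Replication

Failover Manager requires that PostgreSQL streaming replication be configured between the Master node and the Standby node or nodes. Failover Manager does not support other types of replication.

Unless specified with the -sourcenode option, a recovery.conf file is copied from a random standby node to the stopped master during switchover. You should ensure that the paths within the recovery.conf file on your standby nodes are consistent before performing a switchover. For more information about the -sourcenode option, see Section 4.1.4.

If you use replication slots to manage your WAL segments, Failover Manager does not support automatic reconfiguration of the standby databases after a failover. If you are using replication slots, you should set the auto.reconfigure parameter to false, and manually reconfigure the standby servers in the event of a failover.

Modify the pg_hba.conf File

You must modify the pg_hba.conf file on the Master and Standby nodes, adding entries that allow communication between all of the nodes in the cluster. The following example demonstrates entries that might be made to the pg_hba.conf file on the Master node: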

# access for itself
host fmdb efm 127.0.0.1/32 md5
# access for standby
host fmdb efm 192.168.27.1/32 md5
# access for witness
host fmdb efm 192.168.27.34/32 md5

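Where:

efm specifies the name of a valid database user.

fmdb specifies the name of a database to which the efm user may connect.

For more information about the properties file, see Section 3.5.1.

By default, the pg_hba.conf file resides in the data directory, under your Postgres installation. After modifying the pg_hba.conf file, you must reload the configuration file on each node for the changes to take effect. You can use the following command: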

# systemctl reload edb-as-x

Where x specifies the Postgres version.
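As an illustration, entries in the format of the pg_hba.conf example earlier in this section could be generated for a list of node addresses. The sketch below assumes that each node is identified by a single host address and that md5 authentication is used, as in the example; the function name and the labels are hypothetical.

```python
import ipaddress
from typing import Iterable, Tuple


def pg_hba_entries(database: str, user: str,
                   nodes: Iterable[Tuple[str, str]],
                   method: str = "md5") -> str:
    """Build pg_hba.conf host lines, one per (label, address) pair."""
    lines = []
    for label, address in nodes:
        ip = ipaddress.ip_address(address)       # raises ValueError if malformed
        prefix = 32 if ip.version == 4 else 128  # single-host network mask
        lines.append(f"# access for {label}")
        lines.append(f"host {database} {user} {ip}/{prefix} {method}")
    return "\n".join(lines)


print(pg_hba_entries("fmdb", "efm", [
    ("itself", "127.0.0.1"),
    ("standby", "192.168.27.1"),
    ("witness", "192.168.27.34"),
]))
```

Review the generated lines before appending them to pg_hba.conf, then reload the server configuration as described above.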

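Using Autostart for the Database Servers

If a Master node reboots, Failover Manager may detect that the database is down on the Master node and promote a Standby node to the role of Master. If this happens, the Failover Manager agent on the (rebooted) Master node will not get a chance to write the recovery.conf file; the rebooted Master node will return to the cluster as a second Master node.

To prevent this, start the Failover Manager agent before starting the database server. The agent will start in idle mode, and check to see if there is already a master in the cluster. If there is a master node, the agent will verify that a recovery.conf file exists, and the database will not start as a second master.

Ensure Communication Through Firewalls

If a Linux firewall (iptables) is enabled on the host of a Failover Manager node, you may need to add rules to the firewall configuration that allow tcp communication between the Failover Manager processes in the cluster. For example: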

# iptables -I INPUT -p tcp --dport 7800:7810 -j ACCEPT
/sbin/service iptables save

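The command shown above opens a small range of ports (7800 through 7810). Failover Manager will connect via the port that corresponds to the port specified in the cluster properties file.

Ensure that the db.user has Sufficient Privileges

The database user specified in the efm.properties file must have sufficient privileges to invoke the following functions on behalf of Failover Manager: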

pg_current_wal_lsn()

pg_last_wal_replay_lsn()

pg_wal_replay_pause()

pg_is_wal_replay_paused()

pg_wal_replay_resume()

For detailed information about each of these functions, please see the PostgreSQL core documentation, available at:

https://www.postgresql.org/docs/10/static/index.html

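2.3 Tutorial - Configuring a Simple Failover Manager Cluster

This tutorial describes how to quickly configure a Failover Manager cluster in a test environment. Other sections in this guide provide key information that you should read and understand before configuring Failover Manager for a production deployment.

This tutorial assumes that:

• A database server is running and streaming replication is set up between a master and one or two standby nodes.

• You have installed Failover Manager on each node. For more information about installing Failover Manager, see Section 3.

The example that follows creates a cluster named efm.

You should start the configuration process on a master or standby node. Then, copy the configuration files to the other nodes to save time.

Step 1: Create Working Configuration Files

Copy the provided sample files to create EFM configuration files, and correct the ownership: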

cd /etc/edb/efm-3.4
cp efm.properties.in efm.properties
cp efm.nodes.in efm.nodes
chown efm:efm efm.properties
chown efm:efm efm.nodes

Step 2: Create an Encrypted Password

Create the encrypted password (needed for the properties file):

/usr/edb/efm-3.4/bin/efm encrypt efm

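Follow the onscreen instructions to produce the encrypted version of your database password.

Step 3: Update the efm.properties File

The cluster_name.properties file contains parameters that specify connection properties and behaviors for your Failover Manager cluster. Modifications to property settings are applied when Failover Manager starts.

The following properties are the minimal properties required to configure a Failover Manager cluster. If you are configuring a production system, see Section 3.5.1 for a complete list of properties.

Database connection properties (needed even on the witness so that it can connect to other databases when needed):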

db.user
db.password.encrypted
db.port
db.database

The owner of the data directory (usually postgres or enterprisedb):

db.service.owner

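Only one of the following properties is needed. If you provide the service name, EFM will use a service command to control the database server when necessary; if you provide the location of the Postgres bin directory, EFM will use pg_ctl to control the database server.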

db.service.name
db.bin
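The choice between these two properties can be checked before starting an agent. The following is a minimal sketch, assuming the properties file uses simple key=value lines with # comments; both function names are hypothetical and are not part of Failover Manager. When both properties are set, the sketch prefers the service name, following the template comment that pg_ctl is used unless db.service.name is set.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def database_control_method(props: Dict[str, str]) -> str:
    """Report which mechanism would control the database server."""
    if props.get("db.service.name"):
        return "service"   # service or systemctl commands
    if props.get("db.bin"):
        return "pg_ctl"    # pg_ctl from the given bin directory
    raise ValueError("set either db.service.name or db.bin")


sample = """
# illustrative values only
db.service.name=
db.bin=/usr/pgsql-9.6/bin
"""
print(database_control_method(parse_properties(sample)))
```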

The data directory in which EFM will find or create recovery.conf files:

db.recovery.conf.dir

Set to receive email notifications (the notification text is also included in the agent log):

user.email

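This is the local address of the node and the port to use for EFM. Other nodes will use this address to reach the agent, and the agent will also use this address for connecting to the local database (as opposed to connecting to localhost). An example of the format is included below: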

bind.address=1.2.3.4:7800

Set this property to true on a witness node and false if it is a master or standby:

is.witness

If you are running on a network without access to the Internet, change this to an address that is available on your network:

pingServerIp=8.8.8.8

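When configuring a production cluster, the following properties can be either true or false depending on your system configuration and usage. Set them both to true to simplify startup if you are configuring an EFM test cluster.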

auto.allow.hosts=true
stable.nodes.file=true

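Step 4: Update the efm.nodes File

The cluster_name.nodes file is read at startup to tell an agent how to find the rest of the cluster or, in the case of the first node started, can be used to simplify authorization of subsequent nodes.

Add the addresses and ports of each node in the cluster to this file. One node will act as the membership coordinator; the list should include at least the membership coordinator's address: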

1.2.3.4:7800
1.2.3.5:7800
1.2.3.6:7800

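Please note that the Failover Manager agent will not verify the content of the efm.nodes file; the agent expects that some of the addresses in the file cannot be reached (for example, because another agent has not been started yet). For more information about the efm.nodes file, see Section 3.5.2.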
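Because the agent does not verify the file, a quick syntax check before startup can catch typing mistakes. The sketch below assumes the file holds whitespace-separated address:port entries and that lines starting with # are comments; that layout and the function name are assumptions for illustration only.

```python
from typing import List, Tuple


def parse_nodes(text: str) -> List[Tuple[str, int]]:
    """Return (address, port) pairs from the text of a .nodes file."""
    nodes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # assumed comment syntax
        for token in line.split():
            host, sep, port = token.rpartition(":")
            if not sep or not host or not port.isdigit():
                raise ValueError(f"expected address:port, got {token!r}")
            nodes.append((host, int(port)))
    return nodes


print(parse_nodes("1.2.3.4:7800\n1.2.3.5:7800\n1.2.3.6:7800\n"))
```

The check is purely syntactic; it does not try to contact the listed nodes, which matches the agent's expectation that some addresses may be unreachable at startup.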

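Step 5: Configure the Other Nodes

Copy the efm.properties and efm.nodes files to the /etc/edb/efm-3.4 directory on the other nodes in your sample cluster. After copying the files, change the file ownership so the files are owned by efm:efm. The efm.properties file can be the same on every node, except for the following properties:

• Modify the bind.address property to use the node's local address.

• Set is.witness to true if the node is a witness node. If the node is a witness node, the properties relating to a local database installation will be ignored.

Step 6: Start the EFM Cluster

On any node, start the Failover Manager agent. The agent is named efm-3.4; you can use your platform-specific service command to control the service. For example, on a CentOS or RHEL 7.x host use the command: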

systemctl start efm-3.4

On a CentOS or RHEL 6.x host use the command:

service efm-3.4 start

After the agent starts, run the following command to see the status of the single-node cluster. You should see the addresses of the other nodes in the Allowed node host list.

/usr/edb/efm-3.4/bin/efm cluster-status efm

Start the agent on the other nodes. Run the efm cluster-status efm command on any node to see the cluster status.

If any agent fails to start, see the startup log for information about what went wrong:

cat /var/log/efm-3.4/startup-efm.log

Performing a Switchover

If the cluster status output shows that the master and standby(s) are in sync, you can perform a switchover with the following command:

/usr/edb/efm-3.4/bin/efm promote efm -switchover

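The command promotes a standby and reconfigures the master database as a new standby in the cluster. To switch back, run the command again.

For more information about using the efm command line tool, see Section 5.3.

3 Installing and Configuring Failover Manager

Before installing and configuring Failover Manager, you must create a Postgres streaming replication scenario, and ensure that the nodes have sufficient privileges to communicate with each other. You must also have credentials that allow access to the EnterpriseDB repository.

To request credentials for the repository, visit the EnterpriseDB Advanced Downloads page at: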

https://www.enterprisedb.com/advanced-downloads

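Follow the link in the EDB Failover Manager table to request credentials.

3.1 Installing an RPM Package on a RedHat, CentOS, or OEL Host

After receiving your credentials, you must create the EnterpriseDB repository configuration file on each node of the cluster, and then modify the file to enable access. The following steps provide detailed information about accessing the EnterpriseDB repository; the steps must be performed on each node of the cluster:

1. Use the edb-repo package to create the repository configuration file. You can download and invoke the edb-repo file, or use rpm or yum to create the repository. Assume superuser privileges and use either rpm or yum to create the EnterpriseDB repository configuration file: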

rpm -Uvh http://yum.enterprisedb.com/edbrepos/edb-repo-latest.noarch.rpm

or

yum install -y http://yum.enterprisedb.com/edbrepos/edb-repo-latest.noarch.rpm

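The repository configuration file is named edb.repo; it resides in /etc/yum.repos.d.

2. Use your choice of editor to modify the repository configuration file, enabling the [enterprisedb-tools] and [enterprisedb-dependencies] entries. To enable a repository, change the value of the enabled parameter to 1 and replace the username and password placeholders in the baseurl specification with your user name and the repository password.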

[enterprisedb-tools]
name=EnterpriseDB Tools $releasever - $basearch
baseurl=http://<username>:<password>@yum.enterprisedb.com/tools/redhat/rhel-$releasever-$basearch
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/ENTERPRISEDB-GPG-KEY

[enterprisedb-dependencies]
name=EnterpriseDB Dependencies $releasever - $basearch
baseurl=http://<username>:<password>@yum.enterprisedb.com/dependencies/redhat/rhel-$releasever-$basearch
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/ENTERPRISEDB-GPG-KEY

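3. After modifying the applicable entries in the repository configuration file, save the configuration file and exit the editor.

Then, you can use the yum install command to install Failover Manager. For example, to install Failover Manager version 3.4, use the command: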

yum install edb-efm34

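When you install an RPM package that is signed by a source that is not recognized by your system, yum may ask for your permission to import the key to your local server. If prompted, and you are satisfied that the packages come from a trustworthy source, enter y, and press Return to continue.

During the installation, yum may encounter a dependency that it cannot resolve. If it does, it will provide a list of the required dependencies that you must manually resolve.

Failover Manager must be installed by root. During the installation process, the installer will also create a user named efm that has sufficient privileges to invoke scripts that control the Failover Manager service for clusters owned by enterprisedb or postgres.

If you are using Failover Manager to monitor a cluster owned by a user other than enterprisedb or postgres, see Section 3.4, Extending Failover Manager Permissions.

After installing Failover Manager on each node of the cluster, you must:

1. Modify the cluster properties file on each node. For detailed information about modifying the cluster properties file, see Section 3.5.1.

2. Modify the cluster members file on each node. For more information about the cluster members file, see Section 3.5.2.

3. If applicable, configure and test virtual IP address settings and any scripts that are identified in the cluster properties file.

4. Start the Failover Manager agent on each node of the cluster. For more information about controlling the Failover Manager service, see Section 5.

3.1.1 Installation Locations

Failover Manager components are installed in the following locations: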

Component                            Location
Executables                          /usr/edb/efm-3.4/bin
Libraries                            /usr/edb/efm-3.4/lib
Cluster configuration files          /etc/edb/efm-3.4
Logs                                 /var/log/efm-3.4
Lock files                           /var/lock/efm-3.4
Log rotation file                    /etc/logrotate.d/efm-3.4
sudo configuration file              /etc/sudoers.d/efm-34
Binary to access VIP without sudo    /usr/edb/efm-3.4/bin/secure

3.2 Installing Failover Manager on a Debian or Ubuntu Host

To install Failover Manager, you must also have credentials that allow access to the EnterpriseDB repository. To request credentials for the repository, visit the EnterpriseDB Advanced Downloads page at:

https://www.enterprisedb.com/advanced-downloads

Follow the link in the EDB Failover Manager table to request credentials.

The following steps walk you through using the EnterpriseDB apt repository to install Failover Manager. When using the commands, replace username and password with the credentials provided by EnterpriseDB.

1. Assume superuser privileges:

sudo su -

2. Configure the EnterpriseDB apt repository:

sh -c 'echo "deb https://username:password@apt.enterprisedb.com/$(lsb_release -cs)-edb/ $(lsb_release -cs) main" > /etc/apt/sources.list.d/edb-$(lsb_release -cs).list'

3. Add support to your system for secure APT repositories:

apt-get install apt-transport-https

4. Add the EDB signing key:

wget -q -O - https://username:password@apt.enterprisedb.com/edb-deb.gpg.key | apt-key add -

5. Update the repository metadata:

apt-get update

6. Install Failover Manager:

apt-get install edb-efm34

3.3 Installing an RPM Package on a SLES Host

To install Failover Manager, you must also have credentials that allow access to the EnterpriseDB repository. To request credentials for the repository, visit the Advanced Downloads page at:

https://www.enterprisedb.com/advanced-downloads

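You can use the zypper package manager to install a Failover Manager agent on an SLES 12 host. zypper will attempt to satisfy package dependencies as it installs a package, but requires access to specific repositories that are not hosted at EnterpriseDB.

Before installing Failover Manager, you must assume superuser privileges and stop any firewalls. Then, use the following commands to add the EnterpriseDB repositories to your system: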

zypper addrepo http://zypp.enterprisedb.com/suse/epas96-sles.repo
zypper addrepo http://zypp.enterprisedb.com/suse/epas-sles-tools.repo
zypper addrepo http://zypp.enterprisedb.com/suse/epas-sles-dependencies.repo

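The commands create the repository configuration files in the /etc/zypp/repos.d directory. Then, use the following command to refresh the metadata on your SLES host to include the EnterpriseDB repository: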

zypper refresh

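When prompted, provide your repository credentials and enter a to always trust the provided key; the metadata is then updated to include the EnterpriseDB repository.

You must also add SUSEConnect and the SUSE Package Hub extension to the SLES host, and register the host with SUSE, allowing access to the SUSE repositories. Use the commands: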

zypper install SUSEConnect
SUSEConnect -r registration_number -e user_id
SUSEConnect -p PackageHub/12/x86_64
SUSEConnect -p sle-sdk/12/x86_64

Then, you can use the zypper utility to install a Failover Manager agent:

zypper install edb-efm34

For detailed information about registering a SUSE host, visit:

https://www.suse.com/support/kb/doc/?id=7016626

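3.4 Extending Failover Manager Permissions

During the Failover Manager installation, the installer creates a user named efm. efm does not have sufficient privileges to perform management functions that are normally limited to the database owner or operating system superuser.

• When performing management functions requiring database superuser privileges, efm invokes the efm_db_functions script.

• When performing management functions requiring operating system superuser privileges, efm invokes the efm_root_functions script.

• When assigning or releasing a virtual IP address, efm invokes the efm_address script.

The efm_db_functions and efm_root_functions scripts perform management functions on behalf of the efm user.

The sudoers file contains entries that allow the user efm to control the Failover Manager service for clusters owned by postgres or enterprisedb. You can modify a copy of the sudoers file to grant efm permission to manage Postgres clusters owned by other users.

The efm-34 file is located in /etc/sudoers.d, and contains the following entries: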

# Copyright EnterpriseDB Corporation, 2014-2019. All Rights
# Reserved.
#
# Do not edit this file. Changes to the file may be overwritten
# during an upgrade.
#
# This file assumes you are running your efm cluster as user
# 'efm'. If not, then you will need to copy this file.

# Allow user 'efm' to sudo efm_db_functions as either 'postgres'
# or 'enterprisedb'. If you run your db service under a
# non-default account, you will need to copy this file to grant
# the proper permissions and specify the account in your efm
# cluster properties file by changing the 'db.service.owner'
# property.

efm ALL=(postgres) NOPASSWD: /usr/edb/efm-3.4/bin/efm_db_functions
efm ALL=(enterprisedb) NOPASSWD: /usr/edb/efm-3.4/bin/efm_db_functions

# Allow user 'efm' to sudo efm_root_functions as 'root' to
# write/delete the PID file, validate the db.service.owner
# property, etc.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-3.4/bin/efm_root_functions
# Allow user 'efm' to sudo efm_address as root for VIP tasks.
efm ALL=(ALL) NOPASSWD: /usr/edb/efm-3.4/bin/efm_address
# relax tty requirement for user 'efm'
Defaults:efm !requiretty

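If you are using Failover Manager to monitor clusters that are owned by users other than postgres or enterprisedb, make a copy of the efm-34 file, and modify the content to allow the user to access the efm_functions script to manage their clusters.

If an agent cannot start because of permission problems, make sure the default /etc/sudoers file contains the following line at the end of the file: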

## Read drop-in files from /etc/sudoers.d (the # here does not
# mean a comment)

#includedir /etc/sudoers.d

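3.4.1 Running Failover Manager without sudo

By default, Failover Manager uses sudo to securely manage access to system functionality. If you choose to configure Failover Manager to run without sudo access, note that root access is still required to:

• install the Failover Manager RPM.

• perform Failover Manager setup tasks.

To run Failover Manager without sudo, you must select a database process owner that will have privileges to perform management functions on behalf of Failover Manager. The user could be the default database superuser (for example, enterprisedb or postgres) or another privileged user. After selecting the user:

1. Use the following command to add the user to the efm group: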

usermod -a -G efm enterprisedb

This allows the user to write to /var/run/efm-3.4 and /var/lock/efm-3.4.

2. If you are reusing a cluster name, remove any previously created log files; the new user will not be able to write to log files created by the default (or other) owner.

3. Copy the cluster properties template file and the nodes template file:

su - enterprisedb
cp /etc/edb/efm-3.4/efm.properties.in directory/cluster_name.properties
cp /etc/edb/efm-3.4/efm.nodes.in directory/cluster_name.nodes

Then, modify the cluster properties file, providing the name of the user in the db.service.owner property. You must also ensure that the db.service.name property is blank; without sudo, you cannot run services without root access.

After modifying the configuration, the new user can control Failover Manager with the following command:

/usr/edb/efm-3.4/bin/runefm.sh start|stop directory/cluster_name.properties

Where directory/cluster_name.properties specifies the full path and name of the cluster properties file. Please note that the user must always provide the full path to the properties file when a non-default user is controlling agents or using the efm script.

To allow the new user to manage Failover Manager as a service, you must provide a custom script or unit file.

Failover Manager uses a binary named manage-vip that resides in /usr/edb/efm-3.4/bin/secure/ to perform VIP management operations without sudo privileges. This script uses setuid to acquire the privileges needed to manage virtual IP addresses.

• This directory is only accessible to root and users in the efm group.

• The binary is only executable by root and the efm group.

For security reasons, we recommend against modifying the access privileges of the /usr/edb/efm-3.4/bin/secure/ directory or the manage-vip script.

For more information about using Failover Manager without sudo, visit:

https://www.enterprisedb.com/blog/running-edb-postgres-failover-manager-without-sudo

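3.5 Configuring Failover Manager

Configurable Failover Manager properties are specified in two user-modifiable files:

• efm.properties

• efm.nodes

The efm.properties file contains the properties of the individual node on which it resides, while the efm.nodes file contains a list of the current Failover Manager cluster members. By default, the installer places the files in the /etc/edb/efm-3.4 directory.

Please note that all user scripts referenced in the properties file will be invoked as the Failover Manager user.

3.5.1 The Cluster Properties File

The Failover Manager installer creates a file template for the cluster properties file named efm.properties.in in the /etc/edb/efm-3.4 directory. After completing the Failover Manager installation, you must make a working copy of the template before modifying the file contents. For example, the following command copies the efm.properties.in file, creating a properties file named efm.properties: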

# cp /etc/edb/efm-3.4/efm.properties.in /etc/edb/efm-3.4/efm.properties

After copying the template file, change the owner of the file to efm:

# chown efm:efm efm.properties

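Please note: By default, Failover Manager expects the cluster properties file to be named efm.properties. If you name the properties file something other than efm.properties, you must modify the service script or unit file to instruct Failover Manager to use a different name.

After creating the cluster properties file, add (or modify) configuration parameter values as required. For detailed information about each property, see Section 3.5.1.1.

The property files are owned by root. The Failover Manager service script expects to find the files in the /etc/edb/efm-3.4 directory. If you move the property files to another location, you must create a symbolic link that specifies the new location.

3.5.1.1 Specifying Cluster Properties

You can use the properties listed in the cluster properties file to specify connection properties and behaviors for your Failover Manager cluster. Modifications to property settings are applied when Failover Manager starts. If you modify a property value, you must restart Failover Manager to apply the changes.

Property values are case-sensitive. Note that while Postgres uses quoted strings in parameter values, Failover Manager does not allow quoted strings in property values. For example, while you might specify an IP address in a Postgres configuration parameter as:

listen_addresses='192.168.2.47'

Failover Manager requires that the value not be enclosed in quotes: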

bind.address=192.168.2.54:7800
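Quoted values are easy to carry over by habit from postgresql.conf. As a rough aid, the following sketch scans the text of a properties file and reports keys whose values are wrapped in quotes. It assumes plain key=value lines with # comments, and the function name is hypothetical.

```python
from typing import List


def find_quoted_values(text: str) -> List[str]:
    """Return the keys whose values are enclosed in single or double quotes."""
    quoted = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # A value counts as quoted when it starts and ends with the same quote.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            quoted.append(key.strip())
    return quoted


sample = "bind.address='192.168.2.54:7800'\nadmin.port=7809\n"
print(find_quoted_values(sample))
```

Any key reported by the function should have its quotes removed before the agent is started.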

Use the properties in the efm.properties file to specify connection, administrative, and operational details for Failover Manager.

Use the following properties to specify connection details for the Failover Manager cluster:

# The value for the password property should be the output from
# 'efm encrypt' -- do not include a cleartext password here. To
# prevent accidental sharing of passwords among clusters, the
# cluster name is incorporated into the encrypted password. If
# you change the cluster name (the name of this file), you must
# encrypt the password again with the new name.
# The db.port property must be the same for all nodes.

db.user=

db.password.encrypted=

db.port=

db.database=

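The db.user specified must have sufficient privileges to invoke selected PostgreSQL commands on behalf of Failover Manager. For more information, see Section 2.2.

For information about encrypting the password for the database user, see Section 3.5.1.2.

Use the db.service.owner property to specify the name of the operating system user that owns the cluster that is being managed by Failover Manager. This property is not required on a dedicated witness node.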

# This property tells EFM which OS user owns the $PGDATA dir for
# the 'db.database'. By default, the owner is either 'postgres'
# for PostgreSQL or 'enterprisedb' for EDB Postgres Advanced
# Server. However, if you have configured your db to run as a
# different user, you will need to copy the /etc/sudoers.d/efm-XX
# conf file to grant the necessary permissions to your db owner.
#
# This username must have write permission to the
# 'db.recovery.conf.dir' specified below.

db.service.owner=

Specify the name of the database service in the db.service.name property if you use the service or systemctl command when starting or stopping the service.

# Specify the proper service name in order to use service
# commands rather than pg_ctl to start/stop/restart a database.
# For example, if this property is set, then 'service <name>
# restart' or 'systemctl restart <name>' (depending on OS
# version) will be used to restart the database rather than
# pg_ctl. This property is required unless db.bin is set.

db.service.name=

You should use the same service control mechanism (pg_ctl, service, or systemctl) each time you start or stop the database service. If you use the pg_ctl program to control the service, specify the location of the pg_ctl program in the db.bin property.

# Specify the directory containing the pg_ctl command, for
# example: /usr/pgsql-9.6/bin. Unless the db.service.name
# property is used, the pg_ctl command is used to
# start/stop/restart databases as needed after a failover or
# switchover. This property is required unless db.service.name
# is set.

db.bin=

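Use the db.recovery.conf.dir property to specify the location to which a recovery file will be written on the Master node of the cluster, and a trigger file is written on a Standby. This property is not required on a dedicated witness node.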

# Specify the location of the db recovery.conf file on the node.
# On a standby node, the trigger file location is read from the
# file in this directory. After a failover, the recovery.conf
# files on remaining standbys are changed to point to the new
# master db (a copy of the original is made first). On a master
# node, a recovery.conf file will be written during failover and
# promotion to ensure that the master node can not be restarted
# as the master database.

db.recovery.conf.dir=

Use the jdbc.sslmode property to instruct Failover Manager to use SSL connections; by default, SSL is disabled.

# Use the jdbc.sslmode property to enable ssl for EFM
# connections. Setting this property to anything but 'disable'
# will force the agents to use 'ssl=true' for all JDBC database
# connections (to both local and remote databases).
# Valid values are:
#
# disable - Do not use ssl for connections.
# verify-ca - EFM will perform CA verification before allowing
# the certificate.
# require - Verification will not be performed on the server
# certificate.

jdbc.sslmode=disable

For information about configuring and using SSL, please see:

https://www.postgresql.org/docs/10/static/ssl-tcp.html

and

https://jdbc.postgresql.org/documentation/94/ssl.html

Use the user.email property to specify an email address (or multiple email addresses) that will receive any notifications sent by Failover Manager.

# Email address(es) for notifications. The value of this
# property must be the same across all agents. Multiple email
# addresses must be separated by space. If using a notification
# script instead, this property can be left blank.

user.email=

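Use the notification.level property to specify the minimum severity level at which Failover Manager will send user notifications or call a notification script. For a complete list of notifications, see Section 7.

# Minimum severity level of notifications that will be sent by
# the agent. The minimum level also applies to the notification
# script (below). Valid values are INFO, WARNING, and SEVERE.
# A list of notifications is grouped by severity in the user's
# guide.

notification.level=INFO

Use the script.notification property to specify the path to a user-supplied script that acts as a notification service; the script will be passed a message subject and a message body. The script will be invoked each time Failover Manager generates a user notification.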

# Absolute path to script run for user notifications.
#
# This is an optional user-supplied script that can be used for
# notifications instead of email. This is required if not using
# email notifications. Either/both can be used. The script will
# be passed two parameters: the message subject and the message
# body.

script.notification=
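A notification script only needs to accept the two parameters described above. The following is a minimal sketch of such a script, assuming it is saved as an executable file and named in script.notification. The log path /tmp/efm-notifications.log and the record layout are assumptions chosen for illustration; nothing here is prescribed by Failover Manager.

```python
import sys
from datetime import datetime, timezone
from typing import Optional


def format_record(subject: str, body: str, now: Optional[datetime] = None) -> str:
    """Collapse a notification into one timestamped line."""
    now = now or datetime.now(timezone.utc)
    one_line_body = " ".join(body.split())   # fold newlines and repeated spaces
    return f"{now.isoformat()} | {subject} | {one_line_body}"


def append_record(path: str, subject: str, body: str) -> None:
    """Append the formatted notification to a log file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(format_record(subject, body) + "\n")


# When invoked as a script with exactly two arguments (subject, body),
# write the record to the hypothetical log path.
if __name__ == "__main__" and len(sys.argv) == 3:
    append_record("/tmp/efm-notifications.log", sys.argv[1], sys.argv[2])
```

Because user scripts are invoked as the Failover Manager user, the chosen log location must be writable by that user.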

The bind.address property specifies the IP address and port number of the agent on the current node of the Failover Manager cluster.

# This property specifies the ip address and port that jgroups
# will bind to on this node. The value is of the form
# <ip>:<port>.
# Note that the port specified here is used for communicating
# with other nodes, and is not the same as the admin.port below,
# used only to communicate with the local agent to send control
# signals.
# For example, <provide_your_ip_address_here>:7800

bind.address=

Use the admin.port property to specify a port on which Failover Manager listens for administrative commands.

# This property controls the port binding of the administration
# server which is used for some commands (ie cluster-status). The
# default is 7809; you can modify this value if the port is
# already in use.

admin.port=7809

Set the is.witness property to true to indicate that the current node is a witness node. If is.witness is true, the local agent will not check to see if a local database is running.

# Specifies whether or not this is a witness node. Witness nodes
# do not have local databases running.

is.witness=

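The Postgres pg_is_in_recovery() function is a boolean function that reports the recovery state of a database. The function returns true if the database is in recovery, or false if the database is not in recovery. When an agent starts, it connects to the local database and invokes the pg_is_in_recovery() function. If the server responds true, the agent assumes the role of standby; if the server responds false, the agent assumes the role of master. If there is no local database, the agent will assume an idle state.

If is.witness is true, Failover Manager will not check the recovery state.

The local.period property specifies the number of seconds between attempts to contact the database server.

The local.timeout property specifies how long an agent will wait for a positive response from the local database server.

The local.timeout.final property specifies how long an agent will wait after the final attempt to contact the database server on the current node. If a response is not received from the database within the number of seconds specified by the local.timeout.final property, the database is assumed to have failed.

For example, given the default values of these properties, a check of the local database happens once every 10 seconds. If an attempt to contact the local database does not come back positive within 60 seconds, Failover Manager makes a final attempt to contact the database. If a response is not received within 10 seconds, Failover Manager declares database failure and notifies the administrator listed in the user.email property. These properties are not required on a dedicated witness node.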

# These properties apply to the connection(s) EFM uses to monitor
# the local database. Every 'local.period' seconds, a database
# check is made in a background thread. If the main monitoring
# thread does not see that any checks were successful in
# 'local.timeout' seconds, then the main thread makes a final
# check with a timeout value specified by the
# 'local.timeout.final' value. All values are in seconds.
# Whether EFM uses single or multiple connections for database
# checks is controlled by the 'db.reuse.connection.count'
# property.

local.period=10
local.timeout=60
local.timeout.final=10

If necessary, you should modify these values to suit your business model.
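When tuning these values, it can help to estimate how long a database outage may go undeclared. The sketch below is an approximation based only on the description above: it ignores scheduling overhead and assumes the final check runs for its full timeout. The function name is hypothetical.

```python
from typing import Tuple


def local_failure_window(period: int = 10, timeout: int = 60,
                         final_timeout: int = 10) -> Tuple[int, int]:
    """Estimate the local database failure detection window.

    Returns (background checks that fit in local.timeout,
             approximate worst-case seconds before failure is declared).
    """
    checks_before_final = timeout // period      # background checks in the window
    worst_case_seconds = timeout + final_timeout # window plus the final attempt
    return checks_before_final, worst_case_seconds


checks, seconds = local_failure_window()
print(f"{checks} checks, about {seconds} seconds until failure is declared")
```

With the default values this gives six background checks and roughly 70 seconds; lowering local.timeout shortens detection but increases the chance of reacting to a brief stall.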

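Use the remote.timeout property to specify how many seconds an agent waits for a response from a remote database server (that is, how long a standby agent waits to verify that the master database is actually down before performing failover).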

# Timeout for a call to check if a remote database is responsive.
# For example, this is how long a standby would wait for a
# DB ping request from itself and the witness to the master DB
# before performing failover.

remote.timeout=10

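Use the node.timeout property to specify the number of seconds that an agent will wait for a response from a node when determining if a node has failed. The node.timeout property value specifies a timeout value for agent-to-agent communication; other timeout properties in the cluster properties file specify values for agent-to-database communication.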

# The total amount of time in seconds to wait before determining
# that a node has failed or been disconnected from this node.
#
# The value of this property must be the same across all agents.

node.timeout=50

Use the stop.isolated.master property to instruct Failover Manager to shut down the database if a master agent detects that it is isolated. When true (the default), Failover Manager will stop the database before invoking the script specified in the script.master.isolated property.

# Shut down the database after a master agent detects that it has
# been isolated from the majority of the efm cluster. If set to
# true, efm will stop the database before running the
# 'script.master.isolated' script, if a script is specified.

stop.isolated.master=true

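Use the stop.failed.master property to instruct Failover Manager to attempt to shut down a master database if it cannot reach the database. If true, Failover Manager will run the script specified in the script.db.failure property after attempting to shut down the database.

# Attempt to shut down a failed master database after EFM can no
# longer connect to it. This can be used for added safety in the
# case a failover is caused by a failure of the network on the
# master node.
# If specified, a 'script.db.failure' script is run after this
# attempt.

stop.failed.master=true

Use the master.shutdown.as.failure parameter to indicate that any shutdown of the Failover Manager agent on the master node should be treated as a failure. If this parameter is set to true and the master agent stops (for any reason), the cluster will attempt to confirm if the database on the master node is running:

• If the database is reached, a notification will be sent informing you of the agent status.

• If the database is not reached, a failover will occur.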

# Treat a master agent shutdown as a failure. This can be set to
# true to treat a master agent shutdown as a failure situation,
# eg during the shutdown of a node, accidental or otherwise.
# Caution should be used when using this feature, as it could
# cause an unwanted promotion in the case of performing master
# database maintenance.
# Please see the user's guide for more information.

master.shutdown.as.failure=false

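The master.shutdown.as.failure property is meant to catch user error, rather than failures, such as the accidental shutdown of a master node. The proper shutdown of a node can appear to the rest of the cluster as though a user has stopped the master Failover Manager agent (for example, to perform maintenance on the master database). If you set the master.shutdown.as.failure property to true, care must be taken when performing maintenance.

To perform maintenance on the master database when master.shutdown.as.failure is true, you should stop the master agent and wait to receive a notification that the master agent has failed but the database is still running. Then it is safe to stop the master database. Alternatively, you can use the efm stop-cluster command to stop all of the agents without failure checks being performed.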
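The behavior described for master.shutdown.as.failure amounts to a small decision table. The sketch below restates it as a function for illustration only. The function name and the returned labels are hypothetical, and the "ignore" outcome for a false setting is an assumption, since the text describes only the true case.

```python
def on_master_agent_shutdown(shutdown_as_failure: bool,
                             database_reachable: bool) -> str:
    """Return the cluster's reaction when the master agent stops."""
    if not shutdown_as_failure:
        return "ignore"     # assumed: the shutdown is not treated as a failure
    if database_reachable:
        return "notify"     # database still running: send a notification
    return "failover"       # database not reached: a failover occurs


# Maintenance with the property enabled: stop the agent first, wait for the
# "notify" outcome, and only then stop the master database.
print(on_master_agent_shutdown(True, True))
```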

Use the pingServerIp property to specify the IP address of a server that Failover Manager can use to confirm that network connectivity is not a problem.

# This is the address of a well-known server that EFM can ping
# in an effort to determine network reachability issues. It
# might be the IP address of a nameserver within your corporate
# firewall or another server that *should* always be reachable
# via a 'ping' command from each of the EFM nodes.
#
# There are many reasons why this node might not be considered
# reachable: firewalls might be blocking the request, ICMP might
# be filtered out, etc.
#
# Do not use the IP address of any node in the EFM cluster
# (master, standby, or witness) because this ping server is meant
# to provide an additional layer of information should the EFM
# nodes lose sight of each other.
#
# The installation default is Google's DNS server.

pingServerIp=8.8.8.8

Use the pingServerCommand property to specify the command used to test network connectivity.

# This command will be used to test the reachability of certain
# nodes.
#
# Do not include an IP address or hostname on the end of
# this command - it will be added dynamically at runtime with the
# values contained in 'virtualIp' and 'pingServer'.
#
# Make sure this command returns reasonably quickly - test it
# from a shell command line first to make sure it works properly.

pingServerCommand=/bin/ping -q -c3 -w5
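As a rough illustration of how the address is appended to the command at runtime, the following shell sketch reads pingServerCommand and pingServerIp from a properties file and prints the resulting command line. The helper names get_prop and build_ping_command are hypothetical, and the parser assumes plain key=value lines; this is not how Failover Manager itself is implemented.

```shell
# Print the value of one key from a simple key=value properties file.
# Assumption: no line continuations or escaped characters are used.
# $1 = properties file, $2 = key
get_prop() {
    awk -v k="$2" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            i = index($0, "=")
            if (i == 0) next                     # not a key=value line
            name = substr($0, 1, i - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim the key
            if (name != k) next
            value = substr($0, i + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim the value
        }
        END { print value }                      # last match wins
    ' "$1"
}

# Print the ping command with the ping server address appended.
# $1 = properties file
build_ping_command() {
    printf '%s %s\n' "$(get_prop "$1" pingServerCommand)" \
                     "$(get_prop "$1" pingServerIp)"
}
```

With the default values shown in this section, build_ping_command prints /bin/ping -q -c3 -w5 8.8.8.8, which can be pasted into a shell on each node to confirm that the command returns quickly.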

Use the auto.allow.hosts property to instruct the server to use the addresses specified in the .nodes file of the first node started to update the allowed host list. Enabling this property (setting auto.allow.hosts to true) can simplify cluster start-up.

# Have the first node started automatically add the addresses
# from its .nodes file to the allowed host list. This will make
# it faster to start the cluster when the initial set of hosts
# is already known.

auto.allow.hosts=false

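Use the stable.nodes.file property to instruct the server not to rewrite the nodes file when a node joins or leaves the cluster. This property is most useful in clusters with unchanging IP addresses.

# When set to true, EFM will not rewrite the .nodes file whenever
# new nodes join or leave the cluster. This can help starting a
# cluster in the cases where it is expected for member addresses
# to be mostly static, and combined with 'auto.allow.hosts' makes
# startup easier when learning failover manager.

stable.nodes.file=false

Use the db.reuse.connection.count property to specify the number of times Failover Manager reuses the same database connection to check the database health. The default value is 0, indicating that Failover Manager will create a new connection each time. This property is not required on a dedicated witness node.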

# This property controls how many times a database connection is
# reused before creating a new one. If set to zero, a new
# connection will be created every time an agent pings its local
# database.

db.reuse.connection.count=0

The auto.failover property enables automatic failover. By default, auto.failover is set to true.

# Whether or not failover will happen automatically when the master
# fails. Set to false if you want to receive the failover notifications
# but not have EFM actually perform the failover steps.
# The value of this property must be the same across all agents.

auto.failover=true

Use the auto.reconfigure property to instruct Failover Manager to enable or disable automatic reconfiguration of the remaining standby servers after the primary standby is promoted to master. Set the property to true to enable automatic reconfiguration (the default) or to false to disable it. This property is not required on a dedicated witness node.

# After a standby is promoted, failover manager will attempt to
# update the remaining standbys to use the new master. Failover
# manager will back up recovery.conf, change the host parameter
# of the primary_conninfo entry, and restart the database. The
# restart command is contained in either the efm_db_functions or
# efm_root_functions file; default when not running db as an os
# service is:
# "pg_ctl restart -m fast -w -t <timeout> -D <directory>"
# where the timeout is the local.timeout property value and the
# directory is specified by db.recovery.conf.dir. To turn off
# automatic reconfiguration, set this property to false.

auto.reconfigure=true

Please note: primary_conninfo is a space-delimited list of keyword=value pairs.

Please note: If you are using replication slots to manage your WAL segments, automatic reconfiguration is not supported; you should set auto.reconfigure to false. For more information, see Section 2.2.

Use the promotable property to indicate that a node should not be promoted. To override the setting, use the efm set-priority command at runtime. For more information about the efm set-priority command, see Section 5.3.

# A standby with this set to false will not be added to the
# failover priority list, and so will not be available for
# promotion. The property will be used whenever an agent starts
# as a standby or resumes as a standby after being idle. After
# startup/resume, the node can still be added or removed from the
# priority list with the 'efm set-priority' command. This
# property is required for all non-witness nodes.

promotable=true

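Use the minimum.standbys property to specify the minimum number of standby nodes that will be retained on a cluster. If the standby count drops to the specified minimum, a replica node will not be promoted in the event of a failure of the master node.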

# Instead of setting specific standbys as being unavailable for
# promotion, this property can be used to set a minimum number
# of standbys that will not be promoted. Set to one, for
# example, promotion will not happen if it will drop the number
# of standbys below this value. This property must be the same on
# each node.

minimum.standbys=0
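The rule can be read as simple arithmetic: a promotion removes one standby from the pool, so it is permitted only if the remaining count does not fall below minimum.standbys. The sketch below encodes that reading; the function name promotion_allowed is hypothetical and the logic is an interpretation of the comment above, not Failover Manager code.

```shell
# Return success (0) if promoting one standby still leaves at least
# "minimum" standbys in the cluster; return failure (1) otherwise.
# $1 = current number of standbys, $2 = minimum.standbys value
promotion_allowed() {
    local standbys="$1"
    local minimum="$2"
    [ $(( standbys - 1 )) -ge "$minimum" ]
}
```

For example, with two standbys and minimum.standbys=1 a promotion is allowed (one standby remains), while with one standby and minimum.standbys=1 it is not.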

Use the recovery.check.period property to specify the number of seconds that Failover Manager will wait between checks to see whether a promoting database is out of recovery.

# Time in seconds between checks to see if a promoting database
# is out of recovery.

recovery.check.period=2

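Use the auto.resume.period property to specify the period in seconds during which an agent will try to resume monitoring its database (after the monitored database has failed and the agent has become idle, or after the agent has started in IDLE mode).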

# Period in seconds for IDLE agents to try to resume monitoring
# after a database failure or when starting in IDLE mode. Set to
# 0 for agents to not try to resume (in which case the
# 'efm resume <cluster>' command is used after bringing a
# database back up).

auto.resume.period=0

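Failover Manager provides support for clusters that use a virtual IP. If your cluster uses a virtual IP, provide the host name or IP address in the virtualIp property, and specify the corresponding prefix in the virtualIp.prefix property. If virtualIp is left blank, virtual IP support is disabled.

Use the virtualIp.interface property to provide the network interface used by the VIP.

The specified virtual IP address is assigned only to the master node of the cluster. If you specify virtualIp.single=true, the same VIP address will be used on the new master in the event of a failover. Specify a value of false to provide a unique IP address for each node of the cluster.

For information about using a virtual IP address, see Section 3.6.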

# These properties specify the IP and prefix length that will be
# remapped during failover. If you do not use a VIP as part of
# your failover solution, leave the virtualIp property blank to
# disable Failover Manager support for VIP processing (assigning,
# releasing, testing reachability, etc).
#
# If you specify a VIP, the interface and prefix are required.
#
# If you specify a host name, it will be resolved to an IP address
# when acquiring or releasing the VIP. If the host name resolves
# to more than one IP address, there is no way to predict which
# address Failover Manager will use.
#
# By default, the virtualIp and virtualIp.prefix values must be
# the same across all agents. If you set virtualIp.single to
# false, you can specify unique values for virtualIp and
# virtualIp.prefix on each node.
#
# If you are using an IPv4 address, the virtualIp.interface value
# should not contain a secondary virtual ip id (do not include
# ":1", etc).

virtualIp=
virtualIp.interface=
virtualIp.prefix=
virtualIp.single=true

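Provide the paths to scripts that reconfigure your load balancer in the event of a switchover or a master failure. The scripts are also invoked in the event of a standby failure.

If you are using these properties, provide them on every node of the cluster (master, standby, and witness). This ensures that if a database node fails, another node will call the detach script with the address of the failed node.

Set the check.vip.before.promotion property to false to indicate that Failover Manager will not check whether a VIP is in use before assigning it to a new master in the event of a failure. Note that this could result in multiple nodes broadcasting on the same VIP address; unless the master node is isolated or can be shut down by another process, you should set this property to true.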

# Whether to check if the VIP (when used) is still in use before
# promoting after a master failure. Turning this off may allow
# the new master to have the VIP even though another node is also
# broadcasting it. This should only be used in environments where
# it is known that the failed master node will be isolated or
# shut down through other means.

check.vip.before.promotion=true

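Provide a script name after the script.load.balancer.attach property to identify a script that will be invoked when a node should be attached to the load balancer. Use the script.load.balancer.detach property to specify the name of a script that will be invoked when a node should be detached from the load balancer. Include the %h placeholder to represent the IP address of the node that is being attached to or removed from the cluster.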

# Absolute path to load balancer scripts
# The attach script is called when a node should be attached to
# the load balancer, for example after a promotion. The detach
# script is called when a node should be removed, for example
# when a database has failed or is about to be stopped. Use %h to
# represent the IP/hostname of the node that is being
# attached/detached.
#
# Example:
# script.load.balancer.attach=/somepath/attachscript %h

script.load.balancer.attach=
script.load.balancer.detach=

script.fence specifies the path to an optional user-supplied script that will be invoked during the promotion of a standby node to master node.

# absolute path to fencing script run during promotion
#
# This is an optional user-supplied script that will be run
# during failover on the standby database node. If left blank,
# no action will be taken. If specified, EFM will execute this
# script before promoting the standby.
#
# Parameters can be passed into this script for the failed master
# and new primary node addresses. Use %p for new primary and %f
# for failed master. On a node that has just been promoted, %p
# should be the same as the node's efm binding address.
#
# Example:
# script.fence=/somepath/myscript %p %f
#
# NOTE: FAILOVER WILL NOT OCCUR IF THIS SCRIPT RETURNS A NON-ZERO EXIT CODE.

script.fence=
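A minimal sketch of what such a fencing script might look like is shown below. It only demonstrates the argument order (%p, then %f) and the exit-code contract described in the comments above; the function name fence_main and the message text are hypothetical, and the actual fencing action is left as a placeholder because it is site-specific.

```shell
# Hypothetical fencing hook, configured for example as:
#   script.fence=/somepath/myscript %p %f
# $1 = new primary address (%p), $2 = failed master address (%f)
fence_main() {
    local new_primary="$1"
    local failed_master="$2"
    if [ -z "$new_primary" ] || [ -z "$failed_master" ]; then
        echo "usage: myscript <new_primary> <failed_master>" >&2
        return 2      # non-zero: failover will not occur
    fi
    # Site-specific fencing of the failed master would go here
    # (placeholder only; nothing is actually fenced in this sketch).
    echo "fencing $failed_master before promoting $new_primary"
    return 0          # zero: promotion may proceed
}

# When saved as a standalone script, end the file with:
#   fence_main "$@"
```

Because a non-zero exit code blocks the failover, it is worth running the script by hand with sample addresses before referencing it in the properties file.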

Use the script.post.promotion property to specify the path to an optional user-supplied script that will be invoked after a standby node has been promoted to master.

# Absolute path to fencing script run after promotion
#
# This is an optional user-supplied script that will be run after
# failover on the standby node after it has been promoted and
# is no longer in recovery. The exit code from this script has
# no effect on failover manager, but will be included in a
# notification sent after the script executes.
#
# Parameters can be passed into this script for the failed master
# and new primary node addresses. Use %p for new primary and %f
# for failed master. On a node that has just been promoted, %p
# should be the same as the node's efm binding address.
#
# Example:
# script.post.promotion=/somepath/myscript %f %p

script.post.promotion=

Use the script.resumed property to specify an optional path to a user-supplied script that will be invoked when an agent resumes monitoring of a database.

# Absolute path to resume script
#
# This script is run before an IDLE agent resumes
# monitoring its local database.

script.resumed=

Use the script.db.failure property to specify the complete path to an optional user-supplied script that Failover Manager will invoke if an agent detects that the database that it monitors has failed.

# Absolute path to script run after database failure
#
# This is an optional user-supplied script that will be run after
# an agent detects that its local database has failed.

script.db.failure=

Use the script.master.isolated property to specify the complete path to an optional user-supplied script that Failover Manager will invoke if the agent monitoring the master database detects that the master is isolated from the majority of the Failover Manager cluster. This script is called immediately after the VIP is released (if a VIP is in use).

# Absolute path to script run on isolated master
#
# This is an optional user-supplied script that will be run after
# a master agent detects that it has been isolated from the
# majority of the efm cluster.

script.master.isolated=

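Use the script.remote.pre.promotion property to specify the path and name of a script that will be invoked on any agent nodes not involved in the promotion when a node is about to promote its database to master.

Include the %p placeholder to identify the address of the new primary node.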

# Absolute path to script invoked on non-promoting agent nodes
# before a promotion.
#
# This optional user-supplied script will be invoked on other
# agents when a node is about to promote its database. The exit
# code from this script has no effect on Failover Manager, but
# will be included in a notification sent after the script
# executes.
#
# Pass a parameter (%p) with the script to identify the new
# primary node address.
#
# Example:
# script.remote.pre.promotion=/path_name/script_name %p

script.remote.pre.promotion=

Use the script.remote.post.promotion property to specify the path and name of a script that will be invoked on any non-master nodes after a promotion occurs.

Include the %p placeholder to identify the address of the new primary node.

# Absolute path to script invoked on non-master agent nodes
# after a promotion.
#
# This optional user-supplied script will be invoked on nodes
# (except the new master) after a promotion occurs. The exit code
# from this script has no effect on Failover Manager, but will be
# included in a notification sent after the script executes.
#
# Pass a parameter (%p) with the script to identify the new
# primary node address.
#
# Example:
# script.remote.post.promotion=/path_name/script_name %p

script.remote.post.promotion=

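Use the script.custom.monitor property to provide the name and location of an optional script that will be invoked at regular intervals (specified in seconds by the custom.monitor.interval property).

Use custom.monitor.timeout to specify the maximum time that the script will be allowed to run. If script execution does not complete within the time specified, Failover Manager will send a notification.

Set custom.monitor.safe.mode to true to instruct Failover Manager to report non-zero exit codes from the script, but not promote a standby as a result of an exit code.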

# Absolute path to a custom monitoring script.
#
# Use script.custom.monitor to specify the location and name of
# an optional user-supplied script that will be invoked
# periodically to perform custom monitoring tasks. A non-zero
# exit value means that a check has failed; this will be treated
# as a database failure. On a master node, script failure will
# cause a promotion. On a standby node script failure will
# generate a notification and the agent will become IDLE.
#
# The custom.monitor.* properties are required if a custom
# monitoring script is specified:
#
# custom.monitor.interval is the time in seconds between executions of the script.
#
# custom.monitor.timeout is a timeout value in seconds for how
# long the script will be allowed to run. If script execution
# exceeds the specified time, the task will be stopped and a
# notification sent. Subsequent runs will continue.
#
# If custom.monitor.safe.mode is set to true, non-zero exit codes
# from the script will be reported but will not cause a promotion
# or be treated as a database failure. This allows testing of the
# script without affecting EFM.
#
script.custom.monitor=
custom.monitor.interval=
custom.monitor.timeout=
custom.monitor.safe.mode=
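As an illustration of the exit-code contract, the sketch below outlines a custom monitoring script that fails when the file system holding a data directory is too full. Everything in it is an assumption made for the example: the function names, the /var/lib/edb path, and the 90 percent threshold are hypothetical, and a disk check is just one possible custom task.

```shell
# Succeed (0) when usage is within the limit, fail (1) otherwise.
# $1 = current usage in percent, $2 = highest acceptable percent
usage_ok() {
    [ "$1" -le "$2" ]
}

# Print the usage percentage of the file system holding directory $1,
# taken from the "Capacity" column of POSIX "df -P" output.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# When saved as a standalone script, end the file with something like:
#   usage_ok "$(disk_usage_percent /var/lib/edb)" 90
# so that the script's exit code is the result of the check.
```

While trying out a new script, setting custom.monitor.safe.mode=true lets you see the reported exit codes in notifications without risking a promotion.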

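Use the sudo.command property to specify a command that will be invoked by Failover Manager when performing tasks that require extended permissions. Use this option to include command options that might be specific to your system authentication.

Use the sudo.user.command property to specify a command that will be invoked by Failover Manager when executing commands that will be performed by the database owner.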

# Command to use in place of 'sudo' if desired when efm runs
# the efm_db_functions or efm_root_functions, or efm_address
# scripts.
# Sudo is used in the following ways by efm:
#
# sudo /usr/edb/efm-<version>/bin/efm_address <arguments>
# sudo /usr/edb/efm-<version>/bin/efm_root_functions <arguments>
# sudo -u <db service owner> /usr/edb/efm-<version>/bin/efm_db_functions <arguments>
#
# 'sudo' in the first two examples will be replaced by the value
# of the sudo.command property. 'sudo -u <db service owner>' will
# be replaced by the value of the sudo.user.command property.
# The '%u' field will be replaced with the db owner.

sudo.command=sudo
sudo.user.command=sudo -u %u

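Use the lock.dir property to specify an alternate location for the Failover Manager lock file. The file prevents Failover Manager from starting multiple (potentially orphaned) agents for a single cluster on the node.

# Specify the directory of lock file on the node. Failover
# Manager creates a file named <cluster>.lock at this location to
# avoid starting multiple agents for same cluster. If the path
# does not exist, Failover Manager will attempt to create it. If
# not specified defaults to '/var/lock/efm-<version>'

lock.dir=

Use the log.dir property to specify the location to which agent log files will be written. Failover Manager will attempt to create the directory if it does not exist.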

# Specify the directory of agent logs on the node. If the path
# does not exist, Failover Manager will attempt to create it. If
# not specified defaults to '/var/log/efm-<version>'. (To store
# Failover Manager startup logs in a custom location, modify the
# path in the service script to point to an existing, writable
# directory.)
# If using a custom log directory, you must configure
# logrotate separately. Use 'man logrotate' for more information.

log.dir=

After enabling the UDP or TCP protocol on a Failover Manager host, you can enable logging to syslog. Use the syslog.protocol parameter to specify the protocol type (UDP or TCP) and the syslog.port parameter to specify the listener port of the syslog host. The syslog.facility value may be used as an identifier for the process that created the entry; the value must be between LOCAL0 and LOCAL7.

# Syslog information. The syslog service must be listening on
# the port for the given protocol, which can be UDP or TCP.
# The facilities supported are LOCAL0 through LOCAL7.

syslog.host=localhost
syslog.port=514
syslog.protocol=UDP
syslog.facility=LOCAL1

Use the file.log.enabled and syslog.enabled properties to specify the type of logging that you wish to implement. Set file.log.enabled to true to enable logging to a file. To enable logging to syslog, enable the UDP or TCP protocol and set syslog.enabled to true. You can enable logging to both a file and syslog.

# Which logging is enabled.

file.log.enabled=true
syslog.enabled=false

For more information about configuring syslog logging, see Section 6.1.

Use the jgroups.loglevel and efm.loglevel parameters to specify the level of detail logged by Failover Manager. The default value is INFO. For more information about logging, see Section 6, Controlling Logging.

# Logging levels for JGroups and EFM.
# Valid values are: TRACE, DEBUG, INFO, WARN, ERROR
# Default value: INFO
# It is not necessary to increase these values unless debugging a
# specific issue. If nodes are not discovering each other at
# startup, increasing the jgroups level to DEBUG will show
# information about the TCP connection attempts that may help
# diagnose the connection failures.

jgroups.loglevel=INFO
efm.loglevel=INFO

Use the jvm.options property to pass JVM-related configuration information. The default setting specifies the amount of memory that the Failover Manager agent will be allowed to use.

# Extra information that will be passed to the JVM when starting
# the agent.

jvm.options=-Xmx128m

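3.5.1.2 Encrypting Your Database Password

Failover Manager requires you to encrypt your database password before including it in the cluster properties file. Use the efm utility (located in the /usr/edb/efm-3.4/bin directory) to encrypt the password. When encrypting a password, you can either pass the password on the command line when you invoke the utility, or use the EFMPASS environment variable.

To encrypt a password, use the command:

# efm encrypt cluster_name [ --from-env ]

Where cluster_name specifies the name of the Failover Manager cluster.

If you include the --from-env option, you must export the value you wish to encrypt before invoking the encryption utility. For example:

export EFMPASS=password

If you do not include the --from-env option, Failover Manager will prompt you to enter the database password twice before generating an encrypted password for you to place in your cluster properties file. When the utility displays the encrypted password, copy and paste it into the cluster properties file.

Please note: Many Java vendors ship their version of Java with full-strength encryption included, but not enabled due to export restrictions. If you encounter an error that refers to an illegal key size when attempting to encrypt the database password, you should download and enable a Java Cryptography Extension (JCE) that provides an unlimited policy for your platform.

The following example demonstrates using the encrypt utility to encrypt a password for the acctg cluster: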

# efm encrypt acctg
This utility will generate an encrypted password for you to place in your EFM cluster property file:
/etc/edb/efm-3.4/acctg.properties

Please enter the password and hit enter:
Please enter the password again to confirm:
The encrypted password is: 516b36fb8031da17cfbc010f7d09359c

Please paste this into your acctg.properties file
db.password.encrypted=516b36fb8031da17cfbc010f7d09359c

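Please note: the utility will notify you if a properties file does not exist.

After receiving your encrypted password, paste the password into your properties file and start the Failover Manager service. If there is a problem with the encrypted password, the Failover Manager service will not start: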

[witness@localhost ~]# service efm-3.4 start
Starting local efm-3.4 service: [FAILED]

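If you receive this message when starting the Failover Manager service, see the startup log (/var/log/efm-3.4/startup-efm.log) for more information.

If you are using RHEL 7.x or CentOS 7.x, startup information is also available with the following command: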

systemctl status efm-3.4

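To prevent a cluster from inadvertently connecting to the database of another cluster, the cluster name is incorporated into the encrypted password. If you modify the cluster name, you will need to re-encrypt the database password and update the cluster properties file.

Using the EFMPASS Environment Variable

The following example demonstrates using the --from-env option when encrypting a password. Before invoking the efm encrypt command, set the value of EFMPASS to the password (1safepassword):

# export EFMPASS=1safepassword

Then, invoke efm encrypt, specifying the --from-env option: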

# efm encrypt acctg --from-env
# 7ceecd8965fa7a5c330eaa9e43696f83

The encrypted password (7ceecd8965fa7a5c330eaa9e43696f83) is returned as a text value. When using a script, you can check the exit code of the command to confirm that the command succeeded; a successful execution returns 0.
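For scripted setups, the exit-code check can be combined with an update of the properties file. The sketch below is one possible arrangement: encrypt_from_env and set_prop are hypothetical helper names, and the sketch assumes that a successful efm encrypt --from-env run prints only the encrypted value on standard output.

```shell
# Encrypt the password held in EFMPASS for the cluster named in $1.
# Assumption: on success the command prints only the encrypted value.
encrypt_from_env() {
    efm encrypt "$1" --from-env
}

# Set key=value in a properties file, replacing an existing line for
# the key or appending a new line if the key is absent.
# $1 = properties file, $2 = key, $3 = value
set_prop() {
    local file="$1"
    local key="$2"
    local value="$3"
    local tmp rc
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
            index($0, k "=") == 1 { print k "=" v; done = 1; next }
            { print }
            END { if (!done) print k "=" v }
        ' "$file" > "$tmp"
    then
        cat "$tmp" > "$file"    # rewrite in place, keeping owner and mode
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}

# Example use, with the acctg cluster and file names used in this section:
#   export EFMPASS=1safepassword
#   if enc=$(encrypt_from_env acctg); then
#       set_prop /etc/edb/efm-3.4/acctg.properties db.password.encrypted "$enc"
#   else
#       echo "efm encrypt failed" >&2
#   fi
```

Checking the exit code before writing avoids storing an error message in db.password.encrypted, which would prevent the Failover Manager service from starting.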

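3.5.2 The Cluster Members File

Each node in a Failover Manager cluster has a cluster members file. When an agent starts, it uses the file to locate other cluster members. The Failover Manager installer creates a file template for the cluster members file named efm.nodes.in in the /etc/edb/efm-3.4 directory. After completing the Failover Manager installation, you must make a working copy of the template: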

# cp /etc/edb/efm-3.4/efm.nodes.in /etc/edb/efm-3.4/efm.nodes

After copying the template file, change the owner of the file to efm:

chown efm:efm efm.nodes

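By default, Failover Manager expects the cluster members file to be named efm.nodes. If you name the cluster members file something other than efm.nodes, you must modify the Failover Manager service script to instruct Failover Manager to use the new name.

The cluster members file on the first node started can be empty; this node will become the Membership Coordinator. On each subsequent node, the cluster members file must contain the address and port number of the Membership Coordinator. Each entry in the cluster members file must be listed in an address:port format, with multiple entries separated by white space.

The Membership Coordinator will update the contents of the efm.nodes file to match the current members of the cluster. As agents join or leave the cluster, the efm.nodes files on other agents are updated to reflect the current cluster membership. If you invoke the efm stop-cluster command, Failover Manager does not modify the file.

If the Membership Coordinator leaves the cluster, another node will assume the role. You can use the efm cluster-status command to find the address of the Membership Coordinator. If a node joins or leaves a cluster while an agent is down, you must manually ensure that the file includes at least the current Membership Coordinator.

If you know the IP addresses and ports of the nodes that will be joining the cluster, you can include the addresses in the cluster members file at any time. At startup, any addresses that do not identify cluster members will be ignored unless the auto.allow.hosts property (in the cluster properties file) is set to true. For more information, see Section 4.1.2.

If the stable.nodes.file property is set to true, the Membership Coordinator will not update the .nodes file when cluster members join or leave the cluster. This behavior is most useful when the IP addresses of cluster members do not change often. For information about modifying cluster properties, see Section 3.5.1.1.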
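When a members file is edited by hand, a quick format check can catch typing mistakes before an agent is started. The following sketch verifies that every whitespace-separated entry has the address:port shape described above. The function name check_members_file is hypothetical, and the pattern assumes IPv4 addresses or host names (an address that itself contains colons would be rejected).

```shell
# Check that every whitespace-separated entry in a cluster members
# file looks like address:port. Prints each invalid entry and returns
# non-zero if any are found. An empty file is accepted, since the
# first node started may have an empty members file.
# $1 = path to the members file
check_members_file() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i !~ /^[^:]+:[0-9]+$/) {
                    printf "invalid entry: %s\n", $i
                    bad = 1
                }
            }
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, check_members_file /etc/edb/efm-3.4/efm.nodes returns 0 for a file containing 10.0.1.9:7800 10.0.1.10:7800 and reports an entry such as 10.0.1.9 that is missing its port.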

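3.6 Using Failover Manager with Virtual IP Addresses

Failover Manager uses the efm_address script to assign or release a virtual IP address.

Please note that virtual IP addresses are not supported by many cloud providers. In those environments, another mechanism should be used (such as an Elastic IP Address on AWS), which can be changed when needed by a fencing or post-promotion script.

By default, the script resides in: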

/usr/edb/efm-3.4/bin/efm_address

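Use the following command variations to assign or release an IPv4 or IPv6 IP address.

To assign a virtual IPv4 IP address:

# efm_address add4 interface_name IPv4_addr/prefix

To assign a virtual IPv6 IP address:

# efm_address add6 interface_name IPv6_addr/prefix

To release a virtual address:

# efm_address del interface_name IP_address/prefix

Where:

interface_name matches the name specified in the virtualIp.interface property in the cluster properties file.

IPv4_addr or IPv6_addr matches the value specified in the virtualIp property in the cluster properties file.

prefix matches the value specified in the virtualIp.prefix property in the cluster properties file.

For more information about the properties that describe a virtual IP address, see Section 3.5.1.1.

You must invoke the efm_address script as the root user. The efm user is created during the installation, and is granted privileges in the sudoers file to run the efm_address script. For more information about the sudoers file, see Section 3.4, Extending Failover Manager Permissions.

Testing the VIP

When using a virtual IP (VIP) address with Failover Manager, it is important to test the VIP functionality manually before starting Failover Manager. This will catch any network-related issues before they cause a problem during an actual failover. The following steps test the actions that Failover Manager will take. The example uses the following property values: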

virtualIp=172.24.38.239
virtualIp.interface=eth0
virtualIp.prefix=24
pingServerCommand=/bin/ping -q -c3 -w5

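Please note: the virtualIp.prefix property specifies the number of significant bits in the virtual IP address.

When instructed to ping the VIP from a node, use the command defined by the pingServerCommand property.

1. Ping the VIP from all nodes to confirm that the address is not already in use: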

# /bin/ping -q -c3 -w5 172.24.38.239
PING 172.24.38.239 (172.24.38.239) 56(84) bytes of data.
--- 172.24.38.239 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms

You should see 100% packet loss.

2. Run the efm_address add4 command on the master node to assign the VIP, and then confirm with ip address:

# efm_address add4 eth0 172.24.38.239/24
# ip address
<output truncated>
eth0 Link encap:Ethernet HWaddr 36:AA:A4:F4:1C:40
inet addr:172.24.38.239 Bcast:172.24.38.255
...

3. Ping the VIP from the other nodes to verify that they can reach the VIP:

# /bin/ping -q -c3 -w5 172.24.38.239
PING 172.24.38.239 (172.24.38.239) 56(84) bytes of data.
--- 172.24.38.239 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.023/0.025/0.029/0.006 ms

You should see no packet loss.

4. Use the efm_address del command to release the address on the master node, and confirm with ip address that the address has been released:

# efm_address del eth0 172.24.38.239/24
# ip address
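eth0 Link encap:Ethernet HWaddr 22:00:0A:89:02:8E
inet addr:10.137.2.142 Bcast:10.137.2.191
...

The output from this step should no longer show the VIP (172.24.38.239) on the eth0 interface.

5. Repeat step 3, this time verifying that the standby and witness nodes do not see the VIP in use: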

# /bin/ping -q -c3 -w5 172.24.38.239
PING 172.24.38.239 (172.24.38.239) 56(84) bytes of data.
--- 172.24.38.239 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3000ms

You should see 100% packet loss. Repeat this step on all nodes.

6. Repeat step 2 on all standby nodes to assign the VIP to every node. You can ping the VIP from any node to verify that it is in use.

# efm_address add4 eth0 172.24.38.239/24
# ip address
<output truncated>
eth0 Link encap:Ethernet HWaddr 36:AA:A4:F4:1C:40
inet addr:172.24.38.239 Bcast:172.24.38.255
...

After the test steps above, release the VIP from any non-master node before attempting to start Failover Manager.
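The ping checks in steps 1, 3, and 5 can be wrapped in a small helper so that each step reports whether the VIP is in the expected state. This is only a sketch for manual testing: the function names vip_reachable and check_vip_step and the in-use and free labels are hypothetical, and the ping command is passed in exactly as it appears in the pingServerCommand property.

```shell
# Return 0 if the address answers the ping command, non-zero otherwise.
# $1 = ping command without an address (the pingServerCommand value)
# $2 = address to test (the virtualIp value)
vip_reachable() {
    # $1 is intentionally unquoted so that its options split into words.
    $1 "$2" > /dev/null 2>&1
}

# Compare the observed state of the VIP with the state a step expects.
# $3 = "in-use" or "free"
check_vip_step() {
    local state
    if vip_reachable "$1" "$2"; then
        state=in-use
    else
        state=free
    fi
    if [ "$state" = "$3" ]; then
        echo "OK: $2 is $state"
    else
        echo "FAIL: $2 is $state, expected $3"
        return 1
    fi
}
```

With the example values, steps 1 and 5 correspond to check_vip_step "/bin/ping -q -c3 -w5" 172.24.38.239 free, and step 3 to the same call with in-use as the last argument.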

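Please note: the network interface used for the VIP does not have to be the same interface used for the Failover Manager agent's bind.address value. The master agent will drop the VIP as needed during a failover, and Failover Manager will verify that the VIP is no longer available before promoting a standby. A failure of the bind address network will lead to master isolation and failover.

If the VIP uses a different interface, you may encounter a timing condition in which the rest of the cluster checks for a reachable VIP before the master agent has dropped it. In this case, EFM will retry the VIP check for the number of seconds specified in the node.timeout property to help ensure that a failover happens as expected.

4 Using Failover Manager

Failover Manager offers support for monitoring and failover of clusters with one or more standby servers. You can add or remove nodes from the cluster as your demand for resources grows or shrinks.

If a master node reboots, Failover Manager may detect that the database is down on the master node and promote a standby node to the role of master. If this happens, the Failover Manager agent on the (rebooted) master node will not get a chance to write the recovery.conf file, and the rebooted master node will return to the cluster as a second master node. To prevent this, start the Failover Manager agent before starting the database server. The agent will start in idle mode and check whether there is already a master in the cluster. If there is a master node, the agent will verify that a recovery.conf file exists, and the database will not start as a second master.

4.1 Managing a Failover Manager Cluster

Once configured, a Failover Manager cluster requires no regular maintenance. The following sections provide information about performing the management tasks that may occasionally be required by your Failover Manager cluster.

By default, some of the commands listed below must be invoked by efm or by an OS superuser. An administrator can selectively permit users to invoke these commands by adding the user to the efm group. The commands are: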

•

efm allow-node

•

efm disallow-node

•

efm promote

•

efm resume

•

efm set-priority

•

efm stop-cluster

•

efm upgrade-conf

4.1.1 Starting the Failover Manager Cluster

You can start the nodes of a Failover Manager cluster in any order.

To start the Failover Manager cluster on RHEL 6.x or CentOS 6.x, assume superuser privileges, and invoke the command:

service efm-3.4 start

To start the Failover Manager cluster on RHEL 7.x or CentOS 7.x, assume superuser privileges, and invoke the command:

systemctl start efm-3.4

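If the cluster properties file for the node specifies that is.witness is true, the node will start as a witness node.

If the node is not a dedicated witness node, Failover Manager will connect to the local database and invoke the pg_is_in_recovery() function. If the server responds false, the agent assumes the node is a master node, and assigns a virtual IP address to the node (if applicable). If the server responds true, the Failover Manager agent assumes that the node is a standby server. If the server does not respond, the agent will start in an idle state.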
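The decision described above can be summarized as a three-way mapping. The sketch below expresses it as a shell function that can be used to predict how an agent will start on a given node; the name startup_role is hypothetical, and the function mirrors the description in this section rather than the agent's actual code. It expects the unaligned, tuples-only psql output for a boolean, which is t or f.

```shell
# Map the answer of "SELECT pg_is_in_recovery()" to the role the agent
# is described as assuming at startup.
# $1 = "f", "t", or empty when the database did not answer
startup_role() {
    case "$1" in
        f) echo master ;;     # not in recovery
        t) echo standby ;;    # in recovery
        *) echo idle ;;       # no usable answer from the database
    esac
}

# Example query with psql (connection options are site-specific):
#   startup_role "$(psql -At -c 'SELECT pg_is_in_recovery()' 2>/dev/null)"
```

Running the query by hand before starting an agent is a simple way to confirm that a node intended as a standby is in fact in recovery.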

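After joining the cluster, the Failover Manager agent checks the supplied database credentials to ensure that it can connect to all of the databases within the cluster. If the agent cannot connect, the agent will shut down.

If a new master or standby node joins a cluster, all of the existing nodes will also confirm that they can connect to the database on the new node.

4.1.2 Adding Nodes to a Cluster

You can add a node to a Failover Manager cluster at any time. When you add a node to a cluster, you must modify the cluster to allow the new node, and then tell the new node how to find the cluster. The following steps detail adding a node to a cluster:

1.

Unless auto.allow.hosts is set to true, use the efm allow-node command to add the IP address of the new node to the Failover Manager Allowed node host list. When invoking the command, specify the cluster name and the IP address of the new node: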

efm allow-node cluster_name ip_address

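For more information about using the efm allow-node command or controlling a Failover Manager service, see Section 5.

2.

Install a Failover Manager agent and configure the cluster properties file on the new node. For more information about modifying the properties file, see Section 3.5.1.

3.

Configure the cluster members file on the new node, adding an entry for the Membership Coordinator. For more information about modifying the cluster members file, see Section 3.5.2.

4.

Assume superuser privileges on the new node, and start the Failover Manager agent. To start the Failover Manager cluster on RHEL 6.x or CentOS 6.x, assume superuser privileges, and invoke the command: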

service efm-3.4 start

To start the Failover Manager cluster on RHEL 7.x or CentOS 7.x, assume superuser privileges, and invoke the command:

systemctl start efm-3.4

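When the new node joins the cluster, Failover Manager will send a notification to the administrator email address provided in the user.email property, or invoke the specified notification script.

Please note: To be a useful standby for the current node, the node must be a standby in the PostgreSQL streaming replication scenario.

4.1.3 Changing the Priority of a Standby

If your Failover Manager cluster includes more than one standby server, you can use the efm set-priority command to influence the promotion priority of a standby node. Invoke the command on any existing member of the Failover Manager cluster, and specify a priority value after the IP address of the member.

For example, the following command instructs Failover Manager that the acctg cluster member that is monitoring 10.0.1.9:7800 is the primary standby (1):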

efm set-priority acctg 10.0.1.9:7800 1

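In the event of a failover, Failover Manager will first retrieve information from Postgres streaming replication to confirm which standby node has the most recent data, and promote the node with the least chance of data loss. If two standby nodes contain equally up-to-date data, the node with the higher user-specified priority will be promoted to master. To check the priority of your standby nodes, use the command: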

efm cluster-status cluster_name

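Please note: The promotion priority may change if a node becomes isolated from the cluster and later rejoins it.

4.1.4 Promoting a Failover Manager Node

You can invoke efm promote on any node of a Failover Manager cluster to start a manual promotion of a standby database to master.

Manual promotion should only be performed during a maintenance window for your database cluster. If you do not have an up-to-date standby database available, you will be prompted before continuing. To start a manual promotion, assume the identity of efm or the OS superuser, and invoke the command:

efm promote cluster_name [-switchover]
[-sourcenode <address>] [-quiet]

Where:

cluster_name is the name of the Failover Manager cluster.

Include the -switchover option to reconfigure the original master as a standby. If you include the -switchover keyword, the cluster must include a master node and at least one standby, and the nodes must be in sync.

Include the -sourcenode keyword to specify the node from which the recovery.conf file will be copied to the master.

Include the -quiet switch to suppress notifications during the switchover process.

During switchover:

•

The recovery.conf file is copied from an existing standby to the master node.

•

The master database is stopped.

•

If you are using a VIP, the address is released from the master node.

•

A standby is promoted to replace the master node, and acquires the VIP.

•

The address of the new master node is added to the recovery.conf file.

•

The old master is started; the agent will resume monitoring it as a standby.

During a manual promotion, the master agent releases the virtual IP address before creating a recovery.conf file in the directory specified by the db.recovery.conf.dir property. The master agent remains running, and assumes a status of Idle.

The standby agent confirms that the virtual IP address is no longer in use before pinging a well-known address to ensure that the agent is not isolated from the network. The standby agent runs the fencing script and promotes the standby database to master. The standby agent then assigns the virtual IP address to the standby node, and runs the post-promotion script (if applicable).

Please note that this command instructs the service to ignore the value specified in the auto.failover parameter in the cluster properties file.

To return a node to the role of master, place the node first in the promotion list: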

efm set-priority cluster_name ip_address priority

Then, perform a manual promotion:

efm promote cluster_name -switchover

For more information about using the efm utility, see Section 5.3.

4.1.5 Stopping a Failover Manager Agent

When you stop an agent, Failover Manager will remove the node's address from the cluster members list on all of the running nodes of the cluster, but will not remove the address from the Failover Manager Allowed node host list.

To stop the Failover Manager agent on RHEL 6.x or CentOS 6.x, assume superuser privileges, and invoke the command:

service efm-3.4 stop

To stop the Failover Manager agent on RHEL 7.x or CentOS 7.x, assume superuser privileges, and invoke the command:

systemctl stop efm-3.4

Until you invoke the efm disallow-node command (removing the node's address from the Allowed node host list), you can use the service efm-3.4 start command to restart the node at a later date without first running the efm allow-node command again.

Please note that stopping an agent does not signal the cluster that the agent has failed.

4.1.6 Stopping a Failover Manager Cluster

To stop a Failover Manager cluster, connect to any node of the cluster, assume the identity of efm or the OS superuser, and invoke the command:

efm stop-cluster cluster_name

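The command causes all Failover Manager agents to exit. Terminating the Failover Manager agents completely disables all failover functionality.

Please note: when you invoke the efm stop-cluster command, all authorized node information is lost from the Allowed node host list.

4.1.7 Removing a Node from a Cluster

The efm disallow-node command removes the IP address of a node from the Failover Manager Allowed node host list. Assume the identity of efm or the OS superuser on any existing node (that is currently part of the running cluster), and invoke the efm disallow-node command, specifying the cluster name and the IP address of the node: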

efm disallow-node cluster_name ip_address

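The efm disallow-node command will not stop a running agent; the service will continue to run on the node until you stop the agent (for information about controlling the agent, see Section 5). If the agent or cluster is subsequently stopped, the node will not be allowed to rejoin the cluster, and will be removed from the failover priority list (and will be ineligible for promotion).

After invoking the efm disallow-node command, you must use the efm allow-node command to add the node to the cluster again. For more information about using the efm utility, see Section 5.3.

4.2 Monitoring a Failover Manager Cluster

You can use either the Failover Manager efm cluster-status command or the PEM Client graphical interface to check the current status of a monitored node of a Failover Manager cluster.

4.2.1 Reviewing the Cluster Status Report

The cluster-status command returns a report that contains information about the status of the Failover Manager cluster. To invoke the command, enter:

# efm cluster-status cluster_name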

The following status report is for a cluster named efm that has four nodes running:

efm cluster-status efm
Cluster Status: efm

Agent Type  Address        Agent  DB   VIP
-----------------------------------------------------
Witness     172.19.12.170  UP     N/A
Master      172.19.13.105  UP     UP   172.19.13.107*
Standby     172.19.13.113  UP     UP   172.19.13.106
Standby     172.19.14.106  UP     UP   172.19.13.108

Allowed node host list:
172.19.12.170 172.19.13.113 172.19.13.105 172.19.14.106

Membership coordinator: 172.19.12.170

Standby priority host list:
172.19.13.113 172.19.14.106

Promote Status:

DB Type Address XLog Loc Info
-------------------------------------------------------
Master 172.19.13.105 0/31000140
Standby 172.19.13.113 0/31000140
Standby 172.19.14.106 0/31000140

Standby database(s) in sync with master. It is safe to promote.

[root@FOUR efm-3.4]}:

The Cluster Status section provides an overview of the status of the agents that reside on each node of the cluster:

Cluster Status: efm

Agent Type  Address        Agent  DB   VIP
-----------------------------------------------------
Witness     172.19.12.170  UP     N/A
Master      172.19.13.105  UP     UP   172.19.13.107*
Standby     172.19.13.113  UP     UP   172.19.13.106
Standby     172.19.14.106  UP     UP   172.19.13.108

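The asterisk (*) after a VIP address indicates that the address is available for connections. If a VIP address is not followed by an asterisk, the address has been associated with the node (in the properties file), but is not currently in use.

Failover Manager agents provide the information displayed in the Cluster Status section.

The Allowed node host list and the Standby priority host list make it easy to tell which nodes may join the cluster, and the order in which standby nodes will be promoted. The IP address of the Membership coordinator is also displayed in the report: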

Allowed node host list:
172.19.12.170 172.19.13.113 172.19.13.105 172.19.14.106

Membership coordinator: 172.19.12.170

Standby priority host list:
172.19.13.113 172.19.14.106

The Promote Status section of the report is the result of a direct query from the node on which you invoke the cluster-status command to each database in the cluster. The query also returns the transaction log location of each database:

Promote Status:

DB Type     Address        XLog Loc      Info
-------------------------------------------------------
Master      172.19.13.105  0/31000140
Standby     172.19.13.113  0/31000140
Standby     172.19.14.106  0/31000140

Standby database(s) in sync with master. It is safe to promote.

If a database is down (or if the database has been restarted, but the resume command has not yet been invoked), the state of the agent that resides on that host will be Idle. If an agent is idle, the cluster status report includes a summary of the condition of the idle node:

Agent Type  Address        Agent  DB   VIP
-----------------------------------------------------
Idle        172.19.18.105  UP     UP   172.19.13.105

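Exit Codes

The cluster status process returns an exit code that is based on the state of the cluster:

• An exit code of 0 indicates that all agents are running, and that the databases on the master and standby nodes are running and in sync.

• A non-zero exit code indicates that there is a problem. The following problems can trigger a non-zero exit code:

A database is down or unknown (or has an idle agent).

Failover Manager cannot decrypt the provided database password.

There is a problem contacting the databases to get xlog locations.

There is no master agent.

There are no standby agents.

One or more of the standby nodes are not in sync with the master.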
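Because the exit code separates a healthy cluster (0) from every problem case (non-zero), a monitoring job can act on it without parsing the report. The following is a minimal sketch under that assumption; the function name cluster_is_healthy is hypothetical, and since this guide does not map individual problems to specific codes, the sketch only reports the raw value.

```python
import subprocess
import sys
from typing import List


def cluster_is_healthy(status_command: List[str]) -> bool:
    """Run the status command and return True only when it exits with code 0."""
    result = subprocess.run(
        status_command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        # Any non-zero value signals one of the problems listed above.
        print(f"status command exited with {result.returncode}", file=sys.stderr)
    return result.returncode == 0
```

On a cluster node, and running as efm or root, a call such as cluster_is_healthy(["efm", "cluster-status", "efm"]) would check the cluster named efm.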

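4.2.2 Monitoring Streaming Replication with Postgres Enterprise Manager

If you use Postgres Enterprise Manager (PEM) to monitor your servers, you can configure the Streaming Replication Analysis dashboard (part of the PEM graphical interface) to display the state of a master or standby node that is part of a streaming replication scenario.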


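Figure 4.1 - The Streaming Replication dashboard (master node)

The Streaming Replication Analysis dashboard (shown in Figure 4.1) displays statistical information about activity on any monitored server on which streaming replication is enabled. The dashboard header identifies the status of the monitored server (either Replication Master or Replication Slave), and displays the date and time that the server was last started, the date and time that the page was last updated, and a current count of triggered alerts for the server.

When reviewing the dashboard for a Replication Slave (a standby node), a label at the bottom of the dashboard confirms the status of the server (see Figure 4.2).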


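Figure 4.2 - The Streaming Replication dashboard (standby node)

By default, the PEM replication probes that provide information for the Streaming Replication Analysis dashboard are disabled.

To view the Streaming Replication Analysis dashboard for the master node of a replication scenario, you must enable the following probes:

• Streaming Replication

• WAL Archive Status

To view the Streaming Replication Analysis dashboard for the standby node of a replication scenario, you must enable the following probe:

• Streaming Replication Lag Time

For more information about PEM, please visit the EnterpriseDB website at: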

http://www.enterprisedb.com/products-services-training/products/postgres-enterprise-manager

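4.3 Running Multiple Agents on a Single Node

You can monitor multiple database clusters that reside on the same host by running multiple Master or Standby agents on that Failover Manager node. You may also run multiple Witness agents on a single node. To configure Failover Manager to monitor more than one database cluster, while ensuring that Failover Manager agents from different clusters do not interfere with each other, you must:

1. Create a cluster properties file for each member of each cluster that defines a unique set of properties and the role of the node within the cluster.

2. Create a cluster members file for each member of each cluster that lists the members of the cluster.

3. Customize the service script (on a RHEL or CentOS 6.x system) or the unit file (on a RHEL or CentOS 7.x system) for each cluster to specify the names of the cluster properties and cluster members files.

4. Start the services for each cluster.

The examples that follow use two database clusters (acctg and sales) running on the same node:

• Data for acctg resides in /opt/pgdata1; its server listens on port 5444.

• Data for sales resides in /opt/pgdata2; its server listens on port 5445.

To run a Failover Manager agent for both of these database clusters, use the efm.properties.in template to create two properties files. Each cluster properties file must have a unique name. For this example, we create acctg.properties and sales.properties to match the acctg and sales database clusters.

The following parameters must be unique in each cluster properties file: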

admin.port
bind.address
db.port
db.recovery.conf.dir
virtualIp (if used)
virtualIp.interface (if used)

Within each cluster properties file, the db.port parameter should specify a unique value for each cluster, while the db.user and db.database parameters may have the same value or a unique value. For example, the acctg.properties file may specify:

db.user=efm_user
db.password.encrypted=7c801b32a05c0c5cb2ad4ffbda5e8f9a
db.port=5444
db.database=acctg_db

While the sales.properties file may specify:

db.user=efm_user
db.password.encrypted=e003fea651a8b4a80fb248a22b36f334
db.port=5445
db.database=sales_db

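Some parameters require special attention when you set up more than one Failover Manager cluster agent on the same node. If multiple agents reside on the same node, each port must be unique. Any two ports will work, but it may be easier to keep the information clear if you use ports that are not too close to each other.

When creating the cluster properties file for each cluster, the db.recovery.conf.dir parameter must also specify a value that is unique for each respective database cluster.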
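The uniqueness rules above can be checked mechanically before the agents are started. The following is a minimal sketch, not a Failover Manager tool: parse_properties and shared_values are hypothetical helpers, the parser handles only simple key=value lines, and blank values are treated as "not used" so that an unset virtualIp does not count as a clash.

```python
from typing import Dict, List

# Parameters this guide lists as needing a distinct value per cluster.
MUST_BE_UNIQUE = [
    "admin.port",
    "bind.address",
    "db.port",
    "db.recovery.conf.dir",
    "virtualIp",
    "virtualIp.interface",
]


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping comments and blank lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def shared_values(first: Dict[str, str], second: Dict[str, str]) -> List[str]:
    """Return the must-be-unique parameters that hold the same non-blank value."""
    clashes = []
    for key in MUST_BE_UNIQUE:
        a, b = first.get(key, ""), second.get(key, "")
        if a and a == b:  # a blank value means the parameter is unused
            clashes.append(key)
    return clashes
```

A caller would read acctg.properties and sales.properties, pass both through parse_properties, and treat any name returned by shared_values as a configuration error.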

The following parameters are used when assigning the virtual IP address to a node. If your Failover Manager cluster does not use a virtual IP address, leave these parameters blank.

virtualIp
virtualIp.interface
virtualIp.prefix

These parameter values are determined by the virtual IP addresses being used, and may or may not be the same for both acctg.properties and sales.properties.

After creating the acctg.properties and sales.properties files, create a service script or unit file for each cluster that points to the respective properties file. This step is platform-specific. If you are using RHEL 6.x or CentOS 6.x, see Section 4.3.1. If you are using RHEL 7.x or CentOS 7.x, see Section 4.3.2.

Please note: if you are using a custom service script or unit file, you must manually update the file to reflect the new service name when you upgrade Failover Manager.

4.3.1 RHEL 6.x or CentOS 6.x

If you are using RHEL 6.x or CentOS 6.x, copy the efm-3.4 service script to a new file with a name that is unique for each cluster. For example:

# cp /etc/init.d/efm-3.4 /etc/init.d/efm-acctg

# cp /etc/init.d/efm-3.4 /etc/init.d/efm-sales

Then, edit the CLUSTER variable in each copy, changing the cluster name from efm to acctg or sales.

After creating the service scripts, run:

# chkconfig efm-acctg on

# chkconfig efm-sales on

Then, use the new service scripts to start the agents. For example, you can start the acctg agent with the command:

# service efm-acctg start

4.3.2 RHEL 7.x or CentOS 7.x

If you are using RHEL 7.x or CentOS 7.x, copy the efm-3.4 unit file to a new file with a name that is unique for each cluster. For example, if you have two clusters (named acctg and sales), the unit file names might be:

/etc/systemd/system/efm-acctg.service

/etc/systemd/system/efm-sales.service

Then, edit the CLUSTER variable within each unit file, changing the specified cluster name from efm to the new cluster name. For example, for a cluster named acctg, the value would specify:

Environment=CLUSTER=acctg

You must also update the value of the PIDFile parameter to specify the new cluster name. For example:

PIDFile=/var/run/efm-3.4/acctg.pid
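The two edits described above (the CLUSTER variable and the PIDFile path) can be applied to a copy of the unit file text with a short script. The following is a minimal sketch: customize_unit is a hypothetical helper, and SAMPLE_UNIT is an assumed excerpt rather than the contents of the shipped efm-3.4 unit file, so check the real file before relying on the patterns.

```python
import re

# Assumed excerpt of a unit file; the shipped file contains more settings.
SAMPLE_UNIT = """[Service]
Environment=CLUSTER=efm
PIDFile=/var/run/efm-3.4/efm.pid
"""


def customize_unit(template: str, cluster: str) -> str:
    """Return unit-file text whose CLUSTER and PIDFile entries name the cluster."""
    text = re.sub(
        r"^(Environment=CLUSTER=).*$",
        lambda m: m.group(1) + cluster,
        template,
        flags=re.MULTILINE,
    )
    text = re.sub(
        r"^(PIDFile=/var/run/efm-3\.4/).*$",
        lambda m: m.group(1) + cluster + ".pid",
        text,
        flags=re.MULTILINE,
    )
    return text


print(customize_unit(SAMPLE_UNIT, "acctg"))
```

Writing the result to a file such as /etc/systemd/system/efm-acctg.service requires root privileges and is left to the caller.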

After copying the unit files, use the following commands to enable the services:

# systemctl enable efm-acctg.service

# systemctl enable efm-sales.service

Then, use the new unit files to start the agents. For example, you can start the acctg agent with the command:

# systemctl start efm-acctg

For information about customizing a unit file, please visit:

http://fedoraproject.org/wiki/Systemd#How_do_I_customize_a_unit_file.2F_add_a_custom_unit_file.3F

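5 Controlling the Failover Manager Service

Each node in a Failover Manager cluster hosts a Failover Manager agent that is controlled by a service script. By default, the service script expects to find:

• A configuration file named efm.properties that contains the properties used by the Failover Manager service. Each node of a replication scenario must contain a properties file that provides information about the node.

• A cluster members file named efm.nodes that contains a list of the cluster members. Each node of a replication scenario must contain a cluster members list.

Note that if you are running multiple clusters on a single node, you must manually create configuration files with cluster-specific names and modify the service script for the corresponding clusters.

The commands that control the Failover Manager service are platform-specific. For information about controlling Failover Manager on a RHEL 6.x or CentOS 6.x host, see Section 5.1. If you are using RHEL 7.x or CentOS 7.x, see Section 5.2.

5.1 Using the service Utility on RHEL 6.x and CentOS 6.x

On RHEL 6.x and CentOS 6.x, Failover Manager runs as a Linux service named (by default) efm-3.4 that is located in /etc/init.d. Each database cluster monitored by Failover Manager runs a copy of the service on each node of the replication cluster.

Use the following service commands to control a Failover Manager agent that resides on a RHEL 6.x or CentOS 6.x host: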

service efm-3.4 start

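The start command starts the Failover Manager agent on the current node. The local Failover Manager agent monitors the local database and communicates with Failover Manager on the other nodes. You can start the nodes in a Failover Manager cluster in any order.

This command must be invoked by root.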

service efm-3.4 stop

The stop command stops Failover Manager on the current node. This command must be invoked by root.

service efm-3.4 status

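The status command returns the status of the Failover Manager agent on the node on which it is invoked. You can invoke the status command on any node to instruct Failover Manager to return status information. For example: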

[witness@localhost ~]# service efm-3.4 status

efm-3.4 (pid 50836) is running...

service efm-3.4 help

The help command displays online help for the Failover Manager service script.

5.2 Using the systemctl Utility on RHEL 7.x and CentOS 7.x

On RHEL 7.x and CentOS 7.x, Failover Manager runs as a Linux service named (by default) efm-3.4.service that is located in /usr/lib/systemd/system. Each database cluster monitored by Failover Manager runs a copy of the service on each node of the replication cluster.

Use the following systemctl commands to control a Failover Manager agent that resides on a RHEL 7.x or CentOS 7.x host:

systemctl start efm-3.4

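The start command starts the Failover Manager agent on the current node. The local Failover Manager agent monitors the local database and communicates with Failover Manager on the other nodes. You can start the nodes in a Failover Manager cluster in any order.

This command must be invoked by root.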

systemctl stop efm-3.4

The stop command stops Failover Manager on the current node. This command must be invoked by root.

systemctl status efm-3.4

The status command returns the status of the Failover Manager agent on the node on which it is invoked. You can invoke the status command on any node to instruct Failover Manager to return status and server startup information. For example:

[root@ONE ~]}> systemctl status efm-3.4
efm-3.4.service - EnterpriseDB Failover Manager 3.4
Loaded: loaded (/usr/lib/systemd/system/efm-3.4.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2013-02-14 14:02:16 EST; 4s ago
Process: 58125 ExecStart=/bin/bash -c /usr/edb/efm-3.4/bin/runefm.sh start ${CLUSTER} (code=exited, status=0/SUCCESS)
Main PID: 58180 (java)
CGroup: /system.slice/efm-3.4.service
└─58180 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/bin/java -cp /usr/edb/efm-3.4/lib/EFM-3.4.0.jar -Xmx128m -agentlib:jdwp=transport...

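5.3 Using the efm Utility

Failover Manager provides the efm utility to assist with cluster management. The RPM installer adds the utility to the /usr/edb/efm-3.4/bin directory when you install Failover Manager.

efm allow-node cluster_name ip_address

Invoke the efm allow-node command to allow the specified node to join the cluster. When invoking the command, provide the name of the cluster and the IP address of the joining node.

This command must be invoked by efm, a member of the efm group, or root.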

efm cluster-status cluster_name

Invoke the efm cluster-status command to display the status of a Failover Manager cluster. For more information about the cluster status report, see Section 4.2.1.

efm cluster-status-json cluster_name

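Invoke the efm cluster-status-json command to display the status of a Failover Manager cluster in JSON format. Although the format of the displayed information differs from the display generated by the efm cluster-status command, the information source is the same.

The following example was generated by querying the status of a healthy cluster that has a master node, a standby node, and a witness node: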

{
    "nodes": {
        "172.16.144.176": {
            "type": "Witness",
            "agent": "UP",
            "db": "N\/A",
            "vip": "",
            "vip_active": false
        },
        "172.16.144.177": {
            "type": "Master",
            "agent": "UP",
            "db": "UP",
            "vip": "",
            "vip_active": false,
            "xlog": "2\/77000220",
            "xloginfo": ""
        },
        "172.16.144.180": {
            "type": "Standby",
            "agent": "UP",
            "db": "UP",
            "vip": "",
            "vip_active": false,
            "xlog": "2\/77000220",
            "xloginfo": ""
        }
    },
    "allowednodes": [
        "172.16.144.177",
        "172.16.144.160",
        "172.16.144.180",
        "172.16.144.176"
    ],
    "membershipcoordinator": "172.16.144.177",
    "failoverpriority": [
        "172.16.144.180"
    ],
    "minimumstandbys": 0,
    "missingnodes": [],
    "messages": []
}
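The JSON report is convenient for scripted checks. The following is a minimal sketch that reads the field names shown in the example above; the summarize function is hypothetical, SAMPLE_STATUS is an abbreviated copy of that example, and treating equal xlog strings as "in sync" is an assumption of this sketch rather than the rule Failover Manager itself applies.

```python
import json
from typing import Dict, List

# Abbreviated copy of the example report (vip and message fields omitted).
SAMPLE_STATUS = r"""
{
  "nodes": {
    "172.16.144.176": {"type": "Witness", "agent": "UP", "db": "N\/A"},
    "172.16.144.177": {"type": "Master", "agent": "UP", "db": "UP", "xlog": "2\/77000220"},
    "172.16.144.180": {"type": "Standby", "agent": "UP", "db": "UP", "xlog": "2\/77000220"}
  },
  "allowednodes": ["172.16.144.177", "172.16.144.160", "172.16.144.180", "172.16.144.176"],
  "failoverpriority": ["172.16.144.180"]
}
"""


def summarize(status: Dict) -> Dict[str, List[str]]:
    """Group node addresses by the kind of attention they may need."""
    nodes = status["nodes"]
    master_xlogs = {n.get("xlog") for n in nodes.values() if n["type"] == "Master"}
    unhealthy = sorted(
        addr for addr, n in nodes.items()
        if n["agent"] != "UP" or n["db"] not in ("UP", "N/A")
    )
    lagging = sorted(
        addr for addr, n in nodes.items()
        if n["type"] == "Standby" and n.get("xlog") not in master_xlogs
    )
    # Addresses that are allowed to join but are not reported as running nodes.
    absent = sorted(set(status["allowednodes"]) - set(nodes))
    return {"unhealthy": unhealthy, "lagging": lagging, "allowed_but_absent": absent}


print(summarize(json.loads(SAMPLE_STATUS)))
```

In the example report, the address 172.16.144.160 appears in allowednodes but has no entry under nodes, which is the kind of detail such a summary can surface.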

efm disallow-node cluster_name ip_address

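Invoke the efm disallow-node command to remove the specified node from the allowed hosts list, and prevent the node from joining a cluster. Provide the name of the cluster and the IP address of the node when calling the efm disallow-node command. This command must be invoked by efm, a member of the efm group, or root.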

efm encrypt cluster_name [--from-env]

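Invoke the efm encrypt command to encrypt the database password before you include the password in the cluster properties file. Include the --from-env option to instruct Failover Manager to use the value specified in the EFMPASS environment variable and to execute without user input. For more information, see Section 3.5.1.2.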
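For unattended use, the --from-env form can be driven from a script that sets EFMPASS only for the child process. The following is a minimal sketch: encrypt_invocation is a hypothetical helper, and the EFM_BIN path assumes that the executable is named efm inside the /usr/edb/efm-3.4/bin directory mentioned earlier in this section.

```python
import os
from typing import Dict, List, Tuple

# Assumed location of the utility (directory taken from this guide).
EFM_BIN = "/usr/edb/efm-3.4/bin/efm"


def encrypt_invocation(cluster: str, password: str) -> Tuple[List[str], Dict[str, str]]:
    """Build the argv and environment for a non-interactive efm encrypt call."""
    env = dict(os.environ)     # copy, so the parent environment stays untouched
    env["EFMPASS"] = password  # read by efm encrypt when --from-env is given
    return [EFM_BIN, "encrypt", cluster, "--from-env"], env
```

The returned pair can be passed to subprocess.run(argv, env=env) on a node where Failover Manager is installed; how the encrypted value is then captured is left to the caller.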

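efm promote cluster_name [-switchover [-sourcenode address] [-quiet]]

The promote command instructs Failover Manager to perform a manual failover of a standby to master.

Manual promotion should only be attempted if the status command reports that the cluster includes a standby node that is up-to-date with the master. If there is no up-to-date standby, Failover Manager prompts you before continuing.

Include the -switchover clause to promote a standby node and reconfigure the master node as a standby node. Include the -sourcenode keyword and specify a node address to indicate the node from which the recovery.conf file will be copied to the old master node (making it a standby). Include the -quiet keyword to suppress notifications during the switchover process.

This command must be invoked by efm, a member of the efm group, or root.

Please note that this command instructs the service to ignore the value specified in the auto.failover parameter in the cluster properties file.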

efm resume cluster_name

Invoke the efm resume command to resume monitoring a previously stopped database. This command must be invoked by efm, a member of the efm group, or root.

efm set-priority cluster_name ip_address priority

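Invoke the efm set-priority command to assign a failover priority to a standby node. The value specifies the order in which the node will be used in the event of a failover. This command must be invoked by efm, a member of the efm group, or root.

priority is an integer value of 1 to n, where n is the number of standby nodes in the list. Specify a value of 1 to indicate that the node is the primary standby, and will be the first node promoted in the event of a failover. A priority of 0 instructs Failover Manager not to promote the standby.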

efm stop-cluster cluster_name

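Invoke the efm stop-cluster command to stop Failover Manager on all nodes. This command instructs Failover Manager to connect to each node of the cluster and instruct the existing members to shut down. The command has no effect on running databases, but when the command completes, there is no failover protection in place.

Please note: when you invoke the efm stop-cluster command, all authorized node information is removed from the Allowed node host list.

This command must be invoked by efm, a member of the efm group, or root.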

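efm upgrade-conf cluster_name [-source directory]

Invoke the efm upgrade-conf command to copy the configuration files from an existing Failover Manager installation, and add the parameters required by a Failover Manager 3.4 installation. Provide the name of the previous cluster when invoking the utility. This command must be invoked with root privileges.

If you are upgrading from a Failover Manager configuration that does not use sudo, include the -source flag and specify the name of the directory in which the configuration files reside when invoking upgrade-conf.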

efm --help

Invoke the efm --help command to display online help for the Failover Manager utility commands.

6 Controlling Logging

Failover Manager writes and stores one log file per agent and one startup log per agent in /var/log/cluster_name-3.4 (where cluster_name specifies the name of the cluster).

You can control the level of detail written to the agent log by modifying the jgroups.loglevel and efm.loglevel parameters in the cluster properties file:

# Logging levels for JGroups and EFM.
# Valid values are: TRACE, DEBUG, INFO, WARN, ERROR
# Default value: INFO
# It is not necessary to increase these values unless debugging a
# specific issue. If nodes are not discovering each other at
# startup, increasing the jgroups level to DEBUG will show
# information about the TCP connection attempts that may help
# diagnose the connection failures.

jgroups.loglevel=INFO
efm.loglevel=INFO

The logging facilities use the Java logging library and logging levels. The log levels (in order from most logging output to least) are:

TRACE
DEBUG
INFO
WARN
ERROR

For example, if you set the efm.loglevel parameter to WARN, Failover Manager logs only messages at the WARN level and above (WARN and ERROR).
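The threshold rule can be illustrated in a few lines. This is a sketch of the ordering described above, not Failover Manager code; the is_logged function is hypothetical.

```python
# Levels in order from most logging output to least, as listed above.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]


def is_logged(message_level: str, configured_level: str) -> bool:
    """Return True when a message at message_level passes the configured threshold."""
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)


# With efm.loglevel=WARN, only WARN and ERROR messages are written.
print([level for level in LEVELS if is_logged(level, "WARN")])
```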

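By default, Failover Manager log files are rotated daily, compressed, and stored for a week. You can modify the file rotation schedule by changing settings in the log rotation file (/etc/logrotate.d/efm-3.4). For more information about modifying the log rotation schedule, consult the logrotate man page: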

$ man logrotate

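6.1 Enabling syslog Log File Entries

Failover Manager supports syslog logging. To implement syslog logging, you must configure syslog to allow UDP or TCP connections.

To allow a connection to syslog, edit the /etc/rsyslog.conf file and uncomment the protocol you wish to use. You must also ensure that the UDPServerRun or TCPServerRun entry associated with the protocol includes the port number to which log entries will be sent.

For example, the following configuration file entries enable UDP connections to port 514:

# Provides UDP syslog reception
$ModLoad imudp
$UDPServerRun 514

The following configuration file entries enable TCP connections to port 514:

# Provides TCP syslog reception
$ModLoad imtcp
$InputTCPServerRun 514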

After modifying the syslog configuration file, restart the rsyslog service to enable the connections:

systemctl restart rsyslog.service

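After modifying the rsyslog.conf file on the Failover Manager host, you must modify the Failover Manager properties to enable logging. Use your choice of editor to modify the properties file (/etc/edb/efm-3.4/efm.properties.in), specifying the type of logging that you wish to implement:

# Which logging is enabled.
file.log.enabled=true
syslog.enabled=false

You must also specify syslog details for your system. Use the syslog.protocol parameter to specify the protocol type (UDP or TCP) and the syslog.port parameter to specify the listener port of the syslog host. The syslog.facility value may be used as an identifier for the process that created the entry; the value must be between LOCAL0 and LOCAL7.

# Syslog information. The syslog service must be listening on
# the port for the given protocol, which can be UDP or TCP.
# The facilities supported are LOCAL0 through LOCAL7.
syslog.host=localhost
syslog.port=514
syslog.protocol=UDP
syslog.facility=LOCAL1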
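Before relying on Failover Manager to deliver entries, it can help to confirm that the syslog listener accepts messages with the same host, port, protocol, and facility. The following is a minimal sketch that uses Python's standard SysLogHandler; both function names are hypothetical, and the props dictionary simply mirrors the four syslog.* properties shown above.

```python
import logging
import logging.handlers
import socket
from typing import Dict


def syslog_handler_kwargs(props: Dict[str, str]) -> dict:
    """Translate the syslog.* properties into SysLogHandler arguments."""
    facility = props["syslog.facility"].lower()  # for example "local1"
    protocol = props["syslog.protocol"].upper()  # "UDP" or "TCP"
    return {
        "address": (props["syslog.host"], int(props["syslog.port"])),
        "facility": logging.handlers.SysLogHandler.facility_names[facility],
        "socktype": socket.SOCK_STREAM if protocol == "TCP" else socket.SOCK_DGRAM,
    }


def send_test_message(props: Dict[str, str], text: str = "syslog connectivity test") -> None:
    """Send one WARNING-level message to the configured syslog listener."""
    handler = logging.handlers.SysLogHandler(**syslog_handler_kwargs(props))
    logger = logging.getLogger("efm-syslog-check")
    logger.addHandler(handler)
    try:
        logger.warning(text)
    finally:
        logger.removeHandler(handler)
        handler.close()
```

After calling send_test_message with the values from the properties file, the message should appear wherever rsyslog routes the chosen facility. With UDP, a missing listener does not raise an error, so the check is only conclusive when the entry shows up in the log.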

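For more information about modifying the Failover Manager configuration file, see Section 3.5.

For more information about syslog, please see the syslog man page:

$ man syslog

7 Notifications

Failover Manager sends e-mail notifications and/or invokes a notification script when a notable event occurs that affects the cluster. If you have configured Failover Manager to send e-mail notifications, you must have an SMTP server running on port 25 on each node of the cluster. Use the following parameters to configure notification behavior for Failover Manager: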

user.email
script.notification

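For more information about editing the configuration properties, see Section 3.5.1.1.

The body of the notification contains details about the event that triggered the notification, and about the current state of the cluster. For example: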

EFM node: 10.0.1.11
Cluster name: acctg
Database name: postgres
VIP: ip_address (Active|Inactive)

Database health is not being monitored.

The VIP field displays the IP address and state of the virtual IP, if one is implemented for the node.
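A notification script or mail filter may want to pull these header fields out of the body. The following is a minimal sketch based only on the layout of the example above; parse_notification_body is a hypothetical helper, and the VIP address in SAMPLE_BODY is an invented value used for illustration.

```python
import re
from typing import Dict, Optional

SAMPLE_BODY = """EFM node: 10.0.1.11
Cluster name: acctg
Database name: postgres
VIP: 10.0.1.100 (Active)

Database health is not being monitored.
"""


def parse_notification_body(body: str) -> Dict[str, Optional[str]]:
    """Extract the node, cluster, database, and VIP details from a notification body."""
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("EFM node", "Cluster name", "Database name", "VIP"):
            fields[key.strip()] = value.strip()
    vip = fields.get("VIP")
    # The VIP line carries an address followed by its state in parentheses.
    match = re.fullmatch(r"(\S+)\s+\((Active|Inactive)\)", vip) if vip else None
    return {
        "node": fields.get("EFM node"),
        "cluster": fields.get("Cluster name"),
        "database": fields.get("Database name"),
        "vip_address": match.group(1) if match else None,
        "vip_state": match.group(2) if match else None,
    }


print(parse_notification_body(SAMPLE_BODY))
```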

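Failover Manager assigns a severity level to each notification. The following levels indicate increasing levels of attention required:

INFO indicates an informational message about the agent and does not require any manual intervention (for example, Failover Manager has started or stopped).

WARNING indicates that an event has happened that requires the administrator to check on the system (for example, failover has occurred).

SEVERE indicates that a serious event has happened and requires the immediate attention of the administrator (for example, failover was attempted, but was unable to complete).

The severity level designates the urgency of the notification. A notification with a severity level of SEVERE requires user attention immediately, while a notification with a severity level of INFO calls your attention to operational information about your cluster that does not require user action. Notification severity levels are not related to logging levels; all notifications are sent regardless of the log level detail specified in the configuration file.

You can use the notification.level property to specify the minimum severity level that will trigger a notification. For more information, see Section 3.5.1.1.

The conditions listed in the table below trigger an INFO level notification: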

Subject

Description

Executed fencing script

Executed fencing script script_name Results: script_results

Executed post-promotion script

Executed post-promotion script script_name Results: script_results

Executed remote pre-promotion script

Executed remote pre-promotion script script_name Results: script_results

Executed remote post-promotion script

Executed remote post-promotion script script_name Results: script_results

Executed post-database failure script

Executed post-database failure script script_name Results: script_results

Executed master isolation script

Executed master isolation script script_name Results: script_results

Witness agent running on node_address for cluster cluster_name

Witness agent is running.

Master agent running on node_address for cluster cluster_name

Master agent is running and database health is being monitored.

Standby agent running on node_address for cluster cluster_name

Standby agent is running and database health is being monitored.

Idle agent running on node node_address for cluster cluster_name

Idle agent is running. After starting the local database, the agent can be resumed.

Assigning VIP to node node_address

Assigning VIP VIP_address to node node_address Results: script_results

Releasing VIP from node node_address

Releasing VIP VIP_address from node node_address Results: script_results

Starting auto resume check for cluster cluster_name

The agent on this node will check every auto.resume.period seconds to see if it can resume monitoring the failed database. The cluster should be checked during this time and the agent stopped if the database will not be started again. See the agent log for more details.

Executed agent resumed script

Executed agent resumed script script_name Results: script_results

The conditions listed in the table below trigger a WARNING level notification:

Subject

Description

Witness agent exited on node_address for cluster cluster_name

Witness agent has exited.

Master agent exited on node_address for cluster cluster_name

Database health is not being monitored.

Cluster cluster_name notified that master node has left

Failover is disabled for the cluster until the master agent is restarted.

Standby agent exited on node_address for cluster cluster_name

Database health is not being monitored.

Agent exited during promotion on node_address for cluster cluster_name

Database health is not being monitored.

Agent exited on node_address for cluster cluster_name

The agent has exited. This is generated by an agent in the Idle state.

Agent exited for cluster cluster_name

The agent has exited. This notification is usually generated during startup when an agent exits before startup has completed.

Virtual IP address assigned to non-master node

The virtual IP address appears to be assigned to a non-master node. To avoid any conflicts, Failover Manager will release the VIP. You should confirm that the VIP is assigned to your master node and manually reassign the address if it is not.

Virtual IP address not assigned to master node.

The virtual IP address appears to not be assigned to a master node. EDB Postgres Failover Manager will attempt to reacquire the VIP.

No standby agent in cluster for cluster cluster_name

The standbys on cluster_name have left the cluster.

Standby agent failed for cluster cluster_name

A standby agent on cluster_name has left the cluster, but the coordinator has detected that the standby database is still running.

Standby database failed for cluster cluster_name

A standby agent has signaled that its database has failed. The other nodes also cannot reach the standby database.

Standby agent cannot reach database for cluster cluster_name

A standby agent has signaled database failure, but the other nodes have detected that the standby database is still running.

Cluster cluster_name has dropped below three nodes

At least three nodes are required for full failover protection. Please add witness or agent node to the cluster.

Subset of cluster cluster_name disconnected from master

This node is no longer connected to the majority of the cluster cluster_name. Because this node is part of a subset of the cluster, failover will not be attempted. Current nodes that are visible are: node_address

Promotion has started on cluster cluster_name.

The promotion of a standby has started on cluster cluster_name.

Witness failure for cluster cluster_name

Witness running at node_address has left the cluster.

Idle agent failure for cluster cluster_name.

Idle agent running at node_address has left the cluster.

One or more nodes isolated from network for cluster cluster_name

This node appears to be isolated from the network. Other members seen in the cluster are: node_name

Node no longer isolated from network for cluster cluster_name.

This node is no longer isolated from the network.

Standby agent tried to promote, but master DB is still running

The standby EFM agent tried to promote itself, but detected that the master DB is still running on node_address. This usually indicates that the master EFM agent has exited. Failover has NOT occurred.

Standby agent started to promote, but master has rejoined.

The standby EFM agent started to promote itself, but found that a master agent has rejoined the cluster. Failover has NOT occurred.

Standby agent tried to promote, but could not verify master DB

The standby EFM agent tried to promote itself, but could not detect whether or not the master DB is still running on node_address. Failover has NOT occurred.

Standby agent tried to promote, but VIP appears to still be assigned

The standby EFM agent tried to promote itself, but could not because the virtual IP address (VIP_address) appears to still be assigned to another node. Promoting under these circumstances could cause data corruption. Failover has NOT occurred.

Standby agent tried to promote, but appears to be orphaned

The standby EFM agent tried to promote itself, but could not because the well-known server ( server_address ) could not be reached. This usually indicates a network issue that has separated the standby agent from the other agents. Failover has NOT occurred.

Failover has not occurred

An agent has detected that the master database is no longer available in cluster cluster_name, but there are no standby nodes available for failover.

Potential manual failover required on cluster cluster_name.

A potential failover situation was detected for cluster cluster_name. Automatic failover has been disabled for this cluster, so manual intervention is required.

Failover has completed on cluster cluster_name

Failover has completed on cluster cluster_name.

Lock file for cluster cluster_name has been removed

The lock file for cluster cluster_name has been removed from: path_name on node node_address. This lock prevents multiple agents from monitoring the same cluster on the same node. Please restore this file to prevent accidentally starting another agent for cluster.

recovery.conf file for cluster cluster_name has been found

A recovery.conf file for cluster cluster_name has been found at: path_name on master node node_address. This may be problematic should you attempt to restart the DB on this node.

recovery_target_timeline is not set to latest in recovery.conf

The recovery_target_timeline parameter is not set to latest in the recovery.conf file. The standby server will not be able to follow a timeline change that occurs when a new master is promoted.

trigger_file path given in recovery.conf is not writable

The path provided for the trigger_file parameter in the recovery.conf file is not writable by the db_service_owner user. Failover Manager will not be able to promote the database if needed.

Promotion has not occurred for cluster cluster_name

A promotion was attempted but there is already a node being promoted: ip_address.

Standby not reconfigured after failover in cluster cluster_name

The auto.reconfigure property has been set to false for this node. The node has not been reconfigured to follow the new master node after a failover.

Could not resume replay for cluster cluster_name

Could not resume replay for standby being promoted. Manual intervention may be required. Error: error_description
This error is returned if the server encounters an error when invoking replay during the promotion of a standby.

Could not resume replay for standby standby_id.

Could not resume replay for standby. Manual intervention may be required. Error: error_message .

Possible problem with database timeout values

Your remote.timeout value ( value ) is higher than your local.timeout value ( value ) . If the local database takes too long to respond, the local agent could assume that the database has failed though other agents can connect. While this will not cause a failover, it could force the local agent to stop monitoring, leaving you without failover protection.

No standbys available for promotion in cluster cluster_name

The current number of standby nodes in the cluster has dropped to the minimum number: number . There cannot be a failover unless another standby node(s) is added or made promotable.

Custom monitor timeout for cluster cluster_name

The following custom monitoring script has timed out: script_name

Custom monitor 'safe mode' failure for cluster cluster_name

The following custom monitor script has failed, but is being run in "safe mode": script_name .
Output: script_results

The conditions listed in the table below trigger a SEVERE level notification:

Subject

Description

Unable to connect to DB on node_address

The maximum connections limit has been reached.

Unable to connect to DB on node_address

Invalid password for db.user=user_name.

Unable to connect to DB on node_address

Invalid authorization specification.

Master cannot ping local database for cluster cluster_name

The master agent can no longer reach the local database running at node_address. Other nodes are able to access the database remotely, so the master will not release the VIP and/or create a recovery.conf file. The master agent will become idle until the resume command is run to resume monitoring the database.

Fencing script error

Fencing script script_name failed to execute successfully.

Exit Value: exit_code
Results: script_results
Failover has NOT occurred.

Post-promotion script failed

Post-promotion script script_name failed to execute successfully.
Exit Value: exit_code
Results: script_results

Remote-post-promotion script failed

Remote-post-promotion script script_name failed to execute successfully

Exit Value: exit_code

Results: script_results

Node: node_address

Remote-pre-promotion script failed

Remote-pre-promotion script script_name failed to execute successfully

Exit Value: exit_code

Results: script_results

Node: node_address

Post-database failure script error

Post-database failure script script_name failed to execute successfully.
Exit Value: exit_code
Results: script_results

Agent resumed script error

Agent resumed script script_name failed to execute successfully.
Results: script_results

Master isolation script failed

Master isolation script script_name failed to execute successfully.
Exit Value: exit_code
Results: script_results

Could not promote standby

The trigger file file_name could not be created on node. Could not promote standby. Error details: message_details

Error creating recovery.conf file on node_address for cluster cluster_name

There was an error creating the recovery.conf file on master node node_address during promotion. Promotion has continued, but requires manual intervention to ensure that the old master node can not be restarted. Error details: message_details

An unexpected error has occurred for cluster cluster_name

An unexpected error has occurred on this node. Please check the agent log for more information. Error: error_details

Master database being fenced off for cluster cluster_name

The master database has been isolated from the majority of the cluster. The cluster is telling the master agent at ip_address to fence off the master database to prevent two masters when the rest of the failover manager cluster promotes a standby.

Isolated master database shutdown.

The isolated master database has been shutdown by failover manager.

Master database being fenced off for cluster cluster_name

The master database has been isolated from the majority of the cluster. Before the master could finish detecting isolation, a standby was promoted and has rejoined this node in the cluster. This node is isolating itself to avoid more than one master database.

Could not assign VIP to node node_address

Failover manager could not assign the VIP address for some reason.

master_or_standby database failure for cluster cluster_name

The database has failed on the specified node.

Agent is timing out for cluster cluster_name

This agent has timed out trying to reach the local database. After the timeout, the agent could successfully ping the database and has resumed monitoring. However, the node should be checked to make sure it is performing normally to prevent a possible database or agent failure.

Resume timed out for cluster cluster_name

This agent could not resume monitoring after reconfiguring and restarting the local database. See agent log for details.

Internal state mismatch for cluster cluster_name

The failover manager cluster's internal state did not match the actual state of the cluster members. This is rare and can be caused by a timing issue of nodes joining the cluster and/or changing their state. The problem should be resolved, but you should check the cluster status as well to verify. Details of the mismatch can be found in the agent log file.

Failover has not occurred

An agent has detected that the master database is no longer available in cluster cluster_name, but there are not enough standby nodes available for failover.

Database in wrong state on node_address

The standby agent has detected that the local database is no longer in recovery. The agent will now become idle. Manual intervention is required.

Database in wrong state on node_address

The master agent has detected that the local database is in recovery. The agent will now become idle. Manual intervention is required.

Database connection failure for cluster cluster_name

This node is unable to connect to the database running on: node_address

Until this is fixed, failover may not work properly because this node will not be able to check if the database is running or not.

Standby custom monitor failure for cluster cluster_name

The following custom monitor script has failed on a standby node.
The agent will stop monitoring the local database.

Script location: script_name

Script output: script_results

Master custom monitor failure for cluster cluster_name

The following custom monitor script has failed on a master node.

EFM will attempt to promote a standby.
Script location: script_name

Script output: script_results

property_name set to true for master node

The property_name property has been set to true for this cluster. Stopping the master agent without stopping the entire cluster will be treated by the rest of the cluster as an immediate master agent failure. If maintenance is required on the master database, shut down the master agent and wait for a notification from the remaining nodes that failover will not happen.

Load balancer attach script error

Load balancer attach script script_name failed to execute successfully.
Exit Value: exit_code
Results: script_results

Load balancer detach script error

Load balancer detach script script_name failed to execute successfully.
Exit Value: exit_code
Results: script_results

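Please note: In addition to sending notices to the administrator's email address, all notifications are recorded in the cluster log file (/var/log/efm-3.4/cluster_name.log).

8 Supported Failover and Failure Scenarios

Failover Manager monitors a cluster for failures that may or may not result in failover.

Failover Manager supports a very specific and limited set of failover scenarios. Failover can occur:

•

if the Master database crashes or is shut down.

•

if the node hosting the Master database crashes or becomes unreachable.

Failover Manager makes every attempt to verify the accuracy of these conditions. If agents cannot confirm that the Master database or node has failed, Failover Manager will not perform any failover actions on the cluster.

Failover Manager also supports a no auto-failover mode for situations where you want Failover Manager to monitor and detect failover conditions, but not perform an automatic failover to a Standby. In this mode, a notification is sent to the administrator when failover conditions are met. To disable automatic failover, modify the cluster properties file, setting the auto.failover parameter to false (see Section 3.5.1.1).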
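As an illustration of the auto.failover setting, the following minimal Python sketch reads key=value text in the style of a cluster properties file and reports whether automatic failover would be enabled. The helper names (parse_properties, automatic_failover_enabled) and the sample text are hypothetical, and treating a missing property as enabled is an assumption of this sketch; it is not part of Failover Manager.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def automatic_failover_enabled(props):
    """Assumption: anything other than an explicit 'false' counts as enabled."""
    return props.get("auto.failover", "true").lower() != "false"


# Hypothetical excerpt from a cluster properties file.
sample = """
# monitor and notify only, never promote automatically
auto.failover=false
"""
print(automatic_failover_enabled(parse_properties(sample)))
```

With auto.failover=false, the sketch reports that automatic failover is disabled, which matches the notify-only mode described above.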

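Failover Manager will alert an administrator to situations that require administrator intervention, but that do not merit promoting a Standby database to Master.

8.1 Master Database is Down

If the agent running on the Master database node detects a failure of the Master database, Failover Manager begins the process of confirming the failure (see Figure 8.1).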

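Figure 8.1 - Confirming the failure of the Master database.

If the agent on the Master node detects that the Master database has failed, all agents attempt to connect directly to the Master database. If an agent can connect to the database, Failover Manager sends a notification about the state of the Master node. If no agent can connect, the Master agent declares database failure and releases the VIP (if applicable).

If no agent can reach the virtual IP address or the database server, Failover Manager starts the failover process. The Standby agent on the most up-to-date node runs a fencing script (if applicable), promotes the Standby database to Master database, and assigns the virtual IP address to the Standby node. Any additional Standby nodes are configured to replicate from the new Master unless auto.reconfigure is set to false. If applicable, the agent runs a post-promotion script.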
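The confirmation logic described above can be summarised as a small decision function. This is a minimal sketch under stated assumptions, not Failover Manager's implementation: the function name, the dictionary inputs and the returned labels are all hypothetical, and the real agents also account for timeouts, fencing scripts and the auto.failover setting.

```python
def confirm_master_failure(db_reachable, vip_reachable):
    """Return a label for the outcome suggested by the agents' observations.

    db_reachable / vip_reachable map a (hypothetical) agent name to True
    when that agent can reach the Master database / the virtual IP address.
    """
    if any(db_reachable.values()):
        # Some agent still reaches the database: notify, do not fail over.
        return "notify only"
    if any(vip_reachable.values()):
        # Nobody reaches the database, but the VIP still appears assigned.
        return "no failover (VIP still assigned)"
    # Neither the database nor the VIP can be reached by any agent.
    return "start failover"


observations_db = {"standby1": False, "witness": False}
observations_vip = {"standby1": False, "witness": False}
print(confirm_master_failure(observations_db, observations_vip))
```

The middle branch reflects the "VIP appears to still be assigned" notification, in which promotion is refused to avoid data corruption.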

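Returning the Node to the Cluster

To recover from this scenario without restarting the entire cluster, you should:

1.

Restart the database on the original Master node as a Standby database.

2.

Invoke the efm resume command on the original Master node.

Returning the Node to the Role of Master

After returning the node to the cluster as a Standby, you can easily return the node to the role of Master:

1.

If the cluster has more than one Standby node, use the efm allow-node command to set the node's failover priority to 1.

2.

Invoke the efm promote -switchover command to promote the node to its original role of Master node. For more information about the command, see Section 5.3.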

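8.2 Standby Database is Down

If a Standby agent detects a failure of its database, the agent notifies the other agents; the other agents confirm the state of the database (see Figure 8.2).

Figure 8.2 - Confirming the failure of a Standby database.

After returning the Standby database to a healthy state, invoke the efm resume command to return the Standby to the cluster.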

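8.3 Master Agent Exits or Node Fails

If the Failover Manager Master agent crashes or the node fails, a Standby agent will detect the failure and (if appropriate) initiate a failover (see Figure 8.3).

Figure 8.3 - Confirming the failure of the Master agent.

If an agent detects that the Master agent has left, all agents attempt to connect directly to the Master database. If any agent can connect to the database, an agent sends a notification about the failure of the Master agent. If no agent can connect, the agents ping the virtual IP address to determine whether it has been released.

If no agent can reach the virtual IP address or the database server, Failover Manager starts the failover process. The Standby agent on the most up-to-date node runs a fencing script (if applicable), promotes the Standby database to Master database, and assigns the virtual IP address to the Standby node. If applicable, the agent runs a post-promotion script. Any additional Standby nodes are configured to replicate from the new Master unless auto.reconfigure is set to false.

If this scenario has occurred because the Master has been isolated from the network, the Master agent will detect the isolation, release the virtual IP address, and create the recovery.conf file. Failover Manager will perform the previously listed steps on the remaining nodes of the cluster.

To recover from this scenario without restarting the entire cluster, you should:

1.

Restart the original Master node.

2.

Bring the original Master database up as a Standby node.

3.

Start the service on the original Master node.

Please note that stopping an agent does not signal the cluster that the agent has failed.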

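8.4 Standby Agent Exits or Node Fails

If a Standby agent exits or a Standby node fails, the other agents will detect that it is no longer connected to the cluster.

Figure 8.4 - Failure of a Standby agent.

When the failure is detected, the agents attempt to contact the database that resides on the node; if the agents confirm that there is a problem, Failover Manager sends the appropriate notification to the administrator.

If there is only one Master and one Standby remaining, there is no failover protection in the case of a Master node failure. In the case of a Master database failure, the Master and Standby agents can agree that the database has failed and proceed with failover.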

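8.5 Dedicated Witness Agent Exits / Node Fails

The following scenario details the actions taken if a dedicated Witness (a node that is not hosting a database) fails.

Figure 8.5 - Confirming the failure of a dedicated Witness.

When an agent detects that the Witness node cannot be reached, Failover Manager notifies the administrator of the state of the Witness (see Figure 8.5).

Note: If there is only one Master and one Standby remaining, there is no failover protection in the case of a Master node failure. In the case of a Master database failure, the Master and Standby agents can agree that the database has failed and proceed with failover.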

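8.6 Nodes Become Isolated from the Cluster

The following scenario details the actions taken if one or more nodes (a minority of the cluster) become isolated from the majority of the cluster.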

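Figure 8.6 - If members of the cluster become isolated.

If one or more nodes (but less than half of the cluster) become isolated from the rest of the cluster, the remaining cluster behaves as if the nodes have failed. The agents attempt to determine whether the Master node is among the isolated nodes; if it is, the Master fences itself off from the cluster, and a Standby node (from within the cluster majority) is promoted to replace it. Other Standby nodes are configured to replicate from the new Master unless auto.reconfigure is set to false.

Failover Manager then notifies an administrator, and the isolated nodes rejoin the cluster when they are able. When the nodes rejoin the cluster, the failover priority may change.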
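The "less than half of the cluster" condition above amounts to a majority test. The following minimal sketch shows the arithmetic only; the function name is hypothetical, and treating an exact half as not being a majority is an assumption of this sketch rather than documented Failover Manager behavior.

```python
def has_majority(visible_nodes, cluster_size):
    """True when a node can see more than half of the cluster (itself included)."""
    if cluster_size <= 0 or not 0 <= visible_nodes <= cluster_size:
        raise ValueError("visible_nodes must be between 0 and cluster_size")
    return visible_nodes * 2 > cluster_size


# A five-node cluster split 3/2: only the side that sees three nodes
# would be in a position to promote a Standby.
print(has_majority(3, 5), has_majority(2, 5))
```

A node on the minority side corresponds to the "Subset of cluster disconnected from master" notification, where failover is not attempted.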

9 Upgrading an Existing Cluster

Failover Manager provides a utility to assist you when upgrading a Failover Manager cluster. To upgrade an existing cluster, you must:

1.

Install Failover Manager 3.4 on each node of the cluster. For detailed information about installing Failover Manager, see Section 3.

2.

After installing Failover Manager, invoke the efm upgrade-conf utility to create the .properties and .nodes files for Failover Manager 3.4. The Failover Manager installer adds the upgrade utility (efm upgrade-conf) to the /usr/edb/efm-3.4/bin directory. To invoke the utility, assume root privileges, and invoke the command:

efm upgrade-conf cluster_name

The efm upgrade-conf utility locates the .properties and .nodes files of pre-existing clusters and copies the parameter values to a new configuration file for use by Failover Manager. The utility saves the updated copy of the configuration files in the /etc/edb/efm-3.4 directory.

3.

Modify the .properties and .nodes files for EFM 3.4, specifying any new preferences. Version 3.4 of Failover Manager adds the following configuration property:

master.shutdown.as.failure

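Use your choice of editor to modify any additional properties in the properties file (located in the /etc/edb/efm-3.4 directory) before starting the service for that node. For detailed information about property settings, see Section 3.5.

4.

Use a version-specific command to stop the old Failover Manager cluster; for example, you can use the following command to stop a version 3.4 cluster: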

/usr/efm-3.4/bin/efm stop-cluster efm

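5.

Start the new Failover Manager service (efm-3.4) on each node of the cluster. For more information about starting the service, see Section 4.1.1.

The following example demonstrates invoking the upgrade utility to create the .properties and .nodes files for a Failover Manager installation: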

[root@ONE efm-3.4]}> /usr/edb/efm-3.4/bin/efm upgrade-conf example
Checking directory /etc/edb/efm-3.4
Processing example.properties file
jvm.options property value updated to "-Xmx128m".

The following properties were added in addition to those in previous installed version:
master.shutdown.as.failure

Checking directory /etc/edb/efm-3.3
Processing example.nodes file

Upgrade of files is finished. The owner and group for properties and nodes files have been set as 'efm'.

If you are using a Failover Manager configuration without sudo, include the -source flag and specify the name of the directory in which the configuration files reside when invoking upgrade-conf. If the directory is not the configuration default directory, the upgraded files will be created in the directory from which the upgrade-conf command was invoked. For more information, see Section 3.4.1.

Please note: If you are using a custom service script or unit file, you must manually update the file to reflect the new Failover Manager service name when you perform an upgrade.

9.1 Un-installing Failover Manager

After upgrading to Failover Manager 3.4, you can use Yum to remove previous installations of Failover Manager. For example, use the following command to remove Failover Manager 3.3 and any unneeded dependencies:

yum remove edb-efm33

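9.2 Performing a Database Update (Minor Version)

This section describes how to perform a quick minor database version upgrade. You can use the steps that follow to upgrade from one minor version to another (for example, from 10.1.5 to version 10.2.7), or to apply a patch release for a version.

First, you must update the database server on each Standby node of the Failover Manager cluster. Next, perform a switchover, promoting a Standby node to the role of Master within the Failover Manager cluster. Then, perform a database update on the old Master node.

On each node of the cluster, you must perform the following steps to update the database server:

1.

Stop the Failover Manager agent.

2.

Stop the database server.

3.

Update the database server.

4.

Start the database service.

5.

Start the Failover Manager agent.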
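The ordering described above (Standbys first, then a switchover, then the old Master) combined with the five per-node steps can be laid out as a checklist. The following minimal sketch only builds that checklist as data; the function name and node names are hypothetical, and it does not invoke any efm or database commands.

```python
PER_NODE_STEPS = [
    "stop Failover Manager agent",
    "stop database server",
    "update database server",
    "start database service",
    "start Failover Manager agent",
]


def update_plan(standbys, master):
    """Return (node, step) pairs: all Standbys, a switchover, then the old Master."""
    plan = []
    for node in standbys:
        plan.extend((node, step) for step in PER_NODE_STEPS)
    # The switchover is a cluster-level action, so no single node is named.
    plan.append((None, "switchover: promote a Standby to Master"))
    plan.extend((master, step) for step in PER_NODE_STEPS)
    return plan


for node, step in update_plan(["standby1", "standby2"], "master1"):
    print(node or "cluster", "-", step)
```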

For detailed information about controlling the Advanced Server service, or upgrading your version of Advanced Server, please see the EDB Postgres Advanced Server Guide, available at:

https://www.enterprisedb.com/resources/product-documentation

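When your updates are complete, you can use the efm set-priority command to add the old Master to the front of the standby list, and then switch over to return the cluster to its original state. For more information about efm set-priority, see Section 5.3.

10 Troubleshooting

If you receive a notification about an unexpected error, check the Failover Manager log file (see Section 6) for an OutOfMemory message. Failover Manager runs with the default memory value set by this property: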

# Extra information that will be passed to the JVM when starting the agent.

jvm.options=-Xmx128m

If you are running with less than 128 megabytes allocated, you should increase the value and restart the Failover Manager agent.
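To check a jvm.options value against the 128 megabyte figure mentioned above, the -Xmx flag can be parsed as in the following minimal sketch. The helper names are hypothetical, and the sketch only understands the common k, m and g suffixes of -Xmx; it is an illustration, not a Failover Manager utility.

```python
import re


def max_heap_megabytes(jvm_options):
    """Return the -Xmx value in megabytes, or None if the flag is absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", jvm_options)
    if not match:
        return None
    amount, unit = int(match.group(1)), match.group(2).lower()
    # Conversion factors to megabytes; no suffix means plain bytes.
    factor = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return amount * factor


def heap_below_default(jvm_options, default_mb=128):
    """True when an explicit -Xmx is smaller than the documented default."""
    value = max_heap_megabytes(jvm_options)
    return value is not None and value < default_mb


print(max_heap_megabytes("-Xmx128m"), heap_below_default("-Xmx64m"))
```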

Failover Manager is tested with OpenJDK; we strongly recommend using OpenJDK. You can use the following command to check the type of your Java installation:

# java -version

openjdk version "1.8.0_191"

OpenJDK Runtime Environment (build 1.8.0_191-b12)

OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

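11 Appendix A - Configuring Streaming Replication

The following section walks you through the process of configuring a simple two-node replication scenario that uses streaming replication to replicate data from a Master node to a Standby node. The replication process for larger scenarios can be complex; for detailed information about configuration options, please see the PostgreSQL core documentation, available at: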

https://www.postgresql.org/docs/10/static/warm-standby.html#streaming-replication

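In the example that follows, we use a .pgpass file to enable md5 authentication for the replication user - this may or may not be the safest authentication method for your environment. For more information about the supported authentication options, see the PostgreSQL core documentation at: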

https://www.postgresql.org/docs/10/static/client-authentication.html

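The steps that follow configure a simple streaming replication scenario with one Master node and one Standby node, each running an installation of EDB Postgres Advanced Server. In the example:

•

The Master node resides on 146.148.46.44

•

The Standby node resides on 107.178.217.178

•

The replication user name is edbrepuser.

The pathnames and commands referenced in the examples are for Advanced Server hosts that reside on a CentOS 6.x host; you may have to modify paths and commands for your configuration.

Configuring the Master Node

Connect to the Master node of the replication scenario, and modify the pg_hba.conf file (located in the data directory under your Postgres installation), adding connection information for the replication user (in the example that follows, edbrepuser):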

host replication edbrepuser 107.178.217.178/32 md5

The connection information should specify the address of the Standby node of the replication scenario, and your preferred authentication method.
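The pg_hba.conf entry shown above follows the pattern "host replication user address method". As a minimal sketch, the line can be generated from a Standby address with Python's standard ipaddress module, which also validates the address; the function name is hypothetical and md5 is used only because the example above uses it.

```python
import ipaddress


def replication_hba_line(user, standby_address, method="md5"):
    """Build a pg_hba.conf line allowing replication from a single host."""
    # ip_network() turns a bare address into a /32 (IPv4) or /128 (IPv6)
    # network and raises ValueError for malformed input.
    network = ipaddress.ip_network(standby_address)
    return f"host replication {user} {network} {method}"


print(replication_hba_line("edbrepuser", "107.178.217.178"))
```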

Modify the postgresql.conf file (located in the data directory, under your Postgres installation), adding the following replication parameters and values to the end of the file:

wal_level = hot_standby
max_wal_senders = 8
wal_keep_segments = 128
archive_mode = on
archive_command = 'cp %p /tmp/%f'

Save the configuration file and restart the server:

/etc/init.d/edb-as-10 restart

Use the sudo su - command to assume the identity of the enterprisedb database superuser:

sudo su - enterprisedb

Then, start a psql session, connecting to the edb database:

/opt/edb/as10/bin/psql -d edb

At the psql command line, create a user with the replication attribute:

CREATE ROLE edbrepuser WITH REPLICATION LOGIN PASSWORD 'password';

Configuring the Standby Node

Connect to the Standby server, and assume the identity of the database superuser (enterprisedb):

sudo su - enterprisedb

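With your choice of editor, create a .pgpass file in the home directory of the enterprisedb user. The .pgpass file holds the password of the replication user in plain-text form; if you are using a .pgpass file, you should ensure that only trusted users have access to the .pgpass file.

Add an entry that specifies connection information for the replication user: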

*:5444:*:edbrepuser:password

The server enforces restrictive permissions on the .pgpass file; use the following command to set the file permissions:

chmod 600 .pgpass
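A .pgpass entry has the form hostname:port:database:username:password, and the file must not be readable by other users. The following minimal sketch formats an entry and writes the file with mode 600; the helper names are hypothetical, and the sketch assumes a POSIX system where os.chmod sets the permission bits as shown.

```python
import os
import stat
import tempfile


def pgpass_entry(host, port, database, user, password):
    """Format one .pgpass line: hostname:port:database:username:password."""
    def escape(field):
        # A literal ':' or '\' inside a field must be backslash-escaped.
        return str(field).replace("\\", "\\\\").replace(":", "\\:")
    return ":".join(escape(f) for f in (host, port, database, user, password))


def write_pgpass(path, entries):
    """Write the entries and restrict the file to its owner (chmod 600)."""
    with open(path, "w") as handle:
        handle.write("\n".join(entries) + "\n")
    os.chmod(path, 0o600)


# Demonstration in a temporary directory rather than a real home directory.
with tempfile.TemporaryDirectory() as directory:
    target = os.path.join(directory, ".pgpass")
    write_pgpass(target, [pgpass_entry("*", 5444, "*", "edbrepuser", "password")])
    print(oct(stat.S_IMODE(os.stat(target).st_mode)))
```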

Relinquish the identity of the database superuser:

exit

Then, assume superuser privileges:

sudo su -

You must stop the database server before replacing the data directory on the Standby node with the data directory of the Master node. Use the command:

/etc/init.d/edb-as-10 stop

Then, delete the data directory on the Standby node:

rm -rf /opt/edb/as10/data

After deleting the existing data directory, move into the bin directory, and use the pg_basebackup utility to copy the data directory of the Master node to the Standby:

cd /opt/edb/as10/bin
./pg_basebackup -R -D /opt/edb/as10/data \
  --host=146.148.46.44 --port=5444 \
  --username=edbrepuser --password

The call to pg_basebackup specifies the IP address of the Master node and the name of the replication user created on the Master node. For more information about the options available with the pg_basebackup utility, see the PostgreSQL core documentation at:

https://www.postgresql.org/docs/10/static/app-pgbasebackup.html

When prompted by pg_basebackup, provide the password associated with the replication user.

After copying the data directory, change ownership of the directory to the database superuser (enterprisedb):

chown -R enterprisedb /opt/edb/as10/data

Navigate into the data directory:

cd /opt/edb/as10/data

With your choice of editor, create a file named recovery.conf (in the /opt/edb/as10/data directory) that includes:

standby_mode = on
primary_conninfo = 'host=146.148.46.44 port=5444 user=edbrepuser sslmode=prefer sslcompression=1 krbsrvname=postgres'
trigger_file = '/opt/edb/as10/data/mytrigger'
restore_command = '/bin/true'
recovery_target_timeline = 'latest'

The primary_conninfo parameter specifies connection information for the replication user on the Master node of the replication scenario.
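Two of the settings in this recovery.conf are also the subject of Failover Manager notifications (recovery_target_timeline not set to latest, and a trigger_file path that is not writable). The following minimal sketch parses recovery.conf text and flags a missing or different value; the function names are hypothetical, the parser handles only simple key = 'value' lines, and it does not test whether the trigger file path is writable.

```python
def parse_recovery_conf(text):
    """Parse simple key = 'value' lines, ignoring '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        # Split on the first '=' only: primary_conninfo contains more of them.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("'")
    return settings


def recovery_conf_warnings(settings):
    """Return human-readable warnings for the two settings discussed above."""
    warnings = []
    if settings.get("recovery_target_timeline") != "latest":
        warnings.append("recovery_target_timeline is not set to latest")
    if "trigger_file" not in settings:
        warnings.append("trigger_file is not set")
    return warnings


sample = """
standby_mode = on
trigger_file = '/opt/edb/as10/data/mytrigger'
recovery_target_timeline = 'latest'
"""
print(recovery_conf_warnings(parse_recovery_conf(sample)))
```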

Change ownership of the recovery.conf file to enterprisedb:

chown enterprisedb:enterprisedb recovery.conf

Modify the postgresql.conf file (located in the data directory, under your Postgres installation), specifying the following values at the end of the file:

wal_level = hot_standby
max_wal_senders = 8
wal_keep_segments = 128
hot_standby = on

The data file has been copied from the Master node, and will contain the replication parameters specified previously.

Then, restart the server:

/etc/init.d/edb-as-10 start

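At this point, the Master node will be replicating data to the Standby node.

Confirming Replication from the Master to Standby

You can confirm that the server is running and replicating by entering the command:

ps -ef | grep postgres

If replication is running, the Standby server will echo: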

501 42054 1 0 07:57 pts/1 00:00:00 /opt/PostgresPlus/9.2AS/bin/edb-postgres -D /opt/PostgresPlus/9.2AS/data
501 42055 42054 0 07:57 ? 00:00:00 postgres: logger process
501 42056 42054 0 07:57 ? 00:00:00 postgres: startup process recovering 000000010000000000000004
501 42057 42054 0 07:57 ? 00:00:00 postgres: checkpointer process
501 42058 42054 0 07:57 ? 00:00:00 postgres: writer process
501 42059 42054 0 07:57 ? 00:00:00 postgres: stats collector process
501 42060 42054 0 07:57 ? 00:00:00 postgres: wal receiver process streaming 0/4000150
501 42068 42025 0 07:58 pts/1 00:00:00 grep postgres

If you connect to the Standby with the psql client and query the pg_is_in_recovery() function, the server will reply:

edb=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)

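Any entries made to the Master node will be replicated to the Standby node. The Standby node operates in read-only mode; while you can query the Standby server, you will not be able to add entries directly to the database that resides on the Standby node.

Manually Invoking Failover

To promote the Standby to become the Master node, assume the identity of the cluster owner (enterprisedb):

sudo su - enterprisedb

Then, invoke pg_ctl:

/opt/edb/as10/bin/pg_ctl promote -D /opt/edb/as10/data/

Then, if you connect to the Standby node with psql, the server will confirm that it is no longer a standby node: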

edb=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
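The t and f values returned by pg_is_in_recovery() map directly onto the role a node is expected to play: a Standby should be in recovery and a Master should not, which is also the condition behind the "Database in wrong state" notifications. The following minimal sketch encodes that check; the function names and the role labels are hypothetical.

```python
def parse_in_recovery(psql_value):
    """Convert the text psql prints for a boolean ('t' or 'f') to a bool."""
    value = psql_value.strip().lower()
    if value in ("t", "true"):
        return True
    if value in ("f", "false"):
        return False
    raise ValueError(f"unexpected pg_is_in_recovery() value: {psql_value!r}")


def state_matches_role(role, psql_value):
    """A standby should be in recovery; a master should not be."""
    in_recovery = parse_in_recovery(psql_value)
    if role == "standby":
        return in_recovery
    if role == "master":
        return not in_recovery
    raise ValueError(f"unknown role: {role!r}")


# After a manual promotion the former Standby reports 'f', which matches
# the master role and no longer matches the standby role.
print(state_matches_role("master", "f"), state_matches_role("standby", "f"))
```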

For more information about configuring and using streaming replication, see the PostgreSQL core documentation, available at:

https://www.postgresql.org/docs/10/static/warm-standby.html#streaming-replication

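11.1 Limited Support for Cascading Replication

While Failover Manager does not provide full support for cascading replication, it does provide limited support for simple failover in a cascading replication scenario. Cascading replication allows a Standby node to stream to another Standby node, reducing the number of connections (and processing overhead) to the Master node.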

For detailed information about configuring cascading replication, please see the PostgreSQL documentation at:

https://www.postgresql.org/docs/10/static/warm-standby.html#cascading-replication

To use Failover Manager in a cascading replication scenario, you should modify the cluster properties file, setting the following property values on Standby Node #2:

promotable=false
auto.reconfigure=false

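In the event of a failover, Standby Node #1 will be promoted to the role of Master node. Should failover occur, Standby Node #2 will continue to act as a read-only replica of the new Master node until you manually reconfigure the replication scenario to contain three nodes.

In the event of a failure of Standby Node #1, you will not have failover protection, but you will receive an email notifying you of the failure of the node.

Please note that performing a switchover and then switching back to the original Master may not preserve the cascading replication scenario.

12 Appendix B - Configuring SSL Authentication on a Failover Manager Cluster

The following steps enable SSL authentication for Failover Manager. Note that all connecting clients will be required to use SSL authentication when connecting to any database server within the cluster; you will need to modify the connection methods currently used by existing clients.

To enable SSL on a Failover Manager cluster, you must:

1.

Place a server.crt and server.key file in the data directory (under your Advanced Server installation). You can purchase a certificate signed by an authority, or create your own self-signed certificate. For information about creating a self-signed certificate, see the PostgreSQL core documentation at: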

https://www.postgresql.org/docs/10/static/ssl-tcp.html#ssl-certificate-creation

2.

Modify the postgresql.conf file on each database within the Failover Manager cluster, enabling SSL:

ssl=on

After modifying the postgresql.conf file, you must restart the server.

3.

Modify the pg_hba.conf file on each node of the Failover Manager cluster, adding the following line to the beginning of the file:

hostnossl all all all reject

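The line instructs the server to reject any connections that are not using SSL authentication; this enforces SSL authentication for any connecting clients. For information about modifying the pg_hba.conf file, see the PostgreSQL core documentation at: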

https://www.postgresql.org/docs/10/static/auth-pg-hba-conf.html

4.

After placing the server.crt and server.key files in the data directory, convert the certificate to a form that Java understands; you can use the command:

openssl x509 -in server.crt -out server.crt.der -outform der

For more information, visit:

https://jdbc.postgresql.org/documentation/94/ssl-client.html

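5.

Then, add the certificate to the Java trusted certificates file:

keytool -keystore $JAVA_HOME/lib/security/cacerts -alias alias_name -import -file server.crt.der

Where:

$JAVA_HOME is the home directory of your Java installation.

alias_name can be any string, but must be unique for each certificate.

You can use the keytool command to review a list of the available certificates or retrieve information about a specific certificate. For more information about using the keytool command, enter:

man keytool

The certificate from each database server must be imported into the trusted certificates file of each agent. Note that the location of the cacerts file may vary on each system. For more information, visit: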

https://jdbc.postgresql.org/documentation/94/ssl-client.html

6.

Modify the efm.properties file on each node within the cluster, setting the jdbc.sslmode property.
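Step 3 above depends on the hostnossl line being the first rule in pg_hba.conf, because the server uses the first matching record. The following minimal sketch checks that condition on the text of a pg_hba.conf file; the function names are hypothetical and the parser simply skips blank lines and # comments.

```python
def first_rule(pg_hba_text):
    """Return the fields of the first non-comment, non-blank line, or None."""
    for raw in pg_hba_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if fields:
            return fields
    return None


def rejects_non_ssl_first(pg_hba_text):
    """True when the first rule is 'hostnossl all all all reject'."""
    return first_rule(pg_hba_text) == ["hostnossl", "all", "all", "all", "reject"]


sample = """
# TYPE  DATABASE  USER  ADDRESS  METHOD
hostnossl all all all reject
host replication edbrepuser 107.178.217.178/32 md5
"""
print(rejects_non_ssl_first(sample))
```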

13 Inquiries

If you have any questions regarding EDB Postgres Failover Manager, please contact EnterpriseDB at:

sales@enterprisedb.com

Component	Location
Executables	/usr/edb/efm-3.4/bin
Libraries	/usr/edb/efm-3.4/lib
Cluster configuration files	/etc/edb/efm-3.4
Logs	/var/log/efm-3.4
Lock files	/var/lock/efm-3.4
Log rotation file	/etc/logrotate.d/efm-3.4
sudo configuration file	/etc/sudoers.d/efm-34
Binary to access VIP without sudo	/usr/edb/efm-3.4/bin/secure

Subject	Description
Executed fencing script	Executed fencing script script_name Results: script_results
Executed post-promotion script	Executed post-promotion script script_name Results: script_results
Executed remote pre-promotion script	Executed remote pre-promotion script script_name Results: script_results
Executed remote post-promotion script	Executed remote post-promotion script script_name Results: script_results
Executed post-database failure script	Executed post-database failure script script_name Results: script_results
Executed master isolation script	Executed master isolation script script_name Results: script_results
Witness agent running on node_address for cluster cluster_name	Witness agent is running.
Master agent running on node_address for cluster cluster_name	Master agent is running and database health is being monitored.
Standby agent running on node_address for cluster cluster_name	Standby agent is running and database health is being monitored.
Idle agent running on node node_address for cluster cluster_name	Idle agent is running. After starting the local database, the agent can be resumed.
Assigning VIP to node node_address	Assigning VIP VIP_address to node node_address Results: script_results
Releasing VIP from node node_address	Releasing VIP VIP_address from node node_address Results: script_results
Starting auto resume check for cluster cluster_name	The agent on this node will check every auto.resume.period seconds to see if it can resume monitoring the failed database. The cluster should be checked during this time and the agent stopped if the database will not be started again. See the agent log for more details.
Executed agent resumed script	Executed agent resumed script script_name Results: script_results

Subject	Description
Witness agent exited on node_address for cluster cluster_name	Witness agent has exited.
Master agent exited on node_address for cluster cluster_name	Database health is not being monitored.
Cluster cluster_name notified that master node has left	Failover is disabled for the cluster until the master agent is restarted.
Standby agent exited on node_address for cluster cluster_name	Database health is not being monitored.
Agent exited during promotion on node_address for cluster cluster_name	Database health is not being monitored.
Agent exited on node_address for cluster cluster_name	The agent has exited. This is generated by an agent in the Idle state.
Agent exited for cluster cluster_name	The agent has exited. This notification is usually generated during startup when an agent exits before startup has completed.
Virtual IP address assigned to non-master node	The virtual IP address appears to be assigned to a non-master node. To avoid any conflicts, Failover Manager will release the VIP. You should confirm that the VIP is assigned to your master node and manually reassign the address if it is not.
Virtual IP address not assigned to master node.	The virtual IP address appears to not be assigned to a master node. EDB Postgres Failover Manager will attempt to reacquire the VIP.
No standby agent in cluster for cluster cluster_name	The standbys on cluster_name have left the cluster.
Standby agent failed for cluster cluster_name	A standby agent on cluster_name has left the cluster, but the coordinator has detected that the standby database is still running.
Standby database failed for cluster cluster_name	A standby agent has signaled that its database has failed. The other nodes also cannot reach the standby database.
Standby agent cannot reach database for cluster cluster_name	A standby agent has signaled database failure, but the other nodes have detected that the standby database is still running.
Cluster cluster_name has dropped below three nodes	At least three nodes are required for full failover protection. Please add witness or agent node to the cluster.
Subset of cluster cluster_name disconnected from master	This node is no longer connected to the majority of the cluster cluster_name. Because this node is part of a subset of the cluster, failover will not be attempted. Current nodes that are visible are: node_address
Promotion has started on cluster cluster_name.	The promotion of a standby has started on cluster cluster_name.
Witness failure for cluster cluster_name	Witness running at node_address has left the cluster.
Idle agent failure for cluster cluster_name.	Idle agent running at node_address has left the cluster.
One or more nodes isolated from network for cluster cluster_name	This node appears to be isolated from the network. Other members seen in the cluster are: node_name
Node no longer isolated from network for cluster cluster_name.	This node is no longer isolated from the network.
Standby agent tried to promote, but master DB is still running	The standby EFM agent tried to promote itself, but detected that the master DB is still running on node_address. This usually indicates that the master EFM agent has exited. Failover has NOT occurred.
Standby agent started to promote, but master has rejoined.	The standby EFM agent started to promote itself, but found that a master agent has rejoined the cluster. Failover has NOT occurred.
Standby agent tried to promote, but could not verify master DB	The standby EFM agent tried to promote itself, but could not detect whether or not the master DB is still running on node_address. Failover has NOT occurred.
Standby agent tried to promote, but VIP appears to still be assigned	The standby EFM agent tried to promote itself, but could not because the virtual IP address (VIP_address) appears to still be assigned to another node. Promoting under these circumstances could cause data corruption. Failover has NOT occurred.
Standby agent tried to promote, but appears to be orphaned	The standby EFM agent tried to promote itself, but could not because the well-known server ( server_address ) could not be reached. This usually indicates a network issue that has separated the standby agent from the other agents. Failover has NOT occurred.
Failover has not occurred	An agent has detected that the master database is no longer available in cluster cluster_name, but there are no standby nodes available for failover.
Potential manual failover required on cluster cluster_name.	A potential failover situation was detected for cluster cluster_name. Automatic failover has been disabled for this cluster, so manual intervention is required.
Failover has completed on cluster cluster_name	Failover has completed on cluster cluster_name.
Lock file for cluster cluster_name has been removed	The lock file for cluster cluster_name has been removed from: path_name on node node_address. This lock prevents multiple agents from monitoring the same cluster on the same node. Please restore this file to prevent accidentally starting another agent for cluster.
A recovery.conf file for cluster cluster_name has been found	A recovery.conf file for cluster cluster_name has been found at: path_name on master node node_address. This may be problematic should you attempt to restart the DB on this node.
recovery_target_timeline is not set to latest in recovery.conf	The recovery_target_timeline parameter is not set to latest in the recovery.conf file. The standby server will not be able to follow a timeline change that occurs when a new master is promoted.
trigger_file path given in recovery.conf is not writable	The path provided for the trigger_file parameter in the recovery.conf file is not writable by the db_service_owner user. Failover Manager will not be able to promote the database if needed.
Promotion has not occurred for cluster cluster_name	A promotion was attempted but there is already a node being promoted: ip_address .
Standby not reconfigured after failover in cluster cluster_name	The auto.reconfigure property has been set to false for this node. The node has not been reconfigured to follow the new master node after a failover.
Could not resume replay for cluster cluster_name	Could not resume replay for standby being promoted. Manual intervention may be required. Error: error_description . This error is returned if the server encounters an error when invoking replay during the promotion of a standby.
Could not resume replay for standby standby_id .	Could not resume replay for standby. Manual intervention may be required. Error: error_message .
Possible problem with database timeout values	Your remote.timeout value ( value ) is higher than your local.timeout value ( value ) . If the local database takes too long to respond, the local agent could assume that the database has failed though other agents can connect. While this will not cause a failover, it could force the local agent to stop monitoring, leaving you without failover protection.
No standbys available for promotion in cluster cluster_name	The current number of standby nodes in the cluster has dropped to the minimum number: number . There cannot be a failover unless another standby node(s) is added or made promotable.
Custom monitor timeout for cluster cluster_name	The following custom monitoring script has timed out: script_name
Custom monitor 'safe mode' failure for cluster cluster_name	The following custom monitor script has failed, but is being run in "safe mode": script_name . Output: script_results
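
The "Possible problem with database timeout values" notification above describes a simple comparison between two properties. As a minimal sketch of that check, assuming the cluster properties file uses plain key=value lines (the function names and the sample values below are hypothetical and are not part of Failover Manager), the same condition could be verified before starting an agent:

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def timeout_warning(props):
    """Return a warning string if remote.timeout exceeds local.timeout, else None."""
    local = int(props["local.timeout"])
    remote = int(props["remote.timeout"])
    if remote > local:
        return "remote.timeout (%d) is higher than local.timeout (%d)" % (remote, local)
    return None


# Hypothetical excerpt of a cluster properties file, for illustration only.
sample = """
# timeout settings
local.timeout=60
remote.timeout=90
"""

if __name__ == "__main__":
    print(timeout_warning(parse_properties(sample)))
```

A check like this only mirrors the condition stated in the notification text; it does not reproduce how the agent itself evaluates the properties.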

Subject	Description
Unable to connect to DB on node_address	The maximum connections limit has been reached.
Unable to connect to DB on node_address	Invalid password for db.user= user_name .
Unable to connect to DB on node_address	Invalid authorization specification.
Master cannot ping local database for cluster cluster_name	The master agent can no longer reach the local database running at node_address. Other nodes are able to access the database remotely, so the master will not release the VIP and/or create a recovery.conf file. The master agent will become idle until the resume command is run to resume monitoring the database.
Fencing script error	Fencing script script_name failed to execute successfully. Exit Value: exit_code Results: script_results Failover has NOT occurred.
Post-promotion script failed	Post-promotion script script_name failed to execute successfully. Exit Value: exit_code Results: script_results
Remote-post-promotion script failed	Remote-post-promotion script script_name failed to execute successfully. Exit Value: exit_code Results: script_results Node: node_address
Remote-pre-promotion script failed	Remote-pre-promotion script script_name failed to execute successfully. Exit Value: exit_code Results: script_results Node: node_address
Post-database failure script error	Post-database failure script script_name failed to execute successfully. Exit Value: exit_code Results: script_results
Agent resumed script error	Agent resumed script script_name failed to execute successfully. Results: script_results
Master isolation script failed	Master isolation script script_name failed to execute successfully. Exit Value: exit_code Results: script_results
Could not promote standby	The trigger file file_name could not be created on node. Could not promote standby. Error details: message_details
Error creating recovery.conf file on node_address for cluster cluster_name	There was an error creating the recovery.conf file on master node node_address during promotion. Promotion has continued, but requires manual intervention to ensure that the old master node can not be restarted. Error details: message_details
An unexpected error has occurred for cluster cluster_name	An unexpected error has occurred on this node. Please check the agent log for more information. Error: error_details
Master database being fenced off for cluster cluster_name	The master database has been isolated from the majority of the cluster. The cluster is telling the master agent at ip_address to fence off the master database to prevent two masters when the rest of the failover manager cluster promotes a standby.
Isolated master database shutdown.	The isolated master database has been shutdown by failover manager.
Master database being fenced off for cluster cluster_name	The master database has been isolated from the majority of the cluster. Before the master could finish detecting isolation, a standby was promoted and has rejoined this node in the cluster. This node is isolating itself to avoid more than one master database.
Could not assign VIP to node node_address	Failover manager could not assign the VIP address for some reason.
master_or_standby database failure for cluster cluster_name	The database has failed on the specified node.
Agent is timing out for cluster cluster_name	This agent has timed out trying to reach the local database. After the timeout, the agent could successfully ping the database and has resumed monitoring. However, the node should be checked to make sure it is performing normally to prevent a possible database or agent failure.
Resume timed out for cluster cluster_name	This agent could not resume monitoring after reconfiguring and restarting the local database. See agent log for details.
Internal state mismatch for cluster cluster_name	The failover manager cluster's internal state did not match the actual state of the cluster members. This is rare and can be caused by a timing issue of nodes joining the cluster and/or changing their state. The problem should be resolved, but you should check the cluster status as well to verify. Details of the mismatch can be found in the agent log file.
Failover has not occurred	An agent has detected that the master database is no longer available in cluster cluster_name , but there are not enough standby nodes available for failover.
Database in wrong state on node_address	The standby agent has detected that the local database is no longer in recovery. The agent will now become idle. Manual intervention is required.
Database in wrong state on node_address	The master agent has detected that the local database is in recovery. The agent will now become idle. Manual intervention is required.
Database connection failure for cluster cluster_name	This node is unable to connect to the database running on: node_address . Until this is fixed, failover may not work properly because this node will not be able to check if the database is running or not.
Standby custom monitor failure for cluster cluster_name	The following custom monitor script has failed on a standby node. The agent will stop monitoring the local database. Script location: script_name Script output: script_results
Master custom monitor failure for cluster cluster_name	The following custom monitor script has failed on a master node. EFM will attempt to promote a standby. Script location: script_name Script output: script_results
property_name set to true for master node	The property_name property has been set to true for this cluster. Stopping the master agent without stopping the entire cluster will be treated by the rest of the cluster as an immediate master agent failure. If maintenance is required on the master database, shut down the master agent and wait for a notification from the remaining nodes that failover will not happen.
Load balancer attach script error	Load balancer attach script script_name failed to execute successfully. Exit Value: exit_code Results: script_results
Load balancer detach script error	Load balancer detach script script_name failed to execute successfully. Exit Value: exit_code Results: script_results
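
Several of the script notifications above report an Exit Value and Results for the script that failed. When debugging such a script by hand, it can help to capture the same two pieces of information outside the agent. The following is only a sketch under stated assumptions: the helper names `run_script` and `describe_failure` are hypothetical, the demonstration command is a stand-in for a real fencing or promotion script, and nothing here reflects how Failover Manager itself invokes scripts.

```python
import subprocess
import sys


def run_script(command, timeout_seconds=30):
    """Run a command and return (exit_value, results).

    results is the combined stdout/stderr text with surrounding
    whitespace removed.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout_seconds,
    )
    return completed.returncode, completed.stdout.strip()


def describe_failure(script_name, exit_value, results):
    """Format a message in the same shape as the notification text."""
    return "%s failed to execute successfully. Exit Value: %d Results: %s" % (
        script_name,
        exit_value,
        results,
    )


# Stand-in for a real script: prints a message and exits with status 3.
demo_command = [sys.executable, "-c", "import sys; print('disk full'); sys.exit(3)"]

if __name__ == "__main__":
    exit_value, results = run_script(demo_command)
    if exit_value != 0:
        print(describe_failure("demo script", exit_value, results))
```

Running the script under the same operating system user that the agent uses is likely to give the most representative exit value and output, since permission problems are a common cause of such failures.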