Release 10.1B: OpenEdge Replication
User Guide
Lost connection
A lost connection, in which an OpenEdge Replication server loses contact with its agents, can occur for a variety of reasons.
When a lost connection occurs, the source goes into failure recovery and the target goes into transition.
Detecting TCP/IP communications failures
It is possible that a break in the TCP/IP connection between an OpenEdge Replication server and its agents can go undetected. For example, in a large, complex network with a number of bridges and routers, a segment of the network could go down, interrupting the communications between the server host machine and the agent host machine. However, TCP/IP would still be running in other segments of the network and the server or agent might be unaware of the break.
You can ensure that TCP/IP failures are detected by having the server and agent ping each other. If there is no response to the ping, the connection is assumed to be broken and failure recovery begins.
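As an illustration only, the ping-based detection can be sketched in Python. This is not OpenEdge code: the class and function names are hypothetical, and time zero is assumed to be the moment the connection was established. The 30-second ping interval and the 300-second default wait are the values this section gives for Repl-Keep-Alive.

```python
from dataclasses import dataclass

PING_INTERVAL_SECONDS = 30.0        # from the text: a ping is sent every thirty seconds
DEFAULT_KEEP_ALIVE_SECONDS = 300.0  # from the text: default wait for a ping response


@dataclass
class KeepAliveMonitor:
    """Hypothetical tracker for ping responses; not an OpenEdge API."""
    keep_alive_seconds: float = DEFAULT_KEEP_ALIVE_SECONDS
    last_response_at: float = 0.0   # assumption: connection established at time 0

    def record_response(self, now: float) -> None:
        """Remember when the peer last answered a ping."""
        self.last_response_at = now

    def connection_failed(self, now: float) -> bool:
        """True once no response has arrived for the whole keep-alive period."""
        return (now - self.last_response_at) >= self.keep_alive_seconds


def detect_failure(monitor, response_times, until):
    """Check the monitor at every ping tick up to `until` seconds.

    Returns the tick at which the connection is declared broken, or None.
    """
    responses = sorted(response_times)
    index = 0
    tick = 0.0
    while tick <= until:
        # Apply every response that arrived up to this tick.
        while index < len(responses) and responses[index] <= tick:
            monitor.record_response(responses[index])
            index += 1
        if monitor.connection_failed(tick):
            return tick
        tick += PING_INTERVAL_SECONDS
    return None
```

For example, if the last response arrives 60 seconds in and the default wait is used, `detect_failure(KeepAliveMonitor(), [0, 30, 60], 600)` reports the failure at the 360-second tick, the point at which failure recovery would begin.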
Use the server Repl-Keep-Alive property to enable pinging between the server and the agent. A ping is sent every thirty seconds. The Repl-Keep-Alive property allows you to specify the number of seconds to wait for a response to the ping. If there is no response for the specified period (the default is 300 seconds), a connection failure condition is set and failure recovery begins.
For more information about configuring Repl-Keep-Alive, see the "Server properties" section.
Source failure recovery after losing connection
When the OpenEdge Replication server loses its connection with one or more OpenEdge Replication agents, it tries to contact the agents and re-establish the connection for an amount of time determined by the connect-timeout value set in the OpenEdge Replication server properties file.
The OpenEdge Replication server does the following:
- The OpenEdge Replication server recognizes that there has been an agent failure. The server places itself into a state that allows continuous RDBMS activity, as if the OpenEdge Replication server is not running.
- The OpenEdge Replication server tries to reconnect to OpenEdge Replication agents for a set amount of time.
Source database activity by clients is still allowed unless synchronous replication is being used or schema updates are being performed by a process.
- If the OpenEdge Replication server is able to reconnect to the OpenEdge Replication agent, it again begins processing AI blocks from the RDBMS. When it gets within ten AI blocks of the RDBMS, the OpenEdge Replication server halts normal database activity and completes the synchronization process.
Schema updates are not allowed while the OpenEdge Replication server is performing synchronization. If schema updates are being performed when failure recovery synchronization begins, source database updates will block until failure recovery completes.
Source database activity cannot continue without the agent connected when synchronous replication is being used.
- When synchronization is completed, the OpenEdge Replication server reinserts itself into the AI block write process and the RDBMS is unlocked, allowing normal database activity and replication activity to continue.
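The recovery steps above can be read as a small state machine. The Python sketch below is a simplified model rather than the server's actual implementation: the state names and functions are hypothetical, "within ten AI blocks" is assumed to mean ten or fewer, and it is assumed that losing the agent again at any point restarts recovery. Termination after connect-timeout and the blocking of schema updates are left out.

```python
from enum import Enum, auto

# From the text: final synchronization starts within ten AI blocks of the RDBMS.
# Reading "within ten" as "ten or fewer" is an assumption.
CATCH_UP_THRESHOLD_BLOCKS = 10


class ServerState(Enum):
    NORMAL = auto()         # replicating normally
    RECONNECTING = auto()   # agent lost; source runs as if the server were not running
    CATCHING_UP = auto()    # agent back; processing the backlog of AI blocks
    SYNCHRONIZING = auto()  # normal database activity halted for the final sync


def next_state(state, *, agent_connected, blocks_behind):
    """Return the next state of this simplified model."""
    if not agent_connected:
        # Assumption: any loss of contact starts (or restarts) failure recovery.
        return ServerState.RECONNECTING
    if state is ServerState.RECONNECTING:
        return ServerState.CATCHING_UP
    if state is ServerState.CATCHING_UP:
        if blocks_behind <= CATCH_UP_THRESHOLD_BLOCKS:
            return ServerState.SYNCHRONIZING
        return ServerState.CATCHING_UP
    if state is ServerState.SYNCHRONIZING:
        if blocks_behind == 0:
            return ServerState.NORMAL
        return ServerState.SYNCHRONIZING
    return ServerState.NORMAL


def source_updates_allowed(state, *, synchronous):
    """Whether client updates on the source can proceed in this model."""
    if state is ServerState.SYNCHRONIZING:
        return False            # the server halts normal database activity
    if state is ServerState.RECONNECTING:
        return not synchronous  # synchronous replication needs the agent connected
    # The text does not cover the catching-up phase; assumed allowed here.
    return True
```

Stepping the model with a shrinking backlog (250, 40, 10, then 0 blocks behind) moves it from RECONNECTING through CATCHING_UP and SYNCHRONIZING back to NORMAL.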
If the OpenEdge Replication server is unable to reconnect to all agents, or to a critical agent, within the configured connect-timeout period, the OpenEdge Replication server terminates and source database activity continues. In other words, if there are no critical agents, the server must be able to reconnect to all agents or it will terminate. If one agent is a critical agent, the server will continue if it can reconnect to the single critical agent. When source database activity continues while the OpenEdge Replication server is not running, be sure that there is enough AI extent space to handle all database activity until the OpenEdge Replication server is restarted and replication continues.
While failure recovery is being performed and synchronization takes place, the OpenEdge Replication server might not catch up to the RDBMS. During this time, the target databases are not up to date with the source.
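The rule for whether the server keeps running once the connect-timeout period ends can be written as a short predicate. This is a sketch under stated assumptions: `Agent` and `server_continues` are hypothetical names, not part of OpenEdge Replication, and the `reconnected` flag is assumed to record whether the agent was reached before the period expired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Hypothetical record of one agent at the end of the connect-timeout period."""
    name: str
    critical: bool
    reconnected: bool


def server_continues(agents: list[Agent]) -> bool:
    """Apply the rule from the text.

    If an agent is critical, reconnecting to it is enough for the server to
    continue. If no agent is critical, every agent must have reconnected.
    """
    critical_agents = [agent for agent in agents if agent.critical]
    if critical_agents:
        return all(agent.reconnected for agent in critical_agents)
    return all(agent.reconnected for agent in agents)
```

With one critical agent that reconnected and one non-critical agent that did not, `server_continues` returns `True`. With two non-critical agents of which only one reconnected, it returns `False`, and the server would terminate while source database activity continues.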
Target transition after losing connection
When the OpenEdge Replication agent loses contact with the OpenEdge Replication server, the OpenEdge Replication agent goes into transition. During transition after a lost connection, the agent listens for the OpenEdge Replication server in order to re-establish the connection. If auto transition is configured, the agent listens for a set amount of time determined by the transition-timeout value in the OpenEdge Replication agent properties file.
The OpenEdge Replication agent does the following:
- When the OpenEdge Replication agent first loses contact with the OpenEdge Replication server, it goes into a pre-transition state where it listens for the OpenEdge Replication server.
- If contact is not established and the agent is configured to perform auto transition, the target database is transitioned to a normal OpenEdge database. A normal OpenEdge database means that all standard client connections and updates can be performed on it.
- If manual transition is configured, the OpenEdge Replication agent continues waiting until the database administrator initiates a change. Until the administrator initiates a change using the DSRUTIL utility, the database will remain in an unknown state.
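The agent's choices listed above can be summarized as a decision function. The Python sketch below is illustrative only: the names are hypothetical, and it assumes that contact from the server during the pre-transition state resumes replication whether auto or manual transition is configured, which the text implies but does not state outright.

```python
from enum import Enum, auto


class AgentOutcome(Enum):
    RESUME_REPLICATION = auto()      # the server re-established contact
    KEEP_LISTENING = auto()          # pre-transition, still inside the timeout
    TRANSITION_TO_NORMAL = auto()    # auto transition: target becomes a normal database
    WAIT_FOR_ADMINISTRATOR = auto()  # manual transition: wait for the administrator


def agent_outcome(*, server_contacted, auto_transition,
                  waited_seconds, transition_timeout):
    """Decide what the agent does after it loses contact with the server."""
    if server_contacted:
        return AgentOutcome.RESUME_REPLICATION
    if not auto_transition:
        # Manual transition: nothing changes until the administrator acts.
        return AgentOutcome.WAIT_FOR_ADMINISTRATOR
    if waited_seconds < transition_timeout:
        return AgentOutcome.KEEP_LISTENING
    return AgentOutcome.TRANSITION_TO_NORMAL
```

For instance, an agent configured for auto transition with a 600-second timeout keeps listening at 120 seconds and transitions the target once 600 seconds have passed without contact.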
For more information about transition, see the "Transitioning to the target database" section.
Copyright © 2006 Progress Software Corporation www.progress.com Voice: (781) 280-4000 Fax: (781) 280-4095