Troubleshooting Geo-replication
This section describes the most common troubleshooting scenarios related to GlusterFS Geo-replication.
Locating Log Files
Every Geo-replication session has the following three log files associated with it (four, if the secondary is a gluster volume):
- Primary-log-file - log file for the process that monitors the Primary volume
- Secondary-log-file - log file for the process that initiates changes on the secondary
- Primary-gluster-log-file - log file for the maintenance mount point that the Geo-replication module uses to monitor the Primary volume
- Secondary-gluster-log-file - the secondary's counterpart of the Primary-gluster-log-file
Primary Log File
To get the Primary-log-file for geo-replication, use the following command:
gluster volume geo-replication <primary-volume> <secondary> config log-file
For example:
# gluster volume geo-replication Volume1 example.com:/data/remote_dir config log-file
Secondary Log File
To get the log file for Geo-replication on the secondary (glusterd must be running on the secondary machine), use the following commands:
- On the primary, run the following command to display the session owner details:
# gluster volume geo-replication Volume1 example.com:/data/remote_dir config session-owner
5f6e5200-756f-11e0-a1f0-0800200c9a66
- On the secondary, run the following command:
# gluster volume geo-replication /data/remote_dir config log-file /var/log/gluster/${session-owner}:remote-mirror.log
- Replace ${session-owner} in the output of Step 2 with the session owner details from Step 1 to get the location of the log file:
/var/log/gluster/5f6e5200-756f-11e0-a1f0-0800200c9a66:remote-mirror.log
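The substitution in the steps above can be sketched as a small script. The `mirror_log_path` helper is hypothetical, not part of the gluster CLI; it simply interpolates a session-owner UUID into the log-file template shown above.

```shell
#!/bin/sh
# Sketch: derive the secondary's remote-mirror log path from a
# session-owner UUID, which is obtained on the primary with:
#   gluster volume geo-replication <primary-volume> <secondary> config session-owner
# mirror_log_path is a hypothetical helper for illustration only.

mirror_log_path() {
    echo "/var/log/gluster/${1}:remote-mirror.log"
}

mirror_log_path 5f6e5200-756f-11e0-a1f0-0800200c9a66
# → /var/log/gluster/5f6e5200-756f-11e0-a1f0-0800200c9a66:remote-mirror.log
```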
Rotating Geo-replication Logs
Administrators can rotate the log file of a particular primary-secondary session as needed. When you run Geo-replication's log-rotate command, the log file is backed up with the current timestamp suffixed to the file name, and a signal is sent to gsyncd to start logging to a new log file.
To rotate a geo-replication log file
- Rotate the log file for a particular primary-secondary session using the following command:
# gluster volume geo-replication <primary-volume> <secondary> log-rotate
For example, to rotate the log file of primary Volume1 and secondary example.com:/data/remote_dir:
# gluster volume geo-replication Volume1 example.com:/data/remote_dir log rotate
log rotate successful
- Rotate the log files for all sessions of a primary volume using the following command:
# gluster volume geo-replication <primary-volume> log-rotate
For example, to rotate the log files of primary Volume1:
# gluster volume geo-replication Volume1 log rotate
log rotate successful
- Rotate the log files for all sessions using the following command:
# gluster volume geo-replication log-rotate
For example, to rotate the log files for all sessions:
# gluster volume geo-replication log rotate
log rotate successful
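The rotation behaviour described above can be pictured with a plain-shell sketch: rename the live log with a timestamp suffix and continue in a fresh file. The suffix format and file name here are illustrative assumptions; gsyncd's actual naming may differ.

```shell
#!/bin/sh
# Illustration of log rotation: back up the current log under a
# timestamped name, then reopen an empty log file.
# /tmp/demo-georep.log stands in for the session's real log file.

logfile=/tmp/demo-georep.log
echo "old entries" > "$logfile"

rotated="$logfile.$(date +%Y%m%d%H%M%S)"   # timestamped backup name (illustrative format)
mv "$logfile" "$rotated"
: > "$logfile"                             # logging continues in a new, empty file

ls "$rotated"
```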
Synchronization is not complete
Description: GlusterFS Geo-replication did not synchronize the data completely, but the Geo-replication status is displayed as OK.
Solution: You can enforce a full sync of the data by erasing the index and restarting GlusterFS Geo-replication. After restarting, GlusterFS Geo-replication begins synchronizing all the data; all files are compared using checksums, which can be a lengthy, resource-intensive operation on large data sets.
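The per-file checksum comparison that a full sync performs can be sketched as follows. The mount point, host name, and directory are placeholders, and `sums_of` is a hypothetical helper, not a gluster tool.

```shell
#!/bin/sh
# Sketch of the checksum comparison a full resync performs:
# hash every file on both sides and diff the listings.
# /mnt/primary, example.com and /data/remote_dir are placeholders.

sums_of() {
    # Print "checksum  ./path" for every regular file under $1, sorted by path.
    ( cd "$1" && find . -type f -exec md5sum {} + | sort -k 2 )
}

sums_of /mnt/primary > /tmp/primary.sums
# On the secondary side (for example over ssh to example.com):
#   sums_of /data/remote_dir > /tmp/secondary.sums
diff /tmp/primary.sums /tmp/secondary.sums \
    && echo "in sync" \
    || echo "differences found (or secondary sums not collected yet)"
```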
Issues in Data Synchronization
Description: Geo-replication displays the status as OK, but files do not get synced; only directories and symlinks get synced, with the following error message in the log:
[2011-05-02 13:42:13.467644] E [primary:288:regjob] GMaster: failed to
sync ./some_file
Solution: Geo-replication invokes rsync v3.0.0 or higher on both the host and the remote machine. Verify that the required version is installed on both.
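A quick way to check the installed rsync version against the 3.0.0 requirement is sketched below. The `version_ok` helper is hypothetical and relies on GNU `sort -V` for version ordering; the remote host name is a placeholder.

```shell
#!/bin/sh
# Check whether the locally installed rsync meets the >= 3.0.0 requirement.

version_ok() {
    # Succeeds when the dotted version in $1 is >= 3.0.0 (uses GNU sort -V).
    [ "$(printf '%s\n3.0.0\n' "$1" | sort -V | head -n 1)" = "3.0.0" ]
}

# First line of "rsync --version" looks like: "rsync  version 3.x.y  protocol ..."
ver=$(rsync --version 2>/dev/null | head -n 1 | awk '{print $3}')
if version_ok "$ver"; then
    echo "rsync $ver is new enough"
else
    echo "rsync $ver is missing or too old; geo-replication needs >= 3.0.0"
fi

# Run the same check on the remote machine (placeholder host name):
#   ssh example.com 'rsync --version | head -n 1'
```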
Geo-replication status displays Faulty very often
Description: Geo-replication displays status as faulty very often with a backtrace similar to the following:
[2011-04-28 14:06:18.378859] E [syncdutils:131:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
    tf(*aa)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
    rid, exc, res = recv(self.inf)
  File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
    return pickle.load(inf)
EOFError
Solution: This error indicates that the RPC communication between the primary gsyncd module and the secondary gsyncd module is broken, which can happen for various reasons. Verify that all of the following prerequisites are satisfied:
- Password-less SSH is set up properly between the host and the remote machine.
- FUSE is installed on the machine, because the Geo-replication module mounts the GlusterFS volume using FUSE to sync data.
- If the secondary is a gluster volume, that volume is started.
- If the secondary is a plain directory, the directory has already been created with the required permissions.
- If GlusterFS 3.2 or higher is not installed in the default location (on the Primary) and has been prefixed to be installed in a custom location, configure the gluster-command option to point to the exact location.
- If GlusterFS 3.2 or higher is not installed in the default location (on the secondary) and has been prefixed to be installed in a custom location, configure the remote-gsyncd-command option to point to the exact place where gsyncd is located.
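The generic items on the checklist above can be probed with a rough pre-flight script. The host name is a placeholder, `check` is a hypothetical helper, and the gluster-specific checks are left as comments because their output format varies by version.

```shell
#!/bin/sh
# Pre-flight sketch for the faulty-session checklist.
# example.com is a placeholder for the secondary host.

remote=example.com

check() {
    # Print PASS/FAIL for a labelled command.
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $label"
    else
        echo "FAIL: $label"
    fi
}

check "password-less ssh to $remote" \
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$remote" true
check "FUSE device present" test -e /dev/fuse

# If the secondary is a gluster volume, also confirm it is started:
#   gluster volume info <secondary-volume> | grep 'Status: Started'
# If the secondary is a plain directory, confirm it exists and is writable:
#   ssh $remote test -w /data/remote_dir
```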
Intermediate Primary goes to Faulty State
Description: In a cascading set-up, the intermediate primary goes to faulty state with the following log:
raise RuntimeError("aborting on uuid change from %s to %s" % \
RuntimeError: aborting on uuid change from af07e07c-427f-4586-ab9f-4bf7d299be81 to de6b5040-8f4e-4575-8831-c4f55bd41154
Solution: In a cascading set-up, the intermediate primary is loyal to the original primary. The above log means that the Geo-replication module has detected a change in the original primary. If this is the desired behavior, delete the volume-id config option in the session initiated from the intermediate primary.