1. The error is as follows:
The log output reads:
This scheduler instance xxxx is still active but was recovered by another instance in the cluster. This may cause inconsistent behavior.
ClusterManager detected 1 failed or restarted instances.
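Analysis:
1. The log is printed by LocalDataSourceJobStore, but that class's source contains no such message. Searching upward through its parent classes and interfaces leads to JobStoreSupport, whose relevant source is as follows: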
protected void clusterRecover(Connection conn, List<SchedulerStateRecord> failedInstances)
        throws JobPersistenceException {
    if (failedInstances.size() > 0) {
        long recoverIds = System.currentTimeMillis();
        logWarnIfNonZero(failedInstances.size(),
                "ClusterManager: detected " + failedInstances.size()
                + " failed or restarted instances.");
        // the remaining N lines of code are omitted
        // ....
    }
}
protected List<SchedulerStateRecord> findFailedInstances(Connection conn)
        throws JobPersistenceException {
    try {
        List<SchedulerStateRecord> failedInstances = new LinkedList<SchedulerStateRecord>();
        boolean foundThisScheduler = false;
        long timeNow = System.currentTimeMillis();

        List<SchedulerStateRecord> states = getDelegate().selectSchedulerStateRecords(conn, null);

        for(SchedulerStateRecord rec: states) {
            // find own record...
            if (rec.getSchedulerInstanceId().equals(getInstanceId())) {
                foundThisScheduler = true;
                if (firstCheckIn) {
                    failedInstances.add(rec);
                }
            } else {
                // find failed instances...
                if (calcFailedIfAfter(rec) < timeNow) {
                    failedInstances.add(rec);
                }
            }
        }

        // The first time through, also check for orphaned fired triggers.
        if (firstCheckIn) {
            failedInstances.addAll(findOrphanedFailedInstances(conn, states));
        }

        // If not the first time but we didn't find our own instance, then
        // this machine's record is missing and this is not the first check-in.
        if ((!foundThisScheduler) && (!firstCheckIn)) {
            // FUTURE_TODO: revisit when handle self-failed-out impl'ed (see FUTURE_TODO in clusterCheckIn() below)
            getLog().warn(
                "This scheduler instance (" + getInstanceId() + ") is still " +
                "active but was recovered by another instance in the cluster. " +
                "This may cause inconsistent behavior.");
        }

        return failedInstances;
    } catch (Exception e) {
        lastCheckin = System.currentTimeMillis();
        throw new JobPersistenceException("Failure identifying failed instances when checking-in: "
                + e.getMessage(), e);
    }
}
Under the comment // find failed instances... the code calls the calcFailedIfAfter method:
protected long calcFailedIfAfter(SchedulerStateRecord rec) {
    return rec.getCheckinTimestamp() +
        Math.max(rec.getCheckinInterval(),
                (System.currentTimeMillis() - lastCheckin)) +
        7500L;
}
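To make the arithmetic of calcFailedIfAfter concrete, here is a minimal standalone sketch of the same formula. It is not Quartz code: the class name, the method looksFailed, and all the numbers are hypothetical. It also assumes that each node stamps its check-in with its own system clock, so a peer whose clock runs behind appears to have checked in long ago.

```java
// Standalone model of the threshold formula; names and values are illustrative only.
class FailureThresholdSketch {
    // Fixed allowance added by the formula (the 7500L in the source above).
    static final long GRACE_MS = 7500L;

    // Same arithmetic as calcFailedIfAfter, with the inputs passed in explicitly.
    static long calcFailedIfAfter(long checkinTimestamp, long checkinInterval,
                                  long sinceOwnLastCheckin) {
        return checkinTimestamp + Math.max(checkinInterval, sinceOwnLastCheckin) + GRACE_MS;
    }

    // A peer is treated as failed when its threshold lies before the observer's "now".
    static boolean looksFailed(long checkinTimestamp, long checkinInterval,
                               long sinceOwnLastCheckin, long timeNow) {
        return calcFailedIfAfter(checkinTimestamp, checkinInterval, sinceOwnLastCheckin) < timeNow;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;    // observer's clock, in ms (hypothetical)
        long interval = 7_500L;   // hypothetical check-in interval, in ms

        // Peer with a synchronized clock that checked in 5 s ago: not flagged.
        System.out.println(looksFailed(now - 5_000L, interval, interval, now));  // prints false

        // Same healthy peer, but its clock lags by 30 s, so its timestamp looks 35 s old: flagged.
        System.out.println(looksFailed(now - 35_000L, interval, interval, now)); // prints true
    }
}
```

With these numbers the peer tolerates an apparent age of at most 15 seconds (interval plus allowance), so a 30-second clock lag is enough for a perfectly healthy node to be flagged.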
Because this machine's instance record was not found in the database and this was not the first check-in, the following log is printed:
This scheduler instance xxxx is still active but was recovered by another instance in the cluster. This may cause inconsistent behavior.
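At the same time, another node's check-in was judged to have timed out. Because the servers' system clocks differ by a large amount, that node's last check-in timestamp, plus the larger of its check-in interval and the time since this node's own last check-in, plus the 7.5-second allowance (7500L), still falls before the current time, so the instance is added to failedInstances. Since failedInstances is then non-empty, clusterRecover is called and prints the following log: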
ClusterManager detected 1 failed or restarted instances.
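The two conditions described above can be reproduced in a simplified model of findFailedInstances. This is only a sketch: StateRow, Result, and findFailed are hypothetical stand-ins rather than Quartz types, the orphaned fired trigger check is left out, and the timestamps are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the classification logic; not Quartz code.
class FindFailedSketch {
    // Hypothetical stand-in for one scheduler state record.
    static final class StateRow {
        final String instanceId;
        final long checkinTimestamp;
        final long checkinInterval;

        StateRow(String instanceId, long checkinTimestamp, long checkinInterval) {
            this.instanceId = instanceId;
            this.checkinTimestamp = checkinTimestamp;
            this.checkinInterval = checkinInterval;
        }
    }

    // What one check-in pass would conclude.
    static final class Result {
        final List<String> failedIds = new ArrayList<>();
        boolean recoveredByAnother; // true when the "still active but was recovered" warning would fire
    }

    static Result findFailed(List<StateRow> rows, String ownId, boolean firstCheckIn,
                             long timeNow, long sinceOwnLastCheckin) {
        Result result = new Result();
        boolean foundThisScheduler = false;
        for (StateRow row : rows) {
            if (row.instanceId.equals(ownId)) {
                // Own record: only treated as failed on the very first check-in.
                foundThisScheduler = true;
                if (firstCheckIn) {
                    result.failedIds.add(row.instanceId);
                }
            } else {
                // Other nodes: failed when their threshold lies before "now".
                long failedIfAfter = row.checkinTimestamp
                        + Math.max(row.checkinInterval, sinceOwnLastCheckin) + 7500L;
                if (failedIfAfter < timeNow) {
                    result.failedIds.add(row.instanceId);
                }
            }
        }
        // Own record missing on a later check-in: another node already recovered us.
        result.recoveredByAnother = !foundThisScheduler && !firstCheckIn;
        return result;
    }

    public static void main(String[] args) {
        long now = 1_000_000L; // observer's clock, in ms (hypothetical)
        List<StateRow> rows = new ArrayList<>();
        // nodeB's timestamp looks 40 s old; nodeA's own row has been deleted by a peer.
        rows.add(new StateRow("nodeB", now - 40_000L, 7_500L));

        Result r = findFailed(rows, "nodeA", false, now, 7_500L);
        System.out.println(r.failedIds + " " + r.recoveredByAnother); // prints [nodeB] true
    }
}
```

In this run nodeA both reports one failed instance (nodeB) and detects that its own record is gone, which matches the pair of log lines seen in the error.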
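So this problem is mainly caused by the servers' system clocks being out of sync; synchronizing the clocks of the services in the cluster resolves it. I am still studying the source code, so if anything here is wrong, corrections are very welcome.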