Quartz version:
<dependency>
    <groupId>org.quartz-scheduler</groupId>
    <artifactId>quartz</artifactId>
    <version>2.2.1</version>
</dependency>
Quartz job thread pool (core size 8, maximum size 15, at most 20 queued tasks, so at most 35 tasks can be held at once):
<bean id="jobExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="8" />
    <property name="maxPoolSize" value="15" />
    <property name="queueCapacity" value="20" />
</bean>
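The "at most 35 tasks" arithmetic can be checked with a plain java.util.concurrent.ThreadPoolExecutor. This is a minimal sketch, assuming the Spring bean above ends up as a pool with 8 core threads, 15 maximum threads, and a bounded queue of 20; the class name PoolCapacityDemo and its methods are made up for illustration, and the tasks simply block until released so that nothing completes while the pool is being filled.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolCapacityDemo {

    /** Submits `tasks` blocking tasks to an 8/15/20 pool and returns how many were rejected. */
    static int countRejected(int tasks) {
        // Assumption: same sizing as the jobExecutor bean (core 8, max 15, queue 20).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                8, 15, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(20));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                // Each task blocks, so every accepted task keeps occupying a thread or queue slot.
                pool.execute(() -> awaitQuietly(release));
            } catch (RejectedExecutionException e) {
                // Default AbortPolicy: the pool is full, the task is refused.
                rejected++;
            }
        }
        release.countDown(); // let the blocked tasks finish
        pool.shutdown();     // accepted tasks still run to completion
        return rejected;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected out of 35: " + countRejected(35));
        System.out.println("rejected out of 36: " + countRejected(36));
    }
}
```

The first 8 tasks start core threads, the next 20 fill the queue, and the following 7 start the extra threads up to the maximum of 15; only the 36th task has nowhere to go.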
Scheduler:
<bean id = "scheduler" class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
    <property name="quartzProperties">
        <props>
            <prop key="org.quartz.scheduler.skipUpdateCheck">true</prop>
        </props>
    </property>
    <property name="triggers">
        <list>
            <ref local="exceptionMonitorJobCronTrigger"/>
            .....
        </list>
    </property>
    <property name="taskExecutor" ref="jobExecutor" />
</bean>
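(1) The problem
One day the database was under heavy load and choked. The system slowed to a crawl and several scheduled jobs got stuck. After the long-running SQL statements were killed on the database, everything went back to normal.
However, one scheduled job had stopped running. The log showed its last run starting and finishing cleanly, and then nothing: it had vanished into thin air. The thread list looked normal. Every other scheduled job was still there; only this one was gone.
(2) The log contains the following: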
2016-11-03 22:35:00.002 [scheduler_QuartzSchedulerThread] ERROR o.s.s.q.LocalTaskExecutorThreadPool - Task has been rejected by TaskExecutor
org.springframework.core.task.TaskRejectedException: Executor [java.util.concurrent.ThreadPoolExecutor@7f4e41ad[Running, pool size = 15, active threads = 15, queued tasks = 20, completed tasks = 109842]] did not accept task: org.quartz.core.JobRunShell@54cf7222
    at org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor.execute(ThreadPoolTaskExecutor.java:257) ~[spring-context-4.1.7.RELEASE.jar:4.1.7.RELEASE]
    at org.springframework.scheduling.quartz.LocalTaskExecutorThreadPool.runInThread(LocalTaskExecutorThreadPool.java:79) ~[spring-context-support-4.1.7.RELEASE.jar:4.1.7.RELEASE]
    at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:381) [quartz-2.2.1.jar:na]
Caused by: java.util.concurrent.RejectedExecutionException: Task org.quartz.core.JobRunShell@54cf7222 rejected from java.util.concurrent.ThreadPoolExecutor@7f4e41ad[Running, pool size = 15, active threads = 15, queued tasks = 20, completed tasks = 109842]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) ~[na:1.7.0_79]
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) ~[na:1.7.0_79]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) ~[na:1.7.0_79]
    at org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor.execute(ThreadPoolTaskExecutor.java:254) ~[spring-context-4.1.7.RELEASE.jar:4.1.7.RELEASE]
    ... 2 common frames omitted
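A job failed to execute, but surely that should not kill the scheduled job for good? I changed the code to try it out, and the experiment confirmed it: once TaskRejectedException is thrown, the scheduled job is never triggered again.
A search turned up the explanation: when the thread pool cannot accept a new task, ThreadPoolExecutor's default policy is to fail by throwing RejectedExecutionException.
Looking at the Quartz source, line 381 of org.quartz.core.QuartzSchedulerThread: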
if (qsRsrcs.getThreadPool().runInThread(shell) == false) {
    // this case should never happen, as it is indicative of the
    // scheduler being shutdown or a bug in the thread pool or
    // a thread pool being used concurrently - which the docs
    // say not to do...
    getLog().error("ThreadPool.runInThread() return false!");
    qsRsrcs.getJobStore().triggeredJobComplete(triggers.get(i), bndle.getJobDetail(), CompletedExecutionInstruction.SET_ALL_JOB_TRIGGERS_ERROR);
}
Stepping into runInThread (Spring's LocalTaskExecutorThreadPool, as the stack trace shows):
@Override
public boolean runInThread(Runnable runnable) {
    if (runnable == null) {
        return false;
    }
    try {
        this.taskExecutor.execute(runnable);
        return true;
    }
    catch (RejectedExecutionException ex) {
        logger.error("Task has been rejected by TaskExecutor", ex);
        return false;
    }
}
This is the key point. Once RejectedExecutionException is caught, runInThread returns false and Quartz calls
    qsRsrcs.getJobStore().triggeredJobComplete(triggers.get(i), bndle.getJobDetail(), CompletedExecutionInstruction.SET_ALL_JOB_TRIGGERS_ERROR);
which puts all of the job's triggers into the ERROR state (SET_ALL_JOB_TRIGGERS_ERROR). From then on, those triggers no longer fire.
Going back over the log, I found:
2016-11-03 22:35:00.003 [scheduler_QuartzSchedulerThread] ERROR o.quartz.core.QuartzSchedulerThread - ThreadPool.runInThread() return false!
2016-11-03 22:35:00.003 [scheduler_QuartzSchedulerThread] INFO org.quartz.simpl.RAMJobStore - All triggers of Job DEFAULT.exceptionMonitorJobDetail set to ERROR state.
2016-11-03 22:35:00.003 [scheduler_QuartzSchedulerThread] ERROR o.s.s.q.LocalTaskExecutorThreadPool - Task has been rejected by TaskExecutor
In fact, one more trigger had been disabled the same way; it just had not been noticed.
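(3) How to fix it?
<1> Increase the thread pool's queue capacity to suit the number of scheduled jobs, or simply make it unbounded. The catch: if one run of a job gets stuck, the next run will probably get stuck too.
<2> Mark jobs that do not need concurrent execution as non-concurrent. Then, if an earlier run is stuck, the next one does not start (it waits for the previous run to finish, I believe). The flip side is that you have to make sure the task never hangs, otherwise none of its later runs can execute either.
<bean id="xxxDetail" class="org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean">
    <property name="targetObject" ref="xxxJob" />
    <property name="targetMethod" value="excute" />
    <property name="concurrent" value="false" /> <!-- disallow concurrency: later runs wait until the first one has finished -->
</bean>
<3> Change the thread pool's rejection policy, which apparently means giving the pool a RejectedExecutionHandler.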
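When a task is rejected, execute invokes the RejectedExecutionHandler.rejectedExecution(java.lang.Runnable, java.util.concurrent.ThreadPoolExecutor) method of its handler. Four predefined handler policies are provided:
In the default ThreadPoolExecutor.AbortPolicy, the handler throws a runtime RejectedExecutionException upon rejection.
In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute runs the task itself. This provides a simple feedback control mechanism that slows down the rate at which new tasks are submitted.
In ThreadPoolExecutor.DiscardPolicy, a task that cannot be executed is simply dropped.
In ThreadPoolExecutor.DiscardOldestPolicy, if the executor is not shut down, the task at the head of the work queue is dropped and execution is retried (which can fail again, causing this to be repeated).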
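AbortPolicy is the default and is exactly what caused this problem, so it is out.
CallerRunsPolicy runs the task on the calling thread, which here is the Quartz scheduler thread (scheduler_QuartzSchedulerThread in the log); that is too dangerous.
If the next run will cover this run's data anyway, I think DiscardPolicy is the better choice: just throw the task away.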
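To make the difference between the policies concrete, here is a minimal sketch using only java.util.concurrent. LoggingDiscardHandler is a hypothetical handler written for this example (not part of the JDK, Spring, or Quartz): it drops the task as DiscardPolicy does, but also logs and counts it so the drop does not go unnoticed. The pool is deliberately tiny (1 thread, 1 queue slot) so that it saturates after two tasks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RejectionPolicyDemo {

    /** Hypothetical handler: drops the task like DiscardPolicy, but logs and counts it. */
    static class LoggingDiscardHandler implements RejectedExecutionHandler {
        final AtomicInteger discarded = new AtomicInteger();

        @Override
        public void rejectedExecution(Runnable task, ThreadPoolExecutor pool) {
            discarded.incrementAndGet();
            System.err.println("Discarded " + task + " from " + pool);
        }
    }

    /**
     * Saturates a tiny pool (1 thread, 1 queue slot) with blocking tasks and
     * returns how many execute() calls threw RejectedExecutionException.
     */
    static int countThrown(RejectedExecutionHandler handler, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(1), handler);
        CountDownLatch release = new CountDownLatch(1);
        int thrown = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> awaitQuietly(release));
            } catch (RejectedExecutionException e) {
                thrown++; // only happens when the handler itself throws (AbortPolicy)
            }
        }
        release.countDown(); // unblock the accepted tasks
        pool.shutdown();
        return thrown;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        LoggingDiscardHandler handler = new LoggingDiscardHandler();
        System.out.println("thrown with AbortPolicy: "
                + countThrown(new ThreadPoolExecutor.AbortPolicy(), 5));
        System.out.println("thrown with LoggingDiscardHandler: " + countThrown(handler, 5));
        System.out.println("discarded: " + handler.discarded.get());
    }
}
```

With a non-throwing handler, execute() returns normally, so runInThread would return true and the triggers would not be set to ERROR. Spring's ThreadPoolTaskExecutor has a rejectedExecutionHandler property through which such a handler can be configured. One thing worth verifying before relying on this: a silently dropped JobRunShell never reports completion back to Quartz, which may matter for non-concurrent jobs.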
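(4) All things considered, adjusting the queue size is the simplest fix; on top of that, switch the jobs that do not need concurrency to non-concurrent.
Also, add monitoring for the scheduled jobs: check the trigger state of each JobDetail and raise an alert promptly if it is ERROR.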
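The monitoring idea could be sketched roughly as follows. To keep the example compilable without Quartz on the classpath, the scheduler is hidden behind a hypothetical TriggerStateSource interface that maps trigger names to state names. In Quartz 2.x a real implementation would presumably be built on Scheduler.getTriggerKeys(GroupMatcher.anyTriggerGroup()) and Scheduler.getTriggerState(key), comparing against Trigger.TriggerState.ERROR. The trigger name DEFAULT.otherCronTrigger in main is invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TriggerErrorMonitor {

    /** Hypothetical abstraction over the scheduler: trigger name -> state name. */
    interface TriggerStateSource {
        Map<String, String> currentStates();
    }

    /** Returns the names of all triggers whose state is "ERROR", in iteration order. */
    static List<String> findErrored(TriggerStateSource source) {
        List<String> errored = new ArrayList<>();
        for (Map.Entry<String, String> entry : source.currentStates().entrySet()) {
            if ("ERROR".equals(entry.getValue())) {
                errored.add(entry.getKey());
            }
        }
        return errored;
    }

    public static void main(String[] args) {
        // Stand-in data; a real source would query the Quartz Scheduler instead.
        Map<String, String> states = new LinkedHashMap<>();
        states.put("DEFAULT.exceptionMonitorJobCronTrigger", "ERROR");
        states.put("DEFAULT.otherCronTrigger", "NORMAL"); // invented trigger name
        for (String name : findErrored(() -> states)) {
            System.err.println("ALERT: trigger " + name + " is in ERROR state");
        }
    }
}
```

Running such a check from outside the Quartz thread pool (for example from a plain timer thread) seems sensible, since a monitor scheduled as a Quartz job could be knocked out by the very rejection it is meant to detect.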