Reentrant locking in Groups is causing Jenkins instance to hang

Issue

  • The Jenkins server hangs unexpectedly and recovers only after a restart.
  • Stack traces similar to the ones below are observed when the issue occurs.

Thread updating group members

"Handling POST /groups/Jenkins-users/addMember/api/json from 10.15.132.73 : Jetty (winstone)-37013694" id=37013694 (0x234c8be) state=WAITING cpu=73%
    - waiting on <0x595f0447> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    - locked <0x595f0447> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
    at nectar.plugins.rbac.groups.Group.doAddMember(Group.java:1710)

Many other threads blocked waiting for the same lock

    - waiting on <0x595f0447> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    - locked <0x595f0447> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at nectar.plugins.rbac.groups.Group.isMatch(Group.java:510)

Environment

Resolution

Concurrent modifications to RBAC groups occasionally caused Jenkins to become unresponsive. The cause was a deadlock arising from the reentrant locking strategy used in RBAC groups.
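
The exact lock sequence inside the plugin is not described in this article. As a minimal sketch of one well-known way a thread can block forever on a java.util.concurrent.locks.ReentrantReadWriteLock, the example below assumes a thread that already holds the read lock and then requests the write lock. The lock is reentrant, but it does not support upgrading from read to write, so the request cannot succeed while the read lock is held. The class and method names (LockUpgradeDemo, tryUpgrade, tryDowngrade) are hypothetical and are not taken from the plugin. A timed tryLock is used so that the demonstration returns instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; not code from the RBAC plugin.
public class LockUpgradeDemo {

    // Holds the read lock, then asks for the write lock (an "upgrade").
    // Returns true only if the write lock was acquired within the timeout.
    static boolean tryUpgrade(ReentrantReadWriteLock lock, long timeoutMs) {
        lock.readLock().lock();
        try {
            boolean acquired = lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Holds the write lock, then asks for the read lock (a "downgrade").
    // ReentrantReadWriteLock allows this direction.
    static boolean tryDowngrade(ReentrantReadWriteLock lock, long timeoutMs) {
        lock.writeLock().lock();
        try {
            boolean acquired = lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.readLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        // prints "read -> write acquired: false"
        System.out.println("read -> write acquired: " + tryUpgrade(lock, 200));
        // prints "write -> read acquired: true"
        System.out.println("write -> read acquired: " + tryDowngrade(lock, 200));
    }
}
```

If the untimed lock() call were used in tryUpgrade, the thread would park indefinitely inside AbstractQueuedSynchronizer.acquire, as in the first stack trace above. Other threads requesting the read lock may then queue behind the waiting writer, which would be consistent with the many blocked readers in the second trace.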

The deadlock has been resolved. Concurrent modifications to RBAC groups no longer cause Jenkins to become unresponsive.
The fix is available in version 2.303.1.5 and later.

Workaround

If you cannot upgrade your instance, the fix has also been backported to version 5.51.1
of the CloudBees Role-Based Access Control Plugin.

References

  • BEE-7033: Reentrant locking in Groups is causing Jenkins instance to hang
