Operations Center Credentials Cache blocks Reading Threads

Issue

  • When configuring a job on a Client Master connected to an Operations Center, opening the credentials dropdown in any configuration section takes longer than expected.

  • Thread dumps show heavy thread contention: many threads are blocked waiting on a single thread whose stack trace contains:

        at hudson.XmlFile.write(XmlFile.java:181)
        at com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.saveCache(OperationsCenterCredentialsProvider.java:234)
        at com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cachePut(OperationsCenterCredentialsProvider.java:170)
        at com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.getCredentials(OperationsCenterCredentialsProvider.java:153)
    

    Or since version 2.107.0.5 of Operations Center Client plugin:

        at hudson.XmlFile.write(XmlFile.java:193)
        at com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.lambda$saveCache$0(OperationsCenterCredentialsProvider.java:234)
        at com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider$$Lambda$631/795926822.run(Unknown Source)
    
  • The credentials cache file $JENKINS_HOME/com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cache.xml in the Master’s file system grows large.
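
To check the last symptom, the size of the cache file can be read with a few lines of Groovy. This is a minimal sketch, not an official diagnostic: credentialsCacheSize is a hypothetical helper name, and reading the JENKINS_HOME environment variable is an assumption made so that the snippet runs standalone.

```groovy
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical helper (not part of any plugin): returns the size of the
// credentials cache file in bytes, or -1 if it does not exist.
long credentialsCacheSize(Path jenkinsHome) {
  Path cacheFile = jenkinsHome.resolve(
    'com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cache.xml')
  return Files.exists(cacheFile) ? Files.size(cacheFile) : -1L
}

// Assumption: JENKINS_HOME is set in the environment; otherwise fall back
// to the current working directory.
String home = System.getenv('JENKINS_HOME') ?: System.getProperty('user.dir')
long size = credentialsCacheSize(Paths.get(home))
println(size < 0 ? 'Cache file not found' : "Cache file size: ${size} bytes")
```

When run from the Script Console, the home directory could instead be taken from jenkins.model.Jenkins.get().rootDir.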

Environment

Related Issue(s)

  • CJP-6052: Use a CJOC shared credentials cache when disconnected (introduction of the Operations Center Credentials Cache)
  • CJP-8620: Colossal Credentials Cache Causes Cataclysmic Contention (fixes thread locking and improves the cache, which now saves entries asynchronously, in operations-center-client 2.107.1.5)
  • CTR-788: Slowdown when credentials cache file gets big (new implementation of the cache that avoids duplication and cleans out entries periodically, in operations-center-client 2.249.0.12)

Explanation

The problem stems from the cost of serializing the credentials cache provided by Operations Center. The Operations Center credentials cache makes the credentials defined in Operations Center available to the Master even while the Master is disconnected from Operations Center (e.g. during an Operations Center restart or any other disconnection). A common use case is the reconnection of a Shared Agent after a Master restart, so that durable tasks (e.g. Pipeline steps) can continue.

Each master keeps a cache of the Operations Center credentials that it uses. This is persisted in a file $JENKINS_HOME/com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cache.xml in the master’s file system.

The Operations Center credentials cache went through a couple of improvements to make it work at scale:

  • CJP-8620: Colossal Credentials Cache Causes Cataclysmic Contention: Thread contention and synchronous saves can cause severe performance problems on the Master, especially when the cache is large. This was fixed in operations-center-client 2.107.1.5.
  • CTR-788: Slowdown when credentials cache file gets big: The credentials cache can grow very large and cause severe performance problems (when the file reaches hundreds of MB). This was fixed in operations-center-client 2.249.0.12 with a new implementation of the cache that avoids duplication and cleans out entries periodically.
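
The difference between a synchronous and an asynchronous save can be illustrated with a toy model. The sketch below is not the plugin's code: ToyCredentialsCache, its methods, and the 200 ms sleep standing in for XML serialization are all assumptions made for illustration. It only shows why writing to disk while holding the write lock blocks every reader, and how handing the write to a background thread avoids that.

```groovy
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.locks.ReentrantReadWriteLock

// Toy model for illustration only; this is NOT the plugin's implementation.
class ToyCredentialsCache {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock()
  private final ExecutorService saver = Executors.newSingleThreadExecutor()
  private final Map<String, List<String>> entries = new HashMap<>()
  final AtomicInteger saves = new AtomicInteger()

  // Stand-in for serializing the whole cache to disk.
  private void writeToDisk(Map<String, List<String>> snapshot) {
    Thread.sleep(200)
    saves.incrementAndGet()
  }

  // Synchronous save: the slow write runs while the write lock is held,
  // so every reader has to wait for it to finish.
  void putAndSaveSync(String key, List<String> value) {
    lock.writeLock().lock()
    try {
      entries.put(key, value)
      writeToDisk(new HashMap<>(entries))
    } finally {
      lock.writeLock().unlock()
    }
  }

  // Asynchronous save: only the in-memory update holds the lock;
  // a snapshot is written by a background thread afterwards.
  void putAndSaveAsync(String key, List<String> value) {
    Map<String, List<String>> snapshot
    lock.writeLock().lock()
    try {
      entries.put(key, value)
      snapshot = new HashMap<>(entries)
    } finally {
      lock.writeLock().unlock()
    }
    saver.execute { writeToDisk(snapshot) }
  }

  // Readers only need the read lock.
  List<String> get(String key) {
    lock.readLock().lock()
    try {
      return entries.get(key)
    } finally {
      lock.readLock().unlock()
    }
  }

  // Waits for pending background writes, then stops the saver thread.
  void shutdown() {
    saver.shutdown()
    saver.awaitTermination(5, TimeUnit.SECONDS)
  }
}
```

In this model, a call to get during putAndSaveSync waits for the full write, whereas during putAndSaveAsync it waits only for the in-memory update.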

Resolution

The recommended solution is to upgrade CloudBees CI to version 2.263.1.2 or later.

Workaround

Because upgrading an instance requires planning and testing, the following workarounds may be used in the meantime to reduce the impact of this issue:

Clearing the cache periodically

The following script can be used to reduce the likelihood of the issue occurring. The script flushes the cache, and the performance improvement should be immediate:

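import com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider
import com.cloudbees.plugins.credentials.Credentials
import hudson.ExtensionList
import java.util.HashMap

// Look up the Operations Center credentials provider registered on this Master.
def provider = ExtensionList.lookup(OperationsCenterCredentialsProvider.class).get(0)

// Take the write lock so that no other thread uses the cache during the flush.
provider.lock.writeLock().lock()
println 'Cleaning credentials cache. Lock acquired...'
try {
  // Replace the in-memory cache with an empty map, then save it.
  provider.cache = new HashMap<String, List<? extends Credentials>>()
  provider.saveCache()
  println 'Cache cleaned'
} finally {
  // Always release the lock, even if saving fails.
  provider.lock.writeLock().unlock()
  println 'Lock released'
}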

This may be executed from Manage Jenkins > Script Console when the issue happens. It can also be automated to run periodically.
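
One possible way to run a flush periodically is a plain JDK scheduler. The sketch below is an assumption, not a documented CloudBees procedure: startPeriodicFlush is a hypothetical helper, and the flush argument stands for whatever code performs the flush (for example the body of the script above).

```groovy
import java.util.concurrent.Executors
import java.util.concurrent.ScheduledExecutorService
import java.util.concurrent.ThreadFactory
import java.util.concurrent.TimeUnit

// Hypothetical helper, not part of Jenkins or the plugin: runs `flush`
// repeatedly, waiting `period` between the end of one run and the next.
ScheduledExecutorService startPeriodicFlush(Runnable flush, long period, TimeUnit unit) {
  // Daemon thread, so the scheduler never prevents the JVM from shutting down.
  ThreadFactory daemonThreads = { Runnable r ->
    Thread t = new Thread(r, 'credentials-cache-flush')
    t.daemon = true
    return t
  } as ThreadFactory
  ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(daemonThreads)
  // Catch failures so that one failed flush does not cancel the following runs.
  Runnable guarded = {
    try {
      flush.run()
    } catch (Exception e) {
      println "Cache flush failed: ${e.message}"
    }
  } as Runnable
  scheduler.scheduleWithFixedDelay(guarded, period, period, unit)
  return scheduler
}
```

For example, startPeriodicFlush({ /* flush code here */ } as Runnable, 24, TimeUnit.HOURS) would run the flush once a day. A schedule created this way lives only in memory, so it is lost when the Master restarts; calling shutdownNow() on the returned scheduler stops it.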

Disable the Cache

Another workaround is to disable the cache by setting the system property com.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cache.disabled=true on your Master.
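
How JVM options are passed depends on how the Master is installed and started, so the exact location is not shown here. As a sketch, the option itself would look like this when added to the Java arguments of the Master:

```text
-Dcom.cloudbees.opscenter.client.plugin.OperationsCenterCredentialsProvider.cache.disabled=true
```

A system property passed as a JVM argument is set when the JVM starts, so the Master has to be restarted for it to take effect.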

Note: This affects the lookup of credentials that are defined in Operations Center and used by the Master. If Operations Center is down or not connected, the Master cannot access those credentials. This affects, for example, features such as Shared Agents that need credentials defined in Operations Center.
