Venafi Kubernetes components deployment best practices¶
Learn how to configure the Venafi Kubernetes components for deployment in production environments.
Introduction¶
You will learn how to use Helm chart values to override settings such as the number of replicas in each Deployment and the resource requests for CPU and memory.
Discover methods to ensure the high availability of the Venafi Kubernetes components, whilst avoiding inefficient over-allocation of your cluster resources.
Isolate Venafi Kubernetes components on a dedicated Node pool¶
Venafi Kubernetes components should be treated as part of your platform's control plane. Some Venafi Kubernetes components, such as cert-manager, create and modify Kubernetes Secret resources and most of the components cache TLS Secret resources in memory.
If an untrusted or malicious workload runs on the same Node as the Venafi Kubernetes components, and gains root access to the underlying node, it may be able to read the Secrets that the controller has cached in memory. You can mitigate this risk by running Venafi Kubernetes components on nodes that are reserved for trusted platform operators.
The Helm charts for Venafi Kubernetes components have parameters to configure the Pod tolerations and Node affinity (or nodeSelector) for each component. The exact values of these parameters will depend on your particular cluster.
Learn more
Read Assigning Pods to Nodes in the Kubernetes documentation.
Read about Taints and Tolerations in the Kubernetes documentation.
Example¶
This example demonstrates how to use:
- taints to repel non-platform Pods from Nodes which you have reserved for your platform's control plane.
- tolerations to allow cert-manager Pods to run on those Nodes.
- nodeSelector to place the cert-manager Pods on those Nodes.
To isolate Venafi Kubernetes components on a dedicated Node pool:
1. Label the Nodes:

   kubectl label node ... node-restriction.kubernetes.io/reserved-for=platform

2. Taint the Nodes:

   kubectl taint node ... node-restriction.kubernetes.io/reserved-for=platform:NoExecute

3. Install cert-manager using the following Helm chart values:
   nodeSelector:
     kubernetes.io/os: linux
     node-restriction.kubernetes.io/reserved-for: platform
   tolerations:
   - key: node-restriction.kubernetes.io/reserved-for
     operator: Equal
     value: platform
   webhook:
     nodeSelector:
       kubernetes.io/os: linux
       node-restriction.kubernetes.io/reserved-for: platform
     tolerations:
     - key: node-restriction.kubernetes.io/reserved-for
       operator: Equal
       value: platform
   cainjector:
     nodeSelector:
       kubernetes.io/os: linux
       node-restriction.kubernetes.io/reserved-for: platform
     tolerations:
     - key: node-restriction.kubernetes.io/reserved-for
       operator: Equal
       value: platform
   startupapicheck:
     nodeSelector:
       kubernetes.io/os: linux
       node-restriction.kubernetes.io/reserved-for: platform
     tolerations:
     - key: node-restriction.kubernetes.io/reserved-for
       operator: Equal
       value: platform
Note
This example uses nodeSelector to place the Pods, but you could also use affinity.nodeAffinity. nodeSelector is chosen here because it has a simpler syntax.
Note
The default nodeSelector value kubernetes.io/os: linux avoids placing cert-manager Pods on Windows nodes in a mixed OS cluster, so it must be explicitly included here too.
Tip
On a multi-tenant cluster, consider enabling the PodTolerationRestriction plugin to limit which tolerations tenants may add to their Pods. You may also use that plugin to add default tolerations to the cert-manager namespace, which removes the need to explicitly set the tolerations in the Helm chart.
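As a minimal sketch, assuming the PodTolerationRestriction admission plugin is enabled on your API server (via --enable-admission-plugins), the plugin reads default tolerations from the scheduler.alpha.kubernetes.io/defaultTolerations namespace annotation. The example below uses the venafi namespace used elsewhere in this guide; adjust the namespace and toleration to match your setup:

apiVersion: v1
kind: Namespace
metadata:
  name: venafi
  annotations:
    # Default toleration added by PodTolerationRestriction to every Pod in this namespace.
    scheduler.alpha.kubernetes.io/defaultTolerations: '[{"key": "node-restriction.kubernetes.io/reserved-for", "operator": "Equal", "value": "platform"}]'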
Tip
As an alternative, you could use Kyverno to limit which tolerations Pods are allowed to use.
Read Restrict control plane scheduling as a starting point.
Learn more
Read the Guide to isolating tenant workloads to specific nodes in the EKS Best Practice Guides, for an in-depth explanation of these techniques.
Learn how to Isolate your workloads in dedicated node pools on Google Kubernetes Engine.
Learn about Placing pods on specific nodes using node selectors, with RedHat OpenShift.
Read more about the node-restriction.kubernetes.io/ prefix and the NodeRestriction admission plugin.
High Availability¶
This section is about increasing the resilience of the Venafi Kubernetes components to voluntary and involuntary disruptions.
For example, as a platform administrator, you need to periodically drain and decommission a Kubernetes node without causing any downtime to the Venafi Kubernetes services or the applications relying on the Venafi Kubernetes services.
As a platform administrator, you need to be confident that the Venafi Kubernetes components can be deployed in such a way that they are resilient to a Kubernetes node failure.
Replicas¶
Use two or more replicas to achieve high availability.
Each of the Venafi Kubernetes components has one or more Kubernetes Deployment resources. By default each Deployment has one replica. The advantage of using one replica by default is that you can easily evaluate the components on a single node cluster; on a kind or a minikube cluster, for example. The disadvantage is that one replica does not provide high availability. The Helm charts for Venafi Kubernetes components have parameters to configure the replicas for each controller, and in production you should use two or more replicas to achieve high availability.
Most of the Venafi Kubernetes components use leader election to ensure that only one replica is active. This prevents conflicts that would arise if multiple controllers were reconciling the same API resources. So in general multiple replicas can be used for high availability but not load balancing.
You don't need to use more than two replicas of a controller Pod. Using two replicas for each Deployment ensures that there is a standby Pod scheduled to a Node and ready to take leadership, if the current leader encounters a disruption. For example, a voluntary disruption, such as the draining of the Node on which the leader Pod is running, or an involuntary disruption, such as the leader Pod encountering an unexpected deadlock. Further replicas may add a degree of resilience if you have the luxury of sufficient Nodes with sufficient CPU and memory to accommodate standby replicas.
Special Cases
There are some exceptions to the general rules above.
cert-manager webhook
By default, the cert-manager webhook Deployment has one replica, but you should use three or more in production. If the cert-manager webhook is unavailable, all API operations on cert-manager custom resources will fail, disrupting any software that creates, updates, or deletes cert-manager custom resources. So, it is especially important to keep at least one replica of the cert-manager webhook running at all times.
The cert-manager webhook does not use leader election, so you can scale it horizontally by increasing the number of replicas. When the Kubernetes API server connects to the cert-manager webhook it does so via a Service that load balances the connections between all the Ready replicas. For this reason, there is a clear benefit to increasing the number of webhook replicas to three or more, on clusters where there is a high frequency of cert-manager custom resource interactions. Furthermore, the webhook has modest memory requirements because it does not use a cache. For this reason, the resource cost of scaling out the webhook is relatively low.
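As a sketch of the replica guidance above, assuming you install cert-manager with its Helm chart (which exposes a replicaCount parameter for each Deployment), the following values run two controller and cainjector replicas and three webhook replicas:

replicaCount: 2

webhook:
  replicaCount: 3

cainjector:
  replicaCount: 2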
Approver Policy and Approver Policy Enterprise
By default, the Approver Policy and Approver Policy Enterprise Deployment resources have one replica, but in production you should use two or more. These components have both a controller manager and a webhook, and unlike cert-manager, these are both provided by the same process in each Pod. This changes the strategy for high availability and scaling of these components. You should use at least two replicas for high availability. The Approver Policy webhook is only used for validating CertificateRequestPolicy resources, so in the unlikely event that there is a high frequency of CertificateRequestPolicy API interactions, you might consider scaling out the Deployment to three or more replicas. Be aware that if you do, any of the replicas might also be elected as the controller manager and will therefore need to reserve sufficient memory to cache all CertificateRequests and CertificateRequestPolicy resources.
CSI driver and CSI driver for SPIFFE
These CSI components use DaemonSet resources because exactly one instance of the driver is meant to run on all or some of the nodes in your cluster. Therefore, the Helm charts for these components do not allow you to configure the replica count.
Learn more¶
For examples of how webhook disruptions might disrupt your cluster, see Ensure control plane stability when using webhooks in the Google Kubernetes Engine (GKE) documentation.
To learn more about potential issues caused by webhooks and how you can avoid them, see The dark side of Kubernetes admission webhooks on the Cisco Tech Blog.
PodDisruptionBudget¶
For high availability you should also deploy a PodDisruptionBudget resource, with minAvailable=1 or with maxUnavailable=1. This ensures that a voluntary disruption, such as the draining of a Node, can not proceed until at least one other replica has been successfully scheduled and started on another Node. The Helm charts for Venafi Kubernetes components have parameters to enable and configure a PodDisruptionBudget.
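A minimal sketch, assuming the cert-manager Helm chart, which exposes a podDisruptionBudget block for the controller, webhook, and cainjector Deployments; the other Venafi component charts expose similar parameters, so check each chart's values reference:

podDisruptionBudget:
  enabled: true
  minAvailable: 1

webhook:
  podDisruptionBudget:
    enabled: true
    minAvailable: 1

cainjector:
  podDisruptionBudget:
    enabled: true
    minAvailable: 1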
Warning
These PodDisruptionBudget settings are only suitable for high availability deployments. You must increase the replicas value of each Deployment to more than the minAvailable value, otherwise the PodDisruptionBudget will prevent you from draining cert-manager Pods.
Learn more
See Specifying a Disruption Budget for your Application in the Kubernetes documentation.
Topology Spread Constraints¶
Consider using Topology Spread Constraints, to ensure that a disruption of a node or data center does not degrade the operation of cert-manager.
For high availability, you don't want the replica Pods to be scheduled on the same Node, because if that node fails, both the active and standby Pods will exit. There will be no further reconciliation of the resources by that controller until there is another Node with enough free resources to run a new Pod, and until that Pod has become Ready.
It is also desirable for the replica Pods to run in separate data centers (availability zones), if the cluster has nodes distributed between zones. Then, in the event of a failure at the data center hosting the active Pod, the standby Pod will immediately be available to take leadership.
Fortunately, you may not need to do anything to achieve these goals because Kubernetes > 1.24 has built-in default constraints, which should mean that the high availability scheduling described above happens implicitly. But in case your cluster does not have those defaults, each of the Venafi Kubernetes component Helm charts has parameters to explicitly add Topology Spread Constraints to the Pods.
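As a sketch, assuming the cert-manager Helm chart's topologySpreadConstraints parameter (the label selector shown assumes a Helm release named cert-manager, so adjust it to match your installation), an explicit constraint spreading the controller replicas across zones might look like this:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
      app.kubernetes.io/instance: cert-manager
      app.kubernetes.io/component: controller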
Liveness Probes¶
Some Venafi Kubernetes components have liveness probes. Those components that have liveness probes have sensible defaults for the following options:
initialDelaySeconds
periodSeconds
timeoutSeconds
successThreshold
failureThreshold
If necessary, you can override the defaults by supplying Helm chart values.
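For example, the cert-manager webhook has a liveness probe whose timings can be overridden through the chart's webhook.livenessProbe block; a hedged sketch follows, where the numbers shown are illustrative rather than recommendations:

webhook:
  livenessProbe:
    initialDelaySeconds: 60
    periodSeconds: 10
    timeoutSeconds: 5
    successThreshold: 1
    failureThreshold: 3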
Venafi may add liveness probes to the other components in the future, but only where there is a clear benefit. This cautious approach to liveness probes is guided by the following paragraph in the Kubernetes documentation Configure Liveness, Readiness and Startup Probes:
Liveness probes can be a powerful way to recover from application failures, but they should be used with caution. Liveness probes must be configured carefully to ensure that they truly indicate unrecoverable application failure, for example a deadlock
Background information
The Venafi Kubernetes components are designed to crash when there's a fatal error. Kubernetes will restart a crashed container and if it crashes repeatedly, there will be increasing time delays between successive restarts.
Each of the components starts one or more long running threads. For example, in Approver Policy, one thread is responsible for leader election and another is responsible for reconciling custom resources, another is responsible for serving metrics and another is responsible for handling webhook requests. If any of these threads exits with an error, it will trigger the orderly shutdown of the remaining threads and the process will exit.
For this reason, the liveness probe should only be needed if there is a bug in this orderly shutdown process, or if there is a bug in one of the other threads which causes the process to deadlock and not shutdown.
Learn more
Read about Using Liveness Probes in cert-manager.
Read Configure Liveness, Readiness and Startup Probes in the Kubernetes documentation, paying particular attention to the notes and cautions in that document.
Read Shooting Yourself in the Foot with Liveness Probes for more cautionary information about liveness probes.
Priority Class¶
The reason for setting a priority class is summarized as follows in the Kubernetes blog Protect Your Mission-Critical Pods From Eviction With PriorityClass:
Pod priority and preemption help to make sure that mission-critical pods are up in the event of a resource crunch by deciding order of scheduling and eviction.
If the Venafi Kubernetes components are mission-critical to your platform, then set a priorityClassName on the Pods to protect them from preemption, in situations where a Kubernetes node becomes starved of resources. Without a priorityClassName, the Pods of the Venafi Kubernetes components may be evicted to free up resources for other workloads, and this may cause disruption to any applications that rely on them.
Most Kubernetes clusters will come with two built-in priority class names: system-cluster-critical and system-node-critical, which are used for Kubernetes core components. These can also be used for critical add-ons, such as the Venafi Kubernetes components.
Here are the Helm chart values you can use for cert-manager, for example:
global:
  priorityClassName: system-cluster-critical
The Venafi Kubernetes components allow you to set the priorityClassName using Helm chart values, and the Venafi Kubernetes Manifest utility makes it easy to set the values for multiple components.
On some clusters the ResourceQuota admission controller may be configured to limit the use of certain priority classes to certain namespaces. For example, Google Kubernetes Engine (GKE) will only allow priorityClassName: system-cluster-critical for Pods in the kube-system namespace, by default.
Learn more
Read Kubernetes PR #93121 to see how and why this was implemented.
In such cases you will need to create a ResourceQuota in the venafi namespace:
# venafi-resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: venafi-critical-pods
  namespace: venafi
spec:
  hard:
    pods: 1G
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - system-node-critical
      - system-cluster-critical
kubectl apply -f venafi-resourcequota.yaml
If you are using CSI driver or CSI driver for SPIFFE, it is essential to set priorityClassName: system-node-critical. These components are deployed using DaemonSet resources and must run on the same nodes as the general applications on your platform (as opposed to components like cert-manager which should run on isolated nodes). By assigning priorityClassName: system-node-critical to these CSI driver Pods, you can ensure that they have a higher scheduling priority compared to other general application Pods.
Learn more
Read Protect Your Mission-Critical Pods From Eviction With PriorityClass, a Kubernetes blog post about how Pod priority and preemption help to make sure that mission-critical pods are up in the event of a resource crunch by deciding order of scheduling and eviction.
Read Guaranteed Scheduling For Critical Add-On Pods to learn why system-cluster-critical should be used for add-ons that are critical to a fully functional cluster.
Read Limit Priority Class consumption by default, to learn why platform administrators might restrict usage of certain high priority classes to a limited number of namespaces.
Setting default values for HA deployments using the Venafi CLI tool¶
To generate HA values for any of the supported components (see the example that follows), and include them in a Venafi manifest, use the --ha-values-dir flag for venctl components kubernetes manifest generate:
venctl components kubernetes manifest generate \
--namespace venafi \
--cert-manager \
--approver-policy-enterprise \
--csi-driver \
--trust-manager \
--venafi-connection \
--venafi-enhanced-issuer \
--venafi-kubernetes-agent \
--ha-values-dir venafi-values > venafi-components.yaml
This command writes the generated HA values files to the venafi-values directory, creating that directory if needed. These generated values files will also be included in your generated manifest (in this example, venafi-components.yaml). Using the --ha-values-dir flag will overwrite any previously generated values. If you want to change any of the Venafi HA values, you can pass your custom values as with previous versions of the Venafi CLI tool; your customizations will take precedence.
For example, if you wish to increase the memory request for the cert-manager controller:
cat cert-manager-controller-memory.values.yaml
resources:
  requests:
    memory: 1Gi
venctl components kubernetes manifest generate \
--namespace venafi \
--cert-manager \
--cert-manager-values-files cert-manager-controller-memory.values.yaml \
--approver-policy-enterprise \
--csi-driver \
--trust-manager \
--venafi-connection \
--venafi-enhanced-issuer \
--venafi-kubernetes-agent \
--ha-values-dir venafi-values > venafi-components.yaml
Scaling¶
This section explains how to accurately allocate CPU and memory resources to match the actual usage needs of the Venafi Kubernetes applications running in your Kubernetes environment. This is sometimes referred to as right sizing.
This is important for the following reasons:
- Efficiency: Matching the resource requests with the actual usage helps prevent the wastage of resources.
- Performance: Using appropriate resource requests will improve the performance of the components.
- Reliability: Right sizing improves reliability by avoiding crashes caused by memory exhaustion, for example.
However, right sizing is a continuous process that requires ongoing monitoring, adjustment, and optimization.
Learn more
See Practical tips for right sizing your Kubernetes workloads, on the Datadog blog.
See Right sizing workloads in Kubernetes for cost optimization, on the Google Cloud blog.
A note about horizontal scaling
Components that use leader election¶
Most Venafi Kubernetes components use leader election to ensure that only one replica is active. This prevents conflicts which would arise if multiple replicas were reconciling the same API resources. You cannot use horizontal scaling for these components because only one replica is active. Use vertical scaling instead.
Components that use DaemonSet¶
CSI driver and CSI driver for SPIFFE
These CSI components use DaemonSet resources because exactly one instance of the driver is meant to run on all or some of the nodes in your cluster. Therefore, the Helm charts for these components do not allow you to configure the replica count. Use vertical scaling instead.
cert-manager webhook
The cert-manager webhook does not use leader election, so you can scale it horizontally by increasing the number of replicas. When the Kubernetes API server connects to the cert-manager webhook, it does so via a Service that load balances the connections between all the Ready replicas. For this reason, there is a clear benefit to increasing the number of webhook replicas to three or more, on clusters with a high frequency of cert-manager custom resource interactions. Furthermore, the webhook has modest memory requirements because it does not use a cache. For this reason, the resource cost of scaling out the webhook is relatively low.
Approver Policy and Approver Policy Enterprise
By default, Approver Policy and Approver Policy Enterprise Deployment resources have one replica, but in production you should use two or more. These components have both a controller manager and a webhook, and unlike cert-manager, these are both provided by the same process in each Pod. This changes the strategy for high availability and scaling of these components. You should use at least two replicas for high availability. The Approver Policy webhook is only used for validating CertificateRequestPolicy resources. In the unlikely event of a high frequency of CertificateRequestPolicy API interactions, you might consider scaling out the Deployment to three or more replicas. Be aware that if you do, any of the replicas might also be elected as the controller manager and will, therefore, need to reserve sufficient memory to cache all CertificateRequests and CertificateRequestPolicy resources.
Do not use HorizontalPodAutoscaling¶
HorizontalPodAutoscaling is not useful for any of the Venafi Kubernetes components. They either do not support horizontal scaling (see above) or they are webhooks, which are unlikely to encounter sufficient load to trigger the horizontal pod autoscaler.
The Helm charts for the Venafi Kubernetes components do not all include resource requests and limits, so you should supply resource requests and limits that are appropriate for your cluster.
What are appropriate resource requests and limits? This depends on factors such as the size of your cluster and the number and nature of the workloads that run on the cluster.
Memory¶
Use vertical scaling to assign sufficient memory resources to Venafi Kubernetes components. The memory requirements will be higher on clusters with very many API resources or with large API resources. This is because each of the components reconciles one or more Kubernetes API resources, and each component will cache the metadata and sometimes the entire resource in memory, to reduce the load on your Kubernetes API server.
For example, if the cluster contains very many CertificateRequest resources, you will need to increase the memory limit of the Venafi Enhanced Issuer Pod. The default memory limit of Venafi Enhanced Issuer is 128MiB, which is sufficient to cache ~3000 CertificateRequest resources.
You can use tools like Kubernetes Resource Recommender, to measure the typical memory usage of the Venafi Kubernetes components in your cluster and to help you choose appropriate memory limits.
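As a sketch, assuming the cert-manager Helm chart's resources parameter for the controller Deployment (the 512Mi figure is illustrative, so use your own measurements), setting the memory limit equal to the request might look like this:

resources:
  requests:
    memory: 512Mi
  limits:
    memory: 512Mi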
Learn more
See Assign Memory Resources to Containers and Pods, in the Kubernetes documentation.
See What Everyone Should Know About Kubernetes Memory Limits, to learn why the best practice is to set memory limit=request.
Benchmarks¶
The following charts and tables show the maximum memory usage of the Venafi Kubernetes components with 1000, 5000, and 10000 Certificate resources. The private key size affects the size of the Secret resources, and larger Secret resources cause higher memory use in some components, due to in-memory caching. The measurements were collected using a Venafi benchmarking tool called tlspk-bench. The components were installed in a Kind cluster using the Venafi CLI tool, with default Helm chart values. The unit of measurement is mebibyte (MiB). You can use these measurements to choose initial memory limits for each component.
RSA 2048¶
| Component | 1000 Certificates (MiB) | 5000 Certificates (MiB) | 10000 Certificates (MiB) |
|---|---|---|---|
| approver-policy-enterprise (v0.17.0) | 57 | 229 | 419 |
| cert-manager-cainjector (v1.14.5) | 39 | 102 | 172 |
| cert-manager-controller (v1.14.5) | 95 | 314 | 528 |
| cert-manager-webhook (v1.14.5) | 20 | 25 | 23 |
| trust-manager (v0.10.0) | 18 | 18 | 19 |
| venafi-enhanced-issuer (v0.14.0) | 41 | 98 | 162 |
| venafi-kubernetes-agent (v0.1.48) | 115 | 457 | 715 |
ECDSA 256¶
| Component | 1000 Certificates (MiB) | 5000 Certificates (MiB) | 10000 Certificates (MiB) |
|---|---|---|---|
| approver-policy-enterprise (v0.17.0) | 51 | 208 | 402 |
| cert-manager-cainjector (v1.14.5) | 37 | 84 | 145 |
| cert-manager-controller (v1.14.5) | 89 | 246 | 458 |
| cert-manager-webhook (v1.14.5) | 20 | 25 | 24 |
| trust-manager (v0.10.0) | 18 | 18 | 22 |
| venafi-enhanced-issuer (v0.14.0) | 41 | 93 | 157 |
| venafi-kubernetes-agent (v0.1.48) | 148 | 388 | 770 |
CPU¶
Use vertical scaling to assign sufficient CPU resources to the Venafi Kubernetes components. The CPU requirements will be higher on clusters where there are very frequent updates to the resources which are reconciled by the components. Whenever a resource changes, it will be queued to be re-reconciled by the component. Higher CPU resources allow the component to process the queue faster.
You can use tools like Kubernetes Resource Recommender, to measure the typical CPU usage of the Venafi Kubernetes components in your cluster and to help you choose appropriate CPU requests.
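For example, again assuming the cert-manager Helm chart's resources parameter (the 500m figure is illustrative), you could set a CPU request without a CPU limit; in practice you would combine this with the memory settings shown earlier in a single resources block:

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    memory: 512Mi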
Learn more
See Assign CPU Resources to Containers and Pods, in the Kubernetes documentation.
Read Stop Using CPU Limits on Kubernetes, to learn why the best practice is to set CPU requests, but not limits.
Component Notes¶
cert-manager-controller¶
Cache Size¶
When Certificate resources are the dominant use-case, as for example when workloads need to mount the TLS Secret or when gateway-shim is used, the memory consumption of the cert-manager controller will be roughly proportional to the total size of those Secret resources that contain the TLS key pairs. This is because the cert-manager controller caches the entire content of these Secret resources in memory. If large TLS keys are used (e.g. RSA 4096) the memory use will be higher than if smaller TLS keys are used (e.g. ECDSA).
The other Secrets in the cluster, such as those used for Helm chart configurations or for other workloads, will not significantly increase the memory consumption, because cert-manager will only cache the metadata of these Secrets.
When CertificateRequest resources are the dominant use-case, as for example with CSI driver or with Istio CSR, the memory consumption of the cert-manager controller will be much lower, because there will be fewer TLS Secrets and fewer resources to be cached.
Disable client-side rate limiting for Kubernetes API requests¶
By default, cert-manager throttles the rate of requests to the Kubernetes API server to 20 queries per second. Historically this was intended to prevent cert-manager from overwhelming the Kubernetes API server, but modern versions of Kubernetes implement API Priority and Fairness, which obviates the need for client-side throttling. You can increase the threshold of the client-side rate limiter using the following Helm values:
# helm-values.yaml
config:
  apiVersion: controller.config.cert-manager.io/v1alpha1
  kind: ControllerConfiguration
  kubernetesAPIQPS: 10000
  kubernetesAPIBurst: 10000
Learn more
This does not technically disable the client-side rate-limiting but configures the QPS and Burst values high enough that they are never reached.
Read cert-manager#6890: Allow client-side rate-limiting to be disabled, a proposal for a cert-manager configuration option to disable client-side rate-limiting.
Read kubernetes#111880: Disable client-side rate-limiting when AP&F is enabled, a proposal that the kubernetes.io/client-go module should automatically use server-side rate-limiting when it is enabled.
Read about other projects that disable client-side rate limiting: Flux.
Read the API documentation for ControllerConfiguration for a description of the kubernetesAPIQPS and kubernetesAPIBurst configuration options.
Restrict the use of large RSA keys¶
Certificates with large RSA keys cause cert-manager to use more CPU resources. When there are insufficient CPU resources, the reconcile queue length grows. This delays the reconciliation of all Certificates. A user who has permission to create a large number of RSA 4096 certificates might accidentally or maliciously cause a denial of service for other users on the cluster.
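As a hedged sketch of the Approver Policy approach (the policy name, selector, and allowed attributes here are illustrative, so verify the field names against the Approver Policy reference before use), a CertificateRequestPolicy along these lines would only approve requests whose RSA keys are at most 2048 bits; requests that no policy approves are denied:

apiVersion: policy.cert-manager.io/v1alpha1
kind: CertificateRequestPolicy
metadata:
  name: restrict-large-rsa-keys
spec:
  allowed:
    # Approve any requested names; tighten these for your environment.
    commonName:
      value: "*"
    dnsNames:
      values: ["*"]
    # Only approve requests whose private key is RSA and at most 2048 bits.
    privateKey:
      algorithm: RSA
      maxSize: 2048
  selector:
    # Match CertificateRequests that reference any issuer.
    issuerRef: {}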
Learn more
Learn how to enforce an Approval Policy to prevent the use of large RSA keys.
Learn how to set Certificate defaults automatically, using tools like Kyverno.
Set revisionHistoryLimit: 1 on all Certificate resources¶
By default, cert-manager will keep all the CertificateRequest resources that it creates (revisionHistoryLimit):
The maximum number of CertificateRequest revisions that are maintained in the Certificate's history. Each revision represents a single CertificateRequest created by this Certificate, either when it was created, renewed, or Spec was changed. Revisions will be removed by oldest first if the number of revisions exceeds this number. If set, revisionHistoryLimit must be a value of 1 or greater. If unset (nil), revisions will not be garbage collected. The default value is nil.
On a busy cluster these will eventually overwhelm your Kubernetes API server because of the memory and CPU required to cache them all and the storage required to save them.
Use a tool like Kyverno to override the Certificate.spec.revisionHistoryLimit for all namespaces.
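For reference, a minimal sketch of the field being set on a Certificate directly (the name, namespace, DNS name, and issuer below are hypothetical placeholders); this is the value that a cluster-wide Kyverno policy would enforce:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-tls
  namespace: team-a
spec:
  # Keep only the most recent CertificateRequest revision.
  revisionHistoryLimit: 1
  secretName: app-tls
  dnsNames:
  - app.example.com
  issuerRef:
    name: my-issuer
    kind: ClusterIssuer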
Learn more
Adapt the Kyverno policies in the tutorial: how to set Certificate defaults automatically, to override rather than default the revisionHistoryLimit field.
Learn how to set revisionHistoryLimit when using Annotated Ingress resources.
Read cert-manager#3958: Sane defaults for Certificate revision history limit, a proposal to change the default revisionHistoryLimit, which will obviate this particular recommendation.
Enable Server-Side Apply¶
By default, cert-manager uses Update requests to create and modify resources like CertificateRequest and Secret, but on a busy cluster there will be frequent conflicts as the control loops in cert-manager each try to update the status of various resources.
You will see errors, like this one, in the logs:
I0419 14:11:51.325377 1 controller.go:162] "re-queuing item due to optimistic locking on resource" logger="cert-manager.certificates-trigger" key="team-864-p6ts6/app-7" error="Operation cannot be fulfilled on certificates.cert-manager.io \"app-7\": the object has been modified; please apply your changes to the latest version and try again"
This error is relatively harmless because the update attempt is retried, but it slows down the reconciliation because each error triggers an exponential back off mechanism, which causes increasing delays between retries.
The solution is to turn on the Server-Side Apply feature, which causes cert-manager to use HTTP PATCH with Server-Side Apply whenever it needs to modify an API resource. This avoids all conflicts because each cert-manager controller sets only the fields that it owns.
You can enable the server-side apply feature gate with the following Helm chart values:
# helm-values.yaml
config:
  apiVersion: controller.config.cert-manager.io/v1alpha1
  kind: ControllerConfiguration
  featureGates:
    ServerSideApply: true
Learn more
Read Using Server-Side Apply in a controller, to learn about the advantages of server-side apply for software like cert-manager.
cert-manager cainjector¶
You can reduce the memory consumption of cainjector by configuring it to only watch resources in the venafi namespace, and by configuring it to not watch Certificate resources. Here's how to configure the cainjector using Helm chart values:
cainjector:
  extraArgs:
  - --namespace=venafi
  - --enable-certificates-data-source=false
Warning
This optimization is only appropriate if cainjector is being used exclusively for the cert-manager webhook. It is not appropriate if cainjector is also being used to manage the TLS certificates for webhooks of other software. For example, some Kubebuilder derived projects may depend on cainjector to inject TLS certificates for their webhooks.
Next Steps¶
- Helm-based installation methods explains how to install and configure the Helm charts of the Venafi Kubernetes components.
- Configuring Venafi Kubernetes Manifest tool explains how to configure Venafi Kubernetes components using the Venafi Kubernetes Manifest utility.
- Approver Policy supported Helmfile values reference
- Approver Policy Enterprise supported Helmfile values reference
- cert-manager supported Helmfile values reference
- CSI Driver supported Helmfile values reference
- Trust Manager supported Helmfile values reference
- Venafi Enhanced Issuer supported Helmfile values reference