Venafi Kubernetes components deployment best practices¶
Learn how to configure the Venafi Kubernetes components for deployment in production environments.
Introduction¶
You will learn how to use Helm chart values to override settings such as the number of replicas in each Deployment and the resource requests for CPU and memory.
Discover methods to ensure the high availability of the Venafi Kubernetes components, whilst avoiding inefficient over-allocation of your cluster resources.
Isolate Venafi Kubernetes components on a dedicated Node pool¶
Venafi Kubernetes components should be treated as part of your platform's control plane. Some Venafi Kubernetes components, such as cert-manager, create and modify Kubernetes Secret resources and most of the components cache TLS Secret resources in memory.
If an untrusted or malicious workload runs on the same Node as the Venafi Kubernetes components, and gains root access to the underlying node, it may be able to read the Secrets that the controller has cached in memory. You can mitigate this risk by running Venafi Kubernetes components on nodes that are reserved for trusted platform operators.
The Helm charts for Venafi Kubernetes components have parameters to configure the Pod tolerations and Node affinity (or nodeSelector) for each component. The exact values of these parameters will depend on your particular cluster.
Learn more
Read Assigning Pods to Nodes in the Kubernetes documentation.
Read about Taints and Tolerations in the Kubernetes documentation.
Example¶
This example demonstrates how to use:
- taints to repel non-platform Pods from Nodes which you have reserved for your platform's control plane.
- tolerations to allow cert-manager Pods to run on those Nodes.
- nodeSelector to place the cert-manager Pods on those Nodes.
To isolate Venafi Kubernetes components on a dedicated Node pool:
1. Label the Nodes:

   kubectl label node ... node-restriction.kubernetes.io/reserved-for=platform

2. Taint the Nodes:

   kubectl taint node ... node-restriction.kubernetes.io/reserved-for=platform:NoExecute

3. Install cert-manager using the following Helm chart values:
   nodeSelector:
     kubernetes.io/os: linux
     node-restriction.kubernetes.io/reserved-for: platform
   tolerations:
   - key: node-restriction.kubernetes.io/reserved-for
     operator: Equal
     value: platform
   webhook:
     nodeSelector:
       kubernetes.io/os: linux
       node-restriction.kubernetes.io/reserved-for: platform
     tolerations:
     - key: node-restriction.kubernetes.io/reserved-for
       operator: Equal
       value: platform
   cainjector:
     nodeSelector:
       kubernetes.io/os: linux
       node-restriction.kubernetes.io/reserved-for: platform
     tolerations:
     - key: node-restriction.kubernetes.io/reserved-for
       operator: Equal
       value: platform
   startupapicheck:
     nodeSelector:
       kubernetes.io/os: linux
       node-restriction.kubernetes.io/reserved-for: platform
     tolerations:
     - key: node-restriction.kubernetes.io/reserved-for
       operator: Equal
       value: platform
Note
This example uses nodeSelector to place the Pods, but you could also use affinity.nodeAffinity. nodeSelector is chosen here because it has a simpler syntax.
Note
The default nodeSelector value kubernetes.io/os: linux avoids placing cert-manager Pods on Windows nodes in a mixed OS cluster, so it must be explicitly included here too.
Tip
On a multi-tenant cluster, consider enabling the PodTolerationRestriction plugin to limit which tolerations tenants may add to their Pods. You may also use that plugin to add default tolerations to the cert-manager namespace, which removes the need to explicitly set the tolerations in the Helm chart.
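As a minimal sketch, assuming the PodTolerationRestriction admission plugin is enabled on your API server (via --enable-admission-plugins), the plugin reads default tolerations from the scheduler.alpha.kubernetes.io/defaultTolerations namespace annotation. The example below uses the venafi namespace used elsewhere in this guide; adjust the namespace and toleration to match your setup:

apiVersion: v1
kind: Namespace
metadata:
  name: venafi
  annotations:
    # Default toleration added by PodTolerationRestriction to every Pod in this namespace.
    scheduler.alpha.kubernetes.io/defaultTolerations: '[{"key": "node-restriction.kubernetes.io/reserved-for", "operator": "Equal", "value": "platform"}]'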
Tip
As an alternative, you could use Kyverno to limit which tolerations Pods are allowed to use.
Read Restrict control plane scheduling as a starting point.
Learn more
Read the Guide to isolating tenant workloads to specific nodes in the EKS Best Practice Guides, for an in-depth explanation of these techniques.
Learn how to Isolate your workloads in dedicated node pools on Google Kubernetes Engine.
Learn about Placing pods on specific nodes using node selectors, with RedHat OpenShift.
Read more about the node-restriction.kubernetes.io/ prefix and the NodeRestriction admission plugin.
High Availability¶
This section is about increasing the resilience of the Venafi Kubernetes components to voluntary and involuntary disruptions.
For example, as a platform administrator, you need to periodically drain and decommission a Kubernetes node without causing any downtime to the Venafi Kubernetes services or the applications relying on the Venafi Kubernetes services.
As a platform administrator, you need to be confident that the Venafi Kubernetes components can be deployed in such a way that they are resilient to a Kubernetes node failure.
Replicas¶
Use two or more replicas to achieve high availability.
Each of the Venafi Kubernetes components has one or more Kubernetes Deployment resources. By default each Deployment has one replica. The advantage of using one replica by default is that you can easily evaluate the components on a single node cluster; on a kind or a minikube cluster, for example. The disadvantage is that one replica does not provide high availability. The Helm charts for Venafi Kubernetes components have parameters to configure the replicas for each controller, and in production you should use two or more replicas to achieve high availability.
Most of the Venafi Kubernetes components use leader election to ensure that only one replica is active. This prevents conflicts that would arise if multiple controllers were reconciling the same API resources. So in general multiple replicas can be used for high availability but not load balancing.
You don't need to use more than two replicas of a controller Pod. Using two replicas for each Deployment ensures that there is a standby Pod scheduled to a Node and ready to take leadership, if the current leader encounters a disruption. For example, a voluntary disruption, such as the draining of the Node on which the leader Pod is running, or an involuntary disruption, such as the leader Pod encountering an unexpected deadlock. Further replicas may add a degree of resilience if you have the luxury of sufficient Nodes with sufficient CPU and memory to accommodate standby replicas.
Special Cases
There are some exceptions to the general rules above.
cert-manager webhook
By default, the cert-manager webhook Deployment has one replica, but you should use three or more in production. If the cert-manager webhook is unavailable, all API operations on cert-manager custom resources will fail, disrupting any software that creates, updates, or deletes cert-manager custom resources. So, it is especially important to keep at least one replica of the cert-manager webhook running at all times.
The cert-manager webhook does not use leader election, so you can scale it horizontally by increasing the number of replicas. When the Kubernetes API server connects to the cert-manager webhook it does so via a Service that load balances the connections between all the Ready replicas. For this reason, there is a clear benefit to increasing the number of webhook replicas to three or more, on clusters where there is a high frequency of cert-manager custom resource interactions. Furthermore, the webhook has modest memory requirements because it does not use a cache. For this reason, the resource cost of scaling out the webhook is relatively low.
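As a sketch of the replica guidance above, assuming you install cert-manager with its Helm chart (which exposes a replicaCount parameter for each Deployment), the following values run two controller and cainjector replicas and three webhook replicas:

replicaCount: 2

webhook:
  replicaCount: 3

cainjector:
  replicaCount: 2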
Approver Policy and Approver Policy Enterprise
By default, the Approver Policy and Approver Policy Enterprise Deployment resources have one replica, but in production you should use two or more. These components have both a controller manager and a webhook, and unlike cert-manager, these are both provided by the same process in each Pod. This changes the strategy for high availability and scaling of these components. You should use at least two replicas for high availability. The Approver Policy webhook is only used for validating CertificateRequestPolicy resources, so in the unlikely event that there is a high frequency of CertificateRequestPolicy API interactions, you might consider scaling out the Deployment to three or more replicas. Be aware that if you do, any of the replicas might also be elected as the controller manager and will therefore need to reserve sufficient memory to cache all CertificateRequests and CertificateRequestPolicy resources.
CSI driver and CSI driver for SPIFFE
These CSI components use DaemonSet resources because exactly one instance of the driver is meant to run on all or some of the nodes in your cluster. Therefore, the Helm charts for these components do not allow you to configure the replica count.
Learn more¶
For examples of how webhook disruptions might disrupt your cluster, see Ensure control plane stability when using webhooks in the Google Kubernetes Engine (GKE) documentation.
To learn more about potential issues caused by webhooks and how you can avoid them, see The dark side of Kubernetes admission webhooks on the Cisco Tech Blog.
PodDisruptionBudget¶
For high availability you should also deploy a PodDisruptionBudget resource, with minAvailable=1 or with maxUnavailable=1. This ensures that a voluntary disruption, such as the draining of a Node, can not proceed until at least one other replica has been successfully scheduled and started on another Node. The Helm charts for Venafi Kubernetes components have parameters to enable and configure a PodDisruptionBudget.
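A minimal sketch, assuming the cert-manager Helm chart, which exposes a podDisruptionBudget block for the controller, webhook, and cainjector Deployments; the other Venafi component charts expose similar parameters, so check each chart's values reference:

podDisruptionBudget:
  enabled: true
  minAvailable: 1

webhook:
  podDisruptionBudget:
    enabled: true
    minAvailable: 1

cainjector:
  podDisruptionBudget:
    enabled: true
    minAvailable: 1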
Warning
These PodDisruptionBudget settings are only suitable for high availability deployments. You must increase the replicas value of each Deployment to more than the minAvailable value, otherwise the PodDisruptionBudget will prevent you from draining cert-manager Pods.
Learn more
See Specifying a Disruption Budget for your Application in the Kubernetes documentation.
Topology Spread Constraints¶
Consider using Topology Spread Constraints, to ensure that a disruption of a node or data center does not degrade the operation of cert-manager.
For high availability, you don't want the replica Pods to be scheduled on the same Node, because if that node fails, both the active and standby Pods will exit. There will be no further reconciliation of the resources by that controller until there is another Node with enough free resources to run a new Pod, and until that Pod has become Ready.
It is also desirable for the replica Pods to run in separate data centers (availability zones), if the cluster has nodes distributed between zones. Then, in the event of a failure at the data center hosting the active Pod, the standby Pod will immediately be available to take leadership.
Fortunately, you may not need to do anything to achieve these goals because Kubernetes > 1.24 has built-in default constraints, which should mean that the high availability scheduling described above happens implicitly. But in case your cluster does not have those defaults, each of the Venafi Kubernetes component Helm charts has parameters to explicitly add Topology Spread Constraints to the Pods.
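As a sketch, assuming the cert-manager Helm chart's topologySpreadConstraints parameter (the label selector shown assumes a Helm release named cert-manager, so adjust it to match your installation), an explicit constraint spreading the controller replicas across zones might look like this:

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
      app.kubernetes.io/instance: cert-manager
      app.kubernetes.io/component: controller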
Liveness Probes¶
Some Venafi Kubernetes components have liveness probes. Those components that have liveness probes have sensible defaults for the following options:
initialDelaySeconds
periodSeconds
timeoutSeconds
successThreshold
failureThreshold
If necessary, you can override the defaults by supplying Helm chart values.
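For example, the cert-manager webhook has a liveness probe whose timings can be overridden through the chart's webhook.livenessProbe block; a hedged sketch follows, where the numbers shown are illustrative rather than recommendations:

webhook:
  livenessProbe:
    initialDelaySeconds: 60
    periodSeconds: 10
    timeoutSeconds: 5
    successThreshold: 1
    failureThreshold: 3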
Venafi may add liveness probes to the other components in the future, but only where there is a clear benefit. This cautious approach to liveness probes is guided by the following paragraph in the Kubernetes documentation Configure Liveness, Readiness and Startup Probes:
Liveness probes can be a powerful way to recover from application failures, but they should be used with caution. Liveness probes must be configured carefully to ensure that they truly indicate unrecoverable application failure, for example a deadlock
Background information
The Venafi Kubernetes components are designed to crash when there's a fatal error. Kubernetes will restart a crashed container and if it crashes repeatedly, there will be increasing time delays between successive restarts.
Each of the components starts one or more long running threads. For example, in Approver Policy, one thread is responsible for leader election and another is responsible for reconciling custom resources, another is responsible for serving metrics and another is responsible for handling webhook requests. If any of these threads exits with an error, it will trigger the orderly shutdown of the remaining threads and the process will exit.
For this reason, the liveness probe should only be needed if there is a bug in this orderly shutdown process, or if there is a bug in one of the other threads which causes the process to deadlock and not shutdown.
Learn more
Read about Using Liveness Probes in cert-manager.
Read Configure Liveness, Readiness and Startup Probes in the Kubernetes documentation, paying particular attention to the notes and cautions in that document.
Read Shooting Yourself in the Foot with Liveness Probes for more cautionary information about liveness probes.
Priority Class¶
The reason for setting a priority class is summarized as follows in the Kubernetes blog Protect Your Mission-Critical Pods From Eviction With PriorityClass:
Pod priority and preemption help to make sure that mission-critical pods are up in the event of a resource crunch by deciding order of scheduling and eviction.
If the Venafi Kubernetes components are mission-critical to your platform, then set a priorityClassName on the Pods to protect them from preemption, in situations where a Kubernetes node becomes starved of resources. Without a priorityClassName, the Pods of the Venafi Kubernetes components may be evicted to free up resources for other workloads, and this may cause disruption to any applications that rely on them.
Most Kubernetes clusters will come with two built-in priority class names: system-cluster-critical and system-node-critical, which are used for Kubernetes core components. These can also be used for critical add-ons, such as the Venafi Kubernetes components.
Here are the Helm chart values you can use for cert-manager, for example:
global:
  priorityClassName: system-cluster-critical
The Venafi Kubernetes components allow you to set the priorityClassName using Helm chart values, and the Venafi Kubernetes Manifest utility makes it easy to set the values for multiple components.
On some clusters the ResourceQuota admission controller may be configured to limit the use of certain priority classes to certain namespaces. For example, Google Kubernetes Engine (GKE) will only allow priorityClassName: system-cluster-critical for Pods in the kube-system namespace, by default.
Learn more
Read Kubernetes PR #93121 to see how and why this was implemented.
In such cases you will need to create a ResourceQuota in the venafi namespace:
# venafi-resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: venafi-critical-pods
  namespace: venafi
spec:
  hard:
    pods: 1G
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values:
      - system-node-critical
      - system-cluster-critical
kubectl apply -f venafi-resourcequota.yaml
If you are using CSI driver or CSI driver for SPIFFE, it is essential to set priorityClassName: system-node-critical. These components are deployed using DaemonSet resources and must run on the same nodes as the general applications on your platform (as opposed to components like cert-manager which should run on isolated nodes). By assigning priorityClassName: system-node-critical to these CSI driver Pods, you can ensure that they have a higher scheduling priority compared to other general application Pods.
Learn more
Read Protect Your Mission-Critical Pods From Eviction With PriorityClass, a Kubernetes blog post about how Pod priority and preemption help to make sure that mission-critical pods are up in the event of a resource crunch by deciding order of scheduling and eviction.
Read Guaranteed Scheduling For Critical Add-On Pods to learn why system-cluster-critical should be used for add-ons that are critical to a fully functional cluster.
Read Limit Priority Class consumption by default, to learn why platform administrators might restrict usage of certain high priority classes to a limited number of namespaces.
Setting default values for HA deployments using the Venafi CLI tool¶
To generate HA values for any of the supported components (see the example that follows), and include them in a Venafi manifest, use the --ha-values-dir flag for venctl components kubernetes manifest generate:
venctl components kubernetes manifest generate \
--namespace venafi \
--cert-manager \
--approver-policy-enterprise \
--csi-driver \
--trust-manager \
--venafi-connection \
--venafi-enhanced-issuer \
--venafi-kubernetes-agent \
--ha-values-dir venafi-values > venafi-components.yaml
This command writes the generated HA values files to the venafi-values directory, creating that directory if needed. These generated values files will also be included in your generated manifest (in this example, venafi-components.yaml). Using the --ha-values-dir flag will overwrite any previously generated values. If you want to change any of the Venafi HA values, you can pass your custom values as with previous versions of the Venafi CLI tool; your customizations will take precedence.
For example, if you wish to increase the memory request for the cert-manager controller:
cat cert-manager-controller-memory.values.yaml
resources:
  requests:
    memory: 1Gi
venctl components kubernetes manifest generate \
--namespace venafi \
--cert-manager \
--cert-manager-values-files cert-manager-controller-memory.values.yaml \
--approver-policy-enterprise \
--csi-driver \
--trust-manager \
--venafi-connection \
--venafi-enhanced-issuer \
--venafi-kubernetes-agent \
--ha-values-dir venafi-values > venafi-components.yaml
Scaling¶
This section explains how to accurately allocate CPU and memory resources to match the actual usage needs of the Venafi Kubernetes applications running in your Kubernetes environment. This is sometimes referred to as right sizing.
This is important for the following reasons:
- Efficiency: Matching the resource requests with the actual usage helps prevent the wastage of resources.
- Performance: Using appropriate resource requests will improve the performance of the components.
- Reliability: Right sizing improves reliability by avoiding crashes caused by memory exhaustion, for example.
However, right sizing is a continuous process that requires ongoing monitoring, adjustment, and optimization.
Learn more
See Practical tips for right sizing your Kubernetes workloads, on the Datadog blog.
See Right sizing workloads in Kubernetes for cost optimization, on the Google Cloud blog.
A note about horizontal scaling
Components that use leader election¶
Most Venafi Kubernetes components use leader election to ensure that only one replica is active. This prevents conflicts which would arise if multiple replicas were reconciling the same API resources. You cannot use horizontal scaling for these components because only one replica is active. Use vertical scaling instead.
Components that use DaemonSet¶
CSI driver and CSI driver for SPIFFE
These CSI components use DaemonSet resources because exactly one instance of the driver is meant to run on all or some of the nodes in your cluster. Therefore, the Helm charts for these components do not allow you to configure the replica count. Use vertical scaling instead.
cert-manager webhook
The cert-manager webhook does not use leader election, so you can scale it horizontally by increasing the number of replicas. When the Kubernetes API server connects to the cert-manager webhook, it does so via a Service that load balances the connections between all the Ready replicas. For this reason, there is a clear benefit to increasing the number of webhook replicas to three or more, on clusters with a high frequency of cert-manager custom resource interactions. Furthermore, the webhook has modest memory requirements because it does not use a cache. For this reason, the resource cost of scaling out the webhook is relatively low.
Approver Policy and Approver Policy Enterprise
By default, Approver Policy and Approver Policy Enterprise Deployment resources have one replica, but in production you should use two or more. These components have both a controller manager and a webhook, and unlike cert-manager, these are both provided by the same process in each Pod. This changes the strategy for high availability and scaling of these components. You should use at least two replicas for high availability. The Approver Policy webhook is only used for validating CertificateRequestPolicy resources. In the unlikely event of a high frequency of CertificateRequestPolicy API interactions, you might consider scaling out the Deployment to three or more replicas. Be aware that if you do, any of the replicas might also be elected as the controller manager and will, therefore, need to reserve sufficient memory to cache all CertificateRequests and CertificateRequestPolicy resources.
Do not use HorizontalPodAutoscaling¶
HorizontalPodAutoscaling is not useful for any of the Venafi Kubernetes components. They either do not support horizontal scaling (see above) or they are webhooks, which are unlikely to encounter sufficient load to trigger the horizontal pod autoscaler.
The Helm charts for the Venafi Kubernetes components do not all include resource requests and limits, so you should supply resource requests and limits that are appropriate for your cluster.
What are appropriate resource requests and limits? This depends on factors such as the size of your cluster and the number and nature of the workloads that run on the cluster.
Memory¶
Use vertical scaling to assign sufficient memory resources to Venafi Kubernetes components. The memory requirements will be higher on clusters with very many API resources or with large API resources. This is because each of the components reconciles one or more Kubernetes API resources, and each component will cache the metadata and sometimes the entire resource in memory, to reduce the load on your Kubernetes API server.
For example, if the cluster contains very many CertificateRequest resources, you will need to increase the memory limit of the Venafi Enhanced Issuer Pod. The default memory limit of Venafi Enhanced Issuer is 128MiB, which is sufficient to cache ~3000 CertificateRequest resources.
You can use tools like Kubernetes Resource Recommender, to measure the typical memory usage of the Venafi Kubernetes components in your cluster and to help you choose appropriate memory limits.
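As a sketch, assuming the cert-manager Helm chart's resources parameter for the controller Deployment (the 512Mi figure is illustrative, so use your own measurements), setting the memory limit equal to the request might look like this:

resources:
  requests:
    memory: 512Mi
  limits:
    memory: 512Mi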
Learn more
See Assign Memory Resources to Containers and Pods, in the Kubernetes documentation.
See What Everyone Should Know About Kubernetes Memory Limits, to learn why the best practice is to set memory limit=request.
Benchmarks¶
The following charts and tables show the maximum memory usage of the Venafi Kubernetes components with 1000, 5000, and 10000 Certificate resources. The private key size affects the size of the Secret resources, and larger Secret resources cause higher memory use in some components, due to in-memory caching. The measurements were collected using a Venafi benchmarking tool called tlspk-bench. The components were installed in a Kind cluster using the Venafi CLI tool, with default Helm chart values. The unit of measurement is mebibyte (MiB). You can use these measurements to choose initial memory limits for each component.
RSA 2048¶
| Component | 1000 Certificates (MiB) | 5000 Certificates (MiB) | 10000 Certificates (MiB) |
|---|---|---|---|
| approver-policy-enterprise (v0.17.0) | 57 | 229 | 419 |
| cert-manager-cainjector (v1.14.5) | 39 | 102 | 172 |
| cert-manager-controller (v1.14.5) | 95 | 314 | 528 |
| cert-manager-webhook (v1.14.5) | 20 | 25 | 23 |
| trust-manager (v0.10.0) | 18 | 18 | 19 |
| venafi-enhanced-issuer (v0.14.0) | 41 | 98 | 162 |
| venafi-kubernetes-agent (v0.1.48) | 115 | 457 | 715 |
ECDSA 256¶
| Component | 1000 Certificates (MiB) | 5000 Certificates (MiB) | 10000 Certificates (MiB) |
|---|---|---|---|
| approver-policy-enterprise (v0.17.0) | 51 | 208 | 402 |
| cert-manager-cainjector (v1.14.5) | 37 | 84 | 145 |
| cert-manager-controller (v1.14.5) | 89 | 246 | 458 |
| cert-manager-webhook (v1.14.5) | 20 | 25 | 24 |
| trust-manager (v0.10.0) | 18 | 18 | 22 |
| venafi-enhanced-issuer (v0.14.0) | 41 | 93 | 157 |
| venafi-kubernetes-agent (v0.1.48) | 148 | 388 | 770 |
CPU¶
Use vertical scaling to assign sufficient CPU resources to the Venafi Kubernetes components. The CPU requirements will be higher on clusters where there are very frequent updates to the resources which are reconciled by the components. Whenever a resource changes, it will be queued to be re-reconciled by the component. Higher CPU resources allow the component to process the queue faster.
You can use tools like Kubernetes Resource Recommender, to measure the typical CPU usage of the Venafi Kubernetes components in your cluster and to help you choose appropriate CPU requests.
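For example, again assuming the cert-manager Helm chart's resources parameter (the 500m figure is illustrative), you could set a CPU request without a CPU limit; in practice you would combine this with the memory settings shown earlier in a single resources block:

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    memory: 512Mi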
Learn more
See Assign CPU Resources to Containers and Pods, in the Kubernetes documentation.
Read Stop Using CPU Limits on Kubernetes, to learn why the best practice is to set CPU requests, but not limits.
Component Notes¶
cert-manager-controller¶
Cache Size¶
When Certificate resources are the dominant use-case, as for example when workloads need to mount the TLS Secret or when gateway-shim is used, the memory consumption of the cert-manager controller will be roughly proportional to the total size of those Secret resources that contain the TLS key pairs. This is because the cert-manager controller caches the entire content of these Secret resources in memory. If large TLS keys are used (e.g. RSA 4096) the memory use will be higher than if smaller TLS keys are used (e.g. ECDSA).
The other Secrets in the cluster, such as those used for Helm chart configurations or for other workloads, will not significantly increase the memory consumption, because cert-manager will only cache the metadata of these Secrets.
When CertificateRequest resources are the dominant use-case, as for example with CSI driver or with Istio CSR, the memory consumption of the cert-manager controller will be much lower, because there will be fewer TLS Secrets and fewer resources to be cached.
Disable client-side rate limiting for Kubernetes API requests¶
By default, cert-manager throttles the rate of requests to the Kubernetes API server to 20 queries per second. Historically this was intended to prevent cert-manager from overwhelming the Kubernetes API server, but modern versions of Kubernetes implement API Priority and Fairness, which obviates the need for client-side throttling. You can increase the threshold of the client-side rate limiter using the following Helm values:
# helm-values.yaml
config:
  apiVersion: controller.config.cert-manager.io/v1alpha1
  kind: ControllerConfiguration
  kubernetesAPIQPS: 10000
  kubernetesAPIBurst: 10000
Learn more
This does not technically disable the client-side rate-limiting but configures the QPS and Burst values high enough that they are never reached.
Read cert-manager#6890: Allow client-side rate-limiting to be disabled, a proposal for a cert-manager configuration option to disable client-side rate-limiting.
Read kubernetes#111880: Disable client-side rate-limiting when AP&F is enabled, a proposal that the kubernetes.io/client-go module should automatically use server-side rate-limiting when it is enabled.
Read about other projects that disable client-side rate limiting: Flux.
Read the API documentation for ControllerConfiguration for a description of the kubernetesAPIQPS and kubernetesAPIBurst configuration options.
Restrict the use of large RSA keys¶
Certificates with large RSA keys cause cert-manager to use more CPU resources. When there are insufficient CPU resources, the reconcile queue length grows. This delays the reconciliation of all Certificates. A user who has permission to create a large number of RSA 4096 certificates might accidentally or maliciously cause a denial of service for other users on the cluster.
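As a hedged sketch of the Approver Policy approach (the policy name, selector, and allowed attributes here are illustrative, so verify the field names against the Approver Policy reference before use), a CertificateRequestPolicy along these lines would only approve requests whose RSA keys are at most 2048 bits; requests that no policy approves are denied:

apiVersion: policy.cert-manager.io/v1alpha1
kind: CertificateRequestPolicy
metadata:
  name: restrict-large-rsa-keys
spec:
  allowed:
    # Approve any requested names; tighten these for your environment.
    commonName:
      value: "*"
    dnsNames:
      values: ["*"]
    # Only approve requests whose private key is RSA and at most 2048 bits.
    privateKey:
      algorithm: RSA
      maxSize: 2048
  selector:
    # Match CertificateRequests that reference any issuer.
    issuerRef: {}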
Learn more
Learn how to enforce an Approval Policy to prevent the use of large RSA keys.
Learn how to set Certificate defaults automatically, using tools like Kyverno.
Set revisionHistoryLimit: 1 on all Certificate resources¶
By default, cert-manager will keep all the CertificateRequest resources that it creates (revisionHistoryLimit):
The maximum number of CertificateRequest revisions that are maintained in the Certificate's history. Each revision represents a single CertificateRequest created by this Certificate, either when it was created, renewed, or Spec was changed. Revisions will be removed by oldest first if the number of revisions exceeds this number. If set, revisionHistoryLimit must be a value of 1 or greater. If unset (nil), revisions will not be garbage collected. The default value is nil.
On a busy cluster these will eventually overwhelm your Kubernetes API server because of the memory and CPU required to cache them all and the storage required to save them.
Use a tool like Kyverno to override the Certificate.spec.revisionHistoryLimit for all namespaces.
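For reference, a minimal sketch of the field being set on a Certificate directly (the name, namespace, DNS name, and issuer below are hypothetical placeholders); this is the value that a cluster-wide Kyverno policy would enforce:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-tls
  namespace: team-a
spec:
  # Keep only the most recent CertificateRequest revision.
  revisionHistoryLimit: 1
  secretName: app-tls
  dnsNames:
  - app.example.com
  issuerRef:
    name: my-issuer
    kind: ClusterIssuer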
Learn more
Adapt the Kyverno policies in the tutorial: how to set Certificate defaults automatically, to override rather than default the revisionHistoryLimit field.
Learn how to set revisionHistoryLimit when using Annotated Ingress resources.
Read cert-manager#3958: Sane defaults for Certificate revision history limit, a proposal to change the default revisionHistoryLimit, which will obviate this particular recommendation.
Enable Server-Side Apply¶
By default, cert-manager uses Update requests to create and modify resources like CertificateRequest and Secret, but on a busy cluster there will be frequent conflicts as the control loops in cert-manager each try to update the status of various resources.
You will see errors, like this one, in the logs:
I0419 14:11:51.325377 1 controller.go:162] "re-queuing item due to optimistic locking on resource" logger="cert-manager.certificates-trigger" key="team-864-p6ts6/app-7" error="Operation cannot be fulfilled on certificates.cert-manager.io \"app-7\": the object has been modified; please apply your changes to the latest version and try again"
This error is relatively harmless because the update attempt is retried, but it slows down the reconciliation because each error triggers an exponential back off mechanism, which causes increasing delays between retries.
The solution is to turn on the Server-Side Apply feature, which causes cert-manager to use HTTP PATCH with Server-Side Apply whenever it needs to modify an API resource. This avoids all conflicts because each cert-manager controller sets only the fields that it owns.
You can enable the server-side apply feature gate with the following Helm chart values:
# helm-values.yaml
config:
  apiVersion: controller.config.cert-manager.io/v1alpha1
  kind: ControllerConfiguration
  featureGates:
    ServerSideApply: true
Learn more
Read Using Server-Side Apply in a controller, to learn about the advantages of server-side apply for software like cert-manager.
cert-manager cainjector¶
You can reduce the memory consumption of cainjector by configuring it to only watch resources in the venafi namespace, and by configuring it to not watch Certificate resources. Here's how to configure the cainjector using Helm chart values:
cainjector:
  extraArgs:
  - --namespace=venafi
  - --enable-certificates-data-source=false
Warning
This optimization is only appropriate if cainjector is being used exclusively for the cert-manager webhook. It is not appropriate if cainjector is also being used to manage the TLS certificates for webhooks of other software. For example, some Kubebuilder derived projects may depend on cainjector to inject TLS certificates for their webhooks.
Next Steps¶
- Helm-based installation methods explains how to install and configure the Helm charts of the Venafi Kubernetes components.
- Configuring Venafi Kubernetes Manifest tool explains how to configure Venafi Kubernetes components using the Venafi Kubernetes Manifest utility.
- Approver Policy supported Helmfile values reference
- Approver Policy Enterprise supported Helmfile values reference
- cert-manager supported Helmfile values reference
- CSI Driver supported Helmfile values reference
- Trust Manager supported Helmfile values reference
- Venafi Enhanced Issuer supported Helmfile values reference