May 22, 2026

Incident Response in Kubernetes (GKE)

The pager goes off at 3:00 AM, again.

Last time, we were in AWS territory, wrestling with Elastic Kubernetes Service (EKS) and CloudWatch. This time the alert is coming from a Google Kubernetes Engine (GKE) cluster running on Google Cloud. The attack is the same; the environment is not. Welcome to part 2 of our blog series about incident response in managed Kubernetes.

TL;DR

GKE ships three log sources to Google Cloud Logging by default, giving IR teams more out-of-the-box visibility than EKS. However, default does not mean complete. Important log sources, such as the Data Access audit logs, are not enabled by default and cannot be reconstructed after the fact. Enable them before you need them. If you are running GKE in Autopilot mode, remember that node-level forensics are not possible, because the nodes are managed by Google. Your investigation is therefore confined to the logs that were already being shipped to Cloud Logging when the incident started. Treat your logging configuration as part of your incident response plan, not an afterthought.

Understanding GKE

Like EKS, GKE follows a shared responsibility model split between a control plane managed by Google and a data plane where your workloads run. GKE takes this a step further by offering two distinct modes of operation, Standard and Autopilot. The operator chooses the mode when creating the cluster.

Standard mode is comparable to how responsibility is divided in EKS. The provider (Google, for GKE) manages the control plane, but the operator remains in charge of the node pools, including choosing the correct machine types and updating the OS on the nodes. In Autopilot mode, Google takes ownership of the nodes entirely. The cluster becomes a purely logical construct: you submit workloads, Google decides where they run and keeps the underlying infrastructure secure. This difference also requires a different approach when a security incident occurs. In Standard mode the nodes are under your control, so you can SSH directly to a node and perform forensics there. This is not the case in Autopilot mode.

Logging for GKE

Compared to EKS, GKE takes a more proactive approach. By default, GKE ships three log sources to Google Cloud Logging, whereas logging in EKS is opt-in. Table 1 lists the available log sources and whether each is enabled by default.

Table 1: logging sources for GKE

| Log Source | Purpose | Relevance to IR | Enabled by default |
|---|---|---|---|
| Admin Activity audit logs | These record operations that modify resources, such as changing a RoleBinding, updating a ConfigMap. | Track resource creation and modification by an attacker. | Yes |
| Data Access audit logs | Record operations that read resources or touch user-provided data, such as listing secrets. | Covers the discovery and credential access phases. Detect secret enumeration, unauthorised exec into pods, and unexpected API reads. | No |
| System Logs | Capture the underlying infrastructure your workloads run on; logs from Kubernetes system namespaces and node-level services like the container runtime. | Detects container runtime anomalies, unexpected process spawning, and node-level tampering that wouldn't appear in application or audit logs. | Yes |
| Application Logs | All logs generated by non-system containers running on user nodes. | Shows exploitation attempts and malicious payloads in access logs, such as those for Nginx. | Yes |
| Kubernetes API server Logs | All logs generated by kube-apiserver. | Supplements audit logs with lower-level API server activity. This includes request handling, connection management, errors, warnings, and internal state. | No |
| Scheduler Logs | All logs generated by kube-scheduler. | Detect attempts to place pods on specific nodes or bypass security constraints such as taints and tolerations to gain a foothold. | No |
| Controller Manager Logs | All logs generated by kube-controller-manager. | Track creation of malicious resources: coinmining containers, backdoor deployments. | No |
| Horizontal Pod Autoscaler | Record scaling decisions made for each HPA object in the cluster. | Detect unusual resource consumption patterns that may indicate cryptomining or a denial-of-service condition caused by a compromised workload. | No |
| Control plane network connections | Inbound network connection logs for GKE control plane instances. | Detect unexpected inbound connections to the control plane. | No; requires the GKE control plane authority feature. |
| Control plane SSH events | Logs for all SSH events such as public key acceptance and session closure. | Detect unauthorised SSH access to control plane nodes. | No; requires the GKE control plane authority feature. |

Having these logs shipped to a centralized location is a major benefit for IR teams, as it means increased visibility from day one. The logs are collected by a FluentBit-based agent installed on the node, and they are complemented by Google Cloud log sources that are not specific to GKE, such as Cloud Audit Logs.
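To turn the table into something you can check against a cluster's configuration, the default status of each source can be written down as data. The following is a minimal Python sketch: the dictionary is transcribed from Table 1, while the function name `visibility_gaps` and the idea of passing in the set of sources you have enabled yourself are illustrative assumptions, not part of any Google tooling.

```python
# Default status of each GKE log source, transcribed from Table 1.
DEFAULT_ENABLED = {
    "Admin Activity audit logs": True,
    "Data Access audit logs": False,
    "System Logs": True,
    "Application Logs": True,
    "Kubernetes API server Logs": False,
    "Scheduler Logs": False,
    "Controller Manager Logs": False,
    "Horizontal Pod Autoscaler": False,
    "Control plane network connections": False,
    "Control plane SSH events": False,
}


def visibility_gaps(extra_enabled=frozenset()):
    """Return the log sources that are neither on by default nor explicitly enabled.

    `extra_enabled` is a hypothetical input: the set of sources your team
    has switched on manually for this cluster.
    """
    unknown = set(extra_enabled) - set(DEFAULT_ENABLED)
    if unknown:
        raise ValueError(f"unknown log sources: {sorted(unknown)}")
    return [
        name
        for name, on_by_default in DEFAULT_ENABLED.items()
        if not on_by_default and name not in extra_enabled
    ]


if __name__ == "__main__":
    # Example: Data Access audit logs were enabled ahead of time.
    for name in visibility_gaps({"Data Access audit logs"}):
        print("not collected:", name)
```

Running this as part of a readiness review gives a quick list of what an investigator will not have at 3 AM.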

Investigating a possible threat

Don't panic, and don't tip your hand. It is important not to alter the environment directly by killing or restarting the affected pod(s). Doing so tells the attacker you know they are there and, more importantly, destroys forensic evidence. If they have persistence elsewhere in the cluster, they will simply re-establish their foothold and destroy as much evidence as possible, complicating the IR process. Furthermore, you should not log in directly to the compromised environment. Beyond the forensic contamination risk, you may inadvertently expose your own credentials to a compromised environment. The goal at this stage is to observe without being observed.

Start with the logs

With a node and a time range in hand, go to Cloud Logging first. Filter on the affected node and work outward from there. In the audit logs, look for unusual exec calls into pods, unexpected secret reads, or RBAC modifications made around the time of the alert. The application logs may reveal more about initial access and the first steps taken. Look for unexpected outbound connections or executed shell scripts. System logs can complement both by describing what the container runtime was doing on the node at the time of the event; unexpected process spawning or filesystem mounts can indicate a container escape attempt.
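As a starting point for "a node and a time range", the filter can be generated rather than typed by hand under pressure. The sketch below builds a Cloud Logging filter string in Python. It assumes the `k8s_node` monitored resource type and its `node_name` label, which only cover node-level logs; container logs sit under the `k8s_container` resource type and would need a separate filter. The function name, the window sizes, and the node name in the usage example are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def node_window_filter(node_name, alert_time, minutes_before=60, minutes_after=15):
    """Build a Cloud Logging filter for one node around an alert timestamp.

    Lines in a Cloud Logging filter are implicitly joined with AND.
    The default window sizes are arbitrary starting values.
    """
    if alert_time.tzinfo is None:
        raise ValueError("alert_time must be timezone-aware")
    start = (alert_time - timedelta(minutes=minutes_before)).astimezone(timezone.utc)
    end = (alert_time + timedelta(minutes=minutes_after)).astimezone(timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return "\n".join(
        [
            'resource.type="k8s_node"',
            f'resource.labels.node_name="{node_name}"',
            f'timestamp>="{start.strftime(fmt)}"',
            f'timestamp<="{end.strftime(fmt)}"',
        ]
    )


if __name__ == "__main__":
    # Hypothetical node name and alert time, for illustration only.
    alert = datetime(2026, 5, 21, 13, 28, 30, tzinfo=timezone.utc)
    print(node_window_filter("gke-example-node", alert))
```

Starting with a window that is wider before the alert than after it reflects that the alert usually fires some time after initial access; widen it further as the investigation works outward.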

Sample queries are covered in the Investigating in Cloud Logging section below.

Containment & Eradication

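Once you have confirmed that a container is compromised, the next step is to quarantine it: cut its network connections but keep the container alive for further investigation. In GKE, the approach mirrors what we covered in Part 1: apply a deny-all NetworkPolicy scoped to the pod. This cuts the attacker's connection to any C2 infrastructure but leaves the pod running. The policy below selects on a dedicated quarantine label rather than the application label, so it isolates only the affected pod and keeps matching after the application label is removed in the next step:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine
spec:
  # Select only pods explicitly marked for quarantine
  podSelector:
    matchLabels:
      quarantine: "true"
  # Listing both types without any allow rules denies all ingress and egress
  policyTypes:
    - Ingress
    - Egress

Mark the compromised pod so the policy applies to it:

kubectl label pod YOUR_POD_NAME quarantine=true

Then remove the pod's application label so the load balancer stops routing production traffic to it:

kubectl label pod YOUR_POD_NAME app-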

The pod is now isolated from both the internet and your production traffic, but still running, giving you the opportunity to continue your investigation without the attacker knowing you have acted. 

Depending on the compromise, it may be beneficial to take a snapshot of the disk the node uses. Do note that this option is not available for Autopilot deployments; since the node is managed by Google, direct access is not available.

Mapping to Kubernetes TTPs

Microsoft released the threat matrix for Kubernetes, providing a way to map attacker behaviour (TTPs) to goals (see figure 1).

Figure 1: Kubernetes matrix: https://www.microsoft.com/en-us/security/blog/2021/03/23/secure-containerized-environments-with-updated-threat-matrix-for-kubernetes/

We can use this matrix to see where behaviour would be detected, see table 2. Do note that this table is non-exhaustive, as certain tactics can be found in multiple log sources, depending on the technique used.

Enable Data Access audit logs before you need them.

Admin Activity logs tell you what was created or modified. Data Access logs tell you what was read: secrets listed, pods inspected, and so on. In a real investigation, the latter is often more valuable. Unlike Admin Activity logs, they are not enabled by default, and they cannot be reconstructed after the fact. If an incident is already underway and Data Access logging was not enabled beforehand, that chapter of the attacker's story is gone permanently. Note that this is a noisy log source; DATA_READ entries in particular are very high-volume. We therefore recommend setting up explicit Log Router exclusions or sinks that filter out noisy, high-frequency internal system queries while retaining human activity.
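The same idea of separating system noise from human activity can be prototyped offline on exported log entries before you commit to a Log Router exclusion. The Python sketch below is a heuristic under stated assumptions: it treats callers whose principal starts with `system:node:` or `system:kube-` as noise, and it deliberately keeps `system:serviceaccount:` identities, because a stolen service account token is exactly what an attacker would use. The prefix list and function names are hypothetical and should be tuned against your own cluster's logs.

```python
# Assumed noisy identities; tune against your own cluster's logs before use.
NOISY_PREFIXES = ("system:node:", "system:kube-")


def principal_of(entry):
    """Return the caller identity of an audit log entry, or '' if absent."""
    return (
        entry.get("protoPayload", {})
        .get("authenticationInfo", {})
        .get("principalEmail", "")
    )


def is_noise(entry):
    """True when the caller matches one of the assumed noisy prefixes."""
    return principal_of(entry).startswith(NOISY_PREFIXES)


def retain_for_investigation(entries):
    """Drop entries from noisy system callers and keep everything else.

    Entries without a principal are kept: when in doubt, retain.
    """
    return [entry for entry in entries if not is_noise(entry)]


if __name__ == "__main__":
    def _entry(principal):
        return {"protoPayload": {"authenticationInfo": {"principalEmail": principal}}}

    sample = [
        _entry("user@domain"),
        _entry("system:node:gke-example-node"),
        _entry("system:serviceaccount:default:default"),
        _entry("system:kube-scheduler"),
    ]
    for kept in retain_for_investigation(sample):
        print(principal_of(kept))
```

Once the prefix list matches what you actually want to drop, the same conditions can be expressed as an exclusion filter on the sink.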

Investigating in Cloud Logging

Below is a sample investigation using Cloud Logging's query language for detecting common abuse scenarios. Unlike Part 1's CloudWatch Logs Insights syntax, Cloud Logging uses a filter-based query language where log entries are addressed by resource type and proto payload fields.

Scenario: Investigate kubectl exec abuse

As in EKS, kubectl exec grants an attacker a shell inside a running pod. In GKE, this call passes through the Kubernetes API server and lands in the Data Access audit logs as a specific method name:

resource.type="k8s_cluster"
protoPayload.methodName="io.k8s.core.v1.pods.exec.get"

Resulting in:

{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "authenticationInfo": {
      "principalEmail": "user@domain"
    },
    "authorizationInfo": [
      {
        "granted": true,
        "permission": "io.k8s.core.v1.pods.exec.get",
        "resource": "core/v1/namespaces/default/pods/nginx-demo-654d46676b-5t7hw/exec"
      }
    ],
    "methodName": "io.k8s.core.v1.pods.exec.get",
    "requestMetadata": {
      "callerIp": "203.0.113.1",
      "callerSuppliedUserAgent": "kubectl/v1.35.3 (linux/amd64) kubernetes/665c2a2"
    },
    "resourceName": "core/v1/namespaces/default/pods/nginx-demo-654d46676b-5t7hw/exec",
    "serviceName": "k8s.io",
    "status": {
      "code": 0
    }
  },
  "insertId": "7400a9e0-b625-4e72-bd65-ea1a1715df92",
  "resource": {
    "type": "k8s_cluster",
    "labels": {
      "location": "us-central1",
      "project_id": "gke-blog-496810",
      "cluster_name": "autopilot-cluster"
    }
  },
  "timestamp": "2026-05-21T13:28:30.596798Z",
  "labels": {
    "authorization.k8s.io/decision": "allow",
    "command.gke.io/command": "/bin/bash",
    "authorization.k8s.io/reason": "access granted by IAM permissions."
  },
  "logName": "projects/<project-name>/logs/cloudaudit.googleapis.com%2Factivity",
  "operation": {
    "id": "7400a9e0-b625-4e72-bd65-ea1a1715df92",
    "producer": "k8s.io",
    "first": true
  },
  "receiveTimestamp": "2026-05-21T13:28:38.060411149Z"
}

As with EKS, exec activity is not inherently malicious; it can be a legitimate administrator debugging. The query result shows that /bin/bash was executed, opening a shell for the user. If a command is passed directly, for example via the shell's `-c` parameter, it shows up in the log entry:

{
...
  "labels": {
    "command.gke.io/command": "/bin/bash -c whoami",
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "access granted by IAM permissions."
  },
...
}

If an interactive shell session is opened, the commands run inside it do not appear in the logs. For that level of visibility, you would need to deploy an additional runtime security agent such as Falco.
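When the exec query returns many entries, it helps to reduce each one to the handful of fields an investigator cares about. The Python sketch below reads only fields that appear in the sample entry above (`principalEmail`, `callerIp`, `resourceName`, `timestamp`, and the `command.gke.io/command` label). The function name, the list of shell binaries, and the "bare shell means interactive session" rule are illustrative assumptions, not a GKE feature.

```python
import json

# Shell binaries that, when run without arguments, suggest an interactive
# session. An illustrative list, not an exhaustive one.
SHELLS = {"sh", "bash", "ash", "dash", "zsh"}


def summarize_exec(entry):
    """Extract the who, where and what from one exec audit log entry."""
    payload = entry.get("protoPayload", {})
    command = entry.get("labels", {}).get("command.gke.io/command", "")
    parts = command.split()
    # A single token that is a shell binary means the commands typed
    # afterwards will not be visible in the audit log.
    interactive = len(parts) == 1 and parts[0].rsplit("/", 1)[-1] in SHELLS
    return {
        "time": entry.get("timestamp"),
        "principal": payload.get("authenticationInfo", {}).get("principalEmail"),
        "caller_ip": payload.get("requestMetadata", {}).get("callerIp"),
        "target": payload.get("resourceName"),
        "command": command,
        "interactive_shell": interactive,
    }


if __name__ == "__main__":
    # Trimmed copy of the sample entry shown earlier in this section.
    raw = """
    {
      "protoPayload": {
        "authenticationInfo": {"principalEmail": "user@domain"},
        "requestMetadata": {"callerIp": "203.0.113.1"},
        "resourceName": "core/v1/namespaces/default/pods/nginx-demo-654d46676b-5t7hw/exec"
      },
      "timestamp": "2026-05-21T13:28:30.596798Z",
      "labels": {"command.gke.io/command": "/bin/bash"}
    }
    """
    print(json.dumps(summarize_exec(json.loads(raw)), indent=2))
```

Entries flagged as interactive shells are the ones where the audit log runs out and runtime tooling or node forensics has to take over.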

Real-World Context

While there have been no clearly documented GKE-specific cases in the past few months, that does not mean running your workloads in GKE is completely safe. The engine itself may still be vulnerable. In December 2023, Unit 42 researchers at Palo Alto Networks disclosed a privilege escalation chain affecting GKE directly. The attack required an existing foothold in the cluster, such as a compromised pod, but from there, two misconfigurations could be chained to achieve full cluster-admin privileges. Although the case is dated by now, in 2026, it still provides insight into how privilege escalation can occur in GKE.

The first link in the chain was FluentBit, the default logging agent deployed as a DaemonSet on every GKE cluster since March 2023. A misconfiguration in FluentBit's volume mounts gave it access to service account tokens belonging to other pods on the same node. An attacker with code execution inside the FluentBit container could read those tokens and use them to authenticate to the Kubernetes API as a different, potentially more privileged identity.

The second link was Anthos Service Mesh (ASM), Google's optional service mesh add-on. Its CNI DaemonSet retained excessive ClusterRole permissions after installation, which an attacker could exploit to create new pods with elevated privileges, ultimately escalating to cluster-admin.

Neither vulnerability was dangerous in isolation. Together, they formed a clean two-step path from a compromised pod to full cluster control. Google patched both issues on December 14, 2023 via GCP-2023-047, though the ASM fix required manual action from cluster operators.

Conclusion

Incident response is more opinionated in GKE than it is in EKS because of the default, always-on logs. This sometimes makes an investigation more straightforward, as you always start with a baseline set of logs, which may not be the case with EKS. With GKE's Autopilot mode, the shared responsibility line is drawn even further toward Google, leaving less for the operator to misconfigure.

But that comfort comes with its own risks. The Data Access audit log gap means that the most investigatively valuable logs are the ones operators most commonly forget to enable. The Autopilot forensic cliff means that choosing convenience over control has real consequences when an incident occurs. And as the FluentBit case demonstrated, the components you rely on for visibility can themselves become attack vectors.

As with EKS, the most important IR work happens before the incident. Enable Data Access audit logs. Configure log sinks to a separate project. Audit your Workload Identity bindings for excessive permissions. Know whether you are on Standard or Autopilot and what that means for your forensic options at 3 AM.

Coming up next in Part 3: we move to Azure and look at AKS. We will see how Microsoft approaches the same shared responsibility model, how Azure Monitor and Defender for Containers compare to what we have covered here, and what changes when the underlying platform shifts again. 

About Invictus Incident Response

We are an incident response company and we ❤️ the cloud. We help our clients stay undefeated.

🆘 Incident Response support: reach out to cert@invictus-ir.com or go to https://www.invictus-ir.com/24-7

Be ready for the next cloud incident.

Invictus Schield