Most GKE clusters start public because it's the path of least resistance. For anything carrying real workloads, that's the wrong default. A private cluster removes public IPs from your nodes and, optionally, restricts control-plane access to known networks, which sharply reduces your attack surface. Here's the baseline we run in production.
1. Remove public node IPs
In a private cluster, nodes get only internal RFC 1918 addresses. They can't be reached from the internet and can't reach the internet directly. Enable it at creation time; converting an existing public cluster is disruptive.
gcloud container clusters create prod-gke \
  --enable-private-nodes \
  --enable-private-endpoint \
  --master-ipv4-cidr 172.16.0.0/28 \
  --enable-ip-alias \
  --network shared-vpc \
  --subnetwork gke-subnet \
  --workload-pool=PROJECT_ID.svc.id.goog
--enable-private-endpoint also makes the control-plane endpoint private. If your CI/CD runners live outside the VPC, you'll reach the API through a bastion, a VPN, or Connect Gateway instead of a public endpoint.
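From a host inside the VPC (the bastion, for example), confirming that the cluster came up private and fetching credentials that point at the internal endpoint might look like the sketch below. The region `us-central1` is an assumption borrowed from the Cloud NAT step, and the `run` helper is a hypothetical convenience: it prints each command unless `APPLY=1` is set, so nothing runs by accident.

```shell
#!/bin/sh
set -eu

CLUSTER="prod-gke"
REGION="us-central1"   # assumption: the create command above does not name a location

# Dry-run by default: print the command; set APPLY=1 to execute it.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Show the cluster's private settings (private nodes, private endpoint, endpoint IPs).
show_private_config() {
  run gcloud container clusters describe "$CLUSTER" --region "$REGION" \
    --format "yaml(privateClusterConfig)"
}

# Write a kubeconfig entry that targets the control plane's internal IP.
get_internal_credentials() {
  run gcloud container clusters get-credentials "$CLUSTER" --region "$REGION" \
    --internal-ip
}

show_private_config
get_internal_credentials
```

Run it once as-is to review the commands, then again with `APPLY=1` from a machine that can route to the control-plane range.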
2. Give nodes egress via Cloud NAT
Private nodes still need to pull images and reach Google APIs. Provision Cloud NAT on a Cloud Router so egress is controlled and logged, with no public IPs on the nodes themselves.
gcloud compute routers create gke-router \
  --network shared-vpc --region us-central1
gcloud compute routers nats create gke-nat \
  --router gke-router --region us-central1 \
  --nat-all-subnet-ip-ranges \
  --auto-allocate-nat-external-ips \
  --enable-logging
For Google APIs specifically, prefer Private Google Access so that traffic to *.googleapis.com never leaves Google's network.
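Private Google Access is a per-subnet setting, so it has to be switched on for the subnet the nodes use. A minimal sketch, assuming the `gke-subnet` subnet and `us-central1` region from the earlier commands; the `run` helper is hypothetical and only prints the command unless `APPLY=1` is set.

```shell
#!/bin/sh
set -eu

# Dry-run by default: print the command; set APPLY=1 to execute it.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Turn on Private Google Access for one subnet.
# $1 = subnet name, $2 = region
enable_private_google_access() {
  run gcloud compute networks subnets update "$1" --region "$2" \
    --enable-private-ip-google-access
}

enable_private_google_access gke-subnet us-central1
```

In a Shared VPC the subnet belongs to the host project, so the real command would likely also need `--project` set to that project.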
3. Lock down the control plane with authorized networks
Even with a private endpoint, enable control-plane authorized networks so only explicit CIDRs (your bastion, VPN range, or CI egress) can talk to the Kubernetes API.
gcloud container clusters update prod-gke \
  --enable-master-authorized-networks \
  --master-authorized-networks 10.8.0.0/24,10.9.0.0/24
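Because the endpoint is private, the authorized ranges should be internal ones. A small pre-flight check can catch a public range before it reaches the flag. The sketch below is an illustration, not part of gcloud: `is_rfc1918` is a hypothetical helper that looks only at the network address and does not validate the prefix length.

```shell
#!/bin/sh
set -eu

# Succeeds if the address part of a CIDR falls in 10/8, 172.16/12, or 192.168/16.
is_rfc1918() {
  ip=${1%/*}                       # strip the /prefix
  oldifs=$IFS; IFS=.; set -- $ip; IFS=$oldifs
  [ $# -eq 4 ] || return 1         # expect four octets
  case "$1" in
    10)  return 0 ;;
    172) [ "$2" -ge 16 ] && [ "$2" -le 31 ] ;;
    192) [ "$2" -eq 168 ] ;;
    *)   return 1 ;;
  esac
}

ranges="10.8.0.0/24,10.9.0.0/24"

# Reject the whole list if any entry is not a private range.
for cidr in $(printf '%s' "$ranges" | tr ',' ' '); do
  is_rfc1918 "$cidr" || { echo "not RFC 1918: $cidr" >&2; exit 1; }
done
echo "ok: $ranges"
```

A CI job could run this check first and pass `$ranges` to `--master-authorized-networks` only when it succeeds.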
4. Use Workload Identity, not node service accounts
Stop binding broad IAM to the node pool. Workload Identity federates Kubernetes service accounts to Google service accounts so each workload gets exactly the permissions it needs, and no static keys ever touch a pod.
kubectl annotate serviceaccount catalog-api \
  -n catalog \
  iam.gke.io/gcp-service-account=catalog-api@PROJECT_ID.iam.gserviceaccount.com
gcloud iam service-accounts add-iam-policy-binding \
  catalog-api@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[catalog/catalog-api]"
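The `--member` string is easy to mistype, and the same pair of commands repeats for every workload. One way to keep it consistent is to generate the commands from four inputs. This is only a sketch: `wi_member` and `wi_commands` are hypothetical helper names, and the script prints the commands for review rather than running them.

```shell
#!/bin/sh
set -eu

# Build the IAM member string for a Kubernetes service account.
# $1 = project ID, $2 = namespace, $3 = Kubernetes service account
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]' "$1" "$2" "$3"
}

# Print (not run) the annotate + IAM binding pair for one workload.
# $1 = project ID, $2 = namespace, $3 = Kubernetes SA, $4 = Google SA name
wi_commands() {
  gsa="${4}@${1}.iam.gserviceaccount.com"
  printf 'kubectl annotate serviceaccount %s -n %s iam.gke.io/gcp-service-account=%s\n' \
    "$3" "$2" "$gsa"
  printf 'gcloud iam service-accounts add-iam-policy-binding %s --role roles/iam.workloadIdentityUser --member "%s"\n' \
    "$gsa" "$(wi_member "$1" "$2" "$3")"
}

wi_commands PROJECT_ID catalog catalog-api catalog-api
```

With the placeholder arguments shown, the output reproduces the two commands above; swapping in other namespaces and account names gives the equivalent pair for each additional workload.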
5. Land it in a Shared VPC
In production we put GKE in a Shared VPC so networking is owned by a central host project while teams deploy into service projects. Combine it with a deny-all firewall baseline and open only what each workload requires.
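A deny-all ingress baseline could be expressed as a single low-priority firewall rule in the host project. The sketch below assumes a placeholder host project `HOST_PROJECT_ID` and a hypothetical rule name `baseline-deny-ingress`; the `run` helper prints the command unless `APPLY=1` is set.

```shell
#!/bin/sh
set -eu

HOST_PROJECT="HOST_PROJECT_ID"   # placeholder: the Shared VPC host project
NETWORK="shared-vpc"

# Dry-run by default: print the command; set APPLY=1 to execute it.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Explicit, logged deny for all ingress, evaluated just before the
# implied deny rule (priority 65535). Lower numbers win, so any allow
# rule with a smaller priority value still takes effect.
create_deny_all_ingress() {
  run gcloud compute firewall-rules create baseline-deny-ingress \
    --project "$HOST_PROJECT" --network "$NETWORK" \
    --direction INGRESS --action DENY --rules all \
    --source-ranges 0.0.0.0/0 --priority 65534 --enable-logging
}

create_deny_all_ingress
```

VPC networks already deny ingress implicitly; the point of an explicit rule here is that it can be logged. Per-workload allow rules then sit at lower priority numbers, alongside the rules GKE creates for its own node traffic.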
6. Prefer Gateway API for ingress
For L7 traffic, the GKE Gateway controller (Gateway API) is more expressive than legacy Ingress and pairs cleanly with Google-managed certificates and Cloud Armor. Terminate TLS at the load balancer, serve HTTPS only, and attach a Cloud Armor security policy for WAF and rate limiting.
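As a rough illustration of that setup, the script below prints a Gateway with a single HTTPS listener, an HTTPRoute, and a `GCPBackendPolicy` that attaches a Cloud Armor policy to the backend Service. All names (`catalog`, `catalog-api`, `catalog-cert`, `catalog-armor-policy`) and the Service port 8080 are assumptions for the example, and the certificate and security policy are assumed to exist already. Field names follow the GKE Gateway documentation as best recalled, so check them against your cluster's installed CRD versions.

```shell
#!/bin/sh
set -eu

# Print manifests for an HTTPS-only Gateway fronting one Service.
# $1 = namespace, $2 = Service name, $3 = SSL certificate name,
# $4 = Cloud Armor security policy name
gateway_manifest() {
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: $2-gateway
  namespace: $1
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: https            # no HTTP listener: HTTPS only
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate      # TLS ends at the load balancer
      options:
        networking.gke.io/pre-shared-certs: $3
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: $2
  namespace: $1
spec:
  parentRefs:
  - name: $2-gateway
  rules:
  - backendRefs:
    - name: $2
      port: 8080           # assumption: the Service port
---
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: $2-armor
  namespace: $1
spec:
  default:
    securityPolicy: $4
  targetRef:
    group: ""
    kind: Service
    name: $2
EOF
}

gateway_manifest catalog catalog-api catalog-cert catalog-armor-policy
```

Piping the output to `kubectl apply -f -` would create the resources; reviewing the printed YAML first is the safer habit.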
Production checklist
- Private nodes: no public IPs
- Private control-plane endpoint + authorized networks
- Cloud NAT for egress, Private Google Access for Google APIs
- Workload Identity for all workloads: zero static keys
- Shared VPC with deny-all firewall baseline
- Gateway API + managed certs + Cloud Armor for ingress
- VPC flow logs and audit logging enabled end to end
This is exactly the kind of posture ATechsCloud CloudOps checks for automatically, surfacing public exposure, over-broad IAM, and missing guardrails before they become incidents.