In larger organizations, networking shouldn't be owned by every team. Shared VPC lets you designate one project as the host project that owns the network (subnets, secondary IP ranges, firewall rules) and attach other service projects that deploy workloads into it. This walkthrough provisions private GKE clusters across separate service projects on a shared network, with Terraform.
Two constraints to know first
- Shared VPC clusters must be VPC-native (alias IPs); legacy networks aren't supported.
- GKE cannot convert an existing cluster to the Shared VPC model, so plan for it at creation time.
The layout
Three projects: one host project holding all network infrastructure, and two service projects that each deploy a private GKE cluster into the shared subnets and secondary ranges.
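As a rough Terraform sketch of this layout, the host project can be flagged as a Shared VPC host and a service project attached to it. The project IDs are the ones used in the gcloud commands in this walkthrough; the network name, subnet name, region, and every CIDR range are hypothetical placeholders, not values from the original repository.

```hcl
# Mark the host project as a Shared VPC host.
resource "google_compute_shared_vpc_host_project" "host" {
  project = "host-project-123456"
}

# Attach one service project; repeat (or use for_each) for the second one.
resource "google_compute_shared_vpc_service_project" "service" {
  host_project    = google_compute_shared_vpc_host_project.host.project
  service_project = "service-project-123456"
}

# Custom-mode network owned by the host project (name is hypothetical).
resource "google_compute_network" "shared" {
  project                 = "host-project-123456"
  name                    = "shared-vpc"
  auto_create_subnetworks = false
}

# One subnet with secondary ranges for pods and services.
# Region and all ranges are placeholders.
resource "google_compute_subnetwork" "gke" {
  project                  = "host-project-123456"
  name                     = "gke-subnet"
  region                   = "us-central1"
  network                  = google_compute_network.shared.id
  ip_cidr_range            = "10.10.0.0/20"
  private_ip_google_access = true

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.20.0.0/16"
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.30.0.0/20"
  }
}
```

Keeping the secondary ranges on the subnet in the host project is what lets each service project's cluster refer to them by name later.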
1. Enable the GKE API in every project
gcloud services enable container.googleapis.com --project host-project-123456
gcloud services enable container.googleapis.com --project service-project-123456
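If you would rather keep API enablement in Terraform than in ad hoc gcloud calls, a minimal equivalent might look like the following. It reuses the two project IDs from the commands above; the second service project would need to be added to the set.

```hcl
locals {
  # Project IDs from the gcloud commands above; extend with the second service project.
  gke_projects = toset([
    "host-project-123456",
    "service-project-123456",
  ])
}

resource "google_project_service" "container" {
  for_each = local.gke_projects

  project = each.value
  service = "container.googleapis.com"

  # Leave the API enabled even if this resource is later destroyed.
  disable_on_destroy = false
}
```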
2. Create a Terraform service account
gcloud iam service-accounts create terraform-sa \
--description="Terraform Service account" \
--display-name="Terraform Service Account" \
--project=platform-build-tf
Grant it the roles it needs (compute/network admin on the host, container admin on the service projects), create a JSON key, and point Terraform at it:
export GOOGLE_APPLICATION_CREDENTIALS="/mnt/e/terraform/terraform-sa.json"
echo "$GOOGLE_APPLICATION_CREDENTIALS"
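The role grants can also be written down as Terraform, applied once by an administrator who already holds IAM permissions on these projects (not by terraform-sa itself). This is only a sketch: the exact role IDs are an assumption based on the "compute/network admin" and "container admin" description above, and the service account email is derived from the name and project in the create command.

```hcl
locals {
  # Derived from the service account name and project in the gcloud command above.
  terraform_sa = "serviceAccount:terraform-sa@platform-build-tf.iam.gserviceaccount.com"
}

# Network administration on the host project (role choice is an assumption).
resource "google_project_iam_member" "tf_host_network_admin" {
  project = "host-project-123456"
  role    = "roles/compute.networkAdmin"
  member  = local.terraform_sa
}

# Cluster administration on a service project (role choice is an assumption).
resource "google_project_iam_member" "tf_service_container_admin" {
  project = "service-project-123456"
  role    = "roles/container.admin"
  member  = local.terraform_sa
}
```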
3. The IAM binding everyone forgets
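For GKE in a service project to use the host network, that service project's GKE service agent (service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com, where PROJECT_NUMBER is the service project's numeric project number) must hold roles/container.hostServiceAgentUser on the host project. Miss this and cluster creation fails with a permissions error. Repeat the binding for each service project: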
gcloud projects add-iam-policy-binding host-project-123456 \
--member=serviceAccount:service-426640237071@container-engine-robot.iam.gserviceaccount.com \
--role=roles/container.hostServiceAgentUser
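The same binding can be sketched in Terraform so that the project number is looked up instead of pasted by hand. The project IDs are the ones used earlier in this walkthrough; the resource names are arbitrary.

```hcl
# Look up the service project to obtain its numeric project number.
data "google_project" "service" {
  project_id = "service-project-123456"
}

# Grant the service project's GKE service agent the Host Service Agent User
# role on the host project.
resource "google_project_iam_member" "gke_host_service_agent_user" {
  project = "host-project-123456"
  role    = "roles/container.hostServiceAgentUser"
  member  = "serviceAccount:service-${data.google_project.service.number}@container-engine-robot.iam.gserviceaccount.com"
}
```

Using google_project_iam_member keeps the grant additive, so it does not overwrite other members that already hold the role on the host project.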
4. Drive it from Terraform with remote state
Keep state in a GCS backend and reference the host project's network outputs from each service project so the clusters land in the right shared subnet and secondary ranges:
data "terraform_remote_state" "project1_data" {
  backend = "gcs"
  config = {
    bucket = "terraform-state-bucket"
    prefix = "prod/shared-vpc-vm-fw"
  }
}
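For context, here is one way the consuming side might be wired up. The backend block stores this configuration's own state, and the locals pull values out of the remote state data source shown above. The state prefix and all four output names are hypothetical; they only resolve if the host configuration declares matching output blocks.

```hcl
terraform {
  backend "gcs" {
    bucket = "terraform-state-bucket"
    # Hypothetical prefix for this service project's own state.
    prefix = "prod/service-project-gke"
  }
}

locals {
  # Output names are assumptions; they must match outputs declared
  # in the host project's configuration.
  network_self_link    = data.terraform_remote_state.project1_data.outputs.network_self_link
  subnetwork_self_link = data.terraform_remote_state.project1_data.outputs.subnetwork_self_link
  pods_range_name      = data.terraform_remote_state.project1_data.outputs.pods_range_name
  services_range_name  = data.terraform_remote_state.project1_data.outputs.services_range_name
}
```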
From there, the GKE module creates each cluster as private (no public node IPs), VPC-native, attached to the shared subnetwork, with its pod/service secondary ranges supplied by the host network.
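The module itself lives in the linked repository, so the following is not the author's code. It is a minimal sketch using the plain google_container_cluster resource to show which settings produce a private, VPC-native cluster on a shared subnet. The cluster name, location, control plane CIDR, and remote state output names are all hypothetical.

```hcl
resource "google_container_cluster" "private" {
  project  = "service-project-123456"
  name     = "private-gke"  # hypothetical
  location = "us-central1"  # hypothetical; must match the shared subnet's region

  # Full self-links are used because the network lives in the host project.
  network    = data.terraform_remote_state.project1_data.outputs.network_self_link
  subnetwork = data.terraform_remote_state.project1_data.outputs.subnetwork_self_link

  # Declaring an IP allocation policy makes the cluster VPC-native (alias IPs).
  ip_allocation_policy {
    cluster_secondary_range_name  = data.terraform_remote_state.project1_data.outputs.pods_range_name
    services_secondary_range_name = data.terraform_remote_state.project1_data.outputs.services_range_name
  }

  # Nodes get internal IPs only; the control plane endpoint stays reachable.
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28" # placeholder; must not overlap other ranges
  }

  # Drop the default pool so node pools can be managed as separate resources.
  remove_default_node_pool = true
  initial_node_count       = 1
}
```

The secondary ranges are referenced by name rather than by CIDR, which is why they must already exist on the subnet in the host project before the cluster is created.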
Production checklist
- Host project owns subnets, secondary ranges, and firewall; service projects only deploy
- Private, VPC-native clusters with no public node IPs
- hostServiceAgentUser granted to each service project's GKE robot SA
- Remote state in GCS, host network outputs consumed by service projects
- Deny-all firewall baseline, open only what each workload needs
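The last checklist item could be sketched in Terraform roughly as follows, for ingress only. Google Cloud already applies an implied deny to ingress; the explicit rule here sits just ahead of it so that denied traffic can be logged. The network name, rule names, source range, and target tag are hypothetical, and egress rules are left out because a private cluster needs specific egress paths that this sketch does not model.

```hcl
# Explicit catch-all deny, evaluated just before the implied ingress deny.
resource "google_compute_firewall" "deny_all_ingress" {
  project   = "host-project-123456"
  name      = "deny-all-ingress" # hypothetical
  network   = "shared-vpc"       # hypothetical network name
  direction = "INGRESS"
  priority  = 65534

  deny {
    protocol = "all"
  }

  source_ranges = ["0.0.0.0/0"]

  # Log denied connections for later review.
  log_config {
    metadata = "INCLUDE_ALL_METADATA"
  }
}

# Example of a narrow allow layered on top of the baseline.
resource "google_compute_firewall" "allow_internal_https" {
  project   = "host-project-123456"
  name      = "allow-internal-https" # hypothetical
  network   = "shared-vpc"           # hypothetical network name
  direction = "INGRESS"
  priority  = 1000

  allow {
    protocol = "tcp"
    ports    = ["443"]
  }

  source_ranges = ["10.10.0.0/20"] # placeholder subnet range
  target_tags   = ["web"]          # placeholder tag
}
```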
This is the exact secure-baseline pattern behind our Private GKE Platform case study, and the kind of posture ATechsCloud CloudOps checks for automatically.
Originally published by Aslam Parvaiz on LinkedIn (Oct 2024). This is a condensed write-up; the full Terraform code lives in the linked repository.