
GKE/K8s Production

· 12 min read
Sanjeev Sarda
High Performance Developer

Notes I took whilst doing the "Architecting with Google Kubernetes Engine: Production" course. Covers access control, security, logging and managed storage.

cloud

Access Control and Security in K8s and GKE

User accounts - human users

k8s service accounts - creds for applications in the cluster

User accounts are defined through a Cloud Identity domain for fine-grained control. IAM is used to link permissions to users and groups.

For service accounts, we have Google Cloud Workload Identity to bridge the gap between k8s service accounts and IAM service accounts. It authenticates pods whilst keeping service account creds outside the pod.

A pod gets the service account creds from a GKE metadata server, which is just an endpoint on the overall k8s metadata server. Simplifies pod auth.

IAM - for google cloud resources outside your cluster

K8S RBAC - access within the cluster, namespace level

Synchronized approach with both IAM and RBAC: IAM controls who can modify the cluster config; K8s RBAC controls who can interact with k8s objects within the cluster.

Least privilege - minimum set of permissions to do the job

GKE Auth will manage end user auth for you:

alt text

K8S RBAC

Provide control over individual k8s resources.

Subjects - users, groups, service accounts

Verbs - the actions allowed for the resource

Resources - the k8s resources like pods, pvs, deployments

Roles - connect API resources and verbs

Role bindings - bind roles to subjects

  1. Create roles with permissions in the right namespace
  2. Create role bindings for the roles created

Cluster Roles

alt text

A manifest defining an RBAC Role - the rules are the permissions granted by the role. A Role is scoped to a single namespace.

Rules - the permissions

Resources - which resources the role can access

Verbs - permitted actions

RBAC Roles operate at the namespace level. RBAC ClusterRoles operate at the cluster level.

alt text
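A minimal sketch of what such a Role manifest looks like, assuming a hypothetical pod-reader role in the default namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  # Roles are namespaced - this one only applies in "default"
  namespace: default
  name: pod-reader
rules:
# rules = the permissions: which resources, and which verbs are permitted
- apiGroups: [""]   # "" is the core API group (pods, services, ...)
  resources: ["pods"]
  verbs: ["get", "list", "watch"]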

Non-resource endpoints can also be specified in rules:

alt text
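Non-resource URLs can only be granted at the cluster level. A sketch, assuming a hypothetical health-checker role:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: health-checker
rules:
# Non-resource endpoints are cluster-scoped, so they can only appear in a ClusterRole
- nonResourceURLs: ["/healthz", "/healthz/*"]
  verbs: ["get"]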

Cluster Role Bindings

alt text

Can specify the subject kind as users or groups.

The roleRef within the RoleBinding specifies which role is being granted to those subjects.

Namespaced Role bindings

alt text
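A minimal sketch of a RoleBinding that grants the hypothetical pod-reader Role above to a user, showing subjects and roleRef:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
# Subject kind can be User, Group or ServiceAccount
- kind: User
  name: jane@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  # roleRef specifies which role is being granted to the subjects
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

A ClusterRoleBinding looks the same but is not namespaced and its roleRef points at a ClusterRole.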

Cluster-level resources should be managed with ClusterRoles and namespaced resources with Roles.

For multi-namespace resources you need cluster roles.

Workload Identity

This is a GKE service which simplifies how containerized apps access google cloud services. Containers directly authenticate using their own dedicated credentials.

When you enable it on a cluster, a workload identity pool is automatically created with the format "PROJECT_ID.svc.id.goog". Containers on node pools with Workload Identity enabled can then use their service account creds to auth to Google Cloud services, either via the command line or using the Application Default Credentials library.

Autopilot clusters enable workload identity by default.

gcloud container clusters create/update to enable workload identity on a new or existing cluster
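As a sketch, assuming a cluster called my-cluster in us-central1, the workload pool is set with the --workload-pool flag:

# New cluster with Workload Identity enabled
gcloud container clusters create my-cluster \
  --region=us-central1 \
  --workload-pool=PROJECT_ID.svc.id.goog

# Enable it on an existing cluster
gcloud container clusters update my-cluster \
  --region=us-central1 \
  --workload-pool=PROJECT_ID.svc.id.goog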

Migration to workload identity

  1. Create a new node pool
  2. Enable workload identity
  3. Configure the app to auth to Google Cloud using workload identity
  4. Assign a k8s service account to the app, configure that service account to act as an IAM service account

Configuring an app to use workload identity

  1. Create a new namespace for the service account

kubectl create namespace NAMESPACE

  2. Create a service account for the app:

kubectl create serviceaccount K8SA_NAME --namespace NAMESPACE

  3. Create an IAM allow policy referencing the service account (steps 3-5 are sketched after this list):

alt text

  4. Grant roles to the service account:

alt text

  5. Add the service account to the pod manifest:

alt text
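Roughly, steps 3-5 look like the following; K8SA_NAME, NAMESPACE, IAM_SA_NAME, PROJECT_ID and the storage.objectViewer role are placeholders for your own values.

# 3. Allow the k8s service account to impersonate the IAM service account
gcloud iam service-accounts add-iam-policy-binding IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/K8SA_NAME]"

# 4. Grant the IAM service account the roles the app actually needs
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# Annotate the k8s service account with the IAM service account it maps to
kubectl annotate serviceaccount K8SA_NAME --namespace NAMESPACE \
  iam.gke.io/gcp-service-account=IAM_SA_NAME@PROJECT_ID.iam.gserviceaccount.com

# 5. Reference the k8s service account from the pod spec
# (spec.serviceAccountName: K8SA_NAME in the pod manifest)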

Control Plane Security

alt text

Each cluster has a dedicated root CA managed by google internally.

Control plane to nodes communication uses this CA.

etcd is isolated from other databases with a separate CA per cluster.

kubelets - the primary agents on nodes. They use certs from the cluster CA for comms with the API server.

Newly provisioned nodes have a shared secret which allows kubelets to initiate certificate signing requests to the cluster's root CA.

Credential Rotation

Credential rotation - rotate creds regularly.

  1. Create a new IP for the cluster control plane
  2. Issue new certs for the control plane with the new IP
  3. Update nodes to use the new IP and creds
  4. Update external API clients
  5. Remove old IP and credentials
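The same two gcloud commands shown in the lab notes below drive this (cluster name and zone are placeholders):

# Start rotation: a new IP and new credentials are served alongside the old ones
gcloud container clusters update CLUSTER_NAME --zone ZONE --start-credential-rotation

# Complete rotation: only the new IP and credentials are served from here on
gcloud container clusters update CLUSTER_NAME --zone ZONE --complete-credential-rotation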

alt text

The complete command tells the control plane to only serve with the new credentials.

During rotation the API server can have downtime.

alt text

Metadata concealment - prevents pods from accessing sensitive node metadata.

Pod Security

PSS - pod security standards. Predefined security configs for pods. 3 predefined levels:

  • privileged (unrestricted)
  • baseline (balance between security and flexibility)
  • restricted (most stringent security).

PSA - pod security admission. An admission controller for GKE. It intercepts pod creation requests and evaluates them against the policy configured for the namespace. If the pod is not policy compliant, it won't come up. Enforces that all pods follow the policies.

Implementing PSS and PSA

alt text

GKE Logs

Logs for:

  • System
  • Workloads
  • API Server
  • Scheduler
  • Controller manager

alt text

Default location for logs generated by applications on a node: /var/log/. These are ephemeral and include logs for the kubelet and kube-proxy.

kubectl logs [PODNAME] to see logs from a pod

Cloud Logging - logs go to the _Default bucket; the free tier covers the first 50GB and logs are retained for 30 days.

Admin logs are held in the _Required bucket for 400 days - this includes Admin Activity audit logs, system event logs and login audit logs.

Up to 10 year retention with user defined buckets.

Monitoring

Cluster vs application monitoring:

alt text

alt text

K8s uses labels to organise resources; systems and sub-systems can be monitored by combining labels. Labels are key-value pairs, e.g. environment=production.
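For example, with a hypothetical frontend deployment:

# Label a deployment, then select everything carrying that label
kubectl label deployment frontend environment=production
kubectl get pods,deployments -l environment=production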

Log-based custom metrics.

Cloud monitoring and prometheus integration:

alt text

Kubectl logs

--since=3h - logs from the last 3 hours

--previous [pod name] - logs before a crash
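For example, with a hypothetical pod name:

kubectl logs my-pod --since=3h    # logs from the last 3 hours
kubectl logs my-pod --previous    # logs from the previous, crashed container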

Logs are not persistent on pod eviction unless you have cluster level logging.

Node level logging uses rotation with archiving - logs are rotated once per day or when they reach 100MB, and the last 5 rotated files are kept.

If a container restarts, kubelet will keep 1 terminated container with its logs.

Inspecting logs with cloud logging and logging agent

Cluster events are stored as API objects on the cluster control plane temporarily.

alt text

GKE pushes the events from the control plane into the cloud logging product.

Logging agents pre-installed on nodes and configured to export to cloud logging. Can write to cloud logging via custom API also.

alt text

The log aggregator runs using a DaemonSet controller.

Managed Storage Services with GKE

You are responsible for the app lifecycle and for providing a resilient and reliable service.

Google offers managed storage services via API.

Apps can use those APIs via workload identity.

IAM - defined within google cloud for cloud API auth

You can bind an IAM service account to a k8s service account:

alt text

The IAM role should match the permissions the app needs.

Service account per app with specific IAM roles and permissions - protection against breaches, minimum required permissions, can disable specific service accounts.

Using Cloud Storage

Object storage for apps - structured and unstructured files. Not a file system.

  • Standard storage - for hot data, stored for short periods

  • Nearline - infrequently accessed (once per month or less)

  • Coldline - low cost, once every 90 days

  • Archive - lowest cost, once a year access, for DR and backup

Server side encryption before storage. Integrates with Pub/Sub to notify applications of changes. Object versioning is also available.
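A quick sketch with gsutil, assuming a hypothetical bucket name:

# Create a Nearline bucket and turn on object versioning
gsutil mb -c nearline -l us-central1 gs://my-app-backups/
gsutil versioning set on gs://my-app-backups/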

Using cloud databases

alt text

alt text

Bigtable

alt text

Firestore

Can handle queries and return the document result. Syncs data across all connected devices.

alt text

Spanner

Managed SQL.

alt text

BigQuery

Managed data warehousing.

alt text

alt text

Cloud SQL

Managed MySQL, PostgreSQL and MS SQL Server

Cloud SQL Auth Proxy - streamlines and secures connections to your Cloud SQL databases; it sits between your app and the database and provides in-transit encryption. All pods can access the DB through it.

alt text

Needs the Cloud SQL and Cloud SQL Admin (sqladmin) APIs enabled.

Comparing Storage Options

Depends on data type and business need.

alt text

Structured, transactional

From OLTP (online transaction processing), fast inserts and updates required for row based records.

If you need SQL: Cloud SQL for regional scalability, or Spanner for global scalability

If you need NOSQL: Firestore

alt text

Structured, Analytics

From OLAP (online analytical processing), complex queries.

alt text

If you need SQL: BigQuery

If you need NOSQL: Bigtable

CI CD

alt text

alt text

alt text

Build - check out the code at a specific version, build artifacts and images

Deploy - deploy artifacts to the test env

Approve stage - developers decide if a pipeline should proceed to production.

Manual pipelines - no CI/CD

Packaged tool - fixed set of built in automations

Dev-centric - centers the pipeline around the code repository (GitOps): keep code and config in the source repo, which is used as the golden source.

Ops centric - automating operational tasks like infra, monitoring, incident response.

Spinnaker - tool for implementing ops centric CI/CD

CI/CD Tools Available in Google Cloud

Jenkins - continuous integration

Spinnaker - continuous delivery

CircleCI - continuous integration

GitlabCI - continuous integration

Drone - container first CI/CD

Artifact Registry - store any kind of artifact including code, binaries and images. Built on Google Cloud Storage. Supports AAR, APK, JAR, POM and ZIP.

Cloud Build - takes code to a deployable. Executes builds on Google Cloud infra or on-prem hardware. Produces container images; builds can be automated with triggers based on events like changes in the code. A common volume called /workspace is mounted, which is where the source is downloaded to. Each step of the build runs as an independent container.
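A minimal cloudbuild.yaml sketch along those lines (repo and image names are made up); each step runs in its own container but shares the /workspace checkout:

steps:
# Step 1: build the image from the source checked out into /workspace
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'us-docker.pkg.dev/$PROJECT_ID/my-repo/my-app:$SHORT_SHA', '.']
# Step 2: push from a separate container that sees the same /workspace
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'us-docker.pkg.dev/$PROJECT_ID/my-repo/my-app:$SHORT_SHA']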

Cloud Deploy - automates deployment of applications to GKE. Handles the CD stage, connecting build artifacts to the deployment environment. Supports blue-green, rolling update and canary strategies.

Best Practice

  • Use a managed artifact repo
  • Use a build service
  • Use a deployment service
  • Automate your pipeline
  • Monitor your pipeline

Lab Notes

Applying pod security standards

  1. Create new namespaces:

kubectl create ns baseline-ns
kubectl create ns restricted-ns

  2. Use labels to apply security policies.

Apply security standards to pods: baseline gets applied to the baseline-ns namespace and restricted gets applied to the restricted-ns namespace.

kubectl label --overwrite ns baseline-ns pod-security.kubernetes.io/warn=baseline
kubectl label --overwrite ns restricted-ns pod-security.kubernetes.io/enforce=restricted

  3. View the applied labels:

kubectl get ns --show-labels

alt text

Testing applied policies

Now we'll check if we can deploy a workload that violates the PodSecurity admission controller's policy.

Create psa-workload.yaml

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    securityContext:
      privileged: true

This is an nginx pod resource with a securityContext of privileged.

We apply it to the baseline-ns namespace:

 kubectl apply -f psa-workload.yaml --namespace=baseline-ns

alt text

We can see the pod was still created - the warn mode only raises a warning rather than blocking the pod.

We now try to apply it to the restricted-ns namespace

 kubectl apply -f psa-workload.yaml --namespace=restricted-ns

alt text

Viewing Policy Violations

In logs explorer use a query:

 resource.type="k8s_cluster"
protoPayload.response.reason="Forbidden"
protoPayload.resourceName="core/v1/namespaces/restricted-ns/pods/nginx"

IP Address and Credential Rotation

Start the IP and credential rotation process:

gcloud container clusters update $my_cluster --zone $my_zone --start-credential-rotation

alt text

The cluster master now temporarily serves the new IP address in addition to the original address.

Confirm and complete the process:

gcloud container clusters update $my_cluster --zone $my_zone --complete-credential-rotation

This finalizes the rotation process and removes the original cluster IP address.

If this process fails, you may need to upgrade the cluster and try again:

gcloud container clusters upgrade $my_cluster --node-pool=default-pool --zone $my_zone

alt text

Create a VPC-native k8s cluster with system logging and monitoring

gcloud container clusters create $my_cluster \
--num-nodes 3 --enable-ip-alias --zone $my_zone \
--logging=SYSTEM \
--monitoring=SYSTEM

Create a cloud sql instance

gcloud sql instances create sql-instance --tier=db-n1-standard-2 --region=$my_region

Create a K8s Service account for the sql instance

kubectl create serviceaccount gkesqlsa

Bind a K8s Service Account to an IAM Service Account

gcloud iam service-accounts add-iam-policy-binding \
--role="roles/iam.workloadIdentityUser" \
--member="serviceAccount:$my_project.svc.id.goog[default/gkesqlsa]" \
sql-access@$my_project.iam.gserviceaccount.com

Annotate the K8s Service Account with the Google Cloud Service Account

kubectl annotate serviceaccount \
gkesqlsa \
iam.gke.io/gcp-service-account=sql-access@$my_project.iam.gserviceaccount.com

Create Secrets

You create two Kubernetes Secrets: one to provide the MySQL credentials and one to provide the Google credentials (the service account).

kubectl create secret generic sql-credentials \
--from-literal=username=sqluser \
--from-literal=password=sqlpassword

Deploy the Cloud SQL Auth Proxy as a Sidecar

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
  template:
    metadata:
      labels:
        app: wordpress
    spec:
      serviceAccountName: gkesqlsa
      containers:
      - name: web
        image: gcr.io/cloud-marketplace/google/wordpress:6.1
        #image: wordpress:5.9
        ports:
        - containerPort: 80
        env:
        - name: WORDPRESS_DB_HOST
          value: 127.0.0.1:3306
        # These secrets are required to start the pod.
        # [START cloudsql_secrets]
        - name: WORDPRESS_DB_USER
          valueFrom:
            secretKeyRef:
              name: sql-credentials
              key: username
        - name: WORDPRESS_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: sql-credentials
              key: password
        # [END cloudsql_secrets]
      # Change '<INSTANCE_CONNECTION_NAME>' here to include your Google Cloud
      # project, the region of your Cloud SQL instance and the name
      # of your Cloud SQL instance. The format is
      # $PROJECT:$REGION:$INSTANCE
      # [START proxy_container]
      - name: cloudsql-proxy
        image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0
        args:
        - "--structured-logs"
        - "--port=3306"
        - "<INSTANCE_CONNECTION_NAME>"
        securityContext:
          runAsNonRoot: true
---
apiVersion: "v1"
kind: "Service"
metadata:
  name: "wordpress-service"
  namespace: "default"
  labels:
    app: "wordpress"
spec:
  ports:
  - protocol: "TCP"
    port: 80
  selector:
    app: "wordpress"
  type: "LoadBalancer"
  loadBalancerIP: ""