Anthos Hybrid Cloud Infra
Notes I took whilst studying the "Hybrid Cloud Infrastructure Foundations with Anthos" course.
Intro to Anthos
Hybrid cloud acts as a bridge between on-prem and cloud providers. Allows us to move to the public cloud in a staggered and controlled manner.
To increase developer velocity to compete on innovation.
Borg - Google's internal container orchestrator; the open source equivalent is k8s.
Lifecycle management of services.
Understanding personas to take a use-case based approach to service lifecycle management - there are a lot of consumers of any given stack.
Architectural principles -> Framework -> Implementation
GKE On prem - modernization in place.
Your tools as a dev can evolve independently of where your code and apps run.
Istio - policies around services, how services interact, without infra complexity.
Declarative config at scale - define policies once and push to environments, important for PCI compliance and governance.
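To make "define once, push to environments" concrete, here is a minimal sketch of a reconcile step. The function name `plan_sync` and the dict layout are made up for illustration; this is not how Anthos config management is implemented, only the general declarative idea of comparing desired state (from the policy repo) with what an environment currently has.

```python
def plan_sync(desired, actual):
    """Compare desired policies with what a cluster has and plan the changes.

    Both arguments are hypothetical dicts of policy name -> policy body.
    """
    # Anything missing or different in the cluster must be (re)applied.
    to_apply = {name: body for name, body in desired.items()
                if actual.get(name) != body}
    # Anything in the cluster that is no longer declared is drift to remove.
    to_delete = sorted(name for name in actual if name not in desired)
    return to_apply, to_delete
```

Running the same plan against the on-prem and cloud clusters is what keeps both environments converging on a single source of truth.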
K8s facilitates a service centric architecture. Lots of services requires management. Istio helps with this.
A cluster is a sovereign unit in k8s, though you can have multiple clusters in GKE.
Ubernetes - a lot of clusters around the world to be close to users. No collab between clusters without a higher level abstraction.
Observability allows us to compare how workloads run between the cloud and on-prem.
What is the Anthos stack?
Means flower in Greek (grow on prem but need rain from clouds).
Same dashboard across envs to control both environments.
You can use a cloud interconnect between on prem and cloud.
Policy repo - on git.
Managing Hybrid Clusters with GKE
Containers and k8s are our compute layer. Originally only for stateless apps, but we now have stateful sets to cover more use cases in a containerized environment.
GKE - k8s as a managed turnkey service. Also provides regional clustering for HA across zones in a single region. Certified config management with best practices applied in an automated manner.
GKE on prem - an appliance you run on-prem with GKE. Includes Istio, knative, marketplace solutions, container services like cloud build, registry, logging and monitoring.
Identity and access management based on cloud identity (Google's) or your own identity provider.
Secure connection for the control plane between cloud and on-prem without a VPN.
Use Cases
GKE On Prem Architecture
Kubeception - A cluster master, an admin cluster to manage other clusters.
gkectl - cli for controlling your on prem gke environment, the admin cluster
Admin cluster controls the masters of your user clusters.
control plane ~ kubernetes master
Your containers run on user clusters.
The user cluster control plane belongs to the user cluster, but is administered by the admin cluster.
On Prem Install
gkectl create cluster
to create the admin cluster.
Networking Connectivity
Island mode - pods within a cluster cannot be reached directly; reaching them requires a k8s service. Communication within a cluster via BGP.
Flat ip mode - routable and reachable within a DC.
Control plane connection
You use an outbound TLS connection from your on-prem location to a GKE connect proxy which runs in the cloud.
Data plane connection
The data plane is how your apps actually communicate. The above table shows the 3 main options you have based on your bandwidth requirements and physical proximity to Google's DCs.
Intro to Service Mesh
Networks in enterprises are often flat with a thick border - it's hard to get in, but once you do, you can move around laterally relatively easily.
With a zero trust network, we use principles of defense in depth, with checks and balances on intra-service communication.
A service identity helps to authenticate who we are. We need authentication and authorization because we now have microservices communicating with one another, whereas before we would have had all in-process communication in a monolith.
The things around your business logic are network functions.
A service mesh separates these network functions.
Two containers within a pod, with the business logic in one container and a sidecar that's responsible for the network functions.
Main network functions and those of a service mesh:
- Traffic control
- Observability
- Security
Service mesh control plane
We have islands of clusters across the world - we need them to communicate in a secure and reliable fashion.
Istio decouples these functions from your application - increases portability.
Istio sidecar model
The Istio proxy is based on Envoy proxy.
The proxy intercepts all communication.
The Envoy proxy intercepts network comms and applies logic to it. We introduce functionality at the network level.
Pilot manages the proxies: it makes sure proxies are aware of services and have relevant TLS certs, and handles routing, service discovery etc.
Mixer collects telemetry, does policy checking.
Citadel - a CA in the service mesh for certificate management, uses mTLS for bidirectional auth.
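The "bidirectional" part of mTLS can be sketched with Python's standard `ssl` module: the server side also demands and verifies a certificate from the client. This is only an illustration of mutual TLS in general, not how the Istio proxies or Citadel are configured; the function names and the optional file arguments are assumptions, and in a mesh the certs would be issued and rotated by the CA rather than loaded from paths by hand.

```python
import ssl


def make_server_context(cert_file=None, key_file=None, ca_file=None):
    """Server side of mTLS: present our cert and require one from the client."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # The default for a server context is not to ask for a client cert.
    # Requiring one is what makes the TLS handshake mutual.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


def make_client_context(cert_file=None, key_file=None, ca_file=None):
    """Client side of mTLS: verify the server and present our own cert."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

Both sides trust the same CA, so each can check that the peer's certificate was issued inside the mesh.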
Request Flow in a Service Mesh
Pilot reads the service entries in k8s. A control plane to configure and push service communication policies.
LB and comms happen between the proxies.
Service A tries to contact service B. It sends a DNS request and gets an IP back from kube-dns. The proxy intercepts the request. It has a map of all routes and services. It then does client side LB - chooses one of the pods which has service B, then makes a direct connection to it.
Once the proxy for service B gets the request, it checks with Mixer for quota/limits and policy checks (including an ACL). It also sends telemetry info asynchronously. If Mixer responds with yes, the proxy forwards the request to service B, which returns the response. The proxy also reports the response from service B back to Mixer in order to establish response times and provide more telemetry.
The response is intercepted by the proxy for service A which sends the response to service A. The proxy on service A also sends telemetry so we have full end to end observability.
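The flow above can be sketched as a toy simulation. The `Mixer` and `Proxy` classes here are hypothetical stand-ins written for these notes, not Istio APIs: `routes` plays the role of the service map pushed by Pilot, `check` is the policy/ACL check, and `report` is the telemetry call made by both proxies.

```python
import random


class Mixer:
    """Stand-in for Mixer: policy checks plus telemetry collection."""

    def __init__(self, allowed):
        self.allowed = set(allowed)  # (source, destination) pairs the ACL permits
        self.telemetry = []

    def check(self, source, destination):
        return (source, destination) in self.allowed

    def report(self, record):
        self.telemetry.append(record)


class Proxy:
    """Stand-in for the sidecar that sits in front of one pod."""

    def __init__(self, service, mixer, handler=None, rng=None):
        self.service = service
        self.mixer = mixer
        self.handler = handler  # the business logic container
        self.routes = {}        # service name -> list of Proxy (one per pod)
        self.rng = rng or random.Random()

    def send(self, destination, request):
        # Client side LB: pick one pod backing the destination service.
        target = self.rng.choice(self.routes[destination])
        response = target.receive(self.service, request)
        # The calling side reports too, giving end to end observability.
        self.mixer.report(("client", self.service, destination, response))
        return response

    def receive(self, source, request):
        # Policy check happens before the request reaches the business logic.
        if self.mixer.check(source, self.service):
            response = self.handler(request)
        else:
            response = "denied"
        self.mixer.report(("server", source, self.service, response))
        return response
```

Note that the business logic (`handler`) never sees the policy check or the telemetry; that is the decoupling the sidecar model is after.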
Service mesh features
- Traffic splitting independent of infra, to allow for things like canary releases without doing pod balancing, i.e. route 1% of traffic to v2 and 99% to v1. Rather than having 99 pods on v1 and 1 on v2, we use traffic based routing, making the network smarter and requiring fewer pods.
- Content based traffic - layer 7 based routing, HTTP header, URI - used to route within the mesh.
- Fault injection and circuit breaking
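A rough sketch of the first two features, as a routing decision made per request rather than per pod. The function name `choose_version` and the `x-canary` header are invented for illustration; a real proxy gets its rules from the control plane.

```python
import random


def choose_version(headers, weights, rng=random):
    """Pick a service version for one request.

    headers: dict of HTTP headers on the request.
    weights: dict of version -> percentage, e.g. {"v1": 99, "v2": 1}.
    """
    # Content based routing: a (hypothetical) header pins the request to v2.
    if headers.get("x-canary") == "true":
        return "v2"
    # Traffic splitting: weighted random choice, independent of pod counts.
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions], k=1)[0]
```

With `{"v1": 99, "v2": 1}` roughly one request in a hundred lands on v2, even if each version runs only a single pod.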
Traffic Shaping
Lookup static virtual IP. Use label selectors to route.
Traffic splitting in traditional K8s:
With a service mesh there is direct communication between proxies, not via virtual IPs.
Gateway allows external traffic into the mesh. In traditional k8s this would have been an LB.
A virtual service is an abstraction on top of a k8s service. The host is the equivalent of a traditional k8s service.
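For reference, a sketch of what a weighted virtual service could look like, built here as a Python dict and printed as JSON. The helper name `virtual_service` and the `reviews` host are placeholders; the field layout follows my understanding of the Istio `VirtualService` resource (`hosts`, `http`, `route`, `destination`, `weight`), so check it against the Istio docs for the version in use. The `subset` names would also need a matching destination rule, which is not shown.

```python
import json


def virtual_service(name, host, weights):
    """Build a VirtualService-shaped manifest that splits traffic by weight.

    weights: dict of subset name -> percentage; must add up to 100.
    """
    if sum(weights.values()) != 100:
        raise ValueError("route weights must add up to 100")
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            # The host is the equivalent of the traditional k8s service.
            "hosts": [host],
            "http": [{
                "route": [
                    {"destination": {"host": host, "subset": subset},
                     "weight": weight}
                    for subset, weight in weights.items()
                ],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(virtual_service("reviews", "reviews", {"v1": 99, "v2": 1}), indent=2))
```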