Hybrid Cloud Service Mesh

· 5 min read
Sanjeev Sarda
High Performance Developer

Notes I took whilst studying the "Hybrid Cloud Service Mesh with Anthos" course.

cloud

Observing Hybrid Cloud Service Mesh

Telemetry

We want to measure the performance of our apps - how long the app took, how long the network took, full e2e observability.

This was traditionally done with application instrumentation by developers.

A service mesh decouples the role of the operator from that of the developer.

All comms go through the mesh in Istio, so telemetry can be derived from the traffic, decoupled from the code.

Mixer offers adapters to telemetry backends like Prometheus, Influx and Stackdriver.

Istio also gives you observability.

You can derive dependency trees from traffic telemetry, so traffic analysis tells you which teams need to be told when you change your application.

The four golden signals:

  • Latency - app response time

  • Traffic - how many requests

  • Errors - 5xx response codes

  • Saturation - load relative to a defined max queries per second; not provided by Istio, but self-defined on a dashboard
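To make the four signals concrete, here is a minimal sketch of how they could be computed from per-request records over one time window. The `Request` shape, the function name and the nearest-rank p95 are assumptions for illustration, not how Istio or any backend actually calculates them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One request as a proxy might record it (hypothetical shape)."""
    latency_ms: float
    status: int


def golden_signals(requests: List[Request], window_s: float, max_qps: float) -> Dict[str, float]:
    """Summarise one observation window into latency, traffic, errors, saturation."""
    if not requests or window_s <= 0 or max_qps <= 0:
        raise ValueError("need at least one request and positive window_s/max_qps")
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank 95th percentile, integer arithmetic (an assumption; backends differ).
    p95_index = (95 * len(latencies) + 99) // 100 - 1
    qps = len(requests) / window_s
    errors = sum(1 for r in requests if 500 <= r.status <= 599)
    return {
        "latency_p95_ms": latencies[p95_index],
        "traffic_qps": qps,
        "error_rate": errors / len(requests),
        # Saturation is self-defined: observed load against a max QPS you choose.
        "saturation": qps / max_qps,
    }
```

The `max_qps` argument is where the self-defined part of saturation lives: the mesh reports the traffic, you supply the ceiling.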

Telemetry backends

Provide:

  • Log aggregation
  • Monitoring (metrics)
  • Alerting (function of monitoring)

Stackdriver

Cloud native instrumentation backend for GKE and GKE on prem. Not for other workloads.

Stackdriver collector will collect everything. There is also a logging collector and a metadata agent.

On Prem

Istio comes with Prometheus and Grafana by default.

Push GKE on prem telemetry to cloud Stackdriver to have telemetry in one place, or push it to your existing telemetry backend.

Managing Traffic Routing with Service Mesh

Pilot manages distributed proxies across environments.

Provides discovery service via the pod sidecar proxies.

The Envoy API is called the xDS API. Pilot collects the topology info via adapters and converts it to the Envoy API format.

The info it collects is basically Map<ServiceName, List<ServiceEndpoint>>

To make the network smart, it needs info like which services run where. Pilot provides an abstraction layer over the environment topology.
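As a rough illustration of that Map<ServiceName, List<ServiceEndpoint>> shape, the sketch below folds a flat list of discovered endpoints into a registry. The record format and the `build_registry` name are made up for this example; the real Pilot adapters and xDS messages are considerably richer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ServiceEndpoint(NamedTuple):
    address: str
    port: int


def build_registry(records: Iterable[Tuple[str, str, int]]) -> Dict[str, List[ServiceEndpoint]]:
    """Group (service_name, address, port) records by service name."""
    registry: Dict[str, List[ServiceEndpoint]] = defaultdict(list)
    for service_name, address, port in records:
        registry[service_name].append(ServiceEndpoint(address, port))
    return dict(registry)


# Hypothetical topology: two endpoints for one service, one for another.
discovered = [
    ("reviews", "10.0.0.4", 9080),
    ("reviews", "10.0.1.7", 9080),
    ("ratings", "10.0.0.9", 9080),
]
registry = build_registry(discovered)
```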

Traffic Shaping

Istio provides more granularity and control.

  • Gateway - the ingress and egress

  • Virtual service - which services you should use to get to point B

  • Destination rule - how you get to the endpoint you want

  • Service entry - lets you do things beyond your mesh boundary

The system does LB based on label selectors. Pilot converts the service registry to the Envoy API format.

All pods know about all others via the service registry.

Pilot picks up the association between services and endpoints and propagates that info into your mesh.

The gateway is how you enter and exit the mesh. Once in, you need to hit a service. The virtual service is a routing rule - routing across different services which could be in different physical locations.

Virtual Service

A higher-level abstraction over a k8s service.

alt text

In the above we route HTTP traffic, with 95% going to the v1 subset of service_b and 5% to the v2 subset.
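For reference, a 95/5 split like this can be written down as a Python dict that mirrors the VirtualService resource, plus a small weighted picker that imitates what a proxy does per request. The resource name is hypothetical and the picker is only a simulation, not Envoy's implementation.

```python
import random

# Dict mirroring an Istio VirtualService; metadata.name is a made-up example name.
virtual_service = {
    "apiVersion": "networking.istio.io/v1alpha3",
    "kind": "VirtualService",
    "metadata": {"name": "service-b-split"},
    "spec": {
        "hosts": ["service_b"],
        "http": [
            {
                "route": [
                    {"destination": {"host": "service_b", "subset": "v1"}, "weight": 95},
                    {"destination": {"host": "service_b", "subset": "v2"}, "weight": 5},
                ]
            }
        ],
    },
}


def pick_subset(vs: dict, rng: random.Random) -> str:
    """Choose a destination subset for one request, proportionally to the weights."""
    routes = vs["spec"]["http"][0]["route"]
    weights = [route["weight"] for route in routes]
    chosen = rng.choices(routes, weights=weights, k=1)[0]
    return chosen["destination"]["subset"]
```

Over many requests roughly one in twenty lands on v2, which is what makes this useful for canary releases.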

Virtual Service with Gateway

alt text

The Istio ingressgateway is a standalone Envoy proxy that comes with Istio; it runs at the edge of the mesh rather than as a sidecar. It's exposed via LB on GKE.

The virtual service called bookinfo references the gateway.

The host in the route is the k8s service.

L7 Traffic Splitting

alt text

In this config we match on the headers, using a regex on the user-agent.

This allows for device-based routing based on HTTP headers.
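The idea can be sketched in a few lines of Python: check the user-agent header against a list of regex rules and fall back to a default route. The patterns, subset names and `route_for` helper are all assumptions for illustration, and the sketch assumes the regex has to match the whole header value.

```python
import re
from typing import Dict

# (pattern, subset) pairs checked in order; both patterns and names are hypothetical.
RULES = [
    (re.compile(r".*(Android|iPhone).*"), "mobile"),
]


def route_for(headers: Dict[str, str], default: str = "desktop") -> str:
    """Return the subset whose regex matches the whole user-agent value."""
    user_agent = headers.get("user-agent", "")
    for pattern, subset in RULES:
        if pattern.fullmatch(user_agent):
            return subset
    return default
```

A request with an iPhone user-agent goes to the mobile subset; everything else, including requests with no user-agent at all, falls through to the default.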

Destination Rule

After routing has already occurred, we may still need to define and apply policies to traffic that's intended for a service: LB, session affinity, connection pooling and circuit breakers.

alt text

The host specified is the destination. We apply a least_conn policy for load balancing on port 80. LB happens on the client side in Istio (at the proxy level), so each proxy makes its own LB decisions.

A DestinationRule has a one-to-one relationship with a K8s service.
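To show what a client-side least-connection decision looks like, here is a simplified sketch in which each proxy tracks its own in-flight requests per endpoint and picks the least busy one. The class and method names are invented, and this is a plain "pick the minimum" version, not the algorithm Envoy actually runs.

```python
from typing import Dict, Iterable


class LeastConnBalancer:
    """Per-proxy balancer: each instance only sees its own connections."""

    def __init__(self, endpoints: Iterable[str]) -> None:
        self.active: Dict[str, int] = {endpoint: 0 for endpoint in endpoints}

    def acquire(self) -> str:
        """Pick the endpoint with the fewest in-flight requests (ties: first listed)."""
        endpoint = min(self.active, key=self.active.get)
        self.active[endpoint] += 1
        return endpoint

    def release(self, endpoint: str) -> None:
        """Mark one request to the endpoint as finished."""
        if self.active[endpoint] > 0:
            self.active[endpoint] -= 1
```

Because the counters live in the proxy, no central load balancer is involved in the decision.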

Traffic Splitting

alt text

A virtual service that tries to get to host service-b. We route it with a weight. Both destinations use the host service-b but point at different subsets.

The destination rule is coupled with the k8s service, so you can have different LB, auth etc. for different subsets. The subsets are defined using label selectors - version is a k8s label, but you don't have to use version; you could also use production/canary.
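A small sketch of how label selectors carve one service into subsets: every pod whose labels contain all of a subset's labels belongs to that subset. The pod names, labels and the `select_subset` helper are hypothetical.

```python
from typing import Dict, List

# Subset name -> labels a pod must carry (version here, but any label works).
SUBSETS: Dict[str, Dict[str, str]] = {
    "v1": {"version": "v1"},
    "v2": {"version": "v2"},
}

# Hypothetical pods backing the same k8s service.
PODS = [
    {"name": "service-b-1", "labels": {"app": "service-b", "version": "v1"}},
    {"name": "service-b-2", "labels": {"app": "service-b", "version": "v1"}},
    {"name": "service-b-3", "labels": {"app": "service-b", "version": "v2"}},
]


def select_subset(pods: List[dict], subset_labels: Dict[str, str]) -> List[str]:
    """Return the names of pods whose labels include every subset label."""
    return [
        pod["name"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in subset_labels.items())
    ]
```

Swapping the label for something like `track: canary` would work the same way, since the selector only compares key/value pairs.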

Fault Injection

Making fault injection a function of your network.

alt text

The example adds a 5 second delay to a subset of your traffic and an abort with status 400 to a subset of traffic.

The proxy, not the application, intercepts the traffic and creates the abort or delay. You can also define a timeout in the VirtualService.
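The per-request decision a proxy makes can be sketched like this: roll the dice once for the delay and once for the abort, and leave the application untouched. The function name and the percentage parameters are assumptions for illustration; only the 5 second delay and the 400 status come from the example.

```python
import random
from typing import Dict, Optional, Union


def inject_fault(
    rng: random.Random,
    delay_pct: float,
    abort_pct: float,
    delay_s: float = 5.0,
    abort_status: int = 400,
) -> Dict[str, Union[float, Optional[int]]]:
    """Decide, for one request, whether to delay it and/or abort it."""
    decision: Dict[str, Union[float, Optional[int]]] = {"delay_s": 0.0, "abort_status": None}
    # rng.random() is in [0, 1), so 100 always triggers and 0 never does.
    if rng.random() * 100 < delay_pct:
        decision["delay_s"] = delay_s
    if rng.random() * 100 < abort_pct:
        decision["abort_status"] = abort_status
    return decision
```

A caller would sleep for `delay_s` and then either return `abort_status` or forward the request to the real service.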

Security In Service Mesh

Service traffic has to be encrypted to protect against MITM attacks. It needs access controls, mTLS, access policies and audit.

Istio security - security at the network level. No changes for app code, no auth in the app code for service traffic.

Our security posture has to become like a maze - at every step you have to do something, so services have to authenticate one another and everything is encrypted.

mTLS Flow

Citadel is responsible for all of the certs: it is the CA and also provides cert rotation.
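Rotation boils down to replacing a cert before it expires. The sketch below shows one possible rule, rotating once a fixed fraction of the lifetime has passed; the 80% threshold and the `needs_rotation` helper are assumptions for illustration, not Citadel's documented behaviour.

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(
    not_before: datetime,
    not_after: datetime,
    now: datetime,
    fraction: float = 0.8,
) -> bool:
    """Return True once `fraction` of the cert's lifetime has elapsed."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * fraction


# Hypothetical 24 hour workload cert.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)
```

With these values the cert is left alone at the 12 hour mark and flagged for rotation at the 20 hour mark.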

Incremental mTLS

In permissive mode a service accepts both plaintext and mTLS traffic, so you can go from one service to another without authentication. In mutual mode you must use mTLS. This allows gradual adoption for things outside of our mesh, which would otherwise require a CA etc.

You can selectively lock down which parts use mTLS.

In strict mode, everything uses mTLS, and there is now also explicit whitelisting and blacklisting.
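The difference between the modes comes down to what a receiving proxy is willing to accept. A minimal sketch, with an invented `accept_connection` helper and mode strings chosen to match the names used in these notes:

```python
def accept_connection(mode: str, client_uses_mtls: bool) -> bool:
    """Decide whether a receiving proxy accepts an incoming connection."""
    if mode == "PERMISSIVE":
        # Accept plaintext and mTLS alike, which allows gradual adoption.
        return True
    if mode == "STRICT":
        # Only mutually authenticated, encrypted connections get through.
        return client_uses_mtls
    raise ValueError(f"unknown mTLS mode: {mode!r}")
```

Moving a service from permissive to strict therefore only breaks the callers that have not yet switched to mTLS.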