Wednesday, November 15, 2023

GCP

Why Zones & Regions

If your application is deployed in only one zone, say Singapore zone-a,
and zone-a goes down, your whole system goes down. So you should have a multi-zone deployment.

Now suppose your application is in 2 zones, zone-a and zone-b, but the whole region goes down.
Again your system goes down, so you should also have a multi-region deployment or a recovery plan.

So if you have deployments in both the Singapore region and us-west, that is a better recovery plan.

Benefits
Low latency
Compliance with government rules - some data must reside in certain locations only.
High availability
Disaster recovery




By default you can create up to 24 projects.

You can also create a project under "No organization".

If you have an organization, you can select the project under that organization.

Here learncloudwithankit is the account and gcp-devops-3385 is the project ID.

How to switch the project on Cloud Shell:
gcloud config set project gcp-devops-33851

Cloud Shell quota limit
50 hours a week

What is persistent in GCP Cloud Shell?
Whatever you install outside your home directory will be lost when the session ends,
but anything you save in /home/learncloudwithankit is persistent, and you get 5 GB of storage.

SRE

Conflict
Developers want to push changes quickly, while operators want the system to stay stable. This conflict between change and stability is the gap DevOps tries to close.

DevOps is a set of practices, guidelines, and culture
designed to reduce the gap between software development and software operations.

DevOps established five goals:
1. Reduce organizational silos.
2. Accept failure as normal.
3. Implement gradual changes.
4. Leverage tooling and automation.
5. Measure everything.

But DevOps does not define how to achieve these five goals;
that is where SRE comes into the picture.

 

SRE
The goals of DevOps are very broad.
DevOps does not define how to implement them.

DevOps is a philosophy, whereas SRE is an implementation of DevOps's philosophy.

SRE Practices
1. SRE role: eventually SRE replaces the old operator role, but it also shares responsibility with developers.
It is not just about who handles the code; developers also handle/maintain the software in production.
Eventually, merging the roles of developer and operator is the SRE role.
2. Blameless postmortems
3. Error budget
4. Reduce toil
5. Track service level metrics: SLIs, SLOs, and SLAs.


 

 

SRE Role
A specific job role.
Old operator role -> SRE role.
A Site Reliability Engineer is basically the result of asking a software engineer to design an operations team.
SRE requires experience in both development and operations.
SREs spend half of their time doing ops-related work:
 production issues, being on call, performing manual interventions.
SREs spend the other half of their time on development tasks: scaling systems, automation.
Compared to the old operator model, both SRE & developer share responsibility for the prod servers.
SREs build the tools that developers use to compile, test, and deploy their code (CI/CD pipelines).
Developers and SREs work together to fix issues.

 

Blameless postmortems
The idea behind blameless postmortems
is to analyze a system failure:
find the root cause behind it,
discuss what exactly happened,
and decide what actions need to be performed.

Not to look for someone who can be blamed.

The assumption is that everyone had good intentions.
Some postmortem questions that need to be asked:
When did the incident begin & end?
How was the incident reported?
Who was involved?
Which systems were affected?
What is the root cause of the failure?
How do we avoid it in the future?

Accept that human error is involved.
A blameless postmortem is
 honest communication with other team members so that similar incidents can be avoided in the future.

 


 


Error budget

1. One of the goals of DevOps is to implement gradual change.
2. Why do outages occur?
New features, changes, new hardware, security patches.
More change leads to a less stable system.

3. How to balance between change & stability?
We have to define a metric for system reliability.
It is a business problem:
how much can the service fail before it begins to have a significant negative impact?

4. How quickly do we need to be able to release new features?

Depending on the target, you need to define an error budget.

Any time your service is down, the time required to recover it is consumed from the error budget.
After you define the error budget:
 as long as you are within the error budget, you are good to go for more changes;
 once you run out of error budget, hold all future deployments & make the system stable first.

A larger error budget
 means more downtime is acceptable for the service,
 so frequent changes are possible.

A smaller error budget
 means less downtime is acceptable for the service,
 so fewer changes are allowed.

The error budget makes sure smaller & gradual changes are deployed.
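The gating rule above can be sketched in a few lines of Python (the numbers and function names are illustrative, not from any GCP API):

```python
# Illustrative sketch: gate releases on the remaining error budget.

def remaining_budget(total_budget_min, outage_minutes):
    """Budget left after subtracting the recovery time of each outage."""
    return total_budget_min - sum(outage_minutes)

def can_deploy(total_budget_min, outage_minutes):
    """Within budget -> keep shipping changes; exhausted -> stabilize first."""
    return remaining_budget(total_budget_min, outage_minutes) > 0

budget = 43            # e.g. roughly 43 minutes/month for a 99.9% SLO
outages = [10, 25]     # two incidents consumed 35 minutes of recovery time
print(remaining_budget(budget, outages))  # 8 minutes left
print(can_deploy(budget, outages))        # True -> changes may continue
```

Once `can_deploy` turns False, all deployments are held until the system is stable again.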

Toil
How to eliminate toil?
One of the goals of DevOps is to leverage tooling and automation:
maximum automation, so minimum human intervention.
There are lots of tasks that are manual and laborious.
Tasks like password changes, copying files, creating new folders, restarting servers -
these types of tasks are considered toil.
Identifying toil is important.
Not all tasks are toil.
There are tasks that are laborious but not necessarily toil: you have to do them manually, there is no other option, you just can't automate them. In that particular case, you should not waste your time trying to eliminate them or make them automatable.

Toil relates to
 prod systems:
 manual, repetitive & automatable tasks are considered toil.

How are we going to reduce it?
SREs want to reduce toil through automation.
Tasks like:
 automating the CI/CD pipeline
 scheduling jobs
 writing automation scripts
 automating testing
 no manual provisioning of hardware

If a repetitive task can be automated, it should be automated.
Thanks to automation, more people can work on something more interesting.
SREs should spend a significant amount of time reducing toil.
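As a trivial example, one of the toil tasks listed above (creating new folders) can be turned into an idempotent script - a sketch only, not tied to any GCP tooling; the folder names are made up:

```python
# Sketch: replace the manual "create new folders" toil with a script.
from pathlib import Path
import tempfile

def provision_dirs(base, names):
    """Create a standard set of folders under base; safe to re-run."""
    created = []
    for name in names:
        d = Path(base) / name
        d.mkdir(parents=True, exist_ok=True)  # idempotent: no error if it exists
        created.append(d)
    return created

# Run against a throwaway directory so the example is self-contained.
base = tempfile.mkdtemp()
dirs = provision_dirs(base, ["logs", "backups", "reports"])
print([d.name for d in dirs])  # ['logs', 'backups', 'reports']
```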


Define metrics

Define SLO
Service Level Objective
It is an internal objective of the team.

Suppose your team has defined that the SLO for some particular task in your software should be 96%. That means 96% of the time your system should behave as expected; the remaining 4% is the error budget.

The SLO is something everyone in the org wants to achieve.
The error budget is directly related to the SLO;
it is the complement of the SLO.
Error budget of 3% means the service is down at most 3% of the time.
SLO of 97% means the service should be up 97% of the time.
Error budget + SLO = 100%
Define SLOs with respect to latency, availability, response time.
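The "Error budget + SLO = 100%" relation converts directly into allowed downtime. A quick sketch (the 30-day month is an assumption for the arithmetic):

```python
# Error budget + SLO = 100%: derive the allowed downtime from an SLO.

def error_budget_pct(slo_pct):
    return 100.0 - slo_pct

def allowed_downtime_min(slo_pct, period_min=30 * 24 * 60):
    """Downtime allowed over a period (default: a 30-day month)."""
    return period_min * error_budget_pct(slo_pct) / 100.0

print(error_budget_pct(96))      # 4.0 -> the 4% budget from the 96% example
print(allowed_downtime_min(96))  # 1728.0 minutes (~28.8 hours) per month
```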

SLI
Service Level Indicator
An indicator internal to the team.
The SLI needs to be compared against the SLO.
SLIs are metrics tracked over time (generally at 5-minute intervals).
An SLI ranges from 0 to 100%.

SLI = (Total Good Events / Total Valid Events) x 100

Let's say the SLO is 96%:
96% of requests should be served within 300 ms latency.

If the current SLI is 95%, or anything less than 96%, the system is underperforming.
SLIs help us find which services are not performing as per the SLO.
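The SLI formula and the comparison against the SLO in one small sketch (the request counts are made up):

```python
# SLI = (good events / valid events) x 100, compared against the SLO.

def sli(good_events, valid_events):
    return good_events / valid_events * 100.0

slo = 96.0  # 96% of requests should be served within 300 ms

# Suppose 950 of 1000 valid requests met the 300 ms latency target.
current = sli(950, 1000)
print(current)          # 95.0
print(current >= slo)   # False -> the service is underperforming its SLO
```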
 



A good SLI correlates with customer happiness.

If changes to the SLI do not impact customers, the SLI definition is not worth tracking.
Track different signals:
Latency
Traffic
Errors
Saturation
Availability of the system

Selecting the right SLOs & SLIs will lead to success.

The SLO is the target, and the SLI is the current performance of your software or application.

SLA
Service Level Agreement
It is a contract with consequences for failing to meet the SLOs it contains.
SLO & SLA are quite similar,
but your SLAs should not be the same as your SLOs.
An SLO is an internal objective:
 if you cannot meet the SLO, the team can slow down changes.
SLA violations are shared with your customers:
 if you cannot meet the SLA, compensation needs to be provided to customers.

As an example of Google SLAs: for each service, Google provides some kind of SLA,
e.g. for Cloud Run.

The SLI should be higher than the SLO & SLA; that means the current indicator shows services are performing as expected.

If the SLI falls below the SLO, slow down.
If the SLI goes below the SLA, notify customers & compensate.
A higher SLA is good, but you are more likely to violate it.
Google's recommendation in case of a very high SLA:
deliberately take your service down for some time, so users do not come to depend on more reliability than you promise.
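The ordering SLI >= SLO > SLA and the actions it implies can be expressed as a small check (thresholds are illustrative):

```python
# Compare the current SLI against the SLO and SLA, per the rules above:
#   SLI >= SLO         -> healthy, keep shipping changes
#   SLA <= SLI < SLO   -> burning budget, slow down changes
#   SLI < SLA          -> contract breached, notify & compensate customers

def reliability_status(sli, slo, sla):
    if sli >= slo:
        return "healthy"
    if sli >= sla:
        return "slow down"
    return "notify & compensate"

print(reliability_status(99.5, 99.0, 98.0))  # healthy
print(reliability_status(98.5, 99.0, 98.0))  # slow down
print(reliability_status(97.0, 99.0, 98.0))  # notify & compensate
```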



Container Registry

Image naming format: HostName/ProjectID/ImageName:Tag, e.g. gcr.io/[ProjectID]/nginx:1.0

docker tag myapp:v1.0 gcr.io/$DEVSHELL_PROJECT_ID/myapp:v1.0

docker push gcr.io/atomic-matrix-401102/myapp:v1.0

When you push your image to Container Registry, it is stored in a storage bucket in Cloud Storage.

Let's add another tag for the EU region:
docker tag myapp:v1.0 eu.gcr.io/$DEVSHELL_PROJECT_ID/myapp:v1.0

Push the image:
docker push eu.gcr.io/atomic-matrix-401102/myapp:v1.0

Artifact Registry
Artifact Registry comes with fine-grained access control via Cloud IAM.
We can give admin/write/read access in Artifact Registry; in Container Registry that was not possible, because no IAM roles were created for it.

Artifact Registry features:
Multiple repositories per project.
Regional & multi-region repositories.
It can store not just Docker images but many more formats, like npm, Maven, Python.
asia-southeast1-docker.pkg.dev/[ProjectID]/[repo]/nginx:v1.0
You must create a repo first (not required for Container Registry).

Create an Artifact repo:
Name - demorepo
Format - Docker
Location - Regional - asia-southeast1
Type - Standard
Encryption - Google-managed
Cleanup policy - Dry run (will not delete)

Configure:
gcloud auth configure-docker asia-southeast1-docker.pkg.dev
This configuration is needed only once per region.

Let's see
how to configure this via gcloud and
push an image to Artifact Registry.

After you run the auth command below, it configures the asia-southeast1 registry in your Cloud Shell:
gcloud auth configure-docker asia-southeast1-docker.pkg.dev

How can you check the currently configured registries?
cat ~/.docker/config.json

Now create a tag for AR:
docker tag myapp:v1.0 asia-southeast1-docker.pkg.dev/atomic-matrix-401102/demorepo/myapp:v1.0

Push the image to AR:
docker push asia-southeast1-docker.pkg.dev/atomic-matrix-401102/demorepo/myapp:v1.0

App Deployment

Two things:
Where do you want to deploy?
What is the deployment strategy?

Deployment compute options:
Compute Engine
Kubernetes ----- used to deploy containerized applications
App Engine ---- web app deployment, but completely serverless
Cloud Run ------ used to deploy containerized applications
Cloud Functions -- event-triggered functions

Deployment methods:
Blue/green
Rolling
Canary
Traffic splitting

Blue/green
Two copies of the prod system run in parallel:
v1.0     v1.0
Your traffic currently goes to the left side, the blue one.
When a developer needs to deploy a new feature, v2.0, he deploys it to the right side, the green one.

Then user traffic is switched to the green v2.0: the left blue side becomes the staging server, and the green one becomes the live server.
In the same way, when v3.0 comes, it is deployed to the staging side, traffic is switched to it, it becomes prod, and the other server becomes staging.
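The blue/green switch described above can be sketched as two environments plus one live pointer that flips atomically (names are illustrative):

```python
# Sketch of blue/green: deploy to the idle side, then flip traffic to it.

envs = {"blue": "v1.0", "green": "v1.0"}
live = "blue"  # all traffic currently goes to blue

def idle():
    return "green" if live == "blue" else "blue"

def deploy(version):
    """Deploy the new version to the idle side, then switch traffic to it."""
    global live
    target = idle()
    envs[target] = version
    live = target          # the old live side becomes the staging server
    return live

deploy("v2.0")
print(live, envs[live])    # green v2.0
deploy("v3.0")
print(live, envs[live])    # blue v3.0
```

The switch is a single pointer flip, which is why rollback in blue/green is just flipping back.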


Rolling deployment
v1.0    v1.0    v1.0    v1.0

v1.0    v1.0    v1.0    v2.0

You gradually roll out v2.0 across your deployment, not in one go; if everything works fine, it is deployed phase by phase to the other systems as well.
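A rolling update can be simulated as updating the fleet one batch at a time, stopping early if a health check fails (all names and the health check are illustrative):

```python
# Sketch of a rolling deployment: update instances batch by batch,
# rolling no further if the updated batch fails its health check.

def rolling_update(instances, new_version, batch_size=1, healthy=lambda v: True):
    for i in range(0, len(instances), batch_size):
        for j in range(i, min(i + batch_size, len(instances))):
            instances[j] = new_version
        if not healthy(new_version):   # stop the rollout on failure
            return instances
    return instances

fleet = ["v1.0", "v1.0", "v1.0", "v1.0"]
print(rolling_update(fleet, "v2.0"))   # ['v2.0', 'v2.0', 'v2.0', 'v2.0']
```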


Canary deployment
Suppose your app is deployed on 8 different resources.
In a canary deployment we deploy to only a small percentage of the machines first,
so only a small number of users are affected; if users are satisfied with version v2.0
and everything is fine, you deploy it to all resources.

Compared to a rolling deployment, in a canary deployment
you do not update one resource at a time; you update a small percentage of resources first and watch it.


Traffic splitting
Small Percentage of user will be served new version (ex: 10-20%)
If everything is fine, Redirect all user to new version.
Traffic splitting can be used for A/B Testing.
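Traffic splitting can be sketched as a router that sends a configurable percentage of users to the new version, deterministically per user (the version labels and percentage are illustrative):

```python
# Sketch of traffic splitting: route ~N% of users to the new version.
# Hashing the user id keeps each user on the same version within a process.

def route(user_id, new_version_pct=10):
    bucket = hash(user_id) % 100
    return "v2.0" if bucket < new_version_pct else "v1.0"

versions = [route(uid, new_version_pct=20) for uid in range(1000)]
print(versions.count("v2.0"))  # roughly 200 of 1000 users see v2.0
```

Per-user stickiness is what makes this usable for A/B testing: each user consistently sees one variant.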



Deploy Cloud Functions

Cloud Functions is a lightweight, event-based, asynchronous compute solution
that allows you to create small, single-purpose functions
that respond to cloud events without the need to manage a server or a runtime environment.

Deploy a function Manually

Create functions
Name- function-1-v10
Region - us-central1

Trigger
HTTP
Cloud Pub/Sub
Cloud Storage
Cloud Firestore
Google Analytics for Firebase
Firebase Realtime Database
Firebase Remote config
so on----

we choose HTTP

URL - https://us-central1-gcp-devops-338510.cloudfunctions.net/function1-v10

Authentication - Allow unauthenticated invocations
Required - HTTPS
Save

Runtime,build,connections and security settings
Runtime
Memory allocated - 256MB
Timeout - 60 second

Runtime service account
App Engine default service account

Next
Configuration
Runtime - python3.7
Entry Point
helloWorld

Source Code
Inline Editor
main.py
requirements.txt

update the msg in main.py - Hello world 1
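The inline main.py for this HTTP function could look like the following minimal sketch (the entry-point name must match the Entry Point field, helloWorld here; the message is the one from the step above):

```python
# Minimal sketch of the inline main.py for an HTTP Cloud Function.
# The function name must match the configured entry point (helloWorld).
def helloWorld(request):
    # 'request' is a Flask Request object; this sketch ignores it.
    return 'Hello world 1'
```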

You may now get the error "Cloud Build API required to use the runtime selected" - so you need to enable the Cloud Build API.

Deploy - once it succeeds, go to the HTTP trigger URL and see if you can see the message.
Deploy the next version of the function the same way.


After the function is deployed, we can check it via the trigger link.

If we change anything in the code and deploy again to the same function,
it will create a new version - and the older version of the function will not be available anymore.

If you want to use both versions - version 1 and version 2 - at the same time,
the recommended practice is to deploy two separate functions
and manage them via a load balancer, serving some percentage of traffic with function 1 and some with version 2.0.
This way, at the DNS level, you can redirect traffic to different functions.
But Cloud Functions as a service does not offer such functionality
to manage versions: the moment we deploy newer code, our earlier code
simply vanishes, and we have no provision to go back, so relying on that is not a good idea.

App Engine

Cloud Functions - a single-purpose, single-task, single micro-service kind of workload, triggered by some kind of event.
App Engine - when you deploy a full-fledged application.

Deploying an app on App Engine:

For every project you can deploy only one App Engine application.
Once you create the application, you cannot create another one;
you would have to go to a new project.

Create App Engine

 

 


 

On the next page, select:

Resources

Language - Python

Environment - Standard / Flexible

After that, create the files below.

cat main.py
from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

if __name__ == "__main__":
   app.run(host='0.0.0.0',port=8080)

 

cat requirements.txt
Flask==3.0.0

cat app.yaml
runtime: python39

 

And run this cmd in Cloud Shell:

gcloud app deploy

 

 
Now there is a hierarchy:
Dashboard
Services - inside an application you can deploy multiple services
Versions - inside a service there are multiple versions
Instances - each version runs on one or more instances

 

 Update new version in App engine

Change code
vi main.py
from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World version 2.0'

if __name__ == "__main__":
   app.run(host='0.0.0.0',port=8080)

go to cloud shell run below cmd
gcloud app deploy --no-promote --version 2

 

So now, if you want to use both versions, you can use Split traffic.

 

 

 

Cloud Run

It's completely serverless; it can scale from 0 to infinity.
You don't need to manage anything in Cloud Run.
You can deploy your containerized application.
In the App Engine standard environment, we have to worry about a specific runtime.
In Cloud Run, you build your own Docker image first, so with a custom runtime you can deploy anything.

In Container Registry, some images are ones we pushed manually,
and some of them you will see were created by GCP itself, like the gcf images.

 

Create cloud run service

Create service

Options:
Deploy one revision from an existing container image
Continuously deploy new revisions from a source repository

CPU allocation and pricing:
CPU is only allocated during request processing
CPU is always allocated

Autoscaling:
Minimum number of instances 0
Maximum number of instances 100

Ingress:
Allow all traffic
Allow internal traffic and traffic from Cloud Load Balancing
Allow internal traffic only

Authentication:
Allow unauthenticated invocations
Require Authentication
 

Container,Variables & Secrets, Connections, Security
Container port: 8080

Capacity:
Memory: 512MiB
CPU: 1
Request timeout: 300
Maximum requests per container: 80

Execution environment:
Default
First generation
Second generation

Save

 

 after deployment


 

 Deployment of GKE


Create Kubernetes cluster

Create Cluster
GKE Standard

GKE Autopilot



Configure the standard one:
Cluster name - cluster1
Location type:
Zone
Regional


Cluster Basics
Node Pools
Name: default-pool
Size: 2


Nodes
Machine configuration
Series: E2
Machine type:
e2-medium


 

Security
Access Scopes
Allow Default access
Metadata


Create


Once the GKE cluster is created,
you will get a deploy option as well,
and the same thing can be done from Workloads in the GKE UI.

In Workloads
we can deploy in 2 ways:
Existing container image
New container image: image path

Configure:
application name
Namespace: default

After the pod deploys,
you cannot access the service because it is not exposed.
You need to expose it, and specify the service type as well.

 

Rolling update in GKE

Here you need to provide the new version's image URL and:

minimum seconds ready

max surge - 25%

max unavailable - 25%

There is also a KUBECTL option here that shows the whole YAML file, containing all the configuration for the pod.
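For the 25% defaults above, Kubernetes converts the percentages into pod counts: maxSurge rounds up, maxUnavailable rounds down. A quick check of the arithmetic:

```python
import math

# Kubernetes converts percentage settings into pod counts:
# maxSurge rounds UP, maxUnavailable rounds DOWN.
def surge_pods(replicas, pct=25):
    return math.ceil(replicas * pct / 100)

def unavailable_pods(replicas, pct=25):
    return math.floor(replicas * pct / 100)

print(surge_pods(4))        # 1 extra pod allowed during the rollout
print(unavailable_pods(4))  # 1 pod may be down at a time
print(surge_pods(10), unavailable_pods(10))  # 3 2  (ceil(2.5), floor(2.5))
```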

 

Deploy an app to Compute Engine

The old way:
IaaS - Infrastructure as a Service
General-purpose computing machines
2 ways of deployment:

Containerized app:

  •  via a Container-Optimized OS.
  •  via another OS + manual Docker installation.

Non-containerized app:

  •  manually install Apache.
  •  install via a startup script.

What is an instance group?

Deploying an app to multiple machines.

Deploy to an instance group:
Instance template
 a blueprint for all virtual machines
Create instances from the template
Instance group

  •  managed
  •  unmanaged

Load balancer
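What the load balancer does in front of the instance group can be sketched as spreading requests across instances round-robin style (the instance names are hypothetical):

```python
# Sketch of round-robin load balancing across an instance group.
from itertools import cycle

instances = ["vm-1", "vm-2", "vm-3"]   # hypothetical instance names
balancer = cycle(instances)            # endless round-robin iterator

# Six incoming requests get spread evenly across the three instances.
requests = [next(balancer) for _ in range(6)]
print(requests)  # ['vm-1', 'vm-2', 'vm-3', 'vm-1', 'vm-2', 'vm-3']
```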

Difference b/w Continuous Deployment vs Continuous Delivery
Continuous Deployment

  • Fully automated, no manual intervention
  • Code is continuously built & deployed


Continuous Delivery

  • Release to production
  • May involve manual approval
  • It makes sure deliveries are frequent & fast
  • Before continuous delivery, the release frequency was usually once every 3 months
  • Now it is possible to release 5 times a day

Build
Tools like CircleCI, TeamCity, Jenkins - GCP equivalent: Cloud Build

Artifact storage
JFrog Artifactory, Docker Hub - GCP equivalent: Container Registry, Artifact Registry

Deployment
Compute Engine
Kubernetes
App Engine
Cloud Run
Cloud Functions

Source code management
Bitbucket, GitHub, Mercurial - GCP equivalent: Cloud Source Repositories

CICD Pipeline - 1

Create a Docker image & push it to Container Registry

Source code:
Dockerfile
main.py


Cloud Build to build the image

Push the image to the registry

CICD-1

Create a directory in the shell:
mkdir cicd-pipeline

Copy the code into the cicd-pipeline directory:
ls
Dockerfile main.py

Create a source repository in GCP:
repo-1

Push the code to the source repo:
git add *
git commit -m "first commit"
git push -u origin master

Create the Cloud Build pipeline:
cicd-1

Run the build the 1st time manually,
the 2nd time by pushing code to the git repo.

CICD-2

Deploy an app on App Engine

Source code:
app.yaml
main.py
requirements.txt

Cloud Build with
cloudbuild.yaml

Deploy to App Engine

cloudbuild.yaml
steps:
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: 'bash'
  args: ['-c', 'gcloud config set app/cloud_build_timeout 1600 && gcloud app deploy']
options:
  logging: CLOUD_LOGGING_ONLY
timeout: 1600s

Note: if you notice, args above is like a command.

Cloud Builder
Google provides a good number of cloud builders - by Google and by the community as well - and you can also create your own custom builder.
https://cloud.google.com/build/docs/cloud-builders

exam:
builder images



If you deploy your App Engine code via CICD with a new service account,
you will get many errors,
like Logs Writer,
App Engine permissions, etc.
Below are the permissions you need to give the service account you create.


CICD-3

Deploy an app as a Cloud Function

Source code:
main.py
requirements.txt
function-source.zip

Cloud Build with
cloudbuild.yaml

Deploy to Cloud Functions

cloudbuild.yaml
steps:
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  args:
  - gcloud
  - functions
  - deploy
  - function_cid
  - --region=us-central1
  - --source=.
  - --trigger-http
  - --runtime=python37
  - --allow-unauthenticated

In the above, if you see, args holds the command we want to execute.

gcloud functions --help

 

CICD-3

Create a directory in the shell:
mkdir cicd-pipeline

Create a source repo:
repo-3

Copy the code into the repo-3 directory:
ls
main.py  requirements.txt

Create a zip of main.py and requirements.txt,
and write cloudbuild.yaml:
cloudbuild.yaml  function-source.zip

cloudbuild.yaml
steps:
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  args:
  - gcloud
  - functions
  - deploy
  - function_cid
  - --region=us-central1
  - --source=.
  - --trigger-http
  - --runtime=python37
  - --allow-unauthenticated
options:
  logging: CLOUD_LOGGING_ONLY

Create a build trigger in Cloud Build: cicd-3

Push the code to the source repo:
git add *
git commit -m "first commit"
git push -u origin master

Give the below permissions to the IAM service account.

 

 

CICD - 4

Deploy to Cloud Run
