# GCP
## Getting Started
Ref: https://cloud.google.com/dataflow/docs/quickstarts/quickstart-python
1. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project. Note: if you don't plan to keep the resources that you create in this procedure, create a new project instead of selecting an existing one; after you finish these steps, you can delete the project, removing all resources associated with it.
2. Make sure that billing is enabled for your Cloud project.
3. Enable the Dataflow, Compute Engine, Cloud Logging, Cloud Storage, Google Cloud Storage JSON, BigQuery, Cloud Pub/Sub, Cloud Datastore, and Cloud Resource Manager APIs.
Create a service account:
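A minimal sketch with the gcloud CLI, following the linked quickstart; the account name `my-dataflow-sa` and the `PROJECT_ID` placeholder are assumptions:

```sh
# Create the service account (name is an assumed placeholder)
gcloud iam service-accounts create my-dataflow-sa

# Grant it a role on the project; the quickstart grants a broad role,
# but prefer something narrower in real projects
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:my-dataflow-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/owner"
```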
Create a service account key:
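A sketch of the key download, reusing the placeholders above; `key.json` is an assumed file name:

```sh
# Download a JSON key for the service account created above
gcloud iam service-accounts keys create key.json \
    --iam-account=my-dataflow-sa@PROJECT_ID.iam.gserviceaccount.com
```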
Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key. This variable only applies to your current shell session, so if you open a new session, set the variable again.
Example: Linux or macOS
export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
Replace KEY_PATH with the path of the JSON file that contains your service account key.
For example:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/service-account-file.json"
Example: Windows
For PowerShell:
$env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
Replace KEY_PATH with the path of the JSON file that contains your service account key.
For example:
$env:GOOGLE_APPLICATION_CREDENTIALS="C:\Users\username\Downloads\service-account-file.json"
For command prompt:
set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH
Replace KEY_PATH with the path of the JSON file that contains your service account key.
Create a Cloud Storage bucket:
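A sketch using gsutil; BUCKET_NAME and the location are placeholders:

```sh
# Create a bucket; BUCKET_NAME must be globally unique
gsutil mb -l us-central1 gs://BUCKET_NAME
```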
Copy the Google Cloud project ID and the Cloud Storage bucket name. You need these values later in this document.
## Setup
### gcloud CLI
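A typical first-run setup, assuming the Google Cloud SDK is already installed; PROJECT_ID and the region are placeholders:

```sh
# Authenticate with your Google account
gcloud auth login

# Point the CLI at your project
gcloud config set project PROJECT_ID

# Optional: set a default compute region
gcloud config set compute/region us-central1

# Confirm the active configuration
gcloud config list
```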
## IAM
### Roles & Permissions
## Logging
### Structured Logging
https://cloud.google.com/logging/docs/structured-logging
### Log Analytics
https://www.cloudskillsboost.google/focuses/49749?parent=catalog
## BigQuery
## Dataflow
## Storage
### Cost
https://cloud.google.com/storage/pricing#storage-pricing
## GCP Cost
### Estimate
- https://cloud.google.com/products/calculator
## Scheduler
Used to schedule the Dataflow batch job.
- Requires OAuth authentication from a service account
- Requires the https://www.googleapis.com/auth/cloud-platform authorization scope
Ref: https://www.thecodebuzz.com/schedule-dataflow-job-google-cloud-scheduler/
### Cost
- pricing is based solely on the number of jobs
- job execution is not billed; it is the existence of a job that is billed
- $0.10 per job per 31 days (i.e. ~$0.003/job/day)
- the smallest billing unit is a day
Note: A paused job is counted as a job.
### Quota & Limit
### Ways
#### Using HTTP
https://cloud.google.com/sdk/gcloud/reference/scheduler/jobs/update/http
Use this option along with the Dataflow REST API to trigger the job.
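A hedged sketch of such a job; the job name, cron schedule, template path, and service account email are assumed placeholders. It POSTs to the Dataflow templates:launch REST endpoint with OAuth credentials from the service account and the cloud-platform scope noted above:

```sh
gcloud scheduler jobs create http dataflow-batch-trigger \
    --schedule="0 2 * * *" \
    --http-method=POST \
    --uri="https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/us-central1/templates:launch?gcsPath=gs://BUCKET_NAME/templates/my-template" \
    --oauth-service-account-email="my-dataflow-sa@PROJECT_ID.iam.gserviceaccount.com" \
    --oauth-token-scope="https://www.googleapis.com/auth/cloud-platform" \
    --message-body='{"jobName": "scheduled-batch-job", "parameters": {}}'
```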
#### Using PubSub
tbd
#### Using AppEngine HTTP
tbd
#### Terraform script for the same
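A sketch under the same assumptions as the gcloud example above (names, schedule, and template path are placeholders):

```hcl
variable "project_id" {}
variable "region" { default = "us-central1" }
variable "bucket" {}
variable "service_account_email" {}

resource "google_cloud_scheduler_job" "dataflow_batch" {
  name      = "dataflow-batch-trigger"
  schedule  = "0 2 * * *" # daily at 02:00
  time_zone = "Etc/UTC"

  http_target {
    http_method = "POST"
    # Dataflow templates:launch REST endpoint
    uri = "https://dataflow.googleapis.com/v1b3/projects/${var.project_id}/locations/${var.region}/templates:launch?gcsPath=gs://${var.bucket}/templates/my-template"

    # http_target.body must be base64-encoded
    body = base64encode(jsonencode({
      jobName    = "scheduled-batch-job"
      parameters = {}
    }))

    # OAuth token from the service account, with the scope noted above
    oauth_token {
      service_account_email = var.service_account_email
      scope                 = "https://www.googleapis.com/auth/cloud-platform"
    }
  }
}
```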
Ref: https://cloud.google.com/community/tutorials/schedule-dataflow-jobs-with-cloud-scheduler
## Vertex AI
- https://cloud.google.com/vertex-ai/pricing