Schedule production runs
This quickstart walks you through the following steps to schedule production runs in Dataform:
- Create a Dataform repository.
- Grant the required roles.
- Create a release configuration and a workflow configuration.
  Create a production release configuration and set the frequency of creating production compilation results. Then, create a production workflow configuration, select the production release configuration, and set a schedule for running production compilation results.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role. You can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the BigQuery and Dataform APIs.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Additionally, select or create a custom service account to run workflows in BigQuery.
Required roles
To get the permissions that you need to perform all tasks in this tutorial, ask your administrator to grant you the following IAM roles:
- Dataform Admin (roles/dataform.admin) on repositories
- Dataform Editor (roles/dataform.editor) on workspaces and workflow invocations
- Service Account User (roles/iam.serviceAccountUser) on the custom service account
- Project IAM Admin (roles/resourcemanager.projectIamAdmin) on the project
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Grant required roles
To run workflows in BigQuery, you can use a custom service account or your Google Account (Preview). Custom service account credentials are the default option for scheduled runs, and using Google Account user credentials for scheduled runs is discouraged.
To run workflows in BigQuery, your custom service account must have the following required roles:
- BigQuery Data Editor (roles/bigquery.dataEditor) on projects to which Dataform needs both read and write access, which usually includes the project hosting your Dataform repository.
- BigQuery Data Viewer (roles/bigquery.dataViewer) on projects to which Dataform needs read-only access.
- BigQuery Job User (roles/bigquery.jobUser) on the project hosting your Dataform repository.
To let Dataform use your custom service account, the default Dataform service agent must have the following roles on the custom service account resource:
- Service Account Token Creator (roles/iam.serviceAccountTokenCreator)
- Service Account User (roles/iam.serviceAccountUser)
To grant these roles, follow these steps:
In the Google Cloud console, go to the IAM page.
Click Grant access.
In the New principals field, enter your custom service account ID.
In the Select a role menu, select the following roles one by one, using Add another role for each additional role:
- BigQuery Data Editor
- BigQuery Data Viewer
- BigQuery Job User
Click Save.
In the Google Cloud console, go to the Service accounts page.
Select your custom service account.
Go to Principals with access, and then click Grant access.
In the New principals field, enter your default Dataform service agent ID.
Your default Dataform service agent ID is in the following format:
service-PROJECT_NUMBER@gcp-sa-dataform.iam.gserviceaccount.com
In the Select a role list, add the following roles:
- Service Account User
- Service Account Token Creator
Click Save.
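The grants described above can be summarized as data. The following is a minimal sketch, not an official client: it only builds the list of role bindings you would apply, using the role names and service agent format from this page. The project number, the custom service account email, and the helper names are hypothetical placeholders.

```python
# Roles come from the text above. Helper names and sample values are hypothetical.

# Roles the custom service account needs on the project(s).
PROJECT_ROLES = [
    "roles/bigquery.dataEditor",
    "roles/bigquery.dataViewer",
    "roles/bigquery.jobUser",
]

# Roles the default Dataform service agent needs on the custom service account.
SERVICE_ACCOUNT_ROLES = [
    "roles/iam.serviceAccountTokenCreator",
    "roles/iam.serviceAccountUser",
]


def dataform_service_agent(project_number: str) -> str:
    """Return the default Dataform service agent ID for a project number."""
    return f"service-{project_number}@gcp-sa-dataform.iam.gserviceaccount.com"


def bindings(member_email: str, roles: list) -> list:
    """Build IAM-style bindings (role plus members) for one service account."""
    member = f"serviceAccount:{member_email}"
    return [{"role": role, "members": [member]} for role in roles]


def grant_plan(project_number: str, custom_sa_email: str) -> dict:
    """Collect both sets of bindings described in this section."""
    return {
        "on_project": bindings(custom_sa_email, PROJECT_ROLES),
        "on_custom_service_account": bindings(
            dataform_service_agent(project_number), SERVICE_ACCOUNT_ROLES
        ),
    }


if __name__ == "__main__":
    plan = grant_plan(
        "123456789012",  # hypothetical project number
        "dataform-runner@my-project.iam.gserviceaccount.com",  # hypothetical
    )
    for scope, items in plan.items():
        for item in items:
            print(scope, item["role"], item["members"][0])
```

Printing the plan before you click through the console can help confirm that each role lands on the intended resource: BigQuery roles on the project, and the two service account roles on the custom service account itself.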
Create a Dataform repository
In the Google Cloud console, go to the Dataform page.
Click Create repository.
On the Create repository page, do the following:
In the Repository ID field, enter quickstart-production.
In the Region list, select europe-west4.
In the Service account field, click Enter manually, and then enter the name of your custom service account. Ensure you enter your custom service account in this field.
Click Create.
Click Go to repositories.
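If you later script against the repository instead of using the console, you typically address it by a hierarchical resource name. The sketch below assumes the usual Google Cloud pattern of projects/PROJECT/locations/LOCATION/repositories/REPOSITORY, with release and workflow configurations as child collections. The project ID my-project and the helper names are hypothetical, so check the Dataform API reference before relying on these paths.

```python
# Assumed resource-name patterns; verify against the Dataform API reference.

def repository_name(project_id: str, location: str, repository_id: str) -> str:
    """Build the assumed resource name of a Dataform repository."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/repositories/{repository_id}"
    )


def release_config_name(repository: str, release_id: str) -> str:
    """Build the assumed resource name of a release configuration."""
    return f"{repository}/releaseConfigs/{release_id}"


def workflow_config_name(repository: str, config_id: str) -> str:
    """Build the assumed resource name of a workflow configuration."""
    return f"{repository}/workflowConfigs/{config_id}"


if __name__ == "__main__":
    # Repository ID and region match the values entered in this quickstart.
    repo = repository_name("my-project", "europe-west4", "quickstart-production")
    print(repo)
    print(release_config_name(repo, "production"))
    print(workflow_config_name(repo, "production"))
```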
Create a release configuration and workflow configuration
To create production compilation results of the quickstart-production
repository and schedule a run of production tables, follow these steps:
In the Google Cloud console, go to the Dataform page.
Click quickstart-production.
Click Releases & scheduling, then click Create production release.
In the Create release configuration pane, configure the following settings:
- In the Release ID field, enter production.
- In the Git commitish field, leave the default value main.
- In the Schedule frequency section, in the Repeats menu, select Custom.
- In the Custom schedule field, enter 0 16 * * *.
- In the Timezone menu, select a UTC+1 timezone, for example, Central European Standard Time (CET).
  Every day at 4 PM UTC+1, Dataform compiles the quickstart-production repository and applies the compilation settings configured in this release configuration to create production compilation results.
Click Create.
The production release configuration creates a compilation result of the entire quickstart-production repository every day at 4 PM UTC+1.
Ensure that you're on the Releases & scheduling tab. Go to the Workflow configurations section and click Create.
In the Create workflow configuration pane, configure the following settings:
- In the Configuration ID field, enter production.
- In the Release configuration menu, select production.
- In the Schedule frequency section, in the Repeats menu, select Custom.
- In the Custom schedule field, enter 0 17 * * *.
- In the Timezone menu, select a UTC+1 timezone, for example, Central European Standard Time (CET).
  Every day at 5 PM UTC+1, Dataform runs the latest production compilation result of the quickstart-production repository.
- Click All actions.
  Dataform runs all the workflow actions in the production compilation result.
Click Create.
The production workflow configuration runs the entire latest compilation result created by the production release configuration every day at 5 PM UTC+1.
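To see how the two schedules interact, it can help to compute the next firing time of each cron expression. The sketch below is an illustration only, not how Dataform evaluates schedules: it handles just daily expressions of the form MINUTE HOUR * * *, and it models the UTC+1 timezone as a fixed offset, which ignores daylight saving time. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset, an assumption that ignores daylight saving time.
UTC_PLUS_1 = timezone(timedelta(hours=1))

RELEASE_CRON = "0 16 * * *"   # production release configuration
WORKFLOW_CRON = "0 17 * * *"  # production workflow configuration


def parse_daily_cron(expr: str) -> tuple:
    """Return (hour, minute) for a daily cron expression 'M H * * *'."""
    minute, hour, day_of_month, month, day_of_week = expr.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("only daily schedules 'M H * * *' are supported")
    return int(hour), int(minute)


def next_run(expr: str, now: datetime) -> datetime:
    """Return the first scheduled time strictly after `now`."""
    hour, minute = parse_daily_cron(expr)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    now = datetime.now(UTC_PLUS_1)
    print("next compilation: ", next_run(RELEASE_CRON, now))
    print("next workflow run:", next_run(WORKFLOW_CRON, now))
```

Scheduling the workflow configuration one hour after the release configuration leaves time for the compilation result to be created, so each 5 PM run picks up the compilation result produced at 4 PM the same day.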
View past production compilation results
To view past scheduled production compilation results, follow these steps:
In the Google Cloud console, go to the Dataform page.
Select the quickstart-production repository.
Click Releases & scheduling.
In the Release configurations section, click production.
View past production workflow runs
To view past production workflow runs, follow these steps:
In the Google Cloud console, go to the Dataform page.
Select the quickstart-production repository.
Click Workflow Execution Logs.
Select a workflow run to see more detailed information, including the status of each action and any logs.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the dataset created in BigQuery
To avoid incurring charges for BigQuery assets, follow these
steps to delete the dataset called dataform_production:
In the Google Cloud console, go to the BigQuery page.
In the Explorer panel, expand your project and select dataform_production.
Click the Actions menu, and then select Delete.
In the Delete dataset dialog, enter delete, and then click Delete.
Delete the Dataform release configuration
There are no costs associated with creating Dataform release
configurations. However, if you want to delete the production release
configuration, follow these steps:
In the Google Cloud console, go to the Dataform page.
Click quickstart-production.
Click Releases & scheduling, and go to the Release configurations section.
By the production release configuration, click the More menu, and then click Delete.
In the Delete release configuration dialog, click Delete.
Delete the Dataform workflow configuration
To avoid incurring charges for BigQuery assets, follow these
steps to delete the Dataform production workflow configuration:
In the Google Cloud console, go to the Dataform page.
Click quickstart-production.
Click Releases & scheduling, and go to the Workflow configurations section.
By the production workflow configuration, click the More menu, and then click Delete.
In the Delete release configuration dialog, click Delete.
Delete the Dataform repository
There are no costs associated with creating Dataform repositories. However, if you want to delete a repository and all its contents, follow these steps:
In the Google Cloud console, go to the Dataform page.
By quickstart-production, click the More menu, and then select Delete.
In the Delete repository window, enter the name of the repository to confirm deletion.
To confirm, click Delete.
What's next
- To learn more about service accounts, see About custom service accounts and Dataform service agents.
- To learn more about code lifecycle in Dataform, see Introduction to code lifecycle in Dataform.
- To learn more about best practices for the workflow lifecycle in Dataform, see Best practices for the workflow lifecycle.
- To learn more about release configurations in Dataform, see Create a release configuration.
- To learn more about workflow configurations in Dataform, see Schedule runs with workflow configurations.