In this tutorial, we'll walk through how a platform team can build a guided experience for creating modern, streaming data pipelines. Empower data engineers to spin up Pub/Sub, Cloud Run Functions, and Cloud Storage via a guided UI that emits infrastructure as code.
To follow along with this guide, sign up for a free Resourcely account here. Once you have signed up, navigate to the Foundry.
Goals and outcomes
Consider a company with many data engineers. Perhaps they are using an expensive ETL tool, or they want to move towards event-based data pipelines. In this scenario, these engineers want to deploy new data pipelines but may not be cloud infrastructure experts.
This company's platform team can streamline configuration of these data pipelines by using Resourcely Blueprints and Guardrails. The result will be a guided, UI-based experience for developers that will allow them to generate properly configured infrastructure as code.
Developers: deploy faster, on their own, without mistakes.
Platform teams: create trusted patterns, without being stuck in support & operations work.
Architecture
We will automate data pipeline creation with the following GCP stack:
Pub/Sub for a message queue
Cloud Run Functions for custom code computation and transformation
Cloud Storage to store the results and Cloud Run Function
A similar set of Blueprints and Guardrails could apply to the equivalent AWS services: SNS, Lambda, and S3.
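For reference, here is a minimal sketch of that AWS analog in Terraform. All names, the runtime, and the handler entry point are illustrative assumptions, not part of this tutorial's GCP example:
resource "aws_sns_topic" "pipeline" {
  name = "pipeline-topic"
}

resource "aws_s3_bucket" "results" {
  bucket = "example-pipeline-results" # Bucket names must be globally unique
}

resource "aws_iam_role" "lambda" {
  name = "pipeline-lambda-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

resource "aws_lambda_function" "transform" {
  function_name = "pipeline-transform"
  role          = aws_iam_role.lambda.arn
  runtime       = "nodejs18.x"
  handler       = "index.handler" # Assumed entry point inside the zipped source
  filename      = "function-source.zip"
}

# Allow SNS to invoke the function, then subscribe the function to the topic
resource "aws_lambda_permission" "sns" {
  statement_id  = "AllowSNSInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.transform.function_name
  principal    = "sns.amazonaws.com"
  source_arn    = aws_sns_topic.pipeline.arn
}

resource "aws_sns_topic_subscription" "lambda" {
  topic_arn = aws_sns_topic.pipeline.arn
  protocol  = "lambda"
  endpoint  = aws_lambda_function.transform.arn
}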
Terraform Example
The following Terraform code creates a one-off data pipeline. Using it directly would require developers who are comfortable with Terraform and the cloud service options that can be set within it.
Original Terraform code
terraform {
required_providers {
google = {
source = "hashicorp/google"
version = ">= 4.34.0"
}
}
}
resource "random_id" "bucket_prefix" {
byte_length = 8
}
resource "google_service_account" "default" {
account_id = "test-gcf-sa"
display_name = "Test Service Account"
}
resource "google_pubsub_topic" "default" {
name = "functions2-topic"
}
resource "google_storage_bucket" "default" {
name = "${random_id.bucket_prefix.hex}-gcf-source" # Every bucket name must be globally unique
location = "US"
uniform_bucket_level_access = true
}
data "archive_file" "default" {
type = "zip"
output_path = "/tmp/function-source.zip"
source_dir = "function-source/"
}
resource "google_storage_bucket_object" "default" {
name = "function-source.zip"
bucket = google_storage_bucket.default.name
source = data.archive_file.default.output_path # Path to the zipped function source code
}
resource "google_cloudfunctions2_function" "default" {
name = "function"
location = "us-central1"
description = "a new function"
build_config {
runtime = "nodejs16"
entry_point = "helloPubSub" # Set the entry point
environment_variables = {
BUILD_CONFIG_TEST = "build_test"
}
source {
storage_source {
bucket = google_storage_bucket.default.name
object = google_storage_bucket_object.default.name
}
}
}
service_config {
max_instance_count = 3
min_instance_count = 1
available_memory = "256M"
timeout_seconds = 60
environment_variables = {
SERVICE_CONFIG_TEST = "config_test"
}
ingress_settings = "ALLOW_INTERNAL_ONLY"
all_traffic_on_latest_revision = true
service_account_email = google_service_account.default.email
}
event_trigger {
trigger_region = "us-central1"
event_type = "google.cloud.pubsub.topic.v1.messagePublished"
pubsub_topic = google_pubsub_topic.default.id
retry_policy = "RETRY_POLICY_RETRY"
}
}
This code creates the following resources:
Google Service Account
Google Pub/Sub Topic
Google Storage Bucket
Google Storage Bucket Object
Google Cloud Run Function
Inside the bucket, a .zip file is stored that contains the code the Cloud Run Function will execute. The Terraform itself is very rigid as-is:
Assumes node.js code for the function
Hard codes a variety of parameters
256 megabytes of memory
60 second timeout
Maximum of 3 instances
Hardcoded environment variables
Converting to Resourcely Blueprint
We will take this code and turn it into a dynamic, interactive template that developers can deploy from a UI. Here's a preview of what the generated UI will look like:
Let's walk through our resources step by step and convert them into a Resourcely Blueprint!
General Purpose Blueprint Code
First, we'll add some general purpose variables to our Blueprint. Resourcely Blueprint code begins with frontmatter, where variables and their tags are defined, followed by templated Terraform that references those frontmatter variables.
Here, we create a name and location variable and a special __name constant. The desc, required, suggest, and group tags will all impact behavior in the resulting generated UI:
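---
constants:
  __name: "{{ name }}_{{ __guid }}"
variables:
  name:
    desc: |
      **The base name for all resources in this Blueprint.**
      - Must be unique within the GCP project
    required: true
    suggest: "my-cloud-function"
    group: General
  location:
    desc: |
      **The location for all GCP resources in this Blueprint.**
      - Must be a valid GCP region
    required: true
    suggest: "us-central1"
    group: General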
Variables can be referenced multiple times throughout a Blueprint. You'll see {{ name }} and {{ __name }} used repeatedly; each reuse draws from the same single UI field.
At the end of the frontmatter we also define groups, which logically organize the input fields:
Groups in frontmatter
groups:
General:
desc: |
General configuration for the Blueprint, including resource names and locations.
order: 1
Service Account:
desc: |
Configuration related to the Google Cloud service account used by the Cloud Function.
order: 2
Pub/Sub:
desc: |
Configuration for the Pub/Sub topic that triggers the Cloud Function.
order: 3
Storage:
desc: |
Configuration for the storage bucket used to store Cloud Function source code.
order: 4
Archive:
desc: |
Configuration for archiving the Cloud Function source code into a deployable format.
order: 5
Cloud Function:
desc: |
Configuration for the Cloud Function, including runtime settings and environment variables.
order: 6
Advanced:
desc: |
Advanced configuration options for testing, ingress settings, and retry policies.
order: 7
Service Accounts
We'll now move to the service account resource. A service account gives the Cloud Run Function its own identity within GCP, keeping its permissions separate from those of other resources.
// Frontmatter
service_account_name:
desc: |
**The name of the service account to be created.**
- Must be unique within the project
required: true
suggest: "test-gcf-sa"
group: Service Account
// Inline variable reference
resource "google_service_account" "{{ __name }}" {
account_id = "{{ service_account_name }}"
display_name = "Service Account for {{ name }}"
}
Notice that the google_service_account resource references {{ __name }} for a globally unique name. {{ service_account_name }}, which we also defined in the frontmatter, is then referenced for the account_id parameter.
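For illustration, if a developer entered my-cloud-function as the name and accepted the suggested account ID, the rendered Terraform would look roughly like this (the a1b2c3 suffix stands in for the generated GUID, whose exact format is an assumption):
resource "google_service_account" "my-cloud-function_a1b2c3" {
  account_id   = "test-gcf-sa" # From the service_account_name field
  display_name = "Service Account for my-cloud-function"
}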
Here's what the UI looks like for the variable we defined:
Pub/Sub
Pub/Sub is relatively simple, although we'll come back to this later to make it more secure.
// Frontmatter
pubsub_topic_name:
desc: |
**The name of the Pub/Sub topic to be created.**
- Must be unique within the project
required: true
suggest: "functions2-topic"
group: Pub/Sub
// Inline
resource "google_pubsub_topic" "{{ __name }}" {
name = "{{ pubsub_topic_name }}"
}
resource "google_pubsub_topic" "default" {
name = "functions2-topic"
}
Note that we've introduced a single variable called pubsub_topic_name. Our description and suggestion give the user critical context they may have been missing:
Name must be unique, but only within the project
The suggestion lets the user know the expected format
Storage Bucket and Bucket Object
Our Storage Bucket is for hosting function results, as well as the function code that will be executed. The original Terraform code had many hard-coded fields without much guidance:
File type and path for the function
Location
// Frontmatter
bucket_prefix:
desc: |
**The prefix for the storage bucket name.**
- Ensures globally unique bucket names
required: true
suggest: "random-prefix"
group: Storage
bucket_object_name:
desc: |
**The name of the object stored in the bucket.**
- Typically the zip file for Cloud Function source code
required: true
suggest: "function-source.zip"
group: Storage
archive_file_type:
desc: |
**The archive file type.**
- Must be a valid `zip`
required: true
suggest: "zip"
group: Archive
archive_output_path:
desc: |
**The output path for the archived file.**
- Path where the archive will be stored
required: true
suggest: "/tmp/function-source.zip"
group: Archive
archive_source_dir:
desc: |
**The source directory for the archived file.**
- Path to the directory containing the function source code
required: true
suggest: "function-source/"
group: Archive
// Inline
resource "google_storage_bucket" "{{ __name }}" {
name = "{{ bucket_prefix }}-gcf-source"
location = "{{ location }}"
uniform_bucket_level_access = true
}
data "archive_file" "{{ __name }}" {
type = "{{ archive_file_type }}"
output_path = "{{ archive_output_path }}"
source_dir = "{{ archive_source_dir }}"
}
resource "google_storage_bucket_object" "{{ __name }}" {
name = "{{ bucket_object_name }}"
bucket = google_storage_bucket.{{ __name }}.name
source = data.archive_file.{{ __name }}.output_path
}
resource "google_storage_bucket" "default" {
name = "${random_id.bucket_prefix.hex}-gcf-source" # Every bucket name must be globally unique
location = "US"
uniform_bucket_level_access = true
}
data "archive_file" "default" {
type = "zip"
output_path = "/tmp/function-source.zip"
source_dir = "function-source/"
}
resource "google_storage_bucket_object" "default" {
name = "function-source.zip"
bucket = google_storage_bucket.default.name
source = data.archive_file.default.output_path # Path to the zipped function source code
}
The Resourcely Blueprint code introduces variables for input while giving guidance to the user. Note also the reuse of {{ location }}, a variable defined earlier; this way, the user doesn't have to choose a location multiple times.
Cloud Run Function
Finally, we come to our Cloud Run Function, where we introduce the most guidance and flexibility for users. In this section, we focus on enumerating the options users can choose from, both in description fields and in the form's pick lists.
Function runtime: A user may not be familiar with node.js, and may not know how to use Python code for their function instead.
Timeout: Flexibility to choose a timeout, and guidance on the length of time
Function entry point: Ensuring the user knows that their code needs an entry point, and that it matches their code
Available memory: The amount of RAM to dedicate to the function, with an indicative format to guide users
Ingress: Optional ingress settings, and the available options
Retry policy: The policy on retries, and the two possible options
Environment variables: Optional environment variables the users can take advantage of in their code.
// Frontmatter
function_runtime:
desc: |
**The runtime for the Cloud Function.**
- Default: `nodejs16`
- Other options: `nodejs14`, `python39`, `go116`, ...
- See [GCP Supported Runtimes](https://cloud.google.com/functions/docs/concepts/exec#runtimes)
required: true
suggest: "nodejs16"
group: Cloud Function
function_entry_point:
desc: |
**The entry point for the Cloud Function.**
- Matches the exported function in your source code
required: true
suggest: "helloPubSub"
group: Cloud Function
available_memory:
desc: |
**The memory available for the Cloud Function.**
- Default: `256M`
- Other options: `128M`, `512M`, `1G`, ...
required: true
suggest: "256M"
group: Cloud Function
timeout_seconds:
desc: |
**The timeout for the Cloud Function.**
- Default: `60`
- Maximum: `540`
required: true
suggest: 60
group: Cloud Function
build_config_test:
desc: |
**Test environment variable for the build configuration.**
- Used for advanced build testing
- Feel free to change to match your function
required: false
suggest: "build_test"
group: Advanced
service_config_test:
desc: |
**Test environment variable for the service configuration.**
- Used for advanced service testing
- Feel free to change to match your function
required: false
suggest: "config_test"
group: Advanced
ingress_settings:
desc: |
**Ingress settings for the Cloud Function.**
- Default: `ALLOW_INTERNAL_ONLY`
- Other options: `ALLOW_ALL`, `ALLOW_INTERNAL_AND_GCLB`
required: false
suggest: "ALLOW_INTERNAL_ONLY"
group: Advanced
retry_policy:
desc: |
**Retry policy for the Cloud Function event trigger.**
- Default: `RETRY_POLICY_RETRY`
- Other options: `RETRY_POLICY_DO_NOT_RETRY`
required: false
suggest: "RETRY_POLICY_RETRY"
group: Advanced
// Inline
resource "google_cloudfunctions2_function" "{{ __name }}" {
name = "{{ name }}"
location = "{{ location }}"
description = "A new function for {{ name }}"
build_config {
runtime = "{{ function_runtime }}"
entry_point = "{{ function_entry_point }}"
environment_variables = {
BUILD_CONFIG_TEST = "{{ build_config_test }}"
}
source {
storage_source {
bucket = google_storage_bucket.{{ __name }}.name
object = google_storage_bucket_object.{{ __name }}.name
}
}
}
service_config {
max_instance_count = 3
min_instance_count = 1
available_memory = "{{ available_memory }}"
timeout_seconds = {{ timeout_seconds }}
environment_variables = {
SERVICE_CONFIG_TEST = "{{ service_config_test }}"
}
ingress_settings = "{{ ingress_settings }}"
all_traffic_on_latest_revision = true
service_account_email = google_service_account.{{ __name }}.email
}
event_trigger {
trigger_region = "{{ location }}"
event_type = "google.cloud.pubsub.topic.v1.messagePublished"
pubsub_topic = google_pubsub_topic.{{ __name }}.id
retry_policy = "{{ retry_policy }}"
}
}
Putting all of our Blueprint code together looks like the following. You can paste it directly into the Resourcely Foundry to publish it and get started immediately.
Full Blueprint Code
---
constants:
__name: "{{ name }}_{{ __guid }}"
variables:
name:
desc: |
**The base name for all resources in this Blueprint.**
- Must be unique within the GCP project
required: true
suggest: "my-cloud-function"
group: General
location:
desc: |
**The location for all GCP resources in this Blueprint.**
- Must be a valid GCP region
required: true
suggest: "us-central1"
group: General
service_account_name:
desc: |
**The name of the service account to be created.**
- Must be unique within the project
required: true
suggest: "test-gcf-sa"
group: Service Account
pubsub_topic_name:
desc: |
**The name of the Pub/Sub topic to be created.**
- Must be unique within the project
required: true
suggest: "functions2-topic"
group: Pub/Sub
bucket_prefix:
desc: |
**The prefix for the storage bucket name.**
- Ensures globally unique bucket names
required: true
suggest: "random-prefix"
group: Storage
bucket_object_name:
desc: |
**The name of the object stored in the bucket.**
- Typically the zip file for Cloud Function source code
required: true
suggest: "function-source.zip"
group: Storage
archive_file_type:
desc: |
**The archive file type.**
- Must be a valid `zip`
required: true
suggest: "zip"
group: Archive
archive_output_path:
desc: |
**The output path for the archived file.**
- Path where the archive will be stored
required: true
suggest: "/tmp/function-source.zip"
group: Archive
archive_source_dir:
desc: |
**The source directory for the archived file.**
- Path to the directory containing the function source code
required: true
suggest: "function-source/"
group: Archive
function_runtime:
desc: |
**The runtime for the Cloud Function.**
- Default: `nodejs16`
- Other options: `nodejs14`, `python39`, `go116`, ...
- See [GCP Supported Runtimes](https://cloud.google.com/functions/docs/concepts/exec#runtimes)
required: true
suggest: "nodejs16"
group: Cloud Function
function_entry_point:
desc: |
**The entry point for the Cloud Function.**
- Matches the exported function in your source code
required: true
suggest: "helloPubSub"
group: Cloud Function
available_memory:
desc: |
**The memory available for the Cloud Function.**
- Default: `256M`
- Other options: `128M`, `512M`, `1G`, ...
required: true
suggest: "256M"
group: Cloud Function
timeout_seconds:
desc: |
**The timeout for the Cloud Function.**
- Default: `60`
- Maximum: `540`
required: true
suggest: 60
group: Cloud Function
build_config_test:
desc: |
**Test environment variable for the build configuration.**
- Used for advanced build testing
- Feel free to change to match your function
required: false
suggest: "build_test"
group: Advanced
service_config_test:
desc: |
**Test environment variable for the service configuration.**
- Used for advanced service testing
- Feel free to change to match your function
required: false
suggest: "config_test"
group: Advanced
ingress_settings:
desc: |
**Ingress settings for the Cloud Function.**
- Default: `ALLOW_INTERNAL_ONLY`
- Other options: `ALLOW_ALL`, `ALLOW_INTERNAL_AND_GCLB`
required: false
suggest: "ALLOW_INTERNAL_ONLY"
group: Advanced
retry_policy:
desc: |
**Retry policy for the Cloud Function event trigger.**
- Default: `RETRY_POLICY_RETRY`
- Other options: `RETRY_POLICY_DO_NOT_RETRY`
required: false
suggest: "RETRY_POLICY_RETRY"
group: Advanced
groups:
General:
desc: |
General configuration for the Blueprint, including resource names and locations.
order: 1
Service Account:
desc: |
Configuration related to the Google Cloud service account used by the Cloud Function.
order: 2
Pub/Sub:
desc: |
Configuration for the Pub/Sub topic that triggers the Cloud Function.
order: 3
Storage:
desc: |
Configuration for the storage bucket used to store Cloud Function source code.
order: 4
Archive:
desc: |
Configuration for archiving the Cloud Function source code into a deployable format.
order: 5
Cloud Function:
desc: |
Configuration for the Cloud Function, including runtime settings and environment variables.
order: 6
Advanced:
desc: |
Advanced configuration options for testing, ingress settings, and retry policies.
order: 7
---
# Service Account
resource "google_service_account" "{{ __name }}" {
account_id = "{{ service_account_name }}"
display_name = "Service Account for {{ name }}"
}
# Pub/Sub Topic
resource "google_pubsub_topic" "{{ __name }}" {
name = "{{ pubsub_topic_name }}"
}
# Storage Bucket
resource "google_storage_bucket" "{{ __name }}" {
name = "{{ bucket_prefix }}-gcf-source"
location = "{{ location }}"
uniform_bucket_level_access = true
}
# Archive File
data "archive_file" "{{ __name }}" {
type = "{{ archive_file_type }}"
output_path = "{{ archive_output_path }}"
source_dir = "{{ archive_source_dir }}"
}
# Storage Bucket Object
resource "google_storage_bucket_object" "{{ __name }}" {
name = "{{ bucket_object_name }}"
bucket = google_storage_bucket.{{ __name }}.name
source = data.archive_file.{{ __name }}.output_path
}
# Cloud Function
resource "google_cloudfunctions2_function" "{{ __name }}" {
name = "{{ name }}"
location = "{{ location }}"
description = "A new function for {{ name }}"
build_config {
runtime = "{{ function_runtime }}"
entry_point = "{{ function_entry_point }}"
environment_variables = {
BUILD_CONFIG_TEST = "{{ build_config_test }}"
}
source {
storage_source {
bucket = google_storage_bucket.{{ __name }}.name
object = google_storage_bucket_object.{{ __name }}.name
}
}
}
service_config {
max_instance_count = 3
min_instance_count = 1
available_memory = "{{ available_memory }}"
timeout_seconds = {{ timeout_seconds }}
environment_variables = {
SERVICE_CONFIG_TEST = "{{ service_config_test }}"
}
ingress_settings = "{{ ingress_settings }}"
all_traffic_on_latest_revision = true
service_account_email = google_service_account.{{ __name }}.email
}
event_trigger {
trigger_region = "{{ location }}"
event_type = "google.cloud.pubsub.topic.v1.messagePublished"
pubsub_topic = google_pubsub_topic.{{ __name }}.id
retry_policy = "{{ retry_policy }}"
}
}
Resulting UI
We now have a fully functioning form that developers can interact with. As covered above, the form has extensive guidance around possible input values, minimums and maximums, formatting, and other tips. When a user fills out this form, their infrastructure as code (and data pipeline!) will be created for them automatically and deployed using your existing CI/CD process.
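To make this concrete, here is a sketch of part of the Terraform such a submission might emit, assuming the developer accepted all the suggested values (the a1b2c3 suffix is again a made-up stand-in for the generated GUID):
resource "google_pubsub_topic" "my-cloud-function_a1b2c3" {
  name = "functions2-topic"
}

resource "google_storage_bucket" "my-cloud-function_a1b2c3" {
  name                        = "random-prefix-gcf-source"
  location                    = "us-central1"
  uniform_bucket_level_access = true
}

# The service account, archive, bucket object, and Cloud Run Function
# resources are rendered from the template in the same way.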
Adding Guardrails
Now that we have created a streamlined configuration experience, we have unblocked data engineers to create their own data pipelines.
However, you may also want to put stricter controls in place. What if a developer wants to deploy Terraform outside of Resourcely, or you want to require approval for using a language other than node.js in your Cloud Run Function?
If we want to restrict which GCP locations can be used for our data pipeline, we can enforce that with Guardrails. The following Guardrail can be published using the Resourcely Foundry:
GUARDRAIL "[Misc] GCP Allowed Regions"
WHEN google_*
REQUIRE region IN ["US-CENTRAL1"]
OVERRIDE WITH APPROVAL @default
Note the wildcard in the WHEN clause: the location restriction will be enforced for all Google resources. The OVERRIDE WITH APPROVAL clause lets a designated approver (here, @default) grant exceptions.
This Guardrail will manifest itself in two ways:
Exposed as part of the Blueprint form. If a developer wants to deviate, they need to "unlock" the Guardrail.
The region value will be checked during CI, and PRs that don't meet the requirement (i.e., where the Guardrail was unlocked in the form and the value changed) will be blocked and require review
Cloud Run Function Runtime
Guardrails are incredibly flexible; that's their beauty. We can also create a Guardrail that restricts the runtime used specifically for Cloud Run Functions.
GUARDRAIL "Require Python 3.9 for Cloud Run Functions"
WHEN google_cloudfunctions2_function
REQUIRE build_config.runtime = "python39"
This Guardrail prevents the user from selecting anything but Python 3.9. Note that it conflicts with our Blueprint's nodejs16 suggestion, so in practice you would update that suggest value to python39 as well:
Conclusion
Engineers of all types are looking for tools that help them move faster. Cloud infrastructure is a complex, nuanced domain that requires expertise and guidance, usually provided by platform teams.
With Resourcely, we turned a potentially confusing Terraform example into a guided experience, transforming data pipeline deployment from a headache into a breeze.