Last updated
In this tutorial, we'll walk through how a platform team can build a guided experience for creating modern, streaming data pipelines. Empower data engineers to spin up Pub/Sub, Cloud Run Functions, and Cloud Storage via a guided UI that emits infrastructure as code.
To follow along with this guide, sign up for a free Resourcely account. Once you have signed up, navigate to the Foundry.
Consider a company with many data engineers. Perhaps they are using an expensive ETL tool, or they want to move towards event-based data pipelines. In this scenario, these engineers want to deploy new data pipelines but may not be cloud infrastructure experts.
This company's platform team can streamline configuration of these data pipelines using Resourcely Blueprints and Guardrails. The result is a guided, UI-based experience that lets developers generate properly configured infrastructure as code.
Developers: deploy faster, on their own, without mistakes.
Platform teams: create trusted patterns, without being stuck in support & operations work.
We will automate data pipeline creation with the following GCP stack:
Pub/Sub for a message queue
Cloud Run Functions for custom code computation and transformation
Cloud Storage to store the results, as well as the Cloud Run Function's code
A similar set of Blueprints and Guardrails could apply to AWS services: SNS, Lambdas, and S3.
This code creates the following Resources:
Google Service Accounts
Google Pub/Sub Topic
Google Storage Bucket
Google Storage Bucket Object
Google Cloud Run Function
Inside the bucket, a .zip file is stored that contains the code the Cloud Run Function will execute. This code is very rigid as-is:
Assumes node.js code for the function
Hard codes a variety of parameters
256 megabytes of memory
60 second timeout
Maximum of 3 instances
Hardcoded environment variables
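A minimal sketch of rigid Terraform along these lines (resource names, the source path, runtime version, and environment variables are illustrative assumptions; the service account and Pub/Sub topic are omitted for brevity):

```hcl
resource "google_storage_bucket" "pipeline" {
  name     = "pipeline-bucket"
  location = "US-WEST1" # hardcoded location
}

resource "google_storage_bucket_object" "function_source" {
  name   = "function-source.zip" # assumes node.js code zipped at this path
  bucket = google_storage_bucket.pipeline.name
  source = "./function-source.zip"
}

resource "google_cloudfunctions2_function" "pipeline" {
  name     = "pipeline-function"
  location = "us-west1"

  build_config {
    runtime     = "nodejs20" # assumes node.js
    entry_point = "handler"
    source {
      storage_source {
        bucket = google_storage_bucket.pipeline.name
        object = google_storage_bucket_object.function_source.name
      }
    }
  }

  service_config {
    available_memory   = "256M" # hardcoded: 256 megabytes of memory
    timeout_seconds    = 60     # hardcoded: 60 second timeout
    max_instance_count = 3      # hardcoded: maximum of 3 instances
    environment_variables = {
      LOG_LEVEL = "info" # hardcoded environment variables
    }
  }
}
```

Every value a team might reasonably want to change is baked into the file, which is exactly what the Blueprint below will fix.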
We will take this code and turn it into a dynamic, interactive template where developers can use a UI to deploy. Here's a preview of what the UI we will generate will look like:
Let's walk through our resources step by step and convert them into a Resourcely Blueprint!
First, we'll add some general-purpose variables to our Blueprint. Resourcely Blueprint code begins with frontmatter, where variables and their tags are defined, followed by templated Terraform that references those frontmatter variables.
Here, we create `name` and `location` variables and a special `__name` constant. The `desc`, `required`, `suggest`, and `group` tags will all impact behavior in the resulting generated UI:
Variables can be referenced multiple times throughout a Blueprint. You'll see `{{ name }}` and `{{ __name }}`, both of which use the input from this single UI field.
At the end of the frontmatter, we also define groups, which logically organize the input fields:
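As a rough sketch, frontmatter carrying these variables, tags, and groups might look like the following (this is illustrative of the structure described above, not verified Resourcely syntax; consult the Resourcely docs for the exact grammar):

```yaml
constants:
  __name: "{{ name }}_{{ location }}" # globally unique name derived from user input

variables:
  name:
    desc: "Base name applied to every resource in this pipeline"
    required: true
    group: General
  location:
    desc: "GCP location to deploy the pipeline into"
    suggest: "us-west1"
    required: true
    group: General

groups:
  General:
    order: 1
```

The `desc` text becomes field help in the generated form, `suggest` pre-populates a sensible default, and `group` controls which section of the form the field appears in.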
We'll now move to the service account resource, which is used to logically separate different resources within GCP.
Notice that the `google_service_account` resource references `{{ __name }}` for a globally unique name. `{{ service_account_name }}`, which we also defined in the frontmatter, is then referenced for the `account_id` parameter.
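Putting those pieces together, the templated resource looks along these lines (the `display_name` value is an illustrative addition):

```hcl
resource "google_service_account" "{{ __name }}" {
  account_id   = "{{ service_account_name }}" # user input from the form
  display_name = "Service account for {{ name }}"
}
```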
Here's what the UI looks like for the variable we defined:
Pub/Sub is relatively simple, although we'll come back to this later to make it more secure.
Note that we've introduced a single variable called `pubsub_topic_name`. Our description and suggestion give the user critical context they may have been missing:
Name must be unique, but only within the project
The suggestion lets the user know the expected format
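The templated resource itself stays small, along these lines:

```hcl
resource "google_pubsub_topic" "{{ __name }}" {
  name = "{{ pubsub_topic_name }}" # must be unique, but only within the project
}
```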
Our Storage Bucket is for hosting function results, as well as the function code that will be executed. The original Terraform code had many hard-coded fields without much guidance:
File type and path for the function
Hardcoded location
The Resourcely Blueprint code introduces variables for input while giving guidance to the user. Note also the use of `{{ location }}`, a variable defined previously. With this behavior, we don't require the user to choose `location` multiple times.
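A sketch of the templated bucket and object (`bucket_name` and `function_source_path` are hypothetical variable names standing in for the inputs described above):

```hcl
resource "google_storage_bucket" "{{ __name }}" {
  name     = "{{ bucket_name }}" # must be globally unique
  location = "{{ location }}"    # reuses the location chosen once in the form
}

resource "google_storage_bucket_object" "{{ __name }}" {
  name   = "{{ function_source_path }}" # file type and path for the function code
  bucket = google_storage_bucket.{{ __name }}.name
  source = "{{ function_source_path }}"
}
```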
Finally we come to our Cloud Run Function, where we are introducing the most guidance and flexibility for users. In this section, we focus on enumerating possible options for users to choose, in the description field as well as in the form pick lists.
Function runtime: A user may not be familiar with node.js, and may not know how to use Python code for their function.
Timeout: Flexibility to choose a timeout, and guidance on the length of time
Function entry point: Ensuring the user knows that their code needs an entry point, and that it matches their code
Available memory: The amount of RAM to dedicate to the function, with an indicative format to guide users
Ingress: Optional ingress settings, and the available options
Retry policy: The policy on retries, and the two possible options
Environment variables: Optional environment variables the users can take advantage of in their code.
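Wired into the resource, those inputs might look roughly like this (variable names such as `function_runtime`, `available_memory`, and `retry_policy` are illustrative; the attribute names are from the `google_cloudfunctions2_function` resource):

```hcl
resource "google_cloudfunctions2_function" "{{ __name }}" {
  name     = "{{ name }}"
  location = "{{ location }}"

  build_config {
    runtime     = "{{ function_runtime }}"     # e.g. nodejs20 or python39
    entry_point = "{{ function_entry_point }}" # must match the code in the .zip
    source {
      storage_source {
        bucket = google_storage_bucket.{{ __name }}.name
        object = google_storage_bucket_object.{{ __name }}.name
      }
    }
  }

  service_config {
    available_memory      = "{{ available_memory }}" # e.g. "256M"
    timeout_seconds       = "{{ timeout }}"
    ingress_settings      = "{{ ingress }}" # e.g. ALLOW_INTERNAL_ONLY
    environment_variables = {{ environment_variables }}
    service_account_email = google_service_account.{{ __name }}.email
  }

  event_trigger {
    event_type   = "google.cloud.pubsub.topic.v1.messagePublished"
    pubsub_topic = google_pubsub_topic.{{ __name }}.id
    retry_policy = "{{ retry_policy }}" # RETRY_POLICY_RETRY or RETRY_POLICY_DO_NOT_RETRY
  }
}
```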
We now have a fully functioning form that developers can interact with. As covered above, the form has extensive guidance around possible input values, minimums and maximums, formatting, and other tips. When a user fills out this form, their infrastructure as code (and data pipeline!) will be created for them automatically and deployed using your existing CI/CD process.
Now that we have created a streamlined configuration experience, we have unblocked data engineers to create their own data pipelines.
However, you may want to also put more strict controls in place. What if a developer wants to deploy Terraform outside of Resourcely, or if you want to require approval to use a language other than node.js in your Cloud Run Function?
If we want to control what GCP location can be used for our data pipeline, we could control this with Guardrails. The following Guardrail can be published using Resourcely Foundry:
Note the wildcard in the `WHEN` clause: this means the location restriction will be enforced for all Google resources.
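Schematically, such a Guardrail has this shape (a sketch, not verified Really syntax; check the Resourcely docs for the exact grammar):

```
GUARDRAIL "restrict gcp locations"
  WHEN google_*
    REQUIRE location IN ["us-west1", "us-east1"]
```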
This Guardrail will manifest itself in two ways:
Exposed as part of the Blueprint form. If a developer wants to deviate, they need to "unlock" the Guardrail.
The region value will be checked during your CI, and PRs that don't meet the requirements will be blocked and require review (i.e., if they were unlocked in the form and changed)
Guardrails are incredibly flexible, which is their beauty. We can also create a Guardrail that restricts the runtime used for Cloud Run Functions specifically.
This Guardrail restricts the user from selecting anything but Python 3.9:
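In the same schematic form (again a sketch of the shape, not verified Really syntax), a runtime restriction might read:

```
GUARDRAIL "require python 3.9 runtime"
  WHEN google_cloudfunctions2_function
    REQUIRE build_config.runtime = "python39"
```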
Engineers of all types are looking for tools that help them move faster. Cloud infrastructure is a complex, nuanced domain that requires expertise and guidance, usually in the form of platform teams.
With Resourcely, we were able to turn a potentially confusing Terraform example into a guided experience that can turn the data pipeline deployment process from a headache into a breeze.
On its own, this Terraform code will create a one-off data pipeline. It would require developers who are comfortable with Terraform and the cloud service options that can be set within it.
Putting all of our Blueprint code together looks like the below. You can paste it directly into the Foundry to immediately publish and get started.
We can accomplish this with Resourcely Guardrails.
Guardrails are written in Really, the Resourcely policy-as-code language. You can learn more about Really in the Resourcely documentation.
Sign up for your free Resourcely account and build your own streamlined deployment experiences today!