Xgrid Snow DataOps: A Snowflake CI/CD Using Azure DevOps and Flyway
What is DataOps?
DataOps is a methodology for enhancing data analytics in terms of quality, speed, and collaboration and fostering a culture of continuous improvement. It is achieved by incorporating automation and agile software engineering techniques into a unified, process-centric data view and Integration. Contrary to DevOps, DataOps can be demonstrated as intersecting value and innovation pipelines, as shown in figure 2.
Downloads
Article (PDF-276 KB)MOST POPULAR INSIGHTS
- Optimizing Container Image Pull Efficiency: A Technical Deep Dive
- Streamline Your Application Delivery with AWS AppStream 2.0: An Introduction
- Secure Communication in a Hybrid Cloud – A Case of Site-to-Site VPN on AWS
- Understanding, Communicating and Making Informed Decisions with Data Visualization
- Cross-Account, Cross-Region Backups in AWS
DataOps builds upon the DevOps development model. Since the DevOps process flow incorporates a set of procedures typical of software development initiatives,
• Develop – create/alter an application
• Build –put together the parts of the application
• Test – application verification in a testing environment
• Deploy –switch from Development to Production Code
• Run – execute the application
1. DevOps Vs. DataOps
2. CI/CD (Continuous Integration/Continuous Deployment)
Two of the most important ideas in DevOps are CI/CD. Continuous Integration is the method of building, integrating, and testing a new code in a staging area that occurs in real-time. The building and testing processes are automated to ensure that they can be carried out quickly and reliably.
Continuous development is an automated method of software deployment or delivery. After all prerequisite tests have been completed, the DevOps team puts an application into production. CI and CD collaborate to address the primary barrier to Agile development. Prior to DevOps, Agile produced a steady stream of updates and innovations that would come to a halt during the manual Integration and deployment cycle. DevOps has enabled organizations to update their software multiple times due to automated CI and CD.
3. The first step toward DataOps
We introduce how to apply Continuous Integration and Continuous Deployment (CI/CD) practices to the development life cycle of data pipelines on a real-life data platform. In this case, the data platform is Snowflake.
It is assumed that three environments of the data platform are available: development, test, and production environment. The deployment of these environments is not part of the CI/CD process. Therefore, it can be considered that these environments are either manually created or built following a CI/CD process using an Infrastructure as Code (IaC) approach for product development. It is assumed that each environment has an independent instance of each data platform system.
4. What is a Snowflake?
Snowflake’s Data Cloud is a Software-as-a-Service (SaaS). It is a cutting-edge data platform. Snowflake provides vastly superior data storage, processing, and analysis capabilities compared to conventional methods.
Like other typical cloud-based database systems, it is not based on existing database technology or the Hadoop ecosystem and storage. It is an entirely new SQL query engine with unusual architecture. Its architecture was developed from the ground up to function in a cloud setting. It is loaded with various unique capabilities, distinctive features, and an enterprise analytic database for a user.
5. What is Flyway?
Flyway is a database migration utility that is free to use and open source. It gives simplicity and convention more weight than configuration options. The migration and transformations are supported with PL/SQL, T-SQL, and JAVA. The advanced transformation of Lobs can be written in JAVA for Flyway. It is licensed under Apache License 2.0. The key advantage of this process is that Flyway detects and executes the necessary update operations. As a result, you don’t need to know which SQL update statements must be executed to update your current database. You and your colleagues can define the update operations used to migrate the database from one version to the next. Flyway will detect the current version and execute the necessary update operations to update the database.
Xgrid Snow DataOps: A Snowflake CI/CD using Azure DevOps and Flyway
Step 1: Create a Demo database on Snowflake
Create Databases
CREATE DATABASE XGRID_DATA_DEMO COMMENT = ‘Azure DevOps deployment test’;
CREATE DATABASE XGRID_DATA_DEMO_DEV COMMENT = ‘Azure DevOps deployment test’;
CREATE DATABASE XGRID_DATA_DEMO_QA COMMENT = ‘Azure DevOps deployment test’;
— Create a Deploy User
create user devopsuser password=” default_role = sysadmin;
Step2: Create a project on Azure DevOps
- Sign in to Azure DevOps with appropriate credentials.
- Choose the Organization and click on the Blue-colored +New Project button.
- Give a unique and concise name to your project. You can also add a description for it. Let’s name the project as Xgrid_data_Snowflake_Flyway for the sake of this tutorial.
- Now, select the Visibility option for your project and click Create button.
- The details of the project will be shown as under
You must have an Environment to add the Approval step. Follow the steps given below to create the necessary Environments and Approvals:
- Head back to the Azure DevOps home page.
- Navigate to the left-side navigation bar and click on the Environments option.
- Give a unique name to the Production Environment and Just hit the button labeled “Create”.
- To create Approval for the Production Environment, click on the three vertical dots next to the Add Resource button.
- Click on the Approvals and Checks option to add a list of Approvers.
Step 3: Create a Library Variable Group
When you have a set of variables used in multiple pipelines, you can create a Variable Group once and reference it in multiple groups. Libraries are used to securely store variables and files used in your Snowflake CI/CD pipeline. Follow the steps given below to create a Library Variable Group:
- Click on Library present under the Pipelines option in the left navigation bar.
- On the Library page, navigate to the Variable Groups tab.
- Click on the +Variable Group button for the creation of a new Library Variable Group.
-
- Give a unique name to the group and add the following variables to it.
SNOWFLAKE_JDBC_URL=jdbc:snowflake://
SNOWFLAKE_ACCOUNT_NAME=..snowflakecomputing.com
SNOWFLAKE_WAREHOUSE=
SNOWFLAKE_ROLENAME=sysadmin
SNOWFLAKE_DEVOPS_USERNAME=
# mark as a secret variable type
SNOWFLAKE_DEVOPS_SECRET=
SNOWFLAKE_AUTHENTICATOR=snowflake
-
-
- Once you have successfully added all the variables, do not forget to click the Save button to the right of the Variable Group’s name. This is how your Variable Group would look like
-
-
-
- The variable group “Snowflake.database” should not have any restrictions for any pipeline, as shown.
-
Step 4: Create and Run a Snowflake CI/CD Deployment Pipeline
Now, to create a Snowflake CI/CD Pipeline, follow the steps given below:
-
-
-
- In the left navigation bar, click on the Pipelines option.
- If you are creating a pipeline for the first time, hit on the Create Pipeline button. In case you already have another pipeline defined, click on the New Pipeline button.
- Select the Azure Repos Git option on the Connect tab and select the desired repository (Xgrid_data_Snowflake_Flyway) on the next screen.
-
-
-
-
-
- On the Configure your Pipeline page, select the Starter Pipeline option.
-
-
-
-
-
- Lastly, paste the following piece of code into the Review your Final YAML page.
-
-
variables:
- group: Snowflake.Database
- name: DBNAME
value: flyway_demo
- name: flywayartifactName
value: DatabaseArtifacts
- name: flywayVmImage
value: 'ubuntu-16.04'
- name: flywayContainerImage
value: 'kulmam92/flyway-azure:6.2.3'
trigger:
- master
stages:
– stage: Build
variables:
– name: DBNAME_POSTFIX
value: _DEV
jobs:
– template: templates/snowflakeFlywayBuild.yml
parameters:
jobName: ‘BuildDatabase’
databaseName: $(DBNAME)
databasePostfix: $(DBNAME_POSTFIX)
artifactName: $(flywayartifactName)
vmImage: $(flywayVmImage)
containerImage: $(flywayContainerImage)
– stage: DEV
variables:
– name: DBNAME_POSTFIX
value: _DEV
jobs:
– template: templates/snowflakeFlywayDeploy.yml
parameters:
jobName: DEV
databaseName: $(DBNAME)
databasePostfix: $(DBNAME_POSTFIX)
artifactName: $(flywayartifactName)
vmImage: $(flywayVmImage)
containerImage: $(flywayContainerImage)
environmentName: DEV
– stage: QA
variables:
– name: DBNAME_POSTFIX
value: _QA
jobs:
– template: templates/snowflakeFlywayDeploy.yml
parameters:
jobName: QA
databaseName: $(DBNAME)
databasePostfix: $(DBNAME_POSTFIX)
artifactName: $(flywayartifactName)
vmImage: $(flywayVmImage)
containerImage: $(flywayContainerImage)
environmentName: QA
– stage: PROD
variables:
– name: DBNAME_POSTFIX
value: ” # Empty string for PROD
jobs:
– template: templates/snowflakeFlywayDeploy.yml
parameters:
jobName: PROD
databaseName: $(DBNAME)
databasePostfix: $(DBNAME_POSTFIX)
artifactName: $(flywayartifactName)
vmImage: $(flywayVmImage)
containerImage: $(flywayContainerImage)
environmentName: PROD
-
-
-
- Once successfully adding the code to the editor, click the Save and Run button.
-
-
You have successfully built the DataOps.
Conclusion
DataOps effectively streamlined the process of building data products to save time. It has become the bridge between data handlers who collect or model data, analyze it, or incorporate their findings. It drives the agile from DevOps and is successfully implemented for data products. Secondly, it assures quality and security by intersecting the value and innovating pipelines and multiple testing between the development and production environment.
About The Author(s)
AUTHOR(S)
Established in 2012, Xgrid has a history of delivering a wide range of intelligent and secure cloud infrastructure, user interface and user experience solutions. Our strength lies in our team and its ability to deliver end-to-end solutions using cutting edge technologies.
OFFICE ADDRESS
US Address:
Plug and Play Tech Center, 440 N Wolfe Rd, Sunnyvale, CA 94085
Pakistan Address:
Xgrid Solutions (Private) Limited, Bldg 96, GCC-11, Civic Center, Gulberg Greens, Islamabad
Xgrid Solutions (Pvt) Ltd, Daftarkhwan (One), Building #254/1, Sector G, Phase 5, DHA, Lahore