{"id":58837,"date":"2025-02-03T06:27:27","date_gmt":"2025-02-03T14:27:27","guid":{"rendered":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/?p=58837"},"modified":"2025-06-04T01:17:26","modified_gmt":"2025-06-04T08:17:26","slug":"building-a-snowflake-data-warehouse-from-scratch-using-soda-core","status":"publish","type":"post","link":"\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/","title":{"rendered":"Building a Snowflake data warehouse from scratch using Soda Core"},"content":{"rendered":"<p><span style=\"font-weight: 400\">My name is Pavel, and I am a Data Engineer in the <a href=\"https:\/\/www.ringcentral.com\/bg\/en\/careers.html\">RingCentral Bulgaria<\/a> office. Ensuring data quality in our Snowflake warehouse used to be an important function in my role. In this piece, I will share how our team tackled the challenge from scratch using Soda Core. It&#8217;s important to note that most of Snowflake&#8217;s constraints are declarative in nature: they are recorded as metadata rather than enforced, which let us define data rules without complex programming but left it to us to verify them. I won\u2019t delve into all aspects of data quality assurance or final reporting; instead, I will focus on the challenges we faced and how we addressed them using Soda Core.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Ensuring data quality from the early stages of building a data warehouse is crucial. In modern data pipelines, errors in the initial stages can propagate, leading to significant issues later on. Therefore, building quality checks early on can help avoid problems as data volumes and complexity grow.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Soda Core is an open-source tool that allows for easy querying and writing of data checks. 
It also integrates seamlessly with databases using Python, making the process of validating data more flexible and user-friendly.<\/span><\/p>\n<hr \/>\n<h2><span style=\"font-weight: 400\">The Challenges of Modern Data Stacks<\/span><\/h2>\n<p><span style=\"font-weight: 400\">The modern data stack is often more agile in its approach to database constraints, focusing more on scalability and speed than enforcing rigid data rules upfront. While this brings certain advantages, such as faster iteration and development cycles, it also means that monitoring and validating data quality becomes essential. Without rigorous checks, data integrity may be compromised over time.<\/span><\/p>\n<p><span style=\"font-weight: 400\">However, this flexibility presents a unique challenge for data quality assurance teams. Modern data warehouse architectures, often built on cloud platforms like Snowflake, BigQuery, or Redshift, prioritize scale and ease of use over strict data governance. For data quality teams, this becomes a balancing act: how to ensure high data quality without compromising flexibility and performance that modern architectures offer.<\/span><\/p>\n<p><span style=\"font-weight: 400\">This article will explore how we navigated these challenges, particularly focusing on the use of Soda Core for building a robust data quality monitoring system.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">What We Had<\/span><\/h2>\n<h3><span style=\"font-weight: 400\">Snowflake and the Need for Monitoring Constraints<\/span><\/h3>\n<p><span style=\"font-weight: 400\">We were working with Snowflake, which required us to actively monitor data constraints. Unlike traditional databases, modern data stacks like Snowflake don\u2019t enforce strict rules such as Primary Keys (PK) or Foreign Keys (FK). This flexibility can be an advantage, but it also places the burden of enforcing and monitoring these constraints on the data quality team. 
Ensuring that data meets these constraints becomes a critical part of the data quality process.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Data Vault Architecture and the Complexity of Monitoring Multiple Tables<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Our architecture was based on Data Vault, which gave the team the option to quickly connect and disconnect data sources. Data Vault&#8217;s design allows seamless changes in the sources without having to fully rebuild the data warehouse\u2019s layers. For us, this meant the data quality process had to be equally adjustable, enabling the quick deployment of tests for new sources. With this architecture, it&#8217;s not just about defining constraints for the incoming data, but also leveraging the warehouse&#8217;s metadata to streamline other aspects of data testing and monitoring.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The goal was to maintain simplicity and efficiency in deploying data quality checks for new sources. We realized the value of using metadata not only to define constraints but also to automate broader data quality monitoring tasks, improving both speed and adjustability.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Limited Resources and the Search for Automation<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Given our limited resources, we needed a tool that could automate as much of the process as possible. Writing SQL queries based on metadata and executing them directly in the database might seem straightforward, and often it is. However, we wanted a solution that minimized manual intervention and allowed us to defer complex tasks without sacrificing data quality. Automation was key to keeping our processes efficient and scalable.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Team Skillset: SQL Over Python<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Our team consisted of members who were far more proficient in SQL than Python. 
Based on this, we chose to follow a &#8220;SQL-first&#8221; approach. Although we still utilized Python where necessary, all critical checks and validations were written in SQL. This approach ensured that our data quality checks were understandable and accessible to everyone on the team, including business analysts and testers. A Python-first approach, or a Python-centric tool like Great Expectations, would have made the checks less accessible to those team members.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Dealing with Uncertainty: Defining Data Quality Metrics<\/span><\/h3>\n<p><span style=\"font-weight: 400\">We faced a lot of uncertainty regarding which data quality metrics would ultimately prove valuable. While data freshness was a top priority, we were uncertain about how to measure other important metrics like accuracy or detect anomalies over time. This uncertainty meant we had to remain flexible and ready to experiment, adjusting our metrics as our understanding of the data and business needs evolved.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Working Within the Existing Tech Stack<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Another consideration was integrating with our existing tech stack. Our data warehouse was populated using Airflow, so it made sense to leverage Airflow for data quality monitoring as well. 
This approach allowed us to maintain a unified, coherent process and avoid introducing unnecessary complexity by adding new tools that didn\u2019t align with our existing workflow.<\/span><\/p>\n<p><span style=\"font-weight: 400\">If you&#8217;re working in a similar environment, this article might provide insights that are directly relevant to your own challenges.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">What is Soda and Why We Used It<\/span><\/h2>\n<h2><span style=\"font-weight: 400\">The Leading Competitor: Great Expectations<\/span><\/h2>\n<p><span style=\"font-weight: 400\">In the realm of data quality, Great Expectations stands out as a feature-rich framework. It provides a wide range of capabilities for data validation and monitoring, including:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Extensive built-in checks: Great Expectations offers a robust library of predefined validations, ranging from schema checks to anomaly detection and value testing.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Automated documentation and visualization: It generates documentation and detailed reports for every data check, which simplifies tracking and communication about data quality across teams.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Multiple integrations: Great Expectations supports a wide array of data sources \u2014 databases, CSV files, Parquet, and more \u2014 which makes it a versatile tool for almost any data pipeline.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">However, while Great Expectations is undeniably powerful, we chose Soda Core for a few key reasons that made it a better fit for our use case.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">What Soda Core Brings to the Table<\/span><\/h2>\n<p><span style=\"font-weight: 400\">While Great Expectations is a powerful and versatile tool, Soda Core offers a number of 
features that made it the right choice for our project, providing almost all the necessary functionality with a SQL-first approach. Here\u2019s why we chose Soda Core:<\/span><\/p>\n<h3><span style=\"font-weight: 400\">SQL-First Approach and Automation<\/span><\/h3>\n<p><span style=\"font-weight: 400\">One of Soda Core&#8217;s biggest advantages is its SQL-first design. It allows users to define and execute data quality checks directly in SQL, which simplifies the validation process, especially for teams more comfortable with SQL. Additionally, Soda Core can be run as a Python script, automatically generating SQL queries. This makes it easy to automate the process of data validation.<\/span><\/p>\n<p><span style=\"font-weight: 400\">For example, assuming a data source named snowflake is defined in configuration.yml (both names are placeholders), a simple check can be run from Python like this:<\/span><\/p>\n<p><code><span style=\"font-weight: 400\">from soda.scan import Scan<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">scan = Scan()<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">scan.set_data_source_name(\"snowflake\")<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">scan.add_configuration_yaml_file(\"configuration.yml\")<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">scan.add_sodacl_yaml_str(\"\"\"<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">checks for my_table:<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0<\/span><span style=\"font-weight: 400\">- null_value_count = 0:<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">null_value_count query: |<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">SELECT COUNT(*) FROM my_table WHERE value IS NULL<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\"\"\")<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">scan.execute()<\/span><\/code><\/p>\n<h3><span style=\"font-weight: 400\">Support for Custom SQL Queries<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Custom SQL checks are another key feature. 
Soda Core enables users to define highly specific and complex checks, tailored to business rules or unique data patterns. This flexibility was crucial for us, enabling the direct handling of complex business logic through custom SQL queries. This flexibility ensured that even the most nuanced data quality requirements were covered.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Wide Database Support, Including Snowflake<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Soda Core supports a wide range of relational databases, including Snowflake, making it versatile for different infrastructures. This flexibility allowed us to easily integrate it into our existing system, regardless of the database technology in use. Soda Core&#8217;s database connectors enable seamless connectivity across modern data warehouses like Snowflake, Redshift, BigQuery, Spark, Vertica, Clickhouse and others.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Optimized SQL Execution<\/span><\/h3>\n<p><span style=\"font-weight: 400\">The SQL queries generated by Soda Core are optimized for performance, ensuring that even on large datasets, the checks run efficiently. This is critical for large-scale data warehouses where performance is a top priority. Optimized queries mean faster checks without compromising accuracy or completeness.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Metadata Extraction from Checks<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Soda Core enables users to extract key metadata from validation results, such as the number of records checked, errors found, and other important metrics. 
This helps teams monitor data quality over time and make informed decisions based on comprehensive data.<\/span><\/p>\n<p><span style=\"font-weight: 400\">For example, after a scan has been executed:<\/span><\/p>\n<p><code><span style=\"font-weight: 400\">results = scan.get_scan_results()<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">for check in results[\"checks\"]:<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">print(f\"Check: {check['name']}, Outcome: {check['outcome']}\")<\/span><\/code><\/p>\n<h3><span style=\"font-weight: 400\">Switching Between Warehouses and Databases<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Soda Core supports switching between different warehouses and databases within the same project. Each connection is defined as a named data source in the configuration file, and a scan selects one by name. This feature is particularly useful in environments where data resides in multiple isolated systems or stages of development.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Example, assuming a data source named prod_snowflake, with its own database (prod_db) and warehouse (wh_analytics), is defined in configuration.yml:<\/span><\/p>\n<p><code><span style=\"font-weight: 400\">scan.set_data_source_name(\"prod_snowflake\")<\/span><\/code><\/p>\n<h3><span style=\"font-weight: 400\">Open Source and Accessibility<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Soda Core is part of the larger Soda ecosystem and is an open-source solution, which means it can be used without licensing costs or additional fees. 
This makes it accessible to teams of all sizes and budgets, offering a high degree of flexibility without financial constraints.<\/span><\/p>\n<h3><span style=\"font-weight: 400\">Customizable Severity Levels for Checks<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Soda Core allows setting severity levels for each check through separate warn and fail thresholds, which helps prioritize issues based on their severity. For instance, failed checks can be set to stop the ETL process, while warnings can simply trigger alerts or be included in reports. In Python, one way to halt a pipeline task after a scan is:<\/span><\/p>\n<p><code><span style=\"font-weight: 400\">scan.assert_no_checks_fail()<\/span> <span style=\"font-weight: 400\"># raises AssertionError if any check failed<\/span><\/code><\/p>\n<h3><span style=\"font-weight: 400\">Schema, Table, and Column-Level Checks<\/span><\/h3>\n<p><span style=\"font-weight: 400\">Soda Core supports validations at multiple data levels: schema, table, and column. This allows for fine-grained control over data quality checks depending on the requirements of the project. 
You can validate specific data points or enforce broader constraints on entire datasets.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Example of a SodaCL (YAML) checks file covering all three levels:<\/span><\/p>\n<p><code><span style=\"font-weight: 400\">checks for my_table:<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0<\/span><span style=\"font-weight: 400\">- schema:<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">fail:<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">when required column missing: [my_column]<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0<\/span><span style=\"font-weight: 400\">- row_count &gt; 0<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0<\/span><span style=\"font-weight: 400\">- missing_percent(my_column) &lt; 5%<\/span><\/code><\/p>\n<h2><span style=\"font-weight: 400\">Final Thoughts<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Great Expectations remains an excellent choice for projects that demand extensive out-of-the-box data quality checks and where Python is the preferred language. But for those seeking a more streamlined, SQL-centric solution, Soda Core offers almost all of the same key features with added simplicity. 
It\u2019s a lightweight, highly adaptable framework that integrates smoothly with modern data stacks while offering powerful data validation capabilities.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">How We Used Soda Core for Data Quality<\/span><\/h2>\n<p><span style=\"font-weight: 400\">In our project, we structured data quality checks into three primary categories: <\/span><b>constraints<\/b><span style=\"font-weight: 400\"> checks, <\/span><b>technical<\/b><span style=\"font-weight: 400\"> checks, and <\/span><b>business<\/b><span style=\"font-weight: 400\"> checks. This helped us manage different types of validations efficiently, using a combination of automated metadata-driven queries and custom scripts developed with business analysts.<\/span><\/p>\n<p><b>Types of Checks<\/b><\/p>\n<p><span style=\"font-weight: 400\">1. Constraint Checks<\/span><\/p>\n<p><span style=\"font-weight: 400\">Constraint checks focus on ensuring that the data adheres to specific rules and database-level restrictions, such as uniqueness, references, data formats, and required fields (e.g., NOT NULL). These checks help maintain the structural integrity of the data. 
We were able to generate constraint checks from the database metadata, automating the validation process and making monitoring more efficient.<\/span><\/p>\n<p><span style=\"font-weight: 400\">For example, if a table&#8217;s identifier column must always be populated, we could set up a query to automatically verify this and flag any violations.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Example SQL query for a constraint check (it counts rows where id is NULL; IS NOT DISTINCT FROM NULL is equivalent to IS NULL):<\/span><\/p>\n<p><code><span style=\"font-weight: 400\">SELECT<\/span> <span style=\"font-weight: 400\">COUNT(*)<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">FROM<\/span> <span style=\"font-weight: 400\">my_schema.my_table<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">WHERE<\/span> <span style=\"font-weight: 400\">id<\/span> <span style=\"font-weight: 400\">IS<\/span> <span style=\"font-weight: 400\">NOT<\/span> <span style=\"font-weight: 400\">DISTINCT<\/span> <span style=\"font-weight: 400\">FROM<\/span> <span style=\"font-weight: 400\">NULL<\/span><\/code><\/p>\n<p><span style=\"font-weight: 400\">2. Technical Checks<\/span><\/p>\n<p><span style=\"font-weight: 400\">Technical checks ensure the stability and correctness of the data infrastructure. These include checks for data freshness, timeliness of updates, data volumes, and the smooth operation of ETL pipelines. These checks help confirm that data is ingested on time and error-free, which is crucial for maintaining the integrity of the data warehouse.<\/span><\/p>\n<p><span style=\"font-weight: 400\">3. Business Checks<\/span><\/p>\n<p><span style=\"font-weight: 400\">Business checks validate data against specific business logic or requirements, which are defined in collaboration with system and business analysts. 
These checks are unique to each project, often validating data correctness according to the specific calculations, ranges, or logical conditions relevant to the business.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Business checks are custom-built and stored separately from the general constraint or technical checks, allowing us to manage them flexibly and adapt quickly to changes in business processes.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Example Business Check:<\/span><\/p>\n<p><code><span style=\"font-weight: 400\">SELECT<\/span> <span style=\"font-weight: 400\">COUNT(*)<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">FROM<\/span> <span style=\"font-weight: 400\">sales<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">WHERE<\/span> <span style=\"font-weight: 400\">revenue<\/span> <span style=\"font-weight: 400\">&lt;<\/span> <span style=\"font-weight: 400\">cost<\/span><\/code><\/p>\n<p><span style=\"font-weight: 400\">This counts the rows in the sales table that violate the business rule that revenue should never be lower than cost; a result of zero means the rule holds.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Automated Check Generation from Metadata<\/span><\/h2>\n<p><span style=\"font-weight: 400\">We automated the generation of constraint checks by extracting relevant information directly from the database metadata. 
By accessing details about table relationships, fields, and schemas, we could dynamically create SQL queries to validate these constraints.<\/span><\/p>\n<p><span style=\"font-weight: 400\">For example, to validate foreign key relationships between layers of the warehouse, we generated SQL queries from metadata like this:<\/span><\/p>\n<p><span style=\"font-weight: 400\">Example Foreign Key Check:<\/span><\/p>\n<p><code><span style=\"font-weight: 400\">checks<\/span> <span style=\"font-weight: 400\">for<\/span> <span style=\"font-weight: 400\">fk_table:<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0<\/span><span style=\"font-weight: 400\">-<\/span> <span style=\"font-weight: 400\">values<\/span> <span style=\"font-weight: 400\">in<\/span> <span style=\"font-weight: 400\">(fk_column)<\/span> <span style=\"font-weight: 400\">must<\/span> <span style=\"font-weight: 400\">exist<\/span> <span style=\"font-weight: 400\">in<\/span> <span style=\"font-weight: 400\">pk_table<\/span> <span style=\"font-weight: 400\">(pk_column)<\/span><\/code><\/p>\n<p><span style=\"font-weight: 400\">However, one limitation of Soda Core&#8217;s standard package is the inability to check foreign key constraints across different layers of the warehouse. 
To address this, we generated custom queries for such cases:<\/span><\/p>\n<p><code><span style=\"font-weight: 400\">SELECT<\/span> <span style=\"font-weight: 400\">COUNT(SOURCE.fk_column_name,<\/span> <span style=\"font-weight: 400\">SOURCE.fk_column_name2)<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">FROM<\/span> <span style=\"font-weight: 400\">database_name.schema_name.FK_NAME<\/span> <span style=\"font-weight: 400\">SOURCE<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">LEFT<\/span> <span style=\"font-weight: 400\">JOIN<\/span> <span style=\"font-weight: 400\">database_name.schema_name.PK_NAME<\/span> <span style=\"font-weight: 400\">TARGET<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0<\/span><span style=\"font-weight: 400\">ON<\/span> <span style=\"font-weight: 400\">SOURCE.fk_column_name<\/span> <span style=\"font-weight: 400\">=<\/span> <span style=\"font-weight: 400\">TARGET.pk_column_name<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0<\/span><span style=\"font-weight: 400\">AND<\/span> <span style=\"font-weight: 400\">SOURCE.fk_column_name2<\/span> <span style=\"font-weight: 400\">=<\/span> <span style=\"font-weight: 400\">TARGET.pk_column_name2<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">WHERE<\/span> <span style=\"font-weight: 400\">((SOURCE.fk_column_name<\/span> <span style=\"font-weight: 400\">IS<\/span> <span style=\"font-weight: 400\">NOT<\/span> <span style=\"font-weight: 400\">NULL<\/span> <span style=\"font-weight: 400\">AND<\/span> <span style=\"font-weight: 400\">TARGET.pk_column_name<\/span> <span style=\"font-weight: 400\">IS<\/span> <span style=\"font-weight: 400\">NULL)<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0<\/span><span style=\"font-weight: 400\">OR<\/span> <span style=\"font-weight: 400\">(SOURCE.fk_column_name2<\/span> <span style=\"font-weight: 400\">IS<\/span> <span 
style=\"font-weight: 400\">NOT<\/span> <span style=\"font-weight: 400\">NULL<\/span> <span style=\"font-weight: 400\">AND<\/span> <span style=\"font-weight: 400\">TARGET.pk_column_name2<\/span> <span style=\"font-weight: 400\">IS<\/span> <span style=\"font-weight: 400\">NULL))<\/span><\/code><\/p>\n<h2><span style=\"font-weight: 400\">Profiling for Technical Checks<\/span><\/h2>\n<p><span style=\"font-weight: 400\">For technical checks, we performed data profiling depending on the attribute types. Profiling results were used to drive validation at the warehouse level. The checks differed based on whether the attribute was numeric, text, or date-based:<\/span><\/p>\n<p><code><span style=\"font-weight: 400\">Numeric<\/span> <span style=\"font-weight: 400\">Columns:<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">Minimum<\/span> <span style=\"font-weight: 400\">value<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">Maximum<\/span> <span style=\"font-weight: 400\">value<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">Five<\/span> <span style=\"font-weight: 400\">smallest\/largest<\/span> <span style=\"font-weight: 400\">values<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">Most<\/span> <span style=\"font-weight: 400\">frequent<\/span> <span style=\"font-weight: 400\">values<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">Average,<\/span> <span style=\"font-weight: 400\">sum,<\/span> <span style=\"font-weight: 400\">standard<\/span> <span style=\"font-weight: 400\">deviation,<\/span> <span style=\"font-weight: 400\">variance<\/span><\/code><\/p>\n<p><code><span 
style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">Count<\/span> <span style=\"font-weight: 400\">of<\/span> <span style=\"font-weight: 400\">distinct<\/span> <span style=\"font-weight: 400\">and<\/span> <span style=\"font-weight: 400\">missing<\/span> <span style=\"font-weight: 400\">values<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">Histogram<\/span> <span style=\"font-weight: 400\">of<\/span> <span style=\"font-weight: 400\">values<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">Text<\/span> <span style=\"font-weight: 400\">Columns:<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">Most<\/span> <span style=\"font-weight: 400\">frequent<\/span> <span style=\"font-weight: 400\">values<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">Count<\/span> <span style=\"font-weight: 400\">of<\/span> <span style=\"font-weight: 400\">distinct<\/span> <span style=\"font-weight: 400\">and<\/span> <span style=\"font-weight: 400\">missing<\/span> <span style=\"font-weight: 400\">values<\/span><\/code><\/p>\n<p><code><span style=\"font-weight: 400\">\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"font-weight: 400\">Average,<\/span> <span style=\"font-weight: 400\">minimum,<\/span> <span style=\"font-weight: 400\">and<\/span> <span style=\"font-weight: 400\">maximum<\/span> <span style=\"font-weight: 400\">length<\/span><\/code><\/p>\n<p><span style=\"font-weight: 400\">Dates were not profiled, but technical checks on timeliness and freshness were applied directly to date fields.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Storing Results for Iterative Improvement<\/span><\/h2>\n<p><span style=\"font-weight: 400\">We stored the results of these checks in a separate database. 
This allowed us to maintain a history of validation results and continuously improve our data quality approach over time. By keeping a record of checks, we could apply them retrospectively to historical data, which is especially important for evolving time-series validation.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Final Thoughts on Our Approach<\/span><\/h2>\n<p><span style=\"font-weight: 400\">This structured approach\u2014separating checks by type, generating them from metadata, and leveraging profiling\u2014allowed us to comprehensively monitor and ensure data quality across both technical and business dimensions. As our data quality needs evolved, we could easily extend the system to adapt, all while maintaining clear visibility into the performance of each type of check.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Next Steps<\/span><\/h2>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Integrate Additional Data Warehouses: We can expand our process to include multiple data warehouses by adding connectors. Such adjustability allows us to monitor and validate data across different environments seamlessly.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Build and Visualize Data Quality Dimensions: For deeper analysis of data quality, it&#8217;s essential to define and visualize data quality dimensions such as freshness, completeness, accuracy, and consistency. These dimensions provide a clearer understanding of data health and highlight areas that require attention. 
By integrating these dimensions into visualizations using Business Intelligence (BI) tools, we can offer intuitive representations of data quality, making interpretation easier.<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Implement Real-Time Checks During Data Ingestion: We plan to perform validations as data is ingested into the warehouse, with the ability to halt the data-loading process in the event of validation failures. This proactive approach helps maintain data integrity right from the outset.<\/span><\/li>\n<\/ul>\n<h2><span style=\"font-weight: 400\">Summary<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Soda Core has proven to be particularly user-friendly for SQL developers, utilizing Python primarily for executing tasks rather than for writing checks. This design streamlines workflows, enabling developers to focus on familiar SQL for data validation while avoiding the complexities of Python programming.<\/span><\/p>\n<p><span style=\"font-weight: 400\">One of our key innovations was the ability to generate SQL queries based on metadata, a feature we implemented using Soda Core. This approach provides significant flexibility, allowing us to easily adapt our checks according to the evolving data structure and changing requirements. As a result, our data quality framework has become more dynamic and responsive, ensuring we can maintain high standards even as our data landscape evolves.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>My name is Pavel, and I am a Data Engineer in the RingCentral Bulgaria office. Ensuring data quality in our Snowflake warehouse used to be an important function in my role. In this piece, I will share how our team tackled the challenge from scratch using Soda Core. 
It&#8217;s important to note that Snowflake&#8217;s constraints &#8230;<\/p>\n","protected":false},"author":1242,"featured_media":59479,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18885],"tags":[43346],"class_list":["post-58837","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ringcentral-newsdesk","tag-working-at-rc-bulgaria"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v19.3 (Yoast SEO v27.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Building a Snowflake data warehouse from scratch using Soda Core | RingCentral Blog<\/title>\n<meta name=\"description\" content=\"My name is Pavel, and I am a Data Engineer in the RingCentral Bulgaria office. Ensuring data quality in our Snowflake warehouse used to be an important\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.ringcentral.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building a Snowflake data warehouse from scratch using Soda Core\" \/>\n<meta property=\"og:description\" content=\"My name is Pavel, and I am a Data Engineer in the RingCentral Bulgaria office. 
Ensuring data quality in our Snowflake warehouse used to be an important\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.ringcentral.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/\" \/>\n<meta property=\"og:site_name\" content=\"RingCentral Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/ringcentral\" \/>\n<meta property=\"article:published_time\" content=\"2025-02-03T14:27:27+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-04T08:17:26+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.ringcentral.com\/us\/en\/blog\/wp-content\/uploads\/2025\/02\/Pavel_Surkou_930x700.png\" \/>\n\t<meta property=\"og:image:width\" content=\"930\" \/>\n\t<meta property=\"og:image:height\" content=\"700\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Pavel Surkou\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@ringcentral\" \/>\n<meta name=\"twitter:site\" content=\"@ringcentral\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Pavel Surkou\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"13 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/#article\",\"isPartOf\":{\"@id\":\"\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/\"},\"author\":{\"name\":\"Pavel Surkou\",\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/#\\\/schema\\\/person\\\/55c3d5848b391446702cd9ab06dc9bbf\"},\"headline\":\"Building a Snowflake data warehouse from scratch using Soda Core\",\"datePublished\":\"2025-02-03T14:27:27+00:00\",\"dateModified\":\"2025-06-04T08:17:26+00:00\",\"mainEntityOfPage\":{\"@id\":\"\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/\"},\"wordCount\":2427,\"publisher\":{\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/#primaryimage\"},\"thumbnailUrl\":\"\\\/us\\\/en\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/Pavel_Surkou_930x700.png\",\"keywords\":[\"Working at RC Bulgaria\"],\"articleSection\":[\"Company news &amp; culture\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/\",\"url\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/\",\"name\":\"Building a Snowflake data warehouse from scratch using Soda Core | RingCentral 
Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/#primaryimage\"},\"thumbnailUrl\":\"\\\/us\\\/en\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/Pavel_Surkou_930x700.png\",\"datePublished\":\"2025-02-03T14:27:27+00:00\",\"dateModified\":\"2025-06-04T08:17:26+00:00\",\"description\":\"My name is Pavel, and I am a Data Engineer in the RingCentral Bulgaria office. Ensuring data quality in our Snowflake warehouse used to be an important\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/#primaryimage\",\"url\":\"\\\/us\\\/en\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/Pavel_Surkou_930x700.png\",\"contentUrl\":\"\\\/us\\\/en\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/Pavel_Surkou_930x700.png\",\"width\":930,\"height\":700},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/\"},{\"@type
\":\"ListItem\",\"position\":2,\"name\":\"Building a Snowflake data warehouse from scratch using Soda Core\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/\",\"name\":\"RingCentral Blog\",\"description\":\"Intelligent Communications\",\"publisher\":{\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/#organization\",\"name\":\"RingCentral\",\"url\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"\\\/us\\\/en\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/04\\\/ringcentral-logo.png\",\"contentUrl\":\"\\\/us\\\/en\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/04\\\/ringcentral-logo.png\",\"width\":2048,\"height\":309,\"caption\":\"RingCentral\"},\"image\":{\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/ringcentral\",\"https:\\\/\\\/x.com\\\/ringcentral\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/ringcentral\\\/\",\"https:\\\/\\\/www.instagram.com\\\/ringcentral\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/newrcblog.wpengine.com\\\/us\\\/en\\\/blog\\\/#\\\/schema\\\/person\\\/55c3d5848b391446702cd9ab06dc9bbf\",\"name\":\"Pavel 
Surkou\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/25e16d9e26a5343964138c91d1d0068516cd534d18cd0a897e28bcf14abaf7b9?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/25e16d9e26a5343964138c91d1d0068516cd534d18cd0a897e28bcf14abaf7b9?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/25e16d9e26a5343964138c91d1d0068516cd534d18cd0a897e28bcf14abaf7b9?s=96&d=mm&r=g\",\"caption\":\"Pavel Surkou\"},\"url\":\"\\\/us\\\/en\\\/blog\\\/author\\\/pavel-surkou\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Building a Snowflake data warehouse from scratch using Soda Core | RingCentral Blog","description":"My name is Pavel, and I am a Data Engineer in the RingCentral Bulgaria office. Ensuring data quality in our Snowflake warehouse used to be an important","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.ringcentral.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/","og_locale":"en_US","og_type":"article","og_title":"Building a Snowflake data warehouse from scratch using Soda Core","og_description":"My name is Pavel, and I am a Data Engineer in the RingCentral Bulgaria office. 
Ensuring data quality in our Snowflake warehouse used to be an important","og_url":"https:\/\/www.ringcentral.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/","og_site_name":"RingCentral Blog","article_publisher":"https:\/\/www.facebook.com\/ringcentral","article_published_time":"2025-02-03T14:27:27+00:00","article_modified_time":"2025-06-04T08:17:26+00:00","og_image":[{"width":930,"height":700,"url":"https:\/\/www.ringcentral.com\/us\/en\/blog\/wp-content\/uploads\/2025\/02\/Pavel_Surkou_930x700.png","type":"image\/png"}],"author":"Pavel Surkou","twitter_card":"summary_large_image","twitter_creator":"@ringcentral","twitter_site":"@ringcentral","twitter_misc":{"Written by":"Pavel Surkou","Est. reading time":"13 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/#article","isPartOf":{"@id":"\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/"},"author":{"name":"Pavel Surkou","@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/#\/schema\/person\/55c3d5848b391446702cd9ab06dc9bbf"},"headline":"Building a Snowflake data warehouse from scratch using Soda Core","datePublished":"2025-02-03T14:27:27+00:00","dateModified":"2025-06-04T08:17:26+00:00","mainEntityOfPage":{"@id":"\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/"},"wordCount":2427,"publisher":{"@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/#organization"},"image":{"@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/#primaryimage"},"thumbnailUrl":"\/us\/en\/blog\/wp-content\/uploads\/2025\/02\/Pavel_Surkou_930x700.png","keywords":["Working at RC Bulgaria"],"articleSection":["Company news &amp; 
culture"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/","url":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/","name":"Building a Snowflake data warehouse from scratch using Soda Core | RingCentral Blog","isPartOf":{"@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/#primaryimage"},"image":{"@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/#primaryimage"},"thumbnailUrl":"\/us\/en\/blog\/wp-content\/uploads\/2025\/02\/Pavel_Surkou_930x700.png","datePublished":"2025-02-03T14:27:27+00:00","dateModified":"2025-06-04T08:17:26+00:00","description":"My name is Pavel, and I am a Data Engineer in the RingCentral Bulgaria office. 
Ensuring data quality in our Snowflake warehouse used to be an important","breadcrumb":{"@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/#primaryimage","url":"\/us\/en\/blog\/wp-content\/uploads\/2025\/02\/Pavel_Surkou_930x700.png","contentUrl":"\/us\/en\/blog\/wp-content\/uploads\/2025\/02\/Pavel_Surkou_930x700.png","width":930,"height":700},{"@type":"BreadcrumbList","@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/"},{"@type":"ListItem","position":2,"name":"Building a Snowflake data warehouse from scratch using Soda Core"}]},{"@type":"WebSite","@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/#website","url":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/","name":"RingCentral Blog","description":"Intelligent 
Communications","publisher":{"@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/#organization","name":"RingCentral","url":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/#\/schema\/logo\/image\/","url":"\/us\/en\/blog\/wp-content\/uploads\/2025\/04\/ringcentral-logo.png","contentUrl":"\/us\/en\/blog\/wp-content\/uploads\/2025\/04\/ringcentral-logo.png","width":2048,"height":309,"caption":"RingCentral"},"image":{"@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/ringcentral","https:\/\/x.com\/ringcentral","https:\/\/www.linkedin.com\/company\/ringcentral\/","https:\/\/www.instagram.com\/ringcentral"]},{"@type":"Person","@id":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/#\/schema\/person\/55c3d5848b391446702cd9ab06dc9bbf","name":"Pavel Surkou","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/25e16d9e26a5343964138c91d1d0068516cd534d18cd0a897e28bcf14abaf7b9?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/25e16d9e26a5343964138c91d1d0068516cd534d18cd0a897e28bcf14abaf7b9?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/25e16d9e26a5343964138c91d1d0068516cd534d18cd0a897e28bcf14abaf7b9?s=96&d=mm&r=g","caption":"Pavel Surkou"},"url":"\/us\/en\/blog\/author\/pavel-surkou\/"}]}},"rc_img_url":"\/us\/en\/blog\/wp-content\/uploads\/2025\/02\/Pavel_Surkou_930x700.png","rcblog_by_author":"<a 
href=\"\/us\/en\/blog\/author\/pavel-surkou\/amp\" data-dl-events-click=\"true\" data-dl-element=\"link\"><span class=\"image\"><img src=\"https:\/\/secure.gravatar.com\/avatar\/25e16d9e26a5343964138c91d1d0068516cd534d18cd0a897e28bcf14abaf7b9?s=96&d=mm&r=g\" alt=\"\" width=\"30\" height=\"30\" layout=\"fixed\"><\/img><\/span><span class=\"by-author-name\">Pavel Surkou<\/span><\/a>","rc_author_full_name":"Pavel Surkou","rc_author_avatar":"\/us\/en\/blog\/wp-content\/uploads\/2025\/02\/pavel.surkou.png","rc_author_link":"\/us\/en\/blog\/author\/pavel-surkou\/amp","rc_post_categories":"<a href=\"\/us\/en\/blog\/category\/trending\/ringcentral-newsdesk\/amp\">Company news &amp; culture<\/a>","amp_link":"\/us\/en\/blog\/building-a-snowflake-data-warehouse-from-scratch-using-soda-core\/amp","excerpt_title":"Building a Snowflake data warehouse from scratch using So...","_links":{"self":[{"href":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/wp-json\/wp\/v2\/posts\/58837","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/wp-json\/wp\/v2\/users\/1242"}],"replies":[{"embeddable":true,"href":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/wp-json\/wp\/v2\/comments?post=58837"}],"version-history":[{"count":0,"href":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/wp-json\/wp\/v2\/posts\/58837\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/wp-json\/wp\/v2\/media\/59479"}],"wp:attachment":[{"href":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/wp-json\/wp\/v2\/media?parent=58837"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/wp-json\/wp\/v2\/categories?post=58837"},{"taxonomy":"post_tag","emb
eddable":true,"href":"https:\/\/newrcblog.wpengine.com\/us\/en\/blog\/wp-json\/wp\/v2\/tags?post=58837"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}