What’s new in Soda docs?


October 3, 2024

  • Added documentation for using variables in the SampleRef message parameter for Python custom sampler to collect and display failed row samples.

October 2, 2024

September 30, 2024

September 26, 2024

  • Updated documentation to include customizable permissions for global and dataset roles in Soda Cloud, plus the ability to create new roles.
  • Added release notes documentation for Soda Library 1.6.3 and Soda Agent 1.1.29.

September 25, 2024

September 24, 2024

  • Added a Configuration and setting hierarchy section to offer an overview of behavior for failed row sample collection.
  • Added release notes documentation for Soda Library 1.6.2.
  • Removed note about an upper limit of 10,000 for collecting failed row samples.
  • Added Soda Cloud connection configuration details to Connect to Dask and Pandas, and corrected install package names in the Troubleshooting section.
  • Added troubleshooting solution to MS Teams integration.

September 23, 2024

  • Compiled and updated failed row samples documentation, including:
    • the option to use scan context in a CustomSampler to read/write data to/from a scan
    • the option to collect failed rows samples from specific columns in a dataset in Soda Cloud
    • the option to disable failed row sample collection from all datasets, except those with explicit configuration to collect samples
  • Updated Failed row checks and User-defined checks to include optional configuration to specify a single column against which to run the check.
  • Revised check attributes configuration when applying attributes to more than one check.

September 19, 2024

  • Added attribute mapping to Okta SSO integration documentation.
  • Corrected reconciliation check documentation to remove the option to add a list of comma-separated datasets to compare.

September 18, 2024

September 17, 2024

September 13, 2024

September 12, 2024

  • Added release notes documentation for Soda Core 3.3.15 - 3.3.22.

September 8, 2024

  • Published new use case guide for using Soda to test data quality in a Dagster pipeline.

September 4, 2024

  • Added release notes documentation for Soda Library 1.6.0 and Soda Agent 1.1.26.

August 29, 2024

August 28, 2024

  • Updated documentation to clarify when to deploy a self-hosted vs. Soda-hosted agent.

August 20, 2024

August 19, 2024

August 14, 2024

  • Added release notes documentation for Soda Library 1.5.25 and Soda Core 3.3.14.

August 13, 2024

  • Added release notes documentation for Soda Library 1.5.24 and Soda Agent 1.1.24.

August 8, 2024

  • Added content to clarify that Soda Library officially supports Python 3.8, 3.9, and 3.10.

August 2, 2024

  • Added release notes documentation for Soda Library 1.5.23 and Soda Agent 1.1.23.

August 1, 2024

  • Added release notes documentation for Soda Library 1.5.22 and Soda Agent 1.1.22.

July 31, 2024

July 29, 2024

July 25, 2024

July 24, 2024

  • Added release notes documentation for Soda Library 1.5.20 and Soda Agent 1.1.21.

July 23, 2024

July 22, 2024

July 19, 2024

  • Updated MS Teams integration documentation to reference creating Workflows in MS Teams instead of Office 365 Connectors. Microsoft is retiring the connectors effective August 15, 2024. If you have previously set up a Soda integration with an Office 365 connector, follow the instructions for Creating a workflow from a channel in Teams, then update the integration URL in your existing Soda <> MS Teams integration in Soda Cloud.

July 18, 2024

  • Added release notes documentation for Soda Agent 1.1.20 and Soda Core 3.3.12.
  • Added host and port as optional connection configuration parameters for Snowflake.
  • Added an optional multi_subnet_failover parameter to the connection configuration for MS SQL.
  • Added an optional sslmode parameter to the connection configurations for PostgreSQL and Denodo.

July 17, 2024

  • The preview program for anomaly dashboards for observability has reached its quota. Removed “Request preview access” links from documentation.
  • Added release notes documentation for Soda Library 1.5.17 and Soda Core 3.3.11.
  • Corrected Missing metrics and Validity metrics to indicate that column config keys that use regex are supported only for text data types.

July 16, 2024

  • Added release notes documentation for Soda Library 1.5.16 and Soda Agent 1.1.19.

July 15, 2024

July 10, 2024

  • Revised SodaGPT documentation to replace it with details about Ask AI, Soda’s in-product generative AI assistant.

July 8, 2024

July 5, 2024

  • Added clarification to the inclusion and exclusion rules for profiling behavior.
  • Repeated the configuration instructions for samples columns in multiple places where failed row samples are collected implicitly, notably in Collect failed row samples.
  • Added details about RollingUpdate when upgrading a self-hosted Soda Agent.

July 2, 2024

  • Added release notes documentation for Soda Agent 1.1.17 & 1.1.18 and Soda Library 1.5.14.

June 28, 2024

  • Added release notes documentation for Soda Agent 1.1.15 & 1.1.16, Soda Library 1.5.13, and Soda Core 3.3.7, 3.3.8 & 3.3.9.
  • Published documentation to accompany data contracts version 4 release.

June 27, 2024

  • Added release notes documentation for Soda Agent 1.1.14 and Soda Library 1.5.12.

June 24, 2024

  • Added release notes documentation for Soda Agent 1.1.13 and Soda Library 1.5.11.

June 21, 2024

June 10, 2024

  • Added release notes documentation for Soda Core 3.3.6 and Soda Library 1.5.9.

June 18, 2024

  • Added release notes documentation for Soda Agent 1.1.10 & 1.1.11 and Soda Library 1.5.7 & 1.5.8.

June 17, 2024

  • Added a new docs page to begin recording data source connection issues and workarounds.
  • Added link to troubleshooting advice for reference checks that use dataset filters.
  • Added troubleshooting advice for Snowflake connections that use proxies.
  • Clarified the use of scan definition names in multiple programmatic scans in different pipelines.
  • Added requirement for SSO setup for customers to indicate whether they use Identity Provider Initiated (IdP-initiated) or Service Provider Initiated (SP-initiated) single sign-on integrations. (Also included in procedural instructions.)
  • Updated experimental support for data contracts.

June 10, 2024

  • Added release notes documentation for Soda Agent 1.1.9 and Soda Library 1.5.6.

June 7, 2024

June 6, 2024

June 5, 2024

  • Added release notes documentation for Soda Library 1.5.5.
  • Added details about IRSA authentication for Athena and Redshift data sources.
  • Added new example script to fetch dataset and check info from a Soda Cloud account and transfer data into CSV files.

May 30, 2024

  • Added release notes documentation for the Soda AI features generally available or available for preview access upon request.

May 29, 2024

  • Added release notes documentation for Soda Library 1.5.4 and Soda Agents 1.1.5 and 1.1.6.
  • Added documentation and example of including a failed rows query in a user-defined check.

May 28, 2024

May 25, 2024

May 24, 2024

  • Added release notes documentation for Soda Library 1.5.2, Soda Core 3.3.5, and Soda Agent 1.1.3.

May 23, 2024

  • Documented the new feature for data quality observability, the automated, ML-driven Anomaly Dashboard.
  • Added release notes documentation for Soda Library 1.5.1 and Soda Agent 1.1.2.
  • Added note about using a different key for database for connecting to a DuckDB data source.

May 20, 2024

May 17, 2024

May 14, 2024

  • Added release notes documentation for Soda Agent 1.1.1 and Soda Library 1.4.9.

May 8, 2024

  • Updated programmatic scan documentation to include the option to run the scan locally and not send results to Soda Cloud.

May 7, 2024

  • Updated the example script for a programmatic scan to include a check template file path.
  • Added prerequisite to no-code check creation that datasets must be discovered during data source onboarding.
  • Touched up some details for advanced configuration of the anomaly detection simulator.
  • Added release notes documentation for Soda Core 3.3.3 and 3.3.4, and Soda Library 1.4.8.

May 6, 2024

  • Removed the Agreement deprecation notice as the decision to deprecate the feature has been reversed.

April 30, 2024

  • Published documentation for V3 of data contracts, Soda’s experimental way to set data quality standards for data products.

April 29, 2024

  • Added optional connection parameters to Denodo data source configuration.

April 26, 2024

  • Included information about allocating resources for improved performance of a self-hosted Soda Agent.

April 25, 2024

  • Added release notes documentation for Soda Agent 1.1.0.
  • Added documentation to initiate an integration between Soda Cloud and Microsoft Purview.

April 24, 2024

April 22, 2024

April 12, 2024

April 10, 2024

April 4, 2024

  • Added release notes documentation for Soda Library 1.4.5 & 1.4.6, and Soda Agent 1.0.4 and 1.0.5.
  • Updated the example script in Reroute failed row samples guide.

March 27, 2024

  • Added release notes documentation for Soda Core 3.3.1, Soda Library 1.4.4, and Soda Agent 1.0.3.

March 21, 2024

  • Added release notes documentation for Soda Library 1.4.3.
  • Added more content and example for rerouting failed row samples.
  • Added requirement for using anomaly detection checks in group by configurations: requires Soda Library 1.1.27 or greater, or Soda Agent 0.8.57 or greater.

March 20, 2024

March 18, 2024

March 15, 2024

  • Published documentation for V2 of data contracts, Soda’s experimental way to set data quality standards for data products.

March 12, 2024

  • Added instructions for how to programmatically use Soda Library with an example script to reroute failed row samples to the CLI output instead of Soda Cloud.

March 6, 2024

March 5, 2024

March 1, 2024

  • Published guidance for managing sensitive data in Soda.
  • Added release notes documentation for Soda Library 1.4.1.
  • Added a compatibility legend to SodaCL reference documentation to clarify which checks are available via various means; see example.

February 29, 2024

  • Following improvements and changes to the self-hosted Soda Agent 1.0.0, removed the documented details for including idle replicas and polling intervals in a cluster that aimed to improve scan times. Also, added release notes to inform existing Soda Agent users about changes to parameter configuration with 1.0.0 and advice for optimal performance using managed node groups instead of Fargate profiles in Amazon EKS, GCP Autopilot, or AKS Virtual Clusters. See Soda Agent release notes for upgrade details.
  • Added details for system requirements for deploying a Soda Agent in a Kubernetes cluster.
  • Included schema checks as available to add as a no-code check to a dataset in a data source that uses a Soda Agent to execute scans.
  • Added instructions for how to run a Soda Cloud-defined scan remotely using the Soda Library CLI. See the Remotely run a scan tab in Scan for data quality.
  • Added release notes documentation for Soda Library 1.3.4, 1.4.0 and Soda Core 1.3.3.
  • Added Databricks SQL to the list of data sources you can connect to using a Soda-hosted Agent.

February 28, 2024

  • Added notation for opting out of usage statistics with a Soda Agent.
  • Added notation that Group By checks support a maximum of 1000 groups.
  • Changed instances of scheduled scan and scan schedule to scan definition to match the Soda Cloud user interface.
  • Clarified the support for casting columns when using a freshness check.
  • Clarified the Basic SAML Configuration values to provide during SSO integration with Azure AD.

February 26, 2024

  • Added release notes documentation for Soda Agent 0.9.1, which maps to Soda Library 1.3.2.

February 22, 2024

February 21, 2024

  • Added API documentation for the Soda Cloud API that enables you to trigger Soda Cloud scans programmatically.
  • Added a new section to Scan for data quality for triggering a scan via API.
  • Added release notes documentation for Soda Agent 0.9.0, which maps to Soda Library 1.3.2.

February 14, 2024

  • Updated connection configuration parameters for Athena and Oracle.
  • Made corrections to the connection details for Athena and Redshift; access keys are required parameters for each, regardless of whether you also use a role_arn parameter.

February 13, 2024

  • Added release notes documentation for Soda Library 1.3.3.
  • Added release notes documentation for Soda Library 1.3.2 and Soda Core 3.2.1.
  • Added release notes documentation for Soda Agent 0.8.57, which maps to Soda Library 1.3.2.

February 9, 2024

February 8, 2024

February 2, 2024

  • Added links to a video that demonstrates how to add Soda to a Databricks pipeline.
  • Added release notes documentation for Soda Agent 0.8.56, which maps to Soda Library 1.2.4.

February 1, 2024

  • Published new documentation for the Soda-hosted agent, a secure, out-of-the-box agent you can use to connect to data sources from within the Soda Cloud user interface.
  • Added documentation for the new anomaly detection check, which replaces the anomaly score check.
  • Added release notes documentation for Soda Agent 0.8.55, which maps to Soda Library 1.2.3.

January 29, 2024

  • Added release notes documentation for Soda Agent 0.8.54, which maps to Soda Library 1.2.3.

January 26, 2024

January 22, 2024

  • Updated the documentation for rerouting failed row samples to include new, optional configuration parameters that offer users direct access to the failed row sample data.

January 19, 2024

January 15, 2024

January 12, 2024

January 5, 2024

January 3, 2024

  • Updated Integrate Jira with Soda to include copy-able code snippets for the field values in Jira.
  • Documented the optional syntax for anomaly score checks to produce warnings instead of fails.
  • Added release notes documentation for Soda Library 1.1.29 and Soda Core 3.1.3.

January 2, 2024

  • Added alternate syntax for failed row check using a failed row condition.

December 21, 2023

  • Documented the support for tracking anomalies and changes over time in checks grouped by category.
  • Updated the Self-serve Soda use case guide to include instructions for using no-code checks and Discussions to empower non-coders to join the team effort of establishing good-quality data.

December 15, 2023

  • Added release notes documentation for Soda Library 1.1.27 and Soda Agent 0.8.53.
  • Added release notes documentation for Soda Library 1.1.28 and Soda Core 3.1.2.

December 13, 2023

  • Updated freshness check to include support for in-check filters.
  • Added documentation to clarify that Soda supports Azure Data Factory (ADF) with Airflow using Synapse connection configuration.
  • Documented the support for adding quotes to all datasets that Soda acts upon automatically such as with profiling or discovering datasets.
  • Added an example of an in-check filter that uses a string value.
  • Added a troubleshooting item for the error NoneType object is not iterable.
  • Added instructions for dynamically including a dataset name in a for each configuration.
  • Prepared new, independent documentation for integrating Soda with Jira and ServiceNow.

December 7, 2023

  • Introducing no-code check creation in Soda Cloud. Use the Soda Cloud user interface to create SodaCL checks without writing any SodaCL.

December 4, 2023

  • Added release notes documentation for Soda Library 1.1.26, Soda Core 3.1.1, and Soda Agent 0.8.51 - 0.8.52.

November 29, 2023

November 28, 2023

November 24, 2023

  • Added release notes documentation for Soda Library 1.1.24 - 1.1.25.
  • Added Known issue to Group By configuration; does not support anomaly score checks.
  • Adjusted workaround advice for troubleshooting error using quotes with an in-check filter.
  • Added Advanced configuration for setting key column identifiers.

November 22, 2023

  • Added an example of a schema check that detects columns which could contain PII.
  • Added release notes documentation for Soda Agent 0.8.49 - 0.8.50.

November 21, 2023

  • Added documentation for managing scans and setting up failed scan notifications.
  • Added work_group as an optional connection configuration property for Athena.
  • Added troubleshooting tip for using quotes on column names within an in-check filter. See Troubleshoot SodaCL.
  • In the context of Soda Cloud, changed instances of scan definition to scan schedule to reflect the updated naming in the Soda Cloud UI.

November 16, 2023

  • Introducing the launch of data contracts, Soda’s experimental way to set data quality standards for data products.
  • Added release notes documentation for Soda Core 3.1.0.

November 15, 2023

  • Corrected the rule that numeric characters in a list of valid values, invalid values, or missing values, must be wrapped in single quotes. This is not the case. See Specify valid or invalid values for corrected content.

November 14, 2023

  • Added release notes documentation for Soda Library 1.1.22 and Soda Core 3.0.54.

November 8, 2023

  • Removed Reporting API v0 documentation as the version is now deprecated.

November 7, 2023

  • Added two configuration keys for use with validity metrics: invalid format, invalid regex.
  • Soda Cloud Reporting API v0 is now deprecated. Please use Reporting API v1.
  • Updated data source configuration reference content to fill in blanks and offer more examples.

November 2, 2023

  • Added pollingInterval to Soda Agent deployment instructions.
  • Added release notes documentation for Soda Library 1.1.20 - 1.1.21 and Soda Core 3.0.52 - 3.0.53.
  • Added sample input values and clarifying notes to data source connection config reference for Athena.

October 30, 2023

  • Updated anomaly score documentation to include support for dataset filters.
  • Added documentation to accompany new support for Presto data source.

October 26, 2023

  • Added documentation to accompany new support for MotherDuck data source.

October 25, 2023

  • Added to the list of supported check types in SodaGPT.
  • Added another example snippet to Group By checks.

October 24, 2023

  • Added instructions to Connect Soda to Spark to recommend changing the name of the data_source_name in step 5.

October 23, 2023

  • Added release notes documentation for Soda Library 1.1.19.
  • Clarified instructions about adding a check identity to a check; see Add a check identity.
  • Corrected the syntax for data source connection values when using the GitHub Action for Soda in a Workflow; needed spaces before and after variables in single curly braces. See Add the GitHub Action for Soda to a Workflow.
  • Added Slack icon in header to link to Soda Community.

October 17, 2023

  • Deprecated sampling from distribution check DRO generation.
  • Documented the support for adding alert conditions to a failed row check.
  • Added instructions for applying check attributes to multiple checks in a single checks for dataset_name block.

October 13, 2023

  • Added new content to clarify what an active check is. Soda’s licensing model can include volume-based measures of active checks.
  • Added link to new video for Atlan integration.

October 12, 2023

October 11, 2023

  • Refactored the content on docs.soda.io to focus more on use cases, tasks, and reader goals. The goal of the project was to pivot from a products-based set of documentation to task-based/use case-based content.
    You may notice a change to the navigation on docs.soda.io that is organized by actions (Install, Deploy, Run Scans, Set alerts, etc.) instead of by product (Soda Library, Soda Cloud, Soda Agent, etc.)
    • Access a new Get started roadmap with recommendations to help you quickly become productive and confident using Soda for data quality testing.
    • Get inspired by new Use case guides to offer guidance in setting up Soda to meet a specific need.
    • Get your Soda account organized and set up to maximize your team’s data quality testing efficiency.
  • Updated Integrate Soda with dbt to install sub-packages with double-quotes.
  • Updated best practices for reconciliation checks to recommend creating a separate agreement for a reconciliation project.
  • Added release notes documentation for Soda Library 1.1.15 - 1.1.16 and Soda Core 3.0.51.

October 6, 2023

  • Updated session_parameters config to session_params in Snowflake connection config reference.
  • Added instructions for how to reset anomaly history for an anomaly score check.
  • Added detail to programmatic scan to include a filename in a scan when checks are included inline.
  • Added release notes documentation for Soda Library 1.1.14.

October 5, 2023

September 27, 2023

  • Added clarifying information about user input and how it is used by SodaGPT.
  • Added release notes documentation for Soda Library 1.1.13 and Soda Core 3.0.50.

September 26, 2023

  • Added documentation for reconciliation schema checks which now support data type mapping.
  • Documented a new scan option, --local that you can add to a soda scan command to prevent Soda Library from pushing any check results to Soda Cloud. See: Add scan options and Scan output in Soda Cloud.
  • Revised and tightened Soda Core information.
  • Documented the global configuration to disable sending any samples of data to Soda Cloud; see Disable samples in Soda Cloud.

September 21, 2023

  • Updated support for dbt for ingesting tests into Soda Cloud. You must now install a soda-dbt subpackage that uses dbt 1.5 or 1.6.
  • Added release notes documentation for Soda Library 1.1.12.

September 20, 2023

September 19, 2023

  • Added release notes documentation for Soda Library 1.1.11 and Soda Core 3.0.49.

September 18, 2023

September 14, 2023

  • Removed known issue for inability to use check identity with failed row checks. This is now supported in Soda Library.

September 13, 2023

September 12, 2023

September 11, 2023

September 1, 2023

August 31, 2023

August 30, 2023

August 24, 2023

  • Added documentation for the new, native integration of Soda in Atlan.
  • Updated orchestration documentation to include a link to an Astronomer tutorial for Data Quality Checks with Airflow, Snowflake, and Soda.
  • Added an item to Troubleshoot SodaCL for dealing with unexpected missing checks behaviour.

August 23, 2023

  • Updated agreement documentation to reflect the change in behaviour where scans do not run until stakeholders have approved the agreement.

August 21, 2023

August 11, 2023

  • Added release notes documentation for Soda Core 3.0.48.
  • Added release notes documentation for Soda Library 1.0.6 - 1.0.8.
  • Added Known issue: Failed rows checks do not support the check identity parameter.
  • Added a note to Create an agreement to clarify that you can only create agreements using data sources that have been added to Soda Cloud via a Soda Agent.
  • Added collection as a new term in the Glossary.

August 10, 2023

August 8, 2023

  • Revised documentation to reflect the new Checks dashboard, which displays checks and their latest scan results. This replaces the Check Results dashboard, which displayed all individual check results.

August 7, 2023

July 26, 2023

  • Added release notes documentation for Soda Library 1.0.5.
  • Added detail to schema check documentation for new schema_name parameter.

July 24, 2023

July 21, 2023

July 6, 2023

  • Added documentation for the samples columns check configuration for metrics and checks that implicitly collect failed row samples: missing, validity, duplicates, reference.

July 4, 2023

  • Updated commands for installing Soda Library using a Docker image.

June 27, 2023

  • Added documentation to accompany the preview launch of SodaGPT.

June 23, 2023

  • Changed requirement for check template to include the dataset identifier in the first line of the check so that Soda Cloud can properly render the check results.
  • Added release notes documentation for Soda Core 3.0.40 and Soda Core 3.0.41.
  • Added release notes documentation for Soda Library 1.0.1 and Soda Library 1.0.2.
  • Reverted Soda Agent to describe configuring Soda Core settings instead of Library. Will update to Soda Library details when updates are complete.

June 15, 2023

June 12, 2023

June 9, 2023

  • Added Known Issue for using BigQuery and specifying numeric missing values or valid values with single quotes. TL;DR: Don’t use single quotes.
  • Added clarification to the value for path when connecting a DuckDB data source.
  • Removed incorrect syntax guidance regarding multiple thresholds for an alert. Each warn or fail condition can contain only one threshold. See Optional check configurations.
  • Updated instructions for configuring a soda_cloud connection in a configuration.yml file. New instructions involve copying the whole configuration instead of just API Key values.

June 8, 2023

  • Added release notes documentation for Soda Core 3.0.38 and Soda Core 3.0.39.

May 31, 2023

  • Added instructions and event payload details for using a webhook to notify a third-party of new, deleted, or changed Soda agreements.

May 30, 2023

  • Added a new parameter, datasource_container_id, to the .datasource-mapping.yml file needed to map a Soda Cloud-Alation catalog integration.

May 25, 2023

  • Added a step for configuring soda-core-spark[databricks] to be sure to install databricks-sql-connector as well.

May 23, 2023

  • Added a video overview showcasing the integration of Soda and Alation.
  • Added a note for a Known Issue regarding the use of variables in profiling configurations.

May 20, 2023

May 15, 2023

May 11, 2023

  • Added release notes documentation for Soda Core 3.0.33 and Soda Core 3.0.34.
  • Added instructions for user-defined metrics to access and use queries in separate SQL files.
  • Adjusted content for the revised CLI and Soda Cloud scan output for schema checks. Schema check results now display the output for all alerts triggered during a scan.

May 9, 2023

  • Added the install package to each connector’s page.
  • Added a connectivity troubleshooting tip to Connect to Snowflake.

May 2, 2023

  • Published content regarding the setup of multiple Soda Cloud organizations for use with different environments in your network infrastructure.
  • Added a note about selecting a region when you sign up for a new Soda Cloud account.

April 28, 2023

  • Corrected the explanation of the duplicate_count check regarding checks that included multiple arguments (columns).

April 18, 2023

  • Added release notes documentation for Soda Core 3.0.31 and Soda Core 3.0.32.

April 11, 2023

  • Added a copy-to-clipboard button to most code snippets in documentation.
  • Added attribute mapping details to add Soda Cloud to Google Workspace as a SAML app.

March 29, 2023

March 28, 2023

March 24, 2023

March 21, 2023

  • Added release notes documentation for Soda Core 3.0.29 and Soda Core 3.0.30.
  • Added instructions for limiting samples for an entire data source.

March 9, 2023

March 8, 2023

  • Added information to Troubleshoot SodaCL about checks that return [NOT EVALUATED] results.
  • Added new content with advice to Compare data using SodaCL.
  • Documented how to prevent Soda from collecting failed rows samples and sending them to Soda Cloud using a samples limit.
  • Corrected a prerequisite in Add a data source to indicate that you can deploy a Soda Agent in any Kubernetes cluster, not just Amazon EKS.

March 7, 2023

  • Added release notes documentation for Soda Core 3.0.26 & 3.0.27.

February 28, 2023

  • Published instructions for setting up private connectivity to a Soda Cloud account using AWS PrivateLink.

February 23, 2023

February 22, 2023

  • Removed preview status from agent deployment documentation for Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE).
  • Added instructions for programmatically running a Soda scan of the contents of a local file using Dask.

February 21, 2023

February 16, 2023

  • Added documentation for the invalid values configuration key. Refer to Validity metrics documentation.
  • Added release notes documentation for Soda Core 3.0.23.
  • Corrected custom check templates to use fail condition syntax, not fail expression.
  • Added instructions to Configure a time partition using the NOW variable.
  • Added a note for limitations on using variables in checks in agreements in Soda Cloud.

February 10, 2023

February 9, 2023

January 25, 2023

  • Added release notes documentation for Soda Core 3.0.22.
  • Added a detail for adding an optional scheme property to soda_cloud configuration when connecting Soda Core to Soda Cloud.
  • Added documentation to accompany new support for Dask and Pandas (Experimental).

January 24, 2023

  • Added documentation to accompany new support for Vertica (Experimental).
  • Added troubleshooting tip for errors in which Soda does not compute metrics for a dataset that includes a schema in its identifier.

January 20, 2023

January 19, 2023

  • Added clarity to the documentation for adding a check identity and using a scan definition name.
  • Added release notes documentation for Soda Core 3.0.20.
  • Added release notes documentation for Soda Core 3.0.21.
  • Updated screenshots of Soda Cloud for deploying an agent.
  • Added explicit detail about when to wrap date variables in single quotes.
  • Added a custom check template for validating event sequence with date columns.
  • Updated the Soda product feature list.

January 13, 2023

  • Updated Soda Agent for GKE documentation so that the instructions for using a file reference for a BigQuery data source connection use a Kubernetes secret instead of a Kubernetes ConfigMap.

January 11, 2023

  • Added documentation for the ability to create and use check attributes.
  • Adjusted documentation for adding dataset attributes to correspond with the new check attributes feature.
  • Added release notes documentation for Soda Core 3.0.18.
  • Removed the known issue for using duplicate_count and duplicate_percent metrics with an in-check filter.

January 10, 2023

  • Added note about the new ability to add co-owners to an agreement.

December 28, 2022

December 20, 2022

December 15, 2022

  • Added release notes documentation for Soda Core 3.0.16.
  • Corrected data types on which max and min metrics can be used. See Numeric metrics.

December 12, 2022

December 8, 2022

  • Added preview documentation for the Soda Cloud Reporting API v1.
  • Corrected documentation to properly reflect that you can add only one column against which to execute a metric in a check.
  • Reverted the statement about using variables to pass any value anywhere in syntax or configuration at scan time. Refer to variables documentation for details on how to use them.

December 2, 2022

  • Added preview documentation for deploying a Soda Agent in an AKS cluster. Reorganized and expanded Soda Agent documentation in general.
  • Added documentation to cast a column so as to use TEXT type data in a freshness check.
  • Documented troubleshooting tips for Soda Cloud 400 response.

December 1, 2022

November 30, 2022

  • Adjusted the documentation for dataset discovery because, as of Soda Core v3.0.14, the action no longer derives a row_count metric; see Dataset discovery.
  • Added documentation for the preview of the alert notification rules feature.

November 28, 2022

November 23, 2022

November 18, 2022

  • Added a list of valid formats for validity metrics that Soda for MS SQL Server supports.
  • Added documentation for rerouting failed rows samples to an HTTP endpoint; supported as of Soda Core 3.0.13.
  • Removed content for overwriting Soda Cloud checks results using -t option.
  • Archived all Soda SQL and Soda Spark content to the sodadata/soda-sql repository in GitHub.

November 16, 2022

  • Added content to more explicitly describe the metrics that dataset discovery and column profiling derive, and the potential compute costs associated with these configurations.

November 15, 2022

  • Added release notes documentation for Soda Core 3.0.13.
  • Adjusted freshness check documentation to reflect new support for columns that contain data type DATE.
  • Added documentation to accompany new support for OracleDB.

November 14, 2022

  • Corrected the location in which to opt out of sending Soda Core usage statistics.

November 10, 2022

November 8, 2022

  • Added an example webhook integration for Soda Cloud and ServiceNow.

November 7, 2022

November 3, 2022

  • Added release notes to correspond with the release of Soda Core 3.0.12.
  • Added documentation for a new numeric metric: duplicate_percent. See Numeric metrics.
  • Removed known issue regarding Soda Core for SparkDF not supporting anomaly score or distribution checks; now the checks are supported.
  • Added documentation for a new feature to disable failed rows samples for specific columns.
  • Added documentation for distribution checks which now support dataset and in-check filters. See Distribution check optional check configurations.

November 2, 2022

  • Removed missing format as a valid configuration key for missing metrics.
  • Added an independent Connect to Databricks page that points to documentation to use Soda Core packages for Apache Spark to connect.

November 1, 2022

October 26, 2022

  • Removed the Preview status from self-serve features which are now generally available in Soda Cloud, such as agreements and profiling.
  • Migrated custom metric templates from Soda SQL to SodaCL.

October 19, 2022

October 13, 2022

  • Added notes specifying that the type of quotes you use in SodaCL checks must match the type that the data source uses.
  • Added short snippet as an example to obtain scan exit codes in a programmatic scan.
  • Added detail about using multiple checks files in one scan command.
  • Added detail about re-using user-defined metrics in multiple checks in the same checks YAML file.

October 11, 2022

October 5, 2022

  • Added release notes to correspond with the release of Soda Core 3.0.10.
  • Revised the value for the default number of failed row samples that Soda automatically collects and displays in Soda Cloud from 1000 to 100.
  • Added documentation to accompany new support for Dremio.
  • Added documentation to accompany new support for ClickHouse (Experimental).

September 29, 2022

  • Added a link to a community contribution for Prefect 2.0 collection for Soda Core.
  • Updated Reference checks documentation for displaying failed rows in Soda Cloud.

September 28, 2022

  • Added release notes to correspond with the release of Soda Core 3.0.9.
  • Added documentation for a new samples limit configuration key that you can add to checks that use missing, validity, or duplicate_count metrics which automatically send 1000 failed row samples to Soda Cloud.
  • Added instructions to save failed row samples to a file.
  • Added Windows-specific instructions for installing Soda Core using a virtual environment.
  • Removed the known issue for in-check variables, which are supported as of Soda Core 3.0.9. The removed text read: “Except for customizing dynamic names for checks, you cannot use in-check variables. For example, Soda does not support the following check:”
checks for dim_customers:
  - row_count > ${VAR_2}

September 23, 2022

September 22, 2022

  • Added release notes to correspond with the release of Soda Core 3.0.8.
  • Removed Known issue: Connections to MS SQL Server do not support checks that use regex, such as with missing metrics or validity metrics.

September 14, 2022

  • Added instructions for configuring a custom sampler for failed rows.

September 13, 2022

  • Added documentation to correspond with the release of Soda Core 3.0.7, including an update to freshness check results.
  • Removed the known issue for using variables in the SQL or CTE of a user-defined check. See GitHub Issue 1577.
  • Added instructions for configuring the same scan to run in multiple environments.
  • Added information about passing parameters to a Snowflake data source in connection configurations, specifically which parameter to use to authenticate a connection via SSO with a SAML 2.0-compliant identity provider (IdP).

September 12, 2022

  • Documented Soda Cloud resources to add visual context to the parts that exist in Soda Cloud, and how they relate to each other, particularly when you delete a resource.
  • Added documentation to correspond with Soda Cloud’s new support for webhooks to integrate with third-party service providers to send alert notifications or create and track incidents externally.
  • Corrected documentation to indicate that reference checks do not support dataset filters.

September 9, 2022

  • Decoupled data source connection configuration details from Soda Core. Created a separate page for each data source’s connection config details. See Connect a data source.

September 8, 2022

September 7, 2022

  • Added content to correspond with Soda Core’s new support for Spark for Databricks SQL.
  • Adjusted documentation to reflect that Soda Core now supports the ingestion of dbt tests.

August 30, 2022

  • Recorded known issue: Soda Core for SparkDF does not support anomaly score or distribution checks.

August 29, 2022

August 26, 2022

  • Added documentation for how to use Soda Core for SparkDF with a Notebook to connect to Databricks.
  • Adjusted the configuration for connecting to MS SQL Server based on community feedback.

August 24, 2022

  • Adjusted configuration instructions for soda-core-spark-df to separately install dependencies for Hive and ODBC as needed.
  • Added content to correspond with Soda Core’s new support for Trino.
  • Removed the known issue: The missing format configuration does not function as expected.

August 22, 2022

  • Added an example DAG for using Soda with Airflow PythonOperator.
  • Added Tips and best practices for SodaCL documentation.
  • Expanded For each documentation with optional configurations and examples.
  • Published a new Quick start for Soda Cloud (Preview) that outlines how to use preview features in Soda Cloud to connect to a data source, then write a new agreement for stakeholder approval.

August 11, 2022

  • Added documentation for the new -t option for use with scan commands to overwrite scan output in Soda Cloud.

August 10, 2022

  • Added content to correspond with Soda Core’s new support for MySQL.
  • Validated and clarified documentation for using filters and variables.

August 9, 2022

  • Added documentation to describe the migration path from Soda SQL to Soda Core.

August 2, 2022

  • Adjusted the instructions for Slack integration to correspond with a slightly changed UI experience.
  • Added a limitation to the for each documentation: the configuration is not compatible with dataset filters (also known as partitions).

August 1, 2022

July 27, 2022

July 20, 2022

  • Published documentation associated with the preview release of Soda Cloud’s self-serve features and functionality. This is a limited access preview release, so please ask us for access at support@soda.io.

June 29, 2022

June 28, 2022

  • Revised documentation to reflect the general availability of Soda Core and SodaCL.
  • Archived the deprecated documentation for Soda SQL and Soda Spark.

June 23, 2022

June 22, 2022

June 21, 2022

  • Added details to Soda Core documentation for using system variables to store sensitive credentials.
  • Updated the Quick start for Soda Core and Soda Cloud with slightly changed instructions.

June 20, 2022

  • Changed all references to table in SodaCL to dataset, notably used with for each and distribution check syntax.
  • Added deprecation warning banners to all Soda SQL and Soda Spark content.
  • Revised and reorganized content to reframe focus on Soda Core in lieu of Soda SQL.
  • New How Soda Core works documentation.
  • Added more Soda Core documentation to main docs set.
  • Updated Soda product overview to reflect new focus on Soda Core and imminent deprecation of Soda SQL and Soda Spark.
  • Updated Soda Cloud documentation to reflect new focus on Soda Core.
  • Updated links on docs home page to point to most recent content and shifted Soda SQL and Soda Core to a Legacy section.

June 14, 2022

  • Added documentation corresponding to Soda Core support for Apache Spark DataFrames, for use with programmatic Soda scans only.
  • Updated the syntax for freshness checks to remove using from the syntax and identify column name instead by wrapping in parentheses.
    • old: freshness using created_at < 3h
    • new: freshness(created_at) < 3h
  • Added clarification to the context-specific meaning of a BigQuery dataset versus a dataset in the context of Soda.
  • Added instructions for setting a default notification channel in Slack for Soda Cloud alerts.
  • Added an explanation about anomaly score check results and the minimum number of measurements required to gauge an anomaly.
  • Moved installation instructions for Soda Core Scientific to a sub-section of Install Soda Core.
  • Added expanded example for setting up Soda Core Spark DataFrames.

June 9, 2022

  • Added some new Soda Core content to documentation.
  • Moved Soda SQL and Soda Spark in documentation leftnav.
  • Updated Home page with links to new Soda Core documentation.
  • Fixed formatting in Quick start for Soda Core and Soda Cloud.

June 8, 2022

  • Updated the Quick start for SodaCL with an example of a check for duplicates.
  • Added documentation for installing Soda Spark on Windows.
  • Updated the Distribution check documentation to record a change in syntax for the check and the addition of two more methods available to use with distribution checks.

June 7, 2022

June 6, 2022

June 2, 2022

June 1, 2022

May 31, 2022

  • Updated SodaCL Schema checks and Reference checks documentation.
  • Corrected Soda Cloud connection syntax in the Quick start for Soda Core and Soda Cloud.
  • Removed separate Duplicate checks documentation, redirecting to Numeric metrics.

May 26, 2022

May 25, 2022

  • Revised and renamed Data observability to Data concepts.

May 24, 2022

  • Updated the documentation for the distribution check in SodaCL, including instructions to install Soda Core Scientific.

May 19, 2022

May 18, 2022

  • Updated the details pertaining to connecting Soda Core to Soda Cloud. The syntax for the key-value pairs for API keys changed from api_key and api_secret to api_key_id and api_key_secret.
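
As a rough illustration of the renamed keys, the Soda Cloud connection block in a configuration YAML file might look like the sketch below. The `host` value and the environment variable names are placeholders assumed for this example; only the `api_key_id` and `api_key_secret` key names come from the note above.

```yaml
# Minimal sketch, assuming a Soda Core configuration YAML file.
# The host value and the ${...} variable names are illustrative placeholders.
soda_cloud:
  host: cloud.soda.io
  api_key_id: ${API_KEY_ID}          # previously: api_key
  api_key_secret: ${API_KEY_SECRET}  # previously: api_secret
```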

May 9, 2022

April 26, 2022

  • Updated a set of Soda product comparison matrices to illustrate the features and functionality available with different Soda tools.

April 25, 2022

  • Updated the Soda product overview with a more thorough explanation of the product suite and how the parts work together to establish and maintain data reliability.

April 22, 2022

  • Replaced the quick start tutorials for Soda SQL and Soda Cloud with two new tutorials:
    • Quick start for Soda SQL and Soda Cloud
    • Quick start for Soda Core and Soda Cloud

April 6, 2022

  • Added details to the Freshness check to clarify limitations when specifying duration.
  • Added documentation for how to use system variables to store property values instead of storing values in the env_vars.yml file.
  • Updated Soda Core documentation to remove aspirational content from Adding scans to a pipeline.

April 1, 2022

  • Added documentation for the dataset_name identifier in a scan YAML file. Use the identifier to send more precise dataset information to Soda Cloud.

March 22, 2022

  • New documentation for the beta release of Soda Core, a free, open-source, command-line tool that enables you to use the Soda Checks Language to turn user-defined input into aggregated SQL queries.
  • New documentation for the beta release of SodaCL, a domain-specific language you can use to define Soda Checks in a checks YAML file.

February 15, 2022

  • Added content to explain how Soda Cloud notifies users of a scan failure.

February 10, 2022

January 18, 2022

January 17, 2022

  • Added text to Roles and rights documentation about the option to use the Reporting API to access Audit Trail data.

January 12, 2022

  • Added documentation regarding Licenses in Soda Cloud.

January 11, 2022

  • Added requirement for installing Soda Spark on a Databricks cluster. See Soda Spark Requirements.

December 22, 2021

  • Added data types information for Trino and MySQL.
  • Adjusted the docs footer to offer users ways to suggest or make improvements to our docs.

December 16, 2021

  • Added documentation for how to integrate Soda with dbt. Access the test results from a dbt run directly within your Soda Cloud account.

December 14, 2021

  • Added documentation to accompany the new Soda Cloud Incidents feature. Collaborate with your team in Soda Cloud and in Slack to investigate and resolve data quality issues.

December 13, 2021

December 6, 2021

  • Added documentation for the new audit trail feature for Soda Cloud.
  • Added further detail about which rows Soda SQL sends to Soda Cloud as samples.

December 2, 2021

  • Updated Quick start tutorial for Soda Cloud.
  • Added information about using regex in a YAML file.

November 30, 2021

  • Added documentation about the anonymous Soda SQL usage statistics that Soda collects. Learn more about the information Soda collects and how to opt out of sending statistics.

November 26, 2021

November 24, 2021

November 23, 2021

  • Revised the Quick start tutorial for Soda SQL to use the same demo repo as the interactive demo.

November 15, 2021

  • Added a new, embedded interactive demo for Soda SQL.
  • New documentation to accompany the soft-launch of Soda Spark, an extension of Soda SQL functionality.

November 9, 2021

  • New documentation to accompany the new, preview release of historic metrics. This type of metric enables you to use Soda SQL to access the historic measurements in the Cloud Metric Store and write tests that use those historic measurements.

October 29, 2021

  • Added SSO identity providers to the list of third-party IdPs to which you can add Soda Cloud as a service provider.

October 25, 2021

  • Removed the feature to Add datasets directly in Soda Cloud. Instead, users add datasets using Soda SQL.
  • Added support for Snowflake session parameter configuration in the warehouse YAML file.

October 18, 2021

  • New documentation to accompany the new Schema Evolution Monitor in Soda Cloud. Use this monitor type to get notifications when columns are changed, added, or deleted in your dataset.

October 17, 2021

  • New documentation to accompany the new feature to disable or reroute sample data to Soda Cloud.

September 30, 2021

September 28, 2021

  • New documentation to accompany the release of SSO integration for Soda Cloud.

September 17, 2021

  • Added Soda Cloud metric names to primary list of column metrics.

September 9, 2021

  • Published documentation for time partitioning, column metrics, and sample data in Soda Cloud.

September 1, 2021

  • Added information for new command options included in Soda CLI version 2.1.0b15 for
    • limiting the datasets that Soda SQL analyzes,
    • preventing Soda SQL from sending scan results to Soda Cloud after a scan, and
    • instructing Soda SQL to skip confirmations before running a scan.
  • Added information about how to use a new option, account_info_path, to direct Soda SQL to your BigQuery service account JSON key file for configuration details.

August 31, 2021

  • Added documentation for the feature that allows you to include or exclude specific datasets in your soda analyze command.

August 30, 2021

  • Updated content and changed the name of Data monitoring documentation to Data quality.

August 23, 2021

  • New document for custom metric templates that you can copy and paste into scan YAML files.

August 9, 2021

  • Added details for Apache Spark support. See Install Soda SQL.
  • Updated Adjust a dataset scan schedule to include detailed instructions for triggering a Soda scan externally.

August 2, 2021

  • Added new document to outline the Support that Soda provides its users and customers.
  • Updated BigQuery data source configuration to include auth_scopes.

July 29, 2021

  • Added instructions for configuring BigQuery permissions to run Soda scans.
  • Added an example of a programmatic scan using a lambda function.
  • Added instructions for overwriting scan output in Soda Cloud.
  • New document for Example test to compare row counts.

July 26, 2021

  • Added Soda SQL documentation for configuring excluded_columns during scans.
  • Updated compatible data sources for Soda SQL to include MySQL (experimental), and Soda Cloud to improve accuracy.
  • Updated Create monitors and alerts to include custom metrics as part of creation flow; updated prerequisites.
  • Updated Product overview comparison for new excluded_columns functionality and custom metrics in Soda Cloud.
  • Minor adjustments to reflect new and changed elements in the Soda SQL 2.1.0b12 release.

July 16, 2021

  • Added early iteration of content for Best practices for defining tests and running scans.
  • Added a link to the docs footer to open a GitHub issue to report issues with docs.

July 13, 2021

  • New Add datasets documentation for the newly launched feature that enables you to connect to data sources and add datasets directly in Soda Cloud.
  • New Collaborate on data monitoring documentation that incorporates how to integrate with Slack, and how to include your team in your efforts to monitor your data.
  • New Adjust a dataset scan schedule content to help you refine how often Soda scans a particular dataset.
  • Revised Quick start tutorial for Soda Cloud that incorporates the new feature to add datasets.
  • Improved Soda product overview page with a comparison chart for features and functionality.

July 6, 2021

If you want to know which flavor of Soda is best, you need to examine the criteria of what makes a good Soda. Is it sweet? Is it performant? Is it an appealing color? Does it produce valid SQL?

Though not conclusive, early test results would indicate that the best flavors of Soda, in descending order, are as follows:

  1. Cream Soda
  2. Root Beer
  3. Coca Cola
  4. Ginger Beer
  5. Cherry Cola

Last modified on 03-Oct-24