This document tracks the history and evolution of the Open Data Contract Standard.
v3.0.2 - 2025-03-31 - APPROVED
- Added field `physicalName` for the properties in JSON schema.
- Explicitly specifies `YYYY-MM-DDTHH:mm:ss.SSSZ` for default date format.
- Added field `name` to team members in JSON schema and docs.
- Added field `description` to team members in JSON schema and docs.
- Fixed Athena Server required property name from `staging_dir` to `stagingDir`.
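As an illustration of the `YYYY-MM-DDTHH:mm:ss.SSSZ` pattern, here is a minimal Python sketch. It assumes the trailing `Z` is read as the UTC designator and that `SSS` means milliseconds; the helper name `format_default_timestamp` is hypothetical and not part of the standard.

```python
from datetime import datetime, timezone


def format_default_timestamp(moment: datetime) -> str:
    """Render an aware datetime as YYYY-MM-DDTHH:mm:ss.SSSZ (UTC, millisecond precision)."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Assumption: the trailing Z in the pattern denotes UTC.
    utc = moment.astimezone(timezone.utc)
    # strftime has no millisecond directive, so truncate microseconds by hand.
    millis = utc.microsecond // 1000
    return f"{utc:%Y-%m-%dT%H:%M:%S}.{millis:03d}Z"


example = datetime(2025, 3, 31, 14, 5, 9, 120000, tzinfo=timezone.utc)
print(format_default_timestamp(example))  # 2025-03-31T14:05:09.120Z
```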
v3.0.1 - 2024-12-22 - APPROVED
- Added field `authoritativeDefinitions` into JSON schema.
- Added field `description.customProperties` into JSON schema.
- Added field `description.authoritativeDefinitions` into JSON schema.
- Added field `role.customProperties` into JSON schema.
- Updated `status` field to include examples.
- Updated `authoritativeDefinitions` description to be vendor agnostic.
- Updated `tags` description and included examples.
v3.0.0 - 2024-10-21 - APPROVED
- New section: Support & communication channels.
- New section: Servers.
- Changes to fundamentals:
  - Rename `uuid` to `id`.
  - Add `name`.
  - Rename `quantumName` to `dataProduct` and make it optional.
  - Rename `datasetDomain` to `domain` (we avoid the dataset prefix).
  - Drop `datasetKind` (example: `virtualDataset`, was optional, have not seen any usage).
  - Drop `userConsumptionMode` (examples: `analytical`, was optional, already deprecated in v2.).
  - Drop `sourceSystem` (example: `bigQuery`, information will be encoded in servers).
  - Drop `sourcePlatform` (example: `googleCloudPlatform`, information will be encoded in servers).
  - Drop `productSlackChannel` (will move to support channels).
  - Drop `productFeedbackUrl` (will move to support channels).
  - Drop `productDl` (will move to support channels).
  - Drop `username` (credentials should not be stored in the data contract).
  - Drop `password` (credentials should not be stored in the data contract).
  - Drop `driverVersion` (will move to servers if needed).
  - Drop `driver` (will move to servers if needed).
  - Drop `server` (will move to servers if needed).
  - Drop `project` (BigQuery-specific, will move to servers).
  - Drop `datasetName` (BigQuery-specific, will move to servers).
  - Drop `database` (BigQuery-specific, will move to servers).
  - Drop `schedulerAppName` (not part of the contract).
- Changes to Schema:
  - Major changes, check spec.
  - Adds support for non table formats, hierarchies, and arrays.
  - `name` is a new field.
  - `items` is a new field.
  - `priorTableName` is not supported anymore, if needed, consider a custom property.
  - `table` is not supported anymore, if needed, consider using `name`.
  - `columns` is now `properties`.
  - `dataGranularity` is now `dataGranularityDescription`.
  - `encryptedColumnName` is now `encryptedName`.
  - `partitionStatus` is now `partitioned`.
  - `clusterStatus` is not supported anymore, if needed, consider a custom property.
  - `clusterKeyPosition` is not supported anymore, if needed, consider a custom property.
  - `sampleValues` is now `examples`.
  - `isNullable` is now `required`.
  - `isUnique` is now `unique`.
  - `isPrimaryKey` is now `primaryKey`.
  - `criticalDataElementStatus` is now `criticalDataElement`.
  - `transformSourceTables` is now `transformSourceObjects`.
  - Restrict `schema.*.logicalType` to be one of `string`, `date`, `number`, `integer`, `object`, `array`, `boolean`.
  - Add `schema.*.logicalTypeOptions`.
- Changes to Data Quality:
  - Significant changes have been applied to support more tools and use cases. Please review the new section.
  - If needed, `templateName` is a custom property.
  - `toolName` is obsolete, replaced by `type=custom; engine: <engine name>`.
  - `scheduleCronExpression` is replaced by `schedule` and `scheduler`. `scheduleCronExpression: 0 20 * * *` becomes `schedule: 0 20 * * *` and `scheduler: cron`.
- Pricing:
  - No changes.
- Changes to team (fka stakeholders):
  - Replaces `stakeholders`. Content stays the same.
- Changes to Role:
  - Added `description`.
  - Changed: `access` is not required anymore.
- Security:
  - No changes.
- Changes to SLA:
  - Starting with v3, the schema is not purely tables and columns, hence minor modifications: columns are now elements.
  - `slaDefaultColumn` is now `slaDefaultElement`.
  - `column` is now `element`.
  - Explicit reference to Data QoS.
- Changes to custom and other properties:
  - `systemInstance` is not supported anymore, if needed, consider a custom property.
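Several of the v3.0.0 changes are plain key renames or removals, so a migration can be sketched as a few dictionary transforms. The Python below is only an illustration, not an official migration tool: the function and constant names are hypothetical, dropped keys are simply discarded rather than relocated to servers or support channels, and `isNullable` is left out because the entry above does not say how its values map to `required`.

```python
from copy import deepcopy

# Renames listed under "Changes to fundamentals".
FUNDAMENTAL_RENAMES = {
    "uuid": "id",
    "quantumName": "dataProduct",
    "datasetDomain": "domain",
}

# Keys dropped from the fundamentals in v3.0.0.
DROPPED_FUNDAMENTALS = {
    "datasetKind", "userConsumptionMode", "sourceSystem", "sourcePlatform",
    "productSlackChannel", "productFeedbackUrl", "productDl",
    "username", "password", "driverVersion", "driver", "server",
    "project", "datasetName", "database", "schedulerAppName",
}

# Straight renames listed under "Changes to Schema" (values are copied as-is).
PROPERTY_RENAMES = {
    "encryptedColumnName": "encryptedName",
    "partitionStatus": "partitioned",
    "sampleValues": "examples",
    "isUnique": "unique",
    "isPrimaryKey": "primaryKey",
    "criticalDataElementStatus": "criticalDataElement",
    "transformSourceTables": "transformSourceObjects",
}


def rename_keys(mapping: dict, renames: dict) -> dict:
    """Return a copy of mapping with keys renamed; unknown keys pass through."""
    return {renames.get(key, key): value for key, value in mapping.items()}


def migrate_fundamentals(contract: dict) -> dict:
    """Drop the removed top-level keys, then apply the fundamental renames."""
    kept = {k: deepcopy(v) for k, v in contract.items() if k not in DROPPED_FUNDAMENTALS}
    return rename_keys(kept, FUNDAMENTAL_RENAMES)


def migrate_quality_rule(rule: dict) -> dict:
    """Split scheduleCronExpression into schedule plus scheduler: cron."""
    migrated = dict(rule)
    cron = migrated.pop("scheduleCronExpression", None)
    if cron is not None:
        migrated["schedule"] = cron
        migrated["scheduler"] = "cron"
    return migrated


old = {"uuid": "53581432", "quantumName": "my quantum", "datasetDomain": "seller", "username": "svc"}
print(migrate_fundamentals(old))
# {'id': '53581432', 'dataProduct': 'my quantum', 'domain': 'seller'}
print(migrate_quality_rule({"scheduleCronExpression": "0 20 * * *"}))
# {'schedule': '0 20 * * *', 'scheduler': 'cron'}
```

Column-level entries can be passed through `rename_keys(column, PROPERTY_RENAMES)` in the same way; anything the changelog marks as "not supported anymore" would need a decision about custom properties that this sketch does not attempt.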
v2.2.2 - 2024-05-23 - APPROVED
- In JSON schema validation:
  - Change `dataset.description` data type from `array` to `string`.
  - Change `dataset.column.isPrimaryKey` data type from `string` to `boolean`.
  - Change `price.priceAmount` data type from `string` to `number`.
  - Change `slaProperties.value` data type from `string` to `oneOf[string, number]`.
  - Change `slaProperties.valueExt` data type from `string` to `oneOf[string, number]`.
- Update examples to adhere to JSON schema.
- Full example from README directs to full-example.yaml.
- Add in mkdocs for creating a documentation website. Check building-doc.md.
- Add vendors page vendors.md. Feel free to add anyone there.
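To make the v2.2.2 type changes concrete, the following Python sketch checks a parsed contract against them using only the standard library. It is not the project's JSON schema; the function names are hypothetical, and the layout it walks (`dataset` as a list of tables, each with a `columns` list, plus top-level `price` and `slaProperties`) is an assumption about how a v2.2.x contract is structured.

```python
def is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_string_or_number(value) -> bool:
    return isinstance(value, str) or is_number(value)


def find_type_errors(contract: dict) -> list[str]:
    """Collect violations of the data types introduced in v2.2.2."""
    errors = []
    # Assumed layout: dataset is a list of tables, each holding a columns list.
    for t_index, table in enumerate(contract.get("dataset", [])):
        if "description" in table and not isinstance(table["description"], str):
            errors.append(f"dataset[{t_index}].description must be a string")
        for c_index, column in enumerate(table.get("columns", [])):
            if "isPrimaryKey" in column and not isinstance(column["isPrimaryKey"], bool):
                errors.append(
                    f"dataset[{t_index}].columns[{c_index}].isPrimaryKey must be a boolean"
                )
    price = contract.get("price", {})
    if "priceAmount" in price and not is_number(price["priceAmount"]):
        errors.append("price.priceAmount must be a number")
    for s_index, sla in enumerate(contract.get("slaProperties", [])):
        for key in ("value", "valueExt"):
            if key in sla and not is_string_or_number(sla[key]):
                errors.append(f"slaProperties[{s_index}].{key} must be a string or a number")
    return errors


legacy = {
    "dataset": [{"description": "Sales", "columns": [{"isPrimaryKey": "true"}]}],
    "price": {"priceAmount": "9.95"},
    "slaProperties": [{"value": 4}],
}
for problem in find_type_errors(legacy):
    print(problem)
# dataset[0].columns[0].isPrimaryKey must be a boolean
# price.priceAmount must be a number
```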
v2.2.1 - 2023-12-18 - REPLACED BY V2.2.2
- Reformat quality examples to be valid YAML.
- The type of definition for authority now has standard values: `businessDefinition`, `transformationImplementation`, `videoTutorial`, `tutorial`, and `implementation`.
- Add in `isUnique`, `primaryKeyPosition`, `partitionKeyPosition`, and `clusterKeyPosition` to `column` definition.
- Add JSON schema to validate YAML files for v2.2.1.
- Integrated as part of Bitol.
- Reformat Markdown tables.
v2.2.0 - 2023-07-27 - REPLACED BY V2.2.1
- New name to Open Data Contract Standard.
- `templateName` is now called `standardVersion`. v2.2.0 parsers should account for this change and support both to avoid a breaking change.
- Added support for `authoritativeDefinitions` at the table level.
- Added many examples.
- Various improvements and typo corrections.
- Finalization of fork under AIDA User Group.
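The `templateName` to `standardVersion` rename asks parsers to accept both keys. A minimal Python sketch of that fallback, with a hypothetical helper name and the assumption that the new key wins when both are present:

```python
def read_standard_version(contract: dict):
    """Return the standard version, preferring the new key over the legacy one."""
    if "standardVersion" in contract:
        return contract["standardVersion"]
    # Fall back to the pre-v2.2.0 key; returns None when neither key is set.
    return contract.get("templateName")


print(read_standard_version({"standardVersion": "2.2.0"}))  # 2.2.0
print(read_standard_version({"templateName": "2.1.1"}))     # 2.1.1
```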
v2.1.1 - 2023-04-26 - REPLACED BY V2.2.0
- Open source version.
- Additional value field `valueExt` in SLA.
v2.1.0 - 2023-03-23 - REPLACED BY V2.1.1
Data Quality
The data contract adds elements specifically for interfacing with the Data Quality tooling.
Additions:
- quality (table level & column level check):
  - templateName (called standardVersion since v2.2.0)
  - dimension
  - type
  - severity
  - businessImpact
  - scheduleCronExpression
  - customProperties
- columns
  - isPrimaryKey
Physical names
The data contract is a logical construct; we add more specific links to the physical world.
Service-level agreement
The service-level agreements, not previously used, are now more detailed and follow the DP QoS pattern. See SLA.
Other
Removed the weight for system ratings from the data contract. Their default values remain.
v2.0.0 - REPLACED BY V2.1.0
Guidelines & Evolution
- Type case
- Support for SemVer versioning.
- Tags can have values.
Additions
- Version of contract definition: v2.0.0. A breaking change with v1.
- Description:
  - Purpose (text field).
  - Limitations (text field).
  - Usage (text field).
- Domain.
- Dictionary section:
  - Identification of masked column (encryptedColumnName property), example: the email_decrypted column would be masked by email_encrypted.
  - Flag for critical data element.
  - Added keys for transformation data (sources, logic, description).
  - Sample values.
  - Ability to specify links to authoritative sources at the column level (authoritativeDefinitions).
  - Business name.
- List of stakeholders:
  - Username (user account).
  - Role.
  - Date in.
  - Date out.
  - Replaced by.
- Service levels: agreements & objective original inspiration.
- Price / cost.
- Name changes to match PPaaS type case.
- Product data:
  - productDl.
  - productSlackChannel.
  - productFeedbackUrl.
- Renamed `tables` key to `dataset`.
- Removed `owner` key. Owner is now a stakeholder role.
- Additional quality keys:
  - description.
  - toolName.
  - toolRuleName.
- Custom Properties.
- Product dates:
  - generalAvailabilityDate.
  - endOfSupportDate.
  - endOfLifeDate.
v1 - DEPRECATED
- Description of the data quantum/data artifact.
- Roles.
- Schema:
  - Tables, columns.
- Data quality.
- System rating weightage.
- Ratings:
  - System, user, etc.