
Infrastructure & Servers

The servers element describes where the data protected by this data contract is physically located. This metadata lets a data consumer discover the data and lets a platform engineer automate access to it.

An entry in servers describes a single dataset on a specific environment and a specific technology. The servers element can contain multiple servers, each with its own configuration.

The top-level servers element is typically used in one of the following ways:

  • Single Server: The data contract protects a specific dataset at a specific location. Example: a CSV file on an SFTP server.
  • Multiple Environments: The data contract makes sure that the data is protected in all environments. Example: a data product with data in a dev(elopment), UAT, and prod(uction) environment on Databricks.
  • Different Technologies: The data contract makes sure that regardless of the offered technology, it still holds. Example: a data product offers its data in a Kafka topic and in a BigQuery table that should have the same structure and content.
  • Different Technologies and Multiple Environments: The data contract makes sure that regardless of the offered technology and environment, it still holds. Example: a data product offers its data in a Kafka topic and in a BigQuery table that should have the same structure and content in dev(elopment), UAT, and prod(uction).
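As a rough illustration of the last pattern, the sketch below lists one dataset offered through Kafka and BigQuery in two environments. All server identifiers, hosts, project names, and dataset names are hypothetical, and the format value is an assumption; the property names are taken from the Kafka and BigQuery tables later in this section.

```yaml
servers:
  - server: orders-kafka-dev            # hypothetical identifier
    type: kafka
    environment: dev
    host: "kafka-dev.example.com:9092"  # hypothetical bootstrap server
    format: json                        # assumed message format
  - server: orders-kafka-prod
    type: kafka
    environment: prod
    host: "kafka-prod.example.com:9092"
    format: json
  - server: orders-bigquery-dev
    type: bigquery
    environment: dev
    project: example-project-dev        # hypothetical GCP project
    dataset: orders
  - server: orders-bigquery-prod
    type: bigquery
    environment: prod
    project: example-project-prod
    dataset: orders
```

Each entry describes one dataset on one technology in one environment, so the two-by-two combination yields four entries.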


General Server Structure

Each server in the schema has the following structure:

servers:
  - id: my_awesome_server
    server: my-server-name
    type: <server-type>
    description: <server-description>
    environment: <server-environment>
    <server-type-specific-fields> # according to the server type, see below
    roles:
      - <role-details>
    customProperties:
      - <custom-properties>

Common Server Properties

| Key | UX Label | Required | Description |
|---|---|---|---|
| server | Server | Yes | Identifier of the server. |
| id | ID | No | A unique identifier used to reduce the risk of collisions, such as a UUID. |
| type | Type | Yes | Type of the server. Can be one of: api, athena, azure, bigquery, clickhouse, cloudsql, custom, databricks, db2, denodo, dremio, duckdb, glue, hive, impala, informix, kafka, kinesis, local, mysql, oracle, postgres, postgresql, presto, pubsub, redshift, s3, sftp, snowflake, sqlserver, synapse, trino, vertica, zen. |
| description | Description | No | Description of the server. |
| environment | Environment | No | Environment of the server. Examples include: prod, preprod, dev, uat. |
| roles | Roles | No | List of roles that have access to the server. See the roles section for more details. |
| customProperties | Custom Properties | No | Custom properties that are not part of the standard. |

Specific Server Properties

Each server type can be customized with different properties such as host, port, database, and schema, depending on the server technology in use. Refer to the specific documentation for each server type for additional configurations.

If your server type is not in the list, use custom and suggest the missing type as an improvement. The properties for each value of type are described below.

API Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| location | Location | Yes | URL to the API. |

Amazon Athena Server

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

| Key | UX Label | Required | Description |
|---|---|---|---|
| schema | Schema | Yes | Identify the schema in the data source in which your tables exist. |
| stagingDir | Staging Directory | No | Amazon Athena automatically stores query results and metadata information for each query that runs in a query result location that you can specify in Amazon S3. |
| catalog | Catalog | No | Identify the name of the Data Source, also referred to as a Catalog. |
| regionName | Region Name | No | The region your AWS account uses. |
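The following is a minimal sketch of an Athena entry built from the properties above. The server identifier, schema, bucket, catalog name, and region are all hypothetical values chosen for illustration.

```yaml
servers:
  - server: orders-athena-prod               # hypothetical identifier
    type: athena
    environment: prod
    schema: orders                           # hypothetical schema name
    stagingDir: "s3://example-athena-results/queries/"  # hypothetical query result location
    catalog: example_catalog                 # hypothetical data source name
    regionName: eu-west-1                    # hypothetical region
```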

Azure Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| location | Location | Yes | Fully qualified path to Azure Blob Storage or Azure Data Lake Storage (ADLS). Supports globs. |
| format | Format | Yes | File format. |
| delimiter | Delimiter | No | Only for format = json. How multiple JSON documents are delimited within one file. |
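A minimal sketch of an Azure entry that uses a glob in the location follows. The storage account, container, and path are hypothetical, and the abfss URL scheme and the delimiter value new_line are assumptions, since this section does not list the accepted values.

```yaml
servers:
  - server: orders-adls-prod     # hypothetical identifier
    type: azure
    environment: prod
    # Hypothetical ADLS path; the glob matches every JSON file in the folder.
    location: "abfss://data@examplestorage.dfs.core.windows.net/orders/*.json"
    format: json
    delimiter: new_line          # assumed value: one JSON document per line
```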

Google BigQuery

BigQuery is a fully managed, AI-ready data analytics platform that helps you maximize value from your data and is designed to be multi-engine, multi-format, and multi-cloud.

| Key | UX Label | Required | Description |
|---|---|---|---|
| project | Project | Yes | The Google Cloud Platform (GCP) project name. |
| dataset | Dataset | Yes | The GCP dataset name. |

ClickHouse Server

ClickHouse is an open-source column-oriented database management system that allows generating analytical data reports in real-time.

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the ClickHouse server. |
| port | Port | Yes | The port of the ClickHouse server. |
| database | Database | Yes | The name of the database. |

Google Cloud SQL

Google Cloud SQL is a fully managed, cost-effective relational database service for PostgreSQL, MySQL, and SQL Server.

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the Google Cloud SQL server. |
| port | Port | Yes | The port of the Google Cloud SQL server. |
| database | Database | Yes | The name of the database. |
| schema | Schema | Yes | The name of the schema. |

Databricks Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| catalog | Catalog | Yes | The name of the Hive or Unity catalog. |
| schema | Schema | Yes | The schema name in the catalog. |
| host | Host | No | The Databricks host. |
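As an illustration, a Databricks entry might look like the sketch below. The identifier, catalog, schema, and workspace host are hypothetical.

```yaml
servers:
  - server: orders-databricks-uat    # hypothetical identifier
    type: databricks
    environment: uat
    catalog: sales                   # hypothetical Unity catalog
    schema: orders                   # hypothetical schema in that catalog
    host: dbc-example.cloud.databricks.com   # hypothetical workspace host (optional)
```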

IBM Db2 Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the IBM Db2 server. |
| port | Port | Yes | The port of the IBM Db2 server. |
| database | Database | Yes | The name of the database. |
| schema | Schema | No | The name of the schema. |

Denodo Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the Denodo server. |
| port | Port | Yes | The port of the Denodo server. |
| database | Database | No | The name of the database. |

Dremio Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the Dremio server. |
| port | Port | Yes | The port of the Dremio server. |
| schema | Schema | No | The name of the schema. |

DuckDB Server

DuckDB supports a feature-rich SQL dialect complemented with deep integrations into client APIs.

| Key | UX Label | Required | Description |
|---|---|---|---|
| database | Database | Yes | Path to the DuckDB database file. |
| schema | Schema | No | The name of the schema. |

AWS Glue

| Key | UX Label | Required | Description |
|---|---|---|---|
| account | Account | Yes | The AWS Glue account. |
| database | Database | Yes | The AWS Glue database name. |
| location | Location | No | The AWS S3 path. Must be in the form of a URL. |
| format | Format | No | The format of the files. |

Hive

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at massive scale. Built on top of Apache Hadoop, Hive allows users to read, write, and manage petabytes of data using SQL-like queries through HiveQL, with native support for cloud storage systems and enterprise-grade security features.

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the Hive server. |
| port | Port | No | The port of the Hive server. Defaults to 10000. |
| database | Database | Yes | The name of the Hive database. |

Apache Impala

Apache Impala is a massively parallel processing (MPP) SQL query engine for data stored in Apache Hadoop clusters. Impala provides high-performance, low-latency SQL queries on data stored in HDFS and Apache HBase, enabling interactive exploration and analytics without data movement or transformation.

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the Impala server. |
| port | Port | No | The port of the Impala server. Defaults to 21050. |
| database | Database | Yes | The name of the Impala database. |

IBM Informix and HCL Informix

IBM Informix is a high performance, always-on, highly scalable and easily embeddable enterprise-class database optimized for the most demanding transactional and analytics workloads. As an object-relational engine, IBM Informix seamlessly integrates the best of relational and object-oriented capabilities enabling the flexible modeling of complex data structures and relationships.

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the Informix server. |
| port | Port | No | The port of the Informix server. Defaults to 9088. |
| database | Database | Yes | The name of the database. |

Kafka Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The bootstrap server of the Kafka cluster. |
| format | Format | No | The format of the messages. |

Amazon Kinesis

| Key | UX Label | Required | Description |
|---|---|---|---|
| stream | Stream | Yes | The name of the Kinesis data stream. |
| region | Region | No | AWS region. |
| format | Format | No | The format of the records. |

Local Files

| Key | UX Label | Required | Description |
|---|---|---|---|
| path | Path | Yes | The relative or absolute path to the data file(s). |
| format | Format | Yes | The format of the file(s). |

MySQL Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the MySQL server. |
| port | Port | No | The port of the MySQL server. Defaults to 3306. |
| database | Database | Yes | The name of the database. |

Oracle

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the Oracle server. |
| port | Port | Yes | The port of the Oracle server. |
| serviceName | Service Name | Yes | The name of the service. |

PostgreSQL

PostgreSQL is a powerful, open source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the PostgreSQL server. |
| port | Port | No | The port of the PostgreSQL server. Defaults to 5432. |
| database | Database | Yes | The name of the database. |
| schema | Schema | No | The name of the schema in the database. |
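A PostgreSQL entry could be sketched as follows. The identifier, host, database, and schema are hypothetical; port is written out even though the table above gives 5432 as the default.

```yaml
servers:
  - server: orders-postgres-prod     # hypothetical identifier
    type: postgresql
    environment: prod
    host: pg.example.com             # hypothetical host
    port: 5432                       # optional; 5432 is the stated default
    database: sales                  # hypothetical database
    schema: public                   # hypothetical schema (optional)
```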

Presto Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the Presto server. |
| catalog | Catalog | No | The name of the catalog. |
| schema | Schema | No | The name of the schema. |

Google Pub/Sub

Google Pub/Sub is a Google Cloud service that ingests events for streaming into BigQuery, data lakes, or operational databases.

| Key | UX Label | Required | Description |
|---|---|---|---|
| project | Project | Yes | The GCP project name. |

Amazon Redshift Server

Amazon Redshift is a cloud data warehouse designed to power data-driven decisions with strong price-performance.

| Key | UX Label | Required | Description |
|---|---|---|---|
| database | Database | Yes | The name of the database. |
| schema | Schema | Yes | The name of the schema. |
| host | Host | No | An optional string describing the server. |
| region | Region | No | AWS region of the Redshift server. |
| account | Account | No | The account used by the server. |

Amazon S3 Server and Compatible Servers

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Millions of customers of all sizes and industries store, manage, analyze, and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. Other vendors offer S3-compatible implementations.

| Key | UX Label | Required | Description |
|---|---|---|---|
| location | Location | Yes | S3 URL, starting with s3://. |
| endpointUrl | Endpoint URL | No | The server endpoint for S3-compatible servers. |
| format | Format | No | File format. |
| delimiter | Delimiter | No | Only for format = json. How multiple JSON documents are delimited within one file. |
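The sketch below shows one possible entry for an S3-compatible store, where endpointUrl points at the non-AWS endpoint. The bucket, path, and endpoint are hypothetical, and the delimiter value new_line is an assumption because the accepted values are not listed here.

```yaml
servers:
  - server: orders-objectstore-dev   # hypothetical identifier
    type: s3
    environment: dev
    location: "s3://example-bucket/orders/*.json"   # hypothetical bucket and path
    endpointUrl: "https://objectstore.example.com"  # hypothetical S3-compatible endpoint
    format: json
    delimiter: new_line              # assumed value: one JSON document per line
```

For data stored in Amazon S3 itself, endpointUrl would simply be left out.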

SFTP Server

Secure File Transfer Protocol (SFTP) is a network protocol that enables secure and encrypted file transfers between a client and a server.

| Key | UX Label | Required | Description |
|---|---|---|---|
| location | Location | Yes | SFTP URL, starting with sftp://. The URL should include the port number. |
| format | Format | No | File format. |
| delimiter | Delimiter | No | Only for format = json. How multiple JSON documents are delimited within one file. |
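The single-server case of a CSV file on an SFTP server can be sketched as below. The host and file path are hypothetical, and csv as a format value is an assumption; note that the URL includes the port number, as the table requires.

```yaml
servers:
  - server: orders-sftp              # hypothetical identifier
    type: sftp
    location: "sftp://sftp.example.com:22/exports/orders.csv"  # hypothetical host and path
    format: csv                      # assumed format value
```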

Snowflake

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the Snowflake server. |
| port | Port | Yes | The port of the Snowflake server. |
| account | Account | Yes | The Snowflake account used by the server. |
| database | Database | Yes | The name of the database. |
| warehouse | Warehouse | Yes | The name of the cluster of resources that is a Snowflake virtual warehouse. |
| schema | Schema | Yes | The name of the schema. |
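Because every Snowflake property is required, a complete entry is comparatively long. The sketch below fills in all six with hypothetical values, including the account name and host.

```yaml
servers:
  - server: orders-snowflake-prod    # hypothetical identifier
    type: snowflake
    environment: prod
    host: exampleorg-exampleaccount.snowflakecomputing.com  # hypothetical host
    port: 443                        # hypothetical port value
    account: exampleorg-exampleaccount   # hypothetical account
    database: SALES                  # hypothetical database
    warehouse: ANALYTICS_WH          # hypothetical virtual warehouse
    schema: ORDERS                   # hypothetical schema
```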

Microsoft SQL Server

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft.

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the database server. |
| port | Port | No | The port of the database server. Defaults to 1433. |
| database | Database | Yes | The name of the database. |
| schema | Schema | Yes | The name of the schema in the database. |

Synapse Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the Synapse server. |
| port | Port | Yes | The port of the Synapse server. |
| database | Database | Yes | The name of the database. |

Trino Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The Trino host URL. |
| port | Port | Yes | The Trino port. |
| catalog | Catalog | Yes | The name of the catalog. |
| schema | Schema | Yes | The name of the schema in the database. |
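A Trino entry with all four required properties might look like this sketch. The host, port, catalog, and schema values are hypothetical.

```yaml
servers:
  - server: orders-trino-prod        # hypothetical identifier
    type: trino
    environment: prod
    host: "https://trino.example.com"   # hypothetical host URL
    port: 8443                       # hypothetical port
    catalog: hive                    # hypothetical catalog name
    schema: orders                   # hypothetical schema name
```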

Vertica Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | The host of the Vertica server. |
| port | Port | Yes | The port of the Vertica server. |
| database | Database | Yes | The name of the database. |
| schema | Schema | Yes | The name of the schema. |

Actian Zen Server

Actian Zen (formerly Btrieve, later named Pervasive PSQL until version 13) is an ACID-compliant, zero-DBA, embedded, nano-footprint, multi-model, multi-platform database management system (DBMS).

| Key | UX Label | Required | Description |
|---|---|---|---|
| host | Host | Yes | Hostname or IP address of the Zen server. |
| port | Port | No | Zen server SQL connections port. Defaults to 1583. |
| database | Database | Yes | Database name to connect to on the Zen server. |

Custom Server

| Key | UX Label | Required | Description |
|---|---|---|---|
| account | Account | No | Account used by the server. |
| catalog | Catalog | No | Name of the catalog. |
| database | Database | No | Name of the database. |
| dataset | Dataset | No | Name of the dataset. |
| delimiter | Delimiter | No | Delimiter. |
| endpointUrl | Endpoint URL | No | Server endpoint. |
| format | Format | No | File format. |
| host | Host | No | Host name or IP address. |
| location | Location | No | A URL to a location. |
| path | Path | No | Relative or absolute path to the data file(s). |
| port | Port | No | Port to the server. No default value is assumed for custom servers. |
| project | Project | No | Project name. |
| region | Region | No | Cloud region. |
| regionName | Region Name | No | Region name. |
| schema | Schema | No | Name of the schema. |
| serviceName | Service Name | No | Name of the service. |
| stagingDir | Staging Directory | No | Staging directory. |
| stream | Stream | No | Name of the data stream. |
| warehouse | Warehouse | No | Name of the cluster or warehouse. |

If you need another property, use custom properties.
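A custom server that combines standard keys with one extra property could be sketched as follows. All values are hypothetical, and the property/value shape of each customProperties item is an assumption, since this section only shows a placeholder for it.

```yaml
servers:
  - server: orders-legacy-store      # hypothetical identifier
    type: custom
    environment: prod
    host: legacy.example.com         # hypothetical host
    port: 9000                       # must be explicit: no default for custom servers
    database: orders                 # hypothetical database
    customProperties:
      # Assumed item shape (property/value); the property name is hypothetical.
      - property: connectionMode
        value: readOnly
```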
