Infrastructure & Servers

The servers element describes where the data protected by this data contract is physically located. This metadata lets a data consumer discover the data and lets a platform engineer automate access to it.

An entry in servers describes a single dataset in a specific environment and on a specific technology. The servers element can contain multiple servers, each with its own configuration.

The typical ways of using the top level servers element are as follows:

Single Server: The data contract protects a specific dataset at a specific location. Example: a CSV file on an SFTP server.
Multiple Environments: The data contract makes sure that the data is protected in all environments. Example: a data product with data in a dev(elopment), UAT, and prod(uction) environment on Databricks.
Different Technologies: The data contract makes sure that regardless of the offered technology, it still holds. Example: a data product offers its data in a Kafka topic and in a BigQuery table that should have the same structure and content.
Different Technologies and Multiple Environments: The data contract makes sure that regardless of the offered technology and environment, it still holds. Example: a data product offers its data in a Kafka topic and in a BigQuery table that should have the same structure and content in dev(elopment), UAT, and prod(uction).
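
As an illustration of the last pattern, the sketch below lists one data product offered on two technologies in two environments. All server names, hosts, projects, and datasets are hypothetical; the type-specific keys (host, format, project, dataset) are the ones listed in the Kafka and BigQuery tables of this section.

```yaml
servers:
  # Kafka topic, development environment (hypothetical bootstrap server)
  - server: orders-kafka-dev
    type: kafka
    environment: dev
    host: kafka-dev.example.com:9092
    format: json
  # Kafka topic, production environment
  - server: orders-kafka-prod
    type: kafka
    environment: prod
    host: kafka-prod.example.com:9092
    format: json
  # BigQuery table, development environment (hypothetical project and dataset)
  - server: orders-bigquery-dev
    type: bigquery
    environment: dev
    project: example-project-dev
    dataset: orders
  # BigQuery table, production environment
  - server: orders-bigquery-prod
    type: bigquery
    environment: prod
    project: example-project-prod
    dataset: orders
```

Each entry stays a single dataset on one technology in one environment, so the four combinations become four entries.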

General Server Structure

Each server in the schema has the following structure:

servers:
  - id: my_awesome_server
    server: my-server-name
    type: <server-type>
    description: <server-description>
    environment: <server-environment>
    <server-type-specific-fields> # according to the server type, see below
    roles:
      - <role-details>
    customProperties:
      - <custom-properties>

Common Server Properties

Key	UX Label	Required	Description
server	Server	Yes	Identifier of the server.
id	ID	No	A unique identifier used to reduce the risk of collisions, such as a UUID.
type	Type	Yes	Type of the server. Can be one of: api, athena, azure, bigquery, clickhouse, cloudsql, custom, databricks, db2, denodo, dremio, duckdb, glue, hive, impala, informix, kafka, kinesis, local, mysql, oracle, postgres, postgresql, presto, pubsub, redshift, s3, sftp, snowflake, sqlserver, synapse, trino, vertica, zen.
description	Description	No	Description of the server.
environment	Environment	No	Environment of the server. Examples include: prod, preprod, dev, uat.
roles	Roles	No	List of roles that have access to the server. See the roles section for more details.
customProperties	Custom Properties	No	Custom properties that are not part of the standard.
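
A minimal sketch of the common properties, assuming a server of type custom so that no type-specific field is required. The first entry carries only the two required keys; the second adds the optional ones. The names and the UUID are hypothetical.

```yaml
servers:
  # Only the required common properties
  - server: minimal-server
    type: custom
  # Required and optional common properties
  - id: 6f1c2a9e-3b7d-4c58-9a41-2e8d5f0b7c13   # hypothetical UUID
    server: documented-server
    type: custom
    description: Example server that sets every optional common property except roles.
    environment: uat
```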

Specific Server Properties

Each server type can be customized with different properties such as host, port, database, and schema, depending on the server technology in use. Refer to the specific documentation for each server type for additional configurations.

If your server type is not in the list, use custom and suggest the missing type as an improvement. The following subsections describe the properties for each possible value of type.

API Server

Key	UX Label	Required	Description
location	Location	Yes	URL to the API

Amazon Athena Server

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions in the AWS Management Console, you can point Athena at your data stored in Amazon S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

Key	UX Label	Required	Description
schema	Schema	Yes	Identify the schema in the data source in which your tables exist.
stagingDir	Staging Directory	No	Amazon Athena automatically stores query results and metadata information for each query that runs in a query result location that you can specify in Amazon S3.
catalog	Catalog	No	Identify the name of the Data Source, also referred to as a Catalog.
regionName	Region Name	No	The region your AWS account uses.
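
A possible Athena entry, given as a sketch. The schema, bucket, and region are hypothetical, and the catalog value assumes the default AWS Glue Data Catalog name used by Athena.

```yaml
servers:
  - server: athena-prod
    type: athena
    environment: prod
    schema: sales                                  # required
    catalog: awsdatacatalog                        # optional, assumed default catalog
    stagingDir: s3://example-athena-results/sales/ # optional, hypothetical bucket
    regionName: eu-west-1                          # optional
```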

Azure Server

Key	UX Label	Required	Description
location	Location	Yes	Fully qualified path to Azure Blob Storage or Azure Data Lake Storage (ADLS), supports globs.
format	Format	Yes	File format.
delimiter	Delimiter	No	Only applies when format is json. Specifies how multiple JSON documents are delimited within one file.
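
A sketch of an Azure entry pointing at JSON files in ADLS. The container, storage account, and path are hypothetical. The delimiter value new_line is an assumption, since the table above does not enumerate the allowed values.

```yaml
servers:
  - server: azure-landing-zone
    type: azure
    environment: dev
    # Hypothetical container and storage account; the glob matches all JSON files
    location: abfss://landing@examplestorage.dfs.core.windows.net/orders/*.json
    format: json
    delimiter: new_line   # assumed value: one JSON document per line
```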

Google BigQuery

BigQuery is a fully managed, AI-ready data analytics platform that helps you maximize value from your data and is designed to be multi-engine, multi-format, and multi-cloud.

Key	UX Label	Required	Description
project	Project	Yes	The Google Cloud Platform (GCP) project name.
dataset	Dataset	Yes	The GCP dataset name.
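
For illustration, a BigQuery entry needs only the two keys above in addition to the common ones. The project and dataset names are hypothetical.

```yaml
servers:
  - server: bigquery-prod
    type: bigquery
    environment: prod
    project: example-analytics-prod   # hypothetical GCP project
    dataset: orders                   # hypothetical dataset
```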

ClickHouse Server

ClickHouse is an open-source column-oriented database management system that allows generating analytical data reports in real-time.

Key	UX Label	Required	Description
host	Host	Yes	The host of the ClickHouse server.
port	Port	Yes	The port to the ClickHouse server.
database	Database	Yes	The name of the database.

Google Cloud SQL

Google Cloud SQL is a fully managed, cost-effective relational database service for PostgreSQL, MySQL, and SQL Server.

Key	UX Label	Required	Description
host	Host	Yes	The host of the Google Cloud SQL server.
port	Port	Yes	The port of the Google Cloud SQL server.
database	Database	Yes	The name of the database.
schema	Schema	Yes	The name of the schema.

Databricks Server

Key	UX Label	Required	Description
catalog	Catalog	Yes	The name of the Hive or Unity catalog.
schema	Schema	Yes	The schema name in the catalog.
host	Host	No	The Databricks host.
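
A sketch of the Multiple Environments pattern on Databricks, with one entry per environment. Catalog names, schema names, and workspace hosts are hypothetical.

```yaml
servers:
  - server: databricks-dev
    type: databricks
    environment: dev
    catalog: dev_catalog
    schema: orders
    host: dev-workspace.cloud.databricks.com    # optional, hypothetical host
  - server: databricks-uat
    type: databricks
    environment: uat
    catalog: uat_catalog
    schema: orders
    host: uat-workspace.cloud.databricks.com
  - server: databricks-prod
    type: databricks
    environment: prod
    catalog: prod_catalog
    schema: orders
    host: prod-workspace.cloud.databricks.com
```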

IBM Db2 Server

Key	UX Label	Required	Description
host	Host	Yes	The host of the IBM Db2 server.
port	Port	Yes	The port of the IBM Db2 server.
database	Database	Yes	The name of the database.
schema	Schema	No	The name of the schema.

Denodo Server

Key	UX Label	Required	Description
host	Host	Yes	The host of the Denodo server.
port	Port	Yes	The port of the Denodo server.
database	Database	No	The name of the database.

Dremio Server

Key	UX Label	Required	Description
host	Host	Yes	The host of the Dremio server.
port	Port	Yes	The port of the Dremio server.
schema	Schema	No	The name of the schema.

DuckDB Server

DuckDB supports a feature-rich SQL dialect complemented with deep integrations into client APIs.

Key	UX Label	Required	Description
database	Database	Yes	Path to the DuckDB database file.
schema	Schema	No	The name of the schema.

Amazon Glue

Key	UX Label	Required	Description
account	Account	Yes	The AWS Glue account.
database	Database	Yes	The AWS Glue database name.
location	Location	No	The AWS S3 path. Must be in the form of a URL.
format	Format	No	The format of the files.

Hive

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at massive scale. Built on top of Apache Hadoop, Hive allows users to read, write, and manage petabytes of data using SQL-like queries through HiveQL, with native support for cloud storage systems and enterprise-grade security features.

Key	UX Label	Required	Description
host	Host	Yes	The host of the Hive server.
port	Port	No	The port of the Hive server. Defaults to 10000.
database	Database	Yes	The name of the Hive database.

Apache Impala

Apache Impala is a massively parallel processing (MPP) SQL query engine for data stored in Apache Hadoop clusters. Impala provides high-performance, low-latency SQL queries on data stored in HDFS and Apache HBase, enabling interactive exploration and analytics without data movement or transformation.

Key	UX Label	Required	Description
host	Host	Yes	The host of the Impala server.
port	Port	No	The port of the Impala server. Defaults to 21050.
database	Database	Yes	The name of the Impala database.

IBM Informix and HCL Informix

IBM Informix is a high performance, always-on, highly scalable and easily embeddable enterprise-class database optimized for the most demanding transactional and analytics workloads. As an object-relational engine, IBM Informix seamlessly integrates the best of relational and object-oriented capabilities enabling the flexible modeling of complex data structures and relationships.

Key	UX Label	Required	Description
host	Host	Yes	The host of the Informix server.
port	Port	No	The port of the Informix server. Defaults to 9088.
database	Database	Yes	The name of the database.

Kafka Server

Key	UX Label	Required	Description
host	Host	Yes	The bootstrap server of the Kafka cluster.
format	Format	No	The format of the messages.

Amazon Kinesis

Key	UX Label	Required	Description
stream	Stream	Yes	The name of the Kinesis data stream.
region	Region	No	AWS region.
format	Format	No	The format of the records.
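
A sketch of a Kinesis entry. The stream name, region, and format are hypothetical values chosen for illustration.

```yaml
servers:
  - server: kinesis-prod
    type: kinesis
    environment: prod
    stream: orders-events   # required, hypothetical stream name
    region: eu-west-1       # optional
    format: json            # optional
```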

Local Files

Key	UX Label	Required	Description
path	Path	Yes	The relative or absolute path to the data file(s).
format	Format	Yes	The format of the file(s).
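
For local files, both keys are required. A minimal sketch with a hypothetical relative path:

```yaml
servers:
  - server: local-sample
    type: local
    environment: dev
    path: ./data/orders.csv   # hypothetical relative path
    format: csv
```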

MySQL Server

Key	UX Label	Required	Description
host	Host	Yes	The host of the MySQL server.
port	Port	No	The port of the MySQL server. Defaults to 3306.
database	Database	Yes	The name of the database.

Oracle

Key	UX Label	Required	Description
host	Host	Yes	The host of the Oracle server.
port	Port	Yes	The port of the Oracle server.
serviceName	Service Name	Yes	The name of the service.

PostgreSQL

PostgreSQL is a powerful, open source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

Key	UX Label	Required	Description
host	Host	Yes	The host of the PostgreSQL server.
port	Port	No	The port of the PostgreSQL server. Defaults to 5432.
database	Database	Yes	The name of the database.
schema	Schema	No	The name of the schema in the database.
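
A sketch of a PostgreSQL entry. The host, database, and schema are hypothetical. The port is shown explicitly, although it could be omitted because it matches the documented default.

```yaml
servers:
  - server: postgres-prod
    type: postgresql
    environment: prod
    host: db.example.com   # required, hypothetical host
    port: 5432             # optional, same as the default
    database: sales        # required
    schema: public         # optional
```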

Presto Server

Key	UX Label	Required	Description
host	Host	Yes	The host of the Presto server.
catalog	Catalog	No	The name of the catalog.
schema	Schema	No	The name of the schema.

Google Pub/Sub

Google Pub/Sub is a Google Cloud service that ingests events for streaming into BigQuery, data lakes, or operational databases.

Key	UX Label	Required	Description
project	Project	Yes	The GCP project name.

Amazon Redshift Server

Amazon Redshift is a cloud data warehouse designed to power data-driven decisions with strong price-performance.

Key	UX Label	Required	Description
database	Database	Yes	The name of the database.
schema	Schema	Yes	The name of the schema.
host	Host	No	An optional string describing the server.
region	Region	No	AWS region of Redshift server.
account	Account	No	The account used by the server.

Amazon S3 Server and Compatible Servers

Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. Millions of customers of all sizes and industries store, manage, analyze, and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. Other vendors have implemented a compatible implementation of S3.

Key	UX Label	Required	Description
location	Location	Yes	S3 URL, starting with `s3://`
endpointUrl	Endpoint URL	No	The server endpoint for S3-compatible servers.
format	Format	No	File format.
delimiter	Delimiter	No	Only applies when format is json. Specifies how multiple JSON documents are delimited within one file.
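
The sketch below contrasts an Amazon S3 location with an S3-compatible server that needs endpointUrl. Bucket names and the endpoint are hypothetical.

```yaml
servers:
  # Amazon S3: the location alone is enough
  - server: s3-prod
    type: s3
    environment: prod
    location: s3://example-bucket/orders/
    format: parquet
  # S3-compatible server: endpointUrl points to the vendor's endpoint
  - server: s3-compatible-dev
    type: s3
    environment: dev
    location: s3://example-bucket-dev/orders/
    endpointUrl: https://object-store.example.com   # hypothetical endpoint
    format: csv
```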

SFTP Server

Secure File Transfer Protocol (SFTP) is a network protocol that enables secure and encrypted file transfers between a client and a server.

Key	UX Label	Required	Description
location	Location	Yes	SFTP URL, starting with `sftp://`. The URL should include the port number.
format	Format	No	File format.
delimiter	Delimiter	No	Only applies when format is json. Specifies how multiple JSON documents are delimited within one file.
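
A sketch of the Single Server pattern, a CSV file on an SFTP server. The host and file path are hypothetical; the URL includes the port number as the table above asks.

```yaml
servers:
  - server: sftp-export
    type: sftp
    environment: prod
    location: sftp://sftp.example.com:22/exports/orders.csv   # hypothetical host and path
    format: csv
```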

Snowflake

Key	UX Label	Required	Description
host	Host	Yes	The host of the Snowflake server.
port	Port	Yes	The port of the Snowflake server.
account	Account	Yes	The Snowflake account used by the server.
database	Database	Yes	The name of the database.
warehouse	Warehouse	Yes	The name of the cluster of resources that is a Snowflake virtual warehouse.
schema	Schema	Yes	The name of the schema.
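
Snowflake has the longest list of required keys, so a complete entry may help. Every value below is hypothetical, including the account identifier and the host derived from it.

```yaml
servers:
  - server: snowflake-prod
    type: snowflake
    environment: prod
    host: example-account.snowflakecomputing.com   # hypothetical host
    port: 443
    account: example-account
    database: SALES
    warehouse: ANALYTICS_WH
    schema: ORDERS
```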

Microsoft SQL Server

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft.

Key	UX Label	Required	Description
host	Host	Yes	The host of the database server.
port	Port	No	The port of the database server. Defaults to 1433.
database	Database	Yes	The name of the database.
schema	Schema	Yes	The name of the schema in the database.

Synapse Server

Key	UX Label	Required	Description
host	Host	Yes	The host of the Synapse server.
port	Port	Yes	The port of the Synapse server.
database	Database	Yes	The name of the database.

Trino Server

Key	UX Label	Required	Description
host	Host	Yes	The Trino host URL.
port	Port	Yes	The Trino port.
catalog	Catalog	Yes	The name of the catalog.
schema	Schema	Yes	The name of the schema in the database.
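
A sketch of a Trino entry with all four required keys. The host, port, catalog, and schema are hypothetical.

```yaml
servers:
  - server: trino-prod
    type: trino
    environment: prod
    host: trino.example.com   # hypothetical host
    port: 8080                # hypothetical port
    catalog: hive
    schema: orders
```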

Vertica Server

Key	UX Label	Required	Description
host	Host	Yes	The host of the Vertica server.
port	Port	Yes	The port of the Vertica server.
database	Database	Yes	The name of the database.
schema	Schema	Yes	The name of the schema.

Actian Zen Server

Actian Zen (formerly Btrieve, later named Pervasive PSQL until version 13) is an ACID-compliant, zero-DBA, embedded, nano-footprint, multi-model, multi-platform database management system (DBMS).

Key	UX Label	Required	Description
host	Host	Yes	Hostname or IP address of the Zen server.
port	Port	No	Zen server SQL connections port. Defaults to 1583.
database	Database	Yes	Database name to connect to on the Zen server.

Custom Server

Key	UX Label	Required	Description
account	Account	No	Account used by the server.
catalog	Catalog	No	Name of the catalog.
database	Database	No	Name of the database.
dataset	Dataset	No	Name of the dataset.
delimiter	Delimiter	No	Delimiter.
endpointUrl	Endpoint URL	No	Server endpoint.
format	Format	No	File format.
host	Host	No	Host name or IP address.
location	Location	No	A URL to a location.
path	Path	No	Relative or absolute path to the data file(s).
port	Port	No	Port to the server. No default value is assumed for custom servers.
project	Project	No	Project name.
region	Region	No	Cloud region.
regionName	Region Name	No	Region name.
schema	Schema	No	Name of the schema.
serviceName	Service Name	No	Name of the service.
stagingDir	Staging Directory	No	Staging directory.
stream	Stream	No	Name of the data stream.
warehouse	Warehouse	No	Name of the cluster or warehouse.

If you need another property, use custom properties.
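
As a sketch, a custom server can combine the optional keys above with custom properties. The server, host, and property names are hypothetical, and the property/value layout of customProperties is an assumption about how custom properties are expressed elsewhere in the standard.

```yaml
servers:
  - server: legacy-export
    type: custom
    description: Nightly export from a system without a dedicated server type.
    environment: prod
    host: legacy.example.com
    port: 2021            # no default is assumed for custom servers
    format: csv
    customProperties:
      - property: exportSchedule   # hypothetical property name
        value: daily
```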