Setting up Data Catalog Metastore

Data Flow is integrated with the Data Catalog Metastore, where the schema definitions for unstructured and semi-structured data are stored.

You can create only one metastore per tenancy. This constraint ensures a single source of truth for metadata. When creating a Data Catalog metastore, you specify both the managed-table-bucket location and the external-table-bucket location in Object Storage. As a best practice, keep these two locations different. The metastore assumes that it owns the data for managed tables. For external tables, the Hive-compatible metastore doesn't manage or own the underlying data. So, an operation such as DROP TABLE deletes both the data and the metadata for a managed table, but deletes only the metadata for an external table.
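
To make the managed and external distinction concrete, the following is a minimal sketch of how the two kinds of table might be declared from a Spark SQL application. It only builds the statement text and doesn't connect to a metastore. It assumes Spark SQL's usual behavior, where a table created with an explicit LOCATION is treated as external, and the common oci://bucket@namespace/path form for Object Storage URIs. The helper names, the bucket and namespace, and the table names are all hypothetical.

```python
from typing import Dict, Optional


def oci_uri(bucket: str, namespace: str, path: str = "") -> str:
    """Build an Object Storage URI of the form oci://bucket@namespace/path."""
    return f"oci://{bucket}@{namespace}/{path.lstrip('/')}"


def create_table_sql(
    table: str,
    columns: Dict[str, str],
    location: Optional[str] = None,
) -> str:
    """Return a Spark SQL CREATE TABLE statement.

    Without a location the table is managed: its data lives under the
    metastore's managed-table location, and DROP TABLE removes it.
    With a location the table is external: DROP TABLE removes only the
    metadata and leaves the files at that location in place.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    sql = f"CREATE TABLE {table} ({cols}) USING PARQUET"
    if location is not None:
        sql += f" LOCATION '{location}'"
    return sql


# Hypothetical names, for illustration only.
managed_sql = create_table_sql("sales.orders", {"id": "INT", "amount": "DOUBLE"})
external_sql = create_table_sql(
    "sales.raw_events",
    {"id": "INT", "payload": "STRING"},
    location=oci_uri("external-table-bucket", "mytenancy", "raw_events"),
)
```

In a Data Flow application, each string would typically be passed to spark.sql(...). Running DROP TABLE on sales.orders would then delete its files as well as its definition, while dropping sales.raw_events would leave the objects under the external location untouched.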

If you don't have a metastore, create one for use with Data Flow.

Coarse-Grained Access Control in Data Catalog Metastore

The Data Catalog Metastore provides coarse-grained access control using the Identity and Access Management service to avoid accidental access and modification of resources created by another user. As an administrator, you can grant access to resources such as catalogs, databases, and tables using predefined policies mentioned in the Resources List on the metastore details page. For more information, see the Data Catalog Metastore documentation.

Note: This feature isn't supported with Spark 2.4.4.