Setting up Data Catalog Metastore
Data Flow is integrated with the Data Catalog Metastore, where the schema definitions for unstructured and semi-structured data are stored.
You can create only one metastore per tenancy. This constraint ensures a single source of truth for metadata. When creating a Data Catalog metastore, you specify both the managed-table-bucket location and the external-table-bucket location in Object Storage. Keep these two locations different as a best practice. The metastore assumes that it owns the data for the managed tables. For external tables, the Hive-compatible metastore doesn't manage or own the underlying data. So, an operation such as DROP TABLE deletes both the data and the metadata for managed tables, but only the metadata for external tables.
Use the create command and required parameters to create a metastore for use with Data Flow.
oci data-catalog metastore create [OPTIONS]
For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
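As a rough illustration of the create command, the following sketch assembles an invocation with a separate managed-table location and external-table location, and prints it instead of running it. The compartment OCID and the Object Storage namespace are hypothetical placeholders, the oci://bucket@namespace/ location format is an assumption about how the locations are expressed, and the three flag names shown should be confirmed against the CLI Command Reference before use.

```shell
#!/bin/sh
# Dry-run sketch: builds the metastore create command and prints it.
# Nothing is sent to Oracle Cloud unless RUN_OCI=1 is set.
set -eu

# Hypothetical placeholder values; replace them with your own.
COMPARTMENT_ID="${COMPARTMENT_ID:-ocid1.compartment.oc1..exampleuniqueID}"
NAMESPACE="${NAMESPACE:-examplenamespace}"

# Assumed location format: oci://<bucket>@<namespace>/
MANAGED_LOCATION="oci://managed-table-bucket@${NAMESPACE}/"
EXTERNAL_LOCATION="oci://external-table-bucket@${NAMESPACE}/"

# Follow the best practice of keeping the two locations different.
if [ "$MANAGED_LOCATION" = "$EXTERNAL_LOCATION" ]; then
  echo "Managed and external table locations should differ." >&2
  exit 1
fi

# Assemble the command as positional parameters so quoting is preserved.
set -- oci data-catalog metastore create \
  --compartment-id "$COMPARTMENT_ID" \
  --default-managed-table-location "$MANAGED_LOCATION" \
  --default-external-table-location "$EXTERNAL_LOCATION"

# Print the command for review.
echo "$*"

# Run the command only when explicitly requested.
if [ "${RUN_OCI:-0}" = "1" ]; then
  "$@"
fi
```

Printing the command first makes it easy to review both locations before anything is created, which matters because only one metastore is allowed per tenancy. Setting RUN_OCI=1 in the environment runs the assembled command as is.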
Run the CreateMetastore operation to create a metastore to use with Data Flow.
Coarse-Grained Access Control in Data Catalog Metastore
This feature isn't supported with Spark 2.4.4.