Private Endpoints Use Cases

Data Flow applications can be configured to access data sources hosted within private networks, enabling secure and seamless connectivity and reducing the exposure of sensitive data to potential breaches or unauthorized access.

By limiting exposure to public networks, this approach reduces the risk of data breaches and unauthorized access, a critical concern for industries handling sensitive information. For example, organizations subject to regulations such as the General Data Protection Regulation (GDPR) in the EU can ensure personal data remains protected within controlled environments, minimizing the risk of noncompliance. Similarly, healthcare providers bound by the Health Insurance Portability and Accountability Act (HIPAA) in the United States can securely process protected health information (PHI) through private network configurations, safeguarding patient confidentiality. This architecture also supports compliance with other regulatory frameworks, such as SOC 2 for data security and privacy, enabling organizations to meet their obligations while maintaining high-performance data processing.

Configuring a Data Flow application with private network access provides the following capabilities:

  • Access to Private Oracle Cloud Infrastructure (OCI) Data Sources: Connect to OCI data sources accessible only within private networks.
  • Integration with On-Premises Data Sources: Access on-premises data connected to an OCI Virtual Cloud Network (VCN) through Site-to-Site VPN or FastConnect.
  • Support for Oracle RAC Databases: Use the SCAN proxy functionality to access Oracle RAC databases.

The following image depicts a simplified diagram of the Data Flow network configuration. Notice that the serverless applications run inside the Data Flow service tenancy, so it helps to understand the network components shown in the diagram.

Figure: An OCI region connecting to an on-premises network over a FastConnect link. The OCI region contains a Data Flow tenancy and a customer tenancy linked through a private access gateway.

Like any other application running on a secure private network, the Data Flow application cluster connects to the Internet through a NAT (network address translation) gateway and to the Oracle Services Network (OSN) through the Service Gateway (SGW), which blocks access from the Internet to the application cluster. A Data Flow Private Access Gateway (the Data Flow private endpoint) lets a Data Flow application access OCI resources, such as ADB instances that reside in a private subnet, and customer on-premises resources connected to OCI with FastConnect or Site-to-Site VPN.

The following use cases show how the network settings, together with some extra configuration, allow secure access through private subnets:

Connect to an ADB Instance Configured with Private Endpoint Access only

This is the most common use case for Data Flow and private endpoints.

Two things need consideration:
  • The ADB instance has a network Access type set to Virtual cloud network, and a private subnet is selected.

    Navigate to the Autonomous Database details, and under Network, copy the Private endpoint URL (FQDN), for example:

    <podID>.adb.<region>.oraclecloud.com
  • The private endpoint FQDN already resolves within the VCN, so it can be carried forward to the Data Flow configuration. In the Data Flow Private endpoint details:

    Select Edit, and under DNS zones to resolve, paste the FQDN of the ADB instance copied in the previous step.

If the ADB network configuration restricts application access even further by using Network Security Groups (NSGs) that allow only specific CIDR ranges, OCI services, or NSGs, ensure the matching rules are present on both the ADB side and the Data Flow side of the configuration.

Secure Access from Allowed IPs and VCNs

This is a variation of the Network settings in the ADB, where the option "Secure access from allowed IPs and VCNs only" is selected, and an access control list (ACL) is attached to the configuration. In this scenario, the Data Flow private endpoint isn't necessary. The network traffic travels through the NAT gateway in the Data Flow service tenancy within the customer-allocated subnet (tenant OKE cluster subnet in the previous image). For the documented list of IPs to allowlist in the ACL, see the IP Address Allowlist.
Note

Data Flow private endpoints aren't necessary for this setting.

Move a Data Flow Application to Use Private Endpoints and Regional Restrictions on Oracle Cloud Infrastructure Object Storage

A Data Flow application running with the "Internet access" type might rely on accessing buckets in different regions. For example, an application running in the IAD region can access objects stored in the PHX region, provided OCI IAM authorization grants such access.

If this application later moves to a "Private access" run, it loses access to the public Object Storage service in other regions. The Service Gateway maintains communication with the Object Storage service, as depicted in the previous image, and regional access is therefore enforced at DNS resolution.

If this is your scenario, follow Cross-Region Access with Data Flow Private Endpoints.

Restrictions on Access to Other Oracle Services

Requests to the following DNS zones might be rejected when using Data Flow private endpoints with other Oracle services:
oracle.com
oracle-ocna.com
oraclegoviaas.com
oraclegovcloud.com
oracleiaas.com
grungy.us
oraclecorp.com
oraclecloud.net
oraclegha.com
oc-test.com
oracleemaildelivery.com
ocir.io
oracledx.com
In general, access to the following is limited when using Data Flow private endpoints:
  • Object Storage buckets with a segregated IAM access policy.
  • Cross-Region access of OCI resources.
  • Direct use of IP addresses to access private resources in the customer tenancy.

Connect to On-Premises Resources

Integrating Data Flow with on-premises data sources through private endpoints ensures secure and efficient data processing across hybrid environments.

By using private network connectivity options such as Site-to-Site VPN or FastConnect, you can connect Data Flow applications to data repositories hosted in the on-premises infrastructure. This provides robust, low-latency communication while maintaining strict security boundaries. Use this setup for workloads that require secure data access and processing across cloud and on-premises environments.

A key element of this setup is the DNS resolution, which now takes place within the Private Access Gateway. The configuration provides a DNS name (fully qualified domain name, or FQDN) for the private endpoints, not the private IP address itself. If you've configured your network setup for DNS, your hosts can access the private endpoint using the FQDN. Data Flow supports network security groups (NSGs) with its resources. You can request that the Data Flow service set up the private endpoint in an NSG within your VCN. NSGs let you write security rules to control access to the private endpoint without knowing the private IP address assigned to the private endpoint.

After connectivity between the OCI customer VCN and the on-premises network is established, the Data Flow application can operate correctly only if the private IPs in the on-premises network are associated with a private DNS resolver in the OCI customer VCN.

To illustrate this part, we're using two connected networks, shown in the OCI Network Visualizer.

Figure: A private subnet, hdi-dataflow-vcn, connected to a dynamic routing gateway attachment, disdemo-DRG.

From this point onwards, the relevant part is deciding how the DNS resolution occurs. In this example, we created two DNS resolvers:

  • One attached to the hdi-dataflow-vcn standard view, for the domain industrial.com.
  • Another in a customer-private-view with the resolution for the domain oraclevcn.com, indicating this is a private subnet in an OCI VCN.

In the "Private resolver details" of the selected VCN, select Manage private views. Two private views are used:

  • The protected private view, in our case, the hdi-dataflow-vcn private view (Order 1).
  • The customer-private-view (Order 2), created by using Manage private views again.

In the first private view, hdi-dataflow-vcn, under Private zones, we created the industrial.com zone with the Primary zone type.

Similarly, navigate to the customer-private-view and verify that a Private zone is already created within the Private subnet of the referred VCN.

Under the industrial.com zone in the private view, select Manage records to create a new record for the IP of an external MySQL server running in the customer on-premises network. For example:

Name: MySQL_Customer_OnPremises(.industrial.com)

Type: A - IPv4 address

TTL in seconds: 3600

RDATA/Answers

Address: 10.xx.xx.xx (the IP address of the on-premises resource).

Do the same for private access within another OCI private VCN that has been peered with the current VCN, such as the one in the previous image, through the local peering gateway (LPG). In the internal zone privatesubnet.dfarchitecture.oraclevcn.com, add a record for the IP of another MySQL instance running in the private subnet of the peered VCN. For example:

Domain: mysql-private.privatesubnet.dfarchitecture.oraclevcn.com

Type: A - IPv4 address

TTL in seconds: 3600

RDATA/Answers

Address: 12.xx.xx.xx (the IP address of the resource in the peered VCN).

Again, the DNS resolution is the key factor to be addressed when using Data Flow private endpoints. Resolve them using record types appropriate to the downstream network. For these types and configurations, follow the Managing DNS zones documentation.
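The two A records above can be checked before they're entered in the Console. The following is a minimal sketch, assuming Python and only the standard library: the ARecord class, its validate method, and the 10.0.1.25 address are hypothetical names and values chosen for illustration, not part of any OCI SDK.

```python
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class ARecord:
    """Hypothetical in-memory model of a private-zone A record."""

    domain: str      # fully qualified record name
    address: str     # IPv4 address the name should resolve to
    ttl: int = 3600  # TTL in seconds, as in the examples above

    def validate(self, zone: str) -> None:
        # An A record carries an IPv4 address; a malformed one raises ValueError.
        ipaddress.IPv4Address(self.address)
        if self.ttl < 0:
            raise ValueError("TTL must not be negative")
        # The record name must sit inside the zone it is added to.
        name = self.domain.lower().rstrip(".")
        zone = zone.lower().rstrip(".")
        if name != zone and not name.endswith("." + zone):
            raise ValueError(f"{self.domain} is not inside zone {zone}")


# Placeholder address: substitute the real IP of the resource in the peered VCN.
record = ARecord(
    domain="mysql-private.privatesubnet.dfarchitecture.oraclevcn.com",
    address="10.0.1.25",
)
record.validate("privatesubnet.dfarchitecture.oraclevcn.com")
```

A check like this catches a mistyped address or a record placed in the wrong zone early, before a Data Flow run fails on DNS resolution.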

Test the Data Flow Private Endpoint

To test and verify this configuration, the GitHub Dataflow Samples include code that first tests the DNS resolution and, if that succeeds, then tries to establish connectivity with the configured record. For more details, see the README.

The output of a successful test appears in the application driver's log. For example:
FQDN 'MySQL_Private_DB.industrial.com' resolved to IP '255.33.36.2'. Testing connectivity...
Success: Able to connect to MySQL_Private_DB.industrial.com (255.33.36.2) on port 3306.
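The two-step check described above can be sketched with Python's standard socket module. This is not the code from the GitHub samples; it's a minimal illustration, and the function name check_endpoint and its log wording (modeled on the output above) are assumptions.

```python
import socket


def check_endpoint(fqdn: str, port: int, timeout: float = 5.0) -> bool:
    """Resolve fqdn, then try a TCP connection, printing one line per step."""
    try:
        ip = socket.gethostbyname(fqdn)  # step 1: DNS resolution (IPv4)
    except socket.gaierror as err:
        print(f"Failure: could not resolve '{fqdn}': {err}")
        return False
    print(f"FQDN '{fqdn}' resolved to IP '{ip}'. Testing connectivity...")
    try:
        # Step 2: TCP handshake only; no database login is attempted.
        with socket.create_connection((ip, port), timeout=timeout):
            print(f"Success: Able to connect to {fqdn} ({ip}) on port {port}.")
            return True
    except OSError as err:
        print(f"Failure: cannot connect to {fqdn} ({ip}) on port {port}: {err}")
        return False
```

Calling check_endpoint("MySQL_Private_DB.industrial.com", 3306) from the application driver separates the two failure modes: a resolution failure points to the DNS zones or private views, while a connection failure points to routing or security rules.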

Cross-Region Access with Data Flow Private Endpoints

Data Flow supports private endpoint integration for seamless cross-regional data access using remote VCN peering through an upgraded Dynamic Routing Gateway (DRG).

This configuration lets Data Flow applications in one region securely connect to data sources hosted in another region's VCN, as depicted in the following image.

Figure: Remote VCN peering between two OCI regions through an upgraded Dynamic Routing Gateway

You can ensure high-performance, low-latency data transfers while maintaining a robust security posture by using the upgraded DRG's advanced capabilities, including transitive routing and centralized connectivity.

Also, with OCI’s multicloud networking capabilities, you can extend this setup to connect with, for example, Microsoft Azure. By using OCI-Azure Interconnect or a similar multicloud connectivity framework, you can enable Data Flow to process data stored in Azure resources such as Azure Blob Storage or Azure SQL Database, as shown in the following image.

Figure: Peering between an OCI Region and a Microsoft Azure Region passing through an Oracle partner facility.

This architecture supports centralized connectivity, transitive routing, and low-latency data transfer between OCI and Azure while maintaining stringent security and compliance standards.

For the Data Flow private endpoint, what matters is how the DNS names are resolved before traffic reaches the DRG, using private resolvers, as shown in Connect to On-Premises Resources.

A tangible benefit is that instances in the service VCN can access a consumer-specified workload over private connectivity without traversing the Internet. Beyond that, the Data Flow private endpoint can extend private connectivity from instances in the service VCN to the consumer's on-premises network and to other networks accessible through the consumer VCN. From the usability perspective, you continue to interact only with the service Console (or API) and don't need another interface to enable private access.

Despite the flexibility of the Data Flow private endpoint, some limitations exist:

  • The default limit for a Data Flow private endpoint is no more than five per tenancy per region.
  • If Internet connectivity is required for a Spark Application run with private endpoints enabled, the corresponding DNS zone (for example, Google's APIs/google.zone) must be listed in the DNS zones parameter of the private endpoint used by the Application. If the zone is allowlisted, the traffic is routed to the customer VCN for resolution, and the customer network is responsible for Internet connectivity after the packet reaches the consumer gateway VCN. Network traffic is dropped for all other zones not listed in the DNS zones parameter.
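The allowlist rule in the last item can be sketched as a small decision function. This is a simplified model, not how the service is implemented: the names zone_matches and route_for, the return labels, and the googleapis.com zone are assumptions for illustration, and traffic to the Oracle Services Network is left out.

```python
def zone_matches(hostname: str, zone: str) -> bool:
    """True if hostname equals the zone or is a subdomain of it."""
    host = hostname.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    return host == zone or host.endswith("." + zone)


def route_for(hostname: str, allowed_zones: list[str]) -> str:
    """Return where traffic for hostname goes under the rule described above."""
    if any(zone_matches(hostname, zone) for zone in allowed_zones):
        return "customer-vcn"  # zone is listed: routed to the customer VCN
    return "dropped"           # zone is not listed on the private endpoint


# Hypothetical DNS zones configured on a private endpoint.
zones = ["industrial.com", "googleapis.com"]
print(route_for("MySQL_Private_DB.industrial.com", zones))
print(route_for("example.org", zones))
```

Matching on a leading dot (".industrial.com") keeps an unrelated name such as notindustrial.com from being treated as part of the zone.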

Connect to an Oracle Database Cluster (RAC or Exadata)

Data Flow can connect to an Oracle RAC (Real Application Clusters) database or an Exadata machine as a client application using SCAN (Single Client Access Name).

The SCAN is a virtual name similar to those used for virtual IP addresses. However, unlike a virtual IP, the SCAN virtual name is associated with the entire cluster rather than an individual node, and with several IP addresses rather than only one.

When the SCAN proxy feature is enabled, a reverse connection endpoint (RCE) is established to handle IP-based redirects. As shown in the following image, a private endpoint VNIC (virtual network interface card) is created in the customer VCN. The RCE private endpoint VNIC is unique for each Data Flow private endpoint setup.

One important consideration about the TLS connection to a database cluster is that the database SCAN listeners need to redirect the network traffic to an FQDN, not directly to an IP address, because only FQDN redirects from a SCAN listener enable TLS. Therefore, configure the database cluster to redirect to an FQDN if TLS is a requirement.

Figure: SCAN proxy flow for RAC/Exadata

The following configuration steps happen behind the scenes to set up the SCAN proxy feature:

  • The user configures the SCAN proxy in the Data Flow private endpoint configuration.
  • Data Flow updates the RCE to include the SCAN configuration (the SCAN listener DNS name and port), which provides a new IP (the SCAN proxy IP) in the service VCN bound to the same SCAN port.
  • Data Flow then uses the SCAN proxy IP to create a DNS mapping within the service network, using the original SCAN listener DNS name and the SCAN proxy IP.

The previous image shows an example of an Oracle RAC database system within a private subnet in the customer VCN. The numbered flow in the image shows the sequence used to establish connectivity:

  1. Data Flow starts a connection to the SCAN proxy endpoint within the service VCN using the DNS name. Data Flow defines a customer RAC connection through the SCAN proxy by selecting a specific port. The RCE SCAN proxy forwards the request to the underlying private endpoint VNIC listener in the customer network. It then inspects the SCAN listener response for an IP of the underlying database cluster instance, creates a Class E NAT IP, and replaces the cluster instance IP with the NAT IP in the SCAN proxy response.
  2. The Private Access Gateway receives the redirect request from the SCAN listener and automatically translates the local listener IP in the customer's VCN into a mapped IP address. It then returns this information to the Data Flow components, which make a connection request to one of the local listeners.
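The address rewrite in the two steps above can be pictured with a toy model. This isn't Oracle's implementation: the NatTable class, the rewrite_redirect function, the dictionary used to stand in for a listener redirect, and the 10.0.1.x node addresses are all assumptions made for illustration. The 255.33.36.0/24 range is borrowed from the test output earlier in this topic.

```python
import ipaddress


class NatTable:
    """Toy two-way mapping between cluster node IPs and Class E NAT IPs."""

    def __init__(self, nat_cidr: str) -> None:
        # Iterator over the usable addresses of the NAT range.
        self._pool = iter(ipaddress.ip_network(nat_cidr).hosts())
        self._by_real: dict[str, str] = {}
        self._by_nat: dict[str, str] = {}

    def to_nat(self, real_ip: str) -> str:
        """Return the NAT IP for a node, allocating one on first use."""
        if real_ip not in self._by_real:
            nat_ip = str(next(self._pool))
            self._by_real[real_ip] = nat_ip
            self._by_nat[nat_ip] = real_ip
        return self._by_real[real_ip]

    def to_real(self, nat_ip: str) -> str:
        """Translate a NAT IP back to the node address in the customer VCN."""
        return self._by_nat[nat_ip]


def rewrite_redirect(redirect: dict, table: NatTable) -> dict:
    """Replace the node IP in a (simplified) listener redirect with its NAT IP."""
    return {**redirect, "host": table.to_nat(redirect["host"])}


table = NatTable("255.33.36.0/24")
# Hypothetical redirect pointing at a cluster node in the customer VCN.
rewritten = rewrite_redirect({"host": "10.0.1.11", "port": 1521}, table)
print(rewritten)
```

The client only ever sees the NAT address, while the table keeps the reverse mapping that the gateway needs to reach the real local listener.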

In the Data Flow private endpoint configuration, enter the DNS name of the SCAN host and its associated port number in the SCAN details section. For example:

DNS name: oracleDB-scan-sub0911000090.dfarchitecture.oraclevcn.com

Port: 16001

Considerations about Network Traffic and Isolation

When running a Spark serverless execution, it's critical to account for network traffic patterns and isolation to ensure best performance, security, and compliance.

Serverless Spark jobs run within a managed environment where network traffic flows between your application, data sources, and external services. To minimize latency and control traffic, ensure data sources are colocated within the same region and Virtual Cloud Network (VCN) where possible. Here are some more considerations and information about the Data Flow private endpoint setup:

  • After a Data Flow private endpoint resource is attached to an Application run, network traffic to the Internet is routed to your VCN subnet through the private endpoint infrastructure, as long as the DNS zone was allowlisted during private endpoint creation. That traffic fails if your VCN doesn't have an Internet Gateway attached. If the DNS zone isn't allowlisted, network traffic is dropped. Network traffic to OCI services in the Oracle Services Network, for example, Object Storage, is still routed through Data Flow's VCN.
  • While the Data Flow run is in progress, the Spark application, running on any of the nodes assigned to the tenant, starts a network connection to your private resource using its DNS name (for example, customer1.instance1.subnet.oraclevcn.com). This involves a DNS lookup that returns the DNS proxy IP assigned to your private resource, created during private endpoint/reverse connection endpoint setup. In a reverse connection, a server starts the connection to a client, letting Data Flow access the resource privately by connecting to a specified endpoint within your network.
  • Your DNS zones in the private views create a proxy that returns a Class E IP address (240.0.0.0-255.255.255.255) for customer.instance.subnet.oraclevcn.com, allocated from a specific CIDR range, for example, 255.33.36.2, as shown in the test output in Test the Data Flow Private Endpoint. In this example, Data Flow nodes running your jobs can establish a network connection to 255.33.36.0/24, and a stateful egress rule is created with that destination CIDR range. This means that when a Data Flow instance starts traffic to another host and that traffic is allowed by egress security rules, any traffic that the instance later receives from that host for a period is considered response (ingress) traffic and is allowed. A route table rule is also added to route traffic for the destination CIDR range 255.33.36.0/24, allocated to that resource, to the Data Flow private endpoint.
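The Class E range and the stateful egress behavior in the last item can be sketched with Python's ipaddress module. This is a simplified model for intuition only: the StatefulEgress class and is_class_e function are hypothetical names, and the model ignores ports, protocols, and the time window after which tracked connections expire.

```python
import ipaddress

# Class E space, 240.0.0.0 through 255.255.255.255, written as a CIDR block.
CLASS_E = ipaddress.ip_network("240.0.0.0/4")


def is_class_e(ip: str) -> bool:
    """True if the address falls in the Class E range."""
    return ipaddress.ip_address(ip) in CLASS_E


class StatefulEgress:
    """Toy stateful rule: replies are allowed only from hosts contacted first."""

    def __init__(self, destination_cidr: str) -> None:
        self.destination = ipaddress.ip_network(destination_cidr)
        self._tracked: set[str] = set()  # hosts with an open outbound flow

    def allow_egress(self, dst_ip: str) -> bool:
        if ipaddress.ip_address(dst_ip) in self.destination:
            self._tracked.add(dst_ip)  # remember the flow for response traffic
            return True
        return False

    def allow_ingress(self, src_ip: str) -> bool:
        # No ingress rule is needed: only response traffic is accepted.
        return src_ip in self._tracked


rule = StatefulEgress("255.33.36.0/24")
print(is_class_e("255.33.36.2"), rule.allow_egress("255.33.36.2"))
```

Because the rule is stateful, the private resource can answer a connection that a Data Flow node opened, but it can't start a new connection toward the node.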

Summary

Data Flow's private endpoint capabilities provide robust and secure connectivity for accessing diverse data sources and environments.

With private endpoint access, you can connect to Autonomous Database (ADB) instances configured for private access only, ensuring secure interactions without exposing the database to public networks. Similarly, private endpoints enable secure connectivity to on-premises resources through Site-to-Site VPN or FastConnect, supporting hybrid cloud use cases.

Cross-regional access is supported for distributed workloads using remote VCN peering and upgraded Dynamic Routing Gateways (DRGs), enabling low-latency data processing across regions. Also, Data Flow supports connections to Oracle Database Clusters, such as RAC or Exadata, leveraging SCAN proxy functionality for efficient and high-availability access. These features are underpinned by stringent network isolation and traffic management practices, including private IPs, security rules, and DNS configuration, to ensure best performance, security, and compliance for serverless Spark executions.

Data Flow Private Endpoints Use Cases References