Integrating Vision with Oracle Analytics Cloud (OAC)
Create a Data Integration flow that uses the Vision SDK
to detect objects in images and project that information into a table in a data warehouse. This
output data is then used by Oracle Analytics Cloud to create visualizations and find patterns.
This is the high-level flow of the system:
Figure 1. High-level flow between Vision and OAC
Before You Begin
To follow this tutorial, you must be able to create VCNs, functions, and API gateways, and use Data Integration and Vision.
Talk to your administrator about the policies required.
Setting Up the Required Policies 🔗
Follow these steps to set up the required policies.
In the Console navigation menu, select Identity & Security.
Under Identity, select Policies.
Select Create Policy.
In the Create Policy panel, populate Name and Description.
For Name, enter a name without any spaces. You can use alphanumeric characters, hyphens, periods, and underscores only.
For Description, enter a description to help other users know the purpose of this set of policies.
In Policy Builder, use the manual editor to add the following statements:
allow group <group-name> to use cloud-shell in tenancy
allow group <group-name> to inspect all-resources in tenancy
allow group <group-name> to read instances in tenancy
allow group <group-name> to read audit-events in tenancy
allow group <group-name> to manage dis-workspaces in compartment <compartment-name>
allow group <group-name> to manage dis-work-requests in compartment <compartment-name>
allow group <group-name> to manage virtual-network-family in compartment <compartment-name>
allow group <group-name> to manage tag-namespaces in compartment <compartment-name>
allow service dataintegration to use virtual-network-family in compartment <compartment-name>
allow group <group-name> to manage object-family in compartment <compartment-name>
allow group <group-name> to manage functions-family in compartment <compartment-name>
allow group <group-name> to manage api-gateway-family in compartment <compartment-name>
allow group <group-name> to inspect instance-family in compartment <compartment-name>
allow group <group-name> to manage autonomous-database-family in compartment <compartment-name>
allow group <group-name> to use analytics-instances in compartment <compartment-name>
allow group <group-name> to manage repos in tenancy
allow group <group-name> to read objectstorage-namespaces in tenancy
allow group <group-name> to manage logging-family in compartment <compartment-name>
allow group <group-name> to read metrics in compartment <compartment-name>
allow group <group-name> to use apm-domains in compartment <compartment-name>
allow service faas to use apm-domains in compartment <compartment-name>
allow group <group-name> to use ai-service-vision-family in compartment <compartment-name>
Select Create.
1. Create a Virtual Cloud Network 🔗
Create a VCN to serve as the home for the serverless function and the API gateway created later in the tutorial.
1.1 Creating a VCN with Internet Access 🔗
Follow these steps to create a VCN with internet access.
In the navigation menu, select Networking.
Select Virtual Cloud Networks.
Select Start VCN Wizard.
Select Create VCN with Internet Connectivity.
Select Start VCN Wizard.
Enter a name for the VCN. Avoid entering confidential information.
Select Next.
Select Create.
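If you prefer to script this step, the following sketch shows how a VCN could be created with the OCI Python SDK instead of the Console wizard. It creates only the VCN itself (the wizard also creates the subnets and gateways for you), and the compartment OCID, display name, and CIDR block are placeholder assumptions.
import oci

# Load the default ~/.oci/config profile; adjust if you use a different profile.
config = oci.config.from_file()
network_client = oci.core.VirtualNetworkClient(config)

# Placeholder values -- replace with your own compartment OCID and naming.
vcn_details = oci.core.models.CreateVcnDetails(
    compartment_id="ocid1.compartment.oc1..example",
    display_name="vision-oac-vcn",
    cidr_block="10.0.0.0/16",
)

vcn = network_client.create_vcn(vcn_details).data
print(vcn.id, vcn.lifecycle_state)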
1.2 Accessing your VCN from the Internet 🔗
You must add a new stateful ingress rule for the public regional subnet to allow traffic on port 443.
2. Create an API Gateway 🔗
Create an API gateway to expose the enrichment function that you create later in this tutorial.
From the Console navigation menu, select Developer Services.
Select Gateways.
Select Create Gateway.
Enter a Name for the gateway. Don't enter confidential information.
Set the Type of the gateway to Public.
Select the Compartment to create the API Gateway resources in.
Select the name of the VCN to use with the API Gateway. Use the name of the VCN you created in section 1. Create a Virtual Cloud Network.
Select the name of the regional subnet in the VCN. Set it to the public subnet you updated in section 1.2.
Select Create Gateway.
When the API Gateway is created, it's shown as Active in the list on the Gateways page.
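As with the VCN, the gateway can also be created programmatically. The following sketch assumes the OCI Python SDK's API Gateway client; the compartment and subnet OCIDs and the display name are placeholders, so verify the parameters against the SDK documentation for your release before relying on it.
import oci

config = oci.config.from_file()
gateway_client = oci.apigateway.GatewayClient(config)

# Placeholder OCIDs -- substitute your own compartment and public subnet.
gateway_details = oci.apigateway.models.CreateGatewayDetails(
    compartment_id="ocid1.compartment.oc1..example",
    display_name="vision-oac-gateway",
    endpoint_type="PUBLIC",
    subnet_id="ocid1.subnet.oc1..example",
)

gateway = gateway_client.create_gateway(gateway_details).data
print(gateway.id, gateway.lifecycle_state)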
3. Create an Enrichment Function 🔗
Follow these steps to create an enrichment function that can be called from Oracle Cloud Infrastructure Data Integration.
Create a serverless function that only runs on demand. The function conforms to the schema required for it to be consumed by Data Integration, and it calls the Vision API through the Python SDK.
3.1 Creating an Application 🔗
To add a function, you must first create an application, and then add the enrichment function to it. The function's Python code is as follows:
import io
import json
import logging
import pandas
import requests
import base64
from io import StringIO
from fdk import response
import oci
# These imports use the Vision preview SDK package; in current releases of the
# OCI Python SDK, equivalent classes are available under oci.ai_vision.
from vision_service_python_client.ai_service_vision_client import AIServiceVisionClient
from vision_service_python_client.models.analyze_image_details import AnalyzeImageDetails
from vision_service_python_client.models.image_object_detection_feature import ImageObjectDetectionFeature
from vision_service_python_client.models.inline_image_details import InlineImageDetails
def handler(ctx, data: io.BytesIO = None):
    # Authenticate to OCI services with the function's resource principal.
    signer = oci.auth.signers.get_resource_principals_signer()
    resp = do(signer, data)
    return response.Response(
        ctx, response_data=resp,
        headers={"Content-Type": "application/json"}
    )
def vision(dip, txt):
    # Download the image from its URL and base64-encode it for an inline request.
    encoded_string = base64.b64encode(requests.get(txt).content)
    # Ask Vision for object detection, returning at most five objects per image.
    image_object_detection_feature = ImageObjectDetectionFeature()
    image_object_detection_feature.max_results = 5
    features = [image_object_detection_feature]
    analyze_image_details = AnalyzeImageDetails()
    inline_image_details = InlineImageDetails()
    inline_image_details.data = encoded_string.decode('utf-8')
    analyze_image_details.image = inline_image_details
    analyze_image_details.features = features
    try:
        le = dip.analyze_image(analyze_image_details=analyze_image_details)
    except Exception as e:
        print(e)
        return ""
    if le.data.image_objects is not None:
        # Convert the returned model objects into plain dictionaries.
        return json.loads(le.data.image_objects.__repr__())
    return ""
def do(signer, data):
    dip = AIServiceVisionClient(config={}, signer=signer)
    body = json.loads(data.getvalue())
    input_parameters = body.get("parameters")
    col = input_parameters.get("column")
    # The records arrive as a base64-encoded JSON Lines payload.
    input_data = base64.b64decode(body.get("data")).decode()
    df = pandas.read_json(StringIO(input_data), lines=True)
    # Call Vision for each row and store the detected objects in a new column.
    df['enr'] = df.apply(lambda row: vision(dip, row[col]), axis=1)
    # Explode the array of detected objects into one row per object.
    dfe = df.explode('enr', ignore_index=True)
    # Add a column for each property we want to return from the imageObjects struct.
    ret = pandas.concat([dfe, pandas.DataFrame((d for idx, d in dfe['enr'].items()))], axis=1)
    # Drop the array-of-objects column.
    ret = ret.drop(['enr'], axis=1)
    # Drop the input text column; it doesn't need to be returned (other columns may remain).
    ret = ret.drop([col], axis=1)
    if 'name' not in ret.columns:
        return pandas.DataFrame(columns=['id', 'name', 'confidence', 'x0', 'y0', 'x1', 'y1', 'x2', 'y2', 'x3', 'y3']).to_json(orient='records')
    # Flatten the bounding polygon into x/y columns for each of its four vertices.
    for i in range(4):
        ret['x' + str(i)] = ret.apply(lambda row: row['bounding_polygon']['normalized_vertices'][i]['x'], axis=1)
        ret['y' + str(i)] = ret.apply(lambda row: row['bounding_polygon']['normalized_vertices'][i]['y'], axis=1)
    ret = ret.drop(['bounding_polygon'], axis=1)
    rstr = ret.to_json(orient='records')
    return rstr
Oracle Cloud Infrastructure Data Integration supports calling functions where the data payload is a single base64-encoded string that contains the records to process and a set of parameters. The encoded data is the base64-encoded version of a set of records in JSON Lines format (each line is the JSON for one record). Each record has an ID that's used to associate the output with the corresponding input. A sketch of such a payload, and of decoding it, follows.
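The original sample strings aren't reproduced here; the following sketch uses made-up record values to illustrate the shape of the request body that the function in section 3 expects.
import base64
import json

# Hypothetical input records in JSON Lines format. The values are placeholders;
# in this tutorial the enriched column holds each image's Object Storage URL.
records = [
    {"id": 1, "IMAGE_LOCATION": "https://example.com/images/car1.jpg"},
    {"id": 2, "IMAGE_LOCATION": "https://example.com/images/car2.jpg"},
]
jsonl = "\n".join(json.dumps(r) for r in records)

# The body Data Integration posts to the function: base64-encoded JSON Lines plus
# a "parameters" object naming the column to enrich.
payload = {
    "data": base64.b64encode(jsonl.encode()).decode(),
    "parameters": {"column": "IMAGE_LOCATION"},
}

# Decoding payload["data"] recovers the original JSON Lines records.
print(base64.b64decode(payload["data"]).decode())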
From the Console navigation menu, select Identity & Security.
Select Dynamic Groups.
Create a dynamic group with the following rule:
ALL {resource.type = 'fnfunc', resource.compartment.id = '<compartment-id>'}
Add the following statements to the policy:
allow any-user to use functions-family in compartment <compartment-name> where ALL {request.principal.type= 'ApiGateway', request.resource.compartment.id = '<compartment-id>'}
allow dynamic-group <dynamic-group-name> to use ai-service-vision-family in tenancy
5. Creating an Oracle Cloud Infrastructure Data Integration Workspace 🔗
Before you can use Data Integration, ensure that you have the rights to use the capability. The workspace also needs the following policies:
allow any-user to read buckets in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<data-integration-workspace-ocid>', request.operation = 'GetBucket'}
allow any-user to manage objects in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<data-integration-workspace-ocid>'}
allow any-user to manage buckets in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<data-integration-workspace-ocid>', request.permission = 'PAR_MANAGE'}
allow any-user to {PAR_MANAGE} in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<data-integration-workspace-ocid>'}
allow any-user to use functions-family in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id='<data-integration-workspace-ocid>'}
7. Prepare the Data Sources and Sinks 🔗
You're using car parking images, along with the date the images were taken, as sample data. Gather 10 or more images of parked cars to serve as the data source on which you perform object detection analysis using Data Integration and Vision.
7.1 Loading Sample Data 🔗
Load the parked car images sample data to your bucket.
Find 10 images of parked cars, either locally or online.
In the Console navigation menu, select Storage.
Select Buckets.
Select an existing bucket, or create a new one.
On the bucket's details page, under Objects, select Upload.
Drag to the drop zone the 10 image files you gathered in step 1.
Select Upload.
Create a CSV file containing a table of four columns and 10 rows. The column names are Record ID, Image Name, Date Taken, and Image Location. Fill in the Record ID column with values from 1 to 10. (A sketch of building this file is shown after these steps.)
Figure 2. Sample Data File
Name the file cars.csv.
Fill in the table by providing the image names, dates taken, and image locations.
To find an image's location, select the Actions menu for the image when viewing the bucket in the Console, select View Object Details, and copy the URL path into cars.csv.
Upload cars.csv to the bucket.
Select Close.
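If you'd rather generate cars.csv than type it by hand, the following sketch shows one way to do it. The image names, dates, and URLs are placeholders; use the object URLs you copied from your own bucket.
import pandas

# Placeholder rows -- replace the names, dates, and URL template with the real
# values from your bucket (see the View Object Details step above).
rows = [
    {
        "Record ID": i,
        "Image Name": "car{}.jpg".format(i),
        "Date Taken": "2021-06-01",
        "Image Location": "https://objectstorage.example.com/images/car{}.jpg".format(i),
    }
    for i in range(1, 11)
]

pandas.DataFrame(rows).to_csv("cars.csv", index=False)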
7.2 Creating a Staging Bucket 🔗
Data Integration needs a staging location where it can write intermediate files before publishing data to a data warehouse.
7.3 Creating an Autonomous Data Warehouse 🔗
Create an Autonomous Data Warehouse to use as the target for the analyzed data.
In the Console navigation menu, select Oracle Database.
Select Autonomous Data Warehouse.
Select your Compartment.
Select Create Autonomous Database.
Enter a Display Name. Avoid entering confidential information.
Enter a Database Name. Avoid entering confidential information.
Set Workload type to Data Warehouse.
Create the username and password for the database's administrator.
Set Access type to Secure access from everywhere.
Set Authentication to mTLS.
Set License type to BYOL.
Select Create Autonomous Database.
When your database has been provisioned, on the Database details page, select Database Actions.
Log in with the credentials you supplied in step 8.
Select Development.
Select SQL.
Create a Contributor user by running the following SQL:
CREATE USER USER1 IDENTIFIED BY "<enter user1 password here>";
GRANT DWROLE TO USER1;
ALTER USER USER1 QUOTA 200M ON DATA;
Autonomous Databases come with a predefined database role called DWROLE. It provides the common privileges a database developer or data scientist needs to perform real-time analytics. Depending on your usage requirements, you might also need to grant privileges to other users.
7.4 Creating a Table to Project the Analyzed Data 🔗
Create a table to store the information about the detected objects.
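The DDL isn't reproduced in this section, so the following sketch is an assumption: it uses the python-oracledb driver with placeholder connection details, a hypothetical table name (DETECTED_OBJECTS), and column names and types derived from the function output mappings shown later in section 8 (one row per detected object, four polygon vertices). With mTLS enabled, the connection also needs your database wallet; see the python-oracledb documentation.
import oracledb

# Placeholder connection details -- substitute your own ADW credentials and the
# connect string from the database's connection panel.
connection = oracledb.connect(
    user="USER1",
    password="<user1-password>",
    dsn="<adw-connect-string>",
)

# Hypothetical table; the columns mirror the target mappings used in the data flow.
ddl = """
CREATE TABLE DETECTED_OBJECTS (
    RECORD_ID         NUMBER,
    IMAGE_NAME        VARCHAR2(200),
    DATE_TAKEN        VARCHAR2(50),
    IMAGE_LOCATION    VARCHAR2(1000),
    OBJECT_NAME       VARCHAR2(200),
    OBJECT_CONFIDENCE NUMBER,
    VERTEX_X1 NUMBER, VERTEX_Y1 NUMBER,
    VERTEX_X2 NUMBER, VERTEX_Y2 NUMBER,
    VERTEX_X3 NUMBER, VERTEX_Y3 NUMBER,
    VERTEX_X4 NUMBER, VERTEX_Y4 NUMBER
)
"""

with connection.cursor() as cursor:
    cursor.execute(ddl)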
8. Create the Data Flow 🔗
Create the components necessary to build a data flow in Data Integration. The data flow is shown in the following figure.
Figure 3. The Data Flow
All the underlying storage resources were created in earlier sections. In Data Integration, you create the data assets for each of the elements of the data flow.
8.1 Creating a Data Asset for your Source and Staging 🔗
Create a data asset for your source and staging data.
Provide an Object Storage location where the intermediate files can be created when the data flow runs, using the following values:
Data Asset: cars-data-source
Connection: Default connection
Schema: select the data-staging Object Storage location
Map the output of the function to the correct fields in the target database table. Use the following mappings:
Function Output Map (target name → mapping)
RECORD_ID → RECORD_ID
IMAGE_NAME → Image_Name
DATE_TAKEN → DATE_TAKEN
IMAGE_LOCATION → Image_Location
OBJECT_NAME → name
OBJECT_CONFIDENCE → confidence
VERTEX X1 → x0
VERTEX Y1 → y0
VERTEX X2 → x1
VERTEX Y2 → y1
VERTEX X3 → x2
VERTEX Y3 → y2
VERTEX X4 → x3
VERTEX Y4 → y3
The mappings should look like the following figures:
Figure 5. Mappings One to Four
Figure 6. Mappings Five to Eight
Figure 7. Mappings Nine to Twelve
Figure 8. Mappings Thirteen and Fourteen
8.8 Running the Data Flow 🔗
Run the data flow to populate the target database.