Perform all the steps in Step 1. Creating a User Group, and name your
group, data-scientists.
Perform all the steps in Step 2. Creating a Compartment, and name the
compartment for your work data-science-work.
From the details page of the data-science-work compartment, copy the
compartment OCID, referred to in this tutorial as
<data-science-work-compartment-ocid>.
Follow all the steps in Step 3. Creating a VCN and Subnet. This step
is required for this tutorial. In the data-science-work
compartment, use the wizard to create a VCN with the name,
datascience-vcn.
In Step 4. Creating Policies, create a policy in the
data-science-work compartment called data-science-policy, and
only add the following policies:
allow group data-scientists to manage all-resources in compartment data-science-work
allow service datascience to use virtual-network-family in compartment data-science-work
The first statement gives your group administrative rights in the compartment,
so you can manage the resources of all OCI services there. The second statement
lets the Data Science service use networking resources in the compartment.
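IAM policy statements follow the grammar allow <subject> to <verb> <resource-type> in compartment <name>. As a minimal sketch, assuming the group and compartment names chosen above, the two statements can be templated like this:

```python
# Sketch: templating the two IAM policy statements used in this tutorial.
# The group and compartment names are the ones chosen in the steps above.
GROUP = "data-scientists"
COMPARTMENT = "data-science-work"

statements = [
    f"allow group {GROUP} to manage all-resources in compartment {COMPARTMENT}",
    f"allow service datascience to use virtual-network-family in compartment {COMPARTMENT}",
]

for statement in statements:
    print(statement)
```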
In Step 5. Creating a Dynamic Group with Policies, create a dynamic
group called data-science-dynamic-group with the following three matching
rules. In each rule, replace <data-science-work-compartment-ocid> with the
compartment OCID that you copied earlier:
ALL {resource.type='datasciencenotebooksession', resource.compartment.id='<data-science-work-compartment-ocid>'}
ALL {resource.type='datasciencemodeldeployment', resource.compartment.id='<data-science-work-compartment-ocid>'}
ALL {resource.type='datasciencejobrun', resource.compartment.id='<data-science-work-compartment-ocid>'}
Note
You only need the last matching rule, for the datasciencejobrun
resource, in this tutorial. The other two rules cover more Data Science
resources so that you're prepared for working with notebook sessions and
model deployments.
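The three matching rules differ only in the resource type. A small sketch that generates them from the compartment OCID (the placeholder is kept as-is; substitute your real OCID):

```python
# Sketch: generating the three matching rules from the compartment OCID.
# The placeholder below stands in for the OCID copied from the
# compartment details page.
compartment_ocid = "<data-science-work-compartment-ocid>"

resource_types = [
    "datasciencenotebooksession",
    "datasciencemodeldeployment",
    "datasciencejobrun",
]

rules = [
    f"ALL {{resource.type='{rt}', resource.compartment.id='{compartment_ocid}'}}"
    for rt in resource_types
]

for rule in rules:
    print(rule)
```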
For Step 5, create a policy called
data-science-dynamic-group-policy in the root (tenancy)
compartment. Click Show manual editor and add the
following policy statements for the data-science-dynamic-group:
allow dynamic-group data-science-dynamic-group to manage all-resources in compartment data-science-work
allow dynamic-group data-science-dynamic-group to read compartments in tenancy
allow dynamic-group data-science-dynamic-group to read users in tenancy
For Step 6. Creating a Notebook Session, create a project in the
data-science-work compartment called DS Project and skip
creating a notebook session.
Note
In this tutorial, you name your Data Science project, DS Project,
and later you name your Data Integration project DI Project. Don't
name your project Initial Project as you are instructed
in Step 6.
Allow the Data Integration service to create workspaces.
Open the navigation menu and click
Identity & Security. Under Identity, click
Policies.
In the left navigation, under List Scope, select the
data-science-work compartment.
Click data-science-policy that you created in the Set Up
Resources step.
Click Edit Policy Statements.
Click Advanced.
In a new line, add the following statement:
allow service dataintegration to use virtual-network-family in compartment data-science-work
Click Save Changes.
Note
The preceding policy allows the Create workspace dialog of the Data
Integration service to list the VCNs in the data-science-work
compartment, allowing you to assign a VCN to your workspace when you create
it. The workspace then uses this VCN for its resources.
In this step, you add Data Integration workspaces to the
data-science-dynamic-group. The
data-science-dynamic-group-policy allows all members of this dynamic
group to manage the resources in the data-science-work compartment. This way,
workspace resources such as task schedules can create your Data Science job
runs.
Open the navigation menu and
click Identity & Security. Under Identity, click Dynamic
Groups.
In the list of Dynamic Groups, click the
data-science-dynamic-group that you created in the
Set Up Resources step.
Click Edit All Matching Rules.
Add the following matching rule:
ALL {resource.type='disworkspace', resource.compartment.id='<data-science-work-compartment-ocid>'}
Replace <data-science-work-compartment-ocid> with the
OCID for the data-science-work compartment.
Tip
You can copy the
<data-science-work-compartment-ocid> from another
rule in the data-science-dynamic-group matching
rules, because they all point to the data-science-work
compartment.
The preceding matching rule means that all Data Integration workspaces
created in your compartment are added to
data-science-dynamic-group. The
data-science-dynamic-group-policy created for
data-science-dynamic-group now applies to the
workspaces in this compartment.
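Conceptually, an ALL {...} matching rule admits a resource into the dynamic group only when every comma-separated condition holds. A rough sketch of that semantics, with illustrative placeholder values:

```python
# Sketch: how an ALL {...} matching rule behaves conceptually.
# Every condition in the rule must hold for the resource to join
# the dynamic group.
def matches_all_rule(resource: dict, conditions: dict) -> bool:
    """Return True when the resource satisfies every condition in the rule."""
    return all(resource.get(key) == value for key, value in conditions.items())

rule = {
    "resource.type": "disworkspace",
    "resource.compartment.id": "<data-science-work-compartment-ocid>",
}

# A workspace in the right compartment matches; one elsewhere does not.
workspace = {
    "resource.type": "disworkspace",
    "resource.compartment.id": "<data-science-work-compartment-ocid>",
}
workspace_elsewhere = {
    "resource.type": "disworkspace",
    "resource.compartment.id": "ocid1.compartment.oc1..other",
}

print(matches_all_rule(workspace, rule))            # True
print(matches_all_rule(workspace_elsewhere, rule))  # False
```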
When you create a job, you define the infrastructure and artifacts for the job.
Then you create a job run that provisions the infrastructure, runs the job
artifact, and, when the job ends, deprovisions and destroys the resources it
used.
In the hello_world_job page, click Start a job
run.
Select the data-science-work compartment.
Name the job run,
hello_world_job_run_test.
Skip the Logging configuration override and
Job configuration override sections.
Click Start.
In the breadcrumb trail that displays the current page, which is now the job
run details page, click Job runs to go back to the list of job
runs.
For the hello_world_job_run_test, wait for the
Status to change from Accepted to
In Progress, and finally to
Succeeded before you go to the next step.
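The Accepted to In Progress to Succeeded progression can also be watched programmatically. A minimal polling sketch, assuming a hypothetical get_status callable that returns the job run's current lifecycle state (the simulated state sequence is for illustration only):

```python
import time

# Sketch: poll a job run until it reaches a terminal lifecycle state.
# get_status is a hypothetical callable supplied by the caller.
TERMINAL = {"SUCCEEDED", "FAILED", "CANCELED"}

def wait_for_job_run(get_status, poll_seconds=30, max_polls=100):
    """Poll until the job run reaches a terminal state, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("job run did not finish in time")

# Simulated status sequence, for illustration only.
states = iter(["ACCEPTED", "IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"])
result = wait_for_job_run(lambda: next(states), poll_seconds=0)
print(result)  # SUCCEEDED
```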
This workspace uses datascience-vcn, and the Data Science job that you
created uses the Default networking option that Data Science offers. Because
you have given the Data Integration workspaces access to all resources in the
data-science-work compartment, it doesn't matter that the VCNs differ.
Data Integration runs its scheduler in datascience-vcn and creates job runs in
the Default networking VCN.
Create a task and define the REST API parameters for creating a job run.
In the breadcrumb trail that displays the current page, go back to the
hello_world_workspace workspace.
In the Quick actions panel of the
hello_world_workspace, click Create REST
task.
Name the task,
hello_world_REST_task.
For Project or Folder, select DI
Project.
Configure your REST API
details:
HTTP method: POST
URL: Find the API endpoint and path for your URL:
From the Data Science API, copy the API
endpoint for your region. The endpoint must include the
<region-identifier> you copied in the
Gather Job Info
section.
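As an illustration of assembling the URL, the job-run endpoint is the regional Data Science API endpoint plus the jobRuns path. The host pattern and API version below are assumptions; verify them against the Data Science API reference for your region:

```python
# Sketch: assembling the Create Job Run URL for a region.
# The host pattern and the 20190101 API version are assumptions;
# check the Data Science API reference before relying on them.
region_identifier = "us-phoenix-1"  # example region; use your own

endpoint = f"https://datascience.{region_identifier}.oci.oraclecloud.com"
url = f"{endpoint}/20190101/jobRuns"
print(url)
```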
Click Next, review the default conditions, and
keep their default options:
Success condition: SYS.RESPONSE_STATUS >= 200 AND SYS.RESPONSE_STATUS < 300
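The success condition simply treats any 2xx response status as success. A one-line sketch of the same check:

```python
# Sketch: the default success condition accepts any 2xx response status.
def is_success(status: int) -> bool:
    """SYS.RESPONSE_STATUS >= 200 AND SYS.RESPONSE_STATUS < 300"""
    return 200 <= status < 300

print(is_success(202))  # True
print(is_success(404))  # False
```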
Click Configure.
For Authentication, configure the
following options:
Authentication:
OCI resource principal
Authentication source: Workspace
Click Configure.
Skip configuring the Parameters (Optional) panel.
Click Validate task.
After you get Validation: Successful, click
Create.
After the workspace shows that the REST task was created successfully, click
Save and Close.
Note
In the Request body of your REST
task, you assign values to the parameters needed for creating a job
run. You use the same values as the hello_world_job you created in
Data Science in the Create a Job section of this
tutorial.
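As an illustration only, the request body carries the OCIDs gathered earlier. The field names below are assumed from the Data Science CreateJobRun API and the OCIDs are placeholders; copy the exact body from the Create a Job section rather than from this sketch:

```python
import json

# Sketch: a Create Job Run request body. All OCID values are placeholders,
# and the field names are assumed from the CreateJobRun API.
body = {
    "projectId": "<project-ocid>",
    "compartmentId": "<data-science-work-compartment-ocid>",
    "jobId": "<hello-world-job-ocid>",
    "displayName": "HELLO_WORLD_JOB_RUN",
}
print(json.dumps(body, indent=2))
```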
Before you schedule the hello_world_REST_task, test the task by
manually running it:
In the hello_world_workspace workspace, click the
Applications link.
Click Scheduler Application.
To confirm that the task is published, see if the
hello_world_REST_task is listed in the tasks for this
application.
In the list of tasks, click the Actions menu for
hello_world_REST_task, and then click
Run.
In the list of Runs, click the latest run,
hello_world_REST_task_<id>.
Example:
hello_world_REST_task_1651261399967_54652479
Wait for the status of your run to change from Not
Started to Success.
Note
Troubleshooting
If you get an Error status, go back to your
project, and check the URL and the request body of your REST task,
including the OCIDs that you assigned to the REST task. Then:
Update the hello_world_REST_task URL or
the request body with your fixes.
Open the navigation menu and click Analytics and AI. Under
Data Lake, click Data Integration.
Click Workspaces.
Select the data-science-work compartment.
Click the hello_world_workspace.
Click Applications and then Scheduler
Application.
In the left navigation panel, click Schedules.
Click Create schedule.
Set up the following options:
Name: hello_world_schedule
Identifier: HELLO_WORLD_SCHEDULE
Time Zone: UTC
Ensure that you keep the default value of the
universal time zone:
(UTC+00:00) Coordinated
Universal Time (UTC)
Frequency: Hourly
Repeat every: 1 (1 hour)
Minutes: 0
Summary: At 0 minutes past every
hour
Tip
Check your time and change the
Minutes: to 5 minutes after your
current time. For example, if your current time is 11:15, change
Minutes to 20. This way, you don't
have to wait 45 minutes to see the job run. This tutorial uses
zero minutes for the next sections.
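An hourly schedule with a fixed Minutes value fires once per hour at that minute. A small sketch of computing the next firing time, reproducing the tip's 11:15 example:

```python
from datetime import datetime, timedelta, timezone

# Sketch: next firing time of an hourly schedule with a fixed minutes offset.
def next_hourly_run(now: datetime, minute: int) -> datetime:
    """Return the next time at `minute` past the hour, strictly after `now`."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(hours=1)
    return candidate

# With a current time of 11:15 UTC, a Minutes value of 0 waits until 12:00,
# while a Minutes value of 20 fires at 11:20.
now = datetime(2022, 4, 29, 11, 15, tzinfo=timezone.utc)
print(next_hourly_run(now, 0))   # 2022-04-29 12:00:00+00:00
print(next_hourly_run(now, 20))  # 2022-04-29 11:20:00+00:00
```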
Click Create.
Note
In this step, you set up a schedule in the Scheduler
Application. In the next step, you associate the
schedule with the hello_world_REST_task.
Check that the Data Science Job runs list displays the scheduled task from Data
Integration.
Open the navigation menu and click Analytics and AI. Under
Machine Learning, click Data Science.
Select the data-science-work compartment.
Click the DS Project you created in the Prepare section
of this tutorial.
In the left navigation panel, click Jobs.
Click hello_world_job.
In the list of Job runs, wait for the
HELLO_WORLD_JOB_RUN instance to appear with the scheduled
date.
Click HELLO_WORLD_JOB_RUN.
Copy the value for Created by to a notepad.
Example: ocid1.disworkspace.oc1.phx....
Open the navigation menu and click Analytics and AI. Under
Data Lake, click Data Integration.
Click Workspaces.
In the list of workspaces, click the Actions menu for
hello_world_workspace.
Click Copy OCID to copy the workspace OCID, and compare
it with the value that you copied for Created by in step 8.
The two OCIDs are the same.
Note
The creator of the jobs is the OCID of
the Data Integration workspace,
hello_world_workspace.
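The comparison above can be sketched in code: the Created by value is itself an OCID whose second segment names the resource type, so for a workspace-created job run it starts with ocid1.disworkspace. Both values below are illustrative placeholders:

```python
# Sketch: comparing the job run's "Created by" value with the workspace OCID.
# Both values are illustrative placeholders, not real OCIDs.
created_by = "ocid1.disworkspace.oc1.phx.example"
workspace_ocid = "ocid1.disworkspace.oc1.phx.example"

# OCIDs take the form ocid1.<resource-type>.<realm>.<region>.<unique-id>,
# so the second segment identifies the creating resource's type.
resource_type = created_by.split(".")[1]
print(resource_type)                 # disworkspace
print(created_by == workspace_ocid)  # True
```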
(Optional)
Work on other tasks, and then come back in an hour for the next
job run.
Note
If you want to run jobs that are less than an hour apart, create several
hourly schedules with different minutes. For
example, for the schedules to be 15 minutes apart, create four hourly
schedules: minute-0, minute-15, minute-30, and minute-45. Then, for the
hello_world_REST_task, create a task schedule for each
schedule: one for the minute-15 schedule, another for the minute-30 schedule,
and so on.
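The four schedules in this example can be enumerated programmatically; a trivial sketch (the names are illustrative):

```python
# Sketch: naming hourly schedules that are 15 minutes apart.
interval = 15
minutes = list(range(0, 60, interval))        # [0, 15, 30, 45]
schedules = [f"minute-{m}" for m in minutes]
print(schedules)  # ['minute-0', 'minute-15', 'minute-30', 'minute-45']
```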