For the job run, you point to the main entry file by using the
JOB_RUN_ENTRYPOINT environment variable. This variable is used only with jobs
that have zip or compressed tar job artifacts.
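For example, a minimal sketch (the project and file names are hypothetical) that packages a zip artifact with Python; the job would then be configured with JOB_RUN_ENTRYPOINT set to the entry file's path inside the archive:
# Sketch only: package a job artifact with the standard-library zipfile module.
# The file names are hypothetical placeholders.
import zipfile

with zipfile.ZipFile("my-job-artifact.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    # The entry file sits under my_project/ inside the archive, so the job
    # would set JOB_RUN_ENTRYPOINT=my_project/main.py.
    zf.write("my_project/main.py")
    zf.write("my_project/jobruntime.yaml")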
Using a Data Science Conda Environment
You can use one of the Data Science conda
environments that are included in the service.
A conda environment encapsulates all the third-party Python dependencies (such as NumPy, Dask, or XGBoost) that the job run requires. Data Science conda environments are included and maintained in the service. If you don't specify a conda environment as part of the job and job run configurations, no conda environment is used because there's no default.
Your job code is embedded in a Data Science conda environment.
Find the Data Science conda environment that you want to use, and note its
slug name (for example, dataexpl_p37_cpu_v2).
Start a job run. To use a different conda environment for the job run, use
the custom environment variables to override the job configuration.
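For example, a minimal sketch using the OCI Python SDK that starts a job run and overrides the job's conda environment through the run-level environment variables (the OCIDs are placeholders, and authentication here assumes a local OCI configuration file):
# Sketch only: override the conda environment for a single job run.
import oci

config = oci.config.from_file()
data_science = oci.data_science.DataScienceClient(config)

run_details = oci.data_science.models.CreateJobRunDetails(
    project_id="<project_ocid>",
    compartment_id="<compartment_ocid>",
    job_id="<job_ocid>",
    job_configuration_override_details=oci.data_science.models.DefaultJobConfigurationDetails(
        environment_variables={
            "CONDA_ENV_TYPE": "service",
            "CONDA_ENV_SLUG": "dataexpl_p37_cpu_v2",
        }
    ),
)
job_run = data_science.create_job_run(run_details).data
print(job_run.lifecycle_state)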
Using a Custom Conda Environment
You can use zip and compressed tar file jobs with custom conda environments or Data Science conda environments.
A conda environment encapsulates all the third-party Python dependencies (such as
NumPy, Dask, or XGBoost) that your job run requires. You create, publish, and
maintain custom conda environments. If you don't specify a conda environment as part
of the job and job run configurations, no conda environment is used because there's no
default.
Your job code is embedded in a custom conda environment.
The job and job run must be configured with a subnet that has a service
gateway to access the published conda environment in your tenancy's
Object Storage bucket.
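As a rough sketch, the run-level environment variables that point a job run at a published conda environment could look like the following; treat the variable names other than CONDA_ENV_TYPE, and all of the values, as assumptions to verify against the environment variable reference for jobs:
# Sketch only: environment variables for a published conda environment stored in
# an Object Storage bucket. Variable names other than CONDA_ENV_TYPE, and all
# values, are assumptions to confirm against the service documentation.
published_conda_env_vars = {
    "CONDA_ENV_TYPE": "published",
    "CONDA_ENV_OBJECT_STORAGE_REGION": "<region>",
    "CONDA_ENV_NAMESPACE": "<object_storage_namespace>",
    "CONDA_ENV_BUCKET": "<bucket_name>",
    "CONDA_ENV_OBJECT_NAME": "<path/to/conda_pack>",
}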
Using a jobruntime.yaml file makes setting custom environment
variables in your project easier.
Change the jobruntime.yaml sample file to specify your
values.
Add the variables that you want to use during the job run. You can add job
run-specific environment variables,
such as CONDA_ENV_TYPE or CONDA_ENV_SLUG,
and custom key-value pairs.
For example:
CONDA_ENV_TYPE: service
CONDA_ENV_SLUG: dataexpl_p37_cpu_v2
JOB_RUN_ENTRYPOINT: conda_pack_test.py
KEY1: value1
KEY2: 123123
Important
Nested variables aren't supported.
Note that the JOB_RUN_ENTRYPOINT for the project is included
in the runtime YAML, so you don't have to specify it manually when you run the
job.
Create a simple project with a single Python file and your
jobruntime.yaml file in a project root directory.
In the Python file, read the environment variables, and print them to test that
they're accessible.
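For example, a minimal sketch of the entry file, named conda_pack_test.py to match the JOB_RUN_ENTRYPOINT in the sample YAML:
# conda_pack_test.py - sketch of the entry file: read the environment variables
# set through jobruntime.yaml and print them to confirm they're accessible.
import os

for name in ("CONDA_ENV_TYPE", "CONDA_ENV_SLUG", "JOB_RUN_ENTRYPOINT", "KEY1", "KEY2"):
    print(f"{name}={os.environ.get(name)}")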
Archive the project root directory into a zip or compressed tar file.
For example, to zip the project on a Mac, you could use:
zip -r zip-runtime-yaml-artifact.zip zip-runtime-yaml-artifact/ -x ".*" -x "__MACOSX"
From the Console, create a new job and upload the job archive file.
Run the job to test that it works.
You don't need to provide any environment variables in the job run
because they're set in your jobruntime.yaml file.
Monitor the job run for a successful finish.
(Optional)
If you used logging, you can review the logs to see the job run values.
Using a Vault
You can integrate the OCI
Vault service into Data Science jobs using resource principals.
Before you begin:
For the resource principal in the job to have access to a vault, ensure that
you have a dynamic group in your compartment that specifies either the
instance or the resource principal. For example, you could use the resource
principal and a dynamic group with this rule:
all {resource.type='datasciencejobrun',resource.compartment.id='<compartment_ocid>'}
For the job run to work, ensure that the dynamic group is allowed to at least manage
secret-family. For example:
Allow dynamic-group <dynamic_group_name> to manage secret-family in compartment <compartment_name>
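With the dynamic group and policy in place, the job code can authenticate with the resource principal and read a secret. A minimal sketch using the OCI Python SDK (the secret OCID is a placeholder):
# Sketch only: inside the job run, use the resource principal signer to read a
# secret bundle from the Vault service and decode its base64 content.
import base64
import oci

signer = oci.auth.signers.get_resource_principals_signer()
secrets_client = oci.secrets.SecretsClient(config={}, signer=signer)

bundle = secrets_client.get_secret_bundle(secret_id="<secret_ocid>").data
decoded = base64.b64decode(bundle.secret_bundle_content.content).decode("utf-8")
print("Retrieved secret of length:", len(decoded))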