Creating an Integration Task

Create an integration task in a project or folder in Data Integration. An integration task lets you take a Data Integration data flow and configure the parameter values that you want to use at runtime.

Data Integration includes one default project to get you started. To create your own project or folder, see Projects and Folders. An integration task in a project or folder can reference a data flow from any project or folder in the same workspace.

By default, Data Integration allows simultaneous (parallel) runs of the same task. To disallow manually initiated concurrent runs, select the Disable simultaneous execution of the task checkbox when you create the task. When simultaneous runs are disallowed, a run request for the task fails if a run of that task is already in progress in a non-terminal state.

To update the run configuration of a task to use the OCI Data Flow service, ensure that you have already created the prerequisite resources such as Object Storage buckets and Data Flow pools, as described in Required Setup and Policies for OCI Data Flow Service to Run Tasks.

    1. Open the project or folder in which you want to create the task.

      For the steps to open the details page of a project or folder, see Viewing the Details of a Project or Viewing the Details of a Folder.

    2. On the project or folder details page, click Tasks.
    3. In the Tasks section, click Create task and select Integration.
    4. On the Create integration task page, enter the following basic information:
      1. In the Name and Identifier fields, enter the values that you want or let Data Integration fill in the values automatically, based on the name of the data flow that you select for this task (in the next step).

        If you want Data Integration to fill in the fields automatically, don't change or enter values before you select a data flow. You can change the values after the fields are populated with values that are based on the selected data flow's name.

        In either case, the identifier is a system-generated value based on the name. You can change the value, but after you create and save the task, you can't update the identifier.

      2. (Optional) Enter a description for the task.
      3. Select the Disable simultaneous execution of the task checkbox if you want to disallow concurrent runs of this task.
      4. (Optional) For Project or folder, click Select and select a different project or folder to save the task in.
    5. In the Data flow section, click Select, and then select the data flow that this task runs:
      1. In the Select a data flow panel, perform one of the following actions:
        • Select a data flow that's saved in the project or folder that you're working in currently.
        • To select a data flow that's saved in a different project or folder, click Select next to the current project or folder name. In the Select project or folder panel that appears, select the project or folder and click Select. Then select the data flow from the list of available data flows.
      2. Click Select.

        Data Integration starts validating the selected data flow, and you're returned to the Create integration task page.

        If there are any errors or warnings in the data flow, click the data flow's name to open it in a new tab. Resolve the errors or warnings and save the data flow. When you navigate back to the Create integration task page, Data Integration automatically validates the data flow again.

    6. To save the task for the first time, click one of the following buttons:
      • Create: Creates and saves the task. You can continue to create and edit the task.

      • Create and close: Creates and saves the task, closes the page, and returns you to the tasks list on the project or folder details page.

    7. Save periodically while you work by clicking one of the following buttons:
      • Save: Commits changes since the last save. You can continue editing after saving.

      • Save and close: Commits changes, closes the page, and returns you to the tasks list on the project or folder details page.

      • Save as: Commits changes (since the last save) and saves to a copy instead of overwriting the current task. You can provide a name for the copy and select a different project or folder for the copy, or save the copy in the same project or folder as the existing task.

    8. In the Run configuration section, do one of the following:
      • By default, all tasks that you create in Data Integration are configured to run in the OCI Data Integration service, as indicated by the label Task run service: OCI Data Integration service. No additional configuration is needed. Proceed to step 10.

      • To run this task in the OCI Data Flow service, click Edit.

        Note

        Ensure that you have already created the required resources and policies for using the OCI Data Flow service. See Required Setup and Policies for OCI Data Flow Service to Run Tasks.

        If you have satisfied the prerequisites, proceed to step 9 to update the task's run configuration, and optionally use parameters for the run properties.

    9. On the Update task run configuration page, click OCI Data Flow service. Then complete the following selections to update or parameterize the run properties for OCI Data Flow.

      Complete these steps only after you have satisfied the prerequisites described in Required Setup and Policies for OCI Data Flow Service to Run Tasks.

      1. Select the pool in OCI Data Flow to run this task.
      2. (Optional) Select the private endpoint in OCI Data Flow.
      3. For Log bucket path, select the Object Storage bucket to use for OCI Data Flow application run logs.

        If this is the first time you're editing the task's OCI Data Flow service run configuration, and the bucket dis-df-system-bucket already exists in Object Storage, Data Integration automatically selects that bucket, as indicated by oci://dis-df-system-bucket@<tenancy-name> in the selection field.

      4. For Artifact bucket path, select the Object Storage bucket to use for Data Integration run job artifacts such as jar and zip files.

        If this is the first time you're editing the task's OCI Data Flow service run configuration, and the bucket dis-df-system-bucket already exists in Object Storage, Data Integration automatically selects that bucket, as indicated by oci://dis-df-system-bucket@<tenancy-name> in the selection field.

      5. (Optional) For Application compartment, select the compartment for the OCI Data Flow application that's created when Data Integration service tasks are run in the Data Flow service.

        If an application compartment is not specified, the Data Integration application compartment is used.

      6. Enter the minimum number of workers (or executors) to use for OCI Data Flow jobs.

        The default is 1. If the value for Maximum number of workers is also 1, then dynamic allocation for OCI Data Flow jobs is not used.

      7. Enter the maximum number of workers (or executors) to use for OCI Data Flow jobs.

        The default is 1, which indicates that dynamic allocation is not used. If you want to use dynamic allocation for OCI Data Flow jobs, specify a larger value. This value must be greater than or equal to the value for Minimum number of workers.

      8. (Optional) For OCI Data Flow Spark configuration properties, enter one or more Spark properties to use for the task run.

        A Spark property is a key-value pair. Click Another property to add more key-value pairs, as needed.

        For the Spark configuration properties that you can add, see Supported Spark Properties.
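        For example, a task might supply key-value pairs like the following. These are illustrative values only: the property names are common Spark settings, and you should confirm that each one appears in Supported Spark Properties before using it.

```
spark.sql.shuffle.partitions    64
spark.driver.memoryOverhead     2g
```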

      9. (Optional) After configuring any task run property (steps 9a to 9h), click Parameterize, which appears below the configured property value, to assign a parameter to that property.

        Upon parameterizing, Data Integration adds a parameter of type String and sets the default parameter value to the value that's currently configured for that property. The label Parameterized followed by a parameter name is displayed. For example: Parameterized: OCI_DF_POOL

        The parameter names are:

        • Pool: OCI_DF_POOL
        • Private endpoint: OCI_DF_PRIVATE_ENDPOINT
        • Log bucket path: OCI_DF_LOG_BUCKET
        • Artifact bucket path: OCI_DF_ARTIFACT_BUCKET
        • Application compartment: OCI_DF_APP_COMPARTMENT
        • Minimum number of workers: OCI_DF_MIN_WORKERS
        • Maximum number of workers: OCI_DF_MAX_WORKERS
        • Custom OCI Data Flow configuration: OCI_DF_CUSTOM_OCI_DF_SPARK_CONFIG

        The actions for a parameter are:

        • Click Edit to add or edit a parameter description. The parameter name and type cannot be edited. A parameter description, if added, is displayed as a tip in the panel for changing parameter values at design time or runtime.
        • Click Remove if you no longer want a property to be parameterized.
      10. Click Save.
    10. (Optional) If parameters are assigned in the selected data flow, view and change the default parameter values by following these steps:
      1. In the Configure parameters section, click Configure.
      2. On the Configure parameters page, change the default values as needed.

        Consider the following restrictions when editing the default values:

        • If the incremental extract strategy for a BICC source is configured to use a date that's managed by the Data Integration system parameter SYS.LAST_LOAD_DATE, you're not allowed to change the date value during design time.

        • If the incremental extract strategy for a BICC source is configured to use a parameterized last extract date that you have added, you can change the date value during design time and runtime.

        • If a parameterized target data entity is configured to use the Merge strategy, you can change the Merge key selection.

        • For a parameterized data asset that requires a staging location: If you select a data asset that doesn't have a default staging location specified in that data asset, Data Integration displays a warning notification. When you see the notification, choose one of the following options:
          • Update that data asset by adding a default staging location.
          • Choose a different data asset that has a default staging location.

      3. Click Configure or Cancel.
        You're returned to the Create integration task page.
    11. (Optional) In the Validate task section, click Validate to check the parameter configurations.

      If there are errors or warnings, click View messages. Resolve any errors before you publish the task.

    12. When you finish configuring the task, click Create and close or Save and close.
    Publish the integration task to an application in Data Integration before you run the task or schedule the task for running. You can also publish the integration task to OCI Data Flow, if applicable. For publishing information, see Task Publishing.
  • Use the oci data-integration task create-integration-task command and required parameters to create an integration task:

    oci data-integration task create-integration-task [OPTIONS]

    For a complete list of flags and variable options for CLI commands, see the Command Line Reference.
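    As a sketch, a minimal creation request might look like the following. The task name, identifier, workspace OCID, and data-flow key are hypothetical placeholders, and the payload keys (such as registryMetadata and aggregatorKey, which reference the parent data flow) should be verified against the Command Line Reference before use.

```shell
# Write the request body to a file. All values here are hypothetical
# placeholders; replace them with your own task name, identifier, and
# the key of the data flow that the task runs.
cat > integration-task.json <<'EOF'
{
  "name": "Load_Customers_IT",
  "identifier": "LOAD_CUSTOMERS_IT",
  "registryMetadata": { "aggregatorKey": "<data-flow-key>" }
}
EOF

# Validate the JSON before handing it to the CLI.
python3 -m json.tool integration-task.json > /dev/null && echo "JSON OK"

# Create the task (requires a configured OCI CLI profile and a real
# workspace OCID; uncomment to run):
# oci data-integration task create-integration-task \
#   --workspace-id ocid1.disworkspace.oc1..<unique-id> \
#   --from-json file://integration-task.json
```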

  • Run the CreateTask operation with the appropriate resource subtype to create an integration task.
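    As a sketch, the CreateTask request body selects the integration-task subtype through the modelType discriminator. The name, identifier, and data-flow key below are hypothetical placeholders, and the exact payload shape should be verified against the API reference.

```json
{
  "modelType": "INTEGRATION_TASK",
  "name": "Load_Customers_IT",
  "identifier": "LOAD_CUSTOMERS_IT",
  "registryMetadata": { "aggregatorKey": "<data-flow-key>" }
}
```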