Creating a Custom Model

Document Understanding provides an option to build custom models to extract insights from images without needing data scientists.

You need the following before building a custom model:

  • A paid tenancy account in Oracle Cloud Infrastructure.
  • Familiarity with Oracle Cloud Infrastructure Object Storage.
  • The correct policies set up.

Train the model using one of Document Understanding's custom model training modes. The training modes are:

  • Recommended training: Document Understanding automatically selects the training duration to create the best model. The training might take up to 24 hours.
  • Custom duration: This option lets you set the maximum training duration.

The best training duration depends on the complexity of the detection problem, the typical number of labels in a document, the document resolution, and other factors. Allocate more time as the training complexity increases. The minimum recommended training time is 30 minutes. Longer training generally improves accuracy, but with diminishing returns as the duration grows. Use the recommended mode to get a baseline optimized model. If you want a better result, increase the training time.
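
When a custom duration is set from a script, a small guard can catch values below the 30-minute minimum mentioned above. The following is a minimal sketch only: the function name `check_training_hours` and the constant `MIN_RECOMMENDED_HOURS` are hypothetical and not part of any Oracle SDK, and it assumes the duration is expressed in hours and that fractional hours are accepted.

```python
import warnings

# The 30-minute minimum recommended above, expressed in hours.
MIN_RECOMMENDED_HOURS = 0.5


def check_training_hours(hours: float) -> float:
    """Return `hours` unchanged after basic sanity checks.

    Raises ValueError for non-positive values and emits a warning
    (rather than failing) when the value is below the recommended minimum.
    """
    if hours <= 0:
        raise ValueError("training duration must be positive")
    if hours < MIN_RECOMMENDED_HOURS:
        warnings.warn(
            f"{hours} h is below the recommended minimum of "
            f"{MIN_RECOMMENDED_HOURS} h; accuracy may suffer"
        )
    return hours
```

The check only warns for short durations because the 30-minute figure is a recommendation, not a documented hard limit.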

  • You need a project to create a model in. If you don't have one, see Creating a Project.
    1. From the project details page, select Create Model. If you need help finding the project details page, see Viewing a Project.
    2. Select the model type to train: either Document classification or Key value extraction.
      For a description of these types, see About Custom Models.
    3. Select the training data.
      • If you don't have any annotated documents, select Create a New Dataset. You're taken to Oracle Cloud Infrastructure Data Labeling, where you can add labels to the document content. For more information on annotating documents in Data Labeling, see the section on Labeling Documents.
      • If you do have annotated documents, select Choose an existing dataset.
        • If you annotated the dataset in Data Labeling, select Data Labeling Service.
        • If you annotated the images using a third-party tool, select Object Storage.
    4. Select Next.
    5. Enter a name for the custom model.
    6. (Optional) Give the model a description to help you find it.
    7. Select the training duration:
      • Recommended training: Document Understanding automatically selects the training duration to create the best model. The training might take up to 24 hours.
      • Custom: This option lets you set the maximum training duration (in hours).
    8. Select Next.
    9. Review the information you provided in the previous steps. To make changes, select Previous.
    10. When you want to start training the custom model, select Create and train.
  • Use the create command and required parameters to create a model:

    oci ai-document model create [OPTIONS]

    For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
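
As an illustration of how the options might be assembled, the sketch below builds the argument list for the create command in Python. The flag names (`--project-id`, `--model-type`, `--display-name`, `--training-dataset`, `--max-training-time-in-hours`), the `DATA_SCIENCE_LABELING` dataset type, and the OCID values are assumptions for illustration; confirm the names and which options are required against the CLI Command Reference. The helper `build_create_command` is hypothetical.

```python
import json
import shlex


def build_create_command(project_id, model_type, dataset_id, display_name,
                         max_hours=None):
    """Assemble the argv for `oci ai-document model create`.

    Flag names are assumed, not taken from this page; verify them
    in the CLI Command Reference before use.
    """
    # Complex options are passed to the CLI as JSON strings.
    training_dataset = {
        "datasetType": "DATA_SCIENCE_LABELING",  # assumed value for a Data Labeling dataset
        "datasetId": dataset_id,
    }
    argv = [
        "oci", "ai-document", "model", "create",
        "--project-id", project_id,
        "--model-type", model_type,
        "--display-name", display_name,
        "--training-dataset", json.dumps(training_dataset),
    ]
    # Only the custom duration mode sets an explicit maximum.
    if max_hours is not None:
        argv += ["--max-training-time-in-hours", str(max_hours)]
    return argv


# Placeholder OCIDs; replace them with real identifiers.
example = build_create_command(
    project_id="ocid1.aidocumentproject.oc1..example",
    model_type="KEY_VALUE_EXTRACTION",
    dataset_id="ocid1.datalabelingdataset.oc1..example",
    display_name="invoice-fields",
    max_hours=2,
)
print(shlex.join(example))  # shell-quoted command line, ready to review
```

Building the list first and printing it with `shlex.join` makes it easy to review the quoting of the JSON value before passing the list to `subprocess.run`.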

  • Run the CreateProject operation to create a project.

    Run the CreateModel operation to create a model.
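
The order of the two operations can be sketched as follows: the identifier returned by CreateProject feeds the CreateModel request. This is a rough outline under stated assumptions, not the service's definitive request format. The `send` callable is a hypothetical stand-in for whatever issues the signed request (an SDK client or an HTTP layer), and the body field names (`compartmentId`, `projectId`, `modelType`, `displayName`, `trainingDataset`) are assumed and should be checked against the API reference. In practice you might also need to wait for the project to become active before creating the model.

```python
def create_project_then_model(send, compartment_id, dataset_id, model_type,
                              name):
    """Call CreateProject, then CreateModel inside the new project.

    `send(operation, body)` is a hypothetical transport that returns
    the response as a dict containing at least an "id" key.
    """
    project = send("CreateProject", {
        "compartmentId": compartment_id,
        "displayName": f"{name}-project",
    })
    model_body = {
        "projectId": project["id"],  # link the model to the new project
        "compartmentId": compartment_id,
        "modelType": model_type,
        "displayName": name,
        "trainingDataset": {
            "datasetType": "DATA_SCIENCE_LABELING",  # assumed value
            "datasetId": dataset_id,
        },
    }
    return send("CreateModel", model_body)


# Usage with a fake transport that records calls instead of contacting OCI.
calls = []


def fake_send(operation, body):
    calls.append((operation, body))
    return {"id": f"ocid-{operation}"}


result = create_project_then_model(
    fake_send, "ocid1.compartment.oc1..example",
    "ocid1.datalabelingdataset.oc1..example",
    "KEY_VALUE_EXTRACTION", "invoices",
)
```

Passing the transport in as a parameter keeps the sequencing logic testable without credentials; a real implementation would replace `fake_send` with calls through the OCI SDK or a signed HTTP client.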