RAG Tool Object Storage Guidelines for Generative AI Agents

Review the following sections to prepare Object Storage data for RAG tools in Generative AI Agents.

General Guidelines

Follow these guidelines to prepare data for Generative AI Agents data sources before uploading to Object Storage:

  • Data Sources: Data for Generative AI Agents must be uploaded as files to an Object Storage bucket.
  • Number of Buckets: Only one bucket is allowed per data source.
  • Supported File Types: Only PDF and txt files are supported.
  • File Size Limit: Each file must be no larger than 100 MB.
  • PDF Contents: PDF files can include images, charts, and reference tables, but these must not exceed 8 MB.
  • Chart Preparation: No special preparation is needed for charts, as long as they're two-dimensional with labeled axes. The model can answer questions about the charts without explicit explanations.
  • Table Preparation: Use reference tables with several rows and columns. For example, the agent can read the table on the limits page.
  • URLs: All the hyperlinks present in the PDF documents are extracted and displayed as hyperlinks in the chat response.
  • Data Not Ready: If your data isn't yet available, create an empty folder for the data source and populate it later. This way, you can ingest data into the source after the folder is populated.
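The file type and size guidelines above can be checked locally before uploading. The following is a minimal sketch, not part of the service: the function names are hypothetical, and it assumes that 1 MB means 1024 * 1024 bytes and that file extensions are matched without regard to case, neither of which is stated in the guidelines.

```python
from pathlib import Path

# Limits taken from the guidelines above. Treating 1 MB as 1024 * 1024 bytes
# is an assumption; the guidelines don't say which definition applies.
MAX_FILE_BYTES = 100 * 1024 * 1024
SUPPORTED_SUFFIXES = {".pdf", ".txt"}


def check_file(name: str, size_bytes: int) -> list[str]:
    """Return the guideline problems found for one file (empty if none)."""
    problems = []
    # Case-insensitive suffix matching is an assumption of this sketch.
    if Path(name).suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append(f"{name}: unsupported file type")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"{name}: larger than 100 MB")
    return problems


def check_folder(folder: str) -> list[str]:
    """Check every regular file under a local folder before uploading it."""
    problems = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            problems.extend(check_file(path.name, path.stat().st_size))
    return problems
```

For example, check_file("slides.docx", 10) returns one problem about the file type. Metadata JSON files that you add for filtering would also be flagged by this sketch, so exclude them from the check if you use that feature.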
Note

Set up the following Object Storage permissions before you proceed.

  • User access to Object Storage files
  • Data ingestion job access to Object Storage files for long-running jobs

See Getting Access for the permissions.

Ensuring Enhanced Table Understanding

Enhanced table understanding, a feature of RAG tools, improves the accuracy of responses to queries whose answers are in PDF table data. It processes these tables to generate more precise and relevant responses aligned with the information they contain. In general, RAG tools can read tables. For a RAG tool to read tables with enhanced table understanding, ensure that the tables have the following features:

  • All cells of the table are separated with visible lines or object boundaries from other cells, including the header names in the first row.
  • All columns including the first column have a header name.
  • Each table has more than one column and more than one row, excluding the row with header names.
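The header and size criteria above can be expressed as a small check on table data that you have already extracted into rows. This is only a sketch under assumptions: the function name is hypothetical, the first row is assumed to be the header row, and the check that every row has as many cells as the header is an extra assumption. The requirement for visible cell boundaries can't be checked this way.

```python
def meets_table_criteria(rows: list[list[str]]) -> bool:
    """Check the header and size criteria; rows[0] is the header row."""
    # More than one row is needed, not counting the header row.
    if len(rows) < 3:
        return False
    header = rows[0]
    # More than one column is needed.
    if len(header) < 2:
        return False
    # Every column, including the first, needs a header name.
    if any(not name.strip() for name in header):
        return False
    # Assumption of this sketch: each data row has one cell per column.
    return all(len(row) == len(header) for row in rows[1:])
```

For example, a table with the header ["Year", "Title"] and two data rows passes, while the same table with an empty first header name doesn't.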
Tables that are ingested with enhanced table understanding are listed when you ingest the data. Example message:
Count of tables that support enhanced table understanding in following PDFs:
      - enhanced_table_test_data/2025_Report1.pdf has 4 tables processed successfully
      - enhanced_table_test_data/2025_Report2.pdf has 3 tables processed successfully
      - enhanced_table_test_data/2025_Report3.pdf has 3 tables processed successfully
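
If you want to record these counts, the message can be parsed with a regular expression. The following sketch assumes that the message always follows the line format of the example above; the function name is hypothetical.

```python
import re

# Matches lines such as:
#   - folder/file.pdf has 4 tables processed successfully
# The pattern is based only on the example message and is an assumption.
LINE_PATTERN = re.compile(
    r"^\s*-\s+(?P<path>.+?) has (?P<count>\d+) tables? processed successfully\s*$"
)


def parse_table_counts(message: str) -> dict[str, int]:
    """Map each PDF path in the message to its count of processed tables."""
    counts = {}
    for line in message.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            counts[match.group("path")] = int(match.group("count"))
    return counts
```

Applied to the example message, the result has three entries, one for each PDF file, and the counts add up to 10 tables.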

Enhancing Responses with Metadata Filtering

Use predefined metadata to apply filters during a chat. When filters are applied, an agent's searches in a chat session are limited to data files that are associated with the metadata. This helps the model generate answers that are relevant to the content scope, which improves the accuracy and relevance of the agent's responses.

The following steps give an overview of the metadata filtering workflow. After you understand the workflow, review the details for your use case in the sections that follow.

  1. In a text editor, create the metadata schema that's required for the filters that you want to make available. Write the schema in JSON format and name the file _metadata_schema.json.

    Example:

    {
        "metadataSchema": [
            {
                "name": "publication_year",
                "type": "integer"
            },
            {
                "name": "title",
                "type": "string"
            }
        ]
    }
  2. Upload the _metadata_schema.json file created in step 1 to the root level of the Object Storage bucket that contains the data files for a knowledge base.
  3. Create JSON files to associate data files with the predefined metadata and provide the metadata values.

    Example:

    {
        "metadataAttributes": {
            "publication_year": 2020
        }
    } 

    You can associate one or more data files or all files in a bucket with the metadata. For details about the JSON file name conventions to use for the options you choose, see Metadata Filter Options (File Name and Location).

  4. Upload the JSON files created in step 3 to the Object Storage bucket that contains the data files for a knowledge base. For each option, ensure that you save the file in the correct location in the hierarchy.
  5. Create a knowledge base. Select Object Storage as the data store type, and the option to automatically start the ingestion job.

    When the data files are ingested, Generative AI Agents creates a list of the metadata names and the values that can be selected in a chat. To view the ingested metadata names and values, see Getting a Knowledge Base's Details in Generative AI Agents.

  6. Create an agent with a RAG tool, selecting the knowledge base created in step 5. In the agent, select the option to automatically create an endpoint. If you need help, see Creating an Agent and Creating a RAG tool.
  7. In a chat window, add one or more predefined metadata filters and select the values to apply. See Use Metadata Filters in a Chat.
Note

Review the following sections to learn more about preparing metadata JSON files for your use case and how to add and apply metadata filters in a chat session.
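
Before uploading the JSON files from steps 1 and 3, you can check locally that each metadata file agrees with the schema. The following is a minimal sketch, not the service's own validation: the function name is hypothetical, only the integer and string types from the example schema are handled, and the rule that every attribute must be declared in the schema is an assumption.

```python
import json

# Only the two type names that appear in the example schema are handled.
PYTHON_TYPES = {"integer": int, "string": str}


def validate_attributes(schema: dict, metadata: dict) -> list[str]:
    """Return the mismatches between a metadata file and the schema."""
    declared = {field["name"]: field["type"] for field in schema["metadataSchema"]}
    errors = []
    for name, value in metadata["metadataAttributes"].items():
        if name not in declared:
            # Assumption: undeclared names are treated as errors.
            errors.append(f"{name}: not declared in the schema")
            continue
        expected = PYTHON_TYPES.get(declared[name])
        if expected is None:
            errors.append(f"{name}: type {declared[name]!r} is not handled by this sketch")
        elif isinstance(value, bool) or not isinstance(value, expected):
            # bool is excluded because Python treats it as a kind of int.
            errors.append(f"{name}: expected {declared[name]}")
    return errors


schema = json.loads(
    '{"metadataSchema": ['
    '{"name": "publication_year", "type": "integer"},'
    '{"name": "title", "type": "string"}]}'
)
metadata = json.loads('{"metadataAttributes": {"publication_year": 2020}}')
print(validate_attributes(schema, metadata))
```

The example metadata from step 3 produces no errors against the example schema from step 1. A value such as "2020" (a string) for publication_year would be reported as a mismatch.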

Adding Metadata to an Object Storage Metadata Header

Create an Object Storage bucket and upload source files for RAG responses in OCI Generative AI Agents. Optionally, add metadata headers to each file for metadata filtering.
  1. In the navigation bar of the Console, select a region that hosts Generative AI Agents, for example, US Midwest (Chicago). If you don't know which region to select, see Regions with Generative AI Agents.
  2. Open the navigation menu and select Storage. Under Object Storage & Archive Storage, select Buckets.
  3. Select the compartment in which you want to create a bucket or the compartment that contains the bucket that you want to use. You must already have the following permission to add Object Storage resources to this compartment.
    allow group <your-group-name> to manage object-family in compartment <compartment-with-bucket>
  4. To create a bucket, follow these steps:
    1. Select Create Bucket.
    2. Enter a name unique to your region for the bucket.
    3. For other fields, select the Learn more links and then select options that apply to your data. Also see Creating an Object Storage Bucket.
    4. Select Create.
      By default, a new bucket is private. You can change the visibility of a bucket after you create it.
  5. Select the name of the bucket that you want to use.
  6. On the bucket details page, under Objects, select Upload.
  7. (Optional) Select Show Optional Headers and Metadata and then select and enter the following values.
    • Type: Metadata
    • Name: gaas-metadata-filtering-field-<metadata-name>
    • Value: <metadata-value>
    Important

    For the metadata filtering to work, you must use the prefix gaas-metadata-filtering-field- for the metadata Name.

    Object Storage then prepends opc-meta- to the metadata name, so the header is displayed as opc-meta-gaas-metadata-filtering-field-<metadata-name>.

    For example, to add metadata with the name publication_year, add a metadata header with the name gaas-metadata-filtering-field-publication_year. When you get the details for this file, the metadata name is displayed as opc-meta-gaas-metadata-filtering-field-publication_year.

    For list values, use the following format:

    _LIST_OF_STRING_|list_value_1|list_value_2, where _LIST_OF_STRING_ is fixed, and each list item is separated by a pipe '|' character. This format is decoded as a list of values: {list_value_1, list_value_2}

  8. Add one or more files for the data source and select Upload.
    Note

    • You can't update the metadata property of existing objects. Instead, you can copy a file, add new metadata to that file, and then delete the old file.

    • You can add filters to your chat conversation with an agent using the metadata filtering after the knowledge base data from Object Storage and its metadata are ingested. To learn about adding filters, see step 11 in Chatting with Agents in Generative AI Agents. You can also view details of metadata values after you ingest the data in a knowledge base. See the Metadata resource in Getting a Knowledge Base's Details in Generative AI Agents.
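
The naming and list conventions from step 7 can be sketched in a few helper functions. The function names are hypothetical. Rejecting list items that contain a pipe character is an assumption, because the format above doesn't describe a way to escape it.

```python
FILTER_PREFIX = "gaas-metadata-filtering-field-"
STORED_PREFIX = "opc-meta-"  # prepended by Object Storage
LIST_MARKER = "_LIST_OF_STRING_"


def header_name(metadata_name: str) -> str:
    """Name to enter in the Console for a metadata filtering field."""
    return FILTER_PREFIX + metadata_name


def stored_header_name(metadata_name: str) -> str:
    """Name that is displayed after Object Storage prepends its prefix."""
    return STORED_PREFIX + header_name(metadata_name)


def encode_list(values: list[str]) -> str:
    """Encode a list of strings in the pipe-separated list format."""
    for value in values:
        if "|" in value:
            # Assumption: no escaping exists, so such items are rejected.
            raise ValueError(f"list item contains a pipe character: {value!r}")
    return "|".join([LIST_MARKER, *values])


def decode_value(raw: str):
    """Return a list for list-formatted values, else the value unchanged."""
    parts = raw.split("|")
    if parts[0] == LIST_MARKER:
        return parts[1:]
    return raw
```

For example, encode_list(["list_value_1", "list_value_2"]) produces the string shown in step 7, and decode_value turns that string back into the two list items.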

Adding Data with Custom URL to an Object Storage Bucket

Create an Object Storage bucket and upload source files for RAG responses in OCI Generative AI Agents. Optionally, add a custom URL to each file for citation.
  1. In the navigation bar of the Console, select a region that hosts Generative AI Agents, for example, US Midwest (Chicago). If you don't know which region to select, see Regions with Generative AI Agents.
  2. Open the navigation menu and select Storage. Under Object Storage & Archive Storage, select Buckets.
  3. Select the compartment in which you want to create a bucket or the compartment that contains the bucket that you want to use. You must already have the following permission to add Object Storage resources to this compartment.
    allow group <your-group-name> to manage object-family in compartment <compartment-with-bucket>
  4. To create a bucket, follow these steps:
    1. Select Create Bucket.
    2. Enter a name unique to your region for the bucket.
    3. For other fields, select the Learn more links and then select options that apply to your data. Also see Creating an Object Storage Bucket.
    4. Select Create.
      By default, a new bucket is private. You can change the visibility of a bucket after you create it.
  5. Select the name of the bucket that you want to use.
  6. On the bucket details page, under Objects, select Upload.
  7. (Optional) Select Show Optional Headers and Metadata and then select and enter the following values.
    • Type: Metadata
    • Name: customized_url_source
    • Value: <Custom-URL-for-the-file>
    Important

    For the citation link override to work, you must use Name: customized_url_source.
  8. Add one or more files for the data source and select Upload.
    Note

    If you added the customized_url_source metadata in step 7, this custom URL applies to all the files that you upload in the same upload. You can't update the metadata property of existing objects. Instead, you can copy a file, add new metadata to that file, and then delete the old file. To add or update a file with the customized_url_source metadata by using the OCI CLI, see Assigning a Custom URL to a Citation.
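
Before entering the value in step 7, you might want to check that it's a single, well-formed URL. The following sketch builds the name and value pair for one file. The function name is hypothetical, and accepting only http and https URLs is an assumption of this sketch, not a documented requirement.

```python
from urllib.parse import urlparse

CUSTOM_URL_NAME = "customized_url_source"


def custom_url_metadata(url: str) -> dict[str, str]:
    """Return the metadata name and value for one custom citation URL."""
    # A value with whitespace is likely more than one URL or a typing error.
    if any(ch.isspace() for ch in url):
        raise ValueError("the value must be a single URL without whitespace")
    parsed = urlparse(url)
    # Assumption: only absolute http or https URLs are accepted here.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return {CUSTOM_URL_NAME: url}
```

For example, custom_url_metadata("https://example.com/docs/report1") returns a dictionary with the single name customized_url_source, where the URL is a placeholder.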
Note

Beta Customers:

If you created a knowledge base in the Beta phase, you might need to delete and re-create the data source for the URL handling feature to work.

Assigning a Custom URL to a Citation

When an agent uses a RAG tool for its responses, the responses can include citations. By default, the citations point to the Object Storage location where the files are stored. To have a citation reference a URL instead of the file, add a custom URL to the metadata object for that file.

This topic shows how to add or update the metadata object through the OCI CLI.

  1. Start OCI CLI in an environment or in Cloud Shell. We recommend that you try it in Cloud Shell first to become familiar with the commands.
  2. Get the object name for the file that you want to add a custom URL to:
    Command
    oci os object list --bucket-name <the-bucket-name>
    Example output:
    "data": [
        {
          "archival-state": null,
          "etag": "xxx",
          "md5": "xxx==",
          "name": "<the-object-name>",
          "size": 1117630,
          "storage-tier": "Standard",
          "time-created": "2025-03-12T22:21:26.991000+00:00",
          "time-modified": "2025-03-12T22:38:10.217000+00:00"
        },
    Other objects are listed similarly after this comma.

    You can also find the object name in the Console. On the bucket details page, select the Actions menu for the object, select View Object Details, and copy the name.

    Note

    If a file is in a folder, then the file name and its object name differ. For example, for a file named file1.pdf, its object name could be folder1/file1.pdf. Otherwise, the file name and its object name are the same.
  3. Download the file into the current working directory.

    To add or update a file's metadata object, you replace the file with the same file that has a new metadata object. That's why you're copying the file into the current working directory first.

    Command
    oci os object get \
    --bucket-name <the-bucket-name> \
    --file <the-file-name> \
    --name <the-object-name>
  4. Find the metadata object values for the current file.
    Command
    oci os object head \
    --bucket-name <the-bucket-name> \
    --name <the-object-name>
    Example output:
    {
     some data
    
      "opc-client-request-id": "xxx",
      "opc-meta-key1": "value1",
      "opc-meta-key2": "value2",
      "opc-request-id": "xxx",
     ...
    }
    

    This example shows that the metadata object value is '{"key1":"value1","key2":"value2"}'. The metadata name is saved with a prefix of opc-meta-, but you don't have to add this prefix when you add the metadata name in the next steps. This prefix is added automatically to each metadata name.

  5. Replace the file that's in Object Storage with the same file that's in the current working directory, and add a new metadata object.

    To keep the current metadata and add the custom URL, include the customized_url_source name and value together with the existing metadata names and values in the metadata object:

    Command
    oci os object put \
    --bucket-name <the-bucket-name> \
    --file <the-file-name> \
    --name <the-object-name> \
    --force --metadata \
    '{"customized_url_source":"<the-custom-url>",
    "<existing-metadata-name-1>":"<existing-metadata-value-1>",
    "<existing-metadata-name-2>":"<existing-metadata-value-2>"}'

    For example, to keep the metadata names and values displayed in the step 4 example:

    Command
    oci os object put \
    --bucket-name <the-bucket-name> \
    --file <the-file-name> \
    --name <the-object-name> \
    --force --metadata \
    '{"customized_url_source":"<the-custom-url>",
    "key1":"value1",
    "key2":"value2"}'

    To replace the existing metadata object so that it includes only the custom URL, run the following command:

    Command
    oci os object put \
    --bucket-name <the-bucket-name> \
    --file <the-file-name> \
    --name <the-object-name> \
    --force --metadata '{"customized_url_source":"<the-custom-url>"}'
  6. Ensure that the metadata object for the custom URL is replaced.
    Command
    oci os object head \
    --bucket-name <the-bucket-name> \
    --name <the-object-name>
    Example output:
    {
     some data
    
      "opc-meta-customized_url_source": "some-new-link",
     ...
    }
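
Steps 4 and 5 can be combined in a short script that reads the existing metadata from the head output and adds the custom URL. This is a sketch under the assumption that the head output is a flat JSON object like the example in step 4; the function names are hypothetical.

```python
import json

STORED_PREFIX = "opc-meta-"


def user_metadata(head_output: dict) -> dict[str, str]:
    """Keep only the opc-meta- entries and strip the prefix from the names."""
    return {
        name[len(STORED_PREFIX):]: value
        for name, value in head_output.items()
        if name.startswith(STORED_PREFIX)
    }


def with_custom_url(head_output: dict, url: str) -> dict[str, str]:
    """Existing metadata plus the custom URL, replacing any earlier URL."""
    merged = user_metadata(head_output)
    merged["customized_url_source"] = url
    return merged


# Values copied from the step 4 example output.
head = {
    "opc-client-request-id": "xxx",
    "opc-meta-key1": "value1",
    "opc-meta-key2": "value2",
    "opc-request-id": "xxx",
}
print(json.dumps(with_custom_url(head, "<the-custom-url>")))
```

The printed JSON contains key1, key2, and customized_url_source, and can be passed as the --metadata value in step 5.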
    
Important

  • The metadata object that overrides the default citation must have the name customized_url_source.
  • You can have only one metadata object with the name customized_url_source.
  • Each customized_url_source can have only one URL.
  • The commands in step 5 work for both adding and updating the metadata object, because they replace the current metadata object's value.
  • Ensure that you pass the values for the --metadata object with the format shown in the commands in step 5.
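
One way to avoid quoting mistakes in the --metadata value is to generate the command from a dictionary. The following sketch builds the step 5 command as a string and doesn't run it. The function name and the example bucket, file, and URL values are hypothetical; the flags are the ones shown in step 5.

```python
import json
import shlex


def put_command(bucket: str, file_name: str, object_name: str,
                metadata: dict[str, str]) -> str:
    """Build the step 5 command with a correctly quoted --metadata value."""
    args = [
        "oci", "os", "object", "put",
        "--bucket-name", bucket,
        "--file", file_name,
        "--name", object_name,
        "--force",
        # json.dumps guarantees valid JSON, including commas between entries.
        "--metadata", json.dumps(metadata, separators=(",", ":")),
    ]
    # shlex.join quotes each argument for a POSIX shell.
    return shlex.join(args)


print(put_command(
    "my-bucket", "file1.pdf", "folder1/file1.pdf",
    {"customized_url_source": "https://example.com/doc", "key1": "value1"},
))
```

The printed command wraps the JSON in single quotes, as in the step 5 commands, so it can be pasted into a POSIX shell such as the one in Cloud Shell.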