Training data must be labeled. For example, to create a text classification model, you give the model many examples of text records and label each record with the class that it belongs to. The model learns the characteristics of the labeled records, and the trained model can then infer the class for new records that it hasn't seen. To label data, we recommend that you use the OCI Data Labeling service.
Dataset Recommendations for Custom Models
Follow these guidelines to prepare training datasets. If you don't provide a validation or test dataset, the service randomly splits the labeled data: 60% of the records are used for training, 20% for validation, and 20% for testing (a sketch of this split follows the guidelines below).
Custom Named Entity Recognition
Training set: Minimum 10 instances per entity; recommended 50 instances per entity.
Validation set: Minimum 5 instances per entity or 20% of the training instances, whichever is higher; recommended 20 instances per entity.
Test set: Minimum 5 instances per entity or 20% of the training instances, whichever is higher; recommended 20 instances per entity.
Custom Text Classification
Training set: Minimum 10 documents per class; recommended 100 documents per class.
Validation set: Recommended 20 documents per class.
Test set: Recommended 20 documents per class.
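The following is a minimal sketch of the 60/20/20 split described above, reproduced locally in Python. It assumes the labeled records are already loaded as (text, label) pairs in a list; the example data is invented, and the service performs this split for you when you don't provide validation and test datasets.

```python
import random

def split_dataset(records, seed=42):
    """Randomly split labeled records into 60% train, 20% validation, 20% test."""
    shuffled = records[:]                 # copy so the original order is untouched
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.6)
    n_val = int(n * 0.2)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]     # remaining ~20%
    return train, validation, test

# Invented records: (text, label) pairs.
records = [(f"example text {i}", "class_a" if i % 2 else "class_b") for i in range(100)]
train, validation, test = split_dataset(records)
print(len(train), len(validation), len(test))  # 60 20 20
```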
Tip
Label the training examples correctly. The quality of the model depends on the quality of the data. If a class or entity type doesn't perform as expected after training, add more examples for that class or entity. Also ensure that every occurrence of an entity is annotated in the training set. Low-quality training data results in poor training metrics and inaccurate results.
Provide enough training samples. More data generally improves model performance. We recommend that you train the model with a small dataset, review the model training metrics, and add more training samples as needed.
Training a Model
Training is the process where the model learns from the labeled data. The training duration and results depend on the size of the dataset, the size of each record, and the number of active training jobs.
Evaluating the Model
After a model is trained, you can get evaluation metrics that describe the quality of the model, or how likely the model is to predict correctly. The service applies the model to the test set and compares the predicted labels to the expected labels. The metrics are based on how accurately the model predicts the test set.
Using the Console, you get the following types of evaluation metrics:
Class level metrics
Model level metrics
Entity level metrics for NER models
Confusion matrix
Using the API, you can get a more complete set of metrics, including micro, macro, and weighted averages of precision, recall, and F1 scores.
Class Metrics
Class metrics are class-level metrics (or entity-level metrics for NER models).
Precision
The ratio between true positives (the correctly predicted examples) and all examples that are predicted as the particular class.
It describes how many of the predicted examples are correctly predicted. The value is between 0 and 1. Higher values are better.
Recall
The ratio between true positives (the correctly predicted examples) and all actual examples of the particular class.
It describes how many of the class's examples are correctly predicted. The value is between 0 and 1. Higher values are better.
F1-Score
The F1-score is the harmonic mean of precision and recall, giving you a single value to evaluate the model. The value is between 0 and 1. Higher values are better.
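As an illustration, the following sketch computes these class-level metrics from true-positive, false-positive, and false-negative counts. The counts are invented for the example; the service derives them by comparing the model's predictions with the expected labels in the test set.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall, and F1 for one class (or entity type)."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct / all predicted as this class
    recall = tp / (tp + fn) if tp + fn else 0.0     # correct / all actual examples of this class
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

# Invented counts for a single class.
print(class_metrics(tp=40, fp=10, fn=10))  # (0.8, 0.8, 0.8)
```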
Model Metrics
Model metrics are model-level metrics for multi-class models and describe the overall quality of a model. Precision, recall, and F1 values are presented as macro, micro, and weighted averages.
Macro Averages
A macro average is the average of metric values of all classes.
For example, macro-precision is calculated as the sum of all class precision values divided by the number of classes.
Micro Averages
A micro-average aggregates the contributions of all examples to compute the average metric.
For example, micro-recall is calculated as (sum of true positives across all classes) / (sum of true positives across all classes + sum of false negatives across all classes).
Weighted Averages
A weighted average weights each class's metric by the number of instances (support) of that class.
For example, the weighted F1 score is calculated as the sum of (F1 score of a class * that class's proportion of all instances) over all classes.
Accuracy
The fraction of all examples that the model classifies correctly. It's calculated as the ratio between correct predictions (true positives + true negatives) and all examples.
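The following sketch shows how the macro, micro, and weighted averages, and accuracy, relate to per-class counts. The per-class true-positive, false-positive, and false-negative counts and supports are invented for illustration.

```python
def f1(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

# Invented per-class counts: class -> (tp, fp, fn, support).
counts = {
    "positive": (40, 10, 10, 50),
    "negative": (30, 5, 15, 45),
    "neutral": (20, 15, 5, 25),
}

precisions, recalls, supports = {}, {}, {}
for cls, (tp, fp, fn, support) in counts.items():
    precisions[cls] = tp / (tp + fp) if tp + fp else 0.0
    recalls[cls] = tp / (tp + fn) if tp + fn else 0.0
    supports[cls] = support

n_classes = len(counts)
total_support = sum(supports.values())

# Macro average: the unweighted mean of the per-class values.
macro_precision = sum(precisions.values()) / n_classes

# Micro average: computed from the counts aggregated over all classes.
tp_sum = sum(tp for tp, _, _, _ in counts.values())
fn_sum = sum(fn for _, _, fn, _ in counts.values())
micro_recall = tp_sum / (tp_sum + fn_sum)

# Weighted average: each class's value weighted by its share of the examples.
weighted_f1 = sum(
    f1(precisions[c], recalls[c]) * supports[c] / total_support for c in counts
)

# For single-label classification, accuracy is total correct predictions / all examples.
accuracy = tp_sum / total_support

print(round(macro_precision, 3), round(micro_recall, 3), round(weighted_f1, 3), round(accuracy, 3))
```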
Confusion Matrix
A table that shows, for each true class, how the model's predictions are distributed across the predicted classes.
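As a small sketch, a confusion matrix can be built as a table of counts with true classes as rows and predicted classes as columns. The (true, predicted) label pairs below are invented for illustration.

```python
from collections import Counter

# Invented (true label, predicted label) pairs from a test set.
pairs = [("a", "a"), ("a", "b"), ("b", "b"), ("b", "b"), ("b", "a"), ("c", "c")]

classes = sorted({t for t, _ in pairs} | {p for _, p in pairs})
matrix = Counter(pairs)  # (true, predicted) -> count

print("true/pred", *classes)
for true_cls in classes:
    print(true_cls, *[matrix[(true_cls, pred)] for pred in classes])
```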
Deploying the Model
After the model metrics meet your expectations, you can put the model into production and use it to analyze text. To put the model into production, you create an endpoint. An endpoint assigns dedicated compute resources for inferencing (performing text analysis) on custom models.
Custom model endpoints are private and you must specify a compartment for deploying the endpoint. You can create more than one endpoint for a model. You can create or delete endpoints without deleting a model.
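As a rough sketch only, an endpoint could be created with the OCI Python SDK through the AI Language client. The operation and model names used here (create_endpoint, CreateEndpointDetails, and its fields) and the OCIDs are assumptions made for illustration, not confirmed by this page; check the oci.ai_language SDK reference for the exact names in your SDK version.

```python
# A minimal sketch, assuming the OCI Python SDK (pip install oci) and a valid
# ~/.oci/config file. CreateEndpointDetails, create_endpoint, and the field
# names below are assumptions; verify them against the oci.ai_language SDK
# reference. The OCIDs are placeholders.
import oci

config = oci.config.from_file()
client = oci.ai_language.AIServiceLanguageClient(config)

details = oci.ai_language.models.CreateEndpointDetails(
    compartment_id="ocid1.compartment.oc1..example",   # placeholder OCID
    model_id="ocid1.ailanguagemodel.oc1..example",     # placeholder OCID
    display_name="my-custom-model-endpoint",
)
response = client.create_endpoint(details)

# Endpoint creation is asynchronous; the response headers typically reference a
# work request that you can poll until the endpoint becomes active.
print(response.headers)
```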
Analyzing Text
After you create a model endpoint, you can use the custom model to analyze text. You can analyze text with a custom model in the following ways: