Host Meta Llama 3.1 405B on new clusters in OCI Generative AI
- Services: Generative AI
- Release Date: February 07, 2025
OCI Generative AI has released a new FP8 quantized version of the Meta Llama 3.1 405B model with a 50% smaller GPU footprint. You can now host the meta.llama-3.1-405b-instruct model on the new dedicated AI cluster of type Large Generic 2. This cluster type is designed to maintain model performance at a lower cost than its predecessor, Large Generic 4. See the performance benchmarks for the meta.llama-3.1-405b-instruct model hosted on one Large Generic 2 unit and on one Large Generic 4 unit.
To host a Meta Llama 3.1 405B model on the new Large Generic 2 cluster, follow the steps for creating a dedicated AI cluster and then creating an endpoint on that cluster, as sketched in the example below. For a list of offered models, see Pretrained Foundational Models in Generative AI. For information about the service, see the Generative AI documentation.
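As a minimal sketch of those two steps, the following Python example uses the OCI SDK's Generative AI control-plane client to create a one-unit hosting cluster and then an endpoint for the model on it. The compartment and model OCIDs are placeholders, and the `LARGE_GENERIC_2` unit-shape string is an assumption for the Large Generic 2 cluster type; confirm the exact values in the console or documentation for your tenancy.

```python
# Sketch: host meta.llama-3.1-405b-instruct on a Large Generic 2 cluster
# using the OCI Python SDK (pip install oci). OCIDs and the unit-shape
# string are placeholders/assumptions, not values from this release note.
import oci

config = oci.config.from_file()  # reads ~/.oci/config
client = oci.generative_ai.GenerativeAiClient(config)

COMPARTMENT_ID = "ocid1.compartment.oc1..example"   # placeholder
MODEL_ID = "ocid1.generativeaimodel.oc1..example"   # OCID of meta.llama-3.1-405b-instruct

# Step 1: create a hosting dedicated AI cluster with one Large Generic 2 unit.
cluster_details = oci.generative_ai.models.CreateDedicatedAiClusterDetails(
    compartment_id=COMPARTMENT_ID,
    display_name="llama-3-1-405b-hosting",
    type="HOSTING",
    unit_shape="LARGE_GENERIC_2",  # assumed shape name for Large Generic 2
    unit_count=1,
)
cluster = client.create_dedicated_ai_cluster(cluster_details).data

# Wait until the cluster is ACTIVE before attaching an endpoint.
oci.wait_until(
    client,
    client.get_dedicated_ai_cluster(cluster.id),
    "lifecycle_state",
    "ACTIVE",
)

# Step 2: create an endpoint for the model on the new cluster.
endpoint_details = oci.generative_ai.models.CreateEndpointDetails(
    compartment_id=COMPARTMENT_ID,
    display_name="llama-3-1-405b-endpoint",
    dedicated_ai_cluster_id=cluster.id,
    model_id=MODEL_ID,
)
endpoint = client.create_endpoint(endpoint_details).data
print("Endpoint OCID:", endpoint.id)
```

Once the endpoint is active, inference requests go through the separate Generative AI inference client against that endpoint, the same way as for any other hosted pretrained model.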