What is the 176B Parameter Bloom Model?

Introduction

BLOOM is an autoregressive Large Language Model (LLM) trained to continue text from a prompt using vast quantities of text data and large-scale compute. It can generate coherent text in 46 natural languages and 13 programming languages that is often hard to distinguish from text written by people. The 176B Parameter Bloom Model is the creation of BigScience, an international, community-driven effort that aims to make large natural language models broadly accessible for research.

Large language models, or “LLMs,” can summarize, translate, and produce text with humanlike nuance. But because they have historically been expensive to build, researchers have had little access to them, and Big Tech firms like Meta, Google, and Microsoft have largely controlled their use.


Training of the Bloom Model

Bloom was trained on the Jean Zay supercomputer near Paris, France, one of the most powerful machines in the world, using publicly funded compute time worth an estimated $7 million. BLOOM’s training required over one million compute hours (a quick check follows the hardware list below), and its hardware includes the following:

GPUs: 384 NVIDIA A100 80GB GPUs (8 per node), connected through NVLink 4 and OmniPath, with 32 spare GPUs.
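The “over one million compute hours” figure is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes the 384 training GPUs ran continuously for roughly 117 days, an approximation of the publicly reported March–July 2022 training run:

```python
# Back-of-the-envelope check of BLOOM's compute budget.
# Assumptions (approximate, for illustration only): 384 A100 GPUs
# running continuously for ~117 days (March-July 2022).
gpus = 384
days = 117
gpu_hours = gpus * 24 * days
print(f"{gpu_hours:,} GPU-hours")  # ~1,078,272 -> "over one million compute hours"
```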

The training corpus, a 1.5TB repository of text covering 46 languages, was cleaned to remove unnecessary and low-quality material. Bekman estimated that this amounted to roughly 350B unique tokens, with the model’s vocabulary comprising 250,680 tokens.
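You can check the vocabulary size directly from the published tokenizer. Here is a minimal sketch using the Hugging Face `transformers` library; loading the tokenizer downloads only a few megabytes, not the 176B-parameter model weights:

```python
from transformers import AutoTokenizer

# Load only the BLOOM tokenizer (a few MB), not the model weights.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")

print(tokenizer.vocab_size)  # 250680
print(tokenizer.tokenize("BLOOM is a 176B-parameter open model."))
```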


What makes Bloom unique? 

The goal of the BigScience project BLOOM is to develop one of the most capable language models ever built; some consider it the most significant model of the past decade and a turning point for artificial intelligence. GPT-3 is comparable in scale, but BLOOM differs in a crucial way: it is open to anyone, and because training was conducted in public, anyone could follow its progress. The 176B Parameter Bloom Model has 70 layers, 112 attention heads per layer, a hidden dimensionality of 14336, and a sequence length of 2048 tokens, giving it roughly one billion more parameters than the 175B-parameter GPT-3. BLOOM has been trained on a variety of languages, including English, Spanish, and Italian, as well as programming languages.
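These architecture numbers can be read straight from the model’s published configuration. Below is a minimal sketch with Hugging Face `transformers`; only the small JSON config is downloaded, and note that the 2048-token sequence length was a training-time setting, so it is not stored in the config (BLOOM uses ALiBi positional embeddings rather than a fixed position table):

```python
from transformers import AutoConfig

# Fetch just the JSON config of the full 176B BLOOM checkpoint.
config = AutoConfig.from_pretrained("bigscience/bloom")

print(config.n_layer)      # 70 transformer layers
print(config.n_head)       # 112 attention heads per layer
print(config.hidden_size)  # 14336 hidden dimensionality
```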


Examples

(Screenshots of sample text generations from the BLOOM demo appeared here.)
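To try this kind of generation yourself, here is a minimal sketch using the Hugging Face `transformers` pipeline. It assumes the smaller `bigscience/bloom-560m` checkpoint, since the full 176B model requires hundreds of gigabytes of memory; substitute `bigscience/bloom` if your hardware allows:

```python
from transformers import pipeline

# The full "bigscience/bloom" checkpoint needs ~350GB of memory;
# the 560M-parameter variant runs on an ordinary machine.
generator = pipeline("text-generation", model="bigscience/bloom-560m")

result = generator(
    "The BLOOM language model was trained to",
    max_new_tokens=40,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```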
This demo is available at Bloom Demo, a Hugging Face Space by huggingface.

Also read: How to use the Bloom model
