# What Is the BERT Model and How to Use It

In this blog, we discuss what the BERT model is and how to use it. BERT stands for Bidirectional Encoder Representations from Transformers. It is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of NLP tasks.
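To make the "one additional output layer" idea concrete, here is a minimal pure-Python sketch of a classification head: a single linear layer followed by a softmax, applied to the final hidden state of the `[CLS]` token. This is how the BERT paper sets up sentence-level classification. The function names, the 4-dimensional vector and the weights are hypothetical toy values. In real fine-tuning, the layer's weights are learned together with BERT's own parameters, and BERT-Base's hidden vector has 768 dimensions.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classification_head(cls_vector, weights, bias):
    """Hypothetical linear output layer on top of BERT's final [CLS] hidden state.

    weights has one row per class; each row has len(cls_vector) entries.
    Returns one probability per class.
    """
    logits = [sum(w * h for w, h in zip(row, cls_vector)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Toy 4-dimensional "[CLS]" vector (real BERT-Base uses 768 dimensions)
cls_vector = [0.5, -1.0, 0.25, 2.0]
weights = [[0.1, 0.2, 0.0, 0.3],    # class 0
           [-0.2, 0.1, 0.4, -0.1]]  # class 1
bias = [0.0, 0.1]
probs = classification_head(cls_vector, weights, bias)
print(probs)
```

Only this small layer is new during fine-tuning. Everything below it is the pre-trained BERT encoder, which is why adapting BERT to a new task requires so little extra machinery.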

BERT was pre-trained on a large corpus of unlabeled text: the English Wikipedia (2,500 million words) and BooksCorpus (800 million words). Much of BERT's success comes from this pre-training phase. A model trained on such a large text corpus develops a deeper and more detailed understanding of how language works.

## What Is the BERT Model?

BERT is a Transformer model that was pre-trained on a large corpus of English data in a self-supervised fashion. This means it was pre-trained on raw text only, with no human labeling of any kind, which is why it can use large amounts of publicly available data. An automatic process generates the inputs and labels from that text. More precisely, BERT was pre-trained with two objectives:

• Masked language modeling (MLM): The model randomly masks 15% of the words in an input sentence, runs the entire masked sentence through the network, and has to predict the masked words. This differs from traditional recurrent neural networks (RNNs), which usually see the words one after another. It also differs from autoregressive models like GPT, which internally mask the future tokens. Masking allows the model to learn a bidirectional representation of the sentence.

• Next sentence prediction (NSP): During pre-training, the model concatenates two masked sentences as input. Sometimes they are sentences that were next to each other in the original text, and sometimes they are not. In the original BERT setup, each case occurs 50% of the time. The model then has to predict whether the second sentence followed the first.
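The two pre-training objectives can be sketched in a few lines of Python. This is a simplified illustration, not BERT's actual data pipeline. `mask_tokens` and `make_nsp_pair` are hypothetical helper names. Each token here is selected independently with probability 0.15, whereas the original code picks a fixed number of positions per sequence and skips special tokens such as `[CLS]` and `[SEP]`. The 80/10/10 replacement split follows the BERT paper. For NSP, the negative sentence is drawn from other sentences in the same small list, whereas BERT draws it from a different document.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Simplified BERT-style masking for the MLM objective (hypothetical helper).

    Each token is chosen as a prediction target with probability mask_prob.
    A chosen token is replaced by [MASK] 80% of the time, by a random
    vocabulary token 10% of the time, and left unchanged 10% of the time.
    """
    masked = list(tokens)
    labels = [None] * len(tokens)  # None means "not a prediction target"
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model is trained to recover this original token
        r = rng.random()
        if r < 0.8:
            masked[i] = MASK
        elif r < 0.9:
            masked[i] = rng.choice(vocab)
        # otherwise the original token is kept as-is
    return masked, labels

def make_nsp_pair(sentences, i, rng):
    """Build one NSP example from sentence i (hypothetical helper).

    Requires i + 1 < len(sentences) and at least one other sentence to sample.
    With probability 0.5, sentence B is the true next sentence (IsNext);
    otherwise it is a sentence other than i and i + 1 (NotNext).
    """
    a = sentences[i]
    if rng.random() < 0.5:
        return a, sentences[i + 1], "IsNext"
    candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
    return a, rng.choice(candidates), "NotNext"

# Usage with a fixed seed so runs are repeatable
rng = random.Random(42)
tokens = "my dog is hairy and likes to play".split()
masked, labels = mask_tokens(tokens, vocab=tokens, rng=rng)
sentences = [
    "the man went to the store .",
    "he bought a gallon of milk .",
    "penguins are flightless birds .",
]
a, b, label = make_nsp_pair(sentences, 0, rng)
print(masked)
print(a, "|", b, "->", label)
```

Keeping some selected tokens unchanged or randomly replaced means the model cannot rely on seeing `[MASK]` at every position it must predict. This matters because `[MASK]` never appears during fine-tuning.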

## Text Preprocessing

BERT builds the input for each token from three kinds of embeddings:

• Position Embeddings: BERT learns and uses positional embeddings to express the position of words in a sentence. These are added to overcome a limitation of the Transformer: unlike an RNN, it does not inherently capture “sequence” or “order” information.

• Segment Embeddings: BERT can also take sentence pairs as input, for tasks such as question answering. It learns a separate embedding for the first and the second sentence to help the model distinguish between them. In the example above, all tokens marked EA belong to sentence A, and similarly all tokens marked EB belong to sentence B.

• Token Embeddings: These are the embeddings learned for each specific token in the WordPiece token vocabulary.
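WordPiece splits words that are not in the vocabulary into smaller subword units, each of which has its own token embedding. Below is a minimal sketch of the greedy longest-match-first procedure BERT uses to split a single word. It assumes the text has already been split on whitespace and punctuation. The toy vocabulary and the function name are hypothetical. The real BERT vocabularies contain roughly 30,000 entries, and pieces that continue a word carry a `##` prefix.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word (sketch).

    At each step, take the longest prefix of the remaining characters that is
    in the vocabulary; non-initial pieces are looked up with a '##' prefix.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation of the same word
            if piece in vocab:
                match = piece
                break
            end -= 1  # shrink the candidate from the right
        if match is None:
            return [unk]  # no valid split: the whole word maps to [UNK]
        pieces.append(match)
        start = end
    return pieces

# Toy vocabulary (hypothetical)
vocab = {"un", "##aff", "##able", "play", "##ing", "##s"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", vocab))    # ['play', '##ing']
print(wordpiece_tokenize("xyz", vocab))        # ['[UNK]']
```

Because rare words decompose into known pieces, a fixed-size vocabulary can cover open-ended text while keeping the token-embedding table manageable.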

The input representation for a given token is the sum of its token, segment, and position embeddings. This representation encodes the token's identity, the sentence it belongs to, and its position. As a result, the same input format works for single sentences and sentence pairs alike. This is a large part of what makes BERT so adaptable: we can fine-tune the model on a variety of NLP tasks without significant changes to its architecture.
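As a rough illustration, the sketch below packs a sentence pair into the `[CLS] A [SEP] B [SEP]` format and assigns segment IDs: 0 for sentence A, including `[CLS]` and the first `[SEP]`, and 1 for sentence B and the final `[SEP]`. It then sums the three embeddings position by position. The function names and embedding tables are hypothetical toy values. In these tables, each embedding type writes to its own dimension so the sum is easy to read. In BERT, all three tables are learned and dense, with 768 dimensions in BERT-Base.

```python
def pack_sentence_pair(tokens_a, tokens_b):
    """Build [CLS] A [SEP] B [SEP] plus segment IDs (0 for A's part, 1 for B's)."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids

def input_embeddings(tokens, segment_ids, token_table, segment_table, position_table):
    """Element-wise sum of token, segment and position embeddings per position."""
    out = []
    for pos, (tok, seg) in enumerate(zip(tokens, segment_ids)):
        t, s, p = token_table[tok], segment_table[seg], position_table[pos]
        out.append([x + y + z for x, y, z in zip(t, s, p)])
    return out

tokens, segment_ids = pack_sentence_pair(["my", "dog"], ["it", "barks"])

# Toy tables: dimension 0 = token index, 1 = segment, 2 = position
token_table = {tok: [float(i), 0.0, 0.0]
               for i, tok in enumerate(dict.fromkeys(tokens))}
segment_table = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
position_table = [[0.0, 0.0, float(p)] for p in range(len(tokens))]

embeddings = input_embeddings(tokens, segment_ids, token_table,
                              segment_table, position_table)
for tok, vec in zip(tokens, embeddings):
    print(tok, vec)
```

Since the three vectors are simply added, the model sees one vector per position that blends "which token", "which sentence" and "where in the sequence".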

## How to use the BERT model

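The example below loads the pre-trained `bert-large-cased` checkpoint with the `tf-transformers` library and uses the Hugging Face `transformers` tokenizer to prepare the input. `tf-transformers` expects different input names than the tokenizer produces, so the tokenizer outputs are renamed before the forward pass.

```python
# Requires the tf-transformers, transformers and tensorflow packages
from tf_transformers.models import BertModel
from transformers import BertTokenizer

# Load the WordPiece tokenizer and the pre-trained cased BERT-Large weights
tokenizer = BertTokenizer.from_pretrained('bert-large-cased')
model = BertModel.from_pretrained("bert-large-cased")

text = "Replace me by any text you'd like."
# Tokenize and return TensorFlow tensors: input_ids, token_type_ids, attention_mask
inputs = tokenizer(text, return_tensors='tf')

# tf-transformers uses its own input names, so map the tokenizer outputs onto them
inputs_tf = {}
inputs_tf["input_ids"] = inputs["input_ids"]            # WordPiece token IDs
inputs_tf["input_type_ids"] = inputs["token_type_ids"]  # segment IDs (sentence A/B)
inputs_tf["input_mask"] = inputs["attention_mask"]      # 1 for real tokens, 0 for padding
outputs_tf = model(inputs_tf)
```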
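The snippet above feeds three aligned tensors to the model. The following pure-Python sketch shows what they contain for a batch of two sequences of different lengths, padded on the right. It also shows how the tokenizer's key names map to the ones `tf-transformers` expects. The helper names and token IDs are made up. Padding with ID 0 is an assumption that matches `[PAD]` in the standard BERT vocabularies; in practice, the tokenizer handles padding through its `padding` option.

```python
def pad_batch(batch_ids, batch_type_ids, pad_id=0):
    """Right-pad each sequence to the longest one (hypothetical helper).

    attention_mask is 1 for real tokens and 0 for padding, so the model
    can ignore padded positions.
    """
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, type_ids, mask = [], [], []
    for ids, types in zip(batch_ids, batch_type_ids):
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        type_ids.append(types + [0] * n_pad)
        mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "token_type_ids": type_ids,
            "attention_mask": mask}

def to_tf_transformers_names(hf_inputs):
    """Rename tokenizer keys to the names used in the tf-transformers snippet."""
    return {
        "input_ids": hf_inputs["input_ids"],
        "input_type_ids": hf_inputs["token_type_ids"],
        "input_mask": hf_inputs["attention_mask"],
    }

# Made-up token IDs; 101 and 102 stand in for [CLS] and [SEP]
batch = pad_batch([[101, 7, 8, 102], [101, 9, 102]],
                  [[0, 0, 0, 0], [0, 0, 0]])
tf_inputs = to_tf_transformers_names(batch)
print(tf_inputs["input_ids"])   # [[101, 7, 8, 102], [101, 9, 102, 0]]
print(tf_inputs["input_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```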

