How to use GPT-J Model using Python
Introduction
In the previous blog, we discussed the GPT-J model and how to use it through the playgrounds provided by different websites. In this blog, we discuss how to use the GPT-J model with Python. GPT-J-6B is an open-source, autoregressive, transformer-based language model with 6 billion parameters, introduced in June 2021 by the AI research group EleutherAI. It can complete a range of natural language tasks, including conversation, summarising, and question answering, and is one of the most advanced alternatives to OpenAI's GPT-3. Since the group's inception in July 2020, its goal has been to offer a collection of models that can be used to replicate OpenAI's research.
Training procedure
This model was trained on a TPU v3-256 pod for 383,500 steps, covering 402 billion tokens. It was trained as an autoregressive language model, meaning its objective is to maximise the likelihood of correctly predicting the next token given all of the tokens before it.
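A minimal, illustrative sketch of that objective (the numbers below are made up and the vocabulary is a toy one, not GPT-J's): at each position the model turns its raw scores (logits) into a probability distribution over the vocabulary, and training penalises it by the negative log-probability of the token that actually comes next.

import torch
import torch.nn.functional as F

# toy logits over a 5-token vocabulary at a single position
logits = torch.tensor([2.0, 0.5, -1.0, 0.1, 1.2])
probs = F.softmax(logits, dim=-1)   # probability of each candidate next token
target = 0                          # index of the token that actually comes next
loss = -torch.log(probs[target])    # cross-entropy loss for this position
print(probs, loss)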
How can we use the GPT-J Model using Python?
Step 1: Importing Libraries
Firstly, we need to install the transformers library, as the GPT-J model is available in it. In a notebook, we can install it with the command:

!pip install transformers
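To confirm the installation worked, we can print the installed version (this check is our addition; any reasonably recent version of transformers should be fine):

import transformers
print(transformers.__version__)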
# we import the GPT-J model class and the tokenizer class from transformers
from transformers import GPTJForCausalLM, AutoTokenizer
# GPT-J in transformers is a PyTorch model, so we import torch (not tensorflow)
import torch
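GPT-J is heavy even in half precision, so it also helps to check up front whether a CUDA GPU is available. This check is our own addition (including the device variable name), not part of the original tutorial:

# use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)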
Step 2: Building the Model
We use the pretrained GPT-J model from the Transformers library. The download may take a while, as the model weights are around 11.4 GB.
# to create the model we use GPTJForCausalLM.from_pretrained, imported from the transformers library;
# revision="float16" fetches the half-precision weights, and low_cpu_mem_usage reduces RAM use while loading
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
# to create the tokenizer we use AutoTokenizer.from_pretrained, also imported from the transformers library
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
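If a GPU with enough memory is available (the half-precision weights alone are around the 11.4 GB mentioned above), we can optionally move the model onto it. This step is our addition and reuses the device variable from the earlier check; note that the prompt tensors in the next step would then also need to be moved to the same device with input_ids.to(device).

# move the model to the GPU (or keep it on the CPU if no GPU was found)
model = model.to(device)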
Step 3: Testing the Model
Now, we test the model that we have built.
context = "Write about Neural Networks"
# return_tensors="pt" tokenizes the prompt and returns PyTorch tensors instead of lists of Python integers
input_ids = tokenizer(context, return_tensors="pt").input_ids
# a higher temperature makes sampling more random, so the output is less likely to repeat training data verbatim;
# max_length is the maximum number of tokens in the generated sequence
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.9,
    max_length=100,
)
# decode the generated tokens back into text
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)
Output for the above code is:
“A feedforward neural network is a class of learning algorithms based on artificial neural networks. A feedforward neural network (or “FFNN”) consists of a set of nonlinear processing elements, typically referred to as “neurons”, which are connected to each other and to the inputs via weighted, directed connections. In some cases, a single neuron can be “active” at any time (i.e., it receives input from every possible input), in which case it is referred to as a “dynamic” neuron. The neurons are arranged in layers: the input neurons receive the inputs to the network, and the output neurons provide the outputs (i.e. the output from the network)”
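As a variation on the generation call above, generate also accepts nucleus-sampling parameters. The values below are illustrative choices of ours, not recommendations from the original post:

# a sketch of alternative sampling settings
gen_tokens = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.7,  # lower temperature: less random, more focused output
    top_p=0.95,       # nucleus sampling: sample only from the top 95% of probability mass
    max_length=100,
)
print(tokenizer.batch_decode(gen_tokens)[0])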
Also, read: How to use the BLOOM Model
Read: GPT-J (huggingface.co)