How to use GPT-J Model using Python



In the previous blog, we discussed the GPT-J model and how to use it through the playgrounds provided by different websites. In this blog, we discuss how to use the GPT-J model with Python. GPT-J-6B is an open-source, autoregressive language model created by the research group EleutherAI. It can perform a range of natural language tasks, including conversation, summarising, and question answering, and is one of the most capable open alternatives to OpenAI’s GPT-3. GPT-J, a 6-billion-parameter transformer-based language model, was introduced in June 2021 by EleutherAI. Since the group’s inception in July 2020, its goal has been to offer a collection of models that can be used to replicate OpenAI’s research.


Training procedure


This model was trained on a TPU v3-256 pod for 383,500 steps on 402 billion tokens. It was trained as an autoregressive language model, i.e. optimized to maximize the likelihood of correctly predicting the next token given the preceding ones.
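The autoregressive objective can be sketched in plain Python: the model assigns each next token a probability given the tokens before it, and training minimizes the negative log-likelihood of the observed tokens. The probabilities below are made up purely for illustration, not real GPT-J outputs:

```python
import math

# Hypothetical per-step probabilities a language model might assign to each
# next token in a short sequence, given the tokens that came before it.
step_probs = [0.20, 0.05, 0.50]

# Autoregressive factorization: P(sequence) is the product of the
# per-step next-token probabilities.
sequence_prob = math.prod(step_probs)

# Training minimizes the negative log-likelihood of the observed tokens.
nll = -sum(math.log(p) for p in step_probs)

print(sequence_prob)      # 0.005
print(round(nll, 4))      # 5.2983
```

Making each next-token prediction more accurate raises these per-step probabilities, which is exactly what lowers the training loss.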


How can we use the GPT-J Model using Python?


Step – 1: Importing Libraries


Firstly, we need to install the transformers library, as the GPT-J model is available in it. We can install it with the !pip install transformers command in a notebook (or pip install transformers from the shell).


# we import the GPT-J model and tokenizer classes from transformers
from transformers import GPTJForCausalLM, AutoTokenizer

# we also import PyTorch, which provides the float16 dtype used below
import torch


Step – 2: Building the model


We use the pretrained GPT-J model from the Transformers library. The download may take some time, as the checkpoint is around 11.4 GB of data.


# to create the model, we use GPTJForCausalLM.from_pretrained from the transformers library;
# revision="float16" downloads the half-precision weights, and low_cpu_mem_usage reduces RAM usage while loading

model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16, low_cpu_mem_usage=True)

# to create the tokenizer, we use AutoTokenizer.from_pretrained from the transformers library

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")


Step – 3: Testing the model


Now we test the model that we have built.


context = " Write about Neural Networks "

# with return_tensors="pt", the tokenizer returns PyTorch tensors instead of lists of Python integers
input_ids = tokenizer(context, return_tensors="pt").input_ids

# temperature controls the randomness of sampling, and max_length is the maximum number of tokens to generate
gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100)

# decode the generated tokens back into text
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)


Output for the above code is :


“A feedforward neural network is a class of learning algorithms based on artificial neural networks. A feedforward neural network (or “FFNN”) consists of a set of nonlinear processing elements, typically referred to as “neurons”, which are connected to each other and to the inputs via weighted, directed connections. In some cases, a single neuron can be “active” at any time (i.e., it receives input from every possible input), in which case it is referred to as a “dynamic” neuron. The neurons are arranged in layers: the input neurons receive the inputs to the network, and the output neurons provide the outputs (i.e. the output from the network)
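The temperature argument used above controls how sharply the model's output distribution is peaked: logits are divided by the temperature before the softmax, so low values make sampling nearly deterministic and high values make it closer to uniform. This is a general property of softmax sampling, sketched here with made-up logits rather than real GPT-J outputs:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature before normalizing:
    # low temperature sharpens the distribution, high temperature flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]

sharp = softmax_with_temperature(logits, 0.5)   # close to one-hot
default = softmax_with_temperature(logits, 1.0)
flat = softmax_with_temperature(logits, 5.0)    # close to uniform

print(max(sharp), max(default), max(flat))
```

With temperature=0.9, as in the code above, generation is slightly less random than the default of 1.0, which usually keeps the text coherent while still varying between runs.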


Also, read – How to use the BLOOM Model


