
How do you produce text from a text generation model?

We use the PyTorch and transformers (from Hugging Face) libraries to run our transformer models.

First, we import the required libraries:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

We create the model and the tokenizer (see Tokens). You can find many models on the Hugging Face website. You can also pass a local folder path as model_name to load a model from your computer (see the sketch after the code below).

model_name = "Salesforce/codegen2-1B"  # name of the model

# We use the Metal (MPS) backend because we are Mac users.
# You can use "cpu" to run on the CPU,
# or "cuda" if you have an NVIDIA GPU.
device = "mps"

# We load the tokenizer and the model, and move the model to our device
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)
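
As mentioned above, model_name can also be a local folder. A minimal sketch, assuming you have previously saved the model and tokenizer to a folder named ./my-local-model (the folder name here is an arbitrary example):

# Save a copy of the model and tokenizer to disk
tokenizer.save_pretrained("./my-local-model")
model.save_pretrained("./my-local-model")

# Later, load them back from that folder instead of the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("./my-local-model")
model = AutoModelForCausalLM.from_pretrained("./my-local-model", trust_remote_code=True).to(device)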

We can then write a function to generate text with the model.

def generate_text(prompt):
    # Tokenize the prompt and move the tensors to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=60,
        do_sample=True,
        num_beams=5,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

Note that we are using beam search combined with sampling (num_beams=5 with do_sample=True). You can pass different parameters to model.generate to change how the next token is picked; a few common variants are sketched below.
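
For instance, here are some other decoding strategies you could try in place of the call above (a sketch; the values of temperature and top_p are arbitrary examples, not recommendations):

# Greedy decoding: always pick the most likely next token
outputs = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, max_new_tokens=60)

# Plain sampling with a temperature (higher = more random)
outputs = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, max_new_tokens=60, do_sample=True, temperature=0.8)

# Nucleus (top-p) sampling: sample only from the smallest set of tokens whose probabilities sum to top_p
outputs = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, max_new_tokens=60, do_sample=True, top_p=0.9)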

And we can use the function!

>>> generate_text("function generate")
function generate_key() {
return openssl_pkey_new(array(
'private_key_bits' => 1024,
'private_key_type' => OPENSSL_KEYTYPE_RSA,
));
}

Here, we used a model with 1 billion parameters that specializes in code generation, but you can use the same code to run any causal language model from Hugging Face.