Loading a model from Hugging Face
Hugging Face is a hub that hosts machine learning models, datasets, and many other deep learning resources.
You can find some of the available models here:
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
The name of a model usually looks like a path, organization/model_name; in the example below, EleutherAI is the organization and gpt-neo-125M the model.
You can then use the from_pretrained method to load the model and its tokenizer:
import torch
import transformers

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name = "EleutherAI/gpt-neo-125M"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)  # the tokenizer runs on the CPU and takes no device argument
model = transformers.AutoModelForCausalLM.from_pretrained(model_name).to(device)
Models are usually distributed with a dedicated tokenizer, since the tokenization scheme depends on the model; loading the tokenizer from the same repository guarantees its vocabulary matches what the model was trained on.
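For example, here is a quick round trip through the tokenizer (a minimal sketch; the input string is arbitrary):

import transformers

# Loading the tokenizer from the model's repository ensures the
# token ids match the model's embedding table.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
ids = tokenizer.encode("Hello world")  # text -> list of token ids
print(ids)
print(tokenizer.decode(ids))           # token ids -> back to "Hello world"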
This will download the model from the Hugging Face Hub, which usually takes about 10 minutes depending on your internet connection. The models we use weigh about 4 GB, but the largest can reach 20 GB.
The device is where the model weights will be stored in memory: on the CPU, on a (CUDA) GPU, or on MPS, Apple Silicon's GPU back-end.
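As a minimal sketch, here is one common way to pick the device across those three back-ends, followed by a short generation as a sanity check (it reuses the model and tokenizer loaded above; the prompt and max_new_tokens value are arbitrary choices):

import torch

# Prefer a CUDA GPU, then Apple Silicon's MPS, then fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

model = model.to(device)  # move the weights to the chosen device
# Input tensors must live on the same device as the model:
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))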