The New Hugging Face Templates for Chat Models Feature

Introduction

LLMs fine-tuned for chat are one of the most exciting new applications of machine learning, and they are already revolutionizing the way we interact with the digital world. In the open source community, the race is on to meet or exceed GPT-4 on various performance benchmarks. A quick browse of the model hub shows plenty of trending models containing the term "instruct" or "chat". Mistral.ai, Meta, and tiiuae are just a few of the big players in the race to create competitive chat-aligned LLMs. However, developing and integrating prompts to interact effectively with the growing number of available models can be challenging, because different models are trained on prompts with very different formats.

For example, the Llama 2 Chat models are trained using a format like the following:

[INST] <<SYS>>
You are a friendly assistant designed to help answer a user's questions.
<</SYS>>

Which came first, the chicken or the egg? [/INST]

Other models, such as Falcon and XGen, expect a prompt like this:

### Instruction:
You are a friendly assistant designed to help answer a user's questions.
### Input:
Which came first, the chicken or the egg?
### Response:

Knowing which format is correct is crucial to getting reasonable performance out of a chat LLM, but it is not always straightforward to find out what format a model was trained with, and even when you train your own model, there has been no standard way to encode this template information into the model artifact. That is, until now! In this blog post we will discuss the new feature released by Hugging Face in their transformers library that provides a solution to this problem.

Chat Templates Enter the Chat

Recently, Hugging Face released transformers v4.34.0. Among other things, model tokenizers can now optionally include a "chat_template" key in the tokenizer_config.json file. As this field makes its way into model artifacts, AI researchers have a model-agnostic way to load and test these models. Using this new feature, code such as


# The old approach: hard-code the prompt format expected by this particular model
template = """### Instruction:
{instruction}
### Input:
{context}
### Response:
"""
text = template.format(instruction=instruction, context=context)
input_ids = tokenizer(text).input_ids

is changed to:

chat = [
  {"role": "system", "content": "You are a friendly assistant designed to help answer a user's questions."},
  {"role": "user", "content": "Which came first, the chicken or the egg?"},
]
# Render the conversation as a string using the model's own template
text = tokenizer.apply_chat_template(chat, tokenize=False)
# Or tokenize directly and get back a tensor of input IDs
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt")

This may seem like a small refactor, but the key is that the calling code no longer needs to know anything about the underlying prompt format, reducing the coupling between the code and any particular model.
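To see why this matters in practice, here is a minimal sketch of model-agnostic generation. It assumes a chat model whose tokenizer ships a chat_template and uses the standard transformers generation API; the checkpoint name is just a placeholder, and the same code works unchanged for any other chat model that provides a template.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; swap in any chat model whose tokenizer has a chat_template
checkpoint = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

chat = [
    {"role": "system", "content": "You are a friendly assistant designed to help answer a user's questions."},
    {"role": "user", "content": "Which came first, the chicken or the egg?"},
]

# add_generation_prompt appends the tokens that cue the model to start its reply
input_ids = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))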

For further details about this new feature and how to customize the templates, see the documentation on the Hugging Face docs. Additionally, you can view the implementation of the chat_template key in the Llama 2 tokenizer configuration on the Hugging Face Hub.
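As a rough illustration of what customization looks like: the chat_template is a Jinja string stored on the tokenizer, so overriding it is a one-liner. The toy template below is made up for this example (a simple role-based variant of the "### ..." style shown earlier), and "gpt2" is just a convenient placeholder tokenizer.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint

# chat_template is a Jinja template that receives the list of messages
tokenizer.chat_template = (
    "{% for message in messages %}"
    "### {{ message['role'] | capitalize }}:\n{{ message['content'] }}\n"
    "{% endfor %}"
    "### Response:\n"
)

chat = [{"role": "user", "content": "Which came first, the chicken or the egg?"}]
print(tokenizer.apply_chat_template(chat, tokenize=False))

# Saving the tokenizer writes the template into tokenizer_config.json
tokenizer.save_pretrained("my-chat-model")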

The release notes for v4.34.0 can be found on GitHub.

This new feature should help to decouple and simplify code, and reduce the number of headaches I get from mixing up which prompt format is supposed to be used with which model.

Cover Photo by Anne-Marie Pos on Unsplash