
The most important AI development of 2023 was maybe not GPT-4



DALL-E "llama wearing a winter hat with gloves on his hooves, in a snowy scene with a falcon flying in the sky, cartoon"

It’s that time of the year! Now that the holiday season is upon us, I like to take a bit of time to reflect on the past year. Personally, I’m quite proud of my Duolingo streak. Thanks to some motivation from my wife (who currently has a 1,360 day streak and counting), I started learning German in February and am getting close to a year-long streak of learning a bit of the language every day. Es ist so aufregend!

In the world of AI, I've been thinking about what should count as the biggest development of the year. The obvious choice is OpenAI and GPT-4. OpenAI has dominated the news this year and it's no wonder: with ChatGPT, GPT-4, Whisper, and DALL-E 3, they maintain state-of-the-art performance in a field full of competition. I know some of these models were released at the tail end of 2022, but the fact that they've remained so relevant throughout 2023 is a testament to their exceptional performance. GPT-4 is the benchmark that all current research compares itself against.

Other than OpenAI, there have been other players in the field of proprietary (closed-source) models, like Anthropic with their Claude model, AI21 with their Jurassic model, and just this week, Google with Gemini (which now powers Bard). As I mentioned last week, AWS is steaming ahead at full speed to give companies access to many of these models to integrate into their products via Bedrock. They're helping lead the charge in equipping prospectors with pickaxes for the AI gold rush. AWS is also reportedly working on training a GPT-4 competitor called Olympus.

All of the above models are closed source, meaning that those companies (OpenAI/Anthropic/Google/AI21) spent millions or billions of dollars to train the models, and they keep the details about the models private. Although users and companies can interact with the models and integrate them into their applications, the vendors don't provide the information or model weights needed to run a model on your own hardware or to examine the internals of the training process, inference pipeline, or model architecture.

There is one leading AI company that I haven't yet mentioned: Meta. I think there is an argument to be made that the most important development of the year was their decision to continue supporting the open-source AI research community. In contrast to many of their peers, Meta AI (led by their chief AI scientist, Yann LeCun) decided to release their Llama and Llama 2 model families (7B, 13B, 70B, both base and chat models) to the community. Although there are some licensing restrictions attached to the Llama 2 models, the models can be used freely for research purposes, which is massively important. I watched a great conversation between Fei-Fei Li and Geoffrey Hinton, where Fei-Fei points out that no university has the resources required to train a model the size of ChatGPT.

Without support from larger companies, it can be difficult for less well-funded institutions to advance the field. Some of my favorite papers/code from this year used Llama 2 as a part of their research, tweaking the architecture of Llama in ways that they don't have the opportunity to try with closed-source models. Consider this paper which is helpful for those looking to run LLMs on consumer hardware: QLoRA shows how the weights of Llama can be quantized to a 4-bit representation and still be fine-tuned through small adapter layers, which brings the model within reach of much smaller hardware with only small degradations in performance. I absolutely love this one: Andrej Karpathy wrote a program purely in C called llama2.c that lets you run inference with a Llama-architecture model on a CPU, with a reported output rate of ~110 tokens/second for a small model! These researchers from Meta show how the rotary position embeddings of Llama can be interpolated in order to support much longer context lengths than the 4k token length of the pretrained model, using what they term RoPE scaling (paper). The ability for people to play around with the internals of the model has enabled lots of exciting work, and I'm sure lots more is to come.
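To make the RoPE scaling idea a bit more concrete, here is a minimal sketch in plain Python. It is my own illustration, not code from the paper or from Meta: the function names, the 16k target length, and the tiny head dimension are all assumptions chosen for readability. The idea is that instead of feeding the model positions it never saw in training, you shrink every position by the ratio of the trained context length to the target context length, so the rotation angles stay inside the range the model already knows.

```python
import math


def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Rotation angles for one token position.

    scale < 1.0 squeezes positions together (position interpolation);
    scale = 1.0 gives the usual rotary embedding angles.
    """
    scaled_position = position * scale
    return [scaled_position / (base ** (2 * i / head_dim))
            for i in range(head_dim // 2)]


def apply_rope(vector, position, base=10000.0, scale=1.0):
    """Rotate each consecutive pair of values in `vector` by its angle."""
    rotated = []
    angles = rope_angles(position, len(vector), base, scale)
    for i, theta in enumerate(angles):
        x, y = vector[2 * i], vector[2 * i + 1]
        rotated.append(x * math.cos(theta) - y * math.sin(theta))
        rotated.append(x * math.sin(theta) + y * math.cos(theta))
    return rotated


# Assumed numbers for illustration: a 4k pretrained context stretched to 16k.
TRAINED_CONTEXT = 4096
TARGET_CONTEXT = 16384
SCALE = TRAINED_CONTEXT / TARGET_CONTEXT

# Position 8000 is out of range for the pretrained model, but after scaling
# it is rotated exactly like position 2000 would be without scaling.
query = [1.0, 2.0, 3.0, 4.0]
rotated_query = apply_rope(query, position=8000, scale=SCALE)
```

In practice the paper pairs this with a short round of fine-tuning so the model gets used to the more tightly packed positions; the sketch only shows the position arithmetic.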

This week, Meta and others announced the AI Alliance. Their mission is to act "as an International Community of Leading Technology Developers, Researchers, and Adopters Collaborating Together to Advance Open, Safe, Responsible AI". Now, this alliance is almost certainly motivated by concern about OpenAI pulling ahead in the field, but regardless, it's another statement of Meta's commitment to developing safe and secure generative AI through research transparency.

The discussion about open vs. closed source Generative AI is popular, so I'll link to a few relevant pieces that approach the debate.

I'm excited for 2024 and hope to see many more exciting discoveries enabled by Llama 2 and by companies that prioritize investment in the open-source and research community.

P.S. I should mention that there is some debate about the open-source nature of Llama 2. Although the model weights and architecture have been shared, two things are generally missing. First, the license isn't truly open source (i.e. it's not Apache or MIT licensed, so you're not allowed to use Llama for whatever you want). Second, Meta hasn't released the code or the dataset used to train the model, so we couldn't reproduce the model weights even if we had a few billion dollars lying around and wanted to check Meta's work 😆.