
Function Calling with Google's Gemini LLM


Bing Image Creator "Robot chef cooking, cyberpunk. Hyperlink of information flowing into his brain"

Growing up, my family relied on a variety of cookbooks and family recipes for our home-cooked meals. Although we enjoyed the normal meals that were in our rotation, our source for new recipes (from my memory) was normally either recommendations from friends, new cookbooks, or recipe blogs.

When I graduated college and became responsible for cooking meals for myself, cookbooks seemed to be basically extinct (at least in my social circle). We relied on the internet for our recipes; the obvious one is AllRecipes. AllRecipes is good, but it can be hard to choose a recipe because there are sooo many options. True to its name, if you search on AllRecipes for a "pancakes" recipe, you get like 500 options. I get overwhelmed by the options and generally prefer someone to do the curation for me: enter BudgetBytes! For the past 5+ years, my wife and I have almost exclusively sourced our recipes from BudgetBytes. They have lots of recipes that are delicious, healthy, and generally inexpensive to make. They remove the decision paralysis of AllRecipes, because they generally only have a few recipes per dish. So if you search their site for "pancakes", you only get about 5 recipes, instead of 500. Much easier to choose.

If you've searched the internet for recipes, you'll have quickly encountered the headache that we all have to endure: ads. Most times when I'm pulling up a recipe, it's on my iPhone or iPad, and I don't have any sort of adblocker. To my knowledge, most recipe blogs and websites rely on advertising to generate revenue. This means that they make money by plastering as many ads onto the webpage as possible. What's the simplest way to fit more ads on a webpage? Make the webpage longer. This is (probably) why all the recipe blog posts have the actual recipe way down at the bottom, with all that preamble that you have to scroll through first. One quick hack that I've found to bypass the ads is to click the "print page" option, which generally brings up an HTML page that is simplified down to an ad-free version of the recipe. However, it's still a hassle to load the ad-bloated first webpage and scroll to find the appropriate button. Most people would shrug their shoulders, mutter under their breath, and live with it. AI engineers, however, are not like most people.

Why am I talking about food? This is an AI blog!

A fairly well advertised feature of LLMs is their ability to generate recipes. Ask an LLM for a Lemon Pepper Chicken recipe, and it will give you a pretty good one. However, I'm a BudgetBytes fanatic, so I don't just want any recipe, I want the BudgetBytes Lemon Pepper Chicken recipe! I only want the LLM to generate me a recipe if it can't find one from BudgetBytes.

This is where the concept of "function calling" comes into play. For this demo I'm going to use Google Gemini (Google's ChatGPT competitor), because they let us use function calling for free (as of Mar 2024), and they have a nice Node.js (JavaScript) package that I've been wanting to try out. Per their docs:

Function calling lets developers create a description of a function in their code, then pass that description to a language model in a request. The response from the model includes the name of a function that matches the description and the arguments to call it with. Function calling lets you use functions as tools in generative AI applications

With this technique, we can give Gemini the ability to pull recipes from BudgetBytes.com, and thereby skip over all the scrolling and ads.

Coding

Pre-requisites

Quick Start

I've placed this code into a public repository here:

https://github.com/njbrake/gemini-function-calling

Follow the setup instructions listed on the Readme if you'd like to give it a shot!

Once you have your GOOGLE_API_KEY added to the .env file in the repository, you should be able to run the following in a bash terminal to see it in action:

npm install
npm start "get me the Easy Lemon Pepper Chicken off budget bytes"

What's fun is that if you don't explicitly ask it for a recipe off of BudgetBytes, it will respond like a normal LLM. And if the recipe you want isn't found, it will tell you that BudgetBytes didn't have one and will generate one for you.

Code Walkthrough

In order to make it a bit easier to explain, I put all of the code into a single file.

Caveat

If I were turning this into something more complex, I would have separated the utilities and constants into separate files. Similarly, this code is designed to be a building block for a server: in order to access this from my phone I'll need to put it inside an HTTP server (Express.js), and then host that web server on either a local PC of mine (yay, finally a new use for a Raspberry Pi!) or some cloud hosting service (AWS has a free tier that I could use). If you decide to host this application, make sure that access to the server is tightly controlled: the service uses your Google API key, so if you expose the server to the world, someone may find it, start using Gemini through it, and run up charges on your account.

Walkthrough

The code is here. I'm going to skip over some of the boilerplate code and chat about the interesting parts. As of Mar 25, the Node.js API for using Gemini function calling is still in beta and not officially documented, which is why I needed to provide v1beta when instantiating Gemini.

First, generation config:

const generationConfig = {
  maxOutputTokens: 2000,
  temperature: 0.1, // make this lower for more deterministic results
}

The generation config is what helps control the model output. The temperature can be between 0 and 1; I set it very low (0.1) to make the outputs more deterministic. This blog post has tons of good explanations about LLM text generation settings.
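To build some intuition for why a lower temperature gives more deterministic output, here is a minimal sketch of temperature-scaled softmax. It is a generic illustration, not Gemini's actual implementation: the function name `softmaxWithTemperature` and the example scores are made up for the demonstration, and the sketch assumes a temperature strictly greater than 0.

```javascript
// Illustrative only: turn raw model scores (logits) into probabilities,
// dividing by the temperature first. Lower temperature => sharper distribution.
function softmaxWithTemperature(logits, temperature) {
  if (temperature <= 0) throw new Error('temperature must be > 0 in this sketch')
  const scaled = logits.map((x) => x / temperature)
  const max = Math.max(...scaled) // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}

// Hypothetical scores for three candidate next tokens
const exampleLogits = [2.0, 1.0, 0.5]
console.log(softmaxWithTemperature(exampleLogits, 1.0)) // probability is spread across the options
console.log(softmaxWithTemperature(exampleLogits, 0.1)) // nearly all probability on the top option
```

With the temperature at 0.1, the top-scoring option takes almost all of the probability, so sampling picks it nearly every time.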

Next, the BudgetBytes calling logic.

const functions = {
  getRecipe: async ({ value }) => {
    console.log('Searching on budgetbytes.com for : ', value)
    // search for a recipe on budgetbytes.com
    // convert value to a search string: drop the word "recipe", then join the words with "+"
    const query = value.toLowerCase().replace('recipe', '').trim().replace(/\s+/g, '+')
    // download the search results page
    const searchUrl = `https://www.budgetbytes.com/?s=${query}`
    let response = await fetch(searchUrl)
    let text = await response.text()
    // grab the URL for the first search result
    let $ = cheerio.load(text)
    // the links are in the class archive-post-listing
    const recipe = $('.archive-post-listing a').first().attr('href')
    // if recipe is undefined, return an error
    if (!recipe) {
      console.log('No recipe found for this on budget bytes')
      return { error: true }
    }
    // download the html and then parse out the ingredients and instructions
    response = await fetch(recipe)
    text = await response.text()
    // console.log(text);
    $ = cheerio.load(text)
    const ingredients = $('.wprm-recipe-ingredient')
      .map((i, el) => $(el).text())
      .get()
    const instructions = $('.wprm-recipe-instruction')
      .map((i, el) => $(el).text())
      .get()
    const output = { url: recipe, ingredients, instructions }
    return output
  },
}

When executed, this code runs a search on BudgetBytes and picks the first search result for the query. We could refine the logic to show the user a list of recipes and have them select which one sounds best, but this works fine as a starting point. Once I know which recipe I want from the search page, I load that page and extract the ingredients and instructions. I'm parsing the HTML documents using cheerio. I found the right selectors by doing a quick inspection of the BudgetBytes pages (hitting F12 in Chrome and using Chrome DevTools to find the right locations).
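The query-building step can also be sketched with the built-in URL API, which takes care of escaping spaces and special characters. This is an alternative sketch rather than the code in the repository: `buildSearchUrl` is a hypothetical helper name, and the `?s=` search parameter is simply taken from the snippet above.

```javascript
// Hypothetical helper: build the BudgetBytes search URL for a user query.
function buildSearchUrl(value) {
  // drop the word "recipe" and tidy up the whitespace
  const query = value
    .toLowerCase()
    .replace(/\brecipe\b/g, '')
    .replace(/\s+/g, ' ')
    .trim()
  const url = new URL('https://www.budgetbytes.com/')
  // URLSearchParams encodes spaces as "+" and escapes characters like "&"
  url.searchParams.set('s', query)
  return url.toString()
}

console.log(buildSearchUrl('Easy Lemon Pepper Chicken recipe'))
// https://www.budgetbytes.com/?s=easy+lemon+pepper+chicken
```

The main benefit over manual string replacement is that a query such as "mac & cheese" cannot accidentally break the URL.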

The BudgetBytes website was rather nicely organized for scraping: the ingredients and instructions are tagged with the CSS classes .wprm-recipe-ingredient and .wprm-recipe-instruction, which made them super easy to extract.

If the search page has no result to parse, I return an object with the error key set, which is what will tell Gemini that the BudgetBytes search failed and that it should make up a recipe for me.

Now, we start the integration into Gemini.

const tools = [
  {
    functionDeclarations: [
      {
        name: 'getRecipe',
        description: 'Retrieve a recipe from budgetbytes.com. Only use this when explicitly asked',
        parameters: {
          type: FunctionDeclarationSchemaType.OBJECT,
          properties: {
            value: { type: FunctionDeclarationSchemaType.STRING },
          },
          required: ['value'],
        },
      },
    ],
  },
]

This code builds a "function" declaration that we provide to the Gemini API. When we ask Gemini a question, it uses the "description" field that we filled out to decide whether or not our function for scraping BudgetBytes.com should be called. For instance, if you ask "What's the capital of Connecticut?", Gemini will know that this does not relate to recipes or BudgetBytes, so our function won't be called and it will simply answer your question. Another awesome part is that, based on what you set in the properties key, Gemini also parses and sets the parameter of our getRecipe function. So if we ask "Get me a recipe for pancakes from budgetbytes", Gemini will not only know that we want to call getRecipe() but will also set the parameter that gets passed into our function to "pancakes".
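Because the model is the one filling in the arguments, it can be worth checking them against the declaration before calling the real function. The following is a minimal sketch under stated assumptions: `validateArgs` is a hypothetical helper (it is not part of the Gemini SDK), and the declaration is rewritten with the plain strings 'OBJECT' and 'STRING' in place of the SDK enum so the snippet runs on its own.

```javascript
// Stand-in for the declaration above, using plain strings instead of the SDK enum
const getRecipeDeclaration = {
  name: 'getRecipe',
  parameters: {
    type: 'OBJECT',
    properties: { value: { type: 'STRING' } },
    required: ['value'],
  },
}

// Hypothetical helper: return a list of problems with the model-supplied args
function validateArgs(declaration, args) {
  const problems = []
  const { properties = {}, required = [] } = declaration.parameters
  for (const key of required) {
    if (args == null || !(key in args)) problems.push(`missing required argument: ${key}`)
  }
  for (const [key, val] of Object.entries(args ?? {})) {
    const spec = properties[key]
    if (!spec) problems.push(`unexpected argument: ${key}`)
    else if (spec.type === 'STRING' && typeof val !== 'string')
      problems.push(`argument ${key} should be a string`)
  }
  return problems
}

console.log(validateArgs(getRecipeDeclaration, { value: 'pancakes' })) // []
console.log(validateArgs(getRecipeDeclaration, {})) // [ 'missing required argument: value' ]
```

An empty list means the arguments match the declaration; anything else is a reason to skip the scrape and fall back to a normal answer.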

After we've instantiated the model, we pass in a prompt.

const prompt = {
  role: 'user',
  parts: [
    {
      text: search,
    },
  ],
}

const result = await model.generateContent({
  contents: [prompt],
  tools: tools,
})

We pass the "tools" key to generateContent, which is how we tell Gemini that our getRecipe function is a valid option if the prompt is asking for a recipe.

const content = result.response.candidates[0].content
const fc = content.parts[0].functionCall

This is my favorite part of the Gemini function calling feature! The "functionCall" key is only set if Gemini detected that your question was asking for a recipe from BudgetBytes. So if you ask "Generate me a new recipe for waffles", functionCall will be null/undefined. But if you ask Gemini "Get me a pancake recipe from BudgetBytes", functionCall will be an object whose name is "getRecipe" (along with the args to pass in), which is how we know that we should call the getRecipe function.
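To make the two cases concrete, here is a small sketch that runs against hand-written mock responses rather than the live API. The mock objects only mimic the fields used above (`candidates[0].content.parts`), and `extractFunctionCall` is a hypothetical helper name introduced for this example.

```javascript
// Hypothetical helper: return the first functionCall part, or null if there is none
function extractFunctionCall(response) {
  const parts = response?.candidates?.[0]?.content?.parts ?? []
  const part = parts.find((p) => p.functionCall)
  return part ? part.functionCall : null
}

// Mock of a response where the model chose to call our function
const toolResponse = {
  candidates: [
    {
      content: {
        role: 'model',
        parts: [{ functionCall: { name: 'getRecipe', args: { value: 'pancakes' } } }],
      },
    },
  ],
}
// Mock of a response where the model answered directly with text
const textResponse = {
  candidates: [{ content: { role: 'model', parts: [{ text: 'Here is a waffle recipe...' }] } }],
}

console.log(extractFunctionCall(toolResponse)) // { name: 'getRecipe', args: { value: 'pancakes' } }
console.log(extractFunctionCall(textResponse)) // null
```

Searching all of the parts, instead of only looking at parts[0], is a defensive choice in this sketch in case a text part comes before the function call.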

If Gemini set functionCall, we can now call our JavaScript function.

const { name, args } = fc // the function name and arguments chosen by Gemini
const { url, ingredients, instructions, error } = await functions[name](args)

Gemini takes care of configuring our args for us based on what we set in functionDeclarations, so it's really that easy to get the result!

In my code I added some logic around this to handle lookup failures and logic routing, but the above things are the core features.
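As a rough idea of what that routing could look like, here is a sketch with the model and the scraper replaced by stand-ins so it runs offline. It is not the repository's actual logic: `routeFunctionCall`, `stubFunctions`, `stubFallback`, and the example.com URL are all hypothetical names introduced for illustration.

```javascript
// Hypothetical routing: call the requested function, and fall back to the
// model when there is no function call, the function is unknown, or it reports an error.
async function routeFunctionCall(fc, functions, generateFallback) {
  if (!fc) return { source: 'model', text: await generateFallback(null) }
  const { name, args } = fc
  const fn = functions[name]
  if (typeof fn !== 'function') {
    return { source: 'model', text: await generateFallback(`unknown function ${name}`) }
  }
  const result = await fn(args)
  if (!result || result.error) {
    return { source: 'model', text: await generateFallback('BudgetBytes did not have this recipe') }
  }
  return { source: 'budgetbytes', ...result }
}

// Stand-ins so the sketch runs without network access or an API key
const stubFunctions = {
  getRecipe: async ({ value }) =>
    value === 'pancakes'
      ? { url: 'https://example.com/pancakes', ingredients: ['flour'], instructions: ['mix'] }
      : { error: true },
}
const stubFallback = async (reason) => `generated recipe (${reason ?? 'no function call'})`

routeFunctionCall({ name: 'getRecipe', args: { value: 'pancakes' } }, stubFunctions, stubFallback)
  .then((r) => console.log(r.source)) // budgetbytes
routeFunctionCall({ name: 'getRecipe', args: { value: 'unicorn stew' } }, stubFunctions, stubFallback)
  .then((r) => console.log(r.source)) // model
```

In a real version, the fallback would be another call to the model that mentions the failed lookup, so the reply can say that BudgetBytes didn't have the recipe before generating one.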

Example Outputs

Here are some examples:

From BudgetBytes

Input:

npm start "get me the Easy Lemon Pepper Chicken off budget bytes"

Output:

Searching on budgetbytes.com for :  Easy Lemon Pepper Chicken
URL: https://www.budgetbytes.com/easy-lemon-pepper-chicken/
Ingredients:
 2 boneless skinless chicken breasts (about 1.3 lbs. total) ($6.79)
2 Tbsp all-purpose flour ($0.02)
1 Tbsp lemon pepper seasoning ($0.30)
1 Tbsp  cooking oil ($0.04)
1 clove garlic, minced ($0.08)
1/2 cup chicken broth ($0.07)
1 Tbsp butter ($0.14)
1 tsp lemon juice ($0.02)
1 Tbsp chopped fresh parsley (optional) ($0.10)
1/8 tsp freshly cracked black pepper ($0.02)
Instructions:
 Use a sharp knife to carefully fillet the chicken breasts into two thinner peices (or use thin-cut chicken breasts).
Combine the flour and lemon pepper seasoning in a bowl. Sprinkle the mixture over both sides of the chicken breast pieces and then rub it in until the chicken is fully coated.
Heat the cooking oil in a large skillet over medium. When the skillet and oil are very hot, add the chicken and cook on each side until golden brown (about 5 minutes per side). Remove the cooked chicken to a clean plate and cover to keep warm.
Add the butter and minced garlic to the skillet and sauté for about one minute.
Add the chicken broth to the skillet and whisk to dissolve all the browned bits from the bottom of the skillet. Add the lemon juice and allow the sauce to simmer in the skillet for 3-5 minutes, or until it has reduced slightly. Taste the sauce and add salt if needed (I did not add any).
Finally, return the chicken to the skillet and spoon the sauce over top. Allow the chicken to heat through. Season with a little freshly cracked pepper and fresh chopped parsley (optional), then serve.

Not from BudgetBytes

Input:

npm start "make me a new recipe for Easy Lemon Pepper Chicken"

Output:

**Ingredients:**

* 1 pound boneless, skinless chicken breasts, cut into bite-sized pieces
* 1 tablespoon olive oil
* 1 tablespoon lemon juice
* 1 teaspoon lemon zest
* 1 teaspoon black pepper
* 1/2 teaspoon salt
* 1/4 teaspoon garlic powder
* 1/4 teaspoon onion powder

**Instructions:**

1. In a large bowl, combine the chicken, olive oil, lemon juice, lemon zest, black pepper, salt, garlic powder, and onion powder. Toss to coat evenly.
2. Heat a large skillet over medium heat. Add the chicken and cook for 5-7 minutes per side, or until cooked through.
3. Serve immediately with your favorite sides, such as rice, pasta, or vegetables.

**Tips:**

* For a crispier chicken, cook it over high heat for a shorter amount of time.
* If you don't have fresh lemon juice, you can use 1 tablespoon of bottled lemon juice.
* You can also add other seasonings to your taste, such as oregano, thyme, or rosemary.
* Serve with a side of lemon wedges for extra flavor.

All that's left to do now is try them both out!