KIBI.ONE PRESENTS…

Setting up an Open Source Serverless LLM to Expose API Endpoints

In this tutorial you'll learn how to set up a version of your favorite open source LLM (e.g. Llama 2, Mistral) in the cloud and expose serverless API endpoints to control the AI.


Kibi.One is a no-code SaaS incubation program focused on helping non-technical founders build software companies. Visit our homepage to learn more about our program.

How to Go Serverless with Open Source AI Models Such as Llama 2

In this tutorial I want to talk more about how we can interact with the API endpoints of open source LLMs, or large language models, without having to code.

In the last tutorial, linked below, I showed you how to host your own version of an open source LLM like Llama 2 or Mistral. By renting hardware by the hour, you were able to avoid setting your LLM up locally on your own computer. In this tutorial, I want to take this a step further and teach you how to interact with open source LLMs through their API endpoints without having to code, and without installing anything locally. So let's jump in.

Setting up AI on the Cloud

Again, let's head over to RunPod. RunPod is a cloud provider that specializes in AI applications. After logging in, click on "Serverless". From here, we can set up API endpoints for the AI platform we want to interact with. Under "Quick Deploy", click on "View More" and you'll see a list of open source AI products we can launch. For example, we could launch Whisper, Llama 13B, Llama 7B, Mistral, Openjourney, Stable Diffusion and some others. For this tutorial we want a text-based AI, so let's select Llama 13B. Simply click on "Start". When we do that, we'll be asked to select a GPU. If we roll over each option, we'll see the stats of the pod that this serverless API will run on. I'll select the 48 GB option here and then click on "Deploy".

Viewing the AI API Endpoints

Once our serverless instance is set up, we'll see it under the serverless tab. You can click on "Edit" to change any details. However, for this tutorial what we want to do is find the API endpoints. So let's click on the title where it says Llama 13B. You'll see the API endpoints where it says "runsync". Simply expand this section by clicking on the "more" icon, and you'll see the other endpoints.

We could, for example, use the "run" endpoint. It returns the response as an ID, which we can later pass to a separate endpoint called "status", where we can dynamically inject the ID of any run call to fetch the response. However, for queries that take under 15 seconds, we can use the "runsync" option, which takes our prompt and returns the answer in the initial response, without having to check the status of the call through a separate API call. How you use these will really depend on your use case and individual needs, but for this tutorial I'm just going to use runsync.
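The two calling patterns can be sketched in Python. This is a minimal sketch, assuming RunPod's documented URL pattern (`https://api.runpod.ai/v2/<endpoint_id>/...`) and payload shape; the endpoint ID and API key below are placeholders you'd replace with your own values from the RunPod console:

```python
# Sketch of the asynchronous run/status flow. ENDPOINT_ID and API_KEY are
# placeholders; the URL pattern follows RunPod's serverless API convention.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

def build_request(prompt):
    # RunPod serverless workers expect the payload wrapped in an "input" key.
    return {"input": {"prompt": prompt}}

def status_url(run_id):
    # The /status endpoint takes the ID returned by the initial /run call.
    return f"{BASE_URL}/status/{run_id}"

# The actual HTTP calls (not executed here) would look something like:
#   resp = requests.post(f"{BASE_URL}/run", json=build_request("Hello"),
#                        headers={"Authorization": f"Bearer {API_KEY}"})
#   run_id = resp.json()["id"]
#   ...then poll status_url(run_id) until the status is "COMPLETED".
```

With runsync, the polling step disappears: you POST once and read the answer out of the same response.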

Testing Serverless AI API Endpoints

So I'll grab this API endpoint here and test out the Llama 2 API. You can use whatever API testing tool you like; I'm using Insomnia. I just created a tutorial on how to use this API testing software if you want to follow along, and I'll link to it below. I've also posted step-by-step instructions on Kibi.one covering how to set up the API call, including the endpoint URLs and the exact JSON formatting you need to use to make the call work. So I'll link to Kibi.one below so you can get the step-by-step authentication, header and JSON instructions. In the description, just look for where it says "instructions".
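As a rough sketch of what such a request looks like (the endpoint ID and API key are placeholders, and the header/body shape is an assumption based on RunPod's serverless docs; defer to the instructions on Kibi.one for the exact formatting):

```python
import json

# Placeholders -- substitute your endpoint ID and API key from the RunPod console.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

def runsync_request(prompt):
    # Assemble the URL, headers and JSON body for a /runsync call.
    url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
    headers = {
        "Authorization": f"Bearer {API_KEY}",  # RunPod uses bearer-token auth
        "Content-Type": "application/json",
    }
    body = json.dumps({"input": {"prompt": prompt}})
    return url, headers, body

url, headers, body = runsync_request(
    "What's the fastest growing economy in the world?"
)
```

In Insomnia, these three pieces map directly onto the URL field, the header tab and the JSON body tab of a POST request.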

Also, while you're on Kibi.One's website, be sure to check out our no-code SaaS development course. We teach people how to build software products by taking a no-code or low-code approach. We own a few platforms that have AI integrated into them, and in this course you'll learn how to do the same.

Okay, so once you have the API set up within your API testing tool, it's time to click "Send". Let's enter a prompt here. I'll type in something like "what's the fastest growing economy in the world". When I do that, I just need to wait a few seconds for the response, and once it's ready, it will show up over on the right.
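The response that comes back looks roughly like the sample below. The field names follow RunPod's serverless API, but the sample text itself is made up, and the exact shape of "output" depends on the model worker you deployed:

```python
# Illustrative only: field names follow RunPod's serverless API; the
# answer text is invented and the "output" shape varies by model worker.
sample_response = {
    "id": "sync-1234",
    "status": "COMPLETED",
    "output": "India has been among the fastest growing major economies...",
}

def extract_output(response):
    # Return the model's answer if the call completed, otherwise None.
    if response.get("status") == "COMPLETED":
        return response.get("output")
    return None

answer = extract_output(sample_response)
```

A non-COMPLETED status (for example on the async /run flow) means you should keep polling rather than read the output.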

Using Llama 2 13B in the Cloud

So now we're using Llama 13B in the cloud. As you can see, without writing any code, we were able to set up a serverless instance of Llama 13B and then interact with it via our own custom API endpoint. You could set up a serverless version of any LLM taking this approach, but these quick-launch serverless options are your best bet if you're trying to take a no-code or low-code approach. I expect that RunPod will add the newest and most popular models, including Llama 70B, to this list of quick-start LLMs shortly. But as you can see, I'm getting really good responses using Llama 13B.

Now, the benefit of taking the serverless approach over the pod approach is that I'm only charged for request execution time. This means I don't have to worry about how many hours my pod is online, and I don't need to worry about any infrastructure or GPU details at all. These serverless instances all work out of the box and are optimized for the LLM or open source AI software you're running.

Conclusion

So I hope you've enjoyed this tutorial on setting up open source AI projects in the cloud and interacting with them through API endpoints.

And remember, if you're looking to upskill in the area of SaaS development or no-code AI development, I encourage you to check us out on Kibi.one. We have a no-code SaaS development course where we'll teach you how to build and monetize pretty much whatever you can dream up, including AI applications. A link to Kibi.one and a coupon code for $100 off can be found below.

