KIBI.ONE PRESENTS…
Setting up an Open Source Serverless LLM to Expose API Endpoints
In this tutorial you’ll learn how to set up a version of your favorite open source LLM (e.g. Llama 2, Mistral, etc.) in the cloud and expose serverless API endpoints to control the AI.

Kibi.One is a no-code SaaS incubation program focused on helping non-technical founders build software companies. Visit our homepage to learn more about our program.
How to go Serverless with Open Source AI Models Such as Llama 2?
In this tutorial I want to talk more about how we can interact with the API endpoints of open source LLMs, or large language models, without having to code.
In the last tutorial, linked to below, I showed you how to host your own version of an open source LLM like Llama 2 or Mistral. By renting hardware by the hour, you were able to avoid having to set your LLM up on your own computer locally. In this tutorial, I want to take this a step further and teach you how to interact with open source LLMs through their API endpoints without having to code, and without having to install anything locally. So let's jump in.
Setting up AI on the Cloud
Again, let's head over to RunPod. RunPod is a cloud provider that specializes in AI applications. After logging in, click on "Serverless". From here, we can set up API endpoints for the AI platform we want to interact with. Under "Quick Deploy", click on "View More" and you'll see a list of open source AI products we can launch. For example, we could launch Whisper, Llama 13B, Llama 7B, Mistral, OpenJourney, Stable Diffusion and some others. For this tutorial we want text-based AI, so let's select Llama 13B. Simply click on "Start". When we do that, we'll be asked to select a GPU. If we roll over the options here, we'll see the stats of the pod that this serverless API will run on. I'll select the 48GB option and then click on "Deploy".
Viewing the AI API Endpoints
Once our serverless instance is set up, we'll see it under the "Serverless" tab. You can click on "Edit" to change any details. For this tutorial, however, what we want to find is the API endpoints. Click on the title where it says Llama 13B, and you'll see the API endpoints, including one ending in "runsync". Expand this section by clicking on the "more" icon, and you'll see the other endpoints.
We could, for example, use the "run" endpoint. It responds with a job ID, which we can later pass to a separate endpoint called "status" to fetch the result of that run. However, for queries that take under 15 seconds, we can use the "runsync" option, which takes our prompt and returns the answer in the initial response, without having to check the status of the call through a separate API request. How you use these will depend on your use case and individual needs, but for this tutorial I'm just going to use runsync.
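If you do eventually want to script these calls rather than use a testing tool, the difference between the asynchronous run/status flow and the one-shot runsync flow can be sketched in a few lines of Python. This is an illustrative sketch, not official RunPod client code: the URL pattern, the "input"/"prompt" body shape, and the status values are assumptions based on this setup, so verify them against the endpoints shown in your own dashboard.

```python
import json
import time
import urllib.request

BASE = "https://api.runpod.ai/v2"  # assumed serverless API base URL


def endpoint_url(endpoint_id, operation):
    """Build a serverless endpoint URL, e.g. .../v2/<id>/runsync."""
    return f"{BASE}/{endpoint_id}/{operation}"


def _call(url, api_key, body=None):
    """POST (or GET when body is None) with a Bearer auth header."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def run_async(endpoint_id, api_key, prompt):
    """Queue a job via /run; returns a job ID to check via /status."""
    result = _call(endpoint_url(endpoint_id, "run"), api_key,
                   {"input": {"prompt": prompt}})
    return result["id"]


def wait_for_result(endpoint_id, api_key, job_id):
    """Poll /status/<job_id> until the job finishes."""
    while True:
        result = _call(endpoint_url(endpoint_id, f"status/{job_id}"), api_key)
        if result.get("status") in ("COMPLETED", "FAILED"):
            return result
        time.sleep(2)  # brief pause between polls


def run_sync(endpoint_id, api_key, prompt):
    """Single /runsync call: the answer comes back in the same response."""
    return _call(endpoint_url(endpoint_id, "runsync"), api_key,
                 {"input": {"prompt": prompt}})
```

The design choice mirrors the tutorial: run plus wait_for_result suits long jobs, while run_sync is the simpler path for quick prompts.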
Testing Serverless AI API Endpoints
So let's grab the runsync API endpoint and test out this Llama 2 API. You can use whatever API testing tool you like; I'm using Insomnia. I just created a tutorial on how to use this API testing software if you want to follow along, and I'll link to it below. I've also posted step-by-step instructions over on Kibi.one covering how to set up the API call, including the endpoint URLs and the exact JSON formatting you need to make the call work. I'll link to Kibi.one below so you can get the step-by-step authentication, header, and JSON instructions. In the description, just look for where it says "instructions".
Also, while you're over on Kibi.One's website, be sure to check out our no-code SaaS development course. We teach people how to build software products by taking a no-code or low-code approach. We own a few platforms that have AI integrated into them, and in this course you'll learn how to do the same.
Okay, once you have the API set up within your API testing tool, it's time to click "Send". Let's enter a prompt here; I'll type in something like "What's the fastest growing economy in the world?". When I do that, I just need to wait a few seconds for the response, and once it's ready, it will show up over to the right.
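For reference, the request body you paste into your testing tool, and the shape of the response you can expect back, look roughly like this. The sampling parameter names (max_new_tokens, temperature) and the response fields are assumptions for illustration; check the instructions linked above for the exact formatting your endpoint expects.

```python
import json

# Assumed request shape: the prompt (and optional sampling parameters)
# wrapped in an "input" object.
request_body = {
    "input": {
        "prompt": "What's the fastest growing economy in the world?",
        "max_new_tokens": 256,   # hypothetical parameter name
        "temperature": 0.7,      # hypothetical parameter name
    }
}
print(json.dumps(request_body, indent=2))

# Illustrative /runsync response: the generated text typically arrives
# under an "output" key alongside status metadata. Field values here are
# made up for the example.
sample_response = {
    "id": "sync-1234",       # hypothetical job ID
    "status": "COMPLETED",
    "output": "One of the fastest growing large economies is...",
}
if sample_response["status"] == "COMPLETED":
    print(sample_response["output"])
```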
Using Llama 2 13B in the Cloud
So now we're using Llama 13B in the cloud. As you can see, without writing any code, we were able to set up a serverless instance of Llama 13B and then interact with it via our own custom API endpoint. You could set up a serverless version of any LLM taking this approach, but these quick-launch serverless options are your best bet if you're trying to take a no-code or low-code approach. I expect that RunPod will add the newest and most popular versions, including Llama 70B, to this list of quick-start LLMs shortly. But as you can see, I'm getting really good responses using Llama 13B.
Now the benefit of taking the serverless approach over the pod approach is that I'm only charged for request execution time. This means I don't have to worry about how many hours my pod is online, and I don't need to worry about any infrastructure or GPU details at all. These serverless instances work out of the box and are optimized for the LLM or open source AI software you're running.
Conclusion
So I hope youâve enjoyed this tutorial on setting up open source AI projects in the cloud and interacting with them through API endpoints.
And remember, if you're looking to upskill in the area of SaaS development or no-code AI development, I encourage you to check us out over on Kibi.one. We have a no-code SaaS development course where we will teach you how to build and monetize pretty much whatever you can dream up, including AI applications. A link to Kibi.one and a coupon code for $100 off can be found below.

Build SaaS Platforms Without Code
Kibi.One is a platform development incubator that helps non-technical founders and entrepreneurs build software companies without having to know how to code.
We're so sure of our program that if it doesn't generate a positive ROI for you, we'll buy your platform from you.