Setting up an Open Source Serverless LLM to Expose API Endpoints

In this tutorial you’ll learn how to set up a version of your favorite open source LLM (e.g. Llama 2, Mistral, etc.) in the cloud and expose serverless API endpoints you can use to interact with it.


Kibi.One is a no-code SaaS incubation program focused on helping non-technical founders build software companies. Visit our homepage to learn more about our program.

How to go Serverless with Open Source AI Models Such as Llama 2?

In this tutorial I want to talk more about how we can interact with the API endpoints of open source LLMs, or large language models, without having to code.

In the last tutorial, linked below, I showed you how to host your own version of an open source LLM like Llama 2 or Mistral. By renting hardware by the hour, you were able to avoid having to set your LLM up locally on your own computer. In this tutorial, I want to take this a step further and teach you how to interact with open source LLMs through their API endpoints without having to code, and without having to install anything locally. So let’s jump in.

Setting up AI on the Cloud

Again, let’s head over to RunPod. RunPod is a cloud provider that specializes in AI applications. After logging in, what I want you to do is click on “Serverless”. From here, we can set up API endpoints for the AI platform we want to interact with. Under “Quick Deploy”, click on “View More” and you’ll see a list of open source AI products we can launch. For example, we could launch Whisper, Llama 13B, Llama 7B, Mistral, Openjourney, Stable Diffusion and some others. For this tutorial we want a text-based AI, so let’s select Llama 13B and simply click on “Start”. When we do that, we’ll be asked to select a GPU. If we roll over the options here, we’ll see the stats of the pod that this serverless API will run on. I’ll select this 48 GB option here and then click on “Deploy”.

Viewing the AI API Endpoints

Once our serverless instance is set up, we’ll see it here under the “Serverless” tab. You can click on “Edit” to change any details. However, for this tutorial what we want to do is find the API endpoints. So let’s click on the title here where it says Llama 13B. You’ll see the API endpoints where it says “runsync” here. Simply expand this section by clicking on this “more” icon, and you’ll see our other endpoints.

We could, for example, use this “run” endpoint. It will give us a response containing an ID, which we can later pass to a separate API endpoint called “status”, where we can dynamically inject the ID of any run call to fetch the response. However, for queries which take under 15 seconds to perform, we can use this “runsync” option, which will take our prompt and give us the answer in the initial response, without having to check the status of the call through a separate API call. So how you use these will really depend on your use case and individual needs. But for this tutorial, I’m just going to use runsync.
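To make the difference between the two flows concrete, here’s a minimal Python sketch of how the “run”, “status” and “runsync” URLs are put together. The endpoint ID and API key are placeholders, and the exact operations assumed here follow RunPod’s serverless URL convention (`https://api.runpod.ai/v2/<endpoint-id>/<operation>`):

```python
import json
import urllib.request

# Placeholders -- substitute your own values from the RunPod dashboard.
RUNPOD_API_BASE = "https://api.runpod.ai/v2"
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

def build_request(operation, payload=None):
    """Build a request for a serverless operation such as
    'run', 'runsync', or 'status/<job-id>'."""
    url = f"{RUNPOD_API_BASE}/{ENDPOINT_ID}/{operation}"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(url, data=data, headers=headers)

# Asynchronous flow: POST to /run returns a job ID immediately...
run_req = build_request("run", {"input": {"prompt": "Hello"}})
# ...then the returned ID is injected into a later /status call.
status_req = build_request("status/abc123")  # "abc123" is a made-up job ID
```

With runsync, you’d instead build a single request to the `runsync` operation and read the answer straight out of the response body.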

Testing Serverless AI API Endpoints

So I’ll grab this API endpoint here and test out this Llama 2 API. You can use whatever API testing tool you like; I’m using Insomnia. I just created a tutorial on how to use this API testing software if you want to follow along, and I’ll link to it below. Also, over on Kibi.One’s website I’ve posted step-by-step instructions on how to set up the API, including the API URL endpoints and the exact JSON formatting you need to use in order to make this API call work. So I’ll link to Kibi.One below so you can get the step-by-step authentication, header and JSON instructions. In the description, just look for where it says “instructions”.

Also, while you’re over on Kibi One’s website, be sure to check out our no-code SaaS development course. We teach people how to build software products by taking a no code or low code approach. We own a few platforms that have AI integrated into them, and in this course, you’ll learn how to do the same.

Okay, so once you have the API set up within your API testing tool, it’s time to click “Send”. So let’s enter a prompt here. I’ll type in something like “What’s the fastest growing economy in the world?”. When I do that, I just need to wait a few seconds for the response, and once it’s ready, it will show up over on the right here.
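If you’d rather make the same runsync call from a script than from Insomnia, here’s a minimal sketch. The endpoint ID and API key are placeholders, and the `{"input": {"prompt": ...}}` body shape follows the RunPod serverless convention; your exact JSON formatting is in the instructions linked below:

```python
import json
import urllib.request

# Placeholders -- substitute your own values from the RunPod dashboard.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
body = json.dumps({
    "input": {"prompt": "What's the fastest growing economy in the world?"}
}).encode()

request = urllib.request.Request(
    url,
    data=body,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

def send(req):
    """Send the request and return the parsed JSON response.
    Requires real credentials, so it isn't executed in this sketch."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because this is the synchronous endpoint, the model’s answer comes back in that single response rather than via a later status check.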

Using Llama 2 13B in the Cloud

So now we’re using Llama 13B in the cloud. As you can see, without writing any code, we were able to set up a serverless instance of Llama 13B and then interact with it via our own custom API endpoint. You could set up a serverless version of any LLM taking this approach, but these quick-launch serverless options are your best bet if you’re trying to take a no-code or low-code approach. I expect that RunPod will add the newest and most popular versions, including Llama 70B, to this list of quick-start LLMs shortly. But as you can see, I’m getting really good responses using Llama 13B.

Now, the benefit of taking the serverless approach over the pod approach is that I’m only charged for request execution time. This means I don’t have to worry about how many hours my pod is online. I don’t need to worry about any infrastructure or GPU details at all. These serverless instances all work out of the box and are optimized for the LLM or open source AI software you’re running.


So I hope you’ve enjoyed this tutorial on setting up open source AI projects in the cloud and interacting with them through API endpoints.

And remember, if you’re looking to upskill in the area of SaaS development or no-code AI development, I encourage you to check us out over on Kibi.One’s website. We have a no-code SaaS development course where we’ll teach you how to build and monetize pretty much whatever you can dream up, including AI applications. A link and a coupon code for $100 off can be found below.


Build SaaS Platforms Without Code

Kibi.One is a platform development incubator that helps non-technical founders and entrepreneurs build software companies without having to know how to code. 

We're so sure of our program that if it doesn't generate a positive ROI for you, we'll buy your platform from you.