Running Open Source AI LLMs in the Cloud Without Code

In this tutorial you’ll learn how to deploy AI models such as Llama 2, Mistral, Stable Diffusion and Open Journey in the cloud on virtual machines without having to code.


Kibi.One is a no-code SaaS incubation program focused on helping non-technical founders build software companies. Visit our homepage to learn more about our program.

Launch AI In the Cloud Without Writing a Line of Code

In this tutorial I want to talk more about how we can interact with the API endpoints of open source LLMs, or large language models, without having to code.

I’m creating this tutorial because there are many new language models that have been released recently, including:

– Llama 2 70B
– Code Llama 70B
– Code Llama Python
– Code Llama Instruct

There have also been new releases of Mistral and many others. Traditionally, working with these open source large language models has been difficult: you need to know how to code, and you need a computer powerful enough to run the LLM locally. Most of our laptops simply don’t have that much power. And even if we could get an LLM working on a home computer, we’d likely see really slow token output times, even slower than one token per second in many cases.

Overcoming The Many Obstacles of Open Source LLMs

So how do we overcome all of these obstacles? One way is to use third party cloud solutions. You could use any of the big cloud service providers like Google Cloud, Azure or AWS, but recently there has been a wave of AI-focused cloud computing platforms that can both host your instance of an open source LLM and expose its endpoints to you, so that you can embed the LLM’s functionality within your own applications.

There are many different platforms that specialize in this service, but today, I’m going to introduce you to one and show you how to set up your very own custom LLMs.

So let’s jump over to RunPod. Here you’ll see the service describes itself as “the cloud built for AI”. Sign up, and once you’re logged in, you’ll see a page that looks a bit like this:

So let’s click over on Pods. From here, click on “GPU Pod”. Then you can rent hardware by the hour to run your instance of your LLM. There are many different options here, but let’s go down to the RTX A6000. It’s affordable and will give us the performance we need for this tutorial.

LLMs in the Cloud: Picking a Template

I’ve put a bit of money on my account here, so I’ll just click on “deploy”. Here it’s asking us to select a template, and from that dropdown list we could select any template we want. However, there are also templates built by the community that aren’t included in this list by default. So if we temporarily close this down, we can click on this “choose template” option instead.

Now, notice that if you scroll down under “community templates” you’ll see Llama 2 70B by Trellis Research, along with “Mistral Instruct”, various Stable Diffusion models, Whisper models and so on. I’m just going to click on “Llama 2 70B” here. Now that I have this template selected, back on my pod page you’ll see that I have this readme file. If I open it, you’ll see which hardware I should select to run this model, along with other important details. This is helpful if you ever need to troubleshoot one of your pods or templates. As you can see in the readme file, when I deploy my server, this software will be up and running and available to me through an API endpoint.

So now I’m going to click on the RTX A6000. You’ll see my selected template is loaded by default. I’ll click “continue” and then “deploy”. Now it’s building my pod; this will just take a minute. When that’s done, you’ll be able to expand this dropdown to see the status of your setup. Here, under “logs”, you’ll see my pod is still being configured, so let’s just give this another minute.

Getting Your LLM’s API Endpoints

Once it’s ready to go, you’ll see an interface that looks something like this. What we want to do here is click on “connect”. You’ll likely see a button labelled “HTTP Service Not Ready”, so we just have to wait another moment until that’s ready before we can use our API endpoints. Hold tight a few more minutes while it finishes setting up Llama 2.

Okay, so everything for me is showing up as ready, so let’s test out this Llama 2 API. You can use whatever API testing tool you like; I’m using Insomnia. I just created a tutorial on how to use this API testing software if you want to follow along. Also, over on Kibi.One’s website, I’ve included the API URL endpoints and the exact JSON formatting you need to use in order to make this API call work. You can find those step-by-step instructions here.

Also, while you’re over on Kibi.One’s website, be sure to check out our no-code SaaS development course. We teach people how to build software products by taking a no-code or low-code approach. We own a few platforms that have AI integrated into them, and in this course, you’ll learn how to do the same.

Okay, so let’s jump back into RunPod. Essentially, what I want to do now is make sure my pod is expanded and then copy this ID here. Then I’ll paste that ID into my endpoint like this. Again, over on Kibi.One’s website you’ll see the endpoint with a bunch of X’s in it. Simply remove those X’s and insert your pod ID in their place, and that will be your API endpoint. Instructions on headers and authentication can also be found on our site, so be sure to read that document step by step.
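If it helps to see that substitution written out, here’s a tiny Python sketch. The template URL, port and path below are placeholders I’ve made up purely for illustration; the real endpoint (the one with the X’s) is the one listed in the step-by-step instructions:

```python
# Hypothetical endpoint template for illustration only -- the real template,
# with the X's you need to replace, is in the step-by-step instructions.
ENDPOINT_TEMPLATE = "https://XXXXXXXXXXXXXX-8080.proxy.runpod.net/generate"

def build_endpoint(pod_id: str, template: str = ENDPOINT_TEMPLATE) -> str:
    """Swap the X placeholder for your RunPod pod ID to form the API URL."""
    return template.replace("XXXXXXXXXXXXXX", pod_id)

# Example with a made-up pod ID:
print(build_endpoint("abc123xyz"))  # https://abc123xyz-8080.proxy.runpod.net/generate
```

That one string replacement is all the “remove the X’s and insert your ID” step amounts to.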

Now, I’ve already authenticated myself and I have my headers set up, so I’m able to ping this API endpoint. Here you can see I’ve entered a prompt, and you can see my generated text over to the right.
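For the curious, here’s roughly what a request like that looks like as code. This is a hedged sketch using only Python’s standard library; the URL, header names and JSON field names are assumptions modelled on common text-generation APIs, so defer to the exact formatting documented on our site:

```python
import json
import urllib.request

# Every specific below (URL, field names, auth header) is an illustrative
# assumption -- use the exact values from the step-by-step instructions.
url = "https://YOUR-POD-ID-8080.proxy.runpod.net/generate"
payload = {
    "inputs": "Explain cloud computing in one sentence.",  # your prompt
    "parameters": {"max_new_tokens": 100},                 # generation settings
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR-API-KEY",  # only if your pod requires auth
    },
    method="POST",
)
# Uncomment to actually send the request once your pod is live:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # the generated text comes back as JSON
```

This is essentially what an API testing tool like Insomnia is doing for you behind the scenes.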

Interacting With Llama 2 in the Cloud… Without Code!

So I’m officially running my own hosted version of Llama 2, and I’m able to interact with it through this API. Also, it’s important to remember to stop your pod when you’re not using it so you’re not charged by the hour.

I hope you’ve enjoyed this tutorial and that you were able to get your APIs up and running using your own LLM on a cloud server. In the next tutorial I’m going to show you how to do this using a serverless system, so you don’t have to rent a pod by the hour; you just pay based on the use of your endpoints. In most cases it’s more practical to go serverless if you’re going to be incorporating these APIs into your applications, so be sure to check that video out as well. Again, a link to our serverless AI video can be found in the description below.


And remember, if you’re looking to upskill in the area of SaaS development or no-code AI development, I encourage you to check us out over on Kibi.One. We have a no-code SaaS development course where we will teach you how to build and monetize pretty much whatever you can dream up, including AI applications.


Build SaaS Platforms Without Code

Kibi.One is a platform development incubator that helps non-technical founders and entrepreneurs build software companies without having to know how to code. 

We're so sure of our program that if it doesn't generate a positive ROI for you, we'll buy your platform from you. Watch the video to the right to learn more.