It is actually against the law now to talk about tech without mentioning AI. Okay, not really, but let's be honest: AI has been dominating tech news for nearly a year and a half now. It really is an exciting time for technology, but what is often lost in the conversation is "how do we consume AI?"
Today I am going to talk about some serverless companies that are attempting to streamline that process.
Vultr, often considered the largest privately held cloud computing company, recently announced its Cloud Inference offering: a serverless solution designed to simplify the use of generative AI.
What do I mean by this? Well, the "core" of generative AI is the Large Language Model (LLM). As the name suggests, it is a massive AI model trained on a massive amount of language data (often in text form). The creators of these models (OpenAI, Anthropic, Google, etc.) then create an API layer on top of it for people to interface with the model.
Inferencing is the process of actually using the model. This can take the form of a chat bot, a search bar, or something else creative. You will write code for the purpose of inferencing with the LLM, and that code has to be hosted somewhere.
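To make that concrete, here is a minimal sketch of what inference code looks like, using the OpenAI Python SDK as one example of the API layer described above. The model name and prompt are placeholders, not a recommendation:

```python
# pip install openai
from openai import OpenAI

# The SDK reads the OPENAI_API_KEY environment variable by default.
client = OpenAI()

# One inference request: send a prompt, get the model's completion back.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat-capable model works
    messages=[
        {"role": "user", "content": "Summarize serverless computing in one sentence."},
    ],
)

print(response.choices[0].message.content)
```

Every chat bot, search bar, or summarizer ultimately boils down to some variation of this call, running somewhere.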
Serverless not only simplifies the development of the application but also scales up and down as needed, which is great for developers inferencing with LLMs. Let's say I am building a chat bot. The backend of the bot that calls the LLM can scale down to zero when no one is using it; as more people use it, it scales up to complete the requests. You only pay for what you use.
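As an illustration, that chat bot backend could be a single serverless function. This is a generic sketch in the AWS Lambda handler style, not Vultr's actual API; the event shape is an assumption:

```python
import json

from openai import OpenAI

# Created once and reused across warm invocations of the function.
client = OpenAI()

def handler(event, context):
    # Hypothetical event shape: a JSON body like {"message": "<user's chat message>"}
    user_message = json.loads(event["body"])["message"]

    # One inference call per request; the platform scales instances
    # (including down to zero) based on incoming traffic.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": user_message}],
    )

    return {
        "statusCode": 200,
        "body": json.dumps({"reply": response.choices[0].message.content}),
    }
```

Nothing in that code worries about servers, capacity, or scaling; that is exactly the point.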
Vultr creating a product that is purpose-built for inferencing and serverless is a big deal for the serverless market. It shows that serverless can be used for more than just "glue" and has a place in this new world!
I expect to see many more examples of AI inferencing in the serverless world. Actually, Dr. Li Feifei of Alibaba Cloud seems to agree. In a recent interview he said:
“The promise of AI is not just limited to gaming or even making sense of unstructured data,” he says. “This capability is handy given the move towards serverless cloud computing. As the name suggests, serverless means that there is no longer a need to worry about servers behind the service you're using.”
He continues: “In the past when one purchased a cloud service product, a provision had to be made for a set of servers. For example – four core eight gigabytes of memory – but that came at a cost. When one provisions a server that has more capacity than the actual workloads require, server resources are wasted.”
In fact, not too long ago, Alibaba Cloud launched their own serverless AI inferencing solution. The idea here is to let developers just focus on using AI. Don’t make them think about machine types and memory usage and how to expose the service to the internet. Just allow them to write code to use AI.
Now, so far we have been talking about serverless compute in relation to AI. That is to say, we have talked about where we host code. But serverless isn't limited to just compute. What about serverless data streaming?
Startup RedPanda recently announced their serverless streaming platform, RedPanda Serverless. It is Kafka API compliant, so if you understand Kafka, you can understand their platform. The idea is that you only pay for what you use.
Those of you familiar with Kafka may be asking, "Wait, how can it be serverless when you need a Kafka cluster running at all times?" That is a very fair question. They are essentially charging only for data that is stored and data that is transmitted; they have found a way to abstract away the compute layer.
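Because the platform speaks the Kafka API, existing Kafka client code should work against it largely unchanged. Here is a minimal producer sketch using the confluent-kafka Python client; the broker address and credentials are placeholders, not real endpoints:

```python
# pip install confluent-kafka
from confluent_kafka import Producer

# Placeholder connection details; a real setup would use the bootstrap
# server and credentials from the provider's console.
producer = Producer({
    "bootstrap.servers": "broker.example.com:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-256",
    "sasl.username": "my-user",
    "sasl.password": "my-password",
})

def on_delivery(err, msg):
    # Called once per message to confirm (or report) delivery.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}]")

# Produce a single event; you are billed for data in/out and storage,
# not for a cluster sitting idle.
producer.produce("events", value=b"user signed up", on_delivery=on_delivery)
producer.flush()
```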
In the past, you would have to pay for the actual servers that store and process data. Here, you are essentially paying for the storage and processing without paying for the machines themselves. This can be great for using GenAI with streaming data. Imagine collecting data in real time and having it summarized by an LLM, or triggering a process based on some real-time event. This can be a game changer as well.
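Putting the two ideas together, here is a hedged sketch of a consumer that batches streaming events and asks an LLM to summarize them. The topic name, batch size, and model are all illustrative assumptions:

```python
# pip install confluent-kafka openai
from confluent_kafka import Consumer
from openai import OpenAI

consumer = Consumer({
    "bootstrap.servers": "broker.example.com:9092",  # placeholder broker
    "group.id": "llm-summarizer",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])  # illustrative topic name

llm = OpenAI()
batch = []

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        batch.append(msg.value().decode("utf-8"))

        # Summarize every 20 events; the threshold is arbitrary.
        if len(batch) >= 20:
            response = llm.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model name
                messages=[{
                    "role": "user",
                    "content": "Summarize these events:\n" + "\n".join(batch),
                }],
            )
            print(response.choices[0].message.content)
            batch.clear()
finally:
    consumer.close()
```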
In the past, I talked about serverless databases, and now we are seeing serverless streaming.
Ultimately, I think what we will see as time progresses is the complete abstraction of the compute layer. No more idle machines, no more worrying about provisioning, and no more trying to figure out how to scale in a smart way. Your cloud provider will provide the abstraction for the compute.
I think this momentum in GenAI, and AI in general, will be a catalyst for serverless adoption. We will see more people trying to find easy ways to consume and inference with LLMs and other AI models.
It often feels like a race against the clock with GenAI. Every day some new product is launched that seems to blow people's minds. Just the other day, Databricks launched their own LLM called DBRX.
As these technologies keep coming and changing, organizations are finding themselves needing to improve the velocity of their development cycles. Serverless removes the need for provisioning and speeds up the overall development cycle. Businesses can now inference with new technology and launch their AI-powered features at breakneck speed.
Photo by cottonbro studio