It’s time to talk about the elephant in the room. Serverless can be costly. I know, it is weird coming from me, the serverless fanatic. I go on and on about the cost savings of serverless. Hell, one of the biggest selling points of serverless is “pay-per-use”.
That being said, there have been a lot of stories lately about runaway bills. There is even a website called Serverless Horrors that covers some of these stories. Fortunately, it doesn’t have too many stories logged, but the topic is still worth talking about.
It is very important to have these conversations, both in the name of transparency and to share best practices for controlling these bills. I want to be very clear: I am not here to attack vendors or users. I am looking to have an honest conversation about these issues so that we can find ways to mitigate them.
Cara: Fighting AI Artwork and Bills
Cara is a social media platform for artists that takes a hard stance against generative AI artwork. It wants to encourage artists to share their work while preventing that work from being used to train GenAI models. As a side note, I stand behind this idea: there are a lot of great artists in the world, and given the choice, I would rather use their work than something generated by AI.
Apparently I am not alone in this thinking, as the app went viral. Of course, the rise in popularity meant more people were visiting and using the app, which meant more resources were being consumed. FireShip recently did a YouTube video explaining how Cara was hit with a bill of $96,280 from Vercel.
For those who are unfamiliar, Vercel is the creator of the Next.js framework, and it offers a serverless platform that it refers to as “the frontend cloud”. As the video describes, Vercel uses the same pay-per-use model as most serverless platforms.
They also offer a “hobby tier”, a very common practice in the freemium model. In this model, you offer a “free tier” to get people onto the platform; as they grow, they start using more features and resources and eventually move onto one of the paid tiers. Cloud is a consumption-based business, so free tiers usually include only small amounts of data and compute per month. Even major clouds such as Google Cloud offer a free tier.
Cara likely started as a side project that went viral and exceeded the free tier’s limits. Since it was a side project, there probably wasn’t a plan to monetize it, nor were there any VCs throwing money their way. So naturally, getting hit with a nearly $100,000.00 bill would shock anyone.
In Vercel’s defense, the company stated that it had emailed the user about their growing spend and that the user had not set budget limits.
DDOS on a Static Site Causes a Six-Figure Bill
A few months ago, someone posted on Reddit about their experience with Netlify: a DDOS attack on their static site led to a six-figure bill. Netlify offers a service similar to Vercel’s and played a key role in the creation of the Jamstack architecture pattern.
Naturally, the Redditor was upset, and many people advised them to post the story on Hacker News, which they did. After a lot of back and forth between Netlify’s CEO and the user, the bill was forgiven. However, commenters made legitimate arguments that the “Starter tier” described on the pricing page doesn’t indicate any network costs that could produce such a bill.
CyberNews did a deeper dive on this story and pointed out that some serverless platforms fail to provide DDOS protection or spending limits. This is a very fair callout. I think we are experiencing some “growing pains” as newer serverless companies figure out how to put proper guardrails on their systems.
That being said, it’s a shared responsibility. The vendor needs to provide protections and guardrails and the user needs to properly implement them in their business/projects.
The Heart of the Matter
As I just mentioned, I think one of the biggest issues here is that many of the new serverless startups are still figuring out their platform. Most of them are effectively abstractions on top of existing cloud platforms (very few are hosted in their own data centers). They use the offerings of major public clouds as the primitives for their own platform.
In building their abstractions and platform, they tend to focus on delivering a great user experience (UX). This is a good thing, as it is how you bring in long-term customers: you want to have the best platform for your users to build and host their applications. The problem is that this usually translates into solving technical problems rather than business problems.
An added focus on users’ business problems would give users guardrails and other tools to manage their costs. Of course, users have responsibilities too. I ran into this problem myself just four months ago: I started a Cloud Run service and left the maximum number of instances at 100. It was a brain fart on my part.
For those unfamiliar with Cloud Run, it’s a serverless container platform offered by Google Cloud. Cloud Run refers to its “workers” as “instances”. To ensure good performance for end users, you want your application to scale out to as many instances as demand requires. However, instances cost money while they are running. I saw my bill triple in two days before I realized my error.
I fixed this by lowering the maximum instances for that specific service to 10. To this day, I don’t know exactly what caused the issue, but I suspect it was some form of DDOS. Had I been monitoring my instances more closely and using something like Cloud Armor to protect my services from DDOS attacks, I might have been better off.
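The instance cap matters because it bounds the worst case. A rough ceiling on compute spend is maximum instances × hours × hourly price per instance. Here is a minimal Python sketch of that arithmetic. The $0.10 per instance-hour rate and the `worst_case_cost` helper are hypothetical placeholders, not Google Cloud’s actual pricing or API, and the calculation ignores request, networking, and other charges.

```python
# Hypothetical price: real Cloud Run pricing depends on vCPU, memory,
# region, and billing settings. Replace with your own numbers.
HYPOTHETICAL_RATE_PER_INSTANCE_HOUR = 0.10  # USD, illustrative only


def worst_case_cost(
    max_instances: int,
    hours: float,
    rate_per_instance_hour: float = HYPOTHETICAL_RATE_PER_INSTANCE_HOUR,
) -> float:
    """Upper bound on compute spend if every allowed instance stays busy
    for the entire window (ignores request, network, and storage charges)."""
    if max_instances < 0 or hours < 0 or rate_per_instance_hour < 0:
        raise ValueError("inputs must be non-negative")
    return max_instances * hours * rate_per_instance_hour


if __name__ == "__main__":
    # Compare the ceiling over a two-day window for the two settings.
    for cap in (100, 10):
        print(f"max instances = {cap}: at most ${worst_case_cost(cap, 48):,.2f}")
```

Because the ceiling scales linearly with the cap, dropping from 100 to 10 instances cuts the worst case by a factor of ten, whatever the real hourly rate turns out to be.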
Again, there is a shared responsibility.
How to Be Cost-Conscious
It’s not all doom and gloom. CIO Magazine recently published an article on how to get the best experience from a serverless cloud. It cites the importance of cost-optimization tools. Most of the public clouds already have them built into their platforms, and beyond that, many third-party tools such as CloudZero can help.
Anomaly detection is very important too. If your costs are going up because your application is getting very popular and receiving legitimate traffic, then that’s fine. But if there is a “glitch in the matrix”, you want to identify and address it immediately. Again, every major cloud provider offers something to help with this, and there are also third parties like Datadog and Sysdig.
There is no single best platform, so I highly advise doing your homework on third-party tools to figure out what’s best for your use case.
I am also a major advocate for budget alerts. Make sure you have budget alerts set up to notify you as you approach your budget so that you can act accordingly. Most cloud providers offer these kinds of controls.
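To make “glitch in the matrix” concrete, here is a minimal Python sketch of one simple anomaly-detection technique: a rolling z-score over daily spend. It flags any day whose cost sits more than `threshold` standard deviations above the mean of the preceding `window` days. This is not how any particular cloud provider, Datadog, or Sysdig implements anomaly detection; the function name, the 7-day window, the threshold of 3, and the one-cent `min_sigma` floor are all illustrative assumptions.

```python
from statistics import mean, stdev


def find_cost_anomalies(
    daily_costs: list[float],
    window: int = 7,
    threshold: float = 3.0,
    min_sigma: float = 0.01,
) -> list[int]:
    """Return the indices of days whose cost sits more than `threshold`
    standard deviations above the mean of the preceding `window` days."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu = mean(history)
        # Floor the deviation so a perfectly flat history doesn't make
        # every one-cent change look like an anomaly.
        sigma = max(stdev(history), min_sigma)
        if (daily_costs[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    costs = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.1, 5.0, 15.0, 5.1]
    print(find_cost_anomalies(costs))  # [8]: the $15 day
```

One design trade-off: a spike stays in the history for the next `window` days and inflates the standard deviation, so a second spike shortly after the first is harder to catch. Production tools typically account for seasonality and trend in ways this sketch does not.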
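Budget alerts usually fire when spend crosses a set of percentage thresholds of a budget. Here is a minimal Python sketch of that logic, assuming thresholds at 50%, 90%, and 100% (a common pattern, chosen here for illustration). The `newly_crossed_thresholds` function is hypothetical and not any cloud’s API. It returns only thresholds that have not already alerted, so each one notifies once. Keep in mind that an alert only notifies you; by itself it does not stop spending.

```python
def newly_crossed_thresholds(
    spend: float,
    budget: float,
    thresholds: tuple[float, ...] = (0.5, 0.9, 1.0),
    already_alerted: frozenset[float] = frozenset(),
) -> list[float]:
    """Return the budget fractions that current spend has reached but that
    have not triggered an alert yet, in ascending order."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spend / budget
    return [t for t in sorted(thresholds) if used >= t and t not in already_alerted]


if __name__ == "__main__":
    alerted: set[float] = set()
    # Simulate month-to-date spend against a $100 budget as the month goes on.
    for month_to_date in (40.0, 55.0, 92.0, 120.0):
        fired = newly_crossed_thresholds(
            month_to_date, 100.0, already_alerted=frozenset(alerted)
        )
        for t in fired:
            print(f"ALERT: spend ${month_to_date:.2f} reached {t:.0%} of budget")
        alerted.update(fired)
```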
Final Thoughts
Serverless can absolutely be cheaper than non-serverless solutions thanks to the pay-per-use model. The ability to scale down to zero and then scale up as needed gives you a lot of leeway in controlling costs.
However, it is up to us as end users to ensure that we set budget alerts, use anomaly detection, and analyze our bills to find ways to spend effectively.
It is also important for serverless providers to offer the tools and dashboards that protect their users from unwanted events. Transparency in pricing is also key. “Free tiers” are never truly free, and vendors should provide ways for users to calculate and project where their spend could go based on a variety of common factors.
In recent years, a practice known as FinOps has grown in popularity. FinOps practices allow organizations to plan their spending and set boundaries on their costs. It’s more than just a toolkit; it’s an operational framework. Vendors need to provide tools and transparency that empower users to implement FinOps practices on their platforms.
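As an illustration of the kind of projection meant here, below is a minimal Python sketch that extrapolates month-end spend from month-to-date spend under a few traffic scenarios. It makes two loud assumptions. First, the average daily cost so far is representative. Second, cost scales linearly with traffic for the rest of the month, which real pricing (tiers, free allowances, network egress) often violates. The function and its `traffic_multiplier` parameter are hypothetical.

```python
import calendar
from datetime import date


def project_month_end_spend(
    month_to_date: float,
    today: date,
    traffic_multiplier: float = 1.0,
) -> float:
    """Project month-end spend from month-to-date spend, assuming the average
    daily cost so far (counting today as a full day) is representative and
    scales linearly with traffic for the remaining days of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_rate = month_to_date / today.day
    remaining_days = days_in_month - today.day
    return month_to_date + daily_rate * traffic_multiplier * remaining_days


if __name__ == "__main__":
    # $300 spent by June 10: what if traffic stays flat, doubles, or goes viral?
    for scenario, multiplier in (("flat", 1.0), ("2x", 2.0), ("viral 10x", 10.0)):
        projected = project_month_end_spend(300.0, date(2024, 6, 10), multiplier)
        print(f"{scenario}: ${projected:,.2f}")
```

Even this crude straight-line model shows why a viral moment is dangerous. In the example, a tenfold traffic jump turns a roughly $900 month into roughly $6,300.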
-Photo courtesy of Maitree Rimthong via Pexels