Ben: Hello and welcome to PodRocket. Today, I'm here with Leon Kuperman, who's the founder and CTO of CAST AI.

Ben: How are you, Leon?

Leon Kuperman: Hey Ben, how are you? Good to be with you.

Ben: Yeah. Excited to have you and excited to learn about CAST. So, why don't you give us a quick overview of what you're building?

Leon Kuperman: Sure. So, I've been doing this kind of startup thing for most of my life. Very rarely am I in large companies. The last large company I was in was because our startup was acquired by Oracle and I had to do my stint at Oracle, which was actually much more fun and interesting than I thought it was going to be. I was pleasantly surprised. But one of the things that we noticed... And in startup life, you tend to solve problems that you have. If you've never faced problems in the industry, it's hard to come up with an interesting idea. And we had this massive problem. The startup was called Zenedge and we had this massive problem with cost control. So, we were growing the company, growing the product, and the customers loved it. Everything was great, but because we were fully hosted in the cloud... We were in 30 regions of AWS, or however many regions they had.

Leon Kuperman: Every large customer we would onboard, the bill would spike, and then I'd have this massive disagreement and argument with my CEO about why the bill was growing. And honestly, I had no answer. So, while we were pretty successful on the product side and the customer side, the investor side and the acquisition, we failed miserably on cost management. And then I saw this pattern a bunch of times in the industry and thought, I can't be the only guy who can't figure out his bill, even though it is a complicated one. And then we had this vision to say, "All right, what would it take to solve this problem for all customers?" And so, we came up with this platform based on three basic principles, and I'll explain clearly what they are.

Leon Kuperman: The first one is that the world is moving to containers. For those in the audience who are unfamiliar, containers are the smallest unit of computing that gets deployed to cloud compute. The second principle is, if the world is moving to containers over the next five to seven years, they need an orchestration platform. So, it was pretty clear that Kubernetes was the orchestration platform winner for container management. And then the third principle, which is maybe the most controversial, is that if those first two things are true, we do not have enough engineers in the world to manage all of the complexity of cloud infrastructure and container management. We need containers to be autonomous. We need them to run automatically in Kubernetes. And that's the vision for CAST. We want to make Kubernetes an autonomous platform, and we're starting with cost control as the first pillar of functionality that we're helping customers with. And that's what the platform does today.

Leon Kuperman: In a nutshell, customers that already run Kubernetes clusters come to us, sign up, go through a small process, and they're able to see that they can save somewhere between 30% and 80% of their bill almost immediately. And then they can onboard themselves, or we help them with it, and they start saving right away.

Ben: Got it. So, a bunch to dig in on there. I really like that idea of three core principles, and building the company and the product on the premise that, if you believe these principles, then it's very clear there's a need for a product like this.
Ben: The first thing I'm curious about is, you said that typically when someone starts using CAST, you can find 30% to 80% savings right away. What is some of the typical low-hanging fruit you find in customers' applications that can very quickly be fixed?

Leon Kuperman: There's a bunch of these wells of waste, as I call them. So, the first one is human bias in the types of infrastructure you choose. You as a DevOps engineer are used to, I'll just use AWS as an example, M5 extra-large instances. You've used them for the last three years. You're happy with their performance. You're not going to use anything else. Well, that's a bias that a computer won't have. We're going to choose the best infrastructure for the job at the best possible price. So, infrastructure choice is one bucket. The next one is that Kubernetes does a really bad job here. It's a very fair platform, which means if you've got 10 computers in your cluster, your workloads are going to get spread across 10 computers. Well, you might not need 10 computers, right? So, it does this even spray. And so, bin packing is a really big problem for customers.

Leon Kuperman: They don't pack their workloads in tightly enough and they leave a lot of waste on every single server. And then the third one is the sandbagging problem. So, you've got an engineering team that sets up some resource requirements for the workloads, for the applications, and then you have a DevOps team that maybe sandbags that by 20%, and then maybe you have an SRE. No one wants to get woken up in the middle of the night. So, everyone sandbags and you end up with 3X the capacity. So, you're asking for 16 CPUs for an application that needs three CPUs, or something along those lines. So, those are the main pools. And then I would say there's one interesting pool, which is the wrong type of instance lifecycle. And I'll explain what I mean. Amazon and all clouds sell their computers in three modes. The traditional on-demand mode, which is pay as you go: it's a dollar an hour, and as soon as you don't need it, shut it down.

Leon Kuperman: That was the original promise of cloud. And then customers came back to those cloud providers and said, "Well, this is all way too expensive. How do we go cheaper?" And they said, "No problem. Commit to us for two years, three years, pay some money down and we will drop your price. We'll give you a committed use discount," or a savings plan discount. Well, that's all fine and dandy, but then it takes away from the original promise of cloud. Now I'm back into three-year contracts, and what if I want to change something in my business? Well, I'm stuck with this bill. So, we don't like those. They are an unnecessary evil and we don't like them.

Leon Kuperman: And then the third type of lifecycle is this thing called a preemptible or a spot instance. And those are the most interesting for us because they're currently underused in the market. These are computers that no one's using, the cloud's excess inventory, and we can go grab them for a fraction of the price, such as 80% off. Super powerful computers that just aren't being used. Well, the problem with those, and why only 7% of customers use those types of computers, is that they have no SLA. They have no guarantee from the cloud provider, meaning AWS can take that computer away from me with two minutes' notice. GCP can do the same thing with 30 seconds' notice. So, customers don't like the chaos, and unless it's fully automated, they don't want to deal with it.
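To make the sandbagging and lifecycle arithmetic concrete, here is a rough, illustrative Python sketch using the ballpark figures from the conversation (16 CPUs requested for an application that needs about three, and spot capacity at roughly 80% off on-demand). The hourly price is a made-up placeholder, not a real cloud list price, and the percentages it prints are purely illustrative.

```python
# Illustrative only: the CPU counts and spot discount come from the conversation;
# the hourly price is a placeholder, not a real cloud list price.
ON_DEMAND_PRICE_PER_CPU_HOUR = 0.05   # placeholder $/vCPU-hour
SPOT_DISCOUNT = 0.80                  # "a fraction of the price, such as 80% off"
HOURS_PER_MONTH = 730

requested_cpus = 16   # what gets requested after every layer sandbags
needed_cpus = 3       # what the application actually needs

def monthly_cost(cpus: float, price_per_cpu_hour: float) -> float:
    """Crude cost model: CPUs * hourly price * hours in a month."""
    return cpus * price_per_cpu_hour * HOURS_PER_MONTH

baseline = monthly_cost(requested_cpus, ON_DEMAND_PRICE_PER_CPU_HOUR)

# Bucket 1: right-size the request (undo the sandbagging).
right_sized = monthly_cost(needed_cpus, ON_DEMAND_PRICE_PER_CPU_HOUR)

# Bucket 2: run the right-sized workload on spot/preemptible capacity.
spot_price = ON_DEMAND_PRICE_PER_CPU_HOUR * (1 - SPOT_DISCOUNT)
right_sized_on_spot = monthly_cost(needed_cpus, spot_price)

for label, cost in [("over-requested, on-demand", baseline),
                    ("right-sized, on-demand", right_sized),
                    ("right-sized, on spot", right_sized_on_spot)]:
    print(f"{label:26s} ${cost:7.2f}/month ({cost / baseline:.0%} of baseline)")
```

Even this toy model shows how the buckets compound: right-sizing alone removes most of the waste, and moving the remainder to spot capacity stacks another discount on top.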
Leon Kuperman: So, we provide that automation layer that lets us go and buy the cheapest possible lifecycle of instances from the market, and we do that on a continuous basis.

Ben: So, what does it look like to get started with CAST? Let's say I have a cloud application.

Ben: First question, does it have to be using Kubernetes, or are you a bit more agnostic than that in terms of how the application is architected?

Leon Kuperman: No, we are not agnostic. So, if you're using Kubernetes, we're the right people for you. If you're not using Kubernetes, then we're the wrong people for you. And it's a very black and white thing. We're going to spend the next five to seven years, or however many years this journey is, just making this one particular platform as automated as possible. Why focus only there, Ben? Because I believe everyone is going to be there anyway. We're just going to meet customers where they're going to end up in five years.

Ben: Got it. So, I have my Kubernetes application. What does the process look like to get started with CAST?

Leon Kuperman: Yeah. Well, the first piece is free. You just go to the website, you sign up and you run this small agent. It's a read-only agent that installs into your cluster. And then within five minutes you get this big report back that says, "All right, here's where all your waste is. Here's what we would do differently. Here's your cost allocation." We have this beautiful visualization across all your teams. So, Kubernetes is a really good multi-tenant system, meaning you can have multiple service teams in the same cluster sharing infrastructure.

Leon Kuperman: So, we say, "Here's all of the cost structure between all of your components and teams," and how it all breaks down to the lowest level. And then you don't have to do anything. You can just take that report and implement it if you want, or you can say, "Yeah, I want to try this automation." And you move to what is phase two of our platform, which is where you onboard our read/write permissions. So, we create a set of permissions to actively manage your Kubernetes cluster. And then that's it, basically. That's all you really need to do to get started. And there are two modes of optimization: a slow-roll optimization, or an immediate optimization that gets you all the savings over a 10-minute period, and it's your choice which path to take.

Ben: Got it. And so, you make these recommendations about what I can do, and then it sounds like you automate the implementation of those recommendations. Is that accurate? And can you do those automations regardless of whether I'm on Amazon or Google or wherever my application is hosted? You can implement these different optimizations and automations?

Leon Kuperman: Yeah. And that's probably the difference between us and 99% of cost optimization tools. Most other folks give you a report and, Bob's your uncle, "See you later." And then nobody implements those things, because who wants to take the risk of taking a recommendation and then having downtime? Whereas we take this very micro-optimization approach. We start optimizing and go step by step until it's all done. And we can work with any of the three major hyperscalers. So, Amazon, Google and Azure all have Kubernetes offerings that we plug right into. And then we'll have more in the future as other clouds grow in size and scope. So yeah, you don't have to do much. You just turn it on. We take the place of what's called the cluster autoscaler, in Kubernetes terms.
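Leon comes back to the packing problem in a moment with the Tetris analogy; a toy bin-packing example makes the "even spray" point concrete. This is a deliberately simplified first-fit-decreasing sketch (CPU requests only, identical nodes, no memory, affinity or disruption constraints), not a description of CAST's or Kubernetes' actual scheduling.

```python
from typing import List

def pack_first_fit_decreasing(pod_cpus: List[float], node_capacity: float) -> List[List[float]]:
    """Pack pod CPU requests onto as few identical nodes as possible (greedy FFD)."""
    nodes: List[List[float]] = []   # each node is a list of pod requests
    free: List[float] = []          # remaining capacity per node
    for cpu in sorted(pod_cpus, reverse=True):
        for i, remaining in enumerate(free):
            if cpu <= remaining:
                nodes[i].append(cpu)
                free[i] -= cpu
                break
        else:
            nodes.append([cpu])     # nothing fits: add a new node
            free.append(node_capacity - cpu)
    return nodes

# Invented vCPU requests for illustration: "sprayed" evenly, these ten pods
# would keep ten nodes busy; packed, they need far fewer.
requests = [0.5, 1.0, 0.5, 2.0, 0.25, 1.5, 0.5, 1.0, 0.75, 0.5]
packed = pack_first_fit_decreasing(requests, node_capacity=4.0)
print(f"{len(requests)} pods pack onto {len(packed)} four-vCPU nodes "
      f"instead of being spread thinly across ten")
```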
Leon Kuperman: So, you turn that off and we replace that thing, and we're off to the optimization races.

Ben: I'm curious, you replace the cluster autoscaler. Correct me if I'm wrong, but that's Kubernetes' native autoscaling system that implements the autoscaling logic. Why do you think the Kubernetes team hasn't built in more intelligent optimization, since it seems like something every Kubernetes user would want over time?

Leon Kuperman: So, the cluster autoscaler is great. It just doesn't treat cost as a first principle of scaling. And part of it is that it doesn't know what the costs are. It's a pretty dumb engine. It knows about computers and node pools. So, there are two things. Kubernetes has this concept of homogeneous pools, pools of identical machines that get added to a cluster. That's an anti-pattern from my perspective. There's no reason to do that. There's no reason why you can't mix and match. It's like playing a game of Tetris. You can take the blocks that you need to fill the puzzle at that moment. There's no reason to take eight long bars all of the time when those will create a lot of fragmentation in your puzzle. So, I think that the original premise for the autoscaler is flawed in that it's homogeneous.

Leon Kuperman: Now, there are other open source projects that try to solve that problem. One is called Karpenter, which AWS is promoting, and it tries to move in our direction of these heterogeneous, mixed-instance clusters. And that's cool. And then the second piece is that the cluster autoscaler does not have the capacity or the capability to do the analysis of where the market trend is going. And a big part of this, and why it's a SaaS platform, is that we're ingesting all of this cloud data from all regions of all clouds. And we're doing forward-looking analysis to see where the market is going and to see if you're likely to get interrupted. There are all these calculations that are... It's just not something that you want to do locally inside of your cluster. We use a lot of compute to do that.

Ben: So, you're saying you look across all your customers who are using CAST and can be smarter than any one individual tool could be, because it doesn't have that data across the different cloud providers and different applications. Is that accurate?

Leon Kuperman: Yeah. That's one of them. So, I'll give you a little bit of our secret sauce, Ben. The reason why we give everyone a free agent where they can look at all of our recommendations is because it's a value-for-value trade. We're giving you a valuable recommendation that you can use anytime. We're also taking your data and making it part of our machine learning analysis platform.

Ben: I'm curious... Maybe this goes back to some of the things you mentioned at the beginning of the call, like the common sources of waste in an average Kubernetes application. I'm curious, beyond those, any particularly interesting learnings that you've seen from doing data analysis across the pool of all of your customers?

Leon Kuperman: There are some really interesting phenomena happening, and I'm actually in the middle of writing a pretty cool white paper about it. So, you're familiar with Moore's law, the fact that computing power tends to double every 18 months. And if you flip that around, it would imply the cost of compute tends to halve every... If you're producing the same thing with double the computing capacity, your costs should go down. So, all of that's true, and I think that Moore's law is about to...
Leon Kuperman: Based on what we're seeing happening in the market, we see this inflection point where Moore's law, which is really more of an observation than a law, is about to break because of geopolitical issues. So, we have this massive supply chain disruption where, you know, automotive chips are just not available, but it's not only automotive. It's bleeding into everything, right?

Leon Kuperman: So, we're seeing cloud inflation for the first time since the introduction of cloud. And it happens in subtle ways, differently depending on the provider. So, for example, Google just increased their prices on traffic going in and out of certain regions. Full stop, prices just tripled in some cases. Amazon does it slightly differently, where those spot instances I was talking about are becoming more scarce. So, the ability to replace them quickly with other types while the market is dry is super important. So, we learned a lot going through this holiday season, from about November to January, when the market was particularly dry, and we saw prices edging up and then sometimes there would be a complete drought at very critical times. And one of the things that I'm super cautious about, and passionate about making sure our customers have solutions for, is what happens in those times of drought when they absolutely need computers but those computers just happen to be temporarily more expensive.

Ben: You touched on some of the gaps in your competition. I'm familiar with a few of the cloud optimization platforms. One of my colleagues used to work at CloudHealth and I know there's CloudScale. There's a whole bunch of these tools that, currently or in the past, sought to help people manage their growing cloud bills.

Ben: I think I know the answer, but I'm curious to hear from you. What's the pitch for how CAST is different, and are there any ways where those tools are actually a better fit? I can guess one, which is, if you're not using Kubernetes then CAST isn't for you. We've talked about that. If you are a Kubernetes app, what are some of the particular ways where CAST is superior or different from the competition?

Leon Kuperman: So, that's a great question. If you're not using Kubernetes, then CloudHealth and all of those guys... And most of them have been acquired over the last several years. So, they're all kind of legacy platforms at this point, but they still do a great job. They do a great job if you need to reduce your RDS spend or your EC2 spend. They have good rules and heuristics for places where they've commonly seen customers waste money, and that's cool. If you are using Kubernetes, you have a really interesting problem in that you have multiple teams using the same infrastructure. So, at the container level, let's say you get a bill for a thousand dollars for this set of computers. Which team are you going to hold accountable for that thousand-dollar bill?

Leon Kuperman: None of those tools have really good container insights. They don't even have the insights, never mind the self-optimization piece. So, one place where we help our customers significantly is understanding and holding individual services accountable for their spend. And this is where engineers get to learn about cloud economics and understand the best practices of scaling their services in a cost-effective way. And then the second piece, which I think is what you alluded to, is that we're coming from the perspective of, "You can't spend human time on this. You just need to have computers make these micro decisions." And we do that every 15 seconds.
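The thousand-dollar-bill question lends itself to a tiny worked example. The Python sketch below splits a shared cluster bill across teams in proportion to the CPU they request; the team names and numbers are invented for illustration, and real showback tooling would also weigh memory, actual usage and idle capacity.

```python
# Hypothetical example: split a $1,000 shared-cluster bill across teams
# in proportion to the vCPUs each team requests.
cluster_bill = 1000.00
cpu_requests_by_team = {   # invented numbers, purely for illustration
    "checkout": 24.0,
    "search": 12.0,
    "data-pipeline": 4.0,
}

total_cpu = sum(cpu_requests_by_team.values())
for team, cpus in cpu_requests_by_team.items():
    share = cpus / total_cpu
    print(f"{team:14s} {cpus:5.1f} vCPU requested -> ${cluster_bill * share:7.2f}")
```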
Leon Kuperman: It's very hard for a human being to keep up with those types of optimizations.

Ben: So, one of the things we were talking about a bit before we started recording was that one of the problems that developers and teams building applications face nowadays is vendor lock-in. People start building on Google or Amazon or Azure and get locked in over time. And that can be for a variety of reasons. And I know you have strong feelings about this. So, curious to hear your thoughts on why that is a problem, and maybe some of the ways that CAST could be helpful long term in fixing that.

Leon Kuperman: Absolutely, Ben. Now you're going to get me started on this rant. Let me try to summarize it. So, clouds obviously want you to stay in their ecosystem. It's a natural monopolistic tendency; monopolies are worth more than completely open, free-market enterprises. So, there are a couple of things that they do that trap customers. Some of them are around proprietary protocols. I'll give you an example. You could choose to use DynamoDB in AWS, which is one of their database offerings, versus RDS, which can have a PostgreSQL flavor or a MySQL flavor that's available anywhere. If you're using DynamoDB, you're pretty much sticking with Amazon from here on out until the end. You have locked yourself in.

Leon Kuperman: So, I encourage my customers to use open source wherever possible. There's no single service that these clouds have that is so irreplaceable that it can't be swapped for a more flexible and transferable service offering. But the one really egregious practice that I see all three of the major hyperscalers perform is the charging for data transfer. Let me explain what I mean. When you have traffic going into AWS, Google or Azure, you don't pay anything for that. That's called ingress, and you get it for $0 per gigabyte. When you try to take your data out, it's like Hotel California. You can check out anytime you want, but you ain't going anywhere, because that number is so high that it is cost-prohibitive to actually move your data and make it transferable. So, it's 9 cents a gigabyte. That's roughly the starting list price for transfer out, or what's called egress.

Leon Kuperman: And then you might say, "Hey, Leon. That doesn't sound like a lot. I pay a lot more than that for cell data." Yeah, except when you do the math, it's 30X what the transfer actually costs. So, you are paying the equivalent of 30-something dollars per megabit, and Amazon is paying 75 cents, or Google is paying 75 cents. That is egregious, and it has to be fixed, because until that gets fixed... And it might need to get fixed from a legislative perspective. Someone needs to step in to break this monopolistic behavior, because it is only hurting the customer. You are charging money for something that doesn't cost you anywhere close to that number, and you're doing it specifically to prevent customers from moving data around.

Ben: And I guess, to be fair, developers are smart people and understand these dynamics before choosing a cloud provider. But I guess it is your contention that it's an oligopoly between the cloud providers and they're colluding to raise all of their egress fees, such that whichever one you choose, you get locked in, and that's why it would make sense for the government to get involved.

Leon Kuperman: It is an oligopolistic behavior. And it will stay that way until you have a player that comes in that's willing to drop that price to win business. And there is an example of that, but they're not making a huge...
Leon Kuperman: Oracle, my former employer, is a good example of that. Why did they win so much Zoom business back when the pandemic started? Zoom needed to expand like crazy. There were no computers available, but the egress fees were also killing Zoom, because remember, Zoom is in the business of feeding video. So, Oracle was able to come in with a super aggressive egress price and they were able to win a bunch of business, but there isn't enough of that competition to drive prices down naturally. So, we either need a super big player to come in, and it doesn't look like there's one on the horizon, or we need some intervention to say, "Look, guys, there needs to be some equality of costs on the traffic side."

Ben: Right. I guess that's a good point, that there are multiple reasons why egress is important. There are some use cases where part of the core use case is to have users download data or stream video or things like that. And then there's the reason why you want egress just to be able to switch clouds. So, there are market forces that push providers to be aggressive for the former, like Oracle, but not a lot of market forces that would make any of these players want to allow data to be easily sent from their platform to another cloud provider.

Leon Kuperman: Oracle and Microsoft did another interesting thing where they partnered to wire their data centers together over what's called a Layer 2 connection. And the reason for that was they had enterprise customers that said, "Oh, we want an Oracle database. We want a .NET application and we want to host them in the clouds that do those things best." So, Azure applications on Oracle databases, and they didn't charge customers for any of that traffic. That was a smart thing that helped the customer, gave them what they wanted and helped the market overall. And I'm looking forward to seeing more of that collaboration, where these clouds can actually work together more seamlessly without worrying so much about the transient nature of customer workloads. Theoretically, our customers should have workloads in all the clouds. Why not? It only adds resiliency to an enterprise if they can work with multiple vendors.

Ben: Right. And as we were talking about earlier, each of the cloud providers has pieces of infrastructure that are not open source that might be quite good, not irreplaceable, but good. And it benefits all of us to be able to use Google Bigtable or Amazon DynamoDB and not need to pay large amounts of money to send data between the cloud providers in order to use the best of breed of each type of infrastructure.

Leon Kuperman: Yeah. That's a really good example. I love BigQuery. It's so good with Data Studio and other visualizations. It just makes it so easy to use, but am I going to get locked in there? What happens if I want my Amazon applications to use that data? Am I going to be paying a boatload of money to get the data out of BigQuery?

Ben: Right. It's probably ideal for all the developers in the world if Google and Amazon compete on quality of infrastructure tooling, versus just locking you in and then not having as much incentive to make BigQuery an incredible tool, because people are locked in, so they're going to use it anyway.

Leon Kuperman: And Ben, I think that's going to change a little bit, because if you look into the future, we've only experienced macroeconomic growth over the last decade and a half. Since the cloud started, there's only been an upward swing in macroeconomics.
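To unpack the per-megabit comparison Leon made a moment ago, here is a back-of-the-envelope Python version of the math. It uses the two figures he quotes (roughly 9 cents per gigabyte of egress, and roughly 75 cents per megabit of wholesale transit) and converts the per-gigabyte price into an effective cost per megabit of sustained throughput over a month; the exact multiple depends on rounding, but it lands in the 30-40X range he describes.

```python
# Back-of-the-envelope version of the egress math discussed above.
egress_price_per_gb = 0.09           # ~9 cents/GB list price for transfer out
transit_price_per_mbps_month = 0.75  # wholesale per-megabit figure quoted above

seconds_per_month = 30 * 24 * 3600
# 1 Mbps sustained for a month, expressed in gigabytes (decimal units).
gb_per_mbps_month = (1e6 / 8) * seconds_per_month / 1e9   # ~324 GB

effective_price_per_mbps_month = egress_price_per_gb * gb_per_mbps_month
markup = effective_price_per_mbps_month / transit_price_per_mbps_month

print(f"1 Mbps sustained for a month ~= {gb_per_mbps_month:.0f} GB transferred")
print(f"At $0.09/GB, that is ~${effective_price_per_mbps_month:.0f} per Mbps-month,")
print(f"roughly {markup:.0f}x the ${transit_price_per_mbps_month:.2f} transit price.")
```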
Leon Kuperman: What happens when there's a cost crunch and a shrinking of multiples, which we're seeing now in the market, and it's time for everyone to tighten their belts on the P&L?

Leon Kuperman: And so, when that happens, there's going to be a real look at cloud migration, such as, "Does this make sense from a cost perspective? Maybe my old clunky data center is fully amortized and I should stay there." The clouds want migration to continue, and there's a real repatriation movement occurring that has to be considered, so I think the clouds will reevaluate their position on some of these costs to keep migration ramping up. We may be a couple of years from that, but I don't think growth is infinite unless there's some rethinking on the clouds' part of the cost structure and how expensive some of these things can be.

Ben: So, bringing things back to CAST, curious to learn a bit about the future. What does your roadmap look like, both in the shorter term of 2022, but also, what's the long-term vision?

Leon Kuperman: Great. So, the vision is autonomous Kubernetes, and we've only achieved a very small sliver of that. So, this year you're going to see a lot of interesting enhancements come out on the reporting side, and the integration with tools like Datadog and CloudWatch, basically where our customers are consuming data today and how they want to visualize it. And we've got a pretty interesting reporting suite that we just released. It's in a quiet beta right now. But beyond that, there are other areas of autonomous operations that we need to tackle. Very specifically, I group them as day-two operations: upgrading, patching, moving your infrastructure along so that there are no vulnerabilities. That's a whole area of autonomous operation that we will focus on. This year, we will also release the second pillar of our platform, which is a cybersecurity module for Kubernetes. So, K8s is a little bit behind the traditional virtual machine world, in that those tools are fairly well understood and sophisticated. Kubernetes is not as sophisticated, in my opinion.

Leon Kuperman: And coming from a cybersecurity background, I'm really excited to bring some of our learnings to Kubernetes. So, we will release a suite of cybersecurity modules that are all intended to be autonomous and self-managing, and that will tie in with SIEM. SIEM is a security information and event management system. There are a few big ones out there, and what are called SOAR platforms, or security orchestration platforms. So, that's pillar number two, and that will come toward the end of 2022. And it will probably have a three- to four-year roadmap. And then we're also going to be introducing a data governance and high availability solution that's going to make sure that your data stays resilient and your clusters remain resilient and highly available, even when clouds have issues or you might be in a disaster recovery scenario. So, there are three: cost and day-two operations, cybersecurity, and high availability and disaster recovery. Those are the three pillars that we're building the platform on.

Ben: Super exciting future roadmap. I imagine you need to grow the team quite a bit to get there. So, tell me a bit about what the team looks like today and what the plans are in the future to keep growing.

Leon Kuperman: So, our engineering team and a lot of our corporate team are in a Baltic country called Lithuania. It's a way for startups...
Leon Kuperman: If you have to build in the US or Canada or in these super hotspot areas, it's very difficult to afford all the engineering talent you need. So, we started building our team in the Baltics, and we're hiring for a lot of different engineering positions, marketing positions and sales positions. But for engineering specifically, we're doing something really interesting for folks in Ukraine. And it's a little bit personal to me. I was born in Ukraine. I was born in Odessa. And so, everything that's going on there is super important to us and the co-founding team, and we want to give these folks a mechanism to get out of that chaos.

Leon Kuperman: So, for any folks currently living in Ukraine who can get through our interview loop and would like a way out, we're offering help with their visa and their permits and their immigration status, and then three months of housing in Lithuania, if they want to move there, closer to our engineering headquarters. So, we want to take care of all the red tape and paperwork and then offer them a head start to move their family. We think that's an important way to give back to that community.

Ben: Yeah. That's great to hear. It's both a way to help those folks and, I'm sure, a way to get great technical talent to join the team.

Leon Kuperman: It's not purely altruistic from my perspective. We're going to absolutely get the best folks we can get, but we want to help them as well, obviously.

Ben: Well, Leon, it's been great having you, and really exciting to learn about CAST and also just hear some of your perspectives on the overall cloud market and where things are going. For folks who want to learn more, the website is cast.ai. And as Leon mentioned, they're hiring. I imagine you have a jobs page that's like cast.ai/jobs or careers or something like that.

Leon Kuperman: If we don't have /jobs, then shame on us. We'll fix it. But you can just go to our homepage and you'll see the job postings.

Ben: Awesome. Well, thanks so much for joining today, Leon.

Leon Kuperman: Thanks, man. Great questions. Really appreciate it.

Speaker 3: Thanks for listening to PodRocket. You can find us at PodRocketPod on Twitter, and don't forget to subscribe, rate and review on Apple Podcasts.

Speaker 3: Thanks.