TROND: Welcome to another episode of the Augmented Podcast. Augmented reveals the stories behind a new era of industrial operations where technology will restore the agility of frontline workers. In Episode 85 of the podcast, the topic is Industrial Cloud Interoperability, and our guest is Leon Kuperman, CTO of CAST AI. In this conversation, we talk about cloud interoperability, whether it exists, why it's needed, and what it would accomplish. We get into the technical underpinnings such as Kubernetes and containerization and the outlook for public, private, and hybrid clouds, and the vendors that supply such advanced infrastructures. Augmented is a podcast for industrial leaders, engineers, and for shop-floor operators, hosted by futurist Trond Arne Undheim and presented by Tulip. Leon, how are you? LEON: Hey, Trond. Nice to be with you. Nice to speak to you. TROND: So listen, we are going to have some interesting conversations about pretty hefty topics. So let's kick it off a little bit on the personal end. You've got a computer science degree from York University, and then you got yourself into some interesting stuff. I see you have a brown belt in Brazilian Jiu-Jitsu. Tell me how you got into that. LEON: It's funny. It's probably more closely tied to my computer science geekiness than you would assume. So I was always picked on as a kid. I was a scrawny, little 120-pound teenager, and I guess bullies thought I was easy picking. And so, I joined the wrestling team in high school. That kind of gave me the confidence to stop the bullying. And then, after high school, I helped coach wrestling, and it became a big part of my life, grappling in general. And then I found Brazilian Jiu-Jitsu right around the time when the UFC was coming up. And the Gracie Family made their debut in UFC 1. And I was fortunate enough to be in Los Angeles, which is the mecca of Brazilian Jiu-Jitsu. 
What I've noticed since then is a lot of techies practice the sport because it's a lot like chess with a physical element. So when I was in Seattle, for example, at Oracle, I found that a lot of folks from AWS, and from Microsoft, and Oracle, and Google get together in the evenings, and they roll. And then position doesn't matter, company doesn't matter. It's just a tight-knit group of individuals. TROND: That's funny. Have you been watching Cobra Kai at all, the Netflix series? LEON: My nine-year-old loves it. She can't get enough. So she watches it, and I watch it with her. And it's so much fun because of the Karate Kid. TROND: It has that same effect on me too. It's just an interesting way. But I guess I thought of it because there's this whole philosophy clash there between the various schools of thought, but a lot of it has to do with gaining self-confidence. LEON: Yeah. And you know what's interesting about that show is like, I don't live in Los Angeles anymore, but I used to. So every time Cobra Kai goes through like the areas that they're in, Encino, it's like a double layer of nostalgia for me because I recognize all of the parts where they film. TROND: Yeah, that's interesting. Well, anyway, let's go beyond Cobra Kai for a minute. You obviously became a CTO type, or, I mean, you've been working on the technical side for a long time. You were at Oracle for a good while as well on cloud infrastructure. And now I understand you lead an infrastructure as a service company. Let's start there, maybe. Infrastructure as a service, what is that in terms of a business model? What does that mean for you? LEON: So it really started in 2006, I think when AWS or Amazon released Amazon Web Services, when they started with a few simple services, S3 for storage and EC2 for compute. It's a very simple concept. You don't need a data center. You don't need routers and switches, and you don't need to rack and stack your own stuff. You can just go and rent computers.
But unlike previous hosting models, they provided an API or a control plane that said, "Hey, you can do this all programmatically. So you don't need to talk to human beings to fill out orders. Just call this API, and we'll give you a computer. We'll give you storage, and away you go." And that's pure infrastructure as a service with an on-demand billing model. And that's what we're aiming to optimize for our customers. TROND: Look, cloud is, you know, or I guess it has been all the rage for a long time. It's actually, I guess, a little bit surprising that we have this topic still because so much of cloud has to do with speaking together across data boundaries, meaning in order to just upload the data to someone's cloud, you obviously have to put it into a format that works for that other provider. But let's dig into this whole area of cloud interoperability. First of all, let me ask you a very basic question. Does cloud interoperability even exist right now? LEON: It does. It does right now and not necessarily because of a set of pre-thought-out standards that have emerged but because of commercial opportunity and through kind of a capitalist process. There has been a need for businesses to have multiple presences in different clouds in order to communicate together. And so, the very simplest interoperability is like the backbone of the internet, HTTP, HTTPS, and RESTful APIs. And that's not something that necessarily was a long-drawn-out standard that was a collaboration with a lot of big companies. It happened quite organically, which is why I believe it has taken hold and is the main driver for API interoperability. TROND: At a very basic level, interoperability, is that something that in your field have you seen it mostly emerge fairly naturally between, I guess, the big players, for example, Microsoft and Oracle collaboration, or is it largely driven by more government mandates, or is it kind of a mix? LEON: So it's kind of a mix. 
And I would say the most successful interoperability projects have been accidental and opportunistic, and let me give you an example. So Amazon was the first company to provide and implement, at hyperscale, an object storage platform. It's called S3. It has an API, and customers started using it. Now, that's a proprietary API. There's nothing standard about the S3 API. Anything from the authentication stack to the actual data plane is all proprietary. But it's expensive. S3 is expensive. And there became an opportunity for others in the industry to step in and make that service more accessible and less expensive to companies. And so what did they do? They copied the API. So you have companies like Backblaze and Wasabi, and now Cloudflare has stepped in with a completely S3-compatible API. And that's just become the industry standard. So Amazon no longer owns it. I mean, they don't have ownership of the specification. And everyone just writes to that API. Now, as a user of object storage, I have plenty of choices, and I'm not locked into Amazon just because I've used that API. TROND: Well, look, it's a myriad of different organizations that get involved in standardization, as you know. I wanted to see if you could maybe lay it out for us. And I can sort of shepherd along with some of the ones that I can remember. But if you think about cloud standardization, which really is quite important if we believe that cloud is this important technology, what would you say are the fora where this is really happening? So I can just list off a few that I'm fairly aware of. So DMTF is one organization, IEEE obviously a massive standards organization. They have mixed origins, these two organizations. And then obviously, OASIS does some work, and ODF is another standards body. SNIA is more on the data management side. And you started out with some Web Standards, and W3C is a very famous standards organization there.
But then there are also national bodies and international standards like ISO and stuff. And they famously have ratified a lot of standards, which were actually kind of big industry battles. And then they went into ISO standardization as some sort of middle ground, I guess. What are the fora that you spend the most time in or that you think are developing standards that are really helping the cause right now for the industry? LEON: I'm obviously very focused on an area of cloud computing called cloud-native. So cloud-native is kind of the area of taking containerized applications and allowing them to -- TROND: Can you explain that first? I meant to actually get there as well. But can you explain some of the technical side maybe before we get to the standards? Because you mentioned containerization, and that brings us to Kubernetes and a bunch of different things that I think we need to establish that base-level understanding first. LEON: Yeah, absolutely. So when we started CAST, which is the current business, CAST AI, we made three fundamental bets. Those bets are important to understand the technical stack. We said that applications, as a deployment model, are going to move away from being installed on virtual machines on hypervisors to this containerization technology. A container is this skinnied-down operating system without drivers, without hardware support. It has a very basic file system, a basic network stack. And it has very few OS packages, just enough to satisfy that application's requirements. And you, as a DevOps person or an engineer, can specify what you want in that stack. So you can say I need this version of Linux with these dependencies installed, and then my application goes into what's called a container image. And Docker is the famous desktop platform that allows engineers to work with and orchestrate these images.
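The image description Leon sketches here can be illustrated with a tiny generator. The Dockerfile-style directives (FROM, RUN, COPY, CMD) are standard; the base image, package, and path names below are purely illustrative, not anything from the conversation:

```python
# A sketch of "specify what you want in that stack": a container image is
# declared as a base OS plus only the packages and files the application needs.
# Base image, package, and path names are illustrative.

def render_dockerfile(base: str, packages: list[str], entrypoint: str) -> str:
    """Render a tiny Dockerfile-style image description as text."""
    lines = [f"FROM {base}"]  # the skinnied-down base operating system
    if packages:
        # only the OS packages the application actually requires
        lines.append("RUN apt-get update && apt-get install -y " + " ".join(packages))
    lines.append("COPY app/ /app/")  # the application itself becomes a layer
    lines.append(f'CMD ["{entrypoint}"]')
    return "\n".join(lines)

print(render_dockerfile("debian:bookworm-slim", ["ca-certificates"], "/app/server"))
```

Each directive becomes a layer in the final image, which is why two images sharing a base OS also share storage for it.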
So we made a bet that look, we think that 95% of applications, maybe it's a little less, maybe within five to seven years are all going to be containerized as the standard practice. And if that's true, then who's going to orchestrate these containers at runtime? So when you have a container that starts up, you have to manage it. You have to manage its lifecycle, very much like a computer. And so, there were a bunch of competing open-source projects several years ago. And the clear winner of that race is Kubernetes. Kubernetes has its origins in an internal project at Google called Borg. Google took the lessons from Borg, built Kubernetes on them, and open-sourced it. And basically, it's a mechanism of taking tens of thousands or hundreds of thousands of these little containers and orchestrating them on virtual or physical hardware to make sure that they run together, that they're scheduled properly, and their lifecycles are managed. TROND: This makes a lot of sense. I think you also probably need to give a little spiel on open source versus standardization because you said there are various competing or various open-source projects that matter to this. But for some people, open source does mean that these interfaces are standardized because they're theoretically open. But that's actually not really the case, is it? You actually have to actively standardize something. LEON: Yeah, it's not the case at all. And I find that the projects that start in the open-source arena find value for their consumers, find value for the customers that are using those projects, have contribution from those consumers, meaning it's a full lifecycle. So open source works because developers get together and say, "All right, for our mutual benefit, we're going to contribute code to this project. We like your basic premise. We're going to sign up as a contributor. And we're going to donate our time, and energy, and effort to make this a better project."
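The orchestration idea Leon describes, managing container lifecycles so the right things are running, is, at its heart, a reconciliation loop: compare desired state with observed state and compute corrective actions. This is a much-simplified sketch of that idea, not Kubernetes' actual controller code:

```python
# Hedged sketch of desired-state reconciliation, the core pattern behind
# Kubernetes-style orchestration: diff what should be running against what
# is running, and emit start/stop actions to close the gap.

def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[tuple[str, str]]:
    """Return ("start"|"stop", name) actions that move observed toward desired.

    desired/observed map a container name to its replica count.
    """
    actions = []
    for name in set(desired) | set(observed):
        diff = desired.get(name, 0) - observed.get(name, 0)
        actions += [("start", name)] * max(diff, 0)   # under-replicated
        actions += [("stop", name)] * max(-diff, 0)   # over-replicated
    return actions
```

A real orchestrator runs this loop continuously, so a crashed container shows up as a deficit on the next pass and is automatically restarted.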
So Linux is the ultimate kind of famous open-source example. Now, did it lead to standardization of many things? Absolutely. But it started as an open-source project to break the monopolistic operating system that was Windows back in the day. So yes, there's a big difference. I believe that all of the value is first created in an open-source project when it gains traction in a market, whether that's an open-source market or even used in a commercial market. It then lends itself to standardization. And there are many examples we can go over that express that. We can even use some older examples, like SQL. It's an ANSI and ISO standard. SQL stands for Structured Query Language. It's the language of all databases. It didn't start out that way. You had a few players, Oracle, IBM, back in the day. And they decided that they were going to make their query language interoperable at the database level. Are there extensions? Sure. But the fundamental syntax is the same across all databases, and that standardization has led to a lot of additional innovation on top of those commercial offerings. TROND: Yeah, that's right. I mean, ISO, I think it was IBM, I guess, at the time that actually introduced their SQL standard into ISO and, at the end of the day, actually created the business opportunity for Oracle, right? LEON: Yeah. TROND: Because if that hadn't happened, there would have been no Oracle in a sense because they were building on that standard. So that's an interesting example. So now I think we're back to being able to actually talk about standards in your field. So you're saying, you know, we were talking about Kubernetes and container-based standards as like your entry point into cloud. What are the active standards that you are using now in this field? And what are some of the emerging discussions about interoperability in your field?
LEON: At all levels of the stack, there are different standards that matter. So let's go pretty low in the stack, and then we'll choose a couple of examples higher in the stack. At the lowest level, a container has to run in a container runtime; that used to be something a commercial company would provide an implementation of. Docker was one of those companies. Now we have moved to what's called a containerd runtime environment. And that is just a completely open standard environment, so anybody can implement that interface. And I'll find the exact standard in just a second, Trond. But anyone can implement that interface and have an interoperable container runtime. And then folks can implement those with different best practices. So, for example, you can have a very security-oriented container runtime. AWS has actually done a really good job in that they're so concerned about container security. They produced a standards-based microVM runtime called Firecracker that is interoperable with Kubernetes, as an example, and many other container orchestration platforms. So I think at the lowest level, how your containers run, that being an interoperable piece where you can plug and play different frameworks to run the same image and its layers, is very important at the bottom of the stack from a cloud-native perspective. And at the top of the stack, we have application-level standards. So one interesting standard that's emerging right now is a standard called CloudEvents. So what is an event? Every system emits some type of event. And so when you have to transmit data between two systems, really, there are two options: there's a pull option, and there's a push option. So, pulling is kind of clunky because it requires a client to poll for information. So every five seconds, I wake up and say, "Hey, do you have anything more for me? Give me the other stuff." And then there's a push option. And that push option in the last five, seven years has become extremely popular.
So with the advent of open-source projects like Kafka, RabbitMQ, every modern application stack now has a push model where you can write events to an event broker and then subscribe to those events somewhere else and read those events. The problem is there's no standard for the meat of the event. What is inside of the event? So if I write my events in JSON but I have some funky schema, how are you going to easily understand that schema? So what the CloudEvents standard tries to do...it's from the Cloud Native Computing Foundation, CNCF...is it tries to say, "Okay, guys, we're going to interoperate these events regardless of your implementation, whether it's Kafka, Rabbit, or anything else. When you push events across cloud boundaries, let's do it in this kind of cloud event format." And Microsoft and Oracle have been pretty active in publishing this standard. And I think it's gaining some traction because there's a real need for schema normalization of these kinds of documents being exchanged. TROND: So CloudEvents, right? So that you said is one pretty active track right now. If you think about your industry and the way that it has moved towards standardization, do you see some good reason why vendors would wait a little bit if they have the choice, or they perceive they have the choice to push on with some of their proprietary journeys a little bit before they standardize? Or do you think that overall in the cloud, the industry basically you have to compete on other things than these sort of proprietary interfaces? Because otherwise, you don't actually capture enough partners and users of your technology. There seems to be this battle in any case in computer programs, basically, where you obviously want to capture some amount of proprietary traction for your own, whether it is an ecosystem or a product. And some of it would then lock in others to follow it.
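The CloudEvents envelope Leon describes looks roughly like this. The required context attributes (specversion, id, source, type) come from the CloudEvents v1.0 specification; the source URI, event type name, and payload below are made up for illustration:

```python
import json
import uuid

# Sketch of a CloudEvents v1.0 JSON envelope. Whatever broker carries the
# event (Kafka, RabbitMQ, ...), the envelope stays the same, which is the
# interoperability point of the standard.

def make_cloudevent(source: str, event_type: str, data: dict) -> str:
    event = {
        "specversion": "1.0",          # required by the CloudEvents spec
        "id": str(uuid.uuid4()),       # unique per event
        "source": source,              # URI identifying the producer
        "type": event_type,            # typically a reverse-DNS event name
        "datacontenttype": "application/json",
        "data": data,                  # the actual payload, your schema
    }
    return json.dumps(event)
```

A consumer can route and deduplicate on the standard attributes without understanding the producer's payload schema at all.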
But then, very quickly, the benefit of opening up also is very apparent because otherwise, maybe a competing ecosystem starts to emerge that gets more users, and then you're dead in the water. Tell us a little bit about how that logic works in various areas in cloud. LEON: So when you look at the startup world, and you look at technology giants in general, whether they're explicitly saying it or not, they're all vying for monopoly. The golden rule in startup life is to obtain what's called a blue ocean, if you've ever read the Blue Ocean Strategy, and have no competitors. Now, the government says, "Oh my God. You are monopolies, like, this is terrible." But the truth is all companies that say, "Oh, we don't have a monopoly," have a monopoly. And all companies that say, "Oh, we have first-mover monopolistic advantage," don't have a monopoly. So they're usually pitching the opposite from what they have. But the goal is to get to a monopoly. And if you code to a standard as part of your early business model, you're basically saying, I am giving away the interface that I'm going to code to. I'm going to implement all of these things in this standard. And the next person that comes around can do exactly the same thing with very little barrier to entry. So from a startup culture perspective, standards aren't great. They don't help build proprietary gorilla status with customers. TROND: Right. They don't necessarily build a moat. LEON: Exactly right. But the way that startups work around that is they say, "Look, we're going to do our proprietary journey. But we're going to open-source our stack, and we're going to figure out how to make money in other ways." And then that open-source argument led to a fork in the road. What type of license are we going to open-source under? Are we going to open-source under a very friendly license? And we should probably explain to listeners what the different licenses are. But there are basically two camps of licenses.
There's a very open and friendly type of open-source license; some of the examples include the MIT license and the Apache 2.0 license. And then there are other licenses that are very closed, what we call copyleft licenses. And a copyleft license basically says you can have our code, and you can run it. If you modify it or you attach to it in any way, you must open source all of those changes. So if, for example, I extend the Linux kernel, which is under the GPL, a copyleft license, I must open-source those changes. And that's how they keep Linux open and accessible. TROND: It's funny how you call that closed because they themselves call it open. In the community of copyleft, they think that that is the absolutely most open license. But I guess if you compare it on a spectrum, you have kind of copyleft; then you have these more permissive and what you call open licenses, the Apache and maybe MIT-style. And then you have obviously all kinds of blends of proprietary licenses that are...the old word would be closed for those licenses. But I mean, I guess you are using the terms somewhat differently. Because you're right, I mean, copyleft isn't very open in terms of the kinds of things you can do. They are predatory in a certain sense because they are viral, which is probably another metaphor that doesn't work very well these days. But anyway, they contain and contaminate, I guess, all code. So lawyers really don't have a good time with that in large corporations, that I know.
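The license scanning described here can be sketched as a simple gate over dependency metadata. The license identifiers are real SPDX-style names, but the allow/deny grouping below is an illustrative policy choice, not legal advice, and real scanners inspect the code itself, not just declared metadata:

```python
# Sketch of a due-diligence / CI license gate: flag dependencies whose
# declared license could obligate open-sourcing the surrounding codebase.
# The groupings are an example policy, not legal advice.

COPYLEFT = {"GPL-2.0", "GPL-3.0", "AGPL-3.0"}          # "bring your stuff back"
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-3-Clause"}      # friendly licenses

def check_licenses(dependencies: dict[str, str]) -> list[str]:
    """Return names of dependencies whose license would block the build.

    dependencies maps a package name to its declared license identifier.
    """
    return sorted(
        name for name, license_id in dependencies.items()
        if license_id in COPYLEFT
    )
```

Run on every change, a gate like this fails the build before a copyleft dependency ever reaches production.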
So you're right; from a pure community perspective, it is the purest expression of open source. Hey, if you touch this, bring your stuff back. As an entrepreneur, you really hate those licenses. I am totally fine committing code to Kubernetes, but I want to do that on my terms, not be forced into which parts of my platform should be open-sourced and which parts shouldn't be open-sourced. That's kind of my perspective. TROND: I mean, this is obviously a very, very interesting territory. And I guess what I find [chuckles] and why it's interesting to discuss this on a podcast today, I think, is that these are not just theoretical discussions, are they? I mean, this is massive amounts of money, massive amounts of code. And that also means, in terms of running things, running manufacturing and other important operations, this is not just theoretical. I mean, these things will have massive ramifications. If you were to make a big mistake in some people's minds and basically try to merge two codebases and then one actually turned the other one all copyleft, that could potentially have massive ramifications. LEON: Yeah. In fact, in our CI/CD process, which is continuous integration, continuous delivery, we actually have a commercial piece of software that runs to check all licenses before we promote code to production on every single change. It's that serious of a problem for me. TROND: So tell me a little bit about what it is that you actually do at CAST AI then. So you're two years in. You're starting to get some clients here. And your thesis is very aggressive on containerization becoming a much more important principle in the data world. So tell me then a little bit more about how you try to compete, what kinds of things you do for your clients, and what this container-based business model really means in practice? LEON: Great question.
So we work with customers that are adopters of Kubernetes, even if they're early on in what's called their modernization process. So you can roughly define modernization as the move to modular microservices that are packaged in container deployment entities and then pushed into an environment like Kubernetes. There are some competing ones, but Kubernetes is the one we work with. So if a customer uses Kubernetes, we kind of have this basic business thesis: you are over-provisioned on the computers that you use in the cloud, regardless of which cloud you use. And by some statistics, customers are 70% over-provisioned. That leads to massive waste, economic waste from a customer's perspective. And that also leads to massive impact on the environment because you have all these computers running idle, doing nothing, that still need to be cooled and still consume electricity that could be used for other purposes. The whole compute environment would be much greener if we were able to make effective change. So our principle is this: in the first leg of the stool of our product suite, which is resource and cost optimization, we believe customers are vastly over-provisioned and also don't have the DevOps resources to do the changes that they need to do manually. So in our industry, the typical approach from legacy vendors is, hey, we will connect your environment, and we'll give you a list of recommendations of things you should change in practice to bring down the cost. The problem with recommendations is nobody ever takes them. They're a great checkbox for the CFO. But no one ever saves any money on those. Why? Because as a DevOps engineer, if I make that change, and let's say I squeeze the requirements on my application from 16 CPUs to 12 CPUs, which might be a reasonable thing, I'm still on the hook for waking up at night if the application falls over. So there's this natural sandbagging process that occurs at all levels of the stack.
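The remedy for this over-provisioning is essentially automated bin-packing: fit the workloads' resource requests onto as few nodes as possible. This is a hedged, much-simplified sketch (first-fit-decreasing on CPU only); a real autoscaler also weighs memory, SLAs, spot pricing, and disruption budgets:

```python
# Sketch of cost-aware packing: given pod CPU requests, pack them onto the
# fewest nodes of a fixed size. First-fit-decreasing is a classic bin-packing
# heuristic; production systems are far more sophisticated.

def pack_pods(pod_cpus: list[float], node_cpu: float) -> list[list[float]]:
    """Pack pod CPU requests onto nodes of `node_cpu` capacity."""
    nodes: list[list[float]] = []
    for cpu in sorted(pod_cpus, reverse=True):   # place biggest pods first
        for node in nodes:
            if sum(node) + cpu <= node_cpu:      # first node with room
                node.append(cpu)
                break
        else:
            nodes.append([cpu])                  # no room anywhere: new node
    return nodes
```

With requests of 2, 2, 1, 1, 1, and 1 CPUs on 4-CPU nodes, this packs into two full nodes instead of the six one-pod-per-node nodes a naive deployment might provision.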
And if you have an SRE team, you best believe there's more sandbagging that's occurring. And so these overestimates compound. You get to production clusters with tens of thousands of CPUs, and maybe only 10% of them being used. But that's where we got to, and there are a lot of reasons why we got there, which I won't get into. But the thesis is if you don't have enough human beings to take those recommendations, a computer should take over that responsibility of managing the infrastructure that you use in the cloud. So what we do in a nutshell is we connect to a customer's Kubernetes cluster, and we take over the scaling operations of that cluster. We look at the resources that are required, and we make decisions every 15 seconds. That's our lifecycle for decision-making based on cost and SLA as a first principle. So what requirements do you have? How much compute and memory do you need to be successful? How much disk? All the basic principles of cloud computing. And how much do those things cost? And let us put you on a lifecycle that lowers your commitments to the clouds, saves you money, and ultimately packs those resources into the smallest possible footprint we can. TROND: So, Leon, in a really simplified way, a container or a container strategy and an end vendor like you essentially helps someone with very large computer needs to optimize the way that they're using cloud storage or any sort of cloud capacity. And you're basically almost...and you said on a 15-second basis you make these choices about where they should put their resources, where they should buy into, and which type of resources they should be using at any given moment. And that is a process that, up until now, when done manually, I'm assuming like no human being would sit there and make decisions every 15 seconds. That just doesn't make any sense. So you're like, okay, look, this year, we went for this service.
And yeah, they're a little expensive, but that's what I went with. And what this elasticity allows is a completely different way of allocating computer resources. That's what I'm understanding. LEON: Exactly right. So think of a container as just a little small computer that runs on much bigger computers. It's virtualization but without all of the excess heavyweight drivers and operating system stuff that you don't need when you're running microservices. Like, when you're running a microservice, there's no need to have a video driver unless you need a GPU. So it's a simplified computer, and you can have many, many...so an average AWS instance can run 110 of these what are called pods or containers on a computer. So think about a 1 to 100 ratio. That's two orders of magnitude of scale. TROND: Just super quickly, you said microservices. That's also another term that not everybody fully understands. Why were you saying microservices as opposed to other generic computer services? LEON: So a microservice is basically a web service, so think of a RESTful API that only concerns itself with a very small domain. So, for example, in a production system, you used to have a monolithic application, and that monolithic application would have the ability to take care of authentication, logging, tracing, the application business logic, the database connection pools, and many, many, many other things. So it was considered a monolith because it would get published as one big unit. And so, the trend of modernization is to take a monolith and to break it down into multiple discrete services that each are responsible for a simple thing. So you would have something like an authentication service. You would have an audit log service. You might have a billing service that takes care of billing your customers. And the idea here is, yes, they have interoperability requirements. They need to talk to each other. 
But they act as independent entities, maybe with their own database connections, their own HTTP server ingress, and so forth. So we're really trying to make it so that failure of a monolith doesn't happen. You might have failure in microservices, and that's fine; other services continue to run. Imagine if every time AWS had a blip, all the services of AWS would go down, like, database, compute, storage, machine learning, you name it. I mean, it would be an unusable cloud because there are failures every day. So by decomposing these useful utilities into smaller pieces, you can have failures that consumers tolerate. TROND: So I wanted to bring this down to manufacturing. I know your family business is in manufacturing. Take all of this down to what a manufacturer might have to deal with. So in terms of cloud, historically, manufacturers were buying on-premise compute, and they were installing these systems on their shop floors. And they had these control systems, proprietary, all of them, that were essentially one system controlling one machine. And then this has gotten a little out of hand because you have more than one machine. But then these machines are expensive. And I'm sort of paraphrasing a little bit here. But essentially, even for fairly small manufacturers, they found themselves with a bunch of machines that were working more or less, but the control systems that they were operating with were only running one machine, and then they obviously had some accounting needs or had these systems of record, as it's called in manufacturing, where you have to kind of keep count of what you're doing. But very rapidly, this has become a bit of a disaster in the sense that you need to train all these people to operate these very archaic systems. And then comes along cloud. What does that do? And what is the relevance of all of the things we've been talking about here to modern manufacturing services?
And is it possible to move to this cloud environment if you basically have an on-premise system from the get-go? LEON: Yeah, that's a fascinating topic. And it's one that I spent a lot of time thinking about just out of personal interest. But if you think about it, as you said, you have these programmable logic controllers; Siemens was a big manufacturer of them, and Allen-Bradley was kind of the 800-pound gorilla back in the day. And they all used proprietary, low-level, what's called ladder logic implementations. So even just to code those things, you had to be a specialized engineer or programmer. And then you have this adoption or modern era of IoT, so a lot of buzzwords around IoT, but it stands for Internet of Things. And what does that mean to manufacturing? Well, all it means is the manufacturing shop floor is now connected to the internet. Like, you have reliable, safe, and fast pipes to the cloud. So you don't have to do all of your processing on the shop floor. Does that mean you should do everything in the cloud? Absolutely not. Because imagine someone comes in, you have AT&T digging a trench, and they cut your fiber. All of a sudden, your whole shop floor is down. That's not going to work for you, right? So you need to be a little bit more sophisticated about where you put your compute. But as an example, let's say you have an advanced manufacturing facility and you want to do some quality assurance. So you install a vision system to inspect the parts that you're producing in your manufacturing cycle. So you have a camera that takes a picture and is able to discern high-quality parts versus parts that might be defective. And usually, that's the realm of human beings, but you want to apply some artificial intelligence or machine learning to do that. Well, you need a vision model to be trained to understand the difference between good and bad parts. Are you going to train that model on the shop floor? Probably not. That doesn't make much sense.
You're going to have a lot of expensive GPU horsepower sitting there literally collecting dust on the shop floor [chuckles] and sucking up expense. So what you want to do is you want to take the data that you collect off of your assembly line, ship it to the cloud, and let the cloud do the heavy lifting to produce a machine learning model that can discern the difference between good and bad. And then you ship that model back to your shop floor, and that runs semi-connected, meaning even if you're disconnected from the internet, that model can still run and discern the difference between good and bad parts. You might not be able to get a new model down until the connection is brought back up. But your factory is certainly not going to go down. So this creation of IoT has really moved the industry forward. TROND: So that brings us to this new word edge, right? It's like, how much compute can you do locally on these edge devices versus how much can you do on the cloud? And I guess for you, the answer is pretty evident because, for some of these very, very compute-intense algorithmic calculations, you don't really have a choice, even though the edge might become very powerful. This is the domain of high-power compute, and it is a cloud task unless you happen to have a supercomputer stacked in a corner. I mean, you're not going to run these massive calculations on the edge. But it seems to me that we would be...and I guess this goes now to my more futuristic question, you know, what is the outlook for this type of computing when it comes to industrial clouds? Because we haven't talked so much about that. But I know you have some opinions about whether people should have one or two clouds and what kinds of choices they make. So it seems to me, just as a base-level observation, that you have to build flexibility into your system.
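The semi-connected pattern Leon describes (train in the cloud, run the model locally, keep running on the last cached copy when the link is down) can be sketched roughly as follows. This is a minimal illustration, not any vendor's API: the cache path, function names, and the threshold "model" are stand-ins for a real trained vision model.

```python
import json
import os
import tempfile

# Where the shop floor keeps its last known-good model.
MODEL_CACHE = os.path.join(tempfile.gettempdir(), "qa_model.json")

def fetch_model_from_cloud():
    # In reality: download the latest cloud-trained vision model.
    # Here we simulate Leon's cut fiber: the cloud is unreachable.
    raise ConnectionError("fiber cut -- cloud unreachable")

def load_model():
    try:
        model = fetch_model_from_cloud()
        with open(MODEL_CACHE, "w") as f:
            json.dump(model, f)          # connected: refresh the local cache
    except ConnectionError:
        with open(MODEL_CACHE) as f:     # disconnected: fall back to cache
            model = json.load(f)
    return model

def inspect_part(model, defect_score):
    # The line keeps classifying parts on the cached model while offline.
    return "good" if defect_score <= model["max_defect_score"] else "defective"

# Seed the cache as if a model had been trained in the cloud earlier:
with open(MODEL_CACHE, "w") as f:
    json.dump({"max_defect_score": 0.2}, f)

model = load_model()                     # survives the "fiber cut"
print(inspect_part(model, 0.05))         # good
print(inspect_part(model, 0.9))          # defective
```

The factory only loses model *updates* while disconnected, exactly the trade-off described above: inference continues, and a fresh model comes down once the connection returns.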
If you are a manufacturer, or even if you just have large computing needs, you have to have local computing capability that can run fairly independently. And then you have to be very clever about your use of cloud resources because they're expensive, and also inefficient if you don't handle them well. But that, I guess, gets me to this question. So there are a lot of choices out there for cloud computing. Should one pick one vendor and put all eggs in one basket, or should you duplicate and spread your bets out a little bit? And what consequences would those choices have? And then that gets us into a little bit of a discussion about where you see this evolving. LEON: So, as always, the answer depends on where your maturity is as a technology shop and what your goals are. If someone is just starting out in cloud computing and saying, "Okay, we got a bunch of tasks. Let's move them to cloud," don't pick two or three vendors because that's just a complexity disaster that is waiting to reveal itself. Pick a vendor that you like. Go through... there are plenty of criteria to help evaluate. And there are really three or four, maybe five big vendors. But the big ones are Amazon, Microsoft Azure, Google public cloud, Oracle, maybe IBM, and Alibaba if you're kind of on the fringe. So pick a vendor. But here's what you shouldn't do: you shouldn't go all-in on completely proprietary services where there is no opportunity to even swap out that service for something else in case of emergency or in case you really are unhappy with your vendor. And I'll give you a couple of examples. If you choose Amazon and you choose S3 (we talked about S3, Simple Storage Service, before), do you have optionality? Absolutely, you have so many vendors that will support the S3 protocol. Even though it's a proprietary service, you shouldn't be scared at all about using S3. Should you be scared of using EC2? No, because those are just regular computers.
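Leon's optionality test can be enforced in code as well as in procurement: keep any vendor-specific API behind a thin interface that your application owns, so a later swap touches one module rather than the whole codebase. Here is a minimal sketch of that idea; the class and method names are illustrative and not any vendor's actual API.

```python
from abc import ABC, abstractmethod
from typing import Optional

class KeyValueStore(ABC):
    """Application-owned interface; only implementations know the vendor."""

    @abstractmethod
    def put(self, key: str, value: dict) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[dict]: ...

class InMemoryStore(KeyValueStore):
    # Stand-in backend for local development and tests; a DynamoDB-,
    # Redis-, or Mongo-backed class would implement the same two methods.
    def __init__(self) -> None:
        self._data: dict = {}

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

store: KeyValueStore = InMemoryStore()   # the only line that names a backend
store.put("part:42", {"status": "good"})
print(store.get("part:42"))              # {'status': 'good'}
```

The point is the seam, not the store: application code depends on `KeyValueStore`, so "in case of emergency, or in case you really are unhappy with your vendor," the switching cost is one new implementation class.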
But when you get into something like their proprietary database, they have a database called DynamoDB. It's not a standard SQL database. It's basically a key-value store. And I'm thinking of something like Mongo or Redis; they have these key-value capabilities. The API is completely proprietary. And the number of vendors that actually support it is maybe one or two outside of Amazon. Maybe I've seen one vendor that adequately supports DynamoDB. So don't go all-in on DynamoDB; you're just locking yourself in forever. And so it's those higher-level, what I'll call PaaS, platform as a service, offerings that you have to watch out for. Those are the tremendous lock-in levers. And Amazon loves those, and all clouds love those because they're basically getting a customer for life. The switching costs become way too high to consider. TROND: Well, that's interesting. So switching costs, Leon, they haven't gone away, and lock-in hasn't gone away, because you would have thought that this was a big government battle. The government fought it back in the '90s with document formats, when Word documents were kind of the only salient thing that regular users were exchanging. And so there was a big battle around that. And then those eventually were standardized. But now, of course, every decade, every year, brings in another type of lock-in, and you're saying on the cloud side, there still are opportunities for lock-in. So if you take my question then again on the future outlook, what's going to happen to industrial cloud computing? Is it going to continue to be a game of three to five players, with some of them being somewhat proprietary, or all of them pretending that they are very, very open, but at the deeper levels of the stack, they're always going to try to go for proprietary lock-in?
Or do you see any opportunity to really move to a completely interoperable world where essentially, if you like Amazon, then that's fine, but if you don't, switch to something else, and there shouldn't be really big issues around it? I mean, from a public service type of way of thinking, I would think the latter is preferable, but it doesn't seem to me like that's in the interest of big vendors or small vendors, for that matter. Like, a startup wants to create some initial lock-in, like you said. LEON: Well, if you look at how valuations are created, they're created on the perception of whether you have a moat or you don't have a moat. If you don't have a moat, then it's easy to step into a space. So I think about moats all the time. I think about how do I create...maybe it's not an interoperability moat, but I create data moats as an example. But yeah, there are a couple of interesting trends on the industrial side, so let's talk about those. And they come from the vendors; all of the vendors kind of have these offerings now because they're forced to. So industrial players have said, "We want to be able to connect with these clouds reliably, so we need what's called the OSI network stack model. At layer two of the network stack, we want to be able to connect with your clouds, and we want you to provide us an affordable service to do that." So once we connect at that layer two, then all of the layer three and four protocols begin. And we can easily talk to our applications in the cloud on a private line, so that's happened. Every vendor has a private connection that any industrial complex can sign up for, and through a series of vendors, they will get a low-latency, fast connection to their cloud environment that is highly secure. Great innovation and highly interoperable because you can be connected to one or two or more cloud environments if your enterprise needs that.
There's nothing that stops you, with routing protocols, which are all open and standard, from saying, I want to be connected to Google and Amazon at the same time. And then the other interesting innovation that these guys have done is they've created these Outpost offerings. So what they've said is, you like our cloud stack, but you need it local to your environment for many reasons. Maybe it needs to be air-gapped for security for kind of a top-secret implementation, or maybe you just can't afford the latency to go over that wire. Maybe you have other security constraints. So they said, look, no problem. We will ship you one, two, three, four racks of Google cloud, AWS cloud, Oracle cloud. We'll ship that to your facility, and we will run it for you. It'll be in your data center, but you will interoperate with it through what's called the control plane, which is a set of APIs to run it, using our standard API. So you have to change nothing between talking to a public cloud and talking to an Outpost or a secure private environment. You basically just point yourself at another endpoint. So those are two really interesting vendor-led innovations that have led to greater interoperability. However, there's one kind of elephant in the room for interoperability. And no matter what standards get created for data exchange, no matter what standards get created for application interoperability, I believe all vendors...I shouldn't say all, but most vendors are doing a very egregious thing that is detrimental to the development of the next wave of internet innovations. And that's something called Egress costs. I've been very passionate about this topic. So, Trond, can I give a 30-second explanation on what these Egress costs are? TROND: Yeah, yeah, for sure. LEON: Okay. So basically, what these cloud vendors say is, when you send data to a cloud, we're not going to charge you anything for it. Send as much data as you want. It's all free to go.
But when you take data out of the cloud, we're going to charge you. And we're not only going to charge you what we pay plus maybe a small margin, we're going to charge you heavily. So let me just break this down economically. It all depends on the number of bytes you move in each direction. And so AWS today charges approximately nine cents USD per gigabyte in one of its regions, you know, let's say Ashburn or California, to move data out of the cloud. Nine cents USD per gigabyte doesn't sound like a lot of money. But when you know what the cost structure is under the covers, it's well over an order of magnitude more than what you would pay if you ran your own data center. So when I ran my own data center, I would pay what's called a dollar a megabit, that is, a dollar per megabit per second of capacity, per month. Rough measure, right? A dollar. They're charging you the equivalent of $30, 30 times more, to get access to your data. That's not cost of goods plus some margin; that is gouging. And that prevents people from interoperating because even if you have two data centers, let's say you have your own data center, you're pretty close to a cloud location, so latency isn't a problem. You still can't move massive amounts of data, from a cost perspective. It's just unattainable. TROND: So what does that mean then if that Egress cost is there? Is this where you would have to, from the get-go, basically have several vendors, because you either have to keep some of your data in your own data center and then do a lot of the compute in some other vendor's center? Or is that when you need to plug in some optionality for data that you think you might have to move around because you want to re-compute some calculation? So you know that it is data that needs to be mobile and needs to be brought back into the company. LEON: Yeah, at scale, this is a huge gotcha. For most companies that are dealing with data lakes at scale, this is problematic. And it's even led to some talk of repatriation, which means, hey, this whole thing is just way too expensive.
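Leon's numbers can be checked with back-of-envelope arithmetic. Assuming a fully utilized link billed at his data-center figure of a dollar per megabit per second per month, a 30-day month, and 1 GB = 10^9 bytes, the implied markup comes out at roughly the 30x he cites:

```python
# Back-of-envelope comparison of cloud egress vs. owned-data-center transit,
# using the figures from the conversation. Assumptions: a fully utilized
# 1 Mbps link at $1 per Mbps per month, a 30-day month, 1 GB = 10**9 bytes.

CLOUD_EGRESS_PER_GB = 0.09        # AWS-style egress, USD per GB
TRANSIT_PER_MBPS_MONTH = 1.00     # "a dollar a megabit" transit pricing

seconds_per_month = 30 * 24 * 3600
bits_per_month = 1_000_000 * seconds_per_month   # one Mbps, flat out
gb_per_month = bits_per_month / 8 / 10**9        # = 324 GB per month

transit_per_gb = TRANSIT_PER_MBPS_MONTH / gb_per_month
markup = CLOUD_EGRESS_PER_GB / transit_per_gb

print(f"owned transit: ${transit_per_gb:.4f}/GB")  # $0.0031/GB
print(f"cloud egress:  ${CLOUD_EGRESS_PER_GB:.2f}/GB")
print(f"markup: ~{markup:.0f}x")                   # ~29x, roughly the 30x cited
```

Real-world transit is bursty rather than fully utilized, so the effective per-gigabyte cost of an owned link is higher than this ideal figure; the calculation shows the ceiling of the gap, not a precise bill.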
Let's figure out if running in our own data centers makes sense for the scale that we're currently at. I'll just give you a statistic, like, public data: Snapchat spends more than a third of its revenue on cloud infrastructure. Like, all the value they add to their customers with their application and the network they've built, a third of that money goes to the cloud. And it's definitely a high-margin business, and a big chunk of that is Egress. So, where do I think the industry needs to have a breakthrough? It's in this very specific topic. We need to have these vendors come together, and Oracle and Microsoft have kind of done that. And they said, "Look, for you customers, if you want to run Microsoft applications in Microsoft data centers against Oracle databases, we're not going to charge you a lot of money for this interchange of data." So it's an oligopoly in a lot of ways because if there are only three or four players, then no one's changing prices. They can keep these Egress costs artificially high. And I believe that if there isn't natural competition due to scale, then there is going to have to be some type of intervention from a government perspective to figure out how to limit the profitability on that specific line of communication. TROND: You know, given what I heard in the congressional hearings over technology last year, it strikes me that this would have to be explained at an enormously basic level before it could get to policymakers. It's like, even to just explain the basic cloud is a challenge. Now you're talking about a very specific segment of the cloud business. But yeah, it does sound like it hasn't received an enormous amount of attention yet. Why do you think that is? LEON: It's where we are in that adoption lifecycle journey, that famous adoption curve. We're just getting the mainstream adoption of cloud. Like, there's still a ton of IT infrastructure that's in data centers.
People will have plans, people will have...but these are multi-decade transformations. And then, as the quantity of data grows, this Egress problem will quietly bubble to the surface. I mean, the problem is even more egregious than that. Like, just to give you an example, Amazon offers these things called availability zones. So within a single cloud, the best practice is not to put all your eggs in one data center. You want to have some of your application running within, I don't know, 25 miles between data centers. So there's an official kind of availability zone definition, if you will. They charge money for transferring data within their own cloud. So I have customers that actually say, "It's too expensive to have high availability in these cloud environments. We are going to take the risk of going down in an availability zone rather than pay the potential bill of hundreds of thousands of dollars a month just to have applications spread across AZs." If you were an alien that came to Earth and it was explained to you that this is how people build their applications around these limitations, you would think this is an absolutely crazy race of people. So the technology is understood. What we have to do is understand that economics is now getting in the way. And some of that economics we can break. At CAST, we're breaking some of that economics, and we're helping our customers. But some of it we have to wait for either competition to create the pressure or governmental pressure to be there. TROND: Leon, these are fascinating days. And it's a really fascinating but complex topic that we've been discussing today. I thank you so much for enlightening us on the continued labors and journeys in cloud computing. LEON: Thank you so much, Trond. Really great speaking to you, and fantastic questions. TROND: All right. Thank you. You have just listened to Episode 85 of the Augmented Podcast with host Trond Arne Undheim.
The topic was industrial cloud interoperability, and our guest was Leon Kuperman, CTO of CAST AI. In this conversation, we talked about whether industrial cloud interoperability exists and why that matters. My takeaway is that interoperability is a silent enabler of collaboration between systems, which by the same token affects collaboration between people and organizations. Its technical complexity often limits the debate about the subject in non-specialist circles, which is a shame. Given the pivotal importance of cloud infrastructure in today's computing environment, the relative progress made on interoperability will determine the course of products, flexibility, security, and productivity. Thanks for listening. If you liked the show, subscribe at augmentedpodcast.co or in your preferred podcast player. And please, rate us with five stars. If you liked this episode, you might also like Episode 17 on Smart Manufacturing for All. Hopefully, you'll find something awesome in these or in other episodes, and if so, do let us know by messaging us. We would love to share your thoughts with other listeners. The Augmented Podcast is created in association with Tulip, the frontline operations platform that connects the people, machines, devices, and systems used in a production or logistics process in a physical location. Tulip is democratizing technology and empowering those closest to operations to solve problems. Tulip is also hiring. You can find Tulip at tulip.co. Please share this show with colleagues who care about where industry and especially where industrial tech is heading. Finding us on social media is easy. We are Augmented Pod on LinkedIn and Twitter and Augmented Podcast on Facebook and YouTube. Augmented: industrial conversations that matter. See you next time.