Architectural Cloud Design Patterns with Keith Casey ===

[00:00:00] Hi there, and welcome to PodRocket, a web development podcast brought to you by LogRocket. LogRocket helps software teams improve user experience with session replay, error tracking, and product analytics. Try it for free at logrocket.com today. My name is Paul, and joined with us is Keith Casey. Keith is a senior project manager over at Pangea, and we're going to be digging into some topics that touch all projects near and far. We're going to be talking about design patterns, the cloud, and globally rethinking our approach to utilizing these design patterns: where, how, why, and what can go wrong. So excited to dig into it. Welcome to the show, Keith.

Thank you. Thanks for having me.

So right now you're a senior project manager over at Pangea. Maybe we could get into a little bit about what brought you to this point in your software journey and why you feel like you have some opinions to share about design patterns. Really quick, what does Pangea do? What are you doing over at Pangea? And what was your journey to get there?

Yeah, [00:01:00] absolutely. So I've been an API guy for the last 20 years or so of my career. Learning about and building REST APIs circa 2005 or 2006 felt like absolute pioneering at that point. And what I realized pretty quickly is that we all had ideas on how APIs should be built, but none of us were doing it well. Fast forward probably four or five years: when I joined Twilio, somebody was starting to do APIs well, and doing them well at scale. And so we were out there advocating: here's how to build APIs, here's how to consume APIs, here's how to think about that. So when I joined Pangea, one of the things I realized behind the scenes was that people are building their apps in the cloud, and those aren't new muscles at all. We've been building apps in the cloud now for what, 10, 15 years, if you want to be generous about it.

Oh yeah, definitely.

Yeah, but we're applying a lot of the same patterns and practices that we have that entire time. Yes, we still have MVC and active record and all those things. But there are still a lot of our components that are [00:02:00] dependent on the assumption that our stack is all together. Promises in JavaScript are one of the exceptions, but the vast majority of our systems assume that all the components we're interacting with are going to be local. They can all interact, and we can very quickly tell if they're online or offline. We have some great tooling that's starting to address those problems, but even then, the tooling addresses the problem without us understanding what the underlying problem was, what underlying design change we should make, and how we could look at the world differently and potentially not even hit that problem in the first place.

So now that you're over at Pangea, do you feel like, even now that you're in a well-established company, there are a lot of misconceptions that you've brought over, as someone who calls himself a pioneer of the API days? Do you feel like we're still behind? And I don't want to pick on Pangea; this is with any company, no matter how great your software is.

Yeah, and this is not specific to Pangea at all. In fact, it's not even internal to Pangea that I've seen this. [00:03:00] It's that the people who are integrating these components are relearning the same lessons over and over again.
Every time there's a new JavaScript framework, they absolutely nail one or two things and then they completely forget a bunch of other lessons. And we end up in this situation where we're having to re-solve a problem that has frankly been solved in one way or another for decades, but was really solved well with that last framework you were using before you switched over. And one of the things that we see is that when we're thinking about designing these apps, we're not thinking as distributed system developers. We're thinking as web developers. And the vast majority of web development is request-response. It's very predictable, very reasonable. It's a one-to-one relationship between those things. When we start thinking about distributed systems, and that manifests through something simple like webhooks, we have to assume that just because we make a request, we may not get a response. Or we may get two responses. We may make five requests and get five [00:04:00] responses out of order. And we have to think about how we're addressing those things, how we're considering them, as we're building our systems.

I'm sure folks who use request and response, myself included, are following some documentation. And these design patterns, if we want to call them that, are put forth even more so by frameworks. I like that you mentioned frameworks. The frameworks really push where you steer your head and where you put your finger in the wind for, hey, how am I going to build this thing? A lot of times the opinions come through the grapevine. So when people are building, what do you think causes that misalignment of, "I think I'm using a design pattern, I think I'm following documentation," and then it ends up with more problems, it ends up with lock-in down the line?

To take a step back: fundamentally, a design pattern is not a multi-tool. It's not going to solve every problem you have. A design pattern is just a common, repeatable solution to a problem. That's really what it is. And what I really feel design patterns are great at is [00:05:00] communicating a lot of information very succinctly. So if I say MVC, you know what MVC is. Just like if I said we had a Goldilocks problem, most of your listeners would say, oh, there must be something that's too big or too small, and we're trying to find the thing that's just right. We were able to express a lot of information, an approach, and a problem state in one phrase. Design patterns are the exact same way, and what most frameworks do is bake a few design patterns implicitly into the system itself. Most of them have some sort of active record model. Okay, cool. Active record is a fantastic design pattern. It makes sense for a lot of things, but there are trade-offs that you have to make. And if you understand those trade-offs, you can look at your particular situation and say, okay, the trade-offs that active record brings are acceptable in my situation. But if you don't know those trade-offs, and if you don't know what the constraints of your system are, when you bring that [00:06:00] pattern into your system, you may have nailed it and you may be fine, or you may have introduced constraints or trade-offs that aren't going to work in your situation. And so we end up in this problem where the thing we chose, that we thought was going to be the solution to all of our problems,
just introduces different problems, and we don't even know what those problems are yet.

Speaking of problems, one of the things you listed in your slide presentation (Keith was kind enough to send over some of the notes he's put together about design patterns) that stood out to me was lift and shift, because personally I've heard of this for years and years: we're going to take one system and merge it with another. We're going to take one system and migrate it. Oh, that's a good word, right? We're going to migrate the set of API endpoints. Oh, that's going to be the death of me. Can you speak to why we get to this point? What is lift and shift at the end of the day? And why do you see that in particular causing problems?

Yeah, absolutely. So lift and shift, fundamentally, is the idea that we could take [00:07:00] something from our existing architecture, the existing infrastructure, or whatever, pick it up and move it over to a completely different infrastructure, a completely different architecture, and everything will work just fine. And I think containerization is a fantastic step along that route. We've done that for what, 10 years now, with Docker and now Kubernetes and everything. That's a good route; we're in pretty good shape there. What we're missing is that even when we have things containerized pretty well, and you've got your web app containerized pretty well, there are still all these other dependencies. It's still dependent on the database server. It's still dependent on your email service provider, your SMS provider, your caching layer, all these other things that individually don't seem like much. And so when we lift it out of one place and shift it to the other, we don't necessarily know where all those threads go. And if we don't understand that, we're probably [00:08:00] going to have an outage. We're probably going to have problems. And what we're ignoring through that process anyway is that the constraints we built under on the initial architecture and the constraints we have on the new architecture may not be the same. And if they're not the same, then we need to rethink how we're doing things. It could be as simple as our networking: our ingress layer has changed, okay, we need to change that configuration. Some of that stuff is going to be pretty clear and obvious, but there are so many dependencies that just slip through the cracks that we don't even realize. When we think about this from an external web developer standpoint, when we're building apps for the general public, this is generally a lot simpler. There are going to be fewer dependencies, and they're more likely to be clean, predictable APIs. But when we're building this for enterprise apps, when we have an app that has been in production in our company for 5, 10, 20 years, that's a nightmare scenario. That's the kind of thing where there are a ton of dependencies that we probably haven't even considered, let alone [00:09:00] found yet.
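To make those hidden threads a bit more concrete, here is a minimal sketch in TypeScript of one way to inventory the external dependencies an app assumes and verify them at startup on the new infrastructure. The dependency names, URLs, and the `ping` helper are hypothetical placeholders, not anything Keith prescribes.

```typescript
// A minimal sketch of making the hidden "threads" explicit before a lift and
// shift: list every external dependency the app assumes and verify each one on
// the new infrastructure at startup. Names, URLs, and the `ping` helper are
// hypothetical placeholders.

type DependencyCheck = { name: string; check: () => Promise<void> };

async function ping(target: string): Promise<void> {
  // Placeholder: in a real app each dependency gets its own connectivity test
  // (a SELECT 1, a Redis PING, a provider health endpoint, and so on).
  console.log(`checking ${target}...`);
}

const dependencies: DependencyCheck[] = [
  { name: "database", check: () => ping("postgres://db.internal:5432") },
  { name: "cache",    check: () => ping("redis://cache.internal:6379") },
  { name: "email",    check: () => ping("https://api.email-provider.example/health") },
  { name: "sms",      check: () => ping("https://api.sms-provider.example/health") },
];

export async function verifyDependencies(): Promise<void> {
  const results = await Promise.allSettled(dependencies.map((d) => d.check()));
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      // A failing check on the new infrastructure is a thread you missed.
      console.error(`dependency "${dependencies[i].name}" unreachable:`, result.reason);
    }
  });
}
```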
Do you feel like that's a product of misusing design patterns? Or is it a product of a misconception of the strength of the design pattern, saying, oh, it can lift the world, it's Atlas, so we can glue them together and lift-and-shift it?

That's a good reference. I think some of it is being unaware that, because our constraints have changed and our requirements have changed, our tooling and our approach and our thought process for solving those problems need to change also. In some cases we have ended up with a misapplication of design patterns. I have a LinkedIn course on design patterns in PHP, and probably once or twice a quarter somebody takes that course and I get questions about the singleton pattern. Everyone loved the singleton pattern because it just made so much sense, but then they misused the hell out of it. They embedded it all over the place, they did so many bad things with it, and they overused it to the point where now, in many [00:10:00] applications, and this is very common in old-school PHP, you end up with this pool of globals. Globals are terrible because anyone can change the state. And when we're thinking about enterprise applications, we have a lot of those old patterns built in. But we didn't realize: wait a minute, if we change the global state of something, that means it's different for this developer and this situation and everything like that. And we don't even think about that, because it worked for our use case.

I think it's probably a common thread that we're running into that's now just multiplying. So you mentioned downtime as an example. It's just unavoidable, especially in these lift-and-shift scenarios, because you're going to end up with threads and tendrils that you didn't expect. Let's say we're not even focused specifically on lift and shift, but we're talking about a cutover, maybe a blue-green deployment, whatever it might be. Can you talk to us a little bit about what makes cutovers so challenging, and, from your experience in the past five or 10 years, [00:11:00] which design pattern practices help minimize the impact that has on the org and the users?

I think blue-green deployment is the single best solution that we've come up with. Done well, the resulting downtime should be measured in seconds. And that is a wonderful thing to have. I spent five years at Okta, and I remember the first time our blue-green deployment's downtime, or really the read-only time, which is basically what the system went into, was consistently under a second. That was awesome, right? We were sitting there dealing with millions of users at that point. I think we had seven or eight thousand customers, and many of them had thousands of their own users. And they'd say, hey, the read-only time (because they didn't even call it downtime) was 0.98 seconds. And I was like, holy crap, that's amazing. That's the kind of thing where, if you happen to hit it, and the odds of an end user hitting it are [00:12:00] exceptionally low, you go, oh, that's weird, you hit refresh, and you go, I don't know what happened there, but it's resolved now. I'd say that's the single best place to be. The problem that we have as an industry is that unless you're on the forefront, unless that system was built in the last, say, 10, maybe 12, maybe 15 years, most systems don't have the concept of blue-green deployment. We're still deploying to an application server. Maybe we have some sort of pooling where we can shift traffic from one set of nodes to another set of nodes. But it's not that clean approach that we develop with and use on a daily basis.
Between Twilio and Okta, and now at Pangea, I've worked with a lot of legacy customers just bringing their apps into the cloud in a good, predictable way. Yes, they've been putting stuff on EC2 for 10, 15 years at this point, but they haven't been thinking about, oh, how should we do this differently now? But blue-green deployment is wonderful. Whenever possible, if you're doing that, you're in good shape.

[00:13:00] In general, do you feel like in the past 10 years we've made progress in this direction? And beyond any products you might have seen, what do you think is the current state of zero-downtime deployments? Are we there? Do we have a long way to go?

So I would say yes to all of the above. There have been great improvements, great tooling around that, great mindset shifts around that, and great training around that. That's good. I'm very optimistic about the direction we're pointing. The current state of where we are, though: unless you're building a consumer-facing web app, generally at one of the large tech companies (think of not even a Facebook, but somebody who launched and built their app in the last five years), unless you're at one of them, you're still in the dark ages on a lot of this stuff. And you're still struggling with these things that other apps figured out a long time ago, and you're trying to figure out: okay, [00:14:00] I've got the system, I'm trying to get it online, I'm trying to keep it online. The development team that built it, they built it 10 years ago. Now how do we do this? How do we keep this going? I've kind of split the world into those two buckets, but that boundary is not so clear, because if I built a new app today and I'm trying to get it launched, enterprises are a huge customer base. And so at some point I might have to integrate with their systems, whether that's pulling data from their database, whether that's integrating with their identity provider, whatever. I'm going to be dependent on something from their system. So the fact that they can't update that easily is a huge problem in my world.

I really liked the first 15 minutes of conversation we just had, because we talked a lot about what's wrong, what we want to see in these deployments, and some considerations in API design. But of course, the meat and potatoes of this podcast is talking about [00:15:00] rethinking patterns, about reevaluating what we're doing. So we're going to get into the second arm of this conversation. Right before we do that, I want to remind our listeners that this podcast is brought to you and sponsored by LogRocket. So if you're building a web app, no matter how big or how small, and you want to spend less time in our favorite place, the console (not really), and more time writing code for your app and actually making stuff for your users, you can go to logrocket.com and you get a bunch of features such as heatmaps, AI-driven data collection, and full session replay to see where your users are clicking and garner actual real data. So head over to logrocket.com and you can spend more time building your app and less time debugging.

So let's talk about rethinking some of the patterns and modes of thought that might have led to inefficiencies and anything less than that gold standard of blue-green deployments and avoiding cutovers. Primarily you like to focus on, or at least at the top of the [00:16:00] conversation we're looking at today, specifically
rethinking in the cloud. The cloud's awesome; everybody likes not managing servers. Is it the most cost-effective decision for you? I don't know, that's up to you and your business. But at the end of the day, there are a lot more tools and services that you can use to make your life easier. Maybe you don't need a giant, Oracle-sized engineering team to achieve a blue-green deployment. So, cutting straight to the chase: what are some of your primary areas where you say, hey, the cloud allows us to rethink this, and rethink it in an effective and efficient manner?

Yeah. There's a whole bunch of areas we can hit. Obviously we all know what microservices are, but I like a lot of things about distributed system development, about event-driven systems. Webhooks are a big piece of that; I love that approach. Last year I launched a website called webhooks.fyi, [00:17:00] when I was working at ngrok, and it was all about documenting different approaches to webhooks and getting a state of the industry, and hopefully helping make things better. Zoom and HubSpot and a few others have used that to improve their webhooks. So I'm a big fan of, okay, let's think about this from a distributed system standpoint, where when we make a request, we may not get a response, or we may get multiple responses, or they may be out of order, all kinds of things like that. But I'm also really focused on how we make that transition smoother. Okay, cool, we're on a single node, or a single EC2 node, or an actual physical server. Now how do we think about moving this to maybe not just the cloud, but maybe start splitting this up? Start thinking about the individual containers that go behind this, so that we can start replacing pieces of the system. Because fundamentally, legacy code sucks. It's a huge investment, it's a huge time sink, and odds are it doesn't meet your current requirements. That's probably why we call it legacy, because if it worked and we were all happy with it, we wouldn't call it legacy code. We'd call it our app, and we'd be happy with our app. [00:18:00] So in this talk, I really focus on three things. One is the strangler pattern. Then we've got pub/sub, publisher-subscriber, which is very common in webhooks. And then we've got the circuit breaker, for when things break, to make sure that we're approaching things well.

Let's start with the circuit breaker, because that's the last one you mentioned. Let's go in reverse.

Yeah, absolutely. Fundamentally, a circuit breaker: at your apartment or your house, when you have a short somewhere in your electrical system, it trips the breaker. And so you have to go over, find your utility box, and flip the breaker back. Fundamentally, a circuit breaker is all about: I have an error in the system somewhere, therefore I'm going to turn off traffic, or in this case electricity, to mitigate danger, to mitigate damage, everything like that. The circuit breaker as a design pattern works exactly the same way. I have an outage in the system. So instead of piling up requests, think of a queue filling up, instead of filling up that queue and hoping at some point that [00:19:00] service comes back, and now we've flooded it with all this traffic, the circuit breaker pattern says: hey, this system is down. Let's fail fast instead of waiting on all those requests, and go on and do another process. Let's not block here.
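To make the fail-fast idea concrete, here is a minimal circuit breaker sketch in TypeScript. The class, thresholds, and the downstream endpoint are hypothetical illustrations of the pattern Keith describes, not any particular library's API.

```typescript
// A minimal circuit breaker sketch. After `maxFailures` consecutive failures
// the breaker "opens" and fails fast instead of queueing more work behind a
// dependency that is already down; after `resetAfterMs` it lets one trial
// request through again.

type AsyncCall<T> = () => Promise<T>;

class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private maxFailures = 5,
    private resetAfterMs = 30_000,
  ) {}

  async call<T>(fn: AsyncCall<T>): Promise<T> {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.resetAfterMs) {
        // Fail fast: don't pile requests up behind a known-bad dependency.
        throw new Error("circuit open: failing fast");
      }
      // Otherwise fall through in a "half-open" state and try one request.
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null; // success closes the breaker
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.maxFailures) {
        this.openedAt = Date.now(); // trip the breaker
      }
      throw err;
    }
  }
}

// Usage: wrap a flaky downstream call so the rest of the request keeps moving.
const logBreaker = new CircuitBreaker();

async function writeLog(entry: string): Promise<void> {
  try {
    await logBreaker.call(async () => {
      const res = await fetch("https://logs.example.com/ingest", {
        method: "POST",
        body: entry,
      });
      if (!res.ok) throw new Error(`log service returned ${res.status}`);
    });
  } catch {
    // Breaker is open or the call failed: queue locally and retry later.
  }
}
```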
It's actually a common pattern in logging. I want to be able to write a log. Okay, my logging system, for some reason, I can't communicate with it. Maybe it's offline, maybe there's a misconfiguration, whatever. Okay, let's write this to a queue and let the queue figure out how to resolve that later. It ends up being very common in a lot of notification systems also, where we have an outage downstream: let's hold off on this. And when we're thinking about these blue-green deployments, when we're thinking about scaling our infrastructure, we need to have some sort of circuit breaking capabilities. You might also call it health checks: this particular node, this particular cluster, this infrastructure, whatever, there's something wrong with it. It's not giving back the results we expect. Let's take it out of our cluster [00:20:00] so we're at least not sending traffic there and potentially breaking things. Because all it takes is a misconfiguration, or maybe something got upgraded in that particular cluster, and it changed the behavior of the underlying system. It's giving back results we're not expecting. It's probably doing work we're not expecting. Therefore it might be modifying data in some way, it might be sending emails to users, it could be doing all kinds of things. The circuit breaker will take it out of the system, isolate that component so we're not using that broken component anymore, and keep the overall system functional.

Do you find that circuit breakers are common? I have to say, in my experience working on a couple of SaaS apps in the past few years, I have yet to see something that truly sits at the top level and says, hey, the health checks are wrong. And it's not just one health check; I'm listening to a few, and if one goes out, then I cut it. Is this a newer thing that's coming out? Where have you seen it in practice?

In digging around to make sure I've described it well, [00:21:00] they've been publicly described since about 2013, 2014. So we're looking at this having been out in the wild as a concept for a decade, but it's not in many places. If you have a load balancer, odds are the load balancer has some sort of circuit breaking capability built into it. But that's very much backend, something a web app developer is probably unlikely to see. I think moving this more to the front end makes sense for things where you're interacting with systems where you have zero control. So think of interacting with Stripe. When I send off my request to Stripe, if I don't get a response, which, granted, Stripe's pretty reliable, so that seems unlikely, but if we did have that problem, to be able to go, okay, we haven't received a response for that yet, let's cut off more billing until we get confirmation that it's back online. I think moving that sort of capability to the front end makes a lot of sense. Because at the end of the day, we don't want to take our app offline. We don't want to block the user. We don't want to frustrate them and give them a reason to go [00:22:00] do something else.

That was a great clarification, thank you for saying that, because with the analogy of a circuit breaker, I'm thinking about actually flipping the physical thing, and I'm thinking, wow, this directly relates to infrastructure, and it totally could, like you said, with load balancers.
They have this tooling built in. You could use Terraform, the UI, whatever it is, to wire up your health checks. But what you're noting here is specifically one level higher: let's bring this into the application logic, let's try to write some guardrails around that. So do you feel like this is something that could be done in code, essentially? In your application, you can write these fail-safes and these checks, these global checkers. I don't want to use the word global inappropriately here, but that kind of concept.

Yes, I think we need to bring this more into the front-end application. But there are some trade-offs, just like any other design pattern. There are trade-offs we have to think about, because a circuit breaker is inherently saying this component is [00:23:00] broken for some reason. We don't know why; we have to take it out, we have to turn it off, we have to disconnect it from the system. So in our applications, we need to have awareness of: look, this will hit errors at some point. How do we build around that? How do we catch the exception that's coming back and, first of all, trip the circuit breaker so we can remove this from our flow? Then we have to be able to say, okay, with that removed from our flow, how does our system work? Do we allow it to continue to work? What capabilities does the system use or lose in the meantime? So if we were sending text messages and we said, okay, Twilio is our primary thing, and something is wrong with the Twilio account, let's stop using Twilio: does that mean you don't send text messages, or does that mean you have a backup provider that you now try to send through? And you need to think about those decisions before you start implementing these patterns. And then, on the flip side: when that circuit breaker resets, how do you then go back to the [00:24:00] original, designed flow that you had in the first place? And frankly, there are a lot of people who aren't thinking in those terms. They're thinking at the infrastructure layer. Yes, it's there; your load balancer will remove things from your cluster and add things to your cluster automatically for you. That's wonderful. But when we think about the web app, of, okay, now the system's not responding to us, let's take it offline, now what do we do? We're not describing that well. We're not thinking about that well, and we need to.
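Here is a short sketch of that "backup provider" decision, again in TypeScript. The provider names and send functions are hypothetical placeholders, not real SDK calls; the point is that drop, queue, or fail over has to be decided before the breaker ever trips.

```typescript
// A sketch of the "backup provider" decision. Provider names and send
// functions are hypothetical placeholders, not real SDK calls.

type SmsProvider = {
  name: string;
  healthy: boolean; // flipped by a circuit breaker or health check elsewhere
  send: (to: string, body: string) => Promise<void>;
};

const primary: SmsProvider = {
  name: "primary-sms",
  healthy: true,
  send: async (_to, _body) => { /* call the primary provider's API here */ },
};

const backup: SmsProvider = {
  name: "backup-sms",
  healthy: true,
  send: async (_to, _body) => { /* call the backup provider's API here */ },
};

async function sendText(to: string, body: string): Promise<void> {
  // Prefer the primary; fall back only when its breaker has marked it unhealthy.
  const provider = primary.healthy ? primary : backup;
  try {
    await provider.send(to, body);
  } catch {
    if (provider === primary) {
      primary.healthy = false;      // trip the breaker on the primary
      await backup.send(to, body);  // degrade to the backup instead of dropping
    } else {
      throw new Error("both SMS providers unavailable; queue for retry");
    }
  }
}
```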
Thinking about breaking a circuit, cutting off its power, makes me want to jump to the first design pattern you mentioned when you asked which one we should go to, which is the strangler. So, strangling things in the same way we're breaking the circuit? I assume not, because we're at a different design pattern here. Tell me a little bit about the strangler.

Yeah, absolutely. So the strangler pattern is used a little bit differently. Let's say you have that big, ugly legacy system, and most people's initial reflexive action is: I want to rewrite it. Okay, cool. That's terrifying, though, [00:25:00] because if it's working, if people are using it on a day-to-day basis, they probably need it. So if you go to your team lead or your CTO or somebody like that and you say, I have a brilliant idea, we're going to rewrite the system, that's a bad idea. Maybe they'll be polite and won't laugh you out of their office, but that's the kind of thing where they'll stop taking you seriously. So what the strangler pattern does is help you rewrite systems safely. You have your existing system and it works. It works in ways that you can expect, but you need to rewrite it, you need to improve it. So you do a couple of different steps. One, you put a facade in front of it, a proxy layer, and the first version of that proxy layer is going to be exceptionally dumb. It's basically going to be a replay: it just proxies the requests and replays them into your system. Very simple. Once you have that, then, in parallel with your old system, you start building the new system, and you say, okay, my old system does these five things. Let me pick off this one thing and [00:26:00] re-implement it in my new system. And then in the proxy layer, that facade, you make the facade a little bit more intelligent. You say, okay, my new system handles this one action, so when I get requests for that one action, instead of going to the old system, go to the new one. And the cool thing about this approach is, if your old system does those five things and it does them well, well, now it only does those four things, and your new system with the new architecture and the new setup does that one thing really well. And oh, by the way, your system still works. You never took the old one offline. In fact, even if there's a catastrophic failure in the new system, you just change your facade to point at the old one.

Oh, it's like a circuit breaker, like we just talked about.

And a circuit breaker fits really well there, because you can put it directly into the flow: if I do have an outage in my new system, cut over to the old one.

So if I were to visualize this, we sort of have a top service, and I [00:27:00] don't want to overload the service term here, but we have a top thing where the requests are coming in. You have your old system on the left and your new system on the right. And that sliver on the right, what that new system you're writing is doing, is slowly going to take up the pie chart of where that old system was.

Exactly. It strangles out the old system.

Ah, that's where it comes from.

Yeah, there you go. So the old system is doing five things; over time it's doing four, three, two, one, and then you take it offline. And your proxy, your facade layer, starts very dumb because it's just redirecting everything to the old system. Over time it starts redirecting to the things in the new system. And then it gets very dumb again, because it just turns into a proxy.
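As a rough illustration of that facade, here is a minimal strangler sketch using Node's built-in HTTP module. The routes and internal service URLs are hypothetical; the facade starts by replaying everything to the legacy system and flips individual paths to the new system as they are migrated.

```typescript
// A minimal strangler facade sketch. The facade starts dumb (replay everything
// to legacy); as actions are re-implemented, their paths are added to
// `migratedPaths` and flip over to the new system. All URLs are hypothetical.

import * as http from "node:http";

const LEGACY = "http://legacy.internal:8080";        // hypothetical
const MODERN = "http://new-service.internal:8080";   // hypothetical

// Actions the new system has already re-implemented.
const migratedPaths = new Set<string>(["/invoices"]);

const facade = http.createServer((req, res) => {
  const path = new URL(req.url ?? "/", "http://facade").pathname;
  const target = migratedPaths.has(path) ? MODERN : LEGACY;

  // Dumb replay: forward the request unchanged and stream the response back.
  const upstream = http.request(
    target + (req.url ?? "/"),
    { method: req.method, headers: req.headers },
    (upstreamRes) => {
      res.writeHead(upstreamRes.statusCode ?? 502, upstreamRes.headers);
      upstreamRes.pipe(res);
    },
  );

  upstream.on("error", () => {
    // Catastrophic failure upstream? In practice you'd point the facade back
    // at the old system (or let a circuit breaker make that call).
    res.writeHead(502).end("upstream unavailable");
  });

  req.pipe(upstream);
});

facade.listen(3000);
```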
That sounds like a really maintainable way to start to rewrite something. And I'm curious if you've seen the strangler method used in the past five or 10 years, and if you have, and it went wrong, what did they do wrong? If somebody says, hey, I want to try this out for the first time, well, everybody's a noob when they do something for the first time, you've got to get something wrong. So [00:28:00] maybe: I saw this in the wild, you might want to look out for that.

This is one of the things that we nailed really well at Twilio early on. I was there, I was number 18, way back in the day. And whenever we brought on new systems, they had a facade layer called Shadow. Basically it was an early blue-green deployment: we had our existing nodes, Shadow was sitting in front of them, we'd bring up our new nodes, and then Shadow could take those requests that came in, clone them to the new nodes, and then do a diff on the results from each. And whenever we're thinking blue-green deployments and we're thinking about doing that, the strangler pattern is a good, safe way to go about it. Because if you think about it, as we're transitioning between our blue and green nodes, that's fundamentally just a strangler. When we say, okay, 5 percent of our traffic is going there, now 25, now 50, we're doing the exact same concept, and we are replacing an old system with a new system. We've just never described [00:29:00] it this way. So I'm happy to say that a lot of people are doing this, and they're doing it well, and they're not even realizing it.

Well, that's encouraging. I guess with that encouragement, we can move on to the next topic. I don't know what's going to come out of this next segment of conversation, because I feel like I looked at these things and thought, wow, this is infrastructure, though even you mentioned the circuit breaker doesn't have to be infrastructure; put that in the application code. So, pub/sub patterns. Everybody knows pub/sub, classic on AWS, we've got queues and stuff. How does that boil down into making more maintainable software?

Yeah. So fundamentally, when we're thinking about how our systems are architected, a lot of times we think: I make a request and I get that response. The exact same pattern we have all the time. What we don't realize is that when I get that response and I do whatever action, and that action is complete, whatever the result was, whether it's success or failure or [00:30:00] whatever, who cares about that? What other systems care about that? And pub/sub, fundamentally, is the principle that webhooks are built on. So in a publisher-subscriber model, you've got the publisher; that's the thing doing the work. It says, okay, that unit of work is complete. Let me broadcast into what is effectively a shared channel to say this work is complete. Whether it's a Twilio text message, "this text message was received," or a Stripe payment confirmation, "this payment was received," we broadcast that into the channel, and then anyone who cares, any subscriber, can listen to that and say, okay, I care about this message, now let me take action. So the analogy I always use is the airport. When you go to the airport and you're sitting at your gate waiting to board your flight, when they come over the PA system, they say, okay, flight United 123 is now boarding to this city. They don't call you by name. That would take forever if they tried to call people by name.[00:31:00]
You, You talk to a lot of people, they use pub sub. We got a ton of frameworks. Bull mq, you got temporal technologies out there. So I'm curious where you see. the pertinent diff about [00:32:00] what ground do we have to gain here using those tools and powers? I would say people are using PubSub and they're using it well, they don't know that it's PubSub. So they're, they're using these different notification systems and everything and they don't realize, Oh, wait a minute, this is just a PubSub system underneath behind the scenes. And because it's a PubSub system, ~here are my,~ here are the things I can do with it. They're thinking more of a, I have this tool and this tool is wonderful and it's great. It solves my problems, but they don't realize, Oh, wait a minute. Here's what it's actually doing. And so in this one, it's more of a I'd say the vendors have been exceptionally good at marketing their tools. But if you understand the principles behind the scenes, now you can dig in and do things a little bit differently. So just for sake of example, , like if we had a pub sub based Twilio API that we were interacting with, and you're urging folks to say, Hey, don't view this as a black box. Know that this is pub sub, you can do things with this to make your life better, like your team, better the quality of your software, better. ~What are ~what's one of those [00:33:00] angles that you feel like that new garnered knowledge could, flower into one of those end outcomes of having not known that it was PubSub before, and then with that new knowledge, doing something different or structuring something different. Yeah, I think probably the most important thing is if you could say well, once you realize this is just a message, and this message can be processed multiple times, it can be stored, it can be multicasted to other systems. It doesn't have to be like a, I received this response. I received this webhook notification and then it's done and you can say, okay well, this could be the first step of a larger system, the first step of a larger workflow. That's where things start to get interesting because you can go, okay, well.~ I've got,~ I get this request. I get ~this ~this notification. Now, what do I do with it? you know, One of the common things, especially I've seen this in lots of startups is every time there's a strike payment, it gets dropped into a Slack channel so everyone can celebrate. That's a wonderful, fantastic situation. But we can also do that for. Any sort of reporting or [00:34:00] billing purposes. We could drop that into a Google spreadsheet and go, okay here's what you can do with it. Here's how we can be happy about this. But then I also think about it from our other design patterns standpoint of the strangler and more specifically the circuit breaker, because a change in state is what the circuit breaker wants. That's what the circuit breaker exists for. So if there's a change in state where this component, it goes offline, that's what a circuit breaker is,~ is~ dying to have. So can we use pub sub to then notify the circuit breaker of, Hey, look, this system just went offline or this system's getting it, having an issue, just preemptively cut off. preemptively this component. And then on the flip side of now, we're starting to get responses back from our systems. Again, we're getting responses that make sense that what we expect. Let's go back to the circuit breaker and turn it back on. Because now things are working again. 
I love that example, because it's emphasizing that there are entry points and exit points all over the place with pub/sub.[00:35:00] That's one of the takeaways I'm getting here. Extending beyond "it's not just a black box," there are points of insertion and points of extraction, of data and of what's happening with your system, that allow you to grow it more organically and implement the things that we talked about prior.

Yeah, absolutely. And when we start thinking about pulling these three particular design patterns together: so you've got a new deployment coming online, or you want to rebuild the system. Okay, cool. Let's put the strangler pattern in place, so we've got that facade, and that facade sits in front of your old application. Okay, cool. As you're bringing your new application online, use a circuit breaker in front of that new application so that, in case there's a problem, you can shut it off and go back to the old one. So there's no data loss, no catastrophic issues, anything like that. And then behind the scenes, you've got pub/sub communicating everything. So you've got pub/sub getting notifications of: there's been an update to this new system, it's been deployed. Okay, cool. Let's go ahead and make sure the circuit breaker is operating, [00:36:00] everything's working, and the right traffic is going to the new system. Okay, now we have an error in that new system: let's use pub/sub to flip the circuit breaker off and stop that traffic. Oh, wait a minute, now it's been fixed: let's use pub/sub to flip it back on. And so I think these three design patterns fit very nicely together: I've got this legacy system, I've got this legacy architecture, or maybe I just have a blue-green deployment, and now I want to make this entire subsystem a little bit more reliable and a little bit more self-contained, self-reliant. And if I plug these in correctly, we get really powerful systems without having to do a whole lot.
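Pulling the earlier sketches together, here is a compact, hypothetical illustration of that wiring: pub/sub events flip a circuit breaker for the new system, and the strangler facade consults that breaker when deciding whether a migrated route goes to the new system or falls back to legacy. All names and URLs are made up for illustration.

```typescript
// Strangler + circuit breaker + pub/sub, wired together in miniature.

import { EventEmitter } from "node:events";

const bus = new EventEmitter();
let newSystemBreaker: "open" | "closed" = "closed";

// Pub/sub keeps the breaker in sync with what's happening to the new system.
bus.on("deploy.failed", () => { newSystemBreaker = "open"; });
bus.on("deploy.healthy", () => { newSystemBreaker = "closed"; });

const migratedPaths = new Set(["/invoices"]);
const LEGACY = "http://legacy.internal:8080";        // hypothetical
const MODERN = "http://new-service.internal:8080";   // hypothetical

// The facade's routing decision: migrated AND healthy goes to the new system;
// everything else, including an open breaker, stays on legacy.
function routeFor(path: string): string {
  return migratedPaths.has(path) && newSystemBreaker === "closed" ? MODERN : LEGACY;
}

// Example: an error event from monitoring flips traffic back with no cutover.
bus.emit("deploy.failed");
console.log(routeFor("/invoices")); // -> the legacy URL while the breaker is open
```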
This conversation is entertaining too, because I feel like we as software developers could learn so much from the trades in that regard. I had National Grid doing gas line work, and a lot of those folks probably have the same constraints, except ten times as strict, because they don't want to blow the street up. And they need to think about [00:37:00] all these things, and their circuit breakers, and their cutoffs, and how they're going to blue-green my gas line. So it's interesting to see this bubble up into the software world.

Yeah, absolutely. And what's funny is, on a personal note, three years ago I moved from suburban Austin, Texas out to the country, and so I've had to learn a lot of those skills. And what I've realized is that working on circuits is actually pretty straightforward once you know the principles, and working on plumbing is actually very similar to electricity once you know the principles. So you can gain a lot of knowledge and understanding of things by stepping out into other areas, exploring, learning, probably getting knocked on your butt a couple of times, and then coming back and saying, oh, wait a minute, here's the actual underlying lesson I could take from this, and here's how these principles tie together. So yeah, a hundred percent, the trades can teach us so much.

I want to continue our conversation; it's unfortunate that time is slipping beneath our feet. Really quick, before we wrap up with your [00:38:00] outro, I'm just curious, and we talked about a bunch of good things here, whether there's a design pattern that you've seen people footgun themselves with more than you would like to accept as reality.

Other than singleton?

Oh, I forgot about singleton. Right, you mentioned that one.

I would say also microservices. I think we took the slider too far to one side, and in some organizations I've seen a microservice per person. I think that's slicing things a little too thin, and we end up with these nightmare management scenarios, and we end up having to introduce patterns on top of it, like service discovery and service mesh and all these other components, to solve a problem that we created ourselves.

That's one of the reasons it might have caused more problems in some companies than others. I've even heard of organizations where they say, hey, your bonus is tied to how many microservices you manage.

Oh, that's terrible. My God. Yeah, that's like measuring developers by lines of code. It's such a [00:39:00] terrible metric. At the end of the day, I want to measure things by: is the system stable? Is it reliable? And is it solving needs? If it's solving customers' needs and it's online, we win.

This has been illuminating, not just for me; I hope it's been educational and entertaining for our listeners to learn about these design patterns, because hearing some of this stuff is almost cathartic, where you're like, I've dealt with this in the past, and man, if only we'd done it this way, especially the strangler one. That's a classic: let's rewrite this big system, let's change the framework. So I really appreciate your expertise and your decades of software engineering coming in and talking to us about this. If people want to learn more about design patterns, whether that be directly from you and any of the resources you have, or just other books and stuff online, where would you point people to extend their journey of learning here?

Yeah, absolutely. So the gold standard is Martin Fowler's Patterns of Legacy Displacement and Patterns of Enterprise Application Architecture. He is the godfather of so much of this [00:40:00] stuff; he's been writing and thinking about these things for 25 years. There's also the Gang of Four design patterns book, which is fantastic. It's feeling a little dated, but 95 percent of it is still fantastic. The Microsoft Azure architecture website actually has a ton of great design patterns. And right now I'm actually standing up a new site, designpatterns.fyi, to aggregate and collect some of this information, especially with regard to cloud-focused design patterns and how we use them effectively.

Awesome. If people wanted to keep up to date with you and your musings and what's coming out, are you on Twitter slash X, or other social media platforms like Mastodon? Where can people find you?

Yeah, absolutely. So I'm caseysoftware everywhere. That's C-A-S-E-Y software. You can find me on Twitter, on LinkedIn. I've got a couple of courses on API design, API testing, and API security, in addition to design patterns. I'm all over the place, so definitely check that out. caseysoftware is the best place. Oh, and caseysoftware.[00:41:00]com.

caseysoftware.com, easy to remember. Thank you for throwing that in there. Well, thank you again for your time, it's been a pleasure.

Thank you.