AMOS:: I'm not up on all this kid slang stuff you guys got going on. CHRIS:: This isn't kid slang stuff. AMOS:: The only people I've ever heard say "don't at me" are you and my daughter. CHRIS:: Well, we made it. We're all here. AMOS:: I'm so glad we're all here. I thought last night I was going to be recording in my in-laws' basement again, but we found a house. ANNA:: Oh, are you in Kansas City? AMOS:: I'm not in Kansas City yet, um, but we have a house under contract right now. ANNA:: Where is it you're moving to again? AMOS:: Kansas City. ANNA:: Okay. AMOS:: Yeah, the northern side of Kansas City. I'm actually not in Kansas City proper, but you wouldn't know the difference when you were driving down the road, so that'll be nice. I'm so glad. We've probably looked at 30 or 40 houses and I'm just tired of looking at houses. I just want to buy one now. So we put in a contract last night and the seller accepted it. Now we're just moving forward with all the paperwork and getting the house inspected. Hopefully that goes well, and then I'll be a city dweller. CHRIS:: I don't think being 40 minutes outside of a city makes you a city dweller. AMOS:: I'm 15 minutes from downtown. CHRIS:: Really? I thought you said you were moving out of the city quite a ways. AMOS:: Kansas City has some great, uh, interstates into the inside. CHRIS:: Sprawl, I think, is the word you're looking for? Uh, I love Kansas City. I mean, I go to Kansas City all the time. AMOS:: You get 30 minutes north of Kansas City and you're like in the woods, and there's a lot of that within the Kansas City area itself. You can get some two- and three-acre lots pretty close to downtown. It's kind of crazy. Close is relative for some people, though. ANNA:: Wait, how many acres? AMOS:: Two or three. ANNA:: Yeah. That's - wow. AMOS:: You've got like a 900-square-foot lot there in San Francisco.
CHRIS:: Yeah, but now you're going to have to pick a barbecue joint that you're going to rep. AMOS:: Yeah - my house. I'm not saying Kansas City barbecue. It's got too much sugar in it for me. I like spicy, you know, mustardy barbecue sauces, like the Southern style. CHRIS:: You're living in the wrong part of the country, my dude. AMOS:: I know. That's why I gotta make barbecue at home. But you're right. CHRIS:: I think picking your barbecue place in Kansas City is a little bit like picking your taqueria in San Francisco. You're required to have your go-to. ANNA:: That's probably true. AMOS:: That's all right. Still gonna wear my Cardinals shirt to every barbecue joint I go to. CHRIS:: Playing with fire. AMOS:: Uh, it's all on the same interstate, right there on the other side of the state. There are enough people wearing both of those that I'll probably not get shot. Hopefully. CHRIS:: There's enough other rivalries in the greater Kansas City area. You'll probably be safe, right? AMOS:: Like Ruby versus Java. CHRIS:: Exactly. AMOS:: Yeah, my mouth is probably what will get me shot. ANNA:: Uh, so should we talk about Elixir-y anythings? AMOS:: Yeah. What have you guys been up to in the Elixir world lately? ANNA:: Just working on planning this workshop for Write Speak Code and working on this talk about this exchange, um, which - it turns out the Elixir is the most awesome part. The hardest part is just, you know, blockchain. AMOS:: I thought blockchain was super simple. I watched your talk. ANNA:: I mean, it is, but yeah, it turns out distributed systems are hard. AMOS:: I think that leads into what we were going to talk about today. Should we make mention of the newest version before we jump into deep distributed systems talk? So the Elixir 1.7 release candidate - I think it just came out. Oh, they start numbering at zero, like real computer scientists.
So release candidate zero just came out. Um, it'll probably be 1.7 by the time we release the show, but there's some pretty cool stuff. Um, have you guys seen EEP 48? CHRIS:: This is the thing that allows you to have documentation for Erlang functions inside of IEx now. AMOS:: Yeah, and other languages too. It basically puts a standard on Erlang documentation so that all Erlang systems can have access to the documentation of all the others. CHRIS:: So you can have Alpaca and LFE and Erlang and Elixir all in the same thing, and then be able to look at all their documentation together. AMOS:: Which is sweet, because now the IEx helpers, whenever you want to use an Erlang function, will actually display the Erlang documentation. That's very dope. Totally. That's very cool. So yeah, I think that was the biggest thing. There's some stuff with Logger, um, and ExUnit, but the thing that excited me was documentation, because I love documentation. CHRIS:: What did they decide to do with Logger? I haven't kept up with it since the original proposals, where they were going to start forcing everything to always use anonymous functions so that they could purge them, um, to avoid, you know, doing an inspection on like a giant map or something like that in production. AMOS:: So they do have compile-time purging, but they've added some configuration for it where you can, at the application level, have every application log a little differently. You can have it purge out certain modules, or even logs from certain functions. So that's pretty cool. If you have something that's really chatty that you keep in dev because you need to watch something, but it spits out a lot of stuff and slows your system down in production, you can have that compiled out, so you won't even need the anonymous functions.
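The purging configuration described here looks roughly like the following sketch, using the option names from Elixir 1.7's Logger; the application and module names are made up for illustration.

```elixir
# config/config.exs — a sketch of compile-time log purging
# (option shape per Elixir 1.7's Logger docs; :noisy_dep and
# MyApp.HotPath are hypothetical names).
import Config

config :logger,
  compile_time_purge_matching: [
    # drop every log call coming from this chatty dependency
    [application: :noisy_dep],
    # drop below-info log calls from one module's hot function
    [module: MyApp.HotPath, function: "handle_event/2", level_lower_than: :info]
  ]
```

Because the matching happens at compile time, the purged calls are removed from the generated code entirely, which is why no anonymous-function wrapping is needed.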
It'll just completely eliminate it. And then, um, they've added some information and integrated with the OTP logger. So yeah, that should be a lot better. And they said if you're using Erlang's error logger, you should probably hook into Elixir's Logger in your application instead, since we have all that integration and, uh, coverage results built in. ANNA:: Oh yeah, that's right. AMOS:: There's a whole lot of other stuff - a bunch of bug fixes and things like that - but I'm most excited about documentation. CHRIS:: Yeah, that's going to be super helpful. AMOS:: If you'd asked 10-years-ago Amos if he'd be excited about documentation, he might've punched you in the face for asking such an idiotic question. And now I'm like, yay, documentation. And I'm a lot more calm. CHRIS:: Really hated documentation 10 years ago. ANNA:: You had really strong feelings. CHRIS:: Did you have like a religious experience, and then all of a sudden come to the light and realize that you really wanted documentation in your life? AMOS:: I think at one point I would have said, why don't you work on a real problem? And then later on I found out documentation is a real problem to be solved. ANNA:: Yes, it makes a huge difference. AMOS:: Yeah. I think that's one of the things that I really like about Elixir versus Erlang - the documentation feels much more approachable to me. Uh, when I get into the Erlang stuff, some of it has great documentation, but a lot of it is missing or all over the place. ANNA:: Yeah, it's a little bit harder to parse. CHRIS:: "It's all over the place" is the real thing for me, because there's three different places to go look for documentation in Erlang. You have the actual docs, you have the sort of getting-started - whatever that manual thing is called - and then you have a really useful document, but I forget the name.
It's like, uh, the tips and tricks documentation. I like the system design strategy stuff that's in there. Um, I'm totally blanking on the name, but I'll find it and add it to the show notes. That last document is really, really useful to go look through. Um, but figuring out how to find all the different pieces of things is the tricky part with Erlang docs. AMOS:: Yeah. One day I hope to master that. Now we got that out of the way. ANNA:: Chris, can you talk about any of the cool stuff you're working on yet, or not really? CHRIS:: Right now I'm working on just tons and tons of queuing theory. So much queuing theory going on in my life currently, which is really to say that it's Little's law all the way down, if you've done any queuing theory. AMOS:: I was getting ready to suggest - at Strange Loop last year, somebody gave a talk on Little's law. CHRIS:: You know, if you're only going to know one thing about queuing theory, it's basically Little's law. AMOS:: Which is not a difficult thing. It's like, what, three variables? Multiply some things together, you're done. CHRIS:: Well, I mean, Little's law is sort of the algebraic equivalent of E = mc². The actual underlying math is less interesting than what it implies and what you can do with it. AMOS:: So what I learned last time I looked at Little's law is rate limiting doesn't really work. CHRIS:: And you're gonna have to expand on that more. AMOS:: Yeah, I think that might have to be a different show, because I might need to review before I get too deep into it. But basically, with rate limiting, you just end up backing systems up, and there are better ways to figure out what resources you need to actually process through everything. CHRIS:: We should probably actually define what Little's law is.
ANNA:: I was about to say, we probably should, for those people listening who aren't familiar with it. CHRIS:: I should have done that at least 30 seconds to two minutes ago, somewhere in that range. AMOS:: So Chris, do you want to talk, or do you want me to, or does Anna? ANNA:: Only a little - I mean, I am familiar with it, but Chris might be the most - CHRIS:: No pun intended. ANNA:: Oh man. AMOS:: So go for it, Chris. CHRIS:: Yeah. So Little's law basically says that, given a stable system - like a stable queue - the number of things in the queue is equal to the average arrival rate multiplied by the average wait time. Which seems obvious, right? Like, let's say you're at a bank, people arrive at the bank every 10 seconds, you have one teller, and the teller takes 30 seconds to process each person. You can calculate the number of people from that pretty easily: one person every 10 seconds is your arrival rate, you multiply that against the 30 seconds, and you arrive at your answer - that's how many people are in the queue. Um, and so it seems super obvious and the math is really basic, but what's cool about it is you can calculate a bunch of other interesting properties based on Little's law, because it's the thing that always holds, um, for different queues. ANNA:: What are some of the other properties that you can - CHRIS:: This is why you can determine that rate limiting doesn't work, to some degree - it depends on how you rate limit stuff. Uh, you're still just going to back up the queue, and queues overflow based on a really, really simple math problem, which is that the arrival rate is greater than the departure rate. That's when queues get backed up. If those things are equal, you don't get backed up. If you start to skew toward the arrival rate, then you get backed up. And at some point, because we live in a physical universe, queues overflow and bad things happen.
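The bank-teller numbers from the discussion work out like this; a minimal sketch of Little's law, L = λ × W, with the values above.

```elixir
# Little's law: L = lambda * W, for any stable queue.
# lambda = average arrival rate, W = average time spent in the system.
arrival_rate = 1 / 10        # one person every 10 seconds => 0.1 people/sec
avg_time_in_system = 30      # seconds each person spends at the bank
people_in_system = arrival_rate * avg_time_in_system
# people_in_system == 3.0 — on average, three people are at the bank
```

The same arithmetic holds whether the "customers" are people at a teller or messages in a process mailbox; only the units change.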
And so you have to apply some sort of mechanism to handle queues getting overflown. One of the more common ways of doing this is called back pressure. Uh, GenStage uses back pressure. AMOS:: And that applies a lot in Elixir, because we're sending messages and creating queues between our processes. And the nice thing about Little's law is, if you have a couple of processes in a chain, you can apply Little's law at each point in that chain, as well as to the overall system. The same law will hold as you go up in the system and encompass more queues under the same equation. The Strange Loop talk, if you guys want to look it up - Jon Moore gave a talk called "Stop Rate Limiting! Capacity Management Done Right," where he talks a lot about Little's law and concurrency controls instead of rate limiting. And I think he's like a chief architect at Comcast Cable or something, so they probably deal with a lot of this type of stuff. CHRIS:: So yeah, I've been doing a bunch of that just in terms of capacity planning and utilization and stuff like that, on more of a systems level. Um, we have a bunch of big queuing-type problems that we have to work on, so I'm trying to figure out how all that's going to play out. A very reasonable amount of math, which is nice. AMOS:: I like math. CHRIS:: I like the idea of math. AMOS:: Yeah, that might be true. CHRIS:: I'm not very good at math. Actually, I'm kind of really bad at math. AMOS:: I'm really crappy at writing proofs. So I tried that for a while, but you know, a lot of testing is writing proofs too. So maybe I'm getting better at it over time. I don't know - it's been a long time since I tried to sit down and write a proof. So, Anna - distributed systems. CHRIS:: You're solving like one of the hardest problems, which is distributed consensus.
And you're solving that problem in the face of Byzantine failures, which, it turns out, is kind of hard. AMOS:: Well, you'd better define some terms there. ANNA:: You want to define it? CHRIS:: I'm going to say no. You're the blockchain person, right? ANNA:: Yeah. So, I mean, we talked about how consensus works, right? Blocks are always being mined and added to, ideally, the longest chain. But because it's a distributed system, you have nodes all over the world mining at the same time. When a node mines a block, it sends it out to the rest of the network saying, oh, we've mined this block. Assuming it's valid, the other nodes on the network will add that block to the chain. And essentially the first valid block that propagates through over 50% of the network will be the one that gets added to the longest chain. That's kind of how consensus is achieved, you know. And part of the reason these protocols work is that people are generally incentivized to be good actors and not bad actors, um, because you have no way of verifying whether the nodes are or are not trustworthy. AMOS:: It's distributed big enough that it's really hard to control the network. ANNA:: So I was actually dealing with a bug - not specifically with the exchange - where we were trying to retrieve supply from the Ethereum blockchain. We were using this API. Um, and anyway, we thought we were looking at the same block number with the same supply. It turns out - we realized at some point later - that the supplies didn't match when we were calculating in different environments. Um, and it turns out that we thought we were looking at the same block, but that block had not achieved consensus yet. Um, and so our supplies were wrong, because in Ethereum the supply does not increase statically for each block.
You have this concept of uncle blocks, um, so the supply that gets added after each block is mined varies. And so one block ended up winning, right, and the one we were looking at didn't get added to the chain. The block that actually got added had a larger mining reward - not the one that we thought was added to the chain. But it took a long time to actually figure out that that was the problem. And the solution - I mean, it was actually a mistake in the sense that, um, we should have been following the general rule of thumb, which is to calculate not toward the head of the blockchain, but to go about six blocks back, um, to make sure that consensus has generally been achieved. We were only going like one or two blocks back, and so that fork essentially changed from under us. And because we were getting information from the API, which could have been hitting different nodes on the network - it's totally possible to mine two blocks at the same time; one of them just hasn't propagated over 50% of the network yet. AMOS:: So the further back in the chain you go, the more likely you are to hit a block that more than 50% of the network has consensus on, right? So you said one or two - what's the advantage of only going back that far? ANNA:: We actually want to go back like six blocks there. The rule of thumb is six blocks. AMOS:: Why is that the magic number? ANNA:: There's really no magic number. The idea is that if somebody was trying to do a malicious attack, it would be unlikely that they would have enough hashing power - the probability of them actually being able to mine six consecutive blocks in a row, to continually maintain the longest chain, is really low given how much hashing power is required. So there's really no magic number; it's just that after about six blocks, the probability of something shifting after they've been added is really low.
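A rough back-of-envelope for the six-block rule of thumb, simplified from the kind of analysis in the Bitcoin whitepaper: if an attacker controls a fraction q of the network's hashing power, they win any given block with probability about q, so winning six in a row is about q raised to the sixth power. The 10% figure below is an illustrative assumption, not from the discussion.

```elixir
# Simplified: an attacker with fraction q of total hashing power
# mines any given block with probability ~q, so six in a row is ~q^6.
q = 0.10                  # hypothetical attacker with 10% of hash power
p_six_in_a_row = :math.pow(q, 6)
# ~1.0e-6 — roughly one in a million
```

This is why each additional confirmation makes a reorganization exponentially less likely, and why "go back six" is a rule of thumb rather than a hard threshold.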
AMOS:: What's the expense of going back blocks on your end? Like, why wouldn't you just do 50? ANNA:: Well, if you want up-to-date supply, you want to get as close to the head of the blockchain as possible, to see how much supply there actually is. AMOS:: So there's a balancing act there. Like, I want the most up-to-date ledger because I want to know who has what, but if I get too up to date, I could end up getting some conflicts. ANNA:: Yeah, or incorrect values, essentially. AMOS:: Right. I guess that's the scary part. CHRIS:: 'Cause the entire thing is guarded against this idea that there are bad actors in the system, which is the Byzantine failure thing. Like, you send messages to each other, but you can't trust any of the messages that you get back. So how do you establish consensus when you're in an untrustworthy network? And that's why blocks can disappear out from under you. And that's why going back a certain amount actually works out - because the likelihood that they can own that amount of the chain is just sort of overwhelmingly unlikely, like statistically unlikely at that point, which is about as good as you can get. AMOS:: Are there other methods to deal with Byzantine failure other than that? I mean, not necessarily blockchain-wise, but in any sort of distributed system. ANNA:: Chris? CHRIS:: So there are other ways of trying to solve Byzantine failures. Um, I mean, there's Byzantine Paxos. We should maybe explain why they're called Byzantine failures. I think it was Lamport who proposed this problem called the Byzantine Generals Problem. Um, it's basically: you have a bunch of generals all around a city, and they're all going to attack at the same time.
And how do you coordinate that attack, bearing in mind that at any moment, any of the messages they send between each other can be intercepted by rogue agents and changed, or not delivered, or delivered out of order, or manipulated - and you don't know that. And you can't control any of the other nodes. So how do you coordinate an attack in that way? And I think, uh, the original Byzantine Paxos proposal - I think Liskov was even involved in that; I don't remember, but it sounds right. But yeah, people have attempted to solve this before. The original blockchain paper - or the Bitcoin paper, I guess I should say - combined a bunch of these different concepts that had already been out there in the world as a different way to kind of solve this problem. ANNA:: I mean, it essentially incentivizes people to generally be good actors. Um, like, the incentive on the blockchain is to be a good actor versus a bad actor. AMOS:: It seems like you have to get a network of a significant size, too, before you can really have any sort of trust in what's going on. Is there like a certain level that you need to get to? Because if we have, you know, three people, and two people are being bad, it's done. Like, the whole thing is over. ANNA:: No, it's true. I mean, that's part of the security of the network itself, at least for blockchain: as the network grows, the idea is that it inherently becomes more secure from that standpoint, because there are more actors involved and it becomes increasingly difficult for a bad actor to take over more than 50% of the network. AMOS:: They would have to have a lot of capital to get into stuff. ANNA:: Not even capital. AMOS:: Or resources. ANNA:: Yeah, exactly. Because in order to have the hashing power, right -
'Cause on the blockchain, your ability to mine - at least currently, until they change the model - is directly equivalent to your hashing power, right? And so you would have to have a lot of hashing power for the probability of you mining to outweigh the probability of everybody else mining, in such a way that you could consistently maintain the lead on the longest chain. AMOS:: So you're encouraged to be a good actor, and it's cost-prohibitive to be a bad actor. ANNA:: Basically. It's just really interesting - I don't know, for me at least, and I'm newer to distributed systems, but the thing that we ran into, um, you know, separate from actually just building and modeling this exchange in Elixir, is this idea of consensus, especially finality around data, right? Like, learning to think about those things in a different way. Now I think about it differently in my head, but it took us, um, a little while to figure out exactly what the problem was, until we were like, oh wait, we're not actually looking at the thing that we think we're looking at. AMOS:: How did you end up figuring out what the actual problem was? ANNA:: It took a long time - just, again, eliminating other factors and then looking at the differences in supply, um, and realizing that from one supply feed, at certain points, we were off by the amount of an uncle block. And so we were like, wait, wonder if - and then, um - but it took a long time to determine that. Yeah. I mean, Chris, you talk about this a lot; you've talked about this before. But really, you know, if people read those papers, right - trying to get finality around data in distributed systems, making sure that what you have is what you think you have -
It's a different way of thinking about those things. CHRIS:: And a lot of this stuff is all or nothing. You either get all of it right, or it all doesn't work. And it doesn't work in ways that you don't understand. It doesn't manifest as crashes or stack traces or logs, or things that are easy to go and understand and start diagnosing - it manifests as incorrect data, without a real good way of knowing why. For most problems, I think when people approach this stuff, they just don't have the tool set at the ready to diagnose how things got out of order, how writes got overwritten, how you lost data, or how you ended up in this place where you just have different data. Um, and it's just tough. I mean, it takes looking at it, and it takes experience. And I think there's a set of tools that would help people be able to do that, but people just aren't necessarily equipped with them. ANNA:: What are some of those tools? CHRIS:: Uh, I mean, one, I think you need a practical understanding of how these things work and the kinds of failures. So experience - experience beats everything. At some level, you just have to do it; you have to get in there and build this stuff and see that it doesn't work and understand why it doesn't work. Um, the second problem becomes one of observability and being able to diagnose unknown problems. So in order to do that, you need ways of doing some sort of verification of your algorithm, which means probably sending a bunch of data into it, exercising faults as you send data into it in a reproducible manner, and then asserting on the outcome. Right? So it's a little bit like property testing. You can say, I have these inputs, and this is what I should get out at the end. And if you don't get those things out at the end, then you know that you have failures.
Um, and you need to be able to do that in a way that is controllable and deterministic. And the deterministic part is the hardest part when it comes to distributed systems. Making things deterministic means: when I send these things in, when I have these things fail in this order, I can always get the same output, and I can always get it to fail in this way. If you can get that, you're light years ahead of everybody else, because that's really hard to get. But if you can figure out ways to control that, then you're good to go, because now you can reproduce these things over and over again until you suss out the problem. It becomes really, really useful to start understanding lineage - and by lineage, I mean the order in which operations take place, along with their failures. Um, so starting at the client and saying: the client issues these write requests in these orders, they arrive at the server in these orders, and these get executed. Now, how do we look at that lineage, at those histories, and determine whether we had failures, or whether we have these characteristics? And so being able to look at those and figure out what your minimal case for reproducing a problem is becomes really, really important as well. AMOS:: Um, I'm going to just say it - it sounds like a job that QuickCheck can do for you. CHRIS:: It is. I mean, what we're getting at is effectively QuickCheck - you know, property-based state machine tests, or FSM tests. It generates this big list of things that it's going to execute in order. QuickCheck specifically has PULSE, which will control the ordering of all your side-effecting operations. And it'll run those things and control all the interleavings until it finds problems. And once it finds problems, it can shrink those interleavings until it finds a minimal reproducible test case. I mean, that is the magic.
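The "deterministic fault schedule" idea above can be sketched in a few lines: drive the failures from a seeded random number generator, so the same seed always reproduces the same interleaving of faults. This is a toy illustration, not PULSE; the module name and the 30% failure rate are made up.

```elixir
# Toy sketch: a fault schedule driven by a fixed RNG seed, so the same
# sequence of failures can be replayed exactly. Same seed => same
# failures => same history.
defmodule FaultReplay do
  def run(ops, seed) do
    # Seed the process-local RNG so the failure pattern is reproducible.
    :rand.seed(:exsss, {seed, seed, seed})

    Enum.map(ops, fn op ->
      # Hypothetical 30% chance of injecting a failure on each op.
      if :rand.uniform() < 0.3, do: {:failed, op}, else: {:ok, op}
    end)
  end
end

history1 = FaultReplay.run([:write_a, :write_b, :read_a], 42)
history2 = FaultReplay.run([:write_a, :write_b, :read_a], 42)
true = history1 == history2  # deterministic replay of the same fault schedule
```

Real tools go much further by controlling message ordering between processes, but the core trick is the same: make the nondeterminism a function of a replayable input.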
That is exactly what you need to be able to suss out and find problems. In certain cases, you want to go a step beyond QuickCheck and PULSE, which run inside your BEAM with your tests. I mean, those are super useful; you should use them. Uh, but at some level you want to start incorporating more systems things, and this is where tools like Jepsen come in. Because Jepsen basically takes all these same ideas but applies them at a black-box, whole-system level. And it has tools to look at histories of different operations and analyze them - to see if they're linearizable, or what the different characteristics of those histories are. So it actually allows you to explore that state space in a much more meaningful way, but at a higher level. And that's the only way you can actually find this stuff. In my opinion, if you are trying to build systems that have these kinds of characteristics, then the only way to do it is to use these tools, which will explore the state space for you in ways that you wouldn't imagine to do normally. AMOS:: If you do get into an error situation - if you're keeping track of everything coming in, because you're reacting to events at some point - you could use a lot of that to do a replay and try to figure out your problems too. I mean, that's after-incident mitigation, but it seems like it would be really important in a distributed system to be able to do that. CHRIS:: Yeah, there's actually a great paper called "Lineage-Driven Fault Injection," and this is by Peter Alvaro. I think that's the name of the paper - I might be getting the name slightly wrong, but it's a paper by Peter Alvaro.
Uh, he came up with this idea with this prototype called Molly that would actually be able to use lineage to figure out different paths that you could take, to find faults. So you actually start from a good operation - like, I wrote something to the database and it worked. Well, what are all of the ways in which that could fail? And then it figures out all those ways and uses a SAT solver to go in and find all the different possibilities of failure. AMOS:: What's a SAT solver? CHRIS:: So a SAT solver is a satisfiability solver - that's what SAT stands for. And it's used to determine if there exists an interpretation that can satisfy some sort of predicate, like some sort of Boolean formula. So in the case of Molly, what it would do is try to figure out: is there a way in which we can create a fault here? It passes that to the SAT solver, and the SAT solver generates values and says, yeah, if we cut these different network connections, or we make it flow through these different ways, we might be able to cause a fault. And then it goes in and retries that over and over again. And it does that until either you definitely find a problem, or the SAT solver gives up and says, no, there's nothing that satisfies this. The cool thing is that Peter Alvaro went and worked at Netflix, um, as a kind of collaboration project with some folks on their chaos engineering team, and then applied a lot of this stuff in production at Netflix. And they were doing a lot of these things where they would capture all these traces across different systems and figure out ways in which they could have failed and replay them. Or they would go in and inject problems inside of those different services to try to suss out where bugs might have happened, using these techniques. AMOS:: That's pretty cool. ANNA:: That's really cool. I mean, yeah.
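To make "satisfiability" concrete: a SAT solver answers the question "is there a true/false assignment that makes this formula hold?" The brute-force search below is only a toy illustration of that question; real solvers are vastly smarter about pruning the search.

```elixir
# Toy satisfiability check: find a true/false assignment that makes a
# Boolean formula hold. The formula here is an arbitrary example with
# three clauses: (a or b) and (not a or c) and (not b or not c).
formula = fn a, b, c -> (a or b) and (not a or c) and (not b or not c) end

# Brute force over all 2^3 assignments.
assignments =
  for a <- [true, false], b <- [true, false], c <- [true, false], do: {a, b, c}

satisfying = Enum.find(assignments, fn {a, b, c} -> formula.(a, b, c) end)
# satisfying == {true, false, true} — the formula is satisfiable
```

Molly's use is the same shape in spirit: encode "can this execution fail here?" as a formula over possible message drops and network cuts, and ask the solver whether a failing assignment exists.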
I was talking to a friend who works at Netflix, right - it's pretty incredible that they will just take down live systems. Their Chaos Monkey, whatever, will just take down systems. AMOS:: I still need one of those built into, uh, supervision trees - to just take down random processes while my system's running and see what happens. CHRIS:: The only problem with doing things in that sort of very random, stochastic way is that they only explore a limited range of the state space. They only explore certain faults. And I think there's a case to be made - and I think a lot of people agree with this - that you actually want very targeted generators. You want to target your properties and your faults in very specific ways, um, because that's actually going to lead to finding more problems. Uh, I mean, I was working on something like this the other day, and it's a very, very basic property. We were generating lists of things and then filtering that list based on a good list. Um, so you generate this giant list filled with some good keys and garbage keys, and you pass in your allowable fields inside of a different list, and you want to ensure that you only get the allowable fields out at the other end. And I was writing property tests for this and I couldn't get it to fail - and I couldn't get it to fail because the lists it was generating were tiny. They were like three or four elements. And when I increased the list size, by arbitrarily scaling up the generators to actually go and find more interesting bugs, we actually found tons of little things inside of that. Um, for a seemingly really trivial problem, we actually found bugs in our implementation that we just hadn't considered.
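The "scale up the generators" idea might look like the following sketch. The episode doesn't name a library, so this assumes the stream_data package; the `Params.filter/2` function and the allowed-key names are made up to mirror the filtering example described.

```elixir
# Sketch assuming the stream_data package (an assumption — not named in
# the episode). The function under test keeps only allowed keys.
defmodule Params do
  def filter(pairs, allowed), do: Enum.filter(pairs, fn {k, _v} -> k in allowed end)
end

# Default-sized generated lists are tiny (a few elements). Scaling the
# generator's size makes it produce much larger inputs, which is the
# kind of change that surfaced the bugs described above.
pair = StreamData.tuple({StreamData.atom(:alphanumeric), StreamData.integer()})
big_lists = StreamData.scale(StreamData.list_of(pair), &(&1 * 10))

# Check the property over a sample of large generated lists:
for pairs <- Enum.take(big_lists, 100) do
  for {k, _v} <- Params.filter(pairs, [:name, :email]) do
    true = k in [:name, :email]
  end
end
```

In a real test suite this check would live inside an ExUnit `property` block with `check all`, which also gives you shrinking to a minimal failing input.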
But we only found those when we could force the generators to start exploring a more interesting part of the state space. This is also why I think generating data for these tests based on type definitions is not a good idea. It's just not going to be helpful in the long run, because you actually want to control how you navigate through the state space to find interesting bugs. It's not very interesting to just generate integers that are negative one, zero, and one over and over again; that only scales out to some degree, and you're not going to find that many interesting things. AMOS:: Are there any benefits to generating off of type definitions? It feels like you might get some weird fuzzy things. CHRIS:: I think there's very, very limited value over just writing your own generators that generate interesting data, because it's purely convenience at the end of the day, right? It's a pure convenience thing that I can say, well, I've specced this struct to have these types, and now generation magically happens. That's super convenient; no one could argue it's not convenient. Maybe that's useful for some limited operations, but in my opinion it's not very useful for finding faults in a system, which is predominantly the reason I want to do property testing. I think it's cute more than it's helpful. AMOS:: Do you think that's a problem of our ability to define tight enough types, or, um, I don't know, I lost the "or" part of that. It's gone. CHRIS:: If you have the ability to control the generation of that stuff when you actually go to use it in your properties, then that's fine.
You've already declared that the struct has these types, and now you can tweak and fiddle with the scaling of the different integers you're using or whatever. That's fine; that's very convenient and can be useful for you. But just sort of randomly walking around in the tree isn't as useful. And this is even true of larger-system stuff like Netflix's Chaos Monkey. As cool and as interesting as that technique is, it only finds a limited selection of things, because you're still doing random walks. You're still just randomly searching around in this space, so you can't fully explore the state space that way; you can't control it, it's just randomly going and doing stuff. You can induce much more interesting failures by taking a more targeted approach to exploring that state space: going in and saying, well, let's poke at these corners, or let's do this and this at the same time and control these interleavings. You're much more likely to find catastrophic failures, or more interesting failures, when you can do that. There's a bunch of papers that have come out about this kind of stuff. This isn't super recent, but I just pulled down a paper called "Beginner's Luck," which talks about more controllable property-based generators and introduces a whole DSL for them. It's a very interesting paper, it's worth reading, and there's a lot of exploration into this idea now, trying to find better ways of exploring the state space. AMOS:: So it feels to me like being able to go off of types might still be a great thing if you're not exactly sure what to do quite yet, before you get to that targeted approach.
Maybe you just need a broad thing: let's go ahead and throw this in here, and then as we learn more we can start to restrict those things down to the interesting cases where it matters. So I think it's still a good feature to have. CHRIS:: My only concern with doing that is that you can write properties that won't generate interesting data, because the generators aren't set up to do that by default. You'll get green tests and you'll assume that means everything's working fine, when you definitely just haven't explored the state space enough yet. It's the equivalent of building an RGB-to-hex converter and never generating numbers bigger than 255. You'd assume your stuff just works, never crashes, never returns an error, but only because you've never generated the interesting values that would have led to a failure. My concern is that you'll end up with people who do this, think it all works out great, and then it turns out it doesn't. At the end of the day you have to develop an intuition for what the interesting things are that could cause a failure, and couple that with tools that let you introspect the data you're actually generating, so you can see it and really gain an intuition about what it is you're actually throwing at this function. PropEr is actually really good about this; there are tools built into PropEr for bucketing, so you can see the distribution of the values you're generating.
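The RGB-to-hex scenario is easy to sketch. Here's a hand-rolled property check in Python (not PropEr): with the generator capped at 255 the property never fails, and widening the range past the boundary finds a counterexample immediately. The converter and property are illustrative, not from the episode:

```python
import random
import re

def rgb_to_hex(r, g, b):
    # Naive converter: no validation, so out-of-range channels produce
    # strings that aren't two hex digits per channel.
    return "#{:02x}{:02x}{:02x}".format(r, g, b)

def check(max_channel, trials=500, seed=1):
    """Property: the output always matches #rrggbb. Returns a
    counterexample tuple, or None if every trial passed."""
    rng = random.Random(seed)
    for _ in range(trials):
        r, g, b = (rng.randint(0, max_channel) for _ in range(3))
        out = rgb_to_hex(r, g, b)
        if not re.fullmatch(r"#[0-9a-f]{6}", out):
            return (r, g, b, out)
    return None

print(check(255))  # None: the generator never reaches the interesting values
print(check(300))  # a counterexample with a channel > 255
```

The bug was there the whole time; only the generator's range decided whether the property test could see it.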
So you can check yourself, to make sure you're generating data that's worthwhile. For instance, with that RGB-to-hex converter, you could bucket all the integers you're generating and determine whether they're interesting values. You might look at the buckets and realize, oh wow, we never generated a value over 10, and that distribution could clue you into building more interesting generators that actually go find stuff. ANNA:: But it sounds like you do have to start somewhere, right, to build that intuition? Even though at times you may have some false confidence. Like you said, it's experience; it's hard to get that if you don't start somewhere. AMOS:: So what if you combined generating off of types with buckets, in order to get to the point where you can do interesting things? You're collecting all of these random things that get generated based on your types, then you can look at them and say, hey, okay, here's where we're missing a lot of stuff, and then you can start to build out more knowledgeable generators. CHRIS:: Yeah, yeah. Having the tools to introspect the data you're generating in a readable, understandable way is super important. You have to have that to be able to induce real failures, or at least explore the kinds of interesting failures you'll need to find. So if you can start with types and then figure out a way to compose that with the ability to drive the scaling or sizing of those values in specific ways, that's fine. You just need easy and composable ways to do that. But yeah, distributed systems are hard; that's the takeaway. ANNA:: It's really hard. It's really hard to control that stuff.
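The bucketing Chris describes (roughly what PropEr's statistics-collection gives you) can be approximated in a few lines: histogram the generated values and eyeball the distribution. The generator and bucket width here are made up for illustration:

```python
import random
from collections import Counter

def gen_channel(rng):
    # Hypothetical generator under inspection.
    return rng.randint(0, 255)

def bucket(value, width=50):
    # Label the range a value falls into, e.g. 137 -> "100-149".
    lo = (value // width) * width
    return "%d-%d" % (lo, lo + width - 1)

rng = random.Random(42)
counts = Counter(bucket(gen_channel(rng)) for _ in range(10_000))

# Print the distribution, lowest bucket first.
for label, n in sorted(counts.items(), key=lambda kv: int(kv[0].split("-")[0])):
    print("%-8s %5d" % (label, n))
```

Reading the output, you'd expect the first five buckets to be roughly even and "250-299" to be small (it only covers 250 to 255). If a bucket you care about barely shows up, that's the cue to rework the generator.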
It's really hard, but it's really interesting. CHRIS:: I still contend that getting the Raft stuff I've been working on just to work was one of the more fragile things I've ever worked on. Maybe that says more about my programming skills than anything, but you get any piece of it wrong and it just falls apart. Everything has side effects, it's all moving parts, it's all writing to disk, all these things all the time. If you get any part of that wrong or out of order, or you mess up any of the guarantees, it all comes crumbling down. Peter Alvaro talks about this as well; I think he talked about it in his talk at Erlang Factory a couple of years ago. There's something to this idea that safety and data consistency is not a composable property. You can't take two consistent systems, compose them together, and get a consistent system; that's not a real thing you can do. So you have to make sure you get every piece of this totally right, even the integrations between pieces that are proven on their own, because things can still go wrong in that integration. It's not a composable property. ANNA:: That makes sense. AMOS:: That's unfortunate. So we've talked about a lot of papers and things people might want to read, and we have at least one problem Anna has put forth, which is dealing with the blockchain. But what's a good beginner distributed-systems application you could build? If you wanted to play around and start to learn more about this, you could read all the papers you want, but for a lot of people, if they're not doing it, they're not learning. So do you guys have any good starting problems?
CHRIS:: Build your own MapReduce. I think it's the easiest one to get right, and you'll learn a bunch of stuff by doing it. Do it in a real distributed fashion and you'll learn a ton, and you'll understand a bunch of the trade-offs you have to make to get those kinds of systems to work correctly. MapReduce has a bunch of qualities built into it that handle faults sort of by default, so it's a good one to get started with. AMOS:: That makes sense. ANNA:: Cool. Thanks, y'all. AMOS:: Yeah, I'm getting hungry; I haven't eaten. All right, well, let's go get some ramen noodles. A cup of noodles will be ready in one minute and you'll be good for your meeting. You guys have a great day. ANNA:: All right, you too. Bye. Bye.
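For anyone taking Chris up on the suggestion, the three phases of MapReduce can be sketched in a single process before distributing them. This is a word-count toy in Python; a real version would run mappers and reducers on separate nodes and handle retries and partial failures, which is where the actual learning happens:

```python
from collections import defaultdict

def map_phase(chunk):
    # Map: emit (key, value) pairs for each record in one input chunk.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs_per_mapper):
    # Shuffle: group every emitted value by key across all mappers' output.
    groups = defaultdict(list)
    for pairs in pairs_per_mapper:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a final result.
    return key, sum(values)

chunks = ["to be or not", "to be"]            # pretend each chunk lives on a node
mapped = [map_phase(c) for c in chunks]       # would run in parallel, per node
result = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
print(result)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

The fault-tolerance quality Chris mentions comes from the phases being idempotent per chunk: a failed mapper or reducer can simply be re-run on its input without corrupting the overall result.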