CHRIS (00:08): Welcome to Elixir Outlaws, the hallway track of the Elixir community.

PAUL (00:17): Uh, a guy I worked with at one of my previous companies, Exosite, uh, Dan Slimane, I think he's at Etsy now. He was really into coming up with, like, the best way to react to operational issues. So he, I don't know if he came up with this idea or if he was just kind of building on it, but there's this idea of differential diagnosis that doctors do, right? This was featured heavily in, like, House, for example. You know how they all sat around a room and...

CHRIS (00:45): It's lupus.

PAUL (00:48): That whole situation, that's what they were doing, differential diagnosis. Somebody would propose a possible issue and then everybody else tries to basically shoot it down or support it, right, depending on whether there's evidence one way or the other. And so you eventually figured out that that potential diagnosis, that hypothesis, was not supported by the evidence, so we can eliminate it as a cause, and you work your way down the list of hypotheses until you arrive at the one that is the most likely case. Um, and the way he set it up, we were using HipChat at this place, but it really works on any sort of chat platform, was that he built a bot that spun up a new room for a particular issue and invited all the people that were important for that particular problem. And you could add more people later if need be. And then there were commands, like start a hypothesis and attach information to it and all that kind of business. So you could start an issue and then have all the steps that occurred toward troubleshooting it, and all the conversations that happened, as a log after the fact. So you could do a post-mortem on the problem and have the whole life of the issue, how steps occurred and at what points they occurred, all that business. Which is super helpful, to be able to come back after the fact and say, okay, this problem occurred at 10:20 PM. The first person started looking at it at, you know, 10:25 or 10:30. We started looking at this issue and this issue, and we eventually worked it down to this one problem. We were able to close the issue out at, like, 10:45, or whatever the case might be. You can see after the fact exactly what happened, and if mistakes were made, or somebody did something that exacerbated a problem, you can see that and, in fact, have that in your post-mortem. So you can be like, okay, well, in the future we need to make sure that we communicate better on this step, or whatever the case might be. It's not so much a blame-game situation as it is figuring out what went wrong, right? Like, how do you prevent that from happening again? And also, I think you would have somebody claim kind of ownership of a step, or of part of it. So that person had to sign off on, like, "this problem has been dealt with," but also they were the person that was supposed to be executing on that particular thing. So you don't have multiple people in there stomping on each other's shoes, right?

CHRIS (03:18): Right, right. And adding to the noise and getting in each other's way, yeah. Sometimes just having too many people in there is super counterproductive. You can't see what's going on and you can't follow.
Like, even if you're going in the wrong direction, in order to determine that, someone has to sort of quietly raise their hand and then help point you in the right direction. It doesn't help for everybody to shout. You know, stay cool and collected, it's all right. We have a process and we'll work through it.

PAUL (03:45): It kept things organized and neat and clean. And yeah, there was definitely an atmosphere of, like, don't just pump noise into this room. There's a process. If you see something that's important, call it out. But otherwise, you know, stick within the bounds of the workflow here and let the people that are responsible for a particular thing do their job. Don't hop in and do your own thing, even if you maybe know more about the system. Like, you can offer to hand it off or whatever, but let the people that are responsible for a particular part do that part. And that prevents people from accidentally changing something underneath somebody else that's trying to troubleshoot a problem, and suddenly it works and they don't know what happened. You know, that doesn't help people solve the issue later. People have to know why a thing actually was fixed in order to prevent it from happening again. So, yeah, I don't know how our conversation got so deep into the ops world, but it's all right. That's how it goes. I do think that if people have an opportunity to spend some time doing, you know, kind of the ops side of things, it's well worth it. It sucks to be on call, and it sucks to be working on issues in a kind of high-stress environment. But if you're at a place that has a process like this, or that is amenable to a process like this, it definitely brings the stress down. But the real benefit is that you start to have a real understanding of the impact of your application design, of how the decisions you make in your Elixir applications actually impact the system as a whole, or the maintainability of the system, the observability of that system. Like, if you build an Elixir app and you never actually interact with it in an ops setting, you're not going to know that the metrics you gave them are useless. You have to troubleshoot an issue before you're like, wow, how did we miss that metric? Like, that's super important. If you are never in that position, you can't hope to be aware of when you're making bad decisions on the application side. And it really does require a two-way street there. Whether ops and engineering are separate departments or the same department doesn't really matter. You have to have people that understand the impact of, you know, logging and metrics and alarms and all that, whether they're beneficial or just spurious and adding to the noise.

CHRIS (06:07): I mean, application people have to be involved in this. This is something I'm really, really big on, because at the end of the day, even if you're building some silly REST endpoint, you have SLOs. You have objectives that that service has to fulfill. And realistically you have different SLOs on a per-endpoint basis. Like, for that service, there are probably endpoints that are more important than others. The example I use a lot is, you might need four nines for your login page, but only two or three nines for your user profile page. Like, who cares if you can't get to the user profile page, but login should never be down. It's like the Amazon shopping cart thing. Like, we'll be wrong before we're down.
In order to know that level of stuff, you have to have application people involved in that monitoring process. And this is also why tools like New Relic are garbage, because they don't tell you anything interesting about your business. They don't tell you how your stuff is actually working in regard to those SLOs, because you just can't do it. You can't customize it enough. You can't really get in there and actually look at the real metrics enough to know that. I mean, they work fine for small apps. Like, if all you have is this one app and it's all equivalent, like it's either up or down, it's just one big Phoenix app or whatever, it's fine, whatever. But if you've got a whole host of different services, a whole set of things you need to account for, you have to build that level of specificity into the application itself.

AMOS (07:36): Every application needs different logging, different metrics. And I think that's why you have to have the application people involved. They have to know it's there, because you don't know what the important metric is until you don't have it.

PAUL (07:50): Yeah. You definitely need to be talking with whoever the project management side of it is, to be like, hey, you know, from a customer perspective, whether it's an internal customer or external doesn't really matter, what are the objectives that we're trying to achieve, and what are the guarantees that we have to provide? Because those are going to be converted into dashboard items in, like, Grafana or whatever, that you will then tie individual metrics to, to say yes, we're meeting this, or no, we're not. And tie alerts to, and that kind of thing. And I think, like you're saying, yeah, if you don't have the ability to link metrics from your application somehow into a number that represents "yes, we're achieving this" or "no, we're not"... Whether that's, we want to be able to support, you know, a hundred thousand concurrent users, regardless of what they're doing. You need to have a metric that shows you how many concurrent users you have, and an alert that says, hey, the application is starting to error out at 50,000. You need that for all the objectives of your system. And too rarely do I see people define the objectives. They define the metrics, but oftentimes the metrics are kind of whatever they felt might be useful for troubleshooting issues. They're not metrics about the thing you really care about. What defines the system being healthy, or available, or useful? Those are the things that you need metrics to support.

AMOS (09:13): If you're not talking, you start adding too many metrics, and then your metrics stuff is slowing your system down for metrics you don't need. Yeah.

PAUL (09:20): If you're pumping in way more data than you actually need to troubleshoot issues or identify whether you're meeting your SLAs or SLO guarantees or not, yeah, you're just wasting bandwidth, data, CPU, all that jazz, for no real benefit.

AMOS (09:37): I've, I've heard often in the embedded world people say, you know, if you have a system that's slowing down, it's always logging.

PAUL (09:43): Yeah. Logs are another dangerous one. Yeah. If you're writing...

CHRIS (09:46): I freaking hate logs.

AMOS (09:48): They're not even fun. Like, going through them is terrible.

CHRIS (09:52): They're such a necessity, but they suck so much. Like, everything about them sucks so much.
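A minimal sketch of the objective-linked metric Paul describes, using the `:telemetry` library. The event name, the `/login` route, and the 250ms budget are all hypothetical, and a real setup would feed a dashboard and alerting system rather than a log line:

```elixir
defmodule MyApp.SLO do
  require Logger

  # Attach once at application start; "slo-login-latency" is just a handler id.
  def attach do
    :telemetry.attach(
      "slo-login-latency",
      [:my_app, :endpoint, :stop],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Only the endpoints with tight objectives get scrutiny: a login route
  # might be held to four nines while a profile page is not.
  def handle_event(_event, %{duration: duration}, %{route: "/login"}, _config) do
    ms = System.convert_time_unit(duration, :native, :millisecond)
    if ms > 250, do: Logger.warn("login over its latency budget: #{ms}ms")
  end

  def handle_event(_event, _measurements, _metadata, _config), do: :ok
end
```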
PAUL (09:57): I don't know how many issues I've actually solved because of logging output. Exception output, yes. But information that was logged that was not part of an exception has almost never been particularly useful, because there are never enough logs to give you "this step happened, then this step happened, then this step happened." Or, if there are, it's at too high a level, or it missed some steps, because somebody changed a line of code to call some other part of the system that didn't have that logging. So now you don't know that that whole thing happened. That's why Erlang's tracing stuff is so useful. Like, if you see some problems, you can just remote shell in and turn on some targeted tracing and actually see exactly what's happening. That's way more beneficial to me than logs. And if you have good metrics, you almost don't really need the logs, because the problem is usually self-evident. Like, you had a burst of requests in this time period, and correspondingly there was a burst in memory usage, and that triggered the out-of-memory killer, or something like that. Those things are almost more useful for troubleshooting issues than logging output. But logging output is more useful for, like: this exceptional request came through, we weren't really sure what to do with it, but here it is, maybe stripped of sensitive information. Later on, maybe take a look at this. And so you log that as a warning or whatever, and you can come back and figure out, well, where were we supposed to handle this? Was it truly broken? How did we even get this thing? That's, to me, what logging is for. But too often people just log literally everything, and it's just noise. You can't sort through that in any sort of reasonable amount of time.

CHRIS (11:28): And after that, you get these ridiculous log services that need these ingestion pipelines to be able to handle them all. We have a problem where, like, we can barely log stuff, because we blow away any pricing tier available on any service that we might be able to use, almost instantaneously.

PAUL (11:46): You pretty much have to self-host your log aggregation, for sure.

CHRIS (11:50): We're way beyond what we can pay somebody to do for us, in a lot of ways. So we just have to dial our logging in to very specific things. But even then, once you get the logs, then you've got to shred them into something that makes sense. Like, first of all, you have to learn how Lucene works, which is fine, whatever, that's a useful tool. But then, secondly, you have to write parsers and all this crap to extract all the useful information, and then you have to, like, restructure the logs so that they make sense in context. It's just a nightmare.

AMOS (12:21): I think they're good for post-mortems sometimes. But especially in multi-node systems, you might have two log messages that pertain to your problem, and then fifty of other stuff, then two more, by the time the scheduler gets around to doing anything with it. So...

PAUL (12:35): That's why tracing IDs are so important. You need to be able to connect the dots between the systems, not just in one system. And even within one system, you have different components, particularly in Elixir, where, you know, everything is processes. You need to know that the work for this request went through these different components of the system.
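The "remote shell and targeted tracing" Paul mentions can be as simple as the built-in `:dbg` module. A rough sketch, where `MyApp.Worker` is a hypothetical module; on a busy production node something like `recon_trace`, which rate-limits output, is the safer choice:

```elixir
# From a remote shell on the running node (e.g. via `iex --remsh`):
:dbg.tracer()                   # start a tracer that prints to the shell
:dbg.p(:all, :c)                # trace function calls in all processes
:dbg.tpl(MyApp.Worker, :_, :x)  # all functions in the module, showing
                                # arguments, return values, and exceptions
# ... reproduce the problem, read the trace output ...
:dbg.stop_clear()               # stop tracing and clear all trace patterns
```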
And with the logging, like, here's the thing: the internal tracing tools in Erlang are one thing. I'm talking about tracing IDs in your logs. So, like, there's a unique ID that's attached on entry into the network, and then everything that's related to that request, or whatever it is, has that unique ID tag.

CHRIS (13:13): We've been investing pretty heavily, uh, in time, I should say, uh, not really money so much... oh, well, I guess it's kind of the same thing. But we've been investing in using OpenTracing a lot more, and that's looking really, really promising so far, from a really high-level standpoint. Because I think most of the problems, at least in our experience, most of the problems we have are very much to do with how systems interact with each other, across multiple nodes, across multiple different isolated services and that kind of stuff. So being able to just open an APM chart and see, oh, here's when the request came in, here's how it hit the front door, here's where it went to this other service, then it went to this database, and then it came back, and then it errored. That's pretty, that's pretty dope. Yeah.

PAUL (13:54): Super useful, for sure. Particularly if you're the one making the request. You know it bailed, but you're not exactly sure why. Being able to actually hop into your logging and see the exact steps the thing went through in the system, and what data was carried along with that request along the way, is definitely helpful. Sometimes all it takes to figure out a problem is knowing where a thing crashed and burned. You know, in a complex system that's composed of a lot of microservices, knowing just that piece of information is super handy. As long as you know, for sure, that if it made it somewhere, it got logged. But that's the other problem.

CHRIS (14:29): Yeah. It just goes into, like, the AWS black hole, never to be seen again.

PAUL (14:36): You almost need an on-entry, on-exit thing for each system, so that when a thing leaves, you know that too. Because if it got dropped into the ether somehow, you need to know that it wasn't the source system that dropped it, it was something else. Unless the source system was supposed to retry, in which case you still have the answer to your problem, right?

CHRIS (14:57): Observability is hard. And especially in big systems like this, it's complicated.

PAUL (15:03): Yeah. It's basically a never-ending journey. But making it part of your project, your application design, early on makes it way easier later. If you're trying to bolt it on after the fact, it's going to be super painful. And every once in a while you have an application, or you decide, hey, we're going to add this thing into our system to aggregate some metrics or logs or whatever, and so you do have to rework things. But if your system is designed to kind of abstract out where those things are written to, and you're making sure to log and track metrics for the things that are important and push them out to that abstraction, then it doesn't matter so much where it actually ends up. It'll end up wherever it needs to go. And there are problems to solve there. Like, are you writing metrics to something that can handle the throughput of the metrics that you're trying to write, and all that jazz. But just do stuff, see if it works. If it's not working for whatever reason, there are solutions to those problems.
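For the tracing-ID idea, a unique ID attached on entry and stamped onto everything related to the request, here is a minimal sketch with Plug and Logger metadata. Note that Plug ships a built-in `Plug.RequestId` that does roughly this, so the module below is only to show the shape:

```elixir
defmodule MyApp.TraceId do
  @behaviour Plug
  import Plug.Conn

  def init(opts), do: opts

  def call(conn, _opts) do
    # Reuse an ID that arrived with the request, or mint one on entry.
    trace_id =
      case get_req_header(conn, "x-request-id") do
        [id | _] -> id
        [] -> Base.encode16(:crypto.strong_rand_bytes(8), case: :lower)
      end

    # Every log line emitted while handling this request now carries the ID
    # (given `metadata: [:trace_id]` in the Logger console config), and the
    # response header lets the next system in the chain pick it up.
    Logger.metadata(trace_id: trace_id)
    put_resp_header(conn, "x-request-id", trace_id)
  end
end
```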
That's how all software engineering works: you just try over and over again until you get it right.

CHRIS (16:08): That sounds like my, my coding methodology. Just keep trying over and over again.

AMOS (16:12): I'm just glad that bridge engineering doesn't work that way.

CHRIS (16:17): Those people use, those people use math and stuff.

PAUL (16:22): I don't know. We don't know anything about early bridge builders, okay? They might've done exactly that.

AMOS (16:27): That's true. It's true.

PAUL (16:30): Forty years after they started building the first bridges, were they still just building them until they figured it out? That's basically us.

CHRIS (16:38): Oh man, I have so many other things I want to talk about. We're running out of time.

PAUL (16:41): That's, that's how it goes.

AMOS (16:43): I still have an intern question, but I think it will spin into a very large conversation.

CHRIS (16:49): I want to talk more about other config things. I have other questions about stacking theory stuff and more specifics about how you're doing the monitoring with Ecto. And, I mean, I want to talk more about use cases I have where I need to be able to change configuration on the fly. Like, more specifics about that. Uh, like, I need to be able to toggle, like, an ETS, you know, bit or whatever to stop a certain feature or whatever, for now. Like, so, now that you've gotten Distillery 2.0 out, what's next? Are you taking a break? Are you just going to focus on something else, on one of your other myriad libraries?

AMOS (17:25): He is writing a book.

CHRIS (17:26): Oh yeah, yeah. You're gonna finish your book. How do you sleep? Like, how do you... You have so many important open source libraries to this, to this silly community, and you also have a book and PRs and presumably a life, like...

PAUL (17:49): No. I mean, I will probably be... I'm trying to address some of the things that I've been ignoring for the last month, because I was trying to get this release out the door. But there's not too many. I don't have any ideas sitting in my head right now that I need to get out in the form of a new library or whatever. But a lot of that is because, A, I really want to finish this book and just get that out of my hair, because it is a nightmare trying to convert what's in here into words that people will want to read. Thankfully, my editor at PragProg is super good about helping me work through that stuff. So that's kind of the big priority for me, I guess, just trying to finish that, because I've been sitting on it now for months. And then once that's done, I kind of want to trim the fat a little bit on some of the things I've got out there. You know, some of the libraries have kind of reached stability, so to speak. I don't really have to pay attention to them, so I just kind of keep updating them bit by bit, or let PRs handle that. Some of the others, though, they are active, but I'm not in a position where I'm actively using them. I have ideas for some of my things, but I'm really looking for people that are interested in taking those over, fixing active issues and evolving some of those projects. And I can come back in and, you know, help participate on those things. But really, finding maintainers for some of that stuff, either as primaries or just helping me close issues out.
And then I've had some people offer on, like, Timex and Swarm. But I really want to trim things down to the point where, yeah, I don't have as much bandwidth spent across ten different projects. Because it really does soak up so much of your time to get back into the context of a particular project, to troubleshoot an issue, to figure out what needs to happen there. You've got all the change management around that, too. Like, is this going to be a breaking release or not a breaking release? How does this tie into dependencies upstream that I know are using this library? That kind of thing. Those things take a lot of, just a lot more time than you'd expect, I guess.

AMOS (20:00): It's hard to be a project manager on multiple projects.

PAUL (20:04): Pretty much. It's more of an investment than you necessarily think when you're getting into open source. I know it was for me, anyways. It's not onerous, necessarily. Like, sometimes it can be, when I'm going through my yearly burnout phase, for sure. I'm like, I hate this. But it always comes back around where I get excited about stuff again, whether it's because I had some idea or I'm just not feeling burnt out anymore. It's just a natural part of being a software engineer that's working on stuff outside of work. And even if you're just doing it separate from work, burnout is still a thing. If you're working on a project that you're just not excited about, doing menial stuff, like, it will drive me nuts, slowly but surely. It's really the worst.

CHRIS (20:48): And then you feel, at least it's been that way for me, I just feel guilty, and I feel, I dunno, I get really complicated emotions about all that.

PAUL (20:59): Uh, I know what you mean. Yeah.

CHRIS (21:02): It's super complicated. And to me, it applies such a negative overhead on my life. I mean, that's why I wrote a blog post recently about, like, hey, I need somebody else to maintain Wallaby now. Part of that was, it had become such a big project and such a time commitment that I knew I wasn't putting into it, that it became a negative burden on the rest of my open source stuff. It wasn't even that I could just ignore it and then go focus on the Raft stuff or go focus on whatever else. It overrode all my ability to do all of the other outside work, because I just felt so sort of demoralized by it.

PAUL (21:43): That's totally how it works for me too. Like, if I am feeling burnt out, it's not on one project, it's all of them, right? There is no specificity to it. It's just, in general, I'm not excited about programming. And the only thing that works for me on that is to do other things. I will do whatever else I want to do. You know, I've been doing more and more woodworking, trying to build up a little workshop here. And so that helps me, because I'm still in, like, a builder's mindset, but doing something totally not programming related. Or just getting outside, you know, doing whatever. I mean, you should be doing that anyway. Yeah. But for sure, if you're dealing with burnout, just spend more time doing things like that. Reading books, like, that's another one that works wonders for me, because I love reading. So I'll just sit down with a stack of three or four books, and by the time I get done with all of them, I usually feel kind of reset. It just depends. It's always a time-based thing for me, and I never know how much time it is. Just one day I wake up and I'm like, yeah, let's do some programming.
That's how that goes.

AMOS (22:59): I find when I do those other things, too, I get, I get ideas that make me excited about programming again. That brain break sometimes leads to...

PAUL (23:07): You need that seed, yeah. That thing that just sort of grows in your mind until eventually you're like, man, I really need to see that happen. And so you just start working on it, and then that process of working on it gets you excited about all sorts of programming again. And that's what happens to me. Like, I'll get burnt out on everything, something will get me excited about programming again, and then I'll go through and work on all my projects. And it's not like I immediately then get burnt out again. It's usually like, I'm excited about stuff for a while, things sort of reach an equilibrium, and then at some point burnout happens again. I feel like some people think that burnout is this thing that happens sometime in your career, and that's it. For me, it's yearly. It happens every year.

CHRIS (23:50): You can set your clock by it.

AMOS (23:51): Yeah. I mean, that's, that's why we have vacation, right?

PAUL (23:55): Yeah, for sure. Yeah. Working at a place where you're either not getting vacation or you're not taking it, that's going to be part of your problem. You've got to do something about that. But I don't know that that's necessarily the problem for a lot of people that have burnout. For me, it's not so much that I'm not taking vacation. It's just, I get too invested in what I'm doing, and so it becomes much more of a slog than it is enjoyable. Maybe all I'm doing is writing bug fixes and just maintenance stuff. I'm not actually generating new stuff, new ideas. When I'm too long in that kind of a phase, that's when it starts to happen. I need to be working on something interesting while doing the boring stuff.

CHRIS (24:37): Yeah. I mean, these things have to fulfill you too, or it's just more work, right? Now you're just working three jobs or whatever. Yeah, exactly.

AMOS (24:46): Work becomes just a paycheck.

PAUL (24:48): Yeah. Part of it should definitely be, I'm excited about either, like, a product that you're building or just the challenges that you're dealing with. Sometimes just having a problem and trying to solve it is all it takes to be into it again. Yeah.

AMOS (25:05): That's what works for me. It's cool to have a cool project that maybe I can tell everybody else I'm working on, and they get excited about it. But day to day, it's, it's really the challenges and, and the problem solving that I find...

PAUL (25:18): Well, that'll lead to burnout too, just being like, I need to come up with something super cool. Like, no, you don't. Just find something that's interesting. Maybe out of that will come some kind of cool thing, but more likely than not, you'll just work on that cool thing, finally arrive at a solution, feel really good about it, and maybe share it with some people, and then that's it. Move on to the next thing. To me, that's just as fun as running a project that a bunch of people use. More fun, really, because I don't have to care about it.

CHRIS (25:46): Right. Once people use it, then you're like, well, crap, now I'm on the hook to take care of this thing. Yeah.

PAUL (25:53): I've also reached a point where I'm like, if I can't get to the maintenance tasks on this project, there's a reason why it's open source. Feel free to fix it. Please contribute back.
If you feel that's a good thing to do, but I'm not going to pressure you into it. Just fix the problem you're having, and when I get around to it, eventually, maybe I'll fix it for everybody else. But if you put too much stress on yourself to be the arbiter of all solutions for every problem that everybody has with your library, then you're going to burn yourself out so fast. It's just not worth it. You have to be willing to be like, well, sorry that that's happening to you, but I don't have the time to deal with that.

AMOS (26:31): Zero issues is a nice thing, but, but don't make it your goal.

PAUL (26:34): It's impractical. It will never happen, unless your project is fairly minimally used, or inactive, or very niche. You're almost always going to have a few active things.

CHRIS (26:45): Well, and you always have to keep in mind: if this person who's asking you to fix this thing can't be bothered to fix it, but yet their business depends on it, then they're the ones whose priorities are out of line, because it's literally free. Like, they're basically asking you to just give of your time to fix a thing in their business.

PAUL (27:06): I do think that if somebody uses that as their thing, like, "but I'm using this at work and, you know, it's super critical to our business, you need to solve this," then I don't feel bad for those people at all. But if people are using one of my libraries, it's often because they do not have the time or the expertise to deal with whatever problem this thing solves.

AMOS (27:25): Right. They don't have time to build Distillery on their own.

CHRIS (27:28): They don't even know how.

PAUL (27:30): Yeah. There's lots of reasons. And if the expectation is that, if it's not working for them, they have to learn all that and make it happen, I'm not expecting that of people. I do expect that you put in your due diligence. If you have an issue and you're opening an issue and you want some help, like, I want to know that there's a problem so that I can work on a solution. But I also expect people to put in some effort if they want a solution fast. And if they're going to demand a solution, well, then I might just be like, I'm not doing that.

CHRIS (28:02): Well, even people who've been like, "hey, when can I get this in there?" And then that's when I look at the PR and I'm like, oh, you did not test this. So once you get some tests in here, then I'll be able to merge this.

AMOS (28:11): When I've put in issues and not necessarily known how to fix them myself, I will often put a note in there that says, I would love to work with you on this, or, if you could point me in the right direction, because I have no clue what is going on or how to get there.

PAUL (28:26): As a maintainer, even if I don't have time, oftentimes if I see somebody that's in that position, that obviously wants to help or is willing to put in the effort, like, I'll go out of my way to help people like that. It happens all the time on, like, Slack or IRC or whatever. Somebody has an issue or pings me about something, and they're clearly trying to do whatever they can to help me fix it. And so, in those cases, if I can, I will set aside the time and work through the problem with them, because I know that they would do it themselves, but they don't have the expertise or whatever to fix it themselves. So they need my help to make that happen.

AMOS (29:06): And you might be giving them the entry point to add ten more fixes that you don't have to do. Right?
PAUL (29:11): Right. By working through problems with people, you get the ability to teach them how to troubleshoot certain kinds of issues. And in the future, they may be capable of fixing the kinds of things that they came to you for this time. They don't need to do that; they can dig in and fix it themselves and then come to you with an issue or a PR or whatever. But yeah, I do see a lot, like Chris was saying, I see a lot of people that open up PRs with changes and, like, no tests. Which is fine if the thing they're changing already has tests. But if it doesn't, great, you've made a change. Presumably you tested it to see if it works, but I don't know that, and now I have to maintain the side effects of whatever this is. With Distillery, I'm less, I guess, annoyed by it, because the test suite there is not friendly, I mean the integration tests. It's really a pain to test the tool.

CHRIS (30:10): That's a big tool chain to try to get all running.

PAUL (30:12): You've got a bunch of stuff in shell, and then you've got the Elixir side of things, and you're spinning up nodes and you need to connect to them and interact to see what's happening. Like, I've gotten it better now. It's definitely better than it was. But, you know, if you're coming into it fresh, figuring out how to write a new test for some specific case is nontrivial. A lot of the issues are with the shell side of it. I run ShellCheck, so I at least have one thing doing identification of issues that happen. But, you know, there aren't great ways to test shell on a bunch of different platforms, and that's where a lot of the issues come out. You can either bundle ERTS or not bundle ERTS; you can bundle ERTS and then deploy it to the wrong platform; you can run the shell in an environment that doesn't have a proper shell. And so now we're using Bash, but even then you might have different versions of Bash that don't support some weird syntax thing I decided I was going to use. You know, there's lots of things like that that just sort of add up on the shell side, where it's just manual testing, always manual testing.

AMOS (31:14): I run, um, the GrovePi library. It's for the GrovePi, which connects to a Raspberry Pi and allows a bunch of I2C devices to connect. So people add a device to that, and I don't always own them, and I can't go buy every piece of hardware out there. So...

CHRIS (31:32): Nerves testing seems like a giant boil-the-ocean problem. Like, testing anything in Nerves has gotta be a boil-the-ocean problem, because there's just too much hardware.

PAUL (31:42): You need all the hardware. Yeah.

AMOS (31:44): Yeah. So, I mean, half the time somebody puts tests in, and then I go find the documentation for the device. Not that I get a lot; it's a very beginner library, so there's not a lot that comes in. But when it does, I go read the documentation for whatever component they're trying to add support for, and then compare that with the tests that they wrote. And still, for me, half the time it's just a guess whether I should actually merge it or not. But I assume that they're using it on a project, because otherwise they wouldn't. Like, you don't go support a plugin component for your Nerves device that you don't actually own, normally. So I hope, before they've pushed that to me, that they've actually tested it on hardware. And so then I'm just double-checking with the documentation that, you know, maybe some math is in there correctly.
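For reference, the ERTS-bundling choice Paul is describing lives in Distillery's `rel/config.exs`. The `include_erts` option itself is Distillery's documented setting; the path below is made up:

```elixir
# rel/config.exs (Distillery 2.x)
environment :prod do
  # true bundles the ERTS the release was built with; deploy that to a
  # different OS or architecture and the bundled runtime won't run there.
  set include_erts: true

  # Alternatively, point at an ERTS cross-compiled for the target
  # (hypothetical path), or set false to use the ERTS already installed
  # on the target machine.
  # set include_erts: "/opt/erlang/erts-for-target"
end
```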
And then I just mash merge and hope.

CHRIS (32:32): Let go and let God. Yeah.

PAUL (32:35): Yeah. Well, one of the libraries I wrote recently is actually for clustered tests, because I've got a few libraries that do clustering stuff, and even Distillery has a lot of tests that run on separate nodes, and I got really sick of all the boilerplate associated with that. So that was kind of what spawned it. But just in general, I feel like ExUnit out of the box does not have good tools, both for that and for just testing things that interact with the outside world. Like, in Distillery, we have to run a bunch of automated tests in different configurations and test the results, and all that ends up having to be done basically by hand, and it becomes kind of fragile if you don't hit all the ways that it can fall apart. If you actually look at that ex_unit_clustered_case library, it's super complicated. Like, there's a lot there, and it's doing more than you would necessarily need to do to set up clustered tests. Like, you can also go look at Phoenix PubSub or something to get an idea of how small it can be. But when you want to spin up a bunch of nodes in parallel that you can execute anonymous functions on, and you want to be able to test nodes coming and going from a cluster and all that, pretty soon it evolves into a real beast. That library ended up being way bigger than I was expecting it to be. But on the flip side, writing tests with it now feels a lot more pleasant.

CHRIS (34:00): Yeah. I ended up building something similar for some of the Raft stuff I was doing. Uh, using, like, slave or whatever and RPC to start spinning up other nodes, and then using the, uh, code loading off of the VM you spin up when you run your ExUnit tests. That becomes, like, the code server, right? Same sort of idea. Like, you load all the code from that remote thing or whatever.

PAUL (34:23): The only thing that doesn't work with, which is one of the things I found out, and it was really annoying, is that the code server will not send along modules compiled in memory. It has to have a file on disk, and it has to be, obviously, a BEAM file on disk that matches what you have in memory. So if you've manipulated it somehow, you're not going to get the right version. What happens when you do that is you load that code, and it'll blow up on the other node as, like, a badfun or something, because the bytecode is different. And that's if you even get it to the other node. In the case of things that are purely in memory and not on disk, there is both no way to get the BEAM bytecode from memory and write it to disk, and no way to pass that in-memory bytecode to another node. Which, to me, seems like a huge failing in the code server. I don't know why that doesn't exist.

CHRIS (35:18): It's, like, a security thing, maybe?

PAUL (35:20): I'm not sure. It could be, but security has never really been a huge part of distribution. It's an open door for the most part. Even the defaults for distribution are like, yeah, if you want to listen in and log on and do stuff, feel free.

AMOS (35:35): Enjoy.

PAUL (35:37): It's definitely a thing that caught me way off guard, because I thought for sure... and I had actually worked around it a little bit.
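A rough sketch of the pattern Chris describes: spinning up extra nodes with the old `:slave` module and pushing code to them over `:rpc`. The helper module here is hypothetical (it is not `ex_unit_clustered_case` itself), and it assumes the test node was started distributed, e.g. with `--sname primary`:

```elixir
defmodule ClusterTestHelper do
  # Start a child node on this host and give it our code path, so the
  # ExUnit VM effectively acts as the code server Chris mentions.
  def start_child(name) do
    {:ok, hostname} = :inet.gethostname()
    {:ok, node} = :slave.start(List.to_atom(hostname), name)
    :rpc.call(node, :code, :add_paths, [:code.get_path()])
    {:ok, node}
  end

  # Push one module's bytecode across. :code.get_object_code/1 only works
  # for modules backed by a .beam file on disk; for purely in-memory
  # modules it returns :error, the exact limitation Paul ran into.
  def load_module(node, mod) do
    case :code.get_object_code(mod) do
      {^mod, bin, file} -> :rpc.call(node, :code, :load_binary, [mod, file, bin])
      :error -> {:error, :module_not_on_disk}
    end
  end
end
```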
I think the one case where I had run into it previously, I was adding the code path manually for some modules that weren't already on the code path of the testing server, because I thought, hey, I've loaded this module in memory, it should just work. But it didn't, because it couldn't pass along that in-memory thing. Um, so the code paths have to be the same. And then I ran into this in-memory thing, which is just, just nuts. I think that could be fixed. It probably could be fixed. I didn't look into actually trying to do that, but I did dig all the way through OTP to figure out why the code server wasn't doing what I thought it was doing. And that's sometimes all you have to do, right, is dig through the Erlang source code, which is actually surprisingly easy to navigate, unless you have to dig your way into the C side of it, which is much more obtuse.

CHRIS (36:34): Ridiculous, and formatted in the most ridiculous way ever.

PAUL (36:39): Uh, what's that style? I forget what the style is called.

CHRIS (36:42): No, I don't remember. It's wild, though.

PAUL (36:45): I've had somebody explain to me why the formatting works that way, and, I mean, I get it. I also don't like it aesthetically, but I understand the reasons behind it.

CHRIS (36:56): It made a lot of sense when you could only see 80 columns on a terminal or whatever.

PAUL (37:01): Yeah, that's true. But I think it also just stacks the important things about a particular function on top of each other, rather than having them all on one line. So if you follow that convention everywhere, it does become easier to read. But it does look horrible, also.

CHRIS (37:17): It's really bad. It's really not good.

PAUL (37:20): C code in general is like that. It's true. It's always, somebody hates the way the current stuff is written.

CHRIS (37:28): Yeah. That's how you, that's how you get a gofmt tool built. You do that long enough and then you get gofmt.

AMOS (37:36): If you look at the same format long enough, you get used to it. It's kind of nice.

PAUL (37:42): Yeah, I, I do appreciate the automatic formatting stuff. I have at times been a little frustrated with mix format, but then I realized that, ultimately, I can fix the things that it initially formats weird, and then it'll be less bad. But kind of the default way that it deals with things that are too long seems arbitrary. You have to go in and fix them and then rerun the formatter, you know, and then it looks properly formatted again. But if you just run the formatter and go, "I'm just going to use what it did," which is what I've done a few times, you know, that's not great.

CHRIS (38:19): Well, unfortunately, I have to run.

AMOS (38:21): I was getting ready to do that. I'm glad, I'm glad you said it first, because I really didn't want to go.

CHRIS (38:28): I want to do this again. I have so many other things to bring up that I want to talk about.

PAUL (38:33): Yeah. This was a blast.

AMOS (38:34): You're going to be at ElixirConf?

PAUL (38:38): Yeah, I will be at ElixirConf, and, uh, The Big Elixir too. I don't think either of you is probably going to be there, though, right?

AMOS (38:44): I'm not going to be at The Big Elixir. I've got too much going on.
I, I really wanted to, and I did, I started to bring it up to my wife, and then I was like, nah, I think I'm just going to keep this one to myself and not talk about it, because she'll kill me.

PAUL (38:56): Couldn't make it work this year. But yeah, yeah, yeah. I mean, there are too many conferences all happening at, like, the exact same time. Whoever's organizing all these things really needs to spread them out a little bit more. I mean, The Big Elixir's in November, so it's not really so much. Yeah, they picked a good time; that one's pretty good. The rest of them, I think, are a little bit more of a toss-up. They're all in September, basically. Yeah. It's kind of like, pick one or two of them, unless you happen to live somewhere really convenient to get to all of them. But yeah, I mean, it's been great, guys. I really appreciate you having me on the show and talking about all this stuff. We should do it again sometime.

AMOS (39:33): Thanks. I would love to have another one. So, yeah...

PAUL (39:36): I should also send you some of the links, I think, where...

CHRIS (39:40): Yeah, send us the links for the show notes, and then people can read over it. I think there's going to be a lot in there. There's a lot of stuff. [inaudible]