Ben: Hey everyone. Welcome to LogRocket. Sorry, PodRocket. Now, well, actually today it is both LogRocket and PodRocket, because I have a very special guest, Pascal Kriete, who is the VP of Engineering here at LogRocket and was also employee number one at LogRocket. So he's been a member of our team basically forever and I'm super excited to have him on the podcast today. Pascal Kriete: Yeah. Thank you, Ben. It's good to be here, doing well. Ben: So yeah, I guess I had a Freudian slip in my intro because we are on PodRocket, but today we're going to be talking about LogRocket, and I think there are a lot of things we could talk about. I think what may be most interesting to our audience to start with is an overview of how our systems all work. I think folks probably know the product, but maybe we can just start with a 30-second intro about the product itself. I don't want this to turn into an advertisement for the product, but it's helpful just to give some context on what LogRocket does, and then we can dive into how the system is architected and all that fun stuff. Pascal Kriete: Yeah. Absolutely. I think probably the most interesting thing to talk about when it comes to how LogRocket works is the data that we have. Essentially what we record is everything that happens on the screen. So in LogRocket, you can get a video of everything the user has done alongside everything you would see in your Chrome developer console. So network logs, exceptions, performance, all of it. And the general idea behind LogRocket, and now it is turning into an ad, sorry, is that you can record what your users do, and then you can figure out what they've done after the fact. Pascal Kriete: So if you have a user that went through, I don't know, a checkout flow, they clicked on a button, the button didn't work. You can then go into LogRocket. You can see them go through that checkout flow.
You can see them click the button, and you can then see, okay, a network request went out, it flaked out at our CDN, an exception came through that the button didn't work. And now I have something I can try to reproduce and try to fix, instead of just getting, "Hey, this doesn't work," and some poor engineer needing to go figure out how to fix it. Ben: And I think the obvious first thing to talk about there is the video. So what is the video? This is session replay. It's technology that has been around for a fairly long time and certainly predates LogRocket itself, though I think some of the things we do around session replay, or many of them, are unique, and our implementation of session replay is pretty unique. So maybe you could explain to us what session replay is and how it works, and we can go from there. Pascal Kriete: Yeah. I think the biggest thing to highlight there is that it's not video, and that surprises a lot of people. It does look exactly like a video, but what we're recording is the DOM. So the DOM over time: we'll take a snapshot of the HTML when the page loads, we record diffs over time, we record mouse positions, scroll positions, inputs, and you can put that all back together to look like a video, but it's not. Ben: Got it. So when you say putting it back together, could you talk more about how that replay works? And maybe even before that, how do you capture diffs of the DOM? How do you know when the DOM changes? How does that work from a JavaScript perspective? Pascal Kriete: Yeah. So the first thing we capture is just a snapshot of the DOM, and that you can get by walking the DOM as it stands. And then right around IE 11, browsers added this API called MutationObserver. What that does is let you observe any DOM element and all of its children, and it'll tell you when something changes. It's a pretty verbose API.
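As a rough illustration of the capture approach Pascal describes, here is a minimal sketch of observing a DOM subtree with MutationObserver. The `summarizeMutation` helper and `recording` array are hypothetical names for this sketch, not LogRocket's actual SDK internals:

```javascript
// Convert one MutationRecord-shaped object into a compact diff entry.
// (A real recorder would serialize much more detail per record.)
function summarizeMutation(record) {
  switch (record.type) {
    case "attributes":
      return { op: "attr", name: record.attributeName };
    case "characterData":
      return { op: "text" };
    case "childList":
      return {
        op: "children",
        added: record.addedNodes.length,
        removed: record.removedNodes.length,
      };
    default:
      return { op: "unknown" };
  }
}

// Browser-only wiring: observe the whole document subtree.
if (typeof MutationObserver !== "undefined") {
  const recording = [];
  const observer = new MutationObserver((records) => {
    for (const r of records) recording.push(summarizeMutation(r));
  });
  observer.observe(document.documentElement, {
    attributes: true,
    characterData: true,
    childList: true,
    subtree: true,
  });
}
```

The observer delivers batches of MutationRecords, one per change, which is exactly the verbosity Pascal mentions next.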
It'll tell you parameter A changed, parameter B changed, a node got added, a node got removed, and you pretty much get every change on the DOM. Pascal Kriete: So you can treat it like Git, where you have initial state up front and then diffs over time. What we do at replay time is take that initial DOM snapshot, put it in an iframe, and then apply those diffs over time, so that the browser is actually doing the rerendering for the video. And the only thing that we really render ourselves is the mouse, which is, surprise, an image of a mouse. So we'll put that image of a mouse over the video, we'll fix scroll positions, we'll fix inputs. And it ends up looking exactly like a video with all of the browser rendering applied. Ben: And what about style sheets or images? Pascal Kriete: Mm-hmm (affirmative). Ben: Things that could change between the time a session is originally recorded and when you subsequently replay the session. Pascal Kriete: Yeah. Good question. We essentially just cache them. So we will download your images and your CSS, cache them ourselves, rewrite your DOM a little bit to point at those cached assets, and then use those. It also means that you don't actually have to upload all of your images from the browser. That would be a little too much bandwidth usage for us. So when we see an image tag, we'll go grab the source, download it, cache it, and then reuse it at playback time so it doesn't change. Ben: Got it. And yeah, performance is an interesting question. I know there's quite a lot we do with regards to performance and ensuring that our recording doesn't slow down someone's web application. So to the extent you're comfortable sharing, what are some of the more interesting things we do to preserve performance? Pascal Kriete: Yeah. I think most of the audience is probably aware of web workers, and that's probably the biggest piece: we try to do as much work as possible in a worker. You can't do all of it in a worker.
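A minimal sketch of the worker hand-off pattern Pascal is describing: capture happens on the main thread, while the heavier serialization and upload work is pushed to a Web Worker. The `encodeBatch` helper, the `pending` queue, and the `recorder-worker.js` file name are all hypothetical, invented for this sketch:

```javascript
// Pure helper: wrap a batch of capture events with a timestamp so the
// worker (or server) can order batches later.
function encodeBatch(events, now) {
  return JSON.stringify({ sentAt: now, count: events.length, events });
}

// Browser-only wiring: periodically flush pending events to a worker,
// so serialization and upload stay off the main thread.
if (typeof window !== "undefined" && typeof window.Worker !== "undefined") {
  const worker = new Worker("recorder-worker.js"); // hypothetical script
  const pending = [];
  setInterval(() => {
    if (pending.length === 0) return;
    // Hand the raw events to the worker; splice empties the queue.
    worker.postMessage(pending.splice(0, pending.length));
  }, 200);
}
```

The design point is simply that `postMessage` is cheap relative to serializing and uploading, which is the work worth moving off the main thread.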
All of the capture still has to happen in the main thread, but most of our performance optimizations are around how we avoid blocking the main thread. So we have a little control loop that runs, that looks at how much time we're taking up, how much CPU, memory, network. And we actually end up adjusting how much we record, and what fidelity we record at, based on how much time we're taking up. At the end of the day, if there's no network or we're blocking the CPU or something like that, we will actually just turn off. We're not more important than your app, as much as I would like that to be true. Ben: And when you say fidelity, if you were just capturing a video, then you could just record at a lower resolution or fidelity, but when you're capturing DOM elements, where is the degree of variability in fidelity? Or what can you stop recording, but still add value, if you have to slow down your recording to preserve performance? Pascal Kriete: When you're capturing video, one of the things you can slow down is frame rate. So we actually end up diffing less frequently as things get busier. By default, we might say every 200 milliseconds we take a diff; if the network is really bad, maybe we'll go to 500 milliseconds or every second. And every second is still enough that if the user clicks on something and it changes, it's a pretty instantaneous experience at playback. Pascal Kriete: We can also drop things like large logs, large network payloads, performance data. Ultimately it doesn't change that much if you currently have 100 megabytes of memory usage and then 10 seconds later it's 120. You can guess roughly what happened in the meantime. So we can slow some of those streams down as well. Ben: Correct me if I'm wrong, but with the DOM, we're not taking a snapshot every X number of seconds. The mutation observer is telling us when things have changed. So what exactly do you mean by slowing down your frame rate or sampling rate to every N seconds? Pascal Kriete: Correct.
So we actually record all of the mutations that happen, but we more or less just mark elements as dirty. So we can say this element was mutated in some way, and then every second or so go through all of the elements that have been mutated and find the minimal set of mutations. So if you imagine that you add a class and then immediately remove that class again, and it happens within 10 milliseconds, we shouldn't have to record that. So we'll look at what has changed over the last 200 milliseconds and remove anything that's duplicate or has been undone again, to create our final diff. Ben: That makes sense. And I'm curious, LogRocket recently launched a mobile version of session replay for iOS and Android. I myself am significantly less familiar with mobile development than I am with web development. So maybe we could start with iOS or Android or both, whatever you think is easiest to comprehend. I'm curious to understand how session replay on mobile works, how mobile applications are structured, and yeah, we can go from there. Pascal Kriete: Yeah. When you say you don't know how mobile works, neither did we. We are a team of web developers, so it was a learning experience. At the end of the day, mobile is sort of traditional graphics processing. You have draw operations that happen to some sort of canvas or frame buffer, and then the screen shows the composite of those draw operations. So what we can do, and let's take Android because it's open source and you can actually go look at the underlying code here, is capture the draw operations that happen. At that point, it's very much like a DOM: you have a tree of operations in different parts of the window, and you can record those, you can upload those, and then you can play them back on a web canvas. Ben: Got it. So for web developers, we're used to the HTML canvas: you say draw line, draw rectangle, draw text, and LogRocket on Android is just listening to those.
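The dirty-marking scheme Pascal describes, where a class added and then removed within one capture window cancels out, can be sketched as a pure coalescing function. The event shape (`el`, `attr`, `before`, `after`) is invented for this sketch:

```javascript
// Coalesce a window of attribute mutations into a minimal diff:
// only emit an entry when the final value differs from the value
// at the start of the window.
function minimalDiff(mutations) {
  const first = new Map(); // first "before" value seen per (el, attr)
  const last = new Map();  // last "after" value seen per (el, attr)
  for (const m of mutations) {
    const key = `${m.el}:${m.attr}`;
    if (!first.has(key)) first.set(key, m.before);
    last.set(key, m.after);
  }
  const diff = [];
  for (const [key, after] of last) {
    if (first.get(key) !== after) {
      const [el, attr] = key.split(":");
      diff.push({ el, attr, after });
    }
  }
  return diff;
}
```

For example, adding the class "active" to a button and removing it again within the same 200-millisecond window produces an empty diff, so nothing is recorded for that element.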
So we have some way of polyfilling or listening in on those draw operations, and every time one happens, we record it. I'm curious from a performance perspective: intuitively it would seem like there are orders of magnitude more draw operations in a mobile app than there would be changes to the DOM in an equivalent web app. But is that true, and if so, how do we think about that? Pascal Kriete: Yeah. Same problem as web in many ways. So we have a DOM, we have way too many mutations coming out of the mutation observer, and every few hundred milliseconds we decide, okay, now is a good time to capture. You can do the same on mobile. One of the things you can do on Android is tell a view to draw itself. So you can on demand get all of the draw operations that that view has most recently performed, and you can record those separately. So now you get into optimizations that I won't get too deeply into, around when you ask a view to redraw itself, where you capture those operations, and then how you play them back. Ben: Got it. Okay. So essentially every time we want to take a screenshot, so to speak, instead of taking a screenshot and having lots of pixels that we have to capture, we just ask, "Hey, view, how would you draw yourself if you had to redraw yourself right now?" Which is a lot less data than an actual screenshot. And then we keep track of those draw operations. And when we do the replay, how does that work? Pascal Kriete: Yeah. That's an interesting question and an interesting problem. So at the lowest level, it's a very similar API. If you look at the canvas (it's actually called a canvas on both Android and the web), it's draw line, draw circle, draw shadows, whatever it may be. Unfortunately, there are small differences throughout. So if you look at things like compositing, how colors are composited and which compositing modes are available, things are a little bit different.
So a lot of what we had to do to make this replay happen is reimplement some of those compositing modes. Ben: How about iOS? Pascal Kriete: Well, iOS is a little bit more fun. iOS development ends up being a little more, I don't know if reverse engineering is the right word, but you can't just go in. You can't just read the code. And at this point there are also layers upon layers of APIs. Every time Apple changes their draw APIs or how you build views, they keep the old stuff around and they build on that to create the newer things. So iOS does have something that looks a lot like canvas; they call it a CALayer, and that CALayer can receive draw operations. The difference for iOS is that it's actually structured a lot more like a DOM. Pascal Kriete: You don't have one canvas, you have many canvases, and they get drawn almost like a DOM: here's a sort of rectangle, here's the button, we draw the contents of the button, and then it composites all of those layers together, which lets it do very interesting things like very smooth animations. It can composite a layer and then just move that layer across the screen without having to redraw it. But we can also capture those same operations. Ben: Any substantial differences between the iOS capture and replay method and Android's? Obviously there are differences, but it sounds conceptually reasonably similar. Pascal Kriete: Conceptually reasonably similar, and our playback code is at this point identical. So we can take a stream of Android draw events and a stream of iOS draw events, and we've got it down to a point where those will play back the same. Ben: So what I'm curious about is, on web, you build your apps with the DOM, you have JavaScript, and there's not a lot of choice in terms of tooling. Obviously there are a million frameworks, but I know LogRocket is agnostic as to framework; whether you're using React or Vue, we sit at a layer below where frameworks matter.
So it doesn't really affect us on web for the most part. Ben: I'm curious, on mobile, obviously iOS and Android have completely different toolchains and different programming languages, and even on each platform you have different options: there's Flutter with Dart, there's React Native for both iOS and Android. So does it matter what toolchain you're using? And if so, what have we had to do in building LogRocket to support all these different toolchains? Pascal Kriete: Yeah. I will say I don't envy mobile engineers, especially if you're on the hook for supporting a legacy app on both iOS and Android, and someone started rewriting your Objective-C in Swift and your Java in Kotlin. Now you need to know four languages. And then someone up high said, "Okay, now we're rewriting this whole thing in React Native," and now you have a fifth. It's like that comic about standards: now there's one more standard. Pascal Kriete: So we support the native code: Objective-C, Swift, Java, Kotlin. If you're interacting with native iOS or Android APIs, that all works fine. When it comes to the modern sort of shims where you write something once (and I think the truth is, you write most of it once and then still end up with a little bit of Swift code and a little bit of Kotlin code for the platform-specific pieces), it gets a little more tricky. Pascal Kriete: So we support React Native. If you're on the web and you write something in React, then all of your bugs are in JavaScript. That's just not true on mobile: your bug might be in JavaScript or in Swift or Kotlin. Currently we don't support Flutter or Xamarin. They all work roughly the same: you have this sort of bridge between the language they're written in, whether it's Dart or C# or JavaScript. So we can still record the native side, but you won't get your Dart issues, you won't get your C# exceptions. Those are things that we're looking at for the future.
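Pascal mentioned earlier that Android and iOS draw events end up playing back through identical code. A toy sketch of what normalizing two platform dialects into one shared op format might look like (both event shapes here are invented for illustration, not the actual SDK formats):

```javascript
// Normalize platform-specific rectangle draw events into one shared
// op format that a single playback engine can consume.
function normalizeDrawEvent(platform, event) {
  if (platform === "android") {
    // Hypothetical Android shape: left/top/right/bottom edges.
    return {
      type: "rect",
      x: event.l,
      y: event.t,
      w: event.r - event.l,
      h: event.b - event.t,
    };
  }
  if (platform === "ios") {
    // Hypothetical iOS shape: origin point plus size.
    return {
      type: "rect",
      x: event.origin.x,
      y: event.origin.y,
      w: event.size.w,
      h: event.size.h,
    };
  }
  throw new Error(`unknown platform: ${platform}`);
}
```

Once both streams are in the shared format, a single web-canvas renderer can replay either one, for example with `ctx.fillRect(op.x, op.y, op.w, op.h)`.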
Ben: If your app happens to be in Flutter or Dart, do we still do the screen capture but just not have exceptions or crashes or logs? Or would even the screen capture require us to do more work in the future to support? Pascal Kriete: It depends a little bit on how your app is built, but for the most part, we won't capture the screen. They end up using their own draw code; in the case of Flutter, they have a very similar library to what Android uses, but they reimplement a lot of the rendering code themselves. Ben: Got it. And for React Native, does it use the same native rendering on iOS and Android, and that's why we're able to support React Native? Pascal Kriete: That, and we happened to already have a JavaScript SDK that we could reuse a fair amount of code from. So that was a pretty easy lift for us, and as you know, LogRocket actually came out of a pivot from a React Native company, so we have some prior experience there. Ben: I'm curious, in terms of what you're seeing in the market when you talk to teams that are using LogRocket or not using LogRocket, just general engineering teams building mobile apps, does it seem like React Native is taking over more and more of the market, or is there still a place for true native apps, and how do teams think about that choice? Pascal Kriete: I think something we see a lot is that very small teams will start with one native app, and they'll write it in Swift because the engineer knows Swift, and then they'll be told, "Okay, now also build an Android one," and maybe they start writing it in Kotlin, and then they decide that the team is too small and they're going to use React Native or Flutter or one of those toolsets. As they get bigger, they go back to writing it in native code. It's just much nicer at the end of the day to write all native code, if you have the team to support it.
Pascal Kriete: So it largely depends on company size and how much of a priority their mobile app is. If it's a sort of second-class citizen and you put it out just because people want it, then it's probably in React Native. If it's a true first-class citizen, it's probably written in a native language. Ben: That's interesting, because I would've thought the decision is typically: okay, we need a mobile app, let's just use React Native because it will work on iOS and Android. And I know it's not that straightforward, but you could say you get to share 70% of your code. It sounds like what you've seen is that people decide to support iOS or Android first, they build native on that platform, and then when they go to the other platform, they're like, oh, it'll be easier to use React Native for that other platform. Pascal Kriete: Yeah. And of course it depends on the team you have, what they know, what they're comfortable with. But the truth is, if you're going to build something in React Native, you're also learning the native code bases. It's almost unavoidable unless your app is really trivial. You will have something on the native side that you need to support separately, and then you're supporting three, five languages. Ben: So shifting gears completely here, as I mentioned early on, Pascal, you were the first engineer on the team here and now you lead all of engineering. And I should probably know this, but is it 35 people or something like that on the engineering team? Pascal Kriete: Somewhere in the 30s. Ben: So yeah, what has that journey been like? If you could tell yourself from five years ago something, what advice would you share? Pascal Kriete: Well, that's a good question. I think a lot of the advice is just go with it, embrace the change. It goes from three people to 10 people really fast. And then for us, it's also gone from 10 people to more than 100 people really quickly, and it's not going to be the same.
You start talking about things like culture and having to keep the culture the same, but I think in many ways you just try to keep the things that are good and get rid of the things that are bad. Pascal Kriete: And some of the good things will come along with new people, and some of the bad things will come along with new people, and you just have to embrace it. So I think I was a lot more change-averse five years ago. And the other piece is, if you're moving from engineering into management, you're not going to code as much, and you just have to accept that. That takes a little bit of time. Ben: Do you write code nowadays, or not much? Pascal Kriete: Not enough to say I write code. Ben: Yeah. Pascal Kriete: I've dabbled a little bit in the mobile code just so I know what's going on, but I think the team has done the bulk of the work there for sure. Do you still write code? Ben: Rarely, very rarely. Not if anyone else has anything to say about it. Once in a while, I'll hear engineers gleefully talking about old code I wrote and how they recently removed it. And it's a good place to be. Pascal Kriete: It's lasted five years. It did its job. Ben: That's true. Yeah. So thanks so much for joining us, Pascal. Normally this is the point in the episode where I ask our guest if they want to plug a project or something they're working on, but I feel like you've already talked a lot about LogRocket. I know we're always hiring engineers, so are there any particular roles you want to highlight that we're hiring for? Pascal Kriete: We are always hiring engineers, and I'll just plug LogRocket in this spot in case you haven't heard of it. A lot of what we hire for are full-stack engineers. If you are not afraid of new code, of different code, we are happy to talk to you. If you like infrastructure, SRE, security, we are more than happy to talk to you. So please reach out to myself, to our recruiting team, whoever you can find.
It doesn't hurt to have a conversation, and yeah, go use LogRocket. Speaker 3: Thanks for listening to PodRocket. You can find us @PodRocketpod on Twitter, and don't forget to subscribe, rate and review on Apple Podcasts. Thanks.