Hello friends, we are here with episode 103 of the R Weekly Highlights podcast. My name is Eric Nantz, and as always I am delighted for you to join us from wherever you are listening around the world for some more great R content for your listening pleasure. And it's always a pleasure to be joined by my awesome co-host, who's rocking a bit of new hardware here, Mike Thomas. Mike, how are you doing today? Doing very well, Eric. I do have a new podcasting mic set up, so I have gotten pretty serious over here, and maybe listeners will notice the difference. Maybe they won't, but I'm excited. Yep. You know it gets serious when you get the dedicated mic. Yes, that is very nicely done, and for those of you who are listening to audio, you can't see it, but Mike's microphone is really solid, so I'm really impressed. Mike's mic. Exactly. I should make a meme out of that, perhaps. Yes. Yeah, so we're here to discuss the issue for this episode 103, which has been curated by Jonathan Carroll, another longtime member of our R Weekly curation team. He's been a huge help with getting a lot of the infrastructure stuff back up and running and getting access to various things, and he and I have some ideas on how we can implement some more automation with some of our new social media exploits with Mastodon. But in any event, he did a great job in this issue, and as always, he had tremendous help from our fellow R Weekly team members and contributors like you all around the world. And we begin our episode today with big news on arguably the foundational pillar of the tidyverse. Specifically, the tidyverse team at Posit is preparing a grand release of dplyr 1.1.0 for January, and Posit software engineer Davis Vaughan has authored a new blog post to put the spotlight on some major new features and updates in response to community feedback. And I'll start this off with an improvement I'm particularly excited about, and that's a new way to perform flexible joins of datasets using the new join_by() function. Now what does this really mean? What does this allow you to do? Well, it's not just the typical case of joins where you have a variable in common and match it in a quote unquote equality fashion. Now, with this new join_by(), you have the ability to introduce custom expressions for different types of joins, such as those dealing with inequality, maybe a rolling join or an overlap join, which are nicely defined in the blog post. And Davis also does a terrific job of including a realistic example of assigning employees to a company party and how you can use a hybrid of these new join capabilities to make it happen, which would not have been possible without a lot of custom manipulation in previous dplyr functions. And the write-up in the blog post is really comprehensive, going through each step of the join process and incrementally improving on it to get to that final stage. And this example particularly hit home for me, because a few years ago I was tasked with creating an app to let someone planning a company event at the day job assign attendees to tables at this company event, ensuring that at least one higher-level manager or executive was present at each table and that the attendees at a given table represented multiple functions or teams. Now of course I used Shiny for that, because I use Shiny for all the things, right? But the onus was on the app user to manually create those assignments, either in the app or within a spreadsheet that I could upload.
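To make that concrete, here's a minimal sketch of what an inequality join looks like with the new join_by() helper. This is a generic, made-up example rather than the blog post's party-planning code, and it assumes the development version of dplyr that will become 1.1.0.

```r
library(dplyr)

orders <- tibble(
  order_id = 1:3,
  placed   = as.Date(c("2022-01-05", "2022-03-10", "2022-07-01"))
)
promos <- tibble(
  starts = as.Date(c("2022-01-01", "2022-06-01")),
  pct    = c(10, 20)
)

# An equality join would be join_by(placed == starts); an inequality join can
# instead match each order to every promo window that had already started by
# the time the order was placed (one order can match several promos).
orders |>
  left_join(promos, by = join_by(placed >= starts))
```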
Well, think of it like this: if I could go back in time, or if I have to revise this app in the future, I could use the new version of dplyr to give them a little button that says, hey, you do the assignment for me and I'll proofread it later, with a combination of all these flexible joins perhaps. So I'm really excited to try that out if I get that opportunity again. And I even have another opportunity perhaps to use this, where I've seen some resources in what's called the OHDSI project (pronounced "Odyssey") for harnessing real-world health data. And they have some really complicated SQL joins, a lot of inequality joins, a lot of custom expressions inside. So now with dplyr 1.1.0, I might be able to translate some of those joins into dplyr syntax. So I'm thinking of giving that a shot as well. Now, it is important to note that the highly regarded data.table package, which is immensely popular among many in the R community, has supported these quote non-equi joins for many years, and that was part of the inspiration for the tidyverse team to implement the new join_by() function. It has been one of the most highly requested features in dplyr, going all the way back to 2016. So it is really cool to see this new feature. Those join improvements are potentially huge, and I think I know a few places in my own workflow where I'm going to try to implement those as soon as this package hits CRAN. Another huge update coming in dplyr 1.1.0 -- stop me, Eric, have you ever done a group_by, then some other dplyr function, and then an ungroup? How many times in your life do you think you've done that? Too many to count, buddy, too many to count, right? So what's coming in 1.1.0 is temporary grouping, with an additional argument in verbs that work by group, such as mutate, summarize, filter, and slice. These verbs gain a new experimental argument, .by, which allows for inline, temporary grouping. So it's pretty powerful. It's going to save you at least one line of code. So if in your previous scripts you had written something like mtcars piped to group_by(cyl), then summarize(mpg = sum(mpg)), and then you had to do an ungroup() after that, all you'll have to do now is mtcars, piped into summarize(mpg = sum(mpg), .by = cyl). So it can just be two lines of code. The output is not grouped, so again, that .by just creates a temporary grouping to perform that summarize, mutate, filter, or slice calculation and returns you an output that is not grouped. So you do not have to worry about tacking on your ungroup verb at the end of your pipeline. And I mean, most of the time, probably for 90% of the use cases I have, that's exactly what I'm doing: I'm having to ungroup after I'm doing that group_by operation. So this is going to save me a lot of code, and save me a lot of time as well. They're calling it an experimental argument. I hope that it sticks around and we can expect to see it in 1.1.0, but I think it is going to be a huge game changer in the tidyverse for all of us in all of our ETL and data manipulation pipelines. Yeah, I definitely see massive time savings with this operation. I am very thankful that Davis was upfront about the ordering piece of it.
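Roughly, that before-and-after looks like this in code, again assuming the dplyr 1.1.0 development version with the experimental .by argument.

```r
library(dplyr)

# Before: group, summarize, and remember to ungroup at the end
mtcars |>
  group_by(cyl) |>
  summarize(mpg = sum(mpg)) |>
  ungroup()

# After (dplyr 1.1.0): temporary, inline grouping; the result comes back ungrouped
mtcars |>
  summarize(mpg = sum(mpg), .by = cyl)
```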
Now, there are some in the community who are a bit concerned about this, in that some may have had pipelines in the past where they took advantage of the ordering that happened in group_by in other post-summarization or post-mutate operations. What I'll have linked in the show notes is a toot from the Posit Mastodon account where there was some interesting discussion from some faces actually familiar to R Weekly on some of the caveats that might need to be thought about. But I think as long as people are aware that .by is not going to change the ordering that was there by default when the dataset was imported, then as long as you know what to expect, I think it's definitely manageable. But again, credit to Posit for putting this out there now instead of waiting until the CRAN release of 1.1.0 and then surprising people. I think that's a very important thing in software development, especially in open-source software development, for a package like dplyr that is used so widely across many different data science workflows, to be upfront about this and not surprise people. So again, credit to Davis and the tidyverse team that put this together. But it is a very exciting release nonetheless. Yes. And maybe just one or two more things I will note about this blog post, specifically starting with the .by argument. Just like group_by, you can group by multiple columns. You don't necessarily have to provide just one column to this .by argument. You can use multiple columns, which is great. And group_by won't ever disappear. That verb will never disappear. So you don't necessarily have to worry about this impacting any of your production work, and if you don't want to switch right now, you do not have to switch right now. Two other updates coming in 1.1.0: the arrange function is getting some improvements with respect to character vectors, and there is a new function, I believe called reframe, which is a generalization of summarize. So check out the blog post for more info on those other two improvements. And certainly there are more features than what is summarized in the post, so there are definitely links in the post to additional features from the GitHub repo. And certainly if you have concerns about some of the new changes, that's what issue boards are for. I've already seen a few issues posted after the release of this blog post to clarify a few things. So if you do have concerns, and you see maybe a gap in testing the dev release, hey, that's what feedback's for, right? So I highly encourage people to check it out, especially if you're writing a package, an app, or whatever important pipeline you have that could make use of these new features. So really nicely summarized by Davis, and certainly, like I said, I could relate to that example and the SQL joins, because I was thinking, oh, you came up with that? Where was this a few years ago? Could have made my life easier. But yeah, really exciting to see here. Well, it is that time of year, Mike. We're in the holiday season basically, now that it's the end of the year, and you're probably being inundated like I am with various countdowns or top 10 lists or what have you. Apparently I'm not one of them, but if you use Spotify, based on all of your music streaming you probably received your own personalized list of your most listened-to songs this year. And yes, many, many people are tweeting that out on the various social media platforms.
In fact, I have a link to Travis Gertz's humorous LinkedIn post about how apparently a little bit of arranging and summarization is, like, the new data science hotness in these summaries, I guess. No, I'm kidding. I know this can be a lot of fun. Hey, we're the R Weekly Highlights podcast, right? How can we put a little R magic on this? Well, the very talented Nicola Rennie, a data scientist at Jumping Rivers, enters the highlights podcast once again with how she pivoted from her listening habits to deriving a distinctly R stats flavored Wrapped of her most used functions in the year. Now, this was a very fun exercise in the blog post, on both code and introspection, and of course a little bit of data munging and visualization at the end. So Nicola starts off importing all of her file paths for the R scripts related to her Tidy Tuesday submissions, a great way to have kind of a calendar-like, chronological order of how she's been using R this year. And she also utilized Nicholas Cooper's NCmisc package, easy for me to say, with a handy function called list.functions.in.file(). That's a mouthful, isn't it? But it does what it says, right? It's going to take a set of file paths, look at them, and literally give you a list of the functions and packages that were called in that script. So Nicola combined that with some purrr iteration to assemble a tidy tibble of the function frequencies, or you might say the number of times each function was used in her scripts. Now of course, this wouldn't be complete without a top-notch visualization, right? Well of course, ggplot2 enters the game here, and Nicola proceeds to assemble an infographic of the top five functions that were called in these scripts. Now this is quite meta in and of itself, because three of the top five functions are indeed, wait for it, from ggplot2, with aes() being used 47 times across her Tidy Tuesday scripts. So there you go. Usually Tidy Tuesday has some kind of visualization, right? So that's not very surprising, but hey, now you've got quantifiable evidence, in her case, that ggplot2 is an MVP of her Tidy Tuesday adventures. This is a really entertaining read and, again, very easy to follow too. So there are ample opportunities wherever you want to do this, say for your Tidy Tuesday submissions, or something I'm thinking about: say I have a directory of all my Shiny app code, what are the most common input widgets I use, or what are the most common reactive constructs I use? I could see lots of fun doing that. I'm a huge fan of Spotify Wrapped. It's like one of the easiest data products ever made. It's literally just a count and an order by. I think I saw some people last year tweeting that Spotify Wrapped is the coolest AI they've ever seen, which is just hilarious because it's probably like two lines of SQL. Yeah, ChatGPT it ain't, but hey, you know what? You've got to start somewhere. I put out a tweet today that I thought was great, hasn't gotten a whole lot of love, but for the 90s babies out there like me, we know that the original ChatGPT was SmarterChild, if you were ever on AIM. So I'm just going to leave that out there. ChatGPT isn't exciting me that much. I've seen this before. Anywho, people absolutely love Spotify Wrapped, so it was really cool to see Nicola implement this with an R spin. And that pipeline of functions that she uses, that you mentioned, Eric, to find the most used functions across all your R scripts in a directory -- that's really useful.
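For reference, here's a minimal sketch of that kind of pipeline. It is not Nicola's exact code: the directory path is just a placeholder, and since list.functions.in.file() reports the unique functions used per script, this version ends up counting how many scripts each function appears in.

```r
library(NCmisc)  # provides list.functions.in.file()
library(purrr)
library(dplyr)

# Gather every R script under a (placeholder) directory of Tidy Tuesday code
scripts <- list.files("tidytuesday", pattern = "\\.R$",
                      recursive = TRUE, full.names = TRUE)

# list.functions.in.file() returns, per script, the functions used grouped by package
all_funs <- scripts |>
  map(list.functions.in.file) |>
  map(unlist, use.names = FALSE) |>
  unlist()

# Tally how often each function shows up across the scripts
function_counts <- tibble(fun = all_funs) |>
  count(fun, sort = TRUE)

head(function_counts, 5)  # the top five that would feed the infographic
```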
I can actually see myself using that, maybe to try to write an internal package and understand what are some of these functions that I'm just using all of the time. So I don't know, I feel like there are a couple of different interesting ways that I might be able to leverage that particular logic that she's put together, which is really nice. I'm looking forward to creating my own Spotify Wrapped using Nicola's code this afternoon. I would love to see how my most used functions have changed over time. Ooh, I love that idea. That might be scary to look at. And probably, you know, if I looked a year from now, it'd be a lot of movement away from group_bys and ungroups into the .by argument. So we'll see. One of my favorite parts of the viz that she puts together is the color change at the top that makes it look like someone took a bite out of the visual. And she does it with some cool sort of random number generation with a particular seed, as well as the cumulative sum function -- just really brilliant code to create what looks like someone taking a bite out of the corner of the visual. So I highly recommend checking it out. It's not a ton of ggplot code -- a surprisingly small amount of ggplot code to create this beautiful visual. So I am absolutely excited to test it out myself, and I would encourage everybody else to test it out themselves and tweet their results. Yeah. And the key part is that there were no other custom programs to help with that visual, right? That was all in ggplot2, all in R itself. So yes, another notch in the ggplot2 belt, if you will, for creating infographics that you would never guess were produced by R. Another fantastic visual. And yes, even these top lists are also fair game here. Really, really great read. And yeah, if I turned this loose on the set of R scripts I made for my dissertation compared to now, I don't even want to know. Oh, gosh, no. I've not looked at that code for many, many years, but it's on my hard drive here in the basement somewhere. I don't have the guts to look at that, but it would be a fun exercise nonetheless. Keep it tucked far, far away. Yes, yes. No one's going to hack that little mess over there, so thank goodness. Well, speaking of a little hacking, if you will, our last highlight talks about one of Mike's and my favorite topics, and that's Shiny development, of course, and how you might be able to do a slight bit of hacking and yet make a huge improvement to your app quality. And what we're talking about here is that, out of the tin, Shiny and its related package ecosystem come with so many features out of the box. Of course, we have the huge selection of input widgets. We have reactivity. We have these great wrapper packages to give you new UIs, new ways of interaction. Case in point, one of those being bs4Dash, one of Mike's and my favorite packages for creating dashboards that look so professional, so polished. So of course, shout out to David Granjon for making bs4Dash and the RinteRface suite. But what if you're using that and you're doing what we might call a client-side interaction, but you're still losing a bit of what happened in that interaction on the server side? That's where you might be able to plug that hole with just a little bit of custom JavaScript.
And that's what, for the third episode in a row, returning to the R Weekly Highlights, Albert Rapp has another awesome blog post on: how he enhanced one of his Shiny apps that was serving a dashboard with JavaScript, while fully admitting he is not a JavaScript expert. And if I had known this -- well, I kind of knew this was always possible in my early days of Shiny, but I felt scared about it. So if you've ever been intimidated by the idea of custom JavaScript in your apps, you definitely need to read Albert's post here. This is a terrific example of one of the features of bs4Dash, where you have these little cards or boxes in your app and you let the user determine the order of them -- you just click and drag them around like you would anything else on a computer interface. But the issue was that the text inside these boxes was not being preserved in the new order the user created through this rearranging. How do we get that out? And that's where he took the moment to play around with a little bit of JavaScript in the developer console. To me, for all the 80s geeks out there, going into the JavaScript debugger console in, like, Chrome or your browser of interest is like my favorite movie, Tron, when Flynn goes into the game grid to hack the MCP. Yes, yes. There's a 68.71% chance you're right. But it's not so intimidating once you know what to look for, what element you need to get. Then the rest of the post goes through just a little bit of JavaScript code to grab the contents of that text input from that card, and then even do some more iterative programming to get those values in a map-like framework. And then the hook, of course, is to make that manipulation available to your Shiny app on the server side. So in essence, he's made a custom input that he can observe upon, or put in any reactive or other construct, to get the new ordering of the text that's available in those boxes. Obviously, this is a post you'll want to read probably a couple of times if you're new to this. But the way he outlines this, from the investigation, to honing in on the inputs needed, or the text inputs needed, and then bringing that back into Shiny, is a great use case for just how easy it is to get started with this. And it gives you that little seed of: I can take this much further in other situations where I don't have an R package, where I don't have a built-in Shiny function that will do this for me. It's a great way to know kind of the inside of how Shiny works. So again, great, great post by Albert here, and I really enjoyed reading it. Yes, Albert just keeps coming back with fantastic content, a lot of data viz, but this time it's around Shiny and JavaScript, which is just incredible. If anybody knows what kind of coffee he drinks, or what energy drink he likes to drink to be this productive and pump out this incredible RStats content, please let me know, because I am going to buy as much of it as I can. You covered a lot of, sort of, the problem statement of what Albert was trying to accomplish here with reordering these text area input boxes using the sortable function in bs4Dash, which is really, really nice and allows you to drag and drop different elements, but wanting to get out the user's input into those boxes in the order that the boxes were dragged around in, and this required some JavaScript.
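To give a flavor of the general pattern described here -- this is a made-up minimal example, not Albert's code -- a few lines of JavaScript can push a value to the server as a brand new Shiny input, which the server then observes like any other input.

```r
library(shiny)

ui <- fluidPage(
  textInput("note", "Type something"),
  actionButton("send", "Send to the server"),
  # A small JavaScript snippet: on click, read a value in the browser and hand
  # it to Shiny as a brand new input, input$from_js
  tags$script(HTML("
    document.addEventListener('click', function(e) {
      if (e.target.id === 'send') {
        var value = document.getElementById('note').value;
        Shiny.setInputValue('from_js', value, {priority: 'event'});
      }
    });
  "))
)

server <- function(input, output, session) {
  # The custom input behaves like any other reactive input on the server side
  observeEvent(input$from_js, {
    message("Received from JavaScript: ", input$from_js)
  })
}

shinyApp(ui, server)
```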
One thing that Albert highlights and shows is how to play around with your web browser's console by essentially right-clicking, choosing the developer or inspect option in your browser, and then navigating to that console area, where you actually have a blinking cursor that lets you enter a particular command and get a response from the browser. Really, really cool, and something that I have not done enough of, to be honest, so this was a great introductory post for me and for anyone else who is looking to maybe get their hands dirty with a little bit of JavaScript and get into the developer portal in their web browser -- a really nice introductory way that Albert went about explaining this in his blog post. He talks about a few different ways to incorporate JavaScript code in your Shiny app, and two of them involve the shinyjs package, which is, I believe, a Dean Attali special -- did I get that wrong? No, you are exactly right. Dean Attali has been one of the MVPs of my early Shiny career, and still to this day I use his packages in one way, shape, or form. Absolutely, I do as well. So there are a couple of ways to go about doing that. You can define your JavaScript code in a long text string and include it via text variables with the shinyjs package. You can read your JavaScript code from a particular file using the extendShinyjs() function. Or you can incorporate JavaScript code without shinyjs in a couple of different ways: if you have a particular button, you can set the onclick attribute of that button, or you can sneak the JavaScript code into your app by placing tags$script() into the UI, and you don't need shinyjs to do that. So there are a bunch of different ways to go about incorporating some JavaScript into your Shiny app. Typically, the way that I have used JavaScript in the past is really with visualization libraries like echarts4r or reactable -- they allow you to write a little bit of custom JavaScript to do something beyond what those packages offer in just their R functions, which is really nice. But I think going the next step will be a big deal for me here, to actually write some custom JavaScript outside of one of those packages and include it in my app, and what a really nice use case Albert had to showcase how to do this in a few different ways. Yeah, and if you are inspired by this exploration, like I certainly was, and you're wondering, hey, where can I go to get more ideas of what I can do? What's the potential here for me? We'll have linked in the show notes two excellent, freely available resources. We have David Granjon's Outstanding User Interfaces with Shiny book, which is available online for free, with great chapters on JavaScript interaction with Shiny apps. And then John Coene, who of course is the author of echarts4r -- shout out to John -- we have a great link to his JavaScript for R book. That's another great way to learn about the potential here. So certainly you can go down quite a few rabbit holes here, but I think they're more than worth it, especially when you get into situations like mine where it's not just an app I'm making for a handful of people -- it's going to be for executives, or for key leaders who want the best user experience they can get. So really awesome, awesome post, Albert. And yeah, as Mike said, whatever you're consuming, whatever's in your routine that lets you crank all this out, send it our way, please. We'd love to have it.
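As a rough companion sketch of those inclusion options -- a toy example rather than anything from Albert's post -- here's how a few of them can sit side by side in one small app.

```r
library(shiny)
library(shinyjs)

ui <- fluidPage(
  useShinyjs(),
  # Plain tags$script(): raw JavaScript dropped into the UI, no shinyjs needed
  tags$script(HTML("console.log('app loaded');")),
  # An onclick attribute set directly on a button, also without shinyjs
  actionButton("hello", "Say hi", onclick = "alert('hi from inline JavaScript');"),
  actionButton("go", "Run JS from the server")
)

server <- function(input, output, session) {
  observeEvent(input$go, {
    # shinyjs::runjs(): JavaScript kept in an R text string and run on demand
    runjs("document.title = 'Updated from runjs()';")
  })
  # shinyjs::extendShinyjs() (not wired up here) is the file-based option: it
  # reads JavaScript from a script file and exposes its functions as js$...
}

shinyApp(ui, server)
```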
Yes, please. It's awesome. And what else is awesome? Well, of course, it's the rest of the issue of R Weekly. There's a ton of awesome content that Jonathan has put together for us, and we'll mention our additional finds here. Now, of course, for me, in this episode you've heard a lot about the tidyverse, for good reason of course, but that's not the only verse in this episode. I want to give a well-deserved shout out to my fellow life sciences R enthusiasts who have spearheaded the pharmaverse, which is a true testament to the power of collaboration and open source, putting the power of R into generating clinical results and data processing. And what we have linked is a great post from the Posit blog, which is a summary of how the pharmaverse started, which includes some great times at the R/Pharma conference, and how it's just become a huge and important part of the full story of how we're getting together in our industry to make things easier, to cooperate together instead of trying to do our own versions of these clinical reporting or processing needs in silos. Certainly there are ways to go, but it's really exciting to see the pharmaverse take a real foothold in what we're doing in life sciences, and there's also a teaser for an upcoming hackathon for one of the key packages, called admiral, that's happening in January. We'll have a link to that in the supplements as well. Mike, what did you find? There's a great post called Setting Up and Exploring a Larger than Memory Arrow Table, and it is all about handling larger-than-memory data with Arrow and DuckDB, which are two of my new favorite tools for essentially handling anything related to ETL and large data, if you will. There's been a ton of buzz lately about DuckDB as well. I think Arrow has been around a little bit longer, so maybe some of the Arrow buzz has died off a little bit, but I think it's still a fantastic project that I use all the time. And we are now starting to see that there are intersections of both of these packages and both of these technologies that we can leverage to make our queries even faster and make our data prep even faster, which is really incredible. And the authors really do a great job of showcasing the different advantages of using Arrow, DuckDB, or both in your ETL pipelines. They run 100 trials of some simulation code and plot, in a really nice ggplot, the time that it took to run that code with different combinations of Arrow and DuckDB. So I highly recommend checking out this post if, like me, you are interested in the most cutting-edge data manipulation libraries and technologies that we have available today. Yeah. Arrow and that ecosystem are something I want to pay a lot more attention to in my upcoming efforts in 2023 for some really complicated and voluminous data processing, so I'm really excited to see that tutorial as well. Yeah. Excellent. Excellent find there.
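For a flavor of what that combination looks like in practice, here's a minimal sketch -- the dataset path and column names are placeholders, not taken from the post itself.

```r
library(arrow)
library(duckdb)
library(dplyr)

# Point arrow at a directory of Parquet files without loading them into memory;
# "data/parquet_files", amount, and year are all hypothetical
ds <- open_dataset("data/parquet_files")

ds |>
  to_duckdb() |>                          # hand the arrow dataset to DuckDB's engine
  filter(amount > 0) |>
  group_by(year) |>
  summarize(avg_amount = mean(amount)) |>
  collect()                               # bring only the small result back into R
```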
And for feedback for the show, I want to offer a little correction from last week, when we were talking about that great tutorial on using RSelenium for web scraping. I admit I was under the impression that the RSelenium package had found a new maintainer, because I had seen a grand release very recently in the fall. That's actually not the case. RSelenium is still looking for an active maintainer, and hopefully we will have somebody in the community step up and take the reins of that very important package. So I want to give a great thank you to Koen Hufkens on Mastodon for letting us know about that. So, yep, if you're interested in RSelenium and taking the reins of a very important package, certainly get in touch via the GitHub repository for the package. There's an issue there looking for an active maintainer. So again, thank you, Koen, for that feedback. We don't have any boosts this week, but if you're interested in sending a little love to the show, if you're getting any value back from listening to this, you can send value back to us in any way you like. But you can do that easily with a new podcast app, which you can find at newpodcastapps.com, and send us a little boost and have a little fun with us. As for where you can find us, well, we have R Weekly available on Mastodon at @rweekly@fosstodon.org. And also you can find me, still somewhat, on Twitter at @theRcast, and also on Mastodon at @rpodcast@podcastindex.social. And Mike, where can they find you? Yes, you can find me on Twitter, still hanging around, at @mike_ketchbrook, or you can find me on Mastodon at @mike_thomas@fosstodon.org. Thank you. Yes. One of these days it'll become natural for us to say this with repetition, at will, of course. But yeah, please get in touch with us, and again, we're always happy to hear feedback or corrections or suggestions. Nothing is off the table for us, who want to make this podcast the best for all of you out there listening. Well, it's been a lot of fun as always, Mike. And again, I really enjoy seeing the new hardware. You're a serious podcaster now. That's awesome to see. Absolutely. Absolutely. You know, it only took me 50 episodes or whatever it's been so far, but I'm looking forward to what's to come. You bet. Yep. We've got a fantastic year, I'm sure, coming up in 2023, but we still have to see how we finish up 2022. So that means we will be back with episode 104 of the R Weekly Highlights podcast next week.