Hello friends, we are listening to episode 101 of the R Weekly Highlights podcast. Insert your joke about your favorite 101 class or lecture, but here we are. It's a new era with a new series of episodes, and I'm very happy to be joined by my awesome co-host Mike Thomas. Mike, how are you doing today? I'm doing great, Eric. As you said, 101, I guess it's somewhat of a new era. We can feel like we're starting fresh on episode 1. Yeah, yeah, all the feels, right? All the newfound feels, and apparently you've got some newfound equipment we may be testing out next week as well. So that's another perk for our listeners to stay tuned for. Yes, hopefully my audio quality will be orders of magnitude greater than it was, so everybody's obligated to listen in next week. Of course, the teaser has been unleashed. And what else is unleashed? Another fantastic issue of R Weekly. We are talking about the issue for week 47; end of year is approaching, so that number 52 is drawing near. This week our issue was curated by Jon Calder, another longtime member of our R Weekly team, and as always, he had tremendous help from our R Weekly team members and contributors like you around the world. As usual, if you're new to the show, we do a little roundup of the highlights mentioned, we'll give some additional finds, and yes, we're going to have our first ever feedback segment at the end of the show. So stick around, folks, you'll like it. We've all been there. You finally solved those last bugs in your code. The deadline is fast approaching for that report or maybe that Shiny app deployment. You quickly save those scripts, you upload them, you commit them to your Git repo, what have you. And yeah, you solved the problem. No more errors in your code, right? But you didn't really have time to help out future you and future collaborators with your code readability. No judgment here. I've definitely been there, such as having a function call with a lot of parameters where the whole line stretches past 200 columns and you've got to ride a little scroll bar to keep up with everything. That's just one example. Perhaps you have a mix of multiple function parameters on a single line. Or worse yet, closing brackets that aren't on their own line. Okay, okay, I'm being quite opinionated here based on past experience. But it sure would be nice, especially if you're new to adhering to the various style guides for your R programming, to interactively apply quick tidying of your R code style. That's where a very new package released by Posit software engineer Lionel Henry aims to fit the bill: a package called codegrip. In its first dev release on GitHub, codegrip comes with both RStudio add-ins and Emacs commands (hey, hey, respect to all the Emacs power users, especially those org-mode ninjas, you know who you are) to reshape your function calls. I'm very much in the camp of the long format style, where you have every function parameter on a new line, and codegrip lets you easily break that wide call into a long one. Now, like I said, don't worry, it's a safe space here, I don't judge. If you do like that wide format call, you can go from long to wide the other way too. Again, it's all your preference.
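To make that concrete, here's the kind of reshape codegrip toggles between, shown as a quick sketch with a real dplyr call (the example is ours, not from the codegrip readme):

```r
library(dplyr)
library(palmerpenguins)

# Wide form: the whole call crammed onto one line
stats <- summarize(group_by(penguins, species), bill = mean(bill_length_mm, na.rm = TRUE), mass = mean(body_mass_g, na.rm = TRUE))

# Long form: one argument per line, closing bracket on its own line
stats <- summarize(
  group_by(penguins, species),
  bill = mean(bill_length_mm, na.rm = TRUE),
  mass = mean(body_mass_g, na.rm = TRUE)
)
```

Same code either way; codegrip just moves you between the two shapes interactively.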
Now, the readme of codegrip gives you a few recommendations for keyboard shortcuts, another enhancement to your productivity flow, to quickly move the cursor to the function parameters even if you have nested ones. Maybe one of your parameters is a list or a vector of other things; you can quickly navigate around that. Now, as I said, this is still quite new, so Lionel shares a roadmap at the end of the readme on the GitHub repo for additional features, such as reshaping expressions inside curly brackets (reactives in Shiny, say hello) and reshaping chained function calls, which you might see in packages like DT or some of your dplyr pipelines. So very intriguing. And you may be wondering, wait, aren't there packages to help you style your code already? Well, yes, there certainly are, styler among others, but what intrigues me about codegrip is that it's a way for you to see how a style is applied by opting into it interactively. I think that's a really neat way to learn how different styles of R function writing could work. There's always space for everything here, so I'm really intrigued by where codegrip goes in its future development. So Mike, I'm going to get your opinion here. Are you a long or wide function writer? Well, if it's any indication, I'm thinking about submitting a pull request to codegrip to only allow conversion from wide to long. Nobody needs to go from long to wide. Just kidding. We absolutely do not judge, and if you have a style guide that you're adhering to, by all means. But ease of interpretation in grokking the code that you wrote, or someone else wrote, is something that I've seen taken for granted too often, unfortunately. And not only does this apply to commenting your code or structuring your repository, it also applies to styling your code. Styling your code consistently is an important aspect of any project and any team, especially for a collaborative project, and it helps you avoid technical debt in your organization. But it matters even just for future you, to be able to step back into the project quickly when a new issue or feature request arises down the road, to take a look at the code you wrote six months or a year ago and grok it really quickly. I highly recommend adopting a code style guide within your team. There's an entire tidyverse style guide ebook, which is already linked in the show notes thanks to Eric, as well as other resources like Google's R style guide, which is an adaptation of the tidyverse style guide with some minor adjustments. I imagine there are some folks out there who, like me a few years ago, have never even heard of or thought about the concept of styling your code consistently. So this blog might be a great place to start, and this repository might be a great way to get up and running quickly, especially considering there's a nice add-in you can use within the RStudio IDE. This is going to be maybe a hot take, but I think as good data scientists, a ton of our job is not just about writing code; it's about thoughtfully designing all of the pieces of the solution that we're creating. And one of those pieces is the code that we write. It can be really hard to style your code consistently during the development process, when you're just trying to get your fingers on the keyboard as fast as possible and connect what's in your brain to your script.
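As a point of reference, the batch approach Eric alluded to is a one-liner; here's a minimal styler sketch that restyles code to the tidyverse guide in a single pass:

```r
library(styler)

# Restyle a snippet of messy code to the tidyverse style guide
style_text("my_fun(a=1,b =2,  c= 3)")
#> my_fun(a = 1, b = 2, c = 3)

# The same idea scales up to a whole file or an entire package:
# style_file("R/my-script.R")
# style_pkg()
```

Where codegrip differs is the interactivity: you watch each reshape happen at the cursor instead of restyling wholesale.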
But doing this thoughtfully and correctly requires you to stop and spend some time with each piece of code you write before moving on to the next. Fortunately for us, tools and packages like codegrip can be monumentally helpful in making the code styling process more efficient, as well as keeping our styling consistent across scripts in a repository, or even across the different repositories we have within our organization. So I absolutely can't wait to try it. I think it's a great complement to the other styling tools that we have in our ecosystem. I cannot claim to be an Emacs aficionado or expert, so I will be using the add-in for now. But if you can't wrap your head around exactly how it works just from us talking on a podcast, the repository has a bunch of great information as well as GIFs, animated pictures, which to me, in terms of documentation, are just chef's kiss. Can't get any better than that for showing exactly what the package can do. Yeah, well said. One of the biggest challenges I've had is not just with me personally, but when you get collaborators on a project, getting them to buy into a style guide early on can save you a lot of time in the future. It's great to see that codegrip and the others in the ecosystem can help you rescue that before it goes too far. But it's a discipline thing, trust me. I had a very big project a year or so ago where multiple collaborators jumped on and I wasn't on the pulse with the style guide right away. And reading that code after various reviews, I was like, yeah, I should have done that a little better. But hey, better late than never. So a really great package here, and I love the idea of showing off the styling interactively. And by the way, Mike, a little side comment: there's always been a debate in the tech sector about how to pronounce it, is it JIF or GIF? I just learned from one of the podcasts I listen to in the Linux community that it is JIF, because the creators said choosy developers choose JIF. There's a shout out for some of you that grew up when I did. It's good to know, because I said GIF with a hard G up until literally today. And I was very resistant to adopting the soft G JIF for whatever reason. But I feel like at this point I have heard it way more than the hard G, so I'm going to convert. Today's the day. You know what? If we can't provide any other service on this podcast, we have put that debate to rest. So you can thank us later. We'll tell you how to thank us later. Moving right along. While we're on the topic of being nice to future you and future collaborators, one of the most productive techniques for assembling a set of related R functions and other processing code is to bundle them into a new R package. We've had many highlights in the past that provide a really practical introduction to taking that next step in your dev journey, especially if you're new to that world. But it is easy, again speaking from experience here, to fall into a few habits that can prolong your development time or debugging efforts, such as that dreaded copy-and-pasting of a function or other processing code many, many times. Here to keep your R package development nice and DRY is Indrajeet Patil, a software engineer with the cynkra R consulting firm, with a jam-packed presentation on how you can apply DRY principles, that is, don't repeat yourself, in your next R package.
There are a bunch of great recommendations. So Mike, why don't you take us through some of the ones you read about in this highlight? Absolutely. There is a wealth of knowledge within this slide deck, and I have immediate takeaways that I am going to literally use in my work today. The slide deck kicks off with a quote that says copy and paste is a design error. It stopped me in my tracks and really spoke to me. In my pre-data-science days, I used to copy and paste so much, and nowadays I think I audibly fuss any time I have to copy and paste anything. Fortunately, I don't do that very often, so shout out to functional programming in R and Shiny modules. Whenever I do find myself copying and pasting, it's a last resort. It might sound silly, but that's just the way I've wired myself up to this point, trying to ingrain DRY principles in everything that I do. For whatever reason, I had never thought about using child R Markdown documents in a package. I've used them many, many times in building a report, and that's one of the pieces of advice Indrajeet gives in this slide deck: you can use the same markdown document across your readme and your vignettes, or at least the same piece of markdown. Maybe there are some things you want to present in a vignette that you don't want in your readme, or vice versa, but there is probably going to be some overlap, things important enough to document in multiple places, and a child R Markdown document is a fantastic way to handle that. When you knit your package documentation together, you only have to make changes to that overlapping markdown in one place, and that's really the theme of this slide deck. I love the links to packages that use these DRY principles in their structure. I see the dm package referenced quite a bit in this slide deck, which makes sense because cynkra is the author of dm, so it looks like Indrajeet may have had a hand in working on dm and applying a lot of these principles. I'm not sure, but it's a great reference. If you haven't used dm before, even just check out its pkgdown site; it's phenomenal in terms of the documentation material available for that package. Repeating yourself is not only inefficient in terms of the time it takes to make the same change in multiple places; it's actually dangerous, due to the increased likelihood that you'll forget to make the change in one of those many places. Modularizing your code, functional programming, and taking a DRY mindset are key to any production-grade code base. There are fantastic examples in this slide deck across unit testing and including data in your package. There's a lot to pull out here, Eric. I don't know if there are any other pieces that you want to specifically highlight from this slide deck.
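To picture Mike's child-document tip: the shared prose lives in a single fragment that both the readme and a vignette pull in via a knitr child chunk. A minimal sketch, assuming a fragment saved at man/fragments/overview.Rmd (that path is a common community convention, not a requirement):

````markdown
<!-- In README.Rmd at the package root: -->

```{r, child = "man/fragments/overview.Rmd"}
```

<!-- In vignettes/getting-started.Rmd, adjust the relative path: -->

```{r, child = "../man/fragments/overview.Rmd"}
```
````

Edit the fragment once, re-knit, and both documents stay in sync.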
Oh yeah, there's a bunch, but the one that got this image in my mind of why didn't I know about this sooner is a package that was new to me called patrick, which helps you parameterize the unit tests you write with testthat. What do we mean by this? Well, imagine you're building a unit test. You've opted into that great practice, but you find yourself writing the same kind of expectation, expect_equal or whatever expectation you have, over and over again, where all that changes might be one function parameter or one expected result. The patrick package lets you build a tibble of these different invocations, you might say, the inputs going into that expectation and what you want out of it, so that you have an elegant little metadata tibble that is then used to apply one of these expectation functions across all the cases. So again, a great way to save keystrokes, but also to avoid the potential errors that come from copying and pasting that over and over again. The next time I build a sophisticated unit test suite, I'm going to be looking at this patrick package quite a bit. The other aha moment for me: if you find yourself writing a lot of conditional messages, errors or warnings that you want the user to see, and you find yourself repeating that general syntax from time to time, maybe varying which function it comes from or which parameter triggered it, you can actually create a list of these utility functions in your package that produce the message and only vary the things you need to vary. A nice hybrid with things like the glue package to help you stitch it all together. So again, you're saving keystrokes, but you have one source of truth for how you're sending these messages, which to me is the real nugget here: one source of truth for a lot of these things, so that the repeated calls can build from it instead of multiple sources of truth that just happen to be the same thing. Really enlightening, especially when you think you've solved a lot of these issues, because these are the issues that don't manifest in any adverse way other than costing you time. And if you can find some time savings so you can concentrate on the harder problems you're trying to solve in your package, opting into these DRY principles is a big help. So yeah, lots of great tips here and definitely a must-read if you're building any kind of sophisticated R package, whether it's for your organization or for release to the CRAN ecosystem. Really nice job here. Absolutely. And on the topic of conditions especially, I just had sort of an enlightenment idea. Take the example in the slide deck where you're expecting that x is positive, and you have four or five different functions that need to meet that condition. Instead of writing that check separately within each function, you write the condition as its own function. Then you only have to write one test against that condition function, that one piece of logic, instead of writing four or five tests for the same condition across four or five functions. It's actually going to make your code base a lot smaller. And that's something I think I'm going to use this afternoon. Very good. Hey, if there's one thing we've learned over the hundred episodes we've done prior to this, it's that a lot of these highlights are things that you and I both say, you know what, I need to start doing that today, or I'll do that for my next project. I just love learning about this stuff. So love it, love it. Now I will admit the first couple of highlights are definitely slightly more developer-focused, let's put it that way.
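For a flavor of what that parameterized testing looks like, here's a minimal patrick sketch (our own toy example, not one from the slide deck); the cases tibble drives a single test body instead of three copy-pasted testthat blocks:

```r
library(testthat)
library(patrick)

# One test body, many cases: the tibble supplies the parameters
with_parameters_test_that(
  "square roots are computed correctly",
  {
    expect_equal(sqrt(input), expected)
  },
  .cases = tibble::tribble(
    ~.test_name,   ~input, ~expected,
    "four",             4,         2,
    "nine",             9,         3,
    "twenty-five",     25,         5
  )
)
```

Adding a new case is one new row, not another pasted block.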
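And here's a sketch of that single-source-of-truth idea for conditions, combining Eric's message helpers with Mike's check-it-once suggestion; the helper names are hypothetical, and the slide deck's exact approach may differ:

```r
# One checker owns both the validation logic and the message wording
check_positive <- function(x, arg = deparse(substitute(x))) {
  if (!is.numeric(x) || any(x <= 0)) {
    stop(glue::glue("`{arg}` must be a positive number."), call. = FALSE)
  }
  invisible(x)
}

# Each function delegates to the single checker instead of repeating it,
# so one unit test of check_positive() covers them all
log_safely <- function(x) {
  check_positive(x)
  log(x)
}

sqrt_safely <- function(x) {
  check_positive(x)
  sqrt(x)
}
```

If the message or the rule ever changes, it changes in exactly one place.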
Now no episode of R Weekly Highlights is complete without bringing the A game of producing a visually pleasing product, all created within R and perhaps with a little help from other open source utilities. And so one of the mainstays of our first 100 episodes is back again, and I was so excited about their contribution that I shouted it out a week early, actually, but I'll be shouting it out properly this time. Albert Rapp, friend of the show and frequent contributor to the highlights, is back at it again with not just a blog post: Albert has released an early version of a fully online book on creating beautiful tables with a package that gets a lot of attention in the community these days, gt. And oh, by the way, the book is produced with Quarto; there's another callback for you. I've been on the record that gt, a package authored by Rich Iannone from Posit, has some of the best documentation and vignettes I've seen. So this book is surely not trying to replace those, but what I really like about Albert's book is its approach of introducing gt through realistic scenarios that you might find yourself in, especially if you're new to this world. In this case, it's building a simple summary table of the penguins dataset from the ground up, while introducing some nice, logical, clean guidelines for creating tables so that we aren't overwhelmed right away. Again, a stepwise approach. And then he gets to the fancy stuff. And what I mean by fancy: imagine being able, with gt, to put in attractive sparklines, little widgets, icons, what have you, tapping into packages like gtExtras to blend it all together. So a very nice case-based approach to building these tables; it's a fantastic read. And I dare say Albert has big plans for this book. I have an inkling what they might be, but I'll let the reader find out later on. This is a great resource, so if you want a great introduction to gt, this is your spot. What did you think about this book, Mike? I don't know how many times we've said it, but Albert Rapp is at it again with some more very timely R stats content. We've seen a lot of ggplot2 data viz content from him, and this time it's tabular data viz instead, which is really cool. The book starts out, like you said, Eric, with a really nice walkthrough of creating a beautiful gt table from start to finish with Palmer Penguins. That table includes grouped headers and grouped rows, summary statistics underneath the data, and formatting that introduces background color and text color. A really nice case study from start to finish. Then he employs the gtExtras package in that section that I think is literally called fancy stuff, as you said. One of the highlights of that section for me is actually the very beginning. The gtExtras package continues to surprise me, because I did not know there was this simple gt_plt_summary() function that you can just throw at any data frame, and it'll return a nice gt table with summary statistics, your mean, median, standard deviation, and missingness statistics for each column, and even histograms for the numeric columns, all in a really beautiful format. And like you said, that fancy stuff section showcases how gtExtras brings in the ability to embed sparklines, images, and ggplots you create right in a row of your table, and more.
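For a taste of the start-to-finish build, here's a compact sketch in the same spirit as the book's walkthrough (our own condensed example, not lifted from it):

```r
library(gt)
library(dplyr)
library(tidyr)
library(palmerpenguins)

penguins |>
  drop_na() |>
  group_by(species) |>
  summarize(
    bill_length = mean(bill_length_mm),
    body_mass   = mean(body_mass_g)
  ) |>
  gt(rowname_col = "species") |>
  tab_header(
    title    = "Palmer Penguins",
    subtitle = "Average measurements by species"
  ) |>
  fmt_number(columns = bill_length, decimals = 1) |>
  fmt_number(columns = body_mass, decimals = 0) |>
  cols_label(
    bill_length = "Bill length (mm)",
    body_mass   = "Body mass (g)"
  )
```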
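And the gtExtras one-liner Mike mentioned really is this short:

```r
library(gtExtras)
library(palmerpenguins)

# Column types, missingness, summary stats, and inline histograms,
# all from a single call on a data frame
gt_plt_summary(penguins)
```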
The rest of the book walks through formatting, styling, and more case studies. And although it is still a work in progress, overall this book is a fantastic new resource to have at our fingertips. I'm all for the more the merrier in terms of what we have around gt: documentation, vignettes, case studies, and examples. So I think this is going to be a really nice resource, and I'm excited to see how it grows. Absolutely. And speaking of tables, a quick little mini plug: if you're thinking about entering Posit's 2022 Table Contest, the deadline has been extended, I believe to December 6 or so. It'll be in the show notes regardless, in case I got that wrong, but it's definitely a great time to use gt or any of the other wealth of amazing table packages in R. It's December something, I know that, so you have a few more days. Yeah, maybe after you eat all that turkey, if you're here in the States, and you need something fun to do besides watch football, maybe make a table. I'm just saying. What else would we like to say? Another fantastic issue of R Weekly; there's way more than what we talked about here. So as always, Mike and I take a little time to call out some cool little finds from the rest of the issue. For me, one of the topics that came up during my recent appearance on Rachael Dempsey's Data Science Hangout last week was advice for those trying to get their foot in the door, so to speak, with a career in data science. Now, I can't guarantee that what I'm about to say will automatically land you that dream role, but one really neat way to potentially impress an organization, in preparation for, say, an interview or other screening, is to build, guess what, a custom Shiny app as an accessory to your cover letter or other introduction material. There's a very neat tutorial authored by Javier Orraca-Deatcu, hopefully I got that right, a great little tutorial from start to finish on how he branded a simple little Shiny app in the style of the organization he was applying to. Really quick wins, complete with how he put it on his GitHub repo and got it deployed very seamlessly. A great little way to stand out, and as a big fan of Shiny, I would never turn that opportunity down if I were new to this.
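If you want to try that idea yourself, a branded app can be surprisingly little code. Here's a hedged bslib sketch, one way in, not necessarily the approach in Javier's tutorial, with hypothetical brand colors you'd swap for the organization's palette:

```r
library(shiny)
library(bslib)

# Hypothetical brand palette; swap in the organization's real colors
org_theme <- bs_theme(
  bg        = "#FFFFFF",
  fg        = "#1B2A41",
  primary   = "#D7263D",
  base_font = font_google("Inter")
)

ui <- fluidPage(
  theme = org_theme,
  titlePanel("Hello, hiring team!"),
  p("A small demo app styled with your brand colors.")
)

server <- function(input, output, session) {}

shinyApp(ui, server)
```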
What did you find, Mike? I found two that I'll call out from this week's issue. The first is built around a tool I'm not familiar with myself, but I'm thinking some others out there in the data viz space would be interested: there's a brand new R package, and I believe the package itself is just called figma, for interacting with the Figma API. Figma, if I'm not mistaken, is one of the most popular graphic design and illustration applications out there. So I was thinking some of the folks in the online data science community, like Tanya Shapiro or others who do a lot of data viz work at the intersection of R and other tools, might be interested in this brand new figma R package. The data you get out of that API for any particular document you have in Figma gives you a wealth of information about each shape on the page: the type of shape, the location of that shape, and so on. So it could be interesting to some folks out there; maybe worth checking out. Then the other one is a repository on GitHub called World Cup 2022, which is some modeling and simulation of the 2022 FIFA World Cup, which is in full swing as we speak. The author of this repository has Brazil as the most likely to take home the World Cup. So it'll be interesting to see how the Cup plays out compared to the modeling and simulation done in this repository. Great work here. Yeah, a lot of eyes are on that tournament right now, and I think there have been some major upsets already. So it's going to be an interesting ride, and we'll see if Brazil takes it home as these simulations show. But as a fan of the hockey playoffs, I know that predictions can go out the window quite quickly. That's the way sports are. But great finds, Mike. And before we close up shop here, we're going to bring a brand new segment to the show, which was teased last week. We got ourselves some feedback, everybody. And the feedback comes by way of a podcast boost. This boost comes from, hopefully I'm saying this right, Rasta Calavera via the Fountain podcast app, which is one of the new podcast apps you can download on your phone or computer, what have you. They sent us 99 sats with the message: first-time listener, not really an R user, but I enjoyed the conversation. Well, there you go. Maybe listening to this podcast will convince you to use R. But yes, if you're interested in sending your support for the show, all you have to do is get yourself a new podcast app at newpodcastapps.com, and each of these apps has a quick and easy way for you to hit a little boost button and give us some positive encouragement. Maybe you liked a particular highlight or find that we shouted out; any feedback is welcome. So thank you again, Rasta Calavera, for that kind boost. And also thanks to all of you for listening around the world. We've had a great reception to episode 100 last week, and certainly Mike and I are going to keep this train moving forward as long as it's on the rails, so to speak. And you might be wondering, where can you find us? Well, first, about the R Weekly project itself: if you'd like to get involved, it's easy enough. Just go to rweekly.org and feel free to send us a pull request with a great find that you have, a blog post, a new package, a great video tutorial, what have you. We're always happy to put that into the upcoming issue, and everything's written in Markdown, so it's very easy for all of us to contribute. Also, if you want to find the R Weekly project on social media, we now have a brand new Mastodon account at @rweekly@fosstodon.org, so feel free to give that a follow if you want the latest updates on the R Weekly project; maybe we'll put out a little shout when this issue comes out. We're still working out the kinks of the automation piece there. Also, if you want to find me, I am still on Twitter, albeit who knows for how long these days, at @theRcast, but I'm also on Mastodon at @rpodcast@podcastindex.social. So feel free to give me a shout there if you'd like to get in touch. And Mike, where can people find you? As of this morning, you can find me at @mikethomas@fosstodon.org, and I'm pretty excited about that. I have zero followers and I am following zero people, but I am excited to grow on there. So I guess that's the new spot. I dare say in less than 24 hours you're going to get some followers there.
Well, hashtag just saying. We'll see, we'll see. But yeah, welcome to the Fediverse, Mike. Anyway, it's an exciting time. We're going to see a lot of the R community be a part of this, and R Weekly is going to be there along the way. And if you ever want to hear the back catalog of our recent episodes, again, just head to rweekly.org; we've got the podcast linked right at the top of the page. And again, please give us a follow and send us your shout outs on those social media accounts if you like. With that, we're going to close up episode 101 of R Weekly Highlights, and we will be back with another edition next week.