Abe Gong: One of the things I love about open source is you just kind of start pulling one of these strings and sometimes a whole lot of interesting problems unravel together.
Eric Anderson: This is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson.
Eric Anderson: I'm joined today by Abe Gong and Kyle Eaton of the Great Expectations project. Abe, Kyle, welcome to the show.
Abe Gong: Thanks Eric, awesome to be here.
Kyle Eaton: Thank you.
Eric Anderson: This is one we've been excited about and discussing for some time. Great Expectations has caused quite a stir in the industry, and it's good to be able to capture the story. So let's level set with our guests and explain to them exactly what Great Expectations is. Where should we start?
Abe Gong: I'm happy to take that one. Great Expectations is the leading open source project for, people use all kinds of words for this now, data quality, data testing, data validation, data observability, pick your buzzword. But at the end of the day, the goal is to understand what is happening in your data pipelines. And the way we think of the problem, it's not just a monitoring tool. We also fold in documentation, so having docs that reflect what is happening, and we can talk more about that in a minute, and then also data profiling, which is scanning through databases or logs, or wherever your data lives, and helping to come up with a set of rules that'll make for good tests. I like the term data quality because I think it's not just a technical problem, it's much more of an organizational challenge of getting everybody on the same page about what's going on in the data.
Eric Anderson: Perfect. And you also used the term data testing, which brings something to light: we talk about tests in software development all the time, when we want to ensure that future builds meet spec. Is that type of pattern what we're talking about here, in terms of data? Like, I have CI tests for my app, and now I have Great Expectations tests for my pipeline?
Abe Gong: Yeah, I think that's a really good one to be explicit about. The question is, what constitutes a change? In the software world, changes happen when the software changes: they happen when you compile your code or do a git pull. In the data world, it's interesting, because other people change the data all the time. Things in the real world happen; messages accumulate in queues, and logs, and data warehouses. And when those things change, they can break. So when you talk about monitoring the data you're testing, you actually have two things that are changing. One is somebody changing the code, and yes, you want to do a CI process on that. The other, which actually happens a lot more often, and which needs higher scrutiny because it's more outside your control, is the data itself changing.
Eric Anderson: Yep, the data is flowing, if you will, or it's being updated and there's drift in the data. Some of it could drift over time, or you may just get anomalous, erroneous data that you want to be alerted on.
Abe Gong: Yeah, and all kinds of things can happen there. You talk about drift, and to me that implies this slow, steady process where, oh, I don't know, you've got a service that's slowly degrading, and so the number of failed API calls is slowly changing over time. That's one possibility, like the frog in the boiling pot. The other possibility that happens a lot is, maybe you have an upstream data provider, say a logging team or a data vendor, and maybe they make a change, they flip a switch, and all of a sudden you're getting very different data. One thing I hear from a lot of machine learning teams, for example, is that they're downstream of the logging team, and if the logging team decides to change the labels on events, that might really make sense for debugging the app, but it can also completely screw up a machine learning model that was trained on those events before.
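[A minimal sketch of what testing the data itself, rather than the code, can look like with the Great Expectations Python library's classic pandas interface. The file names and suite path are hypothetical; the read_csv and validate calls are part of the library's public API.]

```python
import great_expectations as ge

# The expectation suite (the "test code") stays fixed; the data changes with
# every new batch, so the same suite is re-run against each batch that arrives.
batch = ge.read_csv("events_2021_09_14.csv")  # hypothetical batch file

# Validate this batch against a previously saved suite of expectations.
results = batch.validate(expectation_suite="expectations/events_suite.json")

if not results.success:
    raise ValueError("Batch failed validation; halting the pipeline.")
```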
Eric Anderson: Got it. And that explains why it's more than testing, because you're right, as soon as you help inform one team about what another team is doing with data, it becomes an organizational data quality effort. And I can imagine the alerting features you described.
Abe Gong: Yeah. I mean, if you want to get kind of poetic about it, data people are used to thinking in DAGs, directed acyclic graphs. And the way I've come to think of it is, there's the DAG that is captured in your software, but upstream of that there's a causal flow, other things happening in the real world. And when things change in that great DAG in the sky, the data-generating process, the upstream process that affects what shows up in your database, those changes will also affect your data and the conclusions you're drawing from it, and you want a system that is smart about those things.
Eric Anderson: Yes. The great DAG in the sky.
Abe Gong: I think that's a very me thing to say.
Eric Anderson: So tell us how the project came to be. Part of me wants to ask why this emerged now, since we've had data problems before, but take us back to the beginning.
Abe Gong: Yeah, the project started almost four years ago at this point. It was originally a nights-and-weekends collaboration between myself and a friend, James Campbell, who's now the CTO at Superconductive, the company that's backing Great Expectations. At the time it was a thing we were both using at work.
Abe Gong: Actually, let me tell you the actual origin story. The two of us had been part of a group of friends who'd known each other since college and before, and we'd jump on a call every month or two to talk about what we were doing, just in life, because we were all getting married at about the same time, and most of us wound up in data science, so there was a lot to talk about there. One of these days in early 2017, we jumped on a call, and I had come with this idea that there really should be some kind of tool for testing data in the way we just talked about.
Abe Gong: And I jumped on the call, and James went first, and he started describing this thing, and it was exactly the same thing I was thinking of. We'd come to the same call with the same idea, and even some of the technical details were remarkably aligned. So it just felt like, "Okay, we have to do this now. There's no way that we cannot build this system." At the time he was working in government and national security, and I was working in health care doing data integrations and data warehousing, and I think part of it was seeing these two widely disparate industries that very much needed the same thing, and realizing that we'd arrived at the same place totally coincidentally.
Eric Anderson: And I'm curious about the initial ambitions. You wanted this tool so you could get your work done, and you were excited to work with your friend on it. Is that the extent of what you wanted to achieve with this thing?
Abe Gong: I think we had vague notions that, yeah, this is a thing other people would need, so let's open source it. At the time, neither of us had a ton of experience as open source contributors or maintainers. We'd done some work there, and I think a lot of people in the data ecosystem today have seen that not everything, but most of the platform the data world is built on today, is the product of open source over the last few years. So we had a lot of respect and admiration for that, but it wasn't really something we were deeply experienced in. And maybe to answer the question behind the question, there was no notion at all of, "Oh, we should make this a startup, here's our revenue model for this thing." We weren't thinking about that at all at that point. It was mostly just, "Hey, here's a thing the world should definitely have, and we have a similar enough idea about it that it'd be fun to work together on it."
Eric Anderson: Yeah. So you'd been consumers of data open source for a while, and that was the natural thing to do: build the thing that solves your problem, and then see if it solves other people's problems.
Abe Gong: Yeah, exactly. And it's one of the things I love about open source: you just kind of start pulling one of these strings and sometimes a whole lot of interesting problems unravel together.
Eric Anderson: So I've got to ask about the name, and at what point you decided you were calling this Great Expectations. It's kind of fun.
Abe Gong: We decided that pretty early. We kicked around some other names. I don't know if we've ever talked about it publicly, but there was a time when we talked about calling it Lewis and Clark, and the idea was, explore your data and come back with a map. At the end of the day, we just decided the whole thing is built around this idea of expectations, so let's call it something that emphasizes that. And it doesn't hurt that my wife is an English major and we own all of the Dickens books, so there are a lot of Charles Dickens puns you can make too.
Eric Anderson: Yeah. Very good. All right. So you've got the name, and it's you and James collaborating on this. Eventually it reaches a point where you push it out to the world. What was the initial reception, or the launch, if you will, of the open source?
Abe Gong: We proposed it as a talk for Strata. I'd need to check, but I think that was early 2018. We'd just been working on it quietly; it was open source, but there'd been no press or discussion about it. So we did a talk at Strata, and we put out one blog post on Medium, called Down With Pipeline Debt, that just described what we were doing and why. And there was this immediate sense that, yes, we'd struck a nerve. A lot of good friends, and then people I didn't know but had been following on Twitter and respected, jumped on it and retweeted it and shared it. And yeah, just the sense that this is a thing lots of other people recognize the need for. It's timely, and the way we've architected it feels good and integrates well with other tools.
Abe Gong: And I guess the unusual thing is, we launched it and then we didn't do a whole lot with it at that point, because we were still in this nights-and-weekends mode. Superconductive at that time was a health care data consulting company, so I very much had my plate full just lining up clients and doing good data consulting. And like I said, James was still in government at that point.
Abe Gong: I'll fast forward the timeline a little bit. It wasn't until 2019 that we decided, wow, there's something really interesting happening here, let's actually make this the focus of what we do. The real catalyst for that was a combination of consulting contracts that came to Superconductive, with people saying, "Hey, we're not a health care company, so we don't need you to do health care stuff for us, but we do have data pipelines, and we know you have something to do with this open source project, and that's a real issue for us." So a combination of other data consulting companies, some pharma, some media companies, a very eclectic group of teams coming and saying basically, "Hey, this is a problem that's real, and we've got money, and please help us solve it."
Abe Gong: Even more than that, it was contributors that helped us really see that Great Expectations had become load-bearing technology. For example, the first deployment of Great Expectations on Spark was entirely contributed by a team out of Cascade Data Labs, a data consulting company we didn't know personally before, working for a client we didn't know. Their client had a bunch of tests in Great Expectations on SQL, they were migrating to Spark, and they wanted to take their tests and their documentation with them. So they basically loaned us an engineer for two months to work on that project. I mean, a real expense that people were paying for, and it opened our eyes that this project, which we'd put out there because we thought it was a cool thing, was actually really getting used in a lot of places.
Eric Anderson: That's fantastic. Abe, we may come back to this, because I want to hear about those initial community open source contribution moments, but maybe let's pull Kyle in some. Kyle, maybe you can tell us about when you first ran into the project, how the community operates today, and how that's evolved over time?
Kyle Eaton: Yeah, I actually was brought on as a UX designer originally, and I was staffed onto one of the clients, as well as starting to poke around what something more like a SaaS platform for Great Expectations could possibly be. Those initiatives started to change when we saw the momentum building around the community and Great Expectations. At the time it had a small Slack and some consistent contributors; I think it was probably in the mid-twenties.
Eric Anderson: Remind me, I think you set up the Slack channel, didn't you?
Kyle Eaton: Yeah, I set up the Slack channel. I actually came on around the same time that James came on full time, and that's when the writing was on the wall that we were going to start moving toward Great Expectations being the focus of the company. So I shifted over from doing UX, started focusing on community growth, and we've just been doing a sort of scrappy, organic content strategy to build out the community.
Kyle Eaton: And it's definitely an advantage being an open source product, because we're able to get in and mingle with lots of different communities that are very protective of their communities and don't want people coming in trying to sell a product. That has brought us amazing contributors and has given us a lot of cred within the data engineering community. The Slack channel probably started in, I want to say, early 2019, and we're at 4,600-something today in terms of Slack members, and we have over 200 contributors and counting on GitHub. So it's moving fast, and right now a big initiative for us is building infrastructure to make sure we continue to scale in a way where people still enjoy the community.
Eric Anderson: Yeah, you mean things beyond Slack, other places to meet and coordinate?
Kyle Eaton: Exactly, that and the Slack as well. I mean, just making sure you have good support in there. We're building out our DevRel team, so we're getting more developer advocates to help in GitHub and in Slack. We're going to start expanding our forums for long-form conversations and things like that. We've dabbled a lot with community show-and-tells, which have been super, super fun, and we want to start bringing those in more consistently now that we're putting more weight behind our DevRel team.
Eric Anderson: There always seems to be some tension there: we want a Slack, but we also want a more stateful place for conversations. We answer a question on Slack, but then it gets lost, so how do we reuse that? And I've seen people go really heavy on the forums, and then other people are like, "No, no, I don't want to get my question answered. I just want to hang out and lurk on the conversations that are happening. I don't actually have a question to be answered; I just want to interact with a community." So it does feel like you kind of have to have both.
Kyle Eaton: You do. And we like to call it the River of Slack: you say something that could be really interesting, and then just, swoosh, it's gone.
Eric Anderson: Gone.
Kyle Eaton: So we really want to build a good bulletin board for those interesting conversations that could be long form, because not everyone is in there daily to catch every moment. Some people pop in monthly, weekly, something like that; everyone has their own cadence. So yeah, we're trying to build a nice bulletin board for those more long-form conversations, but the effort on Slack will still be there; we're not going to take away from that.
Eric Anderson: What role have other communities played in your adoption? I've been impressed that Great Expectations gets kind of... You go to other data projects and they say, "Oh, we work well with Great Expectations." You go to Airflow, or to Dagster, or something, and you get to kind of co-opt other communities. Do you feel like that is a way of finding users? And what role do other communities play in your adoption?
Kyle Eaton: You're hitting on a really big point. Collaboration with other communities is probably my favorite thing to do, and it's also the best thing for both communities' growth and interaction.
It brings a lot of activity and interest when you have that collaboration, and one of the ways we've been able to do that successfully is with our show-and-tells, where we'll highlight two different companies that are using Great Expectations, and then we'll bring in, say, Dagster to present, or Airflow to present. The other day we actually did an event with Flyte. Those are super fruitful for everyone, especially within those overlapping open source communities; they tend to work really well because everyone has that same mentality of sharing and growing.
Eric Anderson: Yeah.
Abe Gong: Some of it is crossover in terms of user adoption as a growth loop. A lot of it, though, is more about cross-fertilization of ideas. What I mean by that is, while we're working with the Flyte team, we're having really interesting conversations about what the nature of typing in data is, and where type systems are going to go. It's a thread we've been pulling on that I think is going to be relevant, and it's a thing they've honestly thought about more deeply than almost anybody; the Dagster team is up there too. Being with people whose full-time job is also empowering developers, who are thinking about what the data ecosystem is going to look like in six months, in a year, in five years, being really in those conversations is energizing for everybody. And I think it matters to have the right ideas there. Having the right architecture, the right technical decisions, really does shape the way the ecosystem builds.
Eric Anderson: Yeah, and open source creates this public forum. You can't really hide behind walls and force developers in a direction. There's a bit of a, I'm forgetting the phrase...
Abe Gong: Marketplace of ideas?
Eric Anderson: Marketplace of ideas, exactly, yes.
Abe Gong: Yeah, there's a transparency there.
Eric Anderson: Yeah. Which favors developers, since the best ideas win out, and it favors the communities who are really engaged in that dialogue to discover the truth, rather than muscling their angle.
Kyle Eaton: I think there can be muscle and angle, in a way.
Eric Anderson: Yeah.
Kyle Eaton: And it can even be a good thing sometimes. You could have that one person who's really passionate about, "This thing would be great to have." It could be what we were talking about with, sorry, what was that major contribution that Cascade put in?
Eric Anderson: Oh, Spark?
Abe Gong: The Spark one, yeah.
Kyle Eaton: We weren't about to, I mean, of course we agreed it would be good to have that, but we weren't about to put in the effort ourselves to do it. But then someone was dedicated, "No, we want this," and boom, here it is; all we have to do is click merge and we're good to go.
Abe Gong: It was more work than that, but yes, in principle.
Kyle Eaton: Yeah.
Abe Gong: It would've been months before we would've gotten to building that out ourselves, because we didn't have an active Spark deployment. I think we had a footnote in a GitHub issue that said, "Oh, by the way, having this as an execution engine sometime would be kind of a cool thing. We'd be supportive of that." And it was really cool that Cascade picked that up. And then what happened after that is somebody came in to do BigQuery, steadily extending the library.
There's a little bit of one-upmanship in open source sometimes, where somebody sees a cool integration and thinks, "I could do that, but I could do it with this bell and this whistle." And that's cool too; it's all good energy.
Eric Anderson: I think yours has been an interesting situation, where you have this great burgeoning open source community, and in parallel there seemed to be a rush to proprietary software solutions in a similar vein. And I wonder, running both a company and an open source project, what it feels like to be in that situation. Is there a sense of, "Ah, we've got to rush to a product"? Or is it, "Hey, we're betting on this community, let's keep growing this"? How has that felt?
Abe Gong: Yeah, there's some of both, but I think overall it's more the second. Our fundamental belief is that this is a set of tools and use cases where developers are going to be the ones who really decide; theirs are the votes that count. And being really close to a large community, where we hear all the good and the bad about data quality from a lot of people, puts us in a really good position to develop the thing. There are other companies out there, and I know a lot of the founders; they're good people, they're smart people, and I wish them and their sales teams the best. But I think the developer-led approach here is ultimately going to be the one that really shapes the future.
Abe Gong: I'll also say, philosophically, being very much a data person myself, but also wearing the entrepreneur hat, I feel this interesting psychological pull, going back and forth between just being delighted that we have venture dollars and can build this thing that's going to be part of the data ecosystem, and the other half of the time being like, "Okay, I want to win the market." Part of the way we've dealt with that is by trying to be really clear with the open source community about what's open and what will always be open. I want to be careful, because it'd be easy to put my foot in my mouth here, but you've seen a lot of other open source communities where they've had to claw back licenses, or take things that used to be free and make them not free anymore.
Abe Gong: And we had, I'll call it the luxury, of this thoughtful period in 2019, which Kyle and I have talked about, where we were running a profitable bootstrapped company; we hadn't yet taken funding and signed up for the venture-backed roller coaster. In that period, we got to be really thoughtful about how to build an open source project that, on its own merits, is genuinely valuable to the data ecosystem, that can become one of the defining abstraction layers for the data ecosystem, and then also have a paid product that can come in on top of that and add more without detracting from what's in open source. Being able to be thoughtful about that at the time has put us in a place where everything that's in open source will always be in open source. And there's a cool act two coming with the paid cloud product before long.
Eric Anderson: Yeah. That's an interesting observation, that there may be some value in having that incubation period to sort things out, because you're right, there's not a lot of precedent for a happy path where clear-cut open source companies commercialize in a way where everybody's expectations are met.
Abe Gong: There are several that have done it really well, I think, and there have been quite a few where there's been drama. So just having that thoughtful period to figure out, okay, what is our promise to the community going to be, and how can we draw the lines so that we know we can keep those promises? If there are any potential founders thinking about this, I'd strongly encourage taking the time to figure out how you're going to segment value between open source and paid, and making sure there's really enough in both camps.
Eric Anderson: Great. I want to transition a bit. We've talked about the very early days with you and James, and the decision to move Superconductive toward backing the project and focusing there. Maybe you can take us to today, what the project looks like, and then we can move into your plans for the future?
Abe Gong: Yeah. I'll give the boring business-y stuff, and then Kyle, if you want, fill us in on community cred and all that. On the business side, it's public knowledge that we raised a round from Index early this year, and that's unlocked us to really grow the team and go from there, so we're well capitalized at this point. Some companies get to this point and say, "Okay, put a bow on open source, now we'll figure out the paid thing." We think there's a lot of work still to do in open source, so we're building up the developer relations team, and also having a large fraction of our core engineering continue to work on open source. And at the same time, there's a team working on a paid hosted product.
Abe Gong: And I'll put in a plug, just because we mentioned it a couple of times: we're not doing active signup or an active wait list at this point. We're working with a small number of design partners, but for people who are forward-thinking about data and really like the approach we're taking, we'd be happy to talk about design partnerships too. So I'll put that out there for anybody who's listening.
Eric Anderson: Kyle, do you want to give us more about what's coming next on the developer relations and open source roadmap?
Kyle Eaton: Sure. And just to add, you can hop into our Slack to ask about the cloud product, and there's also a sign-up form on our site, just so you know how to get in contact with us.
Eric Anderson: Thank you for connecting the dots.
Kyle Eaton: You got it. So for DevRel, like I said, we're growing the team. We've run a lot of experiments with events and community-generated content, and we don't want those to just be experiments we do here and there; we want them to be consistent. So we're looking to have those show-and-tells on a cadence, and the same with our office hours and opening up the roadmap to our users.
Kyle Eaton: We're also working on some cool features for the community that will be live on our site toward the end of October. One is going to be our Expectation Gallery. The idea is that we'll have a list of all the expectations that have been created by us and the community. So when you make your own expectation and it's merged into the code base, it'll be listed in the gallery with your GitHub handle attached, along with all the features of the expectation and an explanation of it. I'm hoping that will encourage more people to make expectations, and allow people to show off the contributions they've made.
Kyle Eaton: With this gallery, we're hoping we can start different initiatives that tackle particular verticals of expectations, so we can put out a call for expectations in health care, security, and pharma.
Abe Gong: Geo, time series, statistics, explainable AI, there are all kinds of places the framework can extend to and be used.
Eric Anderson: For the uninitiated, an expectation is kind of like a type; it's a bundling of, this is what this data should look like, generally.
Abe Gong: Yeah. You can think of it as an assertion about data. Let's start with basic ones, like expect column values to not be null, or expect column values to be between. But you can also do interesting things like, expect column pair to be correlated, as in, what's the Pearson correlation? You can do anomaly detection. And as we get into other domains, we're going to be able to do things like, expect point to be within polygon geographically, or expect lat/long to be within country. So you can think of it as an assertion. The interesting thing about data is, almost all of data science is about asking questions that are themselves kind of assertion-like, so it ends up being this really big library of questions that you can delegate to the machine to ask for you. And there are actually some really cool ways those can mix and match as time goes on.
Eric Anderson: Interesting. So I guess what you just highlighted for me is that I could feed arbitrary data to the expectation gallery and say, are there any expectations that seem to describe this data?
Abe Gong: Yeah, if you wanted to truly brute-force it, totally. You see that sometimes with things like demographic data: "Okay, I've got this column, it contains strings. The strings are long enough that they seem to be interesting, but what are these things? Are they addresses? Are they names?" And so looking for [inaudible 00:26:34] hits or statistical characteristics of matching patterns and characters, you can actually size up what that data is.
Abe Gong: We're joining all kinds of threads now, but we were talking before about type systems. I, again, think the Flyte team and the Dagster team are thinking cool thoughts here. Over time I think it's going to be really valuable to have your data system know not just, "Oh, this is a five-digit integer," but, "This is a valid zip code in the State of Massachusetts." Because if you know that, then you can reason about all kinds of other things, and that metadata becomes really, really invaluable. So anyway, the sky's the limit there, but there are all kinds of things we're looking forward to building in the community.
Kyle Eaton: And from the community angle, what's cool is that these contributions can be great for someone's first contribution, or you can have extremely complicated contributions. We actually did a student hackathon where we did a speed run, and I think the fastest one was definitely under five minutes. I don't want to exaggerate; I think it was like 3:30 or something like that.
Abe Gong: We've got that video somewhere.
Kyle Eaton: Yeah, we do. We do. At least I know I have the [inaudible 00:27:40] one that was under five minutes; it might have even been three. But that's just to show you can make a contribution quickly, and then we'll be highlighting opportunities to enhance these contributed expectations instead of having to come up with your own. But yeah, that's one of the initiatives with the gallery.
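[To make Abe's "assertion about data" framing concrete, here's a minimal sketch using the library's pandas interface. The file and column names are hypothetical; the expect_* methods shown are built-in expectations in the open source library.]

```python
import great_expectations as ge

# Hypothetical file; read_csv returns a GE-wrapped pandas DataFrame.
df = ge.read_csv("users.csv")

# Each call is an assertion about the data: it is evaluated immediately and
# also recorded, so the accumulated suite can be saved and re-run later.
df.expect_column_values_to_not_be_null("user_id")
df.expect_column_values_to_be_between("age", min_value=0, max_value=120)
df.expect_column_values_to_match_regex("zip_code", r"^\d{5}$")

# Persist the accumulated assertions as a reusable expectation suite.
df.save_expectation_suite("expectations/users_suite.json")
```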
Eric Anderson: Fantastic. There's so much we could discover, but I promised you both a reasonable-length conversation, which we're bumping up against. What haven't we covered that we want to share with our listeners today?
Abe Gong: I don't know that we've actually said the words on this, but as we've thought about the difference between open source and a future paid product, the way we think about open source Great Expectations, and I think it already is this, is that we want to make sure it can always be a shared open standard for data quality. Because having these expectations, having these rules, is a thing that lots and lots of other systems need. Dagster, for example, Flyte, Databricks' Delta Lake tool, they all have a concept of expectations, and none of them are directly running Great Expectations code, but the concept is valuable in all of those places. You see it in data catalogs, you see it in ETL and reverse ETL; all of these things need a concept of data quality, and the notion of an expectation turns out to be really valuable for all of them.
Abe Gong: So the grand vision is that we want to put out an open standard that defines how those expectations can work across all kinds of platforms and all kinds of infrastructure. You don't have to use our code. I mean, it's open source, so why not? But if you do, or even if you just write code that's compatible with it, you can translate back and forth between those expectations and Great Expectations, and by doing that you get the profilers, you get the documentation, you get all the other pieces built on top. So where we want to go with open source is providing this abstraction layer that just should exist; it's so valuable in so many places that there's no reason for the world to keep reinventing it.
Eric Anderson: That's fantastic, Abe, and I think it's an important clarification. You described an open standard that could be consumed even if you didn't use your code. It sounds like you're describing almost a protocol, or something like SQL, a syntax maybe; you're not describing something that necessarily executes, but a way of describing a set of standards.
Abe Gong: I think that's exactly right. The way I would describe it is, Great Expectations as an open source library is that protocol, and if you look, that's really the testing infrastructure and the docs; that's how the protocol is defined. We then have one reference implementation of that in Python, and with help from people like Cascade Data Labs, we've made it so that that reference implementation actually transpiles, if I can use the fancy word, into Pandas, into Spark, into a whole bunch of SQL dialects. So, using Python to orchestrate, you can actually put it in all of those places.
Abe Gong: But there's discussion in the dbt community, for example, of having a compatible library that's pure SQL, with no Python implementation at all. We've had a little bit of conversation with folks in the Tidyverse world about what an R implementation of these things would look like. Or you could take a library like [inaudible 00:31:02] that does many similar things, and be able to compile back and forth, transpile back and forth, so that they can get the benefit of some of the infrastructure we've built. It's a bit of a brain flip, but if you think about what will create the most value in the world, that's the way to do it: have this layer that can be used and shared by lots of people in lots of places.
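[A sketch of why the expectation concept can travel as a standard: a saved suite is declarative data, with nothing tied to a particular execution engine. The layout below mirrors the JSON the Python library saves; the suite name and columns are hypothetical.]

```python
# A saved expectation suite is plain, declarative data. Nothing here is
# specific to pandas, Spark, or SQL, which is why a compatible implementation
# in another language could evaluate the same suite.
suite = {
    "expectation_suite_name": "users_suite",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "user_id"},
        },
        {
            "expectation_type": "expect_column_values_to_be_between",
            "kwargs": {"column": "age", "min_value": 0, "max_value": 120},
        },
    ],
}
```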
Eric Anderson: Makes perfect sense. Abe, Kyle, thank you so much for coming on the show today. I know you both have a lot to do. I mean, you have to save us all from bad data, and you have a limited team and a big community waiting on you, so I appreciate you coming on today. For our listeners, this was recorded in mid-September and will probably be published in late October, so some of the details here may have changed by the time of publishing.
Abe Gong: Hey Eric, thanks for having us on. It's been a wonderful conversation.
Kyle Eaton: Yeah, thanks so much, Eric. I really enjoyed it.
Eric Anderson: You can find today's show notes and past episodes at contributor.fyi. Until next time, I'm Eric Anderson, and this has been Contributor.