Artyom Keydunov:
I think we never really thought about how we were going to make money. We thought that we would need to at some point, but we didn't think about specifics. But yeah, at some point later, we started to use those early adopters to understand opportunities to create enterprise value.

Eric Anderson:
This is Contributor, a podcast telling the stories behind the best open-source projects, and the communities that make them. I'm Eric Anderson.
We welcome Artyom Keydunov of Cube on the show with us today. Cube is this exciting layer in the modern data stack. What's exciting to me in part is just the reaction from the market. A bunch of GitHub stars, a bunch of community. So thanks for joining us today, Artyom.

Artyom Keydunov:
Yeah, Eric, thank you for having me.

Eric Anderson:
As we always do, help our listeners understand what Cube is.

Artyom Keydunov:
Yeah. Cube started out as an open-source project. But I think essentially it is a product for data engineers that helps expose data to different data applications and different data experiences. What we saw happening in the tech stack is that all the data goes into data warehouses right now, and that is only going to continue. We'll have more and more data in the data warehouses on one end. But on the other end, we have more and more data consumers that require diverse data applications and different data experiences, basically. So we have more data, but we also have a huge demand for this data to be used across different applications.
And what Cube actually does is create a bridge between those two worlds. It takes data from the data warehouses into all of those different data experiences and data applications. So if you want to build customer-facing analytics or internal tools, or just use multiple BIs consistently, Cube can bring the data to all of these places and provide access control, security, caching, all of those things. We started as an open-source project about three years ago, and it's been a lot of fun to see so many different creative use cases and how people are using Cube.

Eric Anderson:
Awesome. I had kind of an aha moment when we talked about the project recently. I think Semantic Layer doesn't mean a lot to some people. But if you're embedding visualizations in a product, exposing them to customers, the normal access control that comes with data warehouses doesn't work. And so at a minimum, you need new access control. And also, data warehouses can be expensive, and so if you're running the same queries, or they're seeing a lot of viewers, you probably want faster queries than what a data warehouse is going to give you, and you want some caching. That makes sense to me too, and that seems like the right layer.
And basically from there, everything else makes sense. You need data modeling in order to construct this thing for outside consumption. So while all the individual pieces make sense, that's where I realized that you're the right place for all these different components, and they neatly fit together, even though it wasn't obvious to me from the outset. Is that your experience with the market? Or are people like, "Ooh, I definitely know what a Semantic Layer is and I need one"?
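Editor's sketch: the two needs described above, access control that the warehouse cannot provide for end customers and caching in front of expensive queries, can be pictured as a thin layer between the application and the warehouse. The Python below is only an illustration under stated assumptions. `FakeWarehouse`, `MiddleLayer`, `order_count`, and the `tenant_id` security context are hypothetical names invented for this note, not Cube's API.

```python
class FakeWarehouse:
    """Hypothetical stand-in for a slow, expensive data warehouse."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0  # counts how often the warehouse is actually queried

    def count_orders(self, tenant_id):
        self.calls += 1
        return sum(1 for row in self.rows if row["tenant_id"] == tenant_id)


class MiddleLayer:
    """Hypothetical layer that adds per-tenant access control and caching."""

    def __init__(self, warehouse):
        self.warehouse = warehouse
        self.cache = {}

    def order_count(self, security_context):
        # The tenant filter comes from the server-side security context,
        # so an end customer can never ask for another tenant's rows.
        tenant_id = security_context["tenant_id"]
        key = ("order_count", tenant_id)
        if key not in self.cache:
            # Only a cache miss reaches the warehouse.
            self.cache[key] = self.warehouse.count_orders(tenant_id)
        return self.cache[key]


warehouse = FakeWarehouse([
    {"tenant_id": "acme", "amount": 10},
    {"tenant_id": "acme", "amount": 25},
    {"tenant_id": "globex", "amount": 7},
])
layer = MiddleLayer(warehouse)

first = layer.order_count({"tenant_id": "acme"})    # cache miss, hits the warehouse
second = layer.order_count({"tenant_id": "acme"})   # served from the cache
other = layer.order_count({"tenant_id": "globex"})  # different tenant, separate cache entry
```

In this toy version the cache never expires. A real layer would need an invalidation or refresh policy, which is left out here.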

Artyom Keydunov:
It depends. Some of our users come to Cube and say, "That's a Semantic Layer, I need a Semantic Layer. Cube is a Semantic Layer, I want to use it to power all of my different applications." Sometimes people have a hard time finding a good name. What is Cube, right? As you said, it does caching, data modeling, access control, and then has a bunch of APIs, and people try to find words to describe it. Sometimes we hear the word middleware. It sounds a little bit old school, but maybe technically it's correct, because it sits in between the warehouses and the applications. But yeah, I think it's been interesting to see how people try to use different words to describe Cube, especially initially. I think lately we have been hearing Semantic Layer more and more as a definition of Cube, yep.

Eric Anderson:
Perfect. Now tell us the story, this is the part I most enjoy, how did you stumble into this? Is this something you've been working on for a while, more recently, and what's the background that led you here?

Artyom Keydunov:
Yeah, it all started with a product called Statsbot, and Statsbot was my side project, a hobby project at my last company. The idea was, what if we get data from different places and just put it into Slack? Because we used Slack a lot, and I wanted to have all my data in Slack. From production databases, from New Relic, or Salesforce, all of that data merged, unioned together and presented in Slack. So I built a Slack application. And it was initially only a hobby project, but then it started to grow, and people started to use it. And Slack featured us in the Slack application directory, if you remember the days when they launched it.
So we got a lot of users, and a lot of people were excited about using Statsbot. My co-founder now, he was one of the first users, so he started to use it. And then he texted me. I had this Ruby on Rails application that was running Statsbot, and it started to have a lot of outages. And I just asked him, "Can you please help me run the servers and manage that?" And he jumped in, and he and I just kept going and working on that. And then we started to see some venture interest, people reaching out to us wanting to fund the project. And we ended up raising a small seed round for Statsbot as a project. We raised from Eniac Ventures, really great people, a VC firm out of New York, and we kept working on this.
But over time, what we realized is that we had built something very valuable for Statsbot internally, and that was Cube. Because we needed an engine that would work on the data from different places, and then be able to present this data to different places again. In the architecture we were using, we were doing ETL of the data from different places, putting it into a warehouse, then running this Cube engine on top of the warehouse and presenting the data to Slack. But we also had a vision to present data to different places outside Slack too. That's why we built it to scale initially.
So when we built Cube, we had a few users that were very active Statsbot users. They wanted to dig deeper and to have more control over Statsbot, and we exposed Cube to them. And when they saw that, they were like, "Can we use that internally to power our applications?" We were like, "That sounds like a good use case." And so we started to think about that more and more. And then we had this crazy idea. My co-founder and I were playing ping pong back in the office, and we were like, "What if we open-source Cube, and just see how people are going to use it?" And we just started to talk more and more about this idea, convincing ourselves that maybe we should give it a try. And then we decided to do that. We open-sourced Cube.
It was very different from what it is right now. It was very early, not very mature and not very well packaged. But people started to use it. And maybe three or four months after we open-sourced it, we started to see contributions, we started to see users. I remember we had this thing where we set a target of 10 companies that would actually deploy it into production, and that's when we were going to believe it was a real thing. And we were counting these companies, and at some point we were like, "Yeah, we have 10 companies that actually deployed Cube." And that was a pivotal moment, when we realized that we probably needed to focus on Cube full time, and then we made that change. We sunsetted Statsbot, and then we fully focused on Cube.

Eric Anderson:
Artyom, great story. As you're playing ping pong and deciding whether to open-source something, do you feel like you need to ask your investors about... I suppose it's not really part of the product, it's just this internal layer that you had to build to ship the product. What goes into the decision to open-source? Is that a lot of work?

Artyom Keydunov:
Yeah. So I think when we were playing ping pong, that was when the idea first came, and then we started to nurture that idea internally. We only had Eniac as an investor, so we didn't have a formal board or anything. But we spoke with a partner from Eniac, just bouncing the idea around, saying maybe we should just try that. And they were very supportive.
And then as we started to think about the strategy, how we would open-source it, we decided that we wanted to do it in stages. I think it took three months or so to fully open-source it, because we first open-sourced one piece, and then we open-sourced the second and third. It was not really hard, but it was a lot of work to take it out of the existing product and then package it as a standalone thing. So it took us some time to finally open-source it. But I think the first thing we did was create our organization and a repo on GitHub, to make sure that the name was going to be ours.

Eric Anderson:
And then speaking of the name. So you started out with Statsbot. Does Cube harken back to the BI cubes that we're all familiar with?

Artyom Keydunov:
Yeah. It was even called Cube.js. So now we call it... And the reason why we called it Cube.js is actually that one of the first users called it that. Remember, I told you about those power users, to whom we started to expose this configuration layer. We didn't have a name for it, because it was a purely internal thing. I think we had some internal name, I don't even remember, but it was not called Cube. And one of those users asked us something like, "Hey, how can I do this in Cube.js?" And we were like, "Okay." They called it that, so let's just keep it.
And the reason why they used that is probably because in the data modeling layer for Cube, you operate on cubes, basically. You create cubes, and then you put measures in cubes. You put dimensions in cubes, then you build relationships between cubes. And you need to write the code, and in the code you call it a cube. And the code was in JavaScript. Now we support YAML definitions too, but back then it was only JavaScript. Cube.js was a very natural way to describe that, so that's how the name came about.
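Editor's sketch: to make the vocabulary above concrete (cubes that hold measures and dimensions, and a query that picks some of each), here is a small illustration in Python. It is not Cube's JavaScript or YAML syntax. The `Cube` class, its `measure`, `dimension`, and `to_sql` methods, and the `public.orders` table are all hypothetical names made up for this note, and relationships between cubes are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    sql: str  # SQL expression behind the member, e.g. "COUNT(*)" or "status"


@dataclass
class Cube:
    """Hypothetical model: a named table plus its measures and dimensions."""

    name: str
    table: str
    measures: dict = field(default_factory=dict)
    dimensions: dict = field(default_factory=dict)

    def measure(self, name, sql):
        self.measures[name] = Member(name, sql)
        return self  # returning self allows chained definitions

    def dimension(self, name, sql):
        self.dimensions[name] = Member(name, sql)
        return self

    def to_sql(self, measures, dimensions):
        """Turn a request for named measures and dimensions into one SQL query."""
        dims = [self.dimensions[d] for d in dimensions]
        meas = [self.measures[m] for m in measures]
        select = [f"{m.sql} AS {m.name}" for m in dims + meas]
        sql = f"SELECT {', '.join(select)} FROM {self.table}"
        if dims:
            # Aggregates are grouped by every requested dimension.
            sql += " GROUP BY " + ", ".join(d.sql for d in dims)
        return sql


orders = (
    Cube("orders", table="public.orders")
    .measure("count", "COUNT(*)")
    .measure("total_amount", "SUM(amount)")
    .dimension("status", "status")
)

query_sql = orders.to_sql(measures=["count"], dimensions=["status"])
print(query_sql)
```

The point of the sketch is that consumers ask for `count` by `status` by name, and the definition of what `count` means lives in one place rather than in every dashboard or application.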

Eric Anderson:
Awesome. So you open-sourced it in parts, it sounds like. And at some point, you get all these users and GitHub stars in the community. How did you get there? What were the early successes?

Artyom Keydunov:
Yeah. When my co-founder and I started to work on that, we were both engineers, but we decided that his role was going to be more about talking to the existing users who were trying Cube, just to understand the usage patterns, what problems they were trying to solve, and how they were solving them, and then changing the product. And then again, talking to them, changing the product, talking to them. And my job was figuring out how we would get these people in. So, how people would learn about Cube.
So what I did is, I started to write blog posts about how you solve a specific problem with a specific stack with Cube. And I just started to post them on Reddit, Hacker News, different places, all of that stuff, and that's how the initial things kicked off. We created a Slack instance, and the idea was that we needed to funnel all the traffic into Slack, so everyone needed to go into Slack so my co-founder could actually talk to those people. So my job was bringing people to Slack, and his job was talking to those people. And the goal of that process was to eventually get to ten production users.

Eric Anderson:
Awesome, sounds like a dynamic duo. And then the 10 users... At this point, you're not charging them for anything, it's all open-source. But presumably, that's on the horizon for you. Are you trying to figure out, well, if we're going to build a business here, how do we do that? And is that a conversation you have with some of your earlier users?

Artyom Keydunov:
No, not really. Yeah, I think we were always thinking about that, because we thought that we probably eventually wanted to have a business around it, a product that we could sell. But especially before we had those 10 users, I think we never really thought about how we were going to make money. We thought that we would need to at some point, but we didn't think about specifics. But yeah, at some point later, once we got more users, after we raised our seed and then Series A, we started to use these early adopters to understand the problems.
And I think it was mostly not asking them directly, but just observing their journey to production, and then, once they were successful, seeing what issues they had. They have observability, tracing, and debugging issues. Okay, we can probably build tools to help them with this. Then they wanted more visibility into the cache. We can probably build that as an enterprise feature, like a UI or something for the cache. So we were more observing and trying to understand that. What are the missing features? What are the challenges in deploying Cube into production? Those were, for us, opportunities to create enterprise value.

Eric Anderson:
You mentioned something at the beginning, and I want to transition a bit to understanding more about how the product works. You said that you were stitching together a bunch of different data sources into a single visualization. Do I stitch them together in my data warehouse and then bring my data warehouse, now one uniform data source, to Cube? Or does Cube help me aggregate data from different sources as well?

Artyom Keydunov:
Yeah. I don't think about Cube as a federation engine. We don't want to try to replace Trino or something. We can connect to multiple data sources, and because we have a caching layer, we can merge data in the caching layer. So technically, with some restrictions, we can merge data from different data sources at the caching layer level. But at the same time, I wouldn't think about Cube as a federation engine.
And frankly, I believe that the warehouse is going to be the major place where organizations store data. There may be a couple of them, but I don't think that we will deal with hundreds of different data sources. I think ETL products are getting better, and many, many companies are using them to pull the data into data warehouses right now. And I envision Cube sitting on top of data warehouses.
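Editor's sketch: merging at the caching layer, as opposed to federating live queries, can be pictured as joining small pre-aggregated results that were each computed in their own source. The Python below only illustrates that idea. The two rollups, their values, and the `merge_rollups` helper are hypothetical, and nothing here reflects how Cube implements it.

```python
# Pre-aggregated rollups as they might sit in a caching layer, each one
# computed separately inside its own source (both sources are hypothetical).
orders_by_day = {"2023-01-01": 12, "2023-01-02": 9}   # e.g. from a warehouse
tickets_by_day = {"2023-01-02": 3, "2023-01-03": 1}   # e.g. from a second database


def merge_rollups(left, right):
    """Join two small rollups on their shared key (the day)."""
    days = sorted(set(left) | set(right))
    return [
        {"day": day, "orders": left.get(day, 0), "tickets": right.get(day, 0)}
        for day in days
    ]


merged = merge_rollups(orders_by_day, tickets_by_day)
for row in merged:
    print(row)
```

The restriction hinted at in the answer shows up even in this toy version: the merge only works because both rollups were already aggregated to the same key. Arbitrary row-level joins across sources are what a federation engine is for.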

Eric Anderson:
Perfect. And then the other half, the front end of your stack, I guess, is that someone chooses their own custom DataViz layer. That's not something you provide. So what are the typical things people are doing? And I presume that they're even making that choice before they make the choice on you. They pick a UI layer, they pick the data warehouse, and then they realize they need something in between.

Artyom Keydunov:
Right, right. That's a very common story. That's actually the story I was using when I was talking about Cube initially. In all of those blog posts, the story was: now you have Snowflake, and you have React, so how do you actually build an application? And then it was about Cube in the middle, obviously.
Yeah, a lot of our customers build from scratch on the visualization side. They use React, as it's the most popular front-end framework right now, and a lot of different libraries. It's a really fragmented market for open-source visualization. There is also Highcharts, which is not open source, it's commercially backed, and it's a good one. There is Chart.js, and there is D3.js, which gives you a little bit more low-level power. So we see a lot of this. There are some React-specific ones, like Recharts. Some of our customers also use out-of-the-box tools on top of Cube, like Superset, Metabase, Observable and some other notebooks, so that's an option too. But we mostly see React or JS-style stacks.

Eric Anderson:
Now, I guess once you choose your front end, you choose Cube and you've got the data warehouse, you have a full stack for doing a data application. There are other ways of building data apps today, Streamlit or those types of tools. Are they a full-stack version of what you do, or are there reasons to choose one or the other?

Artyom Keydunov:
Yeah. I think Streamlit is a good example. I think it has its sweet spot, and you can run Streamlit on top of your data warehouse, and maybe soon only on Snowflake. I don't know how that's going to play out. But I think it's good for these one-page or simple applications. It's probably going to be used by a single person or a single role, because there is not a lot of security or access control built in. So I think a good example could be if you wanted to build a simple internal tool, or maybe you're working on some data set and you want to present it in a more dynamic way, rather than just building slides or a static dashboard. That could probably be a great choice.
But what if you wanted to build customer-facing embedded analytics inside your application? Maybe you're building software as a service and you need an Insights page, or some analytics features in the product. Then Streamlit might not be a good choice, because how do you embed it? You probably need to make it a part of your front-end tech stack, and then you need to deal with access control and security, and then you need to bring a cache in. So I think, use case wise, what we serve is a little bit different from Streamlit. But at the same time, if you wanted to run Streamlit with Superset and Metabase, and have a Semantic Layer that unifies the data across all of these tools, you can run them all on Cube.

Eric Anderson:
Ah, I see. Okay, so it can be complementary. That's great. So when did you feel like, wow, we're really onto something? You got through these 10 users, but now you're screaming as far as adoption and community goes.

Artyom Keydunov:
Yeah. I think really the first moment we felt that we were onto something was when we got 10 users. So that's when we really... I think we had maybe 1,000 to maybe 500 people in Slack. But you get the first people into Slack and the community, and then they need to progress, they need to deploy, they need to build something, and then you only get 10 users, because it takes time for them. So it took a little bit of time, maybe three or four months, before we saw the first 10 users, and that was great.
And I think after that, we started to be more active. We were just talking about Cube more, and that led to getting more people at the top of the funnel and more awareness of Cube. More stars, more Slack community members, and then eventually more production users. But really, the aha moment was those first 10 users.

Eric Anderson:
You said your job was to find more users and part of how you did that was blog posts. Are there other things that you figured out, oh, this is actually the best way to find the people we're looking for, the people who most like Cube?

Artyom Keydunov:
Well, we started around Covid, and I think we... Really, we first open-sourced Cube in 2019, before Covid. And I was doing blogging, all that content, as I said. And then by 2020, the question for me was, "What do I do next?" And then Covid happened. It was like, "I'm not going to do events, probably." And so I decided just to keep doing content, and by then we had already raised our seed round. So we brought in a DevRel person who helped me build different cool applications and examples. But we really doubled down on content, and just kept focusing on that.

Eric Anderson:
I thought you were going to say that you were doing a bunch of Covid data, because everyone became a Statistician...

Artyom Keydunov:
We did that.

Eric Anderson:
... during Covid. We were all looking at charts every morning.

Artyom Keydunov:
Yeah, that was one of those cool projects that our developer person built. It was a little too noisy. I think it got some traction, but everyone was building them.

Eric Anderson:
Right, I feel like we all discovered five different websites every day that were going to give us Covid data. Wonderful. And where are you at now, Artyom? Tell us as folks are listening to this interview and they get excited about Cube, what do they have in store for the next year or so, and what are ways they can get involved and contribute?

Artyom Keydunov:
So I think Cube has really matured over the three years, so it's a different product now than it was three years ago. And we got a lot of contributors, and it's really great to have all these people contributing to Cube, with many core features being contributed by external contributors. One good example is the GraphQL API. When Cube started, we only had a REST API. But then the GraphQL API was contributed by the community. Now we maintain it in the core team, of course, because it has become a really important feature, but it was initially a community contribution.
We also got a lot of drivers. Connectors to different databases and data sources have been contributed by the community. So I would say that's the area where we saw a lot of help and where people got involved. But also, our Slack has just... It's fun to see how people are helping each other, so that's been a big contribution area as well. And now Cube is an open-source project. We don't call ourselves Cube.js anymore, because we don't only serve the JavaScript community. We try to serve the data engineering community more broadly, and we support languages other than JavaScript for configuration.
So yeah. We have a cloud product, so I definitely would love people to try it out. If they feel like contributing, contributions are always welcome, and just giving us feedback and providing some guidance in the community, that's always great.

Eric Anderson:
Getting a meaningful contribution like the GraphQL API is a real slam dunk. Is that something where somebody just shows up with a PR one day, or do they engage with you and ask what you need?

Artyom Keydunov:
Yeah, they engaged. Yeah, that had been a... In our Slack channel, we have a channel for the contributions. So it's been a lot of conversations. And to be honest, it's been very overdue. So on our end, we just were focusing on other things. Many people, they wanted GraphQL. But one person, they just did it. So many asked, but someone did.

Eric Anderson:
I'm tired of waiting. Good, congrats on an awesome project that's found a foothold in the community. Hopefully we can watch this grow, and then we reconnect here and do a follow on episode and hear the update.

Artyom Keydunov:
Awesome. Yeah, I would love that. Thank you for having me today.

Eric Anderson:
You can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.