Brian: PodRocket is sponsored by LogRocket, a frontend monitoring and product analytics solution. Which is to say, it's not really sponsored by anyone, it's sponsored by us, LogRocket, and we're giving it away for free. The podcast is free, the product is not free. There's a free trial. We could split hairs about whether or not that's free to you, but anyway, that's it. There are no more ads. If you're interested and you want us to know that you came from the podcast, please go to logrocket.com/podrocket. If you don't care, logrocket.com works just fine, thanks.

Brian: In this episode, Ben interviews Jason Bosco, co-founder of Typesense, an open source search engine that's optimized for both performance and ease of use. They talked about how Typesense compares to Algolia and Elasticsearch, how to get started using Typesense, and how you can get involved. Let's get started.

Ben: Hey Jason, welcome to the podcast.

Jason Bosco: Hey, great to be here, Ben. Thanks for having me.

Ben: Yeah. We're really excited to have you and very excited to learn about Typesense. So maybe you could give us a quick introduction, what is Typesense?

Jason Bosco: Yeah. So Typesense is an open source search engine, specifically optimized for performance and ease of use. So a good way to think about Typesense: it's an open source alternative to Algolia and an easier to use version of Elasticsearch.

Ben: Got it. And so, yeah, maybe we could dig in there a bit for folks that aren't familiar with Algolia or maybe haven't used Elastic for this type of search use case. What does Algolia provide? And why would someone in the first place turn to a standalone search provider?

Jason Bosco: Yep. So typically when you have data in a database, maybe you're using a relational database like MySQL or Postgres, or you're using a document store like Mongo, they are more suited for fetching data by record IDs or joining different tables together and fetching data.
What they're not optimized for is full-text search. And, of course, Postgres does have some amount of built-in full-text search. But think of full-text search on large data sets with features like typo tolerance, for example: if you make any mistakes, how do you still get the correct results? Or if you want to do things like what's called faceting, where you want to show how many records of each type there are in your data set. So these are typical things that you run into when you're building a search experience. So it's not just typing a keyword and getting results, but then once you get the results, you want to also start filtering those results and drilling down into the results. And that's where a full-text search engine like Typesense or Algolia or Elasticsearch comes into play.

Jason Bosco: Now Algolia is a proprietary search engine and Elasticsearch is open source, or source available, some people say, now. And the way we're differentiating Typesense is, it's an open source, instant search engine. So we're not trying to go after every single feature that Elasticsearch has, and they have a ton of features. I was grepping through the code base and they have about 3,000 different configuration parameters. Now, someone who wants to implement a simple search experience has to dig through all those parameters to see which ones are best for their particular use case. We're trying to simplify that whole experience and also provide an instant search experience like Algolia. So unlike Elasticsearch, we hold all the data in memory, which is what makes it super fast.

Ben: What are some good use cases for Algolia or Typesense for the instant search? I'm guessing maybe an e-commerce website, where you want to power the search for products.

Jason Bosco: Yep.

Ben: Are there other use cases that you see commonly when you're talking to users?

Jason Bosco: Yeah, so e-commerce is definitely a big use case, because a big portion of e-commerce is search.
And in fact, something that you typically do not think of as search in e-commerce is when you, for example, look for men's shirts on an e-commerce site: when you're just clicking around, the product grid that shows up can also be powered by a search engine. So you can actually use it, not just for keyword search, but also for what's called structured search, where you can filter by categories, and you can power that using something like Typesense.

Jason Bosco: And then you have the whole SaaS ecosystem, anything to do with in-app or on-site search experiences. So for example, knowledge bases, or content management systems that you want to search for content in. So it's really a pretty broad market. Anytime you need to build a search bar within your site, you probably have to start using a search engine at some point, once you hit moderate amounts of scale.

Ben: Yeah. I'm curious to dig in there on that concept of hitting moderate amounts of scale, because I imagine for small scale projects or small scale applications, as you mentioned before, with most SQL and NoSQL databases, you can index and search over various fields in the database. And so, how do you know that you have too much data, or at what point does performance start to become a problem with typical use cases of SQL and NoSQL? For example, I have an e-commerce site, how many items do I need to have in my product catalog before I can anticipate search performance starting to break down to the point where I need to put an additional search layer on?

Jason Bosco: Yeah. I would say it depends on the size of your dataset more than the type of data. So, I've seen as many as 5,000 products be searched from a database, so that could be totally fine. But even with that, where you'll start seeing performance hits is if you want to implement something like typo tolerance on top of your dataset. That particular feature itself is heavy enough.
And if you also want to build a search-as-you-type experience where every key press starts returning results, and you add typo tolerance on top of it, and it ends up taking, for example, even 500 milliseconds to get your data from the database with typo tolerance, then that's not an ideal user experience for search-as-you-type. So people typically expect, I would say, maybe sub-100 milliseconds or 200 milliseconds for an instant search experience.

Jason Bosco: So that's where I would say, if you need instant search plus typo tolerance and the ability to drill down at high performance, that's where you start using a search engine. If you don't have a need for any of these, if you just want to do regular text-based search, and if you don't have a need to drill down, then again, a database search could be sufficient for you at even small amounts of scale. But a search engine will become relevant even at small scales, when you need to implement these different features that a search engine provides you.

Ben: Got it. And I'm just curious to understand, without going into too deep technical detail, under the hood, how do you implement something like typo tolerance?

Jason Bosco: People have done a lot of research on this. If people are curious, it's called Levenshtein distance. We calculate the edit distance between the word that's typed and what we've stored in the index. It's a fascinating topic, so I'd highly recommend folks check it out. And of course, we have our own proprietary search algorithms on top. It comes down to, I would say, how to most efficiently store the data in a format that's easily retrievable during a search query. So we optimize everything for reads and writes in parallel as well. So again, without going into too much detail, that's at a very high level. The search algorithms are custom built, and typo tolerance uses Levenshtein distance.

Ben: Cool.
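For readers who want to see the edit-distance idea Jason mentions, here is a minimal sketch of the textbook Levenshtein algorithm. It is not Typesense's implementation, which is custom built and not described in the conversation. The `fuzzy_lookup` helper, its `max_typos` parameter, and the sample vocabulary are hypothetical, added only for illustration. A real engine would use an index rather than scanning every word.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn one string into the other."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the rows short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(query: str, vocabulary: list[str], max_typos: int = 2) -> list[str]:
    """Hypothetical helper: return vocabulary words within `max_typos`
    edits of the query, closest first (brute-force scan)."""
    scored = sorted((levenshtein(query, word), word) for word in vocabulary)
    return [word for distance, word in scored if distance <= max_typos]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(fuzzy_lookup("shrit", ["jacket", "shirt"]))
```

Note that plain Levenshtein distance counts a swapped pair of letters ("shrit" for "shirt") as two edits, which is why the sketch defaults to a tolerance of two.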
So tell me a bit about the process of getting started and deploying a Typesense instance or cluster, what does that look like?

Jason Bosco: Yeah, so we offer native binaries and Docker. We have a Docker image, so that's the easiest way to get up and running. It's one command and you have a running search server. We also have Typesense Cloud, which we launched last year because people were asking for a hosted version. So even with an open source product, people were still asking us for a hosted version, so based on feedback, we started doing that. So with Typesense Cloud, again, it's a few clicks and you have a running cluster in about five minutes. I would say the biggest thing about Typesense is that we try to provide an out of the box experience for all the configuration parameters that we have. So you get relevant results right out of the gate with just one command, versus having to go and fine tune every single parameter for your use case.

Jason Bosco: So the idea is that hopefully for like 80% of use cases, you don't have to think again about search. It's just one command. You have a search server, you index your data, you start querying it. That's the experience that we aim for.

Ben: And so, let's say you deploy a Docker image. Once you have a live instance, what is the process for mirroring your data from your SQL or NoSQL database into the Typesense cluster?

Jason Bosco: Yeah. So it's similar to how you would push data into any other external system. So people typically, for example, listen to change streams in their database. MySQL has binlogs that you can listen to, or DynamoDB has change streams, Mongo has change streams. So you can listen to change streams and then, based on whether a record is created, updated, or deleted, you make the equivalent API calls into Typesense, which has a REST API, to send that whole document as a JSON object into Typesense.

Ben: Got it. And you mentioned, I think before, that Typesense stores all data in memory.
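The change-stream approach Jason describes can be sketched as a small translation step from database events to REST calls. This is a minimal sketch under stated assumptions: the `ChangeEvent` shape is hypothetical (each database's change stream has its own format), and the paths follow the Typesense documents API as commonly documented (`POST /collections/{name}/documents?action=upsert` and `DELETE /collections/{name}/documents/{id}`). Check the API reference for your server version before relying on them. The function only builds the request and sends nothing.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """Hypothetical, simplified shape of a database change-stream event."""
    operation: str                   # "insert", "update", or "delete"
    document_id: str
    document: Optional[dict] = None  # full document; None for deletes


def to_typesense_request(event: ChangeEvent, collection: str) -> tuple[str, str, Optional[str]]:
    """Map a change event to (HTTP method, path, JSON body or None)."""
    base = f"/collections/{collection}/documents"
    if event.operation in ("insert", "update"):
        if event.document is None:
            raise ValueError("insert/update events need the full document")
        # Send the whole document; upsert covers both create and update.
        body = json.dumps({**event.document, "id": event.document_id})
        return ("POST", f"{base}?action=upsert", body)
    if event.operation == "delete":
        return ("DELETE", f"{base}/{event.document_id}", None)
    raise ValueError(f"unsupported operation: {event.operation!r}")


if __name__ == "__main__":
    event = ChangeEvent("update", "42", {"name": "Blue shirt", "price": 20})
    print(to_typesense_request(event, "products"))
```

Treating inserts and updates identically keeps the consumer idempotent: replaying the same change event simply overwrites the document with the same contents.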
Jason Bosco: Mm-hmm (affirmative).

Ben: Does Typesense maintain a backup on disk of the data, so that if the cluster goes down, it can repopulate the in-memory database it uses? Or how does that work?

Jason Bosco: We store the indexed fields in memory so that anytime you search for things, we only hit memory, we do not hit disk. Now, we do store unindexed fields on disk, in addition to a backup of the indexed fields as well. So you could search on a certain set of fields, but then also set it to return some unindexed fields along with it, so that you can actually use it for storing more data than just your searchable fields. Now, you could spin up Typesense as a single node or in a high availability setup as well. So we use Raft as the consensus algorithm and it's a distributed system, so you can set up three or five or seven nodes, depending on your needs for high availability. So again, that's also a simple one-file configuration to set up high availability. Once you set that up, data is automatically replicated.

Jason Bosco: Any writes that you do are automatically replicated to all nodes. So even if one node goes down and loses its data as well, the replacement node that you set up for it will automatically sync data from the other two nodes. On top of that, if you want an extra layer of comfort, you can also back up the data and store it separately. All of this said, typically you don't use a search engine as your primary data store. You only store data that you already have somewhere else, and you're mirroring a copy of it into the search engine. That's the general design principle that I'd recommend. Usually that's the standard for search engines.

Ben: I guess one of the pros of using something like Algolia is they abstract away all of the management from you. So you send data to them, they power your search and they handle all of these topics you were just mentioning, from availability to fault tolerance, et cetera.
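The three, five, or seven node counts Jason mentions follow from how majority-based consensus such as Raft works in general: a cluster keeps accepting writes as long as more than half of its nodes are reachable. The arithmetic below is a sketch of that general property, not a statement about Typesense's specific configuration. The function names are made up for illustration.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a majority is still available."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(f"{n} nodes: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

This also shows why odd cluster sizes are the usual choice: a four-node cluster needs three nodes for a majority, so it tolerates only one failure, the same as a three-node cluster.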
So what is the pro of going with Typesense? It seems like open source is obviously the biggest advantage of something like Typesense. What's the case for why people should choose Typesense, when Algolia exists as a hosted solution?

Jason Bosco: I mean, I 100% agree with you, given that it's an open source technology. I look at open source as a means to innovation, because I've seen open source open up access to technology to more folks than would be able to afford something like Algolia. Now, of course, Algolia offers it for free for open source projects for just their documentation search. But then for the more advanced use cases you've got to start paying, and the technology is not as accessible as it could have been if it was open source technology. So that's where I see open source as a means to driving product adoption, but not just for product adoption's sake, but also to use that adoption to then have people use the product more and give feedback so that we can continue improving the product.

Jason Bosco: So that's on the open source side. The other thing is, of course there are people who don't want to manage their own servers. And that's part of the reason why we launched Typesense Cloud, and we've now deployed hundreds of clusters on Typesense Cloud. The nice thing that's happening is, as users are using Typesense Cloud, we're now able to take the learnings of operating these clusters in production for so many people, use those learnings to improve the open source product, and then offer that back to the community. So it's a nice little feedback loop that we're getting, where people are using the product, both self-hosted and the cloud version, and we're able to improve the product based on real-world production use cases.

Jason Bosco: The third thing I would say is that some folks, for a variety of reasons, like, for example, strict compliance requirements, cannot use an external system, so they have to self-host.
I think cases like that make Algolia inaccessible, and that's where, with Typesense, they could self-host it. We also offer support even for self-hosted folks. So that's another key differentiation between Algolia and Typesense.

Ben: And I'm curious, as you think about building a business around Typesense, do you currently charge for Typesense Cloud or is that something that might come down the road?

Jason Bosco: We do charge for it. Yep.

Ben: You do.

Jason Bosco: Yep.

Ben: Do you see yourselves down the road building non open source features on top of the core open source Typesense? Like, I've heard it called the open core model, where you have the core version as open source, but then you build premium features that you charge for.

Jason Bosco: Yeah. So we actually did go down that path last year, where we did have an open core model where some features were under a premium offering. One thing that I realized was that people were hesitant to purchase and then try. Of course, we could have done a free trial, et cetera, but just that experience of offering something behind a paywall seemed like it added some friction to how many people would use that particular feature. And tying this back to what I said earlier, I'd love for every feature that we build in Typesense to be used by as many people as possible, from the perspective of them giving us feedback and us being able to improve that feature.

Jason Bosco: So if we introduce any constraints on top of it and say only these features can be used by everyone and others need to be paid for, I felt like it started curbing product feature adoption. And that's when we decided to open source everything, like 100% of the search engine. Now, of course, with Typesense Cloud we do have some automation around building a self-healing cluster. And there's a lot of tooling around Typesense Cloud to provide a SaaS service. Now, of course, that is closed source. That's something that not everyone needs.
Now, of course, if you run a SaaS solution, you might need it, but that's closed source, and that's something that's proprietary to us.

Jason Bosco: But the core search technology itself is open source. And we, in fact, run the same open source product that we publish for Typesense Cloud as well. And then we build outside of it. So we don't even change the binary that we run on Typesense Cloud. It's the same binary, it's the same source code. We just build additional things around it to make sure that, again, the open source product is the product. That's the thing. What we're building closed source around it is mainly around automation, around scaling and infrastructure and deployment from a SaaS perspective.

Ben: So tell me a bit about what's on your roadmap? Whatever you're comfortable sharing, I'm curious to hear what we might expect to see from Typesense in the next year or so.

Jason Bosco: Yeah. So the way we prioritize the roadmap is mainly based on feedback from users and interesting use cases that come up. So one thing that has repeatedly come up, which, interestingly, I thought might be a niche thing to build initially, but users have now proven me wrong, is geo search, being able to search on a map based on lat/long. That's something that we said we'd get to at a later point in time. Now people are actively asking for it, so that's something that we're working on.

Jason Bosco: The other set of features that we're building are what I'd like to call maybe edge features, in the sense that once enough people start using a particular feature, you need all these little things that are not too common use cases, but still there are enough people needing them. For example, one use case that came up was, you can index arrays inside of a JSON object. Right now, if you need to add an additional element to the array, you basically have to insert the entire array again.

Jason Bosco: And there was an ask: "Hey, can I just add one more element to the array?"
Or, "Can I check if an element exists in the array already?" So with features like this, again, if you already have the source of the data, you can just replace the array, so that's not too big a thing, but it would be nice to have as an additional thing. So features like this are what I'd probably call the long tail of nuanced feature sets that we're planning to tackle.

Jason Bosco: And then the other big area of focus is around integrations. And when I say integration, I mean, let's say, there's an e-commerce platform that can come pre-integrated with Typesense, or maybe another framework that can come integrated with Typesense. I would say Algolia has done a fantastic job around integrations, and anything that makes developers' lives easy is what we also want to do, that's it. We're also a small, lean team, so I'm hoping to get the community involved. And folks that are using their own frameworks, they're the experts in their framework, so I'm hoping for contributions from the community around building those types of integrations. So these are some big priorities for us this year.

Ben: So we talked a bit earlier about getting started. It sounds like it's really easy to get started with Typesense, whether you want to deploy it yourself or use the cloud hosted version. But I'm curious, if you're interested in getting started with maybe contributing on an open source basis, what does that look like? And what kind of contributions are you looking for? And how should someone get started if they're interested in learning?

Jason Bosco: Yeah, I would say the biggest need for us right now is client libraries in different languages. We currently have libraries in Ruby, Python, JavaScript and PHP, and we also have... Interestingly, the PHP one, and we also have a Go one contributed by the community that we've adopted. I'd love more community contributed libraries in specific languages.
Again, we're not experts in every single language, so while I'd love to write a Java client, I'm not an expert in writing production level Java code. So I'd love contributions on that front. Someone recently asked for a Rust library, so if anyone has Rust experience, I'd love to have a library around that.

Jason Bosco: I think there's also a level of evangelism that I'd love to get some help with. Because, again, we're a small team, we're focused on the product, we're doing a lot of things, and I think having help speaking about Typesense would be great if you're using Typesense in production already. We already have about 25,000 Docker pulls at this point in time from Docker Hub, so people are using it. I'd love for them to now help spread the word and say that they're using it. And we have a showcase section on the site. Open a PR and say you're using it in production, and we're happy to showcase your work.

Jason Bosco: And finally, if you have interesting use cases of how you're using Typesense, consider writing about them in your blog, or contributing in other communities, or contributing to our documentation to add how you've integrated Typesense with framework X or framework Y. Those are some great ways to contribute back.

Ben: And the libraries you mentioned in Go or Rust, are those basically just wrappers around your REST API?

Jason Bosco: Correct. For the most part, they're just wrappers. They just make calls out, like, they're a one-to-one match with our REST API. But the additional logic in the client libraries is around retries, so if one particular node goes down, it has logic in there to round-robin between different nodes.

Ben: It sounds like an interesting project for someone. If you're into building REST libraries out there, you should definitely reach out to the folks at Typesense about contributing. So lastly, tell me, what are you most excited about in web development?

Jason Bosco: Yeah.
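For anyone considering writing a client library, the retry behavior Jason describes (rotating to another node when one is down) might look roughly like the sketch below. It is an illustration of round-robin failover in general, not the logic of any official Typesense client. The `RoundRobinClient` class and the `send` callback are hypothetical names, and real clients would also handle timeouts, backoff, and marking nodes unhealthy.

```python
from typing import Callable, Sequence


class RoundRobinClient:
    """Hypothetical client core: rotate through nodes, retrying on failure."""

    def __init__(self, nodes: Sequence[str], send: Callable[[str], str]):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._send = send  # performs the actual request against one node
        self._next = 0     # index of the node to try next

    def request(self) -> str:
        last_error = None
        # Try each node at most once per request, starting where we left off.
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            try:
                return self._send(node)
            except ConnectionError as exc:
                last_error = exc  # node is down; move on to the next one
        raise ConnectionError("all nodes failed") from last_error


if __name__ == "__main__":
    def fake_send(node: str) -> str:
        # Simulate a cluster in which the first node is unreachable.
        if node == "node-1":
            raise ConnectionError(f"{node} is down")
        return f"ok from {node}"

    client = RoundRobinClient(["node-1", "node-2", "node-3"], fake_send)
    print(client.request())
    print(client.request())
```

Passing the transport in as a callback keeps the rotation logic independent of any HTTP library, which also makes it easy to test with a fake like the one above.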
I'm starting to see a big wave of open source web alternatives to popular products, and I'm not talking just about developer tools like Typesense. I just saw this morning on Hacker News that someone launched an open source alternative to Trello, and there's another open source alternative to Intercom. And it seems like the trend of open source alternatives to popular proprietary products is starting to catch on now, and that encourages me a lot. And the fact that folks are building products in the open, I think, will spawn a whole new wave of innovation for the next decade, which I'm excited to watch unfold and also be a part of as well.

Jason Bosco: So again, if you haven't yet gotten involved in open source, I think now's a better time than ever, because there are so many open source projects starting out that are backed by companies as well, and who are also looking for folks. Even if there are companies behind them, they are still, like us, like Typesense, small teams. We're still a small team and we'd love to work with the community to be able to bring this product to market, just like so many other folks that are starting up now. So I'm super excited by that trend that's happening.

Ben: So Jason, thank you so much for coming on the podcast. It's been great to learn about Typesense. And we're going to put a bunch of links in the description of the podcast. We'll put a link to your GitHub repo. Maybe you could share an article about Levenshtein distance for folks that are interested in learning about building typo tolerance and some of those algorithms. Anything else we should link to there as well?

Jason Bosco: Yeah. Happy to share links and a lot of reading material as well. This is an interesting space and we're passionate about it. I think the more people know about how search engines work, the more contributions we could get back into our core as well.

Ben: Great. Well, thanks again, Jason.
Jason Bosco: Happy to be here. Thanks for having me.

Brian: Hey, it's Brian again. So it turns out that running a podcast is maybe harder than we thought. And so I kind of want to hear from you. I'm genuinely interested in your feedback. We have to think about new topics, new guests, we have to find them. And don't get me wrong, we can do it, but it's a lot easier if everyone else who's listening helps. So if you'd like to suggest a topic or volunteer to be on PodRocket, we'd like to hear from you. You can do that by going to podrocket.logrocket.com/contact-us. The hyphen is next to the delete key, if you're curious. If all of that is too long, you can just email me directly at brian@logrocket.com. That'd be great. Also, if you're feeling magnanimous, be sure to like and subscribe to PodRocket. Thank you.