MongoDB 
===

Paul: Hi there and welcome to PodRocket, a podcast brought to you by LogRocket. LogRocket helps software teams improve user experience with session replay, error tracking, and product analytics. Try it for free at logrocket.com. My name is Paul, and joining us today is Michael Lynn. Michael is a developer advocate over at MongoDB, and he's the host of the MongoDB Podcast.

~If you haven't checked it out, you can go search up MongoDB podcast and you can see Michael's smiling face that we're looking at right now, which unfortunately you can't share the view of. Uh,~ welcome to the podcast, Michael.

Michael: ~Paul, it's great to be here. ~Thanks so much for having me.

Paul: I'm excited because today we're gonna talk about MongoDB, which a lot of people have heard about, a lot of people have either tooled with or talked about, and how it kind of relates to the RDBMS world.

Michael: Yeah.

Paul: and relational and some of the differences between that and it. We're gonna even step into a conversation about migrating between them.

Some of the things you might wanna watch out for.

Michael: That sounds great.

Paul: Awesome. I mean, before we get into nitty gritties and details and talking about migration for folks who are tuning in for their first time, ~uh,~ on the [00:01:00] topic of MongoDB, maybe they've heard of it, not really sure what the selling point of it is. What is MongoDB and why is it different than Postgres?

Michael: Yeah, it's a great question ~and it's,~ and it's a great place to start, I think. So MongoDB is a database right ~at,~ at the very top level ~of the,~ of the hierarchy. And you might be thinking, ~well,~ okay, so there are a lot of databases, right? I've stored data in MySQL, Postgres, SQL Server, Oracle, ~whatever you're,~ whatever you're used to using.

Those are all relational databases. They all use tables. And in those tables you have rows and columns, and ~that's,~ that's ~a,~ a data storage approach that was created somewhere around the 1970s, like 50 years ago. Something like that.

Paul: Wow. That's long when you put it into the timeline perspective.

Michael: Right? Isn't it crazy? So this guy Codd, who worked at IBM, ~Uh,~ gave a lot of thought to what the most important things are and where money goes when we develop [00:02:00] architectures for computing and software. And he came up with this set of rules for laying out data, and it resulted in, ~you know,~ Codd's rules for data normalization.

And the goal for Codd's rules was to maximize the ability to store data and reduce the number of times that you duplicate data, right? You don't want to duplicate data when those disks that Codd was using back in the seventies, that disk space, was ridiculously expensive. So naturally, the rules that he developed were designed to maximize efficiency on data storage and reduce the number of times that you're duplicating data.

~So, uh,~ normalization results in a highly efficient ~uh,~ disk storage mechanism. Now, as time has progressed, the cost structure for software development and the stack of things that you put into developing an application and [00:03:00] launching it into production has changed. What's happened over time is disk space is on its way to zero.

~We're~ we're watching, ~you know,~ where it used to cost thousands of dollars per megabyte, it's now getting close to zero, right? We're in the cloud. SaaS-based ~uh,~ service architectures are offering data storage mechanisms where, ~I mean,~ it's approaching zero. So where we used to maximize for disk storage, we're really looking at a different cost structure.

Now, what costs more, Paul? If I'm looking at ~the,~ the amount of money that I spend to develop an application, it's not in disk space anymore. What do you think it might be?

Paul: ~And~ it's gotta be the. It's gotta be the time and effort put in by your team.

Michael: Exactly.

Paul: ~you look at a lot of startups and you break down their cost sheet. It's team~

Michael: ~Absolutely correct, correct.~ So developers cost more today than they did back then. ~And,~ and really, so ~what do we,~ what do we do to impact ~that,~ that cost structure? When we're looking at developing an application, we wanna make it easy for developers to store data. Well, along about [00:04:00] 2009, 2010, ~uh,~ Eliot, the founder of MongoDB, along with Dwight Merriman...

~Um, uh,~ Eliot Horowitz and Dwight Merriman, they were working on some large-scale data projects and they decided to come up with their own data storage mechanism, and they wanted to make it fast, efficient, flexible, and easy to use for developers. ~And,~ and that's the key thing right there, easy to use for developers.

I think that's been the overarching principle for MongoDB from the very beginning. How can we make it easy and flexible for developers? Mongo's architecture differs from those databases that we talked about earlier in that it's not relational. ~I mean,~ you can express relationships between data elements with MongoDB, of course, but it's not tabular.

You're not storing data in rows and columns in a table. You're storing data in what's called a document. It's a document-oriented database. So what does a document mean? What does it look like? ~Well, um,~ are you [00:05:00] familiar with JSON?

Paul: I'm familiar, and I'm sure most people listening are.

Michael: ~Okay.~ Yeah. So it's an object, right? So in code, ~you,~ you manipulate objects.

These are key-value pairs. ~Uh,~ in JSON, they're key-value pairs. ~The,~ the keys are always strings, and the values, in JSON it's pretty straightforward: they could be numerics, they could be strings, they could be, ~um, you know,~ some other minimally flexible data types. But with MongoDB, we store data in those same types of documents, like a JSON document, but on disk it's written in a format called BSON, a binary representation of JSON.

And the reason ~that it's,~ that it's binary is that JSON is relatively limited ~in the,~ in the data type support. ~And,~ and Eliot and the team that developed it, ~they,~ they wanted to be able to support more flexible types. For example, dates in JSON are stored as strings, right? You want to be able to actually recognize that it's a date type.

So on disk, when you're storing data in MongoDB, it's stored as a date type. [00:06:00] And you store it as a string, but there's some extra bytes ~that are,~ that are stored right alongside that field. ~And,~ and so ~that's,~ that's a ~uh, uh,~ document-oriented database. And I think ~um, the,~ the key behind this is that it is flexible.
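A minimal sketch of the date point above, in Python. It is not something shown in the episode: the `signup` field and its value are made up, and only the standard library is used. It shows that plain JSON has no date type, so a date goes out as a string and comes back as a string. The last line computes a millisecond count since the Unix epoch, which is the form the BSON specification describes for its date type.

```python
import json
from datetime import datetime, timezone

# Hypothetical value, chosen only for illustration.
signup = datetime(2023, 1, 15, 9, 30, tzinfo=timezone.utc)

# Plain JSON has no date type, so the standard encoder rejects a datetime.
try:
    json.dumps({"signup": signup})
    json_accepts_datetime = True
except TypeError:
    json_accepts_datetime = False

# The usual workaround is an ISO 8601 string, which is read back as a plain str.
round_tripped = json.loads(json.dumps({"signup": signup.isoformat()}))
came_back_as = type(round_tripped["signup"]).__name__

# A typed binary format can keep the value as a number plus a type marker.
# BSON's date type is a 64-bit count of milliseconds since the Unix epoch.
millis_since_epoch = int(signup.timestamp() * 1000)
```

After the round trip, nothing in the JSON text says the value was ever a date. That is the gap a typed format closes.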

One of the key tenets of a document-oriented database is that it's, ~uh,~ that it's polymorphic. It supports polymorphism: many shapes. So when you compare that to a relational or a tabular database, ~uh,~ it's really difficult to store different numbers of columns in the same table, right?

You can't do that, like you can't have one row with 10 columns and the next row with 12 columns. It just doesn't work that way in a table. But with a document, you can have one document for each user, and one user can have 12 fields, and the other user can have 10 fields. ~That's,~ that's polymorphism. ~Um,~ and the reason that you might want to support that is that, ~uh,~ in code you are going to be maintaining what [00:07:00] data is captured and stored in the database.
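To make "one user can have 12 fields and the other can have 10" concrete, here is a small sketch that uses plain Python dictionaries as stand-ins for documents. The field names, the second user, and the `display_name` helper are all hypothetical and do not come from the episode.

```python
# Two documents that would sit side by side in one hypothetical "users" collection.
users = [
    {"_id": 1, "full_name": "Michael Lynn", "email": "michael@example.com"},
    {"_id": 2, "first_name": "Paul", "last_name": "Example",
     "email": "paul@example.com", "phone": "555-0100"},
]

# Each document carries its own set of field names; nothing forces them to match.
shapes = {doc["_id"]: sorted(doc.keys()) for doc in users}


def display_name(doc):
    """Read a name from either document shape."""
    if "full_name" in doc:
        return doc["full_name"]
    return f"{doc['first_name']} {doc['last_name']}"


names = [display_name(doc) for doc in users]
```

The flexibility has a cost: the shape check moves out of the database and into application code such as `display_name`.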

So you, as the developer, decide what that schema, what the shapes of your documents, look like. And that makes it super easy ~for,~ for developers because at any point in time, I can go into my code, change what ~the,~ the application is capturing from the application front end, the user front end, and store that in the back end.

Now, you might think ~like, well, how does that,~ how does that happen? ~Well,~ imagine I create an application and I'm capturing users. For example, ~I'm,~ I'm storing user data. Day one, I get an influx of users, and I decided early on as an application developer that I'm gonna store the user's name, their address, and their contact details, and ~I'm,~ I'm confident this is gonna be it. This is gonna be a great application. I got, ~you know,~ 200 users on the first day and I realize as I'm looking at the data, I'm capturing the user's name. I captured it as ~full,~ full name in one field. Michael Lynn. That's one field.

~But,~ but then I want to sort on [00:08:00] last name. Oh, damn. ~Well,~ I could write a transform and ~I could,~ I could write ~a, a,~ a query that splits apart the username. Or wait a minute, why don't I just go into the code and add a new field called last name and separate ~the,~ the username field? And when I do that, I push my code to production.

Day two, we launch, ~you know,~ we're opening for business, and people begin to register. And now ~there's,~ there's a new set of data ~in the,~ in the database, with username consisting of first name, last name, two separate fields. Now what happens to those day one documents? ~They're,~ they're still there.

They have 10 fields. Day two, I've got documents with 11 fields. Now that may blow your mind if you're a relational developer because you're used to a schema that restricts ~and, and,~ and has a uniform structure across the rows in your table. ~Um,~ it doesn't have to be that way ~with,~ with MongoDB. So now what happens to those day one [00:09:00] documents and day two documents?

What I like to suggest when I review a customer's ~uh,~ plans for their data schema, I always ask them to think about this scenario: the structure of your documents, the schema of your documents with MongoDB, doesn't exist in a schema per se, separate from the database.

It exists in the documents themselves. We call this schema on read.

Paul: level for the document.

Michael: ~Yeah,~ yeah, exactly. So the schema only exists when you look at the schema. ~So,~ so I can determine ~what,~ what ~the,~ the structure of the documents are by looking at the structure of the documents, not, not by looking at some separate schema thing.

So I ask folks to think about this and include a second field, a field that represents the version number of the schema. So in my first example, I've got 10 fields for user data. I'm gonna have an 11th field for the version of the schema. Now, on day two, [00:10:00] as I modify my code to include a separate field for last name, I'm gonna increment my version.

This is the schema version. So now I know as I look at the data that exists in my documents: version one documents exist and version two documents exist. And if I want to do a transform and clean that data up and make it uniform, I can then write some transform ~and,~ and update the document structure for those day one documents, version one schema.
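The schema version idea can be sketched in a few lines of Python. This is one possible shape for such a transform, not code from the episode. The field name `schema_version`, the `upgrade_user` helper, and the rule of splitting on the last space are assumptions made for illustration.

```python
def upgrade_user(doc):
    """Return a copy of a user document upgraded to schema version 2."""
    # Documents written before the version field existed count as version 1.
    if doc.get("schema_version", 1) >= 2:
        return dict(doc)
    # Split on the last space; a single-word name ends up entirely in last_name.
    first, _, last = doc["full_name"].rpartition(" ")
    upgraded = {key: value for key, value in doc.items() if key != "full_name"}
    upgraded.update(first_name=first, last_name=last, schema_version=2)
    return upgraded


# Day one and day two documents, with a placeholder second user.
day_one = {"_id": 1, "full_name": "Michael Lynn", "schema_version": 1}
day_two = {"_id": 2, "first_name": "Paul", "last_name": "Example", "schema_version": 2}

users = [upgrade_user(doc) for doc in (day_one, day_two)]

# Once every document is version 2, sorting on last name is straightforward.
by_last_name = sorted(users, key=lambda doc: doc["last_name"])
```

Because the version lives in each document, the transform can tell old from new without inspecting which fields happen to be present.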

It's just, ~um,~ it's just good planning to think about the fact that your schema is gonna change over time. And as developers, we are responsible for that, not a separate DBA. ~We,~ we get to control what that schema looks like for our documents. Does that make sense?

Paul: And I can almost relate this directly to how you might build out a Firestore application in Firebase Firestore, because they have a document-based ~uh,~ database. And something that I personally have never done is thrown a schema field into my documents. I mean, [00:11:00] that sounds great, because what you're arguing here is you have an actual tabular, or I don't wanna say tabular because we're out of the world of tables here, but you have a written-down field that's pedantically telling you that this is a different image from another document, and it kind of saves you that whole transform step of inspection.

Michael: Yeah, exactly. Exactly. And ~that's,~ that's, that, that's ~kind of~ making ~the,~ the mind shift from, from relational to to document.

Paul: And another thing about the mind shift from relational to documents that I want to zero in on really quick too is, as somebody who comes from the platform operations, C and Rust world, it's very difficult to sell me on convenience personally.

Michael: Hmm. Mm-hmm.

Paul: to sell me on convenience, even though like it's tried and true that the cost of developers is the big cost.

So you're kind of like walking this line about like, I don't wanna buy like something that I don't need, an abstraction layer I don't need. But one thing you presented [00:12:00] this conversation with from the very beginning is we're not talking about just like, How we access the data or like how it's organized, even though it does play into that at a fundamental level, we're talking about like what are the business concerns with like what is valuable when we store things and it shifted from storage

Michael: Mm-hmm.

Paul: engineers.

And I think that is kind of this overarching rule: this is why this is valuable. It's not because some other access method can read 50% faster. Nobody cares. Disk is cheap, CPU's cheap. We're caring about the quality of the team, our lubrication between teams, and how fast we can develop tomorrow.

You know, so that, that really helps wrap my head around the value add of MongoDB in a way that I feel like I haven't been able to pinpoint before.

Michael: ~Awesome.~

Paul: ~so Austin coming from Yeah, that perspective and.~ We stepped into talking a little bit about schema and how data's stored in the document model, and if people are listening, [00:13:00] you can probably notice like, yeah, this is fundamentally different.

I'm sure a lot of people listening have messed with document databases and, you know, they're like, yeah, that's a different camp. I either drink this Kool-Aid or that Kool-Aid. And the whole point of this episode is we're gonna kind of break down that barrier and we're gonna talk about moving from a tabularized relational database into MongoDB.

~Um,~ before we hop into that, really quick, I just wanna remind our listeners that this podcast is brought to you by LogRocket. ~Uh,~ LogRocket offers session replay, issue tracking, and product analytics to help quickly surface and solve impactful issues affecting your user experience. With LogRocket, you can find and solve your issues faster, improve conversion and adoption, and, like we're trying to focus on in this episode, at the end of the day, have your engineers focus their time on building a good product. So go check out logrocket.com today if you want to learn more.

Michael: ~Yeah, right on that. I love that.~

Paul: ~thank you. Thank you, Michael.~ Well, stepping into the more technical side about MongoDB and the difference between [00:14:00] relational: if I have a relational database, Michael, like I'd say the last two personal projects I started were using Prisma, because I wanted to check out Prisma and how it interacts with Postgres. Yeah, it was a good experience, but maybe I'm like, okay, there's another developer who's developing with me now on my side project and I don't want to manage relational, I don't wanna manage migrations. What's the first step for me to move into a document-based database from my Prisma schema, as an example?

Michael: ~Well,~ I think it involves looking at the data that you're storing and how it's referenced. ~and, and,~ And look, I'm not here to sell MongoDB to everybody. I think if you're leveraging a tabular database today and it's working for you and the scale is such that you're not concerned, ~um,~ good for you.

That's, ~uh,~ I love that. ~Like, I'm,~ I'm not here, like I said, to sell anybody anything. But if you are concerned about your ability to scale and you wanna start looking at ways that you can ~uh,~ migrate from a tabular to a document-based [00:15:00] database, there are some really great tools that will allow you to do that.

Now, when people think of MongoDB, if you haven't used it in recent ~uh,~ days, ~like. In,~ in Atlas for example, we have ~a, a,~ a cloud-based solution called Atlas. You can get more information at cloud.mongodb.com. This is ~a,~ a fully in-the-cloud solution, and it's hosted MongoDB. And if you wanna start your journey ~and,~ and start to look at how you might begin ~to,~ to migrate your data, the best thing to do is head on over there and launch a free cluster instance.

There's no credit card. ~Uh,~ it's not time-boxed, like you're not gonna run out of time ~and,~ and have to convert ~or,~ or put your credit card in ~and,~ and have it bill you or something over time. Just create an account and launch an M0-size instance. And the wizards are great these days.

~These,~ these guys in growth marketing have really worked hard to make sure developers get a great [00:16:00] experience right out of the box. So when you go to cloud.mongodb.com ~and you,~ and you walk through the wizards, it's gonna walk you through the process of launching a cluster. I'm gonna explain that in a minute.

And then once the cluster is launched, the free-tier M0-size instance, it's gonna ask you if you wanna ~uh,~ enable access for your local computer, so you can start ~uh,~ to manipulate the data from your computer. Then it's gonna ~uh,~ have you create credentials. It's gonna walk you through the process so your data is secure, ~and then,~ and then it's gonna ask you if you want to load some sample data.

And this is gonna be really valuable. ~It's,~ it's about 300 megabytes of sample document, uh, databases. There's a collection of samples that really represent some interesting ways that you can express relationships between data elements using documents rather than tables in a tabular database.

In relational, we know that Codd's rules of normalization call for you to separate your data [00:17:00] into separate tables and establish relationships between those tables. With MongoDB, that is possible: you can create separate collections of documents and express relationships by linking. But one of the key ~uh,~ concepts that you'll need to wrap your mind around is embedding rather than linking.

And these sample documents will show you. They're gonna give you a good example of how, rather than separating your data, you may be able to gain efficiencies by embedding it. And when I say embedding it, think about those key-value pairs again, and think about a new structure ~in your,~ in your objects. You're probably doing this today, but rather than having a separate table, you're gonna take that separate table and embed it into a field.

So you're gonna have an array within a field. So a field can be a key and a value. The value will be an array. And in that array you can have strings, numerics, and ~those are,~ those are single-level [00:18:00] elements. But you can also embed another document. So you can have a document ~within a document,~ within a document.
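As a rough picture of what embedding looks like, here is a single user document written as a Python dictionary. Every field name and value is a hypothetical placeholder. It is not one of the Atlas sample documents, only an illustration of arrays, embedded documents, and arrays of embedded documents.

```python
user = {
    "_id": 1,
    "name": "Michael Lynn",
    # An array of single-level values.
    "tags": ["podcast", "developer"],
    # An embedded document.
    "address": {"city": "Exampleville", "postal_code": "00000"},
    # An array of embedded documents, each with its own nested array.
    "orders": [
        {"order_id": 101, "items": [{"sku": "A1", "qty": 2}], "total": 40.0},
        {"order_id": 102, "items": [{"sku": "B7", "qty": 1}], "total": 15.5},
    ],
}

# Everything about this user arrives with the user; no second lookup is needed.
order_total = sum(order["total"] for order in user["orders"])
first_sku = user["orders"][0]["items"][0]["sku"]
```

In a tabular design, `orders` and `items` would typically be separate tables joined back to the user by key.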

You can do multi-level embedding. You can also have arrays of embedded documents. That's how you can gain efficiencies for data retrieval: by embedding data. We have this mantra. I do a data modeling webinar on a monthly basis that teaches people how to think about creating these schemas for MongoDB.

~And,~ and I always like them to leave with one key mantra: data that will be accessed together should be stored together. So in a document, if I know I'm always going to be reading username, user details, and the orders ~that~ that user placed, do I want those orders to be in a separate table? If I do, there better be a good reason for it, because I'm gonna have ~to,~ to issue a separate fetch.

~Like, let's,~ let's talk about this at a physics level. If I can store the data for those orders in the same place that the user is stored, ~I,~ I get the efficiencies of one [00:19:00] data fetch, one disk fetch. Now, you as a developer, ~I,~ I'm suggesting that you should be aware of the cost of that. There are costs, of course, but ~when I,~ when I always read order detail with user detail, it would just make sense to me that you would store them in the same place.

Now, for relational developers, ~what,~ what typically happens next is, yeah, ~but,~ but you're gonna duplicate data. Paul, I don't care. Data's cheap. Disk is cheap, right? Let's just duplicate it. In your code, when you write your data out, you can create a trigger, and a trigger can duplicate the data, so you have it in multiple places if you think you need it, but I'm suggesting that you might not.

And ~uh,~ I think it really just comes down to analyzing the way your application accesses, updates, and manipulates data so you understand where those efficiencies can be gained by storing data together or by separating it.

Paul: and I mean, I, when you say that, I genuinely wanna look on [00:20:00] my past as a developer and anybody who's listening, look on your past as a developer. And like seriously think about, is there any time my application database was too big? Like it's almost funny, like no, your application database has never been too big.

I don't. And now I don't wanna say never, cuz of course if you work in an enterprise, like what is your application database? Well, it could be a lot of things. ~And of course like. The poor folks run an elastic search out there. That could be your application database. And I'm sorry, but like ~that's not what we're talking about here.

You know, like for most apps, yeah, it doesn't get too big. So if I don't have to care about that, duplicating my data to have it coherently in a view

Michael: Yeah.

Paul: is not something you should have a problem with. It's not something I would have a problem with. Yeah. It's just a totally different way of thinking.

As somebody like myself who's more a relational developer,

Michael: Mm-hmm.

Paul: let me take this efficient tree of links and connections of my database and let me sort of flatten it out on a document-by-document level. That's kind [00:21:00] of what we're talking about here. And what you're insinuating, ~uh,~ Michael, is it really depends on your data.

How do you access your data? How do you plan on updating your data? And that will depend how I migrate from a relational structure into the document structure.

Michael: ~Exactly,~ exactly. And I started down that path talking about sample documents in MongoDB Atlas, and ~those,~ those will give you a good idea about some of the ways that we can lay ~uh,~ a document structure out so that it makes sense in terms of documents. Now, once you do that, once you start to understand those, you're gonna wanna take a look at your tables.

Now I can guarantee, if your application has existed for some time and it is relational in nature, you will have a bad time with MongoDB if you simply take your tables and convert them to collections. So the rough translation from a tabular database to a document database is this: tables become collections, ~uh,~ [00:22:00] rows become documents, columns become fields.

So that's the rough translation of terminology. And if you simply take your tables and convert them to collections and take your rows and convert them to documents, you're probably gonna have a bad time. ~You'll,~ you'll probably experience less than optimal performance.
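The difference between the one-to-one translation and an embedded layout can be sketched with two tiny in-memory "tables". This is an illustration only: the row data, column order, and both helper functions are hypothetical, and a real migration would read from the relational database rather than from Python lists.

```python
from collections import defaultdict

# Rows as they might come out of two tables.
user_rows = [(1, "Michael Lynn"), (2, "Paul Example")]           # (id, name)
order_rows = [(101, 1, 40.0), (102, 1, 15.5), (103, 2, 9.0)]    # (id, user_id, total)


def one_collection_per_table(user_rows, order_rows):
    """The rough translation: tables become collections, rows become documents."""
    return {
        "users": [{"_id": uid, "name": name} for uid, name in user_rows],
        "orders": [{"_id": oid, "user_id": uid, "total": total}
                   for oid, uid, total in order_rows],
    }


def embed_orders(user_rows, order_rows):
    """Fold each user's order rows into an array on the user document."""
    orders_by_user = defaultdict(list)
    for order_id, user_id, total in order_rows:
        orders_by_user[user_id].append({"order_id": order_id, "total": total})
    return [{"_id": uid, "name": name, "orders": orders_by_user[uid]}
            for uid, name in user_rows]


translated = one_collection_per_table(user_rows, order_rows)
embedded = embed_orders(user_rows, order_rows)
```

With `translated`, showing a user and their orders still takes two lookups. With `embedded`, one user document carries both.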

~I mean, it is a,~ it is a way to start, ~and,~ and you can do that to begin. But I would suggest that you break apart your data, look at your application, and look at the ways that your application is accessing and updating data ~and,~ and take a look at where you can consolidate data elements between those tables and create embedded structures so that you ~uh,~ can have some efficiency on ~your,~ your data reads.

Now we do have a great tool and it's brand new. It's called MongoDB Relational Migrator, and I've been giving talks on this and ~it's,~ it's just so much fun to use. I love the way it works, and ~it's,~ it's in preview.

So you can download this [00:23:00] today. ~Uh,~ I'm gonna give you a link ~uh,~ shortly. You download this, it runs in a browser, and it runs locally on your laptop. And it's gonna ask you for a JDBC connection to your relational database. And then you're going to be asked for a connection to a MongoDB database, and that can be running locally on your laptop or it can be running in the cloud in Atlas.

You'll need to get a connection string for that. So just as you have a JDBC connection string, you're gonna need a MongoDB connection string. And what's gonna happen is when you provide that JDBC connection string, it's gonna connect to your relational database and it's going to give you an ERD.

It's gonna paint on the screen what your database looks like. It's gonna draw the connections between tables that you've expressed through foreign key relationships, and then it's gonna give you the opportunity to begin to draw lines to this new structure in a document database. And you can create rules with Relational [00:24:00] Migrator that state how to get from point A to point B, ~from,~ from relational to documents.
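The idea of deriving a diagram from foreign keys can be illustrated with Python's built-in `sqlite3` module. To be clear, this is not how Relational Migrator works internally and it does not use JDBC. It is only a sketch of reading the foreign key relationships a database already declares, using a made-up two-table schema and a hypothetical `foreign_key_edges` helper.

```python
import sqlite3

# A throwaway in-memory database with a made-up schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        total REAL
    );
""")


def foreign_key_edges(conn):
    """List (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    edges = []
    for table in tables:
        # PRAGMA foreign_key_list columns: id, seq, table, from, to, ...
        for row in conn.execute(f"PRAGMA foreign_key_list({table})"):
            edges.append((table, row[3], row[2], row[4]))
    return edges


edges = foreign_key_edges(conn)
```

Each edge is a candidate for the embed-or-link decision discussed in the conversation: here, `orders.user_id` pointing at `users.id` suggests orders could be embedded in the user document.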

And one of the things I love about this is that you can begin to take a look at those foreign key relationships between tables and represent those as embedded structures in your new MongoDB document structure.

Paul: And does it help me if I want to even think about and model in my own head how this foreign key structure translates into an embedded structure? Does this tool sort of have ~uh,~ visualizations and a view and breakdown that helps me say, well, what about path forward A or path forward B? Can I try this or that?

Michael: Yeah. So it's early days. ~I mean,~ it does give you the ability to create these rules and ~um,~ execute the migrations to look at what the data looks like ~in a,~ in a MongoDB database. It's gonna take your data based on the rules that you create and whether you state that you want ~to,~ to take a foreign key relationship and embed that into a document structure.

It's gonna write the data out to your [00:25:00] new MongoDB database based on those rules. Now, it doesn't do what-if scenarios, and the Relational Migrator today doesn't do any kind of performance analysis. ~Um,~ but what does happen is if you are using MongoDB Atlas, we have the Performance Monitor, which is taking a look at all of the reads and writes that you issue to the database in Atlas.

And it's gonna look ~for,~ for bad form. It's gonna look for ways that you're maybe over-leveraging indexing or maybe overloading an array in an embedded structure. If you're storing hundreds or thousands of array elements in a nested array inside a document, it's probably not gonna be efficient for you.

So it'll tell you. And that's what I love about the Performance Monitor built into MongoDB Atlas. ~So,~ so we got into this section of the conversation about, okay, how do I begin to think about migrating? Well, the answer is probably, if you wanna begin ~your,~ your journey, get [00:26:00] into the cloud, check out cloud.mongodb.com.

It's gonna give you a free solution to start looking at your structures. ~Um,~ there are paid tiers that you can scale to without downtime if you do decide that you wanna move up the tree, ~um,~ and then Relational Migrator to take a look at your data structures. Now, my colleague John Page, ~um,~ one of the brightest guys I've ever met, he's clearly ~uh,~ expressed that the Relational Migrator ~um,~ today is probably not gonna be on the scale of ~like a,~ like a multi-thousand table migration.

It's not gonna be the tool you're gonna use to migrate your multi-thousand table database ~to,~ to MongoDB. But what I contend, ~and,~ and this is where we ~kind of ~disagree, I believe that you can begin to illuminate ~uh,~ some of the structural paths that you will take to get from that thousand tables to a document structure.

~Um,~ and I think the key for me is, in whatever I do, increased visibility and increased perspective always [00:27:00] give me increased insights. So I wanna take ~a,~ a problem and look at it from ~every,~ every possible angle and twist it around. And that's gonna give me, ~you know,~ some ideas about how I want to tackle that problem.

~And,~ and relational migrator gives you that.

Paul: I mean, with Relational Migrator and a free account, it's nice. I can see you being able to hop in there and just flesh out an example migration. Hey, what does this look like if I were to run the migration tool? So, ~um,~ within the migration tool, can you have a hand in some of it? Because we were talking about, oh, ~uh,~ a table becomes a collection, a column becomes a document. I mean, excuse me, a row becomes a document, a column becomes a field. In the relational mapper, can you sort of dig down into how those mappings get carried out, or is it sort of like a gimme one, gimme two, and let's see what happens?

Michael: No, you do get to specify. So it gives you two paths. Once you [00:28:00] get your ERD, ~uh,~ your entity relationship diagram, from your tabular database, you have a choice to make. You can have Relational Migrator ~uh,~ automatically decide for you, and it's gonna make some clumsy decisions like table to collection.

Paul: Mm-hmm. Mm.

Michael: ~Uh,~ and you can have it do that automatically ~and,~ and ~you'll get,~ you'll get a dual-pane display. I think the default is top and bottom. On the top you'll see the tabular ERD, the entity relationship diagram. On the bottom you'll see ~the,~ the ~uh,~ entity relationship diagram between documents, and if you choose to have Relational Migrator make that choice for you, it's just gonna display pretty much the same thing that you see top and bottom. Bottom will be documents, top will be tables and rows.

~Um,~ or you can say, start from scratch. And that's where you can begin to click on the tables represented in the diagram above ~and,~ and choose that path. You can choose your path ~uh, to,~ to specify whether you want to use an embedding or a linking structure.

Yeah.

Paul: Okay.

Michael: Yeah. [00:29:00] Really great work by Tom Hollander ~and the,~ and the folks that, ~uh,~ are building, ~uh,~ the relation, the relational migrator at Mongo.

Paul: So let's say maybe folks use the Relational Migrator, they're set up in MongoDB. ~Um,~ what is sort of a ~uh,~ roadblock that you find teams running into where it's like, okay, I got everything set up, I feel confident about it? Because something that has to do with the schema and the way I'm translating it to a document or collection is at the forefront of my mind when I'm doing this.

It's really like, I wanna make sure I get my schema right. It's what I'm thinking about. ~Um,~ is there something that people maybe don't think about as much that a month or six months down the line you have found comes back to bite them that they should have been thinking about in that first step?

Michael: Yeah, so relational migrator, let's take that outta the loop because there's, I have zero experience in, in the real world with it. I've just done talks on it because it's so new. ~Uh,~ but I have worked ~with,~ with hundreds, if not thousands ~of,~ of customers and potential customers in the context of migrating to MongoDB.

And I [00:30:00] think ~the,~ the overarching like, ~well the,~ the most popular question I get is like, how do I know whether it's to link or to embed? How do I know what the proper schema looks like? And there is no right answer. ~We have,~ we have some patterns, ~um,~ that in the talk I gave on document data modeling, there's some common patterns and ~there's,~ there's some great blog articles that you can read about those common patterns, ~um,~ but I think ~the,~ the thing that most, ~uh,~ developers face, ~And,~ and have a difficult time wrapping their head around is that you really do need to take a closer look at your application.

The way your application is interacted with from the user perspective, and then on the back end to the data there. You, as the developer, get additional levels of flexibility because ~you,~ you ultimately control the way the data ~is,~ is laid out. So you have to understand how ~and,~ and what data is accessed by the application.

And it really does [00:31:00] involve introspection, looking at the data elements that are interacted upon by the users. And I like ~to,~ to use like ~a,~ a movie like Netflix for example. ~When you,~ when you fire up Netflix, there's a common pattern of use. ~You know,~ there's this initial pane of display that has a summary of data elements.

You're gonna wanna make sure that those data elements in that summary are always super fast. Like ~you want,~ you want those cached, you want those all grouped in the same place. So when you fetch a movie, you're fetching a summary of those details, right, ~with,~ with a single read. ~And,~ and then when somebody says, oh, that's interesting, I want to click on that and go deeper, then maybe there's an opportunity for a secondary call.
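The summary-then-detail pattern described here can be sketched roughly as follows. This is only an illustration: plain Python dicts stand in for two MongoDB collections, and every name in it (`movies`, `movie_details`, `fetch_summary`, `fetch_details`, the field names) is a made-up assumption, not something from the conversation.

```python
# Hypothetical data: plain dicts stand in for two MongoDB collections.
# "movies" holds only what the first screen needs; "movie_details" holds the rest.
movies = {
    1: {"_id": 1, "title": "Example Movie", "year": 2020, "rating": "PG"},
}
movie_details = {
    1: {"movie_id": 1, "cast": ["Actor A", "Actor B"], "synopsis": "A longer description."},
}


def fetch_summary(movie_id):
    """One read: everything the summary pane shows lives in a single document."""
    return movies[movie_id]


def fetch_details(movie_id):
    """Secondary call, made only when the user clicks through for more."""
    return movie_details[movie_id]


summary = fetch_summary(1)
print(summary["title"], summary["year"])
```

The design choice is that the heavier fields are kept out of the summary document, so the common first read stays small.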

So ~if,~ if you're looking to, to. On, on, ~you know, your,~ your initial fetch ~of,~ of data. Maybe there's an opportunity to move that detailed data out to a secondary collection. MongoDB lets you [00:32:00] represent data relationships using linking. And linking is very much like a relational database in that you separate your data and you use an operator that enables you to join those collections. A join, in SQL terms.

$lookup with MongoDB. So if you're looking to lay your data out in a way that is separated rather than ~like,~ like I said, with the mantra, data that will be accessed together should be stored together. If you wanna separate it, you're gonna use $lookup to represent those relationships. And by the way, ~you know~ MongoDB is on ~this,~ this ever.

Ever, never ending,~ uh,~ path of efficiency. In our latest release, 6.0, there are massive improvements in performance around those $lookups. So you do get some additional flexibility in terms of, ~uh,~ efficiency and speed for laying your data out. ~Hope that made sense. I, I feel like I kind of rambled there.~
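As a rough sketch of the linking approach, the `$lookup` stage below uses the stage's standard fields (`from`, `localField`, `foreignField`, `as`). The collection and field names are hypothetical, and `emulate_lookup` is a simplified in-memory stand-in written for this example so it runs without a server. It is not how MongoDB implements the join.

```python
# A $lookup aggregation stage, written as a plain Python dict.
# Collection and field names here are hypothetical.
lookup_stage = {
    "$lookup": {
        "from": "movie_details",    # collection to join against
        "localField": "_id",        # field on the input documents
        "foreignField": "movie_id", # field on the joined collection
        "as": "details",            # output array field
    }
}

movies = [{"_id": 1, "title": "Example Movie"}, {"_id": 2, "title": "Other Movie"}]
movie_details = [{"movie_id": 1, "synopsis": "A longer description."}]


def emulate_lookup(local_docs, foreign_docs, stage):
    """Simplified in-memory emulation of an equality $lookup (a left outer join)."""
    spec = stage["$lookup"]
    joined = []
    for doc in local_docs:
        matches = [
            f for f in foreign_docs
            if f.get(spec["foreignField"]) == doc.get(spec["localField"])
        ]
        # Every input document is kept; unmatched ones get an empty array.
        joined.append({**doc, spec["as"]: matches})
    return joined


result = emulate_lookup(movies, movie_details, lookup_stage)
print(result)
```

Against a real deployment, the same stage would be passed to PyMongo's `aggregate`, for example `db.movies.aggregate([lookup_stage])`, assuming collections with these names exist.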

Paul: ~No, no. I mean that.~ I~ think that was a really great example you highlighted. You know, here's actually how you might do a link if you didn't wanna flatten and out into a single view. And cuz we didn't really cover that. Um, so thank you for sharing that with us, Michael. Um,~ talking about what's coming, ~uh,~ coming up, cuz you mentioned, you know, for example, linking great improvements on searching and indexing in the background.

[00:33:00] What other things would you maybe want to get people riled up about that's coming in

Michael: my gosh.

Paul: Oh, uh oh. What did I do?

Michael: There's so much. ~And,~ and if you haven't looked at MongoDB lately, like MongoDB runs on your laptop, runs on a server in your data center, it runs in the cloud now with cloud.mongodb.com, that's the most basic thing. It's like a, it's an open database. You can, ~uh,~ you can just use it wherever where you Yeah.

~Right.~ It's an open database. ~You can,~ you can just use it wherever. ~Uh,~ but ~that's,~ that's like at the kernel of the architecture. We've built ~so much,~ so much on top of that. It's been a, an amazing, ~uh,~ it's been just amazing to watch the evolution of MongoDB over time moving into the cloud as everybody should be.

By the way, I think I'm gonna be delivering a talk at Info Share in Poland in May on the necessity of people to look at cloud native architectures. MongoDB can be at the core ~of this,~ of this evolution for you, and I think ~it's,~ it's absolutely necessary to start looking at those additional layers in your architecture and where you can become more agile.

~Uh,~ where you can leverage, ~uh,~ [00:34:00] microservices. For example, MongoDB enables you to embrace a cloud native architecture at every layer of the stack. We do offer a database, of course, that's at the core, but we also offer, ~uh,~ the ability to leverage Kubernetes as a part of that. We have a Kubernetes operator that works very well with MongoDB, um, and that works right alongside MongoDB Atlas in the cloud when you're choosing a cloud.

MongoDB can be a part of that. MongoDB Atlas is above the cloud layer. So ~when you,~ when you use and deploy databases with MongoDB Atlas, you're choosing a cloud provider where the bits will actually live. So you can choose Google, you can choose, ~uh,~ Microsoft Azure, or you can choose, ~uh,~ AWS, ~uh,~ to house ~your,~ your actual data ~and,~ and the determination of that will be up to you.

Based on, ~uh,~ maybe application proximity. Where are you running your applications? ~Uh, so,~ so that's Atlas. Built into Atlas, ~uh,~ this is what blows my mind.~ I mean,~ there's just every day there's more and more functionality built into the Atlas [00:35:00] platform. I believe one of the key things that every application needs, I will be hard pressed to find an application that does not need really good search.

~Well,~ at the heart of MongoDB Atlas, we have Atlas Search. It's backed by Solr, ~by~ by, ~um, uh,~ by the technologies, the open source technologies, ~um,~ that offer the best search capabilities. It's built right into Atlas. We essentially spin up that~ and,~ and allow you to index your data and efficiently search, provide search for your applications.

~Um,~ my colleague Karen Huaulme runs a great webinar ~and,~ and, ~uh,~ live stream on how to incorporate search into your applications. So you definitely wanna look at that. That's new-ish. That's new. ~Um,~ there's, ~uh,~ data lake. Yeah. Oh yeah, absolutely. ~I mean,~ what application doesn't need the ability to provide really good search and fragments and facets ~and,~ and all of those things ~uh,~ that come with a good ~search~ search engine.

So ~that's,~ that's search. ~Uh,~ we also offer a data lake. So if you want to do, ~um, you know,~ data [00:36:00] storage, like ~there's,~ there's data archiving, ~um,~ levels of data storage so that you can increase the efficiency ~and, and,~ and the dollars that you're spending if you've got cold, colder data that you want to move off to cheaper storage.

There's a solution ~that~ that supports that: Data Federation. I had the pleasure ~of,~ of doing the keynote speech ~uh,~ in San Francisco, in Dallas, ~and,~ and demoed, ~uh,~ out of the box, ~uh,~ Data Federation. It's such an amazing technology. Imagine if you've got data stored in S3 buckets. You've got data stored in, ~uh,~ in other databases, ~uh,~ CSV files, TSV files, Parquet files, ~uh,~ maybe even like APIs that you need to.

Imagine being able to have one connection string, a MongoDB connection string that gives you access to all of those virtual databases underneath that. So front ending, ~uh,~ data in S3 buckets or Parquet ~and,~ and even APIs, and unifying that with a single,~ uh,~ single access mechanism connection. Data Federation.

~Um,~ we've got really [00:37:00] great time series capabilities so you can index your data if you've got time series data. There's a really efficient data storage indexing mechanism built into MongoDB today with time series. We've got column storage data, so we've got analytical data, massive amounts of, yeah, so~ I mean,~ any way you slice it, ~there's,~ there's something that's gonna make it easier, better, and faster for you ~to,~ to access~ and,~ and manipulate your data with Mongo.
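For the time series capability mentioned above, a minimal sketch of what the setup might look like. The option keys (`timeField`, `metaField`, `granularity`) are the standard time series collection options; the collection name, field names, sample readings, and the `has_time_field` helper are assumptions made for this illustration, and nothing here talks to a server.

```python
from datetime import datetime, timezone

# Options for a time series collection. The keys are the standard ones;
# the field names "ts" and "sensor" are hypothetical.
timeseries_options = {
    "timeField": "ts",         # required: the datetime of each measurement
    "metaField": "sensor",     # optional: identifies the source of the series
    "granularity": "minutes",  # optional hint about how often data arrives
}

# Hypothetical measurements shaped to match those options.
readings = [
    {"ts": datetime(2023, 1, 1, 0, 0, tzinfo=timezone.utc), "sensor": {"id": "s1"}, "temp_c": 20.5},
    {"ts": datetime(2023, 1, 1, 0, 1, tzinfo=timezone.utc), "sensor": {"id": "s1"}, "temp_c": 20.7},
]


def has_time_field(doc, options):
    """Check that a measurement carries a datetime in the configured time field."""
    return isinstance(doc.get(options["timeField"]), datetime)


print(all(has_time_field(doc, timeseries_options) for doc in readings))

# With PyMongo and a running deployment (not executed here), the collection
# would be created and filled with something like:
#   db.create_collection("readings", timeseries=timeseries_options)
#   db.readings.insert_many(readings)
```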

Paul: I mean, a column store, really, that digs deep into the cookie jar on the other side of the kitchen

Michael: Yeah,

Paul: doing column store. Cuz we're talking like, I would never reach for a document database if I was building an analytical application,

Michael: Right,

Paul: now, now my mind is changing.

So that's huge. ~Um,~ lots of things on the future.

Michael: Ooh. Ooh. And I just want to mention one more thing, ~like,~ look, ~if, ~if you need SQL access to your data and you want to use MongoDB, we now offer Atlas SQL. So the ability to leverage SQL to access the data that is stored ~in~ in MongoDB, and you would do that for [00:38:00] that analytical data.

You would be able to issue SQL commands against your data that's stored in MongoDB. ~Um,~ that's great for data scientists, data analysts. Maybe you're using Tableau or some, ~uh,~ traditionally relational tool to do your analytics ~and,~ and visualizations of data. You now have the ability to leverage, ~uh,~ your backend store to be MongoDB and send ~those,~ those SQL commands. That used to be offered through a tool that ~we,~ we had called the BI Connector, the Business Intelligence Connector.

Now it's offered in native SQL. At every layer you can send SQL to MongoDB. So where it used to be like NoSQL and SQL, ~never,~ never the twain shall meet. But today ~it's,~ it's ~kind of~ a ~mixed,~ mixed bag.

Paul: And it's in like a very, ~uh,~ controlled and developed sort of ecosystem versus like, you know, Postgres can say, oh, we do the NoSQL thing cuz we have a JSON column. But every time I've used that, like, you need to know what you're doing. It's not super simple. Once you reach for it, it doesn't have this [00:39:00] ecosystem built around it.

So here we're talking about like a universal data access set of technologies, and so these are gonna be pumping out in 2023. I mean, they're already out, but they're gonna be improving in 2023. Is there anything that's not out right now, Michael, that you

Michael: Oh, I wish I could tell you, I wish I could tell you some secrets, but I think, ~uh,~ all of the things that I've mentioned today are available either in preview, beta, or actually generally available. ~uh,~ Like I said, ~the,~ the Relational Migrator, that's in preview. I would love to get your feedback on that.

I'd love to give you a link for your show notes. That will have, ~uh,~ a list of all of the things that I talked about. ~So,~ so folks can get easy access to those things. ~So, uh,~ maybe that'll be like mdb.link/pod rocket. ~I'll,~ I'll make sure we load that up with~ all the,~ all the things we talked about today.

Paul: Awesome. That'll be easier for folks to remember too. We can include that in the show links.

Michael: ~Yeah, I love mdb.link. It's a, it's a brand new link shortener that, uh, my, my colleague Joel, uh, Joel created Joel Lord, great developer advocate conference speaker. Uh, so look for mdb.link/pod rocket. We'll load that up.~

Paul: ~Awesome.~ Michael, thank you for your time and your excitement and expertise coming on to talk about MongoDB. I certainly learned more about the ecosystem and what's possible out there, and hopefully some other folks did too.[00:40:00] 

Michael: Oh yeah. ~And,~ and thank you so much for the opportunity. ~Uh, if you, like, you know,~ what we talked about today, it's just the tip of the iceberg. You can join me on the MongoDB podcast at mongodb.com/podcast to learn more. ~I do like you, dude, Paul. I, I love having conversations about technology. Sometimes it involves Mongo, Debi, sometimes not.~

But, ~uh,~ it's been a pleasure. Thank you so much.

Paul: ~Thanks, Michael.~