hugo bowne-anderson
Hey there, Peter, and welcome to the show.

peter wang
Hi, Hugo! Thank you for having me.

hugo bowne-anderson
It's such a pleasure. We've been having conversations about data and many other things for years, so it's really exciting to bring one to the public. You're currently CEO of Anaconda, and I'd love to hear a bit about how you got into data in the first place and your journey to what you're doing now.

peter wang
All right, how detailed? Should I do a brief version and then you ask questions, or should I get into the details?

hugo bowne-anderson
Let's do that. That's a great idea.

peter wang
My educational background: my degree is in physics. As a scientist, basically, and a student of the sciences, I was always involved in data. But when I went into industry, my work was computer programming and software development. Then I went into consulting using the scientific Python tools in the early 2000s, and that was a lot more engineering, simulation, things like that. But numerical computing and scientific computing are very adjacent to data, data modeling and data processing, and as Python got pulled more and more into finance, FinTech and certain other areas, it started getting pulled into more business-oriented tasks. That's actually how I got into the more modern data analysis kind of stuff. So it really was a journey from scientific computing and numerical computing into the land of SQL, the land of BI tools and all these kinds of things.

hugo bowne-anderson
Why did you leave research?

peter wang
Well, I was never really in it. I graduated as an undergrad in '99, and there was just so much excitement happening around computers and software. I'd always been a coder and a computer nerd throughout college, so I decided to just go and get a job in software development.

hugo bowne-anderson
So this was '99. This was around the time Travis was working on NumPy.

peter wang
Yes, but I didn't know him at the time. I'd actually only barely started using Python, as a curiosity, a nights-and-weekends thing. I was coding in C++ for the most part during my first few years in industry. But then in '04 I found this dream job where I got to use my science stuff and write in Python, and I really fell in love with Python at that point. That's when I started working at a company called Enthought, which is also here in Austin, and I did a lot of consulting and whatnot.

hugo bowne-anderson
They're great. And did you meet Bryan Van de Ven and Travis around that time?

peter wang
Lots of them, actually, yeah. One of the founders of Enthought, and the CEO, was Eric Jones, who was a co-author with Travis. I think Travis, Pearu Peterson and Eric Jones created SciPy back in, like, 2000.

hugo bowne-anderson
And Enthought still runs the SciPy conference, is that right?

peter wang
Yep, yep, yep. They do consulting around scientific data and Python kinds of stuff. It's a going concern, and they've got a lot of really smart people there. And that's where Travis and I met: Travis joined Enthought in '07. Some of the other people we would later work with at Anaconda, or Continuum Analytics as it was then called, we met when we were collaborators at Enthought.

hugo bowne-anderson
So this was around the time, perhaps, John Hunter was working on matplotlib and Fernando Perez was working on IPython. It seems like there was some magic in the air, man. There were a whole bunch of...

peter wang
Magic is everywhere. Magic is everywhere, Hugo. Magic is everywhere. What happened in the 2000s was a really wonderful global collaboration around open source scientific Python; that's really the truth of it. It was the result of several things happening in software development. People were using SourceForge and CVS, and Subversion was starting to catch on, but this was prior to GitHub. So we had mailing lists, and people would show up on the mailing list and say, "Hey, I built this cool little project." There were places where one could host files, even several megabytes in size, on the internet. But the interesting thing was the human ecology of it, and I think we'll probably be talking about that quite a bit over the course of this discussion. These movements are really the result of lots of individual people going the extra mile and putting their heart and soul into it. For instance, we would do the SciPy conferences once a year out in Pasadena at Caltech, and a guy there, Michael Aivazis, was able to organize it for us, get us the use of the space and give us this really wonderful venue. A hundred, 120, whatever, people would go out there, and so much collaboration and sharing happened. A lot of the people who are now considered OG PyData types, people who started lots of things on their own, were all joining the community at that time. Greg Wilson with Software Carpentry: I remember when he first showed up in the community. Then Gael, as he was starting to work on scikit-learn, and Stéfan van der Walt as well, with scikit-image and whatnot. A lot of these different people were all pulled together into this nexus of collaboration.

hugo bowne-anderson
All working on different questions as well, in different fields, but a lot of them were building tools for stuff they needed to do on a daily basis.

peter wang
And this is necessity as the mother of invention, but there's also an economy. I think part of what made this open source software ecosystem so different is that there's an economy of time. No one going into this open source community was really thinking, "I'm going to go and build a software empire around these Python scripts I'm writing." It was more like, "Oh my God, I've got to write these scripts to do this data processing so I can maybe try to get tenure." So people basically did as much as they needed to, but no more, and would let someone else take the ball, so it became a relay race out of necessity, not solely by design. This is my sort of anthropological lens looking back on it; maybe it's a just-so story, but it fits pretty well. Everyone working on these things was building tools they needed to wield in anger, and they didn't have a lot of time to sit and polish. If you get software nerds together in an open source project working on software for software's sake, there's a ton of bikeshedding: "Oh, what if we did this, and if we do that, and we abstract this, and we create this super generic framework for doing data." The scientists and researchers don't have time for that. They're going to learn just enough of this module or that technique to do the thing, and boom. I think that actually led to a really interesting cellular, modular innovation landscape. Things had to work well with other things. You wouldn't have someone coming along saying, "This part doesn't work with that part, and I don't like this other thing, so let me build a whole-cloth, complete replacement for all five of these things so I can be the Übermensch of this ecosystem." No one was doing that; no one had the time. And that was really great.

hugo bowne-anderson
Was it difficult to establish a shared foundation?

peter wang
Was it difficult to establish a shared foundation? It was work; it was definitely difficult. A great example is the Numarray and Numeric split, which Travis healed with NumPy in 2005 or '06. But some of the things we still don't have a shared foundation for, to this day, are around plotting. Matplotlib, of course, is the one everyone uses, but when it doesn't quite do what you need it to do, there are 15 things you could go use. So in some cases shared foundations did get built, and they took a great deal of effort; it was very difficult, nuanced work. In other cases we still don't have a shared foundation, and we just end up with an abundance of riches.

hugo bowne-anderson
I'm not the type of guy to make PSAs, but I will make one: if you rag on matplotlib but use the built-in pandas plotting methods or seaborn, you are using matplotlib. So be very aware.

peter wang
Yes. One of the things many people in the community learned was this really interesting difference between implementation and interface. Part of what made this community great, and part of the reason the tools built out of this community got so much traction so quickly, was that it was domain experts of one stripe or another taking a better tool, the Python programming language, and creating artifacts that were fit for purpose: not just fit for what they were trying to do, but fit for how their brains thought about doing it. It would be very difficult to write a spec and kick it over to some random Java programmer to go build something like NumPy. You really needed a bunch of numerical computing and scientific computing nerds who had spent a ton of time in MATLAB and done some Fortran to make a thing like NumPy. This is true across the board. And to your point about people griping about matplotlib and then using other things that still rely on matplotlib: as a lot of projects grew and evolved, they realized they had a foundation and an interface, or let's say an engine and an interface. For the initial phase of a project, having the engine and interface together was fine. But as things progressed, the interface fit in people's heads, and then they wanted the engine to be more powerful. And then it was like, "Oh, crap, we have to actually get really intentional about the software design of this thing."
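Peter's engine-versus-interface distinction can be made concrete with a small sketch. This is a hypothetical illustration, not how matplotlib, NumPy or any other project is actually organized: the `mean` function, the two engines and the `_ENGINES` registry are all invented for the example. The point is that callers only ever see `mean(values)`, so the machinery behind it can be swapped or improved without changing how people use it.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical engine 1: a plain loop, the kind of thing a project starts with.
def _loop_engine(values: Sequence[float]) -> float:
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

# Hypothetical engine 2: math.fsum tracks partial sums to reduce
# floating-point rounding error. Same interface, different machinery.
def _fsum_engine(values: Sequence[float]) -> float:
    return math.fsum(values) / len(values)

# Registry mapping engine names to implementations (invented for this sketch).
_ENGINES: Dict[str, Callable[[Sequence[float]], float]] = {
    "loop": _loop_engine,
    "fsum": _fsum_engine,
}

def mean(values: Sequence[float], engine: str = "loop") -> float:
    """The interface users learn; the engine underneath can change freely."""
    if len(values) == 0:
        raise ValueError("mean() requires at least one value")
    return _ENGINES[engine](values)

print(mean([1.0, 2.0, 3.0]))            # 2.0
print(mean([0.1] * 10, engine="fsum"))
```

Because the interface stays fixed, switching to the more accurate `fsum` engine is a one-argument change for the caller, which is the kind of freedom a project needs once its engine has to get more powerful.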

hugo bowne-anderson
Yeah. And there's a really nice point in there: matplotlib, NumPy, these types of interfaces allowed Python and PyData to convert a lot of MATLAB users. I used to teach a great deal in academia, where MATLAB was used a lot, and these interfaces allowed me to go, "Hey, you can tweak this, and we can port this over to Python pretty straightforwardly."

peter wang
Yes. And I'd ask: why didn't it happen with Ruby or Perl? It could have happened with Go or JavaScript, or with C++ templates. There are amazing template libraries for doing array manipulation in C++, so why wasn't it a C++ thing? The high-energy physics community uses ROOT and all these things that are gigantic C++ libraries, so there's an existence proof that it's possible. So what was it about Python, NumPy, SciPy, matplotlib and all these things that made it work? My conjecture is that notation is a tool of thought, an idea that harks all the way back to the APL stuff. I see myself as very much an amphibian here. I am a programmer nerd; I love programming stuff. I read Slashdot religiously throughout college. I'm a coder nerd with all the codes. But I can also switch over and be a science nerd with the scientists, and from hanging out with both of these tribes, I know they're actually quite different. For the coder nerds, syntax doesn't matter; syntax is something the lexer deals with. Yet for all other mortals on Earth, syntax matters tremendously. Python, by virtue of being somewhat readable, executable pseudocode, just had a natural leg up on languages where you're one semicolon away from a segfault. It was a nicer language. And the fact that it wasn't a write-only language and didn't have an ethos of cleverness the way Perl did meant it was easier to build on top of other people's stuff. Maybe there are only two or three elements like that you really need. It was also the right time: it came around as the internet was becoming always-on and broadband was getting everywhere, and you had a global community of people coming together.
The other thing that helped all this come together was that the language was easy to adapt to the language of science, which is mathematics. Travis doesn't talk about this very much anymore, but in years past he would really wax poetic about the fact that Python had a built-in complex numerical type. Not all languages have that, but if you're an electrical engineer trying to do some work, gosh, it's really cool that you've got built-in complex numbers. And they were well thought out, too. The core CPython developers were not really scientists, although Raymond Hettinger has an engineering degree, I believe, so he appreciates some of the numerical stuff, and Tim Peters really understands numerical stuff. So some core CPython people really got it and cared about designing that stuff well, and that was all it took for people like Travis and others to take it to the next level. I think it was several of these things coming together. And then maybe the one last thing is the REPL. I have to say, I love my HP 48, but I gave it up for Python. The REPL has been my calculator for about 20 years now.
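To make the point about built-in complex numbers concrete, here is a minimal sketch using only the Python standard library. The series RC circuit and its component values are illustrative assumptions for the example, not something discussed in the conversation.

```python
import cmath

# Complex numbers are a built-in type: the j suffix marks the imaginary part.
z = 3 + 4j
print(abs(z))          # magnitude: 5.0
print(z.conjugate())   # (3-4j)

# An electrical-engineering flavoured use: the impedance of a resistor R in
# series with a capacitor C at angular frequency omega is Z = R + 1/(j*omega*C).
# The component values below are arbitrary, chosen only for illustration.
R = 100.0       # ohms
C = 1e-6        # farads
omega = 1000.0  # radians per second
Z = R + 1 / (1j * omega * C)

# Polar form: magnitude |Z| and phase angle in radians (negative for a capacitor).
magnitude, phase = cmath.polar(Z)
print(Z.real, Z.imag)
print(magnitude, phase)
```

No extra library or custom class is needed: the literal `1j` and ordinary arithmetic operators carry the whole calculation.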

hugo bowne-anderson
So for people who don't know what a REPL is, maybe you could tell us.

peter wang
So REPL is actually an acronym, R-E-P-L: read-eval-print loop. It refers to what you get if you type python at the terminal, or fire up some kind of Python interpreter, and see those three little carets, or if you use the Jupyter notebook and see In [1]. You can type a statement or an expression, hit Enter, and it evaluates it, just like an interactive calculator. If you're a Python person, this makes all the sense in the world; of course you would do this. But what's perhaps not appreciated is that 20 years ago, if you were used to things like C, C++ or Java, this was unheard of. Today, of course, programmers expect it; every web browser has a built-in console. Python was by no means the first language with a REPL, but it was the one that really made it super popular.
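The four letters map directly onto a loop. The toy below is a minimal sketch of the idea, not how CPython's interactive interpreter is implemented: `tiny_repl` is a hypothetical name, and it reads from a list of strings rather than the keyboard so its behavior is easy to check. Like any use of `eval` and `exec`, it runs arbitrary code, so it is for illustration only.

```python
from typing import Iterable, List

def tiny_repl(lines: Iterable[str]) -> List[str]:
    """Toy read-eval-print loop over pre-supplied lines of input."""
    namespace: dict = {}          # variables persist between lines, as in a real REPL
    printed: List[str] = []
    for line in lines:            # Loop: handle one line after another
        source = line.strip()     # Read: take the next line of input
        if not source:
            continue
        try:
            # Eval: treat the line as an expression if it parses as one...
            result = eval(source, namespace)
        except SyntaxError:
            # ...otherwise run it as a statement (assignment, import, ...),
            # which produces no value to print.
            exec(source, namespace)
            continue
        if result is not None:
            # Print: show the value's repr, the way the >>> prompt does.
            text = repr(result)
            print(text)
            printed.append(text)
    return printed

# Usage: the assignment prints nothing; the expressions echo their values.
tiny_repl(["x = 6 * 7", "x", "x / 2"])   # prints 42, then 21.0
```

Keeping one shared `namespace` dictionary across iterations is what makes it feel like a calculator with memory: `x` defined on one line is still there on the next.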

hugo bowne-anderson
And it makes sense for data exploration, right?

peter wang
That's the thing. I actually use it a lot as a programmer, because I can try constructs, try writing for loops. Having a REPL to play around with actually encourages you to write smaller, more bite-sized pieces of functionality that you can import into the shell and try out, so you're unit testing little chunks of code as you go. As a C++ programmer, I would write a ton of code, with the IDE helpfully autocompleting and highlighting things, and then I would run my tests, testing hundreds of lines at a time. With Python, I'm testing 10 to 20 lines at a time. It's a very different way to build, and it's very natural in Python. But you're right that for data exploration it's wonderful. It feels like you're flying an airplane through your data. That REPL aspect is why Fernando (when I say Fernando, I'm referring to Fernando Perez, the creator of a project called IPython, which became the IPython Notebook and has now become the Jupyter Notebook) started IPython. It was a weekend-afternoon project to get a slightly nicer interactive prompt for doing science and numerical work than the built-in Python prompt, because the built-in Python REPL is really meant to execute lines as if you were writing code into a script: pasting code into the prompt should be no different from executing the script. In any case, I think it was those things together. Python was nice and easy to read, and you could write stuff other people could extend; there were some nice primitives and affordances for the numerical and scientific community; and you had this wonderful REPL, which is a glorified calculator. Anyone who wrote a library or module was now surfacing it into a REPL people could use, and that actually forced a design point.
This is a very important thing. You wanted to create a set of things for your library that people could load into the REPL and use. It wouldn't do a lot of good if it was just a library and people had to read all this documentation and write a big old program in order to use it. You wanted some things people could just pull in, explore and play with. That was huge. That was really huge.
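Python's standard library even turns REPL sessions into tests, which fits the "unit testing little chunks as you go" habit described above: the `doctest` module finds text that looks like an interactive session in a docstring, re-runs it, and checks that the output matches. A minimal sketch follows; `running_mean` is a hypothetical example function, not something mentioned in the conversation.

```python
import doctest

def running_mean(values):
    """Return the running (cumulative) mean of a sequence of numbers.

    The examples below are written exactly as they would appear at the
    interactive prompt; doctest re-executes them and compares the output.

    >>> running_mean([2, 4, 6])
    [2.0, 3.0, 4.0]
    >>> running_mean([])
    []
    """
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means

if __name__ == "__main__":
    # Silent when every example passes; pass verbose=True to see each check.
    doctest.testmod()
```

The design choice here is that the documentation and the test are the same artifact: a session someone typed while exploring at the prompt becomes a regression check for free.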

hugo bowne-anderson
Humour me for a bit; I want to ask a sociological question around the role of relationships in building such an ecosystem. If I recall correctly, Wes McKinney, when he was building pandas, considered John Hunter, who was working on matplotlib at that time, a mentor, a friend and an inspiration, and would fly places to meet him, sit in a room together and build stuff together. So what is the role of relationships, having lunch, knowing people's families and all of these types of things in building this rich ecosystem?

peter wang
It was huge. This is why I think those early SciPy conferences were such a big deal. For those of us who had the privilege of being able to fly to Pasadena once a year, they fundamentally let people see the humans behind the mailing lists, email addresses and messages. And human relationships are anchored in shared values. It was a science community, and if you put any two scientists on Earth together, regardless of language, race, creed, whatever, scientists as a bunch have some shared values. There's a natural curiosity; there's openness, the O in OCEAN, the Big Five list of temperaments. If you get scientists early in their careers, before they become jaded academics, there's a natural openness, exploration and curiosity. So you put some of these people together, and now they're building something together. A center of mass forms, and that center of gravity creates a field, a gradient. People who are total prima donna assholes flame out, because no one wants to work with them. And for people who are nice and work well with each other, there's a bonding force that pulls them in. We're just really blessed that so many of the luminaries in the community are not a-holes. Travis is a deeply humanistic person who cares a lot about community, about others and about mentorship. People like Fernando, it's...

hugo bowne-anderson
That's the most important thing to Travis, I think. I've always valued how much time he spends with people and how he listens to people.

peter wang
Yeah, the human aspect of it. It's not code; it's a human collaboration that produced some really great code. I wish people would look at that more. If you go to Hacker News, it's this project, that project, this GitHub repo. But the source code is just bits on disk somewhere. It's the people. If you wiped all of it and got the same people together, they would produce something just as great. But if you kept all of it and replaced the people with a bunch of a-holes, it would just fly apart. So it's really that human ecology.

hugo bowne-anderson
Yeah. I want to zoom in on something you said: "for those of us who had the privilege to fly to Pasadena." This is an incredibly important point. There's a blog post by Juan Nunez-Iglesias, who works on scikit-image, in which he talks about the importance of going to conferences; it's an older blog post, and I'll find it and put it in the show notes. There's a comment on it by Cathy O'Neil, who responded with something like, "I think that's great, but I'm a mother with two children. When I get time off work, I can't necessarily go to conferences. How can we be more inclusive?" I may have butchered that, but that's the general takeaway. I think this is key to discuss now, particularly as in early PyData there was perhaps a lot more representation from men than women. That's one aspect of diversity we can consider, and we saw it blow up in some ways last year with a controversy around the NumPy paper. I'm wondering how it was at the time, what type of progress has been seen since then, and how we can be more inclusive now.

peter wang
I think there are a lot of efforts underway to improve the situation, and there have been for quite some time. The Python community has actually tried to lean into this as these issues have been brought up, and there's been a general awareness over the last, I would say, 10 years, maybe more. The PyData and SciPy folks have been part of that as well, trying to be more aware about inclusivity. One thing I will say is that it's a very nuanced topic. Certain aspects of the communication systems were very English-oriented: English mailing lists, a lot of English speakers. People would show up with some broken English and apologize for their English being broken. I'm an immigrant to America and English is my second language, so I feel a lot of empathy for someone trying to post a technical question on a mailing list in a language that isn't their native one. I always felt bad when I saw that, because it's like, what are we doing that makes these people feel they have to apologize? We need to make it so people don't feel they have to self-censor in the opening of their emails. But the point is, if you could speak English, it didn't really matter where you were in the world: if you were making things of technical value and had similar technical values to the other people on the mailing list or in that pet project, your contributions were generally welcomed. So in a certain way, for the folks it touched, it was open. However, the network itself wasn't that inclusive: it was English speakers and people with internet access, and a lot of people in the Global South did not have these kinds of opportunities. Those who did, though, took part. For instance, Spyder, which is a very popular project, has been maintained by folks in Central America.
And there are other projects. You actually brought up Wes: I remember one time, at a SciPy sprint or something, the pandas folks were talking about this random Japanese dude who showed up and was pushing amazing commits, amazing PRs fixing all these things. Who is this guy? They all loved it. So again, if the network could touch you and you had great things to contribute, it was quite open to contribution. But it wasn't intentional about growing the boundaries of that network.

hugo bowne-anderson
And I think the challenge with PyData at that point was also downstream of other challenges, such as the lack of gender diversity in physics, for example. When you have a lot of people from certain STEM fields coming in to build tools, there's a downstream effect. That isn't to say there's no responsibility there at all, but it's trying to diagnose what happened several decades ago.

peter wang
Yeah. With these conversations, there are good and less good ways to approach them. When people get upset about certain of these issues, they start making assumptions: here's the outcome, this is the output, I don't care what the transfer function was upstream, I'm mad about this outcome, and therefore I'm going to assume intent or blindness or blah blah blah on the part of everyone here. That's not really a constructive way to approach the problem, especially when you go and talk to people and everyone recognizes this as a problem. So I think part of what's important is to say: yes, there's what's called the pipeline, the pipeline theory that there's a general funnel. People come in here, and then we need people here, and ultimately the people who end up contributing are the result of a lot of upstream segments of the pipeline. So a large portion of the lack of gender diversity is a result of a lack of gender diversity upstream. That's not to excuse the outcome. It's just to say: let's apply the appropriate solutions at the appropriate stages to fix all the different parts of the pipe. The way I've seen this go sideways is when someone says it's a pipeline problem, and someone else immediately says, "You're just trying to excuse your own role in it," and then it becomes this very confrontational thing that doesn't lead to any better outcomes.

hugo bowne-anderson
I think the objection that it isn't just a pipeline problem is totally valid, and we need to diagnose a lot of different aspects in order to perform interventions. I'd encourage listeners to check out Rachel Thomas' post "If you think women in tech is just a pipeline problem, you haven't been paying attention", which is a wonderfully thoughtful piece on these ongoing concerns, and I'll link to it in the show notes also.


peter wang
That's right. But on the point about inclusivity, and maybe a kind of Eurocentrism or American-centric viewpoint on some of these things: I've gone to India a few times to speak at data conferences there, PyData or PyCon India, and what always impressed me was a much, much healthier gender balance among data science practitioners. In India, I actually think female practitioners outnumbered male practitioners in some of the places I was looking at. So it's interesting to ask what parts of the West can learn from places that are doing this a lot better. And I would say, about the community: there was a point where, oh, who was it, we were at a PyCon, and someone opened a pull request or a GitHub issue on SciPy about not using the Lena image as a default image. You can imagine technical communities where that would have been a rant from the BDFL, "Oh my God," and all this stuff. But I remember, because Travis was next to me when this came up, and he was like, "Oh yeah, we could totally remove that. Let's fix that." It wasn't even a discussion. It was like, yeah, we need to fix that. That's what I mean: in the Python data community there's a lot of willingness and a lot of good people with good intentions. I do hope we make strides in improving this, for sure. Sadly, it's taken a long time.

hugo bowne-anderson
I'm interested in what happened next in PyData. And I don't have a specific question, but I suppose it would be around the formation or the beginnings of the conda distribution.

peter wang
Yeah, interesting story there. At the end of the 2000s, we were seeing a lot of demand for Python, for scientific Python tools, in finance. As we were consulting and looking at what people were doing with it, it became clear to me, number one, that the scientific tools and some of the data processing tools we had were absolutely killer for some of the modeling they needed to do and a lot of statistical processing. But it was also clear that a lot of other tooling was needed: data manipulation stuff, treating tabular data structures in a first-class way that wasn't NumPy record arrays, SQL interfaces. The web was a thing, and people really wanted to build web interfaces, but the story for building web interfaces in Python, especially with data, was not great, and plotting on the web with the Python data stuff was not great. So there were a lot of these things that needed to get papered over, or rather fixed, not papered over. What I realized was: look, there's a huge opportunity here, but we're not going to get it through piecemeal open source consulting kinds of things that take 10 years to creep forward. We have to do something much more deliberate and intentional to condense and coalesce this emerging data analysis stuff, business data processing and data analysis, around Python. So we started the company to do some of those things, and to explore other ways of funding: can we sell products, and then take the product revenue and put it into open source development? And Travis was very passionate, as you said, about community.
So he wanted to make sure the open source projects in this ecosystem had a nonprofit where they could be housed and get funding. Open source governance is not a topic the vast majority of users of open source have ever cared about, but anyone who has been making open source for more than two or three years really appreciates how important governance is, especially if your project gains steam and you want to get funding for it. That governance stuff is critical. So Travis wanted shared infrastructure for all these different projects, and that's why we created NumFOCUS. And I love people; I love hanging out with people. So I said, look, as a way to accelerate and catalyze this effort, we should also create a series of meetups, conferences and events around just Python for data analysis, and let's call it PyData. I didn't steal that title, by the way, from Wes's book, because it hadn't been written yet. He wrote Python for Data Analysis, which everyone's got, but prior to that, I registered the domain pydata.org and we did this workshop at Google in March of 2012. We got a lot of people together, and it was clear this was an interesting thing and we should keep doing more of these. Then in the fall we did a real conference, and it was great. So the PyData conferences started happening, and now we have them all over the world, and meetups all over the world. That has been a critical part of pulling the community together around the use of Python for data science, to the point now where the numerical and scientific Python stuff is almost just the stuff that gave birth to all of these things.
SciPy is a smaller conference, right? It's attended by a lot of really great people, but it's a smaller conference relative to the large panoply of Python data conferences that are out there.

hugo bowne-anderson
Exactly. And so how about the distribution and the package manager?

peter wang
Yeah, we'll be talking about that a little bit. So, at that first conference, and it's a really funny story that's not told very often, there were two big challenges we saw to getting adoption of Python for data analysis. Number one, in the open source world, everyone thought R should be it. When I went to the Strata conference or other big data conferences, and big data was really the big term at the time, people would assume it was going to be R. I would say I'm using Python, or I'm interested in talking to Python people, and they'd look at me weird. And the other thing was that everyone was completely obsessed with Hadoop and big data. So: MapReduce, big data, MapReduce. And so at this first PyData conference, I thought we should probably have something about MapReduce, right? And there were Pythonic MapReduce frameworks. There's one lesser-known one called Disco, written by a guy named Ville Tuulos. One of my friends had actually been using it for some stuff, and so he volunteered to give a talk on Disco: if you want to do Pythonic MapReduce, you could use Disco to do it. Then I looked closely at Disco, and I was like, wait, there's a pile of Erlang in the middle of this. Which is fine; there's nothing wrong with Erlang, it's a great language. But at that time, in 2012, people were still having trouble installing matplotlib and SciPy. It was very difficult to install on Windows. On Mac, you'd be following build instructions. On Linux, people could mostly use RPM or apt-get or something and get it, but it was still a mess. Okay, so I'm sitting in the back of that workshop with Travis, watching my friend Chris Mueller give a talk on Disco. And I'm like, if we can't even get people SciPy on a reliable basis 10 years later,
how the hell are we going to ship people Erlang? How's that going to work? And so I said, Travis, I think we should create just a Python distribution focused on data analysis, Python for big data. So, big snake, like Anaconda; let's call it Anaconda. And that was the origin of it: literally, a talk on Disco was the two seconds where I looked over at Travis and proposed that idea.

hugo bowne-anderson
That is fantastic, because of course I've been chatting with Ville recently, and you told me this story, which I haven't asked him about. So he'll either hear it live, or I'll tell him before we publish. That's amazing. And then Continuum Analytics became, of course, Anaconda. What are you excited about? What are you up to at Anaconda at the moment?

peter wang
Lots and lots of things. We started the company to try to push Python for data analysis, and I think we succeeded in doing that. One of the ways we succeeded was, of course, creating the community dynamics and putting the conferences together, and creating some pieces of open source that have really been helpful for visualization, and for performance with Numba. And of course, the most important thing, I think, is getting the software into more people's hands. So the Anaconda distribution and conda packages were really important parts of doing all that. But the vision wasn't to be a package manager for data science software. The vision was to empower domain experts around the world, everyone, to be able to use computers to answer questions, right? Our vision as a company is still to create a world where people are empowered to ask questions of data, and to think about the world and understand it better with data. So that means schoolchildren, in whatever country, of whatever means, should have access to the very same high-quality tools that the wealthiest hedge funds have.

hugo bowne-anderson
Do your kids write Python code yet?

peter wang
My son does, yes. He's 11. My daughter doesn't yet; okay, she's only seven.

hugo bowne-anderson
The other thing, and I don't know, maybe you can't say this: Anaconda and Continuum have historically funded a huge amount of open source development. Dask is incredibly close to my heart. You began Dask and funded a great deal of it, and you funded people to develop pandas. There are a lot of different parts of the ecosystem where you've put a lot of resources and capital.

peter wang
We ourselves started several projects. We started Numba, of course; conda, the package manager; and the conda ecosystem, where there's conda-build and a bunch of other tools around conda. What about the Datashader stuff? There's the Bokeh web visualization stuff, and then Datashader, which is a large-data visualization and visual statistical processing system. Then we mentioned Numba. There are other tools like Intake and fsspec, lesser-known things like odo, and well-known things like Dask for distributed computing.

hugo bowne-anderson
The Blaze ecosystem, back in the day.

peter wang
All of these things came out of a vision Travis and I had called Blaze. You remember, we had this vision of trying to fuse what I call the Iron Triangle of computation: data representation, algorithmic expression, and compute infrastructure, or meta-compute. We tried to fuse all these things together into one system. It was too big, and we were just too distracted by a lot of things to make it all happen. So different people took different parts of it and made different things, right? The expression stuff got folded into the Numba project, which was already kind of an ongoing project at the time, and some of it ended up in Dask. The meta-compute stuff and distributed compute, unified scale-up and scale-out, all went into Dask. Matt Rocklin looked at all of it and was like, that's a really big vision; I'm just going to write a distributed computing system. And he went and did

hugo bowne-anderson
that in 16 lines of code. I went back and looked at the first commit last year.

peter wang
Not a lot to start with. Yeah, it had a humble beginning.

hugo bowne-anderson
Yeah. He's a humble man.

peter wang
And on the data side, a variety of different things came out of it. Phillip Cloud wrote a project called odo to do data transformations, or schema transformations.

hugo bowne-anderson
And he's been fundamental to pandas as well.

peter wang
He was a pandas maintainer, that's right. And then the data stuff, the data corner of the Blaze iron triangle: the fsspec work that is now in Dask and pandas, which Martin Durant worked on as a way of abstracting file systems, and then also Intake, which is a virtual data catalog system that is starting to get more adoption. Those kind of came out of that portion of the vision. Travis and I joke about this, me over a beer and him with a Diet Coke, that eventually these will probably converge into something in the cloud as part of the Blaze vision. But yeah, these projects all came out of that. We also funded a number of projects that were already ongoing concerns in the ecosystem. We funded pandas for a while, and we funded scikit-learn for a little bit. The JupyterLab project actually started, in part, at Continuum/Anaconda, and it was also funded by Bloomberg, Two Sigma, and other people as well, but the maintainers were there. We also funded Spyder, the Spyder developers, for a while, and some Jupyter extensions and things like that. So we have done a lot of open source over the years.

hugo bowne-anderson
In all honesty, the way that the whole community has engendered further community is what drew me in initially. I don't know if we've ever talked about it, or if you recall the first time we met, but it was at a PyData meetup in New York. I'd just moved to New York City, and I was really excited. I was like, I want the city to take control of me. And I'd met Jim McCarthy, who was working for you at the time, right? Yeah. He was like, you should come to this PyData meetup. And I remember going, and you gave your Datashader talk. Brian Granger and Tom Caswell were speaking too, I think, something like that. I just remember the electricity in the room. You gave the first talk, and you definitely had this energy that was all-encompassing, and I was like, this is a community I want to be a part of.

peter wang
That's great. I remember that. Wasn't it raining when we got out? I feel like it was raining. Yeah, it was great. That was a fun meetup. Jim McCarthy is a good guy. There were a lot of open source projects that we did at the time. But to circle back and answer your question about what's going on at Anaconda that I'm excited about looking forward: what we realized is that the foundation, just getting open source software into people's hands, and then into business users' hands, is actually a really big lift. And there's a lot of invisible work behind that which is really critical to ensuring that the ecosystem is viable in business, so that PyData users can go into businesses and get jobs doing Python data stuff, and not have to drop everything and go learn SAS. So there's a lot of work we do to try to get businesses feeling good about using open source, and a lot of that work continues. It takes a lot of energy that I think people sometimes don't appreciate. But in terms of end-user-facing, practitioner-facing tooling, we're doing a couple of different major things. We're trying to build more of a community site to pull in a lot of our users and a lot of data science beginners, so they have a home. Right now, there's not really a home for data scientists. There's Twitter, of course, if you want discussions; there's Stack Overflow if you have questions; there's GitHub if you want to look at projects; and there are all sorts of other things. But one thing that we see is that a lot of people download our software and don't know what to do next. So we want to create at least a launching point for them, to say: start here, ask questions, find resources, and then go from there to wherever. That's one of the things we're really excited to build.
That's called Nucleus, and we're adding more features into it as the months go by.

hugo bowne-anderson
And we'll link to Nucleus in the show notes as well.

peter wang
Fantastic, yeah, thank you. It's just anaconda.cloud, for anyone who wants to know; it's also easy to type. We're also adding a team to help with maintaining and improving Jupyter. Right now, the Jupyter classic notebook is a little under-resourced, so we're trying to add some headcount there to take care of that. But there's also a desire to move folks to JupyterLab as the future of Jupyter, so implementing a good, almost button-for-button UI clone of the classic notebook experience in JupyterLab is something we're going to be putting muscle behind as well. That's a fundamental tool, and I'm really excited and hopeful that we can make good contributions there. And there are some new vis projects. Some of the work that the team behind Datashader has been putting together is the HoloViz stuff, right? There's HoloViz for easier specification of visualizations, and then the Panel and Lumen work for really nice, simple dashboarding and application building. That's something that's been a dream of mine from the very beginning.

hugo bowne-anderson
It's a huge gap.

peter wang
R has Shiny, and we've had 80% of Shiny for a while with various projects. I know, yeah. But then as you try to push any of those through the last, the last 80%, everyone wants different things. So there's a real long-tail product design issue around this stuff. I think the work that Philipp and Jean-Luc and others are doing around Panel and Lumen has cut a really nice design point. So I'm really excited about that work. And then, yeah, Intake, which I mentioned earlier: a virtual data catalog. I think that's a problem that a lot of data scientists maybe don't think about too much until they get into business. And then they get into business and they're like, oh, there are already data catalogs, there are all sorts of different things, metastores and all these things. So somewhere between loading the Iris dataset and enterprise systems like Snowflake, somewhere in the middle, there are a lot of intermediary needs, where people end up doing shadow data management. That's where Intake can come in and really help uplevel what people are doing, and help with a lot of workflow pain points and quality-of-life issues.

hugo bowne-anderson
You've raised a super key point there, and this dovetails really nicely with something I wanted to come around to. The SciPy and PyData ecosystem served a lot of the needs of people in basic research, and then in finance, and has clearly been adopted by a lot of industry at large. But it's not clear whether these tools have solved, for example, the deployment story that a lot of businesses are thinking about now. So what gaps are there? And is the deployment story important?

peter wang
Yes, the deployment story is an important one. And the reason is that if your stuff doesn't get deployed, you will be stuck forever in research, toy experimentation land, or insights. Your output is basically going to be a PDF or a PowerPoint. It's important to actually have the stuff be deployed. People want to deploy. They really do want to have online learning, they want to have online training, they want to see data-driven software actually accelerate their business, whatever that business might be. So the desire, fundamentally, from the business perspective is there. What's not there is that the people who manage IT infrastructure, who have managed deployment and production processes in enterprise software development for the last 20-some-odd years, really don't know how to think like data-oriented people. What I mean by that is that they have, for the most part, been software and infrastructure people, and they've let the data world be completely managed by the DBAs and the enterprise data warehousing people. So it's really two halves of a brain that have to come together to roll things out into production. And with machine learning and data science kinds of things, the data, the values, and the code commingle into what is correct, what meets SLAs, deployment response times, things like that. So you can't really handle the data and the code separately; you have to handle them together. And that's a new fusion practice area that's emerging. Some people are good at it; some people are really stumbling and fumbling with it. So I think it's hard for the data science community to get to that level, because the software tools you build to aid deployment are oftentimes very bespoke to your deployment environment.
And everybody's deployment environment is different, and there's no sense in open sourcing it, because it's a pile of your Terraform scripts. How's that going to help anybody else in the world except you? And you probably don't want to show it to the world; you've probably got a lot of little weird hacks in there that you don't really want to show to everyone. I think some things will have to emerge. And we, as an outside tool builder that's been at this for so long, can serve a very useful purpose by bringing those users together, convening them as we always do, putting a conference or a workshop together, hearing from folks about what their pain points are, and then seeing if we can come up with some good solutions that handle this holistically. So we're trying to handle some of these things. That's just at the software governance and software deployment level, but that's only one piece of the equation, like I said.

hugo bowne-anderson
I think one of the most important questions is what the right abstraction layer is for 80% of people to be satisfied. I also think there are many vested interests. In fact, the reason Kubernetes and Kubeflow have been adopted to a certain extent, and are probably talked about a bit too much, is that we all live in the shadow of FAANG as well, and Google in particular in this case. And the types of things that are used to solve Google-scale problems are not necessarily the best things for, you know, the long tail of small to medium-sized businesses out there.

peter wang
I think there's definitely that. When I think about open source, I care a lot about the human ecology behind the software. The licensing, the specific license, whether it's OSI-approved or not, whether it's source-available, that stuff is a little bit ancillary for me. I first and foremost look at who, what group of people, is actually doing the exploration of what they want to build, what's the right thing to build, and how they build it. And in this ecosystem, around PyData and SciPy, it's usually been groups of users, ultimately, that come together, groups of power users coming together and self-serving. I wouldn't say it's unique, but that's a defining aspect, I would say, of this community. A lot of other communities don't really have that. I think about some of the web dev communities: oh, this is Facebook's front-end tool, let's all use it. That's okay; that's fine. Same with Cassandra: Facebook threw it over the fence for everyone to use. And then there's the Apache zoo of tools, where there's a very commercial ethos: oh, this group that built this data tool at some FAANG company, they've all left with this Apache incubating project, they just raised $20 million from a VC, and they're going to build some as-a-service thing around this tool. And that's the format that works for them. As long as the tool's useful for end users, that's fine. But that's not the move that generally happens in the PyData space. It's just not what has happened. And there's something about community participation in this, the way that we explore the landscape of possibilities, the way that we play.
And this gets to something we'll maybe talk about later in the conversation: in the PyData community, that modularity and that humble scoping concept have allowed it to be an infinite game. People can come in and set up camp on the edge of town, and there's always the possibility that the town will grow to where you are. You don't have to go kick anybody else out of their condo, move in, and set up your noodle shop. So there's an infinite game aspect, and it allows the users to flow. And user energy is ultimately, I would say, the currency; it's the thing that flows into projects and gives them credibility and the energy to continue growing. It's like the water or the oxygen in the ecosystem. But projects that come out of big companies are delivered whole: boom, it's dropped, it's ready, and you can use it. Of course, they're open for contributions, and there are the people shepherding these projects. With Kubeflow, David Aronchick, I think, is a creator; he seems like a nice guy. There's nothing wrong with either the projects or the people. It's more the ethos and mentality, and therefore the communities that emerge around some of these projects. Because people are trying to build companies around them, or trying to withhold some features so they can charge for other features, there's much more of a finite game dynamic to them. And they're finite games in two ways: finite in the sense of extracting something, or finite in the sense of being a bit of a casino dynamic for venture capitalists, investors, and whatnot.

hugo bowne-anderson
Can you say a bit more about the difference between finite and infinite games, maybe a la James Carse?

peter wang
The idea of a finite game is a game that has a well-defined terminal win condition, and for some person to win, someone else has to lose. There's a fixed number of dollars; there's a fixed number of properties on a Monopoly board; in checkers, I win or you win; in chess, there are only so many moves, so many pieces, so many squares. And many of the things that we encounter in the world have a natural scarcity to them. What humans do, in lieu of actually killing each other, is set up finite games: scarcity games, dynamics, and competitions to see who wins. So we vie for the scarce resources, and there's a natural zero-sum mentality: if I win, you lose; if you win, I lose. Infinite games, on the other hand: there are a few examples of things we do that are infinite games, and what makes something an infinite game is that there isn't really a final win condition. It can be multi-win: one person can win, someone else can win, and in fact sometimes both people winning means they each get more, right? Very few things in the natural world have this dynamic where, when you give something away, you get more of it back. That doesn't happen. But in infinite game territory and collaboration spaces, you have this dynamic, right? By giving away the source code and opening it up, Travis Oliphant got way more functionality back from the world, and all those people did it by joining in and giving their contributions too. So the point of an infinite game is to keep the game going as long as possible.

hugo bowne-anderson
Natural language is another example, right? Where we don't have

peter wang
a finite number of words we can say, or syllables we use.

hugo bowne-anderson
But also, and it comes down to rivalry actually, the more people who speak a language, the more you share language with other people, and the more you can collaborate, hopefully and ideally.

peter wang
That's another term that's used: rivalrous versus non-rivalrous. Some things are actually anti-rivalrous.

hugo bowne-anderson
Could you give us a quick rundown of rivalry and why it's important to consider when discussing open source software?

peter wang
Wow, that's a huge topic. So yeah, thank you for the question.

hugo bowne-anderson
I can try, and you can correct me.

peter wang
Okay, you go for it.

hugo bowne-anderson
Okay. Yeah, rivalrous is when consumption by one user or consumer prevents simultaneous consumption by another. So if I'm eating an apple, you can't eat it; that's mine, right? Non-rivalrous is when this condition doesn't hold. Broadcast TV is an example there: if I watch something, that doesn't stop you from watching it. Anti-rivalrous is when each additional person using something benefits the other people using it, and hence open source software, because more people are reporting bugs.

peter wang
Network effects are a class of anti-rivalrous effects. And rivalry doesn't necessarily have to involve a finite physical resource; it can be attainment of sorts. Our human brains, our limbic systems, were wired, as primates basically, to put status games in place wherever we roam. And status games are fundamentally rivalrous: there's one head honcho, not two. If one person's the head honcho, no one else can be the head honcho. So the dynamic of rivalry versus non-rivalry versus anti-rivalry shows up everywhere in life. And the interesting thing about the modern world is that, because of the Industrial Revolution and then the Digital and Information Age, so much of what's important to us is moving out of a rivalrous mode, or out of a mode where it must necessarily be rivalrous, into a mode where it can be non-rivalrous. Open source, Wikipedia, and some of these other crowdsourced artifacts, collaborations, whatever you call them, are shining examples. So what can be done if we really harness the power of anti-rivalrous things?

hugo bowne-anderson
I love it, I love it. I just want to add one thing: OSS is not only anti-rivalrous, it's non-excludable. So it's actually a public good. What I mean by that is that there's nobody saying, if you don't pay for this, you're excluded from using it.

peter wang
However, there's a darker aspect to some of these things, which is that there's always going to be finitude somewhere. And what modern financial systems and investors look for is returns, compounding returns, right? Compounding returns are an exponential curve. In nature, you don't really get exponential curves. You don't get exponentially more oil out of an oil well. You don't get exponentially more apples out of your apple orchard. You can't even get exponentially more energy out of a giant exploding nuclear fireball, right? The sun: you just get what you get. But if you create network effects among human beings and put extractive elements in at every node, then the more networking folks do, the more exponentials you can get out of it. So here's the thing. Those who are intrinsically required, as a fiduciary obligation, to go and mine for exponentials will, like heat-seeking missiles, find places to put elements of finitude in, and start extracting and pulling exponentials out of what could otherwise be anti-rivalrous, abundant, generative collaborations.

hugo bowne-anderson
Let's have a hypothetical in which there's a social network that, on the other side, has an advertising platform that advertises to the people in the social network and encourages the network to grow. For some reason, that also polarizes a lot of them, but let's say that was just the incentive system of the algorithms that were set up. You actually have this beautiful anti-rivalrous organism growing, but you have some rivalrous, finite-game system preying on it and extracting all of the resources from it.

peter wang
It finds the finitude. It finds the finitude, injects it in there, and taps it out. And that one little corruption is basically the entirety of the problem with Facebook, the kinds of things they do that are problematic. It hinges on that business model of requiring more and more engagement, and finds the finitude of human at-- So that's the interesting thing. But this ties right back to open source, because you talked about non-excludability. Here's the problem: it turns out APIs are excludable, in the sense that if you and all the other cool people are talking on podcasts about some API, and I have an alternative library with a different API and nobody's using it, my thing will die on the vine. And this is actually a game that has been played by FAANG and by corporate players. They essentially recognized that developer attention, just like social network consumer attention, developer familiarity with APIs, is a finite resource. Yes, we can capture that. And we can capture it by dumping a ton of money into devrel, dumping a ton of money into conferences, books, videos, all these things to get people using this stuff. And then once we've captured developers, they're going to use our API, not somebody else's API. And now it's an API battle.

hugo bowne-anderson
Right? Acquiring communities like Kaggle. The Google acquisition of Kaggle, for example, I think is probably an example of that.

peter wang
How well has that worked out for them, though, and has it provided economic value for them? I think Kaggle is still Kaggle. So I

hugo bowne-anderson
But I'm not suggesting that. Okay, yeah, yeah. But I do think it brought Kaggle into the Google ecosystem in some ways, and got people working more on TPUs or whatever it is. I'll include a link in the show notes, but everyone should read Here Comes Everybody, at the very least.

peter wang
But I think, absolutely, from a strategy perspective, someone's sitting there looking at a spreadsheet saying, yes, this makes sense, we'll pay however much money, because this gets us this community, because developer attention is a scarce resource. At the end of the day, there are healthy competitions for rivalrous goods; not all rivalry is intrinsically bad. The thing that I don't like is when I see the possibility for abundance, for generative, anti-rivalrous things happening, and then someone goes in and taps that energy out using an extractive, finite system. That's not so good. So that's what I think around open source. And if we unwind the stack a long way back to where I was first talking about finite games versus infinite games: the beauty of open source collaboration is that it is anti-rivalrous. The more you give, the more there is; it compounds. And that's definitely been a dynamic that's helped the Python open source community now have so many libraries, so many tools, so many blog posts and tutorials on how to do just about anything. Clay Shirky is probably the most prolific author on crowdsourcing, the power of crowdsourcing: Here Comes Everybody. He identifies four levels of crowdsourcing, or collaboration. He starts with me-first collaboration, then he talks about making it fit for other people who are interested in learning about what it is you're doing, and then being able to accept their contributions. At the very end, the final thing, I believe, was Coase: addressing the fundamental economic limitations. So ultimately, crowdsourcing blows apart, resolves, a market inefficiency. He actually does talk about this in economic terms.

hugo bowne-anderson
So the Coasean ceiling or the Coasean floor, I can't remember which.

peter wang
The theory of the firm, the idea that you can just put smart people in a room and they will out-innovate all the smart people who are not in the room: that's completely flawed. In an Information Age, the more people we get working together, the more cool stuff we get. And I think if there's anything for us to anchor on, we really need to clip in on that point. Because that is ultimately... that has got to be the energy from which we manage and drive all of the innovation, the ethos of innovation in AI and machine learning and sensor systems and all the future Information Age stuff that's coming. It has to center on this economic principle. We're no longer in the Industrial Age; we're no longer in the Agricultural Age, the Bronze Age, or whatever. We're in the Information Age now, and we should be solving for minimizing the cost and maximizing the sustainability of meeting everyone's basic needs, and then all the additional things that we enjoy and want to get and give to each other. Those should be approached from a mentality of abundance. And the only other person that we haven't... someone's keeping a bingo card out there. The only name we haven't mentioned yet in this conversation is Elinor Ostrom and the commons, right? Because we can talk about crowdsourcing all we want, but the tension between crowdsourced anti-rivalrous generativity and rivalrous, extractive, exponential-return-on-returns capitalism comes to a crisis when they hit a commons. The anti-rivalrous will find the energy to maintain a commons, but extractive, rivalrous, finite games will come and say, I need to capture as much of the commons as possible, using the commons as a finite resource, not as a substrate for generativity. And that is essentially the existential risk to open source human ecologies.

hugo bowne-anderson
You're absolutely right. You and I can get relatively abstract on occasion, and I enjoy that, but I want to make this concrete with an example from your career. I look up to what you and Travis and Bryan and all of you did at Continuum and now at Anaconda, and that gives you very rich information about the intersection of open source, business, and venture-backed startup land. I don't necessarily consider all VC extractive, of course, but I do think there is definitely that element to it. So I'm wondering what the trade-offs are in working in the open source space and accepting venture capital, and how that can inform the conversation we're having now.

peter wang
In my experience, and I can only speak through the lens of my personal experience on this, of course: for people in startup land, VCs seem like they're at the top of the food chain, but for people who work in the capital markets, in institutional investment or whatnot, VCs are just one set of players in a very big ecosystem that's actually much bigger than VC. The top-tier VCs on Sand Hill Road that everyone talks about have a certain prestige to them for people who are going into startup land and coming out of Y Combinator and all this stuff. And there's a reason for that: they incubated and funded a lot of really good companies. But they are still just one component in a very big set of capital market players. And ultimately, anyone playing the capital markets is beholden to their investors. Just like startups are awfully beholden to their investors, the investors are beholden to their limited partners, the LPs.

hugo bowne-anderson
Exactly. I'd love a sociology of data tooling LPs one day.

peter wang
Yeah, it'd be interesting to actually go upstream. A lot of LPs are interested in this stuff. They're all over the place, right? Some LPs care, some of them are family offices. Basically, LPs are massive multi-hundred-millionaires and billionaires, sovereign funds, college endowments, and all these kinds of things. And they write really big checks to investment management firms that then give the money to entrepreneurs, or invest it in land or real estate. The reason I tend to give this Capital Markets 101 is that I want people to understand that anyone in the middle who isn't investing their own money is merely an agent of the principal that actually owns the money. There's a concept called the principal-agent problem, which is that agents are incentivized to do certain things that are not always 100% aligned with what the principal wants. They act on behalf of the principal, but ultimately they have a living they need to make, they have a personal reputation, and their fund and their firm have reputations they need to maintain. There are all these things that agents have to do, so their incentives are not 100% aligned with the principal's. The principal says: I give you a billion dollars, you need to make me a certain percentage back, and you get carry on it, you get all these different things to incentivize you to make money on that money. But ultimately, I give you the money, and that's transactional for me: you need to give me that money back with a lot of interest. That's what the principal cares about. And everyone in the middle is an agent, until that money finally hits somebody who's going to take it and apply it to some problem. That person is putting in their own energy, their time, their personal reputation. They're not spending that time with their family, their kids, their parents.
They're putting in the most precious resource any person has, which is their time. They're putting personal time and energy into this. So the principal has a unique stake, and the entrepreneur, the doer, has a unique stake. Everyone in the middle is an agent. Okay? Now, I realize that's very reductive, and it may be a little offensive to some people. But I'm a physicist; I like to look at things at the fundamentals. This is the fundamental wiring. You can't break out of this pattern; if you're in the pattern, this is the pattern. So what happens is, if the LP doesn't have a stipulation or a point of view, or doesn't put requirements on those agents, those VCs or investors, to do certain kinds of things, they're generally not going to do them, unless it dramatically de-risks the outcome or dramatically increases its likelihood. So funds and firms, VCs and venture firms, can tell you: we have a thesis, we operate this way, we're this kind of firm, we're founder-friendly, we're a growth-stage firm, we help accelerate. Today everyone has a story to tell, and they will try their best to actually live that story. But when push comes to shove, they are agents, and they have to do certain things that keep them at least at par with everyone else in that ecosystem. Because when you get into the world of managing billions of dollars, it's a very small world; everyone knows each other, and reputations last a long time. That's the issue there. If you are a company and you want to do great things, like being a B Corp, social impact, generativity, and all this stuff, you can tell that story till you're blue in the face. At the end of the day, you have to look at your investors: what does the investor want, and how am I going to give that return to the investor?
And if you're cool on that, and the investor says, yeah, that's actually what it is, then you're great. If you can believe and trust that the investor is going to fight for you, that you can perform outside the envelope of expectations, then that's a trust you have with them. Maybe they have a big enough portfolio, and everyone's got play money. Everyone does, right? All the firms, all the partners, have some play money they can toss at various random things as outsize bets. If it completely blows up, it doesn't matter. So if you are lucky enough to receive some of the doesn't-matter money, then you're very free to do whatever you want. Lucky you. But most people don't get that. For most people there is an expectation of a certain level of return, certain kinds of things, and performance within a certain time window. That's just the way that whole system works.

hugo bowne-anderson
How does that impact a company that is building tools? Or maybe a different question, and we can go either way: how does it impact the data tooling landscape, given that, as we've seen, the incentive for these agents, the venture capitalists, is to have a portfolio of tools, some of which will work and some of which won't? Is there a tragedy of the commons therein, in the tooling landscape?

peter wang
There's a lot of fertilizer and dandelion seeds being dumped on the pasture, absolutely. Let's put it that way. I don't know how else to put it. My inclination... I'm a bit of a rebel in a sense. I really love the energy and ethos of the open collaboration environment, SciPy and PyData and all these things. I think it really brings out the best in a lot of people, and there's such positivity and generativity in it that I see my personal mission as helping to defend some of that. The way I plan to defend it is, number one, just to talk about these dynamics, even just having conversations like this, and being explicit about what I think some of the challenges are. Awareness-raising is actually an important thing. Another thing is, and maybe this is naive, I do believe that users are more sophisticated than VCs or Gartner analysts or whoever sometimes give them credit for. A lot of the VC money that was made in software over the last 20 years followed a certain playbook for taking software to market. My hope is that if we can build a more sophisticated user base, it becomes too expensive to go to market in inorganic ways, and that market builds community and builds, sorry, not peer review, but peer review sites, and builds connection and all these things, so that users are not alienated from each other. If you're about to use something... to give a concrete example, I don't know what kind of data analysis problem you might encounter that you haven't encountered before, but if you did, and you wanted to do something with it, you would probably do some Google searching, but you would also ask some people who work in that area: what exactly do you use?

hugo bowne-anderson
Right. I ask people in Slack channels now. We're very good at finding the highest-signal places, especially now, given all the bullshit out there, and we're getting better at it. I've got people on WhatsApp or Signal; I'll message you sometimes to be like, what's up? Google tells me... I'm searching for pipeline tooling and I get, like, DataOps, fucking *Flow. And I'm like, what is happening? All these SEO experts and digital whatevers are pumping crap into my brain.

peter wang
Yeah, the proliferation of the *Ops is a very enterprise-software move, right? Because what they tell you is that if new categories are emerging, you want to position your product in the category, be a leader in the category, and pay off the Gartners and whatnot of the world to get yourself into the Magic Quadrant and all those other kinds of things. And if you can't do that, then you should create a category, because it's all top-down selling. It's vending into the ignorance gap, where practitioners, sometimes IT folks, sometimes software developers, don't know what's best. So they go and search for how to do these things, because it's an emerging area. And that ignorance gap is where people will come in and essentially carpetbag, and I find that really just dishonest. As a technologist and an old-school coder nerd, it offends me a little bit. I don't have a lot of respect for that kind of thing. But that's one of the reasons why we're building the Nucleus community site. We really want to create an environment where there is a high-trust, high-signal-to-noise place for practitioners to actually share their insights with one another. And these kinds of forums did once exist on the internet, right? Back when we were not all just random alienated avatars on sites with hundreds of millions of users, we would find forums and places where we'd create an account, build reputation, get to know other people, and actually have a relationship. If you can create a place where relationships are possible, then you can have a much more credible and trustful environment for sensemaking around emerging technologies, or whatever. So that's the hope; that's why we're doing that piece of it.
I may not have expected the conversation to come around to all this, but that's exactly the kind of turf grass we're trying to put down, so that the commons don't erode under the constant pounding of, like, whatever ML-Dev-SecOps thing is coming next year. Now, I say some of these things in a flippant manner; it's not that the problems don't exist, or that they aren't legitimate problems. It's more this rush to put labels on a category of emergent problems and then immediately try to find some off-the-shelf thing that just solves it. It's highly unlikely that an off-the-shelf product solves it. Another name I will drop in here is Eric von Hippel and his work on Democratizing Innovation, his critique of products even as a concept, and his research on the idea that products only ever satisfy 60 to 70% of a user's needs; users have to self-serve the remaining 30 to 40%. And this applies to everything from tractors in the cornfields of Iowa to hard drives to whatever. The idea that we can vend products into technology areas of emergent practice that have hundreds of thousands of different possible configurations is a pipe dream. Just stop trying to buy stuff; you're going to have to build things for a while. So yeah, some of this stuff is emerging; it's a hot area of innovation. My goal is just to be at least one battle standard on the battlefield, to say: this is what credible, crowdsourced, practitioner-led, kid-tested, mother-approved innovation looks like, and to try to find lines of demarcation and the allies who will join us under that banner.

hugo bowne-anderson
That's awesome. And I'm personally very proud and humbled to have created a bunch of content with Coiled on parallel computing and scalable computing. So we've got some videos and blog posts and a white paper and that type of stuff. It's cool to have started off that collaboration. Now I want to go a bit deeper into data. I'm, in turns, quite bullish and then bearish on data science and analytics. So why are we doing this, Peter?

peter wang
The interesting thing is what you find if you really go and work in industry for a while. I was very privileged; I really enjoyed my time at Enthought, and I was quite lucky to have the opportunity to go to many different kinds of businesses and work with scientists in a particular area, or quants at some investment bank. That vantage point is usually very data-driven, with sometimes physics equations, sometimes not-so-accurate financial equations, whatever, powering the computing. And from that vantage point, you look at the rest of the organization. Whether it's an oil and gas company or an investment bank, wherever you go, what you find is that in the landscape of the modern firm, 40 to 50 years after the PC revolution, in the digital era, the Information Age, human decisioning at these firms is, for the most part, not very data-driven. It's retroactively data-informed, but on a forward-looking basis...

hugo bowne-anderson
Mad confirmation bias, constantly.

peter wang
There's mad confirmation bias, and the forward-looking stuff is always some VP taking a SWAG, the highest-paid person's opinion taking a SWAG, and having the data people generate pretty PDFs and PowerPoints to explain, quarters down the road, why unexpected things caused it not to work. So the sad truth is that the world is deeply inefficient, less efficient than it could be, because it is not a data-driven world, even 10 years after the peak of the Hadoop big data wave, with all this marketing, with AI and all these other things. For all of that, it's all pro wrestling. I love watching some of the stock market stuff, especially segments on tech news and everything else; it's all theater, all kayfabe. At the end of the day, most of it doesn't work. Most of it is IT overspend on stuff that doesn't work. Ultimately, most businesses run on spreadsheets being emailed between VPs. We've really not progressed. Some data systems are more advanced, and some transactional systems are incredibly advanced, but for the most part, most businesses are still run not just with a human in the loop, but with humans calling all the shots based on a couple of data points confirming their biases. There's that quip that science advances one funeral at a time.

hugo bowne-anderson
That's great.

peter wang
Businesses generally advance one VP exit at a time, or one SVP or one CVP exit at a time. So why is that such a problem? It's a problem because it is not just inefficient from a resources perspective. Those people get paid a lot, they make crap decisions, and they create a tremendous amount of trauma for a lot of wage earners, who then go home and export that trauma to their kids and their spouses. It's really horrible dynamics. And all the money they're getting paid is essentially white-collar welfare that could be going to feed starving kids. So there's a deep, global, humanity-level inefficiency. But put all of that aside: there's an existentially important problem, which is that the world is getting more complex, faster. So all of the businesses that provide the infrastructure, products, and goods that we need to live have to get much better and much more agile at responding to a more dynamic environment. Part of what happens there is the opportunity for those VCs, right? They know those businesses are dinosaurs that can't do crap; they're just paralyzed in place. So essentially, you fund some startup, it gets some amount of revenue, it challenges some incumbent, and the incumbent comes in and buys it. And then usually the thing withers inside the incumbent, so that innovation dies, right? The VC gets their take, some multiple on exit. The corporate people who acquired it get to declare success and give themselves a bonus. The founders, and hopefully the people who worked at the business that got acquired, do well. But at the end of the day, did that innovation really make it out? In most cases, no, it really doesn't. Sometimes there are success stories for sure, but in general it's not even 50/50.
So if we look at that and say, okay, hold up, why are we doing all of this, to your question: the reason I'm doing all this is that I'm hoping the new generation of younger people going into these businesses, armed with these data tools, are going to slash and burn their way through the inefficiencies of the previous models. One of the ways they do it is from within the belly of the beast. The other way is that they learn, not trade secrets, but how the beast operates, and they're like, this totally sucks, I'm leaving, I'm taking three of my best friends who are super awesome, we're going to build a startup and eat this thing's lunch. So I see myself as a steward of a large colony of blacksmiths making tools for the revolution, the data revolution. It's a cybernetic revolution, and I love it. We must have firms that are smaller, more agile, and data-driven, where the loop of prediction, action, observation out in the market, and back again runs tight.

hugo bowne-anderson
And networks of small firms interoperating. I think interoperability in the tooling space, via networks of small firms building tools at different points in the pipeline, is a very promising future.

peter wang
Consider this: think about supply chains for physical goods. If you're trying to make a phone or a toaster oven, whatever physical good you're trying to build, you're going to say, okay, I have this prototype of the product, I want to field this product, I need to sell a million units next year. Let me look at the supply chain: my suppliers who can supply me electric cords, who can supply me a heating element, who can handle boxing all of this up and distributing it, right? You're going to ask your suppliers, distributors, and ultimately retailers to give you models of what their stuff looks like. Now, they give you those models as maybe PDFs with some tables built in with the pricing and lead times and stuff. Sometimes they'll even give you a spreadsheet where you can model it out yourself. Imagine a future where that was actually all APIs. Give me your model. Give me your frickin' model. I'm going to put it in here and do stochastic walks through possibility space, convex optimization of toasters or toaster ovens or air fryers, right? I can do all of this stuff on the basis of you having actually given me your model, so I can pull it into my ensemble model. That's a very different world from one where some VP looks at a thing, squints, and goes, ah, screw it, I think we're gonna do this.

hugo bowne-anderson
I love it. And it comes down to something we're going to speak to soon, which is cybernetics, and getting us, as pilots or controllers, working well with the machines. I love the idea of the data revolution and supporting people doing the data work, because of what I see now. I have a very poorly formed thesis. I've been reading a book by the anthropologist David Graeber called Bullshit Jobs, where he creates a taxonomy and typology of bullshit jobs, and he identified an increase in the number of people who consider their own jobs bullshit. Part of his project is to meet, discuss, and write with people. He would never tell anyone their job is bullshit; he speaks with people who actually feel that their jobs create no value. I don't think data science is totally bullshit, but I know a lot of working data scientists and analysts who create all types of work for their organizations that they feel is never used. Graeber uses the term spiritual violence with respect to this. There is a very demeaning lack of professional dignity in this type of wage labor, I think, and being part of a revolution that enables people to be proud of the data work they do, when it's used, is incredibly important for our field.

peter wang
Yeah. When we talk about these kinds of jobs, there's a strong correlation between bullshit jobs and white-collar work. Absolutely. With blue-collar labor, you can see: are you doing it, or are you not doing it, right? But white collar sounds like, oh, I sent some emails today.

hugo bowne-anderson
This is part of the pushback on remote work that I think we're not talking about. Part of the reason corporations don't want people to work remotely isn't any of the stuff we've been talking about. It's because they can't see them sitting in the office pretending to work. Remote work puts the busywork on Slack instead, and into email.

peter wang
If I could pull this back to what we talked about, anti-rivalrous versus rivalrous: the industrial-era mentality, the modern theory of the firm, and a lot of management practices and techniques come from the industrial era, which was in turn informed by military-type stuff, like the Henry Ford stuff and whatnot. It really was about managing...

hugo bowne-anderson
Fordism, Taylorism.

peter wang
About managing human labor.

hugo bowne-anderson
With a fucking clock, man. Sorry. The tyranny of the clock.

peter wang
Tell us how you really think, Hugo! So the interesting thing is that when we move into the Information Age, and you have such a huge diversity of outputs possible within the same unit of labor, labor is no longer the right way to measure it. And a lot of people have not really figured this out.

hugo bowne-anderson
Am I paid for my time or for my tasks? I've literally asked people that before.

peter wang
In small firms, even doing information work, there's no place to hide, right? If you're not getting your stuff done, everyone can see it. When you get to a much larger size, people lose sight of the connection between their work and the bigger output. And what happens is, again, the principal-agent problem: the principals at the top, the C-suite, know what strategy they want to implement, and the worker bees down here know the things they can do. When you put a layer of agents in the middle whose job is to manage up and down, it becomes extremely difficult; a lot of stuff filters through and doesn't quite make it. It's extremely lossy. And interestingly, I was just reading two different things about remote work. One was a Reddit comment from a guy bragging that he quit his previous job with a vengeance, and it felt so good, because the boss was crap. Then he picked up a part-time thing, I think it was claims processing or insurance paperwork, getting paid basically $35k a year. But it takes him almost no time, like an hour. So he picked up a second job, another $35k a year, and a third one, another $35k. He now works basically 60% of what he used to, and he's making three times as much. I don't want to extrapolate from one data point, but this is a very interesting kind of thing. The other thing someone was saying is that one of the challenges with going to remote work is that the tide goes out, and people really start looking at where they're getting their inputs and where their outputs go. Everyone who's doing make-work and busywork in the middle gets cut out of it.
And so a lot of bosses and managers and whatnot who have been quite superfluous to the process are all going to get squeezed out. I think we're going to see a revolution in tooling around corporate and remote work, hybrid work, internal efficacy, and you're going to end up with companies really realizing there are a lot of efficiencies they can gain. It'll be interesting; I'm actually quite hopeful about this. But back to the point about labor, Taylorism, and the Information Age, and the lack of meaning for some of the very smart people who do data science: equipped with these tools, they go into businesses, and to what end? I think this is the thing when we talk about spiritual violence. When you take somebody's time and engage them on something, what you owe them is not just the wage; you really owe them some ability to derive meaning from their work. You can't starve them of meaning; that's not right. And unfortunately, most of corporate America becomes quite starved for meaning, because nobody at the top is responsible for it; there is no chief meaning officer. So how do we measure the level of anomie across the organization, and how do we mitigate it? I think in Bhutan they have gross national happiness, right, which is a little bit, like, whatever.

hugo bowne-anderson
I love that. Schmachtenberger, on Rogan, said GDP is a horrible metric; it goes up as addiction goes up, as war goes up. And he said the inverse of addiction is perhaps a good measure of a nation's health, but we could say the same of any collection of people.

peter wang
Well, addiction is actually only one kind of dark thing, so that's really the inverse of just one hole. There are a lot of holes people get stuck in. But the general concept, I think, is quite good. If we don't know what the best thing is, like, what is gross national happiness, that's hard to measure. But can we measure one over gross national suffering, pain, trauma, alcoholism, suicide? That might be a useful measure. If I don't know where I'm supposed to go, my distance from the cliff is still a useful measure of how far away I am from the edge, because even if I don't end up where I want to go, at least I'm not going over there. I think it's the same thing with the structure of businesses. Information work could be a generative thing, an abundant thing. But when we structure business from the top so that management can learn how to extract, and optimize metrics for extractive things, we completely disregard the concept of freedom and autonomy and the kinds of things that actually give people a sense of meaning at work, like making consequential decisions. I've always maintained that the root of meaning is making consequential decisions. If you're doing all these things, and then it goes up and just flitters away and dies in a PDF somewhere, you don't feel good about yourself. Life isn't about feeling good, to be clear. But if we create a system of the world that constantly deprives people of meaning during their waking hours, and in their collaboration with the peers they see during those hours, that cannot possibly be the right architecture for civilization. Yeah, did you have an original question?

hugo bowne-anderson
I think we've wrapped around to it. My question was around the idea of bullshit jobs as it relates to analytics and data. Oh, right. I asked you what's valuable about data, why we're in data. Okay. Right. Where I want to take this, let me get this right, is to talk about the value of data, and then, let's say, the suffering caused by data. We've heard all this shit, like data is the new oil and that type of stuff. But you may have even sent me this originally, years ago: it's a talk by Maciej Ceglowski at Strata called Haunted by Data, and I'm going to read you the opening. At the start of this talk, seven years ago, he wrote: "In preparing this talk, I decided to check out the data landscape, since I hadn't seen it for a while. The terminology around big data is surprisingly bucolic. Data flows through streams into the data lake, or else it's captured in logs. A data silo stands down by the old data warehouse, where granddaddy used to sling bits. And high above it all floats the cloud. Then this stuff presumably flows into the digital ocean. I would like to challenge this picture, and ask you to imagine data not as a pristine resource, but as a waste product, a bunch of radioactive, toxic sludge that we don't know how to handle. In particular, I'd like to draw a parallel between what we're doing and nuclear energy, another technology whose beneficial uses we could never quite untangle from the harmful ones." Discuss.

peter wang
Mic drop. Yeah, he's completely right. I did this podcast with a16z, and I said, actually, I don't think there is such a thing as data; there are only frozen models, and I stand by that statement. On our metaphysical approach to data: the Muggles will do the Muggle things and say the Muggle things, but as data practitioners we should be extremely clear about what it is we're doing here. Every single datum that you collect is the result of a tremendous amount of processing, through the DSP, through the hardware, through the software, all that stuff, before it even ends up in some CSV that is probably named wrong.

hugo bowne-anderson
With delimiter issues that nothing can figure out.

peter wang
But before we get all poetic about the cybernetic future, let's consider for a moment the actual present, which is a bit off. The reason I say it so metaphysically, that we have to be honest with ourselves, is that if we as practitioners don't hold the line, then we will never be able to convince the business users of the right paradigmatic frame to think about this with. So I really do stand by that framing, and I agree with Maciej as well. There's this idea that data is oil, a precious thing to be captured and defended, and, oh, these users want to hold on to their own data because it's private. You can't hold on to data. The most brilliant thing ever said about information is that information is a verb, right? My whole "data is just frozen models" thing is really a riff, a corollary on that statement. Information is a verb; information is a difference that makes a difference. So is the number three data? It depends on who you ask, depends on where it came from, depends on what someone's going to do with it. The idea that there is this objective, distilled thing, that you get a little cup with the number three in it and now you have a piece of data, right? No, you don't. It's just the number three. Data is actually a statement about the sensemaking system and a decisioning system. In the conjunction of those two things, one might detect some resonant patterns that you can then schematize and say, this is the data flow between the eyeball and the muscle. But if you take the eyeball out of the equation, take the muscle out of the equation, it's just some random electrical spikes; it doesn't mean anything. So I like for people to take a more transcendent view, metaphysically, of what data is. It's just numbers unless you actually have a sense of where it came from, and how and where it's going and why.
So it's one of the Don Draper's thing about happiness, happiness, you know, happiness is the moment before you need more happiness. It's like data. Data is the thing is knowledge just before you're confused again,

hugo bowne-anderson
Right. And actually, Cory Doctorow has some great articles on what he calls the half-life of data, and he argues that social media is so addicted to data collection precisely because of that incredibly short half-life.

peter wang
And the dangerous thing about that, just as a side note: it turns out the human mind is very plastic, and human behavior is incredibly easy to condition, because we're still animals, right? So the entire system of the Western world, the mass consumer world, is optimizing to condition people to be more predictable, more homogeneous in their consumption patterns and in their desires.

hugo bowne-anderson
Back to Henry Ford as well, actually.

peter wang
Any color you want, as long as it's black. Noam Chomsky talks about this with mass media, in Manufacturing Consent. And what we do is condition people to look for novelty, for difference, but it's distinction without a difference. Take sneakers: there are people who are really obsessed with sneakers, with the vintages of sneaker releases and all these things, and they're literally made in the same factories as the $5 Chinese knockoffs. It's distinction without difference. We're conditioning people to be consumers of lots of distinguished aesthetics that have no difference in fundamentals and substance. And if you basically eat that kind of sugar all day long your whole life, ultimately you end up with a deficiency of meaning. Because actual meaning comes from an embodied, consequential set of decisions and relationships, and you can't be in right relationship, or meaningful relationship, with objects that have bolted-on aesthetics to make you crave the next object. That's just not how it goes. So you end up deficient in the vitamin of meaning. I think about those sad pictures of fish in the oceans that have eaten all the little microplastics: their bellies are full, but they have no nutrition. And this is essentially what we're doing over and over again, with the bad uses of data powering the monstrous engine of consumer capitalism. We're just shoveling more stuff into more people's faces and eyeballs, and at the same time we're trying to condition those eyeballs to expect the same kind of knowledge. That is the system of the world, and it's quite broken. That's what my friends and I call Game A, right? And the only reason this engine does this... I would imagine many listeners here have heard about the concept of the paperclip optimizer, right?
What if you have an AI that's super hyper-intelligent, and we tell it to make paper clips really cheaply, and it starts optimizing, and at some point it basically goes off the rails and decides that the most efficient way to do this is actually to kill all humans? We already have that, right? The modern system of the world creates all sorts of disposable stuff, burning up the planet, polluting the oceans, killing off all sorts of biodiversity. And to do what? It's not paperclips; it's 100,000 different kinds of sneakers, which no one needs. Fast fashion is this kind of thing. There have just been photos of the landfills they're having to open up in South America, and I think parts of Africa, just to dump clothes that were made by sweatshop labor in Southeast Asia. And all of it is to do what? It cycles through the US, cycles through the hot tropics and other places, comes back out, and ends up dumped somewhere in the global south. To do what? To make some number in a spreadsheet tick up to show quarterly revenue growth, so that analysts will say the number ticked up because they hit their earnings report precisely. That's, again, a rather flawed way to run human civilization. Anyway, that was a bit of a diatribe. But--

hugo bowne-anderson
I love that you mentioned Game A, because that is where I wanted to go at some point. Also, if you have heard of the paperclip maximizer, great. If you haven't, Nick Bostrom's book Superintelligence is probably one of the places you can learn about it. I don't know whether Bostrom created that thought experiment, but he definitely made it relatively famous. So I actually do want to talk about Game A. We've gotten somewhat abstract and metaphysical in this conversation, and I love it. We might lose some people, but I actually want to lean into it for the people who are still here, and I think it's important enough to discuss. If you're interested in the ideas we talk about with Game A, I definitely suggest you look into the work of Jordan Hall, or is it Jordan Greenhall these days? I'm sorry, he's Jordan Hall. He was on the Jim Rutt podcast. And you were on there once; you should go back on. Daniel Schmachtenberger also has some interesting stuff on this, and there are communities around these ideas; we can include some links in the show notes as well. I'm going to paraphrase some of what Jordan Hall said on the Jim Rutt Show, probably quite badly. Essentially, they discussed how a lot of what we have today emerged from when societies were at the band level, below the Dunbar number, around 150 people or something along those lines. We developed a bunch of collective intelligence tools, what John Vervaeke, who I encourage everyone to check out, calls psychotechnologies: a collective intelligence toolkit that tried to solve a lot of the issues and problems we had, and have, as a community. The first three problems are: first, how to survive in nature; second, how to survive competition with other groups (and a lot of the time we want to actually win that competition, not merely survive); and third, how to survive internal defection.
This actually comes back to the tragedy of the commons, free riders, internal defection. I encourage everyone to check out the definition of multipolar traps in an incredible essay called "Meditations on Moloch" on Slate Star Codex, which is now Astral Codex Ten on Substack. But that's for another conversation, by the way, right? So Game A was a collection of all these technologies to solve these very important problems. These technologies include society and identity, settled agriculture, and military hierarchy, which now plagues corporations and education systems. Competition, finite games, is a technology that was created to solve a lot of Game A problems, as are literacy, numeracy, market-based societies, feudalism, and capitalism. All of these things were developed to solve Game A problems. I'm losing my voice slightly, but that's because I'm getting so excited. The idea was that Game A would work to a certain extent, but it had problems, and then societies would collapse, right? The first Game A problem we encountered was the inability of society to actually police defection from the bottom up in the context of complex human behavior over a long period of time. In the fall of the Roman Empire, for example, we saw a lot of internal defection start to crop up. The second problem was the inability to maintain complicated infrastructure; similarly, in Rome, that meant maintaining aqueducts and roads and this type of stuff. As a society grows and you build more and more complicated stuff, you get diminishing returns on what it actually provides for you, and you have a maintenance issue. The third, of course, is invasion by enemies; in Imperial Rome, we can perhaps view the peoples at the Germanic border that way. Now, the thing about evolving through collapse is that we learn from previous iterations of Game A. But what we need for that is for collapse not to be global, right?
So Game A has a fourth problem, which is the problem of globalized exponential technology. Once again, you can watch Joe Rogan's recent interview with Daniel Schmachtenberger and Tristan Harris to learn a lot more about this, but think of exponential technologies such as what we see in social media, AI, drones, and gain-of-function research, as we've definitely seen recently. So the problem now is that if we have a civilizational collapse, the Game B people say there's a high chance, and it's a plausible hypothesis, that it will actually be global, so we need a new game. This is what's referred to as Game B. We don't know what it is, but we know certain characteristics it might have. And that's why I actually think open source technology provides a wonderful example of things that could come out of Game B. We want more infinite games, for one thing. So my question for you, but feel free to respond to anything else that comes to mind from this riff, is: what would data science look like in Game B? Or is there an alternative to the way we practice data science today that would make it applicable to a different type of system like that? And is there anything you want to add to my description as well, I suppose?

peter wang
I came to some of this from a different direction, but I think you did a very good job of explaining these concepts. There's a lot there. We talked about the scale and scope of civilizations and the current situation, what Daniel likes to call the global metacrisis: our ability to resolve crises is bad. And for the first time in human history, we have one single global civilization, where all sorts of stuff can happen that affects everyone, whether it's suitcase nukes, gray goo, CRISPR, or whatever; we're in the middle of a pandemic. In history, there have been collapses of civilizations, but they've been like, oh, China regressed a little bit, or oh, there go the Incas, while some other people somewhere else are still doing things. But now we have a global civilization, and the stakes are much bigger. There are many more ways for a complex interplay of things to cause all sorts of horrible stuff, and we don't even have the capacity to solve those problems. Moreover, a lot of the things we've built, from medieval institutions to now, are actually making those problems worse; we're putting things in our own way. Game A is everything that humanity's been doing for the last 10,000 years: things we think are reasonable approaches, like using competition to incentivize people, or always building more technology because that's always good, or getting more energy because that's always good. With all of these things, you didn't have to think, okay, I have to do a boundary integral over the entire volume of the earth and over all our eight billion souls. No one's ever had the responsibility or needed to do that integral before. Now we have to, because almost everything we do is at global scale.
So Game A is coming up against its hard limits. Part of what makes this hard is that it's not just one thing, like, oh, if only we had faster computers, or oh, if only we had clean energy. There's no single thing. It's a complex problem, which means lots of things are intertwined, so there's no silver bullet that solves all the problems. So the existing order is breaking down. The theory among the people affiliated with Game B is that we will see levels of collapse happening. It's not that overnight it's going to be some dystopian thing and everyone dies. What's going to happen is that as these things fall down, we're going to see ourselves regressing to authoritarian regimes; we're going to see many tens or maybe hundreds of millions of people dying of starvation, of various natural disasters, of wars over food and water because of climate change, all sorts of things. So there's the regression of human civilization, the decrease in human liberty, the decrease in our ability to do science; all these things will start falling down. The system needs to be reformed. Now, the problem is: how the hell do you reform the entire system of the world, with a total global GDP of 100 trillion dollars, with however many thousands of nuclear warheads, with everything we do as a world, with 8 billion souls? How do you reform that? One of the points quoted on the Game B wiki is from Ilya Prigogine, a Nobel Prize-winning chemist, who said that when a system is far from equilibrium, small islands of coherence have the capacity to shift the entire system. Another person who articulated this a little bit is Bucky Fuller, with the trim tab concept, right? Which the Dude, I think, talked about at the Golden Globes.
But anyway, the idea here with Game B is to figure out: can we build non-hierarchical, bottom-up, self-organizing approaches to explore and build these islands of coherence? Working groups in economics, currency and monetary theory, imagining the future of cities, political metamodernism?

hugo bowne-anderson
Can you define coherence, as well?

peter wang
I'd define coherence as something that can maintain its own metabolism, right? There's a homeostasis: when things come at it from outside and try to push it out of a pattern, it can restore itself into that pattern. And that pattern is like a standing wave. It's... gosh, I should have a better answer to this question.

hugo bowne-anderson
Complex systems can be coherent, so maybe a hummingbird is coherent in a way that a Boeing 747 isn't? Is that an example?

peter wang
No, that's a different thing: that's emergent order versus a complex machine. I guess the concept of coherence here is just that these things are standalone; they can defend an interiority despite some level of disorder on the outside. Coherent doesn't necessarily have to mean anything in the Cynefin sense of complicated versus complex. You can have complex systems that are coherent: your whole body is a complex system, but it has coherence. When your brain is going into an epileptic seizure, it doesn't have coherence... well, maybe it's actually too coherent. So maybe that's a poor example. But in any case, the point of Game B is for people who are concerned about this problem, who share the same perspective on Game A, the system of the world following these collapse modes, to get together and form working groups on all the different aspects of what makes a human civilization possible. So that's economics, agriculture, meaning, families, culture, law, all of these different kinds of things. You'll find people working on permaculture and sustainability, and people like Zak Stein, who's in the Game B orbit. He has this book, Education in a Time Between Worlds, right? How do we actually teach people, children and adults and everyone? What does lifelong education look like? There are folks working on 3D printing and technologies to build in a low-impact way, so you build structures that are durable without creating all sorts of negative externalities for the environment. There are people working on all these kinds of things in this Game B space.
Now, all that being said, the way that I got involved in thinking about these things was when I had the realization that our open source human ecology, the human ecosystem around the Python data movement, SciPy and all that, had created absolutely astronomical economic value for such a relatively small group of people. The concepts of modern capitalism came about through the industrial age, through better organization, bookkeeping, and the rise of cities, and then through pre-industrial and industrial structures, where you can take something like capital, apply it to labor, and get a much bigger output. Capitalism says, look, the people who provided the capital should have a bigger say in how we allocate the surplus and what we do with it. That's one of the core tenets of capitalism. The issue is that we have a new kind of thing now, not labor but human intellect, that provides vastly more amplification than capital. A company provides a laptop and an internet connection. Okay, that costs a few thousand dollars. But then you get one really bright guy or gal, and you know what, they've just produced code that shifts the productivity of millions of people. So we have to have a different conversation about what the attribution of economic surplus looks like. In fact, look at this powerhouse of open source nerds who collaborated with each other in a gift economy, a participatory... it's like old-school tribal culture, people just nerding out with each other. That's what it was. There were conflicts, and some adjudication had to happen, and not everything worked out. But for the most part, it was very much a free-for-all. It was an experiment in gift economies and participatory cultures.
And guess what happened? It produced the software that literally powers basically all of the alpha in the world on a go-forward basis. We have such a powerful human energy, a human ecology that produces this kind of economic lift. If on the basis of this we cannot build a new economic and institutional order, then we're totally hosed.

hugo bowne-anderson
I love that you framed it as a gift economy, because in a way, maybe not consciously, it was a reaction against Fordism and capitalism and corporate hierarchy. It's antithetical to that entire system: stepping back from that frame and reconfiguring how we build things together as people and co-evolve together.

peter wang
It didn't start off as some kind of Marxist rebellion. In fact, I think Travis might be a little horrified. He's not horrified at the collaboration aspect, but he's very much a market guy, Hayek and all these others; he's very much a market libertarian. But of course he knows better than anyone the value of this gift economy and the collaboration and all these things. The best way of describing this, and hopefully this isn't a spoiler if people haven't seen the movie, is Monsters, Inc. Okay, mute the next maybe 30 seconds, because I'm going to give you the spoiler. Those who have seen it may remember. Have you seen it?

hugo bowne-anderson
I haven't seen it, but I'm okay with a spoiler.

peter wang
Okay. The spoiler is: the reason these monsters go into little kids' bedrooms at night and scare them is that when a kid screams... On the outside of the door, the portal the monster goes through to get into a bedroom to scare a kid, there's a little thing that basically captures energy from the kid's fear. They bottle it up, and that energy is what powers the monster civilization. But at the very end, the cool thing is that these two monsters don't want to scare this poor girl, so they act goofy. And she starts laughing, giggling with delight, and the energy that goes out absolutely blows up their energy collection thing. And they're like, holy shit, joy and happiness can power civilization too. Okay, spoiler over. For me, that's the most succinct way of putting it; of course Pixar nailed it.

peter wang
I've got no problem with money as a way of attributing karma. Absolutely. When it becomes just paper games that burn up the world, that becomes a problem. When we moved to fiat, and to electronic debt currency, we really created some problems. That's actually where this stuff has really gone off the rails.

hugo bowne-anderson
And I just want to make clear, I'm not anti-market or anti-capitalist at all. I think they're, you know,

peter wang
I am a refugee from a communist country. I like markets, absolutely. But that's not where most of the financial constructs sit. The conversation has to ascend to a plane above that.

hugo bowne-anderson
And competitive markets are better than markets with monopolies, which we're seeing more and more of.

peter wang
Well, but it bears repeating: the earliest capitalists were monopolists, because they thought competition is horribly inefficient. Why would you build two telegraph systems? We have one perfectly good Western Union servicing the entire world, or one good Bell System servicing the whole country. When you go back and look at the trust-busting and Teddy Roosevelt, they put capitalists and monopolists together in one sentence. In all the political cartoons, they were basically a bunch of money-grubbing fat cats looking to extract, because they had the capital: I built the railroad, or I built the telegraph network, so of course I get to charge whatever rent I want. It's only since the Reagan-Thatcher rebranding of all this stuff that people get markets and capitalism really muddled together. But China is definitely a socialist, communist country, and they have a lot of healthy markets. And here in the United States, we have markets and we have capitalism, but markets don't have to be capitalist. We have very significant captured cartels and whatnot here, which aren't healthy markets, and yet it's still capitalism over here. So anyway, to bring this back to the Game B concept: how open source led me to Game B thinking was really around this idea that now it's like Sully and Mike going through that door. We've just now seen that laughter can massively power all this stuff. So I'm like, oh, shoot. So how can we go from here? Can we take this dynamic and scale out crowdsourced, participatory stuff, downscaling and going against the Coasean theory of the firm?
Or, like all these other things, can we build smaller, more agile things that have people working in small, high-meaning, high-trust groups, building valuable things that then sit in a network of other people, so that the whole thing becomes a much more vibrant system, and just as economically powerful, if not more so, than the Game A way of structuring the world? And the really cool thing is that this is riding into all of these firms on the Trojan horse of the grassroots data and AI revolution.

hugo bowne-anderson
Amazing. So how can listeners get involved and learn more about these concepts? Open your mind to the possibility that if you're doing data science, maybe you have a role to play in the revolution. If you think the world is wrong and broken in so many horrible and interlinked ways, recognize that there's a way out. The way out is to actually find other people who agree with you, and then listen and learn more about other ways of framing and thinking about this stuff. Ultimately, we're not going to get from here to there without building a lot of economic value, but the manner in which you build that economic value significantly sets the tone for what happens next. We wanted to talk about cybernetics briefly as well.

peter wang
The only thing I'd say about cybernetics is that I think people should use the word more, because cybernetics is really about closing the control loop. It's not just making predictions and sticking them on a PowerPoint. When Norbert Wiener and others pioneered the field of cybernetics and these ideas, there were no PDFs, right? So it's really this idea of active control, active guidance, learning from the environment, responding to the environment, building models that get smarter in response to a complex environment. All of that is really cybernetics; it's control theory, right? So we should make all of our organizations and all these human systems as transparently cybernetic as possible, so we can evaluate when they're doing well, when they're not doing well, when they're doing the right things, and when they're doing harmful things. Right now it's so loosely coupled, right? It's VPs shooting from the hip most of the time. The reason I like to use the word cybernetic versus AI or machine learning is that both of those terms really tend to... what do they do? It's just like the whole thing about calling data oil and treating training data as precious, when really data is the ephemeral thing that exists when there's a connection between sensing and cognition. That's information, right? It's the state of the sensor and the cognizer being in conjunction with each other, having coherence. So cybernetics is important so that we don't think about intelligence as being abstracted out and put in the machine, because that's actually a deeply disempowering and inhumane way to approach the world. The more that we as data people and practitioners humor that kind of positioning, the more we're accepting a frame that says machines are intelligent.
So we don't have to think about it; it's the machine's fault. The machine decided the trolley problem, I'm sure it was fine, I'm sure the output is the best possible output we could have had, because the machine figured the trolley should go this way versus that way. We really have to push back on this concept, because the actual things we need to do as practitioners are nuanced; the actual decisions we have to make are in a gray zone a lot of the time. We have to pull organizations kicking and screaming into the conversation around understanding the ethics and implications of what they're doing. They'd love to pay somebody to tell them it's an off-the-shelf ethical system: it does the decisioning, and it's at fault if something goes wrong. The more we let these VPs shoot from the hip and just go implement AI from the hip, the worse the world is for everyone, because the immediate consequence is that they will put in legislation and other things that change the competitive landscape so that it's a race to the bottom, and all companies have to do it this way. Any company that tries to do the honest thing actually ends up getting hosed, because now its decision makers are responsible for the ethical outcomes, right? So you can absolutely see this race to the bottom, unless all of us as practitioners really drive home this framing, this paradigm: we are building predictive systems, we are putting actions in place, and this is us. We are doing it, no one else. There is no AI ghost in the machine that just figured out the right thing to do. We trained that fucking AI, right? We sat there and told it, this is good, and it said, that is good, I'm gonna deny these loans, I'm gonna go and arrest those people. We did that, and we have to take responsibility for it. And the problem with Game A, and this is really back to the principal-agent problem...
Not only has the system, the medieval institutions, nationalization, and the joint-stock corporation, evolved into what it is now; the system of Game A is all about alienation among people. Marx was not wrong about this point, I will say that. But he was more about the alienation of labor; this system alienates everyone. We have absentee ownership for most corporations, right? So the owners are alienated, the consumers are alienated, the workers are alienated. Who the fuck owns anything? Where's the ownership? Where's the courage to say, I am responsible, right? So around these automated decisioning systems, we as a practice, as a movement of practitioners, actually have to draw a line in the sand and say, the buck stops here. We are going to actually be a point of accountability for decision making in this loop. And as demographics change, as boomers age out of their senior positions and new generations of people come in, I like to think that my generation has people who are willing to step up to the plate and bear that mantle.

hugo bowne-anderson
Also, I love one thing you said. I'm going to paraphrase, but one of your statements was that we as data practitioners need to drag these organizations out of these patterns and out of these deeply harmful frameworks. And to all you data scientists and data analysts and machine learning engineers and deep learners, or whatever you call yourselves out there: we're actually all in an incredible position in the labor market currently to make certain demands, because the demand side of the labor market is hot right now. So perhaps don't just take any job. I'm not suggesting unionization in the classic form, but we are in a position to make demands. I wasn't going to go there; I just said go look at Slate Star Codex. But seeing that you mentioned the race to the bottom, I thought it might be good to wrap up by actually giving the definition of a multipolar trap, which defines so much of where we all are these days. It's an abstraction, or a generalization, of the tragedy of the commons, such as factories polluting a river; of the prisoner's dilemma, in which defection is better irrespective of anyone else's choice; of the free rider problem, where people are just lazy and take benefit from other people's work; and of the race to the bottom. It's important to note that below the Dunbar number, when we were at the band level, it was actually quite easy to police these things; it's the scale of society that makes it difficult. But the definition of a multipolar trap is: "In some competition optimizing for X, the opportunity arises to throw some other value under the bus for improved X. Those who take it prosper. Those who don't take it die out." Now, this is key: "Eventually, everyone's relative status is about the same as before, but everyone's absolute status is worse than before."
And I think we're all in a position, at least as part of this data revolution, to start thinking about how we can create an incentive system that doesn't result in these multipolar traps.

peter wang
Yeah. And I think, actually, maybe not unionizing; creating a union of data practitioners is quite difficult. But at least creating a professional society, where some of these things are actually codified and talked about. That's something that has been on the back burner for me, something I've wanted to do for years now. Maybe we should actually kick it off in 2022. But that's

hugo bowne-anderson
A great idea. Anyone who is interested, please reach out to Peter and me on Twitter. Is Twitter the best place to say hi to you, Pete?

peter wang
Oh, sure. Yeah, my DMs are open. I'm @pwang.

hugo bowne-anderson
At pwang.

peter wang
Yeah, right. It's P Wang.

hugo bowne-anderson
And I'm at hugobowne. And, Peter, we've had many conversations over the years, none quite as wide-ranging as this one. Oh, no, actually, that's not entirely true. But this is one that I'm incredibly grateful for, and thank you so much for your time.

peter wang
Thank you for having me on. This has been fantastic. I really appreciate the opportunity to go all over and connect all these different dots. Absolutely. I don't think I've really talked about some of the Game B stuff and how it connects to the open source stuff in a public setting quite like this. Fantastic. Thank you for

hugo bowne-anderson
Absolutely.


Transcribed by https://otter.ai