James (00:00): Hey everyone, James here. So we recorded the podcast, but there was some crackle in my microphone. I'm on a different microphone than normal. I do apologize. No need to write into the show or tweet in. Um, I do apologize. There's nothing I could do in post, unfortunately, and we didn't have time to re-record. So here is the pod. Machine learning, Frank. It is a topic that I think you're an expert in. Frank (00:30): Oh gosh, no, James, I am not an expert, but I love, love, love, love, love that we're opening a show on machine learning. James (00:42): Well, it's not really, really machine learning. I mean, I want you to tell me if this is really machine learning, because it's really not. I think it's a segment of machine learning. But what if I told you that I'm talking about machine learning where I write two lines of code and all of the hard work's already been done for me? Frank (01:04): Ooh, you're talking about NuGets. Are we talking about NuGets this week? I thought we were talking about machine learning. You're really teasing me here. James (01:09): We're talking about NuGets and we're talking about Azure, we're talking about... what else is going on? Oh, now we're talking about precompiled models for sentiment analysis. Frank (01:22): Ooh. Sentiment analysis. Okay, so now I'm getting your two-lines-of-code thing, and absolutely that's machine learning. If you want to tell whether something's machine learning or not, just count the number of if statements. If it's less than 50, it's probably machine learning. James (01:39): Oh. So I've known in and around sentiment analysis for a long time. It's basically baked into every single cloud provider out there. It's baked into tons of prebuilt models for you. In fact, isn't sentiment analysis the first machine learning thing that you do? Is that the 101 at this point?
Frank (01:59): Well, it's certainly, um, the most famous one, and the one that I would say we've kind of solved. And that's why there are so many solutions to it and so many options when you do want to do something like it. Our solution for it actually came out of spam filtering. Remember that old world? Yeah. We had the big Bayesian revolution where we started using statistics on emails — language statistics, this word occurring with this word, with this title, the probability of this given that, Bayesian probabilities. We developed all that for spam filtering, and guess what? It works just as well for, um, sentiment detection and all that kind of stuff. So it's kind of a classic problem. It's a good one. James (02:48): Yeah, and I think that you're right on the spam filter stuff. I've definitely looked it up somewhere: you feed in a bunch of naughty words and a bunch of positive words and it will tell you if it's positive or not positive. It's sort of the classic spam filter — there are some negative things in here, or there are some naughty things in here, you should just filter this out — which is better than, um, "here's just a bunch of words that are banned." It can really analyze a lot more, in a way. That's one model. Frank (03:26): Whenever someone's presented with a problem of filtering, almost everyone starts out with an allow list and a disallow list. And then if you get fancy, maybe you add regular expressions to your allow list and disallow list, and you do things like that. But eventually — uh, we call those heuristics, where you're trying to summarize all that knowledge in your brain into code. Or maybe it's actually literally the definition of coding: we're encoding the information, all of these ideas we have about civilization, all of human knowledge, into these little rules. Uh, and it turns out there are just limits to it. A, it's a lot of work.
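Frank's Bayesian spam-filtering story is essentially naive Bayes: count how often each word appears under each label, then use Bayes' rule to score new text. A toy Python sketch of the idea — the training sentences, labels, and smoothing choices here are made up for illustration, not from the show:

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs):
    """Count word frequencies per label and return a classifier closure."""
    word_counts = {}        # label -> Counter of words seen under that label
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in labeled_docs:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
        vocab.update(words)

    total_docs = sum(doc_counts.values())

    def classify(text):
        scores = {}
        for label in doc_counts:
            # log prior: how common this label is overall
            score = math.log(doc_counts[label] / total_docs)
            total_words = sum(word_counts[label].values())
            for word in text.lower().split():
                # Laplace smoothing so unseen words don't zero out the score
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

    return classify

# Tiny, made-up training set, purely for illustration.
classify = train_naive_bayes([
    ("love this great product", "positive"),
    ("what a wonderful happy day", "positive"),
    ("this is terrible spam garbage", "negative"),
    ("awful bad broken junk", "negative"),
])
print(classify("a wonderful great day"))  # → positive
```

Real spam filters train on millions of messages, but the mechanics — per-label word counts plus Bayes' rule — are the same, which is why the technique transferred so naturally to sentiment.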
And B, you're going to find edge cases forever. That means it's an intractable problem, which means you've got to pay people for the rest of eternity. No one wants to pay anyone for the rest of eternity. So we make the machines do it. James (04:16): Yeah, I like that, because there are definitely some words that are considered bad in some cultures and not in other ones. And you may want to ban them or replace them, but you may not want to block the entirety of it. But if you have that binary yes/no of just "it contains this word," then that's bad news. You want to have a statistic of zero to a hundred, or, you know, what's the probability that this is good or bad — there's a 90% probability this is bad, so maybe just filter it, but if it's in the 60s, maybe have a user review it. In fact, that could apply to images, too: image classification. Is this a naughty image? Is this a cruel image? You know, things like that. And then you read these stories of, um, contract workers or Facebook employees or YouTube moderators who have to go through that stuff, which is terrible. And the more the machines can do it, the less, hopefully, humans have to. Now, luckily, I didn't have to do any of that, because I was not doing image classification. I had one goal, Frank, and I want to know how you would get there before I tell you how I got there. But here's what I was doing on my GitHub backlog for Hanselman.Forms. Frank (05:22): Hanselman.Forms! A classic app, forever. WinForms? Classic app forever. You informed me already. James (05:30): Xamarin.Forms, Hanselman.Forms. That's all I've been doing. Okay. So I have this part of the app which is all of his tweets. And what I wanted to do was have a sentiment analysis of the last 24 hours of Hanselman's tweets. Kind of like, how is Hanselman feeling today, in the last 24 hours?
And it was on like a gauge, basically, from zero to 100 of positive, or neutral, or negative, or somewhere in between tweets. That's what I wanted to do and get it all out there. And that was my goal today. Frank (05:57): Streaming. I absolutely love this. I think this should just be a feature of Twitter. Like, when I log in in the morning, it just says, hey, these people are being positive, these people are being negative, and these people are being informative. Um, yeah. So I have plans, uh, for whatever you chose to do here. So, um, you want my solution? What do you want out of me? Because I really just want to pick your brain, but I'm happy to talk about what I would've done. All right. 'Cause I also, you know, I was encouraged to do this because our good friend Jeff Fritz, he has a sentiment analysis bot that is monitoring his chat room in real time, which is kinda crazy. Frank (06:49): Now, you can't do that. There are too many personalities. There's no way a bot can keep context. Look, this is a hard problem. So I guess I kinda hand-waved a little bit there when I said it's a totally solved problem — we can do spam, no problem, man. Um, none of this stuff is easy, honestly, whether you're doing machine learning or heuristics, any of that stuff. Uh, and even if you struggle for a year and get it working for English, or even US English, all of a sudden you realize, oh, there are 7,000 languages on the planet. So now I've just got to spend 7,000 years and do all this over again. It's very true. Luckily, luckily, I went into it knowing that Hanselman, I would say 99% of the time, tweets in English. Not always, because some other languages, but normally, pretty much. Frank (07:36): And those tweets are more of a statement in time, or, you know, some explicit thing. It's usually not conversational, because for this I removed all replies and I removed all retweets. Right? So they're all his tweets. Yeah, I like that.
That works. Replies might have some information here. Uh, but I guess, uh, when tackling a machine learning problem, the first thing you have to define is your problem. I think you stated it pretty well, but I want to rephrase it slightly in a machine learning way. And that is the, um, the data — the label we want to learn. In your case, it's sounding like binary data, an on/off. Now, it can take on values in between, for sure — don't worry about that — but it's basically two extremes. You just want to know, is he being positive today or not? But it sounded like maybe you also want three points: positive, neutral, negative. But then you could even have four points, five points: positive, informative, critical of the government, talking about the virus — you know, you can have categories all over the place. So when you're doing a machine learning problem like this, it's good to think of your end result early, in the beginning. So just for clarity: are you three points, positive, neutral, negative, or are you two points, positive, negative? James (09:01): So I was very much more than two data points. That was my big thing: I sort of don't see the world in binary, like, oh, this tweet is 100% positive or 100% negative. I kind of think there's a scale between zero and a hundred, but I'm okay with more data points. What I would love is very positive, positive, neutral, negative, very negative — like, five data points would be spectacular. Or, in general, what I would really love is a scale from zero to 100, right, which is: on the positive level, how positive is this? That would be the ideal scenario. Um, because I went into the problem — I'll tell you this much — thinking, if it's only binary, negative or positive, then let's say Hanselman tweets a hundred times and 80 of them are "positive," in quotes. Is that really an accurate statement, to say he's 80% positive today?
Because what if those negatives aren't really that negative? And what if they're more neutral, in the middle, in which case that 80% scale really isn't accurate. So that's sort of how I was trying to go into the problem. Frank (10:15): Well, you actually got into a really good topic in machine learning here. So what I want to start with is: from the beginning, it's still a binary scale, what you're talking about, even if you allow values in the middle. Okay? The point is, there are just two extremes. There are two hard points, and then everything else is soft in between. So if you designate a neutral, that becomes a third point. You can be a little positive of neutral, a little negative of neutral, but it's a third point. So if you just have a positivity scale, you are 100% going to get a percentage out of the machine model — 95% positive, 86.2134% positive. You know, you'll get those numbers. Don't worry about that. That said, you should never, ever, ever display those numbers, because they're semi-meaningless. Uh, it's so easy, as humans, to attribute meaning to statistical numbers. Frank (11:15): But we humans are fantastically bad at interpreting and understanding statistics. So what you should do is — yeah, uh, the machine model is definitely going to calculate those middle numbers, 95%, but in the UI you should definitely just say "positive." And if that's not good enough, then you should add those other points: super positive, positive, neutral, negative, super negative. Those should be distinct points. They should not be percentages. Don't do percentages. These are categories; it's one or the other or the other or the other. Got it. It's just that percentages don't make sense to humans. They just don't. James (11:58): Yeah, no, that makes sense. In general, I think that's really what I wanted, like a gauge. Like, if I had a bar chart and the bar chart was 100%, you could see, um, different colors filling it.
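Frank's "categories, not percentages" advice boils down to thresholding the model's raw score before it ever reaches the UI. A minimal sketch, assuming a 0-to-1 positivity score; the cutoff values are arbitrary illustrative choices, not from any API:

```python
def sentiment_bucket(score):
    """Map a raw 0.0-1.0 positivity score onto five display categories.

    The cutoffs are made-up illustrative thresholds: the user sees a
    category, never the underlying percentage.
    """
    if score >= 0.8:
        return "very positive"
    if score >= 0.6:
        return "positive"
    if score > 0.4:
        return "neutral"
    if score > 0.2:
        return "negative"
    return "very negative"

print(sentiment_bucket(0.95))  # → very positive
print(sentiment_bucket(0.5))   # → neutral
```

The point of the thresholding step is exactly what Frank says: the 86.2134% the model emits is real but semi-meaningless to a reader, while a small, fixed set of labels is something humans can interpret.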
So most of it is positive, there's a little negative, and there's a little neutral in between. So kind of a gauge that would move back and forth is sort of what I was thinking: it's very positive, or not so positive today. You know what I mean? And you would see that positive on that scale, and it wouldn't really need to be out of a hundred. It's just, here's the positivity meter, in a way. Frank (12:31): Okay, yeah. And in that case, I wouldn't even do that at the machine learning level. So I would use machine learning to go through and binary-tag every tweet: yes, no, yes, no. But then, in the end, do your sums and your averages and say 85% positive. Don't take the number from the machine learning model — that number is meaningless, I promise you that number is meaningless. Um, instead, do the binary model, do what a machine learning model is really good at — categorical data — and then do statistics on top of that to do your cute little chart and all that kind of stuff. 'Cause then you could even do a time histogram. Like, look, he was this positive in the morning, this negative in the evening. And then be like, oh, he must've had lunch there and got more positive. You know, you can do fun things like that. James (13:18): That's true. Well, so here's what's funny about that, because I went into it and I said, well, you know what I want: I want to use C#. I have an Azure Function that's already getting triggered on every tweet. So I can go in and I can run machine learning on, you know, every tweet that happens. I could either lump them all together, or I could just run it on every single tweet if I wanted to — some options. So I immediately looked at ML.NET, which is machine learning for .NET, and I was like, this is going to be great, because it's exactly what I want to do and it does a bunch of stuff. So that's where I started. Frank, have you used ML.NET at all yet?
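Frank's pipeline — classify each tweet categorically with the model, then compute the display statistics yourself — might look like this sketch (hypothetical helper, not from any library):

```python
from collections import Counter

def positivity_gauge(labels):
    """Given per-tweet categorical labels ('positive'/'neutral'/'negative'),
    compute display percentages *after* classification, as Frank suggests,
    rather than trusting the model's raw confidence numbers."""
    counts = Counter(labels)
    total = len(labels) or 1  # avoid dividing by zero on an empty day
    return {
        label: round(100 * counts[label] / total)
        for label in ("positive", "neutral", "negative")
    }

# e.g. 10 tweets from the last 24 hours, already tagged by the model:
print(positivity_gauge(["positive"] * 8 + ["neutral", "negative"]))
# → {'positive': 80, 'neutral': 10, 'negative': 10}
```

From the same tagged list you could also bucket by hour to get Frank's time histogram — the statistics layer is ordinary code once the model's job is reduced to one category per tweet.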
Frank (14:00): Uh, I have, uh, but I am by far no expert in it. ML.NET is a very, very large library that has a lot of different machine learning concepts in it. From a machine learning perspective, I'm best at neural networks. That's kind of what I know best: I know how to use them, I know how to train them, I know that world. ML.NET is a vast library. It can do neural networks plus a billion other things. Um, but because it's so big, they've actually built some nice tools to help you use it, and also, um, lots of tutorials to show you: for this kind of data, do this; for that kind of data, do that. Things like that. I think my favorite feature is that they have a command-line tool, and you can just pass a CSV file to it, tell it which columns you think have data in them and which column you want to predict, and it runs various different machine learning algorithms over it, does a little bit of magic, sprinkles some DevOps on it or something like that, and then out pops a magical machine learning model. I think that would be your best-case scenario, right? James (15:14): So that's where I started, and I went in, and I'm pretty sure they just have pre-trained models — I'm not a hundred percent sure. When I go through, it's usually like, hey, give us a bunch of data and then we'll create the model for you off that data. So they have one where you could feed it huge amounts of data, like a whole bunch of tweets, and you would categorize the tweets as positive, negative, positive, negative. And then you would feed it a new tweet, and it would tell you if it was positive or negative based off of that. Now, I think you are right: they have this AutoML thing where you can basically just say, go figure out what model. And they probably have that all created. The problem I had, Frank, was that — well, the problem I had was that it was only binary.
Um, and I didn't want binary data, even though you told me just right now that I should have binary data. But Frank, I wanted a scale. I didn't want true/false. You know, I wanted more than true/false. Frank (16:18): Hey James, it's meaningless. I promise you it's meaningless, but please continue. In my mind I said, you know what I want — I'm going to make this harder. That's what you decided. It's a pretty simple problem; I'm going to make it harder now. But before we get into all that, 'cause that's a huge can of worms, um, let me first touch on something else you just said. Um, the data gathering and the labeling part: that's definitely the, um, most time-consuming and annoying part of machine learning. Like, the algorithms sound fun, the libraries are fun, the things you create are fun; data gathering is terrible, labeling data is terrible. And so before, I kept telling you you should have multiple categories, not just binary labels, and make those categories — but that would require you, James, to go through every one of his old tweets, at least. Frank (17:16): I mean, well, we can talk about it, but, you know, a lot of them — and label them, put that data in, type that data. And you said one sentence of it, and that is the longest part of machine learning. So I just feel like we have to mention that. Oh, that is definitely the longest, longest, most annoying part. So I totally get that. A, you tried to use pre-trained models, so I'm curious to hear how that goes. But B, where did you go from here when you didn't like what ML.NET was doing? James (17:50): Yeah, so I think what you just said really described what I didn't want to do, which is that I didn't want to either go search the internet for a bunch of data that was pre-classified, or, you know, classify it myself, because really, I'm pretty sure that for text analysis, somebody has done that before.
So I said, you know what, that model has been created, and I feel like someone's already done it for me. So I go to Google and I type in "Twitter C# analysis .NET Core," whatever — give me something. And what I got back is called Stanford CoreNLP, the Stanford core natural language processing library. Frank (18:35): Oh boy. Okay. Now we're going to 1980s technology. We're going backwards in time. Everyone enjoy the backwards roller coaster; try not to get sick. Uh, try not to reason about how we've had this for a very long time and made zero progress in machine learning. But, okay — let me give it a more positive spin. This is an amazing database. It took a lot of effort to create, and it has known relationships between all the words, and it can do something, I think, like a Gaussian mixture model to do basic predictions off of it. Does that sound about right? James (19:19): I believe so, and I believe it's been extended with many, many more models to do many, many more things, including sentiment analysis. Now, here's the problem: this thing is all Java-based. Frank (19:31): Yeah, that too. James, please just call me next time. What are we doing? James (19:38): I told you, Frank, that there is a Stanford NLP.NET library that uses Mono's IKVM to take the JAR files and run them in C#. Frank (19:52): You had me at... IKVM, is that what it's called? IKVM, something like that. Yeah, I think it's called IKVM. It is the amazing library that can run Java code on the CLR, and it's actually been around forever. Um, yeah, it's been around forever and it works. Like, I've been using it on iOS for years — forever. It just works. It's awkward that we never talk about it. I think that's a little bit of bigotry or something. But it's there. You can totally run Java code on all the Xamarin stuff, and all of .NET, I suppose. James (20:31): Yeah, yeah. It's ridiculous.
It is really, really crazy. And there's a GitHub project — I think it's Sergey Tihon who did it. Um, and it basically does that, and someone else extended it with more models, so you can sort of create more models and do this stuff. So I was like, this is it, because Frank (20:48): the sentiment analysis, Frank — what it does is it gives you back not binary data, but five data points: very positive, positive, neutral, negative, very negative. I was like, ooh, this is exactly it. You needed categories. You don't actually want percentages; you just want a spectrum of categories. And also, shout out — I believe Sergey is an F#er. So hey, Sergey, a little gift from the F# community here to James. And they do many, many samples in F#, by the way, which I think is pretty cool. I mean, when you look it all up online, the thing is fantastical, right? So I'm like, this is great. It does exactly what I want. I'm going to install it. NuGet packages, everything is super-duper good to go. Wait, wait — first you bring it up on NuGet, then you go to the address bar and you change the N to an F, and you click, and you see: is it .NET Standard? Frank (21:47): Okay, so you jumped the gun, Frank, because here's what happened. I'm sorry, I'm sorry. This is what I do first. I always check first, and this is what I should have done, but I didn't. So I installed the NuGet packages. They install, but they have a warning. But I'm like, whatever — everything has a warning. Warnings, my God. You ever hear about the boy that cried wolf? Well, have you seen a thousand of them? So I'm like, everything's fine. And it's really cool, actually. What you do is you say, um, var sentence = new Sentence, give it a string, and then say sentence.Sentiment, and it gives it back. It's really perfect. Perfect. Done. I would like a GetSentimentAsync, but, you know, I'm a weirdo. It's two lines of code. Done.
Um, I'm sure you could asyncify it. Frank (22:35): So I'm like, all right, I type in the code I copied from the GitHub page, and Sentence can't be found. It cannot bring it in. I was like, oh, it must be a, I don't know, VS for Mac thing. Let me close it, reopen it. Nope. I just know I don't want to stop there. I'm like, what the heck? It's saying a class isn't there. What's it saying isn't there? It can't find the class. I can't bring in the namespace. The namespace doesn't exist. It doesn't exist. Like, where is it at? Non-existent. Sounds like a problem. So it's a problem. So what do I do, Frank? Now, now you can go — now is when you go and check out which frameworks this NuGet supports. And this is basically the entire reason I wrote fuget.org: I was getting tired of actually opening up NuGets to find out what they actually support. Not what they say they support, James — what they actually support. Frank (23:24): And so the easiest way to do that is, um, just take the NuGet link for the thing and change the N to an F. So you go to fuget.org, and then I highlight all the frameworks for you, and you can see. And James, when you did that, uh, what did you see? And there was one: net45... net461. So, wow, 4.6.1, semi-modern. Here's the thing. Here's the thing, James. Remember when .NET Standard was first coming out? They were all like, don't worry, we'll let you consume .NET Framework assemblies, it'll be fine. It turns out the tooling really does not want you to do that at all. Uh, there are terrible hacks you can do, uh, to force it to accept that DLL. But you're really playing with fire, because obviously it wasn't tested on whatever platform you're hacking it to work on.
James (24:20): And on the GitHub issues, there was one that says ".NET Core support," and they replied, and they said, you know, there are a lot of dependencies, and those dependencies are the JAR conversions, and they just aren't there. Frank (24:33): Man. I said, okay, I'm mad. I'm mad to hear it. They all depend on left-pad, and left-pad doesn't exist anymore. So no one wants to rewrite left-pad. It's a, it's a treasure trove, if you will, of diving James (24:46): deep into dependency-conversion shenanigans. Right. Frank (24:51): You know, I get accused of reinventing the wheel a lot, and that's because I reinvent the wheel constantly. But I do it for a reason, people, because, yeah, you can't trust anyone in this world. So here's what I said, Frank. I said, what if I get rid of running the machine learning myself in my Azure Function? Right? 'Cause that's what I thought: what if I get the Azure Function to call something, where I don't do that work? Yeah. Okay. This is fair, because now you have the whole world opened up to you, because pretty much everything is going to run on .NET Core. Uh, I'm a little bit sad, though, to be honest, because I definitely, a million hundred percent, prefer running my ML stuff on device. And I think a part of that is just the engineer in me. I just find it kind of wasteful, from an energy perspective, to use radio communications to talk to a server a continent away to transfer — you know, I'm holding a CPU in my hand; that CPU can do it. So I think the engineer in me just has a fundamental hatred of the web. But I still think you came up with a good solution, honestly. So I went to the one place that I know has everything: the cloud. And if they don't have it, there's probably a radio button or a check box that you missed, or a dropdown. It's very true. James (26:19): So there's a thing that I've used for a long time — I didn't use this API.
I've used other ones in it, which is Azure Cognitive Services. We know Cognitive Services — you know Cognitive Services, right? Frank (26:30): We love Cognitive Services. Actually, I do. Um, because as these machine learning problems kind of get solved — you know, like sentiment analysis for US English, kind of solved — yeah, why not use a magical cloud service that just gives you magic answers, if you trust the AIs not to overthrow us? James (26:53): That is true, but that is what I did, because I do trust them not to overthrow us. Um, so they have text analytics inside, and there's a bunch of cool stuff inside of it. I recommend people — I'll put links in the show notes — but if you go to the Azure Cognitive Services website and click on Text Analytics, you can do a bunch of stuff, including sentiment analysis. And this is really cool, Frank, because you give it any string. It doesn't matter how long, it doesn't matter how big, and what it will do is analyze the entire glob, but it will also analyze every sentence inside that glob for you, which means you can go do whatever you want. Frank (27:33): I love it, because if we ever get to the "what would Frank have done" section of this — I'm just kidding. Um, yeah, the model I would have made would have kind of given you that too. James (27:44): Yeah. But it also gives me three data points, Frank — positive, neutral, negative — and an overall ranking. So if I just give it a string, it will give me what it currently is overall, but it'll also give me those three data points. And in fact, on the website you can just do it directly: put in any string, and it'll give you back a very fancy bar chart with those three values, which is exactly what I'm going to put into the app. Frank (28:13): Uh, yeah, I think you should still do fancy things. Like, uh, what I was saying was, uh, incorporate time into it.
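For reference, the Text Analytics v3 sentiment endpoint takes a `documents` array and returns a per-document label plus per-sentence labels, each with positive/neutral/negative confidence scores — the three data points James describes. A hedged Python sketch of building the request body and unpacking a response; the sample response here is hand-written to match that documented shape, not a real API reply, and a real call would POST this JSON to your endpoint with an `Ocp-Apim-Subscription-Key` header:

```python
def build_sentiment_request(texts, language="en"):
    """Body shape for POST {endpoint}/text/analytics/v3.0/sentiment."""
    return {"documents": [
        {"id": str(i), "language": language, "text": t}
        for i, t in enumerate(texts, start=1)
    ]}

def summarize_response(response):
    """Pull out the overall label and per-sentence labels per document."""
    return [
        {"overall": doc["sentiment"],
         "sentences": [s["sentiment"] for s in doc["sentences"]]}
        for doc in response["documents"]
    ]

# Hand-written sample in the documented v3 response shape (illustrative only).
sample = {"documents": [{
    "id": "1",
    "sentiment": "positive",
    "confidenceScores": {"positive": 0.96, "neutral": 0.03, "negative": 0.01},
    "sentences": [
        {"sentiment": "positive",
         "confidenceScores": {"positive": 0.96, "neutral": 0.03, "negative": 0.01}},
    ],
}]}
print(summarize_response(sample))
```

The per-sentence breakdown is what makes the "analyze every sentence inside the glob" behavior usable: one request, and you get both the overall verdict and the sentence-level detail.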
So maybe multiple charts for different times of day, stuff like that. Once you get data, it's hard not to be a bit nerdy about it and try to extract value from it. So, just like I was saying: even though you have positive, neutral, negative, before, you could have constructed a neutral just by 50% positive, 50% negative — you know, that statistic could have come up. But it's really cool that, uh, Cognitive Services is doing that. I think I would have preferred — I think I do prefer — the five categories. Like, I don't know, what does neutral mean? Like, I think we need to do specific Hanselman categories here. Like, he's talking about his blog, talking about this, talking about that. That would be kind of fun stuff for me, too. But this is an excellent first start. I'm just going to say that, 'cause I know you're going to be working on this app forever, so you're just going to be taking feature requests forever, so it's fine. But yeah, I like it. I like it a lot. James (29:20): Yes, it will keep going forever and ever and ever. And inside of here, what's really nice, I will tell you this, is that in the text analytics you can do a few things — these are different APIs. You can do language detection, like, what language is this. You can do key phrases, which I think are really cool, so it can pick out some key phrases that Frank (29:42): it thinks are important to those categories, basically. Um, it'll also — this is amazing, by the way — you can send it any string and it will give you back named entities. So the default example is like, "We went to Contoso Steakhouse located in Midtown NYC last week." So a named entity is Contoso Steakhouse, which would be an organization, and then Midtown NYC is a location. There's date-time, there's person. It will even — this is amazing — give you PII entities, which is personally identifiable information. So that's really, really cool. It is really, really cool.
I was amazed. But, um, okay, so it sounds to me like they're running a lot of models on this — not just one magical robot model here, but really just throwing everything at it. Um, I really like those named entities, because from that you can start doing some kind of tagging or hierarchical analysis to help people, like, browse a website, do that kind of stuff. Frank (30:45): Don't do a word cloud — those are so 2005 — but I think you can do a lot with those named entities; that sounded especially good. And that first part that you were talking about sounded a bit like summarization, which is another very classic problem in natural language processing: given a hundred pages of text, you know, give me one page — summarize it. And it's always fun to see what the robots decide is important, you know, what's important, what's not important. And they did a good job. I really enjoyed it. So I took some of Hanselman's tweets, put them in there, and did the whole thing, and yeah, it turned out pretty cool. I mean, I really like how they visualize it on the website and how you can go through it. And, um, here's the cool part, Frank: free. It's just completely free. Frank (31:33): It's always free with conditions, James. There's no free lunch in the world; we all know that phrase. So I assume there's got to be some kind of rate limiting or throttling or something involved, you know. So you get 5,000 transactions per month. So that's the — okay, yeah, if you're an app developer, that's probably plenty. Um, but I actually have paid for cloud services where I've hit all the caps, and then I do perhaps naughty things and cache the results, just for a short time, just to help me get under that number a little bit. Um, but it sounds like you'll probably be doing that anyway, 'cause you're going to do all of this through an Azure Function. So you're going to have an Azure Function call — what do they call this? — the Text Analytics API
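Frank's cache-the-results trick for staying under the 5,000-transaction free tier can be as simple as a TTL cache in front of the API call. A hypothetical sketch — `fake_api` stands in for the real Text Analytics call:

```python
import time

class SentimentCache:
    """Cache sentiment results for a while, so repeated lookups of the
    same text don't each burn one of the 5,000 free monthly transactions."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # text -> (expiry_timestamp, result)

    def get_or_compute(self, text, compute):
        now = time.time()
        hit = self.store.get(text)
        if hit and hit[0] > now:
            return hit[1]            # still fresh: no API call spent
        result = compute(text)       # cache miss: spend one transaction
        self.store[text] = (now + self.ttl, result)
        return result

# Demo with a stand-in for the real API call:
calls = []
cache = SentimentCache()

def fake_api(text):
    calls.append(text)
    return "positive"

cache.get_or_compute("hello", fake_api)
cache.get_or_compute("hello", fake_api)
print(len(calls))  # → 1: the second lookup was served from cache
```

Since James only analyzes original tweets (no retweets or replies), and tweets are immutable, even a long TTL is safe here — the same text will always score the same.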
whenever Hanselman tweets. And I'll only run it whenever — basically, if he replies to someone, I don't, you know, that's not a tweet that I care about. See, that's concerning, because I know for a fact he does more than 5,000 tweets a day. He tweets a lot. James (32:40): Well, so I figured it out, right? If it's 5,000 transactions per month, that would be about 150 tweets a day, and I don't think so. So I don't include retweets right now, and he retweets a lot. Um, no judgment. Yeah. And I looked at his day — in fact, what I did: from the last 24 hours, um, I believe it was 30 total tweets, but only seven of those were actual tweets, not retweets or replies or things like that. Frank (33:13): Yeah. Yeah. I'm sorry, I was trying to copy one of his tweets in here to see what the sentiment analysis is, but Twitter for Mac is being a real bear, James, a real bear. I can't select this gosh-darn text. But what do you think — the Beyoncé tweet, is it going to come up as a hundred percent positive? Oh, that's a good question. So let's roll the dice. Give me a guess: "My seven-year-old asked: is Beyoncé Marvel or DC?" James (33:45): Mm. So that's his favorite tweet, the one that's there. Um, Frank (33:49): neutral, neutral. Let's see. Rolling the dice. Progress bar. Oh, that's — ooh. Oh, ooh. Split decision, James, split decision. 47% positive, 52% neutral. Wow. If you're going for binary — yeah, if you had to make a decision: neutral. James (34:12): Yeah. Everything else was neutral, but then "Beyoncé" was the positive. Frank (34:16): Yeah, I'm assuming so. They should do some kind of highlighting so you know exactly which word. It was a hundred percent confident that it was English, though, so: nailed it. Yeah. Good. I kind of want to put my blog in. Have you tried your blog? Let's put James's blog in. Oh gosh. James Montemagno, man, you know.
James (34:37): Well, that's the thing that kind of gets crazy: you can just start to think about it. In fact, why wouldn't you, before you write a blog, go put that into this website and analyze it? In fact, I mean, why doesn't every blog platform, or even why doesn't GitHub, have it? Like, here's how positive or negative every single one of your responses on a GitHub issue is. There's so many great real-world use cases for this, because if I start writing something and it comes off negative, maybe I'll reword that. Frank (35:08): Yeah, I kind of want it for every sentence. You know, there's probably a VS Code extension that does it. We should look into that. If there isn't, we should create it: highlight every sentence with its positivity or negativity. I kinda love that. James (35:23): We should tweet at Nat and recommend this feature to Nat, because imagine how cool it would be as you're responding on GitHub, and then over every sentence it puts it in shades of green and red as you're typing it. Frank (35:38): That'd be amazing, James. I think you broke the service, I pasted your blog in and it's just, it's just not working anymore. I think you've killed it. I don't know how big that blog is. Oh — 96% positive. Pretty good, pretty good, that's pretty positive. 3% neutral, and I have to work on that neutrality. Not bad, though, not bad at all. That's pretty good, positive all day. It's a little bit of a cheat, though, because you said IPA, and I got the feeling that it's going to associate IPA with positive feelings. James: That makes sense. Frank: How smart is it? It's fun, isn't it fun? James: It is, especially with graphs. I do like graphs and charts, and I like that it breaks it down. And, well, to me — and I know that there's probably a lot of other services out there.
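The green-to-red highlighting the hosts imagine could be as simple as mapping each sentence's positivity score to an RGB shade. A toy sketch — the sentences and scores here are made up, not output from a real analyzer:

```python
def shade(positive_score):
    """Map a 0..1 positivity score to a red-to-green RGB shade,
    like the per-sentence highlighting imagined for GitHub replies."""
    s = max(0.0, min(1.0, positive_score))  # clamp to the valid range
    red = int(255 * (1 - s))
    green = int(255 * s)
    return (red, green, 0)

def highlight(sentences_with_scores):
    """Pair each sentence with its display color."""
    return [(text, shade(score)) for text, score in sentences_with_scores]

# Hypothetical per-sentence scores for a draft reply.
doc = [("I love this idea.", 0.95), ("This part is broken.", 0.10)]
colored = highlight(doc)
```

An editor extension would only need to feed each sentence through a sentiment API and apply the resulting shade as a background decoration while you type.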
Frank (36:27): I'd be fascinated to see what our listeners use, or if they would have been like, you know what, James, you should have created your own model. James: But you know what? I don't want to create my own model. Frank wants to create his own model, that's what Frank wants to do. Frank: Uh, okay, but on that topic, let me give you a real quick — um, things have gotten easier in training your own specific model. So let's say you actually wanted to train a Hanselman-specific model. You would have to label 10,000 tweets, you know, an ungodly number. It would be ridiculous, you wouldn't do it, you'd get tired of it and it wouldn't happen. Um, but fortunately there is a tool, and it's called transfer learning. So you would take a model that's already been trained generally on English, you know, that can give you a general sentiment, like positive, negative, whatever. Frank (37:18): And then you fine-tune that model to answer specific questions, uh, specific to, let's say, Scott Hanselman. And the nice thing there is, instead of 10,000 data items, maybe I only need 400 or 200, it really depends. And so I think that kind of would have been the fun tack, uh, to do something a little custom, a little, uh, what do you call it? Artisanal. Uh, because it is a pretty good problem, this natural language processing. It's not a solved problem, no matter what I kept saying during this podcast, but we do have really good tools for it. And transfer learning, oddly enough, is a little bit new to the natural language world, but there are good libraries out there, uh, one called fast.ai. You can Google that, fast.ai, and they have nice little tutorials on how to spin up your own custom models for specific scenarios. James: Yeah, I think that's what would have been really cool, because tweets are fascinating. They're less of a "here's a blog and here's a whole thing."
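The transfer-learning idea Frank describes — freeze a generally trained model, then fit a small piece on a few hundred domain-specific examples — can be shown with a deliberately tiny sketch. Everything here (the word weights, the one-parameter head, the labels) is a made-up illustration of the concept, not fast.ai's actual API:

```python
# "Pretrained" general-English word weights, kept frozen during fine-tuning.
GENERAL_WEIGHTS = {"love": 2.0, "great": 1.5, "hate": -2.0, "bug": -1.0}

def base_score(text):
    # Frozen base model: sum of word weights learned elsewhere.
    return sum(GENERAL_WEIGHTS.get(w, 0.0) for w in text.lower().split())

def fit_head(labeled, steps=500, lr=0.05):
    # Fine-tune only a scale and a bias on top of the frozen base,
    # using plain gradient descent on squared error.
    # Labels are +1 (positive) / -1 (negative).
    scale, bias = 1.0, 0.0
    for _ in range(steps):
        for text, label in labeled:
            x = base_score(text)
            err = scale * x + bias - label
            scale -= lr * err * x
            bias -= lr * err
    return scale, bias

# In this hypothetical domain, "found a bug" is good news (it got fixed!),
# so a couple of examples are enough to shift the frozen base's output.
domain_data = [("found a bug", 1.0), ("hate this", -1.0)]
scale, bias = fit_head(domain_data)
```

The point mirrors Frank's: because the base already encodes general English sentiment, the fine-tuned part only has two parameters here — and in real systems, far fewer examples than training from scratch would require.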
They're sometimes context-based, or, you know, there's a list of things where it's like, okay, that doesn't — James (38:36): there's not a negative or positive thing there, mostly. So I think that would have been really cool: if I created a website or an app or whatever that would have read in all of his tweets, then you could go in there and, you know, it could already analyze some things, but you could go and correct it. So you could add to that model inherently, and as it goes, right, as you're going, you are creating a custom Hanselman model over time by ranking those. And you could crowdsource that in general, too. Um, that'd be kind of cool. And in fact, you could crowdsource whether it's neutral or positive based on what other people think. Frank (39:12): Absolutely, yup, yup. Crowdsourcing is pretty much what you have to get into with any large data science problem, because, well, if you're starting from scratch, you need ridiculous amounts of data. If you're able to do the transfer learning trick, it's more single-person doable, for sure. But yeah, it'd be nice if, like, this one, the Beyoncé one, came up "joke, 50% funny." Like, rate the funniness of his joke: first say it's a joke, and then rate it. That would've been good. James: Oh man, yeah, I have so many ideas, that's the problem. Frank: So I'm actually — I keep throwing all these things at you, but I'm actually just happy that you restrained yourself, found a good solution that works for you, that's going to get done. James (39:58): So it's in a branch on GitHub, it's totally getting ready to go. So I have an Azure function that will update his tweets. Right now it just computes, from the last 24 hours, what is the sentiment. So it's a rolling sentiment analysis from the last 24 hours. I haven't decided what I want to do.
I thought about logging every day over time so I could see a bar chart and graph of it. Um, right now I just have a rolling 24 hours, and then I have another endpoint where I can retrieve that data back. Frank (40:28): Yeah, okay. Um, I like the idea of breaking it up into time segments at least, so you can see progression over time. I think that's just interesting, because I think we all have moods, and, just me personally speaking, I would love to see my sentiment-over-time graph. I'm sure it's terrible. I think I'm really just projecting here, I just want you to write a feature for me. James: You can do that, for Xamarin.Forms. Frank: Exactly, I think so. It should just be a control, a nice .NET Standard control. James (41:03): Mm, um, yeah, I think so, that'd be a nice one. I think that's what I wanted to do: it'd be cool to have the current — like, here's Hanselman's current, you know. I think I could do the last 24 hours, but I could also maybe do the current day too, and be like, here you go, here's the current day, and over time have this chart, and then you can group it by month and by whatever, right, and granularity. Frank (41:27): Yeah, honestly, just a sentiment per day would be plenty. I think that would be nice. Could have it as a calendar. James (41:34): Right, a dot on the calendar. Activity! Frank (41:38): A calendar as progress: he needs to achieve a certain level of positivity every day. James (41:44): That'd be pretty good. That's what I want this app to be: useful to Hanselman too, if he can look at it and be like, oh, how are my tweets today? Oh no — Frank (41:53): improve a little bit, oops. I think this is a good app, man. I think you should write it and sell it. James: Free with ads. Frank: Free with ads, with ads. Oh, the wonderful modern world. Yes. Good job, we did a machine learning episode, and it was all your fault, this is not on me. I love it so much.
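James's rolling 24-hour window and Frank's sentiment-per-day calendar both reduce to simple aggregation over timestamped scores. A minimal sketch, with hypothetical (timestamp, score) pairs standing in for analyzed tweets:

```python
from datetime import datetime, timedelta

def rolling_sentiment(scored_tweets, now, window=timedelta(hours=24)):
    """Average the positivity of tweets from the last 24 hours."""
    recent = [s for t, s in scored_tweets if now - t <= window]
    return sum(recent) / len(recent) if recent else None

def per_day(scored_tweets):
    """Bucket scores by calendar day, for a chart or calendar view."""
    days = {}
    for t, s in scored_tweets:
        days.setdefault(t.date(), []).append(s)
    return {d: sum(v) / len(v) for d, v in days.items()}

# Hypothetical analyzed tweets: (timestamp, positivity score).
now = datetime(2020, 3, 20, 12, 0)
data = [
    (now - timedelta(hours=2), 0.9),
    (now - timedelta(hours=20), 0.5),
    (now - timedelta(days=3), 0.1),  # outside the 24-hour window
]

rolling = rolling_sentiment(data, now)  # averages only the first two
daily = per_day(data)
```

Logging one `per_day` entry per run would give exactly the month-by-month, granularity-adjustable chart James describes, without storing every raw tweet.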
James (42:18): I'm glad that you could talk me through it. In fact, that's why I thought that, you know, talking through those three different scenarios that had kind of gone through my mind — here's why I think I was, you know, discouraged from using that solution even though it was something viable. They each have their pros and cons, right? I think that was kind of fun to talk through. Frank (42:38): Yeah, and honestly, it happens all the time. Um, I said I was comfortable with neural networks, but I constantly run into libraries that I just fail to execute, straight up just can't get the thing to run, you know? And it's, well, it's humbling. You think you're good at computers and then you can't get a bucket of Python code to run, and you're like, well, I guess I'm not good at computers. James (42:59): Oh no. Oh no. So yeah, you know, I know that feeling. Here's something that I was really fascinated by. It's not machine learning at all, but I'm running these in Azure functions where I'm talking to blob storage a lot. But I was doing it on my Mac, and I was running the function and I kept getting an exception. And I told you that one, I was amazed at the new Visual Studio for Mac — I'm in the preview channel, 8.6 — it has a built-in terminal, so everything's just running inside, and it's amazing. Um, it's a cool little terminal, too, with syntax coloring, or, you know, terminal colors. I saw it because the Azure function ASCII art came up all colorized, and I was like, Ooh, I love it. Um, that was cool. Frank (43:44): Yeah, I can't believe we've gotten along this many years without it, but thank you. You don't realize it's missing until it's gone. James (43:52): That's very true, that's very true. 'Cause I'm so used to just running everything by itself, or popping up in a new window and then you lose track of that window. And now you just hit debug, it's there, boom.
It's beautiful. Um, now on top of that, though, I was using the Azure storage stuff. And the problem is that on Mac it's not built into the SDK when you install Visual Studio for Mac, because there's no Mac or Linux version of it. But there is an open-source version of it that you can get via npm. It's called Azurite or something like that, I don't know. And it's on the documentation site, and it totally works. You can have the full Azure Storage Explorer, all the stuff, on your Mac right there. And it blew my mind, it's awesome. Like, built into the Finder, like mounted, set as a volume. Um, so what it does, yeah, basically you run it and you tell it where you want the files to go, and it's just right there in [inaudible]. Frank (44:51): Yeah, which is cool. It sounds like it's mounting it. Um, I knew they had something like that. Gosh, yeah, I'm sorry, I'm just drifting off because I did use some kind of file system emulator that they had before, and I'm really just questioning if this was it. James (45:09): Well, it was — so what I installed on top of that is the Azure Storage Explorer, which allows you to connect to your Azure accounts and explore them, and your local ones. I don't know if that thing just installs Azurite for you anyways, or maybe they had some other hybrid thing, but now when you have the Azurite thing installed and the Azure Storage Explorer, you can just see all the files alongside your Azure files, all in one, which is really cool. Frank (45:34): Yeah, I love that, because, um, as we all know, I'm using SQLite as my database on several websites, and being able to grab that little SQLite file as a backup is quite a convenient little feature. So I love it when these remotes actually mount properly and I'm not using — oh man, I'm really down on the web today.
I was going to say, I'm not using a web app to access it, but I'm not going to say that. Not gonna say it at all. James (46:01): Well, you can install this, and that'll happen, you'll be good to go. So boom, perfect, awesome. All right, we did it. We did it, Frank: more machine learning for everyone to enjoy. Frank (46:11): It's just an avalanche. You let the little pebble slip, it knocks a bigger pebble, that knocks a bigger pebble. Love it. James (46:21): All right, Frank, well, we're going to get outta here. I gotta go, but I hope that you stay healthy, wash your hands, and don't go outside. And until next week — Speaker 3 (46:32): This has been Merge Conflict. I'm James Montemagno. And I'm Frank Krueger. Thanks for listening.