Steve McDowell 0:05 Welcome to a special edition of DataCentric from Moor Insights & Strategy. I'm Steve McDowell. Today we're going to talk about architecting your data center in the face of an uncertain future. Data is the lifeblood of every organization. For business-critical applications to function, data needs to be protected and data needs to be available. The challenge we find ourselves with today is that the world seems to throw continuous disruptions at the very IT organizations tasked with keeping every enterprise's data safe. External forces, whether a pandemic, a fire, a hurricane, or even an errant backhoe. Bad actors who spend their nights keeping your cybersecurity teams awake. Any one of these can disrupt even the most resilient of plans. IT organizations that embrace fast and flexible data center architectures are the ones that find themselves much better poised to face these disruptions. They demonstrate that they understand the value of their data, the value of leveraging fast and flexible data paths, and of deploying resilient enterprise storage solutions, all designed to deliver data wherever it's needed most. Data, after all, is the foundation of everything that IT does. So today, to help us understand the importance of responsive and flexible storage solutions, we're joined by a very knowledgeable set of industry veterans. From the Brocade storage networking division of Broadcom, we have Brian Larsen and AJ Casamento. And from the IBM storage division, we have Brian Sherman. Welcome, gentlemen.
Group 1:45 Pleasure.
Steve McDowell 1:49 Let's talk about cyber-resilient architectures: the ability to weather the storm. If we look at just the past year, we have seen a global pandemic, a historic hurricane season, earthquakes, and forest fires. And while it seems like it all landed at once, the reality is that these sorts of events happen continuously. And they disrupt everyone's plans, especially IT's. So how do we think about data center architecture in a way that's responsive to these sorts of unanticipated events?
Brian Larsen 2:20 So Steve, this is Brian. I'll jump in on that. This cyber-resiliency term may be a little bit new, but it's the same but different. If I look at the definition from one of the groups in the industry, they define it as: "cyber resilience represents the recent shift in thinking from attack prevention to preparing for the eventuality of failure. And it focuses on how organizations can continuously operate their business and deliver the intended outcomes despite adverse cyber events, whether malicious or inadvertent." So what does that mean? I think it boils down to what we've talked about forever: business continuity and disaster recovery. How do you recover your data from a known good source? And that is something that Broadcom and Brocade have been doing with IBM for years. We've had one of the longest relationships in the industry, and I'm going back almost three decades here with IBM and some of the solution sets we've done together. It's all been about moving data to the right locations. Whether it's in the room next door in your data center or across the world, you need to protect that information. And then, how do you recover from it? Those are the kinds of things we're talking about with cyber resiliency, Steve.
Steve McDowell 3:52 With storage systems, you're continuously moving data around. The books have to match at the end of the day. So when we look at the challenges of cyber resilience and storage, what are the sorts of things we want to see within the infrastructure?
Brian Larsen 4:06 Well, there are three key elements we need to think about here: data security, data protection, and data recovery. Those three elements have to be put into place for the entire scheme. And depending on what your recovery point objectives and recovery time objectives are, that all translates into how you architect: where you're going to move your data, how you're going to move your data, and onto what type of storage you're going to move your data. The good news is, there's a lot that IBM has brought to the table recently with its latest platforms, which encrypt the data starting from the endpoints and continue all the way through the solution, to either a disk array, or from a disk array onto a virtual tape subsystem. And it does it seamlessly, and, oh by the way, it starts to reduce some of the overhead associated with moving that data. So one of the things we should look at very closely is the latest solutions from the IBM z15 to the DS8900 platform, where they have implemented a capability called Fibre Channel Endpoint Security. With no overhead to the data movement, it moves the data encrypted from the processor to the endpoint of the DS8900. That's one start. And from there, it can move out to any other location, from the DS8900 off to, say, the TS7700 in a Transparent Cloud Tiering solution. That data is now encrypted and compressed, and you have true end-to-end connectivity for data movement. Now, from a Brocade standpoint, as the infrastructure people, we've provided the interconnect at very high speed, so we're never part of the congestion issue within the network or within the fabric. We also added, a number of years ago, compression and encryption of the data in flight. So we've had that for a number of years to help augment this, and now IBM has created the additional security level where the data stays encrypted as it moves and as it's received. The good news is, our infrastructure supports both the old and the new. Take endpoint security, for instance: if you have encrypted data across the SAN, that's fine. But if you don't have encrypted data, we can encrypt it for you. So you can have a hybrid model here of old to new, migrating into a fully encrypted environment while keeping your same infrastructure in place. Those are some of the things we're providing in concert with IBM, so that people have an investment protection scheme in place today and can look toward the future of what they want to do and how to deploy it.
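Conceptually, in-flight protection of this kind has to compress before it encrypts, since well-encrypted bytes look random and no longer compress. A minimal Python sketch of that ordering — purely illustrative, not the fabric's actual implementation, and assuming the third-party `cryptography` package is installed:

```python
import os
import zlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def protect(payload: bytes, key: bytes) -> bytes:
    # Compress first: encrypted data won't compress afterward.
    squeezed = zlib.compress(payload)
    nonce = os.urandom(12)                       # unique nonce per message
    sealed = AESGCM(key).encrypt(nonce, squeezed, None)
    return nonce + sealed                        # ship nonce alongside ciphertext

def unprotect(wire: bytes, key: bytes) -> bytes:
    nonce, sealed = wire[:12], wire[12:]
    return zlib.decompress(AESGCM(key).decrypt(nonce, sealed, None))

key = AESGCM.generate_key(bit_length=256)
block = b"business-critical records..." * 64
assert unprotect(protect(block, key), key) == block
```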
Steve McDowell 7:27 Okay, I heard you say a few things. It's about redundancy. It's about integrity. And it's about intelligence embedded within the infrastructure. Now, you talked about it at the transport level, within the fabric itself. But let's raise it up a level. Brian, what is IBM doing within the storage stack, both to work with the intelligent fabric and to deliver the right level of cyber resilience within the storage system itself?
Brian Sherman 7:55 Across the portfolio, we have various cyber-resiliency solutions. Brian talked about the DS8000 a little bit; there we have Safeguarded Copy, which lets you keep multiple copies on a DS8000 with either a logical air gap or a physical air gap — different things we can do there — with the idea of quick rollback. The safeguarded copies are fully immutable copies maintained on the DS8000, on either the DS8880 or the DS8900 platform that Brian mentioned. And if we move outside of the mainframe space, there are things we've put into our Spectrum Protect portfolio around being able to detect when something happens. So if all of a sudden we see backup rates go through the roof, it's like, "hmm, something dramatically changed in the environment." It's there to help people detect that something is going on in the environment. And we do things on the distributed platform where we can take point-in-time copies and put them into an immutable file system, or up to an object store that's immutable. So, lots of different things across the portfolio that we're doing around cyber resiliency itself.
Steve McDowell 9:25 So it really is a ground-up story. We're protecting bits on the wire, and we're protecting bits within the storage itself. One of the most impactful trends of the past decade has been the adoption of autonomous systems. Systems are becoming self-learning, self-optimizing, self-tuning, and even self-healing. Help us understand what autonomy means in the storage world and how these capabilities are benefiting IT.
AJ Casamento 9:55 So one of the points Brian Larsen made was that when you're looking at these solutions, you have to be thinking about the impact on the applications, because let's face it, we're all in this for the application. That's what customers notice. That's what people see; that's what makes the phone in IT ring, right? Somebody's unhappy with the performance of an application, or there's some outage in that space. So I would pose the question: as we move forward in the all-flash data center, as we start talking about Non-Volatile Memory Express — both a new language for talking to storage and a new level of performance — or storage-class memory as it comes into the systems; when we start talking about going up in speeds, as we move from our Gen 6 into IBM's recent announcement of Gen 7, and we're talking 64 gigabits per second per port and so on: how are humans fast enough? That's one of the big challenges that exists. So, yes, there's the time-honored image people have of network operations centers, where you have dozens of people sitting in a half-lit room staring at big screens, waiting for a light to pop somewhere so they can jump on it and begin to investigate what's going on. It's a nice image, and there's still a place for that at a higher level. But the response times aren't there; you can't all of a sudden have a red blip on the screen and then the human begins to investigate. So the autonomous SAN concept for us is, one: how about we learn the environment?
We're the fabric. We're in the middle, between the application and the data. And Brian Larsen also mentioned recovery time objectives and recovery point objectives. Those are not really deep conversations to have. A recovery point objective is a simple conversation with an application owner that says: hey, how much data can your application drop and you're still happy? And pretty frequently, for mission-critical applications, that answer is zero. The recovery time objective is a similar kind of conversation: hey, what's the opportunity cost of your application being offline? And sadly, that's not a constant, because if you talk to anyone in the payment card industry, for example, holiday seasons are not times when they're happy with any kind of disruption, or outage, or slowdown. So there are varying levels of expectations in that environment. So: learning the environment, figuring out what normal looks like. To Brian Sherman's point, you want to trigger on: hey, there's a weird amount of data movement going on; that may be ransomware, or copies running in the background. And how do you know that if you don't know what your normal looks like? So, self-learning for that. The self-optimizing piece: one of the things that's also true in these environments is that customers have a mix of technologies on the floor. Very few people get an opportunity to build what we term a greenfield data center, where you start from cement and work your way up. And even when they do get to migrate into a place like that, pretty frequently there are multiple applications getting carried forward that are still riding on older platforms, and those mixes of technologies are challenging. So we need an environment where we can optimize those flows based on the performance of the consuming platforms. Stop congestion. For us, it's a pretty simple conversation: congestion is when frames are entering the fabric faster than they're leaving. So if the consuming device has a threshold, let's feed it at that threshold; let's not overdrive it. The self-healing piece, similarly: when we detect a badly behaving device, or a device that's in trouble, or a link that's in trouble, we should act to mitigate that problem at hardware speeds, in the fabric, rather than just popping an alert and sending a message to a human, who's then going to begin trying to drill down and see where that problem is. Yes, that person may still need to go review what happened. But do you really want to hold up ticket sales, or credit card charges, or pick your favorite nightmare scenario? You made the point in your introduction: the books have to match. Banks have no sense of humor about the set of books on site A not matching the set of books on site B. They get really, really feisty about that kind of stuff. And so that's why, as the technology gets faster, these changes have to happen. Autonomous has to be a mechanism that people employ.
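AJ's congestion rule — frames entering the fabric faster than they leave — amounts to pacing senders at the consumer's drain rate. A minimal token-bucket sketch of that idea; the class and the numbers are illustrative, not Brocade's implementation:

```python
import time

class Pacer:
    """Admit traffic no faster than the consuming device can drain it.
    Congestion, per the definition above, is input rate > drain rate."""
    def __init__(self, drain_bytes_per_s: float, burst_bytes: float):
        self.rate = drain_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes              # start with a full bucket
        self.last = time.monotonic()

    def try_send(self, frame_bytes: int) -> bool:
        now = time.monotonic()
        # Refill at the consumer's drain rate, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if frame_bytes <= self.tokens:
            self.tokens -= frame_bytes
            return True                        # device can absorb this frame now
        return False                           # hold it: sending would congest

# e.g. a device that drains 400 MB/s, tolerating 1 MB bursts
pacer = Pacer(400e6, 1e6)
```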
Steve McDowell 14:42 So it seems like we've been talking about the autonomous data center for decades. It's been a slow take, but autonomous infrastructure is quickly becoming part of every data center design. I think part of the challenge in driving acceptance has been a question of trust, right? Humans don't like to cede control to their machines. How do we know that we can trust all of this autonomy that's happening? And how far out of the loop are humans, really, when we start looking at things like self-optimizing and self-healing systems?
AJ Casamento 15:20 Well, that's a fair question, and it's a really good point, because you can think back not all that long ago, to when the first network management software started showing up. It would spot a heavy-traffic area and go, "we'll load-balance" — and it would create a congestion point on the other side, and then flip it back. There was all sorts of fun going on. But one of the interesting things, particularly in the joint portfolio between the IBM storage solutions and the Broadcom technology, is that we've got a very long track record; Brian Larsen mentioned multiple decades of learning in these environments. And humans are still at the top of the pyramid, in terms of reviewing whatever the action is. We may mitigate a problem temporarily at two in the morning; the human may look at that mitigation later in the day and say: you know, maybe if I rebalance some workloads from these applications across these storage devices, I could avoid those kinds of challenges in the future. But that works because you've got the baseline from the self-learning; then you can do that, because you know what those patterns look like. I'd like to tell you that it all just runs steady-state, because when we talk about sizing, everybody talks as if everything runs flat. Come on, seriously: whose environment does that? No environment does that. It might be that every second Wednesday at 9 p.m., these two applications kick off and cause a contention problem. Unless you've got some history with it, some track record, how will you know that that's normal — that maybe it peaks a little bit, but you don't want to panic? So humans are still involved, and there's going to be a continued level of trust. But I'll tell you this, particularly about the self-optimizing pieces we're doing: because of the way Fibre Channel works, we know the devices that are attached, they auto-register with us, and we know what their performance capabilities are. So consider the concept that we can say: hey, this guy drives like AJ — he's the grandfather in the family car; you want to put him in this lane. This guy drives like Brian Larsen; he's a little bit more performant. And then we've got this guy over here who's a real high performer out on the street. We can sort those automatically, and we get rid of a lot of the mixed-contention problems.
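Knowing "what normal looks like" — AJ's baseline, or Brian Sherman's backup rates going through the roof — is, at its simplest, a rolling statistical profile of a metric. A minimal sketch with illustrative window and threshold values, not any vendor's actual algorithm:

```python
from collections import deque
from statistics import mean, stdev

class Baseline:
    """Learn what 'normal' looks like for a metric (e.g., backup throughput)
    and flag large deviations - the kind of spike that can signal ransomware."""
    def __init__(self, window: int = 288, z: float = 4.0):
        self.samples = deque(maxlen=window)   # e.g., 24h of 5-minute samples
        self.z = z

    def observe(self, value: float) -> bool:
        anomalous = False
        if len(self.samples) >= 30:           # wait for a usable history
            mu, sigma = mean(self.samples), stdev(self.samples)
            anomalous = sigma > 0 and abs(value - mu) > self.z * sigma
        self.samples.append(value)
        return anomalous

monitor = Baseline()
for rate_mb_s in (110, 105, 112, 108):        # steady state: learns quietly
    monitor.observe(rate_mb_s)
```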
Brian Larsen 18:09 Just to add on to AJ's comments: a lot of this has obviously been built and evolved over time. Self-optimization, auto-failover, all these kinds of things — many of them have actually been in the network for quite a few years, done behind the scenes. So, to your question about why humans should trust this now: the tool sets have advanced. Look at the amount of deep packet learning we do now with the evolutions of our ASICs, and our management tools like SANnav Management Portal and SANnav Global View. It's more analytical: we do deep packet inspection, we learn about the flows, and we take action against them, a lot of the time completely transparently to the user. They should trust it because we've been doing a lot of this within the fabric for years, and now we're exposing it and giving them even more analytical understanding of what's happening. Those all happen in concert with one another, and we've been doing it for quite some time.
Steve McDowell 19:21 And again, this is one of those areas that has multiple layers. So how much interaction is there between the intelligence embedded in the fabric and what's happening at the higher levels of the storage stack? Are IBM storage systems engaging with the fabric, if you will, to deliver a higher level of intelligence or a greater level of capability?
Brian Sherman 19:44 Yeah, let me walk through a couple of ways we're doing that from the storage side. Within IBM Storage Insights' AI capability, we're collecting information from Broadcom and Brocade. And if we need to go deeper than that, SANnav from Brocade comes into play as well. We've been talking about self-learning the environment, and that all gets tied into understanding the performance flows: again, what's normal within the performance environment. If we see deviations, we send a proactive notification that something is there. So there are ties between what we're doing collecting fabric information and then partnering with SANnav to get deeper into understanding the workflows and what's really going on. So yeah: a strong partnership, leveraging each other for what we can do in the higher-level software stack, which for us is called Storage Insights.
Steve McDowell 20:50 One of the things we've talked about is resiliency: the ability to continue to deliver services in the face of disruption, whatever that might be. And I want to stay a little bit on the self-healing piece. Can you give me examples of where the self-healing technologies you've talked about have made a practical, real difference?
Brian Larsen 21:12 Brian and AJ may have additional comments on this, but I'm going to take this from the angle of long-distance connectivity between two sites. The long-distance connections now are all IP-based, right? They're either one gig or 10 gig at this point in time. And to have a fully resilient path between two locations — I don't care if they're 10 miles away or 5,000 miles away — there should be redundant paths to that location, and they should actually be services from different providers, so that you keep the paths separate from one data center to another. An example of that not happening: quite a few years ago, one of the major airlines, the major airline here in Minneapolis, had a data center 10 miles south of its main terminal.
And they had a backhoe cutting and digging something, and it took out both paths — from different providers, but they came out at the same spot on the building. So you need to make sure that your pathing is separate, isolated from one another, and for the most part, you probably should look at different service providers. To that point, if you're going from point A to point B and they have different paths, our long-distance gear has been able to scale and logically combine the physical entities into one logical entity, so that the data flows across them on an equal basis. And if anything does fail — say one path fails — the network itself recovers without the application knowing about it. It may run a little slower, because you may be at half bandwidth, but the applications continue to run. That's just one example of resiliency within the network that we can provide. I don't know, AJ, do you have any more?
AJ Casamento 23:12 The thing I'd like to add: let's start at the foundation. One of the things that's different about Fibre Channel, when you look at the implementations we have with storage systems, is that we do dual, redundant, hardware-isolated fabrics. So where, for instance, Ethernet topologies all merge at the top core, we architect so that no single human with a single command line can take down both sides of the environment. Past that point, the next question is this: disaster tolerance is always a sliding scale of time, people, and money — or equipment, if you want to translate it that way — against how big a disaster you're trying to avoid. They call it the data center because it's the data that reigns supreme; it's not called the server center or the network center. The data sets have to be maintained; that's why the storage is so critical. So: are you trying to avoid a situation within the room, within the data center? Are you trying to avoid situations in the building? Is it a campus-wide thing? A metro-wide thing? Is it an interstate or intercontinental thing? One of the proof points I would offer: remember that power outage that, as memory serves, started in Ohio, swung up through Toronto, came down through New England, hit New York, and went down into Philadelphia? It made for really striking satellite imagery where you saw the whole Northeast blacked out, and everybody goes, "oh, I remember that — do you remember where you were?" Cool. Well, if you're in IT, those kinds of fun times are not fun times. One of the conversations that started happening immediately afterward was: guys, replication across the Hudson River from Manhattan to North Jersey is no longer sufficient. Because while the data centers themselves were still up and running — they have diesel generators and so on — nobody planned for all the infrastructure around them to melt down simultaneously. They couldn't talk to anybody. And you don't get to take down the payment card systems for the nation and not have people notice. All the folks who weren't in the Northeast were trying to pay for groceries with their debit cards, and it's not going through, and who's got cash? And are the ATM systems up and running?
So the kind of resiliency we want to talk about, the kind of disaster tolerance we want to talk about, basically has to encompass the concept that we can take a major outage and people not notice. And that's one of the things Fibre Channel has done an extremely good job of over the decades. For example, we know where all the paths are; we know the entire topology. So unlike other technologies, if we lose a link and there's a known good alternate path, we bring it live. There's no reconvergence of the network; you don't upset everybody, you don't disrupt the whole room. Why would you do that? It doesn't make sense, certainly not for storage. And I'll leave you with this thought, then let Brian Sherman chime in on the redundancy piece: there's a difference in the type of network we're talking about. You can walk up to any CIO, and they'll say to you: hey, Ethernet, Fibre Channel, storage, application — it's all good, it's all just network. Really? Okay, so I'm going to take away one of two things from your desktop system: your network connectivity or your storage drive. You pick. Invariably, they're going to pick the network connectivity, because they can still keep working locally. But if you take away the storage, the question becomes: what color is your favorite screen? And that's not a fun question.
Brian Sherman 26:59 Yeah, definitely. I was just going to add a couple of points from the storage side of the house, starting from the ground up. There are things we do with our FlashCore technology so that if we lose a specific chip inside the FlashCore module, we just restripe the data. Down at that lowest level, we don't fail the drive; we restripe the data across the other assets already in the FlashCore module itself. Then take that up a level: AJ was talking about data centers and resiliency, and as much as we do to put resiliency into the hardware and software, stuff is still going to break. That's why we have continuous-availability solutions — what we call HyperSwap — whether on the DS8000 or the Spectrum Virtualize family. That gives you high availability, continuous availability, locally, however people define local. But to AJ's point, when we had the power outage across the whole East Coast, that changed everybody's disaster recovery plans. So we see the majority of our larger accounts leveraging what we call three- or four-site redundancy. At site one, I've got high availability, continuous availability; if something takes out that East region again, I've got DR either on the other side of the world or across one of the different power boundaries in North America. So it starts at the lowest level, but it goes all the way up to three- and four-site redundancy and availability.
AJ Casamento 28:43 Just one additional item there: you're also seeing people change their thought process around testing. One of the things you never used to see was people actually testing the failover of applications between data centers.
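The value of dual isolated fabrics and multi-site redundancy is easy to see as back-of-the-envelope probability, provided the failure domains really are independent — the shared conduit in the backhoe story is exactly what breaks that assumption. The availability figures below are illustrative, not measured:

```python
# Independent failure probabilities multiply, so redundancy compounds fast.
fabric = 0.999                          # assumed availability of one SAN fabric
dual_fabric = 1 - (1 - fabric) ** 2     # two isolated fabrics

site = 0.995                            # assumed availability of one site
three_site = 1 - (1 - site) ** 3        # three independent sites

print(f"dual fabric: {dual_fabric:.6f}")    # 0.999999 - six nines vs three
print(f"three sites: {three_site:.7f}")     # 0.9999999 (approx.)
# If both paths share one conduit, the failures are correlated and none of
# this math applies - which is the backhoe lesson.
```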
And now what you see is a mindset that says: I'd rather test it on my schedule than test it when it actually happens. Because that's a whole different level of scary.
Brian Larsen 29:11 And we certainly see that more and more clients are doing the site switches, the toggles between data centers, so that it's not a test at that point; it's just normal business operations.
Brian Larsen 29:23 And to Steve's point earlier, a lot of folks are going out to the cloud. Brian, you've just recently — well, within the last year or two — released Transparent Cloud Tiering from the DS8900 and DS8880 out to the TS7700 as an object store, so you can create a hybrid model and move data directly from the DS8000 out to that object store with no overhead on the mainframe. Or you can go to a public cloud, right?
Brian Sherman 30:02 Yeah, absolutely. Probably back five years ago, when we started talking to everybody about Transparent Cloud Tiering and moving mainframe data to an object store, everybody looked at it like, "you want to do what?" But once you walk through it, it's: okay, yeah, we want HSM out of the data movement business. So, hey: tell the DS8000 to go move the data, and tell me when you're done. That's part of it. And if we're moving data from the DS8000 to the TS7700, we're using the grid links and getting tremendous performance doing that. It's always going to be faster leveraging the grid links on the TS7700 than going out over an IP-based network, whether to an on-prem or off-prem object store. So yeah, it's just part of what we've been doing for a while now.
Steve McDowell 31:04 Not every disruption is the result of a backhoe incident, or a cyberattack, or even a natural disaster. Part and parcel of delivering a fast and flexible data center is knowing where and when to embrace new technologies. In the past decade, we've seen the migration of hot data onto all-flash storage, and QLC is pushing all-flash down the storage tier. And now we're looking at technologies like NVMe over fabrics; NVMe over fabrics coupled with Fibre Channel Gen 7 seems like a big win. These are all technologies to help move data faster and with more flexibility. So my question to you, who talk to IT practitioners every day: what do you tell them about how to plan for the changes happening in the storage industry?
AJ Casamento 31:55 I'll jump in first, and then the other gents can fill in the gaps. Basically, one of the things that surprises many customers is that IBM has actually been shipping technology that will allow NVMe over fabrics for 10 years. Going back to the IBM b-type SAN Gen 5, we can actually carry NVMe traffic. Now, it's not as fast or as intelligent as the Gen 6 platforms that IBM has been shipping for five years, or the Gen 6 HBAs that are required, which IBM has also been shipping for five years — and Brian Sherman will tell you about the FlashSystem arrays that have been shipping for two-plus years now that are capable of this as well. So it's not revolutionary so much as evolutionary in the Fibre Channel space.
Because with a software update to the firmware and device driver on the HBA, making sure you're running the right version of software on the IBM b-type SAN switches, and then the right version — let's just say the current version — of controller code on the FlashSystem, you can already carry NVMe over Fibre Channel fabric traffic in those environments. So it's not rip-and-replace, and it does scale, and it does perform. And yes, there are other alternative technologies out there, and you see a lot of people talking about them. One of the wonderful things about standards is that you have so many to choose from. So in that space you're going to see other technologies, and the challenge I think most people will struggle with, near-term, is where they're going to get performance and scale simultaneously. You can take other technologies that require lossless Ethernet switches, that run their traffic on UDP rather than TCP, and that require RDMA-capable NICs — which means I have to wait until I get new adapters in my servers, and I have to rip and replace my Ethernet switches. You can do that, and you can run a performant environment, but you can't scale it the way you do TCP. Or you can scale the TCP side, which is a software-stack change, where there's literally — no exaggeration — hundreds of millions of dollars, over $300 million, of startup funding invested in trying to make TCP perform more like Fibre Channel in terms of lossless, low-latency, deterministic delivery of storage traffic internal to a data center. Which, by the way, can I just point out, was not the design goal for TCP? TCP was written for DARPA during the Cold War — the R&D arm of the Department of Defense. Two wildly different design goals; no one should be surprised that they behave differently inside the data center. So when we look at that, it's a very straightforward evolution for us, mostly software-based. And one of the really cool things we've seen recently — recently being within the last six months or so — is VMware vSphere 7, as an example. Now the IBM FlashSystem admin creates a namespace ID — in NVMe, the logical equivalent of a LUN in SCSI — and presents it to vCenter. The normal Fibre Channel management makes sure it's visible. And then, just in the vSphere tools, the vSphere admin chooses what to move to this new performance tier — without touching a cable, and, more critically I think, without stopping the application. You can get an 80% increase in IOPs without tuning. And one of the things that's always fun about application people is that they always figure out how to consume all the performance the infrastructure exposes. So for us, NVMe over Fibre Channel fabrics is an evolutionary piece. It's disruptive in the sense of: what will people do with it? And we're only in the first stages; everything I just described is just the entry-level conversation.
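The relationship behind that kind of IOPs jump is Little's Law: sustained IOPs ≈ outstanding commands ÷ per-command latency. A quick illustrative calculation — the queue depths and latencies here are made-up round numbers, not the vSphere test results:

```python
def iops(outstanding_cmds: int, latency_s: float) -> float:
    """Little's Law: throughput = concurrency / latency."""
    return outstanding_cmds / latency_s

# A single SCSI-era queue, modest depth, 500 us per command:
print(f"{iops(32, 500e-6):,.0f} IOPs")    # 64,000
# NVMe: deeper/more queues and lower latency move both terms at once:
print(f"{iops(128, 200e-6):,.0f} IOPs")   # 640,000
```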
There are features in NVMe, like the massive parallelism in command queues, such that once the hypervisors figure out how to assign a complete NVMe queue to every virtual machine, it's going to unlock a level of scale that's going to be bizarre. But it will take time.
Steve McDowell 36:36 The vSphere 7 support is relatively new, announced just this past summer. When we first started talking about NVMe over fabrics, there was a lot of energy around the idea of NVMe over Ethernet, and you've talked about some of the challenges there. But when I think about that, I cringe a little, because I think about my days with iSCSI and diagnosing storage and networking issues together. So I think an important point is that, as we look across the Fortune 1000, nearly every organization has an installed base of Fibre Channel. This is a technology we understand, we know how to use, and we know how to manage. So that makes NVMe over Fibre Channel a nice way to leverage existing infrastructure to get new capabilities without a huge new capital investment.
AJ Casamento 37:27 Just following on the Ethernet conversation there: there was a lot of conversation about how this was going to play out and what it was going to do. And what you see is that the prevalent deployment of NVMe over fabrics today is Fibre Channel, because it works today, it works at scale, and it performs today. And your point on the iSCSI piece is a very good bit of forecasting for how NVMe over TCP is going to see similar kinds of challenges. One of the reasons the RoCE v2 — RDMA over Converged Ethernet, version 2 — folks chose not to use TCP had to do with the latency of the software stack. We share our major customers with IBM, customers who have investigated IP storage platforms previously. I can think of a bank not too far from where Brian Sherman lives where they made a major shift into IP storage. And when you talk to their IT director today, what he will tell you is: had we understood the scale limitation, we would never have gone down that path. Because what happens is that the first implementations look pretty clean, and then as you start to scale, the applications — not that they crash — become non-deterministic. And for a lot of traffic, a lot of data sets, and a lot of applications, time-deterministic behavior is kind of a thing. In banking, for example, time-deterministic matters.
Brian Larsen 39:09 You know, AJ, Brian Sherman and I have been involved with quite a bit of testing at the Washington Systems Center, where we've actually done application testing with Fibre Channel using the SCSI protocol versus Fibre Channel using NVMe over Fibre Channel fabrics. We did one in the past with our Gen 6 platforms, using an Oracle database, and we had very good results. Brian, I don't have the exact percentages of improvement, but they were very substantial in actual transactions per minute.
And we just completed the same type of testing of Oracle on top of vSphere, so we have been testing all the functionality, the multipathing capabilities, running at full 32-gig Gen 6 and Gen 7 interconnects. So it's real; we can actually prove significant benefits from using NVMe over Fibre Channel. Brian, you might have a few more comments on that.
Brian Sherman 40:17 Yeah, definitely. We see both the performance improvements and a CPU utilization reduction. So what does that mean in terms of licensing, or being able to drive more workload per virtual machine? It equates to the same kind of removal of CPU wait as when we introduced flash — kind of how you started off, with the evolution of flash itself and the difference that it made. And to AJ's point, the application guys will consume it. With the lower latencies and the more we can drive for IO, it becomes: okay, how can I drive more workload and get more out of that resource on the compute side as well? All of those are purely fundamental benefits of the joint work we've done. And it's dramatic: it's not just 5 to 10 percent; we're talking 50 to 80 percent improvements.
Brian Larsen 41:20 Earlier we had a conversation about the deep packet inspection we've done in the past. With NVMe now being used within the fabric, the good news is that all those metrics are still there. But we've even enhanced them, so you can actually go down and get more metrics on NVMe specifically, so that as the environments evolve, it's already there. Clients can feel comfortable that the management schemes are ready as well.
AJ Casamento 41:49 As the fabric, we see the performance, so we can see when something is congested. And the premise here is to be able to identify to the IBM FlashSystem: hey, this server — this server admin — has structured his queue depth, as an example, to ask for more data than he can actually consume. In his head, that's a cool thing, because data is always waiting for his application; his application is never waiting for data. That neglects the fact that he's not actually an only child. There are other people in the data center, and there are impacts to behaving in that fashion. So we can raise that alert in the fabric and surface it up into the Storage Insights platform. Storage Insights is actually pulling data, or alerts, out of our platform so that people can see it and deal with it. That kind of integration plays very nicely, because it's an ecosystem, right? Any one element in the ecosystem can do things, but when the ecosystem works together — that better-together message you're asking about — that's when things really become better for the application. What we're really trying to do is not be the problem child for the app. And by the way, we're working on that piece with the adapter vendors for the servers, and with the OS stacks, all the way up.
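The over-provisioned queue depth AJ describes has a simple physical ceiling: a host can only usefully keep about one bandwidth-delay product of data in flight; outstanding IO beyond that just queues somewhere and crowds out neighbors. A rough illustration, with made-up link numbers:

```python
def useful_in_flight_bytes(link_gbps: float, round_trip_us: float) -> float:
    """Bandwidth-delay product: data 'on the wire' at full link utilization.
    Requesting more than this doesn't speed the host up; it just builds
    queues in the fabric or the array and congests other traffic."""
    bytes_per_s = link_gbps * 1e9 / 8
    return bytes_per_s * round_trip_us * 1e-6

# e.g. a 32 Gbps HBA with a 100 us round trip: ~400 KB usefully in flight
print(f"{useful_in_flight_bytes(32, 100) / 1024:.0f} KB")
```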
So this is kind of new for us: popping up notifications to the operating-system level so that the multipath IO platforms can — I'll use the phrase — fail better. You made the point early on, Steve, that failures will occur. That's an absolutely true statement: at some point, Murphy will catch you. It's a reliable thing. So it's not that outages won't happen; it's not that devices won't fail. Brian Sherman talks about the failure of a single component inside a FlashCore module, but they also do load balancing within the array; they do bunches of stuff that people have gotten so used to they don't even think about it anymore. It's all magic — until they try to throw a disk drive back into a server and expect all the same behaviors, and it's: yeah, no, that's not actually how that works. So there's that level of better-together, where we leverage each other's technologies, all intended to make outages as invisible as possible to the application layer. Yes, the admin still needs to know, but the application should never notice, in a well-architected environment.
Steve McDowell 44:43 Well, to the average IT practitioner, this feels like very new technology. It's worth pointing out that IBM has been delivering NVMe over fabrics for, I think, three years now — certainly as long as or longer than anyone else in the industry. And Brocade has been involved in developing these standards from day one. I hear what you're saying about the evolution of Fibre Channel, and I want to tie that back to our earlier conversation about autonomous infrastructure. How much of the goodness we talked about with self-learning, self-healing, and self-optimizing translates into this NVMe over Fibre Channel world?
AJ Casamento 45:23 That's one of the things Brian Larsen just alluded to a couple of moments ago with deep packet inspection. One thing I want to make clear: you can get deep packet inspection in, say, Ethernet technologies — those are called firewalls. And you don't generally have an expectation of the same levels of performance, in packets per second, on a firewall that you do on a spine or super-spine, the equivalent of one of our core directors in Fibre Channel. We build it into the silicon, and we do what seems to some people like silly things: we actually measure every frame on every port. And when we do the auto-learning on a Gen 7 platform, that's the equivalent of up to 20,000 flows, server to storage. We look at whether it's SCSI or NVMe, because those are two separate languages — not just physical devices, but two separate protocols. And we actually look into the header of the Fibre Channel frame, on every frame, for those 20,000 flows per platform, to look at the read and write mix. So we can do things like this: an Oracle database administrator might be standing next to the storage admin's desk with a forlorn look on their face — once the pandemic's over and you can see it through their mask, or whatever — thinking that perhaps the storage is part of the performance issue.
Well, what if you could respond with: actually, here's the latency of the first response to the query, here's the exchange completion time, here's the pending IO on the port. It's not the DS8900; it's not the FlashSystem 9200. And we can now do that for NVMe traffic as well — for the read and write commands in NVMe, on every frame. When we talk about self-learning for latency in the environment: as latencies drop, the amount of data in flight on the cable goes up to crazy amounts per second. So your management needs to get better, and the granularity of your measurement needs to get finer. When we look at fabric performance impact — to get geeky for a second — we're actually looking at the latency of buffer-credit return on every port in the fabric, more than a million times a second. That's how we determine when a device is being slow. So we can figure out: oh, we need to move AJ over to the slow lane and get him out of Brian Larsen's way, or out of Steve's way, so they can drive the speed they can drive. And, by the way, AJ still needs to get where he's going; it's still a critical application — don't screw it up. But you allow everybody else to get their performance. We can do this now with NVMe as well as with SCSI, because to us they're just protocols.
Steve McDowell 48:17 As a storage administrator, I just want it seamless. I want to plug in my Fibre Channel, zone my systems, and have it all work. What you've described over the past 45 minutes or so tells us that there's a whole lot happening down there to make that experience seamless. This has been a fantastic conversation; I follow the storage industry, and I learned a lot. Do you have any thoughts you'd like to leave us with to close out?
Brian Larsen 48:43 I do have one thing I'd close out with. The relationship that Brocade and IBM have had is well into 30-plus years, and the confidence in what we're doing together — better together — comes into play because we test all these solutions together. We look at advanced technologies and try to make them real-world, so that people can see they're ready for primetime. We have equipment in virtually every IBM lab, just through normal development. So we have that relationship, and we continue to have it. We have Brian Sherman on virtually every one of our webcasts, which is fantastic — thank you, by the way. And we're really looking forward to continuing that partnership. So: it's better together.
Steve McDowell 49:35 So the relationship is deeper than just a couple of vendors who close deals together.
Brian Sherman 49:40 I would have to say so. Absolutely, Steve.
Steve McDowell 49:44 Thank you, guys. I think this is a good place to stop. Thank you again for participating, and I'd like to thank our audience for tuning in. There will be links to both Brocade and IBM in our show notes. So again, I'm Steve McDowell, this is DataCentric, and we will see you next time. Thanks for tuning in.