Episode 5: Breaking Down the Data Silo
Sanjeev Sharma: Hello everybody and welcome to the Data Company Podcast. I'm your host Sanjeev Sharma, and my guest today is Gary Gruver himself. Gary is very well known in the DevOps industry. Gary, you've been there right from the beginning. Why don't you introduce yourself, tell us what you do today, tell us about your company, and you're also a prolific author. So I definitely want you to talk about your latest book.
Gary Gruver: Yeah. I led the transformation at HP, and it had such an impact on the business, my life, and the productivity of our organization that I've spent the rest of my time trying to help as many people as I can improve and avoid the mistakes I've made.
Sanjeev Sharma: Excellent. Now from a work perspective, you're no longer with HP, right? You have your own practice right now?
Gary Gruver: Yeah. I went from HP where I led a large transformation, then Macy's as a VP of QA Release & Operations. For the last six or seven years, I've been consulting.
Sanjeev Sharma: Excellent. Well that's terrific. Now, our thesis for the Data Company Podcast is that every company is a data company. And it's really now, as DevOps and agile have matured, that people are realizing, “Hey, wait a minute, what about our data?” Somebody pointed out to me that the Agile Manifesto does not mention the word data, and that was fine for then. But today, you have to address the data part. So as you work with your clients, as you help them adopt and scale DevOps, what kind of data-related challenges have you been encountering?
Gary Gruver: You know, one of the big ones is people would ideally like to decouple their systems, so they have smaller, loosely coupled systems that are much easier to manage. I see a lot of organizations try to do that with the application, but they never decouple the data. And if you can't decouple the data, you've still got the tight coupling amongst your teams, and it's still the same deployment pipeline. It's got to go through the same integrated test environment. So the big challenge is trying to qualify separate pieces of it, trying to get it solved. You need to think about how you solve the data problem, and I usually start by not starting there but trying to build in quality and get some consistent gains.
But once you get your deployment pipeline up and running, you want to simplify it by changing your architecture and decoupling things, and you can't decouple it if you don't decouple the data and that's where I see most organizations struggle when they're trying to decouple. They've got this tangled mess of data that everything's reading and writing to it at the same time, and it's not designed for multiple actors to be interacting with it. Because of that, people have a really hard time separating it. They'll go off and separate all their applications, but they don't get any of the benefits associated with going to a loosely coupled architecture because they're bound together by the data.
Sanjeev Sharma: It's just so true, right? I was meeting with a client just recently, and they were showing the different data architecture and the different databases they had, and there was this one single application which was their core business application. So I asked him, what percent of the data is in that database? And he had a very exact number. He said 43% of the data is in that database. But then he paused and said, “99% of our applications touch it.” Essentially, this single thing with multiple schemas, multiple tables, everything entangled.
Gary Gruver: Which meant his organization is one large, tightly coupled deployment pipeline, and he needed to think about it that way. And he needs to design how he's gonna fix it that way.
Sanjeev Sharma: When we went through this journey over the last decade of adopting DevOps and breaking down silos, people neglected the data silo. It's only when you've taken care of environment provisioning and automation that you realize, “Wait a minute, now my friction point is data.”
Gary Gruver: I see it show up in another way. I have a lot of clients that have automated their testing, but not the data setup. I had one organization that automated a lot of testing, and I make companies run this experiment, which is, if you've got some automated testing, pick an environment. I want you to run all your tests 20 times in a row and see if you get the same answer. Some organizations can't run the experiment because while their automated testing can run in hours, it takes them two weeks to get the data set up.
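The experiment Gary describes, running the same tests 20 times in a row and checking for the same answer, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not his tooling: `find_inconsistent_tests`, `make_fake_suite`, and the test names are hypothetical, and the fake suite simply imitates a test that depends on data nobody resets between runs.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


def find_inconsistent_tests(
    run_suite: Callable[[], Dict[str, bool]], repeats: int = 20
) -> Dict[str, Set[bool]]:
    """Run the suite `repeats` times and return the tests whose outcome varied."""
    outcomes: Dict[str, Set[bool]] = defaultdict(set)
    for _ in range(repeats):
        for name, passed in run_suite().items():
            outcomes[name].add(passed)
    # A test is inconsistent if it both passed and failed across the runs.
    return {name: seen for name, seen in outcomes.items() if len(seen) > 1}


def make_fake_suite() -> Callable[[], Dict[str, bool]]:
    """Hypothetical suite: one test consumes test data that is never reset."""
    state = {"orders_left": 1}

    def run_suite() -> Dict[str, bool]:
        results = {"test_login": True}
        # Passes only while unreset test data remains in the environment.
        results["test_place_order"] = state["orders_left"] > 0
        state["orders_left"] = max(0, state["orders_left"] - 1)
        return results

    return run_suite


if __name__ == "__main__":
    flaky = find_inconsistent_tests(make_fake_suite(), repeats=20)
    for name in sorted(flaky):
        print(f"{name}: results varied across runs")
```

In a real pipeline, `run_suite` would wrap whatever test runner the team already uses; the point of the sketch is only that repeated runs expose tests whose results depend on data state rather than on the code.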
Sanjeev Sharma: Absolutely. In fact, I sent a tweet out recently and the tweet was, if it takes you two hours to set up your test environment but two weeks to get your test data, how long is your test environment setup? Even if you’re doing two-week sprints, if test data takes three weeks to be provisioned, you have no agility.
Gary Gruver: Right. What a lot of people do is automate their testing to make their testing cheaper, and they don't take advantage of this new capability to change the organization. The classic example is what Goldratt covers in “Beyond the Goal,” where he says the first application for computers was really an MRP system, where you plan all the manufacturing: what to order, what to build, all those sorts of things.
Black and Decker implemented it and had world-class levels of low inventory and the best availability imaginable, so everybody else in the industry went off and implemented MRP systems. And what they found out at the end of it was, they weren't seeing any big benefits. Well, why did Black and Decker win and they didn't? There was an inherent rule that was built into the system that nobody else changed, and that was because in a factory of, like, 300 people, you'd have 40 people running MRP. When they automated it, they went down to one person. So they had a productivity gain in running MRP. What Black and Decker did is run it several times a week, and they changed the capabilities of the entire organization, whereas everybody else still only ran it once a month.
Sanjeev Sharma: Right. They did not change their culture. They did not change how they leveraged technology.
Gary Gruver: Yeah. They had a new capability, and they didn't use it to their advantage, and a lot of organizations do that with test automation. They wait until they're dev complete to start running their automation, and they're not running it on an ongoing basis to give feedback to their developers so they can build in quality. You'll do that with the code, but you've also got to do that with the data. If you don't solve the data problem for how they're gonna set up and run these tests, you can't run them more frequently.
Sanjeev Sharma: You're preaching to the choir here, right? That's the challenge, you know, we have focused on fixing. This problem, in my opinion, gets exacerbated even further when you start going into new architectures like microservices because it is very easy, as you mentioned, to separate the applications and not the data. Unless you're in a green field application development space, but even in green field, usually your data isn't green field. Because you can throw away your code. You can't throw away your data. You have years and years of transactional data, which you will need in order to operate, right? I mean, if the same customer has been around. Do you see that happening with customers as they're adopting cloud and trying to move to the cloud? Because all I see them do is create another island of data in the cloud now.
Gary Gruver: Yeah. A lot of my clients are not yet aggressively moving to the cloud that much. I mean, there's some that are doing it, a lot of green field applications are moving there, but with a lot of the legacy, tightly coupled systems that I work with, there's some basic fundamentals they've got to get right first before they move to the cloud, and a lot of them are at that stage. They're not at the stage of moving everything to the cloud at large scale yet.
Sanjeev Sharma: You believe that the main reason for that is lack of decoupling, so you cannot move parts of the system to the cloud because it is so dependent on what's on-prem?
Gary Gruver: It's a big, complex thing, or they're just not that far on their journey.
Sanjeev Sharma: Okay, because I recently met with a client over in Asia Pacific, and they actually moved some core systems to the cloud, tried to operate them there, and then moved them back because they realized that there was so much entanglement and so many dependencies on other systems. Unless they were to move everything to the cloud, they were not getting the value of moving a part of the system to the cloud. So they actually turned their on-prem system back on. They re-architected for the cloud because just taking a part of a larger, complex, interdependent set of applications and moving one or two up to the cloud was making the operationalization even more complex and more difficult to manage.
Gary Gruver: And a lot of what I see with the cloud and cloud capabilities is the types of things you can do internally with Kubernetes and Docker and Helm charts, which enable you to have a lot of that flexible capacity internally in the data center and things like that. Some of those are really hard for a lot of organizations to do internally, so going to the cloud gets them those capabilities out of the box. Others that struggle with it, instead of moving to the cloud, are trying to figure out how to implement those cloud-like capabilities internally. When you get to that model and those capabilities, it's much easier to move to the cloud. But they're not all there.
Sanjeev Sharma: Understood. A lot of these analysts talk about data gravity and data friction, right? Data gravity is all about the fact that your application cannot be far away from the data. Most companies are not in a position to move hundreds of terabytes of data to the cloud. It's just not cost effective, and that means if that remains on-prem, the applications can't go very far either.
I wanted to shift gears here. You're one of the top leaders in the industry. I talk to people I mentor, and that's something I'm very passionate about. If one of those people came to you, whether they were a fresh new graduate trying to figure out where to go in their technology journey, or they were a seasoned technology practitioner looking for a change because, you know, maybe they need to refresh their technology skills, what advice would you give somebody who has that question and seeks some guidance, based on what you’re seeing?
Gary Gruver: I tend not to be technology focused. I tend to be, how do I go in, analyze a company, figure out what the biggest issues are and where the roadblocks are and what's stopping the organization. If that's the organization you're going to be in, if you can map that out and understand what's slowing them down and what's causing waste and inefficiency, and you can figure out how to fix those things through automation, types of tools or processes, I think you're fundamentally going to be making your business successful, and you're going to be extremely valuable to the business.
In different organizations with different tools or different processes, those are gonna be different technologies. But if you're constantly focused on continuous improvement and helping to build quality in and improve the flow, you'll be working on things that are problems for a lot of the broad organization, and you'll be adding the most value to your company, and the specific technology to do that with is probably gonna change.
You probably need to be looking out at different things and trying different things, but you should be picking the things that are going to fix the problem. A lot of times I talk about, don't do DevOps, don't do agile, don't do any of those things. Try to figure out what you're trying to improve about the business, and then look at the tools in your toolkit and pull out the tools that are going to fix the problem. So I tend to start at that level and figure out what I need to fix. That starts with making the factory visible; that starts with putting metrics on it. Then once you have that, you can figure out what skills or tools you need to fix the problem.
Sanjeev Sharma: That's great advice. Don't get distracted by the shiny new objects. Don't get distracted by the technology. Learn the principles. Learn what is being achieved here. If a new technology came out, why did it come out? What did it improve over the previous technology? Because you will then understand the principles. You will understand the kind of inefficiencies this technology tries to address. Spot on, that's where the value is. Excellent. This has been a great conversation, but I have to ask you one more thing. How can people get a hold of your newest book, right? Because I'm just looking forward to reading it myself.
Gary Gruver: Yeah, and I owe you that email. I want to thank you a lot. I got your book this morning at breakfast, and it's in my bag, and on my flight home from India I'm really looking forward to reading it. Go to my website: garygruver.com.
Sanjeev Sharma: Who should read that book?
Gary Gruver: I think the more people in your organization who have a common framework for thinking about how to analyze and how to do software development, and a common process for engaging the broader organization in continuous improvement, the better. It's for everybody from executives who need to understand their role down to practitioners at the lowest level. It's not a technology book. It's not going to tell you how to set up Kubernetes, it's not going to show you how to do any of those. It's more of a process engineering flow, and how you improve how you develop and deliver software.
Sanjeev Sharma: Perfect. We'll leave it at that, Gary. This has been an excellent conversation. I'd love to have you back on the podcast, maybe a couple of seasons down the line, and have a full-on conversation and go deeper on this.
Gary Gruver: Yeah, it was great to see you again.
Sanjeev Sharma: I'm so happy to have you on the podcast. Thank you for your time.