Episode 2: Don’t Forget Your Data in DevOps
Sanjeev Sharma: Hello everybody, and welcome to the Data Company Podcast. My name is Sanjeev Sharma. I'm your host, and I have a guest who is an old-time friend of mine, Chris Novak. So Chris, why don't you introduce yourself and tell the listeners a little bit about your background and what you do today.
Chris Novak: Thanks, Sanjeev. Glad to be here. I have an interesting background because I came out of industry. I spent 23 years in banking. I used to joke that it was the same seat, three banks, because I went through a number of mergers: First Union, Wachovia, Wells Fargo. Then I went over to Bank of America. We were doing DevOps there before DevOps really was a word. I went out and consulted for a little while at the executive level around transformations. In the last few months, I've actually joined HCL's UrbanCode division, and UrbanCode is a continuous delivery set of products. For them, I'm basically a transformation strategist. I get to work with folks and help them work through their DevOps challenges.
Sanjeev Sharma: That's excellent. To give the listeners some context, back when IBM originally acquired UrbanCode, it was an independent company, a leader in the industry, so it's an interesting opportunity to look at DevOps adoption and transformation in general from varied facets as you work with different industries, different geographies, different companies, and different maturity levels. So let's start there. Right?
You've been in the DevOps consulting world, as you said, for a while, and now you're with UrbanCode at HCL. We call this the Data Company Podcast. Our thesis here is that every company is a data company. They've always been a data company. You can go back to a company that is more than a hundred years old, and guess what they had a hundred years ago? What were they using computers for? For processing data. Right? And they still do it today. The compute power has improved, the storage has improved, the speed with which you can move data, store data, manage data, and change data has improved. But it's all about the data.
Chris Novak: It always is. I think that's exactly correct. If you haven't been exposed to the data area, you may not have thought about it that way. I'm a trained engineer, so I think about things that way. I think you forget that data is where everything is driven from. Ultimately, data that gets processed becomes information, and information that gets processed becomes usable, actionable advice, right? So there's sort of that hierarchy, but in the end it's all about the data and the pieces of information that are basically flying through your organization and your systems. You could actually think about that a lot of different ways, but effectively that's how it builds up, I think.
Sanjeev Sharma: Absolutely. I mean, after all, we are in the information technology industry, and that information, as you said, is all data. Now let's look at it through the lens of DevOps, because that's the area you and I have come from. We have been working in it for a while, right? When I'm talking to clients, I talk from a perspective of DevOps adoption having three layers. Three layers of a chocolate cake, if you will. The base layer is your environment, your infrastructure. That's the first thing you automate, because that's usually the slowest part of it, right? Every developer is complaining, "Why does it take me so long to get a dev environment or a test environment?" So people automate that.
The next thing they automate is the software delivery life cycle, the CI/CD part of it, right? That's what you do with UrbanCode. We work with some build tool, whether it's UrbanCode Build or Jenkins or something, whoever is doing the build, and then you deliver the code and promote it to the next environment. The third layer is data. Once you've automated the environment and your CI/CD to a certain extent, then data becomes the slowest piece. As you are working with clients, do you see that as a major challenge, and how are they addressing it?
Chris Novak: My background and experience has generally been in larger enterprises, typically regulated, in banking. When you're in a large organization, there are a lot of different groups trying to do different things, at different speeds, with different priorities, agendas, and funding. When I first started doing these things in organizations, I always thought, "Oh, we're all going to march to the same beat. We're going to do the same things." It's never that way. So the layers you described, I absolutely agree with in principle: nail down your environment, nail down your applications, and then you think about the data. What I've found, at least in my experience and in practice, is that you do what you can with what you have. For example, in one of the banks I worked in, I didn't really have a lot of control over the infrastructure decisions, but I had a lot of control over, in fact I owned, the application delivery decisions, and so that's what I started with, because if I had done it the other way, I would've gotten absolutely nowhere.
Having said all that, I think data itself, at least in the context of the things that I was asked to do, we never really thought about, because it was so far out there. We had bigger problems to solve just keeping the environment running, or even making it available, or making sure the application was built the right way, the same way, every time. So data? Nobody even thought about it. At best we thought about database deploy type processes, and probably less about the data and more about the database. Now, two decades into this DevOps movement, we know better.
The data to me, being a DevOps delivery guy, means two things. It means the data we instrument up to help us do our service delivery better, and the data that we're actually moving that's part of the business process. So I would say that we keep our eye on those couple of things going forward.
Sanjeev Sharma: Absolutely, and I think that distinction is very important for understanding what data we are talking about. When we talk about provisioning a test environment, that usually means getting data from production, masking and obfuscating it to make it compliant, and then hydrating the test environment with it. But there's also what you were talking about, and I know at UrbanCode you have the new Velocity tool, right? It's a value stream tool, giving you observability into the telemetry of your entire value stream.
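The masking-and-hydrating step Sanjeev describes can be sketched in a few lines. This is a minimal illustration, not how any particular tool does it: the table, column names, and masking rules below are all hypothetical.

```python
import hashlib

# Hypothetical production rows; the column names are illustrative.
PROD_ROWS = [
    {"customer_id": 101, "name": "Alice Smith", "ssn": "123-45-6789", "balance": 2500.00},
    {"customer_id": 102, "name": "Bob Jones",   "ssn": "987-65-4321", "balance": 130.75},
]

def mask_row(row):
    """Return a copy of a row that is safe to load into a test environment:
    identity fields get deterministic pseudonyms, value shapes are preserved."""
    masked = dict(row)
    # Deterministic pseudonym: the same real name always maps to the same
    # fake name, so referential joins across tables still line up.
    token = hashlib.sha256(row["name"].encode()).hexdigest()[:8]
    masked["name"] = f"Customer-{token}"
    # Obfuscate the SSN but keep the format that test code expects.
    masked["ssn"] = "XXX-XX-" + row["ssn"][-4:]
    return masked

# Hydrate the test environment from the masked copy, never the raw rows.
test_rows = [mask_row(r) for r in PROD_ROWS]
```

Non-identifying fields like `balance` pass through untouched, which keeps the test data realistic while the compliant-masking requirement is met.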
Chris Novak: That's a great way to put it.
Sanjeev Sharma: And then you can act upon it. It becomes very interesting, because it's fine when you have a few value streams, a few delivery pipelines. But as you move to more modern application architectures, and you're developing and delivering microservices independently of each other, you're into thousands of delivery pipelines, each at a different stage.
Chris Novak: It's not something you can keep in your head anymore. No single person has it. And you may actually be in a hybrid environment. It's not like you flip a switch and the next day you're containerized. People talk about, "Oh, we're going to go to containers." I'm like, "Well, that's a five year journey," right? Because you're not just repackaging it, you're re-architecting things. So you have to understand you're going to be in some sort of a hybrid mode for a long time, which means you're going to be looking at the data, and measuring it, differently across those two worlds.
Sanjeev Sharma: Absolutely. There used to be this whole bi-modal, multi-modal IT debate. Luckily those labels are no longer the norm. But truly, you are multimodal. There will be some parts of your application that are able to move fast, where you have achieved that goal of continuous delivery, very small batches of change. And there will be some backend legacy systems which are in maintenance mode, and you still just deliver them twice a year.
Chris Novak: And you don't have to. I think that's the thing, too. A lot of people think, "Oh, we have to do all of it." No, you don't. To overuse the term continuous, I would say it's more like continuous opportunity, right? You always have the opportunity to do it. It doesn't mean you're going to, but if you should decide this particular application needs to release faster, you should have that opportunity. But nothing is free, especially with data, so you want to think, "What do I do now? What should be continuous, and how do I release it?" Those should be informed decisions.
Sanjeev Sharma: Absolutely. Everybody looks at it from a different perspective, right? As a developer, I'm looking at it from "How can I improve my productivity? How can I reduce waste, wait, and rework?" I'm having to do all this manual stuff while I'm waiting for somebody to do something which could be automated. If I'm in app dev leadership, I'm looking at fixed resources. I need to do either more with less, more with the same, or the same with less. One of the three options.
Chris Novak: And there are only so many levers you can pull to do that.
Sanjeev Sharma: Yeah, speed, quality, and cost. When I go higher up the food chain, then you're talking about business value and risk reduction. There are only two conversations you can have at that level: "Am I reducing my risk, and am I improving business value?" A lot of times we see technologists like us wanting to do something because of the shiny new toy we were handed, and we want to play with it.
Chris Novak: Look, we’re technologists, right? We want to write code, but we have to stop ourselves. This is always the conversation. It's like, "Well I understand you want to do it. And yes, it's pretty cool, but have you been asked by the business to do that? Are you solving a business problem or have you uncovered something perhaps the business didn't realize? Well great, let's take it to the business and let's get it prioritized."
Sanjeev Sharma: Right, and how are you lowering the cost of an experiment? I think that's the whole value proposition which comes from utilizing services. Everything is a service, because the cost of entry and exit becomes lower. So going back to the telemetry and observability data: what do you do with it? Do you apply machine learning and automation to try to improve quality, reduce risk, or increase speed, one of the three levers? Or is the whole idea there just to get a handle on what the heck is really going on?
Chris Novak: That's usually the first part, because in most cases we're flying blind. I want to circle back just a little bit, though, because we talked about two different kinds of data. From the perspective of DataOps, I've always said that when I was creating these effectively DevOps services, my Holy Grail was always being able to deploy the database, and to deploy the data with the database. To me that was the hardest thing to do because, again, there was never any formality around it. People didn't think about it, but it was really an essential part of the application. To me, everything is code, right? So data is code and database is code.
But it was so far out there, because we could barely manage our own simple Java applications. Most people didn't get there, but we did have some teams that were successful doing it. To your point about what we're looking at: when we look at the telemetry across everything, part of the problem is that most of it is invisible. Let's start measuring things, or at least looking at things that haven't been looked at. The first thing to do is to create the type of system where you can at least take a look and see what kind of data is available, and then, is it useful?
Let's disposition that data and say, "What do we do with it? How is it related?" The first level it bubbles up to is a team-level value stream. You say, "Given this type of a team, and a dev team is a good example, you have work item tracking and maybe their CI (continuous integration) builds. How are those things working together to help that team deliver what they've been asked to do in that segment?" You might hop over and say, "How is it progressing through the environments?" Now you're potentially talking about a flow metric, where it might be an operations team, a deploy team, a testing team, or an environment management team, a whole different set of folks.
If you're in a regulated industry, there's somebody above you saying, "Oh, that's interesting, but I want to know, are you satisfying regulations as you do it? How does that piece of data help me satisfy the regulations and maybe convince the OCC that we're okay?" So it just depends. I think you have to understand what piece of the value stream you're in, what data is available, and how you move up. We can say, "Yeah, we want to do AI and all these other things," but I think it's a crawl, walk, run. If you can't measure the data, and then figure out if it's even important, what does it matter if you're doing AI or anything else, or correlations, or predictive analytics, right? It doesn't matter much, because a house built on a bad foundation is going to fall down. So I think that next step is saying, "What do we get from these foundational things, where there's an A minus B that gives us some kind of a quick win or a result?" Pretty simple stuff, like cycle time. I don't need advanced mathematics or predictive AI to tell me that. Let's just look at it and see: are we getting better than we were last week?
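The "A minus B" metric Chris mentions really is as simple as it sounds. A minimal sketch, with made-up work item dates standing in for whatever your tracker exports:

```python
from datetime import date
from statistics import median

# Hypothetical work items: (started, finished) date pairs pulled from a tracker.
ITEMS = [
    (date(2020, 3, 2), date(2020, 3, 5)),
    (date(2020, 3, 3), date(2020, 3, 10)),
    (date(2020, 3, 9), date(2020, 3, 11)),
    (date(2020, 3, 10), date(2020, 3, 12)),
]

def cycle_time_days(items):
    """The 'A minus B' metric: finish date minus start date, in days."""
    return [(done - started).days for started, done in items]

times = cycle_time_days(ITEMS)
print(median(times))  # → 2.5 (median cycle time in days)
```

Compute this per week and compare the medians, and you have the "are we getting better than last week?" question answered without any advanced mathematics.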
Once you start to accumulate some of this data, now you can start to think about: "What's my data science going to do with this? Can I look for correlations? Can I look for interesting, counter-intuitive things that we didn't think of?" We're looking at data right now in some of the things we're doing, at least in the data sets we have, and they're fairly large; we're pulling them from open source Git repositories. What we're seeing is: if you take a work item and the size of that work item, can you predict when it's going to be done?
For small work items, the answer is no. It's all over the map. The distribution is all over. For large ones, we can predict it, or at least our models are telling us with a good degree of certainty where it is. So now we're asking, "Well, why is that?" I happen to think it's like that story where you have a jar and you're told to put rocks and sand in it: if you put the sand in first, there's no room for the rocks, right? Put the rocks in first, and now you can fill it with sand. I think people know those smaller sized items of work can fit in a lot of places, so they might hold them until the end, or they might start at the beginning and get them out of the way. But those are the types of patterns that we're starting to see exposed. And we're like, "That's interesting data. How can we use that to help our teams perform better together?"
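The effect Chris describes, that small items scatter while large ones cluster, shows up as a difference in relative spread. A sketch with invented numbers (not the real repository data sets he refers to), comparing the spread of completion times for small versus large items:

```python
from statistics import pstdev, mean

# Hypothetical (size_points, days_to_complete) pairs, invented to
# mimic the pattern: small items get squeezed in whenever they fit,
# large items get scheduled deliberately.
WORK_ITEMS = [
    (1, 1), (1, 9), (1, 3), (2, 12), (2, 2),   # small
    (8, 20), (8, 22), (13, 30), (13, 33),      # large
]

def spread_by_bucket(items, cutoff=5):
    """Relative spread (stdev / mean) of completion times, small vs large."""
    buckets = {"small": [], "large": []}
    for size, days in items:
        buckets["small" if size < cutoff else "large"].append(days)
    return {k: pstdev(v) / mean(v) for k, v in buckets.items()}

print(spread_by_bucket(WORK_ITEMS))
```

With these numbers the small bucket's relative spread comes out several times larger than the large bucket's, which is exactly the "all over the map" versus "predictable" split: high spread means the size tells you little about when the item will land.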
Sanjeev Sharma: I think that's absolutely brilliant. I always challenge a lot of technical executives when I go to clients and say, "Can you give me your dependency matrix across all your systems, by business function?" I get a lot of head scratching at that point. Then I say, "Can you give me the second order dependencies?" That's when they throw me out of the room. But if you start instrumenting and measuring it, you know how tweaking one dial, one variable, is going to impact your output, instead of just training more people or something else we know doesn't solve the problem.
Chris Novak: You're absolutely right. You want to strive for that univariate change.
Sanjeev Sharma: Excellent. So let's wrap this up. I want to make sure people get something from your personal thoughts. One of the challenges folks are facing today, as I talk to people I mentor and people I'm working with, is that there is too much change. There are too many new things coming up, right? I see a lot of people look around and go, "I'm scared to spend the time and invest the money and energy to learn something new, because I don't know whether it'll stick." What would you recommend to them?
Chris Novak: Whether you're a college graduate or whether you're switching careers or getting into a new area, learn something really well and do it well. Just dive into it, right? And you will uncover things about it that will make you almost an expert that nobody else has. From there, you can then branch out and use that experience and maybe you'll dive deep on something else but then become a bit of a generalist on it.
Don't be afraid to dive in, even if it feels like it's going to be useless at some point, and then go broad, because you're going to have two different perspectives when you do, and it never really goes to waste. If you can do one thing well, great, but you're competing against somebody in that organization who does it better than you. If you can do two or three things on average better than most, you've got a stack of things where nobody else has that combination. That's the kind of thing I would go for. String a few things together that are related, and then, for the space you're in, you will really know what you're doing.
Sanjeev Sharma: I think that is sage advice, my friend. That is brilliant advice, because that cross-functional, cross-skill mapping, where you can bring concepts from related but separate domains, is where the difference lies, rather than getting stuck in a hole.
Chris Novak: I'm glad you said that, actually, because in the space that I'm in, with DevOps or change management transformation, it's usually the idea from the outside that makes the difference. And I'm always interested in that point of view. To your point, if you can do those sorts of things, that is a very powerful approach, because we all get locked in our own little silos. So don't be afraid to step outside of the silo.
Sanjeev Sharma: Excellent. I couldn't think of a better spot to stop, so thank you very much Chris. It was great having you on, a great conversation, and I'm sure a lot of people will benefit. Thank you for your time.
Chris Novak: Thank you too.