Kudos to the author for being honest about the flaw in this metric:
> It’s ridiculously easy to cheat this metric. Even if you correctly categorize your muda—it’s very tempting to let edge cases slide—all you have to do is stop fixing bugs, defer some needed upgrades, ignore a security vulnerability... and poof! Happy numbers. At a horrible cost.
> Actually, that’s the root of my org’s current capacity problems. They weren’t cheating a metric, but they were under pressure to deliver as much as possible. So they deferred a bunch of maintenance and took some questionable engineering shortcuts. Now they’re paying the price.
> Unfortunately, you can get away with cheating this metric for a long time. Years, really. It’s not like you cut quality one month and then the truth comes out the next month. This is a metric that only works when people are scrupulously honest, including with themselves.
I've had the same experience with well-meaning productivity metrics collection: Even the execs who were really trying to do the right thing would accidentally invent a new metric that looked fantastic for a couple years, then later collapsed when everything else caught up. By then it might be someone else's job to clean up the mess.
Like the author said, the difficulty with this problem is that it can go on for years. If you have an executive team that actually knows how to balance the metrics against the things that are harder to track, it might not be a problem. However, as soon as you get an executive looking to game the system for a couple of years before jumping to another job, it becomes a doomed venture.
> In other words, in the absence of RoI measures, the percent of engineering time spent on value-add activities is a pretty good proxy for productivity.
It's a lagging indicator of productivity. If the team is consumed with fixing old bugs, it means the team wasn't actually that productive years back and was instead just exchanging "results" for technical debt.
Instead of creating a perverse incentive to create messes, what if they set a target ratio instead: "60% of work is new features, and 40% is maintenance"? And if maintenance becomes all-consuming, could they look back and identify the problems with previous launches?
Education as a domain has identified key performance metrics for teacher evaluation and has come up with a method of collection they can point to which always results in useful data.
It's also an expensive and time-consuming process. You're either burning a stack of money because you absolutely need that info, or a slightly smaller stack to cook the books in full view of people who understand statistics way more than you do.
> Education as a domain has identified key performance metrics for teacher evaluation and has come up with a method of collection they can point to which always results in useful data.
Can you please elaborate on this? My impression was that judging teachers is actually fairly hard. See, for example, push-back on standardized testing.
The most useful metric I can imagine is where the students are in some number of years, but the obvious (to me) problems are...
> On-the-job observations done by someone with way more experience
This feels like it puts a potential hard cap on quality growth by discouraging shake-ups or experimentation that might improve education but wouldn’t please the old guard for one reason or another, and by discouraging alternative class styles the judge doesn’t approve of.
Both of those seem like potentially serious problems in education, given that its structure has, with few exceptions, been effectively stagnant over the last several hundred years. You therefore may be mistaking “evaluating the success at implementing the widely accepted method” for an “evaluation of quality”.
The institution of widespread public education that looks anything like what we have now isn’t something I’d describe as having meaningfully existed for hundreds of years, let alone as having been stagnant that whole time.
I would. Most modern systems of education have their roots in 1748, and are either derived from or inspired by reforms to the Prussian educational system that guaranteed free and compulsory elementary school education between the ages of 5 and 14, taught by secular professional teachers, for the full populace.
If you mean the university and college levels, they have interesting differences from the 18th century, but are recognizably similar in regards to basically everything besides cost and curriculum differences we'd expect due to changes in societal needs, technical advancements, and changing interests.
I wouldn’t date the current system prior to one-room schoolhouses becoming uncommon. That’s an enormous change.
If we’re going back farther than that, then I’m really having a hard time seeing where the stagnation comes in—I don’t agree with that even past that point (gifted education alone is only just now starting to develop into something halfway useful, and that’s a very recent change in just one small part of primary and secondary education) but if we’re going farther back, then… what?
The structure you're seeing allows for plenty of freedom per classroom and per institution. It's been stagnant in exactly the same ways automotive design has been stagnant, and for largely the same reasons.
Your line of thinking makes sense, and your questions were more or less answered last century. I think this would be a useful conversation to have with a GPT...
> You therefore may be mistaking “evaluating the success at implementing the widely accepted method” for an “evaluation of quality”.
because that conjecture alone is an entire topic of study in several disciplines.
Oh good, I was afraid this was gonna go into cuckoo territory.
Yep, this is what works. All the attempts to turn it into a simple spreadsheet based on something you can easily measure from an office across the state have been so flawed they’re very nearly useless, if not actively harmful. We keep doing it anyway because the people making those calls either don’t understand the field, or do understand it but don’t care, because they want to implement bad ideas for political reasons (trend-following to avoid criticism is huge, for one thing).
On-the-job observations are very easy to subvert, though. They happened a few times at my high school in the Soviet Union. At least twice the teacher would prep "good" students a few days in advance: I will ask you this question, and this is the answer you are supposed to give, make sure to memorize it well. So the class is engaged and gives good answers to hard questions, except this is all a Potemkin village deal!
Any human can manipulate their assessment. However, I think a skilled and trained observer has a better chance of detecting that than any standardized blind mechanism.
Practically speaking, your interview would require applicants to apply claimed knowledge in a novel context.
A simple method would be to identify interesting problems from recent PRs and ask them to walk you through their approach to discovery and solution. It's a problem they should be familiar with, but in a new shape and with different labels. Let's see what they come up with.
Same for the ubiquitous "story points per sprint", "number of hours sitting at the desk", or "lines of code written" measures of productivity for engineering teams; because they're easy to measure and provide some kind of number that can be reported, they get used. Despite being completely useless and actively harmful.
Your point about people who don't understand the process using these measures because it suits their purposes is also relevant. Productivity measures can be a political tool.
I'd say our equivalent might be a staff-level engineer sitting in on a sprint. Maybe less tense. You're being evaluated, but any resulting PIP is practical advice and comes with a fucking rubric.
I'm surprised the author, given the thoughtfulness of much of the post, didn't decide to model tech debt, since then accumulated debt would be quantified frequently (if inaccurately).
Especially since they are optimizing for value-add work versus paying down tech debt, which the author acknowledges is what got the engineering org into the tech-debt hole it is currently mired in, before the author's tenure. For this reason, accumulated tech debt should be modeled in their bar charts too.
This is far preferable to letting tech debt grow and fester for several years. In a way, I find this almost a regression by the author, and possibly irresponsible: they didn't patch the underlying core issue (the rest of the org not understanding or assessing tech debt as a liability), and instead pushed a quantitative measure that actively encourages taking on unmeasured tech debt.
It doesn't seem possible to track a percentage of code as debt. You can't just put a comment on a block of code and call it either debt or good.
One way could be to measure different activities. If you spend time solving a lot of bugs (muda) then where are those bugs?
If they're in the same old code, does that code need to be rewritten? If they're in new code, is that code being written in a good way? Can better testing catch these?
"Technical debt" seems as hard to measure as developer productivity. It seems you need something you can actually put on a bar chart to measure the effects of having technical debt.
You need a way to assign technical debt to a process or practice, then you can evaluate the cost of that process over time.
Could be something like... Our team spent 500 hours fixing bugs introduced during overtime, and we spent 250 hours of overtime to introduce those bugs in the first place. So an additional hour of overtime will likely introduce 2 hours of technical debt.
Or maybe... Code written using inheritance hierarchies creates 40 minutes of muda per hour, but using composition only creates 20 minutes. Or... every hour spent on throw-away prototypes reduces muda by 2 hours.
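A minimal sketch of what that attribution might look like in practice, assuming activity records are tagged with a suspected cause; the data shapes, field names, and numbers below are hypothetical, not anything the article prescribes.

```typescript
// Hypothetical activity log entries; field names are illustrative only.
interface Activity {
  hours: number;
  kind: "value-add" | "muda";   // muda = bugs, rework, maintenance
  attributedTo?: string;        // practice suspected of causing it, e.g. "overtime"
}

interface PracticeUsage {
  practice: string;             // e.g. "overtime", "inheritance-heavy design"
  hours: number;                // hours spent engaging in the practice
}

// Debt ratio: hours of muda attributed to a practice per hour spent on that practice.
function debtRatios(activities: Activity[], usage: PracticeUsage[]): Map<string, number> {
  const mudaByPractice = new Map<string, number>();
  for (const a of activities) {
    if (a.kind === "muda" && a.attributedTo) {
      mudaByPractice.set(a.attributedTo, (mudaByPractice.get(a.attributedTo) ?? 0) + a.hours);
    }
  }
  const ratios = new Map<string, number>();
  for (const u of usage) {
    ratios.set(u.practice, (mudaByPractice.get(u.practice) ?? 0) / u.hours);
  }
  return ratios;
}

// Example from the comment: 500 h of bug-fixing blamed on 250 h of overtime -> ratio 2.0
const ratios = debtRatios(
  [{ hours: 500, kind: "muda", attributedTo: "overtime" }],
  [{ practice: "overtime", hours: 250 }],
);
console.log(ratios.get("overtime")); // 2 extra hours of muda per hour of overtime
```

The hard part, as the replies below point out, is the attribution itself, not the arithmetic.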
And are all projects comparable with each other, standardized enough that it's possible to come up with measurements like that?
It feels like a nightmare to have to come up with those hours or measure them in some way.
Anything besides intuitive guesses by engineers won't work, because making it accurate at all would require perfect time tracking and an understanding of all the nuances; otherwise it will just be a biased metric, which can be whatever you want it to be.
Like, the things you are spending your time on are never clear, there are never clear labels or patterns, and the things you'll spend the most time on in the future can be very unexpected; the more so, the larger the org.
I think most people forget that most metrics are easy to hack and all metrics are hackable (yes, even if you use math & science). So the only rational thing to do is be aware of and acknowledge the limitations of your metric, and be open about that and about what context it operates under. If you aren't aware of the underlying assumptions, your metric is borderline useless. It's like the old saying: "there are two types of code: code with bugs and code that no one uses." All systems are buggy, and if you aren't looking for the bugs you'll get bit.
And in great irony, it is why metrics are the downfall of any meritocracy.
I really like this article so I am sad to be cynical about it. Nevertheless. If you're engaging in C-suite politics as described in this article, what on earth would be the motivation for writing all this in public? I doubt the situation will evolve favourably to Mr Shore if the leadership team starts reading this.
Full agreement with the perspective though. CEOs (& consultant friends) generally have some extremely reasonable expectations about software engineer productivity based on experiences in manufacturing, primary industries, healthcare, service and public sectors. Surely we need accountability and productivity metrics! It is a long and painful process that may never end, watching them gather the evidence needed to learn that software engineer productivity is different. A sad history repeating over and over again like clockwork.
This CTO seems to be finding a golden path for how to manage expectations without beating that over people's heads and I'm taking notes. I don't think it'll work out exactly the way he hopes but I love the attempt at making his team productive.
> If you're engaging in C-suite politics as described in this article, what on earth would be the motivation for writing all this in public? I doubt the situation will evolve favourably to Mr Shore if the leadership team starts reading this.
I didn't get that impression at all from this blog post. I didn't see any "C-suite politics" really at all in this article:
1. At the end of the day, the C-suite is responsible for making the business successful. So it's reasonable to ask for some level of productivity measure, no matter how much engineering objects.
2. I thought the author was extremely insightful about (a) trying to get at the root of what execs really wanted, (b) acknowledging that calculating engineering productivity is notoriously difficult, but then (c) coming up with a best, honest estimate, while still highlighting the potential future pitfalls and the dangers of Goodhart's Law.
In short, I read this article as exactly the kind of engineering leadership a good C-suite would want to keep around and promote.
If I was a CEO with a VP who wrote something this realistic and honest, my response would be to pressure the rest of my execs to give me that same level of quality information. Breath of fresh air!
One of my long standing beefs with C-suites and their MBAs and whatever meat factory grinds them out is the continued lack of comprehension of basic software and IT management.
Back in the early 2000s, this could be saliently characterized by the exuberance with which many CEOs bragged about how IT-ignorant they were. I saw "my secretary prints out my emails" proudly written in print many a time in CEO-worship magazines like Fortune and the like.
Every decade of continued IT penetration and performance benefits in business reinforces the principle: the health of your business is strongly tied to its IT health.
BUT MY COMPANY JUST STAMPS WIDGETS. Ok, look. MBAs long ago accepted that every business, regardless of what it does, needs two things: accounting and finance. So every MBA gets a decent education in those.
Well, hate to tell you, MBAs need to know the fundamentals of IT.
Basically, C-suites not understanding that productivity measures in IT are hard is stupid not only because it is a basic aspect of IT management; it's a bit more disturbing than that:
YOU CAN'T MEASURE MANAGER PRODUCTIVITY EITHER. How does a traditional MBA measure middle managers and those types of orgs? That is rife with all the problems of measuring developers, not least the universal "juke the stats" phenomenon. That's why the most important single indicator of manager ability is: headcount. Headcount headcount headcount. The second is size of budget: bigger bigger bigger.
Of course everyone who has worked in companies knows the hilarity of what that produces: bloated orgs, pointless spending to use up a budget so it doesn't get cut, empire building, etc.
The essence of developers that C-suites need to understand is that they are not factory workers: they are in fact MANAGERs. They just don't manage people; they manage virtual employees/subordinates (software). Go ahead, ask any software dev in an org what they do. Generally, somewhere in there is a bunch of acronyms and the fact that they ... wait for it ... MANAGE those acronyms. Krazam's microservices video is a perfect example of this. What is that long diatribe essentially about? MANAGING all those services.
That gets to another major undercurrent of IT. Managers want to place IT employees in the "worker bee" category. They don't want IT workers to be managers, entitled to the great benefits accorded to the management class. The long-running IT salary advantage in the US, and the constant ebb and flow of management and IT "working" to push that down, is a major social undercurrent here.
Wondering where the real problem lives, I can't help but get back to the disconnect between education and employment. Isn't it that most curricula have a bunch of borderline nonsense and a bunch of things you end up applying every day?
Companies pay taxes, those go towards education even if it is indirect by organizing life sufficiently to make training possible. What formula can be had by which companies may rate chunks of education or nudge it towards more useful knowledge? Do they even know what they need?
Oddly schools measure productivity all the time. It seems a hilarious puzzle.
For a MBA level, I'd say the education probably needs to involve:
- visualizing the types and size of data collected and maintained by IT systems. Data size will generally scale with the amount and complexity of systems, and I think there can be good high-level measurements for MBA types about data sizes and complexity. Of course they won't measure productivity in systems dev, but from a systems-maintenance perspective it could help.
- a lot of apocryphal stories about systems development and implementation
- and of course, as I alluded to, a model of the employees and their responsibilities and necessary competency in IT, and how the business relies on them. How a competent IT org isn't just a maintenance cost on the bottom line, how Amazon leveraged good Silicon Valley talent to become a lethal marketplace weapon, and probably lots of stories of big-box stores relying on good IT orgs to scale their logistics and operations beyond what was thought possible in the 80s and 90s.
That's from two minutes of me thinking about it. What modern MBA student doesn't want to know how Amazon does things? They run both a high-margin software/hardware operation and a low-margin retail operation with "good" IT (I mean, don't get me started on the workplace practices, but they scaled with good IT talent, ditto for Google).
Good IT + Good Finance should power any decent business plan to success. Really, in the modern age of cartels, the only way to disrupt the big players in a traditional market will likely be through well-applied IT that has superior scaling, cost, integration, and adaptation to the old guard players.
It feels like 'management' here means something like 'take responsibility and ownership'. And that is the thing that demands the rewards. Is it also the difference between a 'worker bee' and a 'manager'? The former is 'do the thing', the latter is 'figure out what thing to do.' But many software people are actually both, of course... so it seems these are not useful dichotomous categories...
Right, but I think that used to refer to using applications. Developing them is much more like real illiteracy. You can look at a bug tracker, but the todo list might as well be in Chinese.
This is so much the case that on a Freakonomics episode I caught on NPR the other day, about economists attempting to prove whether the Peter Principle is true (TL;DR probably yes, and likely more so than you’d even guess—but also this is all very hard so maybe don’t take it too seriously), it seemed that this is something so difficult that they barely even try to do it, and when they do, the measurements are so narrowly focused and come with so many caveats that one hesitates to call them useful.
Like best case you manage to find that you can measure one of several plausible performance indicators and if you’ve picked the exact right kind of job where worker productivity can kinda-meaningfully be measured (it often cannot) then maybe you can conclude some things with a large enough dataset, but then connecting that to any particular behavior on the part of the managers is another big hurdle to overcome before you can try to, say, develop effective scientifically-sound training, and at that point you’ve likely wandered into “just give up, this is too messy” territory.
[edit] and this:
> That gets to another major undercurrent of IT. Managers want to place IT employees in the "worker bee" category. They don't want IT workers to be managers, entitled to the great benefits accorded to the management class. The long-running IT salary advantage in the US, and the constant ebb and flow of management and IT "working" to push that down, is a major social undercurrent here.
Nail. Head.
Managers of developers act piss-pants afraid of us properly joining the (socially speaking) upper-middle (cf Fussell) or professional class, which the MBA set are busy trying to eliminate everywhere else (lawyers, doctors, college professors) so they’re the last ones standing.
This shit is why micromanagement PM frameworks and packing us into loud visually-messy sardine can open offices is so popular. Dropping that stuff would be table-stakes for our moving up the social status ladder. Adopting it pushes us way down the pecking order.
I was also cynical at first but it ended up as a triumph over C-suite politics.
This person took a nonsense buzzword "productivity" and turned it into actual measurable numbers which make sure the company is moving forward, and linked those numbers to enjoyable developer activities.
This seems like a good outcome to me. The C-suite actively don't want the developers spending time fixing bugs and doing drudge work. Developers are incentivised to avoid bugs with good code and testing. They can knock back bad implementations because those will lead to more bugs which the C-suite have said they don't want.
> So that’s my productivity measure: value-add capacity. The percentage of engineering time we spend on adding value for users and customers.
I think this is an important metric, but it doesn't mean anything if that time is wasted. Imagine the team has no muda and is spending all its time on value-add; productivity is 100%, right? Well, no: you still need to succeed at what you're doing, do it well, and do it in a reasonable time.
This isn't a measure of productivity, it's a measure of time allocation.
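To make that distinction concrete, here is a toy calculation of the article's value-add capacity number from assumed time buckets. The categories and hours are made up, not the author's taxonomy, and nothing in it captures whether the output was any good.

```typescript
// Toy time-allocation buckets for one period (hours); categories are assumptions.
const hours = { valueAdd: 620, bugFixing: 180, maintenance: 120, overhead: 80 };

const total = Object.values(hours).reduce((a, b) => a + b, 0);
const valueAddCapacity = hours.valueAdd / total; // the article's metric: share of time on value-add

console.log(`value-add capacity: ${(valueAddCapacity * 100).toFixed(1)}%`); // ~62%
// Note: nothing here says whether those 620 hours produced anything customers wanted,
// which is the parent comment's point about time allocation vs. productivity.
```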
If I code up to the specification given by the business, a feature that annoys customers and makes them switch to the competition, is it my problem?
For an engineering department, the amount of "value-add", i.e. how many features they deliver (to the specification, of course; if the requirements are bad, that is on the business), is the only measure of productivity, because that is something they actually have control over. Reasonable time is implied by the amount of value-add delivered.
Don't go overboard with the definition; it should be in the context of engineering, not the whole company. If the business cannot come up with good features, that time is wasted for the company, but engineering is still doing what it is paid for.
Does his framework include a measure of output too by looking at ROI? So if you are able to deliver more effectively, that should increase ROI independently of whether you reduce the amount of muda time.
All of this feels like a side effect of a low-trust environment. I respect the author's creativity, but the environment described sounds highly dysfunctional. In my opinion, if the CEO is demanding an arbitrary metric from a subordinate without being able to articulate what they are looking for, to the point that the subordinate has to invent a made-up number just to justify their existence, that sounds like poor leadership to me. Of course, this is only based on the author's side, and perhaps I'd feel differently if I heard the other side.
It is (source: I work in an org like this), but your aim as a leader, the VP in this case, is to do whatever is necessary to enable your team to succeed, given the constraints that exist.
Ineffective company operation and other people’s hangups and shortcomings are constraints. Does the nirvana where you are never limited by the ineffective operation of others exist anywhere? Some companies are better, some are worse, but all of them are collections of people with limitations at the end of the day!
What about the Bayesian methods shown in "How to Measure Anything"? They have been applied to cybersecurity ("How to Measure Anything in Cybersecurity Risk") in a very thorough and convincing manner. It looks like the business around it is trying to apply it to product management (https://hubbardresearch.com/shop/measure-anything-project-ma...). Basically the idea is that when things are hard to measure, we should not abandon quantitative scales for qualitative ones (like t-shirt sizes), but instead use probabilities to quantify our uncertainty and leverage techniques like Bayesian updates, confidence intervals, and Monte Carlo simulations.
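As a rough illustration of that style of analysis (not the book's exact method), here is a minimal Monte Carlo sketch: each deferred-maintenance risk gets an assumed annual probability and a 90% confidence interval for its dollar impact, modeled as a lognormal, and the simulation yields an expected annual loss. All names and numbers are invented.

```typescript
// Each risk: an annual probability and a 90% CI for dollar impact (lognormal model).
interface Risk { name: string; annualProbability: number; ci90: [number, number]; }

const risks: Risk[] = [
  { name: "deferred security patching leads to breach", annualProbability: 0.05, ci90: [100_000, 5_000_000] },
  { name: "unfixed data bug corrupts customer records",  annualProbability: 0.20, ci90: [20_000, 500_000] },
];

// Standard normal sample via Box-Muller.
function randNormal(): number {
  const u = 1 - Math.random(), v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Draw from a lognormal whose 90% CI is [lo, hi].
function sampleLognormal([lo, hi]: [number, number]): number {
  const mu = (Math.log(lo) + Math.log(hi)) / 2;
  const sigma = (Math.log(hi) - Math.log(lo)) / (2 * 1.645); // 90% CI spans ±1.645 sigma
  return Math.exp(mu + sigma * randNormal());
}

function expectedAnnualLoss(risks: Risk[], trials = 100_000): number {
  let total = 0;
  for (let t = 0; t < trials; t++) {
    for (const r of risks) {
      if (Math.random() < r.annualProbability) total += sampleLognormal(r.ci90);
    }
  }
  return total / trials;
}

console.log(`expected annual loss: ~$${Math.round(expectedAnnualLoss(risks)).toLocaleString()}`);
```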
This is not inconsistent with How to Measure Anything IMO (I like that book as well). The biggest issue to me is that he does not define actual follow-ups on ROI -- it is all estimated in this framework. So while it is all good for defining how to prioritize, it is not helpful retrospectively for seeing whether people are making good estimates.
My work very rarely is a nicely isolated new thing -- I am building something on-top of an already existing product. In these scenarios ROI is more difficult -- you need some sort of counterfactual profit absent the upgrade. Most people just take total profit for some line, which is very misleading in the case of incremental improvements.
The problem is the muda should have expected values associated with it. Bugs and security vulnerabilities do cost money; these can be expressed as 90% confidence intervals of dollar impact, as in How to Measure Anything.
> Actually, that’s the root of my org’s current capacity problems. They weren’t cheating a metric, but they were under pressure to deliver as much as possible. So they deferred a bunch of maintenance and took some questionable engineering shortcuts. Now they’re paying the price.
> Unfortunately, you can get away with cheating this metric for a long time. Years, really. It’s not like you cut quality one month and then the truth comes out the next month. This is a metric that only works when people are scrupulously honest, including with themselves.
I fear this is what's happening at basically every major tech firm in the world right now. We might not fully realize the stupid breakages and tech debt of these decisions (namely: layoffs and large refactors of the product and system to inject AI everywhere) for _quite_ some time. I fear it'll be just like "social first" and "mobile first" iterations of the web that create a bunch of cruft to be cleaned up later.
I like DORA metrics as a guideline (deployment frequency, lead time for changes, change failure rate, time to restore), which aren't perfect but are fairly straightforward to understand and measure. I might also look at the number of production touches (manual interventions in the delivery pipeline), story points estimated, story points delivered, number of pages, number of bugs created/closed/closed with a change, and number of errors/exceptions. It's possible to hide technical debt here, but it eventually starts to manifest in one of the stability metrics (bugs, exceptions, production touches, etc.) or a drop in velocity.
Though this is more about engineering delivery of the product roadmap. I do like the 'bets' framework in the OP for figuring out whether the product roadmap actually results in meeting the desired goals for the organization (growth, profits, whatever), but these should be measured as different things.
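For what it's worth, the DORA numbers mentioned above are simple to compute once you have deployment and incident records. Here is a rough sketch with an assumed record shape, not any particular tool's schema.

```typescript
// Assumed shape of a deployment record; adjust to whatever your pipeline actually emits.
interface Deployment { at: Date; commitTimes: Date[]; failed: boolean; restoredAt?: Date; }

function doraMetrics(deploys: Deployment[], periodDays: number) {
  const deploymentFrequency = deploys.length / periodDays; // deploys per day

  const leadTimes = deploys.flatMap(d => d.commitTimes.map(c => d.at.getTime() - c.getTime()));
  const leadTimeHours = leadTimes.reduce((a, b) => a + b, 0) / leadTimes.length / 3_600_000;

  const failures = deploys.filter(d => d.failed);
  const changeFailureRate = failures.length / deploys.length;

  const restoreTimes = failures
    .filter(d => d.restoredAt)
    .map(d => (d.restoredAt!.getTime() - d.at.getTime()) / 3_600_000);
  const meanTimeToRestoreHours =
    restoreTimes.length ? restoreTimes.reduce((a, b) => a + b, 0) / restoreTimes.length : 0;

  return { deploymentFrequency, leadTimeHours, changeFailureRate, meanTimeToRestoreHours };
}
```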
1. I want you to build me a borgizaf. How long will it take you and how can I measure your productivity? Now I want you to change the following line of code to print "bye" and answer the same two questions.
`console.log("hello");`
Any answer to the first question has a certainty of 0%. The answer to the second is almost 100% certain. The certainty is directly dependent on whether you know what you are building AND whether you have direct experience having completed the same tasks.
2. Given some task that is to some extent unknown (see above), how do we proceed? We reduce the uncertainty, a degree at a time. This leads directly to "hello world". You start with a program that shows what a borgizaf can do and show it to the people who want it. If it is wrong, evolve it. Iterate.
The problem here is that often the people who want the borgizaf can't really tell you what it is. Especially if it has not been made before. But they can tell you what it is not. (See Notes on the Synthesis of Form). Even the people who want a borgizaf may not know what it is. And it may change.
3. In order to increase "productivity" we often have teams of people working on aspects of the same problem. After years of labor each part is done and one day the thing is assembled. Unfortunately, the drive team did not tell anyone they were using front-wheel drive, and everyone else thought it was rear-wheel drive. Left vs. right driver position, etc. You don't believe this is possible? One word: "Ingres".
Iteration must be on the whole. Unless you are iterating on the whole, your certainty can never be 100%, because the pieces might not fit together.
4. The speed at which you develop a borgizaf is dependent on the speed of iterations and the degree to which each iteration reduces uncertainty.
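A toy way to see point 4: if each iteration cuts the remaining uncertainty by some fraction, the calendar time to a "done" borgizaf falls out of the iteration length and that reduction rate. The numbers below are purely illustrative.

```typescript
// Toy model: uncertainty shrinks geometrically, u_n = u_0 * (1 - r)^n.
function iterationsNeeded(initialUncertainty: number, target: number, reductionPerIteration: number): number {
  return Math.ceil(Math.log(target / initialUncertainty) / Math.log(1 - reductionPerIteration));
}

const daysPerIteration = 5;
console.log(iterationsNeeded(1.0, 0.05, 0.3) * daysPerIteration); // ~45 days at 30% reduction per iteration
console.log(iterationsNeeded(1.0, 0.05, 0.1) * daysPerIteration); // ~145 days at 10% reduction per iteration
```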
It's a bit worse than that. The borgizaf is often defined by business development, or somebody like that. They decide they need one after a bunch of customer interviews. But the customers don't actually know what they need. So the business development people are trying to describe something that the users need but don't know that they need, and that doesn't yet exist.
Yeah, the main problem is uncertainty, not lines of code to write. You measure progress mostly by how much you reduce uncertainty.
Yes. Often the need is best understood through iteration, but certainly having "eyes on" by the ultimate customer helps. Once I was told by marketing that security was not important. The customers got a prototype and screamed bloody murder. I think the moral of the story is to view the development of a product as a collaboration between all parties. All, including customers.
If there are silos of concerns then you can end up with a beer distribution game.
And this: "You measure progress mostly by how much you reduce uncertainty." is very succinct
The iterative approach described here to finding a 'good' borgizaf is simulated annealing. Make an initial guess at the solution, then start with changes that are large early on and become increasingly refined as you get closer to your goal.
The Metropolis-Hastings algorithm works better in situations where you know even less about what a good borgizaf looks like. Perform many experiments (code changes) and toss out all of the ones that don't work. Science works this way. An automated LLM coder would probably also go this route, as the cost of trying many things would be more heavily automated.
Large systems tend to look more like the genetic algorithm. The space is diced up into individual components and then each 'gene' is optimized in parallel. For example if you were trying to build a Linux distribution you'd have several hundred packages and then for each release the packages could improve or be entirely replaced with better versions (or swap back and forth as they competed).
Of course there are other search strategies that can be employed. Search is still an important area of research.
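For readers who haven't seen it, here is a bare-bones simulated-annealing sketch of the "large changes early, refinement later" idea; the cost function is a stand-in for whatever "distance from the borgizaf the customer actually wanted" means in practice.

```typescript
// Minimal simulated annealing: proposal size shrinks with temperature, and worse
// candidates are occasionally accepted to escape local optima.
function anneal(
  initial: number[],
  cost: (x: number[]) => number,
  steps = 10_000,
): number[] {
  let current = [...initial];
  let best = [...current];
  for (let i = 0; i < steps; i++) {
    const temperature = 1 - i / steps; // cools from 1 toward 0
    // Propose a change whose size shrinks as the temperature drops.
    const candidate = current.map(v => v + (Math.random() - 0.5) * temperature);
    const delta = cost(candidate) - cost(current);
    // Always accept improvements; accept regressions with probability exp(-delta/T).
    if (delta < 0 || Math.random() < Math.exp(-delta / Math.max(temperature, 1e-9))) {
      current = candidate;
      if (cost(current) < cost(best)) best = [...current];
    }
  }
  return best;
}

// Example: find x close to [3, -1], a placeholder for "what the customer actually wanted".
const result = anneal([0, 0], x => (x[0] - 3) ** 2 + (x[1] + 1) ** 2);
console.log(result); // approximately [3, -1]
```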
Sure, everything is an optimization problem, the hard part is defining your cost function, especially if the borgizaf is trying to solve an ill-formed business problem.
> Like any engineering organization, we spend some percent of our time on fixing bugs, performing maintenance, and other things that are necessary but don’t add value from a customer or user perspective. The Japanese term for this is muda.
That translates as "pointless" or "useless"... that's like, the opposite of "necessary".
Just sticking with an English word like "foundation" would make sense here - necessary, but not something a customer or user typically sees. Even further, metaphorically, fixing bugs = stabilizing the foundation.
Using foreign words just to sound fancy and ending up with the wrong meaning is just annoying.
They're saying the things that don't add value are muda (waste), not the foundational stuff. I only know that because my employer uses the same jargon. And since almost no one speaks phonetically-spelled-out Japanese, they always follow it with (waste) in parentheses.
“Muda” is a technical term with a specific meaning. (Google it.) It originated in the Toyota Production System, then migrated to software development via the Poppendiecks’ “Lean Software Development.” It doesn’t mean “pointless” any more than legacy software means “an inheritance.”
If you're aware of Toyota's 7つのムダ sure. I expect the average person to think 無駄 as used in daily conversation. I don't think the analogy with "legacy software" is accurate; "legacy" has multiple definitions in modern English, whereas 無駄 has exactly one definition in modern Japanese.
One of the example sentences -- 時間を無駄にする -- is exactly what my mind thought when reading the article's explanation of "muda". (Also: the second entry does not apply to modern Japanese; dictionaries reference old literature for example sentences of obsolete definitions)
I think the confusion here is that you're treating "muda" as a Japanese word (at least I presume that's why you're transliterating it back into Japanese).
It's not. It's English technical jargon of Japanese origin. English is rather notorious for this sort of borrowing. In this case, it was borrowed over from Toyota as part of the Lean Software™ methodology[0], so 7つのムダ is in fact the intended reference.
This appears to be in agreement with my original comment:
> In order to eliminate waste, one should be able to recognize it. If some activity could be bypassed or the result could be achieved without it, it is waste. Partially done coding eventually abandoned during the development process is waste. Extra features like paperwork and features not often used by customers are waste. Switching people between tasks is waste (because of time spent, and often lost, by people involved in context-switching). Waiting for other activities, teams, processes is waste. Relearning requirements to complete work is waste. Defects and lower quality are waste. Managerial overhead not producing real value is waste.
Note what isn't included: Necessary but hidden things. In fact, one of the things included here is explicitly the opposite of what OP included - defects (bugs). They called fixing bugs a waste, while this is saying having bugs is a waste.
Everything I'm seeing from the responses I've gotten is just reinforcing that OP is wrong about this word, whether it be straight Japanese or altered English jargon originating from 7つのムダ.
“Defects: Having to discard or rework a product due to earlier defective work or components results in additional cost and delays.”
In TPS and Lean, you’re expected to build quality in. So, while fixing bugs is necessary, it’s considered a type of waste, to be eliminated by building software without bugs.
Which indeed I've never heard of. But from what I can find [0], the meaning here still matches what I said - these are things that can be safely removed to save time/money without harming the end product. Not "necessary but hidden" things.
> In other words, in the absence of RoI measures, the percent of engineering time spent on value-add activities is a pretty good proxy for productivity.
What about using self-rated satisfaction of your workers in terms of how happy they are with their engineering solutions? I find that when I solve a really nice problem and add something that is cool, or fix a very annoying bug, I feel very satisfied with the result. Whereas, I have produced things in the past (when I worked as a programmer) that seemed to make management happy that I wasn't satisfied with, and it seemed to me at least that the results weren't as useful in the end.
This is way too subjective; I've met way too many developers throughout my career who loved to spend their time crafting "perfect" code rather than deliver customer-facing value for this to work as a measure that aligns with business objectives.
I love this. When they RTO'd us at work, I ended up sitting next to someone who isn't on our team and who isn't an engineer. He complained to his manager that I curse a lot and when they asked my manager to have me tone it down they moved him instead.
I'm not being loud or particularly vulgar, just lots of "what the fuck?" and "what is this shit?" and "how the hell does this even work?" being involuntarily mumbled to myself.
I had the exact opposite experience :(
Stating things like "what is this shit" or "This is an absolute dumpster" was taken as demoralising colleagues ...
The “fatal flaw” of skipping maintenance to max out value-add work seems like it could be addressed by ensuring that any accumulated tech debt be properly accounted for, rather than swept under the carpet.
You’d consider “productivity” then to be value-add work minus identified tech debt. Since calling out tech debt would hurt leadership’s own metrics, you’d need a mechanism to allow individual engineers to blamelessly (and probably anonymously) report, validate, and attach a cost metric to this tech debt.
The org would then be incentivized to balance work that delivers immediate value with work that represents an investment in making the team more efficient and effective at delivering that value in the long run.
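A minimal sketch of that accounting, assuming deferred muda gets logged (in practice, anonymously) with an estimated future cost that is subtracted from the headline value-add number; the field names and figures are hypothetical.

```typescript
// Hypothetical anonymous debt report: what was skipped and a rough estimate of its future cost.
interface DebtReport { description: string; estimatedFutureHours: number; }

function netValueAddHours(valueAddHours: number, reports: DebtReport[]): number {
  const liability = reports.reduce((sum, r) => sum + r.estimatedFutureHours, 0);
  return valueAddHours - liability;
}

const reports: DebtReport[] = [
  { description: "skipped integration tests for billing change", estimatedFutureHours: 40 },
  { description: "docs for the new API deferred indefinitely",   estimatedFutureHours: 12 },
];
console.log(netValueAddHours(620, reports)); // 568: the 620 "value-add" hours, less the debt incurred
```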
That's the thing though! Quality :tm: is very hard to account for, especially when the quality is in the system, tooling, and process to build the quality end product.
And often times technical debt isn't actually something you can put on the balance sheet or bug tracker. It's all the little investments in the future that are deferred or skipped. That one code change is so minor you can phone in the review, you'll write better documentation for that new feature in a few weeks when you have some time, etc.
It’s a good practice to note things for later and why they should be done though, right? Even if it is never intended to be worked on, noting that you would write more docs for this class but you don’t have time is an important indicator of productive capacity for leadership. If I start seeing a lot of that as an executive, I should start to worry if we’re building our value-add on a foundation of sand.
I don't think I've ever seen a ticket to write docs get prioritized. The backlog is just another void that happens to make people feel like they're doing something useful
Tech debt is categorically unquantifiable. Most of the time it's more of a feeling than a number, and it's not an accessible number ever. What's the ROI on paying down debt? Hold as is, renegotiate, or extinguish? It's the same calculation that goes into deciding which of the five thousand value-add proposals to prioritize. The piece is unconvincing for that reason. There's an assumption that TD is known, and an implicit assumption that the ROI on its remediation is known. Neither of those is true. Systems evolve to where they are, with all their TD warts, because value-add was prioritized.... and then we have TD. I reckon we should just live with that uncertainty, move the product forward, calculate ROI using the same bogus productivity metrics we always have, and stop inventing "better" systems which are just another form of magic, but manage to suck up time and resources not required by accepting on faith the old bogus metrics.
It’s really not unquantifiable. I read “How to Measure Anything in Cybersecurity Risk” and it was an eye opener. Using a table of risks and outcomes with associated probabilities and 90% confidence intervals of dollar impacts we can quantify categories of technical debt.
If "Cybersecurity Risk" were the only form of technical debt, we'd be just fine(?). Or, at least, we'd have some sort of metric. It wouldn't be a good one, but it'd be there. Chance of a breach: 1%. Existential or not? Probably not. Cost of mitigation? Probably small. Worth addressing? Mostly no, unless you're a regulated entity; then it's mandatory. Quantifiable, for this narrow case, but what of the rest?
Apply the same mentality to other things. If the cybersecurity folks can quantify risk so can you. Are you keeping track of your supply chain? How modular is your code? How easy to refactor is your code? You could think of reasonable metrics to measure various aspects of technical debt. It won't be perfect but it's better than nothing.
I think a bad metric is very much worse than nothing. It sucks away time to record, debate, report, and discuss. It encourages bad decision making. If you throw up a number people will give it weight, even if it's stupid. Multiplying 6 gut checks and trying to make a decision about engineering direction is like tracking someone's mood by the metric of whether they ate an odd or even number of calories yesterday. There's theoretically a signal under all that noise, but the direct gut-check or any number of qualitative clues are so much better than the distracting number.
I agree whole-heartedly. A bad metric is a curse. It's misleading, resulting in waste, and falsely reassuring simply because it exists as a number.
+100 on the gut-check qualitative approach
> The “fatal flaw” of skipping maintenance to max out value-add work seems like it could be addressed
Who says it's a flaw? And even if it is, who says it needs to be addressed?
It's all contextual: tech debt used to be a flaw that could destroy a product, but nowadays I'm seeing teams rewriting components of their products every 18 months in whatever new fad seems to come along.
Why care about debt when it's going to be written off in the future?
And even if it isn't, the person who accumulated the debt did so by adding features - he's the man that delivers, so he gets to go up the ladder
It's not a fair world: anyone who actually cares about bugcount, product quality, customer satisfaction and sustainable velocity just isn't going to get recognised for the fires they prevented.
I think you’ve misread my post. What is being “addressed” isn’t technical debt itself, but rather the author’s proposed failure mode of totally ignoring muda to focus on overly-incentivized “value add”, which he correctly forecasts will slowly destroy the product and company.
I’m saying that this doesn’t have to be a failure mode, so long as you acknowledge and record when muda has been skipped, and take that into consideration when holding leadership accountable to productivity metrics.
For important but non revenue producing aspects like security, there are actually insurance markets now for breaches. The insurance companies lower your premiums based on their assessment of overall risk, making your exposure more quantifiable.
He mentions this in the article to say that it’s a “fatal flaw” that might cause this methodology to not work in your org. However, he also fatalistically assumes that skipping muda is a failure case, rather than just a realistic response to balancing short vs long term considerations.
I suggest above that muda should either 1) be worked on, or 2) the fact that it’s being deferred should be explicitly captured. And, since there are competing interests (leadership is accountable to net productivity while ICs are not) the deferral capture needs to be anonymous to prevent top-down pressure in the direction of ignoring tech debt accumulation.
I'm of the opinion that it isn't really possible to even define "productivity," let alone measure it.
Different layers of the org have different expectations for "productivity," and the problems generally occur when conflicting definitions collide.
I have come to learn that there's really no substitute for hiring really good, experienced engineers, and then managing them with Respect. I know that the SV "Holy Grail" is to come up with The Perfect Process, whereby we can add a monkey, pay them bananas, treat them like circus animals, and come out with perfect code, but I haven't seen anything even close to that succeed.
Beware Goodhart's Law: "when a measure becomes a target, it ceases to be a good measure". If your goal is to stop wasting time fixing bugs, I'm sure you're going to be able to do that.
You should have an important counter-metric to check that you're not messing up the software. It could be the number of reported bugs, crashes in production, etc.
Then it becomes the Challenger scenario. Various pieces are failing, but the whole mission succeeds, so everyone ignores the known risks because management is interested in their return on investment. That works right up until the rocket explodes and suddenly there are lots of external people asking serious questions. Boeing is going through the same thing, having optimised for ROI as well, and its planes are now falling apart on a daily basis.
Who always gets in trouble for this? More often than not the developers and operators who in a high pressure environment optimised what they were told to optimise and gamed the metrics a little so they weren't fired or held back in their careers.
Naming it "muda" helps push it that way, too: If any of those higher-ups decide to look up the word, they'll see that you're calling bugfixing "pointless work".
Professional athletes have a lot of telemetry on them. But some of that telemetry makes sense during training, and maybe makes more sense for a brief period of time while they work on technique.
You focus on something intensely for a little while, get used to how it feels, then you work on something else. If your other numbers look wrong, or if it's been too long, we look at it again.
I think there are two answers here. The first is treating software development as a process - and using some form of statistical process control to “manage” the process, reduce waste, etc. This, it turns out, is exactly what Agile / Scrum was / is all about. Remember those “retrospectives” we are supposed to do - that’s the point at which we whip out a diagram and see if we are trending out of band.
It’s fine. It really does work, and it does help reduce waste - and in any sane world, reduced waste is a proxy for productivity.
But no matter what, that is only “doing things right”. It is not “doing the right thing”.
At this point we can also say we have the tactical and operational parts under control (or at least monitored). The “doing the right thing” part (or strategy) is next, and this means something most businesses refuse to accept - that their grand strategy, their Vision, might be total bollocks.
Searching for productivity in software teams is the new version of blaming lazy workers on the factory floor - we are all software companies now, and if you are not succeeding you can either blame your crappy product-market fit or your strategy, or you can whip the employees harder. Guess which is easier.
So if you and the CEO need to make fucking appointments to have an honest chat about strategy, something has gone terribly wrong with your company's ability to respond quickly and strategically - if you have a problem, it's not the employees that need to run faster.
edit: yeah, I started ranting there, sorry. The whole “takes ages to get face to face time” thing is a red flag, but you can get around it. As for the rest, I really respect the attempt to treat the whole company as a system - not just the software team - the product-bet idea clearly sets out goals and places emphasis on the whole org. That's really great, but as you say, “executives prefer if other people are held accountable”. But if they can learn that a company can be programmable, then they might learn to start programming.
> this means something most business refuse to accept - that their grand strategy, their Vision, might be total bollocks
Actually this is often talked about, but only internally to the senior team. You generally don't want to be saying "actually widgets might be a bad idea" which might horribly demoralise your widget factory workers, when you might well conclude after discussion and analysis that actually they're still a good idea.
Seems like the author of this is aware of Goodhart's law, but it bears repeating: both of the proposed real measures (RoI and value-add capacity) could come at the expense of uptime and stability. John Doerr talks about balancing OKRs, so you could have your product bets and... something about stability: hitting SLAs, whatever.
And the term "product bets" makes the leadership team sound like Mortimer and Randolph Duke.
With articles like this I always think it'd be really interesting to hear from others on the team. I find it really hard to imagine there's not going to be pressure (either self-inflicted or from above) to 'game' it - and how do you decide what's gaming and what's pragmatic anyway? Perhaps better said: I find it hard to imagine people don't feel like it's being gamed, that they're having to add value where they think they should be maintaining, even if OP or whoever is deciding that thinks it's the right call.
(And to be clear I can't say and am not saying which is right, from over here with zero information about any such decisions.)
It could be fun to have a platform for honest blogs about work stuff where it's grouped by people verified to work (or have worked) at the same place. (Or, I realise as I write that, a descent into in-fighting and unprofessionalism..)
I would point out that Martin Fowler, Kent Beck and Gergely Orosz have all written that you cannot measure the productivity of a single team member.
*You can get a rough sense of a team's output by looking at how many features they deliver per iteration. It's a crude sense, but you can get a sense of whether a team's speeding up, or a rough sense if one team is more productive than another.*
So in that sense "value-add time" will be a valid metric. But it is still not a single number that one can give to the CEO without context; the CEO still has to ask questions like "we usually have X time spent on fixing bugs, aren't you gaming the system to get your bonus?". It is the CEO's job to understand the metric so it doesn't get gamed...
Risk is lost in the discussion but very relevant to management.
In theory, in the most productive engineering organization, each person is the expert at what they're doing and work is factored to avoid overlap. This happens somewhat naturally as people silo (to avoid conflict and preserve flexibility). This is actually the perfect system, if your people are perfect: their incentives are aligned, they're working with goodwill, etc.
But that makes every person critical, few people can operate at that level all the time, and fewer still want to work alone doing the same thing ad infinitum. Also, needs change, which re-factors the work and its mapping to workers.
So then you expand the scope to productivity over time with accommodation for change - i.e., capacity.
But by definition, capacity far exceeds output, and is thus unobservable and unmeasurable.
So instead you experiment and measure: switch roles, re-organize work, address new kinds of problems and prototype new kinds of solutions, bounded by reasonably anticipated product needs.
And to inject this uncertainty, you actually have to inject a lot of countervailing positivity, to deflect the fear of measurement or failure, to encourage sharing, etc.
Unfortunately, these experiments and this freedom from consequences are pretty indistinguishable from the bureaucratic games and padding that productivity measures are designed to limit.
Productivity is a totally irrelevant metric. It takes 2 people 5 years to build a 50-foot sailboat. It takes 10 people 4 months.
Which is most productive?
That depends on the price and the quality of course.
Never measure for quantity.
Always measure for quality in the environment you are competing in.
And since most professional work is about making money/use value, it would be a good start to measure how much money/use value people are contributing. If people are not actively contributing to this, they are not productive.
It seems like the main problem with any productivity metric, is that once you tell your workers what the metric is, it will be gamed.
However if you collect the metrics in secret, and only use them as a guide for where productivity could be improved, it seems they could be more useful.
I wonder if any engineering managers are doing something like that?
I guess it would be hard to scale to a large organization, as you would need to share the metrics with the line managers.
I suspect that many quietly effective employees are doing something like that, and that it is a smart way to increase the performance of yourself and your team.
I also suspect that it is not possible to share the metrics, and that you would need to do "parallel construction" to justify your decisions without revealing your real methodology.
I feel like it’s easier to measure for large scale back end systems. Where you can look at things like throughput, scalability, failure rates, latencies, uptime, etc. So to an extent developer productivity can be thought of as the derivative of improvement in those metrics.
For front end development, you need to look at user satisfaction measures. Where it’s a bit more difficult to quantify which contributions are most responsible.
The article reminds me of an ad on the inside back page of Dr. Dobbs Magazine.
It is a Microsoft ad with two pictures in it.
In the first picture there are two developers in a conference room late at night. There is an open pizza box with half a pizza left and a board with a software design that looks like spaghetti. The developers' hair is messy. They look stressed.
The second picture is of a group of people walking into the same conference room with a birthday cake on a bright sunny day. The two developers look happy.
The manager is somewhere in the picture, I think. It's an MS .NET commercial.
I could not find the ad on the Wayback Machine.
Companies attach yearly bonuses to production/profit goals/targets being met, in order to increase productivity.
Whatever productivity is.
I might have skimmed too fast. Productivity seemed not defined?
IMO productivity is all about making products that get used. And profitability is how much people are willing to pay to use such products.
I'm reminded of a conversation I had with a colleague from Georgia, who had a beautiful southern drawl. He said, "The way Ah measure Ahr Oh Ah is Ahr over Ah."
Assigning productivity numbers to individual engineers is enormously difficult-- but at the level of VP of all programming, it seems simple: At the end of a quarter, you know how much you spent on programmers during that quarter. And a couple of quarters after that, you should know how much value was added. Why not calculate ROI as R over I???
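In code, the grandparent's suggestion is a one-liner. The hard part the article dwells on is producing the value-added figure a couple of quarters later, which is simply assumed as an input here; the numbers are made up.

```typescript
// ROI over a quarter: (value added - engineering spend) / engineering spend.
// The value-added figure is assumed to be measurable a couple of quarters later.
function roi(valueAddedDollars: number, engineeringSpendDollars: number): number {
  return (valueAddedDollars - engineeringSpendDollars) / engineeringSpendDollars;
}

console.log(roi(3_200_000, 2_000_000)); // 0.6, i.e. a 60% return on the quarter's engineering spend
```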
Would it help to demand that your non-technical leadership knows category theory, queuing theory, version control, and infrastructure automation?
One way that helps me is that I have nobody on the C suite but me. Because when some lazy business jerk with no knowledge of even a basic functor starts telling me what I should do to be accountable to them, I give them a big fat boot.
Obviously a downside is that scaling my organization to more than one head count is going to be a little difficult!
A proto-measure is whether the software has active users, whether as a library or application or service, etc., and whether internal or external. That's after a substantial initial seed investment. If it still has no active users, it's time to change focus. If it has active users, then assuming it ultimately rolls up into a revenue generator, it must continue receiving funding. This is not a full measure, but it's a foundation.
A reasonable bet should correspond to the (estimated) probability of its outcome. So the amount someone is willing to bet should directly reflect the value it is estimated to generate. Betting higher or lower makes no sense.
We may want to bet more for more likely outcomes, even if the expected return is lower, if we value predictability and the cost of a bad portfolio outcome is high.
"but the ceo had a scheduling conflict so he couldn't come to this thing that was really important to him". Does that ever make sense to anyone? It never made sense to me. I think it's about classism. These guys only want to hang out w/ the lower class if they're at the front of the room talking otherwise they want to come off as mysterious and too busy to have any time. What a load of shit.
Of course if you just take point samples, it fails as you say.
But I don't think people really understand what LOC stats look like in practice for engineering teams. The pattern I've consistently seen --across several companies and many teams-- is that most low-performers simply don't commit very much code every month/quarter/year/whatever. I'm not talking about the senior engineers whose job is architectural design and advising and code review, and so on, who contribute a ton to the project without a line of code. I'm talking here only about junior- or mid-level software engineers whose job is supposed to be hands-on-keyboard code. You will always find a number of them who simply push very little code of any significance, compared to their peers. This will be visible in their commits.
So while it's true that there's no function productivity=f(LOC), there is still information to be gleaned from reviewing the commits (in detail) of everyone on the team, and often you'll see a correlation: the people who are not delivering much value to their team (they fix only a few bugs, or they take a very long time to implement things, etc.) more often than not have very low commit stats overall.
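As a rough illustration of pulling those crude stats (a sketch using plain git output; the three-month window and the raw added/removed line counts are just illustrative choices, not a scoring formula):

```python
# Rough sketch: per-author added/removed line counts from git history.
# Run inside a repo. This is only a prompt for a detailed commit review,
# not a productivity score.
import subprocess
from collections import defaultdict

log = subprocess.run(
    ["git", "log", "--numstat", "--format=author:%ae", "--since=3 months ago"],
    capture_output=True, text=True, check=True,
).stdout

totals = defaultdict(lambda: [0, 0])  # author -> [added, removed]
author = None
for line in log.splitlines():
    if line.startswith("author:"):
        author = line.removeprefix("author:")
    elif line.strip() and author:
        added, removed, _path = line.split("\t", 2)
        if added.isdigit() and removed.isdigit():  # skip binary files ("-")
            totals[author][0] += int(added)
            totals[author][1] += int(removed)

for author, (added, removed) in sorted(totals.items(), key=lambda kv: -kv[1][0]):
    print(f"{author:30s} +{added} -{removed}")
```

None of this replaces reading the actual commits; the point is only that the outliers described above tend to be visible even in crude stats.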
Sure, this is useful for spotting the outliers who just straight-up don't work as much, but it's much less useful for telling the difference between a busy person and a productive person.
I have seen bad devs who make daily commits with tens or hundreds of LoC, but whose tools never actually convert to production use because they are mired in development hell, until someone pulls the project away from them and hands it to someone else.
Devs who are so bad that they legitimately spend weeks trying to make a tool that I then built in 2 days.
I think LoC irks me so much, because if I finish someone's "2-month" project in a week, LoC optimization effectively penalizes me for it.
It seems to me that “productivity” is being used here carelessly. It is a ratio of resulting output to effort. High productivity produces more output for a given effort than low productivity, assuming other factors like quality are held constant.
Frankly, in software we can’t even determine the extent of, and relationships between, requirements, effort, quality, and output in any predictable way. Given this, isn’t an attempt to produce some resulting measure like productivity meaningless?
Maybe one day they should measure the amount of time they spend on this and improve their own productivity by not wasting so much time chasing a fool's errand.
My TL;DR - estimate (sustainable) Value-add capacity.
The current/instantaneous value-add capacity is what the team/department can deliver in the short term without consideration of hidden costs (tech debt). The 'sustainable' modifier adjusts that to be a long-term average.
This is something developers and team managers will certainly know, but is good to see framed from the top-down (though I don't know how many C-level execs know tech debt 'in the gut'). Usually it's ignored until it gets so bad a rewrite is required.
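To make the "long-term average" concrete, a toy sketch (it assumes you already classify hours as value-add vs. everything else, which is exactly the honesty problem raised elsewhere in this thread; the numbers are invented):

```python
# Sketch: instantaneous vs. sustainable value-add capacity.
# Assumes each month you record hours classified as value-add vs. other
# (maintenance, rework, tech-debt interest). Numbers are hypothetical.

monthly = [
    # (value_add_hours, total_hours)
    (620, 800), (640, 800), (590, 800),
    (400, 800), (380, 800), (350, 800),  # debt coming due
]

def capacity(value_add: float, total: float) -> float:
    return value_add / total

instantaneous = capacity(*monthly[-1])                                  # latest month only
sustainable = sum(v for v, _ in monthly) / sum(t for _, t in monthly)   # long-run average

print(f"instantaneous: {instantaneous:.0%}, sustainable: {sustainable:.0%}")
```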
The executive summary of this article (for me) is: burst the delusion bubble around your stakeholders. This is admirable, but that delusion existed in the first place for a reason. The reason still exists.
TLDR: This article is about a new way to measure software engineering productivity. It discusses the challenges of measuring productivity in software and proposes a metric called value-add capacity, which focuses on the percentage of time engineers spend on activities that directly add value to the customer. The author acknowledges that this metric can be easily cheated, but believes it can be useful if used honestly.
Maybe I'm assuming levels of high functioning and healthy relationships that are unwarranted, but I read this article totally differently.
You're misunderstanding leadership roles, where there's very little 'work-life separation', only 'work-life balance'. You have immense freedom on your calendar, but you also frequently think about work when at home and understand that work can call you back at any hour. Depending on the type of org, you're reasonably likely to spend weekend time at meals or other activities with your coworkers, often with wives and kids along for the ride. Your coworkers are broadly agreeable but, if you had a serious personality clash, you probably wouldn't stick around for that long because such a living situation would become untenable.
In this context (and the remote work situation), his house is presumably a nice place for hosting, and 'come to my house to work on it' is the good-boss version of, 'well unfuck yourself and get back to me in a week with an answer, and it better be good.' It's inviting collaboration and focused time, and the CEO's responsible, agreeable coworker (author of the article) accepted that generous offer and made use of it.
Yes, that’s exactly right. I appreciated the CEO’s offer and was happy to accept. He added on an offer to take me boating on the lake near his house. The actual meeting waited until a bunch of us were in town for an unrelated meeting, and the offer of boating turned into a party for all employees who were in that town.
You’re making a lot of assumptions. The reality is that I was going to be in that city for other reasons, and the visit to his house—which I appreciated and wanted—was tacked on to that trip, and followed by a party the CEO hosted for everyone else who was also in town.
Well, okay, I'm not going to contradict your personal experience. But here's what you said in your article:
> It started half a year ago, in September 2023. My CEO asked me how I was measuring productivity. I told him it wasn’t possible. He told me I was wrong. I took offense. It got heated.
> After things cooled off, he invited me to his house to talk things over in person. (We’re a fully remote company, in different parts of the country, so face time takes some arranging.) I knew I couldn’t just blow off the request, so I decided to approach the question from the standpoint of accountability. How could I demonstrate to my CEO that I was being accountable to the org?
So what I knew is that you said you didn't want to do something, you had an argument, he asked you to come to his house, which is not in your home town, to talk things over, and you knew you couldn't just blow off the request, and had to demonstrate your accountability to the organization. True or not, can you see how one might draw the conclusion that you were forced to present yourself to the CEO in his house in order to show that you were going to toe the line?
False dichotomy. (And I hate the trope that your comment embodied.)
But a CEO probably should know enough to know that certain things are not directly measurable, and should not push their direct reports into trying to directly measure those things. Otherwise the CEO gets meaningless measurements, and then tries to use the meaningless measurements to steer the business.
Sorry, I stand corrected. All of Google's success is due to luck, and none of it due to the talent and skill of their people or the decisions they made.
If they were Scrum coaches, saying this would make some sense. But apparently they are XP coaches. Scrum says very little about the kind of activity you are doing; you could Scrum building a tree house. XP says a lot about programming. An XP coach cannot be someone who has never programmed in their life. I am not sure I like all of XP, though. I like using TDD for everything. I am not sure I like pair programming. It sounds exhausting.
This comment is so incredibly useless that it’s hard to follow up, but I’ll try.
Since you seem to know how to effectively manage an engineering team -- or, at the very least, know what it looks like -- please enlighten us: how would you measure productivity/ROI/whatever useful metric?
Not them, but I agree with their point that management is, by definition, where the buck stops when it comes to projects, yet it is never held accountable when projects go over budget or fall behind schedule; that gets put on developers instead.
I think that productivity has to be defined by deliverables achieved on-time.
If one developer is consistently failing to achieve the time boxes that their peers collectively set, that is a red flag for their productivity.
If any meaningful *percentage* of your developers aren't hitting their KPIs, that's a management failure.
This distinction is important, because you can't accurately measure productivity if you don't accurately set goals.
Unfortunately, this assignment of responsibility rarely actually works out properly, and developers get used as scapegoats by bad managers who are in fact themselves unproductive in their NoSP (number of successful projects) metrics, and who go from job to job and group to group blaming bad developer productivity for project failures.
There is a reason we hear about developer productivity and not manager productivity, in an industry where most projects fail to meet project goals (which are a management responsibility to ensure, not a developer responsibility).
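A toy sketch of that separation (data and thresholds invented): only flag an individual when their peers are mostly hitting the same time boxes, and treat widespread misses as a goal-setting problem rather than a developer problem.

```python
# Illustrative sketch: individual vs. systemic missed-deadline signal.
# delivered_on_time maps developer -> list of True/False per time-boxed deliverable.
delivered_on_time = {
    "dev_a": [True, True, True, True],
    "dev_b": [True, True, True, True],
    "dev_c": [False, False, True, False],
}

rates = {d: sum(hits) / len(hits) for d, hits in delivered_on_time.items()}
team_rate = sum(rates.values()) / len(rates)

for dev, rate in rates.items():
    if rate < 0.5 < team_rate:
        print(f"{dev}: individual red flag ({rate:.0%} on time vs. team {team_rate:.0%})")

if team_rate < 0.7:  # threshold is arbitrary -- pick your own
    print("widespread misses: look at how goals are being set, not at individuals")
```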
Time boxes in software development are a delusion. There's a reason why the entire industry shifted focus from outputs to outcomes when it comes to software engineering.
It should be outcome-based at every level, not just engineering.
Time boxing as a group (e.g. scrum poker) is useful for preventing a leader from underestimating the time required for a task. That's a necessary safeguard on most teams, where the engineer will be penalized for missing a deadline rather than the leader being penalized for setting too short a deadline - even if this is the n-hundredth missed deadline by one of their reports.
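A minimal sketch of that safeguard (the specific policy here, never committing below the team's median estimate, is just one illustrative choice, not any particular team's rule):

```python
# Sketch of a group time-boxing safeguard (hypothetical policy):
# the committed estimate never falls below the team's consensus.
from statistics import median

def committed_estimate(leader_days: float, team_estimates_days: list[float]) -> float:
    consensus = median(team_estimates_days)
    return max(leader_days, consensus)

print(committed_estimate(leader_days=3, team_estimates_days=[5, 8, 5, 13]))  # -> 6.5
```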
I think he/she is reacting mostly to this quote from the article, not to the main article topic:
> I have a good answer: my job is to double our value-add capacity over the next three years. Essentially, to double our output without increasing spending.
> You know what? With my XP plans and the XP coaches I’ve hired, it’s totally doable. I think I’m being kind of conservative, actually.
I have a simple explanation for why SEs are eternally strapped for time: upgrades.
No not security patches. Upgrades. Like ‘oh the newest best version of React was released yesterday and I’ve just GOT to have those and management doesn’t understand how productive I’ll be and they shorted us on our last upgrade by not letting us go all in’
This endless treadmill of half-baked framework features and UI refreshes streaming in is the root problem.
I really think a team that can close the gate and seal the seams against this problem would be the most effective.
That would mean your most important KPIs would be something like:
* number of new features imported from frameworks (close to zero is ideal). These are expensive, like millions of dollars each.
* number of security patches applied and the response time to apply them (maximize the former, minimize the latter)
* eliminate things with no or short LTS; measure a weighted LTS (LOC using each framework multiplied by that framework's LTS) -- see the sketch below
Now this in and of itself isn’t sufficient to measure an organization’s productivity, but these would keep it on the straight and narrow so other measures would become effective.
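For the weighted-LTS item, a sketch of one way to compute it (the dependencies, LOC counts, and LTS horizons are all made up, and I'm interpreting it as a LOC-weighted average of remaining LTS):

```python
# Hypothetical sketch: weight each framework's footprint by its remaining LTS.
# loc_using is lines of your code touching that framework; lts_years_left is
# how long its long-term-support window still runs. Higher is better: lots of
# code sitting on short-LTS dependencies drags the score down.

dependencies = [
    # (name, loc_using, lts_years_left)
    ("framework_a", 40_000, 3.0),
    ("framework_b", 12_000, 0.5),   # short LTS: candidate for elimination
]

weighted_lts = (
    sum(loc * lts for _, loc, lts in dependencies)
    / sum(loc for _, loc, _ in dependencies)
)
print(f"LOC-weighted LTS: {weighted_lts:.1f} years")
```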
"Doing them swiftly, efficiently, and -- most of all -- completely is one of the most critical skills you can develop as a team."
That all sounds great. However, I'd like to understand what teams are actually able to do this, because it seems like a complete fantasy. Nobody I've seen is doing migrations swiftly and efficiently. They are giant time-sucks for every company I've ever worked for and any company anyone I know has ever worked for.
The fact that it takes decades to master such a mundane task may mean the entire approach is wrong. The article hand-waves away a lot of the complexity of "automating as much as possible."
In my opinion, the solution lies in append-only software as dependencies. Append-only means you never break an existing contract in a new version. If you need to do a traditional "breaking change" you instead add a new API, but ship all old APIs with the software. In other words - enable teams to upgrade to the latest of anything without risking breaking anything and then updating their API contracts as necessary. This creates the least friction. Of course, it's a long way for every dependency and every transitive dependency to adopt such a model.
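A tiny illustration of the append-only idea (hypothetical library code, not any particular project's API): instead of changing an existing function's contract, you ship a new entry point next to the old one.

```python
# Hypothetical library following the append-only convention: the old API keeps
# its exact contract forever; a "breaking change" becomes a new API instead.

def parse_config(text: str) -> dict:
    """v1 contract: returns a flat dict of key=value lines. Never changes."""
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

def parse_config_v2(text: str, *, strict: bool = False) -> dict:
    """v2 contract: adds optional validation. Callers migrate when they choose."""
    result = parse_config(text)
    if strict and not result:
        raise ValueError("no key=value pairs found")
    return result
```

The cost is that the API surface only ever grows, which is the trade-off the append-only model explicitly accepts in exchange for friction-free upgrades.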
This is far too narrow of a scope to be broadly applicable. The industry in general does not spend nearly as much time upgrading packages as you imply, and your metrics don't make sense either. How do you measure the number of "features" imported from a framework?