Technical Debt Management

Sorry for posting so late on this blog. One advantage of being late, though, is that I could read all the previous entries. I’d like to pick up the thread Carolyn initiated (on Feb 11) on the “decision what TD to manage”.

As an architect working mostly in industrial product development (at Siemens), I consider a thoughtful decision on what TD to manage essential for balancing two needs: delivering a viable product (release) with useful, market-competitive features, and meeting the demand for industrial-grade design and code quality. Herb Sutter expressed this “conflict” well: “Features are your asset, Code is your liability.” In real-world development TD is always present, for reasons noted by various authors in this blog and in the literature:

  • The (conscious) decision to trade quality for speed – and, therefore, the need for a later refactoring (Ward Cunningham’s original motivation for the TD metaphor)
  • Wrong or unsustainable design and technology decisions, or the end-of-life of chosen technologies
  • Changes in requirements and business cases that are not addressed in the existing design and code
  • Process-Quality Paradox: processes are designed to bring discipline to development – but at times they can lead to situations that can decrease quality
  • Architecting by Osmosis (see Damian’s blog post)

TD is also not necessarily a consequence of bad decision making or of neglecting the need for action, even when decisions were taken consciously! For instance, a design and technology decision might be highly viable by itself, but if the project team does not have the skills to understand, use, or implement it correctly, debt is likely to occur. Often this is first noticed when the system is already under implementation. As another example, long-living systems regularly experience significant changes in their requirements or even their business case. A design and code base that was sustainable before such changes can be insufficient, and therefore a debt, after them.

Putting the analysis of TD root causes aside for a moment, management of technical debt is a continuous activity performed by architects, on an almost day-to-day basis. This puts the following four requirements on any TD management environment:

TD identification and assessment must be performed to a large extent automatically and daily. As an architect, my job is to guide development each and every day, so I need a daily overview of all concerns of architectural interest, which includes the current state of TD in the system. The blog post by Andriy Shapochka (Technical Debt Evaluation in the context of Architecture Assessment) suggested addressing TD in the context of architecture evaluations. While I am an advocate of regular architecture evaluations, and also support Andriy’s suggestion to make TD an explicit concern in such evaluations, they involve too much ritual for day-to-day support. As such, they are important but complementary guidance for TD identification and management. Fortunately, there are a lot of (open source) tools out there that do the job. They may not be perfect, but they are good enough for practical application (perhaps after some configuration and additional plug-in development).

Identified TD must be put automatically into the working context of development to decide where in the system it is of value to manage it. Knowing what TD is present in the system, and where, is only one side of the coin. The other side is a suggestion of which TD to address. For instance, if there is heavy TD in a module that is not touched at all in the current release but is of sufficient functional and operational quality, I actually do not care. On the other hand, if TD is in a module that experiences many code changes due to feature development, there is a need for action, since the TD may hinder the progress and quality of feature development. Note that this activity is not about TD prioritization, but about general TD management (see Carolyn’s posts). Ideally, the metrics that give me this information are simple and meaningful for an architect. For instance:

  • TD occurs in modules to be touched when realizing the user stories of the next increment / release
  • TD occurs in areas where technology changes are planned
  • Current coding activities in a module are to a large extent refactoring and scaffolding rather than real feature extensions (noticeable through a lot of code move, remove, and change activities)
  • Current coding activities in a module spread out or ripple through to other modules far too often
  • Coding activity increases at architectural sensitivity and trade-off points
  • Getting (unit) tests to green appears to be tricky, requiring multiple iterations of code changes in a module

A lot of techniques can be used for this purpose, mainly based on software analytics. Coding activities are analyzed regarding their nature (e.g., extensions, changes, refactoring) and quantity, and related to architectural, technology, and quality concerns of interest. This covers the concerns of interest of the current increment / release in particular, but also those of the upcoming increments / releases in general. In the end, architects get advanced support to decide where it is of value to actually address TD.
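The indicators listed above can be turned into a simple daily check. The following is only a minimal sketch under stated assumptions: the `ModuleActivity` fields, the two ratio limits, and the sample figures are all hypothetical, and in practice the inputs would come from whatever analysis and version-control tooling a project already uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModuleActivity:
    """Per-module figures for one increment (all fields are hypothetical inputs)."""
    name: str
    debt_items: int       # TD findings reported by an analysis tool
    planned_stories: int  # user stories of the next increment touching the module
    feature_changes: int  # changes that extend functionality
    rework_changes: int   # changes that only move, remove, or refactor code
    ripple_changes: int   # changes that also had to touch other modules


def needs_attention(m: ModuleActivity,
                    rework_ratio_limit: float = 0.5,
                    ripple_ratio_limit: float = 0.3) -> List[str]:
    """Return the reasons why TD in this module seems worth managing now.

    The limits are assumed values, not recommendations.
    """
    if m.debt_items == 0:
        return []  # no known TD, nothing to manage
    reasons = []
    total = m.feature_changes + m.rework_changes
    if m.planned_stories > 0:
        reasons.append("touched by planned user stories")
    if total > 0 and m.rework_changes / total > rework_ratio_limit:
        reasons.append("mostly rework instead of feature extension")
    if total > 0 and m.ripple_changes / total > ripple_ratio_limit:
        reasons.append("changes ripple into other modules")
    return reasons


# Invented sample data for illustration only.
modules = [
    ModuleActivity("billing", debt_items=12, planned_stories=3,
                   feature_changes=4, rework_changes=9, ripple_changes=5),
    ModuleActivity("legacy_export", debt_items=40, planned_stories=0,
                   feature_changes=0, rework_changes=0, ripple_changes=0),
]

for module in modules:
    print(module.name, needs_attention(module))
```

In this sketch the heavily indebted but untouched `legacy_export` module yields no reasons, which mirrors the "I actually do not care" case, while `billing` is flagged on all three checks.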

TD to manage must be prioritized according to concrete increment / release goals. Once it is known in which modules of a system to address TD, it is necessary to prioritize the TD observed in these modules. This prioritization should directly support concrete TD pay-off goals for the current or next increments / releases, for instance the integration of new functionality into the system, or the replacement of an outdated / no-longer-available technology. If, in a given context, certain TD causes rippling effects to other modules, it is likely of higher priority than TD that stays internal to the module. TD in module areas that are hard to get stable, or that are modified to prepare for upcoming functional extensions, is likely of higher priority than TD in module areas that are only moderately changed during an increment or release. Several blog posts already address TD prioritization, so I am sure there is a wealth of techniques available to address the prioritization question.
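To make the ordering rules above concrete, here is a small sketch of a goal-driven ranking. It is an illustration under assumptions, not an established prioritization technique: the `DebtItem` fields, the weights in `priority`, and the sample items are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DebtItem:
    """One TD finding inside a module already selected for TD management."""
    name: str
    blocks_release_goal: bool       # stands in the way of a concrete increment goal
    ripples_to_other_modules: bool  # its effects are not contained in the module
    area_change_rate: float         # assumed share (0..1) of changes hitting this area


def priority(item: DebtItem) -> float:
    """Higher score means address earlier; the weights are assumptions."""
    score = item.area_change_rate
    if item.ripples_to_other_modules:
        score += 1.0
    if item.blocks_release_goal:
        score += 2.0
    return score


def prioritize(items: List[DebtItem]) -> List[DebtItem]:
    """Sort debt items from highest to lowest priority."""
    return sorted(items, key=priority, reverse=True)


# Invented sample data for illustration only.
backlog = [
    DebtItem("duplicated helper code", False, False, 0.1),
    DebtItem("shared global configuration", False, True, 0.6),
    DebtItem("outdated parser library", True, False, 0.2),
]

for item in prioritize(backlog):
    print(item.name, priority(item))
```

The weights deliberately let a release goal dominate rippling effects, and rippling effects dominate plain change frequency; a project would tune or replace them to match its own pay-off goals.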

Measures to address TD must balance effort and value. In my personal experience, technology-oriented people tend towards perfectionism, seeking measures that fully resolve a technical debt. Apart from the fact that I doubt this is possible in an agile world where requirements and technology are in constant flux, it is definitely not economical. TD was originally coined as a metaphor that connects technical with financial concerns. As the software project business is, to a large extent, driven by financial concerns, a purely technological view is not helpful. What if, for example, a perfect resolution of an identified high-priority TD is not achievable in the given timeframe for the next release? Simply starting the work and then pausing it (for a short time) at the release date is not an option.

More important is the ability to deliver a stable, working solution, regardless of how much TD is in it. Being able to deliver on time, on functionality, and on quality may therefore require “compromises” in the TD resolution measures that are unsatisfying from a purely technical perspective. In my IEEE Software article on management of technical debt I discussed situations where, from a project perspective, it was better to pay the interest than to resolve the debt. Most interesting was a project where a “debt conversion” was the best choice. The existing technical debt was not properly resolved, but addressed by a measure that was “cheap” and “doable” in time, yet targeted the symptom and not the root cause of the TD. This was necessary to meet the contractually defined delivery date of the system. From the perspective of TD management, the debt conversion “bought” the project team time to address the TD properly, at the expense of some “interest”, which was visibly lower than the original interest, but still big enough that the TD could not be neglected in future releases.
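The trade-off between paying interest, resolving the debt, and converting it can be sketched as a small cost comparison. Everything in the following example is an assumption made for illustration: the simple linear cost model, the `Option` fields, and all numbers are invented and are not taken from the project described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Option:
    """One way of dealing with a debt item (hypothetical effort figures)."""
    name: str
    upfront_effort: float        # person-days needed before the release date
    interest_per_release: float  # extra person-days paid in every later release


def cost(option: Option, releases: int) -> float:
    """Total effort over a planning horizon, using an assumed linear model."""
    return option.upfront_effort + option.interest_per_release * releases


def best_feasible(options: List[Option], available_effort: float,
                  releases: int) -> Option:
    """Cheapest option whose upfront effort fits before the release date."""
    feasible = [o for o in options if o.upfront_effort <= available_effort]
    return min(feasible, key=lambda o: cost(o, releases))


# Invented sample figures for illustration only.
options = [
    Option("pay interest", upfront_effort=0, interest_per_release=15),
    Option("debt conversion", upfront_effort=10, interest_per_release=4),
    Option("full resolution", upfront_effort=60, interest_per_release=0),
]

# Tight deadline: only 20 person-days are available before the release.
print(best_feasible(options, available_effort=20, releases=1).name)

# Long horizon with enough capacity: the remaining interest adds up.
print(best_feasible(options, available_effort=100, releases=20).name)
```

With these made-up numbers the conversion wins under the tight deadline, because the full resolution does not fit before the release date, while over twenty releases the full resolution becomes the cheapest choice. That is the sense in which a conversion only buys time.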

 

For sustainable TD management, it is also important to understand and address the root causes of TD. As said above, there will always be TD in a system. Part of TD handling is TD management (as above), but equally important are measures to minimize TD:

  • A paper I recommend reading in this context is “Software Process versus Design Quality: Tug of War?”, which discusses the dependencies between process and quality / TD.
  • Other research is in the direction of architecture styles and patterns that minimize TD or the effect of TD on an entire system. For instance, the larger the system, the more likely TD will occur, and the greater the chance that a specific TD spreads across the entire system. Microservices are probably the most widely known architectural, developmental, and organizational approach that has a positive effect on TD. Microservices promise less TD because each Microservice is developed, deployed, and evolved by an independent small “two pizza” team sitting in “ear-shot distance”. Microservices also promise to limit the reach of a specific TD inside a Microservice to at most its borders, since each Microservice is a self-contained deployment and execution unit that is to the largest extent independent of other Microservices.
  • Educational measures can limit the occurrence of TD substantially, i.e., regular technical training for all roles in software development regarding deliberate design and coding practices. It is also the architect’s responsibility to create designs that are appropriate for the system and also comprehensible and realizable by a development team with a given skill set. The technically smartest solution is often not the best solution from a project perspective.

Going into depth on how to address TD root causes is probably worth its own blog entry, so I will leave it at the short, exemplary considerations above.

To conclude, in professional software development both continuous TD management and the consistent identification and addressing of TD root causes are essential to keep TD in a system at an affordable level.