Measuring and Managing Technical and Social Debt at once

Ipek is probably going to kill me 🙂 since she may still be skeptical about this social debt thingie, but in my qualitative research experience in industry I have observed, at least four times (i.e., in four different companies), an intense correlation (and, more often than not, causality) between social and technical debt. The force that Philippe, Hans and I called social debt a while ago is the result of accumulated sub-optimal socio-technical decisions (for example, choosing an interaction protocol based on wikis instead of emails) and often results in circumstances that force technical debt onto would-be perfect code-bases. A trivial example I observed lately manifested when the addition of a new outsourced partner forced architecture changes. In this sea of madness, a key research question I am struggling to address now is:

What are the factors at play around the relation between social and technical debt? How can these be managed?

In my previous work, I stumbled more than once upon this intense and often unpredictable relation, which resulted, for example, in a series of organizational and socio-technical patterns, such as the social debt conceptual model we developed as the result of an industrial case study:


Also, there were several patterns we observed focusing on software architecture… These seemed to suggest ways in which architectures are themselves a pivot for driving the discovery and often the management of both social and technical debt… most notably, the “Architecture by Osmosis” social debt pattern from [2]:


We witnessed this very pattern iterated several times at a large industrial player extremely active in the aerospace software market… Essentially, disgruntled clients would call up operators, who would accommodate part of the required changes and alert developers, who would accommodate another part of the changes and alert architects, who would then change the architecture and its connected decisions with little of the information that inevitably gets lost along the client → operator → developer → architect communication chain. In the same paper, we offered a tentative metrics framework that combines several organisational and social measurements to understand the possible impact of social debt connected to miscommunication… this leads to a key challenge I will be trying to address in the near and not-so-near future, that is:

A selected subset of different metrics and patterns for technical debt should be used or correlated to address and further investigate the interactions between technical debt and its social / organizational counterparts.

[1] D. A. Tamburri, P. Kruchten, P. Lago, and H. van Vliet, “Social debt in software engineering: insights from industry,” Journal of Internet Services and Applications, 6:10, 2015. DOI: 10.1186/s13174-015-0024-6

[2] D. A. Tamburri and E. Di Nitto, “When Software Architecture Leads to Social Debt,” WICSA 2015, Montreal, QC, Canada, May 4–8, 2015. ISBN: 978-1-4799-1922-2

A Tale of Three Stories on Technical Debt Estimation

1. TD is omnipresent but industry needs estimates (Industrial Report)
The report has been compiled based on feedback from a Senior Software Architect.
Domain: Online marketing solutions (providing manufacturers insights for their products on retailer sites).
Company profile: Startup company with offices in 3 different EE countries, serving large enterprises on three continents. 50 employees, with a development team of 30 people spread across two countries.
Brief software description: A single platform comprised of numerous components, written in C# and .NET and deployed on a cloud computing platform, with a central DB to which all components have access. The platform mainly comprises backend services used to import products from a manufacturer into the database and then match these products to supported retailers. Crawlers deployed in virtual machines are used to find retailers.
Symptoms of design problems – classified according to mapping study by Li et al. (2015):
1. Deployment of crawlers in virtual machines (Infrastructure TD) makes it almost impossible to integrate them with other subsystems: they do not expose an API through which other components can invoke their methods.
2. Direct access to the database (Architectural TD). Each component communicates directly with the database. This makes the data fragile, as invalid data can be introduced from any point. Moreover, any change in the database schema may break the dependent components.
3. Lack of tests (Test TD). There are no automated tests (unit/integration tests) for individual components. As a result, any change in the imported data affects the crawlers, and there is no way to validate that new implementations will not alter existing functionality.

Do the managers and developers perceive the problems as a form of TD?
Yes, they do. There is no commonly agreed definition of TD for the platform; however, everyone realizes how difficult it is to make changes to the code and how fragile it is. All developers understand that the platform needs some kind of refactoring.

Are managers and others willing to deal with technical debt?
Not surprisingly, the main concern of managers is how fast a new feature will reach production. Nevertheless, management clearly sees that ‘quick and dirty’ implementation without proper design and planning makes bugs show up at increasing rates. Clients’ disappointment, which is often clearly communicated to the company, is also a direct consequence. When symptoms of technical debt are discussed, management seems to understand them perfectly; however, they are somewhat unwilling to make extensive changes to the platform, claiming that they do not have an accurate estimate of the required effort.

Is TD getting bigger over time?
Definitely yes. Supporting new retailers requires the addition of new crawlers. In the existing design, the higher the number of retailers (i.e., the bigger the business), the harder it is to make changes to the product-matching services (all crawlers must change).

Is it difficult to determine the effort to deal with technical debt?
It is a kind of vicious circle: the lack of boundaries between the different components of the platform renders any effort to identify and measure technical debt almost impossible. However, without convincingly accurate estimates, any plans for drastic changes, such as proper modularization, are abandoned.

What is the main reason for the accumulation of TD?
The platform started as a trial product with a poor initial design. Continuous pressure to incorporate new features, and a marketing approach claiming an “instant development time” model, led to the accumulation of TD in all components. Moreover, a lack of processes, especially during the initial stages, resulted in incompletely specified requirements, poor architectural documentation and inefficient testing.

2. Are TD estimates accurate? (Academic Experience)
Context: Two CRUD web applications have been developed with state-of-the-art technology (Java-based enterprise applications, one with the Spring Web MVC and one with the Apache Struts 2 framework). The systems evolved over successive ‘versions’ by gradually adding features. Each system was relatively small (~36 classes), and development time was approximately 25 man-days per application.
Goal: a) To measure the TD resulting from following the strict programming practices imposed by the employed frameworks on typical Web applications. b) To investigate whether TD increases as versions pass.
Means: TD was measured in both applications by the SonarQube platform according to the SQALE methodology. The types of suggested refactorings and the actual time to resolve the reported issues were recorded.
Spring-based system – TD: 1d 4h (only Major, Minor, Info issues) -> Actual time to resolve: less than 2 hours
Struts-based system – TD: 1d 6h (only Major, Minor, Info issues) -> Actual time to resolve: less than 2 hours

• Framework-based development does not lead to blocker/critical issues (limited TD)
• No tremendous improvement achieved by repaying TD
• Required effort was significantly lower than estimates

3. Estimating Technical Debt (our vision)
We believe that assessing software quality against universal thresholds is risky and flawed: for example, complexity that is prohibitively high in one domain (e.g., information management systems) might be completely reasonable in another (e.g., image processing).
TD should be assessed realistically, that is, against the potentially achievable levels of TD for each system under study. Let us assume any model where TD is assessed by a function that considers various parameters of interest. Such a quantifiable measure can serve as a fitness function to drive an optimization approach (Harman & Clark, 2004), that is, a process of obtaining the design/system which optimizes the selected fitness function.
Having the fitness function value for the ‘actual’ and the ‘optimum’ system, one can determine the distance between the two. This single figure quantifies the principal of the TD. Under certain conditions, one can determine the actions (refactoring effort) required to move the actual system to the optimum position, i.e., to cover all or part of the distance.
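As a minimal sketch of this distance idea (the metric, weights, candidate designs and all numbers below are invented for illustration, not a validated model):

```python
# Toy sketch of TD-as-distance; metric names, weights and figures are
# illustrative assumptions only.

def fitness(design):
    """Lower is better: a weighted mix of coupling and complexity."""
    return 0.6 * design["coupling"] + 0.4 * design["complexity"]

# The system as it currently is.
actual = {"coupling": 14.0, "complexity": 22.0}

# Candidate refactored designs, e.g., produced by a search-based
# optimization step; here simply enumerated for brevity.
candidates = [
    {"coupling": 9.0, "complexity": 20.0},
    {"coupling": 7.0, "complexity": 18.0},
    {"coupling": 8.0, "complexity": 25.0},
]

# The 'optimum' is the candidate minimizing the fitness function.
optimum = min(candidates, key=fitness)

# TD principal quantified as the distance between actual and optimum.
td_principal = fitness(actual) - fitness(optimum)
print(f"TD principal (distance to optimum): {td_principal:.1f}")
```

In a realistic setting the enumeration would be replaced by a search-based optimization over the design space, but the distance computation stays the same.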


Benefits from this perspective (measuring TD as distance):
• TD is realistic: it is assessed against the potentially achievable quality for any system depending on its characteristics
• TD can be quantified by automated and thus consistent means
• TD principal can be mapped to actual refactoring activities required to cover the distance
• Based on historical data, one can reason about the benefit of addressing TD and find out how much maintenance effort is saved
• It becomes possible to assess not only negative but also positive contribution to TD, i.e. actions that reduce TD

Harman, M., & Clark, J. (2004). Metrics are fitness functions too. In 10th International Symposium on Software Metrics. Proceedings, pp. 58–69.
Li, Z., Avgeriou, P., & Liang, P. (2015). A systematic mapping study on technical debt and its management. Journal of Systems and Software, vol. 101, March 2015, pp. 193-220.


How to measure architectural technical debt?

My company (ABB) conducted a survey among hundreds of developers, architects and product managers, asking about the sources of technical debt in their products. Many participants pointed to “poor architecture choices” as an important source of technical debt, claiming that they have no repeatable process to deal with such issues. The presence of poor architecture choices first needs to be established in an objective manner. Then the costs and potential benefits of addressing each debt item need to be associated with it. Afterwards, a prioritization for repaying the architectural debt items can be made.

Research Question
How to measure architectural technical debt reliably, objectively, and in a repeatable manner?

Own Work
As architectural documentation is often not in good enough condition to be analyzed, the source code and the architects and developers themselves are usually the best information sources for measuring architectural debt. Existing source code analysis tools (e.g., SonarQube, NDepend) include only a few architectural metrics (Survey of Metrics) and mainly focus on design-level issues. We have extended and applied such tools to large-scale ABB software systems with varying results (IEEE Software 2013).
We have also worked on formal structural architecture models (IEEE Software 2011), as well as on formal architecture decision models, as a prerequisite for automated technical debt analysis (WICSA 2014, WICSA 2015, available as Open Source at CodePlex and GitHub).
When interviewing architects about technical debt, ATAM workshops have proven to be a useful tool to overcome the lack of communication that often leads to technical debt.

Related work
Architecture metrics for source code proposed in the literature (e.g., Sakar2007, Bouwers2009) often require some subjective parameters as input; therefore, the metric values may not be comparable across products. Architecture decisions that are not, or only indirectly, captured in source code, e.g., the choice of a third-party library or architectural constraints to assure conceptual integrity, are difficult to capture and analyze for architectural debt. Some case studies on architectural debt have been carried out (Bouwers2013, Martini2015).

Future work
More and better architecture metrics need to be established and validated for source code, design documents and requirements. Community benchmarks based on open source systems could help improve existing metrics.
To analyze the technical debt arising from poor architecture decisions, there is a need to capture decisions more formally without overwhelming the users. Technical debt analysis tools for such models, incorporating software economic models, would be the next step.


Hi all, first of all have a great 2016!

Key challenge

There is a need for a Technical Debt prioritization mechanism that allows items to be compared among themselves for decision making.
I have been working in contact with several practitioners who struggle to decide what to refactor and when. The challenge is having fixed resources and hours, or having to subtract such hours (and justify the subtraction) from feature development. Usually companies have more TD than it is possible to fix, and deciding becomes extremely difficult, especially for big TD items such as architectural ones.

Our recent work

We have written a paper that has recently been accepted at the ICSE SEIP track 2016 (A. Martini, J. Bosch, “An Empirically Developed Method to Aid Decisions on Architectural Technical Debt Refactoring: AnaConDebt”).
We analyzed 12 cases of Architectural TD in 6 companies and developed and evaluated a decision-making approach based on the principal/interest ratio. Such a ratio allows the comparison of TD items among themselves and with respect to different points in time (which tells the practitioners whether it is convenient to refactor now or later). We found that this approach, after several improvement steps, was very much appreciated by the practitioners, who are actually introducing it in practice. However, we also found that both factors (principal and interest) need to be broken down into several components (and usually there are many involved). First, it is important to find all the components constituting principal and interest (and a checklist is very much appreciated by the practitioners). We also found that it is difficult to anticipate all the interest (especially the long-term one) at the beginning, and therefore it is better to implement an iterative process that takes into consideration the different evolutions of TD using a roadmap (which is subject to variation). The study also shows that, if present, the interest on the principal makes a TD item less convenient to pay off over time, and this is crucial information. We have developed some indicators that can help practitioners make the decision. However, we also found that metrics need to be combined with qualitative information in order to provide an acceptable overview for decision-making, and this step is not obvious.
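As a rough sketch of how a principal/interest ratio can support such comparisons (this is not the actual AnaConDebt method; item names, units and all figures below are invented):

```python
# Hypothetical sketch: ranking Architectural TD items by the ratio of
# the interest accumulated over a horizon to the repayment principal.
# Item names and all cost figures (person-days) are invented.

def interest_over(item, months):
    """Interest paid if the item stays unrepaid for `months`."""
    return item["monthly_interest"] * months

def ratio(item, months):
    """Interest/principal ratio: above 1, repaying pays off in the horizon."""
    return interest_over(item, months) / item["principal"]

items = [
    {"name": "split god component", "principal": 40, "monthly_interest": 6},
    {"name": "introduce API layer", "principal": 25, "monthly_interest": 2},
]

# Rank items for a 12-month roadmap horizon: higher ratio first.
ranked = sorted(items, key=lambda i: ratio(i, 12), reverse=True)
for item in ranked:
    print(item["name"], round(ratio(item, 12), 2))
```

Evaluating the ratio at several horizons (3, 12, 24 months) is what allows the "refactor now or later" comparison described above.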

Other work

The main idea of prioritizing technical debt using a cost/benefit indicator was introduced in a short paper by Guo and Seaman published at the Technical Debt workshop, but I have not found much follow-up work providing a comprehensive indicator to continuously evaluate a large TD item. It seems that we researchers have been focusing on single measures rather than on an overall decision-making process, which is much needed.

Future work

With the recent study we have taken a first step. In the cases we used some measurements, but the availability and units of the metrics depend on the specific company. However, the high-level components constituting principal and interest seem to be quite generic (apart from a few specific ones). Going from basic measurements to an indicator for decisions is not obvious and needs to be studied further, especially when a number of measures and pieces of qualitative information related to many factors are involved in calculating principal and interest.


Sustainability Debts

The recent manifesto for sustainable design has identified a number of dimensions that can directly or indirectly influence the sustainability and longevity of software. Roughly speaking, these are the individual, social, economic, environmental and technical dimensions.

It can be argued that sustainable software shall deliver and sustain its value across these dimensions – while the system is in operation and as it evolves. The presence of debt in any of these dimensions is a threat to sustainability and may call for phasing out the software.

Research Questions:
How does the concept of debt and interest on the debt relate to the sustainability dimensions?
How can we predict, quantify and visualize the debt across these dimensions?
How can we manage the debt across these dimensions: negotiate and reconcile the conflicting objectives?

Closely Related Effort:
B. Ojameruaye and R. Bahsoon (2015). Sustainability Debt: An Economics-Driven Approach for Using Technical Debt Analysis in Decision Making for Sustainable Requirements. CSR-15-03 Technical Report, School of Computer Science, University of Birmingham, UK (under submission – ICSE Software Engineering for Society).


Technical Debt Through Analogy: Call for “Twin Assets”, Benchmarks, Artifacts and Repositories

Research Questions:

How can we predict the likely technical debt before we build the software? Can estimation through analogy, which is respectable in science, help here? How can the concept of “twin assets” in finance serve the above objectives? How can we identify these twins? The twins can relate to various concerns – how can we decide on these concerns, manage and reconcile the relative debts?

As a community, we may need to define benchmarks and call for artifacts and repositories that can assist the estimation of debt through analogy. Such an infrastructure can be especially useful in estimating debt related to non-functional properties, which are difficult to predict and visualize even in existing software.

An Example – Closely related work:

Though the work does not mention “technical debt” in its exposition, the treatment of debt is implicit. Using a performance benchmark repository, we estimated the value of architecture decisions under uncertainty, relative to scalability scenarios of interest:

R. Bahsoon and W. Emmerich: An Economics-Driven Approach for Valuing Scalability in Distributed Architectures, WICSA 2008

Drawing on a case study that adequately represents a medium-size component-based distributed architecture, we show how existing performance repositories can be mined to value the ranges in which a given software architecture can scale to support likely changes in load. The mining is based on a financial analogy, where we utilize the concept of the twin asset in financial engineering to justify mining relevant repositories. The mining process is then complemented with real options analysis for predicting the values resulting from the ranges in which an architecture can scale under uncertainty, where uncertainty is attributed to unpredicted changes in load. As the exact method for analyzing scalability is subject to debate, we focus the analysis on throughput as a way of measuring scalability. Using options analysis, we report on how the ranges in which an architecture can scale can inform the selection of distributed components technology and, subsequently, the selection of application server products.

I have recently tailored the above analysis to the benefit of cloud service selection. Work published in MTD 2013, UCC 2013 and Australian Software Engineering 2014.

A process for managing Architecture Technical Debt

In our previous work we proposed a process for Architecture Technical Debt Management (ATDM).
Figure 2

The details of each ATDM activity, as well as its input and output in the architecting process, are described below.
1. ATD identification detects ATD items during or after the architecting process. An ATD item is incurred by an architecture decision, thus, one can investigate an architecture decision and its rationale to identify an ATD item by considering whether the maintainability or evolvability of the software architecture is compromised.
2. ATD measurement analyzes and estimates the cost and benefit associated with an ATD item, including the prediction of change scenarios influencing this ATD item for interest measurement. For interest measurement, three types of change scenarios are considered: (1) the planned new features according to the version plan of the software project; (2) the already-known maintenance tasks that enhance specific QAs (except maintainability and evolvability) of the implemented software architecture; and (3) the emerging requirements. The first two types of change scenarios can be predicted, while the last one is unforeseeable. For some complex software systems (e.g., operating systems), the time interval between two releases can be very long. For instance, Microsoft Windows 7 Service Pack 1 was released 16 months after the first release of Windows 7. For such software systems, it is inevitable that new requirements emerge during the development of a new release, and some of them need to be implemented in that release. Thus, in such cases, to ensure a reasonable accuracy of interest measurement, the interest of related ATD items should be re-measured at different times during the development of the release.
3. ATD prioritization sorts all the identified ATD items in a software system using a number of criteria. The aim of this activity is to identify which ATD items should be resolved first and which ones can be resolved later, depending on the system’s business goals and preferences. A software system contains a number of ATD items, and not all of them can be resolved at once due to their cost or technical issues. The ATD items have different financial and technical impacts on the system; consequently, it is wise to choose the items with higher priorities to be resolved first. Software projects have different contexts, and there are no standard criteria to decide the priority of an ATD item in a project. However, the following factors need to be taken into account in ATD prioritization: (1) the total cost of resolving an ATD item; (2) the cost/benefit ratio of the ATD item; (3) the interest rate of the ATD item; (4) how long ago the ATD item was incurred; (5) the complexity of resolving the ATD item (e.g., the number of components involved). Since not all types of benefits can be measured with a unified metric, it is hard to prioritize ATD items automatically by tooling. However, an appropriate tool that reasonably deals with the factors described above can facilitate ATD prioritization.
4. ATD repayment concerns making new, or changing existing, architecture decisions in order to eliminate or mitigate the negative influences of an ATD item. An ATD item is not necessarily resolved all at once. In certain situations, only part of an ATD item is resolved, because it could be too expensive to resolve the entire item, while resolving part of it can bring the item under control at an acceptable cost. When an ATD item is partially resolved, it is revised and split into two parts: the part that is resolved and the part that is not.
5. ATD monitoring watches the changes in the cost and benefit of unresolved ATD items over time. When an architectural change happens in the part of the architecture design containing an unresolved ATD item, or when an ATD item is partially resolved, the affected ATD item is recognized as a changed ATD item. All the changed ATD items are re-measured in the next ATDM iteration. This ATDM activity makes ATD changes explicit and consequently keeps all the ATD items of the system under control.
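A hedged sketch of how the five prioritization factors above could be combined into a single score (the weights and item data are invented assumptions; a real project would calibrate them against its own business goals and mix in the qualitative information the text mentions):

```python
# Illustrative ATD prioritization: a weighted score over the five
# factors listed above. Weights and item figures are assumptions.

WEIGHTS = {
    "cost": -0.2,               # higher resolution cost lowers priority
    "benefit_cost_ratio": 0.4,  # more benefit per unit cost raises it
    "interest_rate": 0.3,       # fast-growing debt should come first
    "age_months": 0.05,         # long-lived debt has accumulated interest
    "complexity": -0.05,        # e.g., number of involved components
}

def priority(item):
    """Higher score means the item should be resolved earlier."""
    return sum(WEIGHTS[k] * item[k] for k in WEIGHTS)

items = [
    {"name": "layering violation", "cost": 10, "benefit_cost_ratio": 3.0,
     "interest_rate": 0.3, "age_months": 18, "complexity": 4},
    {"name": "outdated middleware", "cost": 30, "benefit_cost_ratio": 1.5,
     "interest_rate": 0.1, "age_months": 6, "complexity": 12},
]

for item in sorted(items, key=priority, reverse=True):
    print(item["name"], round(priority(item), 2))
```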


  • Z. Li, P. Liang, P. Avgeriou, Architectural Debt Management in Value-oriented Architecting, in I. Mistrik, R. Bahsoon, Y. Zhang, K. Sullivan, R. Kazman (eds.), Economics-Driven Software Architecture, Elsevier, 2014, Pages 183-204.

SONAR Rules and Technical Debt

This is a short narrative compared to my first one on Evolutionary Architecture, but I think it can have a big impact on how practitioner teams work these days. Many teams use SONAR (or related tools) to measure Technical Debt. These tools analyze your code base and produce an estimate: your current debt is 2,345.94 USD. They also support you with a pre-configured set of rules to determine what is important to fix and what is not. SONAR, for example, categorizes debt items as info, minor, major, critical and blocker. Missing documentation for a method is “major”, as is a magic number or a duplicated String value. If you iterate over a Map in a very inefficient way, it is, surprisingly, “critical”. But to be fair, possible NullPointerExceptions are also “critical”. These issues are usually easy to fix.
To my surprise, classes with 1,000 LoC and/or 15 class-level dependencies are OK, and even worse: if you have a class with 3,000 LoC and 30 class-level dependencies, you only have a major problem.
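A toy illustration of the point (the issue data and impact figures are invented, and this is not SonarQube's actual model): counting issues by severity treats trivia and god classes alike, while weighting by remediation effort and change risk does not.

```python
# Invented issue data: three "major" findings that are anything but equal.
issues = [
    {"kind": "missing javadoc", "severity": "major",
     "fix_hours": 0.1, "change_risk": 0},
    {"kind": "magic number", "severity": "major",
     "fix_hours": 0.1, "change_risk": 0},
    {"kind": "god class, 3000 LoC", "severity": "major",
     "fix_hours": 80, "change_risk": 9},
]

# By count, all three look equally "major"...
major_count = len([i for i in issues if i["severity"] == "major"])

# ...but weighting by effort and risk tells a different story.
def impact(issue):
    return issue["fix_hours"] * (1 + issue["change_risk"])

worst = max(issues, key=impact)
print(major_count, "major issues; dominant one:", worst["kind"])
```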

What do teams do if they have 250 SONAR violations? Right, they try to get rid of most of them as quickly as possible, usually by choosing the easy ones first. In the end, you have only 3 major issues left, unfortunately 2 really complex god classes with many dependencies and one 500 LoC method with a very high cyclomatic complexity. All is good! Only 3 major problems, we can lean back and relax! Our code is awesome and we don’t have much Technical Debt.

What are you saying, Sven? We only fixed the unimportant stuff? We should rather have fixed the 3 remaining issues, because they are really slowing us down, they cause a lot of bugs, and it is really hard to make changes in these classes and methods? No, no, we disagree, because a) the SONAR guys are the real experts in this, they do nothing but debt analysis all day, and when they say something is major or critical, they are probably right and you are not; b) it takes a lot of time to fix these 3 issues, if it is even possible: this code is really complex and really risky to break apart! The way we did it even makes management happy, because we spent only a couple of hours to reduce our Technical Debt to basically 400 USD, and SONAR looks really good right now!

Maybe most of my colleagues are right about that. But I would be really happy if we could analyze the current state of the practice during the workshop and, if necessary, come up with a better proposal for how tool vendors should categorize Technical Debt. Every team is of course free to configure its own importance of rules, but a solid and often-cited publication would help to drive discussions beyond “but SONAR says so”.

Evolutionary Architecture and Technical Debt

Evolutionary Architecture (EA) is an approach developed by Neal Ford and Rebecca Parsons to help teams that need to deliver software early and often. This includes basically all teams developing enterprise software. Although it is a great idea that makes managing Technical Debt a core principle, it has not been widely picked up by industry, which I think is worth exploring.

I offer two reasons – besides early market entry – why teams accumulate structural/architectural debt, reasons which to my surprise are rarely described:
(1) Most teams use Scrum or, nowadays, Lean Enterprise / Lean Startup ideas to create software-based products. In all of these approaches you have to deliver early and often; you don’t have much time for up-front architectural work, because business stakeholders expect to see the first functionality after 2–3 weeks, and more in the weeks that follow. One experience which drives this approach is that many beautifully engineered software products have been built that nobody wanted to use – a waste of money. What is the point of a perfectly architected system nobody wants? New approaches focus first on finding out what users want with as little investment as possible (e.g., an MVP – minimum viable product – or A/B testing). This deliberately causes structural/architectural debt (and other kinds of debt too, but these are the expensive ones to really care about).
(2) Delivering visible functionality early and continuously has an additional benefit: it builds trust between the development team and the business stakeholders. Since software is invisible, it is really hard to explain why you create all these diagrams and documents and have all these meetings and discussions. The first time you are a business stakeholder, you understand that upfront investment in architecture doesn’t pay off immediately but saves you a lot of effort and cost in the medium and long run. But after a few projects you realize that architecture is also a hypothesis about the future: things might end up differently. For example, you may give the development team enough time, but if the future turns out differently than anticipated, developers might not be able to build new functionality due to the time needed to revert wrong decisions made at the beginning. If, instead, you get features early and regularly, in my experience you are much more open and relaxed about spending money on architectural concerns later, without fighting for every penny. The developers delivered what you wanted first, and they showed that they understand that meeting business needs early on is usually more important than meeting technological needs. Bad architecture, however, is really costly (e.g., slow development, more and more bugs, stability problems) and quickly becomes a concern for the whole organization, one that needs to be addressed immediately.

The EA practice helps to incrementally build and improve the architecture while uncovering and addressing architectural issues during software development, without investing in significant up-front design (see also the Eclipse practice page). This means that you only implement what you need right now to deliver your product successfully, knowing that things will change over time, and you address problems as they arise. Of course it is not that simple, because you need to discover, document, track and evaluate possible problems constantly.
One of its 5 core principles is described as “The last responsible moment – delay decisions as long as you can, but no longer” and this is all about managing Technical Debt (the other 4 are: architect and develop for evolvability, Postel’s law, architect for testability, and Conway’s law).

However, finding the last responsible moment is a difficult task. Usually this moment passes by unnoticed, and by the time you recognize it, it is really too late: fixing the problem has become very expensive and risky. Ford and Parsons recommend addressing architectural risks by constantly evaluating the current state of your system against “fitness functions” (a.k.a. quality attributes, non-functional requirements). How to recognize the last responsible moment is so far poorly described.
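A minimal sketch of what such an automated fitness function could look like in practice (the thresholds and the metric snapshot are invented; a real setup would extract the metrics from the code base or import them from a static-analysis tool, and run the check on every build):

```python
# Hypothetical architectural fitness function: flag the build once a
# structural quality attribute degrades past a threshold.

THRESHOLDS = {"max_class_loc": 500, "max_class_deps": 10}

def check_fitness(classes):
    """Return the names of classes violating the structural thresholds."""
    return [
        c["name"] for c in classes
        if c["loc"] > THRESHOLDS["max_class_loc"]
        or c["deps"] > THRESHOLDS["max_class_deps"]
    ]

# Invented metric snapshot of the current code base.
snapshot = [
    {"name": "OrderService", "loc": 320, "deps": 7},
    {"name": "ReportGenerator", "loc": 1450, "deps": 23},
]

violations = check_fitness(snapshot)
print("fitness violations:", violations)
```

The first build on which such a check turns red is arguably an automated signal that a "last responsible moment" is approaching.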

Except for a few articles and presentations, there is no documentation, guidance, templates or case studies to help you get started with EA. During the workshop I would like to figure out whether EA is a valid practice to tackle the problem of structural/architectural debt introduced by the pressure of faster market entry, delivering early and often, and building trust. And if so, what do we need to describe and provide so that practitioners and the community can get started, experiment, apply and improve this practice?