DevOps and Your Definition of Done

Regardless of the agile methodology you are using to drive your software development efforts, you should have an explicit definition of done. I say explicit because you will always have one, whether you define it or not. Regardless of your process, even if you are following a Waterfall-based process, the level of quality you demand (or allow) your software to reach before you ship it is your definition of done.

Explicitly defining what Done means to your organization helps communication and collaboration, and allows you to set the bar for quality, as well as drive your process improvement efforts.

In this post I will provide some guidance on how to use your Definition of Done to drive collaboration efforts between your developers and the operations engineers, in an organization that is trying to adopt a DevOps mindset.


Microsoft’s very own Donovan Brown gave us what I view as the best definition of DevOps. Even if you’ve heard it before, I believe that it bears repeating, in order to drive home the point of this post:

DevOps is the union of people, processes, and product to enable continuous delivery of value to the users

An organization trying to adopt a DevOps mentality should define every single member’s job by that statement.

The Definition of Done

There are many definitions for the Definition of Done (ironic, I know). I’m rather partial to the definition in the official Scrum Guide:

When a Product Backlog item or an Increment is described as “Done”, everyone must understand what “Done” means. Although this may vary significantly per Scrum Team, members must have a shared understanding of what it means for work to be complete, to ensure transparency

The emphasis is my own.

What most development teams end up doing is coming up with a laundry list of quality demands like the following:

  • Code complete
  • Unit test code coverage is <insert some number here>% or higher
  • Automated build
  • QA Tested

Yours may include a few others, be missing some, or have minor variations on the same theme. The result is often disjointed: some items may be nice and lofty aspirations, but not achievable by the organization with the resources and knowledge it currently has.

Done for DevOps

DevOps, among other things, is about collaboration and a shared responsibility for delivering features into production. The product’s Definition of Done should reflect this.

What I prefer to do is to use directed questions to drive the teams’ Definition of Done. These questions are, of course, asked with the aforementioned DevOps definition in mind.

The following are some examples of shared questions that drive this point and (hopefully) start driving the change in mentality required for success as a DevOps organization.

Start a conversation with the delivery team (developers, testers, ops, business unit, i.e. anyone responsible for the delivery of the product) during your next retrospective, post mortem, or whenever you discuss process improvement options, and ask one or more of these questions.

What must we do to ensure continuous delivery of these features?

The answers to this question are as much the responsibility of developers as they are of testers and operations engineers. Deploying in small batches, architecting the solution in a way that makes (automated) deployment easier, using development practices such as feature flags and test automation, setting up infrastructure as code, and using deployment patterns such as blue/green deployment all increase the ease and likelihood of successful deployment.
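The following minimal feature-flag sketch in Python makes one of these practices concrete. It uses a hypothetical in-memory flag store and a hypothetical `new_checkout` flag. Real teams usually read flags from configuration or a feature-management service. The point is that unfinished code can ship dark and be switched on later without a new deployment.

```python
from typing import Dict


class FeatureFlags:
    """Hypothetical in-memory flag store (real systems read flags from config)."""

    def __init__(self, flags: Dict[str, bool]) -> None:
        self._flags = dict(flags)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so unfinished code stays dark.
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        # Flipping a flag changes behavior without redeploying.
        self._flags[name] = enabled


def checkout(cart_total: float, flags: FeatureFlags) -> str:
    # The new code path ships alongside the old one, guarded by the flag.
    if flags.is_enabled("new_checkout"):
        return f"new checkout: {cart_total:.2f}"
    return f"legacy checkout: {cart_total:.2f}"


flags = FeatureFlags({"new_checkout": False})
print(checkout(9.5, flags))  # legacy checkout: 9.50
flags.set("new_checkout", True)
print(checkout(9.5, flags))  # new checkout: 9.50
```

Deploying the code and releasing the feature become separate steps. Developers can therefore deploy small batches continuously, while operations and the business decide when to turn the feature on.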

Note that some of the aforementioned practices are in the hands of the developers, some in the hands of testers, and others in the hands of operations.

Collaboration is key.

What must we do to ensure that features delivered are valuable?

Let’s face it – ask 10 development teams why they are building their current feature, and at least 9 of them will answer “because it’s in the backlog”, “because my manager said so”, or “huh?”

The fact that our stakeholders, sponsors, product owner, project manager, team leader, or CEO requested, or even demanded, a feature may be a good enough reason to do something, but they are not all-knowing. We don’t know in advance that the features will, in fact, be valuable to our business.

For an internal app, we will probably want to know if and how the new change affects time to complete a transaction or error rates. What measures could be put in place to prove that?

For commercial apps, we will probably want to know if and how the new change affects conversions, sales, consumption, etc.

For public sector apps, we might want to know how the new change affects consumption, speed of use, efficiency, backend costs, etc.

The business unit should provide these metrics. Developers must develop the features with hooks to enable measuring these things. Testers must verify that the metrics provide the expected data, and operations must monitor these.
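Such a hook could wrap a business transaction in a decorator that counts calls, failures, and durations. The registry in the sketch below is a hypothetical in-memory stand-in. In practice these values would be sent to whatever monitoring system operations already uses. The `submit_order` transaction is also made up for the example.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Callable, Dict, List

# Hypothetical in-memory metrics registry (stand-in for a real monitoring backend).
durations: Dict[str, List[float]] = defaultdict(list)
errors: Dict[str, int] = defaultdict(int)
calls: Dict[str, int] = defaultdict(int)


def measured(name: str) -> Callable:
    """Record call count, failures, and duration for a business transaction."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            calls[name] += 1
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                errors[name] += 1  # count the failure, then let it propagate
                raise
            finally:
                durations[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


def error_rate(name: str) -> float:
    return errors[name] / calls[name] if calls[name] else 0.0


@measured("submit_order")
def submit_order(quantity: int) -> int:
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return quantity


submit_order(3)
try:
    submit_order(0)
except ValueError:
    pass
print(calls["submit_order"], errors["submit_order"], error_rate("submit_order"))  # 2 1 0.5
```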

What must we do to ensure that features are working properly?

Yes, of course you need to test your system, ideally with automated tests, topped off by some manual exploratory tests by your testers. Unfortunately, that is not enough.

You want to be able to know that your system is working in production. You want to be able to identify problems before they affect your users.

Put the right measures and performance counters in place in production, monitoring not only your disk space, memory consumption, and other hardware concerns, but also how your business scenarios are performing.

Make sure your definition of done asks questions such as “What do I need to measure in order to identify problems before our users are affected?”

Defects will escape your delivery teams’ quality processes. That’s a fact. Make sure they do not repeat themselves by asking yourself something like “What can I monitor to guarantee that this problem doesn’t show up again?”

Don’t just ask these questions; answer them, and implement the hooks that will allow you to monitor these concerns.
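One way to answer these questions is to alert on a business-level signal rather than only on hardware counters. The sketch below is a hypothetical sliding-window rule. It fires when the failure rate among the most recent transactions exceeds a threshold. The window size and threshold are illustrative, not recommendations.

```python
from collections import deque
from typing import Deque, List


class ErrorRateAlert:
    """Hypothetical alert rule: fires when the share of failed transactions
    among the last `window` outcomes exceeds `threshold`."""

    def __init__(self, window: int, threshold: float) -> None:
        self.outcomes: Deque[bool] = deque(maxlen=window)  # True means "failed"
        self.threshold = threshold

    def record(self, failed: bool) -> bool:
        """Record one outcome and return True if the alert should fire."""
        self.outcomes.append(failed)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data yet to judge
        failure_rate = sum(self.outcomes) / len(self.outcomes)
        return failure_rate > self.threshold


alert = ErrorRateAlert(window=10, threshold=0.2)
# Eight successes followed by a run of failures.
fired: List[bool] = [alert.record(failed) for failed in [False] * 8 + [True] * 3]
print(fired.index(True))  # 10: the 11th outcome pushes the rate to 30%
```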


These are but a few conversation starters. Asking them is important. Answering them is crucial. Following up with an implementation will help ensure that your definition of done is in line with your business goals and your organization’s endeavor to adopt a DevOps mindset.

Keep Calm and DevOps!

Posted in Agile, Definition of Done, DevOps, Scrum

How to Maximize the Value of Your Planning Session

By now it is commonly accepted that the old way of developing software is not the best way of doing things. That old way, known as the infamous Waterfall approach, works in silos, relies on a big up-front plan and design, and makes only a single true delivery to customers at the end of the project. Many development teams have fully embraced the agile approach, while others have not (yet) done so.

Partial or early agile transformation attempts can often be characterized by embracing some (but not all) ideas, or certain practices commonly associated with agile development frameworks such as Scrum, without fully embracing their principles and philosophies; these teams often get the “what” part of it right, but have not yet embraced the “why” or the “how”.

In this blog post, I will focus on how to improve the agile planning session, also known in the Scrum framework as the Sprint planning session.

A Quick Overview of Sprint Planning

A sprint is a time-box of one month or less (commonly 2-4 weeks) during which the delivery team will develop, test, and build a potentially releasable product increment.

Sprint Planning, quite simply, is the act of figuring out the work that needs to be done in order to achieve the sprint’s goal, i.e., to create the releasable product increment, adding what the Scrum team has decided is the next most valuable functionality to deliver.

Sprint Planning, like all other Scrum practices, is a collaborative effort. While the product owner is responsible for maximizing the value created by the delivery team, the team is responsible for figuring out how to deliver the work. Both are expected to work together to define each sprint goal, rather than having the product owner tell the team what to do.

Change is Difficult

For many teams, especially those transitioning from a command-and-control style of organization to an agile, self-managing style, embracing the change in management style is much more difficult than embracing a different cadence or set of practices. This often manifests in planning sessions where the product owner describes what needs to be done and the team just listens. It really should be more of a discussion.

The delivery team needs to collaborate and discuss the goal and the plan with the product owner to ensure that they understand what they are being asked to deliver.

The SPIDR Technique

Writing user stories, while not strictly required in Scrum, is one of the most commonly known Scrum practices. The idea is to describe increments of value from the perspective of one or more users of the product, thus ensuring that the product increment is, in fact, valuable to the users.

In order to ensure that user stories are well enough understood, and that they are small enough to fit in a single sprint, the SPIDR technique provides a way to decompose stories into multiple parts, each delivering some observable incremental value to the users.

SPIDR is an acronym defining the most commonly used methods of decomposing a story: Spikes, Paths, Interfaces, Data, Rules. The SPIDR technique was created by Mike Cohn, and fully described in his blog.

New Practice: SPIDR Planning

In my work with delivery teams transitioning from a waterfall-based world to a more agile, Scrum-based paradigm, I have found something that helps teams discuss a plan rather than expect one to be handed down from on high. That something is encouraging a routine: a checklist of questions inspired by the SPIDR decomposition technique.

These are (some of) the questions a team may ask, to ensure their understanding of the value they are asked to deliver in the sprint:

Spikes

A spike describes a short burst of activity, often necessary to experiment with potential solutions for a complex problem. This will often be the last set of questions that the team will ask, after the requirement itself is understood.

Ask your teammates: Do we know how to do this? Are there any APIs or components that we have to use that we have never used before? Is there anything missing in our skill set, toolset, or definition of done that needs to be in place in order to deliver this user story?

Paths

A path describes a specific flow, or a scenario of using the product. Ask and discuss with your product owner, stakeholders, or subject matter experts: What are the conditions under which the users will be interacting with the system? Are there other ways that this functionality may be used? What if something goes wrong? What do we show our users – any and all users? What can go wrong? Each if statement, switch/case, loop, or other control block marks a different path to be considered.
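To illustrate that last point, consider a hypothetical "pay for my order" story. Each branch in the sketch below is a separate path that the team and product owner might discuss, and potentially split into its own smaller story. The payment methods and return values are invented for the example.

```python
from enum import Enum


class Payment(Enum):
    CARD = "card"
    GIFT_CARD = "gift_card"
    INVOICE = "invoice"


def pay(method: Payment, card_valid: bool = True) -> str:
    # Each return below ends a distinct path through the story; each path
    # can be discussed, estimated, and delivered as its own smaller slice.
    if method is Payment.CARD:
        if not card_valid:
            return "declined"  # the "what can go wrong?" path
        return "charged"
    if method is Payment.GIFT_CARD:
        return "redeemed"
    return "invoiced"


for method in Payment:
    print(method.value, "->", pay(method))
print("card (invalid) ->", pay(Payment.CARD, card_valid=False))
```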

Interfaces

There are often many different ways to interact with a product. Gone are the days where users would only have one option – their desktop or terminal. What are all the various interfaces we need to consider? Web? Desktop? Tablets? Smartphones? Wearables? Command line tools for administrators and automation tools? How should the experience differ from one interface to another? Which interfaces are most valuable, and should thus be considered first?

Data

Different inputs beget different outcomes. What data will be provided? What are the data sources? Have we interacted with that data before? (Do we need a spike to figure out how to get that data?) Consider environmental data: does location affect the outcome? The time or date? Random elements? What data is required vs. optional? What if data is missing? What defaults should we use in place of missing data?

Rules

A rule, either business or a technical standard, describes certain operational constraints. What are the business rules? Special cases? Are there any standards? Quotas? Compliance rules? Security standards?


Asking these questions during sprint planning is a great way to start a conversation that both upholds the principles of collaboration and engagement and ensures a greater understanding of the value that the product is expected to deliver to its users.

This list of questions is by no means exhaustive; it is intended to demonstrate the ways in which a delivery team might engage with the product owner and negotiate with her or him.


What do you think? Are there any important questions that I have missed? Please let me know in the comments.

Posted in Agile, Definition of Done, How To, Scrum, Sprint Planning, User Story

The Importance of Limiting Your WIP

In the previous post we discussed what WIP (Work In Process/Progress) is, and how to track it. In this post I want to discuss why WIP limits are so important and how they contribute to improving the team’s effectiveness and throughput.

So There’s This Little Law…

Little’s Law describes the relationship between three quantities: the average throughput of a system, the average process time (a.k.a. Cycle Time) of an item in the system, and the number of items being processed at the same time (your WIP). For the mathematically inclined reader, this relationship is described by the formula L = λW. In other words, the long-term average WIP (L) is equal to the long-term average throughput (λ) multiplied by the average wait/process/cycle time (W).

This theorem is both simple and profound: the formula is intuitively easy to grasp, yet it is unaffected by other factors. As long as the system is stable over the long term (items arrive, on average, no faster than they are completed), it holds regardless of the arrival process distribution, the service distribution, the order of processing, or, for all intents and purposes, anything else!

This law therefore holds true for both simple and complex systems of any nature.

Okay. So this is interesting (humor me here – let’s assume that if you’re reading this post you are finding this interesting) but what does this mean?

Applying Little’s Law to Product Development

If we rearrange the formula, we arrive at a way to determine the process time (in product development terms, the amount of time it takes to develop a feature) by dividing the WIP by the system throughput: W = L/λ. This means that, as long as throughput holds steady, we can reduce the amount of time it will take us to implement features by limiting the number of items we develop concurrently!
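Here is a quick worked example of W = L/λ as a Python sketch. The throughput of 4 items per week is a hypothetical figure. The calculation assumes throughput stays the same as WIP changes; in practice it often improves when WIP drops.

```python
def average_cycle_time(avg_wip: float, throughput_per_week: float) -> float:
    """Little's Law rearranged: W = L / λ (result is in weeks)."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / throughput_per_week


# Hypothetical team finishing 4 items per week on average.
for wip in (12, 8, 4):
    print(f"WIP {wip:>2} -> average cycle time {average_cycle_time(wip, 4):.1f} weeks")
# WIP 12 -> average cycle time 3.0 weeks
# WIP  8 -> average cycle time 2.0 weeks
# WIP  4 -> average cycle time 1.0 weeks
```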

This too should be intuitive. Even if we completely ignore all of the disadvantages of multitasking, context switching, resource allocation, deadlocking, and conflicting priorities, the logic is simple. Suppose we can complete only 80% of the planned work. It is more valuable to have 80% of our items 100% done and delivered than to have started all of our items, each only 80% complete and in no shape to ship. Yes, in this scenario, as a manager, you could find a way to blame your developers for not completing 100% of the planned work, but this was preventable! That’s on you!
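The 80% scenario can be sketched in a few lines. This toy model assumes ten equally sized, hypothetical items, each needing one unit of effort, and a capacity of eight units. It ignores switching costs entirely, which would only make the spread-out approach look worse.

```python
from typing import List


def completed_items(effort_needed: List[float], capacity: float, focused: bool) -> int:
    """Count items fully finished after spending `capacity` units of effort.

    focused=True:  finish items one at a time (low WIP).
    focused=False: spread capacity evenly across all items (everything in progress).
    """
    if not effort_needed:
        return 0
    if focused:
        done = 0
        for effort in effort_needed:
            if capacity < effort:
                break  # not enough effort left to finish the next item
            capacity -= effort
            done += 1
        return done
    share = capacity / len(effort_needed)  # every item gets an equal slice
    return sum(1 for effort in effort_needed if share >= effort)


items = [1.0] * 10  # ten equally sized, hypothetical backlog items
print(completed_items(items, 8.0, focused=True))   # 8
print(completed_items(items, 8.0, focused=False))  # 0
```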

However, there is another subtle application that is implied by the modified theorem. WIP predicts process time!

Predicting Process Time based on WIP

Predicting process time is not new. We just like to call it estimating. Estimation is defined as making an extremely hopeful guess about the amount of time it will take to develop or (and?) deliver something, based on personal or shared past experiences (or more often, an arbitrary demand, dictated by someone in a higher tax bracket than the people doing the work). Okay, so perhaps it is not defined quite like this, but in all honesty it really should be.

Regardless of what technique (if any) you use to come up with your estimates, all of the data that feeds your estimates is based on the past. In other words, these are trailing indicators. This means that the numbers come after the fact: they could be used to explain the previous items’ process time, and we statistically assume (read: hope) that they will hold true for our upcoming work.

By contrast, your WIP is a leading indicator! This means that the WIP predicts (affects) the process time for items entering the system, i.e., items not yet developed.

Let’s look at the following Cumulative Flow Diagram from a VSTS project:


Let’s try to reduce the work in progress. We will do this by focusing our efforts on closing items that are already active instead of beginning work on new items. This will be represented by taking items out of the Active state, and putting them back in the New state, and in return, moving items from the Resolved state into the Closed state. Doing so will result in the following modified cumulative flow diagram:


Note that as our WIP is reduced, so is our process time! We have reduced the time to deliver an item (any item) by doing nothing more than limiting the amount of work the team processes concurrently! And this doesn’t even account for the extra benefits that knowledge work such as software development derives from focusing on a single small unit of work!
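The same relationship can be read off a cumulative flow diagram, where the vertical distance between the "arrived" and "departed" lines is the WIP on that day. The sketch below uses hypothetical daily cumulative counts, not the data from the VSTS project discussed in this post. It counts items that left New as arrivals and items that reached Closed as departures. The cycle-time figure it produces is a rough estimate, only meaningful for a roughly stable system.

```python
from typing import List, Tuple


def cfd_summary(arrived: List[int], departed: List[int]) -> Tuple[float, float, float]:
    """Summarize daily cumulative counts from a cumulative flow diagram.

    arrived[i]:  cumulative items that had entered the process by day i
    departed[i]: cumulative items that had left it (e.g. reached Closed) by day i
    Returns (average WIP, throughput per day, estimated cycle time in days).
    """
    if len(arrived) != len(departed) or len(departed) < 2:
        raise ValueError("need matching series covering at least two days")
    wip = [a - d for a, d in zip(arrived, departed)]  # height of the WIP band
    avg_wip = sum(wip) / len(wip)
    throughput = (departed[-1] - departed[0]) / (len(departed) - 1)
    if throughput == 0:
        raise ValueError("nothing was completed in this period")
    return avg_wip, throughput, avg_wip / throughput  # Little's Law: W = L / λ


departed = [0, 2, 4, 6, 8, 10]  # same finishing rate in both cases
print(cfd_summary([10, 12, 14, 16, 18, 20], departed))  # (10.0, 2.0, 5.0)
print(cfd_summary([4, 6, 8, 10, 12, 14], departed))     # (4.0, 2.0, 2.0)
```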


In this blog post we’ve discussed Little’s Law and its implications for product management. We also demonstrated the importance of WIP as a leading indicator of process time, and showed how reducing the amount of work in process concurrently reduces the time it takes to process items through the system. Armed with this knowledge, I hope you will put it to use; all you need to do is not take on a new job before you are done with (most of) the ones you’ve already begun.

Stay lean because you can(ban),


Posted in Uncategorized

WIP Your Product into Shape

What is WIP?

WIP simply means work in process (also sometimes, work in progress). This metric measures how many items (features, stories, backlog items, tasks) your team has started to develop but has yet to complete; in other words, how many items are currently being developed. This simple metric is extremely important, and a useful number to track and control. In this post we will discuss the reasons for limiting your WIP, how to do so, and how to track your work in process using VSTS.

You’re Doing Too Much

Imagine the following scenario: You are walking down Main Street, carrying a box. Not a problem. The box is small enough that you can easily pick it up and carry it from wherever it was that you got it to wherever you are going. All is fine.

Now imagine, that you are carrying two boxes. Still not a problem. Granted, carrying one box would be easier, but you believe that the discomfort of carrying two boxes is preferable to the discomfort of having to make the trip twice. You can do it.

Now imagine that you are trying to pick up and carry three boxes. Not easy at all, and the weight, strain, and bulkiness of the trio makes you reconsider the wisdom of trying to take so much with you at once. Staggering carefully might end up taking longer than a second trip would…

Now imagine that you are trying to pick up and carry four boxes. Blind, because you cannot see around all of the boxes, you bump into someone else carrying a box, and you both fall, the contents of your boxes spilling, some shattering.

You really should have limited how many boxes you carry at once…

Setting WIP Limits

What is a WIP Limit?

WIP limits are exactly what they sound like – you limit the number of items that you will work on concurrently, not taking on any new work, until the number of items you are developing is less than the limit. In our above story, the poor courier should have set his limit to 3, possibly even 2 in order for his work to flow optimally, or for him to be effective. The courier could easily carry one item, and could increase his throughput when he carried two items, but slowed down when he was carrying three, and literally crashed when trying to manage four. Note that this is purely individual – another courier could possibly manage three or four boxes.

I hope the metaphor is obvious. The courier is you and your development team. Boxes are your backlog items – whatever you’re tracking (stories, features, tasks, etc.). Some teams can work on one item in collaboration, others no more than two. A third team might prefer to work on twice as many tasks as they have members. It is individual, and it depends on the team, the individuals, and of course, the nature of their work.
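The pull rule described above can be expressed as a small sketch. The `BoardColumn` class and its method names are hypothetical. Unlike this sketch, VSTS itself does not block a team from exceeding a limit; it only highlights the overrun.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoardColumn:
    """Hypothetical board column that enforces a WIP limit on pulling new work."""
    name: str
    wip_limit: int
    items: List[str] = field(default_factory=list)

    def can_pull(self) -> bool:
        # New work is only accepted while the column is below its limit.
        return len(self.items) < self.wip_limit

    def pull(self, item: str) -> bool:
        if not self.can_pull():
            return False  # finish something first
        self.items.append(item)
        return True

    def finish(self, item: str) -> None:
        self.items.remove(item)


in_progress = BoardColumn("In Progress", wip_limit=2)
print(in_progress.pull("box 1"), in_progress.pull("box 2"), in_progress.pull("box 3"))
# True True False
in_progress.finish("box 1")
print(in_progress.pull("box 3"))  # True
```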

But how do you know what limit to set?

Picking a WIP Limit

Here’s the simple truth: there are no silver bullets. No one can tell your team what the ideal WIP limit is. You should start with whatever makes sense to your team. Change that number to whatever makes sense to you. Change it when circumstances change. Change it to experiment.

If your team tends to collaborate frequently, start with a low number – three, because why not three… If your team rarely collaborates, start with a higher number, tied to the number of developers you have on your team, e.g. 1 or 2 per team member, plus/minus 1.

Next, start tracking metrics that are important to your team. A few examples are:

  • Throughput – the number of items processed (developed) in a given period of time (day, week, sprint – whichever, just be consistent). You want this number to be high
  • Defect Rate – the number of bugs/defects found (in QA, UAT, production, etc.) in a given period of time. You want this number to be low
  • Deployment Frequency, and deployment time. You want these numbers to be high and low, respectively
  • Lead Time – the amount of time that passes between requesting a change to the system and its delivery to the system’s users. You want this number to be low
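If your tool does not compute these numbers for you, they are straightforward to derive from work item and deployment dates. The sketch below assumes hypothetical `WorkItem` records with a request date and an optional delivery date. Defect rate can be counted the same way as throughput, over bug items.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import List, Optional


@dataclass
class WorkItem:
    requested: date
    delivered: Optional[date] = None  # None while the item is still in progress


def throughput(items: List[WorkItem], start: date, end: date) -> int:
    """Number of items delivered between start and end, inclusive."""
    return sum(
        1 for i in items
        if i.delivered is not None and start <= i.delivered <= end
    )


def average_lead_time_days(items: List[WorkItem]) -> float:
    """Mean days from request to delivery, over delivered items only."""
    lead_times = [(i.delivered - i.requested).days for i in items if i.delivered is not None]
    return float(mean(lead_times)) if lead_times else 0.0


def deployments_per_day(deploy_dates: List[date], start: date, end: date) -> float:
    """Deployment frequency between start and end, inclusive."""
    days = (end - start).days + 1
    return sum(1 for d in deploy_dates if start <= d <= end) / days


items = [
    WorkItem(date(2024, 1, 1), date(2024, 1, 5)),
    WorkItem(date(2024, 1, 2), date(2024, 1, 10)),
    WorkItem(date(2024, 1, 3)),  # still in progress
]
print(throughput(items, date(2024, 1, 1), date(2024, 1, 7)))  # 1
print(average_lead_time_days(items))  # 6.0
print(deployments_per_day([date(2024, 1, 2), date(2024, 1, 4), date(2024, 1, 6)],
                          date(2024, 1, 1), date(2024, 1, 6)))  # 0.5
```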

At a regular cadence, for example, every sprint retrospective, evaluate the numbers. Experiment with lowering your WIP limit. After a period of time, look at your metrics. Did they improve? If so, great! Keep on doing what you’re doing. If not, either adjust (increase) your WIP limit or adjust your practices so that you perform better at the lower limit. Measure again. Rinse and repeat.

Tracking WIP Limits in VSTS

Tracking Your WIP Limit in Your Kanban Board

Limiting your WIP is a practice that should be mostly autonomous, that is to say, at the team’s discretion. As such, you need to set it and visualize it where the team visualizes their work. If you go to VSTS’s boards, you will notice two numbers beside the name of each column (except for the ‘New’ and ‘Done’ columns). The first number represents the number of items in the column, the second represents the WIP limit. In the following board, the ‘Approved’ column has a WIP limit of 5, meaning that the team must approve (or refine) a work item before they can consider a sixth item. This team’s ‘Committed’ WIP limit is set to 6, meaning that they can work on no more than 6 concurrent work items, as a whole team.


If the team takes on a 6th item to be approved, VSTS won’t block them, but the number will turn red to note that the team is doing something wrong:


Tracking Your WIP Limit in the VSTS Team Dashboard

The VSTS boards put this information in the team members’ faces, enough to be noticed, but not so much as to get in the way of progress. This may be enough for the team. Some teams may want more. Some teams may want to have this displayed on the team’s dashboard, where it might be visible to anybody, or always visible. Perhaps the Scrum Master or the team leader may want to know what is going on. Perhaps they want to show the dashboard to middle management in order to prove that reducing WIP is important.

One thing you might do is add a Work Item Query that tracks the work items in a given column (add a where clause like Board Column = Committed), and add a Query Tile to your dashboard that counts the number of items returned by the query.


You can even set the tile with conditional formatting that would, for example, color the tile green if the count is less than the WIP limit, yellow if it’s at the limit, or red if it is above:
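For a custom dashboard or report outside VSTS, the same rule is easy to spell out in code. The helper below is hypothetical. In VSTS itself the colors are configured in the query tile's settings, not written as code.

```python
def tile_color(count: int, wip_limit: int) -> str:
    """Hypothetical helper mirroring the tile's conditional formatting:
    green below the WIP limit, yellow at it, red above it."""
    if count < wip_limit:
        return "green"
    if count == wip_limit:
        return "yellow"
    return "red"


print([tile_color(count, 6) for count in (4, 6, 7)])  # ['green', 'yellow', 'red']
```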


The end result is a query tile that really sticks out and lets whoever needs to know see how well you are limiting your WIP:



In this blog post, we’ve discussed what a WIP limit is, how to set it, and how to track it in VSTS. In the next post, we will discuss why reducing the WIP limit is so important, and how to incorporate WIP limit management into your real, day-to-day, corporate life.

Stay lean,


Posted in Agile, Productivity, Scrum, Self Management, Team, Uncategorized, VSTS

5 Ways to Reduce the Impact of Failure

In the previous post we discussed risk management, how many of us manage it today by attempting to control the likelihood of failure, and why we should instead focus on reducing the impact of failures, as a way to manage the risks involved with software development.

Below are 5 techniques that software development teams can use to reduce the impact of failures in production:

1. Reduce Batch Sizes

Smaller workloads take less time to complete – in each phase, and altogether. Smaller workloads include less functionality, thus fewer potential flaws, each likely to have a smaller impact on the system as a whole. Small batches are easier to deploy, thus easier to revert – or rather it is easier to design a successful rollback plan, meaning that the recovery time will be shorter.

Less functionality in a deployment unit results in a smaller area of impact. Faster deployments with faster rollback options result in a reduced time to recover, thus reducing the impact on the system.

Reducing the batch size exponentially reduces the impact of failure on the system.
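One way to see why batch size matters is to assume each change independently has a small chance of introducing a defect. The chance that a batch contains at least one defect is then 1 - (1 - p)^n, which rises quickly with the batch size n. The sketch below uses a hypothetical 2% defect probability per change. When a larger batch does fail, there are also more changes to inspect and roll back together.

```python
def p_batch_has_defect(changes: int, p_defect: float) -> float:
    """Probability that a batch of `changes` independent changes contains at
    least one defect, if each change is defective with probability `p_defect`."""
    if not 0.0 <= p_defect <= 1.0:
        raise ValueError("p_defect must be a probability")
    return 1 - (1 - p_defect) ** changes


for n in (1, 10, 50):
    print(f"{n:>2} changes -> {p_batch_has_defect(n, 0.02):.2f}")
#  1 changes -> 0.02
# 10 changes -> 0.18
# 50 changes -> 0.64
```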

2. Deploy as Early and Often as Possible

Developers depend on feedback in order to know whether they created the right thing, as well as whether they created the thing right. Building the system every time changes are made is a good start, and running automated unit tests is even better, but some defects may only be detected in production or a production-like environment!

By shortening the development cycle, reducing the amount of time that a developer must wait before finding out if his or her changes were successful, the developer will:

  1. have an easier time correcting any mistakes
  2. have an easier time identifying cause and effect between changes made and defects in production
  3. have an easier time learning from these mistakes and findings

A shorter feedback cycle, therefore, results in faster fixes, thus reducing the impact of failures, while the increased learning has the happy side effect of reducing the likelihood of failure.

3. Shift-Left and Automate Audits and Controls

Risk aversion, or fear of failure, is the primary reason that we have so many audits, checks, controls, and sign-offs for every single change that we introduce into the system. As mentioned in the previous post, these measures are expensive (time-wise) and ineffective at reducing risk; they are introduced too late in the development lifecycle to serve as an efficient method of reducing risk. Worse, most of these controls delay the deployment of changes into production by days, even weeks. That lengthens the feedback cycle, effectively increasing both the impact and the likelihood of failure!

See the irony here? The very measures taken to mitigate risk actually make it worse!

By replacing post-factum audits and governance boards with automated checks that analyze the code, and test it in a production-like environment, and by discussing operational concerns earlier in the planning process, we can introduce these audits earlier in the lifecycle, even as early as while the developer is coding! Automation also helps reduce the cost and the length of the feedback cycle.
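The sketch below illustrates the idea with two deliberately simplistic, hypothetical checks. Real pipelines would instead run linters, static analysis, security scanners, and automated tests on every commit or pull request. What matters is the shape: every change goes through the same automated gate, and the developer learns about problems within minutes rather than weeks.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[str], bool]]


def no_hardcoded_password(source: str) -> bool:
    # Hypothetical static check: flag obvious hard-coded credentials.
    return "password=" not in source.replace(" ", "").lower()


def no_debug_prints(source: str) -> bool:
    # Hypothetical style rule: stray print() calls should not reach production.
    return "print(" not in source


def run_gate(source: str, checks: List[Check]) -> List[str]:
    """Run every automated check and return the names of the ones that failed."""
    return [name for name, check in checks if not check(source)]


CHECKS: List[Check] = [
    ("no hard-coded passwords", no_hardcoded_password),
    ("no debug prints", no_debug_prints),
]

print(run_gate('db.connect(password = "hunter2")', CHECKS))
# ['no hard-coded passwords']
```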

With earlier and faster warnings, fewer quality issues will reach production. Those that do get through these measures are likely to be the smaller and less significant ones. This means that both impact and likelihood will be reduced.

4. Decouple Sub-Systems

Huge monoliths might be easy to develop, and often offer the greatest performance. Unfortunately, they come with inherent disadvantages:

  1. It is difficult or impossible to deliver changes to part of a monolith; monolith deployments are usually an all-or-nothing endeavor.
  2. Tightly coupled, monoliths are often designed in a way that if one part of a workflow fails, the entire workflow crashes.

By decoupling the architecture, separating steps into individual components that communicate with each other asynchronously, using message queues or event-based communication systems, we can:

  1. Deploy each component separately, ensuring that defects introduced into a system component are isolated from other components, reducing the impact of failures to one localized component.
  2. Flag defective messages for service teams to handle, notify the user that completion is delayed, and eventually complete each flow when services are restored, rather than failing the entire workflow when something goes wrong, thus reducing the impact of failure even further.
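The sketch below illustrates the second point, with Python's standard `queue.Queue` standing in for a real message broker. The handler, the `ship` example, and the notification callback are hypothetical. Failed messages are parked in a dead-letter list for a service team to handle or replay, instead of crashing the whole workflow.

```python
import queue
from typing import Callable, Dict, List


def process_messages(
    inbox: queue.Queue,
    handler: Callable[[Dict], None],
    dead_letters: List[Dict],
    notify_user: Callable[[Dict], None],
) -> int:
    """Drain the inbox and return how many messages were handled successfully.

    A failing message is parked for a service team instead of crashing the
    whole workflow; the other messages keep flowing.
    """
    handled = 0
    while True:
        try:
            message = inbox.get_nowait()
        except queue.Empty:
            return handled
        try:
            handler(message)
            handled += 1
        except Exception:
            dead_letters.append(message)  # flag for later handling or replay
            notify_user(message)          # tell the user completion is delayed


def ship(order: Dict) -> None:
    # Hypothetical downstream step that fails on bad data.
    if order.get("address") is None:
        raise ValueError("missing address")


inbox: queue.Queue = queue.Queue()
for order in ({"id": 1, "address": "Main St"}, {"id": 2, "address": None}, {"id": 3, "address": "Elm St"}):
    inbox.put(order)

dead: List[Dict] = []
delayed: List[int] = []
print(process_messages(inbox, ship, dead, lambda m: delayed.append(m["id"])))  # 2
print([m["id"] for m in dead], delayed)  # [2] [2]
```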

As a bonus, decoupled systems are much easier to scale as demand for services increases. A whole class of problems can be completely avoided by architecting loosely coupled systems.

As for performance concerns, make sure that you are developing for good enough performance, rather than best. Remember that Good Enough is by definition – good enough.

5. Continuously Improve the Definition of Done

Whether your development and operations team(s) use Scrum, Kanban, or any other agile methodology or framework to drive the product, the key to successful risk management is to uphold and improve the quality level you demand for anything that you develop and deploy to production.

Following any and all of the aforementioned techniques will greatly reduce the risk to your production pipeline, but never totally eliminate risk.

The most important technique is to make sure that the same issue does not cause a failure twice. Any failure that does get through whatever quality measures you already have in place must be analyzed, and you must figure out how to make sure that this class of problems never goes uncaught again.

By rigorously applying this technique, you will be able to continuously improve your quality controls, ensuring that failures consistently grow smaller until they are no more than a nuisance.


Nobody and nothing is perfect. However, some are closer to perfection than others. If any or all of these ideas are new to you, I would highly recommend that you start with your definition of done, look at the most harmful failures that you have recently had, and identify the measures that are most valuable for you to introduce into your production line!

And then move on to the next one.

Posted in Agile, Change, Delivery, How To, Productivity, Release, Scrum, Software, Team

Risk: You’re Managing it Wrong!

No offense, but despite your best intentions, you might not be handling risk properly. In this day and age, everything is software-dependent. Even if you do not consider yourself a “software firm” per se, even if you are just running a small development team that develops in-house software, your business still depends on said software to run smoothly, and any outages cost money. The bigger the problem, the greater the cost. If you, like many other modern software-based organizations, try to reduce risk by taking every precaution to avoid the occurrence of failures, then I am talking to you. If you are (still) following the waterfall methodology (why would you do that???), then I am definitely talking to you.

In this blog post I will explain what is fundamentally wrong with the waterfall way of addressing risk, why you should resist the temptation to avoid failure, and what you should be doing instead, in order to truly reduce risk that is inherent to delivering software.

What is Wrong with Waterfall?

When following waterfall-based methodologies, software projects get developed in phases. First you gather all of the requirements (system, software). Then you analyze the requirements and come up with a program design to satisfy them. Once it is designed, you implement (or develop) it. Once the development is done, you (hopefully) test the system thoroughly. Finally, you hand it over to operations to deploy and maintain.

So, what is so fundamentally wrong with that, you might ask? This is a very simple and straightforward process. The problem, of course, is that this can only work for the simplest and most straightforward projects. Even Winston Royce said as much in his now-infamous 1970 paper, “Managing the Development of Large Software Systems”.

It boils down to this: software development is complicated. In waterfall, we proceed from one phase to the next only when the former completes successfully. Unfortunately, success is by no means guaranteed, or even likely. Worse, we tend to detect most problems only after we have completed the development phase: during testing, during deployment, or worst of all, only after we have already released the flaws into production. What really makes this difficult is that some of the problems we uncover may have been introduced before development began (during design, analysis, or even requirements gathering). As everyone knows, the cost of fixing a problem grows steeply the later it is found.

So what do we do? How do we mitigate the risk that we might introduce a costly flaw into the system? Intuitively, we attempt to get everything right the first time. We try to think of everything that the system or software might require, create comprehensive design documentation that proves that we thought really hard about the problem, and create lengthy, highly regulated processes and checks that prove that we crossed every ‘T’ and dotted every ‘I’ (and a few lowercase J’s for extra measure).

In other words, we attempt to reduce risk by reducing the likelihood of a problem/incident.

And here our intuition fails us.

Risk Management in Modern Software Projects

Reducing Likelihood of Problems is the Wrong Approach

There are many different ways that a project may fail, too many to count. Missing a requirement, getting a requirement wrong, designing the wrong architecture, designing a system too rigid to change, developing the wrong capabilities, developing a capability incorrectly, deploying incorrectly, not designing for the right scale, insecure code, etc. The list goes on…

So we set up policies, we come up with plans, we have audits, we enforce waiting periods, we have sign-offs. And because releasing new software is so complicated and scary, we do so rarely, often no more than four times per year.

But here’s the problem – we aren’t eliminating risk. We are – at best – reducing the likelihood of something getting through our safety gates. This means that things will get through eventually, because given enough time, anything that can happen, eventually will.

And when failure does happen, the flaw in the system expresses itself in its full glory. In other words, there might be a 1% chance of a bug reaching production, but when it does, it’ll be there 100%!
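Here are some numbers for “given enough time, anything that can happen, eventually will”. Suppose each release independently lets a flaw through with probability p. This is a simplifying assumption, since real releases are not independent. The chance of at least one escape over n releases is then 1 - (1 - p)^n. A minimal Python sketch:

```python
def chance_of_at_least_one_escape(p: float, releases: int) -> float:
    """Probability that at least one flaw reaches production over `releases`
    releases, assuming each release independently lets a flaw through with
    probability `p` (a simplifying assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if releases < 0:
        raise ValueError("releases must be non-negative")
    return 1.0 - (1.0 - p) ** releases


if __name__ == "__main__":
    # A 1% per-release escape rate, at four releases per year.
    for years in (1, 5, 25):
        n = 4 * years
        print(f"{years:>2} years ({n:>3} releases): "
              f"{chance_of_at_least_one_escape(0.01, n):.1%}")
```

Take a 1% escape rate per release and four releases a year. Under these assumptions, that adds up to roughly a 4% chance of an escape in one year, 18% over five years, and 63% over twenty-five years.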

All of our audits, sign-offs, and controls fail to stop us from making mistakes. At best they catch some of the mistakes that we’ve made when it is expensive to fix them; often the mistakes get caught too late – discovered flaws become too hard or too expensive to fix. These defects get shipped anyway, hopefully fixed in a service update. Worse, and all-too-often, these audits, controls and sign-offs do nothing to help identify problems, and are instead in place in order to identify whom to blame for the failures – a useless endeavor, in my opinion.

Worst of all, our lengthy processes delay the feedback that analysts, architects, and developers need, making it impossible to learn from mistakes! A bug found 6 months after it was introduced will do nothing to teach the responsible party how to avoid making the same mistake again. Cause and effect becomes all but lost at this point.

Finally, due to the infrequency of releases, we are not used to dealing with deployment-related issues, and therefore we are surprised and scared every single time we have them.

Manage Risk by Reducing the Impact of a Problem!

What if, rather than attempting to minimize the chance that something goes wrong, we instead try to reduce how badly problems affect us? Ask yourself this: would you pick a 1% chance of suffering a heart attack, or a 100% chance of suffering something 1% as strong, perhaps a flutter or a skipped beat? I’d definitely go with the latter. And in software development, a production failure is not merely more than 1% likely. It tends to happen every quarter, or however frequently you release changes.

Agile Risk Management

Agile frameworks, whether Scrum, Kanban, or any other methodology, are designed around the following notions:

  1. Software is complicated
  2. Complicated things risk failure
  3. Complexity is directly proportional to risk and to impact of failure
  4. Complexity increases with the size of the workload
  5. Therefore, we design processes that reduce complexity, and thus – impact

We should follow methodologies that allow us to reduce the size of our workload. In Kanban, we focus on single-item flow. In Scrum, we iterate through our entire release process in one month or less. High-performing teams deploy to production small increments of functionality even more frequently, often multiple times per day!

In the next post, I will cover the steps that an organization can take, to reduce the impact of the risks involved with developing software.

Posted in Agile, How To, Productivity, Release, Scrum, Software, Team

How to Track Impediments in VSTS

Note: While this post discusses impediments in VSTS, everything mentioned can be applied to TFS as well.

Visual Studio Team Services (or VSTS) has great tools to support Scrum teams. The product owner can use the backlog and board to track the progress of individual teams or the entire product, at the Product Backlog Item (PBI) level, at the Feature level or even the Epic level, throughout the entire lifetime of the product. The developers can track the progress that they are making within the sprint, and see how their work (tasks) fits into the larger picture, by associating it with the PBIs that make up the product.

But what about the Scrum Masters? What tools do they have in VSTS to help them track their work and their progress?

What are Impediments?

According to the official Scrum Guide, one of the services that a Scrum Master provides to the development team is the removal of impediments to the team’s progress. An impediment is anything that prevents a developer from making progress towards the sprint’s goal. Whenever a developer has a problem that cannot be solved within the scope of the team, it is the Scrum Master’s responsibility to remove it.

Impediments in VSTS

Visual Studio Team Services has a work item type dedicated to tracking impediments and progress on their removal. For projects using the Scrum template, this work item type is called an Impediment. For projects using the Agile or CMMI templates, it is called an Issue. Regardless of template, both serve the same purpose: they describe a problem, and their state machine tracks progress on its removal.

Unfortunately, impediments and issues do not show up in VSTS’ backlogs or boards. Those are designed for tracking progress on the delivery of the product, and the Impediment work item type is not included in them. So how should a Scrum Master and the Scrum team track these impediments? This is especially hard in large distributed projects, where face-to-face communication and jotting a note on a pad are not viable solutions.

Step 1 – Gathering Impediments

The first step to tracking impediments in VSTS is, obviously, to enter them into VSTS. The best way to guarantee that impediments do, in fact, get logged is to make logging them quick and easy. I suggest using a dashboard widget, placed prominently in whichever dashboard all the Scrum team members view regularly.

The default dashboard in VSTS (named “Overview”) has a widget titled New Work Item. This widget has a text box for setting the title of the work item, and a drop-down list to select the work item type. You can rearrange the dashboard, if you wish, to make sure that the widget is conveniently located. Otherwise, all you have to do is select the Impediment (or Issue) type, enter the title (e.g. “I need an MSDN Enterprise license”), and click on the Create button:


This will open a new work item form, where you can add a description, or any other detail that you might want, as you would with any other work item:


Finally, just click Save & Close, or press Ctrl+Enter to save and exit the work item.

Step 2 – Create a Query for Tracking Impediments

As a Scrum Master, I will want to keep track of all the impediments in the team. I will want to note new impediments that are not assigned to anyone for removal, and I will want to keep track of those assigned to me (if there is more than one Scrum Master in the project, which may occur in large projects).

In order to set up your impediments query, you will need to go to the Work | Queries submenu.


At this point you have two options. You can customize the existing Open Impediments query (or Open Issues, in the other templates), which you will find in the Shared Queries section, under the Current Iteration folder. Or you can create a new query. If you choose to create a new one, just be sure to save it as a shared query, so that you can use it later with some widgets.

Regardless of whether you’re updating the existing query or creating a new one, make sure you set the following elements:

· Work Item Type should be equal to Impediment (or Issue if in an Agile or CMMI project). This is already set in the existing query

Additionally, you may apply the following variations to the query. You can save these as separate queries, or apply them to the one you are working on:

· You may optionally track only impediments that have not yet been resolved. To do so, you should set State to be equal to Open. This is already set in the Open Impediments query

· You may optionally set a filter to track only the impediments in the current iteration. To do this, set Iteration Path to be equal to @CurrentIteration. This filter already exists in the existing query, but its value defaults to ‘Your-Project\Sprint 1’, so you will want to change it to @CurrentIteration

· You may optionally set a filter to track only unassigned impediments. To do so, set “Assigned To” to be empty (the operator should be set to ‘=’ and the value should be empty)
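Behind the query editor, these filters are stored as a WIQL (Work Item Query Language) statement. The Python sketch below assembles one from the options above. It uses the standard field reference names (System.WorkItemType, System.State, System.IterationPath, System.AssignedTo). Still, treat the exact clauses as an assumption to compare against what your own query editor produces. The sketch also assumes the Scrum template's Open state. Note that @CurrentIteration needs a team context when the query is run outside the web UI.

```python
def impediment_query(work_item_type: str = "Impediment",
                     open_only: bool = True,
                     current_iteration_only: bool = False,
                     unassigned_only: bool = False) -> str:
    """Build a WIQL query string mirroring the query options described above."""
    clauses = [
        "[System.TeamProject] = @project",
        f"[System.WorkItemType] = '{work_item_type}'",
    ]
    if open_only:
        # Assumes the Scrum template, where impediments are either Open or Closed.
        clauses.append("[System.State] = 'Open'")
    if current_iteration_only:
        clauses.append("[System.IterationPath] = @CurrentIteration")
    if unassigned_only:
        # Comparing with an empty string matches items that nobody is assigned to.
        clauses.append("[System.AssignedTo] = ''")
    return ("SELECT [System.Id], [System.Title], [System.State], [System.AssignedTo] "
            "FROM WorkItems WHERE " + " AND ".join(clauses))


print(impediment_query(current_iteration_only=True))
```

For an Agile or CMMI project, pass work_item_type="Issue" and check that template's state names before relying on the open_only filter.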


Step 3 – Visualizing Impediments in VSTS

Having a query that shows the list of open impediments is important, but the information must not only exist and be available, it also needs to be accessible. In Scrum parlance, we call this Information Radiation. In this step, we will make sure that this information is in the face of the Scrum Master.

The Dashboard

Depending on how cluttered your Overview (the default) dashboard is, you may want to create a separate dashboard, just for the Scrum Master. Doing so is extremely easy. Just go to the Dashboards section of the project, and click on the + New button on the far right. Give the dashboard a name, e.g. ‘Scrum Master’, and then click on the OK button:


You will now have an empty dashboard that you can fill with widgets. We will use the space to add widgets that help us track impediments. The ‘Add Widget’ sidebar will open. You can also open it by clicking the edit button and then the + button at the bottom right. You may now add widgets.

Adding Widgets

There’s probably no end to the widgets you may add at this point, but I would like to point out the following, as I find them to be of most value:

Query Tile

This widget simply displays the total number of results for a query. Add this widget, and click on the wrench to configure it, as follows:

· Title: Call it ‘Open Impediments’ or something similar. The query name will be the default title

· Default background color: I suggest setting it to blue, or another color that denotes a calm or good state (blue is better than green for accessibility reasons)

· Conditional formatting: Click on the ‘+Add Rule’, and select the red color, with a condition of the number of items being greater than 0

Query Results

This widget simply displays the results of a query as a list. Add this widget, and click on the wrench to configure it, as follows:

· Title: Open Impediments, for example

· Size: You can leave it at the default of 3×2, or whatever size you like, experiment with it

· Query: Choose the ‘Open Impediments’ query

· Display Columns: Choose the columns that you wish to display. I would make sure to have the Title, State, and ‘Assigned To’ columns.

Chart for Work Items

This widget visualizes work items as a chart, such as a pie chart. You could use it to track impediments over time, which is especially useful with a query that is not filtered by state. For example, you can create a pie chart comparing the number of open and closed impediments. Add this widget and click on the wrench to configure it:

· Title: Open Impediments

· Size: 2×2 is the default, you may change it or leave it, as desired

· Query: Select the open impediments (or a new query for all impediments)

· Chart type: Pie, for example

· Group by: select something to group by, like State, or Iteration Path

· Aggregation: usually Count of Work Items

· Series: Select the color for each group (e.g. red for open, blue for closed)
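Behind the scenes, a pie chart grouped by State with the Count of Work Items aggregation is just a tally per group. The Python sketch below reproduces that tally for a hypothetical list of impediments. The IDs and states are made up for illustration.

```python
from collections import Counter

# Hypothetical query results: (work item ID, State) pairs for all impediments.
impediments = [
    (101, "Open"),
    (102, "Closed"),
    (103, "Open"),
    (104, "Closed"),
    (105, "Closed"),
]

# Equivalent of "Group by: State" with "Aggregation: Count of Work Items".
counts_by_state = Counter(state for _, state in impediments)

for state, count in counts_by_state.most_common():
    print(f"{state}: {count}")
```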

If you’ve followed my examples, your dashboard may look something like this:


If you can set this up on a monitor that is always on display in a team’s room, this can be a very powerful tool for Scrum Masters.

What other queries and widgets would you suggest for the Scrum Master’s dashboard? Let me know in the comments section.


Posted in Agile, How To, Productivity, Scrum, VSTS

How to Set Up Multi-Level Hierarchies in VSTS


Both VSTS (Team Services) and TFS (Team Foundation Server) have a fixed organizational hierarchy. An individual developer belongs to a team. A project (also called a Team Project, or TP) has multiple teams, and a VSTS account, like a TFS collection, contains multiple projects.

While it is possible to run queries in Team Services and TFS that return data from multiple projects, and in TFS, reports that bring data from multiple data sources (i.e. multiple collections), the agile governance tools that VSTS and TFS offer do not aggregate beyond the project level. This means that the hierarchy that VSTS offers has 3 levels: Project, Team, and Developer.

In this post, I will show you how to set up VSTS so that you can create a larger reporting hierarchy, with as many levels as you want.

Teams, Teams, and more Teams!

First, a disclaimer – the following technique, while giving us almost everything that we could need out of multiple hierarchies, cannot create new types of containers or entities. The highest level is still the project, and individual members still belong to teams.

What we can do is set up teams within teams – or at least create the illusion of having done so.

A team in VSTS has the following attributes – it has members (the individual developers), it has its own set of dashboards, its backlogs and boards, and it is assigned an Area under the project.

The way that we specify that a work item is assigned to a certain team, is by specifying that the item’s Area Path is in or under an area assigned to said team.

The trick that we will use to accomplish our goal is to create teams whose areas are under the area of another team. Each “level” will be a different hierarchical level. We will usually assign the product’s highest governance (or steering) team to the project root.

For example, if we want our project to have the following hierarchy:

  • Division
  • Group
  • Team

We will create “Division” teams under the project root, “Group” teams under the divisions, and “Team” teams under the groups, as in the following hierarchy:


Again, the teams themselves are “flat”; there is no team hierarchy. The illusion is created by assigning some teams a default area that is a “parent node” of another team’s area.

In this example, Alpha Group’s area is MyEnterpriseProduct\Blue Division\Alpha Group, and the Apollo team’s area is MyEnterpriseProduct\Blue Division\Alpha Group\Alpha Team, which is beneath it. Neither team has any other attribute that marks one as higher than the other, hence the “illusion”.

But Will it Blend?

So we have successfully created a list of teams, some assigned to areas above others. How do we make sure that the illusion is kept when dealing with boards, backlogs and dashboards?

The trick is to set all but the “leaf” teams (the teams lowest in the hierarchy) to include sub areas, i.e. each team owns its own area and those beneath:


Setting the teams like this gives groups a supervisory view of teams, divisions of groups, and the “steering committee” can oversee all of the work being done in the project.

This means that the steering committee’s boards, backlogs, and dashboards will track all of the work being done in the project, while Alpha Group will oversee only the work done by its teams. Each of the “leaf” teams will see the work that has been assigned to them:

The Steering Committee’s backlog

Alpha Group’s backlog

Ares Team’s backlog

This filtering is preserved for the Kanban and Scrum boards as well, and each division, group, and team can have their own set of dashboards to highlight whatever they want to see and use to drive their decision making!
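The “in or under” rule, combined with the include-sub-areas setting, can be modelled in a few lines. The Python sketch below is an illustrative approximation, not how VSTS implements it. It treats area paths as backslash-separated, case-insensitive strings and checks whether a work item would show up for a team. The paths are the ones from the Alpha Group example.

```python
def area_parts(path: str) -> list[str]:
    """Split a backslash-separated area path into its nodes (case-insensitive)."""
    return [part.lower() for part in path.strip("\\").split("\\")]


def appears_on_backlog(item_area: str, team_area: str, include_sub_areas: bool) -> bool:
    """Approximate whether a work item shows up for a team, based on area paths."""
    item, team = area_parts(item_area), area_parts(team_area)
    if item == team:
        return True
    # The item is "under" the team's area when the team's path is a strict prefix.
    return include_sub_areas and len(item) > len(team) and item[: len(team)] == team


group_area = r"MyEnterpriseProduct\Blue Division\Alpha Group"
team_area = r"MyEnterpriseProduct\Blue Division\Alpha Group\Alpha Team"

print(appears_on_backlog(team_area, group_area, include_sub_areas=True))   # group sees the team's work
print(appears_on_backlog(team_area, group_area, include_sub_areas=False))  # ...only if sub-areas are included
print(appears_on_backlog(group_area, team_area, include_sub_areas=True))   # a team never sees its parent's items
```

Comparing whole path nodes, rather than raw string prefixes, keeps an area such as “Alpha Groupie” from being mistaken for a child of “Alpha Group”.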


Create an Area tree that matches the organizational hierarchy, and assign teams to their proper nodes. This way, you can make VSTS teams form a hierarchy as deep as the organization needs it to be!

I hope you find this useful. If there are any questions, please feel free to ping me in the comments!

– Assaf

Posted in Uncategorized

Cross-Account Package Management for NuGet in VSTS

In the following post, we will look at the difficulties in consuming packages from a different VSTS account, as part of a build process in our own account.


Visual Studio Team Services (VSTS) offers package management services. As per the documentation, this is “an extension that makes it easy to discover, install and publish packages”. Team Services enables teams to collaborate with other teams through versioned, cohesive libraries and APIs, in the same way that many 3rd party vendors offer their libraries and frameworks today. In fact, VSTS Package Management exposes the most common package management services that developers use today – NuGet for .NET developers, NPM for NodeJS developers, and now in public preview, Java packages can be served as well.

Access to Packages

Unlike public feeds, your teams’ package feeds are designed to grant access only to those who should have access to the packages. When you create a new feed, VSTS asks you to decide who has permission to contribute (i.e. publish to the feed), and who has permission to read (i.e. download and consume packages from the feed):


You have two choices regarding the contribution of packages to the feed. You can specify that any of the team members (i.e. those who are members of the hosting team project) can add packages, and/or the build service account (i.e. the service account used to run builds in VSTS). Note that you will want to specify the latter if you use the build service to publish the packages (you should). You normally wouldn’t need to add permissions for the team members as well.

You also have two choices for controlling who can consume the feed. You can either limit access so that only members of the project can consume the feed’s packages, or you can allow everyone in the account.

This works fine for most organizations. Not for all, however.

Multi-Account Organizations

Some organizations own multiple VSTS accounts. For some, this simply grew organically: various teams tried out the services, and the organization stayed this way because there is no easy migration path for a project from one account to another. Others intentionally decided to keep multiple accounts, one for each internal organization (different divisions, for example). Regardless of the reasons, components published by one account may end up needing to be consumed by components developed in another account.

Unfortunately, neither of these options helps here: VSTS does not allow you to grant feed access to members of another account.

How Can We Consume Packages from Another Account’s Feed?

The problem that we are trying to solve is the consumption of packages from Account A, in a build definition that builds an application under Account B. What we will need to do is to overcome the limitations that are set by VSTS’s package management services, and somehow access Account A, with a set of credentials that it will accept.


Step 1 – Access to the Publishing Account

The first thing you need to do, is to gain access to the account. You need credentials that allow you to view packages. This will be given to you by administrators of that account. You will be given a username, and a Personal Access Token, or PAT (see documentation about creating and using PATs). Your PAT will (ideally) be configured to grant you read access to Packaging for the required account.

Step 2 – Add a Custom NuGet Configuration File

Next, in order to customize how the build system restores NuGet packages, you are going to need to add a custom NuGet configuration file. You can set it up with the defaults, configuring whatever you might need for your own circumstances, like this:
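A minimal configuration of this kind clears any inherited package sources and then lists nuget.org. As a sketch, the Python below writes such a file. The file name Custom.nuget.config matches the one used in Step 4. Listing only nuget.org is an assumption about what your solution needs.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_nuget_config(path: Path) -> None:
    """Write a minimal NuGet configuration file with nuget.org as the only source."""
    configuration = ET.Element("configuration")
    sources = ET.SubElement(configuration, "packageSources")
    # <clear /> discards sources inherited from machine- or user-level configs,
    # so restores only use the sources listed in this file (plus any added later).
    ET.SubElement(sources, "clear")
    ET.SubElement(sources, "add", key="nuget.org",
                  value="https://api.nuget.org/v3/index.json", protocolVersion="3")
    ET.ElementTree(configuration).write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    target = Path("Custom.nuget.config")
    write_nuget_config(target)
    print(target.read_text(encoding="utf-8"))
```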


Step 3 – Add Your Credentials to the Build Definition

Next, you will need to add the credentials for the package manager’s account to the build definition. You will store these as variables. Be sure to mark the PAT variable as secret (the lock icon next to its value), so that it is encrypted; you do not want any passwords to be saved in plaintext!


Step 4 – Add a Task to the Build Definition to Access the Package Feed

In the next step, you will need to create a custom source for accessing the other account’s feed. You will need a Command Line build task, and you will be running against the nuget.exe CLI tool. The simplest way to do this is to add nuget.exe to your source control. I prefer to put it under the solution, in a subfolder named Tools.

You will need to call the sources Add command, and specify the name of the feed, its URL, and the username and password/PAT. You will also specify which NuGet configuration file you are going to modify with this setting.

Note that you will want to set the name of the feed to be the same one that you specified in Visual Studio, when building your application.

Here is an example for the Arguments that you must set for the build service to recognize the other account’s feed:

sources Add -name OtherDivision -source -user $(NugetUsername) -pass $(NugetPassword) -ConfigFile $(Build.SourcesDirectory)\Custom.nuget.config
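You may prefer to script this step rather than type the arguments into a Command Line task. The Python sketch below builds the same sources Add command with nuget.exe’s full option names (-Name, -Source, -Username, -Password, -ConfigFile). The Tools/nuget.exe location follows the suggestion above. The feed URL and user name in the test values are hypothetical placeholders; the real values come from the publishing account.

```python
import subprocess
from pathlib import Path


def nuget_sources_add_args(nuget_exe: Path, name: str, source_url: str,
                           username: str, pat: str, config_file: Path) -> list[str]:
    """Build the nuget.exe command line that registers an authenticated feed
    in a specific NuGet configuration file."""
    return [
        str(nuget_exe), "sources", "Add",
        "-Name", name,
        "-Source", source_url,
        "-Username", username,
        "-Password", pat,  # the PAT: pass it in from a secret variable, never hard-code it
        "-ConfigFile", str(config_file),
    ]


def add_feed(**kwargs) -> None:
    # check=True makes the build step fail if nuget.exe reports an error.
    subprocess.run(nuget_sources_add_args(**kwargs), check=True)
```

The username and PAT should come from the build variables defined in Step 3 rather than being written into the script.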

Step 5 – Configure NuGet to Restore with Custom Configuration

Finally, you need to modify the NuGet restore command, so that it uses your custom configuration file:
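From the command line, this corresponds to nuget.exe’s restore command with the -ConfigFile option. Here is a Python sketch under the same assumptions as before. The solution path is a hypothetical example.

```python
import subprocess
from pathlib import Path


def nuget_restore_args(nuget_exe: Path, solution: Path, config_file: Path) -> list[str]:
    """Build a nuget.exe restore command that reads package sources (including
    the authenticated feed added earlier) from a custom configuration file."""
    return [str(nuget_exe), "restore", str(solution), "-ConfigFile", str(config_file)]


def restore(nuget_exe: Path, solution: Path, config_file: Path) -> None:
    # check=True fails the build if any package cannot be restored.
    subprocess.run(nuget_restore_args(nuget_exe, solution, config_file), check=True)
```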


Your build definition is now able to restore NuGet packages that are published by another Account!


In this blog post, you have learned how to configure NuGet in a VSTS build definition so that it can consume packages that are published by a different VSTS account.

I hope you find this useful.

Happy coding,

Posted in Package Management, VSTS

Scaling Scrum with Nexus in VSTS

In this post, I will cover what Scrum Nexus is, where, when, and why you would want to use it, and how best to set up VSTS to accommodate your Scrum practices. VSTS’s tooling is not yet perfect for some of Nexus’s practices, so I will also discuss some viable fallbacks and workarounds.

Background – Why Scale?

The official Scrum Guide defines Scrum as “A process framework used to manage complex product development”. Scrum’s events, artifacts, and rules revolve around a Scrum team. The team consists of a Scrum Master, a product owner, and 3-9 development team members (a.k.a. developers).

This limit on team size is important. For Scrum to succeed, the team must consist of developers who can cover all the work required for the product to be delivered at the requisite quality level. With fewer than 3 developers, the team is likely to be missing some of the skills required to deliver the product. With more than 9, coordinating the team takes too great an effort, and the process becomes unmanageable.

Therefore, when delivering a large product, as is often the case in enterprise-level projects, the organization must scale beyond the single Scrum team. A framework is required to manage and coordinate the work of multiple Scrum teams.

Enter Nexus.

What is Nexus?

Nexus is defined as “an exoskeleton that extends Scrum to guide multiple Scrum teams on how they need to work together to deliver working software every Sprint”. There are other systems out there for scaling Scrum. They range from a simple (and somewhat naïve) “Scrum of Scrums”, where Scrum Masters get together to coordinate the teams’ interdependencies, to full-blown complex frameworks such as SAFe. I tend to prefer working with Nexus, as it is a simple extension of Scrum. It builds upon the knowledge that teams have from working with Scrum. It applies the same processes, artifacts, and roles at a larger scale, introducing minor tweaks rather than new complex mechanics.

Nexus revolves around the notion that teams minimize risk by minimizing interdependencies. A special integration team (called the Nexus Integration Team, or NIT) is formed. The NIT is responsible for uncovering and managing whatever dependencies exist between the teams’ work items, either by eliminating them or by carefully controlling their impact. The NIT is also responsible for guiding the Scrum teams towards continuously integrating their work, to reduce the risk that comes from large integrations at the end of a sprint or a release.

The Nexus Integration Team members include cross-functional developers, such as the product’s DBA, build master, architects, and technical writer. They also include anyone who may be of greater use to the organization as a coordinator of the integrated product than as a member of a single Scrum team. The NIT has one Scrum Master. Depending on the teams, this may be the one Scrum Master for the entire product, or just the one for the NIT. In any case, the product must have one, and only one, product owner.

Setting up VSTS for Nexus

The Team Project

First things first: there should be one and only one team project for the entire product. You want to be able to view the entire backlog, and to measure, query, and track progress for the entire product. You also want to be able to view charts and reports that aggregate data for the entire product. If you separate the product so that each team has its own Team Project, you will not be able to do so. The work tracking and dashboard capabilities of VSTS are limited in scope to a single Team Project. You cannot share a backlog, or visualize a Kanban board, across multiple team projects.

This choice does have some limitations – the entire product, including all of its teams will have to follow the same process template, though if you have already decided to follow Scrum and Nexus, this should not be an issue.

Areas and Iterations

One Set of Iterations for All Teams

According to the Scrum guide, if you have multiple teams working on the same product, they should be delivering together on the same cadence. In VSTS, this means that there should be a single iteration cadence that all teams follow. All teams start together and end together because the integrated product increment is delivered at the end of every sprint.
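In other words, sprint N has the same start and end dates for every team. The sketch below generates such a shared schedule in Python. The start date and the two-week length are assumptions for illustration, since Scrum only requires sprints of one month or less.

```python
from datetime import date, timedelta


def sprint_schedule(first_day: date, sprint_length_days: int, count: int):
    """Return (name, start, end) tuples for consecutive sprints shared by all teams."""
    if sprint_length_days < 1 or count < 0:
        raise ValueError("sprint length must be positive and count non-negative")
    schedule = []
    for i in range(count):
        start = first_day + timedelta(days=i * sprint_length_days)
        end = start + timedelta(days=sprint_length_days - 1)  # inclusive last day
        schedule.append((f"Sprint {i + 1}", start, end))
    return schedule


# Every team in the Nexus is assigned these same iterations.
for name, start, end in sprint_schedule(date(2017, 1, 2), 14, 3):
    print(name, start.isoformat(), end.isoformat())
```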

The Nexus Integration Team is the Default Team

In VSTS, the default project team has the project root as its area. Each of the other teams gets an area under the project root:


The NIT Includes all Sub-Areas

In Nexus, a unique event is added to all the other events that normally occur in Scrum. This event is called the Nexus Sprint Planning session. In this meeting, the product owner and the NIT review the upcoming work and coordinate which PBIs will be addressed in which sprint, and by which team. In addition, the Nexus Integration Team is responsible for coordinating the integration of all teams’ work, on a daily basis.

In order to accommodate these needs, the NIT’s backlogs and boards will be configured to view not only the work items (Epics, features, and/or PBIs) that are in the NIT’s area (the root area for the project), but also the work in each of the areas beneath it, i.e. work that has been associated with the Scrum teams.

This way, the integration team can visualize all of the PBIs on their board, and act whenever a PBI is ready for integration. For example, a PBI might be moved to a column entitled “Integrate”, which you will want to create.

Setting Up the Nexus Sprint Backlog

As previously mentioned, one of the NIT’s primary responsibilities is to coordinate work among the Scrum teams in order to mitigate interdependency risks. A commonly used tool for this is the Nexus Sprint Backlog. It views the PBIs in a two-dimensional grid, in which the NIT looks at each team’s sprint backlog for the upcoming sprint and the next few sprints. The NIT then identifies and marks each dependency, and notes how risky it is:

· When interdependent PBIs are handled by different teams (e.g. 1 & 5, below), the risk is greater than when both PBIs are handled by the same team (e.g. 1 & 4) because coordination is more complex.

· When PBIs are not only handled by different teams but are also expected to be completed in the same sprint (e.g. 4 & 5), the urgency requires greater coordination, and the risk is even higher.

· When there is a dependency on a PBI by someone outside the project, the risk is greater because there is likely to be an important commitment (e.g. 8). This risk goes up even further, if the PBI is needed by the end of this sprint (e.g. 9)!
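The ranking in these bullets can be captured in a small scoring function. The Python sketch below is one possible encoding, under a few assumptions. The Dependency type is a hypothetical model; VSTS has no such object. The scores are arbitrary, and only their order matters. The sketch also treats an external dependency as riskier than a cross-team one in a later sprint, which is an interpretation of the bullets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dependency:
    """A hypothetical model of one dependency between two backlog items."""
    provider_team: str            # team delivering the PBI that is depended upon
    consumer_team: Optional[str]  # team of the dependent PBI; None if outside the project
    provider_sprint: int          # sprint in which the provider PBI is planned
    consumer_sprint: int          # sprint in which the consumer needs it


def dependency_risk(dep: Dependency) -> int:
    """Rank a dependency from 1 (lowest risk) to 4 (highest), following the bullets above."""
    same_sprint = dep.provider_sprint == dep.consumer_sprint
    if dep.consumer_team is None:
        # Someone outside the project depends on this PBI: likely an external commitment.
        return 4 if same_sprint else 3
    if dep.consumer_team == dep.provider_team:
        return 1  # a single team coordinates internally
    return 3 if same_sprint else 2  # cross-team; riskier when both land in the same sprint
```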


While there is no Nexus Sprint Backlog in VSTS, as of the writing of this post, Microsoft’s Delivery Plans extension goes a long way towards giving us just this visualization. It is available in the marketplace so you can install it and use it to visualize your work across teams and sprints:


Visualizing Dependencies in VSTS

The delivery plans extension does not have the ability to visualize and track dependencies between backlog items. Yet. The product group has mentioned that this capability is on the roadmap, but I cannot confirm when this will be delivered.

In the meantime, we need to come up with a way to mitigate this drawback.

Work Item Relationship Links

One thing that you can do is mark a work item to be the predecessor of another. To do this, simply open the PBI, and under Related Work, click Add link to an existing work item. Select the work item that this work item depends upon:


While this will set the work items’ interdependencies, this will not show up in the plan. At least not as of today.
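Setting the link by hand works fine for a few items. To set many of them, you can send a JSON Patch document to the work item REST API, optionally setting a tag at the same time. The Python sketch below builds one. It assumes the Predecessor link type's reference name, System.LinkTypes.Dependency-Reverse, and the semicolon-separated System.Tags field. The account URL is a hypothetical placeholder. The body would be sent in a PATCH request for the dependent work item with the application/json-patch+json content type. Check which API version your account supports.

```python
import json
from typing import List, Optional


def predecessor_patch(collection_url: str, predecessor_id: int,
                      tags: Optional[List[str]] = None) -> list:
    """Build a JSON Patch document that links a work item to its predecessor
    and, optionally, sets its tags."""
    operations = [{
        "op": "add",
        "path": "/relations/-",  # append to the work item's list of links
        "value": {
            # Reference name of the Predecessor link type.
            "rel": "System.LinkTypes.Dependency-Reverse",
            "url": f"{collection_url.rstrip('/')}/_apis/wit/workItems/{predecessor_id}",
        },
    }]
    if tags:
        # System.Tags holds all tags in one semicolon-separated string,
        # so this operation replaces any tags the item already has.
        operations.append({"op": "add", "path": "/fields/System.Tags",
                           "value": "; ".join(tags)})
    return operations


print(json.dumps(predecessor_patch("https://fabrikam.visualstudio.com", 42,
                                   ["cross team dependent"]), indent=2))
```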


While this requires some manual work, I would suggest adding tags marking an item as “cross team dependent” or “external dependency”. Tags do show up in the plan and can augment it to give the NIT an at-a-glance idea of the risk involved with this project. Note the tags marked by the arrows in the screenshot below:



We have seen how teams wishing to use Scrum and Nexus to drive their development efforts can set up VSTS to visualize and track their work. The following is a checklist for everything that you need to do:

1. All teams including the integration team work in the same Team Project

2. The Integration Team is the default team

3. The integration team’s area is the project root (e.g. \MyProject)

4. The Scrum teams’ areas are set under the integration team’s area (e.g. \MyProject\ScrumTeam1)

5. The integration team’s area should include all sub-areas

6. Add a board column entitled “Integrate” to mark PBIs that are ready to be integrated with the entire product

7. Install the Delivery Plans extension if it isn’t already installed for your account

8. Set up a delivery plan to include the Scrum teams and their sprints

9. Use the Predecessor link type to denote a dependency

10. Mark the kind of dependency with a tag, so that it can be seen in the plan.

I hope you find this useful. If you have found other tips and ideas to help drive Nexus, please share them with us!


Posted in Agile, Scrum, Team, User Story, VSTS