10 Reasons to Avoid DevOps

Unless you are either very new or have no connection to the software delivery profession and industry (in which case I’d be curious to hear how you found this blog post), you have probably heard of “DevOps”. DevOps is “the union of people, processes, and products, to enable continuous delivery of value to our end users.” What this actually means for developers, delivery teams, and organizations is often not very clear. But everyone says you have to do it, so it has got to be good, right?

Below are 10 reasons that implementing a DevOps transformation might not be a good idea for you:

1. It’s okay; you know what you’re doing

It says it right there in the formal definition of DevOps: DevOps is about enabling delivery of value to our end users. That might be great for some teams, but you don’t need that kind of distraction. You know exactly what you need to build – the project plan is very clear about it. Besides, your end users will use whatever you build for them. They are users in the end; that’s what they do: use.

2. With DevOps you must continuously deliver

Again, it says it right there in the definition: “…enable continuous delivery of value…” You have enough trouble as it is with your semi-annual releases; imagine how tough it would be to have to do that multiple times a day! Sure, if you release every few weeks, let alone every day, your releases would be simpler and smaller, but don’t they always say that size doesn’t matter, and that it is the thought that counts? Well, thinking about continuous delivery is scary, and that’s that.

3. DevOps destabilizes corporate structure

From a practical standpoint, DevOps is about having software developers collaborate with operations engineers, and that just won’t do. Your organization is set up in silos; you even have developers and ops folks in different buildings, just to make sure that they don’t start giving ideas to each other. Give them an inch, and they will start conspiring together – operations engineers will make requests for making the product more operable, and developers might get operations to help them deliver more frequently, and you just established how scary that is!

4. Automation upsets the workforce

A core aspect of DevOps is the extensive use of automation to build, deploy, and test the software. If you allow development teams to set up automated build pipelines, what are the build masters going to do? If you shift all of your tests from manual to automated unit tests what are you going to do with your QA engineers? If you automate your deployments, let alone the creation of test and production environments, what are your operations engineers going to do?!?

Sure, automated tests cannot replace manual tests completely, and there will always be important exploratory tests for QA engineers to execute. Of course your operations engineers have a lot of work to do, just keeping the production site up and running, but you cannot take the chance, can you?

5. DevOps makes you deliver faster

One of the outcomes of adopting DevOps practices is that teams can deliver software to production faster than before, or in business terms: faster time to market. Not so fast! Haste makes waste, after all, and you don’t want to create waste, do you? Besides, if you somehow manage to deliver faster than the competition, you might be forced to innovate, rather than follow other organizations down a well-known path. And anyway, you might not even be a consumer-facing organization, and there might not be any competition. Your users will just have to accept that this is how you roll.

It’s not like if your leadership decides that you are not delivering fast enough, they might decide to outsource your work to a software firm that can move faster. Right..?

6. Constant feedback can be confusing

The early and continuous feedback that is a big part of DevOps deliveries can be overwhelming. Whether it is from stakeholders at a sprint review, or input from your product’s telemetry, that your developers enabled the operations engineers to gather and analyze, the feedback might surprise you as to what your users are doing with the software you delivered, and might indicate that you should invest more time further developing features that you thought were fine as they are, or worse, might suggest that you cease efforts in developing a feature that you have planned to develop for another three or four weeks. Either way, this feedback might disagree with the plan that you spent months putting together before a single line of code was written. That can be very upsetting! What should you do in that case? Just ignore the plan?!? Not plan so much in the first place?!? What???

7. Production telemetry helps detect defects

As if it is not bad enough that operations engineers use production telemetry to try to influence what the product does by suggesting changes to the plan, they also use it to detect defects! This completely violates the DRY principle (Don’t Repeat Yourself)! You already have testers and a QA team that are supposed to find all of the defects in the product. If telemetry uncovers any problems that you didn’t already know about, it could make you look bad, right?

Sure, if you have build, test, and deployment automation in place, you could possibly fix the problem before the users begin to complain, but you already decided that automation is not a good fit for your organization.

8. Autonomous engineering is scary

DevOps teams take responsibility for the entire software delivery process, from development to production. DevOps teams operate with a high level of autonomy, and that is scary! How can you know what the developers are doing if they are not going through all of the bureaucratic tape that your organization put in place to inhibit excessive productivity and effectiveness?

Of course, modern tooling can ensure that engineers are not doing things that they are not supposed to do, but you cannot trust computer systems to consistently apply your policies. Besides, if the teams can simply do whatever is needed to deliver software effectively, without your say so, what will your job become?!?

9. DevOps promotes engagement

DevOps teams are known to be happier, more engaged, and better motivated than their less autonomous counterparts, and we cannot have that. You pay your engineering teams to deliver features, not to be happy!

10. DevOps teams are more effective

DevOps practices enable teams to be more efficient and reduce costs associated with organizational waste… Hmm… perhaps that means that you can lay off some people and do more with less..?

Grade yourself…

Count how many of these arguments seem like a good reason to avoid implementing a DevOps transformation in your organization. If you scored more than zero, you might want to think about the value proposition – enabling continuous delivery of value to your end users. If customer satisfaction is important to your business, it might be worth the effort.

If you think otherwise, say so in the comments – let’s have a chat!

Summary

Despite the sarcastic tone of this post, every single one of these points describes one of the virtues of DevOps, and how adopting its principles and practices can help your organization. I hope you see the value.

Cheers,

Assaf

Posted in Uncategorized | Leave a comment

Sprint Zero Considered Harmful!

Many development teams, especially those who are new to Scrum, or those trying to bring some level of agility to an otherwise conservative waterfall organization, struggle with their first attempts at adopting a truly agile approach. This is particularly true for teams getting started on a new green-field project. For these teams, one of the greatest hurdles to success is the imperative to deliver “something” at the end of just one sprint. The most common solution that these teams arrive at is having what is often called “sprint zero”. Unfortunately, this is a bad choice that hinders the team’s agile transformation; worse, its harmful effects materialize much later in the project.

What is Sprint Zero?

“Sprint Zero” is the name commonly given to a period of time that comes before a development team’s first “real” sprint. This period of time is often longer than the “real” sprints that will follow. Its primary characteristic, however, is that it breaks one of Scrum’s core principles. In sprint zero, a team will do a lot of preparatory work, including research, experiments, design, architectural, and infrastructural work, but ultimately deliver nothing that the customers would consider “valuable”. The Scrum guide defines a sprint as “a time-box of one month or less during which a "Done", useable, and potentially releasable product Increment is created”. In sprint zero, no such increment is created.

What is the Purpose of a Sprint?

The Scrum guide’s definition of a sprint has two parts to it, both equally important:

1. A time-box of one month or less

2. Create a “done”, usable, and potentially releasable product increment

At its root, like all other Scrum practices, the sprint’s purpose is to provide actionable feedback to the organization: making sure that you are building the right thing, and making sure that you are building the thing right.

The product increment provides something for the stakeholders to evaluate so that they can provide feedback: Is this what we asked for? Is this what we actually need? Is it performing adequately? Bottom line – do we “pivot” or “persevere”?

The time-box allows for the feedback to be actionable and timely, enabling the development team to react to the feedback before it is too late, i.e. the project’s deadline has arrived.

In short, the purpose of the sprint is to make sure that teams are progressing in a timely manner, and in the right direction.

So What is Wrong with Sprint Zero?

By now, the primary flaw of sprint zero should be fairly obvious: without the constraint of having to produce anything of demonstrable value, the development team has no indication as to whether or not what they are doing is the right thing, or whether or not it works as designed!

Without creating any product increment, we cannot know if the architecture we’ve spent time to design will support the features we need to build, whether it will scale, or whether it will actually inhibit our developers to the point that they will feel that they need to circumvent the architecture that was put in place.

Without a time-box, it will take longer to find out that the design is wrong, and if enough time was wasted – yes, wasted – in sprint zero, the efforts spent will have raised the cost of change to the point where undoing the damage will be too expensive, fixing the early decisions will be avoided, and the rest of the project will be spent finding ways to work around them, to the detriment of all involved.

And yet, the idea of a “Sprint Zero” has a more subtle flaw, one that is not intuitively obvious to anyone who read the guide, one that is indicative of a greater flaw of the entire organization, one that will have ripple effects into everything the organization does, as they attempt to enact an agile transformation:

Sprint Zero Robs You of a Learning Experience!

Many teams new to Scrum, or attempting an agile transformation, and even more than a few consultants, will suggest that a team – particularly one that is new to Scrum – ease into agile delivery by allowing itself, “just this once”, not to follow agile principles. Why do they suggest this? Because developing something valuable from scratch, in a single sprint or less, is difficult. They might fail.

And this is the flaw.

Yes! You might fail to deliver something of value in a sprint. So what?!? With sprint zero, you are guaranteed to fail to deliver something in a single sprint, because you are not even trying!

One of the core agile principles is to “fail fast!” This means that if you are going to fail – and you are – it is better to fail sooner rather than later. Doing so means that your failure will be smaller because you’ve had less time to affect your organization, and it means you have more time to fix it because there is more remaining time until your project’s target date.

So yes, you are likely to fail if you try to deliver something of value by the end of the first sprint, especially when starting on a green-field project. But why did you fail? Was it a lack of resources? Incomplete understanding? Insufficient automation? Lack of a certain skill-set? Licenses? Environments? Something else?

Knowing why you failed to deliver something – anything – of value in a single sprint can be vital, not only for the product you are building, but even for understanding the nuances of your organizational culture. Finding out why you failed may help your organization’s agile transformation! Without failing fast at delivering a potentially releasable product increment, you would not know what your organization has to do in order to adapt to an agile culture, and support your team’s efforts.

Summary

Successfully delivering valuable functionality in the first sprint or three can be difficult, and for some, simply impossible. But the lessons learned from the attempt are invaluable! Taking an early hit, failing in the short term, is unimportant if it improves the chance of success in the long run.

Moreover, failing can teach you valuable lessons, and if you learn from them and adapt accordingly, you will become more agile, and ultimately have a better chance at successfully delivering valuable software.

Learn from your mistakes. Adopt a Growth Mindset.

“The only real mistake is the one from which we learn nothing.”

- Henry Ford

Posted in Agile, Delivery, Scrum, Sprint Planning | Leave a comment

Learn How to Ask the Right Questions!

One thing that sets high-performing individuals and teams apart from lower-performing ones is experience. Experience is a shortcut. It saves time. It helps you skip the time and effort required to research a particular problem. Experience lets you cut directly to the solution.

Experience allows you to skip the questions.

Unfortunately, experience can also give you a false sense of security. You feel secure that you know all of the answers to the relevant questions, that you know the complete context of the problem. You assume that nothing has changed since you have last evaluated the problem space.

That assumption can be wrong, and detrimental to your work.

A Short Story

So, once upon a time, about a month or so ago, I was running a hackathon with a team of developers. They were a great team, coming together for the first time in the organization’s history to solve a problem agilely, that is to say, all hands on deck – developers, testers, operations engineers, and the product owner sitting together – working together towards a shared common goal, to the exclusion of all other unrelated tasks.

At some point we realized that we had a dilemma regarding the form of the output of our feature. The default way, the way that their primary customer had established with them a long time ago, and the way that they had been working ever since, was to provide a full “baseline” dataset, in the form of a CSV file, and then provide daily “modification files”, i.e. files that add new records or remove old ones.

The problem was that we now had to support not just the primary customer but many other customers, some so small that they had little in the way of an IT department capable of writing a process to take monthly baselines and apply daily changes automatically. The solution that some of the team came up with was to send them full daily datasets. Each day they were to use the new CSV as the sole source of truth. These small customers could consume the full daily files with Excel, with no additional IT work needed.
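To make the difference concrete, here is a minimal sketch in Python of what each approach asks of the consumer. The column names (`id`, `name`), the `op` column with its `add` and `remove` values, and the function names are all assumptions for illustration; the team's real file layout is not described in this post.

```python
import csv
import io


def load_full_dataset(csv_text):
    """Full daily dataset: the file itself is the sole source of truth."""
    return {row["id"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def apply_daily_changes(baseline, changes_csv_text):
    """Baseline plus modification file: the consumer must replay the changes.

    Assumes a hypothetical 'op' column holding either 'add' or 'remove'.
    """
    records = dict(baseline)
    for row in csv.DictReader(io.StringIO(changes_csv_text)):
        op = row.pop("op")
        if op == "add":
            records[row["id"]] = row
        elif op == "remove":
            records.pop(row["id"], None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return records


# The "old way": keep last month's baseline around and replay each daily file.
baseline = load_full_dataset("id,name\n1,Alice\n2,Bob\n")
after_changes = apply_daily_changes(
    baseline, "op,id,name\nadd,3,Carol\nremove,1,Alice\n"
)

# The "new way": open today's file and you are done.
todays_full = load_full_dataset("id,name\n2,Bob\n3,Carol\n")
```

Both paths end with the same records, but only the first one requires the customer to keep state and write (and maintain) the replay logic. That is exactly the work a customer without an IT department cannot take on.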

Of course, a lively discussion ensued. Part of the team said that we cannot change how things work because our primary customer relies on it. The other half argued that the many smaller new customers simply cannot deal with the “old way”.

After letting them discuss it for a while I proposed we call the customer representative. The team leader did. After a short introduction, referring to the new feature (which they had previously discussed), the team leader, a proponent of the old solution, asked, “Do you want us to send you the files in the same way we did before with the other features, that is, a monthly baseline with daily changes?” The customer representative said, “Yes, it would be fine.”

I thought about it for a moment. Our many new smaller customers needed a different solution. We needed to reframe the question. I asked the team leader to relay the following question to the customer: “Would it be acceptable if instead of using the old method we’ve had before, for this feature we will send full daily datasets?”

The customer representative’s response was, and I believe I am quoting her verbatim, as follows:

“Oh my god, yes! That would be so much simpler for us to implement on our end!”

High-fives and fist bumps were had across the entire room. Team spirits went up dramatically.

The Moral

The team leader was very experienced with the system we were modifying, and very familiar with the customer. They talked with one another regularly. This blinded him to the simple truth that while the current solution was acceptable, a better solution would be greatly appreciated.

The rule I drew from this experience is this – if you want to change something, never immediately assume that current conditions are immutable. If you’ve never discussed the possibility of change, you cannot know whether others – your partners, customers, teammates, or managers – would resist, or the reverse – enthusiastically support the change. Perhaps they feel the same pain that you do, the pain driving your desire for change, and merely lack the initiative to stand up and do something about it.

In short, don’t hesitate to ask. Phrase the question in a way that benefits you, and highlights the change you wish to make. It cannot hurt. You might be pleasantly surprised with the results!

4 Ways to Integrate Development and Operations Efforts

So you’ve heard about DevOps. That is great. You’ve decided that your organization could really benefit from a “DevOps transformation”. Even better. You’ve even gone above and beyond, and memorized the following definition for DevOps:

DevOps is the union of people, process and technology to enable continuous delivery of value to customers.

Awesome.

But what should you do next? How do you deal with the fact that the developers in your organization cannot or will not own the process of delivering into production? How do you deal with the fact that your operations engineers cannot or do not fully understand the software to which they have been given stewardship?

How do you get your “Dev” and “Ops” to work together? How do you become a “DevOps” organization?!?

Method #1: You Build It – You Operate It!

If your developers are already, among other things, for whichever reasons, tasked with deploying software, monitoring it in production, and dealing with live site incidents (LSIs), then congratulations, you are already halfway there. More than half, really. Changing the organizational structure by tearing down the walls between “Dev” and “Ops” is as difficult as it was about a decade ago, when the Agile movement introduced the notion of tearing down the walls between testers and programmers (a.k.a. “Dev” and “QA”), as well as those between developers and business (i.e. Scrum teams consisting of both developers and business people). It is even more difficult in some organizations, where developers report to the VP of R&D, while operations engineers report to the VP of Operations; the first common manager who connects both departments is the CEO, hardly the person who should be burdened with getting these two realms to cooperate.

Other organizations are not so lucky. But do not despair; the following methods describe how to enable an organization’s DevOps efforts through collaboration of separate “Development” and “Operations” departments/teams.

Method #2: Embedded Ops

In an organization where development and operations exist in two separate hierarchies (a.k.a. “silos”), if you have enough operations engineers, you should assign – embed – one in each development team. This engineer would be responsible for fielding all of their development team’s questions and requests, and for helping the development team deliver their software into production.

This setup has the benefit of being able to create self-managing teams who ultimately own the responsibility of deploying software to production, and making their product “ops-friendly” (e.g. automatically testable, deployable, include monitoring hooks, etc.), while maintaining a governed center of excellence (COE) that defines how software should be built, deployed, tested, and monitored.

If you have more operations engineers than development teams (yes, I know how unlikely that is…), the remaining engineers can focus on cross-cutting concerns such as monitoring overall systemwide health, and creating tools to be used by the teams for their operational efforts.

The embedded Ops engineers can, instead of merely handling the operational workload for their respective teams, help the developers cross-train so that they can become more independent, allowing such teams to move towards the first method. This can free up the embedded engineer to rejoin the main Ops team, or to move on to helping a newly formed team, or an existing team that requires extra attention.

Method #3: Liaison Ops

Many organizations, like those described in the previous section, have separate “dev” and “ops” departments. Unlike the aforementioned group, however, these organizations do not have enough operations engineers to dedicate one to each development team. In this case, we want to put each Ops engineer in charge of serving several teams. This engineer will have to help each team, and balance their workload with the needs of all of the teams with whom they are liaising. The liaising operations engineer will have the same responsibilities and roles as the embedded engineers do; the only difference is in the number of teams they support.

Sooner or later, liaising Ops engineers become bottlenecks, or at least find scheduling to be difficult. For this reason, while embedded Ops engineers have the option to train their teams to operational independence, liaising Ops engineers must make training their developers for independence a high priority.

A question that invariably comes up is “how many teams can a liaising operations engineer support?” The answer is, of course, “it depends”. It depends on both the engineer and the teams.

Method #4: The Operational Service Team

Sooner or later, organizations with embedded or liaising operations engineers want to – or must – gravitate towards the servicing model. In this model, the operations team does not support individual development teams directly, except in extreme cases. If an engineer from this operations team has to help a development team deploy their code or debug it in production, then there was a procedural failure at some point.

The role of the service team is to create tools, skills, and knowledge that enable development teams to deploy and own their code in production. This may include creating scripts and templates that automate the build and release processes, setting up automated security and quality checks as gates for continuous delivery, setting up dashboards, system health checks, etc. If it is unique to a team’s product, the team owns it. If it is a cross-cutting concern, the ops team owns it. The ops team provisions environments, or enables teams to provision environments within guidelines and constraints; the developers consume the services.
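As one illustration of a "gate", here is a minimal sketch in Python, assuming that each automated check simply reports pass or fail. The check names and the `release_gate` function are hypothetical; a real pipeline would get these results from its build, test, and scanning tools.

```python
def release_gate(check_results):
    """Return (ok, failures) for a mapping of check name -> passed flag."""
    failures = sorted(name for name, passed in check_results.items() if not passed)
    return (len(failures) == 0, failures)


# Hypothetical results gathered by the pipeline for one release candidate.
results = {
    "unit_tests": True,
    "security_scan": True,
    "code_coverage": False,  # e.g. below the threshold the team agreed on
}

ok, failures = release_gate(results)
print("deploy" if ok else f"blocked by: {', '.join(failures)}")
```

The ops team would own a shared gate like this one, since it is a cross-cutting concern; each development team owns making its own checks pass.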

The Operational Service Team may also be responsible for creating DevOps-related knowledge for the teams to consume. This may come in the form of anything from wiki pages to formal or ad-hoc training.

Note that this method is very similar to the first method. In both cases, the development teams are responsible for their code all the way into production. The primary difference is that in the first method, there is no operations team; in the fourth (this) method, the operations team serves as a center of excellence.

Summary

Hopefully, this post will help you determine where your organization is in its DevOps journey, and help you figure out where you want to be.

Good luck, and safe journeys,

Assaf

DevOps and Your Definition of Done

Regardless of the agile methodology you are using to drive your software development efforts, you should have an explicit definition of done. I say explicit, because you will always have one – whether you define it or not. Regardless of your process, even if you are following a Waterfall-based process, the level of quality you demand (or allow) your software to reach before you ship it is your definition of done.

Explicitly defining what Done means to your organization helps communication and collaboration, and allows you to set the bar for quality, as well as drive your process improvement efforts.

In this post I will provide some guidance on how to use your Definition of Done to drive collaboration efforts between your developers and the operations engineers, in an organization that is trying to adopt a DevOps mindset.

DevOps

Microsoft’s very own Donovan Brown gave us what I view as the best definition of what DevOps is. Even if you’ve heard it before, I believe that it bears repeating, in order to drive the point of this topic:

DevOps is the union of people, processes, and product to enable continuous delivery of value to the users

An organization trying to adopt a DevOps mentality should have every single member’s job be defined by that statement.

The Definition of Done

There are many definitions for the Definition of Done (ironic, I know). I’m rather partial to Scrum.org’s definition, in the official Scrum Guide:

When a Product Backlog item or an Increment is described as "Done", everyone must understand what "Done" means. Although this may vary significantly per Scrum Team, members must have a shared understanding of what it means for work to be complete, to ensure transparency

The emphasis is my own.

What most development teams end up doing is coming up with a laundry list of quality demands like the following:

  • Code complete
  • Unit tests code coverage is <insert some number here>% or higher
  • Automated build
  • QA Tested

Yours may have a few others, or may be missing some, or have minor variations on the same theme. The result is often disjointed, where some items may be nice and lofty aspirations, but not achievable by the organization with the resources and knowledge it currently has.

Done for DevOps

DevOps, among other things, is about collaboration, and a shared responsibility for delivering features into production. The product’s Definition of Done should reflect this.

What I prefer to do is to use directed questions to drive the teams’ Definition of Done. These questions are, of course, asked with the aforementioned DevOps definition in mind.

The following are some examples of shared questions that drive this point and (hopefully) start driving the change in mentality required for success as a DevOps organization.

Start a conversation with the delivery team (developers, testers, ops, business unit, i.e. anyone responsible for the delivery of the product) during your next retrospective, post mortem, or whenever you discuss process improvement options, and ask one or more of these questions.

What must we do to ensure continuous delivery of these features?

The answer to this question is as much the responsibility of the developers as it is of the testers and the operations engineers. Deploying in small batches of changes, architecting the solution in a way that makes (automated) deployment easier, using certain development practices, such as feature flags, test automation, and infrastructure as code, and using deployment patterns such as blue/green deployment all increase the ease and likelihood of successful deployment.

Note that some of the aforementioned practices are in the hands of the developers, some in the hands of testers, and others in the hands of operations.
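Of the practices listed above, feature flags are the easiest to show in a few lines. The following is a minimal sketch in Python; the `bulk_discount` flag, the 10% discount, and the `checkout_total` function are invented for illustration, and a real system would normally read its flags from configuration or a flag service rather than a plain dictionary.

```python
def checkout_total(prices, flags):
    """Return the order total; the new discount ships 'dark' behind a flag."""
    total = sum(prices)
    # New behavior is deployed with the code but only runs when the flag is on.
    if flags.get("bulk_discount", False) and len(prices) >= 3:
        total *= 0.9  # hypothetical 10% discount for three or more items
    return round(total, 2)


# Same deployed code, two behaviors, selected at run time:
old_behavior = checkout_total([10, 20, 30], flags={})
new_behavior = checkout_total([10, 20, 30], flags={"bulk_discount": True})
```

Because the unfinished or unproven path is switched off by default, developers can merge and deploy small batches continuously, while the decision to release the feature to users stays a separate, reversible step.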

Collaboration is key.

What must we do to ensure that features delivered are valuable?

Let’s face it – ask 10 development teams why they are building their current feature, and at least 9 of them will answer “because it’s in the backlog”, “because my manager said so”, or “huh?”

The fact that our stakeholders, sponsors, product owner, project manager, team leader, or CEO requested or even demanded a feature may be a good enough reason to do something, but they are not all-knowing. We don’t know that the features will in fact be valuable to our business.

For an internal app, we will probably want to know if and how the new change affects time to complete a transaction or error rates. What measures could be put in place to prove that?

For commercial apps, we will probably want to know if and how the new change affects conversions, sales, consumption, etc.

For public sector apps, we might want to know how the new change affects consumption, speed of use, efficiency, backend costs, etc.

The business unit should provide these metrics. Developers must develop the features with hooks to enable measuring these things. Testers must verify that the metrics provide the expected data, and operations must monitor these.
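As a rough idea of what such "hooks" can look like in code, here is a minimal sketch in Python for the internal-app example (time to complete a transaction, and error rates). The `Metrics` class is an in-memory stand-in for whatever telemetry client your organization uses, and the metric names and the `complete_transaction` function are hypothetical.

```python
import time
from collections import defaultdict


class Metrics:
    """In-memory stand-in for a real telemetry client (hypothetical)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.durations = defaultdict(list)

    def increment(self, name):
        self.counters[name] += 1

    def record_duration(self, name, seconds):
        self.durations[name].append(seconds)


def complete_transaction(order, metrics):
    """A business operation instrumented with the measures the business asked for."""
    started = time.perf_counter()
    try:
        if not order.get("items"):
            raise ValueError("order has no items")
        metrics.increment("transaction.completed")
        return {"status": "ok", "item_count": len(order["items"])}
    except ValueError:
        metrics.increment("transaction.failed")
        raise
    finally:
        # Recorded for successes and failures alike.
        metrics.record_duration("transaction.seconds", time.perf_counter() - started)


metrics = Metrics()
complete_transaction({"items": ["book", "pen"]}, metrics)
try:
    complete_transaction({"items": []}, metrics)
except ValueError:
    pass  # the failure is still counted
```

The division of labor from the paragraph above maps onto this directly: the business names the metrics, developers add the hooks, testers check that the counters move as expected, and operations watches them in production.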

What must we do to ensure that features are working properly?

Yes, of course you need to test your system, ideally with automated tests, topped off by some manual exploratory tests by your testers. Unfortunately, that is not enough.

You want to be able to know that your system is working in production. You want to be able to identify problems before they affect your users.

With the right measures and performance counters in production, you can monitor not only your disk space, memory consumption, and other hardware concerns, but also how your business scenarios are performing.

Make sure your definition of done asks questions such as “What do I need to measure in order to identify problems before our users are affected?”

Defects will escape your delivery teams’ quality processes. That’s a fact. Make sure they do not repeat themselves by asking yourself something like “What can I monitor to guarantee that this problem doesn’t show up again?”

Be sure to not only ask these questions; make sure you also answer them, and implement the hooks that will allow you to monitor these.
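Answering these questions usually ends in a rule that someone (or something) evaluates continuously. Here is a minimal sketch in Python of one such rule for a business scenario; the 5% threshold, the scenario name, and the function names are assumptions for illustration, not recommendations.

```python
def error_rate(completed, failed):
    """Fraction of attempts that failed; 0.0 when there was no traffic."""
    total = completed + failed
    return failed / total if total else 0.0


def check_scenario(name, completed, failed, max_error_rate=0.05):
    """Return an alert message if the scenario looks unhealthy, else None."""
    rate = error_rate(completed, failed)
    if rate > max_error_rate:
        return f"ALERT: {name} error rate {rate:.1%} exceeds {max_error_rate:.1%}"
    return None


healthy = check_scenario("checkout", completed=95, failed=5)
unhealthy = check_scenario("checkout", completed=90, failed=10)
```

The point is that the rule watches a business scenario (checkout), not a server. Disk and memory can look perfectly fine while customers are failing to complete their purchases.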

Summary

These are but a few conversation starters. Asking them is important. Answering them is crucial. Following up with an implementation will guarantee that your definition of done is in line with your business goals and your organization’s endeavor to adopt a DevOps mindset.

Keep Calm and DevOps!

Posted in Agile, Definition of Done, DevOps, Scrum | Leave a comment

How to Maximize the Value of Your Planning Session

By now it is commonly accepted that the old way of developing software, in silos, with a big up-front plan and design, with only a single true delivery to customers at the end of the project, also known as the infamous Waterfall approach, is not the best way of doing things. Many development teams have fully embraced the agile approach, while others have not (yet) fully done so.

Partial, or early agile transformation attempts can often be characterized by embracing some (but not all) ideas, or certain practices commonly associated with agile development frameworks, such as Scrum, but without fully embracing its principles and philosophies; these teams often get the “what” part of it right, but have not yet embraced the “why” or the “how”.

In this blog post, I will focus on how to improve the agile planning session, also known in the Scrum framework as the Sprint planning session.

A Quick Overview of Sprint Planning

A sprint is a time-box of one month or less (commonly 2-4 weeks) during which the delivery team will develop, test, and build a potentially releasable product increment.

Sprint Planning, quite simply, is the act of figuring out the work that needs to be done in order to achieve the sprint’s goal, i.e., to create the releasable product increment, adding what the Scrum team has decided is the next most valuable functionality to deliver.

Sprint Planning, like all other Scrum practices, is a collaborative effort. While the product owner is responsible for maximizing the value created by the delivery team, the team is responsible for figuring out how to deliver the work. Both are expected to work together to define each sprint goal, rather than having the product owner tell the team what to do.

Change is Difficult

For many teams, especially those transitioning from a command-and-control style of organization to an agile, self-managing style, embracing the change in management style is much more difficult than embracing a different cadence and/or set of practices. This often manifests in planning sessions where the product owner describes what needs to be done, and the team just listens. It really should be more of a discussion.

The delivery team needs to collaborate, discuss the goal and the plan with the product owner to ensure that they understand what they are asked to deliver.

The SPIDR Technique

Writing user stories, while not strictly required in Scrum, is one of the most commonly known Scrum practices. The idea is to describe increments of value from the perspective of one or more users of the product, thus ensuring that the product increment is, in fact, valuable to the users.

In order to ensure that user stories are well enough understood, and that they are small enough to fit in a single sprint, the SPIDR technique provides a way to decompose stories into multiple parts, each delivering some observable incremental value to the users.

SPIDR is an acronym defining the most commonly used methods of decomposing a story: Spikes, Paths, Interfaces, Data, Rules. The SPIDR technique was created by Mike Cohn, and fully described in his blog.

New Practice: SPIDR Planning

In my work with delivery teams transitioning from a waterfall-based world to a more agile, Scrum-based paradigm, I have found that encouraging a routine, a checklist of questions inspired by the SPIDR decomposition technique, helps teams get into the right mindset for discussing a plan, rather than expecting a plan to be handed down from on high.

These are (some of) the questions a team may ask, to ensure their understanding of the value they are asked to deliver in the sprint:

Spikes

A spike describes a short burst of activity, often necessary to experiment with potential solutions for a complex problem. This will often be the last set of questions that the team will ask, after the requirement itself is understood.

Ask your teammates: Do we know how to do this? Are there any APIs or components that we have to use that have never been used before? Is there anything missing in our skillset, toolset, or our definition of done, that needs to be in place in order to deliver this user story?

Paths

A path describes a specific flow, or a scenario of using the product. Ask and discuss with your product owner, stakeholders, or subject matter experts: What are the conditions under which the users will be interacting with the system? Are there other ways that this functionality may be used? What if something goes wrong? What do we show our users – any and all users? What can go wrong? Each if statement, switch/case, loop, or other control block marks a different path to be considered.

Interfaces

There are often many different ways to interact with a product. Gone are the days where users would only have one option – their desktop or terminal. What are all the various interfaces we need to consider? Web? Desktop? Tablets? Smartphones? Wearables? Command line tools for administrators and automation tools? How should the experience differ from one interface to another? Which interfaces are most valuable, and should thus be considered first?

Data

Different inputs beget different outcomes. What data will be provided? What are the data sources? Have we interacted with that data before (do we need a spike to figure out how to get that data?) Consider environmental data – does location affect the outcome? the time or date? Random elements? What data is required vs. optional? What if data is missing? What defaults to use in place of missing data?

Rules

A rule, either business or a technical standard, describes certain operational constraints. What are the business rules? Special cases? Are there any standards? Quotas? Compliance rules? Security standards?

Summary

Asking these questions during sprint planning is a great way to start a conversation that will both achieve the principles of collaboration and engagement, and ensure a greater understanding of the value that the product is expected to deliver to its users.

This list of questions is by no means an exhaustive list, and is intended to demonstrate the ways in which a delivery team might engage with the product owner, to negotiate with her or him.

What do you think? Are there any important questions that I have missed? Please let me know in the comments.

Posted in Agile, Definition of Done, How To, Scrum, Sprint Planning, User Story

The Importance of Limiting Your WIP

In the previous post we discussed what WIP (Work In Process/Progress) is, and how to track it. In this post I want to discuss why WIP limits are so important and how they contribute to improving the team's effectiveness and throughput.

So There's This Little Law…

Little's Law describes the relationship between the average throughput in a system, the average process time (a.k.a. Cycle Time) for an item in the system, and the number of items being processed at the same time (your WIP). For the mathematically inclined reader, this relationship is described in the following formula: L=λW, or in other words, the long-term average WIP (L) is equal to the long-term average throughput (λ) multiplied by the average wait/process/cycle time (W).

This theorem is both simple and profound; while the formula is intuitively easy to grasp, the relationship it describes is unaffected by other factors such as the arrival process distribution, service distribution, order of process, or for all intents and purposes – anything else!

This law therefore holds true for any stable system, simple or complex, as long as we are talking about long-term averages.

Okay. So this is interesting (humor me here – let's assume that if you're reading this post you are finding this interesting) but what does this mean?

Applying Little's Law to Product Development

If we play around with the formula, we will arrive at a way to determine the process time – or in product development terms, the amount of time to develop a feature – by dividing the WIP by the system throughput, or W=L/λ. This means that we can reduce the amount of time it will take us to implement features by limiting the number of items we develop concurrently!
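To make the arithmetic concrete, here is a small sketch in Python. The function names and the numbers (a team finishing 5 items per week on average) are made up for illustration, and keep in mind that the law speaks about long-term averages, not about any individual item.

```python
def average_wip(throughput: float, cycle_time: float) -> float:
    """Little's Law: L = λW."""
    return throughput * cycle_time


def average_cycle_time(wip: float, throughput: float) -> float:
    """Little's Law rearranged: W = L / λ."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


# Assumed example: the team completes 5 items per week on average.
throughput_per_week = 5.0

for wip in (20, 10, 5):
    weeks = average_cycle_time(wip, throughput_per_week)
    print(f"WIP {wip:>2} -> average cycle time {weeks:.1f} weeks")
```

With throughput held at 5 items per week, halving the WIP from 20 to 10 halves the average cycle time from 4 weeks to 2.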

This too should be intuitive. Even if we completely ignore all of the disadvantages of multitasking, context shifting, resource allocation, deadlocking, and conflicting priorities, it is simple to see that if we could, for example, complete only 80% of the planned work, it would be more valuable to have 80% of our items 100% done and delivered, than to have started work on all 100% of our items, but each only 80% complete, and in no shape to ship. Yes, in the described scenario, as a manager, you could find a way to blame your developers for not completing 100% of the planned work, but this was preventable! That's on you!

However, there is another subtle application that is implied by the modified theorem. WIP predicts process time!

Predicting Process Time based on WIP

Predicting process time is not new. We just like to call it estimating. Estimation is defined as making an extremely hopeful guess about the amount of time it will take to develop or (and?) deliver something, based on personal or shared past experiences (or more often, an arbitrary demand, dictated by someone in a higher tax bracket than the people doing the work). Okay, so perhaps it is not defined quite like this, but in all honesty it really should be.

Regardless of what technique (if any) you use to come up with your estimates, all of the data that feeds your estimates is based on the past. In other words, your estimates rely on trailing indicators. This means that the numbers come after the fact – they could be used to explain the previous items' process time, and we statistically assume (read: hope) that they will hold true for our upcoming work.

By contrast, your WIP is a leading indicator! This means that the WIP predicts (affects) the process time for items entering the system, i.e., items not yet developed.

Let's look at the following Cumulative Flow Diagram from a VSTS project:

[Image: cumulative flow diagram]

Let's try to reduce the work in progress. We will do this by focusing our efforts on closing items that are already active instead of beginning work on new items. This will be represented by taking items out of the Active state, and putting them back in the New state, and in return, moving items from the Resolved state into the Closed state. Doing so will result in the following modified cumulative flow diagram:

[Image: modified cumulative flow diagram]

Note that as our WIP is reduced, so is our process time! We have reduced the time to deliver an item – any item – doing nothing more than limiting the amount of work the team processes concurrently! And this doesn't even account for the extra benefits that knowledge work such as software development derives from focusing on a single small unit of work!

Conclusion

In this blog post we've discussed Little's Law and its implications for product management. Likewise, we demonstrated the importance of WIP limits as leading indicators for process time, and how reducing the amount of work we have in process concurrently reduces the amount of time to process items through the system. Armed with this knowledge, I hope you will put it to use; all you need to do is not take on a new job before you are done with (most of) the ones you've already begun.

Stay lean because you can(ban),

Assaf

Posted in Uncategorized

WIP Your Product into Shape

What is WIP?

WIP simply means work in process (also sometimes, work in progress). This metric measures how many items (features, stories, backlog items, tasks) your team has started to develop, but has yet to complete. In other words, how many items are currently being developed. This simple metric is extremely important, and a useful number to track and control. In this post we will discuss the reasons for limiting your WIP, how to do so, and how to track your work in process using VSTS.

You're Doing Too Much

Imagine the following scenario: You are walking down Main Street, carrying a box. Not a problem. The box is small enough that you can easily pick it up and carry it from wherever it was that you got it to wherever you are going. All is fine.

Now imagine, that you are carrying two boxes. Still not a problem. Granted, carrying one box would be easier, but you believe that the discomfort of carrying two boxes is preferable to the discomfort of having to make the trip twice. You can do it.

Now imagine that you are trying to pick up and carry three boxes. Not easy at all, and the weight, strain, and bulkiness of the trio make you reconsider the wisdom of trying to take so much with you at once. Staggering carefully might end up taking longer than a second trip would…

Now imagine that you are trying to pick up and carry four boxes. Blind, because you cannot see around all of the boxes, you bump into someone else carrying a box, and you both fall, the contents of your boxes spilling, some shattering.

You really should have limited how many boxes you carry at once…

Setting WIP Limits

What is a WIP Limit?

WIP limits are exactly what they sound like - you limit the number of items that you will work on concurrently, not taking on any new work, until the number of items you are developing is less than the limit. In our above story, the poor courier should have set his limit to 3, possibly even 2 in order for his work to flow optimally, or for him to be effective. The courier could easily carry one item, and could increase his throughput when he carried two items, but slowed down when he was carrying three, and literally crashed when trying to manage four. Note that this is purely individual – another courier could possibly manage three or four boxes.

I hope the metaphor is obvious. The courier is you and your development team. Boxes are your backlog items – whatever you’re tracking (stories, features, tasks, etc.). Some teams can work on one item in collaboration, others no more than two. A third team might prefer to work on twice as many tasks as they have members. It is individual, and it depends on the team, the individuals, and of course, the nature of their work.

But how do you know what limit to set?

Picking a WIP Limit

Here’s the simple truth: there are no silver bullets. No one can tell your team what the ideal WIP limit is. You should start with whatever makes sense to your team. Change that number to whatever makes sense to you. Change it when circumstances change. Change it to experiment.

If your team tends to collaborate frequently, start with a low number – three, because why not three… If your team rarely collaborates, start with a higher number, tied to the number of developers you have on your team, e.g. 1 or 2 per team member, plus/minus 1.

Next, start tracking metrics that are important to your team. A few examples are:

  • Throughput – the number of items processed (developed) in a given period of time (day, week, sprint – whichever, just be consistent). You want this number to be high
  • Defect Rate – the number of bugs/defects found (in QA, UAT, production, etc.) in a given period of time. You want this number to be low
  • Deployment Frequency, and deployment time. You want these numbers to be high and low, respectively
  • Lead Time – the amount of time that passes between requesting a change to the system and its delivery to the system’s users. You want this number to be low
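As a minimal sketch of how a few of these numbers could be computed from exported work items: the `WorkItem` record, its field names, and the dates below are all invented for illustration and do not correspond to any particular tool's data model.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass
class WorkItem:
    requested: date               # when the change was requested
    delivered: Optional[date]     # None while the item is still in process
    is_defect: bool = False


def throughput(items, start: date, end: date) -> int:
    """Number of items delivered within the period [start, end]."""
    return sum(1 for i in items if i.delivered and start <= i.delivered <= end)


def defect_count(items, start: date, end: date) -> int:
    """Number of defects reported within the period [start, end]."""
    return sum(1 for i in items if i.is_defect and start <= i.requested <= end)


def average_lead_time_days(items) -> Optional[float]:
    """Mean days from request to delivery, over delivered items only."""
    lead_times = [(i.delivered - i.requested).days for i in items if i.delivered]
    return mean(lead_times) if lead_times else None


# Made-up sample data for one week.
items = [
    WorkItem(date(2024, 1, 1), date(2024, 1, 5)),
    WorkItem(date(2024, 1, 2), date(2024, 1, 10)),
    WorkItem(date(2024, 1, 3), None),
    WorkItem(date(2024, 1, 4), date(2024, 1, 7), is_defect=True),
]

week_start, week_end = date(2024, 1, 1), date(2024, 1, 7)
print("throughput:", throughput(items, week_start, week_end))
print("defects:", defect_count(items, week_start, week_end))
print("average lead time (days):", average_lead_time_days(items))
```

Whatever definitions you choose, the important part is to compute them the same way every period so that the trend is comparable.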

At a regular cadence, for example, every sprint retrospective, evaluate the numbers. Experiment with lowering your WIP limit. After a period of time, look at your metrics. Did they improve? If so, great! Keep on doing what you’re doing. If not, either adjust (increase) your WIP limit or adjust your practices so that you perform better at the lower limit. Measure again. Rinse and repeat.

Tracking WIP Limits in VSTS

Tracking Your WIP Limit in Your Kanban Board

Limiting your WIP is a practice that should be mostly autonomous, that is to say, at the team’s discretion. As such, you need to set it and visualize it where the team visualizes their work. If you go to VSTS’s boards, you will notice two numbers beside the name of each column (except for the ‘New’ and ‘Done’ columns). The first number represents the number of items in the column, the second represents the WIP limit. In the following board, the ‘Approved’ column has a WIP limit of 5, meaning that the team must approve (or refine) a work item before they can consider a sixth item. This team’s ‘Committed’ WIP limit is set to 6, meaning that they can work on no more than 6 concurrent work items, as a whole team.

image

If the team takes on a 6th item to be approved, VSTS won’t block them, but the number will turn red to note that the team is doing something wrong:

image

Tracking Your WIP Limit in the VSTS Team Dashboard

The VSTS boards put this information in the team members' faces, enough to be noticed, but not so much as to get in the way of progress. This may be enough for the team. Some teams may want more. Some teams may want to have this displayed on the team’s dashboard, where it might be visible to anybody, or always visible. Perhaps the Scrum Master or the team leader may want to know what is going on. Perhaps they want to show the dashboard to middle management in order to prove that reducing WIP is important.

One thing you might do is add a Work Item Query that tracks the work items in a given column (add a where clause like Board Column =  Committed), and add a Query Tile to your dashboard that counts the number of items returned by the query.

image

You can even set the tile with conditional formatting that would, for example, color the tile green if the count is less than the WIP limit, yellow if it’s at the limit, or red if it is above:
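The colouring rule described above boils down to a three-way comparison. The function below is only an illustration of that rule in Python; it is not part of VSTS or its API, and the name `tile_color` is made up.

```python
def tile_color(item_count: int, wip_limit: int) -> str:
    """Map the number of items in a column to a traffic-light colour."""
    if item_count < wip_limit:
        return "green"   # below the limit: room to pull new work
    if item_count == wip_limit:
        return "yellow"  # at the limit: finish something before starting more
    return "red"         # above the limit: the team has overcommitted


for count in (4, 6, 7):
    print(count, tile_color(count, wip_limit=6))
```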

image

The end result is a query tile that really sticks out and shows whoever needs to know how well you are limiting your WIP:

image

Conclusion

In this blog post, we’ve discussed what a WIP limit is, how to set it, and how to track it in VSTS. In the next post, we will discuss why reducing the WIP limit is so important, and how to incorporate WIP limit management into your real, day-to-day, corporate life.

Stay lean,

Assaf

Posted in Agile, Productivity, Scrum, Self Management, Team, Uncategorized, VSTS

5 Ways to Reduce the Impact of Failure

In the previous post we discussed risk management, how many of us manage it today by attempting to control the likelihood of failure, and why we should instead focus on reducing the impact of failures, as a way to manage the risks involved with software development.

Below are 5 techniques that software development teams can use to reduce the impact of failures in production:

1. Reduce Batch Sizes

Smaller workloads take less time to complete – in each phase, and altogether. Smaller workloads include less functionality, thus fewer potential flaws, each likely to have a smaller impact on the system as a whole. Small batches are easier to deploy, thus easier to revert – or rather it is easier to design a successful rollback plan, meaning that the recovery time will be shorter.

Less functionality in a deployment unit results in a smaller area of impact. Faster deployments with faster rollback options result in a reduced time to recover, thus reducing the impact on the system.

Reducing the batch size therefore reduces the impact of failure on two fronts: a smaller area of impact and a shorter time to recover.

2. Deploy as Early and Often as Possible

Developers depend on feedback in order to know whether or not they created the right thing, as well as whether or not they created the thing right. Building the system every time changes are made is a good start, and running automated unit tests is even better, but some defects may only be detected in production or a production-like environment!

By shortening the development cycle, reducing the amount of time that a developer must wait before finding out if his or her changes were successful, the developer will:

  1. have an easier time correcting any mistakes
  2. have an easier time identifying cause and effect between changes made and defects in production
  3. have an easier time learning from these mistakes and findings

A shorter feedback cycle, therefore, results in faster fixes, thus reducing the impact of failures, while the increased learning results in a happy side-effect of reduced likelihood of failure.

3. Shift-Left and Automate Audits and Controls

Risk aversion, or fear of failure, is the primary reason that we have so many audits, checks and controls, and sign-offs for every single change that we introduce in the system. As mentioned in the previous post, these measures are expensive (time-wise) and ineffective at reducing risk; they are introduced too late in the development lifecycle to serve as an efficient method of reducing risk. Worse, most of these controls delay the deployment of changes into production by days, even weeks, thus lengthening the feedback cycle, effectively increasing the impact and likelihood of failure!

See the irony here? The very measures taken to mitigate risk actually make it worse!

By replacing post-factum audits and governance boards with automated checks that analyze the code, and test it in a production-like environment, and by discussing operational concerns earlier in the planning process, we can introduce these audits earlier in the lifecycle, even as early as while the developer is coding! Automation also helps reduce the cost and the length of the feedback cycle.

With earlier and faster warnings, fewer quality issues will reach production. Those that do get through these measures are likely to be the smaller and less significant ones. This means that both impact and likelihood will be reduced.

4. Decouple Sub-Systems

Huge monoliths might be easy to develop, and often offer the greatest performance. Unfortunately, they come with inherent disadvantages:

  1. It is difficult or impossible to deliver changes to part of a monolith; monolith deployments are usually an all-or-nothing endeavor.
  2. Tightly coupled, monoliths are often designed in a way that if one part of a workflow fails, the entire workflow crashes.

By decoupling the architecture, separating steps into individual components that communicate with each other asynchronously, using message queues or event-based communication systems, we can:

  1. Deploy each component separately, ensuring that defects introduced into a system component are isolated from other components, reducing the impact of failures to one localized component.
  2. Rather than fail the workflow if something goes wrong, we can flag defective messages for service teams to handle, notify the user that completion is delayed, and eventually complete each flow when services are restored, thus reducing the impact of failure even further.
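The second point can be sketched in a few lines of Python. The standard library's `queue.Queue` stands in for a real message broker here, and the `charge_payment` step, the order fields, and the `dead_letters` list are all hypothetical names chosen for the example.

```python
import queue


def charge_payment(order: dict) -> str:
    # Hypothetical downstream step; it fails for malformed orders.
    if order.get("amount", 0) <= 0:
        raise ValueError(f"invalid amount for order {order.get('id')}")
    return f"charged order {order['id']}"


def drain(inbox: queue.Queue, dead_letters: list) -> list:
    """Process every queued message; park failures instead of aborting the run."""
    completed = []
    while True:
        try:
            order = inbox.get_nowait()
        except queue.Empty:
            break
        try:
            completed.append(charge_payment(order))
        except ValueError as error:
            # Flag the defective message for a service team to handle later.
            dead_letters.append((order, str(error)))
    return completed


inbox = queue.Queue()
for order in ({"id": 1, "amount": 20}, {"id": 2, "amount": 0}, {"id": 3, "amount": 5}):
    inbox.put(order)

dead_letters = []
completed = drain(inbox, dead_letters)
print(completed)
print(dead_letters)
```

The defective order is set aside for follow-up while the orders before and after it still complete, which is the localized failure the list above describes.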

As a bonus, decoupled systems are much easier to scale as demand for services increases. A whole class of problems can be completely avoided by architecting loosely coupled systems.

As for performance concerns, make sure that you are developing for good enough performance, rather than best. Remember that Good Enough is by definition – good enough.

5. Continuously Improve the Definition of Done

Whether your development and operations team(s) use Scrum, Kanban or any other agile methodology or framework to drive the product, the key to successful risk management is to uphold and improve the quality level you demand for anything that you develop and deploy to production.

Following any and all of the aforementioned techniques will greatly reduce the risk to your production pipeline, but never totally eliminate risk.

The most important technique is to make sure that the same issue does not cause a failure twice. Any failure that does get through whatever quality measures you already have in place must be analyzed, and you must figure out how to make sure that this class of problems never goes uncaught again.

By rigorously applying this technique, you will be able to continuously improve your quality controls, ensuring that failures consistently grow smaller until they are no more than a nuisance.

Conclusion

Nobody and nothing is perfect. However, some are closer to perfection than others. If any or all of these ideas are new to you, I would highly recommend that you start with your definition of done, and look at the most harmful failures that you have recently had, and identify the measures that are most valuable for you to introduce into your production line!

And then move on to the next one.

Posted in Agile, Change, Delivery, How To, Productivity, Release, Scrum, Software, Team

Risk: You’re Managing it Wrong!

No offense, but despite your best intentions, you might not be handling risk properly. In this day and age, everything is software-dependent; even if you do not consider yourself a “software firm” per se, even if you are just running a small development team that develops in-house software, your business still depends on said software to run smoothly, and any outages cost money. The bigger the problem, the greater the cost. If you, like many other modern software-based organizations, try to reduce risk by taking every precaution to avoid the occurrence of failures, then I am talking to you. If you are (still) following the waterfall methodology (why would you do that???), then I am definitely talking to you.

In this blog post I will explain what is fundamentally wrong with the waterfall way of addressing risk, why you should resist the temptation to avoid failure, and what you should be doing instead, in order to truly reduce risk that is inherent to delivering software.

What is Wrong with Waterfall?

When following waterfall-based methodologies, software projects get developed in phases – first you gather all of the requirements (system, software), then you analyze the requirements and come up with a program design to satisfy the requirements. Once designed, you implement, or develop it. Once the development is done, you (hopefully) test the system thoroughly, and finally, you hand it over to operations, to deploy it and maintain it.

So, what is so fundamentally wrong with that, you might ask. This is a very simple and straightforward process. The problem is, of course, that, as even Winston Royce, the author of the now infamous 1970 paper titled “Managing the Development of Large Software Systems”, said, this can only work for the most simple and straightforward projects.

It boils down to this: Software development is complicated. In waterfall, we proceed from one phase to the next, when the former completes successfully. Unfortunately, success is by no means guaranteed, or even likely. Worse, we tend to detect most problems only after we completed the development phase, during the testing phase, deployment phase, or worst of all, only after we have already released the flaws into production. What really makes this difficult, is that some of the problems that we uncover may have been introduced prior to development (the design, analysis or even requirement gathering), and as everyone knows, the cost of fixing a problem grows exponentially over time.

So what do we do? How do we mitigate the risk that we might introduce a costly flaw into the system? Intuitively, we attempt to get everything right the first time. We try to think of everything that the system or software might require, create comprehensive design documentation that proves that we thought really hard about the problem, and create lengthy, highly regulated processes and checks that prove that we crossed every ‘T’ and dotted every ‘I’ (and a few lowercase J’s for extra measure).

In other words, we attempt to reduce risk by reducing the likelihood of a problem/incident.

And here our intuition fails us.

Risk Management in Modern Software Projects

Reducing Likelihood of Problems is the Wrong Approach

There are many different ways that a project may fail. Too many to count them all. Missing a requirement, getting a requirement wrong, designing the wrong architecture, designing a system too rigid to change, developing the wrong capabilities, developing a capability incorrectly, deploying incorrectly, not designing for the right scale, insecure code, etc. The list goes on…

So we set up policies, we come up with plans, we have audits, we enforce waiting periods, we have sign-offs, and because releasing new software is so complicated and scary, we do so rarely, often no more than four times per year.

But here’s the problem – we aren’t eliminating risk. We are – at best – reducing the likelihood of something getting through our safety gates. This means that things will get through eventually, because given enough time, anything that can happen, eventually will.

And when failure does happen, the flaw in the system expresses itself in its full glory. In other words, there might be a 1% chance of a bug reaching production, but when it does, it’ll be there 100%!

All of our audits, sign-offs, and controls fail to stop us from making mistakes. At best they catch some of the mistakes that we’ve made when it is expensive to fix them; often the mistakes get caught too late – discovered flaws become too hard or too expensive to fix. These defects get shipped anyway, hopefully fixed in a service update. Worse, and all-too-often, these audits, controls and sign-offs do nothing to help identify problems, and are instead in place in order to identify whom to blame for the failures – a useless endeavor, in my opinion.

Worst of all, our lengthy processes delay the feedback that analysts, architects, and developers need, making it impossible to learn from mistakes! A bug found 6 months after it was introduced, will do nothing to teach the responsible party how to avoid making the same mistake again. Cause and effect becomes all but lost at this point.

Finally, due to the infrequency of releases, we are not used to dealing with deployment-related issues, and therefore we are surprised and scared every single time we have them.

Manage Risk by Reducing the Impact of a Problem!

What if rather than attempting to minimize the chance that something goes wrong, we instead try to reduce how badly the problems affect us? Ask yourself this, given the choice between having a 1% chance of suffering a heart-attack, or a 100% chance to suffer something that is 1% the strength of a heart-attack, perhaps a flutter, or skipping a beat - which would you pick? I’d definitely go with the latter. In software development, not only is the likelihood of a production failure greater than 1%, but you also face it every quarter, or however frequently you release changes.
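One way to read that comparison is through simple arithmetic. In the sketch below, the impact units are arbitrary (a full heart-attack is assumed to score 100), and `expected_loss` is just a name chosen for the example.

```python
def expected_loss(probability: float, impact: float) -> float:
    """Average loss: how likely a failure is times how bad it is."""
    return probability * impact


# Assumed scale: a full-strength failure has an impact of 100 units.
rare_but_severe = {"probability": 0.01, "impact": 100.0}
certain_but_mild = {"probability": 1.0, "impact": 1.0}

for name, option in (("rare but severe", rare_but_severe),
                     ("certain but mild", certain_but_mild)):
    average = expected_loss(option["probability"], option["impact"])
    print(f"{name}: expected loss {average:.2f}, worst case {option['impact']:.0f}")
```

On average the two options cost about the same, but the worst case of the second is a hundredth of the first, and that worst case is what you have to survive when the failure actually happens.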

Agile Risk Management

Agile project management approaches, whether Scrum, Kanban, or any other methodology or framework, are designed around the following notions:

  1. Software is complicated
  2. Complicated things risk failure
  3. Complexity is directly proportional to risk and to impact of failure
  4. Complexity increases with the size of the workload
  5. Therefore, we design processes that reduce complexity, and thus – impact

We should follow methodologies that allow us to reduce the size of our workload. In Kanban, we focus on single-item flow. In Scrum, we iterate through our entire release process in one month or less. High-performing teams deploy to production small increments of functionality even more frequently, often multiple times per day!

In the next post, I will cover the steps that an organization can take, to reduce the impact of the risks involved with developing software.

Posted in Agile, How To, Productivity, Release, Scrum, Software, Team