Correlated in Retrospect

A few years back, I went to visit a company that had managed to achieve a high level of agility without high levels of coaching or training, shipping several times a day. I was curious as to how they had done it. It turned out to be the product of a highly experimental culture, and we spent a whole day swapping my BDD knowledge for their stories of how they had reached the place they were in.

While I was there, I saw a very interesting graph that looked a bit like this:

[Graph: bug count over time, dipping after the first intervention, then rising again]

“That’s interesting,” I said. “Is that your bug count over time? What happened?”

“Well,” one of them said, “we realised our bug count was growing, so we hired a new developer. We thought we’d rotate our existing team through a bug-fixing role, and we hypothesized that it would bring the bug count down. And it worked, for a while – that’s the first dip. It worked so well, we thought we’d hire another developer, so that we could rotate another team member, and we thought that would get rid of the bugs… but they started going up again.”

“Ah,” I said wisely. “The developer was no good?” (Human beings like to spot patterns and think they understand root causes – and I’m human too.)

“Nope.” They were all smiling, waiting for me to guess.

“Two new people was just too many? They got complacent because someone was fixing the bugs? The existing team was fed up with the bug-fixing role?” I ran through all the causes I could think of.

“Nope.”

“All right. Who was writing the bugs?” I asked.

“Nobody.”

I was confused.

“The bugs were already there,” one of them explained. “The users had spotted that we were fixing them, and started reporting them. The bug count going up… that was a good thing.”

And I looked at the graph, and suddenly understood. I didn’t know Cynefin back then, and I didn’t understand complexity, but I did understand perverse incentives, and here was a positive version. In retrospect, the cause was obvious. It’s the same reason why crime goes up when policemen patrol the streets: it becomes easier to report it.

Conversely, a good way to have a low bug count is to make bugs hard to report. I spent a good few years working in Waterfall environments, and I can remember the arguments I had about whether something in my work was genuinely a bug or not… arguments which made it much harder for anyone testing my code, and which meant I looked like a good developer (I really wasn’t).

Whenever we do anything in a complex system, we get unexpected side-effects. Another example of this is the Hawthorne effect, which goes something like this:

“Do you work better in this factory if we turn the lights up?”

“Yes.”

“Do you work better if we turn the lights down?” (Just checking our null hypothesis…)

“Yes.”

“What? Um, that’s confusing… do you work better with the lights up, or down?”

“We don’t care; just stop watching us.”

We’ve all come across examples of perverse incentives, which are another kind of unintended consequence. This is what happens when you turn measurements into targets.

When you’re creating a probe, it’s important to have a way of knowing whether it’s succeeding or failing, it’s true… but the signs of success or failure may only be clear in retrospect. A lot of people who create experiments get hung up on one hypothesis, and as a result they obsess over one perceived cause, or one measurement. In the process they might miss signs that the experiment is succeeding or failing, or even misinterpret one as the other.

Rather than having a hypothesis, in complexity, we want coherence – a realistic reason for thinking that the probe might have a good impact, with the understanding that we might not necessarily get the particular outcome we’re thinking of. This is why I get people creating probes to run through multiple scenarios of success or failure, so that they think about what they might want to watch, or what appropriate signals they can put in place, to which they can apply some common sense in retrospect.

As we’ve seen, watching is itself an intervention… so you probably want to make sure it’s safe-to-fail.

Posted in complexity, cynefin | 4 Comments

BDD: A Three-Headed Monster

Back in Greek mythology, there was a dog called Cerberus. It guarded the gate to the underworld, and it had three heads.

There was a great guy called Heracles (Hercules in Latin) who was a demi-god, which means he would have been pretty awesome if the gods hadn’t intervened to make him go mad and kill his wife and family. Greek gods in general aren’t very nice, and you wouldn’t want them to visit at Christmas.

Heracles ended up atoning for this with twelve tasks, the last of which was to capture Cerberus himself. (It was meant to be ten tasks, but his managers decided that he had collaborated with someone from another team on one of them, and got paid by an external stakeholder for a second, so they didn’t count.)

Fortunately, it turns out that BDD’s a bit easier to tame than Cerberus, and it works best if you involve other people from the outset.

The first thing we do with BDD is have a bit of a conversation. If I know nothing about a project, I’ll ask someone to tell me about it, and whenever I think it’s useful, I ask the magic question: “Can you give me an example?”

If they aren’t specific enough – for instance, they say that there are these twelve labours they have to do – I’ll get them to be more specific. “Can you give me an example of a labour?” Labour to me means Jeremy Corbyn, not wrestling the Nemean Lion, so having these conversations helps me to find out about any misunderstandings early.

Eventually, we end up with something we can think about in concrete terms. There’s a template that we use in BDD – Given a context, when an event happens, then an outcome should occur – but even without the template, having a concrete example which starts from some context, has an event happen and ends with a desired outcome (or the best possible outcome that we can reasonably expect to happen, at least) is useful. It lets us talk about those scenarios, and ask questions about other scenarios that might exist.
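To make that concrete, a scenario for Heracles’ final labour might look something like this (the details are mine, invented for illustration):

Given Cerberus is guarding the gate to the underworld
When Heracles wrestles him without weapons
Then Cerberus should be subdued, ready to be carried up to the mortal world.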

Exploration by Example (what it could do)

I like to ask two questions, the patterns for which I call Context Questioning, and Outcome Questioning. “Is there any other context, which, for the same event, gives us a different outcome?” And, “Is there any other outcome that’s important?”

The first is easy. We can usually think of ways that our outcome might be thwarted, and if we can’t, then our testers can, because testers have that “break all the things!” mindset that causes them to think of scenarios that might go a different way to the way we expect.

The second is especially useful though, because it lets us talk about what other stakeholders might need to be involved, and what they need. This is particularly important if there’s a transaction happening – both stakeholders get what they want from it, or all three if you’re buying a product or service via a third party like Uber. Without it, you might end up delivering one stakeholder’s desired outcome but missing the other’s.

“Why did they leave that wooden horse out there anyway?”

“We should ask a tester. Do we have any left that haven’t been eaten by sea-serpents?”

With these two questioning patterns, we can start to explore the scope of our requirements. We now know what the system could do. What should it do? By deciding which scenarios are out of scope, we narrow down the requirements. If we don’t know enough to decide what our scenarios ought to look like, we can either do something to try it out in a way that’s safe-to-fail and get some understanding (useful if nobody in the organization has ever done it before) or we can find an expert to talk to (if they’re available).

If you find you never talk about scenarios which you then decide – either quickly or later on – are out of scope or irrelevant, you may not be exploring enough.

Specification by Example (what it should do)

Now we’ve narrowed it down to what it should do, and we’ve got some scenarios to illustrate those aspects of behaviour, we can start turning the ideas into reality. The scenarios give us a focus, and let us continue to ask questions. If we ever come across an area we’re not familiar with, we can fall back into exploration, until we have understanding and can specify once more what our system should do.

“So, if we frown really fiercely and heavily at Charon, he should let us past to get to Cerberus. What does a really fierce frown look like? Can you give me an example?”

“Well, imagine someone just used the words ‘target velocity’ in your hearing…”

“Oh, like this?”

And, once we’ve made our ideas into something real, we can use our scenarios to see whether it does what we thought it should do – a test.

Test by Example (what it does)

We can run through the scenario again, either manually or using automation, to see if it works.

And, ideally, we write the automated version of the test first, so that it gives us a very clear focus and some fast feedback; though if we’re in high uncertainty and keep having to fall back to exploration, we might want to lay off that for a bit.
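If we’re automating with something like Python’s behave – just one possible toolset; Cucumber, JBehave, SpecFlow and friends follow the same shape – the glue between scenario steps and code might look roughly like this sketch (the domain objects are invented for illustration):

from behave import given, when, then

# Stand-in domain objects, purely for this sketch
class Cerberus:
    def __init__(self):
        self.subdued = False

class Heracles:
    def wrestle(self, beast):
        beast.subdued = True  # in this sketch, our hero always wins
        return beast

# Step definitions live in features/steps/, and wire scenario text to code
@given('Cerberus is guarding the gate to the underworld')
def cerberus_is_guarding(context):
    context.cerberus = Cerberus()

@when('Heracles wrestles him without weapons')
def heracles_wrestles(context):
    context.result = Heracles().wrestle(context.cerberus)

@then('Cerberus should be subdued, ready to be carried up to the mortal world')
def cerberus_is_subdued(context):
    assert context.result.subdued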

“So, what did happen?”

“Um, I chopped off a head, and it grew back. Twice…”

“Oh, sugar. Hold on… I think Jason ran into the same behaviour last week. I can’t remember whether it was intended or not…”

And that’s it.

Some people really focus on the tools and testing. Others focus on specification. Really, though, it’s exploration that we start with; those thought-experiments that are cheap to change, and not nearly as dangerous as the real thing.

In our software, we’re not even dealing with three-headed dogs or mythical monsters; just people who want things. They aren’t even intent on making it hard for us. Even better, those people often find value in small things, and don’t need us to finish all twelve tasks to have a completed story. It’s pretty easy to explore what they want using examples, then use the results of that conversation as specifications, then as tests.

Even so, if you do manage to tame BDD, with all three of its heads, you’re still pretty awesome.

Just remember that a puppy is not just for Christmas.

Posted in bdd, deliberate discovery, stories, testing | 4 Comments

Capabilities and Learning Outcomes

When I started training, I taught topics. Lots of topics!

Nowadays, thanks to some help from Marian Willeke and her incredible understanding of how adults learn, I get to teach capabilities instead. It’s much more fun. This is how I do it.

First off, because I’m into BDD and hypnosis, I sit and imagine some scenarios in which people actually use the learning I’ve given them. Maybe they’re the Product Owner of a Scrum Team, or using BDD for the first time, or they have a good understanding of Agile, and now they’re learning how to coach. I watch them in my head and look at what they do, or I think about what I’ve done, in similar situations.

As with all scenarios, the event that’s happening requires capabilities: the ability to do something, and do it well.

So, for instance, I imagine a team sitting together in a huddle, talking through BDD’s scenarios. Well, you’ll need to be able to use the different strengths of the different roles. And you’ll need to be able to construct well-formed scenarios, and to differentiate between acceptance criteria and a specific example.

If I get stuck thinking about what capabilities I need to teach, I go and look at Bloom’s Taxonomy, and the Revised Cognitive Domain – I really like Don Clark’s site. Marian gives some advice: when you’re teaching adults, aim higher than merely remembering, and give them something they can actually do with it. The keywords help me to think about the level of expertise that the learners will need to get to (though I don’t always stick to them).

So for instance, I end up with capabilities like these:

  • Explain BDD and its practices
  • Apply shortcuts to well-understood requirements to reduce analysis and planning time
  • Identify core and incidental stakeholders for a project

If I’m training, I use these in conjunction with a bit of teaching, then games or exercises that help attendees really experience the things they’re able to do for the first time, and give me a chance to help them if I see they need it. The learning outcomes make a great advert for the course, too! And I use them as a backlog while I’m running the course, so I always know what’s done and what’s next.

More recently, I’ve been using this technique to put together documents which serve the same purpose for people I can’t train directly. I put the learning outcomes at the start: “When you’ve read this, you will be able to…” It’s fun to relate the titles of each section back to the outcomes at the beginning! And, of course, each capability is an embedded command to someone to actually try out the new skill.

Best of all, each capability comes with its own test. As the person writing the course or document, I can think to myself, “If my student goes on this course or reads this document, will they be able to do this thing?”

And, if they do actually take the course, I can ask them directly: “Do you feel confident now about doing this thing?” It gives me a chance to go over material if I have time, or to offer follow-up support afterwards (which I generally offer with all my courses, anyway).

You can read more about Bloom’s Taxonomy, and see the backlog for one of my BDD courses, on Marian’s site.

Now you should be able to create courses using capabilities, instead of topics. Hopefully you really want to, as well… but the Affective Domain, and what you can do with it, is a topic for another post.

Posted in bdd, learning models | 3 Comments

On Epiphany and Apophany

We probe, then sense, then respond.

If you’re familiar with Cynefin, you know that we categorize the obvious, analyze the complicated, probe the complex and act in chaos.

You might also know that those approaches to the different domains come with a direction to sense and respond, as well. In the ordered domains – the obvious and complicated, in which cause and effect are correlated – we sense first, then we categorize or analyze, and then we respond.

In the complex and chaotic domains, we either probe or act first, then sense, then respond.

Most people find action in chaos to be intuitive. It’s a transient domain, after all; it resolves itself quickly, and it might not resolve itself in your favour… and is even less likely to do so if you don’t act (the shallow dive into chaos notwithstanding). We don’t sit around asking, “Hm, I wonder what’s causing this fire?” We focus on putting the fire out first, and that makes sense.

But why do we do this in the complex domain? Why isn’t it useful to make sense of what we’re seeing first, before we design our experiments?

As with many questions involving human cognition, the answer is: cognitive bias.

We see patterns which don’t exist.

The term “epiphany” can be loosely defined as that moment when you say, “Oh! I get it!” because you’ve got a sudden sense of understanding something.

The term “apophany” was originally coined as a German word for the same phenomenon in schizophrenic experiences; that moment when a sufferer says, “Oh! I get it!” when they really don’t. But it’s not just schizophrenics who suffer from this. We all have this tendency to some degree. Pareidolia, the tendency to see faces in objects, is probably the best-known type of apophenia, but we see patterns everywhere.

It’s an important part of our survival. If we learn that the berry from that tree with those type of leaves isn’t good for us, or to be careful of that rock because there are often snakes sunning themselves there, or to watch out for the slippery moss, or that the deer come down here to drink and you can catch them more easily, then you have a greater chance of survival. We’re always, always looking out for patterns. In fact, when we find them, it’s so enjoyable that this pattern-learning, and application of patterns in new contexts, forms the heart of video games and is one reason why they’re horribly addictive.

In fact, our brains reward us for almost seeing the pattern, which encourages us to keep trying… and that’s why gambling is also addictive, because a lot of the time, we almost win.

In the complex domain, cause and effect can only be understood in retrospect.

This is pretty much the definition of a complex domain; one in which we can’t understand cause and effect until after we’ve caused the effect. Additionally, if you do the same thing again and again in a complex domain, it will not always have the same effect each time, so we can’t be sure of which cause might give us the effect. Even the act of trying to make sense of the domain can itself have unexpected consequences!

The problem is, we keep thinking we understand the problem. We can see the root causes. “Oh! I get it!”… and off we blithely go to “fix” our systems.

Then we’re surprised when, for instance, complexity reasserts itself and making our entire organization adopt Scrum doesn’t actually enable us to deliver software like we thought it would (though it might cause chaos, which can give us other opportunities… if we survive it).

This is the danger of sensing the problem in the complex domain; our tendency to assume we can see the causes that we need to shift to get the desired effects. And we really can’t.

The best probes are hypothesis-free.

Or rather, the hypothesis is always, “I think this might have a good impact.” Having a reasonable reason for thinking this is called coherence. It’s really hard, though, to avoid tacking on, “…because this will be the outcome.” In the complex domain, you don’t know what the outcome is going to be. It might not be a good outcome. That’s why we spend so much time making sure our probes are safe-to-fail.

I’ve written a fair bit on how to use scenarios to help generate robust experiments, but stories – human tales of what’s happening or has happened – are also a good way to find places that probes might be useful.

Particularly, if you can’t avoid having a hypothesis around outcomes (and you really can’t), one trick you can try is to have multiple outcomes. These can be conflicting, to help you check that you’re not hung up on any one outcome, or even failure outcomes that you can use to make sure your probe really is safe-to-fail.

Having multiple hypotheses means we’re more likely to find other things that we might need to measure, or other things that we need to make safe.

I really love Sensemaker.

Cognitive Edge, founded by Dave Snowden of Cynefin fame, has a really lovely bit of software called Sensemaker that collects narrative fragments – small stories – and allows the people who write those stories to say something about their desirability using Triads and Dyads and Stones.

Because we don’t know whether a story is desirable or not, the Triads and Dyads that Sensemaker uses are designed to allow for ambiguity. They usually consist of either two or three things that are all good, all bad or all neutral.

For instance, if I want to collect stories about pair programming, I might use a Dyad which has “I want to pair-program on absolutely everything!” at one end, and “I don’t want to pair-program on anything, ever,” at the other. Both of those are so extreme that it’s unlikely anyone wants to be right at either end, but they might be close. Or somewhere in the middle.

In CultureScan, Cognitive Edge use the triad, “Attitudes were about: Control, Vulnerability, or Indifference.” You can see more examples of triads, together with how they work, in the demo.

If lots and lots of people add stories, then we start seeing clusters of patterns, and we can start to think of places where experiments might be possible.

A fitness landscape from Cognitive Edge shows loose and tightly-bound clusters, together with possible directions for movement.

In the fitness landscapes revealed by the stories, tightly-bound clusters indicate that the whole system is pretty rigidly set up to provide the stories being seen. We can only move them if there’s something to move them to; for instance, an adjacent cluster. Shifting these will require big changes to the system, which means a higher appetite for risk and failure, for which you need a real sense of urgency.

If you start seeing saddle-points, however, or looser clusters… well, that means there’s support there for something different, and we can make smaller changes that begin to shift the stories.

By looking to see what kind of things the stories there talk about, we can think of experiments we might like to perform. The stories, though, have to be given to the people who are actually going to run the experiments. Interpreting the stories for them, or suggesting experiments, is heading into analysis territory, which won’t help! Let the people on the ground try things out, and teach them how to design great experiments.

A good probe can be amplified or dampened, watched for success or failure, and is coherent.

Cognitive Edge have a practice called Ritual Dissent, which is a bit like the “Fly on the Wall” pattern but done in a deliberately negative way: the group to whom the experiment is being presented critiques it against the criteria above. I’ve found that testers, with their critical, “What about this scenario?” mindsets, can really help to make sure that probes really are good probes. Make sure the person presenting can take the criticism!

There’s a tendency in human beings, though, to analyze their way out of failure; to think of failure scenarios, then stop those happening. Failure feels bad. It tells us that our patterns were wrong! That we were suffering from apophany, not epiphany.

But we don’t need to be afraid of apophany. Instead of avoiding failure, we can make our probes safe-to-fail; perhaps by doing them at a scale where failure is survivable, or with safety nets that turn commitments into options instead (like having roll-back capability when releasing, for instance), or – my favourite – simply avoiding the trap of signalling intent when we didn’t mean to, and instead, communicating to people who might care that it’s an experiment we want to try.

And that it might just make a difference.

Posted in cynefin, real options, uncertainty | 9 Comments

Negative Scenarios in BDD

One problem I hear repeatedly from people is that they can’t find a good place to start talking about scenarios.

An easy trick is to find the person who fought to get the budget for the project (the primary stakeholder) and ask them why they wanted the project. Often they’ll tell you some story about a particular group of people that they want to support, or some new context that’s coming along that the software needs to work in. All you need to do to get your first scenario in that situation is ask, “Can you give me an example?”

When the project or capability is about non-functionals such as security or performance, though, this can be a bit trickier.

I can remember when we were talking to the Guardian editors about performance on the R2 project. “When (other national newspaper) went live with their new front page,” one editor explained, “their site went down for 3 days under the weight of people coming to look at the page. We don’t want that to happen.”

Or, as another organization said, “We went live, and it crashed. It took three months to get the site up and running again. The code was so awful we couldn’t fix it.”

These kinds of negative stories are often drivers, particularly when there are non-functionals involved. You can always handle the requirements through monitoring instead of testing, but the conversation can’t follow the usual “Can you give me an example?” pattern, because all the examples are things that people don’t want.

Instead, keep that negativity, and ask questions like, “What performance would we need to have to avoid that happening to us? Do we have a good security strategy for avoiding the hacking attempt that ended up with (major corporation)’s passwords getting stolen? How do we make sure we don’t crash when we go live?” Keep the focus on the negatives, because that’s what we want to avoid.

When you come to write the scenarios down, whether it’s in terms of monitoring or a test, it’s often worth keeping that negative around. You can create positive scenarios to look at the monitoring boundaries, but the negative reminds people why they’re doing this.

Given we’ve gone live with the front page
When Tony Blair resigns on the same day
Then the site shouldn’t go down under the weight of people reading that news.
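A positive counterpart, looking at the monitoring boundaries, might be something like this (the numbers are invented; use whatever your stakeholders actually sign up to):

Given we’ve gone live with the front page
When 10,000 people request it within one minute
Then 95% of them should get the page in under two seconds.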

Remember that if you’re the first people to solve the problem then you’ll need to try something out, but if it’s just an industry standard practice, make sure you’ve got someone on the team who’s implemented it before.

Part of the power of BDD’s scenarios is that they provide examples of why the behaviour is valuable. You’ll need to convert this to positive behaviour to implement it, but if avoiding the negatives is valuable, include those too, even if it’s just text on a wiki or blurb at the top of a feature file… and don’t be afraid to start there. Negative scenarios are hugely powerful, especially since they often have the most interesting stories attached to them.

Posted in bdd | 3 Comments

140 is the new 17

I took a break from Twitter.

A while back, I ran a series of talks on Respect for People. I talked about systems which encourage respect being those which are constrained or focused, transparent, and forgiving. I also outlined one system which had none of those, and would therefore have a tendency towards disrespect: Twitter.

Twitter is unforgiving, because it’s public. It isn’t transparent; 140 characters isn’t enough to even get a hint of motivation. It isn’t constrained, except by its 140 characters. And, in a week in which a lot of stuff happened offline, followed by yet another Twitter conversation that left me wondering why I bothered, I’d had enough.

I decided to take a break to work out what it was I actually valued about Twitter, and why I kept coming back to it, even though I knew it was the system at fault, not the people.

I worked it out. I know, now, what it is I’m looking for. And the best way to explain it is to look to a similarly constrained system: the Haikai no Renga.

I want more Haikai no Renga.

Back in the old days, the Japanese used to get together and have dinner parties.

At these dinner parties, they had a game they played. Do you remember those childhood games where one person would write a line of a story, then the next person would write a line then fold it over, then the next person, so they could only see the line that came before? And then, when the whole story unfolds, there are usually enough coincidences and ridiculousness to make everyone laugh?

This game was a bit like that, but it always started with a poem; and the poem was usually 17 syllables long, arranged as lines of 5, 7, 5 syllables.

Then the next person would add to the poem. Their addition would be 7, 7. The two verses, together, form a tanka. Then the next person would add a 5, 7, 5; and the next a 7, 7, and so on. The poem would grow and change from its original context, and everyone would laugh and have a good time.

This game became so popular that the people who were good at starting the game got called on for their first verse a lot, so they started collecting those verses. Finally, one guy called Shiki realised that those first verses were themselves an art form, and he coined the term haiku, being a portmanteau pun on the haikai no renga, meaning a humorous linked poem, and hokku, meaning first.

In Japanese, haiku work really well with 17 syllables; but in English I think they actually work better with about 13 to 15. That’s because there are aspects of haiku which are far more important than the syllable count.

There’s a word which sets some context, usually seasonal, called the kigo. More important than that, though, is the kireji, or cutting word. This word juxtaposes two different concepts within those three short lines, and shows how they’re related. We sometimes show it in English with a bit of punctuation.

The important thing about the concepts is that they should be a little discordant. There should be context missing. The human brain is fantastic at creating relationships and seeing patterns, and it fills in that missing context for itself, creating something called the haiku moment.

My favourite example of this came from someone called Michael in one of my early workshops:

hot air
rushing past my face –
mind the gap!

(Michael, if you’re out there, please let me know your surname as I use this all the time and would like to credit you properly.)

If you live in London like I do, having a poem like that, with the subject matter outlined but not filled in, can make you feel suddenly as if you’re right there on the platform edge with the train coming in. You make the poem. It’s the same phenomenon that makes the scenes in books so much better than the movies: your involvement in creating them really makes them come to life.

But because there’s not enough context, and because one person’s interpretation can differ from another, the haikai no renga changes. It moves from one subject to another. It can be surprising. It can be funny. It can be joyful. Each verse builds on what’s come before, and takes it into a new place.

Haikai no Renga is a bit like the “Yes, and…” game.

In the “Yes, and…” improv game, each person builds on what the person before has said.

There was a man who lived in a house with a green door.
Yes, and the door knocker was haunted.
Yes, and it was the ghost of a cat who used to live there.
Yes, and…

In the “Yes, and…” game, nobody calls out the person before. Nobody says, “Wait a moment! How can a door knocker be haunted? That makes no sense!”

Instead, they run with the ridiculousness. They make it into something playful, and carry it on, and gradually the sense emerges.

They don’t say, “No. You’re wrong. Stop.” They say, “Hah, that’s funny. Let’s run with that and see where it takes us.”

If people were to play this game on Twitter, they might do it by making requests for more details. “Yes, and what was it haunted by?” “Yes, and did the man own the cat?”

Or they might provide links to enlightening information. “Yes, and have you read the stats on door knocker hauntings?” “Yes, and do you remember that film where the door knockers spoke in riddles?”

Or they might just add another line, and see where it leads.

It starts with a Haiku.

Like the “Yes, and…” game, and like Improv, the haiku which kicks off the haikai no renga has to be an offer; something that the other players or poets can actually play on.

If a haiku were insulting to the other poets, or had some offensive content, then I imagine the poets wouldn’t play, and probably quite rightly you’d get an argument instead of a poem. Equally, though, if the haiku is too perfect, and too complete, then there’s no room for interpretation. There’s no room for other poets to look good, then make their own offers.

And this is where I think I can make a change to the way I’m using Twitter. If I state something as if it’s a fact, or I don’t leave the right kind of opening for conversation, then I won’t get the renga I’m looking for. It doesn’t even matter whether it’s true or not, or whether there’s context missing that other people don’t know about, or whether it’s a quote from someone else; if it’s not an offer, I won’t get the renga. If I’m writing something to make myself, or one of my friends, look knowledgeable and wise, I won’t get the renga. If I’m being defensive, I won’t get the renga. If I’m engaging someone who obviously doesn’t want the renga, well, then I won’t get the renga.

And I want the renga.

So that’s what I’m going to be trying to do. I’ll try to make tweets, in future, which are offers to you, my fellow Tweeters, to join in the game and play with the conversation and see if it takes us somewhere surprising and joyful. I’ll try to make tweets which invite the “Yes, and…” rather than the “No, but…”. And if I do happen to get the “No, but…”, as I’m sure will happen, then I’ll work out what to do at that point. At least, now, I know better what it is that I want.

If you want to play, come join me.

It’s just a tweet… but it could be poetry.

Posted in life | Tagged , , | 3 Comments

The Estimates in #NoEstimates

A couple of weeks ago, I tweeted a paraphrase of something that David J. Anderson said at the London Lean Kanban Day: “Probabilistic forecasting will outperform estimation every time”. I added the conference hashtag, and, perhaps most controversially, the #NoEstimates one.

The conversation blew up, as conversations on Twitter are wont to do, with a number of people, perhaps better schooled in mathematics than I am, claiming that the tweet was ridiculous and meaningless. “Forecasting is a type of estimation!” they said. “You’re saying that estimation is better than estimation!”

That might be true in mathematics. Is it true in ordinary, everyday English? Apparently, so various arguments go, the way we’re using that #NoEstimates hashtag is confusing to newcomers and making people think we don’t do any estimation at all!

So I wanted to look at what we actually mean by “estimate”, when we’re using it in this context, and compare it to the “probabilistic forecasting” of David’s talk.

Defining “Estimate” in English

While it might be true that a probabilistic forecast is a type of estimate in maths and statistics, the commonly used English definitions are very different. Here’s what Wikipedia says about estimation:

Estimation (or estimating) is the process of finding an estimate, or approximation, which is a value that is usable for some purpose even if input data may be incomplete, uncertain, or unstable.

And here’s what it says about probabilistic forecasting:

Probabilistic forecasting summarises what is known, or opinions about, future events. In contrast to a single-valued forecast … probabilistic forecasts assign a probability to each of a number of different outcomes, and the complete set of probabilities represents a probability forecast.

So an estimate is usually a single value, and a probabilistic forecast is a range.

Another way of phrasing that tweet might have been, “Providing a range of outcomes along with the likelihood of those outcomes will lead to better decision-making than providing a single value, every time.”

And that might have been enough to justify David’s assertion on its own… but it gets worse.

Defining “Estimate” in Agile Software Development

In the context of Software Development, estimation has all kinds of horrible connotations. It turns out that Wikipedia has a page on Software Development Estimation too! And here’s what it says:

Software development effort estimation is the process of predicting the most realistic amount of effort (expressed in terms of person-hours or money) required to develop or maintain software based on incomplete, uncertain and noisy input.

Again, we’re looking at a single value; but do notice the uncertainty there. Here’s what the page says later on:

Published surveys on estimation practice suggest that expert estimation is the dominant strategy when estimating software development effort.

The Lean / Kanban movement has emerged (and possibly diverged) from the Agile movement, in which this strategy really is dominant, mostly thanks to Scrum and Extreme Programming. Both of these suggest the use of story points and velocity to create the estimates. The idea of this is that you can then use previous data to provide a forecast; but again, that forecast is largely based on a single value. It isn’t probabilistic.

Then, too, the “expertise” of the various people performing the estimates can often be questionable. Scrum suggests that the whole team should estimate, while XP suggests that developers sign up to do the tasks, then estimate their own. XP, at least, provides some guidance for keeping the cost of change low, meaning that expertise remains relevant and velocity can be approximated from the velocity of previous sprints. I’d love to say that most Scrum teams are doing XP’s engineering practices for this reason, but a lot of them have some way to go.

I have a rough and ready scale that I use for estimating uncertainty, which helps me work out whether an estimate is even likely to be based on expertise. I use it to help me make decisions about whether to plan at all, or whether to give something a go and create a prototype or spike. Sometimes a whole project can be based on one small idea or piece of work that’s completely new and unproven, the effort of which can’t even be estimated using expertise (because there isn’t any), let alone historical metrics.

Even when we have expertise, the tendency is for experts to remember the mode, rather than the mean or median value. Since we often make discoveries that slow us down but rarely make discoveries which speed us up, we are almost inevitably over-optimistic. Our expertise is not merely inaccurate; it’s biased and therefore misleading. Decisions made on the basis of expert estimates have a horrible tendency to be wrong. Fortunately everyone knows this, so they include buffers. Unfortunately, work tends to expand to fill the time available… but at least that makes the estimates more accurate, right?

One of the people involved in the Twitter conversation suggested we should be using the word “guess” rather than “estimate”. That might indeed be mathematically more precise; and if we called them that, people might start looking for different ways to inform the decisions we need to make.

But they don’t. They’re called “estimates” in Scrum, in XP, and by just about everyone in Agile software development.

But it gets worse.

Defining “Estimate” in the context of #NoEstimates

Woody Zuill found this very early tweet from Aslak Hellesøy using the #NoEstimates hashtag, possibly the first:

@obie at #speakerconf: “Velocity is important for budgeting”. Disagree. Measuring cycle time is a richer metric. #kanban #noestimates

So the movement started with this concept of “estimate” as the familiar term from Scrum and XP. Twitter being what it is, it’s impossible to explain all the context of a concept in 140 characters, so a certain level of familiarity with the ideas around that tag is assumed. I would hope that newcomers to a movement would approach it with curiosity, and hopefully this post will make that easier.

Woody confessed to being one of the early proponents of the hashtag in the context of software development. In his post on the #NoEstimates hashtag, he defines it as:

#NoEstimates is a hashtag for the topic of exploring alternatives to estimates [of time, effort, cost] for making decisions in software development.  That is, ways to make decisions with “No Estimates”.

And later:

It’s important to ask ourselves questions such as: Do we really need estimates? Are they really that important?  Are there options? Are there other ways to do things? Are there BETTER ways to do thing? (sic)

Woody, and Neil Killick who is another proponent, both question the need for estimates in many of the decisions made in a lot of projects.

I can remember getting the Guardian’s galleries ready in time for the Oscars. Why on earth were we estimating how long things would take? In retrospect, that time would have been much better spent getting as many of the features complete as we could. Nobody was going to move the Oscars for us, and the safety buffer we’d decided on to make sure that everything was fully tested wasn’t changing in a hurry, either. And yet, there we were, mindlessly putting points on cards. We got enough features out in time, of course, as well as some fun extras… but I wonder if the Guardian, now far more advanced in their ability to deliver than they were in my day, still spend as much time in those meetings as we used to.

I can remember asking one project manager at a different client, “These are estimates, right? Not promises,” and getting the response, “Don’t let the business hear you say that!” The reaction to failing to deliver something to the agreed estimates was to simply get the developers to work overtime, and the reaction to that was, of course, to pad the estimates. There are a lot of posts around on the perils of estimation and estimation anti-patterns.

Even when the estimates were made in terms of time, rather than story points, I can remember decisions being unchanged in the face of the “guesses”. There was too much inertia. If that’s going to be the case, I’d rather spend my time getting work done instead of worrying about the oxymoron of “accurate estimates”.

That’s my rant finished. Woody and Neil have many more examples of decisions that are often best made with alternatives to time estimation, including much kinder, less Machiavellian ones such as trade-off and prioritization.

In that post above, Neil talks about “using empiricism over guesswork”. He regularly refers to “estimates (guesses)”, calling out the fact that we do use that terminology loosely. That’s English for you; we don’t have an authoritative body which keeps control of definitions, so meanings change over time. For instance, the word “nice” used to mean “precise”, and before that it meant “silly”. It’s almost as if we’ve come full circle.

Defining “Definition”

Wikipedia has a page on definition itself, which points out that definitions in mathematics are different to the way I’ve used that term here:

In mathematics, a definition is used to give a precise meaning to a new term, instead of describing a pre-existing term.

I imagine this refers to “define y to be x + 2,” or similar, but just in case it’s not clear already: the #NoEstimates movement is not using the mathematical definition of “estimate”. (In fact, I’m pretty sure it’s not using the mathematical definition of “no”, either.)

We’re just trying to describe some terms, and the way they’re used, and point people at alternatives and better ways of doing things.

Defining Probabilistic Forecasting

I could describe the term, but sometimes, descriptions are better served with examples, and Troy Magennis has done a far better job of this than I ever have. If you haven’t seen his work, this is a really good starting point. In a nutshell, it says, “Use data,” and, “You don’t need very much data.”
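To give a flavour of what “use data” can mean, here’s a minimal sketch of a Monte Carlo forecast in Python – not Troy’s tooling, just the idea underneath it – which resamples a handful of historical cycle times and produces a range of outcomes with likelihoods, rather than a single value:

import random

# Historical cycle times, in days per story (made-up sample data)
cycle_times = [2, 3, 3, 5, 8, 4, 2, 6]
stories_remaining = 20
trials = 10_000

# Simulate many possible futures by resampling from what actually happened
totals = sorted(
    sum(random.choice(cycle_times) for _ in range(stories_remaining))
    for _ in range(trials)
)

# Report the forecast as a range of outcomes with likelihoods
for percentile in (50, 85, 95):
    days = totals[int(trials * percentile / 100) - 1]
    print(f"{percentile}% chance of finishing within {days} days")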

I imagine that when David’s talk is released, that’d be a pretty good thing to watch, too.

Posted in complexity, conference | 37 Comments

A Dreyfus model for Agile adoption

A couple of people have asked for this recently, so just posting it here to get it under the CC licence. It was written a while ago, and there are better maturity models out there, but I still find this useful for showing teams a roadmap they can understand.

If you want to know more about how to create and use your own Dreyfus models, this post might help.

What does an Agile Team look like?

Novice

We have a board
We put our stories on the board
Every two weeks, we get together and celebrate what we’ve done
Sometimes we talk to the stakeholders about it
We think we might miss our deadline and have told our PM
Agile is really hard to do well

Beginner

We are trying to deliver working software
We hold retrospectives to talk about what made us unhappy
When something doesn’t work, we ask our coach what to do about it
Our coach gives us good ideas
We have delegated someone to deal with our offshore communications
We have a great BA who talks to the stakeholders a lot
We know we’re going to miss our deadline; our PM is on it
Agile requires a great deal of discipline

Practitioner

We know that our software will work in production
Every two weeks, we show our working software to the stakeholders
We talk to the stakeholders about the next set of stories they want us to do
We have established a realistic deadline and are happy that we’ll make it
We have some good ideas of our own
We deal with blockers promptly
We write unit tests
We write acceptance tests
We hold retrospectives to work out what stopped us delivering software
We always know what ‘done’ looks like before we start work
We love our offshore team members; we know who they are and what they look like and talk to them every day
Our stakeholders are really interested in the work we’re doing
We always have tests before we start work, even if they’re manual
We’ve captured knowledge about how to deploy our code to production
Agile is a lot of fun

Knowledgeable

We are going to come well within our deadline
Sometimes we invite our CEO to our show-and-tell, so he can see what Agile looks like done well
People applaud at the end of the show-and-tell; everyone is very happy
That screen shows the offshore team working; we can talk to them any time; they can see us too
We hold retrospectives to celebrate what we learnt
We challenge our coach and change our practices to help us deliver better
We run the tests before we start work – even the manual tests, to see what’s broken and know what will be different when we’re done
Agile is applicable to more than just software delivery

Expert

We go to conferences and talk about our fantastic Agile experiences
We are helping other teams go Agile
The business outside of IT is really interested in what we’re doing
We regularly revisit our practices, and look at other teams to see what they’re doing
The company is innovative and fun
The company is happy to try things out and get quick feedback
We never have to work late or weekends
We deploy to production every two weeks*
Agile is really easy when you do it well!

* Wow, this model is old.

Posted in learning models | 5 Comments

What is BDD?

At #CukeUp today, there’s going to be a panel on defining BDD, again.

BDD is hard to define, for good reason.

First, because to do so would be to say “This is BDD” and “This is not BDD”. When you’ve got a mini-methodology that’s derived from a dozen other methods and philosophies, how do you draw the boundary? When you’re looking at which practices work well with BDD (Three Amigos, Continuous Integration and Deployment, Self-Organising Teams, for instance), how do you stop the practices which fit most contexts from being adopted as part of the whole?

I’m doing BDD with teams which aren’t software, by talking through examples of how their work might affect the world. Does that mean it’s not really BDD any more, because it can’t be automated? I’m talking to people over chat windows and sending them scenarios so they can be checked, because we’re never in the same room. Is that BDD? I’m writing scenarios for my own code, on my own, talking to a rubber duck. Is that it? I’m still using scenarios in conversation to explore and define requirements, with all the patterns that I’ve learnt from the BDD world. I’m still trying to write software that matters. It still feels like BDD to me. I can’t even say that BDD’s about “writing software that matters” any more, because I’ve been using scenarios to explore life for a while now.

I expect in a few years the body of knowledge we call “BDD” will also include adoption patterns, non-software contexts, and a whole bunch of other things that we’re looking at but which haven’t really been explored in depth. BDD is also the community; it’s the people who are learning about this and reporting their learning and asking questions, and the common questions and puzzles are also part of BDD, and they keep changing as our understanding grows and the knowledge becomes easier to access and other methods like Scrum and Kanban are adopted and enable BDD to thrive in different places.

Rather than thinking of BDD as some set of well-defined practices, I think of it as an anchor term. If you look up anything around BDD, you’re likely to find conversation, collaboration, scenarios and examples at its core, together with suggestions for how to automate them. If you look further, you’ll find Three Amigos and Outside-In and the Given / When / Then syntax and Cucumber and Selenium and JBehave and Capybara and SpecFlow and a host of other tools. Further still we have Cynefin and Domain Driven Design and NLP, which come with their own bodies of knowledge and are anchor terms for those, and part of their teaching overlaps part of what I teach, as part of BDD, and that’s OK.

That’s why, when I’m asked to define BDD, I say something like, “Using examples in conversation to illustrate behaviour.” It’s where all this started, for me. That’s the anchor. It’s where everything else comes from, but it doesn’t define the boundaries. There are no boundaries. The knowledge, and the understanding, and especially the community that we call “BDD” will keep on growing.

One day it will be big enough that there will be new names for bits of it, and maybe those new names will be considered part of BDD, and maybe they won’t. And when that happens, that should be OK, too.

NB: I reckon the only reason that other methods are defined more precisely is so they could be taught consistently at scale, especially where certification is involved. Give me excellence, diversity and evolution over consistency any day. I’m pretty sure I can sell them more easily… and so can everyone else.

Posted in bdd, conference | 10 Comments

The Shallow Dive into Chaos

For more on the Chaotic domain and subdomains, read Dave Snowden’s blog post, “…to give birth to a dancing star.” The relationship between separating people that I talk about here, and the Chaotic domain, can be seen in Cynthia Kurtz’s work with Cynefin’s pyramids, as seen here.

On one of my remote training courses over WebEx, I asked the participants to do an exercise. “Think of one way that people come to consensus,” I requested, “and put it in the chat box.”

Here’s what I got back…

1st person: Voting

2nd person: Polling

3rd person: Yeah, I’ll go with polling too

And then I had to explain to them why I was giggling so much.

This is, of course, a perfect demonstration of one of the ways in which people come to consensus: by following whoever came first. We might follow the dominant voice in the room, or the person who’s our boss, or the one who brought the cookies for the meeting, or the one that’s most popular, or the one who gets bored or busy least quickly.

We might even follow the person with the most expertise.

MACE: We’ll have a vote.
SEARLE: No. No, we won’t. We’re not a democracy. We’re a collection of astronauts and scientists. So we’re going to make the most informed decision available to us.
MACE: Made by you, by any chance?
SEARLE: Made by the person qualified to understand the complexities of payload delivery: our physicist.
CAPA (Physicist): …shit.

— “Sunshine”

If you’re doing something as unsafe to fail as nuclear payload delivery, getting the expert to make a decision might be wise. (Sunshine is a great film, by the way, if you’re into SF with a touch of horror.)

If you’re doing something that’s complex, however, which means that it requires experiment, the expertise available is limited. Experiments, also called Safe-To-Fail Probes, are the way forward when we want our practices and our outcomes to emerge. This is also a fantastic trick for getting out of stultifying complicatedness or simplicity, and generating some innovation.

But… if you stick everyone in a room and ask them to come up with an experiment, you’ll get an experiment.

It just might not be the best one.

More ideas mean better ideas

In a population where we know nothing about the worth of different ideas, the chance of any given idea being above average is 50%. If we call those “good ideas”, then we’ve got a 50% chance of something being good.

Maybe… just maybe… the idea that the dominant person, or the first person, or the expert, or the person with the most time comes up with will be better than average. Maybe.

But if you have three ideas, completely independently generated, what’s the chance of at least one of them being above average?

Going back to my A-level Maths… it’s 1 – (chance of all ideas being below average) which is 1 – (1/2 x 1/2 x 1/2) which is 87.5%.

That’s a vast improvement. Now imagine that everyone has a chance to generate those ideas.
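If you want to check the arithmetic, or see how quickly the odds improve as more independent ideas are added, the sum generalizes easily (a quick sketch in Python):

# Chance that at least one of n independent ideas is above average
for n in (1, 2, 3, 5, 10):
    chance = 1 - 0.5 ** n
    print(f"{n} independent ideas: {chance:.1%} chance of at least one good one")

# 1 idea: 50.0%; 3 ideas: 87.5%; 10 ideas: 99.9%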

If you want better ideas, stop people gaining consensus too early.

For this to work, the experiments that people come up with have to be independent. That means we have to separate people.

Now, obviously, if you have a hundred and twenty people and want to get experiments from them, you might not have time to go through a hundred and twenty separate ideas. We still want diversity in our ideas, though (and this is why diversity is so important for innovation: it gives you different ideas).

So we split people into homogeneous groups.

This is the complete opposite of Scrum’s cross-functional teams. We want diversity between the groups, not amongst them. This is a bit like Ronald S. Burt’s “Structural Holes” (Jabe Bloom’s LKUK13 talk on this was excellent); innovation comes from the disconnect; from deliberately keeping people silo’d. We put all the devs together; the managers together; the senior staff together; the group visiting from Hungary together; the dominant speakers together… whatever will give you the most diversity in your ideas.

Once people have come up with their experiments, you can bring them back together to work out which ones are going to go ahead. Running several concurrently is good too!

If you’ve ever used post-its in a retrospective, or other forms of silent work to help ensure that everyone’s thoughts are captured, you’re already familiar with this. Silent work is an example of the shallow dive into chaos!

Check that your experiments are safe-to-fail

Dave Snowden and Cognitive Edge reckon you need five things for a good experiment:

  • A way to tell it’s succeeding
  • A way to tell it’s failing
  • A way to amplify it
  • A way to dampen it
  • Coherence (a reason to think it might produce good results).

If you can think of a reason why your experiment might fail, look to see if that failure is safe; either because it’s cheap in time or effort, or because the scale of the failure is small. The last post I wrote on using scenarios for experiment design can help you identify these aspects, too.

An even better thing to do is to let someone else examine your ideas for experiments. Cognitive Edge’s Ritual Dissent pattern (requires free sign-up) is fantastic for that; it’s very similar to the Fly-On-The-Wall pattern from Linda Rising and Mary Lynn Manns’ “Fearless Change”.

In both patterns, the idea is presented to the critical group without any questions being asked, then critiqued without eye contact; usually done by the presenter turning around or putting on a mask. Because as soon as we make eye contact… as soon as we have to engage with other people… as soon as we start having a conversation, whether with spoken language or body language… then we automatically start seeking consensus.

And consensus isn’t always what you want.

Posted in complexity, cynefin | 3 Comments