A long time ago, before most people (including me) had ever heard of the concept of story points, I came in as the CTO at a major social networking site. The dev team, even though staffed with a lot of excellent developers, had experienced enormous historical difficulty in delivering according to expectations, theirs or anyone else’s. People both inside and outside of the team complained that the team wasn’t delivering big projects on a timely basis, plus there were a lot of small-but-important items that never got done because the team was focused on larger work.
What’s the team’s capacity, I asked? How much can it reasonably take on before it becomes too much? How do we viably fit in smaller items alongside the major initiatives, instead of it being an either/or? No one really knew, or had even thought much about, what seemed like natural (even mandatory) questions to be asking.
At the time, I declared that it seemed like we just needed some abstract unit of capacity (I jokingly proposed the first Carrollian word that popped into my head: Quocknipucks) that could be used to help us “fill up the jar” with work items, large and small, without overfilling it. Each item would be valued in terms of its number of Quocknipucks, representing some approximation of size, and we’d come up with a total team capacity for a given time frame by using the same invented Quocknipuck units, which we would adjust as we gained experience with the team, the platform, the flow.
Little did I know that I was independently coming up with the basic idea behind story points. Interestingly, the term I chose was deliberately whimsical, to separate the concept from things in the real world like the actual amount of time needed for any particular item.
Here’s what I’ll argue: the basic idea behind story points is sound, and useful; yet, somehow, a certain set of Agilists has now come to reject story points entirely, even referring to them (wrong-headedly, and with considerable overstatement) as “widely discredited”.
Not only do I tend not to like bandwagons, I especially dislike reactive backlashes that become anti-bandwagons, as it were, such as the vitriol that is now often directed at story points. So, let’s discuss my now-contrarian view that story points were actually a good and reasonable idea to begin with, and how they can be fruitfully used without (as seems to be the dire stance about them these days) somehow sapping the very life motivation of a team.
What story points really are
A story point represents, in essence, a unit or portion of total team capacity for a given time period. It bakes in some rough assessment of the level of effort needed. But that assessment is not purely about size: the greater the risk, complexity, or uncertainty involved in the item, the higher the likelihood that the item will consume more of the team’s capacity, and so the greater the number of story points assigned.
Note that a story point assessment of an item is an estimate, not perfect. It’s not intended (at all) to be used directly to predict specific delivery time/date of the item. In fact, I’d argue (contrary to many Agile pundits) that story points don’t even make sense to think of as a unit of time, any more than gas tank capacity units really represent miles driven. They’re capacity units, not time units. They intentionally “abstract away” the time element, for solid reasons discussed below.
Common problems that story points solve
Here’s why story points made sense to invent as a concept/approach. There are two huge and intertwined problems to solve when it comes to planning what work items a team should take on for a specific period of time:
- we don’t know the team’s overall capacity to take on work
- we don’t know the size of each proposed item. And most people intuitively understand that when you’re filling a fixed-dimension bucket, the size of each item you put into the bucket matters, a lot.
Let’s double-click on each of those problems, and examine how story points fit as a solution, particularly as compared to the alternatives commonly used or proposed.
Problem #1: we don’t know/can’t precisely measure the actual capacity of the team: i.e., the total amount of work the team can viably take on for a specific period of time.
Approaches to solve:
- “Just count stories” and track how many the team actually does: there’s your capacity, at least for moving forward. But wait: stories aren’t all equally sized, and can in fact vary drastically in size from item to item. A team might finish a dozen small stories in a sprint, or just a couple of larger ones. (Note that #NoEstimates advocates disagree with this fundamental point: they appear to believe that all stories, no matter what, can be sliced into essentially equal-sized (and independent!) pieces. But no actual examples ever seem to be provided that would demonstrate that this fervent belief in “slice size equality” is well-founded. In my decades of software development, I’ve not worked on an application where desired stories, even when sliced, were even close to either equal-sized or independent, much less both at once. See my example below of Twitter’s tweet-length change.)
- Or: use story points, methodically assigned based on the gauged size/complexity/risk of each item, and track how many total story points the team typically accomplishes in an equal-length sprint. That’s your capacity, and it allows you to take item size differences appropriately into account as you plan your next sprint.
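To make that concrete, here’s a minimal sketch (Python, with invented numbers and hypothetical names, not a prescription of any particular tool or formula) of deriving a planning capacity from what the team actually finished in recent, equal-length sprints:

```python
# A minimal sketch with invented numbers: derive a planning capacity for the
# next sprint from the total story points the team actually completed in
# recent, equal-length sprints.

completed_points_by_sprint = [23, 31, 27, 25]  # hypothetical recent history

def planning_capacity(history, window=3):
    """Average the most recent sprints, so the estimate adjusts as the team,
    the platform, and the flow of work change."""
    recent = history[-window:]
    return sum(recent) / len(recent)

print(planning_capacity(completed_points_by_sprint))  # ~27.7 points
```

The exact averaging rule matters far less than where the number comes from: observed completion of sized items, not anyone’s wishes.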
Problem #2: we don’t really know the size of each item. Why does that matter? Think capacity planning: again, when you’re filling up a finite-sized bucket, you’d better take into account the sizes of the various items you’re putting into it.
Approaches to solve:
- Estimate the likely hours of work entailed for each item, and use “people multiplied by available work hours” as the total capacity against which those estimates are tallied. This has proved problematic in many ways. People aren’t good at gauging the time (in actual time units) that it will take to do something. And the time required for an item can vary, often drastically, depending on who does it. So most people now recognize that this isn’t a viable approach.
Slicing? But stories aren’t sushi
- Or, slice apart every item so that the resulting slices are approximately equal in size, and small. But in the end, that basically recreates story points, in that it answers the critical question: how big is the original item? I.e., this particular item is 5 stories once sliced; that one is 8 stories. Moreover, this approach raises another issue: you don’t have the luxury of doing that depth of analysis on all items up front, and even if you did, you’d be doing it too soon, because a) things change as development progresses on a system; and b) you’ll have whole (original) items that you end up de-scoping from delivery entirely, meaning that the slicing work was wasted.
Slicing and counting as an approach also tends to assume fungibility and equality of size of each resulting slice, with few if any interdependencies, dismissing the business need for the whole and emphasizing delivery of mere fragments, one by one. That assumption is often simply not true. And if I’m a product owner needing the development of a particular end business capability, I don’t especially care whether, from the point of view of the development team, that capability is composed of ten sliced stories or five. Getting three out of the ten slices may do the business almost no good whatsoever; despite what advocates claim, it’s often not easy (or possible) to make slices independently valuable, much less independently deliverable. Rather, a product owner typically needs essentially all of those slices implemented to deliver the end business capability they’re seeking.
Slicing: no, stories aren’t tapas either
My canonical example: try to come up with independently deliverable, same-sized “slices” for the dev work required when Twitter moved (in November 2017) from a character limit of 140 to 280. You can’t, at least not meaningfully, when you think about the likely number of subsystems that had to jointly support the higher limit. Were there individual chunks of work involved? Of course. Were any of those chunks really useful (to the end user) on their own? And were those chunks all arguably the same size? Of course not. What mattered was (solely) the end goal, and the concrete delivery of all necessary and coordinated changes, to support the overall desired 280-character capability.
So, for these reasons, slicing (alone) isn’t a great answer to the underlying need for sizing. You just can’t wish away the fact that not all work is the same size, and that it can’t necessarily be forced to be.
Again, story points provide a workable solution
- Or: use story points to approximate the relative size of each original item. An individual item’s story points DON’T specifically predict its duration or completion date, at least not to any level of precision. They’re meant to form an abstract notion of size, combined with risk and complexity, expressed numerically relative to other stories. Story points intentionally abstract away important factors such as who will do the work, when it might begin, and whether there could be long interruptions while doing it. As such, story points of course don’t (and shouldn’t) translate directly to a schedule. This should be blatantly obvious, but often isn’t. See Mike Cohn’s article presenting an analogy of “walking points”, where he points out that the exact same task (or, in the analogy, the same walking distance) with greater risk or complexity is “larger” than it would be without those aspects.
So, story points provide a workable solution to both of the huge issues outlined above. They help you approximate the total capacity of the team, and they give you a meaningful and explainable way to roughly fill up the team’s capacity bucket appropriately with a variety of different-sized items, without overstuffing it.
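To illustrate that “fill the bucket” step, here’s an equally rough sketch (again Python, with a made-up backlog and point values, and a deliberately mechanical rule that a real team would apply with judgment, not a planning tool): take items in priority order while they still fit within the capacity derived earlier.

```python
# Hypothetical backlog, already in priority order: (item, story points).
backlog = [
    ("checkout redesign", 8),
    ("password reset emails", 3),
    ("search relevance fix", 5),
    ("admin report export", 5),
    ("profile photo cropping", 2),
    ("rate limiter tuning", 8),
]

def fill_sprint(items, capacity):
    """Take items in priority order while they still fit in the remaining capacity."""
    planned, remaining = [], capacity
    for name, points in items:
        if points <= remaining:
            planned.append(name)
            remaining -= points
    return planned, remaining

planned, slack = fill_sprint(backlog, capacity=27)
print(planned)  # a mix of larger and smaller items, without exceeding 27 points
print(slack)    # 4 points of headroom left in the bucket
```

The point isn’t the rule itself (a real team would negotiate the cut line); it’s that having sizes expressed in points is what makes the bucket arithmetic, and the mix of large and small items, possible at all.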
There’s so much more to be said on Quocknipucks, er, story points, but this post has already gotten quite long. So, stay tuned for Part 2: now that we’ve delved into why story points make sense, I’ll list some observations, caveats, and recommendations for using them in general, and cover some common-sense answers to the specific objections that tend to get raised about them. And I’ll discuss why, ironically, the dev team should actually be among the biggest advocates of doing capacity planning via story points.
Lagniappe:
- Cohn, Mike. Agile Estimating and Planning. Prentice Hall, 2005.
- Here are links to a few blog posts that disagree with all or part of my stance, and whose arguments I will address in Part II:
- Brad Black, May 19, 2016. “Are Story Points Evil?”
- Mike Veerman, August 30, 2017. “Stop using Story Points!”
- Joshua Kerievsky, October 12, 2012. “Stop Using Story Points”
- Michael Dubakov, 2010. “5 Reasons Why You Should Stop Estimating User Stories”
- Ben Northrop, August 22, 2012. “Velocity and Story Points – They don’t add up!”
- Steven Lott, September 23, 2016. “Story Points: Should We Give Up on Them?”
- Blog posts from Cohn: all of these are worth reading, but I disagree with his end stance in the final article.
- Mike Cohn, “Story Points Are Still About Effort”
- Mike Cohn, “Velocity-Driven Sprint Planning”
- Mike Cohn, “Capacity-Driven Sprint Planning”
- Mike Cohn, “Why I Prefer Capacity-Driven Sprint Planning”