Using BDD with Legacy Systems

One question I keep being asked is, “Can we use BDD with our legacy systems?”

To help answer this, let me give my simplest definition of BDD:

BDD is the art of using examples in conversation to illustrate behaviour.

So, if you want to talk through the behaviour of your system, and use examples to illustrate that behaviour, yes, you can do this.

There are a couple of benefits of BDD which might be harder to achieve with a legacy system. The biggest is that using examples in conversation helps you explore behaviour, rather than just specifying it. Examples are easy to discuss, and discussing them makes it easy to decide that they don’t matter, or that you can worry about that scenario later, or that you want different behaviour. It’s harder to do that when you’re talking about tests or specifications, and it’s harder still when the behaviour you’re discussing already exists. If you do find yourself talking through the different examples, you’re probably clarifying existing behaviour rather than exploring it. That’s OK; just recognize that you’re doing it, and that some of the usual advice we all give around BDD, particularly around scope management, won’t apply. Asking questions, though, is still a great idea. You might find all those places where the behaviour is wrong, or annoying, or isn’t needed at all. Using examples is a great way to illustrate bugs, and it helps testers out a lot.

The second aspect is the automation. (If you do this, please consider the other post I wrote about things I like to see in place first; these still apply.) Automation is usually harder with legacy systems, because they often weren’t designed with automation in mind. Websites have elements with no identifiers or meaningful classes; Windows applications have complicated pieces with no automation peers; and some Adobe widgets assume that if you’re using automation, it must be because of reading disabilities, so they helpfully pop up boxes on your screen to ‘help’ you (thank you for that, Adobe).

But the real reason BDD becomes hard with legacy systems is that the system was often designed without anyone talking through the behaviour, and the behaviour itself makes no sense.

I recently tried to retrofit SpecFlow around my little toy pet shop. The pet shop itself was designed just as a way of showcasing different automation elements, so it wasn’t particularly realistic. Because of that, I find it impossible now to have conversations about its behaviour, because its behaviour simply isn’t useful. It isn’t how I would design it if I were actually designing a UI for pet shop software. I can’t even talk to my rubber duck about it. I won’t be able to sensibly fit SpecFlow to this until I can actually change the behaviour to something sensible.

If you’re in one of those unfortunate environments with a bit of a blame culture, BDD will help introduce transparency into the quality of your process, or the lack of it. Just so you’re warned. (In my case it was a sensible trade-off at the time, since I originally wanted automation software, not a pet shop, and it’s my software so it’s my problem. You may not be so lucky.)

Automation on legacy systems can give you a nice safety net for any other changes, so it might be worth trying this for a few key scenarios. The teams I’ve worked with, and particularly the testers, have saved a lot of time in the past just by having one scenario that makes sure the app can start. This kind of automation is particularly useful if your build system is closer to your production system than your development one is, which is frequently the case for legacy systems.
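A minimal sketch of that single “the app can start” scenario, written in Python rather than with SpecFlow. It assumes the application is launched as a process and that “started” simply means it is still running after a short settling period; the function `app_starts`, its `settle_seconds` parameter and the stand-in command are hypothetical, so substitute your real launch command.

```python
import subprocess
import sys
import time


def app_starts(command: list[str], settle_seconds: float = 2.0) -> bool:
    """Launch the app and report whether it is still running after settle_seconds.

    A process that has already exited (crashed, or failed to start) counts as a failure.
    """
    process = subprocess.Popen(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        time.sleep(settle_seconds)
        # poll() returns None while the process is still running.
        return process.poll() is None
    finally:
        # Always clean up, so the check can run repeatedly on a build machine.
        if process.poll() is None:
            process.terminate()
            process.wait()


if __name__ == "__main__":
    # Stand-in "app" that stays up for a while; replace with the real launch command.
    ok = app_starts([sys.executable, "-c", "import time; time.sleep(10)"], settle_seconds=1.0)
    print("app started" if ok else "app failed to start")
```

Checking that the process stays up is deliberately crude; for a website, polling a known page until it answers would be a closer equivalent.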

If you do happen to find an aspect of behaviour that you like and want to capture, then by all means, do BDD. Talk through the examples, stick them in a wiki, and automate them if you can, remembering that having conversations is more important than capturing them, which in turn is more important than automating them. You might even find out why things behave a certain way, and come to like the existing behaviour better.

Otherwise, you might want to wait until you’re changing the behaviour to something you like.


2 Responses to Using BDD with Legacy Systems

  1. I have found that BDD is a luxury in legacy code that I usually can’t afford at first. This is not to say I’m anti-automated-unit-testing, just the opposite. Allow me to explain.

    I have found 4 benefits of unit testing:

    1) specification
    2) feedback
    3) regression
    4) granularity

    Each of these is important (although different individuals will value them differently). I find BDD is most helpful in specification, particularly in discovering the specification and in guiding a specification that locks the implementation at the appropriate level (meaning that if the test breaks, the user cares).

    In my experience with legacy code, however, I have not usually been able to focus on any of these except #3, regression. Maybe this is just because I haven’t learnt how yet, and someday we will look back and think “how naive”. But here’s my general approach.

    Create some locking tests.

    Locking tests consist of three steps:
    1) do something (consistently)
    2) capture what it produced
    3) verify it is producing the same thing
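The three locking-test steps above might be sketched in Python as follows. The `verify_locked` and `approve` helpers are hypothetical; the `.approved.txt` / `.received.txt` file naming imitates the convention used by approval-testing tools rather than reproducing any library’s API. As with those tools, a first run has nothing approved yet, so it fails until someone inspects the received output and approves it.

```python
from pathlib import Path
from typing import Callable


def verify_locked(produce: Callable[[], str], directory: Path, name: str) -> bool:
    """Return True if produce() still yields the approved (locked) output."""
    approved = directory / f"{name}.approved.txt"
    received = directory / f"{name}.received.txt"

    # 1) Do something (consistently).
    output = produce()

    # 2) Capture what it produced, so a person can inspect or diff it.
    received.write_text(output, encoding="utf-8")

    # 3) Verify it is producing the same thing as the approved copy.
    if approved.exists() and approved.read_text(encoding="utf-8") == output:
        received.unlink()  # Nothing to review when the output matches.
        return True
    # No approved copy yet, or the output changed: fail until someone approves it.
    return False


def approve(directory: Path, name: str) -> None:
    """Accept the last received output as the new golden master."""
    received = directory / f"{name}.received.txt"
    received.replace(directory / f"{name}.approved.txt")
```

Keeping the received file on failure is what makes the “we were using a diff tool” workflow possible: the two files can be compared side by side.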

    I usually do this with some form of approval testing, such as ApprovalTests (github.com/approvals; there’s also a nice video here: http://www.youtube.com/watch?v=n-JSrvW4MVs).

    These tests usually forsake the other 3 values. For example:

    #1 Ignoring Spec
    What does this test do? I had 10,000 lines of code that ran the billing cycle. It was called from a very simple updateAccounts() method. It wasn’t repeatable because it modified the database, so I made the database read-only. This broke the code, so I added a flag to log update/create queries to a file instead of executing them. Now it “worked”, creating a *very* long log file. I approved this as the golden master and could then refactor (I got it down to 5,000 lines that week). Now, if you ask me what this test does, the best I can tell you is that it produces SQL, and I am very confident it still produces the same SQL it did before the refactoring. After the refactoring I can also tell you that in some cases it produces *very* similar but not quite identical SQL, which is most likely a bug (someone fixed 7 out of 8 places), but I’m not completely sure that it is, so I didn’t go *fixing* it. What do these SQL statements do to the database, and how do they change it? I have no idea. I believe that once it’s down to ~800 lines, these will be better questions to start asking…
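The “log instead of execute” flag described above might look roughly like this. The `RecordingConnection` wrapper, its `dry_run` flag, the `update_accounts` stand-in and the sample table are all hypothetical, and `sqlite3` stands in for whatever database the billing code actually used. The logged text is what would then be approved as the golden master.

```python
import sqlite3


class RecordingConnection:
    """Wraps a database connection; in dry-run mode, writes are logged, not executed."""

    WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "CREATE", "DROP", "ALTER")

    def __init__(self, connection: sqlite3.Connection, dry_run: bool = False):
        self.connection = connection
        self.dry_run = dry_run
        self.logged: list[str] = []

    def execute(self, sql: str, params: tuple = ()) -> list[tuple]:
        is_write = sql.lstrip().upper().startswith(self.WRITE_KEYWORDS)
        if self.dry_run and is_write:
            # Log the statement and its parameters instead of running it, so the
            # database never changes and every run starts from the same state.
            self.logged.append(f"{sql.strip()} -- {params!r}")
            return []
        return self.connection.execute(sql, params).fetchall()


def update_accounts(db: RecordingConnection) -> None:
    """Hypothetical stand-in for the legacy billing cycle."""
    for account_id, balance in db.execute("SELECT id, balance FROM accounts ORDER BY id"):
        db.execute("UPDATE accounts SET balance = ? WHERE id = ?", (balance - 10, account_id))


def run_demo() -> tuple[list[str], list[tuple]]:
    """Run the billing cycle in dry-run mode; return the SQL log and the untouched balances."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")
    connection.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    db = RecordingConnection(connection, dry_run=True)
    update_accounts(db)
    balances = connection.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
    return db.logged, balances


if __name__ == "__main__":
    logged, _ = run_demo()
    # This text is what would be approved as the golden master.
    print("\n".join(logged))
```

Because writes are skipped, any later read that depends on an earlier write sees stale data, which is one more reason a test like this is only a lock, not a specification.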

    #2 Ignoring feedback
    Tests should be short, right? I am usually quite happy to have a long test over no test. Moreover, if I can only test a checksum (say, an MD5 of some data instead of all the data, or the file sizes in a directory instead of the actual contents of the files), so be it. I once locked down the directory listing of hundreds of files being created through a process. The test took 1 1/2 hours initially. I ran it through a code coverage script to iteratively remove runtime scenarios that didn’t affect coverage, until it got down to a 5-minute test. This is still insanely long (I want unit tests to be in the no tests
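Locking a checksum rather than all the data, as described above, could be sketched like this: walk the output directory, record each file’s relative path and size, and hash that listing with MD5, so the test only has to store one short string. The function name is hypothetical. Like the file-size approach in the comment, it only notices changes to names and sizes; an edit that keeps a file’s size the same slips through.

```python
import hashlib
import os
from pathlib import Path


def directory_fingerprint(root: Path) -> str:
    """MD5 of the sorted (relative path, size) listing of every file under root."""
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            relative = path.relative_to(root).as_posix()  # same separators on every OS
            lines.append(f"{relative}\t{path.stat().st_size}")
    # Sort so the result does not depend on the order os.walk visits files.
    listing = "\n".join(sorted(lines))
    return hashlib.md5(listing.encode("utf-8")).hexdigest()
```

The returned hex string can then be locked with any golden-master check in place of the full directory contents.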

    #4 Ignoring granularity
    Why did that test break? We were working on a PHP system of 6,000 lines of code (I wish I could say a 6,000-line method, but it wasn’t; it was just a block of code, and putting it into a method didn’t seem to interest the original author). We locked the output, a big nasty piece of HTML, and started to refactor. About 20 minutes in, it broke. We were using a diff tool to compare the results, but couldn’t figure out why. So we hit undo a few times. Still broke. A few more. Still broke. We reverted all the way back to the beginning. Still broke. What? We finally figured out that this app (a calendar view) updated every half hour. The solution: we waited till 11:01, locked the tests, coded till 11:25, then waited until 11:31 and repeated. We worked this way all day. It is worth noting that we did at least get temporal granularity here: I could tell if the line I altered broke something, if not how or why it broke it. But the best insight I got was merely “the test is now red”.
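The team in this story waited out the half-hourly refresh. A common alternative in approval testing, and not what was done here, is to “scrub” volatile values out of the output before comparing it, so the locked copy no longer depends on the clock. A minimal sketch, assuming the only time-dependent part of the HTML is HH:MM clock times; the pattern, the function name and the `[TIME]` placeholder are hypothetical.

```python
import re

# Matches 24-hour clock times such as 9:05 or 11:31 (an assumption about the page).
CLOCK_TIME = re.compile(r"\b([01]?\d|2[0-3]):[0-5]\d\b")


def scrub_times(html: str) -> str:
    """Replace clock times with a fixed placeholder so output can be compared across runs."""
    return CLOCK_TIME.sub("[TIME]", html)
```

Scrubbing only helps if the clock text is the only thing that changes; if the refresh changes what the calendar actually shows, the code would need a fixed clock passed in instead.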

    All of these tests suck compared to real BDD/unit tests. Many of them I discarded once they had let me refactor the section. Later, I would seek out more valuable, longer-term solutions, and BDD is very useful then; but in legacy code I usually find I have to focus on *better*, because *good* is just too expensive to be practical yet.

  2. Pingback: The Baeldung Weekly Review 4
