Sunday, January 29, 2012

Unit test only?

How far can unit tests take you? There's a very interesting presentation here: http://www.infoq.com/presentations/integration-tests-scam. I've been of the opinion that you can't get around doing higher-level tests; however, this is somewhat based on the way I've learned software development. The typical setup I've worked in is hierarchical. At the bottom you have unit tests, usually specific to a single object. Then you have unit integration tests, usually specific to a small set of closely related objects providing a significant chunk of capability. Above that, there are application-level tests, which verify that the capabilities integrate nicely. Above that, there are system-level tests of various types, which ensure that all system requirements are met. Above that are systems integration tests, testing communications between systems.

In principle, though, at any given point in time an interface/communication/message consists of two objects -- a sender and a receiver. If the system is highly modular, these objects will likely be small and fairly simple. Certainly in a system of any size there will be differences in level of abstraction, but the basic idea of sender and receiver is fundamental. If this is granted, then in principle it ought to be provable that if all possible inputs and outputs to objects are known, then unit testing is sufficient.

I have two questions regarding this position. The first has to do with emergent behavior. The second is about the feasibility of determining the complete mapping between inputs and outputs. It may be that the two questions are connected, and are really two sides of the same coin, but let's start with emergent behavior. My question here is not clear yet; it's more of an uneasiness. Consider the firing of neurons in a neural network simulating a brain, say, or perhaps some much simpler network, but one large enough to be non-trivial, so that the behavior of the system relies on complex interactions between the neurons. Each neuron has a set of inputs and a set of outputs. Each neuron is, in itself, simple. Complex behavior arises not from individual neurons, but from patterns of neuron firing. Now let us suppose that we wanted to test that our network is working properly using the unit test method. Testing each neuron is simple enough. Take a neuron. Construct a set of input values. Test that the outputs follow the neuron activation function. Easy!
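
To make the "Easy!" concrete, here is roughly what such a neuron-level test might look like, assuming a hypothetical Neuron class with a sigmoid activation and gtest as the test framework (the class itself is invented purely for illustration):

#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>
#include <gtest/gtest.h>

// Hypothetical neuron: a weighted sum of inputs passed through a sigmoid.
class Neuron {
public:
    explicit Neuron(std::vector<double> weights) : weights_(std::move(weights)) {}

    double fire(const std::vector<double>& inputs) const {
        double sum = 0.0;
        for (std::size_t i = 0; i < weights_.size(); ++i)
            sum += weights_[i] * inputs[i];
        return 1.0 / (1.0 + std::exp(-sum));   // sigmoid activation
    }

private:
    std::vector<double> weights_;
};

// Unit test: the output follows the activation function for chosen inputs.
TEST(NeuronTest, OutputFollowsActivationFunction) {
    Neuron n({0.5, -0.25});
    const double expected = 1.0 / (1.0 + std::exp(-(0.5 * 1.0 + -0.25 * 2.0)));
    EXPECT_DOUBLE_EQ(expected, n.fire({1.0, 2.0}));
}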

However, there is a big gap between what these unit tests tell us about the system, and what the user or higher-level programmer wants to know about the system. True, the neurons may be coded perfectly, but what we care about is at a level that is difficult to relate to the unit tests. I am not talking here about the difference between user tests and programmer tests. A programmer looking at this network from a higher level has the same issue -- what do the unit tests say about the overall network behavior? Or, put another way, where does the confidence come from that the system works correctly? Tests at the neuron level are not sufficient to determine this.

I believe a similar kind of problem is envisioned by one of the commenters on the video who noted that a Mars Rover failed to land correctly because the parachute system deployed with a jerk that the (separately tested) detachment system interpreted as a landing. In principle, this problem could be caught if the characteristics of each system are known and the specs are consistent -- the sensor inputs from the parachute deployment need to be well-characterized through parachute testing, and those characteristics can then be fed to the detachment system independently. If the sensor readings for a landing are very similar to those generated by a parachute jerk, then there is an engineering problem to solve.

In practice, however, though it might be straightforward to test each sensor, this may not determine the overall behavior of the system. System complexity is especially likely when interacting with hardware with many design parameters, working in unpredictable environments. A parachute detachment might be triggered by the combined results from a hundred different sensors, with each sensor's value varying according to wind, temperature, deterioration due to aging, and other factors that change over time. What is needed is to know the overall system behavior, given the component behaviors. Effectively, what needs to be tested is not the design of each component, but the system design model. If the design is wrong -- if the detachment system detaches the parachute early because of a flaw -- then it could still be the case that each individual sensor is working correctly.

The question is: is unit testing sufficient to catch a system-level design flaw? This in part depends on whether "units" are allowed to occur at different levels of abstraction. Let's suppose they can, otherwise unit tests are going to be very limited in scope. So now we have a "decision maker" object somewhere in the detachment system that periodically takes inputs from the sensors and makes a decision on whether or not to detach the parachute.

Off the bat, it seems to me that a unit test for the decision maker object is really an integration test by another name. Granted, there may be time advantages in being able to mock up sensors rather than interact with the hardware. But from the perspective of what the test needs to accomplish, from a design point of view, the unit test takes results from a lower level of abstraction and processes them. The coder for the unit test needs to know about the different possible input values from those sensors and what the appropriate outcomes should be. In terms of the thinking behind the test design, that is de facto integration.
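
As a sketch of what I mean -- the names, readings and thresholds below are entirely invented, not taken from any real flight software -- a "unit" test for such a decision maker, with gmock standing in for the sensors, might look like this:

#include <utility>
#include <vector>
#include <gmock/gmock.h>
#include <gtest/gtest.h>

// Hypothetical sensor interface the decision maker reads from.
class Sensor {
public:
    virtual ~Sensor() = default;
    virtual double acceleration() const = 0;   // in g
};

class MockSensor : public Sensor {
public:
    MOCK_METHOD(double, acceleration, (), (const, override));
};

// Hypothetical decision maker: detach only if every sensor reports a
// deceleration above the landing threshold.
class DetachmentDecisionMaker {
public:
    explicit DetachmentDecisionMaker(std::vector<const Sensor*> sensors)
        : sensors_(std::move(sensors)) {}

    bool shouldDetach(double landingThresholdG) const {
        for (const Sensor* s : sensors_)
            if (s->acceleration() < landingThresholdG) return false;
        return true;
    }

private:
    std::vector<const Sensor*> sensors_;
};

// The test author must already know what a "parachute jerk" reading looks
// like versus a "landing" reading -- that knowledge is integration knowledge.
TEST(DetachmentTest, ParachuteJerkDoesNotTriggerDetachment) {
    MockSensor s1, s2;
    EXPECT_CALL(s1, acceleration()).WillOnce(::testing::Return(2.5));  // jerk-like spike
    EXPECT_CALL(s2, acceleration()).WillOnce(::testing::Return(0.4));  // still descending
    DetachmentDecisionMaker dm({&s1, &s2});
    EXPECT_FALSE(dm.shouldDetach(/*landingThresholdG=*/2.0));
}

Whether the readings and threshold in that test bear any resemblance to a real parachute jerk is precisely the system-level question the unit test cannot answer on its own.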

Is the new and cool unit-test-only method, then, really that different from the old-and-crusty unit + unit integration + ... method? I am beginning to think it is less different than I imagined at first. All that appears to be missing is a top-level integration object that, in the traditional view, would represent the system. If we envision the system as an object with a given set of inputs and outputs, and everything at a lower level substituted by mocks, then the unit test for this system object is just an end-to-end test. The same idea applies one level higher up for systems of systems.

Broadening unit testing in this way, we can get a reasonable correspondence between old-style and unit-only testing. Old-style reflects likely changes in responsibility as code is looked at from a higher and higher level. Unit-only emphasizes that, regardless of the level, the same thing is happening -- there are inputs and outputs.

This correspondence suggests to me that old-style and unit-only testing ultimately share the same strengths and weaknesses. You may conduct a traditional interface test with the parachute and detachment systems, even bring in the hardware if you like, but this does not guarantee that integration problems will be found if the set of inputs and outputs has very complex relations and the problem rarely occurs. If it is possible to break the complex input-output relations into something simpler, that is all to the good, regardless of test style. The real gains from the unit-test-only style are not from demanding "unit tests only!" They come from making a system where the objects are cohesive and loosely coupled, and from having test structures such as mocks that support testing objects individually. Credit to the agile guys for coming up with methodologies that push that issue front and center where it belongs.

Thursday, January 26, 2012

TDD does something useful

Thinking through what I need to do for the plot capability I'm writing, I've had a realization. I thought I was writing a library. That might, in fact, still be the form that the plot stuff takes. But thinking about how plot might be used, there's a bifurcation. On the one hand, I might want to construct a plot instance for each cycle, do the drawing, and throw it away. On the other hand, I might want a plot instance that persists and can be updated. I realized that if I take the first route, plot doesn't have a state -- it's really a function, not an object. How did I get to this conclusion? Because writing tests forced me to construct a plot, and think about how it is used rather than just focusing on the object capabilities.
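
In code terms, the fork in the road might look something like this (a throwaway sketch with placeholder types, not the actual library):

#include <utility>

// Placeholder types -- stand-ins for whatever the existing architecture provides.
struct Image {};
struct PlotSpec {};
struct DataRiver {};

// Route 1: plot has no state; it is really a function, not an object.
// Construct nothing, just draw the spec into the image for this cycle.
void drawPlot(Image& image, const PlotSpec& spec, const DataRiver& data);

// Route 2: plot is a persistent object, constructed once and updated
// each cycle as new data arrives.
class Plot {
public:
    Plot(Image& image, PlotSpec spec) : image_(image), spec_(std::move(spec)) {}
    void update(const DataRiver& data);   // redraw using the latest samples
private:
    Image& image_;
    PlotSpec spec_;
};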

Of course, it's impossible to tell whether I would have arrived at the same conclusion without doing TDD. But even if I had arrived there, I'm not sure it would have happened so early on.

I've been revising my specifications for the plot capability. I'm not sure about format, but here's the current version.

Top-level story: A user wants to add a plot to a display for a sim variable.
Constraints: Existing architecture provides a data source abstraction and a user interface. The user interface will need to change to use plotlib to request a plot drawing. UI will provide plot specifications including an image to draw into. Plot must construct the image using the primitive drawing functions provided by the image interface, according to the specifications.

Specifications include:
1) axes
2) titles (are these user-specifiable?)
3) legends
4) one or more variables to plot
5) styles (color, line style, fonts etc.)
6) a data source, provided as a data river instance

Desirable:
- Plots should be cross-platform

The specifications might not all be relevant to plot. For example, perhaps font needs to be handled at a different level, leaving plot with simply a writeText(string) function, or a setFontSize(int), or even setFontSize(FontSizes) using an enum such as normal, large, small etc. I'm not going to worry about this for now. That's down the road for sure.
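
For what it's worth, here is a guess at how those specifications might eventually fall out in code, fleshing out the placeholder PlotSpec from the sketch above; none of these names or types are settled:

#include <string>
#include <vector>

// Guessed-at specification bundle; field names and types are placeholders.
enum class FontSize { Small, Normal, Large };

struct PlotStyle {
    std::string color = "black";
    std::string lineStyle = "solid";
    FontSize fontSize = FontSize::Normal;   // or push font handling up a level entirely
};

struct PlotSpec {
    std::vector<std::string> axisLabels;    // 1) axes
    std::string title;                      // 2) titles (user-specifiable?)
    bool showLegend = true;                 // 3) legends
    std::vector<std::string> variables;     // 4) one or more variables to plot
    PlotStyle style;                        // 5) styles
    // 6) the data source (a data river instance) would be passed in separately
};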

I think once I get used to the "run and see the tests pass" rhythm I might actually like it. What's surprising me at the moment is that the tests are driving the design to some degree. Based on the great advice I received on the TDD board, I'm feeling free to think about design on both large and small scales, while always coming back to "OK, but what's the next test?", and "how does that spec translate into a test?" One of the advantages of not looking ahead is that I'm not having to carry everything around in my head all the time. I don't have to think "I'm going to write the Plot constructor, and it will have a Spec, and the Spec will need to have an Image, and the Image will need to be constructed, and the Spec will also have Axes which might be a concrete instantiation of some abstract PlotObject class, oh, and ..." If I want to throw up some test balloons like this to help me see where I might be going, I do. But I also realize that I need to get to them via the tests, because the tests show what's necessary on a practical level to get the objects working.
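
As an example of how small the "next test" can be, something like the following is enough to force Plot and a Spec into existence without committing to Axes, PlotObject or anything else (the names are provisional, and the test won't compile until the classes are written -- which is the point):

#include <gtest/gtest.h>

// Deliberately the smallest possible step: the test pulls Plot and Spec into
// being and asserts nothing about drawing, axes or legends.
TEST(PlotTest, CanBeConstructedFromASpec) {
    Spec spec;                     // hypothetical, empty for now
    Plot plot(spec);               // hypothetical constructor under design
    EXPECT_TRUE(plot.isEmpty());   // some minimal observable property, yet to be decided
}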

So I suppose that this means I am starting to see the tests playing a positive role in developing the design, and also in refactoring. I've done a bit of the latter already, to reflect my updated specs, and yes, it was nice being able to run the unit tests and see them pass even though at this stage they are fairly trivial. It reminds me of why I prefer using static rather than dynamic languages. Just as the C++ compiler catches all sorts of type errors that might make it through in a dynamic language, so the tests catch all sorts of logic errors that might make it through a compilation without them. Writing the unit tests is like building a customized "logic compiler" for my code.

One big remaining question is the amount of time spent refactoring tests and the quality of the tests that I end up with. TDD advocates tend to minimize this issue, but I don't believe them. There's actually a book on this subject (xUnit Test Patterns: Refactoring Test Code) where the author writes the following:

We started doing eXtreme Programming "by the book" using pretty much all of the practices it recommended, including pair programming, collective ownership, and test-driven development. Of course, we encountered a few challenges in figuring out how to test some aspects of the behavior of the application, but we still managed to write tests for most of the code. Then, as the project progressed, I started to notice a disturbing trend: It was taking longer and longer to implement seemingly similar tasks.

I explained the problem to the developers and asked them to record on each task card how much time had been spent writing new tests, modifying existing tests, and writing the production code. Very quickly, a trend emerged. While the time spent writing new tests and writing the production code seemed to be staying more or less constant, the amount of time spent modifying existing tests was increasing and the developers' estimates were going up as a result. When a developer asked me to pair on a task and we spent 90% of the time modifying existing tests to accommodate a relatively minor change ...

The problem is that the people who invent methods use a lot of tacit knowledge as they develop. This is noticeable in the books they write. When I was reading Kent Beck's "Test-Driven Development by Example," there were several occasions when I thought "OK, I can see that the way he goes is a legitimate way to go, but it's not the way I would have gone. I wonder why he chose it?" It's one of those Alistair Cockburn things where we don't know what we know, and therefore can't tell whether or not everything that needs to be expressed has been expressed, even if we could express it.

I will probably need to get that book on xUnit refactoring at some point. But elsewhere on a forum I saw someone else say something to the effect of "if the team doesn't use this book, they will get into trouble." Of course, that individual could be wrong, and certainly he is speaking in a context. But the fact remains: there's no royal road to "clean and working code" developed quickly that's easy to maintain. TDD might start with two sentences' worth of rules, but the outworking of the rules is still many books' worth of material and experience, and that's no bad thing; it implies to me that TDD has enough to it to stand a chance of working across a range of projects.

Enough for now. Off to write some tests.

TDD with QtCreator

I'm feeling better about TDD. I might even be seeing some real benefits even though there's hardly any code yet.

To begin with, let's get some setup out of the way. One challenge was to get gtest working with QtCreator. A question I've had is whether to have one executable containing all my tests, or whether to split tests into multiple executables. This matters because my goal is to type Ctrl-R in QtCreator and have the tests build and run -- TDD means running tests often, so it has to be easy and fast. On the other hand, it seems logical to split tests up into separate executables in case I only want to run one set (e.g. the image tests). I came up with the following solution.
1) Have the makefile produce multiple executables.
2) Have the makefile produce a run_test.sh script that runs all the executables, stopping if there's a breakage and echoing "ALL TESTS PASSED" if not. This makes it easy to see when all the tests pass (hopefully the majority of the time). The rule to make the script is quite simple:

# Generate run_test.sh, which chains every test executable with &&, so it
# stops at the first failure; the banner only prints if everything passes.
# (Recipe lines must be indented with a tab, not spaces.)
make_exec_script: $(TESTS)
	echo '#!/bin/sh' > $(EXEC_SCRIPT)
	for i in $(TESTS); do \
		echo "./$$i && \\" >> $(EXEC_SCRIPT); \
	done
	echo 'echo " " && \' >> $(EXEC_SCRIPT)
	echo 'echo "* * * ALL TESTS PASSED * * *"' >> $(EXEC_SCRIPT)
	chmod 755 $(EXEC_SCRIPT)

Now I just set the project up to execute the script, and I'm done. Granted, this won't work on Windows. Poor Windows. Always the oddball.
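
For completeness, each test executable is just an ordinary gtest binary; a minimal one looks roughly like this (alternatively, link against gtest_main and skip writing main yourself):

#include <gtest/gtest.h>

// A placeholder test so the executable has something to run.
TEST(SanityTest, TruthIsTrue) {
    EXPECT_TRUE(true);
}

// Each test executable needs a main (or link against gtest_main instead).
int main(int argc, char** argv) {
    ::testing::InitGoogleTest(&argc, argv);
    return RUN_ALL_TESTS();
}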

Saturday, January 21, 2012

Shu-ha-ri and the art of learning

I have waited too long to write this entry.
I thought that I needed time for ideas to sink in, and since I'd posted a question on the GOOS board, I thought it would be a good idea to give time for other people to answer. Then, looking over what I've written as a summary of GOOS so far, I felt I had not done a good job of summarizing the GOOS approach, and that I possibly lacked the background to understand what the authors were after. This is always an issue with getting into something new, of course: where there is a community, there is a community language and assumed background which is not always easy for the outsider to pick up on.

To get some background, I've started reading "Agile Software Development: The Cooperative Game" (CG) by Alistair Cockburn. I've always liked Cockburn's stuff -- it was his view of the human side of software development that drew me to Agile methods in the first place. I was initially going to look at a different book of his, referenced in GOOS, but I think CG is a book that I need to read. In short, it is a discussion of theories of programming, and more broadly of epistemology and communication in a programming setting.

Cockburn begins with a familiar couple of epistemic problems. Can we know what we are experiencing? And (taking other minds for granted), can we express what we know? Cockburn answers both questions in the negative. I found his discussion interesting, but not always coherent. At some level, if it is not possible to know what we experience, it is hard to see how we can then express it to ourselves. And if we cannot express it to ourselves, and cannot express it to others, then why write a book about it? The method employed is not that of philosophical argument -- building cases from axioms or syllogisms, for example -- but rather an attempt to convince through stories and reflections on what he takes from those stories.

In his first story, for example, the author turns up at a party with a bottle of red wine, which the hostess insists is white, even though the label clearly says it is red. Later on, when he points out the mistake, the hostess again insists the wine is white and even points to the label, finding out only when she reads it out loud that it says "red." From this, Cockburn argues that we are subject to making mistakes when we think we know something that we don't know, and therefore we can end up producing requirements that contain observational errors. Fair enough. But this does not really address whether we can express what we know, or whether we can know what we experience. Rather, it argues that we may be mistaken about what we think we know, and may therefore end up conveying mistakes to others. One could argue that, on the contrary, it is precisely the fact that we can communicate and can understand our experiences that allows the hostess to realize a mistake has been made, and to laugh together with the author about it.

I don't think that Cockburn had an epistemological treatise in mind, though, when he drafted this book. His audience is programmers, and pragmatic ones at that. He encourages those who do not like "abstract" discussion to skip the first chapter altogether. His advice, then, should probably be seen as practical rather than theoretical, even in the "abstract" chapters. Looked at this way, there is a lot to like.

Practically, our communication suffers from a lot of problems.
1) Our comprehension of our own experiences is limited by our ability to interpret those experiences.
2) Our ability to interpret is limited by many factors including language, presuppositions, eager interpretation (judging too early), and level of mastery.
3) In communicating with others, we need to establish a common vocabulary. This is impossible to do perfectly since understanding is layered in terms of learning and experience and everyone is unique in that regard. At best, we look for sufficiently similar experiences, which might involve the equivalent of an experienced English speaker adopting a very simple vocabulary to talk to a child.

Expanding on point 3, Cockburn brings in an idea of learning mastery built on the Aikido concept of Shu-Ha-Ri. Mastery occurs in three stages. In Shu (learn), we start from the beginning and learn one particular path or technique. Trying to learn different techniques at this stage of mastery leads only to confusion. In Ha (detach), we come to see that our technique does not always work well, and that there are other techniques that work better in certain circumstances. We look for boundaries that define when to use one technique rather than another. In Ri (transcend), we come to see techniques as means to an end, not an end in themselves, and roll our own ways of getting there specific to the task, using the knowledge of the techniques without being restricted to them.

According to Cockburn, this causes problems when a level 3 (Ri) person talks to a newbie. Ri people say things like "do what works," by which they mean something like "there's no perfect answer, but there is a multiplicity of good ones so there's no need to be prescriptive." What a newbie might hear, though, is "it doesn't matter how you code, so long as it works," or "I'm not going to help you figure out how to do it" (I'm interpolating here -- these are my words, not Cockburn's). Beginners need to know that they are getting something right.

There's an obvious parallel to my experience with GOOS and TDD so far. I may (or may not) have written this explicitly in a previous post, but what I'm looking for is ONE way to get into TDD. Looking around the web, there are plenty of people arguing for their interpretations of TDD. That's fine, but I need context first. I recognize the danger of seeing GOOS as "the one, authentic, right method." I think I have enough experience to avoid making that mistake. But I do want to understand GOOS at a deep level, deeper than just "make a slice, code a test, code the initial behavior then fill out from there." I sense that there is more to GOOS than this -- assumptions that are more or less tacit that make GOOS a good fit for Mocks rather than Mocks simply being one technology the authors have decided to employ.

Clearly, what I was doing in my post to the GOOS message board was an attempt to draw on common background, to phrase GOOS ideas in terminology I have seen before. This is good. The first step towards communication is trying to find what is in common, like two modems negotiating a baud rate. But unlike a modem, whose limitations are inherent in the hardware, I can work my way up from my current 300-baud state to 56k, and who knows, maybe to T1 some day.

Wednesday, January 4, 2012

Starting TDD

Last night and this morning, I've been doing some reading in "Growing Object Oriented Software, Guided by Tests," by Steve Freeman and Nat Pryce (hereafter referred to as GOOS). I really like it and I think it's clearing up some of the conceptual problems I had as a newbie with Test-Driven Development (TDD). I'll just list a few here, and maybe in future entries discuss how I'm going to try to do the new plot library implementation using TDD. I'm hoping this could be a series on what I'm learning by doing TDD, or, if nothing else, a cheap way to document the design I'm working on :)

First off, I should say that, as with most methodologies, there isn't one normative approach to doing TDD, so I'll try to say "GOOS" when what I'm discussing comes from the book, and "TDD" when it appears to be generic. In the foreword to the book, Kent Beck says, in effect, "this is a good book. It's not how I would do it, but I learned from it." At first, I thought this was a case of what you might call "dissing with faint praise." Perhaps this is because TDDers are so often dogmatic: "TDD IS GOOD FOR YOU. DO IT NOW! BECAUSE I SAID SO," or "you can't call yourself a professional if you don't use it." Then if someone raises objections, the answer is too often "well, there are some hopeless ideologues you just can't convince." Good grief! Why not just say "it made me a more effective professional," a statement that should be perfectly capable of making other professionals take notice without all the acrimony? But in any case, I think it is healthy that there are different views of TDD because differences help to drive out what is meaningful. I admit it -- I'm a bit of a Hegelian at heart.

What is missing in the discussions I've seen of TDD, and what the GOOS book tries to answer, is the rationale behind doing it. I'm not looking for "it's X% more effective." I want to understand why it is more effective -- what makes it work? This is a fundamental difference, and it's not academic either. The fact is, when you take on a new project, you have to take on an approach to that project. Saying "I'll use TDD" doesn't get you far, any more than saying "I'll use OO." TDD needs to be applied. Understanding how to apply TDD comes from understanding what TDD is good for, how it works, what sort of things in the project setup to look for. This is not so easy -- the TDD guys are right to argue that you can't give canned answers. But that's precisely why it is important to have a philosophy of TDD as well as a methodology.

OK, so to the beginnings of clearing up my conceptual misunderstandings with TDD. In the following points, I'll list the misconception first, then discuss why it's a misconception.

  1. TDD = no design up front. If GOOS is right, this is nonsense. There's plenty of design up front. There are meetings with stakeholders, rough designs drawn on whiteboards, state diagrams, CRC cards, discussions of what constitutes the first "slice," discussions of what test technologies to use, and likely more. It was surprising to me, actually, to see how many times the authors of GOOS used the phrase "after discussions" or some similar idea. It's irrelevant to me whether design is documented in some fancy 150k tool or on a napkin. The point is, someone has thought about it. Someone has worked through a preliminary version of what needs to happen -- there is a direction. Perhaps anti-design-up-front TDDers prefer to call this "planning." Fine. I don't care.
  2. TDD = writing unit tests. I am sure I have read TDD folks saying something to the effect that "unit tests are all you need." That's such a bizarre statement, I can't believe I would have randomly made it up. In fact, having worked on some decent-sized systems, "unit test only" is one of the biggest problems I had with TDD. How on earth can you say that's sufficient to unit test a system with a million lines of code? Rubbish! All the interesting problems are integration problems; unit testing is trivial by comparison. The same goes for working with external libraries or applications. GOOS takes the sensible perspective that tests need to exist at multiple levels, with "end-to-end" tests at the highest level, just as the conventional development methodologies I've used would indicate, though done via a different approach. Sanity! Thank goodness for that!
  3. TDD = start by writing test cases. One of my problems with TDD has been where to start. In the past when I've tried TDD, I thought "OK, so I need to write tests first. What do I start with?" Then, typically, I picked an object I thought I understood (usually a low-level object) and wrote tests for it. Then, after exhaustively testing all the operators, assignment possibilities etc., I'd end up with a nice unit test which needed to be extensively rewritten when I tried to fit the object in with other objects. GOOS is quite clear on this point: doing this is a bad idea. More than that, it's fundamentally a misconception of what TDD is supposed to accomplish. In the GOOS way of thinking, tests are intended to drive the design of the system by pushing development from areas of knowledge into areas where knowledge is insufficient. This implies you must start from knowledge. But at the beginning where does the knowledge come from? It must come from constraints like client requirements, available technologies and preliminary design team discussions, and in the beginning, that means you have only high-level details. So "beginning" means gathering requirements and priorities, holding preliminary design discussions, and working out a plan which includes identifying a first "slice" of capability from which the application can grow. Only then can you start with the first test.
  4. TDD = bottom-up. Given that the first word of their title is "growing," you might think that GOOS advocates starting at the bottom with a seed of code, and adding more bits of code until the project is done. What this fails to take into account, however, are the different levels at which TDD operates. As mentioned in point 3), GOOS begins with a slice of capability. This is at the application level, and the first test is an end-to-end test which I think is intended to test the life of the application and its interfaces, though in a minimal way -- it might just instantiate an object, connect to a database, retrieve one thing, and shut down, for example (see the sketch after this list). Adding capability involves driving down into the support classes by adding tests that require those supports, then making those tests pass. Thus, if I am understanding this correctly, GOOS is inherently top-down, setting up the application framework first then pushing down to lower levels.
  5. TDD = getting good code coverage. Code coverage is certainly likely to be a benefit of a TDD approach, but it's not the point of TDD. Rather, TDD is a model of programming that proceeds by placing constraints on the solution space, and increasing those constraints until all requirements are satisfied. This is really the fundamental difference between a TDD approach (or in principle any test-first approach), and a conventional test-last approach. In a test-last approach, there are, of course, constraints; they are simply implicit and embedded in the code. Testing is done at the end to ensure that the implicit constraints meet the requirements. In a test-first approach, constraints are explicit, and are themselves developed as part of the design as the software grows. A potential problem with test-first is that you might spend a lot of time formally expressing and revising those constraints in test cases. A potential problem with test-last is that implicit constraints might be hard to test. And yes, I'm calling these "potential" problems. I'm in no position yet to say whether the effort to do TDD is justified, or gets better results than well-designed test-last code.
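
To illustrate the kind of minimal end-to-end test I have in mind -- this is my own sketch with invented names and a fake data store standing in for a real one to keep it self-contained, not an example taken from GOOS:

#include <string>
#include <gtest/gtest.h>

// Invented stand-in for the application's external dependency. At a true
// end-to-end level this would be a real (test) data store; a fake is used
// here only so the sketch compiles on its own.
class FakeDataStore {
public:
    std::string fetch(const std::string& key) const {
        return key == "greeting" ? "hello" : "";
    }
};

// Invented stand-in for the application under construction.
class Application {
public:
    explicit Application(const FakeDataStore& store) : store_(store), running_(true) {}
    std::string retrieve(const std::string& key) const { return store_.fetch(key); }
    void shutdown() { running_ = false; }
    bool isRunning() const { return running_; }
private:
    const FakeDataStore& store_;
    bool running_;
};

// Minimal walking skeleton: start up, retrieve one thing, shut down.
TEST(EndToEndTest, StartsRetrievesOneValueAndShutsDown) {
    FakeDataStore store;
    Application app(store);
    EXPECT_EQ("hello", app.retrieve("greeting"));
    app.shutdown();
    EXPECT_FALSE(app.isRunning());
}
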
Well, that's just a beginning. Hopefully I haven't slandered the GOOS guys too much. In future entries, I'll try to elaborate more on what I'm encountering as I try out TDD.