Sunday, January 29, 2012

Unit test only?

How far can unit tests take you? There's a very interesting presentation here: I've been of the opinion that you can't get around doing higher-level tests, however this is somewhat based on the way I've learned software development. The typical setup I've worked in is hierarchical. At the bottom you have unit tests, usually specific to a single object. Then you have unit integration tests, usually specific to a small set of closely related objects providing a significant chunk of capability. Above that, there's application-level tests, which verify that the capabilities integrate nicely. Above that, there's system level tests of various types, which ensure that all system requirements are met. Above that are systems integration tests, testing communications between systems.

In principle, though, at any given point in time an interface/communication/message consists of two objects -- a sender and a receiver. If the system is highly modular, these objects will likely be small and fairly simple. Certainly in a system of any size there will be differences in level of abstraction, but the basic idea of sender and receiver is fundamental. If this is granted, then in principle it ought to be provable that if all possible inputs and outputs to objects are known, then unit testing is sufficient.

I have two questions regarding this position. The first has to do with emergent behavior. The second is about the feasibility of determining the complete mapping between inputs and outputs. It may be that the two questions are connected, and are really two sides of the same coin, but let start with emergent behavior. My question here is not clear yet; it's more of an uneasiness. Consider the firing of neurons in a neural network simulating a brain, say, or perhaps some much simpler neural network but large enough to be non-trivial, so that the behavior of the system relies on complex interactions between the neurons. Each neuron has a set of inputs and a set of outputs. Each neuron is, in itself, simple. Complex behavior arises not from individual neurons, but from patterns of neuron firing. Now let us suppose that we wanted to test that our network is working properly using the unit test method. Testing each neuron is simple enough. Take a neuron. Construct a set of input values. Test that the outputs follow the neuron activation function. Easy!

However, there is a big gap between what these unit tests tell us about the system, and what the user or higher-level programmer wants to know about the system. True, the neurons may be coded perfectly, but what we care about is at a level that is difficult to relate to the unit tests. I am not talking here about the difference between user tests and programmer tests. A programmer looking at this network from a higher level has the same issue -- what do the unit tests say about the overall network behavior? Or, put another way, where does the confidence come from that the system works correctly? Tests at the neuron level are not sufficient to determine this.

I believe a similar kind of problem is envisioned by one of the commentators to the video who noted that a Mars Rover failed to land correctly because the parachute system deployed with a jerk that the (separately tested) detachment system interpreted as a landing. In principle, this problem could be caught if the characteristic for each system are known and the specs are consistent -- the sensor inputs from the parachute deployment need to be well-characterized through parachute testing, and those characteristics can then be fed to the detachment system independently. If the sensor readings for a landing are very similar to those generated by a parachute jerk, then there is an engineering problem to solve.

In practice, however, though it might be straightforward to test each sensor, this may not determine the overall behavior of the system. System complexity is especially likely when interacting with hardware with many design parameters, working in unpredictable environments. A parachute detachment might be triggered by the combined results from a hundred different sensors, with the value from each sensor variable according to wind, temperature, deterioration due to aging and other factors changing over time. What is needed is to know the overall system behavior, given the component behaviors. Effectively, what needs to be tested is not the design of each component, but the system design model. If the design is wrong -- if the detachment system detaches the parachute early because of a flaw -- then it could still be the case that each individual sensor is working correctly.

The question is: is unit testing sufficient to catch a system-level design flaw? This in part depends on whether "units" are allowed to occur at different levels of abstraction. Let's suppose they can, otherwise unit tests are going to be very limited in scope. So now we have a "decision maker" object somewhere in the detachment system that periodically takes inputs from the sensors and makes a decision on whether or not to detach the parachute.

Off the bat, it seems to me that a unit test for the decision maker object is really an integration test by another name. Granted, there may be time advantages in being able to mock up sensors rather than interact with the hardware. But from the perspective of what the test needs to accomplish, from a design point of view, the unit test takes results from a lower level of abstraction and processes them. The coder for the unit test needs to know about the different possible input values from those sensors and what the appropriate outcomes should be. In terms of the thinking behind the test design, that is de facto integration.

Is the new and cool unit-test-only method, then, really that different from the old-and-crusty unit + unit integration + ... method? I am beginning to think it is less different than I imagined at first. All that appears to be missing is a top-level integration object that, in the traditional view, would represent the system. If we envision the system as an object with a given set of inputs and outputs, and everything at a lower level substituted by mocks, then the unit test for this system object is just an end-to-end test. The same idea applies one level higher up for systems of systems.

Broadening unit testing in this way, we can get a reasonable correspondence between old-style and unit-only testing. Old-style reflects likely changes in responsibility as code is looked at from a higher and higher level. Unit-only emphasizes that, regardless of the level, the same thing is happening -- there are input and outputs.

This correspondence suggests to me that old-style and unit-only testing ultimately share the same strengths and weaknesses. You may conduct a traditional interface test with the parachute and detachment systems, even bring in the hardware if you like, but this does not guarantee that integration problems will be found if the set of inputs and outputs have very complex relations, and the problem rarely occurs. If it is possible to break the complex input-output relations into something simpler, that is all to the good, regardless of test style. The real gains from unit-test only style are not from demanding "unit tests only!" They come from making a system where the objects are cohesive and loosely coupled, and from having test structures such as mocks that support testing objects individually. Credit to the agile guys for coming up with methodologies that push that issue front and center where it belongs.

No comments:

Post a Comment