Understanding software behavior is crucial in software development. Building a product is not just about writing code; it involves several steps and artifacts. In this post, we’ll dive into the different ways to describe what a software product does and show how they all connect to depict software behavior.
Development Process
Software development is a multi-step process. Every step produces a specific type of output. If we want to boil it down, we create the following artifacts:
- Requirements: The client describes their needs
- User Stories: We turn the requirements into action items
- Code: We write the code to implement the product
- Tests: We create tests to verify that the product behaves as expected
- Binary: The executable software
Obviously, phases can overlap, repeat, or even be reordered. Agile methodologies, TDD, new feature requests, or interpreted languages can all alter the process. The important thing is that these are the primary artifacts we create while developing software, and most of them are usually present[1].
It may be surprising that all these artifacts do the same thing: they describe the software’s behavior. The difference is the syntax and abstraction level:
- The requirements are free text and easily understandable by anyone who knows the domain[2].
- User stories are more formal. In an ideal world, they contain examples and user acceptance test (UAT) descriptions.
- Code is the ultimate description of the software’s behavior. If we change the code, the software will behave differently.
- We often call tests executable documentation. They are as formal as the code but capture a different perspective: the tests define what the software should do, while the code describes how it does it (see the sketch after this list).
- A binary is also code: the only code that computers directly understand.
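To make that “what” versus “how” distinction concrete, here is a minimal sketch in Python; the pricing function and the discount rule are invented for illustration. The implementation spells out how the total is computed, while the tests only state what the result should be.

```python
# "How": the implementation spells out the computation step by step.
def discounted_price(price: float, quantity: int) -> float:
    total = price * quantity
    if quantity >= 10:       # bulk orders get 10% off
        total *= 0.9
    return round(total, 2)


# "What": the tests state the expected behavior, not the algorithm.
def test_bulk_orders_get_ten_percent_discount():
    assert discounted_price(price=5.0, quantity=10) == 45.0


def test_small_orders_pay_full_price():
    assert discounted_price(price=5.0, quantity=2) == 10.0
```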
Creating these artifacts is time-consuming and requires extensive manual work, which is why software development is error-prone. In theory, they should all describe the same behavior[3]. But real life is very different, so we somehow need to ensure that they stay in sync.
We have two simple strategies to ensure consistency: automation and verification[4].
Automation
All of these artifacts exist for a reason. In other words, we need all of them. But what if, instead of creating them by hand, we generate them from one of the others? Then we regenerate the output from scratch every time the source changes. This way, we don’t have to hunt for the places that need updating; by definition, the source and the output will be in sync.
This approach has two preconditions:
- We need a way to convert one format to another effortlessly
- The generation needs to be deterministic
Compiling the code to a binary is a classic example. And indeed, we don’t write machine code by hand anymore. Because of this (and because we already saw that a binary is just low-level code), we’ll treat binaries as code in the rest of the article and won’t mention them separately.
A less obvious example is executable specifications—for instance, Gherkin or FitNesse.
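To give a feel for how this works, here is a hypothetical sketch in the style of pytest-bdd (behave or Cucumber follow the same pattern). The feature text, the step names, and the pricing module that holds the discounted_price function from the previous sketch are all invented for illustration, not taken from a real project.

```python
# orders.feature -- the Gherkin source, readable by the client:
#
#   Feature: Bulk discount
#     Scenario: Ten items earn a discount
#       Given a unit price of 5.0
#       When the customer orders 10 items
#       Then the total is 45.0

# test_orders.py -- step definitions wiring the prose to the code under test
from pytest_bdd import scenarios, given, when, then, parsers

from pricing import discounted_price  # hypothetical module with the earlier function

scenarios("orders.feature")  # generate a test for every scenario in the file


@given(parsers.parse("a unit price of {price:g}"), target_fixture="order")
def order(price):
    return {"price": price, "quantity": 0}


@when(parsers.parse("the customer orders {quantity:d} items"))
def place_order(order, quantity):
    order["quantity"] = quantity


@then(parsers.parse("the total is {total:g}"))
def check_total(order, total):
    assert discounted_price(order["price"], order["quantity"]) == total
```

Whenever someone edits the feature file, the next test run picks up the change automatically, which is exactly the kind of generated, always-in-sync artifact described above.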
But not everything is easy to automate. Think of the user stories. Two developers can’t even agree on using tabs or spaces[5]. Getting them to interpret user stories the same way and transform them into the same code is on another level entirely. But there is hope: coding guidelines, standards, and new tooling[6] constantly make these steps more consistent and effortless.
Generating one artifact from another has one more Achilles’ heel: errors in the source. The generated artifact will contain that error, too; for example, if the code has a typo in a print statement, the generated binary will print the message with the typo.
This is when we can turn the situation into an advantage: we can cross-verify the different behavior descriptions.
Verification
The Oxford Dictionary has the following definition for “verification”:
[Verification is] the act of showing or checking that something is true or accurate.
For us, verification means checking that our artifacts tell the same story. If we’re unsure of something, we must step up one abstraction level and check there. In other words, we verify user stories against the requirements, and code and tests against the user stories.
Can we automate those checks? Currently, it’s impossible to reliably verify the contents of free text, let alone compare two texts with different structures.
What about code and tests? They are formal; therefore, they should be easier to verify. And indeed, we write tests to verify the code. The beauty is that this goes both ways: the code can also verify the tests.
All of us have been in the situation where we stared at our code for hours, unable to understand why it didn’t pass the tests, only to discover that the mistake was in the test.
This is the second reason why we shouldn’t generate tests from the code: such tests only verify that the code works the way the code works. How useful. If we have a bug[7], the generated test will confirm the bug as the expected behavior. We’ll gain false confidence and make the bug even harder to catch.
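A toy illustration of this trap (both the buggy function and the tests are invented): a test derived from the code’s current output happily passes and certifies the bug, while a test written from the user story fails and exposes it.

```python
def monthly_rate(annual_rate: float) -> float:
    # Bug: the author divided by 10 instead of 12.
    return annual_rate / 10


# A test "generated" from the code only records what the code already does,
# so it passes and blesses the bug as expected behavior.
def test_monthly_rate_matches_current_implementation():
    assert monthly_rate(12.0) == 1.2


# A test written from the user story ("the annual rate is split across
# twelve months") fails, which is exactly what we want it to do.
def test_monthly_rate_matches_the_user_story():
    assert monthly_rate(12.0) == 1.0
```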
As a summary, we can visualize the verification paths like this: the requirements verify the user stories, the user stories verify both the code and the tests, and the code and the tests verify each other.
Reducing The Number of Levels
After all the headaches that all these processes involve, it’s natural that we want to simplify things by getting rid of unnecessary steps. Unfortunately, it’s not that easy.
The most significant difference between code and other description formats is the scope: code defines how to solve a problem, while the rest describes the expected behavior.
To close the gap, we have two possible solutions:
- Create a programming language that focuses on the “what” and not the “how”
- Develop tools that understand what humans want
Since our industry constantly looks for possible optimizations, these solutions aren’t new at all.
We call languages that focus on the “what” declarative languages. Two examples are SQL (1974) and Prolog (1972). In fact, SQL was created to let end users extract the data they need directly from the database. It worked out so well that even some developers can’t write proper queries today. How could we expect end users to use these technologies properly?
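To see the difference in practice, compare a declarative query with the imperative loop that produces the same answer. This is a minimal, self-contained sketch using Python’s built-in sqlite3 module; the table and the data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("bob", 40.0), ("carol", 75.0)],
)

# Declarative ("what"): describe the result; the engine decides how to get it.
declarative = conn.execute(
    "SELECT customer FROM orders WHERE total > 50 ORDER BY customer"
).fetchall()

# Imperative ("how"): spell out the scanning, filtering, and sorting ourselves.
rows = conn.execute("SELECT customer, total FROM orders").fetchall()
imperative = sorted(customer for customer, total in rows if total > 50)

assert [c for (c,) in declarative] == imperative == ["alice", "carol"]
```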
So, if we can’t speak formal languages, why don’t we teach computers to understand our language? That’s exactly what we want to achieve with large language models. Recent developments are impressive, but we are still far from the end goal.
There is a straightforward reason why neither approach has fully succeeded so far: humans.
We are bad at articulating what we want or need, and even worse at thinking in new paradigms. We also tend to overcomplicate things.
Solving all these problems is challenging, but AI has come a long way in the past few years. We can expect its development speed to increase, and today’s mundane tasks will be bad memories tomorrow.
But we live in the present, and we need solutions now. In an upcoming post, we’ll discuss how to work more efficiently with today’s AI tools.
Conclusion
We define software behavior on multiple abstraction levels because people in different roles have different needs. Those abstraction levels have deep interconnections that we haven’t been able to reduce in the past.
With the rise of AI, the game seems to be changing, and our tasks will be much easier in the future.
- Because we always write tests, right? ↩︎
- Presuming that they’re free of contradictions, which they usually aren’t ↩︎
- And we know what the difference is between theory and practice. In theory, nothing. ↩︎
- Simple, but not easy ↩︎
- Looking at you, AI ↩︎
- Every decent application contains at least one line of code and one bug ↩︎