Using code dependency analysis to decide what to test
===================

By [Patrick Kusebauch](https://github.com/patrickkusebauch)

> [!IMPORTANT]
> Find out how to save 90+% of your test runtime and resources by eliminating 90+% of your tests while keeping your test
> coverage and confidence. Save over 40% of your CI pipeline runtime overall.

## Introduction

Tests are expensive to run, and the larger the code base, the more expensive it becomes to run them all. At some point
your test runtime might even become so long that it is impossible to run all the tests on every commit, because commits
might be arriving faster than your ability to test them. But how else can you be confident that your changes have not
broken existing code?

Even if your situation is not that dire yet, the time it takes to run the tests makes it hard to get fast feedback on
your changes. It might even force you to compromise on other development techniques: to lump several changes into larger
commits, because there is no time to test each small individual change (like type fixing, refactoring, documentation,
etc.). You might like to do trunk-based development, but use feature branches instead, so that you can open PRs and
test a whole slew of changes all at once. Your DORA metrics are compromised by your slow rate of development. Instead of
being reactive to customer needs, you have to plan your projects and releases months in advance, because that is how
often you are able to fully test all the changes.

Slow testing can have huge consequences for what the whole development process looks like. While speeding up test
execution per se is a very individual problem in every project, there is another technique that can be applied
everywhere: become more picky about which tests to run. So how do you decide what to test?

## Theory

### What is code dependency analysis?

Code dependency analysis is the process of (usually statically) analysing the code to determine what code is used by
other code. The most common example of this is analysing the specified dependencies of a project to determine potential
vulnerabilities. This is what tools like [OWASP Dependency Check](https://owasp.org/www-project-dependency-check/) do.
Another use case is to generate a Software Bill of Materials (SBOM) for a project.

There is one other use case that not many people talk about: using code dependency analysis to create a Directed
Acyclic Graph (DAG) of the various components/modules/domains of a project. This DAG can then be used to determine how
changes to one component will affect other components.
Imagine you have a project with the following structure of components:



The `Supportive` component depends on the `Analyser` and `OutputFormatter` components. The `Analyser` in turn depends on
3 other components: `Ast`, `Layer`, and `References`. Lastly, `References` depends on the `Ast` component.

If you make a change to the `OutputFormatter` component, you will want to run the **contract tests**
for `OutputFormatter` and the **integration tests** for `Supportive`, but no tests for `Ast`. If you make changes
to `References`, you will want to run the **contract tests** for `References` and the **integration tests**
for `Analyser` and `Supportive`, but no tests for `Layer` or `OutputFormatter`. In fact, there is no single module you
can change that would require you to run all the tests.

> [!NOTE]
> By **contract tests** I mean tests that exercise the defined API of the component; in other words, what the component
> promises (by contract) to its outside users to always hold true. Such a test mocks out all interaction with any other
> component.
>
> By contrast, **integration tests** in this context means tests that verify that the interaction with a component it
> depends on is properly programmed. For that reason, the underlying component is not mocked out.

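The decision logic in the example above is mechanical, so it can be sketched in a few lines. Here is an illustrative
bash sketch (bash 4+ for associative arrays): the component names come from the diagram, but everything else is made up
for the example, not deptrac output.

```bash
# Reversed edges of the example DAG: component -> components that depend on it.
declare -A dependents=(
  [Analyser]="Supportive"
  [OutputFormatter]="Supportive"
  [Ast]="Analyser References"
  [Layer]="Analyser"
  [References]="Analyser"
)

changed="References"
queue=("$changed")
seen=""
# Breadth-first walk: every transitive dependant needs its integration tests run.
while ((${#queue[@]})); do
  current=${queue[0]}
  queue=("${queue[@]:1}")
  for dep in ${dependents[$current]:-}; do
    case " $seen " in
      *" $dep "*) ;;                          # already collected
      *) seen="$seen $dep"; queue+=("$dep") ;;
    esac
  done
done

echo "contract tests:    $changed"
echo "integration tests:$seen"
```

Running it for a change to `References` collects `Analyser` directly and `Supportive` transitively, matching the
scenario described above.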
### How do you create the dependency DAG?

There are very few tools that can do this as of today, even though the concept is very simple. So simple that you can
do it yourself if there is no tool available for your language of choice.

You need to lex and parse the code to create an Abstract Syntax Tree (AST), and then walk the AST of every file to find
its dependencies. This is the same functionality your IDE performs any time you "Find references...", and what your
language server provides over the [LSP (Language Server Protocol)](https://en.wikipedia.org/wiki/Language_Server_Protocol).

You then group the dependencies by predefined components/modules/domains and combine them all into a single graph.

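As an illustration of that grouping step, here is a hedged bash sketch. It assumes components live
under `src/<Component>/`, and the file-level dependency pairs in `/tmp/file_deps.txt` are invented for the example; a
real tool would extract them from the AST.

```bash
# Derive the component of a file from its path under src/.
component_of() { sed -n 's#^src/\([^/]*\)/.*#\1#p' <<<"$1"; }

# Invented file-level dependency pairs: "<file> <file it uses>".
cat > /tmp/file_deps.txt <<'EOF'
src/References/ReferenceExtractor.php src/Ast/AstMap.php
src/Analyser/LayerAnalyser.php src/References/ReferenceExtractor.php
src/Analyser/LayerAnalyser.php src/Analyser/Result.php
EOF

# Collapse file-level edges into unique component-level edges, dropping self-edges.
edges=$(
  while read -r from to; do
    echo "$(component_of "$from") -> $(component_of "$to")"
  done </tmp/file_deps.txt | awk '$1 != $3' | sort -u
)
echo "$edges"
```

The three file-level edges collapse into just two component-level ones (`Analyser -> References`
and `References -> Ast`), because edges inside a component are irrelevant to the DAG.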
### How do you use the DAG to decide what to test?

Once you have the DAG, there is a 4-step process to run your testing:

1. Get the list of changed files (for example by running `git diff`).
2. Feed the list to the dependency analysis tool to get the list of changed components (and optionally the list of
   depending components as well, for integration testing).
3. Feed the list to your testing tool of choice to run the test-suites corresponding to each changed component.
4. Revel in how much time you have saved on testing.

## Practice

This is not just some theoretical idea, but something you can try out yourself today. If you are lucky, there is
already an open-source tool in your language of choice that lets you do it. If not, the following demonstration will
give you enough guidance to write it yourself. If you do, please let me know, I would love to see it.

The tool I have used for this demonstration is [deptrac](https://qossmic.github.io/deptrac/); it is written in PHP, for
PHP.

All you have to do to create a DAG is to specify the modules/domains:

```yaml
# deptrac.yaml
deptrac:
  paths:
    - src

  layers:
    - name: Analyser
      collectors:
        - type: directory
          value: src/Analyser/.*
    - name: Ast
      collectors:
        - type: directory
          value: src/Ast/.*
    - name: Layer
      collectors:
        - type: directory
          value: src/Layer/.*
    - name: References
      collectors:
        - type: directory
          value: src/References/.*
    - name: Contract
      collectors:
        - type: directory
          value: src/Contract/.*
```

### The 4-step process

Once you have the DAG, you can combine it with the list of changed files to determine what modules/domains to test. A
simple git command will give you the list of changed files:

```bash
git diff --name-only
```
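
Note that a plain `git diff` only lists uncommitted changes; in a CI pipeline you will more likely want everything that
changed relative to the target branch. A self-contained sketch in a throwaway repository (the `main`/`feature` branch
names are assumptions for the example; `git init -b` needs git 2.28+):

```bash
# Throwaway repo just to demonstrate the merge-base diff.
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main .
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m 'init'
git switch -qc feature
mkdir -p src/References
echo '<?php' > src/References/Foo.php
git add .
git -c user.email=demo@example.com -c user.name=demo commit -qm 'change'

# Everything that changed on this branch relative to main:
changed=$(git diff --name-only "$(git merge-base main HEAD)" HEAD)
echo "$changed"
```

Here the diff reports `src/References/Foo.php`, the one file added on the branch.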

You can then use this list to find the modules/domains that have changed, and then use the DAG to find the modules that
depend on those modules.

```bash
# to get the list of changed components
git diff --name-only | xargs php deptrac.php changed-files

# to get the list of changed components and the components depending on them
git diff --name-only | xargs php deptrac.php changed-files --with-dependencies
```

If you pick the popular PHPUnit framework for your testing and
follow [their recommendation for organizing tests](https://docs.phpunit.de/en/10.5/organizing-tests.html), it will be
very easy for you to create a test-suite per component. To run the tests for a component, you just have to pass the
parameter `--testsuite {componentName}` to the PHPUnit executable:

```bash
git diff --name-only |\
xargs php deptrac.php changed-files |\
sed 's/;/ --testsuite /g; s/^/--testsuite /g' |\
xargs ./vendor/bin/phpunit
```
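
To see what the `sed` step does, you can feed it a sample line by hand; the `References;Analyser` input below is a
made-up stand-in for the deptrac output:

```bash
# ';'-separated component list in, PHPUnit arguments out.
args=$(echo 'References;Analyser' | sed 's/;/ --testsuite /g; s/^/--testsuite /g')
echo "$args"   # --testsuite References --testsuite Analyser
```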

Or, if you have integration tests for the depending modules and decide to name your integration test-suites
`{componentName}Integration`:

```bash
git diff --name-only |\
xargs php deptrac.php changed-files --with-dependencies |\
sed '1s/;/ --testsuite /g; 2s/;/Integration --testsuite /g; /./ { s/^/--testsuite /; 2s/$/Integration/; }' |\
sed ':a;N;$!ba;s/\n/ /g' |\
xargs ./vendor/bin/phpunit
```
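
Again, the transformation is easiest to understand on a made-up sample. With two input lines (changed components, then
depending components), the first `sed` adds the `Integration` suffix on the second line only, and the second `sed`
(GNU sed syntax) joins the lines into one argument list:

```bash
args=$(
  printf 'References\nAnalyser;Supportive\n' |
    sed '1s/;/ --testsuite /g; 2s/;/Integration --testsuite /g; /./ { s/^/--testsuite /; 2s/$/Integration/; }' |
    sed ':a;N;$!ba;s/\n/ /g'
)
echo "$args"
```

This yields `--testsuite References --testsuite AnalyserIntegration --testsuite SupportiveIntegration`, i.e. contract
tests for the changed component and integration tests for its dependants.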

### Real life comparison results

I have run the following script on a set of changes to compare what the savings were:

```shell
# Compare timing
iterations=10

total_time_baseline=0
for ((i = 1; i <= $iterations; i++)); do
  # Time a full PHPUnit run
  runtime=$(
    TIMEFORMAT='%R'
    time (./vendor/bin/phpunit >/dev/null 2>&1) 2>&1
  )

  milliseconds=$(echo "$runtime" | tr ',' '.')
  total_time_baseline=$(echo "$total_time_baseline + $milliseconds * 1000" | bc)
done

average_time_baseline=$(echo "$total_time_baseline / $iterations" | bc)
echo "Average time (not using deptrac): $average_time_baseline ms"

# Compare test coverage
tests_baseline=$(./vendor/bin/phpunit | grep -oP 'OK \(\K\d+')
echo "Executed tests (not using deptrac): $tests_baseline tests"

echo ""

total_time_deptrac=0
for ((i = 1; i <= $iterations; i++)); do
  # Time the deptrac-filtered run
  runtime=$(
    TIMEFORMAT='%R'
    time (
      git diff --name-only |
        xargs php deptrac.php changed-files --with-dependencies |
        sed '1s/;/ --testsuite /g; 2s/;/Integration --testsuite /g; /./ { s/^/--testsuite /; 2s/$/Integration/; }' |
        sed ':a;N;$!ba;s/\n/ /g' |
        xargs ./vendor/bin/phpunit >/dev/null 2>&1
    ) 2>&1
  )

  milliseconds=$(echo "$runtime" | tr ',' '.')
  total_time_deptrac=$(echo "$total_time_deptrac + $milliseconds * 1000" | bc)
done

average_time_deptrac=$(echo "$total_time_deptrac / $iterations" | bc)
echo "Average time (using deptrac): $average_time_deptrac ms"
tests_execution_deptrac=$(git diff --name-only |
  xargs php deptrac.php changed-files --with-dependencies |
  sed '1s/;/ --testsuite /g; 2s/;/Integration --testsuite /g; /./ { s/^/--testsuite /; 2s/$/Integration/; }' |
  sed ':a;N;$!ba;s/\n/ /g' |
  xargs ./vendor/bin/phpunit)
tests_deptrac=$(echo "$tests_execution_deptrac" | grep -oP 'OK \(\K\d+')
tests_execution_deptrac_time=$(echo "$tests_execution_deptrac" | grep -oP 'Time: 00:\K\d+\.\d+')
echo "Executed tests (using deptrac): $tests_deptrac tests"

execution_time=$(echo "$tests_execution_deptrac_time * 1000" | bc | awk '{gsub(/\.?0+$/, ""); print}')
echo "Time to find tests to execute (using deptrac): $(echo "$average_time_deptrac - $tests_execution_deptrac_time * 1000" | bc | awk '{gsub(/\.?0+$/, ""); print}') ms"
echo "Time to execute tests (using deptrac): $execution_time ms"

echo ""

percentage=$(echo "scale=3; $tests_deptrac / $tests_baseline * 100" | bc | awk '{gsub(/\.?0+$/, ""); print}')
echo "Percentage of tests not needing execution given the changed files: $(echo "100 - $percentage" | bc)%"
percentage=$(echo "scale=3; $execution_time / $average_time_baseline * 100" | bc | awk '{gsub(/\.?0+$/, ""); print}')
echo "Time saved on testing: $(echo "$average_time_baseline - $execution_time" | bc) ms ($(echo "100 - $percentage" | bc)%)"
percentage=$(echo "scale=3; $average_time_deptrac / $average_time_baseline * 100" | bc | awk '{gsub(/\.?0+$/, ""); print}')
echo "Time saved overall: $(echo "$average_time_baseline - $average_time_deptrac" | bc) ms ($(echo "100 - $percentage" | bc)%)"
```

with the following results:

```
Average time (not using deptrac): 984 ms
Executed tests (not using deptrac): 721 tests

Average time (using deptrac): 559 ms
Executed tests (using deptrac): 21 tests
Time to find tests to execute (using deptrac): 491 ms
Time to execute tests (using deptrac): 68 ms

Percentage of tests not needing execution given the changed files: 97.1%
Time saved on testing: 916 ms (93.1%)
Time saved overall: 425 ms (43.2%)
```
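
The reported percentages follow directly from the raw numbers above, which makes for an easy sanity check:

```bash
# 21 of 721 tests ran; 68 ms of 984 ms test time; 559 ms total pipeline vs 984 ms.
awk 'BEGIN { printf "tests skipped:      %.1f%%\n", 100 - 21/721*100 }'
awk 'BEGIN { printf "test time saved:    %.1f%%\n", 100 - 68/984*100 }'
awk 'BEGIN { printf "overall time saved: %.1f%%\n", 100 - 559/984*100 }'
```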

Some interesting observations:

- Only **3% of the tests** that normally run on the PR needed to be run to cover the change with tests. That is a
  **saving of 700 tests** in this case.
- **Test execution time has decreased by 93%**. You are mostly left with the constant cost of the set-up and tear-down
  of the testing framework.
- **Pipeline overall time has decreased by 43%**. Since the analysis time grows orders of magnitude more slowly than
  the test runtime (it is not completely constant: more files still means more to statically analyse), this number is
  only bound to get better the larger the codebase is.

And these savings apply to arguably the worst possible SUT (System Under Test):

- It is a **small application**, so it is hard to get the savings of skipping the testing of a vast number of
  components, as would be the case for large codebases.
- It is a **CLI script**, so it has no database and no external APIs to call, and has minimal slow I/O tests. Those are
  the tests you want to skip the most, and they are barely present here.

## Conclusion

Code dependency analysis is a very useful tool for deciding what to test. It is not a silver bullet, but it can help
you reduce the number of tests you run in your CI pipeline and the time it takes to run them. It is not a replacement
for a good test suite, but it can make the test suite you have more efficient.

## References

- [deptrac](https://qossmic.github.io/deptrac/)
- [deptracpy](https://patrickkusebauch.github.io/deptracpy/)

See you on [Day 16](day16.md).