The Test Heisenbug #245

evillemez · 2014-10-23T15:51:54Z

There is currently a bug that causes a segfault when running the test suite. It was first triggered by upgrading some dependencies, but even after a thorough investigation it was never clear what actually caused it. Because we are confident that the bug affects only the test suite and not a production system, we are setting it aside for now, while documenting that it happened. Given the odd nature of the issue, it may well go away on its own with future updates to libraries and the environment. For now, the segfault can be avoided by running the tests with the --process-isolation flag, which is now enabled by default in the test suite. This makes the test suite much slower, but reliable.

Below is a brief description of the avenues of inquiry, and of the conditions that seemed to affect whether or not the segfault occurred.

Does affect tests

Each of these has happened at least once, and some happen consistently. Everything in the following list seemed to affect the tests, but each was also ruled out (as far as possible) as the sole cause of the problem.

Upgrading WebServicesBundle on its own introduced a segfault at the end of the tests
Upgrading Symfony on its own introduced a segfault in the middle of the tests
Upgrading the RabbitMQ library and bundle introduced a segfault in the middle of the tests
Changing OS or PHP version seemed to change where a fault occurred, but not whether it occurred
- Tried in PHP 5.5.9, 5.5.16, 5.5.x (latest), 5.6.0, 5.6.2
- Ubuntu 14.10
- OS X 10.10
Running tests with --process-isolation (cures crashes, but not in PHPUnit 4.x)
Disabling GC: runs out of memory, then faults when the process exits
Inserting print statements into the tests sometimes prevents them from failing, pushing the failure to the next test
Commenting out code in blocks that do not execute would prevent or reintroduce the fault... sometimes
Stopping rabbitmq-server to force a connection error makes the tests run and fail, but not fault... sometimes
Returning early before connecting to or sending messages to RabbitMQ makes the tests pass... sometimes
The first run after a cache clear changes where it crashes, but not whether it crashes
Before and after a vagrant reload, the position and type of crash changed (zend_mm_heap corrupted vs. segfault)
Integration tests that load more than one instance of the app container in a single test have seemed more likely to fail

This extreme inconsistency strongly suggests a problem in the PHP interpreter itself, but tracing it has been impossible.
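The issue does not record how GC was disabled for the experiment listed above. One way to reproduce it is to switch off the cycle collector from the PHPUnit bootstrap file (setting `zend.enable_gc=0` in php.ini is an alternative). The sketch below is a hypothetical bootstrap fragment, not code from the project; it also reports peak memory at shutdown, since that is where the fault surfaced.

```php
<?php
// Hypothetical tests/bootstrap.php fragment (not from the project):
// turn off PHP's cycle collector for the whole test run.
gc_disable();

// Report GC state and peak memory when the process exits, which is the
// point where the fault appeared in the GC-disabled runs.
register_shutdown_function(function () {
    fprintf(
        STDERR,
        "GC enabled: %s, peak memory: %.1f MB\n",
        gc_enabled() ? 'yes' : 'no',
        memory_get_peak_usage(true) / 1048576
    );
});
```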

Does not affect tests

The inability to find anything in user-land code that triggers the segfault led me to believe the problem could be related to the underlying environment. Generally, though, changing aspects of the environment did not seem to have much effect.

Unit tests: these always seem to run fine, which suggests to me that there may be shared state somewhere in the integration tests
Keeping the CPU busy with other processes
Inserting sleep() calls into the tests to make them take longer
Adding comments, as opposed to lines of logic (even if that logic does not execute)
Number of open file descriptors: stays fairly low during the test run and seemingly had no effect
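One way to watch the open file descriptor count during a run is to list the process's `/proc/self/fd` directory. This is a sketch under stated assumptions: `countOpenFileDescriptors()` is a hypothetical helper, not part of the suite, and it only works on Linux (such as the Ubuntu environment), since OS X has no `/proc`.

```php
<?php
// Hypothetical helper (not part of the test suite): counts the file
// descriptors currently open in this PHP process.
// Assumption: Linux only, because it relies on /proc/self/fd.
function countOpenFileDescriptors()
{
    if (!is_dir('/proc/self/fd')) {
        return null; // e.g. OS X, where /proc does not exist
    }

    // scandir() also returns '.' and '..', so drop them. The directory
    // handle scandir() itself opens may be included in the count, but it
    // adds the same overhead to every call, so differences stay accurate.
    $entries = array_diff(scandir('/proc/self/fd'), array('.', '..'));

    return count($entries);
}

// Example: compare counts around an operation, e.g. from a tearDown() hook.
$before = countOpenFileDescriptors();
$handle = tmpfile(); // opens one real temporary file
$after = countOpenFileDescriptors();
fclose($handle);

printf("open fds before: %d, after tmpfile(): %d\n", $before, $after);
```

Logging this number at the end of each integration test would show whether descriptors accumulate across the run.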

Most things in the "does affect" category were inconsistent, however. Each had some effect at some point, but not every time, depending on what else had changed. Therefore

Future steps

The test suite needs some general refactoring that could potentially make the problem go away:

Abstract all fixture-related code out of WSB (WebServicesBundle) more cleanly into a separate bundle
Mix and match traits to share functionality between test cases, rather than relying on lots of inheritance
Ensure no state is shared between tests
Try running under HHVM (once there is reliable mongo support; the current mongofill polyfill is very incomplete)
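The trait-based refactoring and the "no shared state" goal in the list above might look roughly like the following. Every class and trait name here is hypothetical, and `FakeContainer` is a stand-in rather than the real Symfony container. Each trait covers one concern, a test case pulls in only the traits it needs, and `setUp()` rebuilds everything so nothing survives from one test to the next. It is plain PHP (traits need PHP 5.4+), so it runs without PHPUnit.

```php
<?php
// Hypothetical names throughout: a sketch of composing test helpers with
// traits instead of a deep TestCase inheritance chain.

// Stand-in for the application container (NOT the real Symfony container).
class FakeContainer
{
    private $services = array();

    public function set($id, $service)
    {
        $this->services[$id] = $service;
    }

    public function has($id)
    {
        return array_key_exists($id, $this->services);
    }
}

// Concern 1: building the app container.
trait BuildsContainer
{
    protected $container;

    // Always create a new instance so nothing leaks between tests.
    protected function rebuildContainer()
    {
        $this->container = new FakeContainer();
    }
}

// Concern 2: fixture loading, kept separate from container setup.
trait LoadsFixtures
{
    protected $loadedFixtures = array();

    protected function loadFixture($name)
    {
        $this->loadedFixtures[] = $name;
    }

    protected function resetFixtures()
    {
        $this->loadedFixtures = array();
    }
}

// A test case mixes in only the traits it needs. In a real suite it would
// extend PHPUnit's test case class, and PHPUnit would call setUp() itself.
class QueueIntegrationCase
{
    use BuildsContainer;
    use LoadsFixtures;

    // Per-test reset: every test starts with a fresh container and no fixtures.
    public function setUp()
    {
        $this->rebuildContainer();
        $this->resetFixtures();
    }

    // Simulates one test body and returns what that test observed.
    public function runTest()
    {
        $this->setUp();
        $this->container->set('queue', new stdClass());
        $this->loadFixture('users');

        return array($this->container->has('queue'), count($this->loadedFixtures));
    }
}

$case = new QueueIntegrationCase();
var_dump($case->runTest()); // first "test"
var_dump($case->runTest()); // second "test" starts from the same clean state
```

Because each concern lives in its own trait, a test that needs fixtures but only one container instance can say so explicitly, which may also help isolate the multi-container integration tests that seemed more likely to fail.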

evillemez added bug question wontfix maintenance labels Oct 23, 2014