The Test Heisenbug #245

Open
evillemez opened this issue Oct 23, 2014 · 0 comments

There is currently a bug that results in a segfault when running the test suite. It was first triggered by upgrading some dependencies, but even after a thorough investigation it was never clear what actually caused it. Since we are confident that the bug affects only the test suite, and not a production system, we are setting it aside for now, but documenting the fact that it happened. Given the odd nature of the issue, it is entirely possible that it will simply go away with future updates to libraries and the environment. For now, the segfault can be avoided by running the tests with the --process-isolation flag, which has been enabled by default in the test suite. This makes the test suite much slower, but reliable.

Below is a brief description of various avenues of inquiry, and conditions that seemed to affect whether or not the segfault condition was met.

Does affect tests

All of these things have happened, some of them consistently for a while. Everything in the following list seemingly affected the tests, but was also ruled out (as much as possible) as the sole cause of the problem.

  • Upgrading WebServicesBundle independently introduced a segfault at the end of the tests
  • Upgrading Symfony independently introduced a segfault in the middle of the tests
  • Upgrading the RabbitMQ lib and bundle introduced a segfault in the middle of the tests
  • Changing OS or PHP version - seemed to change where a fault occurred, but not whether it occurred
    • Tried in PHP 5.5.9, 5.5.16, 5.5.X(latest), 5.6.0, 5.6.2
    • Ubuntu 14.10
    • OS X 10.10
  • Running tests with --process-isolation (cures crashes, but not in PHPUnit 4.x)
  • disabling GC - runs out of memory, THEN faults when the process exits
  • inserting print statements into the tests will sometimes prevent them from failing, pushing the failure to the next test
  • commenting out code in blocks that do not execute would prevent or reintroduce the fault... sometimes
  • stopping rabbitmq-server to force a connection error makes the tests run and fail, but not fault... sometimes
  • returning early before connecting/sending messages to RabbitMQ makes the tests pass... sometimes
  • first run after cache clear changes where it crashes, but not whether it crashes
  • a Vagrant reload changed the position and type of the crash (zend_mm_heap corrupted vs segfault)
  • Integration tests that load more than one instance of the app container in a single test have seemed more likely to fail

The insane inconsistency certainly suggests it is a problem in the PHP interpreter - but tracing it has been impossible.
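Because nearly every observation above held only "sometimes", a single run says little about whether a change actually helped. Below is a minimal sketch in Python for tallying outcomes over repeated runs of one configuration. The runner path `vendor/bin/phpunit` and the helper names `classify` and `run_suite` are hypothetical, and the sketch assumes a crash shows up as the test process being killed by SIGSEGV rather than as an ordinary non-zero exit.

```python
import signal
import subprocess
from collections import Counter
from typing import Sequence


def classify(returncode: int) -> str:
    """Map a test-runner exit status to a coarse outcome label."""
    if returncode == 0:
        return "pass"
    # subprocess reports death-by-signal as a negative return code
    # (-11 for SIGSEGV); a shell wrapper instead reports 128 + signal (139).
    if returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV):
        return "segfault"
    if returncode < 0:
        # Killed by some other signal (e.g. SIGABRT).
        return f"signal {-returncode}"
    # Any other non-zero status is treated as an ordinary test failure/error.
    return "fail"


def run_suite(
    runs: int,
    args: Sequence[str] = (),
    runner: str = "vendor/bin/phpunit",  # hypothetical path; adjust as needed
) -> Counter:
    """Run the suite `runs` times with identical arguments and tally outcomes."""
    tally: Counter = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            [runner, *args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        tally[classify(proc.returncode)] += 1
    return tally
```

For example, comparing `run_suite(20)` against `run_suite(20, ["--process-isolation"])` would give crash frequencies for each setup, assuming isolation is not already forced on by the suite's own configuration (in which case both calls would behave the same).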

Does not affect tests

The inability to find anything in user-land code that triggers the segfault led me to believe the problem could be related to the underlying environment. Generally, though, modifying the environment did not seem to have much effect.

  • Unit tests - these seem to always run fine, which suggests to me that somewhere there may be shared state in integration tests
  • busying CPU with other processes
  • inserting sleep() calls into the tests to make them take longer
  • adding text comments, as opposed to lines of logic (even if they do not execute)
  • number of open file descriptors - stays fairly low during the test run, seemingly had no effect

Most things in the "Does affect" category were inconsistent, however. They all had some effect at some point, but not all the time, depending on what else had changed.

Future steps

There are general refactorings that need to happen in the test suite that could potentially make the problem go away:

  • abstract all fixture-related stuff out of WSB more cleanly into a separate bundle
  • mix/match traits for sharing functionality in test cases, rather than lots of inheritance
  • ensure no shared state between tests
  • try running in HHVM (once there is reliable mongo support; the current mongofill polyfill is very incomplete)