REPL integration test failures with macOS julia nightly #1111

Open
lgoettgens opened this issue Jan 6, 2025 · 8 comments
Comments

@lgoettgens
Member

lgoettgens commented Jan 6, 2025

First observed in https://github.com/oscar-system/GAP.jl/actions/runs/12625852387/job/35178150930.
With Julia 1.12.0-DEV.1829 it succeeded, but with 1.12.0-DEV.1836 it failed. This leaves JuliaLang/julia@638dacc...89afe20 as the candidate commit range.

spawn julia --startup-file=no --history-file=no --banner=no --project=@.
�[?2004h
julia> 

us
julia> 
us
ingi
julia> 
usi
ngn
julia> 
usin
gg
julia> 
using
 GA
julia> 
using GA
PP
julia> 
using GAP

julia> 
using GAP

2004l1049h7h1049l
1l25l12l25h


julia> 

2004h
julia> 

 TIMEOUT 
Error: Process completed with exit code 2.
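
For context, the CI job drives a fresh REPL through an expect script over a pseudo-terminal. A rough local approximation in Julia, using plain pipes instead of a pty (so it may well not trigger the same race), could look like the sketch below; the timeout value and the typed input are illustrative, and it assumes GAP is installed in the active project.

# Rough local-repro sketch, not the actual CI test: the real test runs through
# expect with a pseudo-terminal, whereas this uses plain pipes and therefore
# may not exhibit the same race. Timeout and input are illustrative.
function try_repl_startup(; timeout = 300)
    cmd = `julia --startup-file=no --history-file=no --banner=no --project=@. -i`
    p = open(cmd, "r+")                # spawn julia with piped stdin/stdout
    write(p, "using GAP\nexit()\n")    # the same input the expect script types
    killer = Timer(timeout) do _       # kill the process if it hangs
        process_running(p) && kill(p)
    end
    wait(p)
    close(killer)
    return success(p)                  # false if it was killed or errored
end
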
@benlorenz
Member

benlorenz commented Jan 6, 2025

I have seen this happen before and then disappear again, e.g. https://github.com/oscar-system/GAP.jl/actions/runs/12467506675/job/34796923548?pr=1067 with 1.12.0-DEV.1792.

@fingolfin
Member

@lgoettgens how systematically was this range obtained? Is this from CI logs? I wonder how reliable "it did not crash" is, i.e. maybe this error only appears with a certain probability, and one should re-run the test a couple of times with a specific Julia version?

@lgoettgens
Member Author

lgoettgens commented Jan 6, 2025

@lgoettgens how systematically was this range obtained? Is this from CI logs? I wonder how reliable "it did not crash" is, i.e. maybe this error only appears with a certain probability, and one should re-run the test a couple of times with a specific Julia version?

Just from CI logs of the scheduled jobs on master. I can try to bisect this in the next few days.

@fingolfin
Member

It would indeed be good to bisect where this comes from.

@lgoettgens
Member Author

Would indeed be good to bisect where this comes from.

Thanks for reminding me. I am on it and will report back once finished.

@lgoettgens
Member Author

After some research and debugging, this does not seem to be a regression; instead it has always been a random failure. In https://github.com/oscar-system/GAP.jl/actions/runs/12832003066, I ran the relevant part of the CI job with Julia JuliaLang/julia@ff97facbc94, which is exactly the version with which the macos-nightly job in #1595 succeeded right before merging. Even with this version, 3 out of 10 identical jobs failed due to a timeout in the expect test suite.
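
A minimal local equivalent of such an experiment, assuming the expect script from the CI job is available, could look like the following sketch (the path etc/tests_interactive.expect is a placeholder, not the actual file name):

# Hedged sketch: re-run the expect-driven REPL test repeatedly and count how
# often it fails; the script path is a placeholder for what the CI job invokes.
runs = [success(run(ignorestatus(`expect etc/tests_interactive.expect`))) for _ in 1:10]
println(count(!, runs), " of ", length(runs), " runs timed out or otherwise failed")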

Under these circumstances, I am not sure how sensible a full bisection is, or rather, how far back in time I would have to go to find a good commit.

@fingolfin
Member

instead has always been a random failure

By "always" do you mean "since we added the expect-based tests"? AFAIK the failures are, however, always with nightly, not with 1.10 or 1.11, right?

Perhaps the expect tests are simply too fragile and we need to make them more resilient somehow, but to do that it would be really helpful to know what actually triggers this... And to test a potential fix we need to reproduce the issue reliably, ideally locally for a quicker turnaround.

Of course, if we can reproduce it locally, then we could also bisect Julia locally. sigh
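
If a local reproduction does materialize, the bisection could then be automated with git bisect run. A minimal sketch of such a driver follows, assuming a Julia source checkout; the build command, run count, and the run_repl_test placeholder are illustrative assumptions rather than the actual setup.

#!/usr/bin/env julia
# Hedged `git bisect run` driver sketch. Assumptions: it is executed inside the
# Julia source checkout being bisected, and run_repl_test() stands in for
# whatever local reproduction we end up with; all constants are illustrative.
# Usage: git bisect start <bad> <good>; git bisect run julia bisect_check.jl

const RUNS = 10   # the failure is intermittent, so try several times per commit

# Build this commit; tell bisect to skip it (exit code 125) if the build fails.
success(run(ignorestatus(`make -j8`))) || exit(125)

# Placeholder for the actual reproduction, e.g. the expect script pointed at the
# freshly built ./julia instead of the default one.
run_repl_test() = success(run(ignorestatus(`expect etc/tests_interactive.expect ./julia`)))

for _ in 1:RUNS
    # A single timeout is enough to mark this commit as bad (exit code 1).
    run_repl_test() || exit(1)
end
exit(0)   # every run passed: commit is good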

@lgoettgens
Member Author

I can try to bisect nightly against the 1.11 branch-off point or something like that (again with many identical CI jobs, considering it a failure if any one of them times out).
