-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Clp_jll #77
Add Clp_jll #77
Conversation
Codecov Report
@@ Coverage Diff @@
## master #77 +/- ##
==========================================
+ Coverage 40.52% 45.45% +4.93%
==========================================
Files 3 4 +1
Lines 575 660 +85
==========================================
+ Hits 233 300 +67
- Misses 342 360 +18
Continue to review full report at Codecov.
|
Given how sensitive the solvers are to having the right versions of everything, I think we should tightly bind versions for all the jll dependencies. The reason being that we will want to start upgrading the dependencies in the artifacts, but not yet want them to be pulled by Clp.jl and GLPK.jl. Edit: I think this binding of dependencies that Clp needs should be done in |
I assume that even if Clp 1.17 is not compatible with 1.16, we should not be getting the double free kind of malloc related errors in Julia. |
Agreed. On the plus side, Cbc is green: jump-dev/Cbc.jl#133. Which suggests that it might be an API issue regarding how we're calling Clp. I couldn't see any obvious offenders if you scan |
Right, but cbc was already using Clp 1.17 even before. So it is good to know it works and suggests our libraries are correctly built. I will take a look at the clp interface, which before was using 1.16 and see if I can discover what changed. |
I see I see this comment in
|
Yes, I saw the For now, we don't wrap |
Reading through |
Do you think that using our own ASL instead of the one Coin-OR wants to use could be an issue here? Edit: It looks like Coin-OR is using the same ASL from netlib. |
I tried running the tests locally; I can pin down the failure to occurring withing a call to However, it does not reliably occur during a particular testset. |
@mtanneau I was zeroing in on diff --git a/src/MOIWrapper.jl b/src/MOIWrapper.jl
index e3be6ba..8c00a0f 100644
--- a/src/MOIWrapper.jl
+++ b/src/MOIWrapper.jl
@@ -70,22 +70,13 @@ end
function MOI.empty!(model::Optimizer)
old_model = model.inner
-
- # Create new Clp object
model.inner = Clp.ClpModel()
-
# Copy parameters from old model into new model
- for (option, (getter, setter)) in CLP_OPTION_MAP
- value = getter(old_model)
- setter(model.inner, value)
+ for (_, (getter, setter)) in CLP_OPTION_MAP
+ setter(model.inner, getter(old_model))
end
-
model.optimize_called = false
-
- # Free old Clp object
- Clp.ClpCInterface.delete_model(old_model)
-
- return nothing
+ return
end |
Still happening, rarely in the same place. Sometimes a dozen testsets can run, sometimes only a few. I tried this diff
since No real change, but I'm wondering whether the GC might be involved in all this. |
finalizer(delete_model, prob) (also I don't know what this line out actually does, but I'm guessing it's GC-related) |
This is suspicious: https://travis-ci.org/github/JuliaOpt/Clp.jl/jobs/671755091#L431-L434 7f394344d000-7f39436b0000 r-xp 00000000 08:01 2055349 /home/travis/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so.1.14.5
7f39436b0000-7f39438af000 ---p 00263000 08:01 2055349 /home/travis/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so.1.14.5
7f39438af000-7f39438b7000 rw-p 00262000 08:01 2055349 /home/travis/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so.1.14.5
7f39438b7000-7f3943922000 rw-p 002ab000 08:01 2055349 /home/travis/.julia/artifacts/9cf32528460420740b8ba41eb7a2833b8975b4c2/lib/libClp.so.1.14.5 Looks like it built |
In one of the intermediate builds of Clp_jll I had a bad libClp version accidentally. But that should not be happening anymore. |
Well, this seems to be the culprit: Oscars-MBP:artifacts oscar$ ls */lib/libClp*
ls: */lib/libClp*: No such file or directory
Oscars-MBP:artifacts oscar$ ~/julia1.3 --project=/tmp/clp_jll
_
_ _ _(_)_ | Documentation: https://docs.julialang.org
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.3.1 (2019-12-30)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
(clp_jll) pkg> add Clp_jll
Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
Updating `/private/tmp/clp_jll/Project.toml`
[06985876] + Clp_jll v1.17.5+2
Updating `/private/tmp/clp_jll/Manifest.toml`
[ae81ac8f] + ASL_jll v0.1.1+2
[06985876] + Clp_jll v1.17.5+2
[be027038] + CoinUtils_jll v2.11.4+1
[e66e0078] + CompilerSupportLibraries_jll v0.3.3+0
[d00139f3] + METIS_jll v4.0.3+0
[656ef2d0] + OpenBLAS32_jll v0.3.9+1
[7da25872] + Osi_jll v0.108.6+1
[2a0f44e3] + Base64
[ade2ca70] + Dates
[8ba89e20] + Distributed
[b77e0a4c] + InteractiveUtils
[76f85450] + LibGit2
[8f399da3] + Libdl
[56ddb016] + Logging
[d6f4376e] + Markdown
[44cfe95a] + Pkg
[de0858da] + Printf
[3fa0cd96] + REPL
[9a3f8284] + Random
[ea8e919c] + SHA
[9e88b42a] + Serialization
[6462fe0b] + Sockets
[8dfed614] + Test
[cf7118a7] + UUIDs
[4ec0a83e] + Unicode
(clp_jll) pkg> st Clp_jll
Status `/private/tmp/clp_jll/Project.toml`
[06985876] Clp_jll v1.17.5+2
julia> exit()
Oscars-MBP:artifacts oscar$ ls */lib/libClp*
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.1.14.5.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.1.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.la
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.1.14.5.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.1.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.la
Oscars-MBP:artifacts oscar$ Although, weirdly (clp_jll) pkg> add Clp_jll@1.16
Updating registry at `~/.julia/registries/General`
Updating git-repo `https://github.com/JuliaRegistries/General.git`
Resolving package versions...
Installed Clp_jll ─ v1.16.11+1
Updating `/private/tmp/clp_jll/Project.toml`
[06985876] ↓ Clp_jll v1.17.5+2 ⇒ v1.16.11+1
Updating `/private/tmp/clp_jll/Manifest.toml`
[06985876] ↓ Clp_jll v1.17.5+2 ⇒ v1.16.11+1
(clp_jll) pkg> st Clp_jll
Status `/private/tmp/clp_jll/Project.toml`
[06985876] Clp_jll v1.16.11+1
julia> exit()
Oscars-MBP:artifacts oscar$ ls */lib/libClp*
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.1.14.5.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.1.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClp.la
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.1.14.5.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.1.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.dylib
396708ad058ae6c4f6d3945282a357e4cf8c8d63/lib/libClpSolver.la
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClp.1.13.11.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClp.1.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClp.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClp.la
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClpSolver.1.13.11.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClpSolver.1.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClpSolver.dylib
c02cd9aa2d544364be228dd23b40386bfc63ace8/lib/libClpSolver.la
Oscars-MBP:artifacts oscar$ and even
@ViralBShah should the |
I am using this commit: coin-or/Clp@29a3d29 Isn't that the right one? |
Certainly looks like the right commit. Any explanation for the weird filename versioning? |
I had printing turned on locally. It is time inside the solver (you can see the iterations take noticeably longer). Will test other instances. |
CoinBLAS literally just packages the reference BLAS. Are we using openblas with only 1 thread? That seems to be important for many Coin-OR libraries. In fact, I think they even try to detect it and change it internally (which probably will cause other issues for us). |
I think the old Builders also had |
A quick search through the Clp source shows a pretty liberal use of |
Yeah that can do it |
2x performance gap holds pretty consistently across instances. For example, on
|
New binaries published. Retry? |
On my laptop, it still takes about 50 seconds with the new binaries, and about 30 seconds with the older ones. The mystery deepens... Could it be that Clp 1.17 is slower than 1.16? |
I tried 1.16, and it is even slower. 65 seconds. So definitely something about the way we are building Clp. |
@giordano @staticfloat I am seeing the binaries produced by BinaryBuilder are running at half speed compared to the ones produced by the old ClpBuilder. Is this likely to be because of architecture specific optimizations in the compiler in the old builder? The other thing the old ClpBuilder did was link everything statically. Can that explain this difference? Any other ideas for why the performance is so different? If anything, we are linking against a fast BLAS in the BB binaries. |
I didn't follow the development of the old version of BinaryBuilder, but I'm not sure it was optimising more than the current version. One thing that we do now, though, is to not allow the use of flags like |
Yes, you can see the old builder script I linked above which I assume is how the current binaries are produced. It pretty much does not do doing anything like @juan-pablo-vielma Any insights you might have here would be valuable. |
Took a quick look. |
Yes, we do not need the MUMPS MPI thing, because I have already built a sequential mumps and applied their patches. @odow suggested we don't use ASL at all, and hence we removed it. Passes all tests. Static vs. dynamic linking is the big one right now, but calls to lapack and mumps are chunky enough that it shouldn't matter. Here's the build logs: https://dev.azure.com/JuliaPackaging/Yggdrasil/_build/results?buildId=3160&view=results |
Maybe looking at more solve logs? In the pair that @odow posted above everything is identical up to iteration 2530 and then things change. Not sure if there is a way to get more detailed logs from Clp |
The other thing that is different is we are using openblas in the new builders. |
On mac the new binaries are 2x slower, but on linux they are about 10% slower. |
@staticfloat helped me debug that the optimizations flags were being clobbered, and |
Ok, new binaries published. Please try it out. For me, the optimized binaries are twice as fast as the older ones!
|
Works for me! |
Thank you for your outstanding help. You've probably learnt far more than you wanted to about how Clp works. |
It's always fun to learn more about numerical software. And I am glad that these efforts paid off at least for some class of problems. Thanks for providing all the help and understanding. |
I think it will be much easier to update the Coin-OR solvers going forward. |
But can confirm that I have passing locally!
x-ref: JuliaPackaging/Yggdrasil#509
Closes #73
Closes #70