-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring & optimizing large BinaryPolynomial construction #992
base: main
Are you sure you want to change the base?
Changes from 6 commits
050b51f
0452df7
b209990
7ddbe4c
6c4cc54
727ca81
44e0921
6923abd
695b45a
887dec0
679e15f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,11 +16,14 @@ | |
|
||
import itertools | ||
import collections.abc as abc | ||
from collections import defaultdict | ||
|
||
from numbers import Number | ||
|
||
import numpy as np | ||
|
||
import cytoolz as tl | ||
|
||
from dimod.decorators import vartype_argument | ||
from dimod.sampleset import as_samples | ||
from dimod.utilities import iter_safe_relabels | ||
|
@@ -34,6 +37,22 @@ def asfrozenset(term): | |
return term if isinstance(term, frozenset) else frozenset(term) | ||
|
||
|
||
def freeze_term(term, vartype): | ||
return ( | ||
term | ||
if isinstance(term, frozenset) | ||
else ( | ||
frozenset(term) | ||
if vartype is Vartype.BINARY | ||
else frozenset( | ||
var | ||
for var, power in tl.frequencies(term).items() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like a big dependency to bring in for a 4 line function. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also suspect that in typical use cases, i.e. relatively low degree terms, that just doing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In [7]: term = tuple('abcde')
In [8]: %timeit for _ in frequencies(term): pass
1.31 µs ± 9.02 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [9]: %timeit for _ in ((v, term.count(v)) for v in term): pass
979 ns ± 7.79 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also wonder if Counter would do the same thing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That result seems to be from the interpreted %timeit for _ in toolz.frequencies(term).items(): pass
1.16 µs ± 23.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit for _ in ((v, term.count(v)) for v in term): pass
826 ns ± 46.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit for _ in cytoolz.frequencies(term).items(): pass
347 ns ± 10.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit for _ in ((v, term.count(v)) for v in set(term)): pass
1.01 µs ± 49.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit for _ in Counter(term).items(): pass
1.08 µs ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Cytoolz offers 2 benefits: clean code and speed - the original interpreted toolz library only provides the code quality part This could by demonstrated by refactoring a large piece of code that is performance bottleneck. I will do more refactoring and optimization in this file for now to see how many more cytoolz uses turn up. So far i have been doing it method by method to keep the PRs small but a combine PR would demonstrate the cumulative effect |
||
if power % 2 | ||
) | ||
) | ||
) | ||
|
||
|
||
class BinaryPolynomial(abc.MutableMapping): | ||
"""A polynomial with binary variables and real-valued coefficients. | ||
|
||
|
@@ -92,26 +111,11 @@ def __init__(self, poly, vartype): | |
poly = poly.items() | ||
|
||
# we need to aggregate the repeated terms | ||
self._terms = terms = {} | ||
for term, bias in poly: | ||
|
||
fsterm = asfrozenset(term) | ||
|
||
# when SPIN-valued, s^2 == 1, so we need to handle that case | ||
# in BINARY, x^2 == x | ||
if len(fsterm) < len(term) and vartype is Vartype.SPIN: | ||
new = set() | ||
term = tuple(term) # make sure it has .count | ||
for v in fsterm: | ||
if term.count(v) % 2: | ||
new.add(v) | ||
fsterm = frozenset(new) | ||
|
||
if fsterm in terms: | ||
terms[fsterm] += bias | ||
else: | ||
terms[fsterm] = bias | ||
|
||
self._terms = terms = defaultdict(int) | ||
for term, bias in poly: | ||
terms[freeze_term(term, vartype)] += bias | ||
|
||
self.vartype = vartype | ||
|
||
def __contains__(self, term): | ||
|
@@ -127,10 +131,10 @@ def __eq__(self, other): | |
except Exception: | ||
# not a polynomial | ||
return False | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not sure why these lines needed changes? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like an eccidental whitespace change. Will fix. I am not seeing any semantic changes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we normally have whitespace in blank lines up to indentation level? Seems like that is what I removed There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, in fact removing the whitespace is correct. It's more about keeping the blame/commit history clean. No big deal either way. |
||
self_terms = self._terms | ||
other_terms = other._terms | ||
|
||
return ( | ||
self.vartype == other.vartype | ||
and all( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while
cytoolz
is a cool package, I don't think we can add it as a core ocean dependency. It seems supported-ish, but, for instance, it only started releasing wheels in august/september.If/when it matures more, then we can consider adding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Cytoolz has been in conda since June 2016 though
Many science/ai/ml package are conda only/first.
I am also using these PRs to explore ideas even if they do get merged. Hope that is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More than ok! It's much easier to have these conversations on a PR.
For conda, we actually have an issue about this dwavesystems/dwave-ocean-sdk#144. I do think conda support in addition would be nice, but we also need to support pypi installs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I would love to have dwave code in conda. Outside dwave I use conda almost exclusively. One adwantage is that the offical repos are curated so the quality tends to be higher than pypi. Also the conda (& now also mamba) package manager maintains global environment consistecy so one install does not break previously installed packages. Conda will (up/down) grade previouly installed packages if necessary. Conda is also language agnostic so you can distribute C, R, Julia, Rust packages with it. Main drawback is it can not install directly from git.
The reason I brought cond up though was more that cytoolz is more active/mature/maintained than it's pypi history would suggest. I believe dask was also long time in conda before it got released on pypi.
Otherwise I think cytoolz & conda are independent issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also conda gives you MKL compiled numerical libraries by default, which in turn means that native vectorised numpy/scipy operation run multithreaded by default (without GIL)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also looks like cytool zhas been in PiPy since 2014 https://pypi.org/project/cytoolz/#history
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, though they only released an sdist until september. Which means that users without python-dev installed would not be able to install Ocean.
We have to be extremely careful about introducing new dependencies, especially ones with compiled code. Because they need to work across all of our supported OS/python versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I did not realize that the wheel availability was the issue. I thought the main concern was general maturity & support. If they are releasing wheels now what is the criterion for acceptance? Some period of time or some number of wheel releases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't a hard algorithm. But experience over the last few years has taught us to be extremely cautious when introducing new dependencies to Ocean. Especially compiled dependencies. A significant fraction of our support requests come from dependency management. Another place this comes up, each time Python releases a new version we can only release a supporting version of Ocean once all of our dependencies also update. So for some packages like NumPy, they are so fundamental that we cannot possibly avoid using them. But otherwise we definitely err, pretty far, on keeping our dependency tree as small as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I do think that (cy)toolz is the kind universal nuts&bolts library linke numpy, scipy & pandas that is worth considering since it can be used everywhere and can reduces the need for resorting to C & cython