Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance and reliability #9

Open
jeskowagner opened this issue Oct 19, 2022 · 1 comment
Open

Performance and reliability #9

jeskowagner opened this issue Oct 19, 2022 · 1 comment

Comments

@jeskowagner
Copy link

jeskowagner commented Oct 19, 2022

Hi!

This issue is carried over from the slingshot repo.

My question there had focused on whether pyslingshot was still under active development, thanks @mossjacob for the clarification on that it indeed is. Additionally, I raised two points:

  • Speed of pyslingshot vs. slingshot, with the former being more optimized but the latter benefitting greatly from the approx_points argument, making it faster in practice. If this functionality could be ported that would be amazing.
  • Reliability/user-friendliness: I have not managed to run pyslingshot on a dataset with ~15000 cells and 10 PCs as input. The remainder of this issue will address this point.

Unfortunately I cannot share the tested data set at this stage. However, here are some bits to give you some intuition of the data.

PCA

The data set was center-scaled and dimensionality reduced with PCA and we selected the first 10 PCs using the "elbow" method. Clustering is done with Leiden on a kNN graph built on those PCs. I am aware that we are not really seeing any clear differentiation trajectory, however, I am developing general methods and am using this (imperfect) dataset as proof of concept. Colors in this plot represent the clusters.
image

Running pyslingshot

My call to pyslingshot looks like this:

from slingshot import Slingshot
slingshot = Slingshot(data, cluster_labels_onehot, start_node=start_node)
slingshot.fit(num_epochs=1)

Prints 8 times the following:

UserWarning: 
The maximal number of iterations maxit (set to 20 by the program) allowed for finding a smoothing spline with fp=s has been reached: s too small.
There is an approximation returned but the corresponding weighted sum of squared residuals does not satisfy the condition abs(fp-s)/s < tol.

Before finally erroring out with:

----> 1 slingshot.fit(num_epochs=1)

File ~/.../slingshot/slingshot.py:166, in Slingshot.fit(self, num_epochs, debug_axes)
    163 self.debug_plot_avg = False
    165 if epoch == num_epochs - 1:  # plot curves
--> 166     self.plotter.clusters(self.debug_axes[1, 1], s=2, alpha=0.5)
    167     self.plotter.curves(self.debug_axes[1, 1], self.curves)

TypeError: 'NoneType' object is not subscriptable

Not sure why it's still trying to plot to the debug axes when I am not in debug mode. slingshot.debug_level is 0 in this example.

So I retried with the debug options on, following this tutorial:

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
custom_xlim = (-12, 12)
custom_ylim = (-12, 12)

slingshot = Slingshot(data, cluster_labels_onehot, start_node=start_node, debug_level='verbose')

slingshot.fit(num_epochs=1, debug_axes=axes)
Click for the resulting error message
/.../scipy/interpolate/_fitpack2.py:280: UserWarning: 
The maximal number of iterations maxit (set to 20 by the program)
allowed for finding a smoothing spline with fp=s has been reached: s
too small.
There is an approximation returned but the corresponding weighted sum
of squared residuals does not satisfy the condition abs(fp-s)/s < tol.
  warnings.warn(message)
  0%|                                                                                                                                                                                                                                 | 0/1 [00:31<?, ?it/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [31], line 7
      4 # plt.setp(axes, xlim=custom_xlim, ylim=custom_ylim)
      6 slingshot = Slingshot(adata.obsm["X_pca"][:,:10], cluster_labels_onehot, start_node=20, debug_level='verbose')
----> 7 slingshot.fit(num_epochs=1, debug_axes=axes)

File ~/.../slingshot/slingshot.py:147, in Slingshot.fit(self, num_epochs, debug_axes)
    144 self.calculate_cell_weights()
    146 # Fit principal curve for all lineages using existing curves
--> 147 self.fit_lineage_curves()
    149 # Ensure starts at 0
    150 for l_idx, lineage in enumerate(self.lineages):

File ~/.../slingshot/slingshot.py:258, in Slingshot.fit_lineage_curves(self)
    256         path_from = (self.data[i][0], curve.points_interp[i][0])
    257         path_to = (self.data[i][1], curve.points_interp[i][1])
--> 258         self.debug_axes[0, 1].plot(path_from, path_to, c='black', alpha=alphas[i])
    259     self.debug_axes[0, 1].plot(curve.points_interp[curve.order, 0],
    260                                curve.points_interp[curve.order, 1], label=str(lineage))
    262 d_sq, dist = curve.project_to_curve(self.data, curve.points_interp[curve.order])

File ~/.../matplotlib/axes/_axes.py:1662, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
   1419 """
   1420 Plot y versus x as lines and/or markers.
   1421 
   (...)
   1659 (``'green'``) or hex strings (``'#008000'``).
   1660 """
   1661 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1662 lines = [*self._get_lines(*args, data=data, **kwargs)]
   1663 for line in lines:
   1664     self.add_line(line)

File ~/.../matplotlib/axes/_base.py:311, in _process_plot_var_args.__call__(self, data, *args, **kwargs)
    309     this += args[0],
    310     args = args[1:]
--> 311 yield from self._plot_args(
    312     this, kwargs, ambiguous_fmt_datakey=ambiguous_fmt_datakey)

File ~/.../matplotlib/axes/_base.py:544, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs, ambiguous_fmt_datakey)
    542     return list(result)
    543 else:
--> 544     return [l[0] for l in result]

File ~/.../matplotlib/axes/_base.py:544, in <listcomp>(.0)
    542     return list(result)
    543 else:
--> 544     return [l[0] for l in result]

File ~/.../matplotlib/axes/_base.py:537, in <genexpr>(.0)
    534 else:
    535     labels = [label] * n_datasets
--> 537 result = (make_artist(x[:, j % ncx], y[:, j % ncy], kw,
    538                       {**kwargs, 'label': label})
    539           for j, label in enumerate(labels))
    541 if return_kwargs:
    542     return list(result)

File ~/.../matplotlib/axes/_base.py:351, in _process_plot_var_args._makeline(self, x, y, kw, kwargs)
    349 default_dict = self._getdefaults(set(), kw)
    350 self._setdefaults(default_dict, kw)
--> 351 seg = mlines.Line2D(x, y, **kw)
    352 return seg, kw

File ~/.../matplotlib/_api/deprecation.py:454, in make_keyword_only.<locals>.wrapper(*args, **kwargs)
    448 if len(args) > name_idx:
    449     warn_deprecated(
    450         since, message="Passing the %(name)s %(obj_type)s "
    451         "positionally is deprecated since Matplotlib %(since)s; the "
    452         "parameter will become keyword-only %(removal)s.",
    453         name=name, obj_type=f"parameter of {func.__name__}()")
--> 454 return func(*args, **kwargs)

File ~/.../matplotlib/lines.py:393, in Line2D.__init__(self, xdata, ydata, linewidth, linestyle, color, gapcolor, marker, markersize, markeredgewidth, markeredgecolor, markerfacecolor, markerfacecoloralt, fillstyle, antialiased, dash_capstyle, solid_capstyle, dash_joinstyle, solid_joinstyle, pickradius, drawstyle, markevery, **kwargs)
    389 self.set_markeredgewidth(markeredgewidth)
    391 # update kwargs before updating data to give the caller a
    392 # chance to init axes (and hence unit support)
--> 393 self._internal_update(kwargs)
    394 self._pickradius = pickradius
    395 self.ind_offset = 0

File ~/.../matplotlib/artist.py:1186, in Artist._internal_update(self, kwargs)
   1179 def _internal_update(self, kwargs):
   1180     """
   1181     Update artist properties without prenormalizing them, but generating
   1182     errors as if calling `set`.
   1183 
   1184     The lack of prenormalization is to maintain backcompatibility.
   1185     """
-> 1186     return self._update_props(
   1187         kwargs, "{cls.__name__}.set() got an unexpected keyword argument "
   1188         "{prop_name!r}")

File ~/.../matplotlib/artist.py:1162, in Artist._update_props(self, props, errfmt)
   1159             if not callable(func):
   1160                 raise AttributeError(
   1161                     errfmt.format(cls=type(self), prop_name=k))
-> 1162             ret.append(func(v))
   1163 if ret:
   1164     self.pchanged()

File ~/.../matplotlib/artist.py:983, in Artist.set_alpha(self, alpha)
    980     raise TypeError(
    981         f'alpha must be numeric or None, not {type(alpha)}')
    982 if alpha is not None and not (0 <= alpha <= 1):
--> 983     raise ValueError(f'alpha ({alpha}) is outside 0-1 range')
    984 self._alpha = alpha
    985 self.pchanged()

ValueError: alpha (nan) is outside 0-1 range

Through plt.show() I can see that two of the four debug plots still ran through:
image

I can see that this may be caused by an erroneous linage. See the first entry of the following output

[np.isnan(slingshot.curves[i].pseudotimes_interp).any() for i in range(len(slingshot.curves))]
# [True, False, False, False, False, False, False, False]

The corresponding lineage is: Lineage[20, 3, 10, 14, 18]


Long story short: I cannot get pyslingshot to run to completion in my hands, probably caused by this odd lineage there. When I export this exact data, run it through R's slingshot with approx_points set to default it finishes successfuly, and quicker than pyslingshot takes to get to this error.

This account is in no way meant to be discouraging or aggresive, and I am sorry if this wall of text conveys frustration. I am trying to lay out my struggle, but want to emphasise that I appreciate your development of pyslingshot. Also, I realise that I may have a very specific data set and usage here, which may well not fit into the scope of pyslingshot. I am content wrapping R's slingshot if necessary and in fact do so at the moment. But if there was a way to move past these issues and use approx_points, I would definitely move to pyslingshot to save the roundtrip to R.

Thanks! Best, Jesko

@mossjacob
Copy link
Owner

Dear Jesko,

Thank you so much for going into such detail! It will greatly help me when debugging this. I'll get back to you once I've done some investigating.

Best wishes,
Jacob

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants