Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify behaviour/documentation of transform() #217

Closed
st-- opened this issue Jan 9, 2021 · 22 comments · Fixed by #276
Closed

Clarify behaviour/documentation of transform() #217

st-- opened this issue Jan 9, 2021 · 22 comments · Fixed by #276

Comments

@st--
Copy link
Member

st-- commented Jan 9, 2021

There is probably some matter of "getting used to it", but for a user new to this package, having to specify the inverse lengthscale seems deeply unintuitive compared to other GP packages.

At least transform(k, ScaleTransform(2.0)) is still explicitly saying what it is (and there could be a LengthscaleTransform() that divides instead of multiplying...), but I think it is not very helpful to new users that transform(k, 2.0) will give you a kernel with lengthscale 0.5.

Moreover, the syntax TransformedKernel(kernel, ...) suggests to me that the transform is changing something about the kernel, rather than about the inputs. (And ScaleTransform on inputs vs ScaledKernel on outputs is somewhat ambiguous/confusing as well.) If not just renaming it to the more verbose input_transform or similar, what other ways could we clarify what is actually going on here?

Comment from @willtebbutt:

I think I agree that we could do more here to signpost what's going on here / offer a set-the-lengthscale option. I also think we could probably do with providing a bit more of an explanation for the rationale behind the transform approach to setting length scales, but that's a job for a different PR.

(Also, would be good to document/demonstrate how to be able to set bijectors on parameters within this framework. Should that be a separate issue?)

Originally posted by @st-- in #213 (comment)

@willtebbutt
Copy link
Member

Moreover, the syntax TransformedKernel(kernel, ...) suggests to me that the transform is changing something about the kernel, rather than about the inputs.

This is a really good point. I agree that going for more verbosity by renaming to input_transform would make sense. The convention I've adopted before (in Stheno.jl) is to call linear transformations of the inputs stretch rather than scale -- I don't think this is a great option though.

At least transform(k, ScaleTransform(2.0)) is still explicitly saying what it is (and there could be a LengthscaleTransform() that divides instead of multiplying...), but I think it is not very helpful to new users that transform(k, 2.0) will give you a kernel with lengthscale 0.5.

Hmmm this is an interesting point. Would you prefer to disallow this default in favour of insisting that the more verbose version is used? (I could live with that, and I agree that it might be clearer). Moreover, I agree that a LengthscaleTransform object would be a good idea.

@devmotion
Copy link
Member

Moreover, I agree that a LengthscaleTransform object would be a good idea.

It seems it would be sufficient to define LengthscaleTransform(x) = ScaleTransform(inv(x)), even though it would be a bit inconsistent since LengthscaleTransform looks like a constructor of an object of type LengthscaleTransform.

@willtebbutt
Copy link
Member

willtebbutt commented Jan 9, 2021

Agreed, but in general I really dislike doing the looks-like-a-constructor-but-isn't thing as I find it misleading. so would prefer that we either

  1. actually introduce a LengthscaleTransform type, or
  2. provide a function lengthscale(x) = ScaleTransform(inv(x)).

edit: or lengthscale_transform.

@theogf
Copy link
Member

theogf commented Jan 9, 2021

Or we can go back to the idea of the macro :D . For example having a @kernel SqExponentialKernel() lengthscale=2.0 would be sufficiently explicit and easy enough to use for for new users. If one wants to make more complicated constructs it would not work anymore but then we have the more standard constructors.

@st-- st-- mentioned this issue Jan 9, 2021
3 tasks
@willtebbutt
Copy link
Member

I guess? It just seems to me that it's no easier / clearer to write @kernel SqExponentialKernel() lengthscale=2.0 than it is input_transform(SqExponentialKernel(), LengthScale(2.0)). Maybe I've been staring at this stuff for too long?

@theogf
Copy link
Member

theogf commented Jan 9, 2021

I think we are all super biased right now 😅😅😅

@devmotion
Copy link
Member

I think the name @transform or @input_transform would be a bit more appropriate - in which case both approaches would look almost identical and the advantage of a macro is not really obvious anymore to me. The function approach seems easier to maintain (it exists anyway?) and would also cover more complicated use cases.

@theogf
Copy link
Member

theogf commented Jan 15, 2021

@devmotion @kernel has three letters less than @transform 😆😆😆

More seriously, for me the advantage of a macro is that it allows more liberties for making a convenient API:
For example when we moved to TransformedKernel for using lengthscales, one consequence is that one could not do SqExponentialKernel(3.0) because that would be inconsistent (we do not want it to create a TransformedKernel, although that is the most common and expected way to create a kernel (given other packages).
We also considered having lowercased constructors but that's also very cumbersome to maintain and would probably pollute the whole space.
Having a macro would allow to have a convenient way of creating typical kernels (SqExponentialKernel with lengthscale) without having to sacrifice consistency.

I really see it as an API for new users/beginners who just want to get their RBF:
"You only want to create an RBF with a lengthscale? Just call @kernel SqExponentialKernel lengthscale=2.0 and BAM! you're done!"
You don't even have to understand the whole Transform concept which, in my opinion is quite uncommon.
If they want to go further/create more complex things, we have the whole machinery for it anyway.

@devmotion
Copy link
Member

I am still not convinced 😄 😄

I agree that it is a bit cumbersome to create a kernel with lengthscale currently - but I think the also @kernel SqExponentialKernel lengthscale=2.0 (or however the macro would be called) is not a very compact expression. And arguably, macros feel a bit "magical" and introduce some non-standard syntax.

I get the impression that maybe, as suggested above, we should just add a function that adds a lengthscale transformation to the inputs. Basically it would be defined as

with_lengthscale(kernel::Kernel, l::Real) = transform(kernel, inv(l))

I am not sure about the name (I guess it should be both explicit and compact enough), @willtebbutt's suggestion above was lengthscale_transform. This would

  • handle the common case of setting a lengthscale,
  • would work with any kernel no matter how it is constructed,
  • would not require users to understand or know about transform and Transforms,
  • would be easier to implement,
  • and with_lengthscale(SqExponentialKernel(), 3.0) would be only one character more than @kernel SqExponentialKernel lengthscale=3.0 😄

@theogf
Copy link
Member

theogf commented Jan 15, 2021

and with_lengthscale(SqExponentialKernel(), 3.0) would be only one character more than @kernel SqExponentialKernel lengthscale=3.0 😄

That's it you convinced me!

I am on board with with_lengthscale personally, it's clear and easy to use. My first motivation for the macro was definitely to have a more black-box easy to use solution, but this works just as well.
What do you think @willtebbutt @st-- ?

@willtebbutt
Copy link
Member

with_lengthscale sounds good to me. I'm pleased to be avoiding macro magic :)

@theogf
Copy link
Member

theogf commented Jan 15, 2021

Ok then I'll close the everlasting macro PR #38 and make a new one for with_lengthscale

@st--
Copy link
Member Author

st-- commented Jan 15, 2021

+1 on not-a-macro, with_lengthscale seems fine to me too. I still think it'd be good to be a bit more explicit about what is actually getting transformed... with_input_transform(kernel, ...), InputTransformedKernel? :)

@st--
Copy link
Member Author

st-- commented Jan 15, 2021

And I'd retire the transform(kernel, scalar) and transform(kernel, vector) forms...

@theogf
Copy link
Member

theogf commented Jan 15, 2021

I get the sentiment of exactitude but with_input_transform is really verbose...

@willtebbutt
Copy link
Member

Maybe just input_transform?

@devmotion
Copy link
Member

For me input_transform would be sufficiently clear while still quite short. I think the name of the type of the transformed kernel does not really matter, and in particular it does not matter if it is long and annoying to type, if transformed kernels are constructed with input_transform instead of an explicit constructor (which we only mention in the API section in the documentation and whose use we don't recommend so far it seems). Users would not even have to know this type.

@theogf
Copy link
Member

theogf commented Jan 18, 2021

To continue with my silly propositions: an additional possible syntax would be
SqExponentialKernel() ∘ ScaleTransform(2.0) ∘ etc.., this time it's "mathematically" exact :)

@kaandocal
Copy link
Contributor

kaandocal commented Mar 27, 2021

As an outsider who is interested in using GPs with Julia I thought @theogf's last proposal is excellent: concise, memorable and does not pollute the namespace.

I tried overloading as an alias of transform but the function chains multiple transforms in the wrong order: (Kernel ∘ TA) ∘ TB should be equal to Kernel ∘(TA ∘ TB), but it equals Kernel ∘(TB ∘ TA) with the current implementation. The reason for this is that the specialisation of transform when the kernel is a transformed kernel chains the transformations in the opposite order. If this specialisation is removed, normal behaviour is recovered (a twice transformed kernel). I would thus propose switching conventions in the specialisation.

Edit: it might be clearer to change the name of "transform(::Kernel, ::Transformation)". The above behaviour felt very counterintuitive when I stumbled upon it...

@willtebbutt
Copy link
Member

Making this change in favour of composition would also help with some name clashes. transform gets used in a few other packages commonly-used packages (DataFrames.jl for example), so removing it in favour of composition would rid us of that problem.

@theogf
Copy link
Member

theogf commented Apr 11, 2021

The composition is awesome but we should still have a verbose approach, we should not rely solely on a Unicode character! Maybe compose ? It has a music touch to it 😂

@devmotion
Copy link
Member

devmotion commented Apr 11, 2021

Julia maintainers didn't like compose: JuliaLang/julia#33573 😛

If we still want to go with it we should not define our own compose but use the existing alias in https://github.com/JuliaFunctional/CompositionsBase.jl.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants