Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S7 arrays #401

Open
sjcowtan opened this issue Mar 15, 2024 · 14 comments
Open

S7 arrays #401

sjcowtan opened this issue Mar 15, 2024 · 14 comments

Comments

@sjcowtan
Copy link

Is there a good way to create an S7 object which inherits from "array" (or "matrix", but my specific use case is "array")? If not, will this be added?

@TimTaylor
Copy link

FWIW - Looks like it's currently in draft stage (#434) so hopefully going to be added.

@sjcowtan
Copy link
Author

sjcowtan commented Sep 9, 2024 via email

@t-kalinowski
Copy link
Member

This is implemented in the dev version of S7.

new_class("my_array", parent = class_array)

@hadley
Copy link
Member

hadley commented Oct 11, 2024

@t-kalinowski how does that work? what type of array do you get?

@t-kalinowski
Copy link
Member

length 0 logical (this makes the most sense, for the same reason that NA is logical; it can easily be promoted to another atomic)

library(S7)

new_class("my_array", parent = class_array)
#> <my_array> class
#> @ parent     : S3<array>
#> @ constructor: function(.data, dim, dimnames) {...}
#> @ validator  : <NULL>
#> @ properties :
new_class("my_array", parent = class_array)()
#> <my_array>  logi[0 (1d)]

Created on 2024-10-11 with reprex v2.1.1

@hadley
Copy link
Member

hadley commented Oct 11, 2024

@t-kalinowski hmmm, that feels weird to me because dim is orthogonal to class in S3. So I would have expected something more like class_array_logical, class_array_numeric etc.

@t-kalinowski
Copy link
Member

I agree, this should be easy.

We only provide class_* for actual base classes, that seems like a compound class.

I wonder if we should introduce a & / new_intersection() for classes, complimenting | /new_union(). Then, class_array_logical could be achieved with:

class_array & class_logical

@hadley
Copy link
Member

hadley commented Oct 11, 2024

It feels like we should discuss this design a bit more before committing to it in a release. Maybe roll back for this release?

@mmaechler
Copy link
Collaborator

I agree, this should be easy.

We only provide class_* for actual base classes, that seems like a compound class.

I wonder if we should introduce a & / new_intersection() for classes, complimenting | /new_union(). Then, class_array_logical could be achieved with:

class_array & class_logical

This sounds nice to me ((without further deep thinking though) independent of class_array etc.

@lawremi
Copy link
Collaborator

lawremi commented Oct 16, 2024

One could think of matrix and array as containers. Specifying the contents of a container is typically done through parameterization. Parameterized types would be interesting. Maybe they are represented by a higher order function that accepts the type parameters and returns a constructor that ensures the right mode?

The interface might look like:

Matrix <- new_parameterized_class("Matrix", class_matrix, list(T = class_vector))
Matrix(class_numeric)(1:10, 2)

Or, we more directly address the contract of the dim attribute presence using something like Haskell type classes:

Matrix <- new_predicate("Matrix", function(self) !is.null(attr(self, "dim")))

But that will really slow down dispatch. The methods package did explore this approach with setIs(test=) but it was never used for method selection.

@t-kalinowski
Copy link
Member

t-kalinowski commented Oct 16, 2024

Quoting from ?UseMethod:

If the object does not have a class attribute, it has an implicit class. Matrices and arrays have class "matrix" or "array", followed by the class of the underlying vector. Most vectors have a class based on the result of mode(x), except that integer vectors have the class c("integer", "numeric"), and real vectors have the class c("double", "numeric").

So, it seems we need to think of this in terms of "implicit" classes.

We currently handle the implicit "numeric" class using a union of class_integer and class_double.

Union classes cannot be used as a parent in new_class(parent=), but they can be used in method<- and in new_property(). This restriction sets a precedent: implicit class types cannot be used in all contexts.

> new_class("foo", class_numeric)
Error: `parent` must be an S7 class, S3 class, or base type, not an S7 union.

Some initial thoughts:

"array" and "matrix" are the only "implicit" S3 classes that are disconnected from typeof().

> .class2(matrix(1))
[1] "matrix"  "array"   "double"  "numeric"

One temptation is to open the barn doors and think through what it means to support arbitrary "implicit" classes, similar to a mixin class in Python or a Rust trait—where a generic can make assumptions about the object’s properties.

For example, in #433, we seem to be reaching for a class_scalar. One could imagine:

class_scalar <- new_class(
  "class_scalar",
  validator = function(self)
    if (!identical(length(self), 1L))
      "must be length 1"
)

Or, continuing along this desire path, one could further imagine implicit types as peers of regular types, allowed to be children of union type and composable with base types.

class_scalar <- new_class(
  "class_scalar",
  parent = class_vector,
  validator = function(self)
    if (!identical(length(self), 1L))
      "must be length 1"
)

class_string <- 
  new_class("class_string", parent = c(class_character, class_scalar))

This would be straightforward to implement from the constructor's side. However, the challenge is method dispatch with base objects that lack a class attribute. For S7 to be viable, method dispatch must be efficient.

I see no efficient way to support:

  • Dispatch on arbitrary combinations of the class vector without an exponential increase in method search time.
  • Dispatch on additional "implicit" entries created at runtime by calling arbitrary R closures.

Another question to consider: What happens when a class_array is not explicitly declared but might be expected to appear implicitly?

Foo <- new_class(
  "Foo", 
  parent = class_double, 
  properties = list(dim = class_integer)
)

foo <- Foo(as.double(1:4), c(2, 2))

.class2(foo)          # "Foo" "double" "S7_object"
.class2(unclass(foo)) # "matrix" "array" "double" "numeric"

There is already an inconsistency with the implicit "numeric" class, where we intentionally omit dispatching on numeric.

Foo <- new_class("Foo", class_double)
foo <- Foo()
.class2(foo)          # "Foo" "double" "S7_object"
.class2(unclass(foo)) # "double" "numeric"

Perhaps, we continue with the existing precedent and intentionally limit support for the concept of implicit classes in S7, beyond some minimal support in method<- for compatability with S3.

E.g., we offer no default constructor for class_numeric or class_POSIXt, and perhaps, neither for class_array and class_matrix.

> new_class("Foo", class_POSIXt)()
Error:
! S3 class <POSIXt> doesn't have a constructor

> new_class("Foo", class_numeric)()
Error:
! `parent` must be an S7 class, S3 class, or base type, not an S7 union.

This could be via new_class("class_array", abstract = TRUE), or via some other, new mechanism.

Another idea is introducing a counterpart to new_union, called new_intersection. Like new_union, it wouldn't be usable in all contexts: The inverse of new_union(), it would valid in new_class() and new_property(), but not in method<-. An intersection could stack or combine validators (and obj_dispatch() entries), enabling use cases like this.

@lawremi
Copy link
Collaborator

lawremi commented Oct 16, 2024

It seems like we should at least merge the existing implicit classes from class() at construction time and have a dim<-.S7_object() that updates it. And merge .class2() during dispatch if we are not already.

In the general case, I think we'll start running into issues similar to those of multiple inheritance. How do we linearize the hierarchy with these implicit/dynamic classes?

One way to make implicit classes semi-efficient would be to figure out the classes upon class definition and upon each modification of the object, to the extent possible, alongside validation.

Definitely worth a discussion at our next meeting.

@t-kalinowski
Copy link
Member

t-kalinowski commented Oct 21, 2024

It's worth noting that it's not valid to have a dim attribute on a non-vector.

library(S7)
Foo <- new_class("Foo", properties = list(
  dim = new_property(NULL | class_integer)
))

foo <- Foo()

foo@dim <- 1:3
#> Error in `@<-.S7_object`(`*tmp*`, "dim", value = 1:3): invalid first argument, must be vector (list or atomic)
dim(foo) <- 1:3
#> Error in dim(foo) <- 1:3: invalid first argument, must be vector (list or atomic)

This error message comes from base R Rf_setAttrib() calling C function dimgets(), which has the check.

@lawremi
Copy link
Collaborator

lawremi commented Oct 25, 2024

That could be loosened to allow OBJSXP. There would be additional constraints though, like having a length() method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants