Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add File interface #181

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Conversation

andreiltd
Copy link
Contributor

@andreiltd andreiltd commented Dec 4, 2024

@andreiltd andreiltd marked this pull request as draft December 4, 2024 14:35
@andreiltd andreiltd force-pushed the file-builtin branch 4 times, most recently from 94de20c to 72b5f33 Compare December 9, 2024 14:25
@andreiltd andreiltd marked this pull request as ready for review December 9, 2024 17:02
Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I left a few inline comments on the code as written, after getting far enough in the review I realized that I find it very difficult to reason through how the code implements the specification.

For example, when trying to understand the File constructor, I'm not sure how it implements the log of (simplifying here!) iterating through the list of fileBits and appending them to the blob underlying the File. If I understand the code correctly, it has a special case for a sequence whose first entry is a File instance. If that case is hit, it copies that File's contents and ignores the rest of the sequence?

But I might misunderstand the code here—that's very plausible!

Which is why I'd like to ask for something that's unfortunately a bunch of additional work: could you go through the code and add comments for which parts of the spec are implemented where? You can see examples of that in the various other builtins, in particular the ones for the fetch API. One that I think is quite decent is Response::redirect.

I used to have a script to turn the spec's source into C++ comments, but unfortunately seem to have lost it :( In the meantime, perhaps the easiest way to quickly get decent comments to use is roughly this:

  1. find the spec algorithm you want to implement/get comments for
  2. select the algorithms text in your browser
  3. select "view source" (this works well at least in Firefox, I haven't tried other browsers)
  4. copy the source
  5. paste it into an html-to-markdown converter, such as https://htmlmarkdown.com/
  6. prefix all lines in the output with //

That should give you something that's at least not too far away from usable.

As I said, I know that I'm asking for a whole bunch of extra work here, and I'm sorry for that. I don't really know of a better way to ensure that we're actually implementing the spec as written: the WPT suite gives us decent assurances, but it's not complete and doesn't cover all corner cases.

builtins/web/file.cpp Outdated Show resolved Hide resolved
builtins/web/blob.cpp Show resolved Hide resolved
builtins/web/file.cpp Outdated Show resolved Hide resolved
@andreiltd
Copy link
Contributor Author

andreiltd commented Dec 11, 2024

Hey @tschneidereit ! Thanks for the feedback! I can definitely add comments with the steps from the spec - that’s a good point.

I've been thinking about this implementation, and I feel like part of the issue here is that I don’t have a clear understanding of how to model inheritance using jsapi. According to the spec, the File API extends Blob. This patch models it using composition: embedding the Blob into a File slot and delegating methods and properties. Maybe that's a correct way, but...

I noticed that the JS_InitClass function allows you to pass a prototype of the parent class, which seems like something I should use here. But I couldn’t find any documentation on how to properly implement this. For example, how do I call the parent constructor from a subclass? How do I control which constructor parameters are passed to the parent?

We shouldn’t need to add a special case for something like File([new File(["TEST"]). This should already be covered by the Blob implementation since it’s equivalent to File([new Blob(["TEST"]). In other words, Blob::is_instance() should return true for both Blob and File objects.

BTW, I don't mind redoing this PR from scratch as long as we can come up with better and more idiomatic code :)

@andreiltd
Copy link
Contributor Author

andreiltd commented Dec 12, 2024

Hey @tschneidereit I refactored the code so that File is now a subclass of Blob. This simplifies the implementation significantly and, hopefully doesn’t rely on too much "black magic." I’ve also added the steps from the spec in the comments.

I will prepare a separate PR for Blob, adding steps from spec to the documentation.

@@ -51,13 +51,19 @@ class Blob : public TraceableBuiltinImpl<Blob> {
static const JSPropertySpec properties[];

static constexpr unsigned ctor_length = 0;
enum Slots { Data, Type, Endings, Readers, Count };
enum Slots { Data, Type, Endings, Readers, Reserved1, Reserved2, Count };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are being reserved by the FIle API, it may just make sense to directly call these MaybeFileName and MaybeFileModified, since those are their only meanings. Alternatively explicitly note they are reserved by the File API - FileAPISlot1 and FileAPISlot2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I'm curious why not just have a File slot and have it point to a file instance here, where the File builtin has the Name and Modified slots?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original goal was to use construct

JS::Construct(cx, blob_ctor_val, file_ctor_val, blob_args, &self);

so that internally it behaved like:

Reflect.construct(Blob, blobArgs, File);

and automatically allocated an object with the File class (allowing to set up extra slots such as name and lastModified). But for some reason JS_NewObjectForConstructor in the Blob constructor always produced a Blob object regardless of the newTarget.

I now switched to a simpler manual approach:

  • Create a File-class object directly,
  • Call Blob’s initialization routine on it (for shared functionality),
  • Then add the File-specific slots in the File constructor.

This keeps the “subclass” logic clean, avoids forcing Blob to know about File, and ensures we get a proper File object with all its extra slots.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly it also looks like using subclassing is now discouraged:

Warning: The standard committee now holds the position that the built-in subclassing mechanism in previous spec versions is over-engineered and causes non-negligible performance and security impacts. New built-in methods consider less about subclasses, and engine implementers are investigating whether to remove certain subclassing mechanisms. Consider using composition instead of inheritance when enhancing built-ins.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this recommendation is to spec authors, and doesn't apply to implementations of existing specs. Since File is specified as extending Blob, we unfortunately don't get to decide that it shouldn't do so.

builtins/web/file.cpp Outdated Show resolved Hide resolved

bool File::name_get(JSContext *cx, unsigned argc, JS::Value *vp) {
METHOD_HEADER(0);
// TODO: Change this class so that its prototype isn't an instance of the class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what is meant by this? Is the instance prototype not supposed to be an instance of the class, or the File.prototype constructor "prototype" property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, no idea. I've been carrying this comment from other classes without giving it much thought.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reasons I forgot, our builtin initialization system can't easily make it so that the prototype of instances isn't also an instance of the builtin. Which for some builtins it needs to, while for others it's not supposed to be. Hence, this TODO is strewn all over the place :/

wasi_clocks_wall_clock_datetime_t time_now;
wasi_clocks_wall_clock_now(&time_now);

uint64_t total_nanoseconds = time_now.seconds * 1000000000ULL + time_now.nanoseconds;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will overflow around year 2554. ¯\_(ツ)_/¯

Copy link
Member

@tschneidereit tschneidereit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite the whole host of comments and suggestions I left, I think this is very close, and will just need one more short round!

builtins/web/blob.h Outdated Show resolved Hide resolved
static bool lastModified_get(JSContext *cx, unsigned argc, JS::Value *vp);

public:
enum Slots { Count };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you ensure that the slots used exclusively by File start after the last one used by Blob, you can continue using the same setup for all slots:

Suggested change
enum Slots { Count };
enum Slots {
Name = Blob::Slots::Count,
LastModified,
Count
};

builtins/web/file.cpp Outdated Show resolved Hide resolved
return api::throw_error(cx, api::Errors::WrongReceiver, "name get", "File");
}

auto name = JS::GetReservedSlot(self, static_cast<size_t>(ParentSlots::Name)).toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto name = JS::GetReservedSlot(self, static_cast<size_t>(ParentSlots::Name)).toString();
auto name = JS::GetReservedSlot(self, static_cast<size_t>(Slots::Name)).toString();

builtins/web/file.cpp Outdated Show resolved Hide resolved

bool File::name_get(JSContext *cx, unsigned argc, JS::Value *vp) {
METHOD_HEADER(0);
// TODO: Change this class so that its prototype isn't an instance of the class
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reasons I forgot, our builtin initialization system can't easily make it so that the prototype of instances isn't also an instance of the builtin. Which for some builtins it needs to, while for others it's not supposed to be. Hence, this TODO is strewn all over the place :/


if (!opts) {
return true;
bool File::init(JSContext *cx, HandleObject self, HandleValue fileBits, HandleValue fileName, HandleValue opts) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a doc comment with a link to the algorithm implemented here? See some of the other builtins for how I usually format those.

Comment on lines 133 to 136
RootedValue lastModified(cx);
if (!init_last_modified(cx, opts, &lastModified)) {
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To adjust to the changes suggested above, this should become

Suggested change
RootedValue lastModified(cx);
if (!init_last_modified(cx, opts, &lastModified)) {
return false;
}
int64_t last_modified;
if (!read_last_modified(cx, opts, &last_modified)) {
return false;
}

// Steps 2, 3 and 5 are handled by Blob. We extend the Blob by adding a `name`
// and the `lastModified` properties.
SetReservedSlot(self, static_cast<uint32_t>(Slots::Name), JS::StringValue(name));
SetReservedSlot(self, static_cast<uint32_t>(Slots::LastModified), lastModified);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is a bit unfortunate: we can't store lastModified as a uint32_t, because that'd overflow very quickly. (Way way before 2038, because it includes milliseconds.)

I think the best thing to do so we don't have to step away from using a Value encoding is to use a Number: 53 bits of precision should still be plenty for our purposes here.

@@ -107,6 +107,22 @@ void MonotonicClock::unsubscribe(const int32_t handle_id) {
wasi_io_poll_pollable_drop_own(own_pollable_t{handle_id});
}

uint64_t WallClock::now() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll not need this addition after all, given that we can use JS_Now as described above.

@andreiltd
Copy link
Contributor Author

Hey @tschneidereit thanks for the review! Are you sure looked at the most recent version of the code? It looks like there are a bunch of comments that refer to outdated code, like Slots handling - there are now defined on File class, or using Int32 for storing lastModified - which uses Number now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

File API
3 participants