Add Async Runner for News Site Next Testing #442

flashdesignory · 2024-10-31T18:40:16Z

A potential solution to: #441

This pr adds an Async Runner, which waits for each test.run() to complete.

Tests for comparison:

NewsSite-Next - the current implementation with the current runner
NewsSite-Async - async runner - but no changes to the tests
~~3. NewsSite-Delay - async runner with a 500ms delay, before the test resolves~~

crossbench scores

async:
browser                Safari                    Firefox                Google Chrome
NewsSite-Next          58.20 ± 1.6%              74.8 ± 2.2%            73.8 ± 1.9%
NewsSite-Next-Delayed  1535.24 ± 0.050%          1563.6 ± 0.34%         1534.93 ± 0.050%
NewsSite-Next-Async    70.40 ± 1.2%              79.6 ± 1.9%            75.28 ± 1.1%
Score                  5.430 ± 0.73%             4.761 ± 1.1%           4.901 ± 0.71%

bgrins · 2024-10-31T20:24:49Z

resources/benchmark-runner.mjs

@@ -315,12 +334,35 @@ class RAFTestInvoker extends TestInvoker {
    }
 }

+class AsyncRAFTestInvoker extends BaseRAFTestInvoker {


The semantics here are tricky, but great to be opening this discussion. @smaug---- @rniwa the current behavior is

requestAnimationFrame(() => this._syncCallback()); requestAnimationFrame(() => { setTimeout(() => { this._asyncCallback(); setTimeout(async () => { await this._reportCallback(); resolve(); }, 0); }, 0); });

Since we'd now need to await for _syncCallback in a world with async steps, this new behavior wraps the second rAF into the first rAF's callback (after awaiting for the sync step to finish). Is this how you envision measuring async steps?

I also wonder if it would be feasible to make an "in place" change where, rather than having separate AsyncRAFTestInvoker & AsyncTimerTestInvoker & AsyncSuiteRunner classes, we
(a) update the existing ones to await in the appropriate places
(b) change the existing _runTestAndRecordResults to await test.run(this._page);
(c) allow tests to opt in to the behavior by simply passing async functions into the BenchmarkTestStep rather than setting a type and switching which classes are used.

Obviously this would rely on us being happy with the core async measurement loop, and would change the overall timing & semantics by awaiting on non-promises. But it would also be quite a bit simpler and may allow use cases for v4 where tests have one step that's async and others that aren't.

smaug---- · 2024-11-01T17:30:13Z

resources/benchmark-runner.mjs

+    _scheduleCallbacks(resolve) {
+        requestAnimationFrame(async () => {
+            await this._syncCallback();
+            requestAnimationFrame(() => {


Wouldn't this work rather badly in those tests where the sync part is really fast. There would be idle time between the rAF calls, and thus the measured time would also include idle time.

@smaug---- any idea how we could prevent this, while still waiting for the syncCallback ?

It would get messy, but we could perhaps have another rAF callback right before _syncCallback call.
That new one should fire at the same tick as the one which _syncCallback() uses (assuming I guess correctly how it schedules new work.) The new raf callback would start async time.
In addition to that we'd need a setTimeout(,0) to stop the sync time - that would be used in case there is time to run setTimeout(,0) between the relevant ticks. But the sync time might also just stop when the next raf starts - the order of rAFs and setTimeout can be anything.
(Of course currently there isn't such thing as "stop sync time", there is just starting async time)

@smaug---- honestly, I don't fully follow the above, but here are some additional thoughts.

The following would keep the two sequential rafs, but would enable workloads that might take longer to resolve the promise of the _syncCallback. Feels 'old school', but works in my initial testing in Chrome:

_scheduleCallbacks(resolve) { const checkedIn = { __proto__: null, sync: false, async: false, }; const checkin = async (type) => { checkedIn[type] = true; if (checkedIn.sync && checkedIn.async){ setTimeout(() => { this._asyncCallback(); setTimeout(async () => { await this._reportCallback(); resolve(); }, 0); }, 0); } }; requestAnimationFrame(async () => { await this._syncCallback(); checkin("sync"); }); requestAnimationFrame(() => { checkin("async"); }); }

A super small sample run:

Yet another though: do we even need two rafs, or could we do something simpler.
If we're awaiting a Promise, which is a microtask with higher priority, can we follow that with a setTimeout directly?

requestAnimationFrame(async () => { await this._syncCallback(); setTimeout(() => { this._asyncCallback(); setTimeout(async () => { await this._reportCallback(); resolve(); }, 0); }, 0); });

@smaug---- honestly, I don't fully follow the above,

Not surprised, it is confusing :) Basically what I was thinking is to try to detect when the sync time ends, explicitly.
And then not count the time between sync end and and async start as measured time.

setTimeout after _syncCallback would bring back the same issues sp2 has. There some random animation ticks may or may not run during measured time, because it is relying on setTimeout and not rAF callbacks.

smaug---- · 2024-11-07T22:51:18Z

Assuming react uses postMessage similarly to setTimeout(0),
would it help if the runner was something like

requestAnimationFrame(() => this._syncCallback());
requestAnimationFrame(() => {
  var gotTimer = false;
  var gotMessage = false;

  setTimeout(()=> { gotTimer = true;  tryTriggerAsyncCallback() });

  var mc = new MessageChannel();
  mc.port1.onmessage =  () => {  gotMessage = true; tryTriggerAsyncCallback(); }
  mc.port2.postMessage("sp3");

  function tryTriggerAsyncCallback() {
   if (!gotTimer || !gotMessage) {
    return;
   }
    this._asyncCallback();
    setTimeout(async () => {
      await this._reportCallback();
      resolve();
    }, 0);
  }, 0);
});

So basically require both postMessage and setTimeout to be processed.

Looks like there is a message channel and its port is used for communication
https://share.firefox.dev/48D7xwJ

flashdesignory · 2024-11-07T23:56:27Z

I assume the MessageChannel is from the React framework? - I'd need to dig in a little more though...

It probably depends, if we're trying to fix the React workloads, or if we're trying to allow async test steps. For the later, we'd need to await this._syncCallback(); ideally, so that the test steps could control or signal when they are done?

smaug---- · 2024-11-08T00:36:16Z

The MessageChannel in my example is from HTML spec.
I added locally some profiler markers to check whether React is using window.postMessage or MessaChannel/MessagePort.postMessage, and it was the latter.

flashdesignory · 2024-11-08T15:46:57Z

Summarizing an offline conversation I had with @smaug---- :

The idea @smaug---- proposed with the MessageChannel might be a great solve for the issue Add Async Runner for News Site Next Testing #442, but should be handled in a separate PR, since that solution should get implemented directly in the RAFTestInvoker class
The AsyncRunner pr (this pr) should focus only on enabling async test steps
We are ok, with explicitly marking a test async and in that way keeping a separate AsyncRAFTestInvoker class that handles these cases. We should be explicit about sync, async or remote test steps.
The current proposed AsyncRAFTestInvoker implementation should be OK in its current implementation with the two nested RAFs.
Trying to combine sync and async test steps in one RAFTestInvoker class (as discussed briefly in comments above) gets us in a tricky situation where one solution might be overly complicated.

smaug---- · 2024-11-08T16:00:33Z

And hopefully use of MessageChannel really helps. The idea isn't tested yet ;)

flashdesignory · 2024-11-15T19:48:39Z

resources/test-runner.mjs

@@ -18,6 +18,30 @@ export class TestRunner {
        this.#frame = frame;
    }

+    get frame() {
+        return this.#frame;


since i'm extending and overriding the runTest function, I had to create getters for the private variables.
just more boilerplate..

If there are now external consumers for these, should we just make them public? cc @rniwa

I'm only using getters at the moment, so more boilerplate, but maybe more "secure" than just making them public? 🤷

+1 to making the public and lower overhead.

flashdesignory · 2024-11-15T19:49:23Z

resources/suite-runner.mjs

@@ -100,5 +100,6 @@ class RemoteSuiteRunner extends SuiteRunner {}
 export const SUITE_RUNNER_LOOKUP = {
    __proto__: null,
    default: SuiteRunner,
+    async: SuiteRunner,


no need to create a new SuiteRunner, since only the TestRunner is different and that's dynamically selected.

bgrins · 2024-11-15T21:42:31Z

resources/test-invoker.mjs

@@ -24,7 +24,24 @@ export class TimerTestInvoker extends TestInvoker {
    }
 }

-export class RAFTestInvoker extends TestInvoker {
+class AsyncTimerTestInvoker extends TestInvoker {


Do we plan to use both the timer and RAF invoker for async? I had a sense that we kept the sync timer invoker around for backwards compat or non-default testing environments.

I'm assuming the overall properties are quire different once we are async (potentially reducing utility of timer), and it's complex enough to reason about one implementation, so should we consider just having one and exposing it on both the default and non-default configuration?

I'm open to having both but would like to find ways to reduce the amount of work if possible :). @camillobruni @smaug---- @rniwa thoughts?

I'd be fine to drop the setTimeout approach altogether since we haven't really discovered additional issues with it?

Yeah, I think we should just remove the timer-based test invoker.

Can we do the removal in a separate PR though?

Either way is fine for me, but yeah, I don't think the timer based runner is needed in any form.

camillobruni · 2024-11-19T14:29:53Z

resources/test-runner.mjs

@@ -18,6 +18,30 @@ export class TestRunner {
        this.#frame = frame;


let's not use private properties, they add needless overhead in the runner. The runner code should aim at maximum performance and minimal startup costs.

Last I tested this is measurable in Firefox and Chrome.

I don't think we should restrict ourselves from using new ECMAScript features. With that argument, we should be going back to writing ES5 code without any classes.

If there is performance penalty attached to it for the test runner, then yes, we should restrict ourselves, we should aim at the lowest possible overhead there. Especially if it uses code that is not popular out in the wild.

And I would even agree that we should transpile and minify to ES5 the runner code for a big part for performance reasons!

It's a different story for workloads though, a good mix of old and new features is very welcome.
Note: we did not include the latest framework versions based on this groups feedback... so there is precedence for going equally the other way.

Do you see the overhead in measured code or outside of measured code?

I can measure this easily in isolation... and I think out of principle, I want to keep the runner overhead to a minimum if we easily can.
I know for a fact that private properties are rather slow in chrome (and from my micro benchmark, the same applies to Firefox).

As I've said, if there are applications out there, I've got no objections to workloads. But just using modern JS features for feature's sake is a non-goal IMO for the runner of there are no clear engineering benefits.

Okay, we can agree to disagree on this point.

Could you be more specific for the record?
I've raised multiple points above and your answer can be interpreted rather ambigously.
In order to find a reasonable compromise, please clarify the following questions:

Do you prefer have a low or high overhead runner?

Do you want a runner that has non-equal overhead on different browsers?

Do you want to allow using the latest JS features in the runner?

How about features that need to be monkey patched because some browsers do not support them yet?

Do you want to test (amongst other things) latest features in workloads?

Do you want to test (amongst other things) framework versions in workloads?

(1) and (2) are ill fated questions because any harness is going to have a cost which differs between browsers. It's impossible to have write scripts that perform identically across different browsers beyond anything trivial.

(3): we should use the latest features available to us wherever beneficial. Stronger encapsulation is a definite engineering benefit.

(4) We've previously agreed not to test such features in the Speedometer benchmark. That obviously includes the test harness as well as the benchmark content.

I don't understand the relevancy of (5) and (6).

ES6 class syntax definitely has an overhead. If we're aiming for the lowest overhead, we should definitely not use classes. The same goes for async / await syntaxes and a whole bunch of other ES6 features. Basically all the refactoring done to Speedometer harness code between 2 and 3 should basically undone if the lowest overhead was what we were aiming for the in the harness.

Otherwise, you don't get to pick & choose which features are low overhead and which one isn't based on some random anecdotal evidence. Either there should be empirical observation of overhead and a threshold set forth priori such measurements, or we should allow the use of any new ES6 features.

The benefit of the encapsulation in this case is to make clear where the properties are set (here, in the constructor) and to be sure that no other code touches it.
I think the advantage is quite small in our case here it's not a library but a small app, and I wouldn't have done it myself, but I'm not opposed either.

About the overhead, the main question is: do we have data about the use of the private instance variables in the wild?
if the answer is that it's not used much and we want to reduce the overhead, there are ways to avoid it outside of the measured time, by using local constants in the function that runs the tests.

camillobruni · 2024-11-19T14:31:59Z

resources/test-runner.mjs

@@ -18,6 +18,30 @@ export class TestRunner {
        this.#frame = frame;
    }

+    get frame() {
+        return this.#frame;


+1 to making the public and lower overhead.

rniwa · 2024-12-17T10:58:53Z

resources/test-runner.mjs

+            }
+            performance.mark(syncStartLabel);
+            const syncStartTime = performance.now();
+            await this.test.run(this.page);


This is a lot of code duplication with seemingly the only difference being this line here.
I don't think we should duplicate this much code between the two runners.

rniwa · 2024-12-17T11:05:12Z

resources/test-runner.mjs

+            }
+            performance.mark(syncStartLabel);
+            const syncStartTime = performance.now();
+            await this.test.run(this.page);


Can't we just make this part "virtual"?
e.g.

class TestRunner { async runTest() { ... performance.mark(syncStartLabel); const syncStartTime = performance.now(); await this._runSyncStep(this.test, this.page); const syncEndTime = performance.now(); ... } } ... class SyncTestRunner extends TestRunner { function _runSyncStep(test, page) { test.run(page); } } class AsyncTestRunner extends TestRunner { async function _runSyncStep(test, page) { await test.run(page); } }

yeah, that would make this a bit easier to read.

rniwa · 2024-12-17T11:09:35Z

resources/test-runner.mjs

+
+export const TEST_RUNNER_LOOKUP = {
+    __proto__: null,
+    default: TestRunner,


I'd call this SyncTestRunner.

flashdesignory · 2024-12-17T12:50:29Z

@rniwa - thanks for the feedback.

This pr with the MessageChannel implementation is the preferred solution after discussing with @smaug---- :
#458

I'll keep this pr open for comparison for now.

smaug---- · 2024-12-19T20:45:24Z

resources/test-runner.mjs

+}
+
+// FIXME: implement remote steps
+class RemoteTestRunner extends TestRunner {}


Remove from this pr?

smaug---- · 2024-12-19T20:49:37Z

resources/test-invoker.mjs

+    _scheduleCallbacks(resolve) {
+        requestAnimationFrame(async () => {
+            await this._syncCallback();
+            requestAnimationFrame(() => {


I'm a bit lost now with the PRs. Why we'd ever want raf + (callback + raf) ?
Hmm, https://github.com/WebKit/Speedometer/pull/458/files has the same class.
Which PR should be reviewed?

flashdesignory · 2024-12-19T23:29:53Z

closing in favor of #458

flashdesignory added 7 commits October 30, 2024 17:09

git branch

b594115

initial change

f154a4c

add event

9459c00

add comment

d462764

rollback requestIdleCallback

d037f89

run format

79f8450

undo newssite changes

14f1f6d

bgrins reviewed Oct 31, 2024

View reviewed changes

smaug---- reviewed Nov 1, 2024

View reviewed changes

flashdesignory added 3 commits November 8, 2024 11:43

remove test test

d08606d

Merge branch 'main' into feature/async-runner

4540937

merge main

a02d7cc

flashdesignory marked this pull request as ready for review November 13, 2024 01:32

flashdesignory requested review from smaug----, bgrins, camillobruni, rniwa and julienw November 13, 2024 01:32

flashdesignory added 4 commits November 13, 2024 07:51

Merge branch 'main' into feature/async-runner

5281af4

Merge branch 'main' into feature/async-runner

d4fa570

cleanup

764b61b

reset news site build files

ace8c3b

flashdesignory commented Nov 15, 2024

View reviewed changes

bgrins reviewed Nov 15, 2024

View reviewed changes

camillobruni reviewed Nov 19, 2024

View reviewed changes

flashdesignory mentioned this pull request Nov 20, 2024

Async Runner with Message Channel #458

Open

rniwa requested changes Dec 17, 2024

View reviewed changes

rniwa reviewed Dec 17, 2024

View reviewed changes

resources/test-runner.mjs

export const TEST_RUNNER_LOOKUP = {

__proto__: null,

default: TestRunner,

Copy link

Member

rniwa Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call this SyncTestRunner.

smaug---- reviewed Dec 19, 2024

View reviewed changes

resources/test-runner.mjs

}

// FIXME: implement remote steps

class RemoteTestRunner extends TestRunner {}

Copy link

smaug---- Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove from this pr?

smaug---- reviewed Dec 19, 2024

View reviewed changes

flashdesignory closed this Dec 19, 2024

		@@ -18,6 +18,30 @@ export class TestRunner {
		this.#frame = frame;

Add Async Runner for News Site Next Testing #442

Add Async Runner for News Site Next Testing #442

Conversation

flashdesignory commented Oct 31, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

smaug---- Nov 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

smaug---- commented Nov 7, 2024 • edited Loading

flashdesignory commented Nov 7, 2024

smaug---- commented Nov 8, 2024

flashdesignory commented Nov 8, 2024

smaug---- commented Nov 8, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rniwa Nov 28, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rniwa Nov 28, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rniwa Dec 17, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

flashdesignory commented Dec 17, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

flashdesignory commented Dec 19, 2024

flashdesignory commented Oct 31, 2024 •

edited

Loading

smaug---- Nov 1, 2024 •

edited

Loading

smaug---- commented Nov 7, 2024 •

edited

Loading

rniwa Nov 28, 2024 •

edited

Loading

rniwa Nov 28, 2024 •

edited

Loading

rniwa Dec 17, 2024 •

edited

Loading