Commit eabf4ae: More on the metrics
oesh committed Dec 8, 2021
1 parent 0de316f
Showing 1 changed file with 193 additions and 35 deletions.
228 changes: 193 additions & 35 deletions draft-iab-mnqeu-report.md
@@ -275,15 +275,20 @@ informative:
- ins: S. Cheshire
seriesinfo: https://www.iab.org/wp-content/IAB-uploads/2021/09/Internet-Score-2.pdf

tools.ookla_speedtest:
  title: "Speedtest by Ookla"
  target: https://www.speedtest.net
tools.apple_networkQuality:
  title: "Apple Network Quality"
tools.ping:
  title: "ping -- send ICMP ECHO_REQUEST packets to network hosts"

--- abstract

The Measuring Network Quality for End-Users workshop was held
virtually by the Internet Architecture Board (IAB) from September 14-16, 2021.
This report summarizes the workshop, the topics discussed, and some
preliminary conclusions drawn at the end of the workshop.

--- middle

@@ -436,6 +441,28 @@ of the Internet. The need for improvements to latency and its
measurements was heavily discussed, especially for certain classes of
users such as live, collaborative content and gaming.

Historically, the primary metrics for assessing network quality have been
throughput (or sometimes goodput) and idle latency (often referred to as
"ping time", after the popular UNIX tool).

### Throughput considerations

Throughput has been the primary optimization target over the years, and as a
result it is not uncommon for the general public to have access to the
Internet at multiple Gbps. Consequently, the relative importance of throughput
to end users is gradually decreasing: once the network provides the user with
sufficient throughput to perform their daily tasks, latency becomes more
important.

With the onset of COVID-19, the global population became more dependent on
various forms of collaboration and telepresence. At the current state of
collaboration technology, high-quality videoconferencing software requires
only single-digit Mbps throughput. Once sufficient throughput is available,
the videoconferencing experience does not improve with further throughput
increases.

Other types of traffic, such as browsing the web, can benefit from increases in
throughput, up to a certain point.

### Latency considerations

End-to-end latency is the time it takes for a particular segment to traverse
@@ -456,29 +483,160 @@ latency comprises several components:
4. Some of the workshop submissions explicitly called out the application
delay, which reflects inefficiencies in the application layer.

### Idle latency vs. working latency

Traditionally, end-to-end latency is measured with tools such as
{{tools.ping}}, as well as with services such as {{tools.ookla_speedtest}}.
Such measurements are typically performed when the network is idle, and as a
result they mostly reflect the propagation delay.

A different way to measure the end-to-end latency is to perform the test when
the network is not idle, but in its typical working conditions.

The workshop participants used the term "idle latency" when referring to the
result of the former measurement method, and "working latency" when referring
to the latter.
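To illustrate why the two can differ so much, consider a back-of-the-envelope
bufferbloat calculation. All numbers below are illustrative assumptions, not
workshop measurements:

```python
# Back-of-the-envelope: how a full bottleneck buffer inflates latency.
# All numbers are illustrative assumptions, not measurements.

IDLE_RTT_S = 0.020          # 20 ms propagation + transmission delay
BOTTLENECK_BPS = 10e6       # 10 Mbit/s access link
BUFFER_BYTES = 1_000_000    # 1 MB FIFO buffer at the bottleneck

# A packet arriving at a full buffer waits for the entire queue to drain.
queueing_delay_s = BUFFER_BYTES * 8 / BOTTLENECK_BPS   # 0.8 s
working_rtt_s = IDLE_RTT_S + queueing_delay_s

print(f"idle RTT:    {IDLE_RTT_S * 1000:.0f} ms")
print(f"working RTT: {working_rtt_s * 1000:.0f} ms")   # 820 ms
```

Even a modest 1 MB buffer at a 10 Mbit/s bottleneck turns a 20 ms idle RTT
into a working RTT of over 800 ms, which is why the two measurements diverge.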

While the tooling available historically focused on measuring the idle
latency, there is an industry trend toward measuring the working latency as
well, e.g. {{tools.apple_networkQuality}}.

### Measurement considerations

The participants proposed several concrete methodologies for measuring network
quality as experienced by end users.

{{Paasch2021}} introduced a new dimensionless metric, called RPM (which stands
for "round-trips per minute"), to communicate the quality of the network to
end users. The metric represents the number of round trips a segment can make
within a minute when the network is under its typical working conditions. The
RPM metric reflects the combination of the propagation delay, the buffering
delay, and the transport protocol delays. It is designed to be easy to explain
to the general public: higher values reflect better network conditions, and
typical values have 3-4 digits.
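As a sketch of the arithmetic only (the actual RPM test also specifies how to
bring the network into working conditions, which is not modeled here), RPM is
sixty seconds divided by the round-trip time observed under load. The sample
RTTs below are hypothetical:

```python
# Convert round-trip times measured under load into RPM
# ("round-trips per minute"). The sample RTTs are hypothetical.
working_rtts_s = [0.150, 0.180, 0.165, 0.200]  # seconds, under load

def rpm(rtt_s: float) -> float:
    """Round trips a segment could make in one minute at this RTT."""
    return 60.0 / rtt_s

mean_rtt = sum(working_rtts_s) / len(working_rtts_s)
print(f"mean working RTT: {mean_rtt * 1000:.0f} ms")
print(f"RPM: {rpm(mean_rtt):.0f}")
```

A 100 ms working RTT corresponds to an RPM of 600; the 3-4 digit values
mentioned above follow directly from working RTTs in the tens to hundreds of
milliseconds.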

{{Mathis2021}} applied the RPM metric to the results of more than 4 billion
download tests performed by M-Lab between 2010 and 2021. During this
timeframe, the measurement platform that M-Lab uses underwent several
upgrades, which allowed the research team to compare different versions of the
TCP congestion control algorithms (CCAs). The study demonstrated that the
Cubic CCA lowered responsiveness, which is attributed to the fact that Cubic
uses large network queues to maximize throughput. A subsequent upgrade to the
BBR CCA increased responsiveness (with the exception of several locations such
as NYC), which is consistent with the BBR design: BBR makes an effort to
estimate the network capacity and relies on capacity estimates and pacing,
rather than large queues, to maximize throughput.

The study also demonstrated that, aside from the platform upgrades, the
reported RPM values are similar between adjacent months, which suggests that
the RPM metric is stable. Changes in the RPM metric that could not be
attributed to changes in the M-Lab platform were found to be consistent and to
last for several months, which suggests that they were caused by ISPs
upgrading their equipment or by other events with a similar effect.


{{Schlinker2019}} presented a large scale study of the correlation between the
network quality and the quality of the experience of users of a large social
network.

The authors performed the measurements at multiple datacenters from which
video segments of a set size were streamed to a large number of end users. The
authors used "the probability of transmitting a video segment of size S at
goodput G" as the metric. Goodput is highly dependent on the transport delays,
but is less sensitive to the latency. Further, the goodput is highly dependent
on the choice of the transport protocol (the authors used QUIC with BBRv1).

Specifically, the authors suggested `P(G, S) <- [0, 1)`: the probability of
achieving the desired goodput `G` when transmitting an object of set size `S`.

The choice of goodput and of a probabilistic metric was driven by the need to
assess the maximal quality of the content that can be effectively served by
the network. Specifically, the suggested metric has several properties that
are desirable for operators of a large-scale video streaming service:

1. The suggested metric is easy to aggregate, and therefore can be used to
reflect the network quality across multiple connections - up to the scale of
entire autonomous systems.
2. The suggested metric is directly applicable to the operators of streaming
media services, since it allows them to choose the most appropriate
compression algorithms and bitrates to achieve the highest quality for a given
network connection (or an aggregate, such as an AS), without significant
degradation of goodput.
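To make the aggregation property concrete, the metric can be estimated from
per-transfer logs as the fraction of transfers of a given size that reached
the goodput goal. The sketch below uses hypothetical log entries and is not
the methodology of {{Schlinker2019}}:

```python
# Estimate P(G, S): the fraction of transfers of (roughly) size S
# that achieved goodput >= G. The log entries are hypothetical.
transfers = [
    # (object_size_bytes, transfer_time_s)
    (2_000_000, 0.5),   # 32 Mbit/s goodput
    (2_000_000, 1.0),   # 16 Mbit/s
    (2_000_000, 4.0),   #  4 Mbit/s
    (2_000_000, 0.8),   # 20 Mbit/s
]

def p_goodput(transfers, goal_bps: float) -> float:
    """Fraction of transfers whose goodput met or exceeded goal_bps."""
    ok = sum(1 for size, t in transfers if size * 8 / t >= goal_bps)
    return ok / len(transfers)

# Probability of achieving 10 Mbit/s on these 2 MB segments:
print(p_goodput(transfers, 10e6))   # 0.75
```

Because the estimate is a simple fraction, per-connection results can be
averaged upward, from a single connection to an entire autonomous system.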

The study found that short video clips, which constitute the majority of the
traffic, are typically consumed sequentially. Since sequential consumption of
video is less sensitive to buffering delays than random-access consumption,
the goodput metric may not be as sensitive to the buffering delay as the RPM
metric. It is, however, influenced by all types of delay, and it is
particularly useful for identifying infrastructure nodes that are shared by
multiple forms of traffic.

As such, the suggested metric seems highly valuable to the operators of
streaming services, both when estimating the needed network capacity and when
deciding which bitrate values to use in order to deliver the content at the
highest quality without stretching the capacity of the network.

### RPM metric considerations

The workshop participants agreed that the RPM metric is an effective way to
communicate the network quality to end users.

Work remains before the RPM metric can be used as an industry standard:

- The use of natural traffic vs. synthetic traffic.

The RPM metric, as defined in {{Paasch2021}}, relies on synthetic traffic to
bring the network into the working conditions.

There are pros and cons to using synthetic traffic compared to the natural
traffic generated by the applications that the end user runs.

Using natural traffic to measure the responsiveness of the network is more
reflective of the users' interactions with a specific application.
Additionally, utilizing natural traffic to measure the network responsiveness
does not consume any resources beyond what the applications already require.
Finally, using natural traffic minimizes any potential impact of the
measurement on other applications.

Using synthetic traffic to measure the responsiveness has the benefit of
reliably assessing the responsiveness experienced by any particular
application, or by multiple applications at once. The results of a
responsiveness test that utilizes synthetic traffic allow comparing different
equipment or different locations in a fair way. Additionally, using synthetic
traffic makes it easier to standardize the responsiveness test, both the
specific protocols that are used and the test dynamics. Finally, the use of
active measurement alleviates the privacy concerns that would arise if users
were unaware that their activities were being used to measure the quality of
the network.

Work remains on making the RPM methodology more robust, as well as on
reconciling the two fundamental ways of measuring the network responsiveness.
This work has been taken up by the IPPM working group within the IETF.

The workshop participants believe that it would be valuable to develop a
methodology that the community could use to reconcile the RPM metric with the
goodput metric.

### Metrics considerations - conclusions

Through the course of the workshop, the following statements emerged:

1. There is a dramatic difference between the idle latency and the working
latency measurements.
2. The variance in idle latency is low, while the working latency varies
wildly.
3. Most of the tools that measure end-to-end latency focus on the idle
latency.

Both the goodput and the latency are important to measure. Today, users in
many locations around the world have access to gigabit throughput, and for
them measuring the latency is more important. Other users cannot (yet) benefit
from high-bandwidth network access; for these users, both throughput and
latency should be measured.

When measuring the latency, one should measure the latency under typical
working conditions, since the idle latency is not a good estimator of the user
experience. Data shows that there is a dramatic difference between the idle
latency and the working latency. The RPM metric, described in {{Paasch2021}},
is one way to communicate the working latency to users.

Most existing tools focus on measuring the idle latency. It is important to
continue investing in a standard methodology (and a toolkit) that makes it
possible for anyone to measure the working latency.


## Cross-layer considerations

@@ -535,7 +693,7 @@ provide context.
10. End-users that want to be involved in QoS decisions should be able
to voice their needs and desires.
11. Applications are needed that can perform and report good quality
measurements in order to identify insufficient points in
network access.
12. Research done by regulators indicates that users/consumers prefer
a simple metric per application, which frequently resolves to
@@ -740,26 +898,26 @@ The workshop chairs consisted of:

The program committee consisted of:

Christoph Paasch
Cullen Jennings
Geoff Huston
Greg White
Jari Arkko
Jason Livingood
Jim Gettys
Katarzyna Kosek-Szott
Kathleen Nichols
Keith Winstein
Matt Mathis
Mirja Kuehlewind
Nick Feamster
Olivier Bonaventure
Randall Meyer
Sam Crowford
Stuart Cheshire
Toke Hoiland-Jorgensen
Tommy Pauly
Vint Cerf

# Github Version of this document

