Establishing Transparency and Fairness Guidelines for Feature Visibility #277

jcscottiii · 2024-05-14T16:29:42Z

Description

As webstatus.dev grows to encompass a wide range of browser feature
implementations, situations may arise where we need to temporarily or
permanently hide feature scores or features altogether. This could be due to a
variety of reasons, such as:

- Data anomalies: Potential errors or inconsistencies in the gathered data.
  For example, a feature may be implemented but have no Web Platform Tests
  coverage, or the results may imply that a feature is not implemented when in
  reality the test suite itself needs to be improved.
- Ongoing development: Features at an early stage where scores aren't
  representative, such as Web Platform Tests scores showing 100% for a feature
  that has not been fully implemented yet.

While these scenarios are understandable, it's crucial to handle them in a way
that maintains trust, transparency, and fairness amongst browser vendors while
also describing the actual state of the world to end-users. We want to invite
open discussion on how to best achieve this.

Desired Goals

- Equitable Process: Establish clear, unbiased criteria for when and how to
  hide information.
- Transparency: Document all decisions regarding feature visibility, along
  with the rationale.
- Accountability: Create a mechanism for the community to raise concerns and
  suggest changes.
- User Information: Provide clear explanations for end-users when information
  is hidden, including links back to the relevant discussion.

Possible Solutions (Not Exhaustive)

- Test Suite Review Process: Review the test suite to ensure that coverage is
  reasonable and that failures are explainable.
- Public Comment Period: Allow a timeframe for feedback before any information
  is hidden.
- "Hidden Score" Label: Add a visual indicator to hidden items, with a link to
  the rationale.
- GitHub Discussions: Use Discussions to host conversations around feature
  visibility concerns.

These are just starting points. We encourage everyone to share their ideas,
concerns, and suggestions to ensure we create a process that upholds the values
of this project.

Call to Action

Please feel free to comment on this issue with your thoughts. Your input is
invaluable in shaping the future of this project and ensuring it is a trusted
resource for everyone. Let's work together to build a truly transparent and
equitable process!

Please voice your concerns as well, while adhering to the project's
Code of Conduct.

This process can evolve over time as well, as we try things out.

foolip · 2024-05-17T18:30:41Z

Feedback from @meyerweb on Mastodon:
https://mastodon.social/@Meyerweb/112457440224134542

foolip · 2024-05-17T18:33:50Z

On "data anomalies", I think we'll need clear criteria for hiding scores, a few different common rationales, and perhaps links to the issues tracking the fixes.

Common rationales are insufficient coverage and widespread failures for reasons unrelated to the implementation quality.

meyerweb · 2024-05-17T19:57:31Z

As a follow-up to @foolip’s link to my toot (thanks, @foolip!), I think as long as rationales for absent scores are clear and consistent, you’ll be a lot further along.

I also believe there should be a lot more transparency on why a thing is listed at all when the supporting data doesn’t seem to be there. Example: https://webstatus.dev/features/canvas-text-baselines is listed as a newly-available baseline even though one of the tracked browsers is passing 0% of tests. (A whole three tests, it is true.) How can this be considered baseline when it’s apparently not supported at all by a tracked browser? I mean, I can think of at least one scenario where that sort of thing might be defensible, but I have no idea if this is such a scenario, nor does anyone else.

Even beyond that, https://webstatus.dev/features/hyphens is listed as baseline when its scores are mostly in the 50s, and the highest score is just short of 75%. It also has 55 tests, of which only 20 are passed by all four tracked browsers, which is a 36.4% Interop score. Does that qualify as baseline? I personally wouldn’t think so, but if there were a list of the ways things can get on the list, that would help a lot.

And then, I found https://webstatus.dev/features/conic-gradients, which is “Widely available” baseline, with one browser passing 18% of tests? And then https://webstatus.dev/features/webvtt, which ranges from 37-56% in terms of passing tests, and would have an Interop score of 9.1%? These also seem strange to include.

(I know that scores aren’t always the basis of something being considered baseline, but because the scores are so prominent, the questions seem inevitable. This is especially the case since “Insufficient test coverage” is given as a reason to not list scores, even if inconsistently.)
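For reference, the Interop-style figure quoted above for hyphens can be reproduced with a short sketch. It assumes, as the comment does, that the score is the share of tests passed by every tracked browser. The function name, browser keys, and test IDs are hypothetical, the data is synthetic, and webstatus.dev may compute its numbers differently.

```python
def interop_score(passed_by_browser: dict[str, set[str]], all_tests: set[str]) -> float:
    """Return the percentage of tests that every listed browser passes."""
    if not all_tests:
        raise ValueError("no tests to score")
    passed_by_all = set(all_tests)
    for passed in passed_by_browser.values():
        passed_by_all &= passed  # keep only tests this browser also passes
    return 100 * len(passed_by_all) / len(all_tests)


# Synthetic data shaped like the hyphens example:
# 55 tests in total, 20 of which pass in all four browsers.
tests = {f"test-{i}" for i in range(55)}
common = {f"test-{i}" for i in range(20)}
passed = {
    "chrome": common | {"test-20", "test-21"},
    "edge": common | {"test-20", "test-22"},
    "firefox": common | {"test-30"},
    "safari": common | {"test-40", "test-41"},
}
hyphens_score = interop_score(passed, tests)
print(f"{hyphens_score:.1f}%")  # prints 36.4%
```

Note that this figure can be far lower than any single browser's pass rate, because each browser may fail a different subset of tests.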

foolip · 2024-05-20T11:17:43Z

As a start, I've added source comments explaining each case in #301. We used the same reason for all of these for expedience, but we should make the distinction between a few different reasons:

- Obviously insufficient coverage, like for AVIF
- Widespread failures that we know to be for some reason other than the feature's implementation quality, like for device orientation events
- Failures that aren't understood, but seem unlikely to be a reflection of the implementation quality based on some out-of-band knowledge. For example, I'm fairly confident that preservesPitch works well enough in the majority of use cases web developers care about, so 22.2% for Firefox and 0% for Safari would be unreasonable.

@jcscottiii what do you think about always showing the ⓘ when we don't have a percentage to show, and having more reasons? The existing "---" should be "no tests found" with an invitation to contribute to the mapping.
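One way to picture the distinct reasons is as an enumeration that the UI consults whenever there is no percentage to show. This is only a sketch: the `HiddenReason` names, the message wording, and `render_score` are all hypothetical and do not come from the webstatus.dev codebase.

```python
from enum import Enum
from typing import Optional


class HiddenReason(Enum):
    # Hypothetical reason codes mirroring the cases listed above.
    NO_TESTS_FOUND = "No tests found. Contributions to the test mapping are welcome."
    INSUFFICIENT_COVERAGE = "Test coverage is clearly insufficient."
    UNRELATED_FAILURES = "Failures are known to be unrelated to implementation quality."
    UNEXPLAINED_FAILURES = "Failures are not yet understood and are under review."


def render_score(score: Optional[float], reason: Optional[HiddenReason] = None) -> str:
    """Return the text for a score cell: a percentage, or ⓘ plus a reason."""
    if reason is not None:
        return f"ⓘ {reason.value}"
    if score is None:
        # No score and no explicit reason: treat as "no tests found".
        return f"ⓘ {HiddenReason.NO_TESTS_FOUND.value}"
    return f"{score:.1f}%"


print(render_score(22.2, HiddenReason.UNEXPLAINED_FAILURES))
print(render_score(None))
print(render_score(98.04))
```

Keeping the reason separate from the score means a hidden score is never silently dropped: every cell shows either a number or an explanation.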

Reviewing the specific features @meyerweb mentioned:

https://webstatus.dev/features/canvas-text-baselines: I reviewed the Safari failures. My guess is that, since Safari implemented this so long before other browsers, the spec probably changed in some way and Safari's implementation doesn't match the current spec. This needs to be verified, so I've filed web-platform-dx/web-features#1120.

https://webstatus.dev/features/hyphens: We'll need a subject matter expert to review this test suite. It's hard to tell if the failures are for cases that will affect web developers or not. My guess is that basic usage of the feature is fine, but that interoperability in the details isn't very good.

https://webstatus.dev/features/conic-gradients: This was an oversight on my part. The failures mostly look like minor pixel value differences. If we can't fix the tests we should hide this score for Safari specifically.

https://webstatus.dev/features/webvtt: I think WebVTT interop is somewhat bad, but you can use the basic feature. However, I see that I need to update this mapping to split WebVTT from WebVTT regions, since that contributes to the low score in at least Chrome and Edge.

foolip · 2024-05-23T09:51:41Z

For WebVTT I've filed web-platform-tests/wpt#46453 and sent #314 to hide the scores on webstatus.dev. I think this fits the "Widespread failures that we know to be for some reason other than the feature's implementation quality" reason.

foolip · 2024-05-23T09:59:37Z

Thinking about some guardrails for support status vs. test results:

- Show no scores when a feature isn't supported (already the case)
- For supported features, automatically hide scores <50% until reviewed, because most such cases will be a problem with the test suite or infrastructure more than the implementation quality
- If a score is changed more than 10% by an issue other than implementation quality, hide the score for that specific browser

This would be the general approach, but exceptions could still be made based on other documented principles.
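The guardrails above could be sketched as a single visibility check. This is an illustration under stated assumptions, not the project's implementation: the `BrowserScore` fields and `should_show_score` are hypothetical names, and "more than 10%" is read here as more than 10 percentage points.

```python
from dataclasses import dataclass


@dataclass
class BrowserScore:
    supported: bool          # does the browser support the feature at all?
    score: float             # percentage of tests passing, 0 to 100
    reviewed: bool = False   # has a person reviewed this score?
    # Assumption: percentage points by which known non-implementation
    # issues (test bugs, infrastructure) shift this browser's score.
    non_implementation_impact: float = 0.0


def should_show_score(
    s: BrowserScore,
    low_threshold: float = 50.0,
    impact_threshold: float = 10.0,
) -> bool:
    """Apply the three guardrails in order; True means the score is shown."""
    if not s.supported:
        return False  # unsupported features show no score
    if s.score < low_threshold and not s.reviewed:
        return False  # low scores stay hidden until reviewed
    if abs(s.non_implementation_impact) > impact_threshold:
        return False  # score distorted by something other than the implementation
    return True


print(should_show_score(BrowserScore(supported=True, score=18.0)))                 # False
print(should_show_score(BrowserScore(supported=True, score=18.0, reviewed=True)))  # True
```

Documented exceptions would still sit on top of a check like this, for example as an explicit per-feature override with a linked rationale.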

dmitriid · 2024-08-17T20:58:27Z

For the sake of transparency, features whose status is "not on any standards track" should be shown as such instead of "limited availability"

jcscottiii · 2024-08-19T14:29:34Z

> For the sake of transparency, features whose status is "not on any standards track" should be shown as such instead of "limited availability"

@dmitriid

That's a great idea. And it would provide better insights than the current solution.

Looking at your comment on the related issue, we can leverage the status field from caniuse and check if it is unoff.
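A minimal sketch of that check, assuming each caniuse feature record is available as a dictionary with a `status` key whose value is `unoff` for features that are not on a standards track. The function names and label strings are hypothetical, and the standards-track note is kept as a separate label rather than replacing the availability label.

```python
def is_off_standards_track(caniuse_record: dict) -> bool:
    """True when the caniuse record marks the feature as unofficial."""
    return caniuse_record.get("status") == "unoff"


def feature_labels(availability: str, caniuse_record: dict) -> list[str]:
    """Return the availability label plus a standards-track note if needed."""
    labels = [availability]
    if is_off_standards_track(caniuse_record):
        labels.append("Not on any standards track")
    return labels


print(feature_labels("Limited availability", {"status": "unoff"}))
print(feature_labels("Limited availability", {"status": "wd"}))
```

A record with no `status` key is treated as being on a standards track here; whether that default is right depends on how complete the caniuse data is.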

dmitriid · 2024-08-19T14:32:30Z

Yeah, I didn't realize there was a related issue, so I ended up commenting (rather tersely 😬) on both.

I don't know how complete/up-to-date the data is, but it's probably okay if Can I Use ended up using it :)

past · 2024-08-21T15:12:57Z

I don't see why we would conflate "not on any standards track" and "limited availability" given they are orthogonal issues. I agree that the first part should be captured somehow though, which we should explore in #486.

jcscottiii pinned this issue May 14, 2024

jcscottiii added the "community" label (Issues seeking input from the community on project direction, policy, or decision-making) May 14, 2024

foolip mentioned this issue May 20, 2024

Should canvas text baselines be Baseline? web-platform-dx/web-features#1120

Open

jcscottiii mentioned this issue May 31, 2024

Host the Microsoft Edge 2024 web platform top dev needs #320

Open
