Document enhanced tracing in Studio via OTel #5967

Open
wants to merge 1 commit into base: dev

timbotnik
Contributor

@timbotnik timbotnik commented Sep 6, 2024

Officially document enhanced tracing via OTel. This is currently marked experimental and we've started to get some usage of this feature. Time to make it more visible.

Fixes #712


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible [1]
  • Documentation [2] completed
  • Performance impact assessed and acceptable
  • Tests added and passing [3]
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

Contributor

github-actions bot commented Sep 6, 2024

@timbotnik, please consider creating a changeset entry in /.changesets/. These instructions describe the process and tooling.

router-perf bot commented Sep 6, 2024

CI performance tests

  • connectors-const - Connectors stress test that runs with a constant number of users
  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

timbotnik force-pushed the timbotnik/ROUTER-712/document-otel-studio-path branch 2 times, most recently from 36c3142 to 570c4ee, on September 6, 2024 03:51
timbotnik force-pushed the timbotnik/ROUTER-712/document-otel-studio-path branch from 570c4ee to 1891344 on September 6, 2024 04:00
timbotnik marked this pull request as ready for review on September 6, 2024 04:01
timbotnik requested a review from a team as a code owner on September 6, 2024 04:01
bnjjj requested a review from shorgi on September 6, 2024 07:27
Comment on lines +876 to +880
Beginning in v1.49.0, the router supports sending traces to Studio via the more detailed OTel (OpenTelemetry) protocol.
Support for OTel traces has historically only been available for 3rd party APM tools. With this option,
Studio can now provide a much more granular view of Router internals than the legacy Apollo tracing protocol.

See [Enhanced tracing in Studio via OTel](./telemetry/apollo-telemetry#enhanced-tracing-in-studio-via-opentelemetry).
Contributor

Suggested change
Beginning in v1.49.0, the router supports sending traces to Studio via the OpenTelemetry Protocol (OTLP).
Support for OTLP traces has historically only been available for third-party APM tools. With this option,
Studio can now provide a much more granular and detailed view of router internals than the previous Apollo tracing protocol.
To learn more, see [Enhanced tracing in Studio via OpenTelemetry](./telemetry/apollo-telemetry#enhanced-tracing-in-studio-via-opentelemetry).


<ExperimentalFeature />

Beginning in v1.49.0, the router supports sending traces to Studio via the more detailed OTel (OpenTelemetry) protocol.
Contributor

Changing references to "OTel protocol" to "OTLP" to be consistent with the option's name and the well-known OTLP acronym

Suggested change
Beginning in v1.49.0, the router supports sending traces to Studio via the OpenTelemetry Protocol (OTLP).
Support for OTLP traces has historically only been available for third-party APM tools. With this option,
Studio can now provide a much more granular and detailed view of router internals than the previous Apollo tracing protocol.

Support for OTel traces has historically only been available for 3rd party APM tools. With this option,
Studio can now provide a much more granular view of Router internals than the legacy Apollo tracing protocol.

Benefits include:
Contributor

Suggested change
Benefits of OTLP traces include:

Comment on lines +89 to +91
- A comprehensive way to visualize the Router execution path in Studio.
- Additional spans that were previously not included in Studio traces, such as query parsing, planning, execution, and more.
- Additional attributes including HTTP request details, REST connector details, and more.
Contributor

Suggested change
- Comprehensive visualization of the router execution path in Studio
- New spans in Studio traces, including query parsing, planning, execution, and more
- New attributes, including HTTP request details, REST connector details, and more

- Additional spans that were previously not included in Studio traces, such as query parsing, planning, execution, and more.
- Additional attributes including HTTP request details, REST connector details, and more.

It is expected that this will become the default in a future version of Router.
Contributor

Removing, generally we don't document roadmap in docs

Suggested change
It is expected that this will become the default in a future version of Router.

Comment on lines +100 to +102
- `always_off` (default): send all traces via the legacy Apollo Usage Reporting protocol.
- `always_on`: send all traces via OTLP.
- `0.0 - 1.0` (used for testing): the ratio of traces to send via OTLP (0.5 = 50 / 50).
Contributor

Suggested change
- `always_off` (default): send all traces via the legacy Apollo Usage Reporting protocol
- `always_on`: send all traces via OTLP
- `0.0 - 1.0` (used for testing): the ratio of traces to send via OTLP (0.4 = 40% OTLP / 60% legacy)

- `always_on`: send all traces via OTLP.
- `0.0 - 1.0` (used for testing): the ratio of traces to send via OTLP (0.5 = 50 / 50).

Note that this sampler is only applied _after_ the common tracing sampler, for example:
Contributor

Suggested change
This sampler is applied after the common tracing sampler.


Note that this sampler is only applied _after_ the common tracing sampler, for example:

#### Sample 1% of traces, send all traces via OTLP:
Contributor

Suggested change
#### Example configuration
An example configuration that samples 1% of traces and sends all traces via OTLP:
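
For illustration, here is a minimal sketch of what such a configuration might look like. It assumes the two sampler keys named elsewhere in this PR (`telemetry.exporters.tracing.common.sampler` and `telemetry.apollo.otlp_tracing_sampler`) nest in the router's YAML config the way their dotted paths suggest. The YAML from the actual diff is not reproduced on this page, so treat this as an approximation rather than the author's exact example:

```yaml
telemetry:
  apollo:
    # Send every trace that survives the common sampler to Studio via OTLP
    otlp_tracing_sampler: always_on
  exporters:
    tracing:
      common:
        # Common tracing sampler, applied first: keep 1% of traces
        sampler: 0.01
```

Because the OTLP sampler runs after the common sampler, this setup would send roughly 1% of requests to Studio as OTLP traces and none via the legacy protocol.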

Comment on lines +121 to +127
OTel traces sent to Studio will not necessarily be identical to the ones sent to 3rd Party APM tools via OTLP:

- Only specific OTLP attributes will be included for parity with what is provided in legacy traces today. This ensures that data privacy
is maintained in an equivalent manner. The existing Router configuration options for Apollo telemetry will continue to function
with OTLP traces, such as forwarding of GraphQL errors, headers, and variables.
- Some features of OTLP traces may only be available in Studio and not in 3rd Party APM tools (e.g. resolver-level timing information from
[Federated Tracing](../../federation/metrics/#enabling-federated-tracing)).
Contributor

Suggested change
OTLP traces sent to Studio aren't necessarily identical to ones sent to third-party APM tools via OTLP:
- Only specific OTLP attributes are included for parity with legacy traces today. This ensures that data privacy is maintained in an equivalent manner. Existing router configuration options for Apollo telemetry will continue to function with OTLP traces, including forwarding of GraphQL errors, headers, and variables.
- Some features of OTLP traces are available only in Studio and not in third-party APM tools, such as resolver-level timing information from [Federated Tracing](../../federation/metrics/#enabling-federated-tracing).

Comment on lines +131 to +135
This change results in using a new wire protocol for traces, and some users may experience an increase in tracing traffic
to GraphOS Studio due to the additional detail being captured. In exceptional situations it may be necessary to send fewer traces.
This can be achieved via sending fewer traces (`telemetry.exporters.tracing.common.sampler`) or as a last resort, falling back
to the old protocol via `telemetry.apollo.otlp_tracing_sampler` to send fewer OTLP traces or fully disable them.
Any performance regressions due to the new tracing protocol should also be reported to the Apollo support team.
Contributor

Suggested change
You may experience an increase in tracing traffic sent to GraphOS Studio due to the additional detail captured by the new wire protocol. In exceptional situations, you may need to send fewer traces.
To send fewer traces, configure `telemetry.exporters.tracing.common.sampler` or revert to the old protocol via `telemetry.apollo.otlp_tracing_sampler` to send fewer OTLP traces or to disable them.
Report any performance regressions caused by the new tracing protocol to the [Apollo support team](https://www.apollographql.com/support).
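
As a rough sketch of those two levers, assuming the dotted key paths above map directly onto nested YAML (the values are illustrative only, not recommendations):

```yaml
telemetry:
  exporters:
    tracing:
      common:
        # Lever 1: lower the common sampler so fewer traces are produced at all
        sampler: 0.005
  apollo:
    # Lever 2: send only a fraction of the sampled traces via OTLP (0.4 = 40% OTLP / 60% legacy),
    # or set this to always_off to fall back to the legacy protocol entirely
    otlp_tracing_sampler: 0.4
```

Lowering the common sampler reduces the trace volume sent to every exporter, while the Apollo-specific sampler only changes which protocol is used for the traces sent to Studio.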

3 participants