Specify when flush returns unsuccessfully #623

Draft
wants to merge 3 commits into
base: main
from

felixbarny
Member

  • Create PR as draft
  • Approval by at least one other agent
  • Mark as Ready for Review (automatically requests reviews from all agents and PM via CODEOWNERS)
    • Remove PM from reviewers if impact on product is negligible
    • Remove agents from reviewers if the change is not relevant for them
  • Merge after 2 business days passed without objections
    To auto-merge the PR, add /schedule YYYY-MM-DD to the PR description.

@apmmachine

apmmachine commented Mar 24, 2022

💚 Build Succeeded

Build stats

  • Start Time: 2023-03-31T05:18:00.028+0000

  • Duration: 3 min 25 sec

In the edge case where the extension takes too much time to respond (e.g. if there's a lengthy GC pause),
the `flush` method should return after a timeout.

The default timeout is 1s.
Contributor

As a data point for what is currently happening in the agents, we use api_request_time as our flush timeout, which defaults to 10s.

That's likely too long, especially for Lambda. But I'm thinking we should create a new config option so this is configurable (and divorced from api_request_time).

Member Author

Suggested change
The default timeout is 1s.
| | |
|----------------|---|
| Type | [duration](configuration.md#configuration-value-types) |
| Default | `1s` |
| Dynamic | `true` |

Member

@trentm trentm left a comment

Sounds good to me as an improvement over #613

Co-authored-by: Trent Mick <[email protected]>
@@ -291,3 +291,17 @@ Therefore, the Lambda instrumentation has to ensure that data is flushed in a bl

Some Lambda functions will use the custom-built Lambda extension that allows the agent to send its data locally. The extension asynchronously forwards the data it receives from the agent to the APM server so the Lambda function can return its result with minimal delay. In order for the extension to know when it can flush its data, it must receive a signal indicating that the Lambda function has completed. There are two possible signals: one is via a subscription to the AWS Lambda Logs API and the other is an agent intake request with the query param `flushed=true`. A signal from the agent is preferable because there is an inherent delay in the delivery of the Logs API signal.
Therefore, the agent must send its final intake request at the end of the function invocation with the query param `flushed=true`. In case there is no more data to send at the end of the function invocation, the agent must send an empty intake request with this query param.
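
As an illustration of the rule above, here is a minimal sketch of how an agent might build its final intake request. The function name, the extension address, and the newline-joined body are assumptions made for this example; only the `flushed=true` query param and the "send an empty request if there is no data" rule come from the spec text.

```python
from typing import List, Tuple
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical address of the local extension; not specified in this PR.
EXTENSION_INTAKE_URL = "http://localhost:8200/intake/v2/events"


def final_intake_request(base_url: str, buffered_events: List[bytes]) -> Tuple[str, bytes]:
    """Return (url, body) for the last intake request of an invocation.

    The URL always carries `flushed=true`. The body is empty when nothing
    is buffered, so the extension still receives its end-of-invocation signal.
    """
    parts = urlsplit(base_url)
    flushed = urlencode({"flushed": "true"})
    # Preserve any query params that are already present on the base URL.
    query = parts.query + "&" + flushed if parts.query else flushed
    url = urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
    # Assumption for the sketch: events are pre-serialized, one per line.
    body = b"\n".join(buffered_events)
    return url, body
```

Calling `final_intake_request(EXTENSION_INTAKE_URL, [])` yields a URL ending in `?flushed=true` and an empty body, which corresponds to the "no more data to send" case.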

### Flush timeout
Member Author

This option is used by the Java agent already. See https://www.elastic.co/guide/en/apm/agent/java/current/config-serverless.html#config-data-flush-timeout

Suggested change
### Flush timeout
### Configuration option `data_flush_timeout`

In the edge case where the extension takes too much time to respond (e.g. if there's a lengthy GC pause),
the `flush` method should return after a timeout.

The default timeout is 1s.
Member Author

Suggested change
The default timeout is 1s.
| | |
|----------------|---|
| Type | [duration](configuration.md#configuration-value-types) |
| Default | `1s` |
| Dynamic | `true` |
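
For reference, the timeout behaviour discussed in this thread could look roughly like the sketch below. It is not taken from any agent: the `flush` signature, the `send` callback, and the thread-based approach are assumptions for illustration. Only the `1s` default and the requirement that `flush` returns unsuccessfully after the timeout come from the PR.

```python
import threading

# Default proposed in this PR for `data_flush_timeout` (1s), in seconds.
DATA_FLUSH_TIMEOUT_S = 1.0


def flush(send, timeout_s: float = DATA_FLUSH_TIMEOUT_S) -> bool:
    """Run the hypothetical `send()` callback and wait at most `timeout_s`.

    Returns True if `send()` completed without raising before the timeout,
    and False if it raised or the extension did not respond in time.
    """
    finished = threading.Event()
    outcome = {"ok": False}

    def worker():
        try:
            send()
            outcome["ok"] = True
        except Exception:
            outcome["ok"] = False
        finally:
            finished.set()

    # Daemon thread so a stuck request does not keep the process alive.
    threading.Thread(target=worker, daemon=True).start()

    if not finished.wait(timeout_s):
        return False  # timed out: report an unsuccessful flush
    return outcome["ok"]
```

A caller would pass a function that performs the final intake request, for example `flush(lambda: post_to_extension(url, body))`, where `post_to_extension` is a placeholder for whatever HTTP client the agent uses.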

5 participants