Issues on duplicates occurring on both 8.14.1 and on 8.14.3 but not on 8.12.2 #40808

Open
ZerNox opened this issue Sep 13, 2024 · 2 comments
Labels
needs_team Indicates that the issue/PR needs a Team:* label

ZerNox commented Sep 13, 2024

We upgraded from 8.12.2, first to 8.14.1 and then to 8.14.3. On both 8.14.1 and 8.14.3 we see duplicated events from Filebeat, which we believe originate in the filestream reader. If we revert to 8.12.2 and run the same load test, there are no duplicates.

Has anything obvious changed in the configuration between these versions that I have missed, or is there something we should modify in our configuration?

For confirmed bugs, please report:

  • Version: 8.14.1 and 8.14.3 (duplicates); 8.12.2 (no duplicates)
  • Operating System: Ubuntu 22.04.4 LTS (Jammy Jellyfish)
    Server: Docker Engine - Community
      Engine:
        Version: 27.2.0
        API version: 1.47 (minimum version 1.24)
        Go version: go1.21.13
        Git commit: 3ab5c7d
        Built: Tue Aug 27 14:15:13 2024
        OS/Arch: linux/amd64
        Experimental: false
      containerd:
        Version: 1.7.21
        GitCommit: 472731909fa34bd7bc9c087e4c27943f9835f111
      runc:
        Version: 1.1.13
        GitCommit: v1.1.13-0-g58aa920
      docker-init:
        Version: 0.19.0
        GitCommit: de40ad0
  • Discuss Forum URL: N/A
  • Steps to Reproduce:
  • Start logging and run a standard load test; duplicate events appear.
setup.ilm.enabled: false
setup.template.overwrite: true
#logging.level: debug
filebeat.inputs:
  - type: filestream
    paths:
      - "/application/logs/*.log"
      - "/application/logs/archive/*.log-*"
    json.keys_under_root: true
    json.overwrite_keys: true
    prospector.scanner.exclude_files: ["nlog-internal-error.log","^.log"]
    encoding: utf-8
    rotation.external.strategy.copytruncate:
      suffix_regex: \-\d{6}$\.\d$
      dateformat: -2006-01-02
processors:
  - decode_json_fields:
      fields: ["message"]
      process_array: false
      max_depth: 2
      target: ""
      overwrite_keys: true
      add_error_key: false
  - timestamp:
      field: timestamp
      layouts:
        - '2006-01-02T15:04:05.999999Z'
        - '2006-01-02T15:04:05Z'
        - '2006-01-02T15:04:05.999Z'
      test:
        - '2022-04-06T03:13:35.654269Z'
  - drop_fields:
      fields: ["timestamp"]
  - rename:
      when:
        has_fields: ['labels.MetricName']
      fields:
      - from: "labels.MetricName"
        to: "metric-name"
      - from: "labels.MetricValue"
        to: "metric-value"
      ignore_missing: true
      fail_on_error: true
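
(As a stopgap while the root cause is investigated, duplicate events can also be suppressed on the Elasticsearch side by giving each event a deterministic document _id with Beats' fingerprint processor. A minimal sketch, assuming message plus timestamp uniquely identify an event; in the pipeline above it would have to run before the drop_fields step removes timestamp:)

processors:
  - fingerprint:
      fields: ["message", "timestamp"]   # assumption: this pair is unique per event
      target_field: "@metadata._id"      # the Elasticsearch output uses this as the
                                         # document _id, so a re-sent event overwrites
                                         # the first copy instead of duplicating it

(This does not fix the filestream regression itself; it only makes repeated deliveries idempotent at index time.)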
botelastic bot added the needs_team label on Sep 13, 2024

botelastic bot commented Sep 13, 2024

This issue doesn't have a Team:<team> label.

ZerNox changed the title from "Issues on duplicates occurring on both 8.14.1 and on 8.14.3" to "Issues on duplicates occurring on both 8.14.1 and on 8.14.3 but not on 8.12.2" on Sep 13, 2024

rlueckl commented Sep 20, 2024

We're experiencing the same issue since upgrading to 8.14.x. I've narrowed it down to the problem being introduced in 8.14.0; the duplicates did not happen on 8.13.4. I suspect #36761 and #38488 as the cause, but I haven't been able to confirm that yet.

The duplicates happen for us when a logfile isn't updated within the close_inactive timeout. By using the settings from "Inode reuse causes Filebeat to skip lines" we could work around the issue, and it no longer happens, even with 8.14.x.
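
(Tangentially, since inode reuse is implicated: newer filestream versions also offer a fingerprint-based file identity that avoids inode-based tracking altogether. A minimal sketch, untested here; note that switching file_identity changes how existing registry entries are matched, so it can itself trigger a one-time re-read of files:)

- type: filestream
  id: cassandra_system
  paths:
    - /var/log/cassandra/system.log
  prospector.scanner.fingerprint.enabled: true   # required by the identity below
  file_identity.fingerprint: ~                   # track files by content fingerprint
                                                 # instead of inode/device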

Our default config, which causes duplicates on 8.14.x:

---
- type: filestream
  id: cassandra_system
  paths:
  - /var/log/cassandra/system.log
  encoding: plain
  document_type: log
  prospector:
    scanner:
      check_interval: 10s
  harvester_buffer_size: 16384
  message_max_bytes: 10485760
  parsers:
    - multiline:
        pattern: '^[[:space:]]*[A-Z]+[[:space:]]+\['
        negate: true
        match: after
  tail_files: false

  backoff.init: 1s
  backoff.max: 10s
  backoff_factor: 2

  close_inactive: 5m
  close_renamed: false
  close_removed: true
  close_eof: false
  clean_inactive: 0
  clean_removed: true
  close_timeout: 0

  fields:
    type: cassandra

  fields_under_root: true

New config working with 8.14.x:

---
- type: filestream
  id: cassandra_system
  paths:
  - /var/log/cassandra/system.log
  encoding: plain
  ignore_older: 48h
  document_type: log
  prospector:
    scanner:
      check_interval: 10s
  harvester_buffer_size: 16384
  message_max_bytes: 10485760
  parsers:
    - multiline:
        pattern: '^[[:space:]]*[A-Z]+[[:space:]]+\['
        negate: true
        match: after
  tail_files: false

  backoff.init: 1s
  backoff.max: 10s
  backoff_factor: 2

  close_inactive: 10m
  close_renamed: false
  close_removed: true
  close_eof: false
  clean_inactive: 72h
  clean_removed: true
  close_timeout: 0

  fields:
    type: cassandra

  fields_under_root: true

So the differences between the two configs are:

  • ignore_older: 48h (vs. not set)
  • close_inactive: 10m (vs. 5m)
  • clean_inactive: 72h (vs. 0)

Most of these settings are default values coming from pcfens/filebeat, and after playing around with them, only clean_inactive seems to affect the issue. If I set it to anything other than 0, there are no duplicates; as soon as clean_inactive: 0 is set, the duplicates start coming.
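
(For context, the Filebeat docs note that when clean_inactive is enabled it must be greater than ignore_older plus the scanner's check interval, and the working values above are consistent with that. A minimal delta over the default config, assuming everything else is unchanged:)

  ignore_older: 48h     # must stay below clean_inactive
  close_inactive: 10m
  clean_inactive: 72h   # per the report above, any non-zero value avoided the duplicates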

Some more investigation revealed that setting clean_inactive: 0 in 8.14.x has the effect of nulling out the cursor and meta fields in the registry file (/var/lib/filebeat/filebeat/log.json):

clean_inactive: 0 in 8.13.x:

{"op":"set","id":65254}
{"k":"filestream::cassandra_system::native::262312-2051","v":{"ttl":1800000000000,"updated":[516033235298,1726832301],"cursor":{"offset":2052210},"meta":{"source":"/var/log/cassandra/system.log","identifier_name":"native"}}}
{"op":"set","id":65255}
{"k":"filestream::cassandra_system::native::262312-2051","v":{"ttl":1800000000000,"updated":[516035577754,1726832336],"cursor":{"offset":2052383},"meta":{"source":"/var/log/cassandra/system.log","identifier_name":"native"}}}

clean_inactive: 0 in 8.14.x:

{"op":"set","id":65260}
{"k":"filestream::cassandra_system::native::262312-2051","v":{"cursor":null,"meta":null,"ttl":0,"updated":[515468346993,1726834065]}}
{"op":"set","id":65261}
{"k":"filestream::cassandra_system::native::262312-2051","v":{"ttl":0,"updated":[515468400457,1726834965],"cursor":null,"meta":null}}

Shouldn't the cursor.offset (and meta) have a value, even if clean_inactive is zero?
