
Performance issues with latest minor versions of colinmollenhour/cache-backend-redis and colinmollenhour/credis #181

Open
glo71317 opened this issue Feb 1, 2024 · 17 comments


@glo71317

glo71317 commented Feb 1, 2024

Hi,

When we ran composer update, the minor versions of colinmollenhour/cache-backend-redis and colinmollenhour/credis were updated, and we then saw a large number of performance issues in Adobe Commerce.

During the investigation we found that these dependencies were causing the issue, so we pinned both packages to their previous versions, which worked well for us.

More investigation is still pending; our release is very near, so we decided to unblock delivery for the upcoming release and will continue investigating once we have time after it.

You may also want to review the changes introduced in the latest versions of the packages mentioned above.

The log is attached:
report.zip

For more detail - magento/magento2@9369884#commitcomment-137202896

hostep referenced this issue in magento/magento2 Feb 1, 2024
…ntroduced performance issue in latest version cache-backend-redis and credis
@Xon
Contributor

Xon commented Feb 1, 2024

The lock file indicates you went back from v1.17.1 to v1.16.0

The major performance-related change was that Lua scripting became enabled by default in v1.17.0. This means fewer remote calls to Redis to perform cache actions.
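
For anyone who wants to stay on the new version but keep the old behavior, here is a minimal sketch of pinning the option explicitly when constructing the backend. The option name use_lua is taken from this thread; the exact array shape and values are assumptions, so check the Cm_Cache_Backend_Redis README for your installed version:

```php
<?php
// Minimal sketch: set the Lua behavior explicitly so a library upgrade cannot
// silently change it. Option values here are assumptions for illustration.
require 'vendor/autoload.php';

$backend = new Cm_Cache_Backend_Redis([
    'server'   => '127.0.0.1', // assumed local Redis instance
    'port'     => '6379',
    'database' => '0',
    'use_lua'  => '0',         // '0' = pre-1.17.0 behavior, '1' = the new default
]);
```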

@glo71317
Author

@Xon You are correct! Thanks for highlighting it.

@colinmollenhour We investigated a bit more today and decided to roll back your small change (https://github.com/colinmollenhour/Cm_Cache_Backend_Redis/pull/182/files) and run our build against the revert we created for #182, after updating the minor versions of both libraries, colinmollenhour/cache-backend-redis and colinmollenhour/credis.
Screenshot 2024-02-22 at 3 39 25 PM

The performance issue was resolved, so I think we can conclude that the major performance-related change was Lua being enabled by default in v1.17.0.
Screenshot 2024-02-22 at 3 39 57 PM

Please take a look and suggest the way forward!

@colinmollenhour
Owner

@glo71317

Note that due to lack of transactions, not using Lua mode introduces potential for race conditions. These race conditions would probably not have much impact but could occasionally cause odd side-effects like a cache record that doesn't get cleared when the tag is cleared.
Also, I think very large caches have trouble clearing and running garbage collection reliably without Lua mode.

It may be that the best solution is to not use Lua when saving cache records since these are many small operations, but do use it when clearing tags and gc.

Are you able to pinpoint what exactly is slower with Lua enabled? Does your test framework have an unrealistic number of cache saves and tag clears? Lua mode definitely should not hurt read performance, and a healthy store should easily have far more reads than writes.

@Xon
Contributor

Xon commented Feb 23, 2024

If anything, the performance impact should be from disabling Lua: the lock file went from having Lua enabled to Lua disabled due to the version change.

@glo71317
Author

glo71317 commented Feb 23, 2024

@colinmollenhour You might be right, but Lua should not have been enabled by default in a minor version; a change that can affect backward compatibility is best introduced in a major version. Normally, pinning a library to an older minor or patch version is not a good way to resolve an issue.

Right now it is time-consuming to pinpoint the issue in Adobe Commerce. Our suggestion would be to disable Lua by default, because many other users are also impacted.

@colinmollenhour
Owner

I didn't consider this a backward-compatibility issue, as Lua mode has been present for a very long time and no code APIs changed. It is unexpected that it had a noticeable negative impact on performance, though. It was released as a minor version update and not a patch update, so you could lock your version to ~1.16.0, allowing future patch updates while not upgrading to 1.17.0.

@hostep

hostep commented Mar 5, 2024

It appears Magento 2.4.7 will come with a configuration option to enable or disable lua: magento/magento2@b5ede69
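
A sketch of what that could look like in app/etc/env.php, assuming the option is exposed as a use_lua entry under backend_options; the exact key, default, and backend class are assumptions, so check the linked commit and the 2.4.7 documentation:

```php
<?php
// Sketch of the cache section of app/etc/env.php only; key names and the
// backend class are assumptions to verify against the Magento 2.4.7 docs.
return [
    'cache' => [
        'frontend' => [
            'default' => [
                'backend' => 'Cm_Cache_Backend_Redis',
                'backend_options' => [
                    'server'   => '127.0.0.1',
                    'port'     => '6379',
                    'database' => '1',
                    'use_lua'  => '0', // '0' = Lua disabled, '1' = Lua enabled
                ],
            ],
        ],
    ],
];
```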

@colinmollenhour
Owner

Thanks for the update, @hostep

@glo71317 I'd still appreciate any insight into the situations where the Lua flag is a net negative, if you have any more details. If the read:write ratio is "normal" compared to what I've seen in my experience, the impact should be minimal, so I suspect it is unique to the test environment, but I've been wrong before. :)

@glo71317
Author

glo71317 commented Mar 6, 2024

@colinmollenhour We have adapted to the Lua changes on our end: by default, Lua will be disabled in Adobe Commerce when Redis is enabled, and any end user who wants Lua can enable the flag and continue to meet their requirements.
You can also close this issue from our side. Thanks for the update!

@hostep

hostep commented Mar 6, 2024

@glo71317: I agree with Colin that it would be a good idea to dig a bit deeper into Magento's cache usage to figure out exactly what the root cause of this problem is, instead of hiding it behind a flag.

@glo71317
Author

glo71317 commented Mar 6, 2024

@hostep We are not hiding anything; I will speak to our PO about investigating this in more depth and sharing the read:write ratio. We will certainly do so after the current release is finished.

@igorwulff

@hostep We have seen similar issues with Redis after use_lua was set to 1 by default. We had to set it back to 0 to stop Redis read-timeout errors and other errors from popping up. Magento's backend_clean_cache cron also started to fail after use_lua=1 became the default.

@Nuranto

Nuranto commented Apr 29, 2024

Maybe related: magento/magento2#38668

@damienwebdev

I think I have a valid reproduction of this on Magento 2.4.6-p6 and colinmollenhour/cache-backend-redis:v1.17.1

My redis monitor is full of:

1723139520.164118 [0 IP:PORT] "evalsha" "1617c9fb2bda7d790bb1aaa320c1099d81825e64" "3" "40d_CONFIG_SCOPES" "40d_CONFIG" "40d_MAGE" "zc:k:" "d" "t" "m" "i" "zc:tags" "zc:ti:" "zc:ids" "40d_SYSTEM" "gz:..." "40d_CONFIG_SCOPES,40d_CONFIG,40d_MAGE" "1723139520" "1" "" "0"
1723139520.778961 [0 lua] "HMSET" "zc:k:40d_SYSTEM" "d" "gz:..." "t" "40d_CONFIG_SCOPES,40d_CONFIG,40d_MAGE" "m" "1723139520" "i" "1"

This happens upon repeated attempts to make cacheable GET requests. I would assume that the contents of the cache contain something that causes a hash mismatch for the same content over and over.

I'm going to try to dig further to see what (in these cache keys) is causing a miss.

@colinmollenhour
Owner

Thanks for the info, @damienwebdev

That's the save() command for the 'SYSTEM' key. Are all the updates referencing the '40d_SYSTEM' value in the same position, or is that value repeated rarely or not at all?

The Magento config (which consists of many sections/scopes, each with its own key) should be treated as basically immutable; there should be absolutely no code that changes it directly. Only a user-triggered config update or a new code deployment should touch the config. I've seen extensions that do very naughty things like using the config to store frequently updated state, for example after a cron job runs; this is what core/flag is good for (in M1, not sure what the M2 equivalent is called). Also, it is bad to use "Flush" on a busy site as it is possible to cause a "stampede" of updates (this was fixed thoroughly in OpenMage, not sure about other variants). It's even worse when there is a config flush buried in an extension somewhere.

If users are updating the config frequently, it is true that it is not a well-optimized scenario as Magento re-saves the entire config even if only a single value is changed so this too should be avoided. E.g. perhaps some config data should instead live in separate tables and not the config. This really applies to any backend, but the Redis lua mode does sacrifice some save performance for stability - without it there is some possibility for race conditions since it requires multiple updates. In general, the assumption is that saves are rare and reads are not, so if saves are not rare, then lua mode will not be a good candidate.
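
To illustrate the trade-off described above, here is a rough sketch of what a single cache save looks like in the two modes, written against Credis_Client. The key layout follows the zc:k:*, zc:ti:*, zc:tags and zc:ids names visible in the monitor output earlier in this thread, but the calls are illustrative assumptions and are not the library's actual internal code:

```php
<?php
// Rough illustration only; not the backend's real save() implementation.
require 'vendor/autoload.php';

$redis = new Credis_Client('127.0.0.1', 6379);
$data  = 'gz:...'; // placeholder payload

// Without Lua: the key hash, each tag set, and the id index are updated with
// separate commands (pipelined here). That is several operations per save and
// not atomic as a whole, which is where the race conditions come from.
$redis->pipeline()
    ->hMSet('zc:k:40d_SYSTEM', ['d' => $data, 't' => '40d_CONFIG', 'm' => time(), 'i' => 1])
    ->sAdd('zc:ti:40d_CONFIG', '40d_SYSTEM')
    ->sAdd('zc:tags', '40d_CONFIG')
    ->sAdd('zc:ids', '40d_SYSTEM')
    ->exec();

// With Lua: the same bookkeeping runs inside one server-side script, so the
// client makes a single EVALSHA round trip and the whole save is atomic, at
// the cost of the script doing more work (including encoding/decoding) per
// call. That is the single "evalsha" line seen in the monitor output above.
```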

If the core uses a pattern that needlessly causes lots of config updates, then I'd argue that should be fixed first.

Also, I've never checked this in M2 but in OpenMage the config uses an infinite lifetime to prevent the config from being evicted - only an intentional act should clear the config. Lastly, the "allkeys-random" eviction policy should definitely not be used.

However, as I've looked back on this, the current Lua implementation does everything in one call, but since it involves extra encoding/decoding steps and the config data can be quite large for some stores, it may make sense to split it into two calls: one that saves the key data (the first HMSET) and then the EVALSHA to update all of the tags. I think only an implementation comparing benchmarks can answer that. I'm not planning on tackling this at the moment, just to be clear.

@damienwebdev

@colinmollenhour After digging into this further, it appears my issue was caused by the missing retry_reads_on_master option combined with triggering a cache clear. Even on a site with absolutely no traffic, I was able to see this behavior on a system with 1 master and 9 replicas simply by loading pages in the Magento admin.

Once I added that option, my specific issue was resolved. Apologies for the noise.
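
For anyone else running a replicated Redis setup, here is a sketch of the backend options involved. retry_reads_on_master is the option named above; the other keys and the hostnames are illustrative assumptions, so verify them against the Cm_Cache_Backend_Redis README and your own app/etc/env.php:

```php
<?php
// Sketch only: Redis backend options relevant to a master/replica setup.
$backendOptions = [
    'server'                => 'redis-master.internal',       // assumed master endpoint
    'port'                  => '6379',
    'load_from_slave'       => 'redis-replica.internal:6379', // assumed replica endpoint
    'retry_reads_on_master' => '1', // retry on the master when a replica read misses
    'use_lua'               => '1',
];
```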

@colinmollenhour
Owner

colinmollenhour commented Aug 20, 2024

Ahh, that will do it; it sounds like the replication latency caused a cache-saving stampede. OpenMage fixed this in https://github.com/OpenMage/magento-lts/pull/3533/files even without retry_reads_on_master by using MySQL's GET_LOCK. I'm not sure what the M2 implementation is doing.
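
For context, the GET_LOCK approach mentioned above boils down to letting only one process rebuild the config cache while the others wait and reuse the result. A minimal sketch using PDO follows; the lock name and the regenerateConfigCache()/loadConfigFromCache() helpers are hypothetical placeholders, and the real OpenMage implementation in the linked PR differs:

```php
<?php
// Minimal sketch of stampede protection with MySQL named locks.
// regenerateConfigCache() and loadConfigFromCache() are hypothetical
// placeholders; see the linked OpenMage PR for the actual implementation.
function saveConfigCacheOnce(PDO $pdo): void
{
    // Only one process gets the lock; GET_LOCK blocks for up to 10 seconds.
    $gotLock = (int) $pdo->query("SELECT GET_LOCK('core_config_cache', 10)")->fetchColumn();

    if ($gotLock === 1) {
        try {
            regenerateConfigCache(); // expensive: rebuild and save the config cache
        } finally {
            $pdo->query("SELECT RELEASE_LOCK('core_config_cache')");
        }
    } else {
        // Another process is already rebuilding; read its result instead of
        // piling on with another save (the "stampede").
        loadConfigFromCache();
    }
}
```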

I tested the idea of moving the HMSET that does the main data save outside of the Lua script (see https://github.com/colinmollenhour/Cm_Cache_Backend_Redis/tree/lua-save-performance) and it's surprisingly quite a bit slower, so I don't think it's a good improvement even though it would potentially reduce blocking time. So as of now I don't know of any way to improve either Lua or non-Lua mode.
