vortimj.blogg.se - Improved overall apache lucene searching performance

#Improved overall apache lucene searching performance code#

The proposal is to replace the internals of LRUQueryCache so that external usages are not affected in terms of the API. Solr has already introduced Caffeine, an off-the-shelf Java caching library, in SOLR-8241 and SOLR-13817.

An attempt was made to perform computations asynchronously, due to their heavy cost on tail latencies. That work was reverted due to test failures and is still being worked on. An in-progress change tries to avoid LRU thrashing caused by large, infrequently used items being cached. The cache is also tightly intertwined with business logic, making it hard to tease apart the core algorithms and data structures from the usage scenarios.

It seems that more and more items skip being cached because of concurrency and hit rate concerns, leading to special-case fixes based on knowledge of the external code flows. Since the developers are experts on search, not caching, it seems justified to evaluate whether an off-the-shelf library would be more helpful in terms of developer time, code complexity, and performance.

The cache is guarded by a single lock for all reads and writes.

All computations are performed outside of any locking to avoid penalizing other callers. This doesn't handle cache stampedes, meaning that multiple threads may miss on the same key, each compute the value, and each try to store it. That redundant work becomes expensive under load and can be mitigated with (approximately) per-key locks.

The cache queries the entry to see if it's even worth caching. At first glance one assumes this is so that inexpensive entries don't bang on the lock or thrash the LRU. However, it is also used to indicate data dependencies for uncacheable items (per JIRA), which perhaps shouldn't be invoking the cache at all.

The cache lookup is skipped if the global lock is held; the value is then computed but not stored. This means a busy lock reduces performance across all usages, and the cache's effectiveness degrades. These skipped lookups are not counted in the miss rate, giving a false impression.
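To make the stampede point concrete, here is a minimal sketch of per-key computation using `ConcurrentHashMap.computeIfAbsent` from the JDK. It is not Lucene's LRUQueryCache and not Caffeine: the class `PerKeyCache`, its methods, and the `stampede` helper are hypothetical names made up for illustration, and the map is unbounded (no eviction), which a real query cache would not be.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical illustration only; unbounded, so not a usable query cache. */
final class PerKeyCache<K, V> {
    private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
    private final AtomicInteger computations = new AtomicInteger();

    /** Returns the cached value, computing it at most once per absent key. */
    V get(K key, Function<K, V> loader) {
        // computeIfAbsent runs the mapping function atomically for this key,
        // so concurrent callers on the same key wait instead of recomputing.
        return entries.computeIfAbsent(key, k -> {
            computations.incrementAndGet();
            return loader.apply(k);
        });
    }

    /** Number of times a loader actually ran. */
    int computations() {
        return computations.get();
    }

    /** Releases `threads` callers on one key at once; returns how many computations ran. */
    static int stampede(int threads) {
        PerKeyCache<String, Integer> cache = new PerKeyCache<>();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // line all callers up behind the same gate
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                cache.get("query", String::length);
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return cache.computations();
    }
}
```

Calling `PerKeyCache.stampede(8)` starts eight threads that all miss on the same key, yet the loader runs once. The trade-off is that callers on the same key block while the value is computed, and the `ConcurrentHashMap` documentation advises keeping the mapping function short, so this shows the idea rather than a drop-in fix for expensive query computations.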

LRUQueryCache appears to play a central role in Lucene's performance. Its overhead, however, can make it as much a liability as a benefit, causing various workarounds and complexity. A review of the discussions and code raised the concerns described above.