Bringing speedups to top-k queries with many and/or high-frequency terms

Sep 11, 2023 by iHash

In Apache Lucene, queries are responsible for creating sorted streams of matching doc IDs. Implementing a disjunctive query boils down to taking N input queries that produce sorted streams of doc IDs and combining them into a merged sorted stream of doc IDs. The textbook approach to this problem consists of putting input streams into a min-heap data structure ordered by their current doc ID. This approach has been referred to as BooleanScorer2 (BS2) in Lucene.
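The heap-based merge described above can be sketched roughly as follows. This is a minimal illustration, not Lucene's actual code: the `HeapDisjunction` class, the `Cursor` helper and the `merge` method are hypothetical names, and each clause is assumed to be a plain sorted `int[]` of doc IDs rather than a real Lucene iterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class HeapDisjunction {

    /** Hypothetical cursor over one clause's sorted doc IDs (stand-in for a real iterator). */
    static final class Cursor {
        final int[] docs;
        int pos = 0;

        Cursor(int[] docs) { this.docs = docs; }

        int doc() { return docs[pos]; }

        /** Moves to the next doc ID; returns false once the clause is exhausted. */
        boolean advance() { return ++pos < docs.length; }
    }

    /** Merges N sorted doc ID streams into one sorted, de-duplicated stream. */
    static List<Integer> merge(int[][] clauses) {
        // Min-heap ordered by each cursor's current doc ID.
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (int[] clause : clauses) {
            if (clause.length > 0) {
                heap.add(new Cursor(clause));
            }
        }

        List<Integer> matches = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            matches.add(doc);
            // Advance every clause positioned on this doc. Each poll/add
            // rebalances the heap, which is the overhead discussed in the text.
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Cursor top = heap.poll();
                if (top.advance()) {
                    heap.add(top);
                }
            }
        }
        return matches;
    }
}
```

Calling `HeapDisjunction.merge(new int[][] {{1, 5, 9}, {2, 5}, {9, 12}})` yields the doc IDs 1, 2, 5, 9, 12. Every step to the next match costs at least one heap removal and one insertion per clause sitting on the current doc.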

While BS2 works nicely, it incurs some overhead from having to rebalance the heap every time it moves to the next match. A second approach, referred to as BS1 (BooleanScorer) in Lucene, tries to reduce this overhead by splitting the doc ID space into windows of 2,048 documents. In every window, BS1 iterates through all matching doc IDs, one clause at a time. For every doc ID, it computes the index of this doc ID within the window, sets the corresponding bit in a bitset, and adds the current score to the corresponding index in a double[2048]. Iterating matches within the window then consists of iterating over the set bits of the bitset and looking up the score at the corresponding index in the double[2048]. This approach often runs faster with queries that have many clauses or high-frequency clauses.
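The windowed approach can be sketched along these lines. Again, this is a simplified illustration under stated assumptions, not Lucene's implementation: `WindowedDisjunction` and `scoreAll` are hypothetical names, each clause is a sorted `int[]` of doc IDs below `maxDoc`, and each clause contributes one constant score per match (real scores vary per document).

```java
import java.util.ArrayList;
import java.util.List;

public class WindowedDisjunction {
    static final int WINDOW_SIZE = 2048;

    /**
     * Returns {docID, summedScore} pairs in doc ID order.
     * Assumption: clauseScores[c] is a constant score for every match of clause c.
     */
    static List<double[]> scoreAll(int[][] clauseDocs, double[] clauseScores, int maxDoc) {
        int[] pos = new int[clauseDocs.length];      // read position per clause
        long[] bits = new long[WINDOW_SIZE / 64];    // one bit per doc in the window
        double[] scores = new double[WINDOW_SIZE];   // score accumulator per doc in the window
        List<double[]> hits = new ArrayList<>();

        for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
            int end = Math.min(base + WINDOW_SIZE, maxDoc);

            // Phase 1: one clause at a time, record matches that fall in this window.
            for (int c = 0; c < clauseDocs.length; c++) {
                int[] docs = clauseDocs[c];
                while (pos[c] < docs.length && docs[pos[c]] < end) {
                    int i = docs[pos[c]++] - base;   // index of the doc within the window
                    bits[i >>> 6] |= 1L << (i & 63);
                    scores[i] += clauseScores[c];
                }
            }

            // Phase 2: replay set bits in increasing order and read the summed scores.
            for (int w = 0; w < bits.length; w++) {
                long word = bits[w];
                while (word != 0) {
                    int i = (w << 6) + Long.numberOfTrailingZeros(word);
                    hits.add(new double[] {base + i, scores[i]});
                    scores[i] = 0;                   // reset for the next window
                    word &= word - 1;                // clear the lowest set bit
                }
                bits[w] = 0;
            }
        }
        return hits;
    }
}
```

No heap is involved here: the cost per match is a bit set and an addition, plus a linear scan of the bitset per window. That trade-off favors queries whose clauses match many documents per window.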

These two approaches were described in a 1997 paper called “Space Optimizations for Total Ranking” by Doug Cutting, the creator of Lucene. BS2 is called “Parallel Merge” in this paper and is described in section 4.1, while BS1 is called “Block Merge” and is described in section 4.2. These are arguably more descriptive names than BS1 and BS2. Note that the description of “Block Merge” in the paper is quite different from how it is implemented in Lucene today, but the underlying idea is the same.
