changelog: add perf bug fix for \b

Like the previous CHANGELOG entry, this records a bug that was most
likely fixed by the introduction of regex 1.9:

    $ hyperfine "rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt" "rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt"
    Benchmark 1: rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt
      Time (mean ± σ):      1.034 s ±  0.011 s    [User: 1.030 s, System: 0.004 s]
      Range (min … max):    1.021 s …  1.053 s    10 runs

    Benchmark 2: rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt
      Time (mean ± σ):       6.3 ms ±   0.3 ms    [User: 4.6 ms, System: 1.6 ms]
      Range (min … max):     5.6 ms …   7.3 ms    343 runs

    Summary
      'rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt' ran
      164.95 ± 7.70 times faster than 'rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt'

This was fixed not by making \b itself faster, but by improving inner
literal extraction. In particular, if no literals can be extracted from
the regex, then search time can still be quite slow:

    $ time rg-13.0.0 -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    57538

    real    0.427
    user    0.423
    sys     0.003
    maxmem  46 MB
    faults  0
    $ time rg -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    57538

    real    0.337
    user    0.333
    sys     0.003
    maxmem  46 MB
    faults  0

But then again, grep is slow here too, because it doesn't benefit from
any literal optimizations either:

    $ time grep -E -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    62396

    real    1.316
    user    1.292
    sys     0.007
    maxmem  13 MB
    faults  7

The count mismatch (57538 for both ripgrep versions versus 62396 for
grep) should probably be investigated.
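
To illustrate why literal extraction matters for the timings above, here
is a minimal Python sketch of the general idea of an inner literal
prefilter. It is not ripgrep's or the regex crate's implementation. The
function name `count_matching_lines` and the `required_literal` parameter
are hypothetical, and the literal is supplied by hand rather than
extracted from the pattern automatically. It assumes an ASCII literal
that every match must contain.

```python
import re


def count_matching_lines(lines, pattern, required_literal=None):
    """Count lines matching `pattern` case-insensitively, like `-ic`.

    `required_literal` is a hypothetical, hand-supplied ASCII string that
    every match of `pattern` must contain. When given, lines lacking it
    are rejected with a plain substring test before the regex runs.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    needle = required_literal.lower() if required_literal else None
    count = 0
    for line in lines:
        # Cheap prefilter: skip the regex entirely if the literal is absent.
        if needle is not None and needle not in line.lower():
            continue
        # Confirmation step: the regex still decides whether the line matches.
        if regex.search(line):
            count += 1
    return count


if __name__ == "__main__":
    sample = ["Foo Bar baz", "foobar", "xfoo bar", "a foo bar.", "nothing"]
    pattern = r"\bfoo\b \bbar\b"
    # Both calls must agree; the prefilter only changes how much work is done.
    print(count_matching_lines(sample, pattern))
    print(count_matching_lines(sample, pattern, required_literal="foo bar"))
```

A pattern such as '\b[a-z]{3}\b\s\b[a-z]{3}\b' has no such required
literal, so in this sketch every line would go through the regex, which
is consistent with the slower timings shown above.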

Fixes #1760
Andrew Gallant
2023-10-09 19:51:44 -04:00
parent a2799ccb41
commit 5011f6e9f1

@@ -10,6 +10,8 @@ Unreleased changes. Release notes have not yet been written.
Performance improvements:
* [PERF #1760](https://github.com/BurntSushi/ripgrep/issues/1760):
Make most searches with `\b` look-arounds (among others) much faster.
* [PERF #2591](https://github.com/BurntSushi/ripgrep/pull/2591):
Parallel directory traversal now uses work stealing for faster searches.