Compare commits

157 Commits

Author SHA1 Message Date
Andrew Gallant
f4d07b9cbd grep-cli-0.1.8 2023-07-05 17:09:09 -04:00
Andrew Gallant
0b6eccf4d3 ci: try to fix CI 2023-07-05 14:04:29 -04:00
Andrew Gallant
3ac4541e9f regex: remove old inner literal extractor
(It had already been removed from the crate.)
2023-07-05 14:04:29 -04:00
Andrew Gallant
7b72e982f2 deps: update everything 2023-07-05 14:04:29 -04:00
Andrew Gallant
a68db3ac02 deps: drop temporary patch and move to bstr 1.6
Now that regex 1.9 is out, we can depend on it from crates.io.
2023-07-05 14:04:29 -04:00
Andrew Gallant
b12905daca deps: update everything 2023-07-05 14:04:29 -04:00
Andrew Gallant
ca740d9ace regex: add new inner literal extractor
This is mostly a copy of the prefix literal extractor in regex-syntax,
but with a tweaked notion of Seq that keeps track of whether it's a
prefix of an expression or not. If it isn't, then we can't cross it as a
suffix to another Seq.

This new extractor should be a lot more robust than the old one. We
actually will keep going through the regex to try and find the "best"
literals to search for (according to some heuristic).
2023-07-05 14:04:29 -04:00
Andrew Gallant
e80c102dee regex: tweak formatting of regex-automata version spec
This makes it easier to enable the `logging` feature for regex-automata.

I wish I could just enable it unconditionally, but it winds up producing
a lot of output because ripgrep uses regexes for things other than the
primary search (like every glob). Sigh.
2023-07-05 14:04:29 -04:00
Andrew Gallant
8ac66a9e04 regex: refactor matcher construction
This does a little bit of refactoring so that we can pass both a
ConfiguredHIR and a Regex to the inner literal extraction routine.

One downside of this approach is that a regex object hangs on to a
ConfiguredHIR. But the extra memory usage is probably negligible. A
benefit though is that converting the HIR to its concrete syntax is now
lazy and only happens when logging is enabled.
2023-07-05 14:04:29 -04:00
Andrew Gallant
04dde9a4eb regex: tweak DFA settings
This increases the limits a bit for when the regex engine will build and
use a fully compiled DFA. They can be faster in some circumstances. For
example, '(?-u)^\w{30,}$' gets a nice speed boost from state
acceleration.

We are also able to remove `regex` proper as a dependency. Wow.
2023-07-05 14:04:29 -04:00
Andrew Gallant
81341702af regex: push more pattern handling to matcher construction
Previously, ripgrep core was responsible for escaping regex patterns and
implementing the --line-regexp flag. This commit moves that
responsibility down into the matchers such that ripgrep just needs to
hand the patterns it gets off to the matcher builder. The builder will
then take care of escaping and all that.

This was done to make pattern construction completely owned by the
matcher builders. With the arrival of regex-automata, this means we can
move to the HIR very quickly and then never move back to the concrete
syntax. We can then build our regex directly from the HIR. This overall
can save quite a bit of time, especially when searching for large
dictionaries.

We still aren't quite as fast as GNU grep when searching something on
the scale of /usr/share/dict/words, but we are basically within spitting
distance. Prior to this, we were about an order of magnitude slower.

This architecture in particular lets us write a pretty simple fast path
that avoids AST parsing and HIR translation entirely: the case where one
is just searching for a literal. In that case, we can hand construct the
HIR directly.
2023-07-05 14:04:29 -04:00
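As a rough illustration of the string-level work that commit 81341702af moves out of ripgrep core, the sketch below escapes a literal pattern and wraps it for --line-regexp. The names `escape` and `build_pattern` and the exact wrapping are assumptions made for this example; per the commit message, the real matcher builders work on the HIR rather than on pattern strings.

```rust
/// Escape regex meta characters so that `pattern` matches literally.
/// The set of characters escaped here is an assumption for this sketch.
fn escape(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    for ch in pattern.chars() {
        if r"\.+*?()|[]{}^$#&-~".contains(ch) {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Hypothetical helper: turn a user-supplied pattern into the final
/// pattern string, honoring "fixed strings" and "line regexp" options.
fn build_pattern(pattern: &str, fixed_strings: bool, line_regexp: bool) -> String {
    let pat = if fixed_strings {
        escape(pattern)
    } else {
        pattern.to_string()
    };
    if line_regexp {
        // Anchor the whole pattern so it must match an entire line.
        format!("^(?:{})$", pat)
    } else {
        pat
    }
}
```

Keeping this logic in one place is what lets a builder skip the string round trip entirely when the input is a plain literal.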
Andrew Gallant
d34c5c88a7 globset: fix build error in tests
I guess we haven't been testing with the Serde feature enabled? Weird.
2023-07-05 14:04:29 -04:00
Andrew Gallant
4b8aa91ae5 deps: update to pcre2 0.2.4
0.2.4 updates to PCRE2 10.42 and has a few other nice changes. For
example, when `utf` is enabled, the crate will always set the
PCRE2_MATCH_INVALID_UTF option. That means we no longer need to do
transcoding or UTF-8 validity checks.

Because of this, we actually get to remove one of the two uses of
`unsafe` in ripgrep's `main` program.

(This also updates a couple other dependencies for convenience.)
2023-07-05 14:04:29 -04:00
Andrew Gallant
a775b493fd regex: small cleanups
Just some small polishing. We also get rid of thread_local in favor of
using regex-automata, mostly just in the name of reducing dependencies.
(We should eventually be able to drop thread_local completely.)
2023-07-05 14:04:29 -04:00
Andrew Gallant
a6dbff502f regex: s/locations/captures
Now that we use regex-automata, we no longer use any type with
"locations" in it. Instead, that's mostly legacy from the top-level
regex crate.
2023-07-05 14:04:29 -04:00
Andrew Gallant
51480d57a6 regex: simplify AST analysis a bit
The verbatim literal stuff hasn't been used for a while and I don't
foresee it being used. If it's really needed, it would probably be better
to just implement it by looking at the pattern string itself, which
avoids parsing it into an AST altogether.
2023-07-05 14:04:29 -04:00
Andrew Gallant
d9bd261be8 regex: some small cleanup in 'strip.rs'
We also utilize bstr's methods to get rid of some helpers we had written
by hand.
2023-07-05 14:04:29 -04:00
Andrew Gallant
9d62eb997a BREAKING: regex: finally remove CRLF hack
Now that Rust's regex crate finally supports a CRLF mode, we can remove
this giant hack in ripgrep to enable it. (And assuredly did not work in
all cases.)

The way this works in the regex engine is actually subtly different than
what ripgrep previously did. Namely, --crlf would previously treat
either \r\n or \n as a line terminator. But now it treats \r\n, \n and
\r as line terminators. In effect, it is implemented by treating \r and
\n as line terminators, but ^ and $ will never match at a position
between a \r and a \n.

So basically this means that $ will end up matching in more cases than
it might be intended to, but I don't expect this to be a big problem in
practice.

Note that passing --crlf to ripgrep and enabling CRLF mode in the regex
via the `R` inline flag (e.g., `(?R:$)`) are subtly different. The `R`
flag just controls the regex engine, but --crlf instructs all of ripgrep
to use \r\n as a line terminator. There are likely some inconsistencies
or corner cases that are wrong as a result of this cognitive dissonance,
but we choose to leave well enough alone for now.

Fixing this for real will probably require re-thinking how line
terminators are handled in ripgrep. For example, one "problem" with how
they're handled now is that ripgrep will re-insert its own line
terminators when printing output instead of copying the input. This is
maybe not so great and perhaps unexpected. (ripgrep probably can't get
away with not inserting any line terminators. Users probably expect
files that don't end with a line terminator whose last line matches to
have a line terminator inserted.)
2023-07-05 14:04:29 -04:00
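The line-terminator rules described in commit 9d62eb997a can be modeled with two small predicates. This is a hand-written sketch that assumes multi-line mode with CRLF enabled; it is not the regex crate's code, and the function names are chosen for this example only.

```rust
/// Model of `^` in CRLF mode: true if a line may start at offset `at`.
fn is_start_crlf(haystack: &[u8], at: usize) -> bool {
    at == 0
        || haystack[at - 1] == b'\n'
        // After a lone \r, but never between a \r and a \n.
        || (haystack[at - 1] == b'\r'
            && (at >= haystack.len() || haystack[at] != b'\n'))
}

/// Model of `$` in CRLF mode: true if a line may end at offset `at`.
fn is_end_crlf(haystack: &[u8], at: usize) -> bool {
    at == haystack.len()
        || haystack[at] == b'\r'
        // Before a \n, but never between a \r and a \n.
        || (haystack[at] == b'\n' && (at == 0 || haystack[at - 1] != b'\r'))
}

/// Collect every offset in `0..=len` at which `pred` holds.
fn positions(haystack: &[u8], pred: fn(&[u8], usize) -> bool) -> Vec<usize> {
    (0..=haystack.len())
        .filter(|&at| pred(haystack, at))
        .collect()
}
```

For the haystack `a\r\nb\rc\nd`, offset 2 sits between the \r and the \n, so neither predicate holds there, while the lone \r at offset 4 does end a line.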
Andrew Gallant
e028ea3792 regex: migrate grep-regex to regex-automata
We just do a "basic" dumb migration. We don't try to improve anything
here.
2023-07-05 14:04:29 -04:00
Andrew Gallant
1035f6b1ff deps: initial migration steps to regex 1.9
This leaves the grep-regex crate in tatters. Pretty much the entire
thing needs to be re-worked. The upshot is that it should result in some
big simplifications. I hope.

The idea here is to drop down and actually use regex-automata 0.3
instead of the regex crate itself.
2023-07-05 14:04:29 -04:00
Andrew Gallant
a7f1276021 readme: update Debian instructions
We probably don't need to mention Buster specifically nor Debian
unstable since ripgrep has been in Debian for a while now.

But we can't just get rid of the `deb` file either, because Debian might
package a very old version.

Fixes #2531
2023-06-12 07:50:13 -04:00
Martin Nordholts
4fcb1b2202 cli: replace atty with std::io::IsTerminal
The `atty` crate is unmaintained[1] and `std::io::IsTerminal` was
stabilized in Rust 1.70.

[1]: https://rustsec.org/advisories/RUSTSEC-2021-0145.html

PR #2526
2023-06-05 14:00:46 -04:00
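A minimal sketch of the standard library replacement named in commit 4fcb1b2202. `IsTerminal` is implemented for the standard streams, so no external crate is needed. The helper `use_color` is hypothetical and only shows how the result might be consumed.

```rust
use std::io::{self, IsTerminal};

/// Returns true when stdout is attached to a terminal.
fn stdout_is_tty() -> bool {
    io::stdout().is_terminal()
}

/// Hypothetical helper: decide whether to emit colors.
/// `force` models an explicit user choice; `None` means "auto".
fn use_color(force: Option<bool>, is_tty: bool) -> bool {
    force.unwrap_or(is_tty)
}
```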
Francois Marier
949092fd22 ignore/types: add 'mdwn' to Markdown
PR #2520
2023-05-26 14:44:41 -04:00
Andrew Gallant
4a7e7094ad deps: update everything else 2023-05-25 13:06:13 -04:00
Andrew Gallant
fc0d9b90a9 deps: bump regex to 1.8.3
This brings in an update from the regex crate that fixes a matching bug
for particular kinds of alternations of literals.

Fixes #2518
2023-05-25 13:06:13 -04:00
Ville Skyttä
335aa4937a ignore/types: add *.pyi for Python
https://peps.python.org/pep-0484/#stub-files

PR #2517
2023-05-23 07:10:02 -04:00
Adam Reichold
803c447845 searcher: re-enable mmap on 32-bit architectures
memmap2 v0.3.0 introduced a regression when trying to map files larger than 4GB
on 32-bit architectures[1] which was subsequently fixed in v0.3.1[2].

This commit bumps locked version of the memmap2 dependency to the current v0.5.0
and reverts fdfc418be5 to re-enable mmap on 32-bit
architectures as a different approach to fixing [3].

This was tested to report matches from the end of a 5GB file using MinGW and Wine.

Ref #1911, PR #2000 

[1] 5e271224c8
[2] 9aa838aed9
[3] https://github.com/BurntSushi/ripgrep/issues/1911
2023-05-19 08:23:53 -04:00
Andrew Gallant
c5415adbe8 deps: update everything
This does unfortunately bring in both regex-syntax 0.6 and 0.7, but
we'll fix that once regex 1.9 is out.
2023-05-16 13:14:23 -04:00
Andrew Gallant
251376597f deps: update minimum version of grep crate
Ref #2516
2023-05-16 13:13:34 -04:00
Andrew Gallant
e593f5b7ee grep-0.2.12 2023-05-16 13:12:45 -04:00
Andrew Gallant
6b19be2477 crates/grep: remove 'deny(missing_docs)'
This crate is only a shim over a bunch of other crates. I'm not sure
that there's anything to add to each of the `pub extern` items. So
instead of just writing fluff, I removed the lint.

Fixes #2516
2023-05-16 13:10:42 -04:00
Ryan Whitehouse
041544853c doc: fix --quiet docs
The wording was previously inverted, which had the opposite
meaning as was intended.

Fixes #1962
2023-03-28 07:22:59 -04:00
Manu
a7ae9e4043 ignore/types: add support for docker-compose files
Default file is docker-compose.yml and the documentation
mentions overrides in the form of docker-compose.*.yml.

PR #2469
2023-03-21 12:56:38 -04:00
Andrew Gallant
595e7845b8 readme: add a link to delta's support for ripgrep
Ref: https://github.com/BurntSushi/ripgrep/issues/86#issuecomment-1469717706
2023-03-15 08:02:04 -04:00
David Ringo
44fb9fce2c ignore/types: add *.sln for msbuild
.sln is the extension for Visual Studio Project Solution files, one of
the file types accepted as inputs by MSBuild.

PR #2415
2023-02-09 21:20:49 -05:00
Vincent Bockaert
339c46a6ed ignore/types: enhance terraform default filter
The default filter for terraform only checks for *.tf files, but there
are quite a few other terraform filetypes.

The explanation for all of them can be found below (including link to
documentation from Hashicorp at time of writing)

- *.tf.json & *.tfvars.json is to capture the files written in
  JSON-based variant of the Terraform language
    - https://developer.hashicorp.com/terraform/language/files
- *.tfvars is used to supply variables
    - https://developer.hashicorp.com/terraform/cloud-docs/workspaces/variables#6-auto-tfvars-variable-files
- .terraform.lock.hcl is used as a Dependency lock file
    - https://developer.hashicorp.com/terraform/language/files/dependency-lock
- terraform.rc & .terraformrc, *.tfrc
    - https://developer.hashicorp.com/terraform/cli/config/config-file

PR #2412
2023-02-09 12:57:01 -05:00
Andrew Gallant
fe97c0a152 ignore-0.4.20 2023-01-15 08:21:02 -05:00
Christian Vallentin
826f3fad5b ignore/api: add Clone and Debug impls for OverrideBuilder
PR #2397
2023-01-15 08:16:27 -05:00
Andrew Gallant
bc55049327 readme: update MSRV in README
... this was apparently long outdated, wow.
2023-01-05 12:09:46 -05:00
Andrew Gallant
d58e9353fc deps: update to grep 0.2.11 2023-01-05 09:13:47 -05:00
Andrew Gallant
ca60fef4db grep-0.2.11 2023-01-05 09:12:49 -05:00
Andrew Gallant
a25307d6c8 deps: update to grep-printer 0.1.7 2023-01-05 09:12:37 -05:00
Andrew Gallant
b80947a8b3 grep-printer-0.1.7 2023-01-05 09:11:16 -05:00
Andrew Gallant
ad793a0d8f deps: update to grep-searcher 0.1.11 2023-01-05 09:07:49 -05:00
Andrew Gallant
120e55e7c7 grep-searcher-0.1.11 2023-01-05 09:07:09 -05:00
Andrew Gallant
3941a7701d deps: update to grep-pcre2 0.1.6 2023-01-05 09:06:52 -05:00
Andrew Gallant
96e130fbf9 grep-pcre2-0.1.6 2023-01-05 09:05:59 -05:00
Andrew Gallant
180c4eaf8b deps: update to grep-regex 0.1.11 2023-01-05 09:05:39 -05:00
Andrew Gallant
81529288cf grep-regex-0.1.11 2023-01-05 09:02:55 -05:00
Andrew Gallant
bcc7473a87 deps: update to grep-matcher 0.1.6 2023-01-05 09:02:40 -05:00
Andrew Gallant
bc78c644db grep-matcher-0.1.6 2023-01-05 09:00:33 -05:00
Andrew Gallant
dc7267a0fb deps: update to grep-cli 0.1.7 2023-01-05 08:58:47 -05:00
Andrew Gallant
3224324e25 grep-cli-0.1.7 2023-01-05 08:57:31 -05:00
Andrew Gallant
0f61f08eb1 deps: update to ignore 0.4.19 2023-01-05 08:57:05 -05:00
Andrew Gallant
a0e8dbe9df ignore-0.4.19 2023-01-05 08:55:46 -05:00
Andrew Gallant
e95254a86f deps: remove ignore's dependency on crossbeam-utils
Scoped threads are now part of std.
2023-01-05 08:51:08 -05:00
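For context on commit e95254a86f, here is a small sketch of `std::thread::scope`, the standard library facility that makes a separate scoped-thread dependency unnecessary. The function `parallel_sum` is an illustration only and is unrelated to how the ignore crate uses scoped threads.

```rust
use std::thread;

/// Sum a slice by splitting it across scoped threads.
/// The threads borrow `data` directly, so no `Arc` or `'static` bound is needed.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Round up so that at most `workers` chunks are produced.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}
```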
Andrew Gallant
2f484d8ce5 deps: update to globset 0.4.10 2023-01-05 08:49:58 -05:00
Andrew Gallant
364772ddd2 globset-0.4.10 2023-01-05 08:45:47 -05:00
Andrew Gallant
2e207833bc deps: upgrade to jemallocator 0.5 2023-01-05 08:33:43 -05:00
Andrew Gallant
92b35a65f8 deps: upgrade to base64 0.20 2023-01-05 08:21:49 -05:00
Andrew Gallant
ac8fecbbf2 deps: upgrade bstr to 1.1 2023-01-05 08:21:15 -05:00
Andrew Gallant
8596817374 deps: do semver compatible upgrades 2023-01-05 08:16:32 -05:00
Andrew Gallant
28bff84a0a deps: remove 'num_cpus'
Now that std::thread::available_parallelism is a thing, we no longer
need num_cpus.
2023-01-05 08:15:09 -05:00
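A minimal sketch of the standard library call that commit 28bff84a0a relies on. The helper `default_threads` and its cap parameter are assumptions for illustration, not ripgrep's actual thread-count logic.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper: pick a worker count, capped at `max`.
/// Falls back to 1 if the available parallelism cannot be determined.
fn default_threads(max: usize) -> usize {
    let available = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    available.min(max).max(1)
}
```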
Alex Touchet
61101289fa cargo: set rust-version
This should hopefully make compilation errors from using
an older-than-supported compiler more helpful.

PR #2373
2022-12-21 07:37:09 -05:00
Andrew Gallant
13faa39b66 deps: update all dependencies within semver
Note that this adds a new dependency, 'unicode-ident', and removes
'unicode-xid'. I looked briefly at 'unicode-ident' and all looks okay.
It is also permissively licensed.
2022-12-20 09:23:29 -05:00
Andrew Gallant
6b61271bbb benchsuite/runs: add another run of the benchmarks
Looks like ripgrep is still the king. ;-)
2022-12-16 11:24:10 -05:00
Andrew Gallant
1be86392e0 benchsuite: pass '-a' to ugrep in some cases
It looks like it incorrectly treats a file that is purely valid UTF-8 as
a binary file, which in turn effectively renders all of the Russian
subtitle benchmarks moot for ugrep. So we pass '-a' to force ugrep to
treat the file as text.

This technically gives ugrep an edge because it now no longer needs to
look to see if the haystack is binary or not. In practice this is
usually implemented using highly optimized SIMD routines (e.g.,
'memchr'), so it tends not to matter much. We might also consider
passing '-a' to all grep commands. But... I think using '-a' is the less
common case and we should try to benchmark the common case.
2022-12-16 11:21:58 -05:00
Andrew Gallant
63058453fa benchsuite: update URLs
This removes the old commented out URLs for the 2016 subtitles that
don't work any more. I should probably upload the files to a more stable
URL.

This also switches to a 'https://' GitHub URL as I believe the 'git://'
URLs are no longer supported.
2022-12-16 11:20:45 -05:00
Armin Brauns
7f23cd63a5 ignore/types: add automated test for sortedness
People occasionally get this wrong and I've been manually
checking it. Instead, let's have CI do it automatically.

PR #2351
2022-11-14 08:31:07 -05:00
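The kind of check that commit 7f23cd63a5 automates can be sketched as follows. The function name and the flat list of type names are assumptions; the real test in the ignore crate may be structured differently.

```rust
/// Returns the first adjacent pair of names that is out of order, if any.
fn first_unsorted<'a>(names: &[&'a str]) -> Option<(&'a str, &'a str)> {
    names
        .windows(2)
        .find(|pair| pair[0] > pair[1])
        .map(|pair| (pair[0], pair[1]))
}
```

Returning the offending pair, rather than a plain boolean, lets a failing test say exactly which entry is misplaced.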
Andrew Gallant
8905d54a9f msrv: bump to Rust 1.65.0
This matches the latest stable release of Rust and lets us use nice
things like 'let else'.
2022-11-14 07:56:17 -05:00
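For readers unfamiliar with the 'let else' syntax mentioned in commit 8905d54a9f, a short sketch follows. The function `parse_pair` is made up for this example.

```rust
/// Parse a `key=value` pair, returning `None` when no `=` is present.
fn parse_pair(s: &str) -> Option<(&str, &str)> {
    // The `else` branch must diverge (return, break, continue or panic).
    let Some((key, value)) = s.split_once('=') else {
        return None;
    };
    Some((key.trim(), value.trim()))
}
```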
Armin Brauns
25a4eaf5ae ignore/types: add devicetree filetype
See: https://www.devicetree.org/

PR #2349
2022-11-14 07:42:57 -05:00
jgart
0000157917 readme: add guix installation instructions
PR #2344
2022-11-02 08:10:54 -04:00
jgart
65b1b0e38a ignore/types: add carp
See: https://github.com/carp-lang/Carp

PR #2343
2022-11-01 07:17:00 -04:00
Glenn Slotte
c032cda4b7 ignore/types: add ReScript and ReasonML
PR #2340
2022-10-29 13:49:19 -04:00
Marcin Nowak-Liebiediew
eab044d829 ignore/types: add motoko and candid
See: https://github.com/dfinity/candid
See: https://github.com/dfinity/motoko

PR #2335
2022-10-20 09:22:41 -04:00
Andrew Gallant
55e62a4411 readme: add more links to overview
Many of the features are documented in the GUIDE, so let's just link to
them.
2022-10-19 11:06:44 -04:00
Andrew Gallant
5b2f614aad readme: add note about 'rg -uuu'
I'm not sure about putting this in such a prominent spot, and it does
bloat the introductory paragraph a bit, but it seems like an important
special case.
2022-10-19 09:52:37 -04:00
dependabot[bot]
4386b8e805 ci: bump actions/checkout from 2 to 3 (#2318)
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-29 08:18:47 -04:00
dependabot[bot]
6b012d8129 ci: bump actions/upload-release-asset from 1.0.1 to 1.0.2 (#2317)
Bumps [actions/upload-release-asset](https://github.com/actions/upload-release-asset) from 1.0.1 to 1.0.2.
- [Release notes](https://github.com/actions/upload-release-asset/releases)
- [Commits](https://github.com/actions/upload-release-asset/compare/v1.0.1...v1.0.2)

---
updated-dependencies:
- dependency-name: actions/upload-release-asset
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-29 08:15:36 -04:00
LingMan
a928ca4221 ci: enable Dependabot for the Actions workflows
Dependabot automatically files PRs for updatable dependencies. As
configured it watches all workflow files in `.github/workflows` for
possible updates to any of the Actions depended upon.

We specifically do not enable Dependabot for other things, in order to
avoid running in a hamster wheel.

Closes #2315
2022-09-29 07:44:30 -04:00
LingMan
d1570defbf ci: remove fetch-depth parameter from the checkout action
It is already set to 1 by default.

Closes #2316
2022-09-29 07:44:19 -04:00
LingMan
b732c23e36 ci: use cargo fmt's --check option directly 2022-09-29 07:44:13 -04:00
LingMan
49965703fa ci: switch to using '@master' dtolnay action
The `v1` tag exists but isn't really supported.

This mirrors [1]. See also [2].

[1]: 50086e74da
[2]: https://github.com/BurntSushi/bstr/pull/122#issuecomment-1201930916
2022-09-29 07:43:29 -04:00
LingMan
609838aebd ci: use latest runner images in CI
The `ubuntu-18.04` image is deprecated and will be removed by
2023-04-01[1][2] with scheduled brownouts starting on 2022-10-03.
Update all images to the latest available versions.

[1]: https://github.blog/changelog/2022-08-09-github-actions-the-ubuntu-18-04-actions-runner-image-is-being-deprecated-and-will-be-removed-by-12-1-22/
[2]: https://github.com/actions/runner-images/issues/6002
2022-09-29 07:43:10 -04:00
Dave Rolsky
515f120b5c doc: fix typo
PR #2313
2022-09-24 13:23:59 -04:00
Linda_pp
a66315d232 ignore/types: add *.cjs, *.mjs, *.cts, *.mts
These are used by both Node.js and TypeScript to indicate that a file
is CommonJS or ES.

Node.js: https://nodejs.org/api/esm.html

TypeScript: https://www.typescriptlang.org/docs/handbook/esm-node.html#new-file-extensions

PR #2297
2022-08-31 08:11:13 -04:00
Nacho Barrientos
bdf10ab7c0 ignore/types: add embedded puppet templates
.epp files are getting more and more common in Puppet code bases so it
makes sense I think to include them as part of the "puppet" type.

https://puppet.com/docs/puppet/7/lang_template_epp.html

PR #2141
2022-08-21 12:32:03 -04:00
John Saigle
a02678800b ignore/types: add Solidity
See: https://soliditylang.org/about/

PR #2284
2022-08-17 09:37:32 -04:00
Andrew Gallant
387df97d85 ripgrep: add /.github/ to whitelist
It's pretty common to want to search this, since it defines the CI
configuration of the project.
2022-08-17 08:31:22 -04:00
David Marzal
a9d97a1dda doc: add '-.' as short flag for '--hidden'
PR #2279
2022-08-10 08:03:04 -04:00
drebelsky
3bb71b0cb8 doc: fix a few typos
PR #2274
2022-08-06 14:29:27 -04:00
Malte
87b33c96c0 ignore/types: improve 'markdown' and 'php' types
This adds some lesser known extensions.

Notably, it adds php7 and php8, but not php6. Apparently,
php6 was never a thing: https://wiki.php.net/rfc/php6

PR #2263
2022-07-18 10:35:09 -04:00
Andrew Gallant
5e975c43f8 doc: appease rustdoc 2022-07-15 10:13:55 -04:00
Andrew Gallant
7efa2e46d3 grep-0.2.10 2022-07-15 10:06:53 -04:00
Andrew Gallant
db0b92b62d grep: bump grep-searcher to 0.1.10
This was a result of leaving a stray 'dbg!'.
2022-07-15 10:06:31 -04:00
Andrew Gallant
33b81cac48 grep-searcher-0.1.10 2022-07-15 10:05:46 -04:00
Andrew Gallant
6a13a4f64d searcher: remove stray 'dbg!' 2022-07-15 10:05:20 -04:00
Andrew Gallant
b13d835d95 grep-0.2.9 2022-07-15 10:03:06 -04:00
Andrew Gallant
d53506b7f7 grep: bump 'grep-regex' and 'grep-searcher'
To 0.1.10 and 0.1.9, respectively.
2022-07-15 10:02:41 -04:00
Andrew Gallant
78a35d4d43 grep-searcher-0.1.9 2022-07-15 10:02:24 -04:00
Andrew Gallant
a933d0bc90 searcher: bump grep-regex dep to 0.1.10 2022-07-15 10:02:06 -04:00
Andrew Gallant
2cae30e399 grep-regex-0.1.10 2022-07-15 10:01:42 -04:00
Andrew Gallant
8e57989cd2 regex: fix matching bug when text anchors are used
It turns out that if there are text anchors (that is, \A or \z, or ^/$
when multi-line is disabled), then the "fast" line searching path isn't
quite correct. Since searching without multi-line mode is exceptionally
rare, we just look for the presence of text anchors and specifically
disable the line terminator option in 'grep-regex'. This in turn
inhibits the "fast" line searching path.

Fixes #2260
2022-07-15 09:53:39 -04:00
Andrew Gallant
b9f5835534 ci: switch to dtolnay/rust-toolchain
The actions-rs/toolchain project appears dead. dtolnay's also seems more
sustainable given its simplicity, but it does enough to suit our needs.
2022-07-14 13:48:14 -04:00
tleb
e70778e89d ignore/types: add dts to default types
See: https://devicetree-specification.readthedocs.io/en/v0.3/source-language.html

PR #2255
2022-07-07 12:24:12 -04:00
zhimoe
87c4a2b4b1 doc: fix typo
PR #2248
2022-06-26 18:49:54 -04:00
Kian-Meng Ang
0aa31676e3 doc: fix typos
PR #2245
2022-06-24 09:58:20 -04:00
Andrew Gallant
9f0e88bcb1 ignore: fix gitignore parsing bug for trailing \/
When a glob pattern ended with a \/, and since we permit backslash
escapes, the glob parser gave a "dangling escape" error. Which is weird,
because the \ is clearly not dangling.

The issue is that the layer above the glob parser, the gitignore parser,
was stripping the trailing / so that it wouldn't be part of the matching
logic. Of course, stripping the trailing / while it is escaped without
removing the backslash escape is wrong. So we do that here.

Fixes #2236
2022-06-14 10:40:37 -04:00
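The fix described in commit 9f0e88bcb1 can be sketched roughly as below. `strip_dir_marker` is a hypothetical helper, not the gitignore parser's real code, and it deliberately ignores the case of an escaped backslash followed by an unescaped slash.

```rust
/// Hypothetical helper: strip a trailing `/` (which marks a directory-only
/// pattern) and, if that slash was escaped, the backslash that escaped it.
/// Returns the remaining glob and whether the pattern is directory-only.
fn strip_dir_marker(line: &str) -> (&str, bool) {
    let Some(rest) = line.strip_suffix('/') else {
        return (line, false);
    };
    // Removing only the `/` from `foo\/` would leave `foo\`, which the glob
    // parser then rejects as a dangling escape.
    (rest.strip_suffix('\\').unwrap_or(rest), true)
}
```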
Alex Touchet
eb4b389846 globset/readme: update version number and some links
PR #2232
2022-06-11 14:17:32 -04:00
Andrew Gallant
dc337bab0a deps: update to globset 0.4.9 2022-06-10 14:11:20 -04:00
Andrew Gallant
2cfb338530 globset-0.4.9 2022-06-10 14:10:34 -04:00
Sergio Benitez
48646e3451 globset: make 'log' an optional feature
PR #1910
2022-06-10 14:10:09 -04:00
Andrew Gallant
985394a19e deps: update to packed_simd_2 0.3.8
It broke on latest nightly. I'm *very* close to just removing the
'simd-accel' feature altogether.

Fixes #2230
2022-06-10 09:39:17 -04:00
jgart
ec36f8c3ff ignore/types: add pants
See: https://www.pantsbuild.org/

PR #2228
2022-06-08 13:29:17 -04:00
jpe90
a726d03641 ignore/types: add hare to default types
PR #2219
2022-05-22 20:08:45 -04:00
Andrew Gallant
91afd4214a printer: fix duplicative replacement in multiline mode
This furthers our kludge of dealing with PCRE2's look-around in the
printer. Because of our bad abstraction boundaries, we added a kludge to
deal with PCRE2 look-around by extending the bytes we search by a fixed
amount to hopefully permit any look-around to operate. But because of
that kludge, we wind up over extending ourselves in some cases and
dragging along those extra bytes.

We had fixed this for simple searching by simply rejecting any matches
past the end point. But we didn't do the same for replacements. So this
commit extends our kludge to replacements.

Thanks to @sonohgong for diagnosing the problem and proposing a fix. I
mostly went with their solution, but adding the new replacement routine
as an internal helper rather than a new API in the 'grep-matcher'
crate.

Fixes #2095, Fixes #2208
2022-05-11 14:44:58 -04:00
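The idea of rejecting matches past the end point, as described in commit 91afd4214a, can be sketched as a simple filter. The helper name, the `(start, end)` tuple representation and the assumption that matches arrive in ascending order are all choices made for this example, not the printer's actual internals.

```rust
/// Hypothetical helper: keep only the matches that end at or before `end`.
/// `matches` holds `(start, end)` byte offsets in ascending order.
fn matches_within(matches: &[(usize, usize)], end: usize) -> Vec<(usize, usize)> {
    matches
        .iter()
        .copied()
        // Stop at the first match that reaches into the extra look-around bytes.
        .take_while(|&(_, match_end)| match_end <= end)
        .collect()
}
```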
Keith Smiley
4dc6c73c5a ignore/types: improve Bazel globs
MODULE.bazel is a new file, and WORKSPACE.bazel was always supported
similar to BUILD.bazel vs BUILD.

PR #2203
2022-05-09 11:50:34 -04:00
Alex Touchet
36d03b4101 cargo: use SPDX license format for all crates
This was done for the main crate in d11a3b3377.

See also #987.

PR #2204
2022-05-09 07:52:11 -04:00
Conrad Meyer
d161acb0a3 ignore/types: add '*.hh' to C++ headers
Like .hpp, .hh is an occasionally used extension for C++ headers
(to distinguish them from C headers). At least one popular project,
FreeBSD, uses this extension.

See also: https://docs.fileformat.com/programming/hh/

PR #2192
2022-04-25 07:38:03 -04:00
Matrix Dai
30ee6f08ee ignore/types: add '*.asp' for asp type
The `*.asp` was not included in the type "asp" when it was added.
https://github.com/BurntSushi/ripgrep/pull/1134

PR #2188
2022-04-19 10:36:14 -04:00
Andrew Gallant
ced5b92aa9 deps: bump memmap2 to 0.5
Looking at the memmap2 CHANGELOG, there don't appear to be any breaking
changes that impact us.
2022-03-21 08:59:05 -04:00
Andrew Gallant
191315a2ea deps: update everything
Surprisingly looks like no new dependencies were added! Yay! And we
removed an extra copy of 'cfg-if' due to what appears to be an update
in 'packed_simd_2'.

Otherwise, all updates appear to be minor things.
2022-03-21 08:59:05 -04:00
Andrew Gallant
5370064f00 warnings: remove/tweak some dead code
It looks like the dead code detector got better, so do a little code
cleanup.
2022-03-21 08:59:05 -04:00
arcsi42
b6189c659e ci: fix failing nightly-arm build on ci workflow
This commit updates the Ubuntu install script to include brotli and
zstd, which are needed for tests.

We also fix the Ubuntu install script to work in environments that
don't have 'sudo'. Instead of creating a totally separate script, we
preserve a single point of truth for these things and just make the
script a bit more flexible.

NOT seen in this commit is that we have built and updated the arm Docker
image. I'm hoping this fixes the GLIBC version issues we're seeing in
CI.

Fixes #2130, Closes #2132
2022-03-21 08:59:05 -04:00
Mateusz Konieczny
0b36942f68 doc: transcoding is done in addition to search
Even if transcoding were faster than search, it would still incur a
performance penalty. We make this clearer by tweaking the wording.

PR #2079
2021-11-22 09:48:42 -05:00
mi-wada
7e05cde008 cli: improve configuration failure mode
This improves the error message printed when ripgrep can't read the
file path pointed to by RIPGREP_CONFIG_PATH. Specifically, before this
change:

    $ RIPGREP_CONFIG_PATH=no_exist_path rg 'search regex'
    no_exist_path: No such file or directory (os error 2)

And now after this change:

    $ RIPGREP_CONFIG_PATH=no_exist_path rg 'search regex'
    failed to read the file specified in RIPGREP_CONFIG_PATH: no_exist_path: No such file or directory (os error 2)

In the above examples, the first failure mode looks obvious, but that's
only because RIPGREP_CONFIG_PATH is being set at the same time that we
run the command. Often, the environment variable is set elsewhere and
the error message could be confusing outside of that context.

Closes #1990
2021-11-15 10:29:34 -05:00
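The improvement in commit 7e05cde008 amounts to adding context to an I/O error before it is shown to the user. A minimal sketch, assuming a hypothetical `read_config` helper that returns its error as a plain string:

```rust
use std::fs;

/// Hypothetical helper: read a config file and prefix any I/O error with
/// context naming the environment variable the path came from.
fn read_config(path: &str) -> Result<String, String> {
    fs::read_to_string(path).map_err(|err| {
        format!(
            "failed to read the file specified in RIPGREP_CONFIG_PATH: {}: {}",
            path, err
        )
    })
}
```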
jgart
418d048b27 ignore/types: add fennel
https://fennel-lang.org/

PR #2069
2021-11-15 09:58:09 -05:00
Josh Triplett
009dda1488 ignore: if require_git is false, don't stat .git
I've confirmed via strace that this eliminates a pile of stat calls.

PR #2052
2021-11-12 08:37:05 -05:00
Linda_pp
ba535fb5a3 ignore/types: improve 'vim' and 'vimscript' types
This adds various Vim config files to the glob patterns.

PR #2044
2021-10-27 10:59:44 -04:00
jgart
427aaeeb2e ignore/types: add lilypond
This adds file detection for lilypond: https://lilypond.org/

PR #2038
2021-10-24 11:22:07 -04:00
jgart
f5cff746bc ignore/types: add hy
This adds file detection for hy: http://hylang.org/

PR #2033
2021-10-22 08:16:48 -04:00
Philip Munksgaard
457f53b7ee ignore/types: fix futhark type extension
Previously, the 'fut' type only matched files called '.fut', while in
reality we want to match all files with the '.fut' extension. This
commit fixes that issue.

PR #2027
2021-10-19 09:15:19 -04:00
jgart
eb35f7978e ignore/types: add janet
This adds file detection for janet:
https://janet-lang.org/

PR #2018
2021-10-14 07:56:55 -04:00
Markus Dosch
fc69bd366c readme: update install commands for Debian/Ubuntu
This got overlooked during the last release.

PR #2016
2021-10-12 11:08:14 -04:00
Dash
9b01a8f9ae doc: add -F/--fixed-strings to "common options"
#607 is the top result for the search "ripgrep disable regex". I think
it makes sense to add it to the user guide, since it's a very useful
flag.

PR #1945
2021-07-21 20:52:25 -04:00
Andrew Gallant
0ff5dd2360 doc: --field-match-separator's default value is ':'
The docs were out of sync with the implementation. Likely a
copy-and-paste error.

Fixes #1939
2021-07-19 08:07:40 -04:00
Joe Lencioni
3c7819301b doc: fix typo "used" -> "use"
PR #1936
2021-07-14 10:12:30 -04:00
jgart
699e651db2 ignore/types: add texinfo
https://www.gnu.org/software/texinfo/

PR #1934
2021-07-13 07:59:23 -04:00
Eyal
9eddb71b8e ignore/types: add CUDA
Fixes #1918
2021-06-30 09:50:53 -04:00
Andrew Gallant
abf115228e changelog: add #1911 bug fix 2021-06-26 12:57:11 -04:00
Andrew Gallant
fdfc418be5 searcher: disable mmap searching on non-64 bit
It looks like it's possible for mmap to succeed on 32-bit systems even
when the full file can't be addressed in memory. This used to work prior
to ripgrep 13, but (maybe) something about statically linking vcruntime
has caused this to now fail.

It's no big deal to disable mmap searching on 32-bit, so we just do that
instead of returning incorrect results.

Fixes #1911
2021-06-26 12:53:59 -04:00
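A sketch of the kind of compile-time gate that commit fdfc418be5 describes. Both function names are assumptions, and the actual condition used in the searcher crate may differ.

```rust
use std::convert::TryFrom;

/// Hypothetical policy check: only permit memory maps on 64-bit targets.
fn mmap_allowed() -> bool {
    cfg!(target_pointer_width = "64")
}

/// Illustrates the underlying problem: a file length that does not fit in
/// `usize` cannot be fully addressed in memory on the current target.
fn fits_in_address_space(file_len: u64) -> bool {
    usize::try_from(file_len).is_ok()
}
```

On a 32-bit target, a 5GB file fails the second check, which is the situation the commit guards against.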
Sergio Benitez
5bf74362b9 doc: fix typo in --glob flag docs
PR #1899
2021-06-24 08:09:00 -04:00
Kostya M
431ea38620 ignore/types: add file extensions for Crystal
It sounds like Projectfile is no longer being used,
but we should keep it around in case folks are
still using it. It's unlikely that its presence will
do much if any harm.

PR #1904
2021-06-20 08:24:41 -04:00
Andrew Gallant
caba5c4348 globset-0.4.8 2021-06-18 13:30:32 -04:00
Gleb Pomykalov
07f97d42cf globset: fix compilation when serde is enabled
PR #1903
2021-06-18 13:30:47 -04:00
kotborealis
e33d6e73f5 doc: fix formatting of nested list
Markdown wants 4 spaces, not 2.

PR #1894
2021-06-15 10:35:16 -04:00
Andrew Gallant
478da4f271 pkg: fix version number for 13.0.0 release
Fixes #1896
2021-06-15 10:30:01 -04:00
Andrew Gallant
7ce66f73cf regex: update regression test
Sadly, PCRE2 has different behavior (but doesn't panic). We should look
into that, but for now, this is good enough.

Also, update the CHANGELOG.

Ref #1891
2021-06-12 16:22:30 -04:00
Andrew Gallant
bc76a30c23 regex: fix -w when regex can match empty string
This is a weird bug where our optimization for handling -w more quickly
than we would otherwise failed. In particular, if the original regex can
match the empty string, then our word boundary detection would produce
invalid indices at which to start the next search. We "fix" it by simply
bailing when the indices are known to be incorrect.

This wasn't a problem in previous releases; it was introduced when ripgrep 13
tweaked how word boundaries are detected in commit efd9cfb2.

Fixes #1891
2021-06-12 14:18:53 -04:00
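
The "bail when the indices are known to be incorrect" idea can be sketched as follows. The commit message does not describe the fast path in detail, so this assumes (as an illustration only) that a candidate span includes exactly one non-word byte on each side of the real match and that the fast path trims both. The function name `trim_candidate` is hypothetical.

```rust
/// Hypothetical fast path: shrink a candidate span by one byte on each side.
/// Returns None when the trimmed span would be invalid, which tells the
/// caller to fall back to a slower path instead of using bad indices.
fn trim_candidate(start: usize, end: usize) -> Option<(usize, usize)> {
    let new_start = start.checked_add(1)?;
    let new_end = end.checked_sub(1)?;
    // If the candidate was too short to contain two surrounding bytes
    // (which can happen when the inner pattern matched the empty string),
    // the trimmed start overtakes the trimmed end.
    if new_start > new_end {
        return None;
    }
    Some((new_start, new_end))
}

fn main() {
    let haystack = "a foo b";
    // The candidate " foo " spans bytes 1..6 of the haystack.
    match trim_candidate(1, 6) {
        Some((s, e)) => println!("inner match: {:?}", &haystack[s..e]),
        None => println!("indices invalid, bail to the slow path"),
    }
    // A one-byte candidate cannot be trimmed on both sides.
    println!("{:?}", trim_candidate(2, 3));
}
```

Returning `None` rather than clamping the indices keeps the fast path honest: it either yields a span it can vouch for or defers entirely.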
Andrew Gallant
5e81c60b35 ci: use musl to build debian artifact
Previously, I was trying to be a good citizen and let ripgrep use the
system libc. But it turns out that building ripgrep on Arch with a newer
version of glibc than what is in Ubuntu results in the whole thing
breaking. Arguably, I should build the Debian artifact on an Ubuntu or
Debian machine of an appropriate version, but that's too much work. If
people really want that, then they can install some ancient version of
ripgrep from their Ubuntu/Debian repo.

Since we were already statically linking PCRE2, we go the whole nine
yards and statically link the entire thing.

Fixes #1890
2021-06-12 13:36:57 -04:00
Andrew Gallant
b3e5ae9d28 changelog: add template for next entry 2021-06-12 08:43:49 -04:00
Andrew Gallant
a024f14fdd pkg: update brew tap version to 13.0.0 2021-06-12 08:43:30 -04:00
Andrew Gallant
8c30c8294a release: work around GitHub Actions weirdness 2021-06-12 08:40:48 -04:00
Andrew Gallant
c44d263419 release: add note about pushing changes 2021-06-12 08:13:29 -04:00
Andrew Gallant
af6b6c543b 13.0.0 2021-06-12 08:12:24 -04:00
Andrew Gallant
1a4fec8b4a changelog: final prep before ripgrep 13 release 2021-06-12 08:11:51 -04:00
Andrew Gallant
c8d8ab8ded deps/grep: update minimal versions 2021-06-12 08:08:58 -04:00
70 changed files with 3104 additions and 1851 deletions

6
.github/dependabot.yml vendored Normal file

@@ -0,0 +1,6 @@
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "weekly"


@@ -42,31 +42,31 @@ jobs:
- win-gnu
include:
- build: pinned
os: ubuntu-18.04
rust: 1.52.1
os: ubuntu-latest
rust: 1.70.0
- build: stable
os: ubuntu-18.04
os: ubuntu-latest
rust: stable
- build: beta
os: ubuntu-18.04
os: ubuntu-latest
rust: beta
- build: nightly
os: ubuntu-18.04
os: ubuntu-latest
rust: nightly
- build: nightly-musl
os: ubuntu-18.04
os: ubuntu-latest
rust: nightly
target: x86_64-unknown-linux-musl
- build: nightly-32
os: ubuntu-18.04
os: ubuntu-latest
rust: nightly
target: i686-unknown-linux-gnu
- build: nightly-mips
os: ubuntu-18.04
os: ubuntu-latest
rust: nightly
target: mips64-unknown-linux-gnuabi64
- build: nightly-arm
os: ubuntu-18.04
os: ubuntu-latest
rust: nightly
# For stripping release binaries:
# docker run --rm -v $PWD/target:/target:Z \
@@ -78,17 +78,17 @@ jobs:
os: macos-latest
rust: nightly
- build: win-msvc
os: windows-2019
os: windows-2022
rust: nightly
- build: win-gnu
os: windows-2019
os: windows-2022
rust: nightly-x86_64-gnu
steps:
- name: Checkout repository
uses: actions/checkout@v2
uses: actions/checkout@v3
- name: Install packages (Ubuntu)
if: matrix.os == 'ubuntu-18.04'
if: matrix.os == 'ubuntu-latest'
run: |
ci/ubuntu-install-packages
@@ -98,11 +98,9 @@ jobs:
ci/macos-install-packages
- name: Install Rust
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ matrix.rust }}
profile: minimal
override: true
- name: Use Cross
if: matrix.target != ''
@@ -150,14 +148,14 @@ jobs:
run: ${{ env.CARGO }} test --verbose --workspace ${{ env.TARGET_FLAGS }}
- name: Test for existence of build artifacts (Windows)
if: matrix.os == 'windows-2019'
if: matrix.os == 'windows-2022'
shell: bash
run: |
outdir="$(ci/cargo-out-dir "${{ env.TARGET_DIR }}")"
ls "$outdir/_rg.ps1" && file "$outdir/_rg.ps1"
- name: Test for existence of build artifacts (Unix)
if: matrix.os != 'windows-2019'
if: matrix.os != 'windows-2022'
shell: bash
run: |
outdir="$(ci/cargo-out-dir "${{ env.TARGET_DIR }}")"
@@ -174,39 +172,34 @@ jobs:
# 'rg' binary (done in test-complete) with qemu, which is a pain and
# doesn't really gain us much. If shell completion works in one place,
# it probably works everywhere.
if: matrix.target == '' && matrix.os != 'windows-2019'
if: matrix.target == '' && matrix.os != 'windows-2022'
shell: bash
run: ci/test-complete
rustfmt:
name: rustfmt
runs-on: ubuntu-18.04
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v2
uses: actions/checkout@v3
- name: Install Rust
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
override: true
profile: minimal
components: rustfmt
- name: Check formatting
run: |
cargo fmt --all -- --check
run: cargo fmt --all --check
docs:
name: Docs
runs-on: ubuntu-20.04
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v2
uses: actions/checkout@v3
- name: Install Rust
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
profile: minimal
override: true
- name: Check documentation
env:
RUSTDOCFLAGS: -D warnings


@@ -24,7 +24,7 @@ on:
jobs:
create-release:
name: create-release
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
# env:
# Set to force version number, e.g., when no tag exists.
# RG_VERSION: TEST-0.0.0
@@ -71,52 +71,48 @@ jobs:
build: [linux, linux-arm, macos, win-msvc, win-gnu, win32-msvc]
include:
- build: linux
os: ubuntu-18.04
os: ubuntu-22.04
rust: nightly
target: x86_64-unknown-linux-musl
- build: linux-arm
os: ubuntu-18.04
os: ubuntu-22.04
rust: nightly
target: arm-unknown-linux-gnueabihf
- build: macos
os: macos-latest
os: macos-12
rust: nightly
target: x86_64-apple-darwin
- build: win-msvc
os: windows-2019
os: windows-2022
rust: nightly
target: x86_64-pc-windows-msvc
- build: win-gnu
os: windows-2019
os: windows-2022
rust: nightly-x86_64-gnu
target: x86_64-pc-windows-gnu
- build: win32-msvc
os: windows-2019
os: windows-2022
rust: nightly
target: i686-pc-windows-msvc
steps:
- name: Checkout repository
uses: actions/checkout@v2
with:
fetch-depth: 1
uses: actions/checkout@v3
- name: Install packages (Ubuntu)
if: matrix.os == 'ubuntu-18.04'
if: matrix.os == 'ubuntu-22.04'
run: |
ci/ubuntu-install-packages
- name: Install packages (macOS)
if: matrix.os == 'macos-latest'
if: matrix.os == 'macos-12'
run: |
ci/macos-install-packages
- name: Install Rust
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ matrix.rust }}
profile: minimal
override: true
target: ${{ matrix.target }}
- name: Use Cross
@@ -161,7 +157,7 @@ jobs:
cp "$outdir"/{rg.bash,rg.fish,_rg.ps1} "$staging/complete/"
cp complete/_rg "$staging/complete/"
if [ "${{ matrix.os }}" = "windows-2019" ]; then
if [ "${{ matrix.os }}" = "windows-2022" ]; then
cp "target/${{ matrix.target }}/release/rg.exe" "$staging/"
7z a "$staging.zip" "$staging"
echo "ASSET=$staging.zip" >> $GITHUB_ENV
@@ -174,7 +170,7 @@ jobs:
fi
- name: Upload release archive
uses: actions/upload-release-asset@v1.0.1
uses: actions/upload-release-asset@v1.0.2
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:

1
.ignore Normal file

@@ -0,0 +1 @@
!/.github/


@@ -1,9 +1,25 @@
TBD
===
Unreleased changes. Release notes have not yet been written.
Bug fixes:
* [BUG #1891](https://github.com/BurntSushi/ripgrep/issues/1891):
Fix bug when using `-w` with a regex that can match the empty string.
* [BUG #1911](https://github.com/BurntSushi/ripgrep/issues/1911):
Disable mmap searching in all non-64-bit environments.
* [BUG #2236](https://github.com/BurntSushi/ripgrep/issues/2236):
Fix gitignore parsing bug where a trailing `\/` resulted in an error.
13.0.0 (2021-06-12)
===================
ripgrep 13 is a new major version release of ripgrep that primarily contains
bug fixes. There is also a fix for a security vulnerability on Windows
([CVE-2021-3013](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3013)),
some performance improvements and some minor breaking changes.
bug fixes, some performance improvements and a few minor breaking changes.
There is also a fix for a security vulnerability on Windows
([CVE-2021-3013](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3013)).
Some highlights:
A new short flag, `-.`, has been added. It is an alias for the `--hidden` flag,
which instructs ripgrep to search hidden files and directories.

265
Cargo.lock generated

@@ -4,68 +4,51 @@ version = 3
[[package]]
name = "aho-corasick"
version = "0.7.18"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e37cfd5e7657ada45f742d6e99ca5788580b5c529dc78faf11ece6dc702656f"
checksum = "43f6cb1bf222025340178f382c426f13757b2960e89779dfcb319c32542a5a41"
dependencies = [
"memchr",
]
[[package]]
name = "atty"
version = "0.2.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d9b39be18770d11421cdb1b9947a45dd3f37e93092cbf377614828a319d5fee8"
dependencies = [
"hermit-abi",
"libc",
"winapi",
]
[[package]]
name = "base64"
version = "0.13.0"
version = "0.20.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "904dfeac50f3cdaba28fc6f57fdcddb75f49ed61346676a78c4ffe55877802fd"
checksum = "0ea22880d78093b0cbe17c89f64a7d457941e65759157ec6cb31a31d652b05e5"
[[package]]
name = "bitflags"
version = "1.2.1"
version = "1.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cf1de2fe8c75bc145a2f577add951f8134889b4795d47466a54a5c846d691693"
checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a"
[[package]]
name = "bstr"
version = "0.2.16"
version = "1.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "90682c8d613ad3373e66de8c6411e0ae2ab2571e879d2efbf73558cc66f21279"
checksum = "6798148dccfbff0fae41c7574d2fa8f1ef3492fba0face179de5d8d447d67b05"
dependencies = [
"lazy_static",
"memchr",
"regex-automata",
"serde",
]
[[package]]
name = "bytecount"
version = "0.6.2"
version = "0.6.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "72feb31ffc86498dacdbd0fcebb56138e7177a8cc5cea4516031d15ae85a742e"
checksum = "2c676a478f63e9fa2dd5368a42f28bba0d6c560b775f38583c8bbaa7fcd67c9c"
[[package]]
name = "cc"
version = "1.0.68"
version = "1.0.79"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4a72c244c1ff497a746a7e1fb3d14bd08420ecda70c8f25c7112f2781652d787"
checksum = "50d30906286121d95be3d479533b458f87493b30a4b5f79a607db8f5d11aa91f"
dependencies = [
"jobserver",
]
[[package]]
name = "cfg-if"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4785bdd1c96b2a846b2bd7cc02e86b6b3dbf14e7e53446c4f54c92a361040822"
[[package]]
name = "cfg-if"
version = "1.0.0"
@@ -74,9 +57,9 @@ checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
[[package]]
name = "clap"
version = "2.33.3"
version = "2.34.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "37e58ac78573c40708d45522f0d80fa2f01cc4f9b4e2bf749807255454312002"
checksum = "a0610544180c38b88101fecf2dd634b174a62eef6946f84dfc6a7127512b381c"
dependencies = [
"bitflags",
"strsim",
@@ -86,31 +69,30 @@ dependencies = [
[[package]]
name = "crossbeam-channel"
version = "0.5.1"
version = "0.5.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "06ed27e177f16d65f0f0c22a213e17c696ace5dd64b14258b52f9417ccb52db4"
checksum = "a33c2bf77f2df06183c3aa30d1e96c0695a313d4f9c453cc3762a6db39f99200"
dependencies = [
"cfg-if 1.0.0",
"cfg-if",
"crossbeam-utils",
]
[[package]]
name = "crossbeam-utils"
version = "0.8.5"
version = "0.8.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d82cfc11ce7f2c3faef78d8a684447b40d503d9681acebed6cb728d45940c4db"
checksum = "5a22b2d63d4d1dc0b7f1b6b2747dd0088008a9be28b6ddf0b1e7d335e3037294"
dependencies = [
"cfg-if 1.0.0",
"lazy_static",
"cfg-if",
]
[[package]]
name = "encoding_rs"
version = "0.8.28"
version = "0.8.32"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "80df024fbc5ac80f87dfef0d9f5209a252f2a497f7f42944cff24d8253cac065"
checksum = "071a31f4ee85403370b58aca746f01041ede6f0da2730960ad001edc2b71b394"
dependencies = [
"cfg-if 1.0.0",
"cfg-if",
"packed_simd_2",
]
@@ -129,21 +111,15 @@ version = "1.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1"
[[package]]
name = "fs_extra"
version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2022715d62ab30faffd124d40b76f4134a550a87792276512b18d63272333394"
[[package]]
name = "glob"
version = "0.3.0"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9b919933a397b79c37e33b77bb2aa3dc8eb6e165ad809e58ff75bc7db2e34574"
checksum = "d2fabcfbdc87f4758337ca535fb41a6d701b65693ce38287d856d1674551ec9b"
[[package]]
name = "globset"
version = "0.4.7"
version = "0.4.10"
dependencies = [
"aho-corasick",
"bstr",
@@ -158,7 +134,7 @@ dependencies = [
[[package]]
name = "grep"
version = "0.2.8"
version = "0.2.12"
dependencies = [
"grep-cli",
"grep-matcher",
@@ -172,9 +148,8 @@ dependencies = [
[[package]]
name = "grep-cli"
version = "0.1.6"
version = "0.1.8"
dependencies = [
"atty",
"bstr",
"globset",
"lazy_static",
@@ -187,7 +162,7 @@ dependencies = [
[[package]]
name = "grep-matcher"
version = "0.1.5"
version = "0.1.6"
dependencies = [
"memchr",
"regex",
@@ -195,15 +170,16 @@ dependencies = [
[[package]]
name = "grep-pcre2"
version = "0.1.5"
version = "0.1.6"
dependencies = [
"grep-matcher",
"log",
"pcre2",
]
[[package]]
name = "grep-printer"
version = "0.1.6"
version = "0.1.7"
dependencies = [
"base64",
"bstr",
@@ -217,20 +193,19 @@ dependencies = [
[[package]]
name = "grep-regex"
version = "0.1.9"
version = "0.1.11"
dependencies = [
"aho-corasick",
"bstr",
"grep-matcher",
"log",
"regex",
"regex-automata",
"regex-syntax",
"thread_local",
]
[[package]]
name = "grep-searcher"
version = "0.1.8"
version = "0.1.11"
dependencies = [
"bstr",
"bytecount",
@@ -243,21 +218,11 @@ dependencies = [
"regex",
]
[[package]]
name = "hermit-abi"
version = "0.1.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "322f4de77956e22ed0e5032c359a0f1273f1f7f0d79bfa3b8ffbc730d7fbcc5c"
dependencies = [
"libc",
]
[[package]]
name = "ignore"
version = "0.4.18"
version = "0.4.20"
dependencies = [
"crossbeam-channel",
"crossbeam-utils",
"globset",
"lazy_static",
"log",
@@ -271,26 +236,25 @@ dependencies = [
[[package]]
name = "itoa"
version = "0.4.7"
version = "1.0.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dd25036021b0de88a0aff6b850051563c6516d0bf53f8638938edbb9de732736"
checksum = "62b02a5381cc465bd3041d84623d0fa3b66738b52b8e2fc3bab8ad63ab032f4a"
[[package]]
name = "jemalloc-sys"
version = "0.3.2"
version = "0.5.3+5.3.0-patched"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0d3b9f3f5c9b31aa0f5ed3260385ac205db665baa41d49bb8338008ae94ede45"
checksum = "f9bd5d616ea7ed58b571b2e209a65759664d7fb021a0819d7a790afc67e47ca1"
dependencies = [
"cc",
"fs_extra",
"libc",
]
[[package]]
name = "jemallocator"
version = "0.3.2"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "43ae63fcfc45e99ab3d1b29a46782ad679e98436c3169d15a167a1108a724b69"
checksum = "16c2514137880c52b0b4822b563fadd38257c1f380858addb74a400889696ea6"
dependencies = [
"jemalloc-sys",
"libc",
@@ -298,9 +262,9 @@ dependencies = [
[[package]]
name = "jobserver"
version = "0.1.22"
version = "0.1.26"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "972f5ae5d1cb9c6ae417789196c803205313edde988685da5e3aae0827b9e7fd"
checksum = "936cfd212a0155903bcbc060e316fb6cc7cbf2e1907329391ebadc1fe0ce77c2"
dependencies = [
"libc",
]
@@ -313,9 +277,9 @@ checksum = "e2abad23fbc42b3700f2f279844dc832adb2b2eb069b2df918f455c4e18cc646"
[[package]]
name = "libc"
version = "0.2.97"
version = "0.2.147"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "12b8adadd720df158f4d70dfe7ccc6adb0472d7c55ca83445f6a5ab3e36f8fb6"
checksum = "b4668fb0ea861c1df094127ac5f1da3409a82116a4ba74fca2e58ef927159bb3"
[[package]]
name = "libm"
@@ -325,59 +289,46 @@ checksum = "7fc7aa29613bd6a620df431842069224d8bc9011086b1db4c0e0cd47fa03ec9a"
[[package]]
name = "log"
version = "0.4.14"
version = "0.4.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "51b9bbe6c47d51fc3e1a9b945965946b4c44142ab8792c50835a980d362c2710"
dependencies = [
"cfg-if 1.0.0",
]
checksum = "b06a4cde4c0f271a446782e3eff8de789548ce57dbc8eca9292c27f4a42004b4"
[[package]]
name = "memchr"
version = "2.4.0"
version = "2.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b16bd47d9e329435e309c58469fe0791c2d0d1ba96ec0954152a5ae2b04387dc"
checksum = "2dffe52ecf27772e601905b7522cb4ef790d2cc203488bbd0e2fe85fcb74566d"
[[package]]
name = "memmap2"
version = "0.3.0"
version = "0.5.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "20ff203f7bdc401350b1dbaa0355135777d25f41c0bbc601851bbd6cf61e8ff5"
checksum = "83faa42c0a078c393f6b29d5db232d8be22776a891f8f56e5284faee4a20b327"
dependencies = [
"libc",
]
[[package]]
name = "num_cpus"
version = "1.13.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "05499f3756671c15885fee9034446956fff3f243d6077b91e5767df161f766b3"
dependencies = [
"hermit-abi",
"libc",
]
[[package]]
name = "once_cell"
version = "1.7.2"
version = "1.18.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "af8b08b04175473088b46763e51ee54da5f9a164bc162f615b91bc179dbf15a3"
checksum = "dd8b5dd2ae5ed71462c540258bedcb51965123ad7e7ccf4b9a8cafaa4a63576d"
[[package]]
name = "packed_simd_2"
version = "0.3.5"
version = "0.3.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0e64858a2d3733fdd61adfdd6da89aa202f7ff0e741d2fc7ed1e452ba9dc99d7"
checksum = "a1914cd452d8fccd6f9db48147b29fd4ae05bea9dc5d9ad578509f72415de282"
dependencies = [
"cfg-if 0.1.10",
"cfg-if",
"libm",
]
[[package]]
name = "pcre2"
version = "0.2.3"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85b30f2f69903b439dd9dc9e824119b82a55bf113b29af8d70948a03c1b11ab1"
checksum = "486aca7e74edb8cab09a48d461177f450a5cca3b55e61d139f7552190e2bbcf5"
dependencies = [
"libc",
"log",
@@ -387,9 +338,9 @@ dependencies = [
[[package]]
name = "pcre2-sys"
version = "0.2.5"
version = "0.2.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dec30e5e9ec37eb8fbf1dea5989bc957fd3df56fbee5061aa7b7a99dbb37b722"
checksum = "ae234f441970dbd52d4e29bee70f3b56ca83040081cb2b55b7df772b16e0b06e"
dependencies = [
"cc",
"libc",
@@ -398,54 +349,60 @@ dependencies = [
[[package]]
name = "pkg-config"
version = "0.3.19"
version = "0.3.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3831453b3449ceb48b6d9c7ad7c96d5ea673e9b470a1dc578c2ce6521230884c"
checksum = "26072860ba924cbfa98ea39c8c19b4dd6a4a25423dbdf219c1eca91aa0cf6964"
[[package]]
name = "proc-macro2"
version = "1.0.27"
version = "1.0.63"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f0d8caf72986c1a598726adc988bb5984792ef84f5ee5aa50209145ee8077038"
checksum = "7b368fba921b0dce7e60f5e04ec15e565b3303972b42bcfde1d0713b881959eb"
dependencies = [
"unicode-xid",
"unicode-ident",
]
[[package]]
name = "quote"
version = "1.0.9"
version = "1.0.29"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c3d0b9745dc2debf507c8422de05d7226cc1f0644216dfdfead988f9b1ab32a7"
checksum = "573015e8ab27661678357f27dc26460738fd2b6c86e46f386fde94cb5d913105"
dependencies = [
"proc-macro2",
]
[[package]]
name = "regex"
version = "1.5.4"
version = "1.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d07a8629359eb56f1e2fb1652bb04212c072a87ba68546a04065d525673ac461"
checksum = "89089e897c013b3deb627116ae56a6955a72b8bed395c9526af31c9fe528b484"
dependencies = [
"aho-corasick",
"memchr",
"regex-automata",
"regex-syntax",
]
[[package]]
name = "regex-automata"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fa250384981ea14565685dea16a9ccc4d1c541a13f82b9c168572264d1df8c56"
dependencies = [
"aho-corasick",
"memchr",
"regex-syntax",
]
[[package]]
name = "regex-automata"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6c230d73fb8d8c1b9c0b3135c5142a8acee3a0558fb8db5cf1cb65f8d7862132"
[[package]]
name = "regex-syntax"
version = "0.6.25"
version = "0.7.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f497285884f3fcff424ffc933e56d7cbca511def0c9831a7f9b5f6153e3cc89b"
checksum = "2ab07dc67230e4a4718e70fd5c20055a4334b121f1f9db8fe63ef39ce9b8c846"
[[package]]
name = "ripgrep"
version = "12.1.1"
version = "13.0.0"
dependencies = [
"bstr",
"clap",
@@ -454,8 +411,6 @@ dependencies = [
"jemallocator",
"lazy_static",
"log",
"num_cpus",
"regex",
"serde",
"serde_derive",
"serde_json",
@@ -465,9 +420,9 @@ dependencies = [
[[package]]
name = "ryu"
version = "1.0.5"
version = "1.0.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "71d301d4193d031abdd79ff7e3dd721168a9572ef3fe51a1517aba235bd8f86e"
checksum = "fe232bdf6be8c8de797b22184ee71118d63780ea42ac85b61d1baa6d3b782ae9"
[[package]]
name = "same-file"
@@ -480,18 +435,18 @@ dependencies = [
[[package]]
name = "serde"
version = "1.0.126"
version = "1.0.166"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ec7505abeacaec74ae4778d9d9328fe5a5d04253220a85c4ee022239fc996d03"
checksum = "d01b7404f9d441d3ad40e6a636a7782c377d2abdbe4fa2440e2edcc2f4f10db8"
dependencies = [
"serde_derive",
]
[[package]]
name = "serde_derive"
version = "1.0.126"
version = "1.0.166"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "963a7dbc9895aeac7ac90e74f34a5d5261828f79df35cbed41e10189d3804d43"
checksum = "5dd83d6dde2b6b2d466e14d9d1acce8816dedee94f735eac6395808b3483c6d6"
dependencies = [
"proc-macro2",
"quote",
@@ -500,9 +455,9 @@ dependencies = [
[[package]]
name = "serde_json"
version = "1.0.64"
version = "1.0.100"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "799e97dc9fdae36a5c8b8f2cae9ce2ee9fdce2058c57a93e6099d919fd982f79"
checksum = "0f1e14e89be7aa4c4b78bdbdc9eb5bf8517829a600ae8eaa39a6e1d960b5185c"
dependencies = [
"itoa",
"ryu",
@@ -517,20 +472,20 @@ checksum = "8ea5119cdb4c55b55d432abb513a0429384878c15dde60cc77b1c99de1a95a6a"
[[package]]
name = "syn"
version = "1.0.73"
version = "2.0.23"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f71489ff30030d2ae598524f61326b902466f72a0fb1a8564c001cc63425bcc7"
checksum = "59fb7d6d8281a51045d62b8eb3a7d1ce347b76f312af50cd3dc0af39c87c1737"
dependencies = [
"proc-macro2",
"quote",
"unicode-xid",
"unicode-ident",
]
[[package]]
name = "termcolor"
version = "1.1.2"
version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2dfed899f0eb03f32ee8c6a0aabdb8a7949659e3466561fc0adf54e26d88c5f4"
checksum = "be55cf8942feac5c765c2c993422806843c9a9a45d4d5c407ad6dd2ea95eb9b6"
dependencies = [
"winapi-util",
]
@@ -546,33 +501,33 @@ dependencies = [
[[package]]
name = "thread_local"
version = "1.1.3"
version = "1.1.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8018d24e04c95ac8790716a5987d0fec4f8b27249ffa0f7d33f1369bdfb88cbd"
checksum = "3fdd6f064ccff2d6567adcb3873ca630700f00b5ad3f060c25b5dcfd9a4ce152"
dependencies = [
"cfg-if",
"once_cell",
]
[[package]]
name = "unicode-width"
version = "0.1.8"
name = "unicode-ident"
version = "1.0.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9337591893a19b88d8d87f2cec1e73fad5cdfd10e5a6f349f498ad6ea2ffb1e3"
checksum = "22049a19f4a68748a168c0fc439f9516686aa045927ff767eca0a85101fb6e73"
[[package]]
name = "unicode-xid"
version = "0.2.2"
name = "unicode-width"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8ccb82d61f80a663efe1f787a51b16b5a51e3314d6ac365b08639f52387b33f3"
checksum = "c0edd1e5b14653f783770bce4a4dabb4a5108a5370a5f5d8cfe8710c361f6c8b"
[[package]]
name = "walkdir"
version = "2.3.2"
version = "2.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "808cf2735cd4b6866113f648b791c6adc5714537bc222d9347bb203386ffda56"
checksum = "36df944cda56c7d8d8b7496af378e6b16de9284591917d307c9b4d313c44e698"
dependencies = [
"same-file",
"winapi",
"winapi-util",
]


@@ -1,6 +1,6 @@
[package]
name = "ripgrep"
version = "12.1.1" #:version
version = "13.0.0" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
ripgrep is a line-oriented search tool that recursively searches the current
@@ -17,6 +17,7 @@ exclude = ["HomebrewFormula"]
build = "build.rs"
autotests = false
edition = "2018"
rust-version = "1.65"
[[bin]]
bench = false
@@ -41,13 +42,11 @@ members = [
]
[dependencies]
bstr = "0.2.12"
grep = { version = "0.2.7", path = "crates/grep" }
ignore = { version = "0.4.18", path = "crates/ignore" }
bstr = "1.6.0"
grep = { version = "0.2.12", path = "crates/grep" }
ignore = { version = "0.4.19", path = "crates/ignore" }
lazy_static = "1.1.0"
log = "0.4.5"
num_cpus = "1.8.0"
regex = "1.3.5"
serde_json = "1.0.23"
termcolor = "1.1.0"
@@ -57,7 +56,7 @@ default-features = false
features = ["suggestions"]
[target.'cfg(all(target_env = "musl", target_pointer_width = "64"))'.dependencies.jemallocator]
version = "0.3.0"
version = "0.5.0"
[build-dependencies]
lazy_static = "1.1.0"


@@ -6,6 +6,7 @@ image = "burntsushi/cross:i686-unknown-linux-gnu"
[target.mips64-unknown-linux-gnuabi64]
image = "burntsushi/cross:mips64-unknown-linux-gnuabi64"
build-std = true
[target.arm-unknown-linux-gnueabihf]
image = "burntsushi/cross:arm-unknown-linux-gnueabihf"


@@ -178,11 +178,11 @@ search. By default, when you search a directory, ripgrep will ignore all of
the following:
1. Files and directories that match glob patterns in these three categories:
1. gitignore globs (including global and repo-specific globs).
2. `.ignore` globs, which take precedence over all gitignore globs when
there's a conflict.
3. `.rgignore` globs, which take precedence over all `.ignore` globs when
there's a conflict.
1. gitignore globs (including global and repo-specific globs).
2. `.ignore` globs, which take precedence over all gitignore globs
when there's a conflict.
3. `.rgignore` globs, which take precedence over all `.ignore` globs
when there's a conflict.
2. Hidden files and directories.
3. Binary files. (ripgrep considers any file with a `NUL` byte to be binary.)
4. Symbolic links aren't followed.
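
The precedence rules and the `NUL` byte rule in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not ripgrep's implementation: the `Verdict` type and the `resolve` and `looks_binary` functions are hypothetical names, and each ignore source is reduced to a single optional verdict for one path.

```rust
/// Hypothetical verdict that one ignore source gives for a path.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Verdict {
    Ignore,
    Whitelist,
}

/// Pick the verdict from the highest-precedence source that has an opinion:
/// `.rgignore` beats `.ignore`, which beats gitignore.
fn resolve(
    gitignore: Option<Verdict>,
    ignore: Option<Verdict>,
    rgignore: Option<Verdict>,
) -> Option<Verdict> {
    rgignore.or(ignore).or(gitignore)
}

/// Treat any content containing a NUL byte as binary, per the rule above.
fn looks_binary(contents: &[u8]) -> bool {
    contents.contains(&0)
}

fn main() {
    // gitignore says "ignore", but `.ignore` whitelists the path.
    let verdict = resolve(Some(Verdict::Ignore), Some(Verdict::Whitelist), None);
    println!("{:?}", verdict);
    println!("{}", looks_binary(b"abc\0def"));
}
```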
@@ -190,7 +190,8 @@ the following:
All of these things can be toggled using various flags provided by ripgrep:
1. You can disable all ignore-related filtering with the `--no-ignore` flag.
2. Hidden files and directories can be searched with the `--hidden` flag.
2. Hidden files and directories can be searched with the `--hidden` (`-.` for
short) flag.
3. Binary files can be searched via the `--text` (`-a` for short) flag.
Be careful with this flag! Binary files may emit control characters to your
terminal, which might cause strange behavior.
@@ -648,9 +649,9 @@ given, which is the default:
they correspond to a UTF-16 BOM, then ripgrep will transcode the contents of
the file from UTF-16 to UTF-8, and then execute the search on the transcoded
version of the file. (This incurs a performance penalty since transcoding
is slower than regex searching.) If the file contains invalid UTF-16, then
the Unicode replacement codepoint is substituted in place of invalid code
units.
is needed in addition to regex searching.) If the file contains invalid
UTF-16, then the Unicode replacement codepoint is substituted in place of
invalid code units.
* To handle other cases, ripgrep provides a `-E/--encoding` flag, which permits
you to specify an encoding from the
[Encoding Standard](https://encoding.spec.whatwg.org/#concept-encoding-get).
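
The BOM sniffing and lossy transcoding described above can be approximated with the standard library alone. This is a rough sketch, not ripgrep's code path: the function name `transcode_utf16_bom` is hypothetical, and handling a trailing odd byte by emitting a replacement character is an assumption made here for completeness.

```rust
/// Hypothetical helper: if `bytes` begins with a UTF-16 BOM, transcode the
/// rest to a UTF-8 `String`, substituting U+FFFD for invalid code units.
/// Returns None when no UTF-16 BOM is present.
fn transcode_utf16_bom(bytes: &[u8]) -> Option<String> {
    let (little_endian, body) = match bytes {
        [0xFF, 0xFE, rest @ ..] => (true, rest),
        [0xFE, 0xFF, rest @ ..] => (false, rest),
        _ => return None,
    };
    let units = body.chunks(2).map(|pair| match pair {
        &[a, b] if little_endian => u16::from_le_bytes([a, b]),
        &[a, b] => u16::from_be_bytes([a, b]),
        // A trailing odd byte cannot form a code unit. Emit a lone
        // surrogate so that it decodes to the replacement character.
        _ => 0xD800,
    });
    Some(
        char::decode_utf16(units)
            .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
            .collect(),
    )
}

fn main() {
    let le = [0xFF, 0xFE, b'h', 0, b'i', 0];
    println!("{:?}", transcode_utf16_bom(&le));
    println!("{:?}", transcode_utf16_bom(b"plain UTF-8"));
}
```

The extra pass over the data is where the performance cost mentioned above comes from: the search can only run once the transcoded text exists.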
@@ -992,6 +993,8 @@ used options that will likely impact how you use ripgrep on a regular basis.
* `-S/--smart-case`: This is similar to `--ignore-case`, but disables itself
if the pattern contains any uppercase letters. Usually this flag is put into
an alias or a config file.
* `-F/--fixed-strings`: Disable regular expression matching and treat the pattern
as a literal string.
* `-w/--word-regexp`: Require that all matches of the pattern be surrounded
by word boundaries. That is, given `pattern`, the `--word-regexp` flag will
cause ripgrep to behave as if `pattern` were actually `\b(?:pattern)\b`.
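
The effect of `-F` and `-w` on a pattern can be shown as plain string rewriting. This is a sketch under assumptions: `escape_literal` and `word_regexp` are hypothetical names, and escaping metacharacters is just one way to get literal matching, not necessarily how ripgrep implements `--fixed-strings`.

```rust
/// Hypothetical escaper: backslash-escape regex metacharacters so that the
/// pattern matches literally.
fn escape_literal(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    for c in pattern.chars() {
        if r"\.+*?()|[]{}^$#&-~".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Wrap a pattern as described for `--word-regexp`: `\b(?:pattern)\b`.
fn word_regexp(pattern: &str) -> String {
    format!(r"\b(?:{})\b", pattern)
}

fn main() {
    println!("{}", word_regexp("foo"));
    println!("{}", escape_literal("a.b"));
    println!("{}", word_regexp(&escape_literal("a|b")));
}
```

The non-capturing group matters: without it, a pattern such as `a|b` would bind the word boundaries to only one alternative each.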


@@ -2,11 +2,11 @@ ripgrep (rg)
------------
ripgrep is a line-oriented search tool that recursively searches the current
directory for a regex pattern. By default, ripgrep will respect gitignore rules
and automatically skip hidden files/directories and binary files. ripgrep
has first class support on Windows, macOS and Linux, with binary downloads
available for [every release](https://github.com/BurntSushi/ripgrep/releases).
ripgrep is similar to other popular search tools like The Silver Searcher, ack
and grep.
and automatically skip hidden files/directories and binary files. (To disable
all automatic filtering by default, use `rg -uuu`.) ripgrep has first class
support on Windows, macOS and Linux, with binary downloads available for [every
release](https://github.com/BurntSushi/ripgrep/releases). ripgrep is similar to
other popular search tools like The Silver Searcher, ack and grep.
[![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
[![Crates.io](https://img.shields.io/crates/v/ripgrep.svg)](https://crates.io/crates/ripgrep)
@@ -90,16 +90,16 @@ times are unaffected by the presence or absence of `-n`.
because it contains most of their features and is generally faster. (See
[the FAQ](FAQ.md#posix4ever) for more details on whether ripgrep can truly
replace grep.)
* Like other tools specialized to code search, ripgrep defaults to recursive
directory search and won't search files ignored by your
`.gitignore`/`.ignore`/`.rgignore` files. It also ignores hidden and binary
files by default. ripgrep also implements full support for `.gitignore`,
whereas there are many bugs related to that functionality in other code
search tools claiming to provide the same functionality.
* ripgrep can search specific types of files. For example, `rg -tpy foo`
limits your search to Python files and `rg -Tjs foo` excludes JavaScript
files from your search. ripgrep can be taught about new file types with
custom matching rules.
* Like other tools specialized to code search, ripgrep defaults to
[recursive search](GUIDE.md#recursive-search) and does [automatic
filtering](GUIDE.md#automatic-filtering). Namely, ripgrep won't search files
ignored by your `.gitignore`/`.ignore`/`.rgignore` files, it won't search
hidden files and it won't search binary files. Automatic filtering can be
disabled with `rg -uuu`.
* ripgrep can [search specific types of files](GUIDE.md#manual-filtering-file-types).
For example, `rg -tpy foo` limits your search to Python files and `rg -Tjs
foo` excludes JavaScript files from your search. ripgrep can be taught about
new file types with custom matching rules.
* ripgrep supports many features found in `grep`, such as showing the context
of search results, searching multiple patterns, highlighting matches with
color and full Unicode support. Unlike GNU grep, ripgrep stays fast while
@@ -110,16 +110,20 @@ times are unaffected by the presence or absence of `-n`.
regex engine. PCRE2 support can be enabled with `-P/--pcre2` (use PCRE2
always) or `--auto-hybrid-regex` (use PCRE2 only if needed). An alternative
syntax is provided via the `--engine (default|pcre2|auto-hybrid)` option.
* ripgrep supports searching files in text encodings other than UTF-8, such
as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for
automatically detecting UTF-16 is provided. Other text encodings must be
specifically specified with the `-E/--encoding` flag.)
* ripgrep has [rudimentary support for replacements](GUIDE.md#replacements),
which permit rewriting output based on what was matched.
* ripgrep supports [searching files in text encodings](GUIDE.md#file-encoding)
other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more.
(Some support for automatically detecting UTF-16 is provided. Other text
encodings must be specifically specified with the `-E/--encoding` flag.)
* ripgrep supports searching files compressed in a common format (brotli,
bzip2, gzip, lz4, lzma, xz, or zstandard) with the `-z/--search-zip` flag.
* ripgrep supports
[arbitrary input preprocessing filters](GUIDE.md#preprocessor)
which could be PDF text extraction, less supported decompression, decrypting,
automatic encoding detection and so on.
* ripgrep can be configured via a
[configuration file](GUIDE.md#configuration-file).
In other words, use ripgrep if you like speed, filtering by default, fewer
bugs and Unicode support.
@@ -267,17 +271,26 @@ $ nix-env --install ripgrep
$ # (Or using the attribute name, which is also ripgrep.)
```
If you're a **Guix** user, you can install ripgrep from the official
package collection:
```
$ guix install ripgrep
```
If you're a **Debian** user (or a user of a Debian derivative like **Ubuntu**),
then ripgrep can be installed using a binary `.deb` file provided in each
[ripgrep release](https://github.com/BurntSushi/ripgrep/releases).
```
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/12.1.1/ripgrep_12.1.1_amd64.deb
$ sudo dpkg -i ripgrep_12.1.1_amd64.deb
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/13.0.0/ripgrep_13.0.0_amd64.deb
$ sudo dpkg -i ripgrep_13.0.0_amd64.deb
```
If you run Debian Buster (currently Debian stable) or Debian sid, ripgrep is
[officially maintained by Debian](https://tracker.debian.org/pkg/rust-ripgrep).
If you run Debian stable, ripgrep is [officially maintained by
Debian](https://tracker.debian.org/pkg/rust-ripgrep), although its version may
be older than the `deb` package available in the previous step.
```
$ sudo apt-get install ripgrep
```
@@ -332,7 +345,7 @@ $ pkgman install ripgrep_x86
If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
* Note that the minimum supported version of Rust for ripgrep is **1.34.0**,
* Note that the minimum supported version of Rust for ripgrep is **1.70.0**,
although ripgrep may work with older versions.
* Note that the binary may be bigger than expected because it contains debug
symbols. This is intentional. To remove debug symbols and therefore reduce
@@ -347,7 +360,7 @@ $ cargo install ripgrep
ripgrep is written in Rust, so you'll need to grab a
[Rust installation](https://www.rust-lang.org/) in order to compile it.
ripgrep compiles with Rust 1.34.0 (stable) or newer. In general, ripgrep tracks
ripgrep compiles with Rust 1.65.0 (stable) or newer. In general, ripgrep tracks
the latest stable release of the Rust compiler.
To build ripgrep:
@@ -419,6 +432,14 @@ $ cargo test --all
from the repository root.
### Related tools
* [delta](https://github.com/dandavison/delta) is a syntax highlighting
pager that supports the `rg --json` output format. So all you need to do to
make it work is `rg --json pattern | delta`. See [delta's manual section on
grep](https://dandavison.github.io/delta/grep.html) for more details.
### Vulnerability reporting
For reporting a security vulnerability, please

@@ -26,6 +26,11 @@ Release Checklist
`cargo update -p ripgrep` so that the `Cargo.lock` is updated. Commit the
changes and create a new signed tag. Alternatively, use
`cargo-up --no-push --no-release Cargo.toml {VERSION}` to automate this.
* Push changes to GitHub, NOT including the tag. (But do not publish a new
version of ripgrep to crates.io yet.)
* Once CI for `master` finishes successfully, push the version tag. (Trying to
do this in one step seems to result in GitHub Actions not seeing the tag
push and thus not running the release workflow.)
* Wait for CI to finish creating the release. If the release build fails, then
delete the tag from GitHub, make fixes, re-tag, delete the release and push.
* Copy the relevant section of the CHANGELOG to the tagged release notes.

@@ -26,15 +26,13 @@ SUBTITLES_DIR = 'subtitles'
SUBTITLES_EN_NAME = 'en.txt'
SUBTITLES_EN_NAME_SAMPLE = 'en.sample.txt'
SUBTITLES_EN_NAME_GZ = '%s.gz' % SUBTITLES_EN_NAME
# SUBTITLES_EN_URL = 'http://opus.lingfil.uu.se/OpenSubtitles2016/mono/OpenSubtitles2016.raw.en.gz' # noqa
SUBTITLES_EN_URL = 'https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2016/mono/en.txt.gz' # noqa
SUBTITLES_RU_NAME = 'ru.txt'
SUBTITLES_RU_NAME_GZ = '%s.gz' % SUBTITLES_RU_NAME
# SUBTITLES_RU_URL = 'http://opus.lingfil.uu.se/OpenSubtitles2016/mono/OpenSubtitles2016.raw.ru.gz' # noqa
SUBTITLES_RU_URL = 'https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2016/mono/ru.txt.gz' # noqa
LINUX_DIR = 'linux'
LINUX_CLONE = 'git://github.com/BurntSushi/linux'
LINUX_CLONE = 'https://github.com/BurntSushi/linux'
# Grep takes locale settings from the environment. There is a *substantial*
# performance impact for enabling Unicode, so we need to handle this explicitly
@@ -546,7 +544,11 @@ def bench_subtitles_ru_literal(suite_dir):
Command('rg (lines)', ['rg', '-n', pat, ru]),
Command('ag (lines)', ['ag', '-s', pat, ru]),
Command('grep (lines)', ['grep', '-n', pat, ru], env=GREP_ASCII),
Command('ugrep (lines)', ['ugrep', '-n', pat, ru])
# ugrep incorrectly identifies this corpus as binary, but it is
# entirely valid UTF-8. So we tell ugrep to always treat the corpus
# as text even though this technically gives it an edge over other
# tools. (It no longer needs to check for binary data.)
Command('ugrep (lines)', ['ugrep', '-a', '-n', pat, ru])
])
@@ -564,7 +566,8 @@ def bench_subtitles_ru_literal_casei(suite_dir):
Command('grep (ASCII)', ['grep', '-E', '-i', pat, ru], env=GREP_ASCII),
Command('rg (lines)', ['rg', '-n', '-i', pat, ru]),
Command('ag (lines) (ASCII)', ['ag', '-i', pat, ru]),
Command('ugrep (lines) (ASCII)', ['ugrep', '-n', '-i', pat, ru])
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (lines) (ASCII)', ['ugrep', '-a', '-n', '-i', pat, ru])
])
@@ -588,7 +591,8 @@ def bench_subtitles_ru_literal_word(suite_dir):
Command('grep (ASCII)', [
'grep', '-nw', pat, ru,
], env=GREP_ASCII),
Command('ugrep (ASCII)', ['ugrep', '-nw', pat, ru]),
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (ASCII)', ['ugrep', '-anw', pat, ru]),
Command('rg', ['rg', '-nw', pat, ru]),
Command('grep', ['grep', '-nw', pat, ru], env=GREP_UNICODE),
])
@@ -612,7 +616,8 @@ def bench_subtitles_ru_alternate(suite_dir):
Command('rg (lines)', ['rg', '-n', pat, ru]),
Command('ag (lines)', ['ag', '-s', pat, ru]),
Command('grep (lines)', ['grep', '-E', '-n', pat, ru], env=GREP_ASCII),
Command('ugrep (lines)', ['ugrep', '-n', pat, ru]),
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (lines)', ['ugrep', '-an', pat, ru]),
Command('rg', ['rg', pat, ru]),
Command('grep', ['grep', '-E', pat, ru], env=GREP_ASCII),
])
@@ -637,7 +642,8 @@ def bench_subtitles_ru_alternate_casei(suite_dir):
Command('grep (ASCII)', [
'grep', '-E', '-ni', pat, ru,
], env=GREP_ASCII),
Command('ugrep (ASCII)', ['ugrep', '-n', '-i', pat, ru]),
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (ASCII)', ['ugrep', '-ani', pat, ru]),
Command('rg', ['rg', '-n', '-i', pat, ru]),
Command('grep', ['grep', '-E', '-ni', pat, ru], env=GREP_UNICODE),
])
@@ -654,10 +660,11 @@ def bench_subtitles_ru_surrounding_words(suite_dir):
return Benchmark(pattern=pat, commands=[
Command('rg', ['rg', '-n', pat, ru]),
Command('grep', ['grep', '-E', '-n', pat, ru], env=GREP_UNICODE),
Command('ugrep', ['ugrep', '-n', pat, ru]),
Command('ugrep', ['ugrep', '-an', pat, ru]),
Command('ag (ASCII)', ['ag', '-s', pat, ru]),
Command('grep (ASCII)', ['grep', '-E', '-n', pat, ru], env=GREP_ASCII),
Command('ugrep (ASCII)', ['ugrep', '-n', '-U', pat, ru]),
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (ASCII)', ['ugrep', '-a', '-n', '-U', pat, ru]),
])
@@ -676,11 +683,13 @@ def bench_subtitles_ru_no_literal(suite_dir):
return Benchmark(pattern=pat, commands=[
Command('rg', ['rg', '-n', pat, ru]),
Command('ugrep', ['ugrep', '-n', pat, ru]),
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep', ['ugrep', '-an', pat, ru]),
Command('rg (ASCII)', ['rg', '-n', '(?-u)' + pat, ru]),
Command('ag (ASCII)', ['ag', '-s', pat, ru]),
Command('grep (ASCII)', ['grep', '-E', '-n', pat, ru], env=GREP_ASCII),
Command('ugrep (ASCII)', ['ugrep', '-n', '-U', pat, ru])
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (ASCII)', ['ugrep', '-anU', pat, ru])
])

@@ -0,0 +1,38 @@
This directory contains updated benchmarks as of 2022-12-16. They were captured
via the benchsuite script at `benchsuite/benchsuite` from the root of this
repository. The command that was run:
$ ./benchsuite \
    --dir /dev/shm/benchsuite \
    --raw runs/2022-12-16-archlinux-duff/raw.csv \
    | tee runs/2022-12-16-archlinux-duff/summary
The versions of each tool are as follows:
$ rg --version
ripgrep 13.0.0 (rev 87c4a2b4b1)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)
$ grep -V
grep (GNU grep) 3.8
$ ag -V
ag version 2.2.0
Features:
+jit +lzma +zlib
$ git --version
git version 2.39.0
$ ugrep --version
ugrep 3.9.2 x86_64-pc-linux-gnu +avx2 +pcre2jit +zlib +bzip2 +lzma +lz4 +zstd
License BSD-3-Clause: <https://opensource.org/licenses/BSD-3-Clause>
Written by Robert van Engelen and others: <https://github.com/Genivia/ugrep>
The version of ripgrep used was compiled from source on commit 7f23cd63:
$ cargo build --release --features 'pcre2'
This was run on a machine with an Intel i9-12900K with 128GB of memory.
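The raw samples written by `--raw` can be condensed into per-tool averages. The following is a minimal sketch, not part of the benchsuite itself: the `summarize` helper is a hypothetical name, and it assumes the CSV uses the column names found in the header of `raw.csv` (`benchmark`, `name`, `duration` and so on). The sample rows are copied from the start of that file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarize(csv_text):
    """Return {(benchmark, name): (mean_seconds, sample_count)}.

    Hypothetical helper: groups the raw samples by benchmark and tool
    name, then averages the 'duration' column of each group.
    """
    samples = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row['benchmark'], row['name'])
        samples[key].append(float(row['duration']))
    return {key: (mean(vals), len(vals)) for key, vals in samples.items()}


# A few rows copied from raw.csv, used here only for illustration.
SAMPLE = """\
benchmark,warmup_iter,iter,name,command,duration,lines,env
linux_literal_default,1,3,rg,rg PM_RESUME,0.08678817749023438,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08307123184204102,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08347964286804199,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2955434322357178,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2954287528991699,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2938194274902344,39,
"""

if __name__ == '__main__':
    for (bench, name), (avg, n) in summarize(SAMPLE).items():
        print('%s %s mean=%.3fs n=%d' % (bench, name, avg, n))
```

To run this against a full capture, read the file's contents and pass them to `summarize` in place of `SAMPLE`. The `summary` file produced by the command above remains the authoritative report; this sketch only shows how the raw rows are shaped.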

@@ -0,0 +1,400 @@
benchmark,warmup_iter,iter,name,command,duration,lines,env
linux_literal_default,1,3,rg,rg PM_RESUME,0.08678817749023438,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08307123184204102,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08347964286804199,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2955434322357178,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2954287528991699,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2938194274902344,39,
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.23198556900024414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.22356963157653809,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.2189793586730957,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10710000991821289,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10364222526550293,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.1052248477935791,39,
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9994468688964844,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9939279556274414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9957931041717529,39,LC_ALL=en_US.UTF-8
linux_literal,1,3,rg,rg -n PM_RESUME,0.08603358268737793,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.0837090015411377,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.08435535430908203,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215503692626953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.32426929473876953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215982913970947,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2894856929779053,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2892603874206543,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.29217028617858887,39,
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.206068754196167,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.2218036651611328,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.20590710639953613,39,LC_ALL=C
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18692874908447266,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.19518327713012695,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18577361106872559,39,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08709383010864258,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08861064910888672,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08769798278808594,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.3218965530395508,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.30869364738464355,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.31044936180114746,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.2989068031311035,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.2996039390563965,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.29817700386047363,536,
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.2122786045074463,536,LC_ALL=C
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.20763754844665527,536,LC_ALL=C
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.220794677734375,536,LC_ALL=C
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.17305850982666016,536,
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.1745915412902832,536,
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.17526865005493164,536,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.08527851104736328,2160,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.08487534523010254,2160,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.0848684310913086,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.37945985794067383,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.36303210258483887,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.36359691619873047,2160,
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.9589834213256836,2160,LC_ALL=C
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.9206984043121338,2160,LC_ALL=C
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.8642933368682861,2160,LC_ALL=C
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.40503501892089844,2160,
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.4531714916229248,2160,
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.4397866725921631,2160,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08639907836914062,9,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08583569526672363,9,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08414363861083984,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.2853865623474121,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.2871377468109131,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.28753662109375,9,
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20428204536437988,9,LC_ALL=C
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20490717887878418,9,LC_ALL=C
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20840072631835938,9,LC_ALL=C
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.18790841102600098,9,
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.18659543991088867,9,
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.19104933738708496,9,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.19976496696472168,105,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.20618367195129395,105,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.19702935218811035,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.17758727073669434,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.17793798446655273,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.1872577667236328,105,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.19808244705200195,245,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.1979837417602539,245,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.1984400749206543,245,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.1819148063659668,105,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.17530512809753418,105,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.17999005317687988,105,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08527827262878418,247,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08541679382324219,247,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08553218841552734,247,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08484745025634766,233,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08466482162475586,233,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08487439155578613,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.3061795234680176,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.2993617057800293,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.29722046852111816,233,
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,4.257144451141357,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,3.852163076400757,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,3.8293941020965576,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.647632122039795,233,LC_ALL=C
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.6269629001617432,233,LC_ALL=C
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.5847914218902588,233,LC_ALL=C
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.1802208423614502,247,
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.17564702033996582,247,
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.1746981143951416,247,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.1799161434173584,233,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.18733000755310059,233,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.18859529495239258,233,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.26203155517578125,721,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.2615540027618408,721,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.2730247974395752,721,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.19902300834655762,720,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.20034146308898926,720,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.20192813873291016,720,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8269081115722656,1134,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8393104076385498,1134,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8293666839599609,1134,
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.334395408630371,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.338796854019165,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.36545991897583,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.1588926315307617,720,LC_ALL=C
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.132209062576294,720,LC_ALL=C
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.1407439708709717,720,LC_ALL=C
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.410162925720215,723,
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.405057668685913,723,
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.3945884704589844,723,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.23865604400634766,722,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.23371148109436035,722,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.2343149185180664,722,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08691263198852539,140,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08707070350646973,140,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08713960647583008,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.32947278022766113,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.33203840255737305,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.3292670249938965,140,
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.4576725959777832,140,LC_ALL=C
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.41936421394348145,140,LC_ALL=C
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.3639688491821289,140,LC_ALL=C
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17806458473205566,140,
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.18224716186523438,140,
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17795038223266602,140,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12421393394470215,241,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12235784530639648,241,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12151455879211426,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.529585599899292,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.5305526256561279,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.5311264991760254,241,
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.7589735984802246,241,LC_ALL=C
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.7852108478546143,241,LC_ALL=C
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.8308050632476807,241,LC_ALL=C
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17955923080444336,241,
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.1745290756225586,241,
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.1773686408996582,241,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1213979721069336,830,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1213991641998291,830,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.12620782852172852,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18207263946533203,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17281484603881836,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17368507385253906,830,
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.560560941696167,830,LC_ALL=C
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.563499927520752,830,LC_ALL=C
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.5916609764099121,830,LC_ALL=C
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.19600844383239746,830,
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18436980247497559,830,
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18594050407409668,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.871025562286377,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8636960983276367,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8680994510650635,830,
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9978001117706299,830,LC_ALL=C
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9385361671447754,830,LC_ALL=C
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0036489963531494,830,LC_ALL=C
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18918490409851074,830,
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1769108772277832,830,
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18808293342590332,830,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.21876287460327148,871,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2044692039489746,871,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2184743881225586,871,
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.224027156829834,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.223188877105713,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.223966598510742,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.671149492263794,871,LC_ALL=C
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.6705749034881592,871,LC_ALL=C
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.6700258255004883,871,LC_ALL=C
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2624058723449707,871,
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.25513339042663574,871,
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.26088857650756836,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.9144322872161865,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.866628885269165,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.9098389148712158,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.7860472202301025,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.7858343124389648,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.782252311706543,871,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.18424677848815918,830,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.19610810279846191,830,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.18711471557617188,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8301315307617188,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8689801692962646,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8279321193695068,830,
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0036842823028564,830,LC_ALL=C
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.002833604812622,830,LC_ALL=C
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9236147403717041,830,LC_ALL=C
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17717313766479492,830,
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18994617462158203,830,
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17972850799560547,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18804550170898438,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18867778778076172,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.19913530349731445,830,
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0044364929199219,830,LC_ALL=en_US.UTF-8
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0040032863616943,830,LC_ALL=en_US.UTF-8
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9627983570098877,830,LC_ALL=en_US.UTF-8
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24848055839538574,1094,
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24738383293151855,1094,
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24789118766784668,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.668708562850952,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.57511305809021,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.6714110374450684,1094,
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.0586187839508057,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.0227150917053223,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.075378179550171,1094,LC_ALL=C
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7863781452178955,1094,
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7874250411987305,1094,
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7867889404296875,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.18195557594299316,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.18239641189575195,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.1625690460205078,1094,
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6601614952087402,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6617567539215088,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6584677696228027,1094,LC_ALL=C
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,4.0028722286224365,1136,
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.991217851638794,1136,
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,4.00272274017334,1136,
subtitles_en_alternate_casei,1,3,grep (ASCII),grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.549154758453369,1136,LC_ALL=C
subtitles_en_alternate_casei,1,3,grep (ASCII),grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5468921661376953,1136,LC_ALL=C
subtitles_en_alternate_casei,1,3,grep (ASCII),grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5873491764068604,1136,LC_ALL=C
subtitles_en_alternate_casei,1,3,ugrep (ASCII),ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7872169017791748,1136,
subtitles_en_alternate_casei,1,3,ugrep (ASCII),ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.784674882888794,1136,
subtitles_en_alternate_casei,1,3,ugrep (ASCII),ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7882401943206787,1136,
subtitles_en_alternate_casei,1,3,rg,rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.4785435199737549,1136,
subtitles_en_alternate_casei,1,3,rg,rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.4940922260284424,1136,
subtitles_en_alternate_casei,1,3,rg,rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.4774627685546875,1136,
subtitles_en_alternate_casei,1,3,grep,grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5677175521850586,1136,LC_ALL=en_US.UTF-8
subtitles_en_alternate_casei,1,3,grep,grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.603273391723633,1136,LC_ALL=en_US.UTF-8
subtitles_en_alternate_casei,1,3,grep,grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5834741592407227,1136,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20238041877746582,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.2031264305114746,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20475172996520996,278,
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0288453102111816,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.044802188873291,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0432109832763672,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,43.00765633583069,278,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,42.832849740982056,278,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,42.915205240249634,278,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.083683967590332,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0841526985168457,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0850934982299805,,
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0116353034973145,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.9868073463439941,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0224814414978027,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8892502784729004,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8910088539123535,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8897674083709717,,
subtitles_en_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.11850643157959,22,
subtitles_en_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.1359670162200928,22,
subtitles_en_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.103114128112793,22,
subtitles_en_no_literal,1,3,ugrep,ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,13.050881385803223,22,
subtitles_en_no_literal,1,3,ugrep,ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,13.050772190093994,22,
subtitles_en_no_literal,1,3,ugrep,ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,13.05719804763794,22,
subtitles_en_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,1.9961926937103271,22,
subtitles_en_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.019721508026123,22,
subtitles_en_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,1.9965126514434814,22,
subtitles_en_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,6.849602222442627,302,
subtitles_en_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,6.813834190368652,302,
subtitles_en_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,6.8263633251190186,302,
subtitles_en_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,4.42924165725708,22,LC_ALL=C
subtitles_en_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,4.378557205200195,22,LC_ALL=C
subtitles_en_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,4.376646518707275,22,LC_ALL=C
subtitles_en_no_literal,1,3,ugrep (ASCII),ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,3.5110037326812744,22,
subtitles_en_no_literal,1,3,ugrep (ASCII),ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,3.5137360095977783,22,
subtitles_en_no_literal,1,3,ugrep (ASCII),ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,3.5051844120025635,22,
subtitles_ru_literal,1,3,rg,rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.13207745552062988,583,
subtitles_ru_literal,1,3,rg,rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.13084721565246582,583,
subtitles_ru_literal,1,3,rg,rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.13469862937927246,583,
subtitles_ru_literal,1,3,rg (no mmap),rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.18022370338439941,583,
subtitles_ru_literal,1,3,rg (no mmap),rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.1801767349243164,583,
subtitles_ru_literal,1,3,rg (no mmap),rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.17995166778564453,583,
subtitles_ru_literal,1,3,grep,grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5151040554046631,583,LC_ALL=C
subtitles_ru_literal,1,3,grep,grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5154542922973633,583,LC_ALL=C
subtitles_ru_literal,1,3,grep,grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.49927639961242676,583,LC_ALL=C
subtitles_ru_literal,1,3,rg (lines),rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.19464492797851562,583,
subtitles_ru_literal,1,3,rg (lines),rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.18920588493347168,583,
subtitles_ru_literal,1,3,rg (lines),rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.19465351104736328,583,
subtitles_ru_literal,1,3,ag (lines),ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,1.9595966339111328,583,
subtitles_ru_literal,1,3,ag (lines),ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,2.0014493465423584,583,
subtitles_ru_literal,1,3,ag (lines),ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,1.9567768573760986,583,
subtitles_ru_literal,1,3,grep (lines),grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8119180202484131,583,LC_ALL=C
subtitles_ru_literal,1,3,grep (lines),grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8111097812652588,583,LC_ALL=C
subtitles_ru_literal,1,3,grep (lines),grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8006868362426758,583,LC_ALL=C
subtitles_ru_literal,1,3,ugrep (lines),ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.70003342628479,583,
subtitles_ru_literal,1,3,ugrep (lines),ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.650275468826294,583,
subtitles_ru_literal,1,3,ugrep (lines),ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.689772367477417,583,
subtitles_ru_literal_casei,1,3,rg,rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.267578125,604,
subtitles_ru_literal_casei,1,3,rg,rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.2665982246398926,604,
subtitles_ru_literal_casei,1,3,rg,rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.26861572265625,604,
subtitles_ru_literal_casei,1,3,grep,grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,4.764627456665039,604,LC_ALL=en_US.UTF-8
subtitles_ru_literal_casei,1,3,grep,grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,4.767015695571899,604,LC_ALL=en_US.UTF-8
subtitles_ru_literal_casei,1,3,grep,grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,4.7688889503479,604,LC_ALL=en_US.UTF-8
subtitles_ru_literal_casei,1,3,grep (ASCII),grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5046737194061279,583,LC_ALL=C
subtitles_ru_literal_casei,1,3,grep (ASCII),grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5139875411987305,583,LC_ALL=C
subtitles_ru_literal_casei,1,3,grep (ASCII),grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.4993159770965576,583,LC_ALL=C
subtitles_ru_literal_casei,1,3,rg (lines),rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.33438658714294434,604,
subtitles_ru_literal_casei,1,3,rg (lines),rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.3398289680480957,604,
subtitles_ru_literal_casei,1,3,rg (lines),rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.3298227787017822,604,
subtitles_ru_literal_casei,1,3,ag (lines) (ASCII),ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.4468214511871338,,
subtitles_ru_literal_casei,1,3,ag (lines) (ASCII),ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.44559574127197266,,
subtitles_ru_literal_casei,1,3,ag (lines) (ASCII),ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.47882938385009766,,
subtitles_ru_literal_casei,1,3,ugrep (lines) (ASCII),ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7039575576782227,583,
subtitles_ru_literal_casei,1,3,ugrep (lines) (ASCII),ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.6490752696990967,583,
subtitles_ru_literal_casei,1,3,ugrep (lines) (ASCII),ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8081104755401611,583,
subtitles_ru_literal_word,1,3,rg (ASCII),rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt,0.20162224769592285,583,
subtitles_ru_literal_word,1,3,rg (ASCII),rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt,0.18215250968933105,583,
subtitles_ru_literal_word,1,3,rg (ASCII),rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt,0.20087671279907227,583,
subtitles_ru_literal_word,1,3,ag (ASCII),ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.48624587059020996,,
subtitles_ru_literal_word,1,3,ag (ASCII),ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5212516784667969,,
subtitles_ru_literal_word,1,3,ag (ASCII),ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.520557165145874,,
subtitles_ru_literal_word,1,3,grep (ASCII),grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8108196258544922,583,LC_ALL=C
subtitles_ru_literal_word,1,3,grep (ASCII),grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8121066093444824,583,LC_ALL=C
subtitles_ru_literal_word,1,3,grep (ASCII),grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7784581184387207,583,LC_ALL=C
subtitles_ru_literal_word,1,3,ugrep (ASCII),ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7469344139099121,583,
subtitles_ru_literal_word,1,3,ugrep (ASCII),ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.6838233470916748,583,
subtitles_ru_literal_word,1,3,ugrep (ASCII),ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.6921679973602295,583,
subtitles_ru_literal_word,1,3,rg,rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.19918251037597656,579,
subtitles_ru_literal_word,1,3,rg,rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.2046656608581543,579,
subtitles_ru_literal_word,1,3,rg,rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.1984848976135254,579,
subtitles_ru_literal_word,1,3,grep,grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.794173002243042,579,LC_ALL=en_US.UTF-8
subtitles_ru_literal_word,1,3,grep,grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7715346813201904,579,LC_ALL=en_US.UTF-8
subtitles_ru_literal_word,1,3,grep,grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8116705417633057,579,LC_ALL=en_US.UTF-8
subtitles_ru_alternate,1,3,rg (lines),rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6730976104736328,691,
subtitles_ru_alternate,1,3,rg (lines),rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.7020411491394043,691,
subtitles_ru_alternate,1,3,rg (lines),rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6693949699401855,691,
subtitles_ru_alternate,1,3,ag (lines),ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7100515365600586,691,
subtitles_ru_alternate,1,3,ag (lines),ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7458419799804688,691,
subtitles_ru_alternate,1,3,ag (lines),ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7115116119384766,691,
subtitles_ru_alternate,1,3,grep (lines),grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.703738451004028,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep (lines),grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.715883731842041,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep (lines),grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.712724924087524,691,LC_ALL=C
subtitles_ru_alternate,1,3,ugrep (lines),ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.276995420455933,691,
subtitles_ru_alternate,1,3,ugrep (lines),ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.304608345031738,691,
subtitles_ru_alternate,1,3,ugrep (lines),ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.322760820388794,691,
subtitles_ru_alternate,1,3,rg,rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6119842529296875,691,
subtitles_ru_alternate,1,3,rg,rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6368775367736816,691,
subtitles_ru_alternate,1,3,rg,rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6258070468902588,691,
subtitles_ru_alternate,1,3,grep,grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.4300291538238525,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep,grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.418199300765991,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep,grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.425868511199951,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,ag (ASCII),ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7216460704803467,691,
subtitles_ru_alternate_casei,1,3,ag (ASCII),ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7108607292175293,691,
subtitles_ru_alternate_casei,1,3,ag (ASCII),ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.747138500213623,691,
subtitles_ru_alternate_casei,1,3,grep (ASCII),grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.711230039596558,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,grep (ASCII),grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.709407329559326,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,grep (ASCII),grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.714034557342529,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,ugrep (ASCII),ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.305904626846313,691,
subtitles_ru_alternate_casei,1,3,ugrep (ASCII),ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.307406187057495,691,
subtitles_ru_alternate_casei,1,3,ugrep (ASCII),ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.288233995437622,691,
subtitles_ru_alternate_casei,1,3,rg,rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,3.673624277114868,735,
subtitles_ru_alternate_casei,1,3,rg,rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,3.6759188175201416,735,
subtitles_ru_alternate_casei,1,3,rg,rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,3.66877818107605,735,
subtitles_ru_alternate_casei,1,3,grep,grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.366282224655151,735,LC_ALL=en_US.UTF-8
subtitles_ru_alternate_casei,1,3,grep,grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.370524883270264,735,LC_ALL=en_US.UTF-8
subtitles_ru_alternate_casei,1,3,grep,grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.342163324356079,735,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20331382751464844,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.2034592628479004,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20407724380493164,278,
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0436389446258545,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0388383865356445,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0446207523345947,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.29245424270629883,1,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.29168128967285156,1,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.29593825340270996,1,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.085604190826416,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.083526372909546,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.1223819255828857,,
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.9905192852020264,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0222513675689697,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0216262340545654,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8875806331634521,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8861405849456787,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8898241519927979,,
subtitles_ru_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.237398147583008,41,
subtitles_ru_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.253706693649292,41,
subtitles_ru_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.2161178588867188,41,
subtitles_ru_no_literal,1,3,ugrep,ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,28.85959553718567,41,
subtitles_ru_no_literal,1,3,ugrep,ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,28.666419982910156,41,
subtitles_ru_no_literal,1,3,ugrep,ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,28.90555214881897,41,
subtitles_ru_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.051813840866089,,
subtitles_ru_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.026675224304199,,
subtitles_ru_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.027498245239258,,
subtitles_ru_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0998010635375977,,
subtitles_ru_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0900018215179443,,
subtitles_ru_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0901548862457275,,
subtitles_ru_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0691263675689697,,LC_ALL=C
subtitles_ru_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0875153541564941,,LC_ALL=C
subtitles_ru_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0997354984283447,,LC_ALL=C
subtitles_ru_no_literal,1,3,ugrep (ASCII),ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,0.8329172134399414,,
subtitles_ru_no_literal,1,3,ugrep (ASCII),ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,0.8292679786682129,,
subtitles_ru_no_literal,1,3,ugrep (ASCII),ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,0.8326950073242188,,
benchmark,warmup_iter,iter,name,command,duration,lines,env
linux_literal_default,1,3,rg,rg PM_RESUME,0.08678817749023438,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08307123184204102,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08347964286804199,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2955434322357178,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2954287528991699,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2938194274902344,39,
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.23198556900024414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.22356963157653809,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.2189793586730957,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10710000991821289,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10364222526550293,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.1052248477935791,39,
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9994468688964844,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9939279556274414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9957931041717529,39,LC_ALL=en_US.UTF-8
linux_literal,1,3,rg,rg -n PM_RESUME,0.08603358268737793,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.0837090015411377,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.08435535430908203,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215503692626953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.32426929473876953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215982913970947,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2894856929779053,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2892603874206543,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.29217028617858887,39,
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.206068754196167,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.2218036651611328,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.20590710639953613,39,LC_ALL=C
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18692874908447266,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.19518327713012695,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18577361106872559,39,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08709383010864258,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08861064910888672,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08769798278808594,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.3218965530395508,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.30869364738464355,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.31044936180114746,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.2989068031311035,536,
39 linux_literal_casei 1 3 ag (mmap) ag -i PM_RESUME 0.2996039390563965 536
40 linux_literal_casei 1 3 ag (mmap) ag -i PM_RESUME 0.29817700386047363 536
41 linux_literal_casei 1 3 git grep git grep -I -n -i PM_RESUME 0.2122786045074463 536 LC_ALL=C
42 linux_literal_casei 1 3 git grep git grep -I -n -i PM_RESUME 0.20763754844665527 536 LC_ALL=C
43 linux_literal_casei 1 3 git grep git grep -I -n -i PM_RESUME 0.220794677734375 536 LC_ALL=C
44 linux_literal_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./ 0.17305850982666016 536
45 linux_literal_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./ 0.1745915412902832 536
46 linux_literal_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./ 0.17526865005493164 536
47 linux_re_literal_suffix 1 3 rg rg -n [A-Z]+_RESUME 0.08527851104736328 2160
48 linux_re_literal_suffix 1 3 rg rg -n [A-Z]+_RESUME 0.08487534523010254 2160
49 linux_re_literal_suffix 1 3 rg rg -n [A-Z]+_RESUME 0.0848684310913086 2160
50 linux_re_literal_suffix 1 3 ag ag -s [A-Z]+_RESUME 0.37945985794067383 2160
51 linux_re_literal_suffix 1 3 ag ag -s [A-Z]+_RESUME 0.36303210258483887 2160
52 linux_re_literal_suffix 1 3 ag ag -s [A-Z]+_RESUME 0.36359691619873047 2160
53 linux_re_literal_suffix 1 3 git grep git grep -E -I -n [A-Z]+_RESUME 0.9589834213256836 2160 LC_ALL=C
54 linux_re_literal_suffix 1 3 git grep git grep -E -I -n [A-Z]+_RESUME 0.9206984043121338 2160 LC_ALL=C
55 linux_re_literal_suffix 1 3 git grep git grep -E -I -n [A-Z]+_RESUME 0.8642933368682861 2160 LC_ALL=C
56 linux_re_literal_suffix 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./ 0.40503501892089844 2160
57 linux_re_literal_suffix 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./ 0.4531714916229248 2160
58 linux_re_literal_suffix 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./ 0.4397866725921631 2160
59 linux_word 1 3 rg rg -n -w PM_RESUME 0.08639907836914062 9
60 linux_word 1 3 rg rg -n -w PM_RESUME 0.08583569526672363 9
61 linux_word 1 3 rg rg -n -w PM_RESUME 0.08414363861083984 9
62 linux_word 1 3 ag ag -s -w PM_RESUME 0.2853865623474121 9
63 linux_word 1 3 ag ag -s -w PM_RESUME 0.2871377468109131 9
64 linux_word 1 3 ag ag -s -w PM_RESUME 0.28753662109375 9
65 linux_word 1 3 git grep git grep -E -I -n -w PM_RESUME 0.20428204536437988 9 LC_ALL=C
66 linux_word 1 3 git grep git grep -E -I -n -w PM_RESUME 0.20490717887878418 9 LC_ALL=C
67 linux_word 1 3 git grep git grep -E -I -n -w PM_RESUME 0.20840072631835938 9 LC_ALL=C
68 linux_word 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./ 0.18790841102600098 9
69 linux_word 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./ 0.18659543991088867 9
70 linux_word 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./ 0.19104933738708496 9
71 linux_unicode_greek 1 3 rg rg -n \p{Greek} 0.19976496696472168 105
72 linux_unicode_greek 1 3 rg rg -n \p{Greek} 0.20618367195129395 105
73 linux_unicode_greek 1 3 rg rg -n \p{Greek} 0.19702935218811035 105
74 linux_unicode_greek 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./ 0.17758727073669434 105
75 linux_unicode_greek 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./ 0.17793798446655273 105
76 linux_unicode_greek 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./ 0.1872577667236328 105
77 linux_unicode_greek_casei 1 3 rg rg -n -i \p{Greek} 0.19808244705200195 245
78 linux_unicode_greek_casei 1 3 rg rg -n -i \p{Greek} 0.1979837417602539 245
79 linux_unicode_greek_casei 1 3 rg rg -n -i \p{Greek} 0.1984400749206543 245
80 linux_unicode_greek_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./ 0.1819148063659668 105
81 linux_unicode_greek_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./ 0.17530512809753418 105
82 linux_unicode_greek_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./ 0.17999005317687988 105
83 linux_unicode_word 1 3 rg rg -n \wAh 0.08527827262878418 247
84 linux_unicode_word 1 3 rg rg -n \wAh 0.08541679382324219 247
85 linux_unicode_word 1 3 rg rg -n \wAh 0.08553218841552734 247
86 linux_unicode_word 1 3 rg (ASCII) rg -n (?-u)\wAh 0.08484745025634766 233
87 linux_unicode_word 1 3 rg (ASCII) rg -n (?-u)\wAh 0.08466482162475586 233
88 linux_unicode_word 1 3 rg (ASCII) rg -n (?-u)\wAh 0.08487439155578613 233
89 linux_unicode_word 1 3 ag (ASCII) ag -s \wAh 0.3061795234680176 233
90 linux_unicode_word 1 3 ag (ASCII) ag -s \wAh 0.2993617057800293 233
91 linux_unicode_word 1 3 ag (ASCII) ag -s \wAh 0.29722046852111816 233
92 linux_unicode_word 1 3 git grep git grep -E -I -n \wAh 4.257144451141357 247 LC_ALL=en_US.UTF-8
93 linux_unicode_word 1 3 git grep git grep -E -I -n \wAh 3.852163076400757 247 LC_ALL=en_US.UTF-8
94 linux_unicode_word 1 3 git grep git grep -E -I -n \wAh 3.8293941020965576 247 LC_ALL=en_US.UTF-8
95 linux_unicode_word 1 3 git grep (ASCII) git grep -E -I -n \wAh 1.647632122039795 233 LC_ALL=C
96 linux_unicode_word 1 3 git grep (ASCII) git grep -E -I -n \wAh 1.6269629001617432 233 LC_ALL=C
97 linux_unicode_word 1 3 git grep (ASCII) git grep -E -I -n \wAh 1.5847914218902588 233 LC_ALL=C
98 linux_unicode_word 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \wAh ./ 0.1802208423614502 247
99 linux_unicode_word 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \wAh ./ 0.17564702033996582 247
100 linux_unicode_word 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \wAh ./ 0.1746981143951416 247
101 linux_unicode_word 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./ 0.1799161434173584 233
102 linux_unicode_word 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./ 0.18733000755310059 233
103 linux_unicode_word 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./ 0.18859529495239258 233
104 linux_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.26203155517578125 721
105 linux_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.2615540027618408 721
106 linux_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.2730247974395752 721
107 linux_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.19902300834655762 720
108 linux_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.20034146308898926 720
109 linux_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.20192813873291016 720
110 linux_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.8269081115722656 1134
111 linux_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.8393104076385498 1134
112 linux_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.8293666839599609 1134
113 linux_no_literal 1 3 git grep git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 7.334395408630371 721 LC_ALL=en_US.UTF-8
114 linux_no_literal 1 3 git grep git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 7.338796854019165 721 LC_ALL=en_US.UTF-8
115 linux_no_literal 1 3 git grep git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 7.36545991897583 721 LC_ALL=en_US.UTF-8
116 linux_no_literal 1 3 git grep (ASCII) git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 2.1588926315307617 720 LC_ALL=C
117 linux_no_literal 1 3 git grep (ASCII) git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 2.132209062576294 720 LC_ALL=C
118 linux_no_literal 1 3 git grep (ASCII) git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 2.1407439708709717 720 LC_ALL=C
119 linux_no_literal 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 3.410162925720215 723
120 linux_no_literal 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 3.405057668685913 723
121 linux_no_literal 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 3.3945884704589844 723
122 linux_no_literal 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 0.23865604400634766 722
123 linux_no_literal 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 0.23371148109436035 722
124 linux_no_literal 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 0.2343149185180664 722
125 linux_alternates 1 3 rg rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.08691263198852539 140
126 linux_alternates 1 3 rg rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.08707070350646973 140
127 linux_alternates 1 3 rg rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.08713960647583008 140
128 linux_alternates 1 3 ag ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.32947278022766113 140
129 linux_alternates 1 3 ag ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.33203840255737305 140
130 linux_alternates 1 3 ag ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.3292670249938965 140
131 linux_alternates 1 3 git grep git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.4576725959777832 140 LC_ALL=C
132 linux_alternates 1 3 git grep git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.41936421394348145 140 LC_ALL=C
133 linux_alternates 1 3 git grep git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.3639688491821289 140 LC_ALL=C
134 linux_alternates 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.17806458473205566 140
135 linux_alternates 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.18224716186523438 140
136 linux_alternates 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.17795038223266602 140
137 linux_alternates_casei 1 3 rg rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.12421393394470215 241
138 linux_alternates_casei 1 3 rg rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.12235784530639648 241
139 linux_alternates_casei 1 3 rg rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.12151455879211426 241
140 linux_alternates_casei 1 3 ag ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.529585599899292 241
141 linux_alternates_casei 1 3 ag ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.5305526256561279 241
142 linux_alternates_casei 1 3 ag ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.5311264991760254 241
143 linux_alternates_casei 1 3 git grep git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.7589735984802246 241 LC_ALL=C
144 linux_alternates_casei 1 3 git grep git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.7852108478546143 241 LC_ALL=C
145 linux_alternates_casei 1 3 git grep git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.8308050632476807 241 LC_ALL=C
146 linux_alternates_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.17955923080444336 241
147 linux_alternates_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.1745290756225586 241
148 linux_alternates_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.1773686408996582 241
149 subtitles_en_literal 1 3 rg rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.1213979721069336 830
150 subtitles_en_literal 1 3 rg rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.1213991641998291 830
151 subtitles_en_literal 1 3 rg rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.12620782852172852 830
152 subtitles_en_literal 1 3 rg (no mmap) rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18207263946533203 830
153 subtitles_en_literal 1 3 rg (no mmap) rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.17281484603881836 830
154 subtitles_en_literal 1 3 rg (no mmap) rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.17368507385253906 830
155 subtitles_en_literal 1 3 grep grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.560560941696167 830 LC_ALL=C
156 subtitles_en_literal 1 3 grep grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.563499927520752 830 LC_ALL=C
157 subtitles_en_literal 1 3 grep grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.5916609764099121 830 LC_ALL=C
158 subtitles_en_literal 1 3 rg (lines) rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.19600844383239746 830
159 subtitles_en_literal 1 3 rg (lines) rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18436980247497559 830
160 subtitles_en_literal 1 3 rg (lines) rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18594050407409668 830
161 subtitles_en_literal 1 3 ag (lines) ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.871025562286377 830
162 subtitles_en_literal 1 3 ag (lines) ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.8636960983276367 830
163 subtitles_en_literal 1 3 ag (lines) ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.8680994510650635 830
164 subtitles_en_literal 1 3 grep (lines) grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.9978001117706299 830 LC_ALL=C
165 subtitles_en_literal 1 3 grep (lines) grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.9385361671447754 830 LC_ALL=C
166 subtitles_en_literal 1 3 grep (lines) grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.0036489963531494 830 LC_ALL=C
167 subtitles_en_literal 1 3 ugrep (lines) ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18918490409851074 830
168 subtitles_en_literal 1 3 ugrep (lines) ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.1769108772277832 830
169 subtitles_en_literal 1 3 ugrep (lines) ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18808293342590332 830
170 subtitles_en_literal_casei 1 3 rg rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.21876287460327148 871
171 subtitles_en_literal_casei 1 3 rg rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.2044692039489746 871
172 subtitles_en_literal_casei 1 3 rg rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.2184743881225586 871
173 subtitles_en_literal_casei 1 3 grep grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 2.224027156829834 871 LC_ALL=en_US.UTF-8
174 subtitles_en_literal_casei 1 3 grep grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 2.223188877105713 871 LC_ALL=en_US.UTF-8
175 subtitles_en_literal_casei 1 3 grep grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 2.223966598510742 871 LC_ALL=en_US.UTF-8
176 subtitles_en_literal_casei 1 3 grep (ASCII) grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.671149492263794 871 LC_ALL=C
177 subtitles_en_literal_casei 1 3 grep (ASCII) grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.6705749034881592 871 LC_ALL=C
178 subtitles_en_literal_casei 1 3 grep (ASCII) grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.6700258255004883 871 LC_ALL=C
179 subtitles_en_literal_casei 1 3 rg (lines) rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.2624058723449707 871
180 subtitles_en_literal_casei 1 3 rg (lines) rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.25513339042663574 871
181 subtitles_en_literal_casei 1 3 rg (lines) rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.26088857650756836 871
182 subtitles_en_literal_casei 1 3 ag (lines) (ASCII) ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.9144322872161865 871
183 subtitles_en_literal_casei 1 3 ag (lines) (ASCII) ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.866628885269165 871
184 subtitles_en_literal_casei 1 3 ag (lines) (ASCII) ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.9098389148712158 871
185 subtitles_en_literal_casei 1 3 ugrep (lines) ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.7860472202301025 871
186 subtitles_en_literal_casei 1 3 ugrep (lines) ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.7858343124389648 871
187 subtitles_en_literal_casei 1 3 ugrep (lines) ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.782252311706543 871
188 subtitles_en_literal_word 1 3 rg (ASCII) rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt 0.18424677848815918 830
189 subtitles_en_literal_word 1 3 rg (ASCII) rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt 0.19610810279846191 830
190 subtitles_en_literal_word 1 3 rg (ASCII) rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt 0.18711471557617188 830
191 subtitles_en_literal_word 1 3 ag (ASCII) ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.8301315307617188 830
192 subtitles_en_literal_word 1 3 ag (ASCII) ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.8689801692962646 830
193 subtitles_en_literal_word 1 3 ag (ASCII) ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.8279321193695068 830
194 subtitles_en_literal_word 1 3 grep (ASCII) grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.0036842823028564 830 LC_ALL=C
195 subtitles_en_literal_word 1 3 grep (ASCII) grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.002833604812622 830 LC_ALL=C
196 subtitles_en_literal_word 1 3 grep (ASCII) grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.9236147403717041 830 LC_ALL=C
197 subtitles_en_literal_word 1 3 ugrep (ASCII) ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.17717313766479492 830
198 subtitles_en_literal_word 1 3 ugrep (ASCII) ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18994617462158203 830
199 subtitles_en_literal_word 1 3 ugrep (ASCII) ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.17972850799560547 830
200 subtitles_en_literal_word 1 3 rg rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18804550170898438 830
201 subtitles_en_literal_word 1 3 rg rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18867778778076172 830
202 subtitles_en_literal_word 1 3 rg rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.19913530349731445 830
203 subtitles_en_literal_word 1 3 grep grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.0044364929199219 830 LC_ALL=en_US.UTF-8
204 subtitles_en_literal_word 1 3 grep grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.0040032863616943 830 LC_ALL=en_US.UTF-8
205 subtitles_en_literal_word 1 3 grep grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.9627983570098877 830 LC_ALL=en_US.UTF-8
206 subtitles_en_alternate 1 3 rg (lines) rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.24848055839538574 1094
207 subtitles_en_alternate 1 3 rg (lines) rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.24738383293151855 1094
208 subtitles_en_alternate 1 3 rg (lines) rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.24789118766784668 1094
209 subtitles_en_alternate 1 3 ag (lines) ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.668708562850952 1094
210 subtitles_en_alternate 1 3 ag (lines) ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.57511305809021 1094
211 subtitles_en_alternate 1 3 ag (lines) ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.6714110374450684 1094
212 subtitles_en_alternate 1 3 grep (lines) grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.0586187839508057 1094 LC_ALL=C
213 subtitles_en_alternate 1 3 grep (lines) grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.0227150917053223 1094 LC_ALL=C
214 subtitles_en_alternate 1 3 grep (lines) grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.075378179550171 1094 LC_ALL=C
215 subtitles_en_alternate 1 3 ugrep (lines) ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7863781452178955 1094
216 subtitles_en_alternate 1 3 ugrep (lines) ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7874250411987305 1094
217 subtitles_en_alternate 1 3 ugrep (lines) ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7867889404296875 1094
218 subtitles_en_alternate 1 3 rg rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.18195557594299316 1094
219 subtitles_en_alternate 1 3 rg rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.18239641189575195 1094
220 subtitles_en_alternate 1 3 rg rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.1625690460205078 1094
221 subtitles_en_alternate 1 3 grep grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 1.6601614952087402 1094 LC_ALL=C
222 subtitles_en_alternate 1 3 grep grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 1.6617567539215088 1094 LC_ALL=C
223 subtitles_en_alternate 1 3 grep grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 1.6584677696228027 1094 LC_ALL=C
224 subtitles_en_alternate_casei 1 3 ag (ASCII) ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 4.0028722286224365 1136
225 subtitles_en_alternate_casei 1 3 ag (ASCII) ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.991217851638794 1136
226 subtitles_en_alternate_casei 1 3 ag (ASCII) ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 4.00272274017334 1136
227 subtitles_en_alternate_casei 1 3 grep (ASCII) grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.549154758453369 1136 LC_ALL=C
228 subtitles_en_alternate_casei 1 3 grep (ASCII) grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5468921661376953 1136 LC_ALL=C
229 subtitles_en_alternate_casei 1 3 grep (ASCII) grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5873491764068604 1136 LC_ALL=C
230 subtitles_en_alternate_casei 1 3 ugrep (ASCII) ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7872169017791748 1136
231 subtitles_en_alternate_casei 1 3 ugrep (ASCII) ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.784674882888794 1136
232 subtitles_en_alternate_casei 1 3 ugrep (ASCII) ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7882401943206787 1136
233 subtitles_en_alternate_casei 1 3 rg rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.4785435199737549 1136
234 subtitles_en_alternate_casei 1 3 rg rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.4940922260284424 1136
235 subtitles_en_alternate_casei 1 3 rg rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.4774627685546875 1136
236 subtitles_en_alternate_casei 1 3 grep grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5677175521850586 1136 LC_ALL=en_US.UTF-8
237 subtitles_en_alternate_casei 1 3 grep grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.603273391723633 1136 LC_ALL=en_US.UTF-8
238 subtitles_en_alternate_casei 1 3 grep grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5834741592407227 1136 LC_ALL=en_US.UTF-8
239 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20238041877746582 278
240 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.2031264305114746 278
241 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20475172996520996 278
242 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0288453102111816 278 LC_ALL=en_US.UTF-8
243 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.044802188873291 278 LC_ALL=en_US.UTF-8
244 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0432109832763672 278 LC_ALL=en_US.UTF-8
245 subtitles_ru_surrounding_words 1 3 ugrep ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 43.00765633583069 278
246 subtitles_ru_surrounding_words 1 3 ugrep ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 42.832849740982056 278
247 subtitles_ru_surrounding_words 1 3 ugrep ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 42.915205240249634 278
248 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.083683967590332
249 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0841526985168457
250 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0850934982299805
251 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0116353034973145 LC_ALL=C
252 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.9868073463439941 LC_ALL=C
253 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0224814414978027 LC_ALL=C
254 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8892502784729004
255 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8910088539123535
256 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8897674083709717
257 subtitles_en_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.11850643157959 22
258 subtitles_en_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.1359670162200928 22
259 subtitles_en_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.103114128112793 22
260 subtitles_en_no_literal 1 3 ugrep ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 13.050881385803223 22
261 subtitles_en_no_literal 1 3 ugrep ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 13.050772190093994 22
262 subtitles_en_no_literal 1 3 ugrep ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 13.05719804763794 22
263 subtitles_en_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 1.9961926937103271 22
264 subtitles_en_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.019721508026123 22
265 subtitles_en_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 1.9965126514434814 22
266 subtitles_en_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 6.849602222442627 302
267 subtitles_en_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 6.813834190368652 302
268 subtitles_en_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 6.8263633251190186 302
269 subtitles_en_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 4.42924165725708 22 LC_ALL=C
270 subtitles_en_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 4.378557205200195 22 LC_ALL=C
271 subtitles_en_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 4.376646518707275 22 LC_ALL=C
272 subtitles_en_no_literal 1 3 ugrep (ASCII) ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 3.5110037326812744 22
273 subtitles_en_no_literal 1 3 ugrep (ASCII) ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 3.5137360095977783 22
274 subtitles_en_no_literal 1 3 ugrep (ASCII) ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 3.5051844120025635 22
275 subtitles_ru_literal 1 3 rg rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.13207745552062988 583
276 subtitles_ru_literal 1 3 rg rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.13084721565246582 583
277 subtitles_ru_literal 1 3 rg rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.13469862937927246 583
278 subtitles_ru_literal 1 3 rg (no mmap) rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.18022370338439941 583
279 subtitles_ru_literal 1 3 rg (no mmap) rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.1801767349243164 583
280 subtitles_ru_literal 1 3 rg (no mmap) rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.17995166778564453 583
281 subtitles_ru_literal 1 3 grep grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5151040554046631 583 LC_ALL=C
282 subtitles_ru_literal 1 3 grep grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5154542922973633 583 LC_ALL=C
283 subtitles_ru_literal 1 3 grep grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.49927639961242676 583 LC_ALL=C
284 subtitles_ru_literal 1 3 rg (lines) rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.19464492797851562 583
285 subtitles_ru_literal 1 3 rg (lines) rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.18920588493347168 583
286 subtitles_ru_literal 1 3 rg (lines) rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.19465351104736328 583
287 subtitles_ru_literal 1 3 ag (lines) ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 1.9595966339111328 583
288 subtitles_ru_literal 1 3 ag (lines) ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 2.0014493465423584 583
289 subtitles_ru_literal 1 3 ag (lines) ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 1.9567768573760986 583
290 subtitles_ru_literal 1 3 grep (lines) grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8119180202484131 583 LC_ALL=C
291 subtitles_ru_literal 1 3 grep (lines) grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8111097812652588 583 LC_ALL=C
292 subtitles_ru_literal 1 3 grep (lines) grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8006868362426758 583 LC_ALL=C
293 subtitles_ru_literal 1 3 ugrep (lines) ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.70003342628479 583
294 subtitles_ru_literal 1 3 ugrep (lines) ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.650275468826294 583
295 subtitles_ru_literal 1 3 ugrep (lines) ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.689772367477417 583
296 subtitles_ru_literal_casei 1 3 rg rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.267578125 604
297 subtitles_ru_literal_casei 1 3 rg rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.2665982246398926 604
298 subtitles_ru_literal_casei 1 3 rg rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.26861572265625 604
299 subtitles_ru_literal_casei 1 3 grep grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 4.764627456665039 604 LC_ALL=en_US.UTF-8
300 subtitles_ru_literal_casei 1 3 grep grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 4.767015695571899 604 LC_ALL=en_US.UTF-8
301 subtitles_ru_literal_casei 1 3 grep grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 4.7688889503479 604 LC_ALL=en_US.UTF-8
302 subtitles_ru_literal_casei 1 3 grep (ASCII) grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5046737194061279 583 LC_ALL=C
303 subtitles_ru_literal_casei 1 3 grep (ASCII) grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5139875411987305 583 LC_ALL=C
304 subtitles_ru_literal_casei 1 3 grep (ASCII) grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.4993159770965576 583 LC_ALL=C
305 subtitles_ru_literal_casei 1 3 rg (lines) rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.33438658714294434 604
306 subtitles_ru_literal_casei 1 3 rg (lines) rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.3398289680480957 604
307 subtitles_ru_literal_casei 1 3 rg (lines) rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.3298227787017822 604
308 subtitles_ru_literal_casei 1 3 ag (lines) (ASCII) ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.4468214511871338
309 subtitles_ru_literal_casei 1 3 ag (lines) (ASCII) ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.44559574127197266
310 subtitles_ru_literal_casei 1 3 ag (lines) (ASCII) ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.47882938385009766
311 subtitles_ru_literal_casei 1 3 ugrep (lines) (ASCII) ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7039575576782227 583
312 subtitles_ru_literal_casei 1 3 ugrep (lines) (ASCII) ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.6490752696990967 583
313 subtitles_ru_literal_casei 1 3 ugrep (lines) (ASCII) ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8081104755401611 583
314 subtitles_ru_literal_word 1 3 rg (ASCII) rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt 0.20162224769592285 583
315 subtitles_ru_literal_word 1 3 rg (ASCII) rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt 0.18215250968933105 583
316 subtitles_ru_literal_word 1 3 rg (ASCII) rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt 0.20087671279907227 583
317 subtitles_ru_literal_word 1 3 ag (ASCII) ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.48624587059020996
318 subtitles_ru_literal_word 1 3 ag (ASCII) ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5212516784667969
319 subtitles_ru_literal_word 1 3 ag (ASCII) ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.520557165145874
320 subtitles_ru_literal_word 1 3 grep (ASCII) grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8108196258544922 583 LC_ALL=C
321 subtitles_ru_literal_word 1 3 grep (ASCII) grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8121066093444824 583 LC_ALL=C
322 subtitles_ru_literal_word 1 3 grep (ASCII) grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7784581184387207 583 LC_ALL=C
323 subtitles_ru_literal_word 1 3 ugrep (ASCII) ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7469344139099121 583
324 subtitles_ru_literal_word 1 3 ugrep (ASCII) ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.6838233470916748 583
325 subtitles_ru_literal_word 1 3 ugrep (ASCII) ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.6921679973602295 583
326 subtitles_ru_literal_word 1 3 rg rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.19918251037597656 579
327 subtitles_ru_literal_word 1 3 rg rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.2046656608581543 579
328 subtitles_ru_literal_word 1 3 rg rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.1984848976135254 579
329 subtitles_ru_literal_word 1 3 grep grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.794173002243042 579 LC_ALL=en_US.UTF-8
330 subtitles_ru_literal_word 1 3 grep grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7715346813201904 579 LC_ALL=en_US.UTF-8
331 subtitles_ru_literal_word 1 3 grep grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8116705417633057 579 LC_ALL=en_US.UTF-8
332 subtitles_ru_alternate 1 3 rg (lines) rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6730976104736328 691
333 subtitles_ru_alternate 1 3 rg (lines) rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.7020411491394043 691
334 subtitles_ru_alternate 1 3 rg (lines) rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6693949699401855 691
335 subtitles_ru_alternate 1 3 ag (lines) ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7100515365600586 691
336 subtitles_ru_alternate 1 3 ag (lines) ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7458419799804688 691
337 subtitles_ru_alternate 1 3 ag (lines) ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7115116119384766 691
338 subtitles_ru_alternate 1 3 grep (lines) grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.703738451004028 691 LC_ALL=C
339 subtitles_ru_alternate 1 3 grep (lines) grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.715883731842041 691 LC_ALL=C
340 subtitles_ru_alternate 1 3 grep (lines) grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.712724924087524 691 LC_ALL=C
341 subtitles_ru_alternate 1 3 ugrep (lines) ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.276995420455933 691
342 subtitles_ru_alternate 1 3 ugrep (lines) ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.304608345031738 691
343 subtitles_ru_alternate 1 3 ugrep (lines) ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.322760820388794 691
344 subtitles_ru_alternate 1 3 rg rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6119842529296875 691
345 subtitles_ru_alternate 1 3 rg rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6368775367736816 691
346 subtitles_ru_alternate 1 3 rg rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6258070468902588 691
347 subtitles_ru_alternate 1 3 grep grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.4300291538238525 691 LC_ALL=C
348 subtitles_ru_alternate 1 3 grep grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.418199300765991 691 LC_ALL=C
349 subtitles_ru_alternate 1 3 grep grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.425868511199951 691 LC_ALL=C
350 subtitles_ru_alternate_casei 1 3 ag (ASCII) ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7216460704803467 691
351 subtitles_ru_alternate_casei 1 3 ag (ASCII) ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7108607292175293 691
352 subtitles_ru_alternate_casei 1 3 ag (ASCII) ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.747138500213623 691
353 subtitles_ru_alternate_casei 1 3 grep (ASCII) grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.711230039596558 691 LC_ALL=C
354 subtitles_ru_alternate_casei 1 3 grep (ASCII) grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.709407329559326 691 LC_ALL=C
355 subtitles_ru_alternate_casei 1 3 grep (ASCII) grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.714034557342529 691 LC_ALL=C
356 subtitles_ru_alternate_casei 1 3 ugrep (ASCII) ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.305904626846313 691
357 subtitles_ru_alternate_casei 1 3 ugrep (ASCII) ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.307406187057495 691
358 subtitles_ru_alternate_casei 1 3 ugrep (ASCII) ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.288233995437622 691
359 subtitles_ru_alternate_casei 1 3 rg rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 3.673624277114868 735
360 subtitles_ru_alternate_casei 1 3 rg rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 3.6759188175201416 735
361 subtitles_ru_alternate_casei 1 3 rg rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 3.66877818107605 735
362 subtitles_ru_alternate_casei 1 3 grep grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.366282224655151 735 LC_ALL=en_US.UTF-8
363 subtitles_ru_alternate_casei 1 3 grep grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.370524883270264 735 LC_ALL=en_US.UTF-8
364 subtitles_ru_alternate_casei 1 3 grep grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.342163324356079 735 LC_ALL=en_US.UTF-8
365 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20331382751464844 278
366 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.2034592628479004 278
367 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20407724380493164 278
368 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0436389446258545 278 LC_ALL=en_US.UTF-8
369 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0388383865356445 278 LC_ALL=en_US.UTF-8
370 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0446207523345947 278 LC_ALL=en_US.UTF-8
371 subtitles_ru_surrounding_words 1 3 ugrep ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.29245424270629883 1
372 subtitles_ru_surrounding_words 1 3 ugrep ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.29168128967285156 1
373 subtitles_ru_surrounding_words 1 3 ugrep ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.29593825340270996 1
374 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.085604190826416
375 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.083526372909546
376 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.1223819255828857
377 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.9905192852020264 LC_ALL=C
378 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0222513675689697 LC_ALL=C
379 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0216262340545654 LC_ALL=C
380 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8875806331634521
381 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8861405849456787
382 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8898241519927979
383 subtitles_ru_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.237398147583008 41
384 subtitles_ru_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.253706693649292 41
385 subtitles_ru_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.2161178588867188 41
386 subtitles_ru_no_literal 1 3 ugrep ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 28.85959553718567 41
387 subtitles_ru_no_literal 1 3 ugrep ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 28.666419982910156 41
388 subtitles_ru_no_literal 1 3 ugrep ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 28.90555214881897 41
389 subtitles_ru_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.051813840866089
390 subtitles_ru_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.026675224304199
391 subtitles_ru_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.027498245239258
392 subtitles_ru_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0998010635375977
393 subtitles_ru_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0900018215179443
394 subtitles_ru_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0901548862457275
395 subtitles_ru_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0691263675689697 LC_ALL=C
396 subtitles_ru_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0875153541564941 LC_ALL=C
397 subtitles_ru_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0997354984283447 LC_ALL=C
398 subtitles_ru_no_literal 1 3 ugrep (ASCII) ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 0.8329172134399414
399 subtitles_ru_no_literal 1 3 ugrep (ASCII) ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 0.8292679786682129
400 subtitles_ru_no_literal 1 3 ugrep (ASCII) ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 0.8326950073242188

View File

@@ -0,0 +1,208 @@
linux_literal_default (pattern: PM_RESUME)
------------------------------------------
rg* 0.084 +/- 0.002 (lines: 39)*
ag 0.295 +/- 0.001 (lines: 39)
git grep 0.225 +/- 0.007 (lines: 39)
ugrep 0.105 +/- 0.002 (lines: 39)
grep 0.996 +/- 0.003 (lines: 39)
linux_literal (pattern: PM_RESUME)
----------------------------------
rg* 0.085 +/- 0.001 (lines: 39)*
rg (mmap) 0.322 +/- 0.002 (lines: 39)
ag (mmap) 0.290 +/- 0.002 (lines: 39)
git grep 0.211 +/- 0.009 (lines: 39)
ugrep 0.189 +/- 0.005 (lines: 39)
linux_literal_casei (pattern: PM_RESUME)
----------------------------------------
rg* 0.088 +/- 0.001 (lines: 536)*
rg (mmap) 0.314 +/- 0.007 (lines: 536)
ag (mmap) 0.299 +/- 0.001 (lines: 536)
git grep 0.214 +/- 0.007 (lines: 536)
ugrep 0.174 +/- 0.001 (lines: 536)
linux_re_literal_suffix (pattern: [A-Z]+_RESUME)
------------------------------------------------
rg* 0.085 +/- 0.000 (lines: 2160)*
ag 0.369 +/- 0.009 (lines: 2160)
git grep 0.915 +/- 0.048 (lines: 2160)
ugrep 0.433 +/- 0.025 (lines: 2160)
linux_word (pattern: PM_RESUME)
-------------------------------
rg* 0.085 +/- 0.001 (lines: 9)*
ag 0.287 +/- 0.001 (lines: 9)
git grep 0.206 +/- 0.002 (lines: 9)
ugrep 0.189 +/- 0.002 (lines: 9)
linux_unicode_greek (pattern: \p{Greek})
----------------------------------------
rg 0.201 +/- 0.005 (lines: 105)
ugrep* 0.181 +/- 0.005 (lines: 105)*
linux_unicode_greek_casei (pattern: \p{Greek})
----------------------------------------------
rg 0.198 +/- 0.000 (lines: 245)
ugrep* 0.179 +/- 0.003 (lines: 105)*
linux_unicode_word (pattern: \wAh)
----------------------------------
rg 0.085 +/- 0.000 (lines: 247)
rg (ASCII)* 0.085 +/- 0.000 (lines: 233)*
ag (ASCII) 0.301 +/- 0.005 (lines: 233)
git grep 3.980 +/- 0.241 (lines: 247)
git grep (ASCII) 1.620 +/- 0.032 (lines: 233)
ugrep 0.177 +/- 0.003 (lines: 247)
ugrep (ASCII) 0.185 +/- 0.005 (lines: 233)
linux_no_literal (pattern: \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5})
-----------------------------------------------------------------
rg 0.266 +/- 0.006 (lines: 721)
rg (ASCII)* 0.200 +/- 0.001 (lines: 720)*
ag (ASCII) 0.832 +/- 0.007 (lines: 1134)
git grep 7.346 +/- 0.017 (lines: 721)
git grep (ASCII) 2.144 +/- 0.014 (lines: 720)
ugrep 3.403 +/- 0.008 (lines: 723)
ugrep (ASCII) 0.236 +/- 0.003 (lines: 722)
linux_alternates (pattern: ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT)
-------------------------------------------------------------------------
rg* 0.087 +/- 0.000 (lines: 140)*
ag 0.330 +/- 0.002 (lines: 140)
git grep 0.414 +/- 0.047 (lines: 140)
ugrep 0.179 +/- 0.002 (lines: 140)
linux_alternates_casei (pattern: ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT)
-------------------------------------------------------------------------------
rg* 0.123 +/- 0.001 (lines: 241)*
ag 0.530 +/- 0.001 (lines: 241)
git grep 0.792 +/- 0.036 (lines: 241)
ugrep 0.177 +/- 0.003 (lines: 241)
subtitles_en_literal (pattern: Sherlock Holmes)
-----------------------------------------------
rg* 0.123 +/- 0.003 (lines: 830)*
rg (no mmap) 0.176 +/- 0.005 (lines: 830)
grep 0.572 +/- 0.017 (lines: 830)
rg (lines) 0.189 +/- 0.006 (lines: 830)
ag (lines) 1.868 +/- 0.004 (lines: 830)
grep (lines) 0.980 +/- 0.036 (lines: 830)
ugrep (lines) 0.185 +/- 0.007 (lines: 830)
subtitles_en_literal_casei (pattern: Sherlock Holmes)
-----------------------------------------------------
rg* 0.214 +/- 0.008 (lines: 871)*
grep 2.224 +/- 0.000 (lines: 871)
grep (ASCII) 0.671 +/- 0.001 (lines: 871)
rg (lines) 0.259 +/- 0.004 (lines: 871)
ag (lines) (ASCII) 1.897 +/- 0.026 (lines: 871)
ugrep (lines) 0.785 +/- 0.002 (lines: 871)
subtitles_en_literal_word (pattern: Sherlock Holmes)
----------------------------------------------------
rg (ASCII) 0.189 +/- 0.006 (lines: 830)
ag (ASCII) 1.842 +/- 0.023 (lines: 830)
grep (ASCII) 0.977 +/- 0.046 (lines: 830)
ugrep (ASCII)* 0.182 +/- 0.007 (lines: 830)*
rg 0.192 +/- 0.006 (lines: 830)
grep 0.990 +/- 0.024 (lines: 830)
subtitles_en_alternate (pattern: Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)
---------------------------------------------------------------------------------------------------------------
rg (lines) 0.248 +/- 0.001 (lines: 1094)
ag (lines) 2.638 +/- 0.055 (lines: 1094)
grep (lines) 2.052 +/- 0.027 (lines: 1094)
ugrep (lines) 0.787 +/- 0.001 (lines: 1094)
rg* 0.176 +/- 0.011 (lines: 1094)*
grep 1.660 +/- 0.002 (lines: 1094)
subtitles_en_alternate_casei (pattern: Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)
---------------------------------------------------------------------------------------------------------------------
ag (ASCII) 3.999 +/- 0.007 (lines: 1136)
grep (ASCII) 3.561 +/- 0.023 (lines: 1136)
ugrep (ASCII) 0.787 +/- 0.002 (lines: 1136)
rg* 0.483 +/- 0.009 (lines: 1136)*
grep 3.585 +/- 0.018 (lines: 1136)
subtitles_en_surrounding_words (pattern: \w+\s+Holmes\s+\w+)
------------------------------------------------------------
rg 0.200 +/- 0.001 (lines: 483)
grep 1.303 +/- 0.040 (lines: 483)
ugrep 43.220 +/- 0.047 (lines: 483)
rg (ASCII)* 0.197 +/- 0.000 (lines: 483)*
ag (ASCII) 5.223 +/- 0.056 (lines: 489)
grep (ASCII) 1.316 +/- 0.043 (lines: 483)
ugrep (ASCII) 17.647 +/- 0.219 (lines: 483)
subtitles_en_no_literal (pattern: \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5})
----------------------------------------------------------------------------------------
rg 2.119 +/- 0.016 (lines: 22)
ugrep 13.053 +/- 0.004 (lines: 22)
rg (ASCII)* 2.004 +/- 0.013 (lines: 22)*
ag (ASCII) 6.830 +/- 0.018 (lines: 302)
grep (ASCII) 4.395 +/- 0.030 (lines: 22)
ugrep (ASCII) 3.510 +/- 0.004 (lines: 22)
subtitles_ru_literal (pattern: Шерлок Холмс)
--------------------------------------------
rg* 0.133 +/- 0.002 (lines: 583)*
rg (no mmap) 0.180 +/- 0.000 (lines: 583)
grep 0.510 +/- 0.009 (lines: 583)
rg (lines) 0.193 +/- 0.003 (lines: 583)
ag (lines) 1.973 +/- 0.025 (lines: 583)
grep (lines) 0.808 +/- 0.006 (lines: 583)
ugrep (lines) 0.680 +/- 0.026 (lines: 583)
subtitles_ru_literal_casei (pattern: Шерлок Холмс)
--------------------------------------------------
rg* 0.268 +/- 0.001 (lines: 604)*
grep 4.767 +/- 0.002 (lines: 604)
grep (ASCII) 0.506 +/- 0.007 (lines: 583)
rg (lines) 0.335 +/- 0.005 (lines: 604)
ag (lines) (ASCII) 0.457 +/- 0.019 (lines: 0)
ugrep (lines) (ASCII) 0.720 +/- 0.081 (lines: 583)
subtitles_ru_literal_word (pattern: Шерлок Холмс)
-------------------------------------------------
rg (ASCII)* 0.195 +/- 0.011 (lines: 583)*
ag (ASCII) 0.509 +/- 0.020 (lines: 0)
grep (ASCII) 0.800 +/- 0.019 (lines: 583)
ugrep (ASCII) 0.708 +/- 0.034 (lines: 583)
rg 0.201 +/- 0.003 (lines: 579)
grep 0.792 +/- 0.020 (lines: 579)
subtitles_ru_alternate (pattern: Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти)
-----------------------------------------------------------------------------------------------------------
rg (lines) 0.682 +/- 0.018 (lines: 691)
ag (lines) 2.722 +/- 0.020 (lines: 691)
grep (lines) 5.711 +/- 0.006 (lines: 691)
ugrep (lines) 8.301 +/- 0.023 (lines: 691)
rg* 0.625 +/- 0.012 (lines: 691)*
grep 5.425 +/- 0.006 (lines: 691)
subtitles_ru_alternate_casei (pattern: Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти)
-----------------------------------------------------------------------------------------------------------------
ag (ASCII)* 2.727 +/- 0.019 (lines: 691)*
grep (ASCII) 5.712 +/- 0.002 (lines: 691)
ugrep (ASCII) 8.301 +/- 0.011 (lines: 691)
rg 3.673 +/- 0.004 (lines: 735)
grep 5.360 +/- 0.015 (lines: 735)
subtitles_ru_surrounding_words (pattern: \w+\s+Холмс\s+\w+)
-----------------------------------------------------------
rg* 0.203 +/- 0.001 (lines: 278)*
grep 1.039 +/- 0.009 (lines: 278)
ugrep 42.919 +/- 0.087 (lines: 278)
ag (ASCII) 1.084 +/- 0.001 (lines: 0)
grep (ASCII) 1.007 +/- 0.018 (lines: 0)
ugrep (ASCII) 0.890 +/- 0.001 (lines: 0)
subtitles_ru_no_literal (pattern: \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5})
----------------------------------------------------------------------------------------
rg 2.236 +/- 0.019 (lines: 41)
ugrep 28.811 +/- 0.127 (lines: 41)
rg (ASCII) 2.035 +/- 0.014 (lines: 0)
ag (ASCII) 1.093 +/- 0.006 (lines: 0)
grep (ASCII) 1.085 +/- 0.015 (lines: 0)
ugrep (ASCII)* 0.832 +/- 0.002 (lines: 0)*
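
The `mean +/- deviation` figures in this summary appear to be derived from the three raw durations recorded per command. A minimal sketch, assuming the deviation is the sample standard deviation (dividing by n - 1), reproduces the `rg` row of `subtitles_ru_no_literal` from its raw samples. The function names `mean` and `sample_stddev` are illustrative and not taken from the benchsuite itself.

```rust
/// Arithmetic mean of the samples.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Sample standard deviation (divides by n - 1). This choice is an
/// assumption; it happens to match the summary figures above.
fn sample_stddev(samples: &[f64]) -> f64 {
    let m = mean(samples);
    let sum_sq: f64 = samples.iter().map(|x| (x - m).powi(2)).sum();
    (sum_sq / (samples.len() as f64 - 1.0)).sqrt()
}

fn main() {
    // The three raw `rg` durations for subtitles_ru_no_literal, copied
    // from the raw data file.
    let samples = [2.237398147583008, 2.253706693649292, 2.2161178588867188];
    // Format the same way as the summary rows: three decimal places.
    println!("rg {:.3} +/- {:.3}", mean(&samples), sample_stddev(&samples));
}
```

With these inputs the result rounds to `2.236 +/- 0.019`, which agrees with the summary row. A population standard deviation (dividing by n) would give roughly 0.015 instead.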

View File

@@ -39,4 +39,4 @@ cp complete/_rg "$DEPLOY_DIR/"
# Since we're distributing the dpkg, we don't know whether the user will have
# PCRE2 installed, so just do a static build.
-PCRE2_SYS_STATIC=1 cargo deb
+PCRE2_SYS_STATIC=1 cargo deb --target x86_64-unknown-linux-musl

View File

@@ -1,6 +1,16 @@
#!/bin/sh
+# This script gets run in weird environments that have been stripped of just
+# about every inessential thing. In order to keep this script versatile, we
+# just install 'sudo' and use it like normal if it doesn't exist. If it doesn't
+# exist, we assume we're root. (Otherwise we ain't doing much of anything
+# anyway.)
+if ! command -V sudo; then
+  apt-get update
+  apt-get install -y --no-install-recommends sudo
+fi
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
  asciidoctor \
-  zsh xz-utils liblz4-tool musl-tools
+  zsh xz-utils liblz4-tool musl-tools \
+  brotli zstd

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-cli"
-version = "0.1.6" #:version
+version = "0.1.8" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Utilities for search oriented command line applications.
@@ -10,13 +10,12 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/cli"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/cli"
readme = "README.md"
keywords = ["regex", "grep", "cli", "utility", "util"]
-license = "Unlicense/MIT"
+license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
-atty = "0.2.11"
-bstr = "0.2.0"
-globset = { version = "0.4.7", path = "../globset" }
+bstr = "1.6.0"
+globset = { version = "0.4.10", path = "../globset" }
lazy_static = "1.1.0"
log = "0.4.5"
regex = "1.1"

View File

@@ -382,7 +382,7 @@ impl DecompressionReader {
///
/// `close` is also called in `drop` as a last line of defense against
/// resource leakage. Any error from the child process is then printed as a
-/// warning to stderr. This can be avoided by explictly calling `close`
+/// warning to stderr. This can be avoided by explicitly calling `close`
/// before the CommandReader is dropped.
pub fn close(&mut self) -> io::Result<()> {
match self.rdr {

View File

@@ -8,7 +8,7 @@ use regex::Regex;
/// An error that occurs when parsing a human readable size description.
///
/// This error provides an end user friendly message describing why the
-/// description coudln't be parsed and what the expected format is.
+/// description couldn't be parsed and what the expected format is.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct ParseSizeError {
original: String,

View File

@@ -165,6 +165,8 @@ mod pattern;
mod process;
mod wtr;
+use std::io::IsTerminal;
pub use crate::decompress::{
resolve_binary, DecompressionMatcher, DecompressionMatcherBuilder,
DecompressionReader, DecompressionReaderBuilder,
@@ -212,13 +214,13 @@ pub fn is_readable_stdin() -> bool {
!is_tty_stdin() && imp()
}
-/// Returns true if and only if stdin is believed to be connectted to a tty
+/// Returns true if and only if stdin is believed to be connected to a tty
/// or a console.
pub fn is_tty_stdin() -> bool {
-    atty::is(atty::Stream::Stdin)
+    std::io::stdin().is_terminal()
}
-/// Returns true if and only if stdout is believed to be connectted to a tty
+/// Returns true if and only if stdout is believed to be connected to a tty
/// or a console.
///
/// This is useful for when you want your command line program to produce
@@ -227,11 +229,11 @@ pub fn is_tty_stdin() -> bool {
/// implementations of `ls` will often show one item per line when stdout is
/// redirected, but will condense output when printing to a tty.
pub fn is_tty_stdout() -> bool {
atty::is(atty::Stream::Stdout)
std::io::stdout().is_terminal()
}
/// Returns true if and only if stderr is believed to be connectted to a tty
/// Returns true if and only if stderr is believed to be connected to a tty
/// or a console.
pub fn is_tty_stderr() -> bool {
atty::is(atty::Stream::Stderr)
std::io::stderr().is_terminal()
}
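
The hunks above replace `atty::is(...)` with the standard library's `IsTerminal` trait. A minimal sketch of that check in isolation, assuming Rust 1.70 or newer (where `std::io::IsTerminal` is stable). The `Layout` enum and `layout_for` function are hypothetical names used only for illustration and are not part of grep-cli:

```rust
use std::io::IsTerminal;

/// Hypothetical output layouts, loosely mirroring the `ls` example in the
/// doc comment for `is_tty_stdout`.
#[derive(Debug, PartialEq)]
enum Layout {
    Condensed,
    OnePerLine,
}

/// Pick a layout from the result of a tty check (illustrative only).
fn layout_for(is_tty: bool) -> Layout {
    if is_tty {
        Layout::Condensed
    } else {
        Layout::OnePerLine
    }
}

fn main() {
    // Same call the new `is_tty_stdout` makes: no external crate needed.
    let is_tty = std::io::stdout().is_terminal();
    println!("stdout is a tty: {}, layout: {:?}", is_tty, layout_for(is_tty));
}
```

Running this directly in a terminal and again with stdout piped to a file should show the two different layouts.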

View File

@@ -221,7 +221,7 @@ impl CommandReader {
///
/// `close` is also called in `drop` as a last line of defense against
/// resource leakage. Any error from the child process is then printed as a
/// warning to stderr. This can be avoided by explictly calling `close`
/// warning to stderr. This can be avoided by explicitly calling `close`
/// before the CommandReader is dropped.
pub fn close(&mut self) -> io::Result<()> {
// Dropping stdout closes the underlying file descriptor, which should

View File

@@ -875,8 +875,8 @@ Print the 0-based byte offset within the input file before each line of output.
If -o (--only-matching) is specified, print the offset of the matching part
itself.
If ripgrep does transcoding, then the byte offset is in terms of the the result
of transcoding and not the original data. This applies similarly to another
If ripgrep does transcoding, then the byte offset is in terms of the result of
transcoding and not the original data. This applies similarly to another
transformation on the source, such as decompression or a --pre filter. Note
that when the PCRE2 regex engine is used, then UTF-8 transcoding is done by
default.
@@ -970,7 +970,7 @@ or, equivalently,
rg --colors 'match:bg:0x0,0x80,0xFF'
Note that the the intense and nointense style flags will have no effect when
Note that the intense and nointense style flags will have no effect when
used alongside these extended color codes.
"
);
@@ -1242,7 +1242,7 @@ fn flag_field_context_separator(args: &mut Vec<RGArg>) {
Set the field context separator, which is used to delimit file paths, line
numbers, columns and the context itself, when printing contextual lines. The
separator may be any number of bytes, including zero. Escape sequences like
\\x7F or \\t may be used. The default value is -.
\\x7F or \\t may be used. The '-' character is the default value.
"
);
let arg = RGArg::flag("field-context-separator", "SEPARATOR")
@@ -1257,8 +1257,8 @@ fn flag_field_match_separator(args: &mut Vec<RGArg>) {
"\
Set the field match separator, which is used to delimit file paths, line
numbers, columns and the match itself. The separator may be any number of
bytes, including zero. Escape sequences like \\x7F or \\t may be used. The
default value is -.
bytes, including zero. Escape sequences like \\x7F or \\t may be used. The ':'
character is the default value.
"
);
let arg = RGArg::flag("field-match-separator", "SEPARATOR")
@@ -1395,7 +1395,7 @@ it. If multiple globs match a file or directory, the glob given later in the
command line takes precedence.
As an extension, globs support specifying alternatives: *-g ab{c,d}* is
equivalet to *-g abc -g abd*. Empty alternatives like *-g ab{,c}* are not
equivalent to *-g abc -g abd*. Empty alternatives like *-g ab{,c}* are not
currently supported. Note that this syntax extension is also currently enabled
in gitignore files, even though this syntax isn't supported by git itself.
ripgrep may disable this syntax extension in gitignore files, but it will
@@ -1548,7 +1548,7 @@ When specifying multiple ignore files, earlier files have lower precedence
than later files.
If you are looking for a way to include or exclude files and directories
directly on the command line, then used -g instead.
directly on the command line, then use -g instead.
"
);
let arg = RGArg::flag("ignore-file", "PATH")
@@ -2583,8 +2583,8 @@ Do not print anything to stdout. If a match is found in a file, then ripgrep
will stop searching. This is useful when ripgrep is used only for its exit
code (which will be an error if no matches are found).
When --files is used, then ripgrep will stop finding files after finding the
first file that matches all ignore rules.
When --files is used, ripgrep will stop finding files after finding the
first file that does not match any ignore rules.
"
);
let arg = RGArg::switch("quiet").short("q").help(SHORT).long_help(LONG);

View File

@@ -31,8 +31,6 @@ use ignore::overrides::{Override, OverrideBuilder};
use ignore::types::{FileTypeDef, Types, TypesBuilder};
use ignore::{Walk, WalkBuilder, WalkParallel};
use log;
use num_cpus;
use regex;
use termcolor::{BufferWriter, ColorChoice, WriteColor};
use crate::app;
@@ -97,14 +95,17 @@ pub struct Args(Arc<ArgsImp>);
struct ArgsImp {
/// Mid-to-low level routines for extracting CLI arguments.
matches: ArgMatches,
/// The patterns provided at the command line and/or via the -f/--file
/// flag. This may be empty.
patterns: Vec<String>,
/// The command we want to execute.
command: Command,
/// The number of threads to use. This is based in part on available
/// threads, in part on the number of threads requested and in part on the
/// command we're running.
threads: usize,
/// A matcher built from the patterns.
///
/// It's important that this is only built once, since building this goes
/// through regex compilation and various types of analyses. That is, if
/// you need many of theses (one per thread, for example), it is better to
/// you need many of these (one per thread, for example), it is better to
/// build it once and then clone it.
matcher: PatternMatcher,
/// The paths provided at the command line. This is guaranteed to be
@@ -165,12 +166,6 @@ impl Args {
&self.0.matches
}
/// Return the patterns found in the command line arguments. This includes
/// patterns read via the -f/--file flags.
fn patterns(&self) -> &[String] {
&self.0.patterns
}
/// Return the matcher builder from the patterns.
fn matcher(&self) -> &PatternMatcher {
&self.0.matcher
@@ -197,7 +192,7 @@ impl Args {
fn printer<W: WriteColor>(&self, wtr: W) -> Result<Printer<W>> {
match self.matches().output_kind() {
OutputKind::Standard => {
let separator_search = self.command()? == Command::Search;
let separator_search = self.command() == Command::Search;
self.matches()
.printer_standard(self.paths(), wtr, separator_search)
.map(Printer::Standard)
@@ -225,28 +220,8 @@ impl Args {
}
/// Return the high-level command that ripgrep should run.
pub fn command(&self) -> Result<Command> {
let is_one_search = self.matches().is_one_search(self.paths());
let threads = self.matches().threads()?;
let one_thread = is_one_search || threads == 1;
Ok(if self.matches().is_present("pcre2-version") {
Command::PCRE2Version
} else if self.matches().is_present("type-list") {
Command::Types
} else if self.matches().is_present("files") {
if one_thread {
Command::Files
} else {
Command::FilesParallel
}
} else if self.matches().can_never_match(self.patterns()) {
Command::SearchNever
} else if one_thread {
Command::Search
} else {
Command::SearchParallel
})
pub fn command(&self) -> Command {
self.0.command
}
/// Build a path printer that can be used for printing just file paths,
@@ -304,7 +279,7 @@ impl Args {
/// When this returns a `Stats` value, then it is guaranteed that the
/// search worker will be configured to track statistics as well.
pub fn stats(&self) -> Result<Option<Stats>> {
Ok(if self.command()?.is_search() && self.matches().stats() {
Ok(if self.command().is_search() && self.matches().stats() {
Some(Stats::new())
} else {
None
@@ -343,12 +318,18 @@ impl Args {
/// Return a walker that never uses additional threads.
pub fn walker(&self) -> Result<Walk> {
Ok(self.matches().walker_builder(self.paths())?.build())
Ok(self
.matches()
.walker_builder(self.paths(), self.0.threads)?
.build())
}
/// Return a parallel walker that may use additional threads.
pub fn walker_parallel(&self) -> Result<WalkParallel> {
Ok(self.matches().walker_builder(self.paths())?.build_parallel())
Ok(self
.matches()
.walker_builder(self.paths(), self.0.threads)?
.build_parallel())
}
}
@@ -490,24 +471,6 @@ enum EncodingMode {
Disabled,
}
impl EncodingMode {
/// Checks if an explicit encoding has been set. Returns false for
/// automatic BOM sniffing and no sniffing.
///
/// This is only used to determine whether PCRE2 needs to have its own
/// UTF-8 checking enabled. If we have an explicit encoding set, then
/// we're always guaranteed to get UTF-8, so we can disable PCRE2's check.
/// Otherwise, we have no such guarantee, and must enable PCRE2' UTF-8
/// check.
#[cfg(feature = "pcre2")]
fn has_explicit_encoding(&self) -> bool {
match self {
EncodingMode::Some(_) => true,
_ => false,
}
}
}
impl ArgMatches {
/// Create an ArgMatches from clap's parse result.
fn new(clap_matches: clap::ArgMatches<'static>) -> ArgMatches {
@@ -557,9 +520,36 @@ impl ArgMatches {
} else {
false
};
// Now figure out the number of threads we'll use and which
// command will run.
let is_one_search = self.is_one_search(&paths);
let threads = if is_one_search { 1 } else { self.threads()? };
if threads == 1 {
log::debug!("running in single threaded mode");
} else {
log::debug!("running with {threads} threads for parallelism");
}
let command = if self.is_present("pcre2-version") {
Command::PCRE2Version
} else if self.is_present("type-list") {
Command::Types
} else if self.is_present("files") {
if threads == 1 {
Command::Files
} else {
Command::FilesParallel
}
} else if self.can_never_match(&patterns) {
Command::SearchNever
} else if threads == 1 {
Command::Search
} else {
Command::SearchParallel
};
Ok(Args(Arc::new(ArgsImp {
matches: self,
patterns,
command,
threads,
matcher,
paths,
using_default_path,
@@ -662,6 +652,8 @@ impl ArgMatches {
.multi_line(true)
.unicode(self.unicode())
.octal(false)
.fixed_strings(self.is_present("fixed-strings"))
.whole_line(self.is_present("line-regexp"))
.word(self.is_present("word-regexp"));
if self.is_present("multiline") {
builder.dot_matches_new_line(self.is_present("multiline-dotall"));
@@ -688,12 +680,7 @@ impl ArgMatches {
if let Some(limit) = self.dfa_size_limit()? {
builder.dfa_size_limit(limit);
}
let res = if self.is_present("fixed-strings") {
builder.build_literals(patterns)
} else {
builder.build(&patterns.join("|"))
};
match res {
match builder.build_many(patterns) {
Ok(m) => Ok(m),
Err(err) => Err(From::from(suggest_multiline(err.to_string()))),
}
@@ -710,6 +697,8 @@ impl ArgMatches {
.case_smart(self.case_smart())
.caseless(self.case_insensitive())
.multi_line(true)
.fixed_strings(self.is_present("fixed-strings"))
.whole_line(self.is_present("line-regexp"))
.word(self.is_present("word-regexp"));
// For whatever reason, the JIT craps out during regex compilation with
// a "no more memory" error on 32 bit systems. So don't use it there.
@@ -723,14 +712,6 @@ impl ArgMatches {
}
if self.unicode() {
builder.utf(true).ucp(true);
if self.encoding()?.has_explicit_encoding() {
// SAFETY: If an encoding was specified, then we're guaranteed
// to get valid UTF-8, so we can disable PCRE2's UTF checking.
// (Feeding invalid UTF-8 to PCRE2 is undefined behavior.)
unsafe {
builder.disable_utf_check();
}
}
}
if self.is_present("multiline") {
builder.dotall(self.is_present("multiline-dotall"));
@@ -738,7 +719,7 @@ impl ArgMatches {
if self.is_present("crlf") {
builder.crlf(true);
}
Ok(builder.build(&patterns.join("|"))?)
Ok(builder.build_many(patterns)?)
}
/// Build a JSON printer that writes results to the given writer.
@@ -858,7 +839,11 @@ impl ArgMatches {
///
/// If there was a problem parsing the CLI arguments necessary for
/// constructing the builder, then this returns an error.
fn walker_builder(&self, paths: &[PathBuf]) -> Result<WalkBuilder> {
fn walker_builder(
&self,
paths: &[PathBuf],
threads: usize,
) -> Result<WalkBuilder> {
let mut builder = WalkBuilder::new(&paths[0]);
for path in &paths[1..] {
builder.add(path);
@@ -874,7 +859,7 @@ impl ArgMatches {
.max_depth(self.usize_of("max-depth")?)
.follow_links(self.is_present("follow"))
.max_filesize(self.max_file_size()?)
.threads(self.threads()?)
.threads(threads)
.same_file_system(self.is_present("one-file-system"))
.skip_stdout(!self.is_present("files"))
.overrides(self.overrides()?)
@@ -1067,7 +1052,6 @@ impl ArgMatches {
}
let label = match self.value_of_lossy("encoding") {
None if self.pcre2_unicode() => "utf-8".to_string(),
None => return Ok(EncodingMode::Auto),
Some(label) => label,
};
@@ -1399,11 +1383,6 @@ impl ArgMatches {
/// Get a sequence of all available patterns from the command line.
/// This includes reading the -e/--regexp and -f/--file flags.
///
/// Note that if -F/--fixed-strings is set, then all patterns will be
/// escaped. If -x/--line-regexp is set, then all patterns are surrounded
/// by `^...$`. Other things, such as --word-regexp, are handled by the
/// regex matcher itself.
///
/// If any pattern is invalid UTF-8, then an error is returned.
fn patterns(&self) -> Result<Vec<String>> {
if self.is_present("files") || self.is_present("type-list") {
@@ -1444,16 +1423,6 @@ impl ArgMatches {
Ok(pats)
}
/// Returns a pattern that is guaranteed to produce an empty regular
/// expression that is valid in any position.
fn pattern_empty(&self) -> String {
// This would normally just be an empty string, which works on its
// own, but if the patterns are joined in a set of alternations, then
// you wind up with `foo|`, which is currently invalid in Rust's regex
// engine.
"(?:z{0})*".to_string()
}
/// Converts an OsStr pattern to a String pattern. The pattern is escaped
/// if -F/--fixed-strings is set.
///
@@ -1472,30 +1441,12 @@ impl ArgMatches {
/// Applies additional processing on the given pattern if necessary
/// (such as escaping meta characters or turning it into a line regex).
fn pattern_from_string(&self, pat: String) -> String {
let pat = self.pattern_line(self.pattern_literal(pat));
if pat.is_empty() {
self.pattern_empty()
} else {
pat
}
}
/// Returns the given pattern as a line pattern if the -x/--line-regexp
/// flag is set. Otherwise, the pattern is returned unchanged.
fn pattern_line(&self, pat: String) -> String {
if self.is_present("line-regexp") {
format!(r"^(?:{})$", pat)
} else {
pat
}
}
/// Returns the given pattern as a literal pattern if the
/// -F/--fixed-strings flag is set. Otherwise, the pattern is returned
/// unchanged.
fn pattern_literal(&self, pat: String) -> String {
if self.is_present("fixed-strings") {
regex::escape(&pat)
// This would normally just be an empty string, which works on its
// own, but if the patterns are joined in a set of alternations,
// then you wind up with `foo|`, which is currently invalid in
// Rust's regex engine.
"(?:)".to_string()
} else {
pat
}
@@ -1592,7 +1543,9 @@ impl ArgMatches {
return Ok(1);
}
let threads = self.usize_of("threads")?.unwrap_or(0);
Ok(if threads == 0 { cmp::min(12, num_cpus::get()) } else { threads })
let available =
std::thread::available_parallelism().map_or(1, |n| n.get());
Ok(if threads == 0 { cmp::min(12, available) } else { threads })
}
/// Builds a file type matcher from the command line flags.
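
The hunks above move the thread count and the `Command` choice out of `Args::command()` and compute both once when the arguments are built, with `std::thread::available_parallelism()` standing in for `num_cpus::get()`. A minimal, self-contained sketch of that decision order follows. The `Flags` struct and the function names `resolve_threads` and `choose_command` are assumptions made for illustration; the real code reads these values from `ArgMatches`, and it has an additional early `return Ok(1)` whose condition is not visible in this diff and is not modeled here:

```rust
use std::cmp;
use std::thread;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Command {
    Search,
    SearchParallel,
    SearchNever,
    Files,
    FilesParallel,
    Types,
    PCRE2Version,
}

/// Hypothetical stand-in for the parsed CLI state consulted in the diff.
#[derive(Clone, Copy)]
struct Flags {
    pcre2_version: bool,
    type_list: bool,
    files: bool,
    can_never_match: bool,
    is_one_search: bool,
    /// 0 means "choose automatically".
    requested_threads: usize,
}

/// A single-target search always uses one thread; otherwise honor the
/// request, or fall back to the available parallelism capped at 12.
fn resolve_threads(flags: &Flags) -> usize {
    if flags.is_one_search {
        return 1;
    }
    if flags.requested_threads != 0 {
        return flags.requested_threads;
    }
    let available = thread::available_parallelism().map_or(1, |n| n.get());
    cmp::min(12, available)
}

/// Same branch order as the new code: informational commands first, then
/// file listing, then the search variants.
fn choose_command(flags: &Flags, threads: usize) -> Command {
    if flags.pcre2_version {
        Command::PCRE2Version
    } else if flags.type_list {
        Command::Types
    } else if flags.files {
        if threads == 1 { Command::Files } else { Command::FilesParallel }
    } else if flags.can_never_match {
        Command::SearchNever
    } else if threads == 1 {
        Command::Search
    } else {
        Command::SearchParallel
    }
}

fn main() {
    let flags = Flags {
        pcre2_version: false,
        type_list: false,
        files: false,
        can_never_match: false,
        is_one_search: false,
        requested_threads: 0,
    };
    let threads = resolve_threads(&flags);
    println!("{} thread(s) -> {:?}", threads, choose_command(&flags, threads));
}
```

Computing both values once means later callers such as `walker_builder` receive the already-resolved thread count instead of re-deriving it, which is why `command()` no longer needs to return a `Result`.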
@@ -1626,12 +1579,6 @@ impl ArgMatches {
!(self.is_present("no-unicode") || self.is_present("no-pcre2-unicode"))
}
/// Returns true if and only if PCRE2 is enabled and its Unicode mode is
/// enabled.
fn pcre2_unicode(&self) -> bool {
self.is_present("pcre2") && self.unicode()
}
/// Returns true if and only if file names containing each match should
/// be emitted.
fn with_filename(&self, paths: &[PathBuf]) -> bool {

View File

@@ -28,7 +28,10 @@ pub fn args() -> Vec<OsString> {
let (args, errs) = match parse(&config_path) {
Ok((args, errs)) => (args, errs),
Err(err) => {
message!("{}", err);
message!(
"failed to read the file specified in RIPGREP_CONFIG_PATH: {}",
err
);
return vec![];
}
};
@@ -77,7 +80,7 @@ fn parse<P: AsRef<Path>>(
fn parse_reader<R: io::Read>(
rdr: R,
) -> Result<(Vec<OsString>, Vec<Box<dyn Error>>)> {
let bufrdr = io::BufReader::new(rdr);
let mut bufrdr = io::BufReader::new(rdr);
let (mut args, mut errs) = (vec![], vec![]);
let mut line_number = 0;
bufrdr.for_byte_line_with_terminator(|line| {

View File

@@ -55,7 +55,7 @@ fn main() {
fn try_main(args: Args) -> Result<()> {
use args::Command::*;
let matched = match args.command()? {
let matched = match args.command() {
Search => search(&args),
SearchParallel => search_parallel(&args),
SearchNever => Ok(false),

View File

@@ -67,7 +67,7 @@ impl SubjectBuilder {
if subj.is_file() {
return Some(subj);
}
// We got nothin. Emit a debug message, but only if this isn't a
// We got nothing. Emit a debug message, but only if this isn't a
// directory. Otherwise, emitting messages for directories is just
// noisy.
if !subj.is_dir() {

View File

@@ -1,6 +1,6 @@
[package]
name = "globset"
version = "0.4.7" #:version
version = "0.4.10" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Cross platform single glob and glob set matching. Glob set matching is the
@@ -12,7 +12,7 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/globset"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/globset"
readme = "README.md"
keywords = ["regex", "glob", "multiple", "set", "pattern"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[lib]
@@ -20,11 +20,11 @@ name = "globset"
bench = false
[dependencies]
aho-corasick = "0.7.3"
bstr = { version = "0.2.0", default-features = false, features = ["std"] }
aho-corasick = "1.0.2"
bstr = { version = "1.6.0", default-features = false, features = ["std"] }
fnv = "1.0.6"
log = "0.4.5"
regex = { version = "1.1.5", default-features = false, features = ["perf", "std"] }
log = { version = "0.4.5", optional = true }
regex = { version = "1.8.3", default-features = false, features = ["perf", "std"] }
serde = { version = "1.0.104", optional = true }
[dev-dependencies]
@@ -33,5 +33,6 @@ lazy_static = "1"
serde_json = "1.0.45"
[features]
default = ["log"]
simd-accel = []
serde1 = ["serde"]

View File

@@ -19,7 +19,7 @@ Add this to your `Cargo.toml`:
```toml
[dependencies]
globset = "0.3"
globset = "0.4"
```
### Features
@@ -78,12 +78,12 @@ assert_eq!(set.matches("src/bar/baz/foo.rs"), vec![0, 2]);
This crate implements globs by converting them to regular expressions, and
executing them with the
[`regex`](https://github.com/rust-lang-nursery/regex)
[`regex`](https://github.com/rust-lang/regex)
crate.
For single glob matching, performance of this crate should be roughly on par
with the performance of the
[`glob`](https://github.com/rust-lang-nursery/glob)
[`glob`](https://github.com/rust-lang/glob)
crate. (`*_regex` correspond to benchmarks for this library while `*_glob`
correspond to benchmarks for the `glob` library.)
Optimizations in the `regex` crate may propel this library past `glob`,
@@ -108,7 +108,7 @@ test many_short_glob ... bench: 1,063 ns/iter (+/- 47)
test many_short_regex_set ... bench: 186 ns/iter (+/- 11)
```
### Comparison with the [`glob`](https://github.com/rust-lang-nursery/glob) crate
### Comparison with the [`glob`](https://github.com/rust-lang/glob) crate
* Supports alternate "or" globs, e.g., `*.{foo,bar}`.
* Can match non-UTF-8 file paths correctly.

View File

@@ -143,8 +143,6 @@ impl GlobMatcher {
struct GlobStrategic {
/// The match strategy to use.
strategy: MatchStrategy,
/// The underlying pattern.
pat: Glob,
/// The pattern, as a compiled regex.
re: Regex,
}
@@ -273,7 +271,7 @@ impl Glob {
let strategy = MatchStrategy::new(self);
let re =
new_regex(&self.re).expect("regex compilation shouldn't fail");
GlobStrategic { strategy: strategy, pat: self.clone(), re: re }
GlobStrategic { strategy: strategy, re: re }
}
/// Returns the original glob pattern used to build this pattern.

View File

@@ -125,6 +125,16 @@ mod pathutil;
#[cfg(feature = "serde1")]
mod serde_impl;
#[cfg(feature = "log")]
macro_rules! debug {
($($token:tt)*) => (::log::debug!($($token)*);)
}
#[cfg(not(feature = "log"))]
macro_rules! debug {
($($token:tt)*) => {};
}
/// Represents an error that can occur when parsing a glob pattern.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Error {
@@ -413,12 +423,12 @@ impl GlobSet {
required_exts.add(i, ext, p.regex().to_owned());
}
MatchStrategy::Regex => {
log::debug!("glob converted to regex: {:?}", p);
debug!("glob converted to regex: {:?}", p);
regexes.add(i, p.regex().to_owned());
}
}
}
log::debug!(
debug!(
"built glob set; {} literals, {} basenames, {} extensions, \
{} prefixes, {} suffixes, {} required extensions, {} regexes",
lits.0.len(),
@@ -488,13 +498,23 @@ impl GlobSetBuilder {
/// Constructing candidates has a very small cost associated with it, so
/// callers may find it beneficial to amortize that cost when matching a single
/// path against multiple globs or sets of globs.
#[derive(Clone, Debug)]
#[derive(Clone)]
pub struct Candidate<'a> {
path: Cow<'a, [u8]>,
basename: Cow<'a, [u8]>,
ext: Cow<'a, [u8]>,
}
impl<'a> std::fmt::Debug for Candidate<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
f.debug_struct("Candidate")
.field("path", &self.path.as_bstr())
.field("basename", &self.basename.as_bstr())
.field("ext", &self.ext.as_bstr())
.finish()
}
}
impl<'a> Candidate<'a> {
/// Create a new candidate for matching from the given path.
pub fn new<P: AsRef<Path> + ?Sized>(path: &'a P) -> Candidate<'a> {
@@ -808,7 +828,7 @@ impl MultiStrategyBuilder {
fn prefix(self) -> PrefixStrategy {
PrefixStrategy {
matcher: AhoCorasick::new_auto_configured(&self.literals),
matcher: AhoCorasick::new(&self.literals).unwrap(),
map: self.map,
longest: self.longest,
}
@@ -816,7 +836,7 @@ impl MultiStrategyBuilder {
fn suffix(self) -> SuffixStrategy {
SuffixStrategy {
matcher: AhoCorasick::new_auto_configured(&self.literals),
matcher: AhoCorasick::new(&self.literals).unwrap(),
map: self.map,
longest: self.longest,
}

View File

@@ -1,7 +1,7 @@
use serde::de::Error;
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use Glob;
use crate::Glob;
impl Serialize for Glob {
fn serialize<S: Serializer>(
@@ -23,7 +23,7 @@ impl<'de> Deserialize<'de> for Glob {
#[cfg(test)]
mod tests {
use Glob;
use crate::Glob;
#[test]
fn glob_json_works() {

View File

@@ -1,6 +1,6 @@
[package]
name = "grep"
version = "0.2.8" #:version
version = "0.2.12" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Fast line oriented regex searching as a library.
@@ -10,16 +10,16 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/grep"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/grep"
readme = "README.md"
keywords = ["regex", "grep", "egrep", "search", "pattern"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
grep-cli = { version = "0.1.6", path = "../cli" }
grep-matcher = { version = "0.1.5", path = "../matcher" }
grep-pcre2 = { version = "0.1.5", path = "../pcre2", optional = true }
grep-printer = { version = "0.1.6", path = "../printer" }
grep-regex = { version = "0.1.9", path = "../regex" }
grep-searcher = { version = "0.1.8", path = "../searcher" }
grep-cli = { version = "0.1.7", path = "../cli" }
grep-matcher = { version = "0.1.6", path = "../matcher" }
grep-pcre2 = { version = "0.1.6", path = "../pcre2", optional = true }
grep-printer = { version = "0.1.7", path = "../printer" }
grep-regex = { version = "0.1.11", path = "../regex" }
grep-searcher = { version = "0.1.11", path = "../searcher" }
[dev-dependencies]
termcolor = "1.0.4"

View File

@@ -12,8 +12,6 @@ are sparse.
A cookbook and a guide are planned.
*/
#![deny(missing_docs)]
pub extern crate grep_cli as cli;
pub extern crate grep_matcher as matcher;
#[cfg(feature = "pcre2")]

View File

@@ -1,6 +1,6 @@
[package]
name = "ignore"
version = "0.4.18" #:version
version = "0.4.20" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A fast library for efficiently matching ignore files such as `.gitignore`
@@ -11,7 +11,7 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore"
readme = "README.md"
keywords = ["glob", "ignore", "gitignore", "pattern", "file"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[lib]
@@ -19,12 +19,11 @@ name = "ignore"
bench = false
[dependencies]
crossbeam-utils = "0.8.0"
globset = { version = "0.4.7", path = "../globset" }
globset = { version = "0.4.10", path = "../globset" }
lazy_static = "1.1"
log = "0.4.5"
memchr = "2.1"
regex = "1.1"
memchr = "2.5"
regex = "1.8.3"
same-file = "1.0.4"
thread_local = "1"
walkdir = "2.2.7"

View File

@@ -16,18 +16,24 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("asciidoc", &["*.adoc", "*.asc", "*.asciidoc"]),
("asm", &["*.asm", "*.s", "*.S"]),
("asp", &[
"*.aspx", "*.aspx.cs", "*.aspx.vb", "*.ascx", "*.ascx.cs", "*.ascx.vb",
"*.aspx", "*.aspx.cs", "*.aspx.vb", "*.ascx", "*.ascx.cs",
"*.ascx.vb", "*.asp"
]),
("ats", &["*.ats", "*.dats", "*.sats", "*.hats"]),
("avro", &["*.avdl", "*.avpr", "*.avsc"]),
("awk", &["*.awk"]),
("bazel", &["*.bazel", "*.bzl", "*.BUILD", "*.bazelrc", "BUILD", "WORKSPACE"]),
("bazel", &[
"*.bazel", "*.bzl", "*.BUILD", "*.bazelrc", "BUILD", "MODULE.bazel",
"WORKSPACE", "WORKSPACE.bazel",
]),
("bitbake", &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
("brotli", &["*.br"]),
("buildstream", &["*.bst"]),
("bzip2", &["*.bz2", "*.tbz2"]),
("c", &["*.[chH]", "*.[chH].in", "*.cats"]),
("cabal", &["*.cabal"]),
("candid", &["*.did"]),
("carp", &["*.carp"]),
("cbor", &["*.cbor"]),
("ceylon", &["*.ceylon"]),
("clojure", &["*.clj", "*.cljc", "*.cljs", "*.cljx"]),
@@ -40,18 +46,22 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
"*.[ChH].in", "*.cc.in", "*.[ch]pp.in", "*.[ch]xx.in", "*.hh.in",
]),
("creole", &["*.creole"]),
("crystal", &["Projectfile", "*.cr"]),
("crystal", &["Projectfile", "*.cr", "*.ecr", "shard.yml"]),
("cs", &["*.cs"]),
("csharp", &["*.cs"]),
("cshtml", &["*.cshtml"]),
("css", &["*.css", "*.scss"]),
("csv", &["*.csv"]),
("cuda", &["*.cu", "*.cuh"]),
("cython", &["*.pyx", "*.pxi", "*.pxd"]),
("d", &["*.d"]),
("dart", &["*.dart"]),
("devicetree", &["*.dts", "*.dtsi"]),
("dhall", &["*.dhall"]),
("diff", &["*.patch", "*.diff"]),
("docker", &["*Dockerfile*"]),
("dockercompose", &["docker-compose.yml", "docker-compose.*.yml"]),
("dts", &["*.dts", "*.dtsi"]),
("dvc", &["Dvcfile", "*.dvc"]),
("ebuild", &["*.ebuild"]),
("edn", &["*.edn"]),
@@ -60,6 +70,7 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("elm", &["*.elm"]),
("erb", &["*.erb"]),
("erlang", &["*.erl", "*.hrl"]),
("fennel", &["*.fnl"]),
("fidl", &["*.fidl"]),
("fish", &["*.fish"]),
("flatbuffers", &["*.fbs"]),
@@ -68,24 +79,27 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
"*.f90", "*.F90", "*.f95", "*.F95",
]),
("fsharp", &["*.fs", "*.fsx", "*.fsi"]),
("fut", &[".fut"]),
("fut", &["*.fut"]),
("gap", &["*.g", "*.gap", "*.gi", "*.gd", "*.tst"]),
("gn", &["*.gn", "*.gni"]),
("go", &["*.go"]),
("gradle", &["*.gradle"]),
("groovy", &["*.groovy", "*.gradle"]),
("gzip", &["*.gz", "*.tgz"]),
("h", &["*.h", "*.hpp"]),
("h", &["*.h", "*.hh", "*.hpp"]),
("haml", &["*.haml"]),
("hare", &["*.ha"]),
("haskell", &["*.hs", "*.lhs", "*.cpphs", "*.c2hs", "*.hsc"]),
("hbs", &["*.hbs"]),
("hs", &["*.hs", "*.lhs"]),
("html", &["*.htm", "*.html", "*.ejs"]),
("hy", &["*.hy"]),
("idris", &["*.idr", "*.lidr"]),
("janet", &["*.janet"]),
("java", &["*.java", "*.jsp", "*.jspx", "*.properties"]),
("jinja", &["*.j2", "*.jinja", "*.jinja2"]),
("jl", &["*.jl"]),
("js", &["*.js", "*.jsx", "*.vue"]),
("js", &["*.js", "*.jsx", "*.vue", "*.cjs", "*.mjs"]),
("json", &["*.json", "composer.lock"]),
("jsonl", &["*.jsonl"]),
("julia", &["*.jl"]),
@@ -120,6 +134,7 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
"MPL-*[0-9]*",
"OFL-*[0-9]*",
]),
("lilypond", &["*.ly", "*.ily"]),
("lisp", &["*.el", "*.jl", "*.lisp", "*.lsp", "*.sc", "*.scm"]),
("lock", &["*.lock", "package-lock.json"]),
("log", &["*.log"]),
@@ -135,16 +150,18 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
]),
("mako", &["*.mako", "*.mao"]),
("man", &["*.[0-9lnpx]", "*.[0-9][cEFMmpSx]"]),
("markdown", &["*.markdown", "*.md", "*.mdown", "*.mkdn"]),
("markdown", &["*.markdown", "*.md", "*.mdown", "*.mdwn", "*.mkd", "*.mkdn"]),
("matlab", &["*.m"]),
("md", &["*.markdown", "*.md", "*.mdown", "*.mkdn"]),
("md", &["*.markdown", "*.md", "*.mdown", "*.mdwn", "*.mkd", "*.mkdn"]),
("meson", &["meson.build", "meson_options.txt"]),
("minified", &["*.min.html", "*.min.css", "*.min.js"]),
("mint", &["*.mint"]),
("mk", &["mkfile"]),
("ml", &["*.ml"]),
("motoko", &["*.mo"]),
("msbuild", &[
"*.csproj", "*.fsproj", "*.vcxproj", "*.proj", "*.props", "*.targets",
"*.sln",
]),
("nim", &["*.nim", "*.nimf", "*.nimble", "*.nims"]),
("nix", &["*.nix"]),
@@ -152,25 +169,33 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("objcpp", &["*.h", "*.mm"]),
("ocaml", &["*.ml", "*.mli", "*.mll", "*.mly"]),
("org", &["*.org", "*.org_archive"]),
("pants", &["BUILD"]),
("pascal", &["*.pas", "*.dpr", "*.lpr", "*.pp", "*.inc"]),
("pdf", &["*.pdf"]),
("perl", &["*.perl", "*.pl", "*.PL", "*.plh", "*.plx", "*.pm", "*.t"]),
("php", &["*.php", "*.php3", "*.php4", "*.php5", "*.phtml"]),
("php", &[
// note that PHP 6 doesn't exist
// See: https://wiki.php.net/rfc/php6
"*.php", "*.php3", "*.php4", "*.php5", "*.php7", "*.php8",
"*.pht", "*.phtml"
]),
("po", &["*.po"]),
("pod", &["*.pod"]),
("postscript", &["*.eps", "*.ps"]),
("protobuf", &["*.proto"]),
("ps", &["*.cdxml", "*.ps1", "*.ps1xml", "*.psd1", "*.psm1"]),
("puppet", &["*.erb", "*.pp", "*.rb"]),
("puppet", &["*.epp", "*.erb", "*.pp", "*.rb"]),
("purs", &["*.purs"]),
("py", &["*.py"]),
("py", &["*.py", "*.pyi"]),
("qmake", &["*.pro", "*.pri", "*.prf"]),
("qml", &["*.qml"]),
("r", &["*.R", "*.r", "*.Rmd", "*.Rnw"]),
("racket", &["*.rkt"]),
("rdoc", &["*.rdoc"]),
("readme", &["README*", "*README"]),
("reasonml", &["*.re", "*.rei"]),
("red", &["*.r", "*.red", "*.reds"]),
("rescript", &["*.res", "*.resi"]),
("robot", &["*.robot"]),
("rst", &["*.rst"]),
("ruby", &[
@@ -208,6 +233,7 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("slim", &["*.skim", "*.slim", "*.slime"]),
("smarty", &["*.tpl"]),
("sml", &["*.sml", "*.sig"]),
("solidity", &["*.sol"]),
("soy", &["*.soy"]),
("spark", &["*.spark"]),
("spec", &["*.spec"]),
@@ -225,11 +251,16 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("taskpaper", &["*.taskpaper"]),
("tcl", &["*.tcl"]),
("tex", &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib", "*.dtx", "*.ins"]),
("texinfo", &["*.texi"]),
("textile", &["*.textile"]),
("tf", &["*.tf"]),
("tf", &[
"*.tf", "*.auto.tfvars", "terraform.tfvars", "*.tf.json",
"*.auto.tfvars.json", "terraform.tfvars.json", "*.terraformrc",
"terraform.rc", "*.tfrc", "*.terraform.lock.hcl",
]),
("thrift", &["*.thrift"]),
("toml", &["*.toml", "Cargo.lock"]),
("ts", &["*.ts", "*.tsx"]),
("ts", &["*.ts", "*.tsx", "*.cts", "*.mts"]),
("twig", &["*.twig"]),
("txt", &["*.txt"]),
("typoscript", &["*.typoscript", "*.ts"]),
@@ -238,8 +269,12 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("vcl", &["*.vcl"]),
("verilog", &["*.v", "*.vh", "*.sv", "*.svh"]),
("vhdl", &["*.vhd", "*.vhdl"]),
("vim", &["*.vim"]),
("vimscript", &["*.vim"]),
("vim", &[
"*.vim", ".vimrc", ".gvimrc", "vimrc", "gvimrc", "_vimrc", "_gvimrc",
]),
("vimscript", &[
"*.vim", ".vimrc", ".gvimrc", "vimrc", "gvimrc", "_vimrc", "_gvimrc",
]),
("webidl", &["*.idl", "*.webidl", "*.widl"]),
("wiki", &["*.mediawiki", "*.wiki"]),
("xml", &[
@@ -262,3 +297,26 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
]),
("zstd", &["*.zst", "*.zstd"]),
];
#[cfg(test)]
mod tests {
use super::DEFAULT_TYPES;
#[test]
fn default_types_are_sorted() {
let mut names = DEFAULT_TYPES.iter().map(|(name, _exts)| name);
let Some(mut previous_name) = names.next() else { return; };
for name in names {
assert!(
name > previous_name,
r#""{}" should be sorted before "{}" in `DEFAULT_TYPES`"#,
name,
previous_name
);
previous_name = name;
}
}
}
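
The ordering test above can be sketched outside the crate as a standalone helper. This is a minimal illustration, not code from the repository: `first_unsorted` is a hypothetical name, and it assumes the same `(name, globs)` table shape as `DEFAULT_TYPES`. Using a strict comparison means a duplicated name is reported as well, which matches the `name > previous_name` assertion.

```rust
/// Returns the first adjacent pair of type names that is not in strictly
/// ascending order, or `None` if the table is sorted and has no duplicates.
/// (Hypothetical helper; only the table shape is taken from the diff.)
fn first_unsorted<'a>(
    types: &[(&'a str, &'a [&'a str])],
) -> Option<(&'a str, &'a str)> {
    types
        .windows(2)
        // A pair is out of order when the later name is not strictly greater.
        .find(|pair| pair[1].0 <= pair[0].0)
        .map(|pair| (pair[0].0, pair[1].0))
}
```

Returning the offending pair rather than panicking makes it easy to build an error message like the one in the test.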

View File

@@ -202,11 +202,12 @@ impl Ignore {
errs.maybe_push(err);
igtmp.is_absolute_parent = true;
igtmp.absolute_base = Some(absolute_base.clone());
igtmp.has_git = if self.0.opts.git_ignore {
parent.join(".git").exists()
} else {
false
};
igtmp.has_git =
if self.0.opts.require_git && self.0.opts.git_ignore {
parent.join(".git").exists()
} else {
false
};
ig = Ignore(Arc::new(igtmp));
compiled.insert(parent.as_os_str().to_os_string(), ig.clone());
}
@@ -231,7 +232,9 @@ impl Ignore {
/// Like add_child, but takes a full path and returns an IgnoreInner.
fn add_child_path(&self, dir: &Path) -> (IgnoreInner, Option<Error>) {
let git_type = if self.0.opts.git_ignore || self.0.opts.git_exclude {
let git_type = if self.0.opts.require_git
&& (self.0.opts.git_ignore || self.0.opts.git_exclude)
{
dir.join(".git").metadata().ok().map(|md| md.file_type())
} else {
None

View File

@@ -474,10 +474,13 @@ impl GitignoreBuilder {
}
// If it ends with a slash, then this should only match directories,
// but the slash should otherwise not be used while globbing.
if let Some((i, c)) = line.char_indices().rev().nth(0) {
if c == '/' {
glob.is_only_dir = true;
line = &line[..i];
if line.as_bytes().last() == Some(&b'/') {
glob.is_only_dir = true;
line = &line[..line.len() - 1];
// If the slash was escaped, then remove the escape.
// See: https://github.com/BurntSushi/ripgrep/issues/2236
if line.as_bytes().last() == Some(&b'\\') {
line = &line[..line.len() - 1];
}
}
glob.actual = line.to_string();
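
The new trailing-slash handling can be read in isolation as follows. This is a sketch under the assumption that only the suffix logic matters; `strip_dir_suffix` is a hypothetical name and the real builder mutates a `Glob` value rather than returning a tuple.

```rust
/// Splits a gitignore-style line into `(pattern, is_only_dir)`.
/// (Hypothetical standalone version of the hunk above.)
fn strip_dir_suffix(line: &str) -> (&str, bool) {
    let mut line = line;
    let mut is_only_dir = false;
    if line.as_bytes().last() == Some(&b'/') {
        is_only_dir = true;
        // `/` is a single ASCII byte, so this slice ends on a char boundary.
        line = &line[..line.len() - 1];
        // If the slash was escaped, drop the escape as well.
        if line.as_bytes().last() == Some(&b'\\') {
            line = &line[..line.len() - 1];
        }
    }
    (line, is_only_dir)
}
```

Working on bytes avoids the reverse `char_indices` walk of the old code while staying safe for non-ASCII patterns, since only ASCII bytes are ever removed.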

View File

@@ -106,6 +106,7 @@ impl Override {
}
/// Builds a matcher for a set of glob overrides.
#[derive(Clone, Debug)]
pub struct OverrideBuilder {
builder: GitignoreBuilder,
}

View File

@@ -122,10 +122,6 @@ enum GlobInner<'a> {
Matched {
/// The file type definition which provided the glob.
def: &'a FileTypeDef,
/// The index of the glob that matched inside the file type definition.
which: usize,
/// Whether the selection was negated or not.
negated: bool,
},
}
@@ -291,13 +287,9 @@ impl Types {
self.set.matches_into(name, &mut *matches);
// The highest precedent match is the last one.
if let Some(&i) = matches.last() {
let (isel, iglob) = self.glob_to_selection[i];
let (isel, _) = self.glob_to_selection[i];
let sel = &self.selections[isel];
let glob = Glob(GlobInner::Matched {
def: sel.inner(),
which: iglob,
negated: sel.is_negated(),
});
let glob = Glob(GlobInner::Matched { def: sel.inner() });
return if sel.is_negated() {
Match::Ignore(glob)
} else {

View File

@@ -941,7 +941,7 @@ impl Walk {
// overheads; an example of this was a bespoke filesystem layer in
// Windows that hosted files remotely and would download them on-demand
// when particular filesystem operations occurred. Users of this system
// who ensured correct file-type fileters were being used could still
// who ensured correct file-type filters were being used could still
// get unnecessary file access resulting in large downloads.
if should_skip_entry(&self.ig, ent) {
return Ok(true);
@@ -1282,7 +1282,7 @@ impl WalkParallel {
let quit_now = Arc::new(AtomicBool::new(false));
let num_pending =
Arc::new(AtomicUsize::new(stack.lock().unwrap().len()));
crossbeam_utils::thread::scope(|s| {
std::thread::scope(|s| {
let mut handles = vec![];
for _ in 0..threads {
let worker = Worker {
@@ -1296,13 +1296,12 @@ impl WalkParallel {
skip: self.skip.clone(),
filter: self.filter.clone(),
};
handles.push(s.spawn(|_| worker.run()));
handles.push(s.spawn(|| worker.run()));
}
for handle in handles {
handle.join().unwrap();
}
})
.unwrap(); // Pass along panics from threads
});
}
fn threads(&self) -> usize {
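
The move from `crossbeam_utils::thread::scope` to `std::thread::scope` changes two details visible in the hunk: the spawned closure no longer takes a scope argument, and the scope call itself no longer returns a `Result` that has to be unwrapped. A minimal sketch of the standard-library form, with a made-up workload (`sum_in_parallel` is a hypothetical name, unrelated to the walker):

```rust
use std::thread;

/// Sums each chunk on its own scoped thread and adds up the results.
/// (Illustrative workload only.)
fn sum_in_parallel(chunks: &[Vec<u64>]) -> u64 {
    thread::scope(|s| {
        // Scoped threads may borrow `chunks` because the scope guarantees
        // they are joined before it returns.
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // Joining explicitly passes along panics from worker threads.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}
```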

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-matcher"
version = "0.1.5" #:version
version = "0.1.6" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A trait for regular expressions, with a focus on line oriented search.
@@ -10,7 +10,7 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/matcher"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/matcher"
readme = "README.md"
keywords = ["regex", "pattern", "trait"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
autotests = false
edition = "2018"

View File

@@ -116,7 +116,7 @@ impl Match {
/// This method panics if `start > self.end`.
#[inline]
pub fn with_start(&self, start: usize) -> Match {
assert!(start <= self.end);
assert!(start <= self.end, "{} is not <= {}", start, self.end);
Match { start, ..*self }
}
@@ -128,7 +128,7 @@ impl Match {
/// This method panics if `self.start > end`.
#[inline]
pub fn with_end(&self, end: usize) -> Match {
assert!(self.start <= end);
assert!(self.start <= end, "{} is not <= {}", self.start, end);
Match { end, ..*self }
}
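
The effect of the added assertion messages can be tried with a small stand-in type. `Span` below is a hypothetical substitute for `grep_matcher::Match`, reduced to the two methods touched in this hunk.

```rust
/// Stand-in for a match span (hypothetical; mirrors only the two methods
/// shown in the diff).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

impl Span {
    /// Panics with the offending values if `start > self.end`.
    fn with_start(&self, start: usize) -> Span {
        assert!(start <= self.end, "{} is not <= {}", start, self.end);
        Span { start, ..*self }
    }

    /// Panics with the offending values if `self.start > end`.
    fn with_end(&self, end: usize) -> Span {
        assert!(self.start <= end, "{} is not <= {}", self.start, end);
        Span { end, ..*self }
    }
}
```

Including both offsets in the message means a failing assertion reports which bound was violated without needing a debugger.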

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-pcre2"
version = "0.1.5" #:version
version = "0.1.6" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Use PCRE2 with the 'grep' crate.
@@ -10,9 +10,10 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/pcre2"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/pcre2"
readme = "README.md"
keywords = ["regex", "grep", "pcre", "backreference", "look"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
grep-matcher = { version = "0.1.5", path = "../matcher" }
pcre2 = "0.2.3"
grep-matcher = { version = "0.1.6", path = "../matcher" }
log = "0.4.19"
pcre2 = "0.2.4"

View File

@@ -11,6 +11,8 @@ pub struct RegexMatcherBuilder {
builder: RegexBuilder,
case_smart: bool,
word: bool,
fixed_strings: bool,
whole_line: bool,
}
impl RegexMatcherBuilder {
@@ -20,6 +22,8 @@ impl RegexMatcherBuilder {
builder: RegexBuilder::new(),
case_smart: false,
word: false,
fixed_strings: false,
whole_line: false,
}
}
@@ -29,17 +33,40 @@ impl RegexMatcherBuilder {
/// If there was a problem compiling the pattern, then an error is
/// returned.
pub fn build(&self, pattern: &str) -> Result<RegexMatcher, Error> {
self.build_many(&[pattern])
}
/// Compile all of the given patterns into a single regex that matches when
/// at least one of the patterns matches.
///
/// If there was a problem building the regex, then an error is returned.
pub fn build_many<P: AsRef<str>>(
&self,
patterns: &[P],
) -> Result<RegexMatcher, Error> {
let mut builder = self.builder.clone();
if self.case_smart && !has_uppercase_literal(pattern) {
let mut pats = Vec::with_capacity(patterns.len());
for p in patterns.iter() {
pats.push(if self.fixed_strings {
format!("(?:{})", pcre2::escape(p.as_ref()))
} else {
format!("(?:{})", p.as_ref())
});
}
let mut singlepat = pats.join("|");
if self.case_smart && !has_uppercase_literal(&singlepat) {
builder.caseless(true);
}
let res = if self.word {
let pattern = format!(r"(?<!\w)(?:{})(?!\w)", pattern);
builder.build(&pattern)
} else {
builder.build(pattern)
};
res.map_err(Error::regex).map(|regex| {
if self.whole_line {
singlepat = format!(r"(?m:^)(?:{})(?m:$)", singlepat);
} else if self.word {
// We make this option exclusive with whole_line because when
// whole_line is enabled, all matches necessarily fall on word
// boundaries. So this extra goop is strictly redundant.
singlepat = format!(r"(?<!\w)(?:{})(?!\w)", singlepat);
}
log::trace!("final regex: {:?}", singlepat);
builder.build(&singlepat).map_err(Error::regex).map(|regex| {
let mut names = HashMap::new();
for (i, name) in regex.capture_names().iter().enumerate() {
if let Some(ref name) = *name {
@@ -144,6 +171,21 @@ impl RegexMatcherBuilder {
self
}
/// Whether the patterns should be treated as literal strings or not. When
/// this is active, all characters, including ones that would normally be
/// special regex meta characters, are matched literally.
pub fn fixed_strings(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.fixed_strings = yes;
self
}
/// Whether each pattern should match the entire line or not. This is
/// equivalent to surrounding the pattern with `(?m:^)` and `(?m:$)`.
pub fn whole_line(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.whole_line = yes;
self
}
/// Enable Unicode matching mode.
///
/// When enabled, the following patterns become Unicode aware: `\b`, `\B`,
@@ -178,23 +220,22 @@ impl RegexMatcherBuilder {
self
}
/// When UTF matching mode is enabled, this will disable the UTF checking
/// that PCRE2 will normally perform automatically. If UTF matching mode
/// is not enabled, then this has no effect.
/// This is now deprecated and is a no-op.
///
/// UTF checking is enabled by default when UTF matching mode is enabled.
/// If UTF matching mode is enabled and UTF checking is enabled, then PCRE2
/// will return an error if you attempt to search a subject string that is
/// not valid UTF-8.
/// Previously, this option permitted disabling PCRE2's UTF-8 validity
/// check, which could result in undefined behavior if the haystack was
/// not valid UTF-8. But PCRE2 introduced a new option, `PCRE2_MATCH_INVALID_UTF`,
/// in 10.34 which this crate always sets. When this option is enabled,
/// PCRE2 claims to not have undefined behavior when the haystack is
/// invalid UTF-8.
///
/// # Safety
///
/// It is undefined behavior to disable the UTF check in UTF matching mode
/// and search a subject string that is not valid UTF-8. When the UTF check
/// is disabled, callers must guarantee that the subject string is valid
/// UTF-8.
pub unsafe fn disable_utf_check(&mut self) -> &mut RegexMatcherBuilder {
self.builder.disable_utf_check();
/// Therefore, disabling the UTF-8 check is not something that is exposed
/// by this crate.
#[deprecated(
since = "0.2.4",
note = "now a no-op due to new PCRE2 features"
)]
pub fn disable_utf_check(&mut self) -> &mut RegexMatcherBuilder {
self
}
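
The pattern assembly in `build_many` above can be sketched with plain string operations. This is a simplified illustration: `combine` is a hypothetical name, the `fixed_strings` escaping step (done with `pcre2::escape` in the diff) is left out, and no regex is compiled.

```rust
/// Joins patterns into one alternation and applies the same wrappers as
/// the diff. (Hypothetical helper; escaping of fixed strings is omitted.)
fn combine(patterns: &[&str], whole_line: bool, word: bool) -> String {
    // Each pattern gets its own non-capturing group so that `|` inside one
    // pattern cannot leak into its neighbours.
    let alts: Vec<String> =
        patterns.iter().map(|p| format!("(?:{})", p)).collect();
    let mut single = alts.join("|");
    if whole_line {
        single = format!(r"(?m:^)(?:{})(?m:$)", single);
    } else if word {
        // Only applied when whole_line is off, as in the diff.
        single = format!(r"(?<!\w)(?:{})(?!\w)", single);
    }
    single
}
```

For example, `combine(&["foo", "bar"], true, false)` yields a single pattern anchored at both line boundaries around the whole alternation rather than around each alternative.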

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-printer"
version = "0.1.6" #:version
version = "0.1.7" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
An implementation of the grep crate's Sink trait that provides standard
@@ -11,7 +11,7 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/printer"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/printer"
readme = "README.md"
keywords = ["grep", "pattern", "print", "printer", "sink"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[features]
@@ -19,13 +19,13 @@ default = ["serde1"]
serde1 = ["base64", "serde", "serde_json"]
[dependencies]
base64 = { version = "0.13.0", optional = true }
bstr = "0.2.0"
grep-matcher = { version = "0.1.5", path = "../matcher" }
grep-searcher = { version = "0.1.8", path = "../searcher" }
base64 = { version = "0.20.0", optional = true }
bstr = "1.6.0"
grep-matcher = { version = "0.1.6", path = "../matcher" }
grep-searcher = { version = "0.1.11", path = "../searcher" }
termcolor = "1.0.4"
serde = { version = "1.0.77", optional = true, features = ["derive"] }
serde_json = { version = "1.0.27", optional = true }
[dev-dependencies]
grep-regex = { version = "0.1.9", path = "../regex" }
grep-regex = { version = "0.1.11", path = "../regex" }

View File

@@ -147,7 +147,7 @@ impl JSONBuilder {
/// is not limited to UTF-8 exclusively, which in turn implies that matches
/// may be reported that contain invalid UTF-8. Moreover, this printer may
/// also print file paths, and the encoding of file paths is itself not
/// guarnateed to be valid UTF-8. Therefore, this printer must deal with the
/// guaranteed to be valid UTF-8. Therefore, this printer must deal with the
/// presence of invalid UTF-8 somehow. The printer could silently ignore such
/// things completely, or even lossily transcode invalid UTF-8 to valid UTF-8
/// by replacing all invalid sequences with the Unicode replacement character.

View File

@@ -1594,7 +1594,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
/// multiple lines.
///
/// Note that this doesn't just return whether the searcher is in multi
/// line mode, but also checks if the mater can match over multiple lines.
/// line mode, but also checks if the matcher can match over multiple lines.
/// If it can't, then we don't need multi line handling, even if the
/// searcher has multi line mode enabled.
fn multi_line(&self) -> bool {

View File

@@ -508,7 +508,7 @@ impl<'p, 's, M: Matcher, W: WriteColor> SummarySink<'p, 's, M, W> {
/// multiple lines.
///
/// Note that this doesn't just return whether the searcher is in multi
/// line mode, but also checks if the mater can match over multiple lines.
/// line mode, but also checks if the matcher can match over multiple lines.
/// If it can't, then we don't need multi line handling, even if the
/// searcher has multi line mode enabled.
fn multi_line(&self, searcher: &Searcher) -> bool {

View File

@@ -82,26 +82,26 @@ impl<M: Matcher> Replacer<M> {
dst.clear();
matches.clear();
matcher
.replace_with_captures_at(
subject,
range.start,
caps,
dst,
|caps, dst| {
let start = dst.len();
caps.interpolate(
|name| matcher.capture_index(name),
subject,
replacement,
dst,
);
let end = dst.len();
matches.push(Match::new(start, end));
true
},
)
.map_err(io::Error::error_message)?;
replace_with_captures_in_context(
matcher,
subject,
range.clone(),
caps,
dst,
|caps, dst| {
let start = dst.len();
caps.interpolate(
|name| matcher.capture_index(name),
subject,
replacement,
dst,
);
let end = dst.len();
matches.push(Match::new(start, end));
true
},
)
.map_err(io::Error::error_message)?;
}
Ok(())
}
@@ -458,3 +458,33 @@ pub fn trim_line_terminator(
*line = line.with_end(end);
}
}
/// Like `Matcher::replace_with_captures_at`, but accepts an end bound.
///
/// See also: `find_iter_at_in_context` for why we need this.
fn replace_with_captures_in_context<M, F>(
matcher: M,
bytes: &[u8],
range: std::ops::Range<usize>,
caps: &mut M::Captures,
dst: &mut Vec<u8>,
mut append: F,
) -> Result<(), M::Error>
where
M: Matcher,
F: FnMut(&M::Captures, &mut Vec<u8>) -> bool,
{
let mut last_match = range.start;
matcher.captures_iter_at(bytes, range.start, caps, |caps| {
let m = caps.get(0).unwrap();
if m.start() >= range.end {
return false;
}
dst.extend(&bytes[last_match..m.start()]);
last_match = m.end();
append(caps, dst)
})?;
let end = std::cmp::min(bytes.len(), range.end);
dst.extend(&bytes[last_match..end]);
Ok(())
}
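
The idea behind `replace_with_captures_in_context` is that the search may see bytes past the end of the range, but only matches starting inside the range are replaced, and output stops at the range end. A rough standalone sketch follows, assuming a plain byte-substring search in place of the `Matcher` trait; `replace_in_range` is a hypothetical name, and the guard for a match that runs past the range end is an addition of this sketch.

```rust
use std::ops::Range;

/// Replaces occurrences of `needle` that start inside `range`, copying
/// everything else in `range` unchanged. (Hypothetical simplification:
/// a substring search stands in for the regex matcher.)
fn replace_in_range(
    bytes: &[u8],
    range: Range<usize>,
    needle: &[u8],
    replacement: &[u8],
) -> Vec<u8> {
    let mut dst = Vec::new();
    let mut last_match = range.start;
    let mut at = range.start;
    while !needle.is_empty()
        && at < range.end
        && at + needle.len() <= bytes.len()
    {
        if &bytes[at..at + needle.len()] == needle {
            // Copy the unmatched gap, then the replacement.
            dst.extend_from_slice(&bytes[last_match..at]);
            dst.extend_from_slice(replacement);
            at += needle.len();
            last_match = at;
        } else {
            at += 1;
        }
    }
    let end = std::cmp::min(bytes.len(), range.end);
    // Guard added in this sketch: a match may end beyond the range.
    if last_match < end {
        dst.extend_from_slice(&bytes[last_match..end]);
    }
    dst
}
```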

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-regex"
version = "0.1.9" #:version
version = "0.1.11" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Use Rust's regex library with the 'grep' crate.
@@ -10,14 +10,13 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/regex"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/regex"
readme = "README.md"
keywords = ["regex", "grep", "search", "pattern", "line"]
license = "Unlicense/MIT"
edition = "2018"
license = "Unlicense OR MIT"
edition = "2021"
[dependencies]
aho-corasick = "0.7.3"
bstr = "0.2.10"
grep-matcher = { version = "0.1.5", path = "../matcher" }
log = "0.4.5"
regex = "1.1"
regex-syntax = "0.6.5"
thread_local = "1.1.2"
aho-corasick = "1.0.2"
bstr = "1.6.0"
grep-matcher = { version = "0.1.6", path = "../matcher" }
log = "0.4.19"
regex-automata = { version = "0.3.0" }
regex-syntax = "0.7.2"

View File

@@ -1,17 +1,13 @@
use regex_syntax::ast::parse::Parser;
use regex_syntax::ast::{self, Ast};
/// The results of analyzing AST of a regular expression (e.g., for supporting
/// smart case).
#[derive(Clone, Debug)]
pub struct AstAnalysis {
pub(crate) struct AstAnalysis {
/// True if and only if a literal uppercase character occurs in the regex.
any_uppercase: bool,
/// True if and only if the regex contains any literal at all.
any_literal: bool,
/// True if and only if the regex consists entirely of a literal and no
/// other special regex characters.
all_verbatim_literal: bool,
}
impl AstAnalysis {
@@ -19,16 +15,16 @@ impl AstAnalysis {
///
/// If `pattern` is not a valid regular expression, then `None` is
/// returned.
#[allow(dead_code)]
pub fn from_pattern(pattern: &str) -> Option<AstAnalysis> {
Parser::new()
#[cfg(test)]
pub(crate) fn from_pattern(pattern: &str) -> Option<AstAnalysis> {
regex_syntax::ast::parse::Parser::new()
.parse(pattern)
.map(|ast| AstAnalysis::from_ast(&ast))
.ok()
}
/// Perform an AST analysis given the AST.
pub fn from_ast(ast: &Ast) -> AstAnalysis {
pub(crate) fn from_ast(ast: &Ast) -> AstAnalysis {
let mut analysis = AstAnalysis::new();
analysis.from_ast_impl(ast);
analysis
@@ -40,7 +36,7 @@ impl AstAnalysis {
/// For example, a pattern like `\pL` contains no uppercase literals,
/// even though `L` is uppercase and the `\pL` class contains uppercase
/// characters.
pub fn any_uppercase(&self) -> bool {
pub(crate) fn any_uppercase(&self) -> bool {
self.any_uppercase
}
@@ -48,32 +44,13 @@ impl AstAnalysis {
///
/// For example, a pattern like `\pL` reports `false`, but a pattern like
/// `\pLfoo` reports `true`.
pub fn any_literal(&self) -> bool {
pub(crate) fn any_literal(&self) -> bool {
self.any_literal
}
/// Returns true if and only if the entire pattern is a verbatim literal
/// with no special meta characters.
///
/// When this is true, then the pattern satisfies the following law:
/// `escape(pattern) == pattern`. Notable examples where this returns
/// `false` include patterns like `a\u0061` even though `\u0061` is just
/// a literal `a`.
///
/// The purpose of this flag is to determine whether the patterns can be
/// given to non-regex substring search algorithms as-is.
#[allow(dead_code)]
pub fn all_verbatim_literal(&self) -> bool {
self.all_verbatim_literal
}
/// Creates a new `AstAnalysis` value with an initial configuration.
fn new() -> AstAnalysis {
AstAnalysis {
any_uppercase: false,
any_literal: false,
all_verbatim_literal: true,
}
AstAnalysis { any_uppercase: false, any_literal: false }
}
fn from_ast_impl(&mut self, ast: &Ast) {
@@ -86,26 +63,20 @@ impl AstAnalysis {
| Ast::Dot(_)
| Ast::Assertion(_)
| Ast::Class(ast::Class::Unicode(_))
| Ast::Class(ast::Class::Perl(_)) => {
self.all_verbatim_literal = false;
}
| Ast::Class(ast::Class::Perl(_)) => {}
Ast::Literal(ref x) => {
self.from_ast_literal(x);
}
Ast::Class(ast::Class::Bracketed(ref x)) => {
self.all_verbatim_literal = false;
self.from_ast_class_set(&x.kind);
}
Ast::Repetition(ref x) => {
self.all_verbatim_literal = false;
self.from_ast_impl(&x.ast);
}
Ast::Group(ref x) => {
self.all_verbatim_literal = false;
self.from_ast_impl(&x.ast);
}
Ast::Alternation(ref alt) => {
self.all_verbatim_literal = false;
for x in &alt.asts {
self.from_ast_impl(x);
}
@@ -161,9 +132,6 @@ impl AstAnalysis {
}
fn from_ast_literal(&mut self, ast: &ast::Literal) {
if ast.kind != ast::LiteralKind::Verbatim {
self.all_verbatim_literal = false;
}
self.any_literal = true;
self.any_uppercase = self.any_uppercase || ast.c.is_uppercase();
}
@@ -171,7 +139,7 @@ impl AstAnalysis {
/// Returns true if and only if the attributes can never change no matter
/// what other AST it might see.
fn done(&self) -> bool {
self.any_uppercase && self.any_literal && !self.all_verbatim_literal
self.any_uppercase && self.any_literal
}
}
@@ -188,76 +156,61 @@ mod tests {
let x = analysis("");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("foo");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("Foo");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("foO");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis(r"foo\\");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\w");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\S");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\p{Ll}");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[a-z]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[A-Z]");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[\S\t]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\\S");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"\p{Ll}");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"aBc\w");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"a\u0061");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
}
}

View File

@@ -1,15 +1,16 @@
use grep_matcher::{ByteSet, LineTerminator};
use regex::bytes::{Regex, RegexBuilder};
use regex_syntax::ast::{self, Ast};
use regex_syntax::hir::{self, Hir};
use {
grep_matcher::{ByteSet, LineTerminator},
regex_automata::meta::Regex,
regex_syntax::{
ast,
hir::{self, Hir, HirKind},
},
};
use crate::ast::AstAnalysis;
use crate::crlf::crlfify;
use crate::error::Error;
use crate::literal::LiteralSets;
use crate::multi::alternation_literals;
use crate::non_matching::non_matching_bytes;
use crate::strip::strip_from_match;
use crate::{
ast::AstAnalysis, error::Error, non_matching::non_matching_bytes,
strip::strip_from_match,
};
/// Config represents the configuration of a regex matcher in this crate.
/// The configuration is itself a rough combination of the knobs found in
@@ -21,21 +22,23 @@ use crate::strip::strip_from_match;
/// configuration which generated it, and provides transformation on that HIR
/// such that the configuration is preserved.
#[derive(Clone, Debug)]
pub struct Config {
pub case_insensitive: bool,
pub case_smart: bool,
pub multi_line: bool,
pub dot_matches_new_line: bool,
pub swap_greed: bool,
pub ignore_whitespace: bool,
pub unicode: bool,
pub octal: bool,
pub size_limit: usize,
pub dfa_size_limit: usize,
pub nest_limit: u32,
pub line_terminator: Option<LineTerminator>,
pub crlf: bool,
pub word: bool,
pub(crate) struct Config {
pub(crate) case_insensitive: bool,
pub(crate) case_smart: bool,
pub(crate) multi_line: bool,
pub(crate) dot_matches_new_line: bool,
pub(crate) swap_greed: bool,
pub(crate) ignore_whitespace: bool,
pub(crate) unicode: bool,
pub(crate) octal: bool,
pub(crate) size_limit: usize,
pub(crate) dfa_size_limit: usize,
pub(crate) nest_limit: u32,
pub(crate) line_terminator: Option<LineTerminator>,
pub(crate) crlf: bool,
pub(crate) word: bool,
pub(crate) fixed_strings: bool,
pub(crate) whole_line: bool,
}
impl Default for Config {
@@ -50,47 +53,28 @@ impl Default for Config {
unicode: true,
octal: false,
// These size limits are much bigger than what's in the regex
// crate.
// crate by default.
size_limit: 100 * (1 << 20),
dfa_size_limit: 1000 * (1 << 20),
nest_limit: 250,
line_terminator: None,
crlf: false,
word: false,
fixed_strings: false,
whole_line: false,
}
}
}
impl Config {
/// Parse the given pattern and returned its HIR expression along with
/// the current configuration.
///
/// If there was a problem parsing the given expression then an error
/// is returned.
pub fn hir(&self, pattern: &str) -> Result<ConfiguredHIR, Error> {
let ast = self.ast(pattern)?;
let analysis = self.analysis(&ast)?;
let expr = hir::translate::TranslatorBuilder::new()
.allow_invalid_utf8(true)
.case_insensitive(self.is_case_insensitive(&analysis))
.multi_line(self.multi_line)
.dot_matches_new_line(self.dot_matches_new_line)
.swap_greed(self.swap_greed)
.unicode(self.unicode)
.build()
.translate(pattern, &ast)
.map_err(Error::regex)?;
let expr = match self.line_terminator {
None => expr,
Some(line_term) => strip_from_match(expr, line_term)?,
};
Ok(ConfiguredHIR {
original: pattern.to_string(),
config: self.clone(),
analysis,
// If CRLF mode is enabled, replace `$` with `(?:\r?$)`.
expr: if self.crlf { crlfify(expr) } else { expr },
})
/// Use this configuration to build an HIR from the given patterns. The HIR
/// returned corresponds to a single regex that is an alternation of the
/// patterns given.
pub(crate) fn build_many<P: AsRef<str>>(
&self,
patterns: &[P],
) -> Result<ConfiguredHIR, Error> {
ConfiguredHIR::new(self.clone(), patterns)
}
/// Accounting for the `smart_case` config knob, return true if and only if
@@ -105,35 +89,55 @@ impl Config {
analysis.any_literal() && !analysis.any_uppercase()
}
/// Returns true if and only if this config is simple enough such that
/// if the pattern is a simple alternation of literals, then it can be
/// constructed via a plain Aho-Corasick automaton.
/// Returns whether the given patterns should be treated as "fixed strings"
/// literals. This is different from just querying the `fixed_strings` knob
/// in that if the knob is false, this will still return true in some cases
/// if the patterns are themselves indistinguishable from literals.
///
/// Note that it is OK to return true even when settings like `multi_line`
/// are enabled, since if multi-line can impact the match semantics of a
/// regex, then it is by definition not a simple alternation of literals.
pub fn can_plain_aho_corasick(&self) -> bool {
!self.word && !self.case_insensitive && !self.case_smart
}
/// Perform analysis on the AST of this pattern.
///
/// This returns an error if the given pattern failed to parse.
fn analysis(&self, ast: &Ast) -> Result<AstAnalysis, Error> {
Ok(AstAnalysis::from_ast(ast))
}
/// Parse the given pattern into its abstract syntax.
///
/// This returns an error if the given pattern failed to parse.
fn ast(&self, pattern: &str) -> Result<Ast, Error> {
ast::parse::ParserBuilder::new()
.nest_limit(self.nest_limit)
.octal(self.octal)
.ignore_whitespace(self.ignore_whitespace)
.build()
.parse(pattern)
.map_err(Error::regex)
/// The main idea here is that if this returns true, then it is safe
/// to build a `regex_syntax::hir::Hir` value directly from the given
/// patterns as an alternation of `hir::Literal` values.
fn is_fixed_strings<P: AsRef<str>>(&self, patterns: &[P]) -> bool {
// When these are enabled, we really need to parse the patterns and
// let them go through the standard HIR translation process in order
// for case folding transforms to be applied.
if self.case_insensitive || self.case_smart {
return false;
}
// Even if whole_line or word is enabled, both of those things can
// be implemented by wrapping the Hir generated by an alternation of
// fixed string literals. So for here at least, we don't care about the
// word or whole_line settings.
if self.fixed_strings {
// ... but if any literal contains a line terminator, then we've
// got to bail out because this will ultimately result in an error.
if let Some(lineterm) = self.line_terminator {
for p in patterns.iter() {
if has_line_terminator(lineterm, p.as_ref()) {
return false;
}
}
}
return true;
}
// In this case, the only way we can hand construct the Hir is if none
// of the patterns contain meta characters. If they do, then we need to
// send them through the standard parsing/translation process.
for p in patterns.iter() {
let p = p.as_ref();
if p.chars().any(regex_syntax::is_meta_character) {
return false;
}
// Same deal as when fixed_strings is set above. If the pattern has
// a line terminator anywhere, then we need to bail out and let
// an error occur.
if let Some(lineterm) = self.line_terminator {
if has_line_terminator(lineterm, p) {
return false;
}
}
}
true
}
}
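
The decision made by `is_fixed_strings` can be condensed into a standalone sketch. Several parts are assumptions made for illustration: `Knobs` is a hypothetical subset of `Config`, the line terminator is reduced to a single byte (the real `LineTerminator` type also covers CRLF), and `is_meta` is a hand-written approximation of `regex_syntax::is_meta_character`, not that function itself.

```rust
/// Approximation of the regex meta character set (assumed list).
fn is_meta(c: char) -> bool {
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{'
            | '}' | '^' | '$' | '#' | '&' | '-' | '~'
    )
}

/// Hypothetical subset of the configuration consulted by the check.
#[derive(Default)]
struct Knobs {
    case_insensitive: bool,
    case_smart: bool,
    fixed_strings: bool,
    line_terminator: Option<u8>,
}

/// True when every pattern can be used as a literal without parsing.
fn is_fixed_strings(k: &Knobs, patterns: &[&str]) -> bool {
    // Case folding requires the normal parse/translate path.
    if k.case_insensitive || k.case_smart {
        return false;
    }
    // A literal containing the line terminator must go through the normal
    // path so that the usual error is produced.
    let has_term = |p: &str| {
        k.line_terminator.map_or(false, |b| p.as_bytes().contains(&b))
    };
    patterns.iter().all(|&p| {
        !has_term(p) && (k.fixed_strings || !p.chars().any(is_meta))
    })
}
```

The word and whole-line settings are deliberately absent, since the diff notes they can be applied by wrapping the literal alternation afterwards.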
@@ -149,140 +153,268 @@ impl Config {
/// size limits set on the configured HIR will be propagated out to any
/// subsequently constructed HIR or regular expression.
#[derive(Clone, Debug)]
pub struct ConfiguredHIR {
original: String,
pub(crate) struct ConfiguredHIR {
config: Config,
analysis: AstAnalysis,
expr: Hir,
hir: Hir,
}
impl ConfiguredHIR {
/// Return the configuration for this HIR expression.
pub fn config(&self) -> &Config {
/// Parse the given patterns into a single HIR expression that represents
/// an alternation of the patterns given.
fn new<P: AsRef<str>>(
config: Config,
patterns: &[P],
) -> Result<ConfiguredHIR, Error> {
let hir = if config.is_fixed_strings(patterns) {
let mut alts = vec![];
for p in patterns.iter() {
alts.push(Hir::literal(p.as_ref().as_bytes()));
}
log::debug!(
"assembling HIR from {} fixed string literals",
alts.len()
);
let hir = Hir::alternation(alts);
hir
} else {
let mut alts = vec![];
for p in patterns.iter() {
alts.push(if config.fixed_strings {
format!("(?:{})", regex_syntax::escape(p.as_ref()))
} else {
format!("(?:{})", p.as_ref())
});
}
let pattern = alts.join("|");
let ast = ast::parse::ParserBuilder::new()
.nest_limit(config.nest_limit)
.octal(config.octal)
.ignore_whitespace(config.ignore_whitespace)
.build()
.parse(&pattern)
.map_err(Error::generic)?;
let analysis = AstAnalysis::from_ast(&ast);
let mut hir = hir::translate::TranslatorBuilder::new()
.utf8(false)
.case_insensitive(config.is_case_insensitive(&analysis))
.multi_line(config.multi_line)
.dot_matches_new_line(config.dot_matches_new_line)
.crlf(config.crlf)
.swap_greed(config.swap_greed)
.unicode(config.unicode)
.build()
.translate(&pattern, &ast)
.map_err(Error::generic)?;
// We don't need to do this for the fixed-strings case above
// because is_fixed_strings will return false if any pattern
// contains a line terminator. Therefore, we don't need to strip
// it.
//
// We go to some pains to avoid doing this in the fixed-strings
// case because this can result in building a new HIR when ripgrep
// is given a huge set of literals to search for. And this can
// actually take a little time. It's not huge, but it's noticeable.
hir = match config.line_terminator {
None => hir,
Some(line_term) => strip_from_match(hir, line_term)?,
};
hir
};
Ok(ConfiguredHIR { config, hir })
}
/// Return a reference to the underlying configuration.
pub(crate) fn config(&self) -> &Config {
&self.config
}
/// Compute the set of non-matching bytes for this HIR expression.
pub fn non_matching_bytes(&self) -> ByteSet {
non_matching_bytes(&self.expr)
/// Return a reference to the underlying HIR.
pub(crate) fn hir(&self) -> &Hir {
&self.hir
}
/// Returns true if and only if this regex needs to have its match offsets
/// tweaked because of CRLF support. Specifically, this occurs when the
/// CRLF hack is enabled and the regex is line anchored at the end. In
/// this case, matches that end with a `\r` have the `\r` stripped.
pub fn needs_crlf_stripped(&self) -> bool {
self.config.crlf && self.expr.is_line_anchored_end()
}
/// Builds a regular expression from this HIR expression.
pub fn regex(&self) -> Result<Regex, Error> {
self.pattern_to_regex(&self.expr.to_string())
}
/// If this HIR corresponds to an alternation of literals with no
/// capturing groups, then this returns those literals.
pub fn alternation_literals(&self) -> Option<Vec<Vec<u8>>> {
if !self.config.can_plain_aho_corasick() {
return None;
}
alternation_literals(&self.expr)
}
/// Applies the given function to the concrete syntax of this HIR and then
/// generates a new HIR based on the result of the function in a way that
/// preserves the configuration.
///
/// For example, this can be used to wrap a user provided regular
/// expression with additional semantics. e.g., See the `WordMatcher`.
pub fn with_pattern<F: FnMut(&str) -> String>(
&self,
mut f: F,
) -> Result<ConfiguredHIR, Error> {
self.pattern_to_hir(&f(&self.expr.to_string()))
}
/// If the current configuration has a line terminator set and if useful
/// literals could be extracted, then a regular expression matching those
/// literals is returned. If no line terminator is set, then `None` is
/// returned.
///
/// If compiling the resulting regular expression failed, then an error
/// is returned.
///
/// This method only returns something when a line terminator is set
/// because matches from this regex are generally candidates that must be
/// confirmed before reporting a match. When performing a line oriented
/// search, confirmation is easy: just extend the candidate match to its
/// respective line boundaries and then re-search that line for a full
/// match. This only works when the line terminator is set because the line
/// terminator setting guarantees that the regex itself can never match
/// through the line terminator byte.
pub fn fast_line_regex(&self) -> Result<Option<Regex>, Error> {
if self.config.line_terminator.is_none() {
return Ok(None);
}
match LiteralSets::new(&self.expr).one_regex(self.config.word) {
None => Ok(None),
Some(pattern) => self.pattern_to_regex(&pattern).map(Some),
}
}
/// Create a regex from the given pattern using this HIR's configuration.
fn pattern_to_regex(&self, pattern: &str) -> Result<Regex, Error> {
// The settings we explicitly set here are intentionally a subset
// of the settings we have. The key point here is that our HIR
// expression is computed with the settings in mind, such that setting
// them here could actually lead to unintended behavior. For example,
// consider the pattern `(?U)a+`. This will get folded into the HIR
// as a non-greedy repetition operator which will in turn get printed
// to the concrete syntax as `a+?`, which is correct. But if we
// set the `swap_greed` option again, then we'll wind up with `(?U)a+?`
// which is equal to `a+` which is not the same as what we were given.
//
// We also don't need to apply `case_insensitive` since this gets
// folded into the HIR and would just cause us to do redundant work.
//
// Finally, we don't need to set `ignore_whitespace` since the concrete
// syntax emitted by the HIR printer never needs it.
//
// We set the rest of the options. Some of them are important, such as
// the size limit, and some of them are necessary to preserve the
// intention of the original pattern. For example, the Unicode flag
// will impact how the WordMatcher functions, namely, whether its
// word boundaries are Unicode aware or not.
RegexBuilder::new(&pattern)
.nest_limit(self.config.nest_limit)
.octal(self.config.octal)
.multi_line(self.config.multi_line)
.dot_matches_new_line(self.config.dot_matches_new_line)
.unicode(self.config.unicode)
.size_limit(self.config.size_limit)
.dfa_size_limit(self.config.dfa_size_limit)
.build()
/// Convert this HIR to a regex that can be used for matching.
pub(crate) fn to_regex(&self) -> Result<Regex, Error> {
let meta = Regex::config()
.utf8_empty(false)
.nfa_size_limit(Some(self.config.size_limit))
// We don't expose a knob for this because the one-pass DFA is
// usually not a perf bottleneck for ripgrep. But we give it a bit
// more room than the default.
.onepass_size_limit(Some(10 * (1 << 20)))
// Same deal here. The default limit for full DFAs is VERY small,
// but with ripgrep we can afford to spend a bit more time on
// building them I think.
.dfa_size_limit(Some(1 * (1 << 20)))
.dfa_state_limit(Some(1_000))
.hybrid_cache_capacity(self.config.dfa_size_limit);
Regex::builder()
.configure(meta)
.build_from_hir(&self.hir)
.map_err(Error::regex)
}
/// Create an HIR expression from the given pattern using this HIR's
/// configuration.
fn pattern_to_hir(&self, pattern: &str) -> Result<ConfiguredHIR, Error> {
// See `pattern_to_regex` comment for explanation of why we only set
// a subset of knobs here. e.g., `swap_greed` is explicitly left out.
let expr = ::regex_syntax::ParserBuilder::new()
.nest_limit(self.config.nest_limit)
.octal(self.config.octal)
.allow_invalid_utf8(true)
.multi_line(self.config.multi_line)
.dot_matches_new_line(self.config.dot_matches_new_line)
.unicode(self.config.unicode)
.build()
.parse(pattern)
.map_err(Error::regex)?;
Ok(ConfiguredHIR {
original: self.original.clone(),
config: self.config.clone(),
analysis: self.analysis.clone(),
expr,
})
/// Compute the set of non-matching bytes for this HIR expression.
pub(crate) fn non_matching_bytes(&self) -> ByteSet {
non_matching_bytes(&self.hir)
}
/// Returns the line terminator configured on this expression.
///
/// When we have beginning/end anchors (NOT line anchors), the fast line
/// searching path isn't quite correct. Or at least, doesn't match the slow
/// path. Namely, the slow path strips line terminators while the fast path
/// does not. Since '$' (when multi-line mode is disabled) doesn't match at
/// line boundaries, the existence of a line terminator might cause it to
/// not match when it otherwise would with the line terminator stripped.
///
/// Since searching with text anchors is exceptionally rare in the context
/// of line oriented searching (multi-line mode is basically always
/// enabled), we just disable this optimization when there are text
/// anchors. We disable it by not returning a line terminator, since
/// without a line terminator, the fast search path can't be executed.
///
/// Actually, the above is no longer quite correct. Later on, another
/// optimization was added where if the line terminator was in the set of
/// bytes that was guaranteed to never be part of a match, then the higher
/// level search infrastructure assumes that the fast line-by-line search
/// path can still be taken. This optimization applies when multi-line
/// search (not multi-line mode) is enabled. In that case, there is no
/// configured line terminator since the regex is permitted to match a
/// line terminator. But if the regex is guaranteed to never match across
/// multiple lines despite multi-line search being requested, we can still
/// do the faster and more flexible line-by-line search. This is why the
/// non-matching extraction routine removes `\n` when `\A` and `\z` are
/// present even though that's not quite correct...
///
/// See: <https://github.com/BurntSushi/ripgrep/issues/2260>
pub(crate) fn line_terminator(&self) -> Option<LineTerminator> {
if self.hir.properties().look_set().contains_anchor_haystack() {
None
} else {
self.config.line_terminator
}
}
/// Turns this configured HIR into one that only matches when both sides of
/// the match correspond to a word boundary.
///
/// Note that the HIR returned is like turning `pat` into
/// `(?m:^|\W)(pat)(?m:$|\W)`. That is, the true match is at capture group
/// `1` and not `0`.
pub(crate) fn into_word(self) -> Result<ConfiguredHIR, Error> {
// In theory building the HIR for \W should never fail, but there are
// likely some pathological cases (particularly with respect to certain
// values of limits) where it could in theory fail.
let non_word = {
let mut config = self.config.clone();
config.fixed_strings = false;
ConfiguredHIR::new(config, &[r"\W"])?
};
let line_anchor_start = Hir::look(self.line_anchor_start());
let line_anchor_end = Hir::look(self.line_anchor_end());
let hir = Hir::concat(vec![
Hir::alternation(vec![line_anchor_start, non_word.hir.clone()]),
Hir::capture(hir::Capture {
index: 1,
name: None,
sub: Box::new(renumber_capture_indices(self.hir)?),
}),
Hir::alternation(vec![non_word.hir, line_anchor_end]),
]);
Ok(ConfiguredHIR { config: self.config, hir })
}
/// Turns this configured HIR into an equivalent one, but where it must
/// match at the start and end of a line.
pub(crate) fn into_whole_line(self) -> ConfiguredHIR {
let line_anchor_start = Hir::look(self.line_anchor_start());
let line_anchor_end = Hir::look(self.line_anchor_end());
let hir =
Hir::concat(vec![line_anchor_start, self.hir, line_anchor_end]);
ConfiguredHIR { config: self.config, hir }
}
/// Turns this configured HIR into an equivalent one, but where it must
/// match at the start and end of the haystack.
pub(crate) fn into_anchored(self) -> ConfiguredHIR {
let hir = Hir::concat(vec![
Hir::look(hir::Look::Start),
self.hir,
Hir::look(hir::Look::End),
]);
ConfiguredHIR { config: self.config, hir }
}
/// Returns the "start line" anchor for this configuration.
fn line_anchor_start(&self) -> hir::Look {
if self.config.crlf {
hir::Look::StartCRLF
} else {
hir::Look::StartLF
}
}
/// Returns the "end line" anchor for this configuration.
fn line_anchor_end(&self) -> hir::Look {
if self.config.crlf {
hir::Look::EndCRLF
} else {
hir::Look::EndLF
}
}
}
/// This increments the index of every capture group in the given hir by 1. If
/// any increment results in an overflow, then an error is returned.
fn renumber_capture_indices(hir: Hir) -> Result<Hir, Error> {
Ok(match hir.into_kind() {
HirKind::Empty => Hir::empty(),
HirKind::Literal(hir::Literal(lit)) => Hir::literal(lit),
HirKind::Class(cls) => Hir::class(cls),
HirKind::Look(x) => Hir::look(x),
HirKind::Repetition(mut x) => {
x.sub = Box::new(renumber_capture_indices(*x.sub)?);
Hir::repetition(x)
}
HirKind::Capture(mut cap) => {
cap.index = match cap.index.checked_add(1) {
Some(index) => index,
None => {
// This error message kind of sucks, but it's probably
// impossible for it to happen. The only way a capture
// index can overflow addition is if the regex is huge
// (or something else has gone horribly wrong).
let msg = "could not renumber capture index, too big";
return Err(Error::any(msg));
}
};
cap.sub = Box::new(renumber_capture_indices(*cap.sub)?);
Hir::capture(cap)
}
HirKind::Concat(subs) => {
let subs = subs
.into_iter()
.map(|sub| renumber_capture_indices(sub))
.collect::<Result<Vec<Hir>, Error>>()?;
Hir::concat(subs)
}
HirKind::Alternation(subs) => {
let subs = subs
.into_iter()
.map(|sub| renumber_capture_indices(sub))
.collect::<Result<Vec<Hir>, Error>>()?;
Hir::alternation(subs)
}
})
}
/// Returns true if the given literal string contains any byte from the line
/// terminator given.
fn has_line_terminator(lineterm: LineTerminator, literal: &str) -> bool {
if lineterm.is_crlf() {
literal.as_bytes().iter().copied().any(|b| b == b'\r' || b == b'\n')
} else {
literal.as_bytes().iter().copied().any(|b| b == lineterm.as_byte())
}
}
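
The capture renumbering that `into_word` relies on can be illustrated without `regex-syntax`. The following is a minimal sketch, assuming a toy `Node` tree in place of the real `Hir`; the names `Node`, `renumber`, and `wrap_in_group_one` are hypothetical and only mirror the shape of `renumber_capture_indices` above.

```rust
/// A toy stand-in for a regex HIR node (hypothetical; not the regex-syntax type).
#[derive(Debug, PartialEq)]
enum Node {
    Literal(String),
    Capture { index: u32, sub: Box<Node> },
    Concat(Vec<Node>),
}

/// Shift every capture index up by one, returning an error on overflow
/// instead of wrapping around.
fn renumber(node: Node) -> Result<Node, String> {
    Ok(match node {
        Node::Literal(s) => Node::Literal(s),
        Node::Capture { index, sub } => {
            let index = index.checked_add(1).ok_or_else(|| {
                "could not renumber capture index, too big".to_string()
            })?;
            Node::Capture { index, sub: Box::new(renumber(*sub)?) }
        }
        Node::Concat(subs) => Node::Concat(
            subs.into_iter().map(renumber).collect::<Result<Vec<_>, _>>()?,
        ),
    })
}

/// Wrap `pat` so that it becomes capture group 1 of a larger expression.
/// Existing groups inside `pat` are shifted first so that index 1 is free.
fn wrap_in_group_one(pat: Node) -> Result<Node, String> {
    Ok(Node::Capture { index: 1, sub: Box::new(renumber(pat)?) })
}
```

Calling `wrap_in_group_one` on a pattern that already contains group 1 yields an outer group 1 with the old group moved to index 2, which is why callers of the word matcher read the true match from group 1 rather than group 0.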

@@ -1,189 +0,0 @@
use std::collections::HashMap;
use grep_matcher::{Match, Matcher, NoError};
use regex::bytes::Regex;
use regex_syntax::hir::{self, Hir, HirKind};
use crate::config::ConfiguredHIR;
use crate::error::Error;
use crate::matcher::RegexCaptures;
/// A matcher that strips `\r` from the end of matches.
#[derive(Clone, Debug)]
pub struct CRLFMatcher {
/// The regex.
regex: Regex,
/// A map from capture group name to capture group index.
names: HashMap<String, usize>,
}
impl CRLFMatcher {
/// Create a new matcher from the given pattern that strips `\r` from the
/// end of every match.
///
/// This panics if the given expression doesn't need its CRLF stripped.
pub fn new(expr: &ConfiguredHIR) -> Result<CRLFMatcher, Error> {
assert!(expr.needs_crlf_stripped());
let regex = expr.regex()?;
let mut names = HashMap::new();
for (i, optional_name) in regex.capture_names().enumerate() {
if let Some(name) = optional_name {
names.insert(name.to_string(), i.checked_sub(1).unwrap());
}
}
Ok(CRLFMatcher { regex, names })
}
/// Return the underlying regex used by this matcher.
pub fn regex(&self) -> &Regex {
&self.regex
}
}
impl Matcher for CRLFMatcher {
type Captures = RegexCaptures;
type Error = NoError;
fn find_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, NoError> {
let m = match self.regex.find_at(haystack, at) {
None => return Ok(None),
Some(m) => Match::new(m.start(), m.end()),
};
Ok(Some(adjust_match(haystack, m)))
}
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
Ok(RegexCaptures::new(self.regex.capture_locations()))
}
fn capture_count(&self) -> usize {
self.regex.captures_len().checked_sub(1).unwrap()
}
fn capture_index(&self, name: &str) -> Option<usize> {
self.names.get(name).map(|i| *i)
}
fn captures_at(
&self,
haystack: &[u8],
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool, NoError> {
caps.strip_crlf(false);
let r =
self.regex.captures_read_at(caps.locations_mut(), haystack, at);
if !r.is_some() {
return Ok(false);
}
// If the end of our match includes a `\r`, then strip it from all
// capture groups ending at the same location.
let end = caps.locations().get(0).unwrap().1;
if end > 0 && haystack.get(end - 1) == Some(&b'\r') {
caps.strip_crlf(true);
}
Ok(true)
}
// We specifically do not implement other methods like find_iter or
// captures_iter. Namely, the iter methods are guaranteed to be correct
// by virtue of implementing find_at and captures_at above.
}
/// If the given match ends with a `\r`, then return a new match that ends
/// immediately before the `\r`.
pub fn adjust_match(haystack: &[u8], m: Match) -> Match {
if m.end() > 0 && haystack.get(m.end() - 1) == Some(&b'\r') {
m.with_end(m.end() - 1)
} else {
m
}
}
/// Substitutes all occurrences of multi-line enabled `$` with `(?:\r??$)`.
///
/// This does not preserve the exact semantics of the given expression,
/// however, it does have the useful property that anything that matched the
/// given expression will also match the returned expression. The difference is
/// that the returned expression can match possibly other things as well.
///
/// The principal reason why we do this is because the underlying regex engine
/// doesn't support CRLF aware `$` look-around. It's planned to fix it at that
/// level, but we perform this kludge in the meantime.
///
/// Note that while the match preserving semantics are nice and neat, the
/// match position semantics are quite a bit messier. Namely, `$` only ever
/// matches the position between characters whereas `\r??` can match a
/// character and change the offset. This is regrettable, but works out pretty
/// nicely in most cases, especially when a match is limited to a single line.
pub fn crlfify(expr: Hir) -> Hir {
match expr.into_kind() {
HirKind::Anchor(hir::Anchor::EndLine) => {
let concat = Hir::concat(vec![
Hir::repetition(hir::Repetition {
kind: hir::RepetitionKind::ZeroOrOne,
greedy: false,
hir: Box::new(Hir::literal(hir::Literal::Unicode('\r'))),
}),
Hir::anchor(hir::Anchor::EndLine),
]);
Hir::group(hir::Group {
kind: hir::GroupKind::NonCapturing,
hir: Box::new(concat),
})
}
HirKind::Empty => Hir::empty(),
HirKind::Literal(x) => Hir::literal(x),
HirKind::Class(x) => Hir::class(x),
HirKind::Anchor(x) => Hir::anchor(x),
HirKind::WordBoundary(x) => Hir::word_boundary(x),
HirKind::Repetition(mut x) => {
x.hir = Box::new(crlfify(*x.hir));
Hir::repetition(x)
}
HirKind::Group(mut x) => {
x.hir = Box::new(crlfify(*x.hir));
Hir::group(x)
}
HirKind::Concat(xs) => {
Hir::concat(xs.into_iter().map(crlfify).collect())
}
HirKind::Alternation(xs) => {
Hir::alternation(xs.into_iter().map(crlfify).collect())
}
}
}
#[cfg(test)]
mod tests {
use super::crlfify;
use regex_syntax::Parser;
fn roundtrip(pattern: &str) -> String {
let expr1 = Parser::new().parse(pattern).unwrap();
let expr2 = crlfify(expr1);
expr2.to_string()
}
#[test]
fn various() {
assert_eq!(roundtrip(r"(?m)$"), "(?:\r??(?m:$))");
assert_eq!(roundtrip(r"(?m)$$"), "(?:\r??(?m:$))(?:\r??(?m:$))");
assert_eq!(
roundtrip(r"(?m)(?:foo$|bar$)"),
"(?:foo(?:\r??(?m:$))|bar(?:\r??(?m:$)))"
);
assert_eq!(roundtrip(r"(?m)$a"), "(?:\r??(?m:$))a");
// Not a multiline `$`, so no crlfifying occurs.
assert_eq!(roundtrip(r"$"), "\\z");
// It's a literal, derp.
assert_eq!(roundtrip(r"\$"), "\\$");
}
}
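
The offset adjustment done by `adjust_match` in the removed file can be sketched with the standard library alone. This is a minimal sketch, assuming a hypothetical `Span` type in place of `grep_matcher::Match`; it also guards against `start` rather than `0`, which is a choice made for this sketch so that it never produces an inverted range, not something the removed code does.

```rust
/// Half-open byte range `[start, end)` of a match.
/// Hypothetical stand-in for `grep_matcher::Match`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// If the match ends with a `\r`, return a span that ends immediately
/// before it. Otherwise return the span unchanged.
fn strip_trailing_cr(haystack: &[u8], m: Span) -> Span {
    if m.end > m.start && haystack.get(m.end - 1) == Some(&b'\r') {
        Span { end: m.end - 1, ..m }
    } else {
        m
    }
}
```

For a haystack of `foo\r\nbar`, a match covering `foo\r` is trimmed to cover only `foo`, while a match that already ends at `foo` is returned as is.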

@@ -1,8 +1,3 @@
use std::error;
use std::fmt;
use crate::util;
/// An error that can occur in this crate.
///
/// Generally, this error corresponds to problems building a regular
@@ -18,10 +13,27 @@ impl Error {
Error { kind }
}
pub(crate) fn regex<E: error::Error>(err: E) -> Error {
pub(crate) fn regex(err: regex_automata::meta::BuildError) -> Error {
if let Some(size_limit) = err.size_limit() {
let kind = ErrorKind::Regex(format!(
"compiled regex exceeds size limit of {size_limit}",
));
Error { kind }
} else if let Some(ref err) = err.syntax_error() {
Error::generic(err)
} else {
Error::generic(err)
}
}
pub(crate) fn generic<E: std::error::Error>(err: E) -> Error {
Error { kind: ErrorKind::Regex(err.to_string()) }
}
pub(crate) fn any<E: ToString>(msg: E) -> Error {
Error { kind: ErrorKind::Regex(msg.to_string()) }
}
/// Return the kind of this error.
pub fn kind(&self) -> &ErrorKind {
&self.kind
@@ -30,6 +42,7 @@ impl Error {
/// The kind of an error that can occur.
#[derive(Clone, Debug)]
#[non_exhaustive]
pub enum ErrorKind {
/// An error that occurred as a result of parsing a regular expression.
/// This can be a syntax error or an error that results from attempting to
@@ -51,38 +64,26 @@ pub enum ErrorKind {
///
/// The invalid byte is included in this error.
InvalidLineTerminator(u8),
/// Hints that destructuring should not be exhaustive.
///
/// This enum may grow additional variants, so this makes sure clients
/// don't count on exhaustive matching. (Otherwise, adding a new variant
/// could break existing code.)
#[doc(hidden)]
__Nonexhaustive,
}
impl error::Error for Error {
fn description(&self) -> &str {
match self.kind {
ErrorKind::Regex(_) => "regex error",
ErrorKind::NotAllowed(_) => "literal not allowed",
ErrorKind::InvalidLineTerminator(_) => "invalid line terminator",
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}
impl std::error::Error for Error {}
impl std::fmt::Display for Error {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
use bstr::ByteSlice;
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self.kind {
ErrorKind::Regex(ref s) => write!(f, "{}", s),
ErrorKind::NotAllowed(ref lit) => {
write!(f, "the literal '{:?}' is not allowed in a regex", lit)
write!(f, "the literal {:?} is not allowed in a regex", lit)
}
ErrorKind::InvalidLineTerminator(byte) => {
let x = util::show_bytes(&[byte]);
write!(f, "line terminators must be ASCII, but '{}' is not", x)
write!(
f,
"line terminators must be ASCII, but {} is not",
[byte].as_bstr()
)
}
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}
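
The move from a hidden `__Nonexhaustive` variant to the `#[non_exhaustive]` attribute can be seen in isolation. What follows is a minimal sketch, assuming a reduced error type with only two variants; the constructor `Error::new` is hypothetical, and the `\xNN` byte formatting is a standard-library substitute for the `bstr` formatting used in the diff.

```rust
use std::fmt;

/// The kind of an error that can occur.
#[derive(Clone, Debug)]
#[non_exhaustive]
pub enum ErrorKind {
    /// A problem building or parsing a regex, carried as a message.
    Regex(String),
    /// A line terminator that is not ASCII.
    InvalidLineTerminator(u8),
}

/// An error wrapping an `ErrorKind`.
#[derive(Clone, Debug)]
pub struct Error {
    kind: ErrorKind,
}

impl Error {
    /// Hypothetical constructor used by this sketch.
    pub fn new(kind: ErrorKind) -> Error {
        Error { kind }
    }

    /// Build a regex error from anything that can be turned into a string.
    pub fn any<E: ToString>(msg: E) -> Error {
        Error { kind: ErrorKind::Regex(msg.to_string()) }
    }

    /// Return the kind of this error.
    pub fn kind(&self) -> &ErrorKind {
        &self.kind
    }
}

// No `description` method is needed: `Display` carries the message.
impl std::error::Error for Error {}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Inside the defining crate the match may be exhaustive, so no
        // `unreachable!()` arm is required.
        match self.kind {
            ErrorKind::Regex(ref s) => write!(f, "{}", s),
            ErrorKind::InvalidLineTerminator(byte) => write!(
                f,
                "line terminators must be ASCII, but \\x{:02X} is not",
                byte
            ),
        }
    }
}
```

Code in other crates must add a wildcard arm when matching on `ErrorKind`, so new variants can be added later without breaking callers, which is the same guarantee the `__Nonexhaustive` variant was emulating.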

@@ -8,12 +8,9 @@ pub use crate::matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
mod ast;
mod config;
mod crlf;
mod error;
mod literal;
mod matcher;
mod multi;
mod non_matching;
mod strip;
mod util;
mod word;

File diff suppressed because it is too large

@@ -1,15 +1,22 @@
use std::collections::HashMap;
use std::sync::Arc;
use grep_matcher::{
ByteSet, Captures, LineMatchKind, LineTerminator, Match, Matcher, NoError,
use {
grep_matcher::{
ByteSet, Captures, LineMatchKind, LineTerminator, Match, Matcher,
NoError,
},
regex_automata::{
meta::Regex, util::captures::Captures as AutomataCaptures, Input,
PatternID,
},
};
use regex::bytes::{CaptureLocations, Regex};
use crate::config::{Config, ConfiguredHIR};
use crate::crlf::CRLFMatcher;
use crate::error::Error;
use crate::multi::MultiLiteralMatcher;
use crate::word::WordMatcher;
use crate::{
config::{Config, ConfiguredHIR},
error::Error,
literal::InnerLiterals,
word::WordMatcher,
};
/// A builder for constructing a `Matcher` using regular expressions.
///
@@ -43,17 +50,40 @@ impl RegexMatcherBuilder {
/// The syntax supported is documented as part of the regex crate:
/// <https://docs.rs/regex/#syntax>.
pub fn build(&self, pattern: &str) -> Result<RegexMatcher, Error> {
let chir = self.config.hir(pattern)?;
let fast_line_regex = chir.fast_line_regex()?;
let non_matching_bytes = chir.non_matching_bytes();
if let Some(ref re) = fast_line_regex {
log::debug!("extracted fast line regex: {:?}", re);
}
self.build_many(&[pattern])
}
let matcher = RegexMatcherImpl::new(&chir)?;
log::trace!("final regex: {:?}", matcher.regex());
/// Build a new matcher using the current configuration for the provided
/// patterns. The resulting matcher behaves as if all of the patterns
/// given are joined together into a single alternation. That is, it
/// reports matches where at least one of the given patterns matches.
pub fn build_many<P: AsRef<str>>(
&self,
patterns: &[P],
) -> Result<RegexMatcher, Error> {
let chir = self.config.build_many(patterns)?;
let matcher = RegexMatcherImpl::new(chir)?;
let (chir, re) = (matcher.chir(), matcher.regex());
log::trace!("final regex: {:?}", chir.hir().to_string());
let non_matching_bytes = chir.non_matching_bytes();
// If we can pick out some literals from the regex, then we might be
// able to build a faster regex that quickly identifies candidate
// matching lines. The regex engine will do what it can on its own, but
// we can specifically do a little more when a line terminator is set.
// For example, for a regex like `\w+foo\w+`, we can look for `foo`,
// and when a match is found, look for the line containing `foo` and
// then run the original regex on only that line. (In this case, the
// regex engine is likely to handle this case for us since it's so
// simple, but the idea applies.)
let fast_line_regex = InnerLiterals::new(chir, re).one_regex()?;
// We override the line terminator in case the configured HIR doesn't
// support it.
let mut config = self.config.clone();
config.line_terminator = chir.line_terminator();
Ok(RegexMatcher {
config: self.config.clone(),
config,
matcher,
fast_line_regex,
non_matching_bytes,
@@ -69,39 +99,7 @@ impl RegexMatcherBuilder {
&self,
literals: &[B],
) -> Result<RegexMatcher, Error> {
let mut has_escape = false;
let mut slices = vec![];
for lit in literals {
slices.push(lit.as_ref());
has_escape = has_escape || lit.as_ref().contains('\\');
}
// Even when we have a fixed set of literals, we might still want to
// use the regex engine. Specifically, if any string has an escape
// in it, then we probably can't feed it to Aho-Corasick without
// removing the escape. Additionally, if there are any particular
// special match semantics we need to honor, that Aho-Corasick isn't
// enough. Finally, the regex engine can do really well with a small
// number of literals (at time of writing, this is changing soon), so
// we use it when there's a small set.
//
// Yes, this is one giant hack. Ideally, this entirely separate literal
// matcher that uses Aho-Corasick would be pushed down into the regex
// engine.
if has_escape
|| !self.config.can_plain_aho_corasick()
|| literals.len() < 40
{
return self.build(&slices.join("|"));
}
let matcher = MultiLiteralMatcher::new(&slices)?;
let imp = RegexMatcherImpl::MultiLiteral(matcher);
Ok(RegexMatcher {
config: self.config.clone(),
matcher: imp,
fast_line_regex: None,
non_matching_bytes: ByteSet::empty(),
})
self.build_many(literals)
}
/// Set the value for the case insensitive (`i`) flag.
@@ -302,20 +300,15 @@ impl RegexMatcherBuilder {
/// 1. It causes the line terminator for the matcher to be `\r\n`. Namely,
/// this prevents the matcher from ever producing a match that contains
/// a `\r` or `\n`.
/// 2. It translates all instances of `$` in the pattern to `(?:\r??$)`.
/// This works around the fact that the regex engine does not support
/// matching CRLF as a line terminator when using `$`.
/// 2. It enables CRLF mode for `^` and `$`. This means that line anchors
/// will treat both `\r` and `\n` as line terminators, but will never
/// match between a `\r` and `\n`.
///
/// In particular, because of (2), the matches produced by the matcher may
/// be slightly different than what one would expect given the pattern.
/// This is the trade off made: in many cases, `$` will "just work" in the
/// presence of `\r\n` line terminators, but matches may require some
/// trimming to faithfully represent the intended match.
///
/// Note that if you do not wish to set the line terminator but would still
/// like `$` to match `\r\n` line terminators, then it is valid to call
/// `crlf(true)` followed by `line_terminator(None)`. Ordering is
/// important, since `crlf` and `line_terminator` override each other.
/// Note that if you do not wish to set the line terminator but would
/// still like `$` to match `\r\n` line terminators, then it is valid to
/// call `crlf(true)` followed by `line_terminator(None)`. Ordering is
/// important, since `crlf` sets the line terminator, but `line_terminator`
/// does not touch the `crlf` setting.
pub fn crlf(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
if yes {
self.config.line_terminator = Some(LineTerminator::crlf());
@@ -341,6 +334,21 @@ impl RegexMatcherBuilder {
self.config.word = yes;
self
}
/// Whether the patterns should be treated as literal strings or not. When
/// this is active, all characters, including ones that would normally be
/// special regex meta characters, are matched literally.
pub fn fixed_strings(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.config.fixed_strings = yes;
self
}
/// Whether each pattern should match the entire line or not. This is
/// equivalent to surrounding the pattern with `(?m:^)` and `(?m:$)`.
pub fn whole_line(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.config.whole_line = yes;
self
}
}
/// An implementation of the `Matcher` trait using Rust's standard regex
@@ -370,10 +378,10 @@ impl RegexMatcher {
/// Create a new matcher from the given pattern using the default
/// configuration, but matches lines terminated by `\n`.
///
/// This is meant to be a convenience constructor for using a
/// `RegexMatcherBuilder` and setting its
/// [`line_terminator`](struct.RegexMatcherBuilder.html#method.line_terminator)
/// to `\n`. The purpose of using this constructor is to permit special
/// This is meant to be a convenience constructor for
/// using a `RegexMatcherBuilder` and setting its
/// [`line_terminator`](RegexMatcherBuilder::line_terminator) to
/// `\n`. The purpose of using this constructor is to permit special
/// optimizations that help speed up line oriented search. These types of
/// optimizations are only appropriate when matches span no more than one
/// line. For this reason, this constructor will return an error if the
@@ -389,13 +397,6 @@ impl RegexMatcher {
enum RegexMatcherImpl {
/// The standard matcher used for all regular expressions.
Standard(StandardMatcher),
/// A matcher for an alternation of plain literals.
MultiLiteral(MultiLiteralMatcher),
/// A matcher that strips `\r` from the end of matches.
///
/// This is only used when the CRLF hack is enabled and the regex is line
/// anchored at the end.
CRLF(CRLFMatcher),
/// A matcher that only matches at word boundaries. This transforms the
/// regex to `(^|\W)(...)($|\W)` instead of the more intuitive `\b(...)\b`.
/// Because of this, the WordMatcher provides its own implementation of
@@ -407,29 +408,33 @@ enum RegexMatcherImpl {
impl RegexMatcherImpl {
/// Based on the configuration, create a new implementation of the
/// `Matcher` trait.
fn new(expr: &ConfiguredHIR) -> Result<RegexMatcherImpl, Error> {
if expr.config().word {
Ok(RegexMatcherImpl::Word(WordMatcher::new(expr)?))
} else if expr.needs_crlf_stripped() {
Ok(RegexMatcherImpl::CRLF(CRLFMatcher::new(expr)?))
fn new(mut chir: ConfiguredHIR) -> Result<RegexMatcherImpl, Error> {
// When whole_line is set, we don't use a word matcher even if word
// matching was requested. Why? Because `(?m:^)(pat)(?m:$)` implies
// word matching.
Ok(if chir.config().word && !chir.config().whole_line {
RegexMatcherImpl::Word(WordMatcher::new(chir)?)
} else {
if let Some(lits) = expr.alternation_literals() {
if lits.len() >= 40 {
let matcher = MultiLiteralMatcher::new(&lits)?;
return Ok(RegexMatcherImpl::MultiLiteral(matcher));
}
if chir.config().whole_line {
chir = chir.into_whole_line();
}
Ok(RegexMatcherImpl::Standard(StandardMatcher::new(expr)?))
}
RegexMatcherImpl::Standard(StandardMatcher::new(chir)?)
})
}
/// Return the underlying regex object used.
fn regex(&self) -> String {
fn regex(&self) -> &Regex {
match *self {
RegexMatcherImpl::Word(ref x) => x.regex().to_string(),
RegexMatcherImpl::CRLF(ref x) => x.regex().to_string(),
RegexMatcherImpl::MultiLiteral(_) => "<N/A>".to_string(),
RegexMatcherImpl::Standard(ref x) => x.regex.to_string(),
RegexMatcherImpl::Word(ref x) => x.regex(),
RegexMatcherImpl::Standard(ref x) => &x.regex,
}
}
/// Return the underlying HIR of the regex used for searching.
fn chir(&self) -> &ConfiguredHIR {
match *self {
RegexMatcherImpl::Word(ref x) => x.chir(),
RegexMatcherImpl::Standard(ref x) => &x.chir,
}
}
}
@@ -449,8 +454,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.find_at(haystack, at),
MultiLiteral(ref m) => m.find_at(haystack, at),
CRLF(ref m) => m.find_at(haystack, at),
Word(ref m) => m.find_at(haystack, at),
}
}
@@ -459,8 +462,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.new_captures(),
MultiLiteral(ref m) => m.new_captures(),
CRLF(ref m) => m.new_captures(),
Word(ref m) => m.new_captures(),
}
}
@@ -469,8 +470,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.capture_count(),
MultiLiteral(ref m) => m.capture_count(),
CRLF(ref m) => m.capture_count(),
Word(ref m) => m.capture_count(),
}
}
@@ -479,8 +478,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.capture_index(name),
MultiLiteral(ref m) => m.capture_index(name),
CRLF(ref m) => m.capture_index(name),
Word(ref m) => m.capture_index(name),
}
}
@@ -489,8 +486,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.find(haystack),
MultiLiteral(ref m) => m.find(haystack),
CRLF(ref m) => m.find(haystack),
Word(ref m) => m.find(haystack),
}
}
@@ -502,8 +497,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.find_iter(haystack, matched),
MultiLiteral(ref m) => m.find_iter(haystack, matched),
CRLF(ref m) => m.find_iter(haystack, matched),
Word(ref m) => m.find_iter(haystack, matched),
}
}
@@ -519,8 +512,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.try_find_iter(haystack, matched),
MultiLiteral(ref m) => m.try_find_iter(haystack, matched),
CRLF(ref m) => m.try_find_iter(haystack, matched),
Word(ref m) => m.try_find_iter(haystack, matched),
}
}
@@ -533,8 +524,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.captures(haystack, caps),
MultiLiteral(ref m) => m.captures(haystack, caps),
CRLF(ref m) => m.captures(haystack, caps),
Word(ref m) => m.captures(haystack, caps),
}
}
@@ -551,8 +540,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.captures_iter(haystack, caps, matched),
MultiLiteral(ref m) => m.captures_iter(haystack, caps, matched),
CRLF(ref m) => m.captures_iter(haystack, caps, matched),
Word(ref m) => m.captures_iter(haystack, caps, matched),
}
}
@@ -569,10 +556,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.try_captures_iter(haystack, caps, matched),
MultiLiteral(ref m) => {
m.try_captures_iter(haystack, caps, matched)
}
CRLF(ref m) => m.try_captures_iter(haystack, caps, matched),
Word(ref m) => m.try_captures_iter(haystack, caps, matched),
}
}
@@ -586,8 +569,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.captures_at(haystack, at, caps),
MultiLiteral(ref m) => m.captures_at(haystack, at, caps),
CRLF(ref m) => m.captures_at(haystack, at, caps),
Word(ref m) => m.captures_at(haystack, at, caps),
}
}
@@ -604,8 +585,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.replace(haystack, dst, append),
MultiLiteral(ref m) => m.replace(haystack, dst, append),
CRLF(ref m) => m.replace(haystack, dst, append),
Word(ref m) => m.replace(haystack, dst, append),
}
}
@@ -625,12 +604,6 @@ impl Matcher for RegexMatcher {
Standard(ref m) => {
m.replace_with_captures(haystack, caps, dst, append)
}
MultiLiteral(ref m) => {
m.replace_with_captures(haystack, caps, dst, append)
}
CRLF(ref m) => {
m.replace_with_captures(haystack, caps, dst, append)
}
Word(ref m) => {
m.replace_with_captures(haystack, caps, dst, append)
}
@@ -641,8 +614,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.is_match(haystack),
MultiLiteral(ref m) => m.is_match(haystack),
CRLF(ref m) => m.is_match(haystack),
Word(ref m) => m.is_match(haystack),
}
}
@@ -655,8 +626,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.is_match_at(haystack, at),
MultiLiteral(ref m) => m.is_match_at(haystack, at),
CRLF(ref m) => m.is_match_at(haystack, at),
Word(ref m) => m.is_match_at(haystack, at),
}
}
@@ -668,8 +637,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.shortest_match(haystack),
MultiLiteral(ref m) => m.shortest_match(haystack),
CRLF(ref m) => m.shortest_match(haystack),
Word(ref m) => m.shortest_match(haystack),
}
}
@@ -682,8 +649,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.shortest_match_at(haystack, at),
MultiLiteral(ref m) => m.shortest_match_at(haystack, at),
CRLF(ref m) => m.shortest_match_at(haystack, at),
Word(ref m) => m.shortest_match_at(haystack, at),
}
}
@@ -702,7 +667,10 @@ impl Matcher for RegexMatcher {
) -> Result<Option<LineMatchKind>, NoError> {
Ok(match self.fast_line_regex {
Some(ref regex) => {
regex.shortest_match(haystack).map(LineMatchKind::Candidate)
let input = Input::new(haystack);
regex
.search_half(&input)
.map(|hm| LineMatchKind::Candidate(hm.offset()))
}
None => {
self.shortest_match(haystack)?.map(LineMatchKind::Confirmed)
@@ -717,20 +685,19 @@ struct StandardMatcher {
/// The regular expression compiled from the pattern provided by the
/// caller.
regex: Regex,
/// A map from capture group name to its corresponding index.
names: HashMap<String, usize>,
/// The HIR that produced this regex.
///
/// We put this in an `Arc` because by the time it gets here, it won't
/// change. And because cloning and dropping an `Hir` is somewhat expensive
/// due to its deep recursive representation.
chir: Arc<ConfiguredHIR>,
}
impl StandardMatcher {
fn new(expr: &ConfiguredHIR) -> Result<StandardMatcher, Error> {
let regex = expr.regex()?;
let mut names = HashMap::new();
for (i, optional_name) in regex.capture_names().enumerate() {
if let Some(name) = optional_name {
names.insert(name.to_string(), i);
}
}
Ok(StandardMatcher { regex, names })
fn new(chir: ConfiguredHIR) -> Result<StandardMatcher, Error> {
let chir = Arc::new(chir);
let regex = chir.to_regex()?;
Ok(StandardMatcher { regex, chir })
}
}
@@ -743,14 +710,12 @@ impl Matcher for StandardMatcher {
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, NoError> {
Ok(self
.regex
.find_at(haystack, at)
.map(|m| Match::new(m.start(), m.end())))
let input = Input::new(haystack).span(at..haystack.len());
Ok(self.regex.find(input).map(|m| Match::new(m.start(), m.end())))
}
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
Ok(RegexCaptures::new(self.regex.capture_locations()))
Ok(RegexCaptures::new(self.regex.create_captures()))
}
fn capture_count(&self) -> usize {
@@ -758,7 +723,7 @@ impl Matcher for StandardMatcher {
}
fn capture_index(&self, name: &str) -> Option<usize> {
self.names.get(name).map(|i| *i)
self.regex.group_info().to_index(PatternID::ZERO, name)
}
fn try_find_iter<F, E>(
@@ -785,10 +750,10 @@ impl Matcher for StandardMatcher {
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool, NoError> {
Ok(self
.regex
.captures_read_at(&mut caps.locations_mut(), haystack, at)
.is_some())
let input = Input::new(haystack).span(at..haystack.len());
let caps = caps.captures_mut();
self.regex.search_captures(&input, caps);
Ok(caps.is_match())
}
fn shortest_match_at(
@@ -796,7 +761,8 @@ impl Matcher for StandardMatcher {
haystack: &[u8],
at: usize,
) -> Result<Option<usize>, NoError> {
Ok(self.regex.shortest_match_at(haystack, at))
let input = Input::new(haystack).span(at..haystack.len());
Ok(self.regex.search_half(&input).map(|hm| hm.offset()))
}
}
@@ -815,137 +781,51 @@ impl Matcher for StandardMatcher {
/// index of the group using the corresponding matcher's `capture_index`
/// method, and then use that index with `RegexCaptures::get`.
#[derive(Clone, Debug)]
pub struct RegexCaptures(RegexCapturesImp);
#[derive(Clone, Debug)]
enum RegexCapturesImp {
AhoCorasick {
/// The start and end of the match, corresponding to capture group 0.
mat: Option<Match>,
},
Regex {
/// Where the locations are stored.
locs: CaptureLocations,
/// These captures behave as if the capturing groups begin at the given
/// offset. When set to `0`, this has no effect and capture groups are
/// indexed like normal.
///
/// This is useful when building matchers that wrap arbitrary regular
/// expressions. For example, `WordMatcher` takes an existing regex
/// `re` and creates `(?:^|\W)(re)(?:$|\W)`, but hides the fact that
/// the regex has been wrapped from the caller. In order to do this,
/// the matcher and the capturing groups must behave as if `(re)` is
/// the `0`th capture group.
offset: usize,
/// When enabled, the end of a match has `\r` stripped from it, if one
/// exists.
strip_crlf: bool,
},
pub struct RegexCaptures {
/// Where the captures are stored.
caps: AutomataCaptures,
/// These captures behave as if the capturing groups begin at the given
/// offset. When set to `0`, this has no effect and capture groups are
/// indexed like normal.
///
/// This is useful when building matchers that wrap arbitrary regular
/// expressions. For example, `WordMatcher` takes an existing regex
/// `re` and creates `(?:^|\W)(re)(?:$|\W)`, but hides the fact that
/// the regex has been wrapped from the caller. In order to do this,
/// the matcher and the capturing groups must behave as if `(re)` is
/// the `0`th capture group.
offset: usize,
}
impl Captures for RegexCaptures {
fn len(&self) -> usize {
match self.0 {
RegexCapturesImp::AhoCorasick { .. } => 1,
RegexCapturesImp::Regex { ref locs, offset, .. } => {
locs.len().checked_sub(offset).unwrap()
}
}
self.caps
.group_info()
.all_group_len()
.checked_sub(self.offset)
.unwrap()
}
fn get(&self, i: usize) -> Option<Match> {
match self.0 {
RegexCapturesImp::AhoCorasick { mat, .. } => {
if i == 0 {
mat
} else {
None
}
}
RegexCapturesImp::Regex { ref locs, offset, strip_crlf } => {
if !strip_crlf {
let actual = i.checked_add(offset).unwrap();
return locs.pos(actual).map(|(s, e)| Match::new(s, e));
}
// currently don't support capture offsetting with CRLF
// stripping
assert_eq!(offset, 0);
let m = match locs.pos(i).map(|(s, e)| Match::new(s, e)) {
None => return None,
Some(m) => m,
};
// If the end position of this match corresponds to the end
// position of the overall match, then we apply our CRLF
// stripping. Otherwise, we cannot assume stripping is correct.
if i == 0 || m.end() == locs.pos(0).unwrap().1 {
Some(m.with_end(m.end() - 1))
} else {
Some(m)
}
}
}
let actual = i.checked_add(self.offset).unwrap();
self.caps.get_group(actual).map(|sp| Match::new(sp.start, sp.end))
}
}
impl RegexCaptures {
pub(crate) fn simple() -> RegexCaptures {
RegexCaptures(RegexCapturesImp::AhoCorasick { mat: None })
}
pub(crate) fn new(locs: CaptureLocations) -> RegexCaptures {
RegexCaptures::with_offset(locs, 0)
pub(crate) fn new(caps: AutomataCaptures) -> RegexCaptures {
RegexCaptures::with_offset(caps, 0)
}
pub(crate) fn with_offset(
locs: CaptureLocations,
caps: AutomataCaptures,
offset: usize,
) -> RegexCaptures {
RegexCaptures(RegexCapturesImp::Regex {
locs,
offset,
strip_crlf: false,
})
RegexCaptures { caps, offset }
}
pub(crate) fn locations(&self) -> &CaptureLocations {
match self.0 {
RegexCapturesImp::AhoCorasick { .. } => {
panic!("getting locations for simple captures is invalid")
}
RegexCapturesImp::Regex { ref locs, .. } => locs,
}
}
pub(crate) fn locations_mut(&mut self) -> &mut CaptureLocations {
match self.0 {
RegexCapturesImp::AhoCorasick { .. } => {
panic!("getting locations for simple captures is invalid")
}
RegexCapturesImp::Regex { ref mut locs, .. } => locs,
}
}
pub(crate) fn strip_crlf(&mut self, yes: bool) {
match self.0 {
RegexCapturesImp::AhoCorasick { .. } => {
panic!("setting strip_crlf for simple captures is invalid")
}
RegexCapturesImp::Regex { ref mut strip_crlf, .. } => {
*strip_crlf = yes;
}
}
}
pub(crate) fn set_simple(&mut self, one: Option<Match>) {
match self.0 {
RegexCapturesImp::AhoCorasick { ref mut mat } => {
*mat = one;
}
RegexCapturesImp::Regex { .. } => {
panic!("setting simple captures for regex is invalid")
}
}
pub(crate) fn captures_mut(&mut self) -> &mut AutomataCaptures {
&mut self.caps
}
}
@@ -1032,7 +912,9 @@ mod tests {
}
// Test that finding candidate lines works as expected.
// FIXME: Re-enable this test once inner literal extraction works.
#[test]
#[ignore]
fn candidate_lines() {
fn is_confirmed(m: LineMatchKind) -> bool {
match m {
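
The `offset` field documented on `RegexCaptures` above can be illustrated with a small standalone sketch. This is not the crate's code: `OffsetCaptures` and `wrapped_example` are hypothetical names, and a plain `Vec` of optional spans stands in for the real capture storage. It only shows how a wrapper group can be hidden so that `(re)` behaves like group `0`.

```rust
/// A simplified stand-in for a capture buffer: one optional (start, end)
/// span per group, where index 0 is the overall match of the wrapped regex.
#[derive(Clone, Debug)]
struct OffsetCaptures {
    groups: Vec<Option<(usize, usize)>>,
    /// Number of leading groups hidden from the caller.
    offset: usize,
}

impl OffsetCaptures {
    /// The number of groups visible to the caller.
    fn len(&self) -> usize {
        self.groups.len().checked_sub(self.offset).unwrap()
    }

    /// Look up group `i` as the caller sees it, shifting by `offset`.
    fn get(&self, i: usize) -> Option<(usize, usize)> {
        let actual = i.checked_add(self.offset).unwrap();
        self.groups.get(actual).copied().flatten()
    }
}

/// Hypothetical data: the haystack " foo " matched by something shaped like
/// `(?:^|\W)(foo)(?:$|\W)`. Real group 0 covers the surrounding spaces,
/// real group 1 covers only `foo`.
fn wrapped_example() -> OffsetCaptures {
    OffsetCaptures { groups: vec![Some((0, 5)), Some((1, 4))], offset: 1 }
}
```

With `offset` set to 1, a caller asking for group `0` receives the span of `foo` and never sees the wrapper match.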

View File

@@ -1,6 +1,6 @@
use aho_corasick::{AhoCorasick, AhoCorasickBuilder, MatchKind};
use aho_corasick::{AhoCorasick, MatchKind};
use grep_matcher::{Match, Matcher, NoError};
use regex_syntax::hir::Hir;
use regex_syntax::hir::{Hir, HirKind};
use crate::error::Error;
use crate::matcher::RegexCaptures;
@@ -23,11 +23,10 @@ impl MultiLiteralMatcher {
pub fn new<B: AsRef<[u8]>>(
literals: &[B],
) -> Result<MultiLiteralMatcher, Error> {
let ac = AhoCorasickBuilder::new()
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.auto_configure(literals)
.build_with_size::<usize, _, _>(literals)
.map_err(Error::regex)?;
.build(literals)
.map_err(Error::generic)?;
Ok(MultiLiteralMatcher { ac })
}
}
@@ -79,13 +78,11 @@ impl Matcher for MultiLiteralMatcher {
/// Alternation literals checks if the given HIR is a simple alternation of
/// literals, and if so, returns them. Otherwise, this returns None.
pub fn alternation_literals(expr: &Hir) -> Option<Vec<Vec<u8>>> {
use regex_syntax::hir::{HirKind, Literal};
// This is pretty hacky, but basically, if `is_alternation_literal` is
// true, then we can make several assumptions about the structure of our
// HIR. This is what justifies the `unreachable!` statements below.
if !expr.is_alternation_literal() {
if !expr.properties().is_alternation_literal() {
return None;
}
let alts = match *expr.kind() {
@@ -93,26 +90,16 @@ pub fn alternation_literals(expr: &Hir) -> Option<Vec<Vec<u8>>> {
_ => return None, // one literal isn't worth it
};
let extendlit = |lit: &Literal, dst: &mut Vec<u8>| match *lit {
Literal::Unicode(c) => {
let mut buf = [0; 4];
dst.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
}
Literal::Byte(b) => {
dst.push(b);
}
};
let mut lits = vec![];
for alt in alts {
let mut lit = vec![];
match *alt.kind() {
HirKind::Empty => {}
HirKind::Literal(ref x) => extendlit(x, &mut lit),
HirKind::Literal(ref x) => lit.extend_from_slice(&x.0),
HirKind::Concat(ref exprs) => {
for e in exprs {
match *e.kind() {
HirKind::Literal(ref x) => extendlit(x, &mut lit),
HirKind::Literal(ref x) => lit.extend_from_slice(&x.0),
_ => unreachable!("expected literal, got {:?}", e),
}
}

View File

@@ -1,9 +1,13 @@
use grep_matcher::ByteSet;
use regex_syntax::hir::{self, Hir, HirKind};
use regex_syntax::utf8::Utf8Sequences;
use {
grep_matcher::ByteSet,
regex_syntax::{
hir::{self, Hir, HirKind, Look},
utf8::Utf8Sequences,
},
};
/// Return a confirmed set of non-matching bytes from the given expression.
pub fn non_matching_bytes(expr: &Hir) -> ByteSet {
pub(crate) fn non_matching_bytes(expr: &Hir) -> ByteSet {
let mut set = ByteSet::full();
remove_matching_bytes(expr, &mut set);
set
@@ -13,18 +17,27 @@ pub fn non_matching_bytes(expr: &Hir) -> ByteSet {
/// the given expression.
fn remove_matching_bytes(expr: &Hir, set: &mut ByteSet) {
match *expr.kind() {
HirKind::Empty | HirKind::WordBoundary(_) => {}
HirKind::Anchor(_) => {
HirKind::Empty
| HirKind::Look(Look::WordAscii | Look::WordAsciiNegate)
| HirKind::Look(Look::WordUnicode | Look::WordUnicodeNegate) => {}
HirKind::Look(Look::Start | Look::End) => {
// FIXME: This is wrong, but not doing this leads to incorrect
// results because of how anchored searches are implemented in
// the 'grep-searcher' crate.
set.remove(b'\n');
}
HirKind::Literal(hir::Literal::Unicode(c)) => {
for &b in c.encode_utf8(&mut [0; 4]).as_bytes() {
HirKind::Look(Look::StartLF | Look::EndLF) => {
set.remove(b'\n');
}
HirKind::Look(Look::StartCRLF | Look::EndCRLF) => {
set.remove(b'\r');
set.remove(b'\n');
}
HirKind::Literal(hir::Literal(ref lit)) => {
for &b in lit.iter() {
set.remove(b);
}
}
HirKind::Literal(hir::Literal::Byte(b)) => {
set.remove(b);
}
HirKind::Class(hir::Class::Unicode(ref cls)) => {
for range in cls.iter() {
// This is presumably faster than encoding every codepoint
@@ -42,10 +55,10 @@ fn remove_matching_bytes(expr: &Hir, set: &mut ByteSet) {
}
}
HirKind::Repetition(ref x) => {
remove_matching_bytes(&x.hir, set);
remove_matching_bytes(&x.sub, set);
}
HirKind::Group(ref x) => {
remove_matching_bytes(&x.hir, set);
HirKind::Capture(ref x) => {
remove_matching_bytes(&x.sub, set);
}
HirKind::Concat(ref xs) => {
for x in xs {
@@ -62,17 +75,13 @@ fn remove_matching_bytes(expr: &Hir, set: &mut ByteSet) {
#[cfg(test)]
mod tests {
use grep_matcher::ByteSet;
use regex_syntax::ParserBuilder;
use {grep_matcher::ByteSet, regex_syntax::ParserBuilder};
use super::non_matching_bytes;
fn extract(pattern: &str) -> ByteSet {
let expr = ParserBuilder::new()
.allow_invalid_utf8(true)
.build()
.parse(pattern)
.unwrap();
let expr =
ParserBuilder::new().utf8(false).build().parse(pattern).unwrap();
non_matching_bytes(&expr)
}
@@ -131,9 +140,13 @@ mod tests {
#[test]
fn anchor() {
// FIXME: The first four tests below should correspond to a full set
// of bytes for the non-matching bytes I think.
assert_eq!(sparse(&extract(r"^")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"$")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"\A")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"\z")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"(?m)^")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"(?m)$")), sparse_except(&[b'\n']));
}
}
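
The idea behind `non_matching_bytes` (start from a full byte set and remove every byte the expression could match) can be sketched without the regex crates. Everything below is an assumption made for illustration: `Bytes256` and the toy `Expr` type are hypothetical stand-ins for `ByteSet` and `Hir`, and only a few expression kinds are modeled.

```rust
/// A tiny stand-in for a byte set: `true` means the byte is still in the set.
struct Bytes256([bool; 256]);

impl Bytes256 {
    fn full() -> Bytes256 {
        Bytes256([true; 256])
    }
    fn remove(&mut self, b: u8) {
        self.0[usize::from(b)] = false;
    }
    fn contains(&self, b: u8) -> bool {
        self.0[usize::from(b)]
    }
}

/// A toy expression type (hypothetical, far smaller than a real HIR).
enum Expr {
    Empty,
    Literal(Vec<u8>),
    /// A line anchor such as `(?m)^` or `(?m)$`.
    LineAnchor,
    Concat(Vec<Expr>),
}

/// Remove from `set` every byte that `expr` could be involved in matching.
fn remove_matching(expr: &Expr, set: &mut Bytes256) {
    match expr {
        Expr::Empty => {}
        Expr::Literal(lit) => {
            for &b in lit {
                set.remove(b);
            }
        }
        // Line anchors are treated as touching `\n`, as in the diff above.
        Expr::LineAnchor => set.remove(b'\n'),
        Expr::Concat(xs) => {
            for x in xs {
                remove_matching(x, set);
            }
        }
    }
}

/// Bytes that are confirmed never to appear in a match of `expr`.
fn non_matching(expr: &Expr) -> Bytes256 {
    let mut set = Bytes256::full();
    remove_matching(expr, &mut set);
    set
}
```

Whatever remains in the set afterwards is safe to treat as "cannot be part of a match", which is the property the searcher relies on.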

View File

@@ -1,5 +1,7 @@
use grep_matcher::LineTerminator;
use regex_syntax::hir::{self, Hir, HirKind};
use {
grep_matcher::LineTerminator,
regex_syntax::hir::{self, Hir, HirKind},
};
use crate::error::{Error, ErrorKind};
@@ -15,7 +17,26 @@ use crate::error::{Error, ErrorKind};
///
/// If the given line terminator is not ASCII, then this function returns an
/// error.
pub fn strip_from_match(
///
/// Note that as of regex 1.9, this routine could theoretically be implemented
/// without returning an error. Namely, for example, we could turn
/// `foo\nbar` into `foo[a&&b]bar`. That is, replace line terminators with a
/// sub-expression that can never match anything. Thus, ripgrep would accept
/// such regexes and just silently not match anything. Regex versions prior to 1.8
/// don't support such constructs. I ended up deciding to leave the existing
/// behavior of returning an error instead. For example:
///
/// ```text
/// $ echo -n 'foo\nbar\n' | rg 'foo\nbar'
/// the literal '"\n"' is not allowed in a regex
///
/// Consider enabling multiline mode with the --multiline flag (or -U for short).
/// When multiline mode is enabled, new line characters can be matched.
/// ```
///
/// This looks like a good error message to me, and even suggests a flag that
/// the user can use instead.
pub(crate) fn strip_from_match(
expr: Hir,
line_term: LineTerminator,
) -> Result<Hir, Error> {
@@ -23,40 +44,34 @@ pub fn strip_from_match(
let expr1 = strip_from_match_ascii(expr, b'\r')?;
strip_from_match_ascii(expr1, b'\n')
} else {
let b = line_term.as_byte();
if b > 0x7F {
return Err(Error::new(ErrorKind::InvalidLineTerminator(b)));
}
strip_from_match_ascii(expr, b)
strip_from_match_ascii(expr, line_term.as_byte())
}
}
/// The implementation of strip_from_match. The given byte must be ASCII. This
/// function panics otherwise.
/// The implementation of strip_from_match. The given byte must be ASCII.
/// This function returns an error otherwise. It also returns an error if
/// it couldn't remove `\n` from the given regex without leaving an empty
/// character class in its place.
fn strip_from_match_ascii(expr: Hir, byte: u8) -> Result<Hir, Error> {
assert!(byte <= 0x7F);
let chr = byte as char;
assert_eq!(chr.len_utf8(), 1);
let invalid = || Err(Error::new(ErrorKind::NotAllowed(chr.to_string())));
if !byte.is_ascii() {
return Err(Error::new(ErrorKind::InvalidLineTerminator(byte)));
}
let ch = char::from(byte);
let invalid = || Err(Error::new(ErrorKind::NotAllowed(ch.to_string())));
Ok(match expr.into_kind() {
HirKind::Empty => Hir::empty(),
HirKind::Literal(hir::Literal::Unicode(c)) => {
if c == chr {
HirKind::Literal(hir::Literal(lit)) => {
if lit.iter().find(|&&b| b == byte).is_some() {
return invalid();
}
Hir::literal(hir::Literal::Unicode(c))
}
HirKind::Literal(hir::Literal::Byte(b)) => {
if b as char == chr {
return invalid();
}
Hir::literal(hir::Literal::Byte(b))
Hir::literal(lit)
}
HirKind::Class(hir::Class::Unicode(mut cls)) => {
if cls.ranges().is_empty() {
return Ok(Hir::class(hir::Class::Unicode(cls)));
}
let remove = hir::ClassUnicode::new(Some(
hir::ClassUnicodeRange::new(chr, chr),
hir::ClassUnicodeRange::new(ch, ch),
));
cls.difference(&remove);
if cls.ranges().is_empty() {
@@ -65,6 +80,9 @@ fn strip_from_match_ascii(expr: Hir, byte: u8) -> Result<Hir, Error> {
Hir::class(hir::Class::Unicode(cls))
}
HirKind::Class(hir::Class::Bytes(mut cls)) => {
if cls.ranges().is_empty() {
return Ok(Hir::class(hir::Class::Bytes(cls)));
}
let remove = hir::ClassBytes::new(Some(
hir::ClassBytesRange::new(byte, byte),
));
@@ -74,15 +92,14 @@ fn strip_from_match_ascii(expr: Hir, byte: u8) -> Result<Hir, Error> {
}
Hir::class(hir::Class::Bytes(cls))
}
HirKind::Anchor(x) => Hir::anchor(x),
HirKind::WordBoundary(x) => Hir::word_boundary(x),
HirKind::Look(x) => Hir::look(x),
HirKind::Repetition(mut x) => {
x.hir = Box::new(strip_from_match_ascii(*x.hir, byte)?);
x.sub = Box::new(strip_from_match_ascii(*x.sub, byte)?);
Hir::repetition(x)
}
HirKind::Group(mut x) => {
x.hir = Box::new(strip_from_match_ascii(*x.hir, byte)?);
Hir::group(x)
HirKind::Capture(mut x) => {
x.sub = Box::new(strip_from_match_ascii(*x.sub, byte)?);
Hir::capture(x)
}
HirKind::Concat(xs) => {
let xs = xs
@@ -131,11 +148,11 @@ mod tests {
#[test]
fn various() {
assert_eq!(roundtrip(r"[a\n]", b'\n'), "[a]");
assert_eq!(roundtrip(r"[a\n]", b'a'), "[\n]");
assert_eq!(roundtrip_crlf(r"[a\n]"), "[a]");
assert_eq!(roundtrip_crlf(r"[a\r]"), "[a]");
assert_eq!(roundtrip_crlf(r"[a\r\n]"), "[a]");
assert_eq!(roundtrip(r"[a\n]", b'\n'), "a");
assert_eq!(roundtrip(r"[a\n]", b'a'), "\n");
assert_eq!(roundtrip_crlf(r"[a\n]"), "a");
assert_eq!(roundtrip_crlf(r"[a\r]"), "a");
assert_eq!(roundtrip_crlf(r"[a\r\n]"), "a");
assert_eq!(roundtrip(r"(?-u)\s", b'a'), r"(?-u:[\x09-\x0D\x20])");
assert_eq!(roundtrip(r"(?-u)\s", b'\n'), r"(?-u:[\x09\x0B-\x0D\x20])");
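
The error behavior described for `strip_from_match_ascii` can be sketched in isolation. This is a simplified illustration under stated assumptions, not the crate's implementation: `StripError`, `check_literal` and `strip_from_class` are hypothetical names, and a character class is modeled as a plain list of bytes rather than as ranges.

```rust
#[derive(Debug, PartialEq)]
enum StripError {
    /// The line terminator is not an ASCII byte.
    InvalidLineTerminator(u8),
    /// The pattern would have to match the line terminator.
    NotAllowed(String),
}

/// A literal is rejected outright if it contains the line terminator.
fn check_literal(lit: &[u8], line_term: u8) -> Result<(), StripError> {
    if !line_term.is_ascii() {
        return Err(StripError::InvalidLineTerminator(line_term));
    }
    if lit.contains(&line_term) {
        let ch = char::from(line_term);
        return Err(StripError::NotAllowed(ch.to_string()));
    }
    Ok(())
}

/// A class has the terminator removed. An already empty class passes through
/// unchanged, but a class that becomes empty only because of the removal is
/// an error.
fn strip_from_class(
    class: &[u8],
    line_term: u8,
) -> Result<Vec<u8>, StripError> {
    if !line_term.is_ascii() {
        return Err(StripError::InvalidLineTerminator(line_term));
    }
    if class.is_empty() {
        return Ok(vec![]);
    }
    let kept: Vec<u8> =
        class.iter().copied().filter(|&b| b != line_term).collect();
    if kept.is_empty() {
        let ch = char::from(line_term);
        return Err(StripError::NotAllowed(ch.to_string()));
    }
    Ok(kept)
}
```

In this sketch `[a\n]` shrinks to just `a`, in line with the `roundtrip` expectations in the tests above, while a class containing only `\n` is rejected.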

View File

@@ -1,29 +0,0 @@
/// Converts an arbitrary sequence of bytes to a literal suitable for building
/// a regular expression.
pub fn bytes_to_regex(bs: &[u8]) -> String {
use regex_syntax::is_meta_character;
use std::fmt::Write;
let mut s = String::with_capacity(bs.len());
for &b in bs {
if b <= 0x7F && !is_meta_character(b as char) {
write!(s, r"{}", b as char).unwrap();
} else {
write!(s, r"\x{:02x}", b).unwrap();
}
}
s
}
/// Converts arbitrary bytes to a nice string.
pub fn show_bytes(bs: &[u8]) -> String {
use std::ascii::escape_default;
use std::str;
let mut nice = String::new();
for &b in bs {
let part: Vec<u8> = escape_default(b).collect();
nice.push_str(str::from_utf8(&part).unwrap());
}
nice
}

View File

@@ -1,39 +1,59 @@
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::Arc;
use std::{
collections::HashMap,
panic::{RefUnwindSafe, UnwindSafe},
sync::Arc,
};
use grep_matcher::{Match, Matcher, NoError};
use regex::bytes::{CaptureLocations, Regex};
use thread_local::ThreadLocal;
use {
grep_matcher::{Match, Matcher, NoError},
regex_automata::{
meta::Regex, util::captures::Captures, util::pool::Pool, Input,
PatternID,
},
};
use crate::config::ConfiguredHIR;
use crate::error::Error;
use crate::matcher::RegexCaptures;
use crate::{config::ConfiguredHIR, error::Error, matcher::RegexCaptures};
type PoolFn =
Box<dyn Fn() -> Captures + Send + Sync + UnwindSafe + RefUnwindSafe>;
/// A matcher for implementing "word match" semantics.
#[derive(Debug)]
pub struct WordMatcher {
pub(crate) struct WordMatcher {
/// The regex which is roughly `(?:^|\W)(<original pattern>)(?:$|\W)`.
regex: Regex,
/// The HIR that produced the regex above. We don't keep the HIR for the
/// `original` regex.
///
/// We put this in an `Arc` because by the time it gets here, it won't
/// change. And because cloning and dropping an `Hir` is somewhat expensive
/// due to its deep recursive representation.
chir: Arc<ConfiguredHIR>,
/// The original regex supplied by the user, which we use in a fast path
/// to try and detect matches before deferring to slower engines.
original: Regex,
/// A map from capture group name to capture group index.
names: HashMap<String, usize>,
/// A reusable buffer for finding the match location of the inner group.
locs: Arc<ThreadLocal<RefCell<CaptureLocations>>>,
/// A thread-safe pool of reusable buffers for finding the match offset of
/// the inner group.
caps: Arc<Pool<Captures, PoolFn>>,
}
impl Clone for WordMatcher {
fn clone(&self) -> WordMatcher {
// We implement Clone manually so that we get a fresh ThreadLocal such
// that it can set its own thread owner. This permits each thread
// using `locs` to hit the fast path.
// We implement Clone manually so that we get a fresh Pool such that it
// can set its own thread owner. This permits each thread using `caps`
// to hit the fast path.
//
// Note that cloning a regex is "cheap" since it uses reference
// counting internally.
let re = self.regex.clone();
WordMatcher {
regex: self.regex.clone(),
chir: Arc::clone(&self.chir),
original: self.original.clone(),
names: self.names.clone(),
locs: Arc::new(ThreadLocal::new()),
caps: Arc::new(Pool::new(Box::new(move || re.create_captures()))),
}
}
}
@@ -44,31 +64,38 @@ impl WordMatcher {
///
/// The given options are used to construct the regular expression
/// internally.
pub fn new(expr: &ConfiguredHIR) -> Result<WordMatcher, Error> {
let original =
expr.with_pattern(|pat| format!("^(?:{})$", pat))?.regex()?;
let word_expr = expr.with_pattern(|pat| {
let pat = format!(r"(?:(?m:^)|\W)({})(?:\W|(?m:$))", pat);
log::debug!("word regex: {:?}", pat);
pat
})?;
let regex = word_expr.regex()?;
let locs = Arc::new(ThreadLocal::new());
pub(crate) fn new(chir: ConfiguredHIR) -> Result<WordMatcher, Error> {
let original = chir.clone().into_anchored().to_regex()?;
let chir = Arc::new(chir.into_word()?);
let regex = chir.to_regex()?;
let caps = Arc::new(Pool::new({
let regex = regex.clone();
Box::new(move || regex.create_captures()) as PoolFn
}));
let mut names = HashMap::new();
for (i, optional_name) in regex.capture_names().enumerate() {
let it = regex.group_info().pattern_names(PatternID::ZERO);
for (i, optional_name) in it.enumerate() {
if let Some(name) = optional_name {
names.insert(name.to_string(), i.checked_sub(1).unwrap());
}
}
Ok(WordMatcher { regex, original, names, locs })
Ok(WordMatcher { regex, chir, original, names, caps })
}
/// Return the underlying regex used by this matcher.
pub fn regex(&self) -> &Regex {
/// Return the underlying regex used to match at word boundaries.
///
/// The original regex is in the capture group at index 1.
pub(crate) fn regex(&self) -> &Regex {
&self.regex
}
/// Return the underlying HIR for the regex used to match at word
/// boundaries.
pub(crate) fn chir(&self) -> &ConfiguredHIR {
&self.chir
}
/// Attempt to do a fast confirmation of a word match that covers a subset
/// (but hopefully a big subset) of most cases. Ok(Some(..)) is returned
/// when a match is found. Ok(None) is returned when there is definitively
@@ -79,12 +106,11 @@ impl WordMatcher {
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, ()> {
// This is a bit hairy. The whole point here is to avoid running an
// NFA simulation in the regex engine. Remember, our word regex looks
// like this:
// This is a bit hairy. The whole point here is to avoid running a
// slower regex engine to extract capture groups. Remember, our word
// regex looks like this:
//
// (^|\W)(<original regex>)($|\W)
// where ^ and $ have multiline mode DISABLED
// (^|\W)(<original regex>)(\W|$)
//
// What we want are the match offsets of <original regex>. So in the
// easy/common case, the original regex will be sandwiched between
@@ -102,7 +128,8 @@ impl WordMatcher {
// The reason why we cannot handle the ^/$ cases here is because we
// can't assume anything about the original pattern. (Try commenting
// out the checks for ^/$ below and run the tests to see examples.)
let mut cand = match self.regex.find_at(haystack, at) {
let input = Input::new(haystack).span(at..haystack.len());
let mut cand = match self.regex.find(input) {
None => return Ok(None),
Some(m) => Match::new(m.start(), m.end()),
};
@@ -111,8 +138,15 @@ impl WordMatcher {
}
let (_, slen) = bstr::decode_utf8(&haystack[cand]);
let (_, elen) = bstr::decode_last_utf8(&haystack[cand]);
cand =
cand.with_start(cand.start() + slen).with_end(cand.end() - elen);
let new_start = cand.start() + slen;
let new_end = cand.end() - elen;
// This occurs when the original regex can match the empty string. In this
// case, just bail instead of trying to get it right here since it's
// likely a pathological case.
if new_start > new_end {
return Err(());
}
cand = cand.with_start(new_start).with_end(new_end);
if self.original.is_match(&haystack[cand]) {
Ok(Some(cand))
} else {
@@ -138,23 +172,23 @@ impl Matcher for WordMatcher {
//
// OK, well, it turns out that it is worth it! But it is quite tricky.
// See `fast_find` for details. Effectively, this lets us skip running
// the NFA simulation in the regex engine in the vast majority of
// cases. However, the NFA simulation is required for full correctness.
// a slower regex engine to extract capture groups in the vast majority
// of cases. However, I believe the slower engine is required for full
// correctness.
match self.fast_find(haystack, at) {
Ok(Some(m)) => return Ok(Some(m)),
Ok(None) => return Ok(None),
Err(()) => {}
}
let cell =
self.locs.get_or(|| RefCell::new(self.regex.capture_locations()));
let mut caps = cell.borrow_mut();
self.regex.captures_read_at(&mut caps, haystack, at);
Ok(caps.get(1).map(|m| Match::new(m.0, m.1)))
let input = Input::new(haystack).span(at..haystack.len());
let mut caps = self.caps.get();
self.regex.search_captures(&input, &mut caps);
Ok(caps.get_group(1).map(|sp| Match::new(sp.start, sp.end)))
}
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
Ok(RegexCaptures::with_offset(self.regex.capture_locations(), 1))
Ok(RegexCaptures::with_offset(self.regex.create_captures(), 1))
}
fn capture_count(&self) -> usize {
@@ -171,9 +205,10 @@ impl Matcher for WordMatcher {
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool, NoError> {
let r =
self.regex.captures_read_at(caps.locations_mut(), haystack, at);
Ok(r.is_some())
let input = Input::new(haystack).span(at..haystack.len());
let caps = caps.captures_mut();
self.regex.search_captures(&input, caps);
Ok(caps.is_match())
}
// We specifically do not implement other methods like find_iter or
@@ -188,8 +223,8 @@ mod tests {
use grep_matcher::{Captures, Match, Matcher};
fn matcher(pattern: &str) -> WordMatcher {
let chir = Config::default().hir(pattern).unwrap();
WordMatcher::new(&chir).unwrap()
let chir = Config::default().build_many(&[pattern]).unwrap();
WordMatcher::new(chir).unwrap()
}
fn find(pattern: &str, haystack: &str) -> Option<(usize, usize)> {
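
The trimming step in `fast_find` above, including the new `new_start > new_end` bail-out, can be sketched with the standard library alone. This is an illustration under assumptions: `trim_candidate` is a hypothetical helper, it takes a `&str` (so valid UTF-8 is assumed, whereas the real code decodes arbitrary bytes with `bstr`), and it skips the `^`/`$` checks that the real function performs first.

```rust
/// Trim one leading and one trailing character from the candidate span
/// `start..end` of `haystack`, mirroring how the `\W` on each side of a word
/// match is peeled off. Returns `None` when the trimmed span would be
/// inverted, which is the case where the caller should fall back to the
/// slower path.
fn trim_candidate(
    haystack: &str,
    start: usize,
    end: usize,
) -> Option<(usize, usize)> {
    let cand = &haystack[start..end];
    // Encoded length of the first and last characters of the candidate.
    let slen = cand.chars().next().map_or(0, |c| c.len_utf8());
    let elen = cand.chars().next_back().map_or(0, |c| c.len_utf8());
    let new_start = start + slen;
    let new_end = end - elen;
    if new_start > new_end {
        return None;
    }
    Some((new_start, new_end))
}
```

A one-character candidate is the interesting case: the same character is counted at both ends, the span inverts, and the sketch gives up instead of guessing.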

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-searcher"
version = "0.1.8" #:version
version = "0.1.11" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Fast line oriented regex searching as a library.
@@ -10,20 +10,20 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/searcher"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/searcher"
readme = "README.md"
keywords = ["regex", "grep", "egrep", "search", "pattern"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
bstr = { version = "0.2.0", default-features = false, features = ["std"] }
bstr = { version = "1.6.0", default-features = false, features = ["std"] }
bytecount = "0.6"
encoding_rs = "0.8.14"
encoding_rs_io = "0.1.6"
grep-matcher = { version = "0.1.5", path = "../matcher" }
grep-matcher = { version = "0.1.6", path = "../matcher" }
log = "0.4.5"
memmap = { package = "memmap2", version = "0.3.0" }
memmap = { package = "memmap2", version = "0.5.3" }
[dev-dependencies]
grep-regex = { version = "0.1.9", path = "../regex" }
grep-regex = { version = "0.1.11", path = "../regex" }
regex = "1.1"
[features]

View File

@@ -481,7 +481,7 @@ impl LineBuffer {
}
let roll_len = self.end - self.pos;
self.buf.copy_within_str(self.pos..self.end, 0);
self.buf.copy_within(self.pos..self.end, 0);
self.pos = 0;
self.last_lineterm = roll_len;
self.end = roll_len;

View File

@@ -467,6 +467,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
let keepgoing = self.sink.context(
&self.searcher,
&SinkContext {
#[cfg(test)]
line_term: self.config.line_term,
bytes: &buf[*range],
kind: SinkContextKind::Before,
@@ -497,6 +498,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
let keepgoing = self.sink.context(
&self.searcher,
&SinkContext {
#[cfg(test)]
line_term: self.config.line_term,
bytes: &buf[*range],
kind: SinkContextKind::After,
@@ -526,6 +528,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
let keepgoing = self.sink.context(
&self.searcher,
&SinkContext {
#[cfg(test)]
line_term: self.config.line_term,
bytes: &buf[*range],
kind: SinkContextKind::Other,

View File

@@ -88,7 +88,6 @@ where
#[derive(Debug)]
pub struct SliceByLine<'s, M, S> {
config: &'s Config,
core: Core<'s, M, S>,
slice: &'s [u8],
}
@@ -103,7 +102,6 @@ impl<'s, M: Matcher, S: Sink> SliceByLine<'s, M, S> {
debug_assert!(!searcher.multi_line_with_matcher(&matcher));
SliceByLine {
config: &searcher.config,
core: Core::new(searcher, matcher, write_to, true),
slice: slice,
}
@@ -1514,4 +1512,31 @@ and exhibited clearly, with a label attached.\
)
.unwrap();
}
// See: https://github.com/BurntSushi/ripgrep/issues/2260
#[test]
fn regression_2260() {
use grep_regex::RegexMatcherBuilder;
use crate::SearcherBuilder;
let matcher = RegexMatcherBuilder::new()
.line_terminator(Some(b'\n'))
.build(r"^\w+$")
.unwrap();
let mut searcher = SearcherBuilder::new().line_number(true).build();
let mut matched = false;
searcher
.search_slice(
&matcher,
b"GATC\n",
crate::sinks::UTF8(|_, _| {
matched = true;
Ok(true)
}),
)
.unwrap();
assert!(matched);
}
}

View File

@@ -436,6 +436,7 @@ pub enum SinkContextKind {
/// A type that describes a contextual line reported by a searcher.
#[derive(Clone, Debug)]
pub struct SinkContext<'b> {
#[cfg(test)]
pub(crate) line_term: LineTerminator,
pub(crate) bytes: &'b [u8],
pub(crate) kind: SinkContextKind,

View File

@@ -1,14 +1,14 @@
class RipgrepBin < Formula
version '12.1.1'
version '13.0.0'
desc "Recursively search directories for a regex pattern."
homepage "https://github.com/BurntSushi/ripgrep"
if OS.mac?
url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-apple-darwin.tar.gz"
sha256 "7ff2fd5dd3a438d62fae5866ddae78cf542b733116f58cf21ab691a58c385703"
sha256 "585c18350cb8d4392461edd6c921e6edd5a97cbfc03b567d7bd440423e118082"
elsif OS.linux?
url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-unknown-linux-musl.tar.gz"
sha256 "88d3b735e43f6f16a0181a8fec48847693fae80168d5f889fdbdeb962f1fc804"
sha256 "ee4e0751ab108b6da4f47c52da187d5177dc371f0f512a7caaec5434e711c091"
end
conflicts_with "ripgrep"

@@ -1029,3 +1029,100 @@ rgtest!(r1878, |dir: Dir, _: TestCommand| {
let args = &["-U", "--mmap", r"\Abaz", "test"];
dir.command().args(args).assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/1891
rgtest!(r1891, |dir: Dir, mut cmd: TestCommand| {
// TODO: Sadly, PCRE2 has different behavior here. Not clear why. We should
// look into this and see if there's a fix needed at the regex engine
// level.
if dir.is_pcre2() {
return;
}
dir.create("test", "\n##\n");
// N.B. We use -o here to force the issue to occur, which seems to only
// happen when each match needs to be detected.
eqnice!("1:\n2:\n2:\n", cmd.args(&["-won", "", "test"]).stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/2095
rgtest!(r2095, |dir: Dir, mut cmd: TestCommand| {
dir.create(
"test",
"#!/usr/bin/env bash
zero=one
a=one
if true; then
a=(
a
b
c
)
true
fi
a=two
b=one
});
",
);
cmd.args(&[
"--line-number",
"--multiline",
"--only-matching",
"--replace",
"${value}",
r"^(?P<indent>\s*)a=(?P<value>(?ms:[(].*?[)])|.*?)$",
"test",
]);
let expected = "4:one
8:(
9: a
10: b
11: c
12: )
15:two
";
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/2208
rgtest!(r2208, |dir: Dir, mut cmd: TestCommand| {
dir.create("test", "# Compile requirements.txt files from all found or specified requirements.in files (compile).
# Use -h to include hashes, -u dep1,dep2... to upgrade specific dependencies, and -U to upgrade all.
pipc () { # [-h] [-U|-u <pkgspec>[,<pkgspec>...]] [<reqs-in>...] [-- <pip-compile-arg>...]
emulate -L zsh
unset REPLY
if [[ $1 == --help ]] { zpy $0; return }
[[ $ZPY_PROCS ]] || return
local gen_hashes upgrade upgrade_csv
while [[ $1 == -[hUu] ]] {
if [[ $1 == -h ]] { gen_hashes=--generate-hashes; shift }
if [[ $1 == -U ]] { upgrade=1; shift }
if [[ $1 == -u ]] { upgrade=1; upgrade_csv=$2; shift 2 }
}
}
");
cmd.args(&[
"-N",
"-U",
"-r", "$usage",
r#"^(?P<predoc>\n?(# .*\n)*)(alias (?P<aname>pipc)="[^"]+"|(?P<fname>pipc) \(\) \{)( #(?P<usage> .+))?"#,
"test",
]);
let expected = " [-h] [-U|-u <pkgspec>[,<pkgspec>...]] [<reqs-in>...] [-- <pip-compile-arg>...]\n";
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/2236
rgtest!(r2236, |dir: Dir, mut cmd: TestCommand| {
dir.create(".ignore", r"foo\/");
dir.create_dir("foo");
dir.create("foo/bar", "test\n");
cmd.args(&["test"]).assert_err();
});