Compare commits

406 Commits

Author SHA1 Message Date
dependabot[bot]
6dfaec03e8
deps: bump crossbeam-channel from 0.5.13 to 0.5.15
Bumps [crossbeam-channel](https://github.com/crossbeam-rs/crossbeam) from 0.5.13 to 0.5.15.
- [Release notes](https://github.com/crossbeam-rs/crossbeam/releases)
- [Changelog](https://github.com/crossbeam-rs/crossbeam/blob/master/CHANGELOG.md)
- [Commits](https://github.com/crossbeam-rs/crossbeam/compare/crossbeam-channel-0.5.13...crossbeam-channel-0.5.15)

---
updated-dependencies:
- dependency-name: crossbeam-channel
  dependency-version: 0.5.15
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-04-10 10:55:32 -04:00
Pierre Rouleau
5fbc4fee64
ignore/types: fix Seed7 file extension
PR #3023
2025-04-07 10:53:32 -04:00
Pierre Rouleau
004370bd16
ignore/types: add support for Seed7 files
For more info on the Seed7 programming language see:

- on Wikipedia: https://en.wikipedia.org/wiki/Seed7
- Seed7 home:   https://seed7.sourceforge.net/
- Seed7 repo:   https://github.com/ThomasMertes/seed7

PR #3022
2025-04-07 08:51:22 -04:00
Andrew Gallant
de4baa1002
globset-0.4.16 2025-02-27 12:46:58 -05:00
Andrew Gallant
163ac157d3 globset: escape { and } in escape
This appears to be an oversight from when `escape` was
implemented in #2061.
2025-02-27 12:46:48 -05:00
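The escaping change in 163ac157d3 can be pictured with a small sketch. It assumes meta-characters are escaped by wrapping each one in brackets; `escape_glob` is a hypothetical name, and the exact character set handled by the real `escape` should be checked against globset itself.

```rust
/// Sketch of bracket-style glob escaping: each meta-character is wrapped in
/// `[...]` so that it matches literally. `{` and `}` are included, which is
/// the oversight the commit above describes. (Hypothetical helper, not
/// globset's code.)
fn escape_glob(s: &str) -> String {
    let mut escaped = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '?' | '*' | '[' | ']' | '{' | '}' => {
                escaped.push('[');
                escaped.push(c);
                escaped.push(']');
            }
            _ => escaped.push(c),
        }
    }
    escaped
}
```

With this sketch, `escape_glob("a{b,c}*.rs")` yields `a[{]b,c[}][*].rs`, so the alternation braces no longer act as glob syntax.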
Andrew Gallant
e2362d4d51
searcher: add log message noting detected encoding
This helps improve diagnostics. Otherwise it can be easy to miss that
ripgrep is doing transcoding.

Fixes #2979
2025-01-25 14:27:00 -05:00
Kizhyk
d6b59feff8
github: update WASI compilation job
Ref https://blog.rust-lang.org/2024/04/09/updates-to-rusts-wasi-targets.html

PR #2970
2025-01-13 10:16:09 -05:00
Max Coplan
94305125ef
zsh: support sourcing zsh completion dynamically
Previously, you needed to save the completion script to a file and
then source it. Now, you can dynamically source completions in zsh by
running

    $ source <(rg --generate complete-zsh)

Before this commit, you would get an error after step 1.
After this commit, it should work as expected.

We also improve the FAQ item for zsh completions.

Fixes #2956
2024-12-31 08:23:13 -05:00
Andrew Gallant
79cbe89deb
doc: tweak wording for stdin detection
This makes it slightly more precise to cover weird cases like trying to
pass a directory on stdin.

Closes #2906
2024-09-30 07:38:05 -04:00
Thayne McCombs
bf63fe8f25
regex: add as_match method to Captures trait
Ref https://github.com/rust-lang/regex/issues/1146

PR #2898
2024-09-19 09:30:31 -04:00
Andrew Gallant
8bd5950296
changelog: add next section 2024-09-08 22:32:09 -04:00
Andrew Gallant
6e0539ab91
pkg/brew: update tap 2024-09-08 22:32:02 -04:00
Andrew Gallant
4649aa9700
14.1.1 2024-09-08 22:15:00 -04:00
Andrew Gallant
c009652e77
changelog: 14.1.1 2024-09-08 22:13:53 -04:00
Andrew Gallant
b9f7a9ba2b
deps: bump grep to 0.3.2 2024-09-08 22:11:17 -04:00
Andrew Gallant
a1960877cf
grep-0.3.2 2024-09-08 22:11:00 -04:00
Andrew Gallant
bb0925af91
deps: bump grep-printer to 0.2.2 2024-09-08 22:10:49 -04:00
Andrew Gallant
be117dbafa
grep-printer-0.2.2 2024-09-08 22:10:29 -04:00
Andrew Gallant
06dc13ad2d
deps: bump grep-searcher to 0.1.14 2024-09-08 22:09:55 -04:00
Andrew Gallant
c6c2e69b8f
grep-searcher-0.1.14 2024-09-08 22:09:27 -04:00
Andrew Gallant
e67c868ddd
deps: bump grep-pcre2 to 0.1.8 2024-09-08 22:09:23 -04:00
Andrew Gallant
d33f2e2f70
grep-pcre2-0.1.8 2024-09-08 22:08:41 -04:00
Andrew Gallant
082edafffa
deps: bump grep-regex to 0.1.13 2024-09-08 22:08:22 -04:00
Andrew Gallant
7c8dc332b3
grep-regex-0.1.13 2024-09-08 22:07:52 -04:00
Andrew Gallant
ea961915b5
deps: bump grep-cli to 0.1.11 2024-09-08 22:07:30 -04:00
Andrew Gallant
7943bdfe82
grep-cli-0.1.11 2024-09-08 22:06:59 -04:00
Andrew Gallant
312a7884fc
deps: bump ignore to 0.4.23 2024-09-08 22:06:39 -04:00
Andrew Gallant
ac02f54c89
ignore-0.4.23 2024-09-08 22:06:03 -04:00
Andrew Gallant
24b337b940
deps: bump globset to 0.4.15 2024-09-08 22:05:45 -04:00
Andrew Gallant
a5083f99ce
globset-0.4.15 2024-09-08 22:04:48 -04:00
Andrew Gallant
f89cdba5df
doc: update date in man page template 2024-09-08 22:04:11 -04:00
Andrew Gallant
f7b677d136
deps: update everything 2024-09-08 22:03:29 -04:00
Andrew Gallant
3f68a8f3d7
changelog: 14.1.1 2024-09-08 22:03:22 -04:00
Andrew Gallant
9d738ad0c0 regex: fix inner literal extraction that resulted in false negatives
In some rare cases, it was possible for ripgrep's inner literal detector
to extract a set of literals that could produce a false negative. #2884
gives an example: `(?i:e.x|ex)`. In this case, the set extracted can be
discovered by running `rg '(?i:e.x|ex)' --trace`:

    Seq[E("EX"), E("Ex"), E("eX"), E("ex")]

This extraction leads to building a multi-substring matcher for `EX`,
`Ex`, `eX` and `ex`. Searching the haystack `e-x` produces no match,
and thus, ripgrep shows no matches. But the regex `(?i:e.x|ex)` matches
`e-x`.

The issue at play here was that when two extracted literal sequences
were unioned, we were not correctly unioning their "prefix" attribute.
And this in turn leads to those literal sequences being combined
incorrectly via cross product. This case in particular triggers it
because two different optimizations combine to produce an incorrect
result. Firstly, the regex has a common prefix extracted and is
rewritten as `(?i:e(?:.x|x))`. Secondly, the `x` in the first branch of
the alternation has its `prefix` attribute set to `false` (correctly),
which means it can't be cross producted with another concatenation. But
in this case, it is unioned with the `x` from the second branch, and
this results in the union result having `prefix` set to `true`. This
in turn pops up and lets it get cross producted with the `e` prefix,
producing an incorrect literal sequence.

We fix this by changing the implementation of `union` to return
`prefix` set to `true` only when *both* literal sequences being unioned
have `prefix` set to `true`.

Doing this exposed a second bug that was present, but was purely
cosmetic: the extracted literals in this case, after the fix, are
`X` and `x`. They were considered "exact" (i.e., lead to a match),
but of course they are not. Observing an `X` or an `x` does not mean
there is a match. This was fixed by making `choose` always return
an inexact literal sequence. This is perhaps too conservative in
aggregate in some cases, but always correct. The idea here is that if
one is choosing between two concatenations, then it is likely the case
that the sequence returned should be considered inexact. The issue
is that this can lead to avoiding cross products in some cases that
would otherwise be correct. This is bad because it means extracting
shorter literals in some cases. (In general, the longer the literal the
better.) But we prioritize correctness for now and fix it. You can see
a few tests where this shortens some extracted literals.

Fixes #2884
2024-09-08 22:00:46 -04:00
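The `union` fix described in 9d738ad0c0 can be sketched roughly as below. This is only an illustration under stated assumptions: `LitSeq` and its fields are hypothetical stand-ins, not the extractor's real types.

```rust
/// Hypothetical stand-in for an extracted literal sequence.
#[derive(Clone, Debug, PartialEq)]
struct LitSeq {
    literals: Vec<String>,
    /// Per the commit message, a sequence with `prefix == false` must not be
    /// cross producted with another concatenation.
    prefix: bool,
}

/// Union two sequences. The result has `prefix` set only when *both* inputs
/// have it set, which is the rule the commit message describes.
fn union(mut a: LitSeq, b: LitSeq) -> LitSeq {
    for lit in b.literals {
        if !a.literals.contains(&lit) {
            a.literals.push(lit);
        }
    }
    a.prefix = a.prefix && b.prefix;
    a
}
```

In the `(?i:e(?:.x|x))` example, the `x` from the first branch has `prefix == false`, so the unioned result stays ineligible for a cross product with the `e` prefix.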
Andrew Gallant
6c5108ed17
github: add FUNDING 2024-09-03 11:46:01 -04:00
Andrew Gallant
e0f1000df6
deps: update everything
This removes `once_cell` (a dependency of `cc`) but adds `shlex` (also a
dependency of `cc`). AFAIK, ripgrep does not utilize anything in `cc`
that requires `shlex`, so it is pretty unfortunate that we have to spend
time compiling it. (We use `cc` only when the `pcre2` feature is
enabled.)
2024-08-28 11:38:43 -04:00
Henk-Jan Meijer
ea99421ec8
doc: fix transcription bug in ugrep benchmark command
I re-ran the benchmark and the timing remains nearly
unchanged, so that part was correct.

PR #2876
2024-08-21 13:58:36 -04:00
Cort Spellman
af8c386d5e
doc: fix typo in --heading flag help
PR #2864
2024-08-02 17:32:42 -04:00
Naser Aleisa
71d71d2d98
doc: refer to correct flag name for --engine=auto
PR #2850
2024-07-04 07:25:13 -04:00
Tobias Decking
c9ebcbd8ab
globset: optimize character escaping
Rewrites the char_to_escaped_literal and bytes_to_escaped_literal
functions in a way that minimizes heap allocations. After this, the
resulting string is the only allocation remaining.

I believe when this code was originally written, the routines available
to avoid heap allocations didn't exist.

I'm skeptical that this matters in the grand scheme of things, but I
think this is still worth doing for "good sense" reasons.

PR #2833
2024-06-05 09:56:00 -04:00
Pratham Verma
dec0dc3196
doc: update link for debian installation
PR #2829
2024-06-02 17:48:50 -04:00
Andrew Gallant
2f0a269f07
github: use an obviously old version of ripgrep in issue template
This should hopefully avoid confusion where the version number in the
issue template is mistaken for an implication that the version must
therefore be recent.

Ref #2824
2024-05-27 18:22:11 -04:00
Andrew Gallant
0a0893a765
ignore: add debug log message when opening gitignore file
I'm not sure why it took me this long to add this debug message, but
it's quite useful in determining where ignore rules are coming from.
2024-05-27 14:53:19 -04:00
Bryan Honof
35160a1cdb
doc: add Flox as an installation method
Ref https://flox.dev/docs/

PR #2817
2024-05-24 11:59:19 -04:00
Andrew Gallant
f1d23c06e3 cli: add more logging for stdin heuristic detection
Stdin heuristic detection is complicated and opaque enough that it's
worth having easy access to the complete story that leads ripgrep to
decide whether to search stdin or not.

Ref #2806
2024-05-13 09:43:04 -04:00
tgolang
22b677900f
doc: fix some typos
PR #2754
2024-05-13 07:44:51 -04:00
NicoElbers
bb6f0f5519
doc: fix typo in --vimgrep help message
PR #2802
2024-05-11 07:02:24 -04:00
Andrew Gallant
b6ef99ee55
doc: remove unused man page template
This seems to be causing confusion. And since we don't use it as of
ripgrep 14, let's just remove it.

Man page generation is now done by ripgrep itself. That is:

    rg --generate man > rg.1

Closes #2801
2024-05-09 13:46:28 -04:00
Nicolas Holzschuch
bb8601b2ba
printer: make compilation on non-unix, non-windows platforms work
Some of the new hyperlink work caused ripgrep to stop compiling
on non-{Unix,Windows} platforms. The most popular of which is WASI.

This commit makes non-{Unix,Windows} compile again. And we add a
very basic WASI test in CI to catch regressions.

More work is needed to make tests on non-{Unix,Windows} platforms
work. And of course, this commit specifically takes the path of disabling
hyperlink support for non-{Unix,Windows} platforms.
2024-04-23 13:12:19 -04:00
Andrew Gallant
02b47b7469
deps: update everything
Notably, this removes winapi in favor of windows-sys, as a result of
winapi-util switching over to windows-sys[1].

Annoyingly, when PCRE2 is enabled, this brings in a dependency on
`once_cell`[2]. I had worked to remove it from my dependencies and now
it's back. Gah. I suppose I could disable the `parallel` feature of
`cc`, but that doesn't seem like a good trade-off.

[1]: https://github.com/BurntSushi/winapi-util/pull/13
[2]: https://github.com/rust-lang/cc-rs/pull/1037
2024-04-23 10:46:12 -04:00
redistay
d922b7ac11
doc: fix typo
PR #2776
2024-04-02 09:10:25 -04:00
Linda_pp
2acf25c689
ignore/types: add WGSL to the default file types
[WGSL][1] is a shading language for WebGPU. As defined in [Appendix
A][2], the file extension is `.wgsl`.

PR #2774 

[1]: https://www.w3.org/TR/WGSL/
[2]: https://www.w3.org/TR/WGSL/#text-wgsl-media-type
2024-04-01 23:05:15 -04:00
Vadim Kostin
80007698d3
ignore/types: add Vue
PR #2772
2024-04-01 07:49:29 -04:00
cgzones
3ad0e83471
ignore/walk: correct build_parallel() documentation
The returned closure should return `WalkState`, not `()`.

Closes #2767
2024-03-27 14:50:05 -04:00
Andrew Gallant
eca13f08a2
deps: bump everything else 2024-03-24 18:58:28 -04:00
Andrew Gallant
4f99f82b19
deps: bump pcre2 and pcre2-sys
This moves to PCRE2 10.43.
2024-03-24 18:58:06 -04:00
Anton Zhiyanov
327d74f161
doc: add link to unofficial playground
PR #2760
2024-03-20 08:11:09 -04:00
Brent Williams
9da0995df4
ignore/types: add 'svelte' to the default file types
Ref: https://svelte.dev/

PR #2759
2024-03-19 13:36:08 -04:00
Andrew Gallant
e9abbc1a02 cargo: nuke 'simd-accel' from orbit
This feature causes nothing but problems and is frequently broken. The
only optimizations it was enabling were SIMD optimizations for
transcoding. In particular, for UTF-16 transcoding. This is performed by
the [`encoding_rs`](https://github.com/hsivonen/encoding_rs) crate,
which specifically uses unstable portable SIMD APIs instead of the
stable non-portable SIMD APIs.

SIMD optimizations that apply to search have long been making use of
stable APIs, and are automatically enabled when your target supports
them. This is, IMO, the correct user experience and one that
`encoding_rs` refuses to support. I'm done dealing with it, so
transcoding will only use scalar code until the SIMD optimizations in
`encoding_rs` work on stable. (This doesn't mean that `encoding_rs` has
to change. This could also be fixed by stabilizing `std::simd`.)

Fixes #2748
2024-03-07 09:47:43 -05:00
Andrew Gallant
9bd30e8e48
deps: update everything 2024-03-07 09:38:22 -05:00
Andrew Gallant
59212d08d3
style: fix new lints
The Rust compiler seems to have gotten smarter at finding unused or
redundant imports.
2024-03-07 09:37:48 -05:00
SuperSpecialSweet
6ebebb2aaa
doc: fix typo in comments
PR #2741
2024-02-22 06:57:58 -05:00
Andrew Gallant
e92e2ef813
cli: remove stray dbg!
Whoops, forgot to review my commits before pushing.
2024-02-15 12:02:15 -05:00
Andrew Gallant
4a30819302
cli: tweak how "is one file" predicate works
In effect, we switch from `path.is_file()` to `!path.is_dir()`. In cases
where process substitution is used, for example, the path can actually
have type "fifo" instead of "file." Even if it's a fifo, we want to
treat it as-if it were a file. The real key here is that we basically
always want to consider a lone argument as a file so long as we know it
isn't a directory. Because a directory is the only thing that will
cause us to (potentially) search more than one thing.

Fixes #2736
2024-02-15 11:59:59 -05:00
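The predicate change in 4a30819302 amounts to something like the following sketch. `is_one_file` is a hypothetical name, and the real check may involve more conditions than shown here.

```rust
use std::path::Path;

/// Hypothetical helper: treat a lone path argument as a single file unless
/// it is known to be a directory. A FIFO created by process substitution is
/// not a regular file, so `path.is_file()` would reject it, but it passes
/// this check.
fn is_one_file(path: &Path) -> bool {
    !path.is_dir()
}
```

Note that a path that does not exist also passes this check, since `is_dir()` returns `false` for it.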
Wilfred Hughes
9b42af96f0
doc: fix typo in --hidden docs
PR #2718
2024-01-22 13:31:11 -05:00
Alex Touchet
648a65f197 doc: add missing date in changelog
PR #2704
2024-01-06 17:49:18 -05:00
Andrew Gallant
bdf01f46a6
changelog: start next section 2024-01-06 14:41:45 -05:00
Andrew Gallant
1c775f3a82
pkg/brew: update tap 2024-01-06 14:41:09 -05:00
Andrew Gallant
e50df40a19
14.1.0 2024-01-06 14:32:27 -05:00
Andrew Gallant
1fa76d2a42
changelog: add 14.1.0 blurb 2024-01-06 14:31:16 -05:00
Andrew Gallant
44aa5a417d
deps: bump ignore to 0.4.22 2024-01-06 14:28:28 -05:00
Andrew Gallant
2c3897585d
ignore-0.4.22 2024-01-06 14:27:44 -05:00
Andrew Gallant
6e9141a9ca
deps: update everything 2024-01-06 14:26:52 -05:00
Andrew Gallant
c8e4a84519
cli: prefix all non-fatal error messages with 'rg: '
Fixes #2694
2024-01-06 14:15:52 -05:00
Andrew Gallant
f02a50a69d
changelog: various updates 2024-01-06 13:59:52 -05:00
fe9lix
b9c774937f ignore: fix reference cycle for compiled matchers
It looks like there is a reference cycle caused by the compiled
matchers (compiled HashMap holds ref to Ignore and Ignore holds ref
to HashMap). Using weak refs fixes issue #2690 in my test project.
Also confirmed via before and after when profiling the code, see the
attached screenshots in #2692.

Fixes #2690
2024-01-06 12:50:42 -05:00
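The cycle described in b9c774937f, and how weak references break it, can be illustrated with a sketch. The `Ignore` and `Cache` types below are simplified, hypothetical stand-ins, not the crate's actual definitions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock, Weak};

/// Shared cache of compiled matchers. It stores `Weak` references, so a
/// cache entry alone does not keep its matcher alive.
type Cache = Arc<RwLock<HashMap<String, Weak<Ignore>>>>;

/// Hypothetical stand-in for a matcher that also holds a handle to the cache.
#[allow(dead_code)]
struct Ignore {
    compiled: Cache,
}

/// Create a matcher and register it in the cache under `key`.
fn add(cache: &Cache, key: &str) -> Arc<Ignore> {
    let ig = Arc::new(Ignore { compiled: Arc::clone(cache) });
    cache
        .write()
        .unwrap()
        .insert(key.to_string(), Arc::downgrade(&ig));
    ig
}
```

If the map held `Arc<Ignore>` instead, each matcher would own the cache and the cache would own each matcher, so neither would ever be freed.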
Andrew Gallant
67dd809a80
ignore: add some 'allow(dead_code)' annotations
I don't usually like doing this and would prefer to just delete unused
code, but I don't have the context required to understand why this code
is unused. A refresh of this crate is on the (distant) horizon, so I'll
just leave these here for now to squash the warnings.
2024-01-06 12:25:06 -05:00
Jan Verbeek
e0a85678e1 complete/fish: improve shell completions for fish
- Stop using `-n __fish_use_subcommand`. This had the effect of
ignoring options if a positional argument has already been given, but
that's not how ripgrep works.

- Only suggest negation options if the option they're negating is
passed (e.g., only complete `--no-pcre2` if `--pcre2` is present). The
zsh completions already do this.

- Take into account whether an option takes an argument. If an option
is not a switch then it won't suggest further options until the
argument is given, e.g. `-C<tab>` won't suggest options but `-i<tab>`
will.

- Suggest correct arguments for options. We already completed a fixed
set of choices where available, but now we go further:

  - Filenames are only suggested for options that take filenames.

  - `--pre` and `--hostname-bin` suggest binaries from `$PATH`.

  - `-t`/`--type`/&c use `--type-list` for suggestions, like in zsh,
  with a preview of the glob patterns.

  - `--encoding` uses a hardcoded list extracted from the zsh
  completions. This has been refactored into a separate file, and the
  range globs (`{1..5}`) replaced by comma globs (`{1,2,3,4,5}`) since
  those work in both shells. I verified that this produces the same
  list as before in zsh, and the same list in fish (albeit in a
  different order).

PR #2684
2024-01-06 10:39:35 -05:00
David Gilman
23af5fb043
doc: update MSRV in README
PR #2673
2024-01-06 10:22:26 -05:00
Andrew Gallant
5dec4b8e37 ci: drop custom Cross images
It looks like these aren't needed any more? I'm not sure why to be
honest. I suspect it's because we no longer need asciidoc(tor)? to
generate man pages. And I believe tests that require things like `zstd`
are automatically skipped if `zstd` isn't installed.
2024-01-06 10:21:34 -05:00
Younes El-karama
827082a33a ci: add more ARM build configurations to CI and release workflows
... it turns out that rustembedded/cross:armv7-unknown-linux-musleabi
doesn't exist. And looking more closely, it looks like the Cross project
has decided to shake things up and publish images to ghcr instead. So we
migrate everything over to that.
2024-01-06 10:21:34 -05:00
Andrew Gallant
6c2a550e1e
deps: update everything
This drops a dependency on memoffset due to a crossbeam-epoch update.
w00t.
2024-01-04 19:46:29 -05:00
Andrew Gallant
8e8fc9c503
deps: bump pcre2-sys to 0.2.8
This release contains some extra logic to disable the JIT on musleabi
targets.
2024-01-04 19:44:28 -05:00
Andrew Gallant
2057023dc5 readme: update benchmarks
We add a few more too.
2024-01-03 16:21:04 -05:00
Andrew Gallant
3f2fe0afee
deps: update everything
This also drops a dependency on scopeguard, courtesy of crossbeam-epoch
dropping it. Not sure why they did, but fine by me.
2023-12-17 09:37:33 -05:00
amesgen
56c7ad175a
ignore/types: add Lean
Ref: https://lean-lang.org/

PR #2678
2023-12-07 11:46:00 -05:00
Timo Wilken
5b7a30846f
doc: fix Guix install instructions
`guix install` should not be run using `sudo`, as per
<https://packages.guix.gnu.org/packages/ripgrep/>.

PR #2669
2023-11-30 10:54:54 -05:00
Patrick Williams
2a4dba3fbf
ignore/types: add meson.options
Starting with meson 1.1, there is a preference for using meson.options
instead of meson_options.txt.  Add the new filename to the meson set.

PR #2666
2023-11-29 19:03:12 -05:00
liberodark
84d65865e6
doc: add Void Linux installation instructions
PR #2665
2023-11-29 07:49:20 -05:00
Andrew Gallant
d9aaa11873
pkg/brew: update tap 2023-11-28 16:23:16 -05:00
Andrew Gallant
67ad9917ad
14.0.3 2023-11-28 16:18:14 -05:00
Andrew Gallant
daa157b5f9
core: actually implement --sortr=path
This is an embarrassing oversight. A `todo!()` actually made its way
into a release! Oof.

This was working in ripgrep 13, but I had redone some aspects of sorting
and this just got left undone.

Fixes #2664
2023-11-28 16:17:14 -05:00
Andrew Gallant
ca5e294ad6
pkg/brew: update tap 2023-11-27 21:44:06 -05:00
Andrew Gallant
6c7947b819
14.0.2 2023-11-27 21:38:21 -05:00
Andrew Gallant
9acb4a5405
deps: bump grep to 0.3.1 2023-11-27 21:37:41 -05:00
Andrew Gallant
0096c74c11
grep-0.3.1 2023-11-27 21:36:54 -05:00
Andrew Gallant
8c48355b03
deps: bump grep-printer to 0.2.1 2023-11-27 21:36:44 -05:00
Andrew Gallant
f9b86de963
grep-printer-0.2.1 2023-11-27 21:36:02 -05:00
Andrew Gallant
d23b74975a
deps: bump grep-searcher to 0.1.13 2023-11-27 21:35:53 -05:00
Andrew Gallant
a5cbdb3dfe
grep-searcher-0.1.13 2023-11-27 21:34:58 -05:00
Andrew Gallant
b6bac8484e
cargo: add release-lto profile
The idea is to build ripgrep with as much optimization as possible.

This makes compilation times absolutely obscene. They jump from <10
seconds to 30+ seconds on my i9-12900K. I don't even want to know how
long CI would take with these.

I tried some ad hoc benchmarks and could not notice any meaningful
improvement with the LTO binary versus the normal release profile.
Because of that, I still don't think it's worth bloating the release
cycle times.

Ref #1225
2023-11-27 21:31:03 -05:00
Andrew Gallant
805fa32d18 searcher: work around NUL line terminator bug
As the FIXME comment says, ripgrep is not yet using the new line
terminator option in regex-automata exposed for exactly this purpose.
Because of that, line anchors like `(?m:^)` and `(?m:$)` will only match
`\n` as a line terminator. This means that when --null-data is used in
combination with --line-regexp, the anchors inserted by --line-regexp
will not match correctly. This is only a big deal in the "fast" path,
which requires the regex engine to deal with line terminators itself
correctly. The slow path strips line terminators regardless of what they
are, and so the line anchors can match (begin/end of haystack).

Fixes #2658
2023-11-27 21:17:12 -05:00
Andrew Gallant
2d518dd1f9 release: tweak how sha256sum is invoked
The output would ideally just have the basename of the file and not a
meaningless relative path.

Fixes #2654
2023-11-27 21:17:12 -05:00
Jan Verbeek
8575d26179 complete/fish: Fix syntax for negated options
And also, negated options don't take arguments.

Specifically, the fish completion generator currently forgets to add
`-l` to negation options, leading to a list of these errors:

    complete: too many arguments

    ~/.config/fish/completions/rg.fish (line 146):
    complete -c rg -n '__fish_use_subcommand'  no-sort-files -d '(DEPRECATED) Sort results by file path.'
    ^
    from sourcing file ~/.config/fish/completions/rg.fish

    (Type 'help complete' for related documentation)

To reproduce, run `fish -c 'rg --generate=complete-fish | source'`.

It also potentially suggests a list of choices for negation options,
even though those never take arguments. That case doesn't occur with
any of the current options but it's an easy fix.

Fixes #2659, Closes #2655
2023-11-27 21:17:12 -05:00
Jon Jensen
2e81a7adfe doc: fix typo that was preventing interpolation
Closes #2662
2023-11-27 21:17:12 -05:00
Andrew Gallant
cd5440fb62
changelog: fix wording
Ref: https://news.ycombinator.com/item?id=38425790
2023-11-26 17:58:30 -05:00
Andrew Gallant
2ee690e87a
pkg/brew: update tap 2023-11-26 17:37:52 -05:00
Andrew Gallant
59f86a45d3
14.0.1 2023-11-26 16:33:35 -05:00
Andrew Gallant
2d31af38a2
cargo: include pkg/windows in crate package
Fixes #2653
2023-11-26 16:32:59 -05:00
Andrew Gallant
0da1176e7d
pkg/brew: update tap 2023-11-26 15:27:09 -05:00
Andrew Gallant
eeffcd50b7
doc: add step to run 'cargo package' 2023-11-26 15:25:23 -05:00
Andrew Gallant
625743d7c8
grep-0.3.0 2023-11-26 15:24:09 -05:00
Andrew Gallant
3d0171040a
grep-printer-0.2.0 2023-11-26 15:21:40 -05:00
Andrew Gallant
93429d0f85
14.0.0 2023-11-26 14:19:31 -05:00
Andrew Gallant
9c4b0baf10
deps: bump grep to 0.2.13 2023-11-26 14:18:53 -05:00
Andrew Gallant
179487aaed
grep-0.2.13 2023-11-26 14:18:17 -05:00
Andrew Gallant
b407d62b63
deps: bump grep-searcher to 0.1.12 2023-11-26 14:18:03 -05:00
Andrew Gallant
9bd1e737bc
grep-searcher-0.1.12 2023-11-26 14:17:26 -05:00
Andrew Gallant
c12231c621
deps: bump grep-pcre2 to 0.1.7 2023-11-26 14:17:11 -05:00
Andrew Gallant
b0df573834
grep-pcre2-0.1.7 2023-11-26 14:16:46 -05:00
Andrew Gallant
85b2ceecd1
deps: bump grep-regex to 0.1.12 2023-11-26 14:16:31 -05:00
Andrew Gallant
fee7ac79f1
grep-regex-0.1.12 2023-11-26 14:15:44 -05:00
Andrew Gallant
54d5540c10
deps: bump grep-matcher to 0.1.7 2023-11-26 14:15:34 -05:00
Andrew Gallant
d0251c77fe
grep-matcher-0.1.7 2023-11-26 14:13:54 -05:00
Andrew Gallant
6aa5993d4b
deps: bump grep-cli to 0.1.10 2023-11-26 14:13:40 -05:00
Andrew Gallant
6f78d211bf
grep-cli-0.1.10 2023-11-26 14:13:03 -05:00
Andrew Gallant
51aa339830
deps: bump ignore to 0.4.21 2023-11-26 14:12:55 -05:00
Andrew Gallant
381c521d02
ignore-0.4.21 2023-11-26 14:12:16 -05:00
Andrew Gallant
57495db10e
deps: bump globset to 0.4.14 2023-11-26 14:11:43 -05:00
Andrew Gallant
47e37175ca
globset-0.4.14 2023-11-26 14:11:05 -05:00
Andrew Gallant
8697946718
release/doc: set date in man page 2023-11-26 14:10:07 -05:00
Andrew Gallant
8058859701
changelog: add link for reporting perf improvements/regressions 2023-11-26 14:05:23 -05:00
Andrew Gallant
e9ff90c8ff
changelog: updates for the 14.0.0 release 2023-11-26 14:03:59 -05:00
Andrew Gallant
bf9f74ea5b
doc: progress 2023-11-26 13:32:39 -05:00
Andrew Gallant
9b5091b895
deps: bump to memmap2 0.9.0 2023-11-26 13:32:39 -05:00
Andrew Gallant
a4f165e3ab
deps: bump everything 2023-11-26 13:32:39 -05:00
Andrew Gallant
d1def67000
deps: bump pcre2 to 0.2.6 2023-11-26 13:32:20 -05:00
Andrew Gallant
56af4d4a74
cli: add simple flag suggestions
We look for similar flag names via Jaccard index on ngrams. In my
experience this tends to work better than Levenshtein or other edit
distance based metrics. Principally because it allows for out-of-order
suggestions. For example, --case-smart will result in a suggestion for
--smart-case, even though the edit distance between them is pretty big.

This is something Clap did for us. I initially thought it wasn't
necessary to add this back in, but I realized it wouldn't be much work
and might actually be helpful to folks.
2023-11-26 09:55:44 -05:00
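The approach in 56af4d4a74 can be sketched as below. The n-gram size (2) and the similarity threshold are assumptions chosen for illustration, and `ngrams`, `jaccard` and `suggest` are hypothetical names; the commit message does not give these details.

```rust
use std::collections::HashSet;

/// Collect the set of character n-grams of `s` (expects `n >= 1`).
fn ngrams(s: &str, n: usize) -> HashSet<String> {
    let chars: Vec<char> = s.chars().collect();
    chars
        .windows(n)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Jaccard index: size of the intersection divided by size of the union.
fn jaccard(a: &str, b: &str, n: usize) -> f64 {
    let (x, y) = (ngrams(a, n), ngrams(b, n));
    let union = x.union(&y).count();
    if union == 0 {
        return 0.0;
    }
    x.intersection(&y).count() as f64 / union as f64
}

/// Return the known flag most similar to `given`, if its score is at least
/// `min`.
fn suggest<'a>(given: &str, flags: &[&'a str], min: f64) -> Option<&'a str> {
    flags
        .iter()
        .map(|&f| (f, jaccard(given, f, 2)))
        .filter(|&(_, score)| score >= min)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(f, _)| f)
}
```

For `case-smart` against `smart-case`, 7 of the 11 distinct bigrams are shared. That gives a high score even though the words are in a different order, which is the property the commit message highlights.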
Andrew Gallant
b0f6645408
ci: remove local deb build-and-publish script
I moved this to GitHub Actions. w00t.
2023-11-25 18:27:52 -05:00
Andrew Gallant
3dbe371fe4
ci: add Debian release build
Previously, we were running 'cargo deb' locally. But the release process
is a little simpler now thanks to GitHub Actions and the 'gh' tool, so I
felt comfortable putting the 'deb' generation in CI.

Now the only real manual part of release asset creation is the M2
release, but that should hopefully be automated once GitHub makes Apple
silicon runners available for free.
2023-11-25 18:20:05 -05:00
Andrew Gallant
30d06b3b4c changelog: note that --no-ignore --ignore-vcs works as expected
This fix fell out of the move off of Clap.

Closes #1376
2023-11-25 15:03:53 -05:00
Andrew Gallant
6a055d922c doc: clarify errors for -z/--search-zip
Fixes #1622
2023-11-25 15:03:53 -05:00
Andrew Gallant
e007523229 doc: note the precedence of -t/--type
Fixes #1635
2023-11-25 15:03:53 -05:00
Andrew Gallant
88353c80da doc: be more explicit about ripgrep's behavior when printing to a tty
Fixes #1709
2023-11-25 15:03:53 -05:00
Andrew Gallant
cd3bcce42d changelog: mention M2 binaries for releases
Fixes #1737
2023-11-25 15:03:53 -05:00
Andrew Gallant
1ea3552f2d changelog: mention perf improvement for inner literals
Fixes #1746
2023-11-25 15:03:53 -05:00
Andrew Gallant
9ed7565fcb cli: error when searching for NUL
Basically, unless the -a/--text flag is given, it is generally always an
error to search for an explicit NUL byte because the binary detection
will prevent it from matching.

Fixes #1838
2023-11-25 15:03:53 -05:00
Andrew Gallant
7bb9f35d2d doc: clarify that --pre can accept any kind of path
Fixes #2046
2023-11-25 15:03:53 -05:00
Andrew Gallant
b138d5740a log: add message about number of threads used
Closes #2122
2023-11-25 15:03:53 -05:00
Andrew Gallant
3f0c8c2900 doc: improve -r/--replace docs
It looks like this was done a while ago, but it didn't get added to the
CHANGELOG or connected with the corresponding issue.

Fixes #2201
2023-11-25 15:03:53 -05:00
Andrew Gallant
0e6e9417f1 log: add message when a binary file is skipped
The way we do this is a little hokey but I believe it is correct.

Fixes #2246
2023-11-25 15:03:53 -05:00
Andrew Gallant
fded2a5fe1 doc: add cargo-binstall instructions
Closes #2298
2023-11-25 15:03:53 -05:00
Andrew Gallant
e14eeb288f doc: mention that --stats is always implied by --json
Fixes #2337
2023-11-25 15:03:53 -05:00
Andrew Gallant
1cbcefddc9 doc: add more warnings about --vimgrep
The --vimgrep flag has some severe footguns when using a pattern that
matches very frequently. We had already written some docs to warn about
that, but now we also include a suggestion to avoid exorbitant heap
usage.

Closes #2505
2023-11-25 15:03:53 -05:00
Andrew Gallant
4fec9ffca8 doc: make the opening line a bit more descriptive
This mimics what was written in the man page.

Closes #2401
2023-11-25 15:03:53 -05:00
Andrew Gallant
00225a035b doc: improve --sort=path
This clarifies that the paths are not sorted in a fully lexicographic
order, but that / is treated specially.

Fixes #2418
2023-11-25 15:03:53 -05:00
Andrew Gallant
286de9564e cli: rejigger --version to include PCRE2 info
This adds info about whether PCRE2 is available or not to the output of
--version. Essentially, --version now subsumes --pcre2-version, although
we do retain the former because it (usefully) emits an exit code based
on whether PCRE2 is available or not.

Closes #2645
2023-11-25 15:03:53 -05:00
Andrew Gallant
038524a580 printer: trim before applying max column windowing
Previously, we were applying the -M/--max-columns flag *before* trimming
prefix ASCII whitespace. But this doesn't make a whole lot of sense. We
should be trimming first, since the result of trimming is ultimately what
we'll be printing and that's what -M/--max-columns should be applied to.

Fixes #2458
2023-11-25 15:03:53 -05:00
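The ordering fixed in 038524a580 can be shown with a minimal sketch. `visible_line` is a hypothetical helper, and it assumes the column limit is measured in bytes and that over-long lines are simply omitted.

```rust
/// Hypothetical sketch: trim leading ASCII whitespace first, then apply the
/// column limit to what would actually be printed. Returns `None` when the
/// trimmed line is still longer than `max_columns` bytes.
fn visible_line(line: &str, max_columns: usize) -> Option<&str> {
    let trimmed = line.trim_start_matches(|c: char| c.is_ascii_whitespace());
    if trimmed.len() > max_columns {
        None
    } else {
        Some(trimmed)
    }
}
```

With the old ordering, a short match preceded by deep indentation could be dropped even though the text that would be printed fits within the limit.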
Andrew Gallant
8f9557d183 changelog: mention shell completion generation feature
Closes #2425
2023-11-25 15:03:53 -05:00
Andrew Gallant
58e7d2ea63 doc: add docs about .ignore/.rgignore in parent directories
Closes #2479
2023-11-25 15:03:53 -05:00
Andrew Gallant
b7df9f8caa changelog: mention --field-match-separator bug fix
This was probably fixed in the migration off of Clap.

Closes #2519
2023-11-25 15:03:53 -05:00
Andrew Gallant
ebb986e767 logging: show heuristic information and decision
When one does not provide any paths to ripgrep to search, it has to
guess between searching stdin and the current working directory. It is
possible for this guess to be wrong, and having the heuristics and the
choice in the debug logs is useful for diagnosing this.

The failure mode here is still pretty bad because you need to know to
reach for the `--debug` flag in the first place. Namely, the typical
failure mode is that ripgrep tries to search stdin while the intent is
for it to search the current working directory, and thus it likely blocks
forever waiting for data on stdin.

(Arguably this is a problem with the process architecture that invokes
ripgrep. It shouldn't give ripgrep an open stdin handle that isn't
closed.)

Closes #2524
2023-11-25 15:03:53 -05:00
Andrew Gallant
a2907db2de
faq: update donation section to mention sponsorships 2023-11-21 19:05:58 -05:00
Andrew Gallant
470ad1d072
faq: rewrite the section on shell completions 2023-11-21 19:02:07 -05:00
Tavian Barnes
6d7550d58e ignore: Avoid contention on num_pending
Previously, every worker would increment the shared num_pending count on
every new work item, and decrement it after finishing them, leading to
lots of contention.  Now, we only track the number of workers actively
running, so there is no contention except when workers go to sleep or
wake up.

Closes #2642
2023-11-21 18:39:32 -05:00
Andrew Gallant
af55fc2b38 cli: make -d a short flag for --max-depth
Interestingly, ripgrep now only has two available ASCII letter short
flags remaining: -k and -y.

Closes #2643, Closes #2644
2023-11-21 18:39:32 -05:00
Andrew Gallant
3d2f49f6fe changelog: --pretty now behaves more sensibly
This actually just kind of fell out of the migration off of Clap as a
result of treating `-p/--pretty` more rigorously as an alias for
`--line-number --heading --color always`.

Fixes #2381, Closes #2637
2023-11-21 18:39:32 -05:00
Andrew Gallant
50b2472438 ci: strip release binaries on macOS
We were purportedly doing this already, but actually weren't because of
confusion in the `if` condition.

Closes #2636
2023-11-21 18:39:32 -05:00
Andrew Gallant
ae2a09915f printer: drop dependency on base64 crate
Instead, we just roll our own. A slow version of this is pretty simple
to do, and that's what we write here. The `base64` crate supports a lot
more functionality and is quite fast, but we care about neither of those
things for this particular aspect of ripgrep. (base64 is only used for
non-UTF-8 data or file paths, which are both quite rare.)
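
For illustration, a "slow but simple" encoder along the lines described above might look like the sketch below. This is not the code from the commit; the name `base64_encode` and the choice of the standard padded RFC 4648 alphabet are assumptions made for the example.

```rust
/// Standard base64 alphabet (RFC 4648). Assumed here for illustration.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode `input` as padded base64. Deliberately simple rather than fast.
/// (Hypothetical helper, not ripgrep's actual routine.)
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32,
        // treating missing trailing bytes as zero.
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(chunk.get(1).copied().unwrap_or(0));
        let b2 = u32::from(chunk.get(2).copied().unwrap_or(0));
        let group = (b0 << 16) | (b1 << 8) | b2;

        // The first two output characters are always present.
        out.push(ALPHABET[((group >> 18) & 0x3F) as usize] as char);
        out.push(ALPHABET[((group >> 12) & 0x3F) as usize] as char);
        // The last two are replaced by '=' padding for short chunks.
        if chunk.len() > 1 {
            out.push(ALPHABET[((group >> 6) & 0x3F) as usize] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(group & 0x3F) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}
```

Processing three input bytes at a time with no lookup tables beyond the alphabet keeps the routine short, which fits a code path that is rarely exercised.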
2023-11-21 18:39:32 -05:00
Andrew Gallant
9c84575229 printer: drop dependency on serde_derive
As suggested by @epage[1].

Ad hoc timings on my i7-12900K:

    before cargo build: 4.91s
    before cargo build release: 8.05s
    after cargo build: 4.69s
    after cargo build release: 7.83s

... pretty underwhelming if you ask me. Ah well. And on my M2 mac mini:

    before cargo build: 6.18s
    before cargo build release: 14.50s
    after cargo build: 5.52s
    after cargo build release: 13.44s

Still kind of underwhelming, but definitely better. It shaves a full
second off of compile times in release mode. I went back to my
i7-12900K, but passed `-j1` to `cargo build` to force single threaded
mode:

    before cargo build: 19.44s
    before cargo build release: 50.64s
    after cargo build: 16.76s
    after cargo build release: 48.00s

Which seems pretty consistent with the modest improvements above.

Looking at `cargo build --timings`, the beefiest chunk of time is spent
in compiling `regex-automata`, by far. This is fine because it's core
functionality. I wish a fast general purpose regex engine with its
internals exposed as a separately versioned library didn't require so
much code... Blech.

[1]: https://old.reddit.com/r/rust/comments/17rd8ww/faster_compilation_with_the_parallel_frontend_in/k8igjlg/
2023-11-21 18:39:32 -05:00
Andrew Gallant
cddb5f57f8 printer: rejigger how we use serde_derive
The idea is that by bringing derives in via serde's optional feature, it
was inhibiting compilation speed[1]. We try to fix that by depending on
`serde_derive` as a distinct dependency.

It does seem to improve overall compilation time, but only by about 0.5
seconds. With that said, my machine has a lot of cores, so it's possible
this will help more on less powerful CPUs.

[1]: https://old.reddit.com/r/rust/comments/17rd8ww/faster_compilation_with_the_parallel_frontend_in/k8igjlg/
2023-11-21 18:39:32 -05:00
Andrew Gallant
5dc424d302 doc: scrub mentions of asciidoc/asciidoctor
This optional dependency is now finally dropped. So ends a long journey
of trying to generate man pages in a lightweight and dependable way. The
only thing I could figure out how to make work reliably was to just
learn how to write roff myself. Yay.
2023-11-21 18:39:32 -05:00
Andrew Gallant
040d8f2171 ci: improve docs for manual build-and-publish scripts 2023-11-21 18:39:32 -05:00
Andrew Gallant
c81caa673b core: fix file separator bug
I introduced a regression in the migration off of Clap by having
both the buffer writer and the printer be responsible for printing file
separators in multi-threaded search. The buffer writer owns that
responsibility in multi-threaded search.
2023-11-21 18:39:32 -05:00
Andrew Gallant
082245dadb cli: replace clap with lexopt and supporting code
ripgrep began its life with docopt for argument parsing. Then it moved
to Clap and stayed there for a number of years. Clap has served ripgrep
well, and it probably could continue to serve ripgrep well, but I ended
up deciding to move off of it.

Why?

The first time I had the thought of moving off of Clap was during the
2->3->4 transition. I thought the 3.x and 4.x releases were great, but
for me, it ended up moving a little too quickly. Since the release of
4.x was telegraphed around when 3.x came out, I decided to just hold off
and wait to migrate to 4.x instead of doing a 3.x migration followed
shortly by another 4.x migration. Of course, I just never ended up doing
the migration at all. I never got around to it and there just wasn't a
compelling reason for me to upgrade. While I never investigated it, I
saw an upgrade as a non-trivial amount of work in part because I didn't
encapsulate the usage of Clap enough.

The above is just what got me started thinking about it. It wasn't
enough to get me to move off of it on its own. What ended up pushing me
over the edge was a combination of factors:

* As mentioned above, I didn't want to run on the migration treadmill.
This has proven to not be much of an issue, but at the time of the
2->3->4 releases, I didn't know how long Clap 4.x would be out before a
5.x would come out.
* The release of lexopt[1] caught my eye. IMO, that crate demonstrates
exactly how something new can arrive on the scene and just thoroughly
solve a problem minimalistically. It has the docs, the reasoning, the
simple API, the tests and good judgment. It gets all the weird corner
cases right that Clap also gets right (and is part of why I was
originally attracted to Clap).
* I have an overall desire to reduce the size of my dependency tree. In
part because a smaller dependency tree tends to correlate with better
compile times, but also in part because it reduces my reliance on and
trust in others. It lets me be the "master" of ripgrep's destiny by reducing
the amount of behavior that is the result of someone else's decision
(whether good or bad).
* I perceived that Clap solves a more general problem than what I
actually need solved. Despite the vast number of flags that ripgrep has,
its requirements are actually pretty simple. We just need simple
switches and flags that support one value. No multi-value flags. No
sub-commands. And probably a lot of other functionality that Clap has
that makes it so flexible for so many different use cases. (I'm being
hand wavy on the last point.)

With all that said, perhaps most importantly, the future of ripgrep
possibly demands a more flexible CLI argument parser. In today's world,
I would really like, for example, flags like `--type` and `--type-not`
to be able to accumulate their repeated values into a single sequence
while respecting the order they appear on the CLI. For example, prior
to this migration, `rg regex-automata -Tlock -ttoml` would not return
results in `Cargo.lock` in this repository because the `-Tlock` always
took priority even though `-ttoml` appeared after it. But with this
migration, `-ttoml` now correctly overrides `-Tlock`. We would like to
do similar things for `-g/--glob` and `--iglob` and potentially even
now introduce a `-G/--glob-not` flag instead of requiring users to use
`!` to negate a glob. (Which I had done originally to work-around this
problem.) And some day, I'd like to add some kind of boolean matching to
ripgrep perhaps similar to how `git grep` does it. (Although I haven't
thought too carefully on a design yet.) In order to do that, I perceive
it would be difficult to implement correctly in Clap.
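
As a rough sketch of the ordering semantics described above: repeated `-t`/`-T` flags can be accumulated into one sequence, and the last flag that mentions one of a file's types decides the outcome. The names `Selection`, `TypeChanges` and `is_searched` are hypothetical and are not ripgrep's actual types. The example also assumes, as the paragraph implies, that `Cargo.lock` belongs to both the `lock` and `toml` types.

```rust
/// Whether a flag selects (`-t`) or negates (`-T`) a file type.
/// (Hypothetical type for illustration.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Selection {
    Select,
    Negate,
}

/// All `-t`/`-T` flags, in the order they appeared on the command line.
#[derive(Debug, Default)]
struct TypeChanges {
    changes: Vec<(Selection, String)>,
}

impl TypeChanges {
    /// Record one flag occurrence, preserving CLI order.
    fn push(&mut self, selection: Selection, type_name: &str) {
        self.changes.push((selection, type_name.to_string()));
    }

    /// Decide whether a file belonging to `file_types` should be searched.
    fn is_searched(&self, file_types: &[&str]) -> bool {
        // Walk the flags from last to first: the most recent flag that
        // mentions one of this file's types wins.
        for (selection, name) in self.changes.iter().rev() {
            if file_types.iter().any(|t| *t == name.as_str()) {
                return *selection == Selection::Select;
            }
        }
        // No flag mentioned this file. It is searched only if the user
        // did not restrict the search with any `-t` flag.
        !self.changes.iter().any(|(s, _)| *s == Selection::Select)
    }
}
```

With `-Tlock -ttoml` pushed in that order, a file in both types is searched; with the order reversed, it is skipped.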

I believe that this last point is possible to implement correctly in
Clap 2.x, although it is awkward to do so. I have not looked closely
enough at the Clap 4.x API to know whether it's still possible there. In
any case, these were enough reasons to move off of Clap and own more of
the argument parsing process myself.

This did require a few things:

* I had to write my own logic for how arguments are combined into one
single state object. Of course, I wanted this. This was part of the
upside. But it's still code I didn't have to write for Clap.
* I had to write my own shell completion generator.
* I had to write my own `-h/--help` output generator.
* I also had to write my own man page generator. Well, I had to do this
with Clap 2.x too, although my understanding is that Clap 4.x supports
this. With that said, without having tried it, my guess is that I
probably wouldn't have liked the output it generated because I
ultimately had to write most of the roff by hand myself to get the man
page I wanted. (This also had the benefit of dropping the build
dependency on asciidoc/asciidoctor.)

While this is definitely a fair bit of extra work, it overall only cost
me a couple days. IMO, that's a good trade off given that this code is
unlikely to change again in any substantial way. And it should also
allow for more flexible semantics going forward.

Fixes #884, Fixes #1648, Fixes #1701, Fixes #1814, Fixes #1966

[1]: https://docs.rs/lexopt/0.3.0/lexopt/index.html
2023-11-20 23:51:53 -05:00
Andrew Gallant
c33f623719 cargo: explicitly configure musl to be statically linked
It looks like the musl target will, at some point, default to be
dynamically linked. This config knob should make it so that it's always
statically linked.

Ref https://github.com/rust-lang/compiler-team/issues/422
Ref https://github.com/rust-lang/compiler-team/issues/422#issuecomment-812135847
2023-11-20 23:51:53 -05:00
Jonas Platte
824778c009 globset: add GlobSet::builder
This avoids needing to import and call GlobSetBuilder::new explicitly.

Closes #2635
2023-11-20 23:51:53 -05:00
Kento Okamoto
922bad2b92 ignore: improve 'excludesFile' parsing
This permits the value to be surrounded in double quotes. It's still not
perfect, but probably better than it was. Getting this to be more
correct will likely require writing (or using) a real parser, which I'm
not particularly inclined to do at present.

Fixes #2392, Closes #2629
2023-11-20 23:51:53 -05:00
Andrew Gallant
538ba956dc deps: bump regex and regex-automata 2023-11-20 23:51:53 -05:00
Andrew Gallant
443c057042 deps: bump regex, regex-automata and regex-syntax 2023-11-20 23:51:53 -05:00
Andrew Gallant
5b88515faf build: a bit of clean-up
This does just a smidge of polishing in the build script source code.
2023-11-20 23:51:53 -05:00
Andrew Gallant
92c81b1225 core: switch to anyhow
This commit adds `anyhow` as a dependency and switches over to it from
Box<dyn Error>.

It actually looks like I've kept all of my errors rather shallow, such
that we don't get a huge benefit from anyhow at present. But now that
anyhow is in use, I expect to use its "context" feature more going
forward.
2023-11-20 23:51:53 -05:00
Tavian Barnes
53679e4c43 ignore: simplify the work-stealing strategy
There's no particular reason for this change. I happened to be looking
at the code again and realized that stealing from your left neighbour
or your right neighbour shouldn't make a difference (and indeed perf is
the same in my benchmarks).

Closes #2624
2023-11-20 23:51:53 -05:00
Andrew Gallant
8b766a2522 ripgrep: disable hyperlinks by default
As a result of discussion in #2611, it seems prudent to disable
hyperlinks by default. Ideally they would be enabled, but it looks like
some environments may barf on them. Since this is the first release with
hyperlink support, it makes sense to me at least to make users opt into
them. This does not preclude enabling them by default in future
releases.
2023-11-20 23:51:53 -05:00
Andrew Gallant
c21302b409 regex: tweak inner literal heuristic
Previously, we had logic to skip our own inner literal optimization if
the regex itself was already (likely) accelerated. It turns out that the
presence of a Unicode word boundary can defeat acceleration to a point.
It's likely enough that even if the underlying regex is accelerated, it
would be prudent to do our own inner literal optimization if the pattern
has a Unicode word boundary.

Normally a Unicode word boundary doesn't defeat literal optimizations,
since even the slower engines can make use of *prefix* literal
optimizations. But a regex can be accelerated via its own inner or
suffix literal optimizations, and those require the use of a DFA (or
lazy DFA). Since DFAs crap out on haystacks that contain a non-ASCII
Unicode scalar value when the regex contains a Unicode word boundary, it
follows that an "accelerated" regex can still wind up being quite slow.

(An "accelerated" regex can also slow down because of restrictions on
avoiding quadratic behavior, but I believe this happens less frequently
and is not as severe as the slowdown caused by Unicode word
boundaries. Namely, avoiding quadratic behavior just means giving up on
the inner literal optimization for a single search. In which case, the
regex engine can still fall back to a normal forward DFA. That will
definitely be slower than an inner literal optimization done by ripgrep,
but not quite as dramatic as it would be when DFAs can't be used at
all.)
2023-11-20 23:51:53 -05:00
Andrew Gallant
8a5b81716a deps: update dependencies
Specifically, regex-syntax 0.8.1 has this fix:
f082244720
2023-11-20 23:51:53 -05:00
Andrew Gallant
7099e174ac cargo: remove dependency patches
I'm too lazy to fixup old commits.
2023-10-09 20:29:52 -04:00
Andrew Gallant
dd810779d4 changelog: add another note about -w/--word-regexp bugs
This was fixed a few commits ago when we updated to regex-automata 0.4
(regex 1.10).

Fixes #2623
2023-10-09 20:29:52 -04:00
Andrew Gallant
5011f6e9f1 changelog: add perf bug fix for \b
Like the previous CHANGELOG entry, this marks a bug that was fixed
likely with the introduction of regex 1.9:

    $ hyperfine "rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt" "rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt"
    Benchmark 1: rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt
      Time (mean ± σ):      1.034 s ±  0.011 s    [User: 1.030 s, System: 0.004 s]
      Range (min … max):    1.021 s …  1.053 s    10 runs

    Benchmark 2: rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt
      Time (mean ± σ):       6.3 ms ±   0.3 ms    [User: 4.6 ms, System: 1.6 ms]
      Range (min … max):     5.6 ms …   7.3 ms    343 runs

    Summary
      'rg -ic '\bfoo\b \bbar\b' git-3a06386e.txt' ran
      164.95 ± 7.70 times faster than 'rg-13.0.0 -ic '\bfoo\b \bbar\b' git-3a06386e.txt'

This was not fixed by making \b itself faster, but rather, by improving
inner literal extraction. In particular, if the regex doesn't have any
literals extracted, then search time can still be quite slow:

    $ time rg-13.0.0 -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    57538

    real    0.427
    user    0.423
    sys     0.003
    maxmem  46 MB
    faults  0
    $ time rg -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    57538

    real    0.337
    user    0.333
    sys     0.003
    maxmem  46 MB
    faults  0

But then again, so is grep, because grep doesn't benefit from any
literal optimizations either:

    $ time grep -E -ic '\b[a-z]{3}\b\s\b[a-z]{3}\b' git-3a06386e.txt
    62396

    real    1.316
    user    1.292
    sys     0.007
    maxmem  13 MB
    faults  7

The count mismatch should probably be investigated.

Fixes #1760
2023-10-09 20:29:52 -04:00
Andrew Gallant
a2799ccb41 changelog: add bug fix for \b
This was probably fixed in a past commit where I bumped the regex engine
to 1.9 (or perhaps more precisely, regex-automata 0.3). But I didn't
track it as fixed at the time.

Fixes #1275
2023-10-09 20:29:52 -04:00
Andrew Gallant
a13b5e0196 deps: update various crates 2023-10-09 20:29:52 -04:00
Andrew Gallant
9626f16757 progress 2023-10-09 20:29:52 -04:00
Andrew Gallant
f7ff34fdf9 searcher: simplify 'replace_bytes' routine
I did this in the course of trying to optimize it. I don't believe I
made it any faster, but the refactoring led to code that I think is
more readable.
2023-10-09 20:29:52 -04:00
Andrew Gallant
b9de003f81 matcher: add a bunch of inline annotations
Many of these functions should be inlineable, but I'm not 100% sure
that they can be inlined without these annotations. We don't want to
force things, but we do try and nudge the compiler in the right
direction.
2023-10-09 20:29:52 -04:00
Andrew Gallant
1659fb9b43 printer: hand-roll decimal formatting
It seems like a trifle, but if the match frequency is high enough, the
allocation+formatting of line numbers (and columns and byte offsets)
starts to matter. We squash that part of the profile in this commit by
doing our own decimal formatting. I speculate that we get a speed-up
from this by avoiding the formatting machinery and also a possible
allocation.
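
A minimal sketch of what hand-rolled decimal formatting can look like, assuming a `u64` input and a fixed-size stack buffer. The type name and layout here are illustrative and may differ from what the commit actually does.

```rust
/// Formats a `u64` as ASCII decimal digits without heap allocation.
/// (Illustrative sketch, not necessarily ripgrep's implementation.)
struct DecimalFormatter {
    /// `u64::MAX` has 20 decimal digits, so 20 bytes always suffice.
    buf: [u8; 20],
    /// Index of the first valid digit in `buf`.
    start: usize,
}

impl DecimalFormatter {
    fn new(mut n: u64) -> DecimalFormatter {
        let mut buf = [0u8; 20];
        let mut start = buf.len();
        // Emit digits from least significant to most significant,
        // filling the buffer from the end. The loop runs at least once
        // so that zero is formatted as "0".
        loop {
            start -= 1;
            buf[start] = b'0' + (n % 10) as u8;
            n /= 10;
            if n == 0 {
                break;
            }
        }
        DecimalFormatter { buf, start }
    }

    /// The formatted digits, ready to be written to an output buffer.
    fn as_bytes(&self) -> &[u8] {
        &self.buf[self.start..]
    }
}
```

The caller can pass `as_bytes()` straight to a writer, which sidesteps both the `std::fmt` machinery and any intermediate `String`.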

An alternative would be to use the `itoa` crate, and it is indeed
marginally faster in ad hoc benchmarks, but I'm satisfied enough with
this solution.
2023-10-09 20:29:52 -04:00
Andrew Gallant
dd1bc5b898 printer: sprinkle in a few #[inline] annotations
These seem to help when ripgrep emits a lot of output, especially when
the --column flag is used.
2023-10-09 20:29:52 -04:00
Andrew Gallant
c9bfbe1e3d deps: bump regex and regex-automata
This brings in a fix for a bug I found during ad hoc benchmarking:
aa4e4c7120
2023-10-09 20:29:52 -04:00
Andrew Gallant
88524a2b52 core: dedup patterns
ripgrep does not, and likely never will, report which pattern matched.
Because of that, we can dedup the patterns via just their concrete
syntax without any fuss.

This is somewhat of a pathological case because you don't expect the end
user to pass duplicate patterns in general. But if the end user
generated a list of, say, names and did not dedup them, then ripgrep
could end up spending a lot of extra time on those duplicates if there
are many of them. By deduping them explicitly in the application, we
essentially remove their extra cost completely.
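
Deduplicating by concrete syntax can be sketched as below. The function name `dedup_patterns` is hypothetical, and preserving first-occurrence order is an assumption of this example rather than something the commit message states.

```rust
use std::collections::HashSet;

/// Remove duplicate patterns by exact string equality, keeping the
/// first occurrence of each. (Hypothetical helper for illustration.)
fn dedup_patterns(patterns: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut unique = Vec::with_capacity(patterns.len());
    for pattern in patterns {
        // `insert` returns false if the pattern was already present.
        if seen.insert(pattern.clone()) {
            unique.push(pattern);
        }
    }
    unique
}
```

Because the comparison is purely textual, two patterns that are semantically equivalent but spelled differently (say, `a|b` and `b|a`) are both kept.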
2023-10-09 20:29:52 -04:00
Andrew Gallant
9c6732bd26 printer: remove 'subl' alias
It was apparently using a format specific to a particular plugin. I did
know that, but apparently the plugin is not ubiquitous or de facto
standard[1]. Thus, including it I think just leads to more confusion. We
definitely do not want to be in the business of bundling aliases for
every conceivable plugin to different editors, so just drop it. We
expose the ability to write your own format for exactly this sort of
reason.

[1]: https://github.com/BurntSushi/ripgrep/discussions/2611#discussioncomment-7138302
2023-10-09 20:29:52 -04:00
Andrew Gallant
392bb0944a core: polish the core of ripgrep
This I believe finishes our quest to do mechanical updates to ripgrep's
style, bringing it in line with my current practice (loosely speaking).
2023-10-09 20:29:52 -04:00
Andrew Gallant
90b849912f deps: bump what we can 2023-10-09 20:29:52 -04:00
Andrew Gallant
6d17b3ed68 deps: drop thread_local, lazy_static and once_cell
This is largely made possible by the addition of std::sync::OnceLock to
the standard library, and the memory pool available in regex-automata.
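
For context, the kind of lazily initialized global that `lazy_static` or `once_cell` used to provide can be written with `std::sync::OnceLock` roughly as follows. The function and its data are made up for the example and are not taken from ripgrep.

```rust
use std::sync::OnceLock;

/// Returns a lazily initialized, process-wide list.
/// (Hypothetical example data, not from ripgrep.)
fn default_extensions() -> &'static [&'static str] {
    // The cell starts empty. The closure passed to `get_or_init` runs at
    // most once, even if several threads race to call this function.
    static EXTENSIONS: OnceLock<Vec<&'static str>> = OnceLock::new();
    EXTENSIONS.get_or_init(|| vec!["rs", "toml", "md"])
}
```

Every call after the first returns a reference to the same vector, so the initialization cost is paid once.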
2023-10-09 20:29:52 -04:00
Andrew Gallant
f16ea0812d ignore: polish
Like previous commits, we do a bit of polishing and bring the style up
to my current practice.
2023-10-09 20:29:52 -04:00
Andrew Gallant
be9e308999 globset: use a Pool from regex-automata
In the time before, we just used a RegexSet from the regex crate. That
allocated unconditionally, so there was nothing we could do and it
didn't expose any APIs to reuse that memory. But now that we're using
the lower level regex-automata, we can reuse a PatternSet.

Ideally we would just provide a way for the caller to build a PatternSet
(perhaps via an opaque type) so that we don't have to shuffle data into
a PatternSet and then back into the caller's `Vec<usize>`. But this at
least avoids allocating for every search.
2023-10-09 20:29:52 -04:00
Andrew Gallant
d53b7310ee searcher: polish
This updates some dependencies and brings code style in line with my
current practice.
2023-10-09 20:29:52 -04:00
Andrew Gallant
e30bbb8cff grep: update to the 2021 edition 2023-10-09 20:29:52 -04:00
Andrew Gallant
7f45640401 globset: polishing
This brings the code in line with my current style. It also inlines the
dozen or so lines of code for FNV hashing instead of bringing in a
micro-crate for it. Finally, it drops the dependency on regex in favor
of using regex-syntax and regex-automata directly.
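
To give a sense of how little code FNV hashing needs, here is a sketch of a 64-bit FNV-1a hasher. It assumes the FNV-1a variant with the standard 64-bit offset basis and prime; the type name is hypothetical and the code in globset may differ in detail.

```rust
use std::hash::Hasher;

/// A 64-bit FNV-1a hasher. (Illustrative sketch.)
struct Fnv1aHasher(u64);

impl Default for Fnv1aHasher {
    fn default() -> Fnv1aHasher {
        // Standard 64-bit FNV offset basis.
        Fnv1aHasher(0xcbf29ce484222325)
    }
}

impl Hasher for Fnv1aHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            // FNV-1a: XOR the byte in first, then multiply by the prime.
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }
}
```

A type like this can be plugged into `HashMap` via `std::hash::BuildHasherDefault<Fnv1aHasher>`, which is what makes a separate micro-crate unnecessary.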
2023-10-09 20:29:52 -04:00
Andrew Gallant
0951820f63 core: doc and logging touchups 2023-10-09 20:29:52 -04:00
Lucas Trzesniewski
c3e85f2b44 printer: fix a few issues in the hyperlink docs
Closes #2612
2023-10-09 20:29:52 -04:00
Andrew Gallant
3ad7a0d95e crates: remove hard-coded links
And use rustdoc's native intra-crate links. So much nicer.
2023-10-09 20:29:52 -04:00
Andrew Gallant
82d3183a04 regex: some minor polish
I think I already did a clean-up of this crate when I moved it to regex
1.9, so the polish here is very minor.
2023-10-09 20:29:52 -04:00
Andrew Gallant
798f8981eb pcre2: small polishing 2023-10-09 20:29:52 -04:00
Andrew Gallant
96f01b92a0 matcher: polish the grep-matcher crate
Not much here. Just updating to reflect my current style and bringing
the crate to the 2021 edition.
2023-10-09 20:29:52 -04:00
Linda_pp
abfa65c2c1
ignore/types: add *.sarif for SARIF format files
[SARIF] is a format for reporting static analysis results. It is [used
by GitHub CodeQL][GH] for example.

Here are some samples from Microsoft's VSCode extension:

https://github.com/microsoft/sarif-vscode-extension/tree/main/samples

The SARIF format is built on top of JSON.

[SARIF]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html
[GH]: https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning

PR #2620
2023-10-05 13:23:29 -04:00
Andrew Gallant
f608d4d9b3 hyperlink: rejigger how hyperlinks work
This essentially takes the work done in #2483 and does a bit of a
facelift. A brief summary:

* We reduce the hyperlink API we expose to just the format, a
  configuration and an environment.
* We move buffer management into a hyperlink-specific interpolator.
* We expand the documentation on --hyperlink-format.
* We rewrite the hyperlink format parser to be a simple state machine
  with support for escaping '{{' and '}}'.
* We remove the 'gethostname' dependency and instead insist on the
  caller to provide the hostname. (So grep-printer doesn't get it
  itself, but the application will.) Similarly for the WSL prefix.
* Probably some other things.
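
To illustrate the state-machine idea from the list above, a format parser with `{{` and `}}` escapes could be sketched like this. It is not the parser from the commit: the names `Part` and `parse_format`, the error messages and the lack of variable-name validation are all assumptions of the example.

```rust
/// One piece of a parsed format string. (Hypothetical type.)
#[derive(Debug, PartialEq, Eq)]
enum Part {
    /// Literal text, with `{{` and `}}` already unescaped.
    Text(String),
    /// The name of a `{variable}` to interpolate later.
    Variable(String),
}

/// Parse a format string such as `file://{host}{path}` into parts.
fn parse_format(format: &str) -> Result<Vec<Part>, String> {
    enum State {
        Verbatim,   // inside literal text
        OpenBrace,  // just saw '{'
        CloseBrace, // just saw '}' outside of a variable
        Variable,   // inside a variable name
    }

    let mut parts = Vec::new();
    let mut text = String::new();
    let mut var = String::new();
    let mut state = State::Verbatim;

    for ch in format.chars() {
        state = match (state, ch) {
            (State::Verbatim, '{') => State::OpenBrace,
            (State::Verbatim, '}') => State::CloseBrace,
            (State::Verbatim, c) => {
                text.push(c);
                State::Verbatim
            }
            // "{{" is an escaped literal '{'.
            (State::OpenBrace, '{') => {
                text.push('{');
                State::Verbatim
            }
            (State::OpenBrace, '}') => {
                return Err("empty variable name".to_string());
            }
            // A variable starts: flush any pending literal text first.
            (State::OpenBrace, c) => {
                if !text.is_empty() {
                    parts.push(Part::Text(std::mem::take(&mut text)));
                }
                var.push(c);
                State::Variable
            }
            // "}}" is an escaped literal '}'.
            (State::CloseBrace, '}') => {
                text.push('}');
                State::Verbatim
            }
            (State::CloseBrace, _) => {
                return Err("unescaped '}' in literal text".to_string());
            }
            (State::Variable, '}') => {
                parts.push(Part::Variable(std::mem::take(&mut var)));
                State::Verbatim
            }
            (State::Variable, '{') => {
                return Err("'{' inside variable name".to_string());
            }
            (State::Variable, c) => {
                var.push(c);
                State::Variable
            }
        };
    }

    match state {
        State::Verbatim => {
            if !text.is_empty() {
                parts.push(Part::Text(text));
            }
            Ok(parts)
        }
        State::OpenBrace | State::Variable => {
            Err("unclosed variable".to_string())
        }
        State::CloseBrace => Err("unescaped '}' at end of format".to_string()),
    }
}
```

Keeping the parser as a single pass over the characters with an explicit state makes the escaping rules easy to audit.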

Overall, the general structure of #2483 was kept. The biggest change is
probably requiring the caller to pass in things like a hostname instead
of having the crate do it. I did this for a couple reasons:

1. I feel uncomfortable with code deep inside the printing logic
   reaching out into the environment to assume responsibility for
   retrieving the hostname. This feels more like an application-level
   responsibility. Arguably, path canonicalization falls into this same
   bucket, but it is more difficult to rip that out. (And we can do it
   in the future in a backwards compatible fashion I think.)
2. I wanted to permit end users to tell ripgrep about their system's
   hostname in their own way, e.g., by running a custom executable. I
   want this because I know at least for my own use cases, I sometimes
   log into systems using an SSH hostname that is distinct from the
   system's actual hostname (usually because the system is shared in
   some way or changing its hostname is not allowed/practical).

I think that's about it.

Closes #665, Closes #2483
2023-09-25 14:39:54 -04:00
Andrew Gallant
23e21133ba printer: move PathPrinter into grep-printer
I originally did not put PathPrinter into grep-printer because I
considered it somewhat extraneous to what a "grep" program does, and
also that its implementation was rather simple. But now with hyperlink
support, its implementation has grown a smidge more complicated. And
more importantly, its existence required exposing a lot more of the
hyperlink guts. Without it, we can keep things like HyperlinkPath and
HyperlinkSpan completely private.

We can now also keep `PrinterPath` completely private as well. And this
is a breaking change.
2023-09-25 14:39:54 -04:00
Andrew Gallant
09905560ff printer: clean-up
Like a previous commit did for the grep-cli crate, this does some
polishing to the grep-printer crate. We aren't able to achieve as much
as we did with grep-cli, but we at least eliminate all rust-analyzer
lints and group imports in the way I've been doing recently.

Next we'll start doing some more invasive changes.
2023-09-25 14:39:54 -04:00
Andrew Gallant
25a7145c79 cli: add new 'hostname' function
This will enable us to query for the current system's hostname in both
Unix and Windows environments.

We could have pulled in the 'gethostname' crate for this, but:

1. I'm not a huge fan of micro-crates.
2. The 'gethostname' crate panics if an error occurs. (Which, to be
fair, an error should never occur, but it seems plausible on borked
systems? ripgrep runs in a lot of places, so I'd rather not take the
chance of a panic bringing down ripgrep for an optional convenience
feature.)
3. The 'gethostname' crate uses the 'windows-targets' crate from
Microsoft. This is arguably the "right" thing to do, but ripgrep
doesn't use them yet and they appear high-churn.

So I just added a safe wrapper to do this to winapi-util[1] and then
inlined the Unix version here. This brings in no extra dependencies and
the routine is fallible so that callers can recover from potentially
strange failures.

[1]: https://github.com/BurntSushi/winapi-util/pull/14
2023-09-25 14:39:54 -04:00
Andrew Gallant
19a08bee8a cli: clean-up crate
This does a variety of polishing.

1. Deprecate the tty methods in favor of std's IsTerminal trait.
2. Trim down un-needed dependencies.
3. Use bstr to implement escaping.
4. Various aesthetic polishing.

I'm doing this as prep work before adding more to this crate. And as
part of a general effort toward reducing ripgrep's dependencies.
2023-09-25 14:39:54 -04:00
Lucas Trzesniewski
1a50324013 printer: add hyperlinks
This commit represents the initial work to get hyperlinks working and
was submitted as part of PR #2483. Subsequent commits largely retain the
functionality and structure of the hyperlink support added here, but
rejigger some things around.
2023-09-25 14:39:54 -04:00
Andrew Gallant
86ef683308 deps: update everything
Notably, this includes termcolor 1.3, which comes with hyperlink
support.
2023-09-20 11:52:42 -04:00
Tavian Barnes
d938e955af ignore: use work-stealing stack instead of Arc<Mutex<Vec<_>>>
This represents yet another iteration on how `ignore` enqueues and
distributes work in parallel. The original implementation used a
multi-producer/multi-consumer thread safe queue from crossbeam. At some
point, I migrated to a simple `Arc<Mutex<Vec<_>>>` and treated it as a
stack so that we did depth first traversal. This helped with memory
usage in very wide directories.

But it turns out that a naive stack-behind-a-mutex can be quite a bit
slower than something that's a little smarter, such as a work-stealing
stack used in this commit. My hypothesis for why this helps is that
without the stealing component, work distribution can get stuck in
sub-optimal configurations that depend on which directory entries get
assigned to a particular worker. It's likely that this can result in
some workers getting "more" work than others, just by chance, while the
others remain idle. But the work-stealing approach heads that off.

This does re-introduce a dependency on parts of crossbeam which is kind
of a bummer, but it's carrying its weight for now.

Closes #1823, Closes #2591
Ref https://github.com/sharkdp/fd/issues/28
2023-09-20 11:52:42 -04:00
Thilo Uttendorfer
cad1f5fae2 ignore: fix filtering when searching subdirectories
When searching subdirectories the path was not correctly built and
included duplicate parts. This fix will remove the duplicate part if
possible.

Fixes #1757, Closes #2295
2023-09-20 11:52:42 -04:00
dana
2198bd92fa
github: convert bug-report issue template to issue form
Trying this to see how well it works.

PR #2560
2023-09-18 11:07:46 -04:00
Andrew Gallant
a4387ed491
deps: bump to aho-corasick 1.1.0
This brings in aarch64 SIMD support for Teddy[1]. In effect, it means
searches that are multiple (but a small number of) literals extracted
will likely get much faster on aarch64 (i.e., Apple silicon). For
example, from the PR, on my M2 mac mini:

    $ time rg-before-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
    3055

    real    8.196
    user    7.726
    sys     0.469
    maxmem  5728 MB
    faults  17

    $ time rg-after-teddy-aarch64 -i -c 'Sherlock Holmes' OpenSubtitles2018.half.en
    3055

    real    1.127
    user    0.701
    sys     0.425
    maxmem  4880 MB
    faults  13

w00t.

[1]: https://github.com/BurntSushi/aho-corasick/pull/129
2023-09-18 09:35:06 -04:00
Andrew Gallant
d2a409f89f
deps: bump to memchr 2.6.3
This brings in a fix for line counting when SIMD isn't available[1].

[1]: https://github.com/BurntSushi/memchr/pull/137
2023-09-02 14:40:45 -04:00
Andrew Gallant
6cdb99ea61
deps: drop bytecount in favor of memchr_iter(..).count()
As of the memchr 2.6 release, its Iterator::count method is specialized
to only count the number of occurrences instead of finding the offset of
each occurrence. This replaces ripgrep's use of the bytecount crate.
While micro-benchmarks suggest that memchr's method has better
throughput than bytecount, it turned out to be an illusion. Namely, on a
~13GB haystack prior to this change:

    $ time rg-bytecount 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number
    441450441:- You killed my friend, my best friend, my lifelong friend!

    real    1.473
    user    1.186
    sys     0.286
    maxmem  12512 MB
    faults  0

And then after:

    $ time rg 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number
    441450441:- You killed my friend, my best friend, my lifelong friend!

    real    1.532
    user    1.280
    sys     0.250
    maxmem  12512 MB
    faults  0

But perf is just about in the same ballpark. That's good enough for me
at the moment in order to drop the extra dependency.

I did this because the marginal cost of adding the Iterator::count()
specialization to memchr was extremely small.
2023-09-02 12:25:34 -04:00
Andrew Gallant
551ad3bada
deps: update bstr 2023-09-02 12:15:15 -04:00
Andrew Gallant
8856f72df5
deps: update the regex family of crates 2023-09-02 12:14:50 -04:00
Yochem van Rosmalen
d596f6ebd0
ignore/types: add *.vsh to V type
PR #2604
2023-08-31 08:51:07 -04:00
Christian Vallentin
6cd9479634
ignore: implement FusedIterator for Walk
PR #2567
2023-08-28 22:55:19 -04:00
Andrew Gallant
3bfa125b2e ci: replace mips with powerpc64, aarch64 and s390x
We drop our MIPS target because it no longer works.[1] We were
previously using it as a means of testing ripgrep in a big endian
environment. So to achieve that without MIPS, we test on powerpc64 and
s390x. (No particular reason to do both, but why not.)

We also add aarch64 as a proxy for at least ensuring everything works
for the same architecture as Apple silicon. It's not a guarantee that
everything works, but it seems better than nothing until we can actually
test Apple silicon in CI.

[1]: c788378d6f
2023-08-28 22:45:46 -04:00
Andrew Gallant
51765f2f4c
ignore: apply rustfmt
I believe this happened because rustfmt now knows how to format `let ...
else` constructs.
2023-08-28 20:09:26 -04:00
Andrew Gallant
67abd49678
deps: bump everything else 2023-08-28 20:00:41 -04:00
Andrew Gallant
a7fe296772
deps: bump regex, regex-automata and regex-syntax 2023-08-28 19:59:09 -04:00
Andrew Gallant
f75991538b
deps: bump memchr to 2.6.0
This in particular brings in a PR[1] that provides huge speedups on
aarch64 (e.g., Apple silicon).

[1]: https://github.com/BurntSushi/memchr/pull/129
2023-08-28 19:56:59 -04:00
mataha
962d47e6a1
ignore/types: add Prolog file types
This improves the Prolog file type rules.

* `.pl` is the most common extension in the wild, though `.pro` is
   preferred in places where file extension may clash with Perl[1].
* `.P` is used for compatibility with XSB Prolog dialect[2].

PR #2590

[1]: https://www.swi-prolog.org/pldoc/man?section=fileext
[2]: https://www.swi-prolog.org/pldoc/man?section=xsb-source
2023-08-21 10:53:56 -04:00
mataha
19b6a45abb
ignore/types: tweak Gradle file types
This PR extends Gradle file types with the following:

 - Kotlin DSL buildscripts (`*.gradle.kts`)
 - Gradle Java properties (`gradle.properties`)
 - wrapper files (`gradle-wrapper.*`)
 - wrapper scripts (`gradlew`, `gradlew.bat`)

PR #2587
2023-08-20 18:49:02 -04:00
Andrew Gallant
c51790b56d
deps: update everything 2023-08-15 11:09:46 -04:00
Andrew Gallant
2af3734e0c
deps: update aho-corasick
This brings in [1,2], which improves memory usage substantially when
Aho-Corasick is used.

[1]: https://github.com/BurntSushi/aho-corasick/pull/120
[2]: https://github.com/BurntSushi/aho-corasick/pull/121
2023-08-15 11:08:41 -04:00
Andrew Gallant
61733f6378
globset-0.4.13 2023-08-05 09:34:36 -04:00
Andrew Gallant
7227e94ce5 globset: use non-capture groups in regex transform
We currently implement globs by converting them to regexes, and in doing
so, sometimes use grouping. In all but one case, we used non-capturing
groups. But for alternations, we used capturing groups, which was likely
just an oversight. We don't make use of capture groups at all, and while
they usually don't have any overhead, they lead to weird cases like this
one: https://github.com/rust-lang/regex/issues/1059

That particular issue is also a bug in the regex crate itself, which is
fixed in https://github.com/rust-lang/regex/pull/1062. Note though that
the bug fix in the regex crate is required. Even with this patch to
globset, memory usage is reduced (by about half in rust-lang/regex#1059)
but is not returned to where it was prior to the regex 1.9 release.
2023-08-05 09:33:57 -04:00
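To illustrate the globset change above (7227e94ce5): a sketch of translating a glob alternation into a non-capturing regex group. The function name is hypothetical and the alternatives are assumed to be regex-escaped already; this is not globset's actual translation code.

```rust
/// Join already-escaped alternatives into one non-capturing regex group.
/// Hypothetical helper for illustration only.
fn alternation_to_regex(alternatives: &[&str]) -> String {
    // `(?:...)` groups without allocating a capture slot, unlike `(...)`.
    format!("(?:{})", alternatives.join("|"))
}

fn main() {
    // The glob `{foo,bar}` maps to a group that matches either branch.
    println!("{}", alternation_to_regex(&["foo", "bar"]));
}
```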
Andrew Gallant
341a19e0d0
regex: fix fast path for -w/--word-regexp flag (#2576)
It turns out our fast path for -w/--word-regexp wasn't quite correct in
some cases. Namely, we use `(?m:^|\W)(<original-regex>)(?m:\W|$)` as the
implementation of -w/--word-regexp since `\b(<original-regex>)\b` has
some unintuitive results in certain cases, specifically when
<original-regex> matches non-word characters at match boundaries.

The problem is that using this formulation means that you need to
extract the capture group around <original-regex> to find the "real"
match, since the surrounding (^|\W) and (\W|$) aren't part of the match.
This is fine, but the capture group engine is usually slow, so we have a
fast path where we try to deduce the correct match boundary after an
initial match (before running capture groups). The problem is that doing
this is rather tricky because it's hard to know, in general, whether the
`^` or the `\W` matched.

This still doesn't seem quite right overall, but we at least fix one
more case.

Fixes #2574
2023-07-31 08:51:09 -04:00
Vidar
fed4fea217
ignore/types: add csproj
Supports the .NET C# Project file extension.

PR #2575
2023-07-31 07:08:44 -04:00
Andrew Gallant
053a1669bb
globset-0.4.12 2023-07-26 19:51:38 -04:00
David Tolnay
31d3f16254
api: impl Deserialize for GlobSet
PR #2569
2023-07-26 19:51:22 -04:00
Andrew Gallant
304a60e8e9
grep-cli-0.1.9 2023-07-18 13:25:23 -04:00
Andrew Gallant
1d35859861
globset-0.4.11 2023-07-12 12:58:43 -04:00
mataha
601e122e9f
ignore/types: add Windows Command Prompt files
This PR adds `*.bat` and `*.cmd` file types.

In doing so, it makes a distinction between batch files (old standard
from the MS-DOS era) and command scripts (new flavor - can operate on
batch files, although `*.cmd` is preferred for various reasons, the
main one being batch files will set `ERRORLEVEL` following inconsistent
MS-DOS style rules[1]).

PR #2556

[1]: https://groups.google.com/g/microsoft.public.win2000.cmdprompt.admin/c/XHeUq8oe2wk/m/LIEViGNmkK0J#i106
2023-07-10 15:58:17 -04:00
Andrew Gallant
efb2e8ce1e ci/release: use latest OS versions 2023-07-09 10:14:03 -04:00
xEgoist
8d464e5c78 ci/release: add sha256 sums to release artifacts
Fixes #1924, Closes #2168
2023-07-09 10:14:03 -04:00
Andrew Gallant
d67809d6c4 github: remove dependabot configuration
This does not seem to have worked at all. For example, there were
Actions being used that were clearly deprecated/archived[1]. But
Dependabot didn't make a peep. So just get rid of it to avoid the false
sense that someone is checking our dependencies for us.

[1]: https://github.com/BurntSushi/ripgrep/pull/2360
2023-07-09 10:14:03 -04:00
nguyenvukhang
6abb962f0d cli: fix non-path sorting behavior
Previously, sorting worked by sorting the parents and then sorting the
children within each parent. This was done during traversal, but it only
works when sorting parents preserves the overall order. This generally
only works for '--sort path' in ascending order.

This commit fixes the rest of the sorting behavior by collecting all of
the paths to search and then sorting them before searching. We only
collect all of the paths when sorting was requested.

Fixes #2243, Closes #2361
2023-07-09 10:14:03 -04:00
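The sorting fix above (6abb962f0d) boils down to "collect first, then sort". A minimal sketch under assumptions: `Entry`, `SortBy` and the `modified` timestamp field are hypothetical and do not mirror ripgrep's internal types.

```rust
use std::path::PathBuf;

/// Hypothetical sort keys for illustration.
enum SortBy {
    Path,
    Modified,
}

/// Hypothetical search candidate: a path plus a modification timestamp.
struct Entry {
    path: PathBuf,
    modified: u64,
}

/// Sort the fully collected list. Sorting after collection works for any
/// key and direction, whereas sorting per directory during traversal only
/// preserves a global order for ascending paths.
fn sort_entries(mut entries: Vec<Entry>, by: SortBy, reverse: bool) -> Vec<Entry> {
    match by {
        SortBy::Path => entries.sort_by(|a, b| a.path.cmp(&b.path)),
        SortBy::Modified => entries.sort_by_key(|e| e.modified),
    }
    if reverse {
        entries.reverse();
    }
    entries
}

fn main() {
    let collected = vec![
        Entry { path: "b/x.rs".into(), modified: 1 },
        Entry { path: "a/y.rs".into(), modified: 2 },
    ];
    for e in sort_entries(collected, SortBy::Modified, true) {
        println!("{} {}", e.modified, e.path.display());
    }
}
```

The trade-off named in the commit still applies: collecting everything up front costs memory and delays the first result, which is why it is only done when sorting was requested.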
Edoardo Pirovano
6d95c130d5 cli: add --stop-on-nonmatch flag
This causes ripgrep to stop searching an individual file after it has
found a non-matching line. But this only occurs after it has found a
matching line.

Fixes #1790, Closes #1930
2023-07-08 18:52:42 -04:00
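A sketch of the `--stop-on-nonmatch` behavior described above (6d95c130d5), using plain substring matching instead of a regex. The function name is hypothetical; ripgrep's searcher is structured differently.

```rust
/// Return the matching lines of `haystack`, giving up on the rest of the
/// input at the first non-matching line that follows a match.
fn search_stop_on_nonmatch<'a>(haystack: &'a str, needle: &str) -> Vec<&'a str> {
    let mut matches = Vec::new();
    for line in haystack.lines() {
        if line.contains(needle) {
            matches.push(line);
        } else if !matches.is_empty() {
            // A non-match after at least one match: stop searching.
            break;
        }
        // Non-matching lines before the first match are skipped.
    }
    matches
}

fn main() {
    let log = "header\nfoo 1\nfoo 2\nbar\nfoo 3\n";
    for line in search_stop_on_nonmatch(log, "foo") {
        println!("{line}");
    }
}
```

This is mostly useful for inputs where matching lines are contiguous, such as sorted logs.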
Garrett Thornburg
4782ebd5e0 core: lock stdout before printing an error message to stderr
Adds a new eprintln_locked macro which locks STDOUT before logging
to STDERR. This patch also replaces instances of eprintln with
eprintln_locked to avoid interleaving lines.

Fixes #1941, Closes #1968
2023-07-08 18:52:42 -04:00
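One way a macro like `eprintln_locked` from the commit above (4782ebd5e0) could be written. This is a sketch based only on the commit message, not necessarily the body that landed in ripgrep.

```rust
/// Print to stderr while holding the stdout lock, so that other threads
/// printing to stdout cannot interleave a line with the error message.
macro_rules! eprintln_locked {
    ($($tt:tt)*) => {{
        let stdout = std::io::stdout();
        // The guard is dropped at the end of this block, releasing stdout.
        let _guard = stdout.lock();
        eprintln!($($tt)*);
    }};
}

fn main() {
    println!("some/file.rs:1:a match");
    eprintln_locked!("rg: {}: permission denied", "other/file.rs");
}
```

The stdout lock in std is reentrant, so a thread that already holds it can still call `println!` without deadlocking.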
piegames
4993d29a16 globset: add 'escape' routine
Fixes #2060, Closes #2061
2023-07-08 18:52:42 -04:00
Seth Stadick
23adbd6795 cli: force binary existence check
Previously, we were only doing a binary existence check on Windows. And
in fact, the main point there wasn't binary existence, but ensuring we
didn't accidentally resolve a binary name relative to the CWD, which
could result in executing a program one didn't mean to run.

However, it is useful to be able to check whether a binary exists on any
platform when associating a glob with a binary. If the binary doesn't
exist, then the association can fail eagerly and let some other glob
apply.

Closes #1946
2023-07-08 18:52:42 -04:00
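A sketch of the kind of eager existence check the commit above (23adbd6795) describes: look a program name up in `PATH` and report whether anything was found. `find_in_path` is a hypothetical helper, not grep-cli's `resolve_binary`, and it ignores Windows extension handling.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Look for `name` in each directory listed in `path_var` and return the
/// first candidate that is an existing file.
fn find_in_path(name: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

fn main() {
    let path_var = env::var_os("PATH").unwrap_or_default();
    match find_in_path("gzip", &path_var) {
        Some(p) => println!("found: {}", p.display()),
        // Failing eagerly lets some other glob/command association apply.
        None => println!("gzip not found"),
    }
}
```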
Kevin Svetlitski
9df8ab42b1 cargo: reduce the size of the .crate file published to crates.io
None of this stuff is needed for the main ripgrep crate.

Closes #1940
2023-07-08 18:52:42 -04:00
Michal Terepeta
cb7501ff11 doc: clarify the comment on Worker.work_done
We call `work_done` only once the work has been actually performed
(otherwise `num_pending` could go to 0 before the actual work is done).

Closes #2039
2023-07-08 18:52:42 -04:00
Kyle Todeschini
3b66f37a31 doc: improve -r/--replace flag syntax docs
Fixes #2108, Closes #2123
2023-07-08 18:52:42 -04:00
Andrew Gallant
3eccb7c363 readme: add 'yum-utils' to RHEL/Centos instructions
Closes #2103
2023-07-08 18:52:42 -04:00
kotborealis
f30a30867e ignore/types: name aliases for file types
We also make py/python, md/markdown and ts/typescript aliases of one
another.

Note that this only introduces aliases at the point where default types
are defined. This just makes them a bit easier to read/write, and also
makes it easier to expose more names that describe the same thing.

Fixes #1857, Closes #1895
2023-07-08 18:52:42 -04:00
Klas Mellbourn
7313dca472 ignore/types: add 'typescript' alias for 'ts'
Closes #2009
2023-07-08 18:52:42 -04:00
Tama McGlinn
99bf2b01dc ignore/types: add Ada filetypes, including gprbuild and alire
*.adb and *.ads are the usual extensions for Ada source code,
and *.gpr indicates a GPRbuild project file used for Ada, and
these days often being combined with alire for package dependency
resolution. Alire stores a bunch of files named alire.toml in
different directories in your (gitignored) cache/dependencies/...

Closes #2013
2023-07-08 18:52:42 -04:00
Juan Francisco Cantero Hurtado
ee1360cc07 ignore/types: add raku extensions to ignore types
Closes #2117
2023-07-08 18:52:42 -04:00
Andrew Gallant
db6bb21a62 windows: attempt to enable long path support for MSVC targets
See the README and comments in the build.rs. Basically, this embeds an
XML file that I guess is a way of setting configuration knobs on
Windows. One of those knobs is enabling long path support. You still
need to enable it in your registry (lol), but this will handle the other
half of it.

Fixes #364, Closes #2049
2023-07-08 18:52:42 -04:00
Andrew Gallant
da7c81fb96 ignore/types: add MDX format to Markdown types
Ref https://mdxjs.com/

Closes #2142
2023-07-08 18:52:42 -04:00
chrispy
a4e3d56de1 ignore/types: add DITA (Darwin Information Typing Architecture)
Closes #2148
2023-07-08 18:52:42 -04:00
Ludi Rehak
7c83b90f95 doc: fix typo
Closes #2153
2023-07-08 18:52:42 -04:00
cuishuang
97b5b7769c doc: fix some typos
Closes #2195
2023-07-08 18:52:42 -04:00
dana
2708f9e81d complete: add extra-verbose support to _rg_types
When the extra-verbose style is set for the types tag, completed types
are displayed along with the patterns they correspond to. This can be
enabled by e.g. adding the following to .zshrc:

  zstyle ':completion:*:rg:*:types' extra-verbose true

This change also makes _rg_types use the actual rg specified on the
command line to look up types, and it fixes a mangled complete-all
style check.

Fixes #2195
2023-07-08 18:52:42 -04:00
Richard Sternagel
f3241fd657 cli: '--no-ignore-dot' should also ignore '.rgignore'
Fixes #2198, Closes #2202
2023-07-08 18:52:42 -04:00
Andrew Gallant
cfe357188d ignore/types: fix formatting 2023-07-08 18:52:42 -04:00
edam
792451e331 ignore/types: added V type
V (http://vlang.io) uses '.v' files.

Closes #2302
2023-07-08 18:52:42 -04:00
Andrew Gallant
7dafd58a32 readme: use 'sudo' more consistently
I definitely wonder whether I should just drop 'sudo' from the install
instructions and just rely on the user to "know" to do it. But some
commands legitimately do not require 'sudo', so there are actual
differences. Overall, this feels clearer to me but reasonable people can
disagree.
2023-07-08 18:52:42 -04:00
Andrew Savchenko
b92550b67b readme: add install command for ALT Linux
Closes #2330
2023-07-08 18:52:42 -04:00
Kevin Ushey
383d3b336b doc: add '--hidden' to example configuration
This increases visibility of the fact that hidden files are skipped by
default.

Closes #2356
2023-07-08 18:52:42 -04:00
James McKinney
fc7e634395 ci/release: Use GITHUB_REF_NAME instead of GITHUB_REF
This is a nice quality of life improvement.

Closes #2358
2023-07-08 18:52:42 -04:00
James McKinney
c9584b035b ci/release: use GitHub CLI
The old actions I was using are apparently archived because they make
use of deprecated features (like `set-output`). Sigh.

Closes #2360
2023-07-08 18:52:42 -04:00
Alex Rawson
f34fd5c4b6 globset: introduce option to keep empty alternates
Add a method GlobBuilder::empty_alternates and supporting mechanisms.

Ref #1368
Closes #2369
2023-07-08 18:52:42 -04:00
Jérome Eertmans
d51c6c005a globset: permit deserializing Glob from String
Closes #2386, Closes #2388
2023-07-08 18:52:42 -04:00
Jakub Wilk
ea05881319 readme: fix awkward grammar
Closes #2402
2023-07-08 18:52:42 -04:00
sitiom
1d4e3df19c readme: add winget installation section
Closes #2409
2023-07-08 18:52:42 -04:00
Mark Sisson
0f6181d309 ignore/types: add USD to the default file types
Closes #2432
2023-07-08 18:52:42 -04:00
Sam James
e902e2fef4 ignore/types: add Gentoo eclass type
Eclasses are "ebuild libraries" and generally if you're filtering
for/filtering out an ebuild/eclass, you don't want the other either.

Followup to 4dfea016b915bb1e88679361de83a91e60447835

Closes #2437
2023-07-08 18:52:42 -04:00
angrycandy
07cbfee225 ignore/types: improve Elixir globs
Closes #2450
2023-07-08 18:52:42 -04:00
Andrew Gallant
d675844510 core: don't let context flags override each other
This matches the behavior of GNU grep which does not ignore
before-context and after-context completely if the context flag is also
provided.

Note that this change wasn't done just to match GNU grep. In this case,
GNU grep has the more sensible behavior.

Fixes #2288, Closes #2451
2023-07-08 18:52:42 -04:00
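A sketch of the context-flag resolution described above (d675844510), assuming the GNU grep convention that the general context value is only a fallback for whichever side was not set explicitly. The function and parameter names are hypothetical.

```rust
/// Resolve the effective (before, after) context line counts.
/// `both` stands for the general context flag and is used only when the
/// specific before/after value is absent.
fn resolve_context(
    before: Option<usize>,
    after: Option<usize>,
    both: Option<usize>,
) -> (usize, usize) {
    let fallback = both.unwrap_or(0);
    (before.unwrap_or(fallback), after.unwrap_or(fallback))
}

fn main() {
    // After-context given explicitly, general context as the fallback.
    let (before, after) = resolve_context(None, Some(1), Some(5));
    println!("before={before} after={after}");
}
```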
Andrew Gallant
54e609d657 doc: add another example for the config file
Closes #2453
2023-07-08 18:52:42 -04:00
Misaki
43bbcca06f doc: note '-n' and '-N' override each other
Closes #2460
2023-07-08 18:52:42 -04:00
Eric Arellano
ad9bfdd981 ignore/gitignore: expose gitconfig_excludes_path
I have reservations about this, but it looks useful and doesn't seem
terribly onerous to support. The `ignore` crate will really always need
to have some kind of logic supporting this in some form I think.

Closes #2482
2023-07-08 18:52:42 -04:00
Gal Ofri
36194c2742 test: test that regex inline flags work as intended
This was originally fixed by using non-capturing groups when joining
patterns in crates/core/args.rs, but before that landed, it ended up
getting fixed via a refactor in the course of migrating to regex 1.9.
Namely, it's now fixed by pushing pattern joining down into the regex
layer, so that patterns can be joined in the most effective way
possible.

Still, #2488 contains a useful test, so we bring that in here. The
test actually failed for `rg -e ')('`, since it expected the command to
fail with a syntax error. But my refactor actually causes this command
to succeed. And indeed, #2488 worked around this by special casing a
single pattern. That work-around fixes it for the single pattern case,
but doesn't fix it for the -w or -X or multi-pattern case. So for now,
we're content to leave well enough alone. The only way to fix this
for real is to parse each regexp individually and verify that each is
valid on its own. It's not clear that doing so is worth it.

Fixes #2480, Closes #2488
2023-07-08 18:52:42 -04:00
Jakub Jirutka
0c1cbd99f3 ignore: tweak regex crate features
This removes most of the Unicode features as they aren't currently
used. We can always add them back later if necessary.

We can avoid the unicode-perl feature by changing `\s` to `[[:space:]]`,
which uses the ASCII-only definition of `\s`. Since we don't expect
non-ASCII whitespace in git config files, this seems okay.

Closes #2502
2023-07-08 18:52:42 -04:00
Jon Parise
96cfc0ed13 ignore/types: add 'graphql' type
GraphQL file extensions: .graphql and .graphqls (schema)

We could also add `.gql`, but perhaps it's less correct to do so. We'll
start conservatively here, and we can always add `.gql` later.

Closes #2439, Closes #2508
2023-07-08 18:52:42 -04:00
mataha
da8ecddce9 cli: make resolve_binary take COM executables into account
When `resolve_binary()` attempts to resolve a path to a program on
Windows while searching for a program in `PATH` without an extension,
`ripgrep` will assume the extension of the file to be `.exe` as it's
the *de facto* standard, which will work most (99.99%) of the time...

...unless the binary is a COM executable (we're on Windows, duh).

Closes #2523
2023-07-08 18:52:42 -04:00
Yifei Teng
545a7dc759 ignore/types: add cml to the default types list
It's used in Fuchsia to mean "component manifest language."[1]

[1]: https://fuchsia.dev/reference/cml?hl=en

Closes #2529
2023-07-08 18:52:42 -04:00
Jonathan Schwender
16f783832e doc: update rust-version in Cargo.toml
The MSRV got bumped a little bit ago, so this is just catchup.

Closes #2539
2023-07-08 18:52:42 -04:00
Andrew Gallant
f4d07b9cbd
grep-cli-0.1.8 2023-07-05 17:09:09 -04:00
Andrew Gallant
0b6eccf4d3 ci: try to fix CI 2023-07-05 14:04:29 -04:00
Andrew Gallant
3ac4541e9f regex: remove old inner literal extractor
(It had already been removed from the crate.)
2023-07-05 14:04:29 -04:00
Andrew Gallant
7b72e982f2 deps: update everything 2023-07-05 14:04:29 -04:00
Andrew Gallant
a68db3ac02 deps: drop temporary patch and move to bstr 1.6
Now that regex 1.9 is out, we can depend on it from crates.io.
2023-07-05 14:04:29 -04:00
Andrew Gallant
b12905daca deps: update everything 2023-07-05 14:04:29 -04:00
Andrew Gallant
ca740d9ace regex: add new inner literal extractor
This is mostly a copy of the prefix literal extractor in regex-syntax,
but with a tweaked notion of Seq that keeps track of whether it's a
prefix of an expression or not. If it isn't, then we can't cross it as a
suffix to another Seq.

This new extractor should be a lot more robust than the old one. We
actually will keep going through the regex to try and find the "best"
literals to search for (according to some heuristic).
2023-07-05 14:04:29 -04:00
Andrew Gallant
e80c102dee regex: tweak formatting of regex-automata version spec
This makes it easier to enable the `logging` feature for regex-automata.

I wish I could just enable it unconditionally, but it winds up producing
a lot of output because ripgrep uses regexes for things other than the
primary search (like every glob). Sigh.
2023-07-05 14:04:29 -04:00
Andrew Gallant
8ac66a9e04 regex: refactor matcher construction
This does a little bit of refactoring so that we can pass both a
ConfiguredHIR and a Regex to the inner literal extraction routine.

One downside of this approach is that a regex object hangs on to a
ConfiguredHIR. But the extra memory usage is probably negligible. A
benefit though is that converting the HIR to its concrete syntax is now
lazy and only happens when logging is enabled.
2023-07-05 14:04:29 -04:00
Andrew Gallant
04dde9a4eb regex: tweak DFA settings
This increases the limits a bit for when the regex engine will build and
use a fully compiled DFA. They can be faster in some circumstances. For
example, '(?-u)^\w{30,}$' gets a nice speed boost from state
acceleration.

We are also able to remove `regex` proper as a dependency. Wow.
2023-07-05 14:04:29 -04:00
Andrew Gallant
81341702af regex: push more pattern handling to matcher construction
Previously, ripgrep core was responsible for escaping regex patterns and
implementing the --line-regexp flag. This commit moves that
responsibility down into the matchers such that ripgrep just needs to
hand the patterns it gets off to the matcher builder. The builder will
then take care of escaping and all that.

This was done to make pattern construction completely owned by the
matcher builders. With the arrival of regex-automata, this means we can
move to the HIR very quickly and then never move back to the concrete
syntax. We can then build our regex directly from the HIR. This overall
can save quite a bit of time, especially when searching for large
dictionaries.

We still aren't quite as fast as GNU grep when searching something on
the scale of /usr/share/dict/words, but we are basically within spitting
distance. Prior to this, we were about an order of magnitude slower.

This architecture in particular lets us write a pretty simple fast path
that avoids AST parsing and HIR translation entirely: the case where one
is just searching for a literal. In that case, we can hand construct the
HIR directly.
2023-07-05 14:04:29 -04:00
Andrew Gallant
d34c5c88a7 globset: fix build error in tests
I guess we haven't been testing with the Serde feature enabled? Weird.
2023-07-05 14:04:29 -04:00
Andrew Gallant
4b8aa91ae5 deps: update to pcre2 0.2.4
0.2.4 updates to PCRE2 10.42 and has a few other nice changes. For
example, when `utf` is enabled, the crate will always set the
PCRE2_MATCH_INVALID_UTF option. That means we no longer need to do
transcoding or UTF-8 validity checks.

Because of this, we actually get to remove one of the two uses of
`unsafe` in ripgrep's `main` program.

(This also updates a couple other dependencies for convenience.)
2023-07-05 14:04:29 -04:00
Andrew Gallant
a775b493fd regex: small cleanups
Just some small polishing. We also get rid of thread_local in favor of
using regex-automata, mostly just in the name of reducing dependencies.
(We should eventually be able to drop thread_local completely.)
2023-07-05 14:04:29 -04:00
Andrew Gallant
a6dbff502f regex: s/locations/captures
Now that we use regex-automata, we no longer use any type with
"locations" in it. Instead, that's mostly legacy from the top-level
regex crate.
2023-07-05 14:04:29 -04:00
Andrew Gallant
51480d57a6 regex: simplify AST analysis a bit
The verbatim literal stuff hasn't been used for a while and I don't
foresee it being used. If it's really needed, it would probably be better
to just implement it by looking at the pattern string itself, which
avoids parsing it into an AST altogether.
2023-07-05 14:04:29 -04:00
Andrew Gallant
d9bd261be8 regex: some small cleanup in 'strip.rs'
We also utilize bstr's methods to get rid of some helpers we had written
by hand.
2023-07-05 14:04:29 -04:00
Andrew Gallant
9d62eb997a BREAKING: regex: finally remove CRLF hack
Now that Rust's regex crate finally supports a CRLF mode, we can remove
this giant hack in ripgrep to enable it. (And assuredly did not work in
all cases.)

The way this works in the regex engine is actually subtly different than
what ripgrep previously did. Namely, --crlf would previously treat
either \r\n or \n as a line terminator. But now it treats \r\n, \n and
\r as line terminators. In effect, it is implemented by treating \r and
\n as line terminators, but ^ and $ will never match at a position
between a \r and a \n.

So basically this means that $ will end up matching in more cases than
it might be intended to, but I don't expect this to be a big problem in
practice.

Note that passing --crlf to ripgrep and enabling CRLF mode in the regex
via the `R` inline flag (e.g., `(?R:$)`) are subtly different. The `R`
flag just controls the regex engine, but --crlf instructs all of ripgrep
to use \r\n as a line terminator. There are likely some inconsistencies
or corner cases that are wrong as a result of this cognitive dissonance,
but we choose to leave well enough alone for now.

Fixing this for real will probably require re-thinking how line
terminators are handled in ripgrep. For example, one "problem" with how
they're handled now is that ripgrep will re-insert its own line
terminators when printing output instead of copying the input. This is
maybe not so great and perhaps unexpected. (ripgrep probably can't get
away with not inserting any line terminators. Users probably expect
files that don't end with a line terminator whose last line matches to
have a line terminator inserted.)
2023-07-05 14:04:29 -04:00
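The CRLF rule from the commit above (9d62eb997a) can be sketched without a regex engine: `\r` and `\n` both terminate lines, but `$` never matches between a `\r` and a `\n`. This is an illustration written from the commit message, not code from the regex crate, and `crlf_line_ends` is a hypothetical name.

```rust
/// Byte offsets at which `$` may match in `haystack` under the CRLF rule:
/// at the end of input, before any `\r`, and before a `\n` that is not
/// preceded by `\r`.
fn crlf_line_ends(haystack: &[u8]) -> Vec<usize> {
    (0..=haystack.len())
        .filter(|&at| {
            at == haystack.len()
                || haystack[at] == b'\r'
                || (haystack[at] == b'\n' && (at == 0 || haystack[at - 1] != b'\r'))
        })
        .collect()
}

fn main() {
    // In "a\r\nb", offset 2 sits between `\r` and `\n` and is excluded.
    println!("{:?}", crlf_line_ends(b"a\r\nb"));
    // A lone `\r` counts as a terminator too, which the old hack did not do.
    println!("{:?}", crlf_line_ends(b"a\rb"));
}
```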
Andrew Gallant
e028ea3792 regex: migrate grep-regex to regex-automata
We just do a "basic" dumb migration. We don't try to improve anything
here.
2023-07-05 14:04:29 -04:00
Andrew Gallant
1035f6b1ff deps: initial migration steps to regex 1.9
This leaves the grep-regex crate in tatters. Pretty much the entire
thing needs to be re-worked. The upshot is that it should result in some
big simplifications. I hope.

The idea here is to drop down and actually use regex-automata 0.3
instead of the regex crate itself.
2023-07-05 14:04:29 -04:00
Andrew Gallant
a7f1276021
readme: update Debian instructions
We probably don't need to mention Buster specifically nor Debian
unstable since ripgrep has been in Debian for a while now.

But we can't just get rid of the `deb` file either, because Debian might
package a very old version.

Fixes #2531
2023-06-12 07:50:13 -04:00
Martin Nordholts
4fcb1b2202
cli: replace atty with std::io::IsTerminal
The `atty` crate is unmaintained[1] and `std::io::IsTerminal` was
stabilized in Rust 1.70.

[1]: https://rustsec.org/advisories/RUSTSEC-2021-0145.html

PR #2526
2023-06-05 14:00:46 -04:00
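For reference, the std replacement mentioned above (4fcb1b2202) looks like this in use. `color_for` is a hypothetical helper added for illustration; only `std::io::IsTerminal::is_terminal` comes from the standard library (Rust 1.70 or newer).

```rust
use std::io::{self, IsTerminal};

/// Hypothetical policy: colorize only when writing to a terminal.
fn color_for(is_tty: bool) -> &'static str {
    if is_tty {
        "ansi"
    } else {
        "never"
    }
}

fn main() {
    // Replaces the old `atty::is(atty::Stream::Stdout)` style of check.
    let is_tty = io::stdout().is_terminal();
    println!("stdout is a terminal: {is_tty}; color: {}", color_for(is_tty));
}
```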
Francois Marier
949092fd22
ignore/types: add 'mdwn' to Markdown
PR #2520
2023-05-26 14:44:41 -04:00
Andrew Gallant
4a7e7094ad
deps: update everything else 2023-05-25 13:06:13 -04:00
Andrew Gallant
fc0d9b90a9
deps: bump regex to 1.8.3
This brings in an update from the regex crate that fixes a matching bug
for particular kinds of alternations of literals.

Fixes #2518
2023-05-25 13:06:13 -04:00
Ville Skyttä
335aa4937a
ignore/types: add *.pyi for Python
https://peps.python.org/pep-0484/#stub-files

PR #2517
2023-05-23 07:10:02 -04:00
Adam Reichold
803c447845
searcher: re-enable mmap on 32-bit architectures
memmap2 v0.3.0 introduced a regression when trying to map files larger than 4GB
on 32-bit architectures[1] which was subsequently fixed in v0.3.1[2].

This commit bumps locked version of the memmap2 dependency to the current v0.5.0
and reverts fdfc418be55ff91e0c2efad6a3e27db054cb5534 to re-enable mmap on 32-bit
architectures as a different approach to fixing [3].

This was tested to report matches from the end of a 5GB file using MinGW and Wine.

Ref #1911, PR #2000 

[1] 5e271224c8
[2] 9aa838aed9
[3] https://github.com/BurntSushi/ripgrep/issues/1911
2023-05-19 08:23:53 -04:00
Andrew Gallant
c5415adbe8
deps: update everything
This does unfortunately bring in both regex-syntax 0.6 and 0.7, but
we'll fix that once regex 1.9 is out.
2023-05-16 13:14:23 -04:00
Andrew Gallant
251376597f
deps: update minimum version of grep crate
Ref #2516
2023-05-16 13:13:34 -04:00
Andrew Gallant
e593f5b7ee
grep-0.2.12 2023-05-16 13:12:45 -04:00
Andrew Gallant
6b19be2477
crates/grep: remove 'deny(missing_docs)'
This crate is only a shim over a bunch of other crates. I'm not sure
that there's anything to add to each of the `pub extern` items. So
instead of just writing fluff, I removed the lint.

Fixes #2516
2023-05-16 13:10:42 -04:00
Ryan Whitehouse
041544853c
doc: fix --quiet docs
The wording was previously inverted, which had the opposite
meaning as was intended.

Fixes #1962
2023-03-28 07:22:59 -04:00
Manu
a7ae9e4043
ignore/types: add support for docker-compose files
Default file is docker-compose.yml and the documentation
mentions overrides in the form of docker-compose.*.yml.

PR #2469
2023-03-21 12:56:38 -04:00
Andrew Gallant
595e7845b8
readme: add a link to delta's support for ripgrep
Ref: https://github.com/BurntSushi/ripgrep/issues/86#issuecomment-1469717706
2023-03-15 08:02:04 -04:00
David Ringo
44fb9fce2c
ignore/types: add *.sln for msbuild
.sln is the extension for Visual Studio Project Solution files, one of
the file types accepted as inputs by MSBuild.

PR #2415
2023-02-09 21:20:49 -05:00
Vincent Bockaert
339c46a6ed
ignore/types: enhance terraform default filter
The default filter for terraform only checks for *.tf files, but there
are quite a few other terraform filetypes.

The explanation for all of them can be found below (including link to
documentation from Hashicorp at time of writing)

- *.tf.json & *.tfvars.json is to capture the files written in
  JSON-based variant of the Terraform language
    - https://developer.hashicorp.com/terraform/language/files
- *.tfvars is used to supply variables
    - https://developer.hashicorp.com/terraform/cloud-docs/workspaces/variables#6-auto-tfvars-variable-files
- .terraform.lock.hcl is used as a Dependency lock file
    - https://developer.hashicorp.com/terraform/language/files/dependency-lock
- terraform.rc & .terraformrc, *.tfrc
    - https://developer.hashicorp.com/terraform/cli/config/config-file

PR #2412
2023-02-09 12:57:01 -05:00
Andrew Gallant
fe97c0a152
ignore-0.4.20 2023-01-15 08:21:02 -05:00
Christian Vallentin
826f3fad5b
ignore/api: add Clone and Debug impls for OverrideBuilder
PR #2397
2023-01-15 08:16:27 -05:00
Andrew Gallant
bc55049327
readme: update MSRV in README
... this was apparently long outdated, wow.
2023-01-05 12:09:46 -05:00
Andrew Gallant
d58e9353fc
deps: update to grep 0.2.11 2023-01-05 09:13:47 -05:00
Andrew Gallant
ca60fef4db
grep-0.2.11 2023-01-05 09:12:49 -05:00
Andrew Gallant
a25307d6c8
deps: update to grep-printer 0.1.7 2023-01-05 09:12:37 -05:00
Andrew Gallant
b80947a8b3
grep-printer-0.1.7 2023-01-05 09:11:16 -05:00
Andrew Gallant
ad793a0d8f
deps: update to grep-searcher 0.1.11 2023-01-05 09:07:49 -05:00
Andrew Gallant
120e55e7c7
grep-searcher-0.1.11 2023-01-05 09:07:09 -05:00
Andrew Gallant
3941a7701d
deps: update to grep-pcre2 0.1.6 2023-01-05 09:06:52 -05:00
Andrew Gallant
96e130fbf9
grep-pcre2-0.1.6 2023-01-05 09:05:59 -05:00
Andrew Gallant
180c4eaf8b
deps: update to grep-regex 0.1.11 2023-01-05 09:05:39 -05:00
Andrew Gallant
81529288cf
grep-regex-0.1.11 2023-01-05 09:02:55 -05:00
Andrew Gallant
bcc7473a87
deps: update to grep-matcher 0.1.6 2023-01-05 09:02:40 -05:00
Andrew Gallant
bc78c644db
grep-matcher-0.1.6 2023-01-05 09:00:33 -05:00
Andrew Gallant
dc7267a0fb
deps: update to grep-cli 0.1.7 2023-01-05 08:58:47 -05:00
Andrew Gallant
3224324e25
grep-cli-0.1.7 2023-01-05 08:57:31 -05:00
Andrew Gallant
0f61f08eb1
deps: update to ignore 0.4.19 2023-01-05 08:57:05 -05:00
Andrew Gallant
a0e8dbe9df
ignore-0.4.19 2023-01-05 08:55:46 -05:00
Andrew Gallant
e95254a86f
deps: remove ignore's dependency on crossbeam-utils
Scoped threads are now part of std.
2023-01-05 08:51:08 -05:00
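A small sketch of the std scoped-thread API that the commit above (e95254a86f) refers to. `parallel_sum` is a hypothetical example and unrelated to the `ignore` crate's walker; it only shows that scoped threads may borrow from the caller's stack.

```rust
use std::thread;

/// Sum `data` by handing each chunk to its own scoped thread.
fn parallel_sum(data: &[u64], chunk_size: usize) -> u64 {
    thread::scope(|s| {
        let handles: Vec<_> = data
            // `chunks(0)` would panic, so clamp to at least 1.
            .chunks(chunk_size.max(1))
            // Each thread borrows its chunk directly; no `Arc` or cloning.
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    println!("{}", parallel_sum(&data, 25));
}
```

All spawned threads are joined before `thread::scope` returns, which is what makes the borrows sound.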
Andrew Gallant
2f484d8ce5
deps: update to globset 0.4.10 2023-01-05 08:49:58 -05:00
Andrew Gallant
364772ddd2
globset-0.4.10 2023-01-05 08:45:47 -05:00
Andrew Gallant
2e207833bc
deps: upgrade to jemallocator 0.5 2023-01-05 08:33:43 -05:00
Andrew Gallant
92b35a65f8
deps: upgrade to base64 0.20 2023-01-05 08:21:49 -05:00
Andrew Gallant
ac8fecbbf2
deps: upgrade bstr to 1.1 2023-01-05 08:21:15 -05:00
Andrew Gallant
8596817374
deps: do semver compatible upgrades 2023-01-05 08:16:32 -05:00
Andrew Gallant
28bff84a0a
deps: remove 'num_cpus'
Now that std::thread::available_parallelism is a thing, we no longer
need num_cpus.
2023-01-05 08:15:09 -05:00
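A sketch of how the std API named above (28bff84a0a) can stand in for `num_cpus::get()`. `default_threads` and the fallback of 1 are assumptions made for this example, not ripgrep's actual thread heuristic.

```rust
use std::thread;

/// Pick a default worker count, falling back to 1 when the platform
/// cannot report its parallelism.
fn default_threads() -> usize {
    thread::available_parallelism()
        // The API returns `io::Result<NonZeroUsize>`.
        .map(|n| n.get())
        .unwrap_or(1)
}

fn main() {
    println!("using {} threads", default_threads());
}
```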
Alex Touchet
61101289fa
cargo: set rust-version
This should hopefully make compilation errors from using
an older-than-supported compiler more helpful.

PR #2373
2022-12-21 07:37:09 -05:00
Andrew Gallant
13faa39b66
deps: update all dependencies within semver
Note that this adds a new dependency, 'unicode-ident', and removes
'unicode-xid'. I looked briefly at 'unicode-ident' and all looks okay.
It is also permissively licensed.
2022-12-20 09:23:29 -05:00
Andrew Gallant
6b61271bbb
benchsuite/runs: add another run of the benchmarks
Looks like ripgrep is still the king. ;-)
2022-12-16 11:24:10 -05:00
Andrew Gallant
1be86392e0
benchsuite: pass '-a' to ugrep in some cases
It looks like it incorrectly treats a file that is purely valid UTF-8 as
a binary file, which in turn effectively renders all of the Russian
subtitle benchmarks moot for ugrep. So we pass '-a' to force ugrep to
treat the file as text.

This technically gives ugrep an edge because it now no longer needs to
look to see if the haystack is binary or not. In practice this is
usually implemented using highly optimized SIMD routines (e.g.,
'memchr'), so it tends not to matter much. We might also consider
passing '-a' to all grep commands. But... I think using '-a' is the less
common case and we should try to benchmark the common case.
2022-12-16 11:21:58 -05:00
Andrew Gallant
63058453fa
benchsuite: update URLs
This removes the old commented out URLs for the 2016 subtitles that
don't work any more. I should probably upload the files to a more stable
URL.

This also switches to a 'https://' GitHub URL as I believe the 'git://'
URLs are no longer supported.
2022-12-16 11:20:45 -05:00
Armin Brauns
7f23cd63a5
ignore/types: add automated test for sortedness
People occasionally get this wrong and I've been manually
checking it. Instead, let's have CI do it automatically.

PR #2351
2022-11-14 08:31:07 -05:00
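A sketch of the kind of sortedness check the commit above (7f23cd63a5) describes. The `TYPES` table is a hypothetical three-entry excerpt built from globs mentioned elsewhere in this log, and `first_unsorted` is not the test that landed in the `ignore` crate.

```rust
/// Hypothetical (name, globs) table standing in for the default type list.
const TYPES: &[(&str, &[&str])] = &[
    ("ada", &["*.adb", "*.ads"]),
    ("graphql", &["*.graphql", "*.graphqls"]),
    ("v", &["*.v", "*.vsh"]),
];

/// Return the first adjacent pair of names that is out of order, if any.
fn first_unsorted<'a>(types: &[(&'a str, &[&str])]) -> Option<(&'a str, &'a str)> {
    types
        .windows(2)
        .find(|w| w[0].0 > w[1].0)
        .map(|w| (w[0].0, w[1].0))
}

fn main() {
    match first_unsorted(TYPES) {
        None => println!("type list is sorted"),
        Some((a, b)) => println!("out of order: {a} comes before {b}"),
    }
}
```

Reporting the offending pair, rather than a bare failure, keeps the CI message actionable for contributors.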
Andrew Gallant
8905d54a9f
msrv: bump to Rust 1.65.0
This matches the latest stable release of Rust and lets us use nice
things like 'let else'.
2022-11-14 07:56:17 -05:00
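For readers unfamiliar with the `let else` construct that the MSRV bump above (8905d54a9f) makes available, a minimal example. `parse_pair` is a hypothetical function and does not come from ripgrep.

```rust
/// Parse a `key=value` line, returning `None` for anything malformed.
fn parse_pair(line: &str) -> Option<(&str, &str)> {
    // `let ... else` binds on success and must diverge otherwise.
    let Some((key, value)) = line.split_once('=') else {
        return None;
    };
    Some((key.trim(), value.trim()))
}

fn main() {
    println!("{:?}", parse_pair("threads = 4"));
    println!("{:?}", parse_pair("no separator here"));
}
```

The benefit over `match` or `if let` is that the happy path stays unindented while the failure branch is handled right where the binding happens.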
Armin Brauns
25a4eaf5ae
ignore/types: add devicetree filetype
See: https://www.devicetree.org/

PR #2349
2022-11-14 07:42:57 -05:00
jgart
0000157917
readme: add guix installation instructions
PR #2344
2022-11-02 08:10:54 -04:00
jgart
65b1b0e38a
ignore/types: add carp
See: https://github.com/carp-lang/Carp

PR #2343
2022-11-01 07:17:00 -04:00
Glenn Slotte
c032cda4b7
ignore/types: add ReScript and ReasonML
PR #2340
2022-10-29 13:49:19 -04:00
Marcin Nowak-Liebiediew
eab044d829
ignore/types: add motoko and candid
See: https://github.com/dfinity/candid
See: https://github.com/dfinity/motoko

PR #2335
2022-10-20 09:22:41 -04:00
Andrew Gallant
55e62a4411
readme: add more links to overview
Many of the features are documented in the GUIDE, so let's just link to
them.
2022-10-19 11:06:44 -04:00
Andrew Gallant
5b2f614aad
readme: add note about 'rg -uuu'
I'm not sure about putting this in such a prominent spot, and it does
bloat the introductory paragraph a bit, but it seems like an important
special case.
2022-10-19 09:52:37 -04:00
dependabot[bot]
4386b8e805
ci: bump actions/checkout from 2 to 3 (#2318)
Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v2...v3)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-29 08:18:47 -04:00
dependabot[bot]
6b012d8129
ci: bump actions/upload-release-asset from 1.0.1 to 1.0.2 (#2317)
Bumps [actions/upload-release-asset](https://github.com/actions/upload-release-asset) from 1.0.1 to 1.0.2.
- [Release notes](https://github.com/actions/upload-release-asset/releases)
- [Commits](https://github.com/actions/upload-release-asset/compare/v1.0.1...v1.0.2)

---
updated-dependencies:
- dependency-name: actions/upload-release-asset
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-29 08:15:36 -04:00
LingMan
a928ca4221
ci: enable Dependabot for the Actions workflows
Dependabot automatically files PRs for updatable dependencies. As
configured it watches all workflow files in `.github/workflows` for
possible updates to any of the Actions depended upon.

We specifically do not enable Dependabot for other things, in order to
avoid running in a hamster wheel.

Closes #2315
2022-09-29 07:44:30 -04:00
LingMan
d1570defbf
ci: remove fetch-depth parameter from the checkout action
It is already set to 1 by default.

Closes #2316
2022-09-29 07:44:19 -04:00
LingMan
b732c23e36
ci: use cargo fmt's --check option directly 2022-09-29 07:44:13 -04:00
LingMan
49965703fa
ci: switch to using '@master' dtolnay action
The `v1` tag exists but isn't really supported.

This mirrors [1]. See also [2].

[1]: 50086e74da
[2]: https://github.com/BurntSushi/bstr/pull/122#issuecomment-1201930916
2022-09-29 07:43:29 -04:00
LingMan
609838aebd
ci: use latest runner images in CI
The `ubuntu-18.04` image is deprecated and will be removed by
2023-04-01[1][2] with scheduled brownouts starting on 2022-10-03.
Update all images to the latest available versions.

[1]: https://github.blog/changelog/2022-08-09-github-actions-the-ubuntu-18-04-actions-runner-image-is-being-deprecated-and-will-be-removed-by-12-1-22/
[2]: https://github.com/actions/runner-images/issues/6002
2022-09-29 07:43:10 -04:00
Dave Rolsky
515f120b5c
doc: fix typo
PR #2313
2022-09-24 13:23:59 -04:00
Linda_pp
a66315d232
ignore/types: add *.cjs, *.mjs, *.cts, *.mts
These are used by both Node.js and TypeScript to indicate that a file
is CommonJS or ES.

Node.js: https://nodejs.org/api/esm.html

TypeScript: https://www.typescriptlang.org/docs/handbook/esm-node.html#new-file-extensions

PR #2297
2022-08-31 08:11:13 -04:00
Nacho Barrientos
bdf10ab7c0
ignore/types: add embedded puppet templates
.epp files are getting more and more common in Puppet code bases, so I
think it makes sense to include them as part of the "puppet" type.

https://puppet.com/docs/puppet/7/lang_template_epp.html

PR #2141
2022-08-21 12:32:03 -04:00
John Saigle
a02678800b
ignore/types: add Solidity
See: https://soliditylang.org/about/

PR #2284
2022-08-17 09:37:32 -04:00
Andrew Gallant
387df97d85
ripgrep: add /.github/ to whitelist
It's pretty common to want to search this, since it defines the CI
configuration of the project.
2022-08-17 08:31:22 -04:00
David Marzal
a9d97a1dda
doc: add '-.' as short flag for '--hidden'
PR #2279
2022-08-10 08:03:04 -04:00
drebelsky
3bb71b0cb8
doc: fix a few typos
PR #2274
2022-08-06 14:29:27 -04:00
Malte
87b33c96c0
ignore/types: improve 'markdown' and 'php' types
This adds some lesser known extensions.

Notably, it adds php7 and php8, but not php6. Apparently,
php6 was never a thing: https://wiki.php.net/rfc/php6

PR #2263
2022-07-18 10:35:09 -04:00
Andrew Gallant
5e975c43f8
doc: appease rustdoc 2022-07-15 10:13:55 -04:00
Andrew Gallant
7efa2e46d3
grep-0.2.10 2022-07-15 10:06:53 -04:00
Andrew Gallant
db0b92b62d
grep: bump grep-searcher to 0.1.10
This was a result of leaving a stray 'dbg!'.
2022-07-15 10:06:31 -04:00
Andrew Gallant
33b81cac48
grep-searcher-0.1.10 2022-07-15 10:05:46 -04:00
Andrew Gallant
6a13a4f64d
searcher: remove stray 'dbg!' 2022-07-15 10:05:20 -04:00
Andrew Gallant
b13d835d95
grep-0.2.9 2022-07-15 10:03:06 -04:00
Andrew Gallant
d53506b7f7
grep: bump 'grep-regex' and 'grep-searcher'
To 0.1.10 and 0.1.9, respectively.
2022-07-15 10:02:41 -04:00
Andrew Gallant
78a35d4d43
grep-searcher-0.1.9 2022-07-15 10:02:24 -04:00
Andrew Gallant
a933d0bc90
searcher: bump grep-regex dep to 0.1.10 2022-07-15 10:02:06 -04:00
Andrew Gallant
2cae30e399
grep-regex-0.1.10 2022-07-15 10:01:42 -04:00
Andrew Gallant
8e57989cd2
regex: fix matching bug when text anchors are used
It turns out that if there are text anchors (that is, \A or \z, or ^/$
when multi-line is disabled), then the "fast" line searching path isn't
quite correct. Since searching without multi-line mode is exceptionally
rare, we just look for the presence of text anchors and specifically
disable the line terminator option in 'grep-regex'. This in turn
inhibits the "fast" line searching path.

Fixes #2260
2022-07-15 09:53:39 -04:00
Andrew Gallant
b9f5835534 ci: switch to dtolnay/rust-toolchain
The actions-rs/toolchain project appears dead. dtolnay's also seems more
sustainable given its simplicity, but it does enough to suit our needs.
2022-07-14 13:48:14 -04:00
tleb
e70778e89d
ignore/types: add dts to default types
See: https://devicetree-specification.readthedocs.io/en/v0.3/source-language.html

PR #2255
2022-07-07 12:24:12 -04:00
zhimoe
87c4a2b4b1
doc: fix typo
PR #2248
2022-06-26 18:49:54 -04:00
Kian-Meng Ang
0aa31676e3
doc: fix typos
PR #2245
2022-06-24 09:58:20 -04:00
Andrew Gallant
9f0e88bcb1
ignore: fix gitignore parsing bug for trailing \/
When a glob pattern ended with a \/, and since we permit backslash
escapes, the glob parser gave a "dangling escape" error. Which is weird,
because the \ is clearly not dangling.

The issue is that the layer above the glob parser, the gitignore parser,
was stripping the trailing / so that it wouldn't be part of the matching
logic. Of course, stripping the trailing / while it is escaped without
removing the backslash escape is wrong. So we do that here.

Fixes #2236
2022-06-14 10:40:37 -04:00
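
The fix described in this commit message can be sketched as follows. This is a simplified illustration under the assumption that the trailing slash only marks a directory-only pattern; `strip_dir_suffix` is a hypothetical name and not the function used in the `ignore` crate.

```rust
/// Splits a gitignore-style pattern into the glob to compile and a flag
/// saying whether the pattern should only match directories.
///
/// Hypothetical helper for illustration only.
fn strip_dir_suffix(pattern: &str) -> (&str, bool) {
    let Some(rest) = pattern.strip_suffix('/') else {
        return (pattern, false);
    };
    // If the slash was escaped, drop the backslash as well. Leaving it
    // behind would hand the glob parser a pattern ending in a lone `\`,
    // which it reports as a dangling escape.
    let rest = rest.strip_suffix('\\').unwrap_or(rest);
    (rest, true)
}
```

With this in place, both `foo/` and `foo\/` yield the glob `foo` with the directory-only flag set, while `foo` is returned unchanged.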
Alex Touchet
eb4b389846
globset/readme: update version number and some links
PR #2232
2022-06-11 14:17:32 -04:00
Andrew Gallant
dc337bab0a
deps: update to globset 0.4.9 2022-06-10 14:11:20 -04:00
148 changed files with 21356 additions and 10923 deletions

@ -6,3 +6,16 @@
rustflags = ["-C", "target-feature=+crt-static"]
[target.i686-pc-windows-msvc]
rustflags = ["-C", "target-feature=+crt-static"]
# Do the same for MUSL targets. At the time of writing (2023-10-23), this is
# the default. But the plan is for the default to change to dynamic linking.
# The whole point of MUSL with respect to ripgrep is to create a fully
# statically linked executable.
#
# See: https://github.com/rust-lang/compiler-team/issues/422
# See: https://github.com/rust-lang/compiler-team/issues/422#issuecomment-812135847
[target.x86_64-unknown-linux-musl]
rustflags = [
"-C", "target-feature=+crt-static",
"-C", "link-self-contained=yes",
]
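
One way to confirm that the `crt-static` flag above took effect is to ask the compiler at build time. The sketch below is an assumption about how one might check this and is not part of ripgrep; `crt_static_enabled` is a hypothetical name.

```rust
/// Reports whether this code was compiled with the `crt-static` target
/// feature, i.e. with the C runtime linked statically.
fn crt_static_enabled() -> bool {
    // `cfg!` is evaluated at compile time, so this reflects the rustflags
    // in effect for the build, not anything detected at run time.
    cfg!(target_feature = "crt-static")
}
```

Printing this value from a small binary built for `x86_64-unknown-linux-musl` would show whether the setting was picked up from the Cargo configuration.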

1
.github/FUNDING.yml vendored Normal file
@ -0,0 +1 @@
github: [BurntSushi]

@ -1,55 +0,0 @@
---
name: Bug report
about: An issue with ripgrep or any of its crates (ignore, globset, etc.)
title: ''
labels: ''
assignees: ''
---
#### What version of ripgrep are you using?
Replace this text with the output of `rg --version`.
#### How did you install ripgrep?
If you installed ripgrep with snap and are getting strange file permission or
file not found errors, then please do not file a bug. Instead, use one of the
Github binary releases.
#### What operating system are you using ripgrep on?
Replace this text with your operating system and version.
#### Describe your bug.
Give a high level description of the bug.
#### What are the steps to reproduce the behavior?
If possible, please include both your search patterns and the corpus on which
you are searching. Unless the bug is very obvious, then it is unlikely that it
will be fixed if the ripgrep maintainers cannot reproduce it.
If the corpus is too big and you cannot decrease its size, file the bug anyway
and the ripgrep maintainers will help figure out next steps.
#### What is the actual behavior?
Show the command you ran and the actual output. Include the `--debug` flag in
your invocation of ripgrep.
If the output is large, put it in a gist: https://gist.github.com/
If the output is small, put it in code fences:
```
your
output
goes
here
```
#### What is the expected behavior?
What do you think ripgrep should have done?

101
.github/ISSUE_TEMPLATE/bug_report.yml vendored Normal file
@ -0,0 +1,101 @@
name: Bug Report
description: An issue with ripgrep or any of its crates (ignore, globset, etc.).
body:
- type: markdown
attributes:
value: |
Please review the following common issues before filing a bug. You may also be interested in reading the [FAQ](https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md)
and the [user guide](https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md).
* Unable to search for text with leading dash/hyphen: This is not a bug. Use `rg -- -mytext` or `rg -e -mytext`. See #102, #215, #624.
* Unable to build with old version of Rust. This is not a bug. ripgrep tracks the latest stable release of Rust. See #1019, #1433, #2534.
* ripgrep package is broken or out of date. ripgrep's author does not maintain packages for Red Hat, Ubuntu, Arch, Homebrew, WinGet, etc. If you have an issue with one of these, please contact your package maintainer. See #1637, #2264, #2459.
- type: checkboxes
id: issue-not-common
attributes:
label: Please tick this box to confirm you have reviewed the above.
options:
- label: I have a different issue.
required: true
- type: textarea
id: ripgrep-version
attributes:
label: What version of ripgrep are you using?
description: Enter the output of `rg --version`.
placeholder: ex. ripgrep 0.2.1
validations:
required: true
- type: textarea
id: install-method
attributes:
label: How did you install ripgrep?
description: |
If you installed ripgrep with snap and are getting strange file permission or file not found errors, then please do not file a bug. Instead, use one of the GitHub binary releases.
Please report any other issues with downstream ripgrep packages to their respective maintainers as mentioned above.
placeholder: ex. Cargo, APT, Homebrew
validations:
required: true
- type: textarea
id: operating-system
attributes:
label: What operating system are you using ripgrep on?
description: Enter the name and version of your operating system.
placeholder: ex. Debian 12.0, macOS 13.4.1
validations:
required: true
- type: textarea
id: description
attributes:
label: Describe your bug.
description: Give a high level description of the bug.
placeholder: ex. ripgrep fails to return the expected matches when...
validations:
required: true
- type: textarea
id: steps-to-reproduce
attributes:
label: What are the steps to reproduce the behavior?
description: |
If possible, please include both your search patterns and the corpus on which you are searching. Unless the bug is very obvious, then it is unlikely that it will be fixed if the ripgrep maintainers cannot reproduce it.
If the corpus is too big and you cannot decrease its size, file the bug anyway and the ripgrep maintainers will help figure out next steps.
placeholder: >
ex. Run `rg bar` in a directory containing a file with the lines 'bar' and 'barbaz'
validations:
required: true
- type: textarea
id: actual-behavior
attributes:
label: What is the actual behavior?
description: |
Show the command you ran and the actual output. **Include the `--debug` flag in your invocation of ripgrep.**
If the output is large, put it in a gist: <https://gist.github.com/>
If the output is small, put it in code fences (see placeholder text).
placeholder: |
ex.
```
$ rg --debug bar
DEBUG|grep_regex::literal|crates/regex/src/literal.rs:58: literal prefixes detected: Literals { lits: [Complete(bar)], limit_size: 250, limit_class: 10 }
...
```
validations:
required: true
- type: textarea
id: expected-behavior
attributes:
label: What is the expected behavior?
description: What do you think ripgrep should have done?
placeholder: ex. ripgrep should have returned 2 matches
validations:
required: true

@ -6,6 +6,27 @@ on:
- master
schedule:
- cron: '00 01 * * *'
# The section is needed to drop write-all permissions that are granted on
# `schedule` event. By specifying any permission explicitly all others are set
# to none. By using the principle of least privilege the damage a compromised
# workflow can do (because of an injection or compromised third party tool or
# action) is restricted. Currently the workflow doesn't need any additional
# permission except for pulling the code. Adding labels to issues, commenting
# on pull-requests, etc. may need additional permissions:
#
# Syntax for this section:
# https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions
#
# Reference for how to assign permissions on a job-by-job basis:
# https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs
#
# Reference for available permissions that we can enable if needed:
# https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token
permissions:
# to fetch code (actions/checkout)
contents: read
jobs:
test:
name: test
@ -14,100 +35,101 @@ jobs:
# systems.
CARGO: cargo
# When CARGO is set to CROSS, this is set to `--target matrix.target`.
# Note that we only use cross on Linux, so setting a target on a
# different OS will just use normal cargo.
TARGET_FLAGS:
# When CARGO is set to CROSS, TARGET_DIR includes matrix.target.
TARGET_DIR: ./target
# Bump this as appropriate. We pin to a version to make sure CI
# continues to work as cross releases in the past have broken things
# in subtle ways.
CROSS_VERSION: v0.2.5
# Emit backtraces on panics.
RUST_BACKTRACE: 1
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
build:
# We test ripgrep on a pinned version of Rust, along with the moving
# targets of 'stable' and 'beta' for good measure.
- pinned
- stable
- beta
# Our release builds are generated by a nightly compiler to take
# advantage of the latest optimizations/compile time improvements. So
# we test all of them here. (We don't do mips releases, but test on
# mips for big-endian coverage.)
- nightly
- nightly-musl
- nightly-32
- nightly-mips
- nightly-arm
- macos
- win-msvc
- win-gnu
include:
- build: pinned
os: ubuntu-18.04
rust: 1.52.1
os: ubuntu-latest
rust: 1.74.0
- build: stable
os: ubuntu-18.04
os: ubuntu-latest
rust: stable
- build: beta
os: ubuntu-18.04
os: ubuntu-latest
rust: beta
- build: nightly
os: ubuntu-18.04
rust: nightly
- build: nightly-musl
os: ubuntu-18.04
os: ubuntu-latest
rust: nightly
- build: stable-musl
os: ubuntu-latest
rust: stable
target: x86_64-unknown-linux-musl
- build: nightly-32
os: ubuntu-18.04
rust: nightly
- build: stable-x86
os: ubuntu-latest
rust: stable
target: i686-unknown-linux-gnu
- build: nightly-mips
os: ubuntu-18.04
rust: nightly
target: mips64-unknown-linux-gnuabi64
- build: nightly-arm
os: ubuntu-18.04
rust: nightly
# For stripping release binaries:
# docker run --rm -v $PWD/target:/target:Z \
# rustembedded/cross:arm-unknown-linux-gnueabihf \
# arm-linux-gnueabihf-strip \
# /target/arm-unknown-linux-gnueabihf/debug/rg
target: arm-unknown-linux-gnueabihf
- build: stable-aarch64
os: ubuntu-latest
rust: stable
target: aarch64-unknown-linux-gnu
- build: stable-arm-gnueabihf
os: ubuntu-latest
rust: stable
target: armv7-unknown-linux-gnueabihf
- build: stable-arm-musleabihf
os: ubuntu-latest
rust: stable
target: armv7-unknown-linux-musleabihf
- build: stable-arm-musleabi
os: ubuntu-latest
rust: stable
target: armv7-unknown-linux-musleabi
- build: stable-powerpc64
os: ubuntu-latest
rust: stable
target: powerpc64-unknown-linux-gnu
- build: stable-s390x
os: ubuntu-latest
rust: stable
target: s390x-unknown-linux-gnu
- build: macos
os: macos-latest
rust: nightly
- build: win-msvc
os: windows-2019
os: windows-2022
rust: nightly
- build: win-gnu
os: windows-2019
os: windows-2022
rust: nightly-x86_64-gnu
steps:
- name: Checkout repository
uses: actions/checkout@v2
uses: actions/checkout@v4
- name: Install packages (Ubuntu)
if: matrix.os == 'ubuntu-18.04'
if: matrix.os == 'ubuntu-latest'
run: |
ci/ubuntu-install-packages
- name: Install packages (macOS)
if: matrix.os == 'macos-latest'
run: |
ci/macos-install-packages
- name: Install Rust
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ matrix.rust }}
profile: minimal
override: true
- name: Use Cross
if: matrix.target != ''
if: matrix.os == 'ubuntu-latest' && matrix.target != ''
run: |
cargo install cross
# In the past, new releases of 'cross' have broken CI. So for now, we
# pin it. We also use their pre-compiled binary releases because cross
# has over 100 dependencies and takes a bit to compile.
dir="$RUNNER_TEMP/cross-download"
mkdir "$dir"
echo "$dir" >> $GITHUB_PATH
cd "$dir"
curl -LO "https://github.com/cross-rs/cross/releases/download/$CROSS_VERSION/cross-x86_64-unknown-linux-musl.tar.gz"
tar xf cross-x86_64-unknown-linux-musl.tar.gz
echo "CARGO=cross" >> $GITHUB_ENV
echo "TARGET_FLAGS=--target ${{ matrix.target }}" >> $GITHUB_ENV
echo "TARGET_DIR=./target/${{ matrix.target }}" >> $GITHUB_ENV
@ -116,6 +138,7 @@ jobs:
run: |
echo "cargo command is: ${{ env.CARGO }}"
echo "target flag is: ${{ env.TARGET_FLAGS }}"
echo "target dir is: ${{ env.TARGET_DIR }}"
- name: Build ripgrep and all crates
run: ${{ env.CARGO }} build --verbose --workspace ${{ env.TARGET_FLAGS }}
@ -149,64 +172,60 @@ jobs:
if: matrix.target != ''
run: ${{ env.CARGO }} test --verbose --workspace ${{ env.TARGET_FLAGS }}
- name: Test for existence of build artifacts (Windows)
if: matrix.os == 'windows-2019'
shell: bash
run: |
outdir="$(ci/cargo-out-dir "${{ env.TARGET_DIR }}")"
ls "$outdir/_rg.ps1" && file "$outdir/_rg.ps1"
- name: Test for existence of build artifacts (Unix)
if: matrix.os != 'windows-2019'
shell: bash
run: |
outdir="$(ci/cargo-out-dir "${{ env.TARGET_DIR }}")"
# TODO: Check for the man page generation here. For whatever reason,
# it seems to be intermittently failing in CI. No idea why.
# for f in rg.bash rg.fish rg.1; do
for f in rg.bash rg.fish; do
# We could use file -E here, but it isn't supported on macOS.
ls "$outdir/$f" && file "$outdir/$f"
done
- name: Test zsh shell completions (Unix, sans cross)
# We could test this when using Cross, but we'd have to execute the
# 'rg' binary (done in test-complete) with qemu, which is a pain and
# doesn't really gain us much. If shell completion works in one place,
# it probably works everywhere.
if: matrix.target == '' && matrix.os != 'windows-2019'
if: matrix.target == '' && matrix.os != 'windows-2022'
shell: bash
run: ci/test-complete
rustfmt:
name: rustfmt
runs-on: ubuntu-18.04
- name: Print hostname detected by grep-cli crate
shell: bash
run: ${{ env.CARGO }} test --manifest-path crates/cli/Cargo.toml ${{ env.TARGET_FLAGS }} --lib print_hostname -- --nocapture
- name: Print available short flags
shell: bash
run: ${{ env.CARGO }} test --bin rg ${{ env.TARGET_FLAGS }} flags::defs::tests::available_shorts -- --nocapture
# Setup and compile on the wasm32-wasip1 target
wasm:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v2
uses: actions/checkout@v4
- name: Install Rust
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
- name: Add wasm32-wasip1 target
run: rustup target add wasm32-wasip1
- name: Basic build
run: cargo build --verbose
rustfmt:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Install Rust
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
override: true
profile: minimal
components: rustfmt
- name: Check formatting
run: |
cargo fmt --all -- --check
run: cargo fmt --all --check
docs:
name: Docs
runs-on: ubuntu-20.04
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v2
uses: actions/checkout@v4
- name: Install Rust
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
profile: minimal
override: true
- name: Check documentation
env:
RUSTDOCFLAGS: -D warnings

@ -1,54 +1,43 @@
# The way this works is the following:
#
# The create-release job runs purely to initialize the GitHub release itself
# and to output upload_url for the following job.
#
# The build-release job runs only once create-release is finished. It gets the
# release upload URL from create-release job outputs, then builds the release
# executables for each supported platform and attaches them as release assets
# to the previously created release.
#
# The key here is that we create the release only once.
#
# Reference:
# https://eugene-babichenko.github.io/blog/2020/05/09/github-actions-cross-platform-auto-releases/
name: release
# Only do the release on x.y.z tags.
on:
push:
# Enable when testing release infrastructure on a branch.
# branches:
# - ag/work
tags:
- "[0-9]+.[0-9]+.[0-9]+"
# We need this to be able to create releases.
permissions:
contents: write
jobs:
# The create-release job runs purely to initialize the GitHub release itself,
# and names the release after the `x.y.z` tag that was pushed. It's separate
# from building the release so that we only create the release once.
create-release:
name: create-release
runs-on: ubuntu-latest
# env:
# Set to force version number, e.g., when no tag exists.
# RG_VERSION: TEST-0.0.0
outputs:
upload_url: ${{ steps.release.outputs.upload_url }}
rg_version: ${{ env.RG_VERSION }}
steps:
- uses: actions/checkout@v4
- name: Get the release version from the tag
shell: bash
if: env.RG_VERSION == ''
if: env.VERSION == ''
run: echo "VERSION=${{ github.ref_name }}" >> $GITHUB_ENV
- name: Show the version
run: |
# Apparently, this is the right way to get a tag name. Really?
#
# See: https://github.community/t5/GitHub-Actions/How-to-get-just-the-tag-name/m-p/32167/highlight/true#M1027
echo "RG_VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_ENV
echo "version is: ${{ env.RG_VERSION }}"
echo "version is: $VERSION"
- name: Check that tag version and Cargo.toml version are the same
shell: bash
run: |
if ! grep -q "version = \"$VERSION\"" Cargo.toml; then
echo "version does not match Cargo.toml" >&2
exit 1
fi
- name: Create GitHub release
id: release
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
tag_name: ${{ env.RG_VERSION }}
release_name: ${{ env.RG_VERSION }}
run: gh release create $VERSION --draft --verify-tag --title $VERSION
outputs:
version: ${{ env.VERSION }}
build-release:
name: build-release
@ -59,126 +48,324 @@ jobs:
# systems.
CARGO: cargo
# When CARGO is set to CROSS, this is set to `--target matrix.target`.
TARGET_FLAGS: ""
TARGET_FLAGS:
# When CARGO is set to CROSS, TARGET_DIR includes matrix.target.
TARGET_DIR: ./target
# Bump this as appropriate. We pin to a version to make sure CI
# continues to work as cross releases in the past have broken things
# in subtle ways.
CROSS_VERSION: v0.2.5
# Emit backtraces on panics.
RUST_BACKTRACE: 1
# Build static releases with PCRE2.
PCRE2_SYS_STATIC: 1
strategy:
fail-fast: false
matrix:
build: [linux, linux-arm, macos, win-msvc, win-gnu, win32-msvc]
include:
- build: linux
os: ubuntu-18.04
os: ubuntu-latest
rust: nightly
target: x86_64-unknown-linux-musl
- build: linux-arm
os: ubuntu-18.04
rust: nightly
target: arm-unknown-linux-gnueabihf
strip: x86_64-linux-musl-strip
- build: stable-x86
os: ubuntu-latest
rust: stable
target: i686-unknown-linux-gnu
strip: x86_64-linux-gnu-strip
qemu: i386
- build: stable-aarch64
os: ubuntu-latest
rust: stable
target: aarch64-unknown-linux-gnu
strip: aarch64-linux-gnu-strip
qemu: qemu-aarch64
- build: stable-arm-gnueabihf
os: ubuntu-latest
rust: stable
target: armv7-unknown-linux-gnueabihf
strip: arm-linux-gnueabihf-strip
qemu: qemu-arm
- build: stable-arm-musleabihf
os: ubuntu-latest
rust: stable
target: armv7-unknown-linux-musleabihf
strip: arm-linux-musleabihf-strip
qemu: qemu-arm
- build: stable-arm-musleabi
os: ubuntu-latest
rust: stable
target: armv7-unknown-linux-musleabi
strip: arm-linux-musleabi-strip
qemu: qemu-arm
- build: stable-powerpc64
os: ubuntu-latest
rust: stable
target: powerpc64-unknown-linux-gnu
strip: powerpc64-linux-gnu-strip
qemu: qemu-ppc64
- build: stable-s390x
os: ubuntu-latest
rust: stable
target: s390x-unknown-linux-gnu
strip: s390x-linux-gnu-strip
qemu: qemu-s390x
- build: macos
os: macos-latest
rust: nightly
target: x86_64-apple-darwin
- build: win-msvc
os: windows-2019
os: windows-latest
rust: nightly
target: x86_64-pc-windows-msvc
- build: win-gnu
os: windows-2019
os: windows-latest
rust: nightly-x86_64-gnu
target: x86_64-pc-windows-gnu
- build: win32-msvc
os: windows-2019
os: windows-latest
rust: nightly
target: i686-pc-windows-msvc
steps:
- name: Checkout repository
uses: actions/checkout@v2
with:
fetch-depth: 1
uses: actions/checkout@v4
- name: Install packages (Ubuntu)
if: matrix.os == 'ubuntu-18.04'
if: matrix.os == 'ubuntu-latest'
shell: bash
run: |
ci/ubuntu-install-packages
- name: Install packages (macOS)
if: matrix.os == 'macos-latest'
run: |
ci/macos-install-packages
- name: Install Rust
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ matrix.rust }}
profile: minimal
override: true
target: ${{ matrix.target }}
- name: Use Cross
if: matrix.os == 'ubuntu-latest' && matrix.target != ''
shell: bash
run: |
cargo install cross
# In the past, new releases of 'cross' have broken CI. So for now, we
# pin it. We also use their pre-compiled binary releases because cross
# has over 100 dependencies and takes a bit to compile.
dir="$RUNNER_TEMP/cross-download"
mkdir "$dir"
echo "$dir" >> $GITHUB_PATH
cd "$dir"
curl -LO "https://github.com/cross-rs/cross/releases/download/$CROSS_VERSION/cross-x86_64-unknown-linux-musl.tar.gz"
tar xf cross-x86_64-unknown-linux-musl.tar.gz
echo "CARGO=cross" >> $GITHUB_ENV
- name: Set target variables
shell: bash
run: |
echo "TARGET_FLAGS=--target ${{ matrix.target }}" >> $GITHUB_ENV
echo "TARGET_DIR=./target/${{ matrix.target }}" >> $GITHUB_ENV
- name: Show command used for Cargo
shell: bash
run: |
echo "cargo command is: ${{ env.CARGO }}"
echo "target flag is: ${{ env.TARGET_FLAGS }}"
echo "target dir is: ${{ env.TARGET_DIR }}"
- name: Build release binary
run: ${{ env.CARGO }} build --verbose --release --features pcre2 ${{ env.TARGET_FLAGS }}
shell: bash
run: |
${{ env.CARGO }} build --verbose --release --features pcre2 ${{ env.TARGET_FLAGS }}
if [ "${{ matrix.os }}" = "windows-latest" ]; then
bin="target/${{ matrix.target }}/release/rg.exe"
else
bin="target/${{ matrix.target }}/release/rg"
fi
echo "BIN=$bin" >> $GITHUB_ENV
- name: Strip release binary (linux and macos)
if: matrix.build == 'linux' || matrix.build == 'macos'
run: strip "target/${{ matrix.target }}/release/rg"
- name: Strip release binary (macos)
if: matrix.os == 'macos-latest'
shell: bash
run: strip "$BIN"
- name: Strip release binary (arm)
if: matrix.build == 'linux-arm'
- name: Strip release binary (cross)
if: env.CARGO == 'cross'
shell: bash
run: |
docker run --rm -v \
"$PWD/target:/target:Z" \
rustembedded/cross:arm-unknown-linux-gnueabihf \
arm-linux-gnueabihf-strip \
/target/arm-unknown-linux-gnueabihf/release/rg
"ghcr.io/cross-rs/${{ matrix.target }}:main" \
"${{ matrix.strip }}" \
"/$BIN"
- name: Build archive
- name: Determine archive name
shell: bash
run: |
outdir="$(ci/cargo-out-dir "${{ env.TARGET_DIR }}")"
staging="ripgrep-${{ needs.create-release.outputs.rg_version }}-${{ matrix.target }}"
mkdir -p "$staging"/{complete,doc}
version="${{ needs.create-release.outputs.version }}"
echo "ARCHIVE=ripgrep-$version-${{ matrix.target }}" >> $GITHUB_ENV
cp {README.md,COPYING,UNLICENSE,LICENSE-MIT} "$staging/"
cp {CHANGELOG.md,FAQ.md,GUIDE.md} "$staging/doc/"
cp "$outdir"/{rg.bash,rg.fish,_rg.ps1} "$staging/complete/"
cp complete/_rg "$staging/complete/"
- name: Creating directory for archive
shell: bash
run: |
mkdir -p "$ARCHIVE"/{complete,doc}
cp "$BIN" "$ARCHIVE"/
cp {README.md,COPYING,UNLICENSE,LICENSE-MIT} "$ARCHIVE"/
cp {CHANGELOG.md,FAQ.md,GUIDE.md} "$ARCHIVE"/doc/
if [ "${{ matrix.os }}" = "windows-2019" ]; then
cp "target/${{ matrix.target }}/release/rg.exe" "$staging/"
7z a "$staging.zip" "$staging"
echo "ASSET=$staging.zip" >> $GITHUB_ENV
else
# The man page is only generated on Unix systems. ¯\_(ツ)_/¯
cp "$outdir"/rg.1 "$staging/doc/"
cp "target/${{ matrix.target }}/release/rg" "$staging/"
tar czf "$staging.tar.gz" "$staging"
echo "ASSET=$staging.tar.gz" >> $GITHUB_ENV
fi
- name: Generate man page and completions (no emulation)
if: matrix.qemu == ''
shell: bash
run: |
"$BIN" --version
"$BIN" --generate complete-bash > "$ARCHIVE/complete/rg.bash"
"$BIN" --generate complete-fish > "$ARCHIVE/complete/rg.fish"
"$BIN" --generate complete-powershell > "$ARCHIVE/complete/_rg.ps1"
"$BIN" --generate complete-zsh > "$ARCHIVE/complete/_rg"
"$BIN" --generate man > "$ARCHIVE/doc/rg.1"
- name: Generate man page and completions (emulation)
if: matrix.qemu != ''
shell: bash
run: |
docker run --rm -v \
"$PWD/target:/target:Z" \
"ghcr.io/cross-rs/${{ matrix.target }}:main" \
"${{ matrix.qemu }}" "/$BIN" --version
docker run --rm -v \
"$PWD/target:/target:Z" \
"ghcr.io/cross-rs/${{ matrix.target }}:main" \
"${{ matrix.qemu }}" "/$BIN" \
--generate complete-bash > "$ARCHIVE/complete/rg.bash"
docker run --rm -v \
"$PWD/target:/target:Z" \
"ghcr.io/cross-rs/${{ matrix.target }}:main" \
"${{ matrix.qemu }}" "/$BIN" \
--generate complete-fish > "$ARCHIVE/complete/rg.fish"
docker run --rm -v \
"$PWD/target:/target:Z" \
"ghcr.io/cross-rs/${{ matrix.target }}:main" \
"${{ matrix.qemu }}" "/$BIN" \
--generate complete-powershell > "$ARCHIVE/complete/_rg.ps1"
docker run --rm -v \
"$PWD/target:/target:Z" \
"ghcr.io/cross-rs/${{ matrix.target }}:main" \
"${{ matrix.qemu }}" "/$BIN" \
--generate complete-zsh > "$ARCHIVE/complete/_rg"
docker run --rm -v \
"$PWD/target:/target:Z" \
"ghcr.io/cross-rs/${{ matrix.target }}:main" \
"${{ matrix.qemu }}" "/$BIN" \
--generate man > "$ARCHIVE/doc/rg.1"
- name: Build archive (Windows)
shell: bash
if: matrix.os == 'windows-latest'
run: |
7z a "$ARCHIVE.zip" "$ARCHIVE"
certutil -hashfile "$ARCHIVE.zip" SHA256 > "$ARCHIVE.zip.sha256"
echo "ASSET=$ARCHIVE.zip" >> $GITHUB_ENV
echo "ASSET_SUM=$ARCHIVE.zip.sha256" >> $GITHUB_ENV
- name: Build archive (Unix)
shell: bash
if: matrix.os != 'windows-latest'
run: |
tar czf "$ARCHIVE.tar.gz" "$ARCHIVE"
shasum -a 256 "$ARCHIVE.tar.gz" > "$ARCHIVE.tar.gz.sha256"
echo "ASSET=$ARCHIVE.tar.gz" >> $GITHUB_ENV
echo "ASSET_SUM=$ARCHIVE.tar.gz.sha256" >> $GITHUB_ENV
- name: Upload release archive
uses: actions/upload-release-asset@v1.0.1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
shell: bash
run: |
version="${{ needs.create-release.outputs.version }}"
gh release upload "$version" ${{ env.ASSET }} ${{ env.ASSET_SUM }}
build-release-deb:
name: build-release-deb
needs: ['create-release']
runs-on: ubuntu-latest
env:
TARGET: x86_64-unknown-linux-musl
# Emit backtraces on panics.
RUST_BACKTRACE: 1
# Since we're distributing the dpkg, we don't know whether the user will
# have PCRE2 installed, so just do a static build.
PCRE2_SYS_STATIC: 1
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Install packages (Ubuntu)
shell: bash
run: |
ci/ubuntu-install-packages
- name: Install Rust
uses: dtolnay/rust-toolchain@master
with:
upload_url: ${{ needs.create-release.outputs.upload_url }}
asset_path: ${{ env.ASSET }}
asset_name: ${{ env.ASSET }}
asset_content_type: application/octet-stream
toolchain: nightly
target: ${{ env.TARGET }}
- name: Install cargo-deb
shell: bash
run: |
cargo install cargo-deb
# 'cargo deb' does not seem to provide a way to specify an asset that is
# created at build time, such as ripgrep's man page. To work around this,
# we force a debug build, copy out the man page (and shell completions)
# produced from that build, put it into a predictable location and then
# build the deb, which knows where to look.
- name: Build debug binary to create release assets
shell: bash
run: |
cargo build --target ${{ env.TARGET }}
bin="target/${{ env.TARGET }}/debug/rg"
echo "BIN=$bin" >> $GITHUB_ENV
- name: Create deployment directory
shell: bash
run: |
dir=deployment/deb
mkdir -p "$dir"
echo "DEPLOY_DIR=$dir" >> $GITHUB_ENV
- name: Generate man page
shell: bash
run: |
"$BIN" --generate man > "$DEPLOY_DIR/rg.1"
- name: Generate shell completions
shell: bash
run: |
"$BIN" --generate complete-bash > "$DEPLOY_DIR/rg.bash"
"$BIN" --generate complete-fish > "$DEPLOY_DIR/rg.fish"
"$BIN" --generate complete-zsh > "$DEPLOY_DIR/_rg"
- name: Build release binary
shell: bash
run: |
cargo deb --profile deb --target ${{ env.TARGET }}
version="${{ needs.create-release.outputs.version }}"
echo "DEB_DIR=target/${{ env.TARGET }}/debian" >> $GITHUB_ENV
echo "DEB_NAME=ripgrep_$version-1_amd64.deb" >> $GITHUB_ENV
- name: Create sha256 sum of deb file
shell: bash
run: |
cd "$DEB_DIR"
sum="$DEB_NAME.sha256"
shasum -a 256 "$DEB_NAME" > "$sum"
echo "SUM=$sum" >> $GITHUB_ENV
- name: Upload release archive
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
shell: bash
run: |
cd "$DEB_DIR"
version="${{ needs.create-release.outputs.version }}"
gh release upload "$version" "$DEB_NAME" "$SUM"

1
.gitignore vendored

@@ -7,6 +7,7 @@ target
/termcolor/Cargo.lock
/wincolor/Cargo.lock
/deployment
/.idea
# Snapcraft files
stage

1
.ignore Normal file

@@ -0,0 +1 @@
!/.github/


@@ -1,13 +1,240 @@
13.0.1
======
TBD
===
Unreleased changes. Release notes have not yet been written.
14.1.1 (2024-09-08)
===================
This is a minor release that fixes a matching bug. In particular, a bug was
found that could cause ripgrep to ignore lines that should match, that is, to
report false negatives. It is difficult to characterize the specific set of regexes
in which this occurs as it requires multiple different optimization strategies
to collide and produce an incorrect result. But as one reported example, in
ripgrep, the regex `(?i:e.x|ex)` does not match `e-x` when it should. (This
bug is a result of an inner literal optimization performed in the `grep-regex`
crate and not in the `regex` crate.)
Bug fixes:
* [BUG #2884](https://github.com/BurntSushi/ripgrep/issues/2884):
Fix bug where ripgrep could miss some matches that it should report.
Miscellaneous:
* [MISC #2748](https://github.com/BurntSushi/ripgrep/issues/2748):
Remove ripgrep's `simd-accel` feature because it was frequently broken.
14.1.0 (2024-01-06)
===================
This is a minor release with a few small new features and bug fixes. This
release contains a bug fix for unbounded memory growth while walking a
directory tree. This release also includes improvements to the completions for
the `fish` shell, and release binaries for several additional ARM targets.
Bug fixes:
* [BUG #2690](https://github.com/BurntSushi/ripgrep/issues/2690):
Fix unbounded memory growth in the `ignore` crate.
Feature enhancements:
* Added or improved file type filtering for Lean and Meson.
* [FEATURE #2684](https://github.com/BurntSushi/ripgrep/issues/2684):
Improve completions for the `fish` shell.
* [FEATURE #2702](https://github.com/BurntSushi/ripgrep/pull/2702):
Add release binaries for `armv7-unknown-linux-gnueabihf`,
`armv7-unknown-linux-musleabihf` and `armv7-unknown-linux-musleabi`.
14.0.3 (2023-11-28)
===================
This is a patch release with a bug fix for the `--sortr` flag.
Bug fixes:
* [BUG #2664](https://github.com/BurntSushi/ripgrep/issues/2664):
Fix `--sortr=path`. I left a `todo!()` in the source. Oof.
14.0.2 (2023-11-27)
===================
This is a patch release with a few small bug fixes.
Bug fixes:
* [BUG #2654](https://github.com/BurntSushi/ripgrep/issues/2654):
Fix `deb` release sha256 sum file.
* [BUG #2658](https://github.com/BurntSushi/ripgrep/issues/2658):
Fix partial regression in the behavior of `--null-data --line-regexp`.
* [BUG #2659](https://github.com/BurntSushi/ripgrep/issues/2659):
Fix Fish shell completions.
* [BUG #2662](https://github.com/BurntSushi/ripgrep/issues/2662):
Fix typo in documentation for `-i/--ignore-case`.
14.0.1 (2023-11-26)
===================
This is a patch release meant to fix `cargo install ripgrep` on Windows.
Bug fixes:
* [BUG #2653](https://github.com/BurntSushi/ripgrep/issues/2653):
Include `pkg/windows/Manifest.xml` in crate package.
14.0.0 (2023-11-26)
===================
ripgrep 14 is a new major version release of ripgrep that has some new
features, performance improvements and a lot of bug fixes.
The headlining feature in this release is hyperlink support. In this release,
hyperlinks are an opt-in feature but may become opt-out in the future.
To enable them, try passing `--hyperlink-format default`. If you use [VS Code],
then try passing `--hyperlink-format vscode`. Please [report your experience
with hyperlinks][report-hyperlinks], positive or negative.
[VS Code]: https://code.visualstudio.com/
[report-hyperlinks]: https://github.com/BurntSushi/ripgrep/discussions/2611
Another headlining development in this release is a rewrite of ripgrep's regex
engine. You generally shouldn't notice any changes, except that some searches
may get faster. You can read more about the [regex engine rewrite on my
blog][regex-internals]. Please [report any performance improvements or
regressions that you notice][report-perf].
[report-perf]: https://github.com/BurntSushi/ripgrep/discussions/2652
Finally, ripgrep switched the library it uses for argument parsing. Users
should not notice a difference in most cases (error messages have changed
somewhat), but flag overrides should generally be more consistent. For example,
`--no-ignore --ignore-vcs` now works as one would expect: it disables all
filtering related to ignore rules except for rules found in version control
systems such as `git`.
[regex-internals]: https://blog.burntsushi.net/regex-internals/
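
The override behavior described above can be pictured with a small sketch. This is not ripgrep's actual argument parser: the `IgnoreRules` struct, its two fields and the `resolve_ignore_flags` function are hypothetical names invented for illustration, and the sketch assumes that flags are applied left to right with later flags overriding earlier ones.

```rust
/// Hypothetical model of which groups of ignore rules are active.
/// `vcs` stands for rules found in version control systems such as git;
/// `other` stands for every other source of ignore rules.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct IgnoreRules {
    vcs: bool,
    other: bool,
}

/// Apply flags left to right so that a later flag overrides an earlier one.
/// (Assumption of this sketch, not a description of ripgrep's internals.)
fn resolve_ignore_flags(flags: &[&str]) -> IgnoreRules {
    // By default, all ignore rules are respected.
    let mut rules = IgnoreRules { vcs: true, other: true };
    for flag in flags {
        match *flag {
            // Turns off every group of ignore rules.
            "--no-ignore" => rules = IgnoreRules { vcs: false, other: false },
            // Turns only the VCS group back on.
            "--ignore-vcs" => rules.vcs = true,
            // Turns only the VCS group off.
            "--no-ignore-vcs" => rules.vcs = false,
            // Flags unrelated to ignore rules are skipped.
            _ => {}
        }
    }
    rules
}
```

Under this model, `resolve_ignore_flags(&["--no-ignore", "--ignore-vcs"])` leaves only the VCS rules enabled, which matches the behavior the paragraph describes.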
**BREAKING CHANGES**:
* `rg -C1 -A2` used to be equivalent to `rg -A2`, but now it is equivalent to
`rg -B1 -A2`. That is, `-A` and `-B` no longer completely override `-C`.
Instead, they only partially override `-C`.
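
To make the change concrete, here is a minimal sketch of the two resolution rules as this entry describes them. The `ContextFlags` struct and both function names are hypothetical and are not taken from ripgrep's source; only the before/after behavior comes from the text above.

```rust
/// Hypothetical container for the three context flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct ContextFlags {
    after: Option<usize>,  // value given to -A, if any
    before: Option<usize>, // value given to -B, if any
    both: Option<usize>,   // value given to -C, if any
}

/// Old rule as described above: if -A or -B is present, -C is ignored
/// entirely. Returns (lines before, lines after).
fn resolve_pre14(f: ContextFlags) -> (usize, usize) {
    if f.before.is_some() || f.after.is_some() {
        (f.before.unwrap_or(0), f.after.unwrap_or(0))
    } else {
        let both = f.both.unwrap_or(0);
        (both, both)
    }
}

/// New rule: -A and -B each override only their own side of -C.
/// Returns (lines before, lines after).
fn resolve_v14(f: ContextFlags) -> (usize, usize) {
    let both = f.both.unwrap_or(0);
    (f.before.unwrap_or(both), f.after.unwrap_or(both))
}
```

For `rg -C1 -A2`, modeled as `ContextFlags { after: Some(2), before: None, both: Some(1) }`, the old rule yields `(0, 2)` (the same as `-A2`) and the new rule yields `(1, 2)` (the same as `-B1 -A2`).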
Build process changes:
* ripgrep's shell completions and man page are now created by running ripgrep
with a new `--generate` flag. For example, `rg --generate man` will write a
man page in `roff` format on stdout. The release archives have not changed.
* The optional build dependency on `asciidoc` or `asciidoctor` has been
dropped. Previously, it was used to produce ripgrep's man page. ripgrep now
owns this process itself by writing `roff` directly.
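
As a rough illustration of how a packaging script might use `--generate`, the sketch below builds the list of invocations and output paths without running anything. The `--generate` values and the relative file names mirror the ones used in the release workflow; the `GENERATED_ASSETS` table and the `generate_plan` function are hypothetical names introduced only for this example.

```rust
/// (value passed to `--generate`, output path relative to the archive root).
/// The pairs mirror the release workflow; the table itself is illustrative.
const GENERATED_ASSETS: [(&str, &str); 5] = [
    ("complete-bash", "complete/rg.bash"),
    ("complete-fish", "complete/rg.fish"),
    ("complete-powershell", "complete/_rg.ps1"),
    ("complete-zsh", "complete/_rg"),
    ("man", "doc/rg.1"),
];

/// Build (argv, output path) pairs. Each command writes its result to
/// stdout, so the caller is expected to redirect stdout to the output path.
fn generate_plan(bin: &str, archive_dir: &str) -> Vec<(Vec<String>, String)> {
    GENERATED_ASSETS
        .iter()
        .map(|(kind, rel)| {
            let argv = vec![
                bin.to_string(),
                "--generate".to_string(),
                kind.to_string(),
            ];
            (argv, format!("{archive_dir}/{rel}"))
        })
        .collect()
}
```

A caller could pass each `argv` to `std::process::Command` and write the captured stdout to the paired path, which is the same thing the shell redirections in the workflow do.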
Performance improvements:
* [PERF #1746](https://github.com/BurntSushi/ripgrep/issues/1746):
Make some cases with inner literals faster.
* [PERF #1760](https://github.com/BurntSushi/ripgrep/issues/1760):
Make most searches with `\b` look-arounds (among others) much faster.
* [PERF #2591](https://github.com/BurntSushi/ripgrep/pull/2591):
Parallel directory traversal now uses work stealing for faster searches.
* [PERF #2642](https://github.com/BurntSushi/ripgrep/pull/2642):
Parallel directory traversal has some contention reduced.
Feature enhancements:
* Added or improved file type filtering for Ada, DITA, Elixir, Fuchsia, Gentoo,
Gradle, GraphQL, Markdown, Prolog, Raku, TypeScript, USD and V.
* [FEATURE #665](https://github.com/BurntSushi/ripgrep/issues/665):
Add a new `--hyperlink-format` flag that turns file paths into hyperlinks.
* [FEATURE #1709](https://github.com/BurntSushi/ripgrep/issues/1709):
Improve documentation of ripgrep's behavior when stdout is a tty.
* [FEATURE #1737](https://github.com/BurntSushi/ripgrep/issues/1737):
Provide binaries for Apple silicon.
* [FEATURE #1790](https://github.com/BurntSushi/ripgrep/issues/1790):
Add new `--stop-on-nonmatch` flag.
* [FEATURE #1814](https://github.com/BurntSushi/ripgrep/issues/1814):
Flags are now categorized in `-h/--help` output and ripgrep's man page.
* [FEATURE #1838](https://github.com/BurntSushi/ripgrep/issues/1838):
An error is shown when searching for NUL bytes with binary detection enabled.
* [FEATURE #2195](https://github.com/BurntSushi/ripgrep/issues/2195):
When `extra-verbose` mode is enabled in zsh, show extra file type info.
* [FEATURE #2298](https://github.com/BurntSushi/ripgrep/issues/2298):
Add instructions for installing ripgrep using `cargo binstall`.
* [FEATURE #2409](https://github.com/BurntSushi/ripgrep/pull/2409):
Added installation instructions for `winget`.
* [FEATURE #2425](https://github.com/BurntSushi/ripgrep/pull/2425):
Shell completions (and man page) can be created via `rg --generate`.
* [FEATURE #2524](https://github.com/BurntSushi/ripgrep/issues/2524):
The `--debug` flag now indicates whether stdin or `./` is being searched.
* [FEATURE #2643](https://github.com/BurntSushi/ripgrep/issues/2643):
Make `-d` a short flag for `--max-depth`.
* [FEATURE #2645](https://github.com/BurntSushi/ripgrep/issues/2645):
The `--version` output will now also contain PCRE2 availability information.
Bug fixes:
* [BUG #884](https://github.com/BurntSushi/ripgrep/issues/884):
Don't error when `-v/--invert-match` is used multiple times.
* [BUG #1275](https://github.com/BurntSushi/ripgrep/issues/1275):
Fix bug with `\b` assertion in the regex engine.
* [BUG #1376](https://github.com/BurntSushi/ripgrep/issues/1376):
Using `--no-ignore --ignore-vcs` now works as one would expect.
* [BUG #1622](https://github.com/BurntSushi/ripgrep/issues/1622):
Add note about error messages to `-z/--search-zip` documentation.
* [BUG #1648](https://github.com/BurntSushi/ripgrep/issues/1648):
Fix bug where sometimes short flags with values, e.g., `-M 900`, would fail.
* [BUG #1701](https://github.com/BurntSushi/ripgrep/issues/1701):
Fix bug where some flags could not be repeated.
* [BUG #1757](https://github.com/BurntSushi/ripgrep/issues/1757):
Fix bug where searching a sub-directory didn't have ignores applied correctly.
* [BUG #1891](https://github.com/BurntSushi/ripgrep/issues/1891):
Fix bug when using `-w` with a regex that can match the empty string.
* [BUG #1911](https://github.com/BurntSushi/ripgrep/issues/1911):
Disable mmap searching in all non-64-bit environments.
* [BUG #1966](https://github.com/BurntSushi/ripgrep/issues/1966):
Fix bug where ripgrep can panic when printing to stderr.
* [BUG #2046](https://github.com/BurntSushi/ripgrep/issues/2046):
Clarify that `--pre` can accept any kind of path in the documentation.
* [BUG #2108](https://github.com/BurntSushi/ripgrep/issues/2108):
Improve docs for `-r/--replace` syntax.
* [BUG #2198](https://github.com/BurntSushi/ripgrep/issues/2198):
Fix bug where `--no-ignore-dot` would not ignore `.rgignore`.
* [BUG #2201](https://github.com/BurntSushi/ripgrep/issues/2201):
Improve docs for `-r/--replace` flag.
* [BUG #2288](https://github.com/BurntSushi/ripgrep/issues/2288):
`-A` and `-B` now each only partially override `-C`.
* [BUG #2236](https://github.com/BurntSushi/ripgrep/issues/2236):
Fix gitignore parsing bug where a trailing `\/` resulted in an error.
* [BUG #2243](https://github.com/BurntSushi/ripgrep/issues/2243):
Fix `--sort` flag for values other than `path`.
* [BUG #2246](https://github.com/BurntSushi/ripgrep/issues/2246):
Add note in `--debug` logs when binary files are ignored.
* [BUG #2337](https://github.com/BurntSushi/ripgrep/issues/2337):
Improve docs to mention that `--stats` is always implied by `--json`.
* [BUG #2381](https://github.com/BurntSushi/ripgrep/issues/2381):
Make `-p/--pretty` override flags like `--no-line-number`.
* [BUG #2392](https://github.com/BurntSushi/ripgrep/issues/2392):
Improve global git config parsing of the `excludesFile` field.
* [BUG #2418](https://github.com/BurntSushi/ripgrep/pull/2418):
Clarify sorting semantics of `--sort=path`.
* [BUG #2458](https://github.com/BurntSushi/ripgrep/pull/2458):
Make `--trim` run before `-M/--max-columns` takes effect.
* [BUG #2479](https://github.com/BurntSushi/ripgrep/issues/2479):
Add documentation about `.ignore`/`.rgignore` files in parent directories.
* [BUG #2480](https://github.com/BurntSushi/ripgrep/issues/2480):
Fix bug when using inline regex flags with `-e/--regexp`.
* [BUG #2505](https://github.com/BurntSushi/ripgrep/issues/2505):
Improve docs for `--vimgrep` by mentioning footguns and some work-arounds.
* [BUG #2519](https://github.com/BurntSushi/ripgrep/issues/2519):
Fix incorrect default value in documentation for `--field-match-separator`.
* [BUG #2523](https://github.com/BurntSushi/ripgrep/issues/2523):
Make executable searching take `.com` into account on Windows.
* [BUG #2574](https://github.com/BurntSushi/ripgrep/issues/2574):
Fix bug in `-w/--word-regexp` that would result in incorrect match offsets.
* [BUG #2623](https://github.com/BurntSushi/ripgrep/issues/2623):
Fix a number of bugs with the `-w/--word-regexp` flag.
* [BUG #2636](https://github.com/BurntSushi/ripgrep/pull/2636):
Strip release binaries for macOS.
13.0.0 (2021-06-12)

448
Cargo.lock generated

@@ -4,60 +4,39 @@ version = 3
[[package]]
name = "aho-corasick"
version = "0.7.18"
version = "1.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e37cfd5e7657ada45f742d6e99ca5788580b5c529dc78faf11ece6dc702656f"
checksum = "8e60d3430d3a69478ad0993f19238d2df97c507009a52b3c10addcd7f6bcb916"
dependencies = [
"memchr",
]
[[package]]
name = "atty"
version = "0.2.14"
name = "anyhow"
version = "1.0.87"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d9b39be18770d11421cdb1b9947a45dd3f37e93092cbf377614828a319d5fee8"
dependencies = [
"hermit-abi",
"libc",
"winapi",
]
[[package]]
name = "base64"
version = "0.13.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "904dfeac50f3cdaba28fc6f57fdcddb75f49ed61346676a78c4ffe55877802fd"
[[package]]
name = "bitflags"
version = "1.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a"
checksum = "10f00e1f6e58a40e807377c75c6a7f97bf9044fab57816f2414e6f5f4499d7b8"
[[package]]
name = "bstr"
version = "0.2.17"
version = "1.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ba3569f383e8f1598449f1a423e72e99569137b47740b1da11ef19af3d5c3223"
checksum = "40723b8fb387abc38f4f4a37c09073622e41dd12327033091ef8950659e6dc0c"
dependencies = [
"lazy_static",
"memchr",
"regex-automata",
"serde",
]
[[package]]
name = "bytecount"
version = "0.6.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "72feb31ffc86498dacdbd0fcebb56138e7177a8cc5cea4516031d15ae85a742e"
[[package]]
name = "cc"
version = "1.0.73"
version = "1.1.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2fff2a6927b3bb87f9595d67196a70493f627687a71d87a0d692242c33f58c11"
checksum = "b62ac837cdb5cb22e10a256099b4fc502b1dfe560cb282963a974d7abd80e476"
dependencies = [
"jobserver",
"libc",
"shlex",
]
[[package]]
@@ -67,45 +46,46 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
[[package]]
name = "clap"
version = "2.34.0"
name = "crossbeam-channel"
version = "0.5.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a0610544180c38b88101fecf2dd634b174a62eef6946f84dfc6a7127512b381c"
checksum = "82b8f8f868b36967f9606790d1903570de9ceaf870a7bf9fbbd3016d636a2cb2"
dependencies = [
"bitflags",
"strsim",
"textwrap",
"unicode-width",
"crossbeam-utils",
]
[[package]]
name = "crossbeam-channel"
version = "0.5.4"
name = "crossbeam-deque"
version = "0.8.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5aaa7bd5fb665c6864b5f963dd9097905c54125909c7aa94c9e18507cdbe6c53"
checksum = "613f8cc01fe9cf1a3eb3d7f488fd2fa8388403e97039e2f73692932e291a770d"
dependencies = [
"crossbeam-epoch",
"crossbeam-utils",
]
[[package]]
name = "crossbeam-epoch"
version = "0.9.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5b82ac4a3c2ca9c3460964f020e1402edd5753411d7737aa39c3714ad1b5420e"
dependencies = [
"cfg-if",
"crossbeam-utils",
]
[[package]]
name = "crossbeam-utils"
version = "0.8.8"
version = "0.8.20"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0bf124c720b7686e3c2663cf54062ab0f68a88af2fb6a030e87e30bf721fcb38"
dependencies = [
"cfg-if",
"lazy_static",
]
checksum = "22ec99545bb0ed0ea7bb9b8e1e9122ea386ff8a48c0922e43f36d45ab09e0e80"
[[package]]
name = "encoding_rs"
version = "0.8.30"
version = "0.8.34"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7896dc8abb250ffdda33912550faa54c88ec8b998dec0b2c55ab224921ce11df"
checksum = "b45de904aa0b010bce2ab45264d0631681847fa7b6f2eaa7dab7619943bc4f59"
dependencies = [
"cfg-if",
"packed_simd_2",
]
[[package]]
@@ -117,42 +97,29 @@ dependencies = [
"encoding_rs",
]
[[package]]
name = "fnv"
version = "1.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1"
[[package]]
name = "fs_extra"
version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2022715d62ab30faffd124d40b76f4134a550a87792276512b18d63272333394"
[[package]]
name = "glob"
version = "0.3.0"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9b919933a397b79c37e33b77bb2aa3dc8eb6e165ad809e58ff75bc7db2e34574"
checksum = "d2fabcfbdc87f4758337ca535fb41a6d701b65693ce38287d856d1674551ec9b"
[[package]]
name = "globset"
version = "0.4.9"
version = "0.4.16"
dependencies = [
"aho-corasick",
"bstr",
"fnv",
"glob",
"lazy_static",
"log",
"regex",
"regex-automata",
"regex-syntax",
"serde",
"serde_json",
]
[[package]]
name = "grep"
version = "0.2.8"
version = "0.3.2"
dependencies = [
"grep-cli",
"grep-matcher",
@@ -166,22 +133,19 @@ dependencies = [
[[package]]
name = "grep-cli"
version = "0.1.6"
version = "0.1.11"
dependencies = [
"atty",
"bstr",
"globset",
"lazy_static",
"libc",
"log",
"regex",
"same-file",
"termcolor",
"winapi-util",
]
[[package]]
name = "grep-matcher"
version = "0.1.5"
version = "0.1.7"
dependencies = [
"memchr",
"regex",
@@ -189,21 +153,22 @@ dependencies = [
[[package]]
name = "grep-pcre2"
version = "0.1.5"
version = "0.1.8"
dependencies = [
"grep-matcher",
"log",
"pcre2",
]
[[package]]
name = "grep-printer"
version = "0.1.6"
version = "0.2.2"
dependencies = [
"base64",
"bstr",
"grep-matcher",
"grep-regex",
"grep-searcher",
"log",
"serde",
"serde_json",
"termcolor",
@@ -211,80 +176,67 @@ dependencies = [
[[package]]
name = "grep-regex"
version = "0.1.9"
version = "0.1.13"
dependencies = [
"aho-corasick",
"bstr",
"grep-matcher",
"log",
"regex",
"regex-automata",
"regex-syntax",
"thread_local",
]
[[package]]
name = "grep-searcher"
version = "0.1.8"
version = "0.1.14"
dependencies = [
"bstr",
"bytecount",
"encoding_rs",
"encoding_rs_io",
"grep-matcher",
"grep-regex",
"log",
"memchr",
"memmap2",
"regex",
]
[[package]]
name = "hermit-abi"
version = "0.1.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "62b467343b94ba476dcb2500d242dadbb39557df889310ac77c5d99100aaac33"
dependencies = [
"libc",
]
[[package]]
name = "ignore"
version = "0.4.18"
version = "0.4.23"
dependencies = [
"bstr",
"crossbeam-channel",
"crossbeam-utils",
"crossbeam-deque",
"globset",
"lazy_static",
"log",
"memchr",
"regex",
"regex-automata",
"same-file",
"thread_local",
"walkdir",
"winapi-util",
]
[[package]]
name = "itoa"
version = "1.0.1"
version = "1.0.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1aab8fc367588b89dcee83ab0fd66b72b50b72fa1904d7095045ace2b0c81c35"
checksum = "49f1f14873335454500d59611f1cf4a4b0f786f9ac11f4312a78e4cf2566695b"
[[package]]
name = "jemalloc-sys"
version = "0.3.2"
version = "0.5.4+5.3.0-patched"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0d3b9f3f5c9b31aa0f5ed3260385ac205db665baa41d49bb8338008ae94ede45"
checksum = "ac6c1946e1cea1788cbfde01c993b52a10e2da07f4bac608228d1bed20bfebf2"
dependencies = [
"cc",
"fs_extra",
"libc",
]
[[package]]
name = "jemallocator"
version = "0.3.2"
version = "0.5.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "43ae63fcfc45e99ab3d1b29a46782ad679e98436c3169d15a167a1108a724b69"
checksum = "a0de374a9f8e63150e6f5e8a60cc14c668226d7a347d8aee1a45766e3c4dd3bc"
dependencies = [
"jemalloc-sys",
"libc",
@@ -292,98 +244,62 @@ dependencies = [
[[package]]
name = "jobserver"
version = "0.1.24"
version = "0.1.32"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "af25a77299a7f711a01975c35a6a424eb6862092cc2d6c72c4ed6cbc56dfc1fa"
checksum = "48d1dbcbbeb6a7fec7e059840aa538bd62aaccf972c7346c4d9d2059312853d0"
dependencies = [
"libc",
]
[[package]]
name = "lazy_static"
version = "1.4.0"
name = "lexopt"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e2abad23fbc42b3700f2f279844dc832adb2b2eb069b2df918f455c4e18cc646"
checksum = "baff4b617f7df3d896f97fe922b64817f6cd9a756bb81d40f8883f2f66dcb401"
[[package]]
name = "libc"
version = "0.2.121"
version = "0.2.158"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "efaa7b300f3b5fe8eb6bf21ce3895e1751d9665086af2d64b42f19701015ff4f"
[[package]]
name = "libm"
version = "0.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7fc7aa29613bd6a620df431842069224d8bc9011086b1db4c0e0cd47fa03ec9a"
checksum = "d8adc4bb1803a324070e64a98ae98f38934d91957a99cfb3a43dcbc01bc56439"
[[package]]
name = "log"
version = "0.4.14"
version = "0.4.22"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "51b9bbe6c47d51fc3e1a9b945965946b4c44142ab8792c50835a980d362c2710"
dependencies = [
"cfg-if",
]
checksum = "a7a70ba024b9dc04c27ea2f0c0548feb474ec5c54bba33a7f72f873a39d07b24"
[[package]]
name = "memchr"
version = "2.4.1"
version = "2.7.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "308cc39be01b73d0d18f82a0e7b2a3df85245f84af96fdddc5d202d27e47b86a"
checksum = "78ca9ab1a0babb1e7d5695e3530886289c18cf2f87ec19a575a0abdce112e3a3"
[[package]]
name = "memmap2"
version = "0.5.3"
version = "0.9.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "057a3db23999c867821a7a59feb06a578fcb03685e983dff90daf9e7d24ac08f"
checksum = "fe751422e4a8caa417e13c3ea66452215d7d63e19e604f4980461212f3ae1322"
dependencies = [
"libc",
]
[[package]]
name = "num_cpus"
version = "1.13.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "19e64526ebdee182341572e50e9ad03965aa510cd94427a4549448f285e957a1"
dependencies = [
"hermit-abi",
"libc",
]
[[package]]
name = "once_cell"
version = "1.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "87f3e037eac156d1775da914196f0f37741a274155e34a0b7e427c35d2a2ecb9"
[[package]]
name = "packed_simd_2"
version = "0.3.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a1914cd452d8fccd6f9db48147b29fd4ae05bea9dc5d9ad578509f72415de282"
dependencies = [
"cfg-if",
"libm",
]
[[package]]
name = "pcre2"
version = "0.2.3"
version = "0.2.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85b30f2f69903b439dd9dc9e824119b82a55bf113b29af8d70948a03c1b11ab1"
checksum = "3be55c43ac18044541d58d897e8f4c55157218428953ebd39d86df3ba0286b2b"
dependencies = [
"libc",
"log",
"pcre2-sys",
"thread_local",
]
[[package]]
name = "pcre2-sys"
version = "0.2.5"
version = "0.2.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dec30e5e9ec37eb8fbf1dea5989bc957fd3df56fbee5061aa7b7a99dbb37b722"
checksum = "550f5d18fb1b90c20b87e161852c10cde77858c3900c5059b5ad2a1449f11d8a"
dependencies = [
"cc",
"libc",
@@ -392,76 +308,81 @@ dependencies = [
[[package]]
name = "pkg-config"
version = "0.3.24"
version = "0.3.30"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "58893f751c9b0412871a09abd62ecd2a00298c6c83befa223ef98c52aef40cbe"
checksum = "d231b230927b5e4ad203db57bbcbee2802f6bce620b1e4a9024a07d94e2907ec"
[[package]]
name = "proc-macro2"
version = "1.0.36"
version = "1.0.86"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c7342d5883fbccae1cc37a2353b09c87c9b0f3afd73f5fb9bba687a1f733b029"
checksum = "5e719e8df665df0d1c8fbfd238015744736151d4445ec0836b8e628aae103b77"
dependencies = [
"unicode-xid",
"unicode-ident",
]
[[package]]
name = "quote"
version = "1.0.16"
version = "1.0.37"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b4af2ec4714533fcdf07e886f17025ace8b997b9ce51204ee69b6da831c3da57"
checksum = "b5b9d34b8991d19d98081b46eacdd8eb58c6f2b201139f7c5f643cc155a633af"
dependencies = [
"proc-macro2",
]
[[package]]
name = "regex"
version = "1.5.5"
version = "1.10.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1a11647b6b25ff05a515cb92c365cec08801e83423a235b51e231e1808747286"
checksum = "4219d74c6b67a3654a9fbebc4b419e22126d13d2f3c4a07ee0cb61ff79a79619"
dependencies = [
"aho-corasick",
"memchr",
"regex-automata",
"regex-syntax",
]
[[package]]
name = "regex-automata"
version = "0.4.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "38caf58cc5ef2fed281f89292ef23f6365465ed9a41b7a7754eb4e26496c92df"
dependencies = [
"aho-corasick",
"memchr",
"regex-syntax",
]
[[package]]
name = "regex-automata"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6c230d73fb8d8c1b9c0b3135c5142a8acee3a0558fb8db5cf1cb65f8d7862132"
[[package]]
name = "regex-syntax"
version = "0.6.25"
version = "0.8.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f497285884f3fcff424ffc933e56d7cbca511def0c9831a7f9b5f6153e3cc89b"
checksum = "7a66a03ae7c801facd77a29370b4faec201768915ac14a721ba36f20bc9c209b"
[[package]]
name = "ripgrep"
version = "13.0.0"
version = "14.1.1"
dependencies = [
"anyhow",
"bstr",
"clap",
"grep",
"ignore",
"jemallocator",
"lazy_static",
"lexopt",
"log",
"num_cpus",
"regex",
"serde",
"serde_derive",
"serde_json",
"termcolor",
"textwrap",
"walkdir",
]
[[package]]
name = "ryu"
version = "1.0.9"
version = "1.0.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "73b4b750c782965c211b42f022f59af1fbceabdd026623714f104152f1ec149f"
checksum = "f3cb5ba0dc43242ce17de99c180e96db90b235b8a9fdc9543c96d2209116bd9f"
[[package]]
name = "same-file"
@@ -474,18 +395,18 @@ dependencies = [
[[package]]
name = "serde"
version = "1.0.136"
version = "1.0.210"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ce31e24b01e1e524df96f1c2fdd054405f8d7376249a5110886fb4b658484789"
checksum = "c8e3592472072e6e22e0a54d5904d9febf8508f65fb8552499a1abc7d1078c3a"
dependencies = [
"serde_derive",
]
[[package]]
name = "serde_derive"
version = "1.0.136"
version = "1.0.210"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "08597e7152fcd306f41838ed3e37be9eaeed2b61c42e2117266a554fab4662f9"
checksum = "243902eda00fad750862fc144cea25caca5e20d615af0a81bee94ca738f1df1f"
dependencies = [
"proc-macro2",
"quote",
@@ -494,109 +415,142 @@ dependencies = [
[[package]]
name = "serde_json"
version = "1.0.79"
version = "1.0.128"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8e8d9fa5c3b304765ce1fd9c4c8a3de2c8db365a5b91be52f186efc675681d95"
checksum = "6ff5456707a1de34e7e37f2a6fd3d3f808c318259cbd01ab6377795054b483d8"
dependencies = [
"itoa",
"memchr",
"ryu",
"serde",
]
[[package]]
name = "strsim"
version = "0.8.0"
name = "shlex"
version = "1.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8ea5119cdb4c55b55d432abb513a0429384878c15dde60cc77b1c99de1a95a6a"
checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
[[package]]
name = "syn"
version = "1.0.89"
version = "2.0.77"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ea297be220d52398dcc07ce15a209fce436d361735ac1db700cab3b6cdfb9f54"
checksum = "9f35bcdf61fd8e7be6caf75f429fdca8beb3ed76584befb503b1569faee373ed"
dependencies = [
"proc-macro2",
"quote",
"unicode-xid",
"unicode-ident",
]
[[package]]
name = "termcolor"
version = "1.1.3"
version = "1.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bab24d30b911b2376f3a13cc2cd443142f0c81dda04c118693e35b3835757755"
checksum = "06794f8f6c5c898b3275aebefa6b8a1cb24cd2c6c79397ab15774837a0bc5755"
dependencies = [
"winapi-util",
]
[[package]]
name = "textwrap"
version = "0.11.0"
version = "0.16.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d326610f408c7a4eb6f51c37c330e496b08506c9457c9d34287ecc38809fb060"
dependencies = [
"unicode-width",
]
checksum = "23d434d3f8967a09480fb04132ebe0a3e088c173e6d0ee7897abbdf4eab0f8b9"
[[package]]
name = "thread_local"
version = "1.1.4"
name = "unicode-ident"
version = "1.0.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5516c27b78311c50bf42c071425c560ac799b11c30b31f87e3081965fe5e0180"
dependencies = [
"once_cell",
]
[[package]]
name = "unicode-width"
version = "0.1.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3ed742d4ea2bd1176e236172c8429aaf54486e7ac098db29ffe6529e0ce50973"
[[package]]
name = "unicode-xid"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8ccb82d61f80a663efe1f787a51b16b5a51e3314d6ac365b08639f52387b33f3"
checksum = "3354b9ac3fae1ff6755cb6db53683adb661634f67557942dea4facebec0fee4b"
[[package]]
name = "walkdir"
version = "2.3.2"
version = "2.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "808cf2735cd4b6866113f648b791c6adc5714537bc222d9347bb203386ffda56"
checksum = "29790946404f91d9c5d06f9874efddea1dc06c5efe94541a7d6863108e3a5e4b"
dependencies = [
"same-file",
"winapi",
"winapi-util",
]
[[package]]
name = "winapi"
version = "0.3.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5c839a674fcd7a98952e593242ea400abe93992746761e38641405d28b00f419"
dependencies = [
"winapi-i686-pc-windows-gnu",
"winapi-x86_64-pc-windows-gnu",
]
[[package]]
name = "winapi-i686-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6"
[[package]]
name = "winapi-util"
version = "0.1.5"
version = "0.1.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "70ec6ce85bb158151cae5e5c87f95a8e97d2c0c4b001223f33a334e3ce5de178"
checksum = "cf221c93e13a30d793f7645a0e7762c55d169dbb0a49671918a2319d289b10bb"
dependencies = [
"winapi",
"windows-sys",
]
[[package]]
name = "winapi-x86_64-pc-windows-gnu"
version = "0.4.0"
name = "windows-sys"
version = "0.59.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"
checksum = "1e38bc4d79ed67fd075bcc251a1c39b32a1776bbe92e5bef1f0bf1f8c531853b"
dependencies = [
"windows-targets",
]
[[package]]
name = "windows-targets"
version = "0.52.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9b724f72796e036ab90c1021d4780d4d3d648aca59e491e6b98e725b84e99973"
dependencies = [
"windows_aarch64_gnullvm",
"windows_aarch64_msvc",
"windows_i686_gnu",
"windows_i686_gnullvm",
"windows_i686_msvc",
"windows_x86_64_gnu",
"windows_x86_64_gnullvm",
"windows_x86_64_msvc",
]
[[package]]
name = "windows_aarch64_gnullvm"
version = "0.52.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "32a4622180e7a0ec044bb555404c800bc9fd9ec262ec147edd5989ccd0c02cd3"
[[package]]
name = "windows_aarch64_msvc"
version = "0.52.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "09ec2a7bb152e2252b53fa7803150007879548bc709c039df7627cabbd05d469"
[[package]]
name = "windows_i686_gnu"
version = "0.52.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8e9b5ad5ab802e97eb8e295ac6720e509ee4c243f69d781394014ebfe8bbfa0b"
[[package]]
name = "windows_i686_gnullvm"
version = "0.52.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0eee52d38c090b3caa76c563b86c3a4bd71ef1a819287c19d586d7334ae8ed66"
[[package]]
name = "windows_i686_msvc"
version = "0.52.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "240948bc05c5e7c6dabba28bf89d89ffce3e303022809e73deaefe4f6ec56c66"
[[package]]
name = "windows_x86_64_gnu"
version = "0.52.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "147a5c80aabfbf0c7d901cb5895d1de30ef2907eb21fbbab29ca94c5b08b1a78"
[[package]]
name = "windows_x86_64_gnullvm"
version = "0.52.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "24d5b23dc417412679681396f2b49f3de8c1473deb516bd34410872eff51ed0d"
[[package]]
name = "windows_x86_64_msvc"
version = "0.52.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "589f6da84c646204747d1270a2a5661ea66ed1cced2631d546fdfb155959f9ec"


@ -1,6 +1,6 @@
[package]
name = "ripgrep"
version = "13.0.0" #:version
version = "14.1.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
ripgrep is a line-oriented search tool that recursively searches the current
@ -13,10 +13,18 @@ repository = "https://github.com/BurntSushi/ripgrep"
keywords = ["regex", "grep", "egrep", "search", "pattern"]
categories = ["command-line-utilities", "text-processing"]
license = "Unlicense OR MIT"
exclude = ["HomebrewFormula"]
exclude = [
"HomebrewFormula",
"/.github/",
"/ci/",
"/pkg/brew",
"/benchsuite/",
"/scripts/",
]
build = "build.rs"
autotests = false
edition = "2018"
edition = "2021"
rust-version = "1.72"
[[bin]]
bench = false
@ -41,31 +49,18 @@ members = [
]
[dependencies]
bstr = "0.2.12"
grep = { version = "0.2.8", path = "crates/grep" }
ignore = { version = "0.4.18", path = "crates/ignore" }
lazy_static = "1.1.0"
anyhow = "1.0.75"
bstr = "1.7.0"
grep = { version = "0.3.2", path = "crates/grep" }
ignore = { version = "0.4.23", path = "crates/ignore" }
lexopt = "0.3.0"
log = "0.4.5"
num_cpus = "1.8.0"
regex = "1.3.5"
serde_json = "1.0.23"
termcolor = "1.1.0"
[dependencies.clap]
version = "2.33.0"
default-features = false
features = ["suggestions"]
textwrap = { version = "0.16.0", default-features = false }
[target.'cfg(all(target_env = "musl", target_pointer_width = "64"))'.dependencies.jemallocator]
version = "0.3.0"
[build-dependencies]
lazy_static = "1.1.0"
[build-dependencies.clap]
version = "2.33.0"
default-features = false
features = ["suggestions"]
version = "0.5.0"
[dev-dependencies]
serde = "1.0.77"
@ -73,12 +68,30 @@ serde_derive = "1.0.77"
walkdir = "2"
[features]
simd-accel = ["grep/simd-accel"]
pcre2 = ["grep/pcre2"]
[profile.release]
debug = 1
[profile.release-lto]
inherits = "release"
opt-level = 3
debug = "none"
strip = "symbols"
debug-assertions = false
overflow-checks = false
lto = "fat"
panic = "abort"
incremental = false
codegen-units = 1
# This is the main way to strip binaries in the deb package created by
# 'cargo deb'. For other release binaries, we (currently) call 'strip'
# explicitly in the release process.
[profile.deb]
inherits = "release"
debug = false
[package.metadata.deb]
features = ["pcre2"]
section = "utils"


@ -1,11 +0,0 @@
[target.x86_64-unknown-linux-musl]
image = "burntsushi/cross:x86_64-unknown-linux-musl"
[target.i686-unknown-linux-gnu]
image = "burntsushi/cross:i686-unknown-linux-gnu"
[target.mips64-unknown-linux-gnuabi64]
image = "burntsushi/cross:mips64-unknown-linux-gnuabi64"
[target.arm-unknown-linux-gnueabihf]
image = "burntsushi/cross:arm-unknown-linux-gnueabihf"

FAQ.md

@ -61,18 +61,24 @@ patch release out with a fix. However, no promises are made.
Does ripgrep have a man page?
</h3>
Yes! Whenever ripgrep is compiled on a system with `asciidoctor` or `asciidoc`
present, then a man page is generated from ripgrep's argv parser. After
compiling ripgrep, you can find the man page like so from the root of the
repository:
Yes. If you installed ripgrep through a package manager on a Unix system, then
the man page should have been installed for you in the proper location, in
which case `man rg` should just work.
Otherwise, you can ask ripgrep to generate the man page:
```
$ find ./target -name rg.1 -print0 | xargs -0 ls -t | head -n1
./target/debug/build/ripgrep-79899d0edd4129ca/out/rg.1
$ mkdir -p man/man1
$ rg --generate man > man/man1/rg.1
$ MANPATH="$PWD/man" man rg
```
Running `man -l ./target/debug/build/ripgrep-79899d0edd4129ca/out/rg.1` will
show the man page in your normal pager.
Or, if your version of `man` supports the `-l/--local-file` flag, then this
will suffice:
```
$ rg --generate man | man -l -
```
Note that the man page's documentation for options is equivalent to the output
shown in `rg --help`. To see more condensed documentation (one line per flag),
@ -86,22 +92,59 @@ The man page is also included in all
Does ripgrep have support for shell auto-completion?
</h3>
Yes! Shell completions can be found in the
[same directory as the man page](#manpage)
after building ripgrep. Zsh completions are maintained separately and committed
to the repository in `complete/_rg`.
Yes! If you installed ripgrep through a package manager on a Unix system, then
the shell completion files included in the release archive should have been
installed for you automatically. If not, you can generate completions using
ripgrep's command line interface.
Shell completions are also included in all
[ripgrep binary releases](https://github.com/BurntSushi/ripgrep/releases).
For **bash**:
For **bash**, move `rg.bash` to
`$XDG_CONFIG_HOME/bash_completion` or `/etc/bash_completion.d/`.
```
$ dir="$XDG_CONFIG_HOME/bash_completion"
$ mkdir -p "$dir"
$ rg --generate complete-bash > "$dir/rg.bash"
```
For **fish**, move `rg.fish` to `$HOME/.config/fish/completions/`.
For **fish**:
For **zsh**, move `_rg` to one of your `$fpath` directories.
```
$ dir="$XDG_CONFIG_HOME/fish/completions"
$ mkdir -p "$dir"
$ rg --generate complete-fish > "$dir/rg.fish"
```
For **PowerShell**, add `. _rg.ps1` to your PowerShell
For **zsh**, the recommended approach is:
```zsh
$ dir="$HOME/.zsh-complete"
$ mkdir -p "$dir"
$ rg --generate complete-zsh > "$dir/_rg"
```
And then add `$HOME/.zsh-complete` to your `fpath` in, e.g., your
`$HOME/.zshrc` file:
```zsh
fpath=($HOME/.zsh-complete $fpath)
```
Or if you'd prefer to load and generate completions at the same time, you can
add the following to your `$HOME/.zshrc` file:
```zsh
source <(rg --generate complete-zsh)
```
Note though that while this approach is easier to set up, it is generally
slower than the previous method and will add more time to loading your shell
prompt.
For **PowerShell**, create the completions:
```
$ rg --generate complete-powershell > _rg.ps1
```
And then add `. _rg.ps1` to your PowerShell
[profile](https://technet.microsoft.com/en-us/library/bb613488(v=vs.85).aspx)
(note the leading period). If the `_rg.ps1` file is not on your `PATH`, do
`. /path/to/_rg.ps1` instead.
@ -1010,15 +1053,11 @@ tools like ack or The Silver Searcher weren't already doing.
How can I donate to ripgrep or its maintainers?
</h3>
As of now, you can't. While I believe the various efforts that are being
undertaken to help fund FOSS are extremely important, they aren't a good fit
for me. ripgrep is and I hope will remain a project of love that I develop in
my free time. As such, involving money---even in the form of donations given
without expectations---would severely change that dynamic for me personally.
I welcome [sponsorship](https://github.com/sponsors/BurntSushi/).
Instead, I'd recommend donating to something else that is doing work that you
find meaningful. If you would like suggestions, then my favorites are:
Or if you'd prefer, donating to a charitable organization that you like would
also be most welcome. My favorites are:
* [The Internet Archive](https://archive.org/donate/)
* [Rails Girls](https://railsgirlssummerofcode.org/campaign/)
* [Rails Girls](https://railsgirlssummerofcode.org/)
* [Wikipedia](https://wikimediafoundation.org/support/)


@ -178,11 +178,15 @@ search. By default, when you search a directory, ripgrep will ignore all of
the following:
1. Files and directories that match glob patterns in these three categories:
1. gitignore globs (including global and repo-specific globs).
1. `.gitignore` globs (including global and repo-specific globs). This
includes `.gitignore` files in parent directories that are part of the
same `git` repository. (Unless the `--no-require-git` flag is given.)
2. `.ignore` globs, which take precedence over all gitignore globs
when there's a conflict.
when there's a conflict. This includes `.ignore` files in parent
directories.
3. `.rgignore` globs, which take precedence over all `.ignore` globs
when there's a conflict.
when there's a conflict. This includes `.rgignore` files in parent
directories.
2. Hidden files and directories.
3. Binary files. (ripgrep considers any file with a `NUL` byte to be binary.)
4. Symbolic links aren't followed.
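The precedence between ignore files can be sketched with a throwaway directory. This is a minimal illustration, not part of the guide: the directory and file names are made up, and the search only runs if `rg` is on your `PATH`. It relies on the rule above that `.rgignore` globs win over `.ignore` globs when they conflict.

```shell
# Hypothetical demo directory; every file name below is made up.
demo="$(mktemp -d)"
printf '*.log\n'     > "$demo/.ignore"     # ignore every .log file
printf '!keep.log\n' > "$demo/.rgignore"   # re-include one of them
printf 'needle\n'    > "$demo/keep.log"
printf 'needle\n'    > "$demo/other.log"

if command -v rg >/dev/null 2>&1; then
    # The whitelist glob in .rgignore overrides the *.log glob in .ignore,
    # so only keep.log should be searched; other.log stays ignored.
    found="$(rg -l needle "$demo" || true)"
    printf '%s\n' "$found"
else
    echo "rg not found; skipping the search"
fi
```

Swapping the contents of the two ignore files should leave `keep.log` ignored, since the whitelist would then sit in the lower-precedence file.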
@ -190,7 +194,8 @@ the following:
All of these things can be toggled using various flags provided by ripgrep:
1. You can disable all ignore-related filtering with the `--no-ignore` flag.
2. Hidden files and directories can be searched with the `--hidden` flag.
2. Hidden files and directories can be searched with the `--hidden` (`-.` for
short) flag.
3. Binary files can be searched via the `--text` (`-a` for short) flag.
Be careful with this flag! Binary files may emit control characters to your
terminal, which might cause strange behavior.
@ -566,12 +571,15 @@ $ cat $HOME/.ripgreprc
--type-add
web:*.{html,css,js}*
# Search hidden files / directories (e.g. dotfiles) by default
--hidden
# Using glob patterns to include/exclude files or folders
--glob=!git/*
--glob=!.git/*
# or
--glob
!git/*
!.git/*
# Set the colors.
--colors=line:none
@ -993,7 +1001,7 @@ used options that will likely impact how you use ripgrep on a regular basis.
if the pattern contains any uppercase letters. Usually this flag is put into
alias or a config file.
* `-F/--fixed-strings`: Disable regular expression matching and treat the pattern
as a literal string.
as a literal string.
* `-w/--word-regexp`: Require that all matches of the pattern be surrounded
by word boundaries. That is, given `pattern`, the `--word-regexp` flag will
cause ripgrep to behave as if `pattern` were actually `\b(?:pattern)\b`.
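As a quick check of the `--word-regexp` description, the sketch below searches the same sample twice: once with `-w` and once with the boundaries written out by hand. The sample file and its contents are made up for illustration, and the comparison only runs if `rg` is installed.

```shell
# Hypothetical sample input: "foo" appears as a whole word on two lines
# and as a prefix of a longer word on one line.
sample="$(mktemp)"
printf 'foo\nfoobar\nbar foo baz\n' > "$sample"

if command -v rg >/dev/null 2>&1; then
    with_flag="$(rg -w 'foo' "$sample" || true)"
    by_hand="$(rg '\b(?:foo)\b' "$sample" || true)"
    # For this simple pattern both searches should select the same lines.
    if [ "$with_flag" = "$by_hand" ]; then
        echo "same matches"
    fi
else
    echo "rg not found; skipping the comparison"
fi
```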

README.md

@ -2,11 +2,11 @@ ripgrep (rg)
------------
ripgrep is a line-oriented search tool that recursively searches the current
directory for a regex pattern. By default, ripgrep will respect gitignore rules
and automatically skip hidden files/directories and binary files. ripgrep
has first class support on Windows, macOS and Linux, with binary downloads
available for [every release](https://github.com/BurntSushi/ripgrep/releases).
ripgrep is similar to other popular search tools like The Silver Searcher, ack
and grep.
and automatically skip hidden files/directories and binary files. (To disable
all automatic filtering by default, use `rg -uuu`.) ripgrep has first class
support on Windows, macOS and Linux, with binary downloads available for [every
release](https://github.com/BurntSushi/ripgrep/releases). ripgrep is similar to
other popular search tools like The Silver Searcher, ack and grep.
[![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
[![Crates.io](https://img.shields.io/crates/v/ripgrep.svg)](https://crates.io/crates/ripgrep)
@ -42,7 +42,7 @@ This example searches the entire
[Linux kernel source tree](https://github.com/BurntSushi/linux)
(after running `make defconfig && make -j8`) for `[A-Z]+_SUSPEND`, where
all matches must be words. Timings were collected on a system with an Intel
i7-6900K 3.2 GHz.
i9-12900K 5.2 GHz.
Please remember that a single benchmark is never enough! See my
[blog post on ripgrep](https://blog.burntsushi.net/ripgrep/)
@ -50,13 +50,14 @@ for a very detailed comparison with more benchmarks and analysis.
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
| ripgrep (Unicode) | `rg -n -w '[A-Z]+_SUSPEND'` | 452 | **0.136s** |
| [git grep](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `git grep -P -n -w '[A-Z]+_SUSPEND'` | 452 | 0.348s |
| [ugrep (Unicode)](https://github.com/Genivia/ugrep) | `ugrep -r --ignore-files --no-hidden -I -w '[A-Z]+_SUSPEND'` | 452 | 0.506s |
| [The Silver Searcher](https://github.com/ggreer/the_silver_searcher) | `ag -w '[A-Z]+_SUSPEND'` | 452 | 0.654s |
| [git grep](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=C git grep -E -n -w '[A-Z]+_SUSPEND'` | 452 | 1.150s |
| [ack](https://github.com/beyondgrep/ack3) | `ack -w '[A-Z]+_SUSPEND'` | 452 | 4.054s |
| [git grep (Unicode)](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=en_US.UTF-8 git grep -E -n -w '[A-Z]+_SUSPEND'` | 452 | 4.205s |
| ripgrep (Unicode) | `rg -n -w '[A-Z]+_SUSPEND'` | 536 | **0.082s** (1.00x) |
| [hypergrep](https://github.com/p-ranav/hypergrep) | `hgrep -n -w '[A-Z]+_SUSPEND'` | 536 | 0.167s (2.04x) |
| [git grep](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `git grep -P -n -w '[A-Z]+_SUSPEND'` | 536 | 0.273s (3.34x) |
| [The Silver Searcher](https://github.com/ggreer/the_silver_searcher) | `ag -w '[A-Z]+_SUSPEND'` | 534 | 0.443s (5.43x) |
| [ugrep](https://github.com/Genivia/ugrep) | `ugrep -r --ignore-files --no-hidden -I -w '[A-Z]+_SUSPEND'` | 536 | 0.639s (7.82x) |
| [git grep](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=C git grep -E -n -w '[A-Z]+_SUSPEND'` | 536 | 0.727s (8.91x) |
| [git grep (Unicode)](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=en_US.UTF-8 git grep -E -n -w '[A-Z]+_SUSPEND'` | 536 | 2.670s (32.70x) |
| [ack](https://github.com/beyondgrep/ack3) | `ack -w '[A-Z]+_SUSPEND'` | 2677 | 2.935s (35.94x) |
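The multiplier in the Time column appears to be each tool's time divided by ripgrep's time. A minimal sketch of that arithmetic, using the rounded hypergrep and ripgrep times from the table (the variable names are made up):

```shell
# Times in seconds, copied from the table above.
rg_time=0.082
hgrep_time=0.167

# Relative slowdown = tool time / ripgrep time, formatted like the table.
ratio="$(awk -v t="$hgrep_time" -v base="$rg_time" 'BEGIN { printf "%.2fx", t / base }')"
echo "$ratio"   # prints 2.04x
```

Recomputing from the rounded times does not reproduce every row exactly (for example, 0.273 / 0.082 rounds to 3.33x while the table shows 3.34x), which suggests the published multipliers were derived from unrounded timings.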
Here's another benchmark on the same corpus as above that disregards gitignore
files and searches with a whitelist instead. The corpus is the same as in the
@ -65,24 +66,52 @@ doing equivalent work:
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
| ripgrep | `rg -uuu -tc -n -w '[A-Z]+_SUSPEND'` | 388 | **0.096s** |
| [ugrep](https://github.com/Genivia/ugrep) | `ugrep -r -n --include='*.c' --include='*.h' -w '[A-Z]+_SUSPEND'` | 388 | 0.493s |
| [GNU grep](https://www.gnu.org/software/grep/) | `egrep -r -n --include='*.c' --include='*.h' -w '[A-Z]+_SUSPEND'` | 388 | 0.806s |
| ripgrep | `rg -uuu -tc -n -w '[A-Z]+_SUSPEND'` | 447 | **0.063s** (1.00x) |
| [ugrep](https://github.com/Genivia/ugrep) | `ugrep -r -n --include='*.c' --include='*.h' -w '[A-Z]+_SUSPEND'` | 447 | 0.607s (9.62x) |
| [GNU grep](https://www.gnu.org/software/grep/) | `grep -E -r -n --include='*.c' --include='*.h' -w '[A-Z]+_SUSPEND'` | 447 | 0.674s (10.69x) |
And finally, a straight-up comparison between ripgrep, ugrep and GNU grep on a
single large file cached in memory
(~13GB, [`OpenSubtitles.raw.en.gz`](http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/mono/OpenSubtitles.raw.en.gz)):
Now we'll move to searching a single large file. Here is a straight-up
comparison between ripgrep, ugrep and GNU grep on a file cached in memory
(~13GB, [`OpenSubtitles.raw.en.gz`](http://opus.nlpl.eu/download.php?f=OpenSubtitles/v2018/mono/OpenSubtitles.raw.en.gz), decompressed):
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
| ripgrep | `rg -w 'Sherlock [A-Z]\w+'` | 7882 | **2.769s** |
| [ugrep](https://github.com/Genivia/ugrep) | `ugrep -w 'Sherlock [A-Z]\w+'` | 7882 | 6.802s |
| [GNU grep](https://www.gnu.org/software/grep/) | `LC_ALL=en_US.UTF-8 egrep -w 'Sherlock [A-Z]\w+'` | 7882 | 9.027s |
| ripgrep (Unicode) | `rg -w 'Sherlock [A-Z]\w+'` | 7882 | **1.042s** (1.00x) |
| [ugrep](https://github.com/Genivia/ugrep) | `ugrep -w 'Sherlock [A-Z]\w+'` | 7882 | 1.339s (1.28x) |
| [GNU grep (Unicode)](https://www.gnu.org/software/grep/) | `LC_ALL=en_US.UTF-8 egrep -w 'Sherlock [A-Z]\w+'` | 7882 | 6.577s (6.31x) |
In the above benchmark, passing the `-n` flag (for showing line numbers)
increases the times to `3.423s` for ripgrep and `13.031s` for GNU grep. ugrep
increases the times to `1.664s` for ripgrep and `9.484s` for GNU grep. ugrep
times are unaffected by the presence or absence of `-n`.
Beware of performance cliffs though:
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
| ripgrep (Unicode) | `rg -w '[A-Z]\w+ Sherlock [A-Z]\w+'` | 485 | **1.053s** (1.00x) |
| [GNU grep (Unicode)](https://www.gnu.org/software/grep/) | `LC_ALL=en_US.UTF-8 grep -E -w '[A-Z]\w+ Sherlock [A-Z]\w+'` | 485 | 6.234s (5.92x) |
| [ugrep](https://github.com/Genivia/ugrep) | `ugrep -w '[A-Z]\w+ Sherlock [A-Z]\w+'` | 485 | 28.973s (27.51x) |
And performance can drop precipitously across the board when searching big
files for patterns without any opportunities for literal optimizations:
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
| ripgrep | `rg '[A-Za-z]{30}'` | 6749 | **15.569s** (1.00x) |
| [ugrep](https://github.com/Genivia/ugrep) | `ugrep -E '[A-Za-z]{30}'` | 6749 | 21.857s (1.40x) |
| [GNU grep](https://www.gnu.org/software/grep/) | `LC_ALL=C grep -E '[A-Za-z]{30}'` | 6749 | 32.409s (2.08x) |
| [GNU grep (Unicode)](https://www.gnu.org/software/grep/) | `LC_ALL=en_US.UTF-8 grep -E '[A-Za-z]{30}'` | 6795 | 8m30s (32.74x) |
Finally, high match counts also tend to both tank performance and smooth
out the differences between tools (because performance is dominated by how
quickly one can handle a match and not the algorithm used to detect the match,
generally speaking):
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
| ripgrep | `rg the` | 83499915 | **6.948s** (1.00x) |
| [ugrep](https://github.com/Genivia/ugrep) | `ugrep the` | 83499915 | 11.721s (1.69x) |
| [GNU grep](https://www.gnu.org/software/grep/) | `LC_ALL=C grep the` | 83499915 | 15.217s (2.19x) |
### Why should I use ripgrep?
@ -90,16 +119,16 @@ times are unaffected by the presence or absence of `-n`.
because it contains most of their features and is generally faster. (See
[the FAQ](FAQ.md#posix4ever) for more details on whether ripgrep can truly
replace grep.)
* Like other tools specialized to code search, ripgrep defaults to recursive
directory search and won't search files ignored by your
`.gitignore`/`.ignore`/`.rgignore` files. It also ignores hidden and binary
files by default. ripgrep also implements full support for `.gitignore`,
whereas there are many bugs related to that functionality in other code
search tools claiming to provide the same functionality.
* ripgrep can search specific types of files. For example, `rg -tpy foo`
limits your search to Python files and `rg -Tjs foo` excludes JavaScript
files from your search. ripgrep can be taught about new file types with
custom matching rules.
* Like other tools specialized to code search, ripgrep defaults to
[recursive search](GUIDE.md#recursive-search) and does [automatic
filtering](GUIDE.md#automatic-filtering). Namely, ripgrep won't search files
ignored by your `.gitignore`/`.ignore`/`.rgignore` files, it won't search
hidden files and it won't search binary files. Automatic filtering can be
disabled with `rg -uuu`.
* ripgrep can [search specific types of files](GUIDE.md#manual-filtering-file-types).
For example, `rg -tpy foo` limits your search to Python files and `rg -Tjs
foo` excludes JavaScript files from your search. ripgrep can be taught about
new file types with custom matching rules.
* ripgrep supports many features found in `grep`, such as showing the context
of search results, searching multiple patterns, highlighting matches with
color and full Unicode support. Unlike GNU grep, ripgrep stays fast while
@ -109,17 +138,21 @@ times are unaffected by the presence or absence of `-n`.
backreferences in your patterns, which are not supported in ripgrep's default
regex engine. PCRE2 support can be enabled with `-P/--pcre2` (use PCRE2
always) or `--auto-hybrid-regex` (use PCRE2 only if needed). An alternative
syntax is provided via the `--engine (default|pcre2|auto-hybrid)` option.
* ripgrep supports searching files in text encodings other than UTF-8, such
as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for
automatically detecting UTF-16 is provided. Other text encodings must be
specifically specified with the `-E/--encoding` flag.)
syntax is provided via the `--engine (default|pcre2|auto)` option.
* ripgrep has [rudimentary support for replacements](GUIDE.md#replacements),
which permit rewriting output based on what was matched.
* ripgrep supports [searching files in text encodings](GUIDE.md#file-encoding)
other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more.
(Some support for automatically detecting UTF-16 is provided. Other text
encodings must be explicitly specified with the `-E/--encoding` flag.)
* ripgrep supports searching files compressed in a common format (brotli,
bzip2, gzip, lz4, lzma, xz, or zstandard) with the `-z/--search-zip` flag.
* ripgrep supports
[arbitrary input preprocessing filters](GUIDE.md#preprocessor)
which could be PDF text extraction, less supported decompression, decrypting,
automatic encoding detection and so on.
* ripgrep can be configured via a
[configuration file](GUIDE.md#configuration-file).
In other words, use ripgrep if you like speed, filtering by default, fewer
bugs and Unicode support.
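To see automatic filtering and `rg -uuu` side by side, here is a small sketch under stated assumptions: the directory and file names are hypothetical, and the searches only run if `rg` is installed. One file is hidden and the other is excluded by an `.ignore` rule, so a default search should find nothing while `-uuu` should find both.

```shell
# Hypothetical demo directory; all names are made up.
demo="$(mktemp -d)"
printf 'needle\n'      > "$demo/.hidden.txt"   # hidden file
printf 'needle\n'      > "$demo/skipped.txt"   # excluded by the rule below
printf 'skipped.txt\n' > "$demo/.ignore"

if command -v rg >/dev/null 2>&1; then
    # Default search: both files are filtered out, so the list is empty.
    default_list="$(rg -l needle "$demo" || true)"
    # -uuu disables ignore rules, hidden-file filtering and binary filtering.
    all_list="$(rg -uuu -l needle "$demo" || true)"
    printf 'default search:\n%s\n' "$default_list"
    printf 'with -uuu:\n%s\n' "$all_list"
else
    echo "rg not found; skipping the searches"
fi
```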
@ -187,6 +220,16 @@ configuration files, passthru, support for searching compressed files,
multiline search and opt-in fancy regex support via PCRE2.
### Playground
If you'd like to try ripgrep before installing, there's an unofficial
[playground](https://codapi.org/ripgrep/) and an [interactive
tutorial](https://codapi.org/try/ripgrep/).
If you have any questions about these, please open an issue in the [tutorial
repo](https://github.com/nalgeon/tryxinyminutes).
### Installation
The binary name for ripgrep is `rg`.
@ -224,17 +267,25 @@ If you're a **Windows Scoop** user, then you can install ripgrep from the
$ scoop install ripgrep
```
If you're a **Windows Winget** user, then you can install ripgrep from the
[winget-pkgs](https://github.com/microsoft/winget-pkgs/tree/master/manifests/b/BurntSushi/ripgrep)
repository:
```
$ winget install BurntSushi.ripgrep.MSVC
```
If you're an **Arch Linux** user, then you can install ripgrep from the official repos:
```
$ pacman -S ripgrep
$ sudo pacman -S ripgrep
```
If you're a **Gentoo** user, you can install ripgrep from the
[official repo](https://packages.gentoo.org/packages/sys-apps/ripgrep):
```
$ emerge sys-apps/ripgrep
$ sudo emerge sys-apps/ripgrep
```
If you're a **Fedora** user, you can install ripgrep from official
@ -255,6 +306,7 @@ If you're a **RHEL/CentOS 7/8** user, you can install ripgrep from
[copr](https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/):
```
$ sudo yum install -y yum-utils
$ sudo yum-config-manager --add-repo=https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/repo/epel-7/carlwgeorge-ripgrep-epel-7.repo
$ sudo yum install ripgrep
```
@ -264,7 +316,19 @@ If you're a **Nix** user, you can install ripgrep from
```
$ nix-env --install ripgrep
$ # (Or using the attribute name, which is also ripgrep.)
```
If you're a **Flox** user, you can install ripgrep as follows:
```
$ flox install ripgrep
```
If you're a **Guix** user, you can install ripgrep from the official
package collection:
```
$ guix install ripgrep
```
If you're a **Debian** user (or a user of a Debian derivative like **Ubuntu**),
@ -272,12 +336,14 @@ then ripgrep can be installed using a binary `.deb` file provided in each
[ripgrep release](https://github.com/BurntSushi/ripgrep/releases).
```
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/13.0.0/ripgrep_13.0.0_amd64.deb
$ sudo dpkg -i ripgrep_13.0.0_amd64.deb
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/14.1.0/ripgrep_14.1.0-1_amd64.deb
$ sudo dpkg -i ripgrep_14.1.0-1_amd64.deb
```
If you run Debian Buster (currently Debian stable) or Debian sid, ripgrep is
[officially maintained by Debian](https://tracker.debian.org/pkg/rust-ripgrep).
If you run Debian stable, ripgrep is [officially maintained by
Debian](https://tracker.debian.org/pkg/rust-ripgrep), although its version may
be older than the `deb` package available in the previous step.
```
$ sudo apt-get install ripgrep
```
@ -295,11 +361,18 @@ seem to work right and generate a number of very strange bug reports that I
don't know how to fix and don't have the time to fix. Therefore, it is no
longer a recommended installation option.)
If you're an **ALT** user, you can install ripgrep from the
[official repo](https://packages.altlinux.org/en/search?name=ripgrep):
```
$ sudo apt-get install ripgrep
```
If you're a **FreeBSD** user, then you can install ripgrep from the
[official ports](https://www.freshports.org/textproc/ripgrep/):
```
# pkg install ripgrep
$ sudo pkg install ripgrep
```
If you're an **OpenBSD** user, then you can install ripgrep from the
@ -313,26 +386,33 @@ If you're a **NetBSD** user, then you can install ripgrep from
[pkgsrc](https://pkgsrc.se/textproc/ripgrep):
```
# pkgin install ripgrep
$ sudo pkgin install ripgrep
```
If you're a **Haiku x86_64** user, then you can install ripgrep from the
[official ports](https://github.com/haikuports/haikuports/tree/master/sys-apps/ripgrep):
```
$ pkgman install ripgrep
$ sudo pkgman install ripgrep
```
If you're a **Haiku x86_gcc2** user, then you can install ripgrep from the
same port as Haiku x86_64 using the x86 secondary architecture build:
```
$ pkgman install ripgrep_x86
$ sudo pkgman install ripgrep_x86
```
If you're a **Void Linux** user, then you can install ripgrep from the
[official repository](https://voidlinux.org/packages/?arch=x86_64&q=ripgrep):
```
$ sudo xbps-install -Syv ripgrep
```
If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
* Note that the minimum supported version of Rust for ripgrep is **1.34.0**,
* Note that the minimum supported version of Rust for ripgrep is **1.72.0**,
although ripgrep may work with older versions.
* Note that the binary may be bigger than expected because it contains debug
symbols. This is intentional. To remove debug symbols and therefore reduce
@ -342,12 +422,20 @@ If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
$ cargo install ripgrep
```
Alternatively, one can use [`cargo
binstall`](https://github.com/cargo-bins/cargo-binstall) to install a ripgrep
binary directly from GitHub:
```
$ cargo binstall ripgrep
```
### Building
ripgrep is written in Rust, so you'll need to grab a
[Rust installation](https://www.rust-lang.org/) in order to compile it.
ripgrep compiles with Rust 1.34.0 (stable) or newer. In general, ripgrep tracks
ripgrep compiles with Rust 1.72.0 (stable) or newer. In general, ripgrep tracks
the latest stable release of the Rust compiler.
To build ripgrep:
@ -360,18 +448,13 @@ $ ./target/release/rg --version
0.1.3
```
If you have a Rust nightly compiler and a recent Intel CPU, then you can enable
additional optional SIMD acceleration like so:
```
RUSTFLAGS="-C target-cpu=native" cargo build --release --features 'simd-accel'
```
The `simd-accel` feature enables SIMD support in certain ripgrep dependencies
(responsible for transcoding). They are not necessary to get SIMD optimizations
for search; those are enabled automatically. Hopefully, some day, the
`simd-accel` feature will similarly become unnecessary. **WARNING:** Currently,
enabling this option can increase compilation times dramatically.
**NOTE:** In the past, ripgrep supported a `simd-accel` Cargo feature when
using a Rust nightly compiler. This only benefited UTF-16 transcoding.
Since it required unstable features, this build mode was prone to breakage.
Because of that, support for it has been removed. If you want SIMD
optimizations for UTF-16 transcoding, then you'll have to petition the
[`encoding_rs`](https://github.com/hsivonen/encoding_rs) project to use stable
APIs.
Finally, optional PCRE2 support can be built with ripgrep by enabling the
`pcre2` feature:
@ -380,9 +463,6 @@ Finally, optional PCRE2 support can be built with ripgrep by enabling the
$ cargo build --release --features 'pcre2'
```
(Tip: use `--features 'pcre2 simd-accel'` to also include compile time SIMD
optimizations, which will only work with a nightly compiler.)
Enabling the PCRE2 feature works with a stable Rust compiler and will
attempt to automatically find and link with your system's PCRE2 library via
`pkg-config`. If one doesn't exist, then ripgrep will build PCRE2 from source
@ -419,12 +499,20 @@ $ cargo test --all
from the repository root.
### Related tools
* [delta](https://github.com/dandavison/delta) is a syntax highlighting
pager that supports the `rg --json` output format. So all you need to do to
make it work is `rg --json pattern | delta`. See [delta's manual section on
grep](https://dandavison.github.io/delta/grep.html) for more details.
### Vulnerability reporting
For reporting a security vulnerability, please
[contact Andrew Gallant](https://blog.burntsushi.net/about/),
which has my email address and PGP public key if you wish to send an encrypted
message.
[contact Andrew Gallant](https://blog.burntsushi.net/about/).
The contact page has my email address and PGP public key if you wish to send an
encrypted message.
### Translations


@ -1,11 +1,12 @@
Release Checklist
-----------------
# Release Checklist
* Ensure local `master` is up to date with respect to `origin/master`.
* Run `cargo update` and review dependency updates. Commit updated
`Cargo.lock`.
* Run `cargo outdated` and review semver incompatible updates. Unless there is
a strong motivation otherwise, review and update every dependency. Also
run `--aggressive`, but don't update to crates that are still in beta.
* Update date in `crates/core/flags/doc/template.rg.1`.
* Review changes for every crate in `crates` since the last ripgrep release.
If the set of changes is non-empty, issue a new release for that crate. Check
crates in the following order. After updating a crate, ensure minimal
@ -26,7 +27,8 @@ Release Checklist
`cargo update -p ripgrep` so that the `Cargo.lock` is updated. Commit the
changes and create a new signed tag. Alternatively, use
`cargo-up --no-push --no-release Cargo.toml {VERSION}` to automate this.
* Push changes to GitHub, NOT including the tag. (But do not publish new
* Run `cargo package` and ensure it succeeds.
* Push changes to GitHub, NOT including the tag. (But do not publish a new
version of ripgrep to crates.io yet.)
* Once CI for `master` finishes successfully, push the version tag. (Trying to
do this in one step seems to result in GitHub Actions not seeing the tag
@ -39,8 +41,8 @@ Release Checklist
> tool that recursively searches the current directory for a regex pattern.
> By default, ripgrep will respect gitignore rules and automatically skip
> hidden files/directories and binary files.
* Run `ci/build-deb` locally and manually upload the deb package to the
release.
* Run `git checkout {VERSION} && ci/build-and-publish-m2 {VERSION}` on a macOS
system with Apple silicon.
* Run `cargo publish`.
* Run `ci/sha256-releases {VERSION} >> pkg/brew/ripgrep-bin.rb`. Then edit
`pkg/brew/ripgrep-bin.rb` to update the version number and sha256 hashes.
@ -52,5 +54,6 @@ Release Checklist
Unreleased changes. Release notes have not yet been written.
```
Note that
[`cargo-up` can be found in BurntSushi's dotfiles](https://github.com/BurntSushi/dotfiles/blob/master/bin/cargo-up).
Note that [`cargo-up` can be found in BurntSushi's dotfiles][dotfiles].
[dotfiles]: https://github.com/BurntSushi/dotfiles/blob/master/bin/cargo-up


@ -26,15 +26,13 @@ SUBTITLES_DIR = 'subtitles'
SUBTITLES_EN_NAME = 'en.txt'
SUBTITLES_EN_NAME_SAMPLE = 'en.sample.txt'
SUBTITLES_EN_NAME_GZ = '%s.gz' % SUBTITLES_EN_NAME
# SUBTITLES_EN_URL = 'http://opus.lingfil.uu.se/OpenSubtitles2016/mono/OpenSubtitles2016.raw.en.gz' # noqa
SUBTITLES_EN_URL = 'https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2016/mono/en.txt.gz' # noqa
SUBTITLES_RU_NAME = 'ru.txt'
SUBTITLES_RU_NAME_GZ = '%s.gz' % SUBTITLES_RU_NAME
# SUBTITLES_RU_URL = 'http://opus.lingfil.uu.se/OpenSubtitles2016/mono/OpenSubtitles2016.raw.ru.gz' # noqa
SUBTITLES_RU_URL = 'https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2016/mono/ru.txt.gz' # noqa
LINUX_DIR = 'linux'
LINUX_CLONE = 'git://github.com/BurntSushi/linux'
LINUX_CLONE = 'https://github.com/BurntSushi/linux'
# Grep takes locale settings from the environment. There is a *substantial*
# performance impact for enabling Unicode, so we need to handle this explicitly
@ -546,7 +544,11 @@ def bench_subtitles_ru_literal(suite_dir):
Command('rg (lines)', ['rg', '-n', pat, ru]),
Command('ag (lines)', ['ag', '-s', pat, ru]),
Command('grep (lines)', ['grep', '-n', pat, ru], env=GREP_ASCII),
Command('ugrep (lines)', ['ugrep', '-n', pat, ru])
# ugrep incorrectly identifies this corpus as binary, but it is
# entirely valid UTF-8. So we tell ugrep to always treat the corpus
# as text even though this technically gives it an edge over other
# tools. (It no longer needs to check for binary data.)
Command('ugrep (lines)', ['ugrep', '-a', '-n', pat, ru])
])
@ -564,7 +566,8 @@ def bench_subtitles_ru_literal_casei(suite_dir):
Command('grep (ASCII)', ['grep', '-E', '-i', pat, ru], env=GREP_ASCII),
Command('rg (lines)', ['rg', '-n', '-i', pat, ru]),
Command('ag (lines) (ASCII)', ['ag', '-i', pat, ru]),
Command('ugrep (lines) (ASCII)', ['ugrep', '-n', '-i', pat, ru])
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (lines) (ASCII)', ['ugrep', '-a', '-n', '-i', pat, ru])
])
@ -588,7 +591,8 @@ def bench_subtitles_ru_literal_word(suite_dir):
Command('grep (ASCII)', [
'grep', '-nw', pat, ru,
], env=GREP_ASCII),
Command('ugrep (ASCII)', ['ugrep', '-nw', pat, ru]),
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (ASCII)', ['ugrep', '-anw', pat, ru]),
Command('rg', ['rg', '-nw', pat, ru]),
Command('grep', ['grep', '-nw', pat, ru], env=GREP_UNICODE),
])
@ -612,7 +616,8 @@ def bench_subtitles_ru_alternate(suite_dir):
Command('rg (lines)', ['rg', '-n', pat, ru]),
Command('ag (lines)', ['ag', '-s', pat, ru]),
Command('grep (lines)', ['grep', '-E', '-n', pat, ru], env=GREP_ASCII),
Command('ugrep (lines)', ['ugrep', '-n', pat, ru]),
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (lines)', ['ugrep', '-an', pat, ru]),
Command('rg', ['rg', pat, ru]),
Command('grep', ['grep', '-E', pat, ru], env=GREP_ASCII),
])
@ -637,7 +642,8 @@ def bench_subtitles_ru_alternate_casei(suite_dir):
Command('grep (ASCII)', [
'grep', '-E', '-ni', pat, ru,
], env=GREP_ASCII),
Command('ugrep (ASCII)', ['ugrep', '-n', '-i', pat, ru]),
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (ASCII)', ['ugrep', '-ani', pat, ru]),
Command('rg', ['rg', '-n', '-i', pat, ru]),
Command('grep', ['grep', '-E', '-ni', pat, ru], env=GREP_UNICODE),
])
@ -654,10 +660,11 @@ def bench_subtitles_ru_surrounding_words(suite_dir):
return Benchmark(pattern=pat, commands=[
Command('rg', ['rg', '-n', pat, ru]),
Command('grep', ['grep', '-E', '-n', pat, ru], env=GREP_UNICODE),
Command('ugrep', ['ugrep', '-n', pat, ru]),
Command('ugrep', ['ugrep', '-an', pat, ru]),
Command('ag (ASCII)', ['ag', '-s', pat, ru]),
Command('grep (ASCII)', ['grep', '-E', '-n', pat, ru], env=GREP_ASCII),
Command('ugrep (ASCII)', ['ugrep', '-n', '-U', pat, ru]),
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (ASCII)', ['ugrep', '-a', '-n', '-U', pat, ru]),
])
@ -676,11 +683,13 @@ def bench_subtitles_ru_no_literal(suite_dir):
return Benchmark(pattern=pat, commands=[
Command('rg', ['rg', '-n', pat, ru]),
Command('ugrep', ['ugrep', '-n', pat, ru]),
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep', ['ugrep', '-an', pat, ru]),
Command('rg (ASCII)', ['rg', '-n', '(?-u)' + pat, ru]),
Command('ag (ASCII)', ['ag', '-s', pat, ru]),
Command('grep (ASCII)', ['grep', '-E', '-n', pat, ru], env=GREP_ASCII),
Command('ugrep (ASCII)', ['ugrep', '-n', '-U', pat, ru])
# See bench_subtitles_ru_literal for why we use '-a' here.
Command('ugrep (ASCII)', ['ugrep', '-anU', pat, ru])
])
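
The comments in this diff state that the Russian subtitle corpus is entirely valid UTF-8. One way to check that kind of claim independently is sketched below. The helper name `is_valid_utf8` and the chunk size are illustrative assumptions and are not part of the benchsuite script.

```python
import codecs


def is_valid_utf8(path, chunk_size=1 << 20):
    """Return True if the file at `path` decodes as UTF-8 from start to end.

    Hypothetical helper for illustration; not part of benchsuite.
    """
    # An incremental decoder copes with multi-byte sequences that are split
    # across chunk boundaries, so the file never has to fit in memory.
    decoder = codecs.getincrementaldecoder("utf-8")()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            try:
                decoder.decode(chunk)
            except UnicodeDecodeError:
                return False
    # final=True reports a sequence that was cut off at the end of the file.
    try:
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True
```

This only checks UTF-8 validity. It does not reproduce the binary-detection heuristic of any particular search tool, which may look at other signals as well.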


@ -0,0 +1,38 @@
This directory contains updated benchmarks as of 2022-12-16. They were captured
via the benchsuite script at `benchsuite/benchsuite` from the root of this
repository. The command that was run:
$ ./benchsuite \
--dir /dev/shm/benchsuite \
--raw runs/2022-12-16-archlinux-duff/raw.csv \
| tee runs/2022-12-16-archlinux-duff/summary
The versions of each tool are as follows:
$ rg --version
ripgrep 13.0.0 (rev 87c4a2b4b1)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)
$ grep -V
grep (GNU grep) 3.8
$ ag -V
ag version 2.2.0
Features:
+jit +lzma +zlib
$ git --version
git version 2.39.0
$ ugrep --version
ugrep 3.9.2 x86_64-pc-linux-gnu +avx2 +pcre2jit +zlib +bzip2 +lzma +lz4 +zstd
License BSD-3-Clause: <https://opensource.org/licenses/BSD-3-Clause>
Written by Robert van Engelen and others: <https://github.com/Genivia/ugrep>
The version of ripgrep used was compiled from source on commit 7f23cd63:
$ cargo build --release --features 'pcre2'
This was run on a machine with an Intel i9-12900K with 128GB of memory.
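
The `raw.csv` written by the command above records one row per timed iteration, with the columns `benchmark`, `warmup_iter`, `iter`, `name`, `command`, `duration`, `lines` and `env`. A minimal sketch of averaging those samples per tool is shown below. The function name `mean_durations` is an assumption for illustration, durations are assumed to be in seconds, and this is not how the benchsuite script itself produces its `summary` file.

```python
import csv
import io
import statistics
from collections import defaultdict


def mean_durations(raw_csv_text):
    """Return {(benchmark, name): mean duration} for the text of a raw.csv.

    Hypothetical helper for illustration; not part of benchsuite.
    """
    samples = defaultdict(list)
    for row in csv.DictReader(io.StringIO(raw_csv_text)):
        samples[(row["benchmark"], row["name"])].append(float(row["duration"]))
    return {key: statistics.mean(values) for key, values in samples.items()}


# A few rows copied from the raw.csv of this run.
SAMPLE = """\
benchmark,warmup_iter,iter,name,command,duration,lines,env
linux_literal_default,1,3,rg,rg PM_RESUME,0.08678817749023438,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08307123184204102,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08347964286804199,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2955434322357178,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2954287528991699,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2938194274902344,39,
"""

if __name__ == "__main__":
    for (benchmark, name), mean in sorted(mean_durations(SAMPLE).items()):
        print(f"{benchmark:24} {name:8} {mean:.3f}")
```

To summarize a full run, the same function can be given the contents of the file, for example `mean_durations(open("raw.csv").read())`.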


@ -0,0 +1,400 @@
benchmark,warmup_iter,iter,name,command,duration,lines,env
linux_literal_default,1,3,rg,rg PM_RESUME,0.08678817749023438,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08307123184204102,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08347964286804199,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2955434322357178,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2954287528991699,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2938194274902344,39,
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.23198556900024414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.22356963157653809,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.2189793586730957,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10710000991821289,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10364222526550293,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.1052248477935791,39,
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9994468688964844,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9939279556274414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9957931041717529,39,LC_ALL=en_US.UTF-8
linux_literal,1,3,rg,rg -n PM_RESUME,0.08603358268737793,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.0837090015411377,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.08435535430908203,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215503692626953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.32426929473876953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215982913970947,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2894856929779053,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2892603874206543,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.29217028617858887,39,
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.206068754196167,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.2218036651611328,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.20590710639953613,39,LC_ALL=C
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18692874908447266,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.19518327713012695,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18577361106872559,39,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08709383010864258,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08861064910888672,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08769798278808594,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.3218965530395508,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.30869364738464355,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.31044936180114746,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.2989068031311035,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.2996039390563965,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.29817700386047363,536,
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.2122786045074463,536,LC_ALL=C
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.20763754844665527,536,LC_ALL=C
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.220794677734375,536,LC_ALL=C
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.17305850982666016,536,
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.1745915412902832,536,
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.17526865005493164,536,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.08527851104736328,2160,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.08487534523010254,2160,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.0848684310913086,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.37945985794067383,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.36303210258483887,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.36359691619873047,2160,
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.9589834213256836,2160,LC_ALL=C
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.9206984043121338,2160,LC_ALL=C
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.8642933368682861,2160,LC_ALL=C
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.40503501892089844,2160,
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.4531714916229248,2160,
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.4397866725921631,2160,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08639907836914062,9,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08583569526672363,9,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08414363861083984,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.2853865623474121,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.2871377468109131,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.28753662109375,9,
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20428204536437988,9,LC_ALL=C
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20490717887878418,9,LC_ALL=C
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20840072631835938,9,LC_ALL=C
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.18790841102600098,9,
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.18659543991088867,9,
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.19104933738708496,9,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.19976496696472168,105,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.20618367195129395,105,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.19702935218811035,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.17758727073669434,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.17793798446655273,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.1872577667236328,105,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.19808244705200195,245,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.1979837417602539,245,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.1984400749206543,245,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.1819148063659668,105,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.17530512809753418,105,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.17999005317687988,105,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08527827262878418,247,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08541679382324219,247,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08553218841552734,247,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08484745025634766,233,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08466482162475586,233,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08487439155578613,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.3061795234680176,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.2993617057800293,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.29722046852111816,233,
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,4.257144451141357,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,3.852163076400757,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,3.8293941020965576,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.647632122039795,233,LC_ALL=C
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.6269629001617432,233,LC_ALL=C
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.5847914218902588,233,LC_ALL=C
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.1802208423614502,247,
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.17564702033996582,247,
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.1746981143951416,247,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.1799161434173584,233,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.18733000755310059,233,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.18859529495239258,233,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.26203155517578125,721,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.2615540027618408,721,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.2730247974395752,721,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.19902300834655762,720,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.20034146308898926,720,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.20192813873291016,720,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8269081115722656,1134,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8393104076385498,1134,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8293666839599609,1134,
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.334395408630371,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.338796854019165,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.36545991897583,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.1588926315307617,720,LC_ALL=C
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.132209062576294,720,LC_ALL=C
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.1407439708709717,720,LC_ALL=C
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.410162925720215,723,
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.405057668685913,723,
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.3945884704589844,723,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.23865604400634766,722,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.23371148109436035,722,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.2343149185180664,722,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08691263198852539,140,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08707070350646973,140,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08713960647583008,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.32947278022766113,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.33203840255737305,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.3292670249938965,140,
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.4576725959777832,140,LC_ALL=C
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.41936421394348145,140,LC_ALL=C
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.3639688491821289,140,LC_ALL=C
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17806458473205566,140,
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.18224716186523438,140,
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17795038223266602,140,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12421393394470215,241,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12235784530639648,241,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12151455879211426,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.529585599899292,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.5305526256561279,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.5311264991760254,241,
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.7589735984802246,241,LC_ALL=C
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.7852108478546143,241,LC_ALL=C
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.8308050632476807,241,LC_ALL=C
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17955923080444336,241,
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.1745290756225586,241,
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.1773686408996582,241,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1213979721069336,830,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1213991641998291,830,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.12620782852172852,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18207263946533203,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17281484603881836,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17368507385253906,830,
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.560560941696167,830,LC_ALL=C
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.563499927520752,830,LC_ALL=C
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.5916609764099121,830,LC_ALL=C
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.19600844383239746,830,
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18436980247497559,830,
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18594050407409668,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.871025562286377,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8636960983276367,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8680994510650635,830,
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9978001117706299,830,LC_ALL=C
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9385361671447754,830,LC_ALL=C
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0036489963531494,830,LC_ALL=C
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18918490409851074,830,
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1769108772277832,830,
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18808293342590332,830,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.21876287460327148,871,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2044692039489746,871,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2184743881225586,871,
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.224027156829834,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.223188877105713,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.223966598510742,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.671149492263794,871,LC_ALL=C
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.6705749034881592,871,LC_ALL=C
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.6700258255004883,871,LC_ALL=C
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2624058723449707,871,
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.25513339042663574,871,
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.26088857650756836,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.9144322872161865,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.866628885269165,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.9098389148712158,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.7860472202301025,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.7858343124389648,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.782252311706543,871,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.18424677848815918,830,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.19610810279846191,830,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.18711471557617188,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8301315307617188,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8689801692962646,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8279321193695068,830,
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0036842823028564,830,LC_ALL=C
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.002833604812622,830,LC_ALL=C
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9236147403717041,830,LC_ALL=C
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17717313766479492,830,
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18994617462158203,830,
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17972850799560547,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18804550170898438,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18867778778076172,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.19913530349731445,830,
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0044364929199219,830,LC_ALL=en_US.UTF-8
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0040032863616943,830,LC_ALL=en_US.UTF-8
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9627983570098877,830,LC_ALL=en_US.UTF-8
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24848055839538574,1094,
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24738383293151855,1094,
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24789118766784668,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.668708562850952,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.57511305809021,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.6714110374450684,1094,
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.0586187839508057,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.0227150917053223,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.075378179550171,1094,LC_ALL=C
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7863781452178955,1094,
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7874250411987305,1094,
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7867889404296875,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.18195557594299316,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.18239641189575195,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.1625690460205078,1094,
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6601614952087402,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6617567539215088,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6584677696228027,1094,LC_ALL=C
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,4.0028722286224365,1136,
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.991217851638794,1136,
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,4.00272274017334,1136,
subtitles_en_alternate_casei,1,3,grep (ASCII),grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.549154758453369,1136,LC_ALL=C
subtitles_en_alternate_casei,1,3,grep (ASCII),grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5468921661376953,1136,LC_ALL=C
subtitles_en_alternate_casei,1,3,grep (ASCII),grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5873491764068604,1136,LC_ALL=C
subtitles_en_alternate_casei,1,3,ugrep (ASCII),ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7872169017791748,1136,
subtitles_en_alternate_casei,1,3,ugrep (ASCII),ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.784674882888794,1136,
subtitles_en_alternate_casei,1,3,ugrep (ASCII),ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7882401943206787,1136,
subtitles_en_alternate_casei,1,3,rg,rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.4785435199737549,1136,
subtitles_en_alternate_casei,1,3,rg,rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.4940922260284424,1136,
subtitles_en_alternate_casei,1,3,rg,rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.4774627685546875,1136,
subtitles_en_alternate_casei,1,3,grep,grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5677175521850586,1136,LC_ALL=en_US.UTF-8
subtitles_en_alternate_casei,1,3,grep,grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.603273391723633,1136,LC_ALL=en_US.UTF-8
subtitles_en_alternate_casei,1,3,grep,grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5834741592407227,1136,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20238041877746582,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.2031264305114746,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20475172996520996,278,
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0288453102111816,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.044802188873291,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0432109832763672,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,43.00765633583069,278,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,42.832849740982056,278,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,42.915205240249634,278,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.083683967590332,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0841526985168457,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0850934982299805,,
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0116353034973145,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.9868073463439941,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0224814414978027,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8892502784729004,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8910088539123535,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8897674083709717,,
subtitles_en_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.11850643157959,22,
subtitles_en_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.1359670162200928,22,
subtitles_en_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.103114128112793,22,
subtitles_en_no_literal,1,3,ugrep,ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,13.050881385803223,22,
subtitles_en_no_literal,1,3,ugrep,ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,13.050772190093994,22,
subtitles_en_no_literal,1,3,ugrep,ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,13.05719804763794,22,
subtitles_en_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,1.9961926937103271,22,
subtitles_en_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.019721508026123,22,
subtitles_en_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,1.9965126514434814,22,
subtitles_en_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,6.849602222442627,302,
subtitles_en_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,6.813834190368652,302,
subtitles_en_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,6.8263633251190186,302,
subtitles_en_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,4.42924165725708,22,LC_ALL=C
subtitles_en_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,4.378557205200195,22,LC_ALL=C
subtitles_en_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,4.376646518707275,22,LC_ALL=C
subtitles_en_no_literal,1,3,ugrep (ASCII),ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,3.5110037326812744,22,
subtitles_en_no_literal,1,3,ugrep (ASCII),ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,3.5137360095977783,22,
subtitles_en_no_literal,1,3,ugrep (ASCII),ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,3.5051844120025635,22,
subtitles_ru_literal,1,3,rg,rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.13207745552062988,583,
subtitles_ru_literal,1,3,rg,rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.13084721565246582,583,
subtitles_ru_literal,1,3,rg,rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.13469862937927246,583,
subtitles_ru_literal,1,3,rg (no mmap),rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.18022370338439941,583,
subtitles_ru_literal,1,3,rg (no mmap),rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.1801767349243164,583,
subtitles_ru_literal,1,3,rg (no mmap),rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.17995166778564453,583,
subtitles_ru_literal,1,3,grep,grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5151040554046631,583,LC_ALL=C
subtitles_ru_literal,1,3,grep,grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5154542922973633,583,LC_ALL=C
subtitles_ru_literal,1,3,grep,grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.49927639961242676,583,LC_ALL=C
subtitles_ru_literal,1,3,rg (lines),rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.19464492797851562,583,
subtitles_ru_literal,1,3,rg (lines),rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.18920588493347168,583,
subtitles_ru_literal,1,3,rg (lines),rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.19465351104736328,583,
subtitles_ru_literal,1,3,ag (lines),ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,1.9595966339111328,583,
subtitles_ru_literal,1,3,ag (lines),ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,2.0014493465423584,583,
subtitles_ru_literal,1,3,ag (lines),ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,1.9567768573760986,583,
subtitles_ru_literal,1,3,grep (lines),grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8119180202484131,583,LC_ALL=C
subtitles_ru_literal,1,3,grep (lines),grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8111097812652588,583,LC_ALL=C
subtitles_ru_literal,1,3,grep (lines),grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8006868362426758,583,LC_ALL=C
subtitles_ru_literal,1,3,ugrep (lines),ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.70003342628479,583,
subtitles_ru_literal,1,3,ugrep (lines),ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.650275468826294,583,
subtitles_ru_literal,1,3,ugrep (lines),ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.689772367477417,583,
subtitles_ru_literal_casei,1,3,rg,rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.267578125,604,
subtitles_ru_literal_casei,1,3,rg,rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.2665982246398926,604,
subtitles_ru_literal_casei,1,3,rg,rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.26861572265625,604,
subtitles_ru_literal_casei,1,3,grep,grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,4.764627456665039,604,LC_ALL=en_US.UTF-8
subtitles_ru_literal_casei,1,3,grep,grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,4.767015695571899,604,LC_ALL=en_US.UTF-8
subtitles_ru_literal_casei,1,3,grep,grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,4.7688889503479,604,LC_ALL=en_US.UTF-8
subtitles_ru_literal_casei,1,3,grep (ASCII),grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5046737194061279,583,LC_ALL=C
subtitles_ru_literal_casei,1,3,grep (ASCII),grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5139875411987305,583,LC_ALL=C
subtitles_ru_literal_casei,1,3,grep (ASCII),grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.4993159770965576,583,LC_ALL=C
subtitles_ru_literal_casei,1,3,rg (lines),rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.33438658714294434,604,
subtitles_ru_literal_casei,1,3,rg (lines),rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.3398289680480957,604,
subtitles_ru_literal_casei,1,3,rg (lines),rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.3298227787017822,604,
subtitles_ru_literal_casei,1,3,ag (lines) (ASCII),ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.4468214511871338,,
subtitles_ru_literal_casei,1,3,ag (lines) (ASCII),ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.44559574127197266,,
subtitles_ru_literal_casei,1,3,ag (lines) (ASCII),ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.47882938385009766,,
subtitles_ru_literal_casei,1,3,ugrep (lines) (ASCII),ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7039575576782227,583,
subtitles_ru_literal_casei,1,3,ugrep (lines) (ASCII),ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.6490752696990967,583,
subtitles_ru_literal_casei,1,3,ugrep (lines) (ASCII),ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8081104755401611,583,
subtitles_ru_literal_word,1,3,rg (ASCII),rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt,0.20162224769592285,583,
subtitles_ru_literal_word,1,3,rg (ASCII),rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt,0.18215250968933105,583,
subtitles_ru_literal_word,1,3,rg (ASCII),rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt,0.20087671279907227,583,
subtitles_ru_literal_word,1,3,ag (ASCII),ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.48624587059020996,,
subtitles_ru_literal_word,1,3,ag (ASCII),ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5212516784667969,,
subtitles_ru_literal_word,1,3,ag (ASCII),ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.520557165145874,,
subtitles_ru_literal_word,1,3,grep (ASCII),grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8108196258544922,583,LC_ALL=C
subtitles_ru_literal_word,1,3,grep (ASCII),grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8121066093444824,583,LC_ALL=C
subtitles_ru_literal_word,1,3,grep (ASCII),grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7784581184387207,583,LC_ALL=C
subtitles_ru_literal_word,1,3,ugrep (ASCII),ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7469344139099121,583,
subtitles_ru_literal_word,1,3,ugrep (ASCII),ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.6838233470916748,583,
subtitles_ru_literal_word,1,3,ugrep (ASCII),ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.6921679973602295,583,
subtitles_ru_literal_word,1,3,rg,rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.19918251037597656,579,
subtitles_ru_literal_word,1,3,rg,rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.2046656608581543,579,
subtitles_ru_literal_word,1,3,rg,rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.1984848976135254,579,
subtitles_ru_literal_word,1,3,grep,grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.794173002243042,579,LC_ALL=en_US.UTF-8
subtitles_ru_literal_word,1,3,grep,grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7715346813201904,579,LC_ALL=en_US.UTF-8
subtitles_ru_literal_word,1,3,grep,grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8116705417633057,579,LC_ALL=en_US.UTF-8
subtitles_ru_alternate,1,3,rg (lines),rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6730976104736328,691,
subtitles_ru_alternate,1,3,rg (lines),rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.7020411491394043,691,
subtitles_ru_alternate,1,3,rg (lines),rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6693949699401855,691,
subtitles_ru_alternate,1,3,ag (lines),ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7100515365600586,691,
subtitles_ru_alternate,1,3,ag (lines),ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7458419799804688,691,
subtitles_ru_alternate,1,3,ag (lines),ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7115116119384766,691,
subtitles_ru_alternate,1,3,grep (lines),grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.703738451004028,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep (lines),grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.715883731842041,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep (lines),grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.712724924087524,691,LC_ALL=C
subtitles_ru_alternate,1,3,ugrep (lines),ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.276995420455933,691,
subtitles_ru_alternate,1,3,ugrep (lines),ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.304608345031738,691,
subtitles_ru_alternate,1,3,ugrep (lines),ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.322760820388794,691,
subtitles_ru_alternate,1,3,rg,rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6119842529296875,691,
subtitles_ru_alternate,1,3,rg,rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6368775367736816,691,
subtitles_ru_alternate,1,3,rg,rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6258070468902588,691,
subtitles_ru_alternate,1,3,grep,grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.4300291538238525,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep,grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.418199300765991,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep,grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.425868511199951,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,ag (ASCII),ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7216460704803467,691,
subtitles_ru_alternate_casei,1,3,ag (ASCII),ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7108607292175293,691,
subtitles_ru_alternate_casei,1,3,ag (ASCII),ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.747138500213623,691,
subtitles_ru_alternate_casei,1,3,grep (ASCII),grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.711230039596558,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,grep (ASCII),grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.709407329559326,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,grep (ASCII),grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.714034557342529,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,ugrep (ASCII),ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.305904626846313,691,
subtitles_ru_alternate_casei,1,3,ugrep (ASCII),ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.307406187057495,691,
subtitles_ru_alternate_casei,1,3,ugrep (ASCII),ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.288233995437622,691,
subtitles_ru_alternate_casei,1,3,rg,rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,3.673624277114868,735,
subtitles_ru_alternate_casei,1,3,rg,rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,3.6759188175201416,735,
subtitles_ru_alternate_casei,1,3,rg,rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,3.66877818107605,735,
subtitles_ru_alternate_casei,1,3,grep,grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.366282224655151,735,LC_ALL=en_US.UTF-8
subtitles_ru_alternate_casei,1,3,grep,grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.370524883270264,735,LC_ALL=en_US.UTF-8
subtitles_ru_alternate_casei,1,3,grep,grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.342163324356079,735,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20331382751464844,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.2034592628479004,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20407724380493164,278,
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0436389446258545,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0388383865356445,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0446207523345947,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.29245424270629883,1,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.29168128967285156,1,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.29593825340270996,1,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.085604190826416,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.083526372909546,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.1223819255828857,,
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.9905192852020264,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0222513675689697,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0216262340545654,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8875806331634521,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8861405849456787,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8898241519927979,,
subtitles_ru_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.237398147583008,41,
subtitles_ru_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.253706693649292,41,
subtitles_ru_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.2161178588867188,41,
subtitles_ru_no_literal,1,3,ugrep,ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,28.85959553718567,41,
subtitles_ru_no_literal,1,3,ugrep,ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,28.666419982910156,41,
subtitles_ru_no_literal,1,3,ugrep,ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,28.90555214881897,41,
subtitles_ru_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.051813840866089,,
subtitles_ru_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.026675224304199,,
subtitles_ru_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.027498245239258,,
subtitles_ru_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0998010635375977,,
subtitles_ru_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0900018215179443,,
subtitles_ru_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0901548862457275,,
subtitles_ru_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0691263675689697,,LC_ALL=C
subtitles_ru_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0875153541564941,,LC_ALL=C
subtitles_ru_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0997354984283447,,LC_ALL=C
subtitles_ru_no_literal,1,3,ugrep (ASCII),ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,0.8329172134399414,,
subtitles_ru_no_literal,1,3,ugrep (ASCII),ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,0.8292679786682129,,
subtitles_ru_no_literal,1,3,ugrep (ASCII),ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,0.8326950073242188,,
benchmark,warmup_iter,iter,name,command,duration,lines,env
linux_literal_default,1,3,rg,rg PM_RESUME,0.08678817749023438,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08307123184204102,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08347964286804199,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2955434322357178,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2954287528991699,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2938194274902344,39,
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.23198556900024414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.22356963157653809,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.2189793586730957,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10710000991821289,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10364222526550293,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.1052248477935791,39,
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9994468688964844,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9939279556274414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9957931041717529,39,LC_ALL=en_US.UTF-8
linux_literal,1,3,rg,rg -n PM_RESUME,0.08603358268737793,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.0837090015411377,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.08435535430908203,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215503692626953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.32426929473876953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215982913970947,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2894856929779053,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2892603874206543,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.29217028617858887,39,
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.206068754196167,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.2218036651611328,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.20590710639953613,39,LC_ALL=C
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18692874908447266,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.19518327713012695,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18577361106872559,39,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08709383010864258,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08861064910888672,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08769798278808594,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.3218965530395508,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.30869364738464355,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.31044936180114746,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.2989068031311035,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.2996039390563965,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.29817700386047363,536,
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.2122786045074463,536,LC_ALL=C
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.20763754844665527,536,LC_ALL=C
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.220794677734375,536,LC_ALL=C
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.17305850982666016,536,
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.1745915412902832,536,
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.17526865005493164,536,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.08527851104736328,2160,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.08487534523010254,2160,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.0848684310913086,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.37945985794067383,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.36303210258483887,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.36359691619873047,2160,
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.9589834213256836,2160,LC_ALL=C
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.9206984043121338,2160,LC_ALL=C
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.8642933368682861,2160,LC_ALL=C
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.40503501892089844,2160,
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.4531714916229248,2160,
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.4397866725921631,2160,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08639907836914062,9,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08583569526672363,9,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08414363861083984,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.2853865623474121,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.2871377468109131,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.28753662109375,9,
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20428204536437988,9,LC_ALL=C
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20490717887878418,9,LC_ALL=C
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20840072631835938,9,LC_ALL=C
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.18790841102600098,9,
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.18659543991088867,9,
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.19104933738708496,9,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.19976496696472168,105,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.20618367195129395,105,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.19702935218811035,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.17758727073669434,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.17793798446655273,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.1872577667236328,105,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.19808244705200195,245,
78 linux_unicode_greek_casei 1 3 rg rg -n -i \p{Greek} 0.1979837417602539 245
79 linux_unicode_greek_casei 1 3 rg rg -n -i \p{Greek} 0.1984400749206543 245
80 linux_unicode_greek_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./ 0.1819148063659668 105
81 linux_unicode_greek_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./ 0.17530512809753418 105
82 linux_unicode_greek_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./ 0.17999005317687988 105
83 linux_unicode_word 1 3 rg rg -n \wAh 0.08527827262878418 247
84 linux_unicode_word 1 3 rg rg -n \wAh 0.08541679382324219 247
85 linux_unicode_word 1 3 rg rg -n \wAh 0.08553218841552734 247
86 linux_unicode_word 1 3 rg (ASCII) rg -n (?-u)\wAh 0.08484745025634766 233
87 linux_unicode_word 1 3 rg (ASCII) rg -n (?-u)\wAh 0.08466482162475586 233
88 linux_unicode_word 1 3 rg (ASCII) rg -n (?-u)\wAh 0.08487439155578613 233
89 linux_unicode_word 1 3 ag (ASCII) ag -s \wAh 0.3061795234680176 233
90 linux_unicode_word 1 3 ag (ASCII) ag -s \wAh 0.2993617057800293 233
91 linux_unicode_word 1 3 ag (ASCII) ag -s \wAh 0.29722046852111816 233
92 linux_unicode_word 1 3 git grep git grep -E -I -n \wAh 4.257144451141357 247 LC_ALL=en_US.UTF-8
93 linux_unicode_word 1 3 git grep git grep -E -I -n \wAh 3.852163076400757 247 LC_ALL=en_US.UTF-8
94 linux_unicode_word 1 3 git grep git grep -E -I -n \wAh 3.8293941020965576 247 LC_ALL=en_US.UTF-8
95 linux_unicode_word 1 3 git grep (ASCII) git grep -E -I -n \wAh 1.647632122039795 233 LC_ALL=C
96 linux_unicode_word 1 3 git grep (ASCII) git grep -E -I -n \wAh 1.6269629001617432 233 LC_ALL=C
97 linux_unicode_word 1 3 git grep (ASCII) git grep -E -I -n \wAh 1.5847914218902588 233 LC_ALL=C
98 linux_unicode_word 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \wAh ./ 0.1802208423614502 247
99 linux_unicode_word 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \wAh ./ 0.17564702033996582 247
100 linux_unicode_word 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \wAh ./ 0.1746981143951416 247
101 linux_unicode_word 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./ 0.1799161434173584 233
102 linux_unicode_word 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./ 0.18733000755310059 233
103 linux_unicode_word 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./ 0.18859529495239258 233
104 linux_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.26203155517578125 721
105 linux_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.2615540027618408 721
106 linux_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.2730247974395752 721
107 linux_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.19902300834655762 720
108 linux_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.20034146308898926 720
109 linux_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.20192813873291016 720
110 linux_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.8269081115722656 1134
111 linux_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.8393104076385498 1134
112 linux_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 0.8293666839599609 1134
113 linux_no_literal 1 3 git grep git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 7.334395408630371 721 LC_ALL=en_US.UTF-8
114 linux_no_literal 1 3 git grep git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 7.338796854019165 721 LC_ALL=en_US.UTF-8
115 linux_no_literal 1 3 git grep git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 7.36545991897583 721 LC_ALL=en_US.UTF-8
116 linux_no_literal 1 3 git grep (ASCII) git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 2.1588926315307617 720 LC_ALL=C
117 linux_no_literal 1 3 git grep (ASCII) git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 2.132209062576294 720 LC_ALL=C
118 linux_no_literal 1 3 git grep (ASCII) git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} 2.1407439708709717 720 LC_ALL=C
119 linux_no_literal 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 3.410162925720215 723
120 linux_no_literal 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 3.405057668685913 723
121 linux_no_literal 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 3.3945884704589844 723
122 linux_no_literal 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 0.23865604400634766 722
123 linux_no_literal 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 0.23371148109436035 722
124 linux_no_literal 1 3 ugrep (ASCII) ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./ 0.2343149185180664 722
125 linux_alternates 1 3 rg rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.08691263198852539 140
126 linux_alternates 1 3 rg rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.08707070350646973 140
127 linux_alternates 1 3 rg rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.08713960647583008 140
128 linux_alternates 1 3 ag ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.32947278022766113 140
129 linux_alternates 1 3 ag ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.33203840255737305 140
130 linux_alternates 1 3 ag ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.3292670249938965 140
131 linux_alternates 1 3 git grep git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.4576725959777832 140 LC_ALL=C
132 linux_alternates 1 3 git grep git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.41936421394348145 140 LC_ALL=C
133 linux_alternates 1 3 git grep git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.3639688491821289 140 LC_ALL=C
134 linux_alternates 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.17806458473205566 140
135 linux_alternates 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.18224716186523438 140
136 linux_alternates 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.17795038223266602 140
137 linux_alternates_casei 1 3 rg rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.12421393394470215 241
138 linux_alternates_casei 1 3 rg rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.12235784530639648 241
139 linux_alternates_casei 1 3 rg rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.12151455879211426 241
140 linux_alternates_casei 1 3 ag ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.529585599899292 241
141 linux_alternates_casei 1 3 ag ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.5305526256561279 241
142 linux_alternates_casei 1 3 ag ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.5311264991760254 241
143 linux_alternates_casei 1 3 git grep git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.7589735984802246 241 LC_ALL=C
144 linux_alternates_casei 1 3 git grep git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.7852108478546143 241 LC_ALL=C
145 linux_alternates_casei 1 3 git grep git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT 0.8308050632476807 241 LC_ALL=C
146 linux_alternates_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.17955923080444336 241
147 linux_alternates_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.1745290756225586 241
148 linux_alternates_casei 1 3 ugrep ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./ 0.1773686408996582 241
149 subtitles_en_literal 1 3 rg rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.1213979721069336 830
150 subtitles_en_literal 1 3 rg rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.1213991641998291 830
151 subtitles_en_literal 1 3 rg rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.12620782852172852 830
152 subtitles_en_literal 1 3 rg (no mmap) rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18207263946533203 830
153 subtitles_en_literal 1 3 rg (no mmap) rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.17281484603881836 830
154 subtitles_en_literal 1 3 rg (no mmap) rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.17368507385253906 830
155 subtitles_en_literal 1 3 grep grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.560560941696167 830 LC_ALL=C
156 subtitles_en_literal 1 3 grep grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.563499927520752 830 LC_ALL=C
157 subtitles_en_literal 1 3 grep grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.5916609764099121 830 LC_ALL=C
158 subtitles_en_literal 1 3 rg (lines) rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.19600844383239746 830
159 subtitles_en_literal 1 3 rg (lines) rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18436980247497559 830
160 subtitles_en_literal 1 3 rg (lines) rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18594050407409668 830
161 subtitles_en_literal 1 3 ag (lines) ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.871025562286377 830
162 subtitles_en_literal 1 3 ag (lines) ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.8636960983276367 830
163 subtitles_en_literal 1 3 ag (lines) ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.8680994510650635 830
164 subtitles_en_literal 1 3 grep (lines) grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.9978001117706299 830 LC_ALL=C
165 subtitles_en_literal 1 3 grep (lines) grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.9385361671447754 830 LC_ALL=C
166 subtitles_en_literal 1 3 grep (lines) grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.0036489963531494 830 LC_ALL=C
167 subtitles_en_literal 1 3 ugrep (lines) ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18918490409851074 830
168 subtitles_en_literal 1 3 ugrep (lines) ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.1769108772277832 830
169 subtitles_en_literal 1 3 ugrep (lines) ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18808293342590332 830
170 subtitles_en_literal_casei 1 3 rg rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.21876287460327148 871
171 subtitles_en_literal_casei 1 3 rg rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.2044692039489746 871
172 subtitles_en_literal_casei 1 3 rg rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.2184743881225586 871
173 subtitles_en_literal_casei 1 3 grep grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 2.224027156829834 871 LC_ALL=en_US.UTF-8
174 subtitles_en_literal_casei 1 3 grep grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 2.223188877105713 871 LC_ALL=en_US.UTF-8
175 subtitles_en_literal_casei 1 3 grep grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 2.223966598510742 871 LC_ALL=en_US.UTF-8
176 subtitles_en_literal_casei 1 3 grep (ASCII) grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.671149492263794 871 LC_ALL=C
177 subtitles_en_literal_casei 1 3 grep (ASCII) grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.6705749034881592 871 LC_ALL=C
178 subtitles_en_literal_casei 1 3 grep (ASCII) grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.6700258255004883 871 LC_ALL=C
179 subtitles_en_literal_casei 1 3 rg (lines) rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.2624058723449707 871
180 subtitles_en_literal_casei 1 3 rg (lines) rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.25513339042663574 871
181 subtitles_en_literal_casei 1 3 rg (lines) rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.26088857650756836 871
182 subtitles_en_literal_casei 1 3 ag (lines) (ASCII) ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.9144322872161865 871
183 subtitles_en_literal_casei 1 3 ag (lines) (ASCII) ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.866628885269165 871
184 subtitles_en_literal_casei 1 3 ag (lines) (ASCII) ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.9098389148712158 871
185 subtitles_en_literal_casei 1 3 ugrep (lines) ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.7860472202301025 871
186 subtitles_en_literal_casei 1 3 ugrep (lines) ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.7858343124389648 871
187 subtitles_en_literal_casei 1 3 ugrep (lines) ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.782252311706543 871
188 subtitles_en_literal_word 1 3 rg (ASCII) rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt 0.18424677848815918 830
189 subtitles_en_literal_word 1 3 rg (ASCII) rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt 0.19610810279846191 830
190 subtitles_en_literal_word 1 3 rg (ASCII) rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt 0.18711471557617188 830
191 subtitles_en_literal_word 1 3 ag (ASCII) ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.8301315307617188 830
192 subtitles_en_literal_word 1 3 ag (ASCII) ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.8689801692962646 830
193 subtitles_en_literal_word 1 3 ag (ASCII) ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.8279321193695068 830
194 subtitles_en_literal_word 1 3 grep (ASCII) grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.0036842823028564 830 LC_ALL=C
195 subtitles_en_literal_word 1 3 grep (ASCII) grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.002833604812622 830 LC_ALL=C
196 subtitles_en_literal_word 1 3 grep (ASCII) grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.9236147403717041 830 LC_ALL=C
197 subtitles_en_literal_word 1 3 ugrep (ASCII) ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.17717313766479492 830
198 subtitles_en_literal_word 1 3 ugrep (ASCII) ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18994617462158203 830
199 subtitles_en_literal_word 1 3 ugrep (ASCII) ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.17972850799560547 830
200 subtitles_en_literal_word 1 3 rg rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18804550170898438 830
201 subtitles_en_literal_word 1 3 rg rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.18867778778076172 830
202 subtitles_en_literal_word 1 3 rg rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.19913530349731445 830
203 subtitles_en_literal_word 1 3 grep grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.0044364929199219 830 LC_ALL=en_US.UTF-8
204 subtitles_en_literal_word 1 3 grep grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 1.0040032863616943 830 LC_ALL=en_US.UTF-8
205 subtitles_en_literal_word 1 3 grep grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt 0.9627983570098877 830 LC_ALL=en_US.UTF-8
206 subtitles_en_alternate 1 3 rg (lines) rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.24848055839538574 1094
207 subtitles_en_alternate 1 3 rg (lines) rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.24738383293151855 1094
208 subtitles_en_alternate 1 3 rg (lines) rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.24789118766784668 1094
209 subtitles_en_alternate 1 3 ag (lines) ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.668708562850952 1094
210 subtitles_en_alternate 1 3 ag (lines) ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.57511305809021 1094
211 subtitles_en_alternate 1 3 ag (lines) ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.6714110374450684 1094
212 subtitles_en_alternate 1 3 grep (lines) grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.0586187839508057 1094 LC_ALL=C
213 subtitles_en_alternate 1 3 grep (lines) grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.0227150917053223 1094 LC_ALL=C
214 subtitles_en_alternate 1 3 grep (lines) grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 2.075378179550171 1094 LC_ALL=C
215 subtitles_en_alternate 1 3 ugrep (lines) ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7863781452178955 1094
216 subtitles_en_alternate 1 3 ugrep (lines) ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7874250411987305 1094
217 subtitles_en_alternate 1 3 ugrep (lines) ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7867889404296875 1094
218 subtitles_en_alternate 1 3 rg rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.18195557594299316 1094
219 subtitles_en_alternate 1 3 rg rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.18239641189575195 1094
220 subtitles_en_alternate 1 3 rg rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.1625690460205078 1094
221 subtitles_en_alternate 1 3 grep grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 1.6601614952087402 1094 LC_ALL=C
222 subtitles_en_alternate 1 3 grep grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 1.6617567539215088 1094 LC_ALL=C
223 subtitles_en_alternate 1 3 grep grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 1.6584677696228027 1094 LC_ALL=C
224 subtitles_en_alternate_casei 1 3 ag (ASCII) ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 4.0028722286224365 1136
225 subtitles_en_alternate_casei 1 3 ag (ASCII) ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.991217851638794 1136
226 subtitles_en_alternate_casei 1 3 ag (ASCII) ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 4.00272274017334 1136
227 subtitles_en_alternate_casei 1 3 grep (ASCII) grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.549154758453369 1136 LC_ALL=C
228 subtitles_en_alternate_casei 1 3 grep (ASCII) grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5468921661376953 1136 LC_ALL=C
229 subtitles_en_alternate_casei 1 3 grep (ASCII) grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5873491764068604 1136 LC_ALL=C
230 subtitles_en_alternate_casei 1 3 ugrep (ASCII) ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7872169017791748 1136
231 subtitles_en_alternate_casei 1 3 ugrep (ASCII) ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.784674882888794 1136
232 subtitles_en_alternate_casei 1 3 ugrep (ASCII) ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7882401943206787 1136
233 subtitles_en_alternate_casei 1 3 rg rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.4785435199737549 1136
234 subtitles_en_alternate_casei 1 3 rg rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.4940922260284424 1136
235 subtitles_en_alternate_casei 1 3 rg rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.4774627685546875 1136
236 subtitles_en_alternate_casei 1 3 grep grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5677175521850586 1136 LC_ALL=en_US.UTF-8
237 subtitles_en_alternate_casei 1 3 grep grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.603273391723633 1136 LC_ALL=en_US.UTF-8
238 subtitles_en_alternate_casei 1 3 grep grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5834741592407227 1136 LC_ALL=en_US.UTF-8
239 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20238041877746582 278
240 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.2031264305114746 278
241 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20475172996520996 278
242 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0288453102111816 278 LC_ALL=en_US.UTF-8
243 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.044802188873291 278 LC_ALL=en_US.UTF-8
244 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0432109832763672 278 LC_ALL=en_US.UTF-8
245 subtitles_ru_surrounding_words 1 3 ugrep ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 43.00765633583069 278
246 subtitles_ru_surrounding_words 1 3 ugrep ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 42.832849740982056 278
247 subtitles_ru_surrounding_words 1 3 ugrep ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 42.915205240249634 278
248 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.083683967590332
249 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0841526985168457
250 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0850934982299805
251 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0116353034973145 LC_ALL=C
252 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.9868073463439941 LC_ALL=C
253 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0224814414978027 LC_ALL=C
254 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8892502784729004
255 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8910088539123535
256 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8897674083709717
257 subtitles_en_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.11850643157959 22
258 subtitles_en_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.1359670162200928 22
259 subtitles_en_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.103114128112793 22
260 subtitles_en_no_literal 1 3 ugrep ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 13.050881385803223 22
261 subtitles_en_no_literal 1 3 ugrep ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 13.050772190093994 22
262 subtitles_en_no_literal 1 3 ugrep ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 13.05719804763794 22
263 subtitles_en_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 1.9961926937103271 22
264 subtitles_en_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.019721508026123 22
265 subtitles_en_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 1.9965126514434814 22
266 subtitles_en_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 6.849602222442627 302
267 subtitles_en_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 6.813834190368652 302
268 subtitles_en_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 6.8263633251190186 302
269 subtitles_en_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 4.42924165725708 22 LC_ALL=C
270 subtitles_en_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 4.378557205200195 22 LC_ALL=C
271 subtitles_en_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 4.376646518707275 22 LC_ALL=C
272 subtitles_en_no_literal 1 3 ugrep (ASCII) ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 3.5110037326812744 22
273 subtitles_en_no_literal 1 3 ugrep (ASCII) ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 3.5137360095977783 22
274 subtitles_en_no_literal 1 3 ugrep (ASCII) ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 3.5051844120025635 22
275 subtitles_ru_literal 1 3 rg rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.13207745552062988 583
276 subtitles_ru_literal 1 3 rg rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.13084721565246582 583
277 subtitles_ru_literal 1 3 rg rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.13469862937927246 583
278 subtitles_ru_literal 1 3 rg (no mmap) rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.18022370338439941 583
279 subtitles_ru_literal 1 3 rg (no mmap) rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.1801767349243164 583
280 subtitles_ru_literal 1 3 rg (no mmap) rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.17995166778564453 583
281 subtitles_ru_literal 1 3 grep grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5151040554046631 583 LC_ALL=C
282 subtitles_ru_literal 1 3 grep grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5154542922973633 583 LC_ALL=C
283 subtitles_ru_literal 1 3 grep grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.49927639961242676 583 LC_ALL=C
284 subtitles_ru_literal 1 3 rg (lines) rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.19464492797851562 583
285 subtitles_ru_literal 1 3 rg (lines) rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.18920588493347168 583
286 subtitles_ru_literal 1 3 rg (lines) rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.19465351104736328 583
287 subtitles_ru_literal 1 3 ag (lines) ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 1.9595966339111328 583
288 subtitles_ru_literal 1 3 ag (lines) ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 2.0014493465423584 583
289 subtitles_ru_literal 1 3 ag (lines) ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 1.9567768573760986 583
290 subtitles_ru_literal 1 3 grep (lines) grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8119180202484131 583 LC_ALL=C
291 subtitles_ru_literal 1 3 grep (lines) grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8111097812652588 583 LC_ALL=C
292 subtitles_ru_literal 1 3 grep (lines) grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8006868362426758 583 LC_ALL=C
293 subtitles_ru_literal 1 3 ugrep (lines) ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.70003342628479 583
294 subtitles_ru_literal 1 3 ugrep (lines) ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.650275468826294 583
295 subtitles_ru_literal 1 3 ugrep (lines) ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.689772367477417 583
296 subtitles_ru_literal_casei 1 3 rg rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.267578125 604
297 subtitles_ru_literal_casei 1 3 rg rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.2665982246398926 604
298 subtitles_ru_literal_casei 1 3 rg rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.26861572265625 604
299 subtitles_ru_literal_casei 1 3 grep grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 4.764627456665039 604 LC_ALL=en_US.UTF-8
300 subtitles_ru_literal_casei 1 3 grep grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 4.767015695571899 604 LC_ALL=en_US.UTF-8
301 subtitles_ru_literal_casei 1 3 grep grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 4.7688889503479 604 LC_ALL=en_US.UTF-8
302 subtitles_ru_literal_casei 1 3 grep (ASCII) grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5046737194061279 583 LC_ALL=C
303 subtitles_ru_literal_casei 1 3 grep (ASCII) grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5139875411987305 583 LC_ALL=C
304 subtitles_ru_literal_casei 1 3 grep (ASCII) grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.4993159770965576 583 LC_ALL=C
305 subtitles_ru_literal_casei 1 3 rg (lines) rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.33438658714294434 604
306 subtitles_ru_literal_casei 1 3 rg (lines) rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.3398289680480957 604
307 subtitles_ru_literal_casei 1 3 rg (lines) rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.3298227787017822 604
308 subtitles_ru_literal_casei 1 3 ag (lines) (ASCII) ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.4468214511871338
309 subtitles_ru_literal_casei 1 3 ag (lines) (ASCII) ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.44559574127197266
310 subtitles_ru_literal_casei 1 3 ag (lines) (ASCII) ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.47882938385009766
311 subtitles_ru_literal_casei 1 3 ugrep (lines) (ASCII) ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7039575576782227 583
312 subtitles_ru_literal_casei 1 3 ugrep (lines) (ASCII) ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.6490752696990967 583
313 subtitles_ru_literal_casei 1 3 ugrep (lines) (ASCII) ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8081104755401611 583
314 subtitles_ru_literal_word 1 3 rg (ASCII) rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt 0.20162224769592285 583
315 subtitles_ru_literal_word 1 3 rg (ASCII) rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt 0.18215250968933105 583
316 subtitles_ru_literal_word 1 3 rg (ASCII) rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt 0.20087671279907227 583
317 subtitles_ru_literal_word 1 3 ag (ASCII) ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.48624587059020996
318 subtitles_ru_literal_word 1 3 ag (ASCII) ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5212516784667969
319 subtitles_ru_literal_word 1 3 ag (ASCII) ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.520557165145874
320 subtitles_ru_literal_word 1 3 grep (ASCII) grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8108196258544922 583 LC_ALL=C
321 subtitles_ru_literal_word 1 3 grep (ASCII) grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8121066093444824 583 LC_ALL=C
322 subtitles_ru_literal_word 1 3 grep (ASCII) grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7784581184387207 583 LC_ALL=C
323 subtitles_ru_literal_word 1 3 ugrep (ASCII) ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7469344139099121 583
324 subtitles_ru_literal_word 1 3 ugrep (ASCII) ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.6838233470916748 583
325 subtitles_ru_literal_word 1 3 ugrep (ASCII) ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.6921679973602295 583
326 subtitles_ru_literal_word 1 3 rg rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.19918251037597656 579
327 subtitles_ru_literal_word 1 3 rg rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.2046656608581543 579
328 subtitles_ru_literal_word 1 3 rg rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.1984848976135254 579
329 subtitles_ru_literal_word 1 3 grep grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.794173002243042 579 LC_ALL=en_US.UTF-8
330 subtitles_ru_literal_word 1 3 grep grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7715346813201904 579 LC_ALL=en_US.UTF-8
331 subtitles_ru_literal_word 1 3 grep grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8116705417633057 579 LC_ALL=en_US.UTF-8
332 subtitles_ru_alternate 1 3 rg (lines) rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6730976104736328 691
333 subtitles_ru_alternate 1 3 rg (lines) rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.7020411491394043 691
334 subtitles_ru_alternate 1 3 rg (lines) rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6693949699401855 691
335 subtitles_ru_alternate 1 3 ag (lines) ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7100515365600586 691
336 subtitles_ru_alternate 1 3 ag (lines) ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7458419799804688 691
337 subtitles_ru_alternate 1 3 ag (lines) ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7115116119384766 691
338 subtitles_ru_alternate 1 3 grep (lines) grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.703738451004028 691 LC_ALL=C
339 subtitles_ru_alternate 1 3 grep (lines) grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.715883731842041 691 LC_ALL=C
340 subtitles_ru_alternate 1 3 grep (lines) grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.712724924087524 691 LC_ALL=C
341 subtitles_ru_alternate 1 3 ugrep (lines) ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.276995420455933 691
342 subtitles_ru_alternate 1 3 ugrep (lines) ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.304608345031738 691
343 subtitles_ru_alternate 1 3 ugrep (lines) ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.322760820388794 691
344 subtitles_ru_alternate 1 3 rg rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6119842529296875 691
345 subtitles_ru_alternate 1 3 rg rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6368775367736816 691
346 subtitles_ru_alternate 1 3 rg rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6258070468902588 691
347 subtitles_ru_alternate 1 3 grep grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.4300291538238525 691 LC_ALL=C
348 subtitles_ru_alternate 1 3 grep grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.418199300765991 691 LC_ALL=C
349 subtitles_ru_alternate 1 3 grep grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.425868511199951 691 LC_ALL=C
350 subtitles_ru_alternate_casei 1 3 ag (ASCII) ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7216460704803467 691
351 subtitles_ru_alternate_casei 1 3 ag (ASCII) ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7108607292175293 691
352 subtitles_ru_alternate_casei 1 3 ag (ASCII) ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.747138500213623 691
353 subtitles_ru_alternate_casei 1 3 grep (ASCII) grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.711230039596558 691 LC_ALL=C
354 subtitles_ru_alternate_casei 1 3 grep (ASCII) grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.709407329559326 691 LC_ALL=C
355 subtitles_ru_alternate_casei 1 3 grep (ASCII) grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.714034557342529 691 LC_ALL=C
356 subtitles_ru_alternate_casei 1 3 ugrep (ASCII) ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.305904626846313 691
357 subtitles_ru_alternate_casei 1 3 ugrep (ASCII) ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.307406187057495 691
358 subtitles_ru_alternate_casei 1 3 ugrep (ASCII) ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.288233995437622 691
359 subtitles_ru_alternate_casei 1 3 rg rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 3.673624277114868 735
360 subtitles_ru_alternate_casei 1 3 rg rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 3.6759188175201416 735
361 subtitles_ru_alternate_casei 1 3 rg rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 3.66877818107605 735
362 subtitles_ru_alternate_casei 1 3 grep grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.366282224655151 735 LC_ALL=en_US.UTF-8
363 subtitles_ru_alternate_casei 1 3 grep grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.370524883270264 735 LC_ALL=en_US.UTF-8
364 subtitles_ru_alternate_casei 1 3 grep grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.342163324356079 735 LC_ALL=en_US.UTF-8
365 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20331382751464844 278
366 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.2034592628479004 278
367 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20407724380493164 278
368 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0436389446258545 278 LC_ALL=en_US.UTF-8
369 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0388383865356445 278 LC_ALL=en_US.UTF-8
370 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0446207523345947 278 LC_ALL=en_US.UTF-8
371 subtitles_ru_surrounding_words 1 3 ugrep ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.29245424270629883 1
372 subtitles_ru_surrounding_words 1 3 ugrep ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.29168128967285156 1
373 subtitles_ru_surrounding_words 1 3 ugrep ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.29593825340270996 1
374 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.085604190826416
375 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.083526372909546
376 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.1223819255828857
377 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.9905192852020264 LC_ALL=C
378 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0222513675689697 LC_ALL=C
379 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0216262340545654 LC_ALL=C
380 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8875806331634521
381 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8861405849456787
382 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8898241519927979
383 subtitles_ru_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.237398147583008 41
384 subtitles_ru_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.253706693649292 41
385 subtitles_ru_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.2161178588867188 41
386 subtitles_ru_no_literal 1 3 ugrep ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 28.85959553718567 41
387 subtitles_ru_no_literal 1 3 ugrep ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 28.666419982910156 41
388 subtitles_ru_no_literal 1 3 ugrep ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 28.90555214881897 41
389 subtitles_ru_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.051813840866089
390 subtitles_ru_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.026675224304199
391 subtitles_ru_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.027498245239258
392 subtitles_ru_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0998010635375977
393 subtitles_ru_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0900018215179443
394 subtitles_ru_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0901548862457275
395 subtitles_ru_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0691263675689697 LC_ALL=C
396 subtitles_ru_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0875153541564941 LC_ALL=C
397 subtitles_ru_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0997354984283447 LC_ALL=C
398 subtitles_ru_no_literal 1 3 ugrep (ASCII) ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 0.8329172134399414
399 subtitles_ru_no_literal 1 3 ugrep (ASCII) ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 0.8292679786682129
400 subtitles_ru_no_literal 1 3 ugrep (ASCII) ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 0.8326950073242188

@ -0,0 +1,208 @@
linux_literal_default (pattern: PM_RESUME)
------------------------------------------
rg* 0.084 +/- 0.002 (lines: 39)*
ag 0.295 +/- 0.001 (lines: 39)
git grep 0.225 +/- 0.007 (lines: 39)
ugrep 0.105 +/- 0.002 (lines: 39)
grep 0.996 +/- 0.003 (lines: 39)
linux_literal (pattern: PM_RESUME)
----------------------------------
rg* 0.085 +/- 0.001 (lines: 39)*
rg (mmap) 0.322 +/- 0.002 (lines: 39)
ag (mmap) 0.290 +/- 0.002 (lines: 39)
git grep 0.211 +/- 0.009 (lines: 39)
ugrep 0.189 +/- 0.005 (lines: 39)
linux_literal_casei (pattern: PM_RESUME)
----------------------------------------
rg* 0.088 +/- 0.001 (lines: 536)*
rg (mmap) 0.314 +/- 0.007 (lines: 536)
ag (mmap) 0.299 +/- 0.001 (lines: 536)
git grep 0.214 +/- 0.007 (lines: 536)
ugrep 0.174 +/- 0.001 (lines: 536)
linux_re_literal_suffix (pattern: [A-Z]+_RESUME)
------------------------------------------------
rg* 0.085 +/- 0.000 (lines: 2160)*
ag 0.369 +/- 0.009 (lines: 2160)
git grep 0.915 +/- 0.048 (lines: 2160)
ugrep 0.433 +/- 0.025 (lines: 2160)
linux_word (pattern: PM_RESUME)
-------------------------------
rg* 0.085 +/- 0.001 (lines: 9)*
ag 0.287 +/- 0.001 (lines: 9)
git grep 0.206 +/- 0.002 (lines: 9)
ugrep 0.189 +/- 0.002 (lines: 9)
linux_unicode_greek (pattern: \p{Greek})
----------------------------------------
rg 0.201 +/- 0.005 (lines: 105)
ugrep* 0.181 +/- 0.005 (lines: 105)*
linux_unicode_greek_casei (pattern: \p{Greek})
----------------------------------------------
rg 0.198 +/- 0.000 (lines: 245)
ugrep* 0.179 +/- 0.003 (lines: 105)*
linux_unicode_word (pattern: \wAh)
----------------------------------
rg 0.085 +/- 0.000 (lines: 247)
rg (ASCII)* 0.085 +/- 0.000 (lines: 233)*
ag (ASCII) 0.301 +/- 0.005 (lines: 233)
git grep 3.980 +/- 0.241 (lines: 247)
git grep (ASCII) 1.620 +/- 0.032 (lines: 233)
ugrep 0.177 +/- 0.003 (lines: 247)
ugrep (ASCII) 0.185 +/- 0.005 (lines: 233)
linux_no_literal (pattern: \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5})
-----------------------------------------------------------------
rg 0.266 +/- 0.006 (lines: 721)
rg (ASCII)* 0.200 +/- 0.001 (lines: 720)*
ag (ASCII) 0.832 +/- 0.007 (lines: 1134)
git grep 7.346 +/- 0.017 (lines: 721)
git grep (ASCII) 2.144 +/- 0.014 (lines: 720)
ugrep 3.403 +/- 0.008 (lines: 723)
ugrep (ASCII) 0.236 +/- 0.003 (lines: 722)
linux_alternates (pattern: ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT)
-------------------------------------------------------------------------
rg* 0.087 +/- 0.000 (lines: 140)*
ag 0.330 +/- 0.002 (lines: 140)
git grep 0.414 +/- 0.047 (lines: 140)
ugrep 0.179 +/- 0.002 (lines: 140)
linux_alternates_casei (pattern: ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT)
-------------------------------------------------------------------------------
rg* 0.123 +/- 0.001 (lines: 241)*
ag 0.530 +/- 0.001 (lines: 241)
git grep 0.792 +/- 0.036 (lines: 241)
ugrep 0.177 +/- 0.003 (lines: 241)
subtitles_en_literal (pattern: Sherlock Holmes)
-----------------------------------------------
rg* 0.123 +/- 0.003 (lines: 830)*
rg (no mmap) 0.176 +/- 0.005 (lines: 830)
grep 0.572 +/- 0.017 (lines: 830)
rg (lines) 0.189 +/- 0.006 (lines: 830)
ag (lines) 1.868 +/- 0.004 (lines: 830)
grep (lines) 0.980 +/- 0.036 (lines: 830)
ugrep (lines) 0.185 +/- 0.007 (lines: 830)
subtitles_en_literal_casei (pattern: Sherlock Holmes)
-----------------------------------------------------
rg* 0.214 +/- 0.008 (lines: 871)*
grep 2.224 +/- 0.000 (lines: 871)
grep (ASCII) 0.671 +/- 0.001 (lines: 871)
rg (lines) 0.259 +/- 0.004 (lines: 871)
ag (lines) (ASCII) 1.897 +/- 0.026 (lines: 871)
ugrep (lines) 0.785 +/- 0.002 (lines: 871)
subtitles_en_literal_word (pattern: Sherlock Holmes)
----------------------------------------------------
rg (ASCII) 0.189 +/- 0.006 (lines: 830)
ag (ASCII) 1.842 +/- 0.023 (lines: 830)
grep (ASCII) 0.977 +/- 0.046 (lines: 830)
ugrep (ASCII)* 0.182 +/- 0.007 (lines: 830)*
rg 0.192 +/- 0.006 (lines: 830)
grep 0.990 +/- 0.024 (lines: 830)
subtitles_en_alternate (pattern: Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)
---------------------------------------------------------------------------------------------------------------
rg (lines) 0.248 +/- 0.001 (lines: 1094)
ag (lines) 2.638 +/- 0.055 (lines: 1094)
grep (lines) 2.052 +/- 0.027 (lines: 1094)
ugrep (lines) 0.787 +/- 0.001 (lines: 1094)
rg* 0.176 +/- 0.011 (lines: 1094)*
grep 1.660 +/- 0.002 (lines: 1094)
subtitles_en_alternate_casei (pattern: Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)
---------------------------------------------------------------------------------------------------------------------
ag (ASCII) 3.999 +/- 0.007 (lines: 1136)
grep (ASCII) 3.561 +/- 0.023 (lines: 1136)
ugrep (ASCII) 0.787 +/- 0.002 (lines: 1136)
rg* 0.483 +/- 0.009 (lines: 1136)*
grep 3.585 +/- 0.018 (lines: 1136)
subtitles_en_surrounding_words (pattern: \w+\s+Holmes\s+\w+)
------------------------------------------------------------
rg 0.200 +/- 0.001 (lines: 483)
grep 1.303 +/- 0.040 (lines: 483)
ugrep 43.220 +/- 0.047 (lines: 483)
rg (ASCII)* 0.197 +/- 0.000 (lines: 483)*
ag (ASCII) 5.223 +/- 0.056 (lines: 489)
grep (ASCII) 1.316 +/- 0.043 (lines: 483)
ugrep (ASCII) 17.647 +/- 0.219 (lines: 483)
subtitles_en_no_literal (pattern: \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5})
----------------------------------------------------------------------------------------
rg 2.119 +/- 0.016 (lines: 22)
ugrep 13.053 +/- 0.004 (lines: 22)
rg (ASCII)* 2.004 +/- 0.013 (lines: 22)*
ag (ASCII) 6.830 +/- 0.018 (lines: 302)
grep (ASCII) 4.395 +/- 0.030 (lines: 22)
ugrep (ASCII) 3.510 +/- 0.004 (lines: 22)
subtitles_ru_literal (pattern: Шерлок Холмс)
--------------------------------------------
rg* 0.133 +/- 0.002 (lines: 583)*
rg (no mmap) 0.180 +/- 0.000 (lines: 583)
grep 0.510 +/- 0.009 (lines: 583)
rg (lines) 0.193 +/- 0.003 (lines: 583)
ag (lines) 1.973 +/- 0.025 (lines: 583)
grep (lines) 0.808 +/- 0.006 (lines: 583)
ugrep (lines) 0.680 +/- 0.026 (lines: 583)
subtitles_ru_literal_casei (pattern: Шерлок Холмс)
--------------------------------------------------
rg* 0.268 +/- 0.001 (lines: 604)*
grep 4.767 +/- 0.002 (lines: 604)
grep (ASCII) 0.506 +/- 0.007 (lines: 583)
rg (lines) 0.335 +/- 0.005 (lines: 604)
ag (lines) (ASCII) 0.457 +/- 0.019 (lines: 0)
ugrep (lines) (ASCII) 0.720 +/- 0.081 (lines: 583)
subtitles_ru_literal_word (pattern: Шерлок Холмс)
-------------------------------------------------
rg (ASCII)* 0.195 +/- 0.011 (lines: 583)*
ag (ASCII) 0.509 +/- 0.020 (lines: 0)
grep (ASCII) 0.800 +/- 0.019 (lines: 583)
ugrep (ASCII) 0.708 +/- 0.034 (lines: 583)
rg 0.201 +/- 0.003 (lines: 579)
grep 0.792 +/- 0.020 (lines: 579)
subtitles_ru_alternate (pattern: Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти)
-----------------------------------------------------------------------------------------------------------
rg (lines) 0.682 +/- 0.018 (lines: 691)
ag (lines) 2.722 +/- 0.020 (lines: 691)
grep (lines) 5.711 +/- 0.006 (lines: 691)
ugrep (lines) 8.301 +/- 0.023 (lines: 691)
rg* 0.625 +/- 0.012 (lines: 691)*
grep 5.425 +/- 0.006 (lines: 691)
subtitles_ru_alternate_casei (pattern: Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти)
-----------------------------------------------------------------------------------------------------------------
ag (ASCII)* 2.727 +/- 0.019 (lines: 691)*
grep (ASCII) 5.712 +/- 0.002 (lines: 691)
ugrep (ASCII) 8.301 +/- 0.011 (lines: 691)
rg 3.673 +/- 0.004 (lines: 735)
grep 5.360 +/- 0.015 (lines: 735)
subtitles_ru_surrounding_words (pattern: \w+\s+Холмс\s+\w+)
-----------------------------------------------------------
rg* 0.203 +/- 0.001 (lines: 278)*
grep 1.039 +/- 0.009 (lines: 278)
ugrep 42.919 +/- 0.087 (lines: 278)
ag (ASCII) 1.084 +/- 0.001 (lines: 0)
grep (ASCII) 1.007 +/- 0.018 (lines: 0)
ugrep (ASCII) 0.890 +/- 0.001 (lines: 0)
subtitles_ru_no_literal (pattern: \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5})
----------------------------------------------------------------------------------------
rg 2.236 +/- 0.019 (lines: 41)
ugrep 28.811 +/- 0.127 (lines: 41)
rg (ASCII) 2.035 +/- 0.014 (lines: 0)
ag (ASCII) 1.093 +/- 0.006 (lines: 0)
grep (ASCII) 1.085 +/- 0.015 (lines: 0)
ugrep (ASCII)* 0.832 +/- 0.002 (lines: 0)*
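
Each summary row above appears to be the mean and the sample standard deviation (dividing by n - 1) of the three raw timings recorded for that command. This is inferred from the numbers, not stated in the diff. A minimal sketch that recomputes the `ugrep (lines)` row of `subtitles_ru_literal` from its raw durations; the function names `mean` and `sample_stddev` are illustrative and are not part of the benchsuite:

```rust
/// Arithmetic mean of the samples.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sample standard deviation (divides by n - 1, not n).
/// Assumes at least two samples.
fn sample_stddev(xs: &[f64]) -> f64 {
    let m = mean(xs);
    let sum_sq: f64 = xs.iter().map(|x| (x - m).powi(2)).sum();
    (sum_sq / (xs.len() as f64 - 1.0)).sqrt()
}

fn main() {
    // Raw durations in seconds for `ugrep (lines)` on subtitles_ru_literal,
    // copied from the raw benchmark rows.
    let samples = [0.70003342628479, 0.650275468826294, 0.689772367477417];
    // Prints "0.680 +/- 0.026", matching the summary row.
    println!("{:.3} +/- {:.3}", mean(&samples), sample_stddev(&samples));
}
```

Dividing by n instead of n - 1 would give roughly 0.021 for this row, which is why the sample form is the better fit for the reported 0.026.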

271
build.rs

@ -1,239 +1,46 @@
use std::env;
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;
use std::process;
use clap::Shell;
use app::{RGArg, RGArgKind};
#[allow(dead_code)]
#[path = "crates/core/app.rs"]
mod app;
fn main() {
// OUT_DIR is set by Cargo and it's where any additional build artifacts
// are written.
let outdir = match env::var_os("OUT_DIR") {
Some(outdir) => outdir,
None => {
eprintln!(
"OUT_DIR environment variable not defined. \
Please file a bug: \
https://github.com/BurntSushi/ripgrep/issues/new"
);
process::exit(1);
}
};
fs::create_dir_all(&outdir).unwrap();
let stamp_path = Path::new(&outdir).join("ripgrep-stamp");
if let Err(err) = File::create(&stamp_path) {
panic!("failed to write {}: {}", stamp_path.display(), err);
}
if let Err(err) = generate_man_page(&outdir) {
eprintln!("failed to generate man page: {}", err);
}
// Use clap to build completion files.
let mut app = app::app();
app.gen_completions("rg", Shell::Bash, &outdir);
app.gen_completions("rg", Shell::Fish, &outdir);
app.gen_completions("rg", Shell::PowerShell, &outdir);
// Note that we do not use clap's support for zsh. Instead, zsh completions
// are manually maintained in `complete/_rg`.
// Make the current git hash available to the build.
if let Some(rev) = git_revision_hash() {
println!("cargo:rustc-env=RIPGREP_BUILD_GIT_HASH={}", rev);
}
set_git_revision_hash();
set_windows_exe_options();
}
fn git_revision_hash() -> Option<String> {
let result = process::Command::new("git")
.args(&["rev-parse", "--short=10", "HEAD"])
.output();
result.ok().and_then(|output| {
let v = String::from_utf8_lossy(&output.stdout).trim().to_string();
if v.is_empty() {
None
} else {
Some(v)
}
})
/// Embed a Windows manifest and set some linker options.
///
/// The main reason for this is to enable long path support on Windows. This
/// still, I believe, requires enabling long path support in the registry. But
/// if that's enabled, then this will let ripgrep use C:\... style paths that
/// are longer than 260 characters.
fn set_windows_exe_options() {
static MANIFEST: &str = "pkg/windows/Manifest.xml";
let Ok(target_os) = std::env::var("CARGO_CFG_TARGET_OS") else { return };
let Ok(target_env) = std::env::var("CARGO_CFG_TARGET_ENV") else { return };
if !(target_os == "windows" && target_env == "msvc") {
return;
}
let Ok(mut manifest) = std::env::current_dir() else { return };
manifest.push(MANIFEST);
let Some(manifest) = manifest.to_str() else { return };
println!("cargo:rerun-if-changed={}", MANIFEST);
// Embed the Windows application manifest file.
println!("cargo:rustc-link-arg-bin=rg=/MANIFEST:EMBED");
println!("cargo:rustc-link-arg-bin=rg=/MANIFESTINPUT:{manifest}");
// Turn linker warnings into errors. Helps debugging, otherwise the
// warnings get squashed (I believe).
println!("cargo:rustc-link-arg-bin=rg=/WX");
}
fn generate_man_page<P: AsRef<Path>>(outdir: P) -> io::Result<()> {
// If asciidoctor isn't installed, fallback to asciidoc.
if let Err(err) = process::Command::new("asciidoctor").output() {
eprintln!(
"Could not run 'asciidoctor' binary, falling back to 'a2x'."
);
eprintln!("Error from running 'asciidoctor': {}", err);
return legacy_generate_man_page::<P>(outdir);
/// Make the current git hash available to the build as the environment
/// variable `RIPGREP_BUILD_GIT_HASH`.
fn set_git_revision_hash() {
use std::process::Command;
let args = &["rev-parse", "--short=10", "HEAD"];
let Ok(output) = Command::new("git").args(args).output() else { return };
let rev = String::from_utf8_lossy(&output.stdout).trim().to_string();
if rev.is_empty() {
return;
}
// 1. Read asciidoctor template.
// 2. Interpolate template with auto-generated docs.
// 3. Save interpolation to disk.
// 4. Use asciidoctor to convert to man page.
let outdir = outdir.as_ref();
let cwd = env::current_dir()?;
let tpl_path = cwd.join("doc").join("rg.1.txt.tpl");
let txt_path = outdir.join("rg.1.txt");
let mut tpl = String::new();
File::open(&tpl_path)?.read_to_string(&mut tpl)?;
let options =
formatted_options()?.replace("&#123;", "{").replace("&#125;", "}");
tpl = tpl.replace("{OPTIONS}", &options);
let githash = git_revision_hash();
let githash = githash.as_ref().map(|x| &**x);
tpl = tpl.replace("{VERSION}", &app::long_version(githash, false));
File::create(&txt_path)?.write_all(tpl.as_bytes())?;
let result = process::Command::new("asciidoctor")
.arg("--doctype")
.arg("manpage")
.arg("--backend")
.arg("manpage")
.arg(&txt_path)
.spawn()?
.wait()?;
if !result.success() {
let msg =
format!("'asciidoctor' failed with exit code {:?}", result.code());
return Err(ioerr(msg));
}
Ok(())
}
fn legacy_generate_man_page<P: AsRef<Path>>(outdir: P) -> io::Result<()> {
// If asciidoc isn't installed, then don't do anything.
if let Err(err) = process::Command::new("a2x").output() {
eprintln!("Could not run 'a2x' binary, skipping man page generation.");
eprintln!("Error from running 'a2x': {}", err);
return Ok(());
}
// 1. Read asciidoc template.
// 2. Interpolate template with auto-generated docs.
// 3. Save interpolation to disk.
// 4. Use a2x (part of asciidoc) to convert to man page.
let outdir = outdir.as_ref();
let cwd = env::current_dir()?;
let tpl_path = cwd.join("doc").join("rg.1.txt.tpl");
let txt_path = outdir.join("rg.1.txt");
let mut tpl = String::new();
File::open(&tpl_path)?.read_to_string(&mut tpl)?;
tpl = tpl.replace("{OPTIONS}", &formatted_options()?);
let githash = git_revision_hash();
let githash = githash.as_ref().map(|x| &**x);
tpl = tpl.replace("{VERSION}", &app::long_version(githash, false));
File::create(&txt_path)?.write_all(tpl.as_bytes())?;
let result = process::Command::new("a2x")
.arg("--no-xmllint")
.arg("--doctype")
.arg("manpage")
.arg("--format")
.arg("manpage")
.arg(&txt_path)
.spawn()?
.wait()?;
if !result.success() {
let msg = format!("'a2x' failed with exit code {:?}", result.code());
return Err(ioerr(msg));
}
Ok(())
}
fn formatted_options() -> io::Result<String> {
let mut args = app::all_args_and_flags();
args.sort_by(|x1, x2| x1.name.cmp(&x2.name));
let mut formatted = vec![];
for arg in args {
if arg.hidden {
continue;
}
// ripgrep only has two positional arguments, and probably will only
// ever have two positional arguments, so we just hardcode them into
// the template.
if let app::RGArgKind::Positional { .. } = arg.kind {
continue;
}
formatted.push(formatted_arg(&arg)?);
}
Ok(formatted.join("\n\n"))
}
fn formatted_arg(arg: &RGArg) -> io::Result<String> {
match arg.kind {
RGArgKind::Positional { .. } => {
panic!("unexpected positional argument")
}
RGArgKind::Switch { long, short, multiple } => {
let mut out = vec![];
let mut header = format!("--{}", long);
if let Some(short) = short {
header = format!("-{}, {}", short, header);
}
if multiple {
header = format!("*{}* ...::", header);
} else {
header = format!("*{}*::", header);
}
writeln!(out, "{}", header)?;
writeln!(out, "{}", formatted_doc_txt(arg)?)?;
Ok(String::from_utf8(out).unwrap())
}
RGArgKind::Flag { long, short, value_name, multiple, .. } => {
let mut out = vec![];
let mut header = format!("--{}", long);
if let Some(short) = short {
header = format!("-{}, {}", short, header);
}
if multiple {
header = format!("*{}* _{}_ ...::", header, value_name);
} else {
header = format!("*{}* _{}_::", header, value_name);
}
writeln!(out, "{}", header)?;
writeln!(out, "{}", formatted_doc_txt(arg)?)?;
Ok(String::from_utf8(out).unwrap())
}
}
}
fn formatted_doc_txt(arg: &RGArg) -> io::Result<String> {
let paragraphs: Vec<String> = arg
.doc_long
.replace("{", "&#123;")
.replace("}", r"&#125;")
// Hack to render ** literally in man page correctly. We can't put
// these crazy +++ in the help text directly, since that shows
// literally in --help output.
.replace("*-g 'foo/**'*", "*-g +++'foo/**'+++*")
.split("\n\n")
.map(|s| s.to_string())
.collect();
if paragraphs.is_empty() {
return Err(ioerr(format!("missing docs for --{}", arg.name)));
}
let first = format!(" {}", paragraphs[0].replace("\n", "\n "));
if paragraphs.len() == 1 {
return Ok(first);
}
Ok(format!("{}\n+\n{}", first, paragraphs[1..].join("\n+\n")))
}
fn ioerr(msg: String) -> io::Error {
io::Error::new(io::ErrorKind::Other, msg)
println!("cargo:rustc-env=RIPGREP_BUILD_GIT_HASH={}", rev);
}
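The paragraph handling in `formatted_doc_txt` above can be sketched in isolation. This is a minimal illustration, not the code ripgrep shipped: `join_doc_paragraphs` and its `indent` parameter are hypothetical, and the indent is passed in because the exact leading whitespace is not recoverable from this listing. It relies on the AsciiDoc convention that a line containing only `+` attaches the next paragraph to the current list entry.

```rust
/// Formats a multi-paragraph description as the body of one AsciiDoc
/// labeled-list entry. Hypothetical helper, for illustration only.
fn join_doc_paragraphs(doc: &str, indent: &str) -> String {
    let mut paragraphs = doc.split("\n\n");
    // `split` always yields at least one item, even for an empty string.
    let first = paragraphs.next().unwrap_or("");
    // Indent every line of the first paragraph.
    let continuation = format!("\n{}", indent);
    let mut out = format!("{}{}", indent, first.replace('\n', &continuation));
    for paragraph in paragraphs {
        // A lone `+` keeps the following paragraph inside the same entry.
        out.push_str("\n+\n");
        out.push_str(paragraph);
    }
    out
}
```

Because `str::split` never yields an empty iterator, a check along the lines of `paragraphs.is_empty()` cannot fire on its own; a sketch that needs to reject missing docs would have to test for an empty first paragraph instead.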

ci/build-and-publish-m2 Executable file

@ -0,0 +1,43 @@
#!/bin/bash
# This script builds a ripgrep release for the aarch64-apple-darwin target.
# At time of writing (2023-11-21), GitHub Actions does not offer free Apple silicon
# runners. Since I have somewhat recently acquired an M2 mac mini, I just use
# this script to build the release tarball and upload it with `gh`.
#
# Once GitHub Actions has proper support for Apple silicon, we should add it
# to our release workflow and drop this script.
set -e
version="$1"
if [ -z "$version" ]; then
echo "missing version" >&2
echo "Usage: $(basename "$0") <version>" >&2
exit 1
fi
if ! grep -q "version = \"$version\"" Cargo.toml; then
echo "version does not match Cargo.toml" >&2
exit 1
fi
target=aarch64-apple-darwin
cargo build --release --features pcre2 --target $target
BIN=target/$target/release/rg
NAME=ripgrep-$version-$target
ARCHIVE="deployment/m2/$NAME"
mkdir -p "$ARCHIVE"/{complete,doc}
cp "$BIN" "$ARCHIVE"/
strip "$ARCHIVE/rg"
cp {README.md,COPYING,UNLICENSE,LICENSE-MIT} "$ARCHIVE"/
cp {CHANGELOG.md,FAQ.md,GUIDE.md} "$ARCHIVE"/doc/
"$BIN" --generate complete-bash > "$ARCHIVE/complete/rg.bash"
"$BIN" --generate complete-fish > "$ARCHIVE/complete/rg.fish"
"$BIN" --generate complete-powershell > "$ARCHIVE/complete/_rg.ps1"
"$BIN" --generate complete-zsh > "$ARCHIVE/complete/_rg"
"$BIN" --generate man > "$ARCHIVE/doc/rg.1"
tar c -C deployment/m2 -z -f "$ARCHIVE.tar.gz" "$NAME"
shasum -a 256 "$ARCHIVE.tar.gz" > "$ARCHIVE.tar.gz.sha256"
gh release upload "$version" "$ARCHIVE.tar.gz" "$ARCHIVE.tar.gz.sha256"

@ -1,42 +0,0 @@
#!/bin/bash
set -e
D="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
# This script builds a binary dpkg for Debian based distros. It does not
# currently run in CI, and is instead run manually and the resulting dpkg is
# uploaded to GitHub via the web UI.
#
# Note that this requires 'cargo deb', which can be installed with
# 'cargo install cargo-deb'.
#
# This should be run from the root of the ripgrep repo.
if ! command -V cargo-deb > /dev/null 2>&1; then
echo "cargo-deb command missing" >&2
exit 1
fi
if ! command -V asciidoctor > /dev/null 2>&1; then
echo "asciidoctor command missing" >&2
exit 1
fi
# 'cargo deb' does not seem to provide a way to specify an asset that is
# created at build time, such as ripgrep's man page. To work around this,
# we force a debug build, copy out the man page (and shell completions)
# produced from that build, put it into a predictable location and then build
# the deb, which knows where to look.
cargo build
DEPLOY_DIR=deployment/deb
OUT_DIR="$("$D"/cargo-out-dir target/debug/)"
mkdir -p "$DEPLOY_DIR"
# Copy man page and shell completions.
cp "$OUT_DIR"/{rg.1,rg.bash,rg.fish} "$DEPLOY_DIR/"
cp complete/_rg "$DEPLOY_DIR/"
# Since we're distributing the dpkg, we don't know whether the user will have
# PCRE2 installed, so just do a static build.
PCRE2_SYS_STATIC=1 cargo deb --target x86_64-unknown-linux-musl

@ -1,19 +0,0 @@
#!/bin/bash
# Finds Cargo's `OUT_DIR` directory from the most recent build.
#
# This requires one parameter corresponding to the target directory
# to search for the build output.
if [ $# != 1 ]; then
echo "Usage: $(basename "$0") <target-dir>" >&2
exit 2
fi
# This works by finding the most recent stamp file, which is produced by
# every ripgrep build.
target_dir="$1"
find "$target_dir" -name ripgrep-stamp -print0 \
| xargs -0 ls -t \
| head -n1 \
| xargs dirname

@ -1,24 +0,0 @@
These are Docker images used for cross compilation in CI builds (or locally)
via the [Cross](https://github.com/rust-embedded/cross) tool.
The Cross tool actually provides its own Docker images, and all Docker images
in this directory are derived from one of them. We provide our own in order
to customize the environment. For example, we need to install some things like
`asciidoctor` in order to generate man pages. We also install compression tools
like `xz` so that tests for the `-z/--search-zip` flag are run.
If you make a change to a Docker image, then you can re-build it. `cd` into the
directory containing the `Dockerfile` and run:
$ cd x86_64-unknown-linux-musl
$ ./build
At this point, subsequent uses of `cross` will now use your built image since
Docker prefers local images over remote images. In order to make these changes
stick, they need to be pushed to Docker Hub:
$ docker push burntsushi/cross:x86_64-unknown-linux-musl
Of course, only I (BurntSushi) can push to that location. To make `cross` use
a different location, edit `Cross.toml` in the root of this repo to use a
different image name for the desired target.

@ -1,4 +0,0 @@
FROM rustembedded/cross:arm-unknown-linux-gnueabihf
COPY stage/ubuntu-install-packages /
RUN /ubuntu-install-packages

@ -1,5 +0,0 @@
#!/bin/sh
mkdir -p stage
cp ../../ubuntu-install-packages ./stage/
docker build -t burntsushi/cross:arm-unknown-linux-gnueabihf .

@ -1,4 +0,0 @@
FROM rustembedded/cross:i686-unknown-linux-gnu
COPY stage/ubuntu-install-packages /
RUN /ubuntu-install-packages

@ -1,5 +0,0 @@
#!/bin/sh
mkdir -p stage
cp ../../ubuntu-install-packages ./stage/
docker build -t burntsushi/cross:i686-unknown-linux-gnu .

@ -1,4 +0,0 @@
FROM rustembedded/cross:mips64-unknown-linux-gnuabi64
COPY stage/ubuntu-install-packages /
RUN /ubuntu-install-packages

@ -1,5 +0,0 @@
#!/bin/sh
mkdir -p stage
cp ../../ubuntu-install-packages ./stage/
docker build -t burntsushi/cross:mips64-unknown-linux-gnuabi64 .

@ -1,4 +0,0 @@
FROM rustembedded/cross:x86_64-unknown-linux-musl
COPY stage/ubuntu-install-packages /
RUN /ubuntu-install-packages

@ -1,5 +0,0 @@
#!/bin/sh
mkdir -p stage
cp ../../ubuntu-install-packages ./stage/
docker build -t burntsushi/cross:x86_64-unknown-linux-musl .

@ -1,3 +0,0 @@
#!/bin/sh
brew install asciidoctor

@ -19,7 +19,7 @@ get_comp_args() {
main() {
local diff
local rg="${0:a:h}/../${TARGET_DIR:-target}/release/rg"
local _rg="${0:a:h}/../complete/_rg"
local _rg="${0:a:h}/../crates/core/flags/complete/rg.zsh"
local -a help_args comp_args
[[ -e $rg ]] || rg=${rg/%\/release\/rg/\/debug\/rg}

@ -11,6 +11,4 @@ if ! command -V sudo; then
fi
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
asciidoctor \
zsh xz-utils liblz4-tool musl-tools \
brotli zstd
zsh xz-utils liblz4-tool musl-tools brotli zstd

@ -1,6 +1,6 @@
[package]
name = "grep-cli"
version = "0.1.6" #:version
version = "0.1.11" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Utilities for search oriented command line applications.
@ -11,17 +11,16 @@ repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/cli"
readme = "README.md"
keywords = ["regex", "grep", "cli", "utility", "util"]
license = "Unlicense OR MIT"
edition = "2018"
edition = "2021"
[dependencies]
atty = "0.2.11"
bstr = "0.2.0"
globset = { version = "0.4.7", path = "../globset" }
lazy_static = "1.1.0"
log = "0.4.5"
regex = "1.1"
same-file = "1.0.4"
termcolor = "1.0.4"
bstr = { version = "1.6.2", features = ["std"] }
globset = { version = "0.4.15", path = "../globset" }
log = "0.4.20"
termcolor = "1.3.0"
[target.'cfg(windows)'.dependencies.winapi-util]
version = "0.1.1"
version = "0.1.6"
[target.'cfg(unix)'.dependencies.libc]
version = "0.2.148"

@ -1,8 +1,10 @@
use std::ffi::{OsStr, OsString};
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;
use std::{
ffi::{OsStr, OsString},
fs::File,
io,
path::{Path, PathBuf},
process::Command,
};
use globset::{Glob, GlobSet, GlobSetBuilder};
@ -18,7 +20,7 @@ pub struct DecompressionMatcherBuilder {
}
/// A representation of a single command for decompressing data
/// out-of-proccess.
/// out-of-process.
#[derive(Clone, Debug)]
struct DecompressionCommand {
/// The glob that matches this command.
@ -132,7 +134,7 @@ impl DecompressionMatcherBuilder {
A: AsRef<OsStr>,
{
let glob = glob.to_string();
let bin = resolve_binary(Path::new(program.as_ref()))?;
let bin = try_resolve_binary(Path::new(program.as_ref()))?;
let args =
args.into_iter().map(|a| a.as_ref().to_os_string()).collect();
self.commands.push(DecompressionCommand { glob, bin, args });
@ -161,7 +163,7 @@ impl DecompressionMatcher {
/// Create a new matcher with default rules.
///
/// To add more matching rules, build a matcher with
/// [`DecompressionMatcherBuilder`](struct.DecompressionMatcherBuilder.html).
/// [`DecompressionMatcherBuilder`].
pub fn new() -> DecompressionMatcher {
DecompressionMatcherBuilder::new()
.build()
@ -221,9 +223,8 @@ impl DecompressionReaderBuilder {
path: P,
) -> Result<DecompressionReader, CommandError> {
let path = path.as_ref();
let mut cmd = match self.matcher.command(path) {
None => return DecompressionReader::new_passthru(path),
Some(cmd) => cmd,
let Some(mut cmd) = self.matcher.command(path) else {
return DecompressionReader::new_passthru(path);
};
cmd.arg(path);
@ -302,9 +303,7 @@ impl DecompressionReaderBuilder {
/// The default matching rules are probably good enough for most cases, and if
/// they require revision, pull requests are welcome. In cases where they must
/// be changed or extended, they can be customized through the use of
/// [`DecompressionMatcherBuilder`](struct.DecompressionMatcherBuilder.html)
/// and
/// [`DecompressionReaderBuilder`](struct.DecompressionReaderBuilder.html).
/// [`DecompressionMatcherBuilder`] and [`DecompressionReaderBuilder`].
///
/// By default, this reader will asynchronously read the process's stderr.
/// This prevents subtle deadlocking bugs for noisy processes that write a lot
@ -320,15 +319,14 @@ impl DecompressionReaderBuilder {
/// matcher.
///
/// ```no_run
/// use std::io::Read;
/// use std::process::Command;
/// use std::{io::Read, process::Command};
///
/// use grep_cli::DecompressionReader;
///
/// # fn example() -> Result<(), Box<::std::error::Error>> {
/// let mut rdr = DecompressionReader::new("/usr/share/man/man1/ls.1.gz")?;
/// let mut contents = vec![];
/// rdr.read_to_end(&mut contents)?;
/// # Ok(()) }
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
#[derive(Debug)]
pub struct DecompressionReader {
@ -347,9 +345,7 @@ impl DecompressionReader {
///
/// This uses the default matching rules for determining how to decompress
/// the given file. To change those matching rules, use
/// [`DecompressionReaderBuilder`](struct.DecompressionReaderBuilder.html)
/// and
/// [`DecompressionMatcherBuilder`](struct.DecompressionMatcherBuilder.html).
/// [`DecompressionReaderBuilder`] and [`DecompressionMatcherBuilder`].
///
/// When creating readers for many paths, it is better to use the builder
/// since it will amortize the cost of constructing the matcher.
@ -382,7 +378,7 @@ impl DecompressionReader {
///
/// `close` is also called in `drop` as a last line of defense against
/// resource leakage. Any error from the child process is then printed as a
/// warning to stderr. This can be avoided by explictly calling `close`
/// warning to stderr. This can be avoided by explicitly calling `close`
/// before the CommandReader is dropped.
pub fn close(&mut self) -> io::Result<()> {
match self.rdr {
@ -421,30 +417,52 @@ impl io::Read for DecompressionReader {
/// On non-Windows, this is a no-op.
pub fn resolve_binary<P: AsRef<Path>>(
prog: P,
) -> Result<PathBuf, CommandError> {
if !cfg!(windows) {
return Ok(prog.as_ref().to_path_buf());
}
try_resolve_binary(prog)
}
/// Resolves the path to a program by searching for the program in `PATH`.
///
/// If the program could not be resolved, then an error is returned.
///
/// The purpose of doing this instead of passing the path to the program
/// directly to Command::new is that Command::new will hand relative paths
/// to CreateProcess on Windows, which will implicitly search the current
/// working directory for the executable. This could be undesirable for
/// security reasons. e.g., running ripgrep with the -z/--search-zip flag on an
/// untrusted directory tree could result in arbitrary programs executing on
/// Windows.
///
/// Note that this could still return a relative path if PATH contains a
/// relative path. We permit this since it is assumed that the user has set
/// this explicitly, and thus, desires this behavior.
///
/// If the path is already an absolute path, this will return immediately.
fn try_resolve_binary<P: AsRef<Path>>(
prog: P,
) -> Result<PathBuf, CommandError> {
use std::env;
fn is_exe(path: &Path) -> bool {
let md = match path.metadata() {
Err(_) => return false,
Ok(md) => md,
};
let Ok(md) = path.metadata() else { return false };
!md.is_dir()
}
let prog = prog.as_ref();
if !cfg!(windows) || prog.is_absolute() {
if prog.is_absolute() {
return Ok(prog.to_path_buf());
}
let syspaths = match env::var_os("PATH") {
Some(syspaths) => syspaths,
None => {
let msg = "system PATH environment variable not found";
return Err(CommandError::io(io::Error::new(
io::ErrorKind::Other,
msg,
)));
}
let Some(syspaths) = env::var_os("PATH") else {
let msg = "system PATH environment variable not found";
return Err(CommandError::io(io::Error::new(
io::ErrorKind::Other,
msg,
)));
};
for syspath in env::split_paths(&syspaths) {
if syspath.as_os_str().is_empty() {
@ -455,9 +473,11 @@ pub fn resolve_binary<P: AsRef<Path>>(
return Ok(abs_prog.to_path_buf());
}
if abs_prog.extension().is_none() {
let abs_prog = abs_prog.with_extension("exe");
if is_exe(&abs_prog) {
return Ok(abs_prog.to_path_buf());
for extension in ["com", "exe"] {
let abs_prog = abs_prog.with_extension(extension);
if is_exe(&abs_prog) {
return Ok(abs_prog.to_path_buf());
}
}
}
}
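The search order used by `try_resolve_binary` above (bare name first, then `.com`, then `.exe` when the name has no extension) can be exercised without touching the filesystem. The following is a sketch under stated assumptions, not the crate's API: `resolve_in_dirs` and its `is_exe` predicate are hypothetical stand-ins for the `PATH` lookup and the metadata check, and skipping empty `PATH` entries is assumed because the hunk cuts off at that branch.

```rust
use std::path::{Path, PathBuf};

/// Searches `dirs` in order for `prog`. Hypothetical helper: `is_exe` stands
/// in for the real filesystem check so the order can be tested in memory.
fn resolve_in_dirs<F>(prog: &Path, dirs: &[PathBuf], is_exe: F) -> Option<PathBuf>
where
    F: Fn(&Path) -> bool,
{
    // Absolute paths are returned as-is, without any lookup.
    if prog.is_absolute() {
        return Some(prog.to_path_buf());
    }
    for dir in dirs {
        // Assumption: empty entries are skipped rather than treated as ".".
        if dir.as_os_str().is_empty() {
            continue;
        }
        let candidate = dir.join(prog);
        if is_exe(&candidate) {
            return Some(candidate);
        }
        // Only names without an extension get the fallback extensions.
        if candidate.extension().is_none() {
            for ext in ["com", "exe"] {
                let with_ext = candidate.with_extension(ext);
                if is_exe(&with_ext) {
                    return Some(with_ext);
                }
            }
        }
    }
    None
}
```

Passing the predicate as a closure keeps the sketch deterministic; a real caller would supply something like the `is_exe` helper shown in the diff.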

@ -1,28 +1,14 @@
use std::ffi::OsStr;
use std::str;
use bstr::{ByteSlice, ByteVec};
/// A single state in the state machine used by `unescape`.
#[derive(Clone, Copy, Eq, PartialEq)]
enum State {
/// The state after seeing a `\`.
Escape,
/// The state after seeing a `\x`.
HexFirst,
/// The state after seeing a `\x[0-9A-Fa-f]`.
HexSecond(char),
/// Default state.
Literal,
}
/// Escapes arbitrary bytes into a human readable string.
///
/// This converts `\t`, `\r` and `\n` into their escaped forms. It also
/// converts the non-printable subset of ASCII in addition to invalid UTF-8
/// bytes to hexadecimal escape sequences. Everything else is left as is.
///
/// The dual of this routine is [`unescape`](fn.unescape.html).
/// The dual of this routine is [`unescape`].
///
/// # Example
///
@ -38,22 +24,12 @@ enum State {
/// assert_eq!(r"foo\nbar\xFFbaz", escape(b"foo\nbar\xFFbaz"));
/// ```
pub fn escape(bytes: &[u8]) -> String {
let mut escaped = String::new();
for (s, e, ch) in bytes.char_indices() {
if ch == '\u{FFFD}' {
for b in bytes[s..e].bytes() {
escape_byte(b, &mut escaped);
}
} else {
escape_char(ch, &mut escaped);
}
}
escaped
bytes.escape_bytes().to_string()
}
/// Escapes an OS string into a human readable string.
///
/// This is like [`escape`](fn.escape.html), but accepts an OS string.
/// This is like [`escape`], but accepts an OS string.
pub fn escape_os(string: &OsStr) -> String {
escape(Vec::from_os_str_lossy(string).as_bytes())
}
@ -72,7 +48,7 @@ pub fn escape_os(string: &OsStr) -> String {
/// capable of specifying arbitrary bytes or otherwise make it easier to
/// specify non-printable characters.
///
/// The dual of this routine is [`escape`](fn.escape.html).
/// The dual of this routine is [`escape`].
///
/// # Example
///
@ -89,81 +65,12 @@ pub fn escape_os(string: &OsStr) -> String {
/// assert_eq!(&b"foo\nbar\xFFbaz"[..], &*unescape(r"foo\nbar\xFFbaz"));
/// ```
pub fn unescape(s: &str) -> Vec<u8> {
use self::State::*;
let mut bytes = vec![];
let mut state = Literal;
for c in s.chars() {
match state {
Escape => match c {
'\\' => {
bytes.push(b'\\');
state = Literal;
}
'n' => {
bytes.push(b'\n');
state = Literal;
}
'r' => {
bytes.push(b'\r');
state = Literal;
}
't' => {
bytes.push(b'\t');
state = Literal;
}
'x' => {
state = HexFirst;
}
c => {
bytes.extend(format!(r"\{}", c).into_bytes());
state = Literal;
}
},
HexFirst => match c {
'0'..='9' | 'A'..='F' | 'a'..='f' => {
state = HexSecond(c);
}
c => {
bytes.extend(format!(r"\x{}", c).into_bytes());
state = Literal;
}
},
HexSecond(first) => match c {
'0'..='9' | 'A'..='F' | 'a'..='f' => {
let ordinal = format!("{}{}", first, c);
let byte = u8::from_str_radix(&ordinal, 16).unwrap();
bytes.push(byte);
state = Literal;
}
c => {
let original = format!(r"\x{}{}", first, c);
bytes.extend(original.into_bytes());
state = Literal;
}
},
Literal => match c {
'\\' => {
state = Escape;
}
c => {
bytes.extend(c.to_string().as_bytes());
}
},
}
}
match state {
Escape => bytes.push(b'\\'),
HexFirst => bytes.extend(b"\\x"),
HexSecond(c) => bytes.extend(format!("\\x{}", c).into_bytes()),
Literal => {}
}
bytes
Vec::unescape_bytes(s)
}
/// Unescapes an OS string.
///
/// This is like [`unescape`](fn.unescape.html), but accepts an OS string.
/// This is like [`unescape`], but accepts an OS string.
///
/// Note that this first lossily decodes the given OS string as UTF-8. That
/// is, an escaped string (the thing given) should be valid UTF-8.
@ -171,27 +78,6 @@ pub fn unescape_os(string: &OsStr) -> Vec<u8> {
unescape(&string.to_string_lossy())
}
/// Adds the given codepoint to the given string, escaping it if necessary.
fn escape_char(cp: char, into: &mut String) {
if cp.is_ascii() {
escape_byte(cp as u8, into);
} else {
into.push(cp);
}
}
/// Adds the given byte to the given string, escaping it if necessary.
fn escape_byte(byte: u8, into: &mut String) {
match byte {
0x21..=0x5B | 0x5D..=0x7D => into.push(byte as char),
b'\n' => into.push_str(r"\n"),
b'\r' => into.push_str(r"\r"),
b'\t' => into.push_str(r"\t"),
b'\\' => into.push_str(r"\\"),
_ => into.push_str(&format!(r"\x{:02X}", byte)),
}
}
#[cfg(test)]
mod tests {
use super::{escape, unescape};
@ -215,7 +101,8 @@ mod tests {
#[test]
fn nul() {
assert_eq!(b(b"\x00"), unescape(r"\x00"));
assert_eq!(r"\x00", escape(b"\x00"));
assert_eq!(b(b"\x00"), unescape(r"\0"));
assert_eq!(r"\0", escape(b"\x00"));
}
#[test]
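For readers comparing the removed state machine with the new one-line delegation to `bstr`, the escape grammar exercised by the tests (`\n`, `\r`, `\t`, `\\`, `\0` and `\xHH`, with anything else passed through literally) can be sketched with the standard library alone. This is an illustrative approximation, not `bstr`'s implementation; `unescape_sketch` and `hex_val` are hypothetical names, and any escapes `bstr` supports beyond the ones listed are not modeled.

```rust
/// Returns the value of an ASCII hex digit, if `b` is one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Unescapes a small, fixed set of escapes. Illustrative only.
fn unescape_sketch(s: &str) -> Vec<u8> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'\\' {
            out.push(bytes[i]);
            i += 1;
            continue;
        }
        // Single-character escapes: consume the backslash and one byte.
        let simple = match bytes.get(i + 1).copied() {
            Some(b'n') => Some(b'\n'),
            Some(b'r') => Some(b'\r'),
            Some(b't') => Some(b'\t'),
            Some(b'0') => Some(0),
            Some(b'\\') => Some(b'\\'),
            _ => None,
        };
        if let Some(byte) = simple {
            out.push(byte);
            i += 2;
            continue;
        }
        // `\xHH`: exactly two hex digits must follow the `x`.
        if bytes.get(i + 1).copied() == Some(b'x') {
            if let Some(hex) = bytes.get(i + 2..i + 4) {
                if let (Some(hi), Some(lo)) = (hex_val(hex[0]), hex_val(hex[1])) {
                    out.push(hi * 16 + lo);
                    i += 4;
                    continue;
                }
            }
        }
        // Unknown or incomplete escape: keep the backslash literally.
        out.push(b'\\');
        i += 1;
    }
    out
}
```

Checking the hex digits explicitly, rather than handing the slice to `u8::from_str_radix`, avoids accepting a leading `+` sign as part of the escape.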

@ -0,0 +1,85 @@
use std::{ffi::OsString, io};
/// Returns the hostname of the current system.
///
/// It is unusual, although technically possible, for this routine to return
/// an error. It is difficult to list out the error conditions, but one such
/// possibility is platform support.
///
/// # Platform specific behavior
///
/// On Windows, this currently uses the "physical DNS hostname" computer name.
/// This may change in the future.
///
/// On Unix, this returns the result of the `gethostname` function from the
/// `libc` linked into the program.
pub fn hostname() -> io::Result<OsString> {
#[cfg(windows)]
{
use winapi_util::sysinfo::{get_computer_name, ComputerNameKind};
get_computer_name(ComputerNameKind::PhysicalDnsHostname)
}
#[cfg(unix)]
{
gethostname()
}
#[cfg(not(any(windows, unix)))]
{
Err(io::Error::new(
io::ErrorKind::Other,
"hostname could not be found on unsupported platform",
))
}
}
#[cfg(unix)]
fn gethostname() -> io::Result<OsString> {
use std::os::unix::ffi::OsStringExt;
// SAFETY: There don't appear to be any safety requirements for calling
// sysconf.
let limit = unsafe { libc::sysconf(libc::_SC_HOST_NAME_MAX) };
if limit == -1 {
// It is in theory possible for sysconf to return -1 for a limit but
// *not* set errno, in which case, io::Error::last_os_error is
// indeterminate. But untangling that is super annoying because std
// doesn't expose any unix-specific APIs for inspecting the errno. (We
// could do it ourselves, but it just doesn't seem worth doing?)
return Err(io::Error::last_os_error());
}
let Ok(maxlen) = usize::try_from(limit) else {
let msg = format!("host name max limit ({}) overflowed usize", limit);
return Err(io::Error::new(io::ErrorKind::Other, msg));
};
// maxlen here includes the NUL terminator.
let mut buf = vec![0; maxlen];
// SAFETY: The pointer we give is valid as it is derived directly from a
// Vec. Similarly, `maxlen` is the length of our Vec, and is thus valid
// to write to.
let rc = unsafe {
libc::gethostname(buf.as_mut_ptr().cast::<libc::c_char>(), maxlen)
};
if rc == -1 {
return Err(io::Error::last_os_error());
}
// POSIX says that if the hostname is bigger than `maxlen`, then it may
// write a truncated name back that is not necessarily NUL terminated (wtf,
// lol). So if we can't find a NUL terminator, then just give up.
let Some(zeropos) = buf.iter().position(|&b| b == 0) else {
let msg = "could not find NUL terminator in hostname";
return Err(io::Error::new(io::ErrorKind::Other, msg));
};
buf.truncate(zeropos);
buf.shrink_to_fit();
Ok(OsString::from_vec(buf))
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn print_hostname() {
println!("{:?}", hostname().unwrap());
}
}
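The NUL handling at the end of `gethostname` above is the only part that can be checked without calling into libc. A minimal sketch of just that step follows; `truncate_at_nul` is a hypothetical helper, and it returns `Option` rather than the `io::Result` used in the diff to keep the example small.

```rust
/// Cuts `buf` at its first NUL byte. Returns `None` when no terminator is
/// present, mirroring the "give up" branch in the diff. Hypothetical helper.
fn truncate_at_nul(mut buf: Vec<u8>) -> Option<Vec<u8>> {
    let zeropos = buf.iter().position(|&b| b == 0)?;
    buf.truncate(zeropos);
    Some(buf)
}
```

On Unix, the resulting bytes could then be handed to `OsString::from_vec`, as the function in the diff does.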

@ -1,14 +1,7 @@
use std::error;
use std::fmt;
use std::io;
use std::num::ParseIntError;
use regex::Regex;
/// An error that occurs when parsing a human readable size description.
///
/// This error provides an end user friendly message describing why the
/// description coudln't be parsed and what the expected format is.
/// description couldn't be parsed and what the expected format is.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct ParseSizeError {
original: String,
@ -18,7 +11,7 @@ pub struct ParseSizeError {
#[derive(Clone, Debug, Eq, PartialEq)]
enum ParseSizeErrorKind {
InvalidFormat,
InvalidInt(ParseIntError),
InvalidInt(std::num::ParseIntError),
Overflow,
}
@ -30,7 +23,7 @@ impl ParseSizeError {
}
}
fn int(original: &str, err: ParseIntError) -> ParseSizeError {
fn int(original: &str, err: std::num::ParseIntError) -> ParseSizeError {
ParseSizeError {
original: original.to_string(),
kind: ParseSizeErrorKind::InvalidInt(err),
@ -45,22 +38,18 @@ impl ParseSizeError {
}
}
impl error::Error for ParseSizeError {
fn description(&self) -> &str {
"invalid size"
}
}
impl std::error::Error for ParseSizeError {}
impl fmt::Display for ParseSizeError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Display for ParseSizeError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
use self::ParseSizeErrorKind::*;
match self.kind {
InvalidFormat => write!(
f,
"invalid format for size '{}', which should be a sequence \
of digits followed by an optional 'K', 'M' or 'G' \
suffix",
"invalid format for size '{}', which should be a non-empty \
sequence of digits followed by an optional 'K', 'M' or 'G' \
suffix",
self.original
),
InvalidInt(ref err) => write!(
@ -73,9 +62,9 @@ impl fmt::Display for ParseSizeError {
}
}
impl From<ParseSizeError> for io::Error {
fn from(size_err: ParseSizeError) -> io::Error {
io::Error::new(io::ErrorKind::Other, size_err)
impl From<ParseSizeError> for std::io::Error {
fn from(size_err: ParseSizeError) -> std::io::Error {
std::io::Error::new(std::io::ErrorKind::Other, size_err)
}
}
@ -88,29 +77,24 @@ impl From<ParseSizeError> for io::Error {
///
/// Additional suffixes may be added over time.
pub fn parse_human_readable_size(size: &str) -> Result<u64, ParseSizeError> {
lazy_static::lazy_static! {
// Normally I'd just parse something this simple by hand to avoid the
// regex dep, but we bring regex in any way for glob matching, so might
// as well use it.
static ref RE: Regex = Regex::new(r"^([0-9]+)([KMG])?$").unwrap();
let digits_end =
size.as_bytes().iter().take_while(|&b| b.is_ascii_digit()).count();
let digits = &size[..digits_end];
if digits.is_empty() {
return Err(ParseSizeError::format(size));
}
let value =
digits.parse::<u64>().map_err(|e| ParseSizeError::int(size, e))?;
let caps = match RE.captures(size) {
Some(caps) => caps,
None => return Err(ParseSizeError::format(size)),
};
let value: u64 =
caps[1].parse().map_err(|err| ParseSizeError::int(size, err))?;
let suffix = match caps.get(2) {
None => return Ok(value),
Some(cap) => cap.as_str(),
};
let suffix = &size[digits_end..];
if suffix.is_empty() {
return Ok(value);
}
let bytes = match suffix {
"K" => value.checked_mul(1 << 10),
"M" => value.checked_mul(1 << 20),
"G" => value.checked_mul(1 << 30),
// Because if the regex matches this group, it must be [KMG].
_ => unreachable!(),
_ => return Err(ParseSizeError::format(size)),
};
bytes.ok_or_else(|| ParseSizeError::overflow(size))
}
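The behavior of the hand-rolled parser that replaces the regex can be illustrated with a self-contained sketch. It follows the same steps as the new `parse_human_readable_size` (count leading ASCII digits, reject an empty digit run, parse, then match the suffix), but `SizeError` and `parse_size` are simplified, hypothetical stand-ins for the crate's `ParseSizeError` and public function.

```rust
/// Simplified stand-in for the crate's error type (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum SizeError {
    Format,
    Int,
    Overflow,
}

/// Parses sizes such as `123`, `10K`, `5M` or `2G` into a byte count.
fn parse_size(size: &str) -> Result<u64, SizeError> {
    // Leading ASCII digits are single bytes, so this is a valid split point.
    let digits_end = size.bytes().take_while(|b| b.is_ascii_digit()).count();
    let (digits, suffix) = size.split_at(digits_end);
    if digits.is_empty() {
        return Err(SizeError::Format);
    }
    let value: u64 = digits.parse().map_err(|_| SizeError::Int)?;
    let shift: u32 = match suffix {
        "" => return Ok(value),
        "K" => 10,
        "M" => 20,
        "G" => 30,
        // Unlike the regex version, an unknown suffix is an ordinary error.
        _ => return Err(SizeError::Format),
    };
    value.checked_mul(1u64 << shift).ok_or(SizeError::Overflow)
}
```

One consequence of dropping the regex is that the fallthrough arm becomes a real error path rather than `unreachable!()`, which is why the diff changes that arm to return a format error.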

@ -11,47 +11,26 @@ and Linux.
# Standard I/O
The
[`is_readable_stdin`](fn.is_readable_stdin.html),
[`is_tty_stderr`](fn.is_tty_stderr.html),
[`is_tty_stdin`](fn.is_tty_stdin.html)
and
[`is_tty_stdout`](fn.is_tty_stdout.html)
routines query aspects of standard I/O. `is_readable_stdin` determines whether
stdin can be usefully read from, while the `tty` methods determine whether a
tty is attached to stdin/stdout/stderr.
`is_readable_stdin` is useful when writing an application that changes behavior
based on whether the application was invoked with data on stdin. For example,
`rg foo` might recursively search the current working directory for
occurrences of `foo`, but `rg foo < file` might only search the contents of
`file`.
The `tty` methods are useful for similar reasons. Namely, commands like `ls`
will change their output depending on whether they are printing to a terminal
or not. For example, `ls` shows a file on each line when stdout is redirected
to a file or a pipe, but condenses the output to show possibly many files on
each line when stdout is connected to a tty.
[`is_readable_stdin`] determines whether stdin can be usefully read from. It
is useful when writing an application that changes behavior based on whether
the application was invoked with data on stdin. For example, `rg foo` might
recursively search the current working directory for occurrences of `foo`, but
`rg foo < file` might only search the contents of `file`.
# Coloring and buffering
The
[`stdout`](fn.stdout.html),
[`stdout_buffered_block`](fn.stdout_buffered_block.html)
and
[`stdout_buffered_line`](fn.stdout_buffered_line.html)
routines are alternative constructors for
[`StandardStream`](struct.StandardStream.html).
A `StandardStream` implements `termcolor::WriteColor`, which provides a way
to emit colors to terminals. Its key use is the encapsulation of buffering
style. Namely, `stdout` will return a line buffered `StandardStream` if and
only if stdout is connected to a tty, and will otherwise return a block
buffered `StandardStream`. Line buffering is important for use with a tty
because it typically decreases the latency at which the end user sees output.
Block buffering is used otherwise because it is faster, and redirecting stdout
to a file typically doesn't benefit from the decreased latency that line
buffering provides.
The [`stdout`], [`stdout_buffered_block`] and [`stdout_buffered_line`] routines
are alternative constructors for [`StandardStream`]. A `StandardStream`
implements `termcolor::WriteColor`, which provides a way to emit colors to
terminals. Its key use is the encapsulation of buffering style. Namely,
`stdout` will return a line buffered `StandardStream` if and only if
stdout is connected to a tty, and will otherwise return a block buffered
`StandardStream`. Line buffering is important for use with a tty because it
typically decreases the latency at which the end user sees output. Block
buffering is used otherwise because it is faster, and redirecting stdout to a
file typically doesn't benefit from the decreased latency that line buffering
provides.
The `stdout_buffered_block` and `stdout_buffered_line` routines can be used to
explicitly set the buffering strategy regardless of whether stdout is connected
@ -60,17 +39,12 @@ to a tty or not.
# Escaping
The
[`escape`](fn.escape.html),
[`escape_os`](fn.escape_os.html),
[`unescape`](fn.unescape.html)
and
[`unescape_os`](fn.unescape_os.html)
routines provide a user friendly way of dealing with UTF-8 encoded strings that
can express arbitrary bytes. For example, you might want to accept a string
containing arbitrary bytes as a command line argument, but most interactive
shells make such strings difficult to type. Instead, we can ask users to use
escape sequences.
The [`escape`](crate::escape()), [`escape_os`], [`unescape`] and
[`unescape_os`] routines provide a user friendly way of dealing with UTF-8
encoded strings that can express arbitrary bytes. For example, you might want
to accept a string containing arbitrary bytes as a command line argument, but
most interactive shells make such strings difficult to type. Instead, we can
ask users to use escape sequences.
For example, `a\xFFz` is itself a valid UTF-8 string corresponding to the
following bytes:
@ -103,44 +77,36 @@ makes it easy to show user friendly error messages involving arbitrary bytes.
# Building patterns
Typically, regular expression patterns must be valid UTF-8. However, command
line arguments aren't guaranteed to be valid UTF-8. Unfortunately, the
standard library's UTF-8 conversion functions from `OsStr`s do not provide
good error messages. However, the
[`pattern_from_bytes`](fn.pattern_from_bytes.html)
and
[`pattern_from_os`](fn.pattern_from_os.html)
do, including reporting exactly where the first invalid UTF-8 byte is seen.
line arguments aren't guaranteed to be valid UTF-8. Unfortunately, the standard
library's UTF-8 conversion functions from `OsStr`s do not provide good error
messages. However, the [`pattern_from_bytes`] and [`pattern_from_os`] do,
including reporting exactly where the first invalid UTF-8 byte is seen.
Additionally, it can be useful to read patterns from a file while reporting
good error messages that include line numbers. The
[`patterns_from_path`](fn.patterns_from_path.html),
[`patterns_from_reader`](fn.patterns_from_reader.html)
and
[`patterns_from_stdin`](fn.patterns_from_stdin.html)
routines do just that. If any pattern is found that is invalid UTF-8, then the
error includes the file path (if available) along with the line number and the
byte offset at which the first invalid UTF-8 byte was observed.
good error messages that include line numbers. The [`patterns_from_path`],
[`patterns_from_reader`] and [`patterns_from_stdin`] routines do just that. If
any pattern is found that is invalid UTF-8, then the error includes the file
path (if available) along with the line number and the byte offset at which the
first invalid UTF-8 byte was observed.
# Read process output
Sometimes a command line application needs to execute other processes and read
its stdout in a streaming fashion. The
[`CommandReader`](struct.CommandReader.html)
provides this functionality with an explicit goal of improving failure modes.
In particular, if the process exits with an error code, then stderr is read
and converted into a normal Rust error to show to end users. This makes the
underlying failure modes explicit and gives more information to end users for
debugging the problem.
Sometimes a command line application needs to execute another process and
read its stdout in a streaming fashion. The [`CommandReader`] provides this
functionality with an explicit goal of improving failure modes. In particular,
if the process exits with an error code, then stderr is read and converted into
a normal Rust error to show to end users. This makes the underlying failure
modes explicit and gives more information to end users for debugging the
problem.
As a special case,
[`DecompressionReader`](struct.DecompressionReader.html)
provides a way to decompress arbitrary files by matching their file extensions
up with corresponding decompression programs (such as `gzip` and `xz`). This
is useful as a means of performing simplistic decompression in a portable
manner without binding to specific compression libraries. This does come with
some overhead though, so if you need to decompress lots of small files, this
may not be an appropriate convenience to use.
As a special case, [`DecompressionReader`] provides a way to decompress
arbitrary files by matching their file extensions up with corresponding
decompression programs (such as `gzip` and `xz`). This is useful as a means of
performing simplistic decompression in a portable manner without binding to
specific compression libraries. This does come with some overhead though, so
if you need to decompress lots of small files, this may not be an appropriate
convenience to use.
Each reader has a corresponding builder for additional configuration, such as
whether to read stderr asynchronously in order to avoid deadlock (which is
@ -149,35 +115,38 @@ enabled by default).
# Miscellaneous parsing
The
[`parse_human_readable_size`](fn.parse_human_readable_size.html)
routine parses strings like `2M` and converts them to the corresponding number
of bytes (`2 * 1<<20` in this case). If an invalid size is found, then a good
error message is crafted that typically tells the user how to fix the problem.
The [`parse_human_readable_size`] routine parses strings like `2M` and converts
them to the corresponding number of bytes (`2 * 1<<20` in this case). If an
invalid size is found, then a good error message is crafted that typically
tells the user how to fix the problem.
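
As a rough illustration of the conversion described above, the following sketch parses a size with an optional binary suffix. It is not this crate's implementation: the function name `parse_size_sketch`, the accepted suffix set (`K`, `M`, `G`) and the error messages are all assumptions made for the example.

```rust
/// Hypothetical sketch (not grep-cli's implementation): parse sizes such as
/// `512`, `4K`, `2M` or `1G` into a number of bytes.
fn parse_size_sketch(size: &str) -> Result<u64, String> {
    // Split off an optional one-byte ASCII suffix and map it to a shift.
    let (digits, shift) = match size.as_bytes().last() {
        Some(b'K') => (&size[..size.len() - 1], 10),
        Some(b'M') => (&size[..size.len() - 1], 20),
        Some(b'G') => (&size[..size.len() - 1], 30),
        _ => (size, 0),
    };
    let value: u64 = digits.parse().map_err(|_| {
        format!(
            "invalid size '{size}': expected digits followed by \
             an optional K, M or G suffix"
        )
    })?;
    // Guard against overflow instead of silently wrapping.
    value
        .checked_mul(1u64 << shift)
        .ok_or_else(|| format!("size '{size}' is too big"))
}
```

With this sketch, `2M` maps to `2 * (1 << 20)` bytes, while inputs such as `2X` or an empty string produce an error that says what form was expected.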
*/
#![deny(missing_docs)]
mod decompress;
mod escape;
mod hostname;
mod human;
mod pattern;
mod process;
mod wtr;
pub use crate::decompress::{
resolve_binary, DecompressionMatcher, DecompressionMatcherBuilder,
DecompressionReader, DecompressionReaderBuilder,
};
pub use crate::escape::{escape, escape_os, unescape, unescape_os};
pub use crate::human::{parse_human_readable_size, ParseSizeError};
pub use crate::pattern::{
pattern_from_bytes, pattern_from_os, patterns_from_path,
patterns_from_reader, patterns_from_stdin, InvalidPatternError,
};
pub use crate::process::{CommandError, CommandReader, CommandReaderBuilder};
pub use crate::wtr::{
stdout, stdout_buffered_block, stdout_buffered_line, StandardStream,
pub use crate::{
decompress::{
resolve_binary, DecompressionMatcher, DecompressionMatcherBuilder,
DecompressionReader, DecompressionReaderBuilder,
},
escape::{escape, escape_os, unescape, unescape_os},
hostname::hostname,
human::{parse_human_readable_size, ParseSizeError},
pattern::{
pattern_from_bytes, pattern_from_os, patterns_from_path,
patterns_from_reader, patterns_from_stdin, InvalidPatternError,
},
process::{CommandError, CommandReader, CommandReaderBuilder},
wtr::{
stdout, stdout_buffered_block, stdout_buffered_line, StandardStream,
},
};
/// Returns true if and only if stdin is believed to be readable.
@ -187,38 +156,113 @@ pub use crate::wtr::{
/// might search the current directory for occurrences of `foo` where as
/// `command foo < some-file` or `cat some-file | command foo` might instead
/// only search stdin for occurrences of `foo`.
///
/// Note that this isn't perfect and essentially corresponds to a heuristic.
/// When things are unclear (such as if an error occurs during introspection to
/// determine whether stdin is readable), this prefers to return `false`. That
/// means it's possible for an end user to pipe something into your program and
/// have this return `false` and thus potentially lead to ignoring the user's
/// stdin data. While not ideal, this is perhaps better than falsely assuming
/// stdin is readable, which would result in blocking forever on reading stdin.
/// Regardless, commands should always provide explicit fallbacks to override
/// behavior. For example, `rg foo -` will explicitly search stdin and `rg foo
/// ./` will explicitly search the current working directory.
pub fn is_readable_stdin() -> bool {
use std::io::IsTerminal;
#[cfg(unix)]
fn imp() -> bool {
use same_file::Handle;
use std::os::unix::fs::FileTypeExt;
let ft = match Handle::stdin().and_then(|h| h.as_file().metadata()) {
Err(_) => return false,
Ok(md) => md.file_type(),
use std::{
fs::File,
os::{fd::AsFd, unix::fs::FileTypeExt},
};
ft.is_file() || ft.is_fifo() || ft.is_socket()
let stdin = std::io::stdin();
let fd = match stdin.as_fd().try_clone_to_owned() {
Ok(fd) => fd,
Err(err) => {
log::debug!(
"for heuristic stdin detection on Unix, \
could not clone stdin file descriptor \
(thus assuming stdin is not readable): {err}",
);
return false;
}
};
let file = File::from(fd);
let md = match file.metadata() {
Ok(md) => md,
Err(err) => {
log::debug!(
"for heuristic stdin detection on Unix, \
could not get file metadata for stdin \
(thus assuming stdin is not readable): {err}",
);
return false;
}
};
let ft = md.file_type();
let is_file = ft.is_file();
let is_fifo = ft.is_fifo();
let is_socket = ft.is_socket();
let is_readable = is_file || is_fifo || is_socket;
log::debug!(
"for heuristic stdin detection on Unix, \
found that \
is_file={is_file}, is_fifo={is_fifo} and is_socket={is_socket}, \
and thus concluded that is_stdin_readable={is_readable}",
);
is_readable
}
#[cfg(windows)]
fn imp() -> bool {
use winapi_util as winutil;
winutil::file::typ(winutil::HandleRef::stdin())
.map(|t| t.is_disk() || t.is_pipe())
.unwrap_or(false)
let stdin = winapi_util::HandleRef::stdin();
let typ = match winapi_util::file::typ(stdin) {
Ok(typ) => typ,
Err(err) => {
log::debug!(
"for heuristic stdin detection on Windows, \
could not get file type of stdin \
(thus assuming stdin is not readable): {err}",
);
return false;
}
};
let is_disk = typ.is_disk();
let is_pipe = typ.is_pipe();
let is_readable = is_disk || is_pipe;
log::debug!(
"for heuristic stdin detection on Windows, \
found that is_disk={is_disk} and is_pipe={is_pipe}, \
and thus concluded that is_stdin_readable={is_readable}",
);
is_readable
}
!is_tty_stdin() && imp()
#[cfg(not(any(unix, windows)))]
fn imp() -> bool {
log::debug!("on non-{{Unix,Windows}}, assuming stdin is not readable");
false
}
!std::io::stdin().is_terminal() && imp()
}
/// Returns true if and only if stdin is believed to be connectted to a tty
/// Returns true if and only if stdin is believed to be connected to a tty
/// or a console.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stdin() -> bool {
atty::is(atty::Stream::Stdin)
use std::io::IsTerminal;
std::io::stdin().is_terminal()
}
/// Returns true if and only if stdout is believed to be connectted to a tty
/// Returns true if and only if stdout is believed to be connected to a tty
/// or a console.
///
/// This is useful for when you want your command line program to produce
@ -226,12 +270,26 @@ pub fn is_tty_stdin() -> bool {
/// terminal or whether it's being redirected somewhere else. For example,
/// implementations of `ls` will often show one item per line when stdout is
/// redirected, but will condense output when printing to a tty.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stdout() -> bool {
atty::is(atty::Stream::Stdout)
use std::io::IsTerminal;
std::io::stdout().is_terminal()
}
/// Returns true if and only if stderr is believed to be connectted to a tty
/// Returns true if and only if stderr is believed to be connected to a tty
/// or a console.
///
/// Note that this is now just a wrapper around
/// [`std::io::IsTerminal`](https://doc.rust-lang.org/std/io/trait.IsTerminal.html).
/// Callers should prefer using the `IsTerminal` trait directly. This routine
/// is deprecated and will be removed in the next semver incompatible release.
#[deprecated(since = "0.1.10", note = "use std::io::IsTerminal instead")]
pub fn is_tty_stderr() -> bool {
atty::is(atty::Stream::Stderr)
use std::io::IsTerminal;
std::io::stderr().is_terminal()
}
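
The decision made by `is_readable_stdin` in the hunk above can be summarized in a platform-independent sketch. `StdinKind` and `is_readable_stdin_sketch` are hypothetical names introduced only for illustration; the real code inspects file metadata on Unix and the handle type on Windows. `stdin_is_terminal` shows the `std::io::IsTerminal` call that the deprecation notes recommend over `is_tty_stdin`.

```rust
use std::io::IsTerminal;

/// Hypothetical classification of what stdin is attached to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StdinKind {
    File,
    Fifo,
    Socket,
    Other,
}

/// Sketch of the heuristic: stdin counts as readable only when it is not a
/// terminal and its kind is a file, FIFO or socket. `None` models a failed
/// introspection, which is treated as "not readable".
fn is_readable_stdin_sketch(is_terminal: bool, kind: Option<StdinKind>) -> bool {
    if is_terminal {
        return false;
    }
    matches!(
        kind,
        Some(StdinKind::File | StdinKind::Fifo | StdinKind::Socket)
    )
}

/// Direct use of the standard library trait in place of `is_tty_stdin`.
fn stdin_is_terminal() -> bool {
    std::io::stdin().is_terminal()
}
```

Returning `false` when the kind is unknown matches the documented preference for ignoring piped input over blocking forever on a read.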

View File

@ -1,10 +1,4 @@
use std::error;
use std::ffi::OsStr;
use std::fmt;
use std::fs::File;
use std::io;
use std::path::Path;
use std::str;
use std::{ffi::OsStr, io, path::Path};
use bstr::io::BufReadExt;
@ -28,14 +22,10 @@ impl InvalidPatternError {
}
}
impl error::Error for InvalidPatternError {
fn description(&self) -> &str {
"invalid pattern"
}
}
impl std::error::Error for InvalidPatternError {}
impl fmt::Display for InvalidPatternError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Display for InvalidPatternError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(
f,
"found invalid UTF-8 in pattern at byte offset {}: {} \
@ -77,7 +67,7 @@ pub fn pattern_from_os(pattern: &OsStr) -> Result<&str, InvalidPatternError> {
pub fn pattern_from_bytes(
pattern: &[u8],
) -> Result<&str, InvalidPatternError> {
str::from_utf8(pattern).map_err(|err| InvalidPatternError {
std::str::from_utf8(pattern).map_err(|err| InvalidPatternError {
original: escape(pattern),
valid_up_to: err.valid_up_to(),
})
@ -91,7 +81,7 @@ pub fn pattern_from_bytes(
/// path.
pub fn patterns_from_path<P: AsRef<Path>>(path: P) -> io::Result<Vec<String>> {
let path = path.as_ref();
let file = File::open(path).map_err(|err| {
let file = std::fs::File::open(path).map_err(|err| {
io::Error::new(
io::ErrorKind::Other,
format!("{}: {}", path.display(), err),
@ -135,7 +125,6 @@ pub fn patterns_from_stdin() -> io::Result<Vec<String>> {
/// ```
/// use grep_cli::patterns_from_reader;
///
/// # fn example() -> Result<(), Box<::std::error::Error>> {
/// let patterns = "\
/// foo
/// bar\\s+foo
@ -147,7 +136,7 @@ pub fn patterns_from_stdin() -> io::Result<Vec<String>> {
/// r"bar\s+foo",
/// r"[a-z]{3}",
/// ]);
/// # Ok(()) }
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
pub fn patterns_from_reader<R: io::Read>(rdr: R) -> io::Result<Vec<String>> {
let mut patterns = vec![];

View File

@ -1,9 +1,7 @@
use std::error;
use std::fmt;
use std::io::{self, Read};
use std::iter;
use std::process;
use std::thread::{self, JoinHandle};
use std::{
io::{self, Read},
process,
};
/// An error that can occur while running a command and reading its output.
///
@ -40,14 +38,10 @@ impl CommandError {
}
}
impl error::Error for CommandError {
fn description(&self) -> &str {
"command error"
}
}
impl std::error::Error for CommandError {}
impl fmt::Display for CommandError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Display for CommandError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self.kind {
CommandErrorKind::Io(ref e) => e.fmt(f),
CommandErrorKind::Stderr(ref bytes) => {
@ -55,7 +49,7 @@ impl fmt::Display for CommandError {
if msg.trim().is_empty() {
write!(f, "<stderr is empty>")
} else {
let div = iter::repeat('-').take(79).collect::<String>();
let div = "-".repeat(79);
write!(
f,
"\n{div}\n{msg}\n{div}",
@ -161,18 +155,17 @@ impl CommandReaderBuilder {
/// is returned as an error.
///
/// ```no_run
/// use std::io::Read;
/// use std::process::Command;
/// use std::{io::Read, process::Command};
///
/// use grep_cli::CommandReader;
///
/// # fn example() -> Result<(), Box<::std::error::Error>> {
/// let mut cmd = Command::new("gzip");
/// cmd.arg("-d").arg("-c").arg("/usr/share/man/man1/ls.1.gz");
///
/// let mut rdr = CommandReader::new(&mut cmd)?;
/// let mut contents = vec![];
/// rdr.read_to_end(&mut contents)?;
/// # Ok(()) }
/// # Ok::<(), Box<dyn std::error::Error>>(())
/// ```
#[derive(Debug)]
pub struct CommandReader {
@ -198,8 +191,7 @@ impl CommandReader {
/// returned.
///
/// If the caller requires additional configuration for the reader
/// returned, then use
/// [`CommandReaderBuilder`](struct.CommandReaderBuilder.html).
/// returned, then use [`CommandReaderBuilder`].
pub fn new(
cmd: &mut process::Command,
) -> Result<CommandReader, CommandError> {
@ -221,7 +213,7 @@ impl CommandReader {
///
/// `close` is also called in `drop` as a last line of defense against
/// resource leakage. Any error from the child process is then printed as a
/// warning to stderr. This can be avoided by explictly calling `close`
/// warning to stderr. This can be avoided by explicitly calling `close`
/// before the CommandReader is dropped.
pub fn close(&mut self) -> io::Result<()> {
// Dropping stdout closes the underlying file descriptor, which should
@ -279,7 +271,7 @@ impl io::Read for CommandReader {
/// stderr.
#[derive(Debug)]
enum StderrReader {
Async(Option<JoinHandle<CommandError>>),
Async(Option<std::thread::JoinHandle<CommandError>>),
Sync(process::ChildStderr),
}
@ -287,7 +279,7 @@ impl StderrReader {
/// Create a reader for stderr that reads contents asynchronously.
fn r#async(mut stderr: process::ChildStderr) -> StderrReader {
let handle =
thread::spawn(move || stderr_to_command_error(&mut stderr));
std::thread::spawn(move || stderr_to_command_error(&mut stderr));
StderrReader::Async(Some(handle))
}
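
The asynchronous variant above hands stderr to a separate thread so that a child process which fills its stderr pipe cannot stall while the parent is still reading stdout. A minimal, generic sketch of that idea follows; `drain_in_background` is a hypothetical name, and unlike the code in this diff it returns the raw bytes rather than a `CommandError`.

```rust
use std::io::Read;
use std::thread::JoinHandle;

/// Hypothetical sketch: read everything from `rdr` on a background thread
/// and hand the bytes back through the join handle.
fn drain_in_background<R>(mut rdr: R) -> JoinHandle<std::io::Result<Vec<u8>>>
where
    R: Read + Send + 'static,
{
    std::thread::spawn(move || -> std::io::Result<Vec<u8>> {
        let mut buf = Vec::new();
        rdr.read_to_end(&mut buf)?;
        Ok(buf)
    })
}
```

A caller would pass something like a `ChildStderr` as `rdr`, keep reading stdout on the current thread, and call `join` on the handle only after stdout reaches end of file.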

View File

@ -1,10 +1,9 @@
use std::io;
use std::io::{self, IsTerminal};
use termcolor;
use crate::is_tty_stdout;
use termcolor::HyperlinkSpec;
/// A writer that supports coloring with either line or block buffering.
#[derive(Debug)]
pub struct StandardStream(StandardStreamKind);
/// Returns a possibly buffered writer to stdout for the given color choice.
@ -22,7 +21,7 @@ pub struct StandardStream(StandardStreamKind);
/// The color choice given is passed along to the underlying writer. To
/// completely disable colors in all cases, use `ColorChoice::Never`.
pub fn stdout(color_choice: termcolor::ColorChoice) -> StandardStream {
if is_tty_stdout() {
if std::io::stdout().is_terminal() {
stdout_buffered_line(color_choice)
} else {
stdout_buffered_block(color_choice)
@ -35,10 +34,8 @@ pub fn stdout(color_choice: termcolor::ColorChoice) -> StandardStream {
/// users see output as soon as it's written. The downside of this approach
/// is that it can be slower, especially when there is a lot of output.
///
/// You might consider using
/// [`stdout`](fn.stdout.html)
/// instead, which chooses the buffering strategy automatically based on
/// whether stdout is connected to a tty.
/// You might consider using [`stdout`] instead, which chooses the buffering
/// strategy automatically based on whether stdout is connected to a tty.
pub fn stdout_buffered_line(
color_choice: termcolor::ColorChoice,
) -> StandardStream {
@ -52,10 +49,8 @@ pub fn stdout_buffered_line(
/// the cost of writing data. The downside of this approach is that it can
/// increase the latency of display output when writing to a tty.
///
/// You might consider using
/// [`stdout`](fn.stdout.html)
/// instead, which chooses the buffering strategy automatically based on
/// whether stdout is connected to a tty.
/// You might consider using [`stdout`] instead, which chooses the buffering
/// strategy automatically based on whether stdout is connected to a tty.
pub fn stdout_buffered_block(
color_choice: termcolor::ColorChoice,
) -> StandardStream {
@ -63,6 +58,7 @@ pub fn stdout_buffered_block(
StandardStream(StandardStreamKind::BlockBuffered(out))
}
#[derive(Debug)]
enum StandardStreamKind {
LineBuffered(termcolor::StandardStream),
BlockBuffered(termcolor::BufferedStandardStream),
@ -101,6 +97,16 @@ impl termcolor::WriteColor for StandardStream {
}
}
#[inline]
fn supports_hyperlinks(&self) -> bool {
use self::StandardStreamKind::*;
match self.0 {
LineBuffered(ref w) => w.supports_hyperlinks(),
BlockBuffered(ref w) => w.supports_hyperlinks(),
}
}
#[inline]
fn set_color(&mut self, spec: &termcolor::ColorSpec) -> io::Result<()> {
use self::StandardStreamKind::*;
@ -111,6 +117,16 @@ impl termcolor::WriteColor for StandardStream {
}
}
#[inline]
fn set_hyperlink(&mut self, link: &HyperlinkSpec) -> io::Result<()> {
use self::StandardStreamKind::*;
match self.0 {
LineBuffered(ref mut w) => w.set_hyperlink(link),
BlockBuffered(ref mut w) => w.set_hyperlink(link),
}
}
#[inline]
fn reset(&mut self) -> io::Result<()> {
use self::StandardStreamKind::*;

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@ -0,0 +1,107 @@
/*!
Provides completions for ripgrep's CLI for the bash shell.
*/
use crate::flags::defs::FLAGS;
const TEMPLATE_FULL: &'static str = "
_rg() {
local i cur prev opts cmds
COMPREPLY=()
cur=\"${COMP_WORDS[COMP_CWORD]}\"
prev=\"${COMP_WORDS[COMP_CWORD-1]}\"
cmd=\"\"
opts=\"\"
for i in ${COMP_WORDS[@]}; do
case \"${i}\" in
rg)
cmd=\"rg\"
;;
*)
;;
esac
done
case \"${cmd}\" in
rg)
opts=\"!OPTS!\"
if [[ ${cur} == -* || ${COMP_CWORD} -eq 1 ]] ; then
COMPREPLY=($(compgen -W \"${opts}\" -- \"${cur}\"))
return 0
fi
case \"${prev}\" in
!CASES!
esac
COMPREPLY=($(compgen -W \"${opts}\" -- \"${cur}\"))
return 0
;;
esac
}
complete -F _rg -o bashdefault -o default rg
";
const TEMPLATE_CASE: &'static str = "
!FLAG!)
COMPREPLY=($(compgen -f \"${cur}\"))
return 0
;;
";
const TEMPLATE_CASE_CHOICES: &'static str = "
!FLAG!)
COMPREPLY=($(compgen -W \"!CHOICES!\" -- \"${cur}\"))
return 0
;;
";
/// Generate completions for Bash.
///
/// Note that these completions are based on what was produced for ripgrep <=13
/// using Clap 2.x. Improvements on this are welcome.
pub(crate) fn generate() -> String {
let mut opts = String::new();
for flag in FLAGS.iter() {
opts.push_str("--");
opts.push_str(flag.name_long());
opts.push(' ');
if let Some(short) = flag.name_short() {
opts.push('-');
opts.push(char::from(short));
opts.push(' ');
}
if let Some(name) = flag.name_negated() {
opts.push_str("--");
opts.push_str(name);
opts.push(' ');
}
}
opts.push_str("<PATTERN> <PATH>...");
let mut cases = String::new();
for flag in FLAGS.iter() {
let template = if !flag.doc_choices().is_empty() {
let choices = flag.doc_choices().join(" ");
TEMPLATE_CASE_CHOICES.trim_end().replace("!CHOICES!", &choices)
} else {
TEMPLATE_CASE.trim_end().to_string()
};
let name = format!("--{}", flag.name_long());
cases.push_str(&template.replace("!FLAG!", &name));
if let Some(short) = flag.name_short() {
let name = format!("-{}", char::from(short));
cases.push_str(&template.replace("!FLAG!", &name));
}
if let Some(negated) = flag.name_negated() {
let name = format!("--{negated}");
cases.push_str(&template.replace("!FLAG!", &name));
}
}
TEMPLATE_FULL
.replace("!OPTS!", &opts)
.replace("!CASES!", &cases)
.trim_start()
.to_string()
}
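
The generator above is plain string templating: it builds one space-separated option list and one `case` arm per flag spelling. The sketch below reproduces that shape with a hypothetical `FlagSketch` struct standing in for the real flag definitions, which are not shown in this diff; the template text is shortened for the example.

```rust
/// Shortened stand-in for the per-flag case template.
const TEMPLATE_CASE_SKETCH: &str = "
        !FLAG!)
            COMPREPLY=($(compgen -f \"${cur}\"))
            return 0
            ;;";

/// Hypothetical flag description used only by this sketch.
struct FlagSketch {
    long: &'static str,
    short: Option<u8>,
    negated: Option<&'static str>,
}

/// Build the space-separated option list substituted for `!OPTS!`.
fn bash_opts_sketch(flags: &[FlagSketch]) -> String {
    let mut opts = String::new();
    for flag in flags {
        opts.push_str(&format!("--{} ", flag.long));
        if let Some(short) = flag.short {
            opts.push_str(&format!("-{} ", char::from(short)));
        }
        if let Some(negated) = flag.negated {
            opts.push_str(&format!("--{negated} "));
        }
    }
    opts
}

/// Instantiate one case arm for a single flag spelling.
fn bash_case_sketch(flag_name: &str) -> String {
    TEMPLATE_CASE_SKETCH.replace("!FLAG!", flag_name)
}
```

Because every spelling (long, short and negated) gets its own case arm, the completion of a flag's value behaves the same regardless of which spelling the user typed.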

View File

@ -0,0 +1,29 @@
# This is impossible to read, but these encodings rarely if ever change, so
# it probably does not matter. They are derived from the list given here:
# https://encoding.spec.whatwg.org/#concept-encoding-get
#
# The globbing here works in both fish and zsh (though they expand it in
# different orders). It may work in other shells too.
{{,us-}ascii,arabic,chinese,cyrillic,greek{,8},hebrew,korean}
logical visual mac {,cs}macintosh x-mac-{cyrillic,roman,ukrainian}
866 ibm{819,866} csibm866
big5{,-hkscs} {cn-,cs}big5 x-x-big5
cp{819,866,125{0,1,2,3,4,5,6,7,8}} x-cp125{0,1,2,3,4,5,6,7,8}
csiso2022{jp,kr} csiso8859{6,8}{e,i}
csisolatin{1,2,3,4,5,6,9} csisolatin{arabic,cyrillic,greek,hebrew}
ecma-{114,118} asmo-708 elot_928 sun_eu_greek
euc-{jp,kr} x-euc-jp cseuckr cseucpkdfmtjapanese
{,x-}gbk csiso58gb231280 gb18030 {,cs}gb2312 gb_2312{,-80} hz-gb-2312
iso-2022-{cn,cn-ext,jp,kr}
iso8859{,-}{1,2,3,4,5,6,7,8,9,10,11,13,14,15}
iso-8859-{1,2,3,4,5,6,7,8,9,10,11,{6,8}-{e,i},13,14,15,16} iso_8859-{1,2,3,4,5,6,7,8,9,15}
iso_8859-{1,2,6,7}:1987 iso_8859-{3,4,5,8}:1988 iso_8859-9:1989
iso-ir-{58,100,101,109,110,126,127,138,144,148,149,157}
koi{,8,8-r,8-ru,8-u,8_r} cskoi8r
ks_c_5601-{1987,1989} ksc{,_}5601 csksc56011987
latin{1,2,3,4,5,6} l{1,2,3,4,5,6,9}
shift{-,_}jis csshiftjis {,x-}sjis ms_kanji ms932
utf{,-}8 utf-16{,be,le} unicode-1-1-utf-8
windows-{31j,874,949,125{0,1,2,3,4,5,6,7,8}} dos-874 tis-620 ansi_x3.4-1968
x-user-defined auto none

View File

@ -0,0 +1,68 @@
/*!
Provides completions for ripgrep's CLI for the fish shell.
*/
use crate::flags::{defs::FLAGS, CompletionType};
const TEMPLATE: &'static str = "complete -c rg !SHORT! -l !LONG! -d '!DOC!'";
const TEMPLATE_NEGATED: &'static str =
"complete -c rg -l !NEGATED! -n '__fish_contains_opt !SHORT! !LONG!' -d '!DOC!'\n";
/// Generate completions for Fish.
pub(crate) fn generate() -> String {
let mut out = String::new();
for flag in FLAGS.iter() {
let short = match flag.name_short() {
None => "".to_string(),
Some(byte) => format!("-s {}", char::from(byte)),
};
let long = flag.name_long();
let doc = flag.doc_short().replace("'", "\\'");
let mut completion = TEMPLATE
.replace("!SHORT!", &short)
.replace("!LONG!", &long)
.replace("!DOC!", &doc);
match flag.completion_type() {
CompletionType::Filename => {
completion.push_str(" -r -F");
}
CompletionType::Executable => {
completion.push_str(" -r -f -a '(__fish_complete_command)'");
}
CompletionType::Filetype => {
completion.push_str(
" -r -f -a '(rg --type-list | string replace : \\t)'",
);
}
CompletionType::Encoding => {
completion.push_str(" -r -f -a '");
completion.push_str(super::ENCODINGS);
completion.push_str("'");
}
CompletionType::Other if !flag.doc_choices().is_empty() => {
completion.push_str(" -r -f -a '");
completion.push_str(&flag.doc_choices().join(" "));
completion.push_str("'");
}
CompletionType::Other if !flag.is_switch() => {
completion.push_str(" -r -f");
}
CompletionType::Other => (),
}
completion.push('\n');
out.push_str(&completion);
if let Some(negated) = flag.name_negated() {
out.push_str(
&TEMPLATE_NEGATED
.replace("!NEGATED!", &negated)
.replace("!SHORT!", &short)
.replace("!LONG!", &long)
.replace("!DOC!", &doc),
);
}
}
out
}
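
One detail in the fish generator that is easy to miss is the quoting: the description is embedded in single quotes, so any single quote inside it is escaped with a backslash first. The sketch below isolates that step; `fish_line_sketch` is a hypothetical helper that takes plain arguments instead of the flag definitions used in this diff.

```rust
/// Hypothetical sketch: build one `complete` line for fish, escaping single
/// quotes in the description so it can sit inside a single-quoted string.
fn fish_line_sketch(short: Option<u8>, long: &str, doc: &str) -> String {
    let short = match short {
        None => String::new(),
        Some(byte) => format!("-s {}", char::from(byte)),
    };
    let doc = doc.replace('\'', "\\'");
    format!("complete -c rg {short} -l {long} -d '{doc}'")
}
```

When a flag has no short name, the substitution leaves two consecutive spaces in the output, just as the template replacement in the diff does; fish treats the extra space as ordinary argument separation.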

View File

@ -0,0 +1,10 @@
/*!
Modules for generating completions for various shells.
*/
static ENCODINGS: &'static str = include_str!("encodings.sh");
pub(super) mod bash;
pub(super) mod fish;
pub(super) mod powershell;
pub(super) mod zsh;

View File

@ -0,0 +1,86 @@
/*!
Provides completions for ripgrep's CLI for PowerShell.
*/
use crate::flags::defs::FLAGS;
const TEMPLATE: &'static str = "
using namespace System.Management.Automation
using namespace System.Management.Automation.Language
Register-ArgumentCompleter -Native -CommandName 'rg' -ScriptBlock {
param($wordToComplete, $commandAst, $cursorPosition)
$commandElements = $commandAst.CommandElements
$command = @(
'rg'
for ($i = 1; $i -lt $commandElements.Count; $i++) {
$element = $commandElements[$i]
if ($element -isnot [StringConstantExpressionAst] -or
$element.StringConstantType -ne [StringConstantType]::BareWord -or
$element.Value.StartsWith('-')) {
break
}
$element.Value
}) -join ';'
$completions = @(switch ($command) {
'rg' {
!FLAGS!
}
})
$completions.Where{ $_.CompletionText -like \"$wordToComplete*\" } |
Sort-Object -Property ListItemText
}
";
const TEMPLATE_FLAG: &'static str =
"[CompletionResult]::new('!DASH_NAME!', '!NAME!', [CompletionResultType]::ParameterName, '!DOC!')";
/// Generate completions for PowerShell.
///
/// Note that these completions are based on what was produced for ripgrep <=13
/// using Clap 2.x. Improvements on this are welcome.
pub(crate) fn generate() -> String {
let mut flags = String::new();
for (i, flag) in FLAGS.iter().enumerate() {
let doc = flag.doc_short().replace("'", "''");
let dash_name = format!("--{}", flag.name_long());
let name = flag.name_long();
if i > 0 {
flags.push('\n');
}
flags.push_str(" ");
flags.push_str(
&TEMPLATE_FLAG
.replace("!DASH_NAME!", &dash_name)
.replace("!NAME!", &name)
.replace("!DOC!", &doc),
);
if let Some(byte) = flag.name_short() {
let dash_name = format!("-{}", char::from(byte));
let name = char::from(byte).to_string();
flags.push_str("\n ");
flags.push_str(
&TEMPLATE_FLAG
.replace("!DASH_NAME!", &dash_name)
.replace("!NAME!", &name)
.replace("!DOC!", &doc),
);
}
if let Some(negated) = flag.name_negated() {
let dash_name = format!("--{}", negated);
flags.push_str("\n ");
flags.push_str(
&TEMPLATE_FLAG
.replace("!DASH_NAME!", &dash_name)
.replace("!NAME!", &negated)
.replace("!DOC!", &doc),
);
}
}
TEMPLATE.trim_start().replace("!FLAGS!", &flags)
}

View File

@ -30,7 +30,7 @@ _rg() {
[[ $_RG_COMPLETE_LIST_ARGS == (1|t*|y*) ]] ||
# (--[imnp]* => --ignore*, --messages, --no-*, --pcre2-unicode)
[[ $PREFIX$SUFFIX == --[imnp]* ]] ||
zstyle -t ":complete:$curcontext:*" complete-all
zstyle -t ":completion:${curcontext}:" complete-all
then
no=
fi
@ -73,6 +73,7 @@ _rg() {
{-c,--count}'[only show count of matching lines for each file]'
'--count-matches[only show count of individual matches for each file]'
'--include-zero[include files with zero matches in summary]'
$no"--no-include-zero[don't include files with zero matches in summary]"
+ '(encoding)' # Encoding options
{-E+,--encoding=}'[specify text encoding of files to search]: :_rg_encodings'
@ -108,6 +109,15 @@ _rg() {
{-L,--follow}'[follow symlinks]'
$no"--no-follow[don't follow symlinks]"
+ '(generate)' # Options for generating ancillary data
'--generate=[generate man page or completion scripts]:when:((
man\:"man page"
complete-bash\:"shell completions for bash"
complete-zsh\:"shell completions for zsh"
complete-fish\:"shell completions for fish"
complete-powershell\:"shell completions for PowerShell"
))'
+ glob # File-glob options
'*'{-g+,--glob=}'[include/exclude files matching specified glob]:glob'
'*--iglob=[include/exclude files matching specified case-insensitive glob]:glob'
@ -125,8 +135,8 @@ _rg() {
$no"--no-hidden[don't search hidden files and directories]"
+ '(hybrid)' # hybrid regex options
'--auto-hybrid-regex[dynamically use PCRE2 if necessary]'
$no"--no-auto-hybrid-regex[don't dynamically use PCRE2 if necessary]"
'--auto-hybrid-regex[DEPRECATED: dynamically use PCRE2 if necessary]'
$no"--no-auto-hybrid-regex[DEPRECATED: don't dynamically use PCRE2 if necessary]"
+ '(ignore)' # Ignore-file options
"(--no-ignore-global --no-ignore-parent --no-ignore-vcs --no-ignore-dot)--no-ignore[don't respect ignore files]"
@ -182,7 +192,8 @@ _rg() {
$no"--no-max-columns-preview[don't show preview for long lines (with -M)]"
+ '(max-depth)' # Directory-depth options
'--max-depth=[specify max number of directories to descend]:number of directories'
{-d,--max-depth}'[specify max number of directories to descend]:number of directories'
'--maxdepth=[alias for --max-depth]:number of directories'
'!--maxdepth=:number of directories'
+ '(messages)' # Error-message options
@ -210,15 +221,15 @@ _rg() {
+ '(passthru)' # Pass-through options
'(--vimgrep)--passthru[show both matching and non-matching lines]'
'!(--vimgrep)--passthrough'
'(--vimgrep)--passthrough[alias for --passthru]'
+ '(pcre2)' # PCRE2 options
{-P,--pcre2}'[enable matching with PCRE2]'
$no'(pcre2-unicode)--no-pcre2[disable matching with PCRE2]'
+ '(pcre2-unicode)' # PCRE2 Unicode options
$no'(--no-pcre2 --no-pcre2-unicode)--pcre2-unicode[enable PCRE2 Unicode mode (with -P)]'
'(--no-pcre2 --pcre2-unicode)--no-pcre2-unicode[disable PCRE2 Unicode mode (with -P)]'
$no'(--no-pcre2 --no-pcre2-unicode)--pcre2-unicode[DEPRECATED: enable PCRE2 Unicode mode (with -P)]'
'(--no-pcre2 --pcre2-unicode)--no-pcre2-unicode[DEPRECATED: disable PCRE2 Unicode mode (with -P)]'
+ '(pre)' # Preprocessing options
'(-z --search-zip)--pre=[specify preprocessor utility]:preprocessor utility:_command_names -e'
@ -252,7 +263,8 @@ _rg() {
accessed\:"sort by last accessed time"
created\:"sort by creation time"
))'
'!(threads)--sort-files[sort results by file path (disables parallelism)]'
'(threads)--sort-files[DEPRECATED: sort results by file path (disables parallelism)]'
$no"--no-sort-files[DEPRECATED: do not sort results]"
+ '(stats)' # Statistics options
'(--files file-match)--stats[show search statistics]'
@ -293,6 +305,7 @@ _rg() {
+ misc # Other options — no need to separate these at the moment
'(-b --byte-offset)'{-b,--byte-offset}'[show 0-based byte offset for each matching line]'
$no"--no-byte-offset[don't show byte offsets for each matching line]"
'--color=[specify when to use colors in output]:when:((
never\:"never use colors"
auto\:"use colors or not based on stdout, TERM, etc."
@ -305,11 +318,14 @@ _rg() {
'--debug[show debug messages]'
'--field-context-separator[set string to delimit fields in context lines]'
'--field-match-separator[set string to delimit fields in matching lines]'
'--hostname-bin=[executable for getting system hostname]:hostname executable:_command_names -e'
'--hyperlink-format=[specify pattern for hyperlinks]:pattern'
'--trace[show more verbose debug messages]'
'--dfa-size-limit=[specify upper size limit of generated DFA]:DFA size (bytes)'
"(1 stats)--files[show each file that would be searched (but don't search)]"
'*--ignore-file=[specify additional ignore file]:ignore file:_files'
'(-v --invert-match)'{-v,--invert-match}'[invert matching]'
$no"--no-invert-match[do not invert matching]"
'(-M --max-columns)'{-M+,--max-columns=}'[specify max length of lines to print]:number of bytes'
'(-m --max-count)'{-m+,--max-count=}'[specify max number of matches per file]:number of matches'
'--max-filesize=[specify size above which files should be ignored]:file size (bytes)'
@ -319,6 +335,7 @@ _rg() {
'(-q --quiet)'{-q,--quiet}'[suppress normal output]'
'--regex-size-limit=[specify upper size limit of compiled regex]:regex size (bytes)'
'*'{-u,--unrestricted}'[reduce level of "smart" searching]'
'--stop-on-nonmatch[stop on first non-matching line after a matching one]'
+ operand # Operands
'(--files --type-list file regexp)1: :_guard "^-*" pattern'
@ -396,32 +413,8 @@ _rg_encodings() {
local -a expl
local -aU _encodings
# This is impossible to read, but these encodings rarely if ever change, so it
# probably doesn't matter. They are derived from the list given here:
# https://encoding.spec.whatwg.org/#concept-encoding-get
_encodings=(
{{,us-}ascii,arabic,chinese,cyrillic,greek{,8},hebrew,korean}
logical visual mac {,cs}macintosh x-mac-{cyrillic,roman,ukrainian}
866 ibm{819,866} csibm866
big5{,-hkscs} {cn-,cs}big5 x-x-big5
cp{819,866,125{0..8}} x-cp125{0..8}
csiso2022{jp,kr} csiso8859{6,8}{e,i}
csisolatin{{1..6},9} csisolatin{arabic,cyrillic,greek,hebrew}
ecma-{114,118} asmo-708 elot_928 sun_eu_greek
euc-{jp,kr} x-euc-jp cseuckr cseucpkdfmtjapanese
{,x-}gbk csiso58gb231280 gb18030 {,cs}gb2312 gb_2312{,-80} hz-gb-2312
iso-2022-{cn,cn-ext,jp,kr}
iso8859{,-}{{1..11},13,14,15}
iso-8859-{{1..11},{6,8}-{e,i},13,14,15,16} iso_8859-{{1..9},15}
iso_8859-{1,2,6,7}:1987 iso_8859-{3,4,5,8}:1988 iso_8859-9:1989
iso-ir-{58,100,101,109,110,126,127,138,144,148,149,157}
koi{,8,8-r,8-ru,8-u,8_r} cskoi8r
ks_c_5601-{1987,1989} ksc{,_}5691 csksc56011987
latin{1..6} l{{1..6},9}
shift{-,_}jis csshiftjis {,x-}sjis ms_kanji ms932
utf{,-}8 utf-16{,be,le} unicode-1-1-utf-8
windows-{31j,874,949,125{0..8}} dos-874 tis-620 ansi_x3.4-1968
x-user-defined auto none
!ENCODINGS!
)
_wanted encodings expl encoding compadd -a "$@" - _encodings
@ -432,12 +425,24 @@ _rg_types() {
local -a expl
local -aU _types
_types=( ${(@)${(f)"$( _call_program types rg --type-list )"}%%:*} )
_types=( ${(@)${(f)"$( _call_program types $words[1] --type-list )"}//:[[:space:]]##/:} )
_wanted types expl 'file type' compadd -a "$@" - _types
if zstyle -t ":completion:${curcontext}:types" extra-verbose; then
_describe -t types 'file type' _types
else
_wanted types expl 'file type' compadd "$@" - ${(@)_types%%:*}
fi
}
_rg "$@"
# Don't run the completion function when being sourced by itself.
#
# See https://github.com/BurntSushi/ripgrep/issues/2956
# See https://github.com/BurntSushi/ripgrep/pull/2957
if [[ $funcstack[1] == _rg ]] || (( ! $+functions[compdef] )); then
_rg "$@"
else
compdef _rg rg
fi
################################################################################
# ZSH COMPLETION REFERENCE


@ -0,0 +1,23 @@
/*!
Provides completions for ripgrep's CLI for the zsh shell.
Unlike completion scripts for other shells (at time of writing), zsh's
completions for ripgrep are maintained by hand. This is because:
1. They are lovingly written by an expert in such things.
2. They are much higher in quality than the ones below that are auto-generated.
Namely, the zsh completions take application level context about flag
compatibility into account.
3. There is a CI script that fails if a new flag is added to ripgrep that
isn't included in the zsh completions.
4. There is a wealth of documentation in the zsh script explaining how it
works and how it can be extended.
In principle, I'd be open to maintaining any completion script by hand so
long as it meets criteria 3 and 4 above.
*/
/// Generate completions for zsh.
pub(crate) fn generate() -> String {
    include_str!("rg.zsh").replace("!ENCODINGS!", super::ENCODINGS.trim_end())
}
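
The placeholder substitution performed by `generate` above can be sketched in isolation. This is a minimal illustration, not ripgrep's code: `TEMPLATE`, `ENCODINGS` and `generate_zsh` are hypothetical stand-ins for the real `rg.zsh` file and `super::ENCODINGS`, which are not shown in this diff.

```rust
/// Hypothetical stand-in for the `rg.zsh` template (the real file is far larger).
const TEMPLATE: &str = "_encodings=(\n  !ENCODINGS!\n)\n";

/// Hypothetical stand-in for `super::ENCODINGS`; note the trailing newline.
const ENCODINGS: &str = "ascii utf-8 utf-16le\n";

/// Fill the `!ENCODINGS!` placeholder. Trailing whitespace is trimmed first so
/// that the text following the placeholder keeps its own line layout.
fn generate_zsh(template: &str, encodings: &str) -> String {
    template.replace("!ENCODINGS!", encodings.trim_end())
}
```

Calling `generate_zsh(TEMPLATE, ENCODINGS)` yields the template with the encoding labels spliced in and the closing parenthesis still on its own line, which is presumably why the real code trims the list before substituting it.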


@ -1,22 +1,20 @@
// This module provides routines for reading ripgrep config "rc" files. The
// primary output of these routines is a sequence of arguments, where each
// argument corresponds precisely to one shell argument.
/*!
This module provides routines for reading ripgrep config "rc" files.
use std::env;
use std::error::Error;
use std::ffi::OsString;
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};
The primary output of these routines is a sequence of arguments, where each
argument corresponds precisely to one shell argument.
*/
use std::{
ffi::OsString,
path::{Path, PathBuf},
};
use bstr::{io::BufReadExt, ByteSlice};
use log;
use crate::Result;
/// Return a sequence of arguments derived from ripgrep rc configuration files.
pub fn args() -> Vec<OsString> {
let config_path = match env::var_os("RIPGREP_CONFIG_PATH") {
let config_path = match std::env::var_os("RIPGREP_CONFIG_PATH") {
None => return vec![],
Some(config_path) => {
if config_path.is_empty() {
@ -58,11 +56,11 @@ pub fn args() -> Vec<OsString> {
/// for each line in addition to successfully parsed arguments.
fn parse<P: AsRef<Path>>(
path: P,
) -> Result<(Vec<OsString>, Vec<Box<dyn Error>>)> {
) -> anyhow::Result<(Vec<OsString>, Vec<anyhow::Error>)> {
let path = path.as_ref();
match File::open(&path) {
match std::fs::File::open(&path) {
Ok(file) => parse_reader(file),
Err(err) => Err(From::from(format!("{}: {}", path.display(), err))),
Err(err) => anyhow::bail!("{}: {}", path.display(), err),
}
}
@ -77,10 +75,10 @@ fn parse<P: AsRef<Path>>(
/// If the reader could not be read, then an error is returned. If there was a
/// problem parsing one or more lines, then errors are returned for each line
/// in addition to successfully parsed arguments.
fn parse_reader<R: io::Read>(
fn parse_reader<R: std::io::Read>(
rdr: R,
) -> Result<(Vec<OsString>, Vec<Box<dyn Error>>)> {
let bufrdr = io::BufReader::new(rdr);
) -> anyhow::Result<(Vec<OsString>, Vec<anyhow::Error>)> {
let mut bufrdr = std::io::BufReader::new(rdr);
let (mut args, mut errs) = (vec![], vec![]);
let mut line_number = 0;
bufrdr.for_byte_line_with_terminator(|line| {
@ -95,7 +93,7 @@ fn parse_reader<R: io::Read>(
args.push(osstr.to_os_string());
}
Err(err) => {
errs.push(format!("{}: {}", line_number, err).into());
errs.push(anyhow::anyhow!("{line_number}: {err}"));
}
}
Ok(true)
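
The overall shape of `parse_reader` (read line by line, collect arguments, collect per-line errors tagged with the line number) can be sketched with the standard library alone. This is an approximation under stated assumptions: it uses `String` instead of `OsString`, plain `BufRead` instead of `bstr`, and it assumes blank lines are skipped, which this hunk does not show. The name `parse_rc` is hypothetical.

```rust
use std::io::BufRead;

/// Parse rc-style configuration: one shell argument per line, surrounding
/// whitespace trimmed, lines starting with `#` ignored.
///
/// Returns the parsed arguments together with one message per line that was
/// not valid UTF-8. Skipping blank lines is an assumption of this sketch.
fn parse_rc<R: BufRead>(
    mut rdr: R,
) -> std::io::Result<(Vec<String>, Vec<String>)> {
    let (mut args, mut errs) = (vec![], vec![]);
    let mut line_number = 0;
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // `read_until` returns 0 only at end of input.
        if rdr.read_until(b'\n', &mut buf)? == 0 {
            break;
        }
        line_number += 1;
        match std::str::from_utf8(&buf) {
            Ok(line) => {
                let line = line.trim();
                if line.is_empty() || line.starts_with('#') {
                    continue;
                }
                args.push(line.to_string());
            }
            // A bad line is recorded but does not stop parsing.
            Err(err) => errs.push(format!("{line_number}: {err}")),
        }
    }
    Ok((args, errs))
}
```

As in the diff, a failure to read the input at all is a hard error, while a problem on a single line is reported next to the arguments that did parse.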

7675
crates/core/flags/defs.rs Normal file

File diff suppressed because it is too large


@ -0,0 +1,259 @@
/*!
Provides routines for generating ripgrep's "short" and "long" help
documentation.
The short version is used when the `-h` flag is given, while the long version
is used when the `--help` flag is given.
*/
use std::{collections::BTreeMap, fmt::Write};
use crate::flags::{defs::FLAGS, doc::version, Category, Flag};
const TEMPLATE_SHORT: &'static str = include_str!("template.short.help");
const TEMPLATE_LONG: &'static str = include_str!("template.long.help");
/// Wraps `std::write!` and asserts there is no failure.
///
/// We only write to `String` in this module.
macro_rules! write {
($($tt:tt)*) => { std::write!($($tt)*).unwrap(); }
}
/// Generate short documentation, i.e., for `-h`.
pub(crate) fn generate_short() -> String {
let mut cats: BTreeMap<Category, (Vec<String>, Vec<String>)> =
BTreeMap::new();
let (mut maxcol1, mut maxcol2) = (0, 0);
for flag in FLAGS.iter().copied() {
let columns =
cats.entry(flag.doc_category()).or_insert((vec![], vec![]));
let (col1, col2) = generate_short_flag(flag);
maxcol1 = maxcol1.max(col1.len());
maxcol2 = maxcol2.max(col2.len());
columns.0.push(col1);
columns.1.push(col2);
}
let mut out =
TEMPLATE_SHORT.replace("!!VERSION!!", &version::generate_digits());
for (cat, (col1, col2)) in cats.iter() {
let var = format!("!!{name}!!", name = cat.as_str());
let val = format_short_columns(col1, col2, maxcol1, maxcol2);
out = out.replace(&var, &val);
}
out
}
/// Generate short documentation for a single flag.
///
/// The first element corresponds to the flag name while the second element
/// corresponds to the documentation string.
fn generate_short_flag(flag: &dyn Flag) -> (String, String) {
let (mut col1, mut col2) = (String::new(), String::new());
// Some of the variable names are fine for longer form
// docs, but they make the succinct short help very noisy.
// So just shorten some of them.
let var = flag.doc_variable().map(|s| {
let mut s = s.to_string();
s = s.replace("SEPARATOR", "SEP");
s = s.replace("REPLACEMENT", "TEXT");
s = s.replace("NUM+SUFFIX?", "NUM");
s
});
// Generate the first column, the flag name.
if let Some(byte) = flag.name_short() {
let name = char::from(byte);
write!(col1, r"-{name}");
write!(col1, r", ");
}
write!(col1, r"--{name}", name = flag.name_long());
if let Some(var) = var.as_ref() {
write!(col1, r"={var}");
}
// And now the second column, with the description.
write!(col2, "{}", flag.doc_short());
(col1, col2)
}
/// Write two columns of documentation.
///
/// `maxcol1` should be the maximum length (in bytes) of the first column,
/// while `maxcol2` should be the maximum length (in bytes) of the second
/// column.
fn format_short_columns(
col1: &[String],
col2: &[String],
maxcol1: usize,
_maxcol2: usize,
) -> String {
assert_eq!(col1.len(), col2.len(), "columns must have equal length");
const PAD: usize = 2;
let mut out = String::new();
for (i, (c1, c2)) in col1.iter().zip(col2.iter()).enumerate() {
if i > 0 {
write!(out, "\n");
}
let pad = maxcol1 - c1.len() + PAD;
write!(out, " ");
write!(out, "{c1}");
write!(out, "{}", " ".repeat(pad));
write!(out, "{c2}");
}
out
}
/// Generate long documentation, i.e., for `--help`.
pub(crate) fn generate_long() -> String {
let mut cats = BTreeMap::new();
for flag in FLAGS.iter().copied() {
let mut cat = cats.entry(flag.doc_category()).or_insert(String::new());
if !cat.is_empty() {
write!(cat, "\n\n");
}
generate_long_flag(flag, &mut cat);
}
let mut out =
TEMPLATE_LONG.replace("!!VERSION!!", &version::generate_digits());
for (cat, value) in cats.iter() {
let var = format!("!!{name}!!", name = cat.as_str());
out = out.replace(&var, value);
}
out
}
/// Write generated documentation for `flag` to `out`.
fn generate_long_flag(flag: &dyn Flag, out: &mut String) {
if let Some(byte) = flag.name_short() {
let name = char::from(byte);
write!(out, r" -{name}");
if let Some(var) = flag.doc_variable() {
write!(out, r" {var}");
}
write!(out, r", ");
} else {
write!(out, r" ");
}
let name = flag.name_long();
write!(out, r"--{name}");
if let Some(var) = flag.doc_variable() {
write!(out, r"={var}");
}
write!(out, "\n");
let doc = flag.doc_long().trim();
let doc = super::render_custom_markup(doc, "flag", |name, out| {
let Some(flag) = crate::flags::parse::lookup(name) else {
unreachable!(r"found unrecognized \flag{{{name}}} in --help docs")
};
if let Some(name) = flag.name_short() {
write!(out, r"-{}/", char::from(name));
}
write!(out, r"--{}", flag.name_long());
});
let doc = super::render_custom_markup(&doc, "flag-negate", |name, out| {
let Some(flag) = crate::flags::parse::lookup(name) else {
unreachable!(
r"found unrecognized \flag-negate{{{name}}} in --help docs"
)
};
let Some(name) = flag.name_negated() else {
let long = flag.name_long();
unreachable!(
"found \\flag-negate{{{long}}} in --help docs but \
{long} does not have a negation"
);
};
write!(out, r"--{name}");
});
let mut cleaned = remove_roff(&doc);
if let Some(negated) = flag.name_negated() {
// Flags that can be negated that aren't switches, like
// --context-separator, are somewhat weird. Because of that, the docs
// for those flags should discuss the semantics of negation explicitly.
// But for switches, the behavior is always the same.
if flag.is_switch() {
write!(cleaned, "\n\nThis flag can be disabled with --{negated}.");
}
}
let indent = " ".repeat(8);
let wrapopts = textwrap::Options::new(71)
// Normally I'd be fine with breaking at hyphens, but ripgrep's docs
// include a lot of flag names, and they in turn contain hyphens.
// Breaking flag names across lines is not great.
.word_splitter(textwrap::WordSplitter::NoHyphenation);
for (i, paragraph) in cleaned.split("\n\n").enumerate() {
if i > 0 {
write!(out, "\n\n");
}
let mut new = paragraph.to_string();
if paragraph.lines().all(|line| line.starts_with(" ")) {
// Re-indent but don't refill so as to preserve line breaks
// in code/shell example snippets.
new = textwrap::indent(&new, &indent);
} else {
new = new.replace("\n", " ");
new = textwrap::refill(&new, &wrapopts);
new = textwrap::indent(&new, &indent);
}
write!(out, "{}", new.trim_end());
}
}
/// Removes roff syntax from `v` such that the result is approximately plain
/// text readable.
///
/// This is basically a mish mash of heuristics based on the specific roff used
/// in the docs for the flags in this tool. If new kinds of roff are used in
/// the docs, then this may need to be updated to handle them.
fn remove_roff(v: &str) -> String {
let mut lines = vec![];
for line in v.trim().lines() {
assert!(!line.is_empty(), "roff should have no empty lines");
if line.starts_with(".") {
if line.starts_with(".IP ") {
let item_label = line
.split(" ")
.nth(1)
.expect("first argument to .IP")
.replace(r"\(bu", r"•")
.replace(r"\fB", "")
.replace(r"\fP", ":");
lines.push(format!("{item_label}"));
} else if line.starts_with(".IB ") || line.starts_with(".BI ") {
let pieces = line
.split_whitespace()
.skip(1)
.collect::<Vec<_>>()
.concat();
lines.push(format!("{pieces}"));
} else if line.starts_with(".sp")
|| line.starts_with(".PP")
|| line.starts_with(".TP")
{
lines.push("".to_string());
}
} else if line.starts_with(r"\fB") && line.ends_with(r"\fP") {
let line = line.replace(r"\fB", "").replace(r"\fP", "");
lines.push(format!("{line}:"));
} else {
lines.push(line.to_string());
}
}
// Squash multiple adjacent paragraph breaks into one.
lines.dedup_by(|l1, l2| l1.is_empty() && l2.is_empty());
lines
.join("\n")
.replace(r"\fB", "")
.replace(r"\fI", "")
.replace(r"\fP", "")
.replace(r"\-", "-")
.replace(r"\\", r"\")
}
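
A reduced version of `remove_roff` may help show the idea. The sketch below handles only paragraph-break macros and the inline escapes replaced at the end of the function above; `.IP`, `.BI`/`.IB` and bold heading lines are deliberately left out. The name `roff_to_plain` is hypothetical.

```rust
/// Convert a small subset of roff into plain text.
///
/// `.sp` and `.PP` become paragraph breaks, every other macro line is dropped
/// (a simplification of this sketch), and inline font escapes are stripped.
fn roff_to_plain(v: &str) -> String {
    let mut lines: Vec<String> = vec![];
    for line in v.trim().lines() {
        if line.starts_with(".sp") || line.starts_with(".PP") {
            lines.push(String::new());
        } else if line.starts_with('.') {
            // Unhandled macro: ignored here.
        } else {
            lines.push(line.to_string());
        }
    }
    // Squash adjacent paragraph breaks into one.
    lines.dedup_by(|l1, l2| l1.is_empty() && l2.is_empty());
    lines
        .join("\n")
        .replace(r"\fB", "")
        .replace(r"\fI", "")
        .replace(r"\fP", "")
        .replace(r"\-", "-")
        .replace(r"\\", r"\")
}
```

Given two paragraphs separated by both `.sp` and `.PP`, the result contains a single blank line between them, and `\fB\-\-hidden\fP` comes out as `--hidden`.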


@ -0,0 +1,110 @@
/*!
Provides routines for generating ripgrep's man page in `roff` format.
*/
use std::{collections::BTreeMap, fmt::Write};
use crate::flags::{defs::FLAGS, doc::version, Flag};
const TEMPLATE: &'static str = include_str!("template.rg.1");
/// Wraps `std::write!` and asserts there is no failure.
///
/// We only write to `String` in this module.
macro_rules! write {
($($tt:tt)*) => { std::write!($($tt)*).unwrap(); }
}
/// Wraps `std::writeln!` and asserts there is no failure.
///
/// We only write to `String` in this module.
macro_rules! writeln {
($($tt:tt)*) => { std::writeln!($($tt)*).unwrap(); }
}
/// Returns a `roff` formatted string corresponding to ripgrep's entire man
/// page.
pub(crate) fn generate() -> String {
let mut cats = BTreeMap::new();
for flag in FLAGS.iter().copied() {
let mut cat = cats.entry(flag.doc_category()).or_insert(String::new());
if !cat.is_empty() {
writeln!(cat, ".sp");
}
generate_flag(flag, &mut cat);
}
let mut out = TEMPLATE.replace("!!VERSION!!", &version::generate_digits());
for (cat, value) in cats.iter() {
let var = format!("!!{name}!!", name = cat.as_str());
out = out.replace(&var, value);
}
out
}
/// Writes `roff` formatted documentation for `flag` to `out`.
fn generate_flag(flag: &'static dyn Flag, out: &mut String) {
if let Some(byte) = flag.name_short() {
let name = char::from(byte);
write!(out, r"\fB\-{name}\fP");
if let Some(var) = flag.doc_variable() {
write!(out, r" \fI{var}\fP");
}
write!(out, r", ");
}
let name = flag.name_long();
write!(out, r"\fB\-\-{name}\fP");
if let Some(var) = flag.doc_variable() {
write!(out, r"=\fI{var}\fP");
}
write!(out, "\n");
writeln!(out, ".RS 4");
let doc = flag.doc_long().trim();
// Convert \flag{foo} into something nicer.
let doc = super::render_custom_markup(doc, "flag", |name, out| {
let Some(flag) = crate::flags::parse::lookup(name) else {
unreachable!(r"found unrecognized \flag{{{name}}} in roff docs")
};
out.push_str(r"\fB");
if let Some(name) = flag.name_short() {
write!(out, r"\-{}/", char::from(name));
}
write!(out, r"\-\-{}", flag.name_long());
out.push_str(r"\fP");
});
// Convert \flag-negate{foo} into something nicer.
let doc = super::render_custom_markup(&doc, "flag-negate", |name, out| {
let Some(flag) = crate::flags::parse::lookup(name) else {
unreachable!(
r"found unrecognized \flag-negate{{{name}}} in roff docs"
)
};
let Some(name) = flag.name_negated() else {
let long = flag.name_long();
unreachable!(
"found \\flag-negate{{{long}}} in roff docs but \
{long} does not have a negation"
);
};
out.push_str(r"\fB");
write!(out, r"\-\-{name}");
out.push_str(r"\fP");
});
writeln!(out, "{doc}");
if let Some(negated) = flag.name_negated() {
// Flags that can be negated that aren't switches, like
// --context-separator, are somewhat weird. Because of that, the docs
// for those flags should discuss the semantics of negation explicitly.
// But for switches, the behavior is always the same.
if flag.is_switch() {
writeln!(out, ".sp");
writeln!(
out,
r"This flag can be disabled with \fB\-\-{negated}\fP."
);
}
}
writeln!(out, ".RE");
}
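
The first lines of `generate_flag` build the flag heading for the man page. A standalone sketch of just that step follows; it takes plain values instead of a `&dyn Flag`, and the function name `roff_flag_header` is hypothetical.

```rust
use std::fmt::Write;

/// Build the roff heading for one flag, e.g. for `-e PATTERN, --regexp=PATTERN`.
///
/// Flag names are bold (`\fB...\fP`), the value placeholder is italic
/// (`\fI...\fP`), and the leading dashes are written as `\-`.
fn roff_flag_header(
    short: Option<char>,
    long: &str,
    var: Option<&str>,
) -> String {
    let mut out = String::new();
    if let Some(name) = short {
        // Writing to a `String` cannot fail, so `unwrap` is fine here.
        write!(out, r"\fB\-{name}\fP").unwrap();
        if let Some(var) = var {
            write!(out, r" \fI{var}\fP").unwrap();
        }
        out.push_str(", ");
    }
    write!(out, r"\fB\-\-{long}\fP").unwrap();
    if let Some(var) = var {
        write!(out, r"=\fI{var}\fP").unwrap();
    }
    out
}
```

For example, `roff_flag_header(Some('e'), "regexp", Some("PATTERN"))` produces the heading for a flag with both a short and a long form, while `roff_flag_header(None, "hidden", None)` produces only the bold long name.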


@ -0,0 +1,38 @@
/*!
Modules for generating documentation for ripgrep's flags.
*/
pub(crate) mod help;
pub(crate) mod man;
pub(crate) mod version;
/// Searches for `\tag{...}` occurrences in `doc` and calls `replacement` for
/// each such tag found.
///
/// The first argument given to `replacement` is the tag value, `...`. The
/// second argument is the buffer that accumulates the full replacement text.
///
/// Since this function is only intended to be used on doc strings written into
/// the program source code, callers should panic in `replacement` if there are
/// any errors or unexpected circumstances.
fn render_custom_markup(
    mut doc: &str,
    tag: &str,
    mut replacement: impl FnMut(&str, &mut String),
) -> String {
    let mut out = String::with_capacity(doc.len());
    let tag_prefix = format!(r"\{tag}{{");
    while let Some(offset) = doc.find(&tag_prefix) {
        out.push_str(&doc[..offset]);
        let start = offset + tag_prefix.len();
        let Some(end) = doc[start..].find('}').map(|i| start + i) else {
            unreachable!(r"found {tag_prefix} without closing }}");
        };
        let name = &doc[start..end];
        replacement(name, &mut out);
        doc = &doc[end + 1..];
    }
    out.push_str(doc);
    out
}
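
To see how `render_custom_markup` is meant to be driven, here is a self-contained sketch. `render_tag` mirrors the scanning loop above (it panics with `expect` rather than `unreachable!`), and `render_flags` is a hypothetical caller that simply prefixes each name with `--` instead of looking the flag up.

```rust
/// Replace each `\tag{name}` in `doc` with whatever `replacement` appends.
fn render_tag(
    mut doc: &str,
    tag: &str,
    mut replacement: impl FnMut(&str, &mut String),
) -> String {
    let mut out = String::with_capacity(doc.len());
    // For tag = "flag" this is the literal text `\flag{`.
    let prefix = format!(r"\{tag}{{");
    while let Some(offset) = doc.find(&prefix) {
        // Copy everything before the tag unchanged.
        out.push_str(&doc[..offset]);
        let start = offset + prefix.len();
        let end = doc[start..]
            .find('}')
            .map(|i| start + i)
            .expect("tag without closing brace");
        replacement(&doc[start..end], &mut out);
        // Continue scanning after the closing brace.
        doc = &doc[end + 1..];
    }
    out.push_str(doc);
    out
}

/// Hypothetical caller: render `\flag{name}` as `--name`.
fn render_flags(doc: &str) -> String {
    render_tag(doc, "flag", |name, out| {
        out.push_str("--");
        out.push_str(name);
    })
}
```

Because the prefix includes the opening brace, a pass for `flag` leaves `\flag-negate{...}` untouched, which is consistent with the two separate passes made by the help and man page generators.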


@ -0,0 +1,61 @@
ripgrep !!VERSION!!
Andrew Gallant <jamslam@gmail.com>
ripgrep (rg) recursively searches the current directory for lines matching
a regex pattern. By default, ripgrep will respect gitignore rules and
automatically skip hidden files/directories and binary files.
Use -h for short descriptions and --help for more details.
Project home page: https://github.com/BurntSushi/ripgrep
USAGE:
rg [OPTIONS] PATTERN [PATH ...]
rg [OPTIONS] -e PATTERN ... [PATH ...]
rg [OPTIONS] -f PATTERNFILE ... [PATH ...]
rg [OPTIONS] --files [PATH ...]
rg [OPTIONS] --type-list
command | rg [OPTIONS] PATTERN
rg [OPTIONS] --help
rg [OPTIONS] --version
POSITIONAL ARGUMENTS:
<PATTERN>
A regular expression used for searching. To match a pattern beginning
with a dash, use the -e/--regexp flag.
For example, to search for the literal '-foo', you can use this flag:
rg -e -foo
You can also use the special '--' delimiter to indicate that no more
flags will be provided. Namely, the following is equivalent to the
above:
rg -- -foo
<PATH>...
A file or directory to search. Directories are searched recursively.
File paths specified on the command line override glob and ignore
rules.
INPUT OPTIONS:
!!input!!
SEARCH OPTIONS:
!!search!!
FILTER OPTIONS:
!!filter!!
OUTPUT OPTIONS:
!!output!!
OUTPUT MODES:
!!output-modes!!
LOGGING OPTIONS:
!!logging!!
OTHER BEHAVIORS:
!!other-behaviors!!
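
The `!!name!!` markers in this template are filled in by `generate_long`. A minimal sketch of that substitution step, assuming the per-category text has already been rendered, looks like this. `fill_template` and its parameters are hypothetical names; the real code keys the map by `Category` rather than by string.

```rust
use std::collections::BTreeMap;

/// Replace `!!VERSION!!` and every `!!<category>!!` marker in `template`.
///
/// `sections` maps a category name (e.g. "input") to its rendered text.
fn fill_template(
    template: &str,
    version: &str,
    sections: &BTreeMap<&str, String>,
) -> String {
    let mut out = template.replace("!!VERSION!!", version);
    for (name, value) in sections {
        let var = format!("!!{name}!!");
        out = out.replace(&var, value);
    }
    out
}
```

A `BTreeMap` gives a deterministic iteration order, although the order does not affect the output here because each marker is replaced independently.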


@ -0,0 +1,424 @@
.TH RG 1 2024-09-08 "!!VERSION!!" "User Commands"
.
.
.SH NAME
rg \- recursively search the current directory for lines matching a pattern
.
.
.SH SYNOPSIS
.\" I considered using GNU troff's .SY and .YS "synopsis" macros here, but it
.\" looks like they aren't portable. Specifically, they don't appear to be in
.\" BSD's mdoc used on macOS.
.sp
\fBrg\fP [\fIOPTIONS\fP] \fIPATTERN\fP [\fIPATH\fP...]
.sp
\fBrg\fP [\fIOPTIONS\fP] \fB\-e\fP \fIPATTERN\fP... [\fIPATH\fP...]
.sp
\fBrg\fP [\fIOPTIONS\fP] \fB\-f\fP \fIPATTERNFILE\fP... [\fIPATH\fP...]
.sp
\fBrg\fP [\fIOPTIONS\fP] \fB\-\-files\fP [\fIPATH\fP...]
.sp
\fBrg\fP [\fIOPTIONS\fP] \fB\-\-type\-list\fP
.sp
\fIcommand\fP | \fBrg\fP [\fIOPTIONS\fP] \fIPATTERN\fP
.sp
\fBrg\fP [\fIOPTIONS\fP] \fB\-\-help\fP
.sp
\fBrg\fP [\fIOPTIONS\fP] \fB\-\-version\fP
.
.
.SH DESCRIPTION
ripgrep (rg) recursively searches the current directory for a regex pattern.
By default, ripgrep will respect your \fB.gitignore\fP and automatically skip
hidden files/directories and binary files.
.sp
ripgrep's default regex engine uses finite automata and guarantees linear
time searching. Because of this, features like backreferences and arbitrary
look-around are not supported. However, if ripgrep is built with PCRE2,
then the \fB\-P/\-\-pcre2\fP flag can be used to enable backreferences and
look-around.
.sp
ripgrep supports configuration files. Set \fBRIPGREP_CONFIG_PATH\fP to a
configuration file. The file can specify one shell argument per line. Lines
starting with \fB#\fP are ignored. For more details, see \fBCONFIGURATION
FILES\fP below.
.sp
ripgrep will automatically detect if stdin is a readable file and search stdin
for a regex pattern, e.g. \fBls | rg foo\fP. In some environments, stdin may
exist when it shouldn't. To turn off stdin detection, one can explicitly
specify the directory to search, e.g. \fBrg foo ./\fP.
.sp
Like other tools such as \fBls\fP, ripgrep will alter its output depending on
whether stdout is connected to a tty. By default, when printing to a tty, ripgrep
will enable colors, line numbers and a heading format that lists each matching
file path once instead of once per matching line.
.sp
Tip: to disable all smart filtering and make ripgrep behave a bit more like
classical grep, use \fBrg \-uuu\fP.
.
.
.SH REGEX SYNTAX
ripgrep uses Rust's regex engine by default, which documents its syntax:
\fIhttps://docs.rs/regex/1.*/regex/#syntax\fP
.sp
ripgrep uses byte-oriented regexes, which have some additional documentation:
\fIhttps://docs.rs/regex/1.*/regex/bytes/index.html#syntax\fP
.sp
To a first approximation, ripgrep uses Perl-like regexes without look-around or
backreferences. This makes them very similar to the "extended" (ERE) regular
expressions supported by \fBegrep\fP, but with a few additional features like
Unicode character classes.
.sp
If you're using ripgrep with the \fB\-P/\-\-pcre2\fP flag, then please consult
\fIhttps://www.pcre.org\fP or the PCRE2 man pages for documentation on the
supported syntax.
.
.
.SH POSITIONAL ARGUMENTS
.TP 12
\fIPATTERN\fP
A regular expression used for searching. To match a pattern beginning with a
dash, use the \fB\-e/\-\-regexp\fP option.
.TP 12
\fIPATH\fP
A file or directory to search. Directories are searched recursively. File paths
specified explicitly on the command line override glob and ignore rules.
.
.
.SH OPTIONS
This section documents all flags that ripgrep accepts. Flags are grouped into
categories below according to their function.
.sp
Note that many options can be turned on and off. In some cases, those flags are
not listed explicitly below. For example, the \fB\-\-column\fP flag (listed
below) enables column numbers in ripgrep's output, but the \fB\-\-no\-column\fP
flag (not listed below) disables them. The reverse can also exist. For example,
the \fB\-\-no\-ignore\fP flag (listed below) disables ripgrep's \fBgitignore\fP
logic, but the \fB\-\-ignore\fP flag (not listed below) enables it. These
flags are useful for overriding a ripgrep configuration file (or alias) on the
command line. Each flag's documentation notes whether an inverted flag exists.
In all cases, the flag specified last takes precedence.
.
.SS INPUT OPTIONS
!!input!!
.
.SS SEARCH OPTIONS
!!search!!
.
.SS FILTER OPTIONS
!!filter!!
.
.SS OUTPUT OPTIONS
!!output!!
.
.SS OUTPUT MODES
!!output-modes!!
.
.SS LOGGING OPTIONS
!!logging!!
.
.SS OTHER BEHAVIORS
!!other-behaviors!!
.
.
.SH EXIT STATUS
If ripgrep finds a match, then the exit status of the program is \fB0\fP.
If no match could be found, then the exit status is \fB1\fP. If an error
occurred, then the exit status is always \fB2\fP unless ripgrep was run with
the \fB\-q/\-\-quiet\fP flag and a match was found. In summary:
.sp
.IP \(bu 3n
\fB0\fP exit status occurs only when at least one match was found and either
no error occurred or \fB\-q/\-\-quiet\fP was given.
.
.IP \(bu 3n
\fB1\fP exit status occurs only when no match was found and no error occurred.
.
.IP \(bu 3n
\fB2\fP exit status occurs when an error occurred. This is true for both
catastrophic errors (e.g., a regex syntax error) and for soft errors (e.g.,
unable to read a file).
.
.
.SH AUTOMATIC FILTERING
ripgrep does a fair bit of automatic filtering by default. This section
describes that filtering and how to control it.
.sp
\fBTIP\fP: To disable automatic filtering, use \fBrg \-uuu\fP.
.sp
ripgrep's automatic "smart" filtering is one of the most apparent
differentiating features between ripgrep and other tools like \fBgrep\fP. As
such, its behavior may be surprising to users that aren't expecting it.
.sp
ripgrep does four types of filtering automatically:
.sp
.
.IP 1. 3n
Files and directories that match ignore rules are not searched.
.IP 2. 3n
Hidden files and directories are not searched.
.IP 3. 3n
Binary files (files with a \fBNUL\fP byte) are not searched.
.IP 4. 3n
Symbolic links are not followed.
.PP
The first type of filtering is the most sophisticated. ripgrep will attempt to
respect your \fBgitignore\fP rules as faithfully as possible. In particular,
this includes the following:
.
.IP \(bu 3n
Any global rules, e.g., in \fB$HOME/.config/git/ignore\fP.
.
.IP \(bu 3n
Any rules in relevant \fB.gitignore\fP files. This includes \fB.gitignore\fP
files in parent directories that are part of the same \fBgit\fP repository.
(Unless \fB\-\-no\-require\-git\fP is given.)
.
.IP \(bu 3n
Any local rules, e.g., in \fB.git/info/exclude\fP.
.PP
In some cases, ripgrep and \fBgit\fP will not be in sync in terms
of which files are ignored. For example, a file that is ignored via
\fB.gitignore\fP but is tracked by \fBgit\fP would not be searched by ripgrep
even though \fBgit\fP tracks it. This is unlikely to ever be fixed. Instead,
you should either make sure your exclude rules match the files you track
precisely, or otherwise use \fBgit grep\fP for search.
.sp
Additional ignore rules can be provided outside of a \fBgit\fP context:
.
.IP \(bu 3n
Any rules in \fB.ignore\fP. ripgrep will also respect \fB.ignore\fP files in
parent directories.
.
.IP \(bu 3n
Any rules in \fB.rgignore\fP. ripgrep will also respect \fB.rgignore\fP files
in parent directories.
.
.IP \(bu 3n
Any rules in files specified with the \fB\-\-ignore\-file\fP flag.
.PP
The precedence of ignore rules is as follows, with later items overriding
earlier items:
.
.IP \(bu 3n
Files given by \fB\-\-ignore\-file\fP.
.
.IP \(bu 3n
Global gitignore rules, e.g., from \fB$HOME/.config/git/ignore\fP.
.
.IP \(bu 3n
Local rules from \fB.git/info/exclude\fP.
.
.IP \(bu 3n
Rules from \fB.gitignore\fP.
.
.IP \(bu 3n
Rules from \fB.ignore\fP.
.
.IP \(bu 3n
Rules from \fB.rgignore\fP.
.PP
So for example, if \fIfoo\fP were in a \fB.gitignore\fP and \fB!\fP\fIfoo\fP
were in an \fB.rgignore\fP, then \fIfoo\fP would not be ignored since
\fB.rgignore\fP takes precedence over \fB.gitignore\fP.
.sp
Each of the types of filtering can be configured via command line flags:
.
.IP \(bu 3n
There are several flags starting with \fB\-\-no\-ignore\fP that toggle which,
if any, ignore rules are respected. \fB\-\-no\-ignore\fP by itself will disable
all of them.
.
.IP \(bu 3n
\fB\-./\-\-hidden\fP will force ripgrep to search hidden files and directories.
.
.IP \(bu 3n
\fB\-\-binary\fP will force ripgrep to search binary files.
.
.IP \(bu 3n
\fB\-L/\-\-follow\fP will force ripgrep to follow symlinks.
.PP
As a special shorthand, the \fB\-u\fP flag can be specified up to three times.
Each additional time incrementally decreases filtering:
.
.IP \(bu 3n
\fB\-u\fP is equivalent to \fB\-\-no\-ignore\fP.
.
.IP \(bu 3n
\fB\-uu\fP is equivalent to \fB\-\-no\-ignore \-\-hidden\fP.
.
.IP \(bu 3n
\fB\-uuu\fP is equivalent to \fB\-\-no\-ignore \-\-hidden \-\-binary\fP.
.PP
In particular, \fBrg \-uuu\fP should search the exact same content as \fBgrep
\-r\fP.
.
.
.SH CONFIGURATION FILES
ripgrep supports reading configuration files that change ripgrep's default
behavior. The format of the configuration file is an "rc" style and is very
simple. It is defined by two rules:
.
.IP 1. 3n
Every line is a shell argument, after trimming whitespace.
.
.IP 2. 3n
Lines starting with \fB#\fP (optionally preceded by any amount of whitespace)
are ignored.
.PP
ripgrep will look for a single configuration file if and only if the
\fBRIPGREP_CONFIG_PATH\fP environment variable is set and is non-empty.
ripgrep will parse arguments from this file on startup and will behave as if
the arguments in this file were prepended to any explicit arguments given to
ripgrep on the command line. Note though that the \fBrg\fP command you run
must still be valid. That is, it must always contain at least one pattern at
the command line, even if the configuration file uses the \fB\-e/\-\-regexp\fP
flag.
.sp
For example, if your ripgreprc file contained a single line:
.sp
.EX
\-\-smart\-case
.EE
.sp
then the following command
.sp
.EX
RIPGREP_CONFIG_PATH=wherever/.ripgreprc rg foo
.EE
.sp
would behave identically to the following command:
.sp
.EX
rg \-\-smart\-case foo
.EE
.sp
Another example is adding types, like so:
.sp
.EX
\-\-type\-add
web:*.{html,css,js}*
.EE
.sp
The above would behave identically to the following command:
.sp
.EX
rg \-\-type\-add 'web:*.{html,css,js}*' foo
.EE
.sp
The same applies to using globs. This:
.sp
.EX
\-\-glob=!.git
.EE
.sp
or this:
.sp
.EX
\-\-glob
!.git
.EE
.sp
would behave identically to the following command:
.sp
.EX
rg \-\-glob '!.git' foo
.EE
.sp
The bottom line is that every shell argument needs to be on its own line. So
for example, a config file containing
.sp
.EX
\-j 4
.EE
.sp
is probably not doing what you intend. Instead, you want
.sp
.EX
\-j
4
.EE
.sp
or
.sp
.EX
\-j4
.EE
.sp
ripgrep also provides a flag, \fB\-\-no\-config\fP, that when present will
suppress any and all support for configuration. This includes any future
support for auto-loading configuration files from pre-determined paths.
.sp
Conflicts between configuration files and explicit arguments are handled
exactly like conflicts in the same command line invocation. That is, assuming
your config file contains only \fB\-\-smart\-case\fP, then this command:
.sp
.EX
RIPGREP_CONFIG_PATH=wherever/.ripgreprc rg foo \-\-case\-sensitive
.EE
.sp
is exactly equivalent to
.sp
.EX
rg \-\-smart\-case foo \-\-case\-sensitive
.EE
.sp
in which case, the \fB\-\-case\-sensitive\fP flag would override the
\fB\-\-smart\-case\fP flag.
.
.
.SH SHELL COMPLETION
Shell completion files are included in the release tarball for Bash, Fish, Zsh
and PowerShell.
.sp
For \fBbash\fP, move \fBrg.bash\fP to \fB$XDG_CONFIG_HOME/bash_completion\fP or
\fB/etc/bash_completion.d/\fP.
.sp
For \fBfish\fP, move \fBrg.fish\fP to \fB$HOME/.config/fish/completions\fP.
.sp
For \fBzsh\fP, move \fB_rg\fP to one of your \fB$fpath\fP directories.
.
.
.SH CAVEATS
ripgrep may abort unexpectedly when using default settings if it searches a
file that is simultaneously truncated. This behavior can be avoided by passing
the \fB\-\-no\-mmap\fP flag which will forcefully disable the use of memory
maps in all cases.
.sp
ripgrep may use a large amount of memory depending on a few factors. Firstly,
if ripgrep uses parallelism for search (the default), then the entire
output for each individual file is buffered into memory in order to prevent
interleaving matches in the output. To avoid this, you can disable parallelism
with the \fB\-j1\fP flag. Secondly, ripgrep always needs to have at least a
single line in memory in order to execute a search. A file with a very long
line can thus cause ripgrep to use a lot of memory. Generally, this only occurs
when searching binary data with the \fB\-a/\-\-text\fP flag enabled. (When the
\fB\-a/\-\-text\fP flag isn't enabled, ripgrep will replace all NUL bytes with
line terminators, which typically prevents exorbitant memory usage.) Thirdly,
when ripgrep searches a large file using a memory map, the process will likely
report its resident memory usage as the size of the file. However, this does
not mean ripgrep actually needed to use that much heap memory; the operating
system will generally handle this for you.
.
.
.SH VERSION
!!VERSION!!
.
.
.SH HOMEPAGE
\fIhttps://github.com/BurntSushi/ripgrep\fP
.sp
Please report bugs and feature requests to the issue tracker. Please do your
best to provide a reproducible test case for bugs. This should include the
corpus being searched, the \fBrg\fP command, the actual output and the expected
output. Please also include the output of running the same \fBrg\fP command but
with the \fB\-\-debug\fP flag.
.sp
If you have questions that don't obviously fall into the "bug" or "feature
request" category, then they are welcome in the Discussions section of the
issue tracker: \fIhttps://github.com/BurntSushi/ripgrep/discussions\fP.
.
.
.SH AUTHORS
Andrew Gallant <\fIjamslam@gmail.com\fP>
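
The rules in the EXIT STATUS section of the man page template above can be condensed into a small decision function. This is a sketch of the documented behavior only; the function name and its boolean parameters are hypothetical and do not come from ripgrep's source.

```rust
/// Compute the exit status described in the man page.
///
/// - `matched`: at least one match was found.
/// - `errored`: any error occurred, whether catastrophic or soft.
/// - `quiet`:   `-q/--quiet` was given.
fn exit_status(matched: bool, errored: bool, quiet: bool) -> i32 {
    if errored && !(quiet && matched) {
        // Errors win, except when running quietly and a match was found.
        2
    } else if matched {
        0
    } else {
        1
    }
}
```

The only case where an error does not produce status 2 is a quiet run that found a match, which corresponds to the "unless" clause in the prose.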


@ -0,0 +1,38 @@
ripgrep !!VERSION!!
Andrew Gallant <jamslam@gmail.com>
ripgrep (rg) recursively searches the current directory for lines matching
a regex pattern. By default, ripgrep will respect gitignore rules and
automatically skip hidden files/directories and binary files.
Use -h for short descriptions and --help for more details.
Project home page: https://github.com/BurntSushi/ripgrep
USAGE:
rg [OPTIONS] PATTERN [PATH ...]
POSITIONAL ARGUMENTS:
<PATTERN> A regular expression used for searching.
<PATH>... A file or directory to search.
INPUT OPTIONS:
!!input!!
SEARCH OPTIONS:
!!search!!
FILTER OPTIONS:
!!filter!!
OUTPUT OPTIONS:
!!output!!
OUTPUT MODES:
!!output-modes!!
LOGGING OPTIONS:
!!logging!!
OTHER BEHAVIORS:
!!other-behaviors!!
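The `!!NAME!!` markers in the template above are placeholders that get filled in when the help text is generated. As a rough sketch of that kind of substitution, assuming plain string replacement and a hypothetical `render_template` helper (the generator in this change may do it differently):

```rust
/// Replaces each `!!key!!` placeholder in `template` with its value.
///
/// `render_template` is a hypothetical helper used for illustration only;
/// it is not a function defined by ripgrep.
fn render_template(template: &str, vars: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for (key, value) in vars {
        // Build the marker, e.g. `!!VERSION!!`, and substitute every
        // occurrence of it.
        out = out.replace(&format!("!!{key}!!"), value);
    }
    out
}
```

Placeholders that are not listed in `vars` are left untouched, which makes a forgotten section easy to spot in the rendered output.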


@ -0,0 +1,177 @@
/*!
Provides routines for generating version strings.
Version strings can be just the digits, an overall short one-line description
or something more verbose that includes things like CPU target feature support.
*/
use std::fmt::Write;
/// Generates just the numerical part of the version of ripgrep.
///
/// This includes the git revision hash.
pub(crate) fn generate_digits() -> String {
let semver = option_env!("CARGO_PKG_VERSION").unwrap_or("N/A");
match option_env!("RIPGREP_BUILD_GIT_HASH") {
None => semver.to_string(),
Some(hash) => format!("{semver} (rev {hash})"),
}
}
/// Generates a short version string of the form `ripgrep x.y.z`.
pub(crate) fn generate_short() -> String {
let digits = generate_digits();
format!("ripgrep {digits}")
}
/// Generates a longer multi-line version string.
///
/// This includes not only the version of ripgrep but some other information
/// about its build. For example, SIMD support and PCRE2 support.
pub(crate) fn generate_long() -> String {
let (compile, runtime) = (compile_cpu_features(), runtime_cpu_features());
let mut out = String::new();
writeln!(out, "{}", generate_short()).unwrap();
writeln!(out).unwrap();
writeln!(out, "features:{}", features().join(",")).unwrap();
if !compile.is_empty() {
writeln!(out, "simd(compile):{}", compile.join(",")).unwrap();
}
if !runtime.is_empty() {
writeln!(out, "simd(runtime):{}", runtime.join(",")).unwrap();
}
let (pcre2_version, _) = generate_pcre2();
writeln!(out, "\n{pcre2_version}").unwrap();
out
}
/// Generates multi-line version string with PCRE2 information.
///
/// This also returns whether PCRE2 is actually available in this build of
/// ripgrep.
pub(crate) fn generate_pcre2() -> (String, bool) {
let mut out = String::new();
#[cfg(feature = "pcre2")]
{
use grep::pcre2;
let (major, minor) = pcre2::version();
write!(out, "PCRE2 {}.{} is available", major, minor).unwrap();
if cfg!(target_pointer_width = "64") && pcre2::is_jit_available() {
writeln!(out, " (JIT is available)").unwrap();
} else {
writeln!(out, " (JIT is unavailable)").unwrap();
}
(out, true)
}
#[cfg(not(feature = "pcre2"))]
{
writeln!(out, "PCRE2 is not available in this build of ripgrep.")
.unwrap();
(out, false)
}
}
/// Returns the relevant SIMD features supported by the CPU at runtime.
///
/// This is kind of a dirty violation of abstraction, since it assumes
/// knowledge about what specific SIMD features are being used by various
/// components.
fn runtime_cpu_features() -> Vec<String> {
#[cfg(target_arch = "x86_64")]
{
let mut features = vec![];
let sse2 = is_x86_feature_detected!("sse2");
features.push(format!("{sign}SSE2", sign = sign(sse2)));
let ssse3 = is_x86_feature_detected!("ssse3");
features.push(format!("{sign}SSSE3", sign = sign(ssse3)));
let avx2 = is_x86_feature_detected!("avx2");
features.push(format!("{sign}AVX2", sign = sign(avx2)));
features
}
#[cfg(target_arch = "aarch64")]
{
let mut features = vec![];
// memchr and aho-corasick only use NEON when it is available at
// compile time. This isn't strictly necessary, but NEON is supposed
// to be available for all aarch64 targets. If this isn't true, please
// file an issue at https://github.com/BurntSushi/memchr.
let neon = cfg!(target_feature = "neon");
features.push(format!("{sign}NEON", sign = sign(neon)));
features
}
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
{
vec![]
}
}
/// Returns the SIMD features supported while compiling ripgrep.
///
/// In essence, any features listed here are required to run ripgrep correctly.
///
/// This is kind of a dirty violation of abstraction, since it assumes
/// knowledge about what specific SIMD features are being used by various
/// components.
///
/// An easy way to enable everything available on your current CPU is to
/// compile ripgrep with `RUSTFLAGS="-C target-cpu=native"`. But note that
/// the binary produced by this will not be portable.
fn compile_cpu_features() -> Vec<String> {
#[cfg(target_arch = "x86_64")]
{
let mut features = vec![];
let sse2 = cfg!(target_feature = "sse2");
features.push(format!("{sign}SSE2", sign = sign(sse2)));
let ssse3 = cfg!(target_feature = "ssse3");
features.push(format!("{sign}SSSE3", sign = sign(ssse3)));
let avx2 = cfg!(target_feature = "avx2");
features.push(format!("{sign}AVX2", sign = sign(avx2)));
features
}
#[cfg(target_arch = "aarch64")]
{
let mut features = vec![];
let neon = cfg!(target_feature = "neon");
features.push(format!("{sign}NEON", sign = sign(neon)));
features
}
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
{
vec![]
}
}
/// Returns a list of "features" supported (or not) by this build of ripgrep.
fn features() -> Vec<String> {
let mut features = vec![];
let pcre2 = cfg!(feature = "pcre2");
features.push(format!("{sign}pcre2", sign = sign(pcre2)));
features
}
/// Returns `+` when `enabled` is `true` and `-` otherwise.
fn sign(enabled: bool) -> &'static str {
if enabled {
"+"
} else {
"-"
}
}
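To make the shape of the `features:` and `simd(...)` lines written by `generate_long` concrete, here is a small standalone sketch. It copies the behavior of `sign` and joins a list of flags. The `feature_line` helper and its `(name, enabled)` input are assumptions made for this example and are not part of the file above.

```rust
/// Returns `+` when `enabled` is `true` and `-` otherwise.
fn sign(enabled: bool) -> &'static str {
    if enabled {
        "+"
    } else {
        "-"
    }
}

/// Formats `(name, enabled)` pairs as one comma separated line prefixed by
/// `label` and a colon.
///
/// `feature_line` is a hypothetical helper, not a ripgrep function.
fn feature_line(label: &str, features: &[(&str, bool)]) -> String {
    let parts: Vec<String> = features
        .iter()
        .map(|&(name, enabled)| format!("{}{}", sign(enabled), name))
        .collect();
    format!("{}:{}", label, parts.join(","))
}
```

A leading `+` means the feature is present and a leading `-` means it is absent, so a reader can compare the compile time and runtime lines at a glance.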

1471
crates/core/flags/hiargs.rs Normal file

File diff suppressed because it is too large


@ -0,0 +1,758 @@
/*!
Provides the definition of low level arguments from CLI flags.
*/
use std::{
ffi::{OsStr, OsString},
path::PathBuf,
};
use {
bstr::{BString, ByteVec},
grep::printer::{HyperlinkFormat, UserColorSpec},
};
/// A collection of "low level" arguments.
///
/// The "low level" here is meant to constrain this type to be as close to the
/// actual CLI flags and arguments as possible. Namely, other than some
/// convenience types to help validate flag values and deal with overrides
/// between flags, these low level arguments do not contain any higher level
/// abstractions.
///
/// Another self-imposed constraint is that populating low level arguments
/// should not require anything other than validating what the user has
/// provided. For example, low level arguments should not contain a
/// `HyperlinkConfig`, since in order to get a full configuration, one needs to
/// discover the hostname of the current system (which might require running a
/// binary or a syscall).
///
/// Low level arguments are populated by the parser directly via the `update`
/// method on the corresponding implementation of the `Flag` trait.
#[derive(Debug, Default)]
pub(crate) struct LowArgs {
// Essential arguments.
pub(crate) special: Option<SpecialMode>,
pub(crate) mode: Mode,
pub(crate) positional: Vec<OsString>,
pub(crate) patterns: Vec<PatternSource>,
// Everything else, sorted lexicographically.
pub(crate) binary: BinaryMode,
pub(crate) boundary: Option<BoundaryMode>,
pub(crate) buffer: BufferMode,
pub(crate) byte_offset: bool,
pub(crate) case: CaseMode,
pub(crate) color: ColorChoice,
pub(crate) colors: Vec<UserColorSpec>,
pub(crate) column: Option<bool>,
pub(crate) context: ContextMode,
pub(crate) context_separator: ContextSeparator,
pub(crate) crlf: bool,
pub(crate) dfa_size_limit: Option<usize>,
pub(crate) encoding: EncodingMode,
pub(crate) engine: EngineChoice,
pub(crate) field_context_separator: FieldContextSeparator,
pub(crate) field_match_separator: FieldMatchSeparator,
pub(crate) fixed_strings: bool,
pub(crate) follow: bool,
pub(crate) glob_case_insensitive: bool,
pub(crate) globs: Vec<String>,
pub(crate) heading: Option<bool>,
pub(crate) hidden: bool,
pub(crate) hostname_bin: Option<PathBuf>,
pub(crate) hyperlink_format: HyperlinkFormat,
pub(crate) iglobs: Vec<String>,
pub(crate) ignore_file: Vec<PathBuf>,
pub(crate) ignore_file_case_insensitive: bool,
pub(crate) include_zero: bool,
pub(crate) invert_match: bool,
pub(crate) line_number: Option<bool>,
pub(crate) logging: Option<LoggingMode>,
pub(crate) max_columns: Option<u64>,
pub(crate) max_columns_preview: bool,
pub(crate) max_count: Option<u64>,
pub(crate) max_depth: Option<usize>,
pub(crate) max_filesize: Option<u64>,
pub(crate) mmap: MmapMode,
pub(crate) multiline: bool,
pub(crate) multiline_dotall: bool,
pub(crate) no_config: bool,
pub(crate) no_ignore_dot: bool,
pub(crate) no_ignore_exclude: bool,
pub(crate) no_ignore_files: bool,
pub(crate) no_ignore_global: bool,
pub(crate) no_ignore_messages: bool,
pub(crate) no_ignore_parent: bool,
pub(crate) no_ignore_vcs: bool,
pub(crate) no_messages: bool,
pub(crate) no_require_git: bool,
pub(crate) no_unicode: bool,
pub(crate) null: bool,
pub(crate) null_data: bool,
pub(crate) one_file_system: bool,
pub(crate) only_matching: bool,
pub(crate) path_separator: Option<u8>,
pub(crate) pre: Option<PathBuf>,
pub(crate) pre_glob: Vec<String>,
pub(crate) quiet: bool,
pub(crate) regex_size_limit: Option<usize>,
pub(crate) replace: Option<BString>,
pub(crate) search_zip: bool,
pub(crate) sort: Option<SortMode>,
pub(crate) stats: bool,
pub(crate) stop_on_nonmatch: bool,
pub(crate) threads: Option<usize>,
pub(crate) trim: bool,
pub(crate) type_changes: Vec<TypeChange>,
pub(crate) unrestricted: usize,
pub(crate) vimgrep: bool,
pub(crate) with_filename: Option<bool>,
}
/// A "special" mode that supersedes everything else.
///
/// When one of these modes is present, it overrides everything else and causes
/// ripgrep to short-circuit. In particular, we avoid converting low-level
/// argument types into higher level argument types that can fail for various
/// reasons related to the environment. (Parsing the low-level arguments can
/// fail too, but usually not in a way that can't be worked around by removing
/// the corresponding arguments from the CLI command.) This is overall a hedge
/// to ensure that version and help information are basically always available.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub(crate) enum SpecialMode {
/// Show a condensed version of "help" output. Generally speaking, this
/// shows each flag and an extremely terse description of that flag on
/// a single line. This corresponds to the `-h` flag.
HelpShort,
/// Shows a very verbose version of the "help" output. The docs for some
/// flags will be paragraphs long. This corresponds to the `--help` flag.
HelpLong,
/// Show condensed version information. e.g., `ripgrep x.y.z`.
VersionShort,
/// Show verbose version information. Includes "short" information as well
/// as features included in the build.
VersionLong,
/// Show PCRE2's version information, or an error if this version of
/// ripgrep wasn't compiled with PCRE2 support.
VersionPCRE2,
}
/// The overall mode that ripgrep should operate in.
///
/// If ripgrep were designed without the legacy of grep, these would probably
/// be sub-commands? Perhaps not, since they aren't as frequently used.
///
/// The point of putting these in one enum is that they are all mutually
/// exclusive and override one another.
///
/// Note that -h/--help and -V/--version are not included in this because
/// they always override everything else, regardless of where they appear
/// in the command line. They are treated as "special" modes that short-circuit
/// ripgrep's usual flow.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub(crate) enum Mode {
/// ripgrep will execute a search of some kind.
Search(SearchMode),
/// Show the files that *would* be searched, but don't actually search
/// them.
Files,
/// List all file type definitions configured, including the default file
/// types and any additional file types added to the command line.
Types,
/// Generate various things like the man page and completion files.
Generate(GenerateMode),
}
impl Default for Mode {
fn default() -> Mode {
Mode::Search(SearchMode::Standard)
}
}
impl Mode {
/// Update this mode to the new mode while implementing various override
/// semantics. For example, a search mode cannot override a non-search
/// mode.
pub(crate) fn update(&mut self, new: Mode) {
match *self {
// If we're in a search mode, then anything can override it.
Mode::Search(_) => *self = new,
_ => {
// Once we're in a non-search mode, other non-search modes
// can override it. But search modes cannot. So for example,
// `--files -l` will still be Mode::Files.
if !matches!(new, Mode::Search(_)) {
*self = new;
}
}
}
}
}
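The override rule described in the comments of `Mode::update` can be exercised in isolation. The sketch below uses a simplified stand-in enum, `SketchMode`, with the payloads of the real variants removed; both the enum and its variant set are assumptions for illustration.

```rust
/// A simplified stand-in for `Mode` (hypothetical, payloads removed).
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
enum SketchMode {
    Search,
    Files,
    Types,
}

impl SketchMode {
    /// Applies the documented rule: anything can replace a search mode, but
    /// only another non-search mode can replace a non-search mode.
    fn update(&mut self, new: SketchMode) {
        match *self {
            SketchMode::Search => *self = new,
            _ => {
                // A later search mode must not clobber `Files` or `Types`.
                if !matches!(new, SketchMode::Search) {
                    *self = new;
                }
            }
        }
    }
}
```

With this rule, a sequence equivalent to `--files -l` ends in the files mode, since the search mode requested by `-l` arrives after a non-search mode was already selected.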
/// The kind of search that ripgrep is going to perform.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub(crate) enum SearchMode {
/// The default standard mode of operation. ripgrep looks for matches and
/// prints them when found.
///
/// There is no specific flag for this mode since it's the default. But
/// some of the modes below, like JSON, have negation flags like --no-json
/// that let you revert back to this default mode.
Standard,
/// Show files containing at least one match.
FilesWithMatches,
/// Show files that don't contain any matches.
FilesWithoutMatch,
/// Show files containing at least one match and the number of matching
/// lines.
Count,
/// Show files containing at least one match and the total number of
/// matches.
CountMatches,
/// Print matches in a JSON lines format.
JSON,
}
/// The thing to generate via the --generate flag.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub(crate) enum GenerateMode {
/// Generate the raw roff used for the man page.
Man,
/// Completions for bash.
CompleteBash,
/// Completions for zsh.
CompleteZsh,
/// Completions for fish.
CompleteFish,
/// Completions for PowerShell.
CompletePowerShell,
}
/// Indicates how ripgrep should treat binary data.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum BinaryMode {
/// Automatically determine the binary mode to use. Essentially, when
/// a file is searched explicitly, then it will be searched using the
/// `SearchAndSuppress` strategy. Otherwise, it will be searched in a way
/// that attempts to skip binary files as much as possible. That is, once
/// a file is classified as binary, searching will immediately stop.
Auto,
/// Search files even when they have binary data, but if a match is found,
/// suppress it and emit a warning.
///
/// In this mode, `NUL` bytes are replaced with line terminators. This is
/// a heuristic meant to reduce heap memory usage, since true binary data
/// isn't line oriented. If one attempts to treat such data as line
/// oriented, then one may wind up with impractically large lines. For
/// example, many binary files contain very long runs of NUL bytes.
SearchAndSuppress,
/// Treat all files as if they were plain text. There's no skipping and no
/// replacement of `NUL` bytes with line terminators.
AsText,
}
impl Default for BinaryMode {
fn default() -> BinaryMode {
BinaryMode::Auto
}
}
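The memory argument behind replacing `NUL` bytes with line terminators can be seen on a tiny buffer. This is only a sketch under the assumption that the data is fully in memory; `longest_line` and `nul_to_line_terminator` are hypothetical helpers, and the real searcher works on its own buffers.

```rust
/// Returns the length of the longest `\n` delimited line in `data`.
fn longest_line(data: &[u8]) -> usize {
    data.split(|&b| b == b'\n')
        .map(|line| line.len())
        .max()
        .unwrap_or(0)
}

/// Returns a copy of `data` with every NUL byte replaced by `\n`.
///
/// Hypothetical helper: it only illustrates the heuristic described for
/// `SearchAndSuppress`.
fn nul_to_line_terminator(data: &[u8]) -> Vec<u8> {
    data.iter().map(|&b| if b == 0 { b'\n' } else { b }).collect()
}
```

A run of NUL bytes with no newline forms a single long "line" before the replacement and many short ones after it, which bounds how much must be held in memory for any one line.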
/// Indicates what kind of boundary mode to use (line or word).
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum BoundaryMode {
/// Only allow matches when surrounded by line boundaries.
Line,
/// Only allow matches when surrounded by word boundaries.
Word,
}
/// Indicates the buffer mode that ripgrep should use when printing output.
///
/// The default is `Auto`.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum BufferMode {
/// Select the buffer mode, 'line' or 'block', automatically based on
/// whether stdout is connected to a tty.
Auto,
/// Flush the output buffer whenever a line terminator is seen.
///
/// This is useful when one wants to see search results more immediately,
/// for example, with `tail -f`.
Line,
/// Flush the output buffer whenever it reaches some fixed size. The size
/// is usually big enough to hold many lines.
///
/// This is useful for maximum performance, particularly when printing
/// lots of results.
Block,
}
impl Default for BufferMode {
fn default() -> BufferMode {
BufferMode::Auto
}
}
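How `Auto` might be resolved into a concrete buffering strategy is sketched below. The enum `SketchBufferMode` and the function `resolve_buffer_mode` are hypothetical, and the mapping (line buffering for a tty, block buffering otherwise) is the commonly documented default rather than something shown in this file.

```rust
/// A stand-in for `BufferMode` that is `Copy` for convenience (hypothetical).
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
enum SketchBufferMode {
    Auto,
    Line,
    Block,
}

/// Resolves `Auto` to a concrete mode; explicit choices are kept as-is.
///
/// Assumption: a tty gets line buffering so results show up immediately,
/// while pipes and files get block buffering for throughput.
fn resolve_buffer_mode(
    mode: SketchBufferMode,
    stdout_is_tty: bool,
) -> SketchBufferMode {
    match mode {
        SketchBufferMode::Auto if stdout_is_tty => SketchBufferMode::Line,
        SketchBufferMode::Auto => SketchBufferMode::Block,
        explicit => explicit,
    }
}
```

In a real program, the `stdout_is_tty` argument could come from `std::io::IsTerminal::is_terminal` called on `std::io::stdout()`.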
/// Indicates the case mode for how to interpret all patterns given to ripgrep.
///
/// The default is `Sensitive`.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum CaseMode {
/// Patterns are matched case sensitively. i.e., `a` does not match `A`.
Sensitive,
/// Patterns are matched case insensitively. i.e., `a` does match `A`.
Insensitive,
/// Patterns are automatically matched case insensitively only when they
/// consist of all lowercase literal characters. For example, the pattern
/// `a` will match `A` but `A` will not match `a`.
Smart,
}
impl Default for CaseMode {
fn default() -> CaseMode {
CaseMode::Sensitive
}
}
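The `Smart` rule can be illustrated with plain substring search. This sketch deliberately ignores regex syntax (for example, it would treat the `S` in `\S` as an uppercase literal), so `smart_case_contains` is a hypothetical simplification and not how ripgrep analyzes patterns.

```rust
/// Returns true if `haystack` contains `needle` under a simplified smart
/// case rule: an all-lowercase needle matches case insensitively, while any
/// uppercase character makes the match case sensitive.
///
/// Hypothetical helper that treats `needle` as a literal, not a regex.
fn smart_case_contains(haystack: &str, needle: &str) -> bool {
    if needle.chars().any(char::is_uppercase) {
        haystack.contains(needle)
    } else {
        haystack.to_lowercase().contains(needle)
    }
}
```

The asymmetry in the doc comment falls out directly: the pattern `a` finds `A`, but the pattern `A` does not find `a`.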
/// Indicates whether ripgrep should include color/hyperlinks in its output.
///
/// The default is `Auto`.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum ColorChoice {
/// Color and hyperlinks will never be used.
Never,
/// Color and hyperlinks will be used only when stdout is connected to a
/// tty.
Auto,
/// Color will always be used.
Always,
/// Color will always be used and only ANSI escapes will be used.
///
/// This only makes sense in the context of legacy Windows console APIs.
/// At time of writing, ripgrep will try to use the legacy console APIs
/// if ANSI coloring isn't believed to be possible. This option will force
/// ripgrep to use ANSI coloring.
Ansi,
}
impl Default for ColorChoice {
fn default() -> ColorChoice {
ColorChoice::Auto
}
}
impl ColorChoice {
/// Convert this color choice to the corresponding termcolor type.
pub(crate) fn to_termcolor(&self) -> termcolor::ColorChoice {
match *self {
ColorChoice::Never => termcolor::ColorChoice::Never,
ColorChoice::Auto => termcolor::ColorChoice::Auto,
ColorChoice::Always => termcolor::ColorChoice::Always,
ColorChoice::Ansi => termcolor::ColorChoice::AlwaysAnsi,
}
}
}
/// Indicates the line context options ripgrep should use for output.
///
/// The default is no context at all.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum ContextMode {
/// All lines will be printed. That is, the context is unbounded.
Passthru,
/// Only show a certain number of lines before and after each match.
Limited(ContextModeLimited),
}
impl Default for ContextMode {
fn default() -> ContextMode {
ContextMode::Limited(ContextModeLimited::default())
}
}
impl ContextMode {
/// Set the "before" context.
///
/// If this was set to "passthru" context, then it is overridden in favor
/// of limited context with the given value for "before" and `0` for
/// "after."
pub(crate) fn set_before(&mut self, lines: usize) {
match *self {
ContextMode::Passthru => {
*self = ContextMode::Limited(ContextModeLimited {
before: Some(lines),
after: None,
both: None,
})
}
ContextMode::Limited(ContextModeLimited {
ref mut before,
..
}) => *before = Some(lines),
}
}
/// Set the "after" context.
///
/// If this was set to "passthru" context, then it is overridden in favor
/// of limited context with the given value for "after" and `0` for
/// "before."
pub(crate) fn set_after(&mut self, lines: usize) {
match *self {
ContextMode::Passthru => {
*self = ContextMode::Limited(ContextModeLimited {
before: None,
after: Some(lines),
both: None,
})
}
ContextMode::Limited(ContextModeLimited {
ref mut after, ..
}) => *after = Some(lines),
}
}
/// Set the "both" context.
///
/// If this was set to "passthru" context, then it is overridden in favor
/// of limited context with the given value for "both" and `None` for
/// "before" and "after".
pub(crate) fn set_both(&mut self, lines: usize) {
match *self {
ContextMode::Passthru => {
*self = ContextMode::Limited(ContextModeLimited {
before: None,
after: None,
both: Some(lines),
})
}
ContextMode::Limited(ContextModeLimited {
ref mut both, ..
}) => *both = Some(lines),
}
}
/// A convenience function for use in tests that returns the limited
/// context. If this mode isn't limited, then it panics.
#[cfg(test)]
pub(crate) fn get_limited(&self) -> (usize, usize) {
match *self {
ContextMode::Passthru => unreachable!("context mode is passthru"),
ContextMode::Limited(ref limited) => limited.get(),
}
}
}
/// A context mode for a finite number of lines.
///
/// Namely, this indicates that a specific number of lines (possibly zero)
/// should be shown before and/or after each matching line.
///
/// Note that there is a subtle difference between `Some(0)` and `None`. The
/// former occurs when `0` is given explicitly, whereas `None` is the default
/// value and occurs when no value is specified.
///
/// `both` is only set by the -C/--context flag. The reason why we don't just
/// set before = after = --context is because the before and after context
/// settings always take precedence over the -C/--context setting, regardless of
/// order. Thus, we need to keep track of them separately.
#[derive(Debug, Default, Eq, PartialEq)]
pub(crate) struct ContextModeLimited {
before: Option<usize>,
after: Option<usize>,
both: Option<usize>,
}
impl ContextModeLimited {
/// Returns the specific number of contextual lines that should be shown
/// around each match. This takes proper precedence into account, i.e.,
/// that `before` and `after` both partially override `both` in all cases.
///
/// By default, this returns `(0, 0)`.
pub(crate) fn get(&self) -> (usize, usize) {
let (mut before, mut after) =
self.both.map(|lines| (lines, lines)).unwrap_or((0, 0));
// --before and --after always override --context, regardless
// of where they appear relative to each other.
if let Some(lines) = self.before {
before = lines;
}
if let Some(lines) = self.after {
after = lines;
}
(before, after)
}
}
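The precedence between `both`, `before` and `after` is easiest to see with concrete values. The sketch below is a standalone copy of the same logic under a hypothetical name, `SketchContext`, so that it can be run without the rest of the crate.

```rust
/// A standalone copy of the three optional context settings (hypothetical
/// name; the fields mirror `ContextModeLimited`).
#[derive(Debug, Default)]
struct SketchContext {
    before: Option<usize>,
    after: Option<usize>,
    both: Option<usize>,
}

impl SketchContext {
    /// Returns `(before, after)`, letting the specific settings override
    /// `both` regardless of the order in which they were given.
    fn get(&self) -> (usize, usize) {
        let both = self.both.unwrap_or(0);
        (self.before.unwrap_or(both), self.after.unwrap_or(both))
    }
}
```

For example, settings equivalent to `-C5 -B1` and `-B1 -C5` both produce one line before and five lines after each match, and an explicit `-B0` still overrides `-C3` because `Some(0)` is distinct from `None`.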
/// Represents the separator to use between non-contiguous sections of
/// contextual lines.
///
/// The default is `--`.
#[derive(Clone, Debug, Eq, PartialEq)]
pub(crate) struct ContextSeparator(Option<BString>);
impl Default for ContextSeparator {
fn default() -> ContextSeparator {
ContextSeparator(Some(BString::from("--")))
}
}
impl ContextSeparator {
/// Create a new context separator from the user provided argument. This
/// handles unescaping.
pub(crate) fn new(os: &OsStr) -> anyhow::Result<ContextSeparator> {
let Some(string) = os.to_str() else {
anyhow::bail!(
"separator must be valid UTF-8 (use escape sequences \
to provide a separator that is not valid UTF-8)"
)
};
Ok(ContextSeparator(Some(Vec::unescape_bytes(string).into())))
}
/// Creates a new separator that instructs the printer to disable contextual
/// separators entirely.
pub(crate) fn disabled() -> ContextSeparator {
ContextSeparator(None)
}
/// Return the raw bytes of this separator.
///
/// If context separators were disabled, then this returns `None`.
///
/// Note that this may return a `Some` variant with zero bytes.
pub(crate) fn into_bytes(self) -> Option<Vec<u8>> {
self.0.map(|sep| sep.into())
}
}
/// The encoding mode the searcher will use.
///
/// The default is `Auto`.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum EncodingMode {
/// Use only BOM sniffing to auto-detect an encoding.
Auto,
/// Use an explicit encoding forcefully, but let BOM sniffing override it.
Some(grep::searcher::Encoding),
/// Use no explicit encoding and disable all BOM sniffing. This will
/// always result in searching the raw bytes, regardless of their
/// true encoding.
Disabled,
}
impl Default for EncodingMode {
fn default() -> EncodingMode {
EncodingMode::Auto
}
}
/// The regex engine to use.
///
/// The default is `Default`.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum EngineChoice {
/// Uses the default regex engine: Rust's `regex` crate.
///
/// (Well, technically it uses `regex-automata`, but `regex-automata` is
/// the implementation of the `regex` crate.)
Default,
/// Dynamically select the right engine to use.
///
/// This works by trying to use the default engine, and if the pattern does
/// not compile, it switches over to the PCRE2 engine if it's available.
Auto,
/// Uses the PCRE2 regex engine if it's available.
PCRE2,
}
impl Default for EngineChoice {
fn default() -> EngineChoice {
EngineChoice::Default
}
}
/// The field context separator to use between metadata for each contextual
/// line.
///
/// The default is `-`.
#[derive(Clone, Debug, Eq, PartialEq)]
pub(crate) struct FieldContextSeparator(BString);
impl Default for FieldContextSeparator {
fn default() -> FieldContextSeparator {
FieldContextSeparator(BString::from("-"))
}
}
impl FieldContextSeparator {
/// Create a new separator from the given argument value provided by the
/// user. Unescaping is automatically handled.
pub(crate) fn new(os: &OsStr) -> anyhow::Result<FieldContextSeparator> {
let Some(string) = os.to_str() else {
anyhow::bail!(
"separator must be valid UTF-8 (use escape sequences \
to provide a separator that is not valid UTF-8)"
)
};
Ok(FieldContextSeparator(Vec::unescape_bytes(string).into()))
}
/// Return the raw bytes of this separator.
///
/// Note that this may return an empty `Vec`.
pub(crate) fn into_bytes(self) -> Vec<u8> {
self.0.into()
}
}
/// The field match separator to use between metadata for each matching
/// line.
///
/// The default is `:`.
#[derive(Clone, Debug, Eq, PartialEq)]
pub(crate) struct FieldMatchSeparator(BString);
impl Default for FieldMatchSeparator {
fn default() -> FieldMatchSeparator {
FieldMatchSeparator(BString::from(":"))
}
}
impl FieldMatchSeparator {
/// Create a new separator from the given argument value provided by the
/// user. Unescaping is automatically handled.
pub(crate) fn new(os: &OsStr) -> anyhow::Result<FieldMatchSeparator> {
let Some(string) = os.to_str() else {
anyhow::bail!(
"separator must be valid UTF-8 (use escape sequences \
to provide a separator that is not valid UTF-8)"
)
};
Ok(FieldMatchSeparator(Vec::unescape_bytes(string).into()))
}
/// Return the raw bytes of this separator.
///
/// Note that this may return an empty `Vec`.
pub(crate) fn into_bytes(self) -> Vec<u8> {
self.0.into()
}
}
/// The type of logging to do. `Debug` emits some details while `Trace` emits
/// much more.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum LoggingMode {
Debug,
Trace,
}
/// Indicates when to use memory maps.
///
/// The default is `Auto`.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum MmapMode {
/// This instructs ripgrep to use heuristics for selecting when to and not
/// to use memory maps for searching.
Auto,
/// This instructs ripgrep to always try memory maps when possible. (Memory
/// maps are not possible to use in all circumstances, for example, for
/// virtual files.)
AlwaysTryMmap,
/// Never use memory maps under any circumstances. This includes even
/// when multi-line search is enabled where ripgrep will read the entire
/// contents of a file onto the heap before searching it.
Never,
}
impl Default for MmapMode {
fn default() -> MmapMode {
MmapMode::Auto
}
}
/// Represents a source of patterns that ripgrep should search for.
///
/// The reason to unify these is so that we can retain the order of `-f/--file`
/// and `-e/--regexp` flags relative to one another.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum PatternSource {
/// Comes from the `-e/--regexp` flag.
Regexp(String),
/// Comes from the `-f/--file` flag.
File(PathBuf),
}
/// The sort criteria, if present.
#[derive(Debug, Eq, PartialEq)]
pub(crate) struct SortMode {
/// Whether to reverse the sort criteria (i.e., descending order).
pub(crate) reverse: bool,
/// The actual sorting criteria.
pub(crate) kind: SortModeKind,
}
/// The criteria to use for sorting.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum SortModeKind {
/// Sort by path.
Path,
/// Sort by last modified time.
LastModified,
/// Sort by last accessed time.
LastAccessed,
/// Sort by creation time.
Created,
}
impl SortMode {
/// Checks whether the selected sort mode is supported. If it isn't, an
/// error (hopefully explaining why) is returned.
pub(crate) fn supported(&self) -> anyhow::Result<()> {
match self.kind {
SortModeKind::Path => Ok(()),
SortModeKind::LastModified => {
let md = std::env::current_exe()
.and_then(|p| p.metadata())
.and_then(|md| md.modified());
let Err(err) = md else { return Ok(()) };
anyhow::bail!(
"sorting by last modified isn't supported: {err}"
);
}
SortModeKind::LastAccessed => {
let md = std::env::current_exe()
.and_then(|p| p.metadata())
.and_then(|md| md.accessed());
let Err(err) = md else { return Ok(()) };
anyhow::bail!(
"sorting by last accessed isn't supported: {err}"
);
}
SortModeKind::Created => {
let md = std::env::current_exe()
.and_then(|p| p.metadata())
.and_then(|md| md.created());
let Err(err) = md else { return Ok(()) };
anyhow::bail!(
"sorting by creation time isn't supported: {err}"
);
}
}
}
}
/// A single instance of either a change or a selection of one of ripgrep's
/// file types.
#[derive(Debug, Eq, PartialEq)]
pub(crate) enum TypeChange {
/// Clear the given type from ripgrep.
Clear { name: String },
/// Add the given type definition (name and glob) to ripgrep.
Add { def: String },
/// Select the given type for filtering.
Select { name: String },
/// Select the given type for filtering but negate it.
Negate { name: String },
}
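Keeping type changes as an ordered list matters because a later change can undo an earlier one. The following sketch applies only the `Clear` and `Add` cases to an in-memory table. It assumes that a definition has the form `name:glob` and that changes are applied in command line order; `SketchTypeChange` and `apply_type_changes` are hypothetical names, not ripgrep APIs.

```rust
/// A reduced stand-in for `TypeChange` with only two variants (hypothetical).
#[derive(Debug)]
enum SketchTypeChange {
    Clear { name: String },
    Add { def: String },
}

/// Applies `changes` in order to a table mapping type names to globs.
///
/// Assumption: each `Add` definition looks like `name:glob`.
fn apply_type_changes(
    types: &mut std::collections::BTreeMap<String, Vec<String>>,
    changes: &[SketchTypeChange],
) -> Result<(), String> {
    for change in changes {
        match change {
            SketchTypeChange::Clear { name } => {
                types.remove(name);
            }
            SketchTypeChange::Add { def } => {
                let Some((name, glob)) = def.split_once(':') else {
                    return Err(format!("invalid type definition: {def}"));
                };
                types
                    .entry(name.to_string())
                    .or_default()
                    .push(glob.to_string());
            }
        }
    }
    Ok(())
}
```

Clearing a type and then adding a new definition leaves only the new glob, whereas the reverse order would leave nothing, which is why a list that preserves order is used rather than a set of independent flags.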

302
crates/core/flags/mod.rs Normal file

@ -0,0 +1,302 @@
/*!
Defines ripgrep's command line interface.
This module deals with everything involving ripgrep's flags and positional
arguments. This includes generating shell completions, `--help` output and even
ripgrep's man page. It's also responsible for parsing and validating every
flag (including reading ripgrep's config file), and manages the contact points
between these flags and ripgrep's cast of supporting libraries. For example,
once [`HiArgs`] has been created, it knows how to create a multi threaded
recursive directory traverser.
*/
use std::{
ffi::OsString,
fmt::Debug,
panic::{RefUnwindSafe, UnwindSafe},
};
pub(crate) use crate::flags::{
complete::{
bash::generate as generate_complete_bash,
fish::generate as generate_complete_fish,
powershell::generate as generate_complete_powershell,
zsh::generate as generate_complete_zsh,
},
doc::{
help::{
generate_long as generate_help_long,
generate_short as generate_help_short,
},
man::generate as generate_man_page,
version::{
generate_long as generate_version_long,
generate_pcre2 as generate_version_pcre2,
generate_short as generate_version_short,
},
},
hiargs::HiArgs,
lowargs::{GenerateMode, Mode, SearchMode, SpecialMode},
parse::{parse, ParseResult},
};
mod complete;
mod config;
mod defs;
mod doc;
mod hiargs;
mod lowargs;
mod parse;
/// A trait that encapsulates the definition of an optional flag for ripgrep.
///
/// This trait is meant to be used via dynamic dispatch. Namely, the `defs`
/// module provides a single global slice of `&dyn Flag` values corresponding
/// to all of the flags in ripgrep.
///
/// ripgrep's required positional arguments are handled by the parser and by
/// the conversion from low-level arguments to high level arguments. Namely,
/// all of ripgrep's positional arguments are treated as file paths, except
/// in certain circumstances where the first argument is treated as a regex
/// pattern.
///
/// Note that each implementation of this trait requires a long flag name,
/// but can also optionally have a short version and even a negation flag.
/// For example, the `-E/--encoding` flag accepts a value, but it also has a
/// `--no-encoding` negation flag for reverting back to "automatic" encoding
/// detection. All three of `-E`, `--encoding` and `--no-encoding` are provided
/// by a single implementation of this trait.
///
/// ripgrep only supports flags that are switches or flags that accept a single
/// value. Flags that accept multiple values are an unsupported aberration.
trait Flag: Debug + Send + Sync + UnwindSafe + RefUnwindSafe + 'static {
/// Returns true if this flag is a switch. When a flag is a switch, the
/// CLI parser will not look for a value after the flag is seen.
fn is_switch(&self) -> bool;
/// A short single byte name for this flag. This returns `None` by default,
/// which signifies that the flag has no short name.
///
/// The byte returned must be an ASCII codepoint that is a `.` or is
/// alpha-numeric.
fn name_short(&self) -> Option<u8> {
None
}
/// Returns the long name of this flag. All flags must have a "long" name.
///
/// The long name must be at least 2 bytes, and all of its bytes must be
/// ASCII codepoints that are either `-` or alpha-numeric.
fn name_long(&self) -> &'static str;
/// Returns a list of aliases for this flag.
///
/// The aliases must follow the same rules as `Flag::name_long`.
///
/// By default, an empty slice is returned.
fn aliases(&self) -> &'static [&'static str] {
&[]
}
/// Returns a negated name for this flag. The negation of a flag is
/// intended to have the opposite meaning of a flag or to otherwise turn
/// something "off" or revert it to its default behavior.
///
/// Negated flags are not listed in their own section in the `-h/--help`
/// output or man page. Instead, they are automatically mentioned at the
/// end of the documentation section of the flag they negated.
///
/// The negated name must follow the same rules as `Flag::name_long`.
///
/// By default, a flag has no negation and this returns `None`.
fn name_negated(&self) -> Option<&'static str> {
None
}
/// Returns the variable name describing the type of value this flag
/// accepts. This should always be set for non-switch flags and never set
/// for switch flags.
///
/// For example, the `--max-count` flag has its variable name set to `NUM`.
///
/// The convention is to capitalize variable names.
///
/// By default this returns `None`.
fn doc_variable(&self) -> Option<&'static str> {
None
}
/// Returns the category of this flag.
///
/// Every flag must have a single category. Categories are used to organize
/// flags in the generated documentation.
fn doc_category(&self) -> Category;
/// A (very) short documentation string describing what this flag does.
///
/// This may sacrifice "proper English" in order to be as terse as
/// possible. Generally, we try to ensure that `rg -h` doesn't have any
/// lines that exceed 79 columns.
fn doc_short(&self) -> &'static str;
/// A (possibly very) longer documentation string describing in full
/// detail what this flag does. This should be in mandoc/mdoc format.
fn doc_long(&self) -> &'static str;
/// If this is a non-switch flag that accepts a small set of specific
/// values, then this should list them.
///
/// This returns an empty slice by default.
fn doc_choices(&self) -> &'static [&'static str] {
&[]
}
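/// Returns the kind of value this flag accepts, for use when generating
/// shell completions.
///
/// By default this returns `CompletionType::Other`.
fn completion_type(&self) -> CompletionType {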
CompletionType::Other
}
/// Given the parsed value (which might just be a switch), this should
/// update the state in `args` based on the value given for this flag.
///
/// This may update state for other flags as appropriate.
///
/// The `-V/--version` and `-h/--help` flags are treated specially in the
/// parser and should do nothing here.
///
/// By convention, implementations should generally not try to "do"
/// anything other than validate the value given. For example, the
/// implementation for `--hostname-bin` should not try to resolve the
/// hostname to use by running the binary provided. That should be saved
/// for a later step. This convention is used to ensure that getting the
/// low-level arguments is as reliable and quick as possible. It also
/// ensures that "doing something" occurs a minimal number of times. For
/// example, by avoiding trying to find the hostname here, we can do it
/// once later no matter how many times `--hostname-bin` is provided.
///
/// Implementations should not include the flag name in the error message
/// returned. The flag name is included automatically by the parser.
fn update(
&self,
value: FlagValue,
args: &mut crate::flags::lowargs::LowArgs,
) -> anyhow::Result<()>;
}
/// The category that a flag belongs to.
///
/// Categories are used to organize flags into "logical" groups in the
/// generated documentation.
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq, PartialOrd, Ord)]
enum Category {
/// Flags related to how ripgrep reads its input. Its "input" generally
/// consists of the patterns it is trying to match and the haystacks it is
/// trying to search.
Input,
/// Flags related to the operation of the search itself. For example,
/// whether case insensitive matching is enabled.
Search,
/// Flags related to how ripgrep filters haystacks. For example, whether
/// to respect gitignore files or not.
Filter,
/// Flags related to how ripgrep shows its search results. For example,
/// whether to show line numbers or not.
Output,
/// Flags related to changing ripgrep's output at a more fundamental level.
/// For example, flags like `--count` suppress printing of individual
/// lines, and instead just print the total count of matches for each file
/// searched.
OutputModes,
/// Flags related to logging behavior such as emitting non-fatal error
/// messages or printing search statistics.
Logging,
/// Other behaviors not related to ripgrep's core functionality. For
/// example, printing the file type globbing rules, or printing the list
/// of files ripgrep would search without actually searching them.
OtherBehaviors,
}
impl Category {
/// Returns a string representation of this category.
///
/// This string is the name of the variable used in various templates for
/// generated documentation. This name can be used for interpolation.
fn as_str(&self) -> &'static str {
match *self {
Category::Input => "input",
Category::Search => "search",
Category::Filter => "filter",
Category::Output => "output",
Category::OutputModes => "output-modes",
Category::Logging => "logging",
Category::OtherBehaviors => "other-behaviors",
}
}
}
/// The kind of argument a flag accepts, to be used for shell completions.
#[derive(Clone, Copy, Debug)]
enum CompletionType {
/// No special category. is_switch() and doc_choices() may apply.
Other,
/// A path to a file.
Filename,
/// A command in $PATH.
Executable,
/// The name of a file type, as used by e.g. --type.
Filetype,
/// The name of an encoding_rs encoding, as used by --encoding.
Encoding,
}
/// Represents a value parsed from the command line.
///
/// This doesn't include the corresponding flag, but values come in one of
/// two forms: a switch (on or off) or an arbitrary value.
///
/// Note that the CLI doesn't directly support negated switches. For example,
/// you can't do anything like `-n=false` or any of that nonsense. Instead,
/// the CLI parser knows about which flag names are negations and which aren't
/// (courtesy of the `Flag` trait). If a flag given is known as a negation,
/// then a `FlagValue::Switch(false)` value is passed into `Flag::update`.
#[derive(Debug)]
enum FlagValue {
/// A flag that is either on or off.
Switch(bool),
/// A flag that comes with an arbitrary user value.
Value(OsString),
}
impl FlagValue {
/// Return the yes or no value of this switch.
///
/// If this flag value is not a switch, then this panics.
///
/// This is useful when writing the implementation of `Flag::update`.
/// Namely, callers usually know whether a switch or a value is expected.
/// If a flag is something different, then it indicates a bug, and thus a
/// panic is acceptable.
fn unwrap_switch(self) -> bool {
match self {
FlagValue::Switch(yes) => yes,
FlagValue::Value(_) => {
unreachable!("got flag value but expected switch")
}
}
}
/// Return the user provided value of this flag.
///
/// If this flag is a switch, then this panics.
///
/// This is useful when writing the implementation of `Flag::update`.
/// Namely, callers usually know whether a switch or a value is expected.
/// If a flag is something different, then it indicates a bug, and thus a
/// panic is acceptable.
fn unwrap_value(self) -> OsString {
match self {
FlagValue::Switch(_) => {
unreachable!("got switch but expected flag value")
}
FlagValue::Value(v) => v,
}
}
}
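
A minimal sketch of how a single implementation can serve a short name, a long name and a negation, as the trait documentation above describes for `-E/--encoding`. Everything here is a pared-down stand-in: the `Args` struct, the `String` error type and the reduced `Flag` trait are assumptions made for illustration, not ripgrep's actual `LowArgs`, `anyhow::Result` or full trait.

```rust
use std::ffi::OsString;

/// Simplified stand-in for the parsed value handed to `update`.
#[derive(Debug)]
enum FlagValue {
    Switch(bool),
    Value(OsString),
}

/// Hypothetical, pared-down argument state (not ripgrep's `LowArgs`).
#[derive(Debug, Default, PartialEq)]
struct Args {
    encoding: Option<String>,
}

/// Reduced version of the trait with only the methods this sketch needs.
trait Flag {
    fn is_switch(&self) -> bool;
    fn name_short(&self) -> Option<u8> {
        None
    }
    fn name_long(&self) -> &'static str;
    fn name_negated(&self) -> Option<&'static str> {
        None
    }
    fn update(&self, value: FlagValue, args: &mut Args) -> Result<(), String>;
}

/// One implementation covering `-E`, `--encoding` and `--no-encoding`.
struct Encoding;

impl Flag for Encoding {
    fn is_switch(&self) -> bool {
        false
    }
    fn name_short(&self) -> Option<u8> {
        Some(b'E')
    }
    fn name_long(&self) -> &'static str {
        "encoding"
    }
    fn name_negated(&self) -> Option<&'static str> {
        Some("no-encoding")
    }
    fn update(&self, value: FlagValue, args: &mut Args) -> Result<(), String> {
        match value {
            // The parser passes `Switch(false)` when the negated name is seen.
            FlagValue::Switch(false) => args.encoding = None,
            FlagValue::Switch(true) => {
                unreachable!("--encoding is not a switch")
            }
            // Only validate and record the value; do no other work here.
            FlagValue::Value(v) => {
                let label = v
                    .into_string()
                    .map_err(|_| String::from("value is not valid UTF-8"))?;
                args.encoding = Some(label);
            }
        }
        Ok(())
    }
}

fn main() {
    let flag = Encoding;
    let short = flag.name_short().map(char::from);
    println!(
        "names: {:?} --{} {:?} (switch: {})",
        short,
        flag.name_long(),
        flag.name_negated(),
        flag.is_switch()
    );

    let mut args = Args::default();
    flag.update(FlagValue::Value(OsString::from("utf-16")), &mut args).unwrap();
    println!("{:?}", args);
    flag.update(FlagValue::Switch(false), &mut args).unwrap();
    println!("{:?}", args);
}
```

The sketch keeps `update` to validation and state changes only, which mirrors the convention stated in the trait documentation.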

crates/core/flags/parse.rs Normal file

@ -0,0 +1,476 @@
/*!
Parses command line arguments into a structured and typed representation.
*/
use std::{borrow::Cow, collections::BTreeSet, ffi::OsString};
use anyhow::Context;
use crate::flags::{
defs::FLAGS,
hiargs::HiArgs,
lowargs::{LoggingMode, LowArgs, SpecialMode},
Flag, FlagValue,
};
/// The result of parsing CLI arguments.
///
/// This is basically an `anyhow::Result<T>`, but with one extra variant that is
/// inhabited whenever ripgrep should execute a "special" mode. That is, when a
/// user provides the `-h/--help` or `-V/--version` flags.
///
/// This special variant exists to allow CLI parsing to short circuit as
/// quickly as is reasonable. For example, it lets CLI parsing avoid reading
/// ripgrep's configuration and converting low level arguments into a higher
/// level representation.
#[derive(Debug)]
pub(crate) enum ParseResult<T> {
Special(SpecialMode),
Ok(T),
Err(anyhow::Error),
}
impl<T> ParseResult<T> {
/// If this result is `Ok`, then apply `then` to it. Otherwise, return this
/// result unchanged.
fn and_then<U>(
self,
mut then: impl FnMut(T) -> ParseResult<U>,
) -> ParseResult<U> {
match self {
ParseResult::Special(mode) => ParseResult::Special(mode),
ParseResult::Ok(t) => then(t),
ParseResult::Err(err) => ParseResult::Err(err),
}
}
}
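
A small self-contained sketch of the short-circuiting behavior of `and_then` on a three-variant result. The `Special` enum, the `String` error type and the `to_threads` conversion are hypothetical stand-ins chosen for illustration; they are not ripgrep's `SpecialMode`, `anyhow::Error` or `HiArgs::from_low_args`.

```rust
/// Hypothetical stand-in for ripgrep's `SpecialMode`.
#[derive(Debug, PartialEq)]
enum Special {
    Help,
    Version,
}

/// A result type with one extra variant for "special" modes.
#[derive(Debug, PartialEq)]
enum ParseResult<T> {
    Special(Special),
    Ok(T),
    Err(String),
}

impl<T> ParseResult<T> {
    /// Applies `then` only to `Ok`; the other variants pass through as-is.
    fn and_then<U>(
        self,
        mut then: impl FnMut(T) -> ParseResult<U>,
    ) -> ParseResult<U> {
        match self {
            ParseResult::Special(mode) => ParseResult::Special(mode),
            ParseResult::Ok(t) => then(t),
            ParseResult::Err(err) => ParseResult::Err(err),
        }
    }
}

/// Hypothetical "high level" conversion: parse a thread count.
fn to_threads(raw: &str) -> ParseResult<usize> {
    match raw.parse::<usize>() {
        Ok(n) => ParseResult::Ok(n),
        Err(err) => ParseResult::Err(format!("invalid thread count: {err}")),
    }
}

fn main() {
    // `Ok` is converted, special modes skip the conversion entirely.
    println!("{:?}", ParseResult::Ok("4").and_then(to_threads));
    println!("{:?}", ParseResult::Ok("four").and_then(to_threads));
    let help: ParseResult<&str> = ParseResult::Special(Special::Help);
    println!("{:?}", help.and_then(to_threads));
    let version: ParseResult<&str> = ParseResult::Special(Special::Version);
    println!("{:?}", version.and_then(to_threads));
}
```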
/// Parse CLI arguments and convert them to their high level representation.
pub(crate) fn parse() -> ParseResult<HiArgs> {
parse_low().and_then(|low| match HiArgs::from_low_args(low) {
Ok(hi) => ParseResult::Ok(hi),
Err(err) => ParseResult::Err(err),
})
}
/// Parse CLI arguments only into their low level representation.
///
/// This takes configuration into account. That is, it will try to read
/// `RIPGREP_CONFIG_PATH` and prepend any arguments found there to the
/// arguments passed to this process.
///
/// This will also set one-time global state flags, such as the log level and
/// whether messages should be printed.
fn parse_low() -> ParseResult<LowArgs> {
if let Err(err) = crate::logger::Logger::init() {
let err = anyhow::anyhow!("failed to initialize logger: {err}");
return ParseResult::Err(err);
}
let parser = Parser::new();
let mut low = LowArgs::default();
if let Err(err) = parser.parse(std::env::args_os().skip(1), &mut low) {
return ParseResult::Err(err);
}
// Even though we haven't parsed the config file yet (assuming it exists),
// we can still use the arguments given on the CLI to setup ripgrep's
// logging preferences. Even if the config file changes them in some way,
// it's really the best we can do. This way, for example, folks can pass
// `--trace` and see any messages logged during config file parsing.
set_log_levels(&low);
// Before we try to take configuration into account, we can bail early
// if a special mode was enabled. This is basically only for version and
// help output which shouldn't be impacted by extra configuration.
if let Some(special) = low.special.take() {
return ParseResult::Special(special);
}
// If the end user says no config, then respect it.
if low.no_config {
log::debug!("not reading config files because --no-config is present");
return ParseResult::Ok(low);
}
// Look for arguments from a config file. If we got nothing (whether the
// file is empty or RIPGREP_CONFIG_PATH wasn't set), then we don't need
// to re-parse.
let config_args = crate::flags::config::args();
if config_args.is_empty() {
log::debug!("no extra arguments found from configuration file");
return ParseResult::Ok(low);
}
// The final arguments are just the arguments from the CLI appended to
// the end of the config arguments.
let mut final_args = config_args;
final_args.extend(std::env::args_os().skip(1));
// Now do the CLI parsing dance again.
let mut low = LowArgs::default();
if let Err(err) = parser.parse(final_args.into_iter(), &mut low) {
return ParseResult::Err(err);
}
// Reset the message and logging levels, since they could have changed.
set_log_levels(&low);
ParseResult::Ok(low)
}
/// Sets global state flags that control logging based on low-level arguments.
fn set_log_levels(low: &LowArgs) {
crate::messages::set_messages(!low.no_messages);
crate::messages::set_ignore_messages(!low.no_ignore_messages);
match low.logging {
Some(LoggingMode::Trace) => {
log::set_max_level(log::LevelFilter::Trace)
}
Some(LoggingMode::Debug) => {
log::set_max_level(log::LevelFilter::Debug)
}
None => log::set_max_level(log::LevelFilter::Warn),
}
}
/// Parse the given sequence of CLI arguments into a low level typed set of
/// arguments.
///
/// This is exposed for testing that the correct low-level arguments are parsed
/// from a CLI. It just runs the parser once over the CLI arguments. It doesn't
/// setup logging or read from a config file.
///
/// This assumes the iterator given does *not* begin with the binary name.
#[cfg(test)]
pub(crate) fn parse_low_raw(
rawargs: impl IntoIterator<Item = impl Into<OsString>>,
) -> anyhow::Result<LowArgs> {
let mut args = LowArgs::default();
Parser::new().parse(rawargs, &mut args)?;
Ok(args)
}
/// Return the metadata for the flag of the given name.
pub(super) fn lookup(name: &str) -> Option<&'static dyn Flag> {
// N.B. Creating a new parser might look expensive, but it only builds
// the lookup trie exactly once. That is, we get a `&'static Parser` from
// `Parser::new()`.
match Parser::new().find_long(name) {
FlagLookup::Match(&FlagInfo { flag, .. }) => Some(flag),
_ => None,
}
}
/// A parser for turning a sequence of command line arguments into a more
/// strictly typed set of arguments.
#[derive(Debug)]
struct Parser {
/// A single map that contains all possible flag names. This includes
/// short and long names, aliases and negations. This maps those names to
/// indices into `info`.
map: FlagMap,
/// A map from IDs returned by the `map` to the corresponding flag
/// information.
info: Vec<FlagInfo>,
}
impl Parser {
/// Create a new parser.
///
/// This always creates the same parser and only does it once. Callers may
/// call this repeatedly, and the parser will only be built once.
fn new() -> &'static Parser {
use std::sync::OnceLock;
// Since a parser's state is immutable and completely determined by
// FLAGS, and since FLAGS is a constant, we can initialize it exactly
// once.
static P: OnceLock<Parser> = OnceLock::new();
P.get_or_init(|| {
let mut infos = vec![];
for &flag in FLAGS.iter() {
infos.push(FlagInfo {
flag,
name: Ok(flag.name_long()),
kind: FlagInfoKind::Standard,
});
for alias in flag.aliases() {
infos.push(FlagInfo {
flag,
name: Ok(alias),
kind: FlagInfoKind::Alias,
});
}
if let Some(byte) = flag.name_short() {
infos.push(FlagInfo {
flag,
name: Err(byte),
kind: FlagInfoKind::Standard,
});
}
if let Some(name) = flag.name_negated() {
infos.push(FlagInfo {
flag,
name: Ok(name),
kind: FlagInfoKind::Negated,
});
}
}
let map = FlagMap::new(&infos);
Parser { map, info: infos }
})
}
/// Parse the given CLI arguments into a low level representation.
///
/// The iterator given should *not* start with the binary name.
fn parse<I, O>(&self, rawargs: I, args: &mut LowArgs) -> anyhow::Result<()>
where
I: IntoIterator<Item = O>,
O: Into<OsString>,
{
let mut p = lexopt::Parser::from_args(rawargs);
while let Some(arg) = p.next().context("invalid CLI arguments")? {
let lookup = match arg {
lexopt::Arg::Value(value) => {
args.positional.push(value);
continue;
}
lexopt::Arg::Short(ch) if ch == 'h' => {
// Special case -h/--help since behavior is different
// based on whether short or long flag is given.
args.special = Some(SpecialMode::HelpShort);
continue;
}
lexopt::Arg::Short(ch) if ch == 'V' => {
// Special case -V/--version since behavior is different
// based on whether short or long flag is given.
args.special = Some(SpecialMode::VersionShort);
continue;
}
lexopt::Arg::Short(ch) => self.find_short(ch),
lexopt::Arg::Long(name) if name == "help" => {
// Special case -h/--help since behavior is different
// based on whether short or long flag is given.
args.special = Some(SpecialMode::HelpLong);
continue;
}
lexopt::Arg::Long(name) if name == "version" => {
// Special case -V/--version since behavior is different
// based on whether short or long flag is given.
args.special = Some(SpecialMode::VersionLong);
continue;
}
lexopt::Arg::Long(name) => self.find_long(name),
};
let mat = match lookup {
FlagLookup::Match(mat) => mat,
FlagLookup::UnrecognizedShort(name) => {
anyhow::bail!("unrecognized flag -{name}")
}
FlagLookup::UnrecognizedLong(name) => {
let mut msg = format!("unrecognized flag --{name}");
if let Some(suggest_msg) = suggest(&name) {
msg = format!("{msg}\n\n{suggest_msg}");
}
anyhow::bail!("{msg}")
}
};
let value = if matches!(mat.kind, FlagInfoKind::Negated) {
// Negated flags are always switches, even if the non-negated
// flag is not. For example, --context-separator accepts a
// value, but --no-context-separator does not.
FlagValue::Switch(false)
} else if mat.flag.is_switch() {
FlagValue::Switch(true)
} else {
FlagValue::Value(p.value().with_context(|| {
format!("missing value for flag {mat}")
})?)
};
mat.flag
.update(value, args)
.with_context(|| format!("error parsing flag {mat}"))?;
}
Ok(())
}
/// Look for a flag by its short name.
fn find_short(&self, ch: char) -> FlagLookup<'_> {
if !ch.is_ascii() {
return FlagLookup::UnrecognizedShort(ch);
}
let byte = u8::try_from(ch).unwrap();
let Some(index) = self.map.find(&[byte]) else {
return FlagLookup::UnrecognizedShort(ch);
};
FlagLookup::Match(&self.info[index])
}
/// Look for a flag by its long name.
///
/// This also works for aliases and negated names.
fn find_long(&self, name: &str) -> FlagLookup<'_> {
let Some(index) = self.map.find(name.as_bytes()) else {
return FlagLookup::UnrecognizedLong(name.to_string());
};
FlagLookup::Match(&self.info[index])
}
}
/// The result of looking up a flag name.
#[derive(Debug)]
enum FlagLookup<'a> {
/// Lookup found a match and the metadata for the flag is attached.
Match(&'a FlagInfo),
/// The given short name is unrecognized.
UnrecognizedShort(char),
/// The given long name is unrecognized.
UnrecognizedLong(String),
}
/// The info about a flag associated with a flag's ID in the flag map.
#[derive(Debug)]
struct FlagInfo {
/// The flag object and its associated metadata.
flag: &'static dyn Flag,
/// The actual name that is stored in the flag map. When this is a byte, it
/// corresponds to a short single character ASCII flag. The actual key
/// that's in the flag map is just the single byte.
name: Result<&'static str, u8>,
/// The type of flag that is stored for the corresponding name in the flag
/// map.
kind: FlagInfoKind,
}
/// The kind of flag that is being matched.
#[derive(Debug)]
enum FlagInfoKind {
/// A standard flag, e.g., --passthru.
Standard,
/// A negation of a standard flag, e.g., --no-multiline.
Negated,
/// An alias for a standard flag, e.g., --passthrough.
Alias,
}
impl std::fmt::Display for FlagInfo {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
match self.name {
Ok(long) => write!(f, "--{long}"),
Err(short) => write!(f, "-{short}", short = char::from(short)),
}
}
}
/// A map from flag names (short, long, negated and aliases) to their ID.
///
/// Once an ID is known, it can be used to look up a flag's metadata in the
/// parser's internal state.
#[derive(Debug)]
struct FlagMap {
map: std::collections::HashMap<Vec<u8>, usize>,
}
impl FlagMap {
/// Create a new map of flags for the given flag information.
///
/// The index of each flag info corresponds to its ID.
fn new(infos: &[FlagInfo]) -> FlagMap {
let mut map = std::collections::HashMap::with_capacity(infos.len());
for (i, info) in infos.iter().enumerate() {
match info.name {
Ok(name) => {
assert_eq!(None, map.insert(name.as_bytes().to_vec(), i));
}
Err(byte) => {
assert_eq!(None, map.insert(vec![byte], i));
}
}
}
FlagMap { map }
}
/// Look for a match of `name` in this map.
///
/// This only returns a match if `name` is exactly equal to one of the names
/// stored in the map.
fn find(&self, name: &[u8]) -> Option<usize> {
self.map.get(name).copied()
}
}
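
The build-once name table used by the parser can be sketched roughly as follows. This is an illustration under stated assumptions: the `TABLE` constant and its two entries are a hypothetical miniature of `FLAGS`, and the map stores the flag kind directly instead of an index into a separate `Vec<FlagInfo>`.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Standard,
    Negated,
    Alias,
}

/// Hypothetical flag table: (long name, short name, negated name, aliases).
const TABLE: &[(&str, Option<u8>, Option<&str>, &[&str])] = &[
    ("passthru", None, None, &["passthrough"]),
    ("line-number", Some(b'n'), Some("no-line-number"), &[]),
];

/// Returns a lazily built map from every accepted name to (index, kind).
///
/// The map is built on the first call only; later calls reuse it.
fn names() -> &'static HashMap<Vec<u8>, (usize, Kind)> {
    static MAP: OnceLock<HashMap<Vec<u8>, (usize, Kind)>> = OnceLock::new();
    MAP.get_or_init(|| {
        let mut map = HashMap::new();
        for (i, &(long, short, negated, aliases)) in TABLE.iter().enumerate() {
            // A duplicate name would be a bug in the table, so assert.
            let key = long.as_bytes().to_vec();
            assert!(map.insert(key, (i, Kind::Standard)).is_none());
            if let Some(byte) = short {
                assert!(map.insert(vec![byte], (i, Kind::Standard)).is_none());
            }
            if let Some(name) = negated {
                let key = name.as_bytes().to_vec();
                assert!(map.insert(key, (i, Kind::Negated)).is_none());
            }
            for alias in aliases {
                let key = alias.as_bytes().to_vec();
                assert!(map.insert(key, (i, Kind::Alias)).is_none());
            }
        }
        map
    })
}

/// Resolves any accepted name to the flag's long name and the kind matched.
fn lookup(name: &str) -> Option<(&'static str, Kind)> {
    names().get(name.as_bytes()).map(|&(i, kind)| (TABLE[i].0, kind))
}

fn main() {
    for name in ["n", "line-number", "no-line-number", "passthrough", "nope"] {
        println!("{name}: {:?}", lookup(name));
    }
}
```

Because lookups are exact matches on the whole name, a prefix such as `line` resolves to nothing, which is the behavior the `find` documentation describes.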
/// Possibly return a message suggesting flags similar in name to the one
/// given.
///
/// The one given should be a flag given by the user (without the leading
/// dashes) that was unrecognized. This attempts to find existing flags that
/// are similar to the one given.
fn suggest(unrecognized: &str) -> Option<String> {
let similars = find_similar_names(unrecognized);
if similars.is_empty() {
return None;
}
let list = similars
.into_iter()
.map(|name| format!("--{name}"))
.collect::<Vec<String>>()
.join(", ");
Some(format!("similar flags that are available: {list}"))
}
/// Return a sequence of names similar to the unrecognized name given.
fn find_similar_names(unrecognized: &str) -> Vec<&'static str> {
// The Jaccard similarity threshold at which we consider two flag names
// similar enough that it's worth suggesting it to the end user.
//
// This value was determined by some ad hoc experimentation. It might need
// further tweaking.
const THRESHOLD: f64 = 0.4;
let mut similar = vec![];
let bow_given = ngrams(unrecognized);
for &flag in FLAGS.iter() {
let name = flag.name_long();
let bow = ngrams(name);
if jaccard_index(&bow_given, &bow) >= THRESHOLD {
similar.push(name);
}
if let Some(name) = flag.name_negated() {
let bow = ngrams(name);
if jaccard_index(&bow_given, &bow) >= THRESHOLD {
similar.push(name);
}
}
for name in flag.aliases() {
let bow = ngrams(name);
if jaccard_index(&bow_given, &bow) >= THRESHOLD {
similar.push(name);
}
}
}
similar
}
/// A "bag of words" is a set of ngrams.
type BagOfWords<'a> = BTreeSet<Cow<'a, [u8]>>;
/// Returns the Jaccard index (a measure of similarity) between sets of ngrams.
fn jaccard_index(ngrams1: &BagOfWords<'_>, ngrams2: &BagOfWords<'_>) -> f64 {
let union = u32::try_from(ngrams1.union(ngrams2).count())
.expect("fewer than u32::MAX flags");
let intersection = u32::try_from(ngrams1.intersection(ngrams2).count())
.expect("fewer than u32::MAX flags");
f64::from(intersection) / f64::from(union)
}
/// Returns all 3-grams in the slice given.
///
/// If the slice doesn't contain a 3-gram, then one is artificially created by
/// padding it out with a character that will never appear in a flag name.
fn ngrams(flag_name: &str) -> BagOfWords<'_> {
// We only allow ASCII flag names, so we can just use bytes.
let slice = flag_name.as_bytes();
let seq: Vec<Cow<[u8]>> = match slice.len() {
0 => vec![Cow::Owned(b"!!!".to_vec())],
1 => vec![Cow::Owned(vec![slice[0], b'!', b'!'])],
2 => vec![Cow::Owned(vec![slice[0], slice[1], b'!'])],
_ => slice.windows(3).map(Cow::Borrowed).collect(),
};
BTreeSet::from_iter(seq)
}
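
As a worked illustration of the suggestion logic above, the following sketch computes the Jaccard index over 3-grams for a pair of names. It is a simplified variant, not the code from the diff: it uses owned `Vec<u8>` n-grams instead of `Cow`, plain `as f64` casts instead of checked conversions, and the helper names `trigrams` and `jaccard` are chosen here for illustration.

```rust
use std::collections::BTreeSet;

/// Returns the set of 3-grams of `name`, padding short names with `!`,
/// a byte that is assumed never to appear in a flag name.
fn trigrams(name: &str) -> BTreeSet<Vec<u8>> {
    let mut bytes = name.as_bytes().to_vec();
    while bytes.len() < 3 {
        bytes.push(b'!');
    }
    bytes.windows(3).map(|w| w.to_vec()).collect()
}

/// Returns |A ∩ B| / |A ∪ B| for two non-empty sets of 3-grams.
fn jaccard(a: &BTreeSet<Vec<u8>>, b: &BTreeSet<Vec<u8>>) -> f64 {
    let intersection = a.intersection(b).count() as f64;
    let union = a.union(b).count() as f64;
    intersection / union
}

fn main() {
    const THRESHOLD: f64 = 0.4;
    // "ignore-cas" has 8 distinct 3-grams, all of which also occur among
    // the 9 distinct 3-grams of "ignore-case", so the index is 8/9.
    for (given, known) in [("ignore-cas", "ignore-case"), ("xyz", "ignore-case")] {
        let score = jaccard(&trigrams(given), &trigrams(known));
        let verdict = if score >= THRESHOLD { "suggest" } else { "skip" };
        println!("{given} vs {known}: {score:.3} ({verdict})");
    }
}
```

Since `trigrams` always yields at least one element, the union is never empty and the division is well defined.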


@ -1,111 +1,111 @@
/*!
Defines a builder for haystacks.
A "haystack" represents something we want to search. It encapsulates the logic
for whether a haystack ought to be searched or not, separate from the standard
ignore rules and other filtering logic.
Effectively, a haystack wraps a directory entry and adds some light application
level logic around it.
*/
use std::path::Path;
use ignore::{self, DirEntry};
use log;
/// A configuration for describing how subjects should be built.
#[derive(Clone, Debug)]
struct Config {
strip_dot_prefix: bool,
}
impl Default for Config {
fn default() -> Config {
Config { strip_dot_prefix: false }
}
}
/// A builder for constructing things to search over.
#[derive(Clone, Debug)]
pub struct SubjectBuilder {
config: Config,
pub(crate) struct HaystackBuilder {
strip_dot_prefix: bool,
}
impl SubjectBuilder {
/// Return a new subject builder with a default configuration.
pub fn new() -> SubjectBuilder {
SubjectBuilder { config: Config::default() }
impl HaystackBuilder {
/// Return a new haystack builder with a default configuration.
pub(crate) fn new() -> HaystackBuilder {
HaystackBuilder { strip_dot_prefix: false }
}
/// Create a new subject from a possibly missing directory entry.
/// Create a new haystack from a possibly missing directory entry.
///
/// If the directory entry isn't present, then the corresponding error is
/// logged if messages have been configured. Otherwise, if the subject is
/// deemed searchable, then it is returned.
pub fn build_from_result(
/// logged if messages have been configured. Otherwise, if the directory
/// entry is deemed searchable, then it is returned as a haystack.
pub(crate) fn build_from_result(
&self,
result: Result<DirEntry, ignore::Error>,
) -> Option<Subject> {
result: Result<ignore::DirEntry, ignore::Error>,
) -> Option<Haystack> {
match result {
Ok(dent) => self.build(dent),
Err(err) => {
err_message!("{}", err);
err_message!("{err}");
None
}
}
}
/// Create a new subject using this builder's configuration.
/// Create a new haystack using this builder's configuration.
///
/// If a subject could not be created or should otherwise not be searched,
/// then this returns `None` after emitting any relevant log messages.
pub fn build(&self, dent: DirEntry) -> Option<Subject> {
let subj =
Subject { dent, strip_dot_prefix: self.config.strip_dot_prefix };
if let Some(ignore_err) = subj.dent.error() {
ignore_message!("{}", ignore_err);
/// If a directory entry could not be created or should otherwise not be
/// searched, then this returns `None` after emitting any relevant log
/// messages.
fn build(&self, dent: ignore::DirEntry) -> Option<Haystack> {
let hay = Haystack { dent, strip_dot_prefix: self.strip_dot_prefix };
if let Some(err) = hay.dent.error() {
ignore_message!("{err}");
}
// If this entry was explicitly provided by an end user, then we always
// want to search it.
if subj.is_explicit() {
return Some(subj);
if hay.is_explicit() {
return Some(hay);
}
// At this point, we only want to search something if it's explicitly a
// file. This omits symlinks. (If ripgrep was configured to follow
// symlinks, then they have already been followed by the directory
// traversal.)
if subj.is_file() {
return Some(subj);
if hay.is_file() {
return Some(hay);
}
// We got nothin. Emit a debug message, but only if this isn't a
// We got nothing. Emit a debug message, but only if this isn't a
// directory. Otherwise, emitting messages for directories is just
// noisy.
if !subj.is_dir() {
if !hay.is_dir() {
log::debug!(
"ignoring {}: failed to pass subject filter: \
"ignoring {}: failed to pass haystack filter: \
file type: {:?}, metadata: {:?}",
subj.dent.path().display(),
subj.dent.file_type(),
subj.dent.metadata()
hay.dent.path().display(),
hay.dent.file_type(),
hay.dent.metadata()
);
}
None
}
/// When enabled, if the subject's file path starts with `./` then it is
/// When enabled, if the haystack's file path starts with `./` then it is
/// stripped.
///
/// This is useful when implicitly searching the current working directory.
pub fn strip_dot_prefix(&mut self, yes: bool) -> &mut SubjectBuilder {
self.config.strip_dot_prefix = yes;
pub(crate) fn strip_dot_prefix(
&mut self,
yes: bool,
) -> &mut HaystackBuilder {
self.strip_dot_prefix = yes;
self
}
}
/// A subject is a thing we want to search. Generally, a subject is either a
/// file or stdin.
/// A haystack is a thing we want to search.
///
/// Generally, a haystack is either a file or stdin.
#[derive(Clone, Debug)]
pub struct Subject {
dent: DirEntry,
pub(crate) struct Haystack {
dent: ignore::DirEntry,
strip_dot_prefix: bool,
}
impl Subject {
/// Return the file path corresponding to this subject.
impl Haystack {
/// Return the file path corresponding to this haystack.
///
/// If this subject corresponds to stdin, then a special `<stdin>` path
/// If this haystack corresponds to stdin, then a special `<stdin>` path
/// is returned instead.
pub fn path(&self) -> &Path {
pub(crate) fn path(&self) -> &Path {
if self.strip_dot_prefix && self.dent.path().starts_with("./") {
self.dent.path().strip_prefix("./").unwrap()
} else {
@ -114,21 +114,21 @@ impl Subject {
}
/// Returns true if and only if this entry corresponds to stdin.
pub fn is_stdin(&self) -> bool {
pub(crate) fn is_stdin(&self) -> bool {
self.dent.is_stdin()
}
/// Returns true if and only if this entry corresponds to a subject to
/// Returns true if and only if this entry corresponds to a haystack to
/// search that was explicitly supplied by an end user.
///
/// Generally, this corresponds to either stdin or an explicit file path
/// argument. e.g., in `rg foo some-file ./some-dir/`, `some-file` is
/// an explicit subject, but, e.g., `./some-dir/some-other-file` is not.
/// an explicit haystack, but, e.g., `./some-dir/some-other-file` is not.
///
/// However, note that ripgrep does not see through shell globbing. e.g.,
/// in `rg foo ./some-dir/*`, `./some-dir/some-other-file` will be treated
/// as an explicit subject.
pub fn is_explicit(&self) -> bool {
/// as an explicit haystack.
pub(crate) fn is_explicit(&self) -> bool {
// stdin is obvious. When an entry has a depth of 0, that means it
// was explicitly provided to our directory iterator, which means it
// was in turn explicitly provided by the end user. The !is_dir check
@ -138,7 +138,7 @@ impl Subject {
self.is_stdin() || (self.dent.depth() == 0 && !self.is_dir())
}
/// Returns true if and only if this subject points to a directory after
/// Returns true if and only if this haystack points to a directory after
/// following symbolic links.
fn is_dir(&self) -> bool {
let ft = match self.dent.file_type() {
@ -153,7 +153,7 @@ impl Subject {
self.dent.path_is_symlink() && self.dent.path().is_dir()
}
/// Returns true if and only if this subject points to a file.
/// Returns true if and only if this haystack points to a file.
fn is_file(&self) -> bool {
self.dent.file_type().map_or(false, |ft| ft.is_file())
}


@ -1,24 +1,28 @@
// This module defines a super simple logger that works with the `log` crate.
// We don't need anything fancy; just basic log levels and the ability to
// print to stderr. We therefore avoid bringing in extra dependencies just
// for this functionality.
/*!
Defines a super simple logger that works with the `log` crate.
use log::{self, Log};
We don't do anything fancy. We just need basic log levels and the ability to
print to stderr. We therefore avoid bringing in extra dependencies just for
this functionality.
*/
use log::Log;
/// The simplest possible logger that logs to stderr.
///
/// This logger does no filtering. Instead, it relies on the `log` crate's
/// filtering via its global max_level setting.
#[derive(Debug)]
pub struct Logger(());
pub(crate) struct Logger(());
/// A singleton used as the target for an implementation of the `Log` trait.
const LOGGER: &'static Logger = &Logger(());
impl Logger {
/// Create a new logger that logs to stderr and initialize it as the
/// global logger. If there was a problem setting the logger, then an
/// error is returned.
pub fn init() -> Result<(), log::SetLoggerError> {
pub(crate) fn init() -> Result<(), log::SetLoggerError> {
log::set_logger(LOGGER)
}
}
@ -33,7 +37,7 @@ impl Log for Logger {
fn log(&self, record: &log::Record<'_>) {
match (record.file(), record.line()) {
(Some(file), Some(line)) => {
eprintln!(
eprintln_locked!(
"{}|{}|{}:{}: {}",
record.level(),
record.target(),
@ -43,7 +47,7 @@ impl Log for Logger {
);
}
(Some(file), None) => {
eprintln!(
eprintln_locked!(
"{}|{}|{}: {}",
record.level(),
record.target(),
@ -52,7 +56,7 @@ impl Log for Logger {
);
}
_ => {
eprintln!(
eprintln_locked!(
"{}|{}: {}",
record.level(),
record.target(),
@ -63,6 +67,6 @@ impl Log for Logger {
}
fn flush(&self) {
// We use eprintln! which is flushed on every call.
// We use eprintln_locked! which is flushed on every call.
}
}


@ -1,24 +1,20 @@
use std::error;
use std::io::{self, Write};
use std::process;
use std::sync::Mutex;
use std::time::Instant;
/*!
The main entry point into ripgrep.
*/
use std::{io::Write, process::ExitCode};
use ignore::WalkState;
use args::Args;
use subject::Subject;
use crate::flags::{HiArgs, SearchMode};
#[macro_use]
mod messages;
mod app;
mod args;
mod config;
mod flags;
mod haystack;
mod logger;
mod path_printer;
mod search;
mod subject;
// Since Rust no longer uses jemalloc by default, ripgrep will, by default,
// use the system allocator. On Linux, this would normally be glibc's
@ -43,62 +39,96 @@ mod subject;
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
type Result<T> = ::std::result::Result<T, Box<dyn error::Error>>;
fn main() {
if let Err(err) = Args::parse().and_then(try_main) {
eprintln!("{}", err);
process::exit(2);
/// Then, as it was, then again it will be.
fn main() -> ExitCode {
match run(flags::parse()) {
Ok(code) => code,
Err(err) => {
// Look for a broken pipe error. In this case, we generally want
// to exit "gracefully" with a success exit code. This matches
// existing Unix convention. We need to handle this explicitly
// since the Rust runtime doesn't ask for PIPE signals, and thus
// we get an I/O error instead. Traditional C Unix applications
// quit by getting a PIPE signal that they don't handle, and thus
// the unhandled signal causes the process to unceremoniously
// terminate.
for cause in err.chain() {
if let Some(ioerr) = cause.downcast_ref::<std::io::Error>() {
if ioerr.kind() == std::io::ErrorKind::BrokenPipe {
return ExitCode::from(0);
}
}
}
eprintln_locked!("{:#}", err);
ExitCode::from(2)
}
}
}
fn try_main(args: Args) -> Result<()> {
use args::Command::*;
/// The main entry point for ripgrep.
///
/// The given parse result determines ripgrep's behavior. The parse
/// result should be the result of parsing CLI arguments in a low level
/// representation, and then followed by an attempt to convert them into a
/// higher level representation. The higher level representation has some nicer
/// abstractions, for example, instead of representing the `-g/--glob` flag
/// as a `Vec<String>` (as in the low level representation), the globs are
/// converted into a single matcher.
fn run(result: crate::flags::ParseResult<HiArgs>) -> anyhow::Result<ExitCode> {
use crate::flags::{Mode, ParseResult};
let matched = match args.command()? {
Search => search(&args),
SearchParallel => search_parallel(&args),
SearchNever => Ok(false),
Files => files(&args),
FilesParallel => files_parallel(&args),
Types => types(&args),
PCRE2Version => pcre2_version(&args),
}?;
if matched && (args.quiet() || !messages::errored()) {
process::exit(0)
let args = match result {
ParseResult::Err(err) => return Err(err),
ParseResult::Special(mode) => return special(mode),
ParseResult::Ok(args) => args,
};
let matched = match args.mode() {
Mode::Search(_) if !args.matches_possible() => false,
Mode::Search(mode) if args.threads() == 1 => search(&args, mode)?,
Mode::Search(mode) => search_parallel(&args, mode)?,
Mode::Files if args.threads() == 1 => files(&args)?,
Mode::Files => files_parallel(&args)?,
Mode::Types => return types(&args),
Mode::Generate(mode) => return generate(mode),
};
Ok(if matched && (args.quiet() || !messages::errored()) {
ExitCode::from(0)
} else if messages::errored() {
process::exit(2)
ExitCode::from(2)
} else {
process::exit(1)
}
ExitCode::from(1)
})
}
/// The top-level entry point for single-threaded search. This recursively
/// steps through the file list (current directory by default) and searches
/// each file sequentially.
fn search(args: &Args) -> Result<bool> {
let started_at = Instant::now();
let quit_after_match = args.quit_after_match()?;
let subject_builder = args.subject_builder();
let mut stats = args.stats()?;
let mut searcher = args.search_worker(args.stdout())?;
/// The top-level entry point for single-threaded search.
///
/// This recursively steps through the file list (current directory by default)
/// and searches each file sequentially.
fn search(args: &HiArgs, mode: SearchMode) -> anyhow::Result<bool> {
let started_at = std::time::Instant::now();
let haystack_builder = args.haystack_builder();
let unsorted = args
.walk_builder()?
.build()
.filter_map(|result| haystack_builder.build_from_result(result));
let haystacks = args.sort(unsorted);
let mut matched = false;
let mut searched = false;
for result in args.walker()? {
let subject = match subject_builder.build_from_result(result) {
Some(subject) => subject,
None => continue,
};
let mut stats = args.stats();
let mut searcher = args.search_worker(
args.matcher()?,
args.searcher()?,
args.printer(mode, args.stdout()),
)?;
for haystack in haystacks {
searched = true;
let search_result = match searcher.search(&subject) {
let search_result = match searcher.search(&haystack) {
Ok(search_result) => search_result,
// A broken pipe means graceful termination.
Err(err) if err.kind() == std::io::ErrorKind::BrokenPipe => break,
Err(err) => {
// A broken pipe means graceful termination.
if err.kind() == io::ErrorKind::BrokenPipe {
break;
}
err_message!("{}: {}", subject.path().display(), err);
err_message!("{}: {}", haystack.path().display(), err);
continue;
}
};
@ -106,66 +136,66 @@ fn search(args: &Args) -> Result<bool> {
if let Some(ref mut stats) = stats {
*stats += search_result.stats().unwrap();
}
if matched && quit_after_match {
if matched && args.quit_after_match() {
break;
}
}
if args.using_default_path() && !searched {
if args.has_implicit_path() && !searched {
eprint_nothing_searched();
}
if let Some(ref stats) = stats {
let elapsed = Instant::now().duration_since(started_at);
// We don't care if we couldn't print this successfully.
let _ = searcher.print_stats(elapsed, stats);
let wtr = searcher.printer().get_mut();
let _ = print_stats(mode, stats, started_at, wtr);
}
Ok(matched)
}
/// The top-level entry point for multi-threaded search. The parallelism is
/// itself achieved by the recursive directory traversal. All we need to do is
/// feed it a worker for performing a search on each file.
fn search_parallel(args: &Args) -> Result<bool> {
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::SeqCst;
/// The top-level entry point for multi-threaded search.
///
/// The parallelism is itself achieved by the recursive directory traversal.
/// All we need to do is feed it a worker for performing a search on each file.
///
/// Requesting a sorted output from ripgrep (such as with `--sort path`) will
/// automatically disable parallelism and hence sorting is not handled here.
fn search_parallel(args: &HiArgs, mode: SearchMode) -> anyhow::Result<bool> {
use std::sync::atomic::{AtomicBool, Ordering};
let quit_after_match = args.quit_after_match()?;
let started_at = Instant::now();
let subject_builder = args.subject_builder();
let bufwtr = args.buffer_writer()?;
let stats = args.stats()?.map(Mutex::new);
let started_at = std::time::Instant::now();
let haystack_builder = args.haystack_builder();
let bufwtr = args.buffer_writer();
let stats = args.stats().map(std::sync::Mutex::new);
let matched = AtomicBool::new(false);
let searched = AtomicBool::new(false);
let mut searcher_err = None;
args.walker_parallel()?.run(|| {
let mut searcher = args.search_worker(
args.matcher()?,
args.searcher()?,
args.printer(mode, bufwtr.buffer()),
)?;
args.walk_builder()?.build_parallel().run(|| {
let bufwtr = &bufwtr;
let stats = &stats;
let matched = &matched;
let searched = &searched;
let subject_builder = &subject_builder;
let mut searcher = match args.search_worker(bufwtr.buffer()) {
Ok(searcher) => searcher,
Err(err) => {
searcher_err = Some(err);
return Box::new(move |_| WalkState::Quit);
}
};
let haystack_builder = &haystack_builder;
let mut searcher = searcher.clone();
Box::new(move |result| {
let subject = match subject_builder.build_from_result(result) {
Some(subject) => subject,
let haystack = match haystack_builder.build_from_result(result) {
Some(haystack) => haystack,
None => return WalkState::Continue,
};
searched.store(true, SeqCst);
searched.store(true, Ordering::SeqCst);
searcher.printer().get_mut().clear();
let search_result = match searcher.search(&subject) {
let search_result = match searcher.search(&haystack) {
Ok(search_result) => search_result,
Err(err) => {
err_message!("{}: {}", subject.path().display(), err);
err_message!("{}: {}", haystack.path().display(), err);
return WalkState::Continue;
}
};
if search_result.has_match() {
matched.store(true, SeqCst);
matched.store(true, Ordering::SeqCst);
}
if let Some(ref locked_stats) = *stats {
let mut stats = locked_stats.lock().unwrap();
@ -173,63 +203,53 @@ fn search_parallel(args: &Args) -> Result<bool> {
}
if let Err(err) = bufwtr.print(searcher.printer().get_mut()) {
// A broken pipe means graceful termination.
if err.kind() == io::ErrorKind::BrokenPipe {
if err.kind() == std::io::ErrorKind::BrokenPipe {
return WalkState::Quit;
}
// Otherwise, we continue on our merry way.
err_message!("{}: {}", subject.path().display(), err);
err_message!("{}: {}", haystack.path().display(), err);
}
if matched.load(SeqCst) && quit_after_match {
if matched.load(Ordering::SeqCst) && args.quit_after_match() {
WalkState::Quit
} else {
WalkState::Continue
}
})
});
if let Some(err) = searcher_err.take() {
return Err(err);
}
if args.using_default_path() && !searched.load(SeqCst) {
if args.has_implicit_path() && !searched.load(Ordering::SeqCst) {
eprint_nothing_searched();
}
if let Some(ref locked_stats) = stats {
let elapsed = Instant::now().duration_since(started_at);
let stats = locked_stats.lock().unwrap();
let mut searcher = args.search_worker(args.stdout())?;
// We don't care if we couldn't print this successfully.
let _ = searcher.print_stats(elapsed, &stats);
let mut wtr = searcher.printer().get_mut();
let _ = print_stats(mode, &stats, started_at, &mut wtr);
let _ = bufwtr.print(&mut wtr);
}
Ok(matched.load(SeqCst))
Ok(matched.load(Ordering::SeqCst))
}
fn eprint_nothing_searched() {
err_message!(
"No files were searched, which means ripgrep probably \
applied a filter you didn't expect.\n\
Running with --debug will show why files are being skipped."
);
}
/// The top-level entry point for file listing without searching.
///
/// This recursively steps through the file list (current directory by default)
/// and prints each path sequentially using a single thread.
fn files(args: &HiArgs) -> anyhow::Result<bool> {
let haystack_builder = args.haystack_builder();
let unsorted = args
.walk_builder()?
.build()
.filter_map(|result| haystack_builder.build_from_result(result));
let haystacks = args.sort(unsorted);
/// The top-level entry point for listing files without searching them. This
/// recursively steps through the file list (current directory by default) and
/// prints each path sequentially using a single thread.
fn files(args: &Args) -> Result<bool> {
let quit_after_match = args.quit_after_match()?;
let subject_builder = args.subject_builder();
let mut matched = false;
let mut path_printer = args.path_printer(args.stdout())?;
for result in args.walker()? {
let subject = match subject_builder.build_from_result(result) {
Some(subject) => subject,
None => continue,
};
let mut path_printer = args.path_printer_builder().build(args.stdout());
for haystack in haystacks {
matched = true;
if quit_after_match {
if args.quit_after_match() {
break;
}
if let Err(err) = path_printer.write_path(subject.path()) {
if let Err(err) = path_printer.write(haystack.path()) {
// A broken pipe means graceful termination.
if err.kind() == io::ErrorKind::BrokenPipe {
if err.kind() == std::io::ErrorKind::BrokenPipe {
break;
}
// Otherwise, we have some other error that's preventing us from
@ -240,42 +260,53 @@ fn files(args: &Args) -> Result<bool> {
Ok(matched)
}
/// The top-level entry point for listing files without searching them. This
/// recursively steps through the file list (current directory by default) and
/// prints each path sequentially using multiple threads.
fn files_parallel(args: &Args) -> Result<bool> {
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::SeqCst;
use std::sync::mpsc;
use std::thread;
/// The top-level entry point for multi-threaded file listing without
/// searching.
///
/// This recursively steps through the file list (current directory by default)
/// and prints each path sequentially using multiple threads.
///
/// Requesting a sorted output from ripgrep (such as with `--sort path`) will
/// automatically disable parallelism and hence sorting is not handled here.
fn files_parallel(args: &HiArgs) -> anyhow::Result<bool> {
use std::{
sync::{
atomic::{AtomicBool, Ordering},
mpsc,
},
thread,
};
let quit_after_match = args.quit_after_match()?;
let subject_builder = args.subject_builder();
let mut path_printer = args.path_printer(args.stdout())?;
let haystack_builder = args.haystack_builder();
let mut path_printer = args.path_printer_builder().build(args.stdout());
let matched = AtomicBool::new(false);
let (tx, rx) = mpsc::channel::<Subject>();
let (tx, rx) = mpsc::channel::<crate::haystack::Haystack>();
let print_thread = thread::spawn(move || -> io::Result<()> {
for subject in rx.iter() {
path_printer.write_path(subject.path())?;
// We spawn a single printing thread to make sure we don't tear writes.
// We use a channel here under the presumption that it's probably faster
// than using a mutex in the worker threads below, but this has never been
// seriously litigated.
let print_thread = thread::spawn(move || -> std::io::Result<()> {
for haystack in rx.iter() {
path_printer.write(haystack.path())?;
}
Ok(())
});
args.walker_parallel()?.run(|| {
let subject_builder = &subject_builder;
args.walk_builder()?.build_parallel().run(|| {
let haystack_builder = &haystack_builder;
let matched = &matched;
let tx = tx.clone();
Box::new(move |result| {
let subject = match subject_builder.build_from_result(result) {
Some(subject) => subject,
let haystack = match haystack_builder.build_from_result(result) {
Some(haystack) => haystack,
None => return WalkState::Continue,
};
matched.store(true, SeqCst);
if quit_after_match {
matched.store(true, Ordering::SeqCst);
if args.quit_after_match() {
WalkState::Quit
} else {
match tx.send(subject) {
match tx.send(haystack) {
Ok(_) => WalkState::Continue,
Err(_) => WalkState::Quit,
}
@ -287,18 +318,18 @@ fn files_parallel(args: &Args) -> Result<bool> {
// A broken pipe means graceful termination, so fall through.
// Otherwise, something bad happened while writing to stdout, so bubble
// it up.
if err.kind() != io::ErrorKind::BrokenPipe {
if err.kind() != std::io::ErrorKind::BrokenPipe {
return Err(err.into());
}
}
Ok(matched.load(SeqCst))
Ok(matched.load(Ordering::SeqCst))
}
/// The top-level entry point for --type-list.
fn types(args: &Args) -> Result<bool> {
/// The top-level entry point for `--type-list`.
fn types(args: &HiArgs) -> anyhow::Result<ExitCode> {
let mut count = 0;
let mut stdout = args.stdout();
for def in args.type_defs()? {
for def in args.types().definitions() {
count += 1;
stdout.write_all(def.name().as_bytes())?;
stdout.write_all(b": ")?;
@ -313,32 +344,140 @@ fn types(args: &Args) -> Result<bool> {
}
stdout.write_all(b"\n")?;
}
Ok(count > 0)
Ok(ExitCode::from(if count == 0 { 1 } else { 0 }))
}
/// The top-level entry point for --pcre2-version.
fn pcre2_version(args: &Args) -> Result<bool> {
#[cfg(feature = "pcre2")]
fn imp(args: &Args) -> Result<bool> {
use grep::pcre2;
/// Implements ripgrep's "generate" modes.
///
/// These modes correspond to generating some kind of ancillary data related
/// to ripgrep. At present, this includes ripgrep's man page (in roff format)
/// and supported shell completions.
fn generate(mode: crate::flags::GenerateMode) -> anyhow::Result<ExitCode> {
use crate::flags::GenerateMode;
let mut stdout = args.stdout();
let (major, minor) = pcre2::version();
writeln!(stdout, "PCRE2 {}.{} is available", major, minor)?;
if cfg!(target_pointer_width = "64") && pcre2::is_jit_available() {
writeln!(stdout, "JIT is available")?;
let output = match mode {
GenerateMode::Man => flags::generate_man_page(),
GenerateMode::CompleteBash => flags::generate_complete_bash(),
GenerateMode::CompleteZsh => flags::generate_complete_zsh(),
GenerateMode::CompleteFish => flags::generate_complete_fish(),
GenerateMode::CompletePowerShell => {
flags::generate_complete_powershell()
}
Ok(true)
}
#[cfg(not(feature = "pcre2"))]
fn imp(args: &Args) -> Result<bool> {
let mut stdout = args.stdout();
writeln!(stdout, "PCRE2 is not available in this build of ripgrep.")?;
Ok(false)
}
imp(args)
};
writeln!(std::io::stdout(), "{}", output.trim_end())?;
Ok(ExitCode::from(0))
}
/// Implements ripgrep's "special" modes.
///
/// A special mode is one that generally short-circuits most (not all) of
/// ripgrep's initialization logic and skips right to this routine. The
/// special modes essentially consist of printing help and version output. The
/// idea behind the short circuiting is to ensure there is as little as possible
/// (within reason) that would prevent ripgrep from emitting help output.
///
/// For example, part of the initialization logic that is skipped (among
/// other things) is accessing the current working directory. If that fails,
/// ripgrep emits an error. We don't want to emit an error if it fails and
/// the user requested version or help information.
fn special(mode: crate::flags::SpecialMode) -> anyhow::Result<ExitCode> {
use crate::flags::SpecialMode;
let mut exit = ExitCode::from(0);
let output = match mode {
SpecialMode::HelpShort => flags::generate_help_short(),
SpecialMode::HelpLong => flags::generate_help_long(),
SpecialMode::VersionShort => flags::generate_version_short(),
SpecialMode::VersionLong => flags::generate_version_long(),
// --pcre2-version is a little special because it emits an error
// exit code if this build of ripgrep doesn't support PCRE2.
SpecialMode::VersionPCRE2 => {
let (output, available) = flags::generate_version_pcre2();
if !available {
exit = ExitCode::from(1);
}
output
}
};
writeln!(std::io::stdout(), "{}", output.trim_end())?;
Ok(exit)
}
/// Prints a heuristic error message when nothing is searched.
///
/// This can happen if an applicable ignore file has one or more rules that
/// are too broad and cause ripgrep to ignore everything.
///
/// We only show this error message when the user does *not* provide an
/// explicit path to search. This is because the message can otherwise be
/// noisy, e.g., when it is intended that there is nothing to search.
fn eprint_nothing_searched() {
err_message!(
"No files were searched, which means ripgrep probably \
applied a filter you didn't expect.\n\
Running with --debug will show why files are being skipped."
);
}
/// Prints the statistics given to the writer given.
///
/// The search mode given determines whether the stats should be printed in
/// a plain text format or in a JSON format.
///
/// The `started` time should be the time at which ripgrep started working.
///
/// If an error occurs while writing, then writing stops and the error is
/// returned. Note that callers should probably ignore this error, since
/// whether stats fail to print or not generally shouldn't cause ripgrep to
/// enter into an "error" state. And usually the only way for this to fail is
/// if writing to stdout itself fails.
fn print_stats<W: Write>(
mode: SearchMode,
stats: &grep::printer::Stats,
started: std::time::Instant,
mut wtr: W,
) -> std::io::Result<()> {
let elapsed = std::time::Instant::now().duration_since(started);
if matches!(mode, SearchMode::JSON) {
// We specifically match the format laid out by the JSON printer in
// the grep-printer crate. We simply "extend" it with the 'summary'
// message type.
serde_json::to_writer(
&mut wtr,
&serde_json::json!({
"type": "summary",
"data": {
"stats": stats,
"elapsed_total": {
"secs": elapsed.as_secs(),
"nanos": elapsed.subsec_nanos(),
"human": format!("{:0.6}s", elapsed.as_secs_f64()),
},
}
}),
)?;
write!(wtr, "\n")
} else {
write!(
wtr,
"
{matches} matches
{lines} matched lines
{searches_with_match} files contained matches
{searches} files searched
{bytes_printed} bytes printed
{bytes_searched} bytes searched
{search_time:0.6} seconds spent searching
{process_time:0.6} seconds
",
matches = stats.matches(),
lines = stats.matched_lines(),
searches_with_match = stats.searches_with_match(),
searches = stats.searches(),
bytes_printed = stats.bytes_printed(),
bytes_searched = stats.bytes_searched(),
search_time = stats.elapsed().as_secs_f64(),
process_time = elapsed.as_secs_f64(),
)
}
}
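
The new `main` above treats a broken pipe anywhere in the error chain as a graceful exit with status 0, and any other error as status 2. The following is a rough sketch of that check using only the standard library. It walks the chain with `std::error::Error::source` instead of `anyhow`'s `chain()`, and the `Context` wrapper type and the function names `is_broken_pipe` and `exit_code_for` are assumptions introduced for the example, not part of ripgrep.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical wrapper error that adds context around an I/O error.
#[derive(Debug)]
struct Context {
    msg: &'static str,
    source: io::Error,
}

impl fmt::Display for Context {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}

impl Error for Context {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Returns true if any error in the chain is an I/O broken pipe error.
fn is_broken_pipe(err: &(dyn Error + 'static)) -> bool {
    let mut cause: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = cause {
        if let Some(ioerr) = e.downcast_ref::<io::Error>() {
            if ioerr.kind() == io::ErrorKind::BrokenPipe {
                return true;
            }
        }
        cause = e.source();
    }
    false
}

/// Maps an error to the exit status used in the diff: 0 for a broken pipe,
/// 2 for everything else.
fn exit_code_for(err: &(dyn Error + 'static)) -> u8 {
    if is_broken_pipe(err) {
        0
    } else {
        2
    }
}

fn main() {
    let err = Context {
        msg: "writing results",
        source: io::Error::new(io::ErrorKind::BrokenPipe, "pipe closed"),
    };
    println!("exit status: {}", exit_code_for(&err));
}
```

Checking the whole chain matters because the I/O error is usually wrapped in higher-level context by the time it reaches `main`.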

@ -1,15 +1,77 @@
/*!
This module defines some macros and some light shared mutable state.
This state is responsible for keeping track of whether we should emit certain
kinds of messages to the user (such as errors) that are distinct from the
standard "debug" or "trace" log messages. This state is specifically set at
startup time when CLI arguments are parsed and then never changed.
The other state tracked here is whether ripgrep experienced an error
condition. Aside from errors associated with invalid CLI arguments, ripgrep
generally does not abort when an error occurs (e.g., if reading a file failed).
But when an error does occur, it will alter ripgrep's exit status. Thus, when
an error message is emitted via `err_message`, then a global flag is toggled
indicating that at least one error occurred. When ripgrep exits, this flag is
consulted to determine what the exit status ought to be.
*/
use std::sync::atomic::{AtomicBool, Ordering};
/// When false, "messages" will not be printed.
static MESSAGES: AtomicBool = AtomicBool::new(false);
/// When false, "messages" related to ignore rules will not be printed.
static IGNORE_MESSAGES: AtomicBool = AtomicBool::new(false);
/// Flipped to true when an error message is printed.
static ERRORED: AtomicBool = AtomicBool::new(false);
/// Like eprintln, but locks stdout to prevent interleaving lines.
///
/// This locks stdout, not stderr, even though this prints to stderr. This
/// avoids the appearance of interleaving output when stdout and stderr both
/// correspond to a tty.
#[macro_export]
macro_rules! eprintln_locked {
($($tt:tt)*) => {{
{
use std::io::Write;
// This is a bit of an abstraction violation because we explicitly
// lock stdout before printing to stderr. This avoids interleaving
// lines within ripgrep because `search_parallel` uses `termcolor`,
// which accesses the same stdout lock when writing lines.
let stdout = std::io::stdout().lock();
let mut stderr = std::io::stderr().lock();
// We specifically ignore any errors here. One plausible error we
// can get in some cases is a broken pipe error. And when that
// occurs, we should exit gracefully. Otherwise, just abort with
// an error code because there isn't much else we can do.
//
// See: https://github.com/BurntSushi/ripgrep/issues/1966
if let Err(err) = write!(stderr, "rg: ") {
if err.kind() == std::io::ErrorKind::BrokenPipe {
std::process::exit(0);
} else {
std::process::exit(2);
}
}
if let Err(err) = writeln!(stderr, $($tt)*) {
if err.kind() == std::io::ErrorKind::BrokenPipe {
std::process::exit(0);
} else {
std::process::exit(2);
}
}
drop(stdout);
}
}}
}
/// Emit a non-fatal error message, unless messages were disabled.
#[macro_export]
macro_rules! message {
($($tt:tt)*) => {
if crate::messages::messages() {
eprintln!($($tt)*);
eprintln_locked!($($tt)*);
}
}
}
@ -30,25 +92,25 @@ macro_rules! err_message {
macro_rules! ignore_message {
($($tt:tt)*) => {
if crate::messages::messages() && crate::messages::ignore_messages() {
eprintln!($($tt)*);
eprintln_locked!($($tt)*);
}
}
}
/// Returns true if and only if messages should be shown.
pub fn messages() -> bool {
pub(crate) fn messages() -> bool {
MESSAGES.load(Ordering::SeqCst)
}
/// Set whether messages should be shown or not.
///
/// By default, they are not shown.
pub fn set_messages(yes: bool) {
pub(crate) fn set_messages(yes: bool) {
MESSAGES.store(yes, Ordering::SeqCst)
}
/// Returns true if and only if "ignore" related messages should be shown.
pub fn ignore_messages() -> bool {
pub(crate) fn ignore_messages() -> bool {
IGNORE_MESSAGES.load(Ordering::SeqCst)
}
@ -59,16 +121,19 @@ pub fn ignore_messages() -> bool {
/// Note that this is overridden if `messages` is disabled. Namely, if
/// `messages` is disabled, then "ignore" messages are never shown, regardless
/// of this setting.
pub fn set_ignore_messages(yes: bool) {
pub(crate) fn set_ignore_messages(yes: bool) {
IGNORE_MESSAGES.store(yes, Ordering::SeqCst)
}
/// Returns true if and only if ripgrep came across a non-fatal error.
pub fn errored() -> bool {
pub(crate) fn errored() -> bool {
ERRORED.load(Ordering::SeqCst)
}
/// Indicate that ripgrep has come across a non-fatal error.
pub fn set_errored() {
///
/// Callers should not use this directly. Instead, it is called automatically
/// via the `err_message` macro.
pub(crate) fn set_errored() {
ERRORED.store(true, Ordering::SeqCst);
}
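
The module documentation above says that emitting an error message flips a global flag, and that this flag is consulted at exit to choose the exit status. A small sketch of how the two pieces fit together is shown below. The flag and its accessors follow the names in the diff; the free function `exit_status`, with its plain `bool` parameters, is an assumption that restates the final `if`/`else` chain in `run` so it can be checked in isolation.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Flipped to true when an error message is printed.
static ERRORED: AtomicBool = AtomicBool::new(false);

/// Indicate that a non-fatal error occurred.
fn set_errored() {
    ERRORED.store(true, Ordering::SeqCst);
}

/// Returns true if and only if a non-fatal error occurred.
fn errored() -> bool {
    ERRORED.load(Ordering::SeqCst)
}

/// Hypothetical restatement of the exit status logic at the end of `run`:
/// 0 when there was a match and either quiet mode is on or no error occurred,
/// 2 when an error occurred, and 1 when nothing matched.
fn exit_status(matched: bool, quiet: bool, errored: bool) -> u8 {
    if matched && (quiet || !errored) {
        0
    } else if errored {
        2
    } else {
        1
    }
}

fn main() {
    // Pretend a file failed to open while another file matched.
    set_errored();
    let status = exit_status(true, false, errored());
    println!("exit status: {}", status);
}
```

Note the asymmetry: in quiet mode a match yields status 0 even if an error was recorded, since the caller only asked whether a match exists.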

@ -1,98 +0,0 @@
use std::io;
use std::path::Path;
use grep::printer::{ColorSpecs, PrinterPath};
use termcolor::WriteColor;
/// A configuration for describing how paths should be written.
#[derive(Clone, Debug)]
struct Config {
colors: ColorSpecs,
separator: Option<u8>,
terminator: u8,
}
impl Default for Config {
fn default() -> Config {
Config {
colors: ColorSpecs::default(),
separator: None,
terminator: b'\n',
}
}
}
/// A builder for constructing path printers.
#[derive(Clone, Debug)]
pub struct PathPrinterBuilder {
config: Config,
}
impl PathPrinterBuilder {
/// Return a new path printer builder with a default configuration.
pub fn new() -> PathPrinterBuilder {
PathPrinterBuilder { config: Config::default() }
}
/// Create a new path printer with the current configuration that writes
/// paths to the given writer.
pub fn build<W: WriteColor>(&self, wtr: W) -> PathPrinter<W> {
PathPrinter { config: self.config.clone(), wtr }
}
/// Set the color specification for this printer.
///
/// Currently, only the `path` component of the given specification is
/// used.
pub fn color_specs(
&mut self,
specs: ColorSpecs,
) -> &mut PathPrinterBuilder {
self.config.colors = specs;
self
}
/// A path separator.
///
/// When provided, the path's default separator will be replaced with
/// the given separator.
///
/// This is not set by default, and the system's default path separator
/// will be used.
pub fn separator(&mut self, sep: Option<u8>) -> &mut PathPrinterBuilder {
self.config.separator = sep;
self
}
/// A path terminator.
///
/// When printing a path, it will be terminated by the given byte.
///
/// This is set to `\n` by default.
pub fn terminator(&mut self, terminator: u8) -> &mut PathPrinterBuilder {
self.config.terminator = terminator;
self
}
}
/// A printer for emitting paths to a writer, with optional color support.
#[derive(Debug)]
pub struct PathPrinter<W> {
config: Config,
wtr: W,
}
impl<W: WriteColor> PathPrinter<W> {
/// Write the given path to the underlying writer.
pub fn write_path(&mut self, path: &Path) -> io::Result<()> {
let ppath = PrinterPath::with_separator(path, self.config.separator);
if !self.wtr.supports_color() {
self.wtr.write_all(ppath.as_bytes())?;
} else {
self.wtr.set_color(self.config.colors.path())?;
self.wtr.write_all(ppath.as_bytes())?;
self.wtr.reset()?;
}
self.wtr.write_all(&[self.config.terminator])
}
}
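
The removed `PathPrinter` above writes a path, optionally with its separators replaced, followed by a terminator byte. A simplified sketch of that behavior using only the standard library is given below. It leaves out color handling, takes the path as a `&str`, and does the separator replacement by hand rather than through `grep::printer::PrinterPath`; the function name `write_path` and its signature are assumptions for illustration.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for `PathPrinter::write_path` without color support.
///
/// When `separator` is set, every path separator byte is replaced with it.
/// The path is always followed by `terminator`.
fn write_path<W: Write>(
    mut wtr: W,
    path: &str,
    separator: Option<u8>,
    terminator: u8,
) -> io::Result<()> {
    match separator {
        None => wtr.write_all(path.as_bytes())?,
        Some(sep) => {
            let replaced: Vec<u8> = path
                .bytes()
                .map(|b| if std::path::is_separator(b as char) { sep } else { b })
                .collect();
            wtr.write_all(&replaced)?;
        }
    }
    wtr.write_all(&[terminator])
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Default behavior: the path as given, terminated by a newline.
    write_path(stdout.lock(), "src/main.rs", None, b'\n')?;
    // Replace separators, as a `--path-separator`-style option might request.
    write_path(stdout.lock(), "src/main.rs", Some(b'\\'), b'\n')
}
```

A NUL terminator (`b'\0'`) in place of `b'\n'` gives output that is safe to split even when file names contain newlines.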

@ -1,55 +1,47 @@
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};
use std::process::{Command, Stdio};
use std::time::Duration;
/*!
Defines a very high level "search worker" abstraction.
use grep::cli;
use grep::matcher::Matcher;
#[cfg(feature = "pcre2")]
use grep::pcre2::RegexMatcher as PCRE2RegexMatcher;
use grep::printer::{Standard, Stats, Summary, JSON};
use grep::regex::RegexMatcher as RustRegexMatcher;
use grep::searcher::{BinaryDetection, Searcher};
use ignore::overrides::Override;
use serde_json as json;
use serde_json::json;
use termcolor::WriteColor;
A search worker manages the high level interaction points between the matcher
(i.e., which regex engine is used), the searcher (i.e., how data is actually
read and matched using the regex engine) and the printer. For example, the
search worker is where things like preprocessors or decompression happens.
*/
use crate::subject::Subject;
use std::{io, path::Path};
/// The configuration for the search worker. Among a few other things, the
/// configuration primarily controls the way we show search results to users
/// at a very high level.
use {grep::matcher::Matcher, termcolor::WriteColor};
/// The configuration for the search worker.
///
/// Among a few other things, the configuration primarily controls the way we
/// show search results to users at a very high level.
#[derive(Clone, Debug)]
struct Config {
json_stats: bool,
preprocessor: Option<PathBuf>,
preprocessor_globs: Override,
preprocessor: Option<std::path::PathBuf>,
preprocessor_globs: ignore::overrides::Override,
search_zip: bool,
binary_implicit: BinaryDetection,
binary_explicit: BinaryDetection,
binary_implicit: grep::searcher::BinaryDetection,
binary_explicit: grep::searcher::BinaryDetection,
}
impl Default for Config {
fn default() -> Config {
Config {
json_stats: false,
preprocessor: None,
preprocessor_globs: Override::empty(),
preprocessor_globs: ignore::overrides::Override::empty(),
search_zip: false,
binary_implicit: BinaryDetection::none(),
binary_explicit: BinaryDetection::none(),
binary_implicit: grep::searcher::BinaryDetection::none(),
binary_explicit: grep::searcher::BinaryDetection::none(),
}
}
}
/// A builder for configuring and constructing a search worker.
#[derive(Clone, Debug)]
pub struct SearchWorkerBuilder {
pub(crate) struct SearchWorkerBuilder {
config: Config,
command_builder: cli::CommandReaderBuilder,
decomp_builder: cli::DecompressionReaderBuilder,
command_builder: grep::cli::CommandReaderBuilder,
decomp_builder: grep::cli::DecompressionReaderBuilder,
}
impl Default for SearchWorkerBuilder {
@ -60,11 +52,11 @@ impl Default for SearchWorkerBuilder {
impl SearchWorkerBuilder {
/// Create a new builder for configuring and constructing a search worker.
pub fn new() -> SearchWorkerBuilder {
let mut cmd_builder = cli::CommandReaderBuilder::new();
pub(crate) fn new() -> SearchWorkerBuilder {
let mut cmd_builder = grep::cli::CommandReaderBuilder::new();
cmd_builder.async_stderr(true);
let mut decomp_builder = cli::DecompressionReaderBuilder::new();
let mut decomp_builder = grep::cli::DecompressionReaderBuilder::new();
decomp_builder.async_stderr(true);
SearchWorkerBuilder {
@ -76,10 +68,10 @@ impl SearchWorkerBuilder {
/// Create a new search worker using the given searcher, matcher and
/// printer.
pub fn build<W: WriteColor>(
pub(crate) fn build<W: WriteColor>(
&self,
matcher: PatternMatcher,
searcher: Searcher,
searcher: grep::searcher::Searcher,
printer: Printer<W>,
) -> SearchWorker<W> {
let config = self.config.clone();
@ -95,29 +87,17 @@ impl SearchWorkerBuilder {
}
}
/// Forcefully use JSON to emit statistics, even if the underlying printer
/// is not the JSON printer.
///
/// This is useful for implementing flag combinations like
/// `--json --quiet`, which uses the summary printer for implementing
/// `--quiet` but still wants to emit summary statistics, which should
/// be JSON formatted because of the `--json` flag.
pub fn json_stats(&mut self, yes: bool) -> &mut SearchWorkerBuilder {
self.config.json_stats = yes;
self
}
/// Set the path to a preprocessor command.
///
/// When this is set, instead of searching files directly, the given
/// command will be run with the file path as the first argument, and the
/// output of that command will be searched instead.
pub fn preprocessor(
pub(crate) fn preprocessor(
&mut self,
cmd: Option<PathBuf>,
) -> crate::Result<&mut SearchWorkerBuilder> {
cmd: Option<std::path::PathBuf>,
) -> anyhow::Result<&mut SearchWorkerBuilder> {
if let Some(ref prog) = cmd {
let bin = cli::resolve_binary(prog)?;
let bin = grep::cli::resolve_binary(prog)?;
self.config.preprocessor = Some(bin);
} else {
self.config.preprocessor = None;
@ -128,9 +108,9 @@ impl SearchWorkerBuilder {
/// Set the globs for determining which files should be run through the
/// preprocessor. By default, with no globs and a preprocessor specified,
/// every file is run through the preprocessor.
pub fn preprocessor_globs(
pub(crate) fn preprocessor_globs(
&mut self,
globs: Override,
globs: ignore::overrides::Override,
) -> &mut SearchWorkerBuilder {
self.config.preprocessor_globs = globs;
self
@ -143,7 +123,10 @@ impl SearchWorkerBuilder {
///
/// Note that if a preprocessor command is set, then it overrides this
/// setting.
pub fn search_zip(&mut self, yes: bool) -> &mut SearchWorkerBuilder {
pub(crate) fn search_zip(
&mut self,
yes: bool,
) -> &mut SearchWorkerBuilder {
self.config.search_zip = yes;
self
}
@ -151,13 +134,14 @@ impl SearchWorkerBuilder {
/// Set the binary detection that should be used when searching files
/// found via a recursive directory search.
///
/// Generally, this binary detection may be `BinaryDetection::quit` if
/// we want to skip binary files completely.
/// Generally, this binary detection may be
/// `grep::searcher::BinaryDetection::quit` if we want to skip binary files
/// completely.
///
/// By default, no binary detection is performed.
pub fn binary_detection_implicit(
pub(crate) fn binary_detection_implicit(
&mut self,
detection: BinaryDetection,
detection: grep::searcher::BinaryDetection,
) -> &mut SearchWorkerBuilder {
self.config.binary_implicit = detection;
self
@ -166,14 +150,14 @@ impl SearchWorkerBuilder {
/// Set the binary detection that should be used when searching files
/// explicitly supplied by an end user.
///
/// Generally, this binary detection should NOT be `BinaryDetection::quit`,
/// since we never want to automatically filter files supplied by the end
/// user.
/// Generally, this binary detection should NOT be
/// `grep::searcher::BinaryDetection::quit`, since we never want to
/// automatically filter files supplied by the end user.
///
/// By default, no binary detection is performed.
pub fn binary_detection_explicit(
pub(crate) fn binary_detection_explicit(
&mut self,
detection: BinaryDetection,
detection: grep::searcher::BinaryDetection,
) -> &mut SearchWorkerBuilder {
self.config.binary_explicit = detection;
self
@ -187,14 +171,14 @@ impl SearchWorkerBuilder {
/// every search also has some aggregate statistics or meta data that may be
/// useful to higher level routines.
#[derive(Clone, Debug, Default)]
pub struct SearchResult {
pub(crate) struct SearchResult {
has_match: bool,
stats: Option<Stats>,
stats: Option<grep::printer::Stats>,
}
impl SearchResult {
/// Whether the search found a match or not.
pub fn has_match(&self) -> bool {
pub(crate) fn has_match(&self) -> bool {
self.has_match
}
@ -202,103 +186,36 @@ impl SearchResult {
///
/// It can be expensive to compute statistics, so these are only present
/// if explicitly enabled in the printer provided by the caller.
pub fn stats(&self) -> Option<&Stats> {
pub(crate) fn stats(&self) -> Option<&grep::printer::Stats> {
self.stats.as_ref()
}
}
/// The pattern matcher used by a search worker.
#[derive(Clone, Debug)]
pub enum PatternMatcher {
RustRegex(RustRegexMatcher),
pub(crate) enum PatternMatcher {
RustRegex(grep::regex::RegexMatcher),
#[cfg(feature = "pcre2")]
PCRE2(PCRE2RegexMatcher),
PCRE2(grep::pcre2::RegexMatcher),
}
/// The printer used by a search worker.
///
/// The `W` type parameter refers to the type of the underlying writer.
#[derive(Debug)]
pub enum Printer<W> {
#[derive(Clone, Debug)]
pub(crate) enum Printer<W> {
/// Use the standard printer, which supports the classic grep-like format.
Standard(Standard<W>),
Standard(grep::printer::Standard<W>),
/// Use the summary printer, which supports aggregate displays of search
/// results.
Summary(Summary<W>),
Summary(grep::printer::Summary<W>),
/// A JSON printer, which emits results in the JSON Lines format.
JSON(JSON<W>),
JSON(grep::printer::JSON<W>),
}
impl<W: WriteColor> Printer<W> {
fn print_stats(
&mut self,
total_duration: Duration,
stats: &Stats,
) -> io::Result<()> {
match *self {
Printer::JSON(_) => self.print_stats_json(total_duration, stats),
Printer::Standard(_) | Printer::Summary(_) => {
self.print_stats_human(total_duration, stats)
}
}
}
fn print_stats_human(
&mut self,
total_duration: Duration,
stats: &Stats,
) -> io::Result<()> {
write!(
self.get_mut(),
"
{matches} matches
{lines} matched lines
{searches_with_match} files contained matches
{searches} files searched
{bytes_printed} bytes printed
{bytes_searched} bytes searched
{search_time:0.6} seconds spent searching
{process_time:0.6} seconds
",
matches = stats.matches(),
lines = stats.matched_lines(),
searches_with_match = stats.searches_with_match(),
searches = stats.searches(),
bytes_printed = stats.bytes_printed(),
bytes_searched = stats.bytes_searched(),
search_time = fractional_seconds(stats.elapsed()),
process_time = fractional_seconds(total_duration)
)
}
fn print_stats_json(
&mut self,
total_duration: Duration,
stats: &Stats,
) -> io::Result<()> {
// We specifically match the format laid out by the JSON printer in
// the grep-printer crate. We simply "extend" it with the 'summary'
// message type.
let fractional = fractional_seconds(total_duration);
json::to_writer(
self.get_mut(),
&json!({
"type": "summary",
"data": {
"stats": stats,
"elapsed_total": {
"secs": total_duration.as_secs(),
"nanos": total_duration.subsec_nanos(),
"human": format!("{:0.6}s", fractional),
},
}
}),
)?;
write!(self.get_mut(), "\n")
}
/// Return a mutable reference to the underlying printer's writer.
pub fn get_mut(&mut self) -> &mut W {
pub(crate) fn get_mut(&mut self) -> &mut W {
match *self {
Printer::Standard(ref mut p) => p.get_mut(),
Printer::Summary(ref mut p) => p.get_mut(),
@ -312,29 +229,32 @@ impl<W: WriteColor> Printer<W> {
/// It is intended for a single worker to execute many searches, and is
/// generally intended to be used from a single thread. When searching using
/// multiple threads, it is better to create a new worker for each thread.
#[derive(Debug)]
pub struct SearchWorker<W> {
#[derive(Clone, Debug)]
pub(crate) struct SearchWorker<W> {
config: Config,
command_builder: cli::CommandReaderBuilder,
decomp_builder: cli::DecompressionReaderBuilder,
command_builder: grep::cli::CommandReaderBuilder,
decomp_builder: grep::cli::DecompressionReaderBuilder,
matcher: PatternMatcher,
searcher: Searcher,
searcher: grep::searcher::Searcher,
printer: Printer<W>,
}
impl<W: WriteColor> SearchWorker<W> {
/// Execute a search over the given subject.
pub fn search(&mut self, subject: &Subject) -> io::Result<SearchResult> {
let bin = if subject.is_explicit() {
/// Execute a search over the given haystack.
pub(crate) fn search(
&mut self,
haystack: &crate::haystack::Haystack,
) -> io::Result<SearchResult> {
let bin = if haystack.is_explicit() {
self.config.binary_explicit.clone()
} else {
self.config.binary_implicit.clone()
};
let path = subject.path();
let path = haystack.path();
log::trace!("{}: binary detection: {:?}", path.display(), bin);
self.searcher.set_binary_detection(bin);
if subject.is_stdin() {
if haystack.is_stdin() {
self.search_reader(path, &mut io::stdin().lock())
} else if self.should_preprocess(path) {
self.search_preprocessor(path)
@ -346,28 +266,10 @@ impl<W: WriteColor> SearchWorker<W> {
}
/// Return a mutable reference to the underlying printer.
pub fn printer(&mut self) -> &mut Printer<W> {
pub(crate) fn printer(&mut self) -> &mut Printer<W> {
&mut self.printer
}
/// Print the given statistics to the underlying writer in a way that is
/// consistent with this searcher's printer's format.
///
/// While `Stats` contains a duration itself, this only corresponds to the
/// time spent searching, whereas `total_duration` should roughly
/// approximate the lifespan of the ripgrep process itself.
pub fn print_stats(
&mut self,
total_duration: Duration,
stats: &Stats,
) -> io::Result<()> {
if self.config.json_stats {
self.printer().print_stats_json(total_duration, stats)
} else {
self.printer().print_stats(total_duration, stats)
}
}
/// Returns true if and only if the given file path should be
/// decompressed before searching.
fn should_decompress(&self, path: &Path) -> bool {
@ -395,8 +297,10 @@ impl<W: WriteColor> SearchWorker<W> {
&mut self,
path: &Path,
) -> io::Result<SearchResult> {
use std::{fs::File, process::Stdio};
let bin = self.config.preprocessor.as_ref().unwrap();
let mut cmd = Command::new(bin);
let mut cmd = std::process::Command::new(bin);
cmd.arg(path).stdin(Stdio::from(File::open(path)?));
let mut rdr = self.command_builder.build(&mut cmd).map_err(|err| {
@ -473,7 +377,7 @@ impl<W: WriteColor> SearchWorker<W> {
/// searcher and printer.
fn search_path<M: Matcher, W: WriteColor>(
matcher: M,
searcher: &mut Searcher,
searcher: &mut grep::searcher::Searcher,
printer: &mut Printer<W>,
path: &Path,
) -> io::Result<SearchResult> {
@ -509,7 +413,7 @@ fn search_path<M: Matcher, W: WriteColor>(
/// and printer.
fn search_reader<M: Matcher, R: io::Read, W: WriteColor>(
matcher: M,
searcher: &mut Searcher,
searcher: &mut grep::searcher::Searcher,
printer: &mut Printer<W>,
path: &Path,
mut rdr: R,
@ -541,8 +445,3 @@ fn search_reader<M: Matcher, R: io::Read, W: WriteColor>(
}
}
}
/// Return the given duration as fractional seconds.
fn fractional_seconds(duration: Duration) -> f64 {
(duration.as_secs() as f64) + (duration.subsec_nanos() as f64 * 1e-9)
}


@ -1,6 +1,6 @@
[package]
name = "globset"
version = "0.4.9" #:version
version = "0.4.16" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Cross platform single glob and glob set matching. Glob set matching is the
@ -13,26 +13,35 @@ repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/globset"
readme = "README.md"
keywords = ["regex", "glob", "multiple", "set", "pattern"]
license = "Unlicense OR MIT"
edition = "2018"
edition = "2021"
[lib]
name = "globset"
bench = false
[dependencies]
aho-corasick = "0.7.3"
bstr = { version = "0.2.0", default-features = false, features = ["std"] }
fnv = "1.0.6"
log = { version = "0.4.5", optional = true }
regex = { version = "1.1.5", default-features = false, features = ["perf", "std"] }
serde = { version = "1.0.104", optional = true }
aho-corasick = "1.1.1"
bstr = { version = "1.6.2", default-features = false, features = ["std"] }
log = { version = "0.4.20", optional = true }
serde = { version = "1.0.188", optional = true }
[dependencies.regex-syntax]
version = "0.8.0"
default-features = false
features = ["std"]
[dependencies.regex-automata]
version = "0.4.0"
default-features = false
features = ["std", "perf", "syntax", "meta", "nfa", "hybrid"]
[dev-dependencies]
glob = "0.3.0"
lazy_static = "1"
serde_json = "1.0.45"
glob = "0.3.1"
serde_json = "1.0.107"
[features]
default = ["log"]
# DEPRECATED. It is a no-op. SIMD is done automatically through runtime
# dispatch.
simd-accel = []
serde1 = ["serde"]


@ -19,7 +19,7 @@ Add this to your `Cargo.toml`:
```toml
[dependencies]
globset = "0.3"
globset = "0.4"
```
### Features
@ -78,12 +78,12 @@ assert_eq!(set.matches("src/bar/baz/foo.rs"), vec![0, 2]);
This crate implements globs by converting them to regular expressions, and
executing them with the
[`regex`](https://github.com/rust-lang-nursery/regex)
[`regex`](https://github.com/rust-lang/regex)
crate.
For single glob matching, performance of this crate should be roughly on par
with the performance of the
[`glob`](https://github.com/rust-lang-nursery/glob)
[`glob`](https://github.com/rust-lang/glob)
crate. (`*_regex` correspond to benchmarks for this library while `*_glob`
correspond to benchmarks for the `glob` library.)
Optimizations in the `regex` crate may propel this library past `glob`,
@ -108,7 +108,7 @@ test many_short_glob ... bench: 1,063 ns/iter (+/- 47)
test many_short_regex_set ... bench: 186 ns/iter (+/- 11)
```
### Comparison with the [`glob`](https://github.com/rust-lang-nursery/glob) crate
### Comparison with the [`glob`](https://github.com/rust-lang/glob) crate
* Supports alternate "or" globs, e.g., `*.{foo,bar}`.
* Can match non-UTF-8 file paths correctly.

crates/globset/src/fnv.rs Normal file
@ -0,0 +1,30 @@
/// A convenience alias for creating a hash map with an FNV hasher.
pub(crate) type HashMap<K, V> =
std::collections::HashMap<K, V, std::hash::BuildHasherDefault<Hasher>>;
/// A hasher that implements the Fowler-Noll-Vo (FNV) hash.
pub(crate) struct Hasher(u64);
impl Hasher {
const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
const PRIME: u64 = 0x100000001b3;
}
impl Default for Hasher {
fn default() -> Hasher {
Hasher(Hasher::OFFSET_BASIS)
}
}
impl std::hash::Hasher for Hasher {
fn finish(&self) -> u64 {
self.0
}
fn write(&mut self, bytes: &[u8]) {
for &byte in bytes.iter() {
self.0 = self.0 ^ u64::from(byte);
self.0 = self.0.wrapping_mul(Hasher::PRIME);
}
}
}
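
The new `fnv.rs` above replaces the external `fnv` dependency with a small in-crate hasher. As a minimal standalone sketch (not the crate's code), the same FNV-1a loop can be exercised with only the standard library. The names `FnvHasher`, `FnvHashMap`, `fnv1a` and `index_by_extension` are hypothetical and chosen for illustration; only the constants and the xor-then-multiply step come from the diff.

```rust
use std::hash::{BuildHasherDefault, Hasher as _};

/// Illustrative copy of the 64-bit FNV-1a hasher shown in the diff.
pub struct FnvHasher(u64);

impl FnvHasher {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
}

impl Default for FnvHasher {
    fn default() -> FnvHasher {
        FnvHasher(FnvHasher::OFFSET_BASIS)
    }
}

impl std::hash::Hasher for FnvHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            // FNV-1a: xor the byte in first, then multiply by the prime.
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(FnvHasher::PRIME);
        }
    }
}

/// Hypothetical alias mirroring the `HashMap` alias in the diff.
pub type FnvHashMap<K, V> =
    std::collections::HashMap<K, V, BuildHasherDefault<FnvHasher>>;

/// Hypothetical helper: hash a byte slice directly with the hasher.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hasher = FnvHasher::default();
    hasher.write(bytes);
    hasher.finish()
}

/// Hypothetical helper: map each extension to the indices at which it
/// appears, loosely in the spirit of the extension lookup tables in lib.rs.
pub fn index_by_extension(exts: &[&str]) -> FnvHashMap<Vec<u8>, Vec<usize>> {
    let mut map: FnvHashMap<Vec<u8>, Vec<usize>> = FnvHashMap::default();
    for (i, ext) in exts.iter().enumerate() {
        map.entry(ext.as_bytes().to_vec()).or_default().push(i);
    }
    map
}
```

Hashing an empty input returns the offset basis unchanged, which is a quick sanity check for any FNV implementation. Note that hashing a `&str` through the `Hash` trait feeds extra bytes to the hasher, so `fnv1a` writes the raw bytes directly.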


@ -1,12 +1,7 @@
use std::fmt;
use std::hash;
use std::iter;
use std::ops::{Deref, DerefMut};
use std::fmt::Write;
use std::path::{is_separator, Path};
use std::str;
use regex;
use regex::bytes::Regex;
use regex_automata::meta::Regex;
use crate::{new_regex, Candidate, Error, ErrorKind};
@ -18,7 +13,7 @@ use crate::{new_regex, Candidate, Error, ErrorKind};
/// possible to test whether any of those patterns matches by looking up a
/// file path's extension in a hash table.
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum MatchStrategy {
pub(crate) enum MatchStrategy {
/// A pattern matches if and only if the entire file path matches this
/// literal string.
Literal(String),
@ -53,7 +48,7 @@ pub enum MatchStrategy {
impl MatchStrategy {
/// Returns a matching strategy for the given pattern.
pub fn new(pat: &Glob) -> MatchStrategy {
pub(crate) fn new(pat: &Glob) -> MatchStrategy {
if let Some(lit) = pat.basename_literal() {
MatchStrategy::BasenameLiteral(lit)
} else if let Some(lit) = pat.literal() {
@ -63,7 +58,7 @@ impl MatchStrategy {
} else if let Some(prefix) = pat.prefix() {
MatchStrategy::Prefix(prefix)
} else if let Some((suffix, component)) = pat.suffix() {
MatchStrategy::Suffix { suffix: suffix, component: component }
MatchStrategy::Suffix { suffix, component }
} else if let Some(ext) = pat.required_ext() {
MatchStrategy::RequiredExtension(ext)
} else {
@ -90,20 +85,20 @@ impl PartialEq for Glob {
}
}
impl hash::Hash for Glob {
fn hash<H: hash::Hasher>(&self, state: &mut H) {
impl std::hash::Hash for Glob {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
self.glob.hash(state);
self.opts.hash(state);
}
}
impl fmt::Display for Glob {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Display for Glob {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
self.glob.fmt(f)
}
}
impl str::FromStr for Glob {
impl std::str::FromStr for Glob {
type Err = Error;
fn from_str(glob: &str) -> Result<Self, Self::Err> {
@ -208,6 +203,9 @@ struct GlobOptions {
/// Whether or not to use `\` to escape special characters.
/// e.g., when enabled, `\*` will match a literal `*`.
backslash_escape: bool,
/// Whether or not an empty case in an alternate is kept rather than removed.
/// e.g., when enabled, `{,a}` will match "" and "a".
empty_alternates: bool,
}
impl GlobOptions {
@ -216,6 +214,7 @@ impl GlobOptions {
case_insensitive: false,
literal_separator: false,
backslash_escape: !is_separator('\\'),
empty_alternates: false,
}
}
}
@ -223,14 +222,14 @@ impl GlobOptions {
#[derive(Clone, Debug, Default, Eq, PartialEq)]
struct Tokens(Vec<Token>);
impl Deref for Tokens {
impl std::ops::Deref for Tokens {
type Target = Vec<Token>;
fn deref(&self) -> &Vec<Token> {
&self.0
}
}
impl DerefMut for Tokens {
impl std::ops::DerefMut for Tokens {
fn deref_mut(&mut self) -> &mut Vec<Token> {
&mut self.0
}
@ -258,7 +257,7 @@ impl Glob {
pub fn compile_matcher(&self) -> GlobMatcher {
let re =
new_regex(&self.re).expect("regex compilation shouldn't fail");
GlobMatcher { pat: self.clone(), re: re }
GlobMatcher { pat: self.clone(), re }
}
/// Returns a strategic matcher.
@ -271,7 +270,7 @@ impl Glob {
let strategy = MatchStrategy::new(self);
let re =
new_regex(&self.re).expect("regex compilation shouldn't fail");
GlobStrategic { strategy: strategy, re: re }
GlobStrategic { strategy, re }
}
/// Returns the original glob pattern used to build this pattern.
@ -307,10 +306,8 @@ impl Glob {
}
let mut lit = String::new();
for t in &*self.tokens {
match *t {
Token::Literal(c) => lit.push(c),
_ => return None,
}
let Token::Literal(c) = *t else { return None };
lit.push(c);
}
if lit.is_empty() {
None
@ -330,13 +327,12 @@ impl Glob {
if self.opts.case_insensitive {
return None;
}
let start = match self.tokens.get(0) {
Some(&Token::RecursivePrefix) => 1,
Some(_) => 0,
_ => return None,
let start = match *self.tokens.get(0)? {
Token::RecursivePrefix => 1,
_ => 0,
};
match self.tokens.get(start) {
Some(&Token::ZeroOrMore) => {
match *self.tokens.get(start)? {
Token::ZeroOrMore => {
// If there was no recursive prefix, then we only permit
// `*` if `*` can match a `/`. For example, if `*` can't
// match `/`, then `*.c` doesn't match `foo/bar.c`.
@ -346,8 +342,8 @@ impl Glob {
}
_ => return None,
}
match self.tokens.get(start + 1) {
Some(&Token::Literal('.')) => {}
match *self.tokens.get(start + 1)? {
Token::Literal('.') => {}
_ => return None,
}
let mut lit = ".".to_string();
@ -401,8 +397,8 @@ impl Glob {
if self.opts.case_insensitive {
return None;
}
let (end, need_sep) = match self.tokens.last() {
Some(&Token::ZeroOrMore) => {
let (end, need_sep) = match *self.tokens.last()? {
Token::ZeroOrMore => {
if self.opts.literal_separator {
// If a trailing `*` can't match a `/`, then we can't
// assume a match of the prefix corresponds to a match
@ -414,15 +410,13 @@ impl Glob {
}
(self.tokens.len() - 1, false)
}
Some(&Token::RecursiveSuffix) => (self.tokens.len() - 1, true),
Token::RecursiveSuffix => (self.tokens.len() - 1, true),
_ => (self.tokens.len(), false),
};
let mut lit = String::new();
for t in &self.tokens[0..end] {
match *t {
Token::Literal(c) => lit.push(c),
_ => return None,
}
let Token::Literal(c) = *t else { return None };
lit.push(c);
}
if need_sep {
lit.push('/');
@ -451,8 +445,8 @@ impl Glob {
return None;
}
let mut lit = String::new();
let (start, entire) = match self.tokens.get(0) {
Some(&Token::RecursivePrefix) => {
let (start, entire) = match *self.tokens.get(0)? {
Token::RecursivePrefix => {
// We only care if this follows a path component if the next
// token is a literal.
if let Some(&Token::Literal(_)) = self.tokens.get(1) {
@ -464,8 +458,8 @@ impl Glob {
}
_ => (0, false),
};
let start = match self.tokens.get(start) {
Some(&Token::ZeroOrMore) => {
let start = match *self.tokens.get(start)? {
Token::ZeroOrMore => {
// If literal_separator is enabled, then a `*` can't
// necessarily match everything, so reporting a suffix match
// as a match of the pattern would be a false positive.
@ -477,10 +471,8 @@ impl Glob {
_ => start,
};
for t in &self.tokens[start..] {
match *t {
Token::Literal(c) => lit.push(c),
_ => return None,
}
let Token::Literal(c) = *t else { return None };
lit.push(c);
}
if lit.is_empty() || lit == "/" {
None
@ -504,8 +496,8 @@ impl Glob {
if self.opts.case_insensitive {
return None;
}
let start = match self.tokens.get(0) {
Some(&Token::RecursivePrefix) => 1,
let start = match *self.tokens.get(0)? {
Token::RecursivePrefix => 1,
_ => {
// With nothing to gobble up the parent portion of a path,
// we can't assume that matching on only the basename is
@ -516,7 +508,7 @@ impl Glob {
if self.tokens[start..].is_empty() {
return None;
}
for t in &self.tokens[start..] {
for t in self.tokens[start..].iter() {
match *t {
Token::Literal('/') => return None,
Token::Literal(_) => {} // OK
@ -550,16 +542,11 @@ impl Glob {
/// The basic format of these patterns is `**/{literal}`, where `{literal}`
/// does not contain a path separator.
fn basename_literal(&self) -> Option<String> {
let tokens = match self.basename_tokens() {
None => return None,
Some(tokens) => tokens,
};
let tokens = self.basename_tokens()?;
let mut lit = String::new();
for t in tokens {
match *t {
Token::Literal(c) => lit.push(c),
_ => return None,
}
let Token::Literal(c) = *t else { return None };
lit.push(c);
}
Some(lit)
}
@ -570,7 +557,7 @@ impl<'a> GlobBuilder<'a> {
///
/// The pattern is not compiled until `build` is called.
pub fn new(glob: &'a str) -> GlobBuilder<'a> {
GlobBuilder { glob: glob, opts: GlobOptions::default() }
GlobBuilder { glob, opts: GlobOptions::default() }
}
/// Parses and builds the pattern.
@ -600,7 +587,7 @@ impl<'a> GlobBuilder<'a> {
glob: self.glob.to_string(),
re: tokens.to_regex_with(&self.opts),
opts: self.opts,
tokens: tokens,
tokens,
})
}
}
@ -633,6 +620,17 @@ impl<'a> GlobBuilder<'a> {
self.opts.backslash_escape = yes;
self
}
/// Toggle whether an empty pattern in a list of alternates is accepted.
///
/// For example, if this is set then the glob `foo{,.txt}` will match both
/// `foo` and `foo.txt`.
///
/// By default this is false.
pub fn empty_alternates(&mut self, yes: bool) -> &mut GlobBuilder<'a> {
self.opts.empty_alternates = yes;
self
}
}
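
The effect of `empty_alternates` on the generated regex can be sketched without the crate. The function below is a simplified, hypothetical stand-in for the alternation branch of `tokens_to_regex` further down in this diff: it takes regex fragments that have already been translated from each alternate, drops empty ones unless the option is set, and wraps the rest in a non-capturing group. The name `alternates_to_regex` and the input-order output are assumptions of this sketch; the crate's own ordering may differ.

```rust
/// Hypothetical sketch: join already-translated alternates into a
/// non-capturing group, mirroring the `empty_alternates` check in the diff.
pub fn alternates_to_regex(alternates: &[&str], empty_alternates: bool) -> String {
    let mut parts: Vec<&str> = Vec::new();
    for &alt in alternates {
        // An empty alternate is only kept when the option is enabled.
        if !alt.is_empty() || empty_alternates {
            parts.push(alt);
        }
    }
    // With no parts left, emit nothing instead of an empty group.
    if parts.is_empty() {
        return String::new();
    }
    let mut re = String::from("(?:");
    re.push_str(&parts.join("|"));
    re.push(')');
    re
}
```

Under these assumptions, the alternates of `foo{,.txt}` become `(?:\.txt)` by default and `(?:|\.txt)` when the option is enabled, which is consistent with the `matchalt14` through `matchalt16` tests in this diff.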
impl Tokens {
@ -664,7 +662,7 @@ impl Tokens {
tokens: &[Token],
re: &mut String,
) {
for tok in tokens {
for tok in tokens.iter() {
match *tok {
Token::Literal(c) => {
re.push_str(&char_to_escaped_literal(c));
@ -714,7 +712,7 @@ impl Tokens {
for pat in patterns {
let mut altre = String::new();
self.tokens_to_regex(options, &pat, &mut altre);
if !altre.is_empty() {
if !altre.is_empty() || options.empty_alternates {
parts.push(altre);
}
}
@ -722,7 +720,7 @@ impl Tokens {
// It is possible to have an empty set in which case the
// resulting alternation '()' would be an error.
if !parts.is_empty() {
re.push('(');
re.push_str("(?:");
re.push_str(&parts.join("|"));
re.push(')');
}
@ -735,7 +733,9 @@ impl Tokens {
/// Convert a Unicode scalar value to an escaped string suitable for use as
/// a literal in a non-Unicode regex.
fn char_to_escaped_literal(c: char) -> String {
bytes_to_escaped_literal(&c.to_string().into_bytes())
let mut buf = [0; 4];
let bytes = c.encode_utf8(&mut buf).as_bytes();
bytes_to_escaped_literal(bytes)
}
/// Converts an arbitrary sequence of bytes to a UTF-8 string. All non-ASCII
@ -744,9 +744,12 @@ fn bytes_to_escaped_literal(bs: &[u8]) -> String {
let mut s = String::with_capacity(bs.len());
for &b in bs {
if b <= 0x7F {
s.push_str(&regex::escape(&(b as char).to_string()));
regex_syntax::escape_into(
char::from(b).encode_utf8(&mut [0; 4]),
&mut s,
);
} else {
s.push_str(&format!("\\x{:02x}", b));
write!(&mut s, "\\x{:02x}", b).unwrap();
}
}
s
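
The escaping change in `char_to_escaped_literal` and `bytes_to_escaped_literal` avoids intermediate `String` allocations by encoding into a stack buffer and writing directly into the output. A rough standalone sketch of that approach follows. The helper `escape_ascii_into` is a hypothetical, simplified stand-in for `regex_syntax::escape_into` and only approximates its set of meta characters; the other two functions follow the shape of the new code in the diff.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for `regex_syntax::escape_into`: prefix a
/// backslash to characters treated here as regex meta characters.
fn escape_ascii_into(c: char, out: &mut String) {
    const META: &str = r"\.+*?()|[]{}^$#&-~";
    if META.contains(c) {
        out.push('\\');
    }
    out.push(c);
}

/// Escape ASCII bytes as regex literals and emit every other byte as a
/// `\xNN` escape, suitable for a non-Unicode regex.
pub fn bytes_to_escaped_literal(bs: &[u8]) -> String {
    let mut s = String::with_capacity(bs.len());
    for &b in bs {
        if b <= 0x7F {
            escape_ascii_into(char::from(b), &mut s);
        } else {
            // Writing to a String cannot fail, so unwrap is safe here.
            write!(&mut s, "\\x{:02x}", b).unwrap();
        }
    }
    s
}

/// Encode the character as UTF-8 into a stack buffer, then escape its bytes.
pub fn char_to_escaped_literal(c: char) -> String {
    let mut buf = [0; 4];
    let bytes = c.encode_utf8(&mut buf).as_bytes();
    bytes_to_escaped_literal(bytes)
}
```

In this sketch a multi-byte character such as `é` (UTF-8 bytes `C3 A9`) becomes two `\x` escapes, while an ASCII meta character such as `.` gains a backslash.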
@ -755,7 +758,7 @@ fn bytes_to_escaped_literal(bs: &[u8]) -> String {
struct Parser<'a> {
glob: &'a str,
stack: Vec<Tokens>,
chars: iter::Peekable<str::Chars<'a>>,
chars: std::iter::Peekable<std::str::Chars<'a>>,
prev: Option<char>,
cur: Option<char>,
opts: &'a GlobOptions,
@ -763,7 +766,7 @@ struct Parser<'a> {
impl<'a> Parser<'a> {
fn error(&self, kind: ErrorKind) -> Error {
Error { glob: Some(self.glob.to_string()), kind: kind }
Error { glob: Some(self.glob.to_string()), kind }
}
fn parse(&mut self) -> Result<(), Error> {
@ -982,7 +985,7 @@ impl<'a> Parser<'a> {
// it as a literal.
ranges.push(('-', '-'));
}
self.push_token(Token::Class { negated: negated, ranges: ranges })
self.push_token(Token::Class { negated, ranges })
}
fn bump(&mut self) -> Option<char> {
@ -1020,6 +1023,7 @@ mod tests {
casei: Option<bool>,
litsep: Option<bool>,
bsesc: Option<bool>,
ealtre: Option<bool>,
}
macro_rules! syntax {
@ -1059,6 +1063,9 @@ mod tests {
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap();
assert_eq!(format!("(?-u){}", $re), pat.regex());
}
@ -1082,6 +1089,9 @@ mod tests {
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap();
let matcher = pat.compile_matcher();
let strategic = pat.compile_strategic_matcher();
@ -1110,6 +1120,9 @@ mod tests {
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap();
let matcher = pat.compile_matcher();
let strategic = pat.compile_strategic_matcher();
@ -1195,13 +1208,23 @@ mod tests {
syntaxerr!(err_range2, "[z--]", ErrorKind::InvalidRange('z', '-'));
const CASEI: Options =
Options { casei: Some(true), litsep: None, bsesc: None };
Options { casei: Some(true), litsep: None, bsesc: None, ealtre: None };
const SLASHLIT: Options =
Options { casei: None, litsep: Some(true), bsesc: None };
const NOBSESC: Options =
Options { casei: None, litsep: None, bsesc: Some(false) };
Options { casei: None, litsep: Some(true), bsesc: None, ealtre: None };
const NOBSESC: Options = Options {
casei: None,
litsep: None,
bsesc: Some(false),
ealtre: None,
};
const BSESC: Options =
Options { casei: None, litsep: None, bsesc: Some(true) };
Options { casei: None, litsep: None, bsesc: Some(true), ealtre: None };
const EALTRE: Options = Options {
casei: None,
litsep: None,
bsesc: Some(true),
ealtre: Some(true),
};
toregex!(re_casei, "a", "(?i)^a$", &CASEI);
@ -1242,6 +1265,7 @@ mod tests {
toregex!(re32, "/a**", r"^/a.*.*$");
toregex!(re33, "/**a", r"^/.*.*a$");
toregex!(re34, "/a**b", r"^/a.*.*b$");
toregex!(re35, "{a,b}", r"^(?:b|a)$");
matches!(match1, "a", "a");
matches!(match2, "a*b", "a_b");
@ -1326,6 +1350,9 @@ mod tests {
matches!(matchalt11, "{*.foo,*.bar,*.wat}", "test.foo");
matches!(matchalt12, "{*.foo,*.bar,*.wat}", "test.bar");
matches!(matchalt13, "{*.foo,*.bar,*.wat}", "test.wat");
matches!(matchalt14, "foo{,.txt}", "foo.txt");
nmatches!(matchalt15, "foo{,.txt}", "foo");
matches!(matchalt16, "foo{,.txt}", "foo", EALTRE);
matches!(matchslash1, "abc/def", "abc/def", SLASHLIT);
#[cfg(unix)]
@ -1425,6 +1452,9 @@ mod tests {
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap();
assert_eq!($expect, pat.$which());
}


@ -5,11 +5,9 @@ Glob set matching is the process of matching one or more glob patterns against
a single candidate path simultaneously, and returning all of the globs that
matched. For example, given this set of globs:
```ignore
*.rs
src/lib.rs
src/**/foo.rs
```
* `*.rs`
* `src/lib.rs`
* `src/**/foo.rs`
and a path `src/bar/baz/foo.rs`, then the set would report the first and third
globs as matching.
@ -19,7 +17,6 @@ globs as matching.
This example shows how to match a single glob against a single file path.
```
# fn example() -> Result<(), globset::Error> {
use globset::Glob;
let glob = Glob::new("*.rs")?.compile_matcher();
@ -27,7 +24,7 @@ let glob = Glob::new("*.rs")?.compile_matcher();
assert!(glob.is_match("foo.rs"));
assert!(glob.is_match("foo/bar.rs"));
assert!(!glob.is_match("Cargo.toml"));
# Ok(()) } example().unwrap();
# Ok::<(), Box<dyn std::error::Error>>(())
```
# Example: configuring a glob matcher
@ -36,7 +33,6 @@ This example shows how to use a `GlobBuilder` to configure aspects of match
semantics. In this example, we prevent wildcards from matching path separators.
```
# fn example() -> Result<(), globset::Error> {
use globset::GlobBuilder;
let glob = GlobBuilder::new("*.rs")
@ -45,7 +41,7 @@ let glob = GlobBuilder::new("*.rs")
assert!(glob.is_match("foo.rs"));
assert!(!glob.is_match("foo/bar.rs")); // no longer matches
assert!(!glob.is_match("Cargo.toml"));
# Ok(()) } example().unwrap();
# Ok::<(), Box<dyn std::error::Error>>(())
```
# Example: match multiple globs at once
@ -53,7 +49,6 @@ assert!(!glob.is_match("Cargo.toml"));
This example shows how to match multiple glob patterns at once.
```
# fn example() -> Result<(), globset::Error> {
use globset::{Glob, GlobSetBuilder};
let mut builder = GlobSetBuilder::new();
@ -65,7 +60,7 @@ builder.add(Glob::new("src/**/foo.rs")?);
let set = builder.build()?;
assert_eq!(set.matches("src/bar/baz/foo.rs"), vec![0, 2]);
# Ok(()) } example().unwrap();
# Ok::<(), Box<dyn std::error::Error>>(())
```
# Syntax
@ -103,22 +98,31 @@ or to enable case insensitive matching.
#![deny(missing_docs)]
use std::borrow::Cow;
use std::collections::{BTreeMap, HashMap};
use std::error::Error as StdError;
use std::fmt;
use std::hash;
use std::path::Path;
use std::str;
use std::{
borrow::Cow,
panic::{RefUnwindSafe, UnwindSafe},
path::Path,
sync::Arc,
};
use aho_corasick::AhoCorasick;
use bstr::{ByteSlice, ByteVec, B};
use regex::bytes::{Regex, RegexBuilder, RegexSet};
use {
aho_corasick::AhoCorasick,
bstr::{ByteSlice, ByteVec, B},
regex_automata::{
meta::Regex,
util::pool::{Pool, PoolGuard},
PatternSet,
},
};
use crate::{
glob::MatchStrategy,
pathutil::{file_name, file_name_ext, normalize_path},
};
use crate::glob::MatchStrategy;
pub use crate::glob::{Glob, GlobBuilder, GlobMatcher};
use crate::pathutil::{file_name, file_name_ext, normalize_path};
mod fnv;
mod glob;
mod pathutil;
@ -181,7 +185,7 @@ pub enum ErrorKind {
__Nonexhaustive,
}
impl StdError for Error {
impl std::error::Error for Error {
fn description(&self) -> &str {
self.kind.description()
}
@ -227,8 +231,8 @@ impl ErrorKind {
}
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Display for Error {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self.glob {
None => self.kind.fmt(f),
Some(ref glob) => {
@ -238,8 +242,8 @@ impl fmt::Display for Error {
}
}
impl fmt::Display for ErrorKind {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Display for ErrorKind {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match *self {
ErrorKind::InvalidRecursive
| ErrorKind::UnclosedClass
@ -257,30 +261,40 @@ impl fmt::Display for ErrorKind {
}
fn new_regex(pat: &str) -> Result<Regex, Error> {
RegexBuilder::new(pat)
.dot_matches_new_line(true)
.size_limit(10 * (1 << 20))
.dfa_size_limit(10 * (1 << 20))
.build()
.map_err(|err| Error {
let syntax = regex_automata::util::syntax::Config::new()
.utf8(false)
.dot_matches_new_line(true);
let config = Regex::config()
.utf8_empty(false)
.nfa_size_limit(Some(10 * (1 << 20)))
.hybrid_cache_capacity(10 * (1 << 20));
Regex::builder().syntax(syntax).configure(config).build(pat).map_err(
|err| Error {
glob: Some(pat.to_string()),
kind: ErrorKind::Regex(err.to_string()),
},
)
}
fn new_regex_set(pats: Vec<String>) -> Result<Regex, Error> {
let syntax = regex_automata::util::syntax::Config::new()
.utf8(false)
.dot_matches_new_line(true);
let config = Regex::config()
.match_kind(regex_automata::MatchKind::All)
.utf8_empty(false)
.nfa_size_limit(Some(10 * (1 << 20)))
.hybrid_cache_capacity(10 * (1 << 20));
Regex::builder()
.syntax(syntax)
.configure(config)
.build_many(&pats)
.map_err(|err| Error {
glob: None,
kind: ErrorKind::Regex(err.to_string()),
})
}
fn new_regex_set<I, S>(pats: I) -> Result<RegexSet, Error>
where
S: AsRef<str>,
I: IntoIterator<Item = S>,
{
RegexSet::new(pats).map_err(|err| Error {
glob: None,
kind: ErrorKind::Regex(err.to_string()),
})
}
type Fnv = hash::BuildHasherDefault<fnv::FnvHasher>;
/// GlobSet represents a group of globs that can be matched together in a
/// single pass.
#[derive(Clone, Debug)]
@ -290,6 +304,14 @@ pub struct GlobSet {
}
impl GlobSet {
/// Create a new [`GlobSetBuilder`]. A `GlobSetBuilder` can be used to add
/// new patterns. Once all patterns have been added, `build` should be
/// called to produce a `GlobSet`, which can then be used for matching.
#[inline]
pub fn builder() -> GlobSetBuilder {
GlobSetBuilder::new()
}
/// Create an empty `GlobSet`. An empty set matches nothing.
#[inline]
pub fn empty() -> GlobSet {
@ -471,9 +493,9 @@ pub struct GlobSetBuilder {
}
impl GlobSetBuilder {
/// Create a new GlobSetBuilder. A GlobSetBuilder can be used to add new
/// Create a new `GlobSetBuilder`. A `GlobSetBuilder` can be used to add new
/// patterns. Once all patterns have been added, `build` should be called
/// to produce a `GlobSet`, which can then be used for matching.
/// to produce a [`GlobSet`], which can then be used for matching.
pub fn new() -> GlobSetBuilder {
GlobSetBuilder { pats: vec![] }
}
@ -498,20 +520,30 @@ impl GlobSetBuilder {
/// Constructing candidates has a very small cost associated with it, so
/// callers may find it beneficial to amortize that cost when matching a single
/// path against multiple globs or sets of globs.
#[derive(Clone, Debug)]
#[derive(Clone)]
pub struct Candidate<'a> {
path: Cow<'a, [u8]>,
basename: Cow<'a, [u8]>,
ext: Cow<'a, [u8]>,
}
impl<'a> std::fmt::Debug for Candidate<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
f.debug_struct("Candidate")
.field("path", &self.path.as_bstr())
.field("basename", &self.basename.as_bstr())
.field("ext", &self.ext.as_bstr())
.finish()
}
}
impl<'a> Candidate<'a> {
/// Create a new candidate for matching from the given path.
pub fn new<P: AsRef<Path> + ?Sized>(path: &'a P) -> Candidate<'a> {
let path = normalize_path(Vec::from_path_lossy(path.as_ref()));
let basename = file_name(&path).unwrap_or(Cow::Borrowed(B("")));
let ext = file_name_ext(&basename).unwrap_or(Cow::Borrowed(B("")));
Candidate { path: path, basename: basename, ext: ext }
Candidate { path, basename, ext }
}
fn path_prefix(&self, max: usize) -> &[u8] {
@ -575,11 +607,11 @@ impl GlobSetMatchStrategy {
}
#[derive(Clone, Debug)]
struct LiteralStrategy(BTreeMap<Vec<u8>, Vec<usize>>);
struct LiteralStrategy(fnv::HashMap<Vec<u8>, Vec<usize>>);
impl LiteralStrategy {
fn new() -> LiteralStrategy {
LiteralStrategy(BTreeMap::new())
LiteralStrategy(fnv::HashMap::default())
}
fn add(&mut self, global_index: usize, lit: String) {
@ -603,11 +635,11 @@ impl LiteralStrategy {
}
#[derive(Clone, Debug)]
struct BasenameLiteralStrategy(BTreeMap<Vec<u8>, Vec<usize>>);
struct BasenameLiteralStrategy(fnv::HashMap<Vec<u8>, Vec<usize>>);
impl BasenameLiteralStrategy {
fn new() -> BasenameLiteralStrategy {
BasenameLiteralStrategy(BTreeMap::new())
BasenameLiteralStrategy(fnv::HashMap::default())
}
fn add(&mut self, global_index: usize, lit: String) {
@ -637,11 +669,11 @@ impl BasenameLiteralStrategy {
}
#[derive(Clone, Debug)]
struct ExtensionStrategy(HashMap<Vec<u8>, Vec<usize>, Fnv>);
struct ExtensionStrategy(fnv::HashMap<Vec<u8>, Vec<usize>>);
impl ExtensionStrategy {
fn new() -> ExtensionStrategy {
ExtensionStrategy(HashMap::with_hasher(Fnv::default()))
ExtensionStrategy(fnv::HashMap::default())
}
fn add(&mut self, global_index: usize, ext: String) {
@ -735,7 +767,7 @@ impl SuffixStrategy {
}
#[derive(Clone, Debug)]
struct RequiredExtensionStrategy(HashMap<Vec<u8>, Vec<(usize, Regex)>, Fnv>);
struct RequiredExtensionStrategy(fnv::HashMap<Vec<u8>, Vec<(usize, Regex)>>);
impl RequiredExtensionStrategy {
fn is_match(&self, candidate: &Candidate<'_>) -> bool {
@ -776,10 +808,22 @@ impl RequiredExtensionStrategy {
#[derive(Clone, Debug)]
struct RegexSetStrategy {
matcher: RegexSet,
matcher: Regex,
map: Vec<usize>,
// We use a pool of PatternSets to hopefully avoid allocating a fresh one on
// each call.
//
// TODO: In the next semver breaking release, we should drop this pool and
// expose an opaque type that wraps PatternSet. Then callers can provide
// it to `matches_into` directly. Callers might still want to use a pool
// or similar to amortize allocation, but that matches the status quo and
// absolves us of needing to do it here.
patset: Arc<Pool<PatternSet, PatternSetPoolFn>>,
}
type PatternSetPoolFn =
Box<dyn Fn() -> PatternSet + Send + Sync + UnwindSafe + RefUnwindSafe>;
impl RegexSetStrategy {
fn is_match(&self, candidate: &Candidate<'_>) -> bool {
self.matcher.is_match(candidate.path.as_bytes())
@ -790,9 +834,14 @@ impl RegexSetStrategy {
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
for i in self.matcher.matches(candidate.path.as_bytes()) {
let input = regex_automata::Input::new(candidate.path.as_bytes());
let mut patset = self.patset.get();
patset.clear();
self.matcher.which_overlapping_matches(&input, &mut patset);
for i in patset.iter() {
matches.push(self.map[i]);
}
PoolGuard::put(patset);
}
}
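The pooled `PatternSet` above is taken from the pool, cleared, filled, read, and then returned. A minimal std-only sketch of that reuse pattern follows. It is not the crate's implementation: `Scratch`, `ToySet`, and the substring "patterns" are hypothetical stand-ins, and a `Mutex<Vec<_>>` replaces the `Pool` type used in the diff.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a pattern set: one flag per pattern.
struct Scratch {
    hits: Vec<bool>,
}

impl Scratch {
    fn new(pattern_len: usize) -> Scratch {
        Scratch { hits: vec![false; pattern_len] }
    }

    /// Forget the results of any earlier search.
    fn clear(&mut self) {
        for hit in self.hits.iter_mut() {
            *hit = false;
        }
    }
}

/// Toy matcher: pattern `i` matches when the haystack contains `needles[i]`.
struct ToySet {
    needles: Vec<&'static str>,
    pool: Mutex<Vec<Scratch>>,
}

impl ToySet {
    fn new(needles: Vec<&'static str>) -> ToySet {
        ToySet { needles, pool: Mutex::new(Vec::new()) }
    }

    fn matches(&self, haystack: &str) -> Vec<usize> {
        // Take a buffer from the pool, or allocate one if the pool is empty.
        let reused = self.pool.lock().unwrap().pop();
        let mut scratch =
            reused.unwrap_or_else(|| Scratch::new(self.needles.len()));
        // A reused buffer may still hold hits from a previous call.
        scratch.clear();
        for (i, needle) in self.needles.iter().enumerate() {
            if haystack.contains(needle) {
                scratch.hits[i] = true;
            }
        }
        let mut out = Vec::new();
        for (i, &hit) in scratch.hits.iter().enumerate() {
            if hit {
                out.push(i);
            }
        }
        // Hand the buffer back so the next call can reuse it.
        self.pool.lock().unwrap().push(scratch);
        out
    }
}

fn main() {
    let set = ToySet::new(vec!["foo", "bar", "quux"]);
    assert_eq!(set.matches("ZfooZquuxZ"), vec![0, 2]);
    // Without `clear`, the second call would still report patterns 0 and 2.
    assert!(set.matches("nada").is_empty());
}
```

Dropping the `scratch.clear()` call makes the second assertion fail, which is the failure mode the `set_does_not_remember` test in this diff guards against.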
@ -818,7 +867,7 @@ impl MultiStrategyBuilder {
fn prefix(self) -> PrefixStrategy {
PrefixStrategy {
matcher: AhoCorasick::new_auto_configured(&self.literals),
matcher: AhoCorasick::new(&self.literals).unwrap(),
map: self.map,
longest: self.longest,
}
@ -826,28 +875,33 @@ impl MultiStrategyBuilder {
fn suffix(self) -> SuffixStrategy {
SuffixStrategy {
matcher: AhoCorasick::new_auto_configured(&self.literals),
matcher: AhoCorasick::new(&self.literals).unwrap(),
map: self.map,
longest: self.longest,
}
}
fn regex_set(self) -> Result<RegexSetStrategy, Error> {
let matcher = new_regex_set(self.literals)?;
let pattern_len = matcher.pattern_len();
let create: PatternSetPoolFn =
Box::new(move || PatternSet::new(pattern_len));
Ok(RegexSetStrategy {
matcher: new_regex_set(self.literals)?,
matcher,
map: self.map,
patset: Arc::new(Pool::new(create)),
})
}
}
#[derive(Clone, Debug)]
struct RequiredExtensionStrategyBuilder(
HashMap<Vec<u8>, Vec<(usize, String)>>,
fnv::HashMap<Vec<u8>, Vec<(usize, String)>>,
);
impl RequiredExtensionStrategyBuilder {
fn new() -> RequiredExtensionStrategyBuilder {
RequiredExtensionStrategyBuilder(HashMap::new())
RequiredExtensionStrategyBuilder(fnv::HashMap::default())
}
fn add(&mut self, global_index: usize, ext: String, regex: String) {
@ -858,7 +912,7 @@ impl RequiredExtensionStrategyBuilder {
}
fn build(self) -> Result<RequiredExtensionStrategy, Error> {
let mut exts = HashMap::with_hasher(Fnv::default());
let mut exts = fnv::HashMap::default();
for (ext, regexes) in self.0.into_iter() {
exts.insert(ext.clone(), vec![]);
for (global_index, regex) in regexes {
@ -870,11 +924,48 @@ impl RequiredExtensionStrategyBuilder {
}
}
/// Escape meta-characters within the given glob pattern.
///
/// The escaping works by surrounding meta-characters with brackets. For
/// example, `*` becomes `[*]`.
///
/// # Example
///
/// ```
/// use globset::escape;
///
/// assert_eq!(escape("foo*bar"), "foo[*]bar");
/// assert_eq!(escape("foo?bar"), "foo[?]bar");
/// assert_eq!(escape("foo[bar"), "foo[[]bar");
/// assert_eq!(escape("foo]bar"), "foo[]]bar");
/// assert_eq!(escape("foo{bar"), "foo[{]bar");
/// assert_eq!(escape("foo}bar"), "foo[}]bar");
/// ```
pub fn escape(s: &str) -> String {
let mut escaped = String::with_capacity(s.len());
for c in s.chars() {
match c {
// note that ! does not need escaping because it is only special
// inside brackets
'?' | '*' | '[' | ']' | '{' | '}' => {
escaped.push('[');
escaped.push(c);
escaped.push(']');
}
c => {
escaped.push(c);
}
}
}
escaped
}
#[cfg(test)]
mod tests {
use super::{GlobSet, GlobSetBuilder};
use crate::glob::Glob;
use super::{GlobSet, GlobSetBuilder};
#[test]
fn set_works() {
let mut builder = GlobSetBuilder::new();
@ -909,4 +1000,36 @@ mod tests {
assert!(!set.is_match(""));
assert!(!set.is_match("a"));
}
#[test]
fn escape() {
use super::escape;
assert_eq!("foo", escape("foo"));
assert_eq!("foo[*]", escape("foo*"));
assert_eq!("[[][]]", escape("[]"));
assert_eq!("[*][?]", escape("*?"));
assert_eq!("src/[*][*]/[*].rs", escape("src/**/*.rs"));
assert_eq!("bar[[]ab[]]baz", escape("bar[ab]baz"));
assert_eq!("bar[[]!![]]!baz", escape("bar[!!]!baz"));
}
// This tests that regex matching doesn't "remember" the results of
// previous searches. That is, if any memory is reused from a previous
// search, then it should be cleared first.
#[test]
fn set_does_not_remember() {
let mut builder = GlobSetBuilder::new();
builder.add(Glob::new("*foo*").unwrap());
builder.add(Glob::new("*bar*").unwrap());
builder.add(Glob::new("*quux*").unwrap());
let set = builder.build().unwrap();
let matches = set.matches("ZfooZquuxZ");
assert_eq!(2, matches.len());
assert_eq!(0, matches[0]);
assert_eq!(2, matches[1]);
let matches = set.matches("nada");
assert_eq!(0, matches.len());
}
}

View File

@ -4,12 +4,10 @@ use bstr::{ByteSlice, ByteVec};
/// The final component of the path, if it is a normal file.
///
/// If the path terminates in ., .., or consists solely of a root of prefix,
/// file_name will return None.
pub fn file_name<'a>(path: &Cow<'a, [u8]>) -> Option<Cow<'a, [u8]>> {
if path.is_empty() {
return None;
} else if path.last_byte() == Some(b'.') {
/// If the path terminates in `.`, `..`, or consists solely of a root or
/// prefix, file_name will return None.
pub(crate) fn file_name<'a>(path: &Cow<'a, [u8]>) -> Option<Cow<'a, [u8]>> {
if path.last_byte().map_or(true, |b| b == b'.') {
return None;
}
let last_slash = path.rfind_byte(b'/').map(|i| i + 1).unwrap_or(0);
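The hunk above folds the separate "empty path" and "ends in `.`" checks into a single `map_or` call. The sketch below checks that the two forms agree on a few inputs. It is an illustration only: `rejects_old` and `rejects_new` are hypothetical names, and std's `<[u8]>::last` (which yields `Option<&u8>`) is assumed in place of the `last_byte` method that the diff uses.

```rust
/// The two-branch form of the early-return check.
fn rejects_old(path: &[u8]) -> bool {
    if path.is_empty() {
        true
    } else {
        path.last() == Some(&b'.')
    }
}

/// The folded form: a missing last byte (empty path) maps to `true`,
/// otherwise the last byte is compared against `.`.
fn rejects_new(path: &[u8]) -> bool {
    path.last().map_or(true, |&b| b == b'.')
}

fn main() {
    let cases: [&[u8]; 7] =
        [b"", b".", b"..", b"a/b/..", b"a/b.rs", b"/", b"x"];
    for path in cases {
        assert_eq!(rejects_old(path), rejects_new(path));
    }
    assert!(rejects_new(b""));
    assert!(rejects_new(b"a/b/.."));
    assert!(!rejects_new(b"src/lib.rs"));
}
```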
@ -27,7 +25,7 @@ pub fn file_name<'a>(path: &Cow<'a, [u8]>) -> Option<Cow<'a, [u8]>> {
///
/// Note that this does NOT match the semantics of std::path::Path::extension.
/// Namely, the extension includes the `.` and matching is otherwise more
/// liberal. Specifically, the extenion is:
/// liberal. Specifically, the extension is:
///
/// * None, if the file name given is empty;
/// * None, if there is no embedded `.`;
@ -39,7 +37,9 @@ pub fn file_name<'a>(path: &Cow<'a, [u8]>) -> Option<Cow<'a, [u8]>> {
/// a pattern like `*.rs` is obviously trying to match files with a `rs`
/// extension, but it also matches files like `.rs`, which doesn't have an
/// extension according to std::path::Path::extension.
pub fn file_name_ext<'a>(name: &Cow<'a, [u8]>) -> Option<Cow<'a, [u8]>> {
pub(crate) fn file_name_ext<'a>(
name: &Cow<'a, [u8]>,
) -> Option<Cow<'a, [u8]>> {
if name.is_empty() {
return None;
}
@ -60,7 +60,7 @@ pub fn file_name_ext<'a>(name: &Cow<'a, [u8]>) -> Option<Cow<'a, [u8]>> {
/// Normalizes a path to use `/` as a separator everywhere, even on platforms
/// that recognize other characters as separators.
#[cfg(unix)]
pub fn normalize_path(path: Cow<'_, [u8]>) -> Cow<'_, [u8]> {
pub(crate) fn normalize_path(path: Cow<'_, [u8]>) -> Cow<'_, [u8]> {
// UNIX only uses /, so we're good.
path
}
@ -68,11 +68,11 @@ pub fn normalize_path(path: Cow<'_, [u8]>) -> Cow<'_, [u8]> {
/// Normalizes a path to use `/` as a separator everywhere, even on platforms
/// that recognize other characters as separators.
#[cfg(not(unix))]
pub fn normalize_path(mut path: Cow<[u8]>) -> Cow<[u8]> {
pub(crate) fn normalize_path(mut path: Cow<[u8]>) -> Cow<[u8]> {
use std::path::is_separator;
for i in 0..path.len() {
if path[i] == b'/' || !is_separator(path[i] as char) {
if path[i] == b'/' || !is_separator(char::from(path[i])) {
continue;
}
path.to_mut()[i] = b'/';

View File

@ -1,7 +1,9 @@
use serde::de::Error;
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use serde::{
de::{Error, SeqAccess, Visitor},
{Deserialize, Deserializer, Serialize, Serializer},
};
use crate::Glob;
use crate::{Glob, GlobSet, GlobSetBuilder};
impl Serialize for Glob {
fn serialize<S: Serializer>(
@ -12,18 +14,98 @@ impl Serialize for Glob {
}
}
struct GlobVisitor;
impl<'de> Visitor<'de> for GlobVisitor {
type Value = Glob;
fn expecting(
&self,
formatter: &mut std::fmt::Formatter,
) -> std::fmt::Result {
formatter.write_str("a glob pattern")
}
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
where
E: Error,
{
Glob::new(v).map_err(serde::de::Error::custom)
}
}
impl<'de> Deserialize<'de> for Glob {
fn deserialize<D: Deserializer<'de>>(
deserializer: D,
) -> Result<Self, D::Error> {
let glob = <&str as Deserialize>::deserialize(deserializer)?;
Glob::new(glob).map_err(D::Error::custom)
deserializer.deserialize_str(GlobVisitor)
}
}
struct GlobSetVisitor;
impl<'de> Visitor<'de> for GlobSetVisitor {
type Value = GlobSet;
fn expecting(
&self,
formatter: &mut std::fmt::Formatter,
) -> std::fmt::Result {
formatter.write_str("an array of glob patterns")
}
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
where
A: SeqAccess<'de>,
{
let mut builder = GlobSetBuilder::new();
while let Some(glob) = seq.next_element()? {
builder.add(glob);
}
builder.build().map_err(serde::de::Error::custom)
}
}
impl<'de> Deserialize<'de> for GlobSet {
fn deserialize<D: Deserializer<'de>>(
deserializer: D,
) -> Result<Self, D::Error> {
deserializer.deserialize_seq(GlobSetVisitor)
}
}
#[cfg(test)]
mod tests {
use Glob;
use std::collections::HashMap;
use crate::{Glob, GlobSet};
#[test]
fn glob_deserialize_borrowed() {
let string = r#"{"markdown": "*.md"}"#;
let map: HashMap<String, Glob> =
serde_json::from_str(&string).unwrap();
assert_eq!(map["markdown"], Glob::new("*.md").unwrap());
}
#[test]
fn glob_deserialize_owned() {
let string = r#"{"markdown": "*.md"}"#;
let v: serde_json::Value = serde_json::from_str(&string).unwrap();
let map: HashMap<String, Glob> = serde_json::from_value(v).unwrap();
assert_eq!(map["markdown"], Glob::new("*.md").unwrap());
}
#[test]
fn glob_deserialize_error() {
let string = r#"{"error": "["}"#;
let map = serde_json::from_str::<HashMap<String, Glob>>(&string);
assert!(map.is_err());
}
#[test]
fn glob_json_works() {
@ -35,4 +117,12 @@ mod tests {
let de: Glob = serde_json::from_str(&ser).unwrap();
assert_eq!(test_glob, de);
}
#[test]
fn glob_set_deserialize() {
let j = r#" ["src/**/*.rs", "README.md"] "#;
let set: GlobSet = serde_json::from_str(j).unwrap();
assert!(set.is_match("src/lib.rs"));
assert!(!set.is_match("Cargo.lock"));
}
}

View File

@ -1,6 +1,6 @@
[package]
name = "grep"
version = "0.2.8" #:version
version = "0.3.2" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Fast line oriented regex searching as a library.
@ -11,23 +11,23 @@ repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/grep"
readme = "README.md"
keywords = ["regex", "grep", "egrep", "search", "pattern"]
license = "Unlicense OR MIT"
edition = "2018"
edition = "2021"
[dependencies]
grep-cli = { version = "0.1.6", path = "../cli" }
grep-matcher = { version = "0.1.5", path = "../matcher" }
grep-pcre2 = { version = "0.1.5", path = "../pcre2", optional = true }
grep-printer = { version = "0.1.6", path = "../printer" }
grep-regex = { version = "0.1.9", path = "../regex" }
grep-searcher = { version = "0.1.8", path = "../searcher" }
grep-cli = { version = "0.1.11", path = "../cli" }
grep-matcher = { version = "0.1.7", path = "../matcher" }
grep-pcre2 = { version = "0.1.8", path = "../pcre2", optional = true }
grep-printer = { version = "0.2.2", path = "../printer" }
grep-regex = { version = "0.1.13", path = "../regex" }
grep-searcher = { version = "0.1.14", path = "../searcher" }
[dev-dependencies]
termcolor = "1.0.4"
walkdir = "2.2.7"
[features]
simd-accel = ["grep-searcher/simd-accel"]
pcre2 = ["grep-pcre2"]
# This feature is DEPRECATED. Runtime dispatch is used for SIMD now.
# These features are DEPRECATED. Runtime dispatch is used for SIMD now.
simd-accel = []
avx-accel = []

View File

@ -1,14 +1,15 @@
use std::env;
use std::error::Error;
use std::ffi::OsString;
use std::process;
use std::{env, error::Error, ffi::OsString, io::IsTerminal, process};
use grep::cli;
use grep::printer::{ColorSpecs, StandardBuilder};
use grep::regex::RegexMatcher;
use grep::searcher::{BinaryDetection, SearcherBuilder};
use termcolor::ColorChoice;
use walkdir::WalkDir;
use {
grep::{
cli,
printer::{ColorSpecs, StandardBuilder},
regex::RegexMatcher,
searcher::{BinaryDetection, SearcherBuilder},
},
termcolor::ColorChoice,
walkdir::WalkDir,
};
fn main() {
if let Err(err) = try_main() {
@ -36,7 +37,7 @@ fn search(pattern: &str, paths: &[OsString]) -> Result<(), Box<dyn Error>> {
.build();
let mut printer = StandardBuilder::new()
.color_specs(ColorSpecs::default_with_color())
.build(cli::stdout(if cli::is_tty_stdout() {
.build(cli::stdout(if std::io::stdout().is_terminal() {
ColorChoice::Auto
} else {
ColorChoice::Never

View File

@ -12,8 +12,6 @@ are sparse.
A cookbook and a guide are planned.
*/
#![deny(missing_docs)]
pub extern crate grep_cli as cli;
pub extern crate grep_matcher as matcher;
#[cfg(feature = "pcre2")]

View File

@ -1,6 +1,6 @@
[package]
name = "ignore"
version = "0.4.18" #:version
version = "0.4.23" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A fast library for efficiently matching ignore files such as `.gitignore`
@ -12,28 +12,33 @@ repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore"
readme = "README.md"
keywords = ["glob", "ignore", "gitignore", "pattern", "file"]
license = "Unlicense OR MIT"
edition = "2018"
edition = "2021"
[lib]
name = "ignore"
bench = false
[dependencies]
crossbeam-utils = "0.8.0"
globset = { version = "0.4.7", path = "../globset" }
lazy_static = "1.1"
log = "0.4.5"
memchr = "2.1"
regex = "1.1"
same-file = "1.0.4"
thread_local = "1"
walkdir = "2.2.7"
crossbeam-deque = "0.8.3"
globset = { version = "0.4.15", path = "../globset" }
log = "0.4.20"
memchr = "2.6.3"
same-file = "1.0.6"
walkdir = "2.4.0"
[dependencies.regex-automata]
version = "0.4.0"
default-features = false
features = ["std", "perf", "syntax", "meta", "nfa", "hybrid", "dfa-onepass"]
[target.'cfg(windows)'.dependencies.winapi-util]
version = "0.1.2"
[dev-dependencies]
crossbeam-channel = "0.5.0"
bstr = { version = "1.6.2", default-features = false, features = ["std"] }
crossbeam-channel = "0.5.15"
[features]
simd-accel = ["globset/simd-accel"]
# DEPRECATED. It is a no-op. SIMD is done automatically through runtime
# dispatch.
simd-accel = []

View File

@ -1,10 +1,6 @@
use std::env;
use std::io::{self, Write};
use std::path::Path;
use std::thread;
use std::{env, io::Write, path::Path};
use ignore::WalkBuilder;
use walkdir::WalkDir;
use {bstr::ByteVec, ignore::WalkBuilder, walkdir::WalkDir};
fn main() {
let mut path = env::args().nth(1).unwrap();
@ -19,10 +15,11 @@ fn main() {
simple = true;
}
let stdout_thread = thread::spawn(move || {
let mut stdout = io::BufWriter::new(io::stdout());
let stdout_thread = std::thread::spawn(move || {
let mut stdout = std::io::BufWriter::new(std::io::stdout());
for dent in rx {
write_path(&mut stdout, dent.path());
stdout.write(&*Vec::from_path_lossy(dent.path())).unwrap();
stdout.write(b"\n").unwrap();
}
});
@ -65,16 +62,3 @@ impl DirEntry {
}
}
}
#[cfg(unix)]
fn write_path<W: Write>(mut wtr: W, path: &Path) {
use std::os::unix::ffi::OsStrExt;
wtr.write(path.as_os_str().as_bytes()).unwrap();
wtr.write(b"\n").unwrap();
}
#[cfg(not(unix))]
fn write_path<W: Write>(mut wtr: W, path: &Path) {
wtr.write(path.to_string_lossy().as_bytes()).unwrap();
wtr.write(b"\n").unwrap();
}

View File

@ -9,100 +9,118 @@
/// Please try to keep this list sorted lexicographically and wrapped to 79
/// columns (inclusive).
#[rustfmt::skip]
pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("agda", &["*.agda", "*.lagda"]),
("aidl", &["*.aidl"]),
("amake", &["*.mk", "*.bp"]),
("asciidoc", &["*.adoc", "*.asc", "*.asciidoc"]),
("asm", &["*.asm", "*.s", "*.S"]),
("asp", &[
pub(crate) const DEFAULT_TYPES: &[(&[&str], &[&str])] = &[
(&["ada"], &["*.adb", "*.ads"]),
(&["agda"], &["*.agda", "*.lagda"]),
(&["aidl"], &["*.aidl"]),
(&["alire"], &["alire.toml"]),
(&["amake"], &["*.mk", "*.bp"]),
(&["asciidoc"], &["*.adoc", "*.asc", "*.asciidoc"]),
(&["asm"], &["*.asm", "*.s", "*.S"]),
(&["asp"], &[
"*.aspx", "*.aspx.cs", "*.aspx.vb", "*.ascx", "*.ascx.cs",
"*.ascx.vb", "*.asp"
]),
("ats", &["*.ats", "*.dats", "*.sats", "*.hats"]),
("avro", &["*.avdl", "*.avpr", "*.avsc"]),
("awk", &["*.awk"]),
("bazel", &[
(&["ats"], &["*.ats", "*.dats", "*.sats", "*.hats"]),
(&["avro"], &["*.avdl", "*.avpr", "*.avsc"]),
(&["awk"], &["*.awk"]),
(&["bat", "batch"], &["*.bat"]),
(&["bazel"], &[
"*.bazel", "*.bzl", "*.BUILD", "*.bazelrc", "BUILD", "MODULE.bazel",
"WORKSPACE", "WORKSPACE.bazel",
]),
("bitbake", &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
("brotli", &["*.br"]),
("buildstream", &["*.bst"]),
("bzip2", &["*.bz2", "*.tbz2"]),
("c", &["*.[chH]", "*.[chH].in", "*.cats"]),
("cabal", &["*.cabal"]),
("cbor", &["*.cbor"]),
("ceylon", &["*.ceylon"]),
("clojure", &["*.clj", "*.cljc", "*.cljs", "*.cljx"]),
("cmake", &["*.cmake", "CMakeLists.txt"]),
("coffeescript", &["*.coffee"]),
("config", &["*.cfg", "*.conf", "*.config", "*.ini"]),
("coq", &["*.v"]),
("cpp", &[
(&["bitbake"], &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
(&["brotli"], &["*.br"]),
(&["buildstream"], &["*.bst"]),
(&["bzip2"], &["*.bz2", "*.tbz2"]),
(&["c"], &["*.[chH]", "*.[chH].in", "*.cats"]),
(&["cabal"], &["*.cabal"]),
(&["candid"], &["*.did"]),
(&["carp"], &["*.carp"]),
(&["cbor"], &["*.cbor"]),
(&["ceylon"], &["*.ceylon"]),
(&["clojure"], &["*.clj", "*.cljc", "*.cljs", "*.cljx"]),
(&["cmake"], &["*.cmake", "CMakeLists.txt"]),
(&["cmd"], &["*.bat", "*.cmd"]),
(&["cml"], &["*.cml"]),
(&["coffeescript"], &["*.coffee"]),
(&["config"], &["*.cfg", "*.conf", "*.config", "*.ini"]),
(&["coq"], &["*.v"]),
(&["cpp"], &[
"*.[ChH]", "*.cc", "*.[ch]pp", "*.[ch]xx", "*.hh", "*.inl",
"*.[ChH].in", "*.cc.in", "*.[ch]pp.in", "*.[ch]xx.in", "*.hh.in",
]),
("creole", &["*.creole"]),
("crystal", &["Projectfile", "*.cr", "*.ecr", "shard.yml"]),
("cs", &["*.cs"]),
("csharp", &["*.cs"]),
("cshtml", &["*.cshtml"]),
("css", &["*.css", "*.scss"]),
("csv", &["*.csv"]),
("cuda", &["*.cu", "*.cuh"]),
("cython", &["*.pyx", "*.pxi", "*.pxd"]),
("d", &["*.d"]),
("dart", &["*.dart"]),
("dhall", &["*.dhall"]),
("diff", &["*.patch", "*.diff"]),
("docker", &["*Dockerfile*"]),
("dvc", &["Dvcfile", "*.dvc"]),
("ebuild", &["*.ebuild"]),
("edn", &["*.edn"]),
("elisp", &["*.el"]),
("elixir", &["*.ex", "*.eex", "*.exs"]),
("elm", &["*.elm"]),
("erb", &["*.erb"]),
("erlang", &["*.erl", "*.hrl"]),
("fennel", &["*.fnl"]),
("fidl", &["*.fidl"]),
("fish", &["*.fish"]),
("flatbuffers", &["*.fbs"]),
("fortran", &[
(&["creole"], &["*.creole"]),
(&["crystal"], &["Projectfile", "*.cr", "*.ecr", "shard.yml"]),
(&["cs"], &["*.cs"]),
(&["csharp"], &["*.cs"]),
(&["cshtml"], &["*.cshtml"]),
(&["csproj"], &["*.csproj"]),
(&["css"], &["*.css", "*.scss"]),
(&["csv"], &["*.csv"]),
(&["cuda"], &["*.cu", "*.cuh"]),
(&["cython"], &["*.pyx", "*.pxi", "*.pxd"]),
(&["d"], &["*.d"]),
(&["dart"], &["*.dart"]),
(&["devicetree"], &["*.dts", "*.dtsi"]),
(&["dhall"], &["*.dhall"]),
(&["diff"], &["*.patch", "*.diff"]),
(&["dita"], &["*.dita", "*.ditamap", "*.ditaval"]),
(&["docker"], &["*Dockerfile*"]),
(&["dockercompose"], &["docker-compose.yml", "docker-compose.*.yml"]),
(&["dts"], &["*.dts", "*.dtsi"]),
(&["dvc"], &["Dvcfile", "*.dvc"]),
(&["ebuild"], &["*.ebuild", "*.eclass"]),
(&["edn"], &["*.edn"]),
(&["elisp"], &["*.el"]),
(&["elixir"], &["*.ex", "*.eex", "*.exs", "*.heex", "*.leex", "*.livemd"]),
(&["elm"], &["*.elm"]),
(&["erb"], &["*.erb"]),
(&["erlang"], &["*.erl", "*.hrl"]),
(&["fennel"], &["*.fnl"]),
(&["fidl"], &["*.fidl"]),
(&["fish"], &["*.fish"]),
(&["flatbuffers"], &["*.fbs"]),
(&["fortran"], &[
"*.f", "*.F", "*.f77", "*.F77", "*.pfo",
"*.f90", "*.F90", "*.f95", "*.F95",
]),
("fsharp", &["*.fs", "*.fsx", "*.fsi"]),
("fut", &["*.fut"]),
("gap", &["*.g", "*.gap", "*.gi", "*.gd", "*.tst"]),
("gn", &["*.gn", "*.gni"]),
("go", &["*.go"]),
("gradle", &["*.gradle"]),
("groovy", &["*.groovy", "*.gradle"]),
("gzip", &["*.gz", "*.tgz"]),
("h", &["*.h", "*.hh", "*.hpp"]),
("haml", &["*.haml"]),
("hare", &["*.ha"]),
("haskell", &["*.hs", "*.lhs", "*.cpphs", "*.c2hs", "*.hsc"]),
("hbs", &["*.hbs"]),
("hs", &["*.hs", "*.lhs"]),
("html", &["*.htm", "*.html", "*.ejs"]),
("hy", &["*.hy"]),
("idris", &["*.idr", "*.lidr"]),
("janet", &["*.janet"]),
("java", &["*.java", "*.jsp", "*.jspx", "*.properties"]),
("jinja", &["*.j2", "*.jinja", "*.jinja2"]),
("jl", &["*.jl"]),
("js", &["*.js", "*.jsx", "*.vue"]),
("json", &["*.json", "composer.lock"]),
("jsonl", &["*.jsonl"]),
("julia", &["*.jl"]),
("jupyter", &["*.ipynb", "*.jpynb"]),
("k", &["*.k"]),
("kotlin", &["*.kt", "*.kts"]),
("less", &["*.less"]),
("license", &[
(&["fsharp"], &["*.fs", "*.fsx", "*.fsi"]),
(&["fut"], &["*.fut"]),
(&["gap"], &["*.g", "*.gap", "*.gi", "*.gd", "*.tst"]),
(&["gn"], &["*.gn", "*.gni"]),
(&["go"], &["*.go"]),
(&["gprbuild"], &["*.gpr"]),
(&["gradle"], &[
"*.gradle", "*.gradle.kts", "gradle.properties", "gradle-wrapper.*",
"gradlew", "gradlew.bat",
]),
(&["graphql"], &["*.graphql", "*.graphqls"]),
(&["groovy"], &["*.groovy", "*.gradle"]),
(&["gzip"], &["*.gz", "*.tgz"]),
(&["h"], &["*.h", "*.hh", "*.hpp"]),
(&["haml"], &["*.haml"]),
(&["hare"], &["*.ha"]),
(&["haskell"], &["*.hs", "*.lhs", "*.cpphs", "*.c2hs", "*.hsc"]),
(&["hbs"], &["*.hbs"]),
(&["hs"], &["*.hs", "*.lhs"]),
(&["html"], &["*.htm", "*.html", "*.ejs"]),
(&["hy"], &["*.hy"]),
(&["idris"], &["*.idr", "*.lidr"]),
(&["janet"], &["*.janet"]),
(&["java"], &["*.java", "*.jsp", "*.jspx", "*.properties"]),
(&["jinja"], &["*.j2", "*.jinja", "*.jinja2"]),
(&["jl"], &["*.jl"]),
(&["js"], &["*.js", "*.jsx", "*.vue", "*.cjs", "*.mjs"]),
(&["json"], &["*.json", "composer.lock", "*.sarif"]),
(&["jsonl"], &["*.jsonl"]),
(&["julia"], &["*.jl"]),
(&["jupyter"], &["*.ipynb", "*.jpynb"]),
(&["k"], &["*.k"]),
(&["kotlin"], &["*.kt", "*.kts"]),
(&["lean"], &["*.lean"]),
(&["less"], &["*.less"]),
(&["license"], &[
// General
"COPYING", "COPYING[.-]*",
"COPYRIGHT", "COPYRIGHT[.-]*",
@ -129,71 +147,93 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
"MPL-*[0-9]*",
"OFL-*[0-9]*",
]),
("lilypond", &["*.ly", "*.ily"]),
("lisp", &["*.el", "*.jl", "*.lisp", "*.lsp", "*.sc", "*.scm"]),
("lock", &["*.lock", "package-lock.json"]),
("log", &["*.log"]),
("lua", &["*.lua"]),
("lz4", &["*.lz4"]),
("lzma", &["*.lzma"]),
("m4", &["*.ac", "*.m4"]),
("make", &[
(&["lilypond"], &["*.ly", "*.ily"]),
(&["lisp"], &["*.el", "*.jl", "*.lisp", "*.lsp", "*.sc", "*.scm"]),
(&["lock"], &["*.lock", "package-lock.json"]),
(&["log"], &["*.log"]),
(&["lua"], &["*.lua"]),
(&["lz4"], &["*.lz4"]),
(&["lzma"], &["*.lzma"]),
(&["m4"], &["*.ac", "*.m4"]),
(&["make"], &[
"[Gg][Nn][Uu]makefile", "[Mm]akefile",
"[Gg][Nn][Uu]makefile.am", "[Mm]akefile.am",
"[Gg][Nn][Uu]makefile.in", "[Mm]akefile.in",
"*.mk", "*.mak"
]),
("mako", &["*.mako", "*.mao"]),
("man", &["*.[0-9lnpx]", "*.[0-9][cEFMmpSx]"]),
("markdown", &["*.markdown", "*.md", "*.mdown", "*.mkdn"]),
("matlab", &["*.m"]),
("md", &["*.markdown", "*.md", "*.mdown", "*.mkdn"]),
("meson", &["meson.build", "meson_options.txt"]),
("minified", &["*.min.html", "*.min.css", "*.min.js"]),
("mint", &["*.mint"]),
("mk", &["mkfile"]),
("ml", &["*.ml"]),
("msbuild", &[
"*.csproj", "*.fsproj", "*.vcxproj", "*.proj", "*.props", "*.targets",
(&["mako"], &["*.mako", "*.mao"]),
(&["man"], &["*.[0-9lnpx]", "*.[0-9][cEFMmpSx]"]),
(&["markdown", "md"], &[
"*.markdown",
"*.md",
"*.mdown",
"*.mdwn",
"*.mkd",
"*.mkdn",
"*.mdx",
]),
("nim", &["*.nim", "*.nimf", "*.nimble", "*.nims"]),
("nix", &["*.nix"]),
("objc", &["*.h", "*.m"]),
("objcpp", &["*.h", "*.mm"]),
("ocaml", &["*.ml", "*.mli", "*.mll", "*.mly"]),
("org", &["*.org", "*.org_archive"]),
("pants", &["BUILD"]),
("pascal", &["*.pas", "*.dpr", "*.lpr", "*.pp", "*.inc"]),
("pdf", &["*.pdf"]),
("perl", &["*.perl", "*.pl", "*.PL", "*.plh", "*.plx", "*.pm", "*.t"]),
("php", &["*.php", "*.php3", "*.php4", "*.php5", "*.phtml"]),
("po", &["*.po"]),
("pod", &["*.pod"]),
("postscript", &["*.eps", "*.ps"]),
("protobuf", &["*.proto"]),
("ps", &["*.cdxml", "*.ps1", "*.ps1xml", "*.psd1", "*.psm1"]),
("puppet", &["*.erb", "*.pp", "*.rb"]),
("purs", &["*.purs"]),
("py", &["*.py"]),
("qmake", &["*.pro", "*.pri", "*.prf"]),
("qml", &["*.qml"]),
("r", &["*.R", "*.r", "*.Rmd", "*.Rnw"]),
("racket", &["*.rkt"]),
("rdoc", &["*.rdoc"]),
("readme", &["README*", "*README"]),
("red", &["*.r", "*.red", "*.reds"]),
("robot", &["*.robot"]),
("rst", &["*.rst"]),
("ruby", &[
(&["matlab"], &["*.m"]),
(&["meson"], &["meson.build", "meson_options.txt", "meson.options"]),
(&["minified"], &["*.min.html", "*.min.css", "*.min.js"]),
(&["mint"], &["*.mint"]),
(&["mk"], &["mkfile"]),
(&["ml"], &["*.ml"]),
(&["motoko"], &["*.mo"]),
(&["msbuild"], &[
"*.csproj", "*.fsproj", "*.vcxproj", "*.proj", "*.props", "*.targets",
"*.sln",
]),
(&["nim"], &["*.nim", "*.nimf", "*.nimble", "*.nims"]),
(&["nix"], &["*.nix"]),
(&["objc"], &["*.h", "*.m"]),
(&["objcpp"], &["*.h", "*.mm"]),
(&["ocaml"], &["*.ml", "*.mli", "*.mll", "*.mly"]),
(&["org"], &["*.org", "*.org_archive"]),
(&["pants"], &["BUILD"]),
(&["pascal"], &["*.pas", "*.dpr", "*.lpr", "*.pp", "*.inc"]),
(&["pdf"], &["*.pdf"]),
(&["perl"], &["*.perl", "*.pl", "*.PL", "*.plh", "*.plx", "*.pm", "*.t"]),
(&["php"], &[
// note that PHP 6 doesn't exist
// See: https://wiki.php.net/rfc/php6
"*.php", "*.php3", "*.php4", "*.php5", "*.php7", "*.php8",
"*.pht", "*.phtml"
]),
(&["po"], &["*.po"]),
(&["pod"], &["*.pod"]),
(&["postscript"], &["*.eps", "*.ps"]),
(&["prolog"], &["*.pl", "*.pro", "*.prolog", "*.P"]),
(&["protobuf"], &["*.proto"]),
(&["ps"], &["*.cdxml", "*.ps1", "*.ps1xml", "*.psd1", "*.psm1"]),
(&["puppet"], &["*.epp", "*.erb", "*.pp", "*.rb"]),
(&["purs"], &["*.purs"]),
(&["py", "python"], &["*.py", "*.pyi"]),
(&["qmake"], &["*.pro", "*.pri", "*.prf"]),
(&["qml"], &["*.qml"]),
(&["r"], &["*.R", "*.r", "*.Rmd", "*.Rnw"]),
(&["racket"], &["*.rkt"]),
(&["raku"], &[
"*.raku", "*.rakumod", "*.rakudoc", "*.rakutest",
"*.p6", "*.pl6", "*.pm6"
]),
(&["rdoc"], &["*.rdoc"]),
(&["readme"], &["README*", "*README"]),
(&["reasonml"], &["*.re", "*.rei"]),
(&["red"], &["*.r", "*.red", "*.reds"]),
(&["rescript"], &["*.res", "*.resi"]),
(&["robot"], &["*.robot"]),
(&["rst"], &["*.rst"]),
(&["ruby"], &[
// Idiomatic files
"config.ru", "Gemfile", ".irbrc", "Rakefile",
// Extensions
"*.gemspec", "*.rb", "*.rbw"
]),
("rust", &["*.rs"]),
("sass", &["*.sass", "*.scss"]),
("scala", &["*.scala", "*.sbt"]),
("sh", &[
(&["rust"], &["*.rs"]),
(&["sass"], &["*.sass", "*.scss"]),
(&["scala"], &["*.scala", "*.sbt"]),
(&["seed7"], &["*.sd7", "*.s7i"]),
(&["sh"], &[
// Portable/misc. init files
".login", ".logout", ".profile", "profile",
// bash-specific init files
@ -216,59 +256,69 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
// Extensions
"*.bash", "*.csh", "*.ksh", "*.sh", "*.tcsh", "*.zsh",
]),
("slim", &["*.skim", "*.slim", "*.slime"]),
("smarty", &["*.tpl"]),
("sml", &["*.sml", "*.sig"]),
("soy", &["*.soy"]),
("spark", &["*.spark"]),
("spec", &["*.spec"]),
("sql", &["*.sql", "*.psql"]),
("stylus", &["*.styl"]),
("sv", &["*.v", "*.vg", "*.sv", "*.svh", "*.h"]),
("svg", &["*.svg"]),
("swift", &["*.swift"]),
("swig", &["*.def", "*.i"]),
("systemd", &[
(&["slim"], &["*.skim", "*.slim", "*.slime"]),
(&["smarty"], &["*.tpl"]),
(&["sml"], &["*.sml", "*.sig"]),
(&["solidity"], &["*.sol"]),
(&["soy"], &["*.soy"]),
(&["spark"], &["*.spark"]),
(&["spec"], &["*.spec"]),
(&["sql"], &["*.sql", "*.psql"]),
(&["stylus"], &["*.styl"]),
(&["sv"], &["*.v", "*.vg", "*.sv", "*.svh", "*.h"]),
(&["svelte"], &["*.svelte"]),
(&["svg"], &["*.svg"]),
(&["swift"], &["*.swift"]),
(&["swig"], &["*.def", "*.i"]),
(&["systemd"], &[
"*.automount", "*.conf", "*.device", "*.link", "*.mount", "*.path",
"*.scope", "*.service", "*.slice", "*.socket", "*.swap", "*.target",
"*.timer",
]),
("taskpaper", &["*.taskpaper"]),
("tcl", &["*.tcl"]),
("tex", &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib", "*.dtx", "*.ins"]),
("texinfo", &["*.texi"]),
("textile", &["*.textile"]),
("tf", &["*.tf"]),
("thrift", &["*.thrift"]),
("toml", &["*.toml", "Cargo.lock"]),
("ts", &["*.ts", "*.tsx"]),
("twig", &["*.twig"]),
("txt", &["*.txt"]),
("typoscript", &["*.typoscript", "*.ts"]),
("vala", &["*.vala"]),
("vb", &["*.vb"]),
("vcl", &["*.vcl"]),
("verilog", &["*.v", "*.vh", "*.sv", "*.svh"]),
("vhdl", &["*.vhd", "*.vhdl"]),
("vim", &[
(&["taskpaper"], &["*.taskpaper"]),
(&["tcl"], &["*.tcl"]),
(&["tex"], &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib", "*.dtx", "*.ins"]),
(&["texinfo"], &["*.texi"]),
(&["textile"], &["*.textile"]),
(&["tf"], &[
"*.tf", "*.auto.tfvars", "terraform.tfvars", "*.tf.json",
"*.auto.tfvars.json", "terraform.tfvars.json", "*.terraformrc",
"terraform.rc", "*.tfrc", "*.terraform.lock.hcl",
]),
(&["thrift"], &["*.thrift"]),
(&["toml"], &["*.toml", "Cargo.lock"]),
(&["ts", "typescript"], &["*.ts", "*.tsx", "*.cts", "*.mts"]),
(&["twig"], &["*.twig"]),
(&["txt"], &["*.txt"]),
(&["typoscript"], &["*.typoscript", "*.ts"]),
(&["usd"], &["*.usd", "*.usda", "*.usdc"]),
(&["v"], &["*.v", "*.vsh"]),
(&["vala"], &["*.vala"]),
(&["vb"], &["*.vb"]),
(&["vcl"], &["*.vcl"]),
(&["verilog"], &["*.v", "*.vh", "*.sv", "*.svh"]),
(&["vhdl"], &["*.vhd", "*.vhdl"]),
(&["vim"], &[
"*.vim", ".vimrc", ".gvimrc", "vimrc", "gvimrc", "_vimrc", "_gvimrc",
]),
("vimscript", &[
(&["vimscript"], &[
"*.vim", ".vimrc", ".gvimrc", "vimrc", "gvimrc", "_vimrc", "_gvimrc",
]),
("webidl", &["*.idl", "*.webidl", "*.widl"]),
("wiki", &["*.mediawiki", "*.wiki"]),
("xml", &[
(&["vue"], &["*.vue"]),
(&["webidl"], &["*.idl", "*.webidl", "*.widl"]),
(&["wgsl"], &["*.wgsl"]),
(&["wiki"], &["*.mediawiki", "*.wiki"]),
(&["xml"], &[
"*.xml", "*.xml.dist", "*.dtd", "*.xsl", "*.xslt", "*.xsd", "*.xjb",
"*.rng", "*.sch", "*.xhtml",
]),
("xz", &["*.xz", "*.txz"]),
("yacc", &["*.y"]),
("yaml", &["*.yaml", "*.yml"]),
("yang", &["*.yang"]),
("z", &["*.Z"]),
("zig", &["*.zig"]),
("zsh", &[
(&["xz"], &["*.xz", "*.txz"]),
(&["yacc"], &["*.y"]),
(&["yaml"], &["*.yaml", "*.yml"]),
(&["yang"], &["*.yang"]),
(&["z"], &["*.Z"]),
(&["zig"], &["*.zig"]),
(&["zsh"], &[
".zshenv", "zshenv",
".zlogin", "zlogin",
".zlogout", "zlogout",
@ -276,5 +326,27 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
".zshrc", "zshrc",
"*.zsh",
]),
("zstd", &["*.zst", "*.zstd"]),
(&["zstd"], &["*.zst", "*.zstd"]),
];
#[cfg(test)]
mod tests {
use super::DEFAULT_TYPES;
#[test]
fn default_types_are_sorted() {
let mut names = DEFAULT_TYPES.iter().map(|(aliases, _)| aliases[0]);
let Some(mut previous_name) = names.next() else {
return;
};
for name in names {
assert!(
name > previous_name,
r#""{}" should be sorted before "{}" in `DEFAULT_TYPES`"#,
name,
previous_name
);
previous_name = name;
}
}
}
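The sorted-order test above compares only the first alias of each entry (`aliases[0]`), so an entry such as `ts`/`typescript` is ordered by `ts` alone. A minimal, self-contained sketch of that check, together with the alias expansion that `add_defaults` performs, might look like the following. `SAMPLE_TYPES`, `TypeDef`, `first_unsorted`, and `expand` are hypothetical names used for illustration; they are not part of the `ignore` crate.

```rust
/// One table entry: (aliases, globs). Mirrors the shape of the new
/// `DEFAULT_TYPES` rows; the alias itself is an illustrative assumption.
type TypeDef = (&'static [&'static str], &'static [&'static str]);

/// A hypothetical three-row excerpt in the new alias-aware format.
const SAMPLE_TYPES: &[TypeDef] = &[
    (&["toml"], &["*.toml", "Cargo.lock"]),
    (&["ts", "typescript"], &["*.ts", "*.tsx", "*.cts", "*.mts"]),
    (&["zig"], &["*.zig"]),
];

/// Returns the first adjacent pair of primary names that is not strictly
/// ascending, or `None` if the table is sorted. Assumes every entry has at
/// least one alias (indexing `[0]` panics otherwise).
fn first_unsorted(table: &[TypeDef]) -> Option<(&'static str, &'static str)> {
    table.windows(2).find_map(|w| {
        let (prev, next) = (w[0].0[0], w[1].0[0]);
        if next > prev {
            None
        } else {
            Some((prev, next))
        }
    })
}

/// Expands the table into (name, glob) pairs: every alias of an entry is
/// paired with every glob of that entry.
fn expand(table: &[TypeDef]) -> Vec<(&'static str, &'static str)> {
    let mut pairs = Vec::new();
    for &(names, globs) in table {
        for &name in names {
            for &glob in globs {
                pairs.push((name, glob));
            }
        }
    }
    pairs
}
```

Because only the primary name takes part in the ordering check, secondary aliases can be added to an entry without re-sorting the table.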


@ -13,28 +13,34 @@
// with non-obvious failure modes. Alas, such things haven't been documented
// well.
use std::collections::HashMap;
use std::ffi::{OsStr, OsString};
use std::fs::{File, FileType};
use std::io::{self, BufRead};
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};
use std::{
collections::HashMap,
ffi::{OsStr, OsString},
fs::{File, FileType},
io::{self, BufRead},
path::{Path, PathBuf},
sync::{Arc, RwLock, Weak},
};
use crate::gitignore::{self, Gitignore, GitignoreBuilder};
use crate::overrides::{self, Override};
use crate::pathutil::{is_hidden, strip_prefix};
use crate::types::{self, Types};
use crate::walk::DirEntry;
use crate::{Error, Match, PartialErrorBuilder};
use crate::{
gitignore::{self, Gitignore, GitignoreBuilder},
overrides::{self, Override},
pathutil::{is_hidden, strip_prefix},
types::{self, Types},
walk::DirEntry,
{Error, Match, PartialErrorBuilder},
};
/// IgnoreMatch represents information about where a match came from when using
/// the `Ignore` matcher.
#[derive(Clone, Debug)]
pub struct IgnoreMatch<'a>(IgnoreMatchInner<'a>);
#[allow(dead_code)]
pub(crate) struct IgnoreMatch<'a>(IgnoreMatchInner<'a>);
/// IgnoreMatchInner describes precisely where the match information came from.
/// This is private to allow expansion to more matchers in the future.
#[derive(Clone, Debug)]
#[allow(dead_code)]
enum IgnoreMatchInner<'a> {
Override(overrides::Glob<'a>),
Gitignore(&'a gitignore::Glob),
@ -85,7 +91,7 @@ struct IgnoreOptions {
/// Ignore is a matcher useful for recursively walking one or more directories.
#[derive(Clone, Debug)]
pub struct Ignore(Arc<IgnoreInner>);
pub(crate) struct Ignore(Arc<IgnoreInner>);
#[derive(Clone, Debug)]
struct IgnoreInner {
@ -95,7 +101,7 @@ struct IgnoreInner {
/// Note that this is never used during matching, only when adding new
/// parent directory matchers. This avoids needing to rebuild glob sets for
/// parent directories if many paths are being searched.
compiled: Arc<RwLock<HashMap<OsString, Ignore>>>,
compiled: Arc<RwLock<HashMap<OsString, Weak<IgnoreInner>>>>,
/// The path to the directory that this matcher was built from.
dir: PathBuf,
/// An override matcher (default is empty).
@ -134,22 +140,22 @@ struct IgnoreInner {
impl Ignore {
/// Return the directory path of this matcher.
pub fn path(&self) -> &Path {
pub(crate) fn path(&self) -> &Path {
&self.0.dir
}
/// Return true if this matcher has no parent.
pub fn is_root(&self) -> bool {
pub(crate) fn is_root(&self) -> bool {
self.0.parent.is_none()
}
/// Returns true if this matcher was added via the `add_parents` method.
pub fn is_absolute_parent(&self) -> bool {
pub(crate) fn is_absolute_parent(&self) -> bool {
self.0.is_absolute_parent
}
/// Return this matcher's parent, if one exists.
pub fn parent(&self) -> Option<Ignore> {
pub(crate) fn parent(&self) -> Option<Ignore> {
self.0.parent.clone()
}
@ -157,7 +163,7 @@ impl Ignore {
///
/// Note that this can only be called on an `Ignore` matcher with no
/// parents (i.e., `is_root` returns `true`). This will panic otherwise.
pub fn add_parents<P: AsRef<Path>>(
pub(crate) fn add_parents<P: AsRef<Path>>(
&self,
path: P,
) -> (Ignore, Option<Error>) {
@ -194,9 +200,11 @@ impl Ignore {
let mut ig = self.clone();
for parent in parents.into_iter().rev() {
let mut compiled = self.0.compiled.write().unwrap();
if let Some(prebuilt) = compiled.get(parent.as_os_str()) {
ig = prebuilt.clone();
continue;
if let Some(weak) = compiled.get(parent.as_os_str()) {
if let Some(prebuilt) = weak.upgrade() {
ig = Ignore(prebuilt);
continue;
}
}
let (mut igtmp, err) = ig.add_child_path(parent);
errs.maybe_push(err);
@ -208,8 +216,12 @@ impl Ignore {
} else {
false
};
ig = Ignore(Arc::new(igtmp));
compiled.insert(parent.as_os_str().to_os_string(), ig.clone());
let ig_arc = Arc::new(igtmp);
ig = Ignore(ig_arc.clone());
compiled.insert(
parent.as_os_str().to_os_string(),
Arc::downgrade(&ig_arc),
);
}
(ig, errs.into_error_option())
}
@ -222,7 +234,7 @@ impl Ignore {
/// returned if it exists.
///
/// Note that all I/O errors are completely ignored.
pub fn add_child<P: AsRef<Path>>(
pub(crate) fn add_child<P: AsRef<Path>>(
&self,
dir: P,
) -> (Ignore, Option<Error>) {
@ -335,7 +347,7 @@ impl Ignore {
}
/// Like `matched`, but works with a directory entry instead.
pub fn matched_dir_entry<'a>(
pub(crate) fn matched_dir_entry<'a>(
&'a self,
dent: &DirEntry,
) -> Match<IgnoreMatch<'a>> {
@ -442,7 +454,29 @@ impl Ignore {
}
if self.0.opts.parents {
if let Some(abs_parent_path) = self.absolute_base() {
let path = abs_parent_path.join(path);
// What we want to do here is take the absolute base path of
// this directory and join it with the path we're searching.
// The main issue we want to avoid is accidentally duplicating
// directory components, so we try to strip any common prefix
// off of `path`. Overall, this seems a little ham-fisted, but
// it does fix a nasty bug. It should do fine until we overhaul
// this crate.
let dirpath = self.0.dir.as_path();
let path_prefix = match strip_prefix("./", dirpath) {
None => dirpath,
Some(stripped_dot_slash) => stripped_dot_slash,
};
let path = match strip_prefix(path_prefix, path) {
None => abs_parent_path.join(path),
Some(p) => {
let p = match strip_prefix("/", p) {
None => p,
Some(p) => p,
};
abs_parent_path.join(p)
}
};
for ig in
self.parents().skip_while(|ig| !ig.0.is_absolute_parent)
{
@ -498,7 +532,7 @@ impl Ignore {
}
/// Returns an iterator over parent ignore matchers, including this one.
pub fn parents(&self) -> Parents<'_> {
pub(crate) fn parents(&self) -> Parents<'_> {
Parents(Some(self))
}
@ -512,7 +546,7 @@ impl Ignore {
/// An iterator over all parents of an ignore matcher, including itself.
///
/// The lifetime `'a` refers to the lifetime of the initial `Ignore` matcher.
pub struct Parents<'a>(Option<&'a Ignore>);
pub(crate) struct Parents<'a>(Option<&'a Ignore>);
impl<'a> Iterator for Parents<'a> {
type Item = &'a Ignore;
@ -530,7 +564,7 @@ impl<'a> Iterator for Parents<'a> {
/// A builder for creating an Ignore matcher.
#[derive(Clone, Debug)]
pub struct IgnoreBuilder {
pub(crate) struct IgnoreBuilder {
/// The root directory path for this ignore matcher.
dir: PathBuf,
/// An override matcher (default is empty).
@ -550,7 +584,7 @@ impl IgnoreBuilder {
///
/// All relative file paths are resolved with respect to the current
/// working directory.
pub fn new() -> IgnoreBuilder {
pub(crate) fn new() -> IgnoreBuilder {
IgnoreBuilder {
dir: Path::new("").to_path_buf(),
overrides: Arc::new(Override::empty()),
@ -574,7 +608,7 @@ impl IgnoreBuilder {
///
/// The matcher returned won't match anything until ignore rules from
/// directories are added to it.
pub fn build(&self) -> Ignore {
pub(crate) fn build(&self) -> Ignore {
let git_global_matcher = if !self.opts.git_global {
Gitignore::empty()
} else {
@ -616,7 +650,10 @@ impl IgnoreBuilder {
/// By default, no override matcher is used.
///
/// This overrides any previous setting.
pub fn overrides(&mut self, overrides: Override) -> &mut IgnoreBuilder {
pub(crate) fn overrides(
&mut self,
overrides: Override,
) -> &mut IgnoreBuilder {
self.overrides = Arc::new(overrides);
self
}
@ -626,13 +663,13 @@ impl IgnoreBuilder {
/// By default, no file type matcher is used.
///
/// This overrides any previous setting.
pub fn types(&mut self, types: Types) -> &mut IgnoreBuilder {
pub(crate) fn types(&mut self, types: Types) -> &mut IgnoreBuilder {
self.types = Arc::new(types);
self
}
/// Adds a new global ignore matcher from the ignore file path given.
pub fn add_ignore(&mut self, ig: Gitignore) -> &mut IgnoreBuilder {
pub(crate) fn add_ignore(&mut self, ig: Gitignore) -> &mut IgnoreBuilder {
self.explicit_ignores.push(ig);
self
}
@ -643,7 +680,7 @@ impl IgnoreBuilder {
///
/// When specifying multiple names, earlier names have lower precedence than
/// later names.
pub fn add_custom_ignore_filename<S: AsRef<OsStr>>(
pub(crate) fn add_custom_ignore_filename<S: AsRef<OsStr>>(
&mut self,
file_name: S,
) -> &mut IgnoreBuilder {
@ -654,7 +691,7 @@ impl IgnoreBuilder {
/// Enables ignoring hidden files.
///
/// This is enabled by default.
pub fn hidden(&mut self, yes: bool) -> &mut IgnoreBuilder {
pub(crate) fn hidden(&mut self, yes: bool) -> &mut IgnoreBuilder {
self.opts.hidden = yes;
self
}
@ -665,7 +702,7 @@ impl IgnoreBuilder {
/// supported by search tools such as ripgrep and The Silver Searcher.
///
/// This is enabled by default.
pub fn ignore(&mut self, yes: bool) -> &mut IgnoreBuilder {
pub(crate) fn ignore(&mut self, yes: bool) -> &mut IgnoreBuilder {
self.opts.ignore = yes;
self
}
@ -676,7 +713,7 @@ impl IgnoreBuilder {
/// file path given are respected. Otherwise, they are ignored.
///
/// This is enabled by default.
pub fn parents(&mut self, yes: bool) -> &mut IgnoreBuilder {
pub(crate) fn parents(&mut self, yes: bool) -> &mut IgnoreBuilder {
self.opts.parents = yes;
self
}
@ -689,7 +726,7 @@ impl IgnoreBuilder {
/// This overwrites any previous global gitignore setting.
///
/// This is enabled by default.
pub fn git_global(&mut self, yes: bool) -> &mut IgnoreBuilder {
pub(crate) fn git_global(&mut self, yes: bool) -> &mut IgnoreBuilder {
self.opts.git_global = yes;
self
}
@ -700,7 +737,7 @@ impl IgnoreBuilder {
/// man page.
///
/// This is enabled by default.
pub fn git_ignore(&mut self, yes: bool) -> &mut IgnoreBuilder {
pub(crate) fn git_ignore(&mut self, yes: bool) -> &mut IgnoreBuilder {
self.opts.git_ignore = yes;
self
}
@ -711,7 +748,7 @@ impl IgnoreBuilder {
/// `gitignore` man page.
///
/// This is enabled by default.
pub fn git_exclude(&mut self, yes: bool) -> &mut IgnoreBuilder {
pub(crate) fn git_exclude(&mut self, yes: bool) -> &mut IgnoreBuilder {
self.opts.git_exclude = yes;
self
}
@ -721,7 +758,7 @@ impl IgnoreBuilder {
///
/// When disabled, git-related ignore rules are applied even when searching
/// outside a git repository.
pub fn require_git(&mut self, yes: bool) -> &mut IgnoreBuilder {
pub(crate) fn require_git(&mut self, yes: bool) -> &mut IgnoreBuilder {
self.opts.require_git = yes;
self
}
@ -729,7 +766,7 @@ impl IgnoreBuilder {
/// Process ignore files case insensitively
///
/// This is disabled by default.
pub fn ignore_case_insensitive(
pub(crate) fn ignore_case_insensitive(
&mut self,
yes: bool,
) -> &mut IgnoreBuilder {
@ -746,7 +783,7 @@ impl IgnoreBuilder {
/// precedence than later names).
///
/// I/O errors are ignored.
pub fn create_gitignore<T: AsRef<OsStr>>(
pub(crate) fn create_gitignore<T: AsRef<OsStr>>(
dir: &Path,
dir_for_ignorefile: &Path,
names: &[T],
@ -839,22 +876,19 @@ fn resolve_git_commondir(
#[cfg(test)]
mod tests {
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;
use std::{io::Write, path::Path};
use crate::dir::IgnoreBuilder;
use crate::gitignore::Gitignore;
use crate::tests::TempDir;
use crate::Error;
use crate::{
dir::IgnoreBuilder, gitignore::Gitignore, tests::TempDir, Error,
};
fn wfile<P: AsRef<Path>>(path: P, contents: &str) {
let mut file = File::create(path).unwrap();
let mut file = std::fs::File::create(path).unwrap();
file.write_all(contents.as_bytes()).unwrap();
}
fn mkdirp<P: AsRef<Path>>(path: P) {
fs::create_dir_all(path).unwrap();
std::fs::create_dir_all(path).unwrap();
}
fn partial(err: Error) -> Vec<Error> {
@ -1171,7 +1205,7 @@ mod tests {
assert!(ignore.matched("ignore_me", false).is_ignore());
// missing commondir file
assert!(fs::remove_file(commondir_path()).is_ok());
assert!(std::fs::remove_file(commondir_path()).is_ok());
let (_, err) = ib.add_child(td.path().join("linked-worktree"));
// We squash the error in this case, because it occurs in repositories
// that are not linked worktrees but have submodules.
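The `compiled` cache in this file changes from holding `Ignore` values to holding `Weak<IgnoreInner>`, and `add_parents` now calls `upgrade()` before reusing an entry. The general pattern can be sketched with the standard library alone. This is an illustrative reduction, not the crate's code: `Matcher`, `WeakCache`, and `get_or_build` are hypothetical names, and the real matcher carries far more state.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock, Weak};

/// Stand-in for an expensive-to-build matcher (hypothetical).
#[derive(Debug)]
struct Matcher {
    dir: String,
}

/// A cache that remembers matchers without keeping them alive.
#[derive(Default)]
struct WeakCache {
    compiled: RwLock<HashMap<String, Weak<Matcher>>>,
}

impl WeakCache {
    /// Returns the cached matcher for `dir` if some caller still holds a
    /// strong reference to it; otherwise builds a new one and stores a weak
    /// handle to it.
    fn get_or_build(&self, dir: &str) -> Arc<Matcher> {
        let mut compiled = self.compiled.write().unwrap();
        if let Some(live) = compiled.get(dir).and_then(|weak| weak.upgrade()) {
            return live;
        }
        let built = Arc::new(Matcher { dir: dir.to_string() });
        compiled.insert(dir.to_string(), Arc::downgrade(&built));
        built
    }
}
```

The trade-off in this sketch is that the cache never extends a matcher's lifetime: once every strong reference is dropped, the next lookup rebuilds it. A dead `Weak` entry stays in the map until it is overwritten.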


@ -7,20 +7,22 @@ Note that this module implements the specification as described in the
the `git` command line tool.
*/
use std::cell::RefCell;
use std::env;
use std::fs::File;
use std::io::{self, BufRead, Read};
use std::path::{Path, PathBuf};
use std::str;
use std::sync::Arc;
use std::{
fs::File,
io::{BufRead, BufReader, Read},
path::{Path, PathBuf},
sync::Arc,
};
use globset::{Candidate, GlobBuilder, GlobSet, GlobSetBuilder};
use regex::bytes::Regex;
use thread_local::ThreadLocal;
use {
globset::{Candidate, GlobBuilder, GlobSet, GlobSetBuilder},
regex_automata::util::pool::Pool,
};
use crate::pathutil::{is_file_name, strip_prefix};
use crate::{Error, Match, PartialErrorBuilder};
use crate::{
pathutil::{is_file_name, strip_prefix},
Error, Match, PartialErrorBuilder,
};
/// Glob represents a single glob in a gitignore file.
///
@ -82,7 +84,7 @@ pub struct Gitignore {
globs: Vec<Glob>,
num_ignores: u64,
num_whitelists: u64,
matches: Option<Arc<ThreadLocal<RefCell<Vec<usize>>>>>,
matches: Option<Arc<Pool<Vec<usize>>>>,
}
impl Gitignore {
@ -249,8 +251,7 @@ impl Gitignore {
return Match::None;
}
let path = path.as_ref();
let _matches = self.matches.as_ref().unwrap().get_or_default();
let mut matches = _matches.borrow_mut();
let mut matches = self.matches.as_ref().unwrap().get();
let candidate = Candidate::new(path);
self.set.matches_candidate_into(&candidate, &mut *matches);
for &i in matches.iter().rev() {
@ -337,12 +338,12 @@ impl GitignoreBuilder {
.build()
.map_err(|err| Error::Glob { glob: None, err: err.to_string() })?;
Ok(Gitignore {
set: set,
set,
root: self.root.clone(),
globs: self.globs.clone(),
num_ignores: nignore as u64,
num_whitelists: nwhite as u64,
matches: Some(Arc::new(ThreadLocal::default())),
matches: Some(Arc::new(Pool::new(|| vec![]))),
})
}
@ -389,7 +390,8 @@ impl GitignoreBuilder {
Err(err) => return Some(Error::Io(err).with_path(path)),
Ok(file) => file,
};
let rdr = io::BufReader::new(file);
log::debug!("opened gitignore file: {}", path.display());
let rdr = BufReader::new(file);
let mut errs = PartialErrorBuilder::default();
for (i, line) in rdr.lines().enumerate() {
let lineno = (i + 1) as u64;
@ -448,7 +450,7 @@ impl GitignoreBuilder {
return Ok(self);
}
let mut glob = Glob {
from: from,
from,
original: line.to_string(),
actual: String::new(),
is_whitelist: false,
@ -474,10 +476,13 @@ impl GitignoreBuilder {
}
// If it ends with a slash, then this should only match directories,
// but the slash should otherwise not be used while globbing.
if let Some((i, c)) = line.char_indices().rev().nth(0) {
if c == '/' {
glob.is_only_dir = true;
line = &line[..i];
if line.as_bytes().last() == Some(&b'/') {
glob.is_only_dir = true;
line = &line[..line.len() - 1];
// If the slash was escaped, then remove the escape.
// See: https://github.com/BurntSushi/ripgrep/issues/2236
if line.as_bytes().last() == Some(&b'\\') {
line = &line[..line.len() - 1];
}
}
glob.actual = line.to_string();
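The trailing-slash handling in the hunk above can be read in isolation as a small pure function. The sketch below lifts the new logic out of `GitignoreBuilder`; `split_trailing_slash` is a hypothetical helper name, and returning a tuple instead of mutating a `Glob` is a simplification made for illustration.

```rust
/// Splits a gitignore pattern into (pattern without trailing slash,
/// directory-only flag). If the removed slash was preceded by a backslash,
/// that backslash is removed too, following the new code in the hunk above.
fn split_trailing_slash(line: &str) -> (&str, bool) {
    let mut line = line;
    let mut is_only_dir = false;
    if line.as_bytes().last() == Some(&b'/') {
        is_only_dir = true;
        // '/' is a single ASCII byte, so slicing one byte off the end
        // always lands on a UTF-8 character boundary.
        line = &line[..line.len() - 1];
        if line.as_bytes().last() == Some(&b'\\') {
            line = &line[..line.len() - 1];
        }
    }
    (line, is_only_dir)
}
```

Working on bytes rather than `char_indices().rev()` is safe here because both bytes being tested are ASCII, which cannot occur inside a multi-byte UTF-8 sequence.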
@ -530,7 +535,7 @@ impl GitignoreBuilder {
/// Return the file path of the current environment's global gitignore file.
///
/// Note that the file path returned may not exist.
fn gitconfig_excludes_path() -> Option<PathBuf> {
pub fn gitconfig_excludes_path() -> Option<PathBuf> {
// git supports $HOME/.gitconfig and $XDG_CONFIG_HOME/git/config. Notably,
// both can be active at the same time, where $HOME/.gitconfig takes
// precedence. So if $HOME/.gitconfig defines a `core.excludesFile`, then
@ -555,7 +560,7 @@ fn gitconfig_home_contents() -> Option<Vec<u8>> {
};
let mut file = match File::open(home.join(".gitconfig")) {
Err(_) => return None,
Ok(file) => io::BufReader::new(file),
Ok(file) => BufReader::new(file),
};
let mut contents = vec![];
file.read_to_end(&mut contents).ok().map(|_| contents)
@ -564,13 +569,13 @@ fn gitconfig_home_contents() -> Option<Vec<u8>> {
/// Returns the file contents of git's global config file, if one exists, in
/// the user's XDG_CONFIG_HOME directory.
fn gitconfig_xdg_contents() -> Option<Vec<u8>> {
let path = env::var_os("XDG_CONFIG_HOME")
let path = std::env::var_os("XDG_CONFIG_HOME")
.and_then(|x| if x.is_empty() { None } else { Some(PathBuf::from(x)) })
.or_else(|| home_dir().map(|p| p.join(".config")))
.map(|x| x.join("git/config"));
let mut file = match path.and_then(|p| File::open(p).ok()) {
None => return None,
Some(file) => io::BufReader::new(file),
Some(file) => BufReader::new(file),
};
let mut contents = vec![];
file.read_to_end(&mut contents).ok().map(|_| contents)
@ -580,7 +585,7 @@ fn gitconfig_xdg_contents() -> Option<Vec<u8>> {
///
/// Specifically, this respects XDG_CONFIG_HOME.
fn excludes_file_default() -> Option<PathBuf> {
env::var_os("XDG_CONFIG_HOME")
std::env::var_os("XDG_CONFIG_HOME")
.and_then(|x| if x.is_empty() { None } else { Some(PathBuf::from(x)) })
.or_else(|| home_dir().map(|p| p.join(".config")))
.map(|x| x.join("git/ignore"))
@ -589,18 +594,28 @@ fn excludes_file_default() -> Option<PathBuf> {
/// Extract git's `core.excludesfile` config setting from the raw file contents
/// given.
fn parse_excludes_file(data: &[u8]) -> Option<PathBuf> {
use std::sync::OnceLock;
use regex_automata::{meta::Regex, util::syntax};
// N.B. This is the lazy approach, and isn't technically correct, but
// probably works in more circumstances. I guess we would ideally have
// a full INI parser. Yuck.
lazy_static::lazy_static! {
static ref RE: Regex =
Regex::new(r"(?im)^\s*excludesfile\s*=\s*(.+)\s*$").unwrap();
};
let caps = match RE.captures(data) {
None => return None,
Some(caps) => caps,
};
str::from_utf8(&caps[1]).ok().map(|s| PathBuf::from(expand_tilde(s)))
static RE: OnceLock<Regex> = OnceLock::new();
let re = RE.get_or_init(|| {
Regex::builder()
.configure(Regex::config().utf8_empty(false))
.syntax(syntax::Config::new().utf8(false))
.build(r#"(?im-u)^\s*excludesfile\s*=\s*"?\s*(\S+?)\s*"?\s*$"#)
.unwrap()
});
// We don't care about amortizing allocs here I think. This should only
// be called ~once per traversal or so? (Although it's not guaranteed...)
let mut caps = re.create_captures();
re.captures(data, &mut caps);
let span = caps.get_group(1)?;
let candidate = &data[span];
std::str::from_utf8(candidate).ok().map(|s| PathBuf::from(expand_tilde(s)))
}
/// Expands ~ in file paths to the value of $HOME.
@ -614,18 +629,18 @@ fn expand_tilde(path: &str) -> String {
/// Returns the location of the user's home directory.
fn home_dir() -> Option<PathBuf> {
// We're fine with using env::home_dir for now. Its bugs are, IMO, pretty
// minor corner cases. We should still probably eventually migrate to
// the `dirs` crate to get a proper implementation.
// We're fine with using std::env::home_dir for now. Its bugs are, IMO,
// pretty minor corner cases.
#![allow(deprecated)]
env::home_dir()
std::env::home_dir()
}
#[cfg(test)]
mod tests {
use super::{Gitignore, GitignoreBuilder};
use std::path::Path;
use super::{Gitignore, GitignoreBuilder};
fn gi_from_str<P: AsRef<Path>>(root: P, s: &str) -> Gitignore {
let mut builder = GitignoreBuilder::new(root);
builder.add_str(None, s).unwrap();
@ -758,6 +773,22 @@ mod tests {
assert!(super::parse_excludes_file(&data).is_none());
}
#[test]
fn parse_excludes_file4() {
let data = bytes("[core]\nexcludesFile = \"~/foo/bar\"");
let got = super::parse_excludes_file(&data);
assert_eq!(
path_string(got.unwrap()),
super::expand_tilde("~/foo/bar")
);
}
#[test]
fn parse_excludes_file5() {
let data = bytes("[core]\nexcludesFile = \" \"~/foo/bar \" \"");
assert!(super::parse_excludes_file(&data).is_none());
}
// See: https://github.com/BurntSushi/ripgrep/issues/106
#[test]
fn regression_106() {
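The two new tests, `parse_excludes_file4` and `parse_excludes_file5`, pin down how an optionally quoted `excludesFile` value is treated: one pair of surrounding quotes is tolerated, but a value with interior whitespace is rejected. The sketch below approximates that behavior with plain string operations instead of the `regex_automata` pattern used in the diff. It is a deliberately different technique, works on `&str` rather than raw bytes, and is not guaranteed to agree with the regex on every input. `parse_excludes_file_sketch` is a hypothetical name.

```rust
/// Line-based approximation of the `excludesfile` lookup. Returns the first
/// acceptable value, without tilde expansion.
fn parse_excludes_file_sketch(data: &str) -> Option<String> {
    for line in data.lines() {
        // Only lines of the form `key = value` are candidates.
        let Some((key, value)) = line.trim().split_once('=') else {
            continue;
        };
        if !key.trim().eq_ignore_ascii_case("excludesfile") {
            continue;
        }
        // Tolerate at most one leading and one trailing double quote.
        let value = value.trim();
        let value = value.strip_prefix('"').unwrap_or(value);
        let value = value.strip_suffix('"').unwrap_or(value);
        let value = value.trim();
        // Reject empty values and values containing whitespace.
        if !value.is_empty() && !value.contains(char::is_whitespace) {
            return Some(value.to_string());
        }
    }
    None
}
```

As the comment in `parse_excludes_file` already notes, neither approach is a full INI parser.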


@ -46,9 +46,6 @@ See the documentation for `WalkBuilder` for many other options.
#![deny(missing_docs)]
use std::error;
use std::fmt;
use std::io;
use std::path::{Path, PathBuf};
pub use crate::walk::{
@ -101,7 +98,7 @@ pub enum Error {
child: PathBuf,
},
/// An error that occurs when doing I/O, such as reading an ignore file.
Io(io::Error),
Io(std::io::Error),
/// An error that occurs when trying to parse a glob.
Glob {
/// The original glob that caused this error. This glob, when
@ -125,21 +122,23 @@ impl Clone for Error {
match *self {
Error::Partial(ref errs) => Error::Partial(errs.clone()),
Error::WithLineNumber { line, ref err } => {
Error::WithLineNumber { line: line, err: err.clone() }
Error::WithLineNumber { line, err: err.clone() }
}
Error::WithPath { ref path, ref err } => {
Error::WithPath { path: path.clone(), err: err.clone() }
}
Error::WithDepth { depth, ref err } => {
Error::WithDepth { depth: depth, err: err.clone() }
Error::WithDepth { depth, err: err.clone() }
}
Error::Loop { ref ancestor, ref child } => Error::Loop {
ancestor: ancestor.clone(),
child: child.clone(),
},
Error::Io(ref err) => match err.raw_os_error() {
Some(e) => Error::Io(io::Error::from_raw_os_error(e)),
None => Error::Io(io::Error::new(err.kind(), err.to_string())),
Some(e) => Error::Io(std::io::Error::from_raw_os_error(e)),
None => {
Error::Io(std::io::Error::new(err.kind(), err.to_string()))
}
},
Error::Glob { ref glob, ref err } => {
Error::Glob { glob: glob.clone(), err: err.clone() }
@ -183,22 +182,22 @@ impl Error {
}
}
/// Inspect the original [`io::Error`] if there is one.
/// Inspect the original [`std::io::Error`] if there is one.
///
/// [`None`] is returned if the [`Error`] doesn't correspond to an
/// [`io::Error`]. This might happen, for example, when the error was
/// [`std::io::Error`]. This might happen, for example, when the error was
/// produced because a cycle was found in the directory tree while
/// following symbolic links.
///
/// This method returns a borrowed value that is bound to the lifetime of the [`Error`]. To
/// obtain an owned value, the [`into_io_error`] can be used instead.
///
/// > This is the original [`io::Error`] and is _not_ the same as
/// > [`impl From<Error> for std::io::Error`][impl] which contains additional context about the
/// error.
/// > This is the original [`std::io::Error`] and is _not_ the same as
/// > [`impl From<Error> for std::io::Error`][impl] which contains
/// > additional context about the error.
///
/// [`None`]: https://doc.rust-lang.org/stable/std/option/enum.Option.html#variant.None
/// [`io::Error`]: https://doc.rust-lang.org/stable/std/io/struct.Error.html
/// [`std::io::Error`]: https://doc.rust-lang.org/stable/std/io/struct.Error.html
/// [`From`]: https://doc.rust-lang.org/stable/std/convert/trait.From.html
/// [`Error`]: struct.Error.html
/// [`into_io_error`]: struct.Error.html#method.into_io_error
@ -224,10 +223,10 @@ impl Error {
}
/// Similar to [`io_error`] except consumes self to convert to the original
/// [`io::Error`] if one exists.
/// [`std::io::Error`] if one exists.
///
/// [`io_error`]: struct.Error.html#method.io_error
/// [`io::Error`]: https://doc.rust-lang.org/stable/std/io/struct.Error.html
/// [`std::io::Error`]: https://doc.rust-lang.org/stable/std/io/struct.Error.html
pub fn into_io_error(self) -> Option<std::io::Error> {
match self {
Error::Partial(mut errs) => {
@ -268,7 +267,7 @@ impl Error {
/// Turn an error into a tagged error with the given depth.
fn with_depth(self, depth: usize) -> Error {
Error::WithDepth { depth: depth, err: Box::new(self) }
Error::WithDepth { depth, err: Box::new(self) }
}
/// Turn an error into a tagged error with the given file path and line
@ -287,7 +286,7 @@ impl Error {
let depth = err.depth();
if let (Some(anc), Some(child)) = (err.loop_ancestor(), err.path()) {
return Error::WithDepth {
depth: depth,
depth,
err: Box::new(Error::Loop {
ancestor: anc.to_path_buf(),
child: child.to_path_buf(),
@ -295,15 +294,15 @@ impl Error {
};
}
let path = err.path().map(|p| p.to_path_buf());
let mut ig_err = Error::Io(io::Error::from(err));
let mut ig_err = Error::Io(std::io::Error::from(err));
if let Some(path) = path {
ig_err = Error::WithPath { path: path, err: Box::new(ig_err) };
ig_err = Error::WithPath { path, err: Box::new(ig_err) };
}
ig_err
}
}
impl error::Error for Error {
impl std::error::Error for Error {
#[allow(deprecated)]
fn description(&self) -> &str {
match *self {
@ -320,8 +319,8 @@ impl error::Error for Error {
}
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Display for Error {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match *self {
Error::Partial(ref errs) => {
let msgs: Vec<String> =
@ -359,8 +358,8 @@ impl fmt::Display for Error {
}
}
impl From<io::Error> for Error {
fn from(err: io::Error) -> Error {
impl From<std::io::Error> for Error {
fn from(err: std::io::Error) -> Error {
Error::Io(err)
}
}
@ -488,19 +487,18 @@ impl<T> Match<T> {
#[cfg(test)]
mod tests {
use std::env;
use std::error;
use std::fs;
use std::path::{Path, PathBuf};
use std::result;
use std::{
env, fs,
path::{Path, PathBuf},
};
/// A convenient result type alias.
pub type Result<T> =
result::Result<T, Box<dyn error::Error + Send + Sync>>;
pub(crate) type Result<T> =
std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;
macro_rules! err {
($($tt:tt)*) => {
Box::<dyn error::Error + Send + Sync>::from(format!($($tt)*))
Box::<dyn std::error::Error + Send + Sync>::from(format!($($tt)*))
}
}


@ -6,8 +6,10 @@ line tools.
use std::path::Path;
use crate::gitignore::{self, Gitignore, GitignoreBuilder};
use crate::{Error, Match};
use crate::{
gitignore::{self, Gitignore, GitignoreBuilder},
Error, Match,
};
/// Glob represents a single glob in an override matcher.
///
@ -21,9 +23,11 @@ use crate::{Error, Match};
/// The lifetime `'a` refers to the lifetime of the matcher that produced
/// this glob.
#[derive(Clone, Debug)]
#[allow(dead_code)]
pub struct Glob<'a>(GlobInner<'a>);
#[derive(Clone, Debug)]
#[allow(dead_code)]
enum GlobInner<'a> {
/// No glob matched, but the file path should still be ignored.
UnmatchedIgnore,
@ -106,6 +110,7 @@ impl Override {
}
/// Builds a matcher for a set of glob overrides.
#[derive(Clone, Debug)]
pub struct OverrideBuilder {
builder: GitignoreBuilder,
}


@ -1,5 +1,4 @@
use std::ffi::OsStr;
use std::path::Path;
use std::{ffi::OsStr, path::Path};
use crate::walk::DirEntry;
@ -9,7 +8,7 @@ use crate::walk::DirEntry;
///
/// On Unix, this implements a more optimized check.
#[cfg(unix)]
pub fn is_hidden(dent: &DirEntry) -> bool {
pub(crate) fn is_hidden(dent: &DirEntry) -> bool {
use std::os::unix::ffi::OsStrExt;
if let Some(name) = file_name(dent.path()) {
@ -26,7 +25,7 @@ pub fn is_hidden(dent: &DirEntry) -> bool {
/// * The base name of the path starts with a `.`.
/// * The file attributes have the `HIDDEN` property set.
#[cfg(windows)]
pub fn is_hidden(dent: &DirEntry) -> bool {
pub(crate) fn is_hidden(dent: &DirEntry) -> bool {
use std::os::windows::fs::MetadataExt;
use winapi_util::file;
@ -49,7 +48,7 @@ pub fn is_hidden(dent: &DirEntry) -> bool {
///
/// This only returns true if the base name of the path starts with a `.`.
#[cfg(not(any(unix, windows)))]
pub fn is_hidden(dent: &DirEntry) -> bool {
pub(crate) fn is_hidden(dent: &DirEntry) -> bool {
if let Some(name) = file_name(dent.path()) {
name.to_str().map(|s| s.starts_with(".")).unwrap_or(false)
} else {
@ -61,7 +60,7 @@ pub fn is_hidden(dent: &DirEntry) -> bool {
///
/// If `path` doesn't have a prefix `prefix`, then return `None`.
#[cfg(unix)]
pub fn strip_prefix<'a, P: AsRef<Path> + ?Sized>(
pub(crate) fn strip_prefix<'a, P: AsRef<Path> + ?Sized>(
prefix: &'a P,
path: &'a Path,
) -> Option<&'a Path> {
@ -80,7 +79,7 @@ pub fn strip_prefix<'a, P: AsRef<Path> + ?Sized>(
///
/// If `path` doesn't have a prefix `prefix`, then return `None`.
#[cfg(not(unix))]
pub fn strip_prefix<'a, P: AsRef<Path> + ?Sized>(
pub(crate) fn strip_prefix<'a, P: AsRef<Path> + ?Sized>(
prefix: &'a P,
path: &'a Path,
) -> Option<&'a Path> {
@ -90,10 +89,11 @@ pub fn strip_prefix<'a, P: AsRef<Path> + ?Sized>(
/// Returns true if this file path is just a file name. i.e., Its parent is
/// the empty string.
#[cfg(unix)]
pub fn is_file_name<P: AsRef<Path>>(path: P) -> bool {
use memchr::memchr;
pub(crate) fn is_file_name<P: AsRef<Path>>(path: P) -> bool {
use std::os::unix::ffi::OsStrExt;
use memchr::memchr;
let path = path.as_ref().as_os_str().as_bytes();
memchr(b'/', path).is_none()
}
@ -101,7 +101,7 @@ pub fn is_file_name<P: AsRef<Path>>(path: P) -> bool {
/// Returns true if this file path is just a file name. i.e., Its parent is
/// the empty string.
#[cfg(not(unix))]
pub fn is_file_name<P: AsRef<Path>>(path: P) -> bool {
pub(crate) fn is_file_name<P: AsRef<Path>>(path: P) -> bool {
path.as_ref().parent().map(|p| p.as_os_str().is_empty()).unwrap_or(false)
}
@ -110,7 +110,7 @@ pub fn is_file_name<P: AsRef<Path>>(path: P) -> bool {
/// If the path terminates in ., .., or consists solely of a root of prefix,
/// file_name will return None.
#[cfg(unix)]
pub fn file_name<'a, P: AsRef<Path> + ?Sized>(
pub(crate) fn file_name<'a, P: AsRef<Path> + ?Sized>(
path: &'a P,
) -> Option<&'a OsStr> {
use memchr::memrchr;
@ -135,7 +135,7 @@ pub fn file_name<'a, P: AsRef<Path> + ?Sized>(
/// If the path terminates in ., .., or consists solely of a root of prefix,
/// file_name will return None.
#[cfg(not(unix))]
pub fn file_name<'a, P: AsRef<Path> + ?Sized>(
pub(crate) fn file_name<'a, P: AsRef<Path> + ?Sized>(
path: &'a P,
) -> Option<&'a OsStr> {
path.as_ref().file_name()


@ -84,18 +84,14 @@ assert!(matcher.matched("y.cpp", false).is_whitelist());
```
*/
use std::cell::RefCell;
use std::collections::HashMap;
use std::path::Path;
use std::sync::Arc;
use std::{collections::HashMap, path::Path, sync::Arc};
use globset::{GlobBuilder, GlobSet, GlobSetBuilder};
use regex::Regex;
use thread_local::ThreadLocal;
use {
globset::{GlobBuilder, GlobSet, GlobSetBuilder},
regex_automata::util::pool::Pool,
};
use crate::default_types::DEFAULT_TYPES;
use crate::pathutil::file_name;
use crate::{Error, Match};
use crate::{default_types::DEFAULT_TYPES, pathutil::file_name, Error, Match};
/// Glob represents a single glob in a set of file type definitions.
///
@ -181,7 +177,7 @@ pub struct Types {
/// The set of all glob selections, used for actual matching.
set: GlobSet,
/// Temporary storage for globs that match.
matches: Arc<ThreadLocal<RefCell<Vec<usize>>>>,
matches: Arc<Pool<Vec<usize>>>,
}
/// Indicates the type of a selection for a particular file type.
@ -235,7 +231,7 @@ impl Types {
has_selected: false,
glob_to_selection: vec![],
set: GlobSetBuilder::new().build().unwrap(),
matches: Arc::new(ThreadLocal::default()),
matches: Arc::new(Pool::new(|| vec![])),
}
}
@ -283,7 +279,7 @@ impl Types {
return Match::None;
}
};
let mut matches = self.matches.get_or_default().borrow_mut();
let mut matches = self.matches.get();
self.set.matches_into(name, &mut *matches);
// The highest-precedence match is the last one.
if let Some(&i) = matches.last() {
@ -356,12 +352,12 @@ impl TypesBuilder {
.build()
.map_err(|err| Error::Glob { glob: None, err: err.to_string() })?;
Ok(Types {
defs: defs,
selections: selections,
has_selected: has_selected,
glob_to_selection: glob_to_selection,
set: set,
matches: Arc::new(ThreadLocal::default()),
defs,
selections,
has_selected,
glob_to_selection,
set,
matches: Arc::new(Pool::new(|| vec![])),
})
}
@ -419,10 +415,7 @@ impl TypesBuilder {
/// If `name` is `all` or otherwise contains any character that is not a
/// Unicode letter or number, then an error is returned.
pub fn add(&mut self, name: &str, glob: &str) -> Result<(), Error> {
lazy_static::lazy_static! {
static ref RE: Regex = Regex::new(r"^[\pL\pN]+$").unwrap();
};
if name == "all" || !RE.is_match(name) {
if name == "all" || !name.chars().all(|c| c.is_alphanumeric()) {
return Err(Error::InvalidDefinition);
}
let (key, glob) = (name.to_string(), glob.to_string());
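Note on the replacement check in `add`: the following is a minimal sketch, using a hypothetical free function `is_valid_type_name` that is not part of the crate, of what the `chars().all(...)` form accepts. One edge case may be worth checking against the removed regex. `Iterator::all` returns `true` for an empty iterator, so the empty string passes this check, whereas `^[\pL\pN]+$` required at least one character.

```rust
/// Hypothetical helper mirroring the new condition in `TypesBuilder::add`.
/// Returns true when the name would NOT trigger `Error::InvalidDefinition`.
fn is_valid_type_name(name: &str) -> bool {
    // `all` is reserved, and every character must be alphanumeric
    // according to `char::is_alphanumeric` (Unicode-aware).
    name != "all" && name.chars().all(|c| c.is_alphanumeric())
}
```

Under this sketch, `is_valid_type_name("rust")` holds, `is_valid_type_name("c++")` does not, and `is_valid_type_name("")` holds because the `all` check is vacuously true.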
@ -488,9 +481,11 @@ impl TypesBuilder {
/// Add a set of default file type definitions.
pub fn add_defaults(&mut self) -> &mut TypesBuilder {
static MSG: &'static str = "adding a default type should never fail";
for &(name, exts) in DEFAULT_TYPES {
for ext in exts {
self.add(name, ext).expect(MSG);
for &(names, exts) in DEFAULT_TYPES {
for name in names {
for ext in exts {
self.add(name, ext).expect(MSG);
}
}
}
self
@ -537,6 +532,8 @@ mod tests {
"html:*.htm",
"rust:*.rs",
"js:*.js",
"py:*.py",
"python:*.py",
"foo:*.{rs,foo}",
"combo:include:html,rust",
]
@ -551,6 +548,8 @@ mod tests {
matched!(match7, types(), vec!["foo"], vec!["rust"], "main.foo");
matched!(match8, types(), vec!["combo"], vec![], "index.html");
matched!(match9, types(), vec!["combo"], vec![], "lib.rs");
matched!(match10, types(), vec!["py"], vec![], "main.py");
matched!(match11, types(), vec!["python"], vec![], "main.py");
matched!(not, matchnot1, types(), vec!["rust"], vec![], "index.html");
matched!(not, matchnot2, types(), vec![], vec!["rust"], "main.rs");
@ -558,6 +557,8 @@ mod tests {
matched!(not, matchnot4, types(), vec!["rust"], vec!["foo"], "main.rs");
matched!(not, matchnot5, types(), vec!["rust"], vec!["foo"], "main.foo");
matched!(not, matchnot6, types(), vec!["combo"], vec![], "leftpad.js");
matched!(not, matchnot7, types(), vec!["py"], vec![], "index.html");
matched!(not, matchnot8, types(), vec!["python"], vec![], "doc.md");
#[test]
fn test_invalid_defs() {
@ -569,7 +570,7 @@ mod tests {
let original_defs = btypes.definitions();
let bad_defs = vec![
// Reference to type that does not exist
"combo:include:html,python",
"combo:include:html,qwerty",
// Bad format
"combo:foobar:html,rust",
"",


@ -1,23 +1,26 @@
use std::cmp;
use std::ffi::OsStr;
use std::fmt;
use std::fs::{self, FileType, Metadata};
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
use std::vec;
use std::{
cmp::Ordering,
ffi::OsStr,
fs::{self, FileType, Metadata},
io,
path::{Path, PathBuf},
sync::atomic::{AtomicBool, AtomicUsize, Ordering as AtomicOrdering},
sync::Arc,
};
use same_file::Handle;
use walkdir::{self, WalkDir};
use {
crossbeam_deque::{Stealer, Worker as Deque},
same_file::Handle,
walkdir::WalkDir,
};
use crate::dir::{Ignore, IgnoreBuilder};
use crate::gitignore::GitignoreBuilder;
use crate::overrides::Override;
use crate::types::Types;
use crate::{Error, PartialErrorBuilder};
use crate::{
dir::{Ignore, IgnoreBuilder},
gitignore::GitignoreBuilder,
overrides::Override,
types::Types,
Error, PartialErrorBuilder,
};
/// A directory entry with a possible error attached.
///
@ -36,9 +39,7 @@ impl DirEntry {
}
/// The full path that this entry represents.
/// Analogous to [`path`], but moves ownership of the path.
///
/// [`path`]: struct.DirEntry.html#method.path
/// Analogous to [`DirEntry::path`], but moves ownership of the path.
pub fn into_path(self) -> PathBuf {
self.dent.into_path()
}
@ -107,11 +108,11 @@ impl DirEntry {
}
fn new_walkdir(dent: walkdir::DirEntry, err: Option<Error>) -> DirEntry {
DirEntry { dent: DirEntryInner::Walkdir(dent), err: err }
DirEntry { dent: DirEntryInner::Walkdir(dent), err }
}
fn new_raw(dent: DirEntryRaw, err: Option<Error>) -> DirEntry {
DirEntry { dent: DirEntryInner::Raw(dent), err: err }
DirEntry { dent: DirEntryInner::Raw(dent), err }
}
}
@ -251,8 +252,8 @@ struct DirEntryRaw {
metadata: fs::Metadata,
}
impl fmt::Debug for DirEntryRaw {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Debug for DirEntryRaw {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
// Leaving out FileType because it doesn't have a debug impl
// in Rust 1.9. We could add it if we really wanted to by manually
// querying each possible file type. Meh. ---AG
@ -324,7 +325,7 @@ impl DirEntryRaw {
) -> Result<DirEntryRaw, Error> {
let ty = ent.file_type().map_err(|err| {
let err = Error::Io(io::Error::from(err)).with_path(ent.path());
Error::WithDepth { depth: depth, err: Box::new(err) }
Error::WithDepth { depth, err: Box::new(err) }
})?;
DirEntryRaw::from_entry_os(depth, ent, ty)
}
@ -337,13 +338,13 @@ impl DirEntryRaw {
) -> Result<DirEntryRaw, Error> {
let md = ent.metadata().map_err(|err| {
let err = Error::Io(io::Error::from(err)).with_path(ent.path());
Error::WithDepth { depth: depth, err: Box::new(err) }
Error::WithDepth { depth, err: Box::new(err) }
})?;
Ok(DirEntryRaw {
path: ent.path(),
ty: ty,
ty,
follow_link: false,
depth: depth,
depth,
metadata: md,
})
}
@ -358,9 +359,9 @@ impl DirEntryRaw {
Ok(DirEntryRaw {
path: ent.path(),
ty: ty,
ty,
follow_link: false,
depth: depth,
depth,
ino: ent.ino(),
})
}
@ -391,7 +392,7 @@ impl DirEntryRaw {
path: pb,
ty: md.file_type(),
follow_link: link,
depth: depth,
depth,
metadata: md,
})
}
@ -410,7 +411,7 @@ impl DirEntryRaw {
path: pb,
ty: md.file_type(),
follow_link: link,
depth: depth,
depth,
ino: md.ino(),
})
}
@ -494,17 +495,15 @@ pub struct WalkBuilder {
#[derive(Clone)]
enum Sorter {
ByName(
Arc<dyn Fn(&OsStr, &OsStr) -> cmp::Ordering + Send + Sync + 'static>,
),
ByPath(Arc<dyn Fn(&Path, &Path) -> cmp::Ordering + Send + Sync + 'static>),
ByName(Arc<dyn Fn(&OsStr, &OsStr) -> Ordering + Send + Sync + 'static>),
ByPath(Arc<dyn Fn(&Path, &Path) -> Ordering + Send + Sync + 'static>),
}
#[derive(Clone)]
struct Filter(Arc<dyn Fn(&DirEntry) -> bool + Send + Sync + 'static>);
impl fmt::Debug for WalkBuilder {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Debug for WalkBuilder {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("WalkBuilder")
.field("paths", &self.paths)
.field("ig_builder", &self.ig_builder)
@ -578,7 +577,7 @@ impl WalkBuilder {
.into_iter();
let ig_root = self.ig_builder.build();
Walk {
its: its,
its,
it: None,
ig_root: ig_root.clone(),
ig: ig_root.clone(),
@ -592,7 +591,7 @@ impl WalkBuilder {
///
/// Note that this *doesn't* return something that implements `Iterator`.
/// Instead, the returned value must be run with a closure. e.g.,
/// `builder.build_parallel().run(|| |path| println!("{:?}", path))`.
/// `builder.build_parallel().run(|| |path| { println!("{path:?}"); WalkState::Continue })`.
pub fn build_parallel(&self) -> WalkParallel {
WalkParallel {
paths: self.paths.clone().into_iter(),
@ -828,7 +827,7 @@ impl WalkBuilder {
/// Note that this is not used in the parallel iterator.
pub fn sort_by_file_path<F>(&mut self, cmp: F) -> &mut WalkBuilder
where
F: Fn(&Path, &Path) -> cmp::Ordering + Send + Sync + 'static,
F: Fn(&Path, &Path) -> Ordering + Send + Sync + 'static,
{
self.sorter = Some(Sorter::ByPath(Arc::new(cmp)));
self
@ -847,7 +846,7 @@ impl WalkBuilder {
/// Note that this is not used in the parallel iterator.
pub fn sort_by_file_name<F>(&mut self, cmp: F) -> &mut WalkBuilder
where
F: Fn(&OsStr, &OsStr) -> cmp::Ordering + Send + Sync + 'static,
F: Fn(&OsStr, &OsStr) -> Ordering + Send + Sync + 'static,
{
self.sorter = Some(Sorter::ByName(Arc::new(cmp)));
self
@ -911,7 +910,7 @@ impl WalkBuilder {
/// ignore files like `.gitignore` are respected. The precise matching rules
/// and precedence are explained in the documentation for `WalkBuilder`.
pub struct Walk {
its: vec::IntoIter<(PathBuf, Option<WalkEventIter>)>,
its: std::vec::IntoIter<(PathBuf, Option<WalkEventIter>)>,
it: Option<WalkEventIter>,
ig_root: Ignore,
ig: Ignore,
@ -941,7 +940,7 @@ impl Walk {
// overheads; an example of this was a bespoke filesystem layer in
// Windows that hosted files remotely and would download them on-demand
// when particular filesystem operations occurred. Users of this system
// who ensured correct file-type fileters were being used could still
// who ensured correct file-type filters were being used could still
// get unnecessary file access resulting in large downloads.
if should_skip_entry(&self.ig, ent) {
return Ok(true);
@ -1040,6 +1039,8 @@ impl Iterator for Walk {
}
}
impl std::iter::FusedIterator for Walk {}
/// WalkEventIter transforms a WalkDir iterator into an iterator that more
/// accurately describes the directory tree. Namely, it emits events that are
/// one of three types: directory, file or "exit." An "exit" event means that
@ -1123,10 +1124,10 @@ impl WalkState {
}
}
/// A builder for constructing a visitor when using
/// [`WalkParallel::visit`](struct.WalkParallel.html#method.visit). The builder
/// will be called for each thread started by `WalkParallel`. The visitor
/// returned from each builder is then called for every directory entry.
/// A builder for constructing a visitor when using [`WalkParallel::visit`].
/// The builder will be called for each thread started by `WalkParallel`. The
/// visitor returned from each builder is then called for every directory
/// entry.
pub trait ParallelVisitorBuilder<'s> {
/// Create per-thread `ParallelVisitor`s for `WalkParallel`.
fn build(&mut self) -> Box<dyn ParallelVisitor + 's>;
@ -1143,9 +1144,8 @@ impl<'a, 's, P: ParallelVisitorBuilder<'s>> ParallelVisitorBuilder<'s>
/// Receives files and directories for the current thread.
///
/// Setup for the traversal can be implemented as part of
/// [`ParallelVisitorBuilder::build`](trait.ParallelVisitorBuilder.html#tymethod.build).
/// Teardown when traversal finishes can be implemented by implementing the
/// `Drop` trait on your traversal type.
/// [`ParallelVisitorBuilder::build`]. Teardown when traversal finishes can be
/// implemented by implementing the `Drop` trait on your traversal type.
pub trait ParallelVisitor: Send {
/// Receives files and directories for the current thread. This is called
/// once for every directory entry visited by traversal.
@ -1187,7 +1187,7 @@ impl<'s> ParallelVisitor for FnVisitorImp<'s> {
///
/// Unlike `Walk`, this uses multiple threads for traversing a directory.
pub struct WalkParallel {
paths: vec::IntoIter<PathBuf>,
paths: std::vec::IntoIter<PathBuf>,
ig_root: Ignore,
max_filesize: Option<u64>,
max_depth: Option<usize>,
@ -1228,9 +1228,8 @@ impl WalkParallel {
/// can be merged together into a single data structure.
pub fn visit(mut self, builder: &mut dyn ParallelVisitorBuilder<'_>) {
let threads = self.threads();
let stack = Arc::new(Mutex::new(vec![]));
let mut stack = vec![];
{
let mut stack = stack.lock().unwrap();
let mut visitor = builder.build();
let mut paths = Vec::new().into_iter();
std::mem::swap(&mut paths, &mut self.paths);
@ -1268,9 +1267,9 @@ impl WalkParallel {
}
};
stack.push(Message::Work(Work {
dent: dent,
dent,
ignore: self.ig_root.clone(),
root_device: root_device,
root_device,
}));
}
// ... but there's no need to start workers if we don't need them.
@ -1280,29 +1279,28 @@ impl WalkParallel {
}
// Create the workers and then wait for them to finish.
let quit_now = Arc::new(AtomicBool::new(false));
let num_pending =
Arc::new(AtomicUsize::new(stack.lock().unwrap().len()));
crossbeam_utils::thread::scope(|s| {
let mut handles = vec![];
for _ in 0..threads {
let worker = Worker {
let active_workers = Arc::new(AtomicUsize::new(threads));
let stacks = Stack::new_for_each_thread(threads, stack);
std::thread::scope(|s| {
let handles: Vec<_> = stacks
.into_iter()
.map(|stack| Worker {
visitor: builder.build(),
stack: stack.clone(),
stack,
quit_now: quit_now.clone(),
num_pending: num_pending.clone(),
active_workers: active_workers.clone(),
max_depth: self.max_depth,
max_filesize: self.max_filesize,
follow_links: self.follow_links,
skip: self.skip.clone(),
filter: self.filter.clone(),
};
handles.push(s.spawn(|_| worker.run()));
}
})
.map(|worker| s.spawn(|| worker.run()))
.collect();
for handle in handles {
handle.join().unwrap();
}
})
.unwrap(); // Pass along panics from threads
});
}
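Note on the move from `crossbeam_utils::thread::scope` to `std::thread::scope` in `visit`: a minimal standard-library sketch of the same spawn-then-join shape. The function `run_workers` and its summing workload are hypothetical and stand in for `Worker::run`. As in the hunk, the handles are collected first so that every thread is spawned before any of them is joined.

```rust
/// Hypothetical example: each "worker" sums one job and the results are
/// combined after all threads have been joined.
fn run_workers(jobs: Vec<Vec<u64>>) -> u64 {
    std::thread::scope(|s| {
        // Collecting forces every spawn to happen before the first join.
        let handles: Vec<_> = jobs
            .iter()
            .map(|job| s.spawn(move || job.iter().sum::<u64>()))
            .collect();
        // `join` returns Err if the thread panicked; unwrap re-raises it.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}
```

Scoped threads may borrow from the enclosing stack frame (here, `jobs`), which is why no `Arc` is needed for data that outlives the scope.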
fn threads(&self) -> usize {
@ -1388,6 +1386,73 @@ impl Work {
}
}
/// A work-stealing stack.
#[derive(Debug)]
struct Stack {
/// This thread's index.
index: usize,
/// The thread-local stack.
deque: Deque<Message>,
/// The work stealers.
stealers: Arc<[Stealer<Message>]>,
}
impl Stack {
/// Create a work-stealing stack for each thread. The given messages
/// correspond to the initial paths to start the search at. They will
/// be distributed automatically to each stack in a round-robin fashion.
fn new_for_each_thread(threads: usize, init: Vec<Message>) -> Vec<Stack> {
// Using new_lifo() ensures each worker operates depth-first, not
// breadth-first. We do depth-first because a breadth first traversal
// on wide directories with a lot of gitignores is disastrous (for
// example, searching a directory tree containing all of crates.io).
let deques: Vec<Deque<Message>> =
std::iter::repeat_with(Deque::new_lifo).take(threads).collect();
let stealers = Arc::<[Stealer<Message>]>::from(
deques.iter().map(Deque::stealer).collect::<Vec<_>>(),
);
let stacks: Vec<Stack> = deques
.into_iter()
.enumerate()
.map(|(index, deque)| Stack {
index,
deque,
stealers: stealers.clone(),
})
.collect();
// Distribute the initial messages.
init.into_iter()
.zip(stacks.iter().cycle())
.for_each(|(m, s)| s.push(m));
stacks
}
/// Push a message.
fn push(&self, msg: Message) {
self.deque.push(msg);
}
/// Pop a message.
fn pop(&self) -> Option<Message> {
self.deque.pop().or_else(|| self.steal())
}
/// Steal a message from another queue.
fn steal(&self) -> Option<Message> {
// For fairness, try to steal from index + 1, index + 2, ... len - 1,
// then wrap around to 0, 1, ... index - 1.
let (left, right) = self.stealers.split_at(self.index);
// Don't steal from ourselves
let right = &right[1..];
right
.iter()
.chain(left.iter())
.map(|s| s.steal_batch_and_pop(&self.deque))
.find_map(|s| s.success())
}
}
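Note on `Stack::steal`: the order in which victims are tried can be sketched with plain indices. The helper `steal_order` below is hypothetical and uses only the standard library. It applies the same `split_at` and `[1..]` slicing as the method above, over thread indices rather than `Stealer` handles.

```rust
/// Hypothetical helper: the order in which thread `index` out of `len`
/// threads would try to steal from its peers. Requires `index < len`.
fn steal_order(index: usize, len: usize) -> Vec<usize> {
    let all: Vec<usize> = (0..len).collect();
    // left = 0..index, right = index..len
    let (left, right) = all.split_at(index);
    // Skip our own slot, which is the first element of `right`.
    let right = &right[1..];
    // index + 1, ..., len - 1, then wrap around to 0, ..., index - 1.
    right.iter().chain(left.iter()).copied().collect()
}
```

For example, thread 2 of 5 would try threads 3, 4, 0, 1 in that order, and a single thread has no one to steal from.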
/// A worker is responsible for descending into directories, updating the
/// ignore matchers, producing new work and invoking the caller's callback.
///
@ -1395,19 +1460,19 @@ impl Work {
struct Worker<'s> {
/// The caller's callback.
visitor: Box<dyn ParallelVisitor + 's>,
/// A stack of work to do.
/// A work-stealing stack of work to do.
///
/// We use a stack instead of a channel because a stack lets us visit
/// directories in depth first order. This can substantially reduce peak
/// memory usage by keeping both the number of files path and gitignore
/// memory usage by keeping both the number of file paths and gitignore
/// matchers in memory lower.
stack: Arc<Mutex<Vec<Message>>>,
stack: Stack,
/// Whether all workers should terminate at the next opportunity. Note
/// that we need this because we don't want other `Work` to be done after
/// we quit. We wouldn't need this if we had a priority channel.
quit_now: Arc<AtomicBool>,
/// The number of outstanding work items.
num_pending: Arc<AtomicUsize>,
/// The number of currently active workers.
active_workers: Arc<AtomicUsize>,
/// The maximum depth of directories to descend. A value of `0` means no
/// descension at all.
max_depth: Option<usize>,
@ -1435,7 +1500,6 @@ impl<'s> Worker<'s> {
if let WalkState::Quit = self.run_one(work) {
self.quit_now();
}
self.work_done();
}
}
@ -1617,23 +1681,20 @@ impl<'s> Worker<'s> {
return None;
}
None => {
// Once num_pending reaches 0, it is impossible for it to
// ever increase again. Namely, it only reaches 0 once
// all jobs have run such that no jobs have produced more
// work. We have this guarantee because num_pending is
// always incremented before each job is submitted and only
// decremented once each job is completely finished.
// Therefore, if this reaches zero, then there can be no
// other job running.
if self.num_pending() == 0 {
// Every other thread is blocked at the next recv().
// Send the initial quit message and quit.
if self.deactivate_worker() == 0 {
// If deactivate_worker() returns 0, every worker thread
// is currently within the critical section between the
// acquire in deactivate_worker() and the release in
// activate_worker() below. For this to happen, every
// worker's local deque must be simultaneously empty,
// meaning there is no more work left at all.
self.send_quit();
return None;
}
// Wait for next `Work` or `Quit` message.
loop {
if let Some(v) = self.recv() {
self.activate_worker();
value = Some(v);
break;
}
@ -1641,7 +1702,8 @@ impl<'s> Worker<'s> {
// CPU waiting, we let the thread sleep for a bit. In
// general, this tends to only occur once the search is
// approaching termination.
thread::sleep(Duration::from_millis(1));
let dur = std::time::Duration::from_millis(1);
std::thread::sleep(dur);
}
}
}
@ -1650,41 +1712,37 @@ impl<'s> Worker<'s> {
/// Indicates that all workers should quit immediately.
fn quit_now(&self) {
self.quit_now.store(true, Ordering::SeqCst);
self.quit_now.store(true, AtomicOrdering::SeqCst);
}
/// Returns true if this worker should quit immediately.
fn is_quit_now(&self) -> bool {
self.quit_now.load(Ordering::SeqCst)
}
/// Returns the number of pending jobs.
fn num_pending(&self) -> usize {
self.num_pending.load(Ordering::SeqCst)
self.quit_now.load(AtomicOrdering::SeqCst)
}
/// Send work.
fn send(&self, work: Work) {
self.num_pending.fetch_add(1, Ordering::SeqCst);
let mut stack = self.stack.lock().unwrap();
stack.push(Message::Work(work));
self.stack.push(Message::Work(work));
}
/// Send a quit message.
fn send_quit(&self) {
let mut stack = self.stack.lock().unwrap();
stack.push(Message::Quit);
self.stack.push(Message::Quit);
}
/// Receive work.
fn recv(&self) -> Option<Message> {
let mut stack = self.stack.lock().unwrap();
stack.pop()
self.stack.pop()
}
/// Signal that work has been received.
fn work_done(&self) {
self.num_pending.fetch_sub(1, Ordering::SeqCst);
/// Deactivates a worker and returns the number of currently active workers.
fn deactivate_worker(&self) -> usize {
self.active_workers.fetch_sub(1, AtomicOrdering::Acquire) - 1
}
/// Reactivates a worker.
fn activate_worker(&self) {
self.active_workers.fetch_add(1, AtomicOrdering::Release);
}
}
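Note on the termination protocol: a minimal sketch of the `active_workers` counter in isolation, assuming a hypothetical wrapper type `ActiveWorkers`. It mirrors the orderings used in the hunk above. A worker that finds no work deactivates itself, and the worker that observes the count reach zero is the one that sends the quit message.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the shared `active_workers` counter.
struct ActiveWorkers(AtomicUsize);

impl ActiveWorkers {
    /// All workers start out active.
    fn new(threads: usize) -> Self {
        ActiveWorkers(AtomicUsize::new(threads))
    }

    /// Marks one worker as idle and returns how many remain active.
    fn deactivate(&self) -> usize {
        // fetch_sub returns the previous value, so subtract one more.
        self.0.fetch_sub(1, Ordering::Acquire) - 1
    }

    /// Marks one idle worker as active again after it received work.
    fn activate(&self) {
        self.0.fetch_add(1, Ordering::Release);
    }
}
```

With two workers, one going idle leaves one active. If it then finds stolen work and reactivates, the count only reaches zero once both are idle at the same time.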


@ -1,6 +1,6 @@
[package]
name = "grep-matcher"
version = "0.1.5" #:version
version = "0.1.7" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A trait for regular expressions, with a focus on line oriented search.
@ -12,13 +12,13 @@ readme = "README.md"
keywords = ["regex", "pattern", "trait"]
license = "Unlicense OR MIT"
autotests = false
edition = "2018"
edition = "2021"
[dependencies]
memchr = "2.1"
memchr = "2.6.3"
[dev-dependencies]
regex = "1.1"
regex = "1.9.5"
[[test]]
name = "integration"


@ -1,5 +1,3 @@
use std::str;
use memchr::memchr;
/// Interpolate capture references in `replacement` and write the interpolation
@ -12,6 +10,7 @@ use memchr::memchr;
/// of a capture group reference and is expected to resolve the index to its
/// corresponding matched text. If no such match exists, then `append` should
/// not write anything to its given buffer.
#[inline]
pub fn interpolate<A, N>(
mut replacement: &[u8],
mut append: A,
@ -77,12 +76,14 @@ enum Ref<'a> {
}
impl<'a> From<&'a str> for Ref<'a> {
#[inline]
fn from(x: &'a str) -> Ref<'a> {
Ref::Named(x)
}
}
impl From<usize> for Ref<'static> {
#[inline]
fn from(x: usize) -> Ref<'static> {
Ref::Number(x)
}
@ -92,6 +93,7 @@ impl From<usize> for Ref<'static> {
/// starting at the beginning of `replacement`.
///
/// If no such valid reference could be found, None is returned.
#[inline]
fn find_cap_ref(replacement: &[u8]) -> Option<CaptureRef<'_>> {
let mut i = 0;
if replacement.len() <= 1 || replacement[0] != b'$' {
@ -114,7 +116,7 @@ fn find_cap_ref(replacement: &[u8]) -> Option<CaptureRef<'_>> {
// therefore be valid UTF-8. If we really cared, we could avoid this UTF-8
// check with an unchecked conversion or by parsing the number straight
// from &[u8].
let cap = str::from_utf8(&replacement[i..cap_end])
let cap = std::str::from_utf8(&replacement[i..cap_end])
.expect("valid UTF-8 capture name");
if brace {
if !replacement.get(cap_end).map_or(false, |&b| b == b'}') {
@ -132,6 +134,7 @@ fn find_cap_ref(replacement: &[u8]) -> Option<CaptureRef<'_>> {
}
/// Returns true if and only if the given byte is allowed in a capture name.
#[inline]
fn is_valid_cap_letter(b: &u8) -> bool {
match *b {
b'0'..=b'9' | b'a'..=b'z' | b'A'..=b'Z' | b'_' => true,


@ -6,12 +6,10 @@ the search routines provided by the
[`grep-searcher`](https://docs.rs/grep-searcher)
crate.
The primary thing provided by this crate is the
[`Matcher`](trait.Matcher.html)
trait. The trait defines an abstract interface for text search. It is robust
enough to support everything from basic substring search all the way to
arbitrarily complex regular expression implementations without sacrificing
performance.
The primary thing provided by this crate is the [`Matcher`] trait. The trait
defines an abstract interface for text search. It is robust enough to support
everything from basic substring search all the way to arbitrarily complex
regular expression implementations without sacrificing performance.
A key design decision made in this crate is the use of *internal iteration*,
or otherwise known as the "push" model of searching. In this paradigm,
@ -38,11 +36,6 @@ implementations.
#![deny(missing_docs)]
use std::fmt;
use std::io;
use std::ops;
use std::u64;
use crate::interpolate::interpolate;
mod interpolate;
@ -162,7 +155,7 @@ impl Match {
}
}
impl ops::Index<Match> for [u8] {
impl std::ops::Index<Match> for [u8] {
type Output = [u8];
#[inline]
@ -171,14 +164,14 @@ impl ops::Index<Match> for [u8] {
}
}
impl ops::IndexMut<Match> for [u8] {
impl std::ops::IndexMut<Match> for [u8] {
#[inline]
fn index_mut(&mut self, index: Match) -> &mut [u8] {
&mut self[index.start..index.end]
}
}
impl ops::Index<Match> for str {
impl std::ops::Index<Match> for str {
type Output = str;
#[inline]
@ -204,11 +197,7 @@ pub struct LineTerminator(LineTerminatorImp);
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
enum LineTerminatorImp {
/// Any single byte representing a line terminator.
///
/// We represent this as an array so we can safely convert it to a slice
/// for convenient access. At some point, we can use `std::slice::from_ref`
/// instead.
Byte([u8; 1]),
Byte(u8),
/// A line terminator represented by `\r\n`.
///
/// When this option is used, consumers may generally treat a lone `\n` as
@ -220,7 +209,7 @@ impl LineTerminator {
/// Return a new single-byte line terminator. Any byte is valid.
#[inline]
pub fn byte(byte: u8) -> LineTerminator {
LineTerminator(LineTerminatorImp::Byte([byte]))
LineTerminator(LineTerminatorImp::Byte(byte))
}
/// Return a new line terminator represented by `\r\n`.
@ -246,7 +235,7 @@ impl LineTerminator {
#[inline]
pub fn as_byte(&self) -> u8 {
match self.0 {
LineTerminatorImp::Byte(array) => array[0],
LineTerminatorImp::Byte(byte) => byte,
LineTerminatorImp::CRLF => b'\n',
}
}
@ -260,7 +249,7 @@ impl LineTerminator {
#[inline]
pub fn as_bytes(&self) -> &[u8] {
match self.0 {
LineTerminatorImp::Byte(ref array) => array,
LineTerminatorImp::Byte(ref byte) => std::slice::from_ref(byte),
LineTerminatorImp::CRLF => &[b'\r', b'\n'],
}
}
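Note on the `Byte([u8; 1])` to `Byte(u8)` change: `std::slice::from_ref` turns a `&u8` into a one-element `&[u8]` without copying, which is what lets `as_bytes` keep returning a slice. A minimal sketch, assuming a hypothetical enum `Term` in place of `LineTerminatorImp`:

```rust
/// Hypothetical stand-in for `LineTerminatorImp`.
enum Term {
    Byte(u8),
    Crlf,
}

impl Term {
    /// Returns the terminator as a byte slice borrowed from `self`.
    fn as_bytes(&self) -> &[u8] {
        match self {
            // `byte` is a `&u8` tied to the lifetime of `self`.
            Term::Byte(byte) => std::slice::from_ref(byte),
            // A constant array literal is promoted to a static slice.
            Term::Crlf => &[b'\r', b'\n'],
        }
    }
}
```

`Term::Byte(b'\n').as_bytes()` yields a one-byte slice, and `Term::Crlf.as_bytes()` yields the two bytes `\r\n`.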
@ -301,10 +290,10 @@ pub struct ByteSet(BitSet);
#[derive(Clone, Copy)]
struct BitSet([u64; 4]);
impl fmt::Debug for BitSet {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Debug for BitSet {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let mut fmtd = f.debug_set();
for b in (0..256).map(|b| b as u8) {
for b in 0..=255 {
if ByteSet(*self).contains(b) {
fmtd.entry(&b);
}
@ -315,12 +304,14 @@ impl fmt::Debug for BitSet {
impl ByteSet {
/// Create an empty set of bytes.
#[inline]
pub fn empty() -> ByteSet {
ByteSet(BitSet([0; 4]))
}
/// Create a full set of bytes such that every possible byte is in the set
/// returned.
#[inline]
pub fn full() -> ByteSet {
ByteSet(BitSet([u64::MAX; 4]))
}
@ -328,15 +319,17 @@ impl ByteSet {
/// Add a byte to this set.
///
/// If the given byte already belongs to this set, then this is a no-op.
#[inline]
pub fn add(&mut self, byte: u8) {
let bucket = byte / 64;
let bit = byte % 64;
(self.0).0[bucket as usize] |= 1 << bit;
(self.0).0[usize::from(bucket)] |= 1 << bit;
}
/// Add an inclusive range of bytes.
#[inline]
pub fn add_all(&mut self, start: u8, end: u8) {
for b in (start as u64..end as u64 + 1).map(|b| b as u8) {
for b in start..=end {
self.add(b);
}
}
@ -344,24 +337,27 @@ impl ByteSet {
/// Remove a byte from this set.
///
/// If the given byte is not in this set, then this is a no-op.
#[inline]
pub fn remove(&mut self, byte: u8) {
let bucket = byte / 64;
let bit = byte % 64;
(self.0).0[bucket as usize] &= !(1 << bit);
(self.0).0[usize::from(bucket)] &= !(1 << bit);
}
/// Remove an inclusive range of bytes.
#[inline]
pub fn remove_all(&mut self, start: u8, end: u8) {
for b in (start as u64..end as u64 + 1).map(|b| b as u8) {
for b in start..=end {
self.remove(b);
}
}
/// Return true if and only if the given byte is in this set.
#[inline]
pub fn contains(&self, byte: u8) -> bool {
let bucket = byte / 64;
let bit = byte % 64;
(self.0).0[bucket as usize] & (1 << bit) > 0
(self.0).0[usize::from(bucket)] & (1 << bit) > 0
}
}
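Note on the `ByteSet` changes: a minimal sketch of a 256-bit set over four `u64` words, assuming a hypothetical type `Bits`. It shows why `start..=end` can replace the earlier widening to `u64`. An inclusive `u8` range covers `end == 255` without ever computing `end + 1`, which would overflow in `u8`.

```rust
/// Hypothetical 256-bit set: bit `b` lives in word `b / 64`, position `b % 64`.
#[derive(Clone, Copy)]
struct Bits([u64; 4]);

impl Bits {
    fn empty() -> Bits {
        Bits([0; 4])
    }

    fn add(&mut self, byte: u8) {
        let bucket = byte / 64;
        let bit = byte % 64;
        // `usize::from` is a lossless conversion, unlike an `as` cast
        // which would also compile for narrowing conversions.
        self.0[usize::from(bucket)] |= 1u64 << bit;
    }

    /// Adds an inclusive range; safe even when `end` is 255.
    fn add_all(&mut self, start: u8, end: u8) {
        for b in start..=end {
            self.add(b);
        }
    }

    fn contains(&self, byte: u8) -> bool {
        let bucket = byte / 64;
        let bit = byte % 64;
        self.0[usize::from(bucket)] & (1u64 << bit) != 0
    }
}
```

After `add_all(250, 255)` on an empty set, bytes 250 through 255 are members and 249 is not.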
@ -393,11 +389,21 @@ pub trait Captures {
/// for the overall match.
fn get(&self, i: usize) -> Option<Match>;
/// Return the overall match for the capture.
///
/// This returns the match for index `0`. That is, it is equivalent to
/// `get(0).unwrap()`.
#[inline]
fn as_match(&self) -> Match {
self.get(0).unwrap()
}
/// Returns true if and only if these captures are empty. This occurs
/// when `len` is `0`.
///
/// Note that capturing groups that have non-zero length but otherwise
/// contain no matching groups are *not* empty.
#[inline]
fn is_empty(&self) -> bool {
self.len() == 0
}
@ -431,6 +437,7 @@ pub trait Captures {
/// the given `haystack`. Generally, this means that `haystack` should be
/// the same slice that was searched to get the current capture group
/// matches.
#[inline]
fn interpolate<F>(
&self,
name_to_index: F,
@ -462,15 +469,19 @@ pub struct NoCaptures(());
impl NoCaptures {
/// Create an empty set of capturing groups.
#[inline]
pub fn new() -> NoCaptures {
NoCaptures(())
}
}
impl Captures for NoCaptures {
#[inline]
fn len(&self) -> usize {
0
}
#[inline]
fn get(&self, _: usize) -> Option<Match> {
None
}
@ -478,27 +489,27 @@ impl Captures for NoCaptures {
/// NoError provides an error type for matchers that never produce errors.
///
/// This error type implements the `std::error::Error` and `fmt::Display`
/// This error type implements the `std::error::Error` and `std::fmt::Display`
/// traits for use in matcher implementations that can never produce errors.
///
/// The `fmt::Debug` and `fmt::Display` impls for this type panics.
/// The `std::fmt::Debug` and `std::fmt::Display` impls for this type panic.
#[derive(Debug, Eq, PartialEq)]
pub struct NoError(());
impl ::std::error::Error for NoError {
impl std::error::Error for NoError {
fn description(&self) -> &str {
"no error"
}
}
impl fmt::Display for NoError {
fn fmt(&self, _: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Display for NoError {
fn fmt(&self, _: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
panic!("BUG for NoError: an impossible error occurred")
}
}
impl From<NoError> for io::Error {
fn from(_: NoError) -> io::Error {
impl From<NoError> for std::io::Error {
fn from(_: NoError) -> std::io::Error {
panic!("BUG for NoError: an impossible error occurred")
}
}
@ -522,13 +533,11 @@ pub enum LineMatchKind {
/// A matcher defines an interface for regular expression implementations.
///
/// While this trait is large, there are only two required methods that
/// implementors must provide: `find_at` and `new_captures`. If captures
/// aren't supported by your implementation, then `new_captures` can be
/// implemented with
/// [`NoCaptures`](struct.NoCaptures.html). If your implementation does support
/// capture groups, then you should also implement the other capture related
/// methods, as dictated by the documentation. Crucially, this includes
/// `captures_at`.
/// implementors must provide: `find_at` and `new_captures`. If captures aren't
/// supported by your implementation, then `new_captures` can be implemented
/// with [`NoCaptures`]. If your implementation does support capture groups,
/// then you should also implement the other capture related methods, as
/// dictated by the documentation. Crucially, this includes `captures_at`.
///
/// The rest of the methods on this trait provide default implementations on
/// top of `find_at` and `new_captures`. It is not uncommon for implementations
@ -547,7 +556,7 @@ pub trait Matcher {
/// use the `NoError` type in this crate. In the future, when the "never"
/// (spelled `!`) type is stabilized, then it should probably be used
/// instead.
type Error: fmt::Display;
type Error: std::fmt::Display;
/// Returns the start and end byte range of the first match in `haystack`
/// after `at`, where the byte offsets are relative to that start of
@ -584,6 +593,7 @@ pub trait Matcher {
///
/// By default, capturing groups are not supported, so this always
/// returns 0.
#[inline]
fn capture_count(&self) -> usize {
0
}
@ -597,6 +607,7 @@ pub trait Matcher {
///
/// By default, capturing groups are not supported, so this always returns
/// `None`.
#[inline]
fn capture_index(&self, _name: &str) -> Option<usize> {
None
}
@ -606,6 +617,7 @@ pub trait Matcher {
///
/// The text encoding of `haystack` is not strictly specified. Matchers are
/// advised to assume UTF-8, or at worst, some ASCII compatible encoding.
#[inline]
fn find(&self, haystack: &[u8]) -> Result<Option<Match>, Self::Error> {
self.find_at(haystack, 0)
}
@ -613,6 +625,7 @@ pub trait Matcher {
/// Executes the given function over successive non-overlapping matches
/// in `haystack`. If no match exists, then the given function is never
/// called. If the function returns `false`, then iteration stops.
#[inline]
fn find_iter<F>(
&self,
haystack: &[u8],
@ -631,6 +644,7 @@ pub trait Matcher {
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
#[inline]
fn find_iter_at<F>(
&self,
haystack: &[u8],
@ -651,6 +665,7 @@ pub trait Matcher {
/// the error is yielded. If an error occurs while executing the search,
/// then it is converted to
/// `E`.
#[inline]
fn try_find_iter<F, E>(
&self,
haystack: &[u8],
@ -673,6 +688,7 @@ pub trait Matcher {
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
#[inline]
fn try_find_iter_at<F, E>(
&self,
haystack: &[u8],
@ -720,6 +736,7 @@ pub trait Matcher {
///
/// The text encoding of `haystack` is not strictly specified. Matchers are
/// advised to assume UTF-8, or at worst, some ASCII compatible encoding.
#[inline]
fn captures(
&self,
haystack: &[u8],
@ -732,6 +749,7 @@ pub trait Matcher {
/// in `haystack` with capture groups extracted from each match. If no
/// match exists, then the given function is never called. If the function
/// returns `false`, then iteration stops.
#[inline]
fn captures_iter<F>(
&self,
haystack: &[u8],
@ -752,6 +770,7 @@ pub trait Matcher {
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
#[inline]
fn captures_iter_at<F>(
&self,
haystack: &[u8],
@ -773,6 +792,7 @@ pub trait Matcher {
/// returns an error then iteration stops and the error is yielded. If
/// an error occurs while executing the search, then it is converted to
/// `E`.
#[inline]
fn try_captures_iter<F, E>(
&self,
haystack: &[u8],
@ -796,6 +816,7 @@ pub trait Matcher {
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
#[inline]
fn try_captures_iter_at<F, E>(
&self,
haystack: &[u8],
@ -862,6 +883,7 @@ pub trait Matcher {
/// Note that if implementors seek to support capturing groups, then they
/// should implement this method. Other methods that match based on
/// captures will then work automatically.
#[inline]
fn captures_at(
&self,
_haystack: &[u8],
@ -876,6 +898,7 @@ pub trait Matcher {
/// a handle to the `dst` buffer provided.
///
/// If the given `append` function returns `false`, then replacement stops.
#[inline]
fn replace<F>(
&self,
haystack: &[u8],
@ -899,6 +922,7 @@ pub trait Matcher {
/// `append` with the matching capture groups.
///
/// If the given `append` function returns `false`, then replacement stops.
#[inline]
fn replace_with_captures<F>(
&self,
haystack: &[u8],
@ -920,6 +944,7 @@ pub trait Matcher {
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
#[inline]
fn replace_with_captures_at<F>(
&self,
haystack: &[u8],
@ -945,6 +970,7 @@ pub trait Matcher {
/// Returns true if and only if the matcher matches the given haystack.
///
/// By default, this method is implemented by calling `shortest_match`.
#[inline]
fn is_match(&self, haystack: &[u8]) -> Result<bool, Self::Error> {
self.is_match_at(haystack, 0)
}
@ -957,6 +983,7 @@ pub trait Matcher {
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
#[inline]
fn is_match_at(
&self,
haystack: &[u8],
@ -979,6 +1006,7 @@ pub trait Matcher {
/// a faster implementation of this than what `find` does.
///
/// By default, this method is implemented by calling `find`.
#[inline]
fn shortest_match(
&self,
haystack: &[u8],
@ -1004,6 +1032,7 @@ pub trait Matcher {
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
#[inline]
fn shortest_match_at(
&self,
haystack: &[u8],
@ -1032,6 +1061,7 @@ pub trait Matcher {
/// exists with that byte.
///
/// By default, this returns `None`.
#[inline]
fn non_matching_bytes(&self) -> Option<&ByteSet> {
None
}
@ -1048,6 +1078,7 @@ pub trait Matcher {
/// `CRLF`.
///
/// By default, this returns `None`.
#[inline]
fn line_terminator(&self) -> Option<LineTerminator> {
None
}
@ -1090,6 +1121,7 @@ pub trait Matcher {
/// Note that while this method may report false positives, it must never
/// report false negatives. That is, it can never skip over lines that
/// contain a match.
#[inline]
fn find_candidate_line(
&self,
haystack: &[u8],
@ -1102,6 +1134,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
type Captures = M::Captures;
type Error = M::Error;
#[inline]
fn find_at(
&self,
haystack: &[u8],
@ -1110,10 +1143,12 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).find_at(haystack, at)
}
#[inline]
fn new_captures(&self) -> Result<Self::Captures, Self::Error> {
(*self).new_captures()
}
#[inline]
fn captures_at(
&self,
haystack: &[u8],
@ -1123,18 +1158,22 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).captures_at(haystack, at, caps)
}
#[inline]
fn capture_index(&self, name: &str) -> Option<usize> {
(*self).capture_index(name)
}
#[inline]
fn capture_count(&self) -> usize {
(*self).capture_count()
}
#[inline]
fn find(&self, haystack: &[u8]) -> Result<Option<Match>, Self::Error> {
(*self).find(haystack)
}
#[inline]
fn find_iter<F>(
&self,
haystack: &[u8],
@ -1146,6 +1185,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).find_iter(haystack, matched)
}
#[inline]
fn find_iter_at<F>(
&self,
haystack: &[u8],
@ -1158,6 +1198,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).find_iter_at(haystack, at, matched)
}
#[inline]
fn try_find_iter<F, E>(
&self,
haystack: &[u8],
@ -1169,6 +1210,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).try_find_iter(haystack, matched)
}
#[inline]
fn try_find_iter_at<F, E>(
&self,
haystack: &[u8],
@ -1181,6 +1223,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).try_find_iter_at(haystack, at, matched)
}
#[inline]
fn captures(
&self,
haystack: &[u8],
@ -1189,6 +1232,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).captures(haystack, caps)
}
#[inline]
fn captures_iter<F>(
&self,
haystack: &[u8],
@ -1201,6 +1245,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).captures_iter(haystack, caps, matched)
}
#[inline]
fn captures_iter_at<F>(
&self,
haystack: &[u8],
@ -1214,6 +1259,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).captures_iter_at(haystack, at, caps, matched)
}
#[inline]
fn try_captures_iter<F, E>(
&self,
haystack: &[u8],
@ -1226,6 +1272,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).try_captures_iter(haystack, caps, matched)
}
#[inline]
fn try_captures_iter_at<F, E>(
&self,
haystack: &[u8],
@ -1239,6 +1286,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).try_captures_iter_at(haystack, at, caps, matched)
}
#[inline]
fn replace<F>(
&self,
haystack: &[u8],
@ -1251,6 +1299,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).replace(haystack, dst, append)
}
#[inline]
fn replace_with_captures<F>(
&self,
haystack: &[u8],
@ -1264,6 +1313,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).replace_with_captures(haystack, caps, dst, append)
}
#[inline]
fn replace_with_captures_at<F>(
&self,
haystack: &[u8],
@ -1278,10 +1328,12 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).replace_with_captures_at(haystack, at, caps, dst, append)
}
#[inline]
fn is_match(&self, haystack: &[u8]) -> Result<bool, Self::Error> {
(*self).is_match(haystack)
}
#[inline]
fn is_match_at(
&self,
haystack: &[u8],
@ -1290,6 +1342,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).is_match_at(haystack, at)
}
#[inline]
fn shortest_match(
&self,
haystack: &[u8],
@ -1297,6 +1350,7 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).shortest_match(haystack)
}
#[inline]
fn shortest_match_at(
&self,
haystack: &[u8],
@ -1305,14 +1359,17 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).shortest_match_at(haystack, at)
}
#[inline]
fn non_matching_bytes(&self) -> Option<&ByteSet> {
(*self).non_matching_bytes()
}
#[inline]
fn line_terminator(&self) -> Option<LineTerminator> {
(*self).line_terminator()
}
#[inline]
fn find_candidate_line(
&self,
haystack: &[u8],
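
Review note: the hunks above add `#[inline]` both to the provided trait methods and to every forwarding method in the blanket impl for `&'a M`. The forwarding pattern can be sketched with a much smaller trait. Everything here (`Finder`, `ByteFinder`, `first_hit`) is hypothetical and only illustrates the shape. It is not the `grep-matcher` API.

```rust
/// Hypothetical stand-in for a matcher trait (an assumption for illustration).
trait Finder {
    /// Required method: offset of the first hit at or after `at`.
    fn find_at(&self, haystack: &[u8], at: usize) -> Option<usize>;

    /// Provided method: by default, search from the start of the haystack.
    #[inline]
    fn find(&self, haystack: &[u8]) -> Option<usize> {
        self.find_at(haystack, 0)
    }
}

/// A trivial implementor that looks for a single byte.
struct ByteFinder(u8);

impl Finder for ByteFinder {
    #[inline]
    fn find_at(&self, haystack: &[u8], at: usize) -> Option<usize> {
        // `get` returns None when `at` is past the end, so this never panics.
        haystack.get(at..)?.iter().position(|&b| b == self.0).map(|i| i + at)
    }
}

/// Blanket impl for references. Each method, including the provided one,
/// forwards to the underlying implementor so that an override on `M` is
/// still used when the caller only holds `&M`.
impl<'a, M: Finder> Finder for &'a M {
    #[inline]
    fn find_at(&self, haystack: &[u8], at: usize) -> Option<usize> {
        (*self).find_at(haystack, at)
    }

    #[inline]
    fn find(&self, haystack: &[u8]) -> Option<usize> {
        (*self).find(haystack)
    }
}

/// Generic caller that accepts either an owned finder or a reference to one.
fn first_hit<F: Finder>(finder: F, haystack: &[u8]) -> Option<usize> {
    finder.find(haystack)
}
```

Because the forwarding bodies are one-line calls, `#[inline]` is a hint that lets the compiler consider inlining them across crate boundaries. Whether that is measurable depends on the caller and is not something this sketch demonstrates.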

@ -1,5 +1,7 @@
use grep_matcher::{Captures, Match, Matcher};
use regex::bytes::Regex;
use {
grep_matcher::{Captures, Match, Matcher},
regex::bytes::Regex,
};
use crate::util::{RegexMatcher, RegexMatcherNoCaps};

@ -1,28 +1,29 @@
use std::collections::HashMap;
use std::result;
use grep_matcher::{Captures, Match, Matcher, NoCaptures, NoError};
use regex::bytes::{CaptureLocations, Regex};
use {
grep_matcher::{Captures, Match, Matcher, NoCaptures, NoError},
regex::bytes::{CaptureLocations, Regex},
};
#[derive(Debug)]
pub struct RegexMatcher {
pub(crate) struct RegexMatcher {
pub re: Regex,
pub names: HashMap<String, usize>,
}
impl RegexMatcher {
pub fn new(re: Regex) -> RegexMatcher {
pub(crate) fn new(re: Regex) -> RegexMatcher {
let mut names = HashMap::new();
for (i, optional_name) in re.capture_names().enumerate() {
if let Some(name) = optional_name {
names.insert(name.to_string(), i);
}
}
RegexMatcher { re: re, names: names }
RegexMatcher { re, names }
}
}
type Result<T> = result::Result<T, NoError>;
type Result<T> = std::result::Result<T, NoError>;
impl Matcher for RegexMatcher {
type Captures = RegexCaptures;
@ -63,7 +64,7 @@ impl Matcher for RegexMatcher {
}
#[derive(Debug)]
pub struct RegexMatcherNoCaps(pub Regex);
pub(crate) struct RegexMatcherNoCaps(pub(crate) Regex);
impl Matcher for RegexMatcherNoCaps {
type Captures = NoCaptures;
@ -82,7 +83,7 @@ impl Matcher for RegexMatcherNoCaps {
}
#[derive(Clone, Debug)]
pub struct RegexCaptures(CaptureLocations);
pub(crate) struct RegexCaptures(CaptureLocations);
impl Captures for RegexCaptures {
fn len(&self) -> usize {
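
Review note: `RegexMatcher::new` above builds a name-to-index table from `re.capture_names()` and, after this change, uses field init shorthand. A standard-library-only sketch of the same idea follows. `NameIndex` and its methods are hypothetical names, and the iterator of `Option<&str>` stands in for the regex crate's capture-name iterator.

```rust
use std::collections::HashMap;

/// Hypothetical helper that maps capture group names to their indices.
#[derive(Debug)]
pub(crate) struct NameIndex {
    names: HashMap<String, usize>,
}

impl NameIndex {
    /// Builds the table from one entry per capture group, in group order.
    /// Unnamed groups are `None` and are skipped, but still consume an index.
    pub(crate) fn new<'a, I>(capture_names: I) -> NameIndex
    where
        I: IntoIterator<Item = Option<&'a str>>,
    {
        let mut names = HashMap::new();
        for (i, optional_name) in capture_names.into_iter().enumerate() {
            if let Some(name) = optional_name {
                names.insert(name.to_string(), i);
            }
        }
        // Field init shorthand: same as `NameIndex { names: names }`.
        NameIndex { names }
    }

    /// Looks up the index of a named group, if one exists.
    pub(crate) fn capture_index(&self, name: &str) -> Option<usize> {
        self.names.get(name).copied()
    }
}
```

In this sketch index 0 plays the role of the implicit whole-match group, which is unnamed, so the first named group lands at index 1 or later.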

@ -1,6 +1,6 @@
[package]
name = "grep-pcre2"
version = "0.1.5" #:version
version = "0.1.8" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Use PCRE2 with the 'grep' crate.
@ -14,5 +14,6 @@ license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
grep-matcher = { version = "0.1.5", path = "../matcher" }
pcre2 = "0.2.3"
grep-matcher = { version = "0.1.7", path = "../matcher" }
log = "0.4.20"
pcre2 = "0.2.6"

@ -1,6 +1,3 @@
use std::error;
use std::fmt;
/// An error that can occur in this crate.
///
/// Generally, this error corresponds to problems building a regular
@ -12,7 +9,7 @@ pub struct Error {
}
impl Error {
pub(crate) fn regex<E: error::Error>(err: E) -> Error {
pub(crate) fn regex<E: std::error::Error>(err: E) -> Error {
Error { kind: ErrorKind::Regex(err.to_string()) }
}
@ -24,6 +21,7 @@ impl Error {
/// The kind of an error that can occur.
#[derive(Clone, Debug)]
#[non_exhaustive]
pub enum ErrorKind {
/// An error that occurred as a result of parsing a regular expression.
/// This can be a syntax error or an error that results from attempting to
@ -31,29 +29,20 @@ pub enum ErrorKind {
///
/// The string here is the underlying error converted to a string.
Regex(String),
/// Hints that destructuring should not be exhaustive.
///
/// This enum may grow additional variants, so this makes sure clients
/// don't count on exhaustive matching. (Otherwise, adding a new variant
/// could break existing code.)
#[doc(hidden)]
__Nonexhaustive,
}
impl error::Error for Error {
impl std::error::Error for Error {
fn description(&self) -> &str {
match self.kind {
ErrorKind::Regex(_) => "regex error",
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
impl std::fmt::Display for Error {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self.kind {
ErrorKind::Regex(ref s) => write!(f, "{}", s),
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}
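
Review note: the hunk above replaces the hidden `__Nonexhaustive` variant with the `#[non_exhaustive]` attribute, which removes the `unreachable!()` arms inside the crate. A self-contained sketch of the resulting shape follows. The `kind` accessor and the `describe` function are assumptions added for illustration, and the wildcard arm shows what code in another crate would be required to write.

```rust
use std::fmt;

/// The kind of an error. New variants may be added later.
#[derive(Clone, Debug)]
#[non_exhaustive]
pub enum ErrorKind {
    /// The underlying error converted to a string.
    Regex(String),
}

#[derive(Clone, Debug)]
pub struct Error {
    kind: ErrorKind,
}

impl Error {
    /// Wraps any standard error by storing its message.
    pub fn regex<E: std::error::Error>(err: E) -> Error {
        Error { kind: ErrorKind::Regex(err.to_string()) }
    }

    /// Hypothetical accessor, assumed here for the example.
    pub fn kind(&self) -> &ErrorKind {
        &self.kind
    }
}

impl std::error::Error for Error {}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Inside the defining crate the match may be exhaustive, so no
        // sentinel arm is needed.
        match self.kind {
            ErrorKind::Regex(ref s) => write!(f, "{}", s),
        }
    }
}

/// What a caller in another crate would write: the wildcard arm is
/// mandatory there. In this single-file sketch it is unreachable, hence
/// the `allow`.
pub fn describe(kind: &ErrorKind) -> &'static str {
    #[allow(unreachable_patterns)]
    match kind {
        ErrorKind::Regex(_) => "regex error",
        _ => "unknown error",
    }
}
```

The attribute only changes what other crates may do. Within the defining crate the enum can still be matched exhaustively, which is why the `Display` impl above needs a single arm.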

Some files were not shown because too many files have changed in this diff.