Compare commits

...

109 Commits

Author SHA1 Message Date
Andrew Gallant
053a1669bb globset-0.4.12 2023-07-26 19:51:38 -04:00
David Tolnay
31d3f16254 api: impl Deserialize for GlobSet
PR #2569
2023-07-26 19:51:22 -04:00
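A sketch of what this impl enables, assuming globset is built with its
`serde` feature and using serde_json purely for demonstration: a GlobSet
can now be deserialized directly from a sequence of glob strings.

```
// Hedged sketch: assumes globset's `serde` feature and serde_json.
use globset::GlobSet;

fn main() {
    let set: GlobSet =
        serde_json::from_str(r#"["*.rs", "Cargo.*"]"#).unwrap();
    assert!(set.is_match("main.rs"));
    assert!(set.is_match("Cargo.toml"));
}
```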
Andrew Gallant
304a60e8e9 grep-cli-0.1.9 2023-07-18 13:25:23 -04:00
Andrew Gallant
1d35859861 globset-0.4.11 2023-07-12 12:58:43 -04:00
mataha
601e122e9f ignore/types: add Windows Command Prompt files
This PR adds `*.bat` and `*.cmd` file types.

In doing so, it makes a distinction between batch files (old standard
from the MS-DOS era) and command scripts (new flavor - can operate on
batch files, although `*.cmd` is preferred for various reasons, the
main one being batch files will set `ERRORLEVEL` following inconsistent
MS-DOS style rules[1]).

PR #2556

[1]: https://groups.google.com/g/microsoft.public.win2000.cmdprompt.admin/c/XHeUq8oe2wk/m/LIEViGNmkK0J#i106
2023-07-10 15:58:17 -04:00
Andrew Gallant
efb2e8ce1e ci/release: use latest OS versions 2023-07-09 10:14:03 -04:00
xEgoist
8d464e5c78 ci/release: add sha256 sums to release artifacts
Fixes #1924, Closes #2168
2023-07-09 10:14:03 -04:00
Andrew Gallant
d67809d6c4 github: remove dependabot configuration
This does not seem to have worked at all. For example, there were
Actions being used that were clearly deprecated/archived[1]. But
Dependabot didn't make a peep. So just get rid of it to avoid the false
sense that someone is checking our dependencies for us.

[1]: https://github.com/BurntSushi/ripgrep/pull/2360
2023-07-09 10:14:03 -04:00
nguyenvukhang
6abb962f0d cli: fix non-path sorting behavior
Previously, sorting worked by sorting the parents and then sorting the
children within each parent. This was done during traversal, but it only
works when sorting parents preserves the overall order. This generally
only works for '--sort path' in ascending order.

This commit fixes the rest of the sorting behavior by collecting all of
the paths to search and then sorting them before searching. We only
collect all of the paths when sorting was requested.

Fixes #2243, Closes #2361
2023-07-09 10:14:03 -04:00
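A minimal sketch of the collect-then-sort approach this commit
describes, with hypothetical names (not ripgrep's internals) and a
'--sort modified' style ordering assumed:

```
use std::fs;
use std::path::PathBuf;
use std::time::SystemTime;

// Collect all candidate paths first, then sort them globally before
// searching. Sorting parents and children independently during
// traversal cannot produce this order for non-path sort keys.
fn sorted_by_mtime(mut paths: Vec<PathBuf>) -> Vec<PathBuf> {
    paths.sort_by_key(|p| {
        fs::metadata(p)
            .and_then(|m| m.modified())
            .unwrap_or(SystemTime::UNIX_EPOCH)
    });
    paths
}
```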
Edoardo Pirovano
6d95c130d5 cli: add --stop-on-nonmatch flag
This causes ripgrep to stop searching an individual file after it has
found a non-matching line. But this only occurs after it has found a
matching line.

Fixes #1790, Closes #1930
2023-07-08 18:52:42 -04:00
Garrett Thornburg
4782ebd5e0 core: lock stdout before printing an error message to stderr
Adds a new eprintln_locked macro which locks STDOUT before logging
to STDERR. This patch also replaces instances of eprintln with
eprintln_locked to avoid interleaving lines.

Fixes #1941, Closes #1968
2023-07-08 18:52:42 -04:00
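A sketch of the locking idea (not the exact macro from the patch):
holding the stdout lock for the duration of the write to stderr means a
printer thread that locks stdout can't interleave output mid-message.

```
// Hedged sketch of the eprintln_locked idea using only std.
macro_rules! eprintln_locked {
    ($($tt:tt)*) => {{
        // Acquire the stdout lock first; it is held until the end of
        // this block, after the message has been written to stderr.
        let _stdout_lock = std::io::stdout().lock();
        eprintln!($($tt)*);
    }}
}

fn main() {
    eprintln_locked!("error: {}", "something went wrong");
}
```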
piegames
4993d29a16 globset: add 'escape' routine
Fixes #2060, Closes #2061
2023-07-08 18:52:42 -04:00
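A sketch of one way such an escape routine can work, using
single-character classes (a class like `[*]` matches a literal `*`);
this is an illustration, not necessarily the exact implementation:

```
// Escape glob metacharacters so the result matches its input literally.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            // Neutralize metacharacters by wrapping each in a
            // character class of one.
            '?' | '*' | '[' | ']' => {
                out.push('[');
                out.push(c);
                out.push(']');
            }
            c => out.push(c),
        }
    }
    out
}

fn main() {
    assert_eq!(escape("foo*bar?"), "foo[*]bar[?]");
}
```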
Seth Stadick
23adbd6795 cli: force binary existence check
Previously, we were only doing a binary existence check on Windows. And
in fact, the main point there wasn't binary existence, but ensuring we
didn't accidentally resolve a binary name relative to the CWD, which
could result in executing a program one didn't mean to run.

However, it is useful to be able to check whether a binary exists on any
platform when associating a glob with a binary. If the binary doesn't
exist, then the association can fail eagerly and let some other glob
apply.

Closes #1946
2023-07-08 18:52:42 -04:00
Kevin Svetlitski
9df8ab42b1 cargo: reduce the size of the .crate file published to crates.io
None of this stuff is needed for the main ripgrep crate.

Closes #1940
2023-07-08 18:52:42 -04:00
Michal Terepeta
cb7501ff11 doc: clarify the comment on Worker.work_done
We call `work_done` only once the work has been actually performed
(otherwise `num_pending` could go to 0 before the actual work is done).

Closes #2039
2023-07-08 18:52:42 -04:00
Kyle Todeschini
3b66f37a31 doc: improve -r/--replace flag syntax docs
Fixes #2108, Closes #2123
2023-07-08 18:52:42 -04:00
Andrew Gallant
3eccb7c363 readme: add 'yum-utils' to RHEL/Centos instructions
Closes #2103
2023-07-08 18:52:42 -04:00
kotborealis
f30a30867e ignore/types: name aliases for file types
We also make py/python, md/markdown and ts/typescript aliases of one
another.

Note that this only introduces aliases at the point where default types
are defined. This just makes them a bit easier to read/write, and also
makes it easier to expose more names that describe the same thing.

Fixes #1857, Closes #1895
2023-07-08 18:52:42 -04:00
Klas Mellbourn
7313dca472 ignore/types: add 'typescript' alias for 'ts'
Closes #2009
2023-07-08 18:52:42 -04:00
Tama McGlinn
99bf2b01dc ignore/types: add Ada filetypes, including gprbuild and alire
*.adb and *.ads are the usual extensions for Ada source code,
and *.gpr indicates a GPRbuild project file used for Ada, which
these days is often combined with Alire for package dependency
resolution. Alire stores a bunch of files named alire.toml in
different directories in your (gitignored) cache/dependencies/...

Closes #2013
2023-07-08 18:52:42 -04:00
Juan Francisco Cantero Hurtado
ee1360cc07 ignore/types: add raku extensions to ignore types
Closes #2117
2023-07-08 18:52:42 -04:00
Andrew Gallant
db6bb21a62 windows: attempt to enable long path support for MSVC targets
See the README and comments in the build.rs. Basically, this embeds an
XML file that I guess is a way of setting configuration knobs on
Windows. One of those knobs is enabling long path support. You still
need to enable it in your registry (lol), but this will handle the other
half of it.

Fixes #364, Closes #2049
2023-07-08 18:52:42 -04:00
Andrew Gallant
da7c81fb96 ignore/types: add MDX format to Markdown types
Ref https://mdxjs.com/

Closes #2142
2023-07-08 18:52:42 -04:00
chrispy
a4e3d56de1 ignore/types: add DITA (Darwin Information Typing Architecture)
Closes #2148
2023-07-08 18:52:42 -04:00
Ludi Rehak
7c83b90f95 doc: fix typo
Closes #2153
2023-07-08 18:52:42 -04:00
cuishuang
97b5b7769c doc: fix some typos
Closes #2195
2023-07-08 18:52:42 -04:00
dana
2708f9e81d complete: add extra-verbose support to _rg_types
When the extra-verbose style is set for the types tag, completed types
are displayed along with the patterns they correspond to. This can be
enabled by e.g. adding the following to .zshrc:

  zstyle ':completion:*:rg:*:types' extra-verbose true

This change also makes _rg_types use the actual rg specified on the
command line to look up types, and it fixes a mangled complete-all
style check.

Fixes #2195
2023-07-08 18:52:42 -04:00
Richard Sternagel
f3241fd657 cli: '--no-ignore-dot' should also ignore '.rgignore'
Fixes #2198, Closes #2202
2023-07-08 18:52:42 -04:00
Andrew Gallant
cfe357188d ignore/types: fix formatting 2023-07-08 18:52:42 -04:00
edam
792451e331 ignore/types: added V type
V (http://vlang.io) uses '.v' files.

Closes #2302
2023-07-08 18:52:42 -04:00
Andrew Gallant
7dafd58a32 readme: use 'sudo' more consistently
I definitely wonder whether I should just drop 'sudo' from the install
instructions and just rely on the user to "know" to do it. But some
commands legitimately do not require 'sudo', so there are actual
differences. Overall, this feels clearer to me but reasonable people can
disagree.
2023-07-08 18:52:42 -04:00
Andrew Savchenko
b92550b67b readme: add install command for ALT Linux
Closes #2330
2023-07-08 18:52:42 -04:00
Kevin Ushey
383d3b336b doc: add '--hidden' to example configuration
This increases visibility of the fact that hidden files are skipped by
default.

Closes #2356
2023-07-08 18:52:42 -04:00
James McKinney
fc7e634395 ci/release: Use GITHUB_REF_NAME instead of GITHUB_REF
This is a nice quality of life improvement.

Closes #2358
2023-07-08 18:52:42 -04:00
James McKinney
c9584b035b ci/release: use GitHub CLI
The old actions I was using are apparently archived because they make
use of deprecated features (like `set-output`). Sigh.

Closes #2360
2023-07-08 18:52:42 -04:00
Alex Rawson
f34fd5c4b6 globset: introduce option to keep empty alternates
Add a method GlobBuilder::empty_alternates and supporting mechanisms.

Ref #1368
Closes #2369
2023-07-08 18:52:42 -04:00
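A sketch of the new option in use, going by the method name in the
commit message: with empty alternates kept, `a{,b}` matches both `a`
and `ab`.

```
// Hedged sketch: `empty_alternates` per the commit message above.
use globset::GlobBuilder;

fn main() {
    let glob = GlobBuilder::new("a{,b}")
        .empty_alternates(true)
        .build()
        .unwrap()
        .compile_matcher();
    assert!(glob.is_match("a"));
    assert!(glob.is_match("ab"));
}
```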
Jérome Eertmans
d51c6c005a globset: permit deserializing Glob from String
Closes #2386, Closes #2388
2023-07-08 18:52:42 -04:00
Jakub Wilk
ea05881319 readme: fix awkward grammar
Closes #2402
2023-07-08 18:52:42 -04:00
sitiom
1d4e3df19c readme: add winget installation section
Closes #2409
2023-07-08 18:52:42 -04:00
Mark Sisson
0f6181d309 ignore/types: add USD to the default file types
Closes #2432
2023-07-08 18:52:42 -04:00
Sam James
e902e2fef4 ignore/types: add Gentoo eclass type
Eclasses are "ebuild libraries" and generally if you're filtering
for/filtering out an ebuild/eclass, you don't want the other either.

Followup to 4dfea016b9

Closes #2437
2023-07-08 18:52:42 -04:00
angrycandy
07cbfee225 ignore/types: improve Elixir globs
Closes #2450
2023-07-08 18:52:42 -04:00
Andrew Gallant
d675844510 core: don't let context flags override each other
This matches the behavior of GNU grep, which does not ignore
before-context and after-context completely if the context flag is also
provided.

Note that this change wasn't done just to match GNU grep. In this case,
GNU grep has the more sensible behavior.

Fixes #2288, Closes #2451
2023-07-08 18:52:42 -04:00
Andrew Gallant
54e609d657 doc: add another example for the config file
Closes #2453
2023-07-08 18:52:42 -04:00
Misaki
43bbcca06f doc: note '-n' and '-N' override each other
Closes #2460
2023-07-08 18:52:42 -04:00
Eric Arellano
ad9bfdd981 ignore/gitignore: expose gitconfig_excludes_path
I have reservations about this, but it looks useful and doesn't seem
terribly onerous to support. The `ignore` crate will really always need
to have some kind of logic supporting this in some form, I think.

Closes #2482
2023-07-08 18:52:42 -04:00
Gal Ofri
36194c2742 test: test that regex inline flags work as intended
This was originally fixed by using non-capturing groups when joining
patterns in crates/core/args.rs, but before that landed, it ended up
getting fixed via a refactor in the course of migrating to regex 1.9.
Namely, it's now fixed by pushing pattern joining down into the regex
layer, so that patterns can be joined in the most effective way
possible.

Still, #2488 contains a useful test, so we bring that in here. The
test actually failed for `rg -e ')('`, since it expected the command to
fail with a syntax error. But my refactor actually causes this command
to succeed. And indeed, #2488 worked around this by special casing a
single pattern. That work-around fixes it for the single pattern case,
but doesn't fix it for the -w or -X or multi-pattern case. So for now,
we're content to leave well enough alone. The only real way to fix this
is to parse each regexp individually and verify that each is valid on
its own. It's not clear that doing so is worth it.

Fixes #2480, Closes #2488
2023-07-08 18:52:42 -04:00
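A sketch of the non-capturing-group joining technique the message
mentions (the original fix): wrapping each pattern before alternating
keeps inline flags like `(?i)` scoped to their own pattern.

```
// Join patterns with alternation, wrapping each in a non-capturing
// group so one pattern's inline flags can't leak into another.
fn join_patterns(patterns: &[&str]) -> String {
    patterns
        .iter()
        .map(|p| format!("(?:{})", p))
        .collect::<Vec<_>>()
        .join("|")
}

fn main() {
    assert_eq!(join_patterns(&["(?i)foo", "bar"]), "(?:(?i)foo)|(?:bar)");
}
```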
Jakub Jirutka
0c1cbd99f3 ignore: tweak regex crate features
This removes most of the Unicode features as they aren't currently
used. We can always add them back later if necessary.

We can avoid the unicode-perl feature by changing `\s` to `[[:space:]]`,
which uses the ASCII-only definition of `\s`. Since we don't expect
non-ASCII whitespace in git config files, this seems okay.

Closes #2502
2023-07-08 18:52:42 -04:00
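A quick illustration of the substitution, assuming the `regex` crate:
`[[:space:]]` is the ASCII-only space class, so unlike a Unicode-aware
`\s` it will not match something like U+00A0 (no-break space).

```
use regex::Regex;

fn main() {
    let ascii_space = Regex::new(r"^[[:space:]]$").unwrap();
    assert!(ascii_space.is_match(" "));
    // U+00A0 is Unicode whitespace, but not ASCII whitespace.
    assert!(!ascii_space.is_match("\u{00A0}"));
}
```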
Jon Parise
96cfc0ed13 ignore/types: add 'graphql' type
GraphQL file extensions: .graphql and .graphqls (schema)

We could also add `.gql`, but perhaps it's less correct to do so. We'll
start conservatively here, and we can always add `.gql` later.

Closes #2439, Closes #2508
2023-07-08 18:52:42 -04:00
mataha
da8ecddce9 cli: make resolve_binary take COM executables into account
When `resolve_binary()` attempts to resolve a path to a program on
Windows while searching for a program in `PATH` without an extension,
`ripgrep` will assume the extension of the file to be `.exe` as it's
the *de facto* standard, which will work most (99.99%) of the time...

...unless the binary is a COM executable (we're on Windows, duh).

Closes #2523
2023-07-08 18:52:42 -04:00
Yifei Teng
545a7dc759 ignore/types: add cml to the default types list
It's used in Fuchsia to mean "component manifest language."[1]

[1]: https://fuchsia.dev/reference/cml?hl=en

Closes #2529
2023-07-08 18:52:42 -04:00
Jonathan Schwender
16f783832e doc: update rust-version in Cargo.toml
The MSRV got bumped a little while ago, so this is just catch-up.

Closes #2539
2023-07-08 18:52:42 -04:00
Andrew Gallant
f4d07b9cbd grep-cli-0.1.8 2023-07-05 17:09:09 -04:00
Andrew Gallant
0b6eccf4d3 ci: try to fix CI 2023-07-05 14:04:29 -04:00
Andrew Gallant
3ac4541e9f regex: remove old inner literal extractor
(It had already been removed from the crate.)
2023-07-05 14:04:29 -04:00
Andrew Gallant
7b72e982f2 deps: update everything 2023-07-05 14:04:29 -04:00
Andrew Gallant
a68db3ac02 deps: drop temporary patch and move to bstr 1.6
Now that regex 1.9 is out, we can depend on it from crates.io.
2023-07-05 14:04:29 -04:00
Andrew Gallant
b12905daca deps: update everything 2023-07-05 14:04:29 -04:00
Andrew Gallant
ca740d9ace regex: add new inner literal extractor
This is mostly a copy of the prefix literal extractor in regex-syntax,
but with a tweaked notion of Seq that keeps track of whether it's a
prefix of an expression or not. If it isn't, then we can't cross it as a
suffix to another Seq.

This new extractor should be a lot more robust than the old one. We
actually will keep going through the regex to try and find the "best"
literals to search for (according to some heuristic).
2023-07-05 14:04:29 -04:00
Andrew Gallant
e80c102dee regex: tweak formatting of regex-automata version spec
This makes it easier to enable the `logging` feature for regex-automata.

I wish I could just enable it unconditionally, but it winds up producing
a lot of output because ripgrep uses regexes for things other than the
primary search (like every glob). Sigh.
2023-07-05 14:04:29 -04:00
Andrew Gallant
8ac66a9e04 regex: refactor matcher construction
This does a little bit of refactoring so that we can pass both a
ConfiguredHIR and a Regex to the inner literal extraction routine.

One downside of this approach is that a regex object hangs on to a
ConfiguredHIR. But the extra memory usage is probably negligible. A
benefit though is that converting the HIR to its concrete syntax is now
lazy and only happens when logging is enabled.
2023-07-05 14:04:29 -04:00
Andrew Gallant
04dde9a4eb regex: tweak DFA settings
This increases the limits a bit for when the regex engine will build and
use a fully compiled DFA. They can be faster in some circumstances. For
example, '(?-u)^\w{30,}$' gets a nice speed boost from state
acceleration.

We are also able to remove `regex` proper as a dependency. Wow.
2023-07-05 14:04:29 -04:00
Andrew Gallant
81341702af regex: push more pattern handling to matcher construction
Previously, ripgrep core was responsible for escaping regex patterns and
implementing the --line-regexp flag. This commit moves that
responsibility down into the matchers such that ripgrep just needs to
hand the patterns it gets off to the matcher builder. The builder will
then take care of escaping and all that.

This was done to make pattern construction completely owned by the
matcher builders. With the arrival of regex-automata, this means we can
move to the HIR very quickly and then never move back to the concrete
syntax. We can then build our regex directly from the HIR. This overall
can save quite a bit of time, especially when searching for large
dictionaries.

We still aren't quite as fast as GNU grep when searching something on
the scale of /usr/share/dict/words, but we are basically within spitting
distance. Prior to this, we were about an order of magnitude slower.

This architecture in particular lets us write a pretty simple fast path
that avoids AST parsing and HIR translation entirely: the case where one
is just searching for a literal. In that case, we can hand construct the
HIR directly.
2023-07-05 14:04:29 -04:00
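A sketch of the literal fast path, assuming regex-syntax 0.7 (the
helper name here is hypothetical): when the query is a plain literal,
the HIR can be constructed directly, skipping AST parsing and
translation entirely.

```
use regex_syntax::hir::Hir;

// Build an HIR for a literal query without going through the parser.
fn literal_query_to_hir(query: &str) -> Hir {
    Hir::literal(query.as_bytes())
}
```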
Andrew Gallant
d34c5c88a7 globset: fix build error in tests
I guess we haven't been testing with the Serde feature enabled? Weird.
2023-07-05 14:04:29 -04:00
Andrew Gallant
4b8aa91ae5 deps: update to pcre2 0.2.4
0.2.4 updates to PCRE2 10.42 and has a few other nice changes. For
example, when `utf` is enabled, the crate will always set the
PCRE2_MATCH_INVALID_UTF option. That means we no longer need to do
transcoding or UTF-8 validity checks.

Because of this, we actually get to remove one of the two uses of
`unsafe` in ripgrep's `main` program.

(This also updates a couple other dependencies for convenience.)
2023-07-05 14:04:29 -04:00
Andrew Gallant
a775b493fd regex: small cleanups
Just some small polishing. We also get rid of thread_local in favor of
using regex-automata, mostly just in the name of reducing dependencies.
(We should eventually be able to drop thread_local completely.)
2023-07-05 14:04:29 -04:00
Andrew Gallant
a6dbff502f regex: s/locations/captures
Now that we use regex-automata, we no longer use any type with
"locations" in it. Instead, that's mostly legacy from the top-level
regex crate.
2023-07-05 14:04:29 -04:00
Andrew Gallant
51480d57a6 regex: simplify AST analysis a bit
The verbatim literal stuff hasn't been used for a while and I don't
foresee it being used. If it's really needed, it would probably be better
to just implement it by looking at the pattern string itself, which
avoids parsing it into an AST altogether.
2023-07-05 14:04:29 -04:00
Andrew Gallant
d9bd261be8 regex: some small cleanup in 'strip.rs'
We also utilize bstr's methods to get rid of some helpers we had written
by hand.
2023-07-05 14:04:29 -04:00
Andrew Gallant
9d62eb997a BREAKING: regex: finally remove CRLF hack
Now that Rust's regex crate finally supports a CRLF mode, we can remove
this giant hack in ripgrep to enable it. (And it assuredly did not work
in all cases.)

The way this works in the regex engine is actually subtly different than
what ripgrep previously did. Namely, --crlf would previously treat
either \r\n or \n as a line terminator. But now it treats \r\n, \n and
\r as line terminators. In effect, it is implemented by treating \r and
\n as line terminators, but ^ and $ will never match at a position
between a \r and a \n.

So basically this means that $ will end up matching in more cases than
might be intended, but I don't expect this to be a big problem in
practice.

Note that passing --crlf to ripgrep and enabling CRLF mode in the regex
via the `R` inline flag (e.g., `(?R:$)`) are subtly different. The `R`
flag just controls the regex engine, but --crlf instructs all of ripgrep
to use \r\n as a line terminator. There are likely some inconsistencies
or corner cases that are wrong as a result of this cognitive dissonance,
but we choose to leave well enough alone for now.

Fixing this for real will probably require re-thinking how line
terminators are handled in ripgrep. For example, one "problem" with how
they're handled now is that ripgrep will re-insert its own line
terminators when printing output instead of copying the input. This is
maybe not so great and perhaps unexpected. (ripgrep probably can't get
away with not inserting any line terminators. Users probably expect
files that don't end with a line terminator whose last line matches to
have a line terminator inserted.)
2023-07-05 14:04:29 -04:00
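A sketch of the CRLF mode as exposed by the regex crate (1.9+ assumed),
showing the line-terminator behavior described above:

```
use regex::RegexBuilder;

fn main() {
    let re = RegexBuilder::new(r"^[a-z]+$")
        .multi_line(true)
        .crlf(true)
        .build()
        .unwrap();
    // '$' matches just before the \r, and '^' never matches between a
    // \r and a \n.
    let hay = "foo\r\nbar\r\n";
    let found: Vec<&str> = re.find_iter(hay).map(|m| m.as_str()).collect();
    assert_eq!(found, vec!["foo", "bar"]);
}
```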
Andrew Gallant
e028ea3792 regex: migrate grep-regex to regex-automata
We just do a "basic" dumb migration. We don't try to improve anything
here.
2023-07-05 14:04:29 -04:00
Andrew Gallant
1035f6b1ff deps: initial migration steps to regex 1.9
This leaves the grep-regex crate in tatters. Pretty much the entire
thing needs to be re-worked. The upshot is that it should result in some
big simplifications. I hope.

The idea here is to drop down and actually use regex-automata 0.3
instead of the regex crate itself.
2023-07-05 14:04:29 -04:00
Andrew Gallant
a7f1276021 readme: update Debian instructions
We probably don't need to mention Buster specifically nor Debian
unstable since ripgrep has been in Debian for a while now.

But we can't just get rid of the `deb` file either, because Debian might
package a very old version.

Fixes #2531
2023-06-12 07:50:13 -04:00
Martin Nordholts
4fcb1b2202 cli: replace atty with std::io::IsTerminal
The `atty` crate is unmaintained[1] and `std::io::IsTerminal` was
stabilized in Rust 1.70.

[1]: https://rustsec.org/advisories/RUSTSEC-2021-0145.html

PR #2526
2023-06-05 14:00:46 -04:00
Francois Marier
949092fd22 ignore/types: add 'mdwn' to Markdown
PR #2520
2023-05-26 14:44:41 -04:00
Andrew Gallant
4a7e7094ad deps: update everything else 2023-05-25 13:06:13 -04:00
Andrew Gallant
fc0d9b90a9 deps: bump regex to 1.8.3
This brings in an update from the regex crate that fixes a matching bug
for particular kinds of alternations of literals.

Fixes #2518
2023-05-25 13:06:13 -04:00
Ville Skyttä
335aa4937a ignore/types: add *.pyi for Python
https://peps.python.org/pep-0484/#stub-files

PR #2517
2023-05-23 07:10:02 -04:00
Adam Reichold
803c447845 searcher: re-enable mmap on 32-bit architectures
memmap2 v0.3.0 introduced a regression when trying to map files larger than 4GB
on 32-bit architectures[1] which was subsequently fixed in v0.3.1[2].

This commit bumps the locked version of the memmap2 dependency to the current v0.5.0
and reverts fdfc418be5 to re-enable mmap on 32-bit
architectures as a different approach to fixing [3].

This was tested to report matches from the end of a 5GB file using MinGW and Wine.

Ref #1911, PR #2000 

[1] 5e271224c8
[2] 9aa838aed9
[3] https://github.com/BurntSushi/ripgrep/issues/1911
2023-05-19 08:23:53 -04:00
Andrew Gallant
c5415adbe8 deps: update everything
This does unfortunately bring in both regex-syntax 0.6 and 0.7, but
we'll fix that once regex 1.9 is out.
2023-05-16 13:14:23 -04:00
Andrew Gallant
251376597f deps: update minimum version of grep crate
Ref #2516
2023-05-16 13:13:34 -04:00
Andrew Gallant
e593f5b7ee grep-0.2.12 2023-05-16 13:12:45 -04:00
Andrew Gallant
6b19be2477 crates/grep: remove 'deny(missing_docs)'
This crate is only a shim over a bunch of other crates. I'm not sure
that there's anything to add to each of the `pub extern` items. So
instead of just writing fluff, I removed the lint.

Fixes #2516
2023-05-16 13:10:42 -04:00
Ryan Whitehouse
041544853c doc: fix --quiet docs
The wording was previously inverted, giving it the opposite of the
intended meaning.

Fixes #1962
2023-03-28 07:22:59 -04:00
Manu
a7ae9e4043 ignore/types: add support for docker-compose files
The default file is docker-compose.yml, and the documentation
mentions overrides in the form of docker-compose.*.yml.

PR #2469
2023-03-21 12:56:38 -04:00
Andrew Gallant
595e7845b8 readme: add a link to delta's support for ripgrep
Ref: https://github.com/BurntSushi/ripgrep/issues/86#issuecomment-1469717706
2023-03-15 08:02:04 -04:00
David Ringo
44fb9fce2c ignore/types: add *.sln for msbuild
.sln is the extension for Visual Studio Project Solution files, one of
the file types accepted as inputs by MSBuild.

PR #2415
2023-02-09 21:20:49 -05:00
Vincent Bockaert
339c46a6ed ignore/types: enhance terraform default filter
The default filter for terraform only checks for *.tf files, but there
are quite few other terraform filetypes.

The explanation for all of them can be found below (including link to
documentation from Hashicorp at time of writing)

- *.tf.json & *.tfvars.json is to capture the files written in
  JSON-based variant of the Terraform language
    - https://developer.hashicorp.com/terraform/language/files
- *.tfvars is used to supply variables
    - https://developer.hashicorp.com/terraform/cloud-docs/workspaces/variables#6-auto-tfvars-variable-files
- .terraform.lock.hcl is used as a Dependency lock file
    - https://developer.hashicorp.com/terraform/language/files/dependency-lock
- terraform.rc & .terraformrc, *.tfrc
    - https://developer.hashicorp.com/terraform/cli/config/config-file

PR #2412
2023-02-09 12:57:01 -05:00
Andrew Gallant
fe97c0a152 ignore-0.4.20 2023-01-15 08:21:02 -05:00
Christian Vallentin
826f3fad5b ignore/api: add Clone and Debug impls for OverrideBuilder
PR #2397
2023-01-15 08:16:27 -05:00
Andrew Gallant
bc55049327 readme: update MSRV in README
... this was apparently long outdated, wow.
2023-01-05 12:09:46 -05:00
Andrew Gallant
d58e9353fc deps: update to grep 0.2.11 2023-01-05 09:13:47 -05:00
Andrew Gallant
ca60fef4db grep-0.2.11 2023-01-05 09:12:49 -05:00
Andrew Gallant
a25307d6c8 deps: update to grep-printer 0.1.7 2023-01-05 09:12:37 -05:00
Andrew Gallant
b80947a8b3 grep-printer-0.1.7 2023-01-05 09:11:16 -05:00
Andrew Gallant
ad793a0d8f deps: update to grep-searcher 0.1.11 2023-01-05 09:07:49 -05:00
Andrew Gallant
120e55e7c7 grep-searcher-0.1.11 2023-01-05 09:07:09 -05:00
Andrew Gallant
3941a7701d deps: update to grep-pcre2 0.1.6 2023-01-05 09:06:52 -05:00
Andrew Gallant
96e130fbf9 grep-pcre2-0.1.6 2023-01-05 09:05:59 -05:00
Andrew Gallant
180c4eaf8b deps: update to grep-regex 0.1.11 2023-01-05 09:05:39 -05:00
Andrew Gallant
81529288cf grep-regex-0.1.11 2023-01-05 09:02:55 -05:00
Andrew Gallant
bcc7473a87 deps: update to grep-matcher 0.1.6 2023-01-05 09:02:40 -05:00
Andrew Gallant
bc78c644db grep-matcher-0.1.6 2023-01-05 09:00:33 -05:00
Andrew Gallant
dc7267a0fb deps: update to grep-cli 0.1.7 2023-01-05 08:58:47 -05:00
Andrew Gallant
3224324e25 grep-cli-0.1.7 2023-01-05 08:57:31 -05:00
Andrew Gallant
0f61f08eb1 deps: update to ignore 0.4.19 2023-01-05 08:57:05 -05:00
Andrew Gallant
a0e8dbe9df ignore-0.4.19 2023-01-05 08:55:46 -05:00
Andrew Gallant
e95254a86f deps: remove ignore's dependency on crossbeam-utils
Scoped threads are now part of std.
2023-01-05 08:51:08 -05:00
Andrew Gallant
2f484d8ce5 deps: update to globset 0.4.10 2023-01-05 08:49:58 -05:00
59 changed files with 2956 additions and 2021 deletions

View File

@@ -1,6 +0,0 @@
version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
interval: "weekly"

View File

@@ -42,31 +42,31 @@ jobs:
- win-gnu
include:
- build: pinned
os: ubuntu-22.04
rust: 1.65.0
os: ubuntu-latest
rust: 1.70.0
- build: stable
os: ubuntu-22.04
os: ubuntu-latest
rust: stable
- build: beta
os: ubuntu-22.04
os: ubuntu-latest
rust: beta
- build: nightly
os: ubuntu-22.04
os: ubuntu-latest
rust: nightly
- build: nightly-musl
os: ubuntu-22.04
os: ubuntu-latest
rust: nightly
target: x86_64-unknown-linux-musl
- build: nightly-32
os: ubuntu-22.04
os: ubuntu-latest
rust: nightly
target: i686-unknown-linux-gnu
- build: nightly-mips
os: ubuntu-22.04
os: ubuntu-latest
rust: nightly
target: mips64-unknown-linux-gnuabi64
- build: nightly-arm
os: ubuntu-22.04
os: ubuntu-latest
rust: nightly
# For stripping release binaries:
# docker run --rm -v $PWD/target:/target:Z \
@@ -75,7 +75,7 @@ jobs:
# /target/arm-unknown-linux-gnueabihf/debug/rg
target: arm-unknown-linux-gnueabihf
- build: macos
os: macos-12
os: macos-latest
rust: nightly
- build: win-msvc
os: windows-2022
@@ -88,12 +88,12 @@ jobs:
uses: actions/checkout@v3
- name: Install packages (Ubuntu)
if: matrix.os == 'ubuntu-22.04'
if: matrix.os == 'ubuntu-latest'
run: |
ci/ubuntu-install-packages
- name: Install packages (macOS)
if: matrix.os == 'macos-12'
if: matrix.os == 'macos-latest'
run: |
ci/macos-install-packages
@@ -178,7 +178,7 @@ jobs:
rustfmt:
name: rustfmt
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v3
@@ -192,7 +192,7 @@ jobs:
docs:
name: Docs
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v3

View File

@@ -24,31 +24,24 @@ on:
jobs:
create-release:
name: create-release
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
# env:
# Set to force version number, e.g., when no tag exists.
# RG_VERSION: TEST-0.0.0
outputs:
upload_url: ${{ steps.release.outputs.upload_url }}
rg_version: ${{ env.RG_VERSION }}
steps:
- uses: actions/checkout@v3
- name: Get the release version from the tag
shell: bash
if: env.RG_VERSION == ''
run: |
# Apparently, this is the right way to get a tag name. Really?
#
# See: https://github.community/t5/GitHub-Actions/How-to-get-just-the-tag-name/m-p/32167/highlight/true#M1027
echo "RG_VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_ENV
echo "RG_VERSION=$GITHUB_REF_NAME" >> $GITHUB_ENV
echo "version is: ${{ env.RG_VERSION }}"
- name: Create GitHub release
id: release
uses: actions/create-release@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
tag_name: ${{ env.RG_VERSION }}
release_name: ${{ env.RG_VERSION }}
GH_TOKEN: ${{ github.token }}
run: gh release create ${{ env.RG_VERSION }}
build-release:
name: build-release
@@ -71,27 +64,27 @@ jobs:
build: [linux, linux-arm, macos, win-msvc, win-gnu, win32-msvc]
include:
- build: linux
os: ubuntu-22.04
os: ubuntu-latest
rust: nightly
target: x86_64-unknown-linux-musl
- build: linux-arm
os: ubuntu-22.04
os: ubuntu-latest
rust: nightly
target: arm-unknown-linux-gnueabihf
- build: macos
os: macos-12
os: macos-latest
rust: nightly
target: x86_64-apple-darwin
- build: win-msvc
os: windows-2022
os: windows-latest
rust: nightly
target: x86_64-pc-windows-msvc
- build: win-gnu
os: windows-2022
os: windows-latest
rust: nightly-x86_64-gnu
target: x86_64-pc-windows-gnu
- build: win32-msvc
os: windows-2022
os: windows-latest
rust: nightly
target: i686-pc-windows-msvc
@@ -100,12 +93,12 @@ jobs:
uses: actions/checkout@v3
- name: Install packages (Ubuntu)
if: matrix.os == 'ubuntu-22.04'
if: matrix.os == 'ubuntu-latest'
run: |
ci/ubuntu-install-packages
- name: Install packages (macOS)
if: matrix.os == 'macos-12'
if: matrix.os == 'macos-latest'
run: |
ci/macos-install-packages
@@ -132,8 +125,8 @@ jobs:
- name: Build release binary
run: ${{ env.CARGO }} build --verbose --release --features pcre2 ${{ env.TARGET_FLAGS }}
- name: Strip release binary (linux and macos)
if: matrix.build == 'linux' || matrix.build == 'macos'
- name: Strip release binary (linux, macos and macos-arm)
if: matrix.build == 'linux' || matrix.os == 'macos'
run: strip "target/${{ matrix.target }}/release/rg"
- name: Strip release binary (arm)
@@ -157,24 +150,23 @@ jobs:
cp "$outdir"/{rg.bash,rg.fish,_rg.ps1} "$staging/complete/"
cp complete/_rg "$staging/complete/"
if [ "${{ matrix.os }}" = "windows-2022" ]; then
if [ "${{ matrix.os }}" = "windows-latest" ]; then
cp "target/${{ matrix.target }}/release/rg.exe" "$staging/"
7z a "$staging.zip" "$staging"
certutil -hashfile "$staging.zip" SHA256 > "$staging.zip.sha256"
echo "ASSET=$staging.zip" >> $GITHUB_ENV
echo "ASSET_SUM=$staging.zip.sha256" >> $GITHUB_ENV
else
# The man page is only generated on Unix systems. ¯\_(ツ)_/¯
cp "$outdir"/rg.1 "$staging/doc/"
cp "target/${{ matrix.target }}/release/rg" "$staging/"
tar czf "$staging.tar.gz" "$staging"
shasum -a 256 "$staging.tar.gz" > "$staging.tar.gz.sha256"
echo "ASSET=$staging.tar.gz" >> $GITHUB_ENV
echo "ASSET_SUM=$staging.tar.gz.sha256" >> $GITHUB_ENV
fi
- name: Upload release archive
uses: actions/upload-release-asset@v1.0.2
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.create-release.outputs.upload_url }}
asset_path: ${{ env.ASSET }}
asset_name: ${{ env.ASSET }}
asset_content_type: application/octet-stream
GH_TOKEN: ${{ github.token }}
run: gh release upload ${{ needs.create-release.outputs.rg_version }} ${{ env.ASSET }} ${{ env.ASSET_SUM }}

View File

@@ -2,14 +2,42 @@ TBD
===
Unreleased changes. Release notes have not yet been written.
**BREAKING CHANGES**
* `rg -C1 -A2` used to be equivalent to `rg -A2`, but now it is equivalent to
`rg -B1 -A2`. That is, `-A` and `-B` no longer completely override `-C`.
Instead, they only partially override `-C`.
Feature enhancements:
* Added or improved file type filtering for Ada, DITA, Elixir, Fuchsia, Gentoo, GraphQL, Markdown, Raku, TypeScript, USD, V
* [FEATURE #1790](https://github.com/BurntSushi/ripgrep/issues/1790):
Add new `--stop-on-nonmatch` flag.
* [FEATURE #2195](https://github.com/BurntSushi/ripgrep/issues/2195):
When `extra-verbose` mode is enabled in zsh, show extra file type info.
* [FEATURE #2409](https://github.com/BurntSushi/ripgrep/pull/2409):
Added installation instructions for `winget`.
Bug fixes:
* [BUG #1891](https://github.com/BurntSushi/ripgrep/issues/1891):
Fix bug when using `-w` with a regex that can match the empty string.
* [BUG #1911](https://github.com/BurntSushi/ripgrep/issues/1911):
Disable mmap searching in all non-64-bit environments.
* [BUG #2108](https://github.com/BurntSushi/ripgrep/issues/2108):
Improve docs for `-r/--replace` syntax.
* [BUG #2198](https://github.com/BurntSushi/ripgrep/issues/2198):
Fix bug where `--no-ignore-dot` would not ignore `.rgignore`.
* [BUG #2288](https://github.com/BurntSushi/ripgrep/issues/2288):
`-A` and `-B` now only each partially override `-C`.
* [BUG #2236](https://github.com/BurntSushi/ripgrep/issues/2236):
Fix gitignore parsing bug where a trailing `\/` resulted in an error.
* [BUG #2243](https://github.com/BurntSushi/ripgrep/issues/2243):
Fix `--sort` flag for values other than `path`.
* [BUG #2480](https://github.com/BurntSushi/ripgrep/issues/2480):
Fix bug when using inline regex flags with `-e/--regexp`.
* [BUG #2523](https://github.com/BurntSushi/ripgrep/issues/2523):
Make executable searching take `.com` into account on Windows.
13.0.0 (2021-06-12)

Cargo.lock generated
View File

@@ -4,24 +4,13 @@ version = 3
[[package]]
name = "aho-corasick"
version = "0.7.20"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cc936419f96fa211c1b9166887b38e5e40b19958e5b895be7c1f93adec7071ac"
checksum = "43f6cb1bf222025340178f382c426f13757b2960e89779dfcb319c32542a5a41"
dependencies = [
"memchr",
]
[[package]]
name = "atty"
version = "0.2.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d9b39be18770d11421cdb1b9947a45dd3f37e93092cbf377614828a319d5fee8"
dependencies = [
"hermit-abi",
"libc",
"winapi",
]
[[package]]
name = "base64"
version = "0.20.0"
@@ -36,12 +25,11 @@ checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a"
[[package]]
name = "bstr"
version = "1.1.0"
version = "1.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b45ea9b00a7b3f2988e9a65ad3917e62123c38dba709b666506207be96d1790b"
checksum = "6798148dccfbff0fae41c7574d2fa8f1ef3492fba0face179de5d8d447d67b05"
dependencies = [
"memchr",
"once_cell",
"regex-automata",
"serde",
]
@@ -54,9 +42,9 @@ checksum = "2c676a478f63e9fa2dd5368a42f28bba0d6c560b775f38583c8bbaa7fcd67c9c"
[[package]]
name = "cc"
version = "1.0.78"
version = "1.0.79"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a20104e2335ce8a659d6dd92a51a767a0c062599c73b343fd152cb401e828c3d"
checksum = "50d30906286121d95be3d479533b458f87493b30a4b5f79a607db8f5d11aa91f"
dependencies = [
"jobserver",
]
@@ -81,9 +69,9 @@ dependencies = [
[[package]]
name = "crossbeam-channel"
version = "0.5.6"
version = "0.5.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c2dd04ddaf88237dc3b8d8f9a3c1004b506b54b3313403944054d23c0870c521"
checksum = "a33c2bf77f2df06183c3aa30d1e96c0695a313d4f9c453cc3762a6db39f99200"
dependencies = [
"cfg-if",
"crossbeam-utils",
@@ -91,18 +79,18 @@ dependencies = [
[[package]]
name = "crossbeam-utils"
version = "0.8.14"
version = "0.8.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4fb766fa798726286dbbb842f174001dab8abc7b627a1dd86e0b7222a95d929f"
checksum = "5a22b2d63d4d1dc0b7f1b6b2747dd0088008a9be28b6ddf0b1e7d335e3037294"
dependencies = [
"cfg-if",
]
[[package]]
name = "encoding_rs"
version = "0.8.31"
version = "0.8.32"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9852635589dc9f9ea1b6fe9f05b50ef208c85c834a562f0c6abb1c475736ec2b"
checksum = "071a31f4ee85403370b58aca746f01041ede6f0da2730960ad001edc2b71b394"
dependencies = [
"cfg-if",
"packed_simd_2",
@@ -123,21 +111,15 @@ version = "1.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1"
[[package]]
name = "fs_extra"
version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2022715d62ab30faffd124d40b76f4134a550a87792276512b18d63272333394"
[[package]]
name = "glob"
version = "0.3.0"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9b919933a397b79c37e33b77bb2aa3dc8eb6e165ad809e58ff75bc7db2e34574"
checksum = "d2fabcfbdc87f4758337ca535fb41a6d701b65693ce38287d856d1674551ec9b"
[[package]]
name = "globset"
version = "0.4.10"
version = "0.4.12"
dependencies = [
"aho-corasick",
"bstr",
@@ -152,7 +134,7 @@ dependencies = [
[[package]]
name = "grep"
version = "0.2.10"
version = "0.2.12"
dependencies = [
"grep-cli",
"grep-matcher",
@@ -166,9 +148,8 @@ dependencies = [
[[package]]
name = "grep-cli"
version = "0.1.6"
version = "0.1.9"
dependencies = [
"atty",
"bstr",
"globset",
"lazy_static",
@@ -181,7 +162,7 @@ dependencies = [
[[package]]
name = "grep-matcher"
version = "0.1.5"
version = "0.1.6"
dependencies = [
"memchr",
"regex",
@@ -189,15 +170,16 @@ dependencies = [
[[package]]
name = "grep-pcre2"
version = "0.1.5"
version = "0.1.6"
dependencies = [
"grep-matcher",
"log",
"pcre2",
]
[[package]]
name = "grep-printer"
version = "0.1.6"
version = "0.1.7"
dependencies = [
"base64",
"bstr",
@@ -211,20 +193,19 @@ dependencies = [
[[package]]
name = "grep-regex"
version = "0.1.10"
version = "0.1.11"
dependencies = [
"aho-corasick",
"bstr",
"grep-matcher",
"log",
"regex",
"regex-automata",
"regex-syntax",
"thread_local",
]
[[package]]
name = "grep-searcher"
version = "0.1.10"
version = "0.1.11"
dependencies = [
"bstr",
"bytecount",
@@ -237,21 +218,11 @@ dependencies = [
"regex",
]
[[package]]
name = "hermit-abi"
version = "0.1.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "62b467343b94ba476dcb2500d242dadbb39557df889310ac77c5d99100aaac33"
dependencies = [
"libc",
]
[[package]]
name = "ignore"
version = "0.4.18"
version = "0.4.20"
dependencies = [
"crossbeam-channel",
"crossbeam-utils",
"globset",
"lazy_static",
"log",
@@ -265,18 +236,17 @@ dependencies = [
[[package]]
name = "itoa"
version = "1.0.5"
version = "1.0.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fad582f4b9e86b6caa621cabeb0963332d92eea04729ab12892c2533951e6440"
checksum = "62b02a5381cc465bd3041d84623d0fa3b66738b52b8e2fc3bab8ad63ab032f4a"
[[package]]
name = "jemalloc-sys"
version = "0.5.2+5.3.0-patched"
version = "0.5.3+5.3.0-patched"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "134163979b6eed9564c98637b710b40979939ba351f59952708234ea11b5f3f8"
checksum = "f9bd5d616ea7ed58b571b2e209a65759664d7fb021a0819d7a790afc67e47ca1"
dependencies = [
"cc",
"fs_extra",
"libc",
]
@@ -292,9 +262,9 @@ dependencies = [
[[package]]
name = "jobserver"
version = "0.1.25"
version = "0.1.26"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "068b1ee6743e4d11fb9c6a1e6064b3693a1b600e7f5f5988047d98b3dc9fb90b"
checksum = "936cfd212a0155903bcbc060e316fb6cc7cbf2e1907329391ebadc1fe0ce77c2"
dependencies = [
"libc",
]
@@ -307,9 +277,9 @@ checksum = "e2abad23fbc42b3700f2f279844dc832adb2b2eb069b2df918f455c4e18cc646"
[[package]]
name = "libc"
version = "0.2.139"
version = "0.2.147"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "201de327520df007757c1f0adce6e827fe8562fbc28bfd9c15571c66ca1f5f79"
checksum = "b4668fb0ea861c1df094127ac5f1da3409a82116a4ba74fca2e58ef927159bb3"
[[package]]
name = "libm"
@@ -319,12 +289,9 @@ checksum = "7fc7aa29613bd6a620df431842069224d8bc9011086b1db4c0e0cd47fa03ec9a"
[[package]]
name = "log"
version = "0.4.17"
version = "0.4.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "abb12e687cfb44aa40f41fc3978ef76448f9b6038cad6aef4259d3c095a2382e"
dependencies = [
"cfg-if",
]
checksum = "b06a4cde4c0f271a446782e3eff8de789548ce57dbc8eca9292c27f4a42004b4"
[[package]]
name = "memchr"
@@ -334,18 +301,18 @@ checksum = "2dffe52ecf27772e601905b7522cb4ef790d2cc203488bbd0e2fe85fcb74566d"
[[package]]
name = "memmap2"
version = "0.5.8"
version = "0.5.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4b182332558b18d807c4ce1ca8ca983b34c3ee32765e47b3f0f69b90355cc1dc"
checksum = "83faa42c0a078c393f6b29d5db232d8be22776a891f8f56e5284faee4a20b327"
dependencies = [
"libc",
]
[[package]]
name = "once_cell"
version = "1.17.0"
version = "1.18.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6f61fba1741ea2b3d6a1e3178721804bb716a68a6aeba1149b5d52e3d464ea66"
checksum = "dd8b5dd2ae5ed71462c540258bedcb51965123ad7e7ccf4b9a8cafaa4a63576d"
[[package]]
name = "packed_simd_2"
@@ -359,9 +326,9 @@ dependencies = [
[[package]]
name = "pcre2"
version = "0.2.3"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85b30f2f69903b439dd9dc9e824119b82a55bf113b29af8d70948a03c1b11ab1"
checksum = "486aca7e74edb8cab09a48d461177f450a5cca3b55e61d139f7552190e2bbcf5"
dependencies = [
"libc",
"log",
@@ -371,9 +338,9 @@ dependencies = [
[[package]]
name = "pcre2-sys"
version = "0.2.5"
version = "0.2.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dec30e5e9ec37eb8fbf1dea5989bc957fd3df56fbee5061aa7b7a99dbb37b722"
checksum = "ae234f441970dbd52d4e29bee70f3b56ca83040081cb2b55b7df772b16e0b06e"
dependencies = [
"cc",
"libc",
@@ -382,50 +349,56 @@ dependencies = [
[[package]]
name = "pkg-config"
version = "0.3.26"
version = "0.3.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6ac9a59f73473f1b8d852421e59e64809f025994837ef743615c6d0c5b305160"
checksum = "26072860ba924cbfa98ea39c8c19b4dd6a4a25423dbdf219c1eca91aa0cf6964"
[[package]]
name = "proc-macro2"
version = "1.0.49"
version = "1.0.63"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "57a8eca9f9c4ffde41714334dee777596264c7825420f521abc92b5b5deb63a5"
checksum = "7b368fba921b0dce7e60f5e04ec15e565b3303972b42bcfde1d0713b881959eb"
dependencies = [
"unicode-ident",
]
[[package]]
name = "quote"
version = "1.0.23"
version = "1.0.29"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8856d8364d252a14d474036ea1358d63c9e6965c8e5c1885c18f73d70bff9c7b"
checksum = "573015e8ab27661678357f27dc26460738fd2b6c86e46f386fde94cb5d913105"
dependencies = [
"proc-macro2",
]
[[package]]
name = "regex"
version = "1.7.0"
version = "1.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e076559ef8e241f2ae3479e36f97bd5741c0330689e217ad51ce2c76808b868a"
checksum = "89089e897c013b3deb627116ae56a6955a72b8bed395c9526af31c9fe528b484"
dependencies = [
"aho-corasick",
"memchr",
"regex-automata",
"regex-syntax",
]
[[package]]
name = "regex-automata"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fa250384981ea14565685dea16a9ccc4d1c541a13f82b9c168572264d1df8c56"
dependencies = [
"aho-corasick",
"memchr",
"regex-syntax",
]
[[package]]
name = "regex-automata"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6c230d73fb8d8c1b9c0b3135c5142a8acee3a0558fb8db5cf1cb65f8d7862132"
[[package]]
name = "regex-syntax"
version = "0.6.28"
version = "0.7.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "456c603be3e8d448b072f410900c09faf164fbce2d480456f50eea6e25f9c848"
checksum = "2ab07dc67230e4a4718e70fd5c20055a4334b121f1f9db8fe63ef39ce9b8c846"
[[package]]
name = "ripgrep"
@@ -438,7 +411,6 @@ dependencies = [
"jemallocator",
"lazy_static",
"log",
"regex",
"serde",
"serde_derive",
"serde_json",
@@ -448,9 +420,9 @@ dependencies = [
[[package]]
name = "ryu"
version = "1.0.12"
version = "1.0.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7b4b9743ed687d4b4bcedf9ff5eaa7398495ae14e61cba0a295704edbc7decde"
checksum = "fe232bdf6be8c8de797b22184ee71118d63780ea42ac85b61d1baa6d3b782ae9"
[[package]]
name = "same-file"
@@ -463,18 +435,18 @@ dependencies = [
[[package]]
name = "serde"
version = "1.0.152"
version = "1.0.166"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bb7d1f0d3021d347a83e556fc4683dea2ea09d87bccdf88ff5c12545d89d5efb"
checksum = "d01b7404f9d441d3ad40e6a636a7782c377d2abdbe4fa2440e2edcc2f4f10db8"
dependencies = [
"serde_derive",
]
[[package]]
name = "serde_derive"
version = "1.0.152"
version = "1.0.166"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "af487d118eecd09402d70a5d72551860e788df87b464af30e5ea6a38c75c541e"
checksum = "5dd83d6dde2b6b2d466e14d9d1acce8816dedee94f735eac6395808b3483c6d6"
dependencies = [
"proc-macro2",
"quote",
@@ -483,9 +455,9 @@ dependencies = [
[[package]]
name = "serde_json"
version = "1.0.91"
version = "1.0.100"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "877c235533714907a8c2464236f5c4b2a17262ef1bd71f38f35ea592c8da6883"
checksum = "0f1e14e89be7aa4c4b78bdbdc9eb5bf8517829a600ae8eaa39a6e1d960b5185c"
dependencies = [
"itoa",
"ryu",
@@ -500,9 +472,9 @@ checksum = "8ea5119cdb4c55b55d432abb513a0429384878c15dde60cc77b1c99de1a95a6a"
[[package]]
name = "syn"
version = "1.0.107"
version = "2.0.23"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1f4064b5b16e03ae50984a5a8ed5d4f8803e6bc1fd170a3cda91a1be4b18e3f5"
checksum = "59fb7d6d8281a51045d62b8eb3a7d1ce347b76f312af50cd3dc0af39c87c1737"
dependencies = [
"proc-macro2",
"quote",
@@ -511,9 +483,9 @@ dependencies = [
[[package]]
name = "termcolor"
version = "1.1.3"
version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bab24d30b911b2376f3a13cc2cd443142f0c81dda04c118693e35b3835757755"
checksum = "be55cf8942feac5c765c2c993422806843c9a9a45d4d5c407ad6dd2ea95eb9b6"
dependencies = [
"winapi-util",
]
@@ -529,18 +501,19 @@ dependencies = [
[[package]]
name = "thread_local"
version = "1.1.4"
version = "1.1.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5516c27b78311c50bf42c071425c560ac799b11c30b31f87e3081965fe5e0180"
checksum = "3fdd6f064ccff2d6567adcb3873ca630700f00b5ad3f060c25b5dcfd9a4ce152"
dependencies = [
"cfg-if",
"once_cell",
]
[[package]]
name = "unicode-ident"
version = "1.0.6"
version = "1.0.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "84a22b9f218b40614adcb3f4ff08b703773ad44fa9423e4e0d346d5db86e4ebc"
checksum = "22049a19f4a68748a168c0fc439f9516686aa045927ff767eca0a85101fb6e73"
[[package]]
name = "unicode-width"
@@ -550,12 +523,11 @@ checksum = "c0edd1e5b14653f783770bce4a4dabb4a5108a5370a5f5d8cfe8710c361f6c8b"
[[package]]
name = "walkdir"
version = "2.3.2"
version = "2.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "808cf2735cd4b6866113f648b791c6adc5714537bc222d9347bb203386ffda56"
checksum = "36df944cda56c7d8d8b7496af378e6b16de9284591917d307c9b4d313c44e698"
dependencies = [
"same-file",
"winapi",
"winapi-util",
]

View File

@@ -13,11 +13,18 @@ repository = "https://github.com/BurntSushi/ripgrep"
keywords = ["regex", "grep", "egrep", "search", "pattern"]
categories = ["command-line-utilities", "text-processing"]
license = "Unlicense OR MIT"
exclude = ["HomebrewFormula"]
exclude = [
"HomebrewFormula",
"/.github/",
"/ci/",
"/pkg/",
"/benchsuite/",
"/scripts/",
]
build = "build.rs"
autotests = false
edition = "2018"
rust-version = "1.65"
rust-version = "1.70"
[[bin]]
bench = false
@@ -42,12 +49,11 @@ members = [
]
[dependencies]
bstr = "1.1.0"
grep = { version = "0.2.8", path = "crates/grep" }
ignore = { version = "0.4.18", path = "crates/ignore" }
bstr = "1.6.0"
grep = { version = "0.2.12", path = "crates/grep" }
ignore = { version = "0.4.19", path = "crates/ignore" }
lazy_static = "1.1.0"
log = "0.4.5"
regex = "1.3.5"
serde_json = "1.0.23"
termcolor = "1.1.0"

View File

@@ -6,6 +6,7 @@ image = "burntsushi/cross:i686-unknown-linux-gnu"
[target.mips64-unknown-linux-gnuabi64]
image = "burntsushi/cross:mips64-unknown-linux-gnuabi64"
build-std = true
[target.arm-unknown-linux-gnueabihf]
image = "burntsushi/cross:arm-unknown-linux-gnueabihf"

View File

@@ -567,12 +567,15 @@ $ cat $HOME/.ripgreprc
--type-add
web:*.{html,css,js}*
# Search hidden files / directories (e.g. dotfiles) by default
--hidden
# Using glob patterns to include/exclude files or folders
--glob=!git/*
--glob=!.git/*
# or
--glob
!git/*
!.git/*
# Set the colors.
--colors=line:none

View File

@@ -228,17 +228,25 @@ If you're a **Windows Scoop** user, then you can install ripgrep from the
$ scoop install ripgrep
```
If you're a **Windows Winget** user, then you can install ripgrep from the
[winget-pkgs](https://github.com/microsoft/winget-pkgs/tree/master/manifests/b/BurntSushi/ripgrep)
repository:
```
$ winget install BurntSushi.ripgrep.MSVC
```
If you're an **Arch Linux** user, then you can install ripgrep from the official repos:
```
$ pacman -S ripgrep
$ sudo pacman -S ripgrep
```
If you're a **Gentoo** user, you can install ripgrep from the
[official repo](https://packages.gentoo.org/packages/sys-apps/ripgrep):
```
$ emerge sys-apps/ripgrep
$ sudo emerge sys-apps/ripgrep
```
If you're a **Fedora** user, you can install ripgrep from official
@@ -259,6 +267,7 @@ If you're a **RHEL/CentOS 7/8** user, you can install ripgrep from
[copr](https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/):
```
$ sudo yum install -y yum-utils
$ sudo yum-config-manager --add-repo=https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/repo/epel-7/carlwgeorge-ripgrep-epel-7.repo
$ sudo yum install ripgrep
```
@@ -268,14 +277,13 @@ If you're a **Nix** user, you can install ripgrep from
```
$ nix-env --install ripgrep
$ # (Or using the attribute name, which is also ripgrep.)
```
If you're a **Guix** user, you can install ripgrep from the official
package collection:
```
$ guix install ripgrep
$ sudo guix install ripgrep
```
If you're a **Debian** user (or a user of a Debian derivative like **Ubuntu**),
@@ -287,8 +295,10 @@ $ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/13.0.0/ripgre
$ sudo dpkg -i ripgrep_13.0.0_amd64.deb
```
If you run Debian Buster (currently Debian stable) or Debian sid, ripgrep is
[officially maintained by Debian](https://tracker.debian.org/pkg/rust-ripgrep).
If you run Debian stable, ripgrep is [officially maintained by
Debian](https://tracker.debian.org/pkg/rust-ripgrep), although its version may
be older than the `deb` package available in the previous step.
```
$ sudo apt-get install ripgrep
```
@@ -306,11 +316,18 @@ seem to work right and generate a number of very strange bug reports that I
don't know how to fix and don't have the time to fix. Therefore, it is no
longer a recommended installation option.)
If you're an **ALT** user, you can install ripgrep from the
[official repo](https://packages.altlinux.org/en/search?name=ripgrep):
```
$ sudo apt-get install ripgrep
```
If you're a **FreeBSD** user, then you can install ripgrep from the
[official ports](https://www.freshports.org/textproc/ripgrep/):
```
# pkg install ripgrep
$ sudo pkg install ripgrep
```
If you're an **OpenBSD** user, then you can install ripgrep from the
@@ -324,26 +341,26 @@ If you're a **NetBSD** user, then you can install ripgrep from
[pkgsrc](https://pkgsrc.se/textproc/ripgrep):
```
# pkgin install ripgrep
$ sudo pkgin install ripgrep
```
If you're a **Haiku x86_64** user, then you can install ripgrep from the
[official ports](https://github.com/haikuports/haikuports/tree/master/sys-apps/ripgrep):
```
$ pkgman install ripgrep
$ sudo pkgman install ripgrep
```
If you're a **Haiku x86_gcc2** user, then you can install ripgrep from the
same port as Haiku x86_64 using the x86 secondary architecture build:
```
$ pkgman install ripgrep_x86
$ sudo pkgman install ripgrep_x86
```
If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
* Note that the minimum supported version of Rust for ripgrep is **1.34.0**,
* Note that the minimum supported version of Rust for ripgrep is **1.70.0**,
although ripgrep may work with older versions.
* Note that the binary may be bigger than expected because it contains debug
symbols. This is intentional. To remove debug symbols and therefore reduce
@@ -358,7 +375,7 @@ $ cargo install ripgrep
ripgrep is written in Rust, so you'll need to grab a
[Rust installation](https://www.rust-lang.org/) in order to compile it.
ripgrep compiles with Rust 1.65.0 (stable) or newer. In general, ripgrep tracks
ripgrep compiles with Rust 1.70.0 (stable) or newer. In general, ripgrep tracks
the latest stable release of the Rust compiler.
To build ripgrep:
@@ -430,12 +447,20 @@ $ cargo test --all
from the repository root.
### Related tools
* [delta](https://github.com/dandavison/delta) is a syntax highlighting
pager that supports the `rg --json` output format. So all you need to do to
make it work is `rg --json pattern | delta`. See [delta's manual section on
grep](https://dandavison.github.io/delta/grep.html) for more details.
### Vulnerability reporting
For reporting a security vulnerability, please
[contact Andrew Gallant](https://blog.burntsushi.net/about/),
which has my email address and PGP public key if you wish to send an encrypted
message.
[contact Andrew Gallant](https://blog.burntsushi.net/about/).
The contact page has my email address and PGP public key if you wish to send an
encrypted message.
### Translations

View File

@@ -48,6 +48,34 @@ fn main() {
if let Some(rev) = git_revision_hash() {
println!("cargo:rustc-env=RIPGREP_BUILD_GIT_HASH={}", rev);
}
// Embed a Windows manifest and set some linker options. The main reason
// for this is to enable long path support on Windows. This still, I
// believe, requires enabling long path support in the registry. But if
// that's enabled, then this will let ripgrep use C:\... style paths that
// are longer than 260 characters.
set_windows_exe_options();
}
fn set_windows_exe_options() {
static MANIFEST: &str = "pkg/windows/Manifest.xml";
let Ok(target_os) = env::var("CARGO_CFG_TARGET_OS") else { return };
let Ok(target_env) = env::var("CARGO_CFG_TARGET_ENV") else { return };
if !(target_os == "windows" && target_env == "msvc") {
return;
}
let Ok(mut manifest) = env::current_dir() else { return };
manifest.push(MANIFEST);
let Some(manifest) = manifest.to_str() else { return };
println!("cargo:rerun-if-changed={}", MANIFEST);
// Embed the Windows application manifest file.
println!("cargo:rustc-link-arg-bin=rg=/MANIFEST:EMBED");
println!("cargo:rustc-link-arg-bin=rg=/MANIFESTINPUT:{manifest}");
// Turn linker warnings into errors. Helps debugging, otherwise the
// warnings get squashed (I believe).
println!("cargo:rustc-link-arg-bin=rg=/WX");
}
fn git_revision_hash() -> Option<String> {

View File

@@ -30,7 +30,7 @@ _rg() {
[[ $_RG_COMPLETE_LIST_ARGS == (1|t*|y*) ]] ||
# (--[imnp]* => --ignore*, --messages, --no-*, --pcre2-unicode)
[[ $PREFIX$SUFFIX == --[imnp]* ]] ||
zstyle -t ":complete:$curcontext:*" complete-all
zstyle -t ":completion:${curcontext}:" complete-all
then
no=
fi
@@ -319,6 +319,7 @@ _rg() {
'(-q --quiet)'{-q,--quiet}'[suppress normal output]'
'--regex-size-limit=[specify upper size limit of compiled regex]:regex size (bytes)'
'*'{-u,--unrestricted}'[reduce level of "smart" searching]'
'--stop-on-nonmatch[stop on first non-matching line after a matching one]'
+ operand # Operands
'(--files --type-list file regexp)1: :_guard "^-*" pattern'
@@ -432,9 +433,13 @@ _rg_types() {
local -a expl
local -aU _types
_types=( ${(@)${(f)"$( _call_program types rg --type-list )"}%%:*} )
_types=( ${(@)${(f)"$( _call_program types $words[1] --type-list )"}//:[[:space:]]##/:} )
_wanted types expl 'file type' compadd -a "$@" - _types
if zstyle -t ":completion:${curcontext}:types" extra-verbose; then
_describe -t types 'file type' _types
else
_wanted types expl 'file type' compadd "$@" - ${(@)_types%%:*}
fi
}
_rg "$@"

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-cli"
version = "0.1.6" #:version
version = "0.1.9" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Utilities for search oriented command line applications.
@@ -14,9 +14,8 @@ license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
atty = "0.2.11"
bstr = "1.1.0"
globset = { version = "0.4.9", path = "../globset" }
bstr = "1.6.0"
globset = { version = "0.4.10", path = "../globset" }
lazy_static = "1.1.0"
log = "0.4.5"
regex = "1.1"

View File

@@ -18,7 +18,7 @@ pub struct DecompressionMatcherBuilder {
}
/// A representation of a single command for decompressing data
/// out-of-proccess.
/// out-of-process.
#[derive(Clone, Debug)]
struct DecompressionCommand {
/// The glob that matches this command.
@@ -132,7 +132,7 @@ impl DecompressionMatcherBuilder {
A: AsRef<OsStr>,
{
let glob = glob.to_string();
let bin = resolve_binary(Path::new(program.as_ref()))?;
let bin = try_resolve_binary(Path::new(program.as_ref()))?;
let args =
args.into_iter().map(|a| a.as_ref().to_os_string()).collect();
self.commands.push(DecompressionCommand { glob, bin, args });
@@ -421,6 +421,34 @@ impl io::Read for DecompressionReader {
/// On non-Windows, this is a no-op.
pub fn resolve_binary<P: AsRef<Path>>(
prog: P,
) -> Result<PathBuf, CommandError> {
if !cfg!(windows) {
return Ok(prog.as_ref().to_path_buf());
}
try_resolve_binary(prog)
}
/// Resolves a path to a program to a path by searching for the program in
/// `PATH`.
///
/// If the program could not be resolved, then an error is returned.
///
/// The purpose of doing this instead of passing the path to the program
/// directly to Command::new is that Command::new will hand relative paths
/// to CreateProcess on Windows, which will implicitly search the current
/// working directory for the executable. This could be undesirable for
/// security reasons. e.g., running ripgrep with the -z/--search-zip flag on an
/// untrusted directory tree could result in arbitrary programs executing on
/// Windows.
///
/// Note that this could still return a relative path if PATH contains a
/// relative path. We permit this since it is assumed that the user has set
/// this explicitly, and thus, desires this behavior.
///
/// If `check_exists` is false or the path is already an absolute path, this
/// will return immediately.
fn try_resolve_binary<P: AsRef<Path>>(
prog: P,
) -> Result<PathBuf, CommandError> {
use std::env;
@@ -433,7 +461,7 @@ pub fn resolve_binary<P: AsRef<Path>>(
}
let prog = prog.as_ref();
if !cfg!(windows) || prog.is_absolute() {
if prog.is_absolute() {
return Ok(prog.to_path_buf());
}
let syspaths = match env::var_os("PATH") {
@@ -455,12 +483,14 @@ pub fn resolve_binary<P: AsRef<Path>>(
return Ok(abs_prog.to_path_buf());
}
if abs_prog.extension().is_none() {
let abs_prog = abs_prog.with_extension("exe");
for extension in ["com", "exe"] {
let abs_prog = abs_prog.with_extension(extension);
if is_exe(&abs_prog) {
return Ok(abs_prog.to_path_buf());
}
}
}
}
let msg = format!("{}: could not find executable in PATH", prog.display());
return Err(CommandError::io(io::Error::new(io::ErrorKind::Other, msg)));
}

View File

@@ -165,6 +165,8 @@ mod pattern;
mod process;
mod wtr;
use std::io::IsTerminal;
pub use crate::decompress::{
resolve_binary, DecompressionMatcher, DecompressionMatcherBuilder,
DecompressionReader, DecompressionReaderBuilder,
@@ -215,7 +217,7 @@ pub fn is_readable_stdin() -> bool {
/// Returns true if and only if stdin is believed to be connected to a tty
/// or a console.
pub fn is_tty_stdin() -> bool {
atty::is(atty::Stream::Stdin)
std::io::stdin().is_terminal()
}
/// Returns true if and only if stdout is believed to be connected to a tty
@@ -227,11 +229,11 @@ pub fn is_tty_stdin() -> bool {
/// implementations of `ls` will often show one item per line when stdout is
/// redirected, but will condense output when printing to a tty.
pub fn is_tty_stdout() -> bool {
atty::is(atty::Stream::Stdout)
std::io::stdout().is_terminal()
}
/// Returns true if and only if stderr is believed to be connected to a tty
/// or a console.
pub fn is_tty_stderr() -> bool {
atty::is(atty::Stream::Stderr)
std::io::stderr().is_terminal()
}
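
The three functions above now rest on `std::io::IsTerminal`, stabilized in Rust 1.70, which removes the `atty` dependency. A minimal usage sketch:

```rust
use std::io::IsTerminal;

fn main() {
    // Mirrors the pattern above: branch on whether stdout is a tty.
    if std::io::stdout().is_terminal() {
        println!("stdout is a terminal; human-friendly output fits here");
    } else {
        println!("stdout is redirected; machine-friendly output fits here");
    }
}
```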

View File

@@ -632,6 +632,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_sort(&mut args);
flag_sortr(&mut args);
flag_stats(&mut args);
flag_stop_on_nonmatch(&mut args);
flag_text(&mut args);
flag_threads(&mut args);
flag_trim(&mut args);
@@ -698,7 +699,7 @@ fn flag_after_context(args: &mut Vec<RGArg>) {
"\
Show NUM lines after each match.
This overrides the --context and --passthru flags.
This overrides the --passthru flag and partially overrides --context.
"
);
let arg = RGArg::flag("after-context", "NUM")
@@ -706,8 +707,7 @@ This overrides the --context and --passthru flags.
.help(SHORT)
.long_help(LONG)
.number()
.overrides("passthru")
.overrides("context");
.overrides("passthru");
args.push(arg);
}
@@ -768,7 +768,7 @@ fn flag_before_context(args: &mut Vec<RGArg>) {
"\
Show NUM lines before each match.
This overrides the --context and --passthru flags.
This overrides the --passthru flag and partially overrides --context.
"
);
let arg = RGArg::flag("before-context", "NUM")
@@ -776,8 +776,7 @@ This overrides the --context and --passthru flags.
.help(SHORT)
.long_help(LONG)
.number()
.overrides("passthru")
.overrides("context");
.overrides("passthru");
args.push(arg);
}
@@ -1009,8 +1008,7 @@ fn flag_context(args: &mut Vec<RGArg>) {
Show NUM lines before and after each match. This is equivalent to providing
both the -B/--before-context and -A/--after-context flags with the same value.
This overrides both the -B/--before-context and -A/--after-context flags,
in addition to the --passthru flag.
This overrides the --passthru flag.
"
);
let arg = RGArg::flag("context", "NUM")
@@ -1018,9 +1016,7 @@ in addition to the --passthru flag.
.help(SHORT)
.long_help(LONG)
.number()
.overrides("passthru")
.overrides("before-context")
.overrides("after-context");
.overrides("passthru");
args.push(arg);
}
@@ -1711,6 +1707,8 @@ fn flag_line_number(args: &mut Vec<RGArg>) {
"\
Show line numbers (1-based). This is enabled by default when searching in a
terminal.
This flag overrides --no-line-number.
"
);
let arg = RGArg::switch("line-number")
@@ -1725,6 +1723,8 @@ terminal.
"\
Suppress line numbers. This is enabled by default when not searching in a
terminal.
This flag overrides --line-number.
"
);
let arg = RGArg::switch("no-line-number")
@@ -1927,13 +1927,16 @@ Nevertheless, if you only care about matches spanning at most one line, then it
is always better to disable multiline mode.
This flag can be disabled with --no-multiline.
This overrides the --stop-on-nonmatch flag.
"
);
let arg = RGArg::switch("multiline")
.short("U")
.help(SHORT)
.long_help(LONG)
.overrides("no-multiline");
.overrides("no-multiline")
.overrides("stop-on-nonmatch");
args.push(arg);
let arg = RGArg::switch("no-multiline").hidden().overrides("multiline");
@@ -2583,8 +2586,8 @@ Do not print anything to stdout. If a match is found in a file, then ripgrep
will stop searching. This is useful when ripgrep is used only for its exit
code (which will be an error if no matches are found).
When --files is used, then ripgrep will stop finding files after finding the
first file that matches all ignore rules.
When --files is used, ripgrep will stop finding files after finding the
first file that does not match any ignore rules.
"
);
let arg = RGArg::switch("quiet").short("q").help(SHORT).long_help(LONG);
@@ -2647,6 +2650,17 @@ replacement string. Capture group indices are numbered based on the position of
the opening parenthesis of the group, where the leftmost such group is $1. The
special $0 group corresponds to the entire match.
The name of a group is formed by taking the longest string of letters, numbers
and underscores (i.e. [_0-9A-Za-z]) after the $. For example, $1a will be
replaced with the group named '1a', not the group at index 1. If the group's
name contains characters that aren't letters, numbers or underscores, or you
want to immediately follow the group with another string, the name should be
put inside braces. For example, ${1}a will take the content of the group at
index 1 and append 'a' to the end of it.
If an index or name does not refer to a valid capture group, it will be
replaced with an empty string.
In shells such as Bash and zsh, you should wrap the pattern in single quotes
instead of double quotes. Otherwise, capture group indices will be replaced by
expanded shell variables which will most likely be empty.
@@ -2844,6 +2858,25 @@ This flag can be disabled with --no-stats.
args.push(arg);
}
fn flag_stop_on_nonmatch(args: &mut Vec<RGArg>) {
const SHORT: &str = "Stop searching after a non-match.";
const LONG: &str = long!(
"\
Enabling this option will cause ripgrep to stop reading a file once it
encounters a non-matching line after it has encountered a matching line.
This is useful if it is expected that all matches in a given file will be on
sequential lines, for example due to the lines being sorted.
This overrides the -U/--multiline flag.
"
);
let arg = RGArg::switch("stop-on-nonmatch")
.help(SHORT)
.long_help(LONG)
.overrides("multiline");
args.push(arg);
}
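
To see why this pairs well with sorted files, here is a simplified illustration of the early exit (not ripgrep's searcher): once a match has been seen, the first non-matching line ends the scan.

```rust
// Illustrative only: collect the single contiguous run of matching lines,
// assuming all matches are adjacent (e.g. because the file is sorted).
fn matching_run<'a>(lines: &[&'a str], pat: &str) -> Vec<&'a str> {
    let mut out = Vec::new();
    let mut seen_match = false;
    for &line in lines {
        if line.contains(pat) {
            seen_match = true;
            out.push(line);
        } else if seen_match {
            break; // first non-match after a match: stop reading
        }
    }
    out
}

fn main() {
    let lines = ["alpha", "match-1", "match-2", "omega", "match-3"];
    // "match-3" is never reached, exactly as --stop-on-nonmatch intends.
    assert_eq!(matching_run(&lines, "match"), ["match-1", "match-2"]);
}
```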
fn flag_text(args: &mut Vec<RGArg>) {
const SHORT: &str = "Search binary files as if they were text.";
const LONG: &str = long!(

View File

@@ -31,7 +31,6 @@ use ignore::overrides::{Override, OverrideBuilder};
use ignore::types::{FileTypeDef, Types, TypesBuilder};
use ignore::{Walk, WalkBuilder, WalkParallel};
use log;
use regex;
use termcolor::{BufferWriter, ColorChoice, WriteColor};
use crate::app;
@@ -42,7 +41,7 @@ use crate::path_printer::{PathPrinter, PathPrinterBuilder};
use crate::search::{
PatternMatcher, Printer, SearchWorker, SearchWorkerBuilder,
};
use crate::subject::SubjectBuilder;
use crate::subject::{Subject, SubjectBuilder};
use crate::Result;
/// The command that ripgrep should execute based on the command line
@@ -325,6 +324,46 @@ impl Args {
.build())
}
/// Returns true if and only if `stat`-related sorting is required.
pub fn needs_stat_sort(&self) -> bool {
return self.matches().sort_by().map_or(
false,
|sort_by| match sort_by.kind {
SortByKind::LastModified
| SortByKind::Created
| SortByKind::LastAccessed => sort_by.check().is_ok(),
_ => false,
},
);
}
/// Sort subjects if a sorter is specified, but only if the sort requires
/// `stat` calls. Non-stat-related sorts are handled during file traversal.
///
/// This function assumes that it is known that a stat-related sort is
/// required, and does not check for it again.
///
/// It is important that this precondition is fulfilled, since this function
/// consumes the subjects iterator and is therefore blocking.
pub fn sort_by_stat<I>(&self, subjects: I) -> Vec<Subject>
where
I: Iterator<Item = Subject>,
{
let sorter = match self.matches().sort_by() {
Ok(v) => v,
Err(_) => return subjects.collect(),
};
use SortByKind::*;
let mut keyed = match sorter.kind {
LastModified => load_timestamps(subjects, |m| m.modified()),
LastAccessed => load_timestamps(subjects, |m| m.accessed()),
Created => load_timestamps(subjects, |m| m.created()),
_ => return subjects.collect(),
};
keyed.sort_by(|a, b| sort_by_option(&a.0, &b.0, sorter.reverse));
keyed.into_iter().map(|v| v.1).collect()
}
/// Return a parallel walker that may use additional threads.
pub fn walker_parallel(&self) -> Result<WalkParallel> {
Ok(self
@@ -405,44 +444,23 @@ impl SortBy {
Ok(())
}
fn configure_walk_builder(self, builder: &mut WalkBuilder) {
// This isn't entirely optimal. In particular, we will wind up issuing
// a stat for many files redundantly. Aside from having potentially
// inconsistent results with respect to sorting, this is also slow.
// We could fix this here at the expense of memory by caching stat
// calls. A better fix would be to find a way to push this down into
// directory traversal itself, but that's a somewhat nasty change.
/// Load sorters only if they are applicable at the walk stage.
///
/// In particular, sorts that involve `stat` calls are not loaded because
/// the walk inherently assumes that parent directories are aware of all of
/// their descendants' properties, but `stat` does not work that way.
fn configure_builder_sort(self, builder: &mut WalkBuilder) {
use SortByKind::*;
match self.kind {
SortByKind::None => {}
SortByKind::Path => {
if self.reverse {
Path if self.reverse => {
builder.sort_by_file_name(|a, b| a.cmp(b).reverse());
} else {
}
Path => {
builder.sort_by_file_name(|a, b| a.cmp(b));
}
}
SortByKind::LastModified => {
builder.sort_by_file_path(move |a, b| {
sort_by_metadata_time(a, b, self.reverse, |md| {
md.modified()
})
});
}
SortByKind::LastAccessed => {
builder.sort_by_file_path(move |a, b| {
sort_by_metadata_time(a, b, self.reverse, |md| {
md.accessed()
})
});
}
SortByKind::Created => {
builder.sort_by_file_path(move |a, b| {
sort_by_metadata_time(a, b, self.reverse, |md| {
md.created()
})
});
}
}
// these use `stat` calls and will be sorted in Args::sort_by_stat()
LastModified | LastAccessed | Created | None => {}
};
}
}
@@ -472,24 +490,6 @@ enum EncodingMode {
Disabled,
}
impl EncodingMode {
/// Checks if an explicit encoding has been set. Returns false for
/// automatic BOM sniffing and no sniffing.
///
/// This is only used to determine whether PCRE2 needs to have its own
/// UTF-8 checking enabled. If we have an explicit encoding set, then
/// we're always guaranteed to get UTF-8, so we can disable PCRE2's check.
/// Otherwise, we have no such guarantee, and must enable PCRE2's UTF-8
/// check.
#[cfg(feature = "pcre2")]
fn has_explicit_encoding(&self) -> bool {
match self {
EncodingMode::Some(_) => true,
_ => false,
}
}
}
impl ArgMatches {
/// Create an ArgMatches from clap's parse result.
fn new(clap_matches: clap::ArgMatches<'static>) -> ArgMatches {
@@ -671,6 +671,8 @@ impl ArgMatches {
.multi_line(true)
.unicode(self.unicode())
.octal(false)
.fixed_strings(self.is_present("fixed-strings"))
.whole_line(self.is_present("line-regexp"))
.word(self.is_present("word-regexp"));
if self.is_present("multiline") {
builder.dot_matches_new_line(self.is_present("multiline-dotall"));
@@ -697,12 +699,7 @@ impl ArgMatches {
if let Some(limit) = self.dfa_size_limit()? {
builder.dfa_size_limit(limit);
}
let res = if self.is_present("fixed-strings") {
builder.build_literals(patterns)
} else {
builder.build(&patterns.join("|"))
};
match res {
match builder.build_many(patterns) {
Ok(m) => Ok(m),
Err(err) => Err(From::from(suggest_multiline(err.to_string()))),
}
@@ -719,6 +716,8 @@ impl ArgMatches {
.case_smart(self.case_smart())
.caseless(self.case_insensitive())
.multi_line(true)
.fixed_strings(self.is_present("fixed-strings"))
.whole_line(self.is_present("line-regexp"))
.word(self.is_present("word-regexp"));
// For whatever reason, the JIT craps out during regex compilation with
// a "no more memory" error on 32 bit systems. So don't use it there.
@@ -732,14 +731,6 @@ impl ArgMatches {
}
if self.unicode() {
builder.utf(true).ucp(true);
if self.encoding()?.has_explicit_encoding() {
// SAFETY: If an encoding was specified, then we're guaranteed
// to get valid UTF-8, so we can disable PCRE2's UTF checking.
// (Feeding invalid UTF-8 to PCRE2 is undefined behavior.)
unsafe {
builder.disable_utf_check();
}
}
}
if self.is_present("multiline") {
builder.dotall(self.is_present("multiline-dotall"));
@@ -747,7 +738,7 @@ impl ArgMatches {
if self.is_present("crlf") {
builder.crlf(true);
}
Ok(builder.build(&patterns.join("|"))?)
Ok(builder.build_many(patterns)?)
}
/// Build a JSON printer that writes results to the given writer.
@@ -849,7 +840,8 @@ impl ArgMatches {
.before_context(ctx_before)
.after_context(ctx_after)
.passthru(self.is_present("passthru"))
.memory_map(self.mmap_choice(paths));
.memory_map(self.mmap_choice(paths))
.stop_on_nonmatch(self.is_present("stop-on-nonmatch"));
match self.encoding()? {
EncodingMode::Some(enc) => {
builder.encoding(Some(enc));
@@ -900,12 +892,10 @@ impl ArgMatches {
.git_exclude(!self.no_ignore_vcs() && !self.no_ignore_exclude())
.require_git(!self.is_present("no-require-git"))
.ignore_case_insensitive(self.ignore_file_case_insensitive());
if !self.no_ignore() {
if !self.no_ignore() && !self.no_ignore_dot() {
builder.add_custom_ignore_filename(".rgignore");
}
let sortby = self.sort_by()?;
sortby.check()?;
sortby.configure_walk_builder(&mut builder);
self.sort_by()?.configure_builder_sort(&mut builder);
Ok(builder)
}
}
@@ -1020,10 +1010,10 @@ impl ArgMatches {
/// If there was a problem parsing the values from the user as an integer,
/// then an error is returned.
fn contexts(&self) -> Result<(usize, usize)> {
let after = self.usize_of("after-context")?.unwrap_or(0);
let before = self.usize_of("before-context")?.unwrap_or(0);
let both = self.usize_of("context")?.unwrap_or(0);
Ok(if both > 0 { (both, both) } else { (before, after) })
let after = self.usize_of("after-context")?.unwrap_or(both);
let before = self.usize_of("before-context")?.unwrap_or(both);
Ok((before, after))
}
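
The rewritten fallback means -A/--after-context and -B/--before-context now refine -C/--context instead of being clobbered together with it. A small sketch of the new precedence, with hypothetical flag values:

```rust
// Each side falls back to -C's value only when it was not given itself.
fn contexts(
    after: Option<usize>,
    before: Option<usize>,
    both: Option<usize>,
) -> (usize, usize) {
    let both = both.unwrap_or(0);
    (before.unwrap_or(both), after.unwrap_or(both))
}

fn main() {
    // rg -C3        => 3 lines of context on both sides
    assert_eq!(contexts(None, None, Some(3)), (3, 3));
    // rg -C3 -A1    => -A overrides only the "after" side
    assert_eq!(contexts(Some(1), None, Some(3)), (3, 1));
}
```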
/// Returns the unescaped context separator in UTF-8 bytes.
@@ -1080,7 +1070,6 @@ impl ArgMatches {
}
let label = match self.value_of_lossy("encoding") {
None if self.pcre2_unicode() => "utf-8".to_string(),
None => return Ok(EncodingMode::Auto),
Some(label) => label,
};
@@ -1412,11 +1401,6 @@ impl ArgMatches {
/// Get a sequence of all available patterns from the command line.
/// This includes reading the -e/--regexp and -f/--file flags.
///
/// Note that if -F/--fixed-strings is set, then all patterns will be
/// escaped. If -x/--line-regexp is set, then all patterns are surrounded
/// by `^...$`. Other things, such as --word-regexp, are handled by the
/// regex matcher itself.
///
/// If any pattern is invalid UTF-8, then an error is returned.
fn patterns(&self) -> Result<Vec<String>> {
if self.is_present("files") || self.is_present("type-list") {
@@ -1457,16 +1441,6 @@ impl ArgMatches {
Ok(pats)
}
/// Returns a pattern that is guaranteed to produce an empty regular
/// expression that is valid in any position.
fn pattern_empty(&self) -> String {
// This would normally just be an empty string, which works on its
// own, but if the patterns are joined in a set of alternations, then
// you wind up with `foo|`, which is currently invalid in Rust's regex
// engine.
"(?:z{0})*".to_string()
}
/// Converts an OsStr pattern to a String pattern. The pattern is escaped
/// if -F/--fixed-strings is set.
///
@@ -1485,30 +1459,12 @@ impl ArgMatches {
/// Applies additional processing on the given pattern if necessary
/// (such as escaping meta characters or turning it into a line regex).
fn pattern_from_string(&self, pat: String) -> String {
let pat = self.pattern_line(self.pattern_literal(pat));
if pat.is_empty() {
self.pattern_empty()
} else {
pat
}
}
/// Returns the given pattern as a line pattern if the -x/--line-regexp
/// flag is set. Otherwise, the pattern is returned unchanged.
fn pattern_line(&self, pat: String) -> String {
if self.is_present("line-regexp") {
format!(r"^(?:{})$", pat)
} else {
pat
}
}
/// Returns the given pattern as a literal pattern if the
/// -F/--fixed-strings flag is set. Otherwise, the pattern is returned
/// unchanged.
fn pattern_literal(&self, pat: String) -> String {
if self.is_present("fixed-strings") {
regex::escape(&pat)
// This would normally just be an empty string, which works on its
// own, but if the patterns are joined in a set of alternations,
// then you wind up with `foo|`, which is currently invalid in
// Rust's regex engine.
"(?:)".to_string()
} else {
pat
}
@@ -1641,12 +1597,6 @@ impl ArgMatches {
!(self.is_present("no-unicode") || self.is_present("no-pcre2-unicode"))
}
/// Returns true if and only if PCRE2 is enabled and its Unicode mode is
/// enabled.
fn pcre2_unicode(&self) -> bool {
self.is_present("pcre2") && self.unicode()
}
/// Returns true if and only if file names containing each match should
/// be emitted.
fn with_filename(&self, paths: &[PathBuf]) -> bool {
@@ -1807,32 +1757,18 @@ fn u64_to_usize(arg_name: &str, value: Option<u64>) -> Result<Option<usize>> {
}
}
/// Builds a comparator for sorting two files according to a system time
/// extracted from the file's metadata.
///
/// If there was a problem extracting the metadata or if the time is not
/// available, then both entries compare equal.
fn sort_by_metadata_time<G>(
p1: &Path,
p2: &Path,
/// Sorts by an optional parameter.
///
/// If either parameter is `None`, both entries compare equal.
fn sort_by_option<T: Ord>(
p1: &Option<T>,
p2: &Option<T>,
reverse: bool,
get_time: G,
) -> cmp::Ordering
where
G: Fn(&fs::Metadata) -> io::Result<SystemTime>,
{
let t1 = match p1.metadata().and_then(|md| get_time(&md)) {
Ok(t) => t,
Err(_) => return cmp::Ordering::Equal,
};
let t2 = match p2.metadata().and_then(|md| get_time(&md)) {
Ok(t) => t,
Err(_) => return cmp::Ordering::Equal,
};
if reverse {
t1.cmp(&t2).reverse()
} else {
t1.cmp(&t2)
) -> cmp::Ordering {
match (p1, p2, reverse) {
(Some(p1), Some(p2), true) => p1.cmp(&p2).reverse(),
(Some(p1), Some(p2), false) => p1.cmp(&p2),
_ => cmp::Ordering::Equal,
}
}
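
A short demonstration of the comparator's behavior, with plain integers standing in for timestamps: `Some` values order normally (optionally reversed), while any `None` compares equal and is left wherever the stable sort finds it.

```rust
use std::cmp;

fn sort_by_option<T: Ord>(
    p1: &Option<T>,
    p2: &Option<T>,
    reverse: bool,
) -> cmp::Ordering {
    match (p1, p2, reverse) {
        (Some(a), Some(b), true) => a.cmp(b).reverse(),
        (Some(a), Some(b), false) => a.cmp(b),
        // A missing key (e.g. a failed stat) compares equal to everything.
        _ => cmp::Ordering::Equal,
    }
}

fn main() {
    let mut xs = vec![Some(3), Some(1), Some(2)];
    xs.sort_by(|a, b| sort_by_option(a, b, true));
    assert_eq!(xs, vec![Some(3), Some(2), Some(1)]);
}
```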
@@ -1886,3 +1822,17 @@ fn current_dir() -> Result<PathBuf> {
)
.into())
}
/// Tries to assign a timestamp to every `Subject` in the vector to help with
/// sorting Subjects by time.
fn load_timestamps<G>(
subjects: impl Iterator<Item = Subject>,
get_time: G,
) -> Vec<(Option<SystemTime>, Subject)>
where
G: Fn(&fs::Metadata) -> io::Result<SystemTime>,
{
subjects
.map(|s| (s.path().metadata().and_then(|m| get_time(&m)).ok(), s))
.collect()
}
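
Taken together, `load_timestamps` and the sort in `sort_by_stat` form a decorate-sort-undecorate pipeline: stat each subject once, sort on the cached key, then drop the key. In miniature, with string lengths standing in for timestamps:

```rust
fn main() {
    let subjects = ["bbb", "a", "cc"];
    // Decorate: compute the sort key once per item.
    let mut keyed: Vec<(usize, &str)> =
        subjects.iter().map(|s| (s.len(), *s)).collect();
    // Sort on the cached key rather than recomputing it per comparison.
    keyed.sort_by_key(|&(key, _)| key);
    // Undecorate: discard the key.
    let sorted: Vec<&str> = keyed.into_iter().map(|(_, s)| s).collect();
    assert_eq!(sorted, ["a", "cc", "bbb"]);
}
```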

View File

@@ -33,7 +33,7 @@ impl Log for Logger {
fn log(&self, record: &log::Record<'_>) {
match (record.file(), record.line()) {
(Some(file), Some(line)) => {
eprintln!(
eprintln_locked!(
"{}|{}|{}:{}: {}",
record.level(),
record.target(),
@@ -43,7 +43,7 @@ impl Log for Logger {
);
}
(Some(file), None) => {
eprintln!(
eprintln_locked!(
"{}|{}|{}: {}",
record.level(),
record.target(),
@@ -52,7 +52,7 @@ impl Log for Logger {
);
}
_ => {
eprintln!(
eprintln_locked!(
"{}|{}: {}",
record.level(),
record.target(),
@@ -63,6 +63,6 @@ impl Log for Logger {
}
fn flush(&self) {
// We use eprintln! which is flushed on every call.
// We use eprintln_locked! which is flushed on every call.
}
}

View File

@@ -47,7 +47,7 @@ type Result<T> = ::std::result::Result<T, Box<dyn error::Error>>;
fn main() {
if let Err(err) = Args::parse().and_then(try_main) {
eprintln!("{}", err);
eprintln_locked!("{}", err);
process::exit(2);
}
}
@@ -77,32 +77,33 @@ fn try_main(args: Args) -> Result<()> {
/// steps through the file list (current directory by default) and searches
/// each file sequentially.
fn search(args: &Args) -> Result<bool> {
let started_at = Instant::now();
/// The meat of the routine is here. This lets us call the same iteration
/// code over each file regardless of whether we stream over the files
/// as they're produced by the underlying directory traversal or whether
/// they've been collected and sorted (for example) first.
fn iter(
args: &Args,
subjects: impl Iterator<Item = Subject>,
started_at: std::time::Instant,
) -> Result<bool> {
let quit_after_match = args.quit_after_match()?;
let subject_builder = args.subject_builder();
let mut stats = args.stats()?;
let mut searcher = args.search_worker(args.stdout())?;
let mut matched = false;
let mut searched = false;
for result in args.walker()? {
let subject = match subject_builder.build_from_result(result) {
Some(subject) => subject,
None => continue,
};
for subject in subjects {
searched = true;
let search_result = match searcher.search(&subject) {
Ok(search_result) => search_result,
Err(err) => {
// A broken pipe means graceful termination.
if err.kind() == io::ErrorKind::BrokenPipe {
break;
}
Err(err) if err.kind() == io::ErrorKind::BrokenPipe => break,
Err(err) => {
err_message!("{}: {}", subject.path().display(), err);
continue;
}
};
matched = matched || search_result.has_match();
matched |= search_result.has_match();
if let Some(ref mut stats) = stats {
*stats += search_result.stats().unwrap();
}
@@ -121,9 +122,25 @@ fn search(args: &Args) -> Result<bool> {
Ok(matched)
}
let started_at = Instant::now();
let subject_builder = args.subject_builder();
let subjects = args
.walker()?
.filter_map(|result| subject_builder.build_from_result(result));
if args.needs_stat_sort() {
let subjects = args.sort_by_stat(subjects).into_iter();
iter(args, subjects, started_at)
} else {
iter(args, subjects, started_at)
}
}
/// The top-level entry point for multi-threaded search. The parallelism is
/// itself achieved by the recursive directory traversal. All we need to do is
/// feed it a worker for performing a search on each file.
///
/// Requesting a sorted output from ripgrep (such as with `--sort path`) will
/// automatically disable parallelism and hence sorting is not handled here.
fn search_parallel(args: &Args) -> Result<bool> {
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::SeqCst;
@@ -214,15 +231,19 @@ fn eprint_nothing_searched() {
/// recursively steps through the file list (current directory by default) and
/// prints each path sequentially using a single thread.
fn files(args: &Args) -> Result<bool> {
/// The meat of the routine is here. This lets us call the same iteration
/// code over each file regardless of whether we stream over the files
/// as they're produced by the underlying directory traversal or whether
/// they've been collected and sorted (for example) first.
fn iter(
args: &Args,
subjects: impl Iterator<Item = Subject>,
) -> Result<bool> {
let quit_after_match = args.quit_after_match()?;
let subject_builder = args.subject_builder();
let mut matched = false;
let mut path_printer = args.path_printer(args.stdout())?;
for result in args.walker()? {
let subject = match subject_builder.build_from_result(result) {
Some(subject) => subject,
None => continue,
};
for subject in subjects {
matched = true;
if quit_after_match {
break;
@@ -240,9 +261,24 @@ fn files(args: &Args) -> Result<bool> {
Ok(matched)
}
let subject_builder = args.subject_builder();
let subjects = args
.walker()?
.filter_map(|result| subject_builder.build_from_result(result));
if args.needs_stat_sort() {
let subjects = args.sort_by_stat(subjects).into_iter();
iter(args, subjects)
} else {
iter(args, subjects)
}
}
/// The top-level entry point for listing files without searching them. This
/// recursively steps through the file list (current directory by default) and
/// prints each path sequentially using multiple threads.
///
/// Requesting a sorted output from ripgrep (such as with `--sort path`) will
/// automatically disable parallelism and hence sorting is not handled here.
fn files_parallel(args: &Args) -> Result<bool> {
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering::SeqCst;

View File

@@ -4,12 +4,28 @@ static MESSAGES: AtomicBool = AtomicBool::new(false);
static IGNORE_MESSAGES: AtomicBool = AtomicBool::new(false);
static ERRORED: AtomicBool = AtomicBool::new(false);
/// Like eprintln, but locks STDOUT to prevent interleaving lines.
#[macro_export]
macro_rules! eprintln_locked {
($($tt:tt)*) => {{
{
// This is a bit of an abstraction violation because we explicitly
// lock STDOUT before printing to STDERR. This avoids interleaving
// lines within ripgrep because `search_parallel` uses `termcolor`,
// which accesses the same STDOUT lock when writing lines.
let stdout = std::io::stdout();
let _handle = stdout.lock();
eprintln!($($tt)*);
}
}}
}
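
The trick is that `termcolor`'s writers also go through the standard library's stdout lock, so holding it here serializes this stderr write against in-flight match output. A minimal sketch of the same idea outside the macro:

```rust
// Hold the stdout lock while writing to stderr so this line cannot
// interleave with concurrent stdout writers that use the same lock.
fn report_error(msg: &str) {
    let stdout = std::io::stdout();
    let _guard = stdout.lock();
    eprintln!("rg: {}", msg);
}

fn main() {
    report_error("example: permission denied");
}
```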
/// Emit a non-fatal error message, unless messages were disabled.
#[macro_export]
macro_rules! message {
($($tt:tt)*) => {
if crate::messages::messages() {
eprintln!($($tt)*);
eprintln_locked!($($tt)*);
}
}
}
@@ -30,7 +46,7 @@ macro_rules! err_message {
macro_rules! ignore_message {
($($tt:tt)*) => {
if crate::messages::messages() && crate::messages::ignore_messages() {
eprintln!($($tt)*);
eprintln_locked!($($tt)*);
}
}
}

View File

@@ -1,6 +1,6 @@
[package]
name = "globset"
version = "0.4.10" #:version
version = "0.4.12" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Cross platform single glob and glob set matching. Glob set matching is the
@@ -20,11 +20,11 @@ name = "globset"
bench = false
[dependencies]
aho-corasick = "0.7.3"
bstr = { version = "1.1.0", default-features = false, features = ["std"] }
aho-corasick = "1.0.2"
bstr = { version = "1.6.0", default-features = false, features = ["std"] }
fnv = "1.0.6"
log = { version = "0.4.5", optional = true }
regex = { version = "1.1.5", default-features = false, features = ["perf", "std"] }
regex = { version = "1.8.3", default-features = false, features = ["perf", "std"] }
serde = { version = "1.0.104", optional = true }
[dev-dependencies]

View File

@@ -208,6 +208,9 @@ struct GlobOptions {
/// Whether or not to use `\` to escape special characters.
/// e.g., when enabled, `\*` will match a literal `*`.
backslash_escape: bool,
/// Whether or not an empty case in an alternate will be removed.
/// e.g., when enabled, `{,a}` will match "" and "a".
empty_alternates: bool,
}
impl GlobOptions {
@@ -216,6 +219,7 @@ impl GlobOptions {
case_insensitive: false,
literal_separator: false,
backslash_escape: !is_separator('\\'),
empty_alternates: false,
}
}
}
@@ -633,6 +637,16 @@ impl<'a> GlobBuilder<'a> {
self.opts.backslash_escape = yes;
self
}
/// Toggle whether an empty pattern in a list of alternates is accepted.
///
/// For example, if this is set then the glob `foo{,.txt}` will match both `foo` and `foo.txt`.
///
/// By default this is false.
pub fn empty_alternates(&mut self, yes: bool) -> &mut GlobBuilder<'a> {
self.opts.empty_alternates = yes;
self
}
}
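
A brief usage sketch of the new option (available in globset 0.4.11 and later), matching the doc comment's example:

```rust
use globset::GlobBuilder;

fn main() -> Result<(), globset::Error> {
    let matcher = GlobBuilder::new("foo{,.txt}")
        // Without this, the empty alternate is dropped and "foo" won't match.
        .empty_alternates(true)
        .build()?
        .compile_matcher();
    assert!(matcher.is_match("foo"));
    assert!(matcher.is_match("foo.txt"));
    Ok(())
}
```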
impl Tokens {
@@ -714,7 +728,7 @@ impl Tokens {
for pat in patterns {
let mut altre = String::new();
self.tokens_to_regex(options, &pat, &mut altre);
if !altre.is_empty() {
if !altre.is_empty() || options.empty_alternates {
parts.push(altre);
}
}
@@ -1020,6 +1034,7 @@ mod tests {
casei: Option<bool>,
litsep: Option<bool>,
bsesc: Option<bool>,
ealtre: Option<bool>,
}
macro_rules! syntax {
@@ -1059,6 +1074,9 @@ mod tests {
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap();
assert_eq!(format!("(?-u){}", $re), pat.regex());
}
@@ -1082,6 +1100,9 @@ mod tests {
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap();
let matcher = pat.compile_matcher();
let strategic = pat.compile_strategic_matcher();
@@ -1110,6 +1131,9 @@ mod tests {
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap();
let matcher = pat.compile_matcher();
let strategic = pat.compile_strategic_matcher();
@@ -1195,13 +1219,23 @@ mod tests {
syntaxerr!(err_range2, "[z--]", ErrorKind::InvalidRange('z', '-'));
const CASEI: Options =
Options { casei: Some(true), litsep: None, bsesc: None };
Options { casei: Some(true), litsep: None, bsesc: None, ealtre: None };
const SLASHLIT: Options =
Options { casei: None, litsep: Some(true), bsesc: None };
const NOBSESC: Options =
Options { casei: None, litsep: None, bsesc: Some(false) };
Options { casei: None, litsep: Some(true), bsesc: None, ealtre: None };
const NOBSESC: Options = Options {
casei: None,
litsep: None,
bsesc: Some(false),
ealtre: None,
};
const BSESC: Options =
Options { casei: None, litsep: None, bsesc: Some(true) };
Options { casei: None, litsep: None, bsesc: Some(true), ealtre: None };
const EALTRE: Options = Options {
casei: None,
litsep: None,
bsesc: Some(true),
ealtre: Some(true),
};
toregex!(re_casei, "a", "(?i)^a$", &CASEI);
@@ -1326,6 +1360,9 @@ mod tests {
matches!(matchalt11, "{*.foo,*.bar,*.wat}", "test.foo");
matches!(matchalt12, "{*.foo,*.bar,*.wat}", "test.bar");
matches!(matchalt13, "{*.foo,*.bar,*.wat}", "test.wat");
matches!(matchalt14, "foo{,.txt}", "foo.txt");
nmatches!(matchalt15, "foo{,.txt}", "foo");
matches!(matchalt16, "foo{,.txt}", "foo", EALTRE);
matches!(matchslash1, "abc/def", "abc/def", SLASHLIT);
#[cfg(unix)]
@@ -1425,6 +1462,9 @@ mod tests {
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap();
assert_eq!($expect, pat.$which());
}

View File

@@ -498,13 +498,23 @@ impl GlobSetBuilder {
/// Constructing candidates has a very small cost associated with it, so
/// callers may find it beneficial to amortize that cost when matching a single
/// path against multiple globs or sets of globs.
#[derive(Clone, Debug)]
#[derive(Clone)]
pub struct Candidate<'a> {
path: Cow<'a, [u8]>,
basename: Cow<'a, [u8]>,
ext: Cow<'a, [u8]>,
}
impl<'a> std::fmt::Debug for Candidate<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
f.debug_struct("Candidate")
.field("path", &self.path.as_bstr())
.field("basename", &self.basename.as_bstr())
.field("ext", &self.ext.as_bstr())
.finish()
}
}
impl<'a> Candidate<'a> {
/// Create a new candidate for matching from the given path.
pub fn new<P: AsRef<Path> + ?Sized>(path: &'a P) -> Candidate<'a> {
@@ -818,7 +828,7 @@ impl MultiStrategyBuilder {
fn prefix(self) -> PrefixStrategy {
PrefixStrategy {
matcher: AhoCorasick::new_auto_configured(&self.literals),
matcher: AhoCorasick::new(&self.literals).unwrap(),
map: self.map,
longest: self.longest,
}
@@ -826,7 +836,7 @@ impl MultiStrategyBuilder {
fn suffix(self) -> SuffixStrategy {
SuffixStrategy {
matcher: AhoCorasick::new_auto_configured(&self.literals),
matcher: AhoCorasick::new(&self.literals).unwrap(),
map: self.map,
longest: self.longest,
}
@@ -870,6 +880,29 @@ impl RequiredExtensionStrategyBuilder {
}
}
/// Escape meta-characters within the given glob pattern.
///
/// The escaping works by surrounding meta-characters with brackets. For
/// example, `*` becomes `[*]`.
pub fn escape(s: &str) -> String {
let mut escaped = String::with_capacity(s.len());
for c in s.chars() {
match c {
// note that ! does not need escaping because it is only special
// inside brackets
'?' | '*' | '[' | ']' => {
escaped.push('[');
escaped.push(c);
escaped.push(']');
}
c => {
escaped.push(c);
}
}
}
escaped
}
#[cfg(test)]
mod tests {
use super::{GlobSet, GlobSetBuilder};
@@ -909,4 +942,16 @@ mod tests {
assert!(!set.is_match(""));
assert!(!set.is_match("a"));
}
#[test]
fn escape() {
use super::escape;
assert_eq!("foo", escape("foo"));
assert_eq!("foo[*]", escape("foo*"));
assert_eq!("[[][]]", escape("[]"));
assert_eq!("[*][?]", escape("*?"));
assert_eq!("src/[*][*]/[*].rs", escape("src/**/*.rs"));
assert_eq!("bar[[]ab[]]baz", escape("bar[ab]baz"));
assert_eq!("bar[[]!![]]!baz", escape("bar[!!]!baz"));
}
}
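
Outside the test suite, the routine is handy whenever a user-supplied string must be matched literally; a brief sketch (the filename is hypothetical):

```rust
use globset::{escape, Glob};

fn main() {
    // Escape meta-characters so the bracketed text is matched literally.
    let literal = "report[2023].txt";
    let matcher = Glob::new(&escape(literal)).unwrap().compile_matcher();
    assert!(matcher.is_match("report[2023].txt"));
    assert!(!matcher.is_match("report2023.txt"));
}
```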

View File

@@ -27,7 +27,7 @@ pub fn file_name<'a>(path: &Cow<'a, [u8]>) -> Option<Cow<'a, [u8]>> {
///
/// Note that this does NOT match the semantics of std::path::Path::extension.
/// Namely, the extension includes the `.` and matching is otherwise more
/// liberal. Specifically, the extenion is:
/// liberal. Specifically, the extension is:
///
/// * None, if the file name given is empty;
/// * None, if there is no embedded `.`;

View File

@@ -1,7 +1,9 @@
use serde::de::Error;
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use serde::{
de::{Error, SeqAccess, Visitor},
{Deserialize, Deserializer, Serialize, Serializer},
};
use crate::Glob;
use crate::{Glob, GlobSet, GlobSetBuilder};
impl Serialize for Glob {
fn serialize<S: Serializer>(
@@ -12,18 +14,98 @@ impl Serialize for Glob {
}
}
struct GlobVisitor;
impl<'de> Visitor<'de> for GlobVisitor {
type Value = Glob;
fn expecting(
&self,
formatter: &mut std::fmt::Formatter,
) -> std::fmt::Result {
formatter.write_str("a glob pattern")
}
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
where
E: Error,
{
Glob::new(v).map_err(serde::de::Error::custom)
}
}
impl<'de> Deserialize<'de> for Glob {
fn deserialize<D: Deserializer<'de>>(
deserializer: D,
) -> Result<Self, D::Error> {
let glob = <&str as Deserialize>::deserialize(deserializer)?;
Glob::new(glob).map_err(D::Error::custom)
deserializer.deserialize_str(GlobVisitor)
}
}
struct GlobSetVisitor;
impl<'de> Visitor<'de> for GlobSetVisitor {
type Value = GlobSet;
fn expecting(
&self,
formatter: &mut std::fmt::Formatter,
) -> std::fmt::Result {
formatter.write_str("an array of glob patterns")
}
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
where
A: SeqAccess<'de>,
{
let mut builder = GlobSetBuilder::new();
while let Some(glob) = seq.next_element()? {
builder.add(glob);
}
builder.build().map_err(serde::de::Error::custom)
}
}
impl<'de> Deserialize<'de> for GlobSet {
fn deserialize<D: Deserializer<'de>>(
deserializer: D,
) -> Result<Self, D::Error> {
deserializer.deserialize_seq(GlobSetVisitor)
}
}
#[cfg(test)]
mod tests {
use Glob;
use std::collections::HashMap;
use crate::{Glob, GlobSet};
#[test]
fn glob_deserialize_borrowed() {
let string = r#"{"markdown": "*.md"}"#;
let map: HashMap<String, Glob> =
serde_json::from_str(&string).unwrap();
assert_eq!(map["markdown"], Glob::new("*.md").unwrap());
}
#[test]
fn glob_deserialize_owned() {
let string = r#"{"markdown": "*.md"}"#;
let v: serde_json::Value = serde_json::from_str(&string).unwrap();
let map: HashMap<String, Glob> = serde_json::from_value(v).unwrap();
assert_eq!(map["markdown"], Glob::new("*.md").unwrap());
}
#[test]
fn glob_deserialize_error() {
let string = r#"{"error": "["}"#;
let map = serde_json::from_str::<HashMap<String, Glob>>(&string);
assert!(map.is_err());
}
#[test]
fn glob_json_works() {
@@ -35,4 +117,12 @@ mod tests {
let de: Glob = serde_json::from_str(&ser).unwrap();
assert_eq!(test_glob, de);
}
#[test]
fn glob_set_deserialize() {
let j = r#" ["src/**/*.rs", "README.md"] "#;
let set: GlobSet = serde_json::from_str(j).unwrap();
assert!(set.is_match("src/lib.rs"));
assert!(!set.is_match("Cargo.lock"));
}
}
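
With `Deserialize` now implemented for `GlobSet`, a set of globs can be embedded directly in a configuration struct; a sketch assuming serde's derive feature and serde_json as the format:

```rust
use globset::GlobSet;
use serde::Deserialize;

#[derive(Deserialize)]
struct Config {
    // Deserialized via the new GlobSetVisitor: each element is parsed as
    // a Glob and the set is built at the end.
    ignore: GlobSet,
}

fn main() {
    let config: Config =
        serde_json::from_str(r#"{ "ignore": ["*.log", "tmp/**"] }"#).unwrap();
    assert!(config.ignore.is_match("debug.log"));
    assert!(!config.ignore.is_match("src/main.rs"));
}
```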

View File

@@ -1,6 +1,6 @@
[package]
name = "grep"
version = "0.2.10" #:version
version = "0.2.12" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Fast line oriented regex searching as a library.
@@ -14,12 +14,12 @@ license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
grep-cli = { version = "0.1.6", path = "../cli" }
grep-matcher = { version = "0.1.5", path = "../matcher" }
grep-pcre2 = { version = "0.1.5", path = "../pcre2", optional = true }
grep-printer = { version = "0.1.6", path = "../printer" }
grep-regex = { version = "0.1.10", path = "../regex" }
grep-searcher = { version = "0.1.10", path = "../searcher" }
grep-cli = { version = "0.1.7", path = "../cli" }
grep-matcher = { version = "0.1.6", path = "../matcher" }
grep-pcre2 = { version = "0.1.6", path = "../pcre2", optional = true }
grep-printer = { version = "0.1.7", path = "../printer" }
grep-regex = { version = "0.1.11", path = "../regex" }
grep-searcher = { version = "0.1.11", path = "../searcher" }
[dev-dependencies]
termcolor = "1.0.4"

View File

@@ -12,8 +12,6 @@ are sparse.
A cookbook and a guide are planned.
*/
#![deny(missing_docs)]
pub extern crate grep_cli as cli;
pub extern crate grep_matcher as matcher;
#[cfg(feature = "pcre2")]

View File

@@ -1,6 +1,6 @@
[package]
name = "ignore"
version = "0.4.18" #:version
version = "0.4.20" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A fast library for efficiently matching ignore files such as `.gitignore`
@@ -19,12 +19,11 @@ name = "ignore"
bench = false
[dependencies]
crossbeam-utils = "0.8.0"
globset = { version = "0.4.9", path = "../globset" }
globset = { version = "0.4.10", path = "../globset" }
lazy_static = "1.1"
log = "0.4.5"
memchr = "2.1"
regex = "1.1"
memchr = "2.5"
regex = { version = "1.9.0", default-features = false, features = ["perf", "std", "unicode-gencat"] }
same-file = "1.0.4"
thread_local = "1"
walkdir = "2.2.7"

View File

@@ -9,104 +9,113 @@
/// Please try to keep this list sorted lexicographically and wrapped to 79
/// columns (inclusive).
#[rustfmt::skip]
pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("agda", &["*.agda", "*.lagda"]),
("aidl", &["*.aidl"]),
("amake", &["*.mk", "*.bp"]),
("asciidoc", &["*.adoc", "*.asc", "*.asciidoc"]),
("asm", &["*.asm", "*.s", "*.S"]),
("asp", &[
pub const DEFAULT_TYPES: &[(&[&str], &[&str])] = &[
(&["ada"], &["*.adb", "*.ads"]),
(&["agda"], &["*.agda", "*.lagda"]),
(&["aidl"], &["*.aidl"]),
(&["alire"], &["alire.toml"]),
(&["amake"], &["*.mk", "*.bp"]),
(&["asciidoc"], &["*.adoc", "*.asc", "*.asciidoc"]),
(&["asm"], &["*.asm", "*.s", "*.S"]),
(&["asp"], &[
"*.aspx", "*.aspx.cs", "*.aspx.vb", "*.ascx", "*.ascx.cs",
"*.ascx.vb", "*.asp"
]),
("ats", &["*.ats", "*.dats", "*.sats", "*.hats"]),
("avro", &["*.avdl", "*.avpr", "*.avsc"]),
("awk", &["*.awk"]),
("bazel", &[
(&["ats"], &["*.ats", "*.dats", "*.sats", "*.hats"]),
(&["avro"], &["*.avdl", "*.avpr", "*.avsc"]),
(&["awk"], &["*.awk"]),
(&["bat", "batch"], &["*.bat"]),
(&["bazel"], &[
"*.bazel", "*.bzl", "*.BUILD", "*.bazelrc", "BUILD", "MODULE.bazel",
"WORKSPACE", "WORKSPACE.bazel",
]),
("bitbake", &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
("brotli", &["*.br"]),
("buildstream", &["*.bst"]),
("bzip2", &["*.bz2", "*.tbz2"]),
("c", &["*.[chH]", "*.[chH].in", "*.cats"]),
("cabal", &["*.cabal"]),
("candid", &["*.did"]),
("carp", &["*.carp"]),
("cbor", &["*.cbor"]),
("ceylon", &["*.ceylon"]),
("clojure", &["*.clj", "*.cljc", "*.cljs", "*.cljx"]),
("cmake", &["*.cmake", "CMakeLists.txt"]),
("coffeescript", &["*.coffee"]),
("config", &["*.cfg", "*.conf", "*.config", "*.ini"]),
("coq", &["*.v"]),
("cpp", &[
(&["bitbake"], &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
(&["brotli"], &["*.br"]),
(&["buildstream"], &["*.bst"]),
(&["bzip2"], &["*.bz2", "*.tbz2"]),
(&["c"], &["*.[chH]", "*.[chH].in", "*.cats"]),
(&["cabal"], &["*.cabal"]),
(&["candid"], &["*.did"]),
(&["carp"], &["*.carp"]),
(&["cbor"], &["*.cbor"]),
(&["ceylon"], &["*.ceylon"]),
(&["clojure"], &["*.clj", "*.cljc", "*.cljs", "*.cljx"]),
(&["cmake"], &["*.cmake", "CMakeLists.txt"]),
(&["cmd"], &["*.bat", "*.cmd"]),
(&["cml"], &["*.cml"]),
(&["coffeescript"], &["*.coffee"]),
(&["config"], &["*.cfg", "*.conf", "*.config", "*.ini"]),
(&["coq"], &["*.v"]),
(&["cpp"], &[
"*.[ChH]", "*.cc", "*.[ch]pp", "*.[ch]xx", "*.hh", "*.inl",
"*.[ChH].in", "*.cc.in", "*.[ch]pp.in", "*.[ch]xx.in", "*.hh.in",
]),
("creole", &["*.creole"]),
("crystal", &["Projectfile", "*.cr", "*.ecr", "shard.yml"]),
("cs", &["*.cs"]),
("csharp", &["*.cs"]),
("cshtml", &["*.cshtml"]),
("css", &["*.css", "*.scss"]),
("csv", &["*.csv"]),
("cuda", &["*.cu", "*.cuh"]),
("cython", &["*.pyx", "*.pxi", "*.pxd"]),
("d", &["*.d"]),
("dart", &["*.dart"]),
("devicetree", &["*.dts", "*.dtsi"]),
("dhall", &["*.dhall"]),
("diff", &["*.patch", "*.diff"]),
("docker", &["*Dockerfile*"]),
("dts", &["*.dts", "*.dtsi"]),
("dvc", &["Dvcfile", "*.dvc"]),
("ebuild", &["*.ebuild"]),
("edn", &["*.edn"]),
("elisp", &["*.el"]),
("elixir", &["*.ex", "*.eex", "*.exs"]),
("elm", &["*.elm"]),
("erb", &["*.erb"]),
("erlang", &["*.erl", "*.hrl"]),
("fennel", &["*.fnl"]),
("fidl", &["*.fidl"]),
("fish", &["*.fish"]),
("flatbuffers", &["*.fbs"]),
("fortran", &[
(&["creole"], &["*.creole"]),
(&["crystal"], &["Projectfile", "*.cr", "*.ecr", "shard.yml"]),
(&["cs"], &["*.cs"]),
(&["csharp"], &["*.cs"]),
(&["cshtml"], &["*.cshtml"]),
(&["css"], &["*.css", "*.scss"]),
(&["csv"], &["*.csv"]),
(&["cuda"], &["*.cu", "*.cuh"]),
(&["cython"], &["*.pyx", "*.pxi", "*.pxd"]),
(&["d"], &["*.d"]),
(&["dart"], &["*.dart"]),
(&["devicetree"], &["*.dts", "*.dtsi"]),
(&["dhall"], &["*.dhall"]),
(&["diff"], &["*.patch", "*.diff"]),
(&["dita"], &["*.dita", "*.ditamap", "*.ditaval"]),
(&["docker"], &["*Dockerfile*"]),
(&["dockercompose"], &["docker-compose.yml", "docker-compose.*.yml"]),
(&["dts"], &["*.dts", "*.dtsi"]),
(&["dvc"], &["Dvcfile", "*.dvc"]),
(&["ebuild"], &["*.ebuild", "*.eclass"]),
(&["edn"], &["*.edn"]),
(&["elisp"], &["*.el"]),
(&["elixir"], &["*.ex", "*.eex", "*.exs", "*.heex", "*.leex", "*.livemd"]),
(&["elm"], &["*.elm"]),
(&["erb"], &["*.erb"]),
(&["erlang"], &["*.erl", "*.hrl"]),
(&["fennel"], &["*.fnl"]),
(&["fidl"], &["*.fidl"]),
(&["fish"], &["*.fish"]),
(&["flatbuffers"], &["*.fbs"]),
(&["fortran"], &[
"*.f", "*.F", "*.f77", "*.F77", "*.pfo",
"*.f90", "*.F90", "*.f95", "*.F95",
]),
("fsharp", &["*.fs", "*.fsx", "*.fsi"]),
("fut", &["*.fut"]),
("gap", &["*.g", "*.gap", "*.gi", "*.gd", "*.tst"]),
("gn", &["*.gn", "*.gni"]),
("go", &["*.go"]),
("gradle", &["*.gradle"]),
("groovy", &["*.groovy", "*.gradle"]),
("gzip", &["*.gz", "*.tgz"]),
("h", &["*.h", "*.hh", "*.hpp"]),
("haml", &["*.haml"]),
("hare", &["*.ha"]),
("haskell", &["*.hs", "*.lhs", "*.cpphs", "*.c2hs", "*.hsc"]),
("hbs", &["*.hbs"]),
("hs", &["*.hs", "*.lhs"]),
("html", &["*.htm", "*.html", "*.ejs"]),
("hy", &["*.hy"]),
("idris", &["*.idr", "*.lidr"]),
("janet", &["*.janet"]),
("java", &["*.java", "*.jsp", "*.jspx", "*.properties"]),
("jinja", &["*.j2", "*.jinja", "*.jinja2"]),
("jl", &["*.jl"]),
("js", &["*.js", "*.jsx", "*.vue", "*.cjs", "*.mjs"]),
("json", &["*.json", "composer.lock"]),
("jsonl", &["*.jsonl"]),
("julia", &["*.jl"]),
("jupyter", &["*.ipynb", "*.jpynb"]),
("k", &["*.k"]),
("kotlin", &["*.kt", "*.kts"]),
("less", &["*.less"]),
("license", &[
(&["fsharp"], &["*.fs", "*.fsx", "*.fsi"]),
(&["fut"], &["*.fut"]),
(&["gap"], &["*.g", "*.gap", "*.gi", "*.gd", "*.tst"]),
(&["gn"], &["*.gn", "*.gni"]),
(&["go"], &["*.go"]),
(&["gprbuild"], &["*.gpr"]),
(&["gradle"], &["*.gradle"]),
(&["graphql"], &["*.graphql", "*.graphqls"]),
(&["groovy"], &["*.groovy", "*.gradle"]),
(&["gzip"], &["*.gz", "*.tgz"]),
(&["h"], &["*.h", "*.hh", "*.hpp"]),
(&["haml"], &["*.haml"]),
(&["hare"], &["*.ha"]),
(&["haskell"], &["*.hs", "*.lhs", "*.cpphs", "*.c2hs", "*.hsc"]),
(&["hbs"], &["*.hbs"]),
(&["hs"], &["*.hs", "*.lhs"]),
(&["html"], &["*.htm", "*.html", "*.ejs"]),
(&["hy"], &["*.hy"]),
(&["idris"], &["*.idr", "*.lidr"]),
(&["janet"], &["*.janet"]),
(&["java"], &["*.java", "*.jsp", "*.jspx", "*.properties"]),
(&["jinja"], &["*.j2", "*.jinja", "*.jinja2"]),
(&["jl"], &["*.jl"]),
(&["js"], &["*.js", "*.jsx", "*.vue", "*.cjs", "*.mjs"]),
(&["json"], &["*.json", "composer.lock"]),
(&["jsonl"], &["*.jsonl"]),
(&["julia"], &["*.jl"]),
(&["jupyter"], &["*.ipynb", "*.jpynb"]),
(&["k"], &["*.k"]),
(&["kotlin"], &["*.kt", "*.kts"]),
(&["less"], &["*.less"]),
(&["license"], &[
// General
"COPYING", "COPYING[.-]*",
"COPYRIGHT", "COPYRIGHT[.-]*",
@@ -133,79 +142,91 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
"MPL-*[0-9]*",
"OFL-*[0-9]*",
]),
("lilypond", &["*.ly", "*.ily"]),
("lisp", &["*.el", "*.jl", "*.lisp", "*.lsp", "*.sc", "*.scm"]),
("lock", &["*.lock", "package-lock.json"]),
("log", &["*.log"]),
("lua", &["*.lua"]),
("lz4", &["*.lz4"]),
("lzma", &["*.lzma"]),
("m4", &["*.ac", "*.m4"]),
("make", &[
(&["lilypond"], &["*.ly", "*.ily"]),
(&["lisp"], &["*.el", "*.jl", "*.lisp", "*.lsp", "*.sc", "*.scm"]),
(&["lock"], &["*.lock", "package-lock.json"]),
(&["log"], &["*.log"]),
(&["lua"], &["*.lua"]),
(&["lz4"], &["*.lz4"]),
(&["lzma"], &["*.lzma"]),
(&["m4"], &["*.ac", "*.m4"]),
(&["make"], &[
"[Gg][Nn][Uu]makefile", "[Mm]akefile",
"[Gg][Nn][Uu]makefile.am", "[Mm]akefile.am",
"[Gg][Nn][Uu]makefile.in", "[Mm]akefile.in",
"*.mk", "*.mak"
]),
("mako", &["*.mako", "*.mao"]),
("man", &["*.[0-9lnpx]", "*.[0-9][cEFMmpSx]"]),
("markdown", &["*.markdown", "*.md", "*.mdown", "*.mkd", "*.mkdn"]),
("matlab", &["*.m"]),
("md", &["*.markdown", "*.md", "*.mdown", "*.mkd", "*.mkdn"]),
("meson", &["meson.build", "meson_options.txt"]),
("minified", &["*.min.html", "*.min.css", "*.min.js"]),
("mint", &["*.mint"]),
("mk", &["mkfile"]),
("ml", &["*.ml"]),
("motoko", &["*.mo"]),
("msbuild", &[
"*.csproj", "*.fsproj", "*.vcxproj", "*.proj", "*.props", "*.targets",
(&["mako"], &["*.mako", "*.mao"]),
(&["man"], &["*.[0-9lnpx]", "*.[0-9][cEFMmpSx]"]),
(&["markdown", "md"], &[
"*.markdown",
"*.md",
"*.mdown",
"*.mdwn",
"*.mkd",
"*.mkdn",
"*.mdx",
]),
("nim", &["*.nim", "*.nimf", "*.nimble", "*.nims"]),
("nix", &["*.nix"]),
("objc", &["*.h", "*.m"]),
("objcpp", &["*.h", "*.mm"]),
("ocaml", &["*.ml", "*.mli", "*.mll", "*.mly"]),
("org", &["*.org", "*.org_archive"]),
("pants", &["BUILD"]),
("pascal", &["*.pas", "*.dpr", "*.lpr", "*.pp", "*.inc"]),
("pdf", &["*.pdf"]),
("perl", &["*.perl", "*.pl", "*.PL", "*.plh", "*.plx", "*.pm", "*.t"]),
("php", &[
(&["matlab"], &["*.m"]),
(&["meson"], &["meson.build", "meson_options.txt"]),
(&["minified"], &["*.min.html", "*.min.css", "*.min.js"]),
(&["mint"], &["*.mint"]),
(&["mk"], &["mkfile"]),
(&["ml"], &["*.ml"]),
(&["motoko"], &["*.mo"]),
(&["msbuild"], &[
"*.csproj", "*.fsproj", "*.vcxproj", "*.proj", "*.props", "*.targets",
"*.sln",
]),
(&["nim"], &["*.nim", "*.nimf", "*.nimble", "*.nims"]),
(&["nix"], &["*.nix"]),
(&["objc"], &["*.h", "*.m"]),
(&["objcpp"], &["*.h", "*.mm"]),
(&["ocaml"], &["*.ml", "*.mli", "*.mll", "*.mly"]),
(&["org"], &["*.org", "*.org_archive"]),
(&["pants"], &["BUILD"]),
(&["pascal"], &["*.pas", "*.dpr", "*.lpr", "*.pp", "*.inc"]),
(&["pdf"], &["*.pdf"]),
(&["perl"], &["*.perl", "*.pl", "*.PL", "*.plh", "*.plx", "*.pm", "*.t"]),
(&["php"], &[
// note that PHP 6 doesn't exist
// See: https://wiki.php.net/rfc/php6
"*.php", "*.php3", "*.php4", "*.php5", "*.php7", "*.php8",
"*.pht", "*.phtml"
]),
("po", &["*.po"]),
("pod", &["*.pod"]),
("postscript", &["*.eps", "*.ps"]),
("protobuf", &["*.proto"]),
("ps", &["*.cdxml", "*.ps1", "*.ps1xml", "*.psd1", "*.psm1"]),
("puppet", &["*.epp", "*.erb", "*.pp", "*.rb"]),
("purs", &["*.purs"]),
("py", &["*.py"]),
("qmake", &["*.pro", "*.pri", "*.prf"]),
("qml", &["*.qml"]),
("r", &["*.R", "*.r", "*.Rmd", "*.Rnw"]),
("racket", &["*.rkt"]),
("rdoc", &["*.rdoc"]),
("readme", &["README*", "*README"]),
("reasonml", &["*.re", "*.rei"]),
("red", &["*.r", "*.red", "*.reds"]),
("rescript", &["*.res", "*.resi"]),
("robot", &["*.robot"]),
("rst", &["*.rst"]),
("ruby", &[
(&["po"], &["*.po"]),
(&["pod"], &["*.pod"]),
(&["postscript"], &["*.eps", "*.ps"]),
(&["protobuf"], &["*.proto"]),
(&["ps"], &["*.cdxml", "*.ps1", "*.ps1xml", "*.psd1", "*.psm1"]),
(&["puppet"], &["*.epp", "*.erb", "*.pp", "*.rb"]),
(&["purs"], &["*.purs"]),
(&["py", "python"], &["*.py", "*.pyi"]),
(&["qmake"], &["*.pro", "*.pri", "*.prf"]),
(&["qml"], &["*.qml"]),
(&["r"], &["*.R", "*.r", "*.Rmd", "*.Rnw"]),
(&["racket"], &["*.rkt"]),
(&["raku"], &[
"*.raku", "*.rakumod", "*.rakudoc", "*.rakutest",
"*.p6", "*.pl6", "*.pm6"
]),
(&["rdoc"], &["*.rdoc"]),
(&["readme"], &["README*", "*README"]),
(&["reasonml"], &["*.re", "*.rei"]),
(&["red"], &["*.r", "*.red", "*.reds"]),
(&["rescript"], &["*.res", "*.resi"]),
(&["robot"], &["*.robot"]),
(&["rst"], &["*.rst"]),
(&["ruby"], &[
// Idiomatic files
"config.ru", "Gemfile", ".irbrc", "Rakefile",
// Extensions
"*.gemspec", "*.rb", "*.rbw"
]),
("rust", &["*.rs"]),
("sass", &["*.sass", "*.scss"]),
("scala", &["*.scala", "*.sbt"]),
("sh", &[
(&["rust"], &["*.rs"]),
(&["sass"], &["*.sass", "*.scss"]),
(&["scala"], &["*.scala", "*.sbt"]),
(&["sh"], &[
// Portable/misc. init files
".login", ".logout", ".profile", "profile",
// bash-specific init files
@@ -228,60 +249,66 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
// Extensions
"*.bash", "*.csh", "*.ksh", "*.sh", "*.tcsh", "*.zsh",
]),
("slim", &["*.skim", "*.slim", "*.slime"]),
("smarty", &["*.tpl"]),
("sml", &["*.sml", "*.sig"]),
("solidity", &["*.sol"]),
("soy", &["*.soy"]),
("spark", &["*.spark"]),
("spec", &["*.spec"]),
("sql", &["*.sql", "*.psql"]),
("stylus", &["*.styl"]),
("sv", &["*.v", "*.vg", "*.sv", "*.svh", "*.h"]),
("svg", &["*.svg"]),
("swift", &["*.swift"]),
("swig", &["*.def", "*.i"]),
("systemd", &[
(&["slim"], &["*.skim", "*.slim", "*.slime"]),
(&["smarty"], &["*.tpl"]),
(&["sml"], &["*.sml", "*.sig"]),
(&["solidity"], &["*.sol"]),
(&["soy"], &["*.soy"]),
(&["spark"], &["*.spark"]),
(&["spec"], &["*.spec"]),
(&["sql"], &["*.sql", "*.psql"]),
(&["stylus"], &["*.styl"]),
(&["sv"], &["*.v", "*.vg", "*.sv", "*.svh", "*.h"]),
(&["svg"], &["*.svg"]),
(&["swift"], &["*.swift"]),
(&["swig"], &["*.def", "*.i"]),
(&["systemd"], &[
"*.automount", "*.conf", "*.device", "*.link", "*.mount", "*.path",
"*.scope", "*.service", "*.slice", "*.socket", "*.swap", "*.target",
"*.timer",
]),
("taskpaper", &["*.taskpaper"]),
("tcl", &["*.tcl"]),
("tex", &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib", "*.dtx", "*.ins"]),
("texinfo", &["*.texi"]),
("textile", &["*.textile"]),
("tf", &["*.tf"]),
("thrift", &["*.thrift"]),
("toml", &["*.toml", "Cargo.lock"]),
("ts", &["*.ts", "*.tsx", "*.cts", "*.mts"]),
("twig", &["*.twig"]),
("txt", &["*.txt"]),
("typoscript", &["*.typoscript", "*.ts"]),
("vala", &["*.vala"]),
("vb", &["*.vb"]),
("vcl", &["*.vcl"]),
("verilog", &["*.v", "*.vh", "*.sv", "*.svh"]),
("vhdl", &["*.vhd", "*.vhdl"]),
("vim", &[
(&["taskpaper"], &["*.taskpaper"]),
(&["tcl"], &["*.tcl"]),
(&["tex"], &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib", "*.dtx", "*.ins"]),
(&["texinfo"], &["*.texi"]),
(&["textile"], &["*.textile"]),
(&["tf"], &[
"*.tf", "*.auto.tfvars", "terraform.tfvars", "*.tf.json",
"*.auto.tfvars.json", "terraform.tfvars.json", "*.terraformrc",
"terraform.rc", "*.tfrc", "*.terraform.lock.hcl",
]),
(&["thrift"], &["*.thrift"]),
(&["toml"], &["*.toml", "Cargo.lock"]),
(&["ts", "typescript"], &["*.ts", "*.tsx", "*.cts", "*.mts"]),
(&["twig"], &["*.twig"]),
(&["txt"], &["*.txt"]),
(&["typoscript"], &["*.typoscript", "*.ts"]),
(&["usd"], &["*.usd", "*.usda", "*.usdc"]),
(&["v"], &["*.v"]),
(&["vala"], &["*.vala"]),
(&["vb"], &["*.vb"]),
(&["vcl"], &["*.vcl"]),
(&["verilog"], &["*.v", "*.vh", "*.sv", "*.svh"]),
(&["vhdl"], &["*.vhd", "*.vhdl"]),
(&["vim"], &[
"*.vim", ".vimrc", ".gvimrc", "vimrc", "gvimrc", "_vimrc", "_gvimrc",
]),
("vimscript", &[
(&["vimscript"], &[
"*.vim", ".vimrc", ".gvimrc", "vimrc", "gvimrc", "_vimrc", "_gvimrc",
]),
("webidl", &["*.idl", "*.webidl", "*.widl"]),
("wiki", &["*.mediawiki", "*.wiki"]),
("xml", &[
(&["webidl"], &["*.idl", "*.webidl", "*.widl"]),
(&["wiki"], &["*.mediawiki", "*.wiki"]),
(&["xml"], &[
"*.xml", "*.xml.dist", "*.dtd", "*.xsl", "*.xslt", "*.xsd", "*.xjb",
"*.rng", "*.sch", "*.xhtml",
]),
("xz", &["*.xz", "*.txz"]),
("yacc", &["*.y"]),
("yaml", &["*.yaml", "*.yml"]),
("yang", &["*.yang"]),
("z", &["*.Z"]),
("zig", &["*.zig"]),
("zsh", &[
(&["xz"], &["*.xz", "*.txz"]),
(&["yacc"], &["*.y"]),
(&["yaml"], &["*.yaml", "*.yml"]),
(&["yang"], &["*.yang"]),
(&["z"], &["*.Z"]),
(&["zig"], &["*.zig"]),
(&["zsh"], &[
".zshenv", "zshenv",
".zlogin", "zlogin",
".zlogout", "zlogout",
@@ -289,7 +316,7 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
".zshrc", "zshrc",
"*.zsh",
]),
("zstd", &["*.zst", "*.zstd"]),
(&["zstd"], &["*.zst", "*.zstd"]),
];
#[cfg(test)]
@@ -298,10 +325,8 @@ mod tests {
#[test]
fn default_types_are_sorted() {
let mut names = DEFAULT_TYPES.iter().map(|(name, _exts)| name);
let mut names = DEFAULT_TYPES.iter().map(|(aliases, _)| aliases[0]);
let Some(mut previous_name) = names.next() else { return; };
for name in names {
assert!(
name > previous_name,
@@ -309,7 +334,6 @@ mod tests {
name,
previous_name
);
previous_name = name;
}
}

View File

@@ -533,7 +533,7 @@ impl GitignoreBuilder {
/// Return the file path of the current environment's global gitignore file.
///
/// Note that the file path returned may not exist.
fn gitconfig_excludes_path() -> Option<PathBuf> {
pub fn gitconfig_excludes_path() -> Option<PathBuf> {
// git supports $HOME/.gitconfig and $XDG_CONFIG_HOME/git/config. Notably,
// both can be active at the same time, where $HOME/.gitconfig takes
// precedent. So if $HOME/.gitconfig defines a `core.excludesFile`, then
@@ -596,8 +596,13 @@ fn parse_excludes_file(data: &[u8]) -> Option<PathBuf> {
// probably works in more circumstances. I guess we would ideally have
// a full INI parser. Yuck.
lazy_static::lazy_static! {
static ref RE: Regex =
Regex::new(r"(?im)^\s*excludesfile\s*=\s*(.+)\s*$").unwrap();
static ref RE: Regex = Regex::new(
r"(?xim-u)
^[[:space:]]*excludesfile[[:space:]]*
=
[[:space:]]*(.+)[[:space:]]*$
"
).unwrap();
};
let caps = match RE.captures(data) {
None => return None,

View File

@@ -106,6 +106,7 @@ impl Override {
}
/// Builds a matcher for a set of glob overrides.
#[derive(Clone, Debug)]
pub struct OverrideBuilder {
builder: GitignoreBuilder,
}

View File

@@ -488,11 +488,13 @@ impl TypesBuilder {
/// Add a set of default file type definitions.
pub fn add_defaults(&mut self) -> &mut TypesBuilder {
static MSG: &'static str = "adding a default type should never fail";
for &(name, exts) in DEFAULT_TYPES {
for &(names, exts) in DEFAULT_TYPES {
for name in names {
for ext in exts {
self.add(name, ext).expect(MSG);
}
}
}
self
}
}
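
Since `DEFAULT_TYPES` is now keyed by alias lists, the nested loop registers every alias against the same globs, making `py` and `python` (or `ts` and `typescript`) interchangeable. A toy expansion to illustrate:

```rust
// Hypothetical miniature of the table; the real DEFAULT_TYPES is far larger.
const TYPES: &[(&[&str], &[&str])] =
    &[(&["py", "python"], &["*.py", "*.pyi"])];

fn main() {
    for &(names, exts) in TYPES {
        for name in names {
            for ext in exts {
                // Prints py:*.py, py:*.pyi, python:*.py, python:*.pyi
                println!("{}:{}", name, ext);
            }
        }
    }
}
```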
@@ -537,6 +539,8 @@ mod tests {
"html:*.htm",
"rust:*.rs",
"js:*.js",
"py:*.py",
"python:*.py",
"foo:*.{rs,foo}",
"combo:include:html,rust",
]
@@ -551,6 +555,8 @@ mod tests {
matched!(match7, types(), vec!["foo"], vec!["rust"], "main.foo");
matched!(match8, types(), vec!["combo"], vec![], "index.html");
matched!(match9, types(), vec!["combo"], vec![], "lib.rs");
matched!(match10, types(), vec!["py"], vec![], "main.py");
matched!(match11, types(), vec!["python"], vec![], "main.py");
matched!(not, matchnot1, types(), vec!["rust"], vec![], "index.html");
matched!(not, matchnot2, types(), vec![], vec!["rust"], "main.rs");
@@ -558,6 +564,8 @@ mod tests {
matched!(not, matchnot4, types(), vec!["rust"], vec!["foo"], "main.rs");
matched!(not, matchnot5, types(), vec!["rust"], vec!["foo"], "main.foo");
matched!(not, matchnot6, types(), vec!["combo"], vec![], "leftpad.js");
matched!(not, matchnot7, types(), vec!["py"], vec![], "index.html");
matched!(not, matchnot8, types(), vec!["python"], vec![], "doc.md");
#[test]
fn test_invalid_defs() {
@@ -569,7 +577,7 @@ mod tests {
let original_defs = btypes.definitions();
let bad_defs = vec![
// Reference to type that does not exist
"combo:include:html,python",
"combo:include:html,qwerty",
// Bad format
"combo:foobar:html,rust",
"",

View File

@@ -1282,7 +1282,7 @@ impl WalkParallel {
let quit_now = Arc::new(AtomicBool::new(false));
let num_pending =
Arc::new(AtomicUsize::new(stack.lock().unwrap().len()));
crossbeam_utils::thread::scope(|s| {
std::thread::scope(|s| {
let mut handles = vec![];
for _ in 0..threads {
let worker = Worker {
@@ -1296,13 +1296,12 @@ impl WalkParallel {
skip: self.skip.clone(),
filter: self.filter.clone(),
};
handles.push(s.spawn(|_| worker.run()));
handles.push(s.spawn(|| worker.run()));
}
for handle in handles {
handle.join().unwrap();
}
})
.unwrap(); // Pass along panics from threads
});
}
fn threads(&self) -> usize {
@@ -1682,7 +1681,7 @@ impl<'s> Worker<'s> {
stack.pop()
}
/// Signal that work has been received.
/// Signal that work has been finished.
fn work_done(&self) {
self.num_pending.fetch_sub(1, Ordering::SeqCst);
}
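
For context, a minimal standalone sketch of the `std::thread::scope` pattern adopted here (stable since Rust 1.63): scoped threads may borrow from the enclosing stack frame, and worker panics surface at `join`, which is why the separate `.unwrap()` on the scope result is no longer needed:

fn main() {
    let items = vec![1, 2, 3];
    std::thread::scope(|s| {
        let mut handles = vec![];
        for n in &items {
            // Borrowing `items` is fine: the scope guarantees the
            // threads finish before `items` goes out of scope.
            handles.push(s.spawn(move || n * 2));
        }
        for handle in handles {
            // A panic in a worker surfaces here, as in the code above.
            assert!(handle.join().is_ok());
        }
    });
}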

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-matcher"
version = "0.1.5" #:version
version = "0.1.6" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A trait for regular expressions, with a focus on line oriented search.

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-pcre2"
version = "0.1.5" #:version
version = "0.1.6" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Use PCRE2 with the 'grep' crate.
@@ -14,5 +14,6 @@ license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
grep-matcher = { version = "0.1.5", path = "../matcher" }
pcre2 = "0.2.3"
grep-matcher = { version = "0.1.6", path = "../matcher" }
log = "0.4.19"
pcre2 = "0.2.4"

View File

@@ -11,6 +11,8 @@ pub struct RegexMatcherBuilder {
builder: RegexBuilder,
case_smart: bool,
word: bool,
fixed_strings: bool,
whole_line: bool,
}
impl RegexMatcherBuilder {
@@ -20,6 +22,8 @@ impl RegexMatcherBuilder {
builder: RegexBuilder::new(),
case_smart: false,
word: false,
fixed_strings: false,
whole_line: false,
}
}
@@ -29,17 +33,40 @@ impl RegexMatcherBuilder {
/// If there was a problem compiling the pattern, then an error is
/// returned.
pub fn build(&self, pattern: &str) -> Result<RegexMatcher, Error> {
self.build_many(&[pattern])
}
/// Compile all of the given patterns into a single regex that matches when
/// at least one of the patterns matches.
///
/// If there was a problem building the regex, then an error is returned.
pub fn build_many<P: AsRef<str>>(
&self,
patterns: &[P],
) -> Result<RegexMatcher, Error> {
let mut builder = self.builder.clone();
if self.case_smart && !has_uppercase_literal(pattern) {
let mut pats = Vec::with_capacity(patterns.len());
for p in patterns.iter() {
pats.push(if self.fixed_strings {
format!("(?:{})", pcre2::escape(p.as_ref()))
} else {
format!("(?:{})", p.as_ref())
});
}
let mut singlepat = pats.join("|");
if self.case_smart && !has_uppercase_literal(&singlepat) {
builder.caseless(true);
}
let res = if self.word {
let pattern = format!(r"(?<!\w)(?:{})(?!\w)", pattern);
builder.build(&pattern)
} else {
builder.build(pattern)
};
res.map_err(Error::regex).map(|regex| {
if self.whole_line {
singlepat = format!(r"(?m:^)(?:{})(?m:$)", singlepat);
} else if self.word {
// We make this option exclusive with whole_line because when
// whole_line is enabled, all matches necessarily fall on word
// boundaries. So this extra goop is strictly redundant.
singlepat = format!(r"(?<!\w)(?:{})(?!\w)", singlepat);
}
log::trace!("final regex: {:?}", singlepat);
builder.build(&singlepat).map_err(Error::regex).map(|regex| {
let mut names = HashMap::new();
for (i, name) in regex.capture_names().iter().enumerate() {
if let Some(ref name) = *name {
@@ -144,6 +171,21 @@ impl RegexMatcherBuilder {
self
}
/// Whether the patterns should be treated as literal strings or not. When
/// this is active, all characters, including ones that would normally be
/// special regex meta characters, are matched literally.
pub fn fixed_strings(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.fixed_strings = yes;
self
}
/// Whether each pattern should match the entire line or not. This is
/// equivalent to surrounding the pattern with `(?m:^)` and `(?m:$)`.
pub fn whole_line(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.whole_line = yes;
self
}
/// Enable Unicode matching mode.
///
/// When enabled, the following patterns become Unicode aware: `\b`, `\B`,
@@ -178,23 +220,22 @@ impl RegexMatcherBuilder {
self
}
/// When UTF matching mode is enabled, this will disable the UTF checking
/// that PCRE2 will normally perform automatically. If UTF matching mode
/// is not enabled, then this has no effect.
/// This is now deprecated and is a no-op.
///
/// UTF checking is enabled by default when UTF matching mode is enabled.
/// If UTF matching mode is enabled and UTF checking is enabled, then PCRE2
/// will return an error if you attempt to search a subject string that is
/// not valid UTF-8.
/// Previously, this option permitted disabling PCRE2's UTF-8 validity
/// check, which could result in undefined behavior if the haystack was
/// not valid UTF-8. But PCRE2 introduced a new option, `PCRE2_MATCH_INVALID_UTF`,
/// in 10.34, which this crate always sets. When this option is enabled,
/// PCRE2 claims to not have undefined behavior when the haystack is
/// invalid UTF-8.
///
/// # Safety
///
/// It is undefined behavior to disable the UTF check in UTF matching mode
/// and search a subject string that is not valid UTF-8. When the UTF check
/// is disabled, callers must guarantee that the subject string is valid
/// UTF-8.
pub unsafe fn disable_utf_check(&mut self) -> &mut RegexMatcherBuilder {
self.builder.disable_utf_check();
/// Therefore, disabling the UTF-8 check is not something that is exposed
/// by this crate.
#[deprecated(
since = "0.2.4",
note = "now a no-op due to new PCRE2 features"
)]
pub fn disable_utf_check(&mut self) -> &mut RegexMatcherBuilder {
self
}
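
A hypothetical sketch of the new knobs working together (patterns and haystacks here are illustrative): `fixed_strings` escapes each pattern, `build_many` joins the results into one alternation, and `whole_line` wraps that in line anchors:

use grep_matcher::Matcher;
use grep_pcre2::RegexMatcherBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let matcher = RegexMatcherBuilder::new()
        .fixed_strings(true) // "a+b" is a literal here, not a regex
        .whole_line(true)    // wraps the alternation in (?m:^)...(?m:$)
        .build_many(&["a+b", "c*d"])?;
    assert!(matcher.is_match(b"a+b")?);
    assert!(!matcher.is_match(b"xa+bx")?); // whole_line rejects partial lines
    Ok(())
}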

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-printer"
version = "0.1.6" #:version
version = "0.1.7" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
An implementation of the grep crate's Sink trait that provides standard
@@ -20,12 +20,12 @@ serde1 = ["base64", "serde", "serde_json"]
[dependencies]
base64 = { version = "0.20.0", optional = true }
bstr = "1.1.0"
grep-matcher = { version = "0.1.5", path = "../matcher" }
grep-searcher = { version = "0.1.8", path = "../searcher" }
bstr = "1.6.0"
grep-matcher = { version = "0.1.6", path = "../matcher" }
grep-searcher = { version = "0.1.11", path = "../searcher" }
termcolor = "1.0.4"
serde = { version = "1.0.77", optional = true, features = ["derive"] }
serde_json = { version = "1.0.27", optional = true }
[dev-dependencies]
grep-regex = { version = "0.1.9", path = "../regex" }
grep-regex = { version = "0.1.11", path = "../regex" }

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-regex"
version = "0.1.10" #:version
version = "0.1.11" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Use Rust's regex library with the 'grep' crate.
@@ -11,13 +11,12 @@ repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/regex"
readme = "README.md"
keywords = ["regex", "grep", "search", "pattern", "line"]
license = "Unlicense OR MIT"
edition = "2018"
edition = "2021"
[dependencies]
aho-corasick = "0.7.3"
bstr = "1.1.0"
grep-matcher = { version = "0.1.5", path = "../matcher" }
log = "0.4.5"
regex = "1.1"
regex-syntax = "0.6.5"
thread_local = "1.1.2"
aho-corasick = "1.0.2"
bstr = "1.6.0"
grep-matcher = { version = "0.1.6", path = "../matcher" }
log = "0.4.19"
regex-automata = { version = "0.3.0" }
regex-syntax = "0.7.2"

View File

@@ -1,17 +1,13 @@
use regex_syntax::ast::parse::Parser;
use regex_syntax::ast::{self, Ast};
/// The results of analyzing the AST of a regular expression (e.g., for
/// supporting
/// smart case).
#[derive(Clone, Debug)]
pub struct AstAnalysis {
pub(crate) struct AstAnalysis {
/// True if and only if a literal uppercase character occurs in the regex.
any_uppercase: bool,
/// True if and only if the regex contains any literal at all.
any_literal: bool,
/// True if and only if the regex consists entirely of a literal and no
/// other special regex characters.
all_verbatim_literal: bool,
}
impl AstAnalysis {
@@ -19,16 +15,16 @@ impl AstAnalysis {
///
/// If `pattern` is not a valid regular expression, then `None` is
/// returned.
#[allow(dead_code)]
pub fn from_pattern(pattern: &str) -> Option<AstAnalysis> {
Parser::new()
#[cfg(test)]
pub(crate) fn from_pattern(pattern: &str) -> Option<AstAnalysis> {
regex_syntax::ast::parse::Parser::new()
.parse(pattern)
.map(|ast| AstAnalysis::from_ast(&ast))
.ok()
}
/// Perform an AST analysis given the AST.
pub fn from_ast(ast: &Ast) -> AstAnalysis {
pub(crate) fn from_ast(ast: &Ast) -> AstAnalysis {
let mut analysis = AstAnalysis::new();
analysis.from_ast_impl(ast);
analysis
@@ -40,7 +36,7 @@ impl AstAnalysis {
/// For example, a pattern like `\pL` contains no uppercase literals,
/// even though `L` is uppercase and the `\pL` class contains uppercase
/// characters.
pub fn any_uppercase(&self) -> bool {
pub(crate) fn any_uppercase(&self) -> bool {
self.any_uppercase
}
@@ -48,32 +44,13 @@ impl AstAnalysis {
///
/// For example, a pattern like `\pL` reports `false`, but a pattern like
/// `\pLfoo` reports `true`.
pub fn any_literal(&self) -> bool {
pub(crate) fn any_literal(&self) -> bool {
self.any_literal
}
/// Returns true if and only if the entire pattern is a verbatim literal
/// with no special meta characters.
///
/// When this is true, then the pattern satisfies the following law:
/// `escape(pattern) == pattern`. Notable examples where this returns
/// `false` include patterns like `a\u0061` even though `\u0061` is just
/// a literal `a`.
///
/// The purpose of this flag is to determine whether the patterns can be
/// given to non-regex substring search algorithms as-is.
#[allow(dead_code)]
pub fn all_verbatim_literal(&self) -> bool {
self.all_verbatim_literal
}
/// Creates a new `AstAnalysis` value with an initial configuration.
fn new() -> AstAnalysis {
AstAnalysis {
any_uppercase: false,
any_literal: false,
all_verbatim_literal: true,
}
AstAnalysis { any_uppercase: false, any_literal: false }
}
fn from_ast_impl(&mut self, ast: &Ast) {
@@ -86,26 +63,20 @@ impl AstAnalysis {
| Ast::Dot(_)
| Ast::Assertion(_)
| Ast::Class(ast::Class::Unicode(_))
| Ast::Class(ast::Class::Perl(_)) => {
self.all_verbatim_literal = false;
}
| Ast::Class(ast::Class::Perl(_)) => {}
Ast::Literal(ref x) => {
self.from_ast_literal(x);
}
Ast::Class(ast::Class::Bracketed(ref x)) => {
self.all_verbatim_literal = false;
self.from_ast_class_set(&x.kind);
}
Ast::Repetition(ref x) => {
self.all_verbatim_literal = false;
self.from_ast_impl(&x.ast);
}
Ast::Group(ref x) => {
self.all_verbatim_literal = false;
self.from_ast_impl(&x.ast);
}
Ast::Alternation(ref alt) => {
self.all_verbatim_literal = false;
for x in &alt.asts {
self.from_ast_impl(x);
}
@@ -161,9 +132,6 @@ impl AstAnalysis {
}
fn from_ast_literal(&mut self, ast: &ast::Literal) {
if ast.kind != ast::LiteralKind::Verbatim {
self.all_verbatim_literal = false;
}
self.any_literal = true;
self.any_uppercase = self.any_uppercase || ast.c.is_uppercase();
}
@@ -171,7 +139,7 @@ impl AstAnalysis {
/// Returns true if and only if the attributes can never change no matter
/// what other AST it might see.
fn done(&self) -> bool {
self.any_uppercase && self.any_literal && !self.all_verbatim_literal
self.any_uppercase && self.any_literal
}
}
@@ -188,76 +156,61 @@ mod tests {
let x = analysis("");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("foo");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("Foo");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("foO");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis(r"foo\\");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\w");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\S");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\p{Ll}");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[a-z]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[A-Z]");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[\S\t]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\\S");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"\p{Ll}");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"aBc\w");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"a\u0061");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
}
}
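
The practical effect of this analysis is ripgrep's smart case. A small sketch (assuming `grep-regex`'s public builder API) of how `any_uppercase`/`any_literal` feed the `case_smart` decision:

use grep_matcher::Matcher;
use grep_regex::RegexMatcherBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // No uppercase literal => the regex is made case insensitive.
    let m = RegexMatcherBuilder::new().case_smart(true).build("foo")?;
    assert!(m.is_match(b"FOO")?);
    // An uppercase literal => case sensitivity is preserved.
    let m = RegexMatcherBuilder::new().case_smart(true).build("Foo")?;
    assert!(!m.is_match(b"foo")?);
    Ok(())
}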

View File

@@ -1,15 +1,16 @@
use grep_matcher::{ByteSet, LineTerminator};
use regex::bytes::{Regex, RegexBuilder};
use regex_syntax::ast::{self, Ast};
use regex_syntax::hir::{self, Hir};
use {
grep_matcher::{ByteSet, LineTerminator},
regex_automata::meta::Regex,
regex_syntax::{
ast,
hir::{self, Hir, HirKind},
},
};
use crate::ast::AstAnalysis;
use crate::crlf::crlfify;
use crate::error::Error;
use crate::literal::LiteralSets;
use crate::multi::alternation_literals;
use crate::non_matching::non_matching_bytes;
use crate::strip::strip_from_match;
use crate::{
ast::AstAnalysis, error::Error, non_matching::non_matching_bytes,
strip::strip_from_match,
};
/// Config represents the configuration of a regex matcher in this crate.
/// The configuration is itself a rough combination of the knobs found in
@@ -21,21 +22,23 @@ use crate::strip::strip_from_match;
/// configuration which generated it, and provides transformation on that HIR
/// such that the configuration is preserved.
#[derive(Clone, Debug)]
pub struct Config {
pub case_insensitive: bool,
pub case_smart: bool,
pub multi_line: bool,
pub dot_matches_new_line: bool,
pub swap_greed: bool,
pub ignore_whitespace: bool,
pub unicode: bool,
pub octal: bool,
pub size_limit: usize,
pub dfa_size_limit: usize,
pub nest_limit: u32,
pub line_terminator: Option<LineTerminator>,
pub crlf: bool,
pub word: bool,
pub(crate) struct Config {
pub(crate) case_insensitive: bool,
pub(crate) case_smart: bool,
pub(crate) multi_line: bool,
pub(crate) dot_matches_new_line: bool,
pub(crate) swap_greed: bool,
pub(crate) ignore_whitespace: bool,
pub(crate) unicode: bool,
pub(crate) octal: bool,
pub(crate) size_limit: usize,
pub(crate) dfa_size_limit: usize,
pub(crate) nest_limit: u32,
pub(crate) line_terminator: Option<LineTerminator>,
pub(crate) crlf: bool,
pub(crate) word: bool,
pub(crate) fixed_strings: bool,
pub(crate) whole_line: bool,
}
impl Default for Config {
@@ -50,47 +53,28 @@ impl Default for Config {
unicode: true,
octal: false,
// These size limits are much bigger than what's in the regex
// crate.
// crate by default.
size_limit: 100 * (1 << 20),
dfa_size_limit: 1000 * (1 << 20),
nest_limit: 250,
line_terminator: None,
crlf: false,
word: false,
fixed_strings: false,
whole_line: false,
}
}
}
impl Config {
/// Parse the given pattern and return its HIR expression along with
/// the current configuration.
///
/// If there was a problem parsing the given expression then an error
/// is returned.
pub fn hir(&self, pattern: &str) -> Result<ConfiguredHIR, Error> {
let ast = self.ast(pattern)?;
let analysis = self.analysis(&ast)?;
let expr = hir::translate::TranslatorBuilder::new()
.allow_invalid_utf8(true)
.case_insensitive(self.is_case_insensitive(&analysis))
.multi_line(self.multi_line)
.dot_matches_new_line(self.dot_matches_new_line)
.swap_greed(self.swap_greed)
.unicode(self.unicode)
.build()
.translate(pattern, &ast)
.map_err(Error::regex)?;
let expr = match self.line_terminator {
None => expr,
Some(line_term) => strip_from_match(expr, line_term)?,
};
Ok(ConfiguredHIR {
original: pattern.to_string(),
config: self.clone(),
analysis,
// If CRLF mode is enabled, replace `$` with `(?:\r?$)`.
expr: if self.crlf { crlfify(expr) } else { expr },
})
/// Use this configuration to build an HIR from the given patterns. The HIR
/// returned corresponds to a single regex that is an alternation of the
/// patterns given.
pub(crate) fn build_many<P: AsRef<str>>(
&self,
patterns: &[P],
) -> Result<ConfiguredHIR, Error> {
ConfiguredHIR::new(self.clone(), patterns)
}
/// Accounting for the `smart_case` config knob, return true if and only if
@@ -105,35 +89,55 @@ impl Config {
analysis.any_literal() && !analysis.any_uppercase()
}
/// Returns true if and only if this config is simple enough such that
/// if the pattern is a simple alternation of literals, then it can be
/// constructed via a plain Aho-Corasick automaton.
/// Returns whether the given patterns should be treated as "fixed strings"
/// literals. This is different from just querying the `fixed_strings` knob
/// in that if the knob is false, this will still return true in some cases
/// if the patterns are themselves indistinguishable from literals.
///
/// Note that it is OK to return true even when settings like `multi_line`
/// are enabled, since if multi-line can impact the match semantics of a
/// regex, then it is by definition not a simple alternation of literals.
pub fn can_plain_aho_corasick(&self) -> bool {
!self.word && !self.case_insensitive && !self.case_smart
/// The main idea here is that if this returns true, then it is safe
/// to build a `regex_syntax::hir::Hir` value directly from the given
/// patterns as an alternation of `hir::Literal` values.
fn is_fixed_strings<P: AsRef<str>>(&self, patterns: &[P]) -> bool {
// When these are enabled, we really need to parse the patterns and
// let them go through the standard HIR translation process in order
// for case folding transforms to be applied.
if self.case_insensitive || self.case_smart {
return false;
}
/// Perform analysis on the AST of this pattern.
///
/// This returns an error if the given pattern failed to parse.
fn analysis(&self, ast: &Ast) -> Result<AstAnalysis, Error> {
Ok(AstAnalysis::from_ast(ast))
// Even if whole_line or word is enabled, both of those things can
// be implemented by wrapping the Hir generated by an alternation of
// fixed string literals. So for here at least, we don't care about the
// word or whole_line settings.
if self.fixed_strings {
// ... but if any literal contains a line terminator, then we've
// got to bail out because this will ultimately result in an error.
if let Some(lineterm) = self.line_terminator {
for p in patterns.iter() {
if has_line_terminator(lineterm, p.as_ref()) {
return false;
}
/// Parse the given pattern into its abstract syntax.
///
/// This returns an error if the given pattern failed to parse.
fn ast(&self, pattern: &str) -> Result<Ast, Error> {
ast::parse::ParserBuilder::new()
.nest_limit(self.nest_limit)
.octal(self.octal)
.ignore_whitespace(self.ignore_whitespace)
.build()
.parse(pattern)
.map_err(Error::regex)
}
}
return true;
}
// In this case, the only way we can hand construct the Hir is if none
// of the patterns contain meta characters. If they do, then we need to
// send them through the standard parsing/translation process.
for p in patterns.iter() {
let p = p.as_ref();
if p.chars().any(regex_syntax::is_meta_character) {
return false;
}
// Same deal as when fixed_strings is set above. If the pattern has
// a line terminator anywhere, then we need to bail out and let
// an error occur.
if let Some(lineterm) = self.line_terminator {
if has_line_terminator(lineterm, p) {
return false;
}
}
}
true
}
}
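
The "indistinguishable from a literal" test above leans on `regex_syntax::is_meta_character`; a quick sketch of that predicate:

fn main() {
    // No meta characters: safe to hand-build an Hir literal from this.
    assert!(!"foobar".chars().any(regex_syntax::is_meta_character));
    // `+` is a meta character: this must go through the full parser.
    assert!("foo+bar".chars().any(regex_syntax::is_meta_character));
}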
@@ -149,170 +153,268 @@ impl Config {
/// size limits set on the configured HIR will be propagated out to any
/// subsequently constructed HIR or regular expression.
#[derive(Clone, Debug)]
pub struct ConfiguredHIR {
original: String,
pub(crate) struct ConfiguredHIR {
config: Config,
analysis: AstAnalysis,
expr: Hir,
hir: Hir,
}
impl ConfiguredHIR {
/// Return the configuration for this HIR expression.
pub fn config(&self) -> &Config {
/// Parse the given patterns into a single HIR expression that represents
/// an alternation of the patterns given.
fn new<P: AsRef<str>>(
config: Config,
patterns: &[P],
) -> Result<ConfiguredHIR, Error> {
let hir = if config.is_fixed_strings(patterns) {
let mut alts = vec![];
for p in patterns.iter() {
alts.push(Hir::literal(p.as_ref().as_bytes()));
}
log::debug!(
"assembling HIR from {} fixed string literals",
alts.len()
);
let hir = Hir::alternation(alts);
hir
} else {
let mut alts = vec![];
for p in patterns.iter() {
alts.push(if config.fixed_strings {
format!("(?:{})", regex_syntax::escape(p.as_ref()))
} else {
format!("(?:{})", p.as_ref())
});
}
let pattern = alts.join("|");
let ast = ast::parse::ParserBuilder::new()
.nest_limit(config.nest_limit)
.octal(config.octal)
.ignore_whitespace(config.ignore_whitespace)
.build()
.parse(&pattern)
.map_err(Error::generic)?;
let analysis = AstAnalysis::from_ast(&ast);
let mut hir = hir::translate::TranslatorBuilder::new()
.utf8(false)
.case_insensitive(config.is_case_insensitive(&analysis))
.multi_line(config.multi_line)
.dot_matches_new_line(config.dot_matches_new_line)
.crlf(config.crlf)
.swap_greed(config.swap_greed)
.unicode(config.unicode)
.build()
.translate(&pattern, &ast)
.map_err(Error::generic)?;
// We don't need to do this for the fixed-strings case above
// because is_fixed_strings will return false if any pattern
// contains a line terminator. Therefore, we don't need to strip
// it.
//
// We go to some pains to avoid doing this in the fixed-strings
// case because this can result in building a new HIR when ripgrep
// is given a huge set of literals to search for. And this can
// actually take a little time. It's not huge, but it's noticeable.
hir = match config.line_terminator {
None => hir,
Some(line_term) => strip_from_match(hir, line_term)?,
};
hir
};
Ok(ConfiguredHIR { config, hir })
}
/// Return a reference to the underlying configuration.
pub(crate) fn config(&self) -> &Config {
&self.config
}
/// Compute the set of non-matching bytes for this HIR expression.
pub fn non_matching_bytes(&self) -> ByteSet {
non_matching_bytes(&self.expr)
/// Return a reference to the underlying HIR.
pub(crate) fn hir(&self) -> &Hir {
&self.hir
}
/// Returns true if and only if this regex needs to have its match offsets
/// tweaked because of CRLF support. Specifically, this occurs when the
/// CRLF hack is enabled and the regex is line anchored at the end. In
/// this case, matches that end with a `\r` have the `\r` stripped.
pub fn needs_crlf_stripped(&self) -> bool {
self.config.crlf && self.expr.is_line_anchored_end()
/// Convert this HIR to a regex that can be used for matching.
pub(crate) fn to_regex(&self) -> Result<Regex, Error> {
let meta = Regex::config()
.utf8_empty(false)
.nfa_size_limit(Some(self.config.size_limit))
// We don't expose a knob for this because the one-pass DFA is
// usually not a perf bottleneck for ripgrep. But we give it some
// extra room beyond the default.
.onepass_size_limit(Some(10 * (1 << 20)))
// Same deal here. The default limit for full DFAs is VERY small,
// but with ripgrep we can afford to spend a bit more time on
// building them, I think.
.dfa_size_limit(Some(1 * (1 << 20)))
.dfa_state_limit(Some(1_000))
.hybrid_cache_capacity(self.config.dfa_size_limit);
Regex::builder()
.configure(meta)
.build_from_hir(&self.hir)
.map_err(Error::regex)
}
/// Compute the set of non-matching bytes for this HIR expression.
pub(crate) fn non_matching_bytes(&self) -> ByteSet {
non_matching_bytes(&self.hir)
}
/// Returns the line terminator configured on this expression.
///
/// When we have beginning/end anchors (NOT line anchors), the fast line
/// searching path isn't quite correct. Or at least, doesn't match the
/// slow path. Namely, the slow path strips line terminators while the
/// fast path does not. Since '$' (when multi-line mode is disabled)
/// doesn't match at line boundaries, the existence of a line terminator
/// might cause it to not match when it otherwise would with the line
/// terminator stripped.
/// searching path isn't quite correct. Or at least, doesn't match the slow
/// path. Namely, the slow path strips line terminators while the fast path
/// does not. Since '$' (when multi-line mode is disabled) doesn't match at
/// line boundaries, the existence of a line terminator might cause it to
/// not match when it otherwise would with the line terminator stripped.
///
/// Since searching with text anchors is exceptionally rare in the
/// context of line oriented searching (multi-line mode is basically
/// always enabled), we just disable this optimization when there are
/// text anchors. We disable it by not returning a line terminator, since
/// Since searching with text anchors is exceptionally rare in the context
/// of line oriented searching (multi-line mode is basically always
/// enabled), we just disable this optimization when there are text
/// anchors. We disable it by not returning a line terminator, since
/// without a line terminator, the fast search path can't be executed.
///
/// Actually, the above is no longer quite correct. Later on, another
/// optimization was added where if the line terminator was in the set of
/// bytes that was guaranteed to never be part of a match, then the higher
/// level search infrastructure assumes that the fast line-by-line search
/// path can still be taken. This optimization applies when multi-line
/// search (not multi-line mode) is enabled. In that case, there is no
/// configured line terminator since the regex is permitted to match a
/// line terminator. But if the regex is guaranteed to never match across
/// multiple lines despite multi-line search being requested, we can still
/// do the faster and more flexible line-by-line search. This is why the
/// non-matching extraction routine removes `\n` when `\A` and `\z` are
/// present even though that's not quite correct...
///
/// See: <https://github.com/BurntSushi/ripgrep/issues/2260>
pub fn line_terminator(&self) -> Option<LineTerminator> {
if self.is_any_anchored() {
pub(crate) fn line_terminator(&self) -> Option<LineTerminator> {
if self.hir.properties().look_set().contains_anchor_haystack() {
None
} else {
self.config.line_terminator
}
}
/// Returns true if and only if the underlying HIR has any text anchors.
fn is_any_anchored(&self) -> bool {
self.expr.is_any_anchored_start() || self.expr.is_any_anchored_end()
}
/// Builds a regular expression from this HIR expression.
pub fn regex(&self) -> Result<Regex, Error> {
self.pattern_to_regex(&self.expr.to_string())
}
/// If this HIR corresponds to an alternation of literals with no
/// capturing groups, then this returns those literals.
pub fn alternation_literals(&self) -> Option<Vec<Vec<u8>>> {
if !self.config.can_plain_aho_corasick() {
return None;
}
alternation_literals(&self.expr)
}
/// Applies the given function to the concrete syntax of this HIR and then
/// generates a new HIR based on the result of the function in a way that
/// preserves the configuration.
/// Turns this configured HIR into one that only matches when both sides of
/// the match correspond to a word boundary.
///
/// For example, this can be used to wrap a user provided regular
/// expression with additional semantics. e.g., See the `WordMatcher`.
pub fn with_pattern<F: FnMut(&str) -> String>(
&self,
mut f: F,
) -> Result<ConfiguredHIR, Error> {
self.pattern_to_hir(&f(&self.expr.to_string()))
/// Note that the HIR returned is like turning `pat` into
/// `(?m:^|\W)(pat)(?m:$|\W)`. That is, the true match is at capture group
/// `1` and not `0`.
pub(crate) fn into_word(self) -> Result<ConfiguredHIR, Error> {
// In theory building the HIR for \W should never fail, but there are
// likely some pathological cases (particularly with respect to certain
// values of limits) where it could in theory fail.
let non_word = {
let mut config = self.config.clone();
config.fixed_strings = false;
ConfiguredHIR::new(config, &[r"\W"])?
};
let line_anchor_start = Hir::look(self.line_anchor_start());
let line_anchor_end = Hir::look(self.line_anchor_end());
let hir = Hir::concat(vec![
Hir::alternation(vec![line_anchor_start, non_word.hir.clone()]),
Hir::capture(hir::Capture {
index: 1,
name: None,
sub: Box::new(renumber_capture_indices(self.hir)?),
}),
Hir::alternation(vec![non_word.hir, line_anchor_end]),
]);
Ok(ConfiguredHIR { config: self.config, hir })
}
/// If the current configuration has a line terminator set and if useful
/// literals could be extracted, then a regular expression matching those
/// literals is returned. If no line terminator is set, then `None` is
/// returned.
///
/// If compiling the resulting regular expression failed, then an error
/// is returned.
///
/// This method only returns something when a line terminator is set
/// because matches from this regex are generally candidates that must be
/// confirmed before reporting a match. When performing a line oriented
/// search, confirmation is easy: just extend the candidate match to its
/// respective line boundaries and then re-search that line for a full
/// match. This only works when the line terminator is set because the line
/// terminator setting guarantees that the regex itself can never match
/// through the line terminator byte.
pub fn fast_line_regex(&self) -> Result<Option<Regex>, Error> {
if self.config.line_terminator.is_none() {
return Ok(None);
/// Turns this configured HIR into an equivalent one, but where it must
/// match at the start and end of a line.
pub(crate) fn into_whole_line(self) -> ConfiguredHIR {
let line_anchor_start = Hir::look(self.line_anchor_start());
let line_anchor_end = Hir::look(self.line_anchor_end());
let hir =
Hir::concat(vec![line_anchor_start, self.hir, line_anchor_end]);
ConfiguredHIR { config: self.config, hir }
}
match LiteralSets::new(&self.expr).one_regex(self.config.word) {
None => Ok(None),
Some(pattern) => self.pattern_to_regex(&pattern).map(Some),
/// Turns this configured HIR into an equivalent one, but where it must
/// match at the start and end of the haystack.
pub(crate) fn into_anchored(self) -> ConfiguredHIR {
let hir = Hir::concat(vec![
Hir::look(hir::Look::Start),
self.hir,
Hir::look(hir::Look::End),
]);
ConfiguredHIR { config: self.config, hir }
}
/// Returns the "start line" anchor for this configuration.
fn line_anchor_start(&self) -> hir::Look {
if self.config.crlf {
hir::Look::StartCRLF
} else {
hir::Look::StartLF
}
}
/// Create a regex from the given pattern using this HIR's configuration.
fn pattern_to_regex(&self, pattern: &str) -> Result<Regex, Error> {
// The settings we explicitly set here are intentionally a subset
// of the settings we have. The key point here is that our HIR
// expression is computed with the settings in mind, such that setting
// them here could actually lead to unintended behavior. For example,
// consider the pattern `(?U)a+`. This will get folded into the HIR
// as a non-greedy repetition operator which will in turn get printed
// to the concrete syntax as `a+?`, which is correct. But if we
// set the `swap_greed` option again, then we'll wind up with `(?U)a+?`
// which is equal to `a+` which is not the same as what we were given.
//
// We also don't need to apply `case_insensitive` since this gets
// folded into the HIR and would just cause us to do redundant work.
//
// Finally, we don't need to set `ignore_whitespace` since the concrete
// syntax emitted by the HIR printer never needs it.
//
// We set the rest of the options. Some of them are important, such as
// the size limit, and some of them are necessary to preserve the
// intention of the original pattern. For example, the Unicode flag
// will impact how the WordMatcher functions, namely, whether its
// word boundaries are Unicode aware or not.
RegexBuilder::new(&pattern)
.nest_limit(self.config.nest_limit)
.octal(self.config.octal)
.multi_line(self.config.multi_line)
.dot_matches_new_line(self.config.dot_matches_new_line)
.unicode(self.config.unicode)
.size_limit(self.config.size_limit)
.dfa_size_limit(self.config.dfa_size_limit)
.build()
.map_err(Error::regex)
/// Returns the "end line" anchor for this configuration.
fn line_anchor_end(&self) -> hir::Look {
if self.config.crlf {
hir::Look::EndCRLF
} else {
hir::Look::EndLF
}
}
}
/// Create an HIR expression from the given pattern using this HIR's
/// configuration.
fn pattern_to_hir(&self, pattern: &str) -> Result<ConfiguredHIR, Error> {
// See `pattern_to_regex` comment for explanation of why we only set
// a subset of knobs here. e.g., `swap_greed` is explicitly left out.
let expr = ::regex_syntax::ParserBuilder::new()
.nest_limit(self.config.nest_limit)
.octal(self.config.octal)
.allow_invalid_utf8(true)
.multi_line(self.config.multi_line)
.dot_matches_new_line(self.config.dot_matches_new_line)
.unicode(self.config.unicode)
.build()
.parse(pattern)
.map_err(Error::regex)?;
Ok(ConfiguredHIR {
original: self.original.clone(),
config: self.config.clone(),
analysis: self.analysis.clone(),
expr,
/// This increments the index of every capture group in the given hir by 1. If
/// any increment results in an overflow, then an error is returned.
fn renumber_capture_indices(hir: Hir) -> Result<Hir, Error> {
Ok(match hir.into_kind() {
HirKind::Empty => Hir::empty(),
HirKind::Literal(hir::Literal(lit)) => Hir::literal(lit),
HirKind::Class(cls) => Hir::class(cls),
HirKind::Look(x) => Hir::look(x),
HirKind::Repetition(mut x) => {
x.sub = Box::new(renumber_capture_indices(*x.sub)?);
Hir::repetition(x)
}
HirKind::Capture(mut cap) => {
cap.index = match cap.index.checked_add(1) {
Some(index) => index,
None => {
// This error message kind of sucks, but it's probably
// impossible for it to happen. The only way a capture
// index can overflow addition is if the regex is huge
// (or something else has gone horribly wrong).
let msg = "could not renumber capture index, too big";
return Err(Error::any(msg));
}
};
cap.sub = Box::new(renumber_capture_indices(*cap.sub)?);
Hir::capture(cap)
}
HirKind::Concat(subs) => {
let subs = subs
.into_iter()
.map(|sub| renumber_capture_indices(sub))
.collect::<Result<Vec<Hir>, Error>>()?;
Hir::concat(subs)
}
HirKind::Alternation(subs) => {
let subs = subs
.into_iter()
.map(|sub| renumber_capture_indices(sub))
.collect::<Result<Vec<Hir>, Error>>()?;
Hir::alternation(subs)
}
})
}
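
End to end, the `into_word` transformation above (with its capture renumbering) is what backs `-w/--word-regexp`; a hypothetical sketch via the public builder:

use grep_matcher::Matcher;
use grep_regex::RegexMatcherBuilder;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let m = RegexMatcherBuilder::new().word(true).build("foo")?;
    assert!(m.is_match(b"a foo b")?); // bounded by non-word characters
    assert!(!m.is_match(b"afoob")?);  // embedded in a word: no match
    Ok(())
}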
/// Returns true if the given literal string contains any byte from the line
/// terminator given.
fn has_line_terminator(lineterm: LineTerminator, literal: &str) -> bool {
if lineterm.is_crlf() {
literal.as_bytes().iter().copied().any(|b| b == b'\r' || b == b'\n')
} else {
literal.as_bytes().iter().copied().any(|b| b == lineterm.as_byte())
}
}
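
A quick sketch of the helper's behavior (it is crate-private; shown here as if callable) under both line terminator modes:

use grep_matcher::LineTerminator;

fn main() {
    // CRLF mode treats either \r or \n as a hit.
    assert!(has_line_terminator(LineTerminator::crlf(), "foo\rbar"));
    // Byte mode only looks for the configured byte.
    assert!(!has_line_terminator(LineTerminator::byte(b'\x00'), "foo\nbar"));
}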

View File

@@ -1,189 +0,0 @@
use std::collections::HashMap;
use grep_matcher::{Match, Matcher, NoError};
use regex::bytes::Regex;
use regex_syntax::hir::{self, Hir, HirKind};
use crate::config::ConfiguredHIR;
use crate::error::Error;
use crate::matcher::RegexCaptures;
/// A matcher for implementing "word match" semantics.
#[derive(Clone, Debug)]
pub struct CRLFMatcher {
/// The regex.
regex: Regex,
/// A map from capture group name to capture group index.
names: HashMap<String, usize>,
}
impl CRLFMatcher {
/// Create a new matcher from the given pattern that strips `\r` from the
/// end of every match.
///
/// This panics if the given expression doesn't need its CRLF stripped.
pub fn new(expr: &ConfiguredHIR) -> Result<CRLFMatcher, Error> {
assert!(expr.needs_crlf_stripped());
let regex = expr.regex()?;
let mut names = HashMap::new();
for (i, optional_name) in regex.capture_names().enumerate() {
if let Some(name) = optional_name {
names.insert(name.to_string(), i.checked_sub(1).unwrap());
}
}
Ok(CRLFMatcher { regex, names })
}
/// Return the underlying regex used by this matcher.
pub fn regex(&self) -> &Regex {
&self.regex
}
}
impl Matcher for CRLFMatcher {
type Captures = RegexCaptures;
type Error = NoError;
fn find_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, NoError> {
let m = match self.regex.find_at(haystack, at) {
None => return Ok(None),
Some(m) => Match::new(m.start(), m.end()),
};
Ok(Some(adjust_match(haystack, m)))
}
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
Ok(RegexCaptures::new(self.regex.capture_locations()))
}
fn capture_count(&self) -> usize {
self.regex.captures_len().checked_sub(1).unwrap()
}
fn capture_index(&self, name: &str) -> Option<usize> {
self.names.get(name).map(|i| *i)
}
fn captures_at(
&self,
haystack: &[u8],
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool, NoError> {
caps.strip_crlf(false);
let r =
self.regex.captures_read_at(caps.locations_mut(), haystack, at);
if !r.is_some() {
return Ok(false);
}
// If the end of our match includes a `\r`, then strip it from all
// capture groups ending at the same location.
let end = caps.locations().get(0).unwrap().1;
if end > 0 && haystack.get(end - 1) == Some(&b'\r') {
caps.strip_crlf(true);
}
Ok(true)
}
// We specifically do not implement other methods like find_iter or
// captures_iter. Namely, the iter methods are guaranteed to be correct
// by virtue of implementing find_at and captures_at above.
}
/// If the given match ends with a `\r`, then return a new match that ends
/// immediately before the `\r`.
pub fn adjust_match(haystack: &[u8], m: Match) -> Match {
if m.end() > 0 && haystack.get(m.end() - 1) == Some(&b'\r') {
m.with_end(m.end() - 1)
} else {
m
}
}
/// Substitutes all occurrences of multi-line enabled `$` with `(?:\r?$)`.
///
/// This does not preserve the exact semantics of the given expression;
/// however, it does have the useful property that anything that matched the
/// given expression will also match the returned expression. The difference is
/// that the returned expression can possibly match other things as well.
///
/// The principal reason why we do this is that the underlying regex engine
/// doesn't support CRLF-aware `$` look-around. It's planned to fix it at that
/// level, but we perform this kludge in the meantime.
///
/// Note that while the match preserving semantics are nice and neat, the
/// match position semantics are quite a bit messier. Namely, `$` only ever
/// matches the position between characters, whereas `\r??` can match a
/// character and change the offset. This is regrettable, but works out pretty
/// nicely in most cases, especially when a match is limited to a single line.
pub fn crlfify(expr: Hir) -> Hir {
match expr.into_kind() {
HirKind::Anchor(hir::Anchor::EndLine) => {
let concat = Hir::concat(vec![
Hir::repetition(hir::Repetition {
kind: hir::RepetitionKind::ZeroOrOne,
greedy: false,
hir: Box::new(Hir::literal(hir::Literal::Unicode('\r'))),
}),
Hir::anchor(hir::Anchor::EndLine),
]);
Hir::group(hir::Group {
kind: hir::GroupKind::NonCapturing,
hir: Box::new(concat),
})
}
HirKind::Empty => Hir::empty(),
HirKind::Literal(x) => Hir::literal(x),
HirKind::Class(x) => Hir::class(x),
HirKind::Anchor(x) => Hir::anchor(x),
HirKind::WordBoundary(x) => Hir::word_boundary(x),
HirKind::Repetition(mut x) => {
x.hir = Box::new(crlfify(*x.hir));
Hir::repetition(x)
}
HirKind::Group(mut x) => {
x.hir = Box::new(crlfify(*x.hir));
Hir::group(x)
}
HirKind::Concat(xs) => {
Hir::concat(xs.into_iter().map(crlfify).collect())
}
HirKind::Alternation(xs) => {
Hir::alternation(xs.into_iter().map(crlfify).collect())
}
}
}
#[cfg(test)]
mod tests {
use super::crlfify;
use regex_syntax::Parser;
fn roundtrip(pattern: &str) -> String {
let expr1 = Parser::new().parse(pattern).unwrap();
let expr2 = crlfify(expr1);
expr2.to_string()
}
#[test]
fn various() {
assert_eq!(roundtrip(r"(?m)$"), "(?:\r??(?m:$))");
assert_eq!(roundtrip(r"(?m)$$"), "(?:\r??(?m:$))(?:\r??(?m:$))");
assert_eq!(
roundtrip(r"(?m)(?:foo$|bar$)"),
"(?:foo(?:\r??(?m:$))|bar(?:\r??(?m:$)))"
);
assert_eq!(roundtrip(r"(?m)$a"), "(?:\r??(?m:$))a");
// Not a multiline `$`, so no crlfifying occurs.
assert_eq!(roundtrip(r"$"), "\\z");
// It's a literal, derp.
assert_eq!(roundtrip(r"\$"), "\\$");
}
}
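
This deleted `crlfify` kludge is obsolete because regex-syntax/regex-automata now support CRLF-aware line anchors natively (the `R` flag, and `Look::StartCRLF`/`Look::EndCRLF` as used in config.rs above). A rough sketch of the replacement behavior:

use regex_automata::meta::Regex;

fn main() {
    // (?m) enables multi-line anchors; (?R) makes them CRLF-aware,
    // so `$` can match just before a \r\n pair.
    let re = Regex::new(r"(?mR)foo$").unwrap();
    let m = re.find("foo\r\nbar").unwrap();
    assert_eq!(m.range(), 0..3); // the \r is not part of the match
}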

View File

@@ -1,8 +1,3 @@
use std::error;
use std::fmt;
use crate::util;
/// An error that can occur in this crate.
///
/// Generally, this error corresponds to problems building a regular
@@ -18,10 +13,27 @@ impl Error {
Error { kind }
}
pub(crate) fn regex<E: error::Error>(err: E) -> Error {
pub(crate) fn regex(err: regex_automata::meta::BuildError) -> Error {
if let Some(size_limit) = err.size_limit() {
let kind = ErrorKind::Regex(format!(
"compiled regex exceeds size limit of {size_limit}",
));
Error { kind }
} else if let Some(ref err) = err.syntax_error() {
Error::generic(err)
} else {
Error::generic(err)
}
}
pub(crate) fn generic<E: std::error::Error>(err: E) -> Error {
Error { kind: ErrorKind::Regex(err.to_string()) }
}
pub(crate) fn any<E: ToString>(msg: E) -> Error {
Error { kind: ErrorKind::Regex(msg.to_string()) }
}
/// Return the kind of this error.
pub fn kind(&self) -> &ErrorKind {
&self.kind
@@ -30,6 +42,7 @@ impl Error {
/// The kind of an error that can occur.
#[derive(Clone, Debug)]
#[non_exhaustive]
pub enum ErrorKind {
/// An error that occurred as a result of parsing a regular expression.
/// This can be a syntax error or an error that results from attempting to
@@ -51,38 +64,26 @@ pub enum ErrorKind {
///
/// The invalid byte is included in this error.
InvalidLineTerminator(u8),
/// Hints that destructuring should not be exhaustive.
///
/// This enum may grow additional variants, so this makes sure clients
/// don't count on exhaustive matching. (Otherwise, adding a new variant
/// could break existing code.)
#[doc(hidden)]
__Nonexhaustive,
}
impl error::Error for Error {
fn description(&self) -> &str {
match self.kind {
ErrorKind::Regex(_) => "regex error",
ErrorKind::NotAllowed(_) => "literal not allowed",
ErrorKind::InvalidLineTerminator(_) => "invalid line terminator",
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}
impl std::error::Error for Error {}
impl std::fmt::Display for Error {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
use bstr::ByteSlice;
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self.kind {
ErrorKind::Regex(ref s) => write!(f, "{}", s),
ErrorKind::NotAllowed(ref lit) => {
write!(f, "the literal '{:?}' is not allowed in a regex", lit)
write!(f, "the literal {:?} is not allowed in a regex", lit)
}
ErrorKind::InvalidLineTerminator(byte) => {
let x = util::show_bytes(&[byte]);
write!(f, "line terminators must be ASCII, but '{}' is not", x)
}
ErrorKind::__Nonexhaustive => unreachable!(),
write!(
f,
"line terminators must be ASCII, but {} is not",
[byte].as_bstr()
)
}
}
}
}
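
For reference, a small sketch of what the new `Display` arm produces for a non-ASCII line terminator byte, using `bstr` to escape it:

use bstr::ByteSlice;

fn main() {
    let byte = 0xFFu8;
    // Mirrors the Display arm above; bstr renders the invalid byte in
    // escaped form (e.g. "\xff") rather than as mojibake.
    println!("line terminators must be ASCII, but {} is not", [byte].as_bstr());
}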

View File

@@ -8,12 +8,9 @@ pub use crate::matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
mod ast;
mod config;
mod crlf;
mod error;
mod literal;
mod matcher;
mod multi;
mod non_matching;
mod strip;
mod util;
mod word;

File diff suppressed because it is too large

View File

@@ -1,15 +1,22 @@
use std::collections::HashMap;
use std::sync::Arc;
use grep_matcher::{
ByteSet, Captures, LineMatchKind, LineTerminator, Match, Matcher, NoError,
use {
grep_matcher::{
ByteSet, Captures, LineMatchKind, LineTerminator, Match, Matcher,
NoError,
},
regex_automata::{
meta::Regex, util::captures::Captures as AutomataCaptures, Input,
PatternID,
},
};
use regex::bytes::{CaptureLocations, Regex};
use crate::config::{Config, ConfiguredHIR};
use crate::crlf::CRLFMatcher;
use crate::error::Error;
use crate::multi::MultiLiteralMatcher;
use crate::word::WordMatcher;
use crate::{
config::{Config, ConfiguredHIR},
error::Error,
literal::InnerLiterals,
word::WordMatcher,
};
/// A builder for constructing a `Matcher` using regular expressions.
///
@@ -43,18 +50,37 @@ impl RegexMatcherBuilder {
/// The syntax supported is documented as part of the regex crate:
/// <https://docs.rs/regex/#syntax>.
pub fn build(&self, pattern: &str) -> Result<RegexMatcher, Error> {
let chir = self.config.hir(pattern)?;
let fast_line_regex = chir.fast_line_regex()?;
let non_matching_bytes = chir.non_matching_bytes();
if let Some(ref re) = fast_line_regex {
log::debug!("extracted fast line regex: {:?}", re);
self.build_many(&[pattern])
}
let matcher = RegexMatcherImpl::new(&chir)?;
log::trace!("final regex: {:?}", matcher.regex());
let mut config = self.config.clone();
// We override the line terminator in case the configured expr doesn't
/// Build a new matcher using the current configuration for the provided
/// patterns. The resulting matcher behaves as if all of the patterns
/// given are joined together into a single alternation. That is, it
/// reports matches where at least one of the given patterns matches.
pub fn build_many<P: AsRef<str>>(
&self,
patterns: &[P],
) -> Result<RegexMatcher, Error> {
let chir = self.config.build_many(patterns)?;
let matcher = RegexMatcherImpl::new(chir)?;
let (chir, re) = (matcher.chir(), matcher.regex());
log::trace!("final regex: {:?}", chir.hir().to_string());
let non_matching_bytes = chir.non_matching_bytes();
// If we can pick out some literals from the regex, then we might be
// able to build a faster regex that quickly identifies candidate
// matching lines. The regex engine will do what it can on its own, but
// we can specifically do a little more when a line terminator is set.
// For example, for a regex like `\w+foo\w+`, we can look for `foo`,
// and when a match is found, look for the line containing `foo` and
// then run the original regex on only that line. (In this case, the
// regex engine is likely to handle this case for us since it's so
// simple, but the idea applies.)
let fast_line_regex = InnerLiterals::new(chir, re).one_regex()?;
// We override the line terminator in case the configured HIR doesn't
// support it.
let mut config = self.config.clone();
config.line_terminator = chir.line_terminator();
Ok(RegexMatcher {
config,
@@ -73,39 +99,7 @@ impl RegexMatcherBuilder {
&self,
literals: &[B],
) -> Result<RegexMatcher, Error> {
let mut has_escape = false;
let mut slices = vec![];
for lit in literals {
slices.push(lit.as_ref());
has_escape = has_escape || lit.as_ref().contains('\\');
}
// Even when we have a fixed set of literals, we might still want to
// use the regex engine. Specifically, if any string has an escape
// in it, then we probably can't feed it to Aho-Corasick without
// removing the escape. Additionally, if there are any particular
// special match semantics we need to honor, that Aho-Corasick isn't
// enough. Finally, the regex engine can do really well with a small
// number of literals (at time of writing, this is changing soon), so
// we use it when there's a small set.
//
// Yes, this is one giant hack. Ideally, this entirely separate literal
// matcher that uses Aho-Corasick would be pushed down into the regex
// engine.
if has_escape
|| !self.config.can_plain_aho_corasick()
|| literals.len() < 40
{
return self.build(&slices.join("|"));
}
let matcher = MultiLiteralMatcher::new(&slices)?;
let imp = RegexMatcherImpl::MultiLiteral(matcher);
Ok(RegexMatcher {
config: self.config.clone(),
matcher: imp,
fast_line_regex: None,
non_matching_bytes: ByteSet::empty(),
})
self.build_many(literals)
}
/// Set the value for the case insensitive (`i`) flag.
@@ -306,20 +300,15 @@ impl RegexMatcherBuilder {
/// 1. It causes the line terminator for the matcher to be `\r\n`. Namely,
/// this prevents the matcher from ever producing a match that contains
/// a `\r` or `\n`.
/// 2. It translates all instances of `$` in the pattern to `(?:\r??$)`.
/// This works around the fact that the regex engine does not support
/// matching CRLF as a line terminator when using `$`.
/// 2. It enables CRLF mode for `^` and `$`. This means that line anchors
/// will treat both `\r` and `\n` as line terminators, but will never
/// match between a `\r` and `\n`.
///
/// In particular, because of (2), the matches produced by the matcher may
/// be slightly different than what one would expect given the pattern.
/// This is the trade off made: in many cases, `$` will "just work" in the
/// presence of `\r\n` line terminators, but matches may require some
/// trimming to faithfully represent the intended match.
///
/// Note that if you do not wish to set the line terminator but would still
/// like `$` to match `\r\n` line terminators, then it is valid to call
/// `crlf(true)` followed by `line_terminator(None)`. Ordering is
/// important, since `crlf` and `line_terminator` override each other.
/// Note that if you do not wish to set the line terminator but would
/// still like `$` to match `\r\n` line terminators, then it is valid to
/// call `crlf(true)` followed by `line_terminator(None)`. Ordering is
/// important, since `crlf` sets the line terminator, but `line_terminator`
/// does not touch the `crlf` setting.
pub fn crlf(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
if yes {
self.config.line_terminator = Some(LineTerminator::crlf());
@@ -345,6 +334,21 @@ impl RegexMatcherBuilder {
self.config.word = yes;
self
}
/// Whether the patterns should be treated as literal strings or not. When
/// this is active, all characters, including ones that would normally be
/// special regex meta characters, are matched literally.
pub fn fixed_strings(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.config.fixed_strings = yes;
self
}
/// Whether each pattern should match the entire line or not. This is
/// equivalent to surrounding the pattern with `(?m:^)` and `(?m:$)`.
pub fn whole_line(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.config.whole_line = yes;
self
}
}
/// An implementation of the `Matcher` trait using Rust's standard regex
@@ -374,10 +378,10 @@ impl RegexMatcher {
/// Create a new matcher from the given pattern using the default
/// configuration, but matches lines terminated by `\n`.
///
/// This is meant to be a convenience constructor for using a
/// `RegexMatcherBuilder` and setting its
/// [`line_terminator`](struct.RegexMatcherBuilder.html#method.line_terminator)
/// to `\n`. The purpose of using this constructor is to permit special
/// This is meant to be a convenience constructor for
/// using a `RegexMatcherBuilder` and setting its
/// [`line_terminator`](RegexMatcherBuilder::line_terminator) to
/// `\n`. The purpose of using this constructor is to permit special
/// optimizations that help speed up line oriented search. These types of
/// optimizations are only appropriate when matches span no more than one
/// line. For this reason, this constructor will return an error if the
@@ -393,13 +397,6 @@ impl RegexMatcher {
enum RegexMatcherImpl {
/// The standard matcher used for all regular expressions.
Standard(StandardMatcher),
/// A matcher for an alternation of plain literals.
MultiLiteral(MultiLiteralMatcher),
/// A matcher that strips `\r` from the end of matches.
///
/// This is only used when the CRLF hack is enabled and the regex is line
/// anchored at the end.
CRLF(CRLFMatcher),
/// A matcher that only matches at word boundaries. This transforms the
/// regex to `(^|\W)(...)($|\W)` instead of the more intuitive `\b(...)\b`.
/// Because of this, the WordMatcher provides its own implementation of
@@ -411,29 +408,33 @@ enum RegexMatcherImpl {
impl RegexMatcherImpl {
/// Based on the configuration, create a new implementation of the
/// `Matcher` trait.
fn new(expr: &ConfiguredHIR) -> Result<RegexMatcherImpl, Error> {
if expr.config().word {
Ok(RegexMatcherImpl::Word(WordMatcher::new(expr)?))
} else if expr.needs_crlf_stripped() {
Ok(RegexMatcherImpl::CRLF(CRLFMatcher::new(expr)?))
fn new(mut chir: ConfiguredHIR) -> Result<RegexMatcherImpl, Error> {
// When whole_line is set, we don't use a word matcher even if word
// matching was requested. Why? Because `(?m:^)(pat)(?m:$)` implies
// word matching.
Ok(if chir.config().word && !chir.config().whole_line {
RegexMatcherImpl::Word(WordMatcher::new(chir)?)
} else {
if let Some(lits) = expr.alternation_literals() {
if lits.len() >= 40 {
let matcher = MultiLiteralMatcher::new(&lits)?;
return Ok(RegexMatcherImpl::MultiLiteral(matcher));
}
}
Ok(RegexMatcherImpl::Standard(StandardMatcher::new(expr)?))
if chir.config().whole_line {
chir = chir.into_whole_line();
}
RegexMatcherImpl::Standard(StandardMatcher::new(chir)?)
})
}
/// Return the underlying regex object used.
fn regex(&self) -> String {
fn regex(&self) -> &Regex {
match *self {
RegexMatcherImpl::Word(ref x) => x.regex().to_string(),
RegexMatcherImpl::CRLF(ref x) => x.regex().to_string(),
RegexMatcherImpl::MultiLiteral(_) => "<N/A>".to_string(),
RegexMatcherImpl::Standard(ref x) => x.regex.to_string(),
RegexMatcherImpl::Word(ref x) => x.regex(),
RegexMatcherImpl::Standard(ref x) => &x.regex,
}
}
/// Return the underlying HIR of the regex used for searching.
fn chir(&self) -> &ConfiguredHIR {
match *self {
RegexMatcherImpl::Word(ref x) => x.chir(),
RegexMatcherImpl::Standard(ref x) => &x.chir,
}
}
}
@@ -453,8 +454,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.find_at(haystack, at),
MultiLiteral(ref m) => m.find_at(haystack, at),
CRLF(ref m) => m.find_at(haystack, at),
Word(ref m) => m.find_at(haystack, at),
}
}
@@ -463,8 +462,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.new_captures(),
MultiLiteral(ref m) => m.new_captures(),
CRLF(ref m) => m.new_captures(),
Word(ref m) => m.new_captures(),
}
}
@@ -473,8 +470,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.capture_count(),
MultiLiteral(ref m) => m.capture_count(),
CRLF(ref m) => m.capture_count(),
Word(ref m) => m.capture_count(),
}
}
@@ -483,8 +478,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.capture_index(name),
MultiLiteral(ref m) => m.capture_index(name),
CRLF(ref m) => m.capture_index(name),
Word(ref m) => m.capture_index(name),
}
}
@@ -493,8 +486,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.find(haystack),
MultiLiteral(ref m) => m.find(haystack),
CRLF(ref m) => m.find(haystack),
Word(ref m) => m.find(haystack),
}
}
@@ -506,8 +497,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.find_iter(haystack, matched),
MultiLiteral(ref m) => m.find_iter(haystack, matched),
CRLF(ref m) => m.find_iter(haystack, matched),
Word(ref m) => m.find_iter(haystack, matched),
}
}
@@ -523,8 +512,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.try_find_iter(haystack, matched),
MultiLiteral(ref m) => m.try_find_iter(haystack, matched),
CRLF(ref m) => m.try_find_iter(haystack, matched),
Word(ref m) => m.try_find_iter(haystack, matched),
}
}
@@ -537,8 +524,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.captures(haystack, caps),
MultiLiteral(ref m) => m.captures(haystack, caps),
CRLF(ref m) => m.captures(haystack, caps),
Word(ref m) => m.captures(haystack, caps),
}
}
@@ -555,8 +540,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.captures_iter(haystack, caps, matched),
MultiLiteral(ref m) => m.captures_iter(haystack, caps, matched),
CRLF(ref m) => m.captures_iter(haystack, caps, matched),
Word(ref m) => m.captures_iter(haystack, caps, matched),
}
}
@@ -573,10 +556,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.try_captures_iter(haystack, caps, matched),
MultiLiteral(ref m) => {
m.try_captures_iter(haystack, caps, matched)
}
CRLF(ref m) => m.try_captures_iter(haystack, caps, matched),
Word(ref m) => m.try_captures_iter(haystack, caps, matched),
}
}
@@ -590,8 +569,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.captures_at(haystack, at, caps),
MultiLiteral(ref m) => m.captures_at(haystack, at, caps),
CRLF(ref m) => m.captures_at(haystack, at, caps),
Word(ref m) => m.captures_at(haystack, at, caps),
}
}
@@ -608,8 +585,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.replace(haystack, dst, append),
MultiLiteral(ref m) => m.replace(haystack, dst, append),
CRLF(ref m) => m.replace(haystack, dst, append),
Word(ref m) => m.replace(haystack, dst, append),
}
}
@@ -629,12 +604,6 @@ impl Matcher for RegexMatcher {
Standard(ref m) => {
m.replace_with_captures(haystack, caps, dst, append)
}
MultiLiteral(ref m) => {
m.replace_with_captures(haystack, caps, dst, append)
}
CRLF(ref m) => {
m.replace_with_captures(haystack, caps, dst, append)
}
Word(ref m) => {
m.replace_with_captures(haystack, caps, dst, append)
}
@@ -645,8 +614,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.is_match(haystack),
MultiLiteral(ref m) => m.is_match(haystack),
CRLF(ref m) => m.is_match(haystack),
Word(ref m) => m.is_match(haystack),
}
}
@@ -659,8 +626,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.is_match_at(haystack, at),
MultiLiteral(ref m) => m.is_match_at(haystack, at),
CRLF(ref m) => m.is_match_at(haystack, at),
Word(ref m) => m.is_match_at(haystack, at),
}
}
@@ -672,8 +637,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.shortest_match(haystack),
MultiLiteral(ref m) => m.shortest_match(haystack),
CRLF(ref m) => m.shortest_match(haystack),
Word(ref m) => m.shortest_match(haystack),
}
}
@@ -686,8 +649,6 @@ impl Matcher for RegexMatcher {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.shortest_match_at(haystack, at),
MultiLiteral(ref m) => m.shortest_match_at(haystack, at),
CRLF(ref m) => m.shortest_match_at(haystack, at),
Word(ref m) => m.shortest_match_at(haystack, at),
}
}
@@ -706,7 +667,10 @@ impl Matcher for RegexMatcher {
) -> Result<Option<LineMatchKind>, NoError> {
Ok(match self.fast_line_regex {
Some(ref regex) => {
regex.shortest_match(haystack).map(LineMatchKind::Candidate)
let input = Input::new(haystack);
regex
.search_half(&input)
.map(|hm| LineMatchKind::Candidate(hm.offset()))
}
None => {
self.shortest_match(haystack)?.map(LineMatchKind::Confirmed)
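
For readers following the migration, the hunk above swaps the old regex crate's `shortest_match` for regex-automata's `search_half`, which reports only the end offset of a match as a `HalfMatch`. A minimal sketch of that call, with an illustrative pattern and haystack:

    use regex_automata::{meta::Regex, Input};

    // A half match reports just the end offset; the reverse scan that would
    // locate the start position is skipped, which is all a candidate-line
    // check needs.
    let re = Regex::new(r"[0-9]+").unwrap();
    let hm = re.search_half(&Input::new("abc 123 def")).unwrap();
    assert_eq!(7, hm.offset()); // end of "123"
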
@@ -721,20 +685,19 @@ struct StandardMatcher {
/// The regular expression compiled from the pattern provided by the
/// caller.
regex: Regex,
/// A map from capture group name to its corresponding index.
names: HashMap<String, usize>,
/// The HIR that produced this regex.
///
/// We put this in an `Arc` because by the time it gets here, it won't
/// change. And because cloning and dropping an `Hir` is somewhat expensive
/// due to its deep recursive representation.
chir: Arc<ConfiguredHIR>,
}
impl StandardMatcher {
fn new(expr: &ConfiguredHIR) -> Result<StandardMatcher, Error> {
let regex = expr.regex()?;
let mut names = HashMap::new();
for (i, optional_name) in regex.capture_names().enumerate() {
if let Some(name) = optional_name {
names.insert(name.to_string(), i);
}
}
Ok(StandardMatcher { regex, names })
fn new(chir: ConfiguredHIR) -> Result<StandardMatcher, Error> {
let chir = Arc::new(chir);
let regex = chir.to_regex()?;
Ok(StandardMatcher { regex, chir })
}
}
@@ -747,14 +710,12 @@ impl Matcher for StandardMatcher {
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, NoError> {
Ok(self
.regex
.find_at(haystack, at)
.map(|m| Match::new(m.start(), m.end())))
let input = Input::new(haystack).span(at..haystack.len());
Ok(self.regex.find(input).map(|m| Match::new(m.start(), m.end())))
}
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
Ok(RegexCaptures::new(self.regex.capture_locations()))
Ok(RegexCaptures::new(self.regex.create_captures()))
}
fn capture_count(&self) -> usize {
@@ -762,7 +723,7 @@ impl Matcher for StandardMatcher {
}
fn capture_index(&self, name: &str) -> Option<usize> {
self.names.get(name).map(|i| *i)
self.regex.group_info().to_index(PatternID::ZERO, name)
}
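
A quick sketch of the `group_info` lookup that replaces the hand-rolled `names` map, with an illustrative pattern:

    use regex_automata::{meta::Regex, PatternID};

    let re = Regex::new(r"(?P<year>[0-9]{4})-(?P<month>[0-9]{2})").unwrap();
    // Group indices are tracked per pattern; a regex built from a single
    // pattern always uses PatternID::ZERO.
    let info = re.group_info();
    assert_eq!(Some(1), info.to_index(PatternID::ZERO, "year"));
    assert_eq!(Some(2), info.to_index(PatternID::ZERO, "month"));
    assert_eq!(None, info.to_index(PatternID::ZERO, "day"));
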
fn try_find_iter<F, E>(
@@ -789,10 +750,10 @@ impl Matcher for StandardMatcher {
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool, NoError> {
Ok(self
.regex
.captures_read_at(&mut caps.locations_mut(), haystack, at)
.is_some())
let input = Input::new(haystack).span(at..haystack.len());
let caps = caps.captures_mut();
self.regex.search_captures(&input, caps);
Ok(caps.is_match())
}
fn shortest_match_at(
@@ -800,7 +761,8 @@ impl Matcher for StandardMatcher {
haystack: &[u8],
at: usize,
) -> Result<Option<usize>, NoError> {
Ok(self.regex.shortest_match_at(haystack, at))
let input = Input::new(haystack).span(at..haystack.len());
Ok(self.regex.search_half(&input).map(|hm| hm.offset()))
}
}
@@ -819,17 +781,9 @@ impl Matcher for StandardMatcher {
/// index of the group using the corresponding matcher's `capture_index`
/// method, and then use that index with `RegexCaptures::get`.
#[derive(Clone, Debug)]
pub struct RegexCaptures(RegexCapturesImp);
#[derive(Clone, Debug)]
enum RegexCapturesImp {
AhoCorasick {
/// The start and end of the match, corresponding to capture group 0.
mat: Option<Match>,
},
Regex {
/// Where the locations are stored.
locs: CaptureLocations,
pub struct RegexCaptures {
/// Where the captures are stored.
caps: AutomataCaptures,
/// These captures behave as if the capturing groups begin at the given
/// offset. When set to `0`, this has no effect and capture groups are
/// indexed like normal.
@@ -841,115 +795,37 @@ enum RegexCapturesImp {
/// the matcher and the capturing groups must behave as if `(re)` is
/// the `0`th capture group.
offset: usize,
/// When enabled, the end of a match has `\r` stripped from it, if one
/// exists.
strip_crlf: bool,
},
}
impl Captures for RegexCaptures {
fn len(&self) -> usize {
match self.0 {
RegexCapturesImp::AhoCorasick { .. } => 1,
RegexCapturesImp::Regex { ref locs, offset, .. } => {
locs.len().checked_sub(offset).unwrap()
}
}
self.caps
.group_info()
.all_group_len()
.checked_sub(self.offset)
.unwrap()
}
fn get(&self, i: usize) -> Option<Match> {
match self.0 {
RegexCapturesImp::AhoCorasick { mat, .. } => {
if i == 0 {
mat
} else {
None
}
}
RegexCapturesImp::Regex { ref locs, offset, strip_crlf } => {
if !strip_crlf {
let actual = i.checked_add(offset).unwrap();
return locs.pos(actual).map(|(s, e)| Match::new(s, e));
}
// currently don't support capture offsetting with CRLF
// stripping
assert_eq!(offset, 0);
let m = match locs.pos(i).map(|(s, e)| Match::new(s, e)) {
None => return None,
Some(m) => m,
};
// If the end position of this match corresponds to the end
// position of the overall match, then we apply our CRLF
// stripping. Otherwise, we cannot assume stripping is correct.
if i == 0 || m.end() == locs.pos(0).unwrap().1 {
Some(m.with_end(m.end() - 1))
} else {
Some(m)
}
}
}
let actual = i.checked_add(self.offset).unwrap();
self.caps.get_group(actual).map(|sp| Match::new(sp.start, sp.end))
}
}
impl RegexCaptures {
pub(crate) fn simple() -> RegexCaptures {
RegexCaptures(RegexCapturesImp::AhoCorasick { mat: None })
}
pub(crate) fn new(locs: CaptureLocations) -> RegexCaptures {
RegexCaptures::with_offset(locs, 0)
pub(crate) fn new(caps: AutomataCaptures) -> RegexCaptures {
RegexCaptures::with_offset(caps, 0)
}
pub(crate) fn with_offset(
locs: CaptureLocations,
caps: AutomataCaptures,
offset: usize,
) -> RegexCaptures {
RegexCaptures(RegexCapturesImp::Regex {
locs,
offset,
strip_crlf: false,
})
RegexCaptures { caps, offset }
}
pub(crate) fn locations(&self) -> &CaptureLocations {
match self.0 {
RegexCapturesImp::AhoCorasick { .. } => {
panic!("getting locations for simple captures is invalid")
}
RegexCapturesImp::Regex { ref locs, .. } => locs,
}
}
pub(crate) fn locations_mut(&mut self) -> &mut CaptureLocations {
match self.0 {
RegexCapturesImp::AhoCorasick { .. } => {
panic!("getting locations for simple captures is invalid")
}
RegexCapturesImp::Regex { ref mut locs, .. } => locs,
}
}
pub(crate) fn strip_crlf(&mut self, yes: bool) {
match self.0 {
RegexCapturesImp::AhoCorasick { .. } => {
panic!("setting strip_crlf for simple captures is invalid")
}
RegexCapturesImp::Regex { ref mut strip_crlf, .. } => {
*strip_crlf = yes;
}
}
}
pub(crate) fn set_simple(&mut self, one: Option<Match>) {
match self.0 {
RegexCapturesImp::AhoCorasick { ref mut mat } => {
*mat = one;
}
RegexCapturesImp::Regex { .. } => {
panic!("setting simple captures for regex is invalid")
}
}
pub(crate) fn captures_mut(&mut self) -> &mut AutomataCaptures {
&mut self.caps
}
}
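
The `offset` field above exists for the word matcher, whose wrapping regex shifts every user-visible capture group up by one. A minimal sketch of the indexing rule, written against regex-automata directly with an illustrative pattern:

    use regex_automata::{meta::Regex, Input};

    // A user pattern `p` is wrapped as `(^|\W)(p)(\W|$)`, so the user's
    // group i is really group i + 1 of the wrapping regex.
    let re = Regex::new(r"(?:^|\W)(fo(o))(?:\W|$)").unwrap();
    let mut caps = re.create_captures();
    re.search_captures(&Input::new("a foo b"), &mut caps);
    let offset = 1;
    let user_group = |i: usize| caps.get_group(i + offset).unwrap();
    assert_eq!((2, 5), { let sp = user_group(0); (sp.start, sp.end) }); // "foo"
    assert_eq!((4, 5), { let sp = user_group(1); (sp.start, sp.end) }); // "o"
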
@@ -1036,7 +912,9 @@ mod tests {
}
// Test that finding candidate lines works as expected.
// FIXME: Re-enable this test once inner literal extraction works.
#[test]
#[ignore]
fn candidate_lines() {
fn is_confirmed(m: LineMatchKind) -> bool {
match m {


@@ -1,6 +1,6 @@
use aho_corasick::{AhoCorasick, AhoCorasickBuilder, MatchKind};
use aho_corasick::{AhoCorasick, MatchKind};
use grep_matcher::{Match, Matcher, NoError};
use regex_syntax::hir::Hir;
use regex_syntax::hir::{Hir, HirKind};
use crate::error::Error;
use crate::matcher::RegexCaptures;
@@ -23,11 +23,10 @@ impl MultiLiteralMatcher {
pub fn new<B: AsRef<[u8]>>(
literals: &[B],
) -> Result<MultiLiteralMatcher, Error> {
let ac = AhoCorasickBuilder::new()
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.auto_configure(literals)
.build_with_size::<usize, _, _>(literals)
.map_err(Error::regex)?;
.build(literals)
.map_err(Error::generic)?;
Ok(MultiLiteralMatcher { ac })
}
}
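
For reference, a minimal sketch of the aho-corasick 1.0 builder style used above, with illustrative patterns; `auto_configure` and the type-parameterized `build_with_size` are gone, and `build` now returns a `Result`:

    use aho_corasick::{AhoCorasick, MatchKind};

    let ac = AhoCorasick::builder()
        .match_kind(MatchKind::LeftmostFirst)
        .build(&["foo", "foobar"])
        .unwrap();
    // Leftmost-first semantics: at the same start position, the pattern
    // listed first wins.
    let m = ac.find("foobar").unwrap();
    assert_eq!((0, 3), (m.start(), m.end()));
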
@@ -79,13 +78,11 @@ impl Matcher for MultiLiteralMatcher {
/// Alternation literals checks if the given HIR is a simple alternation of
/// literals, and if so, returns them. Otherwise, this returns None.
pub fn alternation_literals(expr: &Hir) -> Option<Vec<Vec<u8>>> {
use regex_syntax::hir::{HirKind, Literal};
// This is pretty hacky, but basically, if `is_alternation_literal` is
// true, then we can make several assumptions about the structure of our
// HIR. This is what justifies the `unreachable!` statements below.
if !expr.is_alternation_literal() {
if !expr.properties().is_alternation_literal() {
return None;
}
let alts = match *expr.kind() {
@@ -93,26 +90,16 @@ pub fn alternation_literals(expr: &Hir) -> Option<Vec<Vec<u8>>> {
_ => return None, // one literal isn't worth it
};
let extendlit = |lit: &Literal, dst: &mut Vec<u8>| match *lit {
Literal::Unicode(c) => {
let mut buf = [0; 4];
dst.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
}
Literal::Byte(b) => {
dst.push(b);
}
};
let mut lits = vec![];
for alt in alts {
let mut lit = vec![];
match *alt.kind() {
HirKind::Empty => {}
HirKind::Literal(ref x) => extendlit(x, &mut lit),
HirKind::Literal(ref x) => lit.extend_from_slice(&x.0),
HirKind::Concat(ref exprs) => {
for e in exprs {
match *e.kind() {
HirKind::Literal(ref x) => extendlit(x, &mut lit),
HirKind::Literal(ref x) => lit.extend_from_slice(&x.0),
_ => unreachable!("expected literal, got {:?}", e),
}
}
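
A short sketch of the HIR property that gates this extraction, with illustrative patterns:

    use regex_syntax::parse;

    // True only when the expression is a literal or an alternation of
    // literals, with no classes, repetitions or look-arounds.
    assert!(parse("foo|bar|quux")
        .unwrap()
        .properties()
        .is_alternation_literal());
    assert!(!parse("foo|ba[rz]")
        .unwrap()
        .properties()
        .is_alternation_literal());
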


@@ -1,9 +1,13 @@
use grep_matcher::ByteSet;
use regex_syntax::hir::{self, Hir, HirKind};
use regex_syntax::utf8::Utf8Sequences;
use {
grep_matcher::ByteSet,
regex_syntax::{
hir::{self, Hir, HirKind, Look},
utf8::Utf8Sequences,
},
};
/// Return a confirmed set of non-matching bytes from the given expression.
pub fn non_matching_bytes(expr: &Hir) -> ByteSet {
pub(crate) fn non_matching_bytes(expr: &Hir) -> ByteSet {
let mut set = ByteSet::full();
remove_matching_bytes(expr, &mut set);
set
@@ -13,18 +17,27 @@ pub fn non_matching_bytes(expr: &Hir) -> ByteSet {
/// the given expression.
fn remove_matching_bytes(expr: &Hir, set: &mut ByteSet) {
match *expr.kind() {
HirKind::Empty | HirKind::WordBoundary(_) => {}
HirKind::Anchor(_) => {
HirKind::Empty
| HirKind::Look(Look::WordAscii | Look::WordAsciiNegate)
| HirKind::Look(Look::WordUnicode | Look::WordUnicodeNegate) => {}
HirKind::Look(Look::Start | Look::End) => {
// FIXME: This is wrong, but not doing this leads to incorrect
// results because of how anchored searches are implemented in
// the 'grep-searcher' crate.
set.remove(b'\n');
}
HirKind::Literal(hir::Literal::Unicode(c)) => {
for &b in c.encode_utf8(&mut [0; 4]).as_bytes() {
HirKind::Look(Look::StartLF | Look::EndLF) => {
set.remove(b'\n');
}
HirKind::Look(Look::StartCRLF | Look::EndCRLF) => {
set.remove(b'\r');
set.remove(b'\n');
}
HirKind::Literal(hir::Literal(ref lit)) => {
for &b in lit.iter() {
set.remove(b);
}
}
HirKind::Literal(hir::Literal::Byte(b)) => {
set.remove(b);
}
HirKind::Class(hir::Class::Unicode(ref cls)) => {
for range in cls.iter() {
// This is presumably faster than encoding every codepoint
@@ -42,10 +55,10 @@ fn remove_matching_bytes(expr: &Hir, set: &mut ByteSet) {
}
}
HirKind::Repetition(ref x) => {
remove_matching_bytes(&x.hir, set);
remove_matching_bytes(&x.sub, set);
}
HirKind::Group(ref x) => {
remove_matching_bytes(&x.hir, set);
HirKind::Capture(ref x) => {
remove_matching_bytes(&x.sub, set);
}
HirKind::Concat(ref xs) => {
for x in xs {
@@ -62,17 +75,13 @@ fn remove_matching_bytes(expr: &Hir, set: &mut ByteSet) {
#[cfg(test)]
mod tests {
use grep_matcher::ByteSet;
use regex_syntax::ParserBuilder;
use {grep_matcher::ByteSet, regex_syntax::ParserBuilder};
use super::non_matching_bytes;
fn extract(pattern: &str) -> ByteSet {
let expr = ParserBuilder::new()
.allow_invalid_utf8(true)
.build()
.parse(pattern)
.unwrap();
let expr =
ParserBuilder::new().utf8(false).build().parse(pattern).unwrap();
non_matching_bytes(&expr)
}
@@ -131,9 +140,13 @@ mod tests {
#[test]
fn anchor() {
// FIXME: The first four tests below should correspond to a full set
// of bytes for the non-matching bytes, I think.
assert_eq!(sparse(&extract(r"^")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"$")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"\A")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"\z")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"(?m)^")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"(?m)$")), sparse_except(&[b'\n']));
}
}
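
The routine itself is now crate-private, but the core idea is easy to sketch: start with the full byte set and remove every byte the expression could match. A minimal free-standing approximation (not the crate's code) that handles only literals, concatenations and alternations, and conservatively gives up on everything else:

    use regex_syntax::hir::{Hir, HirKind};

    fn non_matching_bytes_sketch(expr: &Hir) -> [bool; 256] {
        let mut non_matching = [true; 256];
        remove_matching(expr, &mut non_matching);
        non_matching
    }

    fn remove_matching(expr: &Hir, set: &mut [bool; 256]) {
        match *expr.kind() {
            HirKind::Empty => {}
            HirKind::Literal(ref lit) => {
                for &b in lit.0.iter() {
                    set[usize::from(b)] = false;
                }
            }
            HirKind::Concat(ref xs) | HirKind::Alternation(ref xs) => {
                for x in xs {
                    remove_matching(x, set);
                }
            }
            // Anything more exotic could match any byte; give up.
            _ => *set = [false; 256],
        }
    }
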


@@ -1,5 +1,7 @@
use grep_matcher::LineTerminator;
use regex_syntax::hir::{self, Hir, HirKind};
use {
grep_matcher::LineTerminator,
regex_syntax::hir::{self, Hir, HirKind},
};
use crate::error::{Error, ErrorKind};
@@ -15,7 +17,26 @@ use crate::error::{Error, ErrorKind};
///
/// If the given line terminator is not ASCII, then this function returns an
/// error.
pub fn strip_from_match(
///
/// Note that as of regex 1.9, this routine could theoretically be implemented
/// without returning an error. Namely, for example, we could turn
/// `foo\nbar` into `foo[a&&b]bar`. That is, replace line terminators with a
/// sub-expression that can never match anything. Thus, ripgrep would accept
/// such regexes and just silently not match anything. Regex versions prior to 1.8
/// don't support such constructs. I ended up deciding to leave the existing
/// behavior of returning an error instead. For example:
///
/// ```text
/// $ echo -n 'foo\nbar\n' | rg 'foo\nbar'
/// the literal '"\n"' is not allowed in a regex
///
/// Consider enabling multiline mode with the --multiline flag (or -U for short).
/// When multiline mode is enabled, new line characters can be matched.
/// ```
///
/// This looks like a good error message to me, and even suggests a flag that
/// the user can use instead.
pub(crate) fn strip_from_match(
expr: Hir,
line_term: LineTerminator,
) -> Result<Hir, Error> {
@@ -23,40 +44,34 @@ pub fn strip_from_match(
let expr1 = strip_from_match_ascii(expr, b'\r')?;
strip_from_match_ascii(expr1, b'\n')
} else {
let b = line_term.as_byte();
if b > 0x7F {
return Err(Error::new(ErrorKind::InvalidLineTerminator(b)));
}
strip_from_match_ascii(expr, b)
strip_from_match_ascii(expr, line_term.as_byte())
}
}
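
A sketch of the class surgery described above: the line terminator is removed from a character class with `difference`, and an empty result is the case that `strip_from_match_ascii` rejects. Values are illustrative:

    use regex_syntax::hir::{ClassUnicode, ClassUnicodeRange};

    let remove = ClassUnicode::new([ClassUnicodeRange::new('\n', '\n')]);

    let mut cls = ClassUnicode::new([ClassUnicodeRange::new('a', 'b')]);
    cls.difference(&remove);
    assert!(!cls.ranges().is_empty()); // [a-b] never contained \n

    let mut nl = ClassUnicode::new([ClassUnicodeRange::new('\n', '\n')]);
    nl.difference(&remove);
    assert!(nl.ranges().is_empty()); // [\n] minus \n is empty: rejected
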
/// The implementation of strip_from_match. The given byte must be ASCII. This
/// function panics otherwise.
/// The implementation of strip_from_match. The given byte must be ASCII.
/// This function returns an error otherwise. It also returns an error if
/// it couldn't remove `\n` from the given regex without leaving an empty
/// character class in its place.
fn strip_from_match_ascii(expr: Hir, byte: u8) -> Result<Hir, Error> {
assert!(byte <= 0x7F);
let chr = byte as char;
assert_eq!(chr.len_utf8(), 1);
let invalid = || Err(Error::new(ErrorKind::NotAllowed(chr.to_string())));
if !byte.is_ascii() {
return Err(Error::new(ErrorKind::InvalidLineTerminator(byte)));
}
let ch = char::from(byte);
let invalid = || Err(Error::new(ErrorKind::NotAllowed(ch.to_string())));
Ok(match expr.into_kind() {
HirKind::Empty => Hir::empty(),
HirKind::Literal(hir::Literal::Unicode(c)) => {
if c == chr {
HirKind::Literal(hir::Literal(lit)) => {
if lit.iter().find(|&&b| b == byte).is_some() {
return invalid();
}
Hir::literal(hir::Literal::Unicode(c))
}
HirKind::Literal(hir::Literal::Byte(b)) => {
if b as char == chr {
return invalid();
}
Hir::literal(hir::Literal::Byte(b))
Hir::literal(lit)
}
HirKind::Class(hir::Class::Unicode(mut cls)) => {
if cls.ranges().is_empty() {
return Ok(Hir::class(hir::Class::Unicode(cls)));
}
let remove = hir::ClassUnicode::new(Some(
hir::ClassUnicodeRange::new(chr, chr),
hir::ClassUnicodeRange::new(ch, ch),
));
cls.difference(&remove);
if cls.ranges().is_empty() {
@@ -65,6 +80,9 @@ fn strip_from_match_ascii(expr: Hir, byte: u8) -> Result<Hir, Error> {
Hir::class(hir::Class::Unicode(cls))
}
HirKind::Class(hir::Class::Bytes(mut cls)) => {
if cls.ranges().is_empty() {
return Ok(Hir::class(hir::Class::Bytes(cls)));
}
let remove = hir::ClassBytes::new(Some(
hir::ClassBytesRange::new(byte, byte),
));
@@ -74,15 +92,14 @@ fn strip_from_match_ascii(expr: Hir, byte: u8) -> Result<Hir, Error> {
}
Hir::class(hir::Class::Bytes(cls))
}
HirKind::Anchor(x) => Hir::anchor(x),
HirKind::WordBoundary(x) => Hir::word_boundary(x),
HirKind::Look(x) => Hir::look(x),
HirKind::Repetition(mut x) => {
x.hir = Box::new(strip_from_match_ascii(*x.hir, byte)?);
x.sub = Box::new(strip_from_match_ascii(*x.sub, byte)?);
Hir::repetition(x)
}
HirKind::Group(mut x) => {
x.hir = Box::new(strip_from_match_ascii(*x.hir, byte)?);
Hir::group(x)
HirKind::Capture(mut x) => {
x.sub = Box::new(strip_from_match_ascii(*x.sub, byte)?);
Hir::capture(x)
}
HirKind::Concat(xs) => {
let xs = xs
@@ -131,11 +148,11 @@ mod tests {
#[test]
fn various() {
assert_eq!(roundtrip(r"[a\n]", b'\n'), "[a]");
assert_eq!(roundtrip(r"[a\n]", b'a'), "[\n]");
assert_eq!(roundtrip_crlf(r"[a\n]"), "[a]");
assert_eq!(roundtrip_crlf(r"[a\r]"), "[a]");
assert_eq!(roundtrip_crlf(r"[a\r\n]"), "[a]");
assert_eq!(roundtrip(r"[a\n]", b'\n'), "a");
assert_eq!(roundtrip(r"[a\n]", b'a'), "\n");
assert_eq!(roundtrip_crlf(r"[a\n]"), "a");
assert_eq!(roundtrip_crlf(r"[a\r]"), "a");
assert_eq!(roundtrip_crlf(r"[a\r\n]"), "a");
assert_eq!(roundtrip(r"(?-u)\s", b'a'), r"(?-u:[\x09-\x0D\x20])");
assert_eq!(roundtrip(r"(?-u)\s", b'\n'), r"(?-u:[\x09\x0B-\x0D\x20])");


@@ -1,29 +0,0 @@
/// Converts an arbitrary sequence of bytes to a literal suitable for building
/// a regular expression.
pub fn bytes_to_regex(bs: &[u8]) -> String {
use regex_syntax::is_meta_character;
use std::fmt::Write;
let mut s = String::with_capacity(bs.len());
for &b in bs {
if b <= 0x7F && !is_meta_character(b as char) {
write!(s, r"{}", b as char).unwrap();
} else {
write!(s, r"\x{:02x}", b).unwrap();
}
}
s
}
/// Converts arbitrary bytes to a nice string.
pub fn show_bytes(bs: &[u8]) -> String {
use std::ascii::escape_default;
use std::str;
let mut nice = String::new();
for &b in bs {
let part: Vec<u8> = escape_default(b).collect();
nice.push_str(str::from_utf8(&part).unwrap());
}
nice
}


@@ -1,39 +1,59 @@
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::Arc;
use std::{
collections::HashMap,
panic::{RefUnwindSafe, UnwindSafe},
sync::Arc,
};
use grep_matcher::{Match, Matcher, NoError};
use regex::bytes::{CaptureLocations, Regex};
use thread_local::ThreadLocal;
use {
grep_matcher::{Match, Matcher, NoError},
regex_automata::{
meta::Regex, util::captures::Captures, util::pool::Pool, Input,
PatternID,
},
};
use crate::config::ConfiguredHIR;
use crate::error::Error;
use crate::matcher::RegexCaptures;
use crate::{config::ConfiguredHIR, error::Error, matcher::RegexCaptures};
type PoolFn =
Box<dyn Fn() -> Captures + Send + Sync + UnwindSafe + RefUnwindSafe>;
/// A matcher for implementing "word match" semantics.
#[derive(Debug)]
pub struct WordMatcher {
pub(crate) struct WordMatcher {
/// The regex which is roughly `(?:^|\W)(<original pattern>)(?:$|\W)`.
regex: Regex,
/// The HIR that produced the regex above. We don't keep the HIR for the
/// `original` regex.
///
/// We put this in an `Arc` because by the time it gets here, it won't
/// change. And because cloning and dropping an `Hir` is somewhat expensive
/// due to its deep recursive representation.
chir: Arc<ConfiguredHIR>,
/// The original regex supplied by the user, which we use in a fast path
/// to try and detect matches before deferring to slower engines.
original: Regex,
/// A map from capture group name to capture group index.
names: HashMap<String, usize>,
/// A reusable buffer for finding the match location of the inner group.
locs: Arc<ThreadLocal<RefCell<CaptureLocations>>>,
/// A thread-safe pool of reusable buffers for finding the match offset of
/// the inner group.
caps: Arc<Pool<Captures, PoolFn>>,
}
impl Clone for WordMatcher {
fn clone(&self) -> WordMatcher {
// We implement Clone manually so that we get a fresh ThreadLocal such
// that it can set its own thread owner. This permits each thread
// using `locs` to hit the fast path.
// We implement Clone manually so that we get a fresh Pool such that it
// can set its own thread owner. This permits each thread using `caps`
// to hit the fast path.
//
// Note that cloning a regex is "cheap" since it uses reference
// counting internally.
let re = self.regex.clone();
WordMatcher {
regex: self.regex.clone(),
chir: Arc::clone(&self.chir),
original: self.original.clone(),
names: self.names.clone(),
locs: Arc::new(ThreadLocal::new()),
caps: Arc::new(Pool::new(Box::new(move || re.create_captures()))),
}
}
}
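
A minimal sketch of the `Pool` pattern this sets up, using regex-automata's `util::pool::Pool` to hand each thread a reusable `Captures` buffer; the pattern is illustrative:

    use std::panic::{RefUnwindSafe, UnwindSafe};

    use regex_automata::{
        meta::Regex,
        util::{captures::Captures, pool::Pool},
        Input,
    };

    type PoolFn =
        Box<dyn Fn() -> Captures + Send + Sync + UnwindSafe + RefUnwindSafe>;

    let re = Regex::new(r"\w+").unwrap();
    let pool: Pool<Captures, PoolFn> = Pool::new({
        let re = re.clone();
        Box::new(move || re.create_captures())
    });
    // `get` returns a guard that derefs to `Captures` and puts the value
    // back in the pool when dropped.
    let mut caps = pool.get();
    re.search_captures(&Input::new("hello"), &mut caps);
    assert!(caps.is_match());
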
@@ -44,31 +64,38 @@ impl WordMatcher {
///
/// The given options are used to construct the regular expression
/// internally.
pub fn new(expr: &ConfiguredHIR) -> Result<WordMatcher, Error> {
let original =
expr.with_pattern(|pat| format!("^(?:{})$", pat))?.regex()?;
let word_expr = expr.with_pattern(|pat| {
let pat = format!(r"(?:(?m:^)|\W)({})(?:\W|(?m:$))", pat);
log::debug!("word regex: {:?}", pat);
pat
})?;
let regex = word_expr.regex()?;
let locs = Arc::new(ThreadLocal::new());
pub(crate) fn new(chir: ConfiguredHIR) -> Result<WordMatcher, Error> {
let original = chir.clone().into_anchored().to_regex()?;
let chir = Arc::new(chir.into_word()?);
let regex = chir.to_regex()?;
let caps = Arc::new(Pool::new({
let regex = regex.clone();
Box::new(move || regex.create_captures()) as PoolFn
}));
let mut names = HashMap::new();
for (i, optional_name) in regex.capture_names().enumerate() {
let it = regex.group_info().pattern_names(PatternID::ZERO);
for (i, optional_name) in it.enumerate() {
if let Some(name) = optional_name {
names.insert(name.to_string(), i.checked_sub(1).unwrap());
}
}
Ok(WordMatcher { regex, original, names, locs })
Ok(WordMatcher { regex, chir, original, names, caps })
}
/// Return the underlying regex used by this matcher.
pub fn regex(&self) -> &Regex {
/// Return the underlying regex used to match at word boundaries.
///
/// The original regex is in the capture group at index 1.
pub(crate) fn regex(&self) -> &Regex {
&self.regex
}
/// Return the underlying HIR for the regex used to match at word
/// boundaries.
pub(crate) fn chir(&self) -> &ConfiguredHIR {
&self.chir
}
/// Attempt to do a fast confirmation of a word match that covers a subset
/// (but hopefully a big subset) of most cases. Ok(Some(..)) is returned
/// when a match is found. Ok(None) is returned when there is definitively
@@ -79,12 +106,11 @@ impl WordMatcher {
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, ()> {
// This is a bit hairy. The whole point here is to avoid running an
// NFA simulation in the regex engine. Remember, our word regex looks
// like this:
// This is a bit hairy. The whole point here is to avoid running a
// slower regex engine to extract capture groups. Remember, our word
// regex looks like this:
//
// (^|\W)(<original regex>)($|\W)
// where ^ and $ have multiline mode DISABLED
// (^|\W)(<original regex>)(\W|$)
//
// What we want are the match offsets of <original regex>. So in the
// easy/common case, the original regex will be sandwiched between
@@ -102,7 +128,8 @@ impl WordMatcher {
// The reason why we cannot handle the ^/$ cases here is because we
// can't assume anything about the original pattern. (Try commenting
// out the checks for ^/$ below and run the tests to see examples.)
let mut cand = match self.regex.find_at(haystack, at) {
let input = Input::new(haystack).span(at..haystack.len());
let mut cand = match self.regex.find(input) {
None => return Ok(None),
Some(m) => Match::new(m.start(), m.end()),
};
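
When the fast path gives up, the match offsets of the original pattern are recovered from capture group 1 of the wrapping regex. A sketch with an illustrative pattern and haystack:

    use regex_automata::{meta::Regex, Input};

    // Roughly the shape of the word regex described above.
    let re = Regex::new(r"(?:(?m:^)|\W)(ferris)(?:\W|(?m:$))").unwrap();
    let hay = "a ferris b";
    let mut caps = re.create_captures();
    re.search_captures(&Input::new(hay), &mut caps);
    // Group 0 includes the \W padding; group 1 is the user's pattern.
    let sp = caps.get_group(1).unwrap();
    assert_eq!("ferris", &hay[sp.start..sp.end]);
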
@@ -145,23 +172,23 @@ impl Matcher for WordMatcher {
//
// OK, well, it turns out that it is worth it! But it is quite tricky.
// See `fast_find` for details. Effectively, this lets us skip running
// the NFA simulation in the regex engine in the vast majority of
// cases. However, the NFA simulation is required for full correctness.
// a slower regex engine to extract capture groups in the vast majority
// of cases. However, I believe the slower engine is required for full
// correctness.
match self.fast_find(haystack, at) {
Ok(Some(m)) => return Ok(Some(m)),
Ok(None) => return Ok(None),
Err(()) => {}
}
let cell =
self.locs.get_or(|| RefCell::new(self.regex.capture_locations()));
let mut caps = cell.borrow_mut();
self.regex.captures_read_at(&mut caps, haystack, at);
Ok(caps.get(1).map(|m| Match::new(m.0, m.1)))
let input = Input::new(haystack).span(at..haystack.len());
let mut caps = self.caps.get();
self.regex.search_captures(&input, &mut caps);
Ok(caps.get_group(1).map(|sp| Match::new(sp.start, sp.end)))
}
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
Ok(RegexCaptures::with_offset(self.regex.capture_locations(), 1))
Ok(RegexCaptures::with_offset(self.regex.create_captures(), 1))
}
fn capture_count(&self) -> usize {
@@ -178,9 +205,10 @@ impl Matcher for WordMatcher {
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool, NoError> {
let r =
self.regex.captures_read_at(caps.locations_mut(), haystack, at);
Ok(r.is_some())
let input = Input::new(haystack).span(at..haystack.len());
let caps = caps.captures_mut();
self.regex.search_captures(&input, caps);
Ok(caps.is_match())
}
// We specifically do not implement other methods like find_iter or
@@ -195,8 +223,8 @@ mod tests {
use grep_matcher::{Captures, Match, Matcher};
fn matcher(pattern: &str) -> WordMatcher {
let chir = Config::default().hir(pattern).unwrap();
WordMatcher::new(&chir).unwrap()
let chir = Config::default().build_many(&[pattern]).unwrap();
WordMatcher::new(chir).unwrap()
}
fn find(pattern: &str, haystack: &str) -> Option<(usize, usize)> {


@@ -1,6 +1,6 @@
[package]
name = "grep-searcher"
version = "0.1.10" #:version
version = "0.1.11" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Fast line oriented regex searching as a library.
@@ -14,16 +14,16 @@ license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
bstr = { version = "1.1.0", default-features = false, features = ["std"] }
bstr = { version = "1.6.0", default-features = false, features = ["std"] }
bytecount = "0.6"
encoding_rs = "0.8.14"
encoding_rs_io = "0.1.6"
grep-matcher = { version = "0.1.5", path = "../matcher" }
grep-matcher = { version = "0.1.6", path = "../matcher" }
log = "0.4.5"
memmap = { package = "memmap2", version = "0.5.3" }
[dev-dependencies]
grep-regex = { version = "0.1.10", path = "../regex" }
grep-regex = { version = "0.1.11", path = "../regex" }
regex = "1.1"
[features]


@@ -10,6 +10,12 @@ use crate::sink::{
};
use grep_matcher::{LineMatchKind, Matcher};
enum FastMatchResult {
Continue,
Stop,
SwitchToSlow,
}
#[derive(Debug)]
pub struct Core<'s, M: 's, S> {
config: &'s Config,
@@ -25,6 +31,7 @@ pub struct Core<'s, M: 's, S> {
last_line_visited: usize,
after_context_left: usize,
has_sunk: bool,
has_matched: bool,
}
impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
@@ -50,6 +57,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
last_line_visited: 0,
after_context_left: 0,
has_sunk: false,
has_matched: false,
};
if !core.searcher.multi_line_with_matcher(&core.matcher) {
if core.is_line_by_line_fast() {
@@ -109,7 +117,11 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
pub fn match_by_line(&mut self, buf: &[u8]) -> Result<bool, S::Error> {
if self.is_line_by_line_fast() {
self.match_by_line_fast(buf)
match self.match_by_line_fast(buf)? {
FastMatchResult::SwitchToSlow => self.match_by_line_slow(buf),
FastMatchResult::Continue => Ok(true),
FastMatchResult::Stop => Ok(false),
}
} else {
self.match_by_line_slow(buf)
}
@@ -270,7 +282,9 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
}
};
self.set_pos(line.end());
if matched != self.config.invert_match {
let success = matched != self.config.invert_match;
if success {
self.has_matched = true;
if !self.before_context_by_line(buf, line.start())? {
return Ok(false);
}
@@ -286,40 +300,51 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
return Ok(false);
}
}
if self.config.stop_on_nonmatch && !success && self.has_matched {
return Ok(false);
}
}
Ok(true)
}
fn match_by_line_fast(&mut self, buf: &[u8]) -> Result<bool, S::Error> {
debug_assert!(!self.config.passthru);
fn match_by_line_fast(
&mut self,
buf: &[u8],
) -> Result<FastMatchResult, S::Error> {
use FastMatchResult::*;
debug_assert!(!self.config.passthru);
while !buf[self.pos()..].is_empty() {
if self.config.stop_on_nonmatch && self.has_matched {
return Ok(SwitchToSlow);
}
if self.config.invert_match {
if !self.match_by_line_fast_invert(buf)? {
return Ok(false);
return Ok(Stop);
}
} else if let Some(line) = self.find_by_line_fast(buf)? {
self.has_matched = true;
if self.config.max_context() > 0 {
if !self.after_context_by_line(buf, line.start())? {
return Ok(false);
return Ok(Stop);
}
if !self.before_context_by_line(buf, line.start())? {
return Ok(false);
return Ok(Stop);
}
}
self.set_pos(line.end());
if !self.sink_matched(buf, &line)? {
return Ok(false);
return Ok(Stop);
}
} else {
break;
}
}
if !self.after_context_by_line(buf, buf.len())? {
return Ok(false);
return Ok(Stop);
}
self.set_pos(buf.len());
Ok(true)
Ok(Continue)
}
#[inline(always)]
@@ -344,6 +369,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
if invert_match.is_empty() {
return Ok(true);
}
self.has_matched = true;
if !self.after_context_by_line(buf, invert_match.start())? {
return Ok(false);
}
@@ -577,6 +603,9 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
if self.config.passthru {
return false;
}
if self.config.stop_on_nonmatch && self.has_matched {
return false;
}
if let Some(line_term) = self.matcher.line_terminator() {
if line_term == self.config.line_term {
return true;


@@ -71,16 +71,6 @@ impl MmapChoice {
if !self.is_enabled() {
return None;
}
if !cfg!(target_pointer_width = "64") {
// For 32-bit systems, it looks like mmap will succeed even if it
// can't address the entire file. This seems to happen at least on
// Windows, even though it used to work prior to ripgrep 13. The
// only Windows-related change in ripgrep 13, AFAIK, was statically
// linking vcruntime. So maybe that's related? But I'm not sure.
//
// See: https://github.com/BurntSushi/ripgrep/issues/1911
return None;
}
if cfg!(target_os = "macos") {
// I guess memory maps on macOS aren't great. Should re-evaluate.
return None;


@@ -173,6 +173,9 @@ pub struct Config {
encoding: Option<Encoding>,
/// Whether to do automatic transcoding based on a BOM or not.
bom_sniffing: bool,
/// Whether to stop searching when a non-matching line is found after a
/// matching line.
stop_on_nonmatch: bool,
}
impl Default for Config {
@@ -190,6 +193,7 @@ impl Default for Config {
multi_line: false,
encoding: None,
bom_sniffing: true,
stop_on_nonmatch: false,
}
}
}
@@ -555,6 +559,19 @@ impl SearcherBuilder {
self.config.bom_sniffing = yes;
self
}
/// Stop searching a file when a non-matching line is found after a
/// matching line.
///
/// This is useful for searching sorted files where it is expected that all
/// the matches will be on adjacent lines.
pub fn stop_on_nonmatch(
&mut self,
stop_on_nonmatch: bool,
) -> &mut SearcherBuilder {
self.config.stop_on_nonmatch = stop_on_nonmatch;
self
}
}
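
A sketch of how a caller might use the new option, mirroring the `--stop-on-nonmatch` integration test added below:

    use grep_regex::RegexMatcher;
    use grep_searcher::{sinks::UTF8, SearcherBuilder};

    let matcher = RegexMatcher::new(r"[235]").unwrap();
    let mut searcher = SearcherBuilder::new().stop_on_nonmatch(true).build();
    let mut lines = vec![];
    searcher
        .search_slice(
            &matcher,
            b"line1\nline2\nline3\nline4\nline5",
            UTF8(|lnum, line| {
                lines.push((lnum, line.to_string()));
                Ok(true)
            }),
        )
        .unwrap();
    // line2 and line3 match; the search stops at the non-matching line4,
    // so the matching line5 is never reported.
    assert_eq!(vec![(2, "line2\n".to_string()), (3, "line3\n".to_string())], lines);
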
/// A searcher executes searches over a haystack and writes results to a caller
@@ -838,6 +855,13 @@ impl Searcher {
self.config.multi_line
}
/// Returns true if and only if this searcher is configured to stop when it
/// finds a non-matching line after a matching one.
#[inline]
pub fn stop_on_nonmatch(&self) -> bool {
self.config.stop_on_nonmatch
}
/// Returns true if and only if this searcher will choose a multi-line
/// strategy given the provided matcher.
///


@@ -232,6 +232,16 @@ would behave identically to the following command
rg --glob '!.git' foo
The bottom line is that every shell argument needs to be on its own line. So
for example, a config file containing
-j 4
is probably not doing what you intend. Instead, you want
-j
4
ripgrep also provides a flag, *--no-config*, that when present will suppress
any and all support for configuration. This includes any future support
for auto-loading configuration files from pre-determined paths.

pkg/windows/Manifest.xml Normal file

@@ -0,0 +1,28 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!--
This is a Windows application manifest file.
See: https://docs.microsoft.com/en-us/windows/win32/sbscs/application-manifests
-->
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0" xmlns:asmv3="urn:schemas-microsoft-com:asm.v3">
<!-- Versions rustc supports as compiler hosts -->
<compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
<application>
<!-- Windows 7 --><supportedOS Id="{35138b9a-5d96-4fbd-8e2d-a2440225f93a}"/>
<!-- Windows 8 --><supportedOS Id="{4a2f28e3-53b9-4441-ba9c-d69d4a4a6e38}"/>
<!-- Windows 8.1 --><supportedOS Id="{1f676c76-80e1-4239-95bb-83d0f6d0da78}"/>
<!-- Windows 10 and 11 --><supportedOS Id="{8e0f7a12-bfb3-4fe8-b9a5-48fd50a15a9a}"/>
</application>
</compatibility>
<!-- Use UTF-8 code page -->
<asmv3:application>
<asmv3:windowsSettings xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">
<activeCodePage>UTF-8</activeCodePage>
</asmv3:windowsSettings>
</asmv3:application>
<!-- Remove (most) legacy path limits -->
<asmv3:application>
<asmv3:windowsSettings xmlns:ws2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
<ws2:longPathAware>true</ws2:longPathAware>
</asmv3:windowsSettings>
</asmv3:application>
</assembly>

pkg/windows/README.md Normal file

@@ -0,0 +1,15 @@
This directory contains a Windows manifest for various Windows-specific
settings.
The main thing we enable here is [`longPathAware`], which permits paths of the
form `C:\` to be longer than 260 characters.
The approach taken here was modeled off of a [similar change for `rustc`][rustc pr].
In particular, this manifest gets linked into the final binary. Those linker
arguments are applied in `build.rs`.
This currently only applies to MSVC builds. If there's an easy way to make this
apply to GNU builds as well, then patches are welcome.
[`longPathAware`]: https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#longpathaware
[rustc pr]: https://github.com/rust-lang/rust/pull/96737


@@ -787,6 +787,28 @@ rgtest!(f1466_no_ignore_files, |dir: Dir, mut cmd: TestCommand| {
eqnice!("foo\n", cmd.arg("-u").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/pull/2361
rgtest!(f2361_sort_nested_files, |dir: Dir, mut cmd: TestCommand| {
use std::{thread::sleep, time::Duration};
dir.create("foo", "1");
sleep(Duration::from_millis(100));
dir.create_dir("dir");
sleep(Duration::from_millis(100));
dir.create(dir.path().join("dir").join("bar"), "1");
cmd.arg("--sort").arg("accessed").arg("--files");
eqnice!("foo\ndir/bar\n", cmd.stdout());
dir.create("foo", "2");
sleep(Duration::from_millis(100));
dir.create(dir.path().join("dir").join("bar"), "2");
sleep(Duration::from_millis(100));
cmd.arg("--sort").arg("accessed").arg("--files");
eqnice!("foo\ndir/bar\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/1404
rgtest!(f1404_nothing_searched_warning, |dir: Dir, mut cmd: TestCommand| {
dir.create(".ignore", "ignored-dir/**");
@@ -921,6 +943,23 @@ rgtest!(f1842_field_match_separator, |dir: Dir, _: TestCommand| {
eqnice!(expected, dir.command().args(&args).stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/2288
rgtest!(f2288_context_partial_override, |dir: Dir, mut cmd: TestCommand| {
dir.create("test", "1\n2\n3\n4\n5\n6\n7\n8\n9\n");
cmd.args(&["-C1", "-A2", "5", "test"]);
eqnice!("4\n5\n6\n7\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/2288
rgtest!(
f2288_context_partial_override_rev,
|dir: Dir, mut cmd: TestCommand| {
dir.create("test", "1\n2\n3\n4\n5\n6\n7\n8\n9\n");
cmd.args(&["-A2", "-C1", "5", "test"]);
eqnice!("4\n5\n6\n7\n", cmd.stdout());
}
);
rgtest!(no_context_sep, |dir: Dir, mut cmd: TestCommand| {
dir.create("test", "foo\nctx\nbar\nctx\nfoo\nctx");
cmd.args(&["-A1", "--no-context-separator", "foo", "test"]);
@@ -975,3 +1014,10 @@ rgtest!(no_unicode, |dir: Dir, mut cmd: TestCommand| {
dir.create("test", "δ");
cmd.arg("-i").arg("--no-unicode").arg("Δ").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/1790
rgtest!(stop_on_nonmatch, |dir: Dir, mut cmd: TestCommand| {
dir.create("test", "line1\nline2\nline3\nline4\nline5");
cmd.args(&["--stop-on-nonmatch", "[235]"]);
eqnice!("test:line2\ntest:line3\n", cmd.stdout());
});


@@ -1065,3 +1065,48 @@ rgtest!(type_list, |_: Dir, mut cmd: TestCommand| {
// This can change over time, so just make sure we print something.
assert!(!cmd.stdout().is_empty());
});
// The following series of tests seeks to test all permutations of ripgrep's
// sorted queries.
//
// They all rely on this setup function, which sets up this particular file
// structure with a particular creation order:
// ├── a # 1
// ├── b # 4
// └── dir # 2
// ├── c # 3
// └── d # 5
//
// This order is important when sorting them by system time-stamps.
fn sort_setup(dir: Dir) {
use std::{thread::sleep, time::Duration};
let sub_dir = dir.path().join("dir");
dir.create("a", "test");
sleep(Duration::from_millis(100));
dir.create_dir(&sub_dir);
sleep(Duration::from_millis(100));
dir.create(sub_dir.join("c"), "test");
sleep(Duration::from_millis(100));
dir.create("b", "test");
sleep(Duration::from_millis(100));
dir.create(sub_dir.join("d"), "test");
}
rgtest!(sort_files, |dir: Dir, mut cmd: TestCommand| {
sort_setup(dir);
let expected = "a:test\nb:test\ndir/c:test\ndir/d:test\n";
eqnice!(expected, cmd.args(["--sort", "path", "test"]).stdout());
});
rgtest!(sort_accessed, |dir: Dir, mut cmd: TestCommand| {
sort_setup(dir);
let expected = "a:test\ndir/c:test\nb:test\ndir/d:test\n";
eqnice!(expected, cmd.args(["--sort", "accessed", "test"]).stdout());
});
rgtest!(sortr_accessed, |dir: Dir, mut cmd: TestCommand| {
sort_setup(dir);
let expected = "dir/d:test\nb:test\ndir/c:test\na:test\n";
eqnice!(expected, cmd.args(["--sortr", "accessed", "test"]).stdout());
});


@@ -1090,6 +1090,19 @@ b=one
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/2198
rgtest!(r2198, |dir: Dir, mut cmd: TestCommand| {
dir.create(".ignore", "a");
dir.create(".rgignore", "b");
dir.create("a", "");
dir.create("b", "");
dir.create("c", "");
cmd.arg("--files").arg("--sort").arg("path");
eqnice!("c\n", cmd.stdout());
eqnice!("a\nb\nc\n", cmd.arg("--no-ignore-dot").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/2208
rgtest!(r2208, |dir: Dir, mut cmd: TestCommand| {
dir.create("test", "# Compile requirements.txt files from all found or specified requirements.in files (compile).
@@ -1126,3 +1139,37 @@ rgtest!(r2236, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo/bar", "test\n");
cmd.args(&["test"]).assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/2480
rgtest!(r2480, |dir: Dir, mut cmd: TestCommand| {
dir.create("file", "FooBar\n");
// no regression in empty pattern behavior
cmd.args(&["-e", "", "file"]);
eqnice!("FooBar\n", cmd.stdout());
// no regression in single pattern behavior
let mut cmd = dir.command();
cmd.args(&["-e", ")(", "file"]);
eqnice!("FooBar\n", cmd.stdout());
// no regression in multiple patterns behavior
let mut cmd = dir.command();
cmd.args(&["--only-matching", "-e", "Foo", "-e", "Bar", "file"]);
eqnice!("Foo\nBar\n", cmd.stdout());
// no regression in capture groups behavior
let mut cmd = dir.command();
cmd.args(&["-e", "Fo(oB)a(r)", "--replace", "${0}_${1}_${2}${3}", "file"]);
eqnice!("FooBar_oB_r\n", cmd.stdout()); // note: ${3} expected to be empty
// flag does not leak into next pattern on match
let mut cmd = dir.command();
cmd.args(&["--only-matching", "-e", "(?i)foo", "-e", "bar", "file"]);
eqnice!("Foo\n", cmd.stdout());
// flag does not leak into next pattern on mismatch
let mut cmd = dir.command();
cmd.args(&["--only-matching", "-e", "(?i)notfoo", "-e", "bar", "file"]);
cmd.assert_err();
});