Compare commits

...

165 Commits

Author SHA1 Message Date
Andrew Gallant
33b81cac48 grep-searcher-0.1.10 2022-07-15 10:05:46 -04:00
Andrew Gallant
6a13a4f64d searcher: remove stray 'dbg!' 2022-07-15 10:05:20 -04:00
Andrew Gallant
b13d835d95 grep-0.2.9 2022-07-15 10:03:06 -04:00
Andrew Gallant
d53506b7f7 grep: bump 'grep-regex' and 'grep-searcher'
To 0.1.10 and 0.1.9, respectively.
2022-07-15 10:02:41 -04:00
Andrew Gallant
78a35d4d43 grep-searcher-0.1.9 2022-07-15 10:02:24 -04:00
Andrew Gallant
a933d0bc90 searcher: bump grep-regex dep to 0.1.10 2022-07-15 10:02:06 -04:00
Andrew Gallant
2cae30e399 grep-regex-0.1.10 2022-07-15 10:01:42 -04:00
Andrew Gallant
8e57989cd2 regex: fix matching bug when text anchors are used
It turns out that if there are text anchors (that is, \A or \z, or ^/$
when multi-line is disabled), then the "fast" line searching path isn't
quite correct. Since searching without multi-line mode is exceptionally
rare, we just look for the presence of text anchors and specifically
disable the line terminator option in 'grep-regex'. This in turn
inhibits the "fast" line searching path.

Fixes #2260
2022-07-15 09:53:39 -04:00
Andrew Gallant
b9f5835534 ci: switch to dtolnay/rust-toolchain
The actions-rs/toolchain project appears dead. dtolnay's also seems more
sustainable given its simplicity, but it does enough to suit our needs.
2022-07-14 13:48:14 -04:00
tleb
e70778e89d ignore/types: add dts to default types
See: https://devicetree-specification.readthedocs.io/en/v0.3/source-language.html

PR #2255
2022-07-07 12:24:12 -04:00
zhimoe
87c4a2b4b1 doc: fix typo
PR #2248
2022-06-26 18:49:54 -04:00
Kian-Meng Ang
0aa31676e3 doc: fix typos
PR #2245
2022-06-24 09:58:20 -04:00
Andrew Gallant
9f0e88bcb1 ignore: fix gitignore parsing bug for trailing \/
When a glob pattern ended with a \/, and since we permit backslash
escapes, the glob parser gave a "dangling escape" error. Which is weird,
because the \ is clearly not dangling.

The issue is that the layer above the glob parser, the gitignore parser,
was stripping the trailing / so that it wouldn't be part of the matching
logic. Of course, stripping the trailing / while it is escaped without
removing the backslash escape is wrong. So we do that here.

Fixes #2236
2022-06-14 10:40:37 -04:00
Alex Touchet
eb4b389846 globset/readme: update version number and some links
PR #2232
2022-06-11 14:17:32 -04:00
Andrew Gallant
dc337bab0a deps: update to globset 0.4.9 2022-06-10 14:11:20 -04:00
Andrew Gallant
2cfb338530 globset-0.4.9 2022-06-10 14:10:34 -04:00
Sergio Benitez
48646e3451 globset: make 'log' an optional feature
PR #1910
2022-06-10 14:10:09 -04:00
Andrew Gallant
985394a19e deps: update to packed_simd_2 0.3.8
It broke on latest nightly. I'm *very* close to just removing the
'simd-accel' feature altogether.

Fixes #2230
2022-06-10 09:39:17 -04:00
jgart
ec36f8c3ff ignore/types: add pants
See: https://www.pantsbuild.org/

PR #2228
2022-06-08 13:29:17 -04:00
jpe90
a726d03641 ignore/types: add hare to default types
PR #2219
2022-05-22 20:08:45 -04:00
Andrew Gallant
91afd4214a printer: fix duplicative replacement in multiline mode
This furthers our kludge of dealing with PCRE2's look-around in the
printer. Because of our bad abstraction boundaries, we added a kludge to
deal with PCRE2 look-around by extending the bytes we search by a fixed
amount to hopefully permit any look-around to operate. But because of
that kludge, we wind up over extending ourselves in some cases and
dragging along those extra bytes.

We had fixed this for simple searching by simply rejecting any matches
past the end point. But we didn't do the same for replacements. So this
commit extends our kludge to replacements.

Thanks to @sonohgong for diagnosing the problem and proposing a fix. I
mostly went with their solution, but adding the new replacement routine
as an internal helper rather than a new APIn in the 'grep-matcher'
crate.

Fixes #2095, Fixes #2208
2022-05-11 14:44:58 -04:00
Keith Smiley
4dc6c73c5a ignore/types: improve Bazel globs
MODULE.bazel is a new file, and WORKSPACE.bazel was always supported
similar to BUILD.bazel vs BUILD.

PR #2203
2022-05-09 11:50:34 -04:00
Alex Touchet
36d03b4101 cargo: use SPDX license format for all crates
This was done for the main crate in d11a3b3377.

See also #987.

PR #2204
2022-05-09 07:52:11 -04:00
Conrad Meyer
d161acb0a3 ignore/types: add '*.hh' to C++ headers
Like .hpp, .hh is an occasionally used extension for C++ headers
(to distinguish them from C headers). At least one popular project,
FreeBSD, uses this extension.

See also: https://docs.fileformat.com/programming/hh/

PR #2192
2022-04-25 07:38:03 -04:00
Matrix Dai
30ee6f08ee ignore/types: add '*.asp' for asp type
The `*.asp` was not included in the type "asp" when it was added.
https://github.com/BurntSushi/ripgrep/pull/1134

PR #2188
2022-04-19 10:36:14 -04:00
Andrew Gallant
ced5b92aa9 deps: bump memmap2 to 0.5
Looking at the memmap2 CHANGELOG, there don't appear to be any breaking
changes that impact us.
2022-03-21 08:59:05 -04:00
Andrew Gallant
191315a2ea deps: update everything
Surprisingly looks like no new dependencies were added! Yay! And we
removed an extra copy of 'cfg-if' due to what appears to be an updated
in 'packed_simd_2'.

Otherwise, all updates appear to be minor things.
2022-03-21 08:59:05 -04:00
Andrew Gallant
5370064f00 warnings: remove/tweak some dead code
It looks like the dead code detector got better, so do a little code
cleanup.
2022-03-21 08:59:05 -04:00
arcsi42
b6189c659e ci: fix failing nightly-arm build on ci workflow
This commit updates the Ubuntu install script to include brotli and
zstd, which are needed for tests.

We also fix the Ubuntu install script to work in environments that
don't have 'sudo'. Instead of creating a totally separate script, we
preserve a single point of truth for these things and just make the
script a bit more flexible.

NOT seen in this commit is that we have built and updated the arm Docker
image. I'm hoping this fixes the GLIBC version issues we're seeing in
CI.

Fixes #2130, Closes #2132
2022-03-21 08:59:05 -04:00
Mateusz Konieczny
0b36942f68 doc: transcoding is done in addition to search
Even if transcoding would be faster than search it would still incur
performance penalty. We make this clearer by tweaking the wording.

PR #2079
2021-11-22 09:48:42 -05:00
mi-wada
7e05cde008 cli: improve configuration failure mode
This improves the error message printed when ripgrep can't read the
file path pointed to by RIPGREP_CONFIG_PATH. Specifically, before this
change:

    $ RIPGREP_CONFIG_PATH=no_exist_path rg 'search regex'
    no_exist_path: No such file or directory (os error 2)

And now after this change:

    $ RIPGREP_CONFIG_PATH=no_exist_path rg 'search regex'
    failed to read the file specified in RIPGREP_CONFIG_PATH: no_exist_path: No such file or directory (os error 2)

In the above examples, the first failure mode looks obvious, but that's
only because RIPGREP_CONFIG_PATH is being set at the same time that we
run the command. Often, the environment variable is set elsewhere and
the error message could be confusing outside of that context.

Closes #1990
2021-11-15 10:29:34 -05:00
jgart
418d048b27 ignore/types: add fennel
https://fennel-lang.org/

PR #2069
2021-11-15 09:58:09 -05:00
Josh Triplett
009dda1488 ignore: if require_git is false, don't stat .git
I've confirmed via strace that this eliminates a pile of stat calls.

PR #2052
2021-11-12 08:37:05 -05:00
Linda_pp
ba535fb5a3 ignore/types: improve 'vim' and 'vimscript' types
This adds various Vim config files to the glob patterns.

PR #2044
2021-10-27 10:59:44 -04:00
jgart
427aaeeb2e ignore/types: add lilypond
This adds file detection for lilypond: https://lilypond.org/

PR #2038
2021-10-24 11:22:07 -04:00
jgart
f5cff746bc ignore/types: add hy
This adds file detection for hy: http://hylang.org/

PR #2033
2021-10-22 08:16:48 -04:00
Philip Munksgaard
457f53b7ee ignore/types: fix futhark type extension
Previously, the 'fut' type only matches files called '.fut', while in
reality we want to match all files with the '.fut' extension. This
commit fixes that issue.

PR #2027
2021-10-19 09:15:19 -04:00
jgart
eb35f7978e ignore/types: add janet
This adds file detection for janet:
https://janet-lang.org/

PR #2018
2021-10-14 07:56:55 -04:00
Markus Dosch
fc69bd366c readme: update install commands for Debian/Ubuntu
This got overlooked during the last release.

PR #2016
2021-10-12 11:08:14 -04:00
Dash
9b01a8f9ae doc: add -F/--fixed-strings to "common options"
#607 is the top result for the search "ripgrep disable regex". I think
it makes sense to add it to the user guide, since it's a very useful
flag.

PR #1945
2021-07-21 20:52:25 -04:00
Andrew Gallant
0ff5dd2360 doc: --field-match-separator's default value is ':'
The docs were out of sync with the implementation. Likely a
copy-and-paste error.

Fixes #1939
2021-07-19 08:07:40 -04:00
Joe Lencioni
3c7819301b doc: fix typo "used" -> "use"
PR #1936
2021-07-14 10:12:30 -04:00
jgart
699e651db2 ignore/types: add texinfo
https://www.gnu.org/software/texinfo/

PR #1934
2021-07-13 07:59:23 -04:00
Eyal
9eddb71b8e ignore/types: add CUDA
Fixes #1918
2021-06-30 09:50:53 -04:00
Andrew Gallant
abf115228e changelog: add #1911 bug fix 2021-06-26 12:57:11 -04:00
Andrew Gallant
fdfc418be5 searcher: disable mmap searching on non-64 bit
It looks like it's possible for mmap to succeed on 32-bit systems even
when the full file can't be addressed in memory. This used to work prior
to ripgrep 13, but (maybe) something about statically linking vcruntime
has caused this to now fail.

It's no big deal to disable mmap searching on 32-bit, so we just do that
instead of returning incorrect results.

Fixes #1911
2021-06-26 12:53:59 -04:00
Sergio Benitez
5bf74362b9 doc: fix typo in --glob flag docs
PR #1899
2021-06-24 08:09:00 -04:00
Kostya M
431ea38620 ignore/types: add file extensions for Crystal
It sounds like Projectfile is no longer being used,
but we should keep it around in case folks are
still using it. It's unlikely that its presence will
do much if any harm.

PR #1904
2021-06-20 08:24:41 -04:00
Andrew Gallant
caba5c4348 globset-0.4.8 2021-06-18 13:30:32 -04:00
Gleb Pomykalov
07f97d42cf globset: fix compilation when serde is enabled
PR #1903
2021-06-18 13:30:47 -04:00
kotborealis
e33d6e73f5 doc: fix formatting of nested list
Markdown wants 4 spaces, not 2.

PR #1894
2021-06-15 10:35:16 -04:00
Andrew Gallant
478da4f271 pkg: fix version number for 13.0.0 release
Fixes #1896
2021-06-15 10:30:01 -04:00
Andrew Gallant
7ce66f73cf regex: update regression test
Sadly, PCRE2 has different behavior (but doesn't panic). We should look
into that, but for now, this is good enough.

Also, update the CHANGELOG.

Ref #1891
2021-06-12 16:22:30 -04:00
Andrew Gallant
bc76a30c23 regex: fix -w when regex can match empty string
This is a weird bug where our optimization for handling -w more quickly
than we would otherwise failed. In particular, if the original regex can
match the empty string, then our word boundary detection would produce
invalid indices to the start the next search at. We "fix" it by simply
bailing when the indices are known to be incorrect.

This wasn't a problem in a previous release since ripgrep 13 tweaked how
word boundaries are detected in commit efd9cfb2.

Fixes #1891
2021-06-12 14:18:53 -04:00
Andrew Gallant
5e81c60b35 ci: use musl to build debian artifact
Previously, I was trying to be a good citizen and let ripgrep use the
system libc. But it turns out that building ripgrep on Arch with a newer
version of glibc than what is in Ubuntu results in the whole thing
breaking. Arguably, I should build the Debian artifact on an Ubuntu or
Debian machine of an appropriate version, but that's too much work. If
people really want that, then they can install some ancient version of
ripgrep from their Ubuntu/Debian repo.

Since we were already statically linking PCRE2, we go the whole nine
yards and statically link the entire thing.

Fixes #1890
2021-06-12 13:36:57 -04:00
Andrew Gallant
b3e5ae9d28 changelog: add template for next entry 2021-06-12 08:43:49 -04:00
Andrew Gallant
a024f14fdd pkg: update brew tap version to 13.0.0 2021-06-12 08:43:30 -04:00
Andrew Gallant
8c30c8294a release: work around GitHub Actions weirdness 2021-06-12 08:40:48 -04:00
Andrew Gallant
c44d263419 release: add note about pushing changes 2021-06-12 08:13:29 -04:00
Andrew Gallant
af6b6c543b 13.0.0 2021-06-12 08:12:24 -04:00
Andrew Gallant
1a4fec8b4a changelog: final prep before ripgrep 13 release 2021-06-12 08:11:51 -04:00
Andrew Gallant
c8d8ab8ded deps/grep: update minimal versions 2021-06-12 08:08:58 -04:00
Andrew Gallant
1d53ed2744 grep-0.2.8 2021-06-12 08:08:32 -04:00
Andrew Gallant
29696d1455 deps/printer: update minimal versions 2021-06-12 08:08:18 -04:00
Andrew Gallant
57ce623a57 grep-printer-0.1.6 2021-06-12 08:07:46 -04:00
Andrew Gallant
f1c656de40 deps/searcher: update minimal versions 2021-06-12 08:07:28 -04:00
Andrew Gallant
dd47582619 grep-searcher-0.1.8 2021-06-12 08:06:58 -04:00
Andrew Gallant
9b88cf8b72 deps/pcre2: update minimal versions 2021-06-12 08:06:50 -04:00
Andrew Gallant
6668d7ba8a grep-pcre2-0.1.5 2021-06-12 08:06:29 -04:00
Andrew Gallant
008da5dca4 pcre2: update minimal version to 0.2.3 2021-06-12 08:05:56 -04:00
Andrew Gallant
a34df1f690 deps/regex: update minimal versions 2021-06-12 08:05:36 -04:00
Andrew Gallant
7f3fd6f7ce grep-regex-0.1.9 2021-06-12 08:03:56 -04:00
Andrew Gallant
6331a7ac18 deps/matcher: update minimal versions 2021-06-12 08:03:47 -04:00
Andrew Gallant
cd4386bd9b grep-matcher-0.1.5 2021-06-12 08:02:30 -04:00
Andrew Gallant
cdc20c5685 deps/cli: update minimal versions 2021-06-12 08:02:18 -04:00
Andrew Gallant
0cf2b98df2 grep-cli-0.1.6 2021-06-12 08:01:22 -04:00
Andrew Gallant
9efdbf74a1 deps/ignore: update minimal versions 2021-06-12 08:01:13 -04:00
Andrew Gallant
53cb9a779e release: add step about making sure 'master' is in sync
Otherwise, if we start doing crate releases from the local checkout
(with git tags) and it turns out that origin/master has newer commits,
rebasing local master will then invalidate those tags.
2021-06-12 07:59:47 -04:00
Andrew Gallant
14860b0f16 ignore-0.4.18 2021-06-12 07:59:07 -04:00
Andrew Gallant
0eb1a1e7c9 deps/globset: update minimal versions 2021-06-12 07:58:46 -04:00
Andrew Gallant
5631e5c7a0 globset-0.4.7 2021-06-12 07:56:56 -04:00
Andrew Gallant
21644408f2 release: tweak 'cargo outdated' advice
I do run --aggressive, although I've been ignoring the clap 3 update for
what seems like forever since it's still in beta.
2021-06-12 07:54:51 -04:00
Andrew Gallant
0ee85a89f5 deps: update to memmap2
Looking at the changelog for memmap2, the only breaking change was to
MmapOptions, which we don't use. So no migration is needed.
2021-06-12 07:53:42 -04:00
Andrew Gallant
ed9d37959f deps: updates libc and syn 2021-06-12 07:52:04 -04:00
Andrew Gallant
9f924ee187 msrv: bump to Rust 1.52.1
This matches the latest stable release of Rust.
2021-06-01 21:07:37 -04:00
Andrew Gallant
35c5db6d1a deps: update everything
Removes two dependencies! autocfg and byteorder.
2021-06-01 21:07:37 -04:00
Andrew Gallant
e824531e38 edition: manual changes
This is mostly just about removing 'extern crate' everywhere and fixing
the fallout.
2021-06-01 21:07:37 -04:00
Andrew Gallant
af54069c51 edition: run 'cargo fix --edition --edition-idioms --all' 2021-06-01 21:07:37 -04:00
Andrew Gallant
77a9e99964 edition: set edition=2018 2021-06-01 21:07:37 -04:00
Andrew Gallant
459a9c5637 edition: initial 'cargo fix --edition' run 2021-06-01 21:07:37 -04:00
Andrew Gallant
e4c4540f6a changelog: fix typo and add Ruby to type improvement list 2021-06-01 11:57:16 -04:00
Ulysse Buonomo
5d0f2b0fc0 ignore/types: config.ru and *.rbw Ruby
PR #1886
2021-06-01 10:57:09 -04:00
Andrew Gallant
079a23b515 changelog: a bit of polish
I think I'm just waiting on the CVE to be published at this point.
2021-06-01 06:59:06 -04:00
Andrew Gallant
6e27649af1 github: add note about file types 2021-06-01 06:26:13 -04:00
Andrew Gallant
df83b8b444 ci: re-work github actions release
This combines the tips from #1820 and the patch submitted in #1675.
The latter wasn't taken as-is because I didn't agree with some of the
changes, and in particular, it removed the ability to easily test the
release on a branch with a dummy tag name. I've tried to add that back
here with the 'rg_version' output. Overall though, using outputs is
indeed much simpler.

Closes #1675, Closes #1820
2021-05-31 21:51:18 -04:00
Andrew Gallant
e48a17e189 changelog: prep for ripgrep 13 release 2021-05-31 21:51:18 -04:00
Andrew Gallant
fbb2cfed28 printer: trim line terminator before doing replacements
This is basically the same bug as #1401, but applied to replacements
instead of --only-matching.

Fixes #1739
2021-05-31 21:51:18 -04:00
Andrew Gallant
af8b27ffae changelog: fish completions are staying
In a previous release, I announced that Fish completions were being
removed. But the Fish project decided to remove theirs and have
ripgrep's stay.

Closes #1577
2021-05-31 21:51:18 -04:00
Martin Pool
8a4071eea9 globset: expand docs and impl Default for GlobSet
Closes #1882, Closes #1883
2021-05-31 21:51:18 -04:00
Andrew Gallant
ee23ab5173 printer: trim line terminator before finding submatches
This fixes a bug where PCRE2 look-around could change the result of a
match if it observed a line terminator in the printer. And in
particular, this is precisely how the searcher operates: the line is
considered unto itself *without* the line terminator.

Fixes #1401
2021-05-31 21:51:18 -04:00
Andrew Gallant
efd9cfb2fc grep: fix bugs in handling multi-line look-around
This commit hacks in a bug fix for handling look-around across multiple
lines. The main problem is that by the time the matching lines are sent
to the printer, the surrounding context---which some look-behind or
look-ahead might have matched---could have been dropped if it wasn't
part of the set of matching lines. Therefore, when the printer re-runs
the regex engine in some cases (to do replacements, color matches, etc
etc), it won't be guaranteed to see the same matches that the searcher
found.

Overall, this is a giant clusterfuck and suggests that the way I divided
the abstraction boundary between the printer and the searcher is just
wrong. It's likely that the searcher needs to handle more of the work of
matching and pass that info on to the printer. The tricky part is that
this additional work isn't always needed. Ultimately, this means a
serious re-design of the interface between searching and printing. Sigh.

The way this fix works is to smuggle the underlying buffer used by the
searcher through into the printer. Since these bugs only impact
multi-line search (otherwise, searches are only limited to matches
across a single line), and since multi-line search always requires
having the entire file contents in a single contiguous slice (memory
mapped or on the heap), it follows that the buffer we pass through when
we need it is, in fact, the entire haystack. So this commit refactors
the printer's regex searching to use that buffer instead of the intended
bundle of bytes containing just the relevant matching portions of that
same buffer.

There is one last little hiccup: PCRE2 doesn't seem to have a way to
specify an ending position for a search. So when we re-run the search to
find matches, we can't say, "but don't search past here." Since the
buffer is likely to contain the entire file, we really cannot do
anything here other than specify a fixed upper bound on the number of
bytes to search. So if look-ahead goes more than N bytes beyond the
match, this code will break by simply being unable to find the match. In
practice, this is probably pretty rare. I believe that if we did a
better fix for this bug by fixing the interfaces, then we'd probably try
to have PCRE2 find the pertinent matches up front so that it never needs
to re-discover them.

Fixes #1412
2021-05-31 21:51:18 -04:00
Andrew Gallant
656aa12649 printer: fix multi-line replacement bug
This commit fixes a subtle bug in multi-line replacement of line
terminators.

The problem is that even though ripgrep supports multi-line searches, it
is *still* line oriented. It still needs to print line numbers, for
example. For this reason, there are various parts in the printer that
iterate over lines in order to format them into the desired output.

This turns out to be problematic in some cases. #1311 documents one of
those cases (with line numbers enabled to highlight a point later):

    $ printf "hello\nworld\n" | rg -n -U "\n" -r "?"
    1:hello?
    2:world?

But the desired output is this:

    $ printf "hello\nworld\n" | rg -n -U "\n" -r "?"
    1:hello?world?

At first I had thought that the main problem was that the printer was
taking ownership of writing line terminators, even if the input already
had them. But it's more subtle than that. If we fix that issue, we get
output like this instead:

    $ printf "hello\nworld\n" | rg -n -U "\n" -r "?"
    1:hello?2:world?

Notice how '2:' is printed before 'world?'. The reason it works this way
is because matches are reported to the printer in a line oriented way.
That is, the printer gets a block of lines. The searcher guarantees that
all matches that start or end in any of those lines also end or start in
another line in that same block. As a result, the printer uses this
assumption: once it has processed a block of lines, the next match will
begin on a new and distinct line. Thus, things like '2:' are printed.

This is generally all fine and good, but an impedance mismatch arises
when replacements are used. Because now, the replacement can be used to
change the "block of lines" approach. Now, in terms of the output, the
subsequent match might actually continue the current line since the
replacement might get rid of the concept of lines altogether.

We can sometimes work around this. For example:

    $ printf "hello\nworld\n" | rg -U "\n(.)?" -r '?$1'
    hello?world?

Why does this work? It's because the '(.)' after the '\n' causes the
match to overlap between lines. Thus, the searcher guarantees that the
block sent to the printer contains every line.

And there in lay the solution: all we need to do is tweak the multi-line
searcher so that it combines lines with matches that directly adjacent,
instead of requiring at least one byte of overlap. Fixing that solves
the issue above. It does cause some tests to fail:

* The binary3 test in the searcher crate fails because adjacent line
  matches are now one part of block, and that block is scanned for
  binary data. To preserve the essence of the test, we insert a couple
  dummy lines to split up the blocks.
* The JSON CRLF test. It was testing that we didn't output any messages
  with an empty 'submatches' array. That is indeed still the case. The
  difference is that the messages got combined because of the adjacent
  line merging behavior. This is a slight change to the output, but is
  still correct.

Fixes #1311
2021-05-31 21:51:18 -04:00
Andrew Gallant
fc31aedcf3 printer: vimgrep now only prints one line
It turns out that the vimgrep format really only wants one line per
match, even when that match spans multiple lines.

We continue to support the previous behavior (print all lines in a
match) in the `grep-printer` crate. We add a new option to enable the
"only print the first line" behavior, and unconditionally enable it in
ripgrep. We can do that because the option has no effect in single-line
mode, since, well, in that case matches are guaranteed to span one line
anyway.

Fixes #1866
2021-05-31 21:51:18 -04:00
Anthony Huang
578e1992fa cli: add --field-{context,match}-separator flags
These flags permit configuring the bytes used to delimit fields in match
or context lines, where "fields" are things like the file path, line
number, column number and the match/context itself.

Fixes #1842, Closes #1871
2021-05-31 21:51:18 -04:00
Austin Wise
46d0130597 cargo: statically link binary on Windows/MSVC
Before this change, rg.exe depended on vcruntime140.dll, which does not
exist on a fresh install of Windows.

Closes #1613
2021-05-31 21:51:18 -04:00
Andres Suarez
7534d5144f globset: fix recursive suffix over matching
Previous, 'foo/**' would match 'foo', but it shouldn't have. In this
case, not matching 'foo' is what is documented and also seems consistent
with other recursive globbing implementations (like that in zsh).

This also updates the prefix extractor to pull 'foo/' out of 'foo/**'.

Closes #1756
2021-05-31 21:51:18 -04:00
Richard Khoury
a28e664abd ignore: check ignore rules before issuing stat calls
This seems like an obvious optimization but becomes critical when
filesystem operations even as simple as stat can result in significant
overheads; an example of this was a bespoke filesystem layer in Windows
that hosted files remotely and would download them on-demand when
particular filesystem operations occurred. Users of this system who
ensured correct file-type fileters were being used could still get
unnecessary file access resulting in large downloads.

Fixes #1657, Closes #1660
2021-05-31 21:51:18 -04:00
Pen Tree
0ca96e004c printer: fix context bug when --max-count is used
In the case where after-context is requested with a match count limit,
we need to be careful not to reset the state tracking the remaining
context lines.

Fixes #1380, Closes #1642
2021-05-31 21:51:18 -04:00
Alessandro Menezes
2295061e80 searcher: do UTF-8 BOM sniffing like UTF-16
Previously, we were only looking for the UTF-16 BOM for determining
whether to do transcoding or not. But we should also look for the UTF-8
BOM as well.

Fixes #1638, Closes #1697
2021-05-31 21:51:18 -04:00
Raimon Grau
53c4855517 ignore/types: add red
See: https://www.red-lang.org/

Closes #1663
2021-05-31 21:51:18 -04:00
Simon Morgan
121e0135c1 ignore/types: replace duplicate glob with *.aspx.vb
*.aspx.cs was listed twice and the VB variant is missing.

Closes #1683
2021-05-31 21:51:18 -04:00
tillyboy
c53c4c0ade doc: explain ignore rules a bit more
Closes #1600
2021-05-31 21:51:18 -04:00
João Marcos
4566882521 cli: add -. as short option for --hidden
This is somewhat non-standard, but it seems nice on the surface: short
flag names are in short supply, --hidden is probably somewhat common and
-. has an obvious connection with how hidden files are named on Unix.

Closes #1680
2021-05-31 21:51:18 -04:00
Andrew Gallant
12dd455ee9 printer: fix \r\n line terminator handling
This fixes a bug where it was assumed that 'is_suffix' when CRLF
handling was enabled mean that '\r\n' was present. But that's not the
case, and it is intentional that 'is_suffix' only looks for '\n'. (Which
is why #1803 wasn't taken, which tries to fix this by changing
'is_suffix'.)

Fixes #1765, Closes #1803
2021-05-31 21:51:18 -04:00
goto-engineering
e6cac8b119 cli: print warning if nothing was searched
This was once part of ripgrep, but at some point, was unintentionally
removed. The value of this warning is that since ripgrep tries to be
"smart" by default, it can be surprising if it doesn't search certain
things. This warning covers the case when ripgrep searches *nothing*,
which happens somewhat more frequently than you might expect. e.g., If
you're searching within an ignore directory.

Note that for now, we only print this message when the user has not
supplied any explicit paths. It's not clear that we want to print this
otherwise, and in particular, it seems that the message shows up too
eagerly. e.g., 'rg foo does-not-exist' will both print an error about
'does-not-exist' not existing, *and* the message about no files being
searched, which seems annoying in this case. We can always refine this
logic later.

Fixes #1404, Closes #1762
2021-05-31 21:51:18 -04:00
Marco Ieni
0f502a9439 cargo: remove "readme" field
It is apparently no longer required since a README.md file is
automatically detected:
https://doc.rust-lang.org/cargo/reference/manifest.html#the-readme-field

Closes #1770
2021-05-31 21:51:18 -04:00
Ilya Grigoriev
51d2db7f19 doc: document '{a,b}' glob syntax
This syntax does not exist in `git`, so it is not documented in `man
gitignore`. There is a question of whether it *should* exist, but as
long as it does, it should be documented somewhere.

See also:
https://github.com/BurntSushi/ripgrep/issues/1221
https://github.com/BurntSushi/ripgrep/issues/1368

Closes #1816
2021-05-31 21:51:18 -04:00
Marco Ieni
b3a6a69f9d ci: check docs for all crates
This also replaces '--all' in Cargo commands with '--workspace'. The
former has apparently been deprecated.

We also fix a couple warnings that this new step detected.

Closes #1848
2021-05-31 21:51:18 -04:00
Jade
26a29c750e doc: clarify --files-with-matches and --files-without-match
Ref https://github.com/BurntSushi/ripgrep/issues/103#issuecomment-763083510

Closes #1869
2021-05-31 21:51:18 -04:00
Varik Valefor
beda5f70dc doc: improve wording
This tightens up the wording in ripgrep's opening description. It's used
in several places, so we update all of them.

Closes #1881
2021-05-31 21:51:18 -04:00
Vasili Revelas
5af7707a35 cli: fix process leak
If ripgrep was called in a way where the entire contents of a file
aren't read (like --files-with-matches, among other methods), and if the
file was read through an external process, then ripgrep would never reap
that process.

We fix this by introducing an explicit 'close' method, which we now call
when using decompression or preprocessor searches.

The implementation of 'close' is a little hokey. In particular, when we
close stdout, this usually results in a broken pipe, and, consequently,
a non-zero code returned once the child process is reaped. This is
"situation normal," so we invent a (hopefully portable) heuristic for
detecting it.

Fixes #1766, Closes #1767
2021-05-31 21:51:18 -04:00
Vasili Revelas
3f33a83a5f searcher: remove variable shadowing
The previous variable name was the same as one of the method arguments.
2021-05-31 21:51:18 -04:00
Andrew Gallant
35b52d33b9 regex: add unit tests for non-matching anchor bytes
This is in addition to the integration level test added in
581a35e568.
2021-05-31 21:51:18 -04:00
Andrew Gallant
a77b914e7a args: make --passthru and -A/-B/-C override each other
Fixes #1868
2021-05-31 21:51:18 -04:00
Andrew Gallant
2e2af50a4d doc: add vulnerability report docs
Fixes #1773
2021-05-29 09:53:18 -04:00
Andrew Gallant
229d1a8d41 cli: fix arbitrary execution of program bug
This fixes a bug only present on Windows that would permit someone to
execute an arbitrary program if they crafted an appropriate directory
tree. Namely, if someone put an executable named 'xz.exe' in the root of
a directory tree and one ran 'rg -z foo' from the root of that tree,
then the 'xz.exe' executable in that tree would execute if there are any
'xz' files anywhere in the tree.

The root cause of this problem is that 'CreateProcess' on Windows will
implicitly look in the current working directory for an executable when
it is given a relative path to a program. Rust's standard library allows
this behavior to occur, so we work around it here. We work around it by
explicitly resolving programs like 'xz' via 'PATH'. That way, we only
ever pass an absolute path to 'CreateProcess', which avoids the implicit
behavior of checking the current working directory.

This fix doesn't apply to non-Windows systems as it is believed to only
impact Windows. In theory, the bug could apply on Unix if '.' is in
one's PATH, but at that point, you reap what you sow.

While the extent to which this is a security problem isn't clear, I
think users generally expect to be able to download or clone
repositories from the Internet and run ripgrep on them without fear of
anything too awful happening. Being able to execute an arbitrary program
probably violates that expectation. Therefore, CVE-2021-3013[1] was
created for this issue.

We apply the same logic to the --pre command, since the --pre command is
likely in a user's config file and it would be surprising for something
that the user is searching to modify which preprocessor command is used.

The --pre and -z/--search-zip flags are the only two ways that ripgrep
will invoke external programs, so this should cover any possible
exploitable cases of this bug.

[1] - https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3013
2021-05-29 09:36:48 -04:00
Andrew Gallant
8ec6ef373f changelog: sync with commits since last release
I'm hoping to get a release out soon, and this is the first step.
2021-05-29 08:26:46 -04:00
Andrew Gallant
581a35e568 impl: fix --multiline anchored match bug
This fixes a bug where using \A or (?-m)^ in combination with
-U/--multiline would permit matches that aren't anchored to the
beginning of the file. The underlying cause was an optimization that
occurred when mmaps couldn't be used. Namely, ripgrep tries to still
read the input incrementally if it knows the pattern can't match through
a new line. But the detection logic was flawed, since it didn't account
for line anchors. This commit fixes that.

Fixes #1878, Fixes #1879
2021-05-29 07:37:28 -04:00
jack1142
ba965962fe ignore/types: add po files to supported types
See: https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html

Closes #1875
2021-05-28 12:06:10 -04:00
Andrew Gallant
94e4b8e301 printer: fix --vimgrep for multi-line mode
It turned out that --vimgrep wasn't quite getting the column of each
match correctly. Instead of printing column numbers relative to the
current line, it was printing column numbers as byte offsets relative to
where the match began. To fix this, we simply subtract the offset of the
line number from the beginning of the match. If the beginning of the
match came before the start of the current line, then there's really
nothing sensible we can do other than to use a column number of 1, which
we now document.

Interestingly, existing tests were checking that the previous behavior
was intended. My only defense is that I somehow tricked myself into
thinking it was a byte offset instead of a column number.

Kudos to @bfrg for calling this out in #1866:
https://github.com/BurntSushi/ripgrep/issues/1866#issuecomment-841635553
2021-05-15 08:27:59 -04:00
Alessandro Caputo
2af77242c5 doc: fix typo in --engine flag docs
Fixes #1862
2021-05-08 15:35:44 -04:00
Andrew Gallant
3f4c4188c1 deps: update to regex 1.5.2
This brings in a performance bug fix, merged in
https://github.com/rust-lang/regex/pull/768.

Fixes #1860.
2021-05-01 07:44:47 -04:00
Andrew Gallant
ce4b587055 deps: update everything
It looks like no new dependencies have been introduced. Yay!

This update was primarily motivated to bring regex 1.5 in with its new
memmem implementation from the memchr crate.
2021-04-30 20:26:32 -04:00
Eliaz Bobadilla
be63122508 doc: add links to Spanish translation
PR #1856
2021-04-21 11:14:11 -04:00
Dan Bjorge
92286ad4d2 doc: clarify --hidden definition
On Windows, we didn't previously document that ripgrep
respected both the prefix-dot convention _and_ the "hidden"
attribute on files.

Fixes #1847
2021-04-15 19:21:26 -04:00
jgart
4ebe8375ec ignore/types: add mint
PR #1844
2021-04-04 08:00:12 -04:00
Andrew Gallant
7923d25228 core: add a 'trace' message
This message will emit the binary detection mechanism being used for
each file.

This does not noticeably increases the number of log messages, as the
'trace' level is already used for emitting messages for every file
searched.

This trace message was added in the course of investigating #1838.
2021-03-31 13:54:00 -04:00
aricha1940
1c3eebefec searcher: update outdated comment for buffer size
Looks like this was accidentally left set to 8 in commit 46fb77c.

PR #1839
2021-03-31 08:18:38 -04:00
Andrew Gallant
64ac2ebe0f tests: fix tests for buffer size change
Sadly, there were several tests that are coupled to the size of the
buffer used by ripgrep. Making the tests agnostic to the size is
difficult. And it's annoying to fix the tests. But we rarely change the
buffer size, so ¯\_(ツ)_/¯.
2021-03-23 18:14:18 -04:00
Andrew Gallant
46fb77c20c searcher: bump buffer size
This increases the initial buffer size from 8KB to 64KB. This actually
leads to a reasonably noticeable improvement in at least one work-load,
and is unlikely to regress in any other case. Also, since Rust programs
(at least on Linux) seem to always use a minimum of 6-8MB of memory,
adding an extra 56KB is negligible.

Before:

    $ hyperfine -i "rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap"
    Benchmark #1: rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap
      Time (mean ± σ):      2.109 s ±  0.012 s    [User: 565.5 ms, System: 1541.6 ms]
      Range (min … max):    2.094 s …  2.128 s    10 runs

After:

    $ hyperfine -i "rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap"
    Benchmark #1: rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap
      Time (mean ± σ):      1.802 s ±  0.006 s    [User: 462.3 ms, System: 1337.9 ms]
      Range (min … max):    1.795 s …  1.814 s    10 runs
2021-03-23 17:45:02 -04:00
Allen Wild
6a1c3253e0 ci: fix deb build script in clean checkout
If ripgrep hasn't been built yet (i.e. target/debug/ doesn't exist),
then cargo-out-dir can't find OUT_DIR and the copy commands fail. Fix by
running cargo build before finding OUT_DIR.

Also add a check to fail early with a sensible error message when
asciidoctor isn't installed, rather than failing because of a missing
rg.1 file after the build.

PR #1831
2021-03-20 13:37:50 -04:00
Andrew Gallant
c7730d1f3a deps: bump regex and regex-syntax 2021-03-11 21:20:25 -05:00
Hanif Ariffin
c5ea5a13df gitignore: add HTML files generated by cargo -Z timings
PR #1801
2021-02-12 11:09:56 -05:00
Sergei Vorobev
9c8d873a75 ignore/types: improve bazel globs
Adds *.BUILD and *.bazelrc.

PR #1789
2021-01-30 18:22:48 -05:00
Andrew Gallant
7899a4b931 regex: s/CachedThreadLocal/ThreadLocal
CachedThreadLocal has been deprecated. We bump thread_local's minimal
version corresponding to that deprecation as well.
2021-01-25 10:38:05 -05:00
Andrew Gallant
ae55a4e872 deps: update everything
Most of these updates come from releases I've made, and the rest appear
minor. No new dependencies have been added, and `const_fn` was removed.
Yay.
2021-01-17 18:55:17 -05:00
Andrew Gallant
3a1780d841 deps: replace memmap with memmap2
memmap is unmaintained at this point and it is being flagged as a
RUSTSEC advisory in ripgrep. This doesn't seem like that big of a deal
to me honestly, but memmap2 looks like a fine choice at this point.

Fixes #1785, Closes #1786
2021-01-17 18:49:51 -05:00
Andrew Gallant
a6d05475fb ignore-0.4.17 2020-11-23 10:25:33 -05:00
Roey Darwish Dror
020c5453a5 cli: fix stdin detection for Powershell on Unix
It seems that PowerShell uses sockets instead of FIFOs to redirect the
output between commands. So add `is_socket` to our `is_readable_stdin`
check.

This seems unlikely to cause problems and it probably more generally
correct than what we had before. In theory, it could cause problems if
it produces false positives, in which case, ripgrep will try to read
stdin when it should search the current working directory. (And this
usually winds up manifesting as ripgrep blocking forever.) But, if the
stdin handle reports itself as a socket, then it seems like we should
read it.

Fixes #1741, Closes #1742
2020-11-23 10:23:34 -05:00
Ed Page
873abecbf1 ignore: provide underlying IO Error
`ignore::Error` wraps `std::io::Error` with additional information
(as well as expose non-IO errors). For people wanting to inspect what
the error is, they have to recursively match the Enum. This provides
`io_error` and `into_io_error` helpers to do this for the user.

PR #1740
2020-11-23 10:19:31 -05:00
tleb
8c73833efc readme: fix link to .deb
This is a common thing to forget to do after a release.
2020-11-22 09:56:02 -05:00
James Harr
44e69ba627 ignore/types: add yang file type
YANG is described in RFC 6020
https://tools.ietf.org/html/rfc6020

PR #1736
2020-11-20 09:41:29 -05:00
Andrew Gallant
13d77ab646 ci: update to GITHUB_ENV
Apparently ::set-env has been completely disabled. Sigh.
2020-11-16 19:17:36 -05:00
Andrew Gallant
d97fb72d84 doc: update CI links in crate READMEs
I switched to GitHub Actions long ago, which replaces both Travis and
AppVeyor.

Fixes #1732
2020-11-16 19:07:16 -05:00
Andrew Gallant
d6365117e2 doc: sync --help output with man page
The man page had the correct usage hints, but the -h/--help output was
using an older more incorrect version of the hints.

Closes #1730 (again)
2020-11-15 15:27:23 -05:00
Andrew Gallant
f32e906012 doc: clarify that CLI invocation must always be valid
This comes up as a corner case where folks provide -e/--regexp in a
configuration file and then expect to be able to run 'rg' with no args.
However, ripgrep fails because it still expects at least one pattern
even though one was specified in the config file.

This occurs because ripgrep has to parse its CLI parameters before
reading the config file. (For log output settings and to handle the
--no-config flag.) This initial parse will fail if there are no patterns
specified.

The only way to solve this that I can see is to somehow relax the
requirements of the initial parse. But this is problematic because we
would still need to enforce those requirements in cases where we don't
do a second parse (when no config file is present).

All in all, this doesn't seem like a problem that is worth solving.

Closes #1730
2020-11-15 15:00:08 -05:00
Taiki Endo
59644d4592 ci: install cross from crates.io
A new release of cross has been put out, so we
no longer need to install it from git.

PR #1728
2020-11-09 07:25:41 -05:00
Alex Touchet
3ca324fda7 doc: update several links to use https
PR #1724
2020-11-03 10:33:36 -05:00
Stefan VanBuren
8782f8200c doc: add missing backtick in FAQ
PR #1723
2020-11-03 10:32:38 -05:00
Andrew Gallant
2819212f89 printer: tweak binary detection message format
This roughly matches similar changes made in GNU grep recently.
2020-11-02 10:52:51 -05:00
Andrew Gallant
810be0b348 deps: update base64 to 0.13.0 2020-11-02 10:52:51 -05:00
Andrew Gallant
a28bb1e953 deps: bring in all semver updates
This brings in all other semver updates.

This did require updating some tests, since bstr changed its debug
output for NUL bytes to be a bit more idiomatic.
2020-11-02 10:52:51 -05:00
Andrew Gallant
3ef63dacbe deps: targeted update of some dependencies
This updates encoding_rs, crossbeam-utils and crossbeam-channel. This
serves two purposes. The encoding_rs update fixes a compilation failure
on the latest nightly. The crossbeam updates are good sense and to
reduce duplicate dependencies such as cfg-if. (Although, we note that
the log crate still pulls in cfg-if 0.1, so ripgrep has a duplicate
dependency there for now. But it's very small.)

Fixes #1721, Closes #1705
2020-11-02 10:52:51 -05:00
Vanessa McHale
e1ac18ef06 ignore/types: add Futhark
See: https://futhark-lang.org/

PR #1720
2020-10-31 12:10:15 -04:00
Brandon Adams
ba3f9673ad ignore/types: generalize bazel type a bit
Bazel supports `BUILD.bazel` as well as `WORKSPACE.bazel`. In
addition, it is common to ship BUILD/WORKSPACE templates for
external repositories suffixed with .bazel for easier tool
recognition.

Co-authored-by: Brandon Adams <brandon.adams@imc.com>

PR #1716
2020-10-23 12:24:30 -04:00
107 changed files with 4376 additions and 1023 deletions

8
.cargo/config.toml Normal file
View File

@@ -0,0 +1,8 @@
# On Windows MSVC, statically link the C runtime so that the resulting EXE does
# not depend on the vcruntime DLL.
#
# See: https://github.com/BurntSushi/ripgrep/pull/1613
[target.x86_64-pc-windows-msvc]
rustflags = ["-C", "target-feature=+crt-static"]
[target.i686-pc-windows-msvc]
rustflags = ["-C", "target-feature=+crt-static"]

View File

@@ -15,3 +15,6 @@ examples of how ripgrep would be used if your feature request were added.
If you're not sure what to write here, then try imagining what the ideal
documentation of your new feature would look like in ripgrep's man page. Then
try to write it.
If you're requesting the addition or change of default file types, please open
a PR. We can discuss it there if necessary.

View File

@@ -43,7 +43,7 @@ jobs:
include:
- build: pinned
os: ubuntu-18.04
rust: 1.41.0
rust: 1.52.1
- build: stable
os: ubuntu-18.04
rust: stable
@@ -98,21 +98,17 @@ jobs:
ci/macos-install-packages
- name: Install Rust
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@v1
with:
toolchain: ${{ matrix.rust }}
profile: minimal
override: true
- name: Use Cross
if: matrix.target != ''
run: |
# FIXME: to work around bugs in latest cross release, install master.
# See: https://github.com/rust-embedded/cross/issues/357
cargo install --git https://github.com/rust-embedded/cross
echo "::set-env name=CARGO::cross"
echo "::set-env name=TARGET_FLAGS::--target ${{ matrix.target }}"
echo "::set-env name=TARGET_DIR::./target/${{ matrix.target }}"
cargo install cross
echo "CARGO=cross" >> $GITHUB_ENV
echo "TARGET_FLAGS=--target ${{ matrix.target }}" >> $GITHUB_ENV
echo "TARGET_DIR=./target/${{ matrix.target }}" >> $GITHUB_ENV
- name: Show command used for Cargo
run: |
@@ -120,10 +116,10 @@ jobs:
echo "target flag is: ${{ env.TARGET_FLAGS }}"
- name: Build ripgrep and all crates
run: ${{ env.CARGO }} build --verbose --all ${{ env.TARGET_FLAGS }}
run: ${{ env.CARGO }} build --verbose --workspace ${{ env.TARGET_FLAGS }}
- name: Build ripgrep with PCRE2
run: ${{ env.CARGO }} build --verbose --all --features pcre2 ${{ env.TARGET_FLAGS }}
run: ${{ env.CARGO }} build --verbose --workspace --features pcre2 ${{ env.TARGET_FLAGS }}
# This is useful for debugging problems when the expected build artifacts
# (like shell completions and man pages) aren't generated.
@@ -141,7 +137,7 @@ jobs:
- name: Run tests with PCRE2 (sans cross)
if: matrix.target == ''
run: ${{ env.CARGO }} test --verbose --all --features pcre2 ${{ env.TARGET_FLAGS }}
run: ${{ env.CARGO }} test --verbose --workspace --features pcre2 ${{ env.TARGET_FLAGS }}
- name: Run tests without PCRE2 (with cross)
# These tests should actually work, but they almost double the runtime.
@@ -149,7 +145,7 @@ jobs:
# enabled, every integration test is run twice: one with the default
# regex engine and once with PCRE2.
if: matrix.target != ''
run: ${{ env.CARGO }} test --verbose --all ${{ env.TARGET_FLAGS }}
run: ${{ env.CARGO }} test --verbose --workspace ${{ env.TARGET_FLAGS }}
- name: Test for existence of build artifacts (Windows)
if: matrix.os == 'windows-2019'
@@ -187,12 +183,25 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v2
- name: Install Rust
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@v1
with:
toolchain: stable
override: true
profile: minimal
components: rustfmt
- name: Check formatting
run: |
cargo fmt --all -- --check
docs:
name: Docs
runs-on: ubuntu-20.04
steps:
- name: Checkout repository
uses: actions/checkout@v2
- name: Install Rust
uses: dtolnay/rust-toolchain@v1
with:
toolchain: stable
- name: Check documentation
env:
RUSTDOCFLAGS: -D warnings
run: cargo doc --no-deps --document-private-items --workspace

View File

@@ -1,23 +1,26 @@
# The way this works is a little weird. But basically, the create-release job
# runs purely to initialize the GitHub release itself. Once done, the upload
# URL of the release is saved as an artifact.
# The way this works is the following:
#
# The build-release job runs only once create-release is finished. It gets
# the release upload URL by downloading the corresponding artifact (which was
# uploaded by create-release). It then builds the release executables for each
# supported platform and attaches them as release assets to the previously
# created release.
# The create-release job runs purely to initialize the GitHub release itself
# and to output upload_url for the following job.
#
# The build-release job runs only once create-release is finished. It gets the
# release upload URL from create-release job outputs, then builds the release
# executables for each supported platform and attaches them as release assets
# to the previously created release.
#
# The key here is that we create the release only once.
#
# Reference:
# https://eugene-babichenko.github.io/blog/2020/05/09/github-actions-cross-platform-auto-releases/
name: release
on:
push:
# Enable when testing release infrastructure on a branch.
# branches:
# - ag/release
# - ag/work
tags:
- '[0-9]+.[0-9]+.[0-9]+'
- "[0-9]+.[0-9]+.[0-9]+"
jobs:
create-release:
name: create-release
@@ -25,19 +28,19 @@ jobs:
# env:
# Set to force version number, e.g., when no tag exists.
# RG_VERSION: TEST-0.0.0
outputs:
upload_url: ${{ steps.release.outputs.upload_url }}
rg_version: ${{ env.RG_VERSION }}
steps:
- name: Create artifacts directory
run: mkdir artifacts
- name: Get the release version from the tag
shell: bash
if: env.RG_VERSION == ''
run: |
# Apparently, this is the right way to get a tag name. Really?
#
# See: https://github.community/t5/GitHub-Actions/How-to-get-just-the-tag-name/m-p/32167/highlight/true#M1027
echo "::set-env name=RG_VERSION::${GITHUB_REF#refs/tags/}"
echo "RG_VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_ENV
echo "version is: ${{ env.RG_VERSION }}"
- name: Create GitHub release
id: release
uses: actions/create-release@v1
@@ -47,18 +50,6 @@ jobs:
tag_name: ${{ env.RG_VERSION }}
release_name: ${{ env.RG_VERSION }}
- name: Save release upload URL to artifact
run: echo "${{ steps.release.outputs.upload_url }}" > artifacts/release-upload-url
- name: Save version number to artifact
run: echo "${{ env.RG_VERSION }}" > artifacts/release-version
- name: Upload artifacts
uses: actions/upload-artifact@v1
with:
name: artifacts
path: artifacts
build-release:
name: build-release
needs: ['create-release']
@@ -68,7 +59,7 @@ jobs:
# systems.
CARGO: cargo
# When CARGO is set to CROSS, this is set to `--target matrix.target`.
TARGET_FLAGS:
TARGET_FLAGS: ""
# When CARGO is set to CROSS, TARGET_DIR includes matrix.target.
TARGET_DIR: ./target
# Emit backtraces on panics.
@@ -121,22 +112,18 @@ jobs:
ci/macos-install-packages
- name: Install Rust
uses: actions-rs/toolchain@v1
uses: dtolnay/rust-toolchain@v1
with:
toolchain: ${{ matrix.rust }}
profile: minimal
override: true
target: ${{ matrix.target }}
- name: Use Cross
# if: matrix.os != 'windows-2019'
shell: bash
run: |
# FIXME: to work around bugs in latest cross release, install master.
# See: https://github.com/rust-embedded/cross/issues/357
cargo install --git https://github.com/rust-embedded/cross
echo "::set-env name=CARGO::cross"
echo "::set-env name=TARGET_FLAGS::--target ${{ matrix.target }}"
echo "::set-env name=TARGET_DIR::./target/${{ matrix.target }}"
cargo install cross
echo "CARGO=cross" >> $GITHUB_ENV
echo "TARGET_FLAGS=--target ${{ matrix.target }}" >> $GITHUB_ENV
echo "TARGET_DIR=./target/${{ matrix.target }}" >> $GITHUB_ENV
- name: Show command used for Cargo
run: |
@@ -144,22 +131,6 @@ jobs:
echo "target flag is: ${{ env.TARGET_FLAGS }}"
echo "target dir is: ${{ env.TARGET_DIR }}"
- name: Get release download URL
uses: actions/download-artifact@v1
with:
name: artifacts
path: artifacts
- name: Set release upload URL and release version
shell: bash
run: |
release_upload_url="$(cat artifacts/release-upload-url)"
echo "::set-env name=RELEASE_UPLOAD_URL::$release_upload_url"
echo "release upload url: $RELEASE_UPLOAD_URL"
release_version="$(cat artifacts/release-version)"
echo "::set-env name=RELEASE_VERSION::$release_version"
echo "release version: $RELEASE_VERSION"
- name: Build release binary
run: ${{ env.CARGO }} build --verbose --release --features pcre2 ${{ env.TARGET_FLAGS }}
@@ -180,7 +151,7 @@ jobs:
shell: bash
run: |
outdir="$(ci/cargo-out-dir "${{ env.TARGET_DIR }}")"
staging="ripgrep-${{ env.RELEASE_VERSION }}-${{ matrix.target }}"
staging="ripgrep-${{ needs.create-release.outputs.rg_version }}-${{ matrix.target }}"
mkdir -p "$staging"/{complete,doc}
cp {README.md,COPYING,UNLICENSE,LICENSE-MIT} "$staging/"
@@ -191,13 +162,13 @@ jobs:
if [ "${{ matrix.os }}" = "windows-2019" ]; then
cp "target/${{ matrix.target }}/release/rg.exe" "$staging/"
7z a "$staging.zip" "$staging"
echo "::set-env name=ASSET::$staging.zip"
echo "ASSET=$staging.zip" >> $GITHUB_ENV
else
# The man page is only generated on Unix systems. ¯\_(ツ)_/¯
cp "$outdir"/rg.1 "$staging/doc/"
cp "target/${{ matrix.target }}/release/rg" "$staging/"
tar czf "$staging.tar.gz" "$staging"
echo "::set-env name=ASSET::$staging.tar.gz"
echo "ASSET=$staging.tar.gz" >> $GITHUB_ENV
fi
- name: Upload release archive
@@ -205,7 +176,7 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ env.RELEASE_UPLOAD_URL }}
upload_url: ${{ needs.create-release.outputs.upload_url }}
asset_path: ${{ env.ASSET }}
asset_name: ${{ env.ASSET }}
asset_content_type: application/octet-stream

4
.gitignore vendored
View File

@@ -15,3 +15,7 @@ parts
*.snap
*.pyc
ripgrep*_source.tar.bz2
# Cargo timings
cargo-timing-*.html
cargo-timing.html

View File

@@ -4,8 +4,147 @@ Unreleased changes. Release notes have not yet been written.
Bug fixes:
* [BUG #1891](https://github.com/BurntSushi/ripgrep/issues/1891):
Fix bug when using `-w` with a regex that can match the empty string.
* [BUG #1911](https://github.com/BurntSushi/ripgrep/issues/1911):
Disable mmap searching in all non-64-bit environments.
* [BUG #2236](https://github.com/BurntSushi/ripgrep/issues/2236):
Fix gitignore parsing bug where a trailing `\/` resulted in an error.
13.0.0 (2021-06-12)
===================
ripgrep 13 is a new major version release of ripgrep that primarily contains
bug fixes, some performance improvements and a few minor breaking changes.
There is also a fix for a security vulnerability on Windows
([CVE-2021-3013](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3013)).
Some highlights:
A new short flag, `-.`, has been added. It is an alias for the `--hidden` flag,
which instructs ripgrep to search hidden files and directories.
ripgrep is now using a new
[vectorized implementation of `memmem`](https://github.com/BurntSushi/memchr/pull/82),
which accelerates many common searches. If you notice any performance
regressions (or major improvements), I'd love to hear about them through an
issue report!
Also, for Windows users targeting MSVC, Cargo will now build fully static
executables of ripgrep. The release binaries for ripgrep 13 have been compiled
using this configuration.
**BREAKING CHANGES**:
**Binary detection output has changed slightly.**
In this release, a small tweak has been made to the output format when a binary
file is detected. Previously, it looked like this:
```
Binary file FOO matches (found "\0" byte around offset XXX)
```
Now it looks like this:
```
FOO: binary file matches (found "\0" byte around offset XXX)
```
**vimgrep output in multi-line now only prints the first line for each match.**
See [issue 1866](https://github.com/BurntSushi/ripgrep/issues/1866) for more
discussion on this. Previously, every line in a match was duplicated, even
when it spanned multiple lines. There are no changes to vimgrep output when
multi-line mode is disabled.
**In multi-line mode, --count is now equivalent to --count-matches.**
This appears to match how `pcre2grep` implements `--count`. Previously, ripgrep
would produce outright incorrect counts. Another alternative would be to simply
count the number of lines---even if it's more than the number of matches---but
that seems highly unintuitive.
**FULL LIST OF FIXES AND IMPROVEMENTS:**
Security fixes:
* [CVE-2021-3013](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3013):
Fixes a security hole on Windows where running ripgrep with either the
`-z/--search-zip` or `--pre` flags can result in running arbitrary
executables from the current directory.
* [VULN #1773](https://github.com/BurntSushi/ripgrep/issues/1773):
This is the public facing issue tracking CVE-2021-3013. ripgrep's README
now contains a section describing how to report a vulnerability.
Performance improvements:
* [PERF #1657](https://github.com/BurntSushi/ripgrep/discussions/1657):
Check if a file should be ignored first before issuing stat calls.
* [PERF memchr#82](https://github.com/BurntSushi/memchr/pull/82):
ripgrep now uses a new vectorized implementation of `memmem`.
Feature enhancements:
* Added or improved file type filtering for ASP, Bazel, dvc, FlatBuffers,
Futhark, minified files, Mint, pofiles (from GNU gettext) Racket, Red, Ruby,
VCL, Yang.
* [FEATURE #1404](https://github.com/BurntSushi/ripgrep/pull/1404):
ripgrep now prints a warning if nothing is searched.
* [FEATURE #1613](https://github.com/BurntSushi/ripgrep/pull/1613):
Cargo will now produce static executables on Windows when using MSVC.
* [FEATURE #1680](https://github.com/BurntSushi/ripgrep/pull/1680):
Add `-.` as a short flag alias for `--hidden`.
* [FEATURE #1842](https://github.com/BurntSushi/ripgrep/issues/1842):
Add `--field-{context,match}-separator` for customizing field delimiters.
* [FEATURE #1856](https://github.com/BurntSushi/ripgrep/pull/1856):
The README now links to a
[Spanish translation](https://github.com/UltiRequiem/traducciones/tree/master/ripgrep).
Bug fixes:
* [BUG #1277](https://github.com/BurntSushi/ripgrep/issues/1277):
Document cygwin path translation behavior in the FAQ.
* [BUG #1739](https://github.com/BurntSushi/ripgrep/issues/1739):
Fix bug where replacements were buggy if the regex matched a line terminator.
* [BUG #1311](https://github.com/BurntSushi/ripgrep/issues/1311):
Fix multi-line bug where a search & replace for `\n` didn't work as expected.
* [BUG #1401](https://github.com/BurntSushi/ripgrep/issues/1401):
Fix buggy interaction between PCRE2 look-around and `-o/--only-matching`.
* [BUG #1412](https://github.com/BurntSushi/ripgrep/issues/1412):
Fix multi-line bug with searches using look-around past matching lines.
* [BUG #1577](https://github.com/BurntSushi/ripgrep/issues/1577):
Fish shell completions will continue to be auto-generated.
* [BUG #1642](https://github.com/BurntSushi/ripgrep/issues/1642):
Fixes a bug where using `-m` and `-A` printed more matches than the limit.
* [BUG #1703](https://github.com/BurntSushi/ripgrep/issues/1703):
Clarify the function of `-u/--unrestricted`.
* [BUG #1708](https://github.com/BurntSushi/ripgrep/issues/1708):
Clarify how `-S/--smart-case` works.
* [BUG #1730](https://github.com/BurntSushi/ripgrep/issues/1730):
Clarify that CLI invocation must always be valid, regardless of config file.
* [BUG #1741](https://github.com/BurntSushi/ripgrep/issues/1741):
Fix stdin detection when using PowerShell in UNIX environments.
* [BUG #1756](https://github.com/BurntSushi/ripgrep/pull/1756):
Fix bug where `foo/**` would match `foo`, but it shouldn't.
* [BUG #1765](https://github.com/BurntSushi/ripgrep/issues/1765):
Fix panic when `--crlf` is used in some cases.
* [BUG #1638](https://github.com/BurntSushi/ripgrep/issues/1638):
Correctly sniff UTF-8 and do transcoding, like we do for UTF-16.
* [BUG #1816](https://github.com/BurntSushi/ripgrep/issues/1816):
Add documentation for glob alternate syntax, e.g., `{a,b,..}`.
* [BUG #1847](https://github.com/BurntSushi/ripgrep/issues/1847):
Clarify how the `--hidden` flag works.
* [BUG #1866](https://github.com/BurntSushi/ripgrep/issues/1866#issuecomment-841635553):
Fix bug when computing column numbers in `--vimgrep` mode.
* [BUG #1868](https://github.com/BurntSushi/ripgrep/issues/1868):
Fix bug where `--passthru` and `-A/-B/-C` did not override each other.
* [BUG #1869](https://github.com/BurntSushi/ripgrep/pull/1869):
Clarify docs for `--files-with-matches` and `--files-without-match`.
* [BUG #1878](https://github.com/BurntSushi/ripgrep/issues/1878):
Fix bug where `\A` could produce unanchored matches in multiline search.
* [BUG 94e4b8e3](https://github.com/BurntSushi/ripgrep/commit/94e4b8e3):
Fix column numbers with `--vimgrep` is used with `-U/--multiline`.
12.1.1 (2020-05-29)

227
Cargo.lock generated
View File

@@ -1,10 +1,12 @@
# This file is automatically @generated by Cargo.
# It is not intended for manual editing.
version = 3
[[package]]
name = "aho-corasick"
version = "0.7.10"
version = "0.7.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8716408b8bc624ed7f65d223ddb9ac2d044c0547b6fa4b0d554f3a9540496ada"
checksum = "1e37cfd5e7657ada45f742d6e99ca5788580b5c529dc78faf11ece6dc702656f"
dependencies = [
"memchr",
]
@@ -20,29 +22,23 @@ dependencies = [
"winapi",
]
[[package]]
name = "autocfg"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f8aac770f1885fd7e387acedd76065302551364496e46b3dd00860b2f8359b9d"
[[package]]
name = "base64"
version = "0.12.1"
version = "0.13.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "53d1ccbaf7d9ec9537465a97bf19edc1a4e158ecb49fc16178202238c569cc42"
checksum = "904dfeac50f3cdaba28fc6f57fdcddb75f49ed61346676a78c4ffe55877802fd"
[[package]]
name = "bitflags"
version = "1.2.1"
version = "1.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cf1de2fe8c75bc145a2f577add951f8134889b4795d47466a54a5c846d691693"
checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a"
[[package]]
name = "bstr"
version = "0.2.13"
version = "0.2.17"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "31accafdb70df7871592c058eca3985b71104e15ac32f64706022c58867da931"
checksum = "ba3569f383e8f1598449f1a423e72e99569137b47740b1da11ef19af3d5c3223"
dependencies = [
"lazy_static",
"memchr",
@@ -51,36 +47,30 @@ dependencies = [
[[package]]
name = "bytecount"
version = "0.6.0"
version = "0.6.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b0017894339f586ccb943b01b9555de56770c11cda818e7e3d8bd93f4ed7f46e"
[[package]]
name = "byteorder"
version = "1.3.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "08c48aae112d48ed9f069b33538ea9e3e90aa263cfa3d1c24309612b1f7472de"
checksum = "72feb31ffc86498dacdbd0fcebb56138e7177a8cc5cea4516031d15ae85a742e"
[[package]]
name = "cc"
version = "1.0.54"
version = "1.0.73"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7bbb73db36c1246e9034e307d0fba23f9a2e251faa47ade70c1bd252220c8311"
checksum = "2fff2a6927b3bb87f9595d67196a70493f627687a71d87a0d692242c33f58c11"
dependencies = [
"jobserver",
]
[[package]]
name = "cfg-if"
version = "0.1.10"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4785bdd1c96b2a846b2bd7cc02e86b6b3dbf14e7e53446c4f54c92a361040822"
checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
[[package]]
name = "clap"
version = "2.33.1"
version = "2.34.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bdfa80d47f954d53a35a64987ca1422f495b8d6483c0fe9f7117b36c2a792129"
checksum = "a0610544180c38b88101fecf2dd634b174a62eef6946f84dfc6a7127512b381c"
dependencies = [
"bitflags",
"strsim",
@@ -90,33 +80,32 @@ dependencies = [
[[package]]
name = "crossbeam-channel"
version = "0.4.2"
version = "0.5.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cced8691919c02aac3cb0a1bc2e9b73d89e832bf9a06fc579d4e71b68a2da061"
checksum = "5aaa7bd5fb665c6864b5f963dd9097905c54125909c7aa94c9e18507cdbe6c53"
dependencies = [
"cfg-if",
"crossbeam-utils",
"maybe-uninit",
]
[[package]]
name = "crossbeam-utils"
version = "0.7.2"
version = "0.8.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c3c7c73a2d1e9fc0886a08b93e98eb643461230d5f1925e4036204d5f2e261a8"
checksum = "0bf124c720b7686e3c2663cf54062ab0f68a88af2fb6a030e87e30bf721fcb38"
dependencies = [
"autocfg",
"cfg-if",
"lazy_static",
]
[[package]]
name = "encoding_rs"
version = "0.8.23"
version = "0.8.30"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e8ac63f94732332f44fe654443c46f6375d1939684c17b0afb6cb56b0456e171"
checksum = "7896dc8abb250ffdda33912550faa54c88ec8b998dec0b2c55ab224921ce11df"
dependencies = [
"cfg-if",
"packed_simd",
"packed_simd_2",
]
[[package]]
@@ -136,9 +125,9 @@ checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1"
[[package]]
name = "fs_extra"
version = "1.1.0"
version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5f2a4a2034423744d2cc7ca2068453168dcdb82c438419e639a26bd87839c674"
checksum = "2022715d62ab30faffd124d40b76f4134a550a87792276512b18d63272333394"
[[package]]
name = "glob"
@@ -148,7 +137,7 @@ checksum = "9b919933a397b79c37e33b77bb2aa3dc8eb6e165ad809e58ff75bc7db2e34574"
[[package]]
name = "globset"
version = "0.4.6"
version = "0.4.9"
dependencies = [
"aho-corasick",
"bstr",
@@ -163,7 +152,7 @@ dependencies = [
[[package]]
name = "grep"
version = "0.2.7"
version = "0.2.9"
dependencies = [
"grep-cli",
"grep-matcher",
@@ -177,7 +166,7 @@ dependencies = [
[[package]]
name = "grep-cli"
version = "0.1.5"
version = "0.1.6"
dependencies = [
"atty",
"bstr",
@@ -192,7 +181,7 @@ dependencies = [
[[package]]
name = "grep-matcher"
version = "0.1.4"
version = "0.1.5"
dependencies = [
"memchr",
"regex",
@@ -200,7 +189,7 @@ dependencies = [
[[package]]
name = "grep-pcre2"
version = "0.1.4"
version = "0.1.5"
dependencies = [
"grep-matcher",
"pcre2",
@@ -208,7 +197,7 @@ dependencies = [
[[package]]
name = "grep-printer"
version = "0.1.5"
version = "0.1.6"
dependencies = [
"base64",
"bstr",
@@ -216,14 +205,13 @@ dependencies = [
"grep-regex",
"grep-searcher",
"serde",
"serde_derive",
"serde_json",
"termcolor",
]
[[package]]
name = "grep-regex"
version = "0.1.8"
version = "0.1.10"
dependencies = [
"aho-corasick",
"bstr",
@@ -236,7 +224,7 @@ dependencies = [
[[package]]
name = "grep-searcher"
version = "0.1.7"
version = "0.1.10"
dependencies = [
"bstr",
"bytecount",
@@ -245,22 +233,22 @@ dependencies = [
"grep-matcher",
"grep-regex",
"log",
"memmap",
"memmap2",
"regex",
]
[[package]]
name = "hermit-abi"
version = "0.1.13"
version = "0.1.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "91780f809e750b0a89f5544be56617ff6b1227ee485bcb06ebe10cdf89bd3b71"
checksum = "62b467343b94ba476dcb2500d242dadbb39557df889310ac77c5d99100aaac33"
dependencies = [
"libc",
]
[[package]]
name = "ignore"
version = "0.4.16"
version = "0.4.18"
dependencies = [
"crossbeam-channel",
"crossbeam-utils",
@@ -277,9 +265,9 @@ dependencies = [
[[package]]
name = "itoa"
version = "0.4.5"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b8b7a7c0c47db5545ed3fef7468ee7bb5b74691498139e4b3f6a20685dc6dd8e"
checksum = "1aab8fc367588b89dcee83ab0fd66b72b50b72fa1904d7095045ace2b0c81c35"
[[package]]
name = "jemalloc-sys"
@@ -304,9 +292,9 @@ dependencies = [
[[package]]
name = "jobserver"
version = "0.1.21"
version = "0.1.24"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5c71313ebb9439f74b00d9d2dcec36440beaf57a6aa0623068441dd7cd81a7f2"
checksum = "af25a77299a7f711a01975c35a6a424eb6862092cc2d6c72c4ed6cbc56dfc1fa"
dependencies = [
"libc",
]
@@ -319,58 +307,64 @@ checksum = "e2abad23fbc42b3700f2f279844dc832adb2b2eb069b2df918f455c4e18cc646"
[[package]]
name = "libc"
version = "0.2.71"
version = "0.2.121"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9457b06509d27052635f90d6466700c65095fdf75409b3fbdd903e988b886f49"
checksum = "efaa7b300f3b5fe8eb6bf21ce3895e1751d9665086af2d64b42f19701015ff4f"
[[package]]
name = "libm"
version = "0.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7fc7aa29613bd6a620df431842069224d8bc9011086b1db4c0e0cd47fa03ec9a"
[[package]]
name = "log"
version = "0.4.8"
version = "0.4.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "14b6052be84e6b71ab17edffc2eeabf5c2c3ae1fdb464aae35ac50c67a44e1f7"
checksum = "51b9bbe6c47d51fc3e1a9b945965946b4c44142ab8792c50835a980d362c2710"
dependencies = [
"cfg-if",
]
[[package]]
name = "maybe-uninit"
version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "60302e4db3a61da70c0cb7991976248362f30319e88850c487b9b95bbf059e00"
[[package]]
name = "memchr"
version = "2.3.3"
version = "2.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3728d817d99e5ac407411fa471ff9800a778d88a24685968b36824eaf4bee400"
checksum = "308cc39be01b73d0d18f82a0e7b2a3df85245f84af96fdddc5d202d27e47b86a"
[[package]]
name = "memmap"
version = "0.7.0"
name = "memmap2"
version = "0.5.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6585fd95e7bb50d6cc31e20d4cf9afb4e2ba16c5846fc76793f11218da9c475b"
checksum = "057a3db23999c867821a7a59feb06a578fcb03685e983dff90daf9e7d24ac08f"
dependencies = [
"libc",
"winapi",
]
[[package]]
name = "num_cpus"
version = "1.13.0"
version = "1.13.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "05499f3756671c15885fee9034446956fff3f243d6077b91e5767df161f766b3"
checksum = "19e64526ebdee182341572e50e9ad03965aa510cd94427a4549448f285e957a1"
dependencies = [
"hermit-abi",
"libc",
]
[[package]]
name = "packed_simd"
version = "0.3.3"
name = "once_cell"
version = "1.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a85ea9fc0d4ac0deb6fe7911d38786b32fc11119afd9e9d38b84ff691ce64220"
checksum = "87f3e037eac156d1775da914196f0f37741a274155e34a0b7e427c35d2a2ecb9"
[[package]]
name = "packed_simd_2"
version = "0.3.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a1914cd452d8fccd6f9db48147b29fd4ae05bea9dc5d9ad578509f72415de282"
dependencies = [
"cfg-if",
"libm",
]
[[package]]
@@ -398,58 +392,54 @@ dependencies = [
[[package]]
name = "pkg-config"
version = "0.3.17"
version = "0.3.24"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "05da548ad6865900e60eaba7f589cc0783590a92e940c26953ff81ddbab2d677"
checksum = "58893f751c9b0412871a09abd62ecd2a00298c6c83befa223ef98c52aef40cbe"
[[package]]
name = "proc-macro2"
version = "1.0.17"
version = "1.0.36"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1502d12e458c49a4c9cbff560d0fe0060c252bc29799ed94ca2ed4bb665a0101"
checksum = "c7342d5883fbccae1cc37a2353b09c87c9b0f3afd73f5fb9bba687a1f733b029"
dependencies = [
"unicode-xid",
]
[[package]]
name = "quote"
version = "1.0.6"
version = "1.0.16"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "54a21852a652ad6f610c9510194f398ff6f8692e334fd1145fed931f7fbe44ea"
checksum = "b4af2ec4714533fcdf07e886f17025ace8b997b9ce51204ee69b6da831c3da57"
dependencies = [
"proc-macro2",
]
[[package]]
name = "regex"
version = "1.3.9"
version = "1.5.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9c3780fcf44b193bc4d09f36d2a3c87b251da4a046c87795a0d35f4f927ad8e6"
checksum = "1a11647b6b25ff05a515cb92c365cec08801e83423a235b51e231e1808747286"
dependencies = [
"aho-corasick",
"memchr",
"regex-syntax",
"thread_local",
]
[[package]]
name = "regex-automata"
version = "0.1.9"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ae1ded71d66a4a97f5e961fd0cb25a5f366a42a41570d16a763a69c092c26ae4"
dependencies = [
"byteorder",
]
checksum = "6c230d73fb8d8c1b9c0b3135c5142a8acee3a0558fb8db5cf1cb65f8d7862132"
[[package]]
name = "regex-syntax"
version = "0.6.18"
version = "0.6.25"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "26412eb97c6b088a6997e05f69403a802a92d520de2f8e63c2b65f9e0f47c4e8"
checksum = "f497285884f3fcff424ffc933e56d7cbca511def0c9831a7f9b5f6153e3cc89b"
[[package]]
name = "ripgrep"
version = "12.1.1"
version = "13.0.0"
dependencies = [
"bstr",
"clap",
@@ -469,9 +459,9 @@ dependencies = [
[[package]]
name = "ryu"
version = "1.0.4"
version = "1.0.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ed3d612bc64430efeb3f7ee6ef26d590dce0c43249217bddc62112540c7941e1"
checksum = "73b4b750c782965c211b42f022f59af1fbceabdd026623714f104152f1ec149f"
[[package]]
name = "same-file"
@@ -484,15 +474,18 @@ dependencies = [
[[package]]
name = "serde"
version = "1.0.110"
version = "1.0.136"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "99e7b308464d16b56eba9964e4972a3eee817760ab60d88c3f86e1fecb08204c"
checksum = "ce31e24b01e1e524df96f1c2fdd054405f8d7376249a5110886fb4b658484789"
dependencies = [
"serde_derive",
]
[[package]]
name = "serde_derive"
version = "1.0.110"
version = "1.0.136"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "818fbf6bfa9a42d3bfcaca148547aa00c7b915bec71d1757aa2d44ca68771984"
checksum = "08597e7152fcd306f41838ed3e37be9eaeed2b61c42e2117266a554fab4662f9"
dependencies = [
"proc-macro2",
"quote",
@@ -501,9 +494,9 @@ dependencies = [
[[package]]
name = "serde_json"
version = "1.0.53"
version = "1.0.79"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "993948e75b189211a9b31a7528f950c6adc21f9720b6438ff80a7fa2f864cea2"
checksum = "8e8d9fa5c3b304765ce1fd9c4c8a3de2c8db365a5b91be52f186efc675681d95"
dependencies = [
"itoa",
"ryu",
@@ -518,9 +511,9 @@ checksum = "8ea5119cdb4c55b55d432abb513a0429384878c15dde60cc77b1c99de1a95a6a"
[[package]]
name = "syn"
version = "1.0.27"
version = "1.0.89"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ef781e621ee763a2a40721a8861ec519cb76966aee03bb5d00adb6a31dc1c1de"
checksum = "ea297be220d52398dcc07ce15a209fce436d361735ac1db700cab3b6cdfb9f54"
dependencies = [
"proc-macro2",
"quote",
@@ -529,9 +522,9 @@ dependencies = [
[[package]]
name = "termcolor"
version = "1.1.0"
version = "1.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bb6bfa289a4d7c5766392812c0a1f4c1ba45afa1ad47803c11e1f407d846d75f"
checksum = "bab24d30b911b2376f3a13cc2cd443142f0c81dda04c118693e35b3835757755"
dependencies = [
"winapi-util",
]
@@ -547,30 +540,30 @@ dependencies = [
[[package]]
name = "thread_local"
version = "1.0.1"
version = "1.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d40c6d1b69745a6ec6fb1ca717914848da4b44ae29d9b3080cbee91d72a69b14"
checksum = "5516c27b78311c50bf42c071425c560ac799b11c30b31f87e3081965fe5e0180"
dependencies = [
"lazy_static",
"once_cell",
]
[[package]]
name = "unicode-width"
version = "0.1.7"
version = "0.1.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "caaa9d531767d1ff2150b9332433f32a24622147e5ebb1f26409d5da67afd479"
checksum = "3ed742d4ea2bd1176e236172c8429aaf54486e7ac098db29ffe6529e0ce50973"
[[package]]
name = "unicode-xid"
version = "0.2.0"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "826e7639553986605ec5979c7dd957c7895e93eabed50ab2ffa7f6128a75097c"
checksum = "8ccb82d61f80a663efe1f787a51b16b5a51e3314d6ac365b08639f52387b33f3"
[[package]]
name = "walkdir"
version = "2.3.1"
version = "2.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "777182bc735b6424e1a57516d35ed72cb8019d85c8c9bf536dccb3445c1a2f7d"
checksum = "808cf2735cd4b6866113f648b791c6adc5714537bc222d9347bb203386ffda56"
dependencies = [
"same-file",
"winapi",
@@ -579,9 +572,9 @@ dependencies = [
[[package]]
name = "winapi"
version = "0.3.8"
version = "0.3.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8093091eeb260906a183e6ae1abdba2ef5ef2257a21801128899c3fc699229c6"
checksum = "5c839a674fcd7a98952e593242ea400abe93992746761e38641405d28b00f419"
dependencies = [
"winapi-i686-pc-windows-gnu",
"winapi-x86_64-pc-windows-gnu",

View File

@@ -1,16 +1,15 @@
[package]
name = "ripgrep"
version = "12.1.1" #:version
version = "13.0.0" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
ripgrep is a line-oriented search tool that recursively searches your current
directory for a regex pattern while respecting your gitignore rules. ripgrep
has first class support on Windows, macOS and Linux.
ripgrep is a line-oriented search tool that recursively searches the current
directory for a regex pattern while respecting gitignore rules. ripgrep has
first class support on Windows, macOS and Linux.
"""
documentation = "https://github.com/BurntSushi/ripgrep"
homepage = "https://github.com/BurntSushi/ripgrep"
repository = "https://github.com/BurntSushi/ripgrep"
readme = "README.md"
keywords = ["regex", "grep", "egrep", "search", "pattern"]
categories = ["command-line-utilities", "text-processing"]
license = "Unlicense OR MIT"
@@ -43,8 +42,8 @@ members = [
[dependencies]
bstr = "0.2.12"
grep = { version = "0.2.7", path = "crates/grep" }
ignore = { version = "0.4.16", path = "crates/ignore" }
grep = { version = "0.2.8", path = "crates/grep" }
ignore = { version = "0.4.18", path = "crates/ignore" }
lazy_static = "1.1.0"
log = "0.4.5"
num_cpus = "1.8.0"

2
FAQ.md
View File

@@ -208,7 +208,7 @@ The `--color` flag accepts one of the following possible values: `never`,
ripgrep to only enable colors when it is printing to a terminal. But if you
pipe ripgrep to a file or some other process, then it will suppress colors.
The --colors` flag is a bit more complicated. The general format is:
The `--colors` flag is a bit more complicated. The general format is:
```
--colors '{type}:{attribute}:{value}'

View File

@@ -177,15 +177,19 @@ After recursive search, ripgrep's most important feature is what it *doesn't*
search. By default, when you search a directory, ripgrep will ignore all of
the following:
1. Files and directories that match the rules in your `.gitignore` glob
pattern.
1. Files and directories that match glob patterns in these three categories:
1. gitignore globs (including global and repo-specific globs).
2. `.ignore` globs, which take precedence over all gitignore globs
when there's a conflict.
3. `.rgignore` globs, which take precedence over all `.ignore` globs
when there's a conflict.
2. Hidden files and directories.
3. Binary files. (ripgrep considers any file with a `NUL` byte to be binary.)
4. Symbolic links aren't followed.
All of these things can be toggled using various flags provided by ripgrep:
1. You can disable `.gitignore` handling with the `--no-ignore` flag.
1. You can disable all ignore-related filtering with the `--no-ignore` flag.
2. Hidden files and directories can be searched with the `--hidden` flag.
3. Binary files can be searched via the `--text` (`-a` for short) flag.
Be careful with this flag! Binary files may emit control characters to your
@@ -644,9 +648,9 @@ given, which is the default:
they correspond to a UTF-16 BOM, then ripgrep will transcode the contents of
the file from UTF-16 to UTF-8, and then execute the search on the transcoded
version of the file. (This incurs a performance penalty since transcoding
is slower than regex searching.) If the file contains invalid UTF-16, then
the Unicode replacement codepoint is substituted in place of invalid code
units.
is needed in addition to regex searching.) If the file contains invalid
UTF-16, then the Unicode replacement codepoint is substituted in place of
invalid code units.
* To handle other cases, ripgrep provides a `-E/--encoding` flag, which permits
you to specify an encoding from the
[Encoding Standard](https://encoding.spec.whatwg.org/#concept-encoding-get).
@@ -988,6 +992,8 @@ used options that will likely impact how you use ripgrep on a regular basis.
* `-S/--smart-case`: This is similar to `--ignore-case`, but disables itself
if the pattern contains any uppercase letters. Usually this flag is put into
alias or a config file.
* `-F/--fixed-strings`: Disable regular expression matching and treat the pattern
as a literal string.
* `-w/--word-regexp`: Require that all matches of the pattern be surrounded
by word boundaries. That is, given `pattern`, the `--word-regexp` flag will
cause ripgrep to behave as if `pattern` were actually `\b(?:pattern)\b`.

View File

@@ -1,7 +1,7 @@
ripgrep (rg)
------------
ripgrep is a line-oriented search tool that recursively searches your current
directory for a regex pattern. By default, ripgrep will respect your .gitignore
ripgrep is a line-oriented search tool that recursively searches the current
directory for a regex pattern. By default, ripgrep will respect gitignore rules
and automatically skip hidden files/directories and binary files. ripgrep
has first class support on Windows, macOS and Linux, with binary downloads
available for [every release](https://github.com/BurntSushi/ripgrep/releases).
@@ -192,15 +192,9 @@ multiline search and opt-in fancy regex support via PCRE2.
The binary name for ripgrep is `rg`.
**[Archives of precompiled binaries for ripgrep are available for Windows,
macOS and Linux.](https://github.com/BurntSushi/ripgrep/releases)** Users of
platforms not explicitly mentioned below are advised to download one of these
archives.
Linux binaries are static executables. Windows binaries are available either as
built with MinGW (GNU) or with Microsoft Visual C++ (MSVC). When possible,
prefer MSVC over GNU, but you'll need to have the [Microsoft VC++ 2015
redistributable](https://www.microsoft.com/en-us/download/details.aspx?id=48145)
installed.
macOS and Linux.](https://github.com/BurntSushi/ripgrep/releases)** Linux and
Windows binaries are static executables. Users of platforms not explicitly
mentioned below are advised to download one of these archives.
If you're a **macOS Homebrew** or a **Linuxbrew** user, then you can install
ripgrep from homebrew-core:
@@ -278,8 +272,8 @@ then ripgrep can be installed using a binary `.deb` file provided in each
[ripgrep release](https://github.com/BurntSushi/ripgrep/releases).
```
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/11.0.2/ripgrep_11.0.2_amd64.deb
$ sudo dpkg -i ripgrep_11.0.2_amd64.deb
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/13.0.0/ripgrep_13.0.0_amd64.deb
$ sudo dpkg -i ripgrep_13.0.0_amd64.deb
```
If you run Debian Buster (currently Debian stable) or Debian sid, ripgrep is
@@ -309,14 +303,14 @@ If you're a **FreeBSD** user, then you can install ripgrep from the
```
If you're an **OpenBSD** user, then you can install ripgrep from the
[official ports](http://openports.se/textproc/ripgrep):
[official ports](https://openports.se/textproc/ripgrep):
```
$ doas pkg_add ripgrep
```
If you're a **NetBSD** user, then you can install ripgrep from
[pkgsrc](http://pkgsrc.se/textproc/ripgrep):
[pkgsrc](https://pkgsrc.se/textproc/ripgrep):
```
# pkgin install ripgrep
@@ -425,9 +419,18 @@ $ cargo test --all
from the repository root.
### Vulnerability reporting
For reporting a security vulnerability, please
[contact Andrew Gallant](https://blog.burntsushi.net/about/),
which has my email address and PGP public key if you wish to send an encrypted
message.
### Translations
The following is a list of known translations of ripgrep's documentation. These
are unofficially maintained and may not be up to date.
* [Chinese](https://github.com/chinanf-boy/ripgrep-zh#%E6%9B%B4%E6%96%B0-)
* [Spanish](https://github.com/UltiRequiem/traducciones/tree/master/ripgrep)

View File

@@ -1,9 +1,11 @@
Release Checklist
-----------------
* Ensure local `master` is up to date with respect to `origin/master`.
* Run `cargo update` and review dependency updates. Commit updated
`Cargo.lock`.
* Run `cargo outdated` and review semver incompatible updates. Unless there is
a strong motivation otherwise, review and update every dependency.
a strong motivation otherwise, review and update every dependency. Also
run `--aggressive`, but don't update to crates that are still in beta.
* Review changes for every crate in `crates` since the last ripgrep release.
If the set of changes is non-empty, issue a new release for that crate. Check
crates in the following order. After updating a crate, ensure minimal
@@ -24,14 +26,19 @@ Release Checklist
`cargo update -p ripgrep` so that the `Cargo.lock` is updated. Commit the
changes and create a new signed tag. Alternatively, use
`cargo-up --no-push --no-release Cargo.toml {VERSION}` to automate this.
* Push changes to GitHub, NOT including the tag. (But do not publish new
version of ripgrep to crates.io yet.)
* Once CI for `master` finishes successfully, push the version tag. (Trying to
do this in one step seems to result in GitHub Actions not seeing the tag
push and thus not running the release workflow.)
* Wait for CI to finish creating the release. If the release build fails, then
delete the tag from GitHub, make fixes, re-tag, delete the release and push.
* Copy the relevant section of the CHANGELOG to the tagged release notes.
Include this blurb describing what ripgrep is:
> In case you haven't heard of it before, ripgrep is a line-oriented search
> tool that recursively searches your current directory for a regex pattern.
> By default, ripgrep will respect your gitignore rules and automatically
> skip hidden files/directories and binary files.
> tool that recursively searches the current directory for a regex pattern.
> By default, ripgrep will respect gitignore rules and automatically skip
> hidden files/directories and binary files.
* Run `ci/build-deb` locally and manually upload the deb package to the
release.
* Run `cargo publish`.

View File

@@ -17,16 +17,21 @@ if ! command -V cargo-deb > /dev/null 2>&1; then
exit 1
fi
if ! command -V asciidoctor > /dev/null 2>&1; then
echo "asciidoctor command missing" >&2
exit 1
fi
# 'cargo deb' does not seem to provide a way to specify an asset that is
# created at build time, such as ripgrep's man page. To work around this,
# we force a debug build, copy out the man page (and shell completions)
# produced from that build, put it into a predictable location and then build
# the deb, which knows where to look.
cargo build
DEPLOY_DIR=deployment/deb
OUT_DIR="$("$D"/cargo-out-dir target/debug/)"
mkdir -p "$DEPLOY_DIR"
cargo build
# Copy man page and shell completions.
cp "$OUT_DIR"/{rg.1,rg.bash,rg.fish} "$DEPLOY_DIR/"
@@ -34,4 +39,4 @@ cp complete/_rg "$DEPLOY_DIR/"
# Since we're distributing the dpkg, we don't know whether the user will have
# PCRE2 installed, so just do a static build.
PCRE2_SYS_STATIC=1 cargo deb
PCRE2_SYS_STATIC=1 cargo deb --target x86_64-unknown-linux-musl

View File

@@ -44,8 +44,8 @@ main() {
# Occasionally we may have to handle some manually, however
help_args=( ${(f)"$(
$rg --help |
$rg -i -- '^\s+--?[a-z0-9]|--[a-z]' |
$rg -ior '$1' -- $'[\t /\"\'`.,](-[a-z0-9]|--[a-z0-9-]+)\\b' |
$rg -i -- '^\s+--?[a-z0-9.]|--[a-z]' |
$rg -ior '$1' -- $'[\t /\"\'`.,](-[a-z0-9.]|--[a-z0-9-]+)(,|\\b)' |
$rg -v -- --print0 | # False positives
sort -u
)"} )

View File

@@ -1,6 +1,16 @@
#!/bin/sh
# This script gets run in weird environments that have been stripped of just
# about every inessential thing. In order to keep this script versatile, we
# just install 'sudo' and use it like normal if it doesn't exist. If it doesn't
# exist, we assume we're root. (Otherwise we ain't doing much of anything
# anyway.)
if ! command -V sudo; then
apt-get update
apt-get install -y --no-install-recommends sudo
fi
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
asciidoctor \
zsh xz-utils liblz4-tool musl-tools
zsh xz-utils liblz4-tool musl-tools \
brotli zstd

View File

@@ -99,9 +99,7 @@ is_osx() {
builder() {
if is_musl && is_x86_64; then
# cargo install cross
# To work around https://github.com/rust-embedded/cross/issues/357
cargo install --git https://github.com/rust-embedded/cross --force
cargo install cross
echo "cross"
else
echo "cargo"

View File

@@ -121,7 +121,7 @@ _rg() {
"(pretty-vimgrep)--no-heading[don't show matches grouped by file name]"
+ '(hidden)' # Hidden-file options
'--hidden[search hidden files and directories]'
{-.,--hidden}'[search hidden files and directories]'
$no"--no-hidden[don't search hidden files and directories]"
+ '(hybrid)' # hybrid regex options
@@ -303,6 +303,8 @@ _rg() {
'--context-separator=[specify string used to separate non-continuous context lines in output]:separator'
$no"--no-context-separator[don't print context separators]"
'--debug[show debug messages]'
'--field-context-separator[set string to delimit fields in context lines]'
'--field-match-separator[set string to delimit fields in matching lines]'
'--trace[show more verbose debug messages]'
'--dfa-size-limit=[specify upper size limit of generated DFA]:DFA size (bytes)'
"(1 stats)--files[show each file that would be searched (but don't search)]"

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-cli"
version = "0.1.5" #:version
version = "0.1.6" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Utilities for search oriented command line applications.
@@ -10,12 +10,13 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/cli"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/cli"
readme = "README.md"
keywords = ["regex", "grep", "cli", "utility", "util"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
atty = "0.2.11"
bstr = "0.2.0"
globset = { version = "0.4.5", path = "../globset" }
globset = { version = "0.4.9", path = "../globset" }
lazy_static = "1.1.0"
log = "0.4.5"
regex = "1.1"

View File

@@ -5,11 +5,10 @@ command line applications. This includes, but is not limited to, parsing hex
escapes, detecting whether stdin is readable and more. To the extent possible,
this crate strives for compatibility across Windows, macOS and Linux.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
[![](https://img.shields.io/crates/v/grep-cli.svg)](https://crates.io/crates/grep-cli)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
### Documentation
@@ -30,9 +29,3 @@ Add this to your `Cargo.toml`:
[dependencies]
grep-cli = "0.1"
```
and this to your crate root:
```rust
extern crate grep_cli;
```

View File

@@ -1,12 +1,12 @@
use std::ffi::{OsStr, OsString};
use std::fs::File;
use std::io;
use std::path::Path;
use std::path::{Path, PathBuf};
use std::process::Command;
use globset::{Glob, GlobSet, GlobSetBuilder};
use process::{CommandError, CommandReader, CommandReaderBuilder};
use crate::process::{CommandError, CommandReader, CommandReaderBuilder};
/// A builder for a matcher that determines which files get decompressed.
#[derive(Clone, Debug)]
@@ -24,7 +24,7 @@ struct DecompressionCommand {
/// The glob that matches this command.
glob: String,
/// The command or binary name.
bin: OsString,
bin: PathBuf,
/// The arguments to invoke with the command.
args: Vec<OsString>,
}
@@ -83,23 +83,60 @@ impl DecompressionMatcherBuilder {
///
/// The syntax for the glob is documented in the
/// [`globset` crate](https://docs.rs/globset/#syntax).
///
/// The `program` given is resolved with respect to `PATH` and turned
/// into an absolute path internally before being executed by the current
/// platform. Notably, on Windows, this avoids a security problem where
/// passing a relative path to `CreateProcess` will automatically search
/// the current directory for a matching program. If the program could
/// not be resolved, then it is silently ignored and the association is
/// dropped. For this reason, callers should prefer `try_associate`.
pub fn associate<P, I, A>(
&mut self,
glob: &str,
program: P,
args: I,
) -> &mut DecompressionMatcherBuilder
where
P: AsRef<OsStr>,
I: IntoIterator<Item = A>,
A: AsRef<OsStr>,
{
let _ = self.try_associate(glob, program, args);
self
}
/// Associates a glob with a command to decompress files matching the glob.
///
/// If multiple globs match the same file, then the most recently added
/// glob takes precedence.
///
/// The syntax for the glob is documented in the
/// [`globset` crate](https://docs.rs/globset/#syntax).
///
/// The `program` given is resolved with respect to `PATH` and turned
/// into an absolute path internally before being executed by the current
/// platform. Notably, on Windows, this avoids a security problem where
/// passing a relative path to `CreateProcess` will automatically search
/// the current directory for a matching program. If the program could not
/// be resolved, then an error is returned.
pub fn try_associate<P, I, A>(
&mut self,
glob: &str,
program: P,
args: I,
) -> Result<&mut DecompressionMatcherBuilder, CommandError>
where
P: AsRef<OsStr>,
I: IntoIterator<Item = A>,
A: AsRef<OsStr>,
{
let glob = glob.to_string();
let bin = program.as_ref().to_os_string();
let bin = resolve_binary(Path::new(program.as_ref()))?;
let args =
args.into_iter().map(|a| a.as_ref().to_os_string()).collect();
self.commands.push(DecompressionCommand { glob, bin, args });
self
Ok(self)
}
}
@@ -193,7 +230,7 @@ impl DecompressionReaderBuilder {
match self.command_builder.build(&mut cmd) {
Ok(cmd_reader) => Ok(DecompressionReader { rdr: Ok(cmd_reader) }),
Err(err) => {
debug!(
log::debug!(
"{}: error spawning command '{:?}': {} \
(falling back to uncompressed reader)",
path.display(),
@@ -329,6 +366,30 @@ impl DecompressionReader {
let file = File::open(path)?;
Ok(DecompressionReader { rdr: Err(file) })
}
/// Closes this reader, freeing any resources used by its underlying child
/// process, if one was used. If the child process exits with a nonzero
/// exit code, the returned Err value will include its stderr.
///
/// `close` is idempotent, meaning it can be safely called multiple times.
/// The first call closes the CommandReader and any subsequent calls do
/// nothing.
///
/// This method should be called after partially reading a file to prevent
/// resource leakage. However there is no need to call `close` explicitly
/// if your code always calls `read` to EOF, as `read` takes care of
/// calling `close` in this case.
///
/// `close` is also called in `drop` as a last line of defense against
/// resource leakage. Any error from the child process is then printed as a
/// warning to stderr. This can be avoided by explicitly calling `close`
/// before the CommandReader is dropped.
pub fn close(&mut self) -> io::Result<()> {
match self.rdr {
Ok(ref mut rdr) => rdr.close(),
Err(_) => Ok(()),
}
}
}
impl io::Read for DecompressionReader {
@@ -340,6 +401,70 @@ impl io::Read for DecompressionReader {
}
}
/// Resolves a path to a program to a path by searching for the program in
/// `PATH`.
///
/// If the program could not be resolved, then an error is returned.
///
/// The purpose of doing this instead of passing the path to the program
/// directly to Command::new is that Command::new will hand relative paths
/// to CreateProcess on Windows, which will implicitly search the current
/// working directory for the executable. This could be undesirable for
/// security reasons. e.g., running ripgrep with the -z/--search-zip flag on an
/// untrusted directory tree could result in arbitrary programs executing on
/// Windows.
///
/// Note that this could still return a relative path if PATH contains a
/// relative path. We permit this since it is assumed that the user has set
/// this explicitly, and thus, desires this behavior.
///
/// On non-Windows, this is a no-op.
pub fn resolve_binary<P: AsRef<Path>>(
prog: P,
) -> Result<PathBuf, CommandError> {
use std::env;
fn is_exe(path: &Path) -> bool {
let md = match path.metadata() {
Err(_) => return false,
Ok(md) => md,
};
!md.is_dir()
}
let prog = prog.as_ref();
if !cfg!(windows) || prog.is_absolute() {
return Ok(prog.to_path_buf());
}
let syspaths = match env::var_os("PATH") {
Some(syspaths) => syspaths,
None => {
let msg = "system PATH environment variable not found";
return Err(CommandError::io(io::Error::new(
io::ErrorKind::Other,
msg,
)));
}
};
for syspath in env::split_paths(&syspaths) {
if syspath.as_os_str().is_empty() {
continue;
}
let abs_prog = syspath.join(prog);
if is_exe(&abs_prog) {
return Ok(abs_prog.to_path_buf());
}
if abs_prog.extension().is_none() {
let abs_prog = abs_prog.with_extension("exe");
if is_exe(&abs_prog) {
return Ok(abs_prog.to_path_buf());
}
}
}
let msg = format!("{}: could not find executable in PATH", prog.display());
return Err(CommandError::io(io::Error::new(io::ErrorKind::Other, msg)));
}
fn default_decompression_commands() -> Vec<DecompressionCommand> {
const ARGS_GZIP: &[&str] = &["gzip", "-d", "-c"];
const ARGS_BZIP: &[&str] = &["bzip2", "-d", "-c"];
@@ -350,29 +475,36 @@ fn default_decompression_commands() -> Vec<DecompressionCommand> {
const ARGS_ZSTD: &[&str] = &["zstd", "-q", "-d", "-c"];
const ARGS_UNCOMPRESS: &[&str] = &["uncompress", "-c"];
fn cmd(glob: &str, args: &[&str]) -> DecompressionCommand {
DecompressionCommand {
fn add(glob: &str, args: &[&str], cmds: &mut Vec<DecompressionCommand>) {
let bin = match resolve_binary(Path::new(args[0])) {
Ok(bin) => bin,
Err(err) => {
log::debug!("{}", err);
return;
}
};
cmds.push(DecompressionCommand {
glob: glob.to_string(),
bin: OsStr::new(&args[0]).to_os_string(),
bin,
args: args
.iter()
.skip(1)
.map(|s| OsStr::new(s).to_os_string())
.collect(),
}
});
}
vec![
cmd("*.gz", ARGS_GZIP),
cmd("*.tgz", ARGS_GZIP),
cmd("*.bz2", ARGS_BZIP),
cmd("*.tbz2", ARGS_BZIP),
cmd("*.xz", ARGS_XZ),
cmd("*.txz", ARGS_XZ),
cmd("*.lz4", ARGS_LZ4),
cmd("*.lzma", ARGS_LZMA),
cmd("*.br", ARGS_BROTLI),
cmd("*.zst", ARGS_ZSTD),
cmd("*.zstd", ARGS_ZSTD),
cmd("*.Z", ARGS_UNCOMPRESS),
]
let mut cmds = vec![];
add("*.gz", ARGS_GZIP, &mut cmds);
add("*.tgz", ARGS_GZIP, &mut cmds);
add("*.bz2", ARGS_BZIP, &mut cmds);
add("*.tbz2", ARGS_BZIP, &mut cmds);
add("*.xz", ARGS_XZ, &mut cmds);
add("*.txz", ARGS_XZ, &mut cmds);
add("*.lz4", ARGS_LZ4, &mut cmds);
add("*.lzma", ARGS_LZMA, &mut cmds);
add("*.br", ARGS_BROTLI, &mut cmds);
add("*.zst", ARGS_ZSTD, &mut cmds);
add("*.zstd", ARGS_ZSTD, &mut cmds);
add("*.Z", ARGS_UNCOMPRESS, &mut cmds);
cmds
}

View File

@@ -8,7 +8,7 @@ use regex::Regex;
/// An error that occurs when parsing a human readable size description.
///
/// This error provides an end user friendly message describing why the
/// description coudln't be parsed and what the expected format is.
/// description couldn't be parsed and what the expected format is.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct ParseSizeError {
original: String,
@@ -52,7 +52,7 @@ impl error::Error for ParseSizeError {
}
impl fmt::Display for ParseSizeError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
use self::ParseSizeErrorKind::*;
match self.kind {
@@ -88,7 +88,7 @@ impl From<ParseSizeError> for io::Error {
///
/// Additional suffixes may be added over time.
pub fn parse_human_readable_size(size: &str) -> Result<u64, ParseSizeError> {
lazy_static! {
lazy_static::lazy_static! {
// Normally I'd just parse something this simple by hand to avoid the
// regex dep, but we bring regex in any way for glob matching, so might
// as well use it.

View File

@@ -158,19 +158,6 @@ error message is crafted that typically tells the user how to fix the problem.
#![deny(missing_docs)]
extern crate atty;
extern crate bstr;
extern crate globset;
#[macro_use]
extern crate lazy_static;
#[macro_use]
extern crate log;
extern crate regex;
extern crate same_file;
extern crate termcolor;
#[cfg(windows)]
extern crate winapi_util;
mod decompress;
mod escape;
mod human;
@@ -178,18 +165,18 @@ mod pattern;
mod process;
mod wtr;
pub use decompress::{
DecompressionMatcher, DecompressionMatcherBuilder, DecompressionReader,
DecompressionReaderBuilder,
pub use crate::decompress::{
resolve_binary, DecompressionMatcher, DecompressionMatcherBuilder,
DecompressionReader, DecompressionReaderBuilder,
};
pub use escape::{escape, escape_os, unescape, unescape_os};
pub use human::{parse_human_readable_size, ParseSizeError};
pub use pattern::{
pub use crate::escape::{escape, escape_os, unescape, unescape_os};
pub use crate::human::{parse_human_readable_size, ParseSizeError};
pub use crate::pattern::{
pattern_from_bytes, pattern_from_os, patterns_from_path,
patterns_from_reader, patterns_from_stdin, InvalidPatternError,
};
pub use process::{CommandError, CommandReader, CommandReaderBuilder};
pub use wtr::{
pub use crate::process::{CommandError, CommandReader, CommandReaderBuilder};
pub use crate::wtr::{
stdout, stdout_buffered_block, stdout_buffered_line, StandardStream,
};
@@ -210,7 +197,7 @@ pub fn is_readable_stdin() -> bool {
Err(_) => return false,
Ok(md) => md.file_type(),
};
ft.is_file() || ft.is_fifo()
ft.is_file() || ft.is_fifo() || ft.is_socket()
}
#[cfg(windows)]
@@ -225,13 +212,13 @@ pub fn is_readable_stdin() -> bool {
!is_tty_stdin() && imp()
}
/// Returns true if and only if stdin is believed to be connectted to a tty
/// Returns true if and only if stdin is believed to be connected to a tty
/// or a console.
pub fn is_tty_stdin() -> bool {
atty::is(atty::Stream::Stdin)
}
/// Returns true if and only if stdout is believed to be connectted to a tty
/// Returns true if and only if stdout is believed to be connected to a tty
/// or a console.
///
/// This is useful for when you want your command line program to produce
@@ -243,7 +230,7 @@ pub fn is_tty_stdout() -> bool {
atty::is(atty::Stream::Stdout)
}
/// Returns true if and only if stderr is believed to be connectted to a tty
/// Returns true if and only if stderr is believed to be connected to a tty
/// or a console.
pub fn is_tty_stderr() -> bool {
atty::is(atty::Stream::Stderr)

View File

@@ -8,7 +8,7 @@ use std::str;
use bstr::io::BufReadExt;
use escape::{escape, escape_os};
use crate::escape::{escape, escape_os};
/// An error that occurs when a pattern could not be converted to valid UTF-8.
///
@@ -35,7 +35,7 @@ impl error::Error for InvalidPatternError {
}
impl fmt::Display for InvalidPatternError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(
f,
"found invalid UTF-8 in pattern at byte offset {}: {} \

View File

@@ -30,6 +30,14 @@ impl CommandError {
pub(crate) fn stderr(bytes: Vec<u8>) -> CommandError {
CommandError { kind: CommandErrorKind::Stderr(bytes) }
}
/// Returns true if and only if this error has empty data from stderr.
pub(crate) fn is_empty(&self) -> bool {
match self.kind {
CommandErrorKind::Stderr(ref bytes) => bytes.is_empty(),
_ => false,
}
}
}
impl error::Error for CommandError {
@@ -39,7 +47,7 @@ impl error::Error for CommandError {
}
impl fmt::Display for CommandError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self.kind {
CommandErrorKind::Io(ref e) => e.fmt(f),
CommandErrorKind::Stderr(ref bytes) => {
@@ -107,18 +115,12 @@ impl CommandReaderBuilder {
.stdout(process::Stdio::piped())
.stderr(process::Stdio::piped())
.spawn()?;
let stdout = child.stdout.take().unwrap();
let stderr = if self.async_stderr {
StderrReader::async(child.stderr.take().unwrap())
StderrReader::r#async(child.stderr.take().unwrap())
} else {
StderrReader::sync(child.stderr.take().unwrap())
};
Ok(CommandReader {
child: child,
stdout: stdout,
stderr: stderr,
done: false,
})
Ok(CommandReader { child, stderr, eof: false })
}
/// When enabled, the reader will asynchronously read the contents of the
@@ -175,9 +177,11 @@ impl CommandReaderBuilder {
#[derive(Debug)]
pub struct CommandReader {
child: process::Child,
stdout: process::ChildStdout,
stderr: StderrReader,
done: bool,
/// This is set to true once 'read' returns zero bytes. When this isn't
/// set and we close the reader, then we anticipate a pipe error when
/// reaping the child process and silence it.
eof: bool,
}
impl CommandReader {
@@ -201,23 +205,73 @@ impl CommandReader {
) -> Result<CommandReader, CommandError> {
CommandReaderBuilder::new().build(cmd)
}
/// Closes the CommandReader, freeing any resources used by its underlying
/// child process. If the child process exits with a nonzero exit code, the
/// returned Err value will include its stderr.
///
/// `close` is idempotent, meaning it can be safely called multiple times.
/// The first call closes the CommandReader and any subsequent calls do
/// nothing.
///
/// This method should be called after partially reading a file to prevent
/// resource leakage. However there is no need to call `close` explicitly
/// if your code always calls `read` to EOF, as `read` takes care of
/// calling `close` in this case.
///
/// `close` is also called in `drop` as a last line of defense against
/// resource leakage. Any error from the child process is then printed as a
/// warning to stderr. This can be avoided by explicitly calling `close`
/// before the CommandReader is dropped.
pub fn close(&mut self) -> io::Result<()> {
// Dropping stdout closes the underlying file descriptor, which should
// cause a well-behaved child process to exit. If child.stdout is None
// we assume that close() has already been called and do nothing.
let stdout = match self.child.stdout.take() {
None => return Ok(()),
Some(stdout) => stdout,
};
drop(stdout);
if self.child.wait()?.success() {
Ok(())
} else {
let err = self.stderr.read_to_end();
// In the specific case where we haven't consumed the full data
// from the child process, then closing stdout above results in
// a pipe signal being thrown in most cases. But I don't think
// there is any reliable and portable way of detecting it. Instead,
// if we know we haven't hit EOF (so we anticipate a broken pipe
// error) and if stderr otherwise doesn't have anything on it, then
// we assume total success.
if !self.eof && err.is_empty() {
return Ok(());
}
Err(io::Error::from(err))
}
}
}
impl Drop for CommandReader {
fn drop(&mut self) {
if let Err(error) = self.close() {
log::warn!("{}", error);
}
}
}
impl io::Read for CommandReader {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
if self.done {
return Ok(0);
}
let nread = self.stdout.read(buf)?;
let stdout = match self.child.stdout {
None => return Ok(0),
Some(ref mut stdout) => stdout,
};
let nread = stdout.read(buf)?;
if nread == 0 {
self.done = true;
// Reap the child now that we're done reading. If the command
// failed, report stderr as an error.
if !self.child.wait()?.success() {
return Err(io::Error::from(self.stderr.read_to_end()));
}
self.eof = true;
self.close().map(|_| 0)
} else {
Ok(nread)
}
Ok(nread)
}
}
@@ -231,7 +285,7 @@ enum StderrReader {
impl StderrReader {
/// Create a reader for stderr that reads contents asynchronously.
fn async(mut stderr: process::ChildStderr) -> StderrReader {
fn r#async(mut stderr: process::ChildStderr) -> StderrReader {
let handle =
thread::spawn(move || stderr_to_command_error(&mut stderr));
StderrReader::Async(Some(handle))

View File

@@ -2,7 +2,7 @@ use std::io;
use termcolor;
use is_tty_stdout;
use crate::is_tty_stdout;
/// A writer that supports coloring with either line or block buffering.
pub struct StandardStream(StandardStreamKind);

View File

@@ -13,8 +13,8 @@ use clap::{self, crate_authors, crate_version, App, AppSettings};
use lazy_static::lazy_static;
const ABOUT: &str = "
ripgrep (rg) recursively searches your current directory for a regex pattern.
By default, ripgrep will respect your .gitignore and automatically skip hidden
ripgrep (rg) recursively searches the current directory for a regex pattern.
By default, ripgrep will respect gitignore rules and automatically skip hidden
files/directories and binary files.
Use -h for short descriptions and --help for more details.
@@ -24,10 +24,13 @@ Project home page: https://github.com/BurntSushi/ripgrep
const USAGE: &str = "
rg [OPTIONS] PATTERN [PATH ...]
rg [OPTIONS] [-e PATTERN ...] [-f PATTERNFILE ...] [PATH ...]
rg [OPTIONS] -e PATTERN ... [PATH ...]
rg [OPTIONS] -f PATTERNFILE ... [PATH ...]
rg [OPTIONS] --files [PATH ...]
rg [OPTIONS] --type-list
command | rg [OPTIONS] PATTERN";
command | rg [OPTIONS] PATTERN
rg [OPTIONS] --help
rg [OPTIONS] --version";
const TEMPLATE: &str = "\
{bin} {version}
@@ -565,6 +568,8 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_dfa_size_limit(&mut args);
flag_encoding(&mut args);
flag_engine(&mut args);
flag_field_context_separator(&mut args);
flag_field_match_separator(&mut args);
flag_file(&mut args);
flag_files(&mut args);
flag_files_with_matches(&mut args);
@@ -693,7 +698,7 @@ fn flag_after_context(args: &mut Vec<RGArg>) {
"\
Show NUM lines after each match.
This overrides the --context flag.
This overrides the --context and --passthru flags.
"
);
let arg = RGArg::flag("after-context", "NUM")
@@ -701,6 +706,7 @@ This overrides the --context flag.
.help(SHORT)
.long_help(LONG)
.number()
.overrides("passthru")
.overrides("context");
args.push(arg);
}
@@ -762,7 +768,7 @@ fn flag_before_context(args: &mut Vec<RGArg>) {
"\
Show NUM lines before each match.
This overrides the --context flag.
This overrides the --context and --passthru flags.
"
);
let arg = RGArg::flag("before-context", "NUM")
@@ -770,6 +776,7 @@ This overrides the --context flag.
.help(SHORT)
.long_help(LONG)
.number()
.overrides("passthru")
.overrides("context");
args.push(arg);
}
@@ -1002,7 +1009,8 @@ fn flag_context(args: &mut Vec<RGArg>) {
Show NUM lines before and after each match. This is equivalent to providing
both the -B/--before-context and -A/--after-context flags with the same value.
This overrides both the -B/--before-context and -A/--after-context flags.
This overrides both the -B/--before-context and -A/--after-context flags,
in addition to the --passthru flag.
"
);
let arg = RGArg::flag("context", "NUM")
@@ -1010,6 +1018,7 @@ This overrides both the -B/--before-context and -A/--after-context flags.
.help(SHORT)
.long_help(LONG)
.number()
.overrides("passthru")
.overrides("before-context")
.overrides("after-context");
args.push(arg);
@@ -1048,11 +1057,13 @@ fn flag_count(args: &mut Vec<RGArg>) {
This flag suppresses normal output and shows the number of lines that match
the given patterns for each file searched. Each file containing a match has its
path and count printed on each line. Note that this reports the number of lines
that match and not the total number of matches.
that match and not the total number of matches, unless -U/--multiline is
enabled. In multiline mode, --count is equivalent to --count-matches.
If only one file is given to ripgrep, then only the count is printed if there
is a match. The --with-filename flag can be used to force printing the file
path in this case.
path in this case. If you need a count to be printed regardless of whether
there is a match, then use --include-zero.
This overrides the --count-matches flag. Note that when --count is combined
with --only-matching, then ripgrep behaves as if --count-matches was given.
@@ -1206,7 +1217,7 @@ between supported regex engines depending on the features used in a pattern on
a best effort basis.
Note that the 'pcre2' engine is an optional ripgrep feature. If PCRE2 wasn't
including in your build of ripgrep, then using this flag will result in ripgrep
included in your build of ripgrep, then using this flag will result in ripgrep
printing an error message and exiting.
This overrides previous uses of --pcre2 and --auto-hybrid-regex flags.
@@ -1224,6 +1235,38 @@ This overrides previous uses of --pcre2 and --auto-hybrid-regex flags.
args.push(arg);
}
fn flag_field_context_separator(args: &mut Vec<RGArg>) {
const SHORT: &str = "Set the field context separator.";
const LONG: &str = long!(
"\
Set the field context separator, which is used to delimit file paths, line
numbers, columns and the context itself, when printing contextual lines. The
separator may be any number of bytes, including zero. Escape sequences like
\\x7F or \\t may be used. The '-' character is the default value.
"
);
let arg = RGArg::flag("field-context-separator", "SEPARATOR")
.help(SHORT)
.long_help(LONG);
args.push(arg);
}
fn flag_field_match_separator(args: &mut Vec<RGArg>) {
const SHORT: &str = "Set the match separator.";
const LONG: &str = long!(
"\
Set the field match separator, which is used to delimit file paths, line
numbers, columns and the match itself. The separator may be any number of
bytes, including zero. Escape sequences like \\x7F or \\t may be used. The ':'
character is the default value.
"
);
let arg = RGArg::flag("field-match-separator", "SEPARATOR")
.help(SHORT)
.long_help(LONG);
args.push(arg);
}
fn flag_file(args: &mut Vec<RGArg>) {
const SHORT: &str = "Search for patterns from the given file.";
const LONG: &str = long!(
@@ -1263,10 +1306,10 @@ This is useful to determine whether a particular file is being searched or not.
}
fn flag_files_with_matches(args: &mut Vec<RGArg>) {
const SHORT: &str = "Only print the paths with at least one match.";
const SHORT: &str = "Print the paths with at least one match.";
const LONG: &str = long!(
"\
Only print the paths with at least one match.
Print the paths with at least one match and suppress match contents.
This overrides --files-without-match.
"
@@ -1280,11 +1323,11 @@ This overrides --files-without-match.
}
fn flag_files_without_match(args: &mut Vec<RGArg>) {
const SHORT: &str = "Only print the paths that contain zero matches.";
const SHORT: &str = "Print the paths that contain zero matches.";
const LONG: &str = long!(
"\
Only print the paths that contain zero matches. This inverts/negates the
--files-with-matches flag.
Print the paths that contain zero matches and suppress match contents. This
inverts/negates the --files-with-matches flag.
This overrides --files-with-matches.
"
@@ -1351,6 +1394,13 @@ used. Globbing rules match .gitignore globs. Precede a glob with a ! to exclude
it. If multiple globs match a file or directory, the glob given later in the
command line takes precedence.
As an extension, globs support specifying alternatives: *-g ab{c,d}* is
equivalent to *-g abc -g abd*. Empty alternatives like *-g ab{,c}* are not
currently supported. Note that this syntax extension is also currently enabled
in gitignore files, even though this syntax isn't supported by git itself.
ripgrep may disable this syntax extension in gitignore files, but it will
always remain available via the -g/--glob flag.
When this flag is set, every file and directory is applied to it to test for
a match. So for example, if you only want to search in a particular directory
'foo', then *-g foo* is incorrect because 'foo/bar' does not match the glob
@@ -1430,10 +1480,15 @@ Search hidden files and directories. By default, hidden files and directories
are skipped. Note that if a hidden file or a directory is whitelisted in an
ignore file, then it will be searched even if this flag isn't provided.
A file or directory is considered hidden if its base name starts with a dot
character ('.'). On operating systems which support a `hidden` file attribute,
like Windows, files with this attribute are also considered hidden.
This flag can be disabled with --no-hidden.
"
);
let arg = RGArg::switch("hidden")
.short(".")
.help(SHORT)
.long_help(LONG)
.overrides("no-hidden");
@@ -1493,7 +1548,7 @@ When specifying multiple ignore files, earlier files have lower precedence
than later files.
If you are looking for a way to include or exclude files and directories
directly on the command line, then used -g instead.
directly on the command line, then use -g instead.
"
);
let arg = RGArg::flag("ignore-file", "PATH")
@@ -1968,6 +2023,9 @@ fn flag_no_ignore_dot(args: &mut Vec<RGArg>) {
"\
Don't respect .ignore files.
This does *not* affect whether ripgrep will ignore files and directories
whose names begin with a dot. For that, see the -./--hidden flag.
This flag can be disabled with the --ignore-dot flag.
"
);
@@ -2338,12 +2396,17 @@ the empty string. For example, if you are searching using 'rg foo' then using
'rg \"^|foo\"' instead will emit every line in every file searched, but only
occurrences of 'foo' will be highlighted. This flag enables the same behavior
without needing to modify the pattern.
This overrides the --context, --after-context and --before-context flags.
"
);
let arg = RGArg::switch("passthru")
.help(SHORT)
.long_help(LONG)
.alias("passthrough");
.alias("passthrough")
.overrides("after-context")
.overrides("before-context")
.overrides("context");
args.push(arg);
}
@@ -2964,8 +3027,8 @@ fn flag_unrestricted(args: &mut Vec<RGArg>) {
"\
Reduce the level of \"smart\" searching. A single -u won't respect .gitignore
(etc.) files (--no-ignore). Two -u flags will additionally search hidden files
and directories (--hidden). Three -u flags will additionally search binary files
(--binary).
and directories (-./--hidden). Three -u flags will additionally search binary
files (--binary).
'rg -uuu' is roughly equivalent to 'grep -r'.
"

View File

@@ -104,7 +104,7 @@ struct ArgsImp {
///
/// It's important that this is only built once, since building this goes
/// through regex compilation and various types of analyses. That is, if
/// you need many of theses (one per thread, for example), it is better to
/// you need many of these (one per thread, for example), it is better to
/// build it once and then clone it.
matcher: PatternMatcher,
/// The paths provided at the command line. This is guaranteed to be
@@ -186,7 +186,7 @@ impl Args {
/// Returns true if and only if `paths` had to be populated with a default
/// path, which occurs only when no paths were given as command line
/// arguments.
fn using_default_path(&self) -> bool {
pub fn using_default_path(&self) -> bool {
self.0.using_default_path
}
@@ -290,7 +290,7 @@ impl Args {
let mut builder = SearchWorkerBuilder::new();
builder
.json_stats(matches.is_present("json"))
.preprocessor(matches.preprocessor())
.preprocessor(matches.preprocessor())?
.preprocessor_globs(matches.preprocessor_globs()?)
.search_zip(matches.is_present("search-zip"))
.binary_detection_implicit(matches.binary_detection_implicit())
@@ -777,6 +777,7 @@ impl ArgMatches {
.path(self.with_filename(paths))
.only_matching(self.is_present("only-matching"))
.per_match(self.is_present("vimgrep"))
.per_match_one_line(true)
.replacement(self.replacement())
.max_columns(self.max_columns()?)
.max_columns_preview(self.max_columns_preview())
@@ -786,8 +787,8 @@ impl ArgMatches {
.trim_ascii(self.is_present("trim"))
.separator_search(None)
.separator_context(self.context_separator())
.separator_field_match(b":".to_vec())
.separator_field_context(b"-".to_vec())
.separator_field_match(self.field_match_separator())
.separator_field_context(self.field_context_separator())
.separator_path(self.path_separator()?)
.path_terminator(self.path_terminator());
if separator_search {
@@ -1377,6 +1378,24 @@ impl ArgMatches {
}
}
/// Returns the unescaped field context separator. If one wasn't specified,
/// then '-' is used as the default.
fn field_context_separator(&self) -> Vec<u8> {
match self.value_of_os("field-context-separator") {
None => b"-".to_vec(),
Some(sep) => cli::unescape_os(&sep),
}
}
/// Returns the unescaped field match separator. If one wasn't specified,
/// then ':' is used as the default.
fn field_match_separator(&self) -> Vec<u8> {
match self.value_of_os("field-match-separator") {
None => b":".to_vec(),
Some(sep) => cli::unescape_os(&sep),
}
}
/// Get a sequence of all available patterns from the command line.
/// This includes reading the -e/--regexp and -f/--file flags.
///
@@ -1701,7 +1720,7 @@ impl ArgMatches {
self.0.value_of_os(name)
}
fn values_of_os(&self, name: &str) -> Option<clap::OsValues> {
fn values_of_os(&self, name: &str) -> Option<clap::OsValues<'_>> {
self.0.values_of_os(name)
}
}

View File

@@ -28,7 +28,10 @@ pub fn args() -> Vec<OsString> {
let (args, errs) = match parse(&config_path) {
Ok((args, errs)) => (args, errs),
Err(err) => {
message!("{}", err);
message!(
"failed to read the file specified in RIPGREP_CONFIG_PATH: {}",
err
);
return vec![];
}
};

View File

@@ -24,13 +24,13 @@ impl Logger {
}
impl Log for Logger {
fn enabled(&self, _: &log::Metadata) -> bool {
fn enabled(&self, _: &log::Metadata<'_>) -> bool {
// We set the log level via log::set_max_level, so we don't need to
// implement filtering here.
true
}
fn log(&self, record: &log::Record) {
fn log(&self, record: &log::Record<'_>) {
match (record.file(), record.line()) {
(Some(file), Some(line)) => {
eprintln!(

View File

@@ -83,12 +83,14 @@ fn search(args: &Args) -> Result<bool> {
let mut stats = args.stats()?;
let mut searcher = args.search_worker(args.stdout())?;
let mut matched = false;
let mut searched = false;
for result in args.walker()? {
let subject = match subject_builder.build_from_result(result) {
Some(subject) => subject,
None => continue,
};
searched = true;
let search_result = match searcher.search(&subject) {
Ok(search_result) => search_result,
Err(err) => {
@@ -108,6 +110,9 @@ fn search(args: &Args) -> Result<bool> {
break;
}
}
if args.using_default_path() && !searched {
eprint_nothing_searched();
}
if let Some(ref stats) = stats {
let elapsed = Instant::now().duration_since(started_at);
// We don't care if we couldn't print this successfully.
@@ -129,11 +134,13 @@ fn search_parallel(args: &Args) -> Result<bool> {
let bufwtr = args.buffer_writer()?;
let stats = args.stats()?.map(Mutex::new);
let matched = AtomicBool::new(false);
let searched = AtomicBool::new(false);
let mut searcher_err = None;
args.walker_parallel()?.run(|| {
let bufwtr = &bufwtr;
let stats = &stats;
let matched = &matched;
let searched = &searched;
let subject_builder = &subject_builder;
let mut searcher = match args.search_worker(bufwtr.buffer()) {
Ok(searcher) => searcher,
@@ -148,6 +155,7 @@ fn search_parallel(args: &Args) -> Result<bool> {
Some(subject) => subject,
None => return WalkState::Continue,
};
searched.store(true, SeqCst);
searcher.printer().get_mut().clear();
let search_result = match searcher.search(&subject) {
Ok(search_result) => search_result,
@@ -181,6 +189,9 @@ fn search_parallel(args: &Args) -> Result<bool> {
if let Some(err) = searcher_err.take() {
return Err(err);
}
if args.using_default_path() && !searched.load(SeqCst) {
eprint_nothing_searched();
}
if let Some(ref locked_stats) = stats {
let elapsed = Instant::now().duration_since(started_at);
let stats = locked_stats.lock().unwrap();
@@ -191,6 +202,14 @@ fn search_parallel(args: &Args) -> Result<bool> {
Ok(matched.load(SeqCst))
}
fn eprint_nothing_searched() {
err_message!(
"No files were searched, which means ripgrep probably \
applied a filter you didn't expect.\n\
Running with --debug will show why files are being skipped."
);
}
/// The top-level entry point for listing files without searching them. This
/// recursively steps through the file list (current directory by default) and
/// prints each path sequentially using a single thread.

View File

@@ -115,9 +115,14 @@ impl SearchWorkerBuilder {
pub fn preprocessor(
&mut self,
cmd: Option<PathBuf>,
) -> &mut SearchWorkerBuilder {
self.config.preprocessor = cmd;
self
) -> crate::Result<&mut SearchWorkerBuilder> {
if let Some(ref prog) = cmd {
let bin = cli::resolve_binary(prog)?;
self.config.preprocessor = Some(bin);
} else {
self.config.preprocessor = None;
}
Ok(self)
}
/// Set the globs for determining which files should be run through the
@@ -325,11 +330,12 @@ impl<W: WriteColor> SearchWorker<W> {
} else {
self.config.binary_implicit.clone()
};
self.searcher.set_binary_detection(bin);
let path = subject.path();
log::trace!("{}: binary detection: {:?}", path.display(), bin);
self.searcher.set_binary_detection(bin);
if subject.is_stdin() {
self.search_reader(path, io::stdin().lock())
self.search_reader(path, &mut io::stdin().lock())
} else if self.should_preprocess(path) {
self.search_preprocessor(path)
} else if self.should_decompress(path) {
@@ -393,7 +399,7 @@ impl<W: WriteColor> SearchWorker<W> {
let mut cmd = Command::new(bin);
cmd.arg(path).stdin(Stdio::from(File::open(path)?));
let rdr = self.command_builder.build(&mut cmd).map_err(|err| {
let mut rdr = self.command_builder.build(&mut cmd).map_err(|err| {
io::Error::new(
io::ErrorKind::Other,
format!(
@@ -402,20 +408,28 @@ impl<W: WriteColor> SearchWorker<W> {
),
)
})?;
self.search_reader(path, rdr).map_err(|err| {
let result = self.search_reader(path, &mut rdr).map_err(|err| {
io::Error::new(
io::ErrorKind::Other,
format!("preprocessor command failed: '{:?}': {}", cmd, err),
)
})
});
let close_result = rdr.close();
let search_result = result?;
close_result?;
Ok(search_result)
}
/// Attempt to decompress the data at the given file path and search the
/// result. If the given file path isn't recognized as a compressed file,
/// then search it without doing any decompression.
fn search_decompress(&mut self, path: &Path) -> io::Result<SearchResult> {
let rdr = self.decomp_builder.build(path)?;
self.search_reader(path, rdr)
let mut rdr = self.decomp_builder.build(path)?;
let result = self.search_reader(path, &mut rdr);
let close_result = rdr.close();
let search_result = result?;
close_result?;
Ok(search_result)
}
/// Search the contents of the given file path.
@@ -442,7 +456,7 @@ impl<W: WriteColor> SearchWorker<W> {
fn search_reader<R: io::Read>(
&mut self,
path: &Path,
rdr: R,
rdr: &mut R,
) -> io::Result<SearchResult> {
use self::PatternMatcher::*;
@@ -498,12 +512,12 @@ fn search_reader<M: Matcher, R: io::Read, W: WriteColor>(
searcher: &mut Searcher,
printer: &mut Printer<W>,
path: &Path,
rdr: R,
mut rdr: R,
) -> io::Result<SearchResult> {
match *printer {
Printer::Standard(ref mut p) => {
let mut sink = p.sink_with_path(&matcher, path);
searcher.search_reader(&matcher, rdr, &mut sink)?;
searcher.search_reader(&matcher, &mut rdr, &mut sink)?;
Ok(SearchResult {
has_match: sink.has_match(),
stats: sink.stats().map(|s| s.clone()),
@@ -511,7 +525,7 @@ fn search_reader<M: Matcher, R: io::Read, W: WriteColor>(
}
Printer::Summary(ref mut p) => {
let mut sink = p.sink_with_path(&matcher, path);
searcher.search_reader(&matcher, rdr, &mut sink)?;
searcher.search_reader(&matcher, &mut rdr, &mut sink)?;
Ok(SearchResult {
has_match: sink.has_match(),
stats: sink.stats().map(|s| s.clone()),
@@ -519,7 +533,7 @@ fn search_reader<M: Matcher, R: io::Read, W: WriteColor>(
}
Printer::JSON(ref mut p) => {
let mut sink = p.sink_with_path(&matcher, path);
searcher.search_reader(&matcher, rdr, &mut sink)?;
searcher.search_reader(&matcher, &mut rdr, &mut sink)?;
Ok(SearchResult {
has_match: sink.has_match(),
stats: Some(sink.stats().clone()),

View File

@@ -67,7 +67,7 @@ impl SubjectBuilder {
if subj.is_file() {
return Some(subj);
}
// We got nothin. Emit a debug message, but only if this isn't a
// We got nothing. Emit a debug message, but only if this isn't a
// directory. Otherwise, emitting messages for directories is just
// noisy.
if !subj.is_dir() {

View File

@@ -1,6 +1,6 @@
[package]
name = "globset"
version = "0.4.6" #:version
version = "0.4.9" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Cross platform single glob and glob set matching. Glob set matching is the
@@ -12,7 +12,8 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/globset"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/globset"
readme = "README.md"
keywords = ["regex", "glob", "multiple", "set", "pattern"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[lib]
name = "globset"
@@ -22,7 +23,7 @@ bench = false
aho-corasick = "0.7.3"
bstr = { version = "0.2.0", default-features = false, features = ["std"] }
fnv = "1.0.6"
log = "0.4.5"
log = { version = "0.4.5", optional = true }
regex = { version = "1.1.5", default-features = false, features = ["perf", "std"] }
serde = { version = "1.0.104", optional = true }
@@ -32,5 +33,6 @@ lazy_static = "1"
serde_json = "1.0.45"
[features]
default = ["log"]
simd-accel = []
serde1 = ["serde"]

View File

@@ -4,11 +4,10 @@ Cross platform single glob and glob set matching. Glob set matching is the
process of matching one or more glob patterns against a single candidate path
simultaneously, and returning all of the globs that matched.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
[![](https://img.shields.io/crates/v/globset.svg)](https://crates.io/crates/globset)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
### Documentation
@@ -20,13 +19,7 @@ Add this to your `Cargo.toml`:
```toml
[dependencies]
globset = "0.3"
```
and this to your crate root:
```rust
extern crate globset;
globset = "0.4"
```
### Features
@@ -85,12 +78,12 @@ assert_eq!(set.matches("src/bar/baz/foo.rs"), vec![0, 2]);
This crate implements globs by converting them to regular expressions, and
executing them with the
[`regex`](https://github.com/rust-lang-nursery/regex)
[`regex`](https://github.com/rust-lang/regex)
crate.
For single glob matching, performance of this crate should be roughly on par
with the performance of the
[`glob`](https://github.com/rust-lang-nursery/glob)
[`glob`](https://github.com/rust-lang/glob)
crate. (`*_regex` correspond to benchmarks for this library while `*_glob`
correspond to benchmarks for the `glob` library.)
Optimizations in the `regex` crate may propel this library past `glob`,
@@ -115,7 +108,7 @@ test many_short_glob ... bench: 1,063 ns/iter (+/- 47)
test many_short_regex_set ... bench: 186 ns/iter (+/- 11)
```
### Comparison with the [`glob`](https://github.com/rust-lang-nursery/glob) crate
### Comparison with the [`glob`](https://github.com/rust-lang/glob) crate
* Supports alternate "or" globs, e.g., `*.{foo,bar}`.
* Can match non-UTF-8 file paths correctly.

View File

@@ -4,9 +4,6 @@ tool itself, see the benchsuite directory.
*/
#![feature(test)]
extern crate glob;
extern crate globset;
extern crate regex;
extern crate test;
use globset::{Candidate, Glob, GlobMatcher, GlobSet, GlobSetBuilder};

View File

@@ -8,7 +8,7 @@ use std::str;
use regex;
use regex::bytes::Regex;
use {new_regex, Candidate, Error, ErrorKind};
use crate::{new_regex, Candidate, Error, ErrorKind};
/// Describes a matching strategy for a particular pattern.
///
@@ -98,7 +98,7 @@ impl hash::Hash for Glob {
}
impl fmt::Display for Glob {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.glob.fmt(f)
}
}
@@ -127,7 +127,7 @@ impl GlobMatcher {
}
/// Tests whether the given path matches this pattern or not.
pub fn is_match_candidate(&self, path: &Candidate) -> bool {
pub fn is_match_candidate(&self, path: &Candidate<'_>) -> bool {
self.re.is_match(&path.path)
}
@@ -143,8 +143,6 @@ impl GlobMatcher {
struct GlobStrategic {
/// The match strategy to use.
strategy: MatchStrategy,
/// The underlying pattern.
pat: Glob,
/// The pattern, as a compiled regex.
re: Regex,
}
@@ -157,7 +155,7 @@ impl GlobStrategic {
}
/// Tests whether the given path matches this pattern or not.
fn is_match_candidate(&self, candidate: &Candidate) -> bool {
fn is_match_candidate(&self, candidate: &Candidate<'_>) -> bool {
let byte_path = &*candidate.path;
match self.strategy {
@@ -273,7 +271,7 @@ impl Glob {
let strategy = MatchStrategy::new(self);
let re =
new_regex(&self.re).expect("regex compilation shouldn't fail");
GlobStrategic { strategy: strategy, pat: self.clone(), re: re }
GlobStrategic { strategy: strategy, re: re }
}
/// Returns the original glob pattern used to build this pattern.
@@ -403,7 +401,7 @@ impl Glob {
if self.opts.case_insensitive {
return None;
}
let end = match self.tokens.last() {
let (end, need_sep) = match self.tokens.last() {
Some(&Token::ZeroOrMore) => {
if self.opts.literal_separator {
// If a trailing `*` can't match a `/`, then we can't
@@ -414,9 +412,10 @@ impl Glob {
// literal prefix.
return None;
}
self.tokens.len() - 1
(self.tokens.len() - 1, false)
}
_ => self.tokens.len(),
Some(&Token::RecursiveSuffix) => (self.tokens.len() - 1, true),
_ => (self.tokens.len(), false),
};
let mut lit = String::new();
for t in &self.tokens[0..end] {
@@ -425,6 +424,9 @@ impl Glob {
_ => return None,
}
}
if need_sep {
lit.push('/');
}
if lit.is_empty() {
None
} else {
@@ -612,6 +614,8 @@ impl<'a> GlobBuilder<'a> {
}
/// Toggle whether a literal `/` is required to match a path separator.
///
/// By default this is false: `*` and `?` will match `/`.
pub fn literal_separator(&mut self, yes: bool) -> &mut GlobBuilder<'a> {
self.opts.literal_separator = yes;
self
@@ -683,7 +687,7 @@ impl Tokens {
re.push_str("(?:/?|.*/)");
}
Token::RecursiveSuffix => {
re.push_str("(?:/?|/.*)");
re.push_str("/.*");
}
Token::RecursiveZeroOrMore => {
re.push_str("(?:/|/.*/)");
@@ -1009,7 +1013,7 @@ fn ends_with(needle: &[u8], haystack: &[u8]) -> bool {
mod tests {
use super::Token::*;
use super::{Glob, GlobBuilder, Token};
use {ErrorKind, GlobSetBuilder};
use crate::{ErrorKind, GlobSetBuilder};
#[derive(Clone, Copy, Debug, Default)]
struct Options {
@@ -1222,9 +1226,9 @@ mod tests {
toregex!(re16, "**/**/*", r"^(?:/?|.*/).*$");
toregex!(re17, "**/**/**", r"^.*$");
toregex!(re18, "**/**/**/*", r"^(?:/?|.*/).*$");
toregex!(re19, "a/**", r"^a(?:/?|/.*)$");
toregex!(re20, "a/**/**", r"^a(?:/?|/.*)$");
toregex!(re21, "a/**/**/**", r"^a(?:/?|/.*)$");
toregex!(re19, "a/**", r"^a/.*$");
toregex!(re20, "a/**/**", r"^a/.*$");
toregex!(re21, "a/**/**/**", r"^a/.*$");
toregex!(re22, "a/**/b", r"^a(?:/|/.*/)b$");
toregex!(re23, "a/**/**/b", r"^a(?:/|/.*/)b$");
toregex!(re24, "a/**/**/**/b", r"^a(?:/|/.*/)b$");
@@ -1270,11 +1274,12 @@ mod tests {
matches!(matchrec18, "/**/test", "/test");
matches!(matchrec19, "**/.*", ".abc");
matches!(matchrec20, "**/.*", "abc/.abc");
matches!(matchrec21, ".*/**", ".abc");
matches!(matchrec21, "**/foo/bar", "foo/bar");
matches!(matchrec22, ".*/**", ".abc/abc");
matches!(matchrec23, "foo/**", "foo");
matches!(matchrec24, "**/foo/bar", "foo/bar");
matches!(matchrec25, "some/*/needle.txt", "some/one/needle.txt");
matches!(matchrec23, "test/**", "test/");
matches!(matchrec24, "test/**", "test/one");
matches!(matchrec25, "test/**", "test/one/two");
matches!(matchrec26, "some/*/needle.txt", "some/one/needle.txt");
matches!(matchrange1, "a[0-9]b", "a0b");
matches!(matchrange2, "a[0-9]b", "a9b");
@@ -1400,6 +1405,8 @@ mod tests {
"some/one/two/three/needle.txt",
SLASHLIT
);
nmatches!(matchrec33, ".*/**", ".abc");
nmatches!(matchrec34, "foo/**", "foo");
macro_rules! extract {
($which:ident, $name:ident, $pat:expr, $expect:expr) => {
@@ -1504,7 +1511,7 @@ mod tests {
prefix!(extract_prefix1, "/foo", Some(s("/foo")));
prefix!(extract_prefix2, "/foo/*", Some(s("/foo/")));
prefix!(extract_prefix3, "**/foo", None);
prefix!(extract_prefix4, "foo/**", None);
prefix!(extract_prefix4, "foo/**", Some(s("foo/")));
suffix!(extract_suffix1, "**/foo/bar", Some((s("/foo/bar"), true)));
suffix!(extract_suffix2, "*/foo/bar", Some((s("/foo/bar"), false)));

View File

@@ -103,16 +103,6 @@ or to enable case insensitive matching.
#![deny(missing_docs)]
extern crate aho_corasick;
extern crate bstr;
extern crate fnv;
#[macro_use]
extern crate log;
extern crate regex;
#[cfg(feature = "serde1")]
extern crate serde;
use std::borrow::Cow;
use std::collections::{BTreeMap, HashMap};
use std::error::Error as StdError;
@@ -125,9 +115,9 @@ use aho_corasick::AhoCorasick;
use bstr::{ByteSlice, ByteVec, B};
use regex::bytes::{Regex, RegexBuilder, RegexSet};
use glob::MatchStrategy;
pub use glob::{Glob, GlobBuilder, GlobMatcher};
use pathutil::{file_name, file_name_ext, normalize_path};
use crate::glob::MatchStrategy;
pub use crate::glob::{Glob, GlobBuilder, GlobMatcher};
use crate::pathutil::{file_name, file_name_ext, normalize_path};
mod glob;
mod pathutil;
@@ -135,6 +125,16 @@ mod pathutil;
#[cfg(feature = "serde1")]
mod serde_impl;
#[cfg(feature = "log")]
macro_rules! debug {
($($token:tt)*) => (::log::debug!($($token)*);)
}
#[cfg(not(feature = "log"))]
macro_rules! debug {
($($token:tt)*) => {};
}
/// Represents an error that can occur when parsing a glob pattern.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Error {
@@ -228,7 +228,7 @@ impl ErrorKind {
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self.glob {
None => self.kind.fmt(f),
Some(ref glob) => {
@@ -239,7 +239,7 @@ impl fmt::Display for Error {
}
impl fmt::Display for ErrorKind {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match *self {
ErrorKind::InvalidRecursive
| ErrorKind::UnclosedClass
@@ -317,7 +317,7 @@ impl GlobSet {
///
/// This takes a Candidate as input, which can be used to amortize the
/// cost of preparing a path for matching.
pub fn is_match_candidate(&self, path: &Candidate) -> bool {
pub fn is_match_candidate(&self, path: &Candidate<'_>) -> bool {
if self.is_empty() {
return false;
}
@@ -340,7 +340,7 @@ impl GlobSet {
///
/// This takes a Candidate as input, which can be used to amortize the
/// cost of preparing a path for matching.
pub fn matches_candidate(&self, path: &Candidate) -> Vec<usize> {
pub fn matches_candidate(&self, path: &Candidate<'_>) -> Vec<usize> {
let mut into = vec![];
if self.is_empty() {
return into;
@@ -374,7 +374,7 @@ impl GlobSet {
/// cost of preparing a path for matching.
pub fn matches_candidate_into(
&self,
path: &Candidate,
path: &Candidate<'_>,
into: &mut Vec<usize>,
) {
into.clear();
@@ -456,6 +456,13 @@ impl GlobSet {
}
}
impl Default for GlobSet {
/// Create a default empty GlobSet.
fn default() -> Self {
GlobSet::empty()
}
}
/// GlobSetBuilder builds a group of patterns that can be used to
/// simultaneously match a file path.
#[derive(Clone, Debug)]
@@ -536,7 +543,7 @@ enum GlobSetMatchStrategy {
}
impl GlobSetMatchStrategy {
fn is_match(&self, candidate: &Candidate) -> bool {
fn is_match(&self, candidate: &Candidate<'_>) -> bool {
use self::GlobSetMatchStrategy::*;
match *self {
Literal(ref s) => s.is_match(candidate),
@@ -549,7 +556,11 @@ impl GlobSetMatchStrategy {
}
}
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
fn matches_into(
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
use self::GlobSetMatchStrategy::*;
match *self {
Literal(ref s) => s.matches_into(candidate, matches),
@@ -575,12 +586,16 @@ impl LiteralStrategy {
self.0.entry(lit.into_bytes()).or_insert(vec![]).push(global_index);
}
fn is_match(&self, candidate: &Candidate) -> bool {
fn is_match(&self, candidate: &Candidate<'_>) -> bool {
self.0.contains_key(candidate.path.as_bytes())
}
#[inline(never)]
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
fn matches_into(
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
if let Some(hits) = self.0.get(candidate.path.as_bytes()) {
matches.extend(hits);
}
@@ -599,7 +614,7 @@ impl BasenameLiteralStrategy {
self.0.entry(lit.into_bytes()).or_insert(vec![]).push(global_index);
}
fn is_match(&self, candidate: &Candidate) -> bool {
fn is_match(&self, candidate: &Candidate<'_>) -> bool {
if candidate.basename.is_empty() {
return false;
}
@@ -607,7 +622,11 @@ impl BasenameLiteralStrategy {
}
#[inline(never)]
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
fn matches_into(
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
if candidate.basename.is_empty() {
return;
}
@@ -629,7 +648,7 @@ impl ExtensionStrategy {
self.0.entry(ext.into_bytes()).or_insert(vec![]).push(global_index);
}
fn is_match(&self, candidate: &Candidate) -> bool {
fn is_match(&self, candidate: &Candidate<'_>) -> bool {
if candidate.ext.is_empty() {
return false;
}
@@ -637,7 +656,11 @@ impl ExtensionStrategy {
}
#[inline(never)]
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
fn matches_into(
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
if candidate.ext.is_empty() {
return;
}
@@ -655,7 +678,7 @@ struct PrefixStrategy {
}
impl PrefixStrategy {
fn is_match(&self, candidate: &Candidate) -> bool {
fn is_match(&self, candidate: &Candidate<'_>) -> bool {
let path = candidate.path_prefix(self.longest);
for m in self.matcher.find_overlapping_iter(path) {
if m.start() == 0 {
@@ -665,7 +688,11 @@ impl PrefixStrategy {
false
}
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
fn matches_into(
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
let path = candidate.path_prefix(self.longest);
for m in self.matcher.find_overlapping_iter(path) {
if m.start() == 0 {
@@ -683,7 +710,7 @@ struct SuffixStrategy {
}
impl SuffixStrategy {
fn is_match(&self, candidate: &Candidate) -> bool {
fn is_match(&self, candidate: &Candidate<'_>) -> bool {
let path = candidate.path_suffix(self.longest);
for m in self.matcher.find_overlapping_iter(path) {
if m.end() == path.len() {
@@ -693,7 +720,11 @@ impl SuffixStrategy {
false
}
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
fn matches_into(
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
let path = candidate.path_suffix(self.longest);
for m in self.matcher.find_overlapping_iter(path) {
if m.end() == path.len() {
@@ -707,7 +738,7 @@ impl SuffixStrategy {
struct RequiredExtensionStrategy(HashMap<Vec<u8>, Vec<(usize, Regex)>, Fnv>);
impl RequiredExtensionStrategy {
fn is_match(&self, candidate: &Candidate) -> bool {
fn is_match(&self, candidate: &Candidate<'_>) -> bool {
if candidate.ext.is_empty() {
return false;
}
@@ -725,7 +756,11 @@ impl RequiredExtensionStrategy {
}
#[inline(never)]
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
fn matches_into(
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
if candidate.ext.is_empty() {
return;
}
@@ -746,11 +781,15 @@ struct RegexSetStrategy {
}
impl RegexSetStrategy {
fn is_match(&self, candidate: &Candidate) -> bool {
fn is_match(&self, candidate: &Candidate<'_>) -> bool {
self.matcher.is_match(candidate.path.as_bytes())
}
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
fn matches_into(
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
for i in self.matcher.matches(candidate.path.as_bytes()) {
matches.push(self.map[i]);
}
@@ -833,8 +872,8 @@ impl RequiredExtensionStrategyBuilder {
#[cfg(test)]
mod tests {
use super::GlobSetBuilder;
use glob::Glob;
use super::{GlobSet, GlobSetBuilder};
use crate::glob::Glob;
#[test]
fn set_works() {
@@ -863,4 +902,11 @@ mod tests {
assert!(!set.is_match(""));
assert!(!set.is_match("a"));
}
#[test]
fn default_set_is_empty_works() {
let set: GlobSet = Default::default();
assert!(!set.is_match(""));
assert!(!set.is_match("a"));
}
}

View File

@@ -60,7 +60,7 @@ pub fn file_name_ext<'a>(name: &Cow<'a, [u8]>) -> Option<Cow<'a, [u8]>> {
/// Normalizes a path to use `/` as a separator everywhere, even on platforms
/// that recognize other characters as separators.
#[cfg(unix)]
pub fn normalize_path(path: Cow<[u8]>) -> Cow<[u8]> {
pub fn normalize_path(path: Cow<'_, [u8]>) -> Cow<'_, [u8]> {
// UNIX only uses /, so we're good.
path
}

View File

@@ -1,7 +1,7 @@
use serde::de::Error;
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use Glob;
use crate::Glob;
impl Serialize for Glob {
fn serialize<S: Serializer>(

View File

@@ -1,24 +1,25 @@
[package]
name = "grep"
version = "0.2.7" #:version
version = "0.2.9" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Fast line oriented regex searching as a library.
"""
documentation = "http://burntsushi.net/rustdoc/grep/"
documentation = "https://docs.rs/grep"
homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/grep"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/grep"
readme = "README.md"
keywords = ["regex", "grep", "egrep", "search", "pattern"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
grep-cli = { version = "0.1.5", path = "../cli" }
grep-matcher = { version = "0.1.4", path = "../matcher" }
grep-pcre2 = { version = "0.1.4", path = "../pcre2", optional = true }
grep-printer = { version = "0.1.5", path = "../printer" }
grep-regex = { version = "0.1.8", path = "../regex" }
grep-searcher = { version = "0.1.7", path = "../searcher" }
grep-cli = { version = "0.1.6", path = "../cli" }
grep-matcher = { version = "0.1.5", path = "../matcher" }
grep-pcre2 = { version = "0.1.5", path = "../pcre2", optional = true }
grep-printer = { version = "0.1.6", path = "../printer" }
grep-regex = { version = "0.1.10", path = "../regex" }
grep-searcher = { version = "0.1.9", path = "../searcher" }
[dev-dependencies]
termcolor = "1.0.4"

View File

@@ -2,11 +2,10 @@ grep
----
ripgrep, as a library.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
[![](https://img.shields.io/crates/v/grep.svg)](https://crates.io/crates/grep)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
### Documentation
@@ -27,12 +26,6 @@ Add this to your `Cargo.toml`:
grep = "0.2"
```
and this to your crate root:
```rust
extern crate grep;
```
### Features

View File

@@ -1,7 +1,3 @@
extern crate grep;
extern crate termcolor;
extern crate walkdir;
use std::env;
use std::error::Error;
use std::ffi::OsString;

View File

@@ -1,6 +1,6 @@
[package]
name = "ignore"
version = "0.4.16" #:version
version = "0.4.18" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A fast library for efficiently matching ignore files such as `.gitignore`
@@ -11,15 +11,16 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore"
readme = "README.md"
keywords = ["glob", "ignore", "gitignore", "pattern", "file"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[lib]
name = "ignore"
bench = false
[dependencies]
crossbeam-utils = "0.7.0"
globset = { version = "0.4.5", path = "../globset" }
crossbeam-utils = "0.8.0"
globset = { version = "0.4.9", path = "../globset" }
lazy_static = "1.1"
log = "0.4.5"
memchr = "2.1"
@@ -32,7 +33,7 @@ walkdir = "2.2.7"
version = "0.1.2"
[dev-dependencies]
crossbeam-channel = "0.4.0"
crossbeam-channel = "0.5.0"
[features]
simd-accel = ["globset/simd-accel"]

View File

@@ -4,11 +4,10 @@ The ignore crate provides a fast recursive directory iterator that respects
various filters such as globs, file types and `.gitignore` files. This crate
also provides lower level direct access to gitignore and file type matchers.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
[![](https://img.shields.io/crates/v/ignore.svg)](https://crates.io/crates/ignore)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
### Documentation
@@ -23,12 +22,6 @@ Add this to your `Cargo.toml`:
ignore = "0.4"
```
and this to your crate root:
```rust
extern crate ignore;
```
### Example
This example shows the most basic usage of this crate. This code will

View File

@@ -1,7 +1,3 @@
extern crate crossbeam_channel as channel;
extern crate ignore;
extern crate walkdir;
use std::env;
use std::io::{self, Write};
use std::path::Path;
@@ -14,7 +10,7 @@ fn main() {
let mut path = env::args().nth(1).unwrap();
let mut parallel = false;
let mut simple = false;
let (tx, rx) = channel::bounded::<DirEntry>(100);
let (tx, rx) = crossbeam_channel::bounded::<DirEntry>(100);
if path == "parallel" {
path = env::args().nth(2).unwrap();
parallel = true;

View File

@@ -4,7 +4,7 @@
/// types to each invocation of ripgrep with the '--type-add' flag.
///
/// If you would like to add or improve this list, please file a PR:
/// https://github.com/BurntSushi/ripgrep
/// <https://github.com/BurntSushi/ripgrep>.
///
/// Please try to keep this list sorted lexicographically and wrapped to 79
/// columns (inclusive).
@@ -16,12 +16,16 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("asciidoc", &["*.adoc", "*.asc", "*.asciidoc"]),
("asm", &["*.asm", "*.s", "*.S"]),
("asp", &[
"*.aspx", "*.aspx.cs", "*.aspx.cs", "*.ascx", "*.ascx.cs", "*.ascx.vb",
"*.aspx", "*.aspx.cs", "*.aspx.vb", "*.ascx", "*.ascx.cs",
"*.ascx.vb", "*.asp"
]),
("ats", &["*.ats", "*.dats", "*.sats", "*.hats"]),
("avro", &["*.avdl", "*.avpr", "*.avsc"]),
("awk", &["*.awk"]),
("bazel", &["*.bzl", "WORKSPACE", "BUILD", "BUILD.bazel"]),
("bazel", &[
"*.bazel", "*.bzl", "*.BUILD", "*.bazelrc", "BUILD", "MODULE.bazel",
"WORKSPACE", "WORKSPACE.bazel",
]),
("bitbake", &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
("brotli", &["*.br"]),
("buildstream", &["*.bst"]),
@@ -40,18 +44,20 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
"*.[ChH].in", "*.cc.in", "*.[ch]pp.in", "*.[ch]xx.in", "*.hh.in",
]),
("creole", &["*.creole"]),
("crystal", &["Projectfile", "*.cr"]),
("crystal", &["Projectfile", "*.cr", "*.ecr", "shard.yml"]),
("cs", &["*.cs"]),
("csharp", &["*.cs"]),
("cshtml", &["*.cshtml"]),
("css", &["*.css", "*.scss"]),
("csv", &["*.csv"]),
("cuda", &["*.cu", "*.cuh"]),
("cython", &["*.pyx", "*.pxi", "*.pxd"]),
("d", &["*.d"]),
("dart", &["*.dart"]),
("dhall", &["*.dhall"]),
("diff", &["*.patch", "*.diff"]),
("docker", &["*Dockerfile*"]),
("dts", &["*.dts", "*.dtsi"]),
("dvc", &["Dvcfile", "*.dvc"]),
("ebuild", &["*.ebuild"]),
("edn", &["*.edn"]),
@@ -60,6 +66,7 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("elm", &["*.elm"]),
("erb", &["*.erb"]),
("erlang", &["*.erl", "*.hrl"]),
("fennel", &["*.fnl"]),
("fidl", &["*.fidl"]),
("fish", &["*.fish"]),
("flatbuffers", &["*.fbs"]),
@@ -68,19 +75,23 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
"*.f90", "*.F90", "*.f95", "*.F95",
]),
("fsharp", &["*.fs", "*.fsx", "*.fsi"]),
("fut", &["*.fut"]),
("gap", &["*.g", "*.gap", "*.gi", "*.gd", "*.tst"]),
("gn", &["*.gn", "*.gni"]),
("go", &["*.go"]),
("gradle", &["*.gradle"]),
("groovy", &["*.groovy", "*.gradle"]),
("gzip", &["*.gz", "*.tgz"]),
("h", &["*.h", "*.hpp"]),
("h", &["*.h", "*.hh", "*.hpp"]),
("haml", &["*.haml"]),
("hare", &["*.ha"]),
("haskell", &["*.hs", "*.lhs", "*.cpphs", "*.c2hs", "*.hsc"]),
("hbs", &["*.hbs"]),
("hs", &["*.hs", "*.lhs"]),
("html", &["*.htm", "*.html", "*.ejs"]),
("hy", &["*.hy"]),
("idris", &["*.idr", "*.lidr"]),
("janet", &["*.janet"]),
("java", &["*.java", "*.jsp", "*.jspx", "*.properties"]),
("jinja", &["*.j2", "*.jinja", "*.jinja2"]),
("jl", &["*.jl"]),
@@ -119,6 +130,7 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
"MPL-*[0-9]*",
"OFL-*[0-9]*",
]),
("lilypond", &["*.ly", "*.ily"]),
("lisp", &["*.el", "*.jl", "*.lisp", "*.lsp", "*.sc", "*.scm"]),
("lock", &["*.lock", "package-lock.json"]),
("log", &["*.log"]),
@@ -139,6 +151,7 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("md", &["*.markdown", "*.md", "*.mdown", "*.mkdn"]),
("meson", &["meson.build", "meson_options.txt"]),
("minified", &["*.min.html", "*.min.css", "*.min.js"]),
("mint", &["*.mint"]),
("mk", &["mkfile"]),
("ml", &["*.ml"]),
("msbuild", &[
@@ -150,10 +163,12 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("objcpp", &["*.h", "*.mm"]),
("ocaml", &["*.ml", "*.mli", "*.mll", "*.mly"]),
("org", &["*.org", "*.org_archive"]),
("pants", &["BUILD"]),
("pascal", &["*.pas", "*.dpr", "*.lpr", "*.pp", "*.inc"]),
("pdf", &["*.pdf"]),
("perl", &["*.perl", "*.pl", "*.PL", "*.plh", "*.plx", "*.pm", "*.t"]),
("php", &["*.php", "*.php3", "*.php4", "*.php5", "*.phtml"]),
("po", &["*.po"]),
("pod", &["*.pod"]),
("postscript", &["*.eps", "*.ps"]),
("protobuf", &["*.proto"]),
@@ -167,9 +182,15 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("racket", &["*.rkt"]),
("rdoc", &["*.rdoc"]),
("readme", &["README*", "*README"]),
("red", &["*.r", "*.red", "*.reds"]),
("robot", &["*.robot"]),
("rst", &["*.rst"]),
("ruby", &["Gemfile", "*.gemspec", ".irbrc", "Rakefile", "*.rb"]),
("ruby", &[
// Idiomatic files
"config.ru", "Gemfile", ".irbrc", "Rakefile",
// Extensions
"*.gemspec", "*.rb", "*.rbw"
]),
("rust", &["*.rs"]),
("sass", &["*.sass", "*.scss"]),
("scala", &["*.scala", "*.sbt"]),
@@ -216,6 +237,7 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("taskpaper", &["*.taskpaper"]),
("tcl", &["*.tcl"]),
("tex", &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib", "*.dtx", "*.ins"]),
("texinfo", &["*.texi"]),
("textile", &["*.textile"]),
("tf", &["*.tf"]),
("thrift", &["*.thrift"]),
@@ -229,8 +251,12 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("vcl", &["*.vcl"]),
("verilog", &["*.v", "*.vh", "*.sv", "*.svh"]),
("vhdl", &["*.vhd", "*.vhdl"]),
("vim", &["*.vim"]),
("vimscript", &["*.vim"]),
("vim", &[
"*.vim", ".vimrc", ".gvimrc", "vimrc", "gvimrc", "_vimrc", "_gvimrc",
]),
("vimscript", &[
"*.vim", ".vimrc", ".gvimrc", "vimrc", "gvimrc", "_vimrc", "_gvimrc",
]),
("webidl", &["*.idl", "*.webidl", "*.widl"]),
("wiki", &["*.mediawiki", "*.wiki"]),
("xml", &[
@@ -240,6 +266,7 @@ pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
("xz", &["*.xz", "*.txz"]),
("yacc", &["*.y"]),
("yaml", &["*.yaml", "*.yml"]),
("yang", &["*.yang"]),
("z", &["*.Z"]),
("zig", &["*.zig"]),
("zsh", &[

View File

@@ -20,12 +20,12 @@ use std::io::{self, BufRead};
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};
use gitignore::{self, Gitignore, GitignoreBuilder};
use overrides::{self, Override};
use pathutil::{is_hidden, strip_prefix};
use types::{self, Types};
use walk::DirEntry;
use {Error, Match, PartialErrorBuilder};
use crate::gitignore::{self, Gitignore, GitignoreBuilder};
use crate::overrides::{self, Override};
use crate::pathutil::{is_hidden, strip_prefix};
use crate::types::{self, Types};
use crate::walk::DirEntry;
use crate::{Error, Match, PartialErrorBuilder};
/// IgnoreMatch represents information about where a match came from when using
/// the `Ignore` matcher.
@@ -202,11 +202,12 @@ impl Ignore {
errs.maybe_push(err);
igtmp.is_absolute_parent = true;
igtmp.absolute_base = Some(absolute_base.clone());
igtmp.has_git = if self.0.opts.git_ignore {
parent.join(".git").exists()
} else {
false
};
igtmp.has_git =
if self.0.opts.require_git && self.0.opts.git_ignore {
parent.join(".git").exists()
} else {
false
};
ig = Ignore(Arc::new(igtmp));
compiled.insert(parent.as_os_str().to_os_string(), ig.clone());
}
@@ -231,7 +232,9 @@ impl Ignore {
/// Like add_child, but takes a full path and returns an IgnoreInner.
fn add_child_path(&self, dir: &Path) -> (IgnoreInner, Option<Error>) {
let git_type = if self.0.opts.git_ignore || self.0.opts.git_exclude {
let git_type = if self.0.opts.require_git
&& (self.0.opts.git_ignore || self.0.opts.git_exclude)
{
dir.join(".git").metadata().ok().map(|md| md.file_type())
} else {
None
@@ -495,7 +498,7 @@ impl Ignore {
}
/// Returns an iterator over parent ignore matchers, including this one.
pub fn parents(&self) -> Parents {
pub fn parents(&self) -> Parents<'_> {
Parents(Some(self))
}
@@ -581,7 +584,7 @@ impl IgnoreBuilder {
.unwrap();
let (gi, err) = builder.build_global();
if let Some(err) = err {
debug!("{}", err);
log::debug!("{}", err);
}
gi
};
@@ -840,10 +843,10 @@ mod tests {
use std::io::Write;
use std::path::Path;
use dir::IgnoreBuilder;
use gitignore::Gitignore;
use tests::TempDir;
use Error;
use crate::dir::IgnoreBuilder;
use crate::gitignore::Gitignore;
use crate::tests::TempDir;
use crate::Error;
fn wfile<P: AsRef<Path>>(path: P, contents: &str) {
let mut file = File::create(path).unwrap();

View File

@@ -19,8 +19,8 @@ use globset::{Candidate, GlobBuilder, GlobSet, GlobSetBuilder};
use regex::bytes::Regex;
use thread_local::ThreadLocal;
use pathutil::{is_file_name, strip_prefix};
use {Error, Match, PartialErrorBuilder};
use crate::pathutil::{is_file_name, strip_prefix};
use crate::{Error, Match, PartialErrorBuilder};
/// Glob represents a single glob in a gitignore file.
///
@@ -474,10 +474,13 @@ impl GitignoreBuilder {
}
// If it ends with a slash, then this should only match directories,
// but the slash should otherwise not be used while globbing.
if let Some((i, c)) = line.char_indices().rev().nth(0) {
if c == '/' {
glob.is_only_dir = true;
line = &line[..i];
if line.as_bytes().last() == Some(&b'/') {
glob.is_only_dir = true;
line = &line[..line.len() - 1];
// If the slash was escaped, then remove the escape.
// See: https://github.com/BurntSushi/ripgrep/issues/2236
if line.as_bytes().last() == Some(&b'\\') {
line = &line[..line.len() - 1];
}
}
glob.actual = line.to_string();
@@ -592,7 +595,7 @@ fn parse_excludes_file(data: &[u8]) -> Option<PathBuf> {
// N.B. This is the lazy approach, and isn't technically correct, but
// probably works in more circumstances. I guess we would ideally have
// a full INI parser. Yuck.
lazy_static! {
lazy_static::lazy_static! {
static ref RE: Regex =
Regex::new(r"(?im)^\s*excludesfile\s*=\s*(.+)\s*$").unwrap();
};

View File

@@ -46,25 +46,12 @@ See the documentation for `WalkBuilder` for many other options.
#![deny(missing_docs)]
extern crate globset;
#[macro_use]
extern crate lazy_static;
#[macro_use]
extern crate log;
extern crate memchr;
extern crate regex;
extern crate same_file;
extern crate thread_local;
extern crate walkdir;
#[cfg(windows)]
extern crate winapi_util;
use std::error;
use std::fmt;
use std::io;
use std::path::{Path, PathBuf};
pub use walk::{
pub use crate::walk::{
DirEntry, ParallelVisitor, ParallelVisitorBuilder, Walk, WalkBuilder,
WalkParallel, WalkState,
};
@@ -196,6 +183,71 @@ impl Error {
}
}
/// Inspect the original [`io::Error`] if there is one.
///
/// [`None`] is returned if the [`Error`] doesn't correspond to an
/// [`io::Error`]. This might happen, for example, when the error was
/// produced because a cycle was found in the directory tree while
/// following symbolic links.
///
/// This method returns a borrowed value that is bound to the lifetime of the [`Error`]. To
/// obtain an owned value, the [`into_io_error`] can be used instead.
///
/// > This is the original [`io::Error`] and is _not_ the same as
/// > [`impl From<Error> for std::io::Error`][impl] which contains additional context about the
/// error.
///
/// [`None`]: https://doc.rust-lang.org/stable/std/option/enum.Option.html#variant.None
/// [`io::Error`]: https://doc.rust-lang.org/stable/std/io/struct.Error.html
/// [`From`]: https://doc.rust-lang.org/stable/std/convert/trait.From.html
/// [`Error`]: struct.Error.html
/// [`into_io_error`]: struct.Error.html#method.into_io_error
/// [impl]: struct.Error.html#impl-From%3CError%3E
pub fn io_error(&self) -> Option<&std::io::Error> {
match *self {
Error::Partial(ref errs) => {
if errs.len() == 1 {
errs[0].io_error()
} else {
None
}
}
Error::WithLineNumber { ref err, .. } => err.io_error(),
Error::WithPath { ref err, .. } => err.io_error(),
Error::WithDepth { ref err, .. } => err.io_error(),
Error::Loop { .. } => None,
Error::Io(ref err) => Some(err),
Error::Glob { .. } => None,
Error::UnrecognizedFileType(_) => None,
Error::InvalidDefinition => None,
}
}
/// Similar to [`io_error`] except consumes self to convert to the original
/// [`io::Error`] if one exists.
///
/// [`io_error`]: struct.Error.html#method.io_error
/// [`io::Error`]: https://doc.rust-lang.org/stable/std/io/struct.Error.html
pub fn into_io_error(self) -> Option<std::io::Error> {
match self {
Error::Partial(mut errs) => {
if errs.len() == 1 {
errs.remove(0).into_io_error()
} else {
None
}
}
Error::WithLineNumber { err, .. } => err.into_io_error(),
Error::WithPath { err, .. } => err.into_io_error(),
Error::WithDepth { err, .. } => err.into_io_error(),
Error::Loop { .. } => None,
Error::Io(err) => Some(err),
Error::Glob { .. } => None,
Error::UnrecognizedFileType(_) => None,
Error::InvalidDefinition => None,
}
}
/// Returns a depth associated with recursively walking a directory (if
/// this error was generated from a recursive directory iterator).
pub fn depth(&self) -> Option<usize> {
@@ -269,7 +321,7 @@ impl error::Error for Error {
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match *self {
Error::Partial(ref errs) => {
let msgs: Vec<String> =

View File

@@ -6,8 +6,8 @@ line tools.
use std::path::Path;
use gitignore::{self, Gitignore, GitignoreBuilder};
use {Error, Match};
use crate::gitignore::{self, Gitignore, GitignoreBuilder};
use crate::{Error, Match};
/// Glob represents a single glob in an override matcher.
///

View File

@@ -1,7 +1,7 @@
use std::ffi::OsStr;
use std::path::Path;
use walk::DirEntry;
use crate::walk::DirEntry;
/// Returns true if and only if this entry is considered to be hidden.
///

View File

@@ -93,9 +93,9 @@ use globset::{GlobBuilder, GlobSet, GlobSetBuilder};
use regex::Regex;
use thread_local::ThreadLocal;
use default_types::DEFAULT_TYPES;
use pathutil::file_name;
use {Error, Match};
use crate::default_types::DEFAULT_TYPES;
use crate::pathutil::file_name;
use crate::{Error, Match};
/// Glob represents a single glob in a set of file type definitions.
///
@@ -122,10 +122,6 @@ enum GlobInner<'a> {
Matched {
/// The file type definition which provided the glob.
def: &'a FileTypeDef,
/// The index of the glob that matched inside the file type definition.
which: usize,
/// Whether the selection was negated or not.
negated: bool,
},
}
@@ -291,13 +287,9 @@ impl Types {
self.set.matches_into(name, &mut *matches);
// The highest precedent match is the last one.
if let Some(&i) = matches.last() {
let (isel, iglob) = self.glob_to_selection[i];
let (isel, _) = self.glob_to_selection[i];
let sel = &self.selections[isel];
let glob = Glob(GlobInner::Matched {
def: sel.inner(),
which: iglob,
negated: sel.is_negated(),
});
let glob = Glob(GlobInner::Matched { def: sel.inner() });
return if sel.is_negated() {
Match::Ignore(glob)
} else {
@@ -427,7 +419,7 @@ impl TypesBuilder {
/// If `name` is `all` or otherwise contains any character that is not a
/// Unicode letter or number, then an error is returned.
pub fn add(&mut self, name: &str, glob: &str) -> Result<(), Error> {
lazy_static! {
lazy_static::lazy_static! {
static ref RE: Regex = Regex::new(r"^[\pL\pN]+$").unwrap();
};
if name == "all" || !RE.is_match(name) {

View File

@@ -13,11 +13,11 @@ use std::vec;
use same_file::Handle;
use walkdir::{self, WalkDir};
use dir::{Ignore, IgnoreBuilder};
use gitignore::GitignoreBuilder;
use overrides::Override;
use types::Types;
use {Error, PartialErrorBuilder};
use crate::dir::{Ignore, IgnoreBuilder};
use crate::gitignore::GitignoreBuilder;
use crate::overrides::Override;
use crate::types::Types;
use crate::{Error, PartialErrorBuilder};
/// A directory entry with a possible error attached.
///
@@ -252,7 +252,7 @@ struct DirEntryRaw {
}
impl fmt::Debug for DirEntryRaw {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
// Leaving out FileType because it doesn't have a debug impl
// in Rust 1.9. We could add it if we really wanted to by manually
// querying each possibly file type. Meh. ---AG
@@ -504,7 +504,7 @@ enum Sorter {
struct Filter(Arc<dyn Fn(&DirEntry) -> bool + Send + Sync + 'static>);
impl fmt::Debug for WalkBuilder {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("WalkBuilder")
.field("paths", &self.paths)
.field("ig_builder", &self.ig_builder)
@@ -934,15 +934,23 @@ impl Walk {
if ent.depth() == 0 {
return Ok(false);
}
// We ensure that trivial skipping is done before any other potentially
// expensive operations (stat, filesystem other) are done. This seems
// like an obvious optimization but becomes critical when filesystem
// operations even as simple as stat can result in significant
// overheads; an example of this was a bespoke filesystem layer in
// Windows that hosted files remotely and would download them on-demand
// when particular filesystem operations occurred. Users of this system
// who ensured correct file-type fileters were being used could still
// get unnecessary file access resulting in large downloads.
if should_skip_entry(&self.ig, ent) {
return Ok(true);
}
if let Some(ref stdout) = self.skip {
if path_equals(ent, stdout)? {
return Ok(true);
}
}
if should_skip_entry(&self.ig, ent) {
return Ok(true);
}
if self.max_filesize.is_some() && !ent.is_dir() {
return Ok(skip_filesize(
self.max_filesize.unwrap(),
@@ -1218,7 +1226,7 @@ impl WalkParallel {
/// visitor runs on only one thread, this build-up can be done without
/// synchronization. Then, once traversal is complete, all of the results
/// can be merged together into a single data structure.
pub fn visit(mut self, builder: &mut dyn ParallelVisitorBuilder) {
pub fn visit(mut self, builder: &mut dyn ParallelVisitorBuilder<'_>) {
let threads = self.threads();
let stack = Arc::new(Mutex::new(vec![]));
{
@@ -1549,6 +1557,11 @@ impl<'s> Worker<'s> {
}
}
}
// N.B. See analogous call in the single-threaded implementation about
// why it's important for this to come before the checks below.
if should_skip_entry(ig, &dent) {
return WalkState::Continue;
}
if let Some(ref stdout) = self.skip {
let is_stdout = match path_equals(&dent, stdout) {
Ok(is_stdout) => is_stdout,
@@ -1558,7 +1571,6 @@ impl<'s> Worker<'s> {
return WalkState::Continue;
}
}
let should_skip_path = should_skip_entry(ig, &dent);
let should_skip_filesize =
if self.max_filesize.is_some() && !dent.is_dir() {
skip_filesize(
@@ -1575,8 +1587,7 @@ impl<'s> Worker<'s> {
} else {
false
};
if !should_skip_path && !should_skip_filesize && !should_skip_filtered
{
if !should_skip_filesize && !should_skip_filtered {
self.send(Work { dent, ignore: ig.clone(), root_device });
}
WalkState::Continue
@@ -1714,7 +1725,7 @@ fn skip_filesize(
if let Some(fs) = filesize {
if fs > max_filesize {
debug!("ignoring {}: {} bytes", path.display(), fs);
log::debug!("ignoring {}: {} bytes", path.display(), fs);
true
} else {
false
@@ -1727,10 +1738,10 @@ fn skip_filesize(
fn should_skip_entry(ig: &Ignore, dent: &DirEntry) -> bool {
let m = ig.matched_dir_entry(dent);
if m.is_ignore() {
debug!("ignoring {}: {:?}", dent.path().display(), m);
log::debug!("ignoring {}: {:?}", dent.path().display(), m);
true
} else if m.is_whitelist() {
debug!("whitelisting {}: {:?}", dent.path().display(), m);
log::debug!("whitelisting {}: {:?}", dent.path().display(), m);
false
} else {
false
@@ -1841,7 +1852,7 @@ mod tests {
use std::sync::{Arc, Mutex};
use super::{DirEntry, WalkBuilder, WalkState};
use tests::TempDir;
use crate::tests::TempDir;
fn wfile<P: AsRef<Path>>(path: P, contents: &str) {
let mut file = File::create(path).unwrap();

View File

@@ -1,5 +1,3 @@
extern crate ignore;
use std::path::Path;
use ignore::gitignore::{Gitignore, GitignoreBuilder};

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-matcher"
version = "0.1.4" #:version
version = "0.1.5" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A trait for regular expressions, with a focus on line oriented search.
@@ -10,8 +10,9 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/matcher"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/matcher"
readme = "README.md"
keywords = ["regex", "pattern", "trait"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
autotests = false
edition = "2018"
[dependencies]
memchr = "2.1"

View File

@@ -4,11 +4,10 @@ This crate provides a low level interface for describing regular expression
matchers. The `grep` crate uses this interface in order to make the regex
engine it uses pluggable.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
[![](https://img.shields.io/crates/v/grep-matcher.svg)](https://crates.io/crates/grep-matcher)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
### Documentation
@@ -28,9 +27,3 @@ Add this to your `Cargo.toml`:
[dependencies]
grep-matcher = "0.1"
```
and this to your crate root:
```rust
extern crate grep_matcher;
```

View File

@@ -92,7 +92,7 @@ impl From<usize> for Ref<'static> {
/// starting at the beginning of `replacement`.
///
/// If no such valid reference could be found, None is returned.
fn find_cap_ref(replacement: &[u8]) -> Option<CaptureRef> {
fn find_cap_ref(replacement: &[u8]) -> Option<CaptureRef<'_>> {
let mut i = 0;
if replacement.len() <= 1 || replacement[0] != b'$' {
return None;

View File

@@ -38,14 +38,12 @@ implementations.
#![deny(missing_docs)]
extern crate memchr;
use std::fmt;
use std::io;
use std::ops;
use std::u64;
use interpolate::interpolate;
use crate::interpolate::interpolate;
mod interpolate;
@@ -118,7 +116,7 @@ impl Match {
/// This method panics if `start > self.end`.
#[inline]
pub fn with_start(&self, start: usize) -> Match {
assert!(start <= self.end);
assert!(start <= self.end, "{} is not <= {}", start, self.end);
Match { start, ..*self }
}
@@ -130,7 +128,7 @@ impl Match {
/// This method panics if `self.start > end`.
#[inline]
pub fn with_end(&self, end: usize) -> Match {
assert!(self.start <= end);
assert!(self.start <= end, "{} is not <= {}", self.start, end);
Match { end, ..*self }
}
@@ -304,7 +302,7 @@ pub struct ByteSet(BitSet);
struct BitSet([u64; 4]);
impl fmt::Debug for BitSet {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let mut fmtd = f.debug_set();
for b in (0..256).map(|b| b as u8) {
if ByteSet(*self).contains(b) {
@@ -494,7 +492,7 @@ impl ::std::error::Error for NoError {
}
impl fmt::Display for NoError {
fn fmt(&self, _: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, _: &mut fmt::Formatter<'_>) -> fmt::Result {
panic!("BUG for NoError: an impossible error occurred")
}
}
@@ -618,12 +616,31 @@ pub trait Matcher {
fn find_iter<F>(
&self,
haystack: &[u8],
matched: F,
) -> Result<(), Self::Error>
where
F: FnMut(Match) -> bool,
{
self.find_iter_at(haystack, 0, matched)
}
/// Executes the given function over successive non-overlapping matches
/// in `haystack`. If no match exists, then the given function is never
/// called. If the function returns `false`, then iteration stops.
///
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
fn find_iter_at<F>(
&self,
haystack: &[u8],
at: usize,
mut matched: F,
) -> Result<(), Self::Error>
where
F: FnMut(Match) -> bool,
{
self.try_find_iter(haystack, |m| Ok(matched(m)))
self.try_find_iter_at(haystack, at, |m| Ok(matched(m)))
.map(|r: Result<(), ()>| r.unwrap())
}
@@ -637,12 +654,35 @@ pub trait Matcher {
fn try_find_iter<F, E>(
&self,
haystack: &[u8],
matched: F,
) -> Result<Result<(), E>, Self::Error>
where
F: FnMut(Match) -> Result<bool, E>,
{
self.try_find_iter_at(haystack, 0, matched)
}
/// Executes the given function over successive non-overlapping matches
/// in `haystack`. If no match exists, then the given function is never
/// called. If the function returns `false`, then iteration stops.
/// Similarly, if the function returns an error then iteration stops and
/// the error is yielded. If an error occurs while executing the search,
/// then it is converted to
/// `E`.
///
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
fn try_find_iter_at<F, E>(
&self,
haystack: &[u8],
at: usize,
mut matched: F,
) -> Result<Result<(), E>, Self::Error>
where
F: FnMut(Match) -> Result<bool, E>,
{
let mut last_end = 0;
let mut last_end = at;
let mut last_match = None;
loop {
@@ -696,12 +736,33 @@ pub trait Matcher {
&self,
haystack: &[u8],
caps: &mut Self::Captures,
matched: F,
) -> Result<(), Self::Error>
where
F: FnMut(&Self::Captures) -> bool,
{
self.captures_iter_at(haystack, 0, caps, matched)
}
/// Executes the given function over successive non-overlapping matches
/// in `haystack` with capture groups extracted from each match. If no
/// match exists, then the given function is never called. If the function
/// returns `false`, then iteration stops.
///
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
fn captures_iter_at<F>(
&self,
haystack: &[u8],
at: usize,
caps: &mut Self::Captures,
mut matched: F,
) -> Result<(), Self::Error>
where
F: FnMut(&Self::Captures) -> bool,
{
self.try_captures_iter(haystack, caps, |caps| Ok(matched(caps)))
self.try_captures_iter_at(haystack, at, caps, |caps| Ok(matched(caps)))
.map(|r: Result<(), ()>| r.unwrap())
}
@@ -716,12 +777,36 @@ pub trait Matcher {
&self,
haystack: &[u8],
caps: &mut Self::Captures,
matched: F,
) -> Result<Result<(), E>, Self::Error>
where
F: FnMut(&Self::Captures) -> Result<bool, E>,
{
self.try_captures_iter_at(haystack, 0, caps, matched)
}
/// Executes the given function over successive non-overlapping matches
/// in `haystack` with capture groups extracted from each match. If no
/// match exists, then the given function is never called. If the function
/// returns `false`, then iteration stops. Similarly, if the function
/// returns an error then iteration stops and the error is yielded. If
/// an error occurs while executing the search, then it is converted to
/// `E`.
///
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
fn try_captures_iter_at<F, E>(
&self,
haystack: &[u8],
at: usize,
caps: &mut Self::Captures,
mut matched: F,
) -> Result<Result<(), E>, Self::Error>
where
F: FnMut(&Self::Captures) -> Result<bool, E>,
{
let mut last_end = 0;
let mut last_end = at;
let mut last_match = None;
loop {
@@ -819,13 +904,35 @@ pub trait Matcher {
haystack: &[u8],
caps: &mut Self::Captures,
dst: &mut Vec<u8>,
append: F,
) -> Result<(), Self::Error>
where
F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool,
{
self.replace_with_captures_at(haystack, 0, caps, dst, append)
}
/// Replaces every match in the given haystack with the result of calling
/// `append` with the matching capture groups.
///
/// If the given `append` function returns `false`, then replacement stops.
///
/// The significance of the starting point is that it takes the surrounding
/// context into consideration. For example, the `\A` anchor can only
/// match when `at == 0`.
fn replace_with_captures_at<F>(
&self,
haystack: &[u8],
at: usize,
caps: &mut Self::Captures,
dst: &mut Vec<u8>,
mut append: F,
) -> Result<(), Self::Error>
where
F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool,
{
let mut last_match = 0;
self.captures_iter(haystack, caps, |caps| {
let mut last_match = at;
self.captures_iter_at(haystack, at, caps, |caps| {
let m = caps.get(0).unwrap();
dst.extend(&haystack[last_match..m.start]);
last_match = m.end;
@@ -1039,6 +1146,18 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).find_iter(haystack, matched)
}
fn find_iter_at<F>(
&self,
haystack: &[u8],
at: usize,
matched: F,
) -> Result<(), Self::Error>
where
F: FnMut(Match) -> bool,
{
(*self).find_iter_at(haystack, at, matched)
}
fn try_find_iter<F, E>(
&self,
haystack: &[u8],
@@ -1050,6 +1169,18 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).try_find_iter(haystack, matched)
}
fn try_find_iter_at<F, E>(
&self,
haystack: &[u8],
at: usize,
matched: F,
) -> Result<Result<(), E>, Self::Error>
where
F: FnMut(Match) -> Result<bool, E>,
{
(*self).try_find_iter_at(haystack, at, matched)
}
fn captures(
&self,
haystack: &[u8],
@@ -1070,6 +1201,19 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).captures_iter(haystack, caps, matched)
}
fn captures_iter_at<F>(
&self,
haystack: &[u8],
at: usize,
caps: &mut Self::Captures,
matched: F,
) -> Result<(), Self::Error>
where
F: FnMut(&Self::Captures) -> bool,
{
(*self).captures_iter_at(haystack, at, caps, matched)
}
fn try_captures_iter<F, E>(
&self,
haystack: &[u8],
@@ -1082,6 +1226,19 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).try_captures_iter(haystack, caps, matched)
}
fn try_captures_iter_at<F, E>(
&self,
haystack: &[u8],
at: usize,
caps: &mut Self::Captures,
matched: F,
) -> Result<Result<(), E>, Self::Error>
where
F: FnMut(&Self::Captures) -> Result<bool, E>,
{
(*self).try_captures_iter_at(haystack, at, caps, matched)
}
fn replace<F>(
&self,
haystack: &[u8],
@@ -1107,6 +1264,20 @@ impl<'a, M: Matcher> Matcher for &'a M {
(*self).replace_with_captures(haystack, caps, dst, append)
}
fn replace_with_captures_at<F>(
&self,
haystack: &[u8],
at: usize,
caps: &mut Self::Captures,
dst: &mut Vec<u8>,
append: F,
) -> Result<(), Self::Error>
where
F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool,
{
(*self).replace_with_captures_at(haystack, at, caps, dst, append)
}
fn is_match(&self, haystack: &[u8]) -> Result<bool, Self::Error> {
(*self).is_match(haystack)
}

View File

@@ -1,7 +1,7 @@
use grep_matcher::{Captures, Match, Matcher};
use regex::bytes::Regex;
use util::{RegexMatcher, RegexMatcherNoCaps};
use crate::util::{RegexMatcher, RegexMatcherNoCaps};
fn matcher(pattern: &str) -> RegexMatcher {
RegexMatcher::new(Regex::new(pattern).unwrap())

View File

@@ -1,6 +1,3 @@
extern crate grep_matcher;
extern crate regex;
mod util;
mod test_matcher;

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-pcre2"
version = "0.1.4" #:version
version = "0.1.5" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Use PCRE2 with the 'grep' crate.
@@ -10,8 +10,9 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/pcre2"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/pcre2"
readme = "README.md"
keywords = ["regex", "grep", "pcre", "backreference", "look"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
grep-matcher = { version = "0.1.2", path = "../matcher" }
pcre2 = "0.2.0"
grep-matcher = { version = "0.1.5", path = "../matcher" }
pcre2 = "0.2.3"

View File

@@ -4,11 +4,10 @@ The `grep-pcre2` crate provides an implementation of the `Matcher` trait from
the `grep-matcher` crate. This implementation permits PCRE2 to be used in the
`grep` crate for fast line oriented searching.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
[![](https://img.shields.io/crates/v/grep-pcre2.svg)](https://crates.io/crates/grep-pcre2)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
### Documentation
@@ -31,9 +30,3 @@ Add this to your `Cargo.toml`:
[dependencies]
grep-pcre2 = "0.1"
```
and this to your crate root:
```rust
extern crate grep_pcre2;
```

View File

@@ -50,7 +50,7 @@ impl error::Error for Error {
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self.kind {
ErrorKind::Regex(ref s) => write!(f, "{}", s),
ErrorKind::__Nonexhaustive => unreachable!(),

View File

@@ -5,11 +5,8 @@ An implementation of `grep-matcher`'s `Matcher` trait for
#![deny(missing_docs)]
extern crate grep_matcher;
extern crate pcre2;
pub use error::{Error, ErrorKind};
pub use matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
pub use crate::error::{Error, ErrorKind};
pub use crate::matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
pub use pcre2::{is_jit_available, version};
mod error;

View File

@@ -3,7 +3,7 @@ use std::collections::HashMap;
use grep_matcher::{Captures, Match, Matcher};
use pcre2::bytes::{CaptureLocations, Regex, RegexBuilder};
use error::Error;
use crate::error::Error;
/// A builder for configuring the compilation of a PCRE2 regex.
#[derive(Clone, Debug)]

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-printer"
version = "0.1.5" #:version
version = "0.1.6" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
An implementation of the grep crate's Sink trait that provides standard
@@ -11,21 +11,21 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/printer"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/printer"
readme = "README.md"
keywords = ["grep", "pattern", "print", "printer", "sink"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[features]
default = ["serde1"]
serde1 = ["base64", "serde", "serde_derive", "serde_json"]
serde1 = ["base64", "serde", "serde_json"]
[dependencies]
base64 = { version = "0.12.1", optional = true }
base64 = { version = "0.13.0", optional = true }
bstr = "0.2.0"
grep-matcher = { version = "0.1.2", path = "../matcher" }
grep-searcher = { version = "0.1.4", path = "../searcher" }
grep-matcher = { version = "0.1.5", path = "../matcher" }
grep-searcher = { version = "0.1.8", path = "../searcher" }
termcolor = "1.0.4"
serde = { version = "1.0.77", optional = true }
serde_derive = { version = "1.0.77", optional = true }
serde = { version = "1.0.77", optional = true, features = ["derive"] }
serde_json = { version = "1.0.27", optional = true }
[dev-dependencies]
grep-regex = { version = "0.1.3", path = "../regex" }
grep-regex = { version = "0.1.9", path = "../regex" }

View File

@@ -3,11 +3,10 @@ grep-printer
Print results from line oriented searching in a human readable, aggregate or
JSON Lines format.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
[![](https://img.shields.io/crates/v/grep-printer.svg)](https://crates.io/crates/grep-printer)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
### Documentation
@@ -27,9 +26,3 @@ Add this to your `Cargo.toml`:
[dependencies]
grep-printer = "0.1"
```
and this to your crate root:
```rust
extern crate grep_printer;
```

View File

@@ -60,7 +60,7 @@ impl ColorError {
}
impl fmt::Display for ColorError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match *self {
ColorError::UnrecognizedOutType(ref name) => write!(
f,
@@ -147,9 +147,6 @@ pub struct ColorSpecs {
/// A `UserColorSpec` can also be converted to a `termcolor::ColorSpec`:
///
/// ```rust
/// extern crate grep_printer;
/// extern crate termcolor;
///
/// # fn main() {
/// use termcolor::{Color, ColorSpec};
/// use grep_printer::UserColorSpec;

View File

@@ -4,14 +4,14 @@ use std::time::Instant;
use grep_matcher::{Match, Matcher};
use grep_searcher::{
Searcher, Sink, SinkContext, SinkContextKind, SinkError, SinkFinish,
SinkMatch,
Searcher, Sink, SinkContext, SinkContextKind, SinkFinish, SinkMatch,
};
use serde_json as json;
use counter::CounterWriter;
use jsont;
use stats::Stats;
use crate::counter::CounterWriter;
use crate::jsont;
use crate::stats::Stats;
use crate::util::find_iter_at_in_context;
/// The configuration for the JSON printer.
///
@@ -112,7 +112,7 @@ impl JSONBuilder {
///
/// ## Overview
///
/// The format of this printer is the [JSON Lines](http://jsonlines.org/)
/// The format of this printer is the [JSON Lines](https://jsonlines.org/)
/// format. Specifically, this printer emits a sequence of messages, where
/// each message is encoded as a single JSON value on a single line. There are
/// four different types of messages (and this number may expand over time):
@@ -147,7 +147,7 @@ impl JSONBuilder {
/// is not limited to UTF-8 exclusively, which in turn implies that matches
/// may be reported that contain invalid UTF-8. Moreover, this printer may
/// also print file paths, and the encoding of file paths is itself not
/// guarnateed to be valid UTF-8. Therefore, this printer must deal with the
/// guaranteed to be valid UTF-8. Therefore, this printer must deal with the
/// presence of invalid UTF-8 somehow. The printer could silently ignore such
/// things completely, or even lossily transcode invalid UTF-8 to valid UTF-8
/// by replacing all invalid sequences with the Unicode replacement character.
@@ -507,7 +507,10 @@ impl<W: io::Write> JSON<W> {
/// Write the given message followed by a new line. The new line is
/// determined from the configuration of the given searcher.
fn write_message(&mut self, message: &jsont::Message) -> io::Result<()> {
fn write_message(
&mut self,
message: &jsont::Message<'_>,
) -> io::Result<()> {
if self.config.pretty {
json::to_writer_pretty(&mut self.wtr, message)?;
} else {
@@ -552,7 +555,7 @@ impl<W> JSON<W> {
/// * `W` refers to the underlying writer that this printer is writing its
/// output to.
#[derive(Debug)]
pub struct JSONSink<'p, 's, M: Matcher, W: 's> {
pub struct JSONSink<'p, 's, M: Matcher, W> {
matcher: M,
json: &'s mut JSON<W>,
path: Option<&'p Path>,
@@ -603,7 +606,12 @@ impl<'p, 's, M: Matcher, W: io::Write> JSONSink<'p, 's, M, W> {
/// Execute the matcher over the given bytes and record the match
/// locations if the current configuration demands match granularity.
fn record_matches(&mut self, bytes: &[u8]) -> io::Result<()> {
fn record_matches(
&mut self,
searcher: &Searcher,
bytes: &[u8],
range: std::ops::Range<usize>,
) -> io::Result<()> {
self.json.matches.clear();
// If printing requires knowing the location of each individual match,
// then compute and stored those right now for use later. While this
@@ -612,12 +620,17 @@ impl<'p, 's, M: Matcher, W: io::Write> JSONSink<'p, 's, M, W> {
// the extent that it's easy to ensure that we never do more than
// one search to find the matches.
let matches = &mut self.json.matches;
self.matcher
.find_iter(bytes, |m| {
matches.push(m);
find_iter_at_in_context(
searcher,
&self.matcher,
bytes,
range.clone(),
|m| {
let (s, e) = (m.start() - range.start, m.end() - range.start);
matches.push(Match::new(s, e));
true
})
.map_err(io::Error::error_message)?;
},
)?;
// Don't report empty matches appearing at the end of the bytes.
if !matches.is_empty()
&& matches.last().unwrap().is_empty()
@@ -644,6 +657,16 @@ impl<'p, 's, M: Matcher, W: io::Write> JSONSink<'p, 's, M, W> {
self.after_context_remaining == 0
}
/// Returns whether the current match count exceeds the configured limit.
/// If there is no limit, then this always returns false.
fn match_more_than_limit(&self) -> bool {
let limit = match self.json.config.max_matches {
None => return false,
Some(limit) => limit,
};
self.match_count > limit
}
/// Write the "begin" message.
fn write_begin_message(&mut self) -> io::Result<()> {
if self.begin_printed {
@@ -662,13 +685,30 @@ impl<'p, 's, M: Matcher, W: io::Write> Sink for JSONSink<'p, 's, M, W> {
fn matched(
&mut self,
searcher: &Searcher,
mat: &SinkMatch,
mat: &SinkMatch<'_>,
) -> Result<bool, io::Error> {
self.write_begin_message()?;
self.match_count += 1;
self.after_context_remaining = searcher.after_context() as u64;
self.record_matches(mat.bytes())?;
// When we've exceeded our match count, then the remaining context
// lines should not be reset, but instead, decremented. This avoids a
// bug where we display more matches than a configured limit. The main
// idea here is that 'matched' might be called again while printing
// an after-context line. In that case, we should treat this as a
// contextual line rather than a matching line for the purposes of
// termination.
if self.match_more_than_limit() {
self.after_context_remaining =
self.after_context_remaining.saturating_sub(1);
} else {
self.after_context_remaining = searcher.after_context() as u64;
}
self.record_matches(
searcher,
mat.buffer(),
mat.bytes_range_in_buffer(),
)?;
self.stats.add_matches(self.json.matches.len() as u64);
self.stats.add_matched_lines(mat.lines().count() as u64);
@@ -687,7 +727,7 @@ impl<'p, 's, M: Matcher, W: io::Write> Sink for JSONSink<'p, 's, M, W> {
fn context(
&mut self,
searcher: &Searcher,
ctx: &SinkContext,
ctx: &SinkContext<'_>,
) -> Result<bool, io::Error> {
self.write_begin_message()?;
self.json.matches.clear();
@@ -697,7 +737,7 @@ impl<'p, 's, M: Matcher, W: io::Write> Sink for JSONSink<'p, 's, M, W> {
self.after_context_remaining.saturating_sub(1);
}
let submatches = if searcher.invert_match() {
self.record_matches(ctx.bytes())?;
self.record_matches(searcher, ctx.bytes(), 0..ctx.bytes().len())?;
SubMatches::new(ctx.bytes(), &self.json.matches)
} else {
SubMatches::empty()
@@ -799,7 +839,7 @@ impl<'a> SubMatches<'a> {
}
/// Return this set of match ranges as a slice.
fn as_slice(&self) -> &[jsont::SubMatch] {
fn as_slice(&self) -> &[jsont::SubMatch<'_>] {
match *self {
SubMatches::Empty => &[],
SubMatches::Small(ref x) => x,
@@ -871,6 +911,38 @@ and exhibited clearly, with a label attached.\
assert_eq!(got.lines().count(), 3);
}
#[test]
fn max_matches_after_context() {
let haystack = "\
a
b
c
d
e
d
e
d
e
d
e
";
let matcher = RegexMatcher::new(r"d").unwrap();
let mut printer =
JSONBuilder::new().max_matches(Some(1)).build(vec![]);
SearcherBuilder::new()
.after_context(2)
.build()
.search_reader(
&matcher,
haystack.as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
assert_eq!(got.lines().count(), 5);
}
#[test]
fn no_match() {
let matcher = RegexMatcher::new(r"DOES NOT MATCH").unwrap();

View File

@@ -13,7 +13,7 @@ use std::str;
use base64;
use serde::{Serialize, Serializer};
use stats::Stats;
use crate::stats::Stats;
#[derive(Serialize)]
#[serde(tag = "type", content = "data")]
@@ -90,7 +90,7 @@ enum Data<'a> {
}
impl<'a> Data<'a> {
fn from_bytes(bytes: &[u8]) -> Data {
fn from_bytes(bytes: &[u8]) -> Data<'_> {
match str::from_utf8(bytes) {
Ok(text) => Data::Text { text: Cow::Borrowed(text) },
Err(_) => Data::Bytes { bytes },
@@ -98,7 +98,7 @@ impl<'a> Data<'a> {
}
#[cfg(unix)]
fn from_path(path: &Path) -> Data {
fn from_path(path: &Path) -> Data<'_> {
use std::os::unix::ffi::OsStrExt;
match path.to_str() {

View File

@@ -13,7 +13,7 @@ statistics.
The [`JSON`](struct.JSON.html) printer shows results in a machine readable
format. To facilitate a stream of search results, the format uses
[JSON Lines](http://jsonlines.org/)
[JSON Lines](https://jsonlines.org/)
by emitting a series of messages as search results are found.
The [`Summary`](struct.Summary.html) printer shows *aggregate* results for a
@@ -27,10 +27,6 @@ contain matches.
This example shows how to create a "standard" printer and execute a search.
```
extern crate grep_regex;
extern crate grep_printer;
extern crate grep_searcher;
use std::error::Error;
use grep_regex::RegexMatcher;
@@ -68,29 +64,26 @@ fn example() -> Result<(), Box<Error>> {
#![deny(missing_docs)]
pub use crate::color::{
default_color_specs, ColorError, ColorSpecs, UserColorSpec,
};
#[cfg(feature = "serde1")]
extern crate base64;
extern crate bstr;
extern crate grep_matcher;
#[cfg(test)]
extern crate grep_regex;
extern crate grep_searcher;
#[cfg(feature = "serde1")]
extern crate serde;
#[cfg(feature = "serde1")]
#[macro_use]
extern crate serde_derive;
#[cfg(feature = "serde1")]
extern crate serde_json;
extern crate termcolor;
pub use crate::json::{JSONBuilder, JSONSink, JSON};
pub use crate::standard::{Standard, StandardBuilder, StandardSink};
pub use crate::stats::Stats;
pub use crate::summary::{Summary, SummaryBuilder, SummaryKind, SummarySink};
pub use crate::util::PrinterPath;
pub use color::{default_color_specs, ColorError, ColorSpecs, UserColorSpec};
#[cfg(feature = "serde1")]
pub use json::{JSONBuilder, JSONSink, JSON};
pub use standard::{Standard, StandardBuilder, StandardSink};
pub use stats::Stats;
pub use summary::{Summary, SummaryBuilder, SummaryKind, SummarySink};
pub use util::PrinterPath;
// The maximum number of bytes to execute a search to account for look-ahead.
//
// This is an unfortunate kludge since PCRE2 doesn't provide a way to search
// a substring of some input while accounting for look-ahead. In theory, we
// could refactor the various 'grep' interfaces to account for it, but it would
// be a large change. So for now, we just let PCRE2 go looking a bit for a
// match without searching the entire rest of the contents.
//
// Note that this kludge is only active in multi-line mode.
const MAX_LOOK_AHEAD: usize = 128;
#[macro_use]
mod macros;

View File

@@ -8,15 +8,18 @@ use std::time::Instant;
use bstr::ByteSlice;
use grep_matcher::{Match, Matcher};
use grep_searcher::{
LineStep, Searcher, Sink, SinkContext, SinkContextKind, SinkError,
SinkFinish, SinkMatch,
LineStep, Searcher, Sink, SinkContext, SinkContextKind, SinkFinish,
SinkMatch,
};
use termcolor::{ColorSpec, NoColor, WriteColor};
use color::ColorSpecs;
use counter::CounterWriter;
use stats::Stats;
use util::{trim_ascii_prefix, PrinterPath, Replacer, Sunk};
use crate::color::ColorSpecs;
use crate::counter::CounterWriter;
use crate::stats::Stats;
use crate::util::{
find_iter_at_in_context, trim_ascii_prefix, trim_line_terminator,
PrinterPath, Replacer, Sunk,
};
/// The configuration for the standard printer.
///
@@ -31,6 +34,7 @@ struct Config {
path: bool,
only_matching: bool,
per_match: bool,
per_match_one_line: bool,
replacement: Arc<Option<Vec<u8>>>,
max_columns: Option<u64>,
max_columns_preview: bool,
@@ -55,6 +59,7 @@ impl Default for Config {
path: true,
only_matching: false,
per_match: false,
per_match_one_line: false,
replacement: Arc::new(None),
max_columns: None,
max_columns_preview: false,
@@ -219,15 +224,36 @@ impl StandardBuilder {
/// the `column` option, which will show the starting column number for
/// every match on every line.
///
/// When multi-line mode is enabled, each match and its accompanying lines
/// are printed. As with single line matches, if a line contains multiple
/// matches (even if only partially), then that line is printed once for
/// each match it participates in.
/// When multi-line mode is enabled, each match is printed, including every
/// line in the match. As with single line matches, if a line contains
/// multiple matches (even if only partially), then that line is printed
/// once for each match it participates in, assuming it's the first line in
/// that match. In multi-line mode, column numbers only indicate the start
/// of a match. Subsequent lines in a multi-line match always have a column
/// number of `1`.
///
/// When a match contains multiple lines, enabling `per_match_one_line`
/// will cause only the first line each in match to be printed.
pub fn per_match(&mut self, yes: bool) -> &mut StandardBuilder {
self.config.per_match = yes;
self
}
/// Print at most one line per match when `per_match` is enabled.
///
/// By default, every line in each match found is printed when `per_match`
/// is enabled. However, this is sometimes undesirable, e.g., when you
/// only ever want one line per match.
///
/// This is only applicable when multi-line matching is enabled, since
/// otherwise, matches are guaranteed to span one line.
///
/// This is disabled by default.
pub fn per_match_one_line(&mut self, yes: bool) -> &mut StandardBuilder {
self.config.per_match_one_line = yes;
self
}
/// Set the bytes that will be used to replace each occurrence of a match
/// found.
///
@@ -292,9 +318,6 @@ impl StandardBuilder {
/// Column numbers are computed in terms of bytes from the start of the
/// line being printed.
///
/// For matches that span multiple lines, the column number for each
/// matching line is in terms of the first matching line.
///
/// This is disabled by default.
pub fn column(&mut self, yes: bool) -> &mut StandardBuilder {
self.config.column = yes;
@@ -602,7 +625,7 @@ impl<W> Standard<W> {
/// * `W` refers to the underlying writer that this printer is writing its
/// output to.
#[derive(Debug)]
pub struct StandardSink<'p, 's, M: Matcher, W: 's> {
pub struct StandardSink<'p, 's, M: Matcher, W> {
matcher: M,
standard: &'s mut Standard<W>,
replacer: Replacer<M>,
@@ -662,7 +685,12 @@ impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
/// Execute the matcher over the given bytes and record the match
/// locations if the current configuration demands match granularity.
fn record_matches(&mut self, bytes: &[u8]) -> io::Result<()> {
fn record_matches(
&mut self,
searcher: &Searcher,
bytes: &[u8],
range: std::ops::Range<usize>,
) -> io::Result<()> {
self.standard.matches.clear();
if !self.needs_match_granularity {
return Ok(());
@@ -675,16 +703,21 @@ impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
// one search to find the matches (well, for replacements, we do one
// additional search to perform the actual replacement).
let matches = &mut self.standard.matches;
self.matcher
.find_iter(bytes, |m| {
matches.push(m);
find_iter_at_in_context(
searcher,
&self.matcher,
bytes,
range.clone(),
|m| {
let (s, e) = (m.start() - range.start, m.end() - range.start);
matches.push(Match::new(s, e));
true
})
.map_err(io::Error::error_message)?;
},
)?;
// Don't report empty matches appearing at the end of the bytes.
if !matches.is_empty()
&& matches.last().unwrap().is_empty()
&& matches.last().unwrap().start() >= bytes.len()
&& matches.last().unwrap().start() >= range.end
{
matches.pop().unwrap();
}
@@ -695,14 +728,25 @@ impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
/// replacement, lazily allocating memory if necessary.
///
/// To access the result of a replacement, use `replacer.replacement()`.
fn replace(&mut self, bytes: &[u8]) -> io::Result<()> {
fn replace(
&mut self,
searcher: &Searcher,
bytes: &[u8],
range: std::ops::Range<usize>,
) -> io::Result<()> {
self.replacer.clear();
if self.standard.config.replacement.is_some() {
let replacement = (*self.standard.config.replacement)
.as_ref()
.map(|r| &*r)
.unwrap();
self.replacer.replace_all(&self.matcher, bytes, replacement)?;
self.replacer.replace_all(
searcher,
&self.matcher,
bytes,
range,
replacement,
)?;
}
Ok(())
}
@@ -722,6 +766,16 @@ impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
}
self.after_context_remaining == 0
}
/// Returns whether the current match count exceeds the configured limit.
/// If there is no limit, then this always returns false.
fn match_more_than_limit(&self) -> bool {
let limit = match self.standard.config.max_matches {
None => return false,
Some(limit) => limit,
};
self.match_count > limit
}
}
impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
@@ -730,13 +784,29 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
fn matched(
&mut self,
searcher: &Searcher,
mat: &SinkMatch,
mat: &SinkMatch<'_>,
) -> Result<bool, io::Error> {
self.match_count += 1;
self.after_context_remaining = searcher.after_context() as u64;
// When we've exceeded our match count, then the remaining context
// lines should not be reset, but instead, decremented. This avoids a
// bug where we display more matches than a configured limit. The main
// idea here is that 'matched' might be called again while printing
// an after-context line. In that case, we should treat this as a
// contextual line rather than a matching line for the purposes of
// termination.
if self.match_more_than_limit() {
self.after_context_remaining =
self.after_context_remaining.saturating_sub(1);
} else {
self.after_context_remaining = searcher.after_context() as u64;
}
self.record_matches(mat.bytes())?;
self.replace(mat.bytes())?;
self.record_matches(
searcher,
mat.buffer(),
mat.bytes_range_in_buffer(),
)?;
self.replace(searcher, mat.buffer(), mat.bytes_range_in_buffer())?;
if let Some(ref mut stats) = self.stats {
stats.add_matches(self.standard.matches.len() as u64);
@@ -755,7 +825,7 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
fn context(
&mut self,
searcher: &Searcher,
ctx: &SinkContext,
ctx: &SinkContext<'_>,
) -> Result<bool, io::Error> {
self.standard.matches.clear();
self.replacer.clear();
@@ -765,8 +835,8 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
self.after_context_remaining.saturating_sub(1);
}
if searcher.invert_match() {
self.record_matches(ctx.bytes())?;
self.replace(ctx.bytes())?;
self.record_matches(searcher, ctx.bytes(), 0..ctx.bytes().len())?;
self.replace(searcher, ctx.bytes(), 0..ctx.bytes().len())?;
}
if searcher.binary_detection().convert_byte().is_some() {
if self.binary_byte_offset.is_some() {
@@ -834,7 +904,7 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
/// A StandardImpl is initialized every time a match or a contextual line is
/// reported.
#[derive(Debug)]
struct StandardImpl<'a, M: 'a + Matcher, W: 'a> {
struct StandardImpl<'a, M: Matcher, W> {
searcher: &'a Searcher,
sink: &'a StandardSink<'a, 'a, M, W>,
sunk: Sunk<'a>,
@@ -846,7 +916,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
/// Bundle self with a searcher and return the core implementation of Sink.
fn new(
searcher: &'a Searcher,
sink: &'a StandardSink<M, W>,
sink: &'a StandardSink<'_, '_, M, W>,
) -> StandardImpl<'a, M, W> {
StandardImpl {
searcher: searcher,
@@ -860,7 +930,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
/// for use with handling matching lines.
fn from_match(
searcher: &'a Searcher,
sink: &'a StandardSink<M, W>,
sink: &'a StandardSink<'_, '_, M, W>,
mat: &'a SinkMatch<'a>,
) -> StandardImpl<'a, M, W> {
let sunk = Sunk::from_sink_match(
@@ -875,7 +945,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
/// for use with handling contextual lines.
fn from_context(
searcher: &'a Searcher,
sink: &'a StandardSink<M, W>,
sink: &'a StandardSink<'_, '_, M, W>,
ctx: &'a SinkContext<'a>,
) -> StandardImpl<'a, M, W> {
let sunk = Sunk::from_sink_context(
@@ -1090,7 +1160,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
self.write_prelude(
self.sunk.absolute_byte_offset() + line.start() as u64,
self.sunk.line_number().map(|n| n + count),
Some(m.start() as u64 + 1),
Some(m.start().saturating_sub(line.start()) as u64 + 1),
)?;
count += 1;
if self.exceeds_max_columns(&bytes[line]) {
@@ -1115,6 +1185,15 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
}
}
self.write_line_term()?;
// It turns out that vimgrep really only wants one line per
// match, even when a match spans multiple lines. So when
// that option is enabled, we just quit after printing the
// first line.
//
// See: https://github.com/BurntSushi/ripgrep/issues/1866
if self.config().per_match_one_line {
break;
}
}
}
Ok(())
@@ -1358,25 +1437,24 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
let bin = self.searcher.binary_detection();
if let Some(byte) = bin.quit_byte() {
self.write(b"WARNING: stopped searching binary file ")?;
if let Some(path) = self.path() {
self.write_spec(self.config().colors.path(), path.as_bytes())?;
self.write(b" ")?;
self.write(b": ")?;
}
let remainder = format!(
"after match (found {:?} byte around offset {})\n",
"WARNING: stopped searching binary file after match \
(found {:?} byte around offset {})\n",
[byte].as_bstr(),
offset,
);
self.write(remainder.as_bytes())?;
} else if let Some(byte) = bin.convert_byte() {
self.write(b"Binary file ")?;
if let Some(path) = self.path() {
self.write_spec(self.config().colors.path(), path.as_bytes())?;
self.write(b" ")?;
self.write(b": ")?;
}
let remainder = format!(
"matches (found {:?} byte around offset {})\n",
"binary file matches (found {:?} byte around offset {})\n",
[byte].as_bstr(),
offset,
);
@@ -1470,14 +1548,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
}
fn trim_line_terminator(&self, buf: &[u8], line: &mut Match) {
let lineterm = self.searcher.line_terminator();
if lineterm.is_suffix(&buf[*line]) {
let mut end = line.end() - 1;
if lineterm.is_crlf() && buf[end - 1] == b'\r' {
end -= 1;
}
*line = line.with_end(end);
}
trim_line_terminator(&self.searcher, buf, line);
}
fn has_line_terminator(&self, buf: &[u8]) -> bool {
@@ -1523,7 +1594,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
/// multiple lines.
///
/// Note that this doesn't just return whether the searcher is in multi
/// line mode, but also checks if the mater can match over multiple lines.
/// line mode, but also checks if the matter can match over multiple lines.
/// If it can't, then we don't need multi line handling, even if the
/// searcher has multi line mode enabled.
fn multi_line(&self) -> bool {
@@ -1546,11 +1617,12 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
#[cfg(test)]
mod tests {
use grep_regex::RegexMatcher;
use grep_matcher::LineTerminator;
use grep_regex::{RegexMatcher, RegexMatcherBuilder};
use grep_searcher::SearcherBuilder;
use termcolor::NoColor;
use termcolor::{Ansi, NoColor};
use super::{Standard, StandardBuilder};
use super::{ColorSpecs, Standard, StandardBuilder};
const SHERLOCK: &'static str = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
@@ -1575,6 +1647,10 @@ and exhibited clearly, with a label attached.\
String::from_utf8(printer.get_mut().get_ref().to_owned()).unwrap()
}
fn printer_contents_ansi(printer: &mut Standard<Ansi<Vec<u8>>>) -> String {
String::from_utf8(printer.get_mut().get_ref().to_owned()).unwrap()
}
#[test]
fn reports_match() {
let matcher = RegexMatcher::new("Sherlock").unwrap();
@@ -2991,9 +3067,9 @@ Holmeses, success in the province of detective work must always
let got = printer_contents(&mut printer);
let expected = "\
1:16:For the Doctor Watsons of this world, as opposed to the Sherlock
2:16:Holmeses, success in the province of detective work must always
2:1:Holmeses, success in the province of detective work must always
5:12:but Doctor Watson has to have it taken out for him and dusted,
6:12:and exhibited clearly, with a label attached.
6:1:and exhibited clearly, with a label attached.
";
assert_eq_printed!(expected, got);
}
@@ -3020,9 +3096,94 @@ Holmeses, success in the province of detective work must always
let got = printer_contents(&mut printer);
let expected = "\
1:16:For the Doctor Watsons of this world, as opposed to the Sherlock
2:16:Holmeses, success in the province of detective work must always
2:123:Holmeses, success in the province of detective work must always
3:123:be, to a very large extent, the result of luck. Sherlock Holmes
2:1:Holmeses, success in the province of detective work must always
2:58:Holmeses, success in the province of detective work must always
3:1:be, to a very large extent, the result of luck. Sherlock Holmes
";
assert_eq_printed!(expected, got);
}
#[test]
fn per_match_multi_line1_only_first_line() {
let matcher =
RegexMatcher::new(r"(?s:.{0})(Doctor Watsons|Sherlock)").unwrap();
let mut printer = StandardBuilder::new()
.per_match(true)
.per_match_one_line(true)
.column(true)
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.multi_line(true)
.line_number(true)
.build()
.search_reader(
&matcher,
SHERLOCK.as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "\
1:9:For the Doctor Watsons of this world, as opposed to the Sherlock
1:57:For the Doctor Watsons of this world, as opposed to the Sherlock
3:49:be, to a very large extent, the result of luck. Sherlock Holmes
";
assert_eq_printed!(expected, got);
}
#[test]
fn per_match_multi_line2_only_first_line() {
let matcher =
RegexMatcher::new(r"(?s)Watson.+?(Holmeses|clearly)").unwrap();
let mut printer = StandardBuilder::new()
.per_match(true)
.per_match_one_line(true)
.column(true)
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.multi_line(true)
.line_number(true)
.build()
.search_reader(
&matcher,
SHERLOCK.as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "\
1:16:For the Doctor Watsons of this world, as opposed to the Sherlock
5:12:but Doctor Watson has to have it taken out for him and dusted,
";
assert_eq_printed!(expected, got);
}
#[test]
fn per_match_multi_line3_only_first_line() {
let matcher =
RegexMatcher::new(r"(?s)Watson.+?Holmeses|always.+?be").unwrap();
let mut printer = StandardBuilder::new()
.per_match(true)
.per_match_one_line(true)
.column(true)
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.multi_line(true)
.line_number(true)
.build()
.search_reader(
&matcher,
SHERLOCK.as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "\
1:16:For the Doctor Watsons of this world, as opposed to the Sherlock
2:58:Holmeses, success in the province of detective work must always
";
assert_eq_printed!(expected, got);
}
@@ -3081,6 +3242,80 @@ Holmeses, success in the province of detective work must always
assert_eq_printed!(expected, got);
}
// This is a somewhat weird test that checks the behavior of attempting
// to replace a line terminator with something else.
//
// See: https://github.com/BurntSushi/ripgrep/issues/1311
#[test]
fn replacement_multi_line() {
let matcher = RegexMatcher::new(r"\n").unwrap();
let mut printer = StandardBuilder::new()
.replacement(Some(b"?".to_vec()))
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.line_number(true)
.multi_line(true)
.build()
.search_reader(
&matcher,
"hello\nworld\n".as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "1:hello?world?\n";
assert_eq_printed!(expected, got);
}
#[test]
fn replacement_multi_line_diff_line_term() {
let matcher = RegexMatcherBuilder::new()
.line_terminator(Some(b'\x00'))
.build(r"\n")
.unwrap();
let mut printer = StandardBuilder::new()
.replacement(Some(b"?".to_vec()))
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.line_terminator(LineTerminator::byte(b'\x00'))
.line_number(true)
.multi_line(true)
.build()
.search_reader(
&matcher,
"hello\nworld\n".as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "1:hello?world?\x00";
assert_eq_printed!(expected, got);
}
#[test]
fn replacement_multi_line_combine_lines() {
let matcher = RegexMatcher::new(r"\n(.)?").unwrap();
let mut printer = StandardBuilder::new()
.replacement(Some(b"?$1".to_vec()))
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.line_number(true)
.multi_line(true)
.build()
.search_reader(
&matcher,
"hello\nworld\n".as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "1:hello?world?\n";
assert_eq_printed!(expected, got);
}
#[test]
fn replacement_max_columns() {
let matcher = RegexMatcher::new(r"Sherlock|Doctor (\w+)").unwrap();
@@ -3387,4 +3622,57 @@ and xxx clearly, with a label attached.
";
assert_eq_printed!(expected, got);
}
#[test]
fn regression_search_empty_with_crlf() {
let matcher =
RegexMatcherBuilder::new().crlf(true).build(r"x?").unwrap();
let mut printer = StandardBuilder::new()
.color_specs(ColorSpecs::default_with_color())
.build(Ansi::new(vec![]));
SearcherBuilder::new()
.line_terminator(LineTerminator::crlf())
.build()
.search_reader(&matcher, &b"\n"[..], printer.sink(&matcher))
.unwrap();
let got = printer_contents_ansi(&mut printer);
assert!(!got.is_empty());
}
#[test]
fn regression_after_context_with_match() {
let haystack = "\
a
b
c
d
e
d
e
d
e
d
e
";
let matcher = RegexMatcherBuilder::new().build(r"d").unwrap();
let mut printer = StandardBuilder::new()
.max_matches(Some(1))
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.line_number(true)
.after_context(2)
.build()
.search_reader(
&matcher,
haystack.as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "4:d\n5-e\n6:d\n";
assert_eq_printed!(expected, got);
}
}

View File

@@ -1,14 +1,14 @@
use std::ops::{Add, AddAssign};
use std::time::Duration;
use util::NiceDuration;
use crate::util::NiceDuration;
/// Summary statistics produced at the end of a search.
///
/// When statistics are reported by a printer, they correspond to all searches
/// executed with that printer.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
#[cfg_attr(feature = "serde1", derive(Serialize))]
#[cfg_attr(feature = "serde1", derive(serde::Serialize))]
pub struct Stats {
elapsed: NiceDuration,
searches: u64,

View File

@@ -8,10 +8,10 @@ use grep_matcher::Matcher;
use grep_searcher::{Searcher, Sink, SinkError, SinkFinish, SinkMatch};
use termcolor::{ColorSpec, NoColor, WriteColor};
use color::ColorSpecs;
use counter::CounterWriter;
use stats::Stats;
use util::PrinterPath;
use crate::color::ColorSpecs;
use crate::counter::CounterWriter;
use crate::stats::Stats;
use crate::util::{find_iter_at_in_context, PrinterPath};
/// The configuration for the summary printer.
///
@@ -457,7 +457,7 @@ impl<W> Summary<W> {
/// * `W` refers to the underlying writer that this printer is writing its
/// output to.
#[derive(Debug)]
pub struct SummarySink<'p, 's, M: Matcher, W: 's> {
pub struct SummarySink<'p, 's, M: Matcher, W> {
matcher: M,
summary: &'s mut Summary<W>,
path: Option<PrinterPath<'p>>,
@@ -504,6 +504,17 @@ impl<'p, 's, M: Matcher, W: WriteColor> SummarySink<'p, 's, M, W> {
self.stats.as_ref()
}
/// Returns true if and only if the searcher may report matches over
/// multiple lines.
///
/// Note that this doesn't just return whether the searcher is in multi
/// line mode, but also checks if the matter can match over multiple lines.
/// If it can't, then we don't need multi line handling, even if the
/// searcher has multi line mode enabled.
fn multi_line(&self, searcher: &Searcher) -> bool {
searcher.multi_line_with_matcher(&self.matcher)
}
/// Returns true if this printer should quit.
///
/// This implements the logic for handling quitting after seeing a certain
@@ -579,32 +590,39 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for SummarySink<'p, 's, M, W> {
fn matched(
&mut self,
_searcher: &Searcher,
mat: &SinkMatch,
searcher: &Searcher,
mat: &SinkMatch<'_>,
) -> Result<bool, io::Error> {
self.match_count += 1;
if let Some(ref mut stats) = self.stats {
let mut match_count = 0;
self.matcher
.find_iter(mat.bytes(), |_| {
match_count += 1;
let is_multi_line = self.multi_line(searcher);
let sink_match_count = if self.stats.is_none() && !is_multi_line {
1
} else {
// This gives us as many bytes as the searcher can offer. This
// isn't guaranteed to hold the necessary context to get match
// detection correct (because of look-around), but it does in
// practice.
let buf = mat.buffer();
let range = mat.bytes_range_in_buffer();
let mut count = 0;
find_iter_at_in_context(
searcher,
&self.matcher,
buf,
range,
|_| {
count += 1;
true
})
.map_err(io::Error::error_message)?;
if match_count == 0 {
// It is possible for the match count to be zero when
// look-around is used. Since `SinkMatch` won't necessarily
// contain the look-around in its match span, the search here
// could fail to find anything.
//
// It seems likely that setting match_count=1 here is probably
// wrong in some cases, but I don't think we can do any
// better. (Because this printer cannot assume that subsequent
// contents have been loaded into memory, so we have no way of
// increasing the search span here.)
match_count = 1;
}
stats.add_matches(match_count);
},
)?;
count
};
if is_multi_line {
self.match_count += sink_match_count;
} else {
self.match_count += 1;
}
if let Some(ref mut stats) = self.stats {
stats.add_matches(sink_match_count);
stats.add_matched_lines(mat.lines().count() as u64);
} else if self.summary.config.kind.quit_early() {
return Ok(false);

View File

@@ -7,11 +7,13 @@ use std::time;
use bstr::{ByteSlice, ByteVec};
use grep_matcher::{Captures, LineTerminator, Match, Matcher};
use grep_searcher::{
LineIter, SinkContext, SinkContextKind, SinkError, SinkMatch,
LineIter, Searcher, SinkContext, SinkContextKind, SinkError, SinkMatch,
};
#[cfg(feature = "serde1")]
use serde::{Serialize, Serializer};
use crate::MAX_LOOK_AHEAD;
/// A type for handling replacements while amortizing allocation.
pub struct Replacer<M: Matcher> {
space: Option<Space<M>>,
@@ -27,7 +29,7 @@ struct Space<M: Matcher> {
}
impl<M: Matcher> fmt::Debug for Replacer<M> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let (dst, matches) = self.replacement().unwrap_or((&[], &[]));
f.debug_struct("Replacer")
.field("dst", &dst)
@@ -52,18 +54,41 @@ impl<M: Matcher> Replacer<M> {
/// This can fail if the underlying matcher reports an error.
pub fn replace_all<'a>(
&'a mut self,
searcher: &Searcher,
matcher: &M,
subject: &[u8],
mut subject: &[u8],
range: std::ops::Range<usize>,
replacement: &[u8],
) -> io::Result<()> {
// See the giant comment in 'find_iter_at_in_context' below for why we
// do this dance.
let is_multi_line = searcher.multi_line_with_matcher(&matcher);
if is_multi_line {
if subject[range.end..].len() >= MAX_LOOK_AHEAD {
subject = &subject[..range.end + MAX_LOOK_AHEAD];
}
} else {
// When searching a single line, we should remove the line
// terminator. Otherwise, it's possible for the regex (via
// look-around) to observe the line terminator and not match
// because of it.
let mut m = Match::new(0, range.end);
trim_line_terminator(searcher, subject, &mut m);
subject = &subject[..m.end()];
}
{
let &mut Space { ref mut dst, ref mut caps, ref mut matches } =
self.allocate(matcher)?;
dst.clear();
matches.clear();
matcher
.replace_with_captures(subject, caps, dst, |caps, dst| {
replace_with_captures_in_context(
matcher,
subject,
range.clone(),
caps,
dst,
|caps, dst| {
let start = dst.len();
caps.interpolate(
|name| matcher.capture_index(name),
@@ -74,8 +99,9 @@ impl<M: Matcher> Replacer<M> {
let end = dst.len();
matches.push(Match::new(start, end));
true
})
.map_err(io::Error::error_message)?;
},
)
.map_err(io::Error::error_message)?;
}
Ok(())
}
@@ -304,7 +330,7 @@ impl<'a> PrinterPath<'a> {
pub struct NiceDuration(pub time::Duration);
impl fmt::Display for NiceDuration {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{:0.6}s", self.fractional_seconds())
}
}
@@ -357,3 +383,108 @@ pub fn trim_ascii_prefix(
.count();
range.with_start(range.start() + count)
}
pub fn find_iter_at_in_context<M, F>(
searcher: &Searcher,
matcher: M,
mut bytes: &[u8],
range: std::ops::Range<usize>,
mut matched: F,
) -> io::Result<()>
where
M: Matcher,
F: FnMut(Match) -> bool,
{
// This strange dance is to account for the possibility of look-ahead in
// the regex. The problem here is that mat.bytes() doesn't include the
// lines beyond the match boundaries in mulit-line mode, which means that
// when we try to rediscover the full set of matches here, the regex may no
// longer match if it required some look-ahead beyond the matching lines.
//
// PCRE2 (and the grep-matcher interfaces) has no way of specifying an end
// bound of the search. So we kludge it and let the regex engine search the
// rest of the buffer... But to avoid things getting too crazy, we cap the
// buffer.
//
// If it weren't for multi-line mode, then none of this would be needed.
// Alternatively, if we refactored the grep interfaces to pass along the
// full set of matches (if available) from the searcher, then that might
// also help here. But that winds up paying an upfront unavoidable cost for
// the case where matches don't need to be counted. So then you'd have to
// introduce a way to pass along matches conditionally, only when needed.
// Yikes.
//
// Maybe the bigger picture thing here is that the searcher should be
// responsible for finding matches when necessary, and the printer
// shouldn't be involved in this business in the first place. Sigh. Live
// and learn. Abstraction boundaries are hard.
let is_multi_line = searcher.multi_line_with_matcher(&matcher);
if is_multi_line {
if bytes[range.end..].len() >= MAX_LOOK_AHEAD {
bytes = &bytes[..range.end + MAX_LOOK_AHEAD];
}
} else {
// When searching a single line, we should remove the line terminator.
// Otherwise, it's possible for the regex (via look-around) to observe
// the line terminator and not match because of it.
let mut m = Match::new(0, range.end);
trim_line_terminator(searcher, bytes, &mut m);
bytes = &bytes[..m.end()];
}
matcher
.find_iter_at(bytes, range.start, |m| {
if m.start() >= range.end {
return false;
}
matched(m)
})
.map_err(io::Error::error_message)
}
/// Given a buf and some bounds, if there is a line terminator at the end of
/// the given bounds in buf, then the bounds are trimmed to remove the line
/// terminator.
pub fn trim_line_terminator(
searcher: &Searcher,
buf: &[u8],
line: &mut Match,
) {
let lineterm = searcher.line_terminator();
if lineterm.is_suffix(&buf[*line]) {
let mut end = line.end() - 1;
if lineterm.is_crlf() && end > 0 && buf.get(end - 1) == Some(&b'\r') {
end -= 1;
}
*line = line.with_end(end);
}
}
/// Like `Matcher::replace_with_captures_at`, but accepts an end bound.
///
/// See also: `find_iter_at_in_context` for why we need this.
fn replace_with_captures_in_context<M, F>(
matcher: M,
bytes: &[u8],
range: std::ops::Range<usize>,
caps: &mut M::Captures,
dst: &mut Vec<u8>,
mut append: F,
) -> Result<(), M::Error>
where
M: Matcher,
F: FnMut(&M::Captures, &mut Vec<u8>) -> bool,
{
let mut last_match = range.start;
matcher.captures_iter_at(bytes, range.start, caps, |caps| {
let m = caps.get(0).unwrap();
if m.start() >= range.end {
return false;
}
dst.extend(&bytes[last_match..m.start()]);
last_match = m.end();
append(caps, dst)
})?;
let end = std::cmp::min(bytes.len(), range.end);
dst.extend(&bytes[last_match..end]);
Ok(())
}

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-regex"
version = "0.1.8" #:version
version = "0.1.10" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Use Rust's regex library with the 'grep' crate.
@@ -10,13 +10,14 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/regex"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/regex"
readme = "README.md"
keywords = ["regex", "grep", "search", "pattern", "line"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
aho-corasick = "0.7.3"
bstr = "0.2.10"
grep-matcher = { version = "0.1.2", path = "../matcher" }
grep-matcher = { version = "0.1.5", path = "../matcher" }
log = "0.4.5"
regex = "1.1"
regex-syntax = "0.6.5"
thread_local = "1"
thread_local = "1.1.2"

View File

@@ -4,11 +4,10 @@ The `grep-regex` crate provides an implementation of the `Matcher` trait from
the `grep-matcher` crate. This implementation permits Rust's regex engine to
be used in the `grep` crate for fast line oriented searching.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
[![](https://img.shields.io/crates/v/grep-regex.svg)](https://crates.io/crates/grep-regex)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
### Documentation
@@ -27,9 +26,3 @@ Add this to your `Cargo.toml`:
[dependencies]
grep-regex = "0.1"
```
and this to your crate root:
```rust
extern crate grep_regex;
```

View File

@@ -3,13 +3,13 @@ use regex::bytes::{Regex, RegexBuilder};
use regex_syntax::ast::{self, Ast};
use regex_syntax::hir::{self, Hir};
use ast::AstAnalysis;
use crlf::crlfify;
use error::Error;
use literal::LiteralSets;
use multi::alternation_literals;
use non_matching::non_matching_bytes;
use strip::strip_from_match;
use crate::ast::AstAnalysis;
use crate::crlf::crlfify;
use crate::error::Error;
use crate::literal::LiteralSets;
use crate::multi::alternation_literals;
use crate::non_matching::non_matching_bytes;
use crate::strip::strip_from_match;
/// Config represents the configuration of a regex matcher in this crate.
/// The configuration is itself a rough combination of the knobs found in
@@ -175,6 +175,36 @@ impl ConfiguredHIR {
self.config.crlf && self.expr.is_line_anchored_end()
}
/// Returns the line terminator configured on this expression.
///
/// When we have beginning/end anchors (NOT line anchors), the fast line
/// searching path isn't quite correct. Or at least, doesn't match the
/// slow path. Namely, the slow path strips line terminators while the
/// fast path does not. Since '$' (when multi-line mode is disabled)
/// doesn't match at line boundaries, the existence of a line terminator
/// might cause it to not match when it otherwise would with the line
/// terminator stripped.
///
/// Since searching with text anchors is exceptionally rare in the
/// context of line oriented searching (multi-line mode is basically
/// always enabled), we just disable this optimization when there are
/// text anchors. We disable it by not returning a line terminator, since
/// without a line terminator, the fast search path can't be executed.
///
/// See: https://github.com/BurntSushi/ripgrep/issues/2260
pub fn line_terminator(&self) -> Option<LineTerminator> {
if self.is_any_anchored() {
None
} else {
self.config.line_terminator
}
}
/// Returns true if and only if the underlying HIR has any text anchors.
fn is_any_anchored(&self) -> bool {
self.expr.is_any_anchored_start() || self.expr.is_any_anchored_end()
}
/// Builds a regular expression from this HIR expression.
pub fn regex(&self) -> Result<Regex, Error> {
self.pattern_to_regex(&self.expr.to_string())

View File

@@ -4,9 +4,9 @@ use grep_matcher::{Match, Matcher, NoError};
use regex::bytes::Regex;
use regex_syntax::hir::{self, Hir, HirKind};
use config::ConfiguredHIR;
use error::Error;
use matcher::RegexCaptures;
use crate::config::ConfiguredHIR;
use crate::error::Error;
use crate::matcher::RegexCaptures;
/// A matcher for implementing "word match" semantics.
#[derive(Clone, Debug)]

View File

@@ -1,7 +1,7 @@
use std::error;
use std::fmt;
use util;
use crate::util;
/// An error that can occur in this crate.
///
@@ -72,7 +72,7 @@ impl error::Error for Error {
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self.kind {
ErrorKind::Regex(ref s) => write!(f, "{}", s),
ErrorKind::NotAllowed(ref lit) => {

View File

@@ -1,20 +1,10 @@
/*!
An implementation of `grep-matcher`'s `Matcher` trait for Rust's regex engine.
*/
#![deny(missing_docs)]
extern crate aho_corasick;
extern crate bstr;
extern crate grep_matcher;
#[macro_use]
extern crate log;
extern crate regex;
extern crate regex_syntax;
extern crate thread_local;
pub use error::{Error, ErrorKind};
pub use matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
pub use crate::error::{Error, ErrorKind};
pub use crate::matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
mod ast;
mod config;

View File

@@ -9,7 +9,7 @@ use bstr::ByteSlice;
use regex_syntax::hir::literal::{Literal, Literals};
use regex_syntax::hir::{self, Hir, HirKind};
use util;
use crate::util;
/// Represents prefix, suffix and inner "required" literals for a regular
/// expression.
@@ -55,7 +55,7 @@ impl LiteralSets {
if !word {
if self.prefixes.all_complete() && !self.prefixes.is_empty() {
debug!("literal prefixes detected: {:?}", self.prefixes);
log::debug!("literal prefixes detected: {:?}", self.prefixes);
// When this is true, the regex engine will do a literal scan,
// so we don't need to return anything. But we only do this
// if we aren't doing a word regex, since a word regex adds
@@ -106,7 +106,7 @@ impl LiteralSets {
&& !any_empty
&& !any_white
{
debug!("required literals found: {:?}", req_lits);
log::debug!("required literals found: {:?}", req_lits);
let alts: Vec<String> = req_lits
.into_iter()
.map(|x| util::bytes_to_regex(x))
@@ -149,27 +149,27 @@ impl LiteralSets {
let lits = match (p_min_len, s_min_len) {
(None, None) => return None,
(Some(_), None) => {
debug!("prefix literals found");
log::debug!("prefix literals found");
self.prefixes.literals()
}
(None, Some(_)) => {
debug!("suffix literals found");
log::debug!("suffix literals found");
self.suffixes.literals()
}
(Some(p), Some(s)) => {
if p >= s {
debug!("prefix literals found");
log::debug!("prefix literals found");
self.prefixes.literals()
} else {
debug!("suffix literals found");
log::debug!("suffix literals found");
self.suffixes.literals()
}
}
};
debug!("prefix/suffix literals found: {:?}", lits);
log::debug!("prefix/suffix literals found: {:?}", lits);
if has_only_whitespace(lits) {
debug!("dropping literals because one was whitespace");
log::debug!("dropping literals because one was whitespace");
return None;
}
let alts: Vec<String> =
@@ -177,9 +177,9 @@ impl LiteralSets {
// We're matching raw bytes, so disable Unicode mode.
Some(format!("(?-u:{})", alts.join("|")))
} else {
debug!("required literal found: {:?}", util::show_bytes(lit));
log::debug!("required literal found: {:?}", util::show_bytes(lit));
if lit.chars().all(|c| c.is_whitespace()) {
debug!("dropping literal because one was whitespace");
log::debug!("dropping literal because one was whitespace");
return None;
}
Some(format!("(?-u:{})", util::bytes_to_regex(&lit)))

View File

@@ -5,11 +5,11 @@ use grep_matcher::{
};
use regex::bytes::{CaptureLocations, Regex};
use config::{Config, ConfiguredHIR};
use crlf::CRLFMatcher;
use error::Error;
use multi::MultiLiteralMatcher;
use word::WordMatcher;
use crate::config::{Config, ConfiguredHIR};
use crate::crlf::CRLFMatcher;
use crate::error::Error;
use crate::multi::MultiLiteralMatcher;
use crate::word::WordMatcher;
/// A builder for constructing a `Matcher` using regular expressions.
///
@@ -19,7 +19,7 @@ use word::WordMatcher;
/// types of optimizations.
///
/// The syntax supported is documented as part of the regex crate:
/// https://docs.rs/regex/*/regex/#syntax
/// <https://docs.rs/regex/#syntax>.
#[derive(Clone, Debug)]
pub struct RegexMatcherBuilder {
config: Config,
@@ -41,19 +41,23 @@ impl RegexMatcherBuilder {
/// pattern.
///
/// The syntax supported is documented as part of the regex crate:
/// https://docs.rs/regex/*/regex/#syntax
/// <https://docs.rs/regex/#syntax>.
pub fn build(&self, pattern: &str) -> Result<RegexMatcher, Error> {
let chir = self.config.hir(pattern)?;
let fast_line_regex = chir.fast_line_regex()?;
let non_matching_bytes = chir.non_matching_bytes();
if let Some(ref re) = fast_line_regex {
debug!("extracted fast line regex: {:?}", re);
log::debug!("extracted fast line regex: {:?}", re);
}
let matcher = RegexMatcherImpl::new(&chir)?;
trace!("final regex: {:?}", matcher.regex());
log::trace!("final regex: {:?}", matcher.regex());
let mut config = self.config.clone();
// We override the line terminator in case the configured expr doesn't
// support it.
config.line_terminator = chir.line_terminator();
Ok(RegexMatcher {
config: self.config.clone(),
config,
matcher,
fast_line_regex,
non_matching_bytes,

View File

@@ -2,8 +2,8 @@ use aho_corasick::{AhoCorasick, AhoCorasickBuilder, MatchKind};
use grep_matcher::{Match, Matcher, NoError};
use regex_syntax::hir::Hir;
use error::Error;
use matcher::RegexCaptures;
use crate::error::Error;
use crate::matcher::RegexCaptures;
/// A matcher for an alternation of literals.
///

View File

@@ -13,7 +13,10 @@ pub fn non_matching_bytes(expr: &Hir) -> ByteSet {
/// the given expression.
fn remove_matching_bytes(expr: &Hir, set: &mut ByteSet) {
match *expr.kind() {
HirKind::Empty | HirKind::Anchor(_) | HirKind::WordBoundary(_) => {}
HirKind::Empty | HirKind::WordBoundary(_) => {}
HirKind::Anchor(_) => {
set.remove(b'\n');
}
HirKind::Literal(hir::Literal::Unicode(c)) => {
for &b in c.encode_utf8(&mut [0; 4]).as_bytes() {
set.remove(b);
@@ -125,4 +128,12 @@ mod tests {
assert_eq!(sparse(&extract(r"\xFF")), sparse_except(&[0xC3, 0xBF]));
assert_eq!(sparse(&extract(r"(?-u)\xFF")), sparse_except(&[0xFF]));
}
#[test]
fn anchor() {
assert_eq!(sparse(&extract(r"^")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"$")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"\A")), sparse_except(&[b'\n']));
assert_eq!(sparse(&extract(r"\z")), sparse_except(&[b'\n']));
}
}

View File

@@ -1,7 +1,7 @@
use grep_matcher::LineTerminator;
use regex_syntax::hir::{self, Hir, HirKind};
use error::{Error, ErrorKind};
use crate::error::{Error, ErrorKind};
/// Return an HIR that is guaranteed to never match the given line terminator,
/// if possible.
@@ -106,7 +106,7 @@ mod tests {
use regex_syntax::Parser;
use super::{strip_from_match, LineTerminator};
use error::Error;
use crate::error::Error;
fn roundtrip(pattern: &str, byte: u8) -> String {
roundtrip_line_term(pattern, LineTerminator::byte(byte)).unwrap()

View File

@@ -4,11 +4,11 @@ use std::sync::Arc;
use grep_matcher::{Match, Matcher, NoError};
use regex::bytes::{CaptureLocations, Regex};
use thread_local::CachedThreadLocal;
use thread_local::ThreadLocal;
use config::ConfiguredHIR;
use error::Error;
use matcher::RegexCaptures;
use crate::config::ConfiguredHIR;
use crate::error::Error;
use crate::matcher::RegexCaptures;
/// A matcher for implementing "word match" semantics.
#[derive(Debug)]
@@ -21,19 +21,19 @@ pub struct WordMatcher {
/// A map from capture group name to capture group index.
names: HashMap<String, usize>,
/// A reusable buffer for finding the match location of the inner group.
locs: Arc<CachedThreadLocal<RefCell<CaptureLocations>>>,
locs: Arc<ThreadLocal<RefCell<CaptureLocations>>>,
}
impl Clone for WordMatcher {
fn clone(&self) -> WordMatcher {
// We implement Clone manually so that we get a fresh CachedThreadLocal
// such that it can set its own thread owner. This permits each thread
// We implement Clone manually so that we get a fresh ThreadLocal such
// that it can set its own thread owner. This permits each thread
// usings `locs` to hit the fast path.
WordMatcher {
regex: self.regex.clone(),
original: self.original.clone(),
names: self.names.clone(),
locs: Arc::new(CachedThreadLocal::new()),
locs: Arc::new(ThreadLocal::new()),
}
}
}
@@ -48,12 +48,12 @@ impl WordMatcher {
let original =
expr.with_pattern(|pat| format!("^(?:{})$", pat))?.regex()?;
let word_expr = expr.with_pattern(|pat| {
let pat = format!(r"(?:(?-m:^)|\W)({})(?:(?-m:$)|\W)", pat);
debug!("word regex: {:?}", pat);
let pat = format!(r"(?:(?m:^)|\W)({})(?:\W|(?m:$))", pat);
log::debug!("word regex: {:?}", pat);
pat
})?;
let regex = word_expr.regex()?;
let locs = Arc::new(CachedThreadLocal::new());
let locs = Arc::new(ThreadLocal::new());
let mut names = HashMap::new();
for (i, optional_name) in regex.capture_names().enumerate() {
@@ -111,8 +111,15 @@ impl WordMatcher {
}
let (_, slen) = bstr::decode_utf8(&haystack[cand]);
let (_, elen) = bstr::decode_last_utf8(&haystack[cand]);
cand =
cand.with_start(cand.start() + slen).with_end(cand.end() - elen);
let new_start = cand.start() + slen;
let new_end = cand.end() - elen;
// This occurs the original regex can match the empty string. In this
// case, just bail instead of trying to get it right here since it's
// likely a pathological case.
if new_start > new_end {
return Err(());
}
cand = cand.with_start(new_start).with_end(new_end);
if self.original.is_match(&haystack[cand]) {
Ok(Some(cand))
} else {
@@ -184,7 +191,7 @@ impl Matcher for WordMatcher {
#[cfg(test)]
mod tests {
use super::WordMatcher;
use config::Config;
use crate::config::Config;
use grep_matcher::{Captures, Match, Matcher};
fn matcher(pattern: &str) -> WordMatcher {
@@ -237,6 +244,8 @@ mod tests {
assert_eq!(Some((2, 5)), find(r"!?foo!?", "a!foo!a"));
assert_eq!(Some((2, 7)), find(r"!?foo!?", "##!foo!\n"));
assert_eq!(Some((3, 8)), find(r"!?foo!?", "##\n!foo!##"));
assert_eq!(Some((3, 8)), find(r"!?foo!?", "##\n!foo!\n##"));
assert_eq!(Some((3, 7)), find(r"f?oo!?", "##\nfoo!##"));
assert_eq!(Some((2, 5)), find(r"(?-u)foo[^a]*", "#!foo☃aaa"));
}

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-searcher"
version = "0.1.7" #:version
version = "0.1.10" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Fast line oriented regex searching as a library.
@@ -10,19 +10,20 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/searcher"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/searcher"
readme = "README.md"
keywords = ["regex", "grep", "egrep", "search", "pattern"]
license = "Unlicense/MIT"
license = "Unlicense OR MIT"
edition = "2018"
[dependencies]
bstr = { version = "0.2.0", default-features = false, features = ["std"] }
bytecount = "0.6"
encoding_rs = "0.8.14"
encoding_rs_io = "0.1.6"
grep-matcher = { version = "0.1.2", path = "../matcher" }
grep-matcher = { version = "0.1.5", path = "../matcher" }
log = "0.4.5"
memmap = "0.7"
memmap = { package = "memmap2", version = "0.5.3" }
[dev-dependencies]
grep-regex = { version = "0.1.3", path = "../regex" }
grep-regex = { version = "0.1.10", path = "../regex" }
regex = "1.1"
[features]

View File

@@ -5,11 +5,10 @@ things like reporting contextual lines, counting lines, inverting a search,
detecting binary data, automatic UTF-16 transcoding and deciding whether or not
to use memory maps.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
[![](https://img.shields.io/crates/v/grep-searcher.svg)](https://crates.io/crates/grep-searcher)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
### Documentation
@@ -29,9 +28,3 @@ Add this to your `Cargo.toml`:
[dependencies]
grep-searcher = "0.1"
```
and this to your crate root:
```rust
extern crate grep_searcher;
```

View File

@@ -1,6 +1,3 @@
extern crate grep_regex;
extern crate grep_searcher;
use std::env;
use std::error::Error;
use std::io;

View File

@@ -48,10 +48,6 @@ using the
implementation of `Sink`.
```
extern crate grep_matcher;
extern crate grep_regex;
extern crate grep_searcher;
use std::error::Error;
use grep_matcher::Matcher;
@@ -99,24 +95,13 @@ searches stdin.
#![deny(missing_docs)]
extern crate bstr;
extern crate bytecount;
extern crate encoding_rs;
extern crate encoding_rs_io;
extern crate grep_matcher;
#[macro_use]
extern crate log;
extern crate memmap;
#[cfg(test)]
extern crate regex;
pub use lines::{LineIter, LineStep};
pub use searcher::{
pub use crate::lines::{LineIter, LineStep};
pub use crate::searcher::{
BinaryDetection, ConfigError, Encoding, MmapChoice, Searcher,
SearcherBuilder,
};
pub use sink::sinks;
pub use sink::{
pub use crate::sink::sinks;
pub use crate::sink::{
Sink, SinkContext, SinkContextKind, SinkError, SinkFinish, SinkMatch,
};

View File

@@ -4,7 +4,7 @@ use std::io;
use bstr::ByteSlice;
/// The default buffer capacity that we use for the line buffer.
pub(crate) const DEFAULT_BUFFER_CAPACITY: usize = 8 * (1 << 10); // 8 KB
pub(crate) const DEFAULT_BUFFER_CAPACITY: usize = 64 * (1 << 10); // 64 KB
/// The behavior of a searcher in the face of long lines and big contexts.
///

View File

@@ -2,13 +2,13 @@ use std::cmp;
use bstr::ByteSlice;
use grep_matcher::{LineMatchKind, Matcher};
use line_buffer::BinaryDetection;
use lines::{self, LineStep};
use searcher::{Config, Range, Searcher};
use sink::{
use crate::line_buffer::BinaryDetection;
use crate::lines::{self, LineStep};
use crate::searcher::{Config, Range, Searcher};
use crate::sink::{
Sink, SinkContext, SinkContextKind, SinkError, SinkFinish, SinkMatch,
};
use grep_matcher::{LineMatchKind, Matcher};
#[derive(Debug)]
pub struct Core<'s, M: 's, S> {
@@ -53,9 +53,9 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
};
if !core.searcher.multi_line_with_matcher(&core.matcher) {
if core.is_line_by_line_fast() {
trace!("searcher core: will use fast line searcher");
log::trace!("searcher core: will use fast line searcher");
} else {
trace!("searcher core: will use slow line searcher");
log::trace!("searcher core: will use slow line searcher");
}
}
core
@@ -441,6 +441,8 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
bytes: linebuf,
absolute_byte_offset: offset,
line_number: self.line_number,
buffer: buf,
bytes_range_in_buffer: range.start()..range.end(),
},
)?;
if !keepgoing {
@@ -465,6 +467,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
let keepgoing = self.sink.context(
&self.searcher,
&SinkContext {
#[cfg(test)]
line_term: self.config.line_term,
bytes: &buf[*range],
kind: SinkContextKind::Before,
@@ -495,6 +498,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
let keepgoing = self.sink.context(
&self.searcher,
&SinkContext {
#[cfg(test)]
line_term: self.config.line_term,
bytes: &buf[*range],
kind: SinkContextKind::After,
@@ -524,6 +528,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
let keepgoing = self.sink.context(
&self.searcher,
&SinkContext {
#[cfg(test)]
line_term: self.config.line_term,
bytes: &buf[*range],
kind: SinkContextKind::Other,

View File

@@ -1,16 +1,16 @@
use std::cmp;
use std::io;
use crate::line_buffer::{LineBufferReader, DEFAULT_BUFFER_CAPACITY};
use crate::lines::{self, LineStep};
use crate::sink::{Sink, SinkError};
use grep_matcher::Matcher;
use line_buffer::{LineBufferReader, DEFAULT_BUFFER_CAPACITY};
use lines::{self, LineStep};
use sink::{Sink, SinkError};
use searcher::core::Core;
use searcher::{Config, Range, Searcher};
use crate::searcher::core::Core;
use crate::searcher::{Config, Range, Searcher};
#[derive(Debug)]
pub struct ReadByLine<'s, M: 's, R, S> {
pub struct ReadByLine<'s, M, R, S> {
config: &'s Config,
core: Core<'s, M, S>,
rdr: LineBufferReader<'s, R>,
@@ -87,8 +87,7 @@ where
}
#[derive(Debug)]
pub struct SliceByLine<'s, M: 's, S> {
config: &'s Config,
pub struct SliceByLine<'s, M, S> {
core: Core<'s, M, S>,
slice: &'s [u8],
}
@@ -103,7 +102,6 @@ impl<'s, M: Matcher, S: Sink> SliceByLine<'s, M, S> {
debug_assert!(!searcher.multi_line_with_matcher(&matcher));
SliceByLine {
config: &searcher.config,
core: Core::new(searcher, matcher, write_to, true),
slice: slice,
}
@@ -134,7 +132,7 @@ impl<'s, M: Matcher, S: Sink> SliceByLine<'s, M, S> {
}
#[derive(Debug)]
pub struct MultiLine<'s, M: 's, S> {
pub struct MultiLine<'s, M, S> {
config: &'s Config,
core: Core<'s, M, S>,
slice: &'s [u8],
@@ -226,10 +224,19 @@ impl<'s, M: Matcher, S: Sink> MultiLine<'s, M, S> {
}
Some(last_match) => {
// If the lines in the previous match overlap with the lines
// in this match, then simply grow the match and move on.
// This happens when the next match begins on the same line
// that the last match ends on.
if last_match.end() > line.start() {
// in this match, then simply grow the match and move on. This
// happens when the next match begins on the same line that the
// last match ends on.
//
// Note that we do not technically require strict overlap here.
// Instead, we only require that the lines are adjacent. This
// provides larger blocks of lines to the printer, and results
// in overall better behavior with respect to how replacements
// are handled.
//
// See: https://github.com/BurntSushi/ripgrep/issues/1311
// And also the associated commit fixing #1311.
if last_match.end() >= line.start() {
self.last_match = Some(last_match.with_end(line.end()));
Ok(true)
} else {
@@ -340,8 +347,8 @@ impl<'s, M: Matcher, S: Sink> MultiLine<'s, M, S> {
#[cfg(test)]
mod tests {
use searcher::{BinaryDetection, SearcherBuilder};
use testutil::{KitchenSink, RegexMatcher, SearcherTester};
use crate::searcher::{BinaryDetection, SearcherBuilder};
use crate::testutil::{KitchenSink, RegexMatcher, SearcherTester};
use super::*;
@@ -633,7 +640,7 @@ d
haystack.push_str("a\n");
let byte_count = haystack.len();
let exp = format!("0:a\n131186:a\n\nbyte count:{}\n", byte_count);
let exp = format!("0:a\n1048690:a\n\nbyte count:{}\n", byte_count);
SearcherTester::new(&haystack, "a")
.line_number(false)
@@ -714,21 +721,23 @@ d
haystack.push_str("zzz\n");
}
haystack.push_str("a\n");
haystack.push_str("zzz\n");
haystack.push_str("a\x00a\n");
haystack.push_str("zzz\n");
haystack.push_str("a\n");
// The line buffered searcher has slightly different semantics here.
// Namely, it will *always* detect binary data in the current buffer
// before searching it. Thus, the total number of bytes searched is
// smaller than below.
let exp = "0:a\n\nbyte count:32770\nbinary offset:32773\n";
let exp = "0:a\n\nbyte count:262146\nbinary offset:262153\n";
// In contrast, the slice readers (for multi line as well) will only
// look for binary data in the initial chunk of bytes. After that
// point, it only looks for binary data in matches. Note though that
// the binary offset remains the same. (See the binary4 test for a case
// where the offset is explicitly different.)
let exp_slice =
"0:a\n32770:a\n\nbyte count:32773\nbinary offset:32773\n";
"0:a\n262146:a\n\nbyte count:262153\nbinary offset:262153\n";
SearcherTester::new(&haystack, "a")
.binary_detection(BinaryDetection::quit(0))
@@ -755,12 +764,12 @@ d
haystack.push_str("a\x00a\n");
haystack.push_str("a\n");
let exp = "0:a\n\nbyte count:32770\nbinary offset:32773\n";
let exp = "0:a\n\nbyte count:262146\nbinary offset:262149\n";
// The binary offset for the Slice readers corresponds to the binary
// data in `a\x00a\n` since the first line with binary data
// (`b\x00b\n`) isn't part of a match, and is therefore undetected.
let exp_slice =
"0:a\n32770:a\n\nbyte count:32777\nbinary offset:32777\n";
"0:a\n262146:a\n\nbyte count:262153\nbinary offset:262153\n";
SearcherTester::new(&haystack, "a")
.binary_detection(BinaryDetection::quit(0))
@@ -1477,8 +1486,8 @@ byte count:307
#[test]
fn scratch() {
use sinks;
use testutil::RegexMatcher;
use crate::sinks;
use crate::testutil::RegexMatcher;
const SHERLOCK: &'static [u8] = b"\
For the Doctor Wat\xFFsons of this world, as opposed to the Sherlock
@@ -1503,4 +1512,31 @@ and exhibited clearly, with a label attached.\
)
.unwrap();
}
// See: https://github.com/BurntSushi/ripgrep/issues/2260
#[test]
fn regression_2260() {
use grep_regex::RegexMatcherBuilder;
use crate::SearcherBuilder;
let matcher = RegexMatcherBuilder::new()
.line_terminator(Some(b'\n'))
.build(r"^\w+$")
.unwrap();
let mut searcher = SearcherBuilder::new().line_number(true).build();
let mut matched = false;
searcher
.search_slice(
&matcher,
b"GATC\n",
crate::sinks::UTF8(|_, _| {
matched = true;
Ok(true)
}),
)
.unwrap();
assert!(matched);
}
}

View File

@@ -71,6 +71,16 @@ impl MmapChoice {
if !self.is_enabled() {
return None;
}
if !cfg!(target_pointer_width = "64") {
// For 32-bit systems, it looks like mmap will succeed even if it
// can't address the entire file. This seems to happen at least on
// Windows, even though it uses to work prior to ripgrep 13. The
// only Windows-related change in ripgrep 13, AFAIK, was statically
// linking vcruntime. So maybe that's related? But I'm not sure.
//
// See: https://github.com/BurntSushi/ripgrep/issues/1911
return None;
}
if cfg!(target_os = "macos") {
// I guess memory maps on macOS aren't great. Should re-evaluate.
return None;
@@ -83,13 +93,13 @@ impl MmapChoice {
Ok(mmap) => Some(mmap),
Err(err) => {
if let Some(path) = path {
debug!(
log::debug!(
"{}: failed to open memory map: {}",
path.display(),
err
);
} else {
debug!("failed to open memory map: {}", err);
log::debug!("failed to open memory map: {}", err);
}
None
}

View File

@@ -5,15 +5,15 @@ use std::fs::File;
use std::io::{self, Read};
use std::path::Path;
use encoding_rs;
use encoding_rs_io::DecodeReaderBytesBuilder;
use grep_matcher::{LineTerminator, Match, Matcher};
use line_buffer::{
use crate::line_buffer::{
self, alloc_error, BufferAllocation, LineBuffer, LineBufferBuilder,
LineBufferReader, DEFAULT_BUFFER_CAPACITY,
};
use searcher::glue::{MultiLine, ReadByLine, SliceByLine};
use sink::{Sink, SinkError};
use crate::searcher::glue::{MultiLine, ReadByLine, SliceByLine};
use crate::sink::{Sink, SinkError};
use encoding_rs;
use encoding_rs_io::DecodeReaderBytesBuilder;
use grep_matcher::{LineTerminator, Match, Matcher};
pub use self::mmap::MmapChoice;
@@ -263,7 +263,7 @@ impl ::std::error::Error for ConfigError {
}
impl fmt::Display for ConfigError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match *self {
ConfigError::SearchUnavailable => {
write!(f, "grep config error: no available searchers")
@@ -659,16 +659,19 @@ impl Searcher {
S: Sink,
{
if let Some(mmap) = self.config.mmap.open(file, path) {
trace!("{:?}: searching via memory map", path);
log::trace!("{:?}: searching via memory map", path);
return self.search_slice(matcher, &mmap, write_to);
}
// Fast path for multi-line searches of files when memory maps are
// not enabled. This pre-allocates a buffer roughly the size of the
// file, which isn't possible when searching an arbitrary io::Read.
if self.multi_line_with_matcher(&matcher) {
trace!("{:?}: reading entire file on to heap for mulitline", path);
log::trace!(
"{:?}: reading entire file on to heap for mulitline",
path
);
self.fill_multi_line_buffer_from_file::<S>(file)?;
trace!("{:?}: searching via multiline strategy", path);
log::trace!("{:?}: searching via multiline strategy", path);
MultiLine::new(
self,
matcher,
@@ -677,7 +680,7 @@ impl Searcher {
)
.run()
} else {
trace!("{:?}: searching using generic reader", path);
log::trace!("{:?}: searching using generic reader", path);
self.search_reader(matcher, file, write_to)
}
}
@@ -707,15 +710,17 @@ impl Searcher {
self.check_config(&matcher).map_err(S::Error::error_config)?;
let mut decode_buffer = self.decode_buffer.borrow_mut();
let read_from = self
let decoder = self
.decode_builder
.build_with_buffer(read_from, &mut *decode_buffer)
.map_err(S::Error::error_io)?;
if self.multi_line_with_matcher(&matcher) {
trace!("generic reader: reading everything to heap for multiline");
self.fill_multi_line_buffer_from_reader::<_, S>(read_from)?;
trace!("generic reader: searching via multiline strategy");
log::trace!(
"generic reader: reading everything to heap for multiline"
);
self.fill_multi_line_buffer_from_reader::<_, S>(decoder)?;
log::trace!("generic reader: searching via multiline strategy");
MultiLine::new(
self,
matcher,
@@ -725,8 +730,8 @@ impl Searcher {
.run()
} else {
let mut line_buffer = self.line_buffer.borrow_mut();
let rdr = LineBufferReader::new(read_from, &mut *line_buffer);
trace!("generic reader: searching via roll buffer strategy");
let rdr = LineBufferReader::new(decoder, &mut *line_buffer);
log::trace!("generic reader: searching via roll buffer strategy");
ReadByLine::new(self, matcher, rdr, write_to).run()
}
}
@@ -747,14 +752,16 @@ impl Searcher {
// We can search the slice directly, unless we need to do transcoding.
if self.slice_needs_transcoding(slice) {
trace!("slice reader: needs transcoding, using generic reader");
log::trace!(
"slice reader: needs transcoding, using generic reader"
);
return self.search_reader(matcher, slice, write_to);
}
if self.multi_line_with_matcher(&matcher) {
trace!("slice reader: searching via multiline strategy");
log::trace!("slice reader: searching via multiline strategy");
MultiLine::new(self, matcher, slice, write_to).run()
} else {
trace!("slice reader: searching via slice-by-line strategy");
log::trace!("slice reader: searching via slice-by-line strategy");
SliceByLine::new(self, matcher, slice, write_to).run()
}
}
@@ -788,7 +795,7 @@ impl Searcher {
/// Returns true if and only if the given slice needs to be transcoded.
fn slice_needs_transcoding(&self, slice: &[u8]) -> bool {
self.config.encoding.is_some()
|| (self.config.bom_sniffing && slice_has_utf16_bom(slice))
|| (self.config.bom_sniffing && slice_has_bom(slice))
}
}
@@ -973,22 +980,24 @@ impl Searcher {
}
}
/// Returns true if and only if the given slice begins with a UTF-16 BOM.
/// Returns true if and only if the given slice begins with a UTF-8 or UTF-16
/// BOM.
///
/// This is used by the searcher to determine if a transcoder is necessary.
/// Otherwise, it is advantageous to search the slice directly.
fn slice_has_utf16_bom(slice: &[u8]) -> bool {
fn slice_has_bom(slice: &[u8]) -> bool {
let enc = match encoding_rs::Encoding::for_bom(slice) {
None => return false,
Some((enc, _)) => enc,
};
[encoding_rs::UTF_16LE, encoding_rs::UTF_16BE].contains(&enc)
[encoding_rs::UTF_16LE, encoding_rs::UTF_16BE, encoding_rs::UTF_8]
.contains(&enc)
}
#[cfg(test)]
mod tests {
use super::*;
use testutil::{KitchenSink, RegexMatcher};
use crate::testutil::{KitchenSink, RegexMatcher};
#[test]
fn config_error_heap_limit() {
@@ -1009,4 +1018,21 @@ mod tests {
let res = searcher.search_slice(matcher, &[], sink);
assert!(res.is_err());
}
#[test]
fn uft8_bom_sniffing() {
// See: https://github.com/BurntSushi/ripgrep/issues/1638
// ripgrep must sniff utf-8 BOM, just like it does with utf-16
let matcher = RegexMatcher::new("foo");
let haystack: &[u8] = &[0xef, 0xbb, 0xbf, 0x66, 0x6f, 0x6f];
let mut sink = KitchenSink::new();
let mut searcher = SearcherBuilder::new().build();
let res = searcher.search_slice(matcher, haystack, &mut sink);
assert!(res.is_ok());
let sink_output = String::from_utf8(sink.as_bytes().to_vec()).unwrap();
assert_eq!(sink_output, "1:0:foo\nbyte count:3\n");
}
}

View File

@@ -4,8 +4,8 @@ use std::io;
use grep_matcher::LineTerminator;
use lines::LineIter;
use searcher::{ConfigError, Searcher};
use crate::lines::LineIter;
use crate::searcher::{ConfigError, Searcher};
/// A trait that describes errors that can be reported by searchers and
/// implementations of `Sink`.
@@ -121,7 +121,7 @@ pub trait Sink {
fn matched(
&mut self,
_searcher: &Searcher,
_mat: &SinkMatch,
_mat: &SinkMatch<'_>,
) -> Result<bool, Self::Error>;
/// This method is called whenever a context line is found, and is optional
@@ -140,7 +140,7 @@ pub trait Sink {
fn context(
&mut self,
_searcher: &Searcher,
_context: &SinkContext,
_context: &SinkContext<'_>,
) -> Result<bool, Self::Error> {
Ok(true)
}
@@ -226,7 +226,7 @@ impl<'a, S: Sink> Sink for &'a mut S {
fn matched(
&mut self,
searcher: &Searcher,
mat: &SinkMatch,
mat: &SinkMatch<'_>,
) -> Result<bool, S::Error> {
(**self).matched(searcher, mat)
}
@@ -235,7 +235,7 @@ impl<'a, S: Sink> Sink for &'a mut S {
fn context(
&mut self,
searcher: &Searcher,
context: &SinkContext,
context: &SinkContext<'_>,
) -> Result<bool, S::Error> {
(**self).context(searcher, context)
}
@@ -279,7 +279,7 @@ impl<S: Sink + ?Sized> Sink for Box<S> {
fn matched(
&mut self,
searcher: &Searcher,
mat: &SinkMatch,
mat: &SinkMatch<'_>,
) -> Result<bool, S::Error> {
(**self).matched(searcher, mat)
}
@@ -288,7 +288,7 @@ impl<S: Sink + ?Sized> Sink for Box<S> {
fn context(
&mut self,
searcher: &Searcher,
context: &SinkContext,
context: &SinkContext<'_>,
) -> Result<bool, S::Error> {
(**self).context(searcher, context)
}
@@ -365,6 +365,8 @@ pub struct SinkMatch<'b> {
pub(crate) bytes: &'b [u8],
pub(crate) absolute_byte_offset: u64,
pub(crate) line_number: Option<u64>,
pub(crate) buffer: &'b [u8],
pub(crate) bytes_range_in_buffer: std::ops::Range<usize>,
}
impl<'b> SinkMatch<'b> {
@@ -405,6 +407,18 @@ impl<'b> SinkMatch<'b> {
pub fn line_number(&self) -> Option<u64> {
self.line_number
}
/// TODO
#[inline]
pub fn buffer(&self) -> &'b [u8] {
self.buffer
}
/// TODO
#[inline]
pub fn bytes_range_in_buffer(&self) -> std::ops::Range<usize> {
self.bytes_range_in_buffer.clone()
}
}
/// The type of context reported by a searcher.
@@ -422,6 +436,7 @@ pub enum SinkContextKind {
/// A type that describes a contextual line reported by a searcher.
#[derive(Clone, Debug)]
pub struct SinkContext<'b> {
#[cfg(test)]
pub(crate) line_term: LineTerminator,
pub(crate) bytes: &'b [u8],
pub(crate) kind: SinkContextKind,
@@ -500,7 +515,7 @@ pub mod sinks {
use std::str;
use super::{Sink, SinkError, SinkMatch};
use searcher::Searcher;
use crate::searcher::Searcher;
/// A sink that provides line numbers and matches as strings while ignoring
/// everything else.
@@ -531,7 +546,7 @@ pub mod sinks {
fn matched(
&mut self,
_searcher: &Searcher,
mat: &SinkMatch,
mat: &SinkMatch<'_>,
) -> Result<bool, io::Error> {
let matched = match str::from_utf8(mat.bytes()) {
Ok(matched) => matched,
@@ -579,7 +594,7 @@ pub mod sinks {
fn matched(
&mut self,
_searcher: &Searcher,
mat: &SinkMatch,
mat: &SinkMatch<'_>,
) -> Result<bool, io::Error> {
use std::borrow::Cow;
@@ -629,7 +644,7 @@ pub mod sinks {
fn matched(
&mut self,
_searcher: &Searcher,
mat: &SinkMatch,
mat: &SinkMatch<'_>,
) -> Result<bool, io::Error> {
let line_number = match mat.line_number() {
Some(line_number) => line_number,

View File

@@ -7,8 +7,8 @@ use grep_matcher::{
};
use regex::bytes::{Regex, RegexBuilder};
use searcher::{BinaryDetection, Searcher, SearcherBuilder};
use sink::{Sink, SinkContext, SinkFinish, SinkMatch};
use crate::searcher::{BinaryDetection, Searcher, SearcherBuilder};
use crate::sink::{Sink, SinkContext, SinkFinish, SinkMatch};
/// A simple regex matcher.
///
@@ -129,7 +129,7 @@ impl Sink for KitchenSink {
fn matched(
&mut self,
_searcher: &Searcher,
mat: &SinkMatch,
mat: &SinkMatch<'_>,
) -> Result<bool, io::Error> {
assert!(!mat.bytes().is_empty());
assert!(mat.lines().count() >= 1);
@@ -152,7 +152,7 @@ impl Sink for KitchenSink {
fn context(
&mut self,
_searcher: &Searcher,
context: &SinkContext,
context: &SinkContext<'_>,
) -> Result<bool, io::Error> {
assert!(!context.bytes().is_empty());
assert!(context.lines().count() == 1);

View File

@@ -3,7 +3,7 @@ rg(1)
Name
----
rg - recursively search current directory for lines matching a pattern
rg - recursively search the current directory for lines matching a pattern
Synopsis
@@ -27,7 +27,7 @@ Synopsis
DESCRIPTION
-----------
ripgrep (rg) recursively searches your current directory for a regex pattern.
ripgrep (rg) recursively searches the current directory for a regex pattern.
By default, ripgrep will respect your .gitignore and automatically skip hidden
files/directories and binary files.
@@ -82,10 +82,10 @@ _PATH_::
OPTIONS
-------
Note that for many options, there exist flags to disable them. In some cases,
those flags are not listed in a first class way below. For example, the
*--column* flag (listed below) enables column numbers in ripgrep's output, but
the *--no-column* flag (not listed below) disables them. The reverse can also
Note that many options can be disabled via flags. In some cases, those flags
are not listed in a first class way below. For example, the *--column*
flag (listed below) enables column numbers in ripgrep's output, but the
*--no-column* flag (not listed below) disables them. The reverse can also
exist. For example, the *--no-ignore* flag (listed below) disables ripgrep's
*gitignore* logic, but the *--ignore* flag (not listed below) enables it. These
flags are useful for overriding a ripgrep configuration file on the command
@@ -166,7 +166,7 @@ Each of the types of filtering can be configured via command line flags:
* There are several flags starting with '--no-ignore' that toggle which,
if any, ignore rules are respected. '--no-ignore' by itself will disable
all of them.
* '--hidden' will force ripgrep to search hidden files and directories.
* '-./--hidden' will force ripgrep to search hidden files and directories.
* '--binary' will force ripgrep to search binary files.
* '-L/--follow' will force ripgrep to follow symlinks.
@@ -191,10 +191,12 @@ simple. It is defined by two rules:
whitespace) are ignored.
ripgrep will look for a single configuration file if and only if the
*RIPGREP_CONFIG_PATH* environment variable is set and is non-empty.
ripgrep will parse shell arguments from this file on startup and will
behave as if the arguments in this file were prepended to any explicit
arguments given to ripgrep on the command line.
*RIPGREP_CONFIG_PATH* environment variable is set and is non-empty. ripgrep
will parse shell arguments from this file on startup and will behave as if
the arguments in this file were prepended to any explicit arguments given to
ripgrep on the command line. Note though that the 'rg' command you run must
still be valid. That is, it must always contain at least one pattern at the
command line, even if the configuration file uses the '-e/--regexp' flag.
For example, if your ripgreprc file contained a single line:

View File

@@ -1,14 +1,14 @@
class RipgrepBin < Formula
version '12.1.1'
version '13.0.0'
desc "Recursively search directories for a regex pattern."
homepage "https://github.com/BurntSushi/ripgrep"
if OS.mac?
url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-apple-darwin.tar.gz"
sha256 "7ff2fd5dd3a438d62fae5866ddae78cf542b733116f58cf21ab691a58c385703"
sha256 "585c18350cb8d4392461edd6c921e6edd5a97cbfc03b567d7bd440423e118082"
elsif OS.linux?
url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-unknown-linux-musl.tar.gz"
sha256 "88d3b735e43f6f16a0181a8fec48847693fae80168d5f889fdbdeb962f1fc804"
sha256 "ee4e0751ab108b6da4f47c52da187d5177dc371f0f512a7caaec5434e711c091"
end
conflicts_with "ripgrep"

Some files were not shown because too many files have changed in this diff Show More