mirror of
https://github.com/BurntSushi/ripgrep.git
synced 2025-07-27 02:01:58 -07:00
Compare commits
143 Commits
0.10.0
...
grep-cli-0
Author | SHA1 | Date | |
---|---|---|---|
|
d66610b295 | ||
|
019ae1989b | ||
|
36d3f235dc | ||
|
79018eb693 | ||
|
44cd344438 | ||
|
e493e54b9b | ||
|
8e8215aa65 | ||
|
3fe701498e | ||
|
e79085e9e4 | ||
|
764c197022 | ||
|
ef1611b5f5 | ||
|
45d12abbc5 | ||
|
5fde8391f9 | ||
|
3edb11c513 | ||
|
ed144be775 | ||
|
967e7ad0de | ||
|
9952ba2068 | ||
|
b751758d60 | ||
|
8f14cb18a5 | ||
|
da9d720431 | ||
|
a9d71a0368 | ||
|
f3646242cc | ||
|
601f212a0b | ||
|
5a565354f8 | ||
|
2a6532ae71 | ||
|
ece1f50cfe | ||
|
a7d26c8f14 | ||
|
bd222ae93f | ||
|
4359d8aac0 | ||
|
308819fb1f | ||
|
09108b7fda | ||
|
743d64f2e4 | ||
|
5962abc465 | ||
|
1604a18db3 | ||
|
9eeb0b01ce | ||
|
df4400209a | ||
|
77439f99a4 | ||
|
be7d6dd9ce | ||
|
9f15e3b671 | ||
|
254b8b67bb | ||
|
8a7f43b84d | ||
|
d968a27ed5 | ||
|
9b8f5cbaba | ||
|
c52da74ac3 | ||
|
7dcbff9a9b | ||
|
bef1f0e770 | ||
|
cd9815cb37 | ||
|
3f22c3a658 | ||
|
0913972104 | ||
|
f19b84fb23 | ||
|
59fc583aeb | ||
|
1c7c4e6640 | ||
|
69c5e3938d | ||
|
d9cf05ad50 | ||
|
af8b6caebb | ||
|
c84cfb6756 | ||
|
895e26a000 | ||
|
8c95290ff6 | ||
|
d6feeb7ff2 | ||
|
626ed00c19 | ||
|
332ad18401 | ||
|
fc3cf41247 | ||
|
a4868b8835 | ||
|
f99b991117 | ||
|
de0bc78982 | ||
|
147e96914c | ||
|
0abc40c23c | ||
|
f768796e4f | ||
|
da0c0c4705 | ||
|
05411b2b32 | ||
|
cc93db3b18 | ||
|
049354b766 | ||
|
386dd2806d | ||
|
5fe9a954e6 | ||
|
f158a42a71 | ||
|
5724391d39 | ||
|
0df71240ff | ||
|
f3164f2615 | ||
|
31d3e24130 | ||
|
bf842dbc7f | ||
|
6d5dba85bd | ||
|
afb89bcdad | ||
|
332dc56372 | ||
|
12a6ca45f9 | ||
|
9d703110cf | ||
|
e99b6bda0e | ||
|
276e2c9b9a | ||
|
9a9f54d44c | ||
|
47833b9ce7 | ||
|
44a9e37737 | ||
|
8fd05cacee | ||
|
4691d11034 | ||
|
519a6b68af | ||
|
9c940b45f4 | ||
|
0a167021c3 | ||
|
aeaa5fc1b1 | ||
|
7048a06c31 | ||
|
23be3cf850 | ||
|
b48bbf527d | ||
|
8eabe47b57 | ||
|
ff712bfd9d | ||
|
a7f2d48234 | ||
|
57500ad013 | ||
|
0b04553aff | ||
|
1ae121122f | ||
|
688003e51c | ||
|
718a00f6f2 | ||
|
7cbc535d70 | ||
|
7a6a40bae1 | ||
|
1e9ee2cc85 | ||
|
968491f8e9 | ||
|
63b0f31a22 | ||
|
7ecee299a5 | ||
|
dd396ff34e | ||
|
fb0a82f3c3 | ||
|
dbc8ca9cc1 | ||
|
c3db8db93d | ||
|
17ef4c40f3 | ||
|
a9e0477ea8 | ||
|
b3c5773266 | ||
|
118b950085 | ||
|
b45b2f58ea | ||
|
662a9bc73d | ||
|
401add0a99 | ||
|
f81b72721b | ||
|
1d4fccaadc | ||
|
09e464e674 | ||
|
31adff6f3c | ||
|
b41e596327 | ||
|
fb62266620 | ||
|
acf226c39d | ||
|
8299625e48 | ||
|
db256c87eb | ||
|
ba533f390e | ||
|
ba503eb677 | ||
|
f72c2dfd90 | ||
|
c0aa58b4f7 | ||
|
184ee4c328 | ||
|
e82fbf2c46 | ||
|
eb18da0450 | ||
|
0f7494216f | ||
|
442a278635 | ||
|
7ebed3ace6 |
@@ -1,4 +1,5 @@
|
||||
language: rust
|
||||
dist: xenial
|
||||
env:
|
||||
global:
|
||||
- PROJECT_NAME: ripgrep
|
||||
@@ -62,13 +63,13 @@ matrix:
|
||||
# Minimum Rust supported channel. We enable these to make sure ripgrep
|
||||
# continues to work on the advertised minimum Rust version.
|
||||
- os: linux
|
||||
rust: 1.28.0
|
||||
rust: 1.34.0
|
||||
env: TARGET=x86_64-unknown-linux-gnu
|
||||
- os: linux
|
||||
rust: 1.28.0
|
||||
rust: 1.34.0
|
||||
env: TARGET=x86_64-unknown-linux-musl
|
||||
- os: linux
|
||||
rust: 1.28.0
|
||||
rust: 1.34.0
|
||||
env: TARGET=arm-unknown-linux-gnueabihf GCC_VERSION=4.8
|
||||
addons:
|
||||
apt:
|
||||
|
128
CHANGELOG.md
128
CHANGELOG.md
@@ -1,3 +1,131 @@
|
||||
11.0.0 (TBD)
|
||||
============
|
||||
ripgrep 11 is a new major version release of ripgrep that contains many bug
|
||||
fixes, some performance improvements and a few feature enhancements. Notably,
|
||||
ripgrep's user experience for binary file filtering has been improved. See the
|
||||
[guide's new section on binary data](GUIDE.md#binary-data) for more details.
|
||||
|
||||
This release also marks a change in ripgrep's versioning. Where as the previous
|
||||
version was `0.10.0`, this version is `11.0.0`. Moving forward, ripgrep's
|
||||
major version will be increased a few times per year. ripgrep will continue to
|
||||
be conservative with respect to backwards compatibility, but may occasionally
|
||||
introduce breaking changes, which will always be documented in this CHANGELOG.
|
||||
See [issue 1172](https://github.com/BurntSushi/ripgrep/issues/1172) for a bit
|
||||
more detail on why this versioning change was made.
|
||||
|
||||
This release increases the **minimum supported Rust version** from 1.28.0 to
|
||||
1.34.0.
|
||||
|
||||
**BREAKING CHANGES**:
|
||||
|
||||
* ripgrep has tweaked its exit status codes to be more like GNU grep's. Namely,
|
||||
if a non-fatal error occurs during a search, then ripgrep will now always
|
||||
emit a `2` exit status code, regardless of whether a match is found or not.
|
||||
Previously, ripgrep would only emit a `2` exit status code for a catastrophic
|
||||
error (e.g., regex syntax error). One exception to this is if ripgrep is run
|
||||
with `-q/--quiet`. In that case, if an error occurs and a match is found,
|
||||
then ripgrep will exit with a `0` exit status code.
|
||||
* Supplying the `-u/--unrestricted` flag three times is now equivalent to
|
||||
supplying `--no-ignore --hidden --binary`. Previously, `-uuu` was equivalent
|
||||
to `--no-ignore --hidden --text`. The difference is that `--binary` disables
|
||||
binary file filtering without potentially dumping binary data into your
|
||||
terminal. That is, `rg -uuu foo` should now be equivalent to `grep -r foo`.
|
||||
* The `avx-accel` feature of ripgrep has been removed since it is no longer
|
||||
necessary. All uses of AVX in ripgrep are now enabled automatically via
|
||||
runtime CPU feature detection. The `simd-accel` feature does remain
|
||||
available, however, it does increase compilation times substantially at the
|
||||
moment.
|
||||
|
||||
Performance improvements:
|
||||
|
||||
* [PERF #497](https://github.com/BurntSushi/ripgrep/issues/497),
|
||||
[PERF #838](https://github.com/BurntSushi/ripgrep/issues/838):
|
||||
Make `rg -F -f dictionary-of-literals` much faster.
|
||||
|
||||
Feature enhancements:
|
||||
|
||||
* Added or improved file type filtering for Apache Thrift, ASP, Bazel, Brotli,
|
||||
BuildStream, bzip2, C, C++, Cython, gzip, Java, Make, Postscript, QML, Tex,
|
||||
XML, xz, zig and zstd.
|
||||
* [FEATURE #855](https://github.com/BurntSushi/ripgrep/issues/855):
|
||||
Add `--binary` flag for disabling binary file filtering.
|
||||
* [FEATURE #1078](https://github.com/BurntSushi/ripgrep/pull/1078):
|
||||
Add `--max-columns-preview` flag for showing a preview of long lines.
|
||||
* [FEATURE #1099](https://github.com/BurntSushi/ripgrep/pull/1099):
|
||||
Add support for Brotli and Zstd to the `-z/--search-zip` flag.
|
||||
* [FEATURE #1138](https://github.com/BurntSushi/ripgrep/pull/1138):
|
||||
Add `--no-ignore-dot` flag for ignoring `.ignore` files.
|
||||
* [FEATURE #1155](https://github.com/BurntSushi/ripgrep/pull/1155):
|
||||
Add `--auto-hybrid-regex` flag for automatically falling back to PCRE2.
|
||||
* [FEATURE #1159](https://github.com/BurntSushi/ripgrep/pull/1159):
|
||||
ripgrep's exit status logic should now match GNU grep. See updated man page.
|
||||
* [FEATURE #1164](https://github.com/BurntSushi/ripgrep/pull/1164):
|
||||
Add `--ignore-file-case-insensitive` for case insensitive ignore globs.
|
||||
* [FEATURE #1185](https://github.com/BurntSushi/ripgrep/pull/1185):
|
||||
Add `-I` flag as a short option for the `--no-filename` flag.
|
||||
* [FEATURE #1207](https://github.com/BurntSushi/ripgrep/pull/1207):
|
||||
Add `none` value to `-E/--encoding` to forcefully disable all transcoding.
|
||||
* [FEATURE da9d7204](https://github.com/BurntSushi/ripgrep/commit/da9d7204):
|
||||
Add `--pcre2-version` for querying showing PCRE2 version information.
|
||||
|
||||
Bug fixes:
|
||||
|
||||
* [BUG #306](https://github.com/BurntSushi/ripgrep/issues/306),
|
||||
[BUG #855](https://github.com/BurntSushi/ripgrep/issues/855):
|
||||
Improve the user experience for ripgrep's binary file filtering.
|
||||
* [BUG #373](https://github.com/BurntSushi/ripgrep/issues/373),
|
||||
[BUG #1098](https://github.com/BurntSushi/ripgrep/issues/1098):
|
||||
`**` is now accepted as valid syntax anywhere in a glob.
|
||||
* [BUG #916](https://github.com/BurntSushi/ripgrep/issues/916):
|
||||
ripgrep no longer hangs when searching `/proc` with a zombie process present.
|
||||
* [BUG #1052](https://github.com/BurntSushi/ripgrep/issues/1052):
|
||||
Fix bug where ripgrep could panic when transcoding UTF-16 files.
|
||||
* [BUG #1055](https://github.com/BurntSushi/ripgrep/issues/1055):
|
||||
Suggest `-U/--multiline` when a pattern contains a `\n`.
|
||||
* [BUG #1063](https://github.com/BurntSushi/ripgrep/issues/1063):
|
||||
Always strip a BOM if it's present, even for UTF-8.
|
||||
* [BUG #1064](https://github.com/BurntSushi/ripgrep/issues/1064):
|
||||
Fix inner literal detection that could lead to incorrect matches.
|
||||
* [BUG #1079](https://github.com/BurntSushi/ripgrep/issues/1079):
|
||||
Fixes a bug where the order of globs could result in missing a match.
|
||||
* [BUG #1089](https://github.com/BurntSushi/ripgrep/issues/1089):
|
||||
Fix another bug where ripgrep could panic when transcoding UTF-16 files.
|
||||
* [BUG #1091](https://github.com/BurntSushi/ripgrep/issues/1091):
|
||||
Add note about inverted flags to the man page.
|
||||
* [BUG #1093](https://github.com/BurntSushi/ripgrep/pull/1093):
|
||||
Fix handling of literal slashes in gitignore patterns.
|
||||
* [BUG #1095](https://github.com/BurntSushi/ripgrep/issues/1095):
|
||||
Fix corner cases involving the `--crlf` flag.
|
||||
* [BUG #1101](https://github.com/BurntSushi/ripgrep/issues/1101):
|
||||
Fix AsciiDoc escaping for man page output.
|
||||
* [BUG #1103](https://github.com/BurntSushi/ripgrep/issues/1103):
|
||||
Clarify what `--encoding auto` does.
|
||||
* [BUG #1106](https://github.com/BurntSushi/ripgrep/issues/1106):
|
||||
`--files-with-matches` and `--files-without-match` work with one file.
|
||||
* [BUG #1121](https://github.com/BurntSushi/ripgrep/issues/1121):
|
||||
Fix bug that was triggering Windows antimalware when using the `--files`
|
||||
flag.
|
||||
* [BUG #1125](https://github.com/BurntSushi/ripgrep/issues/1125),
|
||||
[BUG #1159](https://github.com/BurntSushi/ripgrep/issues/1159):
|
||||
ripgrep shouldn't panic for `rg -h | rg` and should emit correct exit status.
|
||||
* [BUG #1144](https://github.com/BurntSushi/ripgrep/issues/1144):
|
||||
Fixes a bug where line numbers could be wrong on big-endian machines.
|
||||
* [BUG #1154](https://github.com/BurntSushi/ripgrep/issues/1154):
|
||||
Windows files with "hidden" attribute are now treated as hidden.
|
||||
* [BUG #1173](https://github.com/BurntSushi/ripgrep/issues/1173):
|
||||
Fix handling of `**` patterns in gitignore files.
|
||||
* [BUG #1174](https://github.com/BurntSushi/ripgrep/issues/1174):
|
||||
Fix handling of repeated `**` patterns in gitignore files.
|
||||
* [BUG #1176](https://github.com/BurntSushi/ripgrep/issues/1176):
|
||||
Fix bug where `-F`/`-x` weren't applied to patterns given via `-f`.
|
||||
* [BUG #1189](https://github.com/BurntSushi/ripgrep/issues/1189):
|
||||
Document cases where ripgrep may use a lot of memory.
|
||||
* [BUG #1203](https://github.com/BurntSushi/ripgrep/issues/1203):
|
||||
Fix a matching bug related to the suffix literal optimization.
|
||||
* [BUG 8f14cb18](https://github.com/BurntSushi/ripgrep/commit/8f14cb18):
|
||||
Increase the default stack size for PCRE2's JIT.
|
||||
|
||||
|
||||
0.10.0 (2018-09-07)
|
||||
===================
|
||||
This is a new minor version release of ripgrep that contains some major new
|
||||
|
721
Cargo.lock
generated
721
Cargo.lock
generated
File diff suppressed because it is too large
Load Diff
@@ -17,6 +17,7 @@ license = "Unlicense OR MIT"
|
||||
exclude = ["HomebrewFormula"]
|
||||
build = "build.rs"
|
||||
autotests = false
|
||||
edition = "2018"
|
||||
|
||||
[badges]
|
||||
travis-ci = { repository = "BurntSushi/ripgrep" }
|
||||
@@ -45,8 +46,9 @@ members = [
|
||||
]
|
||||
|
||||
[dependencies]
|
||||
grep = { version = "0.2.2", path = "grep" }
|
||||
ignore = { version = "0.4.4", path = "ignore" }
|
||||
bstr = "0.1.2"
|
||||
grep = { version = "0.2.3", path = "grep" }
|
||||
ignore = { version = "0.4.7", path = "ignore" }
|
||||
lazy_static = "1.1.0"
|
||||
log = "0.4.5"
|
||||
num_cpus = "1.8.0"
|
||||
@@ -72,7 +74,6 @@ serde = "1.0.77"
|
||||
serde_derive = "1.0.77"
|
||||
|
||||
[features]
|
||||
avx-accel = ["grep/avx-accel"]
|
||||
simd-accel = ["grep/simd-accel"]
|
||||
pcre2 = ["grep/pcre2"]
|
||||
|
||||
@@ -81,6 +82,7 @@ debug = 1
|
||||
|
||||
[package.metadata.deb]
|
||||
features = ["pcre2"]
|
||||
section = "utils"
|
||||
assets = [
|
||||
["target/release/rg", "usr/bin/", "755"],
|
||||
["COPYING", "usr/share/doc/ripgrep/", "644"],
|
||||
|
15
FAQ.md
15
FAQ.md
@@ -118,7 +118,7 @@ from run to run of ripgrep.
|
||||
The only way to make the order of results consistent is to ask ripgrep to
|
||||
sort the output. Currently, this will disable all parallelism. (On smaller
|
||||
repositories, you might not notice much of a performance difference!) You
|
||||
can achieve this with the `--sort-files` flag.
|
||||
can achieve this with the `--sort path` flag.
|
||||
|
||||
There is more discussion on this topic here:
|
||||
https://github.com/BurntSushi/ripgrep/issues/152
|
||||
@@ -136,10 +136,10 @@ How do I search compressed files?
|
||||
</h3>
|
||||
|
||||
ripgrep's `-z/--search-zip` flag will cause it to search compressed files
|
||||
automatically. Currently, this supports gzip, bzip2, lzma, lz4 and xz only and
|
||||
requires the corresponding `gzip`, `bzip2` and `xz` binaries to be installed on
|
||||
your system. (That is, ripgrep does decompression by shelling out to another
|
||||
process.)
|
||||
automatically. Currently, this supports gzip, bzip2, xz, lzma, lz4, Brotli and
|
||||
Zstd. Each of these requires requires the corresponding `gzip`, `bzip2`, `xz`,
|
||||
`lz4`, `brotli` and `zstd` binaries to be installed on your system. (That is,
|
||||
ripgrep does decompression by shelling out to another process.)
|
||||
|
||||
ripgrep currently does not search archive formats, so `*.tar.gz` files, for
|
||||
example, are skipped.
|
||||
@@ -149,9 +149,8 @@ example, are skipped.
|
||||
How do I search over multiple lines?
|
||||
</h3>
|
||||
|
||||
This isn't currently possible. ripgrep is fundamentally a line-oriented search
|
||||
tool. With that said,
|
||||
[multiline search is a planned opt-in feature](https://github.com/BurntSushi/ripgrep/issues/176).
|
||||
The `-U/--multiline` flag enables ripgrep to report results that span over
|
||||
multiple lines.
|
||||
|
||||
|
||||
<h3 name="fancy">
|
||||
|
126
GUIDE.md
126
GUIDE.md
@@ -18,6 +18,7 @@ translatable to any command line shell environment.
|
||||
* [Replacements](#replacements)
|
||||
* [Configuration file](#configuration-file)
|
||||
* [File encoding](#file-encoding)
|
||||
* [Binary data](#binary-data)
|
||||
* [Common options](#common-options)
|
||||
|
||||
|
||||
@@ -235,6 +236,11 @@ Like `.gitignore`, a `.ignore` file can be placed in any directory. Its rules
|
||||
will be processed with respect to the directory it resides in, just like
|
||||
`.gitignore`.
|
||||
|
||||
To process `.gitignore` and `.ignore` files case insensitively, use the flag
|
||||
`--ignore-file-case-insensitive`. This is especially useful on case insensitive
|
||||
file systems like those on Windows and macOS. Note though that this can come
|
||||
with a significant performance penalty, and is therefore disabled by default.
|
||||
|
||||
For a more in depth description of how glob patterns in a `.gitignore` file
|
||||
are interpreted, please see `man gitignore`.
|
||||
|
||||
@@ -520,9 +526,9 @@ config file. Once the environment variable is set, open the file and just type
|
||||
in the flags you want set automatically. There are only two rules for
|
||||
describing the format of the config file:
|
||||
|
||||
1. Every line is a shell argument, after trimming ASCII whitespace.
|
||||
2. Lines starting with `#` (optionally preceded by any amount of
|
||||
ASCII whitespace) are ignored.
|
||||
1. Every line is a shell argument, after trimming whitespace.
|
||||
2. Lines starting with `#` (optionally preceded by any amount of whitespace)
|
||||
are ignored.
|
||||
|
||||
In particular, there is no escaping. Each line is given to ripgrep as a single
|
||||
command line argument verbatim.
|
||||
@@ -532,8 +538,9 @@ formatting peculiarities:
|
||||
|
||||
```
|
||||
$ cat $HOME/.ripgreprc
|
||||
# Don't let ripgrep vomit really long lines to my terminal.
|
||||
# Don't let ripgrep vomit really long lines to my terminal, and show a preview.
|
||||
--max-columns=150
|
||||
--max-columns-preview
|
||||
|
||||
# Add my 'web' type.
|
||||
--type-add
|
||||
@@ -598,13 +605,14 @@ topic, but we can try to summarize its relevancy to ripgrep:
|
||||
* Files are generally just a bundle of bytes. There is no reliable way to know
|
||||
their encoding.
|
||||
* Either the encoding of the pattern must match the encoding of the files being
|
||||
searched, or a form of transcoding must be performed converts either the
|
||||
searched, or a form of transcoding must be performed that converts either the
|
||||
pattern or the file to the same encoding as the other.
|
||||
* ripgrep tends to work best on plain text files, and among plain text files,
|
||||
the most popular encodings likely consist of ASCII, latin1 or UTF-8. As
|
||||
a special exception, UTF-16 is prevalent in Windows environments
|
||||
|
||||
In light of the above, here is how ripgrep behaves:
|
||||
In light of the above, here is how ripgrep behaves when `--encoding auto` is
|
||||
given, which is the default:
|
||||
|
||||
* All input is assumed to be ASCII compatible (which means every byte that
|
||||
corresponds to an ASCII codepoint actually is an ASCII codepoint). This
|
||||
@@ -620,12 +628,15 @@ In light of the above, here is how ripgrep behaves:
|
||||
they correspond to a UTF-16 BOM, then ripgrep will transcode the contents of
|
||||
the file from UTF-16 to UTF-8, and then execute the search on the transcoded
|
||||
version of the file. (This incurs a performance penalty since transcoding
|
||||
is slower than regex searching.)
|
||||
is slower than regex searching.) If the file contains invalid UTF-16, then
|
||||
the Unicode replacement codepoint is substituted in place of invalid code
|
||||
units.
|
||||
* To handle other cases, ripgrep provides a `-E/--encoding` flag, which permits
|
||||
you to specify an encoding from the
|
||||
[Encoding Standard](https://encoding.spec.whatwg.org/#concept-encoding-get).
|
||||
ripgrep will assume *all* files searched are the encoding specified and
|
||||
will perform a transcoding step just like in the UTF-16 case described above.
|
||||
ripgrep will assume *all* files searched are the encoding specified (unless
|
||||
the file has a BOM) and will perform a transcoding step just like in the
|
||||
UTF-16 case described above.
|
||||
|
||||
By default, ripgrep will not require its input be valid UTF-8. That is, ripgrep
|
||||
can and will search arbitrary bytes. The key here is that if you're searching
|
||||
@@ -635,9 +646,26 @@ pattern won't find anything. With all that said, this mode of operation is
|
||||
important, because it lets you find ASCII or UTF-8 *within* files that are
|
||||
otherwise arbitrary bytes.
|
||||
|
||||
As a special case, the `-E/--encoding` flag supports the value `none`, which
|
||||
will completely disable all encoding related logic, including BOM sniffing.
|
||||
When `-E/--encoding` is set to `none`, ripgrep will search the raw bytes of
|
||||
the underlying file with no transcoding step. For example, here's how you might
|
||||
search the raw UTF-16 encoding of the string `Шерлок`:
|
||||
|
||||
```
|
||||
$ rg '(?-u)\(\x045\x04@\x04;\x04>\x04:\x04' -E none -a some-utf16-file
|
||||
```
|
||||
|
||||
Of course, that's just an example meant to show how one can drop down into
|
||||
raw bytes. Namely, the simpler command works as you might expect automatically:
|
||||
|
||||
```
|
||||
$ rg 'Шерлок' some-utf16-file
|
||||
```
|
||||
|
||||
Finally, it is possible to disable ripgrep's Unicode support from within the
|
||||
pattern regular expression. For example, let's say you wanted `.` to match any
|
||||
byte rather than any Unicode codepoint. (You might want this while searching a
|
||||
regular expression. For example, let's say you wanted `.` to match any byte
|
||||
rather than any Unicode codepoint. (You might want this while searching a
|
||||
binary file, since `.` by default will not match invalid UTF-8.) You could do
|
||||
this by disabling Unicode via a regular expression flag:
|
||||
|
||||
@@ -654,6 +682,76 @@ $ rg '\w(?-u:\w)\w'
|
||||
```
|
||||
|
||||
|
||||
### Binary data
|
||||
|
||||
In addition to skipping hidden files and files in your `.gitignore` by default,
|
||||
ripgrep also attempts to skip binary files. ripgrep does this by default
|
||||
because binary files (like PDFs or images) are typically not things you want to
|
||||
search when searching for regex matches. Moreover, if content in a binary file
|
||||
did match, then it's possible for undesirable binary data to be printed to your
|
||||
terminal and wreak havoc.
|
||||
|
||||
Unfortunately, unlike skipping hidden files and respecting your `.gitignore`
|
||||
rules, a file cannot as easily be classified as binary. In order to figure out
|
||||
whether a file is binary, the most effective heuristic that balances
|
||||
correctness with performance is to simply look for `NUL` bytes. At that point,
|
||||
the determination is simple: a file is considered "binary" if and only if it
|
||||
contains a `NUL` byte somewhere in its contents.
|
||||
|
||||
The issue is that while most binary files will have a `NUL` byte toward the
|
||||
beginning of its contents, this is not necessarily true. The `NUL` byte might
|
||||
be the very last byte in a large file, but that file is still considered
|
||||
binary. While this leads to a fair amount of complexity inside ripgrep's
|
||||
implementation, it also results in some unintuitive user experiences.
|
||||
|
||||
At a high level, ripgrep operates in three different modes with respect to
|
||||
binary files:
|
||||
|
||||
1. The default mode is to attempt to remove binary files from a search
|
||||
completely. This is meant to mirror how ripgrep removes hidden files and
|
||||
files in your `.gitignore` automatically. That is, as soon as a file is
|
||||
detected as binary, searching stops. If a match was already printed (because
|
||||
it was detected long before a `NUL` byte), then ripgrep will print a warning
|
||||
message indicating that the search stopped prematurely. This default mode
|
||||
**only applies to files searched by ripgrep as a result of recursive
|
||||
directory traversal**, which is consistent with ripgrep's other automatic
|
||||
filtering. For example, `rg foo .file` will search `.file` even though it
|
||||
is hidden. Similarly, `rg foo binary-file` search `binary-file` in "binary"
|
||||
mode automatically.
|
||||
2. Binary mode is similar to the default mode, except it will not always
|
||||
stop searching after it sees a `NUL` byte. Namely, in this mode, ripgrep
|
||||
will continue searching a file that is known to be binary until the first
|
||||
of two conditions is met: 1) the end of the file has been reached or 2) a
|
||||
match is or has been seen. This means that in binary mode, if ripgrep
|
||||
reports no matches, then there are no matches in the file. When a match does
|
||||
occur, ripgrep prints a message similar to one it prints when in its default
|
||||
mode indicating that the search has stopped prematurely. This mode can be
|
||||
forcefully enabled for all files with the `--binary` flag. The purpose of
|
||||
binary mode is to provide a way to discover matches in all files, but to
|
||||
avoid having binary data dumped into your terminal.
|
||||
3. Text mode completely disables all binary detection and searches all files
|
||||
as if they were text. This is useful when searching a file that is
|
||||
predominantly text but contains a `NUL` byte, or if you are specifically
|
||||
trying to search binary data. This mode can be enabled with the `-a/--text`
|
||||
flag. Note that when using this mode on very large binary files, it is
|
||||
possible for ripgrep to use a lot of memory.
|
||||
|
||||
Unfortunately, there is one additional complexity in ripgrep that can make it
|
||||
difficult to reason about binary files. That is, the way binary detection works
|
||||
depends on the way that ripgrep searches your files. Specifically:
|
||||
|
||||
* When ripgrep uses memory maps, then binary detection is only performed on the
|
||||
first few kilobytes of the file in addition to every matching line.
|
||||
* When ripgrep doesn't use memory maps, then binary detection is performed on
|
||||
all bytes searched.
|
||||
|
||||
This means that whether a file is detected as binary or not can change based
|
||||
on the internal search strategy used by ripgrep. If you prefer to keep
|
||||
ripgrep's binary file detection consistent, then you can disable memory maps
|
||||
via the `--no-mmap` flag. (The cost will be a small performance regression when
|
||||
searching very large files on some platforms.)
|
||||
|
||||
|
||||
### Common options
|
||||
|
||||
ripgrep has a lot of flags. Too many to keep in your head at once. This section
|
||||
@@ -675,10 +773,10 @@ used options that will likely impact how you use ripgrep on a regular basis.
|
||||
* `--files`: Print the files that ripgrep *would* search, but don't actually
|
||||
search them.
|
||||
* `-a/--text`: Search binary files as if they were plain text.
|
||||
* `-z/--search-zip`: Search compressed files (gzip, bzip2, lzma, xz). This is
|
||||
disabled by default.
|
||||
* `-z/--search-zip`: Search compressed files (gzip, bzip2, lzma, xz, lz4,
|
||||
brotli, zstd). This is disabled by default.
|
||||
* `-C/--context`: Show the lines surrounding a match.
|
||||
* `--sort-files`: Force ripgrep to sort its output by file name. (This disables
|
||||
* `--sort path`: Force ripgrep to sort its output by file name. (This disables
|
||||
parallelism, so it might be slower.)
|
||||
* `-L/--follow`: Follow symbolic links while recursively searching.
|
||||
* `-M/--max-columns`: Limit the length of lines printed by ripgrep.
|
||||
|
63
README.md
63
README.md
@@ -1,15 +1,17 @@
|
||||
ripgrep (rg)
|
||||
------------
|
||||
ripgrep is a line-oriented search tool that recursively searches your current
|
||||
directory for a regex pattern while respecting your gitignore rules. ripgrep
|
||||
directory for a regex pattern. By default, ripgrep will respect your .gitignore
|
||||
and automatically skip hidden files/directories and binary files. ripgrep
|
||||
has first class support on Windows, macOS and Linux, with binary downloads
|
||||
available for [every release](https://github.com/BurntSushi/ripgrep/releases).
|
||||
ripgrep is similar to other popular search tools like The Silver Searcher,
|
||||
ack and grep.
|
||||
ripgrep is similar to other popular search tools like The Silver Searcher, ack
|
||||
and grep.
|
||||
|
||||
[](https://travis-ci.org/BurntSushi/ripgrep)
|
||||
[](https://ci.appveyor.com/project/BurntSushi/ripgrep)
|
||||
[](https://crates.io/crates/ripgrep)
|
||||
[](https://repology.org/project/ripgrep/badges)
|
||||
|
||||
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
|
||||
|
||||
@@ -105,7 +107,7 @@ increases the times to `2.640s` for ripgrep and `10.277s` for GNU grep.
|
||||
supporting Unicode (which is always on).
|
||||
* ripgrep has optional support for switching its regex engine to use PCRE2.
|
||||
Among other things, this makes it possible to use look-around and
|
||||
backreferences in your patterns, which are supported in ripgrep's default
|
||||
backreferences in your patterns, which are not supported in ripgrep's default
|
||||
regex engine. PCRE2 support is enabled with `-P`.
|
||||
* ripgrep supports searching files in text encodings other than UTF-8, such
|
||||
as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for
|
||||
@@ -248,21 +250,22 @@ If you're a **Gentoo** user, you can install ripgrep from the
|
||||
$ emerge sys-apps/ripgrep
|
||||
```
|
||||
|
||||
If you're a **Fedora 27+** user, you can install ripgrep from official
|
||||
If you're a **Fedora** user, you can install ripgrep from official
|
||||
repositories.
|
||||
|
||||
```
|
||||
$ sudo dnf install ripgrep
|
||||
```
|
||||
|
||||
If you're a **Fedora 24+** user, you can install ripgrep from
|
||||
[copr](https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/):
|
||||
If you're an **openSUSE Leap 15.0** user, you can install ripgrep from the
|
||||
[utilities repo](https://build.opensuse.org/package/show/utilities/ripgrep):
|
||||
|
||||
```
|
||||
$ sudo dnf copr enable carlwgeorge/ripgrep
|
||||
$ sudo dnf install ripgrep
|
||||
$ sudo zypper ar https://download.opensuse.org/repositories/utilities/openSUSE_Leap_15.0/utilities.repo
|
||||
$ sudo zypper install ripgrep
|
||||
```
|
||||
|
||||
|
||||
If you're an **openSUSE Tumbleweed** user, you can install ripgrep from the
|
||||
[official repo](http://software.opensuse.org/package/ripgrep):
|
||||
|
||||
@@ -288,12 +291,11 @@ $ # (Or using the attribute name, which is also ripgrep.)
|
||||
|
||||
If you're a **Debian** user (or a user of a Debian derivative like **Ubuntu**),
|
||||
then ripgrep can be installed using a binary `.deb` file provided in each
|
||||
[ripgrep release](https://github.com/BurntSushi/ripgrep/releases). Note that
|
||||
ripgrep is not in the official Debian or Ubuntu repositories.
|
||||
[ripgrep release](https://github.com/BurntSushi/ripgrep/releases).
|
||||
|
||||
```
|
||||
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/0.9.0/ripgrep_0.9.0_amd64.deb
|
||||
$ sudo dpkg -i ripgrep_0.9.0_amd64.deb
|
||||
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/0.10.0/ripgrep_0.10.0_amd64.deb
|
||||
$ sudo dpkg -i ripgrep_0.10.0_amd64.deb
|
||||
```
|
||||
|
||||
If you run Debian Buster (currently Debian testing) or Debian sid, ripgrep is
|
||||
@@ -302,6 +304,14 @@ If you run Debian Buster (currently Debian testing) or Debian sid, ripgrep is
|
||||
$ sudo apt-get install ripgrep
|
||||
```
|
||||
|
||||
If you're an **Ubuntu Cosmic (18.10)** (or newer) user, ripgrep is
|
||||
[available](https://launchpad.net/ubuntu/+source/rust-ripgrep) using the same
|
||||
packaging as Debian:
|
||||
|
||||
```
|
||||
$ sudo apt-get install ripgrep
|
||||
```
|
||||
|
||||
(N.B. Various snaps for ripgrep on Ubuntu are also available, but none of them
|
||||
seem to work right and generate a number of very strange bug reports that I
|
||||
don't know how to fix and don't have the time to fix. Therefore, it is no
|
||||
@@ -330,7 +340,7 @@ If you're a **NetBSD** user, then you can install ripgrep from
|
||||
|
||||
If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
|
||||
|
||||
* Note that the minimum supported version of Rust for ripgrep is **1.28.0**,
|
||||
* Note that the minimum supported version of Rust for ripgrep is **1.34.0**,
|
||||
although ripgrep may work with older versions.
|
||||
* Note that the binary may be bigger than expected because it contains debug
|
||||
symbols. This is intentional. To remove debug symbols and therefore reduce
|
||||
@@ -340,9 +350,6 @@ If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
|
||||
$ cargo install ripgrep
|
||||
```
|
||||
|
||||
When compiling with Rust 1.27 or newer, this will automatically enable SIMD
|
||||
optimizations for search.
|
||||
|
||||
ripgrep isn't currently in any other package repositories.
|
||||
[I'd like to change that](https://github.com/BurntSushi/ripgrep/issues/10).
|
||||
|
||||
@@ -351,7 +358,7 @@ ripgrep isn't currently in any other package repositories.
|
||||
|
||||
ripgrep is written in Rust, so you'll need to grab a
|
||||
[Rust installation](https://www.rust-lang.org/) in order to compile it.
|
||||
ripgrep compiles with Rust 1.28.0 (stable) or newer. In general, ripgrep tracks
|
||||
ripgrep compiles with Rust 1.34.0 (stable) or newer. In general, ripgrep tracks
|
||||
the latest stable release of the Rust compiler.
|
||||
|
||||
To build ripgrep:
|
||||
@@ -368,18 +375,14 @@ If you have a Rust nightly compiler and a recent Intel CPU, then you can enable
|
||||
additional optional SIMD acceleration like so:
|
||||
|
||||
```
|
||||
RUSTFLAGS="-C target-cpu=native" cargo build --release --features 'simd-accel avx-accel'
|
||||
RUSTFLAGS="-C target-cpu=native" cargo build --release --features 'simd-accel'
|
||||
```
|
||||
|
||||
If your machine doesn't support AVX instructions, then simply remove
|
||||
`avx-accel` from the features list. Similarly for SIMD (which corresponds
|
||||
roughly to SSE instructions).
|
||||
|
||||
The `simd-accel` and `avx-accel` features enable SIMD support in certain
|
||||
ripgrep dependencies (responsible for counting lines and transcoding). They
|
||||
are not necessary to get SIMD optimizations for search; those are enabled
|
||||
automatically. Hopefully, some day, the `simd-accel` and `avx-accel` features
|
||||
will similarly become unnecessary.
|
||||
The `simd-accel` feature enables SIMD support in certain ripgrep dependencies
|
||||
(responsible for transcoding). They are not necessary to get SIMD optimizations
|
||||
for search; those are enabled automatically. Hopefully, some day, the
|
||||
`simd-accel` feature will similarly become unnecessary. **WARNING:** Currently,
|
||||
enabling this option can increase compilation times dramatically.
|
||||
|
||||
Finally, optional PCRE2 support can be built with ripgrep by enabling the
|
||||
`pcre2` feature:
|
||||
@@ -388,8 +391,8 @@ Finally, optional PCRE2 support can be built with ripgrep by enabling the
|
||||
$ cargo build --release --features 'pcre2'
|
||||
```
|
||||
|
||||
(Tip: use `--features 'pcre2 simd-accel avx-accel'` to also include compile
|
||||
time SIMD optimizations, which will only work with a nightly compiler.)
|
||||
(Tip: use `--features 'pcre2 simd-accel'` to also include compile time SIMD
|
||||
optimizations, which will only work with a nightly compiler.)
|
||||
|
||||
Enabling the PCRE2 feature works with a stable Rust compiler and will
|
||||
attempt to automatically find and link with your system's PCRE2 library via
|
||||
|
@@ -77,5 +77,5 @@ deploy:
|
||||
|
||||
branches:
|
||||
only:
|
||||
- /\d+\.\d+\.\d+/
|
||||
- /^\d+\.\d+\.\d+$/
|
||||
- master
|
||||
|
12
build.rs
12
build.rs
@@ -1,8 +1,3 @@
|
||||
#[macro_use]
|
||||
extern crate clap;
|
||||
#[macro_use]
|
||||
extern crate lazy_static;
|
||||
|
||||
use std::env;
|
||||
use std::fs::{self, File};
|
||||
use std::io::{self, Read, Write};
|
||||
@@ -168,7 +163,12 @@ fn formatted_arg(arg: &RGArg) -> io::Result<String> {
|
||||
}
|
||||
|
||||
fn formatted_doc_txt(arg: &RGArg) -> io::Result<String> {
|
||||
let paragraphs: Vec<&str> = arg.doc_long.split("\n\n").collect();
|
||||
let paragraphs: Vec<String> = arg.doc_long
|
||||
.replace("{", "{")
|
||||
.replace("}", r"}")
|
||||
.split("\n\n")
|
||||
.map(|s| s.to_string())
|
||||
.collect();
|
||||
if paragraphs.is_empty() {
|
||||
return Err(ioerr(format!("missing docs for --{}", arg.name)));
|
||||
}
|
||||
|
27
complete/_rg
27
complete/_rg
@@ -43,6 +43,7 @@ _rg() {
|
||||
+ '(exclusive)' # Misc. fully exclusive options
|
||||
'(: * -)'{-h,--help}'[display help information]'
|
||||
'(: * -)'{-V,--version}'[display version information]'
|
||||
'(: * -)'--pcre2-version'[print the version of PCRE2 used by ripgrep, if available]'
|
||||
|
||||
+ '(buffered)' # buffering options
|
||||
'--line-buffered[force line buffering]'
|
||||
@@ -85,7 +86,7 @@ _rg() {
|
||||
|
||||
+ '(file-name)' # File-name options
|
||||
{-H,--with-filename}'[show file name for matches]'
|
||||
"--no-filename[don't show file name for matches]"
|
||||
{-I,--no-filename}"[don't show file name for matches]"
|
||||
|
||||
+ '(file-system)' # File system options
|
||||
"--one-file-system[don't descend into directories on other file systems]"
|
||||
@@ -111,9 +112,17 @@ _rg() {
|
||||
'--hidden[search hidden files and directories]'
|
||||
$no"--no-hidden[don't search hidden files and directories]"
|
||||
|
||||
+ '(hybrid)' # hybrid regex options
|
||||
'--auto-hybrid-regex[dynamically use PCRE2 if necessary]'
|
||||
$no"--no-auto-hybrid-regex[don't dynamically use PCRE2 if necessary]"
|
||||
|
||||
+ '(ignore)' # Ignore-file options
|
||||
"(--no-ignore-global --no-ignore-parent --no-ignore-vcs)--no-ignore[don't respect ignore files]"
|
||||
$no'(--ignore-global --ignore-parent --ignore-vcs)--ignore[respect ignore files]'
|
||||
"(--no-ignore-global --no-ignore-parent --no-ignore-vcs --no-ignore-dot)--no-ignore[don't respect ignore files]"
|
||||
$no'(--ignore-global --ignore-parent --ignore-vcs --ignore-dot)--ignore[respect ignore files]'
|
||||
|
||||
+ '(ignore-file-case-insensitive)' # Ignore-file case sensitivity options
|
||||
'--ignore-file-case-insensitive[process ignore files case insensitively]'
|
||||
$no'--no-ignore-file-case-insensitive[process ignore files case sensitively]'
|
||||
|
||||
+ '(ignore-global)' # Global ignore-file options
|
||||
"--no-ignore-global[don't respect global ignore files]"
|
||||
@@ -127,6 +136,10 @@ _rg() {
|
||||
"--no-ignore-vcs[don't respect version control ignore files]"
|
||||
$no'--ignore-vcs[respect version control ignore files]'
|
||||
|
||||
+ '(ignore-dot)' # .ignore-file options
|
||||
"--no-ignore-dot[don't respect .ignore files]"
|
||||
$no'--ignore-dot[respect .ignore files]'
|
||||
|
||||
+ '(json)' # JSON options
|
||||
'--json[output results in JSON Lines format]'
|
||||
$no"--no-json[don't output results in JSON Lines format]"
|
||||
@@ -140,6 +153,10 @@ _rg() {
|
||||
$no"--no-crlf[don't use CRLF as line terminator]"
|
||||
'(text)--null-data[use NUL as line terminator]'
|
||||
|
||||
+ '(max-columns-preview)' # max column preview options
|
||||
'--max-columns-preview[show preview for long lines (with -M)]'
|
||||
$no"--no-max-columns-preview[don't show preview for long lines (with -M)]"
|
||||
|
||||
+ '(max-depth)' # Directory-depth options
|
||||
'--max-depth=[specify max number of directories to descend]:number of directories'
|
||||
'!--maxdepth=:number of directories'
|
||||
@@ -219,6 +236,8 @@ _rg() {
|
||||
|
||||
+ '(text)' # Binary-search options
|
||||
{-a,--text}'[search binary files as if they were text]'
|
||||
"--binary[search binary files, don't print binary data]"
|
||||
$no"--no-binary[don't search binary files]"
|
||||
$no"(--null-data)--no-text[don't search binary files as if they were text]"
|
||||
|
||||
+ '(threads)' # Thread-count options
|
||||
@@ -370,7 +389,7 @@ _rg_encodings() {
|
||||
shift{-,_}jis csshiftjis {,x-}sjis ms_kanji ms932
|
||||
utf{,-}8 utf-16{,be,le} unicode-1-1-utf-8
|
||||
windows-{31j,874,949,125{0..8}} dos-874 tis-620 ansi_x3.4-1968
|
||||
x-user-defined auto
|
||||
x-user-defined auto none
|
||||
)
|
||||
|
||||
_wanted encodings expl encoding compadd -a "$@" - _encodings
|
||||
|
@@ -34,12 +34,15 @@ files/directories and binary files.
|
||||
ripgrep's default regex engine uses finite automata and guarantees linear
|
||||
time searching. Because of this, features like backreferences and arbitrary
|
||||
look-around are not supported. However, if ripgrep is built with PCRE2, then
|
||||
the --pcre2 flag can be used to enable backreferences and look-around.
|
||||
the *--pcre2* flag can be used to enable backreferences and look-around.
|
||||
|
||||
ripgrep supports configuration files. Set RIPGREP_CONFIG_PATH to a
|
||||
ripgrep supports configuration files. Set *RIPGREP_CONFIG_PATH* to a
|
||||
configuration file. The file can specify one shell argument per line. Lines
|
||||
starting with '#' are ignored. For more details, see the man page or the
|
||||
README.
|
||||
starting with *#* are ignored. For more details, see the man page or the
|
||||
*README*.
|
||||
|
||||
Tip: to disable all smart filtering and make ripgrep behave a bit more like
|
||||
classical grep, use *rg -uuu*.
|
||||
|
||||
|
||||
REGEX SYNTAX
|
||||
@@ -52,10 +55,10 @@ https://docs.rs/regex/*/regex/bytes/index.html#syntax
|
||||
|
||||
To a first approximation, ripgrep uses Perl-like regexes without look-around or
|
||||
backreferences. This makes them very similar to the "extended" (ERE) regular
|
||||
expressions supported by `egrep`, but with a few additional features like
|
||||
expressions supported by *egrep*, but with a few additional features like
|
||||
Unicode character classes.
|
||||
|
||||
If you're using ripgrep with the --pcre2 flag, then please consult
|
||||
If you're using ripgrep with the *--pcre2* flag, then please consult
|
||||
https://www.pcre.org or the PCRE2 man pages for documentation on the supported
|
||||
syntax.
|
||||
|
||||
@@ -68,18 +71,37 @@ _PATTERN_::
|
||||
|
||||
_PATH_::
|
||||
A file or directory to search. Directories are searched recursively. Paths
|
||||
specified expicitly on the command line override glob and ignore rules.
|
||||
specified explicitly on the command line override glob and ignore rules.
|
||||
|
||||
|
||||
OPTIONS
|
||||
-------
|
||||
Note that for many options, there exist flags to disable them. In some cases,
|
||||
those flags are not listed in a first class way below. For example, the
|
||||
*--column* flag (listed below) enables column numbers in ripgrep's output, but
|
||||
the *--no-column* flag (not listed below) disables them. The reverse can also
|
||||
exist. For example, the *--no-ignore* flag (listed below) disables ripgrep's
|
||||
*gitignore* logic, but the *--ignore* flag (not listed below) enables it. These
|
||||
flags are useful for overriding a ripgrep configuration file on the command
|
||||
line. Each flag's documentation notes whether an inverted flag exists. In all
|
||||
cases, the flag specified last takes precedence.
|
||||
|
||||
{OPTIONS}
|
||||
|
||||
|
||||
EXIT STATUS
|
||||
-----------
|
||||
If ripgrep finds a match, then the exit status of the program is 0. If no match
|
||||
could be found, then the exit status is non-zero.
|
||||
could be found, then the exit status is 1. If an error occurred, then the exit
|
||||
status is always 2 unless ripgrep was run with the *--quiet* flag and a match
|
||||
was found. In summary:
|
||||
|
||||
* `0` exit status occurs only when at least one match was found, and if
|
||||
no error occurred, unless *--quiet* was given.
|
||||
* `1` exit status occurs only when no match was found and no error occurred.
|
||||
* `2` exit status occurs when an error occurred. This is true for both
|
||||
catastrophic errors (e.g., a regex syntax error) and for soft errors (e.g.,
|
||||
unable to read a file).
|
||||
|
||||
|
||||
CONFIGURATION FILES
|
||||
@@ -88,12 +110,12 @@ ripgrep supports reading configuration files that change ripgrep's default
|
||||
behavior. The format of the configuration file is an "rc" style and is very
|
||||
simple. It is defined by two rules:
|
||||
|
||||
1. Every line is a shell argument, after trimming ASCII whitespace.
|
||||
2. Lines starting with _#_ (optionally preceded by any amount of
|
||||
ASCII whitespace) are ignored.
|
||||
1. Every line is a shell argument, after trimming whitespace.
|
||||
2. Lines starting with *#* (optionally preceded by any amount of
|
||||
whitespace) are ignored.
|
||||
|
||||
ripgrep will look for a single configuration file if and only if the
|
||||
_RIPGREP_CONFIG_PATH_ environment variable is set and is non-empty.
|
||||
*RIPGREP_CONFIG_PATH* environment variable is set and is non-empty.
|
||||
ripgrep will parse shell arguments from this file on startup and will
|
||||
behave as if the arguments in this file were prepended to any explicit
|
||||
arguments given to ripgrep on the command line.
|
||||
@@ -155,20 +177,35 @@ SHELL COMPLETION
|
||||
Shell completion files are included in the release tarball for Bash, Fish, Zsh
|
||||
and PowerShell.
|
||||
|
||||
For *bash*, move `rg.bash` to `$XDG_CONFIG_HOME/bash_completion`
|
||||
or `/etc/bash_completion.d/`.
|
||||
For *bash*, move *rg.bash* to *$XDG_CONFIG_HOME/bash_completion*
|
||||
or */etc/bash_completion.d/*.
|
||||
|
||||
For *fish*, move `rg.fish` to `$HOME/.config/fish/completions`.
|
||||
For *fish*, move *rg.fish* to *$HOME/.config/fish/completions*.
|
||||
|
||||
For *zsh*, move `_rg` to one of your `$fpath` directories.
|
||||
For *zsh*, move *_rg* to one of your *$fpath* directories.
|
||||
|
||||
|
||||
CAVEATS
|
||||
-------
|
||||
ripgrep may abort unexpectedly when using default settings if it searches a
|
||||
file that is simultaneously truncated. This behavior can be avoided by passing
|
||||
the --no-mmap flag which will forcefully disable the use of memory maps in all
|
||||
cases.
|
||||
the *--no-mmap* flag which will forcefully disable the use of memory maps in
|
||||
all cases.
|
||||
|
||||
ripgrep may use a large amount of memory depending on a few factors. Firstly,
|
||||
if ripgrep uses parallelism for search (the default), then the entire output
|
||||
for each individual file is buffered into memory in order to prevent
|
||||
interleaving matches in the output. To avoid this, you can disable parallelism
|
||||
with the *-j1* flag. Secondly, ripgrep always needs to have at least a single
|
||||
line in memory in order to execute a search. A file with a very long line can
|
||||
thus cause ripgrep to use a lot of memory. Generally, this only occurs when
|
||||
searching binary data with the *-a* flag enabled. (When the *-a* flag isn't
|
||||
enabled, ripgrep will replace all NUL bytes with line terminators, which
|
||||
typically prevents exorbitant memory usage.) Thirdly, when ripgrep searches
|
||||
a large file using a memory map, the process will report its resident memory
|
||||
usage as the size of the file. However, this does not mean ripgrep actually
|
||||
needed to use that much memory; the operating system will generally handle this
|
||||
for you.
|
||||
|
||||
|
||||
VERSION
|
||||
@@ -180,7 +217,11 @@ HOMEPAGE
|
||||
--------
|
||||
https://github.com/BurntSushi/ripgrep
|
||||
|
||||
Please report bugs and feature requests in the issue tracker.
|
||||
Please report bugs and feature requests in the issue tracker. Please do your
|
||||
best to provide a reproducible test case for bugs. This should include the
|
||||
corpus being searched, the *rg* command, the actual output and the expected
|
||||
output. Please also include the output of running the same *rg* command but
|
||||
with the *--debug* flag.
|
||||
|
||||
|
||||
AUTHORS
|
||||
|
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "globset"
|
||||
version = "0.4.2" #:version
|
||||
version = "0.4.3" #:version
|
||||
authors = ["Andrew Gallant <jamslam@gmail.com>"]
|
||||
description = """
|
||||
Cross platform single glob and glob set matching. Glob set matching is the
|
||||
@@ -19,14 +19,14 @@ name = "globset"
|
||||
bench = false
|
||||
|
||||
[dependencies]
|
||||
aho-corasick = "0.6.8"
|
||||
aho-corasick = "0.7.3"
|
||||
bstr = { version = "0.1.2", default-features = false, features = ["std"] }
|
||||
fnv = "1.0.6"
|
||||
log = "0.4.5"
|
||||
memchr = "2.0.2"
|
||||
regex = "1.0.5"
|
||||
regex = "1.1.5"
|
||||
|
||||
[dev-dependencies]
|
||||
glob = "0.2.11"
|
||||
glob = "0.3.0"
|
||||
|
||||
[features]
|
||||
simd-accel = []
|
||||
|
@@ -120,7 +120,7 @@ impl GlobMatcher {
|
||||
|
||||
/// Tests whether the given path matches this pattern or not.
|
||||
pub fn is_match_candidate(&self, path: &Candidate) -> bool {
|
||||
self.re.is_match(&path.path)
|
||||
self.re.is_match(path.path.as_bytes())
|
||||
}
|
||||
}
|
||||
|
||||
@@ -145,7 +145,7 @@ impl GlobStrategic {
|
||||
|
||||
/// Tests whether the given path matches this pattern or not.
|
||||
fn is_match_candidate(&self, candidate: &Candidate) -> bool {
|
||||
let byte_path = &*candidate.path;
|
||||
let byte_path = candidate.path.as_bytes();
|
||||
|
||||
match self.strategy {
|
||||
MatchStrategy::Literal(ref lit) => lit.as_bytes() == byte_path,
|
||||
@@ -837,40 +837,66 @@ impl<'a> Parser<'a> {
|
||||
|
||||
fn parse_star(&mut self) -> Result<(), Error> {
|
||||
let prev = self.prev;
|
||||
if self.chars.peek() != Some(&'*') {
|
||||
if self.peek() != Some('*') {
|
||||
self.push_token(Token::ZeroOrMore)?;
|
||||
return Ok(());
|
||||
}
|
||||
assert!(self.bump() == Some('*'));
|
||||
if !self.have_tokens()? {
|
||||
self.push_token(Token::RecursivePrefix)?;
|
||||
let next = self.bump();
|
||||
if !next.map(is_separator).unwrap_or(true) {
|
||||
return Err(self.error(ErrorKind::InvalidRecursive));
|
||||
if !self.peek().map_or(true, is_separator) {
|
||||
self.push_token(Token::ZeroOrMore)?;
|
||||
self.push_token(Token::ZeroOrMore)?;
|
||||
} else {
|
||||
self.push_token(Token::RecursivePrefix)?;
|
||||
assert!(self.bump().map_or(true, is_separator));
|
||||
}
|
||||
return Ok(());
|
||||
}
|
||||
self.pop_token()?;
|
||||
|
||||
if !prev.map(is_separator).unwrap_or(false) {
|
||||
if self.stack.len() <= 1
|
||||
|| (prev != Some(',') && prev != Some('{')) {
|
||||
return Err(self.error(ErrorKind::InvalidRecursive));
|
||||
|| (prev != Some(',') && prev != Some('{'))
|
||||
{
|
||||
self.push_token(Token::ZeroOrMore)?;
|
||||
self.push_token(Token::ZeroOrMore)?;
|
||||
return Ok(());
|
||||
}
|
||||
}
|
||||
match self.chars.peek() {
|
||||
None => {
|
||||
assert!(self.bump().is_none());
|
||||
self.push_token(Token::RecursiveSuffix)
|
||||
let is_suffix =
|
||||
match self.peek() {
|
||||
None => {
|
||||
assert!(self.bump().is_none());
|
||||
true
|
||||
}
|
||||
Some(',') | Some('}') if self.stack.len() >= 2 => {
|
||||
true
|
||||
}
|
||||
Some(c) if is_separator(c) => {
|
||||
assert!(self.bump().map(is_separator).unwrap_or(false));
|
||||
false
|
||||
}
|
||||
_ => {
|
||||
self.push_token(Token::ZeroOrMore)?;
|
||||
self.push_token(Token::ZeroOrMore)?;
|
||||
return Ok(());
|
||||
}
|
||||
};
|
||||
match self.pop_token()? {
|
||||
Token::RecursivePrefix => {
|
||||
self.push_token(Token::RecursivePrefix)?;
|
||||
}
|
||||
Some(&',') | Some(&'}') if self.stack.len() >= 2 => {
|
||||
self.push_token(Token::RecursiveSuffix)
|
||||
Token::RecursiveSuffix => {
|
||||
self.push_token(Token::RecursiveSuffix)?;
|
||||
}
|
||||
Some(&c) if is_separator(c) => {
|
||||
assert!(self.bump().map(is_separator).unwrap_or(false));
|
||||
self.push_token(Token::RecursiveZeroOrMore)
|
||||
_ => {
|
||||
if is_suffix {
|
||||
self.push_token(Token::RecursiveSuffix)?;
|
||||
} else {
|
||||
self.push_token(Token::RecursiveZeroOrMore)?;
|
||||
}
|
||||
}
|
||||
_ => Err(self.error(ErrorKind::InvalidRecursive)),
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn parse_class(&mut self) -> Result<(), Error> {
|
||||
@@ -959,6 +985,10 @@ impl<'a> Parser<'a> {
|
||||
self.cur = self.chars.next();
|
||||
self.cur
|
||||
}
|
||||
|
||||
fn peek(&mut self) -> Option<char> {
|
||||
self.chars.peek().map(|&ch| ch)
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
@@ -1144,13 +1174,6 @@ mod tests {
|
||||
syntax!(cls20, "[^a]", vec![classn('a', 'a')]);
|
||||
syntax!(cls21, "[^a-z]", vec![classn('a', 'z')]);
|
||||
|
||||
syntaxerr!(err_rseq1, "a**", ErrorKind::InvalidRecursive);
|
||||
syntaxerr!(err_rseq2, "**a", ErrorKind::InvalidRecursive);
|
||||
syntaxerr!(err_rseq3, "a**b", ErrorKind::InvalidRecursive);
|
||||
syntaxerr!(err_rseq4, "***", ErrorKind::InvalidRecursive);
|
||||
syntaxerr!(err_rseq5, "/a**", ErrorKind::InvalidRecursive);
|
||||
syntaxerr!(err_rseq6, "/**a", ErrorKind::InvalidRecursive);
|
||||
syntaxerr!(err_rseq7, "/a**b", ErrorKind::InvalidRecursive);
|
||||
syntaxerr!(err_unclosed1, "[", ErrorKind::UnclosedClass);
|
||||
syntaxerr!(err_unclosed2, "[]", ErrorKind::UnclosedClass);
|
||||
syntaxerr!(err_unclosed3, "[!", ErrorKind::UnclosedClass);
|
||||
@@ -1194,8 +1217,30 @@ mod tests {
|
||||
toregex!(re8, "[*]", r"^[\*]$");
|
||||
toregex!(re9, "[+]", r"^[\+]$");
|
||||
toregex!(re10, "+", r"^\+$");
|
||||
toregex!(re11, "**", r"^.*$");
|
||||
toregex!(re12, "☃", r"^\xe2\x98\x83$");
|
||||
toregex!(re11, "☃", r"^\xe2\x98\x83$");
|
||||
toregex!(re12, "**", r"^.*$");
|
||||
toregex!(re13, "**/", r"^.*$");
|
||||
toregex!(re14, "**/*", r"^(?:/?|.*/).*$");
|
||||
toregex!(re15, "**/**", r"^.*$");
|
||||
toregex!(re16, "**/**/*", r"^(?:/?|.*/).*$");
|
||||
toregex!(re17, "**/**/**", r"^.*$");
|
||||
toregex!(re18, "**/**/**/*", r"^(?:/?|.*/).*$");
|
||||
toregex!(re19, "a/**", r"^a(?:/?|/.*)$");
|
||||
toregex!(re20, "a/**/**", r"^a(?:/?|/.*)$");
|
||||
toregex!(re21, "a/**/**/**", r"^a(?:/?|/.*)$");
|
||||
toregex!(re22, "a/**/b", r"^a(?:/|/.*/)b$");
|
||||
toregex!(re23, "a/**/**/b", r"^a(?:/|/.*/)b$");
|
||||
toregex!(re24, "a/**/**/**/b", r"^a(?:/|/.*/)b$");
|
||||
toregex!(re25, "**/b", r"^(?:/?|.*/)b$");
|
||||
toregex!(re26, "**/**/b", r"^(?:/?|.*/)b$");
|
||||
toregex!(re27, "**/**/**/b", r"^(?:/?|.*/)b$");
|
||||
toregex!(re28, "a**", r"^a.*.*$");
|
||||
toregex!(re29, "**a", r"^.*.*a$");
|
||||
toregex!(re30, "a**b", r"^a.*.*b$");
|
||||
toregex!(re31, "***", r"^.*.*.*$");
|
||||
toregex!(re32, "/a**", r"^/a.*.*$");
|
||||
toregex!(re33, "/**a", r"^/.*.*a$");
|
||||
toregex!(re34, "/a**b", r"^/a.*.*b$");
|
||||
|
||||
matches!(match1, "a", "a");
|
||||
matches!(match2, "a*b", "a_b");
|
||||
|
@@ -104,27 +104,25 @@ or to enable case insensitive matching.
|
||||
#![deny(missing_docs)]
|
||||
|
||||
extern crate aho_corasick;
|
||||
extern crate bstr;
|
||||
extern crate fnv;
|
||||
#[macro_use]
|
||||
extern crate log;
|
||||
extern crate memchr;
|
||||
extern crate regex;
|
||||
|
||||
use std::borrow::Cow;
|
||||
use std::collections::{BTreeMap, HashMap};
|
||||
use std::error::Error as StdError;
|
||||
use std::ffi::OsStr;
|
||||
use std::fmt;
|
||||
use std::hash;
|
||||
use std::path::Path;
|
||||
use std::str;
|
||||
|
||||
use aho_corasick::{Automaton, AcAutomaton, FullAcAutomaton};
|
||||
use aho_corasick::AhoCorasick;
|
||||
use bstr::{B, BStr, BString};
|
||||
use regex::bytes::{Regex, RegexBuilder, RegexSet};
|
||||
|
||||
use pathutil::{
|
||||
file_name, file_name_ext, normalize_path, os_str_bytes, path_bytes,
|
||||
};
|
||||
use pathutil::{file_name, file_name_ext, normalize_path};
|
||||
use glob::MatchStrategy;
|
||||
pub use glob::{Glob, GlobBuilder, GlobMatcher};
|
||||
|
||||
@@ -143,8 +141,13 @@ pub struct Error {
|
||||
/// The kind of error that can occur when parsing a glob pattern.
|
||||
#[derive(Clone, Debug, Eq, PartialEq)]
|
||||
pub enum ErrorKind {
|
||||
/// Occurs when a use of `**` is invalid. Namely, `**` can only appear
|
||||
/// adjacent to a path separator, or the beginning/end of a glob.
|
||||
/// **DEPRECATED**.
|
||||
///
|
||||
/// This error used to occur for consistency with git's glob specification,
|
||||
/// but the specification now accepts all uses of `**`. When `**` does not
|
||||
/// appear adjacent to a path separator or at the beginning/end of a glob,
|
||||
/// it is now treated as two consecutive `*` patterns. As such, this error
|
||||
/// is no longer used.
|
||||
InvalidRecursive,
|
||||
/// Occurs when a character class (e.g., `[abc]`) is not closed.
|
||||
UnclosedClass,
|
||||
@@ -289,6 +292,7 @@ pub struct GlobSet {
|
||||
|
||||
impl GlobSet {
|
||||
/// Create an empty `GlobSet`. An empty set matches nothing.
|
||||
#[inline]
|
||||
pub fn empty() -> GlobSet {
|
||||
GlobSet {
|
||||
len: 0,
|
||||
@@ -297,11 +301,13 @@ impl GlobSet {
|
||||
}
|
||||
|
||||
/// Returns true if this set is empty, and therefore matches nothing.
|
||||
#[inline]
|
||||
pub fn is_empty(&self) -> bool {
|
||||
self.len == 0
|
||||
}
|
||||
|
||||
/// Returns the number of globs in this set.
|
||||
#[inline]
|
||||
pub fn len(&self) -> usize {
|
||||
self.len
|
||||
}
|
||||
@@ -484,24 +490,25 @@ impl GlobSetBuilder {
|
||||
/// path against multiple globs or sets of globs.
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct Candidate<'a> {
|
||||
path: Cow<'a, [u8]>,
|
||||
basename: Cow<'a, [u8]>,
|
||||
ext: Cow<'a, [u8]>,
|
||||
path: Cow<'a, BStr>,
|
||||
basename: Cow<'a, BStr>,
|
||||
ext: Cow<'a, BStr>,
|
||||
}
|
||||
|
||||
impl<'a> Candidate<'a> {
|
||||
/// Create a new candidate for matching from the given path.
|
||||
pub fn new<P: AsRef<Path> + ?Sized>(path: &'a P) -> Candidate<'a> {
|
||||
let path = path.as_ref();
|
||||
let basename = file_name(path).unwrap_or(OsStr::new(""));
|
||||
let path = normalize_path(BString::from_path_lossy(path.as_ref()));
|
||||
let basename = file_name(&path).unwrap_or(Cow::Borrowed(B("")));
|
||||
let ext = file_name_ext(&basename).unwrap_or(Cow::Borrowed(B("")));
|
||||
Candidate {
|
||||
path: normalize_path(path_bytes(path)),
|
||||
basename: os_str_bytes(basename),
|
||||
ext: file_name_ext(basename).unwrap_or(Cow::Borrowed(b"")),
|
||||
path: path,
|
||||
basename: basename,
|
||||
ext: ext,
|
||||
}
|
||||
}
|
||||
|
||||
fn path_prefix(&self, max: usize) -> &[u8] {
|
||||
fn path_prefix(&self, max: usize) -> &BStr {
|
||||
if self.path.len() <= max {
|
||||
&*self.path
|
||||
} else {
|
||||
@@ -509,7 +516,7 @@ impl<'a> Candidate<'a> {
|
||||
}
|
||||
}
|
||||
|
||||
fn path_suffix(&self, max: usize) -> &[u8] {
|
||||
fn path_suffix(&self, max: usize) -> &BStr {
|
||||
if self.path.len() <= max {
|
||||
&*self.path
|
||||
} else {
|
||||
@@ -570,12 +577,12 @@ impl LiteralStrategy {
|
||||
}
|
||||
|
||||
fn is_match(&self, candidate: &Candidate) -> bool {
|
||||
self.0.contains_key(&*candidate.path)
|
||||
self.0.contains_key(candidate.path.as_bytes())
|
||||
}
|
||||
|
||||
#[inline(never)]
|
||||
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
|
||||
if let Some(hits) = self.0.get(&*candidate.path) {
|
||||
if let Some(hits) = self.0.get(candidate.path.as_bytes()) {
|
||||
matches.extend(hits);
|
||||
}
|
||||
}
|
||||
@@ -597,7 +604,7 @@ impl BasenameLiteralStrategy {
|
||||
if candidate.basename.is_empty() {
|
||||
return false;
|
||||
}
|
||||
self.0.contains_key(&*candidate.basename)
|
||||
self.0.contains_key(candidate.basename.as_bytes())
|
||||
}
|
||||
|
||||
#[inline(never)]
|
||||
@@ -605,7 +612,7 @@ impl BasenameLiteralStrategy {
|
||||
if candidate.basename.is_empty() {
|
||||
return;
|
||||
}
|
||||
if let Some(hits) = self.0.get(&*candidate.basename) {
|
||||
if let Some(hits) = self.0.get(candidate.basename.as_bytes()) {
|
||||
matches.extend(hits);
|
||||
}
|
||||
}
|
||||
@@ -627,7 +634,7 @@ impl ExtensionStrategy {
|
||||
if candidate.ext.is_empty() {
|
||||
return false;
|
||||
}
|
||||
self.0.contains_key(&*candidate.ext)
|
||||
self.0.contains_key(candidate.ext.as_bytes())
|
||||
}
|
||||
|
||||
#[inline(never)]
|
||||
@@ -635,7 +642,7 @@ impl ExtensionStrategy {
|
||||
if candidate.ext.is_empty() {
|
||||
return;
|
||||
}
|
||||
if let Some(hits) = self.0.get(&*candidate.ext) {
|
||||
if let Some(hits) = self.0.get(candidate.ext.as_bytes()) {
|
||||
matches.extend(hits);
|
||||
}
|
||||
}
|
||||
@@ -643,7 +650,7 @@ impl ExtensionStrategy {
|
||||
|
||||
#[derive(Clone, Debug)]
|
||||
struct PrefixStrategy {
|
||||
matcher: FullAcAutomaton<Vec<u8>>,
|
||||
matcher: AhoCorasick,
|
||||
map: Vec<usize>,
|
||||
longest: usize,
|
||||
}
|
||||
@@ -651,8 +658,8 @@ struct PrefixStrategy {
|
||||
impl PrefixStrategy {
|
||||
fn is_match(&self, candidate: &Candidate) -> bool {
|
||||
let path = candidate.path_prefix(self.longest);
|
||||
for m in self.matcher.find_overlapping(path) {
|
||||
if m.start == 0 {
|
||||
for m in self.matcher.find_overlapping_iter(path) {
|
||||
if m.start() == 0 {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
@@ -661,9 +668,9 @@ impl PrefixStrategy {
|
||||
|
||||
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
|
||||
let path = candidate.path_prefix(self.longest);
|
||||
for m in self.matcher.find_overlapping(path) {
|
||||
if m.start == 0 {
|
||||
matches.push(self.map[m.pati]);
|
||||
for m in self.matcher.find_overlapping_iter(path) {
|
||||
if m.start() == 0 {
|
||||
matches.push(self.map[m.pattern()]);
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -671,7 +678,7 @@ impl PrefixStrategy {
|
||||
|
||||
#[derive(Clone, Debug)]
|
||||
struct SuffixStrategy {
|
||||
matcher: FullAcAutomaton<Vec<u8>>,
|
||||
matcher: AhoCorasick,
|
||||
map: Vec<usize>,
|
||||
longest: usize,
|
||||
}
|
||||
@@ -679,8 +686,8 @@ struct SuffixStrategy {
|
||||
impl SuffixStrategy {
|
||||
fn is_match(&self, candidate: &Candidate) -> bool {
|
||||
let path = candidate.path_suffix(self.longest);
|
||||
for m in self.matcher.find_overlapping(path) {
|
||||
if m.end == path.len() {
|
||||
for m in self.matcher.find_overlapping_iter(path) {
|
||||
if m.end() == path.len() {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
@@ -689,9 +696,9 @@ impl SuffixStrategy {
|
||||
|
||||
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
|
||||
let path = candidate.path_suffix(self.longest);
|
||||
for m in self.matcher.find_overlapping(path) {
|
||||
if m.end == path.len() {
|
||||
matches.push(self.map[m.pati]);
|
||||
for m in self.matcher.find_overlapping_iter(path) {
|
||||
if m.end() == path.len() {
|
||||
matches.push(self.map[m.pattern()]);
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -705,11 +712,11 @@ impl RequiredExtensionStrategy {
|
||||
if candidate.ext.is_empty() {
|
||||
return false;
|
||||
}
|
||||
match self.0.get(&*candidate.ext) {
|
||||
match self.0.get(candidate.ext.as_bytes()) {
|
||||
None => false,
|
||||
Some(regexes) => {
|
||||
for &(_, ref re) in regexes {
|
||||
if re.is_match(&*candidate.path) {
|
||||
if re.is_match(candidate.path.as_bytes()) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
@@ -723,9 +730,9 @@ impl RequiredExtensionStrategy {
|
||||
if candidate.ext.is_empty() {
|
||||
return;
|
||||
}
|
||||
if let Some(regexes) = self.0.get(&*candidate.ext) {
|
||||
if let Some(regexes) = self.0.get(candidate.ext.as_bytes()) {
|
||||
for &(global_index, ref re) in regexes {
|
||||
if re.is_match(&*candidate.path) {
|
||||
if re.is_match(candidate.path.as_bytes()) {
|
||||
matches.push(global_index);
|
||||
}
|
||||
}
|
||||
@@ -741,11 +748,11 @@ struct RegexSetStrategy {
|
||||
|
||||
impl RegexSetStrategy {
|
||||
fn is_match(&self, candidate: &Candidate) -> bool {
|
||||
self.matcher.is_match(&*candidate.path)
|
||||
self.matcher.is_match(candidate.path.as_bytes())
|
||||
}
|
||||
|
||||
fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
|
||||
for i in self.matcher.matches(&*candidate.path) {
|
||||
for i in self.matcher.matches(candidate.path.as_bytes()) {
|
||||
matches.push(self.map[i]);
|
||||
}
|
||||
}
|
||||
@@ -776,18 +783,16 @@ impl MultiStrategyBuilder {
|
||||
}
|
||||
|
||||
fn prefix(self) -> PrefixStrategy {
|
||||
let it = self.literals.into_iter().map(|s| s.into_bytes());
|
||||
PrefixStrategy {
|
||||
matcher: AcAutomaton::new(it).into_full(),
|
||||
matcher: AhoCorasick::new_auto_configured(&self.literals),
|
||||
map: self.map,
|
||||
longest: self.longest,
|
||||
}
|
||||
}
|
||||
|
||||
fn suffix(self) -> SuffixStrategy {
|
||||
let it = self.literals.into_iter().map(|s| s.into_bytes());
|
||||
SuffixStrategy {
|
||||
matcher: AcAutomaton::new(it).into_full(),
|
||||
matcher: AhoCorasick::new_auto_configured(&self.literals),
|
||||
map: self.map,
|
||||
longest: self.longest,
|
||||
}
|
||||
|
@@ -1,41 +1,26 @@
|
||||
use std::borrow::Cow;
|
||||
use std::ffi::OsStr;
|
||||
use std::path::Path;
|
||||
|
||||
use bstr::BStr;
|
||||
|
||||
/// The final component of the path, if it is a normal file.
|
||||
///
|
||||
/// If the path terminates in ., .., or consists solely of a root of prefix,
|
||||
/// file_name will return None.
|
||||
#[cfg(unix)]
|
||||
pub fn file_name<'a, P: AsRef<Path> + ?Sized>(
|
||||
path: &'a P,
|
||||
) -> Option<&'a OsStr> {
|
||||
use std::os::unix::ffi::OsStrExt;
|
||||
use memchr::memrchr;
|
||||
|
||||
let path = path.as_ref().as_os_str().as_bytes();
|
||||
pub fn file_name<'a>(path: &Cow<'a, BStr>) -> Option<Cow<'a, BStr>> {
|
||||
if path.is_empty() {
|
||||
return None;
|
||||
} else if path.len() == 1 && path[0] == b'.' {
|
||||
return None;
|
||||
} else if path.last() == Some(&b'.') {
|
||||
return None;
|
||||
} else if path.len() >= 2 && &path[path.len() - 2..] == &b".."[..] {
|
||||
} else if path.last() == Some(b'.') {
|
||||
return None;
|
||||
}
|
||||
let last_slash = memrchr(b'/', path).map(|i| i + 1).unwrap_or(0);
|
||||
Some(OsStr::from_bytes(&path[last_slash..]))
|
||||
}
|
||||
|
||||
/// The final component of the path, if it is a normal file.
|
||||
///
|
||||
/// If the path terminates in ., .., or consists solely of a root of prefix,
|
||||
/// file_name will return None.
|
||||
#[cfg(not(unix))]
|
||||
pub fn file_name<'a, P: AsRef<Path> + ?Sized>(
|
||||
path: &'a P,
|
||||
) -> Option<&'a OsStr> {
|
||||
path.as_ref().file_name()
|
||||
let last_slash = path.rfind_byte(b'/').map(|i| i + 1).unwrap_or(0);
|
||||
Some(match *path {
|
||||
Cow::Borrowed(path) => Cow::Borrowed(&path[last_slash..]),
|
||||
Cow::Owned(ref path) => {
|
||||
let mut path = path.clone();
|
||||
path.drain_bytes(..last_slash);
|
||||
Cow::Owned(path)
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
/// Return a file extension given a path's file name.
|
||||
@@ -54,59 +39,28 @@ pub fn file_name<'a, P: AsRef<Path> + ?Sized>(
|
||||
/// a pattern like `*.rs` is obviously trying to match files with a `rs`
|
||||
/// extension, but it also matches files like `.rs`, which doesn't have an
|
||||
/// extension according to std::path::Path::extension.
|
||||
pub fn file_name_ext(name: &OsStr) -> Option<Cow<[u8]>> {
|
||||
pub fn file_name_ext<'a>(name: &Cow<'a, BStr>) -> Option<Cow<'a, BStr>> {
|
||||
if name.is_empty() {
|
||||
return None;
|
||||
}
|
||||
let name = os_str_bytes(name);
|
||||
let last_dot_at = {
|
||||
let result = name
|
||||
.iter().enumerate().rev()
|
||||
.find(|&(_, &b)| b == b'.')
|
||||
.map(|(i, _)| i);
|
||||
match result {
|
||||
None => return None,
|
||||
Some(i) => i,
|
||||
}
|
||||
let last_dot_at = match name.rfind_byte(b'.') {
|
||||
None => return None,
|
||||
Some(i) => i,
|
||||
};
|
||||
Some(match name {
|
||||
Some(match *name {
|
||||
Cow::Borrowed(name) => Cow::Borrowed(&name[last_dot_at..]),
|
||||
Cow::Owned(mut name) => {
|
||||
name.drain(..last_dot_at);
|
||||
Cow::Owned(ref name) => {
|
||||
let mut name = name.clone();
|
||||
name.drain_bytes(..last_dot_at);
|
||||
Cow::Owned(name)
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
/// Return raw bytes of a path, transcoded to UTF-8 if necessary.
|
||||
pub fn path_bytes(path: &Path) -> Cow<[u8]> {
|
||||
os_str_bytes(path.as_os_str())
|
||||
}
|
||||
|
||||
/// Return the raw bytes of the given OS string, possibly transcoded to UTF-8.
|
||||
#[cfg(unix)]
|
||||
pub fn os_str_bytes(s: &OsStr) -> Cow<[u8]> {
|
||||
use std::os::unix::ffi::OsStrExt;
|
||||
Cow::Borrowed(s.as_bytes())
|
||||
}
|
||||
|
||||
/// Return the raw bytes of the given OS string, possibly transcoded to UTF-8.
|
||||
#[cfg(not(unix))]
|
||||
pub fn os_str_bytes(s: &OsStr) -> Cow<[u8]> {
|
||||
// TODO(burntsushi): On Windows, OS strings are WTF-8, which is a superset
|
||||
// of UTF-8, so even if we could get at the raw bytes, they wouldn't
|
||||
// be useful. We *must* convert to UTF-8 before doing path matching.
|
||||
// Unfortunate, but necessary.
|
||||
match s.to_string_lossy() {
|
||||
Cow::Owned(s) => Cow::Owned(s.into_bytes()),
|
||||
Cow::Borrowed(s) => Cow::Borrowed(s.as_bytes()),
|
||||
}
|
||||
}
|
||||
|
||||
/// Normalizes a path to use `/` as a separator everywhere, even on platforms
|
||||
/// that recognize other characters as separators.
|
||||
#[cfg(unix)]
|
||||
pub fn normalize_path(path: Cow<[u8]>) -> Cow<[u8]> {
|
||||
pub fn normalize_path(path: Cow<BStr>) -> Cow<BStr> {
|
||||
// UNIX only uses /, so we're good.
|
||||
path
|
||||
}
|
||||
@@ -114,7 +68,7 @@ pub fn normalize_path(path: Cow<[u8]>) -> Cow<[u8]> {
|
||||
/// Normalizes a path to use `/` as a separator everywhere, even on platforms
|
||||
/// that recognize other characters as separators.
|
||||
#[cfg(not(unix))]
|
||||
pub fn normalize_path(mut path: Cow<[u8]>) -> Cow<[u8]> {
|
||||
pub fn normalize_path(mut path: Cow<BStr>) -> Cow<BStr> {
|
||||
use std::path::is_separator;
|
||||
|
||||
for i in 0..path.len() {
|
||||
@@ -129,7 +83,8 @@ pub fn normalize_path(mut path: Cow<[u8]>) -> Cow<[u8]> {
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use std::borrow::Cow;
|
||||
use std::ffi::OsStr;
|
||||
|
||||
use bstr::{B, BString};
|
||||
|
||||
use super::{file_name_ext, normalize_path};
|
||||
|
||||
@@ -137,8 +92,9 @@ mod tests {
|
||||
($name:ident, $file_name:expr, $ext:expr) => {
|
||||
#[test]
|
||||
fn $name() {
|
||||
let got = file_name_ext(OsStr::new($file_name));
|
||||
assert_eq!($ext.map(|s| Cow::Borrowed(s.as_bytes())), got);
|
||||
let bs = BString::from($file_name);
|
||||
let got = file_name_ext(&Cow::Owned(bs));
|
||||
assert_eq!($ext.map(|s| Cow::Borrowed(B(s))), got);
|
||||
}
|
||||
};
|
||||
}
|
||||
@@ -153,7 +109,8 @@ mod tests {
|
||||
($name:ident, $path:expr, $expected:expr) => {
|
||||
#[test]
|
||||
fn $name() {
|
||||
let got = normalize_path(Cow::Owned($path.to_vec()));
|
||||
let bs = BString::from_slice($path);
|
||||
let got = normalize_path(Cow::Owned(bs));
|
||||
assert_eq!($expected.to_vec(), got.into_owned());
|
||||
}
|
||||
};
|
||||
|
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "grep-cli"
|
||||
version = "0.1.1" #:version
|
||||
version = "0.1.2" #:version
|
||||
authors = ["Andrew Gallant <jamslam@gmail.com>"]
|
||||
description = """
|
||||
Utilities for search oriented command line applications.
|
||||
@@ -14,12 +14,13 @@ license = "Unlicense/MIT"
|
||||
|
||||
[dependencies]
|
||||
atty = "0.2.11"
|
||||
globset = { version = "0.4.2", path = "../globset" }
|
||||
bstr = "0.1.2"
|
||||
globset = { version = "0.4.3", path = "../globset" }
|
||||
lazy_static = "1.1.0"
|
||||
log = "0.4.5"
|
||||
regex = "1.0.5"
|
||||
same-file = "1.0.3"
|
||||
termcolor = "1.0.3"
|
||||
regex = "1.1"
|
||||
same-file = "1.0.4"
|
||||
termcolor = "1.0.4"
|
||||
|
||||
[target.'cfg(windows)'.dependencies.winapi-util]
|
||||
version = "0.1.1"
|
||||
|
@@ -352,6 +352,8 @@ fn default_decompression_commands() -> Vec<DecompressionCommand> {
|
||||
const ARGS_XZ: &[&str] = &["xz", "-d", "-c"];
|
||||
const ARGS_LZ4: &[&str] = &["lz4", "-d", "-c"];
|
||||
const ARGS_LZMA: &[&str] = &["xz", "--format=lzma", "-d", "-c"];
|
||||
const ARGS_BROTLI: &[&str] = &["brotli", "-d", "-c"];
|
||||
const ARGS_ZSTD: &[&str] = &["zstd", "-q", "-d", "-c"];
|
||||
|
||||
fn cmd(glob: &str, args: &[&str]) -> DecompressionCommand {
|
||||
DecompressionCommand {
|
||||
@@ -367,15 +369,14 @@ fn default_decompression_commands() -> Vec<DecompressionCommand> {
|
||||
vec![
|
||||
cmd("*.gz", ARGS_GZIP),
|
||||
cmd("*.tgz", ARGS_GZIP),
|
||||
|
||||
cmd("*.bz2", ARGS_BZIP),
|
||||
cmd("*.tbz2", ARGS_BZIP),
|
||||
|
||||
cmd("*.xz", ARGS_XZ),
|
||||
cmd("*.txz", ARGS_XZ),
|
||||
|
||||
cmd("*.lz4", ARGS_LZ4),
|
||||
|
||||
cmd("*.lzma", ARGS_LZMA),
|
||||
cmd("*.br", ARGS_BROTLI),
|
||||
cmd("*.zst", ARGS_ZSTD),
|
||||
cmd("*.zstd", ARGS_ZSTD),
|
||||
]
|
||||
}
|
||||
|
@@ -1,6 +1,8 @@
|
||||
use std::ffi::OsStr;
|
||||
use std::str;
|
||||
|
||||
use bstr::{BStr, BString};
|
||||
|
||||
/// A single state in the state machine used by `unescape`.
|
||||
#[derive(Clone, Copy, Eq, PartialEq)]
|
||||
enum State {
|
||||
@@ -35,18 +37,16 @@ enum State {
|
||||
///
|
||||
/// assert_eq!(r"foo\nbar\xFFbaz", escape(b"foo\nbar\xFFbaz"));
|
||||
/// ```
|
||||
pub fn escape(mut bytes: &[u8]) -> String {
|
||||
pub fn escape(bytes: &[u8]) -> String {
|
||||
let bytes = BStr::new(bytes);
|
||||
let mut escaped = String::new();
|
||||
while let Some(result) = decode_utf8(bytes) {
|
||||
match result {
|
||||
Ok(cp) => {
|
||||
escape_char(cp, &mut escaped);
|
||||
bytes = &bytes[cp.len_utf8()..];
|
||||
}
|
||||
Err(byte) => {
|
||||
escape_byte(byte, &mut escaped);
|
||||
bytes = &bytes[1..];
|
||||
for (s, e, ch) in bytes.char_indices() {
|
||||
if ch == '\u{FFFD}' {
|
||||
for b in bytes[s..e].bytes() {
|
||||
escape_byte(b, &mut escaped);
|
||||
}
|
||||
} else {
|
||||
escape_char(ch, &mut escaped);
|
||||
}
|
||||
}
|
||||
escaped
|
||||
@@ -56,19 +56,7 @@ pub fn escape(mut bytes: &[u8]) -> String {
|
||||
///
|
||||
/// This is like [`escape`](fn.escape.html), but accepts an OS string.
|
||||
pub fn escape_os(string: &OsStr) -> String {
|
||||
#[cfg(unix)]
|
||||
fn imp(string: &OsStr) -> String {
|
||||
use std::os::unix::ffi::OsStrExt;
|
||||
|
||||
escape(string.as_bytes())
|
||||
}
|
||||
|
||||
#[cfg(not(unix))]
|
||||
fn imp(string: &OsStr) -> String {
|
||||
escape(string.to_string_lossy().as_bytes())
|
||||
}
|
||||
|
||||
imp(string)
|
||||
escape(BString::from_os_str_lossy(string).as_bytes())
|
||||
}
|
||||
|
||||
/// Unescapes a string.
|
||||
@@ -195,46 +183,6 @@ fn escape_byte(byte: u8, into: &mut String) {
|
||||
}
|
||||
}
|
||||
|
||||
/// Decodes the next UTF-8 encoded codepoint from the given byte slice.
|
||||
///
|
||||
/// If no valid encoding of a codepoint exists at the beginning of the given
|
||||
/// byte slice, then the first byte is returned instead.
|
||||
///
|
||||
/// This returns `None` if and only if `bytes` is empty.
|
||||
fn decode_utf8(bytes: &[u8]) -> Option<Result<char, u8>> {
|
||||
if bytes.is_empty() {
|
||||
return None;
|
||||
}
|
||||
let len = match utf8_len(bytes[0]) {
|
||||
None => return Some(Err(bytes[0])),
|
||||
Some(len) if len > bytes.len() => return Some(Err(bytes[0])),
|
||||
Some(len) => len,
|
||||
};
|
||||
match str::from_utf8(&bytes[..len]) {
|
||||
Ok(s) => Some(Ok(s.chars().next().unwrap())),
|
||||
Err(_) => Some(Err(bytes[0])),
|
||||
}
|
||||
}
|
||||
|
||||
/// Given a UTF-8 leading byte, this returns the total number of code units
|
||||
/// in the following encoded codepoint.
|
||||
///
|
||||
/// If the given byte is not a valid UTF-8 leading byte, then this returns
|
||||
/// `None`.
|
||||
fn utf8_len(byte: u8) -> Option<usize> {
|
||||
if byte <= 0x7F {
|
||||
Some(1)
|
||||
} else if byte <= 0b110_11111 {
|
||||
Some(2)
|
||||
} else if byte <= 0b1110_1111 {
|
||||
Some(3)
|
||||
} else if byte <= 0b1111_0111 {
|
||||
Some(4)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::{escape, unescape};
|
||||
|
@@ -159,6 +159,7 @@ error message is crafted that typically tells the user how to fix the problem.
|
||||
#![deny(missing_docs)]
|
||||
|
||||
extern crate atty;
|
||||
extern crate bstr;
|
||||
extern crate globset;
|
||||
#[macro_use]
|
||||
extern crate lazy_static;
|
||||
|
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "grep-matcher"
|
||||
version = "0.1.1" #:version
|
||||
version = "0.1.2" #:version
|
||||
authors = ["Andrew Gallant <jamslam@gmail.com>"]
|
||||
description = """
|
||||
A trait for regular expressions, with a focus on line oriented search.
|
||||
@@ -14,10 +14,10 @@ license = "Unlicense/MIT"
|
||||
autotests = false
|
||||
|
||||
[dependencies]
|
||||
memchr = "2.0.2"
|
||||
memchr = "2.1"
|
||||
|
||||
[dev-dependencies]
|
||||
regex = "1.0.5"
|
||||
regex = "1.1"
|
||||
|
||||
[[test]]
|
||||
name = "integration"
|
||||
|
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "grep-pcre2"
|
||||
version = "0.1.1" #:version
|
||||
version = "0.1.3" #:version
|
||||
authors = ["Andrew Gallant <jamslam@gmail.com>"]
|
||||
description = """
|
||||
Use PCRE2 with the 'grep' crate.
|
||||
@@ -13,5 +13,5 @@ keywords = ["regex", "grep", "pcre", "backreference", "look"]
|
||||
license = "Unlicense/MIT"
|
||||
|
||||
[dependencies]
|
||||
grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
|
||||
pcre2 = "0.1.0"
|
||||
grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
|
||||
pcre2 = "0.2.0"
|
||||
|
@@ -10,6 +10,7 @@ extern crate pcre2;
|
||||
|
||||
pub use error::{Error, ErrorKind};
|
||||
pub use matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
|
||||
pub use pcre2::{is_jit_available, version};
|
||||
|
||||
mod error;
|
||||
mod matcher;
|
||||
|
@@ -199,16 +199,55 @@ impl RegexMatcherBuilder {
|
||||
self
|
||||
}
|
||||
|
||||
/// Enable PCRE2's JIT.
|
||||
/// Enable PCRE2's JIT and return an error if it's not available.
|
||||
///
|
||||
/// This generally speeds up matching quite a bit. The downside is that it
|
||||
/// can increase the time it takes to compile a pattern.
|
||||
///
|
||||
/// This is disabled by default.
|
||||
/// If the JIT isn't available or if JIT compilation returns an error, then
|
||||
/// regex compilation will fail with the corresponding error.
|
||||
///
|
||||
/// This is disabled by default, and always overrides `jit_if_available`.
|
||||
pub fn jit(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
|
||||
self.builder.jit(yes);
|
||||
self
|
||||
}
|
||||
|
||||
/// Enable PCRE2's JIT if it's available.
|
||||
///
|
||||
/// This generally speeds up matching quite a bit. The downside is that it
|
||||
/// can increase the time it takes to compile a pattern.
|
||||
///
|
||||
/// If the JIT isn't available or if JIT compilation returns an error,
|
||||
/// then a debug message with the error will be emitted and the regex will
|
||||
/// otherwise silently fall back to non-JIT matching.
|
||||
///
|
||||
/// This is disabled by default, and always overrides `jit`.
|
||||
pub fn jit_if_available(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
|
||||
self.builder.jit_if_available(yes);
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the maximum size of PCRE2's JIT stack, in bytes. If the JIT is
|
||||
/// not enabled, then this has no effect.
|
||||
///
|
||||
/// When `None` is given, no custom JIT stack will be created, and instead,
|
||||
/// the default JIT stack is used. When the default is used, its maximum
|
||||
/// size is 32 KB.
|
||||
///
|
||||
/// When this is set, then a new JIT stack will be created with the given
|
||||
/// maximum size as its limit.
|
||||
///
|
||||
/// Increasing the stack size can be useful for larger regular expressions.
|
||||
///
|
||||
/// By default, this is set to `None`.
|
||||
pub fn max_jit_stack_size(
|
||||
&mut self,
|
||||
bytes: Option<usize>,
|
||||
) -> &mut RegexMatcherBuilder {
|
||||
self.builder.max_jit_stack_size(bytes);
|
||||
self
|
||||
}
|
||||
}
|
||||
|
||||
/// An implementation of the `Matcher` trait using PCRE2.
|
||||
|
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "grep-printer"
|
||||
version = "0.1.1" #:version
|
||||
version = "0.1.2" #:version
|
||||
authors = ["Andrew Gallant <jamslam@gmail.com>"]
|
||||
description = """
|
||||
An implementation of the grep crate's Sink trait that provides standard
|
||||
@@ -18,13 +18,14 @@ default = ["serde1"]
|
||||
serde1 = ["base64", "serde", "serde_derive", "serde_json"]
|
||||
|
||||
[dependencies]
|
||||
base64 = { version = "0.9.2", optional = true }
|
||||
grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
|
||||
grep-searcher = { version = "0.1.1", path = "../grep-searcher" }
|
||||
termcolor = "1.0.3"
|
||||
base64 = { version = "0.10.0", optional = true }
|
||||
bstr = "0.1.2"
|
||||
grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
|
||||
grep-searcher = { version = "0.1.4", path = "../grep-searcher" }
|
||||
termcolor = "1.0.4"
|
||||
serde = { version = "1.0.77", optional = true }
|
||||
serde_derive = { version = "1.0.77", optional = true }
|
||||
serde_json = { version = "1.0.27", optional = true }
|
||||
|
||||
[dev-dependencies]
|
||||
grep-regex = { version = "0.1.1", path = "../grep-regex" }
|
||||
grep-regex = { version = "0.1.3", path = "../grep-regex" }
|
||||
|
@@ -817,7 +817,8 @@ impl<'a> SubMatches<'a> {
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use grep_regex::RegexMatcher;
|
||||
use grep_regex::{RegexMatcher, RegexMatcherBuilder};
|
||||
use grep_matcher::LineTerminator;
|
||||
use grep_searcher::SearcherBuilder;
|
||||
|
||||
use super::{JSON, JSONBuilder};
|
||||
@@ -918,4 +919,45 @@ and exhibited clearly, with a label attached.\
|
||||
assert_eq!(got.lines().count(), 2);
|
||||
assert!(got.contains("begin") && got.contains("end"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn missing_crlf() {
|
||||
let haystack = "test\r\n".as_bytes();
|
||||
|
||||
let matcher = RegexMatcherBuilder::new()
|
||||
.build("test")
|
||||
.unwrap();
|
||||
let mut printer = JSONBuilder::new()
|
||||
.build(vec![]);
|
||||
SearcherBuilder::new()
|
||||
.build()
|
||||
.search_reader(&matcher, haystack, printer.sink(&matcher))
|
||||
.unwrap();
|
||||
let got = printer_contents(&mut printer);
|
||||
assert_eq!(got.lines().count(), 3);
|
||||
assert!(
|
||||
got.lines().nth(1).unwrap().contains(r"test\r\n"),
|
||||
r"missing 'test\r\n' in '{}'",
|
||||
got.lines().nth(1).unwrap(),
|
||||
);
|
||||
|
||||
let matcher = RegexMatcherBuilder::new()
|
||||
.crlf(true)
|
||||
.build("test")
|
||||
.unwrap();
|
||||
let mut printer = JSONBuilder::new()
|
||||
.build(vec![]);
|
||||
SearcherBuilder::new()
|
||||
.line_terminator(LineTerminator::crlf())
|
||||
.build()
|
||||
.search_reader(&matcher, haystack, printer.sink(&matcher))
|
||||
.unwrap();
|
||||
let got = printer_contents(&mut printer);
|
||||
assert_eq!(got.lines().count(), 3);
|
||||
assert!(
|
||||
got.lines().nth(1).unwrap().contains(r"test\r\n"),
|
||||
r"missing 'test\r\n' in '{}'",
|
||||
got.lines().nth(1).unwrap(),
|
||||
);
|
||||
}
|
||||
}
|
||||
|
@@ -70,6 +70,7 @@ fn example() -> Result<(), Box<Error>> {
|
||||
|
||||
#[cfg(feature = "serde1")]
|
||||
extern crate base64;
|
||||
extern crate bstr;
|
||||
extern crate grep_matcher;
|
||||
#[cfg(test)]
|
||||
extern crate grep_regex;
|
||||
|
@@ -1,3 +1,4 @@
|
||||
/// Like assert_eq, but nicer output for long strings.
|
||||
#[cfg(test)]
|
||||
#[macro_export]
|
||||
macro_rules! assert_eq_printed {
|
||||
|
@@ -5,6 +5,7 @@ use std::path::Path;
|
||||
use std::sync::Arc;
|
||||
use std::time::Instant;
|
||||
|
||||
use bstr::BStr;
|
||||
use grep_matcher::{Match, Matcher};
|
||||
use grep_searcher::{
|
||||
LineStep, Searcher,
|
||||
@@ -16,10 +17,7 @@ use termcolor::{ColorSpec, NoColor, WriteColor};
|
||||
use color::ColorSpecs;
|
||||
use counter::CounterWriter;
|
||||
use stats::Stats;
|
||||
use util::{
|
||||
PrinterPath, Replacer, Sunk,
|
||||
trim_ascii_prefix, trim_ascii_prefix_range,
|
||||
};
|
||||
use util::{PrinterPath, Replacer, Sunk, trim_ascii_prefix};
|
||||
|
||||
/// The configuration for the standard printer.
|
||||
///
|
||||
@@ -36,6 +34,7 @@ struct Config {
|
||||
per_match: bool,
|
||||
replacement: Arc<Option<Vec<u8>>>,
|
||||
max_columns: Option<u64>,
|
||||
max_columns_preview: bool,
|
||||
max_matches: Option<u64>,
|
||||
column: bool,
|
||||
byte_offset: bool,
|
||||
@@ -59,6 +58,7 @@ impl Default for Config {
|
||||
per_match: false,
|
||||
replacement: Arc::new(None),
|
||||
max_columns: None,
|
||||
max_columns_preview: false,
|
||||
max_matches: None,
|
||||
column: false,
|
||||
byte_offset: false,
|
||||
@@ -263,6 +263,21 @@ impl StandardBuilder {
|
||||
self
|
||||
}
|
||||
|
||||
/// When enabled, if a line is found to be over the configured maximum
|
||||
/// column limit (measured in terms of bytes), then a preview of the long
|
||||
/// line will be printed instead.
|
||||
///
|
||||
/// The preview will correspond to the first `N` *grapheme clusters* of
|
||||
/// the line, where `N` is the limit configured by `max_columns`.
|
||||
///
|
||||
/// If no limit is set, then enabling this has no effect.
|
||||
///
|
||||
/// This is disabled by default.
|
||||
pub fn max_columns_preview(&mut self, yes: bool) -> &mut StandardBuilder {
|
||||
self.config.max_columns_preview = yes;
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the maximum amount of matching lines that are printed.
|
||||
///
|
||||
/// If multi line search is enabled and a match spans multiple lines, then
|
||||
@@ -743,6 +758,11 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
|
||||
stats.add_matches(self.standard.matches.len() as u64);
|
||||
stats.add_matched_lines(mat.lines().count() as u64);
|
||||
}
|
||||
if searcher.binary_detection().convert_byte().is_some() {
|
||||
if self.binary_byte_offset.is_some() {
|
||||
return Ok(false);
|
||||
}
|
||||
}
|
||||
|
||||
StandardImpl::from_match(searcher, self, mat).sink()?;
|
||||
Ok(!self.should_quit())
|
||||
@@ -764,6 +784,12 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
|
||||
self.record_matches(ctx.bytes())?;
|
||||
self.replace(ctx.bytes())?;
|
||||
}
|
||||
if searcher.binary_detection().convert_byte().is_some() {
|
||||
if self.binary_byte_offset.is_some() {
|
||||
return Ok(false);
|
||||
}
|
||||
}
|
||||
|
||||
StandardImpl::from_context(searcher, self, ctx).sink()?;
|
||||
Ok(!self.should_quit())
|
||||
}
|
||||
@@ -776,6 +802,15 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn binary_data(
|
||||
&mut self,
|
||||
_searcher: &Searcher,
|
||||
binary_byte_offset: u64,
|
||||
) -> Result<bool, io::Error> {
|
||||
self.binary_byte_offset = Some(binary_byte_offset);
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn begin(
|
||||
&mut self,
|
||||
_searcher: &Searcher,
|
||||
@@ -793,10 +828,12 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
|
||||
|
||||
fn finish(
|
||||
&mut self,
|
||||
_searcher: &Searcher,
|
||||
searcher: &Searcher,
|
||||
finish: &SinkFinish,
|
||||
) -> Result<(), io::Error> {
|
||||
self.binary_byte_offset = finish.binary_byte_offset();
|
||||
if let Some(offset) = self.binary_byte_offset {
|
||||
StandardImpl::new(searcher, self).write_binary_message(offset)?;
|
||||
}
|
||||
if let Some(stats) = self.stats.as_mut() {
|
||||
stats.add_elapsed(self.start_time.elapsed());
|
||||
stats.add_searches(1);
|
||||
@@ -1000,43 +1037,11 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
)?;
|
||||
count += 1;
|
||||
if self.exceeds_max_columns(&bytes[line]) {
|
||||
self.write_exceeded_line()?;
|
||||
continue;
|
||||
self.write_exceeded_line(bytes, line, matches, &mut midx)?;
|
||||
} else {
|
||||
self.write_colored_matches(bytes, line, matches, &mut midx)?;
|
||||
self.write_line_term()?;
|
||||
}
|
||||
if self.has_line_terminator(&bytes[line]) {
|
||||
line = line.with_end(line.end() - 1);
|
||||
}
|
||||
if self.config().trim_ascii {
|
||||
line = self.trim_ascii_prefix_range(bytes, line);
|
||||
}
|
||||
|
||||
while !line.is_empty() {
|
||||
if matches[midx].end() <= line.start() {
|
||||
if midx + 1 < matches.len() {
|
||||
midx += 1;
|
||||
continue;
|
||||
} else {
|
||||
self.end_color_match()?;
|
||||
self.write(&bytes[line])?;
|
||||
break;
|
||||
}
|
||||
}
|
||||
let m = matches[midx];
|
||||
|
||||
if line.start() < m.start() {
|
||||
let upto = cmp::min(line.end(), m.start());
|
||||
self.end_color_match()?;
|
||||
self.write(&bytes[line.with_end(upto)])?;
|
||||
line = line.with_start(upto);
|
||||
} else {
|
||||
let upto = cmp::min(line.end(), m.end());
|
||||
self.start_color_match()?;
|
||||
self.write(&bytes[line.with_end(upto)])?;
|
||||
line = line.with_start(upto);
|
||||
}
|
||||
}
|
||||
self.end_color_match()?;
|
||||
self.write_line_term()?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
@@ -1051,12 +1056,8 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
let mut stepper = LineStep::new(line_term, 0, bytes.len());
|
||||
while let Some((start, end)) = stepper.next(bytes) {
|
||||
let mut line = Match::new(start, end);
|
||||
if self.has_line_terminator(&bytes[line]) {
|
||||
line = line.with_end(line.end() - 1);
|
||||
}
|
||||
if self.config().trim_ascii {
|
||||
line = self.trim_ascii_prefix_range(bytes, line);
|
||||
}
|
||||
self.trim_line_terminator(bytes, &mut line);
|
||||
self.trim_ascii_prefix(bytes, &mut line);
|
||||
while !line.is_empty() {
|
||||
if matches[midx].end() <= line.start() {
|
||||
if midx + 1 < matches.len() {
|
||||
@@ -1079,14 +1080,19 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
Some(m.start() as u64 + 1),
|
||||
)?;
|
||||
|
||||
let buf = &bytes[line.with_end(upto)];
|
||||
let this_line = line.with_end(upto);
|
||||
line = line.with_start(upto);
|
||||
if self.exceeds_max_columns(&buf) {
|
||||
self.write_exceeded_line()?;
|
||||
continue;
|
||||
if self.exceeds_max_columns(&bytes[this_line]) {
|
||||
self.write_exceeded_line(
|
||||
bytes,
|
||||
this_line,
|
||||
matches,
|
||||
&mut midx,
|
||||
)?;
|
||||
} else {
|
||||
self.write_spec(spec, &bytes[this_line])?;
|
||||
self.write_line_term()?;
|
||||
}
|
||||
self.write_spec(spec, buf)?;
|
||||
self.write_line_term()?;
|
||||
}
|
||||
}
|
||||
count += 1;
|
||||
@@ -1117,15 +1123,11 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
)?;
|
||||
count += 1;
|
||||
if self.exceeds_max_columns(&bytes[line]) {
|
||||
self.write_exceeded_line()?;
|
||||
self.write_exceeded_line(bytes, line, &[m], &mut 0)?;
|
||||
continue;
|
||||
}
|
||||
if self.has_line_terminator(&bytes[line]) {
|
||||
line = line.with_end(line.end() - 1);
|
||||
}
|
||||
if self.config().trim_ascii {
|
||||
line = self.trim_ascii_prefix_range(bytes, line);
|
||||
}
|
||||
self.trim_line_terminator(bytes, &mut line);
|
||||
self.trim_ascii_prefix(bytes, &mut line);
|
||||
|
||||
while !line.is_empty() {
|
||||
if m.end() <= line.start() {
|
||||
@@ -1182,7 +1184,10 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
line: &[u8],
|
||||
) -> io::Result<()> {
|
||||
if self.exceeds_max_columns(line) {
|
||||
self.write_exceeded_line()?;
|
||||
let range = Match::new(0, line.len());
|
||||
self.write_exceeded_line(
|
||||
line, range, self.sunk.matches(), &mut 0,
|
||||
)?;
|
||||
} else {
|
||||
self.write_trim(line)?;
|
||||
if !self.has_line_terminator(line) {
|
||||
@@ -1195,50 +1200,114 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
fn write_colored_line(
|
||||
&self,
|
||||
matches: &[Match],
|
||||
line: &[u8],
|
||||
bytes: &[u8],
|
||||
) -> io::Result<()> {
|
||||
// If we know we aren't going to emit color, then we can go faster.
|
||||
let spec = self.config().colors.matched();
|
||||
if !self.wtr().borrow().supports_color() || spec.is_none() {
|
||||
return self.write_line(line);
|
||||
}
|
||||
if self.exceeds_max_columns(line) {
|
||||
return self.write_exceeded_line();
|
||||
return self.write_line(bytes);
|
||||
}
|
||||
|
||||
let mut last_written =
|
||||
if !self.config().trim_ascii {
|
||||
0
|
||||
} else {
|
||||
self.trim_ascii_prefix_range(
|
||||
line,
|
||||
Match::new(0, line.len()),
|
||||
).start()
|
||||
};
|
||||
for mut m in matches.iter().map(|&m| m) {
|
||||
if last_written < m.start() {
|
||||
let line = Match::new(0, bytes.len());
|
||||
if self.exceeds_max_columns(bytes) {
|
||||
self.write_exceeded_line(bytes, line, matches, &mut 0)
|
||||
} else {
|
||||
self.write_colored_matches(bytes, line, matches, &mut 0)?;
|
||||
self.write_line_term()?;
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
/// Write the `line` portion of `bytes`, with appropriate coloring for
|
||||
/// each `match`, starting at `match_index`.
|
||||
///
|
||||
/// This accounts for trimming any whitespace prefix and will *never* print
|
||||
/// a line terminator. If a match exceeds the range specified by `line`,
|
||||
/// then only the part of the match within `line` (if any) is printed.
|
||||
fn write_colored_matches(
|
||||
&self,
|
||||
bytes: &[u8],
|
||||
mut line: Match,
|
||||
matches: &[Match],
|
||||
match_index: &mut usize,
|
||||
) -> io::Result<()> {
|
||||
self.trim_line_terminator(bytes, &mut line);
|
||||
self.trim_ascii_prefix(bytes, &mut line);
|
||||
if matches.is_empty() {
|
||||
self.write(&bytes[line])?;
|
||||
return Ok(());
|
||||
}
|
||||
while !line.is_empty() {
|
||||
if matches[*match_index].end() <= line.start() {
|
||||
if *match_index + 1 < matches.len() {
|
||||
*match_index += 1;
|
||||
continue;
|
||||
} else {
|
||||
self.end_color_match()?;
|
||||
self.write(&bytes[line])?;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
let m = matches[*match_index];
|
||||
if line.start() < m.start() {
|
||||
let upto = cmp::min(line.end(), m.start());
|
||||
self.end_color_match()?;
|
||||
self.write(&line[last_written..m.start()])?;
|
||||
} else if last_written < m.end() {
|
||||
m = m.with_start(last_written);
|
||||
self.write(&bytes[line.with_end(upto)])?;
|
||||
line = line.with_start(upto);
|
||||
} else {
|
||||
continue;
|
||||
}
|
||||
if !m.is_empty() {
|
||||
let upto = cmp::min(line.end(), m.end());
|
||||
self.start_color_match()?;
|
||||
self.write(&line[m])?;
|
||||
self.write(&bytes[line.with_end(upto)])?;
|
||||
line = line.with_start(upto);
|
||||
}
|
||||
last_written = m.end();
|
||||
}
|
||||
self.end_color_match()?;
|
||||
self.write(&line[last_written..])?;
|
||||
if !self.has_line_terminator(line) {
|
||||
self.write_line_term()?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn write_exceeded_line(&self) -> io::Result<()> {
|
||||
fn write_exceeded_line(
|
||||
&self,
|
||||
bytes: &[u8],
|
||||
mut line: Match,
|
||||
matches: &[Match],
|
||||
match_index: &mut usize,
|
||||
) -> io::Result<()> {
|
||||
if self.config().max_columns_preview {
|
||||
let original = line;
|
||||
let end = BStr::new(&bytes[line])
|
||||
.grapheme_indices()
|
||||
.map(|(_, end, _)| end)
|
||||
.take(self.config().max_columns.unwrap_or(0) as usize)
|
||||
.last()
|
||||
.unwrap_or(0) + line.start();
|
||||
line = line.with_end(end);
|
||||
self.write_colored_matches(bytes, line, matches, match_index)?;
|
||||
|
||||
if matches.is_empty() {
|
||||
self.write(b" [... omitted end of long line]")?;
|
||||
} else {
|
||||
let remaining = matches
|
||||
.iter()
|
||||
.filter(|m| {
|
||||
m.start() >= line.end() && m.start() < original.end()
|
||||
})
|
||||
.count();
|
||||
let tense =
|
||||
if remaining == 1 {
|
||||
"match"
|
||||
} else {
|
||||
"matches"
|
||||
};
|
||||
write!(
|
||||
self.wtr().borrow_mut(),
|
||||
" [... {} more {}]",
|
||||
remaining, tense,
|
||||
)?;
|
||||
}
|
||||
self.write_line_term()?;
|
||||
return Ok(());
|
||||
}
|
||||
if self.sunk.original_matches().is_empty() {
|
||||
if self.is_context() {
|
||||
self.write(b"[Omitted long context line]")?;
|
||||
@@ -1314,6 +1383,38 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn write_binary_message(&self, offset: u64) -> io::Result<()> {
|
||||
if self.sink.match_count == 0 {
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
let bin = self.searcher.binary_detection();
|
||||
if let Some(byte) = bin.quit_byte() {
|
||||
self.write(b"WARNING: stopped searching binary file ")?;
|
||||
if let Some(path) = self.path() {
|
||||
self.write_spec(self.config().colors.path(), path.as_bytes())?;
|
||||
self.write(b" ")?;
|
||||
}
|
||||
let remainder = format!(
|
||||
"after match (found {:?} byte around offset {})\n",
|
||||
BStr::new(&[byte]), offset,
|
||||
);
|
||||
self.write(remainder.as_bytes())?;
|
||||
} else if let Some(byte) = bin.convert_byte() {
|
||||
self.write(b"Binary file ")?;
|
||||
if let Some(path) = self.path() {
|
||||
self.write_spec(self.config().colors.path(), path.as_bytes())?;
|
||||
self.write(b" ")?;
|
||||
}
|
||||
let remainder = format!(
|
||||
"matches (found {:?} byte around offset {})\n",
|
||||
BStr::new(&[byte]), offset,
|
||||
);
|
||||
self.write(remainder.as_bytes())?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn write_context_separator(&self) -> io::Result<()> {
|
||||
if let Some(ref sep) = *self.config().separator_context {
|
||||
self.write(sep)?;
|
||||
@@ -1389,13 +1490,26 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
if !self.config().trim_ascii {
|
||||
return self.write(buf);
|
||||
}
|
||||
self.write(self.trim_ascii_prefix(buf))
|
||||
let mut range = Match::new(0, buf.len());
|
||||
self.trim_ascii_prefix(buf, &mut range);
|
||||
self.write(&buf[range])
|
||||
}
|
||||
|
||||
fn write(&self, buf: &[u8]) -> io::Result<()> {
|
||||
self.wtr().borrow_mut().write_all(buf)
|
||||
}
|
||||
|
||||
fn trim_line_terminator(&self, buf: &[u8], line: &mut Match) {
|
||||
let lineterm = self.searcher.line_terminator();
|
||||
if lineterm.is_suffix(&buf[*line]) {
|
||||
let mut end = line.end() - 1;
|
||||
if lineterm.is_crlf() && buf[end - 1] == b'\r' {
|
||||
end -= 1;
|
||||
}
|
||||
*line = line.with_end(end);
|
||||
}
|
||||
}
|
||||
|
||||
fn has_line_terminator(&self, buf: &[u8]) -> bool {
|
||||
self.searcher.line_terminator().is_suffix(buf)
|
||||
}
|
||||
@@ -1451,14 +1565,12 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
///
|
||||
/// This stops trimming a prefix as soon as it sees non-whitespace or a
|
||||
/// line terminator.
|
||||
fn trim_ascii_prefix_range(&self, slice: &[u8], range: Match) -> Match {
|
||||
trim_ascii_prefix_range(self.searcher.line_terminator(), slice, range)
|
||||
}
|
||||
|
||||
/// Trim prefix ASCII spaces from the given slice and return the
|
||||
/// corresponding sub-slice.
|
||||
fn trim_ascii_prefix<'s>(&self, slice: &'s [u8]) -> &'s [u8] {
|
||||
trim_ascii_prefix(self.searcher.line_terminator(), slice)
|
||||
fn trim_ascii_prefix(&self, slice: &[u8], range: &mut Match) {
|
||||
if !self.config().trim_ascii {
|
||||
return;
|
||||
}
|
||||
let lineterm = self.searcher.line_terminator();
|
||||
*range = trim_ascii_prefix(lineterm, slice, *range)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -2225,6 +2337,31 @@ but Doctor Watson has to have it taken out for him and dusted,
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_preview() {
|
||||
let matcher = RegexMatcher::new("exhibited|dusted").unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.max_columns(Some(46))
|
||||
.max_columns_preview(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
but Doctor Watson has to have it taken out for [... omitted end of long line]
|
||||
and exhibited clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_with_count() {
|
||||
let matcher = RegexMatcher::new("cigar|ash|dusted").unwrap();
|
||||
@@ -2250,6 +2387,86 @@ but Doctor Watson has to have it taken out for him and dusted,
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_with_count_preview_no_match() {
|
||||
let matcher = RegexMatcher::new("exhibited|has to have it").unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.stats(true)
|
||||
.max_columns(Some(46))
|
||||
.max_columns_preview(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
but Doctor Watson has to have it taken out for [... 0 more matches]
|
||||
and exhibited clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_with_count_preview_one_match() {
|
||||
let matcher = RegexMatcher::new("exhibited|dusted").unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.stats(true)
|
||||
.max_columns(Some(46))
|
||||
.max_columns_preview(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
but Doctor Watson has to have it taken out for [... 1 more match]
|
||||
and exhibited clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_with_count_preview_two_matches() {
|
||||
let matcher = RegexMatcher::new(
|
||||
"exhibited|dusted|has to have it",
|
||||
).unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.stats(true)
|
||||
.max_columns(Some(46))
|
||||
.max_columns_preview(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
but Doctor Watson has to have it taken out for [... 1 more match]
|
||||
and exhibited clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_multi_line() {
|
||||
let matcher = RegexMatcher::new("(?s)ash.+dusted").unwrap();
|
||||
@@ -2275,6 +2492,36 @@ but Doctor Watson has to have it taken out for him and dusted,
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_multi_line_preview() {
|
||||
let matcher = RegexMatcher::new(
|
||||
"(?s)clew|cigar ash.+have it|exhibited",
|
||||
).unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.stats(true)
|
||||
.max_columns(Some(46))
|
||||
.max_columns_preview(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.multi_line(true)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
can extract a clew from a wisp of straw or a f [... 1 more match]
|
||||
but Doctor Watson has to have it taken out for [... 0 more matches]
|
||||
and exhibited clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_matches() {
|
||||
let matcher = RegexMatcher::new("Sherlock").unwrap();
|
||||
@@ -2564,8 +2811,40 @@ Holmeses, success in the province of detective work must always
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn only_matching_max_columns_preview() {
|
||||
let matcher = RegexMatcher::new("Doctor Watsons|Sherlock").unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.only_matching(true)
|
||||
.max_columns(Some(10))
|
||||
.max_columns_preview(true)
|
||||
.column(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(true)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
1:9:Doctor Wat [... 0 more matches]
|
||||
1:57:Sherlock
|
||||
3:49:Sherlock
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn only_matching_max_columns_multi_line1() {
|
||||
// The `(?s:.{0})` trick fools the matcher into thinking that it
|
||||
// can match across multiple lines without actually doing so. This is
|
||||
// so we can test multi-line handling in the case of a match on only
|
||||
// one line.
|
||||
let matcher = RegexMatcher::new(
|
||||
r"(?s:.{0})(Doctor Watsons|Sherlock)"
|
||||
).unwrap();
|
||||
@@ -2594,6 +2873,41 @@ Holmeses, success in the province of detective work must always
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn only_matching_max_columns_preview_multi_line1() {
|
||||
// The `(?s:.{0})` trick fools the matcher into thinking that it
|
||||
// can match across multiple lines without actually doing so. This is
|
||||
// so we can test multi-line handling in the case of a match on only
|
||||
// one line.
|
||||
let matcher = RegexMatcher::new(
|
||||
r"(?s:.{0})(Doctor Watsons|Sherlock)"
|
||||
).unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.only_matching(true)
|
||||
.max_columns(Some(10))
|
||||
.max_columns_preview(true)
|
||||
.column(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.multi_line(true)
|
||||
.line_number(true)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
1:9:Doctor Wat [... 0 more matches]
|
||||
1:57:Sherlock
|
||||
3:49:Sherlock
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn only_matching_max_columns_multi_line2() {
|
||||
let matcher = RegexMatcher::new(
|
||||
@@ -2625,6 +2939,38 @@ Holmeses, success in the province of detective work must always
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn only_matching_max_columns_preview_multi_line2() {
|
||||
let matcher = RegexMatcher::new(
|
||||
r"(?s)Watson.+?(Holmeses|clearly)"
|
||||
).unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.only_matching(true)
|
||||
.max_columns(Some(50))
|
||||
.max_columns_preview(true)
|
||||
.column(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.multi_line(true)
|
||||
.line_number(true)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
1:16:Watsons of this world, as opposed to the Sherlock
|
||||
2:16:Holmeses
|
||||
5:12:Watson has to have it taken out for him and dusted [... 0 more matches]
|
||||
6:12:and exhibited clearly
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn per_match() {
|
||||
let matcher = RegexMatcher::new("Doctor Watsons|Sherlock").unwrap();
|
||||
@@ -2820,6 +3166,61 @@ Holmeses, success in the province of detective work must always
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn replacement_max_columns_preview1() {
|
||||
let matcher = RegexMatcher::new(r"Sherlock|Doctor (\w+)").unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.max_columns(Some(67))
|
||||
.max_columns_preview(true)
|
||||
.replacement(Some(b"doctah $1 MD".to_vec()))
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(true)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
1:For the doctah Watsons MD of this world, as opposed to the doctah [... 0 more matches]
|
||||
3:be, to a very large extent, the result of luck. doctah MD Holmes
|
||||
5:but doctah Watson MD has to have it taken out for him and dusted,
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn replacement_max_columns_preview2() {
|
||||
let matcher = RegexMatcher::new(
|
||||
"exhibited|dusted|has to have it",
|
||||
).unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.max_columns(Some(43))
|
||||
.max_columns_preview(true)
|
||||
.replacement(Some(b"xxx".to_vec()))
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
but Doctor Watson xxx taken out for him and [... 1 more match]
|
||||
and xxx clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn replacement_only_matching() {
|
||||
let matcher = RegexMatcher::new(r"Sherlock|Doctor (\w+)").unwrap();
|
||||
|
@@ -403,7 +403,7 @@ impl<W: WriteColor> Summary<W> {
|
||||
where M: Matcher,
|
||||
P: ?Sized + AsRef<Path>,
|
||||
{
|
||||
if !self.config.path {
|
||||
if !self.config.path && !self.config.kind.requires_path() {
|
||||
return self.sink(matcher);
|
||||
}
|
||||
let stats =
|
||||
@@ -477,7 +477,10 @@ impl<'p, 's, M: Matcher, W: WriteColor> SummarySink<'p, 's, M, W> {
|
||||
/// This is unaffected by the result of searches before the previous
|
||||
/// search.
|
||||
pub fn has_match(&self) -> bool {
|
||||
self.match_count > 0
|
||||
match self.summary.config.kind {
|
||||
SummaryKind::PathWithoutMatch => self.match_count == 0,
|
||||
_ => self.match_count > 0,
|
||||
}
|
||||
}
|
||||
|
||||
/// If binary data was found in the previous search, this returns the
|
||||
@@ -633,6 +636,34 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for SummarySink<'p, 's, M, W> {
|
||||
stats.add_bytes_searched(finish.byte_count());
|
||||
stats.add_bytes_printed(self.summary.wtr.borrow().count());
|
||||
}
|
||||
// If our binary detection method says to quit after seeing binary
|
||||
// data, then we shouldn't print any results at all, even if we've
|
||||
// found a match before detecting binary data. The intent here is to
|
||||
// keep BinaryDetection::quit as a form of filter. Otherwise, we can
|
||||
// present a matching file with a smaller number of matches than
|
||||
// there might be, which can be quite misleading.
|
||||
//
|
||||
// If our binary detection method is to convert binary data, then we
|
||||
// don't quit and therefore search the entire contents of the file.
|
||||
//
|
||||
// There is an unfortunate inconsistency here. Namely, when using
|
||||
// Quiet or PathWithMatch, then the printer can quit after the first
|
||||
// match seen, which could be long before seeing binary data. This
|
||||
// means that using PathWithMatch can print a path where as using
|
||||
// Count might not print it at all because of binary data.
|
||||
//
|
||||
// It's not possible to fix this without also potentially significantly
|
||||
// impacting the performance of Quiet or PathWithMatch, so we accept
|
||||
// the bug.
|
||||
if self.binary_byte_offset.is_some()
|
||||
&& searcher.binary_detection().quit_byte().is_some()
|
||||
{
|
||||
// Squash the match count. The statistics reported will still
|
||||
// contain the match count, but the "official" match count should
|
||||
// be zero.
|
||||
self.match_count = 0;
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
let show_count =
|
||||
!self.summary.config.exclude_zero
|
||||
|
@@ -4,6 +4,7 @@ use std::io;
|
||||
use std::path::Path;
|
||||
use std::time;
|
||||
|
||||
use bstr::{BStr, BString};
|
||||
use grep_matcher::{Captures, LineTerminator, Match, Matcher};
|
||||
use grep_searcher::{
|
||||
LineIter,
|
||||
@@ -262,26 +263,12 @@ impl<'a> Sunk<'a> {
|
||||
/// portability with a small cost: on Windows, paths that are not valid UTF-16
|
||||
/// will not roundtrip correctly.
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct PrinterPath<'a>(Cow<'a, [u8]>);
|
||||
pub struct PrinterPath<'a>(Cow<'a, BStr>);
|
||||
|
||||
impl<'a> PrinterPath<'a> {
|
||||
/// Create a new path suitable for printing.
|
||||
pub fn new(path: &'a Path) -> PrinterPath<'a> {
|
||||
PrinterPath::new_impl(path)
|
||||
}
|
||||
|
||||
#[cfg(unix)]
|
||||
fn new_impl(path: &'a Path) -> PrinterPath<'a> {
|
||||
use std::os::unix::ffi::OsStrExt;
|
||||
PrinterPath(Cow::Borrowed(path.as_os_str().as_bytes()))
|
||||
}
|
||||
|
||||
#[cfg(not(unix))]
|
||||
fn new_impl(path: &'a Path) -> PrinterPath<'a> {
|
||||
PrinterPath(match path.to_string_lossy() {
|
||||
Cow::Owned(path) => Cow::Owned(path.into_bytes()),
|
||||
Cow::Borrowed(path) => Cow::Borrowed(path.as_bytes()),
|
||||
})
|
||||
PrinterPath(BString::from_path_lossy(path))
|
||||
}
|
||||
|
||||
/// Create a new printer path from the given path which can be efficiently
|
||||
@@ -302,7 +289,7 @@ impl<'a> PrinterPath<'a> {
|
||||
/// path separators that are both replaced by `new_sep`. In all other
|
||||
/// environments, only `/` is treated as a path separator.
|
||||
fn replace_separator(&mut self, new_sep: u8) {
|
||||
let transformed_path: Vec<_> = self.as_bytes().iter().map(|&b| {
|
||||
let transformed_path: BString = self.0.bytes().map(|b| {
|
||||
if b == b'/' || (cfg!(windows) && b == b'\\') {
|
||||
new_sep
|
||||
} else {
|
||||
@@ -314,7 +301,7 @@ impl<'a> PrinterPath<'a> {
|
||||
|
||||
/// Return the raw bytes for this path.
|
||||
pub fn as_bytes(&self) -> &[u8] {
|
||||
&*self.0
|
||||
self.0.as_bytes()
|
||||
}
|
||||
}
|
||||
|
||||
@@ -359,7 +346,7 @@ impl Serialize for NiceDuration {
|
||||
///
|
||||
/// This stops trimming a prefix as soon as it sees non-whitespace or a line
|
||||
/// terminator.
|
||||
pub fn trim_ascii_prefix_range(
|
||||
pub fn trim_ascii_prefix(
|
||||
line_term: LineTerminator,
|
||||
slice: &[u8],
|
||||
range: Match,
|
||||
@@ -379,14 +366,3 @@ pub fn trim_ascii_prefix_range(
|
||||
.count();
|
||||
range.with_start(range.start() + count)
|
||||
}
|
||||
|
||||
/// Trim prefix ASCII spaces from the given slice and return the corresponding
|
||||
/// sub-slice.
|
||||
pub fn trim_ascii_prefix(line_term: LineTerminator, slice: &[u8]) -> &[u8] {
|
||||
let range = trim_ascii_prefix_range(
|
||||
line_term,
|
||||
slice,
|
||||
Match::new(0, slice.len()),
|
||||
);
|
||||
&slice[range]
|
||||
}
|
||||
|
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "grep-regex"
|
||||
version = "0.1.1" #:version
|
||||
version = "0.1.3" #:version
|
||||
authors = ["Andrew Gallant <jamslam@gmail.com>"]
|
||||
description = """
|
||||
Use Rust's regex library with the 'grep' crate.
|
||||
@@ -13,9 +13,10 @@ keywords = ["regex", "grep", "search", "pattern", "line"]
|
||||
license = "Unlicense/MIT"
|
||||
|
||||
[dependencies]
|
||||
aho-corasick = "0.7.3"
|
||||
grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
|
||||
log = "0.4.5"
|
||||
grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
|
||||
regex = "1.0.5"
|
||||
regex-syntax = "0.6.2"
|
||||
regex = "1.1"
|
||||
regex-syntax = "0.6.5"
|
||||
thread_local = "0.3.6"
|
||||
utf8-ranges = "1.0.1"
|
||||
|
@@ -1,12 +1,13 @@
|
||||
use grep_matcher::{ByteSet, LineTerminator};
|
||||
use regex::bytes::{Regex, RegexBuilder};
|
||||
use regex_syntax::ast::{self, Ast};
|
||||
use regex_syntax::hir::Hir;
|
||||
use regex_syntax::hir::{self, Hir};
|
||||
|
||||
use ast::AstAnalysis;
|
||||
use crlf::crlfify;
|
||||
use error::Error;
|
||||
use literal::LiteralSets;
|
||||
use multi::alternation_literals;
|
||||
use non_matching::non_matching_bytes;
|
||||
use strip::strip_from_match;
|
||||
|
||||
@@ -67,19 +68,17 @@ impl Config {
|
||||
/// If there was a problem parsing the given expression then an error
|
||||
/// is returned.
|
||||
pub fn hir(&self, pattern: &str) -> Result<ConfiguredHIR, Error> {
|
||||
let analysis = self.analysis(pattern)?;
|
||||
let expr = ::regex_syntax::ParserBuilder::new()
|
||||
.nest_limit(self.nest_limit)
|
||||
.octal(self.octal)
|
||||
let ast = self.ast(pattern)?;
|
||||
let analysis = self.analysis(&ast)?;
|
||||
let expr = hir::translate::TranslatorBuilder::new()
|
||||
.allow_invalid_utf8(true)
|
||||
.ignore_whitespace(self.ignore_whitespace)
|
||||
.case_insensitive(self.is_case_insensitive(&analysis)?)
|
||||
.case_insensitive(self.is_case_insensitive(&analysis))
|
||||
.multi_line(self.multi_line)
|
||||
.dot_matches_new_line(self.dot_matches_new_line)
|
||||
.swap_greed(self.swap_greed)
|
||||
.unicode(self.unicode)
|
||||
.build()
|
||||
.parse(pattern)
|
||||
.translate(pattern, &ast)
|
||||
.map_err(Error::regex)?;
|
||||
let expr = match self.line_terminator {
|
||||
None => expr,
|
||||
@@ -99,21 +98,34 @@ impl Config {
|
||||
fn is_case_insensitive(
|
||||
&self,
|
||||
analysis: &AstAnalysis,
|
||||
) -> Result<bool, Error> {
|
||||
) -> bool {
|
||||
if self.case_insensitive {
|
||||
return Ok(true);
|
||||
return true;
|
||||
}
|
||||
if !self.case_smart {
|
||||
return Ok(false);
|
||||
return false;
|
||||
}
|
||||
Ok(analysis.any_literal() && !analysis.any_uppercase())
|
||||
analysis.any_literal() && !analysis.any_uppercase()
|
||||
}
|
||||
|
||||
/// Returns true if and only if this config is simple enough such that
|
||||
/// if the pattern is a simple alternation of literals, then it can be
|
||||
/// constructed via a plain Aho-Corasick automaton.
|
||||
///
|
||||
/// Note that it is OK to return true even when settings like `multi_line`
|
||||
/// are enabled, since if multi-line can impact the match semantics of a
|
||||
/// regex, then it is by definition not a simple alternation of literals.
|
||||
pub fn can_plain_aho_corasick(&self) -> bool {
|
||||
!self.word
|
||||
&& !self.case_insensitive
|
||||
&& !self.case_smart
|
||||
}
|
||||
|
||||
/// Perform analysis on the AST of this pattern.
|
||||
///
|
||||
/// This returns an error if the given pattern failed to parse.
|
||||
fn analysis(&self, pattern: &str) -> Result<AstAnalysis, Error> {
|
||||
Ok(AstAnalysis::from_ast(&self.ast(pattern)?))
|
||||
fn analysis(&self, ast: &Ast) -> Result<AstAnalysis, Error> {
|
||||
Ok(AstAnalysis::from_ast(ast))
|
||||
}
|
||||
|
||||
/// Parse the given pattern into its abstract syntax.
|
||||
@@ -160,11 +172,28 @@ impl ConfiguredHIR {
|
||||
non_matching_bytes(&self.expr)
|
||||
}
|
||||
|
||||
/// Returns true if and only if this regex needs to have its match offsets
|
||||
/// tweaked because of CRLF support. Specifically, this occurs when the
|
||||
/// CRLF hack is enabled and the regex is line anchored at the end. In
|
||||
/// this case, matches that end with a `\r` have the `\r` stripped.
|
||||
pub fn needs_crlf_stripped(&self) -> bool {
|
||||
self.config.crlf && self.expr.is_line_anchored_end()
|
||||
}
|
||||
|
||||
/// Builds a regular expression from this HIR expression.
|
||||
pub fn regex(&self) -> Result<Regex, Error> {
|
||||
self.pattern_to_regex(&self.expr.to_string())
|
||||
}
|
||||
|
||||
/// If this HIR corresponds to an alternation of literals with no
|
||||
/// capturing groups, then this returns those literals.
|
||||
pub fn alternation_literals(&self) -> Option<Vec<Vec<u8>>> {
|
||||
if !self.config.can_plain_aho_corasick() {
|
||||
return None;
|
||||
}
|
||||
alternation_literals(&self.expr)
|
||||
}
|
||||
|
||||
/// Applies the given function to the concrete syntax of this HIR and then
|
||||
/// generates a new HIR based on the result of the function in a way that
|
||||
/// preserves the configuration.
|
||||
@@ -199,7 +228,7 @@ impl ConfiguredHIR {
|
||||
if self.config.line_terminator.is_none() {
|
||||
return Ok(None);
|
||||
}
|
||||
match LiteralSets::new(&self.expr).one_regex() {
|
||||
match LiteralSets::new(&self.expr).one_regex(self.config.word) {
|
||||
None => Ok(None),
|
||||
Some(pattern) => self.pattern_to_regex(&pattern).map(Some),
|
||||
}
|
||||
|
@@ -1,5 +1,112 @@
|
||||
use std::collections::HashMap;
|
||||
|
||||
use grep_matcher::{Match, Matcher, NoError};
|
||||
use regex::bytes::Regex;
|
||||
use regex_syntax::hir::{self, Hir, HirKind};
|
||||
|
||||
use config::ConfiguredHIR;
|
||||
use error::Error;
|
||||
use matcher::RegexCaptures;
|
||||
|
||||
/// A matcher for implementing "word match" semantics.
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct CRLFMatcher {
|
||||
/// The regex.
|
||||
regex: Regex,
|
||||
/// A map from capture group name to capture group index.
|
||||
names: HashMap<String, usize>,
|
||||
}
|
||||
|
||||
impl CRLFMatcher {
|
||||
/// Create a new matcher from the given pattern that strips `\r` from the
|
||||
/// end of every match.
|
||||
///
|
||||
/// This panics if the given expression doesn't need its CRLF stripped.
|
||||
pub fn new(expr: &ConfiguredHIR) -> Result<CRLFMatcher, Error> {
|
||||
assert!(expr.needs_crlf_stripped());
|
||||
|
||||
let regex = expr.regex()?;
|
||||
let mut names = HashMap::new();
|
||||
for (i, optional_name) in regex.capture_names().enumerate() {
|
||||
if let Some(name) = optional_name {
|
||||
names.insert(name.to_string(), i.checked_sub(1).unwrap());
|
||||
}
|
||||
}
|
||||
Ok(CRLFMatcher { regex, names })
|
||||
}
|
||||
|
||||
/// Return the underlying regex used by this matcher.
|
||||
pub fn regex(&self) -> &Regex {
|
||||
&self.regex
|
||||
}
|
||||
}
|
||||
|
||||
impl Matcher for CRLFMatcher {
|
||||
type Captures = RegexCaptures;
|
||||
type Error = NoError;
|
||||
|
||||
fn find_at(
|
||||
&self,
|
||||
haystack: &[u8],
|
||||
at: usize,
|
||||
) -> Result<Option<Match>, NoError> {
|
||||
let m = match self.regex.find_at(haystack, at) {
|
||||
None => return Ok(None),
|
||||
Some(m) => Match::new(m.start(), m.end()),
|
||||
};
|
||||
Ok(Some(adjust_match(haystack, m)))
|
||||
}
|
||||
|
||||
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
|
||||
Ok(RegexCaptures::new(self.regex.capture_locations()))
|
||||
}
|
||||
|
||||
fn capture_count(&self) -> usize {
|
||||
self.regex.captures_len().checked_sub(1).unwrap()
|
||||
}
|
||||
|
||||
fn capture_index(&self, name: &str) -> Option<usize> {
|
||||
self.names.get(name).map(|i| *i)
|
||||
}
|
||||
|
||||
fn captures_at(
|
||||
&self,
|
||||
haystack: &[u8],
|
||||
at: usize,
|
||||
caps: &mut RegexCaptures,
|
||||
) -> Result<bool, NoError> {
|
||||
caps.strip_crlf(false);
|
||||
let r = self.regex.captures_read_at(
|
||||
caps.locations_mut(), haystack, at,
|
||||
);
|
||||
if !r.is_some() {
|
||||
return Ok(false);
|
||||
}
|
||||
|
||||
// If the end of our match includes a `\r`, then strip it from all
|
||||
// capture groups ending at the same location.
|
||||
let end = caps.locations().get(0).unwrap().1;
|
||||
if end > 0 && haystack.get(end - 1) == Some(&b'\r') {
|
||||
caps.strip_crlf(true);
|
||||
}
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
// We specifically do not implement other methods like find_iter or
|
||||
// captures_iter. Namely, the iter methods are guaranteed to be correct
|
||||
// by virtue of implementing find_at and captures_at above.
|
||||
}
|
||||
|
||||
/// If the given match ends with a `\r`, then return a new match that ends
|
||||
/// immediately before the `\r`.
|
||||
pub fn adjust_match(haystack: &[u8], m: Match) -> Match {
|
||||
if m.end() > 0 && haystack.get(m.end() - 1) == Some(&b'\r') {
|
||||
m.with_end(m.end() - 1)
|
||||
} else {
|
||||
m
|
||||
}
|
||||
}
|
||||
|
||||
/// Substitutes all occurrences of multi-line enabled `$` with `(?:\r?$)`.
|
||||
///
|
||||
/// This does not preserve the exact semantics of the given expression,
|
||||
|
@@ -4,6 +4,7 @@ An implementation of `grep-matcher`'s `Matcher` trait for Rust's regex engine.
|
||||
|
||||
#![deny(missing_docs)]
|
||||
|
||||
extern crate aho_corasick;
|
||||
extern crate grep_matcher;
|
||||
#[macro_use]
|
||||
extern crate log;
|
||||
@@ -21,6 +22,7 @@ mod crlf;
|
||||
mod error;
|
||||
mod literal;
|
||||
mod matcher;
|
||||
mod multi;
|
||||
mod non_matching;
|
||||
mod strip;
|
||||
mod util;
|
||||
|
@@ -47,18 +47,23 @@ impl LiteralSets {
|
||||
/// generated these literal sets. The idea here is that the pattern
|
||||
/// returned by this method is much cheaper to search for. i.e., It is
|
||||
/// usually a single literal or an alternation of literals.
|
||||
pub fn one_regex(&self) -> Option<String> {
|
||||
pub fn one_regex(&self, word: bool) -> Option<String> {
|
||||
// TODO: The logic in this function is basically inscrutable. It grew
|
||||
// organically in the old grep 0.1 crate. Ideally, it would be
|
||||
// re-worked. In fact, the entire inner literal extraction should be
|
||||
// re-worked. Actually, most of regex-syntax's literal extraction
|
||||
// should also be re-worked. Alas... only so much time in the day.
|
||||
|
||||
if self.prefixes.all_complete() && !self.prefixes.is_empty() {
|
||||
debug!("literal prefixes detected: {:?}", self.prefixes);
|
||||
// When this is true, the regex engine will do a literal scan,
|
||||
// so we don't need to return anything.
|
||||
return None;
|
||||
if !word {
|
||||
if self.prefixes.all_complete() && !self.prefixes.is_empty() {
|
||||
debug!("literal prefixes detected: {:?}", self.prefixes);
|
||||
// When this is true, the regex engine will do a literal scan,
|
||||
// so we don't need to return anything. But we only do this
|
||||
// if we aren't doing a word regex, since a word regex adds
|
||||
// a `(?:\W|^)` to the beginning of the regex, thereby
|
||||
// defeating the regex engine's literal detection.
|
||||
return None;
|
||||
}
|
||||
}
|
||||
|
||||
// Out of inner required literals, prefixes and suffixes, which one
|
||||
@@ -166,10 +171,10 @@ fn union_required(expr: &Hir, lits: &mut Literals) {
|
||||
lits.cut();
|
||||
continue;
|
||||
}
|
||||
if lits2.contains_empty() {
|
||||
if lits2.contains_empty() || !is_simple(&e) {
|
||||
lits.cut();
|
||||
}
|
||||
if !lits.cross_product(&lits2) {
|
||||
if !lits.cross_product(&lits2) || !lits2.any_complete() {
|
||||
// If this expression couldn't yield any literal that
|
||||
// could be extended, then we need to quit. Since we're
|
||||
// short-circuiting, we also need to freeze every member.
|
||||
@@ -250,6 +255,20 @@ fn alternate_literals<F: FnMut(&Hir, &mut Literals)>(
|
||||
}
|
||||
}
|
||||
|
||||
fn is_simple(expr: &Hir) -> bool {
|
||||
match *expr.kind() {
|
||||
HirKind::Empty
|
||||
| HirKind::Literal(_)
|
||||
| HirKind::Class(_)
|
||||
| HirKind::Repetition(_)
|
||||
| HirKind::Concat(_)
|
||||
| HirKind::Alternation(_) => true,
|
||||
HirKind::Anchor(_)
|
||||
| HirKind::WordBoundary(_)
|
||||
| HirKind::Group(_) => false,
|
||||
}
|
||||
}
|
||||
|
||||
/// Return the number of characters in the given class.
|
||||
fn count_unicode_class(cls: &hir::ClassUnicode) -> u32 {
|
||||
cls.iter().map(|r| 1 + (r.end() as u32 - r.start() as u32)).sum()
|
||||
@@ -271,7 +290,7 @@ mod tests {
|
||||
}
|
||||
|
||||
fn one_regex(pattern: &str) -> Option<String> {
|
||||
sets(pattern).one_regex()
|
||||
sets(pattern).one_regex(false)
|
||||
}
|
||||
|
||||
// Put a pattern into the same format as the one returned by `one_regex`.
|
||||
@@ -301,4 +320,12 @@ mod tests {
|
||||
// assert_eq!(one_regex(r"\w(foo|bar|baz)"), pat("foo|bar|baz"));
|
||||
// assert_eq!(one_regex(r"\w(foo|bar|baz)\w"), pat("foo|bar|baz"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn regression_1064() {
|
||||
// Regression from:
|
||||
// https://github.com/BurntSushi/ripgrep/issues/1064
|
||||
// assert_eq!(one_regex(r"a.*c"), pat("a"));
|
||||
assert_eq!(one_regex(r"a(.*c)"), pat("a"));
|
||||
}
|
||||
}
|
||||
|
@@ -6,7 +6,9 @@ use grep_matcher::{
|
||||
use regex::bytes::{CaptureLocations, Regex};
|
||||
|
||||
use config::{Config, ConfiguredHIR};
|
||||
use crlf::CRLFMatcher;
|
||||
use error::Error;
|
||||
use multi::MultiLiteralMatcher;
|
||||
use word::WordMatcher;
|
||||
|
||||
/// A builder for constructing a `Matcher` using regular expressions.
|
||||
@@ -49,14 +51,40 @@ impl RegexMatcherBuilder {
|
||||
if let Some(ref re) = fast_line_regex {
|
||||
trace!("extracted fast line regex: {:?}", re);
|
||||
}
|
||||
|
||||
let matcher = RegexMatcherImpl::new(&chir)?;
|
||||
trace!("final regex: {:?}", matcher.regex());
|
||||
Ok(RegexMatcher {
|
||||
config: self.config.clone(),
|
||||
matcher: RegexMatcherImpl::new(&chir)?,
|
||||
matcher: matcher,
|
||||
fast_line_regex: fast_line_regex,
|
||||
non_matching_bytes: non_matching_bytes,
|
||||
})
|
||||
}
|
||||
|
||||
/// Build a new matcher from a plain alternation of literals.
|
||||
///
|
||||
/// Depending on the configuration set by the builder, this may be able to
|
||||
/// build a matcher substantially faster than by joining the patterns with
|
||||
/// a `|` and calling `build`.
|
||||
pub fn build_literals<B: AsRef<str>>(
|
||||
&self,
|
||||
literals: &[B],
|
||||
) -> Result<RegexMatcher, Error> {
|
||||
let slices: Vec<_> = literals.iter().map(|s| s.as_ref()).collect();
|
||||
if !self.config.can_plain_aho_corasick() || literals.len() < 40 {
|
||||
return self.build(&slices.join("|"));
|
||||
}
|
||||
let matcher = MultiLiteralMatcher::new(&slices)?;
|
||||
let imp = RegexMatcherImpl::MultiLiteral(matcher);
|
||||
Ok(RegexMatcher {
|
||||
config: self.config.clone(),
|
||||
matcher: imp,
|
||||
fast_line_regex: None,
|
||||
non_matching_bytes: ByteSet::empty(),
|
||||
})
|
||||
}
|
||||
|
||||
/// Set the value for the case insensitive (`i`) flag.
|
||||
///
|
||||
/// When enabled, letters in the pattern will match both upper case and
|
||||
@@ -344,6 +372,13 @@ impl RegexMatcher {
|
||||
enum RegexMatcherImpl {
|
||||
/// The standard matcher used for all regular expressions.
|
||||
Standard(StandardMatcher),
|
||||
/// A matcher for an alternation of plain literals.
|
||||
MultiLiteral(MultiLiteralMatcher),
|
||||
/// A matcher that strips `\r` from the end of matches.
|
||||
///
|
||||
/// This is only used when the CRLF hack is enabled and the regex is line
|
||||
/// anchored at the end.
|
||||
CRLF(CRLFMatcher),
|
||||
/// A matcher that only matches at word boundaries. This transforms the
|
||||
/// regex to `(^|\W)(...)($|\W)` instead of the more intuitive `\b(...)\b`.
|
||||
/// Because of this, the WordMatcher provides its own implementation of
|
||||
@@ -358,10 +393,28 @@ impl RegexMatcherImpl {
|
||||
fn new(expr: &ConfiguredHIR) -> Result<RegexMatcherImpl, Error> {
|
||||
if expr.config().word {
|
||||
Ok(RegexMatcherImpl::Word(WordMatcher::new(expr)?))
|
||||
} else if expr.needs_crlf_stripped() {
|
||||
Ok(RegexMatcherImpl::CRLF(CRLFMatcher::new(expr)?))
|
||||
} else {
|
||||
if let Some(lits) = expr.alternation_literals() {
|
||||
if lits.len() >= 40 {
|
||||
let matcher = MultiLiteralMatcher::new(&lits)?;
|
||||
return Ok(RegexMatcherImpl::MultiLiteral(matcher));
|
||||
}
|
||||
}
|
||||
Ok(RegexMatcherImpl::Standard(StandardMatcher::new(expr)?))
|
||||
}
|
||||
}
|
||||
|
||||
/// Return the underlying regex object used.
|
||||
fn regex(&self) -> String {
|
||||
match *self {
|
||||
RegexMatcherImpl::Word(ref x) => x.regex().to_string(),
|
||||
RegexMatcherImpl::CRLF(ref x) => x.regex().to_string(),
|
||||
RegexMatcherImpl::MultiLiteral(_) => "<N/A>".to_string(),
|
||||
RegexMatcherImpl::Standard(ref x) => x.regex.to_string(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// This implementation just dispatches on the internal matcher impl except
|
||||
@@ -379,6 +432,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.find_at(haystack, at),
|
||||
MultiLiteral(ref m) => m.find_at(haystack, at),
|
||||
CRLF(ref m) => m.find_at(haystack, at),
|
||||
Word(ref m) => m.find_at(haystack, at),
|
||||
}
|
||||
}
|
||||
@@ -387,6 +442,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.new_captures(),
|
||||
MultiLiteral(ref m) => m.new_captures(),
|
||||
CRLF(ref m) => m.new_captures(),
|
||||
Word(ref m) => m.new_captures(),
|
||||
}
|
||||
}
|
||||
@@ -395,6 +452,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.capture_count(),
|
||||
MultiLiteral(ref m) => m.capture_count(),
|
||||
CRLF(ref m) => m.capture_count(),
|
||||
Word(ref m) => m.capture_count(),
|
||||
}
|
||||
}
|
||||
@@ -403,6 +462,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.capture_index(name),
|
||||
MultiLiteral(ref m) => m.capture_index(name),
|
||||
CRLF(ref m) => m.capture_index(name),
|
||||
Word(ref m) => m.capture_index(name),
|
||||
}
|
||||
}
|
||||
@@ -411,6 +472,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.find(haystack),
|
||||
MultiLiteral(ref m) => m.find(haystack),
|
||||
CRLF(ref m) => m.find(haystack),
|
||||
Word(ref m) => m.find(haystack),
|
||||
}
|
||||
}
|
||||
@@ -425,6 +488,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.find_iter(haystack, matched),
|
||||
MultiLiteral(ref m) => m.find_iter(haystack, matched),
|
||||
CRLF(ref m) => m.find_iter(haystack, matched),
|
||||
Word(ref m) => m.find_iter(haystack, matched),
|
||||
}
|
||||
}
|
||||
@@ -439,6 +504,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.try_find_iter(haystack, matched),
|
||||
MultiLiteral(ref m) => m.try_find_iter(haystack, matched),
|
||||
CRLF(ref m) => m.try_find_iter(haystack, matched),
|
||||
Word(ref m) => m.try_find_iter(haystack, matched),
|
||||
}
|
||||
}
|
||||
@@ -451,6 +518,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.captures(haystack, caps),
|
||||
MultiLiteral(ref m) => m.captures(haystack, caps),
|
||||
CRLF(ref m) => m.captures(haystack, caps),
|
||||
Word(ref m) => m.captures(haystack, caps),
|
||||
}
|
||||
}
|
||||
@@ -466,6 +535,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.captures_iter(haystack, caps, matched),
|
||||
MultiLiteral(ref m) => m.captures_iter(haystack, caps, matched),
|
||||
CRLF(ref m) => m.captures_iter(haystack, caps, matched),
|
||||
Word(ref m) => m.captures_iter(haystack, caps, matched),
|
||||
}
|
||||
}
|
||||
@@ -481,6 +552,10 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.try_captures_iter(haystack, caps, matched),
|
||||
MultiLiteral(ref m) => {
|
||||
m.try_captures_iter(haystack, caps, matched)
|
||||
}
|
||||
CRLF(ref m) => m.try_captures_iter(haystack, caps, matched),
|
||||
Word(ref m) => m.try_captures_iter(haystack, caps, matched),
|
||||
}
|
||||
}
|
||||
@@ -494,6 +569,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.captures_at(haystack, at, caps),
|
||||
MultiLiteral(ref m) => m.captures_at(haystack, at, caps),
|
||||
CRLF(ref m) => m.captures_at(haystack, at, caps),
|
||||
Word(ref m) => m.captures_at(haystack, at, caps),
|
||||
}
|
||||
}
|
||||
@@ -509,6 +586,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.replace(haystack, dst, append),
|
||||
MultiLiteral(ref m) => m.replace(haystack, dst, append),
|
||||
CRLF(ref m) => m.replace(haystack, dst, append),
|
||||
Word(ref m) => m.replace(haystack, dst, append),
|
||||
}
|
||||
}
|
||||
@@ -527,6 +606,12 @@ impl Matcher for RegexMatcher {
|
||||
Standard(ref m) => {
|
||||
m.replace_with_captures(haystack, caps, dst, append)
|
||||
}
|
||||
MultiLiteral(ref m) => {
|
||||
m.replace_with_captures(haystack, caps, dst, append)
|
||||
}
|
||||
CRLF(ref m) => {
|
||||
m.replace_with_captures(haystack, caps, dst, append)
|
||||
}
|
||||
Word(ref m) => {
|
||||
m.replace_with_captures(haystack, caps, dst, append)
|
||||
}
|
||||
@@ -537,6 +622,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.is_match(haystack),
|
||||
MultiLiteral(ref m) => m.is_match(haystack),
|
||||
CRLF(ref m) => m.is_match(haystack),
|
||||
Word(ref m) => m.is_match(haystack),
|
||||
}
|
||||
}
|
||||
@@ -549,6 +636,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.is_match_at(haystack, at),
|
||||
MultiLiteral(ref m) => m.is_match_at(haystack, at),
|
||||
CRLF(ref m) => m.is_match_at(haystack, at),
|
||||
Word(ref m) => m.is_match_at(haystack, at),
|
||||
}
|
||||
}
|
||||
@@ -560,6 +649,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.shortest_match(haystack),
|
||||
MultiLiteral(ref m) => m.shortest_match(haystack),
|
||||
CRLF(ref m) => m.shortest_match(haystack),
|
||||
Word(ref m) => m.shortest_match(haystack),
|
||||
}
|
||||
}
|
||||
@@ -572,6 +663,8 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.shortest_match_at(haystack, at),
|
||||
MultiLiteral(ref m) => m.shortest_match_at(haystack, at),
|
||||
CRLF(ref m) => m.shortest_match_at(haystack, at),
|
||||
Word(ref m) => m.shortest_match_at(haystack, at),
|
||||
}
|
||||
}
|
||||
@@ -671,7 +764,9 @@ impl Matcher for StandardMatcher {
|
||||
at: usize,
|
||||
caps: &mut RegexCaptures,
|
||||
) -> Result<bool, NoError> {
|
||||
Ok(self.regex.captures_read_at(&mut caps.locs, haystack, at).is_some())
|
||||
Ok(self.regex.captures_read_at(
|
||||
&mut caps.locations_mut(), haystack, at,
|
||||
).is_some())
|
||||
}
|
||||
|
||||
fn shortest_match_at(
|
||||
@@ -698,34 +793,84 @@ impl Matcher for StandardMatcher {
|
||||
/// index of the group using the corresponding matcher's `capture_index`
|
||||
/// method, and then use that index with `RegexCaptures::get`.
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct RegexCaptures {
|
||||
/// Where the locations are stored.
|
||||
locs: CaptureLocations,
|
||||
/// These captures behave as if the capturing groups begin at the given
|
||||
/// offset. When set to `0`, this has no affect and capture groups are
|
||||
/// indexed like normal.
|
||||
///
|
||||
/// This is useful when building matchers that wrap arbitrary regular
|
||||
/// expressions. For example, `WordMatcher` takes an existing regex `re`
|
||||
/// and creates `(?:^|\W)(re)(?:$|\W)`, but hides the fact that the regex
|
||||
/// has been wrapped from the caller. In order to do this, the matcher
|
||||
/// and the capturing groups must behave as if `(re)` is the `0`th capture
|
||||
/// group.
|
||||
offset: usize,
|
||||
pub struct RegexCaptures(RegexCapturesImp);
|
||||
|
||||
#[derive(Clone, Debug)]
|
||||
enum RegexCapturesImp {
|
||||
AhoCorasick {
|
||||
/// The start and end of the match, corresponding to capture group 0.
|
||||
mat: Option<Match>,
|
||||
},
|
||||
Regex {
|
||||
/// Where the locations are stored.
|
||||
locs: CaptureLocations,
|
||||
/// These captures behave as if the capturing groups begin at the given
|
||||
/// offset. When set to `0`, this has no affect and capture groups are
|
||||
/// indexed like normal.
|
||||
///
|
||||
/// This is useful when building matchers that wrap arbitrary regular
|
||||
/// expressions. For example, `WordMatcher` takes an existing regex
|
||||
/// `re` and creates `(?:^|\W)(re)(?:$|\W)`, but hides the fact that
|
||||
/// the regex has been wrapped from the caller. In order to do this,
|
||||
/// the matcher and the capturing groups must behave as if `(re)` is
|
||||
/// the `0`th capture group.
|
||||
offset: usize,
|
||||
/// When enable, the end of a match has `\r` stripped from it, if one
|
||||
/// exists.
|
||||
strip_crlf: bool,
|
||||
},
|
||||
}
|
||||
|
||||
impl Captures for RegexCaptures {
|
||||
fn len(&self) -> usize {
|
||||
self.locs.len().checked_sub(self.offset).unwrap()
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { .. } => 1,
|
||||
RegexCapturesImp::Regex { ref locs, offset, .. } => {
|
||||
locs.len().checked_sub(offset).unwrap()
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn get(&self, i: usize) -> Option<Match> {
|
||||
let actual = i.checked_add(self.offset).unwrap();
|
||||
self.locs.pos(actual).map(|(s, e)| Match::new(s, e))
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { mat, .. } => {
|
||||
if i == 0 {
|
||||
mat
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
RegexCapturesImp::Regex { ref locs, offset, strip_crlf } => {
|
||||
if !strip_crlf {
|
||||
let actual = i.checked_add(offset).unwrap();
|
||||
return locs.pos(actual).map(|(s, e)| Match::new(s, e));
|
||||
}
|
||||
|
||||
// currently don't support capture offsetting with CRLF
|
||||
// stripping
|
||||
assert_eq!(offset, 0);
|
||||
let m = match locs.pos(i).map(|(s, e)| Match::new(s, e)) {
|
||||
None => return None,
|
||||
Some(m) => m,
|
||||
};
|
||||
// If the end position of this match corresponds to the end
|
||||
// position of the overall match, then we apply our CRLF
|
||||
// stripping. Otherwise, we cannot assume stripping is correct.
|
||||
if i == 0 || m.end() == locs.pos(0).unwrap().1 {
|
||||
Some(m.with_end(m.end() - 1))
|
||||
} else {
|
||||
Some(m)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl RegexCaptures {
|
||||
pub(crate) fn simple() -> RegexCaptures {
|
||||
RegexCaptures(RegexCapturesImp::AhoCorasick { mat: None })
|
||||
}
|
||||
|
||||
pub(crate) fn new(locs: CaptureLocations) -> RegexCaptures {
|
||||
RegexCaptures::with_offset(locs, 0)
|
||||
}
|
||||
@@ -734,11 +879,53 @@ impl RegexCaptures {
|
||||
locs: CaptureLocations,
|
||||
offset: usize,
|
||||
) -> RegexCaptures {
|
||||
RegexCaptures { locs, offset }
|
||||
RegexCaptures(RegexCapturesImp::Regex {
|
||||
locs, offset, strip_crlf: false,
|
||||
})
|
||||
}
|
||||
|
||||
pub(crate) fn locations(&mut self) -> &mut CaptureLocations {
|
||||
&mut self.locs
|
||||
pub(crate) fn locations(&self) -> &CaptureLocations {
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { .. } => {
|
||||
panic!("getting locations for simple captures is invalid")
|
||||
}
|
||||
RegexCapturesImp::Regex { ref locs, .. } => {
|
||||
locs
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub(crate) fn locations_mut(&mut self) -> &mut CaptureLocations {
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { .. } => {
|
||||
panic!("getting locations for simple captures is invalid")
|
||||
}
|
||||
RegexCapturesImp::Regex { ref mut locs, .. } => {
|
||||
locs
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub(crate) fn strip_crlf(&mut self, yes: bool) {
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { .. } => {
|
||||
panic!("setting strip_crlf for simple captures is invalid")
|
||||
}
|
||||
RegexCapturesImp::Regex { ref mut strip_crlf, .. } => {
|
||||
*strip_crlf = yes;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub(crate) fn set_simple(&mut self, one: Option<Match>) {
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { ref mut mat } => {
|
||||
*mat = one;
|
||||
}
|
||||
RegexCapturesImp::Regex { .. } => {
|
||||
panic!("setting simple captures for regex is invalid")
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
127
grep-regex/src/multi.rs
Normal file
127
grep-regex/src/multi.rs
Normal file
@@ -0,0 +1,127 @@
|
||||
use aho_corasick::{AhoCorasick, AhoCorasickBuilder, MatchKind};
|
||||
use grep_matcher::{Matcher, Match, NoError};
|
||||
use regex_syntax::hir::Hir;
|
||||
|
||||
use error::Error;
|
||||
use matcher::RegexCaptures;
|
||||
|
||||
/// A matcher for an alternation of literals.
|
||||
///
|
||||
/// Ideally, this optimization would be pushed down into the regex engine, but
|
||||
/// making this work correctly there would require quite a bit of refactoring.
|
||||
/// Moreover, doing it one layer above lets us do thing like, "if we
|
||||
/// specifically only want to search for literals, then don't bother with
|
||||
/// regex parsing at all."
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct MultiLiteralMatcher {
|
||||
/// The Aho-Corasick automaton.
|
||||
ac: AhoCorasick,
|
||||
}
|
||||
|
||||
impl MultiLiteralMatcher {
|
||||
/// Create a new multi-literal matcher from the given literals.
|
||||
pub fn new<B: AsRef<[u8]>>(
|
||||
literals: &[B],
|
||||
) -> Result<MultiLiteralMatcher, Error> {
|
||||
let ac = AhoCorasickBuilder::new()
|
||||
.match_kind(MatchKind::LeftmostFirst)
|
||||
.auto_configure(literals)
|
||||
.build_with_size::<usize, _, _>(literals)
|
||||
.map_err(Error::regex)?;
|
||||
Ok(MultiLiteralMatcher { ac })
|
||||
}
|
||||
}
|
||||
|
||||
impl Matcher for MultiLiteralMatcher {
|
||||
type Captures = RegexCaptures;
|
||||
type Error = NoError;
|
||||
|
||||
fn find_at(
|
||||
&self,
|
||||
haystack: &[u8],
|
||||
at: usize,
|
||||
) -> Result<Option<Match>, NoError> {
|
||||
match self.ac.find(&haystack[at..]) {
|
||||
None => Ok(None),
|
||||
Some(m) => Ok(Some(Match::new(at + m.start(), at + m.end()))),
|
||||
}
|
||||
}
|
||||
|
||||
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
|
||||
Ok(RegexCaptures::simple())
|
||||
}
|
||||
|
||||
fn capture_count(&self) -> usize {
|
||||
1
|
||||
}
|
||||
|
||||
fn capture_index(&self, _: &str) -> Option<usize> {
|
||||
None
|
||||
}
|
||||
|
||||
fn captures_at(
|
||||
&self,
|
||||
haystack: &[u8],
|
||||
at: usize,
|
||||
caps: &mut RegexCaptures,
|
||||
) -> Result<bool, NoError> {
|
||||
caps.set_simple(None);
|
||||
let mat = self.find_at(haystack, at)?;
|
||||
caps.set_simple(mat);
|
||||
Ok(mat.is_some())
|
||||
}
|
||||
|
||||
// We specifically do not implement other methods like find_iter. Namely,
|
||||
// the iter methods are guaranteed to be correct by virtue of implementing
|
||||
// find_at above.
|
||||
}
|
||||
|
||||
/// Alternation literals checks if the given HIR is a simple alternation of
|
||||
/// literals, and if so, returns them. Otherwise, this returns None.
|
||||
pub fn alternation_literals(expr: &Hir) -> Option<Vec<Vec<u8>>> {
|
||||
use regex_syntax::hir::{HirKind, Literal};
|
||||
|
||||
// This is pretty hacky, but basically, if `is_alternation_literal` is
|
||||
// true, then we can make several assumptions about the structure of our
|
||||
// HIR. This is what justifies the `unreachable!` statements below.
|
||||
|
||||
if !expr.is_alternation_literal() {
|
||||
return None;
|
||||
}
|
||||
let alts = match *expr.kind() {
|
||||
HirKind::Alternation(ref alts) => alts,
|
||||
_ => return None, // one literal isn't worth it
|
||||
};
|
||||
|
||||
let extendlit = |lit: &Literal, dst: &mut Vec<u8>| {
|
||||
match *lit {
|
||||
Literal::Unicode(c) => {
|
||||
let mut buf = [0; 4];
|
||||
dst.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
|
||||
}
|
||||
Literal::Byte(b) => {
|
||||
dst.push(b);
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
let mut lits = vec![];
|
||||
for alt in alts {
|
||||
let mut lit = vec![];
|
||||
match *alt.kind() {
|
||||
HirKind::Empty => {}
|
||||
HirKind::Literal(ref x) => extendlit(x, &mut lit),
|
||||
HirKind::Concat(ref exprs) => {
|
||||
for e in exprs {
|
||||
match *e.kind() {
|
||||
HirKind::Literal(ref x) => extendlit(x, &mut lit),
|
||||
_ => unreachable!("expected literal, got {:?}", e),
|
||||
}
|
||||
}
|
||||
}
|
||||
_ => unreachable!("expected literal or concat, got {:?}", alt),
|
||||
}
|
||||
lits.push(lit);
|
||||
}
|
||||
Some(lits)
|
||||
}
|
@@ -55,6 +55,11 @@ impl WordMatcher {
|
||||
}
|
||||
Ok(WordMatcher { regex, names, locs })
|
||||
}
|
||||
|
||||
/// Return the underlying regex used by this matcher.
|
||||
pub fn regex(&self) -> &Regex {
|
||||
&self.regex
|
||||
}
|
||||
}
|
||||
|
||||
impl Matcher for WordMatcher {
|
||||
@@ -98,7 +103,9 @@ impl Matcher for WordMatcher {
|
||||
at: usize,
|
||||
caps: &mut RegexCaptures,
|
||||
) -> Result<bool, NoError> {
|
||||
let r = self.regex.captures_read_at(caps.locations(), haystack, at);
|
||||
let r = self.regex.captures_read_at(
|
||||
caps.locations_mut(), haystack, at,
|
||||
);
|
||||
Ok(r.is_some())
|
||||
}
|
||||
|
||||
|
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "grep-searcher"
|
||||
version = "0.1.1" #:version
|
||||
version = "0.1.4" #:version
|
||||
authors = ["Andrew Gallant <jamslam@gmail.com>"]
|
||||
description = """
|
||||
Fast line oriented regex searching as a library.
|
||||
@@ -13,23 +13,21 @@ keywords = ["regex", "grep", "egrep", "search", "pattern"]
|
||||
license = "Unlicense/MIT"
|
||||
|
||||
[dependencies]
|
||||
bytecount = "0.3.2"
|
||||
encoding_rs = "0.8.6"
|
||||
encoding_rs_io = "0.1.2"
|
||||
grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
|
||||
bstr = { version = "0.1.2", default-features = false, features = ["std"] }
|
||||
bytecount = "0.5"
|
||||
encoding_rs = "0.8.14"
|
||||
encoding_rs_io = "0.1.6"
|
||||
grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
|
||||
log = "0.4.5"
|
||||
memchr = "2.0.2"
|
||||
memmap = "0.6.2"
|
||||
memmap = "0.7"
|
||||
|
||||
[dev-dependencies]
|
||||
grep-regex = { version = "0.1.1", path = "../grep-regex" }
|
||||
regex = "1.0.5"
|
||||
grep-regex = { version = "0.1.3", path = "../grep-regex" }
|
||||
regex = "1.1"
|
||||
|
||||
[features]
|
||||
avx-accel = [
|
||||
"bytecount/avx-accel",
|
||||
]
|
||||
simd-accel = [
|
||||
"bytecount/simd-accel",
|
||||
"encoding_rs/simd-accel",
|
||||
]
|
||||
default = ["bytecount/runtime-dispatch-simd"]
|
||||
simd-accel = ["encoding_rs/simd-accel"]
|
||||
|
||||
# This feature is DEPRECATED. Runtime dispatch is used for SIMD now.
|
||||
avx-accel = []
|
||||
|
@@ -99,13 +99,13 @@ searches stdin.
|
||||
|
||||
#![deny(missing_docs)]
|
||||
|
||||
extern crate bstr;
|
||||
extern crate bytecount;
|
||||
extern crate encoding_rs;
|
||||
extern crate encoding_rs_io;
|
||||
extern crate grep_matcher;
|
||||
#[macro_use]
|
||||
extern crate log;
|
||||
extern crate memchr;
|
||||
extern crate memmap;
|
||||
#[cfg(test)]
|
||||
extern crate regex;
|
||||
|
@@ -1,8 +1,7 @@
|
||||
use std::cmp;
|
||||
use std::io;
|
||||
use std::ptr;
|
||||
|
||||
use memchr::{memchr, memrchr};
|
||||
use bstr::{BStr, BString};
|
||||
|
||||
/// The default buffer capacity that we use for the line buffer.
|
||||
pub(crate) const DEFAULT_BUFFER_CAPACITY: usize = 8 * (1<<10); // 8 KB
|
||||
@@ -123,7 +122,7 @@ impl LineBufferBuilder {
|
||||
pub fn build(&self) -> LineBuffer {
|
||||
LineBuffer {
|
||||
config: self.config,
|
||||
buf: vec![0; self.config.capacity],
|
||||
buf: BString::from(vec![0; self.config.capacity]),
|
||||
pos: 0,
|
||||
last_lineterm: 0,
|
||||
end: 0,
|
||||
@@ -255,6 +254,12 @@ impl<'b, R: io::Read> LineBufferReader<'b, R> {
|
||||
|
||||
/// Return the contents of this buffer.
|
||||
pub fn buffer(&self) -> &[u8] {
|
||||
self.line_buffer.buffer().as_bytes()
|
||||
}
|
||||
|
||||
/// Return the underlying buffer as a byte string. Used for tests only.
|
||||
#[cfg(test)]
|
||||
fn bstr(&self) -> &BStr {
|
||||
self.line_buffer.buffer()
|
||||
}
|
||||
|
||||
@@ -284,7 +289,7 @@ pub struct LineBuffer {
|
||||
/// The configuration of this buffer.
|
||||
config: Config,
|
||||
/// The primary buffer with which to hold data.
|
||||
buf: Vec<u8>,
|
||||
buf: BString,
|
||||
/// The current position of this buffer. This is always a valid sliceable
|
||||
/// index into `buf`, and its maximum value is the length of `buf`.
|
||||
pos: usize,
|
||||
@@ -312,6 +317,14 @@ pub struct LineBuffer {
|
||||
}
|
||||
|
||||
impl LineBuffer {
|
||||
/// Set the binary detection method used on this line buffer.
|
||||
///
|
||||
/// This permits dynamically changing the binary detection strategy on
|
||||
/// an existing line buffer without needing to create a new one.
|
||||
pub fn set_binary_detection(&mut self, binary: BinaryDetection) {
|
||||
self.config.binary = binary;
|
||||
}
|
||||
|
||||
/// Reset this buffer, such that it can be used with a new reader.
|
||||
fn clear(&mut self) {
|
||||
self.pos = 0;
|
||||
@@ -339,13 +352,13 @@ impl LineBuffer {
|
||||
}
|
||||
|
||||
/// Return the contents of this buffer.
|
||||
fn buffer(&self) -> &[u8] {
|
||||
fn buffer(&self) -> &BStr {
|
||||
&self.buf[self.pos..self.last_lineterm]
|
||||
}
|
||||
|
||||
/// Return the contents of the free space beyond the end of the buffer as
|
||||
/// a mutable slice.
|
||||
fn free_buffer(&mut self) -> &mut [u8] {
|
||||
fn free_buffer(&mut self) -> &mut BStr {
|
||||
&mut self.buf[self.end..]
|
||||
}
|
||||
|
||||
@@ -396,7 +409,7 @@ impl LineBuffer {
|
||||
assert_eq!(self.pos, 0);
|
||||
loop {
|
||||
self.ensure_capacity()?;
|
||||
let readlen = rdr.read(self.free_buffer())?;
|
||||
let readlen = rdr.read(self.free_buffer().as_bytes_mut())?;
|
||||
if readlen == 0 {
|
||||
// We're only done reading for good once the caller has
|
||||
// consumed everything.
|
||||
@@ -416,7 +429,7 @@ impl LineBuffer {
|
||||
match self.config.binary {
|
||||
BinaryDetection::None => {} // nothing to do
|
||||
BinaryDetection::Quit(byte) => {
|
||||
if let Some(i) = memchr(byte, newbytes) {
|
||||
if let Some(i) = newbytes.find_byte(byte) {
|
||||
self.end = oldend + i;
|
||||
self.last_lineterm = self.end;
|
||||
self.binary_byte_offset =
|
||||
@@ -444,7 +457,7 @@ impl LineBuffer {
|
||||
}
|
||||
|
||||
// Update our `last_lineterm` positions if we read one.
|
||||
if let Some(i) = memrchr(self.config.lineterm, newbytes) {
|
||||
if let Some(i) = newbytes.rfind_byte(self.config.lineterm) {
|
||||
self.last_lineterm = oldend + i + 1;
|
||||
return Ok(true);
|
||||
}
|
||||
@@ -467,40 +480,8 @@ impl LineBuffer {
|
||||
return;
|
||||
}
|
||||
|
||||
assert!(self.pos < self.end && self.end <= self.buf.len());
|
||||
let roll_len = self.end - self.pos;
|
||||
unsafe {
|
||||
// SAFETY: A buffer contains Copy data, so there's no problem
|
||||
// moving it around. Safety also depends on our indices being
|
||||
// in bounds, which they should always be, and we enforce with
|
||||
// an assert above.
|
||||
//
|
||||
// It seems like it should be possible to do this in safe code that
|
||||
// results in the same codegen. I tried the obvious:
|
||||
//
|
||||
// for (src, dst) in (self.pos..self.end).zip(0..) {
|
||||
// self.buf[dst] = self.buf[src];
|
||||
// }
|
||||
//
|
||||
// But the above does not work, and in fact compiles down to a slow
|
||||
// byte-by-byte loop. I tried a few other minor variations, but
|
||||
// alas, better minds might prevail.
|
||||
//
|
||||
// Overall, this doesn't save us *too* much. It mostly matters when
|
||||
// the number of bytes we're copying is large, which can happen
|
||||
// if the searcher is asked to produce a lot of context. We could
|
||||
// decide this isn't worth it, but it does make an appreciable
|
||||
// impact at or around the context=30 range on my machine.
|
||||
//
|
||||
// We could also use a temporary buffer that compiles down to two
|
||||
// memcpys and is faster than the byte-at-a-time loop, but it
|
||||
// complicates our options for limiting memory allocation a bit.
|
||||
ptr::copy(
|
||||
self.buf[self.pos..].as_ptr(),
|
||||
self.buf.as_mut_ptr(),
|
||||
roll_len,
|
||||
);
|
||||
}
|
||||
self.buf.copy_within(self.pos.., 0);
|
||||
self.pos = 0;
|
||||
self.last_lineterm = roll_len;
|
||||
self.end = roll_len;
|
||||
@@ -536,14 +517,15 @@ impl LineBuffer {
|
||||
}
|
||||
}
|
||||
|
||||
/// Replaces `src` with `replacement` in bytes.
|
||||
fn replace_bytes(bytes: &mut [u8], src: u8, replacement: u8) -> Option<usize> {
|
||||
/// Replaces `src` with `replacement` in bytes, and return the offset of the
|
||||
/// first replacement, if one exists.
|
||||
fn replace_bytes(bytes: &mut BStr, src: u8, replacement: u8) -> Option<usize> {
|
||||
if src == replacement {
|
||||
return None;
|
||||
}
|
||||
let mut first_pos = None;
|
||||
let mut pos = 0;
|
||||
while let Some(i) = memchr(src, &bytes[pos..]).map(|i| pos + i) {
|
||||
while let Some(i) = bytes[pos..].find_byte(src).map(|i| pos + i) {
|
||||
if first_pos.is_none() {
|
||||
first_pos = Some(i);
|
||||
}
|
||||
@@ -560,6 +542,7 @@ fn replace_bytes(bytes: &mut [u8], src: u8, replacement: u8) -> Option<usize> {
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use std::str;
|
||||
use bstr::BString;
|
||||
use super::*;
|
||||
|
||||
const SHERLOCK: &'static str = "\
|
||||
@@ -575,18 +558,14 @@ and exhibited clearly, with a label attached.\
|
||||
slice.to_string()
|
||||
}
|
||||
|
||||
fn btos(slice: &[u8]) -> &str {
|
||||
str::from_utf8(slice).unwrap()
|
||||
}
|
||||
|
||||
fn replace_str(
|
||||
slice: &str,
|
||||
src: u8,
|
||||
replacement: u8,
|
||||
) -> (String, Option<usize>) {
|
||||
let mut dst = slice.to_string().into_bytes();
|
||||
let mut dst = BString::from(slice);
|
||||
let result = replace_bytes(&mut dst, src, replacement);
|
||||
(String::from_utf8(dst).unwrap(), result)
|
||||
(dst.into_string().unwrap(), result)
|
||||
}
|
||||
|
||||
#[test]
|
||||
@@ -607,7 +586,7 @@ and exhibited clearly, with a label attached.\
|
||||
assert!(rdr.buffer().is_empty());
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "homer\nlisa\n");
|
||||
assert_eq!(rdr.bstr(), "homer\nlisa\n");
|
||||
assert_eq!(rdr.absolute_byte_offset(), 0);
|
||||
rdr.consume(5);
|
||||
assert_eq!(rdr.absolute_byte_offset(), 5);
|
||||
@@ -615,7 +594,7 @@ and exhibited clearly, with a label attached.\
|
||||
assert_eq!(rdr.absolute_byte_offset(), 11);
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "maggie");
|
||||
assert_eq!(rdr.bstr(), "maggie");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -630,7 +609,7 @@ and exhibited clearly, with a label attached.\
|
||||
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n");
|
||||
assert_eq!(rdr.bstr(), "homer\nlisa\nmaggie\n");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -645,7 +624,7 @@ and exhibited clearly, with a label attached.\
|
||||
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "\n");
|
||||
assert_eq!(rdr.bstr(), "\n");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -660,7 +639,7 @@ and exhibited clearly, with a label attached.\
|
||||
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "\n\n");
|
||||
assert_eq!(rdr.bstr(), "\n\n");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -698,12 +677,12 @@ and exhibited clearly, with a label attached.\
|
||||
let mut linebuf = LineBufferBuilder::new().capacity(1).build();
|
||||
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
|
||||
|
||||
let mut got = vec![];
|
||||
let mut got = BString::new();
|
||||
while rdr.fill().unwrap() {
|
||||
got.extend(rdr.buffer());
|
||||
got.push(rdr.buffer());
|
||||
rdr.consume_all();
|
||||
}
|
||||
assert_eq!(bytes, btos(&got));
|
||||
assert_eq!(bytes, got);
|
||||
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
|
||||
assert_eq!(rdr.binary_byte_offset(), None);
|
||||
}
|
||||
@@ -718,11 +697,11 @@ and exhibited clearly, with a label attached.\
|
||||
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "homer\n");
|
||||
assert_eq!(rdr.bstr(), "homer\n");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "lisa\n");
|
||||
assert_eq!(rdr.bstr(), "lisa\n");
|
||||
rdr.consume_all();
|
||||
|
||||
// This returns an error because while we have just enough room to
|
||||
@@ -732,11 +711,11 @@ and exhibited clearly, with a label attached.\
|
||||
assert!(rdr.fill().is_err());
|
||||
|
||||
// We can mush on though!
|
||||
assert_eq!(btos(rdr.buffer()), "m");
|
||||
assert_eq!(rdr.bstr(), "m");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "aggie");
|
||||
assert_eq!(rdr.bstr(), "aggie");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -752,16 +731,16 @@ and exhibited clearly, with a label attached.\
|
||||
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "homer\n");
|
||||
assert_eq!(rdr.bstr(), "homer\n");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "lisa\n");
|
||||
assert_eq!(rdr.bstr(), "lisa\n");
|
||||
rdr.consume_all();
|
||||
|
||||
// We have just enough space.
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "maggie");
|
||||
assert_eq!(rdr.bstr(), "maggie");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -777,7 +756,7 @@ and exhibited clearly, with a label attached.\
|
||||
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
|
||||
|
||||
assert!(rdr.fill().is_err());
|
||||
assert_eq!(btos(rdr.buffer()), "");
|
||||
assert_eq!(rdr.bstr(), "");
|
||||
}
|
||||
|
||||
#[test]
|
||||
@@ -789,7 +768,7 @@ and exhibited clearly, with a label attached.\
|
||||
assert!(rdr.buffer().is_empty());
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "homer\nli\x00sa\nmaggie\n");
|
||||
assert_eq!(rdr.bstr(), "homer\nli\x00sa\nmaggie\n");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -808,7 +787,7 @@ and exhibited clearly, with a label attached.\
|
||||
assert!(rdr.buffer().is_empty());
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "homer\nli");
|
||||
assert_eq!(rdr.bstr(), "homer\nli");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -825,7 +804,7 @@ and exhibited clearly, with a label attached.\
|
||||
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "");
|
||||
assert_eq!(rdr.bstr(), "");
|
||||
assert_eq!(rdr.absolute_byte_offset(), 0);
|
||||
assert_eq!(rdr.binary_byte_offset(), Some(0));
|
||||
}
|
||||
@@ -841,7 +820,7 @@ and exhibited clearly, with a label attached.\
|
||||
assert!(rdr.buffer().is_empty());
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n");
|
||||
assert_eq!(rdr.bstr(), "homer\nlisa\nmaggie\n");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -860,7 +839,7 @@ and exhibited clearly, with a label attached.\
|
||||
assert!(rdr.buffer().is_empty());
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie");
|
||||
assert_eq!(rdr.bstr(), "homer\nlisa\nmaggie");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -878,7 +857,7 @@ and exhibited clearly, with a label attached.\
|
||||
assert!(rdr.buffer().is_empty());
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "\
|
||||
assert_eq!(rdr.bstr(), "\
|
||||
For the Doctor Watsons of this world, as opposed to the Sherlock
|
||||
Holmeses, s\
|
||||
");
|
||||
@@ -901,7 +880,7 @@ Holmeses, s\
|
||||
assert!(rdr.buffer().is_empty());
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "homer\nli\nsa\nmaggie\n");
|
||||
assert_eq!(rdr.bstr(), "homer\nli\nsa\nmaggie\n");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -920,7 +899,7 @@ Holmeses, s\
|
||||
assert!(rdr.buffer().is_empty());
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "\nhomer\nlisa\nmaggie\n");
|
||||
assert_eq!(rdr.bstr(), "\nhomer\nlisa\nmaggie\n");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -939,7 +918,7 @@ Holmeses, s\
|
||||
assert!(rdr.buffer().is_empty());
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n\n");
|
||||
assert_eq!(rdr.bstr(), "homer\nlisa\nmaggie\n\n");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
@@ -958,7 +937,7 @@ Holmeses, s\
|
||||
assert!(rdr.buffer().is_empty());
|
||||
|
||||
assert!(rdr.fill().unwrap());
|
||||
assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n\n");
|
||||
assert_eq!(rdr.bstr(), "homer\nlisa\nmaggie\n\n");
|
||||
rdr.consume_all();
|
||||
|
||||
assert!(!rdr.fill().unwrap());
|
||||
|
@@ -2,8 +2,8 @@
|
||||
A collection of routines for performing operations on lines.
|
||||
*/
|
||||
|
||||
use bstr::B;
|
||||
use bytecount;
|
||||
use memchr::{memchr, memrchr};
|
||||
use grep_matcher::{LineTerminator, Match};
|
||||
|
||||
/// An iterator over lines in a particular slice of bytes.
|
||||
@@ -85,7 +85,7 @@ impl LineStep {
|
||||
#[inline(always)]
|
||||
fn next_impl(&mut self, mut bytes: &[u8]) -> Option<(usize, usize)> {
|
||||
bytes = &bytes[..self.end];
|
||||
match memchr(self.line_term, &bytes[self.pos..]) {
|
||||
match B(&bytes[self.pos..]).find_byte(self.line_term) {
|
||||
None => {
|
||||
if self.pos < bytes.len() {
|
||||
let m = (self.pos, bytes.len());
|
||||
@@ -135,14 +135,16 @@ pub fn locate(
|
||||
line_term: u8,
|
||||
range: Match,
|
||||
) -> Match {
|
||||
let line_start = memrchr(line_term, &bytes[0..range.start()])
|
||||
let line_start = B(&bytes[..range.start()])
|
||||
.rfind_byte(line_term)
|
||||
.map_or(0, |i| i + 1);
|
||||
let line_end =
|
||||
if range.end() > line_start && bytes[range.end() - 1] == line_term {
|
||||
range.end()
|
||||
} else {
|
||||
memchr(line_term, &bytes[range.end()..])
|
||||
.map_or(bytes.len(), |i| range.end() + i + 1)
|
||||
B(&bytes[range.end()..])
|
||||
.find_byte(line_term)
|
||||
.map_or(bytes.len(), |i| range.end() + i + 1)
|
||||
};
|
||||
Match::new(line_start, line_end)
|
||||
}
|
||||
@@ -180,7 +182,7 @@ fn preceding_by_pos(
|
||||
pos -= 1;
|
||||
}
|
||||
loop {
|
||||
match memrchr(line_term, &bytes[..pos]) {
|
||||
match B(&bytes[..pos]).rfind_byte(line_term) {
|
||||
None => {
|
||||
return 0;
|
||||
}
|
||||
|
@@ -1,3 +1,4 @@
|
||||
/// Like assert_eq, but nicer output for long strings.
|
||||
#[cfg(test)]
|
||||
#[macro_export]
|
||||
macro_rules! assert_eq_printed {
|
||||
|
@@ -1,6 +1,6 @@
|
||||
use std::cmp;
|
||||
|
||||
use memchr::memchr;
|
||||
use bstr::B;
|
||||
|
||||
use grep_matcher::{LineMatchKind, Matcher};
|
||||
use lines::{self, LineStep};
|
||||
@@ -90,6 +90,13 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
self.sink_matched(buf, range)
|
||||
}
|
||||
|
||||
pub fn binary_data(
|
||||
&mut self,
|
||||
binary_byte_offset: u64,
|
||||
) -> Result<bool, S::Error> {
|
||||
self.sink.binary_data(&self.searcher, binary_byte_offset)
|
||||
}
|
||||
|
||||
pub fn begin(&mut self) -> Result<bool, S::Error> {
|
||||
self.sink.begin(&self.searcher)
|
||||
}
|
||||
@@ -141,19 +148,28 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
consumed
|
||||
}
|
||||
|
||||
pub fn detect_binary(&mut self, buf: &[u8], range: &Range) -> bool {
|
||||
pub fn detect_binary(
|
||||
&mut self,
|
||||
buf: &[u8],
|
||||
range: &Range,
|
||||
) -> Result<bool, S::Error> {
|
||||
if self.binary_byte_offset.is_some() {
|
||||
return true;
|
||||
return Ok(self.config.binary.quit_byte().is_some());
|
||||
}
|
||||
let binary_byte = match self.config.binary.0 {
|
||||
BinaryDetection::Quit(b) => b,
|
||||
_ => return false,
|
||||
BinaryDetection::Convert(b) => b,
|
||||
_ => return Ok(false),
|
||||
};
|
||||
if let Some(i) = memchr(binary_byte, &buf[*range]) {
|
||||
self.binary_byte_offset = Some(range.start() + i);
|
||||
true
|
||||
if let Some(i) = B(&buf[*range]).find_byte(binary_byte) {
|
||||
let offset = range.start() + i;
|
||||
self.binary_byte_offset = Some(offset);
|
||||
if !self.binary_data(offset as u64)? {
|
||||
return Ok(true);
|
||||
}
|
||||
Ok(self.config.binary.quit_byte().is_some())
|
||||
} else {
|
||||
false
|
||||
Ok(false)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -416,7 +432,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
buf: &[u8],
|
||||
range: &Range,
|
||||
) -> Result<bool, S::Error> {
|
||||
if self.binary && self.detect_binary(buf, range) {
|
||||
if self.binary && self.detect_binary(buf, range)? {
|
||||
return Ok(false);
|
||||
}
|
||||
if !self.sink_break_context(range.start())? {
|
||||
@@ -424,16 +440,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
}
|
||||
self.count_lines(buf, range.start());
|
||||
let offset = self.absolute_byte_offset + range.start() as u64;
|
||||
let linebuf =
|
||||
if self.config.line_term.is_crlf() {
|
||||
// Normally, a line terminator is never part of a match, but
|
||||
// if the line terminator is CRLF, then it's possible for `\r`
|
||||
// to end up in the match, which we generally don't want. So
|
||||
// we strip it here.
|
||||
lines::without_terminator(&buf[*range], self.config.line_term)
|
||||
} else {
|
||||
&buf[*range]
|
||||
};
|
||||
let linebuf = &buf[*range];
|
||||
let keepgoing = self.sink.matched(
|
||||
&self.searcher,
|
||||
&SinkMatch {
|
||||
@@ -457,7 +464,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
buf: &[u8],
|
||||
range: &Range,
|
||||
) -> Result<bool, S::Error> {
|
||||
if self.binary && self.detect_binary(buf, range) {
|
||||
if self.binary && self.detect_binary(buf, range)? {
|
||||
return Ok(false);
|
||||
}
|
||||
self.count_lines(buf, range.start());
|
||||
@@ -487,7 +494,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
) -> Result<bool, S::Error> {
|
||||
assert!(self.after_context_left >= 1);
|
||||
|
||||
if self.binary && self.detect_binary(buf, range) {
|
||||
if self.binary && self.detect_binary(buf, range)? {
|
||||
return Ok(false);
|
||||
}
|
||||
self.count_lines(buf, range.start());
|
||||
@@ -516,7 +523,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
buf: &[u8],
|
||||
range: &Range,
|
||||
) -> Result<bool, S::Error> {
|
||||
if self.binary && self.detect_binary(buf, range) {
|
||||
if self.binary && self.detect_binary(buf, range)? {
|
||||
return Ok(false);
|
||||
}
|
||||
self.count_lines(buf, range.start());
|
||||
|
@@ -51,6 +51,7 @@ where M: Matcher,
|
||||
fn fill(&mut self) -> Result<bool, S::Error> {
|
||||
assert!(self.rdr.buffer()[self.core.pos()..].is_empty());
|
||||
|
||||
let already_binary = self.rdr.binary_byte_offset().is_some();
|
||||
let old_buf_len = self.rdr.buffer().len();
|
||||
let consumed = self.core.roll(self.rdr.buffer());
|
||||
self.rdr.consume(consumed);
|
||||
@@ -58,7 +59,14 @@ where M: Matcher,
|
||||
Err(err) => return Err(S::Error::error_io(err)),
|
||||
Ok(didread) => didread,
|
||||
};
|
||||
if !didread || self.rdr.binary_byte_offset().is_some() {
|
||||
if !already_binary {
|
||||
if let Some(offset) = self.rdr.binary_byte_offset() {
|
||||
if !self.core.binary_data(offset)? {
|
||||
return Ok(false);
|
||||
}
|
||||
}
|
||||
}
|
||||
if !didread || self.should_binary_quit() {
|
||||
return Ok(false);
|
||||
}
|
||||
// If rolling the buffer didn't result in consuming anything and if
|
||||
@@ -71,6 +79,11 @@ where M: Matcher,
|
||||
}
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn should_binary_quit(&self) -> bool {
|
||||
self.rdr.binary_byte_offset().is_some()
|
||||
&& self.config.binary.quit_byte().is_some()
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug)]
|
||||
@@ -103,7 +116,7 @@ impl<'s, M: Matcher, S: Sink> SliceByLine<'s, M, S> {
|
||||
DEFAULT_BUFFER_CAPACITY,
|
||||
);
|
||||
let binary_range = Range::new(0, binary_upto);
|
||||
if !self.core.detect_binary(self.slice, &binary_range) {
|
||||
if !self.core.detect_binary(self.slice, &binary_range)? {
|
||||
while
|
||||
!self.slice[self.core.pos()..].is_empty()
|
||||
&& self.core.match_by_line(self.slice)?
|
||||
@@ -155,7 +168,7 @@ impl<'s, M: Matcher, S: Sink> MultiLine<'s, M, S> {
|
||||
DEFAULT_BUFFER_CAPACITY,
|
||||
);
|
||||
let binary_range = Range::new(0, binary_upto);
|
||||
if !self.core.detect_binary(self.slice, &binary_range) {
|
||||
if !self.core.detect_binary(self.slice, &binary_range)? {
|
||||
let mut keepgoing = true;
|
||||
while !self.slice[self.core.pos()..].is_empty() && keepgoing {
|
||||
keepgoing = self.sink()?;
|
||||
|
@@ -75,25 +75,41 @@ impl BinaryDetection {
|
||||
BinaryDetection(line_buffer::BinaryDetection::Quit(binary_byte))
|
||||
}
|
||||
|
||||
// TODO(burntsushi): Figure out how to make binary conversion work. This
|
||||
// permits implementing GNU grep's default behavior, which is to zap NUL
|
||||
// bytes but still execute a search (if a match is detected, then GNU grep
|
||||
// stops and reports that a match was found but doesn't print the matching
|
||||
// line itself).
|
||||
//
|
||||
// This behavior is pretty simple to implement using the line buffer (and
|
||||
// in fact, it is already implemented and tested), since there's a fixed
|
||||
// size buffer that we can easily write to. The issue arises when searching
|
||||
// a `&[u8]` (whether on the heap or via a memory map), since this isn't
|
||||
// something we can easily write to.
|
||||
|
||||
/// The given byte is searched in all contents read by the line buffer. If
|
||||
/// it occurs, then it is replaced by the line terminator. The line buffer
|
||||
/// guarantees that this byte will never be observable by callers.
|
||||
#[allow(dead_code)]
|
||||
fn convert(binary_byte: u8) -> BinaryDetection {
|
||||
/// Binary detection is performed by looking for the given byte, and
|
||||
/// replacing it with the line terminator configured on the searcher.
|
||||
/// (If the searcher is configured to use `CRLF` as the line terminator,
|
||||
/// then this byte is replaced by just `LF`.)
|
||||
///
|
||||
/// When searching is performed using a fixed size buffer, then the
|
||||
/// contents of that buffer are always searched for the presence of this
|
||||
/// byte and replaced with the line terminator. In effect, the caller is
|
||||
/// guaranteed to never observe this byte while searching.
|
||||
///
|
||||
/// When searching is performed with the entire contents mapped into
|
||||
/// memory, then this setting has no effect and is ignored.
|
||||
pub fn convert(binary_byte: u8) -> BinaryDetection {
|
||||
BinaryDetection(line_buffer::BinaryDetection::Convert(binary_byte))
|
||||
}
|
||||
|
||||
/// If this binary detection uses the "quit" strategy, then this returns
|
||||
/// the byte that will cause a search to quit. In any other case, this
|
||||
/// returns `None`.
|
||||
pub fn quit_byte(&self) -> Option<u8> {
|
||||
match self.0 {
|
||||
line_buffer::BinaryDetection::Quit(b) => Some(b),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
/// If this binary detection uses the "convert" strategy, then this returns
|
||||
/// the byte that will be replaced by the line terminator. In any other
|
||||
/// case, this returns `None`.
|
||||
pub fn convert_byte(&self) -> Option<u8> {
|
||||
match self.0 {
|
||||
line_buffer::BinaryDetection::Convert(b) => Some(b),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// An encoding to use when searching.
|
||||
@@ -155,6 +171,8 @@ pub struct Config {
|
||||
/// An encoding that, when present, causes the searcher to transcode all
|
||||
/// input from the encoding to UTF-8.
|
||||
encoding: Option<Encoding>,
|
||||
/// Whether to do automatic transcoding based on a BOM or not.
|
||||
bom_sniffing: bool,
|
||||
}
|
||||
|
||||
impl Default for Config {
|
||||
@@ -171,6 +189,7 @@ impl Default for Config {
|
||||
binary: BinaryDetection::default(),
|
||||
multi_line: false,
|
||||
encoding: None,
|
||||
bom_sniffing: true,
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -303,11 +322,15 @@ impl SearcherBuilder {
|
||||
config.before_context = 0;
|
||||
config.after_context = 0;
|
||||
}
|
||||
|
||||
let mut decode_builder = DecodeReaderBytesBuilder::new();
|
||||
decode_builder
|
||||
.encoding(self.config.encoding.as_ref().map(|e| e.0))
|
||||
.utf8_passthru(true)
|
||||
.bom_override(true);
|
||||
.strip_bom(self.config.bom_sniffing)
|
||||
.bom_override(true)
|
||||
.bom_sniffing(self.config.bom_sniffing);
|
||||
|
||||
Searcher {
|
||||
config: config,
|
||||
decode_builder: decode_builder,
|
||||
@@ -505,12 +528,13 @@ impl SearcherBuilder {
|
||||
/// transcoding process encounters an error, then bytes are replaced with
|
||||
/// the Unicode replacement codepoint.
|
||||
///
|
||||
/// When no encoding is specified (the default), then BOM sniffing is used
|
||||
/// to determine whether the source data is UTF-8 or UTF-16, and
|
||||
/// transcoding will be performed automatically. If no BOM could be found,
|
||||
/// then the source data is searched _as if_ it were UTF-8. However, so
|
||||
/// long as the source data is at least ASCII compatible, then it is
|
||||
/// possible for a search to produce useful results.
|
||||
/// When no encoding is specified (the default), then BOM sniffing is
|
||||
/// used (if it's enabled, which it is, by default) to determine whether
|
||||
/// the source data is UTF-8 or UTF-16, and transcoding will be performed
|
||||
/// automatically. If no BOM could be found, then the source data is
|
||||
/// searched _as if_ it were UTF-8. However, so long as the source data is
|
||||
/// at least ASCII compatible, then it is possible for a search to produce
|
||||
/// useful results.
|
||||
pub fn encoding(
|
||||
&mut self,
|
||||
encoding: Option<Encoding>,
|
||||
@@ -518,6 +542,23 @@ impl SearcherBuilder {
|
||||
self.config.encoding = encoding;
|
||||
self
|
||||
}
|
||||
|
||||
/// Enable automatic transcoding based on BOM sniffing.
|
||||
///
|
||||
/// When this is enabled and an explicit encoding is not set, then this
|
||||
/// searcher will try to detect the encoding of the bytes being searched
|
||||
/// by sniffing its byte-order mark (BOM). In particular, when this is
|
||||
/// enabled, UTF-16 encoded files will be searched seamlessly.
|
||||
///
|
||||
/// When this is disabled and if an explicit encoding is not set, then
|
||||
/// the bytes from the source stream will be passed through unchanged,
|
||||
/// including its BOM, if one is present.
|
||||
///
|
||||
/// This is enabled by default.
|
||||
pub fn bom_sniffing(&mut self, yes: bool) -> &mut SearcherBuilder {
|
||||
self.config.bom_sniffing = yes;
|
||||
self
|
||||
}
|
||||
}
|
||||
|
||||
/// A searcher executes searches over a haystack and writes results to a caller
|
||||
@@ -714,6 +755,12 @@ impl Searcher {
|
||||
}
|
||||
}
|
||||
|
||||
/// Set the binary detection method used on this searcher.
|
||||
pub fn set_binary_detection(&mut self, detection: BinaryDetection) {
|
||||
self.config.binary = detection.clone();
|
||||
self.line_buffer.borrow_mut().set_binary_detection(detection.0);
|
||||
}
|
||||
|
||||
/// Check that the searcher's configuration and the matcher are consistent
|
||||
/// with each other.
|
||||
fn check_config<M: Matcher>(&self, matcher: M) -> Result<(), ConfigError> {
|
||||
@@ -737,7 +784,8 @@ impl Searcher {
|
||||
|
||||
/// Returns true if and only if the given slice needs to be transcoded.
|
||||
fn slice_needs_transcoding(&self, slice: &[u8]) -> bool {
|
||||
self.config.encoding.is_some() || slice_has_utf16_bom(slice)
|
||||
self.config.encoding.is_some()
|
||||
|| (self.config.bom_sniffing && slice_has_utf16_bom(slice))
|
||||
}
|
||||
}
|
||||
|
||||
@@ -752,6 +800,12 @@ impl Searcher {
|
||||
self.config.line_term
|
||||
}
|
||||
|
||||
/// Returns the type of binary detection configured on this searcher.
|
||||
#[inline]
|
||||
pub fn binary_detection(&self) -> &BinaryDetection {
|
||||
&self.config.binary
|
||||
}
|
||||
|
||||
/// Returns true if and only if this searcher is configured to invert its
|
||||
/// search results. That is, matching lines are lines that do **not** match
|
||||
/// the searcher's matcher.
|
||||
|
@@ -167,6 +167,28 @@ pub trait Sink {
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
/// This method is called whenever binary detection is enabled and binary
|
||||
/// data is found. If binary data is found, then this is called at least
|
||||
/// once for the first occurrence with the absolute byte offset at which
|
||||
/// the binary data begins.
|
||||
///
|
||||
/// If this returns `true`, then searching continues. If this returns
|
||||
/// `false`, then searching is stopped immediately and `finish` is called.
|
||||
///
|
||||
/// If this returns an error, then searching is stopped immediately,
|
||||
/// `finish` is not called and the error is bubbled back up to the caller
|
||||
/// of the searcher.
|
||||
///
|
||||
/// By default, it does nothing and returns `true`.
|
||||
#[inline]
|
||||
fn binary_data(
|
||||
&mut self,
|
||||
_searcher: &Searcher,
|
||||
_binary_byte_offset: u64,
|
||||
) -> Result<bool, Self::Error> {
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
/// This method is called when a search has begun, before any search is
|
||||
/// executed. By default, this does nothing.
|
||||
///
|
||||
@@ -228,6 +250,15 @@ impl<'a, S: Sink> Sink for &'a mut S {
|
||||
(**self).context_break(searcher)
|
||||
}
|
||||
|
||||
#[inline]
|
||||
fn binary_data(
|
||||
&mut self,
|
||||
searcher: &Searcher,
|
||||
binary_byte_offset: u64,
|
||||
) -> Result<bool, S::Error> {
|
||||
(**self).binary_data(searcher, binary_byte_offset)
|
||||
}
|
||||
|
||||
#[inline]
|
||||
fn begin(
|
||||
&mut self,
|
||||
@@ -275,6 +306,15 @@ impl<S: Sink + ?Sized> Sink for Box<S> {
|
||||
(**self).context_break(searcher)
|
||||
}
|
||||
|
||||
#[inline]
|
||||
fn binary_data(
|
||||
&mut self,
|
||||
searcher: &Searcher,
|
||||
binary_byte_offset: u64,
|
||||
) -> Result<bool, S::Error> {
|
||||
(**self).binary_data(searcher, binary_byte_offset)
|
||||
}
|
||||
|
||||
#[inline]
|
||||
fn begin(
|
||||
&mut self,
|
||||
|
@@ -1,10 +1,10 @@
|
||||
use std::io::{self, Write};
|
||||
use std::str;
|
||||
|
||||
use bstr::B;
|
||||
use grep_matcher::{
|
||||
LineMatchKind, LineTerminator, Match, Matcher, NoCaptures, NoError,
|
||||
};
|
||||
use memchr::memchr;
|
||||
use regex::bytes::{Regex, RegexBuilder};
|
||||
|
||||
use searcher::{BinaryDetection, Searcher, SearcherBuilder};
|
||||
@@ -94,7 +94,8 @@ impl Matcher for RegexMatcher {
|
||||
}
|
||||
// Make it interesting and return the last byte in the current
|
||||
// line.
|
||||
let i = memchr(self.line_term.unwrap().as_byte(), haystack)
|
||||
let i = B(haystack)
|
||||
.find_byte(self.line_term.unwrap().as_byte())
|
||||
.map(|i| i)
|
||||
.unwrap_or(haystack.len() - 1);
|
||||
Ok(Some(LineMatchKind::Candidate(i)))
|
||||
|
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "grep"
|
||||
version = "0.2.2" #:version
|
||||
version = "0.2.3" #:version
|
||||
authors = ["Andrew Gallant <jamslam@gmail.com>"]
|
||||
description = """
|
||||
Fast line oriented regex searching as a library.
|
||||
@@ -13,25 +13,20 @@ keywords = ["regex", "grep", "egrep", "search", "pattern"]
|
||||
license = "Unlicense/MIT"
|
||||
|
||||
[dependencies]
|
||||
grep-cli = { version = "0.1.1", path = "../grep-cli" }
|
||||
grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
|
||||
grep-pcre2 = { version = "0.1.1", path = "../grep-pcre2", optional = true }
|
||||
grep-printer = { version = "0.1.1", path = "../grep-printer" }
|
||||
grep-regex = { version = "0.1.1", path = "../grep-regex" }
|
||||
grep-searcher = { version = "0.1.1", path = "../grep-searcher" }
|
||||
grep-cli = { version = "0.1.2", path = "../grep-cli" }
|
||||
grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
|
||||
grep-pcre2 = { version = "0.1.3", path = "../grep-pcre2", optional = true }
|
||||
grep-printer = { version = "0.1.2", path = "../grep-printer" }
|
||||
grep-regex = { version = "0.1.3", path = "../grep-regex" }
|
||||
grep-searcher = { version = "0.1.4", path = "../grep-searcher" }
|
||||
|
||||
[dev-dependencies]
|
||||
atty = "0.2.11"
|
||||
regex = "1"
|
||||
termcolor = "1"
|
||||
walkdir = "2.2.2"
|
||||
|
||||
[dev-dependencies.clap]
|
||||
version = "2.32.0"
|
||||
default-features = false
|
||||
features = ["suggestions"]
|
||||
termcolor = "1.0.4"
|
||||
walkdir = "2.2.7"
|
||||
|
||||
[features]
|
||||
avx-accel = ["grep-searcher/avx-accel"]
|
||||
simd-accel = ["grep-searcher/simd-accel"]
|
||||
pcre2 = ["grep-pcre2"]
|
||||
|
||||
# This feature is DEPRECATED. Runtime dispatch is used for SIMD now.
|
||||
avx-accel = []
|
||||
|
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "ignore"
|
||||
version = "0.4.4" #:version
|
||||
version = "0.4.7" #:version
|
||||
authors = ["Andrew Gallant <jamslam@gmail.com>"]
|
||||
description = """
|
||||
A fast library for efficiently matching ignore files such as `.gitignore`
|
||||
@@ -18,21 +18,21 @@ name = "ignore"
|
||||
bench = false
|
||||
|
||||
[dependencies]
|
||||
crossbeam-channel = "0.2.4"
|
||||
globset = { version = "0.4.2", path = "../globset" }
|
||||
lazy_static = "1.1.0"
|
||||
crossbeam-channel = "0.3.6"
|
||||
globset = { version = "0.4.3", path = "../globset" }
|
||||
lazy_static = "1.1"
|
||||
log = "0.4.5"
|
||||
memchr = "2.0.2"
|
||||
regex = "1.0.5"
|
||||
same-file = "1.0.3"
|
||||
memchr = "2.1"
|
||||
regex = "1.1"
|
||||
same-file = "1.0.4"
|
||||
thread_local = "0.3.6"
|
||||
walkdir = "2.2.5"
|
||||
walkdir = "2.2.7"
|
||||
|
||||
[target.'cfg(windows)'.dependencies.winapi-util]
|
||||
version = "0.1.1"
|
||||
version = "0.1.2"
|
||||
|
||||
[dev-dependencies]
|
||||
tempdir = "0.3.7"
|
||||
tempfile = "3.0.5"
|
||||
|
||||
[features]
|
||||
simd-accel = ["globset/simd-accel"]
|
||||
|
@@ -37,19 +37,19 @@ fn main() {
|
||||
Box::new(move |result| {
|
||||
use ignore::WalkState::*;
|
||||
|
||||
tx.send(DirEntry::Y(result.unwrap()));
|
||||
tx.send(DirEntry::Y(result.unwrap())).unwrap();
|
||||
Continue
|
||||
})
|
||||
});
|
||||
} else if simple {
|
||||
let walker = WalkDir::new(path);
|
||||
for result in walker {
|
||||
tx.send(DirEntry::X(result.unwrap()));
|
||||
tx.send(DirEntry::X(result.unwrap())).unwrap();
|
||||
}
|
||||
} else {
|
||||
let walker = WalkBuilder::new(path).build();
|
||||
for result in walker {
|
||||
tx.send(DirEntry::Y(result.unwrap()));
|
||||
tx.send(DirEntry::Y(result.unwrap())).unwrap();
|
||||
}
|
||||
}
|
||||
drop(tx);
|
||||
|
@@ -22,6 +22,7 @@ use gitignore::{self, Gitignore, GitignoreBuilder};
|
||||
use pathutil::{is_hidden, strip_prefix};
|
||||
use overrides::{self, Override};
|
||||
use types::{self, Types};
|
||||
use walk::DirEntry;
|
||||
use {Error, Match, PartialErrorBuilder};
|
||||
|
||||
/// IgnoreMatch represents information about where a match came from when using
|
||||
@@ -73,6 +74,8 @@ struct IgnoreOptions {
|
||||
git_ignore: bool,
|
||||
/// Whether to read .git/info/exclude files.
|
||||
git_exclude: bool,
|
||||
/// Whether to ignore files case insensitively
|
||||
ignore_case_insensitive: bool,
|
||||
}
|
||||
|
||||
/// Ignore is a matcher useful for recursively walking one or more directories.
|
||||
@@ -225,7 +228,11 @@ impl Ignore {
|
||||
Gitignore::empty()
|
||||
} else {
|
||||
let (m, err) =
|
||||
create_gitignore(&dir, &self.0.custom_ignore_filenames);
|
||||
create_gitignore(
|
||||
&dir,
|
||||
&self.0.custom_ignore_filenames,
|
||||
self.0.opts.ignore_case_insensitive,
|
||||
);
|
||||
errs.maybe_push(err);
|
||||
m
|
||||
};
|
||||
@@ -233,7 +240,12 @@ impl Ignore {
|
||||
if !self.0.opts.ignore {
|
||||
Gitignore::empty()
|
||||
} else {
|
||||
let (m, err) = create_gitignore(&dir, &[".ignore"]);
|
||||
let (m, err) =
|
||||
create_gitignore(
|
||||
&dir,
|
||||
&[".ignore"],
|
||||
self.0.opts.ignore_case_insensitive,
|
||||
);
|
||||
errs.maybe_push(err);
|
||||
m
|
||||
};
|
||||
@@ -241,7 +253,12 @@ impl Ignore {
|
||||
if !self.0.opts.git_ignore {
|
||||
Gitignore::empty()
|
||||
} else {
|
||||
let (m, err) = create_gitignore(&dir, &[".gitignore"]);
|
||||
let (m, err) =
|
||||
create_gitignore(
|
||||
&dir,
|
||||
&[".gitignore"],
|
||||
self.0.opts.ignore_case_insensitive,
|
||||
);
|
||||
errs.maybe_push(err);
|
||||
m
|
||||
};
|
||||
@@ -249,7 +266,12 @@ impl Ignore {
|
||||
if !self.0.opts.git_exclude {
|
||||
Gitignore::empty()
|
||||
} else {
|
||||
let (m, err) = create_gitignore(&dir, &[".git/info/exclude"]);
|
||||
let (m, err) =
|
||||
create_gitignore(
|
||||
&dir,
|
||||
&[".git/info/exclude"],
|
||||
self.0.opts.ignore_case_insensitive,
|
||||
);
|
||||
errs.maybe_push(err);
|
||||
m
|
||||
};
|
||||
@@ -285,11 +307,23 @@ impl Ignore {
|
||||
|| has_explicit_ignores
|
||||
}
|
||||
|
||||
/// Like `matched`, but works with a directory entry instead.
|
||||
pub fn matched_dir_entry<'a>(
|
||||
&'a self,
|
||||
dent: &DirEntry,
|
||||
) -> Match<IgnoreMatch<'a>> {
|
||||
let m = self.matched(dent.path(), dent.is_dir());
|
||||
if m.is_none() && self.0.opts.hidden && is_hidden(dent) {
|
||||
return Match::Ignore(IgnoreMatch::hidden());
|
||||
}
|
||||
m
|
||||
}
|
||||
|
||||
/// Returns a match indicating whether the given file path should be
|
||||
/// ignored or not.
|
||||
///
|
||||
/// The match contains information about its origin.
|
||||
pub fn matched<'a, P: AsRef<Path>>(
|
||||
fn matched<'a, P: AsRef<Path>>(
|
||||
&'a self,
|
||||
path: P,
|
||||
is_dir: bool,
|
||||
@@ -330,9 +364,6 @@ impl Ignore {
|
||||
whitelisted = mat;
|
||||
}
|
||||
}
|
||||
if whitelisted.is_none() && self.0.opts.hidden && is_hidden(path) {
|
||||
return Match::Ignore(IgnoreMatch::hidden());
|
||||
}
|
||||
whitelisted
|
||||
}
|
||||
|
||||
@@ -483,6 +514,7 @@ impl IgnoreBuilder {
|
||||
git_global: true,
|
||||
git_ignore: true,
|
||||
git_exclude: true,
|
||||
ignore_case_insensitive: false,
|
||||
},
|
||||
}
|
||||
}
|
||||
@@ -496,7 +528,11 @@ impl IgnoreBuilder {
|
||||
if !self.opts.git_global {
|
||||
Gitignore::empty()
|
||||
} else {
|
||||
let (gi, err) = Gitignore::global();
|
||||
let mut builder = GitignoreBuilder::new("");
|
||||
builder
|
||||
.case_insensitive(self.opts.ignore_case_insensitive)
|
||||
.unwrap();
|
||||
let (gi, err) = builder.build_global();
|
||||
if let Some(err) = err {
|
||||
debug!("{}", err);
|
||||
}
|
||||
@@ -627,6 +663,17 @@ impl IgnoreBuilder {
|
||||
self.opts.git_exclude = yes;
|
||||
self
|
||||
}
|
||||
|
||||
/// Process ignore files case insensitively
|
||||
///
|
||||
/// This is disabled by default.
|
||||
pub fn ignore_case_insensitive(
|
||||
&mut self,
|
||||
yes: bool,
|
||||
) -> &mut IgnoreBuilder {
|
||||
self.opts.ignore_case_insensitive = yes;
|
||||
self
|
||||
}
|
||||
}
|
||||
|
||||
/// Creates a new gitignore matcher for the directory given.
|
||||
@@ -638,9 +685,11 @@ impl IgnoreBuilder {
|
||||
pub fn create_gitignore<T: AsRef<OsStr>>(
|
||||
dir: &Path,
|
||||
names: &[T],
|
||||
case_insensitive: bool,
|
||||
) -> (Gitignore, Option<Error>) {
|
||||
let mut builder = GitignoreBuilder::new(dir);
|
||||
let mut errs = PartialErrorBuilder::default();
|
||||
builder.case_insensitive(case_insensitive).unwrap();
|
||||
for name in names {
|
||||
let gipath = dir.join(name.as_ref());
|
||||
errs.maybe_push_ignore_io(builder.add(gipath));
|
||||
@@ -661,7 +710,7 @@ mod tests {
|
||||
use std::io::Write;
|
||||
use std::path::Path;
|
||||
|
||||
use tempdir::TempDir;
|
||||
use tempfile::{self, TempDir};
|
||||
|
||||
use dir::IgnoreBuilder;
|
||||
use gitignore::Gitignore;
|
||||
@@ -683,9 +732,13 @@ mod tests {
|
||||
}
|
||||
}
|
||||
|
||||
fn tmpdir(prefix: &str) -> TempDir {
|
||||
tempfile::Builder::new().prefix(prefix).tempdir().unwrap()
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn explicit_ignore() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
wfile(td.path().join("not-an-ignore"), "foo\n!bar");
|
||||
|
||||
let (gi, err) = Gitignore::new(td.path().join("not-an-ignore"));
|
||||
@@ -700,7 +753,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn git_exclude() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
mkdirp(td.path().join(".git/info"));
|
||||
wfile(td.path().join(".git/info/exclude"), "foo\n!bar");
|
||||
|
||||
@@ -713,7 +766,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn gitignore() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
mkdirp(td.path().join(".git"));
|
||||
wfile(td.path().join(".gitignore"), "foo\n!bar");
|
||||
|
||||
@@ -726,7 +779,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn gitignore_no_git() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
wfile(td.path().join(".gitignore"), "foo\n!bar");
|
||||
|
||||
let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||
@@ -738,7 +791,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn ignore() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
wfile(td.path().join(".ignore"), "foo\n!bar");
|
||||
|
||||
let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||
@@ -750,7 +803,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn custom_ignore() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
let custom_ignore = ".customignore";
|
||||
wfile(td.path().join(custom_ignore), "foo\n!bar");
|
||||
|
||||
@@ -766,7 +819,7 @@ mod tests {
|
||||
// Tests that a custom ignore file will override an .ignore.
|
||||
#[test]
|
||||
fn custom_ignore_over_ignore() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
let custom_ignore = ".customignore";
|
||||
wfile(td.path().join(".ignore"), "foo");
|
||||
wfile(td.path().join(custom_ignore), "!foo");
|
||||
@@ -781,7 +834,7 @@ mod tests {
|
||||
// Tests that earlier custom ignore files have lower precedence than later.
|
||||
#[test]
|
||||
fn custom_ignore_precedence() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
let custom_ignore1 = ".customignore1";
|
||||
let custom_ignore2 = ".customignore2";
|
||||
wfile(td.path().join(custom_ignore1), "foo");
|
||||
@@ -798,7 +851,7 @@ mod tests {
|
||||
// Tests that an .ignore will override a .gitignore.
|
||||
#[test]
|
||||
fn ignore_over_gitignore() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
wfile(td.path().join(".gitignore"), "foo");
|
||||
wfile(td.path().join(".ignore"), "!foo");
|
||||
|
||||
@@ -810,7 +863,7 @@ mod tests {
|
||||
// Tests that exclude has lower precedent than both .ignore and .gitignore.
|
||||
#[test]
|
||||
fn exclude_lowest() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
wfile(td.path().join(".gitignore"), "!foo");
|
||||
wfile(td.path().join(".ignore"), "!bar");
|
||||
mkdirp(td.path().join(".git/info"));
|
||||
@@ -825,8 +878,8 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn errored() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
wfile(td.path().join(".gitignore"), "f**oo");
|
||||
let td = tmpdir("ignore-test-");
|
||||
wfile(td.path().join(".gitignore"), "{foo");
|
||||
|
||||
let (_, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||
assert!(err.is_some());
|
||||
@@ -834,9 +887,9 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn errored_both() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
wfile(td.path().join(".gitignore"), "f**oo");
|
||||
wfile(td.path().join(".ignore"), "fo**o");
|
||||
let td = tmpdir("ignore-test-");
|
||||
wfile(td.path().join(".gitignore"), "{foo");
|
||||
wfile(td.path().join(".ignore"), "{bar");
|
||||
|
||||
let (_, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||
assert_eq!(2, partial(err.expect("an error")).len());
|
||||
@@ -844,9 +897,9 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn errored_partial() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
mkdirp(td.path().join(".git"));
|
||||
wfile(td.path().join(".gitignore"), "f**oo\nbar");
|
||||
wfile(td.path().join(".gitignore"), "{foo\nbar");
|
||||
|
||||
let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||
assert!(err.is_some());
|
||||
@@ -855,8 +908,8 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn errored_partial_and_ignore() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
wfile(td.path().join(".gitignore"), "f**oo\nbar");
|
||||
let td = tmpdir("ignore-test-");
|
||||
wfile(td.path().join(".gitignore"), "{foo\nbar");
|
||||
wfile(td.path().join(".ignore"), "!bar");
|
||||
|
||||
let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||
@@ -866,7 +919,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn not_present_empty() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
|
||||
let (_, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||
assert!(err.is_none());
|
||||
@@ -876,7 +929,7 @@ mod tests {
|
||||
fn stops_at_git_dir() {
|
||||
// This tests that .gitignore files beyond a .git barrier aren't
|
||||
// matched, but .ignore files are.
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
mkdirp(td.path().join(".git"));
|
||||
mkdirp(td.path().join("foo/.git"));
|
||||
wfile(td.path().join(".gitignore"), "foo");
|
||||
@@ -897,7 +950,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn absolute_parent() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
mkdirp(td.path().join(".git"));
|
||||
mkdirp(td.path().join("foo"));
|
||||
wfile(td.path().join(".gitignore"), "bar");
|
||||
@@ -920,7 +973,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn absolute_parent_anchored() {
|
||||
let td = TempDir::new("ignore-test-").unwrap();
|
||||
let td = tmpdir("ignore-test-");
|
||||
mkdirp(td.path().join(".git"));
|
||||
mkdirp(td.path().join("src/llvm"));
|
||||
wfile(td.path().join(".gitignore"), "/llvm/\nfoo");
|
||||
|
@@ -69,8 +69,7 @@ impl Glob {
|
||||
|
||||
/// Returns true if and only if this glob has a `**/` prefix.
|
||||
fn has_doublestar_prefix(&self) -> bool {
|
||||
self.actual.starts_with("**/")
|
||||
|| (self.actual == "**" && self.is_only_dir)
|
||||
self.actual.starts_with("**/") || self.actual == "**"
|
||||
}
|
||||
}
|
||||
|
||||
@@ -127,16 +126,7 @@ impl Gitignore {
|
||||
/// `$XDG_CONFIG_HOME/git/ignore` is read. If `$XDG_CONFIG_HOME` is not
|
||||
/// set or is empty, then `$HOME/.config/git/ignore` is used instead.
|
||||
pub fn global() -> (Gitignore, Option<Error>) {
|
||||
match gitconfig_excludes_path() {
|
||||
None => (Gitignore::empty(), None),
|
||||
Some(path) => {
|
||||
if !path.is_file() {
|
||||
(Gitignore::empty(), None)
|
||||
} else {
|
||||
Gitignore::new(path)
|
||||
}
|
||||
}
|
||||
}
|
||||
GitignoreBuilder::new("").build_global()
|
||||
}
|
||||
|
||||
/// Creates a new empty gitignore matcher that never matches anything.
|
||||
@@ -359,6 +349,36 @@ impl GitignoreBuilder {
|
||||
})
|
||||
}
|
||||
|
||||
/// Build a global gitignore matcher using the configuration in this
|
||||
/// builder.
|
||||
///
|
||||
/// This consumes ownership of the builder unlike `build` because it
|
||||
/// must mutate the builder to add the global gitignore globs.
|
||||
///
|
||||
/// Note that this ignores the path given to this builder's constructor
|
||||
/// and instead derives the path automatically from git's global
|
||||
/// configuration.
|
||||
pub fn build_global(mut self) -> (Gitignore, Option<Error>) {
|
||||
match gitconfig_excludes_path() {
|
||||
None => (Gitignore::empty(), None),
|
||||
Some(path) => {
|
||||
if !path.is_file() {
|
||||
(Gitignore::empty(), None)
|
||||
} else {
|
||||
let mut errs = PartialErrorBuilder::default();
|
||||
errs.maybe_push_ignore_io(self.add(path));
|
||||
match self.build() {
|
||||
Ok(gi) => (gi, errs.into_error_option()),
|
||||
Err(err) => {
|
||||
errs.push(err);
|
||||
(Gitignore::empty(), errs.into_error_option())
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Add each glob from the file path given.
|
||||
///
|
||||
/// The file given should be formatted as a `gitignore` file.
|
||||
@@ -419,6 +439,8 @@ impl GitignoreBuilder {
|
||||
from: Option<PathBuf>,
|
||||
mut line: &str,
|
||||
) -> Result<&mut GitignoreBuilder, Error> {
|
||||
#![allow(deprecated)]
|
||||
|
||||
if line.starts_with("#") {
|
||||
return Ok(self);
|
||||
}
|
||||
@@ -435,7 +457,6 @@ impl GitignoreBuilder {
|
||||
is_whitelist: false,
|
||||
is_only_dir: false,
|
||||
};
|
||||
let mut literal_separator = false;
|
||||
let mut is_absolute = false;
|
||||
if line.starts_with("\\!") || line.starts_with("\\#") {
|
||||
line = &line[1..];
|
||||
@@ -450,7 +471,6 @@ impl GitignoreBuilder {
|
||||
// then the glob can only match the beginning of a path
|
||||
// (relative to the location of gitignore). We achieve this by
|
||||
// simply banning wildcards from matching /.
|
||||
literal_separator = true;
|
||||
line = &line[1..];
|
||||
is_absolute = true;
|
||||
}
|
||||
@@ -463,16 +483,11 @@ impl GitignoreBuilder {
|
||||
line = &line[..i];
|
||||
}
|
||||
}
|
||||
// If there is a literal slash, then we note that so that globbing
|
||||
// doesn't let wildcards match slashes.
|
||||
glob.actual = line.to_string();
|
||||
if is_absolute || line.chars().any(|c| c == '/') {
|
||||
literal_separator = true;
|
||||
}
|
||||
// If there was a slash, then this is a glob that must match the entire
|
||||
// path name. Otherwise, we should let it match anywhere, so use a **/
|
||||
// prefix.
|
||||
if !literal_separator {
|
||||
// If there is a literal slash, then this is a glob that must match the
|
||||
// entire path name. Otherwise, we should let it match anywhere, so use
|
||||
// a **/ prefix.
|
||||
if !is_absolute && !line.chars().any(|c| c == '/') {
|
||||
// ... but only if we don't already have a **/ prefix.
|
||||
if !glob.has_doublestar_prefix() {
|
||||
glob.actual = format!("**/{}", glob.actual);
|
||||
@@ -486,7 +501,7 @@ impl GitignoreBuilder {
|
||||
}
|
||||
let parsed =
|
||||
GlobBuilder::new(&glob.actual)
|
||||
.literal_separator(literal_separator)
|
||||
.literal_separator(true)
|
||||
.case_insensitive(self.case_insensitive)
|
||||
.backslash_escape(true)
|
||||
.build()
|
||||
@@ -503,12 +518,16 @@ impl GitignoreBuilder {
|
||||
|
||||
/// Toggle whether the globs should be matched case insensitively or not.
|
||||
///
|
||||
/// When this option is changed, only globs added after the change will be affected.
|
||||
/// When this option is changed, only globs added after the change will be
|
||||
/// affected.
|
||||
///
|
||||
/// This is disabled by default.
|
||||
pub fn case_insensitive(
|
||||
&mut self, yes: bool
|
||||
&mut self,
|
||||
yes: bool,
|
||||
) -> Result<&mut GitignoreBuilder, Error> {
|
||||
// TODO: This should not return a `Result`. Fix this in the next semver
|
||||
// release.
|
||||
self.case_insensitive = yes;
|
||||
Ok(self)
|
||||
}
|
||||
@@ -689,6 +708,9 @@ mod tests {
|
||||
ignored!(ig39, ROOT, "\\?", "?");
|
||||
ignored!(ig40, ROOT, "\\*", "*");
|
||||
ignored!(ig41, ROOT, "\\a", "a");
|
||||
ignored!(ig42, ROOT, "s*.rs", "sfoo.rs");
|
||||
ignored!(ig43, ROOT, "**", "foo.rs");
|
||||
ignored!(ig44, ROOT, "**/**/*", "a/foo.rs");
|
||||
|
||||
not_ignored!(ignot1, ROOT, "amonths", "months");
|
||||
not_ignored!(ignot2, ROOT, "monthsa", "months");
|
||||
@@ -710,6 +732,7 @@ mod tests {
|
||||
not_ignored!(ignot16, ROOT, "*\n!**/", "foo", true);
|
||||
not_ignored!(ignot17, ROOT, "src/*.rs", "src/grep/src/main.rs");
|
||||
not_ignored!(ignot18, ROOT, "path1/*", "path2/path1/foo");
|
||||
not_ignored!(ignot19, ROOT, "s*.rs", "src/foo.rs");
|
||||
|
||||
fn bytes(s: &str) -> Vec<u8> {
|
||||
s.to_string().into_bytes()
|
||||
|
@@ -56,7 +56,7 @@ extern crate memchr;
|
||||
extern crate regex;
|
||||
extern crate same_file;
|
||||
#[cfg(test)]
|
||||
extern crate tempdir;
|
||||
extern crate tempfile;
|
||||
extern crate thread_local;
|
||||
extern crate walkdir;
|
||||
#[cfg(windows)]
|
||||
|
@@ -139,13 +139,16 @@ impl OverrideBuilder {
|
||||
}
|
||||
|
||||
/// Toggle whether the globs should be matched case insensitively or not.
|
||||
///
|
||||
///
|
||||
/// When this option is changed, only globs added after the change will be affected.
|
||||
///
|
||||
/// This is disabled by default.
|
||||
pub fn case_insensitive(
|
||||
&mut self, yes: bool
|
||||
&mut self,
|
||||
yes: bool,
|
||||
) -> Result<&mut OverrideBuilder, Error> {
|
||||
// TODO: This should not return a `Result`. Fix this in the next semver
|
||||
// release.
|
||||
self.builder.case_insensitive(yes)?;
|
||||
Ok(self)
|
||||
}
|
||||
|
@@ -1,22 +1,56 @@
|
||||
use std::ffi::OsStr;
|
||||
use std::path::Path;
|
||||
|
||||
/// Returns true if and only if this file path is considered to be hidden.
|
||||
use walk::DirEntry;
|
||||
|
||||
/// Returns true if and only if this entry is considered to be hidden.
|
||||
///
|
||||
/// This only returns true if the base name of the path starts with a `.`.
|
||||
///
|
||||
/// On Unix, this implements a more optimized check.
|
||||
#[cfg(unix)]
|
||||
pub fn is_hidden<P: AsRef<Path>>(path: P) -> bool {
|
||||
pub fn is_hidden(dent: &DirEntry) -> bool {
|
||||
use std::os::unix::ffi::OsStrExt;
|
||||
|
||||
if let Some(name) = file_name(path.as_ref()) {
|
||||
if let Some(name) = file_name(dent.path()) {
|
||||
name.as_bytes().get(0) == Some(&b'.')
|
||||
} else {
|
||||
false
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns true if and only if this file path is considered to be hidden.
|
||||
#[cfg(not(unix))]
|
||||
pub fn is_hidden<P: AsRef<Path>>(path: P) -> bool {
|
||||
if let Some(name) = file_name(path.as_ref()) {
|
||||
/// Returns true if and only if this entry is considered to be hidden.
|
||||
///
|
||||
/// On Windows, this returns true if one of the following is true:
|
||||
///
|
||||
/// * The base name of the path starts with a `.`.
|
||||
/// * The file attributes have the `HIDDEN` property set.
|
||||
#[cfg(windows)]
|
||||
pub fn is_hidden(dent: &DirEntry) -> bool {
|
||||
use std::os::windows::fs::MetadataExt;
|
||||
use winapi_util::file;
|
||||
|
||||
// This looks like we're doing an extra stat call, but on Windows, the
|
||||
// directory traverser reuses the metadata retrieved from each directory
|
||||
// entry and stores it on the DirEntry itself. So this is "free."
|
||||
if let Ok(md) = dent.metadata() {
|
||||
if file::is_hidden(md.file_attributes() as u64) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
if let Some(name) = file_name(dent.path()) {
|
||||
name.to_str().map(|s| s.starts_with(".")).unwrap_or(false)
|
||||
} else {
|
||||
false
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns true if and only if this entry is considered to be hidden.
|
||||
///
|
||||
/// This only returns true if the base name of the path starts with a `.`.
|
||||
#[cfg(not(any(unix, windows)))]
|
||||
pub fn is_hidden(dent: &DirEntry) -> bool {
|
||||
if let Some(name) = file_name(dent.path()) {
|
||||
name.to_str().map(|s| s.starts_with(".")).unwrap_or(false)
|
||||
} else {
|
||||
false
|
||||
|
@@ -103,12 +103,15 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
("amake", &["*.mk", "*.bp"]),
|
||||
("asciidoc", &["*.adoc", "*.asc", "*.asciidoc"]),
|
||||
("asm", &["*.asm", "*.s", "*.S"]),
|
||||
("asp", &["*.aspx", "*.aspx.cs", "*.aspx.cs", "*.ascx", "*.ascx.cs", "*.ascx.vb"]),
|
||||
("avro", &["*.avdl", "*.avpr", "*.avsc"]),
|
||||
("awk", &["*.awk"]),
|
||||
("bazel", &["*.bzl", "WORKSPACE", "BUILD"]),
|
||||
("bazel", &["*.bzl", "WORKSPACE", "BUILD", "BUILD.bazel"]),
|
||||
("bitbake", &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
|
||||
("bzip2", &["*.bz2"]),
|
||||
("c", &["*.c", "*.h", "*.H", "*.cats"]),
|
||||
("brotli", &["*.br"]),
|
||||
("buildstream", &["*.bst"]),
|
||||
("bzip2", &["*.bz2", "*.tbz2"]),
|
||||
("c", &["*.[chH]", "*.[chH].in", "*.cats"]),
|
||||
("cabal", &["*.cabal"]),
|
||||
("cbor", &["*.cbor"]),
|
||||
("ceylon", &["*.ceylon"]),
|
||||
@@ -118,8 +121,8 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
("creole", &["*.creole"]),
|
||||
("config", &["*.cfg", "*.conf", "*.config", "*.ini"]),
|
||||
("cpp", &[
|
||||
"*.C", "*.cc", "*.cpp", "*.cxx",
|
||||
"*.h", "*.H", "*.hh", "*.hpp", "*.hxx", "*.inl",
|
||||
"*.[ChH]", "*.cc", "*.[ch]pp", "*.[ch]xx", "*.hh", "*.inl",
|
||||
"*.[ChH].in", "*.cc.in", "*.[ch]pp.in", "*.[ch]xx.in", "*.hh.in",
|
||||
]),
|
||||
("crystal", &["Projectfile", "*.cr"]),
|
||||
("cs", &["*.cs"]),
|
||||
@@ -127,7 +130,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
("cshtml", &["*.cshtml"]),
|
||||
("css", &["*.css", "*.scss"]),
|
||||
("csv", &["*.csv"]),
|
||||
("cython", &["*.pyx"]),
|
||||
("cython", &["*.pyx", "*.pxi", "*.pxd"]),
|
||||
("dart", &["*.dart"]),
|
||||
("d", &["*.d"]),
|
||||
("dhall", &["*.dhall"]),
|
||||
@@ -145,7 +148,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
("fsharp", &["*.fs", "*.fsx", "*.fsi"]),
|
||||
("gn", &["*.gn", "*.gni"]),
|
||||
("go", &["*.go"]),
|
||||
("gzip", &["*.gz"]),
|
||||
("gzip", &["*.gz", "*.tgz"]),
|
||||
("groovy", &["*.groovy", "*.gradle"]),
|
||||
("h", &["*.h", "*.hpp"]),
|
||||
("hbs", &["*.hbs"]),
|
||||
@@ -153,7 +156,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
("hs", &["*.hs", "*.lhs"]),
|
||||
("html", &["*.htm", "*.html", "*.ejs"]),
|
||||
("idris", &["*.idr", "*.lidr"]),
|
||||
("java", &["*.java", "*.jsp"]),
|
||||
("java", &["*.java", "*.jsp", "*.jspx", "*.properties"]),
|
||||
("jinja", &["*.j2", "*.jinja", "*.jinja2"]),
|
||||
("js", &[
|
||||
"*.js", "*.jsx", "*.vue",
|
||||
@@ -193,14 +196,16 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
"OFL-*[0-9]*",
|
||||
]),
|
||||
("lisp", &["*.el", "*.jl", "*.lisp", "*.lsp", "*.sc", "*.scm"]),
|
||||
("lock", &["*.lock", "package-lock.json"]),
|
||||
("log", &["*.log"]),
|
||||
("lua", &["*.lua"]),
|
||||
("lzma", &["*.lzma"]),
|
||||
("lz4", &["*.lz4"]),
|
||||
("m4", &["*.ac", "*.m4"]),
|
||||
("make", &[
|
||||
"gnumakefile", "Gnumakefile", "GNUmakefile",
|
||||
"makefile", "Makefile",
|
||||
"[Gg][Nn][Uu]makefile", "[Mm]akefile",
|
||||
"[Gg][Nn][Uu]makefile.am", "[Mm]akefile.am",
|
||||
"[Gg][Nn][Uu]makefile.in", "[Mm]akefile.in",
|
||||
"*.mk", "*.mak"
|
||||
]),
|
||||
("mako", &["*.mako", "*.mao"]),
|
||||
@@ -224,12 +229,14 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
("pdf", &["*.pdf"]),
|
||||
("php", &["*.php", "*.php3", "*.php4", "*.php5", "*.phtml"]),
|
||||
("pod", &["*.pod"]),
|
||||
("postscript", &[".eps", ".ps"]),
|
||||
("protobuf", &["*.proto"]),
|
||||
("ps", &["*.cdxml", "*.ps1", "*.ps1xml", "*.psd1", "*.psm1"]),
|
||||
("puppet", &["*.erb", "*.pp", "*.rb"]),
|
||||
("purs", &["*.purs"]),
|
||||
("py", &["*.py"]),
|
||||
("qmake", &["*.pro", "*.pri", "*.prf"]),
|
||||
("qml", &["*.qml"]),
|
||||
("readme", &["README*", "*README"]),
|
||||
("r", &["*.R", "*.r", "*.Rmd", "*.Rnw"]),
|
||||
("rdoc", &["*.rdoc"]),
|
||||
@@ -278,8 +285,9 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
]),
|
||||
("taskpaper", &["*.taskpaper"]),
|
||||
("tcl", &["*.tcl"]),
|
||||
("tex", &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib"]),
|
||||
("tex", &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib", "*.dtx", "*.ins"]),
|
||||
("textile", &["*.textile"]),
|
||||
("thrift", &["*.thrift"]),
|
||||
("tf", &["*.tf"]),
|
||||
("ts", &["*.ts", "*.tsx"]),
|
||||
("txt", &["*.txt"]),
|
||||
@@ -293,10 +301,14 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
("vimscript", &["*.vim"]),
|
||||
("wiki", &["*.mediawiki", "*.wiki"]),
|
||||
("webidl", &["*.idl", "*.webidl", "*.widl"]),
|
||||
("xml", &["*.xml", "*.xml.dist"]),
|
||||
("xz", &["*.xz"]),
|
||||
("xml", &[
|
||||
"*.xml", "*.xml.dist", "*.dtd", "*.xsl", "*.xslt", "*.xsd", "*.xjb",
|
||||
"*.rng", "*.sch",
|
||||
]),
|
||||
("xz", &["*.xz", "*.txz"]),
|
||||
("yacc", &["*.y"]),
|
||||
("yaml", &["*.yaml", "*.yml"]),
|
||||
("zig", &["*.zig"]),
|
||||
("zsh", &[
|
||||
".zshenv", "zshenv",
|
||||
".zlogin", "zlogin",
|
||||
@@ -305,6 +317,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
".zshrc", "zshrc",
|
||||
"*.zsh",
|
||||
]),
|
||||
("zstd", &["*.zst", "*.zstd"]),
|
||||
];
|
||||
|
||||
/// Glob represents a single glob in a set of file type definitions.
|
||||
@@ -343,6 +356,18 @@ impl<'a> Glob<'a> {
|
||||
fn unmatched() -> Glob<'a> {
|
||||
Glob(GlobInner::UnmatchedIgnore)
|
||||
}
|
||||
|
||||
/// Return the file type defintion that matched, if one exists. A file type
|
||||
/// definition always exists when a specific definition matches a file
|
||||
/// path.
|
||||
pub fn file_type_def(&self) -> Option<&FileTypeDef> {
|
||||
match self {
|
||||
Glob(GlobInner::UnmatchedIgnore) => None,
|
||||
Glob(GlobInner::Matched { def, .. }) => {
|
||||
Some(def)
|
||||
},
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// A single file type definition.
|
||||
|
@@ -99,7 +99,7 @@ impl DirEntry {
|
||||
}
|
||||
|
||||
/// Returns true if and only if this entry points to a directory.
|
||||
fn is_dir(&self) -> bool {
|
||||
pub(crate) fn is_dir(&self) -> bool {
|
||||
self.dent.is_dir()
|
||||
}
|
||||
|
||||
@@ -764,6 +764,14 @@ impl WalkBuilder {
|
||||
self
|
||||
}
|
||||
|
||||
/// Process ignore files case insensitively
|
||||
///
|
||||
/// This is disabled by default.
|
||||
pub fn ignore_case_insensitive(&mut self, yes: bool) -> &mut WalkBuilder {
|
||||
self.ig_builder.ignore_case_insensitive(yes);
|
||||
self
|
||||
}
|
||||
|
||||
/// Set a function for sorting directory entries by their path.
|
||||
///
|
||||
/// If a compare function is set, the resulting iterator will return all
|
||||
@@ -875,16 +883,17 @@ impl Walk {
|
||||
return Ok(true);
|
||||
}
|
||||
}
|
||||
let is_dir = ent.file_type().map_or(false, |ft| ft.is_dir());
|
||||
let max_size = self.max_filesize;
|
||||
let should_skip_path = skip_path(&self.ig, ent.path(), is_dir);
|
||||
let should_skip_filesize = if !is_dir && max_size.is_some() {
|
||||
skip_filesize(max_size.unwrap(), ent.path(), &ent.metadata().ok())
|
||||
} else {
|
||||
false
|
||||
};
|
||||
|
||||
Ok(should_skip_path || should_skip_filesize)
|
||||
if should_skip_entry(&self.ig, ent) {
|
||||
return Ok(true);
|
||||
}
|
||||
if self.max_filesize.is_some() && !ent.is_dir() {
|
||||
return Ok(skip_filesize(
|
||||
self.max_filesize.unwrap(),
|
||||
ent.path(),
|
||||
&ent.metadata().ok(),
|
||||
));
|
||||
}
|
||||
Ok(false)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1118,7 +1127,7 @@ impl WalkParallel {
|
||||
dent: dent,
|
||||
ignore: self.ig_root.clone(),
|
||||
root_device: root_device,
|
||||
}));
|
||||
})).unwrap();
|
||||
any_work = true;
|
||||
}
|
||||
// ... but there's no need to start workers if we don't need them.
|
||||
@@ -1412,13 +1421,11 @@ impl Worker {
|
||||
return WalkState::Continue;
|
||||
}
|
||||
}
|
||||
let is_dir = dent.is_dir();
|
||||
let max_size = self.max_filesize;
|
||||
let should_skip_path = skip_path(ig, dent.path(), is_dir);
|
||||
let should_skip_path = should_skip_entry(ig, &dent);
|
||||
let should_skip_filesize =
|
||||
if !is_dir && max_size.is_some() {
|
||||
if self.max_filesize.is_some() && !dent.is_dir() {
|
||||
skip_filesize(
|
||||
max_size.unwrap(),
|
||||
self.max_filesize.unwrap(),
|
||||
dent.path(),
|
||||
&dent.metadata().ok(),
|
||||
)
|
||||
@@ -1431,7 +1438,7 @@ impl Worker {
|
||||
dent: dent,
|
||||
ignore: ig.clone(),
|
||||
root_device: root_device,
|
||||
}));
|
||||
})).unwrap();
|
||||
}
|
||||
WalkState::Continue
|
||||
}
|
||||
@@ -1446,12 +1453,12 @@ impl Worker {
|
||||
return None;
|
||||
}
|
||||
match self.rx.try_recv() {
|
||||
Some(Message::Work(work)) => {
|
||||
Ok(Message::Work(work)) => {
|
||||
self.waiting(false);
|
||||
self.quitting(false);
|
||||
return Some(work);
|
||||
}
|
||||
Some(Message::Quit) => {
|
||||
Ok(Message::Quit) => {
|
||||
// We can't just quit because a Message::Quit could be
|
||||
// spurious. For example, it's possible to observe that
|
||||
// all workers are waiting even if there's more work to
|
||||
@@ -1482,12 +1489,12 @@ impl Worker {
|
||||
// Otherwise, spin.
|
||||
}
|
||||
}
|
||||
None => {
|
||||
Err(_) => {
|
||||
self.waiting(true);
|
||||
self.quitting(false);
|
||||
if self.num_waiting() == self.threads {
|
||||
for _ in 0..self.threads {
|
||||
self.tx.send(Message::Quit);
|
||||
self.tx.send(Message::Quit).unwrap();
|
||||
}
|
||||
} else {
|
||||
// You're right to consider this suspicious, but it's
|
||||
@@ -1601,17 +1608,16 @@ fn skip_filesize(
|
||||
}
|
||||
}
|
||||
|
||||
fn skip_path(
|
||||
fn should_skip_entry(
|
||||
ig: &Ignore,
|
||||
path: &Path,
|
||||
is_dir: bool,
|
||||
dent: &DirEntry,
|
||||
) -> bool {
|
||||
let m = ig.matched(path, is_dir);
|
||||
let m = ig.matched_dir_entry(dent);
|
||||
if m.is_ignore() {
|
||||
debug!("ignoring {}: {:?}", path.display(), m);
|
||||
debug!("ignoring {}: {:?}", dent.path().display(), m);
|
||||
true
|
||||
} else if m.is_whitelist() {
|
||||
debug!("whitelisting {}: {:?}", path.display(), m);
|
||||
debug!("whitelisting {}: {:?}", dent.path().display(), m);
|
||||
false
|
||||
} else {
|
||||
false
|
||||
@@ -1702,7 +1708,7 @@ mod tests {
|
||||
use std::path::Path;
|
||||
use std::sync::{Arc, Mutex};
|
||||
|
||||
use tempdir::TempDir;
|
||||
use tempfile::{self, TempDir};
|
||||
|
||||
use super::{DirEntry, WalkBuilder, WalkState};
|
||||
|
||||
@@ -1789,6 +1795,10 @@ mod tests {
|
||||
paths
|
||||
}
|
||||
|
||||
fn tmpdir(prefix: &str) -> TempDir {
|
||||
tempfile::Builder::new().prefix(prefix).tempdir().unwrap()
|
||||
}
|
||||
|
||||
fn assert_paths(
|
||||
prefix: &Path,
|
||||
builder: &WalkBuilder,
|
||||
@@ -1802,7 +1812,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn no_ignores() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
mkdirp(td.path().join("a/b/c"));
|
||||
mkdirp(td.path().join("x/y"));
|
||||
wfile(td.path().join("a/b/foo"), "");
|
||||
@@ -1815,7 +1825,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn custom_ignore() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
let custom_ignore = ".customignore";
|
||||
mkdirp(td.path().join("a"));
|
||||
wfile(td.path().join(custom_ignore), "foo");
|
||||
@@ -1831,7 +1841,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn custom_ignore_exclusive_use() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
let custom_ignore = ".customignore";
|
||||
mkdirp(td.path().join("a"));
|
||||
wfile(td.path().join(custom_ignore), "foo");
|
||||
@@ -1851,7 +1861,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn gitignore() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
mkdirp(td.path().join(".git"));
|
||||
mkdirp(td.path().join("a"));
|
||||
wfile(td.path().join(".gitignore"), "foo");
|
||||
@@ -1867,7 +1877,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn explicit_ignore() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
let igpath = td.path().join(".not-an-ignore");
|
||||
mkdirp(td.path().join("a"));
|
||||
wfile(&igpath, "foo");
|
||||
@@ -1883,7 +1893,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn explicit_ignore_exclusive_use() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
let igpath = td.path().join(".not-an-ignore");
|
||||
mkdirp(td.path().join("a"));
|
||||
wfile(&igpath, "foo");
|
||||
@@ -1901,7 +1911,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn gitignore_parent() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
mkdirp(td.path().join(".git"));
|
||||
mkdirp(td.path().join("a"));
|
||||
wfile(td.path().join(".gitignore"), "foo");
|
||||
@@ -1914,7 +1924,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn max_depth() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
mkdirp(td.path().join("a/b/c"));
|
||||
wfile(td.path().join("foo"), "");
|
||||
wfile(td.path().join("a/foo"), "");
|
||||
@@ -1934,7 +1944,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn max_filesize() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
mkdirp(td.path().join("a/b"));
|
||||
wfile_size(td.path().join("foo"), 0);
|
||||
wfile_size(td.path().join("bar"), 400);
|
||||
@@ -1961,7 +1971,7 @@ mod tests {
|
||||
#[cfg(unix)] // because symlinks on windows are weird
|
||||
#[test]
|
||||
fn symlinks() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
mkdirp(td.path().join("a/b"));
|
||||
symlink(td.path().join("a/b"), td.path().join("z"));
|
||||
wfile(td.path().join("a/b/foo"), "");
|
||||
@@ -1978,7 +1988,7 @@ mod tests {
|
||||
#[cfg(unix)] // because symlinks on windows are weird
|
||||
#[test]
|
||||
fn first_path_not_symlink() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
mkdirp(td.path().join("foo"));
|
||||
|
||||
let dents = WalkBuilder::new(td.path().join("foo"))
|
||||
@@ -1999,7 +2009,7 @@ mod tests {
|
||||
#[cfg(unix)] // because symlinks on windows are weird
|
||||
#[test]
|
||||
fn symlink_loop() {
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
mkdirp(td.path().join("a/b"));
|
||||
symlink(td.path().join("a"), td.path().join("a/b/c"));
|
||||
|
||||
@@ -2029,7 +2039,7 @@ mod tests {
|
||||
|
||||
// If our test directory actually isn't a different volume from /sys,
|
||||
// then this test is meaningless and we shouldn't run it.
|
||||
let td = TempDir::new("walk-test-").unwrap();
|
||||
let td = tmpdir("walk-test-");
|
||||
if device_num(td.path()).unwrap() == device_num("/sys").unwrap() {
|
||||
return;
|
||||
}
|
||||
|
@@ -1,14 +1,14 @@
|
||||
class RipgrepBin < Formula
|
||||
version '0.9.0'
|
||||
version '0.10.0'
|
||||
desc "Recursively search directories for a regex pattern."
|
||||
homepage "https://github.com/BurntSushi/ripgrep"
|
||||
|
||||
if OS.mac?
|
||||
url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-apple-darwin.tar.gz"
|
||||
sha256 "36003ea8b62ad6274dc14140039f448cdf5026827d53cf24dad2d84005557a8c"
|
||||
sha256 "32754b4173ac87a7bfffd436d601a49362676eb1841ab33440f2f49c002c8967"
|
||||
elsif OS.linux?
|
||||
url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-unknown-linux-musl.tar.gz"
|
||||
sha256 "2eb4443e58f95051ff76ea036ed1faf940d5a04af4e7ff5a7dbd74576b907e99"
|
||||
sha256 "c76080aa807a339b44139885d77d15ad60ab8cdd2c2fdaf345d0985625bc0f97"
|
||||
end
|
||||
|
||||
conflicts_with "ripgrep"
|
||||
|
1
rustfmt.toml
Normal file
1
rustfmt.toml
Normal file
@@ -0,0 +1 @@
|
||||
disable_all_formatting = true
|
249
src/app.rs
249
src/app.rs
@@ -9,7 +9,8 @@
|
||||
// is where we read clap's configuration from the end user's arguments and turn
|
||||
// it into a ripgrep-specific configuration type that is not coupled with clap.
|
||||
|
||||
use clap::{self, App, AppSettings};
|
||||
use clap::{self, App, AppSettings, crate_authors, crate_version};
|
||||
use lazy_static::lazy_static;
|
||||
|
||||
const ABOUT: &str = "
|
||||
ripgrep (rg) recursively searches your current directory for a regex pattern.
|
||||
@@ -26,6 +27,9 @@ configuration file. The file can specify one shell argument per line. Lines
|
||||
starting with '#' are ignored. For more details, see the man page or the
|
||||
README.
|
||||
|
||||
Tip: to disable all smart filtering and make ripgrep behave a bit more like
|
||||
classical grep, use 'rg -uuu'.
|
||||
|
||||
Project home page: https://github.com/BurntSushi/ripgrep
|
||||
|
||||
Use -h for short descriptions and --help for more details.";
|
||||
@@ -543,7 +547,9 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
|
||||
// flags are hidden and merely mentioned in the docs of the corresponding
|
||||
// "positive" flag.
|
||||
flag_after_context(&mut args);
|
||||
flag_auto_hybrid_regex(&mut args);
|
||||
flag_before_context(&mut args);
|
||||
flag_binary(&mut args);
|
||||
flag_block_buffered(&mut args);
|
||||
flag_byte_offset(&mut args);
|
||||
flag_case_sensitive(&mut args);
|
||||
@@ -570,12 +576,14 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
|
||||
flag_iglob(&mut args);
|
||||
flag_ignore_case(&mut args);
|
||||
flag_ignore_file(&mut args);
|
||||
flag_ignore_file_case_insensitive(&mut args);
|
||||
flag_invert_match(&mut args);
|
||||
flag_json(&mut args);
|
||||
flag_line_buffered(&mut args);
|
||||
flag_line_number(&mut args);
|
||||
flag_line_regexp(&mut args);
|
||||
flag_max_columns(&mut args);
|
||||
flag_max_columns_preview(&mut args);
|
||||
flag_max_count(&mut args);
|
||||
flag_max_depth(&mut args);
|
||||
flag_max_filesize(&mut args);
|
||||
@@ -584,6 +592,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
|
||||
flag_multiline_dotall(&mut args);
|
||||
flag_no_config(&mut args);
|
||||
flag_no_ignore(&mut args);
|
||||
flag_no_ignore_dot(&mut args);
|
||||
flag_no_ignore_global(&mut args);
|
||||
flag_no_ignore_messages(&mut args);
|
||||
flag_no_ignore_parent(&mut args);
|
||||
@@ -597,6 +606,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
|
||||
flag_path_separator(&mut args);
|
||||
flag_passthru(&mut args);
|
||||
flag_pcre2(&mut args);
|
||||
flag_pcre2_version(&mut args);
|
||||
flag_pre(&mut args);
|
||||
flag_pre_glob(&mut args);
|
||||
flag_pretty(&mut args);
|
||||
@@ -643,7 +653,7 @@ will be provided. Namely, the following is equivalent to the above:
|
||||
let arg = RGArg::positional("pattern", "PATTERN")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.required_unless(&[
|
||||
"file", "files", "regexp", "type-list",
|
||||
"file", "files", "regexp", "type-list", "pcre2-version",
|
||||
]);
|
||||
args.push(arg);
|
||||
}
|
||||
@@ -674,6 +684,50 @@ This overrides the --context flag.
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_auto_hybrid_regex(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Dynamically use PCRE2 if necessary.";
|
||||
const LONG: &str = long!("\
|
||||
When this flag is used, ripgrep will dynamically choose between supported regex
|
||||
engines depending on the features used in a pattern. When ripgrep chooses a
|
||||
regex engine, it applies that choice for every regex provided to ripgrep (e.g.,
|
||||
via multiple -e/--regexp or -f/--file flags).
|
||||
|
||||
As an example of how this flag might behave, ripgrep will attempt to use
|
||||
its default finite automata based regex engine whenever the pattern can be
|
||||
successfully compiled with that regex engine. If PCRE2 is enabled and if the
|
||||
pattern given could not be compiled with the default regex engine, then PCRE2
|
||||
will be automatically used for searching. If PCRE2 isn't available, then this
|
||||
flag has no effect because there is only one regex engine to choose from.
|
||||
|
||||
In the future, ripgrep may adjust its heuristics for how it decides which
|
||||
regex engine to use. In general, the heuristics will be limited to a static
|
||||
analysis of the patterns, and not to any specific runtime behavior observed
|
||||
while searching files.
|
||||
|
||||
The primary downside of using this flag is that it may not always be obvious
|
||||
which regex engine ripgrep uses, and thus, the match semantics or performance
|
||||
profile of ripgrep may subtly and unexpectedly change. However, in many cases,
|
||||
all regex engines will agree on what constitutes a match and it can be nice
|
||||
to transparently support more advanced regex features like look-around and
|
||||
backreferences without explicitly needing to enable them.
|
||||
|
||||
This flag can be disabled with --no-auto-hybrid-regex.
|
||||
");
|
||||
let arg = RGArg::switch("auto-hybrid-regex")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.overrides("no-auto-hybrid-regex")
|
||||
.overrides("pcre2")
|
||||
.overrides("no-pcre2");
|
||||
args.push(arg);
|
||||
|
||||
let arg = RGArg::switch("no-auto-hybrid-regex")
|
||||
.hidden()
|
||||
.overrides("auto-hybrid-regex")
|
||||
.overrides("pcre2")
|
||||
.overrides("no-pcre2");
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_before_context(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Show NUM lines before each match.";
|
||||
const LONG: &str = long!("\
|
||||
@@ -688,6 +742,55 @@ This overrides the --context flag.
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_binary(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Search binary files.";
|
||||
const LONG: &str = long!("\
|
||||
Enabling this flag will cause ripgrep to search binary files. By default,
|
||||
ripgrep attempts to automatically skip binary files in order to improve the
|
||||
relevance of results and make the search faster.
|
||||
|
||||
Binary files are heuristically detected based on whether they contain a NUL
|
||||
byte or not. By default (without this flag set), once a NUL byte is seen,
|
||||
ripgrep will stop searching the file. Usually, NUL bytes occur in the beginning
|
||||
of most binary files. If a NUL byte occurs after a match, then ripgrep will
|
||||
still stop searching the rest of the file, but a warning will be printed.
|
||||
|
||||
In contrast, when this flag is provided, ripgrep will continue searching a file
|
||||
even if a NUL byte is found. In particular, if a NUL byte is found then ripgrep
|
||||
will continue searching until either a match is found or the end of the file is
|
||||
reached, whichever comes sooner. If a match is found, then ripgrep will stop
|
||||
and print a warning saying that the search stopped prematurely.
|
||||
|
||||
If you want ripgrep to search a file without any special NUL byte handling at
|
||||
all (and potentially print binary data to stdout), then you should use the
|
||||
'-a/--text' flag.
|
||||
|
||||
The '--binary' flag is a flag for controlling ripgrep's automatic filtering
|
||||
mechanism. As such, it does not need to be used when searching a file
|
||||
explicitly or when searching stdin. That is, it is only applicable when
|
||||
recursively searching a directory.
|
||||
|
||||
Note that when the '-u/--unrestricted' flag is provided for a third time, then
|
||||
this flag is automatically enabled.
|
||||
|
||||
This flag can be disabled with '--no-binary'. It overrides the '-a/--text'
|
||||
flag.
|
||||
");
|
||||
let arg = RGArg::switch("binary")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.overrides("no-binary")
|
||||
.overrides("text")
|
||||
.overrides("no-text");
|
||||
args.push(arg);
|
||||
|
||||
let arg = RGArg::switch("no-binary")
|
||||
.hidden()
|
||||
.overrides("binary")
|
||||
.overrides("text")
|
||||
.overrides("no-text");
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_block_buffered(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Force block buffering.";
|
||||
const LONG: &str = long!("\
|
||||
@@ -788,17 +891,17 @@ to one of eight choices: red, blue, green, cyan, magenta, yellow, white and
|
||||
black. Styles are limited to nobold, bold, nointense, intense, nounderline
|
||||
or underline.
|
||||
|
||||
The format of the flag is `{type}:{attribute}:{value}`. `{type}` should be
|
||||
one of path, line, column or match. `{attribute}` can be fg, bg or style.
|
||||
`{value}` is either a color (for fg and bg) or a text style. A special format,
|
||||
`{type}:none`, will clear all color settings for `{type}`.
|
||||
The format of the flag is '{type}:{attribute}:{value}'. '{type}' should be
|
||||
one of path, line, column or match. '{attribute}' can be fg, bg or style.
|
||||
'{value}' is either a color (for fg and bg) or a text style. A special format,
|
||||
'{type}:none', will clear all color settings for '{type}'.
|
||||
|
||||
For example, the following command will change the match color to magenta and
|
||||
the background color for line numbers to yellow:
|
||||
|
||||
rg --colors 'match:fg:magenta' --colors 'line:bg:yellow' foo.
|
||||
|
||||
Extended colors can be used for `{value}` when the terminal supports ANSI color
|
||||
Extended colors can be used for '{value}' when the terminal supports ANSI color
|
||||
sequences. These are specified as either 'x' (256-color) or 'x,x,x' (24-bit
|
||||
truecolor) where x is a number between 0 and 255 inclusive. x may be given as
|
||||
a normal decimal number or a hexadecimal number, which is prefixed by `0x`.
|
||||
@@ -979,10 +1082,17 @@ fn flag_encoding(args: &mut Vec<RGArg>) {
|
||||
const LONG: &str = long!("\
|
||||
Specify the text encoding that ripgrep will use on all files searched. The
|
||||
default value is 'auto', which will cause ripgrep to do a best effort automatic
|
||||
detection of encoding on a per-file basis. Other supported values can be found
|
||||
in the list of labels here:
|
||||
detection of encoding on a per-file basis. Automatic detection in this case
|
||||
only applies to files that begin with a UTF-8 or UTF-16 byte-order mark (BOM).
|
||||
No other automatic detection is performed. One can also specify 'none' which
|
||||
will then completely disable BOM sniffing and always result in searching the
|
||||
raw bytes, including a BOM if it's present, regardless of its encoding.
|
||||
|
||||
Other supported values can be found in the list of labels here:
|
||||
https://encoding.spec.whatwg.org/#concept-encoding-get
|
||||
|
||||
For more details on encoding and how ripgrep deals with it, see GUIDE.md.
|
||||
|
||||
This flag can be disabled with --no-encoding.
|
||||
");
|
||||
let arg = RGArg::flag("encoding", "ENCODING").short("E")
|
||||
@@ -1016,7 +1126,7 @@ fn flag_files(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Print each file that would be searched.";
|
||||
const LONG: &str = long!("\
|
||||
Print each file that would be searched without actually performing the search.
|
||||
This is useful to determine whether a particular file is being search or not.
|
||||
This is useful to determine whether a particular file is being searched or not.
|
||||
");
|
||||
let arg = RGArg::switch("files")
|
||||
.help(SHORT).long_help(LONG)
|
||||
@@ -1208,6 +1318,26 @@ directly on the command line, then used -g instead.
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_ignore_file_case_insensitive(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Process ignore files case insensitively.";
|
||||
const LONG: &str = long!("\
|
||||
Process ignore files (.gitignore, .ignore, etc.) case insensitively. Note that
|
||||
this comes with a performance penalty and is most useful on case insensitive
|
||||
file systems (such as Windows).
|
||||
|
||||
This flag can be disabled with the --no-ignore-file-case-insensitive flag.
|
||||
");
|
||||
let arg = RGArg::switch("ignore-file-case-insensitive")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.overrides("no-ignore-file-case-insensitive");
|
||||
args.push(arg);
|
||||
|
||||
let arg = RGArg::switch("no-ignore-file-case-insensitive")
|
||||
.hidden()
|
||||
.overrides("ignore-file-case-insensitive");
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_invert_match(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Invert matching.";
|
||||
const LONG: &str = long!("\
|
||||
@@ -1360,6 +1490,30 @@ When this flag is omitted or is set to 0, then it has no effect.
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_max_columns_preview(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Print a preview for lines exceeding the limit.";
|
||||
const LONG: &str = long!("\
|
||||
When the '--max-columns' flag is used, ripgrep will by default completely
|
||||
replace any line that is too long with a message indicating that a matching
|
||||
line was removed. When this flag is combined with '--max-columns', a preview
|
||||
of the line (corresponding to the limit size) is shown instead, where the part
|
||||
of the line exceeding the limit is not shown.
|
||||
|
||||
If the '--max-columns' flag is not set, then this has no effect.
|
||||
|
||||
This flag can be disabled with '--no-max-columns-preview'.
|
||||
");
|
||||
let arg = RGArg::switch("max-columns-preview")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.overrides("no-max-columns-preview");
|
||||
args.push(arg);
|
||||
|
||||
let arg = RGArg::switch("no-max-columns-preview")
|
||||
.hidden()
|
||||
.overrides("max-columns-preview");
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_max_count(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Limit the number of matches.";
|
||||
const LONG: &str = long!("\
|
||||
@@ -1535,7 +1689,7 @@ fn flag_no_ignore(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Don't respect ignore files.";
|
||||
const LONG: &str = long!("\
|
||||
Don't respect ignore files (.gitignore, .ignore, etc.). This implies
|
||||
--no-ignore-parent and --no-ignore-vcs.
|
||||
--no-ignore-parent, --no-ignore-dot and --no-ignore-vcs.
|
||||
|
||||
This flag can be disabled with the --ignore flag.
|
||||
");
|
||||
@@ -1550,6 +1704,24 @@ This flag can be disabled with the --ignore flag.
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_no_ignore_dot(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Don't respect .ignore files.";
|
||||
const LONG: &str = long!("\
|
||||
Don't respect .ignore files.
|
||||
|
||||
This flag can be disabled with the --ignore-dot flag.
|
||||
");
|
||||
let arg = RGArg::switch("no-ignore-dot")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.overrides("ignore-dot");
|
||||
args.push(arg);
|
||||
|
||||
let arg = RGArg::switch("ignore-dot")
|
||||
.hidden()
|
||||
.overrides("no-ignore-dot");
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_no_ignore_global(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Don't respect global ignore files.";
|
||||
const LONG: &str = long!("\
|
||||
@@ -1811,12 +1983,28 @@ This flag can be disabled with --no-pcre2.
|
||||
");
|
||||
let arg = RGArg::switch("pcre2").short("P")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.overrides("no-pcre2");
|
||||
.overrides("no-pcre2")
|
||||
.overrides("auto-hybrid-regex")
|
||||
.overrides("no-auto-hybrid-regex");
|
||||
args.push(arg);
|
||||
|
||||
let arg = RGArg::switch("no-pcre2")
|
||||
.hidden()
|
||||
.overrides("pcre2");
|
||||
.overrides("pcre2")
|
||||
.overrides("auto-hybrid-regex")
|
||||
.overrides("no-auto-hybrid-regex");
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_pcre2_version(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Print the version of PCRE2 that ripgrep uses.";
|
||||
const LONG: &str = long!("\
|
||||
When this flag is present, ripgrep will print the version of PCRE2 in use,
|
||||
along with other information, and then exit. If PCRE2 is not available, then
|
||||
ripgrep will print an error message and exit with an error code.
|
||||
");
|
||||
let arg = RGArg::switch("pcre2-version")
|
||||
.help(SHORT).long_help(LONG);
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
@@ -1826,12 +2014,13 @@ fn flag_pre(args: &mut Vec<RGArg>) {
|
||||
For each input FILE, search the standard output of COMMAND FILE rather than the
|
||||
contents of FILE. This option expects the COMMAND program to either be an
|
||||
absolute path or to be available in your PATH. Either an empty string COMMAND
|
||||
or the `--no-pre` flag will disable this behavior.
|
||||
or the '--no-pre' flag will disable this behavior.
|
||||
|
||||
WARNING: When this flag is set, ripgrep will unconditionally spawn a
|
||||
process for every file that is searched. Therefore, this can incur an
|
||||
unnecessarily large performance penalty if you don't otherwise need the
|
||||
flexibility offered by this flag.
|
||||
flexibility offered by this flag. One possible mitigation to this is to use
|
||||
the '--pre-glob' flag to limit which files a preprocessor is run with.
|
||||
|
||||
A preprocessor is not run when ripgrep is searching stdin.
|
||||
|
||||
@@ -1998,9 +2187,9 @@ This flag can be used with the -o/--only-matching flag.
|
||||
fn flag_search_zip(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Search in compressed files.";
|
||||
const LONG: &str = long!("\
|
||||
Search in compressed files. Currently gz, bz2, xz, lzma and lz4 files are
|
||||
supported. This option expects the decompression binaries to be available in
|
||||
your PATH.
|
||||
Search in compressed files. Currently gzip, bzip2, xz, LZ4, LZMA, Brotli and
|
||||
Zstd files are supported. This option expects the decompression binaries to be
|
||||
available in your PATH.
|
||||
|
||||
This flag can be disabled with --no-search-zip.
|
||||
");
|
||||
@@ -2067,7 +2256,7 @@ for this flag are:
|
||||
path Sort by file path.
|
||||
modified Sort by the last modified time on a file.
|
||||
accessed Sort by the last accessed time on a file.
|
||||
created Sort by the cretion time on a file.
|
||||
created Sort by the creation time on a file.
|
||||
none Do not sort results.
|
||||
|
||||
If the sorting criteria isn't available on your system (for example, creation
|
||||
@@ -2100,7 +2289,7 @@ for this flag are:
|
||||
path Sort by file path.
|
||||
modified Sort by the last modified time on a file.
|
||||
accessed Sort by the last accessed time on a file.
|
||||
created Sort by the cretion time on a file.
|
||||
created Sort by the creation time on a file.
|
||||
none Do not sort results.
|
||||
|
||||
If the sorting criteria isn't available on your system (for example, creation
|
||||
@@ -2160,20 +2349,23 @@ escape codes to be printed that alter the behavior of your terminal.
|
||||
When binary file detection is enabled it is imperfect. In general, it uses
|
||||
a simple heuristic. If a NUL byte is seen during search, then the file is
|
||||
considered binary and search stops (unless this flag is present).
|
||||
Alternatively, if the '--binary' flag is used, then ripgrep will only quit
|
||||
when it sees a NUL byte after it sees a match (or searches the entire file).
|
||||
|
||||
Note that when the `-u/--unrestricted` flag is provided for a third time, then
|
||||
this flag is automatically enabled.
|
||||
|
||||
This flag can be disabled with --no-text.
|
||||
This flag can be disabled with '--no-text'. It overrides the '--binary' flag.
|
||||
");
|
||||
let arg = RGArg::switch("text").short("a")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.overrides("no-text");
|
||||
.overrides("no-text")
|
||||
.overrides("binary")
|
||||
.overrides("no-binary");
|
||||
args.push(arg);
|
||||
|
||||
let arg = RGArg::switch("no-text")
|
||||
.hidden()
|
||||
.overrides("text");
|
||||
.overrides("text")
|
||||
.overrides("binary")
|
||||
.overrides("no-binary");
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
@@ -2302,8 +2494,7 @@ Reduce the level of \"smart\" searching. A single -u won't respect .gitignore
|
||||
(etc.) files. Two -u flags will additionally search hidden files and
|
||||
directories. Three -u flags will additionally search binary files.
|
||||
|
||||
-uu is roughly equivalent to grep -r and -uuu is roughly equivalent to grep -a
|
||||
-r.
|
||||
'rg -uuu' is roughly equivalent to 'grep -r'.
|
||||
");
|
||||
let arg = RGArg::switch("unrestricted").short("u")
|
||||
.help(SHORT).long_help(LONG)
|
||||
@@ -2345,7 +2536,7 @@ ripgrep is explicitly instructed to search one file or stdin.
|
||||
|
||||
This flag overrides --with-filename.
|
||||
");
|
||||
let arg = RGArg::switch("no-filename")
|
||||
let arg = RGArg::switch("no-filename").short("I")
|
||||
.help(NO_SHORT).long_help(NO_LONG)
|
||||
.overrides("with-filename");
|
||||
args.push(arg);
|
||||
|
309
src/args.rs
309
src/args.rs
@@ -1,9 +1,10 @@
|
||||
use std::cmp;
|
||||
use std::env;
|
||||
use std::ffi::OsStr;
|
||||
use std::ffi::{OsStr, OsString};
|
||||
use std::fs;
|
||||
use std::io;
|
||||
use std::io::{self, Write};
|
||||
use std::path::{Path, PathBuf};
|
||||
use std::process;
|
||||
use std::sync::Arc;
|
||||
use std::time::SystemTime;
|
||||
|
||||
@@ -34,20 +35,22 @@ use ignore::types::{FileTypeDef, Types, TypesBuilder};
|
||||
use ignore::{Walk, WalkBuilder, WalkParallel};
|
||||
use log;
|
||||
use num_cpus;
|
||||
use path_printer::{PathPrinter, PathPrinterBuilder};
|
||||
use regex;
|
||||
use termcolor::{
|
||||
WriteColor,
|
||||
BufferWriter, ColorChoice,
|
||||
};
|
||||
|
||||
use app;
|
||||
use config;
|
||||
use logger::Logger;
|
||||
use messages::{set_messages, set_ignore_messages};
|
||||
use search::{PatternMatcher, Printer, SearchWorker, SearchWorkerBuilder};
|
||||
use subject::SubjectBuilder;
|
||||
use Result;
|
||||
use crate::app;
|
||||
use crate::config;
|
||||
use crate::logger::Logger;
|
||||
use crate::messages::{set_messages, set_ignore_messages};
|
||||
use crate::path_printer::{PathPrinter, PathPrinterBuilder};
|
||||
use crate::search::{
|
||||
PatternMatcher, Printer, SearchWorker, SearchWorkerBuilder,
|
||||
};
|
||||
use crate::subject::SubjectBuilder;
|
||||
use crate::Result;
|
||||
|
||||
/// The command that ripgrep should execute based on the command line
|
||||
/// configuration.
|
||||
@@ -70,6 +73,8 @@ pub enum Command {
|
||||
/// List all file type definitions configured, including the default file
|
||||
/// types and any additional file types added to the command line.
|
||||
Types,
|
||||
/// Print the version of PCRE2 in use.
|
||||
PCRE2Version,
|
||||
}
|
||||
|
||||
impl Command {
|
||||
@@ -79,7 +84,11 @@ impl Command {
|
||||
|
||||
match *self {
|
||||
Search | SearchParallel => true,
|
||||
SearchNever | Files | FilesParallel | Types => false,
|
||||
| SearchNever
|
||||
| Files
|
||||
| FilesParallel
|
||||
| Types
|
||||
| PCRE2Version => false,
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -128,7 +137,7 @@ impl Args {
|
||||
// trying to parse config files. If a config file exists and has
|
||||
// arguments, then we re-parse argv, otherwise we just use the matches
|
||||
// we have here.
|
||||
let early_matches = ArgMatches::new(app::app().get_matches());
|
||||
let early_matches = ArgMatches::new(clap_matches(env::args_os())?);
|
||||
set_messages(!early_matches.is_present("no-messages"));
|
||||
set_ignore_messages(!early_matches.is_present("no-ignore-messages"));
|
||||
|
||||
@@ -143,7 +152,7 @@ impl Args {
|
||||
log::set_max_level(log::LevelFilter::Warn);
|
||||
}
|
||||
|
||||
let matches = early_matches.reconfigure();
|
||||
let matches = early_matches.reconfigure()?;
|
||||
// The logging level may have changed if we brought in additional
|
||||
// arguments from a configuration file, so recheck it and set the log
|
||||
// level as appropriate.
|
||||
@@ -232,7 +241,9 @@ impl Args {
|
||||
let threads = self.matches().threads()?;
|
||||
let one_thread = is_one_search || threads == 1;
|
||||
|
||||
Ok(if self.matches().is_present("type-list") {
|
||||
Ok(if self.matches().is_present("pcre2-version") {
|
||||
Command::PCRE2Version
|
||||
} else if self.matches().is_present("type-list") {
|
||||
Command::Types
|
||||
} else if self.matches().is_present("files") {
|
||||
if one_thread {
|
||||
@@ -265,6 +276,11 @@ impl Args {
|
||||
Ok(builder.build(wtr))
|
||||
}
|
||||
|
||||
/// Returns true if and only if ripgrep should be "quiet."
|
||||
pub fn quiet(&self) -> bool {
|
||||
self.matches().is_present("quiet")
|
||||
}
|
||||
|
||||
/// Returns true if and only if the search should quit after finding the
|
||||
/// first match.
|
||||
pub fn quit_after_match(&self) -> Result<bool> {
|
||||
@@ -278,15 +294,18 @@ impl Args {
|
||||
&self,
|
||||
wtr: W,
|
||||
) -> Result<SearchWorker<W>> {
|
||||
let matches = self.matches();
|
||||
let matcher = self.matcher().clone();
|
||||
let printer = self.printer(wtr)?;
|
||||
let searcher = self.matches().searcher(self.paths())?;
|
||||
let searcher = matches.searcher(self.paths())?;
|
||||
let mut builder = SearchWorkerBuilder::new();
|
||||
builder
|
||||
.json_stats(self.matches().is_present("json"))
|
||||
.preprocessor(self.matches().preprocessor())
|
||||
.preprocessor_globs(self.matches().preprocessor_globs()?)
|
||||
.search_zip(self.matches().is_present("search-zip"));
|
||||
.json_stats(matches.is_present("json"))
|
||||
.preprocessor(matches.preprocessor())
|
||||
.preprocessor_globs(matches.preprocessor_globs()?)
|
||||
.search_zip(matches.is_present("search-zip"))
|
||||
.binary_detection_implicit(matches.binary_detection_implicit())
|
||||
.binary_detection_explicit(matches.binary_detection_explicit());
|
||||
Ok(builder.build(matcher, searcher, printer))
|
||||
}
|
||||
|
||||
@@ -475,6 +494,37 @@ impl SortByKind {
|
||||
}
|
||||
}
|
||||
|
||||
/// Encoding mode the searcher will use.
|
||||
#[derive(Clone, Debug)]
|
||||
enum EncodingMode {
|
||||
/// Use an explicit encoding forcefully, but let BOM sniffing override it.
|
||||
Some(Encoding),
|
||||
/// Use only BOM sniffing to auto-detect an encoding.
|
||||
Auto,
|
||||
/// Use no explicit encoding and disable all BOM sniffing. This will
|
||||
/// always result in searching the raw bytes, regardless of their
|
||||
/// true encoding.
|
||||
Disabled,
|
||||
}
|
||||
|
||||
impl EncodingMode {
|
||||
/// Checks if an explicit encoding has been set. Returns false for
|
||||
/// automatic BOM sniffing and no sniffing.
|
||||
///
|
||||
/// This is only used to determine whether PCRE2 needs to have its own
|
||||
/// UTF-8 checking enabled. If we have an explicit encoding set, then
|
||||
/// we're always guaranteed to get UTF-8, so we can disable PCRE2's check.
|
||||
/// Otherwise, we have no such guarantee, and must enable PCRE2' UTF-8
|
||||
/// check.
|
||||
#[cfg(feature = "pcre2")]
|
||||
fn has_explicit_encoding(&self) -> bool {
|
||||
match self {
|
||||
EncodingMode::Some(_) => true,
|
||||
_ => false
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl ArgMatches {
|
||||
/// Create an ArgMatches from clap's parse result.
|
||||
fn new(clap_matches: clap::ArgMatches<'static>) -> ArgMatches {
|
||||
@@ -488,25 +538,27 @@ impl ArgMatches {
|
||||
///
|
||||
/// If there are no additional arguments from the environment (e.g., a
|
||||
/// config file), then the given matches are returned as is.
|
||||
fn reconfigure(self) -> ArgMatches {
|
||||
fn reconfigure(self) -> Result<ArgMatches> {
|
||||
// If the end user says no config, then respect it.
|
||||
if self.is_present("no-config") {
|
||||
debug!("not reading config files because --no-config is present");
|
||||
return self;
|
||||
log::debug!(
|
||||
"not reading config files because --no-config is present"
|
||||
);
|
||||
return Ok(self);
|
||||
}
|
||||
// If the user wants ripgrep to use a config file, then parse args
|
||||
// from that first.
|
||||
let mut args = config::args();
|
||||
if args.is_empty() {
|
||||
return self;
|
||||
return Ok(self);
|
||||
}
|
||||
let mut cliargs = env::args_os();
|
||||
if let Some(bin) = cliargs.next() {
|
||||
args.insert(0, bin);
|
||||
}
|
||||
args.extend(cliargs);
|
||||
debug!("final argv: {:?}", args);
|
||||
ArgMatches::new(app::app().get_matches_from(args))
|
||||
log::debug!("final argv: {:?}", args);
|
||||
Ok(ArgMatches(clap_matches(args)?))
|
||||
}
|
||||
|
||||
/// Convert the result of parsing CLI arguments into ripgrep's higher level
|
||||
@@ -547,6 +599,25 @@ impl ArgMatches {
|
||||
if self.is_present("pcre2") {
|
||||
let matcher = self.matcher_pcre2(patterns)?;
|
||||
Ok(PatternMatcher::PCRE2(matcher))
|
||||
} else if self.is_present("auto-hybrid-regex") {
|
||||
let rust_err = match self.matcher_rust(patterns) {
|
||||
Ok(matcher) => return Ok(PatternMatcher::RustRegex(matcher)),
|
||||
Err(err) => err,
|
||||
};
|
||||
log::debug!(
|
||||
"error building Rust regex in hybrid mode:\n{}", rust_err,
|
||||
);
|
||||
let pcre_err = match self.matcher_pcre2(patterns) {
|
||||
Ok(matcher) => return Ok(PatternMatcher::PCRE2(matcher)),
|
||||
Err(err) => err,
|
||||
};
|
||||
Err(From::from(format!(
|
||||
"regex could not be compiled with either the default regex \
|
||||
engine or with PCRE2.\n\n\
|
||||
default regex engine error:\n{}\n{}\n{}\n\n\
|
||||
PCRE2 regex engine error:\n{}",
|
||||
"~".repeat(79), rust_err, "~".repeat(79), pcre_err,
|
||||
)))
|
||||
} else {
|
||||
let matcher = match self.matcher_rust(patterns) {
|
||||
Ok(matcher) => matcher,
|
||||
@@ -615,7 +686,16 @@ impl ArgMatches {
|
||||
if let Some(limit) = self.dfa_size_limit()? {
|
||||
builder.dfa_size_limit(limit);
|
||||
}
|
||||
Ok(builder.build(&patterns.join("|"))?)
|
||||
let res =
|
||||
if self.is_present("fixed-strings") {
|
||||
builder.build_literals(patterns)
|
||||
} else {
|
||||
builder.build(&patterns.join("|"))
|
||||
};
|
||||
match res {
|
||||
Ok(m) => Ok(m),
|
||||
Err(err) => Err(From::from(suggest_multiline(err.to_string()))),
|
||||
}
|
||||
}
|
||||
|
||||
/// Build a matcher using PCRE2.
|
||||
@@ -632,12 +712,17 @@ impl ArgMatches {
|
||||
.word(self.is_present("word-regexp"));
|
||||
// For whatever reason, the JIT craps out during regex compilation with
|
||||
// a "no more memory" error on 32 bit systems. So don't use it there.
|
||||
if !cfg!(target_pointer_width = "32") {
|
||||
builder.jit(true);
|
||||
if cfg!(target_pointer_width = "64") {
|
||||
builder
|
||||
.jit_if_available(true)
|
||||
// The PCRE2 docs say that 32KB is the default, and that 1MB
|
||||
// should be big enough for anything. But let's crank it to
|
||||
// 10MB.
|
||||
.max_jit_stack_size(Some(10 * (1<<20)));
|
||||
}
|
||||
if self.pcre2_unicode() {
|
||||
builder.utf(true).ucp(true);
|
||||
if self.encoding()?.is_some() {
|
||||
if self.encoding()?.has_explicit_encoding() {
|
||||
// SAFETY: If an encoding was specified, then we're guaranteed
|
||||
// to get valid UTF-8, so we can disable PCRE2's UTF checking.
|
||||
// (Feeding invalid UTF-8 to PCRE2 is undefined behavior.)
|
||||
@@ -693,6 +778,7 @@ impl ArgMatches {
|
||||
.per_match(self.is_present("vimgrep"))
|
||||
.replacement(self.replacement())
|
||||
.max_columns(self.max_columns()?)
|
||||
.max_columns_preview(self.max_columns_preview())
|
||||
.max_matches(self.max_count()?)
|
||||
.column(self.column())
|
||||
.byte_offset(self.is_present("byte-offset"))
|
||||
@@ -752,9 +838,16 @@ impl ArgMatches {
|
||||
.before_context(ctx_before)
|
||||
.after_context(ctx_after)
|
||||
.passthru(self.is_present("passthru"))
|
||||
.memory_map(self.mmap_choice(paths))
|
||||
.binary_detection(self.binary_detection())
|
||||
.encoding(self.encoding()?);
|
||||
.memory_map(self.mmap_choice(paths));
|
||||
match self.encoding()? {
|
||||
EncodingMode::Some(enc) => {
|
||||
builder.encoding(Some(enc));
|
||||
}
|
||||
EncodingMode::Auto => {} // default for the searcher
|
||||
EncodingMode::Disabled => {
|
||||
builder.bom_sniffing(false);
|
||||
}
|
||||
}
|
||||
Ok(builder.build())
|
||||
}
|
||||
|
||||
@@ -779,18 +872,16 @@ impl ArgMatches {
|
||||
.max_filesize(self.max_file_size()?)
|
||||
.threads(self.threads()?)
|
||||
.same_file_system(self.is_present("one-file-system"))
|
||||
.skip_stdout(true)
|
||||
.skip_stdout(!self.is_present("files"))
|
||||
.overrides(self.overrides()?)
|
||||
.types(self.types()?)
|
||||
.hidden(!self.hidden())
|
||||
.parents(!self.no_ignore_parent())
|
||||
.ignore(!self.no_ignore())
|
||||
.git_global(
|
||||
!self.no_ignore()
|
||||
&& !self.no_ignore_vcs()
|
||||
&& !self.no_ignore_global())
|
||||
.git_ignore(!self.no_ignore() && !self.no_ignore_vcs())
|
||||
.git_exclude(!self.no_ignore() && !self.no_ignore_vcs());
|
||||
.ignore(!self.no_ignore_dot())
|
||||
.git_global(!self.no_ignore_vcs() && !self.no_ignore_global())
|
||||
.git_ignore(!self.no_ignore_vcs())
|
||||
.git_exclude(!self.no_ignore_vcs())
|
||||
.ignore_case_insensitive(self.ignore_file_case_insensitive());
|
||||
if !self.no_ignore() {
|
||||
builder.add_custom_ignore_filename(".rgignore");
|
||||
}
|
||||
@@ -806,16 +897,39 @@ impl ArgMatches {
|
||||
///
|
||||
/// Methods are sorted alphabetically.
|
||||
impl ArgMatches {
|
||||
/// Returns the form of binary detection to perform.
|
||||
fn binary_detection(&self) -> BinaryDetection {
|
||||
/// Returns the form of binary detection to perform on files that are
|
||||
/// implicitly searched via recursive directory traversal.
|
||||
fn binary_detection_implicit(&self) -> BinaryDetection {
|
||||
let none =
|
||||
self.is_present("text")
|
||||
|| self.is_present("null-data");
|
||||
let convert =
|
||||
self.is_present("binary")
|
||||
|| self.unrestricted_count() >= 3;
|
||||
if none {
|
||||
BinaryDetection::none()
|
||||
} else if convert {
|
||||
BinaryDetection::convert(b'\x00')
|
||||
} else {
|
||||
BinaryDetection::quit(b'\x00')
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns the form of binary detection to perform on files that are
|
||||
/// explicitly searched via the user invoking ripgrep on a particular
|
||||
/// file or files or stdin.
|
||||
///
|
||||
/// In general, this should never be BinaryDetection::quit, since that acts
|
||||
/// as a filter (but quitting immediately once a NUL byte is seen), and we
|
||||
/// should never filter out files that the user wants to explicitly search.
|
||||
fn binary_detection_explicit(&self) -> BinaryDetection {
|
||||
let none =
|
||||
self.is_present("text")
|
||||
|| self.unrestricted_count() >= 3
|
||||
|| self.is_present("null-data");
|
||||
if none {
|
||||
BinaryDetection::none()
|
||||
} else {
|
||||
BinaryDetection::quit(b'\x00')
|
||||
BinaryDetection::convert(b'\x00')
|
||||
}
|
||||
}
|
||||
|
||||
@@ -941,24 +1055,30 @@ impl ArgMatches {
|
||||
u64_to_usize("dfa-size-limit", r)
|
||||
}
|
||||
|
||||
/// Returns the type of encoding to use.
|
||||
/// Returns the encoding mode to use.
|
||||
///
|
||||
/// This only returns an encoding if one is explicitly specified. When no
|
||||
/// encoding is present, the Searcher will still do BOM sniffing for UTF-16
|
||||
/// and transcode seamlessly.
|
||||
fn encoding(&self) -> Result<Option<Encoding>> {
|
||||
/// This only returns an encoding if one is explicitly specified. Otherwise
|
||||
/// if set to automatic, the Searcher will do BOM sniffing for UTF-16
|
||||
/// and transcode seamlessly. If disabled, no BOM sniffing nor transcoding
|
||||
/// will occur.
|
||||
fn encoding(&self) -> Result<EncodingMode> {
|
||||
if self.is_present("no-encoding") {
|
||||
return Ok(None);
|
||||
return Ok(EncodingMode::Auto);
|
||||
}
|
||||
|
||||
let label = match self.value_of_lossy("encoding") {
|
||||
None if self.pcre2_unicode() => "utf-8".to_string(),
|
||||
None => return Ok(None),
|
||||
None => return Ok(EncodingMode::Auto),
|
||||
Some(label) => label,
|
||||
};
|
||||
|
||||
if label == "auto" {
|
||||
return Ok(None);
|
||||
return Ok(EncodingMode::Auto);
|
||||
} else if label == "none" {
|
||||
return Ok(EncodingMode::Disabled);
|
||||
}
|
||||
Ok(Some(Encoding::new(&label)?))
|
||||
|
||||
Ok(EncodingMode::Some(Encoding::new(&label)?))
|
||||
}
|
||||
|
||||
/// Return the file separator to use based on the CLI configuration.
|
||||
@@ -996,6 +1116,11 @@ impl ArgMatches {
|
||||
self.is_present("hidden") || self.unrestricted_count() >= 2
|
||||
}
|
||||
|
||||
/// Returns true if ignore files should be processed case insensitively.
|
||||
fn ignore_file_case_insensitive(&self) -> bool {
|
||||
self.is_present("ignore-file-case-insensitive")
|
||||
}
|
||||
|
||||
/// Return all of the ignore file paths given on the command line.
|
||||
fn ignore_paths(&self) -> Vec<PathBuf> {
|
||||
let paths = match self.values_of_os("ignore-file") {
|
||||
@@ -1050,6 +1175,12 @@ impl ArgMatches {
|
||||
Ok(self.usize_of_nonzero("max-columns")?.map(|n| n as u64))
|
||||
}
|
||||
|
||||
/// Returns true if and only if a preview should be shown for lines that
|
||||
/// exceed the maximum column limit.
|
||||
fn max_columns_preview(&self) -> bool {
|
||||
self.is_present("max-columns-preview")
|
||||
}
|
||||
|
||||
/// The maximum number of matches permitted.
|
||||
fn max_count(&self) -> Result<Option<u64>> {
|
||||
Ok(self.usize_of("max-count")?.map(|n| n as u64))
|
||||
@@ -1090,6 +1221,11 @@ impl ArgMatches {
|
||||
self.is_present("no-ignore") || self.unrestricted_count() >= 1
|
||||
}
|
||||
|
||||
/// Returns true if .ignore files should be ignored.
|
||||
fn no_ignore_dot(&self) -> bool {
|
||||
self.is_present("no-ignore-dot") || self.no_ignore()
|
||||
}
|
||||
|
||||
/// Returns true if global ignore files should be ignored.
|
||||
fn no_ignore_global(&self) -> bool {
|
||||
self.is_present("no-ignore-global") || self.no_ignore()
|
||||
@@ -1136,7 +1272,7 @@ impl ArgMatches {
|
||||
builder.add(&glob)?;
|
||||
}
|
||||
// This only enables case insensitivity for subsequent globs.
|
||||
builder.case_insensitive(true)?;
|
||||
builder.case_insensitive(true).unwrap();
|
||||
for glob in self.values_of_lossy_vec("iglob") {
|
||||
builder.add(&glob)?;
|
||||
}
|
||||
@@ -1174,7 +1310,8 @@ impl ArgMatches {
|
||||
!cli::is_readable_stdin()
|
||||
|| (self.is_present("file") && file_is_stdin)
|
||||
|| self.is_present("files")
|
||||
|| self.is_present("type-list");
|
||||
|| self.is_present("type-list")
|
||||
|| self.is_present("pcre2-version");
|
||||
if search_cwd {
|
||||
Path::new("./").to_path_buf()
|
||||
} else {
|
||||
@@ -1250,9 +1387,15 @@ impl ArgMatches {
|
||||
if let Some(paths) = self.values_of_os("file") {
|
||||
for path in paths {
|
||||
if path == "-" {
|
||||
pats.extend(cli::patterns_from_stdin()?);
|
||||
pats.extend(cli::patterns_from_stdin()?
|
||||
.into_iter()
|
||||
.map(|p| self.pattern_from_string(p))
|
||||
);
|
||||
} else {
|
||||
pats.extend(cli::patterns_from_path(path)?);
|
||||
pats.extend(cli::patterns_from_path(path)?
|
||||
.into_iter()
|
||||
.map(|p| self.pattern_from_string(p))
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1281,13 +1424,17 @@ impl ArgMatches {
|
||||
/// Converts a &str pattern to a String pattern. The pattern is escaped
|
||||
/// if -F/--fixed-strings is set.
|
||||
fn pattern_from_str(&self, pat: &str) -> String {
|
||||
let litpat = self.pattern_literal(pat.to_string());
|
||||
let s = self.pattern_line(litpat);
|
||||
self.pattern_from_string(pat.to_string())
|
||||
}
|
||||
|
||||
if s.is_empty() {
|
||||
/// Applies additional processing on the given pattern if necessary
|
||||
/// (such as escaping meta characters or turning it into a line regex).
|
||||
fn pattern_from_string(&self, pat: String) -> String {
|
||||
let pat = self.pattern_line(self.pattern_literal(pat));
|
||||
if pat.is_empty() {
|
||||
self.pattern_empty()
|
||||
} else {
|
||||
s
|
||||
pat
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1546,6 +1693,17 @@ and look-around.", msg)
|
||||
}
|
||||
}
|
||||
|
||||
fn suggest_multiline(msg: String) -> String {
|
||||
if msg.contains("the literal") && msg.contains("not allowed") {
|
||||
format!("{}
|
||||
|
||||
Consider enabling multiline mode with the --multiline flag (or -U for short).
|
||||
When multiline mode is enabled, new line characters can be matched.", msg)
|
||||
} else {
|
||||
msg
|
||||
}
|
||||
}
|
||||
|
||||
/// Convert the result of parsing a human readable file size to a `usize`,
|
||||
/// failing if the type does not fit.
|
||||
fn u64_to_usize(
|
||||
@@ -1592,3 +1750,32 @@ where G: Fn(&fs::Metadata) -> io::Result<SystemTime>
|
||||
t1.cmp(&t2)
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns a clap matches object if the given arguments parse successfully.
|
||||
///
|
||||
/// Otherwise, if an error occurred, then it is returned unless the error
|
||||
/// corresponds to a `--help` or `--version` request. In which case, the
|
||||
/// corresponding output is printed and the current process is exited
|
||||
/// successfully.
|
||||
fn clap_matches<I, T>(
|
||||
args: I,
|
||||
) -> Result<clap::ArgMatches<'static>>
|
||||
where I: IntoIterator<Item=T>,
|
||||
T: Into<OsString> + Clone
|
||||
{
|
||||
let err = match app::app().get_matches_from_safe(args) {
|
||||
Ok(matches) => return Ok(matches),
|
||||
Err(err) => err,
|
||||
};
|
||||
if err.use_stderr() {
|
||||
return Err(err.into());
|
||||
}
|
||||
// Explicitly ignore any error returned by write!. The most likely error
|
||||
// at this point is a broken pipe error, in which case, we want to ignore
|
||||
// it and exit quietly.
|
||||
//
|
||||
// (This is the point of this helper function. clap's functionality for
|
||||
// doing this will panic on a broken pipe error.)
|
||||
let _ = write!(io::stdout(), "{}", err);
|
||||
process::exit(0);
|
||||
}
|
||||
|
@@ -5,11 +5,14 @@
|
||||
use std::env;
|
||||
use std::error::Error;
|
||||
use std::fs::File;
|
||||
use std::io::{self, BufRead};
|
||||
use std::io;
|
||||
use std::ffi::OsString;
|
||||
use std::path::{Path, PathBuf};
|
||||
|
||||
use Result;
|
||||
use bstr::io::BufReadExt;
|
||||
use log;
|
||||
|
||||
use crate::Result;
|
||||
|
||||
/// Return a sequence of arguments derived from ripgrep rc configuration files.
|
||||
pub fn args() -> Vec<OsString> {
|
||||
@@ -34,7 +37,7 @@ pub fn args() -> Vec<OsString> {
|
||||
message!("{}:{}", config_path.display(), err);
|
||||
}
|
||||
}
|
||||
debug!(
|
||||
log::debug!(
|
||||
"{}: arguments loaded from config file: {:?}",
|
||||
config_path.display(),
|
||||
args
|
||||
@@ -74,62 +77,29 @@ fn parse<P: AsRef<Path>>(
|
||||
fn parse_reader<R: io::Read>(
|
||||
rdr: R,
|
||||
) -> Result<(Vec<OsString>, Vec<Box<Error>>)> {
|
||||
let mut bufrdr = io::BufReader::new(rdr);
|
||||
let bufrdr = io::BufReader::new(rdr);
|
||||
let (mut args, mut errs) = (vec![], vec![]);
|
||||
let mut line = vec![];
|
||||
let mut line_number = 0;
|
||||
while {
|
||||
line.clear();
|
||||
bufrdr.for_byte_line_with_terminator(|line| {
|
||||
line_number += 1;
|
||||
bufrdr.read_until(b'\n', &mut line)? > 0
|
||||
} {
|
||||
trim(&mut line);
|
||||
|
||||
let line = line.trim();
|
||||
if line.is_empty() || line[0] == b'#' {
|
||||
continue;
|
||||
return Ok(true);
|
||||
}
|
||||
match bytes_to_os_string(&line) {
|
||||
match line.to_os_str() {
|
||||
Ok(osstr) => {
|
||||
args.push(osstr);
|
||||
args.push(osstr.to_os_string());
|
||||
}
|
||||
Err(err) => {
|
||||
errs.push(format!("{}: {}", line_number, err).into());
|
||||
}
|
||||
}
|
||||
}
|
||||
Ok(true)
|
||||
})?;
|
||||
Ok((args, errs))
|
||||
}
|
||||
|
||||
/// Trim the given bytes of whitespace according to the ASCII definition.
|
||||
fn trim(x: &mut Vec<u8>) {
|
||||
let upto = x.iter().take_while(|b| is_space(**b)).count();
|
||||
x.drain(..upto);
|
||||
let revto = x.len() - x.iter().rev().take_while(|b| is_space(**b)).count();
|
||||
x.drain(revto..);
|
||||
}
|
||||
|
||||
/// Returns true if and only if the given byte is an ASCII space character.
|
||||
fn is_space(b: u8) -> bool {
|
||||
b == b'\t'
|
||||
|| b == b'\n'
|
||||
|| b == b'\x0B'
|
||||
|| b == b'\x0C'
|
||||
|| b == b'\r'
|
||||
|| b == b' '
|
||||
}
|
||||
|
||||
/// On Unix, get an OsString from raw bytes.
|
||||
#[cfg(unix)]
|
||||
fn bytes_to_os_string(bytes: &[u8]) -> Result<OsString> {
|
||||
use std::os::unix::ffi::OsStringExt;
|
||||
Ok(OsString::from_vec(bytes.to_vec()))
|
||||
}
|
||||
|
||||
/// On non-Unix (like Windows), require UTF-8.
|
||||
#[cfg(not(unix))]
|
||||
fn bytes_to_os_string(bytes: &[u8]) -> Result<OsString> {
|
||||
String::from_utf8(bytes.to_vec()).map(OsString::from).map_err(From::from)
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use std::ffi::OsString;
|
||||
|
92
src/main.rs
92
src/main.rs
@@ -1,17 +1,3 @@
|
||||
#[macro_use]
|
||||
extern crate clap;
|
||||
extern crate grep;
|
||||
extern crate ignore;
|
||||
#[macro_use]
|
||||
extern crate lazy_static;
|
||||
#[macro_use]
|
||||
extern crate log;
|
||||
extern crate num_cpus;
|
||||
extern crate regex;
|
||||
#[macro_use]
|
||||
extern crate serde_json;
|
||||
extern crate termcolor;
|
||||
|
||||
use std::io::{self, Write};
|
||||
use std::process;
|
||||
use std::sync::{Arc, Mutex};
|
||||
@@ -36,33 +22,38 @@ mod subject;
|
||||
type Result<T> = ::std::result::Result<T, Box<::std::error::Error>>;
|
||||
|
||||
fn main() {
|
||||
match Args::parse().and_then(try_main) {
|
||||
Ok(true) => process::exit(0),
|
||||
Ok(false) => process::exit(1),
|
||||
Err(err) => {
|
||||
eprintln!("{}", err);
|
||||
process::exit(2);
|
||||
}
|
||||
if let Err(err) = Args::parse().and_then(try_main) {
|
||||
eprintln!("{}", err);
|
||||
process::exit(2);
|
||||
}
|
||||
}
|
||||
|
||||
fn try_main(args: Args) -> Result<bool> {
|
||||
fn try_main(args: Args) -> Result<()> {
|
||||
use args::Command::*;
|
||||
|
||||
match args.command()? {
|
||||
Search => search(args),
|
||||
SearchParallel => search_parallel(args),
|
||||
SearchNever => Ok(false),
|
||||
Files => files(args),
|
||||
FilesParallel => files_parallel(args),
|
||||
Types => types(args),
|
||||
let matched =
|
||||
match args.command()? {
|
||||
Search => search(&args),
|
||||
SearchParallel => search_parallel(&args),
|
||||
SearchNever => Ok(false),
|
||||
Files => files(&args),
|
||||
FilesParallel => files_parallel(&args),
|
||||
Types => types(&args),
|
||||
PCRE2Version => pcre2_version(&args),
|
||||
}?;
|
||||
if matched && (args.quiet() || !messages::errored()) {
|
||||
process::exit(0)
|
||||
} else if messages::errored() {
|
||||
process::exit(2)
|
||||
} else {
|
||||
process::exit(1)
|
||||
}
|
||||
}
|
||||
|
||||
/// The top-level entry point for single-threaded search. This recursively
|
||||
/// steps through the file list (current directory by default) and searches
|
||||
/// each file sequentially.
|
||||
fn search(args: Args) -> Result<bool> {
|
||||
fn search(args: &Args) -> Result<bool> {
|
||||
let started_at = Instant::now();
|
||||
let quit_after_match = args.quit_after_match()?;
|
||||
let subject_builder = args.subject_builder();
|
||||
@@ -82,7 +73,7 @@ fn search(args: Args) -> Result<bool> {
|
||||
if err.kind() == io::ErrorKind::BrokenPipe {
|
||||
break;
|
||||
}
|
||||
message!("{}: {}", subject.path().display(), err);
|
||||
err_message!("{}: {}", subject.path().display(), err);
|
||||
continue;
|
||||
}
|
||||
};
|
||||
@@ -105,7 +96,7 @@ fn search(args: Args) -> Result<bool> {
|
||||
/// The top-level entry point for multi-threaded search. The parallelism is
|
||||
/// itself achieved by the recursive directory traversal. All we need to do is
|
||||
/// feed it a worker for performing a search on each file.
|
||||
fn search_parallel(args: Args) -> Result<bool> {
|
||||
fn search_parallel(args: &Args) -> Result<bool> {
|
||||
use std::sync::atomic::AtomicBool;
|
||||
use std::sync::atomic::Ordering::SeqCst;
|
||||
|
||||
@@ -141,7 +132,7 @@ fn search_parallel(args: Args) -> Result<bool> {
|
||||
let search_result = match searcher.search(&subject) {
|
||||
Ok(search_result) => search_result,
|
||||
Err(err) => {
|
||||
message!("{}: {}", subject.path().display(), err);
|
||||
err_message!("{}: {}", subject.path().display(), err);
|
||||
return WalkState::Continue;
|
||||
}
|
||||
};
|
||||
@@ -158,7 +149,7 @@ fn search_parallel(args: Args) -> Result<bool> {
|
||||
return WalkState::Quit;
|
||||
}
|
||||
// Otherwise, we continue on our merry way.
|
||||
message!("{}: {}", subject.path().display(), err);
|
||||
err_message!("{}: {}", subject.path().display(), err);
|
||||
}
|
||||
if matched.load(SeqCst) && quit_after_match {
|
||||
WalkState::Quit
|
||||
@@ -183,7 +174,7 @@ fn search_parallel(args: Args) -> Result<bool> {
|
||||
/// The top-level entry point for listing files without searching them. This
|
||||
/// recursively steps through the file list (current directory by default) and
|
||||
/// prints each path sequentially using a single thread.
|
||||
fn files(args: Args) -> Result<bool> {
|
||||
fn files(args: &Args) -> Result<bool> {
|
||||
let quit_after_match = args.quit_after_match()?;
|
||||
let subject_builder = args.subject_builder();
|
||||
let mut matched = false;
|
||||
@@ -213,7 +204,7 @@ fn files(args: Args) -> Result<bool> {
|
||||
/// The top-level entry point for listing files without searching them. This
|
||||
/// recursively steps through the file list (current directory by default) and
|
||||
/// prints each path sequentially using multiple threads.
|
||||
fn files_parallel(args: Args) -> Result<bool> {
|
||||
fn files_parallel(args: &Args) -> Result<bool> {
|
||||
use std::sync::atomic::AtomicBool;
|
||||
use std::sync::atomic::Ordering::SeqCst;
|
||||
use std::sync::mpsc;
|
||||
@@ -265,7 +256,7 @@ fn files_parallel(args: Args) -> Result<bool> {
|
||||
}
|
||||
|
||||
/// The top-level entry point for --type-list.
|
||||
fn types(args: Args) -> Result<bool> {
|
||||
fn types(args: &Args) -> Result<bool> {
|
||||
let mut count = 0;
|
||||
let mut stdout = args.stdout();
|
||||
for def in args.type_defs()? {
|
||||
@@ -285,3 +276,30 @@ fn types(args: Args) -> Result<bool> {
|
||||
}
|
||||
Ok(count > 0)
|
||||
}
|
||||
|
||||
/// The top-level entry point for --pcre2-version.
|
||||
fn pcre2_version(args: &Args) -> Result<bool> {
|
||||
#[cfg(feature = "pcre2")]
|
||||
fn imp(args: &Args) -> Result<bool> {
|
||||
use grep::pcre2;
|
||||
|
||||
let mut stdout = args.stdout();
|
||||
|
||||
let (major, minor) = pcre2::version();
|
||||
writeln!(stdout, "PCRE2 {}.{} is available", major, minor)?;
|
||||
|
||||
if cfg!(target_pointer_width = "64") && pcre2::is_jit_available() {
|
||||
writeln!(stdout, "JIT is available")?;
|
||||
}
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
#[cfg(not(feature = "pcre2"))]
|
||||
fn imp(args: &Args) -> Result<bool> {
|
||||
let mut stdout = args.stdout();
|
||||
writeln!(stdout, "PCRE2 is not available in this build of ripgrep.")?;
|
||||
Ok(false)
|
||||
}
|
||||
|
||||
imp(args)
|
||||
}
|
||||
|
@@ -1,21 +1,35 @@
|
||||
use std::sync::atomic::{ATOMIC_BOOL_INIT, AtomicBool, Ordering};
|
||||
use std::sync::atomic::{AtomicBool, Ordering};
|
||||
|
||||
static MESSAGES: AtomicBool = ATOMIC_BOOL_INIT;
|
||||
static IGNORE_MESSAGES: AtomicBool = ATOMIC_BOOL_INIT;
|
||||
static MESSAGES: AtomicBool = AtomicBool::new(false);
|
||||
static IGNORE_MESSAGES: AtomicBool = AtomicBool::new(false);
|
||||
static ERRORED: AtomicBool = AtomicBool::new(false);
|
||||
|
||||
/// Emit a non-fatal error message, unless messages were disabled.
|
||||
#[macro_export]
|
||||
macro_rules! message {
|
||||
($($tt:tt)*) => {
|
||||
if ::messages::messages() {
|
||||
if crate::messages::messages() {
|
||||
eprintln!($($tt)*);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Like message, but sets ripgrep's "errored" flag, which controls the exit
|
||||
/// status.
|
||||
#[macro_export]
|
||||
macro_rules! err_message {
|
||||
($($tt:tt)*) => {
|
||||
crate::messages::set_errored();
|
||||
message!($($tt)*);
|
||||
}
|
||||
}
|
||||
|
||||
/// Emit a non-fatal ignore-related error message (like a parse error), unless
|
||||
/// ignore-messages were disabled.
|
||||
#[macro_export]
|
||||
macro_rules! ignore_message {
|
||||
($($tt:tt)*) => {
|
||||
if ::messages::messages() && ::messages::ignore_messages() {
|
||||
if crate::messages::messages() && crate::messages::ignore_messages() {
|
||||
eprintln!($($tt)*);
|
||||
}
|
||||
}
|
||||
@@ -48,3 +62,13 @@ pub fn ignore_messages() -> bool {
|
||||
pub fn set_ignore_messages(yes: bool) {
|
||||
IGNORE_MESSAGES.store(yes, Ordering::SeqCst)
|
||||
}
|
||||
|
||||
/// Returns true if and only if ripgrep came across a non-fatal error.
|
||||
pub fn errored() -> bool {
|
||||
ERRORED.load(Ordering::SeqCst)
|
||||
}
|
||||
|
||||
/// Indicate that ripgrep has come across a non-fatal error.
|
||||
pub fn set_errored() {
|
||||
ERRORED.store(true, Ordering::SeqCst);
|
||||
}
|
||||
|
@@ -10,12 +10,13 @@ use grep::matcher::Matcher;
|
||||
use grep::pcre2::{RegexMatcher as PCRE2RegexMatcher};
|
||||
use grep::printer::{JSON, Standard, Summary, Stats};
|
||||
use grep::regex::{RegexMatcher as RustRegexMatcher};
|
||||
use grep::searcher::Searcher;
|
||||
use grep::searcher::{BinaryDetection, Searcher};
|
||||
use ignore::overrides::Override;
|
||||
use serde_json as json;
|
||||
use serde_json::json;
|
||||
use termcolor::WriteColor;
|
||||
|
||||
use subject::Subject;
|
||||
use crate::subject::Subject;
|
||||
|
||||
/// The configuration for the search worker. Among a few other things, the
|
||||
/// configuration primarily controls the way we show search results to users
|
||||
@@ -26,6 +27,8 @@ struct Config {
|
||||
preprocessor: Option<PathBuf>,
|
||||
preprocessor_globs: Override,
|
||||
search_zip: bool,
|
||||
binary_implicit: BinaryDetection,
|
||||
binary_explicit: BinaryDetection,
|
||||
}
|
||||
|
||||
impl Default for Config {
|
||||
@@ -35,6 +38,8 @@ impl Default for Config {
|
||||
preprocessor: None,
|
||||
preprocessor_globs: Override::empty(),
|
||||
search_zip: false,
|
||||
binary_implicit: BinaryDetection::none(),
|
||||
binary_explicit: BinaryDetection::none(),
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -133,6 +138,37 @@ impl SearchWorkerBuilder {
|
||||
self.config.search_zip = yes;
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the binary detection that should be used when searching files
|
||||
/// found via a recursive directory search.
|
||||
///
|
||||
/// Generally, this binary detection may be `BinaryDetection::quit` if
|
||||
/// we want to skip binary files completely.
|
||||
///
|
||||
/// By default, no binary detection is performed.
|
||||
pub fn binary_detection_implicit(
|
||||
&mut self,
|
||||
detection: BinaryDetection,
|
||||
) -> &mut SearchWorkerBuilder {
|
||||
self.config.binary_implicit = detection;
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the binary detection that should be used when searching files
|
||||
/// explicitly supplied by an end user.
|
||||
///
|
||||
/// Generally, this binary detection should NOT be `BinaryDetection::quit`,
|
||||
/// since we never want to automatically filter files supplied by the end
|
||||
/// user.
|
||||
///
|
||||
/// By default, no binary detection is performed.
|
||||
pub fn binary_detection_explicit(
|
||||
&mut self,
|
||||
detection: BinaryDetection,
|
||||
) -> &mut SearchWorkerBuilder {
|
||||
self.config.binary_explicit = detection;
|
||||
self
|
||||
}
|
||||
}
|
||||
|
||||
/// The result of executing a search.
|
||||
@@ -307,6 +343,14 @@ impl<W: WriteColor> SearchWorker<W> {
|
||||
|
||||
/// Search the given subject using the appropriate strategy.
|
||||
fn search_impl(&mut self, subject: &Subject) -> io::Result<SearchResult> {
|
||||
let bin =
|
||||
if subject.is_explicit() {
|
||||
self.config.binary_explicit.clone()
|
||||
} else {
|
||||
self.config.binary_implicit.clone()
|
||||
};
|
||||
self.searcher.set_binary_detection(bin);
|
||||
|
||||
let path = subject.path();
|
||||
if subject.is_stdin() {
|
||||
let stdin = io::stdin();
|
||||
|
@@ -1,6 +1,7 @@
|
||||
use std::path::Path;
|
||||
|
||||
use ignore::{self, DirEntry};
|
||||
use log;
|
||||
|
||||
/// A configuration for describing how subjects should be built.
|
||||
#[derive(Clone, Debug)]
|
||||
@@ -40,7 +41,7 @@ impl SubjectBuilder {
|
||||
match result {
|
||||
Ok(dent) => self.build(dent),
|
||||
Err(err) => {
|
||||
message!("{}", err);
|
||||
err_message!("{}", err);
|
||||
None
|
||||
}
|
||||
}
|
||||
@@ -58,17 +59,12 @@ impl SubjectBuilder {
|
||||
if let Some(ignore_err) = subj.dent.error() {
|
||||
ignore_message!("{}", ignore_err);
|
||||
}
|
||||
// If this entry represents stdin, then we always search it.
|
||||
if subj.dent.is_stdin() {
|
||||
// If this entry was explicitly provided by an end user, then we always
|
||||
// want to search it.
|
||||
if subj.is_explicit() {
|
||||
return Some(subj);
|
||||
}
|
||||
// If this subject has a depth of 0, then it was provided explicitly
|
||||
// by an end user (or via a shell glob). In this case, we always want
|
||||
// to search it if it even smells like a file (e.g., a symlink).
|
||||
if subj.dent.depth() == 0 && !subj.is_dir() {
|
||||
return Some(subj);
|
||||
}
|
||||
// At this point, we only want to search something it's explicitly a
|
||||
// At this point, we only want to search something if it's explicitly a
|
||||
// file. This omits symlinks. (If ripgrep was configured to follow
|
||||
// symlinks, then they have already been followed by the directory
|
||||
// traversal.)
|
||||
@@ -79,7 +75,7 @@ impl SubjectBuilder {
|
||||
// directory. Otherwise, emitting messages for directories is just
|
||||
// noisy.
|
||||
if !subj.is_dir() {
|
||||
debug!(
|
||||
log::debug!(
|
||||
"ignoring {}: failed to pass subject filter: \
|
||||
file type: {:?}, metadata: {:?}",
|
||||
subj.dent.path().display(),
|
||||
@@ -126,9 +122,39 @@ impl Subject {
|
||||
self.dent.is_stdin()
|
||||
}
|
||||
|
||||
/// Returns true if and only if this subject points to a directory.
|
||||
/// Returns true if and only if this entry corresponds to a subject to
|
||||
/// search that was explicitly supplied by an end user.
|
||||
///
|
||||
/// Generally, this corresponds to either stdin or an explicit file path
|
||||
/// argument. e.g., in `rg foo some-file ./some-dir/`, `some-file` is
|
||||
/// an explicit subject, but, e.g., `./some-dir/some-other-file` is not.
|
||||
///
|
||||
/// However, note that ripgrep does not see through shell globbing. e.g.,
|
||||
/// in `rg foo ./some-dir/*`, `./some-dir/some-other-file` will be treated
|
||||
/// as an explicit subject.
|
||||
pub fn is_explicit(&self) -> bool {
|
||||
// stdin is obvious. When an entry has a depth of 0, that means it
|
||||
// was explicitly provided to our directory iterator, which means it
|
||||
// was in turn explicitly provided by the end user. The !is_dir check
|
||||
// means that we want to search files even if their symlinks, again,
|
||||
// because they were explicitly provided. (And we never want to try
|
||||
// to search a directory.)
|
||||
self.is_stdin() || (self.dent.depth() == 0 && !self.is_dir())
|
||||
}
|
||||
|
||||
/// Returns true if and only if this subject points to a directory after
|
||||
/// following symbolic links.
|
||||
fn is_dir(&self) -> bool {
|
||||
self.dent.file_type().map_or(false, |ft| ft.is_dir())
|
||||
let ft = match self.dent.file_type() {
|
||||
None => return false,
|
||||
Some(ft) => ft,
|
||||
};
|
||||
if ft.is_dir() {
|
||||
return true;
|
||||
}
|
||||
// If this is a symlink, then we want to follow it to determine
|
||||
// whether it's a directory or not.
|
||||
self.dent.path_is_symlink() && self.dent.path().is_dir()
|
||||
}
|
||||
|
||||
/// Returns true if and only if this subject points to a file.
|
||||
|
315
tests/binary.rs
Normal file
315
tests/binary.rs
Normal file
@@ -0,0 +1,315 @@
|
||||
use crate::util::{Dir, TestCommand};
|
||||
|
||||
// This file contains a smattering of tests specifically for checking ripgrep's
|
||||
// handling of binary files. There's quite a bit of discussion on this in this
|
||||
// bug report: https://github.com/BurntSushi/ripgrep/issues/306
|
||||
|
||||
// Our haystack is the first 500 lines of Gutenberg's copy of "A Study in
|
||||
// Scarlet," with a NUL byte at line 237: `abcdef\x00`.
|
||||
//
|
||||
// The position and size of the haystack is, unfortunately, significant. In
|
||||
// particular, the NUL byte is specifically inserted at some point *after* the
|
||||
// first 8192 bytes, which corresponds to the initial capacity of the buffer
|
||||
// that ripgrep uses to read files. (grep for DEFAULT_BUFFER_CAPACITY.) The
|
||||
// position of the NUL byte ensures that we can execute some search on the
|
||||
// initial buffer contents without ever detecting any binary data. Moreover,
|
||||
// when using a memory map for searching, only the first 8192 bytes are
|
||||
// scanned for a NUL byte, so no binary bytes are detected at all when using
|
||||
// a memory map (unless our query matches line 237).
|
||||
//
|
||||
// One last note: in the tests below, we use --no-mmap heavily because binary
|
||||
// detection with memory maps is a bit different. Namely, NUL bytes are only
|
||||
// searched for in the first few KB of the file and in a match. Normally, NUL
|
||||
// bytes are searched for everywhere.
|
||||
//
|
||||
// TODO: Add tests for binary file detection when using memory maps.
|
||||
const HAY: &'static [u8] = include_bytes!("./data/sherlock-nul.txt");
|
||||
|
||||
// This tests that ripgrep prints a warning message if it finds and prints a
|
||||
// match in a binary file before detecting that it is a binary file. The point
|
||||
// here is to notify that user that the search of the file is only partially
|
||||
// complete.
|
||||
//
|
||||
// This applies to files that are *implicitly* searched via a recursive
|
||||
// directory traversal. In particular, this results in a WARNING message being
|
||||
// printed. We make our file "implicit" by doing a recursive search with a glob
|
||||
// that matches our file.
|
||||
rgtest!(after_match1_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
WARNING: stopped searching binary file hay after match (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, except we provide a file to search
|
||||
// explicitly. This results in identical behavior, but a different message.
|
||||
rgtest!(after_match1_explicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Project Gutenberg EBook", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_explicit, except we feed our content on stdin.
|
||||
rgtest!(after_match1_stdin, |_: Dir, mut cmd: TestCommand| {
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Project Gutenberg EBook",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.pipe(HAY));
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, but provides the --binary flag, which
|
||||
// disables binary filtering. Thus, this matches the behavior of ripgrep as
|
||||
// if the file were given explicitly.
|
||||
rgtest!(after_match1_implicit_binary, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--binary", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
Binary file hay matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(after_match1_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit_text, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(after_match1_explicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "Project Gutenberg EBook", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, except this asks ripgrep to print all matching
|
||||
// files.
|
||||
//
|
||||
// This is an interesting corner case that one might consider a bug, however,
|
||||
// it's unlikely to be fixed. Namely, ripgrep probably shouldn't print `hay`
|
||||
// as a matching file since it is in fact a binary file, and thus should be
|
||||
// filtered out by default. However, the --files-with-matches flag will print
|
||||
// out the path of a matching file as soon as a match is seen and then stop
|
||||
// searching completely. Therefore, the NUL byte is never actually detected.
|
||||
//
|
||||
// The only way to fix this would be to kill ripgrep's performance in this case
|
||||
// and continue searching the entire file for a NUL byte. (Similarly if the
|
||||
// --quiet flag is set. See the next test.)
|
||||
rgtest!(after_match1_implicit_path, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-l", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
eqnice!("hay\n", cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit_path, except this indicates that a match was
|
||||
// found with no other output. (This is the same bug described above, but
|
||||
// manifest as an exit code with no output.)
|
||||
rgtest!(after_match1_implicit_quiet, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-q", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
eqnice!("", cmd.stdout());
|
||||
});
|
||||
|
||||
// This sets up the same test as after_match1_implicit_path, but instead of
|
||||
// just printing the matching files, this includes the full count of matches.
|
||||
// In this case, we need to search the entire file, so ripgrep correctly
|
||||
// detects the binary data and suppresses output.
|
||||
rgtest!(after_match1_implicit_count, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-c", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
cmd.assert_err();
|
||||
});
|
||||
|
||||
// Like after_match1_implicit_count, except the --binary flag is provided,
|
||||
// which makes ripgrep disable binary data filtering even for implicit files.
|
||||
rgtest!(after_match1_implicit_count_binary, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-c", "--binary",
|
||||
"Project Gutenberg EBook",
|
||||
"-g", "hay",
|
||||
]);
|
||||
eqnice!("hay:1\n", cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit_count, except the file path is provided
|
||||
// explicitly, so binary filtering is disabled and a count is correctly
|
||||
// reported.
|
||||
rgtest!(after_match1_explicit_count, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-c", "Project Gutenberg EBook", "hay",
|
||||
]);
|
||||
eqnice!("1\n", cmd.stdout());
|
||||
});
|
||||
|
||||
// This tests that a match way before the NUL byte is shown, but a match after
|
||||
// the NUL byte is not.
|
||||
rgtest!(after_match2_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n",
|
||||
"Project Gutenberg EBook|a medical student",
|
||||
"-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
WARNING: stopped searching binary file hay after match (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match2_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(after_match2_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text",
|
||||
"Project Gutenberg EBook|a medical student",
|
||||
"-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
hay:236:\"And yet you say he is not a medical student?\"
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// This tests that ripgrep *silently* quits before finding a match that occurs
|
||||
// after a NUL byte.
|
||||
rgtest!(before_match1_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Heaven", "-g", "hay",
|
||||
]);
|
||||
cmd.assert_err();
|
||||
});
|
||||
|
||||
// This tests that ripgrep *does not* silently quit before finding a match that
|
||||
// occurs after a NUL byte when a file is explicitly searched.
|
||||
rgtest!(before_match1_explicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Heaven", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like before_match1_implicit, but enables the --binary flag, which
|
||||
// disables binary filtering. Thus, this matches the behavior of ripgrep as if
|
||||
// the file were given explicitly.
|
||||
rgtest!(before_match1_implicit_binary, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--binary", "Heaven", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
Binary file hay matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like before_match1_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(before_match1_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "Heaven", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:238:\"No. Heaven knows what the objects of his studies are. But here we
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// This tests that ripgrep *silently* quits before finding a match that occurs
|
||||
// before a NUL byte, but within the same buffer as the NUL byte.
|
||||
rgtest!(before_match2_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "a medical student", "-g", "hay",
|
||||
]);
|
||||
cmd.assert_err();
|
||||
});
|
||||
|
||||
// This tests that ripgrep *does not* silently quit before finding a match that
|
||||
// occurs before a NUL byte, but within the same buffer as the NUL byte. Even
|
||||
// though the match occurs before the NUL byte, ripgrep still doesn't print it
|
||||
// because it has already scanned ahead to detect the NUL byte. (This matches
|
||||
// the behavior of GNU grep.)
|
||||
rgtest!(before_match2_explicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "a medical student", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like before_match1_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(before_match2_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "a medical student", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:236:\"And yet you say he is not a medical student?\"
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
500
tests/data/sherlock-nul.txt
Normal file
500
tests/data/sherlock-nul.txt
Normal file
@@ -0,0 +1,500 @@
|
||||
The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
|
||||
This eBook is for the use of anyone anywhere at no cost and with
|
||||
almost no restrictions whatsoever. You may copy it, give it away or
|
||||
re-use it under the terms of the Project Gutenberg License included
|
||||
with this eBook or online at www.gutenberg.org
|
||||
|
||||
|
||||
Title: A Study In Scarlet
|
||||
|
||||
Author: Arthur Conan Doyle
|
||||
|
||||
Posting Date: July 12, 2008 [EBook #244]
|
||||
Release Date: April, 1995
|
||||
[Last updated: February 17, 2013]
|
||||
|
||||
Language: English
|
||||
|
||||
|
||||
*** START OF THIS PROJECT GUTENBERG EBOOK A STUDY IN SCARLET ***
|
||||
|
||||
|
||||
|
||||
|
||||
Produced by Roger Squires
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
A STUDY IN SCARLET.
|
||||
|
||||
By A. Conan Doyle
|
||||
|
||||
[1]
|
||||
|
||||
|
||||
|
||||
Original Transcriber's Note: This etext is prepared directly
|
||||
from an 1887 edition, and care has been taken to duplicate the
|
||||
original exactly, including typographical and punctuation
|
||||
vagaries.
|
||||
|
||||
Additions to the text include adding the underscore character to
|
||||
indicate italics, and textual end-notes in square braces.
|
||||
|
||||
Project Gutenberg Editor's Note: In reproofing and moving old PG
|
||||
files such as this to the present PG directory system it is the
|
||||
policy to reformat the text to conform to present PG Standards.
|
||||
In this case however, in consideration of the note above of the
|
||||
original transcriber describing his care to try to duplicate the
|
||||
original 1887 edition as to typography and punctuation vagaries,
|
||||
no changes have been made in this ascii text file. However, in
|
||||
the Latin-1 file and this html file, present standards are
|
||||
followed and the several French and Spanish words have been
|
||||
given their proper accents.
|
||||
|
||||
Part II, The Country of the Saints, deals much with the Mormon Church.
|
||||
|
||||
|
||||
|
||||
|
||||
A STUDY IN SCARLET.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
PART I.
|
||||
|
||||
(_Being a reprint from the reminiscences of_ JOHN H. WATSON, M.D., _late
|
||||
of the Army Medical Department._) [2]
|
||||
|
||||
|
||||
|
||||
|
||||
CHAPTER I. MR. SHERLOCK HOLMES.
|
||||
|
||||
|
||||
IN the year 1878 I took my degree of Doctor of Medicine of the
|
||||
University of London, and proceeded to Netley to go through the course
|
||||
prescribed for surgeons in the army. Having completed my studies there,
|
||||
I was duly attached to the Fifth Northumberland Fusiliers as Assistant
|
||||
Surgeon. The regiment was stationed in India at the time, and before
|
||||
I could join it, the second Afghan war had broken out. On landing at
|
||||
Bombay, I learned that my corps had advanced through the passes, and
|
||||
was already deep in the enemy's country. I followed, however, with many
|
||||
other officers who were in the same situation as myself, and succeeded
|
||||
in reaching Candahar in safety, where I found my regiment, and at once
|
||||
entered upon my new duties.
|
||||
|
||||
The campaign brought honours and promotion to many, but for me it had
|
||||
nothing but misfortune and disaster. I was removed from my brigade and
|
||||
attached to the Berkshires, with whom I served at the fatal battle of
|
||||
Maiwand. There I was struck on the shoulder by a Jezail bullet, which
|
||||
shattered the bone and grazed the subclavian artery. I should have
|
||||
fallen into the hands of the murderous Ghazis had it not been for the
|
||||
devotion and courage shown by Murray, my orderly, who threw me across a
|
||||
pack-horse, and succeeded in bringing me safely to the British lines.
|
||||
|
||||
Worn with pain, and weak from the prolonged hardships which I had
|
||||
undergone, I was removed, with a great train of wounded sufferers, to
|
||||
the base hospital at Peshawar. Here I rallied, and had already improved
|
||||
so far as to be able to walk about the wards, and even to bask a little
|
||||
upon the verandah, when I was struck down by enteric fever, that curse
|
||||
of our Indian possessions. For months my life was despaired of, and
|
||||
when at last I came to myself and became convalescent, I was so weak and
|
||||
emaciated that a medical board determined that not a day should be lost
|
||||
in sending me back to England. I was dispatched, accordingly, in the
|
||||
troopship "Orontes," and landed a month later on Portsmouth jetty, with
|
||||
my health irretrievably ruined, but with permission from a paternal
|
||||
government to spend the next nine months in attempting to improve it.
|
||||
|
||||
I had neither kith nor kin in England, and was therefore as free as
|
||||
air--or as free as an income of eleven shillings and sixpence a day will
|
||||
permit a man to be. Under such circumstances, I naturally gravitated to
|
||||
London, that great cesspool into which all the loungers and idlers of
|
||||
the Empire are irresistibly drained. There I stayed for some time at
|
||||
a private hotel in the Strand, leading a comfortless, meaningless
|
||||
existence, and spending such money as I had, considerably more freely
|
||||
than I ought. So alarming did the state of my finances become, that
|
||||
I soon realized that I must either leave the metropolis and rusticate
|
||||
somewhere in the country, or that I must make a complete alteration in
|
||||
my style of living. Choosing the latter alternative, I began by making
|
||||
up my mind to leave the hotel, and to take up my quarters in some less
|
||||
pretentious and less expensive domicile.
|
||||
|
||||
On the very day that I had come to this conclusion, I was standing at
|
||||
the Criterion Bar, when some one tapped me on the shoulder, and turning
|
||||
round I recognized young Stamford, who had been a dresser under me at
|
||||
Barts. The sight of a friendly face in the great wilderness of London is
|
||||
a pleasant thing indeed to a lonely man. In old days Stamford had never
|
||||
been a particular crony of mine, but now I hailed him with enthusiasm,
|
||||
and he, in his turn, appeared to be delighted to see me. In the
|
||||
exuberance of my joy, I asked him to lunch with me at the Holborn, and
|
||||
we started off together in a hansom.
|
||||
|
||||
"Whatever have you been doing with yourself, Watson?" he asked in
|
||||
undisguised wonder, as we rattled through the crowded London streets.
|
||||
"You are as thin as a lath and as brown as a nut."
|
||||
|
||||
I gave him a short sketch of my adventures, and had hardly concluded it
|
||||
by the time that we reached our destination.
|
||||
|
||||
"Poor devil!" he said, commiseratingly, after he had listened to my
|
||||
misfortunes. "What are you up to now?"
|
||||
|
||||
"Looking for lodgings." [3] I answered. "Trying to solve the problem
|
||||
as to whether it is possible to get comfortable rooms at a reasonable
|
||||
price."
|
||||
|
||||
"That's a strange thing," remarked my companion; "you are the second man
|
||||
to-day that has used that expression to me."
|
||||
|
||||
"And who was the first?" I asked.
|
||||
|
||||
"A fellow who is working at the chemical laboratory up at the hospital.
|
||||
He was bemoaning himself this morning because he could not get someone
|
||||
to go halves with him in some nice rooms which he had found, and which
|
||||
were too much for his purse."
|
||||
|
||||
"By Jove!" I cried, "if he really wants someone to share the rooms and
|
||||
the expense, I am the very man for him. I should prefer having a partner
|
||||
to being alone."
|
||||
|
||||
Young Stamford looked rather strangely at me over his wine-glass. "You
|
||||
don't know Sherlock Holmes yet," he said; "perhaps you would not care
|
||||
for him as a constant companion."
|
||||
|
||||
"Why, what is there against him?"
|
||||
|
||||
"Oh, I didn't say there was anything against him. He is a little queer
|
||||
in his ideas--an enthusiast in some branches of science. As far as I
|
||||
know he is a decent fellow enough."
|
||||
|
||||
"A medical student, I suppose?" said I.
|
||||
|
||||
"No--I have no idea what he intends to go in for. I believe he is well
|
||||
up in anatomy, and he is a first-class chemist; but, as far as I know,
|
||||
he has never taken out any systematic medical classes. His studies are
|
||||
very desultory and eccentric, but he has amassed a lot of out-of-the way
|
||||
knowledge which would astonish his professors."
|
||||
|
||||
"Did you never ask him what he was going in for?" I asked.
|
||||
|
||||
"No; he is not a man that it is easy to draw out, though he can be
|
||||
communicative enough when the fancy seizes him."
|
||||
|
||||
"I should like to meet him," I said. "If I am to lodge with anyone, I
|
||||
should prefer a man of studious and quiet habits. I am not strong
|
||||
enough yet to stand much noise or excitement. I had enough of both in
|
||||
Afghanistan to last me for the remainder of my natural existence. How
|
||||
could I meet this friend of yours?"
|
||||
|
||||
"He is sure to be at the laboratory," returned my companion. "He either
|
||||
avoids the place for weeks, or else he works there from morning to
|
||||
night. If you like, we shall drive round together after luncheon."
|
||||
|
||||
"Certainly," I answered, and the conversation drifted away into other
|
||||
channels.
|
||||
|
||||
As we made our way to the hospital after leaving the Holborn, Stamford
|
||||
gave me a few more particulars about the gentleman whom I proposed to
|
||||
take as a fellow-lodger.
|
||||
|
||||
"You mustn't blame me if you don't get on with him," he said; "I know
|
||||
nothing more of him than I have learned from meeting him occasionally in
|
||||
the laboratory. You proposed this arrangement, so you must not hold me
|
||||
responsible."
|
||||
|
||||
"If we don't get on it will be easy to part company," I answered. "It
|
||||
seems to me, Stamford," I added, looking hard at my companion, "that you
|
||||
have some reason for washing your hands of the matter. Is this fellow's
|
||||
temper so formidable, or what is it? Don't be mealy-mouthed about it."
|
||||
|
||||
"It is not easy to express the inexpressible," he answered with a laugh.
|
||||
"Holmes is a little too scientific for my tastes--it approaches to
|
||||
cold-bloodedness. I could imagine his giving a friend a little pinch of
|
||||
the latest vegetable alkaloid, not out of malevolence, you understand,
|
||||
but simply out of a spirit of inquiry in order to have an accurate idea
|
||||
of the effects. To do him justice, I think that he would take it himself
|
||||
with the same readiness. He appears to have a passion for definite and
|
||||
exact knowledge."
|
||||
|
||||
"Very right too."
|
||||
|
||||
"Yes, but it may be pushed to excess. When it comes to beating the
|
||||
subjects in the dissecting-rooms with a stick, it is certainly taking
|
||||
rather a bizarre shape."
|
||||
|
||||
"Beating the subjects!"
|
||||
|
||||
"Yes, to verify how far bruises may be produced after death. I saw him
|
||||
at it with my own eyes."
|
||||
|
||||
"And yet you say he is not a medical student?"
|
||||
abcdef |