mirror of https://github.com/BurntSushi/ripgrep.git
synced 2025-07-27 02:01:58 -07:00

Compare commits: grep-match...ag/disable (2 commits)

Author | SHA1 | Date
---|---|---
 | ebc80b01fa |
 | b00cd69a40 |

@@ -63,13 +63,13 @@ matrix:
# Minimum Rust supported channel. We enable these to make sure ripgrep
# continues to work on the advertised minimum Rust version.
- os: linux
rust: 1.34.0
rust: 1.32.0
env: TARGET=x86_64-unknown-linux-gnu
- os: linux
rust: 1.34.0
rust: 1.32.0
env: TARGET=x86_64-unknown-linux-musl
- os: linux
rust: 1.34.0
rust: 1.32.0
env: TARGET=arm-unknown-linux-gnueabihf GCC_VERSION=4.8
addons:
apt:

CHANGELOG.md (80 lines changed)
@@ -1,20 +1,6 @@
11.0.0 (TBD)
0.11.0 (TBD)
============
ripgrep 11 is a new major version release of ripgrep that contains many bug
fixes, some performance improvements and a few feature enhancements. Notably,
ripgrep's user experience for binary file filtering has been improved. See the
[guide's new section on binary data](GUIDE.md#binary-data) for more details.

This release also marks a change in ripgrep's versioning. Whereas the previous
version was `0.10.0`, this version is `11.0.0`. Moving forward, ripgrep's
major version will be increased a few times per year. ripgrep will continue to
be conservative with respect to backwards compatibility, but may occasionally
introduce breaking changes, which will always be documented in this CHANGELOG.
See [issue 1172](https://github.com/BurntSushi/ripgrep/issues/1172) for a bit
more detail on why this versioning change was made.

This release increases the **minimum supported Rust version** from 1.28.0 to
1.34.0.
TODO.

**BREAKING CHANGES**:

@@ -25,91 +11,45 @@ This release increases the **minimum supported Rust version** from 1.28.0 to
error (e.g., regex syntax error). One exception to this is if ripgrep is run
with `-q/--quiet`. In that case, if an error occurs and a match is found,
then ripgrep will exit with a `0` exit status code.
* Supplying the `-u/--unrestricted` flag three times is now equivalent to
supplying `--no-ignore --hidden --binary`. Previously, `-uuu` was equivalent
to `--no-ignore --hidden --text`. The difference is that `--binary` disables
binary file filtering without potentially dumping binary data into your
terminal. That is, `rg -uuu foo` should now be equivalent to `grep -r foo`.
* The `avx-accel` feature of ripgrep has been removed since it is no longer
necessary. All uses of AVX in ripgrep are now enabled automatically via
runtime CPU feature detection. The `simd-accel` feature does remain
available, however, it does increase compilation times substantially at the
moment.

Performance improvements:

* [PERF #497](https://github.com/BurntSushi/ripgrep/issues/497),
[PERF #838](https://github.com/BurntSushi/ripgrep/issues/838):
Make `rg -F -f dictionary-of-literals` much faster.

Feature enhancements:

* Added or improved file type filtering for Apache Thrift, ASP, Bazel, Brotli,
BuildStream, bzip2, C, C++, Cython, gzip, Java, Make, Postscript, QML, Tex,
XML, xz, zig and zstd.
* [FEATURE #855](https://github.com/BurntSushi/ripgrep/issues/855):
Add `--binary` flag for disabling binary file filtering.
* [FEATURE #1078](https://github.com/BurntSushi/ripgrep/pull/1078):
Add `--max-columns-preview` flag for showing a preview of long lines.
* [FEATURE #1099](https://github.com/BurntSushi/ripgrep/pull/1099):
Add support for Brotli and Zstd to the `-z/--search-zip` flag.
* [FEATURE #1138](https://github.com/BurntSushi/ripgrep/pull/1138):
Add `--no-ignore-dot` flag for ignoring `.ignore` files.
* [FEATURE #1155](https://github.com/BurntSushi/ripgrep/pull/1155):
Add `--auto-hybrid-regex` flag for automatically falling back to PCRE2.
* [FEATURE #1159](https://github.com/BurntSushi/ripgrep/pull/1159):
ripgrep's exit status logic should now match GNU grep. See updated man page.
* [FEATURE #1164](https://github.com/BurntSushi/ripgrep/pull/1164):
Add `--ignore-file-case-insensitive` for case insensitive ignore globs.
* [FEATURE #1185](https://github.com/BurntSushi/ripgrep/pull/1185):
Add `-I` flag as a short option for the `--no-filename` flag.
* [FEATURE #1207](https://github.com/BurntSushi/ripgrep/pull/1207):
Add `none` value to `-E/--encoding` to forcefully disable all transcoding.
* [FEATURE da9d7204](https://github.com/BurntSushi/ripgrep/commit/da9d7204):
Add `--pcre2-version` for showing PCRE2 version information.
* [FEATURE #1170](https://github.com/BurntSushi/ripgrep/pull/1170):
Add `--ignore-file-case-insensitive` for case insensitive .ignore globs.

Bug fixes:

* [BUG #306](https://github.com/BurntSushi/ripgrep/issues/306),
[BUG #855](https://github.com/BurntSushi/ripgrep/issues/855):
Improve the user experience for ripgrep's binary file filtering.
* [BUG #373](https://github.com/BurntSushi/ripgrep/issues/373),
[BUG #1098](https://github.com/BurntSushi/ripgrep/issues/1098):
`**` is now accepted as valid syntax anywhere in a glob.
* [BUG #916](https://github.com/BurntSushi/ripgrep/issues/916):
ripgrep no longer hangs when searching `/proc` with a zombie process present.
* [BUG #1052](https://github.com/BurntSushi/ripgrep/issues/1052):
Fix bug where ripgrep could panic when transcoding UTF-16 files.
* [BUG #1055](https://github.com/BurntSushi/ripgrep/issues/1055):
Suggest `-U/--multiline` when a pattern contains a `\n`.
* [BUG #1063](https://github.com/BurntSushi/ripgrep/issues/1063):
Always strip a BOM if it's present, even for UTF-8.
* [BUG #1064](https://github.com/BurntSushi/ripgrep/issues/1064):
Fix inner literal detection that could lead to incorrect matches.
* [BUG #1079](https://github.com/BurntSushi/ripgrep/issues/1079):
Fixes a bug where the order of globs could result in missing a match.
* [BUG #1089](https://github.com/BurntSushi/ripgrep/issues/1089):
Fix another bug where ripgrep could panic when transcoding UTF-16 files.
* [BUG #1091](https://github.com/BurntSushi/ripgrep/issues/1091):
Add note about inverted flags to the man page.
* [BUG #1093](https://github.com/BurntSushi/ripgrep/pull/1093):
Fix handling of literal slashes in gitignore patterns.
* [BUG #1095](https://github.com/BurntSushi/ripgrep/issues/1095):
Fix corner cases involving the `--crlf` flag.
* [BUG #1101](https://github.com/BurntSushi/ripgrep/issues/1101):
Fix AsciiDoc escaping for man page output.
* [BUG #1103](https://github.com/BurntSushi/ripgrep/issues/1103):
Clarify what `--encoding auto` does.
* [BUG #1106](https://github.com/BurntSushi/ripgrep/issues/1106):
`--files-with-matches` and `--files-without-match` work with one file.
* [BUG #1093](https://github.com/BurntSushi/ripgrep/pull/1093):
Fix handling of literal slashes in gitignore patterns.
* [BUG #1121](https://github.com/BurntSushi/ripgrep/issues/1121):
Fix bug that was triggering Windows antimalware when using the `--files`
flag.
Fix bug that was triggering Windows antimalware when using the --files flag.
* [BUG #1125](https://github.com/BurntSushi/ripgrep/issues/1125),
[BUG #1159](https://github.com/BurntSushi/ripgrep/issues/1159):
ripgrep shouldn't panic for `rg -h | rg` and should emit correct exit status.
* [BUG #1144](https://github.com/BurntSushi/ripgrep/issues/1144):
Fixes a bug where line numbers could be wrong on big-endian machines.
* [BUG #1154](https://github.com/BurntSushi/ripgrep/issues/1154):
Windows files with "hidden" attribute are now treated as hidden.
* [BUG #1173](https://github.com/BurntSushi/ripgrep/issues/1173):
@@ -118,12 +58,6 @@ Bug fixes:
Fix handling of repeated `**` patterns in gitignore files.
* [BUG #1176](https://github.com/BurntSushi/ripgrep/issues/1176):
Fix bug where `-F`/`-x` weren't applied to patterns given via `-f`.
* [BUG #1189](https://github.com/BurntSushi/ripgrep/issues/1189):
Document cases where ripgrep may use a lot of memory.
* [BUG #1203](https://github.com/BurntSushi/ripgrep/issues/1203):
Fix a matching bug related to the suffix literal optimization.
* [BUG 8f14cb18](https://github.com/BurntSushi/ripgrep/commit/8f14cb18):
Increase the default stack size for PCRE2's JIT.


0.10.0 (2018-09-07)

Cargo.lock (generated; 71 lines changed)
@@ -58,7 +58,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "cc"
version = "1.0.35"
version = "1.0.34"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
@@ -132,17 +132,17 @@ source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "glob"
version = "0.3.0"
version = "0.2.11"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "globset"
version = "0.4.3"
version = "0.4.2"
dependencies = [
"aho-corasick 0.7.3 (registry+https://github.com/rust-lang/crates.io-index)",
"bstr 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)",
"fnv 1.0.6 (registry+https://github.com/rust-lang/crates.io-index)",
"glob 0.3.0 (registry+https://github.com/rust-lang/crates.io-index)",
"glob 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.6 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.1.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -152,7 +152,7 @@ name = "grep"
version = "0.2.3"
dependencies = [
"grep-cli 0.1.1",
"grep-matcher 0.1.2",
"grep-matcher 0.1.1",
"grep-pcre2 0.1.2",
"grep-printer 0.1.1",
"grep-regex 0.1.2",
@@ -167,7 +167,7 @@ version = "0.1.1"
dependencies = [
"atty 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)",
"bstr 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)",
"globset 0.4.3",
"globset 0.4.2",
"lazy_static 1.3.0 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.6 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.1.5 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -178,7 +178,7 @@ dependencies = [

[[package]]
name = "grep-matcher"
version = "0.1.2"
version = "0.1.1"
dependencies = [
"memchr 2.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.1.5 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -188,8 +188,8 @@ dependencies = [
name = "grep-pcre2"
version = "0.1.2"
dependencies = [
"grep-matcher 0.1.2",
"pcre2 0.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"grep-matcher 0.1.1",
"pcre2 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
@@ -198,7 +198,7 @@ version = "0.1.1"
dependencies = [
"base64 0.10.1 (registry+https://github.com/rust-lang/crates.io-index)",
"bstr 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)",
"grep-matcher 0.1.2",
"grep-matcher 0.1.1",
"grep-regex 0.1.2",
"grep-searcher 0.1.3",
"serde 1.0.90 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -211,8 +211,7 @@ dependencies = [
name = "grep-regex"
version = "0.1.2"
dependencies = [
"aho-corasick 0.7.3 (registry+https://github.com/rust-lang/crates.io-index)",
"grep-matcher 0.1.2",
"grep-matcher 0.1.1",
"log 0.4.6 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.1.5 (registry+https://github.com/rust-lang/crates.io-index)",
"regex-syntax 0.6.6 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -228,7 +227,7 @@ dependencies = [
"bytecount 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
"encoding_rs 0.8.17 (registry+https://github.com/rust-lang/crates.io-index)",
"encoding_rs_io 0.1.6 (registry+https://github.com/rust-lang/crates.io-index)",
"grep-matcher 0.1.2",
"grep-matcher 0.1.1",
"grep-regex 0.1.2",
"log 0.4.6 (registry+https://github.com/rust-lang/crates.io-index)",
"memmap 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -237,10 +236,10 @@ dependencies = [

[[package]]
name = "ignore"
version = "0.4.7"
version = "0.4.6"
dependencies = [
"crossbeam-channel 0.3.8 (registry+https://github.com/rust-lang/crates.io-index)",
"globset 0.4.3",
"globset 0.4.2",
"lazy_static 1.3.0 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.6 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -307,21 +306,21 @@ dependencies = [

[[package]]
name = "pcre2"
version = "0.2.0"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.51 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.6 (registry+https://github.com/rust-lang/crates.io-index)",
"pcre2-sys 0.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
"pcre2-sys 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.3.6 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "pcre2-sys"
version = "0.2.0"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"cc 1.0.35 (registry+https://github.com/rust-lang/crates.io-index)",
"cc 1.0.34 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.51 (registry+https://github.com/rust-lang/crates.io-index)",
"pkg-config 0.3.14 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -341,7 +340,7 @@ dependencies = [

[[package]]
name = "quote"
version = "0.6.12"
version = "0.6.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"proc-macro2 0.4.27 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -453,7 +452,7 @@ dependencies = [

[[package]]
name = "redox_syscall"
version = "0.1.54"
version = "0.1.52"
source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
@@ -461,7 +460,7 @@ name = "redox_termios"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"redox_syscall 0.1.54 (registry+https://github.com/rust-lang/crates.io-index)",
"redox_syscall 0.1.52 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
@@ -507,7 +506,7 @@ dependencies = [
"bstr 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)",
"clap 2.33.0 (registry+https://github.com/rust-lang/crates.io-index)",
"grep 0.2.3",
"ignore 0.4.7",
"ignore 0.4.6",
"lazy_static 1.3.0 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.6 (registry+https://github.com/rust-lang/crates.io-index)",
"num_cpus 1.10.0 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -542,8 +541,8 @@ version = "1.0.90"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"proc-macro2 0.4.27 (registry+https://github.com/rust-lang/crates.io-index)",
"quote 0.6.12 (registry+https://github.com/rust-lang/crates.io-index)",
"syn 0.15.31 (registry+https://github.com/rust-lang/crates.io-index)",
"quote 0.6.11 (registry+https://github.com/rust-lang/crates.io-index)",
"syn 0.15.30 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
@@ -568,11 +567,11 @@ source = "registry+https://github.com/rust-lang/crates.io-index"

[[package]]
name = "syn"
version = "0.15.31"
version = "0.15.30"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"proc-macro2 0.4.27 (registry+https://github.com/rust-lang/crates.io-index)",
"quote 0.6.12 (registry+https://github.com/rust-lang/crates.io-index)",
"quote 0.6.11 (registry+https://github.com/rust-lang/crates.io-index)",
"unicode-xid 0.1.0 (registry+https://github.com/rust-lang/crates.io-index)",
]

@@ -584,7 +583,7 @@ dependencies = [
"cfg-if 0.1.7 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.51 (registry+https://github.com/rust-lang/crates.io-index)",
"rand 0.6.5 (registry+https://github.com/rust-lang/crates.io-index)",
"redox_syscall 0.1.54 (registry+https://github.com/rust-lang/crates.io-index)",
"redox_syscall 0.1.52 (registry+https://github.com/rust-lang/crates.io-index)",
"remove_dir_all 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.7 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -603,7 +602,7 @@ version = "1.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.51 (registry+https://github.com/rust-lang/crates.io-index)",
"redox_syscall 0.1.54 (registry+https://github.com/rust-lang/crates.io-index)",
"redox_syscall 0.1.52 (registry+https://github.com/rust-lang/crates.io-index)",
"redox_termios 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
]

@@ -698,7 +697,7 @@ dependencies = [
"checksum bstr 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)" = "6c8203ca06c502958719dae5f653a79e0cc6ba808ed02beffbf27d09610f2143"
"checksum bytecount 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "be0fdd54b507df8f22012890aadd099979befdba27713c767993f8380112ca7c"
"checksum byteorder 1.3.1 (registry+https://github.com/rust-lang/crates.io-index)" = "a019b10a2a7cdeb292db131fc8113e57ea2a908f6e7894b0c3c671893b65dbeb"
"checksum cc 1.0.35 (registry+https://github.com/rust-lang/crates.io-index)" = "5e5f3fee5eeb60324c2781f1e41286bdee933850fff9b3c672587fed5ec58c83"
"checksum cc 1.0.34 (registry+https://github.com/rust-lang/crates.io-index)" = "30f813bf45048a18eda9190fd3c6b78644146056740c43172a5a3699118588fd"
"checksum cfg-if 0.1.7 (registry+https://github.com/rust-lang/crates.io-index)" = "11d43355396e872eefb45ce6342e4374ed7bc2b3a502d1b28e36d6e23c05d1f4"
"checksum clap 2.33.0 (registry+https://github.com/rust-lang/crates.io-index)" = "5067f5bb2d80ef5d68b4c87db81601f0b75bca627bc2ef76b141d7b846a3c6d9"
"checksum cloudabi 0.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ddfc5b9aa5d4507acaf872de71051dfd0e309860e88966e1051e462a077aac4f"
@@ -708,7 +707,7 @@ dependencies = [
"checksum encoding_rs_io 0.1.6 (registry+https://github.com/rust-lang/crates.io-index)" = "9619ee7a2bf4e777e020b95c1439abaf008f8ea8041b78a0552c4f1bcf4df32c"
"checksum fnv 1.0.6 (registry+https://github.com/rust-lang/crates.io-index)" = "2fad85553e09a6f881f739c29f0b00b0f01357c743266d478b68951ce23285f3"
"checksum fuchsia-cprng 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "a06f77d526c1a601b7c4cdd98f54b5eaabffc14d5f2f0296febdc7f357c6d3ba"
"checksum glob 0.3.0 (registry+https://github.com/rust-lang/crates.io-index)" = "9b919933a397b79c37e33b77bb2aa3dc8eb6e165ad809e58ff75bc7db2e34574"
"checksum glob 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)" = "8be18de09a56b60ed0edf84bc9df007e30040691af7acd1c41874faac5895bfb"
"checksum itoa 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)" = "1306f3464951f30e30d12373d31c79fbd52d236e5e896fd92f96ec7babbbe60b"
"checksum lazy_static 1.3.0 (registry+https://github.com/rust-lang/crates.io-index)" = "bc5729f27f159ddd61f4df6228e827e86643d4d3e7c32183cb30a1c08f604a14"
"checksum libc 0.2.51 (registry+https://github.com/rust-lang/crates.io-index)" = "bedcc7a809076656486ffe045abeeac163da1b558e963a31e29fbfbeba916917"
@@ -717,11 +716,11 @@ dependencies = [
"checksum memmap 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)" = "6585fd95e7bb50d6cc31e20d4cf9afb4e2ba16c5846fc76793f11218da9c475b"
"checksum num_cpus 1.10.0 (registry+https://github.com/rust-lang/crates.io-index)" = "1a23f0ed30a54abaa0c7e83b1d2d87ada7c3c23078d1d87815af3e3b6385fbba"
"checksum packed_simd 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)" = "a85ea9fc0d4ac0deb6fe7911d38786b32fc11119afd9e9d38b84ff691ce64220"
"checksum pcre2 0.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "a08c8195dd1d8a2a1b5e2af94bf0c4c3c195c2359930442a016bf123196f7155"
"checksum pcre2-sys 0.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "1e0092a7eae1c569cf7dbec61eef956516df93eb4afda8f600ccb16980aca849"
"checksum pcre2 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "3ae0a2682105ec5ca0ee5910bbc7e926386d348a05166348f74007942983c319"
"checksum pcre2-sys 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "a9027f9474e4e13d3b965538aafcaebe48c803488ad76b3c97ef061a8324695f"
"checksum pkg-config 0.3.14 (registry+https://github.com/rust-lang/crates.io-index)" = "676e8eb2b1b4c9043511a9b7bea0915320d7e502b0a079fb03f9635a5252b18c"
"checksum proc-macro2 0.4.27 (registry+https://github.com/rust-lang/crates.io-index)" = "4d317f9caece796be1980837fd5cb3dfec5613ebdb04ad0956deea83ce168915"
"checksum quote 0.6.12 (registry+https://github.com/rust-lang/crates.io-index)" = "faf4799c5d274f3868a4aae320a0a182cbd2baee377b378f080e16a23e9d80db"
"checksum quote 0.6.11 (registry+https://github.com/rust-lang/crates.io-index)" = "cdd8e04bd9c52e0342b406469d494fcb033be4bdbe5c606016defbb1681411e1"
"checksum rand 0.6.5 (registry+https://github.com/rust-lang/crates.io-index)" = "6d71dacdc3c88c1fde3885a3be3fbab9f35724e6ce99467f7d9c5026132184ca"
"checksum rand_chacha 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "556d3a1ca6600bfcbab7c7c91ccb085ac7fbbcd70e008a98742e7847f4f7bcef"
"checksum rand_core 0.3.1 (registry+https://github.com/rust-lang/crates.io-index)" = "7a6fdeb83b075e8266dcc8762c22776f6877a63111121f5f8c7411e5be7eed4b"
@@ -733,7 +732,7 @@ dependencies = [
"checksum rand_pcg 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)" = "abf9b09b01790cfe0364f52bf32995ea3c39f4d2dd011eac241d2914146d0b44"
"checksum rand_xorshift 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "cbf7e9e623549b0e21f6e97cf8ecf247c1a8fd2e8a992ae265314300b2455d5c"
"checksum rdrand 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)" = "678054eb77286b51581ba43620cc911abf02758c91f93f479767aed0f90458b2"
"checksum redox_syscall 0.1.54 (registry+https://github.com/rust-lang/crates.io-index)" = "12229c14a0f65c4f1cb046a3b52047cdd9da1f4b30f8a39c5063c8bae515e252"
"checksum redox_syscall 0.1.52 (registry+https://github.com/rust-lang/crates.io-index)" = "d32b3053e5ced86e4bc0411fec997389532bf56b000e66cb4884eeeb41413d69"
"checksum redox_termios 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "7e891cfe48e9100a70a3b6eb652fef28920c117d366339687bd5576160db0f76"
"checksum regex 1.1.5 (registry+https://github.com/rust-lang/crates.io-index)" = "559008764a17de49a3146b234641644ed37d118d1ef641a0bb573d146edc6ce0"
"checksum regex-automata 0.1.6 (registry+https://github.com/rust-lang/crates.io-index)" = "a25a7daa2eea48550e9946133d6cc9621020d29cc7069089617234bf8b6a8693"
@@ -746,7 +745,7 @@ dependencies = [
"checksum serde_json 1.0.39 (registry+https://github.com/rust-lang/crates.io-index)" = "5a23aa71d4a4d43fdbfaac00eff68ba8a06a51759a89ac3304323e800c4dd40d"
"checksum smallvec 0.6.9 (registry+https://github.com/rust-lang/crates.io-index)" = "c4488ae950c49d403731982257768f48fada354a5203fe81f9bb6f43ca9002be"
"checksum strsim 0.8.0 (registry+https://github.com/rust-lang/crates.io-index)" = "8ea5119cdb4c55b55d432abb513a0429384878c15dde60cc77b1c99de1a95a6a"
"checksum syn 0.15.31 (registry+https://github.com/rust-lang/crates.io-index)" = "d2b4cfac95805274c6afdb12d8f770fa2d27c045953e7b630a81801953699a9a"
"checksum syn 0.15.30 (registry+https://github.com/rust-lang/crates.io-index)" = "66c8865bf5a7cbb662d8b011950060b3c8743dca141b054bf7195b20d314d8e2"
"checksum tempfile 3.0.7 (registry+https://github.com/rust-lang/crates.io-index)" = "b86c784c88d98c801132806dadd3819ed29d8600836c4088e855cdf3e178ed8a"
"checksum termcolor 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)" = "4096add70612622289f2fdcdbd5086dc81c1e2675e6ae58d6c4f62a16c6d7f2f"
"checksum termion 1.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "689a3bdfaab439fd92bc87df5c4c78417d3cbe537487274e9b0b2dce76e92096"

@@ -48,7 +48,7 @@ members = [
[dependencies]
bstr = "0.1.2"
grep = { version = "0.2.3", path = "grep" }
ignore = { version = "0.4.7", path = "ignore" }
ignore = { version = "0.4.4", path = "ignore" }
lazy_static = "1.1.0"
log = "0.4.5"
num_cpus = "1.8.0"

GUIDE.md (74 lines changed)
@@ -18,7 +18,6 @@ translatable to any command line shell environment.
* [Replacements](#replacements)
* [Configuration file](#configuration-file)
* [File encoding](#file-encoding)
* [Binary data](#binary-data)
* [Common options](#common-options)


@@ -538,9 +537,8 @@ formatting peculiarities:

```
$ cat $HOME/.ripgreprc
# Don't let ripgrep vomit really long lines to my terminal, and show a preview.
# Don't let ripgrep vomit really long lines to my terminal.
--max-columns=150
--max-columns-preview

# Add my 'web' type.
--type-add
@@ -682,76 +680,6 @@ $ rg '\w(?-u:\w)\w'
```


### Binary data

In addition to skipping hidden files and files in your `.gitignore` by default,
ripgrep also attempts to skip binary files. ripgrep does this by default
because binary files (like PDFs or images) are typically not things you want to
search when searching for regex matches. Moreover, if content in a binary file
did match, then it's possible for undesirable binary data to be printed to your
terminal and wreak havoc.

Unfortunately, unlike skipping hidden files and respecting your `.gitignore`
rules, a file cannot as easily be classified as binary. In order to figure out
whether a file is binary, the most effective heuristic that balances
correctness with performance is to simply look for `NUL` bytes. At that point,
the determination is simple: a file is considered "binary" if and only if it
contains a `NUL` byte somewhere in its contents.
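
As a rough illustration of the heuristic described above, the following Python sketch reads a byte stream in chunks and reports whether a `NUL` byte appears anywhere in it. This is not ripgrep's implementation (ripgrep is written in Rust); the function name `contains_nul` and the chunk size are assumptions made for this example only.

```python
import io

# Hypothetical chunk size, chosen for illustration only.
CHUNK_SIZE = 64 * 1024


def contains_nul(stream, chunk_size=CHUNK_SIZE):
    """Return True if and only if the byte stream contains a NUL byte."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of stream reached without seeing a NUL byte.
            return False
        if b"\x00" in chunk:
            return True


if __name__ == "__main__":
    # Usage: wrap some bytes in a file-like object and classify them.
    print(contains_nul(io.BytesIO(b"plain text\n")))
    print(contains_nul(io.BytesIO(b"text then \x00 binary")))
```

Note that a negative answer is only certain once the entire stream has been read, since the `NUL` byte may sit anywhere in the contents.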

The issue is that while most binary files will have a `NUL` byte toward the
beginning of their contents, this is not necessarily true. The `NUL` byte might
be the very last byte in a large file, but that file is still considered
binary. While this leads to a fair amount of complexity inside ripgrep's
implementation, it also results in some unintuitive user experiences.

At a high level, ripgrep operates in three different modes with respect to
binary files:

1. The default mode is to attempt to remove binary files from a search
completely. This is meant to mirror how ripgrep removes hidden files and
files in your `.gitignore` automatically. That is, as soon as a file is
detected as binary, searching stops. If a match was already printed (because
it was detected long before a `NUL` byte), then ripgrep will print a warning
message indicating that the search stopped prematurely. This default mode
**only applies to files searched by ripgrep as a result of recursive
directory traversal**, which is consistent with ripgrep's other automatic
filtering. For example, `rg foo .file` will search `.file` even though it
is hidden. Similarly, `rg foo binary-file` will search `binary-file` in "binary"
mode automatically.
2. Binary mode is similar to the default mode, except it will not always
stop searching after it sees a `NUL` byte. Namely, in this mode, ripgrep
will continue searching a file that is known to be binary until the first
of two conditions is met: 1) the end of the file has been reached or 2) a
match is or has been seen. This means that in binary mode, if ripgrep
reports no matches, then there are no matches in the file. When a match does
occur, ripgrep prints a message similar to one it prints when in its default
mode indicating that the search has stopped prematurely. This mode can be
forcefully enabled for all files with the `--binary` flag. The purpose of
binary mode is to provide a way to discover matches in all files, but to
avoid having binary data dumped into your terminal.
3. Text mode completely disables all binary detection and searches all files
as if they were text. This is useful when searching a file that is
predominantly text but contains a `NUL` byte, or if you are specifically
trying to search binary data. This mode can be enabled with the `-a/--text`
flag. Note that when using this mode on very large binary files, it is
possible for ripgrep to use a lot of memory.
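
The three modes above can be sketched as a small state machine. The Python below is a simplified model, not ripgrep's actual code: the names `Mode`, `Result` and `search` are hypothetical, matching is a plain substring test rather than a regex, and `NUL` detection is done line by line, which is an assumption made to keep the example short.

```python
from enum import Enum
from typing import List, NamedTuple


class Mode(Enum):
    DEFAULT = "default"  # stop as soon as a NUL byte is seen
    BINARY = "binary"    # models --binary: continue until a match or EOF
    TEXT = "text"        # models -a/--text: no binary detection at all


class Result(NamedTuple):
    printed: List[bytes]  # matching lines that would be printed
    matched: bool         # whether any match was seen
    stopped_early: bool   # whether the search ended before end of file


def search(data: bytes, needle: bytes, mode: Mode) -> Result:
    printed: List[bytes] = []
    seen_nul = False
    for line in data.split(b"\n"):
        if mode is not Mode.TEXT and not seen_nul and b"\x00" in line:
            seen_nul = True
            if mode is Mode.DEFAULT or printed:
                # Default mode quits on detection. Binary mode quits here
                # only when a match has already been seen.
                return Result(printed, bool(printed), True)
        if needle in line:
            if seen_nul:
                # Only binary mode reaches this point: report that the
                # file matches, but do not print the binary line itself.
                return Result(printed, True, True)
            printed.append(line)
    return Result(printed, bool(printed), False)
```

In this model, a file whose first line contains a `NUL` byte yields no match in default mode even if a later line matches, while binary mode reports the match without printing it and text mode prints the matching line.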

Unfortunately, there is one additional complexity in ripgrep that can make it
difficult to reason about binary files. That is, the way binary detection works
depends on the way that ripgrep searches your files. Specifically:

* When ripgrep uses memory maps, then binary detection is only performed on the
first few kilobytes of the file in addition to every matching line.
* When ripgrep doesn't use memory maps, then binary detection is performed on
all bytes searched.

This means that whether a file is detected as binary or not can change based
on the internal search strategy used by ripgrep. If you prefer to keep
ripgrep's binary file detection consistent, then you can disable memory maps
via the `--no-mmap` flag. (The cost will be a small performance regression when
searching very large files on some platforms.)
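
To make the difference between the two strategies concrete, here is a minimal Python sketch. It is a model of the behavior described above, not ripgrep's code: the function name `detected_binary`, the 8 KiB value standing in for "the first few kilobytes", and the substring-based matching are all assumptions for illustration.

```python
# Hypothetical stand-in for "the first few kilobytes" of a file.
PREFIX_LEN = 8 * 1024


def detected_binary(data: bytes, needle: bytes, use_mmap: bool) -> bool:
    """Model which bytes are inspected for NUL under each search strategy."""
    if not use_mmap:
        # Without memory maps, every byte searched is checked.
        return b"\x00" in data
    # With memory maps, only the file prefix is checked up front...
    if b"\x00" in data[:PREFIX_LEN]:
        return True
    # ...plus every line that matches the pattern.
    return any(
        b"\x00" in line for line in data.split(b"\n") if needle in line
    )
```

With input where the only `NUL` byte lies beyond the prefix on a non-matching line, this model reports the file as binary only when `use_mmap` is `False`, which is the inconsistency that `--no-mmap` is meant to avoid.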


### Common options

ripgrep has a lot of flags. Too many to keep in your head at once. This section

@@ -11,7 +11,6 @@ and grep.
[](https://travis-ci.org/BurntSushi/ripgrep)
[](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[](https://crates.io/crates/ripgrep)
[](https://repology.org/project/ripgrep/badges)

Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).

@@ -340,7 +339,7 @@ If you're a **NetBSD** user, then you can install ripgrep from

If you're a **Rust programmer**, ripgrep can be installed with `cargo`.

* Note that the minimum supported version of Rust for ripgrep is **1.34.0**,
* Note that the minimum supported version of Rust for ripgrep is **1.32.0**,
although ripgrep may work with older versions.
* Note that the binary may be bigger than expected because it contains debug
symbols. This is intentional. To remove debug symbols and therefore reduce
@@ -350,6 +349,9 @@ If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
$ cargo install ripgrep
```

When compiling with Rust 1.27 or newer, this will automatically enable SIMD
optimizations for search.

ripgrep isn't currently in any other package repositories.
[I'd like to change that](https://github.com/BurntSushi/ripgrep/issues/10).

@@ -358,7 +360,7 @@ ripgrep isn't currently in any other package repositories.

ripgrep is written in Rust, so you'll need to grab a
[Rust installation](https://www.rust-lang.org/) in order to compile it.
ripgrep compiles with Rust 1.34.0 (stable) or newer. In general, ripgrep tracks
ripgrep compiles with Rust 1.32.0 (stable) or newer. In general, ripgrep tracks
the latest stable release of the Rust compiler.

To build ripgrep:
13
complete/_rg
@@ -43,7 +43,6 @@ _rg() {
|
||||
+ '(exclusive)' # Misc. fully exclusive options
|
||||
'(: * -)'{-h,--help}'[display help information]'
|
||||
'(: * -)'{-V,--version}'[display version information]'
|
||||
'(: * -)'--pcre2-version'[print the version of PCRE2 used by ripgrep, if available]'
|
||||
|
||||
+ '(buffered)' # buffering options
|
||||
'--line-buffered[force line buffering]'
|
||||
@@ -86,7 +85,7 @@ _rg() {
|
||||
|
||||
+ '(file-name)' # File-name options
|
||||
{-H,--with-filename}'[show file name for matches]'
|
||||
{-I,--no-filename}"[don't show file name for matches]"
|
||||
"--no-filename[don't show file name for matches]"
|
||||
|
||||
+ '(file-system)' # File system options
|
||||
"--one-file-system[don't descend into directories on other file systems]"
|
||||
@@ -112,10 +111,6 @@ _rg() {
|
||||
'--hidden[search hidden files and directories]'
|
||||
$no"--no-hidden[don't search hidden files and directories]"
|
||||
|
||||
+ '(hybrid)' # hybrid regex options
|
||||
'--auto-hybrid-regex[dynamically use PCRE2 if necessary]'
|
||||
$no"--no-auto-hybrid-regex[don't dynamically use PCRE2 if necessary]"
|
||||
|
||||
+ '(ignore)' # Ignore-file options
|
||||
"(--no-ignore-global --no-ignore-parent --no-ignore-vcs --no-ignore-dot)--no-ignore[don't respect ignore files]"
|
||||
$no'(--ignore-global --ignore-parent --ignore-vcs --ignore-dot)--ignore[respect ignore files]'
|
||||
@@ -153,10 +148,6 @@ _rg() {
|
||||
$no"--no-crlf[don't use CRLF as line terminator]"
|
||||
'(text)--null-data[use NUL as line terminator]'
|
||||
|
||||
+ '(max-columns-preview)' # max column preview options
|
||||
'--max-columns-preview[show preview for long lines (with -M)]'
|
||||
$no"--no-max-columns-preview[don't show preview for long lines (with -M)]"
|
||||
|
||||
+ '(max-depth)' # Directory-depth options
|
||||
'--max-depth=[specify max number of directories to descend]:number of directories'
|
||||
'!--maxdepth=:number of directories'
|
||||
@@ -236,8 +227,6 @@ _rg() {
|
||||
|
||||
+ '(text)' # Binary-search options
|
||||
{-a,--text}'[search binary files as if they were text]'
|
||||
"--binary[search binary files, don't print binary data]"
|
||||
$no"--no-binary[don't search binary files]"
|
||||
$no"(--null-data)--no-text[don't search binary files as if they were text]"
|
||||
|
||||
+ '(threads)' # Thread-count options
|
||||
|
@@ -41,9 +41,6 @@ configuration file. The file can specify one shell argument per line. Lines
starting with *#* are ignored. For more details, see the man page or the
*README*.

Tip: to disable all smart filtering and make ripgrep behave a bit more like
classical grep, use *rg -uuu*.


REGEX SYNTAX
------------
@@ -192,21 +189,6 @@ file that is simultaneously truncated. This behavior can be avoided by passing
the *--no-mmap* flag which will forcefully disable the use of memory maps in
all cases.

ripgrep may use a large amount of memory depending on a few factors. Firstly,
if ripgrep uses parallelism for search (the default), then the entire output
for each individual file is buffered into memory in order to prevent
interleaving matches in the output. To avoid this, you can disable parallelism
with the *-j1* flag. Secondly, ripgrep always needs to have at least a single
line in memory in order to execute a search. A file with a very long line can
thus cause ripgrep to use a lot of memory. Generally, this only occurs when
searching binary data with the *-a* flag enabled. (When the *-a* flag isn't
enabled, ripgrep will replace all NUL bytes with line terminators, which
typically prevents exorbitant memory usage.) Thirdly, when ripgrep searches
a large file using a memory map, the process will report its resident memory
usage as the size of the file. However, this does not mean ripgrep actually
needed to use that much memory; the operating system will generally handle this
for you.

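The second point in the paragraph above, that turning NUL bytes into line terminators keeps lines short, can be illustrated with a small sketch. It only models the effect on line length and is not how ripgrep performs the replacement internally. The function names and the NUL-separated sample data are hypothetical.

```rust
/// Length in bytes of the longest line in `data`, splitting on `\n`.
pub fn longest_line_len(data: &[u8]) -> usize {
    data.split(|&b| b == b'\n')
        .map(|line| line.len())
        .max()
        .unwrap_or(0)
}

/// Copy of `data` with every NUL byte replaced by a line terminator.
pub fn nul_to_newline(data: &[u8]) -> Vec<u8> {
    data.iter()
        .map(|&b| if b == 0 { b'\n' } else { b })
        .collect()
}

/// Hypothetical binary-like input: `records` runs of `record_len`
/// non-newline bytes, each followed by a single NUL byte.
pub fn nul_separated_records(records: usize, record_len: usize) -> Vec<u8> {
    let mut buf = Vec::with_capacity(records * (record_len + 1));
    for _ in 0..records {
        buf.extend(std::iter::repeat(b'x').take(record_len));
        buf.push(0);
    }
    buf
}
```

With 1000 records of 64 bytes each, the raw data is one 65000-byte line, while the converted data has no line longer than 64 bytes. A searcher that must hold at least one full line in memory therefore needs far less buffer space for the converted form.
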
VERSION
-------
@@ -1,6 +1,6 @@
[package]
name = "globset"
version = "0.4.3" #:version
version = "0.4.2" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Cross platform single glob and glob set matching. Glob set matching is the
@@ -26,7 +26,7 @@ log = "0.4.5"
regex = "1.1.5"

[dev-dependencies]
glob = "0.3.0"
glob = "0.2.11"

[features]
simd-accel = []
@@ -15,7 +15,7 @@ license = "Unlicense/MIT"
[dependencies]
atty = "0.2.11"
bstr = "0.1.2"
globset = { version = "0.4.3", path = "../globset" }
globset = { version = "0.4.2", path = "../globset" }
lazy_static = "1.1.0"
log = "0.4.5"
regex = "1.1"
@@ -1,6 +1,6 @@
[package]
name = "grep-matcher"
version = "0.1.2" #:version
version = "0.1.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A trait for regular expressions, with a focus on line oriented search.
@@ -13,5 +13,5 @@ keywords = ["regex", "grep", "pcre", "backreference", "look"]
license = "Unlicense/MIT"

[dependencies]
grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
pcre2 = "0.2.0"
grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
pcre2 = "0.1.1"
@@ -10,7 +10,6 @@ extern crate pcre2;
|
||||
|
||||
pub use error::{Error, ErrorKind};
|
||||
pub use matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
|
||||
pub use pcre2::{is_jit_available, version};
|
||||
|
||||
mod error;
|
||||
mod matcher;
|
||||
|
@@ -227,27 +227,6 @@ impl RegexMatcherBuilder {
|
||||
self.builder.jit_if_available(yes);
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the maximum size of PCRE2's JIT stack, in bytes. If the JIT is
|
||||
/// not enabled, then this has no effect.
|
||||
///
|
||||
/// When `None` is given, no custom JIT stack will be created, and instead,
|
||||
/// the default JIT stack is used. When the default is used, its maximum
|
||||
/// size is 32 KB.
|
||||
///
|
||||
/// When this is set, then a new JIT stack will be created with the given
|
||||
/// maximum size as its limit.
|
||||
///
|
||||
/// Increasing the stack size can be useful for larger regular expressions.
|
||||
///
|
||||
/// By default, this is set to `None`.
|
||||
pub fn max_jit_stack_size(
|
||||
&mut self,
|
||||
bytes: Option<usize>,
|
||||
) -> &mut RegexMatcherBuilder {
|
||||
self.builder.max_jit_stack_size(bytes);
|
||||
self
|
||||
}
|
||||
}
|
||||
|
||||
/// An implementation of the `Matcher` trait using PCRE2.
|
||||
|
@@ -20,7 +20,7 @@ serde1 = ["base64", "serde", "serde_derive", "serde_json"]
|
||||
[dependencies]
|
||||
base64 = { version = "0.10.0", optional = true }
|
||||
bstr = "0.1.2"
|
||||
grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
|
||||
grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
|
||||
grep-searcher = { version = "0.1.1", path = "../grep-searcher" }
|
||||
termcolor = "1.0.4"
|
||||
serde = { version = "1.0.77", optional = true }
|
||||
|
@@ -5,7 +5,6 @@ use std::path::Path;
|
||||
use std::sync::Arc;
|
||||
use std::time::Instant;
|
||||
|
||||
use bstr::BStr;
|
||||
use grep_matcher::{Match, Matcher};
|
||||
use grep_searcher::{
|
||||
LineStep, Searcher,
|
||||
@@ -17,7 +16,10 @@ use termcolor::{ColorSpec, NoColor, WriteColor};
|
||||
use color::ColorSpecs;
|
||||
use counter::CounterWriter;
|
||||
use stats::Stats;
|
||||
use util::{PrinterPath, Replacer, Sunk, trim_ascii_prefix};
|
||||
use util::{
|
||||
PrinterPath, Replacer, Sunk,
|
||||
trim_ascii_prefix, trim_ascii_prefix_range,
|
||||
};
|
||||
|
||||
/// The configuration for the standard printer.
|
||||
///
|
||||
@@ -34,7 +36,6 @@ struct Config {
|
||||
per_match: bool,
|
||||
replacement: Arc<Option<Vec<u8>>>,
|
||||
max_columns: Option<u64>,
|
||||
max_columns_preview: bool,
|
||||
max_matches: Option<u64>,
|
||||
column: bool,
|
||||
byte_offset: bool,
|
||||
@@ -58,7 +59,6 @@ impl Default for Config {
|
||||
per_match: false,
|
||||
replacement: Arc::new(None),
|
||||
max_columns: None,
|
||||
max_columns_preview: false,
|
||||
max_matches: None,
|
||||
column: false,
|
||||
byte_offset: false,
|
||||
@@ -263,21 +263,6 @@ impl StandardBuilder {
|
||||
self
|
||||
}
|
||||
|
||||
/// When enabled, if a line is found to be over the configured maximum
|
||||
/// column limit (measured in terms of bytes), then a preview of the long
|
||||
/// line will be printed instead.
|
||||
///
|
||||
/// The preview will correspond to the first `N` *grapheme clusters* of
|
||||
/// the line, where `N` is the limit configured by `max_columns`.
|
||||
///
|
||||
/// If no limit is set, then enabling this has no effect.
|
||||
///
|
||||
/// This is disabled by default.
|
||||
pub fn max_columns_preview(&mut self, yes: bool) -> &mut StandardBuilder {
|
||||
self.config.max_columns_preview = yes;
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the maximum amount of matching lines that are printed.
|
||||
///
|
||||
/// If multi line search is enabled and a match spans multiple lines, then
|
||||
@@ -758,11 +743,6 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
|
||||
stats.add_matches(self.standard.matches.len() as u64);
|
||||
stats.add_matched_lines(mat.lines().count() as u64);
|
||||
}
|
||||
if searcher.binary_detection().convert_byte().is_some() {
|
||||
if self.binary_byte_offset.is_some() {
|
||||
return Ok(false);
|
||||
}
|
||||
}
|
||||
|
||||
StandardImpl::from_match(searcher, self, mat).sink()?;
|
||||
Ok(!self.should_quit())
|
||||
@@ -784,12 +764,6 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
|
||||
self.record_matches(ctx.bytes())?;
|
||||
self.replace(ctx.bytes())?;
|
||||
}
|
||||
if searcher.binary_detection().convert_byte().is_some() {
|
||||
if self.binary_byte_offset.is_some() {
|
||||
return Ok(false);
|
||||
}
|
||||
}
|
||||
|
||||
StandardImpl::from_context(searcher, self, ctx).sink()?;
|
||||
Ok(!self.should_quit())
|
||||
}
|
||||
@@ -802,15 +776,6 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn binary_data(
|
||||
&mut self,
|
||||
_searcher: &Searcher,
|
||||
binary_byte_offset: u64,
|
||||
) -> Result<bool, io::Error> {
|
||||
self.binary_byte_offset = Some(binary_byte_offset);
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn begin(
|
||||
&mut self,
|
||||
_searcher: &Searcher,
|
||||
@@ -828,12 +793,10 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
|
||||
|
||||
fn finish(
|
||||
&mut self,
|
||||
searcher: &Searcher,
|
||||
_searcher: &Searcher,
|
||||
finish: &SinkFinish,
|
||||
) -> Result<(), io::Error> {
|
||||
if let Some(offset) = self.binary_byte_offset {
|
||||
StandardImpl::new(searcher, self).write_binary_message(offset)?;
|
||||
}
|
||||
self.binary_byte_offset = finish.binary_byte_offset();
|
||||
if let Some(stats) = self.stats.as_mut() {
|
||||
stats.add_elapsed(self.start_time.elapsed());
|
||||
stats.add_searches(1);
|
||||
@@ -1037,11 +1000,43 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
)?;
|
||||
count += 1;
|
||||
if self.exceeds_max_columns(&bytes[line]) {
|
||||
self.write_exceeded_line(bytes, line, matches, &mut midx)?;
|
||||
} else {
|
||||
self.write_colored_matches(bytes, line, matches, &mut midx)?;
|
||||
self.write_line_term()?;
|
||||
self.write_exceeded_line()?;
|
||||
continue;
|
||||
}
|
||||
if self.has_line_terminator(&bytes[line]) {
|
||||
line = line.with_end(line.end() - 1);
|
||||
}
|
||||
if self.config().trim_ascii {
|
||||
line = self.trim_ascii_prefix_range(bytes, line);
|
||||
}
|
||||
|
||||
while !line.is_empty() {
|
||||
if matches[midx].end() <= line.start() {
|
||||
if midx + 1 < matches.len() {
|
||||
midx += 1;
|
||||
continue;
|
||||
} else {
|
||||
self.end_color_match()?;
|
||||
self.write(&bytes[line])?;
|
||||
break;
|
||||
}
|
||||
}
|
||||
let m = matches[midx];
|
||||
|
||||
if line.start() < m.start() {
|
||||
let upto = cmp::min(line.end(), m.start());
|
||||
self.end_color_match()?;
|
||||
self.write(&bytes[line.with_end(upto)])?;
|
||||
line = line.with_start(upto);
|
||||
} else {
|
||||
let upto = cmp::min(line.end(), m.end());
|
||||
self.start_color_match()?;
|
||||
self.write(&bytes[line.with_end(upto)])?;
|
||||
line = line.with_start(upto);
|
||||
}
|
||||
}
|
||||
self.end_color_match()?;
|
||||
self.write_line_term()?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
@@ -1056,8 +1051,12 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
let mut stepper = LineStep::new(line_term, 0, bytes.len());
|
||||
while let Some((start, end)) = stepper.next(bytes) {
|
||||
let mut line = Match::new(start, end);
|
||||
self.trim_line_terminator(bytes, &mut line);
|
||||
self.trim_ascii_prefix(bytes, &mut line);
|
||||
if self.has_line_terminator(&bytes[line]) {
|
||||
line = line.with_end(line.end() - 1);
|
||||
}
|
||||
if self.config().trim_ascii {
|
||||
line = self.trim_ascii_prefix_range(bytes, line);
|
||||
}
|
||||
while !line.is_empty() {
|
||||
if matches[midx].end() <= line.start() {
|
||||
if midx + 1 < matches.len() {
|
||||
@@ -1080,19 +1079,14 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
Some(m.start() as u64 + 1),
|
||||
)?;
|
||||
|
||||
let this_line = line.with_end(upto);
|
||||
let buf = &bytes[line.with_end(upto)];
|
||||
line = line.with_start(upto);
|
||||
if self.exceeds_max_columns(&bytes[this_line]) {
|
||||
self.write_exceeded_line(
|
||||
bytes,
|
||||
this_line,
|
||||
matches,
|
||||
&mut midx,
|
||||
)?;
|
||||
} else {
|
||||
self.write_spec(spec, &bytes[this_line])?;
|
||||
self.write_line_term()?;
|
||||
if self.exceeds_max_columns(&buf) {
|
||||
self.write_exceeded_line()?;
|
||||
continue;
|
||||
}
|
||||
self.write_spec(spec, buf)?;
|
||||
self.write_line_term()?;
|
||||
}
|
||||
}
|
||||
count += 1;
|
||||
@@ -1123,11 +1117,15 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
)?;
|
||||
count += 1;
|
||||
if self.exceeds_max_columns(&bytes[line]) {
|
||||
self.write_exceeded_line(bytes, line, &[m], &mut 0)?;
|
||||
self.write_exceeded_line()?;
|
||||
continue;
|
||||
}
|
||||
self.trim_line_terminator(bytes, &mut line);
|
||||
self.trim_ascii_prefix(bytes, &mut line);
|
||||
if self.has_line_terminator(&bytes[line]) {
|
||||
line = line.with_end(line.end() - 1);
|
||||
}
|
||||
if self.config().trim_ascii {
|
||||
line = self.trim_ascii_prefix_range(bytes, line);
|
||||
}
|
||||
|
||||
while !line.is_empty() {
|
||||
if m.end() <= line.start() {
|
||||
@@ -1184,10 +1182,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
line: &[u8],
|
||||
) -> io::Result<()> {
|
||||
if self.exceeds_max_columns(line) {
|
||||
let range = Match::new(0, line.len());
|
||||
self.write_exceeded_line(
|
||||
line, range, self.sunk.matches(), &mut 0,
|
||||
)?;
|
||||
self.write_exceeded_line()?;
|
||||
} else {
|
||||
self.write_trim(line)?;
|
||||
if !self.has_line_terminator(line) {
|
||||
@@ -1200,114 +1195,50 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
fn write_colored_line(
|
||||
&self,
|
||||
matches: &[Match],
|
||||
bytes: &[u8],
|
||||
line: &[u8],
|
||||
) -> io::Result<()> {
|
||||
// If we know we aren't going to emit color, then we can go faster.
|
||||
let spec = self.config().colors.matched();
|
||||
if !self.wtr().borrow().supports_color() || spec.is_none() {
|
||||
return self.write_line(bytes);
|
||||
return self.write_line(line);
|
||||
}
|
||||
if self.exceeds_max_columns(line) {
|
||||
return self.write_exceeded_line();
|
||||
}
|
||||
|
||||
let line = Match::new(0, bytes.len());
|
||||
if self.exceeds_max_columns(bytes) {
|
||||
self.write_exceeded_line(bytes, line, matches, &mut 0)
|
||||
} else {
|
||||
self.write_colored_matches(bytes, line, matches, &mut 0)?;
|
||||
self.write_line_term()?;
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
/// Write the `line` portion of `bytes`, with appropriate coloring for
|
||||
/// each `match`, starting at `match_index`.
|
||||
///
|
||||
/// This accounts for trimming any whitespace prefix and will *never* print
|
||||
/// a line terminator. If a match exceeds the range specified by `line`,
|
||||
/// then only the part of the match within `line` (if any) is printed.
|
||||
fn write_colored_matches(
|
||||
&self,
|
||||
bytes: &[u8],
|
||||
mut line: Match,
|
||||
matches: &[Match],
|
||||
match_index: &mut usize,
|
||||
) -> io::Result<()> {
|
||||
self.trim_line_terminator(bytes, &mut line);
|
||||
self.trim_ascii_prefix(bytes, &mut line);
|
||||
if matches.is_empty() {
|
||||
self.write(&bytes[line])?;
|
||||
return Ok(());
|
||||
}
|
||||
while !line.is_empty() {
|
||||
if matches[*match_index].end() <= line.start() {
|
||||
if *match_index + 1 < matches.len() {
|
||||
*match_index += 1;
|
||||
continue;
|
||||
} else {
|
||||
self.end_color_match()?;
|
||||
self.write(&bytes[line])?;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
let m = matches[*match_index];
|
||||
if line.start() < m.start() {
|
||||
let upto = cmp::min(line.end(), m.start());
|
||||
self.end_color_match()?;
|
||||
self.write(&bytes[line.with_end(upto)])?;
|
||||
line = line.with_start(upto);
|
||||
let mut last_written =
|
||||
if !self.config().trim_ascii {
|
||||
0
|
||||
} else {
|
||||
let upto = cmp::min(line.end(), m.end());
|
||||
self.start_color_match()?;
|
||||
self.write(&bytes[line.with_end(upto)])?;
|
||||
line = line.with_start(upto);
|
||||
self.trim_ascii_prefix_range(
|
||||
line,
|
||||
Match::new(0, line.len()),
|
||||
).start()
|
||||
};
|
||||
for mut m in matches.iter().map(|&m| m) {
|
||||
if last_written < m.start() {
|
||||
self.end_color_match()?;
|
||||
self.write(&line[last_written..m.start()])?;
|
||||
} else if last_written < m.end() {
|
||||
m = m.with_start(last_written);
|
||||
} else {
|
||||
continue;
|
||||
}
|
||||
if !m.is_empty() {
|
||||
self.start_color_match()?;
|
||||
self.write(&line[m])?;
|
||||
}
|
||||
last_written = m.end();
|
||||
}
|
||||
self.end_color_match()?;
|
||||
self.write(&line[last_written..])?;
|
||||
if !self.has_line_terminator(line) {
|
||||
self.write_line_term()?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn write_exceeded_line(
|
||||
&self,
|
||||
bytes: &[u8],
|
||||
mut line: Match,
|
||||
matches: &[Match],
|
||||
match_index: &mut usize,
|
||||
) -> io::Result<()> {
|
||||
if self.config().max_columns_preview {
|
||||
let original = line;
|
||||
let end = BStr::new(&bytes[line])
|
||||
.grapheme_indices()
|
||||
.map(|(_, end, _)| end)
|
||||
.take(self.config().max_columns.unwrap_or(0) as usize)
|
||||
.last()
|
||||
.unwrap_or(0) + line.start();
|
||||
line = line.with_end(end);
|
||||
self.write_colored_matches(bytes, line, matches, match_index)?;
|
||||
|
||||
if matches.is_empty() {
|
||||
self.write(b" [... omitted end of long line]")?;
|
||||
} else {
|
||||
let remaining = matches
|
||||
.iter()
|
||||
.filter(|m| {
|
||||
m.start() >= line.end() && m.start() < original.end()
|
||||
})
|
||||
.count();
|
||||
let tense =
|
||||
if remaining == 1 {
|
||||
"match"
|
||||
} else {
|
||||
"matches"
|
||||
};
|
||||
write!(
|
||||
self.wtr().borrow_mut(),
|
||||
" [... {} more {}]",
|
||||
remaining, tense,
|
||||
)?;
|
||||
}
|
||||
self.write_line_term()?;
|
||||
return Ok(());
|
||||
}
|
||||
fn write_exceeded_line(&self) -> io::Result<()> {
|
||||
if self.sunk.original_matches().is_empty() {
|
||||
if self.is_context() {
|
||||
self.write(b"[Omitted long context line]")?;
|
||||
@@ -1383,38 +1314,6 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn write_binary_message(&self, offset: u64) -> io::Result<()> {
|
||||
if self.sink.match_count == 0 {
|
||||
return Ok(());
|
||||
}
|
||||
|
||||
let bin = self.searcher.binary_detection();
|
||||
if let Some(byte) = bin.quit_byte() {
|
||||
self.write(b"WARNING: stopped searching binary file ")?;
|
||||
if let Some(path) = self.path() {
|
||||
self.write_spec(self.config().colors.path(), path.as_bytes())?;
|
||||
self.write(b" ")?;
|
||||
}
|
||||
let remainder = format!(
|
||||
"after match (found {:?} byte around offset {})\n",
|
||||
BStr::new(&[byte]), offset,
|
||||
);
|
||||
self.write(remainder.as_bytes())?;
|
||||
} else if let Some(byte) = bin.convert_byte() {
|
||||
self.write(b"Binary file ")?;
|
||||
if let Some(path) = self.path() {
|
||||
self.write_spec(self.config().colors.path(), path.as_bytes())?;
|
||||
self.write(b" ")?;
|
||||
}
|
||||
let remainder = format!(
|
||||
"matches (found {:?} byte around offset {})\n",
|
||||
BStr::new(&[byte]), offset,
|
||||
);
|
||||
self.write(remainder.as_bytes())?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn write_context_separator(&self) -> io::Result<()> {
|
||||
if let Some(ref sep) = *self.config().separator_context {
|
||||
self.write(sep)?;
|
||||
@@ -1490,26 +1389,13 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
if !self.config().trim_ascii {
|
||||
return self.write(buf);
|
||||
}
|
||||
let mut range = Match::new(0, buf.len());
|
||||
self.trim_ascii_prefix(buf, &mut range);
|
||||
self.write(&buf[range])
|
||||
self.write(self.trim_ascii_prefix(buf))
|
||||
}
|
||||
|
||||
fn write(&self, buf: &[u8]) -> io::Result<()> {
|
||||
self.wtr().borrow_mut().write_all(buf)
|
||||
}
|
||||
|
||||
fn trim_line_terminator(&self, buf: &[u8], line: &mut Match) {
|
||||
let lineterm = self.searcher.line_terminator();
|
||||
if lineterm.is_suffix(&buf[*line]) {
|
||||
let mut end = line.end() - 1;
|
||||
if lineterm.is_crlf() && buf[end - 1] == b'\r' {
|
||||
end -= 1;
|
||||
}
|
||||
*line = line.with_end(end);
|
||||
}
|
||||
}
|
||||
|
||||
fn has_line_terminator(&self, buf: &[u8]) -> bool {
|
||||
self.searcher.line_terminator().is_suffix(buf)
|
||||
}
|
||||
@@ -1565,12 +1451,14 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
|
||||
///
|
||||
/// This stops trimming a prefix as soon as it sees non-whitespace or a
|
||||
/// line terminator.
|
||||
fn trim_ascii_prefix(&self, slice: &[u8], range: &mut Match) {
|
||||
if !self.config().trim_ascii {
|
||||
return;
|
||||
}
|
||||
let lineterm = self.searcher.line_terminator();
|
||||
*range = trim_ascii_prefix(lineterm, slice, *range)
|
||||
fn trim_ascii_prefix_range(&self, slice: &[u8], range: Match) -> Match {
|
||||
trim_ascii_prefix_range(self.searcher.line_terminator(), slice, range)
|
||||
}
|
||||
|
||||
/// Trim prefix ASCII spaces from the given slice and return the
|
||||
/// corresponding sub-slice.
|
||||
fn trim_ascii_prefix<'s>(&self, slice: &'s [u8]) -> &'s [u8] {
|
||||
trim_ascii_prefix(self.searcher.line_terminator(), slice)
|
||||
}
|
||||
}
|
||||
|
||||
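The trimming rule stated in the doc comment of the hunk above ("stops trimming a prefix as soon as it sees non-whitespace or a line terminator") can be sketched as a standalone function. This is a simplified stand-in, not the `util::trim_ascii_prefix` helper from the diff: it takes a single terminator byte instead of the searcher's line terminator type, and the exact set of bytes treated as ASCII whitespace is an assumption.

```rust
/// Return `slice` without its leading ASCII whitespace.
///
/// Trimming stops at the first byte that is either not whitespace or equal
/// to `line_term`, so a line terminator is never trimmed away.
/// Hypothetical helper for illustration only.
pub fn trim_ascii_prefix_sketch(line_term: u8, slice: &[u8]) -> &[u8] {
    // Assumed whitespace set: tab, LF, vertical tab, form feed, CR, space.
    let is_space = |b: u8| match b {
        b'\t' | b'\n' | 0x0B | 0x0C | b'\r' | b' ' => true,
        _ => false,
    };
    let count = slice
        .iter()
        .take_while(|&&b| b != line_term && is_space(b))
        .count();
    &slice[count..]
}
```

Checking for the terminator before the whitespace test matters because `\n` is itself ASCII whitespace. Without that check, trimming a blank line would run on into the next line.
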
@@ -2337,31 +2225,6 @@ but Doctor Watson has to have it taken out for him and dusted,
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_preview() {
|
||||
let matcher = RegexMatcher::new("exhibited|dusted").unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.max_columns(Some(46))
|
||||
.max_columns_preview(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
but Doctor Watson has to have it taken out for [... omitted end of long line]
|
||||
and exhibited clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_with_count() {
|
||||
let matcher = RegexMatcher::new("cigar|ash|dusted").unwrap();
|
||||
@@ -2387,86 +2250,6 @@ but Doctor Watson has to have it taken out for him and dusted,
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_with_count_preview_no_match() {
|
||||
let matcher = RegexMatcher::new("exhibited|has to have it").unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.stats(true)
|
||||
.max_columns(Some(46))
|
||||
.max_columns_preview(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
but Doctor Watson has to have it taken out for [... 0 more matches]
|
||||
and exhibited clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_with_count_preview_one_match() {
|
||||
let matcher = RegexMatcher::new("exhibited|dusted").unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.stats(true)
|
||||
.max_columns(Some(46))
|
||||
.max_columns_preview(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
but Doctor Watson has to have it taken out for [... 1 more match]
|
||||
and exhibited clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_with_count_preview_two_matches() {
|
||||
let matcher = RegexMatcher::new(
|
||||
"exhibited|dusted|has to have it",
|
||||
).unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.stats(true)
|
||||
.max_columns(Some(46))
|
||||
.max_columns_preview(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
but Doctor Watson has to have it taken out for [... 1 more match]
|
||||
and exhibited clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_multi_line() {
|
||||
let matcher = RegexMatcher::new("(?s)ash.+dusted").unwrap();
|
||||
@@ -2492,36 +2275,6 @@ but Doctor Watson has to have it taken out for him and dusted,
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_columns_multi_line_preview() {
|
||||
let matcher = RegexMatcher::new(
|
||||
"(?s)clew|cigar ash.+have it|exhibited",
|
||||
).unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.stats(true)
|
||||
.max_columns(Some(46))
|
||||
.max_columns_preview(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.multi_line(true)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
can extract a clew from a wisp of straw or a f [... 1 more match]
|
||||
but Doctor Watson has to have it taken out for [... 0 more matches]
|
||||
and exhibited clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn max_matches() {
|
||||
let matcher = RegexMatcher::new("Sherlock").unwrap();
|
||||
@@ -2811,40 +2564,8 @@ Holmeses, success in the province of detective work must always
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn only_matching_max_columns_preview() {
|
||||
let matcher = RegexMatcher::new("Doctor Watsons|Sherlock").unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.only_matching(true)
|
||||
.max_columns(Some(10))
|
||||
.max_columns_preview(true)
|
||||
.column(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(true)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
1:9:Doctor Wat [... 0 more matches]
|
||||
1:57:Sherlock
|
||||
3:49:Sherlock
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn only_matching_max_columns_multi_line1() {
|
||||
// The `(?s:.{0})` trick fools the matcher into thinking that it
|
||||
// can match across multiple lines without actually doing so. This is
|
||||
// so we can test multi-line handling in the case of a match on only
|
||||
// one line.
|
||||
let matcher = RegexMatcher::new(
|
||||
r"(?s:.{0})(Doctor Watsons|Sherlock)"
|
||||
).unwrap();
|
||||
@@ -2873,41 +2594,6 @@ Holmeses, success in the province of detective work must always
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn only_matching_max_columns_preview_multi_line1() {
|
||||
// The `(?s:.{0})` trick fools the matcher into thinking that it
|
||||
// can match across multiple lines without actually doing so. This is
|
||||
// so we can test multi-line handling in the case of a match on only
|
||||
// one line.
|
||||
let matcher = RegexMatcher::new(
|
||||
r"(?s:.{0})(Doctor Watsons|Sherlock)"
|
||||
).unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.only_matching(true)
|
||||
.max_columns(Some(10))
|
||||
.max_columns_preview(true)
|
||||
.column(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.multi_line(true)
|
||||
.line_number(true)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
1:9:Doctor Wat [... 0 more matches]
|
||||
1:57:Sherlock
|
||||
3:49:Sherlock
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn only_matching_max_columns_multi_line2() {
|
||||
let matcher = RegexMatcher::new(
|
||||
@@ -2939,38 +2625,6 @@ Holmeses, success in the province of detective work must always
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn only_matching_max_columns_preview_multi_line2() {
|
||||
let matcher = RegexMatcher::new(
|
||||
r"(?s)Watson.+?(Holmeses|clearly)"
|
||||
).unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.only_matching(true)
|
||||
.max_columns(Some(50))
|
||||
.max_columns_preview(true)
|
||||
.column(true)
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.multi_line(true)
|
||||
.line_number(true)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
1:16:Watsons of this world, as opposed to the Sherlock
|
||||
2:16:Holmeses
|
||||
5:12:Watson has to have it taken out for him and dusted [... 0 more matches]
|
||||
6:12:and exhibited clearly
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn per_match() {
|
||||
let matcher = RegexMatcher::new("Doctor Watsons|Sherlock").unwrap();
|
||||
@@ -3166,61 +2820,6 @@ Holmeses, success in the province of detective work must always
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn replacement_max_columns_preview1() {
|
||||
let matcher = RegexMatcher::new(r"Sherlock|Doctor (\w+)").unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.max_columns(Some(67))
|
||||
.max_columns_preview(true)
|
||||
.replacement(Some(b"doctah $1 MD".to_vec()))
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(true)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
1:For the doctah Watsons MD of this world, as opposed to the doctah [... 0 more matches]
|
||||
3:be, to a very large extent, the result of luck. doctah MD Holmes
|
||||
5:but doctah Watson MD has to have it taken out for him and dusted,
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn replacement_max_columns_preview2() {
|
||||
let matcher = RegexMatcher::new(
|
||||
"exhibited|dusted|has to have it",
|
||||
).unwrap();
|
||||
let mut printer = StandardBuilder::new()
|
||||
.max_columns(Some(43))
|
||||
.max_columns_preview(true)
|
||||
.replacement(Some(b"xxx".to_vec()))
|
||||
.build(NoColor::new(vec![]));
|
||||
SearcherBuilder::new()
|
||||
.line_number(false)
|
||||
.build()
|
||||
.search_reader(
|
||||
&matcher,
|
||||
SHERLOCK.as_bytes(),
|
||||
printer.sink(&matcher),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let got = printer_contents(&mut printer);
|
||||
let expected = "\
|
||||
but Doctor Watson xxx taken out for him and [... 1 more match]
|
||||
and xxx clearly, with a label attached.
|
||||
";
|
||||
assert_eq_printed!(expected, got);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn replacement_only_matching() {
|
||||
let matcher = RegexMatcher::new(r"Sherlock|Doctor (\w+)").unwrap();
|
||||
|
@@ -636,34 +636,6 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for SummarySink<'p, 's, M, W> {
            stats.add_bytes_searched(finish.byte_count());
            stats.add_bytes_printed(self.summary.wtr.borrow().count());
        }
        // If our binary detection method says to quit after seeing binary
        // data, then we shouldn't print any results at all, even if we've
        // found a match before detecting binary data. The intent here is to
        // keep BinaryDetection::quit as a form of filter. Otherwise, we can
        // present a matching file with a smaller number of matches than
        // there might be, which can be quite misleading.
        //
        // If our binary detection method is to convert binary data, then we
        // don't quit and therefore search the entire contents of the file.
        //
        // There is an unfortunate inconsistency here. Namely, when using
        // Quiet or PathWithMatch, then the printer can quit after the first
        // match seen, which could be long before seeing binary data. This
        // means that using PathWithMatch can print a path whereas using
        // Count might not print it at all because of binary data.
        //
        // It's not possible to fix this without also potentially significantly
        // impacting the performance of Quiet or PathWithMatch, so we accept
        // the bug.
        if self.binary_byte_offset.is_some()
            && searcher.binary_detection().quit_byte().is_some()
        {
            // Squash the match count. The statistics reported will still
            // contain the match count, but the "official" match count should
            // be zero.
            self.match_count = 0;
            return Ok(());
        }

        let show_count =
            !self.summary.config.exclude_zero
@@ -346,7 +346,7 @@ impl Serialize for NiceDuration {
///
/// This stops trimming a prefix as soon as it sees non-whitespace or a line
/// terminator.
pub fn trim_ascii_prefix(
pub fn trim_ascii_prefix_range(
    line_term: LineTerminator,
    slice: &[u8],
    range: Match,
@@ -366,3 +366,14 @@ pub fn trim_ascii_prefix(
        .count();
    range.with_start(range.start() + count)
}

/// Trim prefix ASCII spaces from the given slice and return the corresponding
/// sub-slice.
pub fn trim_ascii_prefix(line_term: LineTerminator, slice: &[u8]) -> &[u8] {
    let range = trim_ascii_prefix_range(
        line_term,
        slice,
        Match::new(0, slice.len()),
    );
    &slice[range]
}
|
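The prefix-trimming helpers in the hunk above can be illustrated with a minimal, standard-library-only sketch. It is not the crate's code: `is_ascii_space`, `trim_prefix_range`, and `trim_prefix` are hypothetical names, the line terminator is assumed to be a single byte rather than a `LineTerminator`, and plain `(start, end)` tuples stand in for `Match`.

```rust
/// Hypothetical helper: the set of bytes treated as ASCII whitespace here.
fn is_ascii_space(b: u8) -> bool {
    matches!(b, b'\t' | b'\n' | b'\x0B' | b'\x0C' | b'\r' | b' ')
}

/// Advance the start of `start..end` past leading ASCII whitespace, stopping
/// at the first non-whitespace byte or at the (assumed single-byte) line
/// terminator. The end of the range is left untouched.
fn trim_prefix_range(
    line_term: u8,
    slice: &[u8],
    start: usize,
    end: usize,
) -> (usize, usize) {
    let count = slice[start..end]
        .iter()
        .take_while(|&&b| b != line_term && is_ascii_space(b))
        .count();
    (start + count, end)
}

/// Convenience wrapper: trim the whole slice and return the sub-slice.
fn trim_prefix(line_term: u8, slice: &[u8]) -> &[u8] {
    let (start, end) = trim_prefix_range(line_term, slice, 0, slice.len());
    &slice[start..end]
}
```

Splitting the range-based function from the slice-based wrapper mirrors the shape of the diff: callers that track offsets into a larger buffer use the range form, while callers that only need the trimmed bytes use the wrapper.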
@@ -13,9 +13,8 @@ keywords = ["regex", "grep", "search", "pattern", "line"]
|
||||
license = "Unlicense/MIT"
|
||||
|
||||
[dependencies]
|
||||
aho-corasick = "0.7.3"
|
||||
grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
|
||||
log = "0.4.5"
|
||||
grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
|
||||
regex = "1.1"
|
||||
regex-syntax = "0.6.5"
|
||||
thread_local = "0.3.6"
|
||||
|
@@ -1,13 +1,12 @@
|
||||
use grep_matcher::{ByteSet, LineTerminator};
|
||||
use regex::bytes::{Regex, RegexBuilder};
|
||||
use regex_syntax::ast::{self, Ast};
|
||||
use regex_syntax::hir::{self, Hir};
|
||||
use regex_syntax::hir::Hir;
|
||||
|
||||
use ast::AstAnalysis;
|
||||
use crlf::crlfify;
|
||||
use error::Error;
|
||||
use literal::LiteralSets;
|
||||
use multi::alternation_literals;
|
||||
use non_matching::non_matching_bytes;
|
||||
use strip::strip_from_match;
|
||||
|
||||
@@ -68,17 +67,19 @@ impl Config {
|
||||
/// If there was a problem parsing the given expression then an error
|
||||
/// is returned.
|
||||
pub fn hir(&self, pattern: &str) -> Result<ConfiguredHIR, Error> {
|
||||
let ast = self.ast(pattern)?;
|
||||
let analysis = self.analysis(&ast)?;
|
||||
let expr = hir::translate::TranslatorBuilder::new()
|
||||
let analysis = self.analysis(pattern)?;
|
||||
let expr = ::regex_syntax::ParserBuilder::new()
|
||||
.nest_limit(self.nest_limit)
|
||||
.octal(self.octal)
|
||||
.allow_invalid_utf8(true)
|
||||
.case_insensitive(self.is_case_insensitive(&analysis))
|
||||
.ignore_whitespace(self.ignore_whitespace)
|
||||
.case_insensitive(self.is_case_insensitive(&analysis)?)
|
||||
.multi_line(self.multi_line)
|
||||
.dot_matches_new_line(self.dot_matches_new_line)
|
||||
.swap_greed(self.swap_greed)
|
||||
.unicode(self.unicode)
|
||||
.build()
|
||||
.translate(pattern, &ast)
|
||||
.parse(pattern)
|
||||
.map_err(Error::regex)?;
|
||||
let expr = match self.line_terminator {
|
||||
None => expr,
|
||||
@@ -98,34 +99,21 @@ impl Config {
|
||||
fn is_case_insensitive(
|
||||
&self,
|
||||
analysis: &AstAnalysis,
|
||||
) -> bool {
|
||||
) -> Result<bool, Error> {
|
||||
if self.case_insensitive {
|
||||
return true;
|
||||
return Ok(true);
|
||||
}
|
||||
if !self.case_smart {
|
||||
return false;
|
||||
return Ok(false);
|
||||
}
|
||||
analysis.any_literal() && !analysis.any_uppercase()
|
||||
}
|
||||
|
||||
/// Returns true if and only if this config is simple enough such that
|
||||
/// if the pattern is a simple alternation of literals, then it can be
|
||||
/// constructed via a plain Aho-Corasick automaton.
|
||||
///
|
||||
/// Note that it is OK to return true even when settings like `multi_line`
|
||||
/// are enabled, since if multi-line can impact the match semantics of a
|
||||
/// regex, then it is by definition not a simple alternation of literals.
|
||||
pub fn can_plain_aho_corasick(&self) -> bool {
|
||||
!self.word
|
||||
&& !self.case_insensitive
|
||||
&& !self.case_smart
|
||||
Ok(analysis.any_literal() && !analysis.any_uppercase())
|
||||
}
|
||||
|
||||
/// Perform analysis on the AST of this pattern.
|
||||
///
|
||||
/// This returns an error if the given pattern failed to parse.
|
||||
fn analysis(&self, ast: &Ast) -> Result<AstAnalysis, Error> {
|
||||
Ok(AstAnalysis::from_ast(ast))
|
||||
fn analysis(&self, pattern: &str) -> Result<AstAnalysis, Error> {
|
||||
Ok(AstAnalysis::from_ast(&self.ast(pattern)?))
|
||||
}
|
||||
|
||||
/// Parse the given pattern into its abstract syntax.
|
||||
@@ -185,15 +173,6 @@ impl ConfiguredHIR {
|
||||
self.pattern_to_regex(&self.expr.to_string())
|
||||
}
|
||||
|
||||
/// If this HIR corresponds to an alternation of literals with no
|
||||
/// capturing groups, then this returns those literals.
|
||||
pub fn alternation_literals(&self) -> Option<Vec<Vec<u8>>> {
|
||||
if !self.config.can_plain_aho_corasick() {
|
||||
return None;
|
||||
}
|
||||
alternation_literals(&self.expr)
|
||||
}
|
||||
|
||||
/// Applies the given function to the concrete syntax of this HIR and then
|
||||
/// generates a new HIR based on the result of the function in a way that
|
||||
/// preserves the configuration.
|
||||
|
@@ -76,9 +76,7 @@ impl Matcher for CRLFMatcher {
|
||||
caps: &mut RegexCaptures,
|
||||
) -> Result<bool, NoError> {
|
||||
caps.strip_crlf(false);
|
||||
let r = self.regex.captures_read_at(
|
||||
caps.locations_mut(), haystack, at,
|
||||
);
|
||||
let r = self.regex.captures_read_at(caps.locations(), haystack, at);
|
||||
if !r.is_some() {
|
||||
return Ok(false);
|
||||
}
|
||||
|
@@ -4,7 +4,6 @@ An implementation of `grep-matcher`'s `Matcher` trait for Rust's regex engine.
|
||||
|
||||
#![deny(missing_docs)]
|
||||
|
||||
extern crate aho_corasick;
|
||||
extern crate grep_matcher;
|
||||
#[macro_use]
|
||||
extern crate log;
|
||||
@@ -22,7 +21,6 @@ mod crlf;
|
||||
mod error;
|
||||
mod literal;
|
||||
mod matcher;
|
||||
mod multi;
|
||||
mod non_matching;
|
||||
mod strip;
|
||||
mod util;
|
||||
|
@@ -8,7 +8,6 @@ use regex::bytes::{CaptureLocations, Regex};
|
||||
use config::{Config, ConfiguredHIR};
|
||||
use crlf::CRLFMatcher;
|
||||
use error::Error;
|
||||
use multi::MultiLiteralMatcher;
|
||||
use word::WordMatcher;
|
||||
|
||||
/// A builder for constructing a `Matcher` using regular expressions.
|
||||
@@ -53,7 +52,7 @@ impl RegexMatcherBuilder {
|
||||
}
|
||||
|
||||
let matcher = RegexMatcherImpl::new(&chir)?;
|
||||
trace!("final regex: {:?}", matcher.regex());
|
||||
trace!("final regex: {:?}", matcher.regex().to_string());
|
||||
Ok(RegexMatcher {
|
||||
config: self.config.clone(),
|
||||
matcher: matcher,
|
||||
@@ -62,29 +61,6 @@ impl RegexMatcherBuilder {
|
||||
})
|
||||
}
|
||||
|
||||
/// Build a new matcher from a plain alternation of literals.
|
||||
///
|
||||
/// Depending on the configuration set by the builder, this may be able to
|
||||
/// build a matcher substantially faster than by joining the patterns with
|
||||
/// a `|` and calling `build`.
|
||||
pub fn build_literals<B: AsRef<str>>(
|
||||
&self,
|
||||
literals: &[B],
|
||||
) -> Result<RegexMatcher, Error> {
|
||||
let slices: Vec<_> = literals.iter().map(|s| s.as_ref()).collect();
|
||||
if !self.config.can_plain_aho_corasick() || literals.len() < 40 {
|
||||
return self.build(&slices.join("|"));
|
||||
}
|
||||
let matcher = MultiLiteralMatcher::new(&slices)?;
|
||||
let imp = RegexMatcherImpl::MultiLiteral(matcher);
|
||||
Ok(RegexMatcher {
|
||||
config: self.config.clone(),
|
||||
matcher: imp,
|
||||
fast_line_regex: None,
|
||||
non_matching_bytes: ByteSet::empty(),
|
||||
})
|
||||
}
|
||||
|
||||
/// Set the value for the case insensitive (`i`) flag.
|
||||
///
|
||||
/// When enabled, letters in the pattern will match both upper case and
|
||||
@@ -372,8 +348,6 @@ impl RegexMatcher {
|
||||
enum RegexMatcherImpl {
|
||||
/// The standard matcher used for all regular expressions.
|
||||
Standard(StandardMatcher),
|
||||
/// A matcher for an alternation of plain literals.
|
||||
MultiLiteral(MultiLiteralMatcher),
|
||||
/// A matcher that strips `\r` from the end of matches.
|
||||
///
|
||||
/// This is only used when the CRLF hack is enabled and the regex is line
|
||||
@@ -396,23 +370,16 @@ impl RegexMatcherImpl {
|
||||
} else if expr.needs_crlf_stripped() {
|
||||
Ok(RegexMatcherImpl::CRLF(CRLFMatcher::new(expr)?))
|
||||
} else {
|
||||
if let Some(lits) = expr.alternation_literals() {
|
||||
if lits.len() >= 40 {
|
||||
let matcher = MultiLiteralMatcher::new(&lits)?;
|
||||
return Ok(RegexMatcherImpl::MultiLiteral(matcher));
|
||||
}
|
||||
}
|
||||
Ok(RegexMatcherImpl::Standard(StandardMatcher::new(expr)?))
|
||||
}
|
||||
}
|
||||
|
||||
/// Return the underlying regex object used.
|
||||
fn regex(&self) -> String {
|
||||
fn regex(&self) -> &Regex {
|
||||
match *self {
|
||||
RegexMatcherImpl::Word(ref x) => x.regex().to_string(),
|
||||
RegexMatcherImpl::CRLF(ref x) => x.regex().to_string(),
|
||||
RegexMatcherImpl::MultiLiteral(_) => "<N/A>".to_string(),
|
||||
RegexMatcherImpl::Standard(ref x) => x.regex.to_string(),
|
||||
RegexMatcherImpl::Word(ref x) => x.regex(),
|
||||
RegexMatcherImpl::CRLF(ref x) => x.regex(),
|
||||
RegexMatcherImpl::Standard(ref x) => &x.regex,
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -432,7 +399,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.find_at(haystack, at),
|
||||
MultiLiteral(ref m) => m.find_at(haystack, at),
|
||||
CRLF(ref m) => m.find_at(haystack, at),
|
||||
Word(ref m) => m.find_at(haystack, at),
|
||||
}
|
||||
@@ -442,7 +408,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.new_captures(),
|
||||
MultiLiteral(ref m) => m.new_captures(),
|
||||
CRLF(ref m) => m.new_captures(),
|
||||
Word(ref m) => m.new_captures(),
|
||||
}
|
||||
@@ -452,7 +417,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.capture_count(),
|
||||
MultiLiteral(ref m) => m.capture_count(),
|
||||
CRLF(ref m) => m.capture_count(),
|
||||
Word(ref m) => m.capture_count(),
|
||||
}
|
||||
@@ -462,7 +426,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.capture_index(name),
|
||||
MultiLiteral(ref m) => m.capture_index(name),
|
||||
CRLF(ref m) => m.capture_index(name),
|
||||
Word(ref m) => m.capture_index(name),
|
||||
}
|
||||
@@ -472,7 +435,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.find(haystack),
|
||||
MultiLiteral(ref m) => m.find(haystack),
|
||||
CRLF(ref m) => m.find(haystack),
|
||||
Word(ref m) => m.find(haystack),
|
||||
}
|
||||
@@ -488,7 +450,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.find_iter(haystack, matched),
|
||||
MultiLiteral(ref m) => m.find_iter(haystack, matched),
|
||||
CRLF(ref m) => m.find_iter(haystack, matched),
|
||||
Word(ref m) => m.find_iter(haystack, matched),
|
||||
}
|
||||
@@ -504,7 +465,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.try_find_iter(haystack, matched),
|
||||
MultiLiteral(ref m) => m.try_find_iter(haystack, matched),
|
||||
CRLF(ref m) => m.try_find_iter(haystack, matched),
|
||||
Word(ref m) => m.try_find_iter(haystack, matched),
|
||||
}
|
||||
@@ -518,7 +478,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.captures(haystack, caps),
|
||||
MultiLiteral(ref m) => m.captures(haystack, caps),
|
||||
CRLF(ref m) => m.captures(haystack, caps),
|
||||
Word(ref m) => m.captures(haystack, caps),
|
||||
}
|
||||
@@ -535,7 +494,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.captures_iter(haystack, caps, matched),
|
||||
MultiLiteral(ref m) => m.captures_iter(haystack, caps, matched),
|
||||
CRLF(ref m) => m.captures_iter(haystack, caps, matched),
|
||||
Word(ref m) => m.captures_iter(haystack, caps, matched),
|
||||
}
|
||||
@@ -552,9 +510,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.try_captures_iter(haystack, caps, matched),
|
||||
MultiLiteral(ref m) => {
|
||||
m.try_captures_iter(haystack, caps, matched)
|
||||
}
|
||||
CRLF(ref m) => m.try_captures_iter(haystack, caps, matched),
|
||||
Word(ref m) => m.try_captures_iter(haystack, caps, matched),
|
||||
}
|
||||
@@ -569,7 +524,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.captures_at(haystack, at, caps),
|
||||
MultiLiteral(ref m) => m.captures_at(haystack, at, caps),
|
||||
CRLF(ref m) => m.captures_at(haystack, at, caps),
|
||||
Word(ref m) => m.captures_at(haystack, at, caps),
|
||||
}
|
||||
@@ -586,7 +540,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.replace(haystack, dst, append),
|
||||
MultiLiteral(ref m) => m.replace(haystack, dst, append),
|
||||
CRLF(ref m) => m.replace(haystack, dst, append),
|
||||
Word(ref m) => m.replace(haystack, dst, append),
|
||||
}
|
||||
@@ -606,9 +559,6 @@ impl Matcher for RegexMatcher {
|
||||
Standard(ref m) => {
|
||||
m.replace_with_captures(haystack, caps, dst, append)
|
||||
}
|
||||
MultiLiteral(ref m) => {
|
||||
m.replace_with_captures(haystack, caps, dst, append)
|
||||
}
|
||||
CRLF(ref m) => {
|
||||
m.replace_with_captures(haystack, caps, dst, append)
|
||||
}
|
||||
@@ -622,7 +572,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.is_match(haystack),
|
||||
MultiLiteral(ref m) => m.is_match(haystack),
|
||||
CRLF(ref m) => m.is_match(haystack),
|
||||
Word(ref m) => m.is_match(haystack),
|
||||
}
|
||||
@@ -636,7 +585,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.is_match_at(haystack, at),
|
||||
MultiLiteral(ref m) => m.is_match_at(haystack, at),
|
||||
CRLF(ref m) => m.is_match_at(haystack, at),
|
||||
Word(ref m) => m.is_match_at(haystack, at),
|
||||
}
|
||||
@@ -649,7 +597,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.shortest_match(haystack),
|
||||
MultiLiteral(ref m) => m.shortest_match(haystack),
|
||||
CRLF(ref m) => m.shortest_match(haystack),
|
||||
Word(ref m) => m.shortest_match(haystack),
|
||||
}
|
||||
@@ -663,7 +610,6 @@ impl Matcher for RegexMatcher {
|
||||
use self::RegexMatcherImpl::*;
|
||||
match self.matcher {
|
||||
Standard(ref m) => m.shortest_match_at(haystack, at),
|
||||
MultiLiteral(ref m) => m.shortest_match_at(haystack, at),
|
||||
CRLF(ref m) => m.shortest_match_at(haystack, at),
|
||||
Word(ref m) => m.shortest_match_at(haystack, at),
|
||||
}
|
||||
@@ -764,9 +710,7 @@ impl Matcher for StandardMatcher {
|
||||
at: usize,
|
||||
caps: &mut RegexCaptures,
|
||||
) -> Result<bool, NoError> {
|
||||
Ok(self.regex.captures_read_at(
|
||||
&mut caps.locations_mut(), haystack, at,
|
||||
).is_some())
|
||||
Ok(self.regex.captures_read_at(&mut caps.locs, haystack, at).is_some())
|
||||
}
|
||||
|
||||
fn shortest_match_at(
|
||||
@@ -793,84 +737,54 @@ impl Matcher for StandardMatcher {
|
||||
/// index of the group using the corresponding matcher's `capture_index`
|
||||
/// method, and then use that index with `RegexCaptures::get`.
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct RegexCaptures(RegexCapturesImp);
|
||||
|
||||
#[derive(Clone, Debug)]
|
||||
enum RegexCapturesImp {
|
||||
AhoCorasick {
|
||||
/// The start and end of the match, corresponding to capture group 0.
|
||||
mat: Option<Match>,
|
||||
},
|
||||
Regex {
|
||||
/// Where the locations are stored.
|
||||
locs: CaptureLocations,
|
||||
/// These captures behave as if the capturing groups begin at the given
|
||||
/// offset. When set to `0`, this has no effect and capture groups are
|
||||
/// indexed like normal.
|
||||
///
|
||||
/// This is useful when building matchers that wrap arbitrary regular
|
||||
/// expressions. For example, `WordMatcher` takes an existing regex
|
||||
/// `re` and creates `(?:^|\W)(re)(?:$|\W)`, but hides the fact that
|
||||
/// the regex has been wrapped from the caller. In order to do this,
|
||||
/// the matcher and the capturing groups must behave as if `(re)` is
|
||||
/// the `0`th capture group.
|
||||
offset: usize,
|
||||
/// When enabled, the end of a match has `\r` stripped from it, if one
|
||||
/// exists.
|
||||
strip_crlf: bool,
|
||||
},
|
||||
pub struct RegexCaptures {
|
||||
/// Where the locations are stored.
|
||||
locs: CaptureLocations,
|
||||
/// These captures behave as if the capturing groups begin at the given
|
||||
/// offset. When set to `0`, this has no effect and capture groups are
|
||||
/// indexed like normal.
|
||||
///
|
||||
/// This is useful when building matchers that wrap arbitrary regular
|
||||
/// expressions. For example, `WordMatcher` takes an existing regex `re`
|
||||
/// and creates `(?:^|\W)(re)(?:$|\W)`, but hides the fact that the regex
|
||||
/// has been wrapped from the caller. In order to do this, the matcher
|
||||
/// and the capturing groups must behave as if `(re)` is the `0`th capture
|
||||
/// group.
|
||||
offset: usize,
|
||||
/// When enabled, the end of a match has `\r` stripped from it, if one
|
||||
/// exists.
|
||||
strip_crlf: bool,
|
||||
}
|
||||
|
||||
impl Captures for RegexCaptures {
|
||||
fn len(&self) -> usize {
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { .. } => 1,
|
||||
RegexCapturesImp::Regex { ref locs, offset, .. } => {
|
||||
locs.len().checked_sub(offset).unwrap()
|
||||
}
|
||||
}
|
||||
self.locs.len().checked_sub(self.offset).unwrap()
|
||||
}
|
||||
|
||||
fn get(&self, i: usize) -> Option<Match> {
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { mat, .. } => {
|
||||
if i == 0 {
|
||||
mat
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
RegexCapturesImp::Regex { ref locs, offset, strip_crlf } => {
|
||||
if !strip_crlf {
|
||||
let actual = i.checked_add(offset).unwrap();
|
||||
return locs.pos(actual).map(|(s, e)| Match::new(s, e));
|
||||
}
|
||||
if !self.strip_crlf {
|
||||
let actual = i.checked_add(self.offset).unwrap();
|
||||
return self.locs.pos(actual).map(|(s, e)| Match::new(s, e));
|
||||
}
|
||||
|
||||
// currently don't support capture offsetting with CRLF
|
||||
// stripping
|
||||
assert_eq!(offset, 0);
|
||||
let m = match locs.pos(i).map(|(s, e)| Match::new(s, e)) {
|
||||
None => return None,
|
||||
Some(m) => m,
|
||||
};
|
||||
// If the end position of this match corresponds to the end
|
||||
// position of the overall match, then we apply our CRLF
|
||||
// stripping. Otherwise, we cannot assume stripping is correct.
|
||||
if i == 0 || m.end() == locs.pos(0).unwrap().1 {
|
||||
Some(m.with_end(m.end() - 1))
|
||||
} else {
|
||||
Some(m)
|
||||
}
|
||||
}
|
||||
// currently don't support capture offsetting with CRLF stripping
|
||||
assert_eq!(self.offset, 0);
|
||||
let m = match self.locs.pos(i).map(|(s, e)| Match::new(s, e)) {
|
||||
None => return None,
|
||||
Some(m) => m,
|
||||
};
|
||||
// If the end position of this match corresponds to the end position
|
||||
// of the overall match, then we apply our CRLF stripping. Otherwise,
|
||||
// we cannot assume stripping is correct.
|
||||
if i == 0 || m.end() == self.locs.pos(0).unwrap().1 {
|
||||
Some(m.with_end(m.end() - 1))
|
||||
} else {
|
||||
Some(m)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl RegexCaptures {
|
||||
pub(crate) fn simple() -> RegexCaptures {
|
||||
RegexCaptures(RegexCapturesImp::AhoCorasick { mat: None })
|
||||
}
|
||||
|
||||
pub(crate) fn new(locs: CaptureLocations) -> RegexCaptures {
|
||||
RegexCaptures::with_offset(locs, 0)
|
||||
}
|
||||
@@ -879,53 +793,15 @@ impl RegexCaptures {
|
||||
locs: CaptureLocations,
|
||||
offset: usize,
|
||||
) -> RegexCaptures {
|
||||
RegexCaptures(RegexCapturesImp::Regex {
|
||||
locs, offset, strip_crlf: false,
|
||||
})
|
||||
RegexCaptures { locs, offset, strip_crlf: false }
|
||||
}
|
||||
|
||||
pub(crate) fn locations(&self) -> &CaptureLocations {
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { .. } => {
|
||||
panic!("getting locations for simple captures is invalid")
|
||||
}
|
||||
RegexCapturesImp::Regex { ref locs, .. } => {
|
||||
locs
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub(crate) fn locations_mut(&mut self) -> &mut CaptureLocations {
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { .. } => {
|
||||
panic!("getting locations for simple captures is invalid")
|
||||
}
|
||||
RegexCapturesImp::Regex { ref mut locs, .. } => {
|
||||
locs
|
||||
}
|
||||
}
|
||||
pub(crate) fn locations(&mut self) -> &mut CaptureLocations {
|
||||
&mut self.locs
|
||||
}
|
||||
|
||||
pub(crate) fn strip_crlf(&mut self, yes: bool) {
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { .. } => {
|
||||
panic!("setting strip_crlf for simple captures is invalid")
|
||||
}
|
||||
RegexCapturesImp::Regex { ref mut strip_crlf, .. } => {
|
||||
*strip_crlf = yes;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub(crate) fn set_simple(&mut self, one: Option<Match>) {
|
||||
match self.0 {
|
||||
RegexCapturesImp::AhoCorasick { ref mut mat } => {
|
||||
*mat = one;
|
||||
}
|
||||
RegexCapturesImp::Regex { .. } => {
|
||||
panic!("setting simple captures for regex is invalid")
|
||||
}
|
||||
}
|
||||
self.strip_crlf = yes;
|
||||
}
|
||||
}
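The offset and CRLF-stripping behaviour documented on `RegexCaptures` can be sketched without the `regex` crate. This is a simplified illustration under stated assumptions: `OffsetCaptures` is a hypothetical type, a `Vec<Option<(usize, usize)>>` stands in for `CaptureLocations`, and spans are plain tuples rather than `Match` values. As in the diff, the stripping branch assumes the caller only enables it when the overall match really ends in `\r`.

```rust
/// Hypothetical stand-in for the captures type: one optional (start, end)
/// span per group, with group 0 being the overall match.
struct OffsetCaptures {
    locs: Vec<Option<(usize, usize)>>,
    /// Groups are reported as if numbering began at this index.
    offset: usize,
    /// When true, a trailing `\r` is dropped from spans that end where the
    /// overall match ends.
    strip_crlf: bool,
}

impl OffsetCaptures {
    /// Number of groups visible to the caller after hiding `offset` groups.
    fn len(&self) -> usize {
        self.locs.len().checked_sub(self.offset).unwrap()
    }

    fn get(&self, i: usize) -> Option<(usize, usize)> {
        if !self.strip_crlf {
            // Shift the requested index past the hidden wrapper groups.
            let actual = i.checked_add(self.offset).unwrap();
            return self.locs.get(actual).copied().flatten();
        }
        // Offsetting combined with CRLF stripping is not supported.
        assert_eq!(self.offset, 0);
        let (start, end) = self.locs.get(i).copied().flatten()?;
        let overall_end = self.locs[0].unwrap().1;
        // Only spans that reach the end of the overall match lose the `\r`.
        if i == 0 || end == overall_end {
            Some((start, end - 1))
        } else {
            Some((start, end))
        }
    }
}
```

With `offset` set to 1, a wrapped pattern such as `(?:^|\W)(re)(?:$|\W)` reports its inner group as group 0, which is the effect the doc comment describes for `WordMatcher`.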
|
||||
|
||||
|
@@ -1,127 +0,0 @@
|
||||
use aho_corasick::{AhoCorasick, AhoCorasickBuilder, MatchKind};
|
||||
use grep_matcher::{Matcher, Match, NoError};
|
||||
use regex_syntax::hir::Hir;
|
||||
|
||||
use error::Error;
|
||||
use matcher::RegexCaptures;
|
||||
|
||||
/// A matcher for an alternation of literals.
|
||||
///
|
||||
/// Ideally, this optimization would be pushed down into the regex engine, but
|
||||
/// making this work correctly there would require quite a bit of refactoring.
|
||||
/// Moreover, doing it one layer above lets us do things like, "if we
|
||||
/// specifically only want to search for literals, then don't bother with
|
||||
/// regex parsing at all."
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct MultiLiteralMatcher {
|
||||
/// The Aho-Corasick automaton.
|
||||
ac: AhoCorasick,
|
||||
}
|
||||
|
||||
impl MultiLiteralMatcher {
|
||||
/// Create a new multi-literal matcher from the given literals.
|
||||
pub fn new<B: AsRef<[u8]>>(
|
||||
literals: &[B],
|
||||
) -> Result<MultiLiteralMatcher, Error> {
|
||||
let ac = AhoCorasickBuilder::new()
|
||||
.match_kind(MatchKind::LeftmostFirst)
|
||||
.auto_configure(literals)
|
||||
.build_with_size::<usize, _, _>(literals)
|
||||
.map_err(Error::regex)?;
|
||||
Ok(MultiLiteralMatcher { ac })
|
||||
}
|
||||
}
|
||||
|
||||
impl Matcher for MultiLiteralMatcher {
|
||||
type Captures = RegexCaptures;
|
||||
type Error = NoError;
|
||||
|
||||
fn find_at(
|
||||
&self,
|
||||
haystack: &[u8],
|
||||
at: usize,
|
||||
) -> Result<Option<Match>, NoError> {
|
||||
match self.ac.find(&haystack[at..]) {
|
||||
None => Ok(None),
|
||||
Some(m) => Ok(Some(Match::new(at + m.start(), at + m.end()))),
|
||||
}
|
||||
}
|
||||
|
||||
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
|
||||
Ok(RegexCaptures::simple())
|
||||
}
|
||||
|
||||
fn capture_count(&self) -> usize {
|
||||
1
|
||||
}
|
||||
|
||||
fn capture_index(&self, _: &str) -> Option<usize> {
|
||||
None
|
||||
}
|
||||
|
||||
fn captures_at(
|
||||
&self,
|
||||
haystack: &[u8],
|
||||
at: usize,
|
||||
caps: &mut RegexCaptures,
|
||||
) -> Result<bool, NoError> {
|
||||
caps.set_simple(None);
|
||||
let mat = self.find_at(haystack, at)?;
|
||||
caps.set_simple(mat);
|
||||
Ok(mat.is_some())
|
||||
}
|
||||
|
||||
// We specifically do not implement other methods like find_iter. Namely,
|
||||
// the iter methods are guaranteed to be correct by virtue of implementing
|
||||
// find_at above.
|
||||
}
|
||||
|
||||
/// Alternation literals checks if the given HIR is a simple alternation of
|
||||
/// literals, and if so, returns them. Otherwise, this returns None.
|
||||
pub fn alternation_literals(expr: &Hir) -> Option<Vec<Vec<u8>>> {
|
||||
use regex_syntax::hir::{HirKind, Literal};
|
||||
|
||||
// This is pretty hacky, but basically, if `is_alternation_literal` is
|
||||
// true, then we can make several assumptions about the structure of our
|
||||
// HIR. This is what justifies the `unreachable!` statements below.
|
||||
|
||||
if !expr.is_alternation_literal() {
|
||||
return None;
|
||||
}
|
||||
let alts = match *expr.kind() {
|
||||
HirKind::Alternation(ref alts) => alts,
|
||||
_ => return None, // one literal isn't worth it
|
||||
};
|
||||
|
||||
let extendlit = |lit: &Literal, dst: &mut Vec<u8>| {
|
||||
match *lit {
|
||||
Literal::Unicode(c) => {
|
||||
let mut buf = [0; 4];
|
||||
dst.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
|
||||
}
|
||||
Literal::Byte(b) => {
|
||||
dst.push(b);
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
let mut lits = vec![];
|
||||
for alt in alts {
|
||||
let mut lit = vec![];
|
||||
match *alt.kind() {
|
||||
HirKind::Empty => {}
|
||||
HirKind::Literal(ref x) => extendlit(x, &mut lit),
|
||||
HirKind::Concat(ref exprs) => {
|
||||
for e in exprs {
|
||||
match *e.kind() {
|
||||
HirKind::Literal(ref x) => extendlit(x, &mut lit),
|
||||
_ => unreachable!("expected literal, got {:?}", e),
|
||||
}
|
||||
}
|
||||
}
|
||||
_ => unreachable!("expected literal or concat, got {:?}", alt),
|
||||
}
|
||||
lits.push(lit);
|
||||
}
|
||||
Some(lits)
|
||||
}
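The `find_at` method of the removed `MultiLiteralMatcher` searches the haystack from `at` and then shifts the reported span back by `at`. A minimal sketch of that offset arithmetic follows, with a naive scan swapped in for the Aho-Corasick automaton. `find_leftmost_first` and this free-standing `find_at` are hypothetical names, and the scan only approximates leftmost-first semantics (earliest start position, then the earliest-listed literal); it is far slower than a real automaton.

```rust
/// Naive stand-in for a leftmost-first multi-literal search: try every start
/// position in order, and at each position try the literals in the order
/// they were given. Returns the (start, end) span of the first hit.
fn find_leftmost_first(
    literals: &[&[u8]],
    haystack: &[u8],
) -> Option<(usize, usize)> {
    for start in 0..=haystack.len() {
        for lit in literals {
            if haystack[start..].starts_with(lit) {
                return Some((start, start + lit.len()));
            }
        }
    }
    None
}

/// Search the suffix beginning at `at`, then translate the span back into
/// offsets relative to the full haystack.
fn find_at(
    literals: &[&[u8]],
    haystack: &[u8],
    at: usize,
) -> Option<(usize, usize)> {
    find_leftmost_first(literals, &haystack[at..])
        .map(|(s, e)| (at + s, at + e))
}
```

Because the iterator-style methods can be derived from repeated `find_at` calls, getting this offset translation right is the main correctness requirement, which matches the comment in the removed file about not implementing `find_iter` separately.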
|
@@ -103,9 +103,7 @@ impl Matcher for WordMatcher {
|
||||
at: usize,
|
||||
caps: &mut RegexCaptures,
|
||||
) -> Result<bool, NoError> {
|
||||
let r = self.regex.captures_read_at(
|
||||
caps.locations_mut(), haystack, at,
|
||||
);
|
||||
let r = self.regex.captures_read_at(caps.locations(), haystack, at);
|
||||
Ok(r.is_some())
|
||||
}
|
||||
|
||||
|
@@ -17,7 +17,7 @@ bstr = { version = "0.1.2", default-features = false, features = ["std"] }
|
||||
bytecount = "0.5"
|
||||
encoding_rs = "0.8.14"
|
||||
encoding_rs_io = "0.1.6"
|
||||
grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
|
||||
grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
|
||||
log = "0.4.5"
|
||||
memmap = "0.7"
|
||||
|
||||
|
@@ -317,14 +317,6 @@ pub struct LineBuffer {
|
||||
}
|
||||
|
||||
impl LineBuffer {
|
||||
/// Set the binary detection method used on this line buffer.
|
||||
///
|
||||
/// This permits dynamically changing the binary detection strategy on
|
||||
/// an existing line buffer without needing to create a new one.
|
||||
pub fn set_binary_detection(&mut self, binary: BinaryDetection) {
|
||||
self.config.binary = binary;
|
||||
}
|
||||
|
||||
/// Reset this buffer, such that it can be used with a new reader.
|
||||
fn clear(&mut self) {
|
||||
self.pos = 0;
|
||||
|
@@ -90,13 +90,6 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
self.sink_matched(buf, range)
|
||||
}
|
||||
|
||||
pub fn binary_data(
|
||||
&mut self,
|
||||
binary_byte_offset: u64,
|
||||
) -> Result<bool, S::Error> {
|
||||
self.sink.binary_data(&self.searcher, binary_byte_offset)
|
||||
}
|
||||
|
||||
pub fn begin(&mut self) -> Result<bool, S::Error> {
|
||||
self.sink.begin(&self.searcher)
|
||||
}
|
||||
@@ -148,28 +141,19 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
consumed
|
||||
}
|
||||
|
||||
pub fn detect_binary(
|
||||
&mut self,
|
||||
buf: &[u8],
|
||||
range: &Range,
|
||||
) -> Result<bool, S::Error> {
|
||||
pub fn detect_binary(&mut self, buf: &[u8], range: &Range) -> bool {
|
||||
if self.binary_byte_offset.is_some() {
|
||||
return Ok(self.config.binary.quit_byte().is_some());
|
||||
return true;
|
||||
}
|
||||
let binary_byte = match self.config.binary.0 {
|
||||
BinaryDetection::Quit(b) => b,
|
||||
BinaryDetection::Convert(b) => b,
|
||||
_ => return Ok(false),
|
||||
_ => return false,
|
||||
};
|
||||
if let Some(i) = B(&buf[*range]).find_byte(binary_byte) {
|
||||
let offset = range.start() + i;
|
||||
self.binary_byte_offset = Some(offset);
|
||||
if !self.binary_data(offset as u64)? {
|
||||
return Ok(true);
|
||||
}
|
||||
Ok(self.config.binary.quit_byte().is_some())
|
||||
self.binary_byte_offset = Some(range.start() + i);
|
||||
true
|
||||
} else {
|
||||
Ok(false)
|
||||
false
|
||||
}
|
||||
}
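As a rough illustration of the simpler, `bool`-returning form of `detect_binary` in this hunk, the sketch below scans a range of a buffer for the configured byte and remembers the first offset. `Detector` and this `BinaryDetection` enum are hypothetical simplifications, not the types in `grep-searcher`.

```rust
use std::ops::Range;

/// Simplified stand-in for the line buffer's detection mode (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BinaryDetection {
    None,
    Quit(u8),
    Convert(u8),
}

/// Hypothetical holder for the state that `detect_binary` reads and writes.
#[derive(Debug)]
pub struct Detector {
    pub mode: BinaryDetection,
    pub binary_byte_offset: Option<usize>,
}

impl Detector {
    pub fn new(mode: BinaryDetection) -> Detector {
        Detector { mode, binary_byte_offset: None }
    }

    /// Returns true if binary data has been seen, now or on an earlier call.
    pub fn detect_binary(&mut self, buf: &[u8], range: Range<usize>) -> bool {
        // Once an offset is recorded, the answer stays `true`.
        if self.binary_byte_offset.is_some() {
            return true;
        }
        let binary_byte = match self.mode {
            BinaryDetection::Quit(b) | BinaryDetection::Convert(b) => b,
            BinaryDetection::None => return false,
        };
        let start = range.start;
        match buf[range].iter().position(|&b| b == binary_byte) {
            Some(i) => {
                // Store the offset relative to the whole buffer.
                self.binary_byte_offset = Some(start + i);
                true
            }
            None => false,
        }
    }
}
```

The recorded offset is relative to the start of `buf`, which is why the sketch adds `range.start` back in, as the hunk does with `range.start() + i`.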
|
||||
|
||||
@@ -432,7 +416,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
buf: &[u8],
|
||||
range: &Range,
|
||||
) -> Result<bool, S::Error> {
|
||||
if self.binary && self.detect_binary(buf, range)? {
|
||||
if self.binary && self.detect_binary(buf, range) {
|
||||
return Ok(false);
|
||||
}
|
||||
if !self.sink_break_context(range.start())? {
|
||||
@@ -464,7 +448,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
buf: &[u8],
|
||||
range: &Range,
|
||||
) -> Result<bool, S::Error> {
|
||||
if self.binary && self.detect_binary(buf, range)? {
|
||||
if self.binary && self.detect_binary(buf, range) {
|
||||
return Ok(false);
|
||||
}
|
||||
self.count_lines(buf, range.start());
|
||||
@@ -494,7 +478,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
) -> Result<bool, S::Error> {
|
||||
assert!(self.after_context_left >= 1);
|
||||
|
||||
if self.binary && self.detect_binary(buf, range)? {
|
||||
if self.binary && self.detect_binary(buf, range) {
|
||||
return Ok(false);
|
||||
}
|
||||
self.count_lines(buf, range.start());
|
||||
@@ -523,7 +507,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
|
||||
buf: &[u8],
|
||||
range: &Range,
|
||||
) -> Result<bool, S::Error> {
|
||||
if self.binary && self.detect_binary(buf, range)? {
|
||||
if self.binary && self.detect_binary(buf, range) {
|
||||
return Ok(false);
|
||||
}
|
||||
self.count_lines(buf, range.start());
|
||||
|
@@ -51,7 +51,6 @@ where M: Matcher,
|
||||
fn fill(&mut self) -> Result<bool, S::Error> {
|
||||
assert!(self.rdr.buffer()[self.core.pos()..].is_empty());
|
||||
|
||||
let already_binary = self.rdr.binary_byte_offset().is_some();
|
||||
let old_buf_len = self.rdr.buffer().len();
|
||||
let consumed = self.core.roll(self.rdr.buffer());
|
||||
self.rdr.consume(consumed);
|
||||
@@ -59,14 +58,7 @@ where M: Matcher,
|
||||
Err(err) => return Err(S::Error::error_io(err)),
|
||||
Ok(didread) => didread,
|
||||
};
|
||||
if !already_binary {
|
||||
if let Some(offset) = self.rdr.binary_byte_offset() {
|
||||
if !self.core.binary_data(offset)? {
|
||||
return Ok(false);
|
||||
}
|
||||
}
|
||||
}
|
||||
if !didread || self.should_binary_quit() {
|
||||
if !didread || self.rdr.binary_byte_offset().is_some() {
|
||||
return Ok(false);
|
||||
}
|
||||
// If rolling the buffer didn't result in consuming anything and if
|
||||
@@ -79,11 +71,6 @@ where M: Matcher,
|
||||
}
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn should_binary_quit(&self) -> bool {
|
||||
self.rdr.binary_byte_offset().is_some()
|
||||
&& self.config.binary.quit_byte().is_some()
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug)]
|
||||
@@ -116,7 +103,7 @@ impl<'s, M: Matcher, S: Sink> SliceByLine<'s, M, S> {
|
||||
DEFAULT_BUFFER_CAPACITY,
|
||||
);
|
||||
let binary_range = Range::new(0, binary_upto);
|
||||
if !self.core.detect_binary(self.slice, &binary_range)? {
|
||||
if !self.core.detect_binary(self.slice, &binary_range) {
|
||||
while
|
||||
!self.slice[self.core.pos()..].is_empty()
|
||||
&& self.core.match_by_line(self.slice)?
|
||||
@@ -168,7 +155,7 @@ impl<'s, M: Matcher, S: Sink> MultiLine<'s, M, S> {
|
||||
DEFAULT_BUFFER_CAPACITY,
|
||||
);
|
||||
let binary_range = Range::new(0, binary_upto);
|
||||
if !self.core.detect_binary(self.slice, &binary_range)? {
|
||||
if !self.core.detect_binary(self.slice, &binary_range) {
|
||||
let mut keepgoing = true;
|
||||
while !self.slice[self.core.pos()..].is_empty() && keepgoing {
|
||||
keepgoing = self.sink()?;
|
||||
|
@@ -75,41 +75,25 @@ impl BinaryDetection {
|
||||
BinaryDetection(line_buffer::BinaryDetection::Quit(binary_byte))
|
||||
}
|
||||
|
||||
/// Binary detection is performed by looking for the given byte, and
|
||||
/// replacing it with the line terminator configured on the searcher.
|
||||
/// (If the searcher is configured to use `CRLF` as the line terminator,
|
||||
/// then this byte is replaced by just `LF`.)
|
||||
///
|
||||
/// When searching is performed using a fixed size buffer, then the
|
||||
/// contents of that buffer are always searched for the presence of this
|
||||
/// byte and replaced with the line terminator. In effect, the caller is
|
||||
/// guaranteed to never observe this byte while searching.
|
||||
///
|
||||
/// When searching is performed with the entire contents mapped into
|
||||
/// memory, then this setting has no effect and is ignored.
|
||||
pub fn convert(binary_byte: u8) -> BinaryDetection {
|
||||
// TODO(burntsushi): Figure out how to make binary conversion work. This
|
||||
// permits implementing GNU grep's default behavior, which is to zap NUL
|
||||
// bytes but still execute a search (if a match is detected, then GNU grep
|
||||
// stops and reports that a match was found but doesn't print the matching
|
||||
// line itself).
|
||||
//
|
||||
// This behavior is pretty simple to implement using the line buffer (and
|
||||
// in fact, it is already implemented and tested), since there's a fixed
|
||||
// size buffer that we can easily write to. The issue arises when searching
|
||||
// a `&[u8]` (whether on the heap or via a memory map), since this isn't
|
||||
// something we can easily write to.
|
||||
|
||||
/// The given byte is searched in all contents read by the line buffer. If
|
||||
/// it occurs, then it is replaced by the line terminator. The line buffer
|
||||
/// guarantees that this byte will never be observable by callers.
|
||||
#[allow(dead_code)]
|
||||
fn convert(binary_byte: u8) -> BinaryDetection {
|
||||
BinaryDetection(line_buffer::BinaryDetection::Convert(binary_byte))
|
||||
}
|
||||
|
||||
/// If this binary detection uses the "quit" strategy, then this returns
|
||||
/// the byte that will cause a search to quit. In any other case, this
|
||||
/// returns `None`.
|
||||
pub fn quit_byte(&self) -> Option<u8> {
|
||||
match self.0 {
|
||||
line_buffer::BinaryDetection::Quit(b) => Some(b),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
/// If this binary detection uses the "convert" strategy, then this returns
|
||||
/// the byte that will be replaced by the line terminator. In any other
|
||||
/// case, this returns `None`.
|
||||
pub fn convert_byte(&self) -> Option<u8> {
|
||||
match self.0 {
|
||||
line_buffer::BinaryDetection::Convert(b) => Some(b),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
}
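The "convert" strategy described in the doc comments replaces every occurrence of the binary byte with the line terminator in a writable buffer. A minimal sketch of that replacement, assuming a plain mutable slice (the function name `replace_bytes` is illustrative only):

```rust
/// Replace every `binary_byte` in `buf` with `line_term` and return the
/// offset of the first replacement, if any. This only works on a writable
/// buffer, which is why a shared `&[u8]` (heap or memory map) is the hard
/// case mentioned in the TODO comment.
pub fn replace_bytes(buf: &mut [u8], binary_byte: u8, line_term: u8) -> Option<usize> {
    let mut first = None;
    for (i, b) in buf.iter_mut().enumerate() {
        if *b == binary_byte {
            *b = line_term;
            if first.is_none() {
                first = Some(i);
            }
        }
    }
    first
}
```

After the call, a caller scanning `buf` can no longer observe `binary_byte` (as long as it differs from `line_term`), which matches the guarantee stated in the doc comment.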
|
||||
|
||||
/// An encoding to use when searching.
|
||||
@@ -755,12 +739,6 @@ impl Searcher {
|
||||
}
|
||||
}
|
||||
|
||||
/// Set the binary detection method used on this searcher.
|
||||
pub fn set_binary_detection(&mut self, detection: BinaryDetection) {
|
||||
self.config.binary = detection.clone();
|
||||
self.line_buffer.borrow_mut().set_binary_detection(detection.0);
|
||||
}
|
||||
|
||||
/// Check that the searcher's configuration and the matcher are consistent
|
||||
/// with each other.
|
||||
fn check_config<M: Matcher>(&self, matcher: M) -> Result<(), ConfigError> {
|
||||
@@ -800,12 +778,6 @@ impl Searcher {
|
||||
self.config.line_term
|
||||
}
|
||||
|
||||
/// Returns the type of binary detection configured on this searcher.
|
||||
#[inline]
|
||||
pub fn binary_detection(&self) -> &BinaryDetection {
|
||||
&self.config.binary
|
||||
}
|
||||
|
||||
/// Returns true if and only if this searcher is configured to invert its
|
||||
/// search results. That is, matching lines are lines that do **not** match
|
||||
/// the searcher's matcher.
|
||||
|
@@ -167,28 +167,6 @@ pub trait Sink {
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
    /// This method is called whenever binary detection is enabled and binary
    /// data is found. If binary data is found, then this is called at least
    /// once for the first occurrence with the absolute byte offset at which
    /// the binary data begins.
    ///
    /// If this returns `true`, then searching continues. If this returns
    /// `false`, then searching is stopped immediately and `finish` is called.
    ///
    /// If this returns an error, then searching is stopped immediately,
    /// `finish` is not called and the error is bubbled back up to the caller
    /// of the searcher.
    ///
    /// By default, it does nothing and returns `true`.
    #[inline]
    fn binary_data(
        &mut self,
        _searcher: &Searcher,
        _binary_byte_offset: u64,
    ) -> Result<bool, Self::Error> {
        Ok(true)
    }
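The `binary_data` callback follows a common pattern: a trait method with a default body that returns `Ok(true)`, plus forwarding impls so that `&mut S` also acts as a sink. The sketch below shows that pattern on a trimmed-down, hypothetical `Sink` trait. The real trait also receives a `&Searcher` and has more methods.

```rust
/// Hypothetical, trimmed-down sink trait for illustration.
pub trait Sink {
    type Error;

    /// Called for each matching line. Return `false` to stop searching.
    fn matched(&mut self, line: &[u8]) -> Result<bool, Self::Error>;

    /// Called when binary data is first seen. The default keeps searching.
    fn binary_data(&mut self, _binary_byte_offset: u64) -> Result<bool, Self::Error> {
        Ok(true)
    }
}

/// Forwarding impl: a mutable reference to a sink is itself a sink.
impl<'a, S: Sink> Sink for &'a mut S {
    type Error = S::Error;

    fn matched(&mut self, line: &[u8]) -> Result<bool, S::Error> {
        (**self).matched(line)
    }

    fn binary_data(&mut self, binary_byte_offset: u64) -> Result<bool, S::Error> {
        (**self).binary_data(binary_byte_offset)
    }
}

/// A sink that relies on the default `binary_data`.
pub struct CountLines(pub usize);

impl Sink for CountLines {
    type Error = std::io::Error;

    fn matched(&mut self, _line: &[u8]) -> Result<bool, Self::Error> {
        self.0 += 1;
        Ok(true)
    }
}

/// A sink that overrides `binary_data` to stop at the first binary byte.
pub struct StopOnBinary {
    pub lines: usize,
    pub binary_at: Option<u64>,
}

impl Sink for StopOnBinary {
    type Error = std::io::Error;

    fn matched(&mut self, _line: &[u8]) -> Result<bool, Self::Error> {
        self.lines += 1;
        Ok(true)
    }

    fn binary_data(&mut self, binary_byte_offset: u64) -> Result<bool, Self::Error> {
        self.binary_at = Some(binary_byte_offset);
        Ok(false)
    }
}
```

Without the forwarding method on `&mut S`, a call through a reference would fall back to the default body and silently ignore the override, which is presumably why the diff touches the `&'a mut S` and `Box<S>` impls as well.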
|
||||
|
||||
/// This method is called when a search has begun, before any search is
|
||||
/// executed. By default, this does nothing.
|
||||
///
|
||||
@@ -250,15 +228,6 @@ impl<'a, S: Sink> Sink for &'a mut S {
|
||||
(**self).context_break(searcher)
|
||||
}
|
||||
|
||||
#[inline]
|
||||
fn binary_data(
|
||||
&mut self,
|
||||
searcher: &Searcher,
|
||||
binary_byte_offset: u64,
|
||||
) -> Result<bool, S::Error> {
|
||||
(**self).binary_data(searcher, binary_byte_offset)
|
||||
}
|
||||
|
||||
#[inline]
|
||||
fn begin(
|
||||
&mut self,
|
||||
@@ -306,15 +275,6 @@ impl<S: Sink + ?Sized> Sink for Box<S> {
|
||||
(**self).context_break(searcher)
|
||||
}
|
||||
|
||||
#[inline]
|
||||
fn binary_data(
|
||||
&mut self,
|
||||
searcher: &Searcher,
|
||||
binary_byte_offset: u64,
|
||||
) -> Result<bool, S::Error> {
|
||||
(**self).binary_data(searcher, binary_byte_offset)
|
||||
}
|
||||
|
||||
#[inline]
|
||||
fn begin(
|
||||
&mut self,
|
||||
|
@@ -14,7 +14,7 @@ license = "Unlicense/MIT"
|
||||
|
||||
[dependencies]
|
||||
grep-cli = { version = "0.1.1", path = "../grep-cli" }
|
||||
grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
|
||||
grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
|
||||
grep-pcre2 = { version = "0.1.2", path = "../grep-pcre2", optional = true }
|
||||
grep-printer = { version = "0.1.1", path = "../grep-printer" }
|
||||
grep-regex = { version = "0.1.1", path = "../grep-regex" }
|
||||
|
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "ignore"
|
||||
version = "0.4.7" #:version
|
||||
version = "0.4.6" #:version
|
||||
authors = ["Andrew Gallant <jamslam@gmail.com>"]
|
||||
description = """
|
||||
A fast library for efficiently matching ignore files such as `.gitignore`
|
||||
@@ -19,7 +19,7 @@ bench = false
|
||||
|
||||
[dependencies]
|
||||
crossbeam-channel = "0.3.6"
|
||||
globset = { version = "0.4.3", path = "../globset" }
|
||||
globset = { version = "0.4.2", path = "../globset" }
|
||||
lazy_static = "1.1"
|
||||
log = "0.4.5"
|
||||
memchr = "2.1"
|
||||
|
@@ -111,7 +111,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
("brotli", &["*.br"]),
|
||||
("buildstream", &["*.bst"]),
|
||||
("bzip2", &["*.bz2", "*.tbz2"]),
|
||||
("c", &["*.[chH]", "*.[chH].in", "*.cats"]),
|
||||
("c", &["*.c", "*.h", "*.H", "*.cats"]),
|
||||
("cabal", &["*.cabal"]),
|
||||
("cbor", &["*.cbor"]),
|
||||
("ceylon", &["*.ceylon"]),
|
||||
@@ -121,8 +121,8 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
("creole", &["*.creole"]),
|
||||
("config", &["*.cfg", "*.conf", "*.config", "*.ini"]),
|
||||
("cpp", &[
|
||||
"*.[ChH]", "*.cc", "*.[ch]pp", "*.[ch]xx", "*.hh", "*.inl",
|
||||
"*.[ChH].in", "*.cc.in", "*.[ch]pp.in", "*.[ch]xx.in", "*.hh.in",
|
||||
"*.C", "*.cc", "*.cpp", "*.cxx",
|
||||
"*.h", "*.H", "*.hh", "*.hpp", "*.hxx", "*.inl",
|
||||
]),
|
||||
("crystal", &["Projectfile", "*.cr"]),
|
||||
("cs", &["*.cs"]),
|
||||
@@ -156,7 +156,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
("hs", &["*.hs", "*.lhs"]),
|
||||
("html", &["*.htm", "*.html", "*.ejs"]),
|
||||
("idris", &["*.idr", "*.lidr"]),
|
||||
("java", &["*.java", "*.jsp", "*.jspx", "*.properties"]),
|
||||
("java", &["*.java", "*.jsp"]),
|
||||
("jinja", &["*.j2", "*.jinja", "*.jinja2"]),
|
||||
("js", &[
|
||||
"*.js", "*.jsx", "*.vue",
|
||||
@@ -196,16 +196,14 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
"OFL-*[0-9]*",
|
||||
]),
|
||||
("lisp", &["*.el", "*.jl", "*.lisp", "*.lsp", "*.sc", "*.scm"]),
|
||||
("lock", &["*.lock", "package-lock.json"]),
|
||||
("log", &["*.log"]),
|
||||
("lua", &["*.lua"]),
|
||||
("lzma", &["*.lzma"]),
|
||||
("lz4", &["*.lz4"]),
|
||||
("m4", &["*.ac", "*.m4"]),
|
||||
("make", &[
|
||||
"[Gg][Nn][Uu]makefile", "[Mm]akefile",
|
||||
"[Gg][Nn][Uu]makefile.am", "[Mm]akefile.am",
|
||||
"[Gg][Nn][Uu]makefile.in", "[Mm]akefile.in",
|
||||
"gnumakefile", "Gnumakefile", "GNUmakefile",
|
||||
"makefile", "Makefile",
|
||||
"*.mk", "*.mak"
|
||||
]),
|
||||
("mako", &["*.mako", "*.mao"]),
|
||||
@@ -301,10 +299,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
|
||||
("vimscript", &["*.vim"]),
|
||||
("wiki", &["*.mediawiki", "*.wiki"]),
|
||||
("webidl", &["*.idl", "*.webidl", "*.widl"]),
|
||||
("xml", &[
|
||||
"*.xml", "*.xml.dist", "*.dtd", "*.xsl", "*.xslt", "*.xsd", "*.xjb",
|
||||
"*.rng", "*.sch",
|
||||
]),
|
||||
("xml", &["*.xml", "*.xml.dist"]),
|
||||
("xz", &["*.xz", "*.txz"]),
|
||||
("yacc", &["*.y"]),
|
||||
("yaml", &["*.yaml", "*.yml"]),
|
||||
|
171
src/app.rs
@@ -27,9 +27,6 @@ configuration file. The file can specify one shell argument per line. Lines
|
||||
starting with '#' are ignored. For more details, see the man page or the
|
||||
README.
|
||||
|
||||
Tip: to disable all smart filtering and make ripgrep behave a bit more like
|
||||
classical grep, use 'rg -uuu'.
|
||||
|
||||
Project home page: https://github.com/BurntSushi/ripgrep
|
||||
|
||||
Use -h for short descriptions and --help for more details.";
|
||||
@@ -547,9 +544,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
|
||||
// flags are hidden and merely mentioned in the docs of the corresponding
|
||||
// "positive" flag.
|
||||
flag_after_context(&mut args);
|
||||
flag_auto_hybrid_regex(&mut args);
|
||||
flag_before_context(&mut args);
|
||||
flag_binary(&mut args);
|
||||
flag_block_buffered(&mut args);
|
||||
flag_byte_offset(&mut args);
|
||||
flag_case_sensitive(&mut args);
|
||||
@@ -583,7 +578,6 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
|
||||
flag_line_number(&mut args);
|
||||
flag_line_regexp(&mut args);
|
||||
flag_max_columns(&mut args);
|
||||
flag_max_columns_preview(&mut args);
|
||||
flag_max_count(&mut args);
|
||||
flag_max_depth(&mut args);
|
||||
flag_max_filesize(&mut args);
|
||||
@@ -606,7 +600,6 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
|
||||
flag_path_separator(&mut args);
|
||||
flag_passthru(&mut args);
|
||||
flag_pcre2(&mut args);
|
||||
flag_pcre2_version(&mut args);
|
||||
flag_pre(&mut args);
|
||||
flag_pre_glob(&mut args);
|
||||
flag_pretty(&mut args);
|
||||
@@ -653,7 +646,7 @@ will be provided. Namely, the following is equivalent to the above:
|
||||
let arg = RGArg::positional("pattern", "PATTERN")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.required_unless(&[
|
||||
"file", "files", "regexp", "type-list", "pcre2-version",
|
||||
"file", "files", "regexp", "type-list",
|
||||
]);
|
||||
args.push(arg);
|
||||
}
|
||||
@@ -684,50 +677,6 @@ This overrides the --context flag.
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_auto_hybrid_regex(args: &mut Vec<RGArg>) {
    const SHORT: &str = "Dynamically use PCRE2 if necessary.";
    const LONG: &str = long!("\
When this flag is used, ripgrep will dynamically choose between supported regex
engines depending on the features used in a pattern. When ripgrep chooses a
regex engine, it applies that choice for every regex provided to ripgrep (e.g.,
via multiple -e/--regexp or -f/--file flags).

As an example of how this flag might behave, ripgrep will attempt to use
its default finite automata based regex engine whenever the pattern can be
successfully compiled with that regex engine. If PCRE2 is enabled and if the
pattern given could not be compiled with the default regex engine, then PCRE2
will be automatically used for searching. If PCRE2 isn't available, then this
flag has no effect because there is only one regex engine to choose from.

In the future, ripgrep may adjust its heuristics for how it decides which
regex engine to use. In general, the heuristics will be limited to a static
analysis of the patterns, and not to any specific runtime behavior observed
while searching files.

The primary downside of using this flag is that it may not always be obvious
which regex engine ripgrep uses, and thus, the match semantics or performance
profile of ripgrep may subtly and unexpectedly change. However, in many cases,
all regex engines will agree on what constitutes a match and it can be nice
to transparently support more advanced regex features like look-around and
backreferences without explicitly needing to enable them.

This flag can be disabled with --no-auto-hybrid-regex.
");
    let arg = RGArg::switch("auto-hybrid-regex")
        .help(SHORT).long_help(LONG)
        .overrides("no-auto-hybrid-regex")
        .overrides("pcre2")
        .overrides("no-pcre2");
    args.push(arg);

    let arg = RGArg::switch("no-auto-hybrid-regex")
        .hidden()
        .overrides("auto-hybrid-regex")
        .overrides("pcre2")
        .overrides("no-pcre2");
    args.push(arg);
}
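The fallback described in this help text (try the default engine first, then PCRE2, and report both errors if neither compiles) can be sketched as follows. Everything here is hypothetical: `build_default` and `build_pcre2` are toy stand-ins that decide based on a substring check, not real regex compilation.

```rust
#[derive(Debug, PartialEq)]
pub enum Engine {
    Default,
    Pcre2,
}

/// Toy stand-in: pretend the default engine rejects look-around and the
/// `\1` backreference.
fn build_default(pattern: &str) -> Result<Engine, String> {
    let unsupported = ["(?=", "(?!", "\\1"];
    if unsupported.iter().any(|s| pattern.contains(s)) {
        Err(format!("unsupported syntax in {:?}", pattern))
    } else {
        Ok(Engine::Default)
    }
}

/// Toy stand-in: PCRE2 accepts anything, but only if it was compiled in.
fn build_pcre2(_pattern: &str, available: bool) -> Result<Engine, String> {
    if available {
        Ok(Engine::Pcre2)
    } else {
        Err("PCRE2 is not available in this build".to_string())
    }
}

/// Prefer the default engine and fall back to PCRE2 only when the default
/// engine cannot compile the pattern.
pub fn choose_engine(pattern: &str, pcre2_available: bool) -> Result<Engine, String> {
    let default_err = match build_default(pattern) {
        Ok(engine) => return Ok(engine),
        Err(err) => err,
    };
    let pcre2_err = match build_pcre2(pattern, pcre2_available) {
        Ok(engine) => return Ok(engine),
        Err(err) => err,
    };
    Err(format!(
        "regex could not be compiled with either engine\n\n\
         default regex engine error:\n{}\n\n\
         PCRE2 regex engine error:\n{}",
        default_err, pcre2_err,
    ))
}
```

The choice depends only on whether the pattern compiles, which fits the help text's statement that the heuristics are limited to static analysis of the patterns.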
|
||||
|
||||
fn flag_before_context(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Show NUM lines before each match.";
|
||||
const LONG: &str = long!("\
|
||||
@@ -742,55 +691,6 @@ This overrides the --context flag.
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_binary(args: &mut Vec<RGArg>) {
    const SHORT: &str = "Search binary files.";
    const LONG: &str = long!("\
Enabling this flag will cause ripgrep to search binary files. By default,
ripgrep attempts to automatically skip binary files in order to improve the
relevance of results and make the search faster.

Binary files are heuristically detected based on whether they contain a NUL
byte or not. By default (without this flag set), once a NUL byte is seen,
ripgrep will stop searching the file. Usually, NUL bytes occur in the beginning
of most binary files. If a NUL byte occurs after a match, then ripgrep will
still stop searching the rest of the file, but a warning will be printed.

In contrast, when this flag is provided, ripgrep will continue searching a file
even if a NUL byte is found. In particular, if a NUL byte is found then ripgrep
will continue searching until either a match is found or the end of the file is
reached, whichever comes sooner. If a match is found, then ripgrep will stop
and print a warning saying that the search stopped prematurely.

If you want ripgrep to search a file without any special NUL byte handling at
all (and potentially print binary data to stdout), then you should use the
'-a/--text' flag.

The '--binary' flag is a flag for controlling ripgrep's automatic filtering
mechanism. As such, it does not need to be used when searching a file
explicitly or when searching stdin. That is, it is only applicable when
recursively searching a directory.

Note that when the '-u/--unrestricted' flag is provided for a third time, then
this flag is automatically enabled.

This flag can be disabled with '--no-binary'. It overrides the '-a/--text'
flag.
");
    let arg = RGArg::switch("binary")
        .help(SHORT).long_help(LONG)
        .overrides("no-binary")
        .overrides("text")
        .overrides("no-text");
    args.push(arg);

    let arg = RGArg::switch("no-binary")
        .hidden()
        .overrides("binary")
        .overrides("text")
        .overrides("no-text");
    args.push(arg);
}
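The three behaviors described in this help text (default, `--binary`, `-a/--text`) can be compared with a small line-granular model. This is only a sketch under simplifying assumptions: real detection works on buffers rather than lines, and `Mode`, `Outcome` and `search` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    /// Default: stop at the first NUL byte.
    Skip,
    /// `--binary`: keep going after a NUL, but stop at the next match.
    Binary,
    /// `-a/--text`: no special NUL handling.
    Text,
}

#[derive(Debug, PartialEq)]
pub struct Outcome {
    /// Zero-based indices of the lines reported as matches.
    pub matching_lines: Vec<usize>,
    /// Whether a "search stopped early" warning would be printed.
    pub warned: bool,
}

fn contains(line: &[u8], needle: &[u8]) -> bool {
    needle.is_empty() || line.windows(needle.len()).any(|w| w == needle)
}

pub fn search(haystack: &[u8], needle: &[u8], mode: Mode) -> Outcome {
    let mut out = Outcome { matching_lines: Vec::new(), warned: false };
    let mut seen_nul = false;
    for (i, line) in haystack.split(|&b| b == b'\n').enumerate() {
        let has_nul = line.contains(&0);
        match mode {
            Mode::Skip if has_nul => {
                // Warn only if something was already reported.
                out.warned = !out.matching_lines.is_empty();
                break;
            }
            Mode::Binary if has_nul => seen_nul = true,
            _ => {}
        }
        if contains(line, needle) {
            if mode == Mode::Binary && seen_nul {
                // A match after a NUL: stop and warn instead of printing.
                out.warned = true;
                break;
            }
            out.matching_lines.push(i);
        }
    }
    out
}
```

In this model, the default mode and `--binary` differ only when a NUL byte appears before any match: the default gives up silently, while `--binary` keeps looking and warns once a match turns up.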
|
||||
|
||||
fn flag_block_buffered(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Force block buffering.";
|
||||
const LONG: &str = long!("\
|
||||
@@ -1490,30 +1390,6 @@ When this flag is omitted or is set to 0, then it has no effect.
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_max_columns_preview(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Print a preview for lines exceeding the limit.";
|
||||
const LONG: &str = long!("\
|
||||
When the '--max-columns' flag is used, ripgrep will by default completely
|
||||
replace any line that is too long with a message indicating that a matching
|
||||
line was removed. When this flag is combined with '--max-columns', a preview
|
||||
of the line (corresponding to the limit size) is shown instead, where the part
|
||||
of the line exceeding the limit is not shown.
|
||||
|
||||
If the '--max-columns' flag is not set, then this has no effect.
|
||||
|
||||
This flag can be disabled with '--no-max-columns-preview'.
|
||||
");
|
||||
let arg = RGArg::switch("max-columns-preview")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.overrides("no-max-columns-preview");
|
||||
args.push(arg);
|
||||
|
||||
let arg = RGArg::switch("no-max-columns-preview")
|
||||
.hidden()
|
||||
.overrides("max-columns-preview");
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_max_count(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Limit the number of matches.";
|
||||
const LONG: &str = long!("\
|
||||
@@ -1983,28 +1859,12 @@ This flag can be disabled with --no-pcre2.
|
||||
");
|
||||
let arg = RGArg::switch("pcre2").short("P")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.overrides("no-pcre2")
|
||||
.overrides("auto-hybrid-regex")
|
||||
.overrides("no-auto-hybrid-regex");
|
||||
.overrides("no-pcre2");
|
||||
args.push(arg);
|
||||
|
||||
let arg = RGArg::switch("no-pcre2")
|
||||
.hidden()
|
||||
.overrides("pcre2")
|
||||
.overrides("auto-hybrid-regex")
|
||||
.overrides("no-auto-hybrid-regex");
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
fn flag_pcre2_version(args: &mut Vec<RGArg>) {
|
||||
const SHORT: &str = "Print the version of PCRE2 that ripgrep uses.";
|
||||
const LONG: &str = long!("\
|
||||
When this flag is present, ripgrep will print the version of PCRE2 in use,
|
||||
along with other information, and then exit. If PCRE2 is not available, then
|
||||
ripgrep will print an error message and exit with an error code.
|
||||
");
|
||||
let arg = RGArg::switch("pcre2-version")
|
||||
.help(SHORT).long_help(LONG);
|
||||
.overrides("pcre2");
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
@@ -2014,13 +1874,12 @@ fn flag_pre(args: &mut Vec<RGArg>) {
|
||||
For each input FILE, search the standard output of COMMAND FILE rather than the
|
||||
contents of FILE. This option expects the COMMAND program to either be an
|
||||
absolute path or to be available in your PATH. Either an empty string COMMAND
|
||||
or the '--no-pre' flag will disable this behavior.
|
||||
or the `--no-pre` flag will disable this behavior.
|
||||
|
||||
WARNING: When this flag is set, ripgrep will unconditionally spawn a
|
||||
process for every file that is searched. Therefore, this can incur an
|
||||
unnecessarily large performance penalty if you don't otherwise need the
|
||||
flexibility offered by this flag. One possible mitigation to this is to use
|
||||
the '--pre-glob' flag to limit which files a preprocessor is run with.
|
||||
flexibility offered by this flag.
|
||||
|
||||
A preprocessor is not run when ripgrep is searching stdin.
|
||||
|
||||
@@ -2349,23 +2208,20 @@ escape codes to be printed that alter the behavior of your terminal.
|
||||
When binary file detection is enabled it is imperfect. In general, it uses
|
||||
a simple heuristic. If a NUL byte is seen during search, then the file is
|
||||
considered binary and search stops (unless this flag is present).
|
||||
Alternatively, if the '--binary' flag is used, then ripgrep will only quit
|
||||
when it sees a NUL byte after it sees a match (or searches the entire file).
|
||||
|
||||
This flag can be disabled with '--no-text'. It overrides the '--binary' flag.
|
||||
Note that when the `-u/--unrestricted` flag is provided for a third time, then
|
||||
this flag is automatically enabled.
|
||||
|
||||
This flag can be disabled with --no-text.
|
||||
");
|
||||
let arg = RGArg::switch("text").short("a")
|
||||
.help(SHORT).long_help(LONG)
|
||||
.overrides("no-text")
|
||||
.overrides("binary")
|
||||
.overrides("no-binary");
|
||||
.overrides("no-text");
|
||||
args.push(arg);
|
||||
|
||||
let arg = RGArg::switch("no-text")
|
||||
.hidden()
|
||||
.overrides("text")
|
||||
.overrides("binary")
|
||||
.overrides("no-binary");
|
||||
.overrides("text");
|
||||
args.push(arg);
|
||||
}
|
||||
|
||||
@@ -2494,7 +2350,8 @@ Reduce the level of \"smart\" searching. A single -u won't respect .gitignore
|
||||
(etc.) files. Two -u flags will additionally search hidden files and
|
||||
directories. Three -u flags will additionally search binary files.
|
||||
|
||||
'rg -uuu' is roughly equivalent to 'grep -r'.
|
||||
-uu is roughly equivalent to grep -r and -uuu is roughly equivalent to grep -a
|
||||
-r.
|
||||
");
|
||||
let arg = RGArg::switch("unrestricted").short("u")
|
||||
.help(SHORT).long_help(LONG)
|
||||
@@ -2536,7 +2393,7 @@ ripgrep is explicitly instructed to search one file or stdin.
|
||||
|
||||
This flag overrides --with-filename.
|
||||
");
|
||||
let arg = RGArg::switch("no-filename").short("I")
|
||||
let arg = RGArg::switch("no-filename")
|
||||
.help(NO_SHORT).long_help(NO_LONG)
|
||||
.overrides("with-filename");
|
||||
args.push(arg);
|
||||
|
107
src/args.rs
@@ -73,8 +73,6 @@ pub enum Command {
|
||||
/// List all file type definitions configured, including the default file
|
||||
/// types and any additional file types added to the command line.
|
||||
Types,
|
||||
/// Print the version of PCRE2 in use.
|
||||
PCRE2Version,
|
||||
}
|
||||
|
||||
impl Command {
|
||||
@@ -84,11 +82,7 @@ impl Command {
|
||||
|
||||
match *self {
|
||||
Search | SearchParallel => true,
|
||||
| SearchNever
|
||||
| Files
|
||||
| FilesParallel
|
||||
| Types
|
||||
| PCRE2Version => false,
|
||||
SearchNever | Files | FilesParallel | Types => false,
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -241,9 +235,7 @@ impl Args {
|
||||
let threads = self.matches().threads()?;
|
||||
let one_thread = is_one_search || threads == 1;
|
||||
|
||||
Ok(if self.matches().is_present("pcre2-version") {
|
||||
Command::PCRE2Version
|
||||
} else if self.matches().is_present("type-list") {
|
||||
Ok(if self.matches().is_present("type-list") {
|
||||
Command::Types
|
||||
} else if self.matches().is_present("files") {
|
||||
if one_thread {
|
||||
@@ -294,18 +286,15 @@ impl Args {
|
||||
&self,
|
||||
wtr: W,
|
||||
) -> Result<SearchWorker<W>> {
|
||||
let matches = self.matches();
|
||||
let matcher = self.matcher().clone();
|
||||
let printer = self.printer(wtr)?;
|
||||
let searcher = matches.searcher(self.paths())?;
|
||||
let searcher = self.matches().searcher(self.paths())?;
|
||||
let mut builder = SearchWorkerBuilder::new();
|
||||
builder
|
||||
.json_stats(matches.is_present("json"))
|
||||
.preprocessor(matches.preprocessor())
|
||||
.preprocessor_globs(matches.preprocessor_globs()?)
|
||||
.search_zip(matches.is_present("search-zip"))
|
||||
.binary_detection_implicit(matches.binary_detection_implicit())
|
||||
.binary_detection_explicit(matches.binary_detection_explicit());
|
||||
.json_stats(self.matches().is_present("json"))
|
||||
.preprocessor(self.matches().preprocessor())
|
||||
.preprocessor_globs(self.matches().preprocessor_globs()?)
|
||||
.search_zip(self.matches().is_present("search-zip"));
|
||||
Ok(builder.build(matcher, searcher, printer))
|
||||
}
|
||||
|
||||
@@ -599,25 +588,6 @@ impl ArgMatches {
|
||||
if self.is_present("pcre2") {
|
||||
let matcher = self.matcher_pcre2(patterns)?;
|
||||
Ok(PatternMatcher::PCRE2(matcher))
|
||||
} else if self.is_present("auto-hybrid-regex") {
|
||||
let rust_err = match self.matcher_rust(patterns) {
|
||||
Ok(matcher) => return Ok(PatternMatcher::RustRegex(matcher)),
|
||||
Err(err) => err,
|
||||
};
|
||||
log::debug!(
|
||||
"error building Rust regex in hybrid mode:\n{}", rust_err,
|
||||
);
|
||||
let pcre_err = match self.matcher_pcre2(patterns) {
|
||||
Ok(matcher) => return Ok(PatternMatcher::PCRE2(matcher)),
|
||||
Err(err) => err,
|
||||
};
|
||||
Err(From::from(format!(
|
||||
"regex could not be compiled with either the default regex \
|
||||
engine or with PCRE2.\n\n\
|
||||
default regex engine error:\n{}\n{}\n{}\n\n\
|
||||
PCRE2 regex engine error:\n{}",
|
||||
"~".repeat(79), rust_err, "~".repeat(79), pcre_err,
|
||||
)))
|
||||
} else {
|
||||
let matcher = match self.matcher_rust(patterns) {
|
||||
Ok(matcher) => matcher,
|
||||
@@ -686,13 +656,7 @@ impl ArgMatches {
|
||||
if let Some(limit) = self.dfa_size_limit()? {
|
||||
builder.dfa_size_limit(limit);
|
||||
}
|
||||
let res =
|
||||
if self.is_present("fixed-strings") {
|
||||
builder.build_literals(patterns)
|
||||
} else {
|
||||
builder.build(&patterns.join("|"))
|
||||
};
|
||||
match res {
|
||||
match builder.build(&patterns.join("|")) {
|
||||
Ok(m) => Ok(m),
|
||||
Err(err) => Err(From::from(suggest_multiline(err.to_string()))),
|
||||
}
|
||||
@@ -712,13 +676,8 @@ impl ArgMatches {
|
||||
.word(self.is_present("word-regexp"));
|
||||
// For whatever reason, the JIT craps out during regex compilation with
|
||||
// a "no more memory" error on 32 bit systems. So don't use it there.
|
||||
if cfg!(target_pointer_width = "64") {
|
||||
builder
|
||||
.jit_if_available(true)
|
||||
// The PCRE2 docs say that 32KB is the default, and that 1MB
|
||||
// should be big enough for anything. But let's crank it to
|
||||
// 10MB.
|
||||
.max_jit_stack_size(Some(10 * (1<<20)));
|
||||
if !cfg!(target_pointer_width = "32") {
|
||||
builder.jit_if_available(true);
|
||||
}
|
||||
if self.pcre2_unicode() {
|
||||
builder.utf(true).ucp(true);
|
||||
@@ -778,7 +737,6 @@ impl ArgMatches {
|
||||
.per_match(self.is_present("vimgrep"))
|
||||
.replacement(self.replacement())
|
||||
.max_columns(self.max_columns()?)
|
||||
.max_columns_preview(self.max_columns_preview())
|
||||
.max_matches(self.max_count()?)
|
||||
.column(self.column())
|
||||
.byte_offset(self.is_present("byte-offset"))
|
||||
@@ -838,7 +796,8 @@ impl ArgMatches {
|
||||
.before_context(ctx_before)
|
||||
.after_context(ctx_after)
|
||||
.passthru(self.is_present("passthru"))
|
||||
.memory_map(self.mmap_choice(paths));
|
||||
.memory_map(self.mmap_choice(paths))
|
||||
.binary_detection(self.binary_detection());
|
||||
match self.encoding()? {
|
||||
EncodingMode::Some(enc) => {
|
||||
builder.encoding(Some(enc));
|
||||
@@ -897,42 +856,19 @@ impl ArgMatches {
|
||||
///
|
||||
/// Methods are sorted alphabetically.
|
||||
impl ArgMatches {
|
||||
/// Returns the form of binary detection to perform on files that are
|
||||
/// implicitly searched via recursive directory traversal.
|
||||
fn binary_detection_implicit(&self) -> BinaryDetection {
|
||||
/// Returns the form of binary detection to perform.
|
||||
fn binary_detection(&self) -> BinaryDetection {
|
||||
let none =
|
||||
self.is_present("text")
|
||||
|| self.unrestricted_count() >= 3
|
||||
|| self.is_present("null-data");
|
||||
let convert =
|
||||
self.is_present("binary")
|
||||
|| self.unrestricted_count() >= 3;
|
||||
if none {
|
||||
BinaryDetection::none()
|
||||
} else if convert {
|
||||
BinaryDetection::convert(b'\x00')
|
||||
} else {
|
||||
BinaryDetection::quit(b'\x00')
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns the form of binary detection to perform on files that are
|
||||
/// explicitly searched via the user invoking ripgrep on a particular
|
||||
/// file or files or stdin.
|
||||
///
|
||||
/// In general, this should never be BinaryDetection::quit, since that acts
|
||||
/// as a filter (but quitting immediately once a NUL byte is seen), and we
|
||||
/// should never filter out files that the user wants to explicitly search.
|
||||
fn binary_detection_explicit(&self) -> BinaryDetection {
|
||||
let none =
|
||||
self.is_present("text")
|
||||
|| self.is_present("null-data");
|
||||
if none {
|
||||
BinaryDetection::none()
|
||||
} else {
|
||||
BinaryDetection::convert(b'\x00')
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns true if the command line configuration implies that a match
|
||||
/// can never be shown.
|
||||
fn can_never_match(&self, patterns: &[String]) -> bool {
|
||||
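The two removed methods, `binary_detection_implicit` and `binary_detection_explicit`, choose a detection mode from the command line flags and from whether the file was named by the user. The sketch below restates that decision table with local types only. `Detection`, `Flags` and `for_subject` are hypothetical names, not part of the `grep` crates.

```rust
/// Hypothetical stand-in for `grep::searcher::BinaryDetection`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Detection {
    /// Search the file as if it were text.
    None,
    /// Replace the given byte with a line terminator and keep searching.
    Convert(u8),
    /// Stop searching once the given byte is seen.
    Quit(u8),
}

/// The flags consulted in the diff (`--text`, `--binary`, `--null-data`,
/// and the number of `-u` occurrences).
#[derive(Clone, Copy, Debug, Default)]
pub struct Flags {
    pub text: bool,
    pub binary: bool,
    pub null_data: bool,
    pub unrestricted: u32,
}

/// Mode for files found via recursive directory traversal.
pub fn implicit(f: Flags) -> Detection {
    if f.text || f.null_data || f.unrestricted >= 3 {
        Detection::None
    } else if f.binary {
        Detection::Convert(0)
    } else {
        Detection::Quit(0)
    }
}

/// Mode for files (or stdin) that the user named directly. This never
/// returns `Quit`, since quitting acts as a filter.
pub fn explicit(f: Flags) -> Detection {
    if f.text || f.null_data {
        Detection::None
    } else {
        Detection::Convert(0)
    }
}

/// Pick the mode for one search subject.
pub fn for_subject(f: Flags, is_explicit: bool) -> Detection {
    if is_explicit { explicit(f) } else { implicit(f) }
}
```

With default flags, `for_subject(Flags::default(), false)` gives `Detection::Quit(0)` while the explicit case gives `Detection::Convert(0)`, which is the asymmetry the removed doc comments describe.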
@@ -1175,12 +1111,6 @@ impl ArgMatches {
|
||||
Ok(self.usize_of_nonzero("max-columns")?.map(|n| n as u64))
|
||||
}
|
||||
|
||||
/// Returns true if and only if a preview should be shown for lines that
|
||||
/// exceed the maximum column limit.
|
||||
fn max_columns_preview(&self) -> bool {
|
||||
self.is_present("max-columns-preview")
|
||||
}
|
||||
|
||||
/// The maximum number of matches permitted.
|
||||
fn max_count(&self) -> Result<Option<u64>> {
|
||||
Ok(self.usize_of("max-count")?.map(|n| n as u64))
|
||||
@@ -1310,8 +1240,7 @@ impl ArgMatches {
|
||||
!cli::is_readable_stdin()
|
||||
|| (self.is_present("file") && file_is_stdin)
|
||||
|| self.is_present("files")
|
||||
|| self.is_present("type-list")
|
||||
|| self.is_present("pcre2-version");
|
||||
|| self.is_present("type-list");
|
||||
if search_cwd {
|
||||
Path::new("./").to_path_buf()
|
||||
} else {
|
||||
@@ -1770,12 +1699,12 @@ where I: IntoIterator<Item=T>,
|
||||
if err.use_stderr() {
|
||||
return Err(err.into());
|
||||
}
|
||||
// Explicitly ignore any error returned by write!. The most likely error
|
||||
// Explicitly ignore any error returned by writeln!. The most likely error
|
||||
// at this point is a broken pipe error, in which case, we want to ignore
|
||||
// it and exit quietly.
|
||||
//
|
||||
// (This is the point of this helper function. clap's functionality for
|
||||
// doing this will panic on a broken pipe error.)
|
||||
let _ = write!(io::stdout(), "{}", err);
|
||||
let _ = writeln!(io::stdout(), "{}", err);
|
||||
process::exit(0);
|
||||
}
|
||||
|
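The last hunk above deliberately discards the result of writing clap's message to stdout, so that a broken pipe leads to a quiet exit instead of a panic. A small sketch of that pattern, written against any `Write` implementation so it can be exercised without a real pipe. `write_ignoring_errors` and `BrokenPipe` are hypothetical names used only for illustration.

```rust
use std::io::{self, Write};

/// Write `msg` to `out` and deliberately discard any I/O error, such as a
/// broken pipe when the reader has already gone away.
pub fn write_ignoring_errors<W: Write>(mut out: W, msg: &str) {
    let _ = write!(out, "{}", msg);
}

/// A writer that always fails with `BrokenPipe`, for demonstration.
pub struct BrokenPipe;

impl Write for BrokenPipe {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "broken pipe"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

`write_ignoring_errors(BrokenPipe, "usage")` returns normally, whereas `print!` on a closed stdout would panic.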
28
src/main.rs
@@ -39,7 +39,6 @@ fn try_main(args: Args) -> Result<()> {
|
||||
Files => files(&args),
|
||||
FilesParallel => files_parallel(&args),
|
||||
Types => types(&args),
|
||||
PCRE2Version => pcre2_version(&args),
|
||||
}?;
|
||||
if matched && (args.quiet() || !messages::errored()) {
|
||||
process::exit(0)
|
||||
@@ -276,30 +275,3 @@ fn types(args: &Args) -> Result<bool> {
|
||||
}
|
||||
Ok(count > 0)
|
||||
}
|
||||
|
||||
/// The top-level entry point for --pcre2-version.
|
||||
fn pcre2_version(args: &Args) -> Result<bool> {
|
||||
#[cfg(feature = "pcre2")]
|
||||
fn imp(args: &Args) -> Result<bool> {
|
||||
use grep::pcre2;
|
||||
|
||||
let mut stdout = args.stdout();
|
||||
|
||||
let (major, minor) = pcre2::version();
|
||||
writeln!(stdout, "PCRE2 {}.{} is available", major, minor)?;
|
||||
|
||||
if cfg!(target_pointer_width = "64") && pcre2::is_jit_available() {
|
||||
writeln!(stdout, "JIT is available")?;
|
||||
}
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
#[cfg(not(feature = "pcre2"))]
|
||||
fn imp(args: &Args) -> Result<bool> {
|
||||
let mut stdout = args.stdout();
|
||||
writeln!(stdout, "PCRE2 is not available in this build of ripgrep.")?;
|
||||
Ok(false)
|
||||
}
|
||||
|
||||
imp(args)
|
||||
}
|
||||
|
@@ -10,7 +10,7 @@ use grep::matcher::Matcher;
|
||||
use grep::pcre2::{RegexMatcher as PCRE2RegexMatcher};
|
||||
use grep::printer::{JSON, Standard, Summary, Stats};
|
||||
use grep::regex::{RegexMatcher as RustRegexMatcher};
|
||||
use grep::searcher::{BinaryDetection, Searcher};
|
||||
use grep::searcher::Searcher;
|
||||
use ignore::overrides::Override;
|
||||
use serde_json as json;
|
||||
use serde_json::json;
|
||||
@@ -27,8 +27,6 @@ struct Config {
|
||||
preprocessor: Option<PathBuf>,
|
||||
preprocessor_globs: Override,
|
||||
search_zip: bool,
|
||||
binary_implicit: BinaryDetection,
|
||||
binary_explicit: BinaryDetection,
|
||||
}
|
||||
|
||||
impl Default for Config {
|
||||
@@ -38,8 +36,6 @@ impl Default for Config {
|
||||
preprocessor: None,
|
||||
preprocessor_globs: Override::empty(),
|
||||
search_zip: false,
|
||||
binary_implicit: BinaryDetection::none(),
|
||||
binary_explicit: BinaryDetection::none(),
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -138,37 +134,6 @@ impl SearchWorkerBuilder {
|
||||
self.config.search_zip = yes;
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the binary detection that should be used when searching files
|
||||
/// found via a recursive directory search.
|
||||
///
|
||||
/// Generally, this binary detection may be `BinaryDetection::quit` if
|
||||
/// we want to skip binary files completely.
|
||||
///
|
||||
/// By default, no binary detection is performed.
|
||||
pub fn binary_detection_implicit(
|
||||
&mut self,
|
||||
detection: BinaryDetection,
|
||||
) -> &mut SearchWorkerBuilder {
|
||||
self.config.binary_implicit = detection;
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the binary detection that should be used when searching files
|
||||
/// explicitly supplied by an end user.
|
||||
///
|
||||
/// Generally, this binary detection should NOT be `BinaryDetection::quit`,
|
||||
/// since we never want to automatically filter files supplied by the end
|
||||
/// user.
|
||||
///
|
||||
/// By default, no binary detection is performed.
|
||||
pub fn binary_detection_explicit(
|
||||
&mut self,
|
||||
detection: BinaryDetection,
|
||||
) -> &mut SearchWorkerBuilder {
|
||||
self.config.binary_explicit = detection;
|
||||
self
|
||||
}
|
||||
}
|
||||
|
||||
/// The result of executing a search.
|
||||
@@ -343,14 +308,6 @@ impl<W: WriteColor> SearchWorker<W> {
|
||||
|
||||
/// Search the given subject using the appropriate strategy.
|
||||
fn search_impl(&mut self, subject: &Subject) -> io::Result<SearchResult> {
|
||||
let bin =
|
||||
if subject.is_explicit() {
|
||||
self.config.binary_explicit.clone()
|
||||
} else {
|
||||
self.config.binary_implicit.clone()
|
||||
};
|
||||
self.searcher.set_binary_detection(bin);
|
||||
|
||||
let path = subject.path();
|
||||
if subject.is_stdin() {
|
||||
let stdin = io::stdin();
|
||||
|
@@ -59,12 +59,17 @@ impl SubjectBuilder {
|
||||
if let Some(ignore_err) = subj.dent.error() {
|
||||
ignore_message!("{}", ignore_err);
|
||||
}
|
||||
// If this entry was explicitly provided by an end user, then we always
|
||||
// want to search it.
|
||||
if subj.is_explicit() {
|
||||
// If this entry represents stdin, then we always search it.
|
||||
if subj.dent.is_stdin() {
|
||||
return Some(subj);
|
||||
}
|
||||
// At this point, we only want to search something if it's explicitly a
|
||||
// If this subject has a depth of 0, then it was provided explicitly
|
||||
// by an end user (or via a shell glob). In this case, we always want
|
||||
// to search it if it even smells like a file (e.g., a symlink).
|
||||
if subj.dent.depth() == 0 && !subj.is_dir() {
|
||||
return Some(subj);
|
||||
}
|
||||
// At this point, we only want to search something if it's explicitly a
|
||||
// file. This omits symlinks. (If ripgrep was configured to follow
|
||||
// symlinks, then they have already been followed by the directory
|
||||
// traversal.)
|
||||
@@ -122,26 +127,6 @@ impl Subject {
|
||||
self.dent.is_stdin()
|
||||
}
|
||||
|
||||
/// Returns true if and only if this entry corresponds to a subject to
|
||||
/// search that was explicitly supplied by an end user.
|
||||
///
|
||||
/// Generally, this corresponds to either stdin or an explicit file path
|
||||
/// argument. e.g., in `rg foo some-file ./some-dir/`, `some-file` is
|
||||
/// an explicit subject, but, e.g., `./some-dir/some-other-file` is not.
|
||||
///
|
||||
/// However, note that ripgrep does not see through shell globbing. e.g.,
|
||||
/// in `rg foo ./some-dir/*`, `./some-dir/some-other-file` will be treated
|
||||
/// as an explicit subject.
|
||||
pub fn is_explicit(&self) -> bool {
|
||||
// stdin is obvious. When an entry has a depth of 0, that means it
|
||||
// was explicitly provided to our directory iterator, which means it
|
||||
// was in turn explicitly provided by the end user. The !is_dir check
|
||||
// means that we want to search files even if they're symlinks, again,
|
||||
// because they were explicitly provided. (And we never want to try
|
||||
// to search a directory.)
|
||||
self.is_stdin() || (self.dent.depth() == 0 && !self.is_dir())
|
||||
}
|
||||
|
||||
/// Returns true if and only if this subject points to a directory after
|
||||
/// following symbolic links.
|
||||
fn is_dir(&self) -> bool {
|
||||
|
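The removed `is_explicit` method treats a subject as user-supplied when it is stdin, or when it sits at depth 0 of the traversal and is not a directory. A minimal sketch of that rule, using a hypothetical `Entry` struct in place of the directory entry type from the `ignore` crate:

```rust
/// Hypothetical, simplified view of one traversal entry.
pub struct Entry {
    /// Depth relative to the path given on the command line (0 = the path itself).
    pub depth: usize,
    /// Whether the entry is a directory after following symlinks.
    pub is_dir: bool,
    /// Whether the entry represents stdin.
    pub is_stdin: bool,
}

impl Entry {
    /// True when the user named this entry directly. Directories are
    /// excluded because they are traversed, never searched themselves.
    pub fn is_explicit(&self) -> bool {
        self.is_stdin || (self.depth == 0 && !self.is_dir)
    }
}
```

Under this rule, in `rg foo some-file ./some-dir/`, `some-file` has depth 0 and is explicit, while `./some-dir/some-other-file` has depth 1 and is not.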
315
tests/binary.rs
@@ -1,315 +0,0 @@
|
||||
use crate::util::{Dir, TestCommand};
|
||||
|
||||
// This file contains a smattering of tests specifically for checking ripgrep's
|
||||
// handling of binary files. There's quite a bit of discussion on this in this
|
||||
// bug report: https://github.com/BurntSushi/ripgrep/issues/306
|
||||
|
||||
// Our haystack is the first 500 lines of Gutenberg's copy of "A Study in
|
||||
// Scarlet," with a NUL byte at line 237: `abcdef\x00`.
|
||||
//
|
||||
// The position and size of the haystack is, unfortunately, significant. In
|
||||
// particular, the NUL byte is specifically inserted at some point *after* the
|
||||
// first 8192 bytes, which corresponds to the initial capacity of the buffer
|
||||
// that ripgrep uses to read files. (grep for DEFAULT_BUFFER_CAPACITY.) The
|
||||
// position of the NUL byte ensures that we can execute some search on the
|
||||
// initial buffer contents without ever detecting any binary data. Moreover,
|
||||
// when using a memory map for searching, only the first 8192 bytes are
|
||||
// scanned for a NUL byte, so no binary bytes are detected at all when using
|
||||
// a memory map (unless our query matches line 237).
|
||||
//
|
||||
// One last note: in the tests below, we use --no-mmap heavily because binary
|
||||
// detection with memory maps is a bit different. Namely, NUL bytes are only
|
||||
// searched for in the first few KB of the file and in a match. Normally, NUL
|
||||
// bytes are searched for everywhere.
|
||||
//
|
||||
// TODO: Add tests for binary file detection when using memory maps.
|
||||
const HAY: &'static [u8] = include_bytes!("./data/sherlock-nul.txt");
|
||||
|
||||
// This tests that ripgrep prints a warning message if it finds and prints a
|
||||
// match in a binary file before detecting that it is a binary file. The point
|
||||
// here is to notify the user that the search of the file is only partially
|
||||
// complete.
|
||||
//
|
||||
// This applies to files that are *implicitly* searched via a recursive
|
||||
// directory traversal. In particular, this results in a WARNING message being
|
||||
// printed. We make our file "implicit" by doing a recursive search with a glob
|
||||
// that matches our file.
|
||||
rgtest!(after_match1_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
WARNING: stopped searching binary file hay after match (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, except we provide a file to search
|
||||
// explicitly. This results in identical behavior, but a different message.
|
||||
rgtest!(after_match1_explicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Project Gutenberg EBook", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_explicit, except we feed our content on stdin.
|
||||
rgtest!(after_match1_stdin, |_: Dir, mut cmd: TestCommand| {
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Project Gutenberg EBook",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.pipe(HAY));
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, but provides the --binary flag, which
|
||||
// disables binary filtering. Thus, this matches the behavior of ripgrep as
|
||||
// if the file were given explicitly.
|
||||
rgtest!(after_match1_implicit_binary, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--binary", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
Binary file hay matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(after_match1_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_explicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(after_match1_explicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "Project Gutenberg EBook", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, except this asks ripgrep to print all matching
|
||||
// files.
|
||||
//
|
||||
// This is an interesting corner case that one might consider a bug, however,
|
||||
// it's unlikely to be fixed. Namely, ripgrep probably shouldn't print `hay`
|
||||
// as a matching file since it is in fact a binary file, and thus should be
|
||||
// filtered out by default. However, the --files-with-matches flag will print
|
||||
// out the path of a matching file as soon as a match is seen and then stop
|
||||
// searching completely. Therefore, the NUL byte is never actually detected.
|
||||
//
|
||||
// The only way to fix this would be to kill ripgrep's performance in this case
|
||||
// and continue searching the entire file for a NUL byte. (Similarly if the
|
||||
// --quiet flag is set. See the next test.)
|
||||
rgtest!(after_match1_implicit_path, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-l", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
eqnice!("hay\n", cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit_path, except this indicates that a match was
|
||||
// found with no other output. (This is the same bug described above, but
|
||||
// manifests as an exit code with no output.)
|
||||
rgtest!(after_match1_implicit_quiet, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-q", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
eqnice!("", cmd.stdout());
|
||||
});
|
||||
|
||||
// This sets up the same test as after_match1_implicit_path, but instead of
|
||||
// just printing the matching files, this includes the full count of matches.
|
||||
// In this case, we need to search the entire file, so ripgrep correctly
|
||||
// detects the binary data and suppresses output.
|
||||
rgtest!(after_match1_implicit_count, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-c", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
cmd.assert_err();
|
||||
});
|
||||
|
||||
// Like after_match1_implicit_count, except the --binary flag is provided,
|
||||
// which makes ripgrep disable binary data filtering even for implicit files.
|
||||
rgtest!(after_match1_implicit_count_binary, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-c", "--binary",
|
||||
"Project Gutenberg EBook",
|
||||
"-g", "hay",
|
||||
]);
|
||||
eqnice!("hay:1\n", cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit_count, except the file path is provided
|
||||
// explicitly, so binary filtering is disabled and a count is correctly
|
||||
// reported.
|
||||
rgtest!(after_match1_explicit_count, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-c", "Project Gutenberg EBook", "hay",
|
||||
]);
|
||||
eqnice!("1\n", cmd.stdout());
|
||||
});
|
||||
|
||||
// This tests that a match way before the NUL byte is shown, but a match after
|
||||
// the NUL byte is not.
|
||||
rgtest!(after_match2_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n",
|
||||
"Project Gutenberg EBook|a medical student",
|
||||
"-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
WARNING: stopped searching binary file hay after match (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match2_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(after_match2_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text",
|
||||
"Project Gutenberg EBook|a medical student",
|
||||
"-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
hay:236:\"And yet you say he is not a medical student?\"
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// This tests that ripgrep *silently* quits before finding a match that occurs
|
||||
// after a NUL byte.
|
||||
rgtest!(before_match1_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Heaven", "-g", "hay",
|
||||
]);
|
||||
cmd.assert_err();
|
||||
});
|
||||
|
||||
// This tests that ripgrep *does not* silently quit before finding a match that
|
||||
// occurs after a NUL byte when a file is explicitly searched.
|
||||
rgtest!(before_match1_explicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Heaven", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like before_match1_implicit, but enables the --binary flag, which
|
||||
// disables binary filtering. Thus, this matches the behavior of ripgrep as if
|
||||
// the file were given explicitly.
|
||||
rgtest!(before_match1_implicit_binary, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--binary", "Heaven", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
Binary file hay matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like before_match1_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(before_match1_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "Heaven", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:238:\"No. Heaven knows what the objects of his studies are. But here we
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// This tests that ripgrep *silently* quits before finding a match that occurs
|
||||
// before a NUL byte, but within the same buffer as the NUL byte.
|
||||
rgtest!(before_match2_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "a medical student", "-g", "hay",
|
||||
]);
|
||||
cmd.assert_err();
|
||||
});
|
||||
|
||||
// This tests that ripgrep *does not* silently quit before finding a match that
|
||||
// occurs before a NUL byte, but within the same buffer as the NUL byte. Even
|
||||
// though the match occurs before the NUL byte, ripgrep still doesn't print it
|
||||
// because it has already scanned ahead to detect the NUL byte. (This matches
|
||||
// the behavior of GNU grep.)
|
||||
rgtest!(before_match2_explicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "a medical student", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like before_match2_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(before_match2_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "a medical student", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:236:\"And yet you say he is not a medical student?\"
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
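The header comment of the removed `tests/binary.rs` explains why the NUL byte is placed after the first 8192 bytes: a check limited to the initial buffer will not see it, while a full scan will. The sketch below illustrates that difference on a synthetic haystack. `first_nul` and `nul_in_prefix` are hypothetical helpers, and this is not how the `grep-searcher` crate implements detection.

```rust
/// Offset of the first NUL byte in `haystack`, if any.
pub fn first_nul(haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == 0)
}

/// True if a NUL byte occurs within the first `prefix_len` bytes. This
/// mimics a detector that only inspects an initial buffer.
pub fn nul_in_prefix(haystack: &[u8], prefix_len: usize) -> bool {
    let end = haystack.len().min(prefix_len);
    haystack[..end].contains(&0)
}
```

For a haystack of 9000 text bytes followed by one NUL, `nul_in_prefix(&hay, 8192)` is false even though `first_nul(&hay)` finds the byte, which is the situation the tests rely on when they search the start of the file.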
@@ -1,500 +0,0 @@
|
||||
The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
|
||||
This eBook is for the use of anyone anywhere at no cost and with
|
||||
almost no restrictions whatsoever. You may copy it, give it away or
|
||||
re-use it under the terms of the Project Gutenberg License included
|
||||
with this eBook or online at www.gutenberg.org
|
||||
|
||||
|
||||
Title: A Study In Scarlet
|
||||
|
||||
Author: Arthur Conan Doyle
|
||||
|
||||
Posting Date: July 12, 2008 [EBook #244]
|
||||
Release Date: April, 1995
|
||||
[Last updated: February 17, 2013]
|
||||
|
||||
Language: English
|
||||
|
||||
|
||||
*** START OF THIS PROJECT GUTENBERG EBOOK A STUDY IN SCARLET ***
|
||||
|
||||
|
||||
|
||||
|
||||
Produced by Roger Squires
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
A STUDY IN SCARLET.
|
||||
|
||||
By A. Conan Doyle
|
||||
|
||||
[1]
|
||||
|
||||
|
||||
|
||||
Original Transcriber's Note: This etext is prepared directly
|
||||
from an 1887 edition, and care has been taken to duplicate the
|
||||
original exactly, including typographical and punctuation
|
||||
vagaries.
|
||||
|
||||
Additions to the text include adding the underscore character to
|
||||
indicate italics, and textual end-notes in square braces.
|
||||
|
||||
Project Gutenberg Editor's Note: In reproofing and moving old PG
|
||||
files such as this to the present PG directory system it is the
|
||||
policy to reformat the text to conform to present PG Standards.
|
||||
In this case however, in consideration of the note above of the
|
||||
original transcriber describing his care to try to duplicate the
|
||||
original 1887 edition as to typography and punctuation vagaries,
|
||||
no changes have been made in this ascii text file. However, in
|
||||
the Latin-1 file and this html file, present standards are
|
||||
followed and the several French and Spanish words have been
|
||||
given their proper accents.
|
||||
|
||||
Part II, The Country of the Saints, deals much with the Mormon Church.
|
||||
|
||||
|
||||
|
||||
|
||||
A STUDY IN SCARLET.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
PART I.
|
||||
|
||||
(_Being a reprint from the reminiscences of_ JOHN H. WATSON, M.D., _late
|
||||
of the Army Medical Department._) [2]
|
||||
|
||||
|
||||
|
||||
|
||||
CHAPTER I. MR. SHERLOCK HOLMES.
|
||||
|
||||
|
||||
IN the year 1878 I took my degree of Doctor of Medicine of the
|
||||
University of London, and proceeded to Netley to go through the course
|
||||
prescribed for surgeons in the army. Having completed my studies there,
|
||||
I was duly attached to the Fifth Northumberland Fusiliers as Assistant
|
||||
Surgeon. The regiment was stationed in India at the time, and before
|
||||
I could join it, the second Afghan war had broken out. On landing at
|
||||
Bombay, I learned that my corps had advanced through the passes, and
|
||||
was already deep in the enemy's country. I followed, however, with many
|
||||
other officers who were in the same situation as myself, and succeeded
|
||||
in reaching Candahar in safety, where I found my regiment, and at once
|
||||
entered upon my new duties.
|
||||
|
||||
The campaign brought honours and promotion to many, but for me it had
|
||||
nothing but misfortune and disaster. I was removed from my brigade and
|
||||
attached to the Berkshires, with whom I served at the fatal battle of
|
||||
Maiwand. There I was struck on the shoulder by a Jezail bullet, which
|
||||
shattered the bone and grazed the subclavian artery. I should have
|
||||
fallen into the hands of the murderous Ghazis had it not been for the
|
||||
devotion and courage shown by Murray, my orderly, who threw me across a
|
||||
pack-horse, and succeeded in bringing me safely to the British lines.
|
||||
|
||||
Worn with pain, and weak from the prolonged hardships which I had
|
||||
undergone, I was removed, with a great train of wounded sufferers, to
|
||||
the base hospital at Peshawar. Here I rallied, and had already improved
|
||||
so far as to be able to walk about the wards, and even to bask a little
|
||||
upon the verandah, when I was struck down by enteric fever, that curse
|
||||
of our Indian possessions. For months my life was despaired of, and
|
||||
when at last I came to myself and became convalescent, I was so weak and
|
||||
emaciated that a medical board determined that not a day should be lost
|
||||
in sending me back to England. I was dispatched, accordingly, in the
|
||||
troopship "Orontes," and landed a month later on Portsmouth jetty, with
|
||||
my health irretrievably ruined, but with permission from a paternal
|
||||
government to spend the next nine months in attempting to improve it.
|
||||
|
||||
I had neither kith nor kin in England, and was therefore as free as
|
||||
air--or as free as an income of eleven shillings and sixpence a day will
|
||||
permit a man to be. Under such circumstances, I naturally gravitated to
|
||||
London, that great cesspool into which all the loungers and idlers of
|
||||
the Empire are irresistibly drained. There I stayed for some time at
|
||||
a private hotel in the Strand, leading a comfortless, meaningless
|
||||
existence, and spending such money as I had, considerably more freely
|
||||
than I ought. So alarming did the state of my finances become, that
|
||||
I soon realized that I must either leave the metropolis and rusticate
|
||||
somewhere in the country, or that I must make a complete alteration in
|
||||
my style of living. Choosing the latter alternative, I began by making
|
||||
up my mind to leave the hotel, and to take up my quarters in some less
|
||||
pretentious and less expensive domicile.
|
||||
|
||||
On the very day that I had come to this conclusion, I was standing at
|
||||
the Criterion Bar, when some one tapped me on the shoulder, and turning
|
||||
round I recognized young Stamford, who had been a dresser under me at
|
||||
Barts. The sight of a friendly face in the great wilderness of London is
|
||||
a pleasant thing indeed to a lonely man. In old days Stamford had never
|
||||
been a particular crony of mine, but now I hailed him with enthusiasm,
|
||||
and he, in his turn, appeared to be delighted to see me. In the
|
||||
exuberance of my joy, I asked him to lunch with me at the Holborn, and
|
||||
we started off together in a hansom.
|
||||
|
||||
"Whatever have you been doing with yourself, Watson?" he asked in
|
||||
undisguised wonder, as we rattled through the crowded London streets.
|
||||
"You are as thin as a lath and as brown as a nut."
|
||||
|
||||
I gave him a short sketch of my adventures, and had hardly concluded it
|
||||
by the time that we reached our destination.
|
||||
|
||||
"Poor devil!" he said, commiseratingly, after he had listened to my
|
||||
misfortunes. "What are you up to now?"
|
||||
|
||||
"Looking for lodgings." [3] I answered. "Trying to solve the problem
|
||||
as to whether it is possible to get comfortable rooms at a reasonable
|
||||
price."
|
||||
|
||||
"That's a strange thing," remarked my companion; "you are the second man
|
||||
to-day that has used that expression to me."
|
||||
|
||||
"And who was the first?" I asked.
|
||||
|
||||
"A fellow who is working at the chemical laboratory up at the hospital.
|
||||
He was bemoaning himself this morning because he could not get someone
|
||||
to go halves with him in some nice rooms which he had found, and which
|
||||
were too much for his purse."
|
||||
|
||||
"By Jove!" I cried, "if he really wants someone to share the rooms and
|
||||
the expense, I am the very man for him. I should prefer having a partner
|
||||
to being alone."
|
||||
|
||||
Young Stamford looked rather strangely at me over his wine-glass. "You
|
||||
don't know Sherlock Holmes yet," he said; "perhaps you would not care
|
||||
for him as a constant companion."
|
||||
|
||||
"Why, what is there against him?"
|
||||
|
||||
"Oh, I didn't say there was anything against him. He is a little queer
|
||||
in his ideas--an enthusiast in some branches of science. As far as I
|
||||
know he is a decent fellow enough."
|
||||
|
||||
"A medical student, I suppose?" said I.
|
||||
|
||||
"No--I have no idea what he intends to go in for. I believe he is well
|
||||
up in anatomy, and he is a first-class chemist; but, as far as I know,
|
||||
he has never taken out any systematic medical classes. His studies are
|
||||
very desultory and eccentric, but he has amassed a lot of out-of-the way
|
||||
knowledge which would astonish his professors."
|
||||
|
||||
"Did you never ask him what he was going in for?" I asked.
|
||||
|
||||
"No; he is not a man that it is easy to draw out, though he can be
|
||||
communicative enough when the fancy seizes him."
|
||||
|
||||
"I should like to meet him," I said. "If I am to lodge with anyone, I
|
||||
should prefer a man of studious and quiet habits. I am not strong
|
||||
enough yet to stand much noise or excitement. I had enough of both in
|
||||
Afghanistan to last me for the remainder of my natural existence. How
|
||||
could I meet this friend of yours?"
|
||||
|
||||
"He is sure to be at the laboratory," returned my companion. "He either
|
||||
avoids the place for weeks, or else he works there from morning to
|
||||
night. If you like, we shall drive round together after luncheon."
|
||||
|
||||
"Certainly," I answered, and the conversation drifted away into other
|
||||
channels.
|
||||
|
||||
As we made our way to the hospital after leaving the Holborn, Stamford
|
||||
gave me a few more particulars about the gentleman whom I proposed to
|
||||
take as a fellow-lodger.
|
||||
|
||||
"You mustn't blame me if you don't get on with him," he said; "I know
|
||||
nothing more of him than I have learned from meeting him occasionally in
|
||||
the laboratory. You proposed this arrangement, so you must not hold me
|
||||
responsible."
|
||||
|
||||
"If we don't get on it will be easy to part company," I answered. "It
|
||||
seems to me, Stamford," I added, looking hard at my companion, "that you
|
||||
have some reason for washing your hands of the matter. Is this fellow's
|
||||
temper so formidable, or what is it? Don't be mealy-mouthed about it."
|
||||
|
||||
"It is not easy to express the inexpressible," he answered with a laugh.
|
||||
"Holmes is a little too scientific for my tastes--it approaches to
|
||||
cold-bloodedness. I could imagine his giving a friend a little pinch of
|
||||
the latest vegetable alkaloid, not out of malevolence, you understand,
|
||||
but simply out of a spirit of inquiry in order to have an accurate idea
|
||||
of the effects. To do him justice, I think that he would take it himself
|
||||
with the same readiness. He appears to have a passion for definite and
|
||||
exact knowledge."
|
||||
|
||||
"Very right too."
|
||||
|
||||
"Yes, but it may be pushed to excess. When it comes to beating the
|
||||
subjects in the dissecting-rooms with a stick, it is certainly taking
|
||||
rather a bizarre shape."
|
||||
|
||||
"Beating the subjects!"
|
||||
|
||||
"Yes, to verify how far bruises may be produced after death. I saw him
|
||||
at it with my own eyes."
|
||||
|
||||
"And yet you say he is not a medical student?"
|
||||
abcdef |