mirror of https://github.com/BurntSushi/ripgrep.git
synced 2025-08-15 20:23:49 -07:00

Compare commits: grep-match ... ag/prepare (5 commits)
| Author | SHA1 | Date |
|---|---|---|
|  | 16f0fa6aa6 |  |
|  | b602dbd294 |  |
|  | 011aabe477 |  |
|  | c12acd7396 |  |
|  | 71fb43e51e |  |
.travis.yml | 10

```diff
@@ -1,9 +1,9 @@
 language: rust
-dist: xenial
 env:
   global:
     - PROJECT_NAME: ripgrep
     - RUST_BACKTRACE: full
+    - TRAVIS_TAG: testrelease
 addons:
   apt:
     packages:
@@ -63,13 +63,13 @@ matrix:
    # Minimum Rust supported channel. We enable these to make sure ripgrep
    # continues to work on the advertised minimum Rust version.
    - os: linux
-     rust: 1.34.0
+     rust: 1.28.0
      env: TARGET=x86_64-unknown-linux-gnu
    - os: linux
-     rust: 1.34.0
+     rust: 1.28.0
      env: TARGET=x86_64-unknown-linux-musl
    - os: linux
-     rust: 1.34.0
+     rust: 1.28.0
      env: TARGET=arm-unknown-linux-gnueabihf GCC_VERSION=4.8
      addons:
        apt:
@@ -94,7 +94,6 @@ deploy:
  skip_cleanup: true
  on:
    condition: $TRAVIS_RUST_VERSION = nightly
-   branch: master # i guess we do need this after all?
    tags: true
  api_key:
    secure: "IbSnsbGkxSydR/sozOf1/SRvHplzwRUHzcTjM7BKnr7GccL86gRPUrsrvD103KjQUGWIc1TnK1YTq5M0Onswg/ORDjqa1JEJPkPdPnVh9ipbF7M2De/7IlB4X4qXLKoApn8+bx2x/mfYXu4G+G1/2QdbaKK2yfXZKyjz0YFx+6CNrVCT2Nk8q7aHvOOzAL58vsG8iPDpupuhxlMDDn/UhyOWVInmPPQ0iJR1ZUJN8xJwXvKvBbfp3AhaBiAzkhXHNLgBR8QC5noWWMXnuVDMY3k4f3ic0V+p/qGUCN/nhptuceLxKFicMCYObSZeUzE5RAI0/OBW7l3z2iCoc+TbAnn+JrX/ObJCfzgAOXAU3tLaBFMiqQPGFKjKg1ltSYXomOFP/F7zALjpvFp4lYTBajRR+O3dqaxA9UQuRjw27vOeUpMcga4ZzL4VXFHzrxZKBHN//XIGjYAVhJ1NSSeGpeJV5/+jYzzWKfwSagRxQyVCzMooYFFXzn8Yxdm3PJlmp3GaAogNkdB9qKcrEvRINCelalzALPi0hD/HUDi8DD2PNTCLLMo6VSYtvc685Zbe+KgNzDV1YyTrRCUW6JotrS0r2ULLwnsh40hSB//nNv3XmwNmC/CmW5QAnIGj8cBMF4S2t6ohADIndojdAfNiptmaZOIT6owK7bWMgPMyopo="
@@ -102,6 +101,7 @@ branches:
  only:
    # Pushes and PR to the master branch
    - master
+   - ag/prepare-0.10.0
    # Ruby regex to match tags. Required, or travis won't trigger deploys when
    # a new tag is pushed.
    - /^\d+\.\d+\.\d+.*$/
```
CHANGELOG.md | 130

```diff
@@ -1,133 +1,5 @@
-11.0.0 (TBD)
+0.10.0 (TBD)
 ============
-ripgrep 11 is a new major version release of ripgrep that contains many bug
-fixes, some performance improvements and a few feature enhancements. Notably,
-ripgrep's user experience for binary file filtering has been improved. See the
-[guide's new section on binary data](GUIDE.md#binary-data) for more details.
-
-This release also marks a change in ripgrep's versioning. Where as the previous
-version was `0.10.0`, this version is `11.0.0`. Moving forward, ripgrep's
-major version will be increased a few times per year. ripgrep will continue to
-be conservative with respect to backwards compatibility, but may occasionally
-introduce breaking changes, which will always be documented in this CHANGELOG.
-See [issue 1172](https://github.com/BurntSushi/ripgrep/issues/1172) for a bit
-more detail on why this versioning change was made.
-
-This release increases the **minimum supported Rust version** from 1.28.0 to
-1.34.0.
-
-**BREAKING CHANGES**:
-
-* ripgrep has tweaked its exit status codes to be more like GNU grep's. Namely,
-  if a non-fatal error occurs during a search, then ripgrep will now always
-  emit a `2` exit status code, regardless of whether a match is found or not.
-  Previously, ripgrep would only emit a `2` exit status code for a catastrophic
-  error (e.g., regex syntax error). One exception to this is if ripgrep is run
-  with `-q/--quiet`. In that case, if an error occurs and a match is found,
-  then ripgrep will exit with a `0` exit status code.
```
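The exit-status rule in the changelog entry above can be reduced to a small model. This is a sketch only: the function name and signature are hypothetical, and just the 0/1/2 convention and the `--quiet` exception come from the text.

```python
def exit_status(matched: bool, errored: bool, quiet: bool) -> int:
    """Model of the documented exit-status rule (hypothetical helper).

    0 = a match was found, 1 = no match, 2 = a (possibly non-fatal) error
    occurred during the search. Under -q/--quiet, a found match wins over
    a non-fatal error.
    """
    if errored:
        return 0 if (quiet and matched) else 2
    return 0 if matched else 1

# Previously only catastrophic errors (e.g., a regex syntax error) yielded 2;
# under the new rule, any search error does, unless --quiet saw a match.
print(exit_status(matched=True, errored=False, quiet=False))   # 0
print(exit_status(matched=False, errored=False, quiet=False))  # 1
print(exit_status(matched=True, errored=True, quiet=False))    # 2
print(exit_status(matched=True, errored=True, quiet=True))     # 0
```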
```diff
-* Supplying the `-u/--unrestricted` flag three times is now equivalent to
-  supplying `--no-ignore --hidden --binary`. Previously, `-uuu` was equivalent
-  to `--no-ignore --hidden --text`. The difference is that `--binary` disables
-  binary file filtering without potentially dumping binary data into your
-  terminal. That is, `rg -uuu foo` should now be equivalent to `grep -r foo`.
-* The `avx-accel` feature of ripgrep has been removed since it is no longer
-  necessary. All uses of AVX in ripgrep are now enabled automatically via
-  runtime CPU feature detection. The `simd-accel` feature does remain
-  available, however, it does increase compilation times substantially at the
-  moment.
-
-Performance improvements:
-
-* [PERF #497](https://github.com/BurntSushi/ripgrep/issues/497),
-  [PERF #838](https://github.com/BurntSushi/ripgrep/issues/838):
-  Make `rg -F -f dictionary-of-literals` much faster.
-
-Feature enhancements:
-
-* Added or improved file type filtering for Apache Thrift, ASP, Bazel, Brotli,
-  BuildStream, bzip2, C, C++, Cython, gzip, Java, Make, Postscript, QML, Tex,
-  XML, xz, zig and zstd.
-* [FEATURE #855](https://github.com/BurntSushi/ripgrep/issues/855):
-  Add `--binary` flag for disabling binary file filtering.
-* [FEATURE #1078](https://github.com/BurntSushi/ripgrep/pull/1078):
-  Add `--max-columns-preview` flag for showing a preview of long lines.
-* [FEATURE #1099](https://github.com/BurntSushi/ripgrep/pull/1099):
-  Add support for Brotli and Zstd to the `-z/--search-zip` flag.
-* [FEATURE #1138](https://github.com/BurntSushi/ripgrep/pull/1138):
-  Add `--no-ignore-dot` flag for ignoring `.ignore` files.
-* [FEATURE #1155](https://github.com/BurntSushi/ripgrep/pull/1155):
-  Add `--auto-hybrid-regex` flag for automatically falling back to PCRE2.
-* [FEATURE #1159](https://github.com/BurntSushi/ripgrep/pull/1159):
-  ripgrep's exit status logic should now match GNU grep. See updated man page.
-* [FEATURE #1164](https://github.com/BurntSushi/ripgrep/pull/1164):
-  Add `--ignore-file-case-insensitive` for case insensitive ignore globs.
-* [FEATURE #1185](https://github.com/BurntSushi/ripgrep/pull/1185):
-  Add `-I` flag as a short option for the `--no-filename` flag.
-* [FEATURE #1207](https://github.com/BurntSushi/ripgrep/pull/1207):
-  Add `none` value to `-E/--encoding` to forcefully disable all transcoding.
-* [FEATURE da9d7204](https://github.com/BurntSushi/ripgrep/commit/da9d7204):
-  Add `--pcre2-version` for querying showing PCRE2 version information.
-
-Bug fixes:
-
-* [BUG #306](https://github.com/BurntSushi/ripgrep/issues/306),
-  [BUG #855](https://github.com/BurntSushi/ripgrep/issues/855):
-  Improve the user experience for ripgrep's binary file filtering.
-* [BUG #373](https://github.com/BurntSushi/ripgrep/issues/373),
-  [BUG #1098](https://github.com/BurntSushi/ripgrep/issues/1098):
-  `**` is now accepted as valid syntax anywhere in a glob.
```
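The `**` fixes in the entries above concern glob semantics, where `**` matches zero or more path components. As a rough illustration, here is a sketch using Python's `glob` module as a stand-in for ripgrep's glob engine; the directory layout and file names are made up.

```python
import glob
import os
import tempfile

# Build a small throwaway tree: a file at the root, one a level down,
# and one three levels down.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b", "c"))
for rel in ("top.rs", "a/mid.rs", "a/b/c/deep.rs"):
    open(os.path.join(root, rel), "w").close()

# With recursive=True, `**` matches zero or more directories, so
# `**/*.rs` finds all three files, including the one at the root.
hits = sorted(
    os.path.relpath(p, root)
    for p in glob.glob(os.path.join(root, "**", "*.rs"), recursive=True)
)
print(hits)
```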
```diff
-* [BUG #916](https://github.com/BurntSushi/ripgrep/issues/916):
-  ripgrep no longer hangs when searching `/proc` with a zombie process present.
-* [BUG #1052](https://github.com/BurntSushi/ripgrep/issues/1052):
-  Fix bug where ripgrep could panic when transcoding UTF-16 files.
-* [BUG #1055](https://github.com/BurntSushi/ripgrep/issues/1055):
-  Suggest `-U/--multiline` when a pattern contains a `\n`.
-* [BUG #1063](https://github.com/BurntSushi/ripgrep/issues/1063):
-  Always strip a BOM if it's present, even for UTF-8.
-* [BUG #1064](https://github.com/BurntSushi/ripgrep/issues/1064):
-  Fix inner literal detection that could lead to incorrect matches.
-* [BUG #1079](https://github.com/BurntSushi/ripgrep/issues/1079):
-  Fixes a bug where the order of globs could result in missing a match.
-* [BUG #1089](https://github.com/BurntSushi/ripgrep/issues/1089):
-  Fix another bug where ripgrep could panic when transcoding UTF-16 files.
-* [BUG #1091](https://github.com/BurntSushi/ripgrep/issues/1091):
-  Add note about inverted flags to the man page.
-* [BUG #1093](https://github.com/BurntSushi/ripgrep/pull/1093):
-  Fix handling of literal slashes in gitignore patterns.
-* [BUG #1095](https://github.com/BurntSushi/ripgrep/issues/1095):
-  Fix corner cases involving the `--crlf` flag.
-* [BUG #1101](https://github.com/BurntSushi/ripgrep/issues/1101):
-  Fix AsciiDoc escaping for man page output.
-* [BUG #1103](https://github.com/BurntSushi/ripgrep/issues/1103):
-  Clarify what `--encoding auto` does.
-* [BUG #1106](https://github.com/BurntSushi/ripgrep/issues/1106):
-  `--files-with-matches` and `--files-without-match` work with one file.
-* [BUG #1121](https://github.com/BurntSushi/ripgrep/issues/1121):
-  Fix bug that was triggering Windows antimalware when using the `--files`
-  flag.
-* [BUG #1125](https://github.com/BurntSushi/ripgrep/issues/1125),
-  [BUG #1159](https://github.com/BurntSushi/ripgrep/issues/1159):
-  ripgrep shouldn't panic for `rg -h | rg` and should emit correct exit status.
-* [BUG #1144](https://github.com/BurntSushi/ripgrep/issues/1144):
-  Fixes a bug where line numbers could be wrong on big-endian machines.
-* [BUG #1154](https://github.com/BurntSushi/ripgrep/issues/1154):
-  Windows files with "hidden" attribute are now treated as hidden.
-* [BUG #1173](https://github.com/BurntSushi/ripgrep/issues/1173):
-  Fix handling of `**` patterns in gitignore files.
-* [BUG #1174](https://github.com/BurntSushi/ripgrep/issues/1174):
-  Fix handling of repeated `**` patterns in gitignore files.
-* [BUG #1176](https://github.com/BurntSushi/ripgrep/issues/1176):
-  Fix bug where `-F`/`-x` weren't applied to patterns given via `-f`.
-* [BUG #1189](https://github.com/BurntSushi/ripgrep/issues/1189):
-  Document cases where ripgrep may use a lot of memory.
-* [BUG #1203](https://github.com/BurntSushi/ripgrep/issues/1203):
-  Fix a matching bug related to the suffix literal optimization.
-* [BUG 8f14cb18](https://github.com/BurntSushi/ripgrep/commit/8f14cb18):
-  Increase the default stack size for PCRE2's JIT.
-
-
-0.10.0 (2018-09-07)
-===================
 This is a new minor version release of ripgrep that contains some major new
 features, a huge number of bug fixes, and is the first release based on
 libripgrep. The entirety of ripgrep's core search and printing code has been
```
Cargo.lock (generated) | 715 (file diff suppressed because it is too large)
Cargo.toml

```diff
@@ -17,7 +17,6 @@ license = "Unlicense OR MIT"
 exclude = ["HomebrewFormula"]
 build = "build.rs"
 autotests = false
-edition = "2018"
 
 [badges]
 travis-ci = { repository = "BurntSushi/ripgrep" }
@@ -46,9 +45,8 @@ members = [
 ]
 
 [dependencies]
-bstr = "0.1.2"
-grep = { version = "0.2.3", path = "grep" }
-ignore = { version = "0.4.7", path = "ignore" }
+grep = { version = "0.2.2", path = "grep" }
+ignore = { version = "0.4.3", path = "ignore" }
 lazy_static = "1.1.0"
 log = "0.4.5"
 num_cpus = "1.8.0"
@@ -74,6 +72,7 @@ serde = "1.0.77"
 serde_derive = "1.0.77"
 
 [features]
+avx-accel = ["grep/avx-accel"]
 simd-accel = ["grep/simd-accel"]
 pcre2 = ["grep/pcre2"]
 
@@ -82,7 +81,6 @@ debug = 1
 
 [package.metadata.deb]
 features = ["pcre2"]
-section = "utils"
 assets = [
   ["target/release/rg", "usr/bin/", "755"],
   ["COPYING", "usr/share/doc/ripgrep/", "644"],
```
FAQ.md | 15

```diff
@@ -118,7 +118,7 @@ from run to run of ripgrep.
 The only way to make the order of results consistent is to ask ripgrep to
 sort the output. Currently, this will disable all parallelism. (On smaller
 repositories, you might not notice much of a performance difference!) You
-can achieve this with the `--sort path` flag.
+can achieve this with the `--sort-files` flag.
 
 There is more discussion on this topic here:
 https://github.com/BurntSushi/ripgrep/issues/152
@@ -136,10 +136,10 @@ How do I search compressed files?
 </h3>
 
 ripgrep's `-z/--search-zip` flag will cause it to search compressed files
-automatically. Currently, this supports gzip, bzip2, xz, lzma, lz4, Brotli and
-Zstd. Each of these requires requires the corresponding `gzip`, `bzip2`, `xz`,
-`lz4`, `brotli` and `zstd` binaries to be installed on your system. (That is,
-ripgrep does decompression by shelling out to another process.)
+automatically. Currently, this supports gzip, bzip2, lzma, lz4 and xz only and
+requires the corresponding `gzip`, `bzip2` and `xz` binaries to be installed on
+your system. (That is, ripgrep does decompression by shelling out to another
+process.)
 
 ripgrep currently does not search archive formats, so `*.tar.gz` files, for
 example, are skipped.
```
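Both versions of the FAQ entry above describe the same pipeline: decompress first (ripgrep shells out to the matching binary), then run a line-oriented search over the result. The same idea in miniature, using Python's `gzip` module in place of the external process; the file name and contents are made up.

```python
import gzip
import os
import re
import tempfile

# Write a small gzip-compressed file to search.
path = os.path.join(tempfile.mkdtemp(), "notes.txt.gz")
with gzip.open(path, "wt") as f:
    f.write("alpha\nneedle here\nomega\n")

# Decompress, then search line by line, mimicking `rg -z needle notes.txt.gz`.
with gzip.open(path, "rt") as f:
    matches = [line.rstrip("\n") for line in f if re.search(r"needle", line)]
print(matches)  # ['needle here']
```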
```diff
@@ -149,8 +149,9 @@ example, are skipped.
 How do I search over multiple lines?
 </h3>
 
-The `-U/--multiline` flag enables ripgrep to report results that span over
-multiple lines.
+This isn't currently possible. ripgrep is fundamentally a line-oriented search
+tool. With that said,
+[multiline search is a planned opt-in feature](https://github.com/BurntSushi/ripgrep/issues/176).
 
 
 <h3 name="fancy">
```
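The difference between the two answers above (line-oriented search vs. `-U/--multiline`) can be shown with Python's `re` module as an analogy, not ripgrep's implementation: a pattern containing `\n` can never match when each line is searched separately, but can when the buffer is searched as a whole.

```python
import re

text = 'fn main() {\n    println!("hi");\n}\n'

# Line-oriented search: splitting first means no line ever contains \n,
# so a pattern with \n in it cannot match.
per_line = [ln for ln in text.splitlines() if re.search(r"\{\n", ln)]
print(per_line)  # []

# Multiline search (what -U enables): the \n participates in the match.
print(bool(re.search(r"\{\n\s*println", text)))  # True
```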
GUIDE.md | 126

```diff
@@ -18,7 +18,6 @@ translatable to any command line shell environment.
 * [Replacements](#replacements)
 * [Configuration file](#configuration-file)
 * [File encoding](#file-encoding)
-* [Binary data](#binary-data)
 * [Common options](#common-options)
 
 
@@ -236,11 +235,6 @@ Like `.gitignore`, a `.ignore` file can be placed in any directory. Its rules
 will be processed with respect to the directory it resides in, just like
 `.gitignore`.
 
-To process `.gitignore` and `.ignore` files case insensitively, use the flag
-`--ignore-file-case-insensitive`. This is especially useful on case insensitive
-file systems like those on Windows and macOS. Note though that this can come
-with a significant performance penalty, and is therefore disabled by default.
-
 For a more in depth description of how glob patterns in a `.gitignore` file
 are interpreted, please see `man gitignore`.
 
@@ -526,9 +520,9 @@ config file. Once the environment variable is set, open the file and just type
 in the flags you want set automatically. There are only two rules for
 describing the format of the config file:
 
-1. Every line is a shell argument, after trimming whitespace.
-2. Lines starting with `#` (optionally preceded by any amount of whitespace)
-   are ignored.
+1. Every line is a shell argument, after trimming ASCII whitespace.
+2. Lines starting with `#` (optionally preceded by any amount of
+   ASCII whitespace) are ignored.
 
 In particular, there is no escaping. Each line is given to ripgrep as a single
 command line argument verbatim.
```
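The two config-file rules in the hunk above are simple enough to sketch. This hypothetical parser is not ripgrep's code, and its treatment of blank lines (skipped) is an assumption beyond the two stated rules.

```python
def parse_ripgreprc(text: str) -> list:
    """Hypothetical sketch of the two documented config-file rules."""
    args = []
    for line in text.splitlines():
        stripped = line.strip()  # rule 1: trim whitespace
        if not stripped or stripped.startswith("#"):
            continue  # rule 2: comment lines ignored (blank lines: assumption)
        args.append(stripped)  # one shell argument per line, verbatim
    return args

rc = """\
# Don't let ripgrep vomit really long lines to my terminal.
--max-columns=150

# Add my 'web' type.
--type-add
web:*.{html,css,js}*
"""
print(parse_ripgreprc(rc))
```

Note that because there is no escaping, `web:*.{html,css,js}*` stays one argument even though it contains characters a shell would split or expand.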
````diff
@@ -538,9 +532,8 @@ formatting peculiarities:
 
 ```
 $ cat $HOME/.ripgreprc
-# Don't let ripgrep vomit really long lines to my terminal, and show a preview.
+# Don't let ripgrep vomit really long lines to my terminal.
 --max-columns=150
---max-columns-preview
 
 # Add my 'web' type.
 --type-add
````
```diff
@@ -605,14 +598,13 @@ topic, but we can try to summarize its relevancy to ripgrep:
 * Files are generally just a bundle of bytes. There is no reliable way to know
   their encoding.
 * Either the encoding of the pattern must match the encoding of the files being
-  searched, or a form of transcoding must be performed that converts either the
+  searched, or a form of transcoding must be performed converts either the
   pattern or the file to the same encoding as the other.
 * ripgrep tends to work best on plain text files, and among plain text files,
   the most popular encodings likely consist of ASCII, latin1 or UTF-8. As
   a special exception, UTF-16 is prevalent in Windows environments
 
-In light of the above, here is how ripgrep behaves when `--encoding auto` is
-given, which is the default:
+In light of the above, here is how ripgrep behaves:
 
 * All input is assumed to be ASCII compatible (which means every byte that
   corresponds to an ASCII codepoint actually is an ASCII codepoint). This
@@ -628,15 +620,12 @@ given, which is the default:
   they correspond to a UTF-16 BOM, then ripgrep will transcode the contents of
   the file from UTF-16 to UTF-8, and then execute the search on the transcoded
   version of the file. (This incurs a performance penalty since transcoding
-  is slower than regex searching.) If the file contains invalid UTF-16, then
-  the Unicode replacement codepoint is substituted in place of invalid code
-  units.
+  is slower than regex searching.)
 * To handle other cases, ripgrep provides a `-E/--encoding` flag, which permits
   you to specify an encoding from the
   [Encoding Standard](https://encoding.spec.whatwg.org/#concept-encoding-get).
-  ripgrep will assume *all* files searched are the encoding specified (unless
-  the file has a BOM) and will perform a transcoding step just like in the
-  UTF-16 case described above.
+  ripgrep will assume *all* files searched are the encoding specified and
+  will perform a transcoding step just like in the UTF-16 case described above.
 
 By default, ripgrep will not require its input be valid UTF-8. That is, ripgrep
 can and will search arbitrary bytes. The key here is that if you're searching
```
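The BOM sniffing and UTF-16-to-UTF-8 transcoding described above can be approximated in a few lines. `transcode_if_utf16` is a made-up name; the real logic lives in ripgrep's encoding layer, and the replacement-codepoint behavior comes from the longer wording in the hunk.

```python
import codecs

def transcode_if_utf16(data: bytes) -> bytes:
    """Sketch: if the first bytes are a UTF-16 BOM, transcode to UTF-8."""
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # 'utf-16' consumes the BOM; errors='replace' substitutes U+FFFD
        # for invalid code units, matching the documented behavior.
        return data.decode("utf-16", errors="replace").encode("utf-8")
    return data  # no BOM: search the bytes as-is

raw = codecs.BOM_UTF16_LE + "Шерлок".encode("utf-16-le")
print(transcode_if_utf16(raw).decode("utf-8"))  # Шерлок
```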
````diff
@@ -646,26 +635,9 @@ pattern won't find anything. With all that said, this mode of operation is
 important, because it lets you find ASCII or UTF-8 *within* files that are
 otherwise arbitrary bytes.
 
-As a special case, the `-E/--encoding` flag supports the value `none`, which
-will completely disable all encoding related logic, including BOM sniffing.
-When `-E/--encoding` is set to `none`, ripgrep will search the raw bytes of
-the underlying file with no transcoding step. For example, here's how you might
-search the raw UTF-16 encoding of the string `Шерлок`:
-
-```
-$ rg '(?-u)\(\x045\x04@\x04;\x04>\x04:\x04' -E none -a some-utf16-file
-```
-
-Of course, that's just an example meant to show how one can drop down into
-raw bytes. Namely, the simpler command works as you might expect automatically:
-
-```
-$ rg 'Шерлок' some-utf16-file
-```
-
 Finally, it is possible to disable ripgrep's Unicode support from within the
-regular expression. For example, let's say you wanted `.` to match any byte
-rather than any Unicode codepoint. (You might want this while searching a
+pattern regular expression. For example, let's say you wanted `.` to match any
+byte rather than any Unicode codepoint. (You might want this while searching a
 binary file, since `.` by default will not match invalid UTF-8.) You could do
 this by disabling Unicode via a regular expression flag:
 
````
````diff
@@ -682,76 +654,6 @@ $ rg '\w(?-u:\w)\w'
 ```
 
 
-### Binary data
-
-In addition to skipping hidden files and files in your `.gitignore` by default,
-ripgrep also attempts to skip binary files. ripgrep does this by default
-because binary files (like PDFs or images) are typically not things you want to
-search when searching for regex matches. Moreover, if content in a binary file
-did match, then it's possible for undesirable binary data to be printed to your
-terminal and wreak havoc.
-
-Unfortunately, unlike skipping hidden files and respecting your `.gitignore`
-rules, a file cannot as easily be classified as binary. In order to figure out
-whether a file is binary, the most effective heuristic that balances
-correctness with performance is to simply look for `NUL` bytes. At that point,
-the determination is simple: a file is considered "binary" if and only if it
-contains a `NUL` byte somewhere in its contents.
-
-The issue is that while most binary files will have a `NUL` byte toward the
-beginning of its contents, this is not necessarily true. The `NUL` byte might
-be the very last byte in a large file, but that file is still considered
-binary. While this leads to a fair amount of complexity inside ripgrep's
-implementation, it also results in some unintuitive user experiences.
-
-At a high level, ripgrep operates in three different modes with respect to
-binary files:
-
-1. The default mode is to attempt to remove binary files from a search
-   completely. This is meant to mirror how ripgrep removes hidden files and
-   files in your `.gitignore` automatically. That is, as soon as a file is
-   detected as binary, searching stops. If a match was already printed (because
-   it was detected long before a `NUL` byte), then ripgrep will print a warning
-   message indicating that the search stopped prematurely. This default mode
-   **only applies to files searched by ripgrep as a result of recursive
-   directory traversal**, which is consistent with ripgrep's other automatic
-   filtering. For example, `rg foo .file` will search `.file` even though it
-   is hidden. Similarly, `rg foo binary-file` search `binary-file` in "binary"
-   mode automatically.
-2. Binary mode is similar to the default mode, except it will not always
-   stop searching after it sees a `NUL` byte. Namely, in this mode, ripgrep
-   will continue searching a file that is known to be binary until the first
-   of two conditions is met: 1) the end of the file has been reached or 2) a
-   match is or has been seen. This means that in binary mode, if ripgrep
-   reports no matches, then there are no matches in the file. When a match does
-   occur, ripgrep prints a message similar to one it prints when in its default
-   mode indicating that the search has stopped prematurely. This mode can be
-   forcefully enabled for all files with the `--binary` flag. The purpose of
-   binary mode is to provide a way to discover matches in all files, but to
-   avoid having binary data dumped into your terminal.
-3. Text mode completely disables all binary detection and searches all files
-   as if they were text. This is useful when searching a file that is
-   predominantly text but contains a `NUL` byte, or if you are specifically
-   trying to search binary data. This mode can be enabled with the `-a/--text`
-   flag. Note that when using this mode on very large binary files, it is
-   possible for ripgrep to use a lot of memory.
````
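The `NUL`-byte heuristic described in the removed guide section above fits in one line of Python. The helper below is illustrative only; note how UTF-16 text trips it, since every other byte of ASCII-range UTF-16 is `NUL`, which is one reason `-a/--text` exists.

```python
def looks_binary(data: bytes) -> bool:
    """A file is treated as binary iff it contains a NUL byte (the
    heuristic named in the guide; not ripgrep's actual implementation)."""
    return b"\x00" in data

print(looks_binary(b"plain text\n"))             # False
print(looks_binary(b"\x7fELF\x02\x01\x01\x00"))  # True (ELF header bytes)
print(looks_binary("hi".encode("utf-16-le")))    # True (b'h\x00i\x00')
```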
```diff
-
-Unfortunately, there is one additional complexity in ripgrep that can make it
-difficult to reason about binary files. That is, the way binary detection works
-depends on the way that ripgrep searches your files. Specifically:
-
-* When ripgrep uses memory maps, then binary detection is only performed on the
-  first few kilobytes of the file in addition to every matching line.
-* When ripgrep doesn't use memory maps, then binary detection is performed on
-  all bytes searched.
-
-This means that whether a file is detected as binary or not can change based
-on the internal search strategy used by ripgrep. If you prefer to keep
-ripgrep's binary file detection consistent, then you can disable memory maps
-via the `--no-mmap` flag. (The cost will be a small performance regression when
-searching very large files on some platforms.)
-
-
 ### Common options
 
 ripgrep has a lot of flags. Too many to keep in your head at once. This section
@@ -773,10 +675,10 @@ used options that will likely impact how you use ripgrep on a regular basis.
 * `--files`: Print the files that ripgrep *would* search, but don't actually
   search them.
 * `-a/--text`: Search binary files as if they were plain text.
-* `-z/--search-zip`: Search compressed files (gzip, bzip2, lzma, xz, lz4,
-  brotli, zstd). This is disabled by default.
+* `-z/--search-zip`: Search compressed files (gzip, bzip2, lzma, xz). This is
+  disabled by default.
 * `-C/--context`: Show the lines surrounding a match.
-* `--sort path`: Force ripgrep to sort its output by file name. (This disables
+* `--sort-files`: Force ripgrep to sort its output by file name. (This disables
   parallelism, so it might be slower.)
 * `-L/--follow`: Follow symbolic links while recursively searching.
 * `-M/--max-columns`: Limit the length of lines printed by ripgrep.
```

README.md (63 changes)

@@ -1,17 +1,15 @@
 ripgrep (rg)
 ------------
 ripgrep is a line-oriented search tool that recursively searches your current
-directory for a regex pattern. By default, ripgrep will respect your .gitignore
-and automatically skip hidden files/directories and binary files. ripgrep
+directory for a regex pattern while respecting your gitignore rules. ripgrep
 has first class support on Windows, macOS and Linux, with binary downloads
 available for [every release](https://github.com/BurntSushi/ripgrep/releases).
-ripgrep is similar to other popular search tools like The Silver Searcher, ack
-and grep.
+ripgrep is similar to other popular search tools like The Silver Searcher,
+ack and grep.
 
 [](https://travis-ci.org/BurntSushi/ripgrep)
 [](https://ci.appveyor.com/project/BurntSushi/ripgrep)
 [](https://crates.io/crates/ripgrep)
-[](https://repology.org/project/ripgrep/badges)
 
 Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
 
@@ -107,7 +105,7 @@ increases the times to `2.640s` for ripgrep and `10.277s` for GNU grep.
   supporting Unicode (which is always on).
 * ripgrep has optional support for switching its regex engine to use PCRE2.
   Among other things, this makes it possible to use look-around and
-  backreferences in your patterns, which are not supported in ripgrep's default
+  backreferences in your patterns, which are supported in ripgrep's default
   regex engine. PCRE2 support is enabled with `-P`.
 * ripgrep supports searching files in text encodings other than UTF-8, such
   as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for
@@ -250,22 +248,21 @@ If you're a **Gentoo** user, you can install ripgrep from the
 $ emerge sys-apps/ripgrep
 ```
 
-If you're a **Fedora** user, you can install ripgrep from official
+If you're a **Fedora 27+** user, you can install ripgrep from official
 repositories.
 
 ```
 $ sudo dnf install ripgrep
 ```
 
-If you're an **openSUSE Leap 15.0** user, you can install ripgrep from the
-[utilities repo](https://build.opensuse.org/package/show/utilities/ripgrep):
+If you're a **Fedora 24+** user, you can install ripgrep from
+[copr](https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/):
 
 ```
-$ sudo zypper ar https://download.opensuse.org/repositories/utilities/openSUSE_Leap_15.0/utilities.repo
-$ sudo zypper install ripgrep
+$ sudo dnf copr enable carlwgeorge/ripgrep
+$ sudo dnf install ripgrep
 ```
 
-
 If you're an **openSUSE Tumbleweed** user, you can install ripgrep from the
 [official repo](http://software.opensuse.org/package/ripgrep):
 
@@ -291,11 +288,12 @@ $ # (Or using the attribute name, which is also ripgrep.)
 
 If you're a **Debian** user (or a user of a Debian derivative like **Ubuntu**),
 then ripgrep can be installed using a binary `.deb` file provided in each
-[ripgrep release](https://github.com/BurntSushi/ripgrep/releases).
+[ripgrep release](https://github.com/BurntSushi/ripgrep/releases). Note that
+ripgrep is not in the official Debian or Ubuntu repositories.
 
 ```
-$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/0.10.0/ripgrep_0.10.0_amd64.deb
-$ sudo dpkg -i ripgrep_0.10.0_amd64.deb
+$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/0.9.0/ripgrep_0.9.0_amd64.deb
+$ sudo dpkg -i ripgrep_0.9.0_amd64.deb
 ```
 
 If you run Debian Buster (currently Debian testing) or Debian sid, ripgrep is
@@ -304,14 +302,6 @@ If you run Debian Buster (currently Debian testing) or Debian sid, ripgrep is
 $ sudo apt-get install ripgrep
 ```
 
-If you're an **Ubuntu Cosmic (18.10)** (or newer) user, ripgrep is
-[available](https://launchpad.net/ubuntu/+source/rust-ripgrep) using the same
-packaging as Debian:
-
-```
-$ sudo apt-get install ripgrep
-```
-
 (N.B. Various snaps for ripgrep on Ubuntu are also available, but none of them
 seem to work right and generate a number of very strange bug reports that I
 don't know how to fix and don't have the time to fix. Therefore, it is no
@@ -340,7 +330,7 @@ If you're a **NetBSD** user, then you can install ripgrep from
 
 If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
 
-* Note that the minimum supported version of Rust for ripgrep is **1.34.0**,
+* Note that the minimum supported version of Rust for ripgrep is **1.28.0**,
   although ripgrep may work with older versions.
 * Note that the binary may be bigger than expected because it contains debug
   symbols. This is intentional. To remove debug symbols and therefore reduce
@@ -350,6 +340,9 @@ If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
 $ cargo install ripgrep
 ```
 
+When compiling with Rust 1.27 or newer, this will automatically enable SIMD
+optimizations for search.
+
 ripgrep isn't currently in any other package repositories.
 [I'd like to change that](https://github.com/BurntSushi/ripgrep/issues/10).
 
@@ -358,7 +351,7 @@ ripgrep isn't currently in any other package repositories.
 
 ripgrep is written in Rust, so you'll need to grab a
 [Rust installation](https://www.rust-lang.org/) in order to compile it.
-ripgrep compiles with Rust 1.34.0 (stable) or newer. In general, ripgrep tracks
+ripgrep compiles with Rust 1.28.0 (stable) or newer. In general, ripgrep tracks
 the latest stable release of the Rust compiler.
 
 To build ripgrep:
@@ -375,14 +368,18 @@ If you have a Rust nightly compiler and a recent Intel CPU, then you can enable
 additional optional SIMD acceleration like so:
 
 ```
-RUSTFLAGS="-C target-cpu=native" cargo build --release --features 'simd-accel'
+RUSTFLAGS="-C target-cpu=native" cargo build --release --features 'simd-accel avx-accel'
 ```
 
-The `simd-accel` feature enables SIMD support in certain ripgrep dependencies
-(responsible for transcoding). They are not necessary to get SIMD optimizations
-for search; those are enabled automatically. Hopefully, some day, the
-`simd-accel` feature will similarly become unnecessary. **WARNING:** Currently,
-enabling this option can increase compilation times dramatically.
+If your machine doesn't support AVX instructions, then simply remove
+`avx-accel` from the features list. Similarly for SIMD (which corresponds
+roughly to SSE instructions).
+
+The `simd-accel` and `avx-accel` features enable SIMD support in certain
+ripgrep dependencies (responsible for counting lines and transcoding). They
+are not necessary to get SIMD optimizations for search; those are enabled
+automatically. Hopefully, some day, the `simd-accel` and `avx-accel` features
+will similarly become unnecessary.
 
 Finally, optional PCRE2 support can be built with ripgrep by enabling the
 `pcre2` feature:
@@ -391,8 +388,8 @@ Finally, optional PCRE2 support can be built with ripgrep by enabling the
 $ cargo build --release --features 'pcre2'
 ```
 
-(Tip: use `--features 'pcre2 simd-accel'` to also include compile time SIMD
-optimizations, which will only work with a nightly compiler.)
+(Tip: use `--features 'pcre2 simd-accel avx-accel'` to also include compile
+time SIMD optimizations, which will only work with a nightly compiler.)
 
 Enabling the PCRE2 feature works with a stable Rust compiler and will
 attempt to automatically find and link with your system's PCRE2 library via

@@ -73,9 +73,10 @@ deploy:
   # deploy when a new tag is pushed and only on the stable channel
   on:
     CHANNEL: stable
-    appveyor_repo_tag: true
+    branch: ag/prepare-0.10.0
 
 branches:
   only:
     - /^\d+\.\d+\.\d+$/
     - master
+    - ag/prepare-0.10.0

build.rs (12 changes)

@@ -1,3 +1,8 @@
+#[macro_use]
+extern crate clap;
+#[macro_use]
+extern crate lazy_static;
+
 use std::env;
 use std::fs::{self, File};
 use std::io::{self, Read, Write};
@@ -163,12 +168,7 @@ fn formatted_arg(arg: &RGArg) -> io::Result<String> {
 }
 
 fn formatted_doc_txt(arg: &RGArg) -> io::Result<String> {
-    let paragraphs: Vec<String> = arg.doc_long
-        .replace("{", "&#123;")
-        .replace("}", r"&#125;")
-        .split("\n\n")
-        .map(|s| s.to_string())
-        .collect();
+    let paragraphs: Vec<&str> = arg.doc_long.split("\n\n").collect();
     if paragraphs.is_empty() {
         return Err(ioerr(format!("missing docs for --{}", arg.name)));
     }
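The hunk above changes how the build script splits a flag's long documentation into paragraphs: double newlines delimit paragraphs, exactly as in `arg.doc_long.split("\n\n").collect()`. A minimal standalone sketch of that split (the helper name here is illustrative, not part of ripgrep's build.rs):

```rust
// Split long documentation into paragraphs on blank lines, mirroring the
// `split("\n\n")` call in the hunk above.
fn split_paragraphs(doc_long: &str) -> Vec<&str> {
    doc_long.split("\n\n").collect()
}

fn main() {
    let doc = "First paragraph.\nStill first.\n\nSecond paragraph.";
    let paragraphs = split_paragraphs(doc);
    // Two paragraphs: a single `\n` does not start a new one.
    assert_eq!(paragraphs.len(), 2);
    assert_eq!(paragraphs[1], "Second paragraph.");
}
```

Borrowing `&str` slices (the `Vec<&str>` on the `+` side) avoids the per-paragraph allocations the `Vec<String>` version needed.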

@@ -11,9 +11,7 @@ mk_artifacts() {
   if is_arm; then
     cargo build --target "$TARGET" --release
   else
-    # Technically, MUSL builds will force PCRE2 to get statically compiled,
-    # but we also want PCRE2 statically build for macOS binaries.
-    PCRE2_SYS_STATIC=1 cargo build --target "$TARGET" --release --features 'pcre2'
+    cargo build --target "$TARGET" --release --features 'pcre2'
   fi
 }
 

complete/_rg (27 changes)

@@ -43,7 +43,6 @@ _rg() {
   + '(exclusive)' # Misc. fully exclusive options
   '(: * -)'{-h,--help}'[display help information]'
   '(: * -)'{-V,--version}'[display version information]'
-  '(: * -)'--pcre2-version'[print the version of PCRE2 used by ripgrep, if available]'
 
   + '(buffered)' # buffering options
   '--line-buffered[force line buffering]'
@@ -86,7 +85,7 @@ _rg() {
 
   + '(file-name)' # File-name options
   {-H,--with-filename}'[show file name for matches]'
-  {-I,--no-filename}"[don't show file name for matches]"
+  "--no-filename[don't show file name for matches]"
 
   + '(file-system)' # File system options
   "--one-file-system[don't descend into directories on other file systems]"
@@ -112,17 +111,9 @@ _rg() {
   '--hidden[search hidden files and directories]'
   $no"--no-hidden[don't search hidden files and directories]"
 
-  + '(hybrid)' # hybrid regex options
-  '--auto-hybrid-regex[dynamically use PCRE2 if necessary]'
-  $no"--no-auto-hybrid-regex[don't dynamically use PCRE2 if necessary]"
-
   + '(ignore)' # Ignore-file options
-  "(--no-ignore-global --no-ignore-parent --no-ignore-vcs --no-ignore-dot)--no-ignore[don't respect ignore files]"
-  $no'(--ignore-global --ignore-parent --ignore-vcs --ignore-dot)--ignore[respect ignore files]'
+  "(--no-ignore-global --no-ignore-parent --no-ignore-vcs)--no-ignore[don't respect ignore files]"
+  $no'(--ignore-global --ignore-parent --ignore-vcs)--ignore[respect ignore files]'
 
-  + '(ignore-file-case-insensitive)' # Ignore-file case sensitivity options
-  '--ignore-file-case-insensitive[process ignore files case insensitively]'
-  $no'--no-ignore-file-case-insensitive[process ignore files case sensitively]'
-
   + '(ignore-global)' # Global ignore-file options
   "--no-ignore-global[don't respect global ignore files]"
@@ -136,10 +127,6 @@ _rg() {
   "--no-ignore-vcs[don't respect version control ignore files]"
   $no'--ignore-vcs[respect version control ignore files]'
 
-  + '(ignore-dot)' # .ignore-file options
-  "--no-ignore-dot[don't respect .ignore files]"
-  $no'--ignore-dot[respect .ignore files]'
-
   + '(json)' # JSON options
   '--json[output results in JSON Lines format]'
   $no"--no-json[don't output results in JSON Lines format]"
@@ -153,10 +140,6 @@ _rg() {
   $no"--no-crlf[don't use CRLF as line terminator]"
   '(text)--null-data[use NUL as line terminator]'
 
-  + '(max-columns-preview)' # max column preview options
-  '--max-columns-preview[show preview for long lines (with -M)]'
-  $no"--no-max-columns-preview[don't show preview for long lines (with -M)]"
-
   + '(max-depth)' # Directory-depth options
   '--max-depth=[specify max number of directories to descend]:number of directories'
   '!--maxdepth=:number of directories'
@@ -236,8 +219,6 @@ _rg() {
 
   + '(text)' # Binary-search options
   {-a,--text}'[search binary files as if they were text]'
-  "--binary[search binary files, don't print binary data]"
-  $no"--no-binary[don't search binary files]"
   $no"(--null-data)--no-text[don't search binary files as if they were text]"
 
   + '(threads)' # Thread-count options
@@ -389,7 +370,7 @@ _rg_encodings() {
     shift{-,_}jis csshiftjis {,x-}sjis ms_kanji ms932
     utf{,-}8 utf-16{,be,le} unicode-1-1-utf-8
     windows-{31j,874,949,125{0..8}} dos-874 tis-620 ansi_x3.4-1968
-    x-user-defined auto none
+    x-user-defined auto
   )
 
   _wanted encodings expl encoding compadd -a "$@" - _encodings

@@ -34,15 +34,12 @@ files/directories and binary files.
 ripgrep's default regex engine uses finite automata and guarantees linear
 time searching. Because of this, features like backreferences and arbitrary
 look-around are not supported. However, if ripgrep is built with PCRE2, then
-the *--pcre2* flag can be used to enable backreferences and look-around.
+the --pcre2 flag can be used to enable backreferences and look-around.
 
-ripgrep supports configuration files. Set *RIPGREP_CONFIG_PATH* to a
+ripgrep supports configuration files. Set RIPGREP_CONFIG_PATH to a
 configuration file. The file can specify one shell argument per line. Lines
-starting with *#* are ignored. For more details, see the man page or the
-*README*.
-
-Tip: to disable all smart filtering and make ripgrep behave a bit more like
-classical grep, use *rg -uuu*.
+starting with '#' are ignored. For more details, see the man page or the
+README.
 
 
 REGEX SYNTAX
@@ -55,10 +52,10 @@ https://docs.rs/regex/*/regex/bytes/index.html#syntax
 
 To a first approximation, ripgrep uses Perl-like regexes without look-around or
 backreferences. This makes them very similar to the "extended" (ERE) regular
-expressions supported by *egrep*, but with a few additional features like
+expressions supported by `egrep`, but with a few additional features like
 Unicode character classes.
 
-If you're using ripgrep with the *--pcre2* flag, then please consult
+If you're using ripgrep with the --pcre2 flag, then please consult
 https://www.pcre.org or the PCRE2 man pages for documentation on the supported
 syntax.
 
@@ -71,37 +68,18 @@ _PATTERN_::
 
 _PATH_::
 A file or directory to search. Directories are searched recursively. Paths
-specified explicitly on the command line override glob and ignore rules.
+specified expicitly on the command line override glob and ignore rules.
 
 
 OPTIONS
 -------
-Note that for many options, there exist flags to disable them. In some cases,
-those flags are not listed in a first class way below. For example, the
-*--column* flag (listed below) enables column numbers in ripgrep's output, but
-the *--no-column* flag (not listed below) disables them. The reverse can also
-exist. For example, the *--no-ignore* flag (listed below) disables ripgrep's
-*gitignore* logic, but the *--ignore* flag (not listed below) enables it. These
-flags are useful for overriding a ripgrep configuration file on the command
-line. Each flag's documentation notes whether an inverted flag exists. In all
-cases, the flag specified last takes precedence.
-
 {OPTIONS}
 
 
 EXIT STATUS
 -----------
 If ripgrep finds a match, then the exit status of the program is 0. If no match
-could be found, then the exit status is 1. If an error occurred, then the exit
-status is always 2 unless ripgrep was run with the *--quiet* flag and a match
-was found. In summary:
-
-* `0` exit status occurs only when at least one match was found, and if
-  no error occurred, unless *--quiet* was given.
-* `1` exit status occurs only when no match was found and no error occurred.
-* `2` exit status occurs when an error occurred. This is true for both
-  catastrophic errors (e.g., a regex syntax error) and for soft errors (e.g.,
-  unable to read a file).
+could be found, then the exit status is non-zero.
 
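The exit-status rules spelled out on the removed side of this hunk can be sketched as a small decision function (illustrative only, not ripgrep's actual implementation; the name `exit_code` is hypothetical):

```rust
// Map a search outcome to ripgrep's documented exit status:
// 0 = at least one match (and no error, unless --quiet was given),
// 1 = no match and no error,
// 2 = an error occurred (catastrophic or soft).
fn exit_code(matched: bool, errored: bool, quiet: bool) -> i32 {
    if errored && !(quiet && matched) {
        2
    } else if matched {
        0
    } else {
        1
    }
}

fn main() {
    assert_eq!(exit_code(true, false, false), 0);
    assert_eq!(exit_code(false, false, false), 1);
    assert_eq!(exit_code(false, true, false), 2);
    // --quiet plus a found match suppresses the error status.
    assert_eq!(exit_code(true, true, true), 0);
}
```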
 
 CONFIGURATION FILES
@@ -110,12 +88,12 @@ ripgrep supports reading configuration files that change ripgrep's default
 behavior. The format of the configuration file is an "rc" style and is very
 simple. It is defined by two rules:
 
-1. Every line is a shell argument, after trimming whitespace.
-2. Lines starting with *#* (optionally preceded by any amount of
-   whitespace) are ignored.
+1. Every line is a shell argument, after trimming ASCII whitespace.
+2. Lines starting with _#_ (optionally preceded by any amount of
+   ASCII whitespace) are ignored.
 
 ripgrep will look for a single configuration file if and only if the
-*RIPGREP_CONFIG_PATH* environment variable is set and is non-empty.
+_RIPGREP_CONFIG_PATH_ environment variable is set and is non-empty.
 ripgrep will parse shell arguments from this file on startup and will
 behave as if the arguments in this file were prepended to any explicit
 arguments given to ripgrep on the command line.
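The two config-file rules above are simple enough to sketch directly (this is a toy parser under those two rules, not ripgrep's actual implementation; whether blank lines are skipped is an assumption here):

```rust
// Per the man page: each line is one shell argument after trimming
// whitespace, and lines whose first non-whitespace character is '#' are
// ignored. Blank lines are also skipped in this sketch.
fn parse_config_args(contents: &str) -> Vec<String> {
    contents
        .lines()
        .map(|line| line.trim())
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(|line| line.to_string())
        .collect()
}

fn main() {
    let config = "# Always search hidden files.\n--hidden\n  --max-columns=150\n";
    assert_eq!(
        parse_config_args(config),
        vec!["--hidden".to_string(), "--max-columns=150".to_string()]
    );
}
```

Because each line is a single argument, values with spaces need no quoting; `--glob=!*.min.js` on its own line is one argument.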
@@ -177,35 +155,20 @@ SHELL COMPLETION
 Shell completion files are included in the release tarball for Bash, Fish, Zsh
 and PowerShell.
 
-For *bash*, move *rg.bash* to *$XDG_CONFIG_HOME/bash_completion*
-or */etc/bash_completion.d/*.
+For *bash*, move `rg.bash` to `$XDG_CONFIG_HOME/bash_completion`
+or `/etc/bash_completion.d/`.
 
-For *fish*, move *rg.fish* to *$HOME/.config/fish/completions*.
+For *fish*, move `rg.fish` to `$HOME/.config/fish/completions`.
 
-For *zsh*, move *_rg* to one of your *$fpath* directories.
+For *zsh*, move `_rg` to one of your `$fpath` directories.
 
 
 CAVEATS
 -------
 ripgrep may abort unexpectedly when using default settings if it searches a
 file that is simultaneously truncated. This behavior can be avoided by passing
-the *--no-mmap* flag which will forcefully disable the use of memory maps in
-all cases.
-
-ripgrep may use a large amount of memory depending on a few factors. Firstly,
-if ripgrep uses parallelism for search (the default), then the entire output
-for each individual file is buffered into memory in order to prevent
-interleaving matches in the output. To avoid this, you can disable parallelism
-with the *-j1* flag. Secondly, ripgrep always needs to have at least a single
-line in memory in order to execute a search. A file with a very long line can
-thus cause ripgrep to use a lot of memory. Generally, this only occurs when
-searching binary data with the *-a* flag enabled. (When the *-a* flag isn't
-enabled, ripgrep will replace all NUL bytes with line terminators, which
-typically prevents exorbitant memory usage.) Thirdly, when ripgrep searches
-a large file using a memory map, the process will report its resident memory
-usage as the size of the file. However, this does not mean ripgrep actually
-needed to use that much memory; the operating system will generally handle this
-for you.
+the --no-mmap flag which will forcefully disable the use of memory maps in all
+cases.
 
 
 VERSION
@@ -217,11 +180,7 @@ HOMEPAGE
 --------
 https://github.com/BurntSushi/ripgrep
 
-Please report bugs and feature requests in the issue tracker. Please do your
-best to provide a reproducible test case for bugs. This should include the
-corpus being searched, the *rg* command, the actual output and the expected
-output. Please also include the output of running the same *rg* command but
-with the *--debug* flag.
+Please report bugs and feature requests in the issue tracker.
 
 
 AUTHORS

@@ -1,6 +1,6 @@
 [package]
 name = "globset"
-version = "0.4.3" #:version
+version = "0.4.2" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 Cross platform single glob and glob set matching. Glob set matching is the
@@ -19,14 +19,14 @@ name = "globset"
 bench = false
 
 [dependencies]
-aho-corasick = "0.7.3"
-bstr = { version = "0.1.2", default-features = false, features = ["std"] }
+aho-corasick = "0.6.8"
 fnv = "1.0.6"
 log = "0.4.5"
-regex = "1.1.5"
+memchr = "2.0.2"
+regex = "1.0.5"
 
 [dev-dependencies]
-glob = "0.3.0"
+glob = "0.2.11"
 
 [features]
 simd-accel = []

@@ -120,7 +120,7 @@ impl GlobMatcher {
 
     /// Tests whether the given path matches this pattern or not.
     pub fn is_match_candidate(&self, path: &Candidate) -> bool {
-        self.re.is_match(path.path.as_bytes())
+        self.re.is_match(&path.path)
     }
 }
 
@@ -145,7 +145,7 @@ impl GlobStrategic {
 
     /// Tests whether the given path matches this pattern or not.
     fn is_match_candidate(&self, candidate: &Candidate) -> bool {
-        let byte_path = candidate.path.as_bytes();
+        let byte_path = &*candidate.path;
 
         match self.strategy {
             MatchStrategy::Literal(ref lit) => lit.as_bytes() == byte_path,
@@ -837,66 +837,40 @@ impl<'a> Parser<'a> {
 
     fn parse_star(&mut self) -> Result<(), Error> {
         let prev = self.prev;
-        if self.peek() != Some('*') {
+        if self.chars.peek() != Some(&'*') {
             self.push_token(Token::ZeroOrMore)?;
             return Ok(());
         }
         assert!(self.bump() == Some('*'));
         if !self.have_tokens()? {
-            if !self.peek().map_or(true, is_separator) {
-                self.push_token(Token::ZeroOrMore)?;
-                self.push_token(Token::ZeroOrMore)?;
-            } else {
-                self.push_token(Token::RecursivePrefix)?;
-                assert!(self.bump().map_or(true, is_separator));
+            self.push_token(Token::RecursivePrefix)?;
+            let next = self.bump();
+            if !next.map(is_separator).unwrap_or(true) {
+                return Err(self.error(ErrorKind::InvalidRecursive));
             }
             return Ok(());
         }
+        self.pop_token()?;
         if !prev.map(is_separator).unwrap_or(false) {
             if self.stack.len() <= 1
-                || (prev != Some(',') && prev != Some('{'))
-            {
-                self.push_token(Token::ZeroOrMore)?;
-                self.push_token(Token::ZeroOrMore)?;
-                return Ok(());
+                || (prev != Some(',') && prev != Some('{')) {
+                return Err(self.error(ErrorKind::InvalidRecursive));
             }
         }
-        let is_suffix =
-            match self.peek() {
-                None => {
-                    assert!(self.bump().is_none());
-                    true
-                }
-                Some(',') | Some('}') if self.stack.len() >= 2 => {
-                    true
-                }
-                Some(c) if is_separator(c) => {
-                    assert!(self.bump().map(is_separator).unwrap_or(false));
-                    false
-                }
-                _ => {
-                    self.push_token(Token::ZeroOrMore)?;
-                    self.push_token(Token::ZeroOrMore)?;
-                    return Ok(());
-                }
-            };
-        match self.pop_token()? {
-            Token::RecursivePrefix => {
-                self.push_token(Token::RecursivePrefix)?;
+        match self.chars.peek() {
+            None => {
+                assert!(self.bump().is_none());
+                self.push_token(Token::RecursiveSuffix)
             }
-            Token::RecursiveSuffix => {
-                self.push_token(Token::RecursiveSuffix)?;
+            Some(&',') | Some(&'}') if self.stack.len() >= 2 => {
+                self.push_token(Token::RecursiveSuffix)
             }
-            _ => {
-                if is_suffix {
-                    self.push_token(Token::RecursiveSuffix)?;
-                } else {
-                    self.push_token(Token::RecursiveZeroOrMore)?;
-                }
+            Some(&c) if is_separator(c) => {
+                assert!(self.bump().map(is_separator).unwrap_or(false));
+                self.push_token(Token::RecursiveZeroOrMore)
             }
+            _ => Err(self.error(ErrorKind::InvalidRecursive)),
         }
-        Ok(())
     }
 
     fn parse_class(&mut self) -> Result<(), Error> {
@@ -985,10 +959,6 @@ impl<'a> Parser<'a> {
         self.cur = self.chars.next();
         self.cur
     }
-
-    fn peek(&mut self) -> Option<char> {
-        self.chars.peek().map(|&ch| ch)
-    }
 }
 
 #[cfg(test)]
@@ -1174,6 +1144,13 @@ mod tests {
     syntax!(cls20, "[^a]", vec![classn('a', 'a')]);
     syntax!(cls21, "[^a-z]", vec![classn('a', 'z')]);
 
+    syntaxerr!(err_rseq1, "a**", ErrorKind::InvalidRecursive);
+    syntaxerr!(err_rseq2, "**a", ErrorKind::InvalidRecursive);
+    syntaxerr!(err_rseq3, "a**b", ErrorKind::InvalidRecursive);
+    syntaxerr!(err_rseq4, "***", ErrorKind::InvalidRecursive);
+    syntaxerr!(err_rseq5, "/a**", ErrorKind::InvalidRecursive);
+    syntaxerr!(err_rseq6, "/**a", ErrorKind::InvalidRecursive);
+    syntaxerr!(err_rseq7, "/a**b", ErrorKind::InvalidRecursive);
     syntaxerr!(err_unclosed1, "[", ErrorKind::UnclosedClass);
     syntaxerr!(err_unclosed2, "[]", ErrorKind::UnclosedClass);
     syntaxerr!(err_unclosed3, "[!", ErrorKind::UnclosedClass);
@@ -1217,30 +1194,8 @@ mod tests {
     toregex!(re8, "[*]", r"^[\*]$");
     toregex!(re9, "[+]", r"^[\+]$");
     toregex!(re10, "+", r"^\+$");
-    toregex!(re11, "☃", r"^\xe2\x98\x83$");
-    toregex!(re12, "**", r"^.*$");
-    toregex!(re13, "**/", r"^.*$");
-    toregex!(re14, "**/*", r"^(?:/?|.*/).*$");
-    toregex!(re15, "**/**", r"^.*$");
-    toregex!(re16, "**/**/*", r"^(?:/?|.*/).*$");
-    toregex!(re17, "**/**/**", r"^.*$");
-    toregex!(re18, "**/**/**/*", r"^(?:/?|.*/).*$");
-    toregex!(re19, "a/**", r"^a(?:/?|/.*)$");
-    toregex!(re20, "a/**/**", r"^a(?:/?|/.*)$");
-    toregex!(re21, "a/**/**/**", r"^a(?:/?|/.*)$");
-    toregex!(re22, "a/**/b", r"^a(?:/|/.*/)b$");
-    toregex!(re23, "a/**/**/b", r"^a(?:/|/.*/)b$");
-    toregex!(re24, "a/**/**/**/b", r"^a(?:/|/.*/)b$");
-    toregex!(re25, "**/b", r"^(?:/?|.*/)b$");
-    toregex!(re26, "**/**/b", r"^(?:/?|.*/)b$");
-    toregex!(re27, "**/**/**/b", r"^(?:/?|.*/)b$");
-    toregex!(re28, "a**", r"^a.*.*$");
-    toregex!(re29, "**a", r"^.*.*a$");
-    toregex!(re30, "a**b", r"^a.*.*b$");
-    toregex!(re31, "***", r"^.*.*.*$");
-    toregex!(re32, "/a**", r"^/a.*.*$");
-    toregex!(re33, "/**a", r"^/.*.*a$");
-    toregex!(re34, "/a**b", r"^/a.*.*b$");
+    toregex!(re11, "**", r"^.*$");
+    toregex!(re12, "☃", r"^\xe2\x98\x83$");
 
     matches!(match1, "a", "a");
     matches!(match2, "a*b", "a_b");
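The restored `err_rseq` tests above encode the rule that `**` is only a recursive glob when it is bounded by path separators or the ends of the pattern. As a rough standalone sketch of that adjacency rule (`recursive_globs_valid` is a hypothetical helper, not part of globset, and it ignores character classes, escapes, and `{...}` alternation):

```rust
// Hypothetical helper: checks the separator-adjacency rule for `**`
// that the restored parser enforces.
fn recursive_globs_valid(glob: &str) -> bool {
    let bytes = glob.as_bytes();
    let mut i = 0;
    while i + 1 < bytes.len() {
        if bytes[i] == b'*' && bytes[i + 1] == b'*' {
            // `**` must be preceded by nothing or `/` ...
            let ok_before = i == 0 || bytes[i - 1] == b'/';
            // ... and followed by nothing or `/`.
            let after = i + 2;
            let ok_after = after == bytes.len() || bytes[after] == b'/';
            if !(ok_before && ok_after) {
                return false;
            }
            i += 2;
        } else {
            i += 1;
        }
    }
    true
}

fn main() {
    assert!(recursive_globs_valid("**/b"));
    assert!(recursive_globs_valid("a/**/b"));
    assert!(!recursive_globs_valid("a**")); // err_rseq1 above
    assert!(!recursive_globs_valid("***")); // err_rseq4 above
    println!("ok");
}
```

Each `syntaxerr!` pattern above fails this check, and each accepted `toregex!` pattern passes it.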
@@ -104,25 +104,27 @@ or to enable case insensitive matching.
 #![deny(missing_docs)]
 
 extern crate aho_corasick;
-extern crate bstr;
 extern crate fnv;
 #[macro_use]
 extern crate log;
+extern crate memchr;
 extern crate regex;
 
 use std::borrow::Cow;
 use std::collections::{BTreeMap, HashMap};
 use std::error::Error as StdError;
+use std::ffi::OsStr;
 use std::fmt;
 use std::hash;
 use std::path::Path;
 use std::str;
 
-use aho_corasick::AhoCorasick;
-use bstr::{B, BStr, BString};
+use aho_corasick::{Automaton, AcAutomaton, FullAcAutomaton};
 use regex::bytes::{Regex, RegexBuilder, RegexSet};
 
-use pathutil::{file_name, file_name_ext, normalize_path};
+use pathutil::{
+    file_name, file_name_ext, normalize_path, os_str_bytes, path_bytes,
+};
 use glob::MatchStrategy;
 pub use glob::{Glob, GlobBuilder, GlobMatcher};
 
@@ -141,13 +143,8 @@ pub struct Error {
 /// The kind of error that can occur when parsing a glob pattern.
 #[derive(Clone, Debug, Eq, PartialEq)]
 pub enum ErrorKind {
-    /// **DEPRECATED**.
-    ///
-    /// This error used to occur for consistency with git's glob specification,
-    /// but the specification now accepts all uses of `**`. When `**` does not
-    /// appear adjacent to a path separator or at the beginning/end of a glob,
-    /// it is now treated as two consecutive `*` patterns. As such, this error
-    /// is no longer used.
+    /// Occurs when a use of `**` is invalid. Namely, `**` can only appear
+    /// adjacent to a path separator, or the beginning/end of a glob.
     InvalidRecursive,
     /// Occurs when a character class (e.g., `[abc]`) is not closed.
     UnclosedClass,
@@ -292,7 +289,6 @@ pub struct GlobSet {
 
 impl GlobSet {
     /// Create an empty `GlobSet`. An empty set matches nothing.
-    #[inline]
     pub fn empty() -> GlobSet {
         GlobSet {
             len: 0,
@@ -301,13 +297,11 @@ impl GlobSet {
     }
 
     /// Returns true if this set is empty, and therefore matches nothing.
-    #[inline]
     pub fn is_empty(&self) -> bool {
         self.len == 0
     }
 
     /// Returns the number of globs in this set.
-    #[inline]
     pub fn len(&self) -> usize {
         self.len
     }
@@ -490,25 +484,24 @@ impl GlobSetBuilder {
 /// path against multiple globs or sets of globs.
 #[derive(Clone, Debug)]
 pub struct Candidate<'a> {
-    path: Cow<'a, BStr>,
-    basename: Cow<'a, BStr>,
-    ext: Cow<'a, BStr>,
+    path: Cow<'a, [u8]>,
+    basename: Cow<'a, [u8]>,
+    ext: Cow<'a, [u8]>,
 }
 
 impl<'a> Candidate<'a> {
     /// Create a new candidate for matching from the given path.
     pub fn new<P: AsRef<Path> + ?Sized>(path: &'a P) -> Candidate<'a> {
-        let path = normalize_path(BString::from_path_lossy(path.as_ref()));
-        let basename = file_name(&path).unwrap_or(Cow::Borrowed(B("")));
-        let ext = file_name_ext(&basename).unwrap_or(Cow::Borrowed(B("")));
+        let path = path.as_ref();
+        let basename = file_name(path).unwrap_or(OsStr::new(""));
         Candidate {
-            path: path,
-            basename: basename,
-            ext: ext,
+            path: normalize_path(path_bytes(path)),
+            basename: os_str_bytes(basename),
+            ext: file_name_ext(basename).unwrap_or(Cow::Borrowed(b"")),
         }
     }
 
-    fn path_prefix(&self, max: usize) -> &BStr {
+    fn path_prefix(&self, max: usize) -> &[u8] {
         if self.path.len() <= max {
             &*self.path
         } else {
@@ -516,7 +509,7 @@ impl<'a> Candidate<'a> {
         }
     }
 
-    fn path_suffix(&self, max: usize) -> &BStr {
+    fn path_suffix(&self, max: usize) -> &[u8] {
         if self.path.len() <= max {
             &*self.path
         } else {
@@ -577,12 +570,12 @@ impl LiteralStrategy {
     }
 
     fn is_match(&self, candidate: &Candidate) -> bool {
-        self.0.contains_key(candidate.path.as_bytes())
+        self.0.contains_key(&*candidate.path)
     }
 
     #[inline(never)]
     fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
-        if let Some(hits) = self.0.get(candidate.path.as_bytes()) {
+        if let Some(hits) = self.0.get(&*candidate.path) {
             matches.extend(hits);
         }
     }
@@ -604,7 +597,7 @@ impl BasenameLiteralStrategy {
         if candidate.basename.is_empty() {
             return false;
         }
-        self.0.contains_key(candidate.basename.as_bytes())
+        self.0.contains_key(&*candidate.basename)
     }
 
     #[inline(never)]
@@ -612,7 +605,7 @@ impl BasenameLiteralStrategy {
         if candidate.basename.is_empty() {
             return;
         }
-        if let Some(hits) = self.0.get(candidate.basename.as_bytes()) {
+        if let Some(hits) = self.0.get(&*candidate.basename) {
             matches.extend(hits);
         }
     }
@@ -634,7 +627,7 @@ impl ExtensionStrategy {
         if candidate.ext.is_empty() {
             return false;
         }
-        self.0.contains_key(candidate.ext.as_bytes())
+        self.0.contains_key(&*candidate.ext)
     }
 
     #[inline(never)]
@@ -642,7 +635,7 @@ impl ExtensionStrategy {
         if candidate.ext.is_empty() {
             return;
         }
-        if let Some(hits) = self.0.get(candidate.ext.as_bytes()) {
+        if let Some(hits) = self.0.get(&*candidate.ext) {
             matches.extend(hits);
         }
     }
@@ -650,7 +643,7 @@ impl ExtensionStrategy {
 
 #[derive(Clone, Debug)]
 struct PrefixStrategy {
-    matcher: AhoCorasick,
+    matcher: FullAcAutomaton<Vec<u8>>,
     map: Vec<usize>,
     longest: usize,
 }
@@ -658,8 +651,8 @@ struct PrefixStrategy {
 impl PrefixStrategy {
     fn is_match(&self, candidate: &Candidate) -> bool {
         let path = candidate.path_prefix(self.longest);
-        for m in self.matcher.find_overlapping_iter(path) {
-            if m.start() == 0 {
+        for m in self.matcher.find_overlapping(path) {
+            if m.start == 0 {
                 return true;
             }
         }
@@ -668,9 +661,9 @@ impl PrefixStrategy {
 
     fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
         let path = candidate.path_prefix(self.longest);
-        for m in self.matcher.find_overlapping_iter(path) {
-            if m.start() == 0 {
-                matches.push(self.map[m.pattern()]);
+        for m in self.matcher.find_overlapping(path) {
+            if m.start == 0 {
+                matches.push(self.map[m.pati]);
             }
         }
     }
@@ -678,7 +671,7 @@ impl PrefixStrategy {
 
 #[derive(Clone, Debug)]
 struct SuffixStrategy {
-    matcher: AhoCorasick,
+    matcher: FullAcAutomaton<Vec<u8>>,
     map: Vec<usize>,
     longest: usize,
 }
@@ -686,8 +679,8 @@ struct SuffixStrategy {
 impl SuffixStrategy {
     fn is_match(&self, candidate: &Candidate) -> bool {
         let path = candidate.path_suffix(self.longest);
-        for m in self.matcher.find_overlapping_iter(path) {
-            if m.end() == path.len() {
+        for m in self.matcher.find_overlapping(path) {
+            if m.end == path.len() {
                 return true;
             }
         }
@@ -696,9 +689,9 @@ impl SuffixStrategy {
 
     fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
         let path = candidate.path_suffix(self.longest);
-        for m in self.matcher.find_overlapping_iter(path) {
-            if m.end() == path.len() {
-                matches.push(self.map[m.pattern()]);
+        for m in self.matcher.find_overlapping(path) {
+            if m.end == path.len() {
+                matches.push(self.map[m.pati]);
             }
         }
     }
@@ -712,11 +705,11 @@ impl RequiredExtensionStrategy {
         if candidate.ext.is_empty() {
             return false;
         }
-        match self.0.get(candidate.ext.as_bytes()) {
+        match self.0.get(&*candidate.ext) {
             None => false,
             Some(regexes) => {
                 for &(_, ref re) in regexes {
-                    if re.is_match(candidate.path.as_bytes()) {
+                    if re.is_match(&*candidate.path) {
                         return true;
                     }
                 }
@@ -730,9 +723,9 @@ impl RequiredExtensionStrategy {
         if candidate.ext.is_empty() {
             return;
         }
-        if let Some(regexes) = self.0.get(candidate.ext.as_bytes()) {
+        if let Some(regexes) = self.0.get(&*candidate.ext) {
             for &(global_index, ref re) in regexes {
-                if re.is_match(candidate.path.as_bytes()) {
+                if re.is_match(&*candidate.path) {
                     matches.push(global_index);
                 }
             }
@@ -748,11 +741,11 @@ struct RegexSetStrategy {
 
 impl RegexSetStrategy {
     fn is_match(&self, candidate: &Candidate) -> bool {
-        self.matcher.is_match(candidate.path.as_bytes())
+        self.matcher.is_match(&*candidate.path)
     }
 
     fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
-        for i in self.matcher.matches(candidate.path.as_bytes()) {
+        for i in self.matcher.matches(&*candidate.path) {
             matches.push(self.map[i]);
         }
     }
@@ -783,16 +776,18 @@ impl MultiStrategyBuilder {
     }
 
     fn prefix(self) -> PrefixStrategy {
+        let it = self.literals.into_iter().map(|s| s.into_bytes());
         PrefixStrategy {
-            matcher: AhoCorasick::new_auto_configured(&self.literals),
+            matcher: AcAutomaton::new(it).into_full(),
             map: self.map,
             longest: self.longest,
         }
     }
 
     fn suffix(self) -> SuffixStrategy {
+        let it = self.literals.into_iter().map(|s| s.into_bytes());
         SuffixStrategy {
-            matcher: AhoCorasick::new_auto_configured(&self.literals),
+            matcher: AcAutomaton::new(it).into_full(),
             map: self.map,
             longest: self.longest,
         }
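In the prefix and suffix strategies above, the overlapping automaton match is only accepted when `m.start == 0` (or `m.end == path.len()`), i.e. when the literal is a true prefix (or suffix) of the candidate path. A minimal illustrative equivalent (`is_prefix_match` is a hypothetical helper, not globset API; the real code uses an Aho-Corasick automaton so that all literals are scanned in one pass):

```rust
// Illustrative sketch: accepting an overlapping match only at offset 0
// is equivalent to a starts_with test against each literal.
fn is_prefix_match(path: &[u8], literals: &[&[u8]]) -> bool {
    literals.iter().any(|lit| path.starts_with(lit))
}

fn main() {
    let lits: &[&[u8]] = &[b"src/", b"tests/"];
    assert!(is_prefix_match(b"src/main.rs", lits));
    assert!(!is_prefix_match(b"docs/src/lib.rs", lits));
    println!("ok");
}
```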
@@ -1,26 +1,41 @@
 use std::borrow::Cow;
-
-use bstr::BStr;
+use std::ffi::OsStr;
+use std::path::Path;
 
 /// The final component of the path, if it is a normal file.
 ///
 /// If the path terminates in ., .., or consists solely of a root of prefix,
 /// file_name will return None.
-pub fn file_name<'a>(path: &Cow<'a, BStr>) -> Option<Cow<'a, BStr>> {
+#[cfg(unix)]
+pub fn file_name<'a, P: AsRef<Path> + ?Sized>(
+    path: &'a P,
+) -> Option<&'a OsStr> {
+    use std::os::unix::ffi::OsStrExt;
+    use memchr::memrchr;
+
+    let path = path.as_ref().as_os_str().as_bytes();
     if path.is_empty() {
         return None;
-    } else if path.last() == Some(b'.') {
+    } else if path.len() == 1 && path[0] == b'.' {
+        return None;
+    } else if path.last() == Some(&b'.') {
+        return None;
+    } else if path.len() >= 2 && &path[path.len() - 2..] == &b".."[..] {
         return None;
     }
-    let last_slash = path.rfind_byte(b'/').map(|i| i + 1).unwrap_or(0);
-    Some(match *path {
-        Cow::Borrowed(path) => Cow::Borrowed(&path[last_slash..]),
-        Cow::Owned(ref path) => {
-            let mut path = path.clone();
-            path.drain_bytes(..last_slash);
-            Cow::Owned(path)
-        }
-    })
+    let last_slash = memrchr(b'/', path).map(|i| i + 1).unwrap_or(0);
+    Some(OsStr::from_bytes(&path[last_slash..]))
+}
+
+/// The final component of the path, if it is a normal file.
+///
+/// If the path terminates in ., .., or consists solely of a root of prefix,
+/// file_name will return None.
+#[cfg(not(unix))]
+pub fn file_name<'a, P: AsRef<Path> + ?Sized>(
+    path: &'a P,
+) -> Option<&'a OsStr> {
+    path.as_ref().file_name()
 }
 
 /// Return a file extension given a path's file name.
@@ -39,28 +54,59 @@ pub fn file_name<'a>(path: &Cow<'a, BStr>) -> Option<Cow<'a, BStr>> {
 /// a pattern like `*.rs` is obviously trying to match files with a `rs`
 /// extension, but it also matches files like `.rs`, which doesn't have an
 /// extension according to std::path::Path::extension.
-pub fn file_name_ext<'a>(name: &Cow<'a, BStr>) -> Option<Cow<'a, BStr>> {
+pub fn file_name_ext(name: &OsStr) -> Option<Cow<[u8]>> {
     if name.is_empty() {
         return None;
     }
-    let last_dot_at = match name.rfind_byte(b'.') {
-        None => return None,
-        Some(i) => i,
+    let name = os_str_bytes(name);
+    let last_dot_at = {
+        let result = name
+            .iter().enumerate().rev()
+            .find(|&(_, &b)| b == b'.')
+            .map(|(i, _)| i);
+        match result {
+            None => return None,
+            Some(i) => i,
+        }
     };
-    Some(match *name {
+    Some(match name {
         Cow::Borrowed(name) => Cow::Borrowed(&name[last_dot_at..]),
-        Cow::Owned(ref name) => {
-            let mut name = name.clone();
-            name.drain_bytes(..last_dot_at);
+        Cow::Owned(mut name) => {
+            name.drain(..last_dot_at);
             Cow::Owned(name)
         }
     })
 }
 
+/// Return raw bytes of a path, transcoded to UTF-8 if necessary.
+pub fn path_bytes(path: &Path) -> Cow<[u8]> {
+    os_str_bytes(path.as_os_str())
+}
+
+/// Return the raw bytes of the given OS string, possibly transcoded to UTF-8.
+#[cfg(unix)]
+pub fn os_str_bytes(s: &OsStr) -> Cow<[u8]> {
+    use std::os::unix::ffi::OsStrExt;
+    Cow::Borrowed(s.as_bytes())
+}
+
+/// Return the raw bytes of the given OS string, possibly transcoded to UTF-8.
+#[cfg(not(unix))]
+pub fn os_str_bytes(s: &OsStr) -> Cow<[u8]> {
+    // TODO(burntsushi): On Windows, OS strings are WTF-8, which is a superset
+    // of UTF-8, so even if we could get at the raw bytes, they wouldn't
+    // be useful. We *must* convert to UTF-8 before doing path matching.
+    // Unfortunate, but necessary.
+    match s.to_string_lossy() {
+        Cow::Owned(s) => Cow::Owned(s.into_bytes()),
+        Cow::Borrowed(s) => Cow::Borrowed(s.as_bytes()),
+    }
+}
+
 /// Normalizes a path to use `/` as a separator everywhere, even on platforms
 /// that recognize other characters as separators.
 #[cfg(unix)]
-pub fn normalize_path(path: Cow<BStr>) -> Cow<BStr> {
+pub fn normalize_path(path: Cow<[u8]>) -> Cow<[u8]> {
     // UNIX only uses /, so we're good.
     path
 }
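As the doc comment above notes, this extension rule deliberately differs from `std::path::Path::extension`: the dot is included, and a file name that is nothing but an extension (like `.rs`) still has one. A compact sketch of the same rule over raw bytes (illustrative only, not the globset implementation):

```rust
// Glob-style extension of a file name: everything from the last `.`
// (inclusive), so ".rs" has extension ".rs", unlike Path::extension.
fn file_name_ext(name: &[u8]) -> Option<&[u8]> {
    if name.is_empty() {
        return None;
    }
    // rposition finds the index of the last `.`, if any.
    name.iter().rposition(|&b| b == b'.').map(|i| &name[i..])
}

fn main() {
    assert_eq!(file_name_ext(b"main.rs"), Some(&b".rs"[..]));
    assert_eq!(file_name_ext(b".rs"), Some(&b".rs"[..]));
    assert_eq!(file_name_ext(b"Makefile"), None);
    println!("ok");
}
```

This is why a pattern like `*.rs` can be serviced by the extension strategy even for dotfiles.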
@@ -68,7 +114,7 @@ pub fn normalize_path(path: Cow<BStr>) -> Cow<BStr> {
 /// Normalizes a path to use `/` as a separator everywhere, even on platforms
 /// that recognize other characters as separators.
 #[cfg(not(unix))]
-pub fn normalize_path(mut path: Cow<BStr>) -> Cow<BStr> {
+pub fn normalize_path(mut path: Cow<[u8]>) -> Cow<[u8]> {
     use std::path::is_separator;
 
     for i in 0..path.len() {
@@ -83,8 +129,7 @@ pub fn normalize_path(mut path: Cow<BStr>) -> Cow<BStr> {
 #[cfg(test)]
 mod tests {
     use std::borrow::Cow;
-
-    use bstr::{B, BString};
+    use std::ffi::OsStr;
 
     use super::{file_name_ext, normalize_path};
 
@@ -92,9 +137,8 @@ mod tests {
     ($name:ident, $file_name:expr, $ext:expr) => {
         #[test]
         fn $name() {
-            let bs = BString::from($file_name);
-            let got = file_name_ext(&Cow::Owned(bs));
-            assert_eq!($ext.map(|s| Cow::Borrowed(B(s))), got);
+            let got = file_name_ext(OsStr::new($file_name));
+            assert_eq!($ext.map(|s| Cow::Borrowed(s.as_bytes())), got);
         }
     };
 }
@@ -109,8 +153,7 @@ mod tests {
     ($name:ident, $path:expr, $expected:expr) => {
         #[test]
         fn $name() {
-            let bs = BString::from_slice($path);
-            let got = normalize_path(Cow::Owned(bs));
+            let got = normalize_path(Cow::Owned($path.to_vec()));
             assert_eq!($expected.to_vec(), got.into_owned());
         }
     };
@@ -14,13 +14,12 @@ license = "Unlicense/MIT"
 
 [dependencies]
 atty = "0.2.11"
-bstr = "0.1.2"
-globset = { version = "0.4.3", path = "../globset" }
+globset = { version = "0.4.2", path = "../globset" }
 lazy_static = "1.1.0"
 log = "0.4.5"
-regex = "1.1"
-same-file = "1.0.4"
-termcolor = "1.0.4"
+regex = "1.0.5"
+same-file = "1.0.3"
+termcolor = "1.0.3"
 
 [target.'cfg(windows)'.dependencies.winapi-util]
 version = "0.1.1"
@@ -352,8 +352,6 @@ fn default_decompression_commands() -> Vec<DecompressionCommand> {
     const ARGS_XZ: &[&str] = &["xz", "-d", "-c"];
     const ARGS_LZ4: &[&str] = &["lz4", "-d", "-c"];
     const ARGS_LZMA: &[&str] = &["xz", "--format=lzma", "-d", "-c"];
-    const ARGS_BROTLI: &[&str] = &["brotli", "-d", "-c"];
-    const ARGS_ZSTD: &[&str] = &["zstd", "-q", "-d", "-c"];
 
     fn cmd(glob: &str, args: &[&str]) -> DecompressionCommand {
         DecompressionCommand {
@@ -369,14 +367,15 @@ fn default_decompression_commands() -> Vec<DecompressionCommand> {
     vec![
         cmd("*.gz", ARGS_GZIP),
         cmd("*.tgz", ARGS_GZIP),
+
         cmd("*.bz2", ARGS_BZIP),
         cmd("*.tbz2", ARGS_BZIP),
+
         cmd("*.xz", ARGS_XZ),
         cmd("*.txz", ARGS_XZ),
+
         cmd("*.lz4", ARGS_LZ4),
+
         cmd("*.lzma", ARGS_LZMA),
-        cmd("*.br", ARGS_BROTLI),
-        cmd("*.zst", ARGS_ZSTD),
-        cmd("*.zstd", ARGS_ZSTD),
     ]
 }
@@ -1,8 +1,6 @@
 use std::ffi::OsStr;
 use std::str;
 
-use bstr::{BStr, BString};
-
 /// A single state in the state machine used by `unescape`.
 #[derive(Clone, Copy, Eq, PartialEq)]
 enum State {
@@ -37,16 +35,18 @@ enum State {
 ///
 /// assert_eq!(r"foo\nbar\xFFbaz", escape(b"foo\nbar\xFFbaz"));
 /// ```
-pub fn escape(bytes: &[u8]) -> String {
-    let bytes = BStr::new(bytes);
+pub fn escape(mut bytes: &[u8]) -> String {
     let mut escaped = String::new();
-    for (s, e, ch) in bytes.char_indices() {
-        if ch == '\u{FFFD}' {
-            for b in bytes[s..e].bytes() {
-                escape_byte(b, &mut escaped);
+    while let Some(result) = decode_utf8(bytes) {
+        match result {
+            Ok(cp) => {
+                escape_char(cp, &mut escaped);
+                bytes = &bytes[cp.len_utf8()..];
+            }
+            Err(byte) => {
+                escape_byte(byte, &mut escaped);
+                bytes = &bytes[1..];
             }
-        } else {
-            escape_char(ch, &mut escaped);
         }
     }
     escaped
@@ -56,7 +56,19 @@ pub fn escape(bytes: &[u8]) -> String {
 ///
 /// This is like [`escape`](fn.escape.html), but accepts an OS string.
 pub fn escape_os(string: &OsStr) -> String {
-    escape(BString::from_os_str_lossy(string).as_bytes())
+    #[cfg(unix)]
+    fn imp(string: &OsStr) -> String {
+        use std::os::unix::ffi::OsStrExt;
+
+        escape(string.as_bytes())
+    }
+
+    #[cfg(not(unix))]
+    fn imp(string: &OsStr) -> String {
+        escape(string.to_string_lossy().as_bytes())
+    }
+
+    imp(string)
 }
 
 /// Unescapes a string.
@@ -183,6 +195,46 @@ fn escape_byte(byte: u8, into: &mut String) {
     }
 }
 
+/// Decodes the next UTF-8 encoded codepoint from the given byte slice.
+///
+/// If no valid encoding of a codepoint exists at the beginning of the given
+/// byte slice, then the first byte is returned instead.
+///
+/// This returns `None` if and only if `bytes` is empty.
+fn decode_utf8(bytes: &[u8]) -> Option<Result<char, u8>> {
+    if bytes.is_empty() {
+        return None;
+    }
+    let len = match utf8_len(bytes[0]) {
+        None => return Some(Err(bytes[0])),
+        Some(len) if len > bytes.len() => return Some(Err(bytes[0])),
+        Some(len) => len,
+    };
+    match str::from_utf8(&bytes[..len]) {
+        Ok(s) => Some(Ok(s.chars().next().unwrap())),
+        Err(_) => Some(Err(bytes[0])),
+    }
+}
+
+/// Given a UTF-8 leading byte, this returns the total number of code units
+/// in the following encoded codepoint.
+///
+/// If the given byte is not a valid UTF-8 leading byte, then this returns
+/// `None`.
+fn utf8_len(byte: u8) -> Option<usize> {
+    if byte <= 0x7F {
+        Some(1)
+    } else if byte <= 0b110_11111 {
+        Some(2)
+    } else if byte <= 0b1110_1111 {
+        Some(3)
+    } else if byte <= 0b1111_0111 {
+        Some(4)
+    } else {
+        None
+    }
+}
+
 #[cfg(test)]
 mod tests {
     use super::{escape, unescape};
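The `decode_utf8`/`utf8_len` helpers added by this hunk are self-contained, so they can be exercised directly. The sketch below copies them as-is and adds a few spot checks: ASCII decodes to one codepoint, a multi-byte sequence decodes whole, and an invalid leading byte comes back as `Err` with that byte.

```rust
use std::str;

// Copied from the hunk above: the leading byte alone determines how many
// code units the encoded codepoint occupies.
fn utf8_len(byte: u8) -> Option<usize> {
    if byte <= 0x7F {
        Some(1)
    } else if byte <= 0b110_11111 {
        Some(2)
    } else if byte <= 0b1110_1111 {
        Some(3)
    } else if byte <= 0b1111_0111 {
        Some(4)
    } else {
        None
    }
}

// Copied from the hunk above: decode one codepoint, or surface the first
// byte on any failure so the caller can escape it individually.
fn decode_utf8(bytes: &[u8]) -> Option<Result<char, u8>> {
    if bytes.is_empty() {
        return None;
    }
    let len = match utf8_len(bytes[0]) {
        None => return Some(Err(bytes[0])),
        Some(len) if len > bytes.len() => return Some(Err(bytes[0])),
        Some(len) => len,
    };
    match str::from_utf8(&bytes[..len]) {
        Ok(s) => Some(Ok(s.chars().next().unwrap())),
        Err(_) => Some(Err(bytes[0])),
    }
}

fn main() {
    assert_eq!(decode_utf8(b"a"), Some(Ok('a')));
    assert_eq!(decode_utf8("\u{2603}".as_bytes()), Some(Ok('\u{2603}')));
    assert_eq!(decode_utf8(&[0xFF]), Some(Err(0xFF)));
    assert_eq!(decode_utf8(&[]), None);
}
```

Note that a stray continuation byte (0x80..=0xBF) passes `utf8_len` but is still rejected by `str::from_utf8`, so it also ends up in the `Err` arm.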
@@ -159,7 +159,6 @@ error message is crafted that typically tells the user how to fix the problem.
 #![deny(missing_docs)]
 
 extern crate atty;
-extern crate bstr;
 extern crate globset;
 #[macro_use]
 extern crate lazy_static;
@@ -1,6 +1,6 @@
 [package]
 name = "grep-matcher"
-version = "0.1.2" #:version
+version = "0.1.1" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 A trait for regular expressions, with a focus on line oriented search.
@@ -14,10 +14,10 @@ license = "Unlicense/MIT"
 autotests = false
 
 [dependencies]
-memchr = "2.1"
+memchr = "2.0.2"
 
 [dev-dependencies]
-regex = "1.1"
+regex = "1.0.5"
 
 [[test]]
 name = "integration"
@@ -1,6 +1,6 @@
 [package]
 name = "grep-pcre2"
-version = "0.1.2" #:version
+version = "0.1.1" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 Use PCRE2 with the 'grep' crate.
@@ -13,5 +13,5 @@ keywords = ["regex", "grep", "pcre", "backreference", "look"]
 license = "Unlicense/MIT"
 
 [dependencies]
-grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
-pcre2 = "0.2.0"
+grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
+pcre2 = "0.1.0"
@@ -10,7 +10,6 @@ extern crate pcre2;
 
 pub use error::{Error, ErrorKind};
 pub use matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
-pub use pcre2::{is_jit_available, version};
 
 mod error;
 mod matcher;
@@ -199,55 +199,16 @@ impl RegexMatcherBuilder {
         self
     }
 
-    /// Enable PCRE2's JIT and return an error if it's not available.
+    /// Enable PCRE2's JIT.
     ///
     /// This generally speeds up matching quite a bit. The downside is that it
     /// can increase the time it takes to compile a pattern.
     ///
-    /// If the JIT isn't available or if JIT compilation returns an error, then
-    /// regex compilation will fail with the corresponding error.
-    ///
-    /// This is disabled by default, and always overrides `jit_if_available`.
+    /// This is disabled by default.
     pub fn jit(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
         self.builder.jit(yes);
         self
     }
-
-    /// Enable PCRE2's JIT if it's available.
-    ///
-    /// This generally speeds up matching quite a bit. The downside is that it
-    /// can increase the time it takes to compile a pattern.
-    ///
-    /// If the JIT isn't available or if JIT compilation returns an error,
-    /// then a debug message with the error will be emitted and the regex will
-    /// otherwise silently fall back to non-JIT matching.
-    ///
-    /// This is disabled by default, and always overrides `jit`.
-    pub fn jit_if_available(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
-        self.builder.jit_if_available(yes);
-        self
-    }
-
-    /// Set the maximum size of PCRE2's JIT stack, in bytes. If the JIT is
-    /// not enabled, then this has no effect.
-    ///
-    /// When `None` is given, no custom JIT stack will be created, and instead,
-    /// the default JIT stack is used. When the default is used, its maximum
-    /// size is 32 KB.
-    ///
-    /// When this is set, then a new JIT stack will be created with the given
-    /// maximum size as its limit.
-    ///
-    /// Increasing the stack size can be useful for larger regular expressions.
-    ///
-    /// By default, this is set to `None`.
-    pub fn max_jit_stack_size(
-        &mut self,
-        bytes: Option<usize>,
-    ) -> &mut RegexMatcherBuilder {
-        self.builder.max_jit_stack_size(bytes);
-        self
-    }
 }
 
 /// An implementation of the `Matcher` trait using PCRE2.
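The doc comments removed above spell out the contract between the two JIT knobs: `jit` fails pattern compilation when the JIT is unavailable, while `jit_if_available` logs and silently falls back to interpreted matching. A toy model of that precedence (this is an illustrative sketch, not the pcre2 crate's actual API):

```rust
// Models the two JIT settings the removed docs describe. `jit_available`
// stands in for pcre2's runtime probe of JIT support.
#[derive(Clone, Copy)]
enum JitMode {
    Off,
    Required,    // `jit(true)`: error out if the JIT is unavailable.
    IfAvailable, // `jit_if_available(true)`: silently fall back.
}

struct Builder {
    jit: JitMode,
}

impl Builder {
    fn new() -> Builder {
        Builder { jit: JitMode::Off }
    }

    fn jit(&mut self, yes: bool) -> &mut Builder {
        self.jit = if yes { JitMode::Required } else { JitMode::Off };
        self
    }

    fn jit_if_available(&mut self, yes: bool) -> &mut Builder {
        self.jit = if yes { JitMode::IfAvailable } else { JitMode::Off };
        self
    }

    fn build(&self, jit_available: bool) -> Result<&'static str, &'static str> {
        match self.jit {
            JitMode::Off => Ok("interpreted"),
            JitMode::IfAvailable => {
                Ok(if jit_available { "jit" } else { "interpreted" })
            }
            JitMode::Required => {
                if jit_available { Ok("jit") } else { Err("JIT unavailable") }
            }
        }
    }
}

fn main() {
    assert_eq!(Builder::new().jit(true).build(false), Err("JIT unavailable"));
    assert_eq!(Builder::new().jit_if_available(true).build(false), Ok("interpreted"));
}
```

The "last setter wins" behavior here is a simplification; the removed docs say each real setting always overrides the other when enabled.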
@@ -18,11 +18,10 @@ default = ["serde1"]
 serde1 = ["base64", "serde", "serde_derive", "serde_json"]
 
 [dependencies]
-base64 = { version = "0.10.0", optional = true }
-bstr = "0.1.2"
-grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
+base64 = { version = "0.9.2", optional = true }
+grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
 grep-searcher = { version = "0.1.1", path = "../grep-searcher" }
-termcolor = "1.0.4"
+termcolor = "1.0.3"
 serde = { version = "1.0.77", optional = true }
 serde_derive = { version = "1.0.77", optional = true }
 serde_json = { version = "1.0.27", optional = true }
@@ -817,8 +817,7 @@ impl<'a> SubMatches<'a> {
 
 #[cfg(test)]
 mod tests {
-    use grep_regex::{RegexMatcher, RegexMatcherBuilder};
-    use grep_matcher::LineTerminator;
+    use grep_regex::RegexMatcher;
     use grep_searcher::SearcherBuilder;
 
     use super::{JSON, JSONBuilder};
@@ -919,45 +918,4 @@ and exhibited clearly, with a label attached.\
         assert_eq!(got.lines().count(), 2);
         assert!(got.contains("begin") && got.contains("end"));
     }
-
-    #[test]
-    fn missing_crlf() {
-        let haystack = "test\r\n".as_bytes();
-
-        let matcher = RegexMatcherBuilder::new()
-            .build("test")
-            .unwrap();
-        let mut printer = JSONBuilder::new()
-            .build(vec![]);
-        SearcherBuilder::new()
-            .build()
-            .search_reader(&matcher, haystack, printer.sink(&matcher))
-            .unwrap();
-        let got = printer_contents(&mut printer);
-        assert_eq!(got.lines().count(), 3);
-        assert!(
-            got.lines().nth(1).unwrap().contains(r"test\r\n"),
-            r"missing 'test\r\n' in '{}'",
-            got.lines().nth(1).unwrap(),
-        );
-
-        let matcher = RegexMatcherBuilder::new()
-            .crlf(true)
-            .build("test")
-            .unwrap();
-        let mut printer = JSONBuilder::new()
-            .build(vec![]);
-        SearcherBuilder::new()
-            .line_terminator(LineTerminator::crlf())
-            .build()
-            .search_reader(&matcher, haystack, printer.sink(&matcher))
-            .unwrap();
-        let got = printer_contents(&mut printer);
-        assert_eq!(got.lines().count(), 3);
-        assert!(
-            got.lines().nth(1).unwrap().contains(r"test\r\n"),
-            r"missing 'test\r\n' in '{}'",
-            got.lines().nth(1).unwrap(),
-        );
-    }
 }
@@ -70,7 +70,6 @@ fn example() -> Result<(), Box<Error>> {
 
 #[cfg(feature = "serde1")]
 extern crate base64;
-extern crate bstr;
 extern crate grep_matcher;
 #[cfg(test)]
 extern crate grep_regex;
@@ -1,4 +1,3 @@
-/// Like assert_eq, but nicer output for long strings.
 #[cfg(test)]
 #[macro_export]
 macro_rules! assert_eq_printed {
@@ -5,7 +5,6 @@ use std::path::Path;
 use std::sync::Arc;
 use std::time::Instant;
 
-use bstr::BStr;
 use grep_matcher::{Match, Matcher};
 use grep_searcher::{
     LineStep, Searcher,
@@ -17,7 +16,10 @@ use termcolor::{ColorSpec, NoColor, WriteColor};
 use color::ColorSpecs;
 use counter::CounterWriter;
 use stats::Stats;
-use util::{PrinterPath, Replacer, Sunk, trim_ascii_prefix};
+use util::{
+    PrinterPath, Replacer, Sunk,
+    trim_ascii_prefix, trim_ascii_prefix_range,
+};
 
 /// The configuration for the standard printer.
 ///
@@ -34,7 +36,6 @@ struct Config {
     per_match: bool,
     replacement: Arc<Option<Vec<u8>>>,
     max_columns: Option<u64>,
-    max_columns_preview: bool,
     max_matches: Option<u64>,
     column: bool,
     byte_offset: bool,
@@ -58,7 +59,6 @@ impl Default for Config {
             per_match: false,
             replacement: Arc::new(None),
             max_columns: None,
-            max_columns_preview: false,
             max_matches: None,
             column: false,
             byte_offset: false,
@@ -263,21 +263,6 @@ impl StandardBuilder {
         self
     }
 
-    /// When enabled, if a line is found to be over the configured maximum
-    /// column limit (measured in terms of bytes), then a preview of the long
-    /// line will be printed instead.
-    ///
-    /// The preview will correspond to the first `N` *grapheme clusters* of
-    /// the line, where `N` is the limit configured by `max_columns`.
-    ///
-    /// If no limit is set, then enabling this has no effect.
-    ///
-    /// This is disabled by default.
-    pub fn max_columns_preview(&mut self, yes: bool) -> &mut StandardBuilder {
-        self.config.max_columns_preview = yes;
-        self
-    }
-
     /// Set the maximum amount of matching lines that are printed.
     ///
     /// If multi line search is enabled and a match spans multiple lines, then
@@ -758,11 +743,6 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
             stats.add_matches(self.standard.matches.len() as u64);
             stats.add_matched_lines(mat.lines().count() as u64);
         }
-        if searcher.binary_detection().convert_byte().is_some() {
-            if self.binary_byte_offset.is_some() {
-                return Ok(false);
-            }
-        }
 
         StandardImpl::from_match(searcher, self, mat).sink()?;
         Ok(!self.should_quit())
@@ -784,12 +764,6 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
             self.record_matches(ctx.bytes())?;
             self.replace(ctx.bytes())?;
         }
-        if searcher.binary_detection().convert_byte().is_some() {
-            if self.binary_byte_offset.is_some() {
-                return Ok(false);
-            }
-        }
 
         StandardImpl::from_context(searcher, self, ctx).sink()?;
         Ok(!self.should_quit())
     }
@@ -802,15 +776,6 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
         Ok(true)
     }
 
-    fn binary_data(
-        &mut self,
-        _searcher: &Searcher,
-        binary_byte_offset: u64,
-    ) -> Result<bool, io::Error> {
-        self.binary_byte_offset = Some(binary_byte_offset);
-        Ok(true)
-    }
-
     fn begin(
         &mut self,
         _searcher: &Searcher,
@@ -828,12 +793,10 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
 
     fn finish(
         &mut self,
-        searcher: &Searcher,
+        _searcher: &Searcher,
         finish: &SinkFinish,
     ) -> Result<(), io::Error> {
-        if let Some(offset) = self.binary_byte_offset {
-            StandardImpl::new(searcher, self).write_binary_message(offset)?;
-        }
+        self.binary_byte_offset = finish.binary_byte_offset();
         if let Some(stats) = self.stats.as_mut() {
             stats.add_elapsed(self.start_time.elapsed());
             stats.add_searches(1);
@@ -1037,11 +1000,43 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
             )?;
             count += 1;
             if self.exceeds_max_columns(&bytes[line]) {
-                self.write_exceeded_line(bytes, line, matches, &mut midx)?;
-            } else {
-                self.write_colored_matches(bytes, line, matches, &mut midx)?;
-                self.write_line_term()?;
+                self.write_exceeded_line()?;
+                continue;
             }
+            if self.has_line_terminator(&bytes[line]) {
+                line = line.with_end(line.end() - 1);
+            }
+            if self.config().trim_ascii {
+                line = self.trim_ascii_prefix_range(bytes, line);
+            }
+
+            while !line.is_empty() {
+                if matches[midx].end() <= line.start() {
+                    if midx + 1 < matches.len() {
+                        midx += 1;
+                        continue;
+                    } else {
+                        self.end_color_match()?;
+                        self.write(&bytes[line])?;
+                        break;
+                    }
+                }
+                let m = matches[midx];
+
+                if line.start() < m.start() {
+                    let upto = cmp::min(line.end(), m.start());
+                    self.end_color_match()?;
+                    self.write(&bytes[line.with_end(upto)])?;
+                    line = line.with_start(upto);
+                } else {
+                    let upto = cmp::min(line.end(), m.end());
+                    self.start_color_match()?;
+                    self.write(&bytes[line.with_end(upto)])?;
+                    line = line.with_start(upto);
+                }
+            }
+            self.end_color_match()?;
+            self.write_line_term()?;
         }
         Ok(())
     }
@@ -1056,8 +1051,12 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
         let mut stepper = LineStep::new(line_term, 0, bytes.len());
         while let Some((start, end)) = stepper.next(bytes) {
             let mut line = Match::new(start, end);
-            self.trim_line_terminator(bytes, &mut line);
-            self.trim_ascii_prefix(bytes, &mut line);
+            if self.has_line_terminator(&bytes[line]) {
+                line = line.with_end(line.end() - 1);
+            }
+            if self.config().trim_ascii {
+                line = self.trim_ascii_prefix_range(bytes, line);
+            }
             while !line.is_empty() {
                 if matches[midx].end() <= line.start() {
                     if midx + 1 < matches.len() {
@@ -1080,19 +1079,14 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
                     Some(m.start() as u64 + 1),
                 )?;
 
-                let this_line = line.with_end(upto);
+                let buf = &bytes[line.with_end(upto)];
                 line = line.with_start(upto);
-                if self.exceeds_max_columns(&bytes[this_line]) {
-                    self.write_exceeded_line(
-                        bytes,
-                        this_line,
-                        matches,
-                        &mut midx,
-                    )?;
-                } else {
-                    self.write_spec(spec, &bytes[this_line])?;
-                    self.write_line_term()?;
+                if self.exceeds_max_columns(&buf) {
+                    self.write_exceeded_line()?;
+                    continue;
                 }
+                self.write_spec(spec, buf)?;
+                self.write_line_term()?;
             }
         }
         count += 1;
@@ -1123,11 +1117,15 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
             )?;
             count += 1;
             if self.exceeds_max_columns(&bytes[line]) {
-                self.write_exceeded_line(bytes, line, &[m], &mut 0)?;
+                self.write_exceeded_line()?;
                 continue;
             }
-            self.trim_line_terminator(bytes, &mut line);
-            self.trim_ascii_prefix(bytes, &mut line);
+            if self.has_line_terminator(&bytes[line]) {
+                line = line.with_end(line.end() - 1);
+            }
+            if self.config().trim_ascii {
+                line = self.trim_ascii_prefix_range(bytes, line);
+            }
+
             while !line.is_empty() {
                 if m.end() <= line.start() {
@@ -1184,10 +1182,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
         line: &[u8],
     ) -> io::Result<()> {
         if self.exceeds_max_columns(line) {
-            let range = Match::new(0, line.len());
-            self.write_exceeded_line(
-                line, range, self.sunk.matches(), &mut 0,
-            )?;
+            self.write_exceeded_line()?;
         } else {
             self.write_trim(line)?;
             if !self.has_line_terminator(line) {
@@ -1200,114 +1195,50 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
     fn write_colored_line(
         &self,
         matches: &[Match],
-        bytes: &[u8],
+        line: &[u8],
     ) -> io::Result<()> {
         // If we know we aren't going to emit color, then we can go faster.
         let spec = self.config().colors.matched();
         if !self.wtr().borrow().supports_color() || spec.is_none() {
-            return self.write_line(bytes);
+            return self.write_line(line);
+        }
+        if self.exceeds_max_columns(line) {
+            return self.write_exceeded_line();
         }
 
-        let line = Match::new(0, bytes.len());
-        if self.exceeds_max_columns(bytes) {
-            self.write_exceeded_line(bytes, line, matches, &mut 0)
-        } else {
-            self.write_colored_matches(bytes, line, matches, &mut 0)?;
-            self.write_line_term()?;
-            Ok(())
-        }
-    }
-
-    /// Write the `line` portion of `bytes`, with appropriate coloring for
-    /// each `match`, starting at `match_index`.
-    ///
-    /// This accounts for trimming any whitespace prefix and will *never* print
-    /// a line terminator. If a match exceeds the range specified by `line`,
-    /// then only the part of the match within `line` (if any) is printed.
-    fn write_colored_matches(
-        &self,
-        bytes: &[u8],
-        mut line: Match,
-        matches: &[Match],
-        match_index: &mut usize,
-    ) -> io::Result<()> {
-        self.trim_line_terminator(bytes, &mut line);
-        self.trim_ascii_prefix(bytes, &mut line);
-        if matches.is_empty() {
-            self.write(&bytes[line])?;
-            return Ok(());
-        }
-        while !line.is_empty() {
-            if matches[*match_index].end() <= line.start() {
-                if *match_index + 1 < matches.len() {
-                    *match_index += 1;
-                    continue;
-                } else {
-                    self.end_color_match()?;
-                    self.write(&bytes[line])?;
-                    break;
-                }
-            }
-
-            let m = matches[*match_index];
-            if line.start() < m.start() {
-                let upto = cmp::min(line.end(), m.start());
-                self.end_color_match()?;
-                self.write(&bytes[line.with_end(upto)])?;
-                line = line.with_start(upto);
-            } else {
-                let upto = cmp::min(line.end(), m.end());
-                self.start_color_match()?;
-                self.write(&bytes[line.with_end(upto)])?;
-                line = line.with_start(upto);
+        let mut last_written =
+            if !self.config().trim_ascii {
+                0
+            } else {
+                self.trim_ascii_prefix_range(
+                    line,
+                    Match::new(0, line.len()),
+                ).start()
+            };
+        for mut m in matches.iter().map(|&m| m) {
+            if last_written < m.start() {
+                self.end_color_match()?;
+                self.write(&line[last_written..m.start()])?;
+            } else if last_written < m.end() {
+                m = m.with_start(last_written);
+            } else {
+                continue;
             }
+            if !m.is_empty() {
+                self.start_color_match()?;
+                self.write(&line[m])?;
+            }
+            last_written = m.end();
         }
         self.end_color_match()?;
+        self.write(&line[last_written..])?;
+        if !self.has_line_terminator(line) {
+            self.write_line_term()?;
+        }
         Ok(())
     }
 
-    fn write_exceeded_line(
-        &self,
-        bytes: &[u8],
-        mut line: Match,
-        matches: &[Match],
-        match_index: &mut usize,
-    ) -> io::Result<()> {
-        if self.config().max_columns_preview {
-            let original = line;
-            let end = BStr::new(&bytes[line])
-                .grapheme_indices()
-                .map(|(_, end, _)| end)
-                .take(self.config().max_columns.unwrap_or(0) as usize)
-                .last()
-                .unwrap_or(0) + line.start();
-            line = line.with_end(end);
-            self.write_colored_matches(bytes, line, matches, match_index)?;
-
-            if matches.is_empty() {
-                self.write(b" [... omitted end of long line]")?;
-            } else {
-                let remaining = matches
-                    .iter()
-                    .filter(|m| {
-                        m.start() >= line.end() && m.start() < original.end()
-                    })
-                    .count();
-                let tense =
-                    if remaining == 1 {
-                        "match"
-                    } else {
-                        "matches"
-                    };
-                write!(
-                    self.wtr().borrow_mut(),
-                    " [... {} more {}]",
-                    remaining, tense,
-                )?;
-            }
-            self.write_line_term()?;
-            return Ok(());
-        }
+    fn write_exceeded_line(&self) -> io::Result<()> {
         if self.sunk.original_matches().is_empty() {
             if self.is_context() {
                 self.write(b"[Omitted long context line]")?;
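The rewritten `write_colored_line` above replaces recursion through `write_colored_matches` with a single pass over the match ranges, tracking a `last_written` cursor. That segmentation logic can be sketched independently of termcolor; in this illustrative stand-alone version, square brackets stand in for the color start/stop calls:

```rust
// Walk the sorted match ranges over `line`, emitting uncolored gaps and
// "colored" (bracketed) match spans, exactly in the order the cursor
// advances. Overlapping or already-covered matches are clipped or skipped.
fn render(line: &str, matches: &[(usize, usize)]) -> String {
    let mut out = String::new();
    let mut last_written = 0;
    for &(mut start, end) in matches {
        if last_written < start {
            // Uncolored gap before this match.
            out.push_str(&line[last_written..start]);
        } else if last_written < end {
            // Overlap: clip off the part already written.
            start = last_written;
        } else {
            // Match is entirely behind the cursor; skip it.
            continue;
        }
        if start < end {
            out.push('[');
            out.push_str(&line[start..end]);
            out.push(']');
        }
        last_written = end;
    }
    // Trailing uncolored tail.
    out.push_str(&line[last_written..]);
    out
}

fn main() {
    assert_eq!(render("abcdef", &[(1, 3), (2, 4)]), "a[bc][d]ef");
    assert_eq!(render("abcdef", &[]), "abcdef");
}
```

A single forward cursor means each byte of the line is written exactly once, even when match ranges overlap.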
@@ -1383,38 +1314,6 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
         Ok(())
     }
 
-    fn write_binary_message(&self, offset: u64) -> io::Result<()> {
-        if self.sink.match_count == 0 {
-            return Ok(());
-        }
-
-        let bin = self.searcher.binary_detection();
-        if let Some(byte) = bin.quit_byte() {
-            self.write(b"WARNING: stopped searching binary file ")?;
-            if let Some(path) = self.path() {
-                self.write_spec(self.config().colors.path(), path.as_bytes())?;
-                self.write(b" ")?;
-            }
-            let remainder = format!(
-                "after match (found {:?} byte around offset {})\n",
-                BStr::new(&[byte]), offset,
-            );
-            self.write(remainder.as_bytes())?;
-        } else if let Some(byte) = bin.convert_byte() {
-            self.write(b"Binary file ")?;
-            if let Some(path) = self.path() {
-                self.write_spec(self.config().colors.path(), path.as_bytes())?;
-                self.write(b" ")?;
-            }
-            let remainder = format!(
-                "matches (found {:?} byte around offset {})\n",
-                BStr::new(&[byte]), offset,
-            );
-            self.write(remainder.as_bytes())?;
-        }
-        Ok(())
-    }
-
     fn write_context_separator(&self) -> io::Result<()> {
         if let Some(ref sep) = *self.config().separator_context {
             self.write(sep)?;
|
|||||||
if !self.config().trim_ascii {
|
if !self.config().trim_ascii {
|
||||||
return self.write(buf);
|
return self.write(buf);
|
||||||
}
|
}
|
||||||
let mut range = Match::new(0, buf.len());
|
self.write(self.trim_ascii_prefix(buf))
|
||||||
self.trim_ascii_prefix(buf, &mut range);
|
|
||||||
self.write(&buf[range])
|
|
||||||
}
|
}
|
||||||
|
|
||||||
fn write(&self, buf: &[u8]) -> io::Result<()> {
|
fn write(&self, buf: &[u8]) -> io::Result<()> {
|
||||||
self.wtr().borrow_mut().write_all(buf)
|
self.wtr().borrow_mut().write_all(buf)
|
||||||
}
|
}
|
||||||
|
|
||||||
fn trim_line_terminator(&self, buf: &[u8], line: &mut Match) {
|
|
||||||
let lineterm = self.searcher.line_terminator();
|
|
||||||
if lineterm.is_suffix(&buf[*line]) {
|
|
||||||
let mut end = line.end() - 1;
|
|
||||||
if lineterm.is_crlf() && buf[end - 1] == b'\r' {
|
|
||||||
end -= 1;
|
|
||||||
}
|
|
||||||
*line = line.with_end(end);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
fn has_line_terminator(&self, buf: &[u8]) -> bool {
|
fn has_line_terminator(&self, buf: &[u8]) -> bool {
|
||||||
self.searcher.line_terminator().is_suffix(buf)
|
self.searcher.line_terminator().is_suffix(buf)
|
||||||
}
|
}
|
||||||
@@ -1565,12 +1451,14 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
     ///
     /// This stops trimming a prefix as soon as it sees non-whitespace or a
    /// line terminator.
-    fn trim_ascii_prefix(&self, slice: &[u8], range: &mut Match) {
-        if !self.config().trim_ascii {
-            return;
-        }
-        let lineterm = self.searcher.line_terminator();
-        *range = trim_ascii_prefix(lineterm, slice, *range)
+    fn trim_ascii_prefix_range(&self, slice: &[u8], range: Match) -> Match {
+        trim_ascii_prefix_range(self.searcher.line_terminator(), slice, range)
+    }
+
+    /// Trim prefix ASCII spaces from the given slice and return the
+    /// corresponding sub-slice.
+    fn trim_ascii_prefix<'s>(&self, slice: &'s [u8]) -> &'s [u8] {
+        trim_ascii_prefix(self.searcher.line_terminator(), slice)
     }
 }
 
@@ -2337,31 +2225,6 @@ but Doctor Watson has to have it taken out for him and dusted,
         assert_eq_printed!(expected, got);
     }
 
-    #[test]
-    fn max_columns_preview() {
-        let matcher = RegexMatcher::new("exhibited|dusted").unwrap();
-        let mut printer = StandardBuilder::new()
-            .max_columns(Some(46))
-            .max_columns_preview(true)
-            .build(NoColor::new(vec![]));
-        SearcherBuilder::new()
-            .line_number(false)
-            .build()
-            .search_reader(
-                &matcher,
-                SHERLOCK.as_bytes(),
-                printer.sink(&matcher),
-            )
-            .unwrap();
-
-        let got = printer_contents(&mut printer);
-        let expected = "\
-but Doctor Watson has to have it taken out for [... omitted end of long line]
-and exhibited clearly, with a label attached.
-";
-        assert_eq_printed!(expected, got);
-    }
-
     #[test]
     fn max_columns_with_count() {
         let matcher = RegexMatcher::new("cigar|ash|dusted").unwrap();
@@ -2387,86 +2250,6 @@ but Doctor Watson has to have it taken out for him and dusted,
         assert_eq_printed!(expected, got);
     }
 
-    #[test]
-    fn max_columns_with_count_preview_no_match() {
-        let matcher = RegexMatcher::new("exhibited|has to have it").unwrap();
-        let mut printer = StandardBuilder::new()
-            .stats(true)
-            .max_columns(Some(46))
-            .max_columns_preview(true)
-            .build(NoColor::new(vec![]));
-        SearcherBuilder::new()
-            .line_number(false)
-            .build()
-            .search_reader(
-                &matcher,
-                SHERLOCK.as_bytes(),
-                printer.sink(&matcher),
-            )
-            .unwrap();
-
-        let got = printer_contents(&mut printer);
-        let expected = "\
-but Doctor Watson has to have it taken out for [... 0 more matches]
-and exhibited clearly, with a label attached.
-";
-        assert_eq_printed!(expected, got);
-    }
-
-    #[test]
-    fn max_columns_with_count_preview_one_match() {
-        let matcher = RegexMatcher::new("exhibited|dusted").unwrap();
-        let mut printer = StandardBuilder::new()
-            .stats(true)
-            .max_columns(Some(46))
-            .max_columns_preview(true)
-            .build(NoColor::new(vec![]));
-        SearcherBuilder::new()
-            .line_number(false)
-            .build()
-            .search_reader(
-                &matcher,
-                SHERLOCK.as_bytes(),
-                printer.sink(&matcher),
-            )
-            .unwrap();
-
-        let got = printer_contents(&mut printer);
-        let expected = "\
-but Doctor Watson has to have it taken out for [... 1 more match]
-and exhibited clearly, with a label attached.
-";
-        assert_eq_printed!(expected, got);
-    }
-
-    #[test]
-    fn max_columns_with_count_preview_two_matches() {
-        let matcher = RegexMatcher::new(
-            "exhibited|dusted|has to have it",
-        ).unwrap();
-        let mut printer = StandardBuilder::new()
-            .stats(true)
-            .max_columns(Some(46))
-            .max_columns_preview(true)
-            .build(NoColor::new(vec![]));
-        SearcherBuilder::new()
-            .line_number(false)
-            .build()
-            .search_reader(
-                &matcher,
-                SHERLOCK.as_bytes(),
-                printer.sink(&matcher),
-            )
-            .unwrap();
-
-        let got = printer_contents(&mut printer);
-        let expected = "\
-but Doctor Watson has to have it taken out for [... 1 more match]
-and exhibited clearly, with a label attached.
-";
-        assert_eq_printed!(expected, got);
-    }
-
     #[test]
     fn max_columns_multi_line() {
         let matcher = RegexMatcher::new("(?s)ash.+dusted").unwrap();
@@ -2492,36 +2275,6 @@ but Doctor Watson has to have it taken out for him and dusted,
         assert_eq_printed!(expected, got);
     }
 
-    #[test]
-    fn max_columns_multi_line_preview() {
-        let matcher = RegexMatcher::new(
-            "(?s)clew|cigar ash.+have it|exhibited",
-        ).unwrap();
-        let mut printer = StandardBuilder::new()
-            .stats(true)
-            .max_columns(Some(46))
-            .max_columns_preview(true)
-            .build(NoColor::new(vec![]));
-        SearcherBuilder::new()
-            .line_number(false)
-            .multi_line(true)
-            .build()
-            .search_reader(
-                &matcher,
-                SHERLOCK.as_bytes(),
-                printer.sink(&matcher),
-            )
-            .unwrap();
-
-        let got = printer_contents(&mut printer);
-        let expected = "\
-can extract a clew from a wisp of straw or a f [... 1 more match]
-but Doctor Watson has to have it taken out for [... 0 more matches]
-and exhibited clearly, with a label attached.
-";
-        assert_eq_printed!(expected, got);
-    }
-
     #[test]
     fn max_matches() {
         let matcher = RegexMatcher::new("Sherlock").unwrap();
@@ -2811,40 +2564,8 @@ Holmeses, success in the province of detective work must always
         assert_eq_printed!(expected, got);
     }
 
-    #[test]
-    fn only_matching_max_columns_preview() {
-        let matcher = RegexMatcher::new("Doctor Watsons|Sherlock").unwrap();
-        let mut printer = StandardBuilder::new()
-            .only_matching(true)
-            .max_columns(Some(10))
-            .max_columns_preview(true)
-            .column(true)
-            .build(NoColor::new(vec![]));
-        SearcherBuilder::new()
-            .line_number(true)
-            .build()
-            .search_reader(
-                &matcher,
-                SHERLOCK.as_bytes(),
-                printer.sink(&matcher),
-            )
-            .unwrap();
-
-        let got = printer_contents(&mut printer);
-        let expected = "\
-1:9:Doctor Wat [... 0 more matches]
-1:57:Sherlock
-3:49:Sherlock
-";
-        assert_eq_printed!(expected, got);
-    }
-
     #[test]
     fn only_matching_max_columns_multi_line1() {
-        // The `(?s:.{0})` trick fools the matcher into thinking that it
-        // can match across multiple lines without actually doing so. This is
-        // so we can test multi-line handling in the case of a match on only
-        // one line.
         let matcher = RegexMatcher::new(
             r"(?s:.{0})(Doctor Watsons|Sherlock)"
         ).unwrap();
@@ -2873,41 +2594,6 @@ Holmeses, success in the province of detective work must always
         assert_eq_printed!(expected, got);
     }
 
-    #[test]
-    fn only_matching_max_columns_preview_multi_line1() {
-        // The `(?s:.{0})` trick fools the matcher into thinking that it
-        // can match across multiple lines without actually doing so. This is
-        // so we can test multi-line handling in the case of a match on only
-        // one line.
-        let matcher = RegexMatcher::new(
-            r"(?s:.{0})(Doctor Watsons|Sherlock)"
-        ).unwrap();
-        let mut printer = StandardBuilder::new()
-            .only_matching(true)
-            .max_columns(Some(10))
-            .max_columns_preview(true)
-            .column(true)
-            .build(NoColor::new(vec![]));
-        SearcherBuilder::new()
-            .multi_line(true)
-            .line_number(true)
-            .build()
-            .search_reader(
-                &matcher,
-                SHERLOCK.as_bytes(),
-                printer.sink(&matcher),
-            )
-            .unwrap();
-
-        let got = printer_contents(&mut printer);
-        let expected = "\
-1:9:Doctor Wat [... 0 more matches]
-1:57:Sherlock
-3:49:Sherlock
-";
-        assert_eq_printed!(expected, got);
-    }
-
     #[test]
     fn only_matching_max_columns_multi_line2() {
         let matcher = RegexMatcher::new(
@@ -2939,38 +2625,6 @@ Holmeses, success in the province of detective work must always
         assert_eq_printed!(expected, got);
     }
 
-    #[test]
-    fn only_matching_max_columns_preview_multi_line2() {
-        let matcher = RegexMatcher::new(
-            r"(?s)Watson.+?(Holmeses|clearly)"
-        ).unwrap();
-        let mut printer = StandardBuilder::new()
-            .only_matching(true)
-            .max_columns(Some(50))
-            .max_columns_preview(true)
-            .column(true)
-            .build(NoColor::new(vec![]));
-        SearcherBuilder::new()
-            .multi_line(true)
-            .line_number(true)
-            .build()
-            .search_reader(
-                &matcher,
-                SHERLOCK.as_bytes(),
-                printer.sink(&matcher),
-            )
-            .unwrap();
-
-        let got = printer_contents(&mut printer);
-        let expected = "\
-1:16:Watsons of this world, as opposed to the Sherlock
-2:16:Holmeses
-5:12:Watson has to have it taken out for him and dusted [... 0 more matches]
-6:12:and exhibited clearly
-";
-        assert_eq_printed!(expected, got);
-    }
-
     #[test]
     fn per_match() {
         let matcher = RegexMatcher::new("Doctor Watsons|Sherlock").unwrap();
@@ -3166,61 +2820,6 @@ Holmeses, success in the province of detective work must always
         assert_eq_printed!(expected, got);
     }
 
-    #[test]
-    fn replacement_max_columns_preview1() {
-        let matcher = RegexMatcher::new(r"Sherlock|Doctor (\w+)").unwrap();
-        let mut printer = StandardBuilder::new()
-            .max_columns(Some(67))
-            .max_columns_preview(true)
-            .replacement(Some(b"doctah $1 MD".to_vec()))
-            .build(NoColor::new(vec![]));
-        SearcherBuilder::new()
-            .line_number(true)
-            .build()
-            .search_reader(
-                &matcher,
-                SHERLOCK.as_bytes(),
-                printer.sink(&matcher),
-            )
-            .unwrap();
-
-        let got = printer_contents(&mut printer);
-        let expected = "\
-1:For the doctah Watsons MD of this world, as opposed to the doctah [... 0 more matches]
-3:be, to a very large extent, the result of luck. doctah MD Holmes
-5:but doctah Watson MD has to have it taken out for him and dusted,
-";
-        assert_eq_printed!(expected, got);
-    }
-
-    #[test]
-    fn replacement_max_columns_preview2() {
-        let matcher = RegexMatcher::new(
-            "exhibited|dusted|has to have it",
-        ).unwrap();
-        let mut printer = StandardBuilder::new()
-            .max_columns(Some(43))
-            .max_columns_preview(true)
-            .replacement(Some(b"xxx".to_vec()))
-            .build(NoColor::new(vec![]));
-        SearcherBuilder::new()
-            .line_number(false)
-            .build()
-            .search_reader(
-                &matcher,
-                SHERLOCK.as_bytes(),
-                printer.sink(&matcher),
-            )
-            .unwrap();
-
-        let got = printer_contents(&mut printer);
-        let expected = "\
-but Doctor Watson xxx taken out for him and [... 1 more match]
-and xxx clearly, with a label attached.
-";
-        assert_eq_printed!(expected, got);
-    }
-
     #[test]
     fn replacement_only_matching() {
         let matcher = RegexMatcher::new(r"Sherlock|Doctor (\w+)").unwrap();
@@ -403,7 +403,7 @@ impl<W: WriteColor> Summary<W> {
     where M: Matcher,
           P: ?Sized + AsRef<Path>,
     {
-        if !self.config.path && !self.config.kind.requires_path() {
+        if !self.config.path {
             return self.sink(matcher);
         }
         let stats =
@@ -477,10 +477,7 @@ impl<'p, 's, M: Matcher, W: WriteColor> SummarySink<'p, 's, M, W> {
     /// This is unaffected by the result of searches before the previous
     /// search.
     pub fn has_match(&self) -> bool {
-        match self.summary.config.kind {
-            SummaryKind::PathWithoutMatch => self.match_count == 0,
-            _ => self.match_count > 0,
-        }
+        self.match_count > 0
     }
 
     /// If binary data was found in the previous search, this returns the
@@ -636,34 +633,6 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for SummarySink<'p, 's, M, W> {
             stats.add_bytes_searched(finish.byte_count());
             stats.add_bytes_printed(self.summary.wtr.borrow().count());
         }
-        // If our binary detection method says to quit after seeing binary
-        // data, then we shouldn't print any results at all, even if we've
-        // found a match before detecting binary data. The intent here is to
-        // keep BinaryDetection::quit as a form of filter. Otherwise, we can
-        // present a matching file with a smaller number of matches than
-        // there might be, which can be quite misleading.
-        //
-        // If our binary detection method is to convert binary data, then we
-        // don't quit and therefore search the entire contents of the file.
-        //
-        // There is an unfortunate inconsistency here. Namely, when using
-        // Quiet or PathWithMatch, then the printer can quit after the first
-        // match seen, which could be long before seeing binary data. This
-        // means that using PathWithMatch can print a path where as using
-        // Count might not print it at all because of binary data.
-        //
-        // It's not possible to fix this without also potentially significantly
-        // impacting the performance of Quiet or PathWithMatch, so we accept
-        // the bug.
-        if self.binary_byte_offset.is_some()
-            && searcher.binary_detection().quit_byte().is_some()
-        {
-            // Squash the match count. The statistics reported will still
-            // contain the match count, but the "official" match count should
-            // be zero.
-            self.match_count = 0;
-            return Ok(());
-        }
 
         let show_count =
             !self.summary.config.exclude_zero
@@ -4,7 +4,6 @@ use std::io;
 use std::path::Path;
 use std::time;
 
-use bstr::{BStr, BString};
 use grep_matcher::{Captures, LineTerminator, Match, Matcher};
 use grep_searcher::{
     LineIter,
@@ -263,12 +262,26 @@ impl<'a> Sunk<'a> {
 /// portability with a small cost: on Windows, paths that are not valid UTF-16
 /// will not roundtrip correctly.
 #[derive(Clone, Debug)]
-pub struct PrinterPath<'a>(Cow<'a, BStr>);
+pub struct PrinterPath<'a>(Cow<'a, [u8]>);
 
 impl<'a> PrinterPath<'a> {
     /// Create a new path suitable for printing.
     pub fn new(path: &'a Path) -> PrinterPath<'a> {
-        PrinterPath(BString::from_path_lossy(path))
+        PrinterPath::new_impl(path)
+    }
+
+    #[cfg(unix)]
+    fn new_impl(path: &'a Path) -> PrinterPath<'a> {
+        use std::os::unix::ffi::OsStrExt;
+        PrinterPath(Cow::Borrowed(path.as_os_str().as_bytes()))
+    }
+
+    #[cfg(not(unix))]
+    fn new_impl(path: &'a Path) -> PrinterPath<'a> {
+        PrinterPath(match path.to_string_lossy() {
+            Cow::Owned(path) => Cow::Owned(path.into_bytes()),
+            Cow::Borrowed(path) => Cow::Borrowed(path.as_bytes()),
+        })
     }
 
     /// Create a new printer path from the given path which can be efficiently
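The `PrinterPath` hunk above replaces the `bstr`-based conversion with a hand-rolled platform-specific one. A self-contained sketch of that idea (hypothetical `path_bytes` helper, not the crate's API): on Unix an `OsStr` is already raw bytes and can be borrowed for free, while other platforms fall back to a lossy UTF-8 conversion.

```rust
use std::borrow::Cow;
use std::path::Path;

// On Unix, a path's OsStr is bytes; borrow it with zero copies.
#[cfg(unix)]
fn path_bytes(path: &Path) -> Cow<[u8]> {
    use std::os::unix::ffi::OsStrExt;
    Cow::Borrowed(path.as_os_str().as_bytes())
}

// Elsewhere (e.g. Windows, where paths are UTF-16), fall back to a lossy
// UTF-8 conversion, borrowing when the path was already valid UTF-8.
#[cfg(not(unix))]
fn path_bytes(path: &Path) -> Cow<[u8]> {
    match path.to_string_lossy() {
        Cow::Owned(s) => Cow::Owned(s.into_bytes()),
        Cow::Borrowed(s) => Cow::Borrowed(s.as_bytes()),
    }
}

fn main() {
    let p = Path::new("src/main.rs");
    println!("{:?}", path_bytes(p));
}
```

The `Cow` return type is what makes the Unix fast path free: no allocation happens unless a lossy conversion actually has to rewrite bytes.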
@@ -289,7 +302,7 @@ impl<'a> PrinterPath<'a> {
     /// path separators that are both replaced by `new_sep`. In all other
     /// environments, only `/` is treated as a path separator.
     fn replace_separator(&mut self, new_sep: u8) {
-        let transformed_path: BString = self.0.bytes().map(|b| {
+        let transformed_path: Vec<_> = self.as_bytes().iter().map(|&b| {
             if b == b'/' || (cfg!(windows) && b == b'\\') {
                 new_sep
             } else {
@@ -301,7 +314,7 @@ impl<'a> PrinterPath<'a> {
 
     /// Return the raw bytes for this path.
     pub fn as_bytes(&self) -> &[u8] {
-        self.0.as_bytes()
+        &*self.0
     }
 }
 
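The `replace_separator` hunk above rewrites path separators byte by byte. A standalone sketch of that mapping (hypothetical free function; the real method mutates the path stored in `self` in place):

```rust
// Map every path separator to `new_sep`: `/` everywhere, and also `\` when
// compiled for Windows. All other bytes pass through unchanged.
fn replace_separator(path: &[u8], new_sep: u8) -> Vec<u8> {
    path.iter()
        .map(|&b| {
            if b == b'/' || (cfg!(windows) && b == b'\\') {
                new_sep
            } else {
                b
            }
        })
        .collect()
}

fn main() {
    // e.g. print paths with `.` separators, as `--path-separator` allows.
    println!("{:?}", replace_separator(b"src/printer/util.rs", b'.'));
}
```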
@@ -346,7 +359,7 @@ impl Serialize for NiceDuration {
 ///
 /// This stops trimming a prefix as soon as it sees non-whitespace or a line
 /// terminator.
-pub fn trim_ascii_prefix(
+pub fn trim_ascii_prefix_range(
     line_term: LineTerminator,
     slice: &[u8],
     range: Match,
@@ -366,3 +379,14 @@ pub fn trim_ascii_prefix(
         .count();
     range.with_start(range.start() + count)
 }
+
+/// Trim prefix ASCII spaces from the given slice and return the corresponding
+/// sub-slice.
+pub fn trim_ascii_prefix(line_term: LineTerminator, slice: &[u8]) -> &[u8] {
+    let range = trim_ascii_prefix_range(
+        line_term,
+        slice,
+        Match::new(0, slice.len()),
+    );
+    &slice[range]
+}
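The new slice-returning `trim_ascii_prefix` above delegates to the range-based helper. Its trimming rule can be sketched standalone (simplified: this version only skips spaces and tabs, whereas the real helper also consults the configured line terminator):

```rust
// Trim leading ASCII spaces/tabs from a slice and return the sub-slice,
// stopping at the first byte that is not prefix whitespace.
fn trim_ascii_prefix(slice: &[u8]) -> &[u8] {
    let count = slice
        .iter()
        .take_while(|&&b| b == b' ' || b == b'\t')
        .count();
    &slice[count..]
}

fn main() {
    println!("{:?}", trim_ascii_prefix(b"  \tfoo bar"));
}
```

Returning a sub-slice instead of a `Match` range is what lets the printer call sites above drop their `*range = ...` bookkeeping.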
@@ -1,6 +1,6 @@
 [package]
 name = "grep-regex"
-version = "0.1.2" #:version
+version = "0.1.1" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 Use Rust's regex library with the 'grep' crate.
@@ -13,10 +13,9 @@ keywords = ["regex", "grep", "search", "pattern", "line"]
 license = "Unlicense/MIT"
 
 [dependencies]
-aho-corasick = "0.7.3"
-grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
 log = "0.4.5"
-regex = "1.1"
-regex-syntax = "0.6.5"
+grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
+regex = "1.0.5"
+regex-syntax = "0.6.2"
 thread_local = "0.3.6"
 utf8-ranges = "1.0.1"
@@ -1,13 +1,12 @@
 use grep_matcher::{ByteSet, LineTerminator};
 use regex::bytes::{Regex, RegexBuilder};
 use regex_syntax::ast::{self, Ast};
-use regex_syntax::hir::{self, Hir};
+use regex_syntax::hir::Hir;
 
 use ast::AstAnalysis;
 use crlf::crlfify;
 use error::Error;
 use literal::LiteralSets;
-use multi::alternation_literals;
 use non_matching::non_matching_bytes;
 use strip::strip_from_match;
 
@@ -68,17 +67,19 @@ impl Config {
     /// If there was a problem parsing the given expression then an error
     /// is returned.
     pub fn hir(&self, pattern: &str) -> Result<ConfiguredHIR, Error> {
-        let ast = self.ast(pattern)?;
-        let analysis = self.analysis(&ast)?;
-        let expr = hir::translate::TranslatorBuilder::new()
+        let analysis = self.analysis(pattern)?;
+        let expr = ::regex_syntax::ParserBuilder::new()
+            .nest_limit(self.nest_limit)
+            .octal(self.octal)
             .allow_invalid_utf8(true)
-            .case_insensitive(self.is_case_insensitive(&analysis))
+            .ignore_whitespace(self.ignore_whitespace)
+            .case_insensitive(self.is_case_insensitive(&analysis)?)
             .multi_line(self.multi_line)
             .dot_matches_new_line(self.dot_matches_new_line)
             .swap_greed(self.swap_greed)
             .unicode(self.unicode)
             .build()
-            .translate(pattern, &ast)
+            .parse(pattern)
             .map_err(Error::regex)?;
         let expr = match self.line_terminator {
             None => expr,
@@ -98,34 +99,21 @@ impl Config {
     fn is_case_insensitive(
         &self,
         analysis: &AstAnalysis,
-    ) -> bool {
+    ) -> Result<bool, Error> {
         if self.case_insensitive {
-            return true;
+            return Ok(true);
         }
         if !self.case_smart {
-            return false;
+            return Ok(false);
         }
-        analysis.any_literal() && !analysis.any_uppercase()
-    }
-
-    /// Returns true if and only if this config is simple enough such that
-    /// if the pattern is a simple alternation of literals, then it can be
-    /// constructed via a plain Aho-Corasick automaton.
-    ///
-    /// Note that it is OK to return true even when settings like `multi_line`
-    /// are enabled, since if multi-line can impact the match semantics of a
-    /// regex, then it is by definition not a simple alternation of literals.
-    pub fn can_plain_aho_corasick(&self) -> bool {
-        !self.word
-            && !self.case_insensitive
-            && !self.case_smart
+        Ok(analysis.any_literal() && !analysis.any_uppercase())
     }
 
     /// Perform analysis on the AST of this pattern.
     ///
     /// This returns an error if the given pattern failed to parse.
-    fn analysis(&self, ast: &Ast) -> Result<AstAnalysis, Error> {
-        Ok(AstAnalysis::from_ast(ast))
+    fn analysis(&self, pattern: &str) -> Result<AstAnalysis, Error> {
+        Ok(AstAnalysis::from_ast(&self.ast(pattern)?))
     }
 
     /// Parse the given pattern into its abstract syntax.
@@ -172,28 +160,11 @@ impl ConfiguredHIR {
         non_matching_bytes(&self.expr)
     }
 
-    /// Returns true if and only if this regex needs to have its match offsets
-    /// tweaked because of CRLF support. Specifically, this occurs when the
-    /// CRLF hack is enabled and the regex is line anchored at the end. In
-    /// this case, matches that end with a `\r` have the `\r` stripped.
-    pub fn needs_crlf_stripped(&self) -> bool {
-        self.config.crlf && self.expr.is_line_anchored_end()
-    }
-
     /// Builds a regular expression from this HIR expression.
     pub fn regex(&self) -> Result<Regex, Error> {
         self.pattern_to_regex(&self.expr.to_string())
     }
 
-    /// If this HIR corresponds to an alternation of literals with no
-    /// capturing groups, then this returns those literals.
-    pub fn alternation_literals(&self) -> Option<Vec<Vec<u8>>> {
-        if !self.config.can_plain_aho_corasick() {
-            return None;
-        }
-        alternation_literals(&self.expr)
-    }
-
     /// Applies the given function to the concrete syntax of this HIR and then
     /// generates a new HIR based on the result of the function in a way that
     /// preserves the configuration.
@@ -228,7 +199,7 @@ impl ConfiguredHIR {
         if self.config.line_terminator.is_none() {
             return Ok(None);
         }
-        match LiteralSets::new(&self.expr).one_regex(self.config.word) {
+        match LiteralSets::new(&self.expr).one_regex() {
             None => Ok(None),
             Some(pattern) => self.pattern_to_regex(&pattern).map(Some),
         }
@@ -1,112 +1,5 @@
-use std::collections::HashMap;
-
-use grep_matcher::{Match, Matcher, NoError};
-use regex::bytes::Regex;
 use regex_syntax::hir::{self, Hir, HirKind};
 
-use config::ConfiguredHIR;
-use error::Error;
-use matcher::RegexCaptures;
-
-/// A matcher for implementing "word match" semantics.
-#[derive(Clone, Debug)]
-pub struct CRLFMatcher {
-    /// The regex.
-    regex: Regex,
-    /// A map from capture group name to capture group index.
-    names: HashMap<String, usize>,
-}
-
-impl CRLFMatcher {
-    /// Create a new matcher from the given pattern that strips `\r` from the
-    /// end of every match.
-    ///
-    /// This panics if the given expression doesn't need its CRLF stripped.
-    pub fn new(expr: &ConfiguredHIR) -> Result<CRLFMatcher, Error> {
-        assert!(expr.needs_crlf_stripped());
-
-        let regex = expr.regex()?;
-        let mut names = HashMap::new();
-        for (i, optional_name) in regex.capture_names().enumerate() {
-            if let Some(name) = optional_name {
-                names.insert(name.to_string(), i.checked_sub(1).unwrap());
-            }
-        }
-        Ok(CRLFMatcher { regex, names })
-    }
-
-    /// Return the underlying regex used by this matcher.
-    pub fn regex(&self) -> &Regex {
-        &self.regex
-    }
-}
-
-impl Matcher for CRLFMatcher {
-    type Captures = RegexCaptures;
-    type Error = NoError;
-
-    fn find_at(
-        &self,
-        haystack: &[u8],
-        at: usize,
-    ) -> Result<Option<Match>, NoError> {
-        let m = match self.regex.find_at(haystack, at) {
-            None => return Ok(None),
-            Some(m) => Match::new(m.start(), m.end()),
-        };
-        Ok(Some(adjust_match(haystack, m)))
-    }
-
-    fn new_captures(&self) -> Result<RegexCaptures, NoError> {
-        Ok(RegexCaptures::new(self.regex.capture_locations()))
-    }
-
-    fn capture_count(&self) -> usize {
-        self.regex.captures_len().checked_sub(1).unwrap()
-    }
-
-    fn capture_index(&self, name: &str) -> Option<usize> {
-        self.names.get(name).map(|i| *i)
-    }
-
-    fn captures_at(
-        &self,
-        haystack: &[u8],
-        at: usize,
-        caps: &mut RegexCaptures,
-    ) -> Result<bool, NoError> {
-        caps.strip_crlf(false);
-        let r = self.regex.captures_read_at(
-            caps.locations_mut(), haystack, at,
-        );
-        if !r.is_some() {
-            return Ok(false);
-        }
-
-        // If the end of our match includes a `\r`, then strip it from all
-        // capture groups ending at the same location.
-        let end = caps.locations().get(0).unwrap().1;
-        if end > 0 && haystack.get(end - 1) == Some(&b'\r') {
-            caps.strip_crlf(true);
-        }
-        Ok(true)
-    }
-
-    // We specifically do not implement other methods like find_iter or
-    // captures_iter. Namely, the iter methods are guaranteed to be correct
-    // by virtue of implementing find_at and captures_at above.
-}
-
-/// If the given match ends with a `\r`, then return a new match that ends
-/// immediately before the `\r`.
-pub fn adjust_match(haystack: &[u8], m: Match) -> Match {
-    if m.end() > 0 && haystack.get(m.end() - 1) == Some(&b'\r') {
-        m.with_end(m.end() - 1)
-    } else {
-        m
-    }
-}
-
 /// Substitutes all occurrences of multi-line enabled `$` with `(?:\r?$)`.
 ///
 /// This does not preserve the exact semantics of the given expression,
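The deleted `adjust_match` helper above implemented the CRLF hack's offset fix-up: when `$` is rewritten to `(?:\r?$)`, a match may swallow a trailing `\r` that should not be reported. Its core rule is small enough to restate as a standalone sketch (hypothetical `adjust_end` function working on a raw end offset rather than a `Match`):

```rust
// If the match ends on a `\r` byte, shrink the end offset by one so the
// reported match stops just before the carriage return.
fn adjust_end(haystack: &[u8], end: usize) -> usize {
    if end > 0 && haystack.get(end - 1) == Some(&b'\r') {
        end - 1
    } else {
        end
    }
}

fn main() {
    let hay = b"hello\r\n";
    // A line-anchored match of "hello" under the CRLF hack ends at 6 (on the
    // `\r`); the fix-up pulls it back to 5.
    println!("{}", adjust_end(hay, 6));
}
```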
@@ -4,7 +4,6 @@ An implementation of `grep-matcher`'s `Matcher` trait for Rust's regex engine.
 
 #![deny(missing_docs)]
 
-extern crate aho_corasick;
 extern crate grep_matcher;
 #[macro_use]
 extern crate log;
@@ -22,7 +21,6 @@ mod crlf;
 mod error;
 mod literal;
 mod matcher;
-mod multi;
 mod non_matching;
 mod strip;
 mod util;
@@ -47,23 +47,18 @@ impl LiteralSets {
     /// generated these literal sets. The idea here is that the pattern
     /// returned by this method is much cheaper to search for. i.e., It is
     /// usually a single literal or an alternation of literals.
-    pub fn one_regex(&self, word: bool) -> Option<String> {
+    pub fn one_regex(&self) -> Option<String> {
         // TODO: The logic in this function is basically inscrutable. It grew
         // organically in the old grep 0.1 crate. Ideally, it would be
         // re-worked. In fact, the entire inner literal extraction should be
         // re-worked. Actually, most of regex-syntax's literal extraction
         // should also be re-worked. Alas... only so much time in the day.
 
-        if !word {
-            if self.prefixes.all_complete() && !self.prefixes.is_empty() {
-                debug!("literal prefixes detected: {:?}", self.prefixes);
-                // When this is true, the regex engine will do a literal scan,
-                // so we don't need to return anything. But we only do this
-                // if we aren't doing a word regex, since a word regex adds
-                // a `(?:\W|^)` to the beginning of the regex, thereby
-                // defeating the regex engine's literal detection.
-                return None;
-            }
+        if self.prefixes.all_complete() && !self.prefixes.is_empty() {
+            debug!("literal prefixes detected: {:?}", self.prefixes);
+            // When this is true, the regex engine will do a literal scan,
+            // so we don't need to return anything.
+            return None;
         }
 
         // Out of inner required literals, prefixes and suffixes, which one
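For context on what `one_regex` buys: its output is used as a cheap line-level prefilter that gates the full regex engine. A hedged std-only sketch of that idea, with `str::contains` standing in for the extracted literal alternation (the function name and shape here are illustrative, not ripgrep's API):

```rust
/// Returns true if `line` could possibly match a regex whose required inner
/// literals are `literals` (e.g. `["foo", "bar"]` for `foo\w+|bar\d`).
/// Lines failing this test can skip the full regex engine entirely, which
/// is what makes the extracted literal "much cheaper to search for".
fn could_match(line: &str, literals: &[&str]) -> bool {
    literals.iter().any(|lit| line.contains(lit))
}
```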
@@ -171,10 +166,10 @@ fn union_required(expr: &Hir, lits: &mut Literals) {
                 lits.cut();
                 continue;
             }
-            if lits2.contains_empty() || !is_simple(&e) {
+            if lits2.contains_empty() {
                 lits.cut();
             }
-            if !lits.cross_product(&lits2) || !lits2.any_complete() {
+            if !lits.cross_product(&lits2) {
                 // If this expression couldn't yield any literal that
                 // could be extended, then we need to quit. Since we're
                 // short-circuiting, we also need to freeze every member.
@@ -255,20 +250,6 @@ fn alternate_literals<F: FnMut(&Hir, &mut Literals)>(
     }
 }
 
-fn is_simple(expr: &Hir) -> bool {
-    match *expr.kind() {
-        HirKind::Empty
-        | HirKind::Literal(_)
-        | HirKind::Class(_)
-        | HirKind::Repetition(_)
-        | HirKind::Concat(_)
-        | HirKind::Alternation(_) => true,
-        HirKind::Anchor(_)
-        | HirKind::WordBoundary(_)
-        | HirKind::Group(_) => false,
-    }
-}
-
 /// Return the number of characters in the given class.
 fn count_unicode_class(cls: &hir::ClassUnicode) -> u32 {
     cls.iter().map(|r| 1 + (r.end() as u32 - r.start() as u32)).sum()
@@ -290,7 +271,7 @@ mod tests {
     }
 
     fn one_regex(pattern: &str) -> Option<String> {
-        sets(pattern).one_regex(false)
+        sets(pattern).one_regex()
     }
 
     // Put a pattern into the same format as the one returned by `one_regex`.
@@ -320,12 +301,4 @@ mod tests {
         // assert_eq!(one_regex(r"\w(foo|bar|baz)"), pat("foo|bar|baz"));
         // assert_eq!(one_regex(r"\w(foo|bar|baz)\w"), pat("foo|bar|baz"));
     }
-
-    #[test]
-    fn regression_1064() {
-        // Regression from:
-        // https://github.com/BurntSushi/ripgrep/issues/1064
-        // assert_eq!(one_regex(r"a.*c"), pat("a"));
-        assert_eq!(one_regex(r"a(.*c)"), pat("a"));
-    }
 }
@@ -6,9 +6,7 @@ use grep_matcher::{
 use regex::bytes::{CaptureLocations, Regex};
 
 use config::{Config, ConfiguredHIR};
-use crlf::CRLFMatcher;
 use error::Error;
-use multi::MultiLiteralMatcher;
 use word::WordMatcher;
 
 /// A builder for constructing a `Matcher` using regular expressions.
@@ -51,40 +49,14 @@ impl RegexMatcherBuilder {
         if let Some(ref re) = fast_line_regex {
             trace!("extracted fast line regex: {:?}", re);
         }
-
-        let matcher = RegexMatcherImpl::new(&chir)?;
-        trace!("final regex: {:?}", matcher.regex());
         Ok(RegexMatcher {
             config: self.config.clone(),
-            matcher: matcher,
+            matcher: RegexMatcherImpl::new(&chir)?,
             fast_line_regex: fast_line_regex,
            non_matching_bytes: non_matching_bytes,
         })
     }
 
-    /// Build a new matcher from a plain alternation of literals.
-    ///
-    /// Depending on the configuration set by the builder, this may be able to
-    /// build a matcher substantially faster than by joining the patterns with
-    /// a `|` and calling `build`.
-    pub fn build_literals<B: AsRef<str>>(
-        &self,
-        literals: &[B],
-    ) -> Result<RegexMatcher, Error> {
-        let slices: Vec<_> = literals.iter().map(|s| s.as_ref()).collect();
-        if !self.config.can_plain_aho_corasick() || literals.len() < 40 {
-            return self.build(&slices.join("|"));
-        }
-        let matcher = MultiLiteralMatcher::new(&slices)?;
-        let imp = RegexMatcherImpl::MultiLiteral(matcher);
-        Ok(RegexMatcher {
-            config: self.config.clone(),
-            matcher: imp,
-            fast_line_regex: None,
-            non_matching_bytes: ByteSet::empty(),
-        })
-    }
-
     /// Set the value for the case insensitive (`i`) flag.
     ///
     /// When enabled, letters in the pattern will match both upper case and
@@ -372,13 +344,6 @@ impl RegexMatcher {
 enum RegexMatcherImpl {
     /// The standard matcher used for all regular expressions.
     Standard(StandardMatcher),
-    /// A matcher for an alternation of plain literals.
-    MultiLiteral(MultiLiteralMatcher),
-    /// A matcher that strips `\r` from the end of matches.
-    ///
-    /// This is only used when the CRLF hack is enabled and the regex is line
-    /// anchored at the end.
-    CRLF(CRLFMatcher),
     /// A matcher that only matches at word boundaries. This transforms the
     /// regex to `(^|\W)(...)($|\W)` instead of the more intuitive `\b(...)\b`.
     /// Because of this, the WordMatcher provides its own implementation of
@@ -393,28 +358,10 @@ impl RegexMatcherImpl {
     fn new(expr: &ConfiguredHIR) -> Result<RegexMatcherImpl, Error> {
         if expr.config().word {
             Ok(RegexMatcherImpl::Word(WordMatcher::new(expr)?))
-        } else if expr.needs_crlf_stripped() {
-            Ok(RegexMatcherImpl::CRLF(CRLFMatcher::new(expr)?))
         } else {
-            if let Some(lits) = expr.alternation_literals() {
-                if lits.len() >= 40 {
-                    let matcher = MultiLiteralMatcher::new(&lits)?;
-                    return Ok(RegexMatcherImpl::MultiLiteral(matcher));
-                }
-            }
             Ok(RegexMatcherImpl::Standard(StandardMatcher::new(expr)?))
         }
     }
-
-    /// Return the underlying regex object used.
-    fn regex(&self) -> String {
-        match *self {
-            RegexMatcherImpl::Word(ref x) => x.regex().to_string(),
-            RegexMatcherImpl::CRLF(ref x) => x.regex().to_string(),
-            RegexMatcherImpl::MultiLiteral(_) => "<N/A>".to_string(),
-            RegexMatcherImpl::Standard(ref x) => x.regex.to_string(),
-        }
-    }
 }
 
 // This implementation just dispatches on the internal matcher impl except
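`RegexMatcherImpl` is a hand-rolled static-dispatch enum: one wrapper type holds each specialized matcher as a variant, and every trait method matches on the variant and forwards. A miniature std-only sketch of the same pattern, with two toy matchers whose names are invented for illustration:

```rust
/// A miniature of the `RegexMatcherImpl` dispatch pattern: the enum wraps
/// several concrete "matchers" and each method forwards to the active one.
enum ToyMatcherImpl {
    /// Exact substring search.
    Literal(String),
    /// Case-insensitive substring search.
    CaseInsensitive(String),
}

impl ToyMatcherImpl {
    fn is_match(&self, haystack: &str) -> bool {
        match *self {
            ToyMatcherImpl::Literal(ref lit) => haystack.contains(lit),
            ToyMatcherImpl::CaseInsensitive(ref lit) => {
                haystack.to_lowercase().contains(&lit.to_lowercase())
            }
        }
    }
}
```

Compared with a `Box<dyn Matcher>`, this keeps dispatch monomorphic and lets the compiler inline each arm, at the cost of the repetitive forwarding seen in the hunks below this point.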
@@ -432,8 +379,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.find_at(haystack, at),
-            MultiLiteral(ref m) => m.find_at(haystack, at),
-            CRLF(ref m) => m.find_at(haystack, at),
             Word(ref m) => m.find_at(haystack, at),
         }
     }
@@ -442,8 +387,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.new_captures(),
-            MultiLiteral(ref m) => m.new_captures(),
-            CRLF(ref m) => m.new_captures(),
             Word(ref m) => m.new_captures(),
         }
     }
@@ -452,8 +395,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.capture_count(),
-            MultiLiteral(ref m) => m.capture_count(),
-            CRLF(ref m) => m.capture_count(),
             Word(ref m) => m.capture_count(),
         }
     }
@@ -462,8 +403,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.capture_index(name),
-            MultiLiteral(ref m) => m.capture_index(name),
-            CRLF(ref m) => m.capture_index(name),
             Word(ref m) => m.capture_index(name),
         }
     }
@@ -472,8 +411,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.find(haystack),
-            MultiLiteral(ref m) => m.find(haystack),
-            CRLF(ref m) => m.find(haystack),
             Word(ref m) => m.find(haystack),
         }
     }
@@ -488,8 +425,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.find_iter(haystack, matched),
-            MultiLiteral(ref m) => m.find_iter(haystack, matched),
-            CRLF(ref m) => m.find_iter(haystack, matched),
             Word(ref m) => m.find_iter(haystack, matched),
         }
     }
@@ -504,8 +439,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.try_find_iter(haystack, matched),
-            MultiLiteral(ref m) => m.try_find_iter(haystack, matched),
-            CRLF(ref m) => m.try_find_iter(haystack, matched),
             Word(ref m) => m.try_find_iter(haystack, matched),
         }
     }
@@ -518,8 +451,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.captures(haystack, caps),
-            MultiLiteral(ref m) => m.captures(haystack, caps),
-            CRLF(ref m) => m.captures(haystack, caps),
             Word(ref m) => m.captures(haystack, caps),
         }
     }
@@ -535,8 +466,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.captures_iter(haystack, caps, matched),
-            MultiLiteral(ref m) => m.captures_iter(haystack, caps, matched),
-            CRLF(ref m) => m.captures_iter(haystack, caps, matched),
             Word(ref m) => m.captures_iter(haystack, caps, matched),
         }
     }
@@ -552,10 +481,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.try_captures_iter(haystack, caps, matched),
-            MultiLiteral(ref m) => {
-                m.try_captures_iter(haystack, caps, matched)
-            }
-            CRLF(ref m) => m.try_captures_iter(haystack, caps, matched),
             Word(ref m) => m.try_captures_iter(haystack, caps, matched),
         }
     }
@@ -569,8 +494,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.captures_at(haystack, at, caps),
-            MultiLiteral(ref m) => m.captures_at(haystack, at, caps),
-            CRLF(ref m) => m.captures_at(haystack, at, caps),
             Word(ref m) => m.captures_at(haystack, at, caps),
         }
     }
@@ -586,8 +509,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.replace(haystack, dst, append),
-            MultiLiteral(ref m) => m.replace(haystack, dst, append),
-            CRLF(ref m) => m.replace(haystack, dst, append),
             Word(ref m) => m.replace(haystack, dst, append),
         }
     }
@@ -606,12 +527,6 @@ impl Matcher for RegexMatcher {
             Standard(ref m) => {
                 m.replace_with_captures(haystack, caps, dst, append)
             }
-            MultiLiteral(ref m) => {
-                m.replace_with_captures(haystack, caps, dst, append)
-            }
-            CRLF(ref m) => {
-                m.replace_with_captures(haystack, caps, dst, append)
-            }
             Word(ref m) => {
                 m.replace_with_captures(haystack, caps, dst, append)
             }
@@ -622,8 +537,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.is_match(haystack),
-            MultiLiteral(ref m) => m.is_match(haystack),
-            CRLF(ref m) => m.is_match(haystack),
             Word(ref m) => m.is_match(haystack),
         }
     }
@@ -636,8 +549,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.is_match_at(haystack, at),
-            MultiLiteral(ref m) => m.is_match_at(haystack, at),
-            CRLF(ref m) => m.is_match_at(haystack, at),
             Word(ref m) => m.is_match_at(haystack, at),
         }
     }
@@ -649,8 +560,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.shortest_match(haystack),
-            MultiLiteral(ref m) => m.shortest_match(haystack),
-            CRLF(ref m) => m.shortest_match(haystack),
             Word(ref m) => m.shortest_match(haystack),
         }
     }
@@ -663,8 +572,6 @@ impl Matcher for RegexMatcher {
         use self::RegexMatcherImpl::*;
         match self.matcher {
             Standard(ref m) => m.shortest_match_at(haystack, at),
-            MultiLiteral(ref m) => m.shortest_match_at(haystack, at),
-            CRLF(ref m) => m.shortest_match_at(haystack, at),
             Word(ref m) => m.shortest_match_at(haystack, at),
         }
     }
@@ -764,9 +671,7 @@ impl Matcher for StandardMatcher {
         at: usize,
         caps: &mut RegexCaptures,
     ) -> Result<bool, NoError> {
-        Ok(self.regex.captures_read_at(
-            &mut caps.locations_mut(), haystack, at,
-        ).is_some())
+        Ok(self.regex.captures_read_at(&mut caps.locs, haystack, at).is_some())
     }
 
     fn shortest_match_at(
@@ -793,84 +698,34 @@ impl Matcher for StandardMatcher {
 /// index of the group using the corresponding matcher's `capture_index`
 /// method, and then use that index with `RegexCaptures::get`.
 #[derive(Clone, Debug)]
-pub struct RegexCaptures(RegexCapturesImp);
-
-#[derive(Clone, Debug)]
-enum RegexCapturesImp {
-    AhoCorasick {
-        /// The start and end of the match, corresponding to capture group 0.
-        mat: Option<Match>,
-    },
-    Regex {
-        /// Where the locations are stored.
-        locs: CaptureLocations,
-        /// These captures behave as if the capturing groups begin at the given
-        /// offset. When set to `0`, this has no affect and capture groups are
-        /// indexed like normal.
-        ///
-        /// This is useful when building matchers that wrap arbitrary regular
-        /// expressions. For example, `WordMatcher` takes an existing regex
-        /// `re` and creates `(?:^|\W)(re)(?:$|\W)`, but hides the fact that
-        /// the regex has been wrapped from the caller. In order to do this,
-        /// the matcher and the capturing groups must behave as if `(re)` is
-        /// the `0`th capture group.
-        offset: usize,
-        /// When enable, the end of a match has `\r` stripped from it, if one
-        /// exists.
-        strip_crlf: bool,
-    },
+pub struct RegexCaptures {
+    /// Where the locations are stored.
+    locs: CaptureLocations,
+    /// These captures behave as if the capturing groups begin at the given
+    /// offset. When set to `0`, this has no affect and capture groups are
+    /// indexed like normal.
+    ///
+    /// This is useful when building matchers that wrap arbitrary regular
+    /// expressions. For example, `WordMatcher` takes an existing regex `re`
+    /// and creates `(?:^|\W)(re)(?:$|\W)`, but hides the fact that the regex
+    /// has been wrapped from the caller. In order to do this, the matcher
+    /// and the capturing groups must behave as if `(re)` is the `0`th capture
+    /// group.
+    offset: usize,
 }
 
 impl Captures for RegexCaptures {
     fn len(&self) -> usize {
-        match self.0 {
-            RegexCapturesImp::AhoCorasick { .. } => 1,
-            RegexCapturesImp::Regex { ref locs, offset, .. } => {
-                locs.len().checked_sub(offset).unwrap()
-            }
-        }
+        self.locs.len().checked_sub(self.offset).unwrap()
     }
 
     fn get(&self, i: usize) -> Option<Match> {
-        match self.0 {
-            RegexCapturesImp::AhoCorasick { mat, .. } => {
-                if i == 0 {
-                    mat
-                } else {
-                    None
-                }
-            }
-            RegexCapturesImp::Regex { ref locs, offset, strip_crlf } => {
-                if !strip_crlf {
-                    let actual = i.checked_add(offset).unwrap();
-                    return locs.pos(actual).map(|(s, e)| Match::new(s, e));
-                }
-
-                // currently don't support capture offsetting with CRLF
-                // stripping
-                assert_eq!(offset, 0);
-                let m = match locs.pos(i).map(|(s, e)| Match::new(s, e)) {
-                    None => return None,
-                    Some(m) => m,
-                };
-                // If the end position of this match corresponds to the end
-                // position of the overall match, then we apply our CRLF
-                // stripping. Otherwise, we cannot assume stripping is correct.
-                if i == 0 || m.end() == locs.pos(0).unwrap().1 {
-                    Some(m.with_end(m.end() - 1))
-                } else {
-                    Some(m)
-                }
-            }
-        }
+        let actual = i.checked_add(self.offset).unwrap();
+        self.locs.pos(actual).map(|(s, e)| Match::new(s, e))
     }
 }
 
 impl RegexCaptures {
-    pub(crate) fn simple() -> RegexCaptures {
-        RegexCaptures(RegexCapturesImp::AhoCorasick { mat: None })
-    }
-
     pub(crate) fn new(locs: CaptureLocations) -> RegexCaptures {
         RegexCaptures::with_offset(locs, 0)
     }
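The `offset` field above is what lets `WordMatcher` wrap a user regex in `(?:^|\W)(re)(?:$|\W)` while still reporting `(re)` as capture group 0: every index the caller asks for is shifted past the wrapper's own groups. A std-only sketch of that indexing trick, with a `Vec` of optional spans standing in for the regex crate's `CaptureLocations`:

```rust
/// Offset-shifted capture groups: group `i` as seen by the caller maps to
/// group `i + offset` in the wrapped regex, hiding the wrapper's groups.
struct OffsetCaps {
    /// One optional `(start, end)` span per capture group of the real regex.
    locs: Vec<Option<(usize, usize)>>,
    /// How many leading groups belong to the wrapper, not the caller.
    offset: usize,
}

impl OffsetCaps {
    fn len(&self) -> usize {
        // Groups visible to the caller exclude the wrapper's groups.
        self.locs.len().checked_sub(self.offset).unwrap()
    }

    fn get(&self, i: usize) -> Option<(usize, usize)> {
        let actual = i.checked_add(self.offset).unwrap();
        self.locs.get(actual).copied().flatten()
    }
}
```

With `offset = 1`, asking for group 0 returns the span of the real regex's group 1, which is exactly the `(re)` group in the wrapped pattern.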
@@ -879,53 +734,11 @@ impl RegexCaptures {
         locs: CaptureLocations,
         offset: usize,
     ) -> RegexCaptures {
-        RegexCaptures(RegexCapturesImp::Regex {
-            locs, offset, strip_crlf: false,
-        })
+        RegexCaptures { locs, offset }
     }
 
-    pub(crate) fn locations(&self) -> &CaptureLocations {
-        match self.0 {
-            RegexCapturesImp::AhoCorasick { .. } => {
-                panic!("getting locations for simple captures is invalid")
-            }
-            RegexCapturesImp::Regex { ref locs, .. } => {
-                locs
-            }
-        }
-    }
-
-    pub(crate) fn locations_mut(&mut self) -> &mut CaptureLocations {
-        match self.0 {
-            RegexCapturesImp::AhoCorasick { .. } => {
-                panic!("getting locations for simple captures is invalid")
-            }
-            RegexCapturesImp::Regex { ref mut locs, .. } => {
-                locs
-            }
-        }
-    }
-
-    pub(crate) fn strip_crlf(&mut self, yes: bool) {
-        match self.0 {
-            RegexCapturesImp::AhoCorasick { .. } => {
-                panic!("setting strip_crlf for simple captures is invalid")
-            }
-            RegexCapturesImp::Regex { ref mut strip_crlf, .. } => {
-                *strip_crlf = yes;
-            }
-        }
-    }
-
-    pub(crate) fn set_simple(&mut self, one: Option<Match>) {
-        match self.0 {
-            RegexCapturesImp::AhoCorasick { ref mut mat } => {
-                *mat = one;
-            }
-            RegexCapturesImp::Regex { .. } => {
-                panic!("setting simple captures for regex is invalid")
-            }
-        }
-    }
+    pub(crate) fn locations(&mut self) -> &mut CaptureLocations {
+        &mut self.locs
+    }
 }
|
@@ -1,127 +0,0 @@
|
|||||||
use aho_corasick::{AhoCorasick, AhoCorasickBuilder, MatchKind};
|
|
||||||
use grep_matcher::{Matcher, Match, NoError};
|
|
||||||
use regex_syntax::hir::Hir;
|
|
||||||
|
|
||||||
use error::Error;
|
|
||||||
use matcher::RegexCaptures;
|
|
||||||
|
|
||||||
/// A matcher for an alternation of literals.
|
|
||||||
///
|
|
||||||
/// Ideally, this optimization would be pushed down into the regex engine, but
|
|
||||||
/// making this work correctly there would require quite a bit of refactoring.
|
|
||||||
/// Moreover, doing it one layer above lets us do thing like, "if we
|
|
||||||
/// specifically only want to search for literals, then don't bother with
|
|
||||||
/// regex parsing at all."
|
|
||||||
#[derive(Clone, Debug)]
|
|
||||||
pub struct MultiLiteralMatcher {
|
|
||||||
/// The Aho-Corasick automaton.
|
|
||||||
ac: AhoCorasick,
|
|
||||||
}
|
|
||||||
|
|
||||||
impl MultiLiteralMatcher {
|
|
||||||
/// Create a new multi-literal matcher from the given literals.
|
|
||||||
pub fn new<B: AsRef<[u8]>>(
|
|
||||||
literals: &[B],
|
|
||||||
) -> Result<MultiLiteralMatcher, Error> {
|
|
||||||
let ac = AhoCorasickBuilder::new()
|
|
||||||
.match_kind(MatchKind::LeftmostFirst)
|
|
||||||
.auto_configure(literals)
|
|
||||||
.build_with_size::<usize, _, _>(literals)
|
|
||||||
.map_err(Error::regex)?;
|
|
||||||
Ok(MultiLiteralMatcher { ac })
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
impl Matcher for MultiLiteralMatcher {
|
|
||||||
type Captures = RegexCaptures;
|
|
||||||
type Error = NoError;
|
|
||||||
|
|
||||||
fn find_at(
|
|
||||||
&self,
|
|
||||||
haystack: &[u8],
|
|
||||||
at: usize,
|
|
||||||
) -> Result<Option<Match>, NoError> {
|
|
||||||
match self.ac.find(&haystack[at..]) {
|
|
||||||
None => Ok(None),
|
|
||||||
Some(m) => Ok(Some(Match::new(at + m.start(), at + m.end()))),
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
|
|
||||||
Ok(RegexCaptures::simple())
|
|
||||||
}
|
|
||||||
|
|
||||||
fn capture_count(&self) -> usize {
|
|
||||||
1
|
|
||||||
}
|
|
||||||
|
|
||||||
fn capture_index(&self, _: &str) -> Option<usize> {
|
|
||||||
None
|
|
||||||
}
|
|
||||||
|
|
||||||
fn captures_at(
|
|
||||||
&self,
|
|
||||||
haystack: &[u8],
|
|
||||||
at: usize,
|
|
||||||
caps: &mut RegexCaptures,
|
|
||||||
) -> Result<bool, NoError> {
|
|
||||||
caps.set_simple(None);
|
|
||||||
let mat = self.find_at(haystack, at)?;
|
|
||||||
caps.set_simple(mat);
|
|
||||||
Ok(mat.is_some())
|
|
||||||
}
|
|
||||||
|
|
||||||
// We specifically do not implement other methods like find_iter. Namely,
|
|
||||||
// the iter methods are guaranteed to be correct by virtue of implementing
|
|
||||||
// find_at above.
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Alternation literals checks if the given HIR is a simple alternation of
|
|
||||||
/// literals, and if so, returns them. Otherwise, this returns None.
|
|
||||||
pub fn alternation_literals(expr: &Hir) -> Option<Vec<Vec<u8>>> {
|
|
||||||
use regex_syntax::hir::{HirKind, Literal};
|
|
||||||
|
|
||||||
// This is pretty hacky, but basically, if `is_alternation_literal` is
|
|
||||||
// true, then we can make several assumptions about the structure of our
|
|
||||||
// HIR. This is what justifies the `unreachable!` statements below.
|
|
||||||
|
|
||||||
if !expr.is_alternation_literal() {
|
|
||||||
return None;
|
|
||||||
}
|
|
||||||
let alts = match *expr.kind() {
|
|
||||||
HirKind::Alternation(ref alts) => alts,
|
|
||||||
_ => return None, // one literal isn't worth it
|
|
||||||
};
|
|
||||||
|
|
||||||
let extendlit = |lit: &Literal, dst: &mut Vec<u8>| {
|
|
||||||
match *lit {
|
|
||||||
Literal::Unicode(c) => {
|
|
||||||
let mut buf = [0; 4];
|
|
||||||
dst.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
|
|
||||||
}
|
|
||||||
Literal::Byte(b) => {
|
|
||||||
dst.push(b);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
};
|
|
||||||
|
|
||||||
let mut lits = vec![];
|
|
||||||
for alt in alts {
|
|
||||||
let mut lit = vec![];
|
|
||||||
match *alt.kind() {
|
|
||||||
HirKind::Empty => {}
|
|
||||||
HirKind::Literal(ref x) => extendlit(x, &mut lit),
|
|
||||||
HirKind::Concat(ref exprs) => {
|
|
||||||
for e in exprs {
|
|
||||||
match *e.kind() {
|
|
||||||
HirKind::Literal(ref x) => extendlit(x, &mut lit),
|
|
||||||
_ => unreachable!("expected literal, got {:?}", e),
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
_ => unreachable!("expected literal or concat, got {:?}", alt),
|
|
||||||
}
|
|
||||||
lits.push(lit);
|
|
||||||
}
|
|
||||||
Some(lits)
|
|
||||||
}
|
|
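The deleted `multi.rs` above drives an Aho-Corasick automaton in `MatchKind::LeftmostFirst` mode. Setting the automaton aside, the matching contract it implements can be sketched in a few lines of std-only Rust (a quadratic stand-in for illustration, not the real algorithm):

```rust
/// Find the leftmost occurrence of any pattern in `haystack`. Ties at the
/// same start position go to the pattern listed first, which is the
/// "leftmost-first" semantics the real matcher gets from Aho-Corasick's
/// `MatchKind::LeftmostFirst`.
fn find_leftmost_first(
    haystack: &[u8],
    patterns: &[&[u8]],
) -> Option<(usize, usize)> {
    for start in 0..haystack.len() {
        for pat in patterns {
            if haystack[start..].starts_with(pat) {
                return Some((start, start + pat.len()));
            }
        }
    }
    None
}
```

Leftmost-first (rather than leftmost-longest) is what makes the matcher agree with how the regex engine would treat the same patterns joined by `|`.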
@@ -55,11 +55,6 @@ impl WordMatcher {
         }
         Ok(WordMatcher { regex, names, locs })
     }
-
-    /// Return the underlying regex used by this matcher.
-    pub fn regex(&self) -> &Regex {
-        &self.regex
-    }
 }
 
 impl Matcher for WordMatcher {
@@ -103,9 +98,7 @@ impl Matcher for WordMatcher {
         at: usize,
         caps: &mut RegexCaptures,
     ) -> Result<bool, NoError> {
-        let r = self.regex.captures_read_at(
-            caps.locations_mut(), haystack, at,
-        );
+        let r = self.regex.captures_read_at(caps.locations(), haystack, at);
         Ok(r.is_some())
     }
 
@@ -1,6 +1,6 @@
 [package]
 name = "grep-searcher"
-version = "0.1.3"  #:version
+version = "0.1.1"  #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 Fast line oriented regex searching as a library.
@@ -13,21 +13,23 @@ keywords = ["regex", "grep", "egrep", "search", "pattern"]
 license = "Unlicense/MIT"
 
 [dependencies]
-bstr = { version = "0.1.2", default-features = false, features = ["std"] }
-bytecount = "0.5"
-encoding_rs = "0.8.14"
-encoding_rs_io = "0.1.6"
-grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
+bytecount = "0.3.2"
+encoding_rs = "0.8.6"
+encoding_rs_io = "0.1.2"
+grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
 log = "0.4.5"
-memmap = "0.7"
+memchr = "2.0.2"
+memmap = "0.6.2"
 
 [dev-dependencies]
 grep-regex = { version = "0.1.1", path = "../grep-regex" }
-regex = "1.1"
+regex = "1.0.5"
 
 [features]
-default = ["bytecount/runtime-dispatch-simd"]
-simd-accel = ["encoding_rs/simd-accel"]
-# This feature is DEPRECATED. Runtime dispatch is used for SIMD now.
-avx-accel = []
+avx-accel = [
+  "bytecount/avx-accel",
+]
+simd-accel = [
+  "bytecount/simd-accel",
+  "encoding_rs/simd-accel",
+]
@@ -99,13 +99,13 @@ searches stdin.
 
 #![deny(missing_docs)]
 
-extern crate bstr;
 extern crate bytecount;
 extern crate encoding_rs;
 extern crate encoding_rs_io;
 extern crate grep_matcher;
 #[macro_use]
 extern crate log;
+extern crate memchr;
 extern crate memmap;
 #[cfg(test)]
 extern crate regex;
@@ -1,7 +1,8 @@
 use std::cmp;
 use std::io;
+use std::ptr;
 
-use bstr::{BStr, BString};
+use memchr::{memchr, memrchr};
 
 /// The default buffer capacity that we use for the line buffer.
 pub(crate) const DEFAULT_BUFFER_CAPACITY: usize = 8 * (1<<10); // 8 KB
@@ -122,7 +123,7 @@ impl LineBufferBuilder {
     pub fn build(&self) -> LineBuffer {
         LineBuffer {
             config: self.config,
-            buf: BString::from(vec![0; self.config.capacity]),
+            buf: vec![0; self.config.capacity],
             pos: 0,
             last_lineterm: 0,
             end: 0,
@@ -254,12 +255,6 @@ impl<'b, R: io::Read> LineBufferReader<'b, R> {
 
     /// Return the contents of this buffer.
     pub fn buffer(&self) -> &[u8] {
-        self.line_buffer.buffer().as_bytes()
-    }
-
-    /// Return the underlying buffer as a byte string. Used for tests only.
-    #[cfg(test)]
-    fn bstr(&self) -> &BStr {
         self.line_buffer.buffer()
     }
 
@@ -289,7 +284,7 @@ pub struct LineBuffer {
     /// The configuration of this buffer.
     config: Config,
     /// The primary buffer with which to hold data.
-    buf: BString,
+    buf: Vec<u8>,
     /// The current position of this buffer. This is always a valid sliceable
     /// index into `buf`, and its maximum value is the length of `buf`.
     pos: usize,
@@ -317,14 +312,6 @@ pub struct LineBuffer {
 }
 
 impl LineBuffer {
-    /// Set the binary detection method used on this line buffer.
-    ///
-    /// This permits dynamically changing the binary detection strategy on
-    /// an existing line buffer without needing to create a new one.
-    pub fn set_binary_detection(&mut self, binary: BinaryDetection) {
-        self.config.binary = binary;
-    }
-
     /// Reset this buffer, such that it can be used with a new reader.
     fn clear(&mut self) {
         self.pos = 0;
@@ -352,13 +339,13 @@ impl LineBuffer {
     }
 
     /// Return the contents of this buffer.
-    fn buffer(&self) -> &BStr {
+    fn buffer(&self) -> &[u8] {
         &self.buf[self.pos..self.last_lineterm]
     }
 
     /// Return the contents of the free space beyond the end of the buffer as
     /// a mutable slice.
-    fn free_buffer(&mut self) -> &mut BStr {
+    fn free_buffer(&mut self) -> &mut [u8] {
         &mut self.buf[self.end..]
     }
 
@@ -409,7 +396,7 @@ impl LineBuffer {
         assert_eq!(self.pos, 0);
         loop {
             self.ensure_capacity()?;
-            let readlen = rdr.read(self.free_buffer().as_bytes_mut())?;
+            let readlen = rdr.read(self.free_buffer())?;
             if readlen == 0 {
                 // We're only done reading for good once the caller has
                 // consumed everything.
@@ -429,7 +416,7 @@ impl LineBuffer {
             match self.config.binary {
                 BinaryDetection::None => {} // nothing to do
                 BinaryDetection::Quit(byte) => {
-                    if let Some(i) = newbytes.find_byte(byte) {
+                    if let Some(i) = memchr(byte, newbytes) {
                         self.end = oldend + i;
                         self.last_lineterm = self.end;
                         self.binary_byte_offset =
@@ -457,7 +444,7 @@ impl LineBuffer {
             }
 
             // Update our `last_lineterm` positions if we read one.
-            if let Some(i) = newbytes.rfind_byte(self.config.lineterm) {
+            if let Some(i) = memrchr(self.config.lineterm, newbytes) {
                 self.last_lineterm = oldend + i + 1;
                 return Ok(true);
             }
@@ -480,8 +467,40 @@ impl LineBuffer {
             return;
         }
 
+        assert!(self.pos < self.end && self.end <= self.buf.len());
         let roll_len = self.end - self.pos;
-        self.buf.copy_within(self.pos.., 0);
+        unsafe {
+            // SAFETY: A buffer contains Copy data, so there's no problem
+            // moving it around. Safety also depends on our indices being
+            // in bounds, which they should always be, and we enforce with
+            // an assert above.
+            //
+            // It seems like it should be possible to do this in safe code that
+            // results in the same codegen. I tried the obvious:
+            //
+            //   for (src, dst) in (self.pos..self.end).zip(0..) {
+            //       self.buf[dst] = self.buf[src];
+            //   }
+            //
+            // But the above does not work, and in fact compiles down to a slow
+            // byte-by-byte loop. I tried a few other minor variations, but
+            // alas, better minds might prevail.
+            //
            // Overall, this doesn't save us *too* much. It mostly matters when
+            // the number of bytes we're copying is large, which can happen
+            // if the searcher is asked to produce a lot of context. We could
+            // decide this isn't worth it, but it does make an appreciable
+            // impact at or around the context=30 range on my machine.
+            //
+            // We could also use a temporary buffer that compiles down to two
+            // memcpys and is faster than the byte-at-a-time loop, but it
+            // complicates our options for limiting memory allocation a bit.
+            ptr::copy(
+                self.buf[self.pos..].as_ptr(),
+                self.buf.as_mut_ptr(),
+                roll_len,
+            );
+        }
         self.pos = 0;
         self.last_lineterm = roll_len;
        self.end = roll_len;
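The roll above moves the unconsumed tail of the buffer back to the front so more data can be read in after it. Because the source and destination ranges may overlap, it needs an overlap-safe copy (`ptr::copy`, i.e. `memmove`); `copy_within` is the safe equivalent the other side of the diff uses. A minimal sketch over a standalone byte slice, assuming a free function rather than the `LineBuffer` method:

```rust
/// Move the unconsumed bytes in `buf[pos..end]` to the front of `buf`
/// and return the new logical end. `copy_within` lowers to memmove,
/// which is what the unsafe `ptr::copy` in the diff hand-rolls.
fn roll(buf: &mut [u8], pos: usize, end: usize) -> usize {
    assert!(pos <= end && end <= buf.len());
    let roll_len = end - pos;
    // Overlap-safe: pos..end and 0..roll_len may share bytes.
    buf.copy_within(pos..end, 0);
    roll_len
}

fn main() {
    let mut buf = *b"consumed|leftover";
    // Bytes 0..9 have been consumed; keep only "leftover".
    let end = roll(&mut buf, 9, 17);
    assert_eq!(&buf[..end], b"leftover");
}
```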
@@ -517,15 +536,14 @@ impl LineBuffer {
     }
 }
 
-/// Replaces `src` with `replacement` in bytes, and return the offset of the
-/// first replacement, if one exists.
-fn replace_bytes(bytes: &mut BStr, src: u8, replacement: u8) -> Option<usize> {
+/// Replaces `src` with `replacement` in bytes.
+fn replace_bytes(bytes: &mut [u8], src: u8, replacement: u8) -> Option<usize> {
     if src == replacement {
         return None;
     }
     let mut first_pos = None;
     let mut pos = 0;
-    while let Some(i) = bytes[pos..].find_byte(src).map(|i| pos + i) {
+    while let Some(i) = memchr(src, &bytes[pos..]).map(|i| pos + i) {
         if first_pos.is_none() {
             first_pos = Some(i);
         }
@@ -542,7 +560,6 @@ fn replace_bytes(bytes: &mut BStr, src: u8, replacement: u8) -> Option<usize> {
 #[cfg(test)]
 mod tests {
     use std::str;
-    use bstr::BString;
     use super::*;
 
     const SHERLOCK: &'static str = "\
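`replace_bytes` is how the searcher's "convert" binary mode rewrites a marker byte (typically NUL) into the line terminator. A self-contained sketch of the same routine, with a plain linear scan standing in for `memchr`:

```rust
/// Replace every occurrence of `src` with `replacement` in `bytes`,
/// returning the offset of the first replacement, if any.
fn replace_bytes(bytes: &mut [u8], src: u8, replacement: u8) -> Option<usize> {
    if src == replacement {
        return None;
    }
    let mut first_pos = None;
    for (i, b) in bytes.iter_mut().enumerate() {
        if *b == src {
            if first_pos.is_none() {
                first_pos = Some(i);
            }
            *b = replacement;
        }
    }
    first_pos
}

fn main() {
    let mut line = *b"abc\x00def\x00";
    // NUL bytes become line terminators; first hit is at offset 3.
    let first = replace_bytes(&mut line, 0x00, b'\n');
    assert_eq!(first, Some(3));
    assert_eq!(&line, b"abc\ndef\n");
}
```

Returning the first offset lets the caller record where binary data began without a second scan.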
@@ -558,14 +575,18 @@ and exhibited clearly, with a label attached.\
         slice.to_string()
     }
 
+    fn btos(slice: &[u8]) -> &str {
+        str::from_utf8(slice).unwrap()
+    }
+
     fn replace_str(
         slice: &str,
         src: u8,
         replacement: u8,
     ) -> (String, Option<usize>) {
-        let mut dst = BString::from(slice);
+        let mut dst = slice.to_string().into_bytes();
         let result = replace_bytes(&mut dst, src, replacement);
-        (dst.into_string().unwrap(), result)
+        (String::from_utf8(dst).unwrap(), result)
     }
 
     #[test]
@@ -586,7 +607,7 @@ and exhibited clearly, with a label attached.\
         assert!(rdr.buffer().is_empty());
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "homer\nlisa\n");
+        assert_eq!(btos(rdr.buffer()), "homer\nlisa\n");
         assert_eq!(rdr.absolute_byte_offset(), 0);
         rdr.consume(5);
         assert_eq!(rdr.absolute_byte_offset(), 5);
@@ -594,7 +615,7 @@ and exhibited clearly, with a label attached.\
         assert_eq!(rdr.absolute_byte_offset(), 11);
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "maggie");
+        assert_eq!(btos(rdr.buffer()), "maggie");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -609,7 +630,7 @@ and exhibited clearly, with a label attached.\
         let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "homer\nlisa\nmaggie\n");
+        assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -624,7 +645,7 @@ and exhibited clearly, with a label attached.\
         let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "\n");
+        assert_eq!(btos(rdr.buffer()), "\n");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -639,7 +660,7 @@ and exhibited clearly, with a label attached.\
         let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "\n\n");
+        assert_eq!(btos(rdr.buffer()), "\n\n");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -677,12 +698,12 @@ and exhibited clearly, with a label attached.\
         let mut linebuf = LineBufferBuilder::new().capacity(1).build();
         let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
 
-        let mut got = BString::new();
+        let mut got = vec![];
         while rdr.fill().unwrap() {
-            got.push(rdr.buffer());
+            got.extend(rdr.buffer());
            rdr.consume_all();
         }
-        assert_eq!(bytes, got);
+        assert_eq!(bytes, btos(&got));
         assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
         assert_eq!(rdr.binary_byte_offset(), None);
     }
@@ -697,11 +718,11 @@ and exhibited clearly, with a label attached.\
         let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "homer\n");
+        assert_eq!(btos(rdr.buffer()), "homer\n");
         rdr.consume_all();
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "lisa\n");
+        assert_eq!(btos(rdr.buffer()), "lisa\n");
         rdr.consume_all();
 
         // This returns an error because while we have just enough room to
@@ -711,11 +732,11 @@ and exhibited clearly, with a label attached.\
         assert!(rdr.fill().is_err());
 
         // We can mush on though!
-        assert_eq!(rdr.bstr(), "m");
+        assert_eq!(btos(rdr.buffer()), "m");
         rdr.consume_all();
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "aggie");
+        assert_eq!(btos(rdr.buffer()), "aggie");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -731,16 +752,16 @@ and exhibited clearly, with a label attached.\
         let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "homer\n");
+        assert_eq!(btos(rdr.buffer()), "homer\n");
         rdr.consume_all();
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "lisa\n");
+        assert_eq!(btos(rdr.buffer()), "lisa\n");
         rdr.consume_all();
 
         // We have just enough space.
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "maggie");
+        assert_eq!(btos(rdr.buffer()), "maggie");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -756,7 +777,7 @@ and exhibited clearly, with a label attached.\
         let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
 
         assert!(rdr.fill().is_err());
-        assert_eq!(rdr.bstr(), "");
+        assert_eq!(btos(rdr.buffer()), "");
     }
 
     #[test]
@@ -768,7 +789,7 @@ and exhibited clearly, with a label attached.\
         assert!(rdr.buffer().is_empty());
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "homer\nli\x00sa\nmaggie\n");
+        assert_eq!(btos(rdr.buffer()), "homer\nli\x00sa\nmaggie\n");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -787,7 +808,7 @@ and exhibited clearly, with a label attached.\
         assert!(rdr.buffer().is_empty());
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "homer\nli");
+        assert_eq!(btos(rdr.buffer()), "homer\nli");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -804,7 +825,7 @@ and exhibited clearly, with a label attached.\
         let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
 
         assert!(!rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "");
+        assert_eq!(btos(rdr.buffer()), "");
         assert_eq!(rdr.absolute_byte_offset(), 0);
         assert_eq!(rdr.binary_byte_offset(), Some(0));
     }
@@ -820,7 +841,7 @@ and exhibited clearly, with a label attached.\
         assert!(rdr.buffer().is_empty());
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "homer\nlisa\nmaggie\n");
+        assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -839,7 +860,7 @@ and exhibited clearly, with a label attached.\
         assert!(rdr.buffer().is_empty());
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "homer\nlisa\nmaggie");
+        assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -857,7 +878,7 @@ and exhibited clearly, with a label attached.\
         assert!(rdr.buffer().is_empty());
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "\
+        assert_eq!(btos(rdr.buffer()), "\
 For the Doctor Watsons of this world, as opposed to the Sherlock
 Holmeses, s\
 ");
@@ -880,7 +901,7 @@ Holmeses, s\
         assert!(rdr.buffer().is_empty());
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "homer\nli\nsa\nmaggie\n");
+        assert_eq!(btos(rdr.buffer()), "homer\nli\nsa\nmaggie\n");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -899,7 +920,7 @@ Holmeses, s\
         assert!(rdr.buffer().is_empty());
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "\nhomer\nlisa\nmaggie\n");
+        assert_eq!(btos(rdr.buffer()), "\nhomer\nlisa\nmaggie\n");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -918,7 +939,7 @@ Holmeses, s\
         assert!(rdr.buffer().is_empty());
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "homer\nlisa\nmaggie\n\n");
+        assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n\n");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -937,7 +958,7 @@ Holmeses, s\
         assert!(rdr.buffer().is_empty());
 
         assert!(rdr.fill().unwrap());
-        assert_eq!(rdr.bstr(), "homer\nlisa\nmaggie\n\n");
+        assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n\n");
         rdr.consume_all();
 
         assert!(!rdr.fill().unwrap());
@@ -2,8 +2,8 @@
 A collection of routines for performing operations on lines.
 */
 
-use bstr::B;
 use bytecount;
+use memchr::{memchr, memrchr};
 use grep_matcher::{LineTerminator, Match};
 
 /// An iterator over lines in a particular slice of bytes.
@@ -85,7 +85,7 @@ impl LineStep {
     #[inline(always)]
     fn next_impl(&mut self, mut bytes: &[u8]) -> Option<(usize, usize)> {
         bytes = &bytes[..self.end];
-        match B(&bytes[self.pos..]).find_byte(self.line_term) {
+        match memchr(self.line_term, &bytes[self.pos..]) {
             None => {
                 if self.pos < bytes.len() {
                     let m = (self.pos, bytes.len());
@@ -135,16 +135,14 @@ pub fn locate(
     line_term: u8,
     range: Match,
 ) -> Match {
-    let line_start = B(&bytes[..range.start()])
-        .rfind_byte(line_term)
+    let line_start = memrchr(line_term, &bytes[0..range.start()])
         .map_or(0, |i| i + 1);
     let line_end =
         if range.end() > line_start && bytes[range.end() - 1] == line_term {
            range.end()
         } else {
-            B(&bytes[range.end()..])
-                .find_byte(line_term)
-                .map_or(bytes.len(), |i| range.end() + i + 1)
+            memchr(line_term, &bytes[range.end()..])
+                .map_or(bytes.len(), |i| range.end() + i + 1)
         };
     Match::new(line_start, line_end)
 }
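`locate` expands a match range outward to full line boundaries: scan backward from the match start for the previous line terminator, and forward from the match end for the next one. The same logic can be sketched with std iterators standing in for `memchr`/`memrchr`, and a plain `(usize, usize)` pair standing in for the crate's `Match` type:

```rust
/// Expand the match `start..end` in `bytes` to the boundaries of the
/// line(s) containing it. The returned range includes the trailing
/// line terminator when one exists.
fn locate(bytes: &[u8], line_term: u8, start: usize, end: usize) -> (usize, usize) {
    // Last terminator before the match; the line starts just after it.
    let line_start = bytes[..start]
        .iter()
        .rposition(|&b| b == line_term)
        .map_or(0, |i| i + 1);
    let line_end = if end > line_start && bytes[end - 1] == line_term {
        end
    } else {
        // First terminator at or after the match end, inclusive.
        bytes[end..]
            .iter()
            .position(|&b| b == line_term)
            .map_or(bytes.len(), |i| end + i + 1)
    };
    (line_start, line_end)
}

fn main() {
    let hay = b"homer\nlisa\nmaggie\n";
    // The match "isa" sits inside the second line.
    let (s, e) = locate(hay, b'\n', 7, 10);
    assert_eq!(&hay[s..e], b"lisa\n");
}
```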
@@ -182,7 +180,7 @@ fn preceding_by_pos(
         pos -= 1;
     }
     loop {
-        match B(&bytes[..pos]).rfind_byte(line_term) {
+        match memrchr(line_term, &bytes[..pos]) {
             None => {
                 return 0;
             }
@@ -1,4 +1,3 @@
-/// Like assert_eq, but nicer output for long strings.
 #[cfg(test)]
 #[macro_export]
 macro_rules! assert_eq_printed {
@@ -1,6 +1,6 @@
 use std::cmp;
 
-use bstr::B;
+use memchr::memchr;
 
 use grep_matcher::{LineMatchKind, Matcher};
 use lines::{self, LineStep};
@@ -90,13 +90,6 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
         self.sink_matched(buf, range)
     }
 
-    pub fn binary_data(
-        &mut self,
-        binary_byte_offset: u64,
-    ) -> Result<bool, S::Error> {
-        self.sink.binary_data(&self.searcher, binary_byte_offset)
-    }
-
     pub fn begin(&mut self) -> Result<bool, S::Error> {
         self.sink.begin(&self.searcher)
     }
@@ -148,28 +141,19 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
         consumed
     }
 
-    pub fn detect_binary(
-        &mut self,
-        buf: &[u8],
-        range: &Range,
-    ) -> Result<bool, S::Error> {
+    pub fn detect_binary(&mut self, buf: &[u8], range: &Range) -> bool {
         if self.binary_byte_offset.is_some() {
-            return Ok(self.config.binary.quit_byte().is_some());
+            return true;
         }
         let binary_byte = match self.config.binary.0 {
             BinaryDetection::Quit(b) => b,
-            BinaryDetection::Convert(b) => b,
-            _ => return Ok(false),
+            _ => return false,
         };
-        if let Some(i) = B(&buf[*range]).find_byte(binary_byte) {
-            let offset = range.start() + i;
-            self.binary_byte_offset = Some(offset);
-            if !self.binary_data(offset as u64)? {
-                return Ok(true);
-            }
-            Ok(self.config.binary.quit_byte().is_some())
+        if let Some(i) = memchr(binary_byte, &buf[*range]) {
+            self.binary_byte_offset = Some(range.start() + i);
+            true
         } else {
-            Ok(false)
+            false
         }
     }
 
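Both sides of `detect_binary` above reduce to the same core operation: scan a candidate range once for the configured binary marker byte and remember the first hit. A minimal std-only sketch of that scan, with `iter().position` standing in for `memchr`:

```rust
/// Return the offset of the first occurrence of `binary_byte` in
/// `buf`, if any. A searcher treats such a hit as "this input is
/// binary" and either quits or converts, depending on configuration.
fn detect_binary(buf: &[u8], binary_byte: u8) -> Option<usize> {
    buf.iter().position(|&b| b == binary_byte)
}

fn main() {
    // Plain text: no NUL byte, nothing to report.
    assert_eq!(detect_binary(b"plain text\n", 0x00), None);
    // A NUL three bytes in marks the input as binary at offset 3.
    assert_eq!(detect_binary(b"eli\x00f", 0x00), Some(3));
}
```

Caching the offset (as the diff's `binary_byte_offset` field does) makes repeated calls cheap: once binary data is seen, later checks return immediately.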
@@ -432,7 +416,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
         buf: &[u8],
         range: &Range,
     ) -> Result<bool, S::Error> {
-        if self.binary && self.detect_binary(buf, range)? {
+        if self.binary && self.detect_binary(buf, range) {
             return Ok(false);
         }
         if !self.sink_break_context(range.start())? {
@@ -440,7 +424,16 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
         }
         self.count_lines(buf, range.start());
         let offset = self.absolute_byte_offset + range.start() as u64;
-        let linebuf = &buf[*range];
+        let linebuf =
+            if self.config.line_term.is_crlf() {
+                // Normally, a line terminator is never part of a match, but
+                // if the line terminator is CRLF, then it's possible for `\r`
+                // to end up in the match, which we generally don't want. So
+                // we strip it here.
+                lines::without_terminator(&buf[*range], self.config.line_term)
+            } else {
+                &buf[*range]
+            };
         let keepgoing = self.sink.matched(
             &self.searcher,
             &SinkMatch {
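The CRLF branch above relies on `lines::without_terminator` to trim the terminator off a matched line so a stray `\r` never reaches the sink. A sketch of what such a helper does, assuming a plain `(u8, bool)` pair in place of the crate's `LineTerminator` type:

```rust
/// Strip a trailing line terminator from `line`. In CRLF mode, a
/// trailing "\r\n" is removed as a unit so `\r` does not leak into
/// the reported match. A sketch, not the crate's actual signature.
fn without_terminator(line: &[u8], line_term: u8, crlf: bool) -> &[u8] {
    let mut end = line.len();
    if end > 0 && line[end - 1] == line_term {
        end -= 1;
        // For CRLF, also drop the carriage return preceding the `\n`.
        if crlf && line_term == b'\n' && end > 0 && line[end - 1] == b'\r' {
            end -= 1;
        }
    }
    &line[..end]
}

fn main() {
    assert_eq!(without_terminator(b"hello\r\n", b'\n', true), b"hello");
    assert_eq!(without_terminator(b"hello\n", b'\n', false), b"hello");
    // No terminator present: the line is returned unchanged.
    assert_eq!(without_terminator(b"hello", b'\n', true), b"hello");
}
```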
@@ -464,7 +457,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
         buf: &[u8],
         range: &Range,
     ) -> Result<bool, S::Error> {
-        if self.binary && self.detect_binary(buf, range)? {
+        if self.binary && self.detect_binary(buf, range) {
             return Ok(false);
         }
         self.count_lines(buf, range.start());
@@ -494,7 +487,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
     ) -> Result<bool, S::Error> {
         assert!(self.after_context_left >= 1);
 
-        if self.binary && self.detect_binary(buf, range)? {
+        if self.binary && self.detect_binary(buf, range) {
             return Ok(false);
         }
         self.count_lines(buf, range.start());
@@ -523,7 +516,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
         buf: &[u8],
         range: &Range,
     ) -> Result<bool, S::Error> {
-        if self.binary && self.detect_binary(buf, range)? {
+        if self.binary && self.detect_binary(buf, range) {
             return Ok(false);
         }
         self.count_lines(buf, range.start());
@@ -51,7 +51,6 @@ where M: Matcher,
|
|||||||
fn fill(&mut self) -> Result<bool, S::Error> {
|
fn fill(&mut self) -> Result<bool, S::Error> {
|
||||||
assert!(self.rdr.buffer()[self.core.pos()..].is_empty());
|
assert!(self.rdr.buffer()[self.core.pos()..].is_empty());
|
||||||
|
|
||||||
let already_binary = self.rdr.binary_byte_offset().is_some();
|
|
||||||
let old_buf_len = self.rdr.buffer().len();
|
let old_buf_len = self.rdr.buffer().len();
|
||||||
let consumed = self.core.roll(self.rdr.buffer());
|
let consumed = self.core.roll(self.rdr.buffer());
|
||||||
self.rdr.consume(consumed);
|
self.rdr.consume(consumed);
|
||||||

@@ -59,14 +58,7 @@ where M: Matcher,
             Err(err) => return Err(S::Error::error_io(err)),
             Ok(didread) => didread,
         };
-        if !already_binary {
-            if let Some(offset) = self.rdr.binary_byte_offset() {
-                if !self.core.binary_data(offset)? {
-                    return Ok(false);
-                }
-            }
-        }
-        if !didread || self.should_binary_quit() {
+        if !didread || self.rdr.binary_byte_offset().is_some() {
             return Ok(false);
         }
         // If rolling the buffer didn't result in consuming anything and if

@@ -79,11 +71,6 @@ where M: Matcher,
         }
         Ok(true)
     }
-
-    fn should_binary_quit(&self) -> bool {
-        self.rdr.binary_byte_offset().is_some()
-            && self.config.binary.quit_byte().is_some()
-    }
 }

 #[derive(Debug)]

@@ -116,7 +103,7 @@ impl<'s, M: Matcher, S: Sink> SliceByLine<'s, M, S> {
             DEFAULT_BUFFER_CAPACITY,
         );
         let binary_range = Range::new(0, binary_upto);
-        if !self.core.detect_binary(self.slice, &binary_range)? {
+        if !self.core.detect_binary(self.slice, &binary_range) {
             while
                 !self.slice[self.core.pos()..].is_empty()
                 && self.core.match_by_line(self.slice)?

@@ -168,7 +155,7 @@ impl<'s, M: Matcher, S: Sink> MultiLine<'s, M, S> {
             DEFAULT_BUFFER_CAPACITY,
         );
         let binary_range = Range::new(0, binary_upto);
-        if !self.core.detect_binary(self.slice, &binary_range)? {
+        if !self.core.detect_binary(self.slice, &binary_range) {
             let mut keepgoing = true;
             while !self.slice[self.core.pos()..].is_empty() && keepgoing {
                 keepgoing = self.sink()?;

@@ -75,41 +75,25 @@ impl BinaryDetection {
         BinaryDetection(line_buffer::BinaryDetection::Quit(binary_byte))
     }

-    /// Binary detection is performed by looking for the given byte, and
-    /// replacing it with the line terminator configured on the searcher.
-    /// (If the searcher is configured to use `CRLF` as the line terminator,
-    /// then this byte is replaced by just `LF`.)
-    ///
-    /// When searching is performed using a fixed size buffer, then the
-    /// contents of that buffer are always searched for the presence of this
-    /// byte and replaced with the line terminator. In effect, the caller is
-    /// guaranteed to never observe this byte while searching.
-    ///
-    /// When searching is performed with the entire contents mapped into
-    /// memory, then this setting has no effect and is ignored.
-    pub fn convert(binary_byte: u8) -> BinaryDetection {
+    // TODO(burntsushi): Figure out how to make binary conversion work. This
+    // permits implementing GNU grep's default behavior, which is to zap NUL
+    // bytes but still execute a search (if a match is detected, then GNU grep
+    // stops and reports that a match was found but doesn't print the matching
+    // line itself).
+    //
+    // This behavior is pretty simple to implement using the line buffer (and
+    // in fact, it is already implemented and tested), since there's a fixed
+    // size buffer that we can easily write to. The issue arises when searching
+    // a `&[u8]` (whether on the heap or via a memory map), since this isn't
+    // something we can easily write to.
+    /// The given byte is searched in all contents read by the line buffer. If
+    /// it occurs, then it is replaced by the line terminator. The line buffer
+    /// guarantees that this byte will never be observable by callers.
+    #[allow(dead_code)]
+    fn convert(binary_byte: u8) -> BinaryDetection {
         BinaryDetection(line_buffer::BinaryDetection::Convert(binary_byte))
     }

-    /// If this binary detection uses the "quit" strategy, then this returns
-    /// the byte that will cause a search to quit. In any other case, this
-    /// returns `None`.
-    pub fn quit_byte(&self) -> Option<u8> {
-        match self.0 {
-            line_buffer::BinaryDetection::Quit(b) => Some(b),
-            _ => None,
-        }
-    }
-
-    /// If this binary detection uses the "convert" strategy, then this returns
-    /// the byte that will be replaced by the line terminator. In any other
-    /// case, this returns `None`.
-    pub fn convert_byte(&self) -> Option<u8> {
-        match self.0 {
-            line_buffer::BinaryDetection::Convert(b) => Some(b),
-            _ => None,
-        }
-    }
 }

 /// An encoding to use when searching.
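The hunk above is about grep-searcher's two binary detection strategies: "quit" stops a search at the first occurrence of a chosen byte, while "convert" replaces that byte with the line terminator so callers never observe it. As a minimal stdlib-only sketch of those semantics (illustrative; the `Strategy` enum and `apply` function here are inventions for this example, not grep-searcher's implementation):

```rust
// Sketch of the "quit" and "convert" binary detection strategies described
// in the hunk above. Not the grep-searcher code; just the semantics.

enum Strategy {
    Quit(u8),
    Convert(u8),
}

/// Returns the bytes that remain searchable, plus the offset at which
/// binary data was first seen (only for `Quit`).
fn apply(strategy: Strategy, buf: &[u8], line_term: u8) -> (Vec<u8>, Option<u64>) {
    match strategy {
        Strategy::Quit(byte) => match buf.iter().position(|&b| b == byte) {
            // Binary data found: only what precedes it is searched, and the
            // absolute offset is reported to the caller.
            Some(i) => (buf[..i].to_vec(), Some(i as u64)),
            None => (buf.to_vec(), None),
        },
        // GNU grep style: zap the binary byte and keep searching. The
        // caller is guaranteed to never observe `byte` in the output.
        Strategy::Convert(byte) => {
            let converted = buf
                .iter()
                .map(|&b| if b == byte { line_term } else { b })
                .collect();
            (converted, None)
        }
    }
}

fn main() {
    let hay = b"foo\x00bar";
    // Quit at the NUL byte: search stops after "foo", offset 3 reported.
    assert_eq!(apply(Strategy::Quit(0), hay, b'\n'), (b"foo".to_vec(), Some(3)));
    // Convert the NUL byte into the line terminator and keep going.
    assert_eq!(apply(Strategy::Convert(0), hay, b'\n'), (b"foo\nbar".to_vec(), None));
}
```

The asymmetry the TODO comment describes follows from this: `Convert` needs a writable buffer, which exists for the line buffer but not for a borrowed `&[u8]` or a memory map.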

@@ -171,8 +155,6 @@ pub struct Config {
     /// An encoding that, when present, causes the searcher to transcode all
     /// input from the encoding to UTF-8.
     encoding: Option<Encoding>,
-    /// Whether to do automatic transcoding based on a BOM or not.
-    bom_sniffing: bool,
 }

 impl Default for Config {

@@ -189,7 +171,6 @@ impl Default for Config {
             binary: BinaryDetection::default(),
             multi_line: false,
             encoding: None,
-            bom_sniffing: true,
         }
     }
 }

@@ -322,15 +303,11 @@ impl SearcherBuilder {
             config.before_context = 0;
             config.after_context = 0;
         }
-
         let mut decode_builder = DecodeReaderBytesBuilder::new();
         decode_builder
             .encoding(self.config.encoding.as_ref().map(|e| e.0))
             .utf8_passthru(true)
-            .strip_bom(self.config.bom_sniffing)
-            .bom_override(true)
-            .bom_sniffing(self.config.bom_sniffing);
-
+            .bom_override(true);
         Searcher {
             config: config,
             decode_builder: decode_builder,

@@ -528,13 +505,12 @@ impl SearcherBuilder {
     /// transcoding process encounters an error, then bytes are replaced with
     /// the Unicode replacement codepoint.
     ///
-    /// When no encoding is specified (the default), then BOM sniffing is
-    /// used (if it's enabled, which it is, by default) to determine whether
-    /// the source data is UTF-8 or UTF-16, and transcoding will be performed
-    /// automatically. If no BOM could be found, then the source data is
-    /// searched _as if_ it were UTF-8. However, so long as the source data is
-    /// at least ASCII compatible, then it is possible for a search to produce
-    /// useful results.
+    /// When no encoding is specified (the default), then BOM sniffing is used
+    /// to determine whether the source data is UTF-8 or UTF-16, and
+    /// transcoding will be performed automatically. If no BOM could be found,
+    /// then the source data is searched _as if_ it were UTF-8. However, so
+    /// long as the source data is at least ASCII compatible, then it is
+    /// possible for a search to produce useful results.
     pub fn encoding(
         &mut self,
         encoding: Option<Encoding>,

@@ -542,23 +518,6 @@ impl SearcherBuilder {
         self.config.encoding = encoding;
         self
     }
-
-    /// Enable automatic transcoding based on BOM sniffing.
-    ///
-    /// When this is enabled and an explicit encoding is not set, then this
-    /// searcher will try to detect the encoding of the bytes being searched
-    /// by sniffing its byte-order mark (BOM). In particular, when this is
-    /// enabled, UTF-16 encoded files will be searched seamlessly.
-    ///
-    /// When this is disabled and if an explicit encoding is not set, then
-    /// the bytes from the source stream will be passed through unchanged,
-    /// including its BOM, if one is present.
-    ///
-    /// This is enabled by default.
-    pub fn bom_sniffing(&mut self, yes: bool) -> &mut SearcherBuilder {
-        self.config.bom_sniffing = yes;
-        self
-    }
 }

 /// A searcher executes searches over a haystack and writes results to a caller

@@ -755,12 +714,6 @@ impl Searcher {
         }
     }

-    /// Set the binary detection method used on this searcher.
-    pub fn set_binary_detection(&mut self, detection: BinaryDetection) {
-        self.config.binary = detection.clone();
-        self.line_buffer.borrow_mut().set_binary_detection(detection.0);
-    }
-
     /// Check that the searcher's configuration and the matcher are consistent
     /// with each other.
     fn check_config<M: Matcher>(&self, matcher: M) -> Result<(), ConfigError> {

@@ -784,8 +737,7 @@ impl Searcher {

     /// Returns true if and only if the given slice needs to be transcoded.
     fn slice_needs_transcoding(&self, slice: &[u8]) -> bool {
-        self.config.encoding.is_some()
-            || (self.config.bom_sniffing && slice_has_utf16_bom(slice))
+        self.config.encoding.is_some() || slice_has_utf16_bom(slice)
     }
 }

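The `slice_needs_transcoding` hunk above hinges on a UTF-16 byte-order-mark check. A rough stdlib sketch of what such a check looks like (the real `slice_has_utf16_bom` is internal to grep-searcher, so this is an assumption about its behavior, not its code):

```rust
// Illustrative BOM check: a slice that begins with a UTF-16 byte order
// mark needs transcoding before a byte-oriented search makes sense.

/// True if the slice starts with a UTF-16 BOM, in either little-endian
/// (FF FE) or big-endian (FE FF) form.
fn slice_has_utf16_bom(slice: &[u8]) -> bool {
    slice.starts_with(&[0xFF, 0xFE]) || slice.starts_with(&[0xFE, 0xFF])
}

fn main() {
    assert!(slice_has_utf16_bom(&[0xFF, 0xFE, 0x68, 0x00])); // UTF-16LE
    assert!(slice_has_utf16_bom(&[0xFE, 0xFF, 0x00, 0x68])); // UTF-16BE
    assert!(!slice_has_utf16_bom(b"plain ASCII, no BOM"));
}
```

Note a UTF-8 BOM (EF BB BF) deliberately does not trigger this: UTF-8 input can be searched as-is.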

@@ -800,12 +752,6 @@ impl Searcher {
         self.config.line_term
     }

-    /// Returns the type of binary detection configured on this searcher.
-    #[inline]
-    pub fn binary_detection(&self) -> &BinaryDetection {
-        &self.config.binary
-    }
-
     /// Returns true if and only if this searcher is configured to invert its
     /// search results. That is, matching lines are lines that do **not** match
     /// the searcher's matcher.

@@ -167,28 +167,6 @@ pub trait Sink {
         Ok(true)
     }

-    /// This method is called whenever binary detection is enabled and binary
-    /// data is found. If binary data is found, then this is called at least
-    /// once for the first occurrence with the absolute byte offset at which
-    /// the binary data begins.
-    ///
-    /// If this returns `true`, then searching continues. If this returns
-    /// `false`, then searching is stopped immediately and `finish` is called.
-    ///
-    /// If this returns an error, then searching is stopped immediately,
-    /// `finish` is not called and the error is bubbled back up to the caller
-    /// of the searcher.
-    ///
-    /// By default, it does nothing and returns `true`.
-    #[inline]
-    fn binary_data(
-        &mut self,
-        _searcher: &Searcher,
-        _binary_byte_offset: u64,
-    ) -> Result<bool, Self::Error> {
-        Ok(true)
-    }
-
     /// This method is called when a search has begun, before any search is
     /// executed. By default, this does nothing.
     ///
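The `binary_data` hook removed above follows a common trait pattern: a default method that keeps searching unless an implementation overrides it to stop. A minimal self-contained sketch of that shape (the trait and struct names here are illustrative, not the grep-searcher API, and the error plumbing is omitted):

```rust
// Sketch of a sink trait with a default "binary data seen" hook, in the
// spirit of the method removed above. Returning false stops the search.

trait EventSink {
    /// Called with the absolute byte offset of the first binary byte.
    /// The default implementation keeps searching.
    fn binary_data(&mut self, _offset: u64) -> bool {
        true
    }
}

/// An implementation that records the offset and quits immediately,
/// mirroring the "quit" binary detection strategy.
struct QuitOnBinary {
    saw_binary_at: Option<u64>,
}

impl EventSink for QuitOnBinary {
    fn binary_data(&mut self, offset: u64) -> bool {
        self.saw_binary_at = Some(offset);
        false // stop the search; a driver would then call its finish step
    }
}

fn main() {
    let mut sink = QuitOnBinary { saw_binary_at: None };
    assert!(!sink.binary_data(42));
    assert_eq!(sink.saw_binary_at, Some(42));
}
```

The default method is what lets the two blanket impls below (`&mut S` and `Box<S>`) simply forward the call without every sink having to implement it.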

@@ -250,15 +228,6 @@ impl<'a, S: Sink> Sink for &'a mut S {
         (**self).context_break(searcher)
     }

-    #[inline]
-    fn binary_data(
-        &mut self,
-        searcher: &Searcher,
-        binary_byte_offset: u64,
-    ) -> Result<bool, S::Error> {
-        (**self).binary_data(searcher, binary_byte_offset)
-    }
-
     #[inline]
     fn begin(
         &mut self,

@@ -306,15 +275,6 @@ impl<S: Sink + ?Sized> Sink for Box<S> {
         (**self).context_break(searcher)
     }

-    #[inline]
-    fn binary_data(
-        &mut self,
-        searcher: &Searcher,
-        binary_byte_offset: u64,
-    ) -> Result<bool, S::Error> {
-        (**self).binary_data(searcher, binary_byte_offset)
-    }
-
     #[inline]
     fn begin(
         &mut self,

@@ -1,10 +1,10 @@
 use std::io::{self, Write};
 use std::str;

-use bstr::B;
 use grep_matcher::{
     LineMatchKind, LineTerminator, Match, Matcher, NoCaptures, NoError,
 };
+use memchr::memchr;
 use regex::bytes::{Regex, RegexBuilder};

 use searcher::{BinaryDetection, Searcher, SearcherBuilder};

@@ -94,8 +94,7 @@ impl Matcher for RegexMatcher {
         }
         // Make it interesting and return the last byte in the current
         // line.
-        let i = B(haystack)
-            .find_byte(self.line_term.unwrap().as_byte())
+        let i = memchr(self.line_term.unwrap().as_byte(), haystack)
             .map(|i| i)
             .unwrap_or(haystack.len() - 1);
         Ok(Some(LineMatchKind::Candidate(i)))

@@ -1,6 +1,6 @@
 [package]
 name = "grep"
-version = "0.2.3" #:version
+version = "0.2.2" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 Fast line oriented regex searching as a library.

@@ -14,19 +14,24 @@ license = "Unlicense/MIT"

 [dependencies]
 grep-cli = { version = "0.1.1", path = "../grep-cli" }
-grep-matcher = { version = "0.1.2", path = "../grep-matcher" }
-grep-pcre2 = { version = "0.1.2", path = "../grep-pcre2", optional = true }
+grep-matcher = { version = "0.1.1", path = "../grep-matcher" }
+grep-pcre2 = { version = "0.1.1", path = "../grep-pcre2", optional = true }
 grep-printer = { version = "0.1.1", path = "../grep-printer" }
 grep-regex = { version = "0.1.1", path = "../grep-regex" }
 grep-searcher = { version = "0.1.1", path = "../grep-searcher" }

 [dev-dependencies]
-termcolor = "1.0.4"
-walkdir = "2.2.7"
+atty = "0.2.11"
+regex = "1"
+termcolor = "1"
+walkdir = "2.2.2"

+[dev-dependencies.clap]
+version = "2.32.0"
+default-features = false
+features = ["suggestions"]
+
 [features]
+avx-accel = ["grep-searcher/avx-accel"]
 simd-accel = ["grep-searcher/simd-accel"]
 pcre2 = ["grep-pcre2"]
-
-# This feature is DEPRECATED. Runtime dispatch is used for SIMD now.
-avx-accel = []

@@ -1,6 +1,6 @@
 [package]
 name = "ignore"
-version = "0.4.7" #:version
+version = "0.4.3" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 A fast library for efficiently matching ignore files such as `.gitignore`

@@ -18,21 +18,21 @@ name = "ignore"
 bench = false

 [dependencies]
-crossbeam-channel = "0.3.6"
-globset = { version = "0.4.3", path = "../globset" }
-lazy_static = "1.1"
+crossbeam-channel = "0.2.4"
+globset = { version = "0.4.2", path = "../globset" }
+lazy_static = "1.1.0"
 log = "0.4.5"
-memchr = "2.1"
-regex = "1.1"
-same-file = "1.0.4"
+memchr = "2.0.2"
+regex = "1.0.5"
+same-file = "1.0.3"
 thread_local = "0.3.6"
-walkdir = "2.2.7"
+walkdir = "2.2.5"

 [target.'cfg(windows)'.dependencies.winapi-util]
-version = "0.1.2"
+version = "0.1.1"

 [dev-dependencies]
-tempfile = "3.0.5"
+tempdir = "0.3.7"

 [features]
 simd-accel = ["globset/simd-accel"]

@@ -37,19 +37,19 @@ fn main() {
             Box::new(move |result| {
                 use ignore::WalkState::*;

-                tx.send(DirEntry::Y(result.unwrap())).unwrap();
+                tx.send(DirEntry::Y(result.unwrap()));
                 Continue
             })
         });
     } else if simple {
         let walker = WalkDir::new(path);
         for result in walker {
-            tx.send(DirEntry::X(result.unwrap())).unwrap();
+            tx.send(DirEntry::X(result.unwrap()));
         }
     } else {
         let walker = WalkBuilder::new(path).build();
         for result in walker {
-            tx.send(DirEntry::Y(result.unwrap())).unwrap();
+            tx.send(DirEntry::Y(result.unwrap()));
         }
     }
     drop(tx);

@@ -22,7 +22,6 @@ use gitignore::{self, Gitignore, GitignoreBuilder};
 use pathutil::{is_hidden, strip_prefix};
 use overrides::{self, Override};
 use types::{self, Types};
-use walk::DirEntry;
 use {Error, Match, PartialErrorBuilder};

 /// IgnoreMatch represents information about where a match came from when using

@@ -74,8 +73,6 @@ struct IgnoreOptions {
     git_ignore: bool,
     /// Whether to read .git/info/exclude files.
     git_exclude: bool,
-    /// Whether to ignore files case insensitively
-    ignore_case_insensitive: bool,
 }

 /// Ignore is a matcher useful for recursively walking one or more directories.

@@ -228,11 +225,7 @@ impl Ignore {
             Gitignore::empty()
         } else {
             let (m, err) =
-                create_gitignore(
-                    &dir,
-                    &self.0.custom_ignore_filenames,
-                    self.0.opts.ignore_case_insensitive,
-                );
+                create_gitignore(&dir, &self.0.custom_ignore_filenames);
             errs.maybe_push(err);
             m
         };

@@ -240,12 +233,7 @@ impl Ignore {
         if !self.0.opts.ignore {
             Gitignore::empty()
         } else {
-            let (m, err) =
-                create_gitignore(
-                    &dir,
-                    &[".ignore"],
-                    self.0.opts.ignore_case_insensitive,
-                );
+            let (m, err) = create_gitignore(&dir, &[".ignore"]);
             errs.maybe_push(err);
             m
         };

@@ -253,12 +241,7 @@ impl Ignore {
         if !self.0.opts.git_ignore {
             Gitignore::empty()
         } else {
-            let (m, err) =
-                create_gitignore(
-                    &dir,
-                    &[".gitignore"],
-                    self.0.opts.ignore_case_insensitive,
-                );
+            let (m, err) = create_gitignore(&dir, &[".gitignore"]);
             errs.maybe_push(err);
             m
         };

@@ -266,12 +249,7 @@ impl Ignore {
         if !self.0.opts.git_exclude {
             Gitignore::empty()
         } else {
-            let (m, err) =
-                create_gitignore(
-                    &dir,
-                    &[".git/info/exclude"],
-                    self.0.opts.ignore_case_insensitive,
-                );
+            let (m, err) = create_gitignore(&dir, &[".git/info/exclude"]);
             errs.maybe_push(err);
             m
         };

@@ -307,23 +285,11 @@ impl Ignore {
             || has_explicit_ignores
     }

-    /// Like `matched`, but works with a directory entry instead.
-    pub fn matched_dir_entry<'a>(
-        &'a self,
-        dent: &DirEntry,
-    ) -> Match<IgnoreMatch<'a>> {
-        let m = self.matched(dent.path(), dent.is_dir());
-        if m.is_none() && self.0.opts.hidden && is_hidden(dent) {
-            return Match::Ignore(IgnoreMatch::hidden());
-        }
-        m
-    }
-
     /// Returns a match indicating whether the given file path should be
     /// ignored or not.
     ///
     /// The match contains information about its origin.
-    fn matched<'a, P: AsRef<Path>>(
+    pub fn matched<'a, P: AsRef<Path>>(
         &'a self,
         path: P,
         is_dir: bool,

@@ -364,6 +330,9 @@ impl Ignore {
                 whitelisted = mat;
             }
         }
+        if whitelisted.is_none() && self.0.opts.hidden && is_hidden(path) {
+            return Match::Ignore(IgnoreMatch::hidden());
+        }
         whitelisted
     }


@@ -514,7 +483,6 @@ impl IgnoreBuilder {
                 git_global: true,
                 git_ignore: true,
                 git_exclude: true,
-                ignore_case_insensitive: false,
             },
         }
     }

@@ -528,11 +496,7 @@ impl IgnoreBuilder {
         if !self.opts.git_global {
             Gitignore::empty()
         } else {
-            let mut builder = GitignoreBuilder::new("");
-            builder
-                .case_insensitive(self.opts.ignore_case_insensitive)
-                .unwrap();
-            let (gi, err) = builder.build_global();
+            let (gi, err) = Gitignore::global();
             if let Some(err) = err {
                 debug!("{}", err);
             }

@@ -663,17 +627,6 @@ impl IgnoreBuilder {
         self.opts.git_exclude = yes;
         self
     }
-
-    /// Process ignore files case insensitively
-    ///
-    /// This is disabled by default.
-    pub fn ignore_case_insensitive(
-        &mut self,
-        yes: bool,
-    ) -> &mut IgnoreBuilder {
-        self.opts.ignore_case_insensitive = yes;
-        self
-    }
 }

 /// Creates a new gitignore matcher for the directory given.

@@ -685,11 +638,9 @@ impl IgnoreBuilder {
 pub fn create_gitignore<T: AsRef<OsStr>>(
     dir: &Path,
     names: &[T],
-    case_insensitive: bool,
 ) -> (Gitignore, Option<Error>) {
     let mut builder = GitignoreBuilder::new(dir);
     let mut errs = PartialErrorBuilder::default();
-    builder.case_insensitive(case_insensitive).unwrap();
     for name in names {
         let gipath = dir.join(name.as_ref());
         errs.maybe_push_ignore_io(builder.add(gipath));

@@ -710,7 +661,7 @@ mod tests {
     use std::io::Write;
     use std::path::Path;

-    use tempfile::{self, TempDir};
+    use tempdir::TempDir;

     use dir::IgnoreBuilder;
     use gitignore::Gitignore;

@@ -732,13 +683,9 @@ mod tests {
         }
     }

-    fn tmpdir(prefix: &str) -> TempDir {
-        tempfile::Builder::new().prefix(prefix).tempdir().unwrap()
-    }
-
     #[test]
     fn explicit_ignore() {
-        let td = tmpdir("ignore-test-");
+        let td = TempDir::new("ignore-test-").unwrap();
         wfile(td.path().join("not-an-ignore"), "foo\n!bar");

         let (gi, err) = Gitignore::new(td.path().join("not-an-ignore"));

@@ -753,7 +700,7 @@ mod tests {

     #[test]
     fn git_exclude() {
-        let td = tmpdir("ignore-test-");
+        let td = TempDir::new("ignore-test-").unwrap();
         mkdirp(td.path().join(".git/info"));
         wfile(td.path().join(".git/info/exclude"), "foo\n!bar");


@@ -766,7 +713,7 @@ mod tests {

     #[test]
     fn gitignore() {
-        let td = tmpdir("ignore-test-");
+        let td = TempDir::new("ignore-test-").unwrap();
         mkdirp(td.path().join(".git"));
         wfile(td.path().join(".gitignore"), "foo\n!bar");


@@ -779,7 +726,7 @@ mod tests {

     #[test]
     fn gitignore_no_git() {
-        let td = tmpdir("ignore-test-");
+        let td = TempDir::new("ignore-test-").unwrap();
         wfile(td.path().join(".gitignore"), "foo\n!bar");

         let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());

@@ -791,7 +738,7 @@ mod tests {

     #[test]
     fn ignore() {
-        let td = tmpdir("ignore-test-");
+        let td = TempDir::new("ignore-test-").unwrap();
         wfile(td.path().join(".ignore"), "foo\n!bar");

         let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());

@@ -803,7 +750,7 @@ mod tests {

     #[test]
     fn custom_ignore() {
-        let td = tmpdir("ignore-test-");
+        let td = TempDir::new("ignore-test-").unwrap();
         let custom_ignore = ".customignore";
         wfile(td.path().join(custom_ignore), "foo\n!bar");


@@ -819,7 +766,7 @@ mod tests {
     // Tests that a custom ignore file will override an .ignore.
     #[test]
     fn custom_ignore_over_ignore() {
-        let td = tmpdir("ignore-test-");
+        let td = TempDir::new("ignore-test-").unwrap();
         let custom_ignore = ".customignore";
         wfile(td.path().join(".ignore"), "foo");
         wfile(td.path().join(custom_ignore), "!foo");

@@ -834,7 +781,7 @@ mod tests {
     // Tests that earlier custom ignore files have lower precedence than later.
     #[test]
     fn custom_ignore_precedence() {
-        let td = tmpdir("ignore-test-");
+        let td = TempDir::new("ignore-test-").unwrap();
         let custom_ignore1 = ".customignore1";
         let custom_ignore2 = ".customignore2";
         wfile(td.path().join(custom_ignore1), "foo");

@@ -851,7 +798,7 @@ mod tests {
     // Tests that an .ignore will override a .gitignore.
     #[test]
     fn ignore_over_gitignore() {
-        let td = tmpdir("ignore-test-");
+        let td = TempDir::new("ignore-test-").unwrap();
         wfile(td.path().join(".gitignore"), "foo");
         wfile(td.path().join(".ignore"), "!foo");


@@ -863,7 +810,7 @@ mod tests {
     // Tests that exclude has lower precedent than both .ignore and .gitignore.
     #[test]
     fn exclude_lowest() {
-        let td = tmpdir("ignore-test-");
+        let td = TempDir::new("ignore-test-").unwrap();
         wfile(td.path().join(".gitignore"), "!foo");
         wfile(td.path().join(".ignore"), "!bar");
         mkdirp(td.path().join(".git/info"));
@@ -878,8 +825,8 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn errored() {
|
fn errored() {
|
||||||
let td = tmpdir("ignore-test-");
|
let td = TempDir::new("ignore-test-").unwrap();
|
||||||
wfile(td.path().join(".gitignore"), "{foo");
|
wfile(td.path().join(".gitignore"), "f**oo");
|
||||||
|
|
||||||
let (_, err) = IgnoreBuilder::new().build().add_child(td.path());
|
let (_, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||||
assert!(err.is_some());
|
assert!(err.is_some());
|
||||||
@@ -887,9 +834,9 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn errored_both() {
|
fn errored_both() {
|
||||||
let td = tmpdir("ignore-test-");
|
let td = TempDir::new("ignore-test-").unwrap();
|
||||||
wfile(td.path().join(".gitignore"), "{foo");
|
wfile(td.path().join(".gitignore"), "f**oo");
|
||||||
wfile(td.path().join(".ignore"), "{bar");
|
wfile(td.path().join(".ignore"), "fo**o");
|
||||||
|
|
||||||
let (_, err) = IgnoreBuilder::new().build().add_child(td.path());
|
let (_, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||||
assert_eq!(2, partial(err.expect("an error")).len());
|
assert_eq!(2, partial(err.expect("an error")).len());
|
||||||
@@ -897,9 +844,9 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn errored_partial() {
|
fn errored_partial() {
|
||||||
let td = tmpdir("ignore-test-");
|
let td = TempDir::new("ignore-test-").unwrap();
|
||||||
mkdirp(td.path().join(".git"));
|
mkdirp(td.path().join(".git"));
|
||||||
wfile(td.path().join(".gitignore"), "{foo\nbar");
|
wfile(td.path().join(".gitignore"), "f**oo\nbar");
|
||||||
|
|
||||||
let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());
|
let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||||
assert!(err.is_some());
|
assert!(err.is_some());
|
||||||
@@ -908,8 +855,8 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn errored_partial_and_ignore() {
|
fn errored_partial_and_ignore() {
|
||||||
let td = tmpdir("ignore-test-");
|
let td = TempDir::new("ignore-test-").unwrap();
|
||||||
wfile(td.path().join(".gitignore"), "{foo\nbar");
|
wfile(td.path().join(".gitignore"), "f**oo\nbar");
|
||||||
wfile(td.path().join(".ignore"), "!bar");
|
wfile(td.path().join(".ignore"), "!bar");
|
||||||
|
|
||||||
let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());
|
let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||||
@@ -919,7 +866,7 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn not_present_empty() {
|
fn not_present_empty() {
|
||||||
let td = tmpdir("ignore-test-");
|
let td = TempDir::new("ignore-test-").unwrap();
|
||||||
|
|
||||||
let (_, err) = IgnoreBuilder::new().build().add_child(td.path());
|
let (_, err) = IgnoreBuilder::new().build().add_child(td.path());
|
||||||
assert!(err.is_none());
|
assert!(err.is_none());
|
||||||
@@ -929,7 +876,7 @@ mod tests {
|
|||||||
fn stops_at_git_dir() {
|
fn stops_at_git_dir() {
|
||||||
// This tests that .gitignore files beyond a .git barrier aren't
|
// This tests that .gitignore files beyond a .git barrier aren't
|
||||||
// matched, but .ignore files are.
|
// matched, but .ignore files are.
|
||||||
let td = tmpdir("ignore-test-");
|
let td = TempDir::new("ignore-test-").unwrap();
|
||||||
mkdirp(td.path().join(".git"));
|
mkdirp(td.path().join(".git"));
|
||||||
mkdirp(td.path().join("foo/.git"));
|
mkdirp(td.path().join("foo/.git"));
|
||||||
wfile(td.path().join(".gitignore"), "foo");
|
wfile(td.path().join(".gitignore"), "foo");
|
||||||
@@ -950,7 +897,7 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn absolute_parent() {
|
fn absolute_parent() {
|
||||||
let td = tmpdir("ignore-test-");
|
let td = TempDir::new("ignore-test-").unwrap();
|
||||||
mkdirp(td.path().join(".git"));
|
mkdirp(td.path().join(".git"));
|
||||||
mkdirp(td.path().join("foo"));
|
mkdirp(td.path().join("foo"));
|
||||||
wfile(td.path().join(".gitignore"), "bar");
|
wfile(td.path().join(".gitignore"), "bar");
|
||||||
@@ -973,7 +920,7 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn absolute_parent_anchored() {
|
fn absolute_parent_anchored() {
|
||||||
let td = tmpdir("ignore-test-");
|
let td = TempDir::new("ignore-test-").unwrap();
|
||||||
mkdirp(td.path().join(".git"));
|
mkdirp(td.path().join(".git"));
|
||||||
mkdirp(td.path().join("src/llvm"));
|
mkdirp(td.path().join("src/llvm"));
|
||||||
wfile(td.path().join(".gitignore"), "/llvm/\nfoo");
|
wfile(td.path().join(".gitignore"), "/llvm/\nfoo");
|
||||||
|
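The precedence behavior these tests exercise can be sketched as a standalone function. This is a simplified sketch using exact-name matching only, not the crate's glob engine: rules are applied in order (later ignore files come later in the list, so they take precedence), and a leading `!` whitelists a previously ignored name.

```rust
// Simplified model of ignore-rule precedence: the last matching rule wins,
// and a `!`-prefixed rule re-includes (whitelists) a name.
fn is_ignored(rules: &[&str], name: &str) -> bool {
    let mut ignored = false;
    for rule in rules {
        if let Some(pat) = rule.strip_prefix('!') {
            if pat == name {
                // A whitelist rule un-ignores the name.
                ignored = false;
            }
        } else if *rule == name {
            ignored = true;
        }
    }
    ignored
}

fn main() {
    // .ignore says "foo"; a later custom ignore file says "!foo":
    // the later whitelist wins, mirroring the tests above.
    assert!(is_ignored(&["foo"], "foo"));
    assert!(!is_ignored(&["foo", "!foo"], "foo"));
    assert!(!is_ignored(&["!foo"], "bar"));
}
```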
@@ -69,7 +69,8 @@ impl Glob {

     /// Returns true if and only if this glob has a `**/` prefix.
     fn has_doublestar_prefix(&self) -> bool {
-        self.actual.starts_with("**/") || self.actual == "**"
+        self.actual.starts_with("**/")
+            || (self.actual == "**" && self.is_only_dir)
     }
 }

@@ -126,7 +127,16 @@ impl Gitignore {
     /// `$XDG_CONFIG_HOME/git/ignore` is read. If `$XDG_CONFIG_HOME` is not
     /// set or is empty, then `$HOME/.config/git/ignore` is used instead.
     pub fn global() -> (Gitignore, Option<Error>) {
-        GitignoreBuilder::new("").build_global()
+        match gitconfig_excludes_path() {
+            None => (Gitignore::empty(), None),
+            Some(path) => {
+                if !path.is_file() {
+                    (Gitignore::empty(), None)
+                } else {
+                    Gitignore::new(path)
+                }
+            }
+        }
     }

     /// Creates a new empty gitignore matcher that never matches anything.
@@ -349,36 +359,6 @@ impl GitignoreBuilder {
         })
     }

-    /// Build a global gitignore matcher using the configuration in this
-    /// builder.
-    ///
-    /// This consumes ownership of the builder unlike `build` because it
-    /// must mutate the builder to add the global gitignore globs.
-    ///
-    /// Note that this ignores the path given to this builder's constructor
-    /// and instead derives the path automatically from git's global
-    /// configuration.
-    pub fn build_global(mut self) -> (Gitignore, Option<Error>) {
-        match gitconfig_excludes_path() {
-            None => (Gitignore::empty(), None),
-            Some(path) => {
-                if !path.is_file() {
-                    (Gitignore::empty(), None)
-                } else {
-                    let mut errs = PartialErrorBuilder::default();
-                    errs.maybe_push_ignore_io(self.add(path));
-                    match self.build() {
-                        Ok(gi) => (gi, errs.into_error_option()),
-                        Err(err) => {
-                            errs.push(err);
-                            (Gitignore::empty(), errs.into_error_option())
-                        }
-                    }
-                }
-            }
-        }
-    }

     /// Add each glob from the file path given.
     ///
     /// The file given should be formatted as a `gitignore` file.
@@ -439,8 +419,6 @@ impl GitignoreBuilder {
         from: Option<PathBuf>,
         mut line: &str,
     ) -> Result<&mut GitignoreBuilder, Error> {
-        #![allow(deprecated)]
-
         if line.starts_with("#") {
             return Ok(self);
         }
@@ -457,6 +435,7 @@ impl GitignoreBuilder {
             is_whitelist: false,
             is_only_dir: false,
         };
+        let mut literal_separator = false;
         let mut is_absolute = false;
         if line.starts_with("\\!") || line.starts_with("\\#") {
             line = &line[1..];
@@ -471,6 +450,7 @@ impl GitignoreBuilder {
             // then the glob can only match the beginning of a path
             // (relative to the location of gitignore). We achieve this by
             // simply banning wildcards from matching /.
+            literal_separator = true;
             line = &line[1..];
             is_absolute = true;
         }
@@ -483,11 +463,16 @@ impl GitignoreBuilder {
                 line = &line[..i];
             }
         }
+        // If there is a literal slash, then we note that so that globbing
+        // doesn't let wildcards match slashes.
         glob.actual = line.to_string();
-        // If there is a literal slash, then this is a glob that must match the
-        // entire path name. Otherwise, we should let it match anywhere, so use
-        // a **/ prefix.
-        if !is_absolute && !line.chars().any(|c| c == '/') {
+        if is_absolute || line.chars().any(|c| c == '/') {
+            literal_separator = true;
+        }
+        // If there was a slash, then this is a glob that must match the entire
+        // path name. Otherwise, we should let it match anywhere, so use a **/
+        // prefix.
+        if !literal_separator {
             // ... but only if we don't already have a **/ prefix.
             if !glob.has_doublestar_prefix() {
                 glob.actual = format!("**/{}", glob.actual);
@@ -501,7 +486,7 @@ impl GitignoreBuilder {
         }
         let parsed =
             GlobBuilder::new(&glob.actual)
-                .literal_separator(true)
+                .literal_separator(literal_separator)
                 .case_insensitive(self.case_insensitive)
                 .backslash_escape(true)
                 .build()
@@ -518,16 +503,12 @@ impl GitignoreBuilder {

     /// Toggle whether the globs should be matched case insensitively or not.
     ///
-    /// When this option is changed, only globs added after the change will be
-    /// affected.
+    /// When this option is changed, only globs added after the change will be affected.
     ///
     /// This is disabled by default.
     pub fn case_insensitive(
-        &mut self,
-        yes: bool,
+        &mut self, yes: bool
     ) -> Result<&mut GitignoreBuilder, Error> {
-        // TODO: This should not return a `Result`. Fix this in the next semver
-        // release.
         self.case_insensitive = yes;
         Ok(self)
     }
@@ -708,9 +689,6 @@ mod tests {
     ignored!(ig39, ROOT, "\\?", "?");
     ignored!(ig40, ROOT, "\\*", "*");
     ignored!(ig41, ROOT, "\\a", "a");
-    ignored!(ig42, ROOT, "s*.rs", "sfoo.rs");
-    ignored!(ig43, ROOT, "**", "foo.rs");
-    ignored!(ig44, ROOT, "**/**/*", "a/foo.rs");

     not_ignored!(ignot1, ROOT, "amonths", "months");
     not_ignored!(ignot2, ROOT, "monthsa", "months");
@@ -732,7 +710,6 @@ mod tests {
     not_ignored!(ignot16, ROOT, "*\n!**/", "foo", true);
     not_ignored!(ignot17, ROOT, "src/*.rs", "src/grep/src/main.rs");
     not_ignored!(ignot18, ROOT, "path1/*", "path2/path1/foo");
-    not_ignored!(ignot19, ROOT, "s*.rs", "src/foo.rs");

     fn bytes(s: &str) -> Vec<u8> {
         s.to_string().into_bytes()
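The normalization in the hunks above can be sketched as a standalone function. This is a simplified sketch (not the crate's real parser): a comment line is skipped, a leading `/` anchors the glob, any slash forces the glob engine to treat `/` as a literal separator, and a slash-free pattern gets a `**/` prefix so it matches at any depth.

```rust
// Sketch of gitignore line normalization. Returns the rewritten glob and
// whether wildcards should be forbidden from matching `/`.
fn normalize(mut line: &str) -> Option<(String, bool)> {
    if line.is_empty() || line.starts_with('#') {
        return None; // comments and blanks produce no glob
    }
    let mut literal_separator = false;
    if line.starts_with('/') {
        // Anchored to the directory containing the gitignore file.
        literal_separator = true;
        line = &line[1..];
    }
    if line.contains('/') {
        literal_separator = true;
    }
    let actual = if !literal_separator && !line.starts_with("**/") {
        // No slash anywhere: let the glob match at any depth.
        format!("**/{}", line)
    } else {
        line.to_string()
    };
    Some((actual, literal_separator))
}

fn main() {
    assert_eq!(normalize("foo"), Some(("**/foo".to_string(), false)));
    assert_eq!(normalize("/foo"), Some(("foo".to_string(), true)));
    assert_eq!(normalize("src/*.rs"), Some(("src/*.rs".to_string(), true)));
    assert_eq!(normalize("# comment"), None);
}
```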
@@ -56,7 +56,7 @@ extern crate memchr;
 extern crate regex;
 extern crate same_file;
 #[cfg(test)]
-extern crate tempfile;
+extern crate tempdir;
 extern crate thread_local;
 extern crate walkdir;
 #[cfg(windows)]
@@ -139,16 +139,13 @@ impl OverrideBuilder {
     }

     /// Toggle whether the globs should be matched case insensitively or not.
     ///
     /// When this option is changed, only globs added after the change will be affected.
     ///
     /// This is disabled by default.
     pub fn case_insensitive(
-        &mut self,
-        yes: bool,
+        &mut self, yes: bool
     ) -> Result<&mut OverrideBuilder, Error> {
-        // TODO: This should not return a `Result`. Fix this in the next semver
-        // release.
         self.builder.case_insensitive(yes)?;
         Ok(self)
     }
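The doc comment above says the flag only affects globs added after the change. That order-sensitivity can be sketched with a hypothetical builder (the types below are illustrative, not the crate's API): each added glob captures the flag's value at add time.

```rust
// Hypothetical builder demonstrating an order-sensitive flag: globs added
// before `case_insensitive(true)` stay case sensitive.
struct Builder {
    case_insensitive: bool,
    globs: Vec<(String, bool)>, // (pattern, case-insensitive at add time)
}

impl Builder {
    fn new() -> Builder {
        Builder { case_insensitive: false, globs: Vec::new() }
    }
    fn case_insensitive(&mut self, yes: bool) -> &mut Builder {
        self.case_insensitive = yes;
        self
    }
    fn add(&mut self, pat: &str) -> &mut Builder {
        // Snapshot the flag as it is *now*.
        self.globs.push((pat.to_string(), self.case_insensitive));
        self
    }
}

fn main() {
    let mut b = Builder::new();
    b.add("*.md").case_insensitive(true).add("*.rs");
    assert_eq!(b.globs[0], ("*.md".to_string(), false)); // added before the flag
    assert_eq!(b.globs[1], ("*.rs".to_string(), true)); // added after
}
```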
@@ -1,56 +1,22 @@
 use std::ffi::OsStr;
 use std::path::Path;

-use walk::DirEntry;
-
-/// Returns true if and only if this entry is considered to be hidden.
-///
-/// This only returns true if the base name of the path starts with a `.`.
-///
-/// On Unix, this implements a more optimized check.
+/// Returns true if and only if this file path is considered to be hidden.
 #[cfg(unix)]
-pub fn is_hidden(dent: &DirEntry) -> bool {
+pub fn is_hidden<P: AsRef<Path>>(path: P) -> bool {
     use std::os::unix::ffi::OsStrExt;

-    if let Some(name) = file_name(dent.path()) {
+    if let Some(name) = file_name(path.as_ref()) {
         name.as_bytes().get(0) == Some(&b'.')
     } else {
         false
     }
 }

-/// Returns true if and only if this entry is considered to be hidden.
-///
-/// On Windows, this returns true if one of the following is true:
-///
-/// * The base name of the path starts with a `.`.
-/// * The file attributes have the `HIDDEN` property set.
-#[cfg(windows)]
-pub fn is_hidden(dent: &DirEntry) -> bool {
-    use std::os::windows::fs::MetadataExt;
-    use winapi_util::file;
-
-    // This looks like we're doing an extra stat call, but on Windows, the
-    // directory traverser reuses the metadata retrieved from each directory
-    // entry and stores it on the DirEntry itself. So this is "free."
-    if let Ok(md) = dent.metadata() {
-        if file::is_hidden(md.file_attributes() as u64) {
-            return true;
-        }
-    }
-    if let Some(name) = file_name(dent.path()) {
-        name.to_str().map(|s| s.starts_with(".")).unwrap_or(false)
-    } else {
-        false
-    }
-}
-
-/// Returns true if and only if this entry is considered to be hidden.
-///
-/// This only returns true if the base name of the path starts with a `.`.
-#[cfg(not(any(unix, windows)))]
-pub fn is_hidden(dent: &DirEntry) -> bool {
-    if let Some(name) = file_name(dent.path()) {
+/// Returns true if and only if this file path is considered to be hidden.
+#[cfg(not(unix))]
+pub fn is_hidden<P: AsRef<Path>>(path: P) -> bool {
+    if let Some(name) = file_name(path.as_ref()) {
         name.to_str().map(|s| s.starts_with(".")).unwrap_or(false)
     } else {
         false
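The portable variant in the hunk above reduces to a basename check. A minimal self-contained sketch using only the standard library:

```rust
use std::path::Path;

// A path is "hidden" when its final component starts with `.`.
// (The Unix version in the diff does the same check on raw bytes.)
fn is_hidden(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map(|s| s.starts_with('.'))
        .unwrap_or(false)
}

fn main() {
    assert!(is_hidden(Path::new("/home/user/.gitignore")));
    assert!(!is_hidden(Path::new("/home/user/src/main.rs")));
    // A path with no final component is not hidden.
    assert!(!is_hidden(Path::new("/")));
}
```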
@@ -103,15 +103,12 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("amake", &["*.mk", "*.bp"]),
     ("asciidoc", &["*.adoc", "*.asc", "*.asciidoc"]),
     ("asm", &["*.asm", "*.s", "*.S"]),
-    ("asp", &["*.aspx", "*.aspx.cs", "*.aspx.cs", "*.ascx", "*.ascx.cs", "*.ascx.vb"]),
     ("avro", &["*.avdl", "*.avpr", "*.avsc"]),
     ("awk", &["*.awk"]),
-    ("bazel", &["*.bzl", "WORKSPACE", "BUILD", "BUILD.bazel"]),
+    ("bazel", &["*.bzl", "WORKSPACE", "BUILD"]),
     ("bitbake", &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
-    ("brotli", &["*.br"]),
-    ("buildstream", &["*.bst"]),
-    ("bzip2", &["*.bz2", "*.tbz2"]),
-    ("c", &["*.[chH]", "*.[chH].in", "*.cats"]),
+    ("bzip2", &["*.bz2"]),
+    ("c", &["*.c", "*.h", "*.H", "*.cats"]),
     ("cabal", &["*.cabal"]),
     ("cbor", &["*.cbor"]),
     ("ceylon", &["*.ceylon"]),
@@ -121,8 +118,8 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("creole", &["*.creole"]),
     ("config", &["*.cfg", "*.conf", "*.config", "*.ini"]),
     ("cpp", &[
-        "*.[ChH]", "*.cc", "*.[ch]pp", "*.[ch]xx", "*.hh", "*.inl",
-        "*.[ChH].in", "*.cc.in", "*.[ch]pp.in", "*.[ch]xx.in", "*.hh.in",
+        "*.C", "*.cc", "*.cpp", "*.cxx",
+        "*.h", "*.H", "*.hh", "*.hpp", "*.hxx", "*.inl",
     ]),
     ("crystal", &["Projectfile", "*.cr"]),
     ("cs", &["*.cs"]),
@@ -130,7 +127,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("cshtml", &["*.cshtml"]),
     ("css", &["*.css", "*.scss"]),
     ("csv", &["*.csv"]),
-    ("cython", &["*.pyx", "*.pxi", "*.pxd"]),
+    ("cython", &["*.pyx"]),
     ("dart", &["*.dart"]),
     ("d", &["*.d"]),
     ("dhall", &["*.dhall"]),
@@ -148,7 +145,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("fsharp", &["*.fs", "*.fsx", "*.fsi"]),
     ("gn", &["*.gn", "*.gni"]),
     ("go", &["*.go"]),
-    ("gzip", &["*.gz", "*.tgz"]),
+    ("gzip", &["*.gz"]),
     ("groovy", &["*.groovy", "*.gradle"]),
     ("h", &["*.h", "*.hpp"]),
     ("hbs", &["*.hbs"]),
@@ -156,7 +153,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("hs", &["*.hs", "*.lhs"]),
     ("html", &["*.htm", "*.html", "*.ejs"]),
     ("idris", &["*.idr", "*.lidr"]),
-    ("java", &["*.java", "*.jsp", "*.jspx", "*.properties"]),
+    ("java", &["*.java", "*.jsp"]),
     ("jinja", &["*.j2", "*.jinja", "*.jinja2"]),
     ("js", &[
         "*.js", "*.jsx", "*.vue",
@@ -196,16 +193,14 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
         "OFL-*[0-9]*",
     ]),
     ("lisp", &["*.el", "*.jl", "*.lisp", "*.lsp", "*.sc", "*.scm"]),
-    ("lock", &["*.lock", "package-lock.json"]),
     ("log", &["*.log"]),
     ("lua", &["*.lua"]),
     ("lzma", &["*.lzma"]),
     ("lz4", &["*.lz4"]),
     ("m4", &["*.ac", "*.m4"]),
     ("make", &[
-        "[Gg][Nn][Uu]makefile", "[Mm]akefile",
-        "[Gg][Nn][Uu]makefile.am", "[Mm]akefile.am",
-        "[Gg][Nn][Uu]makefile.in", "[Mm]akefile.in",
+        "gnumakefile", "Gnumakefile", "GNUmakefile",
+        "makefile", "Makefile",
         "*.mk", "*.mak"
     ]),
     ("mako", &["*.mako", "*.mao"]),
@@ -229,14 +224,12 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("pdf", &["*.pdf"]),
     ("php", &["*.php", "*.php3", "*.php4", "*.php5", "*.phtml"]),
     ("pod", &["*.pod"]),
-    ("postscript", &[".eps", ".ps"]),
     ("protobuf", &["*.proto"]),
     ("ps", &["*.cdxml", "*.ps1", "*.ps1xml", "*.psd1", "*.psm1"]),
     ("puppet", &["*.erb", "*.pp", "*.rb"]),
     ("purs", &["*.purs"]),
     ("py", &["*.py"]),
     ("qmake", &["*.pro", "*.pri", "*.prf"]),
-    ("qml", &["*.qml"]),
     ("readme", &["README*", "*README"]),
     ("r", &["*.R", "*.r", "*.Rmd", "*.Rnw"]),
     ("rdoc", &["*.rdoc"]),
@@ -285,9 +278,8 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ]),
     ("taskpaper", &["*.taskpaper"]),
     ("tcl", &["*.tcl"]),
-    ("tex", &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib", "*.dtx", "*.ins"]),
+    ("tex", &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib"]),
     ("textile", &["*.textile"]),
-    ("thrift", &["*.thrift"]),
     ("tf", &["*.tf"]),
     ("ts", &["*.ts", "*.tsx"]),
     ("txt", &["*.txt"]),
@@ -301,14 +293,10 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("vimscript", &["*.vim"]),
     ("wiki", &["*.mediawiki", "*.wiki"]),
     ("webidl", &["*.idl", "*.webidl", "*.widl"]),
-    ("xml", &[
-        "*.xml", "*.xml.dist", "*.dtd", "*.xsl", "*.xslt", "*.xsd", "*.xjb",
-        "*.rng", "*.sch",
-    ]),
-    ("xz", &["*.xz", "*.txz"]),
+    ("xml", &["*.xml", "*.xml.dist"]),
+    ("xz", &["*.xz"]),
     ("yacc", &["*.y"]),
     ("yaml", &["*.yaml", "*.yml"]),
-    ("zig", &["*.zig"]),
     ("zsh", &[
         ".zshenv", "zshenv",
         ".zlogin", "zlogin",
@@ -317,7 +305,6 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
         ".zshrc", "zshrc",
         "*.zsh",
     ]),
-    ("zstd", &["*.zst", "*.zstd"]),
 ];

 /// Glob represents a single glob in a set of file type definitions.
@@ -356,18 +343,6 @@ impl<'a> Glob<'a> {
     fn unmatched() -> Glob<'a> {
         Glob(GlobInner::UnmatchedIgnore)
     }

-    /// Return the file type defintion that matched, if one exists. A file type
-    /// definition always exists when a specific definition matches a file
-    /// path.
-    pub fn file_type_def(&self) -> Option<&FileTypeDef> {
-        match self {
-            Glob(GlobInner::UnmatchedIgnore) => None,
-            Glob(GlobInner::Matched { def, .. }) => {
-                Some(def)
-            },
-        }
-    }
 }

 /// A single file type definition.
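A table like `DEFAULT_TYPES` maps a type name to a list of glob patterns. How such a table is consulted can be sketched with suffix matching only (the real crate compiles every pattern into a glob set; the two-entry table below is illustrative):

```rust
// Tiny stand-in for a DEFAULT_TYPES-style table.
const TYPES: &[(&str, &[&str])] = &[
    ("rust", &["*.rs"]),
    ("yaml", &["*.yaml", "*.yml"]),
];

// Return the first type whose patterns match the file name.
// Only `*suffix` and exact-name patterns are handled here.
fn file_type(name: &str) -> Option<&'static str> {
    for &(ty, globs) in TYPES {
        for glob in globs {
            if let Some(suffix) = glob.strip_prefix('*') {
                if name.ends_with(suffix) {
                    return Some(ty);
                }
            } else if *glob == name {
                return Some(ty);
            }
        }
    }
    None
}

fn main() {
    assert_eq!(file_type("main.rs"), Some("rust"));
    assert_eq!(file_type("config.yml"), Some("yaml"));
    assert_eq!(file_type("README.md"), None);
}
```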
@@ -99,7 +99,7 @@ impl DirEntry {
     }

     /// Returns true if and only if this entry points to a directory.
-    pub(crate) fn is_dir(&self) -> bool {
+    fn is_dir(&self) -> bool {
         self.dent.is_dir()
     }

@@ -764,14 +764,6 @@ impl WalkBuilder {
         self
     }

-    /// Process ignore files case insensitively
-    ///
-    /// This is disabled by default.
-    pub fn ignore_case_insensitive(&mut self, yes: bool) -> &mut WalkBuilder {
-        self.ig_builder.ignore_case_insensitive(yes);
-        self
-    }
-
     /// Set a function for sorting directory entries by their path.
     ///
     /// If a compare function is set, the resulting iterator will return all
@@ -883,17 +875,16 @@ impl Walk {
                 return Ok(true);
             }
         }
-        if should_skip_entry(&self.ig, ent) {
-            return Ok(true);
-        }
-        if self.max_filesize.is_some() && !ent.is_dir() {
-            return Ok(skip_filesize(
-                self.max_filesize.unwrap(),
-                ent.path(),
-                &ent.metadata().ok(),
-            ));
-        }
-        Ok(false)
+        let is_dir = ent.file_type().map_or(false, |ft| ft.is_dir());
+        let max_size = self.max_filesize;
+        let should_skip_path = skip_path(&self.ig, ent.path(), is_dir);
+        let should_skip_filesize = if !is_dir && max_size.is_some() {
+            skip_filesize(max_size.unwrap(), ent.path(), &ent.metadata().ok())
+        } else {
+            false
+        };
+        Ok(should_skip_path || should_skip_filesize)
     }
 }

@@ -1127,7 +1118,7 @@ impl WalkParallel {
                 dent: dent,
                 ignore: self.ig_root.clone(),
                 root_device: root_device,
-            })).unwrap();
+            }));
             any_work = true;
         }
         // ... but there's no need to start workers if we don't need them.
@@ -1421,11 +1412,13 @@ impl Worker {
                 return WalkState::Continue;
             }
         }
-        let should_skip_path = should_skip_entry(ig, &dent);
+        let is_dir = dent.is_dir();
+        let max_size = self.max_filesize;
+        let should_skip_path = skip_path(ig, dent.path(), is_dir);
         let should_skip_filesize =
-            if self.max_filesize.is_some() && !dent.is_dir() {
+            if !is_dir && max_size.is_some() {
                 skip_filesize(
-                    self.max_filesize.unwrap(),
+                    max_size.unwrap(),
                     dent.path(),
                     &dent.metadata().ok(),
                 )
@@ -1438,7 +1431,7 @@ impl Worker {
                 dent: dent,
                 ignore: ig.clone(),
                 root_device: root_device,
-            })).unwrap();
+            }));
         }
         WalkState::Continue
     }
@@ -1453,12 +1446,12 @@ impl Worker {
             return None;
         }
         match self.rx.try_recv() {
-            Ok(Message::Work(work)) => {
+            Some(Message::Work(work)) => {
                 self.waiting(false);
                 self.quitting(false);
                 return Some(work);
             }
-            Ok(Message::Quit) => {
+            Some(Message::Quit) => {
                 // We can't just quit because a Message::Quit could be
                 // spurious. For example, it's possible to observe that
                 // all workers are waiting even if there's more work to
@@ -1489,12 +1482,12 @@ impl Worker {
                     // Otherwise, spin.
                 }
             }
-            Err(_) => {
+            None => {
                 self.waiting(true);
                 self.quitting(false);
                 if self.num_waiting() == self.threads {
                     for _ in 0..self.threads {
-                        self.tx.send(Message::Quit).unwrap();
+                        self.tx.send(Message::Quit);
                     }
                 } else {
                     // You're right to consider this suspicious, but it's
@@ -1608,16 +1601,17 @@ fn skip_filesize(
     }
 }

-fn should_skip_entry(
+fn skip_path(
     ig: &Ignore,
-    dent: &DirEntry,
+    path: &Path,
+    is_dir: bool,
 ) -> bool {
-    let m = ig.matched_dir_entry(dent);
+    let m = ig.matched(path, is_dir);
     if m.is_ignore() {
-        debug!("ignoring {}: {:?}", dent.path().display(), m);
+        debug!("ignoring {}: {:?}", path.display(), m);
         true
     } else if m.is_whitelist() {
-        debug!("whitelisting {}: {:?}", dent.path().display(), m);
+        debug!("whitelisting {}: {:?}", path.display(), m);
         false
     } else {
         false
@@ -1708,7 +1702,7 @@ mod tests {
     use std::path::Path;
     use std::sync::{Arc, Mutex};

-    use tempfile::{self, TempDir};
+    use tempdir::TempDir;

     use super::{DirEntry, WalkBuilder, WalkState};

@@ -1795,10 +1789,6 @@ mod tests {
         paths
     }

-    fn tmpdir(prefix: &str) -> TempDir {
-        tempfile::Builder::new().prefix(prefix).tempdir().unwrap()
-    }
-
     fn assert_paths(
         prefix: &Path,
         builder: &WalkBuilder,
@@ -1812,7 +1802,7 @@ mod tests {

     #[test]
     fn no_ignores() {
-        let td = tmpdir("walk-test-");
+        let td = TempDir::new("walk-test-").unwrap();
         mkdirp(td.path().join("a/b/c"));
         mkdirp(td.path().join("x/y"));
         wfile(td.path().join("a/b/foo"), "");
@@ -1825,7 +1815,7 @@ mod tests {

     #[test]
     fn custom_ignore() {
-        let td = tmpdir("walk-test-");
+        let td = TempDir::new("walk-test-").unwrap();
         let custom_ignore = ".customignore";
         mkdirp(td.path().join("a"));
         wfile(td.path().join(custom_ignore), "foo");
@@ -1841,7 +1831,7 @@ mod tests {

     #[test]
     fn custom_ignore_exclusive_use() {
-        let td = tmpdir("walk-test-");
+        let td = TempDir::new("walk-test-").unwrap();
         let custom_ignore = ".customignore";
         mkdirp(td.path().join("a"));
         wfile(td.path().join(custom_ignore), "foo");
@@ -1861,7 +1851,7 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn gitignore() {
|
fn gitignore() {
|
||||||
let td = tmpdir("walk-test-");
|
let td = TempDir::new("walk-test-").unwrap();
|
||||||
mkdirp(td.path().join(".git"));
|
mkdirp(td.path().join(".git"));
|
||||||
mkdirp(td.path().join("a"));
|
mkdirp(td.path().join("a"));
|
||||||
wfile(td.path().join(".gitignore"), "foo");
|
wfile(td.path().join(".gitignore"), "foo");
|
||||||
@@ -1877,7 +1867,7 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn explicit_ignore() {
|
fn explicit_ignore() {
|
||||||
let td = tmpdir("walk-test-");
|
let td = TempDir::new("walk-test-").unwrap();
|
||||||
let igpath = td.path().join(".not-an-ignore");
|
let igpath = td.path().join(".not-an-ignore");
|
||||||
mkdirp(td.path().join("a"));
|
mkdirp(td.path().join("a"));
|
||||||
wfile(&igpath, "foo");
|
wfile(&igpath, "foo");
|
||||||
@@ -1893,7 +1883,7 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn explicit_ignore_exclusive_use() {
|
fn explicit_ignore_exclusive_use() {
|
||||||
let td = tmpdir("walk-test-");
|
let td = TempDir::new("walk-test-").unwrap();
|
||||||
let igpath = td.path().join(".not-an-ignore");
|
let igpath = td.path().join(".not-an-ignore");
|
||||||
mkdirp(td.path().join("a"));
|
mkdirp(td.path().join("a"));
|
||||||
wfile(&igpath, "foo");
|
wfile(&igpath, "foo");
|
||||||
@@ -1911,7 +1901,7 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn gitignore_parent() {
|
fn gitignore_parent() {
|
||||||
let td = tmpdir("walk-test-");
|
let td = TempDir::new("walk-test-").unwrap();
|
||||||
mkdirp(td.path().join(".git"));
|
mkdirp(td.path().join(".git"));
|
||||||
mkdirp(td.path().join("a"));
|
mkdirp(td.path().join("a"));
|
||||||
wfile(td.path().join(".gitignore"), "foo");
|
wfile(td.path().join(".gitignore"), "foo");
|
||||||
@@ -1924,7 +1914,7 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn max_depth() {
|
fn max_depth() {
|
||||||
let td = tmpdir("walk-test-");
|
let td = TempDir::new("walk-test-").unwrap();
|
||||||
mkdirp(td.path().join("a/b/c"));
|
mkdirp(td.path().join("a/b/c"));
|
||||||
wfile(td.path().join("foo"), "");
|
wfile(td.path().join("foo"), "");
|
||||||
wfile(td.path().join("a/foo"), "");
|
wfile(td.path().join("a/foo"), "");
|
||||||
@@ -1944,7 +1934,7 @@ mod tests {
|
|||||||
|
|
||||||
#[test]
|
#[test]
|
||||||
fn max_filesize() {
|
fn max_filesize() {
|
||||||
let td = tmpdir("walk-test-");
|
let td = TempDir::new("walk-test-").unwrap();
|
||||||
mkdirp(td.path().join("a/b"));
|
mkdirp(td.path().join("a/b"));
|
||||||
wfile_size(td.path().join("foo"), 0);
|
wfile_size(td.path().join("foo"), 0);
|
||||||
wfile_size(td.path().join("bar"), 400);
|
wfile_size(td.path().join("bar"), 400);
|
||||||
@@ -1971,7 +1961,7 @@ mod tests {
|
|||||||
#[cfg(unix)] // because symlinks on windows are weird
|
#[cfg(unix)] // because symlinks on windows are weird
|
||||||
#[test]
|
#[test]
|
||||||
fn symlinks() {
|
fn symlinks() {
|
||||||
let td = tmpdir("walk-test-");
|
let td = TempDir::new("walk-test-").unwrap();
|
||||||
mkdirp(td.path().join("a/b"));
|
mkdirp(td.path().join("a/b"));
|
||||||
symlink(td.path().join("a/b"), td.path().join("z"));
|
symlink(td.path().join("a/b"), td.path().join("z"));
|
||||||
wfile(td.path().join("a/b/foo"), "");
|
wfile(td.path().join("a/b/foo"), "");
|
||||||
@@ -1988,7 +1978,7 @@ mod tests {
|
|||||||
#[cfg(unix)] // because symlinks on windows are weird
|
#[cfg(unix)] // because symlinks on windows are weird
|
||||||
#[test]
|
#[test]
|
||||||
fn first_path_not_symlink() {
|
fn first_path_not_symlink() {
|
||||||
let td = tmpdir("walk-test-");
|
let td = TempDir::new("walk-test-").unwrap();
|
||||||
mkdirp(td.path().join("foo"));
|
mkdirp(td.path().join("foo"));
|
||||||
|
|
||||||
let dents = WalkBuilder::new(td.path().join("foo"))
|
let dents = WalkBuilder::new(td.path().join("foo"))
|
||||||
@@ -2009,7 +1999,7 @@ mod tests {
|
|||||||
#[cfg(unix)] // because symlinks on windows are weird
|
#[cfg(unix)] // because symlinks on windows are weird
|
||||||
#[test]
|
#[test]
|
||||||
fn symlink_loop() {
|
fn symlink_loop() {
|
||||||
let td = tmpdir("walk-test-");
|
let td = TempDir::new("walk-test-").unwrap();
|
||||||
mkdirp(td.path().join("a/b"));
|
mkdirp(td.path().join("a/b"));
|
||||||
symlink(td.path().join("a"), td.path().join("a/b/c"));
|
symlink(td.path().join("a"), td.path().join("a/b/c"));
|
||||||
|
|
||||||
@@ -2039,7 +2029,7 @@ mod tests {
|
|||||||
|
|
||||||
// If our test directory actually isn't a different volume from /sys,
|
// If our test directory actually isn't a different volume from /sys,
|
||||||
// then this test is meaningless and we shouldn't run it.
|
// then this test is meaningless and we shouldn't run it.
|
||||||
let td = tmpdir("walk-test-");
|
let td = TempDir::new("walk-test-").unwrap();
|
||||||
if device_num(td.path()).unwrap() == device_num("/sys").unwrap() {
|
if device_num(td.path()).unwrap() == device_num("/sys").unwrap() {
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
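The Worker hunk above broadcasts one `Message::Quit` sentinel per thread once every worker is observed waiting. That one-sentinel-per-worker shutdown pattern can be sketched with std-only types; the `Message` enum and `run` function below are illustrative, not ripgrep's actual internals:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Work items plus a `Quit` sentinel, mirroring the `Message` enum in the hunk.
enum Message {
    Work(u64),
    Quit,
}

fn run(threads: usize) -> u64 {
    let (tx, rx) = mpsc::channel::<Message>();
    // std's Receiver is not Clone, so workers share it behind a Mutex.
    let rx = Arc::new(Mutex::new(rx));
    let total = Arc::new(Mutex::new(0u64));

    for n in 0..16 {
        tx.send(Message::Work(n)).unwrap();
    }
    // One Quit per worker: each worker consumes exactly one sentinel and then
    // stops pulling from the queue, so every thread sees a shutdown signal.
    for _ in 0..threads {
        tx.send(Message::Quit).unwrap();
    }

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let rx = Arc::clone(&rx);
            let total = Arc::clone(&total);
            thread::spawn(move || loop {
                // The lock guard is a temporary, dropped at the end of the
                // statement, so the channel is shared fairly enough here.
                let msg = rx.lock().unwrap().recv().unwrap();
                match msg {
                    Message::Work(n) => *total.lock().unwrap() += n,
                    Message::Quit => break,
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let n = *total.lock().unwrap();
    n
}

fn main() {
    println!("{}", run(4));
}
```

Because each worker stops after its own sentinel, no thread can consume another thread's shutdown signal; the real code additionally guards against spuriously observing all workers as waiting, as the comments in the hunk discuss.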
@@ -1,14 +1,14 @@
 class RipgrepBin < Formula
-  version '0.10.0'
+  version '0.9.0'
   desc "Recursively search directories for a regex pattern."
   homepage "https://github.com/BurntSushi/ripgrep"
 
   if OS.mac?
     url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-apple-darwin.tar.gz"
-    sha256 "32754b4173ac87a7bfffd436d601a49362676eb1841ab33440f2f49c002c8967"
+    sha256 "36003ea8b62ad6274dc14140039f448cdf5026827d53cf24dad2d84005557a8c"
   elsif OS.linux?
     url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-unknown-linux-musl.tar.gz"
-    sha256 "c76080aa807a339b44139885d77d15ad60ab8cdd2c2fdaf345d0985625bc0f97"
+    sha256 "2eb4443e58f95051ff76ea036ed1faf940d5a04af4e7ff5a7dbd74576b907e99"
   end
 
   conflicts_with "ripgrep"

@@ -1 +0,0 @@
-disable_all_formatting = true

249 src/app.rs
@@ -9,8 +9,7 @@
 // is where we read clap's configuration from the end user's arguments and turn
 // it into a ripgrep-specific configuration type that is not coupled with clap.
 
-use clap::{self, App, AppSettings, crate_authors, crate_version};
-use lazy_static::lazy_static;
+use clap::{self, App, AppSettings};
 
 const ABOUT: &str = "
 ripgrep (rg) recursively searches your current directory for a regex pattern.
@@ -27,9 +26,6 @@ configuration file. The file can specify one shell argument per line. Lines
 starting with '#' are ignored. For more details, see the man page or the
 README.
 
-Tip: to disable all smart filtering and make ripgrep behave a bit more like
-classical grep, use 'rg -uuu'.
-
 Project home page: https://github.com/BurntSushi/ripgrep
 
 Use -h for short descriptions and --help for more details.";
@@ -547,9 +543,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
     // flags are hidden and merely mentioned in the docs of the corresponding
     // "positive" flag.
     flag_after_context(&mut args);
-    flag_auto_hybrid_regex(&mut args);
     flag_before_context(&mut args);
-    flag_binary(&mut args);
     flag_block_buffered(&mut args);
     flag_byte_offset(&mut args);
     flag_case_sensitive(&mut args);
@@ -576,14 +570,12 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
     flag_iglob(&mut args);
     flag_ignore_case(&mut args);
     flag_ignore_file(&mut args);
-    flag_ignore_file_case_insensitive(&mut args);
     flag_invert_match(&mut args);
     flag_json(&mut args);
     flag_line_buffered(&mut args);
     flag_line_number(&mut args);
     flag_line_regexp(&mut args);
     flag_max_columns(&mut args);
-    flag_max_columns_preview(&mut args);
     flag_max_count(&mut args);
     flag_max_depth(&mut args);
     flag_max_filesize(&mut args);
@@ -592,7 +584,6 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
     flag_multiline_dotall(&mut args);
     flag_no_config(&mut args);
     flag_no_ignore(&mut args);
-    flag_no_ignore_dot(&mut args);
     flag_no_ignore_global(&mut args);
     flag_no_ignore_messages(&mut args);
     flag_no_ignore_parent(&mut args);
@@ -606,7 +597,6 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
     flag_path_separator(&mut args);
     flag_passthru(&mut args);
     flag_pcre2(&mut args);
-    flag_pcre2_version(&mut args);
     flag_pre(&mut args);
     flag_pre_glob(&mut args);
     flag_pretty(&mut args);
@@ -653,7 +643,7 @@ will be provided. Namely, the following is equivalent to the above:
     let arg = RGArg::positional("pattern", "PATTERN")
         .help(SHORT).long_help(LONG)
         .required_unless(&[
-            "file", "files", "regexp", "type-list", "pcre2-version",
+            "file", "files", "regexp", "type-list",
         ]);
     args.push(arg);
 }
@@ -684,50 +674,6 @@ This overrides the --context flag.
     args.push(arg);
 }
 
-fn flag_auto_hybrid_regex(args: &mut Vec<RGArg>) {
-    const SHORT: &str = "Dynamically use PCRE2 if necessary.";
-    const LONG: &str = long!("\
-When this flag is used, ripgrep will dynamically choose between supported regex
-engines depending on the features used in a pattern. When ripgrep chooses a
-regex engine, it applies that choice for every regex provided to ripgrep (e.g.,
-via multiple -e/--regexp or -f/--file flags).
-
-As an example of how this flag might behave, ripgrep will attempt to use
-its default finite automata based regex engine whenever the pattern can be
-successfully compiled with that regex engine. If PCRE2 is enabled and if the
-pattern given could not be compiled with the default regex engine, then PCRE2
-will be automatically used for searching. If PCRE2 isn't available, then this
-flag has no effect because there is only one regex engine to choose from.
-
-In the future, ripgrep may adjust its heuristics for how it decides which
-regex engine to use. In general, the heuristics will be limited to a static
-analysis of the patterns, and not to any specific runtime behavior observed
-while searching files.
-
-The primary downside of using this flag is that it may not always be obvious
-which regex engine ripgrep uses, and thus, the match semantics or performance
-profile of ripgrep may subtly and unexpectedly change. However, in many cases,
-all regex engines will agree on what constitutes a match and it can be nice
-to transparently support more advanced regex features like look-around and
-backreferences without explicitly needing to enable them.
-
-This flag can be disabled with --no-auto-hybrid-regex.
-");
-    let arg = RGArg::switch("auto-hybrid-regex")
-        .help(SHORT).long_help(LONG)
-        .overrides("no-auto-hybrid-regex")
-        .overrides("pcre2")
-        .overrides("no-pcre2");
-    args.push(arg);
-
-    let arg = RGArg::switch("no-auto-hybrid-regex")
-        .hidden()
-        .overrides("auto-hybrid-regex")
-        .overrides("pcre2")
-        .overrides("no-pcre2");
-    args.push(arg);
-}
-
 fn flag_before_context(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Show NUM lines before each match.";
     const LONG: &str = long!("\
@@ -742,55 +688,6 @@ This overrides the --context flag.
     args.push(arg);
 }
 
-fn flag_binary(args: &mut Vec<RGArg>) {
-    const SHORT: &str = "Search binary files.";
-    const LONG: &str = long!("\
-Enabling this flag will cause ripgrep to search binary files. By default,
-ripgrep attempts to automatically skip binary files in order to improve the
-relevance of results and make the search faster.
-
-Binary files are heuristically detected based on whether they contain a NUL
-byte or not. By default (without this flag set), once a NUL byte is seen,
-ripgrep will stop searching the file. Usually, NUL bytes occur in the beginning
-of most binary files. If a NUL byte occurs after a match, then ripgrep will
-still stop searching the rest of the file, but a warning will be printed.
-
-In contrast, when this flag is provided, ripgrep will continue searching a file
-even if a NUL byte is found. In particular, if a NUL byte is found then ripgrep
-will continue searching until either a match is found or the end of the file is
-reached, whichever comes sooner. If a match is found, then ripgrep will stop
-and print a warning saying that the search stopped prematurely.
-
-If you want ripgrep to search a file without any special NUL byte handling at
-all (and potentially print binary data to stdout), then you should use the
-'-a/--text' flag.
-
-The '--binary' flag is a flag for controlling ripgrep's automatic filtering
-mechanism. As such, it does not need to be used when searching a file
-explicitly or when searching stdin. That is, it is only applicable when
-recursively searching a directory.
-
-Note that when the '-u/--unrestricted' flag is provided for a third time, then
-this flag is automatically enabled.
-
-This flag can be disabled with '--no-binary'. It overrides the '-a/--text'
-flag.
-");
-    let arg = RGArg::switch("binary")
-        .help(SHORT).long_help(LONG)
-        .overrides("no-binary")
-        .overrides("text")
-        .overrides("no-text");
-    args.push(arg);
-
-    let arg = RGArg::switch("no-binary")
-        .hidden()
-        .overrides("binary")
-        .overrides("text")
-        .overrides("no-text");
-    args.push(arg);
-}
-
 fn flag_block_buffered(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Force block buffering.";
     const LONG: &str = long!("\
@@ -891,17 +788,17 @@ to one of eight choices: red, blue, green, cyan, magenta, yellow, white and
 black. Styles are limited to nobold, bold, nointense, intense, nounderline
 or underline.
 
-The format of the flag is '{type}:{attribute}:{value}'. '{type}' should be
-one of path, line, column or match. '{attribute}' can be fg, bg or style.
-'{value}' is either a color (for fg and bg) or a text style. A special format,
-'{type}:none', will clear all color settings for '{type}'.
+The format of the flag is `{type}:{attribute}:{value}`. `{type}` should be
+one of path, line, column or match. `{attribute}` can be fg, bg or style.
+`{value}` is either a color (for fg and bg) or a text style. A special format,
+`{type}:none`, will clear all color settings for `{type}`.
 
 For example, the following command will change the match color to magenta and
 the background color for line numbers to yellow:
 
     rg --colors 'match:fg:magenta' --colors 'line:bg:yellow' foo.
 
-Extended colors can be used for '{value}' when the terminal supports ANSI color
+Extended colors can be used for `{value}` when the terminal supports ANSI color
 sequences. These are specified as either 'x' (256-color) or 'x,x,x' (24-bit
 truecolor) where x is a number between 0 and 255 inclusive. x may be given as
 a normal decimal number or a hexadecimal number, which is prefixed by `0x`.
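The `--colors` hunk above only swaps quote styles in the help text; the `{type}:{attribute}:{value}` format it documents can be parsed with a couple of string splits. A hypothetical sketch (the `ColorSpec` type and `parse_color_spec` function are illustrative, not ripgrep's actual spec parser):

```rust
// Parse a --colors specification of the form `{type}:{attribute}:{value}`,
// e.g. `match:fg:magenta`, plus the special `{type}:none` form that clears
// all color settings for an output type. Illustrative only.
#[derive(Debug, PartialEq)]
enum ColorSpec {
    Clear { otype: String },
    Set { otype: String, attr: String, value: String },
}

fn parse_color_spec(spec: &str) -> Result<ColorSpec, String> {
    let parts: Vec<&str> = spec.split(':').collect();
    match parts.as_slice() {
        [otype, "none"] => Ok(ColorSpec::Clear { otype: otype.to_string() }),
        [otype, attr, value] => {
            // `{type}` is one of path, line, column or match; `{attribute}`
            // is fg, bg or style, per the help text in the hunk.
            if !["path", "line", "column", "match"].contains(otype) {
                return Err(format!("unrecognized output type: {}", otype));
            }
            if !["fg", "bg", "style"].contains(attr) {
                return Err(format!("unrecognized attribute: {}", attr));
            }
            Ok(ColorSpec::Set {
                otype: otype.to_string(),
                attr: attr.to_string(),
                value: value.to_string(),
            })
        }
        _ => Err(format!("invalid color spec: {}", spec)),
    }
}

fn main() {
    println!("{:?}", parse_color_spec("match:fg:magenta").unwrap());
    println!("{:?}", parse_color_spec("line:none").unwrap());
}
```

Validating the type and attribute names up front lets a bad flag value fail with a targeted message instead of being silently ignored.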
@@ -1082,17 +979,10 @@ fn flag_encoding(args: &mut Vec<RGArg>) {
|
|||||||
const LONG: &str = long!("\
|
const LONG: &str = long!("\
|
||||||
Specify the text encoding that ripgrep will use on all files searched. The
|
Specify the text encoding that ripgrep will use on all files searched. The
|
||||||
default value is 'auto', which will cause ripgrep to do a best effort automatic
|
default value is 'auto', which will cause ripgrep to do a best effort automatic
|
||||||
detection of encoding on a per-file basis. Automatic detection in this case
|
detection of encoding on a per-file basis. Other supported values can be found
|
||||||
only applies to files that begin with a UTF-8 or UTF-16 byte-order mark (BOM).
|
in the list of labels here:
|
||||||
No other automatic detection is performed. One can also specify 'none' which
|
|
||||||
will then completely disable BOM sniffing and always result in searching the
|
|
||||||
raw bytes, including a BOM if it's present, regardless of its encoding.
|
|
||||||
|
|
||||||
Other supported values can be found in the list of labels here:
|
|
||||||
https://encoding.spec.whatwg.org/#concept-encoding-get
|
https://encoding.spec.whatwg.org/#concept-encoding-get
|
||||||
|
|
||||||
For more details on encoding and how ripgrep deals with it, see GUIDE.md.
|
|
||||||
|
|
||||||
This flag can be disabled with --no-encoding.
|
This flag can be disabled with --no-encoding.
|
||||||
");
|
");
|
||||||
let arg = RGArg::flag("encoding", "ENCODING").short("E")
|
let arg = RGArg::flag("encoding", "ENCODING").short("E")
|
||||||
@@ -1126,7 +1016,7 @@ fn flag_files(args: &mut Vec<RGArg>) {
|
|||||||
const SHORT: &str = "Print each file that would be searched.";
|
const SHORT: &str = "Print each file that would be searched.";
|
||||||
const LONG: &str = long!("\
|
const LONG: &str = long!("\
|
||||||
Print each file that would be searched without actually performing the search.
|
Print each file that would be searched without actually performing the search.
|
||||||
This is useful to determine whether a particular file is being searched or not.
|
This is useful to determine whether a particular file is being search or not.
|
||||||
");
|
");
|
||||||
let arg = RGArg::switch("files")
|
let arg = RGArg::switch("files")
|
||||||
.help(SHORT).long_help(LONG)
|
.help(SHORT).long_help(LONG)
|
||||||
@@ -1318,26 +1208,6 @@ directly on the command line, then used -g instead.
|
|||||||
args.push(arg);
|
args.push(arg);
|
||||||
}
|
}
|
||||||
|
|
||||||
fn flag_ignore_file_case_insensitive(args: &mut Vec<RGArg>) {
|
|
||||||
const SHORT: &str = "Process ignore files case insensitively.";
|
|
||||||
const LONG: &str = long!("\
|
|
||||||
Process ignore files (.gitignore, .ignore, etc.) case insensitively. Note that
|
|
||||||
this comes with a performance penalty and is most useful on case insensitive
|
|
||||||
file systems (such as Windows).
|
|
||||||
|
|
||||||
This flag can be disabled with the --no-ignore-file-case-insensitive flag.
|
|
||||||
");
|
|
||||||
let arg = RGArg::switch("ignore-file-case-insensitive")
|
|
||||||
.help(SHORT).long_help(LONG)
|
|
||||||
.overrides("no-ignore-file-case-insensitive");
|
|
||||||
args.push(arg);
|
|
||||||
|
|
||||||
let arg = RGArg::switch("no-ignore-file-case-insensitive")
|
|
||||||
.hidden()
|
|
||||||
.overrides("ignore-file-case-insensitive");
|
|
||||||
args.push(arg);
|
|
||||||
}
|
|
||||||
|
|
||||||
fn flag_invert_match(args: &mut Vec<RGArg>) {
|
fn flag_invert_match(args: &mut Vec<RGArg>) {
|
||||||
const SHORT: &str = "Invert matching.";
|
const SHORT: &str = "Invert matching.";
|
||||||
const LONG: &str = long!("\
|
const LONG: &str = long!("\
|
||||||
@@ -1490,30 +1360,6 @@ When this flag is omitted or is set to 0, then it has no effect.
|
|||||||
args.push(arg);
|
args.push(arg);
|
||||||
}
|
}
|
||||||
|
|
||||||
fn flag_max_columns_preview(args: &mut Vec<RGArg>) {
|
|
||||||
const SHORT: &str = "Print a preview for lines exceeding the limit.";
|
|
||||||
const LONG: &str = long!("\
|
|
||||||
When the '--max-columns' flag is used, ripgrep will by default completely
|
|
||||||
replace any line that is too long with a message indicating that a matching
|
|
||||||
line was removed. When this flag is combined with '--max-columns', a preview
|
|
||||||
of the line (corresponding to the limit size) is shown instead, where the part
|
|
||||||
of the line exceeding the limit is not shown.
|
|
||||||
|
|
||||||
If the '--max-columns' flag is not set, then this has no effect.
|
|
||||||
|
|
||||||
This flag can be disabled with '--no-max-columns-preview'.
|
|
||||||
");
|
|
||||||
let arg = RGArg::switch("max-columns-preview")
|
|
||||||
.help(SHORT).long_help(LONG)
|
|
||||||
.overrides("no-max-columns-preview");
|
|
||||||
args.push(arg);
|
|
||||||
|
|
||||||
let arg = RGArg::switch("no-max-columns-preview")
|
|
||||||
.hidden()
|
|
||||||
.overrides("max-columns-preview");
|
|
||||||
args.push(arg);
|
|
||||||
}
|
|
||||||
|
|
||||||
fn flag_max_count(args: &mut Vec<RGArg>) {
|
fn flag_max_count(args: &mut Vec<RGArg>) {
|
||||||
const SHORT: &str = "Limit the number of matches.";
|
const SHORT: &str = "Limit the number of matches.";
|
||||||
const LONG: &str = long!("\
|
const LONG: &str = long!("\
|
||||||
@@ -1689,7 +1535,7 @@ fn flag_no_ignore(args: &mut Vec<RGArg>) {
|
|||||||
const SHORT: &str = "Don't respect ignore files.";
|
const SHORT: &str = "Don't respect ignore files.";
|
||||||
const LONG: &str = long!("\
|
const LONG: &str = long!("\
|
||||||
Don't respect ignore files (.gitignore, .ignore, etc.). This implies
|
Don't respect ignore files (.gitignore, .ignore, etc.). This implies
|
||||||
--no-ignore-parent, --no-ignore-dot and --no-ignore-vcs.
|
--no-ignore-parent and --no-ignore-vcs.
|
||||||
|
|
||||||
This flag can be disabled with the --ignore flag.
|
This flag can be disabled with the --ignore flag.
|
||||||
");
|
");
|
||||||
@@ -1704,24 +1550,6 @@ This flag can be disabled with the --ignore flag.
|
|||||||
args.push(arg);
|
args.push(arg);
|
||||||
}
|
}
|
||||||
|
|
||||||
fn flag_no_ignore_dot(args: &mut Vec<RGArg>) {
|
|
||||||
const SHORT: &str = "Don't respect .ignore files.";
|
|
||||||
const LONG: &str = long!("\
|
|
||||||
Don't respect .ignore files.
|
|
||||||
|
|
||||||
This flag can be disabled with the --ignore-dot flag.
|
|
||||||
");
|
|
||||||
let arg = RGArg::switch("no-ignore-dot")
|
|
||||||
.help(SHORT).long_help(LONG)
|
|
||||||
.overrides("ignore-dot");
|
|
||||||
args.push(arg);
|
|
||||||
|
|
||||||
let arg = RGArg::switch("ignore-dot")
|
|
||||||
.hidden()
|
|
||||||
.overrides("no-ignore-dot");
|
|
||||||
args.push(arg);
|
|
||||||
}
|
|
||||||
|
|
||||||
fn flag_no_ignore_global(args: &mut Vec<RGArg>) {
|
fn flag_no_ignore_global(args: &mut Vec<RGArg>) {
|
||||||
const SHORT: &str = "Don't respect global ignore files.";
|
const SHORT: &str = "Don't respect global ignore files.";
|
||||||
const LONG: &str = long!("\
|
const LONG: &str = long!("\
|
||||||
@@ -1983,28 +1811,12 @@ This flag can be disabled with --no-pcre2.
|
|||||||
");
|
");
|
||||||
let arg = RGArg::switch("pcre2").short("P")
|
let arg = RGArg::switch("pcre2").short("P")
|
||||||
.help(SHORT).long_help(LONG)
|
.help(SHORT).long_help(LONG)
|
||||||
.overrides("no-pcre2")
|
.overrides("no-pcre2");
|
||||||
.overrides("auto-hybrid-regex")
|
|
||||||
.overrides("no-auto-hybrid-regex");
|
|
||||||
args.push(arg);
|
args.push(arg);
|
||||||
|
|
||||||
let arg = RGArg::switch("no-pcre2")
|
let arg = RGArg::switch("no-pcre2")
|
||||||
.hidden()
|
.hidden()
|
||||||
.overrides("pcre2")
|
.overrides("pcre2");
|
||||||
.overrides("auto-hybrid-regex")
|
|
||||||
.overrides("no-auto-hybrid-regex");
|
|
||||||
args.push(arg);
|
|
||||||
}
|
|
||||||
|
|
||||||
fn flag_pcre2_version(args: &mut Vec<RGArg>) {
|
|
||||||
const SHORT: &str = "Print the version of PCRE2 that ripgrep uses.";
|
|
||||||
const LONG: &str = long!("\
|
|
||||||
When this flag is present, ripgrep will print the version of PCRE2 in use,
|
|
||||||
along with other information, and then exit. If PCRE2 is not available, then
|
|
||||||
ripgrep will print an error message and exit with an error code.
|
|
||||||
");
|
|
||||||
let arg = RGArg::switch("pcre2-version")
|
|
||||||
.help(SHORT).long_help(LONG);
|
|
||||||
args.push(arg);
|
args.push(arg);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -2014,13 +1826,12 @@ fn flag_pre(args: &mut Vec<RGArg>) {
|
|||||||
For each input FILE, search the standard output of COMMAND FILE rather than the
|
For each input FILE, search the standard output of COMMAND FILE rather than the
|
||||||
contents of FILE. This option expects the COMMAND program to either be an
|
contents of FILE. This option expects the COMMAND program to either be an
|
||||||
absolute path or to be available in your PATH. Either an empty string COMMAND
|
absolute path or to be available in your PATH. Either an empty string COMMAND
|
||||||
or the '--no-pre' flag will disable this behavior.
|
or the `--no-pre` flag will disable this behavior.
|
||||||
|
|
||||||
WARNING: When this flag is set, ripgrep will unconditionally spawn a
|
WARNING: When this flag is set, ripgrep will unconditionally spawn a
|
||||||
process for every file that is searched. Therefore, this can incur an
|
process for every file that is searched. Therefore, this can incur an
|
||||||
unnecessarily large performance penalty if you don't otherwise need the
|
unnecessarily large performance penalty if you don't otherwise need the
|
||||||
flexibility offered by this flag. One possible mitigation to this is to use
|
flexibility offered by this flag.
|
||||||
the '--pre-glob' flag to limit which files a preprocessor is run with.
|
|
||||||
|
|
||||||
A preprocessor is not run when ripgrep is searching stdin.
|
A preprocessor is not run when ripgrep is searching stdin.
|
||||||
|
|
||||||
@@ -2187,9 +1998,9 @@ This flag can be used with the -o/--only-matching flag.
|
|||||||
fn flag_search_zip(args: &mut Vec<RGArg>) {
|
fn flag_search_zip(args: &mut Vec<RGArg>) {
|
||||||
const SHORT: &str = "Search in compressed files.";
|
const SHORT: &str = "Search in compressed files.";
|
||||||
const LONG: &str = long!("\
|
const LONG: &str = long!("\
|
||||||
Search in compressed files. Currently gzip, bzip2, xz, LZ4, LZMA, Brotli and
|
Search in compressed files. Currently gz, bz2, xz, lzma and lz4 files are
|
||||||
Zstd files are supported. This option expects the decompression binaries to be
|
supported. This option expects the decompression binaries to be available in
|
||||||
available in your PATH.
|
your PATH.
|
||||||
|
|
||||||
This flag can be disabled with --no-search-zip.
|
This flag can be disabled with --no-search-zip.
|
||||||
");
|
");
|
||||||
@@ -2256,7 +2067,7 @@ for this flag are:
     path      Sort by file path.
     modified  Sort by the last modified time on a file.
     accessed  Sort by the last accessed time on a file.
-    created   Sort by the creation time on a file.
+    created   Sort by the cretion time on a file.
     none      Do not sort results.
 
 If the sorting criteria isn't available on your system (for example, creation
@@ -2289,7 +2100,7 @@ for this flag are:
     path      Sort by file path.
     modified  Sort by the last modified time on a file.
     accessed  Sort by the last accessed time on a file.
-    created   Sort by the creation time on a file.
+    created   Sort by the cretion time on a file.
     none      Do not sort results.
 
 If the sorting criteria isn't available on your system (for example, creation
@@ -2349,23 +2160,20 @@ escape codes to be printed that alter the behavior of your terminal.
 When binary file detection is enabled it is imperfect. In general, it uses
 a simple heuristic. If a NUL byte is seen during search, then the file is
 considered binary and search stops (unless this flag is present).
-Alternatively, if the '--binary' flag is used, then ripgrep will only quit
-when it sees a NUL byte after it sees a match (or searches the entire file).
 
-This flag can be disabled with '--no-text'. It overrides the '--binary' flag.
+Note that when the `-u/--unrestricted` flag is provided for a third time, then
+this flag is automatically enabled.
+
+This flag can be disabled with --no-text.
 ");
     let arg = RGArg::switch("text").short("a")
         .help(SHORT).long_help(LONG)
-        .overrides("no-text")
-        .overrides("binary")
-        .overrides("no-binary");
+        .overrides("no-text");
     args.push(arg);
 
     let arg = RGArg::switch("no-text")
         .hidden()
-        .overrides("text")
-        .overrides("binary")
-        .overrides("no-binary");
+        .overrides("text");
     args.push(arg);
 }
 
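The NUL-byte heuristic this help text describes can be sketched in plain Rust (an illustrative helper, not ripgrep's actual code):

```rust
/// Illustrative sketch of the heuristic above: a buffer is treated as
/// binary as soon as it contains a NUL byte.
fn looks_binary(buf: &[u8]) -> bool {
    buf.contains(&0u8)
}
```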
@@ -2494,7 +2302,8 @@ Reduce the level of \"smart\" searching. A single -u won't respect .gitignore
 (etc.) files. Two -u flags will additionally search hidden files and
 directories. Three -u flags will additionally search binary files.
 
-'rg -uuu' is roughly equivalent to 'grep -r'.
+-uu is roughly equivalent to grep -r and -uuu is roughly equivalent to grep -a
+-r.
 ");
     let arg = RGArg::switch("unrestricted").short("u")
         .help(SHORT).long_help(LONG)
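The escalation described above (one, two, or three `-u` flags) amounts to three independent booleans; a minimal sketch (hypothetical helper, not ripgrep's API):

```rust
/// Maps the number of -u/--unrestricted flags to (no_ignore, hidden,
/// binary): one -u stops respecting ignore files, two also search
/// hidden entries, three also search binary files.
fn unrestricted(count: u32) -> (bool, bool, bool) {
    (count >= 1, count >= 2, count >= 3)
}
```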
@@ -2536,7 +2345,7 @@ ripgrep is explicitly instructed to search one file or stdin.
 
 This flag overrides --with-filename.
 ");
-    let arg = RGArg::switch("no-filename").short("I")
+    let arg = RGArg::switch("no-filename")
         .help(NO_SHORT).long_help(NO_LONG)
         .overrides("with-filename");
     args.push(arg);
src/args.rs | 307
@@ -1,10 +1,9 @@
 use std::cmp;
 use std::env;
-use std::ffi::{OsStr, OsString};
+use std::ffi::OsStr;
 use std::fs;
-use std::io::{self, Write};
+use std::io;
 use std::path::{Path, PathBuf};
-use std::process;
 use std::sync::Arc;
 use std::time::SystemTime;
 
@@ -35,22 +34,20 @@ use ignore::types::{FileTypeDef, Types, TypesBuilder};
 use ignore::{Walk, WalkBuilder, WalkParallel};
 use log;
 use num_cpus;
+use path_printer::{PathPrinter, PathPrinterBuilder};
 use regex;
 use termcolor::{
     WriteColor,
     BufferWriter, ColorChoice,
 };
 
-use crate::app;
-use crate::config;
-use crate::logger::Logger;
-use crate::messages::{set_messages, set_ignore_messages};
-use crate::path_printer::{PathPrinter, PathPrinterBuilder};
-use crate::search::{
-    PatternMatcher, Printer, SearchWorker, SearchWorkerBuilder,
-};
-use crate::subject::SubjectBuilder;
-use crate::Result;
+use app;
+use config;
+use logger::Logger;
+use messages::{set_messages, set_ignore_messages};
+use search::{PatternMatcher, Printer, SearchWorker, SearchWorkerBuilder};
+use subject::SubjectBuilder;
+use Result;
 
 /// The command that ripgrep should execute based on the command line
 /// configuration.
@@ -73,8 +70,6 @@ pub enum Command {
     /// List all file type definitions configured, including the default file
     /// types and any additional file types added to the command line.
     Types,
-    /// Print the version of PCRE2 in use.
-    PCRE2Version,
 }
 
 impl Command {
@@ -84,11 +79,7 @@ impl Command {
 
         match *self {
             Search | SearchParallel => true,
-            | SearchNever
-            | Files
-            | FilesParallel
-            | Types
-            | PCRE2Version => false,
+            SearchNever | Files | FilesParallel | Types => false,
         }
     }
 }
@@ -137,7 +128,7 @@ impl Args {
        // trying to parse config files. If a config file exists and has
        // arguments, then we re-parse argv, otherwise we just use the matches
        // we have here.
-        let early_matches = ArgMatches::new(clap_matches(env::args_os())?);
+        let early_matches = ArgMatches::new(app::app().get_matches());
         set_messages(!early_matches.is_present("no-messages"));
         set_ignore_messages(!early_matches.is_present("no-ignore-messages"));
 
@@ -152,7 +143,7 @@ impl Args {
             log::set_max_level(log::LevelFilter::Warn);
         }
 
-        let matches = early_matches.reconfigure()?;
+        let matches = early_matches.reconfigure();
         // The logging level may have changed if we brought in additional
         // arguments from a configuration file, so recheck it and set the log
         // level as appropriate.
@@ -241,9 +232,7 @@ impl Args {
         let threads = self.matches().threads()?;
         let one_thread = is_one_search || threads == 1;
 
-        Ok(if self.matches().is_present("pcre2-version") {
-            Command::PCRE2Version
-        } else if self.matches().is_present("type-list") {
+        Ok(if self.matches().is_present("type-list") {
             Command::Types
         } else if self.matches().is_present("files") {
             if one_thread {
@@ -276,11 +265,6 @@ impl Args {
         Ok(builder.build(wtr))
     }
 
-    /// Returns true if and only if ripgrep should be "quiet."
-    pub fn quiet(&self) -> bool {
-        self.matches().is_present("quiet")
-    }
-
     /// Returns true if and only if the search should quit after finding the
     /// first match.
     pub fn quit_after_match(&self) -> Result<bool> {
@@ -294,18 +278,15 @@ impl Args {
         &self,
         wtr: W,
     ) -> Result<SearchWorker<W>> {
-        let matches = self.matches();
         let matcher = self.matcher().clone();
         let printer = self.printer(wtr)?;
-        let searcher = matches.searcher(self.paths())?;
+        let searcher = self.matches().searcher(self.paths())?;
         let mut builder = SearchWorkerBuilder::new();
         builder
-            .json_stats(matches.is_present("json"))
-            .preprocessor(matches.preprocessor())
-            .preprocessor_globs(matches.preprocessor_globs()?)
-            .search_zip(matches.is_present("search-zip"))
-            .binary_detection_implicit(matches.binary_detection_implicit())
-            .binary_detection_explicit(matches.binary_detection_explicit());
+            .json_stats(self.matches().is_present("json"))
+            .preprocessor(self.matches().preprocessor())
+            .preprocessor_globs(self.matches().preprocessor_globs()?)
+            .search_zip(self.matches().is_present("search-zip"));
         Ok(builder.build(matcher, searcher, printer))
     }
 
@@ -494,37 +475,6 @@ impl SortByKind {
     }
 }
 
-/// Encoding mode the searcher will use.
-#[derive(Clone, Debug)]
-enum EncodingMode {
-    /// Use an explicit encoding forcefully, but let BOM sniffing override it.
-    Some(Encoding),
-    /// Use only BOM sniffing to auto-detect an encoding.
-    Auto,
-    /// Use no explicit encoding and disable all BOM sniffing. This will
-    /// always result in searching the raw bytes, regardless of their
-    /// true encoding.
-    Disabled,
-}
-
-impl EncodingMode {
-    /// Checks if an explicit encoding has been set. Returns false for
-    /// automatic BOM sniffing and no sniffing.
-    ///
-    /// This is only used to determine whether PCRE2 needs to have its own
-    /// UTF-8 checking enabled. If we have an explicit encoding set, then
-    /// we're always guaranteed to get UTF-8, so we can disable PCRE2's check.
-    /// Otherwise, we have no such guarantee, and must enable PCRE2' UTF-8
-    /// check.
-    #[cfg(feature = "pcre2")]
-    fn has_explicit_encoding(&self) -> bool {
-        match self {
-            EncodingMode::Some(_) => true,
-            _ => false
-        }
-    }
-}
-
 impl ArgMatches {
     /// Create an ArgMatches from clap's parse result.
     fn new(clap_matches: clap::ArgMatches<'static>) -> ArgMatches {
@@ -538,27 +488,25 @@ impl ArgMatches {
     ///
     /// If there are no additional arguments from the environment (e.g., a
     /// config file), then the given matches are returned as is.
-    fn reconfigure(self) -> Result<ArgMatches> {
+    fn reconfigure(self) -> ArgMatches {
         // If the end user says no config, then respect it.
         if self.is_present("no-config") {
-            log::debug!(
-                "not reading config files because --no-config is present"
-            );
-            return Ok(self);
+            debug!("not reading config files because --no-config is present");
+            return self;
         }
         // If the user wants ripgrep to use a config file, then parse args
         // from that first.
         let mut args = config::args();
         if args.is_empty() {
-            return Ok(self);
+            return self;
         }
         let mut cliargs = env::args_os();
         if let Some(bin) = cliargs.next() {
             args.insert(0, bin);
         }
         args.extend(cliargs);
-        log::debug!("final argv: {:?}", args);
-        Ok(ArgMatches(clap_matches(args)?))
+        debug!("final argv: {:?}", args);
+        ArgMatches::new(app::app().get_matches_from(args))
     }
 
     /// Convert the result of parsing CLI arguments into ripgrep's higher level
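The re-parse order in `reconfigure` matters: config-file arguments are inserted after the binary name but before the real command-line arguments, so explicit flags can override the config. A minimal sketch of that merge (hypothetical helper, simplified from the code above):

```rust
use std::ffi::OsString;

/// Builds the argv used for re-parsing: binary name first, then
/// config-file arguments, then the original command-line arguments.
fn merged_argv(
    bin: OsString,
    config_args: Vec<OsString>,
    cli_args: Vec<OsString>,
) -> Vec<OsString> {
    let mut args = config_args;
    args.insert(0, bin);
    args.extend(cli_args);
    args
}
```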
@@ -599,25 +547,6 @@ impl ArgMatches {
         if self.is_present("pcre2") {
             let matcher = self.matcher_pcre2(patterns)?;
             Ok(PatternMatcher::PCRE2(matcher))
-        } else if self.is_present("auto-hybrid-regex") {
-            let rust_err = match self.matcher_rust(patterns) {
-                Ok(matcher) => return Ok(PatternMatcher::RustRegex(matcher)),
-                Err(err) => err,
-            };
-            log::debug!(
-                "error building Rust regex in hybrid mode:\n{}", rust_err,
-            );
-            let pcre_err = match self.matcher_pcre2(patterns) {
-                Ok(matcher) => return Ok(PatternMatcher::PCRE2(matcher)),
-                Err(err) => err,
-            };
-            Err(From::from(format!(
-                "regex could not be compiled with either the default regex \
-                 engine or with PCRE2.\n\n\
-                 default regex engine error:\n{}\n{}\n{}\n\n\
-                 PCRE2 regex engine error:\n{}",
-                "~".repeat(79), rust_err, "~".repeat(79), pcre_err,
-            )))
         } else {
             let matcher = match self.matcher_rust(patterns) {
                 Ok(matcher) => matcher,
@@ -686,16 +615,7 @@ impl ArgMatches {
         if let Some(limit) = self.dfa_size_limit()? {
             builder.dfa_size_limit(limit);
         }
-        let res =
-            if self.is_present("fixed-strings") {
-                builder.build_literals(patterns)
-            } else {
-                builder.build(&patterns.join("|"))
-            };
-        match res {
-            Ok(m) => Ok(m),
-            Err(err) => Err(From::from(suggest_multiline(err.to_string()))),
-        }
+        Ok(builder.build(&patterns.join("|"))?)
     }
 
     /// Build a matcher using PCRE2.
@@ -712,17 +632,12 @@ impl ArgMatches {
             .word(self.is_present("word-regexp"));
         // For whatever reason, the JIT craps out during regex compilation with
         // a "no more memory" error on 32 bit systems. So don't use it there.
-        if cfg!(target_pointer_width = "64") {
-            builder
-                .jit_if_available(true)
-                // The PCRE2 docs say that 32KB is the default, and that 1MB
-                // should be big enough for anything. But let's crank it to
-                // 10MB.
-                .max_jit_stack_size(Some(10 * (1<<20)));
+        if !cfg!(target_pointer_width = "32") {
+            builder.jit(true);
         }
         if self.pcre2_unicode() {
             builder.utf(true).ucp(true);
-            if self.encoding()?.has_explicit_encoding() {
+            if self.encoding()?.is_some() {
                 // SAFETY: If an encoding was specified, then we're guaranteed
                 // to get valid UTF-8, so we can disable PCRE2's UTF checking.
                 // (Feeding invalid UTF-8 to PCRE2 is undefined behavior.)
@@ -778,7 +693,6 @@ impl ArgMatches {
             .per_match(self.is_present("vimgrep"))
             .replacement(self.replacement())
             .max_columns(self.max_columns()?)
-            .max_columns_preview(self.max_columns_preview())
             .max_matches(self.max_count()?)
             .column(self.column())
             .byte_offset(self.is_present("byte-offset"))
@@ -838,16 +752,9 @@ impl ArgMatches {
             .before_context(ctx_before)
             .after_context(ctx_after)
             .passthru(self.is_present("passthru"))
-            .memory_map(self.mmap_choice(paths));
-        match self.encoding()? {
-            EncodingMode::Some(enc) => {
-                builder.encoding(Some(enc));
-            }
-            EncodingMode::Auto => {} // default for the searcher
-            EncodingMode::Disabled => {
-                builder.bom_sniffing(false);
-            }
-        }
+            .memory_map(self.mmap_choice(paths))
+            .binary_detection(self.binary_detection())
+            .encoding(self.encoding()?);
         Ok(builder.build())
     }
 
@@ -872,16 +779,18 @@ impl ArgMatches {
             .max_filesize(self.max_file_size()?)
             .threads(self.threads()?)
             .same_file_system(self.is_present("one-file-system"))
-            .skip_stdout(!self.is_present("files"))
+            .skip_stdout(true)
             .overrides(self.overrides()?)
             .types(self.types()?)
             .hidden(!self.hidden())
             .parents(!self.no_ignore_parent())
-            .ignore(!self.no_ignore_dot())
-            .git_global(!self.no_ignore_vcs() && !self.no_ignore_global())
-            .git_ignore(!self.no_ignore_vcs())
-            .git_exclude(!self.no_ignore_vcs())
-            .ignore_case_insensitive(self.ignore_file_case_insensitive());
+            .ignore(!self.no_ignore())
+            .git_global(
+                !self.no_ignore()
+                && !self.no_ignore_vcs()
+                && !self.no_ignore_global())
+            .git_ignore(!self.no_ignore() && !self.no_ignore_vcs())
+            .git_exclude(!self.no_ignore() && !self.no_ignore_vcs());
         if !self.no_ignore() {
             builder.add_custom_ignore_filename(".rgignore");
         }
@@ -897,42 +806,19 @@ impl ArgMatches {
 ///
 /// Methods are sorted alphabetically.
 impl ArgMatches {
-    /// Returns the form of binary detection to perform on files that are
-    /// implicitly searched via recursive directory traversal.
-    fn binary_detection_implicit(&self) -> BinaryDetection {
+    /// Returns the form of binary detection to perform.
+    fn binary_detection(&self) -> BinaryDetection {
         let none =
             self.is_present("text")
+            || self.unrestricted_count() >= 3
             || self.is_present("null-data");
-        let convert =
-            self.is_present("binary")
-            || self.unrestricted_count() >= 3;
         if none {
             BinaryDetection::none()
-        } else if convert {
-            BinaryDetection::convert(b'\x00')
         } else {
             BinaryDetection::quit(b'\x00')
         }
     }
 
-    /// Returns the form of binary detection to perform on files that are
-    /// explicitly searched via the user invoking ripgrep on a particular
-    /// file or files or stdin.
-    ///
-    /// In general, this should never be BinaryDetection::quit, since that acts
-    /// as a filter (but quitting immediately once a NUL byte is seen), and we
-    /// should never filter out files that the user wants to explicitly search.
-    fn binary_detection_explicit(&self) -> BinaryDetection {
-        let none =
-            self.is_present("text")
-            || self.is_present("null-data");
-        if none {
-            BinaryDetection::none()
-        } else {
-            BinaryDetection::convert(b'\x00')
-        }
-    }
-
     /// Returns true if the command line configuration implies that a match
     /// can never be shown.
     fn can_never_match(&self, patterns: &[String]) -> bool {
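The decision this hunk changes can be modeled with a small enum (names are illustrative; the real `BinaryDetection` lives in the `grep-searcher` crate):

```rust
/// Simplified model of the new-side logic: --text, --null-data, or a
/// third -u flag disable binary detection entirely; otherwise ripgrep
/// quits on the first NUL byte.
#[derive(Debug, PartialEq)]
enum Detection {
    None,
    Quit(u8),
}

fn binary_detection(text: bool, null_data: bool, unrestricted: u32) -> Detection {
    if text || null_data || unrestricted >= 3 {
        Detection::None
    } else {
        Detection::Quit(b'\x00')
    }
}
```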
@@ -1055,30 +941,24 @@ impl ArgMatches {
         u64_to_usize("dfa-size-limit", r)
     }
 
-    /// Returns the encoding mode to use.
+    /// Returns the type of encoding to use.
     ///
-    /// This only returns an encoding if one is explicitly specified. Otherwise
-    /// if set to automatic, the Searcher will do BOM sniffing for UTF-16
-    /// and transcode seamlessly. If disabled, no BOM sniffing nor transcoding
-    /// will occur.
-    fn encoding(&self) -> Result<EncodingMode> {
+    /// This only returns an encoding if one is explicitly specified. When no
+    /// encoding is present, the Searcher will still do BOM sniffing for UTF-16
+    /// and transcode seamlessly.
+    fn encoding(&self) -> Result<Option<Encoding>> {
         if self.is_present("no-encoding") {
-            return Ok(EncodingMode::Auto);
+            return Ok(None);
         }
 
         let label = match self.value_of_lossy("encoding") {
             None if self.pcre2_unicode() => "utf-8".to_string(),
-            None => return Ok(EncodingMode::Auto),
+            None => return Ok(None),
             Some(label) => label,
         };
 
         if label == "auto" {
-            return Ok(EncodingMode::Auto);
-        } else if label == "none" {
-            return Ok(EncodingMode::Disabled);
+            return Ok(None);
         }
-        Ok(EncodingMode::Some(Encoding::new(&label)?))
+        Ok(Some(Encoding::new(&label)?))
     }
 
     /// Return the file separator to use based on the CLI configuration.
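On the new side, `encoding` collapses to three cases: no label (or "auto") means BOM sniffing only, and any other label names an explicit encoding. A sketch of just the label handling (a simplification; the real method returns `Result<Option<Encoding>>`):

```rust
/// Resolves an --encoding label: None and "auto" mean "no explicit
/// encoding" (the searcher still BOM-sniffs); anything else is taken
/// as an explicit encoding label.
fn encoding_label(label: Option<&str>) -> Option<String> {
    match label {
        None | Some("auto") => None,
        Some(other) => Some(other.to_string()),
    }
}
```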
@@ -1116,11 +996,6 @@ impl ArgMatches {
         self.is_present("hidden") || self.unrestricted_count() >= 2
     }
 
-    /// Returns true if ignore files should be processed case insensitively.
-    fn ignore_file_case_insensitive(&self) -> bool {
-        self.is_present("ignore-file-case-insensitive")
-    }
-
     /// Return all of the ignore file paths given on the command line.
     fn ignore_paths(&self) -> Vec<PathBuf> {
         let paths = match self.values_of_os("ignore-file") {
|
|||||||
Ok(self.usize_of_nonzero("max-columns")?.map(|n| n as u64))
|
Ok(self.usize_of_nonzero("max-columns")?.map(|n| n as u64))
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Returns true if and only if a preview should be shown for lines that
|
|
||||||
/// exceed the maximum column limit.
|
|
||||||
fn max_columns_preview(&self) -> bool {
|
|
||||||
self.is_present("max-columns-preview")
|
|
||||||
}
|
|
||||||
|
|
||||||
/// The maximum number of matches permitted.
|
/// The maximum number of matches permitted.
|
||||||
fn max_count(&self) -> Result<Option<u64>> {
|
fn max_count(&self) -> Result<Option<u64>> {
|
||||||
Ok(self.usize_of("max-count")?.map(|n| n as u64))
|
Ok(self.usize_of("max-count")?.map(|n| n as u64))
|
||||||
@@ -1221,11 +1090,6 @@ impl ArgMatches {
         self.is_present("no-ignore") || self.unrestricted_count() >= 1
     }
 
-    /// Returns true if .ignore files should be ignored.
-    fn no_ignore_dot(&self) -> bool {
-        self.is_present("no-ignore-dot") || self.no_ignore()
-    }
-
     /// Returns true if global ignore files should be ignored.
     fn no_ignore_global(&self) -> bool {
         self.is_present("no-ignore-global") || self.no_ignore()
@@ -1272,7 +1136,7 @@ impl ArgMatches {
             builder.add(&glob)?;
         }
         // This only enables case insensitivity for subsequent globs.
-        builder.case_insensitive(true).unwrap();
+        builder.case_insensitive(true)?;
         for glob in self.values_of_lossy_vec("iglob") {
             builder.add(&glob)?;
         }
@@ -1310,8 +1174,7 @@ impl ArgMatches {
             !cli::is_readable_stdin()
             || (self.is_present("file") && file_is_stdin)
             || self.is_present("files")
-            || self.is_present("type-list")
-            || self.is_present("pcre2-version");
+            || self.is_present("type-list");
         if search_cwd {
             Path::new("./").to_path_buf()
         } else {
@@ -1387,15 +1250,9 @@ impl ArgMatches {
         if let Some(paths) = self.values_of_os("file") {
             for path in paths {
                 if path == "-" {
-                    pats.extend(cli::patterns_from_stdin()?
-                        .into_iter()
-                        .map(|p| self.pattern_from_string(p))
-                    );
+                    pats.extend(cli::patterns_from_stdin()?);
                 } else {
-                    pats.extend(cli::patterns_from_path(path)?
-                        .into_iter()
-                        .map(|p| self.pattern_from_string(p))
-                    );
+                    pats.extend(cli::patterns_from_path(path)?);
                 }
             }
         }
@@ -1424,17 +1281,13 @@ impl ArgMatches {
     /// Converts a &str pattern to a String pattern. The pattern is escaped
     /// if -F/--fixed-strings is set.
     fn pattern_from_str(&self, pat: &str) -> String {
-        self.pattern_from_string(pat.to_string())
-    }
-
-    /// Applies additional processing on the given pattern if necessary
-    /// (such as escaping meta characters or turning it into a line regex).
-    fn pattern_from_string(&self, pat: String) -> String {
-        let pat = self.pattern_line(self.pattern_literal(pat));
-        if pat.is_empty() {
+        let litpat = self.pattern_literal(pat.to_string());
+        let s = self.pattern_line(litpat);
+
+        if s.is_empty() {
             self.pattern_empty()
         } else {
-            pat
+            s
         }
     }
 
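The pipeline on both sides of this hunk is the same: escape the pattern when -F/--fixed-strings is set, then substitute a match-everything pattern if the result is empty. A standalone sketch (the escaping and the empty-pattern replacement shown here are stand-ins, not ripgrep's actual choices):

```rust
/// Stand-in for the pattern pipeline: optionally escape regex meta
/// characters (only '.' here, for brevity), then fall back to a
/// pattern that matches every line when the input is empty.
fn pattern_from_str(pat: &str, fixed_strings: bool) -> String {
    let pat = if fixed_strings {
        pat.replace('.', "\\.") // stand-in for full regex escaping
    } else {
        pat.to_string()
    };
    if pat.is_empty() {
        "(?:)".to_string() // stand-in empty-pattern replacement
    } else {
        pat
    }
}
```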
@@ -1693,17 +1546,6 @@ and look-around.", msg)
     }
 }
 
-fn suggest_multiline(msg: String) -> String {
-    if msg.contains("the literal") && msg.contains("not allowed") {
-        format!("{}
-
-Consider enabling multiline mode with the --multiline flag (or -U for short).
-When multiline mode is enabled, new line characters can be matched.", msg)
-    } else {
-        msg
-    }
-}
-
 /// Convert the result of parsing a human readable file size to a `usize`,
 /// failing if the type does not fit.
 fn u64_to_usize(
@@ -1750,32 +1592,3 @@ where G: Fn(&fs::Metadata) -> io::Result<SystemTime>
         t1.cmp(&t2)
     }
 }
-
-/// Returns a clap matches object if the given arguments parse successfully.
-///
-/// Otherwise, if an error occurred, then it is returned unless the error
-/// corresponds to a `--help` or `--version` request. In which case, the
-/// corresponding output is printed and the current process is exited
-/// successfully.
-fn clap_matches<I, T>(
-    args: I,
-) -> Result<clap::ArgMatches<'static>>
-where I: IntoIterator<Item=T>,
-      T: Into<OsString> + Clone
-{
-    let err = match app::app().get_matches_from_safe(args) {
-        Ok(matches) => return Ok(matches),
-        Err(err) => err,
-    };
-    if err.use_stderr() {
-        return Err(err.into());
-    }
-    // Explicitly ignore any error returned by write!. The most likely error
-    // at this point is a broken pipe error, in which case, we want to ignore
-    // it and exit quietly.
-    //
-    // (This is the point of this helper function. clap's functionality for
-    // doing this will panic on a broken pipe error.)
-    let _ = write!(io::stdout(), "{}", err);
-    process::exit(0);
-}
|
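The `clap_matches` helper in the diff above exists for one reason: printing `--help` output into a closed pipe must not panic. A minimal standalone sketch of that pattern follows; `print_quietly` is a hypothetical name, not part of ripgrep, and the `Failing` writer simply simulates a closed pipe.

```rust
use std::io::{self, Write};

// Hypothetical helper mirroring the pattern in `clap_matches`: write a
// message and deliberately discard any write error (most likely a broken
// pipe), instead of panicking the way `.unwrap()` would.
fn print_quietly<W: Write>(mut wtr: W, msg: &str) {
    // `let _ =` discards the io::Result, so a broken pipe while printing
    // --help output does not abort the process with a panic.
    let _ = write!(wtr, "{}", msg);
}

fn main() {
    // Normal case: the message reaches the writer.
    let mut buf: Vec<u8> = Vec::new();
    print_quietly(&mut buf, "usage: rg [OPTIONS] PATTERN");

    // A sink that always fails (stand-in for a closed pipe) must not panic.
    struct Failing;
    impl Write for Failing {
        fn write(&mut self, _: &[u8]) -> io::Result<usize> {
            Err(io::Error::new(io::ErrorKind::BrokenPipe, "pipe closed"))
        }
        fn flush(&mut self) -> io::Result<()> { Ok(()) }
    }
    print_quietly(Failing, "ignored");

    println!("{}", String::from_utf8(buf).unwrap());
}
```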
src/config.rs
@@ -5,14 +5,11 @@
 use std::env;
 use std::error::Error;
 use std::fs::File;
-use std::io;
+use std::io::{self, BufRead};
 use std::ffi::OsString;
 use std::path::{Path, PathBuf};
 
-use bstr::io::BufReadExt;
-use log;
-
-use crate::Result;
+use Result;
 
 /// Return a sequence of arguments derived from ripgrep rc configuration files.
 pub fn args() -> Vec<OsString> {
@@ -37,7 +34,7 @@ pub fn args() -> Vec<OsString> {
             message!("{}:{}", config_path.display(), err);
         }
     }
-    log::debug!(
+    debug!(
         "{}: arguments loaded from config file: {:?}",
         config_path.display(),
         args
@@ -77,29 +74,62 @@ fn parse<P: AsRef<Path>>(
 fn parse_reader<R: io::Read>(
     rdr: R,
 ) -> Result<(Vec<OsString>, Vec<Box<Error>>)> {
-    let bufrdr = io::BufReader::new(rdr);
+    let mut bufrdr = io::BufReader::new(rdr);
     let (mut args, mut errs) = (vec![], vec![]);
+    let mut line = vec![];
     let mut line_number = 0;
-    bufrdr.for_byte_line_with_terminator(|line| {
+    while {
+        line.clear();
         line_number += 1;
-        let line = line.trim();
+        bufrdr.read_until(b'\n', &mut line)? > 0
+    } {
+        trim(&mut line);
         if line.is_empty() || line[0] == b'#' {
-            return Ok(true);
+            continue;
         }
-        match line.to_os_str() {
+        match bytes_to_os_string(&line) {
            Ok(osstr) => {
-                args.push(osstr.to_os_string());
+                args.push(osstr);
            }
            Err(err) => {
                errs.push(format!("{}: {}", line_number, err).into());
            }
        }
-        Ok(true)
-    })?;
+    }
    Ok((args, errs))
 }
 
+/// Trim the given bytes of whitespace according to the ASCII definition.
+fn trim(x: &mut Vec<u8>) {
+    let upto = x.iter().take_while(|b| is_space(**b)).count();
+    x.drain(..upto);
+    let revto = x.len() - x.iter().rev().take_while(|b| is_space(**b)).count();
+    x.drain(revto..);
+}
+
+/// Returns true if and only if the given byte is an ASCII space character.
+fn is_space(b: u8) -> bool {
+    b == b'\t'
+    || b == b'\n'
+    || b == b'\x0B'
+    || b == b'\x0C'
+    || b == b'\r'
+    || b == b' '
+}
+
+/// On Unix, get an OsString from raw bytes.
+#[cfg(unix)]
+fn bytes_to_os_string(bytes: &[u8]) -> Result<OsString> {
+    use std::os::unix::ffi::OsStringExt;
+    Ok(OsString::from_vec(bytes.to_vec()))
+}
+
+/// On non-Unix (like Windows), require UTF-8.
+#[cfg(not(unix))]
+fn bytes_to_os_string(bytes: &[u8]) -> Result<OsString> {
+    String::from_utf8(bytes.to_vec()).map(OsString::from).map_err(From::from)
+}
+
 #[cfg(test)]
 mod tests {
     use std::ffi::OsString;
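The config parsing above trims each line as raw bytes rather than as UTF-8, since a config file may contain arbitrary bytes. A standalone sketch of that in-place ASCII trimming (the same `trim`/`is_space` shape as in the diff, repeated here so it can run on its own):

```rust
// Returns true if and only if the given byte is an ASCII whitespace
// character, per the definition used for config-line trimming.
fn is_space(b: u8) -> bool {
    b == b'\t'
    || b == b'\n'
    || b == b'\x0B'
    || b == b'\x0C'
    || b == b'\r'
    || b == b' '
}

// Strip leading and trailing ASCII whitespace from a byte buffer in place,
// without assuming the bytes are valid UTF-8.
fn trim(x: &mut Vec<u8>) {
    let upto = x.iter().take_while(|b| is_space(**b)).count();
    x.drain(..upto);
    let revto = x.len() - x.iter().rev().take_while(|b| is_space(**b)).count();
    x.drain(revto..);
}

fn main() {
    let mut line = b"  --max-columns=150\r\n".to_vec();
    trim(&mut line);
    assert_eq!(line, b"--max-columns=150".to_vec());
    println!("{}", String::from_utf8_lossy(&line));
}
```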
92  src/main.rs
@@ -1,3 +1,17 @@
+#[macro_use]
+extern crate clap;
+extern crate grep;
+extern crate ignore;
+#[macro_use]
+extern crate lazy_static;
+#[macro_use]
+extern crate log;
+extern crate num_cpus;
+extern crate regex;
+#[macro_use]
+extern crate serde_json;
+extern crate termcolor;
+
 use std::io::{self, Write};
 use std::process;
 use std::sync::{Arc, Mutex};
@@ -22,38 +36,33 @@ mod subject;
 type Result<T> = ::std::result::Result<T, Box<::std::error::Error>>;
 
 fn main() {
-    if let Err(err) = Args::parse().and_then(try_main) {
-        eprintln!("{}", err);
-        process::exit(2);
+    match Args::parse().and_then(try_main) {
+        Ok(true) => process::exit(0),
+        Ok(false) => process::exit(1),
+        Err(err) => {
+            eprintln!("{}", err);
+            process::exit(2);
+        }
     }
 }
 
-fn try_main(args: Args) -> Result<()> {
+fn try_main(args: Args) -> Result<bool> {
     use args::Command::*;
 
-    let matched =
-        match args.command()? {
-            Search => search(&args),
-            SearchParallel => search_parallel(&args),
-            SearchNever => Ok(false),
-            Files => files(&args),
-            FilesParallel => files_parallel(&args),
-            Types => types(&args),
-            PCRE2Version => pcre2_version(&args),
-        }?;
-    if matched && (args.quiet() || !messages::errored()) {
-        process::exit(0)
-    } else if messages::errored() {
-        process::exit(2)
-    } else {
-        process::exit(1)
-    }
+    match args.command()? {
+        Search => search(args),
+        SearchParallel => search_parallel(args),
+        SearchNever => Ok(false),
+        Files => files(args),
+        FilesParallel => files_parallel(args),
+        Types => types(args),
+    }
 }
 
 /// The top-level entry point for single-threaded search. This recursively
 /// steps through the file list (current directory by default) and searches
 /// each file sequentially.
-fn search(args: &Args) -> Result<bool> {
+fn search(args: Args) -> Result<bool> {
     let started_at = Instant::now();
     let quit_after_match = args.quit_after_match()?;
     let subject_builder = args.subject_builder();
@@ -73,7 +82,7 @@ fn search(args: &Args) -> Result<bool> {
                 if err.kind() == io::ErrorKind::BrokenPipe {
                     break;
                 }
-                err_message!("{}: {}", subject.path().display(), err);
+                message!("{}: {}", subject.path().display(), err);
                 continue;
             }
         };
@@ -96,7 +105,7 @@ fn search(args: &Args) -> Result<bool> {
 /// The top-level entry point for multi-threaded search. The parallelism is
 /// itself achieved by the recursive directory traversal. All we need to do is
 /// feed it a worker for performing a search on each file.
-fn search_parallel(args: &Args) -> Result<bool> {
+fn search_parallel(args: Args) -> Result<bool> {
     use std::sync::atomic::AtomicBool;
     use std::sync::atomic::Ordering::SeqCst;
 
@@ -132,7 +141,7 @@ fn search_parallel(args: &Args) -> Result<bool> {
             let search_result = match searcher.search(&subject) {
                 Ok(search_result) => search_result,
                 Err(err) => {
-                    err_message!("{}: {}", subject.path().display(), err);
+                    message!("{}: {}", subject.path().display(), err);
                     return WalkState::Continue;
                 }
             };
@@ -149,7 +158,7 @@ fn search_parallel(args: &Args) -> Result<bool> {
                     return WalkState::Quit;
                 }
                 // Otherwise, we continue on our merry way.
-                err_message!("{}: {}", subject.path().display(), err);
+                message!("{}: {}", subject.path().display(), err);
             }
             if matched.load(SeqCst) && quit_after_match {
                 WalkState::Quit
@@ -174,7 +183,7 @@ fn search_parallel(args: &Args) -> Result<bool> {
 /// The top-level entry point for listing files without searching them. This
 /// recursively steps through the file list (current directory by default) and
 /// prints each path sequentially using a single thread.
-fn files(args: &Args) -> Result<bool> {
+fn files(args: Args) -> Result<bool> {
     let quit_after_match = args.quit_after_match()?;
     let subject_builder = args.subject_builder();
     let mut matched = false;
@@ -204,7 +213,7 @@ fn files(args: &Args) -> Result<bool> {
 /// The top-level entry point for listing files without searching them. This
 /// recursively steps through the file list (current directory by default) and
 /// prints each path sequentially using multiple threads.
-fn files_parallel(args: &Args) -> Result<bool> {
+fn files_parallel(args: Args) -> Result<bool> {
     use std::sync::atomic::AtomicBool;
     use std::sync::atomic::Ordering::SeqCst;
     use std::sync::mpsc;
@@ -256,7 +265,7 @@ fn files_parallel(args: &Args) -> Result<bool> {
 }
 
 /// The top-level entry point for --type-list.
-fn types(args: &Args) -> Result<bool> {
+fn types(args: Args) -> Result<bool> {
     let mut count = 0;
     let mut stdout = args.stdout();
     for def in args.type_defs()? {
@@ -276,30 +285,3 @@ fn types(args: &Args) -> Result<bool> {
     }
     Ok(count > 0)
 }
-
-/// The top-level entry point for --pcre2-version.
-fn pcre2_version(args: &Args) -> Result<bool> {
-    #[cfg(feature = "pcre2")]
-    fn imp(args: &Args) -> Result<bool> {
-        use grep::pcre2;
-
-        let mut stdout = args.stdout();
-
-        let (major, minor) = pcre2::version();
-        writeln!(stdout, "PCRE2 {}.{} is available", major, minor)?;
-
-        if cfg!(target_pointer_width = "64") && pcre2::is_jit_available() {
-            writeln!(stdout, "JIT is available")?;
-        }
-        Ok(true)
-    }
-
-    #[cfg(not(feature = "pcre2"))]
-    fn imp(args: &Args) -> Result<bool> {
-        let mut stdout = args.stdout();
-        writeln!(stdout, "PCRE2 is not available in this build of ripgrep.")?;
-        Ok(false)
-    }
-
-    imp(args)
-}
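Both versions of `main`/`try_main` in the diff above encode the same exit-status policy: 0 for a match, 1 for a clean run with no match, 2 on error, with `--quiet` letting a match win over non-fatal errors. A hedged distillation of that rule as a pure function (`exit_code` is a made-up name; the real code reads `args.quiet()` and `messages::errored()`):

```rust
// Hypothetical distillation of ripgrep's exit-status policy: `matched`
// means a match was found, `quiet` models --quiet, and `errored` models
// the "a non-fatal error message was printed" flag.
fn exit_code(matched: bool, quiet: bool, errored: bool) -> i32 {
    if matched && (quiet || !errored) {
        0
    } else if errored {
        2
    } else {
        1
    }
}

fn main() {
    assert_eq!(exit_code(true, false, false), 0);
    assert_eq!(exit_code(false, false, false), 1);
    assert_eq!(exit_code(false, false, true), 2);
    // --quiet reports success on a match even if non-fatal errors occurred.
    assert_eq!(exit_code(true, true, true), 0);
    println!("ok");
}
```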
src/messages.rs
@@ -1,35 +1,21 @@
-use std::sync::atomic::{AtomicBool, Ordering};
+use std::sync::atomic::{ATOMIC_BOOL_INIT, AtomicBool, Ordering};
 
-static MESSAGES: AtomicBool = AtomicBool::new(false);
-static IGNORE_MESSAGES: AtomicBool = AtomicBool::new(false);
-static ERRORED: AtomicBool = AtomicBool::new(false);
+static MESSAGES: AtomicBool = ATOMIC_BOOL_INIT;
+static IGNORE_MESSAGES: AtomicBool = ATOMIC_BOOL_INIT;
 
-/// Emit a non-fatal error message, unless messages were disabled.
 #[macro_export]
 macro_rules! message {
     ($($tt:tt)*) => {
-        if crate::messages::messages() {
+        if ::messages::messages() {
             eprintln!($($tt)*);
         }
     }
 }
 
-/// Like message, but sets ripgrep's "errored" flag, which controls the exit
-/// status.
-#[macro_export]
-macro_rules! err_message {
-    ($($tt:tt)*) => {
-        crate::messages::set_errored();
-        message!($($tt)*);
-    }
-}
-
-/// Emit a non-fatal ignore-related error message (like a parse error), unless
-/// ignore-messages were disabled.
 #[macro_export]
 macro_rules! ignore_message {
     ($($tt:tt)*) => {
-        if crate::messages::messages() && crate::messages::ignore_messages() {
+        if ::messages::messages() && ::messages::ignore_messages() {
             eprintln!($($tt)*);
         }
     }
@@ -62,13 +48,3 @@ pub fn ignore_messages() -> bool {
 pub fn set_ignore_messages(yes: bool) {
     IGNORE_MESSAGES.store(yes, Ordering::SeqCst)
 }
-
-/// Returns true if and only if ripgrep came across a non-fatal error.
-pub fn errored() -> bool {
-    ERRORED.load(Ordering::SeqCst)
-}
-
-/// Indicate that ripgrep has come across a non-fatal error.
-pub fn set_errored() {
-    ERRORED.store(true, Ordering::SeqCst);
-}
|
|||||||
use grep::pcre2::{RegexMatcher as PCRE2RegexMatcher};
|
use grep::pcre2::{RegexMatcher as PCRE2RegexMatcher};
|
||||||
use grep::printer::{JSON, Standard, Summary, Stats};
|
use grep::printer::{JSON, Standard, Summary, Stats};
|
||||||
use grep::regex::{RegexMatcher as RustRegexMatcher};
|
use grep::regex::{RegexMatcher as RustRegexMatcher};
|
||||||
use grep::searcher::{BinaryDetection, Searcher};
|
use grep::searcher::Searcher;
|
||||||
use ignore::overrides::Override;
|
use ignore::overrides::Override;
|
||||||
use serde_json as json;
|
use serde_json as json;
|
||||||
use serde_json::json;
|
|
||||||
use termcolor::WriteColor;
|
use termcolor::WriteColor;
|
||||||
|
|
||||||
use crate::subject::Subject;
|
use subject::Subject;
|
||||||
|
|
||||||
/// The configuration for the search worker. Among a few other things, the
|
/// The configuration for the search worker. Among a few other things, the
|
||||||
/// configuration primarily controls the way we show search results to users
|
/// configuration primarily controls the way we show search results to users
|
||||||
@@ -27,8 +26,6 @@ struct Config {
|
|||||||
preprocessor: Option<PathBuf>,
|
preprocessor: Option<PathBuf>,
|
||||||
preprocessor_globs: Override,
|
preprocessor_globs: Override,
|
||||||
search_zip: bool,
|
search_zip: bool,
|
||||||
binary_implicit: BinaryDetection,
|
|
||||||
binary_explicit: BinaryDetection,
|
|
||||||
}
|
}
|
||||||
|
|
||||||
impl Default for Config {
|
impl Default for Config {
|
||||||
@@ -38,8 +35,6 @@ impl Default for Config {
|
|||||||
preprocessor: None,
|
preprocessor: None,
|
||||||
preprocessor_globs: Override::empty(),
|
preprocessor_globs: Override::empty(),
|
||||||
search_zip: false,
|
search_zip: false,
|
||||||
binary_implicit: BinaryDetection::none(),
|
|
||||||
binary_explicit: BinaryDetection::none(),
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -138,37 +133,6 @@ impl SearchWorkerBuilder {
|
|||||||
self.config.search_zip = yes;
|
self.config.search_zip = yes;
|
||||||
self
|
self
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Set the binary detection that should be used when searching files
|
|
||||||
/// found via a recursive directory search.
|
|
||||||
///
|
|
||||||
/// Generally, this binary detection may be `BinaryDetection::quit` if
|
|
||||||
/// we want to skip binary files completely.
|
|
||||||
///
|
|
||||||
/// By default, no binary detection is performed.
|
|
||||||
pub fn binary_detection_implicit(
|
|
||||||
&mut self,
|
|
||||||
detection: BinaryDetection,
|
|
||||||
) -> &mut SearchWorkerBuilder {
|
|
||||||
self.config.binary_implicit = detection;
|
|
||||||
self
|
|
||||||
}
|
|
||||||
|
|
||||||
/// Set the binary detection that should be used when searching files
|
|
||||||
/// explicitly supplied by an end user.
|
|
||||||
///
|
|
||||||
/// Generally, this binary detection should NOT be `BinaryDetection::quit`,
|
|
||||||
/// since we never want to automatically filter files supplied by the end
|
|
||||||
/// user.
|
|
||||||
///
|
|
||||||
/// By default, no binary detection is performed.
|
|
||||||
pub fn binary_detection_explicit(
|
|
||||||
&mut self,
|
|
||||||
detection: BinaryDetection,
|
|
||||||
) -> &mut SearchWorkerBuilder {
|
|
||||||
self.config.binary_explicit = detection;
|
|
||||||
self
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/// The result of executing a search.
|
/// The result of executing a search.
|
||||||
@@ -343,14 +307,6 @@ impl<W: WriteColor> SearchWorker<W> {
|
|||||||
|
|
||||||
/// Search the given subject using the appropriate strategy.
|
/// Search the given subject using the appropriate strategy.
|
||||||
fn search_impl(&mut self, subject: &Subject) -> io::Result<SearchResult> {
|
fn search_impl(&mut self, subject: &Subject) -> io::Result<SearchResult> {
|
||||||
let bin =
|
|
||||||
if subject.is_explicit() {
|
|
||||||
self.config.binary_explicit.clone()
|
|
||||||
} else {
|
|
||||||
self.config.binary_implicit.clone()
|
|
||||||
};
|
|
||||||
self.searcher.set_binary_detection(bin);
|
|
||||||
|
|
||||||
let path = subject.path();
|
let path = subject.path();
|
||||||
if subject.is_stdin() {
|
if subject.is_stdin() {
|
||||||
let stdin = io::stdin();
|
let stdin = io::stdin();
|
||||||
|
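The search.rs diff above removes a two-policy scheme: implicitly discovered files get one binary-detection setting, while files the user named explicitly get another, so user-named files are never silently filtered. A distilled sketch of that selection, with a stand-in `Detection` enum instead of the real `grep::searcher::BinaryDetection` and a hypothetical `pick_detection` helper:

```rust
// Stand-in for grep::searcher::BinaryDetection in this sketch.
#[derive(Clone, Debug, PartialEq)]
enum Detection {
    None, // perform no binary detection
    Quit, // stop (and filter out) on a NUL byte
}

// Mirrors the removed logic in `search_impl`: explicit subjects use the
// "explicit" policy, recursively discovered ones use the "implicit" policy.
fn pick_detection(
    explicit_subject: bool,
    implicit: Detection,
    explicit: Detection,
) -> Detection {
    if explicit_subject { explicit } else { implicit }
}

fn main() {
    // Implicitly discovered binary files may be skipped entirely...
    assert_eq!(
        pick_detection(false, Detection::Quit, Detection::None),
        Detection::Quit
    );
    // ...but user-named files should never be filtered out.
    assert_eq!(
        pick_detection(true, Detection::Quit, Detection::None),
        Detection::None
    );
    println!("ok");
}
```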
src/subject.rs
@@ -1,7 +1,6 @@
 use std::path::Path;
 
 use ignore::{self, DirEntry};
-use log;
 
 /// A configuration for describing how subjects should be built.
 #[derive(Clone, Debug)]
@@ -41,7 +40,7 @@ impl SubjectBuilder {
         match result {
             Ok(dent) => self.build(dent),
             Err(err) => {
-                err_message!("{}", err);
+                message!("{}", err);
                 None
             }
         }
@@ -59,12 +58,17 @@ impl SubjectBuilder {
         if let Some(ignore_err) = subj.dent.error() {
             ignore_message!("{}", ignore_err);
         }
-        // If this entry was explicitly provided by an end user, then we always
-        // want to search it.
-        if subj.is_explicit() {
+        // If this entry represents stdin, then we always search it.
+        if subj.dent.is_stdin() {
             return Some(subj);
         }
-        // At this point, we only want to search something if it's explicitly a
+        // If this subject has a depth of 0, then it was provided explicitly
+        // by an end user (or via a shell glob). In this case, we always want
+        // to search it if it even smells like a file (e.g., a symlink).
+        if subj.dent.depth() == 0 && !subj.is_dir() {
+            return Some(subj);
+        }
+        // At this point, we only want to search something it's explicitly a
         // file. This omits symlinks. (If ripgrep was configured to follow
         // symlinks, then they have already been followed by the directory
         // traversal.)
@@ -75,7 +79,7 @@ impl SubjectBuilder {
         // directory. Otherwise, emitting messages for directories is just
         // noisy.
         if !subj.is_dir() {
-            log::debug!(
+            debug!(
                 "ignoring {}: failed to pass subject filter: \
                  file type: {:?}, metadata: {:?}",
                 subj.dent.path().display(),
@@ -122,39 +126,9 @@ impl Subject {
         self.dent.is_stdin()
     }
 
-    /// Returns true if and only if this entry corresponds to a subject to
-    /// search that was explicitly supplied by an end user.
-    ///
-    /// Generally, this corresponds to either stdin or an explicit file path
-    /// argument. e.g., in `rg foo some-file ./some-dir/`, `some-file` is
-    /// an explicit subject, but, e.g., `./some-dir/some-other-file` is not.
-    ///
-    /// However, note that ripgrep does not see through shell globbing. e.g.,
-    /// in `rg foo ./some-dir/*`, `./some-dir/some-other-file` will be treated
-    /// as an explicit subject.
-    pub fn is_explicit(&self) -> bool {
-        // stdin is obvious. When an entry has a depth of 0, that means it
-        // was explicitly provided to our directory iterator, which means it
-        // was in turn explicitly provided by the end user. The !is_dir check
-        // means that we want to search files even if their symlinks, again,
-        // because they were explicitly provided. (And we never want to try
-        // to search a directory.)
-        self.is_stdin() || (self.dent.depth() == 0 && !self.is_dir())
-    }
-
-    /// Returns true if and only if this subject points to a directory after
-    /// following symbolic links.
+    /// Returns true if and only if this subject points to a directory.
     fn is_dir(&self) -> bool {
-        let ft = match self.dent.file_type() {
-            None => return false,
-            Some(ft) => ft,
-        };
-        if ft.is_dir() {
-            return true;
-        }
-        // If this is a symlink, then we want to follow it to determine
-        // whether it's a directory or not.
-        self.dent.path_is_symlink() && self.dent.path().is_dir()
+        self.dent.file_type().map_or(false, |ft| ft.is_dir())
     }
 
     /// Returns true if and only if this subject points to a file.
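The `is_explicit` rule in the subject.rs diff above reduces to one boolean expression: an entry is "explicit" when it is stdin, or when it sits at depth 0 of the traversal (i.e., was named on the command line) and is not a directory. A standalone sketch of that rule; the `Entry` struct here is hypothetical, since the real code queries an `ignore::DirEntry`:

```rust
// Hypothetical stand-in for the fields of ignore::DirEntry consulted by
// Subject::is_explicit.
struct Entry {
    is_stdin: bool,
    depth: usize,
    is_dir: bool,
}

// Mirrors `self.is_stdin() || (self.dent.depth() == 0 && !self.is_dir())`.
fn is_explicit(e: &Entry) -> bool {
    e.is_stdin || (e.depth == 0 && !e.is_dir)
}

fn main() {
    // `rg foo some-file` names some-file explicitly at depth 0...
    assert!(is_explicit(&Entry { is_stdin: false, depth: 0, is_dir: false }));
    // ...but files found by recursing into a named directory are not.
    assert!(!is_explicit(&Entry { is_stdin: false, depth: 2, is_dir: false }));
    // Stdin is always explicit; directories never are.
    assert!(is_explicit(&Entry { is_stdin: true, depth: 0, is_dir: false }));
    assert!(!is_explicit(&Entry { is_stdin: false, depth: 0, is_dir: true }));
    println!("ok");
}
```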
315
tests/binary.rs
315
tests/binary.rs
@@ -1,315 +0,0 @@
|
|||||||
use crate::util::{Dir, TestCommand};
|
|
||||||
|
|
||||||
// This file contains a smattering of tests specifically for checking ripgrep's
|
|
||||||
// handling of binary files. There's quite a bit of discussion on this in this
|
|
||||||
// bug report: https://github.com/BurntSushi/ripgrep/issues/306
|
|
||||||
|
|
||||||
// Our haystack is the first 500 lines of Gutenberg's copy of "A Study in
|
|
||||||
// Scarlet," with a NUL byte at line 237: `abcdef\x00`.
|
|
||||||
//
|
|
||||||
// The position and size of the haystack is, unfortunately, significant. In
|
|
||||||
// particular, the NUL byte is specifically inserted at some point *after* the
|
|
||||||
// first 8192 bytes, which corresponds to the initial capacity of the buffer
|
|
||||||
// that ripgrep uses to read files. (grep for DEFAULT_BUFFER_CAPACITY.) The
|
|
||||||
// position of the NUL byte ensures that we can execute some search on the
|
|
||||||
// initial buffer contents without ever detecting any binary data. Moreover,
|
|
||||||
// when using a memory map for searching, only the first 8192 bytes are
|
|
||||||
// scanned for a NUL byte, so no binary bytes are detected at all when using
|
|
||||||
// a memory map (unless our query matches line 237).
|
|
||||||
//
|
|
||||||
// One last note: in the tests below, we use --no-mmap heavily because binary
|
|
||||||
// detection with memory maps is a bit different. Namely, NUL bytes are only
|
|
||||||
// searched for in the first few KB of the file and in a match. Normally, NUL
|
|
||||||
// bytes are searched for everywhere.
|
|
||||||
//
|
|
||||||
// TODO: Add tests for binary file detection when using memory maps.
|
|
||||||
const HAY: &'static [u8] = include_bytes!("./data/sherlock-nul.txt");
|
|
||||||
|
|
||||||
// This tests that ripgrep prints a warning message if it finds and prints a
|
|
||||||
// match in a binary file before detecting that it is a binary file. The point
|
|
||||||
// here is to notify that user that the search of the file is only partially
|
|
||||||
// complete.
|
|
||||||
//
|
|
||||||
// This applies to files that are *implicitly* searched via a recursive
|
|
||||||
// directory traversal. In particular, this results in a WARNING message being
|
|
||||||
// printed. We make our file "implicit" by doing a recursive search with a glob
|
|
||||||
// that matches our file.
|
|
||||||
rgtest!(after_match1_implicit, |dir: Dir, mut cmd: TestCommand| {
|
|
||||||
dir.create_bytes("hay", HAY);
|
|
||||||
cmd.args(&[
|
|
||||||
"--no-mmap", "-n", "Project Gutenberg EBook", "-g", "hay",
|
|
||||||
]);
|
|
||||||
|
|
||||||
let expected = "\
|
|
||||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
|
||||||
WARNING: stopped searching binary file hay after match (found \"\\u{0}\" byte around offset 9741)
|
|
||||||
";
|
|
||||||
eqnice!(expected, cmd.stdout());
|
|
||||||
});
|
|
||||||
|
|
||||||
// Like after_match1_implicit, except we provide a file to search
|
|
||||||
// explicitly. This results in identical behavior, but a different message.
|
|
||||||
rgtest!(after_match1_explicit, |dir: Dir, mut cmd: TestCommand| {
|
|
||||||
dir.create_bytes("hay", HAY);
|
|
||||||
cmd.args(&[
|
|
||||||
"--no-mmap", "-n", "Project Gutenberg EBook", "hay",
|
|
||||||
]);
|
|
||||||
|
|
||||||
let expected = "\
|
|
||||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
|
||||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
|
||||||
";
|
|
||||||
eqnice!(expected, cmd.stdout());
|
|
||||||
});
|
|
||||||
|
|
||||||
// Like after_match1_explicit, except we feed our content on stdin.
|
|
||||||
rgtest!(after_match1_stdin, |_: Dir, mut cmd: TestCommand| {
|
|
||||||
cmd.args(&[
|
|
||||||
"--no-mmap", "-n", "Project Gutenberg EBook",
|
|
||||||
]);
|
|
||||||
|
|
||||||
let expected = "\
|
|
||||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
|
||||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
|
||||||
";
|
|
||||||
eqnice!(expected, cmd.pipe(HAY));
|
|
||||||
});
|
|
||||||
|
|
||||||
// Like after_match1_implicit, but provides the --binary flag, which
|
|
||||||
// disables binary filtering. Thus, this matches the behavior of ripgrep as
|
|
||||||
// if the file were given explicitly.
|
|
||||||
rgtest!(after_match1_implicit_binary, |dir: Dir, mut cmd: TestCommand| {
|
|
||||||
dir.create_bytes("hay", HAY);
|
|
||||||
cmd.args(&[
|
|
||||||
"--no-mmap", "-n", "--binary", "Project Gutenberg EBook", "-g", "hay",
|
|
||||||
]);
|
|
||||||
|
|
||||||
let expected = "\
|
|
||||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
|
||||||
Binary file hay matches (found \"\\u{0}\" byte around offset 9741)
|
|
||||||
";
|
|
||||||
eqnice!(expected, cmd.stdout());
|
|
||||||
});
|
|
||||||
|
|
||||||
// Like after_match1_implicit, but enables -a/--text, so no binary
|
|
||||||
// detection should be performed.
|
|
||||||
rgtest!(after_match1_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
|
||||||
dir.create_bytes("hay", HAY);
|
|
||||||
cmd.args(&[
|
|
||||||
"--no-mmap", "-n", "--text", "Project Gutenberg EBook", "-g", "hay",
|
|
||||||
]);
|
|
||||||
|
|
||||||
let expected = "\
|
|
||||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
|
||||||
";
|
|
||||||
eqnice!(expected, cmd.stdout());
|
|
||||||
});
|
|
||||||
|
|
||||||
// Like after_match1_implicit_text, but enables -a/--text, so no binary
|
|
||||||
// detection should be performed.
|
|
||||||
rgtest!(after_match1_explicit_text, |dir: Dir, mut cmd: TestCommand| {
|
|
||||||
dir.create_bytes("hay", HAY);
|
|
||||||
cmd.args(&[
|
|
||||||
"--no-mmap", "-n", "--text", "Project Gutenberg EBook", "hay",
|
|
||||||
]);
|
|
||||||
|
|
||||||
let expected = "\
|
|
||||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
|
||||||
";
|
|
||||||
eqnice!(expected, cmd.stdout());
|
|
||||||
});
|
|
||||||
|
|
||||||
// Like after_match1_implicit, except this asks ripgrep to print all matching
|
|
||||||
// files.
|
|
||||||
//
|
|
||||||
// This is an interesting corner case that one might consider a bug, however,
|
|
||||||
// it's unlikely to be fixed. Namely, ripgrep probably shouldn't print `hay`
|
|
||||||
// as a matching file since it is in fact a binary file, and thus should be
|
|
||||||
// filtered out by default. However, the --files-with-matches flag will print
|
|
||||||
// out the path of a matching file as soon as a match is seen and then stop
|
|
||||||
// searching completely. Therefore, the NUL byte is never actually detected.
|
|
||||||
//
|
|
||||||
// The only way to fix this would be to kill ripgrep's performance in this case
|
|
||||||
// and continue searching the entire file for a NUL byte. (Similarly if the
|
|
||||||
// --quiet flag is set. See the next test.)
|
|
||||||
rgtest!(after_match1_implicit_path, |dir: Dir, mut cmd: TestCommand| {
|
|
||||||
dir.create_bytes("hay", HAY);
|
|
||||||
cmd.args(&[
|
|
||||||
"--no-mmap", "-l", "Project Gutenberg EBook", "-g", "hay",
|
|
||||||
]);
|
|
||||||
eqnice!("hay\n", cmd.stdout());
|
|
||||||
});
|
|
||||||
|
|
||||||
// Like after_match1_implicit_path, except this indicates that a match was
// found with no other output. (This is the same bug described above, but
// manifests as an exit code with no output.)
rgtest!(after_match1_implicit_quiet, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-q", "Project Gutenberg EBook", "-g", "hay",
    ]);
    eqnice!("", cmd.stdout());
});

// This sets up the same test as after_match1_implicit_path, but instead of
// just printing the matching files, this includes the full count of matches.
// In this case, we need to search the entire file, so ripgrep correctly
// detects the binary data and suppresses output.
rgtest!(after_match1_implicit_count, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-c", "Project Gutenberg EBook", "-g", "hay",
    ]);
    cmd.assert_err();
});

// Like after_match1_implicit_count, except the --binary flag is provided,
// which makes ripgrep disable binary data filtering even for implicit files.
rgtest!(after_match1_implicit_count_binary, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-c", "--binary",
        "Project Gutenberg EBook",
        "-g", "hay",
    ]);
    eqnice!("hay:1\n", cmd.stdout());
});

// Like after_match1_implicit_count, except the file path is provided
// explicitly, so binary filtering is disabled and a count is correctly
// reported.
rgtest!(after_match1_explicit_count, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-c", "Project Gutenberg EBook", "hay",
    ]);
    eqnice!("1\n", cmd.stdout());
});

// This tests that a match way before the NUL byte is shown, but a match after
// the NUL byte is not.
rgtest!(after_match2_implicit, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-n",
        "Project Gutenberg EBook|a medical student",
        "-g", "hay",
    ]);

    let expected = "\
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
WARNING: stopped searching binary file hay after match (found \"\\u{0}\" byte around offset 9741)
";
    eqnice!(expected, cmd.stdout());
});

// Like after_match2_implicit, but enables -a/--text, so no binary
// detection should be performed.
rgtest!(after_match2_implicit_text, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-n", "--text",
        "Project Gutenberg EBook|a medical student",
        "-g", "hay",
    ]);

    let expected = "\
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
hay:236:\"And yet you say he is not a medical student?\"
";
    eqnice!(expected, cmd.stdout());
});

// This tests that ripgrep *silently* quits before finding a match that occurs
// after a NUL byte.
rgtest!(before_match1_implicit, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-n", "Heaven", "-g", "hay",
    ]);
    cmd.assert_err();
});

// This tests that ripgrep *does not* silently quit before finding a match that
// occurs after a NUL byte when a file is explicitly searched.
rgtest!(before_match1_explicit, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-n", "Heaven", "hay",
    ]);

    let expected = "\
Binary file matches (found \"\\u{0}\" byte around offset 9741)
";
    eqnice!(expected, cmd.stdout());
});

// Like before_match1_implicit, but enables the --binary flag, which
// disables binary filtering. Thus, this matches the behavior of ripgrep as if
// the file were given explicitly.
rgtest!(before_match1_implicit_binary, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-n", "--binary", "Heaven", "-g", "hay",
    ]);

    let expected = "\
Binary file hay matches (found \"\\u{0}\" byte around offset 9741)
";
    eqnice!(expected, cmd.stdout());
});

// Like before_match1_implicit, but enables -a/--text, so no binary
// detection should be performed.
rgtest!(before_match1_implicit_text, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-n", "--text", "Heaven", "-g", "hay",
    ]);

    let expected = "\
hay:238:\"No. Heaven knows what the objects of his studies are. But here we
";
    eqnice!(expected, cmd.stdout());
});

// This tests that ripgrep *silently* quits before finding a match that occurs
// before a NUL byte, but within the same buffer as the NUL byte.
rgtest!(before_match2_implicit, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-n", "a medical student", "-g", "hay",
    ]);
    cmd.assert_err();
});

// This tests that ripgrep *does not* silently quit before finding a match that
// occurs before a NUL byte, but within the same buffer as the NUL byte. Even
// though the match occurs before the NUL byte, ripgrep still doesn't print it
// because it has already scanned ahead to detect the NUL byte. (This matches
// the behavior of GNU grep.)
rgtest!(before_match2_explicit, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-n", "a medical student", "hay",
    ]);

    let expected = "\
Binary file matches (found \"\\u{0}\" byte around offset 9741)
";
    eqnice!(expected, cmd.stdout());
});

// Like before_match2_implicit, but enables -a/--text, so no binary
// detection should be performed.
rgtest!(before_match2_implicit_text, |dir: Dir, mut cmd: TestCommand| {
    dir.create_bytes("hay", HAY);
    cmd.args(&[
        "--no-mmap", "-n", "--text", "a medical student", "-g", "hay",
    ]);

    let expected = "\
hay:236:\"And yet you say he is not a medical student?\"
";
    eqnice!(expected, cmd.stdout());
});
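The trade-off the test comments above describe (ripgrep's binary detection quits at a NUL byte, while first-match-wins modes like -l/-q stop before ever seeing a NUL that occurs later) can be sketched in a few lines. This is a hypothetical illustration, not ripgrep's actual implementation; `search_lines` and its inputs are made up for the example:

```rust
// Hypothetical sketch (not ripgrep's actual code) of the behavior the test
// comments describe: a line searcher that stops at a NUL byte, and how
// "first match wins" modes (-l/-q) can miss a NUL later in the file.

/// Scan `data` line by line; quit when a NUL byte is seen. With
/// `stop_after_first`, halt at the first match (like -l or -q), so a NUL
/// byte further on is never observed.
fn search_lines(data: &[u8], needle: &[u8], stop_after_first: bool) -> (Vec<usize>, bool) {
    let mut matches = Vec::new();
    let mut saw_nul = false;
    for (i, line) in data.split(|&b| b == b'\n').enumerate() {
        if line.contains(&0u8) {
            saw_nul = true; // binary detection: stop searching entirely
            break;
        }
        if line.windows(needle.len()).any(|w| w == needle) {
            matches.push(i + 1);
            if stop_after_first {
                break; // -l/-q: report and stop immediately
            }
        }
    }
    (matches, saw_nul)
}

fn main() {
    let hay = b"The Project Gutenberg EBook\nmore text\n\x00junk\nHeaven\n";
    // -l style: match on line 1, the NUL byte is never reached.
    assert_eq!(search_lines(hay, b"Project", true), (vec![1], false));
    // -c style: the whole file must be scanned, so the NUL is detected.
    assert_eq!(search_lines(hay, b"Project", false), (vec![1], true));
    // A match after the NUL byte is never reported.
    assert_eq!(search_lines(hay, b"Heaven", false), (vec![], true));
    println!("ok");
}
```

Fixing the -l/-q corner case would mean continuing the scan past the first match just to look for a NUL byte, which is exactly the performance cost the comments say ripgrep declines to pay.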
@@ -1,500 +0,0 @@
The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever. You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org


Title: A Study In Scarlet

Author: Arthur Conan Doyle

Posting Date: July 12, 2008 [EBook #244]
Release Date: April, 1995
[Last updated: February 17, 2013]

Language: English


*** START OF THIS PROJECT GUTENBERG EBOOK A STUDY IN SCARLET ***




Produced by Roger Squires





A STUDY IN SCARLET.

By A. Conan Doyle

[1]


Original Transcriber's Note: This etext is prepared directly
from an 1887 edition, and care has been taken to duplicate the
original exactly, including typographical and punctuation
vagaries.

Additions to the text include adding the underscore character to
indicate italics, and textual end-notes in square braces.

Project Gutenberg Editor's Note: In reproofing and moving old PG
files such as this to the present PG directory system it is the
policy to reformat the text to conform to present PG Standards.
In this case however, in consideration of the note above of the
original transcriber describing his care to try to duplicate the
original 1887 edition as to typography and punctuation vagaries,
no changes have been made in this ascii text file. However, in
the Latin-1 file and this html file, present standards are
followed and the several French and Spanish words have been
given their proper accents.

Part II, The Country of the Saints, deals much with the Mormon Church.



A STUDY IN SCARLET.




PART I.

(_Being a reprint from the reminiscences of_ JOHN H. WATSON, M.D., _late
of the Army Medical Department._) [2]




CHAPTER I. MR. SHERLOCK HOLMES.


IN the year 1878 I took my degree of Doctor of Medicine of the
University of London, and proceeded to Netley to go through the course
prescribed for surgeons in the army. Having completed my studies there,
I was duly attached to the Fifth Northumberland Fusiliers as Assistant
Surgeon. The regiment was stationed in India at the time, and before
I could join it, the second Afghan war had broken out. On landing at
Bombay, I learned that my corps had advanced through the passes, and
was already deep in the enemy's country. I followed, however, with many
other officers who were in the same situation as myself, and succeeded
in reaching Candahar in safety, where I found my regiment, and at once
entered upon my new duties.

The campaign brought honours and promotion to many, but for me it had
nothing but misfortune and disaster. I was removed from my brigade and
attached to the Berkshires, with whom I served at the fatal battle of
Maiwand. There I was struck on the shoulder by a Jezail bullet, which
shattered the bone and grazed the subclavian artery. I should have
fallen into the hands of the murderous Ghazis had it not been for the
devotion and courage shown by Murray, my orderly, who threw me across a
pack-horse, and succeeded in bringing me safely to the British lines.

Worn with pain, and weak from the prolonged hardships which I had
undergone, I was removed, with a great train of wounded sufferers, to
the base hospital at Peshawar. Here I rallied, and had already improved
so far as to be able to walk about the wards, and even to bask a little
upon the verandah, when I was struck down by enteric fever, that curse
of our Indian possessions. For months my life was despaired of, and
when at last I came to myself and became convalescent, I was so weak and
emaciated that a medical board determined that not a day should be lost
in sending me back to England. I was dispatched, accordingly, in the
troopship "Orontes," and landed a month later on Portsmouth jetty, with
my health irretrievably ruined, but with permission from a paternal
government to spend the next nine months in attempting to improve it.

I had neither kith nor kin in England, and was therefore as free as
air--or as free as an income of eleven shillings and sixpence a day will
permit a man to be. Under such circumstances, I naturally gravitated to
London, that great cesspool into which all the loungers and idlers of
the Empire are irresistibly drained. There I stayed for some time at
a private hotel in the Strand, leading a comfortless, meaningless
existence, and spending such money as I had, considerably more freely
than I ought. So alarming did the state of my finances become, that
I soon realized that I must either leave the metropolis and rusticate
somewhere in the country, or that I must make a complete alteration in
my style of living. Choosing the latter alternative, I began by making
up my mind to leave the hotel, and to take up my quarters in some less
pretentious and less expensive domicile.

On the very day that I had come to this conclusion, I was standing at
the Criterion Bar, when some one tapped me on the shoulder, and turning
round I recognized young Stamford, who had been a dresser under me at
Barts. The sight of a friendly face in the great wilderness of London is
a pleasant thing indeed to a lonely man. In old days Stamford had never
been a particular crony of mine, but now I hailed him with enthusiasm,
and he, in his turn, appeared to be delighted to see me. In the
exuberance of my joy, I asked him to lunch with me at the Holborn, and
we started off together in a hansom.

"Whatever have you been doing with yourself, Watson?" he asked in
undisguised wonder, as we rattled through the crowded London streets.
"You are as thin as a lath and as brown as a nut."

I gave him a short sketch of my adventures, and had hardly concluded it
by the time that we reached our destination.

"Poor devil!" he said, commiseratingly, after he had listened to my
misfortunes. "What are you up to now?"

"Looking for lodgings." [3] I answered. "Trying to solve the problem
as to whether it is possible to get comfortable rooms at a reasonable
price."

"That's a strange thing," remarked my companion; "you are the second man
to-day that has used that expression to me."

"And who was the first?" I asked.

"A fellow who is working at the chemical laboratory up at the hospital.
He was bemoaning himself this morning because he could not get someone
to go halves with him in some nice rooms which he had found, and which
were too much for his purse."

"By Jove!" I cried, "if he really wants someone to share the rooms and
the expense, I am the very man for him. I should prefer having a partner
to being alone."

Young Stamford looked rather strangely at me over his wine-glass. "You
don't know Sherlock Holmes yet," he said; "perhaps you would not care
for him as a constant companion."

"Why, what is there against him?"

"Oh, I didn't say there was anything against him. He is a little queer
in his ideas--an enthusiast in some branches of science. As far as I
know he is a decent fellow enough."

"A medical student, I suppose?" said I.

"No--I have no idea what he intends to go in for. I believe he is well
up in anatomy, and he is a first-class chemist; but, as far as I know,
he has never taken out any systematic medical classes. His studies are
very desultory and eccentric, but he has amassed a lot of out-of-the way
knowledge which would astonish his professors."

"Did you never ask him what he was going in for?" I asked.

"No; he is not a man that it is easy to draw out, though he can be
communicative enough when the fancy seizes him."

"I should like to meet him," I said. "If I am to lodge with anyone, I
should prefer a man of studious and quiet habits. I am not strong
enough yet to stand much noise or excitement. I had enough of both in
Afghanistan to last me for the remainder of my natural existence. How
could I meet this friend of yours?"

"He is sure to be at the laboratory," returned my companion. "He either
avoids the place for weeks, or else he works there from morning to
night. If you like, we shall drive round together after luncheon."

"Certainly," I answered, and the conversation drifted away into other
channels.

As we made our way to the hospital after leaving the Holborn, Stamford
gave me a few more particulars about the gentleman whom I proposed to
take as a fellow-lodger.

"You mustn't blame me if you don't get on with him," he said; "I know
nothing more of him than I have learned from meeting him occasionally in
the laboratory. You proposed this arrangement, so you must not hold me
responsible."

"If we don't get on it will be easy to part company," I answered. "It
seems to me, Stamford," I added, looking hard at my companion, "that you
have some reason for washing your hands of the matter. Is this fellow's
temper so formidable, or what is it? Don't be mealy-mouthed about it."

"It is not easy to express the inexpressible," he answered with a laugh.
"Holmes is a little too scientific for my tastes--it approaches to
cold-bloodedness. I could imagine his giving a friend a little pinch of
the latest vegetable alkaloid, not out of malevolence, you understand,
but simply out of a spirit of inquiry in order to have an accurate idea
of the effects. To do him justice, I think that he would take it himself
with the same readiness. He appears to have a passion for definite and
exact knowledge."

"Very right too."

"Yes, but it may be pushed to excess. When it comes to beating the
subjects in the dissecting-rooms with a stick, it is certainly taking
rather a bizarre shape."

"Beating the subjects!"

"Yes, to verify how far bruises may be produced after death. I saw him
at it with my own eyes."

"And yet you say he is not a medical student?"