Compare commits

..

4 Commits

Author SHA1 Message Date
Andrew Gallant
acd20a803c ripgrep: remove old code 2018-08-06 10:52:53 -04:00
Andrew Gallant
f16f9cedf1 ci: test libripgrep 2018-08-06 10:52:53 -04:00
Andrew Gallant
7eb34e8b32 ripgrep: migrate to libripgrep 2018-08-06 10:52:53 -04:00
Andrew Gallant
b3769ef8f1 libripgrep: initial commit introducing libripgrep
libripgrep is not any one library, but rather, a collection of libraries
that roughly separate the following key distinct phases in a grep
implementation:

  1. Pattern matching (e.g., by a regex engine).
  2. Searching a file using a pattern matcher.
  3. Printing results.

Ultimately, both (1) and (3) are defined by de-coupled interfaces, of
which there may be multiple implementations. Namely, (1) is satisfied by
the `Matcher` trait in the `grep-matcher` crate and (3) is satisfied by
the `Sink` trait in the `grep2` crate. The searcher (2) ties everything
together and finds results using a matcher and reports those results
using a `Sink` implementation.
2018-08-06 10:52:53 -04:00
85 changed files with 3803 additions and 8785 deletions

View File

@@ -17,8 +17,6 @@ addons:
# Needed for testing decompression search.
- xz-utils
- liblz4-tool
# For building MUSL static builds on Linux.
- musl-tools
matrix:
fast_finish: true
include:
@@ -62,13 +60,13 @@ matrix:
# Minimum Rust supported channel. We enable these to make sure ripgrep
# continues to work on the advertised minimum Rust version.
- os: linux
rust: 1.28.0
rust: 1.23.0
env: TARGET=x86_64-unknown-linux-gnu
- os: linux
rust: 1.28.0
rust: 1.23.0
env: TARGET=x86_64-unknown-linux-musl
- os: linux
rust: 1.28.0
rust: 1.23.0
env: TARGET=arm-unknown-linux-gnueabihf GCC_VERSION=4.8
addons:
apt:
@@ -101,6 +99,7 @@ branches:
only:
# Pushes and PR to the master branch
- master
- ag/libripgrep
# Ruby regex to match tags. Required, or travis won't trigger deploys when
# a new tag is pushed.
- /^\d+\.\d+\.\d+.*$/

View File

@@ -1,88 +1,3 @@
0.10.0 (TBD)
============
This is a new minor version release of ripgrep that contains some major new
features, a huge number of bug fixes, and is the first release based on
libripgrep. The entirety of ripgrep's core search and printing code has been
rewritten and generalized so that anyone can make use of it.
Major new features include PCRE2 support, multi-line search and a JSON output
format.
**BREAKING CHANGES**:
* The minimum version required to compile Rust has now changed to track the
latest stable version of Rust. Patch releases will continue to compile with
the same version of Rust as the previous patch release, but new minor
versions will use the current stable version of the Rust compile as its
minimum supported version.
* The match semantics of `-w/--word-regexp` have changed slightly. They used
to be `\b(?:<your pattern>)\b`, but now it's
`(?:^|\W)(?:<your pattern>)(?:$|\W)`. This matches the behavior of GNU grep
and is believed to be closer to the intended semantics of the flag. See
[#389](https://github.com/BurntSushi/ripgrep/issues/389) for more details.
Feature enhancements:
* [FEATURE #162](https://github.com/BurntSushi/ripgrep/issues/162):
libripgrep is now a thing. The primary crate is
[`grep`](https://docs.rs/grep).
* [FEATURE #176](https://github.com/BurntSushi/ripgrep/issues/176):
Add `-U/--multiline` flag that permits matching over multiple lines.
* [FEATURE #188](https://github.com/BurntSushi/ripgrep/issues/188):
Add `-P/--pcre2` flag that gives support for look-around and backreferences.
* [FEATURE #244](https://github.com/BurntSushi/ripgrep/issues/244):
Add `--json` flag that prints results in a JSON Lines format.
* [FEATURE #321](https://github.com/BurntSushi/ripgrep/issues/321):
Add `--one-file-system` flag to skip directories on different file systems.
* [FEATURE #404](https://github.com/BurntSushi/ripgrep/issues/404):
Add `--sort` and `--sortr` flag for more sorting. Deprecate `--sort-files`.
* [FEATURE #416](https://github.com/BurntSushi/ripgrep/issues/416):
Add `--crlf` flag to permit `$` to work with carriage returns on Windows.
* [FEATURE #917](https://github.com/BurntSushi/ripgrep/issues/917):
The `--trim` flag strips prefix whitespace from all lines printed.
* [FEATURE #993](https://github.com/BurntSushi/ripgrep/issues/993):
Add `--null-data` flag, which makes ripgrep use NUL as a line terminator.
* [FEATURE #997](https://github.com/BurntSushi/ripgrep/issues/997):
The `--passthru` flag now works with the `--replace` flag.
* [FEATURE #1038-1](https://github.com/BurntSushi/ripgrep/issues/1038):
Add `--line-buffered` and `--block-buffered` for forcing a buffer strategy.
* [FEATURE #1038-2](https://github.com/BurntSushi/ripgrep/issues/1038):
Add `--pre-glob` for filtering files through the `--pre` flag.
Bug fixes:
* [BUG #2](https://github.com/BurntSushi/ripgrep/issues/2):
Searching with non-zero context can now use memory maps if appropriate.
* [BUG #200](https://github.com/BurntSushi/ripgrep/issues/200):
ripgrep will now stop correctly when its output pipe is closed.
* [BUG #389](https://github.com/BurntSushi/ripgrep/issues/389):
The `-w/--word-regexp` flag now works more intuitively.
* [BUG #643](https://github.com/BurntSushi/ripgrep/issues/643):
Detection of readable stdin has improved on Windows.
* [BUG #441](https://github.com/BurntSushi/ripgrep/issues/441),
[BUG #690](https://github.com/BurntSushi/ripgrep/issues/690),
[BUG #980](https://github.com/BurntSushi/ripgrep/issues/980):
Matching empty lines now works correctly in several corner cases.
* [BUG #764](https://github.com/BurntSushi/ripgrep/issues/764):
Color escape sequences now coalesce, which reduces output size.
* [BUG #842](https://github.com/BurntSushi/ripgrep/issues/842):
Add man page to binary Debian package.
* [BUG #922](https://github.com/BurntSushi/ripgrep/issues/922):
ripgrep is now more robust with respect to memory maps failing.
* [BUG #937](https://github.com/BurntSushi/ripgrep/issues/937):
Color escape sequences are no longer emitted for empty matches.
* [BUG #940](https://github.com/BurntSushi/ripgrep/issues/940):
Context from the `--passthru` flag should not impact process exit status.
* [BUG #984](https://github.com/BurntSushi/ripgrep/issues/984):
Fixes bug in `ignore` crate where first path was always treated as a symlink.
* [BUG #990](https://github.com/BurntSushi/ripgrep/issues/990):
Read stderr asynchronously when running a process.
* [BUG #1013](https://github.com/BurntSushi/ripgrep/issues/1013):
Add compile time and runtime CPU features to `--version` output.
* [BUG #1028](https://github.com/BurntSushi/ripgrep/pull/1028):
Don't complete bare pattern after `-f` in zsh.
0.9.0 (2018-08-03)
==================
This is a new minor version release of ripgrep that contains some minor new
@@ -116,7 +31,7 @@ multi-line search support and a JSON output format.
Feature enhancements:
* Added or improved file type filtering for Android, Bazel, Fuchsia, Haskell,
* Added or improved file type filtering for Android, Bazel, Fuschia, Haskell,
Java and Puppet.
* [FEATURE #411](https://github.com/BurntSushi/ripgrep/issues/411):
Add a `--stats` flag, which emits aggregate statistics after search results.

497
Cargo.lock generated
View File

@@ -1,17 +1,17 @@
[[package]]
name = "aho-corasick"
version = "0.6.8"
version = "0.6.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"memchr 2.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "arrayvec"
version = "0.4.7"
name = "ansi_term"
version = "0.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"nodrop 0.1.12 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -19,7 +19,7 @@ name = "atty"
version = "0.2.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.43 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.42 (registry+https://github.com/rust-lang/crates.io-index)",
"termion 1.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -29,18 +29,18 @@ name = "base64"
version = "0.9.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"byteorder 1.2.6 (registry+https://github.com/rust-lang/crates.io-index)",
"byteorder 1.2.3 (registry+https://github.com/rust-lang/crates.io-index)",
"safemem 0.2.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "bitflags"
version = "1.0.4"
version = "1.0.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "bytecount"
version = "0.3.2"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"simd 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -48,17 +48,12 @@ dependencies = [
[[package]]
name = "byteorder"
version = "1.2.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "cc"
version = "1.0.23"
version = "1.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "cfg-if"
version = "0.1.5"
version = "0.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
@@ -66,65 +61,39 @@ name = "clap"
version = "2.32.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"bitflags 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)",
"ansi_term 0.11.0 (registry+https://github.com/rust-lang/crates.io-index)",
"atty 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)",
"bitflags 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"strsim 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)",
"textwrap 0.10.0 (registry+https://github.com/rust-lang/crates.io-index)",
"unicode-width 0.1.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "cloudabi"
version = "0.0.3"
name = "crossbeam"
version = "0.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"bitflags 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "crossbeam-channel"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"crossbeam-epoch 0.5.2 (registry+https://github.com/rust-lang/crates.io-index)",
"crossbeam-utils 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
"parking_lot 0.6.4 (registry+https://github.com/rust-lang/crates.io-index)",
"rand 0.5.5 (registry+https://github.com/rust-lang/crates.io-index)",
"smallvec 0.6.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "crossbeam-epoch"
version = "0.5.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"arrayvec 0.4.7 (registry+https://github.com/rust-lang/crates.io-index)",
"cfg-if 0.1.5 (registry+https://github.com/rust-lang/crates.io-index)",
"crossbeam-utils 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)",
"lazy_static 1.1.0 (registry+https://github.com/rust-lang/crates.io-index)",
"memoffset 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)",
"scopeguard 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "crossbeam-utils"
version = "0.5.0"
name = "dtoa"
version = "0.4.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "encoding_rs"
version = "0.8.6"
version = "0.8.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"cfg-if 0.1.5 (registry+https://github.com/rust-lang/crates.io-index)",
"cfg-if 0.1.4 (registry+https://github.com/rust-lang/crates.io-index)",
"simd 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "encoding_rs_io"
version = "0.1.2"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"encoding_rs 0.8.6 (registry+https://github.com/rust-lang/crates.io-index)",
"encoding_rs 0.8.4 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -137,7 +106,7 @@ name = "fuchsia-zircon"
version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"bitflags 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)",
"bitflags 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"fuchsia-zircon-sys 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -155,115 +124,89 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
name = "globset"
version = "0.4.1"
dependencies = [
"aho-corasick 0.6.8 (registry+https://github.com/rust-lang/crates.io-index)",
"aho-corasick 0.6.6 (registry+https://github.com/rust-lang/crates.io-index)",
"fnv 1.0.6 (registry+https://github.com/rust-lang/crates.io-index)",
"glob 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.4 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "grep"
version = "0.2.0"
dependencies = [
"atty 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)",
"grep-cli 0.1.0",
"grep-matcher 0.1.0",
"grep-pcre2 0.1.0",
"grep-printer 0.1.0",
"grep-regex 0.1.0",
"grep-searcher 0.1.0",
"termcolor 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"walkdir 2.2.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "grep-cli"
version = "0.1.0"
dependencies = [
"atty 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)",
"globset 0.4.1",
"lazy_static 1.1.0 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.4 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)",
"same-file 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"termcolor 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi-util 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
"grep-matcher 0.0.1",
"grep-printer 0.0.1",
"grep-regex 0.0.1",
"grep-searcher 0.0.1",
]
[[package]]
name = "grep-matcher"
version = "0.1.0"
version = "0.0.1"
dependencies = [
"memchr 2.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "grep-pcre2"
version = "0.1.0"
dependencies = [
"grep-matcher 0.1.0",
"pcre2 0.1.0 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "grep-printer"
version = "0.1.0"
version = "0.0.1"
dependencies = [
"base64 0.9.2 (registry+https://github.com/rust-lang/crates.io-index)",
"grep-matcher 0.1.0",
"grep-regex 0.1.0",
"grep-searcher 0.1.0",
"serde 1.0.75 (registry+https://github.com/rust-lang/crates.io-index)",
"serde_derive 1.0.75 (registry+https://github.com/rust-lang/crates.io-index)",
"serde_json 1.0.26 (registry+https://github.com/rust-lang/crates.io-index)",
"termcolor 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"grep-matcher 0.0.1",
"grep-regex 0.0.1",
"grep-searcher 0.0.1",
"log 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)",
"serde 1.0.70 (registry+https://github.com/rust-lang/crates.io-index)",
"serde_derive 1.0.70 (registry+https://github.com/rust-lang/crates.io-index)",
"serde_json 1.0.24 (registry+https://github.com/rust-lang/crates.io-index)",
"termcolor 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "grep-regex"
version = "0.1.0"
version = "0.0.1"
dependencies = [
"grep-matcher 0.1.0",
"log 0.4.4 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)",
"grep-matcher 0.0.1",
"log 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"regex-syntax 0.6.2 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.3.6 (registry+https://github.com/rust-lang/crates.io-index)",
"utf8-ranges 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
"utf8-ranges 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "grep-searcher"
version = "0.1.0"
version = "0.0.1"
dependencies = [
"bytecount 0.3.2 (registry+https://github.com/rust-lang/crates.io-index)",
"encoding_rs 0.8.6 (registry+https://github.com/rust-lang/crates.io-index)",
"encoding_rs_io 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)",
"grep-matcher 0.1.0",
"grep-regex 0.1.0",
"log 0.4.4 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"bytecount 0.3.1 (registry+https://github.com/rust-lang/crates.io-index)",
"encoding_rs 0.8.4 (registry+https://github.com/rust-lang/crates.io-index)",
"encoding_rs_io 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
"grep-matcher 0.0.1",
"grep-regex 0.0.1",
"log 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"memmap 0.6.2 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "ignore"
version = "0.4.3"
dependencies = [
"crossbeam-channel 0.2.4 (registry+https://github.com/rust-lang/crates.io-index)",
"crossbeam 0.3.2 (registry+https://github.com/rust-lang/crates.io-index)",
"globset 0.4.1",
"lazy_static 1.1.0 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.4 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)",
"same-file 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"lazy_static 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"same-file 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"tempdir 0.3.7 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.3.6 (registry+https://github.com/rust-lang/crates.io-index)",
"walkdir 2.2.5 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi-util 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
"walkdir 2.1.4 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -273,40 +216,28 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "lazy_static"
version = "1.1.0"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"version_check 0.1.4 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "libc"
version = "0.2.43"
version = "0.2.42"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "lock_api"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"owning_ref 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
"scopeguard 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "log"
version = "0.4.4"
version = "0.4.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"cfg-if 0.1.5 (registry+https://github.com/rust-lang/crates.io-index)",
"cfg-if 0.1.4 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "memchr"
version = "2.0.2"
version = "2.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.43 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.42 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -314,86 +245,21 @@ name = "memmap"
version = "0.6.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.43 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.42 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "memoffset"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "nodrop"
version = "0.1.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "num_cpus"
version = "1.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.43 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.42 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "owning_ref"
version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"stable_deref_trait 1.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "parking_lot"
version = "0.6.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"lock_api 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)",
"parking_lot_core 0.3.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "parking_lot_core"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.43 (registry+https://github.com/rust-lang/crates.io-index)",
"rand 0.5.5 (registry+https://github.com/rust-lang/crates.io-index)",
"smallvec 0.6.5 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "pcre2"
version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"cc 1.0.23 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.43 (registry+https://github.com/rust-lang/crates.io-index)",
"pcre2-sys 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
"pkg-config 0.3.14 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.3.6 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "pcre2-sys"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"cc 1.0.23 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.43 (registry+https://github.com/rust-lang/crates.io-index)",
"pkg-config 0.3.14 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "pkg-config"
version = "0.3.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "proc-macro2"
version = "0.4.15"
version = "0.4.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"unicode-xid 0.1.0 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -401,39 +267,22 @@ dependencies = [
[[package]]
name = "quote"
version = "0.6.8"
version = "0.6.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"proc-macro2 0.4.15 (registry+https://github.com/rust-lang/crates.io-index)",
"proc-macro2 0.4.9 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "rand"
version = "0.4.3"
version = "0.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"fuchsia-zircon 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.43 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.42 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "rand"
version = "0.5.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"cloudabi 0.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"fuchsia-zircon 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.43 (registry+https://github.com/rust-lang/crates.io-index)",
"rand_core 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "rand_core"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "redox_syscall"
version = "0.1.40"
@@ -449,14 +298,14 @@ dependencies = [
[[package]]
name = "regex"
version = "1.0.4"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"aho-corasick 0.6.8 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"aho-corasick 0.6.6 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex-syntax 0.6.2 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.3.6 (registry+https://github.com/rust-lang/crates.io-index)",
"utf8-ranges 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
"utf8-ranges 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -479,24 +328,20 @@ dependencies = [
name = "ripgrep"
version = "0.9.0"
dependencies = [
"atty 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)",
"clap 2.32.0 (registry+https://github.com/rust-lang/crates.io-index)",
"globset 0.4.1",
"grep 0.2.0",
"ignore 0.4.3",
"lazy_static 1.1.0 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.4 (registry+https://github.com/rust-lang/crates.io-index)",
"lazy_static 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)",
"num_cpus 1.8.0 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)",
"serde 1.0.75 (registry+https://github.com/rust-lang/crates.io-index)",
"serde_derive 1.0.75 (registry+https://github.com/rust-lang/crates.io-index)",
"serde_json 1.0.26 (registry+https://github.com/rust-lang/crates.io-index)",
"termcolor 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"same-file 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"termcolor 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "ryu"
version = "0.2.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "safemem"
version = "0.2.0"
@@ -504,40 +349,35 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "same-file"
version = "1.0.3"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"winapi-util 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "scopeguard"
version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "serde"
version = "1.0.75"
version = "1.0.70"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "serde_derive"
version = "1.0.75"
version = "1.0.70"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"proc-macro2 0.4.15 (registry+https://github.com/rust-lang/crates.io-index)",
"quote 0.6.8 (registry+https://github.com/rust-lang/crates.io-index)",
"syn 0.14.9 (registry+https://github.com/rust-lang/crates.io-index)",
"proc-macro2 0.4.9 (registry+https://github.com/rust-lang/crates.io-index)",
"quote 0.6.3 (registry+https://github.com/rust-lang/crates.io-index)",
"syn 0.14.4 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "serde_json"
version = "1.0.26"
version = "1.0.24"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"dtoa 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)",
"itoa 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)",
"ryu 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)",
"serde 1.0.75 (registry+https://github.com/rust-lang/crates.io-index)",
"serde 1.0.70 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -545,19 +385,6 @@ name = "simd"
version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "smallvec"
version = "0.6.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"unreachable 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "stable_deref_trait"
version = "1.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "strsim"
version = "0.7.0"
@@ -565,11 +392,11 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "syn"
version = "0.14.9"
version = "0.14.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"proc-macro2 0.4.15 (registry+https://github.com/rust-lang/crates.io-index)",
"quote 0.6.8 (registry+https://github.com/rust-lang/crates.io-index)",
"proc-macro2 0.4.9 (registry+https://github.com/rust-lang/crates.io-index)",
"quote 0.6.3 (registry+https://github.com/rust-lang/crates.io-index)",
"unicode-xid 0.1.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -578,16 +405,16 @@ name = "tempdir"
version = "0.3.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"rand 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)",
"rand 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)",
"remove_dir_all 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "termcolor"
version = "1.0.3"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"wincolor 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"wincolor 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -595,7 +422,7 @@ name = "termion"
version = "1.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.43 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.42 (registry+https://github.com/rust-lang/crates.io-index)",
"redox_syscall 0.1.40 (registry+https://github.com/rust-lang/crates.io-index)",
"redox_termios 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -610,10 +437,11 @@ dependencies = [
[[package]]
name = "thread_local"
version = "0.3.6"
version = "0.3.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"lazy_static 1.1.0 (registry+https://github.com/rust-lang/crates.io-index)",
"lazy_static 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"unreachable 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -641,12 +469,7 @@ dependencies = [
[[package]]
name = "utf8-ranges"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "version_check"
version = "0.1.4"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
@@ -656,12 +479,11 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "walkdir"
version = "2.2.5"
version = "2.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"same-file 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)",
"same-file 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi-util 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -678,14 +500,6 @@ name = "winapi-i686-pc-windows-gnu"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "winapi-util"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "winapi-x86_64-pc-windows-gnu"
version = "0.4.0"
@@ -693,87 +507,66 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "wincolor"
version = "1.0.1"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi-util 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
[metadata]
"checksum aho-corasick 0.6.8 (registry+https://github.com/rust-lang/crates.io-index)" = "68f56c7353e5a9547cbd76ed90f7bb5ffc3ba09d4ea9bd1d8c06c8b1142eeb5a"
"checksum arrayvec 0.4.7 (registry+https://github.com/rust-lang/crates.io-index)" = "a1e964f9e24d588183fcb43503abda40d288c8657dfc27311516ce2f05675aef"
"checksum aho-corasick 0.6.6 (registry+https://github.com/rust-lang/crates.io-index)" = "c1c6d463cbe7ed28720b5b489e7c083eeb8f90d08be2a0d6bb9e1ffea9ce1afa"
"checksum ansi_term 0.11.0 (registry+https://github.com/rust-lang/crates.io-index)" = "ee49baf6cb617b853aa8d93bf420db2383fab46d314482ca2803b40d5fde979b"
"checksum atty 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)" = "9a7d5b8723950951411ee34d271d99dddcc2035a16ab25310ea2c8cfd4369652"
"checksum base64 0.9.2 (registry+https://github.com/rust-lang/crates.io-index)" = "85415d2594767338a74a30c1d370b2f3262ec1b4ed2d7bba5b3faf4de40467d9"
"checksum bitflags 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)" = "228047a76f468627ca71776ecdebd732a3423081fcf5125585bcd7c49886ce12"
"checksum bytecount 0.3.2 (registry+https://github.com/rust-lang/crates.io-index)" = "f861d9ce359f56dbcb6e0c2a1cb84e52ad732cadb57b806adeb3c7668caccbd8"
"checksum byteorder 1.2.6 (registry+https://github.com/rust-lang/crates.io-index)" = "90492c5858dd7d2e78691cfb89f90d273a2800fc11d98f60786e5d87e2f83781"
"checksum cc 1.0.23 (registry+https://github.com/rust-lang/crates.io-index)" = "c37f0efaa4b9b001fa6f02d4b644dee4af97d3414df07c51e3e4f015f3a3e131"
"checksum cfg-if 0.1.5 (registry+https://github.com/rust-lang/crates.io-index)" = "0c4e7bb64a8ebb0d856483e1e682ea3422f883c5f5615a90d51a2c82fe87fdd3"
"checksum bitflags 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "d0c54bb8f454c567f21197eefcdbf5679d0bd99f2ddbe52e84c77061952e6789"
"checksum bytecount 0.3.1 (registry+https://github.com/rust-lang/crates.io-index)" = "882585cd7ec84e902472df34a5e01891202db3bf62614e1f0afe459c1afcf744"
"checksum byteorder 1.2.3 (registry+https://github.com/rust-lang/crates.io-index)" = "74c0b906e9446b0a2e4f760cdb3fa4b2c48cdc6db8766a845c54b6ff063fd2e9"
"checksum cfg-if 0.1.4 (registry+https://github.com/rust-lang/crates.io-index)" = "efe5c877e17a9c717a0bf3613b2709f723202c4e4675cc8f12926ded29bcb17e"
"checksum clap 2.32.0 (registry+https://github.com/rust-lang/crates.io-index)" = "b957d88f4b6a63b9d70d5f454ac8011819c6efa7727858f458ab71c756ce2d3e"
"checksum cloudabi 0.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ddfc5b9aa5d4507acaf872de71051dfd0e309860e88966e1051e462a077aac4f"
"checksum crossbeam-channel 0.2.4 (registry+https://github.com/rust-lang/crates.io-index)" = "6c0a94250b0278d7fc5a894c3d276b11ea164edc8bf8feb10ca1ea517b44a649"
"checksum crossbeam-epoch 0.5.2 (registry+https://github.com/rust-lang/crates.io-index)" = "30fecfcac6abfef8771151f8be4abc9e4edc112c2bcb233314cafde2680536e9"
"checksum crossbeam-utils 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)" = "677d453a17e8bd2b913fa38e8b9cf04bcdbb5be790aa294f2389661d72036015"
"checksum encoding_rs 0.8.6 (registry+https://github.com/rust-lang/crates.io-index)" = "2a91912d6f37c6a8fef8a2316a862542d036f13c923ad518b5aca7bcaac7544c"
"checksum encoding_rs_io 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)" = "f222ff554d6e172f3569a2d7d0fd8061d54215984ef67b24ce031c1fcbf2c9b3"
"checksum crossbeam 0.3.2 (registry+https://github.com/rust-lang/crates.io-index)" = "24ce9782d4d5c53674646a6a4c1863a21a8fc0cb649b3c94dfc16e45071dea19"
"checksum dtoa 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)" = "6d301140eb411af13d3115f9a562c85cc6b541ade9dfa314132244aaee7489dd"
"checksum encoding_rs 0.8.4 (registry+https://github.com/rust-lang/crates.io-index)" = "88a1b66a0d28af4b03a8c8278c6dcb90e6e600d89c14500a9e7a02e64b9ee3ac"
"checksum encoding_rs_io 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "ad0ffe753ba194ef1bc070e8d61edaadb1536c05e364fc9178ca6cbde10922c4"
"checksum fnv 1.0.6 (registry+https://github.com/rust-lang/crates.io-index)" = "2fad85553e09a6f881f739c29f0b00b0f01357c743266d478b68951ce23285f3"
"checksum fuchsia-zircon 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)" = "2e9763c69ebaae630ba35f74888db465e49e259ba1bc0eda7d06f4a067615d82"
"checksum fuchsia-zircon-sys 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)" = "3dcaa9ae7725d12cdb85b3ad99a434db70b468c09ded17e012d86b5c1010f7a7"
"checksum glob 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)" = "8be18de09a56b60ed0edf84bc9df007e30040691af7acd1c41874faac5895bfb"
"checksum itoa 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)" = "5adb58558dcd1d786b5f0bd15f3226ee23486e24b7b58304b60f64dc68e62606"
"checksum lazy_static 1.1.0 (registry+https://github.com/rust-lang/crates.io-index)" = "ca488b89a5657b0a2ecd45b95609b3e848cf1755da332a0da46e2b2b1cb371a7"
"checksum libc 0.2.43 (registry+https://github.com/rust-lang/crates.io-index)" = "76e3a3ef172f1a0b9a9ff0dd1491ae5e6c948b94479a3021819ba7d860c8645d"
"checksum lock_api 0.1.3 (registry+https://github.com/rust-lang/crates.io-index)" = "949826a5ccf18c1b3a7c3d57692778d21768b79e46eb9dd07bfc4c2160036c54"
"checksum log 0.4.4 (registry+https://github.com/rust-lang/crates.io-index)" = "cba860f648db8e6f269df990180c2217f333472b4a6e901e97446858487971e2"
"checksum memchr 2.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "a3b4142ab8738a78c51896f704f83c11df047ff1bda9a92a661aa6361552d93d"
"checksum lazy_static 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "fb497c35d362b6a331cfd94956a07fc2c78a4604cdbee844a81170386b996dd3"
"checksum libc 0.2.42 (registry+https://github.com/rust-lang/crates.io-index)" = "b685088df2b950fccadf07a7187c8ef846a959c142338a48f9dc0b94517eb5f1"
"checksum log 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)" = "61bd98ae7f7b754bc53dca7d44b604f733c6bba044ea6f41bc8d89272d8161d2"
"checksum memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "796fba70e76612589ed2ce7f45282f5af869e0fdd7cc6199fa1aa1f1d591ba9d"
"checksum memmap 0.6.2 (registry+https://github.com/rust-lang/crates.io-index)" = "e2ffa2c986de11a9df78620c01eeaaf27d94d3ff02bf81bfcca953102dd0c6ff"
"checksum memoffset 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "0f9dc261e2b62d7a622bf416ea3c5245cdd5d9a7fcc428c0d06804dfce1775b3"
"checksum nodrop 0.1.12 (registry+https://github.com/rust-lang/crates.io-index)" = "9a2228dca57108069a5262f2ed8bd2e82496d2e074a06d1ccc7ce1687b6ae0a2"
"checksum num_cpus 1.8.0 (registry+https://github.com/rust-lang/crates.io-index)" = "c51a3322e4bca9d212ad9a158a02abc6934d005490c054a2778df73a70aa0a30"
"checksum owning_ref 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)" = "cdf84f41639e037b484f93433aa3897863b561ed65c6e59c7073d7c561710f37"
"checksum parking_lot 0.6.4 (registry+https://github.com/rust-lang/crates.io-index)" = "f0802bff09003b291ba756dc7e79313e51cc31667e94afbe847def490424cde5"
"checksum parking_lot_core 0.3.0 (registry+https://github.com/rust-lang/crates.io-index)" = "06a2b6aae052309c2fd2161ef58f5067bc17bb758377a0de9d4b279d603fdd8a"
"checksum pcre2 0.1.0 (registry+https://github.com/rust-lang/crates.io-index)" = "0c16ec0e30c17f938a2da8ff970ad9a4100166d0538898dcc035b55c393cab54"
"checksum pcre2-sys 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "a9027f9474e4e13d3b965538aafcaebe48c803488ad76b3c97ef061a8324695f"
"checksum pkg-config 0.3.14 (registry+https://github.com/rust-lang/crates.io-index)" = "676e8eb2b1b4c9043511a9b7bea0915320d7e502b0a079fb03f9635a5252b18c"
"checksum proc-macro2 0.4.15 (registry+https://github.com/rust-lang/crates.io-index)" = "295af93acfb1d5be29c16ca5b3f82d863836efd9cb0c14fd83811eb9a110e452"
"checksum quote 0.6.8 (registry+https://github.com/rust-lang/crates.io-index)" = "dd636425967c33af890042c483632d33fa7a18f19ad1d7ea72e8998c6ef8dea5"
"checksum rand 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)" = "8356f47b32624fef5b3301c1be97e5944ecdd595409cc5da11d05f211db6cfbd"
"checksum rand 0.5.5 (registry+https://github.com/rust-lang/crates.io-index)" = "e464cd887e869cddcae8792a4ee31d23c7edd516700695608f5b98c67ee0131c"
"checksum rand_core 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "edecf0f94da5551fc9b492093e30b041a891657db7940ee221f9d2f66e82eef2"
"checksum proc-macro2 0.4.9 (registry+https://github.com/rust-lang/crates.io-index)" = "cccdc7557a98fe98453030f077df7f3a042052fae465bb61d2c2c41435cfd9b6"
"checksum quote 0.6.3 (registry+https://github.com/rust-lang/crates.io-index)" = "e44651a0dc4cdd99f71c83b561e221f714912d11af1a4dff0631f923d53af035"
"checksum rand 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)" = "eba5f8cb59cc50ed56be8880a5c7b496bfd9bd26394e176bc67884094145c2c5"
"checksum redox_syscall 0.1.40 (registry+https://github.com/rust-lang/crates.io-index)" = "c214e91d3ecf43e9a4e41e578973adeb14b474f2bee858742d127af75a0112b1"
"checksum redox_termios 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "7e891cfe48e9100a70a3b6eb652fef28920c117d366339687bd5576160db0f76"
"checksum regex 1.0.4 (registry+https://github.com/rust-lang/crates.io-index)" = "67d0301b0c6804eca7e3c275119d0b01ff3b7ab9258a65709e608a66312a1025"
"checksum regex 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "5bbbea44c5490a1e84357ff28b7d518b4619a159fed5d25f6c1de2d19cc42814"
"checksum regex-syntax 0.6.2 (registry+https://github.com/rust-lang/crates.io-index)" = "747ba3b235651f6e2f67dfa8bcdcd073ddb7c243cb21c442fc12395dfcac212d"
"checksum remove_dir_all 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "3488ba1b9a2084d38645c4c08276a1752dcbf2c7130d74f1569681ad5d2799c5"
"checksum ryu 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)" = "7153dd96dade874ab973e098cb62fcdbb89a03682e46b144fd09550998d4a4a7"
"checksum safemem 0.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "e27a8b19b835f7aea908818e871f5cc3a5a186550c30773be987e155e8163d8f"
"checksum same-file 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "10f7794e2fda7f594866840e95f5c5962e886e228e68b6505885811a94dd728c"
"checksum scopeguard 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)" = "94258f53601af11e6a49f722422f6e3425c52b06245a5cf9bc09908b174f5e27"
"checksum serde 1.0.75 (registry+https://github.com/rust-lang/crates.io-index)" = "22d340507cea0b7e6632900a176101fea959c7065d93ba555072da90aaaafc87"
"checksum serde_derive 1.0.75 (registry+https://github.com/rust-lang/crates.io-index)" = "234fc8b737737b148ccd625175fc6390f5e4dacfdaa543cb93a3430d984a9119"
"checksum serde_json 1.0.26 (registry+https://github.com/rust-lang/crates.io-index)" = "44dd2cfde475037451fa99b7e5df77aa3cfd1536575fa8e7a538ab36dcde49ae"
"checksum same-file 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "cfb6eded0b06a0b512c8ddbcf04089138c9b4362c2f696f3c3d76039d68f3637"
"checksum serde 1.0.70 (registry+https://github.com/rust-lang/crates.io-index)" = "0c3adf19c07af6d186d91dae8927b83b0553d07ca56cbf7f2f32560455c91920"
"checksum serde_derive 1.0.70 (registry+https://github.com/rust-lang/crates.io-index)" = "3525a779832b08693031b8ecfb0de81cd71cfd3812088fafe9a7496789572124"
"checksum serde_json 1.0.24 (registry+https://github.com/rust-lang/crates.io-index)" = "c3c6908c7b925cd6c590358a4034de93dbddb20c45e1d021931459fd419bf0e2"
"checksum simd 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)" = "ed3686dd9418ebcc3a26a0c0ae56deab0681e53fe899af91f5bbcee667ebffb1"
"checksum smallvec 0.6.5 (registry+https://github.com/rust-lang/crates.io-index)" = "153ffa32fd170e9944f7e0838edf824a754ec4c1fc64746fcc9fe1f8fa602e5d"
"checksum stable_deref_trait 1.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "dba1a27d3efae4351c8051072d619e3ade2820635c3958d826bfea39d59b54c8"
"checksum strsim 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)" = "bb4f380125926a99e52bc279241539c018323fab05ad6368b56f93d9369ff550"
"checksum syn 0.14.9 (registry+https://github.com/rust-lang/crates.io-index)" = "261ae9ecaa397c42b960649561949d69311f08eeaea86a65696e6e46517cf741"
"checksum syn 0.14.4 (registry+https://github.com/rust-lang/crates.io-index)" = "2beff8ebc3658f07512a413866875adddd20f4fd47b2a4e6c9da65cd281baaea"
"checksum tempdir 0.3.7 (registry+https://github.com/rust-lang/crates.io-index)" = "15f2b5fb00ccdf689e0149d1b1b3c03fead81c2b37735d812fa8bddbbf41b6d8"
"checksum termcolor 1.0.3 (registry+https://github.com/rust-lang/crates.io-index)" = "ff3bac0e465b59f194e7037ed404b0326e56ff234d767edc4c5cc9cd49e7a2c7"
"checksum termcolor 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "722426c4a0539da2c4ffd9b419d90ad540b4cff4a053be9069c908d4d07e2836"
"checksum termion 1.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "689a3bdfaab439fd92bc87df5c4c78417d3cbe537487274e9b0b2dce76e92096"
"checksum textwrap 0.10.0 (registry+https://github.com/rust-lang/crates.io-index)" = "307686869c93e71f94da64286f9a9524c0f308a9e1c87a583de8e9c9039ad3f6"
"checksum thread_local 0.3.6 (registry+https://github.com/rust-lang/crates.io-index)" = "c6b53e329000edc2b34dbe8545fd20e55a333362d0a321909685a19bd28c3f1b"
"checksum thread_local 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "279ef31c19ededf577bfd12dfae728040a21f635b06a24cd670ff510edd38963"
"checksum ucd-util 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "fd2be2d6639d0f8fe6cdda291ad456e23629558d466e2789d2c3e9892bda285d"
"checksum unicode-width 0.1.5 (registry+https://github.com/rust-lang/crates.io-index)" = "882386231c45df4700b275c7ff55b6f3698780a650026380e72dabe76fa46526"
"checksum unicode-xid 0.1.0 (registry+https://github.com/rust-lang/crates.io-index)" = "fc72304796d0818e357ead4e000d19c9c174ab23dc11093ac919054d20a6a7fc"
"checksum unreachable 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "382810877fe448991dfc7f0dd6e3ae5d58088fd0ea5e35189655f84e6814fa56"
"checksum utf8-ranges 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "fd70f467df6810094968e2fce0ee1bd0e87157aceb026a8c083bcf5e25b9efe4"
"checksum version_check 0.1.4 (registry+https://github.com/rust-lang/crates.io-index)" = "7716c242968ee87e5542f8021178248f267f295a5c4803beae8b8b7fd9bc6051"
"checksum utf8-ranges 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "662fab6525a98beff2921d7f61a39e7d59e0b425ebc7d0d9e66d316e55124122"
"checksum void 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "6a02e4885ed3bc0f2de90ea6dd45ebcbb66dacffe03547fadbb0eeae2770887d"
"checksum walkdir 2.2.5 (registry+https://github.com/rust-lang/crates.io-index)" = "af464bc7be7b785c7ac72e266a6b67c4c9070155606f51655a650a6686204e35"
"checksum walkdir 2.1.4 (registry+https://github.com/rust-lang/crates.io-index)" = "63636bd0eb3d00ccb8b9036381b526efac53caf112b7783b730ab3f8e44da369"
"checksum winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "773ef9dcc5f24b7d850d0ff101e542ff24c3b090a9768e03ff889fdef41f00fd"
"checksum winapi-i686-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)" = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6"
"checksum winapi-util 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "afc5508759c5bf4285e61feb862b6083c8480aec864fa17a81fdec6f69b461ab"
"checksum winapi-x86_64-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)" = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"
"checksum wincolor 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "561ed901ae465d6185fa7864d63fbd5720d0ef718366c9a4dc83cf6170d7e9ba"
"checksum wincolor 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "b9dc3aa9dcda98b5a16150c54619c1ead22e3d3a5d458778ae914be760aa981a"

View File

@@ -33,70 +33,42 @@ path = "tests/tests.rs"
[workspace]
members = [
"globset",
"grep",
"grep-cli",
"grep-matcher",
"grep-pcre2",
"grep-printer",
"grep-regex",
"grep-searcher",
"ignore",
"grep", "globset", "ignore",
"grep-matcher", "grep-printer", "grep-regex", "grep-searcher",
]
[dependencies]
atty = "0.2.11"
globset = { version = "0.4.0", path = "globset" }
grep = { version = "0.2.0", path = "grep" }
ignore = { version = "0.4.0", path = "ignore" }
lazy_static = "1"
log = "0.4"
num_cpus = "1"
regex = "1"
serde_json = "1"
same-file = "1"
termcolor = "1"
[dependencies.clap]
version = "2.32.0"
version = "2.29.4"
default-features = false
features = ["suggestions"]
features = ["suggestions", "color"]
[target.'cfg(windows)'.dependencies.winapi]
version = "0.3"
features = ["std", "winnt"]
[build-dependencies]
lazy_static = "1"
[build-dependencies.clap]
version = "2.32.0"
version = "2.29.4"
default-features = false
features = ["suggestions"]
[dev-dependencies]
serde = "1"
serde_derive = "1"
features = ["suggestions", "color"]
[features]
avx-accel = ["grep/avx-accel"]
simd-accel = ["grep/simd-accel"]
pcre2 = ["grep/pcre2"]
[profile.release]
debug = 1
[package.metadata.deb]
features = ["pcre2"]
assets = [
["target/release/rg", "usr/bin/", "755"],
["COPYING", "usr/share/doc/ripgrep/", "644"],
["LICENSE-MIT", "usr/share/doc/ripgrep/", "644"],
["UNLICENSE", "usr/share/doc/ripgrep/", "644"],
["CHANGELOG.md", "usr/share/doc/ripgrep/CHANGELOG", "644"],
["README.md", "usr/share/doc/ripgrep/README", "644"],
["FAQ.md", "usr/share/doc/ripgrep/FAQ", "644"],
# The man page is automatically generated by ripgrep's build process, so
# this file isn't actually commited. Instead, to create a dpkg, either
# create a deployment directory and copy the man page to it, or use the
# 'ci/build_deb.sh' script.
["deployment/rg.1", "usr/share/man/man1/rg.1", "644"],
]
extended-description = """\
ripgrep (rg) recursively searches your current directory for a regex pattern.
By default, ripgrep will respect your .gitignore and automatically skip hidden
files/directories and binary files.
"""
debug = true

332
FAQ.md
View File

@@ -16,7 +16,6 @@
* [How do I get around the regex size limit?](#size-limit)
* [How do I make the `-f/--file` flag faster?](#dfa-size)
* [How do I make the output look like The Silver Searcher's output?](#silver-searcher-output)
* [Why does ripgrep get slower when I enabled PCRE2 regexes?](#pcre2-slow)
* [When I run `rg`, why does it execute some other command?](#rg-other-cmd)
* [How do I create an alias for ripgrep on Windows?](#rg-alias-windows)
* [How do I create a PowerShell profile?](#powershell-profile)
@@ -158,37 +157,13 @@ tool. With that said,
How do I use lookaround and/or backreferences?
</h3>
ripgrep's default regex engine does not support lookaround or backreferences.
This is primarily because the default regex engine is implemented using finite
state machines in order to guarantee a linear worst case time complexity on all
inputs. Backreferences are not possible to implement in this paradigm, and
lookaround appears difficult to do efficiently.
This isn't currently possible. ripgrep uses finite automata to implement
regular expression search, and in turn, guarantees linear time searching on all
inputs. It is difficult to efficiently support lookaround and backreferences in
finite automata engines, so ripgrep does not provide these features.
However, ripgrep optionally supports using PCRE2 as the regex engine instead of
the default one based on finite state machines. You can enable PCRE2 with the
`-P/--pcre2` flag. For example, in the root of the ripgrep repo, you can easily
find all palindromes:
```
$ rg -P '(\w{10})\1'
tests/misc.rs
483: cmd.arg("--max-filesize").arg("44444444444444444444");
globset/src/glob.rs
1206: matches!(match7, "a*a*a*a*a*a*a*a*a", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
```
If your version of ripgrep doesn't support PCRE2, then you'll get an error
message when you try to use the `-P/--pcre2` flag:
```
$ rg -P '(\w{10})\1'
PCRE2 is not available in this build of ripgrep
```
Most of the releases distributed by the ripgrep project here on GitHub will
come bundled with PCRE2 enabled. If you installed ripgrep through a different
means (like your system's package manager), then please reach out to the
maintainer of that package to see whether it's possible to enable the PCRE2
If a production quality regular expression engine with these features is ever
written in Rust, then it is possible ripgrep will provide it as an opt-in
feature.
@@ -393,301 +368,6 @@ $ RIPGREP_CONFIG_PATH=$HOME/.config/ripgrep/rc rg foo
```
<h3 name="pcre2-slow">
Why does ripgrep get slower when I enable PCRE2 regexes?
</h3>
When you use the `--pcre2` (`-P` for short) flag, ripgrep will use the PCRE2
regex engine instead of the default. Both regex engines are quite fast,
but PCRE2 provides a number of additional features such as look-around and
backreferences that many enjoy using. This is largely because PCRE2 uses
a backtracking implementation where as the default regex engine uses a finite
automaton based implementation. The former provides the ability to add lots of
bells and whistles over the latter, but the latter executes with worst case
linear time complexity.
With that out of the way, if you've used `-P` with ripgrep, you may have
noticed that it can be slower. The reasons for why this is are quite complex,
and they are complex because the optimizations that ripgrep uses to implement
fast search are complex.
The task ripgrep has before it is somewhat simple; all it needs to do is search
a file for occurrences of some pattern and then print the lines containing
those occurrences. The problem lies in what is considered a valid match and how
exactly we read the bytes from a file.
In terms of what is considered a valid match, remember that ripgrep will only
report matches spanning a single line by default. The problem here is that
some patterns can match across multiple lines, and ripgrep needs to prevent
that from happening. For example, `foo\sbar` will match `foo\nbar`. The most
obvious way to achieve this is to read the data from a file, and then apply
the pattern search to that data for each line. The problem with this approach
is that it can be quite slow; it would be much faster to let the pattern
search across as much data as possible. It's faster because it gets rid of the
overhead of finding the boundaries of every line, and also because it gets rid
of the overhead of starting and stopping the pattern search for every single
line. (This is operating under the general assumption that matching lines are
much rarer than non-matching lines.)
It turns out that we can use the faster approach by applying a very simple
restriction to the pattern: *statically prevent* the pattern from matching
through a `\n` character. Namely, when given a pattern like `foo\sbar`,
ripgrep will remove `\n` from the `\s` character class automatically. In some
cases, a simple removal is not so easy. For example, ripgrep will return an
error when your pattern includes a `\n` literal:
```
$ rg '\n'
the literal '"\n"' is not allowed in a regex
```
So what does this have to do with PCRE2? Well, ripgrep's default regex engine
exposes APIs for doing syntactic analysis on the pattern in a way that makes
it quite easy to strip `\n` from the pattern (or otherwise detect it and report
an error if stripping isn't possible). PCRE2 seemingly does not provide a
similar API, so ripgrep does not do any stripping when PCRE2 is enabled. This
forces ripgrep to use the "slow" search strategy of searching each line
individually.
OK, so if enabling PCRE2 slows down the default method of searching because it
forces matches to be limited to a single line, then why is PCRE2 also sometimes
slower when performing multiline searches? Well, that's because there are
*multiple* reasons why using PCRE2 in ripgrep can be slower than the default
regex engine. This time, blame PCRE2's Unicode support, which ripgrep enables
by default. In particular, PCRE2 cannot simultaneously enable Unicode support
and search arbitrary data. That is, when PCRE2's Unicode support is enabled,
the data **must** be valid UTF-8 (to do otherwise is to invoke undefined
behavior). This is in contrast to ripgrep's default regex engine, which can
enable Unicode support and still search arbitrary data. ripgrep's default
regex engine simply won't match invalid UTF-8 for a pattern that can otherwise
only match valid UTF-8. Why doesn't PCRE2 do the same? This author isn't
familiar with its internals, so we can't comment on it here.
The bottom line here is that we can't enable PCRE2's Unicode support without
simultaneously incurring a performance penalty for ensuring that we are
searching valid UTF-8. In particular, ripgrep will transcode the contents
of each file to UTF-8 while replacing invalid UTF-8 data with the Unicode
replacement codepoint. ripgrep then disables PCRE2's own internal UTF-8
checking, since we've guaranteed the data we hand it will be valid UTF-8. The
reason why ripgrep takes this approach is because if we do hand PCRE2 invalid
UTF-8, then it will report a match error if it comes across an invalid UTF-8
sequence. This is not good news for ripgrep, since it will stop it from
searching the rest of the file, and will also print potentially undesirable
error messages to users.
All right, the above is a lot of information to swallow if you aren't already
familiar with ripgrep internals. Let's make this concrete with some examples.
First, let's get some data big enough to magnify the performance differences:
```
$ curl -O 'https://burntsushi.net/stuff/subtitles2016-sample.gz'
$ gzip -d subtitles2016-sample
$ md5sum subtitles2016-sample
e3cb796a20bbc602fbfd6bb43bda45f5 subtitles2016-sample
```
To search this data, we will use the pattern `^\w{42}$`, which contains exactly
one hit in the file and has no literals. Having no literals is important,
because it ensures that the regex engine won't use literal optimizations to
speed up the search. In other words, it lets us reason coherently about the
actual task that the regex engine is performing.
Let's now walk through a few examples in light of the information above. First,
let's consider the default search using ripgrep's default regex engine and
then the same search with PCRE2:
```
$ time rg '^\w{42}$' subtitles2016-sample
21225780:EverymajordevelopmentinthehistoryofAmerica
real 0m1.783s
user 0m1.731s
sys 0m0.051s
$ time rg -P '^\w{42}$' subtitles2016-sample
21225780:EverymajordevelopmentinthehistoryofAmerica
real 0m2.458s
user 0m2.419s
sys 0m0.038s
```
In this particular example, both pattern searches are using a Unicode aware
`\w` character class and both are counting lines in order to report line
numbers. The key difference here is that the first search will not search
line by line, but the second one will. We can observe which strategy ripgrep
uses by passing the `--trace` flag:
```
$ rg '^\w{42}$' subtitles2016-sample --trace
[... snip ...]
TRACE|grep_searcher::searcher|grep-searcher/src/searcher/mod.rs:622: Some("subtitles2016-sample"): searching via memory map
TRACE|grep_searcher::searcher|grep-searcher/src/searcher/mod.rs:712: slice reader: searching via slice-by-line strategy
TRACE|grep_searcher::searcher::core|grep-searcher/src/searcher/core.rs:61: searcher core: will use fast line searcher
[... snip ...]
$ rg -P '^\w{42}$' subtitles2016-sample --trace
[... snip ...]
TRACE|grep_searcher::searcher|grep-searcher/src/searcher/mod.rs:622: Some("subtitles2016-sample"): searching via memory map
TRACE|grep_searcher::searcher|grep-searcher/src/searcher/mod.rs:705: slice reader: needs transcoding, using generic reader
TRACE|grep_searcher::searcher|grep-searcher/src/searcher/mod.rs:685: generic reader: searching via roll buffer strategy
TRACE|grep_searcher::searcher::core|grep-searcher/src/searcher/core.rs:63: searcher core: will use slow line searcher
[... snip ...]
```
The first says it is using the "fast line searcher" where as the latter says
it is using the "slow line searcher." The latter also shows that we are
decoding the contents of the file, which also impacts performance.
Interestingly, in this case, the pattern does not match a `\n` and the file
we're searching is valid UTF-8, so neither the slow line-by-line search
strategy nor the decoding are necessary. We could fix the former issue with
better PCRE2 introspection APIs. We can actually fix the latter issue with
ripgrep's `--no-encoding` flag, which prevents the automatic UTF-8 decoding,
but will enable PCRE2's own UTF-8 validity checking. Unfortunately, it's slower
in my build of ripgrep:
```
$ time rg -P '^\w{42}$' subtitles2016-sample --no-encoding
21225780:EverymajordevelopmentinthehistoryofAmerica
real 0m3.074s
user 0m3.021s
sys 0m0.051s
```
(Tip: use the `--trace` flag to verify that no decoding in ripgrep is
happening.)
A possible reason why PCRE2's UTF-8 checking is slower is because it might
not be better than the highly optimized UTF-8 checking routines found in the
[`encoding_rs`](https://github.com/hsivonen/encoding_rs) library, which is what
ripgrep uses for UTF-8 decoding. Moreover, my build of ripgrep enables
`encoding_rs`'s SIMD optimizations, which may be in play here.
Also, note that using the `--no-encoding` flag can cause PCRE2 to report
invalid UTF-8 errors, which causes ripgrep to stop searching the file:
```
$ cat invalid-utf8
foobar
$ xxd invalid-utf8
00000000: 666f 6fff 6261 720a foo.bar.
$ rg foo invalid-utf8
1:foobar
$ rg -P foo invalid-utf8
1:foo<6F>bar
$ rg -P foo invalid-utf8 --no-encoding
invalid-utf8: PCRE2: error matching: UTF-8 error: illegal byte (0xfe or 0xff)
```
All right, so at this point, you might think that we could remove the penalty
for line-by-line searching by enabling multiline search. After all, our
particular pattern can't match across multiple lines anyway, so we'll still get
the results we want. Let's try it:
```
$ time rg -U '^\w{42}$' subtitles2016-sample
21225780:EverymajordevelopmentinthehistoryofAmerica
real 0m1.803s
user 0m1.748s
sys 0m0.054s
$ time rg -P -U '^\w{42}$' subtitles2016-sample
21225780:EverymajordevelopmentinthehistoryofAmerica
real 0m2.962s
user 0m2.246s
sys 0m0.713s
```
Search times remain the same with the default regex engine, but the PCRE2
search gets _slower_. What happened? The secrets can be revealed with the
`--trace` flag once again. In the former case, ripgrep actually detects that
the pattern can't match across multiple lines, and so will fall back to the
"fast line search" strategy as with our search without `-U`.
However, for PCRE2, things are much worse. Namely, since Unicode mode is still
enabled, ripgrep is still going to decode UTF-8 to ensure that it hands only
valid UTF-8 to PCRE2. Unfortunately, one key downside of multiline search is
that ripgrep cannot do it incrementally. Since matches can be arbitrarily long,
ripgrep actually needs the entire file in memory at once. Normally, we can use
a memory map for this, but because we need to UTF-8 decode the file before
searching it, ripgrep winds up reading the entire contents of the file on to
the heap before executing a search. Owch.
OK, so Unicode is killing us here. The file we're searching is _mostly_ ASCII,
so maybe we're OK with missing some data. (Try `rg '[\w--\p{ascii}]'` to see
non-ASCII word characters that an ASCII-only `\w` character class would miss.)
We can disable Unicode in both searches, but this is done differently depending
on the regex engine we use:
```
$ time rg '(?-u)^\w{42}$' subtitles2016-sample
21225780:EverymajordevelopmentinthehistoryofAmerica
real 0m1.714s
user 0m1.669s
sys 0m0.044s
$ time rg -P '^\w{42}$' subtitles2016-sample --no-pcre2-unicode
21225780:EverymajordevelopmentinthehistoryofAmerica
real 0m1.997s
user 0m1.958s
sys 0m0.037s
```
For the most part, ripgrep's default regex engine performs about the same.
PCRE2 does improve a little bit, and is now almost as fast as the default
regex engine. If you look at the output of `--trace`, you'll see that ripgrep
will no longer perform UTF-8 decoding, but it does still use the slow
line-by-line searcher.
At this point, we can combine all of our insights above: let's try to get off
of the slow line-by-line searcher by enabling multiline mode, and let's stop
UTF-8 decoding by disabling Unicode support:
```
$ time rg -U '(?-u)^\w{42}$' subtitles2016-sample
21225780:EverymajordevelopmentinthehistoryofAmerica
real 0m1.714s
user 0m1.655s
sys 0m0.058s
$ time rg -P -U '^\w{42}$' subtitles2016-sample --no-pcre2-unicode
21225780:EverymajordevelopmentinthehistoryofAmerica
real 0m1.121s
user 0m1.071s
sys 0m0.048s
```
Ah, there's PCRE2's JIT shining! ripgrep's default regex engine once again
remains about the same, but PCRE2 no longer needs to search line-by-line and it
no longer needs to do any kind of UTF-8 checks. This allows the file to get
memory mapped and passed right through PCRE2's JIT at impressive speeds. (As
a brief and interesting historical note, the configuration of "memory map +
multiline + no-Unicode" is exactly the configuration used by The Silver
Searcher. This analysis perhaps sheds some reasoning as to why that
configuration is useful!)
In summary, if you want PCRE2 to go as fast as possible and you don't care
about Unicode and you don't care about matches possibly spanning across
multiple lines, then enable multiline mode with `-U` and disable PCRE2's
Unicode support with the `--no-pcre2-unicode` flag.
Caveat emptor: This author is not a PCRE2 expert, so there may be APIs that can
improve performance that the author missed. Similarly, there may be alternative
designs for a searching tool that are more amenable to how PCRE2 works.
<h3 name="rg-other-cmd">
When I run <code>rg</code>, why does it execute some other command?
</h3>

View File

@@ -227,7 +227,7 @@ with the following contents:
```
ripgrep treats `.ignore` files with higher precedence than `.gitignore` files
(and treats `.rgignore` files with higher precedence than `.ignore` files).
(and treats `.rgignore` files with higher precdence than `.ignore` files).
This means ripgrep will see the `!log/` whitelist rule first and search that
directory.
@@ -580,7 +580,7 @@ override it.
If you're confused about what configuration file ripgrep is reading arguments
from, then running ripgrep with the `--debug` flag should help clarify things.
The debug output should note what config file is being loaded and the arguments
The debug output should note what config file is being loaded and the arugments
that have been read from the configuration.
Finally, if you want to make absolutely sure that ripgrep *isn't* reading a

146
README.md
View File

@@ -7,7 +7,7 @@ available for [every release](https://github.com/BurntSushi/ripgrep/releases).
ripgrep is similar to other popular search tools like The Silver Searcher,
ack and grep.
[![Linux build status](https://travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Linux build status](https://travis-ci.org/BurntSushi/ripgrep.svg?branch=master)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![Crates.io](https://img.shields.io/crates/v/ripgrep.svg)](https://crates.io/crates/ripgrep)
@@ -23,7 +23,7 @@ Please see the [CHANGELOG](CHANGELOG.md) for a release history.
* [Installation](#installation)
* [User Guide](GUIDE.md)
* [Frequently Asked Questions](FAQ.md)
* [Regex syntax](https://docs.rs/regex/1/regex/#syntax)
* [Regex syntax](https://docs.rs/regex/0.2.5/regex/#syntax)
* [Configuration files](GUIDE.md#configuration-file)
* [Shell completions](FAQ.md#complete)
* [Building](#building)
@@ -85,16 +85,14 @@ increases the times to `2.640s` for ripgrep and `10.277s` for GNU grep.
### Why should I use ripgrep?
* It can replace many use cases served by other search tools
because it contains most of their features and is generally faster. (See
[the FAQ](FAQ.md#posix4ever) for more details on whether ripgrep can truly
replace grep.)
* Like other tools specialized to code search, ripgrep defaults to recursive
directory search and won't search files ignored by your `.gitignore` files.
It also ignores hidden and binary files by default. ripgrep also implements
full support for `.gitignore`, whereas there are many bugs related to that
functionality in other code search tools claiming to provide the same
functionality.
* It can replace many use cases served by both The Silver Searcher and GNU grep
because it is generally faster than both. (See [the FAQ](FAQ.md#posix4ever)
for more details on whether ripgrep can truly replace grep.)
* Like The Silver Searcher, ripgrep defaults to recursive directory search
and won't search files ignored by your `.gitignore` files. It also ignores
hidden and binary files by default. ripgrep also implements full support
for `.gitignore`, whereas there are many bugs related to that functionality
in The Silver Searcher.
* ripgrep can search specific types of files. For example, `rg -tpy foo`
limits your search to Python files and `rg -Tjs foo` excludes Javascript
files from your search. ripgrep can be taught about new file types with
@@ -103,10 +101,6 @@ increases the times to `2.640s` for ripgrep and `10.277s` for GNU grep.
of search results, searching multiple patterns, highlighting matches with
color and full Unicode support. Unlike GNU grep, ripgrep stays fast while
supporting Unicode (which is always on).
* ripgrep has optional support for switching its regex engine to use PCRE2.
Among other things, this makes it possible to use look-around and
backreferences in your patterns, which are supported in ripgrep's default
regex engine. PCRE2 support is enabled with `-P`.
* ripgrep supports searching files in text encodings other than UTF-8, such
as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for
automatically detecting UTF-16 is provided. Other text encodings must be
@@ -118,29 +112,27 @@ increases the times to `2.640s` for ripgrep and `10.277s` for GNU grep.
detection and so on.
In other words, use ripgrep if you like speed, filtering by default, fewer
bugs and Unicode support.
bugs, and Unicode support.
### Why shouldn't I use ripgrep?
Despite initially not wanting to add every feature under the sun to ripgrep,
over time, ripgrep has grown support for most features found in other file
searching tools. This includes searching for results spanning across multiple
lines, and opt-in support for PCRE2, which provides look-around and
backreference support.
I'd like to try to convince you why you *shouldn't* use ripgrep. This should
give you a glimpse at some important downsides or missing features of
ripgrep.
At this point, the primary reasons not to use ripgrep probably consist of one
or more of the following:
* ripgrep uses a regex engine based on finite automata, so if you want fancy
regex features such as backreferences or lookaround, ripgrep won't provide
them to you. ripgrep does support lots of things though, including, but not
limited to: lazy quantification (e.g., `a+?`), repetitions (e.g., `a{2,5}`),
begin/end assertions (e.g., `^\w+$`), word boundaries (e.g., `\bfoo\b`), and
support for Unicode categories (e.g., `\p{Sc}` to match currency symbols or
`\p{Lu}` to match any uppercase letter). (Fancier regexes will never be
supported.)
* ripgrep doesn't have multiline search. (Will happen as an opt-in feature.)
* You need a portable and ubiquitous tool. While ripgrep works on Windows,
macOS and Linux, it is not ubiquitous and it does not conform to any
standard such as POSIX. The best tool for this job is good old grep.
* There still exists some other minor feature (or bug) found in another tool
that isn't in ripgrep.
* There is a performance edge case where ripgrep doesn't do well where another
tool does do well. (Please file a bug report!)
* ripgrep isn't possible to install on your machine or isn't available for your
platform. (Please file a bug report!)
In other words, if you like fancy regexes or multiline search, then ripgrep
may not quite meet your needs (yet).
### Is it really faster than everything else?
@@ -153,8 +145,7 @@ Summarizing, ripgrep is fast because:
* It is built on top of
[Rust's regex engine](https://github.com/rust-lang-nursery/regex).
Rust's regex engine uses finite automata, SIMD and aggressive literal
optimizations to make searching very fast. (PCRE2 support can be opted into
with the `-P/--pcre2` flag.)
optimizations to make searching very fast.
* Rust's regex library maintains performance with full Unicode support by
building UTF-8 decoding directly into its deterministic finite automaton
engine.
@@ -163,7 +154,7 @@ Summarizing, ripgrep is fast because:
latter is better for large directories. ripgrep chooses the best searching
strategy for you automatically.
* Applies your ignore patterns in `.gitignore` files using a
[`RegexSet`](https://docs.rs/regex/1/regex/struct.RegexSet.html).
[`RegexSet`](https://docs.rs/regex/1.0.0/regex/struct.RegexSet.html).
That means a single file path can be matched against multiple glob patterns
simultaneously.
* It uses a lock-free parallel recursive directory iterator, courtesy of
@@ -177,11 +168,6 @@ Andy Lester, author of [ack](https://beyondgrep.com/), has published an
excellent table comparing the features of ack, ag, git-grep, GNU grep and
ripgrep: https://beyondgrep.com/feature-comparison/
Note that ripgrep has grown a few significant new features recently that
are not yet present in Andy's table. This includes, but is not limited to,
configuration files, passthru, support for searching compressed files,
multiline search and opt-in fancy regex support via PCRE2.
### Installation
@@ -221,15 +207,13 @@ If you're a **MacPorts** user, then you can install ripgrep from the
$ sudo port install ripgrep
```
If you're a **Windows Chocolatey** user, then you can install ripgrep from the
[official repo](https://chocolatey.org/packages/ripgrep):
If you're a **Windows Chocolatey** user, then you can install ripgrep from the [official repo](https://chocolatey.org/packages/ripgrep):
```
$ choco install ripgrep
```
If you're a **Windows Scoop** user, then you can install ripgrep from the
[official bucket](https://github.com/lukesampson/scoop/blob/master/bucket/ripgrep.json):
If you're a **Windows Scoop** user, then you can install ripgrep from the [official bucket](https://github.com/lukesampson/scoop/blob/master/bucket/ripgrep.json):
```
$ scoop install ripgrep
@@ -241,37 +225,32 @@ If you're an **Arch Linux** user, then you can install ripgrep from the official
$ pacman -S ripgrep
```
If you're a **Gentoo** user, you can install ripgrep from the
[official repo](https://packages.gentoo.org/packages/sys-apps/ripgrep):
If you're a **Gentoo** user, you can install ripgrep from the [official repo](https://packages.gentoo.org/packages/sys-apps/ripgrep):
```
$ emerge sys-apps/ripgrep
```
If you're a **Fedora 27+** user, you can install ripgrep from official
repositories.
If you're a **Fedora 27+** user, you can install ripgrep from official repositories.
```
$ sudo dnf install ripgrep
```
If you're a **Fedora 24+** user, you can install ripgrep from
[copr](https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/):
If you're a **Fedora 24+** user, you can install ripgrep from [copr](https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/):
```
$ sudo dnf copr enable carlwgeorge/ripgrep
$ sudo dnf install ripgrep
```
If you're an **openSUSE Tumbleweed** user, you can install ripgrep from the
[official repo](http://software.opensuse.org/package/ripgrep):
If you're an **openSUSE Tumbleweed** user, you can install ripgrep from the [official repo](http://software.opensuse.org/package/ripgrep):
```
$ sudo zypper install ripgrep
```
If you're a **RHEL/CentOS 7** user, you can install ripgrep from
[copr](https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/):
If you're a **RHEL/CentOS 7** user, you can install ripgrep from [copr](https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/):
```
$ sudo yum-config-manager --add-repo=https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/repo/epel-7/carlwgeorge-ripgrep-epel-7.repo
@@ -292,14 +271,8 @@ then ripgrep can be installed using a binary `.deb` file provided in each
ripgrep is not in the official Debian or Ubuntu repositories.
```
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/0.9.0/ripgrep_0.9.0_amd64.deb
$ sudo dpkg -i ripgrep_0.9.0_amd64.deb
```
If you run Debian Buster (currently Debian testing) or Debian sid, ripgrep is
[officially maintained by Debian](https://tracker.debian.org/pkg/rust-ripgrep).
```
$ sudo apt-get install ripgrep
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/0.8.1/ripgrep_0.8.1_amd64.deb
$ sudo dpkg -i ripgrep_0.8.1_amd64.deb
```
(N.B. Various snaps for ripgrep on Ubuntu are also available, but none of them
@@ -307,30 +280,26 @@ seem to work right and generate a number of very strange bug reports that I
don't know how to fix and don't have the time to fix. Therefore, it is no
longer a recommended installation option.)
If you're a **FreeBSD** user, then you can install ripgrep from the
[official ports](https://www.freshports.org/textproc/ripgrep/):
If you're a **FreeBSD** user, then you can install ripgrep from the [official ports](https://www.freshports.org/textproc/ripgrep/):
```
# pkg install ripgrep
```
If you're an **OpenBSD** user, then you can install ripgrep from the
[official ports](http://openports.se/textproc/ripgrep):
If you're an **OpenBSD** user, then you can install ripgrep from the [official ports](http://openports.se/textproc/ripgrep):
```
$ doas pkg_add ripgrep
```
If you're a **NetBSD** user, then you can install ripgrep from
[pkgsrc](http://pkgsrc.se/textproc/ripgrep):
If you're a **NetBSD** user, then you can install ripgrep from [pkgsrc](http://pkgsrc.se/textproc/ripgrep):
```
# pkgin install ripgrep
```
If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
* Note that the minimum supported version of Rust for ripgrep is **1.28.0**,
* Note that the minimum supported version of Rust for ripgrep is **1.23.0**,
although ripgrep may work with older versions.
* Note that the binary may be bigger than expected because it contains debug
symbols. This is intentional. To remove debug symbols and therefore reduce
@@ -351,10 +320,7 @@ ripgrep isn't currently in any other package repositories.
ripgrep is written in Rust, so you'll need to grab a
[Rust installation](https://www.rust-lang.org/) in order to compile it.
ripgrep compiles with Rust 1.28.0 (stable) or newer. In general, ripgrep tracks
the latest stable release of the Rust compiler.
To build ripgrep:
ripgrep compiles with Rust 1.23.0 (stable) or newer. Building is easy:
```
$ git clone https://github.com/BurntSushi/ripgrep
@@ -381,36 +347,6 @@ are not necessary to get SIMD optimizations for search; those are enabled
automatically. Hopefully, some day, the `simd-accel` and `avx-accel` features
will similarly become unnecessary.
Finally, optional PCRE2 support can be built with ripgrep by enabling the
`pcre2` feature:
```
$ cargo build --release --features 'pcre2'
```
(Tip: use `--features 'pcre2 simd-accel avx-accel'` to also include compile
time SIMD optimizations, which will only work with a nightly compiler.)
Enabling the PCRE2 feature works with a stable Rust compiler and will
attempt to automatically find and link with your system's PCRE2 library via
`pkg-config`. If one doesn't exist, then ripgrep will build PCRE2 from source
using your system's C compiler and then statically link it into the final
executable. Static linking can be forced even when there is an available PCRE2
system library by either building ripgrep with the MUSL target or by setting
`PCRE2_SYS_STATIC=1`.
ripgrep can be built with the MUSL target on Linux by first installing the MUSL
library on your system (consult your friendly neighborhood package manager).
Then you just need to add MUSL support to your Rust toolchain and rebuild
ripgrep, which yields a fully static executable:
```
$ rustup target add x86_64-unknown-linux-musl
$ cargo build --release --target x86_64-unknown-linux-musl
```
Applying the `--features` flag from above works as expected.
### Running tests

View File

@@ -1,6 +1,8 @@
# Inspired from https://github.com/habitat-sh/habitat/blob/master/appveyor.yml
cache:
- c:\cargo\registry
- c:\cargo\git
- c:\projects\ripgrep\target
init:
- mkdir c:\cargo
@@ -17,20 +19,14 @@ environment:
PROJECT_NAME: ripgrep
RUST_BACKTRACE: full
matrix:
- TARGET: x86_64-pc-windows-gnu
CHANNEL: stable
BITS: 64
MSYS2: 1
- TARGET: x86_64-pc-windows-msvc
CHANNEL: stable
BITS: 64
- TARGET: i686-pc-windows-gnu
CHANNEL: stable
BITS: 32
MSYS2: 1
- TARGET: i686-pc-windows-msvc
CHANNEL: stable
BITS: 32
- TARGET: x86_64-pc-windows-gnu
CHANNEL: stable
- TARGET: x86_64-pc-windows-msvc
CHANNEL: stable
matrix:
fast_finish: true
@@ -39,9 +35,8 @@ matrix:
# (Based on from https://github.com/rust-lang/libc/blob/master/appveyor.yml)
install:
- curl -sSf -o rustup-init.exe https://win.rustup.rs/
- rustup-init.exe -y --default-host %TARGET%
- set PATH=%PATH%;C:\Users\appveyor\.cargo\bin
- if defined MSYS2 set PATH=C:\msys64\mingw%BITS%\bin;%PATH%
- rustup-init.exe -y --default-host %TARGET% --no-modify-path
- if defined MSYS2_BITS set PATH=%PATH%;C:\msys64\mingw%MSYS2_BITS%\bin
- rustc -V
- cargo -V
@@ -51,11 +46,11 @@ build: false
# Equivalent to Travis' `script` phase
# TODO modify this phase as you see fit
test_script:
- cargo test --verbose --all --features pcre2
- cargo test --verbose --all
before_deploy:
# Generate artifacts for release
- cargo build --release --features pcre2
- cargo build --release
- mkdir staging
- copy target\release\rg.exe staging
- ps: copy target\release\build\ripgrep-*\out\_rg.ps1 staging
@@ -83,6 +78,7 @@ branches:
only:
- /\d+\.\d+\.\d+/
- master
- ag/libripgrep
# - appveyor
# - /\d+\.\d+\.\d+/
# except:

View File

@@ -8,11 +8,7 @@ set -ex
# Generate artifacts for release
mk_artifacts() {
if is_arm; then
cargo build --target "$TARGET" --release
else
cargo build --target "$TARGET" --release --features 'pcre2'
fi
cargo build --target "$TARGET" --release
}
mk_tarball() {

View File

@@ -1,31 +0,0 @@
#!/bin/bash
set -e
# This script builds a binary dpkg for Debian based distros. It does not
# currently run in CI, and is instead run manually and the resulting dpkg is
# uploaded to GitHub via the web UI.
#
# Note that this requires 'cargo deb', which can be installed with
# 'cargo install cargo-deb'.
#
# This should be run from the root of the ripgrep repo.
if ! command -V cargo-deb > /dev/null 2>&1; then
echo "cargo-deb command missing" >&2
exit 1
fi
# 'cargo deb' does not seem to provide a way to specify an asset that is
# created at build time, such as ripgrep's man page. To work around this,
# we force a debug build, copy out the man page produced from that build, put
# it into a predictable location and then build the deb, which knows where to
# look.
mkdir -p deployment
cargo build
manpage="$(find ./target/debug -name rg.1 -print0 | xargs -0 ls -t | head -n1)"
cp "$manpage" deployment/
# Since we're distributing the dpkg, we don't know whether the user will have
# PCRE2 installed, so just do a static build.
PCRE2_SYS_STATIC=1 cargo deb

View File

@@ -8,11 +8,7 @@ set -ex
main() {
# Test a normal debug build.
if is_arm; then
cargo build --target "$TARGET" --verbose
else
cargo build --target "$TARGET" --verbose --all --features 'pcre2'
fi
cargo build --target "$TARGET" --verbose --all
# Show the output of the most recent build.rs stderr.
set +x
@@ -44,7 +40,7 @@ main() {
"$(dirname "${0}")/test_complete.sh"
# Run tests for ripgrep and all sub-crates.
cargo test --target "$TARGET" --verbose --all --features 'pcre2'
cargo test --target "$TARGET" --verbose --all
}
main

View File

@@ -39,14 +39,12 @@ main() {
print -rl - 'Comparing options:' "-$rg" "+$_rg"
# 'Parse' options out of the `--help` output. To prevent false positives we
# only look at lines where the first non-white-space character is `-`, or
# where a long option starting with certain letters (see `_rg`) is found.
# Occasionally we may have to handle some manually, however
# only look at lines where the first non-white-space character is `-`
help_args=( ${(f)"$(
$rg --help |
$rg -i -- '^\s+--?[a-z0-9]|--[imnp]' |
$rg -ior '$1' -- $'[\t /\"\'`.,](-[a-z0-9]|--[a-z0-9-]+)\\b' |
$rg -v -- --print0 | # False positives
$rg -- '^\s*-' |
$rg -io -- '[\t ,](-[a-z0-9]|--[a-z0-9-]+)\b' |
tr -d '\t ,' |
sort -u
)"} )
@@ -60,6 +58,8 @@ main() {
comp_args=( ${comp_args%%-[:[]*} ) # Strip everything after -optname-
comp_args=( ${comp_args%%[:+=[]*} ) # Strip everything after other optspecs
comp_args=( ${comp_args##[^-]*} ) # Remove non-options
# This probably isn't necessary, but we should ensure the same order
comp_args=( ${(f)"$( print -rl - $comp_args | sort -u )"} )
(( $#help_args )) || {

View File

@@ -55,6 +55,13 @@ gcc_prefix() {
esac
}
is_ssse3_target() {
case "$(architecture)" in
amd64) return 0 ;;
*) return 1 ;;
esac
}
is_x86() {
case "$(architecture)" in
amd64|i386) return 0 ;;

View File

@@ -6,8 +6,8 @@
# Run ci/test_complete.sh after building to ensure that the options supported by
# this function stay in synch with the `rg` binary.
#
# For convenience, a completion reference guide is included at the bottom of
# this file.
# @see http://zsh.sourceforge.net/Doc/Release/Completion-System.html
# @see https://github.com/zsh-users/zsh/blob/master/Etc/completion-style-guide
#
# Originally based on code from the zsh-users project — see copyright notice
# below.
@@ -26,10 +26,8 @@ _rg() {
# style set. Note that this prefix check has to be updated manually to account
# for all of the potential negation options listed below!
if
# We also want to list all of these options during testing
[[ $_RG_COMPLETE_LIST_ARGS == (1|t*|y*) ]] ||
# (--[imnp]* => --ignore*, --messages, --no-*, --pcre2-unicode)
[[ $PREFIX$SUFFIX == --[imnp]* ]] ||
# (--[imn]* => --ignore*, --messages, --no-*)
[[ $PREFIX$SUFFIX == --[imn]* ]] ||
zstyle -t ":complete:$curcontext:*" complete-all
then
no=
@@ -44,12 +42,6 @@ _rg() {
'(: * -)'{-h,--help}'[display help information]'
'(: * -)'{-V,--version}'[display version information]'
+ '(buffered)' # buffering options
'--line-buffered[force line buffering]'
$no"--no-line-buffered[don't force line buffering]"
'--block-buffered[force block buffering]'
$no"--no-block-buffered[don't force block buffering]"
+ '(case)' # Case-sensitivity options
{-i,--ignore-case}'[search case-insensitively]'
{-s,--case-sensitive}'[search case-sensitively]'
@@ -69,15 +61,11 @@ _rg() {
$no"--no-column[don't show column numbers for matches]"
+ '(count)' # Counting options
{-c,--count}'[only show count of matching lines for each file]'
'--count-matches[only show count of individual matches for each file]'
+ '(encoding)' # Encoding options
{-E+,--encoding=}'[specify text encoding of files to search]: :_rg_encodings'
$no'--no-encoding[use default text encoding]'
'(passthru)'{-c,--count}'[only show count of matching lines for each file]'
'(passthru)--count-matches[only show count of individual matches for each file]'
+ file # File-input options
'(1)*'{-f+,--file=}'[specify file containing patterns to search for]: :_files'
'*'{-f+,--file=}'[specify file containing patterns to search for]: :_files'
+ '(file-match)' # Files with/without match options
'(stats)'{-l,--files-with-matches}'[only show names of files with matches]'
@@ -87,10 +75,6 @@ _rg() {
{-H,--with-filename}'[show file name for matches]'
"--no-filename[don't show file name for matches]"
+ '(file-system)' # File system options
"--one-file-system[don't descend into directories on other file systems]"
$no'--no-one-file-system[descend into directories on other file systems]'
+ '(fixed)' # Fixed-string options
{-F,--fixed-strings}'[treat pattern as literal string instead of regular expression]'
$no"--no-fixed-strings[don't treat pattern as literal string]"
@@ -127,19 +111,10 @@ _rg() {
"--no-ignore-vcs[don't respect version control ignore files]"
$no'--ignore-vcs[respect version control ignore files]'
+ '(json)' # JSON options
'--json[output results in JSON Lines format]'
$no"--no-json[don't output results in JSON Lines format]"
+ '(line-number)' # Line-number options
+ '(line)' # Line-number options
{-n,--line-number}'[show line numbers for matches]'
{-N,--no-line-number}"[don't show line numbers for matches]"
+ '(line-terminator)' # Line-terminator options
'--crlf[use CRLF as line terminator]'
$no"--no-crlf[don't use CRLF as line terminator]"
'(text)--null-data[use NUL as line terminator]'
+ '(max-depth)' # Directory-depth options
'--max-depth=[specify max number of directories to descend]:number of directories'
'!--maxdepth=:number of directories'
@@ -156,36 +131,21 @@ _rg() {
'--mmap[search using memory maps when possible]'
"--no-mmap[don't search using memory maps]"
+ '(multiline)' # Multiline options
{-U,--multiline}'[permit matching across multiple lines]'
$no'(multiline-dotall)--no-multiline[restrict matches to at most one line each]'
+ '(multiline-dotall)' # Multiline DOTALL options
'(--no-multiline)--multiline-dotall[allow "." to match newline (with -U)]'
$no"(--no-multiline)--no-multiline-dotall[don't allow \".\" to match newline (with -U)]"
+ '(multiline)' # multiline options
'--multiline[permit matching across multiple lines]'
$no"--no-multiline[restrict matches to at most one line each]"
+ '(only)' # Only-match options
{-o,--only-matching}'[show only matching part of each line]'
'(passthru replace)'{-o,--only-matching}'[show only matching part of each line]'
+ '(passthru)' # Pass-through options
'(--vimgrep)--passthru[show both matching and non-matching lines]'
'!(--vimgrep)--passthrough'
+ '(pcre2)' # PCRE2 options
{-P,--pcre2}'[enable matching with PCRE2]'
$no'(pcre2-unicode)--no-pcre2[disable matching with PCRE2]'
+ '(pcre2-unicode)' # PCRE2 Unicode options
$no'(--no-pcre2 --no-pcre2-unicode)--pcre2-unicode[enable PCRE2 Unicode mode (with -P)]'
'(--no-pcre2 --pcre2-unicode)--no-pcre2-unicode[disable PCRE2 Unicode mode (with -P)]'
'(--vimgrep count only replace)--passthru[show both matching and non-matching lines]'
'!(--vimgrep count only replace)--passthrough'
+ '(pre)' # Preprocessing options
'(-z --search-zip)--pre=[specify preprocessor utility]:preprocessor utility:_command_names -e'
$no'--no-pre[disable preprocessor utility]'
+ pre-glob # Preprocessing glob options
'*--pre-glob[include/exclude files for preprocessing with --pre]'
+ '(pretty-vimgrep)' # Pretty/vimgrep display options
'(heading)'{-p,--pretty}'[alias for --color=always --heading -n]'
'(heading passthru)--vimgrep[show results in vim-compatible format]'
@@ -194,39 +154,21 @@ _rg() {
'(1 file)*'{-e+,--regexp=}'[specify pattern]:pattern'
+ '(replace)' # Replacement options
{-r+,--replace=}'[specify string used to replace matches]:replace string'
'(count only passthru)'{-r+,--replace=}'[specify string used to replace matches]:replace string'
+ '(sort)' # File-sorting options
'(threads)--sort=[sort results in ascending order (disables parallelism)]:sort method:((
none\:"no sorting"
path\:"sort by file path"
modified\:"sort by last modified time"
accessed\:"sort by last accessed time"
created\:"sort by creation time"
))'
'(threads)--sortr=[sort results in descending order (disables parallelism)]:sort method:((
none\:"no sorting"
path\:"sort by file path"
modified\:"sort by last modified time"
accessed\:"sort by last accessed time"
created\:"sort by creation time"
))'
'!(threads)--sort-files[sort results by file path (disables parallelism)]'
'(threads)--sort-files[sort results by file path (disables parallelism)]'
$no"--no-sort-files[don't sort results by file path]"
+ '(stats)' # Statistics options
+ stats # Statistics options
'(--files file-match)--stats[show search statistics]'
$no"--no-stats[don't show search statistics]"
+ '(text)' # Binary-search options
{-a,--text}'[search binary files as if they were text]'
$no"(--null-data)--no-text[don't search binary files as if they were text]"
$no"--no-text[don't search binary files as if they were text]"
+ '(threads)' # Thread-count options
'(sort)'{-j+,--threads=}'[specify approximate number of threads to use]:number of threads'
+ '(trim)' # Trim options
'--trim[trim any ASCII whitespace prefix from each line]'
$no"--no-trim[don't trim ASCII whitespace prefix from each line]"
'(--sort-files)'{-j+,--threads=}'[specify approximate number of threads to use]:number of threads'
+ type # Type options
'*'{-t+,--type=}'[only search files matching specified type]: :_rg_types'
@@ -256,6 +198,7 @@ _rg() {
'--context-separator=[specify string used to separate non-continuous context lines in output]:separator'
'--debug[show debug messages]'
'--dfa-size-limit=[specify upper size limit of generated DFA]:DFA size (bytes)'
'(-E --encoding)'{-E+,--encoding=}'[specify text encoding of files to search]: :_rg_encodings'
"(1 stats)--files[show each file that would be searched (but don't search)]"
'*--ignore-file=[specify additional ignore file]:ignore file:_files'
'(-v --invert-match)'{-v,--invert-match}'[invert matching]'
@@ -388,157 +331,6 @@ _rg_types() {
_rg "$@"
################################################################################
# ZSH COMPLETION REFERENCE
#
# For the convenience of developers who aren't especially familiar with zsh
# completion functions, a brief reference guide follows. This is in no way
# comprehensive; it covers just enough of the basic structure, syntax, and
# conventions to help someone make simple changes like adding new options. For
# more complete documentation regarding zsh completion functions, please see the
# following:
#
# * http://zsh.sourceforge.net/Doc/Release/Completion-System.html
# * https://github.com/zsh-users/zsh/blob/master/Etc/completion-style-guide
#
# OVERVIEW
#
# Most zsh completion functions are defined in terms of `_arguments`, which is a
# shell function that takes a series of argument specifications. The specs for
# `rg` are stored in an array, which is common for more complex functions; the
# elements of the array are passed to `_arguments` on invocation.
#
# ARGUMENT-SPECIFICATION SYNTAX
#
# The following is a contrived example of the argument specs for a simple tool:
#
# '(: * -)'{-h,--help}'[display help information]'
# '(-q -v --quiet --verbose)'{-q,--quiet}'[decrease output verbosity]'
# '!(-q -v --quiet --verbose)--silent'
# '(-q -v --quiet --verbose)'{-v,--verbose}'[increase output verbosity]'
# '--color=[specify when to use colors]:when:(always never auto)'
# '*:example file:_files'
#
# Although there may appear to be six specs here, there are actually nine; we
# use brace expansion to combine specs for options that go by multiple names,
# like `-q` and `--quiet`. This is customary, and ties in with the fact that zsh
# merges completion possibilities together when they have the same description.
#
# The first line defines the option `-h`/`--help`. With most tools, it isn't
# useful to complete anything after `--help` because it effectively overrides
# all others; the `(: * -)` at the beginning of the spec tells zsh not to
# complete any other operands (`:` and `*`) or options (`-`) after this one has
# been used. The `[...]` at the end associates a description with `-h`/`--help`;
# as mentioned, zsh will see the identical descriptions and merge these options
# together when offering completion possibilities.
#
# The next line defines `-q`/`--quiet`. Here we don't want to suppress further
# completions entirely, but we don't want to offer `-q` if `--quiet` has been
# given (since they do the same thing), nor do we want to offer `-v` (since it
# doesn't make sense to be quiet and verbose at the same time). We don't need to
# tell zsh not to offer `--quiet` a second time, since that's the default
# behaviour, but since this line expands to two specs describing `-q` *and*
# `--quiet` we do need to explicitly list all of them here.
#
# The next line defines a hidden option `--silent` — maybe it's a deprecated
# synonym for `--quiet`. The leading `!` indicates that zsh shouldn't offer this
# option during completion. The benefit of providing a spec for an option that
# shouldn't be completed is that, if someone *does* use it, we can correctly
# suppress completion of other options afterwards.
#
# The next line defines `-v`/`--verbose`; this works just like `-q`/`--quiet`.
#
# The next line defines `--color`. In this example, `--color` doesn't have a
# corresponding short option, so we don't need to use brace expansion. Further,
# there are no other options it's exclusive with (just itself), so we don't need
# to define those at the beginning. However, it does take a mandatory argument.
# The `=` at the end of `--color=` indicates that the argument may appear either
# like `--color always` or like `--color=always`; this is how most GNU-style
# command-line tools work. The corresponding short option would normally use `+`
# — for example, `-c+` would allow either `-c always` or `-calways`. For this
# option, the arguments are known ahead of time, so we can simply list them in
# parentheses at the end (`when` is used as the description for the argument).
#
# The last line defines an operand (a non-option argument). In this example, the
# operand can be used any number of times (the leading `*`), and it should be a
# file path, so we tell zsh to call the `_files` function to complete it. The
# `example file` in the middle is the description to use for this operand; we
# could use a space instead to accept the default provided by `_files`.
#
# GROUPING ARGUMENT SPECIFICATIONS
#
# Newer versions of zsh support grouping argument specs together. All specs
# following a `+` and then a group name are considered to be members of the
# named group. Grouping is useful mostly for organisational purposes; it makes
# the relationship between different options more obvious, and makes it easier
# to specify exclusions.
#
# We could rewrite our example above using grouping as follows:
#
# '(: * -)'{-h,--help}'[display help information]'
# '--color=[specify when to use colors]:when:(always never auto)'
# '*:example file:_files'
# + '(verbosity)'
# {-q,--quiet}'[decrease output verbosity]'
# '!--silent'
# {-v,--verbose}'[increase output verbosity]'
#
# Here we take advantage of a useful feature of spec grouping — when the group
# name is surrounded by parentheses, as in `(verbosity)`, it tells zsh that all
# of the options in that group are exclusive with each other. As a result, we
# don't need to manually list out the exclusions at the beginning of each
# option.
#
# Groups can also be referred to by name in other argument specs; for example:
#
# '(xyz)--aaa' '*: :_files'
# + xyz --xxx --yyy --zzz
#
# Here we use the group name `xyz` to tell zsh that `--xxx`, `--yyy`, and
# `--zzz` are not to be completed after `--aaa`. This makes the exclusion list
# much more compact and reusable.
#
# CONVENTIONS
#
# zsh completion functions generally adhere to the following conventions:
#
# * Use two spaces for indentation
# * Combine specs for options with different names using brace expansion
# * In combined specs, list the short option first (as in `{-a,--text}`)
# * Use `+` or `=` as described above for options that take arguments
# * Provide a description for all options, option-arguments, and operands
# * Capitalise/punctuate argument descriptions as phrases, not complete
# sentences — 'display help information', never 'Display help information.'
# (but still capitalise acronyms and proper names)
# * Write argument descriptions as verb phrases — 'display x', 'enable y',
# 'use z'
# * Word descriptions to make it clear when an option expects an argument;
# usually this is done with the word 'specify', as in 'specify x' or
# 'use specified x')
# * Write argument descriptions as tersely as possible — for example, articles
# like 'a' and 'the' should be omitted unless it would be confusing
#
# Other conventions currently used by this function:
#
# * Order argument specs alphabetically by group name, then option name
# * Group options that are directly related, mutually exclusive, or frequently
# referenced by other argument specs
# * Use only characters in the set [a-z0-9_-] in group names
# * Order exclusion lists as follows: short options, long options, groups
# * Use American English in descriptions
# * Use 'don't' in descriptions instead of 'do not'
# * Word descriptions for related options as similarly as possible. For example,
# `--foo[enable foo]` and `--no-foo[disable foo]`, or `--foo[use foo]` and
# `--no-foo[don't use foo]`
# * Word descriptions to make it clear when an option only makes sense with
# another option, usually by adding '(with -x)' to the end
# * Don't quote strings or variables unnecessarily. When quotes are required,
# prefer single-quotes to double-quotes
# * Prefix option specs with `$no` when the option serves only to negate the
# behaviour of another option that must be provided explicitly by the user.
# This prevents rarely used options from cluttering up the completion menu
################################################################################
# ------------------------------------------------------------------------------
# Copyright (c) 2011 Github zsh-users - http://github.com/zsh-users
# All rights reserved.

View File

@@ -28,37 +28,27 @@ Synopsis
DESCRIPTION
-----------
ripgrep (rg) recursively searches your current directory for a regex pattern.
By default, ripgrep will respect your .gitignore and automatically skip hidden
files/directories and binary files.
By default, ripgrep will respect your `.gitignore` and automatically skip
hidden files/directories and binary files.
ripgrep's default regex engine uses finite automata and guarantees linear
time searching. Because of this, features like backreferences and arbitrary
look-around are not supported. However, if ripgrep is built with PCRE2, then
the --pcre2 flag can be used to enable backreferences and look-around.
ripgrep supports configuration files. Set RIPGREP_CONFIG_PATH to a
configuration file. The file can specify one shell argument per line. Lines
starting with '#' are ignored. For more details, see the man page or the
README.
ripgrep's regex engine uses finite automata and guarantees linear time
searching. Because of this, features like backreferences and arbitrary
lookaround are not supported.
REGEX SYNTAX
------------
ripgrep uses Rust's regex engine by default, which documents its syntax:
https://docs.rs/regex/*/regex/#syntax
ripgrep uses Rust's regex engine, which documents its syntax:
https://docs.rs/regex/0.2.5/regex/#syntax
ripgrep uses byte-oriented regexes, which has some additional documentation:
https://docs.rs/regex/*/regex/bytes/index.html#syntax
https://docs.rs/regex/0.2.5/regex/bytes/index.html#syntax
To a first approximation, ripgrep uses Perl-like regexes without look-around or
backreferences. This makes them very similar to the "extended" (ERE) regular
expressions supported by `egrep`, but with a few additional features like
Unicode character classes.
If you're using ripgrep with the --pcre2 flag, then please consult
https://www.pcre.org or the PCRE2 man pages for documentation on the supported
syntax.
POSITIONAL ARGUMENTS
--------------------

View File

@@ -4,7 +4,7 @@ Cross platform single glob and glob set matching. Glob set matching is the
process of matching one or more glob patterns against a single candidate path
simultaneously, and returning all of the globs that matched.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.png)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![](https://img.shields.io/crates/v/globset.svg)](https://crates.io/crates/globset)

View File

@@ -470,6 +470,7 @@ impl GlobSetBuilder {
}
/// Add a new pattern to this set.
#[allow(dead_code)]
pub fn add(&mut self, pat: Glob) -> &mut GlobSetBuilder {
self.pats.push(pat);
self

View File

@@ -1,25 +0,0 @@
[package]
name = "grep-cli"
version = "0.1.0" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Utilities for search oriented command line applications.
"""
documentation = "https://docs.rs/grep-cli"
homepage = "https://github.com/BurntSushi/ripgrep"
repository = "https://github.com/BurntSushi/ripgrep"
readme = "README.md"
keywords = ["regex", "grep", "cli", "utility", "util"]
license = "Unlicense/MIT"
[dependencies]
atty = "0.2.11"
globset = { version = "0.4.1", path = "../globset" }
lazy_static = "1.1"
log = "0.4"
regex = "1"
same-file = "1"
termcolor = "1"
[target.'cfg(windows)'.dependencies.winapi-util]
version = "0.1.1"

View File

@@ -1,21 +0,0 @@
The MIT License (MIT)
Copyright (c) 2015 Andrew Gallant
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

View File

@@ -1,38 +0,0 @@
grep-cli
--------
A utility library that provides common routines desired in search oriented
command line applications. This includes, but is not limited to, parsing hex
escapes, detecting whether stdin is readable and more. To the extent possible,
this crate strives for compatibility across Windows, macOS and Linux.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![](https://img.shields.io/crates/v/grep-cli.svg)](https://crates.io/crates/grep-cli)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
### Documentation
[https://docs.rs/grep-cli](https://docs.rs/grep-cli)
**NOTE:** You probably don't want to use this crate directly. Instead, you
should prefer the facade defined in the
[`grep`](https://docs.rs/grep)
crate.
### Usage
Add this to your `Cargo.toml`:
```toml
[dependencies]
grep-cli = "0.1"
```
and this to your crate root:
```rust
extern crate grep_cli;
```

View File

@@ -1,24 +0,0 @@
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to <http://unlicense.org/>

View File

@@ -1,381 +0,0 @@
use std::ffi::{OsStr, OsString};
use std::fs::File;
use std::io;
use std::path::Path;
use std::process::Command;
use globset::{Glob, GlobSet, GlobSetBuilder};
use process::{CommandError, CommandReader, CommandReaderBuilder};
/// A builder for a matcher that determines which files get decompressed.
#[derive(Clone, Debug)]
pub struct DecompressionMatcherBuilder {
/// The commands for each matching glob.
commands: Vec<DecompressionCommand>,
/// Whether to include the default matching rules.
defaults: bool,
}
/// A representation of a single command for decompressing data
/// out-of-proccess.
#[derive(Clone, Debug)]
struct DecompressionCommand {
/// The glob that matches this command.
glob: String,
/// The command or binary name.
bin: OsString,
/// The arguments to invoke with the command.
args: Vec<OsString>,
}
impl Default for DecompressionMatcherBuilder {
fn default() -> DecompressionMatcherBuilder {
DecompressionMatcherBuilder::new()
}
}
impl DecompressionMatcherBuilder {
/// Create a new builder for configuring a decompression matcher.
pub fn new() -> DecompressionMatcherBuilder {
DecompressionMatcherBuilder {
commands: vec![],
defaults: true,
}
}
/// Build a matcher for determining how to decompress files.
///
/// If there was a problem compiling the matcher, then an error is
/// returned.
pub fn build(&self) -> Result<DecompressionMatcher, CommandError> {
let defaults =
if !self.defaults {
vec![]
} else {
default_decompression_commands()
};
let mut glob_builder = GlobSetBuilder::new();
let mut commands = vec![];
for decomp_cmd in defaults.iter().chain(&self.commands) {
let glob = Glob::new(&decomp_cmd.glob).map_err(|err| {
CommandError::io(io::Error::new(io::ErrorKind::Other, err))
})?;
glob_builder.add(glob);
commands.push(decomp_cmd.clone());
}
let globs = glob_builder.build().map_err(|err| {
CommandError::io(io::Error::new(io::ErrorKind::Other, err))
})?;
Ok(DecompressionMatcher { globs, commands })
}
/// When enabled, the default matching rules will be compiled into this
/// matcher before any other associations. When disabled, only the
/// rules explicitly given to this builder will be used.
///
/// This is enabled by default.
pub fn defaults(&mut self, yes: bool) -> &mut DecompressionMatcherBuilder {
self.defaults = yes;
self
}
/// Associates a glob with a command to decompress files matching the glob.
///
/// If multiple globs match the same file, then the most recently added
/// glob takes precedence.
///
/// The syntax for the glob is documented in the
/// [`globset` crate](https://docs.rs/globset/#syntax).
pub fn associate<P, I, A>(
&mut self,
glob: &str,
program: P,
args: I,
) -> &mut DecompressionMatcherBuilder
where P: AsRef<OsStr>,
I: IntoIterator<Item=A>,
A: AsRef<OsStr>,
{
let glob = glob.to_string();
let bin = program.as_ref().to_os_string();
let args = args
.into_iter()
.map(|a| a.as_ref().to_os_string())
.collect();
self.commands.push(DecompressionCommand { glob, bin, args });
self
}
}
/// A matcher for determining how to decompress files.
#[derive(Clone, Debug)]
pub struct DecompressionMatcher {
/// The set of globs to match. Each glob has a corresponding entry in
/// `commands`. When a glob matches, the corresponding command should be
/// used to perform out-of-process decompression.
globs: GlobSet,
/// The commands for each matching glob.
commands: Vec<DecompressionCommand>,
}
impl Default for DecompressionMatcher {
fn default() -> DecompressionMatcher {
DecompressionMatcher::new()
}
}
impl DecompressionMatcher {
/// Create a new matcher with default rules.
///
/// To add more matching rules, build a matcher with
/// [`DecompressionMatcherBuilder`](struct.DecompressionMatcherBuilder.html).
pub fn new() -> DecompressionMatcher {
DecompressionMatcherBuilder::new()
.build()
.expect("built-in matching rules should always compile")
}
/// Return a pre-built command based on the given file path that can
/// decompress its contents. If no such decompressor is known, then this
/// returns `None`.
///
/// If there are multiple possible commands matching the given path, then
/// the command added last takes precedence.
pub fn command<P: AsRef<Path>>(&self, path: P) -> Option<Command> {
for i in self.globs.matches(path).into_iter().rev() {
let decomp_cmd = &self.commands[i];
let mut cmd = Command::new(&decomp_cmd.bin);
cmd.args(&decomp_cmd.args);
return Some(cmd);
}
None
}
/// Returns true if and only if the given file path has at least one
/// matching command to perform decompression on.
pub fn has_command<P: AsRef<Path>>(&self, path: P) -> bool {
self.globs.is_match(path)
}
}
/// Configures and builds a streaming reader for decompressing data.
#[derive(Clone, Debug, Default)]
pub struct DecompressionReaderBuilder {
matcher: DecompressionMatcher,
command_builder: CommandReaderBuilder,
}
impl DecompressionReaderBuilder {
/// Create a new builder with the default configuration.
pub fn new() -> DecompressionReaderBuilder {
DecompressionReaderBuilder::default()
}
/// Build a new streaming reader for decompressing data.
///
/// If decompression is done out-of-process and if there was a problem
/// spawning the process, then its error is logged at the debug level and a
/// passthru reader is returned that does no decompression. This behavior
/// typically occurs when the given file path matches a decompression
/// command, but is executing in an environment where the decompression
/// command is not available.
///
/// If the given file path could not be matched with a decompression
/// strategy, then a passthru reader is returned that does no
/// decompression.
pub fn build<P: AsRef<Path>>(
&self,
path: P,
) -> Result<DecompressionReader, CommandError> {
let path = path.as_ref();
let mut cmd = match self.matcher.command(path) {
None => return DecompressionReader::new_passthru(path),
Some(cmd) => cmd,
};
cmd.arg(path);
match self.command_builder.build(&mut cmd) {
Ok(cmd_reader) => Ok(DecompressionReader { rdr: Ok(cmd_reader) }),
Err(err) => {
debug!(
"{}: error spawning command '{:?}': {} \
(falling back to uncompressed reader)",
path.display(),
cmd,
err,
);
DecompressionReader::new_passthru(path)
}
}
}
/// Set the matcher to use to look up the decompression command for each
/// file path.
///
/// A set of sensible rules is enabled by default. Setting this will
/// completely replace the current rules.
pub fn matcher(
&mut self,
matcher: DecompressionMatcher,
) -> &mut DecompressionReaderBuilder {
self.matcher = matcher;
self
}
/// Get the underlying matcher currently used by this builder.
pub fn get_matcher(&self) -> &DecompressionMatcher {
&self.matcher
}
/// When enabled, the reader will asynchronously read the contents of the
/// command's stderr output. When disabled, stderr is only read after the
/// stdout stream has been exhausted (or if the process quits with an error
/// code).
///
/// Note that when enabled, this may require launching an additional
/// thread in order to read stderr. This is done so that the process being
/// executed is never blocked from writing to stdout or stderr. If this is
/// disabled, then it is possible for the process to fill up the stderr
/// buffer and deadlock.
///
/// This is enabled by default.
pub fn async_stderr(
&mut self,
yes: bool,
) -> &mut DecompressionReaderBuilder {
self.command_builder.async_stderr(yes);
self
}
}
/// A streaming reader for decompressing the contents of a file.
///
/// The purpose of this reader is to provide a seamless way to decompress the
/// contents of file using existing tools in the current environment. This is
/// meant to be an alternative to using decompression libraries in favor of the
/// simplicity and portability of using external commands such as `gzip` and
/// `xz`. This does impose the overhead of spawning a process, so other means
/// for performing decompression should be sought if this overhead isn't
/// acceptable.
///
/// A decompression reader comes with a default set of matching rules that are
/// meant to associate file paths with the corresponding command to use to
/// decompress them. For example, a glob like `*.gz` matches gzip compressed
/// files with the command `gzip -d -c`. If a file path does not match any
/// existing rules, or if it matches a rule whose command does not exist in the
/// current environment, then the decompression reader passes through the
/// contents of the underlying file without doing any decompression.
///
/// The default matching rules are probably good enough for most cases, and if
/// they require revision, pull requests are welcome. In cases where they must
/// be changed or extended, they can be customized through the use of
/// [`DecompressionMatcherBuilder`](struct.DecompressionMatcherBuilder.html)
/// and
/// [`DecompressionReaderBuilder`](struct.DecompressionReaderBuilder.html).
///
/// By default, this reader will asynchronously read the processes' stderr.
/// This prevents subtle deadlocking bugs for noisy processes that write a lot
/// to stderr. Currently, the entire contents of stderr is read on to the heap.
///
/// # Example
///
/// This example shows how to read the decompressed contents of a file without
/// needing to explicitly choose the decompression command to run.
///
/// Note that if you need to decompress multiple files, it is better to use
/// `DecompressionReaderBuilder`, which will amortize the cost of compiling the
/// matcher.
///
/// ```no_run
/// use std::io::Read;
/// use std::process::Command;
/// use grep_cli::DecompressionReader;
///
/// # fn example() -> Result<(), Box<::std::error::Error>> {
/// let mut rdr = DecompressionReader::new("/usr/share/man/man1/ls.1.gz")?;
/// let mut contents = vec![];
/// rdr.read_to_end(&mut contents)?;
/// # Ok(()) }
/// ```
#[derive(Debug)]
pub struct DecompressionReader {
rdr: Result<CommandReader, File>,
}
impl DecompressionReader {
/// Build a new streaming reader for decompressing data.
///
/// If decompression is done out-of-process and if there was a problem
/// spawning the process, then its error is returned.
///
/// If the given file path could not be matched with a decompression
/// strategy, then a passthru reader is returned that does no
/// decompression.
///
/// This uses the default matching rules for determining how to decompress
/// the given file. To change those matching rules, use
/// [`DecompressionReaderBuilder`](struct.DecompressionReaderBuilder.html)
/// and
/// [`DecompressionMatcherBuilder`](struct.DecompressionMatcherBuilder.html).
///
/// When creating readers for many paths. it is better to use the builder
/// since it will amortize the cost of constructing the matcher.
pub fn new<P: AsRef<Path>>(
path: P,
) -> Result<DecompressionReader, CommandError> {
DecompressionReaderBuilder::new().build(path)
}
/// Creates a new "passthru" decompression reader that reads from the file
/// corresponding to the given path without doing decompression and without
/// executing another process.
fn new_passthru(path: &Path) -> Result<DecompressionReader, CommandError> {
let file = File::open(path)?;
Ok(DecompressionReader { rdr: Err(file) })
}
}
impl io::Read for DecompressionReader {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
match self.rdr {
Ok(ref mut rdr) => rdr.read(buf),
Err(ref mut rdr) => rdr.read(buf),
}
}
}
fn default_decompression_commands() -> Vec<DecompressionCommand> {
const ARGS_GZIP: &[&str] = &["gzip", "-d", "-c"];
const ARGS_BZIP: &[&str] = &["bzip2", "-d", "-c"];
const ARGS_XZ: &[&str] = &["xz", "-d", "-c"];
const ARGS_LZ4: &[&str] = &["lz4", "-d", "-c"];
const ARGS_LZMA: &[&str] = &["xz", "--format=lzma", "-d", "-c"];
fn cmd(glob: &str, args: &[&str]) -> DecompressionCommand {
DecompressionCommand {
glob: glob.to_string(),
bin: OsStr::new(&args[0]).to_os_string(),
args: args
.iter()
.skip(1)
.map(|s| OsStr::new(s).to_os_string())
.collect(),
}
}
vec![
cmd("*.gz", ARGS_GZIP),
cmd("*.tgz", ARGS_GZIP),
cmd("*.bz2", ARGS_BZIP),
cmd("*.tbz2", ARGS_BZIP),
cmd("*.xz", ARGS_XZ),
cmd("*.txz", ARGS_XZ),
cmd("*.lz4", ARGS_LZ4),
cmd("*.lzma", ARGS_LZMA),
]
}

View File

@@ -1,315 +0,0 @@
use std::ffi::OsStr;
use std::str;
/// A single state in the state machine used by `unescape`.
#[derive(Clone, Copy, Eq, PartialEq)]
enum State {
/// The state after seeing a `\`.
Escape,
/// The state after seeing a `\x`.
HexFirst,
/// The state after seeing a `\x[0-9A-Fa-f]`.
HexSecond(char),
/// Default state.
Literal,
}
/// Escapes arbitrary bytes into a human readable string.
///
/// This converts `\t`, `\r` and `\n` into their escaped forms. It also
/// converts the non-printable subset of ASCII in addition to invalid UTF-8
/// bytes to hexadecimal escape sequences. Everything else is left as is.
///
/// The dual of this routine is [`unescape`](fn.unescape.html).
///
/// # Example
///
/// This example shows how to convert a byte string that contains a `\n` and
/// invalid UTF-8 bytes into a `String`.
///
/// Pay special attention to the use of raw strings. That is, `r"\n"` is
/// equivalent to `"\\n"`.
///
/// ```
/// use grep_cli::escape;
///
/// assert_eq!(r"foo\nbar\xFFbaz", escape(b"foo\nbar\xFFbaz"));
/// ```
pub fn escape(mut bytes: &[u8]) -> String {
let mut escaped = String::new();
while let Some(result) = decode_utf8(bytes) {
match result {
Ok(cp) => {
escape_char(cp, &mut escaped);
bytes = &bytes[cp.len_utf8()..];
}
Err(byte) => {
escape_byte(byte, &mut escaped);
bytes = &bytes[1..];
}
}
}
escaped
}
/// Escapes an OS string into a human readable string.
///
/// This is like [`escape`](fn.escape.html), but accepts an OS string.
pub fn escape_os(string: &OsStr) -> String {
#[cfg(unix)]
fn imp(string: &OsStr) -> String {
use std::os::unix::ffi::OsStrExt;
escape(string.as_bytes())
}
#[cfg(not(unix))]
fn imp(string: &OsStr) -> String {
escape(string.to_string_lossy().as_bytes())
}
imp(string)
}
/// Unescapes a string.
///
/// It supports a limited set of escape sequences:
///
/// * `\t`, `\r` and `\n` are mapped to their corresponding ASCII bytes.
/// * `\xZZ` hexadecimal escapes are mapped to their byte.
///
/// Everything else is left as is, including non-hexadecimal escapes like
/// `\xGG`.
///
/// This is useful when it is desirable for a command line argument to be
/// capable of specifying arbitrary bytes or otherwise make it easier to
/// specify non-printable characters.
///
/// The dual of this routine is [`escape`](fn.escape.html).
///
/// # Example
///
/// This example shows how to convert an escaped string (which is valid UTF-8)
/// into a corresponding sequence of bytes. Each escape sequence is mapped to
/// its bytes, which may include invalid UTF-8.
///
/// Pay special attention to the use of raw strings. That is, `r"\n"` is
/// equivalent to `"\\n"`.
///
/// ```
/// use grep_cli::unescape;
///
/// assert_eq!(&b"foo\nbar\xFFbaz"[..], &*unescape(r"foo\nbar\xFFbaz"));
/// ```
pub fn unescape(s: &str) -> Vec<u8> {
use self::State::*;
let mut bytes = vec![];
let mut state = Literal;
for c in s.chars() {
match state {
Escape => {
match c {
'\\' => { bytes.push(b'\\'); state = Literal; }
'n' => { bytes.push(b'\n'); state = Literal; }
'r' => { bytes.push(b'\r'); state = Literal; }
't' => { bytes.push(b'\t'); state = Literal; }
'x' => { state = HexFirst; }
c => {
bytes.extend(format!(r"\{}", c).into_bytes());
state = Literal;
}
}
}
HexFirst => {
match c {
'0'...'9' | 'A'...'F' | 'a'...'f' => {
state = HexSecond(c);
}
c => {
bytes.extend(format!(r"\x{}", c).into_bytes());
state = Literal;
}
}
}
HexSecond(first) => {
match c {
'0'...'9' | 'A'...'F' | 'a'...'f' => {
let ordinal = format!("{}{}", first, c);
let byte = u8::from_str_radix(&ordinal, 16).unwrap();
bytes.push(byte);
state = Literal;
}
c => {
let original = format!(r"\x{}{}", first, c);
bytes.extend(original.into_bytes());
state = Literal;
}
}
}
Literal => {
match c {
'\\' => { state = Escape; }
c => { bytes.extend(c.to_string().as_bytes()); }
}
}
}
}
match state {
Escape => bytes.push(b'\\'),
HexFirst => bytes.extend(b"\\x"),
HexSecond(c) => bytes.extend(format!("\\x{}", c).into_bytes()),
Literal => {}
}
bytes
}
/// Unescapes an OS string.
///
/// This is like [`unescape`](fn.unescape.html), but accepts an OS string.
///
/// Note that this first lossily decodes the given OS string as UTF-8. That
/// is, an escaped string (the thing given) should be valid UTF-8.
pub fn unescape_os(string: &OsStr) -> Vec<u8> {
unescape(&string.to_string_lossy())
}
/// Adds the given codepoint to the given string, escaping it if necessary.
fn escape_char(cp: char, into: &mut String) {
if cp.is_ascii() {
escape_byte(cp as u8, into);
} else {
into.push(cp);
}
}
/// Adds the given byte to the given string, escaping it if necessary.
fn escape_byte(byte: u8, into: &mut String) {
match byte {
0x21...0x5B | 0x5D...0x7D => into.push(byte as char),
b'\n' => into.push_str(r"\n"),
b'\r' => into.push_str(r"\r"),
b'\t' => into.push_str(r"\t"),
b'\\' => into.push_str(r"\\"),
_ => into.push_str(&format!(r"\x{:02X}", byte)),
}
}
/// Decodes the next UTF-8 encoded codepoint from the given byte slice.
///
/// If no valid encoding of a codepoint exists at the beginning of the given
/// byte slice, then the first byte is returned instead.
///
/// This returns `None` if and only if `bytes` is empty.
fn decode_utf8(bytes: &[u8]) -> Option<Result<char, u8>> {
if bytes.is_empty() {
return None;
}
let len = match utf8_len(bytes[0]) {
None => return Some(Err(bytes[0])),
Some(len) if len > bytes.len() => return Some(Err(bytes[0])),
Some(len) => len,
};
match str::from_utf8(&bytes[..len]) {
Ok(s) => Some(Ok(s.chars().next().unwrap())),
Err(_) => Some(Err(bytes[0])),
}
}
/// Given a UTF-8 leading byte, this returns the total number of code units
/// in the following encoded codepoint.
///
/// If the given byte is not a valid UTF-8 leading byte, then this returns
/// `None`.
fn utf8_len(byte: u8) -> Option<usize> {
if byte <= 0x7F {
Some(1)
} else if byte <= 0b110_11111 {
Some(2)
} else if byte <= 0b1110_1111 {
Some(3)
} else if byte <= 0b1111_0111 {
Some(4)
} else {
None
}
}
#[cfg(test)]
mod tests {
use super::{escape, unescape};
fn b(bytes: &'static [u8]) -> Vec<u8> {
bytes.to_vec()
}
#[test]
fn empty() {
assert_eq!(b(b""), unescape(r""));
assert_eq!(r"", escape(b""));
}
#[test]
fn backslash() {
assert_eq!(b(b"\\"), unescape(r"\\"));
assert_eq!(r"\\", escape(b"\\"));
}
#[test]
fn nul() {
assert_eq!(b(b"\x00"), unescape(r"\x00"));
assert_eq!(r"\x00", escape(b"\x00"));
}
#[test]
fn nl() {
assert_eq!(b(b"\n"), unescape(r"\n"));
assert_eq!(r"\n", escape(b"\n"));
}
#[test]
fn tab() {
assert_eq!(b(b"\t"), unescape(r"\t"));
assert_eq!(r"\t", escape(b"\t"));
}
#[test]
fn carriage() {
assert_eq!(b(b"\r"), unescape(r"\r"));
assert_eq!(r"\r", escape(b"\r"));
}
#[test]
fn nothing_simple() {
assert_eq!(b(b"\\a"), unescape(r"\a"));
assert_eq!(b(b"\\a"), unescape(r"\\a"));
assert_eq!(r"\\a", escape(b"\\a"));
}
#[test]
fn nothing_hex0() {
assert_eq!(b(b"\\x"), unescape(r"\x"));
assert_eq!(b(b"\\x"), unescape(r"\\x"));
assert_eq!(r"\\x", escape(b"\\x"));
}
#[test]
fn nothing_hex1() {
assert_eq!(b(b"\\xz"), unescape(r"\xz"));
assert_eq!(b(b"\\xz"), unescape(r"\\xz"));
assert_eq!(r"\\xz", escape(b"\\xz"));
}
#[test]
fn nothing_hex2() {
assert_eq!(b(b"\\xzz"), unescape(r"\xzz"));
assert_eq!(b(b"\\xzz"), unescape(r"\\xzz"));
assert_eq!(r"\\xzz", escape(b"\\xzz"));
}
#[test]
fn invalid_utf8() {
assert_eq!(r"\xFF", escape(b"\xFF"));
assert_eq!(r"a\xFFb", escape(b"a\xFFb"));
}
}

View File

@@ -1,171 +0,0 @@
use std::error;
use std::fmt;
use std::io;
use std::num::ParseIntError;
use regex::Regex;
/// An error that occurs when parsing a human readable size description.
///
/// This error provides a end user friendly message describing why the
/// description coudln't be parsed and what the expected format is.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct ParseSizeError {
original: String,
kind: ParseSizeErrorKind,
}
#[derive(Clone, Debug, Eq, PartialEq)]
enum ParseSizeErrorKind {
InvalidFormat,
InvalidInt(ParseIntError),
Overflow,
}
impl ParseSizeError {
fn format(original: &str) -> ParseSizeError {
ParseSizeError {
original: original.to_string(),
kind: ParseSizeErrorKind::InvalidFormat,
}
}
fn int(original: &str, err: ParseIntError) -> ParseSizeError {
ParseSizeError {
original: original.to_string(),
kind: ParseSizeErrorKind::InvalidInt(err),
}
}
fn overflow(original: &str) -> ParseSizeError {
ParseSizeError {
original: original.to_string(),
kind: ParseSizeErrorKind::Overflow,
}
}
}
impl error::Error for ParseSizeError {
fn description(&self) -> &str { "invalid size" }
}
impl fmt::Display for ParseSizeError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
use self::ParseSizeErrorKind::*;
match self.kind {
InvalidFormat => {
write!(
f,
"invalid format for size '{}', which should be a sequence \
of digits followed by an optional 'K', 'M' or 'G' \
suffix",
self.original
)
}
InvalidInt(ref err) => {
write!(
f,
"invalid integer found in size '{}': {}",
self.original,
err
)
}
Overflow => {
write!(f, "size too big in '{}'", self.original)
}
}
}
}
impl From<ParseSizeError> for io::Error {
fn from(size_err: ParseSizeError) -> io::Error {
io::Error::new(io::ErrorKind::Other, size_err)
}
}
/// Parse a human readable size like `2M` into a corresponding number of bytes.
///
/// Supported size suffixes are `K` (for kilobyte), `M` (for megabyte) and `G`
/// (for gigabyte). If a size suffix is missing, then the size is interpreted
/// as bytes. If the size is too big to fit into a `u64`, then this returns an
/// error.
///
/// Additional suffixes may be added over time.
pub fn parse_human_readable_size(size: &str) -> Result<u64, ParseSizeError> {
lazy_static! {
// Normally I'd just parse something this simple by hand to avoid the
// regex dep, but we bring regex in any way for glob matching, so might
// as well use it.
static ref RE: Regex = Regex::new(r"^([0-9]+)([KMG])?$").unwrap();
}
let caps = match RE.captures(size) {
Some(caps) => caps,
None => return Err(ParseSizeError::format(size)),
};
let value: u64 = caps[1].parse().map_err(|err| {
ParseSizeError::int(size, err)
})?;
let suffix = match caps.get(2) {
None => return Ok(value),
Some(cap) => cap.as_str(),
};
let bytes = match suffix {
"K" => value.checked_mul(1<<10),
"M" => value.checked_mul(1<<20),
"G" => value.checked_mul(1<<30),
// Because if the regex matches this group, it must be [KMG].
_ => unreachable!(),
};
bytes.ok_or_else(|| ParseSizeError::overflow(size))
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn suffix_none() {
let x = parse_human_readable_size("123").unwrap();
assert_eq!(123, x);
}
#[test]
fn suffix_k() {
let x = parse_human_readable_size("123K").unwrap();
assert_eq!(123 * (1<<10), x);
}
#[test]
fn suffix_m() {
let x = parse_human_readable_size("123M").unwrap();
assert_eq!(123 * (1<<20), x);
}
#[test]
fn suffix_g() {
let x = parse_human_readable_size("123G").unwrap();
assert_eq!(123 * (1<<30), x);
}
#[test]
fn invalid_empty() {
assert!(parse_human_readable_size("").is_err());
}
#[test]
fn invalid_non_digit() {
assert!(parse_human_readable_size("a").is_err());
}
#[test]
fn invalid_overflow() {
assert!(parse_human_readable_size("9999999999999999G").is_err());
}
#[test]
fn invalid_suffix() {
assert!(parse_human_readable_size("123T").is_err());
}
}

View File

@@ -1,251 +0,0 @@
/*!
This crate provides common routines used in command line applications, with a
focus on routines useful for search oriented applications. As a utility
library, there is no central type or function. However, a key focus of this
crate is to improve failure modes and provide user friendly error messages
when things go wrong.
To the best extent possible, everything in this crate works on Windows, macOS
and Linux.
# Standard I/O
The
[`is_readable_stdin`](fn.is_readable_stdin.html),
[`is_tty_stderr`](fn.is_tty_stderr.html),
[`is_tty_stdin`](fn.is_tty_stdin.html)
and
[`is_tty_stdout`](fn.is_tty_stdout.html)
routines query aspects of standard I/O. `is_readable_stdin` determines whether
stdin can be usefully read from, while the `tty` methods determine whether a
tty is attached to stdin/stdout/stderr.
`is_readable_stdin` is useful when writing an application that changes behavior
based on whether the application was invoked with data on stdin. For example,
`rg foo` might recursively search the current working directory for
occurrences of `foo`, but `rg foo < file` might only search the contents of
`file`.
The `tty` methods are useful for similar reasons. Namely, commands like `ls`
will change their output depending on whether they are printing to a terminal
or not. For example, `ls` shows a file on each line when stdout is redirected
to a file or a pipe, but condenses the output to show possibly many files on
each line when stdout is connected to a tty.
# Coloring and buffering
The
[`stdout`](fn.stdout.html),
[`stdout_buffered_block`](fn.stdout_buffered_block.html)
and
[`stdout_buffered_line`](fn.stdout_buffered_line.html)
routines are alternative constructors for
[`StandardStream`](struct.StandardStream.html).
A `StandardStream` implements `termcolor::WriteColor`, which provides a way
to emit colors to terminals. Its key use is the encapsulation of buffering
style. Namely, `stdout` will return a line buffered `StandardStream` if and
only if stdout is connected to a tty, and will otherwise return a block
buffered `StandardStream`. Line buffering is important for use with a tty
because it typically decreases the latency at which the end user sees output.
Block buffering is used otherwise because it is faster, and redirecting stdout
to a file typically doesn't benefit from the decreased latency that line
buffering provides.
The `stdout_buffered_block` and `stdout_buffered_line` can be used to
explicitly set the buffering strategy regardless of whether stdout is connected
to a tty or not.
# Escaping
The
[`escape`](fn.escape.html),
[`escape_os`](fn.escape_os.html),
[`unescape`](fn.unescape.html)
and
[`unescape_os`](fn.unescape_os.html)
routines provide a user friendly way of dealing with UTF-8 encoded strings that
can express arbitrary bytes. For example, you might want to accept a string
containing arbitrary bytes as a command line argument, but most interactive
shells make such strings difficult to type. Instead, we can ask users to use
escape sequences.
For example, `a\xFFz` is itself a valid UTF-8 string corresponding to the
following bytes:
```ignore
[b'a', b'\\', b'x', b'F', b'F', b'z']
```
However, we can
interpret `\xFF` as an escape sequence with the `unescape`/`unescape_os`
routines, which will yield
```ignore
[b'a', b'\xFF', b'z']
```
instead. For example:
```
use grep_cli::unescape;
// Note the use of a raw string!
assert_eq!(vec![b'a', b'\xFF', b'z'], unescape(r"a\xFFz"));
```
The `escape`/`escape_os` routines provide the reverse transformation, which
makes it easy to show user friendly error messages involving arbitrary bytes.
# Building patterns
Typically, regular expression patterns must be valid UTF-8. However, command
line arguments aren't guaranteed to be valid UTF-8. Unfortunately, the
standard library's UTF-8 conversion functions from `OsStr`s do not provide
good error messages. However, the
[`pattern_from_bytes`](fn.pattern_from_bytes.html)
and
[`pattern_from_os`](fn.pattern_from_os.html)
do, including reporting exactly where the first invalid UTF-8 byte is seen.
Additionally, it can be useful to read patterns from a file while reporting
good error messages that include line numbers. The
[`patterns_from_path`](fn.patterns_from_path.html),
[`patterns_from_reader`](fn.patterns_from_reader.html)
and
[`patterns_from_stdin`](fn.patterns_from_stdin.html)
routines do just that. If any pattern is found that is invalid UTF-8, then the
error includes the file path (if available) along with the line number and the
byte offset at which the first invalid UTF-8 byte was observed.
# Read process output
Sometimes a command line application needs to execute other processes and read
its stdout in a streaming fashion. The
[`CommandReader`](struct.CommandReader.html)
provides this functionality with an explicit goal of improving failure modes.
In particular, if the process exits with an error code, then stderr is read
and converted into a normal Rust error to show to end users. This makes the
underlying failure modes explicit and gives more information to end users for
debugging the problem.
As a special case,
[`DecompressionReader`](struct.DecompressionReader.html)
provides a way to decompress arbitrary files by matching their file extensions
up with corresponding decompression programs (such as `gzip` and `xz`). This
is useful as a means of performing simplistic decompression in a portable
manner without binding to specific compression libraries. This does come with
some overhead though, so if you need to decompress lots of small files, this
may not be an appropriate convenience to use.
Each reader has a corresponding builder for additional configuration, such as
whether to read stderr asynchronously in order to avoid deadlock (which is
enabled by default).
# Miscellaneous parsing
The
[`parse_human_readable_size`](fn.parse_human_readable_size.html)
routine parses strings like `2M` and converts them to the corresponding number
of bytes (`2 * 1<<20` in this case). If an invalid size is found, then a good
error message is crafted that typically tells the user how to fix the problem.
*/
#![deny(missing_docs)]
extern crate atty;
extern crate globset;
#[macro_use]
extern crate lazy_static;
#[macro_use]
extern crate log;
extern crate regex;
extern crate same_file;
extern crate termcolor;
#[cfg(windows)]
extern crate winapi_util;
mod decompress;
mod escape;
mod human;
mod pattern;
mod process;
mod wtr;
pub use decompress::{
DecompressionMatcher, DecompressionMatcherBuilder,
DecompressionReader, DecompressionReaderBuilder,
};
pub use escape::{escape, escape_os, unescape, unescape_os};
pub use human::{ParseSizeError, parse_human_readable_size};
pub use pattern::{
InvalidPatternError,
pattern_from_os, pattern_from_bytes,
patterns_from_path, patterns_from_reader, patterns_from_stdin,
};
pub use process::{CommandError, CommandReader, CommandReaderBuilder};
pub use wtr::{
StandardStream,
stdout, stdout_buffered_line, stdout_buffered_block,
};
/// Returns true if and only if stdin is believed to be readable.
///
/// When stdin is readable, command line programs may choose to behave
/// differently than when stdin is not readable. For example, `command foo`
/// might search the current directory for occurrences of `foo` where as
/// `command foo < some-file` or `cat some-file | command foo` might instead
/// only search stdin for occurrences of `foo`.
pub fn is_readable_stdin() -> bool {
#[cfg(unix)]
fn imp() -> bool {
use std::os::unix::fs::FileTypeExt;
use same_file::Handle;
let ft = match Handle::stdin().and_then(|h| h.as_file().metadata()) {
Err(_) => return false,
Ok(md) => md.file_type(),
};
ft.is_file() || ft.is_fifo()
}
#[cfg(windows)]
fn imp() -> bool {
use winapi_util as winutil;
winutil::file::typ(winutil::HandleRef::stdin())
.map(|t| t.is_disk() || t.is_pipe())
.unwrap_or(false)
}
!is_tty_stdin() && imp()
}
/// Returns true if and only if stdin is believed to be connectted to a tty
/// or a console.
pub fn is_tty_stdin() -> bool {
atty::is(atty::Stream::Stdin)
}
/// Returns true if and only if stdout is believed to be connectted to a tty
/// or a console.
///
/// This is useful for when you want your command line program to produce
/// different output depending on whether it's printing directly to a user's
/// terminal or whether it's being redirected somewhere else. For example,
/// implementations of `ls` will often show one item per line when stdout is
/// redirected, but will condensed output when printing to a tty.
pub fn is_tty_stdout() -> bool {
atty::is(atty::Stream::Stdout)
}
/// Returns true if and only if stderr is believed to be connectted to a tty
/// or a console.
pub fn is_tty_stderr() -> bool {
atty::is(atty::Stream::Stderr)
}

View File

@@ -1,205 +0,0 @@
use std::error;
use std::ffi::OsStr;
use std::fmt;
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;
use std::str;
use escape::{escape, escape_os};
/// An error that occurs when a pattern could not be converted to valid UTF-8.
///
/// The purpose of this error is to give a more targeted failure mode for
/// patterns written by end users that are not valid UTF-8.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct InvalidPatternError {
original: String,
valid_up_to: usize,
}
impl InvalidPatternError {
/// Returns the index in the given string up to which valid UTF-8 was
/// verified.
pub fn valid_up_to(&self) -> usize {
self.valid_up_to
}
}
impl error::Error for InvalidPatternError {
fn description(&self) -> &str { "invalid pattern" }
}
impl fmt::Display for InvalidPatternError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(
f,
"found invalid UTF-8 in pattern at byte offset {} \
(use hex escape sequences to match arbitrary bytes \
in a pattern, e.g., \\xFF): '{}'",
self.valid_up_to,
self.original,
)
}
}
impl From<InvalidPatternError> for io::Error {
fn from(paterr: InvalidPatternError) -> io::Error {
io::Error::new(io::ErrorKind::Other, paterr)
}
}
/// Convert an OS string into a regular expression pattern.
///
/// This conversion fails if the given pattern is not valid UTF-8, in which
/// case, a targeted error with more information about where the invalid UTF-8
/// occurs is given. The error also suggests the use of hex escape sequences,
/// which are supported by many regex engines.
pub fn pattern_from_os(pattern: &OsStr) -> Result<&str, InvalidPatternError> {
pattern.to_str().ok_or_else(|| {
let valid_up_to = pattern
.to_string_lossy()
.find('\u{FFFD}')
.expect("a Unicode replacement codepoint for invalid UTF-8");
InvalidPatternError {
original: escape_os(pattern),
valid_up_to: valid_up_to,
}
})
}
/// Convert arbitrary bytes into a regular expression pattern.
///
/// This conversion fails if the given pattern is not valid UTF-8, in which
/// case, a targeted error with more information about where the invalid UTF-8
/// occurs is given. The error also suggests the use of hex escape sequences,
/// which are supported by many regex engines.
pub fn pattern_from_bytes(
pattern: &[u8],
) -> Result<&str, InvalidPatternError> {
str::from_utf8(pattern).map_err(|err| {
InvalidPatternError {
original: escape(pattern),
valid_up_to: err.valid_up_to(),
}
})
}
/// Read patterns from a file path, one per line.
///
/// If there was a problem reading or if any of the patterns contain invalid
/// UTF-8, then an error is returned. If there was a problem with a specific
/// pattern, then the error message will include the line number and the file
/// path.
pub fn patterns_from_path<P: AsRef<Path>>(path: P) -> io::Result<Vec<String>> {
let path = path.as_ref();
let file = File::open(path).map_err(|err| {
io::Error::new(
io::ErrorKind::Other,
format!("{}: {}", path.display(), err),
)
})?;
patterns_from_reader(file).map_err(|err| {
io::Error::new(
io::ErrorKind::Other,
format!("{}:{}", path.display(), err),
)
})
}
/// Read patterns from stdin, one per line.
///
/// If there was a problem reading or if any of the patterns contain invalid
/// UTF-8, then an error is returned. If there was a problem with a specific
/// pattern, then the error message will include the line number and the fact
/// that it came from stdin.
pub fn patterns_from_stdin() -> io::Result<Vec<String>> {
let stdin = io::stdin();
let locked = stdin.lock();
patterns_from_reader(locked).map_err(|err| {
io::Error::new(
io::ErrorKind::Other,
format!("<stdin>:{}", err),
)
})
}
/// Read patterns from any reader, one per line.
///
/// If there was a problem reading or if any of the patterns contain invalid
/// UTF-8, then an error is returned. If there was a problem with a specific
/// pattern, then the error message will include the line number.
///
/// Note that this routine uses its own internal buffer, so the caller should
/// not provide their own buffered reader if possible.
///
/// # Example
///
/// This shows how to parse patterns, one per line.
///
/// ```
/// use grep_cli::patterns_from_reader;
///
/// # fn example() -> Result<(), Box<::std::error::Error>> {
/// let patterns = "\
/// foo
/// bar\\s+foo
/// [a-z]{3}
/// ";
///
/// assert_eq!(patterns_from_reader(patterns.as_bytes())?, vec![
/// r"foo",
/// r"bar\s+foo",
/// r"[a-z]{3}",
/// ]);
/// # Ok(()) }
/// ```
pub fn patterns_from_reader<R: io::Read>(rdr: R) -> io::Result<Vec<String>> {
let mut patterns = vec![];
let mut bufrdr = io::BufReader::new(rdr);
let mut line = vec![];
let mut line_number = 0;
while {
line.clear();
line_number += 1;
bufrdr.read_until(b'\n', &mut line)? > 0
} {
line.pop().unwrap(); // remove trailing '\n'
if line.last() == Some(&b'\r') {
line.pop().unwrap();
}
match pattern_from_bytes(&line) {
Ok(pattern) => patterns.push(pattern.to_string()),
Err(err) => {
return Err(io::Error::new(
io::ErrorKind::Other,
format!("{}: {}", line_number, err),
));
}
}
}
Ok(patterns)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn bytes() {
let pat = b"abc\xFFxyz";
let err = pattern_from_bytes(pat).unwrap_err();
assert_eq!(3, err.valid_up_to());
}
#[test]
#[cfg(unix)]
fn os() {
use std::os::unix::ffi::OsStrExt;
use std::ffi::OsStr;
let pat = OsStr::from_bytes(b"abc\xFFxyz");
let err = pattern_from_os(pat).unwrap_err();
assert_eq!(3, err.valid_up_to());
}
}

View File

@@ -1,267 +0,0 @@
use std::error;
use std::fmt;
use std::io::{self, Read};
use std::iter;
use std::process;
use std::thread::{self, JoinHandle};
/// An error that can occur while running a command and reading its output.
///
/// This error can be seamlessly converted to an `io::Error` via a `From`
/// implementation.
#[derive(Debug)]
pub struct CommandError {
kind: CommandErrorKind,
}
#[derive(Debug)]
enum CommandErrorKind {
Io(io::Error),
Stderr(Vec<u8>),
}
impl CommandError {
/// Create an error from an I/O error.
pub(crate) fn io(ioerr: io::Error) -> CommandError {
CommandError { kind: CommandErrorKind::Io(ioerr) }
}
/// Create an error from the contents of stderr (which may be empty).
pub(crate) fn stderr(bytes: Vec<u8>) -> CommandError {
CommandError { kind: CommandErrorKind::Stderr(bytes) }
}
}
impl error::Error for CommandError {
fn description(&self) -> &str { "command error" }
}
impl fmt::Display for CommandError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self.kind {
CommandErrorKind::Io(ref e) => e.fmt(f),
CommandErrorKind::Stderr(ref bytes) => {
let msg = String::from_utf8_lossy(bytes);
if msg.trim().is_empty() {
write!(f, "<stderr is empty>")
} else {
let div = iter::repeat('-').take(79).collect::<String>();
write!(f, "\n{div}\n{msg}\n{div}", div=div, msg=msg.trim())
}
}
}
}
}
impl From<io::Error> for CommandError {
fn from(ioerr: io::Error) -> CommandError {
CommandError { kind: CommandErrorKind::Io(ioerr) }
}
}
impl From<CommandError> for io::Error {
fn from(cmderr: CommandError) -> io::Error {
match cmderr.kind {
CommandErrorKind::Io(ioerr) => ioerr,
CommandErrorKind::Stderr(_) => {
io::Error::new(io::ErrorKind::Other, cmderr)
}
}
}
}
/// Configures and builds a streaming reader for process output.
#[derive(Clone, Debug, Default)]
pub struct CommandReaderBuilder {
async_stderr: bool,
}
impl CommandReaderBuilder {
/// Create a new builder with the default configuration.
pub fn new() -> CommandReaderBuilder {
CommandReaderBuilder::default()
}
/// Build a new streaming reader for the given command's output.
///
/// The caller should set everything that's required on the given command
/// before building a reader, such as its arguments, environment and
/// current working directory. Settings such as the stdout and stderr (but
/// not stdin) pipes will be overridden so that they can be controlled by
/// the reader.
///
/// If there was a problem spawning the given command, then its error is
/// returned.
pub fn build(
&self,
command: &mut process::Command,
) -> Result<CommandReader, CommandError> {
let mut child = command
.stdout(process::Stdio::piped())
.stderr(process::Stdio::piped())
.spawn()?;
let stdout = child.stdout.take().unwrap();
let stderr =
if self.async_stderr {
StderrReader::async(child.stderr.take().unwrap())
} else {
StderrReader::sync(child.stderr.take().unwrap())
};
Ok(CommandReader {
child: child,
stdout: stdout,
stderr: stderr,
done: false,
})
}
/// When enabled, the reader will asynchronously read the contents of the
/// command's stderr output. When disabled, stderr is only read after the
/// stdout stream has been exhausted (or if the process quits with an error
/// code).
///
/// Note that when enabled, this may require launching an additional
/// thread in order to read stderr. This is done so that the process being
/// executed is never blocked from writing to stdout or stderr. If this is
/// disabled, then it is possible for the process to fill up the stderr
/// buffer and deadlock.
///
/// This is enabled by default.
pub fn async_stderr(&mut self, yes: bool) -> &mut CommandReaderBuilder {
self.async_stderr = yes;
self
}
}
/// A streaming reader for a command's output.
///
/// The purpose of this reader is to provide an easy way to execute processes
/// whose stdout is read in a streaming way while also making the processes'
/// stderr available when the process fails with an exit code. This makes it
/// possible to execute processes while surfacing the underlying failure mode
/// in the case of an error.
///
/// Moreover, by default, this reader will asynchronously read the processes'
/// stderr. This prevents subtle deadlocking bugs for noisy processes that
/// write a lot to stderr. Currently, the entire contents of stderr is read
/// on to the heap.
///
/// # Example
///
/// This example shows how to invoke `gzip` to decompress the contents of a
/// file. If the `gzip` command reports a failing exit status, then its stderr
/// is returned as an error.
///
/// ```no_run
/// use std::io::Read;
/// use std::process::Command;
/// use grep_cli::CommandReader;
///
/// # fn example() -> Result<(), Box<::std::error::Error>> {
/// let mut cmd = Command::new("gzip");
/// cmd.arg("-d").arg("-c").arg("/usr/share/man/man1/ls.1.gz");
///
/// let mut rdr = CommandReader::new(&mut cmd)?;
/// let mut contents = vec![];
/// rdr.read_to_end(&mut contents)?;
/// # Ok(()) }
/// ```
#[derive(Debug)]
pub struct CommandReader {
child: process::Child,
stdout: process::ChildStdout,
stderr: StderrReader,
done: bool,
}
impl CommandReader {
/// Create a new streaming reader for the given command using the default
/// configuration.
///
/// The caller should set everything that's required on the given command
/// before building a reader, such as its arguments, environment and
/// current working directory. Settings such as the stdout and stderr (but
/// not stdin) pipes will be overridden so that they can be controlled by
/// the reader.
///
/// If there was a problem spawning the given command, then its error is
/// returned.
///
/// If the caller requires additional configuration for the reader
/// returned, then use
/// [`CommandReaderBuilder`](struct.CommandReaderBuilder.html).
pub fn new(
cmd: &mut process::Command,
) -> Result<CommandReader, CommandError> {
CommandReaderBuilder::new().build(cmd)
}
}
impl io::Read for CommandReader {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
if self.done {
return Ok(0);
}
let nread = self.stdout.read(buf)?;
if nread == 0 {
self.done = true;
// Reap the child now that we're done reading. If the command
// failed, report stderr as an error.
if !self.child.wait()?.success() {
return Err(io::Error::from(self.stderr.read_to_end()));
}
}
Ok(nread)
}
}
/// A reader that encapsulates the asynchronous or synchronous reading of
/// stderr.
#[derive(Debug)]
enum StderrReader {
Async(Option<JoinHandle<CommandError>>),
Sync(process::ChildStderr),
}
impl StderrReader {
/// Create a reader for stderr that reads contents asynchronously.
fn async(mut stderr: process::ChildStderr) -> StderrReader {
let handle = thread::spawn(move || {
stderr_to_command_error(&mut stderr)
});
StderrReader::Async(Some(handle))
}
/// Create a reader for stderr that reads contents synchronously.
fn sync(stderr: process::ChildStderr) -> StderrReader {
StderrReader::Sync(stderr)
}
/// Consumes all of stderr on to the heap and returns it as an error.
///
/// If there was a problem reading stderr itself, then this returns an I/O
/// command error.
fn read_to_end(&mut self) -> CommandError {
match *self {
StderrReader::Async(ref mut handle) => {
let handle = handle
.take()
.expect("read_to_end cannot be called more than once");
handle
.join()
.expect("stderr reading thread does not panic")
}
StderrReader::Sync(ref mut stderr) => {
stderr_to_command_error(stderr)
}
}
}
}
fn stderr_to_command_error(stderr: &mut process::ChildStderr) -> CommandError {
let mut bytes = vec![];
match stderr.read_to_end(&mut bytes) {
Ok(_) => CommandError::stderr(bytes),
Err(err) => CommandError::io(err),
}
}

View File

@@ -1,133 +0,0 @@
use std::io;
use termcolor;
use is_tty_stdout;
/// A writer that supports coloring with either line or block buffering.
pub struct StandardStream(StandardStreamKind);
/// Returns a possibly buffered writer to stdout for the given color choice.
///
/// The writer returned is either line buffered or block buffered. The decision
/// between these two is made automatically based on whether a tty is attached
/// to stdout or not. If a tty is attached, then line buffering is used.
/// Otherwise, block buffering is used. In general, block buffering is more
/// efficient, but may increase the time it takes for the end user to see the
/// first bits of output.
///
/// If you need more fine grained control over the buffering mode, then use one
/// of `stdout_buffered_line` or `stdout_buffered_block`.
///
/// The color choice given is passed along to the underlying writer. To
/// completely disable colors in all cases, use `ColorChoice::Never`.
pub fn stdout(color_choice: termcolor::ColorChoice) -> StandardStream {
if is_tty_stdout() {
stdout_buffered_line(color_choice)
} else {
stdout_buffered_block(color_choice)
}
}
/// Returns a line buffered writer to stdout for the given color choice.
///
/// This writer is useful when printing results directly to a tty such that
/// users see output as soon as it's written. The downside of this approach
/// is that it can be slower, especially when there is a lot of output.
///
/// You might consider using
/// [`stdout`](fn.stdout.html)
/// instead, which chooses the buffering strategy automatically based on
/// whether stdout is connected to a tty.
pub fn stdout_buffered_line(
color_choice: termcolor::ColorChoice,
) -> StandardStream {
let out = termcolor::StandardStream::stdout(color_choice);
StandardStream(StandardStreamKind::LineBuffered(out))
}
/// Returns a block buffered writer to stdout for the given color choice.
///
/// This writer is useful when printing results to a file since it amortizes
/// the cost of writing data. The downside of this approach is that it can
/// increase the latency of display output when writing to a tty.
///
/// You might consider using
/// [`stdout`](fn.stdout.html)
/// instead, which chooses the buffering strategy automatically based on
/// whether stdout is connected to a tty.
pub fn stdout_buffered_block(
color_choice: termcolor::ColorChoice,
) -> StandardStream {
let out = termcolor::BufferedStandardStream::stdout(color_choice);
StandardStream(StandardStreamKind::BlockBuffered(out))
}
enum StandardStreamKind {
LineBuffered(termcolor::StandardStream),
BlockBuffered(termcolor::BufferedStandardStream),
}
impl io::Write for StandardStream {
#[inline]
fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
use self::StandardStreamKind::*;
match self.0 {
LineBuffered(ref mut w) => w.write(buf),
BlockBuffered(ref mut w) => w.write(buf),
}
}
#[inline]
fn flush(&mut self) -> io::Result<()> {
use self::StandardStreamKind::*;
match self.0 {
LineBuffered(ref mut w) => w.flush(),
BlockBuffered(ref mut w) => w.flush(),
}
}
}
impl termcolor::WriteColor for StandardStream {
#[inline]
fn supports_color(&self) -> bool {
use self::StandardStreamKind::*;
match self.0 {
LineBuffered(ref w) => w.supports_color(),
BlockBuffered(ref w) => w.supports_color(),
}
}
#[inline]
fn set_color(&mut self, spec: &termcolor::ColorSpec) -> io::Result<()> {
use self::StandardStreamKind::*;
match self.0 {
LineBuffered(ref mut w) => w.set_color(spec),
BlockBuffered(ref mut w) => w.set_color(spec),
}
}
#[inline]
fn reset(&mut self) -> io::Result<()> {
use self::StandardStreamKind::*;
match self.0 {
LineBuffered(ref mut w) => w.reset(),
BlockBuffered(ref mut w) => w.reset(),
}
}
#[inline]
fn is_synchronous(&self) -> bool {
use self::StandardStreamKind::*;
match self.0 {
LineBuffered(ref w) => w.is_synchronous(),
BlockBuffered(ref w) => w.is_synchronous(),
}
}
}

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-matcher"
version = "0.1.0" #:version
version = "0.0.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A trait for regular expressions, with a focus on line oriented search.

View File

@@ -1,36 +1,4 @@
grep-matcher
------------
This crate provides a low level interface for describing regular expression
matchers. The `grep` crate uses this interface in order to make the regex
engine it uses pluggable.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![](https://img.shields.io/crates/v/grep-matcher.svg)](https://crates.io/crates/grep-matcher)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
### Documentation
[https://docs.rs/grep-matcher](https://docs.rs/grep-matcher)
**NOTE:** You probably don't want to use this crate directly. Instead, you
should prefer the facade defined in the
[`grep`](https://docs.rs/grep)
crate.
### Usage
Add this to your `Cargo.toml`:
```toml
[dependencies]
grep-matcher = "0.1"
```
and this to your crate root:
```rust
extern crate grep_matcher;
```
grep
----
This is a *library* that provides grep-style line-by-line regex searching (with
comparable performance to `grep` itself).

View File

@@ -1,39 +1,5 @@
/*!
This crate provides an interface for regular expressions, with a focus on line
oriented search. The purpose of this crate is to provide a low level matching
interface that permits any kind of substring or regex implementation to power
the search routines provided by the
[`grep-searcher`](https://docs.rs/grep-searcher)
crate.
The primary thing provided by this crate is the
[`Matcher`](trait.Matcher.html)
trait. The trait defines an abstract interface for text search. It is robust
enough to support everything from basic substring search all the way to
arbitrarily complex regular expression implementations without sacrificing
performance.
A key design decision made in this crate is the use of *internal iteration*,
or otherwise known as the "push" model of searching. In this paradigm,
implementations of the `Matcher` trait will drive search and execute callbacks
provided by the caller when a match is found. This is in contrast to the
usual style of *external iteration* (the "pull" model) found throughout the
Rust ecosystem. There are two primary reasons why internal iteration was
chosen:
* Some search implementations may themselves require internal iteration.
Converting an internal iterator to an external iterator can be non-trivial
and sometimes even practically impossible.
* Rust's type system isn't quite expressive enough to write a generic interface
using external iteration without giving something else up (namely, ease of
use and/or performance).
In other words, internal iteration was chosen because it is the lowest common
denominator and because it is probably the least bad way of expressing the
interface in today's Rust. As a result, this trait isn't specifically intended
for everyday use, although, you might find it to be a happy price to pay if you
want to write code that is generic over multiple different regex
implementations.
An interface for regular expressions, with a focus on line oriented search.
*/
#![deny(missing_docs)]
@@ -220,7 +186,6 @@ enum LineTerminatorImp {
impl LineTerminator {
/// Return a new single-byte line terminator. Any byte is valid.
#[inline]
pub fn byte(byte: u8) -> LineTerminator {
LineTerminator(LineTerminatorImp::Byte([byte]))
}
@@ -229,13 +194,11 @@ impl LineTerminator {
///
/// When this option is used, consumers may generally treat a lone `\n` as
/// a line terminator in addition to `\r\n`.
#[inline]
pub fn crlf() -> LineTerminator {
LineTerminator(LineTerminatorImp::CRLF)
}
/// Returns true if and only if this line terminator is CRLF.
#[inline]
pub fn is_crlf(&self) -> bool {
self.0 == LineTerminatorImp::CRLF
}
@@ -245,7 +208,6 @@ impl LineTerminator {
/// If the line terminator is CRLF, then this returns `\n`. This is
/// useful for routines that, for example, find line boundaries by treating
/// `\n` as a line terminator even when it isn't preceded by `\r`.
#[inline]
pub fn as_byte(&self) -> u8 {
match self.0 {
LineTerminatorImp::Byte(array) => array[0],
@@ -259,7 +221,6 @@ impl LineTerminator {
/// `CRLF`, in which case, it returns `\r\n`.
///
/// The slice returned is guaranteed to have length at least `1`.
#[inline]
pub fn as_bytes(&self) -> &[u8] {
match self.0 {
LineTerminatorImp::Byte(ref array) => array,
@@ -269,7 +230,6 @@ impl LineTerminator {
}
impl Default for LineTerminator {
#[inline]
fn default() -> LineTerminator {
LineTerminator::byte(b'\n')
}
@@ -364,12 +324,12 @@ impl ByteSet {
///
/// Principally, this trait provides a way to access capturing groups
/// in a uniform way that does not require any specific representation.
/// Namely, different matcher implementations may require different in-memory
/// Namely, differ matcher implementations may require different in-memory
/// representations of capturing groups. This trait permits matchers to
/// maintain their specific in-memory representation.
///
/// Note that this trait explicitly does not provide a way to construct a new
/// capture value. Instead, it is the responsibility of a `Matcher` to build
/// captures value. Instead, it is the responsibility of a `Matcher` to build
/// one, which might require knowledge of the matcher's internal implementation
/// details.
pub trait Captures {
@@ -466,7 +426,7 @@ impl Captures for NoCaptures {
/// This error type implements the `std::error::Error` and `fmt::Display`
/// traits for use in matcher implementations that can never produce errors.
///
/// The `fmt::Debug` and `fmt::Display` impls for this type panics.
/// The `fmt::Display` impl for this type panics.
#[derive(Debug, Eq, PartialEq)]
pub struct NoError(());
@@ -503,20 +463,6 @@ pub enum LineMatchKind {
}
/// A matcher defines an interface for regular expression implementations.
///
/// While this trait is large, there are only two required methods that
/// implementors must provide: `find_at` and `new_captures`. If captures
/// aren't supported by your implementation, then `new_captures` can be
/// implemented with
/// [`NoCaptures`](struct.NoCaptures.html). If your implementation does support
/// capture groups, then you should also implement the other capture related
/// methods, as dictated by the documentation. Crucially, this includes
/// `captures_at`.
///
/// The rest of the methods on this trait provide default implementations on
/// top of `find_at` and `new_captures`. It is not uncommon for implementations
/// to be able to provide faster variants of some methods; in those cases,
/// simply override the default implementation.
pub trait Matcher {
/// The concrete type of capturing groups used for this matcher.
///

View File

@@ -1,17 +0,0 @@
[package]
name = "grep-pcre2"
version = "0.1.0" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Use PCRE2 with the 'grep' crate.
"""
documentation = "https://docs.rs/grep-pcre2"
homepage = "https://github.com/BurntSushi/ripgrep"
repository = "https://github.com/BurntSushi/ripgrep"
readme = "README.md"
keywords = ["regex", "grep", "pcre", "backreference", "look"]
license = "Unlicense/MIT"
[dependencies]
grep-matcher = { version = "0.1.0", path = "../grep-matcher" }
pcre2 = "0.1"

View File

@@ -1,21 +0,0 @@
The MIT License (MIT)
Copyright (c) 2015 Andrew Gallant
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

View File

@@ -1,39 +0,0 @@
grep-pcre2
----------
The `grep-pcre2` crate provides an implementation of the `Matcher` trait from
the `grep-matcher` crate. This implementation permits PCRE2 to be used in the
`grep` crate for fast line oriented searching.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![](https://img.shields.io/crates/v/grep-pcre2.svg)](https://crates.io/crates/grep-pcre2)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
### Documentation
[https://docs.rs/grep-pcre2](https://docs.rs/grep-pcre2)
**NOTE:** You probably don't want to use this crate directly. Instead, you
should prefer the facade defined in the
[`grep`](https://docs.rs/grep)
crate.
If you're looking to just use PCRE2 from Rust, then you probably want the
[`pcre2`](https://docs.rs/pcre2)
crate, which provide high level safe bindings to PCRE2.
### Usage
Add this to your `Cargo.toml`:
```toml
[dependencies]
grep-pcre2 = "0.1"
```
and this to your crate root:
```rust
extern crate grep_pcre2;
```

View File

@@ -1,24 +0,0 @@
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to <http://unlicense.org/>

View File

@@ -1,59 +0,0 @@
use std::error;
use std::fmt;
/// An error that can occur in this crate.
///
/// Generally, this error corresponds to problems building a regular
/// expression, whether it's in parsing, compilation or a problem with
/// guaranteeing a configured optimization.
#[derive(Clone, Debug)]
pub struct Error {
kind: ErrorKind,
}
impl Error {
pub(crate) fn regex<E: error::Error>(err: E) -> Error {
Error { kind: ErrorKind::Regex(err.to_string()) }
}
/// Return the kind of this error.
pub fn kind(&self) -> &ErrorKind {
&self.kind
}
}
/// The kind of an error that can occur.
#[derive(Clone, Debug)]
pub enum ErrorKind {
/// An error that occurred as a result of parsing a regular expression.
/// This can be a syntax error or an error that results from attempting to
/// compile a regular expression that is too big.
///
/// The string here is the underlying error converted to a string.
Regex(String),
/// Hints that destructuring should not be exhaustive.
///
/// This enum may grow additional variants, so this makes sure clients
/// don't count on exhaustive matching. (Otherwise, adding a new variant
/// could break existing code.)
#[doc(hidden)]
__Nonexhaustive,
}
impl error::Error for Error {
fn description(&self) -> &str {
match self.kind {
ErrorKind::Regex(_) => "regex error",
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self.kind {
ErrorKind::Regex(ref s) => write!(f, "{}", s),
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}

View File

@@ -1,15 +0,0 @@
/*!
An implementation of `grep-matcher`'s `Matcher` trait for
[PCRE2](https://www.pcre.org/).
*/
#![deny(missing_docs)]
extern crate grep_matcher;
extern crate pcre2;
pub use error::{Error, ErrorKind};
pub use matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
mod error;
mod matcher;

View File

@@ -1,425 +0,0 @@
use std::collections::HashMap;
use grep_matcher::{Captures, Match, Matcher};
use pcre2::bytes::{CaptureLocations, Regex, RegexBuilder};
use error::Error;
/// A builder for configuring the compilation of a PCRE2 regex.
#[derive(Clone, Debug)]
pub struct RegexMatcherBuilder {
builder: RegexBuilder,
case_smart: bool,
word: bool,
}
impl RegexMatcherBuilder {
/// Create a new matcher builder with a default configuration.
pub fn new() -> RegexMatcherBuilder {
RegexMatcherBuilder {
builder: RegexBuilder::new(),
case_smart: false,
word: false,
}
}
/// Compile the given pattern into a PCRE matcher using the current
/// configuration.
///
/// If there was a problem compiling the pattern, then an error is
/// returned.
pub fn build(&self, pattern: &str) -> Result<RegexMatcher, Error> {
let mut builder = self.builder.clone();
if self.case_smart && !has_uppercase_literal(pattern) {
builder.caseless(true);
}
let res =
if self.word {
let pattern = format!(r"(?<!\w)(?:{})(?!\w)", pattern);
builder.build(&pattern)
} else {
builder.build(pattern)
};
res.map_err(Error::regex).map(|regex| {
let mut names = HashMap::new();
for (i, name) in regex.capture_names().iter().enumerate() {
if let Some(ref name) = *name {
names.insert(name.to_string(), i);
}
}
RegexMatcher { regex, names }
})
}
/// Enables case insensitive matching.
///
/// If the `utf` option is also set, then Unicode case folding is used
/// to determine case insensitivity. When the `utf` option is not set,
/// then only standard ASCII case insensitivity is considered.
///
/// This option corresponds to the `i` flag.
pub fn caseless(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.builder.caseless(yes);
self
}
/// Whether to enable "smart case" or not.
///
/// When smart case is enabled, the builder will automatically enable
/// case insensitive matching based on how the pattern is written. Namely,
/// case insensitive mode is enabled when both of the following things
/// are believed to be true:
///
/// 1. The pattern contains at least one literal character. For example,
/// `a\w` contains a literal (`a`) but `\w` does not.
/// 2. Of the literals in the pattern, none of them are considered to be
/// uppercase according to Unicode. For example, `foo\pL` has no
/// uppercase literals but `Foo\pL` does.
///
/// Note that the implementation of this is not perfect. Namely, `\p{Ll}`
/// will prevent case insensitive matching even though it is part of a meta
/// sequence. This bug will probably never be fixed.
pub fn case_smart(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.case_smart = yes;
self
}
/// Enables "dot all" matching.
///
/// When enabled, the `.` metacharacter in the pattern matches any
/// character, include `\n`. When disabled (the default), `.` will match
/// any character except for `\n`.
///
/// This option corresponds to the `s` flag.
pub fn dotall(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.builder.dotall(yes);
self
}
/// Enable "extended" mode in the pattern, where whitespace is ignored.
///
/// This option corresponds to the `x` flag.
pub fn extended(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.builder.extended(yes);
self
}
/// Enable multiline matching mode.
///
/// When enabled, the `^` and `$` anchors will match both at the beginning
/// and end of a subject string, in addition to matching at the start of
/// a line and the end of a line. When disabled, the `^` and `$` anchors
/// will only match at the beginning and end of a subject string.
///
/// This option corresponds to the `m` flag.
pub fn multi_line(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.builder.multi_line(yes);
self
}
/// Enable matching of CRLF as a line terminator.
///
/// When enabled, anchors such as `^` and `$` will match any of the
/// following as a line terminator: `\r`, `\n` or `\r\n`.
///
/// This is disabled by default, in which case, only `\n` is recognized as
/// a line terminator.
pub fn crlf(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.builder.crlf(yes);
self
}
/// Require that all matches occur on word boundaries.
///
/// Enabling this option is subtly different than putting `\b` assertions
/// on both sides of your pattern. In particular, a `\b` assertion requires
/// that one side of it match a word character while the other match a
/// non-word character. This option, in contrast, merely requires that
/// one side match a non-word character.
///
/// For example, `\b-2\b` will not match `foo -2 bar` since `-` is not a
/// word character. However, `-2` with this `word` option enabled will
/// match the `-2` in `foo -2 bar`.
pub fn word(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.word = yes;
self
}
/// Enable Unicode matching mode.
///
/// When enabled, the following patterns become Unicode aware: `\b`, `\B`,
/// `\d`, `\D`, `\s`, `\S`, `\w`, `\W`.
///
/// When set, this implies UTF matching mode. It is not possible to enable
/// Unicode matching mode without enabling UTF matching mode.
///
/// This is disabled by default.
pub fn ucp(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.builder.ucp(yes);
self
}
/// Enable UTF matching mode.
///
/// When enabled, characters are treated as sequences of code units that
/// make up a single codepoint instead of as single bytes. For example,
/// this will cause `.` to match any single UTF-8 encoded codepoint, where
/// as when this is disabled, `.` will any single byte (except for `\n` in
/// both cases, unless "dot all" mode is enabled).
///
/// Note that when UTF matching mode is enabled, every search performed
/// will do a UTF-8 validation check, which can impact performance. The
/// UTF-8 check can be disabled via the `disable_utf_check` option, but it
/// is undefined behavior to enable UTF matching mode and search invalid
/// UTF-8.
///
/// This is disabled by default.
pub fn utf(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.builder.utf(yes);
self
}
/// When UTF matching mode is enabled, this will disable the UTF checking
/// that PCRE2 will normally perform automatically. If UTF matching mode
/// is not enabled, then this has no effect.
///
/// UTF checking is enabled by default when UTF matching mode is enabled.
/// If UTF matching mode is enabled and UTF checking is enabled, then PCRE2
/// will return an error if you attempt to search a subject string that is
/// not valid UTF-8.
///
/// # Safety
///
/// It is undefined behavior to disable the UTF check in UTF matching mode
/// and search a subject string that is not valid UTF-8. When the UTF check
/// is disabled, callers must guarantee that the subject string is valid
/// UTF-8.
pub unsafe fn disable_utf_check(&mut self) -> &mut RegexMatcherBuilder {
self.builder.disable_utf_check();
self
}
/// Enable PCRE2's JIT.
///
/// This generally speeds up matching quite a bit. The downside is that it
/// can increase the time it takes to compile a pattern.
///
/// This is disabled by default.
pub fn jit(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.builder.jit(yes);
self
}
}
/// An implementation of the `Matcher` trait using PCRE2.
#[derive(Clone, Debug)]
pub struct RegexMatcher {
regex: Regex,
names: HashMap<String, usize>,
}
impl RegexMatcher {
/// Create a new matcher from the given pattern using the default
/// configuration.
pub fn new(pattern: &str) -> Result<RegexMatcher, Error> {
RegexMatcherBuilder::new().build(pattern)
}
}
impl Matcher for RegexMatcher {
type Captures = RegexCaptures;
type Error = Error;
fn find_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, Error> {
Ok(self.regex
.find_at(haystack, at)
.map_err(Error::regex)?
.map(|m| Match::new(m.start(), m.end())))
}
fn new_captures(&self) -> Result<RegexCaptures, Error> {
Ok(RegexCaptures::new(self.regex.capture_locations()))
}
fn capture_count(&self) -> usize {
self.regex.captures_len()
}
fn capture_index(&self, name: &str) -> Option<usize> {
self.names.get(name).map(|i| *i)
}
fn try_find_iter<F, E>(
&self,
haystack: &[u8],
mut matched: F,
) -> Result<Result<(), E>, Error>
where F: FnMut(Match) -> Result<bool, E>
{
for result in self.regex.find_iter(haystack) {
let m = result.map_err(Error::regex)?;
match matched(Match::new(m.start(), m.end())) {
Ok(true) => continue,
Ok(false) => return Ok(Ok(())),
Err(err) => return Ok(Err(err)),
}
}
Ok(Ok(()))
}
fn captures_at(
&self,
haystack: &[u8],
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool, Error> {
Ok(self.regex
.captures_read_at(&mut caps.locs, haystack, at)
.map_err(Error::regex)?
.is_some())
}
}
/// Represents the match offsets of each capturing group in a match.
///
/// The first, or `0`th capture group, always corresponds to the entire match
/// and is guaranteed to be present when a match occurs. The next capture
/// group, at index `1`, corresponds to the first capturing group in the regex,
/// ordered by the position at which the left opening parenthesis occurs.
///
/// Note that not all capturing groups are guaranteed to be present in a match.
/// For example, in the regex, `(?P<foo>\w)|(?P<bar>\W)`, only one of `foo`
/// or `bar` will ever be set in any given match.
///
/// In order to access a capture group by name, you'll need to first find the
/// index of the group using the corresponding matcher's `capture_index`
/// method, and then use that index with `RegexCaptures::get`.
#[derive(Clone, Debug)]
pub struct RegexCaptures {
/// Where the locations are stored.
locs: CaptureLocations,
}
impl Captures for RegexCaptures {
fn len(&self) -> usize {
self.locs.len()
}
fn get(&self, i: usize) -> Option<Match> {
self.locs.get(i).map(|(s, e)| Match::new(s, e))
}
}
impl RegexCaptures {
pub(crate) fn new(locs: CaptureLocations) -> RegexCaptures {
RegexCaptures { locs }
}
}
/// Determine whether the pattern contains an uppercase character which should
/// negate the effect of the smart-case option.
///
/// Ideally we would be able to check the AST in order to correctly handle
/// things like '\p{Ll}' and '\p{Lu}' (which should be treated as explicitly
/// cased), but PCRE doesn't expose enough details for that kind of analysis.
/// For now, our 'good enough' solution is to simply perform a semi-naïve
/// scan of the input pattern and ignore all characters following a '\'. The
/// This at least lets us support the most common cases, like 'foo\w' and
/// 'foo\S', in an intuitive manner.
fn has_uppercase_literal(pattern: &str) -> bool {
let mut chars = pattern.chars();
while let Some(c) = chars.next() {
if c == '\\' {
chars.next();
} else if c.is_uppercase() {
return true;
}
}
false
}
#[cfg(test)]
mod tests {
use grep_matcher::{LineMatchKind, Matcher};
use super::*;
// Test that enabling word matches does the right thing and demonstrate
// the difference between it and surrounding the regex in `\b`.
#[test]
fn word() {
let matcher = RegexMatcherBuilder::new()
.word(true)
.build(r"-2")
.unwrap();
assert!(matcher.is_match(b"abc -2 foo").unwrap());
let matcher = RegexMatcherBuilder::new()
.word(false)
.build(r"\b-2\b")
.unwrap();
assert!(!matcher.is_match(b"abc -2 foo").unwrap());
}
// Test that enabling CRLF permits `$` to match at the end of a line.
#[test]
fn line_terminator_crlf() {
// Test normal use of `$` with a `\n` line terminator.
let matcher = RegexMatcherBuilder::new()
.multi_line(true)
.build(r"abc$")
.unwrap();
assert!(matcher.is_match(b"abc\n").unwrap());
// Test that `$` doesn't match at `\r\n` boundary normally.
let matcher = RegexMatcherBuilder::new()
.multi_line(true)
.build(r"abc$")
.unwrap();
assert!(!matcher.is_match(b"abc\r\n").unwrap());
// Now check the CRLF handling.
let matcher = RegexMatcherBuilder::new()
.multi_line(true)
.crlf(true)
.build(r"abc$")
.unwrap();
assert!(matcher.is_match(b"abc\r\n").unwrap());
}
// Test that smart case works.
#[test]
fn case_smart() {
let matcher = RegexMatcherBuilder::new()
.case_smart(true)
.build(r"abc")
.unwrap();
assert!(matcher.is_match(b"ABC").unwrap());
let matcher = RegexMatcherBuilder::new()
.case_smart(true)
.build(r"aBc")
.unwrap();
assert!(!matcher.is_match(b"ABC").unwrap());
}
// Test that finding candidate lines works as expected.
#[test]
fn candidate_lines() {
fn is_confirmed(m: LineMatchKind) -> bool {
match m {
LineMatchKind::Confirmed(_) => true,
_ => false,
}
}
let matcher = RegexMatcherBuilder::new()
.build(r"\wfoo\s")
.unwrap();
let m = matcher.find_candidate_line(b"afoo ").unwrap().unwrap();
assert!(is_confirmed(m));
}
}

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-printer"
version = "0.1.0" #:version
version = "0.0.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
An implementation of the grep crate's Sink trait that provides standard
@@ -19,12 +19,13 @@ serde1 = ["base64", "serde", "serde_derive", "serde_json"]
[dependencies]
base64 = { version = "0.9", optional = true }
grep-matcher = { version = "0.1.0", path = "../grep-matcher" }
grep-searcher = { version = "0.1.0", path = "../grep-searcher" }
grep-matcher = { version = "0.0.1", path = "../grep-matcher" }
grep-searcher = { version = "0.0.1", path = "../grep-searcher" }
log = "0.4"
termcolor = "1"
serde = { version = "1", optional = true }
serde_derive = { version = "1", optional = true }
serde_json = { version = "1", optional = true }
[dev-dependencies]
grep-regex = { version = "0.1.0", path = "../grep-regex" }
grep-regex = { version = "0.0.1", path = "../grep-regex" }

View File

@@ -1,35 +1,4 @@
grep-printer
------------
Print results from line oriented searching in a human readable, aggregate or
JSON Lines format.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![](https://img.shields.io/crates/v/grep-printer.svg)](https://crates.io/crates/grep-printer)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
### Documentation
[https://docs.rs/grep-printer](https://docs.rs/grep-printer)
**NOTE:** You probably don't want to use this crate directly. Instead, you
should prefer the facade defined in the
[`grep`](https://docs.rs/grep)
crate.
### Usage
Add this to your `Cargo.toml`:
```toml
[dependencies]
grep-printer = "0.1"
```
and this to your crate root:
```rust
extern crate grep_printer;
```
grep
----
This is a *library* that provides grep-style line-by-line regex searching (with
comparable performance to `grep` itself).

View File

@@ -4,25 +4,6 @@ use std::str::FromStr;
use termcolor::{Color, ColorSpec, ParseColorError};
/// Returns a default set of color specifications.
///
/// This may change over time, but the color choices are meant to be fairly
/// conservative that work across terminal themes.
///
/// Additional color specifications can be added to the list returned. More
/// recently added specifications override previously added specifications.
pub fn default_color_specs() -> Vec<UserColorSpec> {
vec![
#[cfg(unix)]
"path:fg:magenta".parse().unwrap(),
#[cfg(windows)]
"path:fg:cyan".parse().unwrap(),
"line:fg:green".parse().unwrap(),
"match:fg:red".parse().unwrap(),
"match:style:bold".parse().unwrap(),
]
}
/// An error that can occur when parsing color specifications.
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum ColorError {
@@ -246,15 +227,6 @@ impl ColorSpecs {
merged
}
/// Create a default set of specifications that have color.
///
/// This is distinct from `ColorSpecs`'s `Default` implementation in that
/// this provides a set of default color choices, where as the `Default`
/// implementation provides no color choices.
pub fn default_with_color() -> ColorSpecs {
ColorSpecs::new(&default_color_specs())
}
/// Return the color specification for coloring file paths.
pub fn path(&self) -> &ColorSpec {
&self.path

View File

@@ -91,7 +91,7 @@ impl JSONBuilder {
/// When enabled, the `begin` and `end` messages are always emitted, even
/// when no match is found.
///
/// When disabled, the `begin` and `end` messages are only shown if there
/// When disabled, the `begin` and `end` messages are only shown is there
/// is at least one `match` or `context` message.
///
/// This is disabled by default.
@@ -108,7 +108,7 @@ impl JSONBuilder {
///
/// # Format
///
/// This section describes the JSON format used by this printer.
/// This section describe the JSON format used by this printer.
///
/// To skip the rigamarole, take a look at the
/// [example](#example)
@@ -619,13 +619,6 @@ impl<'p, 's, M: Matcher, W: io::Write> JSONSink<'p, 's, M, W> {
matches.push(m);
true
}).map_err(io::Error::error_message)?;
// Don't report empty matches appearing at the end of the bytes.
if !matches.is_empty()
&& matches.last().unwrap().is_empty()
&& matches.last().unwrap().start() >= bytes.len()
{
matches.pop().unwrap();
}
Ok(())
}

View File

@@ -1,7 +1,7 @@
/*!
This crate provides featureful and fast printers that interoperate with the
[`grep-searcher`](https://docs.rs/grep-searcher)
crate.
This crate provides a featureful and fast printer for showing search results
in a human readable way, and another printer for showing results in a machine
readable way.
# Brief overview
@@ -74,6 +74,8 @@ extern crate grep_matcher;
#[cfg(test)]
extern crate grep_regex;
extern crate grep_searcher;
#[macro_use]
extern crate log;
#[cfg(feature = "serde1")]
extern crate serde;
#[cfg(feature = "serde1")]
@@ -83,7 +85,7 @@ extern crate serde_derive;
extern crate serde_json;
extern crate termcolor;
pub use color::{ColorError, ColorSpecs, UserColorSpec, default_color_specs};
pub use color::{ColorError, ColorSpecs, UserColorSpec};
#[cfg(feature = "serde1")]
pub use json::{JSON, JSONBuilder, JSONSink};
pub use standard::{Standard, StandardBuilder, StandardSink};

View File

@@ -1,4 +1,4 @@
use std::cell::{Cell, RefCell};
use std::cell::RefCell;
use std::cmp;
use std::io::{self, Write};
use std::path::Path;
@@ -139,8 +139,6 @@ impl StandardBuilder {
/// A [`UserColorSpec`](struct.UserColorSpec.html) can be constructed from
/// a string in accordance with the color specification format. See the
/// `UserColorSpec` type documentation for more details on the format.
/// A [`ColorSpecs`](struct.ColorSpecs.html) can then be generated from
/// zero or more `UserColorSpec`s.
///
/// Regardless of the color specifications provided here, whether color
/// is actually used or not is determined by the implementation of
@@ -477,7 +475,6 @@ impl<W: WriteColor> Standard<W> {
} else {
None
};
let needs_match_granularity = self.needs_match_granularity();
StandardSink {
matcher: matcher,
standard: self,
@@ -488,7 +485,6 @@ impl<W: WriteColor> Standard<W> {
after_context_remaining: 0,
binary_byte_offset: None,
stats: stats,
needs_match_granularity: needs_match_granularity,
}
}
@@ -515,7 +511,6 @@ impl<W: WriteColor> Standard<W> {
};
let ppath = PrinterPath::with_separator(
path.as_ref(), self.config.separator_path);
let needs_match_granularity = self.needs_match_granularity();
StandardSink {
matcher: matcher,
standard: self,
@@ -526,32 +521,8 @@ impl<W: WriteColor> Standard<W> {
after_context_remaining: 0,
binary_byte_offset: None,
stats: stats,
needs_match_granularity: needs_match_granularity,
}
}
/// Returns true if and only if the configuration of the printer requires
/// us to find each individual match in the lines reported by the searcher.
///
/// We care about this distinction because finding each individual match
/// costs more, so we only do it when we need to.
fn needs_match_granularity(&self) -> bool {
let supports_color = self.wtr.borrow().supports_color();
let match_colored = !self.config.colors.matched().is_none();
// Coloring requires identifying each individual match.
(supports_color && match_colored)
// The column feature requires finding the position of the first match.
|| self.config.column
// Requires finding each match for performing replacement.
|| self.config.replacement.is_some()
// Emitting a line for each match requires finding each match.
|| self.config.per_match
// Emitting only the match requires finding each match.
|| self.config.only_matching
// Computing certain statistics requires finding each match.
|| self.config.stats
}
}
impl<W> Standard<W> {
@@ -610,7 +581,6 @@ pub struct StandardSink<'p, 's, M: Matcher, W: 's> {
after_context_remaining: u64,
binary_byte_offset: Option<u64>,
stats: Option<Stats>,
needs_match_granularity: bool,
}
impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
@@ -662,7 +632,7 @@ impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
/// locations if the current configuration demands match granularity.
fn record_matches(&mut self, bytes: &[u8]) -> io::Result<()> {
self.standard.matches.clear();
if !self.needs_match_granularity {
if !self.needs_match_granularity() {
return Ok(());
}
// If printing requires knowing the location of each individual match,
@@ -677,13 +647,6 @@ impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
matches.push(m);
true
}).map_err(io::Error::error_message)?;
// Don't report empty matches appearing at the end of the bytes.
if !matches.is_empty()
&& matches.last().unwrap().is_empty()
&& matches.last().unwrap().start() >= bytes.len()
{
matches.pop().unwrap();
}
Ok(())
}
@@ -707,6 +670,29 @@ impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
Ok(())
}
/// Returns true if and only if the configuration of the printer requires
/// us to find each individual match in the lines reported by the searcher.
///
/// We care about this distinction because finding each individual match
/// costs more, so we only do it when we need to.
fn needs_match_granularity(&self) -> bool {
let supports_color = self.standard.wtr.borrow().supports_color();
let match_colored = !self.standard.config.colors.matched().is_none();
// Coloring requires identifying each individual match.
(supports_color && match_colored)
// The column feature requires finding the position of the first match.
|| self.standard.config.column
// Requires finding each match for performing replacement.
|| self.standard.config.replacement.is_some()
// Emitting a line for each match requires finding each match.
|| self.standard.config.per_match
// Emitting only the match requires finding each match.
|| self.standard.config.only_matching
// Computing certain statistics requires finding each match.
|| self.standard.config.stats
}
/// Returns true if this printer should quit.
///
/// This implements the logic for handling quitting after seeing a certain
@@ -819,8 +805,6 @@ struct StandardImpl<'a, M: 'a + Matcher, W: 'a> {
searcher: &'a Searcher,
sink: &'a StandardSink<'a, 'a, M, W>,
sunk: Sunk<'a>,
/// Set to true if and only if we are writing a match with color.
in_color_match: Cell<bool>,
}
impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
@@ -833,7 +817,6 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
searcher: searcher,
sink: sink,
sunk: Sunk::empty(),
in_color_match: Cell::new(false),
}
}
@@ -877,14 +860,38 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
self.write_search_prelude()?;
if self.sunk.matches().is_empty() {
if self.multi_line() && !self.is_context() {
trace!(
"{:?}:{:?}:{}: sinking via sink_fast_multi_line",
self.sink.path,
self.sunk.line_number(),
self.sunk.absolute_byte_offset()
);
self.sink_fast_multi_line()
} else {
trace!(
"{:?}:{:?}:{}: sinking via sink_fast",
self.sink.path,
self.sunk.line_number(),
self.sunk.absolute_byte_offset()
);
self.sink_fast()
}
} else {
if self.multi_line() && !self.is_context() {
trace!(
"{:?}:{:?}:{}: sinking via sink_slow_multi_line",
self.sink.path,
self.sunk.line_number(),
self.sunk.absolute_byte_offset()
);
self.sink_slow_multi_line()
} else {
trace!(
"{:?}:{:?}:{}: sinking via sink_slow",
self.sink.path,
self.sunk.line_number(),
self.sunk.absolute_byte_offset()
);
self.sink_slow()
}
}
@@ -985,6 +992,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
}
let line_term = self.searcher.line_terminator().as_byte();
let spec = self.config().colors.matched();
let bytes = self.sunk.bytes();
let matches = self.sunk.matches();
let mut midx = 0;
@@ -1006,7 +1014,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
line = line.with_end(line.end() - 1);
}
if self.config().trim_ascii {
line = self.trim_ascii_prefix_range(bytes, line);
line = trim_ascii_prefix_range(bytes, line);
}
while !line.is_empty() {
@@ -1015,7 +1023,6 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
midx += 1;
continue;
} else {
self.end_color_match()?;
self.write(&bytes[line])?;
break;
}
@@ -1024,17 +1031,14 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
if line.start() < m.start() {
let upto = cmp::min(line.end(), m.start());
self.end_color_match()?;
self.write(&bytes[line.with_end(upto)])?;
line = line.with_start(upto);
} else {
let upto = cmp::min(line.end(), m.end());
self.start_color_match()?;
self.write(&bytes[line.with_end(upto)])?;
self.write_spec(spec, &bytes[line.with_end(upto)])?;
line = line.with_start(upto);
}
}
self.end_color_match()?;
self.write_line_term()?;
}
Ok(())
@@ -1054,7 +1058,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
line = line.with_end(line.end() - 1);
}
if self.config().trim_ascii {
line = self.trim_ascii_prefix_range(bytes, line);
line = trim_ascii_prefix_range(bytes, line);
}
while !line.is_empty() {
if matches[midx].end() <= line.start() {
@@ -1123,7 +1127,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
line = line.with_end(line.end() - 1);
}
if self.config().trim_ascii {
line = self.trim_ascii_prefix_range(bytes, line);
line = trim_ascii_prefix_range(bytes, line);
}
while !line.is_empty() {
@@ -1149,7 +1153,6 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
/// Write the beginning part of a matching line. This (may) include things
/// like the file path, line number among others, depending on the
/// configuration and the parameters given.
#[inline(always)]
fn write_prelude(
&self,
absolute_byte_offset: u64,
@@ -1175,7 +1178,6 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
Ok(())
}
#[inline(always)]
fn write_line(
&self,
line: &[u8],
@@ -1201,35 +1203,32 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
if !self.wtr().borrow().supports_color() || spec.is_none() {
return self.write_line(line);
}
if self.exceeds_max_columns(line) {
return self.write_exceeded_line();
}
let mut last_written =
if !self.config().trim_ascii {
0
} else {
self.trim_ascii_prefix_range(
trim_ascii_prefix_range(
line,
Match::new(0, line.len()),
).start()
};
for mut m in matches.iter().map(|&m| m) {
if last_written < m.start() {
self.end_color_match()?;
if last_written <= m.start() {
self.write(&line[last_written..m.start()])?;
} else if last_written < m.end() {
m = m.with_start(last_written);
} else {
continue;
}
if !m.is_empty() {
self.start_color_match()?;
self.write(&line[m])?;
}
last_written = m.end();
// This conditional checks if the match is both empty *and*
// past the end of the line. In this case, we never want to
// emit an additional color escape.
if m.start() != m.end() || m.end() != line.len() {
self.write_spec(spec, &line[m])?;
}
}
self.end_color_match()?;
self.write(&line[last_written..])?;
if !self.has_line_terminator(line) {
self.write_line_term()?;
@@ -1366,29 +1365,11 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
Ok(())
}
fn start_color_match(&self) -> io::Result<()> {
if self.in_color_match.get() {
return Ok(());
}
self.wtr().borrow_mut().set_color(self.config().colors.matched())?;
self.in_color_match.set(true);
Ok(())
}
fn end_color_match(&self) -> io::Result<()> {
if !self.in_color_match.get() {
return Ok(());
}
self.wtr().borrow_mut().reset()?;
self.in_color_match.set(false);
Ok(())
}
fn write_trim(&self, buf: &[u8]) -> io::Result<()> {
if !self.config().trim_ascii {
return self.write(buf);
}
self.write(self.trim_ascii_prefix(buf))
self.write(trim_ascii_prefix(buf))
}
fn write(&self, buf: &[u8]) -> io::Result<()> {
@@ -1444,21 +1425,6 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
fn multi_line(&self) -> bool {
self.searcher.multi_line_with_matcher(&self.sink.matcher)
}
/// Trim prefix ASCII spaces from the given slice and return the
/// corresponding range.
///
/// This stops trimming a prefix as soon as it sees non-whitespace or a
/// line terminator.
fn trim_ascii_prefix_range(&self, slice: &[u8], range: Match) -> Match {
trim_ascii_prefix_range(self.searcher.line_terminator(), slice, range)
}
/// Trim prefix ASCII spaces from the given slice and return the
/// corresponding sub-slice.
fn trim_ascii_prefix<'s>(&self, slice: &'s [u8]) -> &'s [u8] {
trim_ascii_prefix(self.searcher.line_terminator(), slice)
}
}
#[cfg(test)]
@@ -2021,31 +1987,6 @@ Watson
assert_eq_printed!(expected, got);
}
#[test]
fn trim_ascii_with_line_term() {
let matcher = RegexMatcher::new("Watson").unwrap();
let mut printer = StandardBuilder::new()
.trim_ascii(true)
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.line_number(true)
.before_context(1)
.build()
.search_reader(
&matcher,
"\n Watson".as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "\
1-
2:Watson
";
assert_eq_printed!(expected, got);
}
#[test]
fn line_number() {
let matcher = RegexMatcher::new("Watson").unwrap();

View File

@@ -190,8 +190,6 @@ impl SummaryBuilder {
/// A [`UserColorSpec`](struct.UserColorSpec.html) can be constructed from
/// a string in accordance with the color specification format. See the
/// `UserColorSpec` type documentation for more details on the format.
/// A [`ColorSpecs`](struct.ColorSpecs.html) can then be generated from
/// zero or more `UserColorSpec`s.
///
/// Regardless of the color specifications provided here, whether color
/// is actually used or not is determined by the implementation of

View File

@@ -4,7 +4,7 @@ use std::io;
use std::path::Path;
use std::time;
use grep_matcher::{Captures, LineTerminator, Match, Matcher};
use grep_matcher::{Captures, Match, Matcher};
use grep_searcher::{
LineIter,
SinkError, SinkContext, SinkContextKind, SinkMatch,
@@ -157,7 +157,6 @@ pub struct Sunk<'a> {
}
impl<'a> Sunk<'a> {
#[inline]
pub fn empty() -> Sunk<'static> {
Sunk {
bytes: &[],
@@ -169,7 +168,6 @@ impl<'a> Sunk<'a> {
}
}
#[inline]
pub fn from_sink_match(
sunk: &'a SinkMatch<'a>,
original_matches: &'a [Match],
@@ -188,7 +186,6 @@ impl<'a> Sunk<'a> {
}
}
#[inline]
pub fn from_sink_context(
sunk: &'a SinkContext<'a>,
original_matches: &'a [Match],
@@ -207,37 +204,30 @@ impl<'a> Sunk<'a> {
}
}
#[inline]
pub fn context_kind(&self) -> Option<&'a SinkContextKind> {
self.context_kind
}
#[inline]
pub fn bytes(&self) -> &'a [u8] {
self.bytes
}
#[inline]
pub fn matches(&self) -> &'a [Match] {
self.matches
}
#[inline]
pub fn original_matches(&self) -> &'a [Match] {
self.original_matches
}
#[inline]
pub fn lines(&self, line_term: u8) -> LineIter<'a> {
LineIter::new(line_term, self.bytes())
}
#[inline]
pub fn absolute_byte_offset(&self) -> u64 {
self.absolute_byte_offset
}
#[inline]
pub fn line_number(&self) -> Option<u64> {
self.line_number
}
@@ -327,7 +317,7 @@ pub struct NiceDuration(pub time::Duration);
impl fmt::Display for NiceDuration {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{:0.6}s", self.fractional_seconds())
write!(f, "{:0.4}s", self.fractional_seconds())
}
}
@@ -356,37 +346,21 @@ impl Serialize for NiceDuration {
/// Trim prefix ASCII spaces from the given slice and return the corresponding
/// range.
///
/// This stops trimming a prefix as soon as it sees non-whitespace or a line
/// terminator.
pub fn trim_ascii_prefix_range(
line_term: LineTerminator,
slice: &[u8],
range: Match,
) -> Match {
fn is_space(b: u8) -> bool {
match b {
pub fn trim_ascii_prefix_range(slice: &[u8], range: Match) -> Match {
fn is_space(b: &&u8) -> bool {
match **b {
b'\t' | b'\n' | b'\x0B' | b'\x0C' | b'\r' | b' ' => true,
_ => false,
}
}
let count = slice[range]
.iter()
.take_while(|&&b| -> bool {
is_space(b) && !line_term.as_bytes().contains(&b)
})
.count();
let count = slice[range].iter().take_while(is_space).count();
range.with_start(range.start() + count)
}
/// Trim prefix ASCII spaces from the given slice and return the corresponding
/// sub-slice.
pub fn trim_ascii_prefix(line_term: LineTerminator, slice: &[u8]) -> &[u8] {
let range = trim_ascii_prefix_range(
line_term,
slice,
Match::new(0, slice.len()),
);
pub fn trim_ascii_prefix(slice: &[u8]) -> &[u8] {
let range = trim_ascii_prefix_range(slice, Match::new(0, slice.len()));
&slice[range]
}

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-regex"
version = "0.1.0" #:version
version = "0.0.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Use Rust's regex library with the 'grep' crate.
@@ -14,8 +14,8 @@ license = "Unlicense/MIT"
[dependencies]
log = "0.4"
grep-matcher = { version = "0.1.0", path = "../grep-matcher" }
grep-matcher = { version = "0.0.1", path = "../grep-matcher" }
regex = "1"
regex-syntax = "0.6"
thread_local = "0.3.6"
thread_local = "0.3.5"
utf8-ranges = "1"

View File

@@ -1,35 +1,4 @@
grep-regex
----------
The `grep-regex` crate provides an implementation of the `Matcher` trait from
the `grep-matcher` crate. This implementation permits Rust's regex engine to
be used in the `grep` crate for fast line oriented searching.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![](https://img.shields.io/crates/v/grep-regex.svg)](https://crates.io/crates/grep-regex)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
### Documentation
[https://docs.rs/grep-regex](https://docs.rs/grep-regex)
**NOTE:** You probably don't want to use this crate directly. Instead, you
should prefer the facade defined in the
[`grep`](https://docs.rs/grep)
crate.
### Usage
Add this to your `Cargo.toml`:
```toml
[dependencies]
grep-regex = "0.1"
```
and this to your crate root:
```rust
extern crate grep_regex;
```
grep
----
This is a *library* that provides grep-style line-by-line regex searching (with
comparable performance to `grep` itself).

View File

@@ -201,6 +201,21 @@ impl ConfiguredHIR {
}
match LiteralSets::new(&self.expr).one_regex() {
None => Ok(None),
/*
if !self.config.crlf {
return Ok(None);
}
// If we're trying to support CRLF, then our "fast" line
// oriented regex needs `$` to be able to match at a `\r\n`
// boundary. The regex engine doesn't support this, so we
// "fake" it by replacing `$` with `(?:\r?$)`. Since the
// fast line regex is only used to detect lines, this never
// infects match offsets. Namely, the regex generated via
// `self.expr` is matched against lines with line terminators
// stripped.
let pattern = crlfify(self.expr.clone()).to_string();
self.pattern_to_regex(&pattern).map(Some)
*/
Some(pattern) => self.pattern_to_regex(&pattern).map(Some),
}
}

View File

@@ -263,7 +263,7 @@ impl RegexMatcherBuilder {
/// be slightly different than what one would expect given the pattern.
/// This is the trade off made: in many cases, `$` will "just work" in the
/// presence of `\r\n` line terminators, but matches may require some
/// trimming to faithfully represent the intended match.
/// trimming to faithfully represent the indended match.
///
/// Note that if you do not wish to set the line terminator but would still
/// like `$` to match `\r\n` line terminators, then it is valid to call

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-searcher"
version = "0.1.0" #:version
version = "0.0.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Fast line oriented regex searching as a library.
@@ -15,14 +15,14 @@ license = "Unlicense/MIT"
[dependencies]
bytecount = "0.3.1"
encoding_rs = "0.8"
encoding_rs_io = "0.1.2"
grep-matcher = { version = "0.1.0", path = "../grep-matcher" }
encoding_rs_io = "0.1"
grep-matcher = { version = "0.0.1", path = "../grep-matcher" }
log = "0.4"
memchr = "2"
memmap = "0.6"
[dev-dependencies]
grep-regex = { version = "0.1.0", path = "../grep-regex" }
grep-regex = { version = "0.0.1", path = "../grep-regex" }
regex = "1"
[features]

View File

@@ -1,37 +1,4 @@
grep-searcher
-------------
A high level library for executing fast line oriented searches. This handles
things like reporting contextual lines, counting lines, inverting a search,
detecting binary data, automatic UTF-16 transcoding and deciding whether or not
to use memory maps.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![](https://img.shields.io/crates/v/grep-searcher.svg)](https://crates.io/crates/grep-searcher)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
### Documentation
[https://docs.rs/grep-searcher](https://docs.rs/grep-searcher)
**NOTE:** You probably don't want to use this crate directly. Instead, you
should prefer the facade defined in the
[`grep`](https://docs.rs/grep)
crate.
### Usage
Add this to your `Cargo.toml`:
```toml
[dependencies]
grep-searcher = "0.1"
```
and this to your crate root:
```rust
extern crate grep_searcher;
```
grep
----
This is a *library* that provides grep-style line-by-line regex searching (with
comparable performance to `grep` itself).

View File

@@ -294,8 +294,8 @@ pub struct LineBuffer {
/// has been exhausted.
last_lineterm: usize,
/// The end position of the buffer. This is always greater than or equal to
/// last_lineterm. The bytes between last_lineterm and end, if any, always
/// correspond to a partial line.
/// lastnl. The bytes between lastnl and end, if any, always correspond to
/// a partial line.
end: usize,
/// The absolute byte offset corresponding to `pos`. This is most typically
/// not a valid index into addressable memory, but rather, an offset that
@@ -475,26 +475,8 @@ impl LineBuffer {
// in bounds, which they should always be, and we enforce with
// an assert above.
//
// It seems like it should be possible to do this in safe code that
// results in the same codegen. I tried the obvious:
//
// for (src, dst) in (self.pos..self.end).zip(0..) {
// self.buf[dst] = self.buf[src];
// }
//
// But the above does not work, and in fact compiles down to a slow
// byte-by-byte loop. I tried a few other minor variations, but
// alas, better minds might prevail.
//
// Overall, this doesn't save us *too* much. It mostly matters when
// the number of bytes we're copying is large, which can happen
// if the searcher is asked to produce a lot of context. We could
// decide this isn't worth it, but it does make an appreciable
// impact at or around the context=30 range on my machine.
//
// We could also use a temporary buffer that compiles down to two
// memcpys and is faster than the byte-at-a-time loop, but it
// complicates our options for limiting memory allocation a bit.
// TODO: It seems like it should be possible to do this in safe
// code that results in the same codegen.
ptr::copy(
self.buf[self.pos..].as_ptr(),
self.buf.as_mut_ptr(),
@@ -503,7 +485,7 @@ impl LineBuffer {
}
self.pos = 0;
self.last_lineterm = roll_len;
self.end = roll_len;
self.end = self.last_lineterm;
}
/// Ensures that the internal buffer has a non-zero amount of free space

View File

@@ -72,18 +72,7 @@ impl LineStep {
///
/// The range returned includes the line terminator. Ranges are always
/// non-empty.
pub fn next(&mut self, bytes: &[u8]) -> Option<(usize, usize)> {
self.next_impl(bytes)
}
/// Like next, but returns a `Match` instead of a tuple.
#[inline(always)]
pub(crate) fn next_match(&mut self, bytes: &[u8]) -> Option<Match> {
self.next_impl(bytes).map(|(s, e)| Match::new(s, e))
}
#[inline(always)]
fn next_impl(&mut self, mut bytes: &[u8]) -> Option<(usize, usize)> {
pub fn next(&mut self, mut bytes: &[u8]) -> Option<(usize, usize)> {
bytes = &bytes[..self.end];
match memchr(self.line_term, &bytes[self.pos..]) {
None => {
@@ -106,6 +95,11 @@ impl LineStep {
}
}
}
/// Like next, but returns a `Match` instead of a tuple.
pub(crate) fn next_match(&mut self, bytes: &[u8]) -> Option<Match> {
self.next(bytes).map(|(s, e)| Match::new(s, e))
}
}
/// Count the number of occurrences of `line_term` in `bytes`.
@@ -115,11 +109,9 @@ pub fn count(bytes: &[u8], line_term: u8) -> u64 {
/// Given a line that possibly ends with a terminator, return that line without
/// the terminator.
#[inline(always)]
pub fn without_terminator(bytes: &[u8], line_term: LineTerminator) -> &[u8] {
let line_term = line_term.as_bytes();
let start = bytes.len().saturating_sub(line_term.len());
if bytes.get(start..) == Some(line_term) {
if bytes.get(bytes.len().saturating_sub(line_term.len())..) == Some(line_term) {
return &bytes[..bytes.len() - line_term.len()];
}
bytes
@@ -129,7 +121,6 @@ pub fn without_terminator(bytes: &[u8], line_term: LineTerminator) -> &[u8] {
/// of bytes.
///
/// Line terminators are considered part of the line they terminate.
#[inline(always)]
pub fn locate(
bytes: &[u8],
line_term: u8,
@@ -176,7 +167,7 @@ fn preceding_by_pos(
) -> usize {
if pos == 0 {
return 0;
} else if bytes[pos - 1] == line_term {
} else if bytes[pos - 1] == b'\n' {
pos -= 1;
}
loop {

View File

@@ -290,13 +290,11 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
return Ok(false);
}
} else if let Some(line) = self.find_by_line_fast(buf)? {
if self.config.max_context() > 0 {
if !self.after_context_by_line(buf, line.start())? {
return Ok(false);
}
if !self.before_context_by_line(buf, line.start())? {
return Ok(false);
}
if !self.after_context_by_line(buf, line.start())? {
return Ok(false);
}
if !self.before_context_by_line(buf, line.start())? {
return Ok(false);
}
self.set_pos(line.end());
if !self.sink_matched(buf, &line)? {
@@ -313,7 +311,6 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
Ok(true)
}
#[inline(always)]
fn match_by_line_fast_invert(
&mut self,
buf: &[u8],
@@ -354,7 +351,6 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
Ok(true)
}
#[inline(always)]
fn find_by_line_fast(
&self,
buf: &[u8],
@@ -410,7 +406,6 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
Ok(None)
}
#[inline(always)]
fn sink_matched(
&mut self,
buf: &[u8],
@@ -424,21 +419,11 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
}
self.count_lines(buf, range.start());
let offset = self.absolute_byte_offset + range.start() as u64;
let linebuf =
if self.config.line_term.is_crlf() {
// Normally, a line terminator is never part of a match, but
// if the line terminator is CRLF, then it's possible for `\r`
// to end up in the match, which we generally don't want. So
// we strip it here.
lines::without_terminator(&buf[*range], self.config.line_term)
} else {
&buf[*range]
};
let keepgoing = self.sink.matched(
&self.searcher,
&SinkMatch {
line_term: self.config.line_term,
bytes: linebuf,
bytes: &buf[*range],
absolute_byte_offset: offset,
line_number: self.line_number,
},

View File

@@ -76,9 +76,9 @@ impl MmapChoice {
return None;
}
// SAFETY: This is acceptable because the only way `MmapChoiceImpl` can
// be `Auto` is if the caller invoked the `auto` constructor, which
// is itself not safe. Thus, this is a propagation of the caller's
// assertion that using memory maps is safe.
// be `Auto` is if the caller invoked the `auto` constructor. Thus,
// this is a propagation of the caller's assertion that using memory
// maps is safe.
match unsafe { Mmap::map(file) } {
Ok(mmap) => Some(mmap),
Err(err) => {

View File

@@ -296,7 +296,7 @@ impl SearcherBuilder {
}
}
/// Build a searcher with the given matcher.
/// Builder a searcher with the given matcher.
pub fn build(&self) -> Searcher {
let mut config = self.config.clone();
if config.passthru {
@@ -306,8 +306,7 @@ impl SearcherBuilder {
let mut decode_builder = DecodeReaderBytesBuilder::new();
decode_builder
.encoding(self.config.encoding.as_ref().map(|e| e.0))
.utf8_passthru(true)
.bom_override(true);
.utf8_passthru(true);
Searcher {
config: config,
decode_builder: decode_builder,
@@ -319,7 +318,7 @@ impl SearcherBuilder {
/// Set the line terminator that is used by the searcher.
///
/// When using a searcher, if the matcher provided has a line terminator
/// When building a searcher, if the matcher provided has a line terminator
/// set, then it must be the same as this one. If they aren't, building
/// a searcher will return an error.
///
@@ -454,25 +453,12 @@ impl SearcherBuilder {
/// enabled, then the entire contents will be read on to the heap before
/// searching begins.
///
/// The default behavior is **never**. Generally speaking, and perhaps
/// against conventional wisdom, memory maps don't necessarily enable
/// faster searching. For example, depending on the platform, using memory
/// maps while searching a large directory can actually be quite a bit
/// slower than using normal read calls because of the overhead of managing
/// the memory maps.
///
/// Memory maps can be faster in some cases however. On some platforms,
/// when searching a very large file that *is already in memory*, it can
/// be slightly faster to search it as a memory map instead of using
/// normal read calls.
///
/// Finally, memory maps have a somewhat complicated safety story in Rust.
/// If you aren't sure whether enabling memory maps is worth it, then just
/// don't bother with it.
///
/// **WARNING**: If your process is searching a file backed memory map
/// at the same time that file is truncated, then it's possible for the
/// process to terminate with a bus error.
/// The default behavior is **never**. Generally speaking, command line
/// programs probably want to enable memory maps. The only reason to keep
/// memory maps disabled is if there are concerns using them. For example,
/// if your process is searching a file backed memory map at the same time
/// that file is truncated, then it's possible for the process to terminate
/// with a bus error.
pub fn memory_map(
&mut self,
strategy: MmapChoice,
@@ -500,8 +486,7 @@ impl SearcherBuilder {
/// Set the encoding used to read the source data before searching.
///
/// When an encoding is provided, then the source data is _unconditionally_
/// transcoded using the encoding, unless a BOM is present. If a BOM is
/// present, then the encoding indicated by the BOM is used instead. If the
/// transcoded using the encoding. This will disable BOM sniffing. If the
/// transcoding process encounters an error, then bytes are replaced with
/// the Unicode replacement codepoint.
///
@@ -747,7 +732,6 @@ impl Searcher {
/// where the output may be tailored based on how the searcher is configured.
impl Searcher {
/// Returns the line terminator used by this searcher.
#[inline]
pub fn line_terminator(&self) -> LineTerminator {
self.config.line_term
}
@@ -755,21 +739,18 @@ impl Searcher {
/// Returns true if and only if this searcher is configured to invert its
/// search results. That is, matching lines are lines that do **not** match
/// the searcher's matcher.
#[inline]
pub fn invert_match(&self) -> bool {
self.config.invert_match
}
/// Returns true if and only if this searcher is configured to count line
/// numbers.
#[inline]
pub fn line_number(&self) -> bool {
self.config.line_number
}
/// Returns true if and only if this searcher is configured to perform
/// multi line search.
#[inline]
pub fn multi_line(&self) -> bool {
self.config.multi_line
}
@@ -804,20 +785,17 @@ impl Searcher {
/// Returns the number of "after" context lines to report. When context
/// reporting is not enabled, this returns `0`.
#[inline]
pub fn after_context(&self) -> usize {
self.config.after_context
}
/// Returns the number of "before" context lines to report. When context
/// reporting is not enabled, this returns `0`.
#[inline]
pub fn before_context(&self) -> usize {
self.config.before_context
}
/// Returns true if and only if the searcher has "passthru" mode enabled.
#[inline]
pub fn passthru(&self) -> bool {
self.config.passthru
}

View File

@@ -69,7 +69,7 @@ impl SinkError for Box<::std::error::Error> {
/// an implementation of this trait to a searcher, and the searcher is then
/// responsible for calling the methods on this trait.
///
/// This trait defines several behaviors:
/// This trait defines five behaviors:
///
/// * What to do when a match is found. Callers must provide this.
/// * What to do when an error occurs. Callers must provide this via the

View File

@@ -13,19 +13,11 @@ keywords = ["regex", "grep", "egrep", "search", "pattern"]
license = "Unlicense/MIT"
[dependencies]
grep-cli = { version = "0.1.0", path = "../grep-cli" }
grep-matcher = { version = "0.1.0", path = "../grep-matcher" }
grep-pcre2 = { version = "0.1.0", path = "../grep-pcre2", optional = true }
grep-printer = { version = "0.1.0", path = "../grep-printer" }
grep-regex = { version = "0.1.0", path = "../grep-regex" }
grep-searcher = { version = "0.1.0", path = "../grep-searcher" }
[dev-dependencies]
atty = "0.2.11"
termcolor = "1"
walkdir = "2.2.2"
grep-matcher = { version = "0.0.1", path = "../grep-matcher" }
grep-printer = { version = "0.0.1", path = "../grep-printer" }
grep-regex = { version = "0.0.1", path = "../grep-regex" }
grep-searcher = { version = "0.0.1", path = "../grep-searcher" }
[features]
avx-accel = ["grep-searcher/avx-accel"]
simd-accel = ["grep-searcher/simd-accel"]
pcre2 = ["grep-pcre2"]

View File

@@ -1,41 +1,4 @@
grep
----
ripgrep, as a library.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![](https://img.shields.io/crates/v/grep.svg)](https://crates.io/crates/grep)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
### Documentation
[https://docs.rs/grep](https://docs.rs/grep)
NOTE: This crate isn't ready for wide use yet. Ambitious individuals can
probably piece together the parts, but there is no high level documentation
describing how all of the pieces fit together.
### Usage
Add this to your `Cargo.toml`:
```toml
[dependencies]
grep = "0.2"
```
and this to your crate root:
```rust
extern crate grep;
```
### Features
This crate provides a `pcre2` feature (disabled by default) which, when
enabled, re-exports the `grep-pcre2` crate as an alternative `Matcher`
implementation to the standard `grep-regex` implementation.
This is a *library* that provides grep-style line-by-line regex searching (with
comparable performance to `grep` itself).

View File

@@ -1,89 +0,0 @@
extern crate grep;
extern crate termcolor;
extern crate walkdir;
use std::env;
use std::ffi::OsString;
use std::path::Path;
use std::process;
use std::result;
use grep::cli;
use grep::printer::{ColorSpecs, StandardBuilder};
use grep::regex::RegexMatcher;
use grep::searcher::{BinaryDetection, SearcherBuilder};
use termcolor::ColorChoice;
use walkdir::WalkDir;
macro_rules! fail {
($($tt:tt)*) => {
return Err(From::from(format!($($tt)*)));
}
}
type Result<T> = result::Result<T, Box<::std::error::Error>>;
fn main() {
if let Err(err) = try_main() {
eprintln!("{}", err);
process::exit(1);
}
}
fn try_main() -> Result<()> {
let mut args: Vec<OsString> = env::args_os().collect();
if args.len() < 2 {
fail!("Usage: simplegrep <pattern> [<path> ...]");
}
if args.len() == 2 {
args.push(OsString::from("./"));
}
search(cli::pattern_from_os(&args[1])?, &args[2..])
}
fn search(pattern: &str, paths: &[OsString]) -> Result<()> {
let matcher = RegexMatcher::new_line_matcher(&pattern)?;
let mut searcher = SearcherBuilder::new()
.binary_detection(BinaryDetection::quit(b'\x00'))
.line_number(false)
.build();
let mut printer = StandardBuilder::new()
.color_specs(ColorSpecs::default_with_color())
.build(cli::stdout(color_choice()));
for path in paths {
for result in WalkDir::new(path) {
let dent = match result {
Ok(dent) => dent,
Err(err) => {
eprintln!(
"{}: {}",
err.path().unwrap_or(Path::new("error")).display(),
err,
);
continue;
}
};
if !dent.file_type().is_file() {
continue;
}
let result = searcher.search_path(
&matcher,
dent.path(),
printer.sink_with_path(&matcher, dent.path()),
);
if let Err(err) = result {
eprintln!("{}: {}", dent.path().display(), err);
}
}
}
Ok(())
}
fn color_choice() -> ColorChoice {
if cli::is_tty_stdout() {
ColorChoice::Auto
} else {
ColorChoice::Never
}
}

View File

@@ -1,23 +1,10 @@
/*!
ripgrep, as a library.
This library is intended to provide a high level facade to the crates that
make up ripgrep's core searching routines. However, there is no high level
documentation available yet guiding users on how to fit all of the pieces
together.
Every public API item in the constituent crates is documented, but examples
are sparse.
A cookbook and a guide are planned.
TODO.
*/
#![deny(missing_docs)]
pub extern crate grep_cli as cli;
pub extern crate grep_matcher as matcher;
#[cfg(feature = "pcre2")]
pub extern crate grep_pcre2 as pcre2;
pub extern crate grep_printer as printer;
pub extern crate grep_regex as regex;
pub extern crate grep_searcher as searcher;

View File

@@ -18,7 +18,7 @@ name = "ignore"
bench = false
[dependencies]
crossbeam-channel = "0.2"
crossbeam = "0.3"
globset = { version = "0.4.0", path = "../globset" }
lazy_static = "1"
log = "0.4"
@@ -26,10 +26,11 @@ memchr = "2"
regex = "1"
same-file = "1"
thread_local = "0.3.2"
walkdir = "2.2.2"
walkdir = "2"
[target.'cfg(windows)'.dependencies.winapi-util]
version = "0.1.1"
[target.'cfg(windows)'.dependencies.winapi]
version = "0.3"
features = ["std", "winnt"]
[dev-dependencies]
tempdir = "0.3.5"

View File

@@ -4,7 +4,7 @@ The ignore crate provides a fast recursive directory iterator that respects
various filters such as globs, file types and `.gitignore` files. This crate
also provides lower level direct access to gitignore and file type matchers.
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.svg)](https://travis-ci.org/BurntSushi/ripgrep)
[![Linux build status](https://api.travis-ci.org/BurntSushi/ripgrep.png)](https://travis-ci.org/BurntSushi/ripgrep)
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![](https://img.shields.io/crates/v/ignore.svg)](https://crates.io/crates/ignore)

View File

@@ -1,12 +1,17 @@
extern crate crossbeam_channel as channel;
#![allow(dead_code, unused_imports, unused_mut, unused_variables)]
extern crate crossbeam;
extern crate ignore;
extern crate walkdir;
use std::env;
use std::io::{self, Write};
use std::path::Path;
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use crossbeam::sync::MsQueue;
use ignore::WalkBuilder;
use walkdir::WalkDir;
@@ -14,7 +19,7 @@ fn main() {
let mut path = env::args().nth(1).unwrap();
let mut parallel = false;
let mut simple = false;
let (tx, rx) = channel::bounded::<DirEntry>(100);
let queue: Arc<MsQueue<Option<DirEntry>>> = Arc::new(MsQueue::new());
if path == "parallel" {
path = env::args().nth(2).unwrap();
parallel = true;
@@ -23,9 +28,10 @@ fn main() {
simple = true;
}
let stdout_queue = queue.clone();
let stdout_thread = thread::spawn(move || {
let mut stdout = io::BufWriter::new(io::stdout());
for dent in rx {
while let Some(dent) = stdout_queue.pop() {
write_path(&mut stdout, dent.path());
}
});
@@ -33,26 +39,28 @@ fn main() {
if parallel {
let walker = WalkBuilder::new(path).threads(6).build_parallel();
walker.run(|| {
let tx = tx.clone();
let queue = queue.clone();
Box::new(move |result| {
use ignore::WalkState::*;
tx.send(DirEntry::Y(result.unwrap()));
queue.push(Some(DirEntry::Y(result.unwrap())));
Continue
})
});
} else if simple {
let mut stdout = io::BufWriter::new(io::stdout());
let walker = WalkDir::new(path);
for result in walker {
tx.send(DirEntry::X(result.unwrap()));
queue.push(Some(DirEntry::X(result.unwrap())));
}
} else {
let mut stdout = io::BufWriter::new(io::stdout());
let walker = WalkBuilder::new(path).build();
for result in walker {
tx.send(DirEntry::Y(result.unwrap()));
queue.push(Some(DirEntry::Y(result.unwrap())));
}
}
drop(tx);
queue.push(None);
stdout_thread.join().unwrap();
}

View File

@@ -46,7 +46,7 @@ See the documentation for `WalkBuilder` for many other options.
#![deny(missing_docs)]
extern crate crossbeam_channel as channel;
extern crate crossbeam;
extern crate globset;
#[macro_use]
extern crate lazy_static;
@@ -60,7 +60,7 @@ extern crate tempdir;
extern crate thread_local;
extern crate walkdir;
#[cfg(windows)]
extern crate winapi_util;
extern crate winapi;
use std::error;
use std::fmt;

View File

@@ -98,7 +98,6 @@ use {Error, Match};
const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
("agda", &["*.agda", "*.lagda"]),
("ats", &["*.ats", "*.dats", "*.sats", "*.hats"]),
("aidl", &["*.aidl"]),
("amake", &["*.mk", "*.bp"]),
("asciidoc", &["*.adoc", "*.asc", "*.asciidoc"]),
@@ -108,7 +107,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
("bazel", &["*.bzl", "WORKSPACE", "BUILD"]),
("bitbake", &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
("bzip2", &["*.bz2"]),
("c", &["*.c", "*.h", "*.H", "*.cats"]),
("c", &["*.c", "*.h", "*.H"]),
("cabal", &["*.cabal"]),
("cbor", &["*.cbor"]),
("ceylon", &["*.ceylon"]),
@@ -130,7 +129,6 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
("cython", &["*.pyx"]),
("dart", &["*.dart"]),
("d", &["*.d"]),
("dhall", &["*.dhall"]),
("docker", &["*Dockerfile*"]),
("elisp", &["*.el"]),
("elixir", &["*.ex", "*.eex", "*.exs"]),
@@ -149,10 +147,9 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
("groovy", &["*.groovy", "*.gradle"]),
("h", &["*.h", "*.hpp"]),
("hbs", &["*.hbs"]),
("haskell", &["*.hs", "*.lhs", "*.cpphs", "*.c2hs", "*.hsc"]),
("haskell", &["*.hs", "*.lhs"]),
("hs", &["*.hs", "*.lhs"]),
("html", &["*.htm", "*.html", "*.ejs"]),
("idris", &["*.idr", "*.lidr"]),
("java", &["*.java", "*.jsp"]),
("jinja", &["*.j2", "*.jinja", "*.jinja2"]),
("js", &[
@@ -203,7 +200,6 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
"makefile", "Makefile",
"*.mk", "*.mak"
]),
("mako", &["*.mako", "*.mao"]),
("markdown", &["*.markdown", "*.md", "*.mdown", "*.mkdn"]),
("md", &["*.markdown", "*.md", "*.mdown", "*.mkdn"]),
("man", &["*.[0-9lnpx]", "*.[0-9][cEFMmpSx]"]),
@@ -219,7 +215,6 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
("objcpp", &["*.h", "*.mm"]),
("ocaml", &["*.ml", "*.mli", "*.mll", "*.mly"]),
("org", &["*.org"]),
("pascal", &["*.pas", "*.dpr", "*.lpr", "*.pp", "*.inc"]),
("perl", &["*.perl", "*.pl", "*.PL", "*.plh", "*.plx", "*.pm", "*.t"]),
("pdf", &["*.pdf"]),
("php", &["*.php", "*.php3", "*.php4", "*.php5", "*.phtml"]),
@@ -237,7 +232,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
("ruby", &["Gemfile", "*.gemspec", ".irbrc", "Rakefile", "*.rb"]),
("rust", &["*.rs"]),
("sass", &["*.sass", "*.scss"]),
("scala", &["*.scala", "*.sbt"]),
("scala", &["*.scala"]),
("sh", &[
// Portable/misc. init files
".login", ".logout", ".profile", "profile",

View File

@@ -10,7 +10,7 @@ use std::thread;
use std::time::Duration;
use std::vec;
use channel;
use crossbeam::sync::MsQueue;
use same_file::Handle;
use walkdir::{self, WalkDir};
@@ -36,14 +36,6 @@ impl DirEntry {
self.dent.path()
}
/// The full path that this entry represents.
/// Analogous to [`path`], but moves ownership of the path.
///
/// [`path`]: struct.DirEntry.html#method.path
pub fn into_path(self) -> PathBuf {
self.dent.into_path()
}
/// Whether this entry corresponds to a symbolic link or not.
pub fn path_is_symlink(&self) -> bool {
self.dent.path_is_symlink()
@@ -92,8 +84,7 @@ impl DirEntry {
/// Returns an error, if one exists, associated with processing this entry.
///
/// An example of an error is one that occurred while parsing an ignore
/// file. Errors related to traversing a directory tree itself are reported
/// as part of yielding the directory entry, and not with this method.
/// file.
pub fn error(&self) -> Option<&Error> {
self.err.as_ref()
}
@@ -152,15 +143,6 @@ impl DirEntryInner {
}
}
fn into_path(self) -> PathBuf {
use self::DirEntryInner::*;
match self {
Stdin => PathBuf::from("<stdin>"),
Walkdir(x) => x.into_path(),
Raw(x) => x.into_path(),
}
}
fn path_is_symlink(&self) -> bool {
use self::DirEntryInner::*;
match *self {
@@ -233,6 +215,19 @@ impl DirEntryInner {
}
/// Returns true if and only if this entry points to a directory.
///
/// This works around a bug in Rust's standard library:
/// https://github.com/rust-lang/rust/issues/46484
#[cfg(windows)]
fn is_dir(&self) -> bool {
self.metadata().map(|md| metadata_is_dir(&md)).unwrap_or(false)
}
/// Returns true if and only if this entry points to a directory.
///
/// This works around a bug in Rust's standard library:
/// https://github.com/rust-lang/rust/issues/46484
#[cfg(not(windows))]
fn is_dir(&self) -> bool {
self.file_type().map(|ft| ft.is_dir()).unwrap_or(false)
}
@@ -257,6 +252,10 @@ struct DirEntryRaw {
ino: u64,
/// The underlying metadata (Windows only). We store this on Windows
/// because this comes for free while reading a directory.
///
/// We use this to determine whether an entry is a directory or not, which
/// works around a bug in Rust's standard library:
/// https://github.com/rust-lang/rust/issues/46484
#[cfg(windows)]
metadata: fs::Metadata,
}
@@ -279,10 +278,6 @@ impl DirEntryRaw {
&self.path
}
fn into_path(self) -> PathBuf {
self.path
}
fn path_is_symlink(&self) -> bool {
self.ty.is_symlink() || self.follow_link
}
@@ -380,29 +375,21 @@ impl DirEntryRaw {
}
#[cfg(not(unix))]
fn from_path(
depth: usize,
pb: PathBuf,
link: bool,
) -> Result<DirEntryRaw, Error> {
fn from_link(depth: usize, pb: PathBuf) -> Result<DirEntryRaw, Error> {
let md = fs::metadata(&pb).map_err(|err| {
Error::Io(err).with_path(&pb)
})?;
Ok(DirEntryRaw {
path: pb,
ty: md.file_type(),
follow_link: link,
follow_link: true,
depth: depth,
metadata: md,
})
}
#[cfg(unix)]
fn from_path(
depth: usize,
pb: PathBuf,
link: bool,
) -> Result<DirEntryRaw, Error> {
fn from_link(depth: usize, pb: PathBuf) -> Result<DirEntryRaw, Error> {
use std::os::unix::fs::MetadataExt;
let md = fs::metadata(&pb).map_err(|err| {
@@ -411,7 +398,7 @@ impl DirEntryRaw {
Ok(DirEntryRaw {
path: pb,
ty: md.file_type(),
follow_link: link,
follow_link: true,
depth: depth,
ino: md.ino(),
})
@@ -473,16 +460,10 @@ pub struct WalkBuilder {
max_depth: Option<usize>,
max_filesize: Option<u64>,
follow_links: bool,
same_file_system: bool,
sorter: Option<Sorter>,
sorter: Option<Arc<
Fn(&OsStr, &OsStr) -> cmp::Ordering + Send + Sync + 'static
>>,
threads: usize,
skip: Option<Arc<Handle>>,
}
#[derive(Clone)]
enum Sorter {
ByName(Arc<Fn(&OsStr, &OsStr) -> cmp::Ordering + Send + Sync + 'static>),
ByPath(Arc<Fn(&Path, &Path) -> cmp::Ordering + Send + Sync + 'static>),
}
impl fmt::Debug for WalkBuilder {
@@ -494,7 +475,6 @@ impl fmt::Debug for WalkBuilder {
.field("max_filesize", &self.max_filesize)
.field("follow_links", &self.follow_links)
.field("threads", &self.threads)
.field("skip", &self.skip)
.finish()
}
}
@@ -513,10 +493,8 @@ impl WalkBuilder {
max_depth: None,
max_filesize: None,
follow_links: false,
same_file_system: false,
sorter: None,
threads: 0,
skip: None,
}
}
@@ -524,30 +502,21 @@ impl WalkBuilder {
pub fn build(&self) -> Walk {
let follow_links = self.follow_links;
let max_depth = self.max_depth;
let sorter = self.sorter.clone();
let cmp = self.sorter.clone();
let its = self.paths.iter().map(move |p| {
if p == Path::new("-") {
(p.to_path_buf(), None)
} else {
let mut wd = WalkDir::new(p);
wd = wd.follow_links(follow_links || p.is_file());
wd = wd.same_file_system(self.same_file_system);
wd = wd.follow_links(follow_links || path_is_file(p));
if let Some(max_depth) = max_depth {
wd = wd.max_depth(max_depth);
}
if let Some(ref sorter) = sorter {
match sorter.clone() {
Sorter::ByName(cmp) => {
wd = wd.sort_by(move |a, b| {
cmp(a.file_name(), b.file_name())
});
}
Sorter::ByPath(cmp) => {
wd = wd.sort_by(move |a, b| {
cmp(a.path(), b.path())
});
}
}
if let Some(ref cmp) = cmp {
let cmp = cmp.clone();
wd = wd.sort_by(move |a, b| {
cmp(a.file_name(), b.file_name())
});
}
(p.to_path_buf(), Some(WalkEventIter::from(wd)))
}
@@ -559,7 +528,6 @@ impl WalkBuilder {
ig_root: ig_root.clone(),
ig: ig_root.clone(),
max_filesize: self.max_filesize,
skip: self.skip.clone(),
}
}
@@ -575,9 +543,7 @@ impl WalkBuilder {
max_depth: self.max_depth,
max_filesize: self.max_filesize,
follow_links: self.follow_links,
same_file_system: self.same_file_system,
threads: self.threads,
skip: self.skip.clone(),
}
}
@@ -764,30 +730,6 @@ impl WalkBuilder {
self
}
/// Set a function for sorting directory entries by their path.
///
/// If a compare function is set, the resulting iterator will return all
/// paths in sorted order. The compare function will be called to compare
/// entries from the same directory.
///
/// This is like `sort_by_file_name`, except the comparator accepts
/// a `&Path` instead of the base file name, which permits it to sort by
/// more criteria.
///
/// This method will override any previous sorter set by this method or
/// by `sort_by_file_name`.
///
/// Note that this is not used in the parallel iterator.
pub fn sort_by_file_path<F>(
&mut self,
cmp: F,
) -> &mut WalkBuilder
where F: Fn(&Path, &Path) -> cmp::Ordering + Send + Sync + 'static
{
self.sorter = Some(Sorter::ByPath(Arc::new(cmp)));
self
}
/// Set a function for sorting directory entries by file name.
///
/// If a compare function is set, the resulting iterator will return all
@@ -795,47 +737,11 @@ impl WalkBuilder {
/// names from entries from the same directory using only the name of the
/// entry.
///
/// This method will override any previous sorter set by this method or
/// by `sort_by_file_path`.
///
/// Note that this is not used in the parallel iterator.
pub fn sort_by_file_name<F>(&mut self, cmp: F) -> &mut WalkBuilder
where F: Fn(&OsStr, &OsStr) -> cmp::Ordering + Send + Sync + 'static
{
self.sorter = Some(Sorter::ByName(Arc::new(cmp)));
self
}
/// Do not cross file system boundaries.
///
/// When this option is enabled, directory traversal will not descend into
/// directories that are on a different file system from the root path.
///
/// Currently, this option is only supported on Unix and Windows. If this
/// option is used on an unsupported platform, then directory traversal
/// will immediately return an error and will not yield any entries.
pub fn same_file_system(&mut self, yes: bool) -> &mut WalkBuilder {
self.same_file_system = yes;
self
}
/// Do not yield directory entries that are believed to correspond to
/// stdout.
///
/// This is useful when a command is invoked via shell redirection to a
/// file that is also being read. For example, `grep -r foo ./ > results`
/// might end up trying to search `results` even though it is also writing
/// to it, which could cause an unbounded feedback loop. Setting this
/// option prevents this from happening by skipping over the `results`
/// file.
///
/// This is disabled by default.
pub fn skip_stdout(&mut self, yes: bool) -> &mut WalkBuilder {
if yes {
self.skip = stdout_handle().map(Arc::new);
} else {
self.skip = None;
}
self.sorter = Some(Arc::new(cmp));
self
}
}
@@ -852,7 +758,6 @@ pub struct Walk {
ig_root: Ignore,
ig: Ignore,
max_filesize: Option<u64>,
skip: Option<Arc<Handle>>,
}
impl Walk {
@@ -865,17 +770,12 @@ impl Walk {
WalkBuilder::new(path).build()
}
fn skip_entry(&self, ent: &DirEntry) -> Result<bool, Error> {
fn skip_entry(&self, ent: &walkdir::DirEntry) -> bool {
if ent.depth() == 0 {
return Ok(false);
return false;
}
if let Some(ref stdout) = self.skip {
if path_equals(ent, stdout)? {
return Ok(true);
}
}
let is_dir = ent.file_type().map_or(false, |ft| ft.is_dir());
let is_dir = walkdir_entry_is_dir(ent);
let max_size = self.max_filesize;
let should_skip_path = skip_path(&self.ig, ent.path(), is_dir);
let should_skip_filesize = if !is_dir && max_size.is_some() {
@@ -884,7 +784,7 @@ impl Walk {
false
};
Ok(should_skip_path || should_skip_filesize)
should_skip_path || should_skip_filesize
}
}
@@ -904,7 +804,7 @@ impl Iterator for Walk {
}
Some((path, Some(it))) => {
self.it = Some(it);
if path.is_dir() {
if path_is_dir(&path) {
let (ig, err) = self.ig_root.add_parents(path);
self.ig = ig;
if let Some(err) = err {
@@ -926,12 +826,7 @@ impl Iterator for Walk {
self.ig = self.ig.parent().unwrap();
}
Ok(WalkEvent::Dir(ent)) => {
let mut ent = DirEntry::new_walkdir(ent, None);
let should_skip = match self.skip_entry(&ent) {
Err(err) => return Some(Err(err)),
Ok(should_skip) => should_skip,
};
if should_skip {
if self.skip_entry(&ent) {
self.it.as_mut().unwrap().it.skip_current_dir();
// Still need to push this on the stack because
// we'll get a WalkEvent::Exit event for this dir.
@@ -942,19 +837,13 @@ impl Iterator for Walk {
}
let (igtmp, err) = self.ig.add_child(ent.path());
self.ig = igtmp;
ent.err = err;
return Some(Ok(ent));
return Some(Ok(DirEntry::new_walkdir(ent, err)));
}
Ok(WalkEvent::File(ent)) => {
let ent = DirEntry::new_walkdir(ent, None);
let should_skip = match self.skip_entry(&ent) {
Err(err) => return Some(Err(err)),
Ok(should_skip) => should_skip,
};
if should_skip {
if self.skip_entry(&ent) {
continue;
}
return Some(Ok(ent));
return Some(Ok(DirEntry::new_walkdir(ent, None)));
}
}
}
@@ -1005,7 +894,7 @@ impl Iterator for WalkEventIter {
None => None,
Some(Err(err)) => Some(Err(err)),
Some(Ok(dent)) => {
if dent.file_type().is_dir() {
if walkdir_entry_is_dir(&dent) {
self.depth += 1;
Some(Ok(WalkEvent::Dir(dent)))
} else {
@@ -1054,9 +943,7 @@ pub struct WalkParallel {
max_filesize: Option<u64>,
max_depth: Option<usize>,
follow_links: bool,
same_file_system: bool,
threads: usize,
skip: Option<Arc<Handle>>,
}
impl WalkParallel {
@@ -1069,43 +956,18 @@ impl WalkParallel {
) where F: FnMut() -> Box<FnMut(Result<DirEntry, Error>) -> WalkState + Send + 'static> {
let mut f = mkf();
let threads = self.threads();
// TODO: Figure out how to use a bounded channel here. With an
// unbounded channel, the workers can run away and fill up memory
// with all of the file paths. But a bounded channel doesn't work since
// our producers are also are consumers, so they end up getting stuck.
//
// We probably need to rethink parallel traversal completely to fix
// this. The best case scenario would be finding a way to use rayon
// to do this.
let (tx, rx) = channel::unbounded();
let queue = Arc::new(MsQueue::new());
let mut any_work = false;
// Send the initial set of root paths to the pool of workers.
// Note that we only send directories. For files, we send to them the
// callback directly.
for path in self.paths {
let (dent, root_device) =
let dent =
if path == Path::new("-") {
(DirEntry::new_stdin(), None)
DirEntry::new_stdin()
} else {
let root_device =
if !self.same_file_system {
None
} else {
match device_num(&path) {
Ok(root_device) => Some(root_device),
Err(err) => {
let err = Error::Io(err).with_path(path);
if f(Err(err)).is_quit() {
return;
}
continue;
}
}
};
match DirEntryRaw::from_path(0, path, false) {
Ok(dent) => {
(DirEntry::new_raw(dent, None), root_device)
}
match DirEntryRaw::from_link(0, path) {
Ok(dent) => DirEntry::new_raw(dent, None),
Err(err) => {
if f(Err(err)).is_quit() {
return;
@@ -1114,10 +976,9 @@ impl WalkParallel {
}
}
};
tx.send(Message::Work(Work {
queue.push(Message::Work(Work {
dent: dent,
ignore: self.ig_root.clone(),
root_device: root_device,
}));
any_work = true;
}
@@ -1133,8 +994,7 @@ impl WalkParallel {
for _ in 0..threads {
let worker = Worker {
f: mkf(),
tx: tx.clone(),
rx: rx.clone(),
queue: queue.clone(),
quit_now: quit_now.clone(),
is_waiting: false,
is_quitting: false,
@@ -1144,12 +1004,9 @@ impl WalkParallel {
max_depth: self.max_depth,
max_filesize: self.max_filesize,
follow_links: self.follow_links,
skip: self.skip.clone(),
};
handles.push(thread::spawn(|| worker.run()));
}
drop(tx);
drop(rx);
for handle in handles {
handle.join().unwrap();
}
@@ -1183,9 +1040,6 @@ struct Work {
dent: DirEntry,
/// Any ignore matchers that have been built for this directory's parents.
ignore: Ignore,
/// The root device number. When present, only files with the same device
/// number should be considered.
root_device: Option<u64>,
}
impl Work {
@@ -1245,10 +1099,8 @@ impl Work {
struct Worker {
/// The caller's callback.
f: Box<FnMut(Result<DirEntry, Error>) -> WalkState + Send + 'static>,
/// The push side of our mpmc queue.
tx: channel::Sender<Message>,
/// The receive side of our mpmc queue.
rx: channel::Receiver<Message>,
/// A queue of work items. This is multi-producer and multi-consumer.
queue: Arc<MsQueue<Message>>,
/// Whether all workers should quit at the next opportunity. Note that
/// this is distinct from quitting because of exhausting the contents of
/// a directory. Instead, this is used when the caller's callback indicates
@@ -1273,9 +1125,6 @@ struct Worker {
/// Whether to follow symbolic links or not. When this is enabled, loop
/// detection is performed.
follow_links: bool,
/// A file handle to skip, currently is either `None` or stdout, if it's
/// a file and it has been requested to skip files identical to stdout.
skip: Option<Arc<Handle>>,
}
impl Worker {
@@ -1310,23 +1159,6 @@ impl Worker {
continue;
}
};
let descend =
if let Some(root_device) = work.root_device {
match is_same_file_system(root_device, work.dent.path()) {
Ok(true) => true,
Ok(false) => false,
Err(err) => {
if (self.f)(Err(err)).is_quit() {
self.quit_now();
return;
}
false
}
}
} else {
true
};
let depth = work.dent.depth();
match (self.f)(Ok(work.dent)) {
WalkState::Continue => {}
@@ -1336,20 +1168,11 @@ impl Worker {
return;
}
}
if !descend {
continue;
}
if self.max_depth.map_or(false, |max| depth >= max) {
continue;
}
for result in readdir {
let state = self.run_one(
&work.ignore,
depth + 1,
work.root_device,
result,
);
if state.is_quit() {
if self.run_one(&work.ignore, depth + 1, result).is_quit() {
self.quit_now();
return;
}
@@ -1373,7 +1196,6 @@ impl Worker {
&mut self,
ig: &Ignore,
depth: usize,
root_device: Option<u64>,
result: Result<fs::DirEntry, io::Error>,
) -> WalkState {
let fs_dent = match result {
@@ -1391,7 +1213,7 @@ impl Worker {
let is_symlink = dent.file_type().map_or(false, |ft| ft.is_symlink());
if self.follow_links && is_symlink {
let path = dent.path().to_path_buf();
dent = match DirEntryRaw::from_path(depth, path, true) {
dent = match DirEntryRaw::from_link(depth, path) {
Ok(dent) => DirEntry::new_raw(dent, None),
Err(err) => {
return (self.f)(Err(err));
@@ -1403,34 +1225,19 @@ impl Worker {
}
}
}
if let Some(ref stdout) = self.skip {
let is_stdout = match path_equals(&dent, stdout) {
Ok(is_stdout) => is_stdout,
Err(err) => return (self.f)(Err(err)),
};
if is_stdout {
return WalkState::Continue;
}
}
let is_dir = dent.is_dir();
let max_size = self.max_filesize;
let should_skip_path = skip_path(ig, dent.path(), is_dir);
let should_skip_filesize =
if !is_dir && max_size.is_some() {
skip_filesize(
max_size.unwrap(),
dent.path(),
&dent.metadata().ok(),
)
} else {
false
};
let should_skip_filesize = if !is_dir && max_size.is_some() {
skip_filesize(max_size.unwrap(), dent.path(), &dent.metadata().ok())
} else {
false
};
if !should_skip_path && !should_skip_filesize {
self.tx.send(Message::Work(Work {
self.queue.push(Message::Work(Work {
dent: dent,
ignore: ig.clone(),
root_device: root_device,
}));
}
WalkState::Continue
@@ -1445,7 +1252,7 @@ impl Worker {
if self.is_quit_now() {
return None;
}
match self.rx.try_recv() {
match self.queue.try_pop() {
Some(Message::Work(work)) => {
self.waiting(false);
self.quitting(false);
@@ -1487,7 +1294,7 @@ impl Worker {
self.quitting(false);
if self.num_waiting() == self.threads {
for _ in 0..self.threads {
self.tx.send(Message::Quit);
self.queue.push(Message::Quit);
}
} else {
// You're right to consider this suspicious, but it's
@@ -1601,11 +1408,7 @@ fn skip_filesize(
}
}
fn skip_path(
ig: &Ignore,
path: &Path,
is_dir: bool,
) -> bool {
fn skip_path(ig: &Ignore, path: &Path, is_dir: bool) -> bool {
let m = ig.matched(path, is_dir);
if m.is_ignore() {
debug!("ignoring {}: {:?}", path.display(), m);
@@ -1618,81 +1421,60 @@ fn skip_path(
}
}
/// Returns a handle to stdout for filtering search.
/// Returns true if and only if this path points to a directory.
///
/// A handle is returned if and only if stdout is being redirected to a file.
/// The handle returned corresponds to that file.
/// This works around a bug in Rust's standard library:
/// https://github.com/rust-lang/rust/issues/46484
#[cfg(windows)]
fn path_is_dir(path: &Path) -> bool {
fs::metadata(path).map(|md| metadata_is_dir(&md)).unwrap_or(false)
}
/// Returns true if and only if this entry points to a directory.
#[cfg(not(windows))]
fn path_is_dir(path: &Path) -> bool {
path.is_dir()
}
/// Returns true if and only if this path points to a file.
///
/// This can be used to ensure that we do not attempt to search a file that we
/// may also be writing to.
fn stdout_handle() -> Option<Handle> {
let h = match Handle::stdout() {
Err(_) => return None,
Ok(h) => h,
};
let md = match h.as_file().metadata() {
Err(_) => return None,
Ok(md) => md,
};
if !md.is_file() {
return None;
}
Some(h)
/// This works around a bug in Rust's standard library:
/// https://github.com/rust-lang/rust/issues/46484
#[cfg(windows)]
fn path_is_file(path: &Path) -> bool {
!path_is_dir(path)
}
/// Returns true if and only if the given directory entry is believed to be
/// equivalent to the given handle. If there was a problem querying the path
/// for information to determine equality, then that error is returned.
fn path_equals(dent: &DirEntry, handle: &Handle) -> Result<bool, Error> {
#[cfg(unix)]
fn never_equal(dent: &DirEntry, handle: &Handle) -> bool {
dent.ino() != Some(handle.ino())
}
#[cfg(not(unix))]
fn never_equal(_: &DirEntry, _: &Handle) -> bool {
false
}
// If we know for sure that these two things aren't equal, then avoid
// the costly extra stat call to determine equality.
if dent.is_stdin() || never_equal(dent, handle) {
return Ok(false);
}
Handle::from_path(dent.path())
.map(|h| &h == handle)
.map_err(|err| Error::Io(err).with_path(dent.path()))
/// Returns true if and only if this entry points to a directory.
#[cfg(not(windows))]
fn path_is_file(path: &Path) -> bool {
path.is_file()
}
/// Returns true if and only if the given path is on the same device as the
/// given root device.
fn is_same_file_system(root_device: u64, path: &Path) -> Result<bool, Error> {
let dent_device = device_num(path)
.map_err(|err| Error::Io(err).with_path(path))?;
Ok(root_device == dent_device)
/// Returns true if and only if the given walkdir entry points to a directory.
///
/// This works around a bug in Rust's standard library:
/// https://github.com/rust-lang/rust/issues/46484
#[cfg(windows)]
fn walkdir_entry_is_dir(dent: &walkdir::DirEntry) -> bool {
dent.metadata().map(|md| metadata_is_dir(&md)).unwrap_or(false)
}
#[cfg(unix)]
fn device_num<P: AsRef<Path>>(path: P)-> io::Result<u64> {
use std::os::unix::fs::MetadataExt;
path.as_ref().metadata().map(|md| md.dev())
/// Returns true if and only if the given walkdir entry points to a directory.
#[cfg(not(windows))]
fn walkdir_entry_is_dir(dent: &walkdir::DirEntry) -> bool {
dent.file_type().is_dir()
}
#[cfg(windows)]
fn device_num<P: AsRef<Path>>(path: P) -> io::Result<u64> {
use winapi_util::{Handle, file};
let h = Handle::from_path_any(path)?;
file::information(h).map(|info| info.volume_serial_number())
}
#[cfg(not(any(unix, windows)))]
fn device_num<P: AsRef<Path>>(_: P)-> io::Result<u64> {
Err(io::Error::new(
io::ErrorKind::Other,
"walkdir: same_file_system option not supported on this platform",
))
/// Returns true if and only if the given metadata points to a directory.
///
/// This works around a bug in Rust's standard library:
/// https://github.com/rust-lang/rust/issues/46484
#[cfg(windows)]
fn metadata_is_dir(md: &fs::Metadata) -> bool {
use std::os::windows::fs::MetadataExt;
use winapi::um::winnt::FILE_ATTRIBUTE_DIRECTORY;
md.file_attributes() & FILE_ATTRIBUTE_DIRECTORY != 0
}
#[cfg(test)]
@@ -1704,7 +1486,7 @@ mod tests {
use tempdir::TempDir;
use super::{DirEntry, WalkBuilder, WalkState};
use super::{WalkBuilder, WalkState};
fn wfile<P: AsRef<Path>>(path: P, contents: &str) {
let mut file = File::create(path).unwrap();
@@ -1755,32 +1537,28 @@ mod tests {
prefix: &Path,
builder: &WalkBuilder,
) -> Vec<String> {
let mut paths = vec![];
for dent in walk_collect_entries_parallel(builder) {
let path = dent.path().strip_prefix(prefix).unwrap();
if path.as_os_str().is_empty() {
continue;
}
paths.push(normal_path(path.to_str().unwrap()));
}
paths.sort();
paths
}
fn walk_collect_entries_parallel(builder: &WalkBuilder) -> Vec<DirEntry> {
let dents = Arc::new(Mutex::new(vec![]));
let paths = Arc::new(Mutex::new(vec![]));
let prefix = Arc::new(prefix.to_path_buf());
builder.build_parallel().run(|| {
let dents = dents.clone();
let paths = paths.clone();
let prefix = prefix.clone();
Box::new(move |result| {
if let Ok(dent) = result {
dents.lock().unwrap().push(dent);
let dent = match result {
Err(_) => return WalkState::Continue,
Ok(dent) => dent,
};
let path = dent.path().strip_prefix(&**prefix).unwrap();
if path.as_os_str().is_empty() {
return WalkState::Continue;
}
let mut paths = paths.lock().unwrap();
paths.push(normal_path(path.to_str().unwrap()));
WalkState::Continue
})
});
let dents = dents.lock().unwrap();
dents.to_vec()
let mut paths = paths.lock().unwrap();
paths.sort();
paths.to_vec()
}
fn mkpaths(paths: &[&str]) -> Vec<String> {
@@ -1795,9 +1573,9 @@ mod tests {
expected: &[&str],
) {
let got = walk_collect(prefix, builder);
assert_eq!(got, mkpaths(expected), "single threaded");
assert_eq!(got, mkpaths(expected));
let got = walk_collect_parallel(prefix, builder);
assert_eq!(got, mkpaths(expected), "parallel");
assert_eq!(got, mkpaths(expected));
}
#[test]
@@ -1975,27 +1753,6 @@ mod tests {
]);
}
#[cfg(unix)] // because symlinks on windows are weird
#[test]
fn first_path_not_symlink() {
let td = TempDir::new("walk-test-").unwrap();
mkdirp(td.path().join("foo"));
let dents = WalkBuilder::new(td.path().join("foo"))
.build()
.into_iter()
.collect::<Result<Vec<_>, _>>()
.unwrap();
assert_eq!(1, dents.len());
assert!(!dents[0].path_is_symlink());
let dents = walk_collect_entries_parallel(
&WalkBuilder::new(td.path().join("foo")),
);
assert_eq!(1, dents.len());
assert!(!dents[0].path_is_symlink());
}
#[cfg(unix)] // because symlinks on windows are weird
#[test]
fn symlink_loop() {
@@ -2011,40 +1768,4 @@ mod tests {
"a", "a/b",
]);
}
// It's a little tricky to test the 'same_file_system' option since
// we need an environment with more than one file system. We adopt a
// heuristic where /sys is typically a distinct volume on Linux and roll
// with that.
#[test]
#[cfg(target_os = "linux")]
fn same_file_system() {
use super::device_num;
// If for some reason /sys doesn't exist or isn't a directory, just
// skip this test.
if !Path::new("/sys").is_dir() {
return;
}
// If our test directory actually isn't a different volume from /sys,
// then this test is meaningless and we shouldn't run it.
let td = TempDir::new("walk-test-").unwrap();
if device_num(td.path()).unwrap() == device_num("/sys").unwrap() {
return;
}
mkdirp(td.path().join("same_file"));
symlink("/sys", td.path().join("same_file").join("alink"));
// Create a symlink to sys and enable following symlinks. If the
// same_file_system option doesn't work, then this probably will hit a
// permission error. Otherwise, it should just skip over the symlink
// completely.
let mut builder = WalkBuilder::new(td.path());
builder.follow_links(true).same_file_system(true);
assert_paths(td.path(), &builder, &[
"same_file", "same_file/alink",
]);
}
}

View File

@@ -13,13 +13,12 @@ use clap::{self, App, AppSettings};
const ABOUT: &str = "
ripgrep (rg) recursively searches your current directory for a regex pattern.
By default, ripgrep will respect your .gitignore and automatically skip hidden
files/directories and binary files.
By default, ripgrep will respect your `.gitignore` and automatically skip
hidden files/directories and binary files.
ripgrep's default regex engine uses finite automata and guarantees linear
time searching. Because of this, features like backreferences and arbitrary
look-around are not supported. However, if ripgrep is built with PCRE2, then
the --pcre2 flag can be used to enable backreferences and look-around.
ripgrep's regex engine uses finite automata and guarantees linear time
searching. Because of this, features like backreferences and arbitrary
lookaround are not supported.
ripgrep supports configuration files. Set RIPGREP_CONFIG_PATH to a
configuration file. The file can specify one shell argument per line. Lines
@@ -80,37 +79,10 @@ pub fn app() -> App<'static, 'static> {
/// Return the "long" format of ripgrep's version string.
///
/// If a revision hash is given, then it is used. If one isn't given, then
/// the RIPGREP_BUILD_GIT_HASH env var is inspected for it. If that isn't set,
/// the RIPGREP_BUILD_GIT_HASH env var is inspect for it. If that isn't set,
/// then a revision hash is not included in the version string returned.
pub fn long_version(revision_hash: Option<&str>) -> String {
// Do we have a git hash?
// (Yes, if ripgrep was built on a machine with `git` installed.)
let hash = match revision_hash.or(option_env!("RIPGREP_BUILD_GIT_HASH")) {
None => String::new(),
Some(githash) => format!(" (rev {})", githash),
};
// Put everything together.
let runtime = runtime_cpu_features();
if runtime.is_empty() {
format!(
"{}{}\n{} (compiled)",
crate_version!(),
hash,
compile_cpu_features().join(" ")
)
} else {
format!(
"{}{}\n{} (compiled)\n{} (runtime)",
crate_version!(),
hash,
compile_cpu_features().join(" "),
runtime.join(" ")
)
}
}
/// Returns the relevant CPU features enabled at compile time.
fn compile_cpu_features() -> Vec<&'static str> {
// Let's say whether faster CPU instructions are enabled or not.
let mut features = vec![];
if cfg!(feature = "simd-accel") {
features.push("+SIMD");
@@ -122,33 +94,14 @@ fn compile_cpu_features() -> Vec<&'static str> {
} else {
features.push("-AVX");
}
features
}
/// Returns the relevant CPU features enabled at runtime.
#[cfg(target_arch = "x86_64")]
fn runtime_cpu_features() -> Vec<&'static str> {
// This is kind of a dirty violation of abstraction, since it assumes
// knowledge about what specific SIMD features are being used.
let mut features = vec![];
if is_x86_feature_detected!("ssse3") {
features.push("+SIMD");
} else {
features.push("-SIMD");
}
if is_x86_feature_detected!("avx2") {
features.push("+AVX");
} else {
features.push("-AVX");
}
features
}
/// Returns the relevant CPU features enabled at runtime.
#[cfg(not(target_arch = "x86_64"))]
fn runtime_cpu_features() -> Vec<&'static str> {
vec![]
// Do we have a git hash?
// (Yes, if ripgrep was built on a machine with `git` installed.)
let hash = match revision_hash.or(option_env!("RIPGREP_BUILD_GIT_HASH")) {
None => String::new(),
Some(githash) => format!(" (rev {})", githash),
};
// Put everything together.
format!("{}{}\n{}", crate_version!(), hash, features.join(" "))
}
/// Arg is a light alias for a clap::Arg that is specialized to compile time
@@ -537,14 +490,9 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
// The positional arguments must be defined first and in order.
arg_pattern(&mut args);
arg_path(&mut args);
// Flags can be defined in any order, but we do it alphabetically. Note
// that each function may define multiple flags. For example,
// `flag_encoding` defines `--encoding` and `--no-encoding`. Most `--no`
// flags are hidden and merely mentioned in the docs of the corresponding
// "positive" flag.
// Flags can be defined in any order, but we do it alphabetically.
flag_after_context(&mut args);
flag_before_context(&mut args);
flag_block_buffered(&mut args);
flag_byte_offset(&mut args);
flag_case_sensitive(&mut args);
flag_color(&mut args);
@@ -554,7 +502,6 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_context_separator(&mut args);
flag_count(&mut args);
flag_count_matches(&mut args);
flag_crlf(&mut args);
flag_debug(&mut args);
flag_dfa_size_limit(&mut args);
flag_encoding(&mut args);
@@ -571,8 +518,6 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_ignore_case(&mut args);
flag_ignore_file(&mut args);
flag_invert_match(&mut args);
flag_json(&mut args);
flag_line_buffered(&mut args);
flag_line_number(&mut args);
flag_line_regexp(&mut args);
flag_max_columns(&mut args);
@@ -581,7 +526,6 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_max_filesize(&mut args);
flag_mmap(&mut args);
flag_multiline(&mut args);
flag_multiline_dotall(&mut args);
flag_no_config(&mut args);
flag_no_ignore(&mut args);
flag_no_ignore_global(&mut args);
@@ -589,16 +533,11 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_no_ignore_parent(&mut args);
flag_no_ignore_vcs(&mut args);
flag_no_messages(&mut args);
flag_no_pcre2_unicode(&mut args);
flag_null(&mut args);
flag_null_data(&mut args);
flag_one_file_system(&mut args);
flag_only_matching(&mut args);
flag_path_separator(&mut args);
flag_passthru(&mut args);
flag_pcre2(&mut args);
flag_pre(&mut args);
flag_pre_glob(&mut args);
flag_pretty(&mut args);
flag_quiet(&mut args);
flag_regex_size_limit(&mut args);
@@ -607,12 +546,9 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_search_zip(&mut args);
flag_smart_case(&mut args);
flag_sort_files(&mut args);
flag_sort(&mut args);
flag_sortr(&mut args);
flag_stats(&mut args);
flag_text(&mut args);
flag_threads(&mut args);
flag_trim(&mut args);
flag_type(&mut args);
flag_type_add(&mut args);
flag_type_clear(&mut args);
@@ -688,49 +624,13 @@ This overrides the --context flag.
args.push(arg);
}
fn flag_block_buffered(args: &mut Vec<RGArg>) {
const SHORT: &str = "Force block buffering.";
const LONG: &str = long!("\
When enabled, ripgrep will use block buffering. That is, whenever a matching
line is found, it will be written to an in-memory buffer and will not be
written to stdout until the buffer reaches a certain size. This is the default
when ripgrep's stdout is redirected to a pipeline or a file. When ripgrep's
stdout is connected to a terminal, line buffering will be used. Forcing block
buffering can be useful when dumping a large amount of contents to a terminal.
Forceful block buffering can be disabled with --no-block-buffered. Note that
using --no-block-buffered causes ripgrep to revert to its default behavior of
automatically detecting the buffering strategy. To force line buffering, use
the --line-buffered flag.
");
let arg = RGArg::switch("block-buffered")
.help(SHORT).long_help(LONG)
.overrides("no-block-buffered")
.overrides("line-buffered")
.overrides("no-line-buffered");
args.push(arg);
let arg = RGArg::switch("no-block-buffered")
.hidden()
.overrides("block-buffered")
.overrides("line-buffered")
.overrides("no-line-buffered");
args.push(arg);
}
fn flag_byte_offset(args: &mut Vec<RGArg>) {
const SHORT: &str =
"Print the 0-based byte offset for each matching line.";
const LONG: &str = long!("\
Print the 0-based byte offset within the input file before each line of output.
If -o (--only-matching) is specified, print the offset of the matching part
itself.
If ripgrep does transcoding, then the byte offset is in terms of the the result
of transcoding and not the original data. This applies similarly to another
transformation on the source, such as decompression or a --pre filter. Note
that when the PCRE2 regex engine is used, then UTF-8 transcoding is done by
default.
Print the 0-based byte offset within the input file
before each line of output. If -o (--only-matching) is
specified, print the offset of the matching part itself.
");
let arg = RGArg::switch("byte-offset").short("b")
.help(SHORT).long_help(LONG);
@@ -910,32 +810,6 @@ This overrides the --count flag. Note that when --count is combined with
args.push(arg);
}
fn flag_crlf(args: &mut Vec<RGArg>) {
const SHORT: &str = "Support CRLF line terminators (useful on Windows).";
const LONG: &str = long!("\
When enabled, ripgrep will treat CRLF ('\\r\\n') as a line terminator instead
of just '\\n'.
Principally, this permits '$' in regex patterns to match just before CRLF
instead of just before LF. The underlying regex engine may not support this
natively, so ripgrep will translate all instances of '$' to '(?:\\r??$)'. This
may produce slightly different than desired match offsets. It is intended as a
work-around until the regex engine supports this natively.
CRLF support can be disabled with --no-crlf.
");
let arg = RGArg::switch("crlf")
.help(SHORT).long_help(LONG)
.overrides("no-crlf")
.overrides("null-data");
args.push(arg);
let arg = RGArg::switch("no-crlf")
.hidden()
.overrides("crlf");
args.push(arg);
}
fn flag_debug(args: &mut Vec<RGArg>) {
const SHORT: &str = "Show debug messages.";
const LONG: &str = long!("\
@@ -982,17 +856,10 @@ default value is 'auto', which will cause ripgrep to do a best effort automatic
detection of encoding on a per-file basis. Other supported values can be found
in the list of labels here:
https://encoding.spec.whatwg.org/#concept-encoding-get
This flag can be disabled with --no-encoding.
");
let arg = RGArg::flag("encoding", "ENCODING").short("E")
.help(SHORT).long_help(LONG);
args.push(arg);
let arg = RGArg::switch("no-encoding")
.hidden()
.overrides("encoding");
args.push(arg);
}
fn flag_file(args: &mut Vec<RGArg>) {
@@ -1218,97 +1085,6 @@ Invert matching. Show lines that do not match the given patterns.
args.push(arg);
}
fn flag_json(args: &mut Vec<RGArg>) {
const SHORT: &str = "Show search results in a JSON Lines format.";
const LONG: &str = long!("\
Enable printing results in a JSON Lines format.
When this flag is provided, ripgrep will emit a sequence of messages, each
encoded as a JSON object, where there are five different message types:
**begin** - A message that indicates a file is being searched and contains at
least one match.
**end** - A message the indicates a file is done being searched. This message
also include summary statistics about the search for a particular file.
**match** - A message that indicates a match was found. This includes the text
and offsets of the match.
**context** - A message that indicates a contextual line was found. This
includes the text of the line, along with any match information if the search
was inverted.
**summary** - The final message emitted by ripgrep that contains summary
statistics about the search across all files.
Since file paths or the contents of files are not guaranteed to be valid UTF-8
and JSON itself must be representable by a Unicode encoding, ripgrep will emit
all data elements as objects with one of two keys: 'text' or 'bytes'. 'text' is
a normal JSON string when the data is valid UTF-8 while 'bytes' is the base64
encoded contents of the data.
The JSON Lines format is only supported for showing search results. It cannot
be used with other flags that emit other types of output, such as --files,
--files-with-matches, --files-without-match, --count or --count-matches.
ripgrep will report an error if any of the aforementioned flags are used in
concert with --json.
Other flags that control aspects of the standard output such as
--only-matching, --heading, --replace, --max-columns, etc., have no effect
when --json is set.
A more complete description of the JSON format used can be found here:
https://docs.rs/grep-printer/*/grep_printer/struct.JSON.html
The JSON Lines format can be disabled with --no-json.
");
let arg = RGArg::switch("json")
.help(SHORT).long_help(LONG)
.overrides("no-json")
.conflicts(&[
"count", "count-matches",
"files", "files-with-matches", "files-without-match",
]);
args.push(arg);
let arg = RGArg::switch("no-json")
.hidden()
.overrides("json");
args.push(arg);
}
fn flag_line_buffered(args: &mut Vec<RGArg>) {
const SHORT: &str = "Force line buffering.";
const LONG: &str = long!("\
When enabled, ripgrep will use line buffering. That is, whenever a matching
line is found, it will be flushed to stdout immediately. This is the default
when ripgrep's stdout is connected to a terminal, but otherwise, ripgrep will
use block buffering, which is typically faster. This flag forces ripgrep to
use line buffering even if it would otherwise use block buffering. This is
typically useful in shell pipelines, e.g.,
'tail -f something.log | rg foo --line-buffered | rg bar'.
Forceful line buffering can be disabled with --no-line-buffered. Note that
using --no-line-buffered causes ripgrep to revert to its default behavior of
automatically detecting the buffering strategy. To force block buffering, use
the --block-buffered flag.
");
let arg = RGArg::switch("line-buffered")
.help(SHORT).long_help(LONG)
.overrides("no-line-buffered")
.overrides("block-buffered")
.overrides("no-block-buffered");
args.push(arg);
let arg = RGArg::switch("no-line-buffered")
.hidden()
.overrides("line-buffered")
.overrides("block-buffered")
.overrides("no-block-buffered");
args.push(arg);
}
fn flag_line_number(args: &mut Vec<RGArg>) {
const SHORT: &str = "Show line numbers.";
const LONG: &str = long!("\
@@ -1441,41 +1217,9 @@ fn flag_multiline(args: &mut Vec<RGArg>) {
const LONG: &str = long!("\
Enable matching across multiple lines.
When multiline mode is enabled, ripgrep will lift the restriction that a match
cannot include a line terminator. For example, when multiline mode is not
enabled (the default), then the regex '\\p{any}' will match any Unicode
codepoint other than '\\n'. Similarly, the regex '\\n' is explicitly forbidden,
and if you try to use it, ripgrep will return an error. However, when multiline
mode is enabled, '\\p{any}' will match any Unicode codepoint, including '\\n',
and regexes like '\\n' are permitted.
An important caveat is that multiline mode does not change the match semantics
of '.'. Namely, in most regex matchers, a '.' will by default match any
character other than '\\n', and this is true in ripgrep as well. In order to
make '.' match '\\n', you must enable the \"dot all\" flag inside the regex.
For example, both '(?s).' and '(?s:.)' have the same semantics, where '.' will
match any character, including '\\n'. Alternatively, the '--multiline-dotall'
flag may be passed to make the \"dot all\" behavior the default. This flag only
applies when multiline search is enabled.
There is no limit on the number of the lines that a single match can span.
**WARNING**: Because of how the underlying regex engine works, multiline
searches may be slower than normal line-oriented searches, and they may also
use more memory. In particular, when multiline mode is enabled, ripgrep
requires that each file it searches is laid out contiguously in memory
(either by reading it onto the heap or by memory-mapping it). Things that
cannot be memory-mapped (such as stdin) will be consumed until EOF before
searching can begin. In general, ripgrep will only do these things when
necessary. Specifically, if the --multiline flag is provided but the regex
does not contain patterns that would match '\\n' characters, then ripgrep
will automatically avoid reading each file into memory before searching it.
Nevertheless, if you only care about matches spanning at most one line, then it
is always better to disable multiline mode.
This flag can be disabled with --no-multiline.
");
let arg = RGArg::switch("multiline").short("U")
let arg = RGArg::switch("multiline")
.help(SHORT).long_help(LONG)
.overrides("no-multiline");
args.push(arg);
@@ -1486,37 +1230,6 @@ This flag can be disabled with --no-multiline.
args.push(arg);
}
fn flag_multiline_dotall(args: &mut Vec<RGArg>) {
const SHORT: &str = "Make '.' match new lines when multiline is enabled.";
const LONG: &str = long!("\
This flag enables \"dot all\" in your regex pattern, which causes '.' to match
newlines when multiline searching is enabled. This flag has no effect if
multiline searching isn't enabled with the --multiline flag.
Normally, a '.' will match any character except newlines. While this behavior
typically isn't relevant for line-oriented matching (since matches can span at
most one line), this can be useful when searching with the -U/--multiline flag.
By default, the multiline mode runs without this flag.
This flag is generally intended to be used in an alias or your ripgrep config
file if you prefer \"dot all\" semantics by default. Note that regardless of
whether this flag is used, \"dot all\" semantics can still be controlled via
inline flags in the regex pattern itself, e.g., '(?s:.)' always enables \"dot
all\" whereas '(?-s:.)' always disables \"dot all\".
This flag can be disabled with --no-multiline-dotall.
");
let arg = RGArg::switch("multiline-dotall")
.help(SHORT).long_help(LONG)
.overrides("no-multiline-dotall");
args.push(arg);
let arg = RGArg::switch("no-multiline-dotall")
.hidden()
.overrides("multiline-dotall");
args.push(arg);
}
fn flag_no_config(args: &mut Vec<RGArg>) {
const SHORT: &str = "Never read configuration files.";
const LONG: &str = long!("\
@@ -1555,7 +1268,7 @@ fn flag_no_ignore_global(args: &mut Vec<RGArg>) {
const LONG: &str = long!("\
Don't respect ignore files that come from \"global\" sources such as git's
`core.excludesFile` configuration option (which defaults to
`$HOME/.config/git/ignore`).
`$HOME/.config/git/ignore).
This flag can be disabled with the --ignore-global flag.
");
@@ -1646,48 +1359,6 @@ This flag can be disabled with the --messages flag.
args.push(arg);
}
fn flag_no_pcre2_unicode(args: &mut Vec<RGArg>) {
const SHORT: &str = "Disable Unicode mode for PCRE2 matching.";
const LONG: &str = long!("\
When PCRE2 matching is enabled, this flag will disable Unicode mode, which is
otherwise enabled by default. If PCRE2 matching is not enabled, then this flag
has no effect.
When PCRE2's Unicode mode is enabled, several different types of patterns
become Unicode aware. This includes '\\b', '\\B', '\\w', '\\W', '\\d', '\\D',
'\\s' and '\\S'. Similarly, the '.' meta character will match any Unicode
codepoint instead of any byte. Caseless matching will also use Unicode simple
case folding instead of ASCII-only case insensitivity.
Unicode mode in PCRE2 represents a critical trade off in the user experience
of ripgrep. In particular, unlike the default regex engine, PCRE2 does not
support the ability to search possibly invalid UTF-8 with Unicode features
enabled. Instead, PCRE2 *requires* that everything it searches when Unicode
mode is enabled is valid UTF-8. (Or valid UTF-16/UTF-32, but for the purposes
of ripgrep, we only discuss UTF-8.) This means that if you have PCRE2's Unicode
mode enabled and you attempt to search invalid UTF-8, then the search for that
file will halt and print an error. For this reason, when PCRE2's Unicode mode
is enabled, ripgrep will automatically \"fix\" invalid UTF-8 sequences by
replacing them with the Unicode replacement codepoint.
If you would rather see the encoding errors surfaced by PCRE2 when Unicode mode
is enabled, then pass the --no-encoding flag to disable all transcoding.
Related flags: --pcre2
This flag can be disabled with --pcre2-unicode.
");
let arg = RGArg::switch("no-pcre2-unicode")
.help(SHORT).long_help(LONG)
.overrides("pcre2-unicode");
args.push(arg);
let arg = RGArg::switch("pcre2-unicode")
.hidden()
.overrides("no-pcre2-unicode");
args.push(arg);
}
fn flag_null(args: &mut Vec<RGArg>) {
const SHORT: &str = "Print a NUL byte after file paths.";
const LONG: &str = long!("\
@@ -1701,56 +1372,6 @@ for use with xargs.
args.push(arg);
}
fn flag_null_data(args: &mut Vec<RGArg>) {
const SHORT: &str = "Use NUL as a line terminator instead of \\n.";
const LONG: &str = long!("\
Enabling this option causes ripgrep to use NUL as a line terminator instead of
the default of '\\n'.
This is useful when searching large binary files that would otherwise have very
long lines if '\\n' were used as the line terminator. In particular, ripgrep
requires that, at a minimum, each line must fit into memory. Using NUL instead
can be a useful stopgap to keep memory requirements low and avoid OOM (out of
memory) conditions.
This is also useful for processing NUL delimited data, such as that emitted
when using ripgrep's -0/--null flag or find's --print0 flag.
Using this flag implies -a/--text.
");
let arg = RGArg::switch("null-data")
.help(SHORT).long_help(LONG)
.overrides("crlf");
args.push(arg);
}
fn flag_one_file_system(args: &mut Vec<RGArg>) {
const SHORT: &str =
"Do not descend into directories on other file systems.";
const LONG: &str = long!("\
When enabled, ripgrep will not cross file system boundaries relative to where
the search started from.
Note that this applies to each path argument given to ripgrep. For example, in
the command 'rg --one-file-system /foo/bar /quux/baz', ripgrep will search both
'/foo/bar' and '/quux/baz' even if they are on different file systems, but will
not cross a file system boundary when traversing each path's directory tree.
This is similar to find's '-xdev' or '-mount' flag.
This flag can be disabled with --no-one-file-system.
");
let arg = RGArg::switch("one-file-system")
.help(SHORT).long_help(LONG)
.overrides("no-one-file-system");
args.push(arg);
let arg = RGArg::switch("no-one-file-system")
.hidden()
.overrides("one-file-system");
args.push(arg);
}
fn flag_only_matching(args: &mut Vec<RGArg>) {
const SHORT: &str = "Print only matches parts of a line.";
const LONG: &str = long!("\
@@ -1792,125 +1413,6 @@ without needing to modify the pattern.
args.push(arg);
}
fn flag_pcre2(args: &mut Vec<RGArg>) {
const SHORT: &str = "Enable PCRE2 matching.";
const LONG: &str = long!("\
When this flag is present, ripgrep will use the PCRE2 regex engine instead of
its default regex engine.
This is generally useful when you want to use features such as look-around
or backreferences.
Note that PCRE2 is an optional ripgrep feature. If PCRE2 wasn't included in
your build of ripgrep, then using this flag will result in ripgrep printing
an error message and exiting.
Related flags: --no-pcre2-unicode
This flag can be disabled with --no-pcre2.
");
let arg = RGArg::switch("pcre2").short("P")
.help(SHORT).long_help(LONG)
.overrides("no-pcre2");
args.push(arg);
let arg = RGArg::switch("no-pcre2")
.hidden()
.overrides("pcre2");
args.push(arg);
}
fn flag_pre(args: &mut Vec<RGArg>) {
const SHORT: &str = "search outputs of COMMAND FILE for each FILE";
const LONG: &str = long!("\
For each input FILE, search the standard output of COMMAND FILE rather than the
contents of FILE. This option expects the COMMAND program to either be an
absolute path or to be available in your PATH. Either an empty string COMMAND
or the `--no-pre` flag will disable this behavior.
WARNING: When this flag is set, ripgrep will unconditionally spawn a
process for every file that is searched. Therefore, this can incur an
unnecessarily large performance penalty if you don't otherwise need the
flexibility offered by this flag.
A preprocessor is not run when ripgrep is searching stdin.
When searching over sets of files that may require one of several decoders
as preprocessors, COMMAND should be a wrapper program or script which first
classifies FILE based on magic numbers/content or based on the FILE name and
then dispatches to an appropriate preprocessor. Each COMMAND also has its
standard input connected to FILE for convenience.
For example, a shell script for COMMAND might look like:
case \"$1\" in
*.pdf)
exec pdftotext \"$1\" -
;;
*)
case $(file \"$1\") in
*Zstandard*)
exec pzstd -cdq
;;
*)
exec cat
;;
esac
;;
esac
The above script uses `pdftotext` to convert a PDF file to plain text. For
all other files, the script uses the `file` utility to sniff the type of the
file based on its contents. If it is a compressed file in the Zstandard format,
then `pzstd` is used to decompress the contents to stdout.
This overrides the -z/--search-zip flag.
");
let arg = RGArg::flag("pre", "COMMAND")
.help(SHORT).long_help(LONG)
.overrides("no-pre")
.overrides("search-zip");
args.push(arg);
let arg = RGArg::switch("no-pre")
.hidden()
.overrides("pre");
args.push(arg);
}
fn flag_pre_glob(args: &mut Vec<RGArg>) {
const SHORT: &str =
"Include or exclude files from a preprocessing command.";
const LONG: &str = long!("\
This flag works in conjunction with the --pre flag. Namely, when one or more
--pre-glob flags are given, then only files that match the given set of globs
will be handed to the command specified by the --pre flag. Any non-matching
files will be searched without using the preprocessor command.
This flag is useful when searching many files with the --pre flag. Namely,
it permits the ability to avoid process overhead for files that don't need
preprocessing. For example, given the following shell script, 'pre-pdftotext':
#!/bin/sh
pdftotext \"$1\" -
then it is possible to use '--pre pre-pdftotext --pre-glob \'*.pdf\'' to make
it so ripgrep only executes the 'pre-pdftotext' command on files with a '.pdf'
extension.
Multiple --pre-glob flags may be used. Globbing rules match .gitignore globs.
Precede a glob with a ! to exclude it.
This flag has no effect if the --pre flag is not used.
");
let arg = RGArg::flag("pre-glob", "GLOB")
.help(SHORT).long_help(LONG)
.multiple()
.allow_leading_hyphen();
args.push(arg);
}
fn flag_pretty(args: &mut Vec<RGArg>) {
const SHORT: &str = "Alias for --color always --heading --line-number.";
const LONG: &str = long!("\
@@ -2016,6 +1518,64 @@ This flag can be disabled with --no-search-zip.
args.push(arg);
}
fn flag_pre(args: &mut Vec<RGArg>) {
const SHORT: &str = "search outputs of COMMAND FILE for each FILE";
const LONG: &str = long!("\
For each input FILE, search the standard output of COMMAND FILE rather than the
contents of FILE. This option expects the COMMAND program to either be an
absolute path or to be available in your PATH. Either an empty string COMMAND
or the `--no-pre` flag will disable this behavior.
WARNING: When this flag is set, ripgrep will unconditionally spawn a
process for every file that is searched. Therefore, this can incur an
unnecessarily large performance penalty if you don't otherwise need the
flexibility offered by this flag.
A preprocessor is not run when ripgrep is searching stdin.
When searching over sets of files that may require one of several decoders
as preprocessors, COMMAND should be a wrapper program or script which first
classifies FILE based on magic numbers/content or based on the FILE name and
then dispatches to an appropriate preprocessor. Each COMMAND also has its
standard input connected to FILE for convenience.
For example, a shell script for COMMAND might look like:
case \"$1\" in
*.pdf)
exec pdftotext \"$1\" -
;;
*)
case $(file \"$1\") in
*Zstandard*)
exec pzstd -cdq
;;
*)
exec cat
;;
esac
;;
esac
The above script uses `pdftotext` to convert a PDF file to plain text. For
all other files, the script uses the `file` utility to sniff the type of the
file based on its contents. If it is a compressed file in the Zstandard format,
then `pzstd` is used to decompress the contents to stdout.
This overrides the -z/--search-zip flag.
");
let arg = RGArg::flag("pre", "COMMAND")
.help(SHORT).long_help(LONG)
.overrides("no-pre")
.overrides("search-zip");
args.push(arg);
let arg = RGArg::switch("no-pre")
.hidden()
.overrides("pre");
args.push(arg);
}
fn flag_smart_case(args: &mut Vec<RGArg>) {
const SHORT: &str = "Smart case search.";
const LONG: &str = long!("\
@@ -2032,10 +1592,8 @@ This overrides the -s/--case-sensitive and -i/--ignore-case flags.
}
fn flag_sort_files(args: &mut Vec<RGArg>) {
const SHORT: &str = "DEPRECATED";
const SHORT: &str = "Sort results by file path. Implies --threads=1.";
const LONG: &str = long!("\
DEPRECATED: Use --sort or --sortr instead.
Sort results by file path. Note that this currently disables all parallelism
and runs search in a single thread.
@@ -2043,83 +1601,12 @@ This flag can be disabled with --no-sort-files.
");
let arg = RGArg::switch("sort-files")
.help(SHORT).long_help(LONG)
.hidden()
.overrides("no-sort-files")
.overrides("sort")
.overrides("sortr");
.overrides("no-sort-files");
args.push(arg);
let arg = RGArg::switch("no-sort-files")
.hidden()
.overrides("sort-files")
.overrides("sort")
.overrides("sortr");
args.push(arg);
}
fn flag_sort(args: &mut Vec<RGArg>) {
const SHORT: &str =
"Sort results in ascending order. Implies --threads=1.";
const LONG: &str = long!("\
This flag enables sorting of results in ascending order. The possible values
for this flag are:
path Sort by file path.
modified Sort by the last modified time on a file.
accessed Sort by the last accessed time on a file.
created Sort by the cretion time on a file.
none Do not sort results.
If the sorting criteria isn't available on your system (for example, creation
time is not available on ext4 file systems), then ripgrep will attempt to
detect this and print an error without searching any results. Otherwise, the
sort order is unspecified.
To sort results in reverse or descending order, use the --sortr flag. Also,
this flag overrides --sortr.
Note that sorting results currently always forces ripgrep to abandon
parallelism and run in a single thread.
");
let arg = RGArg::flag("sort", "SORTBY")
.help(SHORT).long_help(LONG)
.possible_values(&["path", "modified", "accessed", "created", "none"])
.overrides("sortr")
.overrides("sort-files")
.overrides("no-sort-files");
args.push(arg);
}
fn flag_sortr(args: &mut Vec<RGArg>) {
const SHORT: &str =
"Sort results in descending order. Implies --threads=1.";
const LONG: &str = long!("\
This flag enables sorting of results in descending order. The possible values
for this flag are:
path Sort by file path.
modified Sort by the last modified time on a file.
accessed Sort by the last accessed time on a file.
created Sort by the cretion time on a file.
none Do not sort results.
If the sorting criteria isn't available on your system (for example, creation
time is not available on ext4 file systems), then ripgrep will attempt to
detect this and print an error without searching any results. Otherwise, the
sort order is unspecified.
To sort results in ascending order, use the --sort flag. Also, this flag
overrides --sort.
Note that sorting results currently always forces ripgrep to abandon
parallelism and run in a single thread.
");
let arg = RGArg::flag("sortr", "SORTBY")
.help(SHORT).long_help(LONG)
.possible_values(&["path", "modified", "accessed", "created", "none"])
.overrides("sort")
.overrides("sort-files")
.overrides("no-sort-files");
.overrides("sort-files");
args.push(arg);
}
@@ -2134,18 +1621,11 @@ searched, and the time taken for the entire search to complete.
This set of aggregate statistics may expand over time.
Note that this flag has no effect if --files, --files-with-matches or
--files-without-match is passed.
--files-without-match is passed.");
This flag can be disabled with --no-stats.
");
let arg = RGArg::switch("stats")
.help(SHORT).long_help(LONG)
.overrides("no-stats");
args.push(arg);
.help(SHORT).long_help(LONG);
let arg = RGArg::switch("no-stats")
.hidden()
.overrides("stats");
args.push(arg);
}
@@ -2188,25 +1668,6 @@ causes ripgrep to choose the thread count using heuristics.
args.push(arg);
}
fn flag_trim(args: &mut Vec<RGArg>) {
const SHORT: &str = "Trim prefixed whitespace from matches.";
const LONG: &str = long!("\
When set, all ASCII whitespace at the beginning of each line printed will be
trimmed.
This flag can be disabled with --no-trim.
");
let arg = RGArg::switch("trim")
.help(SHORT).long_help(LONG)
.overrides("no-trim");
args.push(arg);
let arg = RGArg::switch("no-trim")
.hidden()
.overrides("trim");
args.push(arg);
}
fn flag_type(args: &mut Vec<RGArg>) {
const SHORT: &str = "Only search files matching TYPE.";
const LONG: &str = long!("\

View File

@@ -1,44 +1,35 @@
use std::cmp;
use std::env;
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::fs::File;
use std::io::{self, BufRead};
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::time::SystemTime;
use atty;
use clap;
use grep::cli;
use grep::matcher::LineTerminator;
#[cfg(feature = "pcre2")]
use grep::pcre2::{
RegexMatcher as PCRE2RegexMatcher,
RegexMatcherBuilder as PCRE2RegexMatcherBuilder,
use grep::searcher::{
BinaryDetection, Encoding, MmapChoice, Searcher, SearcherBuilder,
};
use grep::printer::{
ColorSpecs, Stats,
JSON, JSONBuilder,
Standard, StandardBuilder,
Summary, SummaryBuilder, SummaryKind,
default_color_specs,
};
use grep::regex::{
RegexMatcher as RustRegexMatcher,
RegexMatcherBuilder as RustRegexMatcherBuilder,
};
use grep::searcher::{
BinaryDetection, Encoding, MmapChoice, Searcher, SearcherBuilder,
};
use grep::regex::{RegexMatcher, RegexMatcherBuilder};
use ignore::overrides::{Override, OverrideBuilder};
use ignore::types::{FileTypeDef, Types, TypesBuilder};
use ignore::{Walk, WalkBuilder, WalkParallel};
use log;
use num_cpus;
use path_printer::{PathPrinter, PathPrinterBuilder};
use regex;
use regex::{self, Regex};
use same_file::Handle;
use termcolor::{
WriteColor,
BufferWriter, ColorChoice,
BufferedStandardStream, BufferWriter, ColorChoice, StandardStream,
};
use app;
@@ -47,6 +38,7 @@ use logger::Logger;
use messages::{set_messages, set_ignore_messages};
use search::{PatternMatcher, Printer, SearchWorker, SearchWorkerBuilder};
use subject::SubjectBuilder;
use unescape::{escape, unescape};
use Result;
/// The command that ripgrep should execute based on the command line
@@ -283,9 +275,7 @@ impl Args {
let searcher = self.matches().searcher(self.paths())?;
let mut builder = SearchWorkerBuilder::new();
builder
.json_stats(self.matches().is_present("json"))
.preprocessor(self.matches().preprocessor())
.preprocessor_globs(self.matches().preprocessor_globs()?)
.search_zip(self.matches().is_present("search-zip"));
Ok(builder.build(matcher, searcher, printer))
}
@@ -308,20 +298,20 @@ impl Args {
/// file or a stream such as stdin.
pub fn subject_builder(&self) -> SubjectBuilder {
let mut builder = SubjectBuilder::new();
builder.strip_dot_prefix(self.using_default_path());
builder
.strip_dot_prefix(self.using_default_path())
.skip(self.matches().stdout_handle());
builder
}
/// Execute the given function with a writer to stdout that enables color
/// support based on the command line configuration.
pub fn stdout(&self) -> cli::StandardStream {
let color = self.matches().color_choice();
if self.matches().is_present("line-buffered") {
cli::stdout_buffered_line(color)
} else if self.matches().is_present("block-buffered") {
cli::stdout_buffered_block(color)
pub fn stdout(&self) -> Box<WriteColor + Send> {
let color_choice = self.matches().color_choice();
if atty::is(atty::Stream::Stdout) {
Box::new(StandardStream::stdout(color_choice))
} else {
cli::stdout(color)
Box::new(BufferedStandardStream::stdout(color_choice))
}
}
@@ -361,120 +351,6 @@ enum OutputKind {
JSON,
}
/// The sort criteria, if present.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
struct SortBy {
/// Whether to reverse the sort criteria (i.e., descending order).
reverse: bool,
/// The actual sorting criteria.
kind: SortByKind,
}
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
enum SortByKind {
/// No sorting at all.
None,
/// Sort by path.
Path,
/// Sort by last modified time.
LastModified,
/// Sort by last accessed time.
LastAccessed,
/// Sort by creation time.
Created,
}
impl SortBy {
fn asc(kind: SortByKind) -> SortBy {
SortBy { reverse: false, kind: kind }
}
fn desc(kind: SortByKind) -> SortBy {
SortBy { reverse: true, kind: kind }
}
fn none() -> SortBy {
SortBy::asc(SortByKind::None)
}
/// Try to check that the sorting criteria selected is actually supported.
/// If it isn't, then an error is returned.
fn check(&self) -> Result<()> {
match self.kind {
SortByKind::None | SortByKind::Path => {}
SortByKind::LastModified => {
env::current_exe()?.metadata()?.modified()?;
}
SortByKind::LastAccessed => {
env::current_exe()?.metadata()?.accessed()?;
}
SortByKind::Created => {
env::current_exe()?.metadata()?.created()?;
}
}
Ok(())
}
fn configure_walk_builder(self, builder: &mut WalkBuilder) {
// This isn't entirely optimal. In particular, we will wind up issuing
// a stat for many files redundantly. Aside from having potentially
// inconsistent results with respect to sorting, this is also slow.
// We could fix this here at the expense of memory by caching stat
// calls. A better fix would be to find a way to push this down into
// directory traversal itself, but that's a somewhat nasty change.
match self.kind {
SortByKind::None => {}
SortByKind::Path => {
if self.reverse {
builder.sort_by_file_name(|a, b| a.cmp(b).reverse());
} else {
builder.sort_by_file_name(|a, b| a.cmp(b));
}
}
SortByKind::LastModified => {
builder.sort_by_file_path(move |a, b| {
sort_by_metadata_time(
a, b,
self.reverse,
|md| md.modified(),
)
});
}
SortByKind::LastAccessed => {
builder.sort_by_file_path(move |a, b| {
sort_by_metadata_time(
a, b,
self.reverse,
|md| md.accessed(),
)
});
}
SortByKind::Created => {
builder.sort_by_file_path(move |a, b| {
sort_by_metadata_time(
a, b,
self.reverse,
|md| md.created(),
)
});
}
}
}
}
impl SortByKind {
fn new(kind: &str) -> SortByKind {
match kind {
"none" => SortByKind::None,
"path" => SortByKind::Path,
"modified" => SortByKind::LastModified,
"accessed" => SortByKind::LastAccessed,
"created" => SortByKind::Created,
_ => SortByKind::None,
}
}
}
impl ArgMatches {
/// Create an ArgMatches from clap's parse result.
fn new(clap_matches: clap::ArgMatches<'static>) -> ArgMatches {
@@ -542,33 +418,7 @@ impl ArgMatches {
///
/// If there was a problem building the matcher (e.g., a syntax error),
/// then this returns an error.
#[cfg(feature = "pcre2")]
fn matcher(&self, patterns: &[String]) -> Result<PatternMatcher> {
if self.is_present("pcre2") {
let matcher = self.matcher_pcre2(patterns)?;
Ok(PatternMatcher::PCRE2(matcher))
} else {
let matcher = match self.matcher_rust(patterns) {
Ok(matcher) => matcher,
Err(err) => {
return Err(From::from(suggest_pcre2(err.to_string())));
}
};
Ok(PatternMatcher::RustRegex(matcher))
}
}
/// Return the matcher that should be used for searching.
///
/// If there was a problem building the matcher (e.g., a syntax error),
/// then this returns an error.
#[cfg(not(feature = "pcre2"))]
fn matcher(&self, patterns: &[String]) -> Result<PatternMatcher> {
if self.is_present("pcre2") {
return Err(From::from(
"PCRE2 is not available in this build of ripgrep",
));
}
let matcher = self.matcher_rust(patterns)?;
Ok(PatternMatcher::RustRegex(matcher))
}
@@ -577,37 +427,18 @@ impl ArgMatches {
///
/// If there was a problem building the matcher (such as a regex syntax
/// error), then an error is returned.
fn matcher_rust(&self, patterns: &[String]) -> Result<RustRegexMatcher> {
let mut builder = RustRegexMatcherBuilder::new();
fn matcher_rust(&self, patterns: &[String]) -> Result<RegexMatcher> {
let mut builder = RegexMatcherBuilder::new();
builder
.case_smart(self.case_smart())
.case_insensitive(self.case_insensitive())
.multi_line(true)
.dot_matches_new_line(false)
.unicode(true)
.octal(false)
.word(self.is_present("word-regexp"));
if self.is_present("multiline") {
builder.dot_matches_new_line(self.is_present("multiline-dotall"));
if self.is_present("crlf") {
builder
.crlf(true)
.line_terminator(None);
}
} else {
builder
.line_terminator(Some(b'\n'))
.dot_matches_new_line(false);
if self.is_present("crlf") {
builder.crlf(true);
}
// We don't need to set this in multiline mode since mulitline
// matchers don't use optimizations related to line terminators.
// Moreover, a mulitline regex used with --null-data should
// be allowed to match NUL bytes explicitly, which this would
// otherwise forbid.
if self.is_present("null-data") {
builder.line_terminator(Some(b'\x00'));
}
if !self.is_present("multiline") {
builder.line_terminator(Some(b'\n'));
}
if let Some(limit) = self.regex_size_limit()? {
builder.size_limit(limit);
@@ -618,43 +449,6 @@ impl ArgMatches {
Ok(builder.build(&patterns.join("|"))?)
}
/// Build a matcher using PCRE2.
///
/// If there was a problem building the matcher (such as a regex syntax
/// error), then an error is returned.
#[cfg(feature = "pcre2")]
fn matcher_pcre2(&self, patterns: &[String]) -> Result<PCRE2RegexMatcher> {
let mut builder = PCRE2RegexMatcherBuilder::new();
builder
.case_smart(self.case_smart())
.caseless(self.case_insensitive())
.multi_line(true)
.word(self.is_present("word-regexp"));
// For whatever reason, the JIT craps out during regex compilation with
// a "no more memory" error on 32 bit systems. So don't use it there.
if !cfg!(target_pointer_width = "32") {
builder.jit(true);
}
if self.pcre2_unicode() {
builder.utf(true).ucp(true);
if self.encoding()?.is_some() {
// SAFETY: If an encoding was specified, then we're guaranteed
// to get valid UTF-8, so we can disable PCRE2's UTF checking.
// (Feeding invalid UTF-8 to PCRE2 is undefined behavior.)
unsafe {
builder.disable_utf_check();
}
}
}
if self.is_present("multiline") {
builder.dotall(self.is_present("multiline-dotall"));
}
if self.is_present("crlf") {
builder.crlf(true);
}
Ok(builder.build(&patterns.join("|"))?)
}
/// Build a JSON printer that writes results to the given writer.
fn printer_json<W: io::Write>(&self, wtr: W) -> Result<JSON<W>> {
let mut builder = JSONBuilder::new();
@@ -696,7 +490,7 @@ impl ArgMatches {
.max_matches(self.max_count()?)
.column(self.column())
.byte_offset(self.is_present("byte-offset"))
.trim_ascii(self.is_present("trim"))
.trim_ascii(false)
.separator_search(None)
.separator_context(Some(self.context_separator()))
.separator_field_match(b":".to_vec())
@@ -735,17 +529,9 @@ impl ArgMatches {
/// Build a searcher from the command line parameters.
fn searcher(&self, paths: &[PathBuf]) -> Result<Searcher> {
let (ctx_before, ctx_after) = self.contexts()?;
let line_term =
if self.is_present("crlf") {
LineTerminator::crlf()
} else if self.is_present("null-data") {
LineTerminator::byte(b'\x00')
} else {
LineTerminator::byte(b'\n')
};
let mut builder = SearcherBuilder::new();
builder
.line_terminator(line_term)
.line_terminator(LineTerminator::byte(b'\n'))
.invert_match(self.is_present("invert-match"))
.line_number(self.line_number(paths))
.multi_line(self.is_present("multiline"))
@@ -778,8 +564,6 @@ impl ArgMatches {
.follow_links(self.is_present("follow"))
.max_filesize(self.max_file_size()?)
.threads(self.threads()?)
.same_file_system(self.is_present("one-file-system"))
.skip_stdout(true)
.overrides(self.overrides()?)
.types(self.types()?)
.hidden(!self.hidden())
@@ -794,9 +578,9 @@ impl ArgMatches {
if !self.no_ignore() {
builder.add_custom_ignore_filename(".rgignore");
}
let sortby = self.sort_by()?;
sortby.check()?;
sortby.configure_walk_builder(&mut builder);
if self.is_present("sort-files") {
builder.sort_by_file_name(|a, b| a.cmp(b));
}
Ok(builder)
}
}
@@ -808,11 +592,7 @@ impl ArgMatches {
impl ArgMatches {
/// Returns the form of binary detection to perform.
fn binary_detection(&self) -> BinaryDetection {
let none =
self.is_present("text")
|| self.unrestricted_count() >= 3
|| self.is_present("null-data");
if none {
if self.is_present("text") || self.unrestricted_count() >= 3 {
BinaryDetection::none()
} else {
BinaryDetection::quit(b'\x00')
@@ -855,7 +635,7 @@ impl ArgMatches {
} else if preference == "ansi" {
ColorChoice::AlwaysAnsi
} else if preference == "auto" {
if cli::is_tty_stdout() || self.is_present("pretty") {
if atty::is(atty::Stream::Stdout) || self.is_present("pretty") {
ColorChoice::Auto
} else {
ColorChoice::Never
@@ -871,7 +651,15 @@ impl ArgMatches {
/// is returned.
fn color_specs(&self) -> Result<ColorSpecs> {
// Start with a default set of color specs.
let mut specs = default_color_specs();
let mut specs = vec![
#[cfg(unix)]
"path:fg:magenta".parse().unwrap(),
#[cfg(windows)]
"path:fg:cyan".parse().unwrap(),
"line:fg:green".parse().unwrap(),
"match:fg:red".parse().unwrap(),
"match:style:bold".parse().unwrap(),
];
for spec_str in self.values_of_lossy_vec("colors") {
specs.push(spec_str.parse()?);
}
@@ -907,9 +695,9 @@ impl ArgMatches {
///
/// If one was not provided, the default `--` is returned.
fn context_separator(&self) -> Vec<u8> {
match self.value_of_os("context-separator") {
match self.value_of_lossy("context-separator") {
None => b"--".to_vec(),
Some(sep) => cli::unescape_os(&sep),
Some(sep) => unescape(&sep),
}
}
@@ -947,11 +735,7 @@ impl ArgMatches {
/// encoding is present, the Searcher will still do BOM sniffing for UTF-16
/// and transcode seamlessly.
fn encoding(&self) -> Result<Option<Encoding>> {
if self.is_present("no-encoding") {
return Ok(None);
}
let label = match self.value_of_lossy("encoding") {
None if self.pcre2_unicode() => "utf-8".to_string(),
None => return Ok(None),
Some(label) => label,
};
@@ -978,13 +762,14 @@ impl ArgMatches {
})
}
/// Returns true if and only if matches should be grouped with file name
/// headings.
fn heading(&self) -> bool {
if self.is_present("no-heading") || self.is_present("vimgrep") {
false
} else {
cli::is_tty_stdout()
atty::is(atty::Stream::Stdout)
|| self.is_present("heading")
|| self.is_present("pretty")
}
@@ -1028,15 +813,12 @@ impl ArgMatches {
if self.is_present("no-line-number") {
return false;
}
if self.output_kind() == OutputKind::JSON {
return true;
}
// A few things can imply counting line numbers. In particular, we
// generally want to show line numbers by default when printing to a
// tty for human consumption, except for one interesting case: when
// we're only searching stdin. This makes pipelines work as expected.
(cli::is_tty_stdout() && !self.is_only_stdin(paths))
(atty::is(atty::Stream::Stdout) && !self.is_only_stdin(paths))
|| self.is_present("line-number")
|| self.is_present("column")
|| self.is_present("pretty")
@@ -1069,7 +851,7 @@ impl ArgMatches {
// in a data structure that depends on immutability. Generally
// speaking, the worst thing that can happen is a SIGBUS (if the
// underlying file is truncated while reading it), which will cause
// ripgrep to abort. This reasoning should be treated as suspect.
// ripgrep to abort.
let maybe = unsafe { MmapChoice::auto() };
let never = MmapChoice::never();
if self.is_present("no-mmap") {
@@ -1107,21 +889,13 @@ impl ArgMatches {
/// Determine the type of output we should produce.
fn output_kind(&self) -> OutputKind {
if self.is_present("quiet") {
// While we don't technically print results (or aggregate results)
// in quiet mode, we still support the --stats flag, and those
// stats are computed by the Summary printer for now.
return OutputKind::Summary;
} else if self.is_present("json") {
return OutputKind::JSON;
}
let (count, count_matches) = self.counts();
let summary =
count
|| count_matches
|| self.is_present("files-with-matches")
|| self.is_present("files-without-match");
|| self.is_present("files-without-match")
|| self.is_present("quiet");
if summary {
OutputKind::Summary
} else {
@@ -1171,7 +945,8 @@ impl ArgMatches {
let file_is_stdin = self.values_of_os("file")
.map_or(false, |mut files| files.any(|f| f == "-"));
let search_cwd =
!cli::is_readable_stdin()
atty::is(atty::Stream::Stdin)
|| !stdin_is_readable()
|| (self.is_present("file") && file_is_stdin)
|| self.is_present("files")
|| self.is_present("type-list");
@@ -1187,9 +962,9 @@ impl ArgMatches {
/// If the provided path separator is more than a single byte, then an
/// error is returned.
fn path_separator(&self) -> Result<Option<u8>> {
let sep = match self.value_of_os("path-separator") {
let sep = match self.value_of_lossy("path-separator") {
None => return Ok(None),
Some(sep) => cli::unescape_os(&sep),
Some(sep) => unescape(&sep),
};
if sep.is_empty() {
Ok(None)
@@ -1200,7 +975,7 @@ impl ArgMatches {
In some shells on Windows '/' is automatically \
expanded. Use '//' instead.",
sep.len(),
cli::escape(&sep),
escape(&sep),
)))
} else {
Ok(Some(sep[0]))
@@ -1247,12 +1022,18 @@ impl ArgMatches {
}
}
}
if let Some(paths) = self.values_of_os("file") {
for path in paths {
if path == "-" {
pats.extend(cli::patterns_from_stdin()?);
if let Some(files) = self.values_of_os("file") {
for file in files {
if file == "-" {
let stdin = io::stdin();
for line in stdin.lock().lines() {
pats.push(self.pattern_from_str(&line?));
}
} else {
pats.extend(cli::patterns_from_path(path)?);
let f = File::open(file)?;
for line in io::BufReader::new(f).lines() {
pats.push(self.pattern_from_str(&line?));
}
}
}
}
@@ -1274,7 +1055,7 @@ impl ArgMatches {
///
/// If the pattern is not valid UTF-8, then an error is returned.
fn pattern_from_os_str(&self, pat: &OsStr) -> Result<String> {
let s = cli::pattern_from_os(pat)?;
let s = pattern_to_str(pat)?;
Ok(self.pattern_from_str(s))
}
@@ -1324,17 +1105,6 @@ impl ArgMatches {
Some(Path::new(path).to_path_buf())
}
/// Builds the set of globs for filtering files to apply to the --pre
/// flag. If no --pre-globs are available, then this always returns an
/// empty set of globs.
fn preprocessor_globs(&self) -> Result<Override> {
let mut builder = OverrideBuilder::new(env::current_dir()?);
for glob in self.values_of_lossy_vec("pre-glob") {
builder.add(&glob)?;
}
Ok(builder.build()?)
}
/// Parse the regex-size-limit argument option into a byte count.
fn regex_size_limit(&self) -> Result<Option<usize>> {
let r = self.parse_human_readable_size("regex-size-limit")?;
@@ -1346,22 +1116,6 @@ impl ArgMatches {
self.value_of_lossy("replace").map(|s| s.into_bytes())
}
/// Returns the sorting criteria based on command line parameters.
fn sort_by(&self) -> Result<SortBy> {
// For backcompat, continue supporting deprecated --sort-files flag.
if self.is_present("sort-files") {
return Ok(SortBy::asc(SortByKind::Path));
}
let sortby = match self.value_of_lossy("sort") {
None => match self.value_of_lossy("sortr") {
None => return Ok(SortBy::none()),
Some(choice) => SortBy::desc(SortByKind::new(&choice)),
}
Some(choice) => SortBy::asc(SortByKind::new(&choice)),
};
Ok(sortby)
}
/// Returns true if and only if aggregate statistics for a search should
/// be tracked.
///
@@ -1372,6 +1126,28 @@ impl ArgMatches {
self.output_kind() == OutputKind::JSON || self.is_present("stats")
}
/// Returns a handle to stdout for filtering search.
///
/// A handle is returned if and only if ripgrep's stdout is being
/// redirected to a file. The handle returned corresponds to that file.
///
/// This can be used to ensure that we do not attempt to search a file
/// that ripgrep is writing to.
fn stdout_handle(&self) -> Option<Handle> {
let h = match Handle::stdout() {
Err(_) => return None,
Ok(h) => h,
};
let md = match h.as_file().metadata() {
Err(_) => return None,
Ok(md) => md,
};
if !md.is_file() {
return None;
}
Some(h)
}
/// When the output format is `Summary`, this returns the type of summary
/// output to show.
///
@@ -1395,7 +1171,7 @@ impl ArgMatches {
/// Return the number of threads that should be used for parallelism.
fn threads(&self) -> Result<usize> {
if self.sort_by()?.kind != SortByKind::None {
if self.is_present("sort-files") {
return Ok(1);
}
let threads = self.usize_of("threads")?.unwrap_or(0);
@@ -1430,13 +1206,6 @@ impl ArgMatches {
self.occurrences_of("unrestricted")
}
/// Returns true if and only if PCRE2's Unicode mode should be enabled.
fn pcre2_unicode(&self) -> bool {
// PCRE2 Unicode is enabled by default, so only disable it when told
// to do so explicitly.
self.is_present("pcre2") && !self.is_present("no-pcre2-unicode")
}
/// Returns true if and only if file names containing each match should
/// be emitted.
fn with_filename(&self, paths: &[PathBuf]) -> bool {
@@ -1493,11 +1262,40 @@ impl ArgMatches {
&self,
arg_name: &str,
) -> Result<Option<u64>> {
let size = match self.value_of_lossy(arg_name) {
None => return Ok(None),
Some(size) => size,
lazy_static! {
static ref RE: Regex = Regex::new(r"^([0-9]+)([KMG])?$").unwrap();
}
let arg_value = match self.value_of_lossy(arg_name) {
Some(x) => x,
None => return Ok(None)
};
Ok(Some(cli::parse_human_readable_size(&size)?))
let caps = RE
.captures(&arg_value)
.ok_or_else(|| {
format!("invalid format for {}", arg_name)
})?;
let value = caps[1].parse::<u64>()?;
let suffix = caps.get(2).map(|x| x.as_str());
let v_10 = value.checked_mul(1024);
let v_20 = v_10.and_then(|x| x.checked_mul(1024));
let v_30 = v_20.and_then(|x| x.checked_mul(1024));
let try_suffix = |x: Option<u64>| {
if x.is_some() {
Ok(x)
} else {
Err(From::from(format!("number too large for {}", arg_name)))
}
};
match suffix {
None => Ok(Some(value)),
Some("K") => try_suffix(v_10),
Some("M") => try_suffix(v_20),
Some("G") => try_suffix(v_30),
_ => Err(From::from(format!("invalid suffix for {}", arg_name)))
}
}
}
@@ -1531,19 +1329,19 @@ impl ArgMatches {
}
}
/// Inspect an error resulting from building a Rust regex matcher, and if it's
/// believed to correspond to a syntax error that PCRE2 could handle, then
/// add a message to suggest the use of -P/--pcre2.
#[cfg(feature = "pcre2")]
fn suggest_pcre2(msg: String) -> String {
if !msg.contains("backreferences") && !msg.contains("look-around") {
msg
} else {
format!("{}
Consider enabling PCRE2 with the --pcre2 flag, which can handle backreferences
and look-around.", msg)
}
/// Convert an OsStr to a Unicode string.
///
/// Patterns _must_ be valid UTF-8, so if the given OsStr isn't valid UTF-8,
/// this returns an error.
fn pattern_to_str(s: &OsStr) -> Result<&str> {
s.to_str().ok_or_else(|| {
From::from(format!(
"Argument '{}' is not valid UTF-8. \
Use hex escape sequences to match arbitrary \
bytes in a pattern (e.g., \\xFF).",
s.to_string_lossy()
))
})
}
/// Convert the result of parsing a human readable file size to a `usize`,
@@ -1565,30 +1363,22 @@ fn u64_to_usize(
}
}
/// Builds a comparator for sorting two files according to a system time
/// extracted from the file's metadata.
///
/// If there was a problem extracting the metadata or if the time is not
/// available, then both entries compare equal.
fn sort_by_metadata_time<G>(
p1: &Path,
p2: &Path,
reverse: bool,
get_time: G,
) -> cmp::Ordering
where G: Fn(&fs::Metadata) -> io::Result<SystemTime>
{
let t1 = match p1.metadata().and_then(|md| get_time(&md)) {
Ok(t) => t,
Err(_) => return cmp::Ordering::Equal,
/// Returns true if and only if stdin is deemed searchable.
#[cfg(unix)]
fn stdin_is_readable() -> bool {
use std::os::unix::fs::FileTypeExt;
let ft = match Handle::stdin().and_then(|h| h.as_file().metadata()) {
Err(_) => return false,
Ok(md) => md.file_type(),
};
let t2 = match p2.metadata().and_then(|md| get_time(&md)) {
Ok(t) => t,
Err(_) => return cmp::Ordering::Equal,
};
if reverse {
t1.cmp(&t2).reverse()
} else {
t1.cmp(&t2)
}
ft.is_file() || ft.is_fifo()
}
/// Returns true if and only if stdin is deemed searchable.
#[cfg(windows)]
fn stdin_is_readable() -> bool {
// On Windows, it's not clear what the possibilities are to me, so just
// always return true.
true
}

190
src/decompressor.rs Normal file
View File

@@ -0,0 +1,190 @@
use std::collections::HashMap;
use std::ffi::OsStr;
use std::fmt;
use std::io::{self, Read};
use std::path::Path;
use std::process::{self, Stdio};
use globset::{Glob, GlobSet, GlobSetBuilder};
/// A decompression command, contains the command to be spawned as well as any
/// necessary CLI args.
#[derive(Clone, Copy, Debug)]
struct DecompressionCommand {
cmd: &'static str,
args: &'static [&'static str],
}
impl DecompressionCommand {
/// Create a new decompress command
fn new(
cmd: &'static str,
args: &'static [&'static str],
) -> DecompressionCommand {
DecompressionCommand {
cmd, args
}
}
}
impl fmt::Display for DecompressionCommand {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{} {}", self.cmd, self.args.join(" "))
}
}
lazy_static! {
static ref DECOMPRESSION_COMMANDS: HashMap<
&'static str,
DecompressionCommand,
> = {
let mut m = HashMap::new();
const ARGS: &[&str] = &["-d", "-c"];
m.insert("gz", DecompressionCommand::new("gzip", ARGS));
m.insert("bz2", DecompressionCommand::new("bzip2", ARGS));
m.insert("xz", DecompressionCommand::new("xz", ARGS));
m.insert("lz4", DecompressionCommand::new("lz4", ARGS));
const LZMA_ARGS: &[&str] = &["--format=lzma", "-d", "-c"];
m.insert("lzma", DecompressionCommand::new("xz", LZMA_ARGS));
m
};
static ref SUPPORTED_COMPRESSION_FORMATS: GlobSet = {
let mut builder = GlobSetBuilder::new();
builder.add(Glob::new("*.gz").unwrap());
builder.add(Glob::new("*.bz2").unwrap());
builder.add(Glob::new("*.xz").unwrap());
builder.add(Glob::new("*.lz4").unwrap());
builder.add(Glob::new("*.lzma").unwrap());
builder.build().unwrap()
};
static ref TAR_ARCHIVE_FORMATS: GlobSet = {
let mut builder = GlobSetBuilder::new();
builder.add(Glob::new("*.tar.gz").unwrap());
builder.add(Glob::new("*.tar.xz").unwrap());
builder.add(Glob::new("*.tar.bz2").unwrap());
builder.add(Glob::new("*.tar.lz4").unwrap());
builder.add(Glob::new("*.tgz").unwrap());
builder.add(Glob::new("*.txz").unwrap());
builder.add(Glob::new("*.tbz2").unwrap());
builder.build().unwrap()
};
}
/// DecompressionReader provides an `io::Read` implementation for a limited
/// set of compression formats.
#[derive(Debug)]
pub struct DecompressionReader {
cmd: DecompressionCommand,
child: process::Child,
done: bool,
}
impl DecompressionReader {
/// Returns a handle to the stdout of the spawned decompression process for
/// `path`, which can be directly searched in the worker. When the returned
/// value is exhausted, the underlying process is reaped. If the underlying
/// process fails, then its stderr is read and converted into a normal
/// io::Error.
///
/// If there is any error in spawning the decompression command, then
/// return `None`, after outputting any necessary debug or error messages.
pub fn from_path(path: &Path) -> Option<DecompressionReader> {
let extension = match path.extension().and_then(OsStr::to_str) {
Some(extension) => extension,
None => {
debug!(
"{}: failed to get compresson extension", path.display());
return None;
}
};
let decompression_cmd = match DECOMPRESSION_COMMANDS.get(extension) {
Some(cmd) => cmd,
None => {
debug!(
"{}: failed to get decompression command", path.display());
return None;
}
};
let cmd = process::Command::new(decompression_cmd.cmd)
.args(decompression_cmd.args)
.arg(path)
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn();
let child = match cmd {
Ok(process) => process,
Err(_) => {
debug!(
"{}: decompression command '{}' not found",
path.display(), decompression_cmd.cmd);
return None;
}
};
Some(DecompressionReader::new(*decompression_cmd, child))
}
fn new(
cmd: DecompressionCommand,
child: process::Child,
) -> DecompressionReader {
DecompressionReader {
cmd: cmd,
child: child,
done: false,
}
}
fn read_error(&mut self) -> io::Result<io::Error> {
let mut errbytes = vec![];
self.child.stderr.as_mut().unwrap().read_to_end(&mut errbytes)?;
let errstr = String::from_utf8_lossy(&errbytes);
let errstr = errstr.trim();
Ok(if errstr.is_empty() {
let msg = format!("decompression command failed: '{}'", self.cmd);
io::Error::new(io::ErrorKind::Other, msg)
} else {
let msg = format!(
"decompression command '{}' failed: {}", self.cmd, errstr);
io::Error::new(io::ErrorKind::Other, msg)
})
}
}
impl io::Read for DecompressionReader {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
if self.done {
return Ok(0);
}
let nread = self.child.stdout.as_mut().unwrap().read(buf)?;
if nread == 0 {
self.done = true;
// Reap the child now that we're done reading.
// If the command failed, report stderr as an error.
if !self.child.wait()?.success() {
return Err(self.read_error()?);
}
}
Ok(nread)
}
}
/// Returns true if the given path contains a supported compression format or
/// is a TAR archive.
pub fn is_compressed(path: &Path) -> bool {
is_supported_compression_format(path) || is_tar_archive(path)
}
/// Returns true if the given path matches any one of the supported compression
/// formats
fn is_supported_compression_format(path: &Path) -> bool {
SUPPORTED_COMPRESSION_FORMATS.is_match(path)
}
/// Returns true if the given path matches any of the known TAR file formats.
fn is_tar_archive(path: &Path) -> bool {
TAR_ARCHIVE_FORMATS.is_match(path)
}

View File

@@ -1,5 +1,7 @@
extern crate atty;
#[macro_use]
extern crate clap;
extern crate globset;
extern crate grep;
extern crate ignore;
#[macro_use]
@@ -8,11 +10,12 @@ extern crate lazy_static;
extern crate log;
extern crate num_cpus;
extern crate regex;
#[macro_use]
extern crate serde_json;
extern crate same_file;
extern crate termcolor;
#[cfg(windows)]
extern crate winapi;
use std::io::{self, Write};
use std::io;
use std::process;
use std::sync::{Arc, Mutex};
use std::time::Instant;
@@ -28,15 +31,18 @@ mod messages;
mod app;
mod args;
mod config;
mod decompressor;
mod preprocessor;
mod logger;
mod path_printer;
mod search;
mod subject;
mod unescape;
type Result<T> = ::std::result::Result<T, Box<::std::error::Error>>;
pub type Result<T> = ::std::result::Result<T, Box<::std::error::Error>>;
fn main() {
match Args::parse().and_then(try_main) {
pub fn main() {
match Args::parse().and_then(run) {
Ok(true) => process::exit(0),
Ok(false) => process::exit(1),
Err(err) => {
@@ -46,7 +52,7 @@ fn main() {
}
}
fn try_main(args: Args) -> Result<bool> {
fn run(args: Args) -> Result<bool> {
use args::Command::*;
match args.command()? {
@@ -97,7 +103,7 @@ fn search(args: Args) -> Result<bool> {
if let Some(ref stats) = stats {
let elapsed = Instant::now().duration_since(started_at);
// We don't care if we couldn't print this successfully.
let _ = searcher.print_stats(elapsed, stats);
let _ = searcher.printer().print_stats(elapsed, stats);
}
Ok(matched)
}
@@ -175,7 +181,7 @@ fn search_parallel(args: Args) -> Result<bool> {
let stats = locked_stats.lock().unwrap();
let mut searcher = args.search_worker(args.stdout())?;
// We don't care if we couldn't print this successfully.
let _ = searcher.print_stats(elapsed, &stats);
let _ = searcher.printer().print_stats(elapsed, &stats);
}
Ok(matched.load(SeqCst))
}

93
src/preprocessor.rs Normal file
View File

@@ -0,0 +1,93 @@
use std::fs::File;
use std::io::{self, Read};
use std::path::{Path, PathBuf};
use std::process::{self, Stdio};
/// PreprocessorReader provides an `io::Read` impl to read kids output.
#[derive(Debug)]
pub struct PreprocessorReader {
cmd: PathBuf,
path: PathBuf,
child: process::Child,
done: bool,
}
impl PreprocessorReader {
/// Returns a handle to the stdout of the spawned preprocessor process for
/// `path`, which can be directly searched in the worker. When the returned
/// value is exhausted, the underlying process is reaped. If the underlying
/// process fails, then its stderr is read and converted into a normal
/// io::Error.
///
/// If there is any error in spawning the preprocessor command, then
/// return the corresponding error.
pub fn from_cmd_path(
cmd: PathBuf,
path: &Path,
) -> io::Result<PreprocessorReader> {
let child = process::Command::new(&cmd)
.arg(path)
.stdin(Stdio::from(File::open(path)?))
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
.map_err(|err| {
io::Error::new(
io::ErrorKind::Other,
format!(
"error running preprocessor command '{}': {}",
cmd.display(),
err,
),
)
})?;
Ok(PreprocessorReader {
cmd: cmd,
path: path.to_path_buf(),
child: child,
done: false,
})
}
fn read_error(&mut self) -> io::Result<io::Error> {
let mut errbytes = vec![];
self.child.stderr.as_mut().unwrap().read_to_end(&mut errbytes)?;
let errstr = String::from_utf8_lossy(&errbytes);
let errstr = errstr.trim();
Ok(if errstr.is_empty() {
let msg = format!(
"preprocessor command failed: '{} {}'",
self.cmd.display(),
self.path.display(),
);
io::Error::new(io::ErrorKind::Other, msg)
} else {
let msg = format!(
"preprocessor command failed: '{} {}': {}",
self.cmd.display(),
self.path.display(),
errstr,
);
io::Error::new(io::ErrorKind::Other, msg)
})
}
}
impl io::Read for PreprocessorReader {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
if self.done {
return Ok(0);
}
let nread = self.child.stdout.as_mut().unwrap().read(buf)?;
if nread == 0 {
self.done = true;
// Reap the child now that we're done reading.
// If the command failed, report stderr as an error.
if !self.child.wait()?.success() {
return Err(self.read_error()?);
}
}
Ok(nread)
}
}

View File

@@ -1,20 +1,15 @@
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};
use std::process::{Command, Stdio};
use std::time::Duration;
use grep::cli;
use grep::matcher::Matcher;
#[cfg(feature = "pcre2")]
use grep::pcre2::{RegexMatcher as PCRE2RegexMatcher};
use grep::printer::{JSON, Standard, Summary, Stats};
use grep::regex::{RegexMatcher as RustRegexMatcher};
use grep::regex::RegexMatcher;
use grep::searcher::Searcher;
use ignore::overrides::Override;
use serde_json as json;
use termcolor::WriteColor;
use decompressor::{DecompressionReader, is_compressed};
use preprocessor::PreprocessorReader;
use subject::Subject;
/// The configuration for the search worker. Among a few other things, the
@@ -22,18 +17,14 @@ use subject::Subject;
/// at a very high level.
#[derive(Clone, Debug)]
struct Config {
json_stats: bool,
preprocessor: Option<PathBuf>,
preprocessor_globs: Override,
search_zip: bool,
}
impl Default for Config {
fn default() -> Config {
Config {
json_stats: false,
preprocessor: None,
preprocessor_globs: Override::empty(),
search_zip: false,
}
}
@@ -43,8 +34,6 @@ impl Default for Config {
#[derive(Clone, Debug)]
pub struct SearchWorkerBuilder {
config: Config,
command_builder: cli::CommandReaderBuilder,
decomp_builder: cli::DecompressionReaderBuilder,
}
impl Default for SearchWorkerBuilder {
@@ -56,17 +45,7 @@ impl Default for SearchWorkerBuilder {
impl SearchWorkerBuilder {
/// Create a new builder for configuring and constructing a search worker.
pub fn new() -> SearchWorkerBuilder {
let mut cmd_builder = cli::CommandReaderBuilder::new();
cmd_builder.async_stderr(true);
let mut decomp_builder = cli::DecompressionReaderBuilder::new();
decomp_builder.async_stderr(true);
SearchWorkerBuilder {
config: Config::default(),
command_builder: cmd_builder,
decomp_builder: decomp_builder,
}
SearchWorkerBuilder { config: Config::default() }
}
/// Create a new search worker using the given searcher, matcher and
@@ -78,24 +57,7 @@ impl SearchWorkerBuilder {
printer: Printer<W>,
) -> SearchWorker<W> {
let config = self.config.clone();
let command_builder = self.command_builder.clone();
let decomp_builder = self.decomp_builder.clone();
SearchWorker {
config, command_builder, decomp_builder,
matcher, searcher, printer,
}
}
/// Forcefully use JSON to emit statistics, even if the underlying printer
/// is not the JSON printer.
///
/// This is useful for implementing flag combinations like
/// `--json --quiet`, which uses the summary printer for implementing
/// `--quiet` but still wants to emit summary statistics, which should
/// be JSON formatted because of the `--json` flag.
pub fn json_stats(&mut self, yes: bool) -> &mut SearchWorkerBuilder {
self.config.json_stats = yes;
self
SearchWorker { config, matcher, searcher, printer }
}
/// Set the path to a preprocessor command.
@@ -111,17 +73,6 @@ impl SearchWorkerBuilder {
self
}
/// Set the globs for determining which files should be run through the
/// preprocessor. By default, with no globs and a preprocessor specified,
/// every file is run through the preprocessor.
pub fn preprocessor_globs(
&mut self,
globs: Override,
) -> &mut SearchWorkerBuilder {
self.config.preprocessor_globs = globs;
self
}
/// Enable the decompression and searching of common compressed files.
///
/// When enabled, if a particular file path is recognized as a compressed
@@ -165,9 +116,7 @@ impl SearchResult {
/// The pattern matcher used by a search worker.
#[derive(Clone, Debug)]
pub enum PatternMatcher {
RustRegex(RustRegexMatcher),
#[cfg(feature = "pcre2")]
PCRE2(PCRE2RegexMatcher),
RustRegex(RegexMatcher),
}
/// The printer used by a search worker.
@@ -185,15 +134,19 @@ pub enum Printer<W> {
}
impl<W: WriteColor> Printer<W> {
fn print_stats(
/// Print the given statistics to the underlying writer in a way that is
/// consistent with this printer's format.
///
/// While `Stats` contains a duration itself, this only corresponds to the
/// time spent searching, where as `total_duration` should roughly
/// approximate the lifespan of the ripgrep process itself.
pub fn print_stats(
&mut self,
total_duration: Duration,
stats: &Stats,
) -> io::Result<()> {
match *self {
Printer::JSON(_) => {
self.print_stats_json(total_duration, stats)
}
Printer::JSON(_) => unimplemented!(),
Printer::Standard(_) | Printer::Summary(_) => {
self.print_stats_human(total_duration, stats)
}
@@ -214,8 +167,8 @@ impl<W: WriteColor> Printer<W> {
{searches} files searched
{bytes_printed} bytes printed
{bytes_searched} bytes searched
{search_time:0.6} seconds spent searching
{process_time:0.6} seconds
{search_time:.6} seconds spent searching
{process_time:.6} seconds
",
matches = stats.matches(),
lines = stats.matched_lines(),
@@ -228,29 +181,6 @@ impl<W: WriteColor> Printer<W> {
)
}
fn print_stats_json(
&mut self,
total_duration: Duration,
stats: &Stats,
) -> io::Result<()> {
// We specifically match the format laid out by the JSON printer in
// the grep-printer crate. We simply "extend" it with the 'summary'
// message type.
let fractional = fractional_seconds(total_duration);
json::to_writer(self.get_mut(), &json!({
"type": "summary",
"data": {
"stats": stats,
"elapsed_total": {
"secs": total_duration.as_secs(),
"nanos": total_duration.subsec_nanos(),
"human": format!("{:0.6}s", fractional),
},
}
}))?;
write!(self.get_mut(), "\n")
}
/// Return a mutable reference to the underlying printer's writer.
pub fn get_mut(&mut self) -> &mut W {
match *self {
@@ -269,8 +199,6 @@ impl<W: WriteColor> Printer<W> {
#[derive(Debug)]
pub struct SearchWorker<W> {
config: Config,
command_builder: cli::CommandReaderBuilder,
decomp_builder: cli::DecompressionReaderBuilder,
matcher: PatternMatcher,
searcher: Searcher,
printer: Printer<W>,
@@ -287,24 +215,6 @@ impl<W: WriteColor> SearchWorker<W> {
&mut self.printer
}
/// Print the given statistics to the underlying writer in a way that is
/// consistent with this searcher's printer's format.
///
/// While `Stats` contains a duration itself, this only corresponds to the
/// time spent searching, where as `total_duration` should roughly
/// approximate the lifespan of the ripgrep process itself.
pub fn print_stats(
&mut self,
total_duration: Duration,
stats: &Stats,
) -> io::Result<()> {
if self.config.json_stats {
self.printer().print_stats_json(total_duration, stats)
} else {
self.printer().print_stats(total_duration, stats)
}
}
/// Search the given subject using the appropriate strategy.
fn search_impl(&mut self, subject: &Subject) -> io::Result<SearchResult> {
let path = subject.path();
@@ -312,66 +222,20 @@ impl<W: WriteColor> SearchWorker<W> {
let stdin = io::stdin();
// A `return` here appeases the borrow checker. NLL will fix this.
return self.search_reader(path, stdin.lock());
} else if self.should_preprocess(path) {
self.search_preprocessor(path)
} else if self.should_decompress(path) {
self.search_decompress(path)
} else if self.config.preprocessor.is_some() {
let cmd = self.config.preprocessor.clone().unwrap();
let rdr = PreprocessorReader::from_cmd_path(cmd, path)?;
self.search_reader(path, rdr)
} else if self.config.search_zip && is_compressed(path) {
match DecompressionReader::from_path(path) {
None => Ok(SearchResult::default()),
Some(rdr) => self.search_reader(path, rdr),
}
} else {
self.search_path(path)
}
}
/// Returns true if and only if the given file path should be
/// decompressed before searching.
fn should_decompress(&self, path: &Path) -> bool {
if !self.config.search_zip {
return false;
}
self.decomp_builder.get_matcher().has_command(path)
}
/// Returns true if and only if the given file path should be run through
/// the preprocessor.
fn should_preprocess(&self, path: &Path) -> bool {
if !self.config.preprocessor.is_some() {
return false;
}
if self.config.preprocessor_globs.is_empty() {
return true;
}
!self.config.preprocessor_globs.matched(path, false).is_ignore()
}
/// Search the given file path by first asking the preprocessor for the
/// data to search instead of opening the path directly.
fn search_preprocessor(
&mut self,
path: &Path,
) -> io::Result<SearchResult> {
let bin = self.config.preprocessor.clone().unwrap();
let mut cmd = Command::new(&bin);
cmd.arg(path).stdin(Stdio::from(File::open(path)?));
let rdr = self.command_builder.build(&mut cmd)?;
self.search_reader(path, rdr).map_err(|err| {
io::Error::new(
io::ErrorKind::Other,
format!("preprocessor command failed: '{:?}': {}", cmd, err),
)
})
}
/// Attempt to decompress the data at the given file path and search the
/// result. If the given file path isn't recognized as a compressed file,
/// then search it without doing any decompression.
fn search_decompress(
&mut self,
path: &Path,
) -> io::Result<SearchResult> {
let rdr = self.decomp_builder.build(path)?;
self.search_reader(path, rdr)
}
/// Search the contents of the given file path.
fn search_path(&mut self, path: &Path) -> io::Result<SearchResult> {
use self::PatternMatcher::*;
@@ -379,8 +243,6 @@ impl<W: WriteColor> SearchWorker<W> {
let (searcher, printer) = (&mut self.searcher, &mut self.printer);
match self.matcher {
RustRegex(ref m) => search_path(m, searcher, printer, path),
#[cfg(feature = "pcre2")]
PCRE2(ref m) => search_path(m, searcher, printer, path),
}
}
@@ -403,8 +265,6 @@ impl<W: WriteColor> SearchWorker<W> {
let (searcher, printer) = (&mut self.searcher, &mut self.printer);
match self.matcher {
RustRegex(ref m) => search_reader(m, searcher, printer, path, rdr),
#[cfg(feature = "pcre2")]
PCRE2(ref m) => search_reader(m, searcher, printer, path, rdr),
}
}
}

View File

@@ -1,17 +1,26 @@
use std::io;
use std::path::Path;
use std::sync::Arc;
use ignore::{self, DirEntry};
use same_file::Handle;
/// A configuration for describing how subjects should be built.
#[derive(Clone, Debug)]
struct Config {
skip: Option<Arc<Handle>>,
strip_dot_prefix: bool,
separator: Option<u8>,
terminator: Option<u8>,
}
impl Default for Config {
fn default() -> Config {
Config {
skip: None,
strip_dot_prefix: false,
separator: None,
terminator: None,
}
}
}
@@ -62,6 +71,27 @@ impl SubjectBuilder {
if subj.dent.is_stdin() {
return Some(subj);
}
// If we're supposed to skip a particular file, then skip it.
if let Some(ref handle) = self.config.skip {
match subj.equals(handle) {
Ok(false) => {} // fallthrough
Ok(true) => {
debug!(
"ignoring {}: (probably same file as stdout)",
subj.dent.path().display()
);
return None;
}
Err(err) => {
message!("{}: {}", subj.dent.path().display(), err);
debug!(
"ignoring {}: got error: {}",
subj.dent.path().display(), err
);
return None;
}
}
}
// If this subject has a depth of 0, then it was provided explicitly
// by an end user (or via a shell glob). In this case, we always want
// to search it if it even smells like a file (e.g., a symlink).
@@ -90,6 +120,22 @@ impl SubjectBuilder {
None
}
/// When provided, subjects that represent the same file as the handle
/// given will be skipped.
///
/// Typically, it is useful to pass a handle referring to stdout, such
/// that the file being written to isn't searched, which can lead to
/// an unbounded feedback mechanism.
///
/// Only one handle to skip can be provided.
pub fn skip(
&mut self,
handle: Option<Handle>,
) -> &mut SubjectBuilder {
self.config.skip = handle.map(Arc::new);
self
}
/// When enabled, if the subject's file path starts with `./` then it is
/// stripped.
///
@@ -127,12 +173,59 @@ impl Subject {
}
/// Returns true if and only if this subject points to a directory.
///
/// This works around a bug in Rust's standard library:
/// https://github.com/rust-lang/rust/issues/46484
#[cfg(windows)]
fn is_dir(&self) -> bool {
use std::os::windows::fs::MetadataExt;
use winapi::um::winnt::FILE_ATTRIBUTE_DIRECTORY;
self.dent.metadata().map(|md| {
md.file_attributes() & FILE_ATTRIBUTE_DIRECTORY != 0
}).unwrap_or(false)
}
/// Returns true if and only if this subject points to a directory.
#[cfg(not(windows))]
fn is_dir(&self) -> bool {
self.dent.file_type().map_or(false, |ft| ft.is_dir())
}
/// Returns true if and only if this subject points to a file.
///
/// This works around a bug in Rust's standard library:
/// https://github.com/rust-lang/rust/issues/46484
#[cfg(windows)]
fn is_file(&self) -> bool {
!self.is_dir()
}
/// Returns true if and only if this subject points to a file.
#[cfg(not(windows))]
fn is_file(&self) -> bool {
self.dent.file_type().map_or(false, |ft| ft.is_file())
}
/// Returns true if and only if this subject is believed to be equivalent
/// to the given handle. If there was a problem querying this subject for
/// information to determine equality, then that error is returned.
fn equals(&self, handle: &Handle) -> io::Result<bool> {
#[cfg(unix)]
fn never_equal(dent: &DirEntry, handle: &Handle) -> bool {
dent.ino() != Some(handle.ino())
}
#[cfg(not(unix))]
fn never_equal(_: &DirEntry, _: &Handle) -> bool {
false
}
// If we know for sure that these two things aren't equal, then avoid
// the costly extra stat call to determine equality.
if self.dent.is_stdin() || never_equal(&self.dent, handle) {
return Ok(false);
}
Handle::from_path(self.path()).map(|h| &h == handle)
}
}

137
src/unescape.rs Normal file
View File

@@ -0,0 +1,137 @@
/// A single state in the state machine used by `unescape`.
#[derive(Clone, Copy, Eq, PartialEq)]
enum State {
/// The state after seeing a `\`.
Escape,
/// The state after seeing a `\x`.
HexFirst,
/// The state after seeing a `\x[0-9A-Fa-f]`.
HexSecond(char),
/// Default state.
Literal,
}
/// Escapes an arbitrary byte slice such that it can be presented as a human
/// readable string.
pub fn escape(bytes: &[u8]) -> String {
use std::ascii::escape_default;
let escaped = bytes.iter().flat_map(|&b| escape_default(b)).collect();
String::from_utf8(escaped).unwrap()
}
/// Unescapes a string given on the command line. It supports a limited set of
/// escape sequences:
///
/// * `\t`, `\r` and `\n` are mapped to their corresponding ASCII bytes.
/// * `\xZZ` hexadecimal escapes are mapped to their byte.
pub fn unescape(s: &str) -> Vec<u8> {
use self::State::*;
let mut bytes = vec![];
let mut state = Literal;
for c in s.chars() {
match state {
Escape => {
match c {
'n' => { bytes.push(b'\n'); state = Literal; }
'r' => { bytes.push(b'\r'); state = Literal; }
't' => { bytes.push(b'\t'); state = Literal; }
'x' => { state = HexFirst; }
c => {
bytes.extend(format!(r"\{}", c).into_bytes());
state = Literal;
}
}
}
HexFirst => {
match c {
'0'...'9' | 'A'...'F' | 'a'...'f' => {
state = HexSecond(c);
}
c => {
bytes.extend(format!(r"\x{}", c).into_bytes());
state = Literal;
}
}
}
HexSecond(first) => {
match c {
'0'...'9' | 'A'...'F' | 'a'...'f' => {
let ordinal = format!("{}{}", first, c);
let byte = u8::from_str_radix(&ordinal, 16).unwrap();
bytes.push(byte);
state = Literal;
}
c => {
let original = format!(r"\x{}{}", first, c);
bytes.extend(original.into_bytes());
state = Literal;
}
}
}
Literal => {
match c {
'\\' => { state = Escape; }
c => { bytes.extend(c.to_string().as_bytes()); }
}
}
}
}
match state {
Escape => bytes.push(b'\\'),
HexFirst => bytes.extend(b"\\x"),
HexSecond(c) => bytes.extend(format!("\\x{}", c).into_bytes()),
Literal => {}
}
bytes
}
#[cfg(test)]
mod tests {
use super::unescape;
fn b(bytes: &'static [u8]) -> Vec<u8> {
bytes.to_vec()
}
#[test]
fn unescape_nul() {
assert_eq!(b(b"\x00"), unescape(r"\x00"));
}
#[test]
fn unescape_nl() {
assert_eq!(b(b"\n"), unescape(r"\n"));
}
#[test]
fn unescape_tab() {
assert_eq!(b(b"\t"), unescape(r"\t"));
}
#[test]
fn unescape_carriage() {
assert_eq!(b(b"\r"), unescape(r"\r"));
}
#[test]
fn unescape_nothing_simple() {
assert_eq!(b(b"\\a"), unescape(r"\a"));
}
#[test]
fn unescape_nothing_hex0() {
assert_eq!(b(b"\\x"), unescape(r"\x"));
}
#[test]
fn unescape_nothing_hex1() {
assert_eq!(b(b"\\xz"), unescape(r"\xz"));
}
#[test]
fn unescape_nothing_hex2() {
assert_eq!(b(b"\\xzz"), unescape(r"\xzz"));
}
}

View File

@@ -1,631 +0,0 @@
use hay::{SHERLOCK, SHERLOCK_CRLF};
use util::{Dir, TestCommand, sort_lines};
// See: https://github.com/BurntSushi/ripgrep/issues/1
rgtest!(f1_sjis, |dir: Dir, mut cmd: TestCommand| {
dir.create_bytes(
"foo",
b"\x84Y\x84u\x84\x82\x84|\x84\x80\x84{ \x84V\x84\x80\x84|\x84}\x84\x83"
);
cmd.arg("-Esjis").arg("Шерлок Холмс");
eqnice!("foo:Шерлок Холмс\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/1
rgtest!(f1_utf16_auto, |dir: Dir, mut cmd: TestCommand| {
dir.create_bytes(
"foo",
b"\xff\xfe(\x045\x04@\x04;\x04>\x04:\x04 \x00%\x04>\x04;\x04<\x04A\x04"
);
cmd.arg("Шерлок Холмс");
eqnice!("foo:Шерлок Холмс\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/1
rgtest!(f1_utf16_explicit, |dir: Dir, mut cmd: TestCommand| {
dir.create_bytes(
"foo",
b"\xff\xfe(\x045\x04@\x04;\x04>\x04:\x04 \x00%\x04>\x04;\x04<\x04A\x04"
);
cmd.arg("-Eutf-16le").arg("Шерлок Холмс");
eqnice!("foo:Шерлок Холмс\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/1
rgtest!(f1_eucjp, |dir: Dir, mut cmd: TestCommand| {
dir.create_bytes(
"foo",
b"\xa7\xba\xa7\xd6\xa7\xe2\xa7\xdd\xa7\xe0\xa7\xdc \xa7\xb7\xa7\xe0\xa7\xdd\xa7\xde\xa7\xe3"
);
cmd.arg("-Eeuc-jp").arg("Шерлок Холмс");
eqnice!("foo:Шерлок Холмс\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/1
rgtest!(f1_unknown_encoding, |_: Dir, mut cmd: TestCommand| {
cmd.arg("-Efoobar").assert_non_empty_stderr();
});
// See: https://github.com/BurntSushi/ripgrep/issues/1
rgtest!(f1_replacement_encoding, |_: Dir, mut cmd: TestCommand| {
cmd.arg("-Ecsiso2022kr").assert_non_empty_stderr();
});
// See: https://github.com/BurntSushi/ripgrep/issues/7
rgtest!(f7, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("pat", "Sherlock\nHolmes");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.arg("-fpat").arg("sherlock").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/7
rgtest!(f7_stdin, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.arg("-f-").pipe("Sherlock"));
});
// See: https://github.com/BurntSushi/ripgrep/issues/20
rgtest!(f20_no_filename, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--no-filename");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.arg("--no-filename").arg("Sherlock").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/34
rgtest!(f34_only_matching, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
let expected = "\
sherlock:Sherlock
sherlock:Sherlock
";
eqnice!(expected, cmd.arg("-o").arg("Sherlock").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/34
rgtest!(f34_only_matching_line_column, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
let expected = "\
sherlock:1:57:Sherlock
sherlock:3:49:Sherlock
";
cmd.arg("-o").arg("--column").arg("-n").arg("Sherlock");
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/45
rgtest!(f45_relative_cwd, |dir: Dir, mut cmd: TestCommand| {
dir.create(".not-an-ignore", "foo\n/bar");
dir.create_dir("bar");
dir.create_dir("baz/bar");
dir.create_dir("baz/baz/bar");
dir.create("bar/test", "test");
dir.create("baz/bar/test", "test");
dir.create("baz/baz/bar/test", "test");
dir.create("baz/foo", "test");
dir.create("baz/test", "test");
dir.create("foo", "test");
dir.create("test", "test");
cmd.arg("-l").arg("test");
// First, get a baseline without applying ignore rules.
let expected = "
bar/test
baz/bar/test
baz/baz/bar/test
baz/foo
baz/test
foo
test
";
eqnice!(sort_lines(expected), sort_lines(&cmd.stdout()));
// Now try again with the ignore file activated.
cmd.arg("--ignore-file").arg(".not-an-ignore");
let expected = "
baz/bar/test
baz/baz/bar/test
baz/test
test
";
eqnice!(sort_lines(expected), sort_lines(&cmd.stdout()));
// Now do it again, but inside the baz directory. Since the ignore file
// is interpreted relative to the CWD, this will cause the /bar anchored
// pattern to filter out baz/bar, which is a subtle difference between true
// parent ignore files and manually specified ignore files.
let mut cmd = dir.command();
cmd.args(&["--ignore-file", "../.not-an-ignore", "-l", "test"]);
cmd.current_dir(dir.path().join("baz"));
let expected = "
baz/bar/test
test
";
eqnice!(sort_lines(expected), sort_lines(&cmd.stdout()));
});
// See: https://github.com/BurntSushi/ripgrep/issues/45
rgtest!(f45_precedence_with_others, |dir: Dir, mut cmd: TestCommand| {
dir.create(".not-an-ignore", "*.log");
dir.create(".ignore", "!imp.log");
dir.create("imp.log", "test");
dir.create("wat.log", "test");
cmd.arg("--ignore-file").arg(".not-an-ignore").arg("test");
eqnice!("imp.log:test\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/45
rgtest!(f45_precedence_internal, |dir: Dir, mut cmd: TestCommand| {
dir.create(".not-an-ignore1", "*.log");
dir.create(".not-an-ignore2", "!imp.log");
dir.create("imp.log", "test");
dir.create("wat.log", "test");
cmd.args(&[
"--ignore-file", ".not-an-ignore1",
"--ignore-file", ".not-an-ignore2",
"test",
]);
eqnice!("imp.log:test\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/68
rgtest!(f68_no_ignore_vcs, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");
dir.create(".gitignore", "foo");
dir.create(".ignore", "bar");
dir.create("foo", "test");
dir.create("bar", "test");
eqnice!("foo:test\n", cmd.arg("--no-ignore-vcs").arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/70
rgtest!(f70_smart_case, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.arg("-S").arg("sherlock").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/89
rgtest!(f89_files_with_matches, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--null").arg("--files-with-matches").arg("Sherlock");
eqnice!("sherlock\x00", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/89
rgtest!(f89_files_without_match, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file.py", "foo");
cmd.arg("--null").arg("--files-without-match").arg("Sherlock");
eqnice!("file.py\x00", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/89
rgtest!(f89_count, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--null").arg("--count").arg("Sherlock");
eqnice!("sherlock\x002\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/89
rgtest!(f89_files, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
eqnice!("sherlock\x00", cmd.arg("--null").arg("--files").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/89
rgtest!(f89_match, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
let expected = "\
sherlock\x00For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock\x00Holmeses, success in the province of detective work must always
sherlock\x00be, to a very large extent, the result of luck. Sherlock Holmes
sherlock\x00can extract a clew from a wisp of straw or a flake of cigar ash;
";
eqnice!(expected, cmd.arg("--null").arg("-C1").arg("Sherlock").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/109
rgtest!(f109_max_depth, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir("one");
dir.create("one/pass", "far");
dir.create_dir("one/too");
dir.create("one/too/many", "far");
cmd.arg("--maxdepth").arg("2").arg("far");
eqnice!("one/pass:far\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/124
rgtest!(f109_case_sensitive_part1, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "tEsT");
cmd.arg("--smart-case").arg("--case-sensitive").arg("test").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/124
rgtest!(f109_case_sensitive_part2, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "tEsT");
cmd.arg("--ignore-case").arg("--case-sensitive").arg("test").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/129
rgtest!(f129_matches, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "test\ntest abcdefghijklmnopqrstuvwxyz test");
let expected = "foo:test\nfoo:[Omitted long matching line]\n";
eqnice!(expected, cmd.arg("-M26").arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/129
rgtest!(f129_context, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "test\nabcdefghijklmnopqrstuvwxyz");
let expected = "foo:test\nfoo-[Omitted long context line]\n";
eqnice!(expected, cmd.arg("-M20").arg("-C1").arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/129
rgtest!(f129_replace, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "test\ntest abcdefghijklmnopqrstuvwxyz test");
let expected = "foo:foo\nfoo:[Omitted long line with 2 matches]\n";
eqnice!(expected, cmd.arg("-M26").arg("-rfoo").arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/159
rgtest!(f159_max_count, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "test\ntest");
eqnice!("foo:test\n", cmd.arg("-m1").arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/159
rgtest!(f159_max_count_zero, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "test\ntest");
cmd.arg("-m0").arg("test").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/196
rgtest!(f196_persistent_config, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("sherlock").arg("sherlock");
// Make sure we get no matches by default.
cmd.assert_err();
// Now add our config file, and make sure it impacts ripgrep.
dir.create(".ripgreprc", "--ignore-case");
cmd.cmd().env("RIPGREP_CONFIG_PATH", ".ripgreprc");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/243
rgtest!(f243_column_line, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "test");
eqnice!("foo:1:1:test\n", cmd.arg("--column").arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/263
rgtest!(f263_sort_files, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "test");
dir.create("abc", "test");
dir.create("zoo", "test");
dir.create("bar", "test");
let expected = "abc:test\nbar:test\nfoo:test\nzoo:test\n";
eqnice!(expected, cmd.arg("--sort-files").arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/275
rgtest!(f275_pathsep, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir("foo");
dir.create("foo/bar", "test");
cmd.arg("test").arg("--path-separator").arg("Z");
eqnice!("fooZbar:test\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/362
rgtest!(f362_dfa_size_limit, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
// This should fall back to the nfa engine but should still produce the
// expected result.
cmd.arg("--dfa-size-limit").arg("10").arg(r"For\s").arg("sherlock");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
";
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/362
rgtest!(f362_exceeds_regex_size_limit, |dir: Dir, mut cmd: TestCommand| {
// --regex-size-limit doesn't apply to PCRE2.
if dir.is_pcre2() {
return;
}
cmd.arg("--regex-size-limit").arg("10K").arg(r"[0-9]\w+").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/362
#[cfg(target_pointer_width = "32")]
rgtest!(f362_u64_to_narrow_usize_overflow, |dir: Dir, mut cmd: TestCommand| {
// --dfa-size-limit doesn't apply to PCRE2.
if dir.is_pcre2() {
return;
}
dir.create_size("foo", 1000000);
// 2^35 * 2^20 is ok for u64, but not for usize
cmd.arg("--dfa-size-limit").arg("34359738368M").arg("--files");
cmd.assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/411
rgtest!(f411_single_threaded_search_stats, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
let lines = cmd.arg("--stats").arg("Sherlock").stdout();
assert!(lines.contains("2 matched lines"));
assert!(lines.contains("1 files contained matches"));
assert!(lines.contains("1 files searched"));
assert!(lines.contains("seconds"));
});
rgtest!(f411_parallel_search_stats, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock_1", SHERLOCK);
dir.create("sherlock_2", SHERLOCK);
let lines = cmd.arg("--stats").arg("Sherlock").stdout();
assert!(lines.contains("4 matched lines"));
assert!(lines.contains("2 files contained matches"));
assert!(lines.contains("2 files searched"));
assert!(lines.contains("seconds"));
});
// See: https://github.com/BurntSushi/ripgrep/issues/416
rgtest!(f416_crlf, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK_CRLF);
cmd.arg("--crlf").arg(r"Sherlock$").arg("sherlock");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock\r
";
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/416
rgtest!(f416_crlf_multiline, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK_CRLF);
cmd.arg("--crlf").arg("-U").arg(r"Sherlock$").arg("sherlock");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock\r
";
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/416
rgtest!(f416_crlf_only_matching, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK_CRLF);
cmd.arg("--crlf").arg("-o").arg(r"Sherlock$").arg("sherlock");
let expected = "\
Sherlock\r
";
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/419
rgtest!(f419_zero_as_shortcut_for_null, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-0").arg("--count").arg("Sherlock");
eqnice!("sherlock\x002\n", cmd.stdout());
});
rgtest!(f740_passthru, |dir: Dir, mut cmd: TestCommand| {
dir.create("file", "\nfoo\nbar\nfoobar\n\nbaz\n");
dir.create("patterns", "foo\nbar\n");
// We can't assume that the way colour specs are translated to ANSI
// sequences will remain stable, and --replace doesn't currently work with
// pass-through, so for now we don't actually test the match sub-strings
let common_args = &["-n", "--passthru"];
let foo_expected = "\
1-
2:foo
3-bar
4:foobar
5-
6-baz
";
// With single pattern
cmd.args(common_args).arg("foo").arg("file");
eqnice!(foo_expected, cmd.stdout());
let foo_bar_expected = "\
1-
2:foo
3:bar
4:foobar
5-
6-baz
";
// With multiple -e patterns
let mut cmd = dir.command();
cmd.args(common_args);
cmd.args(&["-e", "foo", "-e", "bar", "file"]);
eqnice!(foo_bar_expected, cmd.stdout());
// With multiple -f patterns
let mut cmd = dir.command();
cmd.args(common_args);
cmd.args(&["-f", "patterns", "file"]);
eqnice!(foo_bar_expected, cmd.stdout());
// -c should override
let mut cmd = dir.command();
cmd.args(common_args);
cmd.args(&["-c", "foo", "file"]);
eqnice!("2\n", cmd.stdout());
let only_foo_expected = "\
1-
2:foo
3-bar
4:foo
5-
6-baz
";
// -o should work
let mut cmd = dir.command();
cmd.args(common_args);
cmd.args(&["-o", "foo", "file"]);
eqnice!(only_foo_expected, cmd.stdout());
let replace_foo_expected = "\
1-
2:wat
3-bar
4:watbar
5-
6-baz
";
// -r should work
let mut cmd = dir.command();
cmd.args(common_args);
cmd.args(&["-r", "wat", "foo", "file"]);
eqnice!(replace_foo_expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/948
rgtest!(f948_exit_code_match, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg(".");
cmd.assert_exit_code(0);
});
// See: https://github.com/BurntSushi/ripgrep/issues/948
rgtest!(f948_exit_code_no_match, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("NADA");
cmd.assert_exit_code(1);
});
// See: https://github.com/BurntSushi/ripgrep/issues/948
rgtest!(f948_exit_code_error, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("*");
cmd.assert_exit_code(2);
});
// See: https://github.com/BurntSushi/ripgrep/issues/917
rgtest!(f917_trim, |dir: Dir, mut cmd: TestCommand| {
const SHERLOCK: &'static str = "\
zzz
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
\tbe, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.
";
dir.create("sherlock", SHERLOCK);
cmd.args(&[
"-n", "-B1", "-A2", "--trim", "Holmeses", "sherlock",
]);
let expected = "\
2-For the Doctor Watsons of this world, as opposed to the Sherlock
3:Holmeses, success in the province of detective work must always
4-be, to a very large extent, the result of luck. Sherlock Holmes
5-can extract a clew from a wisp of straw or a flake of cigar ash;
";
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/917
//
// This is like f917_trim, except this tests that trimming occurs even when the
// whitespace is part of a match.
rgtest!(f917_trim_match, |dir: Dir, mut cmd: TestCommand| {
const SHERLOCK: &'static str = "\
zzz
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
\tbe, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.
";
dir.create("sherlock", SHERLOCK);
cmd.args(&[
"-n", "-B1", "-A2", "--trim", r"\s+Holmeses", "sherlock",
]);
let expected = "\
2-For the Doctor Watsons of this world, as opposed to the Sherlock
3:Holmeses, success in the province of detective work must always
4-be, to a very large extent, the result of luck. Sherlock Holmes
5-can extract a clew from a wisp of straw or a flake of cigar ash;
";
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/993
rgtest!(f993_null_data, |dir: Dir, mut cmd: TestCommand| {
dir.create("test", "foo\x00bar\x00\x00\x00baz\x00");
cmd.arg("--null-data").arg(r".+").arg("test");
// If we just used -a instead of --null-data, then the result would include
// all NUL bytes.
let expected = "foo\x00bar\x00baz\x00";
eqnice!(expected, cmd.stdout());
});

View File

@@ -7,11 +7,18 @@ but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.
";
pub const SHERLOCK_CRLF: &'static str = "\
For the Doctor Watsons of this world, as opposed to the Sherlock\r
Holmeses, success in the province of detective work must always\r
be, to a very large extent, the result of luck. Sherlock Holmes\r
can extract a clew from a wisp of straw or a flake of cigar ash;\r
but Doctor Watson has to have it taken out for him and dusted,\r
and exhibited clearly, with a label attached.\r
pub const CODE: &'static str = "\
extern crate snap;
use std::io;
fn main() {
let stdin = io::stdin();
let stdout = io::stdout();
// Wrap the stdin reader in a Snappy reader.
let mut rdr = snap::Reader::new(stdin.lock());
let mut wtr = stdout.lock();
io::copy(&mut rdr, &mut wtr).expect(\"I/O operation failed\");
}
";

View File

@@ -1,306 +0,0 @@
use std::time;
use serde_json as json;
use hay::{SHERLOCK, SHERLOCK_CRLF};
use util::{Dir, TestCommand};
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
#[serde(tag = "type", content = "data")]
#[serde(rename_all = "snake_case")]
enum Message {
Begin(Begin),
End(End),
Match(Match),
Context(Context),
Summary(Summary),
}
impl Message {
fn unwrap_begin(&self) -> Begin {
match *self {
Message::Begin(ref x) => x.clone(),
ref x => panic!("expected Message::Begin but got {:?}", x),
}
}
fn unwrap_end(&self) -> End {
match *self {
Message::End(ref x) => x.clone(),
ref x => panic!("expected Message::End but got {:?}", x),
}
}
fn unwrap_match(&self) -> Match {
match *self {
Message::Match(ref x) => x.clone(),
ref x => panic!("expected Message::Match but got {:?}", x),
}
}
fn unwrap_context(&self) -> Context {
match *self {
Message::Context(ref x) => x.clone(),
ref x => panic!("expected Message::Context but got {:?}", x),
}
}
fn unwrap_summary(&self) -> Summary {
match *self {
Message::Summary(ref x) => x.clone(),
ref x => panic!("expected Message::Summary but got {:?}", x),
}
}
}
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
struct Begin {
path: Option<Data>,
}
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
struct End {
path: Option<Data>,
binary_offset: Option<u64>,
stats: Stats,
}
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
struct Summary {
elapsed_total: Duration,
stats: Stats,
}
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
struct Match {
path: Option<Data>,
lines: Data,
line_number: Option<u64>,
absolute_offset: u64,
submatches: Vec<SubMatch>,
}
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
struct Context {
path: Option<Data>,
lines: Data,
line_number: Option<u64>,
absolute_offset: u64,
submatches: Vec<SubMatch>,
}
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
struct SubMatch {
#[serde(rename = "match")]
m: Data,
start: usize,
end: usize,
}
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
#[serde(untagged)]
enum Data {
Text { text: String },
// This variant is used when the data isn't valid UTF-8. The bytes are
// base64 encoded, so using a String here is OK.
Bytes { bytes: String },
}
impl Data {
fn text(s: &str) -> Data { Data::Text { text: s.to_string() } }
fn bytes(s: &str) -> Data { Data::Bytes { bytes: s.to_string() } }
}
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
struct Stats {
elapsed: Duration,
searches: u64,
searches_with_match: u64,
bytes_searched: u64,
bytes_printed: u64,
matched_lines: u64,
matches: u64,
}
#[derive(Clone, Debug, Deserialize, PartialEq, Eq)]
struct Duration {
#[serde(flatten)]
duration: time::Duration,
human: String,
}
/// Decode JSON Lines into a Vec<Message>. If there was an error decoding,
/// this function panics.
fn json_decode(jsonlines: &str) -> Vec<Message> {
json::Deserializer::from_str(jsonlines)
.into_iter()
.collect::<Result<Vec<Message>, _>>()
.unwrap()
}
rgtest!(basic, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--json").arg("-B1").arg("Sherlock Holmes").arg("sherlock");
let msgs = json_decode(&cmd.stdout());
assert_eq!(
msgs[0].unwrap_begin(),
Begin { path: Some(Data::text("sherlock")) }
);
assert_eq!(
msgs[1].unwrap_context(),
Context {
path: Some(Data::text("sherlock")),
lines: Data::text("Holmeses, success in the province of detective work must always\n"),
line_number: Some(2),
absolute_offset: 65,
submatches: vec![],
}
);
assert_eq!(
msgs[2].unwrap_match(),
Match {
path: Some(Data::text("sherlock")),
lines: Data::text("be, to a very large extent, the result of luck. Sherlock Holmes\n"),
line_number: Some(3),
absolute_offset: 129,
submatches: vec![
SubMatch {
m: Data::text("Sherlock Holmes"),
start: 48,
end: 63,
},
],
}
);
assert_eq!(
msgs[3].unwrap_end().path,
Some(Data::text("sherlock"))
);
assert_eq!(
msgs[3].unwrap_end().binary_offset,
None
);
assert_eq!(
msgs[4].unwrap_summary().stats.searches_with_match,
1
);
assert_eq!(
msgs[4].unwrap_summary().stats.bytes_printed,
494
);
});
#[cfg(unix)]
rgtest!(notutf8, |dir: Dir, mut cmd: TestCommand| {
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
// This test does not work with PCRE2 because PCRE2 does not support the
// `u` flag.
if dir.is_pcre2() {
return;
}
// macOS doesn't like this either... sigh.
if cfg!(target_os = "macos") {
return;
}
let name = &b"foo\xFFbar"[..];
let contents = &b"quux\xFFbaz"[..];
// APFS does not support creating files with invalid UTF-8 bytes, so just
// skip the test if we can't create our file.
if !dir.try_create_bytes(OsStr::from_bytes(name), contents).is_ok() {
return;
}
cmd.arg("--json").arg(r"(?-u)\xFF");
let msgs = json_decode(&cmd.stdout());
assert_eq!(
msgs[0].unwrap_begin(),
Begin { path: Some(Data::bytes("Zm9v/2Jhcg==")) }
);
assert_eq!(
msgs[1].unwrap_match(),
Match {
path: Some(Data::bytes("Zm9v/2Jhcg==")),
lines: Data::bytes("cXV1eP9iYXo="),
line_number: Some(1),
absolute_offset: 0,
submatches: vec![
SubMatch {
m: Data::bytes("/w=="),
start: 4,
end: 5,
},
],
}
);
});
rgtest!(notutf8_file, |dir: Dir, mut cmd: TestCommand| {
use std::ffi::OsStr;
// This test does not work with PCRE2 because PCRE2 does not support the
// `u` flag.
if dir.is_pcre2() {
return;
}
let name = "foo";
let contents = &b"quux\xFFbaz"[..];
// APFS does not support creating files with invalid UTF-8 bytes, so just
// skip the test if we can't create our file.
if !dir.try_create_bytes(OsStr::new(name), contents).is_ok() {
return;
}
cmd.arg("--json").arg(r"(?-u)\xFF");
let msgs = json_decode(&cmd.stdout());
assert_eq!(
msgs[0].unwrap_begin(),
Begin { path: Some(Data::text("foo")) }
);
assert_eq!(
msgs[1].unwrap_match(),
Match {
path: Some(Data::text("foo")),
lines: Data::bytes("cXV1eP9iYXo="),
line_number: Some(1),
absolute_offset: 0,
submatches: vec![
SubMatch {
m: Data::bytes("/w=="),
start: 4,
end: 5,
},
],
}
);
});
// See: https://github.com/BurntSushi/ripgrep/issues/416
//
// This test in particular checks that our match does _not_ include the `\r`
// even though the '$' may be rewritten as '(?:\r??$)' and could thus include
// `\r` in the match.
rgtest!(crlf, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK_CRLF);
cmd.arg("--json").arg("--crlf").arg(r"Sherlock$").arg("sherlock");
let msgs = json_decode(&cmd.stdout());
assert_eq!(
msgs[1].unwrap_match().submatches[0].clone(),
SubMatch {
m: Data::text("Sherlock"),
start: 56,
end: 64,
},
);
});

View File

@@ -1,61 +0,0 @@
#[macro_export]
macro_rules! rgtest {
($name:ident, $fun:expr) => {
#[test]
fn $name() {
let (dir, cmd) = ::util::setup(stringify!($name));
$fun(dir, cmd);
if cfg!(feature = "pcre2") {
let (dir, cmd) = ::util::setup_pcre2(stringify!($name));
$fun(dir, cmd);
}
}
}
}
#[macro_export]
macro_rules! eqnice {
($expected:expr, $got:expr) => {
let expected = &*$expected;
let got = &*$got;
if expected != got {
panic!("
printed outputs differ!
expected:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
{}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
got:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
{}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
", expected, got);
}
}
}
#[macro_export]
macro_rules! eqnice_repr {
($expected:expr, $got:expr) => {
let expected = &*$expected;
let got = &*$got;
if expected != got {
panic!("
printed outputs differ!
expected:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
{:?}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
got:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
{:?}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
", expected, got);
}
}
}

View File

@@ -1,966 +0,0 @@
use hay::SHERLOCK;
use util::{Dir, TestCommand, cmd_exists, sort_lines};
// This file contains "miscellaneous" tests that were either written before
// features were tracked more explicitly, or were simply written without
// linking them to a specific issue number. We should try to minimize the
// addition of more tests in this file and instead add them to either the
// regression test suite or the feature test suite (found in regression.rs and
// feature.rs, respectively).
rgtest!(single_file, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.arg("Sherlock").arg("sherlock").stdout());
});
rgtest!(dir, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.arg("Sherlock").stdout());
});
rgtest!(line_numbers, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
let expected = "\
1:For the Doctor Watsons of this world, as opposed to the Sherlock
3:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.arg("-n").arg("Sherlock").arg("sherlock").stdout());
});
rgtest!(columns, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--column").arg("Sherlock").arg("sherlock");
let expected = "\
1:57:For the Doctor Watsons of this world, as opposed to the Sherlock
3:49:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(with_filename, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-H").arg("Sherlock").arg("sherlock");
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(with_heading, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.args(&[
// This forces the issue since --with-filename is disabled by default
// when searching one file.
"--with-filename", "--heading",
"Sherlock", "sherlock",
]);
let expected = "\
sherlock
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(with_heading_default, |dir: Dir, mut cmd: TestCommand| {
// Search two or more and get --with-filename enabled by default.
// Use -j1 to get deterministic results.
dir.create("sherlock", SHERLOCK);
dir.create("foo", "Sherlock Holmes lives on Baker Street.");
cmd.arg("-j1").arg("--heading").arg("Sherlock");
let expected = "\
foo
Sherlock Holmes lives on Baker Street.
sherlock
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(sort_lines(expected), sort_lines(&cmd.stdout()));
});
rgtest!(inverted, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-v").arg("Sherlock").arg("sherlock");
let expected = "\
Holmeses, success in the province of detective work must always
can extract a clew from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.
";
eqnice!(expected, cmd.stdout());
});
rgtest!(inverted_line_numbers, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-n").arg("-v").arg("Sherlock").arg("sherlock");
let expected = "\
2:Holmeses, success in the province of detective work must always
4:can extract a clew from a wisp of straw or a flake of cigar ash;
5:but Doctor Watson has to have it taken out for him and dusted,
6:and exhibited clearly, with a label attached.
";
eqnice!(expected, cmd.stdout());
});
rgtest!(case_insensitive, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-i").arg("sherlock").arg("sherlock");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(word, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-w").arg("as").arg("sherlock");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
";
eqnice!(expected, cmd.stdout());
});
rgtest!(line, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.args(&[
"-x",
"Watson|and exhibited clearly, with a label attached.",
"sherlock",
]);
let expected = "\
and exhibited clearly, with a label attached.
";
eqnice!(expected, cmd.stdout());
});
rgtest!(literal, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file", "blib\n()\nblab\n");
cmd.arg("-F").arg("()").arg("file");
eqnice!("()\n", cmd.stdout());
});
rgtest!(quiet, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-q").arg("Sherlock").arg("sherlock");
assert!(cmd.stdout().is_empty());
});
rgtest!(replace, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-r").arg("FooBar").arg("Sherlock").arg("sherlock");
let expected = "\
For the Doctor Watsons of this world, as opposed to the FooBar
be, to a very large extent, the result of luck. FooBar Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(replace_groups, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.args(&[
"-r", "$2, $1", "([A-Z][a-z]+) ([A-Z][a-z]+)", "sherlock",
]);
let expected = "\
For the Watsons, Doctor of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Holmes, Sherlock
but Watson, Doctor has to have it taken out for him and dusted,
";
eqnice!(expected, cmd.stdout());
});
rgtest!(replace_named_groups, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.args(&[
"-r", "$last, $first",
"(?P<first>[A-Z][a-z]+) (?P<last>[A-Z][a-z]+)",
"sherlock",
]);
let expected = "\
For the Watsons, Doctor of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Holmes, Sherlock
but Watson, Doctor has to have it taken out for him and dusted,
";
eqnice!(expected, cmd.stdout());
});
rgtest!(replace_with_only_matching, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-o").arg("-r").arg("$1").arg(r"of (\w+)").arg("sherlock");
let expected = "\
this
detective
luck
straw
cigar
";
eqnice!(expected, cmd.stdout());
});
rgtest!(file_types, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file.py", "Sherlock");
dir.create("file.rs", "Sherlock");
cmd.arg("-t").arg("rust").arg("Sherlock");
eqnice!("file.rs:Sherlock\n", cmd.stdout());
});
rgtest!(file_types_all, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file.py", "Sherlock");
cmd.arg("-t").arg("all").arg("Sherlock");
eqnice!("file.py:Sherlock\n", cmd.stdout());
});
rgtest!(file_types_negate, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.remove("sherlock");
dir.create("file.py", "Sherlock");
dir.create("file.rs", "Sherlock");
cmd.arg("-T").arg("rust").arg("Sherlock");
eqnice!("file.py:Sherlock\n", cmd.stdout());
});
rgtest!(file_types_negate_all, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file.py", "Sherlock");
cmd.arg("-T").arg("all").arg("Sherlock");
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(file_type_clear, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file.py", "Sherlock");
dir.create("file.rs", "Sherlock");
cmd.arg("--type-clear").arg("rust").arg("-t").arg("rust").arg("Sherlock");
cmd.assert_non_empty_stderr();
});
rgtest!(file_type_add, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file.py", "Sherlock");
dir.create("file.rs", "Sherlock");
dir.create("file.wat", "Sherlock");
cmd.args(&[
"--type-add", "wat:*.wat", "-t", "wat", "Sherlock",
]);
eqnice!("file.wat:Sherlock\n", cmd.stdout());
});
rgtest!(file_type_add_compose, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file.py", "Sherlock");
dir.create("file.rs", "Sherlock");
dir.create("file.wat", "Sherlock");
cmd.args(&[
"--type-add", "wat:*.wat",
"--type-add", "combo:include:wat,py",
"-t", "combo",
"Sherlock",
]);
let expected = "\
file.py:Sherlock
file.wat:Sherlock
";
eqnice!(expected, sort_lines(&cmd.stdout()));
});
rgtest!(glob, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file.py", "Sherlock");
dir.create("file.rs", "Sherlock");
cmd.arg("-g").arg("*.rs").arg("Sherlock");
eqnice!("file.rs:Sherlock\n", cmd.stdout());
});
rgtest!(glob_negate, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.remove("sherlock");
dir.create("file.py", "Sherlock");
dir.create("file.rs", "Sherlock");
cmd.arg("-g").arg("!*.rs").arg("Sherlock");
eqnice!("file.py:Sherlock\n", cmd.stdout());
});
rgtest!(glob_case_insensitive, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file.HTML", "Sherlock");
cmd.arg("--iglob").arg("*.html").arg("Sherlock");
eqnice!("file.HTML:Sherlock\n", cmd.stdout());
});
rgtest!(glob_case_sensitive, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file1.HTML", "Sherlock");
dir.create("file2.html", "Sherlock");
cmd.arg("--glob").arg("*.html").arg("Sherlock");
eqnice!("file2.html:Sherlock\n", cmd.stdout());
});
rgtest!(byte_offset_only_matching, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-b").arg("-o").arg("Sherlock");
let expected = "\
sherlock:56:Sherlock
sherlock:177:Sherlock
";
eqnice!(expected, cmd.stdout());
});
rgtest!(count, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--count").arg("Sherlock");
let expected = "sherlock:2\n";
eqnice!(expected, cmd.stdout());
});
rgtest!(count_matches, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--count-matches").arg("the");
let expected = "sherlock:4\n";
eqnice!(expected, cmd.stdout());
});
rgtest!(count_matches_inverted, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--count-matches").arg("--invert-match").arg("Sherlock");
let expected = "sherlock:4\n";
eqnice!(expected, cmd.stdout());
});
rgtest!(count_matches_via_only, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--count").arg("--only-matching").arg("the");
let expected = "sherlock:4\n";
eqnice!(expected, cmd.stdout());
});
rgtest!(files_with_matches, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--files-with-matches").arg("Sherlock");
let expected = "sherlock\n";
eqnice!(expected, cmd.stdout());
});
rgtest!(files_without_match, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file.py", "foo");
cmd.arg("--files-without-match").arg("Sherlock");
let expected = "file.py\n";
eqnice!(expected, cmd.stdout());
});
rgtest!(after_context, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-A").arg("1").arg("Sherlock").arg("sherlock");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew from a wisp of straw or a flake of cigar ash;
";
eqnice!(expected, cmd.stdout());
});
rgtest!(after_context_line_numbers, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-A").arg("1").arg("-n").arg("Sherlock").arg("sherlock");
let expected = "\
1:For the Doctor Watsons of this world, as opposed to the Sherlock
2-Holmeses, success in the province of detective work must always
3:be, to a very large extent, the result of luck. Sherlock Holmes
4-can extract a clew from a wisp of straw or a flake of cigar ash;
";
eqnice!(expected, cmd.stdout());
});
rgtest!(before_context, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-B").arg("1").arg("Sherlock").arg("sherlock");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(before_context_line_numbers, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-B").arg("1").arg("-n").arg("Sherlock").arg("sherlock");
let expected = "\
1:For the Doctor Watsons of this world, as opposed to the Sherlock
2-Holmeses, success in the province of detective work must always
3:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(context, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-C").arg("1").arg("world|attached").arg("sherlock");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
--
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.
";
eqnice!(expected, cmd.stdout());
});
rgtest!(context_line_numbers, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("-C").arg("1").arg("-n").arg("world|attached").arg("sherlock");
let expected = "\
1:For the Doctor Watsons of this world, as opposed to the Sherlock
2-Holmeses, success in the province of detective work must always
--
5-but Doctor Watson has to have it taken out for him and dusted,
6:and exhibited clearly, with a label attached.
";
eqnice!(expected, cmd.stdout());
});
rgtest!(max_filesize_parse_errro_length, |_: Dir, mut cmd: TestCommand| {
cmd.arg("--max-filesize").arg("44444444444444444444");
cmd.assert_non_empty_stderr();
});
rgtest!(max_filesize_parse_error_suffix, |_: Dir, mut cmd: TestCommand| {
cmd.arg("--max-filesize").arg("45k");
cmd.assert_non_empty_stderr();
});
rgtest!(max_filesize_parse_no_suffix, |dir: Dir, mut cmd: TestCommand| {
dir.create_size("foo", 40);
dir.create_size("bar", 60);
cmd.arg("--max-filesize").arg("50").arg("--files");
eqnice!("foo\n", cmd.stdout());
});
rgtest!(max_filesize_parse_k_suffix, |dir: Dir, mut cmd: TestCommand| {
dir.create_size("foo", 3048);
dir.create_size("bar", 4100);
cmd.arg("--max-filesize").arg("4K").arg("--files");
eqnice!("foo\n", cmd.stdout());
});
rgtest!(max_filesize_parse_m_suffix, |dir: Dir, mut cmd: TestCommand| {
dir.create_size("foo", 1000000);
dir.create_size("bar", 1400000);
cmd.arg("--max-filesize").arg("1M").arg("--files");
eqnice!("foo\n", cmd.stdout());
});
rgtest!(max_filesize_suffix_overflow, |dir: Dir, mut cmd: TestCommand| {
dir.create_size("foo", 1000000);
// 2^35 * 2^30 would otherwise overflow
cmd.arg("--max-filesize").arg("34359738368G").arg("--files");
cmd.assert_non_empty_stderr();
});
rgtest!(ignore_hidden, |dir: Dir, mut cmd: TestCommand| {
dir.create(".sherlock", SHERLOCK);
cmd.arg("Sherlock").assert_err();
});
rgtest!(no_ignore_hidden, |dir: Dir, mut cmd: TestCommand| {
dir.create(".sherlock", SHERLOCK);
cmd.arg("--hidden").arg("Sherlock");
let expected = "\
.sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
.sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(ignore_git, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create_dir(".git");
dir.create(".gitignore", "sherlock\n");
cmd.arg("Sherlock");
cmd.assert_err();
});
rgtest!(ignore_generic, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create(".ignore", "sherlock\n");
cmd.arg("Sherlock");
cmd.assert_err();
});
rgtest!(ignore_ripgrep, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create(".rgignore", "sherlock\n");
cmd.arg("Sherlock");
cmd.assert_err();
});
rgtest!(no_ignore, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create(".gitignore", "sherlock\n");
cmd.arg("--no-ignore").arg("Sherlock");
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(ignore_git_parent, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");
dir.create(".gitignore", "sherlock\n");
dir.create_dir("foo");
dir.create("foo/sherlock", SHERLOCK);
cmd.arg("Sherlock");
// Even though we search in foo/, which has no .gitignore, ripgrep will
// traverse parent directories and respect the gitignore files found.
cmd.current_dir(dir.path().join("foo"));
cmd.assert_err();
});
rgtest!(ignore_git_parent_stop, |dir: Dir, mut cmd: TestCommand| {
// This tests that searching parent directories for .gitignore files stops
// after it sees a .git directory. To test this, we create this directory
// hierarchy:
//
// .gitignore (contains `sherlock`)
// foo/
// .git/
// bar/
// sherlock
//
// And we perform the search inside `foo/bar/`. ripgrep will stop looking
// for .gitignore files after it sees `foo/.git/`, and therefore not
// respect the top-level `.gitignore` containing `sherlock`.
dir.create(".gitignore", "sherlock\n");
dir.create_dir("foo");
dir.create_dir("foo/.git");
dir.create_dir("foo/bar");
dir.create("foo/bar/sherlock", SHERLOCK);
cmd.arg("Sherlock");
cmd.current_dir(dir.path().join("foo").join("bar"));
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
// Like ignore_git_parent_stop, but with a .git file instead of a .git
// directory.
rgtest!(ignore_git_parent_stop_file, |dir: Dir, mut cmd: TestCommand| {
// This tests that searching parent directories for .gitignore files stops
// after it sees a .git *file*. A .git file is used for submodules. To test
// this, we create this directory hierarchy:
//
// .gitignore (contains `sherlock`)
// foo/
// .git
// bar/
// sherlock
//
// And we perform the search inside `foo/bar/`. ripgrep will stop looking
// for .gitignore files after it sees `foo/.git`, and therefore not
// respect the top-level `.gitignore` containing `sherlock`.
dir.create(".gitignore", "sherlock\n");
dir.create_dir("foo");
dir.create("foo/.git", "");
dir.create_dir("foo/bar");
dir.create("foo/bar/sherlock", SHERLOCK);
cmd.arg("Sherlock");
cmd.current_dir(dir.path().join("foo").join("bar"));
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(ignore_ripgrep_parent_no_stop, |dir: Dir, mut cmd: TestCommand| {
// This is like the `ignore_git_parent_stop` test, except it checks that
// ripgrep *doesn't* stop checking for .rgignore files.
dir.create(".rgignore", "sherlock\n");
dir.create_dir("foo");
dir.create_dir("foo/.git");
dir.create_dir("foo/bar");
dir.create("foo/bar/sherlock", SHERLOCK);
cmd.arg("Sherlock");
cmd.current_dir(dir.path().join("foo").join("bar"));
// The top-level .rgignore applies.
cmd.assert_err();
});
rgtest!(no_parent_ignore_git, |dir: Dir, mut cmd: TestCommand| {
// Set up a directory hierarchy like this:
//
// .git/
// .gitignore
// foo/
// .gitignore
// sherlock
// watson
//
// Where `.gitignore` contains `sherlock` and `foo/.gitignore` contains
// `watson`.
//
// Now *do the search* from the foo directory. By default, ripgrep will
// search parent directories for .gitignore files. The --no-ignore-parent
// flag should prevent that. At the same time, the `foo/.gitignore` file
// will still be respected (since the search is happening in `foo/`).
//
// In other words, we should only see results from `sherlock`, not from
// `watson`.
dir.create_dir(".git");
dir.create(".gitignore", "sherlock\n");
dir.create_dir("foo");
dir.create("foo/.gitignore", "watson\n");
dir.create("foo/sherlock", SHERLOCK);
dir.create("foo/watson", SHERLOCK);
cmd.arg("--no-ignore-parent").arg("Sherlock");
cmd.current_dir(dir.path().join("foo"));
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(symlink_nofollow, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir("foo");
dir.create_dir("foo/bar");
dir.link_dir("foo/baz", "foo/bar/baz");
dir.create_dir("foo/baz");
dir.create("foo/baz/sherlock", SHERLOCK);
cmd.arg("Sherlock");
cmd.current_dir(dir.path().join("foo/bar"));
cmd.assert_err();
});
#[cfg(not(windows))]
rgtest!(symlink_follow, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir("foo");
dir.create_dir("foo/bar");
dir.create_dir("foo/baz");
dir.create("foo/baz/sherlock", SHERLOCK);
dir.link_dir("foo/baz", "foo/bar/baz");
cmd.arg("-L").arg("Sherlock");
cmd.current_dir(dir.path().join("foo/bar"));
let expected = "\
baz/sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
baz/sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(unrestricted1, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create(".gitignore", "sherlock\n");
cmd.arg("-u").arg("Sherlock");
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(unrestricted2, |dir: Dir, mut cmd: TestCommand| {
dir.create(".sherlock", SHERLOCK);
cmd.arg("-uu").arg("Sherlock");
let expected = "\
.sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
.sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(unrestricted3, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("file", "foo\x00bar\nfoo\x00baz\n");
cmd.arg("-uuu").arg("foo");
let expected = "\
file:foo\x00bar
file:foo\x00baz
";
eqnice!(expected, cmd.stdout());
});
rgtest!(vimgrep, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--vimgrep").arg("Sherlock|Watson");
let expected = "\
sherlock:1:16:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:1:57:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:3:49:be, to a very large extent, the result of luck. Sherlock Holmes
sherlock:5:12:but Doctor Watson has to have it taken out for him and dusted,
";
eqnice!(expected, cmd.stdout());
});
rgtest!(vimgrep_no_line, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--vimgrep").arg("-N").arg("Sherlock|Watson");
let expected = "\
sherlock:16:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:57:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:49:be, to a very large extent, the result of luck. Sherlock Holmes
sherlock:12:but Doctor Watson has to have it taken out for him and dusted,
";
eqnice!(expected, cmd.stdout());
});
rgtest!(vimgrep_no_line_no_column, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.arg("--vimgrep").arg("-N").arg("--no-column").arg("Sherlock|Watson");
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
sherlock:but Doctor Watson has to have it taken out for him and dusted,
";
eqnice!(expected, cmd.stdout());
});
rgtest!(preprocessing, |dir: Dir, mut cmd: TestCommand| {
if !cmd_exists("xzcat") {
return;
}
dir.create_bytes("sherlock.xz", include_bytes!("./data/sherlock.xz"));
cmd.arg("--pre").arg("xzcat").arg("Sherlock").arg("sherlock.xz");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(preprocessing_glob, |dir: Dir, mut cmd: TestCommand| {
if !cmd_exists("xzcat") {
return;
}
dir.create("sherlock", SHERLOCK);
dir.create_bytes("sherlock.xz", include_bytes!("./data/sherlock.xz"));
cmd.args(&["--pre", "xzcat", "--pre-glob", "*.xz", "Sherlock"]);
let expected = "\
sherlock.xz:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock.xz:be, to a very large extent, the result of luck. Sherlock Holmes
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(sort_lines(expected), sort_lines(&cmd.stdout()));
});
rgtest!(compressed_gzip, |dir: Dir, mut cmd: TestCommand| {
if !cmd_exists("gzip") {
return;
}
dir.create_bytes("sherlock.gz", include_bytes!("./data/sherlock.gz"));
cmd.arg("-z").arg("Sherlock").arg("sherlock.gz");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(compressed_bzip2, |dir: Dir, mut cmd: TestCommand| {
if !cmd_exists("bzip2") {
return;
}
dir.create_bytes("sherlock.bz2", include_bytes!("./data/sherlock.bz2"));
cmd.arg("-z").arg("Sherlock").arg("sherlock.bz2");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(compressed_xz, |dir: Dir, mut cmd: TestCommand| {
if !cmd_exists("xz") {
return;
}
dir.create_bytes("sherlock.xz", include_bytes!("./data/sherlock.xz"));
cmd.arg("-z").arg("Sherlock").arg("sherlock.xz");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(compressed_lz4, |dir: Dir, mut cmd: TestCommand| {
if !cmd_exists("lz4") {
return;
}
dir.create_bytes("sherlock.lz4", include_bytes!("./data/sherlock.lz4"));
cmd.arg("-z").arg("Sherlock").arg("sherlock.lz4");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(compressed_lzma, |dir: Dir, mut cmd: TestCommand| {
if !cmd_exists("xz") {
return;
}
dir.create_bytes("sherlock.lzma", include_bytes!("./data/sherlock.lzma"));
cmd.arg("-z").arg("Sherlock").arg("sherlock.lzma");
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
be, to a very large extent, the result of luck. Sherlock Holmes
";
eqnice!(expected, cmd.stdout());
});
rgtest!(compressed_failing_gzip, |dir: Dir, mut cmd: TestCommand| {
if !cmd_exists("gzip") {
return;
}
dir.create("sherlock.gz", SHERLOCK);
cmd.arg("-z").arg("Sherlock").arg("sherlock.gz");
cmd.assert_non_empty_stderr();
});
rgtest!(binary_nosearch, |dir: Dir, mut cmd: TestCommand| {
dir.create("file", "foo\x00bar\nfoo\x00baz\n");
cmd.arg("foo").arg("file");
cmd.assert_err();
});
// The following two tests show a discrepancy in search results between
// searching with memory mapped files and stream searching. Stream searching
// uses a heuristic (that GNU grep also uses) where NUL bytes are replaced with
// the EOL terminator, which tends to avoid allocating large amounts of memory
// for really long "lines." The memory map searcher has no need to worry about
// such things, and more than that, it would be pretty hard for it to match the
// semantics of streaming search in this case.
//
// Binary files with lots of NULs aren't really part of the use case of ripgrep
// (or any other grep-like tool for that matter), so we shouldn't feel too bad
// about it.
rgtest!(binary_search_mmap, |dir: Dir, mut cmd: TestCommand| {
dir.create("file", "foo\x00bar\nfoo\x00baz\n");
cmd.arg("-a").arg("--mmap").arg("foo").arg("file");
eqnice!("foo\x00bar\nfoo\x00baz\n", cmd.stdout());
});
rgtest!(binary_search_no_mmap, |dir: Dir, mut cmd: TestCommand| {
dir.create("file", "foo\x00bar\nfoo\x00baz\n");
cmd.arg("-a").arg("--no-mmap").arg("foo").arg("file");
eqnice!("foo\x00bar\nfoo\x00baz\n", cmd.stdout());
});
rgtest!(files, |dir: Dir, mut cmd: TestCommand| {
dir.create("file", "");
dir.create_dir("dir");
dir.create("dir/file", "");
cmd.arg("--files");
eqnice!(sort_lines("file\ndir/file\n"), sort_lines(&cmd.stdout()));
});
rgtest!(type_list, |_: Dir, mut cmd: TestCommand| {
cmd.arg("--type-list");
// This can change over time, so just make sure we print something.
assert!(!cmd.stdout().is_empty());
});

View File

@@ -1,109 +0,0 @@
use hay::SHERLOCK;
use util::{Dir, TestCommand};
// This tests that multiline matches that span multiple lines, but where
// multiple matches may begin and end on the same line work correctly.
rgtest!(overlap1, |dir: Dir, mut cmd: TestCommand| {
dir.create("test", "xxx\nabc\ndefxxxabc\ndefxxx\nxxx");
cmd.arg("-n").arg("-U").arg("abc\ndef").arg("test");
eqnice!("2:abc\n3:defxxxabc\n4:defxxx\n", cmd.stdout());
});
// Like overlap1, but tests the case where one match ends at precisely the same
// location at which the next match begins.
rgtest!(overlap2, |dir: Dir, mut cmd: TestCommand| {
dir.create("test", "xxx\nabc\ndefabc\ndefxxx\nxxx");
cmd.arg("-n").arg("-U").arg("abc\ndef").arg("test");
eqnice!("2:abc\n3:defabc\n4:defxxx\n", cmd.stdout());
});
// Tests that even in a multiline search, a '.' does not match a newline.
rgtest!(dot_no_newline, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.args(&[
"-n", "-U", "of this world.+detective work", "sherlock",
]);
cmd.assert_err();
});
// Tests that the --multiline-dotall flag causes '.' to match a newline.
rgtest!(dot_all, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.args(&[
"-n", "-U", "--multiline-dotall",
"of this world.+detective work", "sherlock",
]);
let expected = "\
1:For the Doctor Watsons of this world, as opposed to the Sherlock
2:Holmeses, success in the province of detective work must always
";
eqnice!(expected, cmd.stdout());
});
// Tests that --only-matching works in multiline mode.
rgtest!(only_matching, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.args(&[
"-n", "-U", "--only-matching",
r"Watson|Sherlock\p{Any}+?Holmes", "sherlock",
]);
let expected = "\
1:Watson
1:Sherlock
2:Holmes
3:Sherlock Holmes
5:Watson
";
eqnice!(expected, cmd.stdout());
});
// Tests that --vimgrep works in multiline mode.
rgtest!(vimgrep, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.args(&[
"-n", "-U", "--vimgrep",
r"Watson|Sherlock\p{Any}+?Holmes", "sherlock",
]);
let expected = "\
sherlock:1:16:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:1:57:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:2:57:Holmeses, success in the province of detective work must always
sherlock:3:49:be, to a very large extent, the result of luck. Sherlock Holmes
sherlock:5:12:but Doctor Watson has to have it taken out for him and dusted,
";
eqnice!(expected, cmd.stdout());
});
// Tests that multiline search works when reading from stdin. This is an
// important test because multiline search must read the entire contents of
// what it is searching into memory before executing the search.
rgtest!(stdin, |_: Dir, mut cmd: TestCommand| {
cmd.args(&[
"-n", "-U", r"of this world\p{Any}+?detective work",
]);
let expected = "\
1:For the Doctor Watsons of this world, as opposed to the Sherlock
2:Holmeses, success in the province of detective work must always
";
eqnice!(expected, cmd.pipe(SHERLOCK));
});
// Test that multiline search and contextual matches work.
rgtest!(context, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
cmd.args(&[
"-n", "-U", "-C1",
r"detective work\p{Any}+?result of luck", "sherlock",
]);
let expected = "\
1-For the Doctor Watsons of this world, as opposed to the Sherlock
2:Holmeses, success in the province of detective work must always
3:be, to a very large extent, the result of luck. Sherlock Holmes
4-can extract a clew from a wisp of straw or a flake of cigar ash;
";
eqnice!(expected, cmd.stdout());
});

View File

@@ -1,564 +0,0 @@
use hay::SHERLOCK;
use util::{Dir, TestCommand, sort_lines};
// See: https://github.com/BurntSushi/ripgrep/issues/16
rgtest!(r16, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");
dir.create(".gitignore", "ghi/");
dir.create_dir("ghi");
dir.create_dir("def/ghi");
dir.create("ghi/toplevel.txt", "xyz");
dir.create("def/ghi/subdir.txt", "xyz");
cmd.arg("xyz").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/25
rgtest!(r25, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");
dir.create(".gitignore", "/llvm/");
dir.create_dir("src/llvm");
dir.create("src/llvm/foo", "test");
cmd.arg("test");
eqnice!("src/llvm/foo:test\n", cmd.stdout());
cmd.current_dir(dir.path().join("src"));
eqnice!("llvm/foo:test\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/30
rgtest!(r30, |dir: Dir, mut cmd: TestCommand| {
dir.create(".gitignore", "vendor/**\n!vendor/manifest");
dir.create_dir("vendor");
dir.create("vendor/manifest", "test");
eqnice!("vendor/manifest:test\n", cmd.arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/49
rgtest!(r49, |dir: Dir, mut cmd: TestCommand| {
dir.create(".gitignore", "foo/bar");
dir.create_dir("test/foo/bar");
dir.create("test/foo/bar/baz", "test");
cmd.arg("xyz").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/50
rgtest!(r50, |dir: Dir, mut cmd: TestCommand| {
dir.create(".gitignore", "XXX/YYY/");
dir.create_dir("abc/def/XXX/YYY");
dir.create_dir("ghi/XXX/YYY");
dir.create("abc/def/XXX/YYY/bar", "test");
dir.create("ghi/XXX/YYY/bar", "test");
cmd.arg("xyz").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/64
rgtest!(r64, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir("dir");
dir.create_dir("foo");
dir.create("dir/abc", "");
dir.create("foo/abc", "");
eqnice!("foo/abc\n", cmd.arg("--files").arg("foo").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/65
rgtest!(r65, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");
dir.create(".gitignore", "a/");
dir.create_dir("a");
dir.create("a/foo", "xyz");
dir.create("a/bar", "xyz");
cmd.arg("xyz").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/67
rgtest!(r67, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");
dir.create(".gitignore", "/*\n!/dir");
dir.create_dir("dir");
dir.create_dir("foo");
dir.create("foo/bar", "test");
dir.create("dir/bar", "test");
eqnice!("dir/bar:test\n", cmd.arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/87
rgtest!(r87, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");
dir.create(".gitignore", "foo\n**no-vcs**");
dir.create("foo", "test");
cmd.arg("test").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/90
rgtest!(r90, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");
dir.create(".gitignore", "!.foo");
dir.create(".foo", "test");
eqnice!(".foo:test\n", cmd.arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/93
rgtest!(r93, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "192.168.1.1");
eqnice!("foo:192.168.1.1\n", cmd.arg(r"(\d{1,3}\.){3}\d{1,3}").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/99
rgtest!(r99, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo1", "test");
dir.create("foo2", "zzz");
dir.create("bar", "test");
eqnice!(
sort_lines("bar\ntest\n\nfoo1\ntest\n"),
sort_lines(&cmd.arg("-j1").arg("--heading").arg("test").stdout())
);
});
// See: https://github.com/BurntSushi/ripgrep/issues/105
rgtest!(r105_part1, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "zztest");
eqnice!("foo:1:3:zztest\n", cmd.arg("--vimgrep").arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/105
rgtest!(r105_part2, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "zztest");
eqnice!("foo:1:3:zztest\n", cmd.arg("--column").arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/127
rgtest!(r127, |dir: Dir, mut cmd: TestCommand| {
// Set up a directory hierarchy like this:
//
// .gitignore
// foo/
// sherlock
// watson
//
// Where `.gitignore` contains `foo/sherlock`.
//
// ripgrep should ignore 'foo/sherlock' giving us results only from
// 'foo/watson' but on Windows ripgrep will include both 'foo/sherlock' and
// 'foo/watson' in the search results.
dir.create_dir(".git");
dir.create(".gitignore", "foo/sherlock\n");
dir.create_dir("foo");
dir.create("foo/sherlock", SHERLOCK);
dir.create("foo/watson", SHERLOCK);
let expected = "\
foo/watson:For the Doctor Watsons of this world, as opposed to the Sherlock
foo/watson:be, to a very large extent, the result of luck. Sherlock Holmes
";
assert_eq!(expected, cmd.arg("Sherlock").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/128
rgtest!(r128, |dir: Dir, mut cmd: TestCommand| {
dir.create_bytes("foo", b"01234567\x0b\n\x0b\n\x0b\n\x0b\nx");
eqnice!("foo:5:x\n", cmd.arg("-n").arg("x").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/131
//
// TODO(burntsushi): Darwin doesn't like this test for some reason. Probably
// due to the weird file path.
#[cfg(not(target_os = "macos"))]
rgtest!(r131, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");
dir.create(".gitignore", "TopÑapa");
dir.create("TopÑapa", "test");
cmd.arg("test").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/137
//
// TODO(burntsushi): Figure out how to make this test work on Windows. Right
// now it gives "access denied" errors when trying to create a file symlink.
// For now, disable test on Windows.
#[cfg(not(windows))]
rgtest!(r137, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.link_file("sherlock", "sym1");
dir.link_file("sherlock", "sym2");
let expected = "\
./sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
./sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
sym1:For the Doctor Watsons of this world, as opposed to the Sherlock
sym1:be, to a very large extent, the result of luck. Sherlock Holmes
sym2:For the Doctor Watsons of this world, as opposed to the Sherlock
sym2:be, to a very large extent, the result of luck. Sherlock Holmes
";
cmd.arg("-j1").arg("Sherlock").arg("./").arg("sym1").arg("sym2");
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/156
rgtest!(r156, |dir: Dir, mut cmd: TestCommand| {
let expected = r#"#parse('widgets/foo_bar_macros.vm')
#parse ( 'widgets/mobile/foo_bar_macros.vm' )
#parse ("widgets/foobarhiddenformfields.vm")
#parse ( "widgets/foo_bar_legal.vm" )
#include( 'widgets/foo_bar_tips.vm' )
#include('widgets/mobile/foo_bar_macros.vm')
#include ("widgets/mobile/foo_bar_resetpw.vm")
#parse('widgets/foo-bar-macros.vm')
#parse ( 'widgets/mobile/foo-bar-macros.vm' )
#parse ("widgets/foo-bar-hiddenformfields.vm")
#parse ( "widgets/foo-bar-legal.vm" )
#include( 'widgets/foo-bar-tips.vm' )
#include('widgets/mobile/foo-bar-macros.vm')
#include ("widgets/mobile/foo-bar-resetpw.vm")
"#;
dir.create("testcase.txt", expected);
cmd.arg("-N");
cmd.arg(r#"#(?:parse|include)\s*\(\s*(?:"|')[./A-Za-z_-]+(?:"|')"#);
cmd.arg("testcase.txt");
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/184
rgtest!(r184, |dir: Dir, mut cmd: TestCommand| {
dir.create(".gitignore", ".*");
dir.create_dir("foo/bar");
dir.create("foo/bar/baz", "test");
cmd.arg("test");
eqnice!("foo/bar/baz:test\n", cmd.stdout());
cmd.current_dir(dir.path().join("./foo/bar"));
eqnice!("baz:test\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/199
rgtest!(r199, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "tEsT");
eqnice!("foo:tEsT\n", cmd.arg("--smart-case").arg(r"\btest\b").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/206
rgtest!(r206, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir("foo");
dir.create("foo/bar.txt", "test");
cmd.arg("test").arg("-g").arg("*.txt");
eqnice!("foo/bar.txt:test\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/210
#[cfg(unix)]
rgtest!(r210, |dir: Dir, mut cmd: TestCommand| {
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
let badutf8 = OsStr::from_bytes(&b"foo\xffbar"[..]);
// APFS does not support creating files with invalid UTF-8 bytes.
// https://github.com/BurntSushi/ripgrep/issues/559
if dir.try_create(badutf8, "test").is_ok() {
cmd.arg("-H").arg("test").arg(badutf8);
assert_eq!(b"foo\xffbar:test\n".to_vec(), cmd.output().stdout);
}
});
// See: https://github.com/BurntSushi/ripgrep/issues/228
rgtest!(r228, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir("foo");
cmd.arg("--ignore-file").arg("foo").arg("test").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/229
rgtest!(r229, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "economie");
cmd.arg("-S").arg("[E]conomie").assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/251
rgtest!(r251, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "привет\nПривет\nПрИвЕт");
let expected = "foo:привет\nfoo:Привет\nfoo:ПрИвЕт\n";
eqnice!(expected, cmd.arg("-i").arg("привет").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/256
#[cfg(not(windows))]
rgtest!(r256, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir("bar");
dir.create("bar/baz", "test");
dir.link_dir("bar", "foo");
eqnice!("foo/baz:test\n", cmd.arg("test").arg("foo").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/256
#[cfg(not(windows))]
rgtest!(r256_j1, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir("bar");
dir.create("bar/baz", "test");
dir.link_dir("bar", "foo");
eqnice!("foo/baz:test\n", cmd.arg("-j1").arg("test").arg("foo").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/270
rgtest!(r270, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "-test");
cmd.arg("-e").arg("-test");
eqnice!("foo:-test\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/279
rgtest!(r279, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "test");
eqnice!("", cmd.arg("-q").arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/391
rgtest!(r391, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");
dir.create("lock", "");
dir.create("bar.py", "");
dir.create(".git/packed-refs", "");
dir.create(".git/description", "");
cmd.args(&[
"--no-ignore", "--hidden", "--follow", "--files",
"--glob",
"!{.git,node_modules,plugged}/**",
"--glob",
"*.{js,json,php,md,styl,scss,sass,pug,html,config,py,cpp,c,go,hs}",
]);
eqnice!("bar.py\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/405
rgtest!(r405, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir("foo/bar");
dir.create_dir("bar/foo");
dir.create("foo/bar/file1.txt", "test");
dir.create("bar/foo/file2.txt", "test");
cmd.arg("-g").arg("!/foo/**").arg("test");
eqnice!("bar/foo/file2.txt:test\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/428
#[cfg(not(windows))]
rgtest!(r428_color_context_path, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", "foo\nbar");
cmd.args(&[
"-A1", "-H", "--no-heading", "-N",
"--colors=match:none", "--color=always",
"foo",
]);
let expected = format!(
"{colored_path}:foo\n{colored_path}-bar\n",
colored_path=
"\x1b\x5b\x30\x6d\x1b\x5b\x33\x35\x6dsherlock\x1b\x5b\x30\x6d"
);
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/428
rgtest!(r428_unrecognized_style, |_: Dir, mut cmd: TestCommand| {
cmd.arg("--colors=match:style:").arg("Sherlock");
cmd.assert_err();
let output = cmd.cmd().output().unwrap();
let stderr = String::from_utf8_lossy(&output.stderr);
let expected = "\
unrecognized style attribute ''. Choose from: nobold, bold, nointense, \
intense, nounderline, underline.
";
eqnice!(expected, stderr);
});
// See: https://github.com/BurntSushi/ripgrep/issues/451
rgtest!(r451_only_matching_as_in_issue, |dir: Dir, mut cmd: TestCommand| {
dir.create("digits.txt", "1 2 3\n");
cmd.arg("--only-matching").arg(r"[0-9]+").arg("digits.txt");
let expected = "\
1
2
3
";
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/451
rgtest!(r451_only_matching, |dir: Dir, mut cmd: TestCommand| {
dir.create("digits.txt", "1 2 3\n123\n");
cmd.args(&[
"--only-matching", "--column", r"[0-9]", "digits.txt",
]);
let expected = "\
1:1:1
1:3:2
1:5:3
2:1:1
2:2:2
2:3:3
";
eqnice!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/483
rgtest!(r483_matching_no_stdout, |dir: Dir, mut cmd: TestCommand| {
dir.create("file.py", "");
cmd.arg("--quiet").arg("--files").arg("--glob").arg("*.py");
eqnice!("", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/483
rgtest!(r483_non_matching_exit_code, |dir: Dir, mut cmd: TestCommand| {
dir.create("file.rs", "");
cmd.arg("--quiet").arg("--files").arg("--glob").arg("*.py");
cmd.assert_err();
});
// See: https://github.com/BurntSushi/ripgrep/issues/493
rgtest!(r493, |dir: Dir, mut cmd: TestCommand| {
dir.create("input.txt", "peshwaship 're seminomata");
cmd.arg("-o").arg(r"\b 're \b").arg("input.txt");
assert_eq!(" 're \n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/506
rgtest!(r506_word_not_parenthesized, |dir: Dir, mut cmd: TestCommand| {
dir.create("wb.txt", "min minimum amin\nmax maximum amax");
cmd.arg("-w").arg("-o").arg("min|max").arg("wb.txt");
eqnice!("min\nmax\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/553
rgtest!(r553_switch, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
let expected = "\
sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
";
cmd.arg("-i").arg("sherlock");
eqnice!(expected, cmd.stdout());
// Repeat the `i` flag to make sure everything still works.
eqnice!(expected, cmd.arg("-i").stdout());
});
rgtest!(r553_flag, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
--
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.
";
cmd.arg("-C").arg("1").arg(r"world|attached").arg("sherlock");
eqnice!(expected, cmd.stdout());
let expected = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
and exhibited clearly, with a label attached.
";
eqnice!(expected, cmd.arg("-C").arg("0").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/568
rgtest!(r568_leading_hyphen_option_args, |dir: Dir, mut cmd: TestCommand| {
dir.create("file", "foo bar -baz\n");
cmd.arg("-e-baz").arg("-e").arg("-baz").arg("file");
eqnice!("foo bar -baz\n", cmd.stdout());
let mut cmd = dir.command();
cmd.arg("-rni").arg("bar").arg("file");
eqnice!("foo ni -baz\n", cmd.stdout());
let mut cmd = dir.command();
cmd.arg("-r").arg("-n").arg("-i").arg("bar").arg("file");
eqnice!("foo -n -baz\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/599
//
// This test used to check that we emitted color escape sequences even for
// empty matches, but with the addition of the JSON output format, clients no
// longer need to rely on escape sequences to parse matches. Therefore, we no
// longer emit useless escape sequences.
rgtest!(r599, |dir: Dir, mut cmd: TestCommand| {
dir.create("input.txt", "\n\ntest\n");
cmd.args(&[
"--color", "ansi",
"--colors", "path:none",
"--colors", "line:none",
"--colors", "match:fg:red",
"--colors", "match:style:nobold",
"--line-number",
r"^$",
"input.txt",
]);
let expected = "\
1:
2:
";
eqnice_repr!(expected, cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/693
rgtest!(r693_context_in_contextless_mode, |dir: Dir, mut cmd: TestCommand| {
dir.create("foo", "xyz\n");
dir.create("bar", "xyz\n");
cmd.arg("-C1").arg("-c").arg("--sort-files").arg("xyz");
eqnice!("bar:1\nfoo:1\n", cmd.stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/807
rgtest!(r807, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");
dir.create(".gitignore", ".a/b");
dir.create_dir(".a/b");
dir.create_dir(".a/c");
dir.create(".a/b/file", "test");
dir.create(".a/c/file", "test");
eqnice!(".a/c/file:test\n", cmd.arg("--hidden").arg("test").stdout());
});
// See: https://github.com/BurntSushi/ripgrep/issues/900
rgtest!(r900, |dir: Dir, mut cmd: TestCommand| {
dir.create("sherlock", SHERLOCK);
dir.create("pat", "");
cmd.arg("-fpat").arg("sherlock").assert_err();
});

File diff suppressed because it is too large Load Diff

View File

@@ -1,10 +1,11 @@
use std::env;
use std::error;
use std::ffi::OsStr;
use std::fmt;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::process::{self, Command};
use std::process;
use std::str::FromStr;
use std::sync::atomic::{ATOMIC_USIZE_INIT, AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;
@@ -12,60 +13,24 @@ use std::time::Duration;
static TEST_DIR: &'static str = "ripgrep-tests";
static NEXT_ID: AtomicUsize = ATOMIC_USIZE_INIT;
/// Setup an empty work directory and return a command pointing to the ripgrep
/// executable whose CWD is set to the work directory.
///
/// The name given will be used to create the directory. Generally, it should
/// correspond to the test name.
pub fn setup(test_name: &str) -> (Dir, TestCommand) {
let dir = Dir::new(test_name);
let cmd = dir.command();
(dir, cmd)
}
/// Like `setup`, but uses PCRE2 as the underlying regex engine.
pub fn setup_pcre2(test_name: &str) -> (Dir, TestCommand) {
let mut dir = Dir::new(test_name);
dir.pcre2(true);
let cmd = dir.command();
(dir, cmd)
}
/// Break the given string into lines, sort them and then join them back
/// together. This is useful for testing output from ripgrep that may not
/// always be in the same order.
pub fn sort_lines(lines: &str) -> String {
let mut lines: Vec<&str> = lines.trim().lines().collect();
lines.sort();
format!("{}\n", lines.join("\n"))
}
/// Returns true if and only if the given program can be successfully executed
/// with a `--help` flag.
pub fn cmd_exists(program: &str) -> bool {
Command::new(program).arg("--help").output().is_ok()
}
/// Dir represents a directory in which tests should be run.
/// `WorkDir` represents a directory in which tests are run.
///
/// Directories are created from a global atomic counter to avoid duplicates.
#[derive(Clone, Debug)]
pub struct Dir {
#[derive(Debug)]
pub struct WorkDir {
/// The directory in which this test executable is running.
root: PathBuf,
/// The directory in which the test should run. If a test needs to create
/// files, they should go in here. This directory is also used as the CWD
/// for any processes created by the test.
dir: PathBuf,
/// Set to true when the test should use PCRE2 as the regex engine.
pcre2: bool,
}
impl Dir {
impl WorkDir {
/// Create a new test working directory with the given name. The name
/// does not need to be distinct for each invocation, but should correspond
/// to a logical grouping of tests.
pub fn new(name: &str) -> Dir {
pub fn new(name: &str) -> WorkDir {
let id = NEXT_ID.fetch_add(1, Ordering::SeqCst);
let root = env::current_exe()
.unwrap()
@@ -77,24 +42,12 @@ impl Dir {
.join(name)
.join(&format!("{}", id));
nice_err(&dir, repeat(|| fs::create_dir_all(&dir)));
Dir {
WorkDir {
root: root,
dir: dir,
pcre2: false,
}
}
/// Use PCRE2 for this test.
pub fn pcre2(&mut self, yes: bool) {
self.pcre2 = yes;
}
/// Returns true if and only if this test is configured to use PCRE2 as
/// the regex engine.
pub fn is_pcre2(&self) -> bool {
self.pcre2
}
/// Create a new file with the given name and contents in this directory,
/// or panic on error.
pub fn create<P: AsRef<Path>>(&self, name: P, contents: &str) {
@@ -103,7 +56,6 @@ impl Dir {
/// Try to create a new file with the given name and contents in this
/// directory.
#[allow(dead_code)] // unused on Windows
pub fn try_create<P: AsRef<Path>>(
&self,
name: P,
@@ -123,19 +75,18 @@ impl Dir {
/// Create a new file with the given name and contents in this directory,
/// or panic on error.
pub fn create_bytes<P: AsRef<Path>>(&self, name: P, contents: &[u8]) {
let path = self.dir.join(&name);
nice_err(&path, self.try_create_bytes(name, contents));
let path = self.dir.join(name);
nice_err(&path, self.try_create_bytes(&path, contents));
}
/// Try to create a new file with the given name and contents in this
/// directory.
pub fn try_create_bytes<P: AsRef<Path>>(
fn try_create_bytes<P: AsRef<Path>>(
&self,
name: P,
path: P,
contents: &[u8],
) -> io::Result<()> {
let path = self.dir.join(name);
let mut file = File::create(path)?;
let mut file = File::create(&path)?;
file.write_all(contents)?;
file.flush()
}
@@ -155,22 +106,11 @@ impl Dir {
/// Creates a new command that is set to use the ripgrep executable in
/// this working directory.
///
/// This also:
///
/// * Unsets the `RIPGREP_CONFIG_PATH` environment variable.
/// * Sets the `--path-separator` to `/` so that paths have the same output
/// on all systems. Tests that need to check `--path-separator` itself
/// can simply pass it again to override it.
pub fn command(&self) -> TestCommand {
pub fn command(&self) -> process::Command {
let mut cmd = process::Command::new(&self.bin());
cmd.env_remove("RIPGREP_CONFIG_PATH");
cmd.current_dir(&self.dir);
cmd.arg("--path-separator").arg("/");
if self.is_pcre2() {
cmd.arg("--pcre2");
}
TestCommand { dir: self.clone(), cmd: cmd }
cmd
}
/// Returns the path to the ripgrep executable.
@@ -223,7 +163,6 @@ impl Dir {
/// Creates a file symlink to the src with the given target name
/// in this directory.
#[cfg(windows)]
#[allow(dead_code)] // unused on Windows
pub fn link_file<S: AsRef<Path>, T: AsRef<Path>>(
&self,
src: S,
@@ -235,54 +174,16 @@ impl Dir {
let _ = fs::remove_file(&target);
nice_err(&target, symlink_file(&src, &target));
}
}
/// A simple wrapper around a process::Command with some conveniences.
#[derive(Debug)]
pub struct TestCommand {
/// The dir used to launched this command.
dir: Dir,
/// The actual command we use to control the process.
cmd: Command,
}
impl TestCommand {
/// Returns a mutable reference to the underlying command.
pub fn cmd(&mut self) -> &mut Command {
&mut self.cmd
}
/// Add an argument to pass to the command.
pub fn arg<A: AsRef<OsStr>>(&mut self, arg: A) -> &mut TestCommand {
self.cmd.arg(arg);
self
}
/// Add any number of arguments to the command.
pub fn args<I, A>(
&mut self,
args: I,
) -> &mut TestCommand
where I: IntoIterator<Item=A>,
A: AsRef<OsStr>
{
self.cmd.args(args);
self
}
/// Set the working directory for this command.
///
/// Note that this does not need to be called normally, since the creation
/// of this TestCommand causes its working directory to be set to the
/// test's directory automatically.
pub fn current_dir<P: AsRef<Path>>(&mut self, dir: P) -> &mut TestCommand {
self.cmd.current_dir(dir);
self
}
/// Runs and captures the stdout of the given command.
pub fn stdout(&mut self) -> String {
let o = self.output();
///
/// If the return type could not be created from a string, then this
/// panics.
pub fn stdout<E: fmt::Debug, T: FromStr<Err=E>>(
&self,
cmd: &mut process::Command,
) -> T {
let o = self.output(cmd);
let stdout = String::from_utf8_lossy(&o.stdout);
match stdout.parse() {
Ok(t) => t,
@@ -296,13 +197,23 @@ impl TestCommand {
}
}
/// Pipe `input` to a command, and collect the output.
pub fn pipe(&mut self, input: &str) -> String {
self.cmd.stdin(process::Stdio::piped());
self.cmd.stdout(process::Stdio::piped());
self.cmd.stderr(process::Stdio::piped());
/// Gets the output of a command. If the command failed, then this panics.
pub fn output(&self, cmd: &mut process::Command) -> process::Output {
let output = cmd.output().unwrap();
self.expect_success(cmd, output)
}
let mut child = self.cmd.spawn().unwrap();
/// Pipe `input` to a command, and collect the output.
pub fn pipe(
&self,
cmd: &mut process::Command,
input: &str
) -> process::Output {
cmd.stdin(process::Stdio::piped());
cmd.stdout(process::Stdio::piped());
cmd.stderr(process::Stdio::piped());
let mut child = cmd.spawn().unwrap();
// Pipe input to child process using a separate thread to avoid
// risk of deadlock between parent and child process.
@@ -312,86 +223,20 @@ impl TestCommand {
write!(stdin, "{}", input)
});
let output = self.expect_success(child.wait_with_output().unwrap());
worker.join().unwrap().unwrap();
let stdout = String::from_utf8_lossy(&output.stdout);
match stdout.parse() {
Ok(t) => t,
Err(err) => {
panic!(
"could not convert from string: {:?}\n\n{}",
err,
stdout
);
}
}
}
/// Gets the output of a command. If the command failed, then this panics.
pub fn output(&mut self) -> process::Output {
let output = self.cmd.output().unwrap();
self.expect_success(output)
}
/// Runs the command and asserts that it resulted in an error exit code.
pub fn assert_err(&mut self) {
let o = self.cmd.output().unwrap();
if o.status.success() {
panic!(
"\n\n===== {:?} =====\n\
command succeeded but expected failure!\
\n\ncwd: {}\
\n\nstatus: {}\
\n\nstdout: {}\n\nstderr: {}\
\n\n=====\n",
self.cmd,
self.dir.dir.display(),
o.status,
String::from_utf8_lossy(&o.stdout),
String::from_utf8_lossy(&o.stderr)
);
}
}
/// Runs the command and asserts that its exit code matches expected exit
/// code.
pub fn assert_exit_code(&mut self, expected_code: i32) {
let code = self.cmd.output().unwrap().status.code().unwrap();
assert_eq!(
expected_code, code,
"\n\n===== {:?} =====\n\
expected exit code did not match\
\n\nexpected: {}\
\n\nfound: {}\
\n\n=====\n",
self.cmd,
expected_code,
code
let output = self.expect_success(
cmd,
child.wait_with_output().unwrap(),
);
worker.join().unwrap().unwrap();
output
}
/// Runs the command and asserts that something was printed to stderr.
pub fn assert_non_empty_stderr(&mut self) {
let o = self.cmd.output().unwrap();
if o.status.success() || o.stderr.is_empty() {
panic!(
"\n\n===== {:?} =====\n\
command succeeded but expected failure!\
\n\ncwd: {}\
\n\nstatus: {}\
\n\nstdout: {}\n\nstderr: {}\
\n\n=====\n",
self.cmd,
self.dir.dir.display(),
o.status,
String::from_utf8_lossy(&o.stdout),
String::from_utf8_lossy(&o.stderr)
);
}
}
fn expect_success(&self, o: process::Output) -> process::Output {
/// If `o` is not the output of a successful process run
fn expect_success(
&self,
cmd: &process::Command,
o: process::Output
) -> process::Output {
if !o.status.success() {
let suggest =
if o.stderr.is_empty() {
@@ -409,21 +254,81 @@ impl TestCommand {
\n\nstdout: {}\
\n\nstderr: {}\
\n\n==========\n",
suggest, self.cmd, self.dir.dir.display(), o.status,
suggest, cmd, self.dir.display(), o.status,
String::from_utf8_lossy(&o.stdout),
String::from_utf8_lossy(&o.stderr));
}
o
}
/// Runs the given command and asserts that it resulted in an error exit
/// code.
pub fn assert_err(&self, cmd: &mut process::Command) {
let o = cmd.output().unwrap();
if o.status.success() {
panic!(
"\n\n===== {:?} =====\n\
command succeeded but expected failure!\
\n\ncwd: {}\
\n\nstatus: {}\
\n\nstdout: {}\n\nstderr: {}\
\n\n=====\n",
cmd,
self.dir.display(),
o.status,
String::from_utf8_lossy(&o.stdout),
String::from_utf8_lossy(&o.stderr)
);
}
}
/// Runs the given command and asserts that its exit code matches expected
/// exit code.
pub fn assert_exit_code(
&self,
expected_code: i32,
cmd: &mut process::Command,
) {
let code = cmd.status().unwrap().code().unwrap();
assert_eq!(
expected_code, code,
"\n\n===== {:?} =====\n\
expected exit code did not match\
\n\nexpected: {}\
\n\nfound: {}\
\n\n=====\n",
cmd, expected_code, code
);
}
/// Runs the given command and asserts that something was printed to
/// stderr.
pub fn assert_non_empty_stderr(&self, cmd: &mut process::Command) {
let o = cmd.output().unwrap();
if o.status.success() || o.stderr.is_empty() {
panic!("\n\n===== {:?} =====\n\
command succeeded but expected failure!\
\n\ncwd: {}\
\n\nstatus: {}\
\n\nstdout: {}\n\nstderr: {}\
\n\n=====\n",
cmd, self.dir.display(), o.status,
String::from_utf8_lossy(&o.stdout),
String::from_utf8_lossy(&o.stderr));
}
}
}
fn nice_err<T, E: error::Error>(
path: &Path,
fn nice_err<P: AsRef<Path>, T, E: error::Error>(
path: P,
res: Result<T, E>,
) -> T {
match res {
Ok(t) => t,
Err(err) => panic!("{}: {:?}", path.display(), err),
Err(err) => {
panic!("{}: {:?}", path.as_ref().display(), err);
}
}
}