Compare commits

...

91 Commits

Author SHA1 Message Date
Andrew Gallant
acd20a803c ripgrep: remove old code 2018-08-06 10:52:53 -04:00
Andrew Gallant
f16f9cedf1 ci: test libripgrep 2018-08-06 10:52:53 -04:00
Andrew Gallant
7eb34e8b32 ripgrep: migrate to libripgrep 2018-08-06 10:52:53 -04:00
Andrew Gallant
b3769ef8f1 libripgrep: initial commit introducing libripgrep
libripgrep is not any one library, but rather, a collection of libraries
that roughly separate the following key distinct phases in a grep
implementation:

  1. Pattern matching (e.g., by a regex engine).
  2. Searching a file using a pattern matcher.
  3. Printing results.

Ultimately, both (1) and (3) are defined by de-coupled interfaces, of
which there may be multiple implementations. Namely, (1) is satisfied by
the `Matcher` trait in the `grep-matcher` crate and (3) is satisfied by
the `Sink` trait in the `grep2` crate. The searcher (2) ties everything
together and finds results using a matcher and reports those results
using a `Sink` implementation.
2018-08-06 10:52:53 -04:00
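
A minimal sketch of how these pieces compose (the trait names come from the
message above; the signatures are illustrative assumptions, not the real
`grep-matcher`/`grep2` API):

    // Illustrative only: the real traits are richer and error-aware.
    trait Matcher {
        // Report the byte range of the first match in `haystack`, if any.
        fn find(&self, haystack: &[u8]) -> Option<(usize, usize)>;
    }

    trait Sink {
        // Called for each matching line; return false to stop the search.
        fn matched(&mut self, line: &[u8]) -> bool;
    }

    // The searcher ties everything together: it finds results using a
    // matcher and reports them using a `Sink` implementation.
    fn search<M: Matcher, S: Sink>(matcher: &M, haystack: &[u8], sink: &mut S) {
        for line in haystack.split(|&b| b == b'\n') {
            if matcher.find(line).is_some() && !sink.matched(line) {
                return;
            }
        }
    }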
Andrew Gallant
e86d3d95c2 pkg: update brew tap to 0.9.0 2018-08-03 17:04:36 -04:00
Andrew Gallant
6799dcfc0e release: 0.9.0 2018-08-03 16:13:31 -04:00
Andrew Gallant
0fdab0ec5e grep-0.1.9 2018-08-03 16:12:08 -04:00
Andrew Gallant
74ec5b8932 deps: update termcolor and encoding_rs_io 2018-08-03 16:08:57 -04:00
Andrew Gallant
2913fc4cd0 tests: reduce reliance on state in tests
This commit improves the integration test setup by running tests inside
the system's temporary directory instead of within ripgrep's `target`
directory. The motivation here is to attempt to reduce the effect of
unanticipated state on ripgrep's integration tests, such as the presence
of `.gitignore` files in ripgrep's checkout directory hierarchy
(including parent directories).

This doesn't remove all possible state. For example, there's no
guarantee that the system's temporary directory isn't itself within a
git repository. Moreover, there may still be other ignore rules in the
directory tree that might impact test behavior. Fixing this seems
somewhat difficult. Conceptually, it seems like ripgrep should run each
test in its own `chroot`-like environment, but doing this in a
non-annoying and portable way (including Windows) doesn't appear to be
possible.

Another approach to take here might be to teach ripgrep itself that a
particular directory should be treated as root, and therefore, never
look at anything outside that directory. This also seems complex to
implement, but tractable. Let's see how this approach works for now.

Fixes #448, #996
2018-07-29 10:41:03 -04:00
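
A sketch of the setup this enables, using the `tempdir` crate already in
ripgrep's dependency tree (the test body is illustrative):

    extern crate tempdir;

    use std::fs::File;
    use std::io::Write;
    use tempdir::TempDir;

    #[test]
    fn no_parent_gitignore_interference() {
        // Create the test corpus inside the system temp directory so that
        // ignore files in ripgrep's checkout hierarchy (including parent
        // directories) cannot influence the results.
        let dir = TempDir::new("rg-test").expect("temp dir");
        let mut f = File::create(dir.path().join("haystack")).unwrap();
        f.write_all(b"needle\n").unwrap();
        // ... run `rg needle` with `dir.path()` as the working directory ...
    }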
Andrew Gallant
7c412bb2fa tests/style: 80 columns, dammit 2018-07-29 10:41:03 -04:00
Andrew Gallant
651a0f1ddf ignore: fix typo 2018-07-29 08:37:24 -04:00
Andrew Gallant
45473ba48f ignore/style: 80 columns, dammit 2018-07-29 08:31:04 -04:00
Andrew Gallant
0863c75a5a ignore: fix bug in matched_path_or_any_parents
This method was supposed to panic whenever the given path wasn't under
the root of the gitignore matcher. Instead of using assert!, it was using
debug_assert!. This actually caused tests to fail when running under
release mode, because the debug_assert! wouldn't trip.

Fixes #671
2018-07-29 08:30:53 -04:00
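
The distinction, in miniature (names here are illustrative):

    use std::path::Path;

    fn check_under_root(path: &Path, root: &Path) {
        // debug_assert! compiles to nothing in release builds, so the
        // intended panic never fired under `cargo test --release`.
        // assert! runs in every build profile.
        assert!(path.starts_with(root), "path not under matcher root");
    }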
Andrew Gallant
d94d99f657 ignore-0.4.3 2018-07-28 11:05:27 -04:00
Andrew Gallant
84585908ac globset-0.4.1 2018-07-28 10:59:54 -04:00
Andrew Gallant
1611c04e6f ignore: respect XDG_CONFIG_DIR/git/config
This commit updates the logic for finding the value of git's
`core.excludesFile` configuration parameter. Namely, we now check
`$XDG_CONFIG_DIR/git/config` in addition to `$HOME/.gitconfig` (where
the latter overrules the former on a knob-by-knob basis).

Fixes #995
2018-07-28 10:04:16 -04:00
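
A sketch of the lookup order described above (the standard environment
variable is spelled `XDG_CONFIG_HOME`; path handling here is simplified):

    use std::env;
    use std::path::PathBuf;

    // Candidate git config files, lowest precedence first: the XDG config
    // is read, then `$HOME/.gitconfig` overrides it knob by knob.
    fn git_config_paths() -> Vec<PathBuf> {
        let mut paths = vec![];
        if let Some(xdg) = env::var_os("XDG_CONFIG_HOME") {
            paths.push(PathBuf::from(xdg).join("git").join("config"));
        }
        if let Some(home) = env::var_os("HOME") {
            paths.push(PathBuf::from(home).join(".gitconfig"));
        }
        paths
    }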
dana
d857ad6ed3 complete: add --no-pre, improve --pre/--search-zip exclusivity 2018-07-24 07:21:18 -04:00
Andrew Gallant
4dd2f8e40e deps: update atty and winapi
This updates atty and winapi to their latest versions, including the bug
fix in atty that allows it to work with winapi 0.3.5.
2018-07-22 13:07:48 -04:00
Andrew Gallant
dca8110da2 ripgrep: when given no patterns, don't match
Generally speaking, ripgrep prevents the case of not having any patterns
via its arg parsing. However, it is possible for users to provide a file
of patterns via the `-f` flag. If that file is empty, then ripgrep has
nothing to search for and therefore should not ever produce any match.

One way of fixing this might be to replace the absence of patterns with
a pattern that can never match, but this still requires opening and
searching through every file, which is quite a waste. Instead, we detect
this case explicitly and quit early.

Fixes #900
2018-07-22 12:07:18 -04:00
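
The early-quit check is tiny; a sketch (the exit-code convention matches the
one adopted for #948 below):

    // With an empty `-f` pattern file there are no patterns, so nothing
    // can ever match: skip the traversal and report "no match" at once.
    fn run(patterns: &[String]) -> i32 {
        if patterns.is_empty() {
            return 1; // exit code for "no matches found"
        }
        // ... open and search files as usual ...
        0
    }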
Andrew Gallant
0d11497d21 tests: be looser with gzip failure
Don't expect the exact error message. Instead, just ask that the error
message exist and be non-empty.

Fixes #903
2018-07-22 11:08:16 -04:00
Andrew Gallant
22ac2e056e ripgrep: stop early when --files --quiet is used
This commit tweaks the implementation of the --files flag to stop early
when --quiet is provided.

Fixes #907
2018-07-22 11:05:24 -04:00
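
A sketch of the short-circuit, using the `ignore` crate's walker, which
ripgrep drives for traversal:

    extern crate ignore;

    // With `--files --quiet`, the only question is whether at least one
    // file would have been listed, so stop at the first hit.
    fn files_quiet(walk: ignore::Walk) -> bool {
        for result in walk {
            if let Ok(entry) = result {
                if entry.file_type().map_or(false, |ft| ft.is_file()) {
                    return true; // exit 0 without printing anything
                }
            }
        }
        false
    }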
Andrew Gallant
03af61fc7b ripgrep: don't skip tar archives
This removes logic from the decompressor for skipping tar archives. This
logic was originally added under the assumption that we probably want to
avoid the cost of reading them. However, this is generally inconsistent
with how ripgrep treats files like tar archives: it should search them
and do binary detection like normal.

Fixes #918
2018-07-22 10:59:09 -04:00
Andrew Gallant
560dffd247 ripgrep: add --no-ignore-global flag
This commit adds a new --no-ignore-global flag that permits disabling
the use of global gitignore filtering. Global gitignores are generally
found in `$HOME/.config/git/ignore`, but its location can be configured
via git's `core.excludesFile` option.

Closes #934
2018-07-22 10:42:32 -04:00
Andrew Gallant
e65ca21a6c ignore: only respect .gitignore in git repos
This commit fixes an interesting bug in the `ignore` crate where it
would basically respect any `.gitignore` file anywhere (including global
gitignores in `~/.config/git/ignore`), regardless of whether we were
searching in a git repository or not. This commit rectifies that
behavior to only respect gitignore rules when in a git repo.

The key change here is to move the logic of whether to traverse parents
into the directory matcher rather than putting the onus on the directory
traverser. In particular, we now need to traverse parent directories in
more cases than we previously did, since we need to determine whether
we're in a git repository or not.

Fixes #934
2018-07-22 10:33:23 -04:00
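
A sketch of the parent traversal this requires (simplified; the real matcher
also handles the submodule case fixed in 507801c1f2 below):

    use std::path::Path;

    // Walk up from `start` looking for a git repository root; gitignore
    // rules should only apply when this returns true.
    fn in_git_repo(start: &Path) -> bool {
        let mut dir = start;
        loop {
            if dir.join(".git").exists() {
                return true;
            }
            match dir.parent() {
                Some(parent) => dir = parent,
                None => return false,
            }
        }
    }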
Andrew Gallant
6771626553 ripgrep: improve usage documentation
This shows an example for reading stdin.

Fixes #951
2018-07-22 09:38:49 -04:00
Andrew Gallant
b9c922be53 ripgrep: better --path-separator error message
This commit improves the error message when --path-separator fails. Namely,
it prints the separator it got and also prints a notice for Windows users
for common failure modes.

Fixes #957
2018-07-22 09:33:03 -04:00
Andrew Gallant
7a44cad599 deps: pin winapi to 0.3.4
winapi 0.3.5 changed how it represents some of its structs, which caused
a bug to surface in atty that prevents tty detection on Windows. atty
has an open PR to fix this: https://github.com/softprops/atty/pull/28

Until a new release of atty, we pin winapi to a version that works.
2018-07-22 09:31:22 -04:00
Andrew Gallant
6dc09c5b1b ripgrep: add --no-pre switch
This switch explicitly disables the --pre behavior. An empty string will
also disable --pre behavior, but actually uttering an empty flag value
is quite awkward, so we provide an explicit switch to do the same thing.

Thanks to @c-blake for pointing this out.
2018-07-21 20:57:27 -04:00
Andrew Gallant
209a125ea2 ripgrep: replace decoder with encoding_rs_io
This commit mostly moves the transcoder implementation to its own
crate: https://github.com/BurntSushi/encoding_rs_io

The new crate adds clear documentation and cleans up the implementation
to fully implement the contract of io::Read.
2018-07-21 20:36:32 -04:00
Andrew Gallant
090216cf00 ripgrep: reformat --pre warning 2018-07-21 17:53:55 -04:00
Andrew Gallant
c96a358593 ripgrep: add warning to --pre flag
The --pre flag can result in a pretty large performance penalty, so put
a warning in the flag documentation. This warning is important because a
flag like this could easily wind up in a user's configuration file.
2018-07-21 17:50:54 -04:00
Charles Blake
231456c409 ripgrep: add --pre flag
The preprocessor flag accepts a command program and executes this
program for every input file that is searched. Instead of searching the
file directly, ripgrep will instead search the stdout contents of the
program.

Closes #978, Closes #981
2018-07-21 17:25:12 -04:00
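
The core of a preprocessor invocation looks roughly like this (a sketch using
std::process; the real implementation hands the child's stdout to the
searcher as a stream rather than buffering it):

    use std::io::{self, Read};
    use std::process::{Command, Stdio};

    // Run `pre <path>` and return its stdout; the search then runs over
    // these bytes instead of the file's contents.
    fn preprocess(pre: &str, path: &str) -> io::Result<Vec<u8>> {
        let mut child = Command::new(pre)
            .arg(path)
            .stdout(Stdio::piped())
            .spawn()?;
        let mut out = Vec::new();
        child.stdout.take().expect("piped stdout").read_to_end(&mut out)?;
        child.wait()?;
        Ok(out)
    }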
Kalle Samuels
1d09d4d31b ripgrep: add support for lz4 decompression
This uses the lz4 binary for decompression.

Closes #898
2018-07-21 16:26:39 -04:00
Andrew Gallant
02f08f3800 changelog: updates
We continue our quest to update the CHANGELOG incrementally.
2018-07-21 14:23:38 -04:00
Mateusz Mikuła
470afa1bd7 snap: build without wrappers
Wrappers usually define things required for snaps to work, like the library
path. ripgrep, being a static binary, doesn't need them at all.

This gives a minor speed-up in scenarios where ripgrep is run multiple
times, and can be considered good practice for static binaries.

PR #979
2018-07-21 13:28:27 -04:00
phiresky
aa2ce39d14 ignore: fix has_any_ignore_rules for explicit ignores
When building an ignore::WalkBuilder by disabling all standard filters
and adding a custom global ignore file, the ignore file is not used. Example:

    let mut walker = ignore::WalkBuilder::new(dir);
    walker.standard_filters(false);
    walker.add_ignore(myfile);
    // The walk below should honor `myfile`, but its rules were dropped.
    for result in walker.build() { /* ... */ }

This makes it impossible to use the ignore crate to walk a directory
with only custom ignore files. Very similar to issue #800 (fixed in
b71a110).

PR #988
2018-07-21 13:26:54 -04:00
Andrew Gallant
d11a3b3377 crates.io: use 'OR' instead of '/'
Fixes #987
2018-07-21 13:25:39 -04:00
Andrew Gallant
d09e2f6af1 globset: clarify documentation on regex method
This makes it clear that the `bytes` API of the regex crate should be
used instead of the Unicode API.

Fixes #985
2018-07-21 13:23:46 -04:00
Andrew Gallant
7b6af5a177 deps: update regex to 1.0.2
And also update to regex-syntax 0.6.2.
2018-07-18 09:29:04 -04:00
Andrew Gallant
9bd1aa1c04 readme: update rogue 1.20 reference 2018-07-17 20:41:34 -04:00
Andrew Gallant
1393ce4b6b deps: update all transitive dependencies
This updates all remaining transitive dependencies. Most changes appear
minor and there appear to be no minimum Rust version conflicts. Yay!
2018-07-17 20:34:03 -04:00
Andrew Gallant
8e358ee056 deps: bump various dependencies
Nothing major here. All patch releases. This should bring us completely
up to date with all direct dependencies.
2018-07-17 20:33:13 -04:00
Andrew Gallant
5b5f4e74d9 deps: bump encoding_rs to 0.8
This brings in performance improvements.
2018-07-17 20:29:41 -04:00
Andrew Gallant
7829850bf0 deps: bump minimum Rust to 1.23.0 from 1.20.0
1.23.0 is the first Rust release of 2018 and is around half a year old,
which seems old enough to move to. This also lets us bring in encoding_rs
0.8, which includes performance optimizations.
2018-07-17 20:29:20 -04:00
Andrew Gallant
06b66efd59 deps: get rid of unstable feature
This was introduced as a temporary measure for dealing with the regex
crate's unstable feature, but it was never included in a release of
ripgrep. Thus, we remove it. The regex crate will now automatically enable
SIMD optimizations when available.
2018-07-17 20:27:04 -04:00
Andrew Gallant
7e5a590276 grep: small literal detection fix
This commit tweaks the inner literal detection heuristic such that if it
comes up with any literal that is all whitespace, then it's likely a bad
literal to look for since it's so common. Therefore, we simply reject the
inner literal optimization in this case and let the regex engine do its
thang.
2018-07-17 20:27:04 -04:00
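
The rejection test is essentially a one-liner (a sketch; the real heuristic
operates on the extracted literal set):

    // An inner literal that is entirely whitespace is too common to be a
    // useful prefilter, so skip the optimization and let the regex engine
    // run unaided.
    fn reject_inner_literal(lit: &[u8]) -> bool {
        !lit.is_empty() && lit.iter().all(|&b| (b as char).is_whitespace())
    }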
Andrew Gallant
d17ca45063 deps: update termcolor to 1.0.0 2018-07-17 18:37:02 -04:00
Andrew Gallant
df469fe1b4 termcolor: moved to its own repository
We also move wincolor with it.

Fixes #924
2018-07-17 18:03:38 -04:00
Andrew Gallant
b64475aeac ignore: permit use of env::home_dir
Upstream deprecated env::home_dir because of minor bugs in some corner
cases. We should probably eventually migrate to a correct implementation
in the `dirs` crate, but using the buggy version is just fine for now.
2018-07-16 17:40:02 -04:00
dana
fca9709d94 complete: improve zsh completion
- Use groups with _arguments
- Conditionally complete 'negation' options (--messages, --no-column, &c.)
- Improve option exclusivity
- Improve option descriptions
- Improve completion of colour 'whens'
- Improve completion of colour specs
- Remove some unnecessary work-arounds
- Use more idiomatic conventions
2018-07-06 10:23:20 -04:00
dana
62b4813b8a ci: minor improvements to test_complete.sh 2018-07-06 10:23:20 -04:00
dana
b38b101c77 ripgrep: rename --maxdepth to --max-depth
We keep the old `--maxdepth` spelling to preserve backward
compatibility.

PR #967
2018-06-25 20:22:09 -04:00
dana
ac90316e35 ripgrep: add --fixed-strings flag
Fixes #964

PR #965
2018-06-25 17:02:02 -04:00
Andrew Gallant
a6467f880a ripgrep: disable autotests
This keeps us working in Rust 2018.

Fixes #923
2018-06-23 20:49:05 -04:00
Andrew Gallant
004bb35694 ripgrep/printer: fix small performance regression
This commit removes an unconditional extra regex search that is in fact
not always necessary. This can result in a 2x performance improvement in
cases where ripgrep reports many matches.

The fix itself isn't ideal, but we continue to punt on cleaning up the
printer until it is rewritten for libripgrep, which is happening Real
Soon Now.

Fixes #955
2018-06-23 20:49:05 -04:00
Andrew Gallant
cd6c190967 ripgrep: use new BufferedStandardStream from termcolor
Specifically, this will use a buffered writer when not printing to a tty.
This fixes a long-standing performance regression where ripgrep would
slow down dramatically if it needed to report a lot of matches.

Fixes #955
2018-06-23 20:49:05 -04:00
Andrew Gallant
d5139228e5 termcolor: add BufferedStandardStream
This commit adds a new type, BufferedStandardStream, which emulates the
StandardStream API (sans `lock`), but will internally use a buffered
writer.

To achieve this, we add a new default method to the WriteColor trait that
indicates whether the underlying writer must synchronously communicate
with an API to control coloring (e.g., the Windows console API). The new
BufferedStandardStream then uses this default method to determine how
eager it should be to flush its buffer before employing color settings.
This should have basically zero overhead when using ANSI color escape
sequences.
2018-06-23 20:49:05 -04:00
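
Usage from the caller's side (these are the real termcolor 1.0 type names;
the tty check via `atty` mirrors what ripgrep does):

    extern crate atty;
    extern crate termcolor;

    use std::io::Write;
    use termcolor::{BufferedStandardStream, ColorChoice, StandardStream};

    // Use the plain stream when stdout is a tty and the block-buffered
    // one otherwise, mirroring the fix for #955.
    fn stdout() -> Box<Write> {
        if atty::is(atty::Stream::Stdout) {
            Box::new(StandardStream::stdout(ColorChoice::Auto))
        } else {
            Box::new(BufferedStandardStream::stdout(ColorChoice::Auto))
        }
    }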
Stepan Koltsov
bb16ba6311 ignore/types: add Bazel
`BUILD`, `WORKSPACE` are mentioned here:
https://docs.bazel.build/versions/master/build-ref.html

`*.bzl` is mentioned here:
https://docs.bazel.build/versions/master/skylark/concepts.html

PR #960
2018-06-23 15:25:43 -04:00
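
The `ignore` crate exposes the same mechanism publicly; a sketch of defining
this type by hand with the real `TypesBuilder` API:

    extern crate ignore;

    use ignore::types::{Types, TypesBuilder};

    fn bazel_types() -> Types {
        let mut builder = TypesBuilder::new();
        builder.add("bazel", "BUILD").unwrap();
        builder.add("bazel", "WORKSPACE").unwrap();
        builder.add("bazel", "*.bzl").unwrap();
        builder.select("bazel"); // equivalent to `rg --type bazel`
        builder.build().unwrap()
    }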
Andrew Gallant
5e85f2577b deps: update to regex 1.0.1
This causes SIMD to kick in automatically when compiling with stable
Rust 1.27+.

We also update the README to describe the current state of things.

Thanks to @hartley for pointing this out:
https://twitter.com/hartley/status/1009950392862453760
2018-06-21 20:14:23 -04:00
Jon Surrell
ca23a170f7 ripgrep: use exit code 2 to indicate error
Exit code 1 was shared to indicate both "no results" and "error." Use
status code 2 to indicate errors, similar to grep's behavior.

Fixes #948 

PR #954
2018-06-19 07:41:44 -04:00
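
The resulting convention, as a sketch:

    use std::process;

    // grep-style exit codes: 0 = at least one match, 1 = no matches,
    // 2 = an error occurred.
    fn exit(matched: bool, errored: bool) -> ! {
        process::exit(if errored { 2 } else if matched { 0 } else { 1 })
    }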
Mårten Kongstad
223d7d9846 ignore/types: add Android makefile types
The Android platform is gradually moving from Makefiles (*.mk) to
Blueprint files (*.bp). Add support for both. See [1] for more
information.

1. https://android.googlesource.com/platform/build/blueprint/+/master-soong
2018-06-15 08:29:53 -04:00
Mårten Kongstad
e4bce86111 ignore/types: add AIDL file type
Add support for Android Interface Definition Language files (*.aidl).
See [1] for more information.

1. https://developer.android.com/guide/components/aidl
2018-06-15 08:29:53 -04:00
Tim Kilbourn
15fa77cdb3 ignore/types: add FIDL type
FIDL is the Fuchsia Interface Definition Language
https://fuchsia.googlesource.com/zircon/+/HEAD/docs/fidl/index.md
2018-06-06 07:42:08 -04:00
Ahmed El Gabri
c3f97513d6 doc: sync config file examples
This brings the examples for configuration files in sync between
the man page and the guide.
2018-05-25 06:42:05 -04:00
George Plymale II
c21b9b20cf doc: add MacPorts installation instructions 2018-05-24 13:01:28 -04:00
Khalid Jebbari
ce145c6a2a doc: update crates.io badge
This is the official markdown snippet from shields.io
2018-05-24 06:46:08 -04:00
Wesley Moore
a383d5c4e9 doc: add BSD packages to README 2018-05-14 06:45:39 -04:00
Michael Hay
64317bda9f doc: fix broken link to RegexSet docs 2018-05-08 12:03:47 -04:00
Stephen E. Baker
7f3a0f0828 ignore/types: add jsp extension to java type 2018-05-08 12:03:19 -04:00
Bastien Orivel
49f36c7dcd deps: update regex to 1.0
We retain the `simd-accel` feature on globset for backwards
compatibility, but will remove it in the next semver release.
2018-05-07 13:07:30 -04:00
Garrett Squire
83b4fdb8d6 ignore/doc: improve docs for case_insensitive
This commit updates the OverrideBuilder and GitignoreBuilder docs
for the case_insensitive method, denoting that it must be called before
adding any patterns.
2018-05-06 19:03:11 -04:00
Elliott Slaughter
8b57d78b96 doc: update suggested version for ignore 2018-05-06 19:02:08 -04:00
Bram Geron
a2d8c49d6f ignore/types: add shorthand 'hs' for Haskell 2018-05-03 09:05:00 -04:00
Andrew Gallant
6ffb4b7466 doc: go away snap
Snap has caused a number of nonsensical bug reports, and not even the
`--classic` flag seems capable of fixing them. Therefore, remove snap
from the README and put in a special line in the ISSUE_TEMPLATE about
snap.

Fixes #902
2018-04-30 15:25:51 -04:00
Andrew Gallant
198d1fede9 ignore/types: fix typo in puppet glob
Thanks @CYBAI for noticing this!

See #899
2018-04-29 09:30:44 -04:00
Zach Crownover
667b9a7d62 ignore/types: add puppet
Puppet is primarily written in its own format of .pp files, but
custom facts and functions are often written in Ruby. The templating
language is ERB and so this will allow scanning of any of the three
most commonly used formats for Puppet specific things.
2018-04-29 09:21:48 -04:00
Andrew Gallant
1f528f1641 doc: update crates.io description
This brings it in line with the README.
2018-04-26 17:04:35 -04:00
Andrew Gallant
ab64da73ab ignore: speed up Gitignore::empty
This commit makes Gitignore::empty a bit faster by avoiding allocation
and manually specializing the implementation instead of routing it through
the GitignoreBuilder.

This helps improve uses of ripgrep that traverse *many* directories, and
in particular, when the use of ignores is disabled via command line
switches.

Fixes #835, Closes #836
2018-04-24 11:19:03 -04:00
Jonathan Klimt
1266de3d4c ignore/types: add verilog 2018-04-24 09:00:03 -04:00
Andrew Gallant
bf51058eb2 tests: fix tests on Windows
A bug in the atty crate was previously masking a problem with the
integration tests on Windows. Namely, the bug in atty resulted in
atty::is(Stdin) returning true if we couldn't get the file name for the
stdin stream. This in turn caused tests like `rg foo` to search the CWD,
which was the intended behavior. However, once the atty bug was fixed,
atty::is(Stdin) no longer returned true, causing `rg foo` searches to
fail.

On Unix-like systems, the atty behavior has always been correct.
However, on Unix-like systems we have a decent way of detecting whether
stdin is readable or not. If it isn't---which is the case in the
integration tests---then we fall back to searching the CWD. On Windows
however, we haven't yet implemented anything to detect whether stdin is
readable or not, so we must always assume that it is. Therefore, we
never get the "go ahead" to search the CWD and the tests fail.

Most of the tests are written to search the CWD explicitly, but there
were a few stragglers that didn't.

This isn't great, and we should try to figure out how to do better stdin
detection on Windows.
2018-04-23 20:37:59 -04:00
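
The tty side of that heuristic is a one-liner with `atty` (a sketch; the "is
stdin readable" probe on Unix is separate, platform-specific code):

    extern crate atty;

    // If stdin is a tty, there is nothing piped in, so fall back to
    // searching the current working directory.
    fn should_search_cwd() -> bool {
        atty::is(atty::Stream::Stdin)
    }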
Andrew Gallant
3dc6fe6f05 output: remove unnecessary mut binding 2018-04-23 20:06:03 -04:00
Andrew Gallant
06438d5360 changelog: update 2018-04-23 20:01:57 -04:00
Andrew Gallant
ae6f871491 output: remove --line-number-width flag
This commit does what no software project has ever done before: we've
outright removed a flag with no possible way to recapture its
functionality.

This flag presents numerous problems in that it never really worked well
in the first place, and completely falls over when ripgrep uses the
--no-heading output format. Well meaning users want ripgrep to fix this
by getting into the alignment business by buffering all output, but that
is a line that I refuse to cross.

Fixes #795
2018-04-23 19:57:22 -04:00
Andrew Gallant
ed059559cd deps: update to atty 0.2.9
https://github.com/softprops/atty/pull/25 was merged, so we can upgrade.
2018-04-23 19:32:39 -04:00
Andrew Gallant
b75526bd7f output: add --no-column flag
This disables columns in the output if they were otherwise enabled.

Fixes #880
2018-04-23 19:26:58 -04:00
Andrew Gallant
507801c1f2 ignore: support .git directory OR file
This improves support for submodules, which seem to use a '.git' file
instead of a '.git' directory to indicate a worktree.

Fixes #893
2018-04-23 18:33:25 -04:00
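
The check itself is small (a sketch):

    use std::path::Path;

    // A submodule worktree marks itself with a `.git` *file* that points
    // at the real git dir, so accept a file or a directory here.
    fn is_git_root(dir: &Path) -> bool {
        let dotgit = dir.join(".git");
        dotgit.is_dir() || dotgit.is_file()
    }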
Andrew Gallant
2a9d007261 complete: add --no-ignore-messages 2018-04-23 18:29:02 -04:00
Andrew Gallant
0ee0b160b5 logging: add new --no-ignore-messages flag
The new --no-ignore-messages flag permits suppressing errors related to
parsing .gitignore or .ignore files. These error messages can be somewhat
annoying since they can surface from repositories that one has no control
over.

Fixes #646
2018-04-23 18:18:44 -04:00
Andrew Gallant
b4781e2f91 doc: more specific docs for --no-messages
This makes it clear that the --no-messages flag doesn't actually
suppress all error messages, and is therefore not equivalent to
redirecting stderr to /dev/null.

See also: #860
2018-04-23 18:07:57 -04:00
Jeremy Day
8cb03941e6 doc: add search and replace faq
Closes #870
2018-04-23 17:45:26 -04:00
Andrew Gallant
6b15ce2342 deps: update remove_dir_all 2018-04-21 12:13:16 -04:00
116 changed files with 20329 additions and 21768 deletions

.travis.yml

@@ -16,6 +16,7 @@ addons:
     - zsh
     # Needed for testing decompression search.
     - xz-utils
+    - liblz4-tool
 matrix:
   fast_finish: true
   include:
@@ -59,13 +60,13 @@ matrix:
   # Minimum Rust supported channel. We enable these to make sure ripgrep
   # continues to work on the advertised minimum Rust version.
   - os: linux
-    rust: 1.20.0
+    rust: 1.23.0
     env: TARGET=x86_64-unknown-linux-gnu
   - os: linux
-    rust: 1.20.0
+    rust: 1.23.0
    env: TARGET=x86_64-unknown-linux-musl
   - os: linux
-    rust: 1.20.0
+    rust: 1.23.0
     env: TARGET=arm-unknown-linux-gnueabihf GCC_VERSION=4.8
     addons:
       apt:
@@ -98,6 +99,7 @@ branches:
   only:
     # Pushes and PR to the master branch
     - master
+    - ag/libripgrep
    # Ruby regex to match tags. Required, or travis won't trigger deploys when
    # a new tag is pushed.
    - /^\d+\.\d+\.\d+.*$/

CHANGELOG.md

@@ -1,10 +1,17 @@
-0.9.0 (TBD)
-===========
-This is a new minor version release of ripgrep that mostly contains bug fixes.
+0.9.0 (2018-08-03)
+==================
+This is a new minor version release of ripgrep that contains some minor new
+features and a panoply of bug fixes.
 
-Releases provided on Github for `x86` and `x86_64` will now work on all target
-CPUs, and will also automatically take advantage of features found on modern
-CPUs (such as AVX2) for additional optimizations.
+Releases provided on Github for `x86_64` will now work on all target CPUs, and
+will also automatically take advantage of features found on modern CPUs (such
+as AVX2) for additional optimizations.
+
+This release increases the **minimum supported Rust version** from 1.20.0 to
+1.23.0.
+
+It is anticipated that the next release of ripgrep (0.10.0) will provide
+multi-line search support and a JSON output format.
 
 **BREAKING CHANGES**:
 
@@ -17,17 +24,43 @@ CPUs (such as AVX2) for additional optimizations.
 * Octal syntax is no longer supported. ripgrep previously accepted expressions
   like `\1` as syntax for matching `U+0001`, but ripgrep will now report an
   error instead.
+* The `--line-number-width` flag has been removed. Its functionality was not
+  carefully considered with all ripgrep output formats.
+  See [#795](https://github.com/BurntSushi/ripgrep/issues/795) for more
+  details.
 
 Feature enhancements:
 
+* Added or improved file type filtering for Android, Bazel, Fuschia, Haskell,
+  Java and Puppet.
 * [FEATURE #411](https://github.com/BurntSushi/ripgrep/issues/411):
   Add a `--stats` flag, which emits aggregate statistics after search results.
+* [FEATURE #646](https://github.com/BurntSushi/ripgrep/issues/646):
+  Add a `--no-ignore-messages` flag, which suppresses parse errors from reading
+  `.ignore` and `.gitignore` files.
 * [FEATURE #702](https://github.com/BurntSushi/ripgrep/issues/702):
   Support `\u{..}` Unicode escape sequences.
 * [FEATURE #812](https://github.com/BurntSushi/ripgrep/issues/812):
-  Add `-b/--byte-offset` flag that reports byte offset of each matching line.
+  Add `-b/--byte-offset` flag that shows the byte offset of each matching line.
 * [FEATURE #814](https://github.com/BurntSushi/ripgrep/issues/814):
   Add `--count-matches` flag, which is like `--count`, but for each match.
+* [FEATURE #880](https://github.com/BurntSushi/ripgrep/issues/880):
+  Add a `--no-column` flag, which disables column numbers in the output.
+* [FEATURE #898](https://github.com/BurntSushi/ripgrep/issues/898):
+  Add support for `lz4` when using the `-z/--search-zip` flag.
+* [FEATURE #924](https://github.com/BurntSushi/ripgrep/issues/924):
+  `termcolor` has moved to its own repository:
+  https://github.com/BurntSushi/termcolor
+* [FEATURE #934](https://github.com/BurntSushi/ripgrep/issues/934):
+  Add a new flag, `--no-ignore-global`, that permits disabling global
+  gitignores.
+* [FEATURE #967](https://github.com/BurntSushi/ripgrep/issues/967):
+  Rename `--maxdepth` to `--max-depth` for consistency. Keep `--maxdepth` for
+  backwards compatibility.
+* [FEATURE #978](https://github.com/BurntSushi/ripgrep/issues/978):
+  Add a `--pre` option to filter inputs with an arbitrary program.
+* [FEATURE fca9709d](https://github.com/BurntSushi/ripgrep/commit/fca9709d):
+  Improve zsh completion.
 
 Bug fixes:
 
@@ -41,14 +74,47 @@ Bug fixes:
   Show comprehensible error messages for regexes like `\s*{`.
 * [BUG #526](https://github.com/BurntSushi/ripgrep/issues/526):
   Support backslash escapes in globs.
+* [BUG #795](https://github.com/BurntSushi/ripgrep/issues/795):
+  Fix problems with `--line-number-width` by removing it.
 * [BUG #832](https://github.com/BurntSushi/ripgrep/issues/832):
   Clarify usage instructions for `-f/--file` flag.
+* [BUG #835](https://github.com/BurntSushi/ripgrep/issues/835):
+  Fix small performance regression while crawling very large directory trees.
 * [BUG #851](https://github.com/BurntSushi/ripgrep/issues/851):
   Fix `-S/--smart-case` detection once and for all.
 * [BUG #852](https://github.com/BurntSushi/ripgrep/issues/852):
   Be robust with respect to `ENOMEM` errors returned by `mmap`.
 * [BUG #853](https://github.com/BurntSushi/ripgrep/issues/853):
-  Upgrade `grep` crate to `regex-syntax 0.5.0`.
+  Upgrade `grep` crate to `regex-syntax 0.6.0`.
+* [BUG #893](https://github.com/BurntSushi/ripgrep/issues/893):
+  Improve support for git submodules.
+* [BUG #900](https://github.com/BurntSushi/ripgrep/issues/900):
+  When no patterns are given, ripgrep should never match anything.
+* [BUG #907](https://github.com/BurntSushi/ripgrep/issues/907):
+  ripgrep will now stop traversing after the first file when `--quiet --files`
+  is used.
+* [BUG #918](https://github.com/BurntSushi/ripgrep/issues/918):
+  Don't skip tar archives when `-z/--search-zip` is used.
+* [BUG #934](https://github.com/BurntSushi/ripgrep/issues/934):
+  Don't respect gitignore files when searching outside git repositories.
+* [BUG #948](https://github.com/BurntSushi/ripgrep/issues/948):
+  Use exit code 2 to indicate error, and use exit code 1 to indicate no
+  matches.
+* [BUG #951](https://github.com/BurntSushi/ripgrep/issues/951):
+  Add stdin example to ripgrep usage documentation.
+* [BUG #955](https://github.com/BurntSushi/ripgrep/issues/955):
+  Use buffered writing when not printing to a tty, which fixes a performance
+  regression.
+* [BUG #957](https://github.com/BurntSushi/ripgrep/issues/957):
+  Improve the error message shown for `--path separator /` in some Windows
+  shells.
+* [BUG #964](https://github.com/BurntSushi/ripgrep/issues/964):
+  Add a `--no-fixed-strings` flag to disable `-F/--fixed-strings`.
+* [BUG #988](https://github.com/BurntSushi/ripgrep/issues/988):
+  Fix a bug in the `ignore` crate that prevented the use of explicit ignore
+  files after disabling all other ignore rules.
+* [BUG #995](https://github.com/BurntSushi/ripgrep/issues/995):
+  Respect `$XDG_CONFIG_DIR/git/config` for detecting `core.excludesFile`.
 
 0.8.1 (2018-02-20)

Cargo.lock (generated, 349 lines changed)

Summary of the Cargo.lock changes: ripgrep 0.8.1 → 0.9.0; grep 0.1.8 → 0.2.0,
now composed of the new grep-matcher 0.0.1, grep-printer 0.0.1, grep-regex
0.0.1 and grep-searcher 0.0.1 crates; globset 0.4.0 → 0.4.1; ignore 0.4.2 →
0.4.3. New transitive dependencies for the libripgrep split and JSON printing:
base64 0.9.2, byteorder 1.2.3, dtoa 0.4.3, encoding_rs_io 0.1.1, itoa 0.4.2,
proc-macro2 0.4.9, quote 0.6.3, safemem 0.2.0, serde 1.0.70, serde_derive
1.0.70, serde_json 1.0.24, syn 0.14.4, unicode-xid 0.1.0. Version bumps:
aho-corasick 0.6.4 → 0.6.6, atty 0.2.6 → 0.2.11, bitflags 1.0.1 → 1.0.3,
cfg-if 0.1.2 → 0.1.4, clap 2.31.2 → 2.32.0, encoding_rs 0.7.2 → 0.8.4,
lazy_static 1.0.0 → 1.0.2, libc 0.2.40 → 0.2.42, log 0.4.1 → 0.4.3,
redox_syscall 0.1.37 → 0.1.40, regex 0.2.10 → 1.0.2, regex-syntax 0.5.5 →
0.6.2, remove_dir_all 0.5.0 → 0.5.1, simd 0.2.1 → 0.2.2, termcolor 0.3.6 →
1.0.1, textwrap 0.9.0 → 0.10.0, unicode-width 0.1.4 → 0.1.5, winapi 0.3.4 →
0.3.5, wincolor 0.1.6 → 1.0.0. ripgrep's direct dependencies on bytecount,
encoding_rs, libc, memchr and memmap move into the new grep-* crates.
"checksum glob 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)" = "8be18de09a56b60ed0edf84bc9df007e30040691af7acd1c41874faac5895bfb" "checksum glob 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)" = "8be18de09a56b60ed0edf84bc9df007e30040691af7acd1c41874faac5895bfb"
"checksum lazy_static 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "c8f31047daa365f19be14b47c29df4f7c3b581832407daabe6ae77397619237d" "checksum itoa 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)" = "5adb58558dcd1d786b5f0bd15f3226ee23486e24b7b58304b60f64dc68e62606"
"checksum libc 0.2.40 (registry+https://github.com/rust-lang/crates.io-index)" = "6fd41f331ac7c5b8ac259b8bf82c75c0fb2e469bbf37d2becbba9a6a2221965b" "checksum lazy_static 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "fb497c35d362b6a331cfd94956a07fc2c78a4604cdbee844a81170386b996dd3"
"checksum log 0.4.1 (registry+https://github.com/rust-lang/crates.io-index)" = "89f010e843f2b1a31dbd316b3b8d443758bc634bed37aabade59c686d644e0a2" "checksum libc 0.2.42 (registry+https://github.com/rust-lang/crates.io-index)" = "b685088df2b950fccadf07a7187c8ef846a959c142338a48f9dc0b94517eb5f1"
"checksum log 0.4.3 (registry+https://github.com/rust-lang/crates.io-index)" = "61bd98ae7f7b754bc53dca7d44b604f733c6bba044ea6f41bc8d89272d8161d2"
"checksum memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "796fba70e76612589ed2ce7f45282f5af869e0fdd7cc6199fa1aa1f1d591ba9d" "checksum memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "796fba70e76612589ed2ce7f45282f5af869e0fdd7cc6199fa1aa1f1d591ba9d"
"checksum memmap 0.6.2 (registry+https://github.com/rust-lang/crates.io-index)" = "e2ffa2c986de11a9df78620c01eeaaf27d94d3ff02bf81bfcca953102dd0c6ff" "checksum memmap 0.6.2 (registry+https://github.com/rust-lang/crates.io-index)" = "e2ffa2c986de11a9df78620c01eeaaf27d94d3ff02bf81bfcca953102dd0c6ff"
"checksum num_cpus 1.8.0 (registry+https://github.com/rust-lang/crates.io-index)" = "c51a3322e4bca9d212ad9a158a02abc6934d005490c054a2778df73a70aa0a30" "checksum num_cpus 1.8.0 (registry+https://github.com/rust-lang/crates.io-index)" = "c51a3322e4bca9d212ad9a158a02abc6934d005490c054a2778df73a70aa0a30"
"checksum proc-macro2 0.4.9 (registry+https://github.com/rust-lang/crates.io-index)" = "cccdc7557a98fe98453030f077df7f3a042052fae465bb61d2c2c41435cfd9b6"
"checksum quote 0.6.3 (registry+https://github.com/rust-lang/crates.io-index)" = "e44651a0dc4cdd99f71c83b561e221f714912d11af1a4dff0631f923d53af035"
"checksum rand 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)" = "eba5f8cb59cc50ed56be8880a5c7b496bfd9bd26394e176bc67884094145c2c5" "checksum rand 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)" = "eba5f8cb59cc50ed56be8880a5c7b496bfd9bd26394e176bc67884094145c2c5"
"checksum redox_syscall 0.1.37 (registry+https://github.com/rust-lang/crates.io-index)" = "0d92eecebad22b767915e4d529f89f28ee96dbbf5a4810d2b844373f136417fd" "checksum redox_syscall 0.1.40 (registry+https://github.com/rust-lang/crates.io-index)" = "c214e91d3ecf43e9a4e41e578973adeb14b474f2bee858742d127af75a0112b1"
"checksum redox_termios 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "7e891cfe48e9100a70a3b6eb652fef28920c117d366339687bd5576160db0f76" "checksum redox_termios 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "7e891cfe48e9100a70a3b6eb652fef28920c117d366339687bd5576160db0f76"
"checksum regex 0.2.10 (registry+https://github.com/rust-lang/crates.io-index)" = "aec3f58d903a7d2a9dc2bf0e41a746f4530e0cab6b615494e058f67a3ef947fb" "checksum regex 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "5bbbea44c5490a1e84357ff28b7d518b4619a159fed5d25f6c1de2d19cc42814"
"checksum regex-syntax 0.5.5 (registry+https://github.com/rust-lang/crates.io-index)" = "bd90079345f4a4c3409214734ae220fd773c6f2e8a543d07370c6c1c369cfbfb" "checksum regex-syntax 0.6.2 (registry+https://github.com/rust-lang/crates.io-index)" = "747ba3b235651f6e2f67dfa8bcdcd073ddb7c243cb21c442fc12395dfcac212d"
"checksum remove_dir_all 0.5.0 (registry+https://github.com/rust-lang/crates.io-index)" = "dfc5b3ce5d5ea144bb04ebd093a9e14e9765bcfec866aecda9b6dec43b3d1e24" "checksum remove_dir_all 0.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "3488ba1b9a2084d38645c4c08276a1752dcbf2c7130d74f1569681ad5d2799c5"
"checksum safemem 0.2.0 (registry+https://github.com/rust-lang/crates.io-index)" = "e27a8b19b835f7aea908818e871f5cc3a5a186550c30773be987e155e8163d8f"
"checksum same-file 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "cfb6eded0b06a0b512c8ddbcf04089138c9b4362c2f696f3c3d76039d68f3637" "checksum same-file 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "cfb6eded0b06a0b512c8ddbcf04089138c9b4362c2f696f3c3d76039d68f3637"
"checksum simd 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "3dd0805c7363ab51a829a1511ad24b6ed0349feaa756c4bc2f977f9f496e6673" "checksum serde 1.0.70 (registry+https://github.com/rust-lang/crates.io-index)" = "0c3adf19c07af6d186d91dae8927b83b0553d07ca56cbf7f2f32560455c91920"
"checksum serde_derive 1.0.70 (registry+https://github.com/rust-lang/crates.io-index)" = "3525a779832b08693031b8ecfb0de81cd71cfd3812088fafe9a7496789572124"
"checksum serde_json 1.0.24 (registry+https://github.com/rust-lang/crates.io-index)" = "c3c6908c7b925cd6c590358a4034de93dbddb20c45e1d021931459fd419bf0e2"
"checksum simd 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)" = "ed3686dd9418ebcc3a26a0c0ae56deab0681e53fe899af91f5bbcee667ebffb1"
"checksum strsim 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)" = "bb4f380125926a99e52bc279241539c018323fab05ad6368b56f93d9369ff550" "checksum strsim 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)" = "bb4f380125926a99e52bc279241539c018323fab05ad6368b56f93d9369ff550"
"checksum syn 0.14.4 (registry+https://github.com/rust-lang/crates.io-index)" = "2beff8ebc3658f07512a413866875adddd20f4fd47b2a4e6c9da65cd281baaea"
"checksum tempdir 0.3.7 (registry+https://github.com/rust-lang/crates.io-index)" = "15f2b5fb00ccdf689e0149d1b1b3c03fead81c2b37735d812fa8bddbbf41b6d8" "checksum tempdir 0.3.7 (registry+https://github.com/rust-lang/crates.io-index)" = "15f2b5fb00ccdf689e0149d1b1b3c03fead81c2b37735d812fa8bddbbf41b6d8"
"checksum termcolor 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "722426c4a0539da2c4ffd9b419d90ad540b4cff4a053be9069c908d4d07e2836"
"checksum termion 1.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "689a3bdfaab439fd92bc87df5c4c78417d3cbe537487274e9b0b2dce76e92096" "checksum termion 1.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "689a3bdfaab439fd92bc87df5c4c78417d3cbe537487274e9b0b2dce76e92096"
"checksum textwrap 0.9.0 (registry+https://github.com/rust-lang/crates.io-index)" = "c0b59b6b4b44d867f1370ef1bd91bfb262bf07bf0ae65c202ea2fbc16153b693" "checksum textwrap 0.10.0 (registry+https://github.com/rust-lang/crates.io-index)" = "307686869c93e71f94da64286f9a9524c0f308a9e1c87a583de8e9c9039ad3f6"
"checksum thread_local 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "279ef31c19ededf577bfd12dfae728040a21f635b06a24cd670ff510edd38963" "checksum thread_local 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "279ef31c19ededf577bfd12dfae728040a21f635b06a24cd670ff510edd38963"
"checksum ucd-util 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "fd2be2d6639d0f8fe6cdda291ad456e23629558d466e2789d2c3e9892bda285d" "checksum ucd-util 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "fd2be2d6639d0f8fe6cdda291ad456e23629558d466e2789d2c3e9892bda285d"
"checksum unicode-width 0.1.4 (registry+https://github.com/rust-lang/crates.io-index)" = "bf3a113775714a22dcb774d8ea3655c53a32debae63a063acc00a91cc586245f" "checksum unicode-width 0.1.5 (registry+https://github.com/rust-lang/crates.io-index)" = "882386231c45df4700b275c7ff55b6f3698780a650026380e72dabe76fa46526"
"checksum unicode-xid 0.1.0 (registry+https://github.com/rust-lang/crates.io-index)" = "fc72304796d0818e357ead4e000d19c9c174ab23dc11093ac919054d20a6a7fc"
"checksum unreachable 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "382810877fe448991dfc7f0dd6e3ae5d58088fd0ea5e35189655f84e6814fa56" "checksum unreachable 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "382810877fe448991dfc7f0dd6e3ae5d58088fd0ea5e35189655f84e6814fa56"
"checksum utf8-ranges 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "662fab6525a98beff2921d7f61a39e7d59e0b425ebc7d0d9e66d316e55124122" "checksum utf8-ranges 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "662fab6525a98beff2921d7f61a39e7d59e0b425ebc7d0d9e66d316e55124122"
"checksum void 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "6a02e4885ed3bc0f2de90ea6dd45ebcbb66dacffe03547fadbb0eeae2770887d" "checksum void 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "6a02e4885ed3bc0f2de90ea6dd45ebcbb66dacffe03547fadbb0eeae2770887d"
"checksum walkdir 2.1.4 (registry+https://github.com/rust-lang/crates.io-index)" = "63636bd0eb3d00ccb8b9036381b526efac53caf112b7783b730ab3f8e44da369" "checksum walkdir 2.1.4 (registry+https://github.com/rust-lang/crates.io-index)" = "63636bd0eb3d00ccb8b9036381b526efac53caf112b7783b730ab3f8e44da369"
"checksum winapi 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)" = "04e3bd221fcbe8a271359c04f21a76db7d0c6028862d1bb5512d85e1e2eb5bb3" "checksum winapi 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "773ef9dcc5f24b7d850d0ff101e542ff24c3b090a9768e03ff889fdef41f00fd"
"checksum winapi-i686-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)" = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6" "checksum winapi-i686-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)" = "ac3b87c63620426dd9b991e5ce0329eff545bccbbb34f3be09ff6fb6ab51b7b6"
"checksum winapi-x86_64-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)" = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f" "checksum winapi-x86_64-pc-windows-gnu 0.4.0 (registry+https://github.com/rust-lang/crates.io-index)" = "712e227841d057c1ee1cd2fb22fa7e5a5461ae8e48fa2ca79ec42cfc1931183f"
"checksum wincolor 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "b9dc3aa9dcda98b5a16150c54619c1ead22e3d3a5d458778ae914be760aa981a"

Cargo.toml

@@ -1,10 +1,11 @@
 [package]
 name = "ripgrep"
-version = "0.8.1" #:version
+version = "0.9.0" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
-Line oriented search tool using Rust's regex library. Combines the raw
-performance of grep with the usability of the silver searcher.
+ripgrep is a line-oriented search tool that recursively searches your current
+directory for a regex pattern while respecting your gitignore rules. ripgrep
+has first class support on Windows, macOS and Linux
 """
 documentation = "https://github.com/BurntSushi/ripgrep"
 homepage = "https://github.com/BurntSushi/ripgrep"
@@ -12,9 +13,10 @@ repository = "https://github.com/BurntSushi/ripgrep"
 readme = "README.md"
 keywords = ["regex", "grep", "egrep", "search", "pattern"]
 categories = ["command-line-utilities", "text-processing"]
-license = "Unlicense/MIT"
+license = "Unlicense OR MIT"
 exclude = ["HomebrewFormula"]
 build = "build.rs"
+autotests = false
 [badges]
 travis-ci = { repository = "BurntSushi/ripgrep" }
@@ -30,24 +32,22 @@ name = "integration"
 path = "tests/tests.rs"
 [workspace]
-members = ["grep", "globset", "ignore", "termcolor", "wincolor"]
+members = [
+  "grep", "globset", "ignore",
+  "grep-matcher", "grep-printer", "grep-regex", "grep-searcher",
+]
 [dependencies]
-atty = "=0.2.6"
-bytecount = "0.3.1"
-encoding_rs = "0.7"
+atty = "0.2.11"
 globset = { version = "0.4.0", path = "globset" }
-grep = { version = "0.1.8", path = "grep" }
+grep = { version = "0.2.0", path = "grep" }
 ignore = { version = "0.4.0", path = "ignore" }
 lazy_static = "1"
-libc = "0.2"
 log = "0.4"
-memchr = "2"
-memmap = "0.6"
 num_cpus = "1"
-regex = "0.2.10"
+regex = "1"
 same-file = "1"
-termcolor = { version = "0.3.4", path = "termcolor" }
+termcolor = "1"
 [dependencies.clap]
 version = "2.29.4"
@@ -67,13 +67,8 @@ default-features = false
 features = ["suggestions", "color"]
 [features]
-avx-accel = ["bytecount/avx-accel", "regex/unstable"]
-simd-accel = [
-  "bytecount/simd-accel",
-  "encoding_rs/simd-accel",
-  "regex/unstable",
-]
-unstable = ["regex/unstable"]
+avx-accel = ["grep/avx-accel"]
+simd-accel = ["grep/simd-accel"]
 [profile.release]
 debug = true

FAQ.md

@@ -20,6 +20,7 @@
 * [How do I create an alias for ripgrep on Windows?](#rg-alias-windows)
 * [How do I create a PowerShell profile?](#powershell-profile)
 * [How do I pipe non-ASCII content to ripgrep on Windows?](#pipe-non-ascii-windows)
+* [How can I search and replace with ripgrep?](#search-and-replace)
 * [How is ripgrep licensed?](#license)
 * [Can ripgrep replace grep?](#posix4ever)
 * [What does the "rip" in ripgrep mean?](#intentcountsforsomething)
@@ -134,7 +135,7 @@ How do I search compressed files?
 </h3>
 ripgrep's `-z/--search-zip` flag will cause it to search compressed files
-automatically. Currently, this supports gzip, bzip2, lzma and xz only and
+automatically. Currently, this supports gzip, bzip2, lzma, lz4 and xz only and
 requires the corresponding `gzip`, `bzip2` and `xz` binaries to be installed on
 your system. (That is, ripgrep does decompression by shelling out to another
 process.)
@@ -470,6 +471,70 @@ that the console will use for printing to UTF-8 with
 will also reset when PowerShell is restarted, so you can add that line
 to your profile as well if you want to make the setting permanent.
<h3 name="search-and-replace">
How can I search and replace with ripgrep?
</h3>
Using ripgrep alone, you can't. ripgrep is a search tool that will never
touch your files. However, the output of ripgrep can be piped to other tools
that do modify files on disk. See
[this issue](https://github.com/BurntSushi/ripgrep/issues/74) for more
information.
sed is one such tool that can modify files on disk. sed can take a filename
and a substitution command to search and replace in the specified file.
Files containing matching patterns can be provided to sed using
```
rg foo --files-with-matches
```
The output of this command is a list of filenames that contain a match for
the `foo` pattern.
This list can be piped into `xargs`, which will split the filenames from
standard input into arguments for the command following xargs. You can use this
combination to pipe a list of filenames into sed for replacement. For example:
```
rg foo --files-with-matches | xargs sed -i 's/foo/bar/g'
```
will replace all instances of 'foo' with 'bar' in the files in which
ripgrep finds the foo pattern. The `-i` flag to sed indicates that you are
editing files in place, and `s/foo/bar/g` says that you are performing a
**s**ubstitution of the pattern `foo` with `bar`, and that you are doing this
substitution **g**lobally (all occurrences of the pattern in each file).
Note: the above command assumes that you are using GNU sed. If you are using
BSD sed (the default on macOS and FreeBSD) then you must modify the above
command to be the following:
```
rg foo --files-with-matches | xargs sed -i '' 's/foo/bar/g'
```
The `-i` flag in BSD sed requires a file extension to be given to make backups
for all modified files. Specifying the empty string prevents file backups from
being made.
Finally, if any of your file paths contain whitespace in them, then you might
need to delimit your file paths with a NUL terminator. This requires telling
ripgrep to output NUL bytes between each path, and telling xargs to read paths
delimited by NUL bytes:
```
rg foo --files-with-matches -0 | xargs -0 sed -i 's/foo/bar/g'
```
To learn more about sed, see the sed manual
[here](https://www.gnu.org/software/sed/manual/sed.html).
Additionally, Facebook has a tool called
[fastmod](https://github.com/facebookincubator/fastmod)
that uses some of the same libraries as ripgrep and might provide a more
ergonomic search-and-replace experience.
<h3 name="license"> <h3 name="license">
How is ripgrep licensed? How is ripgrep licensed?

GUIDE.md

@@ -539,6 +539,13 @@ $ cat $HOME/.ripgreprc
 --type-add
 web:*.{html,css,js}*
+# Using glob patterns to include/exclude files or folders
+--glob=!git/*
+# or
+--glob
+!git/*
 # Set the colors.
 --colors=line:none
 --colors=line:style:bold


@@ -2,6 +2,12 @@
 Replace this text with the output of `rg --version`.
+#### How did you install ripgrep?
+If you installed ripgrep with snap and are getting strange file permission or
+file not found errors, then please do not file a bug. Instead, use one of the
+Github binary releases.
 #### What operating system are you using ripgrep on?
 Replace this text with your operating system and version.

README.md

@@ -9,7 +9,7 @@ ack and grep.
 [![Linux build status](https://travis-ci.org/BurntSushi/ripgrep.svg?branch=master)](https://travis-ci.org/BurntSushi/ripgrep)
 [![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
-[![](https://img.shields.io/crates/v/ripgrep.svg)](https://crates.io/crates/ripgrep)
+[![Crates.io](https://img.shields.io/crates/v/ripgrep.svg)](https://crates.io/crates/ripgrep)
 Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
@@ -106,7 +106,10 @@ increases the times to `2.640s` for ripgrep and `10.277s` for GNU grep.
 automatically detecting UTF-16 is provided. Other text encodings must be
 specifically specified with the `-E/--encoding` flag.)
 * ripgrep supports searching files compressed in a common format (gzip, xz,
-  lzma or bzip2 current) with the `-z/--search-zip` flag.
+  lzma, bzip2 or lz4) with the `-z/--search-zip` flag.
+* ripgrep supports arbitrary input preprocessing filters which could be PDF
+  text extraction, less supported decompression, decrypting, automatic encoding
+  detection and so on.
 In other words, use ripgrep if you like speed, filtering by default, fewer
 bugs, and Unicode support.
@@ -151,7 +154,7 @@ Summarizing, ripgrep is fast because:
   latter is better for large directories. ripgrep chooses the best searching
   strategy for you automatically.
 * Applies your ignore patterns in `.gitignore` files using a
-  [`RegexSet`](https://doc.rust-lang.org/regex/regex/struct.RegexSet.html).
+  [`RegexSet`](https://docs.rs/regex/1.0.0/regex/struct.RegexSet.html).
   That means a single file path can be matched against multiple glob patterns
   simultaneously (see the sketch after this list).
 * It uses a lock-free parallel recursive directory iterator, courtesy of
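To make the `RegexSet` bullet concrete, here is a minimal sketch of matching one path against several gitignore-style patterns in a single pass. The patterns are hand-written stand-ins for illustration; in ripgrep it is globset that performs the glob-to-regex translation.

```
use regex::RegexSet; // regex = "1"

fn main() {
    // Two gitignore-style rules, hand-compiled to regexes here.
    let set = RegexSet::new(&[
        r"^target(/.*)?$", // ignore the `target` directory
        r"\.o$",           // ignore object files
    ]).unwrap();

    // One pass over the path tests every pattern simultaneously.
    let matched: Vec<usize> = set.matches("target/debug/rg").into_iter().collect();
    assert_eq!(matched, vec![0]);
}
```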
@@ -197,6 +200,13 @@ $ brew tap burntsushi/ripgrep https://github.com/BurntSushi/ripgrep.git
 $ brew install ripgrep-bin
 ```
+If you're a **MacPorts** user, then you can install ripgrep from the
+[official ports](https://www.macports.org/ports.php?by=name&substr=ripgrep):
+```
+$ sudo port install ripgrep
+```
 If you're a **Windows Chocolatey** user, then you can install ripgrep from the [official repo](https://chocolatey.org/packages/ripgrep):
 ```
@@ -265,20 +275,31 @@ $ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/0.8.1/ripgrep
 $ sudo dpkg -i ripgrep_0.8.1_amd64.deb
 ```
-If you're an **Ubuntu** user, ripgrep can be installed from the `snap` store.
-* Note that if you are using `16.04 LTS` or later, snap is already installed.
-* For older versions you can install snap using
-  [this guide](https://docs.snapcraft.io/core/install-ubuntu).
-For the latest stable release:
-```
-$ sudo snap install --classic rg
-```
+(N.B. Various snaps for ripgrep on Ubuntu are also available, but none of them
+seem to work right and generate a number of very strange bug reports that I
+don't know how to fix and don't have the time to fix. Therefore, it is no
+longer a recommended installation option.)
+If you're a **FreeBSD** user, then you can install ripgrep from the [official ports](https://www.freshports.org/textproc/ripgrep/):
+```
+# pkg install ripgrep
+```
+If you're an **OpenBSD** user, then you can install ripgrep from the [official ports](http://openports.se/textproc/ripgrep):
+```
+$ doas pkg_add ripgrep
+```
+If you're a **NetBSD** user, then you can install ripgrep from [pkgsrc](http://pkgsrc.se/textproc/ripgrep):
+```
+# pkgin install ripgrep
+```
 If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
-* Note that the minimum supported version of Rust for ripgrep is **1.20**,
+* Note that the minimum supported version of Rust for ripgrep is **1.23.0**,
   although ripgrep may work with older versions.
 * Note that the binary may be bigger than expected because it contains debug
   symbols. This is intentional. To remove debug symbols and therefore reduce
@@ -288,13 +309,8 @@ If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
 $ cargo install ripgrep
 ```
-If you're using Rust nightly, then use
-```
-$ cargo install ripgrep --features unstable
-```
-to get SIMD optimizations.
+When compiling with Rust 1.27 or newer, this will automatically enable SIMD
+optimizations for search.
 ripgrep isn't currently in any other package repositories.
 [I'd like to change that](https://github.com/BurntSushi/ripgrep/issues/10).
@@ -304,7 +320,7 @@ ripgrep isn't currently in any other package repositories.
 ripgrep is written in Rust, so you'll need to grab a
 [Rust installation](https://www.rust-lang.org/) in order to compile it.
-ripgrep compiles with Rust 1.20 (stable) or newer. Building is easy:
+ripgrep compiles with Rust 1.23.0 (stable) or newer. Building is easy:
 ```
 $ git clone https://github.com/BurntSushi/ripgrep
@@ -315,7 +331,7 @@ $ ./target/release/rg --version
 ```
 If you have a Rust nightly compiler and a recent Intel CPU, then you can enable
-optional SIMD acceleration like so:
+additional optional SIMD acceleration like so:
 ```
 RUSTFLAGS="-C target-cpu=native" cargo build --release --features 'simd-accel avx-accel'
@@ -325,6 +341,12 @@ If your machine doesn't support AVX instructions, then simply remove
 `avx-accel` from the features list. Similarly for SIMD (which corresponds
 roughly to SSE instructions).
+The `simd-accel` and `avx-accel` features enable SIMD support in certain
+ripgrep dependencies (responsible for counting lines and transcoding). They
+are not necessary to get SIMD optimizations for search; those are enabled
+automatically. Hopefully, some day, the `simd-accel` and `avx-accel` features
+will similarly become unnecessary.
 ### Running tests

appveyor.yml

@@ -50,7 +50,7 @@ test_script:
 before_deploy:
   # Generate artifacts for release
-  - cargo build --release --features unstable
+  - cargo build --release
   - mkdir staging
   - copy target\release\rg.exe staging
   - ps: copy target\release\build\ripgrep-*\out\_rg.ps1 staging
@@ -78,6 +78,7 @@ branches:
   only:
     - /\d+\.\d+\.\d+/
     - master
+    - ag/libripgrep
   # - appveyor
   # - /\d+\.\d+\.\d+/
   # except:

ci/before_deploy.sh

@@ -8,7 +8,7 @@ set -ex
 # Generate artifacts for release
 mk_artifacts() {
-    cargo build --target "$TARGET" --release --features unstable
+    cargo build --target "$TARGET" --release
 }
 mk_tarball() {

ci/test_complete.sh

@@ -1,41 +1,42 @@
#!/usr/bin/env zsh #!/usr/bin/env zsh
emulate zsh -o extended_glob -o no_function_argzero -o no_unset
## ##
# Compares options in `rg --help` output to options in zsh completion function # Compares options in `rg --help` output to options in zsh completion function
emulate -R zsh
setopt extended_glob
setopt no_function_argzero
setopt no_unset
get_comp_args() { get_comp_args() {
# Technically there are many options that the completion system sets that
# our function may rely on, but we'll trust that we've got it mostly right
setopt local_options unset setopt local_options unset
# Our completion function recognises a special variable which tells it to # Our completion function recognises a special variable which tells it to
# dump the _arguments specs and then just return. But do this in a sub-shell # dump the _arguments specs and then just return. But do this in a sub-shell
# anyway to avoid any weirdness # anyway to avoid any weirdness
( _RG_COMPLETE_LIST_ARGS=1 source $1 ) ( _RG_COMPLETE_LIST_ARGS=1 source $1 )
return $?
} }
main() { main() {
local diff local diff
local rg="${${0:a}:h}/../target/${TARGET:-}/release/rg" local rg="${0:a:h}/../target/${TARGET:-}/release/rg"
local _rg="${${0:a}:h}/../complete/_rg" local _rg="${0:a:h}/../complete/_rg"
local -a help_args comp_args local -a help_args comp_args
[[ -e $rg ]] || rg=${rg/%\/release\/rg/\/debug\/rg} [[ -e $rg ]] || rg=${rg/%\/release\/rg/\/debug\/rg}
rg=${rg:a}
_rg=${_rg:a}
[[ -e $rg ]] || { [[ -e $rg ]] || {
printf >&2 'File not found: %s\n' $rg print -r >&2 "File not found: $rg"
return 1 return 1
} }
[[ -e $_rg ]] || { [[ -e $_rg ]] || {
printf >&2 'File not found: %s\n' $_rg print -r >&2 "File not found: $_rg"
return 1 return 1
} }
printf 'Comparing options:\n-%s\n+%s\n' $rg $_rg print -rl - 'Comparing options:' "-$rg" "+$_rg"
# 'Parse' options out of the `--help` output. To prevent false positives we # 'Parse' options out of the `--help` output. To prevent false positives we
# only look at lines where the first non-white-space character is `-` # only look at lines where the first non-white-space character is `-`
@@ -50,6 +51,8 @@ main() {
# 'Parse' options out of the completion function # 'Parse' options out of the completion function
comp_args=( ${(f)"$( get_comp_args $_rg )"} ) comp_args=( ${(f)"$( get_comp_args $_rg )"} )
# Note that we currently exclude hidden (!...) options; matching these
# properly against the `--help` output could be irritating
comp_args=( ${comp_args#\(*\)} ) # Strip excluded options comp_args=( ${comp_args#\(*\)} ) # Strip excluded options
comp_args=( ${comp_args#\*} ) # Strip repetition indicator comp_args=( ${comp_args#\*} ) # Strip repetition indicator
comp_args=( ${comp_args%%-[:[]*} ) # Strip everything after -optname- comp_args=( ${comp_args%%-[:[]*} ) # Strip everything after -optname-
@@ -57,14 +60,14 @@ main() {
comp_args=( ${comp_args##[^-]*} ) # Remove non-options comp_args=( ${comp_args##[^-]*} ) # Remove non-options
# This probably isn't necessary, but we should ensure the same order # This probably isn't necessary, but we should ensure the same order
comp_args=( ${(f)"$( printf '%s\n' $comp_args | sort -u )"} ) comp_args=( ${(f)"$( print -rl - $comp_args | sort -u )"} )
(( $#help_args )) || { (( $#help_args )) || {
printf >&2 'Failed to get help_args\n' print -r >&2 'Failed to get help_args'
return 1 return 1
} }
(( $#comp_args )) || { (( $#comp_args )) || {
printf >&2 'Failed to get comp_args\n' print -r >&2 'Failed to get comp_args'
return 1 return 1
} }
@@ -73,12 +76,12 @@ main() {
diff -U2 \ diff -U2 \
--label '`rg --help`' \ --label '`rg --help`' \
--label '`_rg`' \ --label '`_rg`' \
=( printf '%s\n' $help_args ) =( printf '%s\n' $comp_args ) =( print -rl - $help_args ) =( print -rl - $comp_args )
else else
diff -U2 \ diff -U2 \
-L '`rg --help`' \ -L '`rg --help`' \
-L '`_rg`' \ -L '`_rg`' \
=( printf '%s\n' $help_args ) =( printf '%s\n' $comp_args ) =( print -rl - $help_args ) =( print -rl - $comp_args )
fi fi
)" )"
@@ -91,4 +94,4 @@ main() {
return 0 return 0
} }
main "${@}" main "$@"

complete/_rg

@@ -6,163 +6,281 @@
# Run ci/test_complete.sh after building to ensure that the options supported by # Run ci/test_complete.sh after building to ensure that the options supported by
# this function stay in synch with the `rg` binary. # this function stay in synch with the `rg` binary.
# #
# @see http://zsh.sourceforge.net/Doc/Release/Completion-System.html
# @see https://github.com/zsh-users/zsh/blob/master/Etc/completion-style-guide # @see https://github.com/zsh-users/zsh/blob/master/Etc/completion-style-guide
# #
# Based on code from the zsh-users project — see copyright notice below. # Originally based on code from the zsh-users project — see copyright notice
# below.
_rg() { _rg() {
local state_descr ret curcontext="${curcontext:-}" local curcontext=$curcontext no='!' descr ret=1
local -a context line state local -a context line state state_descr args tmp suf
local -A opt_args val_args local -A opt_args
local -a rg_args
# Sort by long option name to match `rg --help` # ripgrep has many options which negate the effect of a more common one — for
rg_args=( # example, `--no-column` to negate `--column`, and `--messages` to negate
'(-A -C --after-context --context)'{-A+,--after-context=}'[specify lines to show after each match]:number of lines' # `--no-messages`. There are so many of these, and they're so infrequently
'(-B -C --before-context --context)'{-B+,--before-context=}'[specify lines to show before each match]:number of lines' # used, that some users will probably find it irritating if they're completed
'(-i -s -S --ignore-case --case-sensitive --smart-case)'{-s,--case-sensitive}'[search case-sensitively]' # indiscriminately, so let's not do that unless either the current prefix
'--color=[specify when to use colors in output]:when:( never auto always ansi )' # matches one of those negation options or the user has the `complete-all`
'*--colors=[specify color settings and styles]: :->colorspec' # style set. Note that this prefix check has to be updated manually to account
'--column[show column numbers]' # for all of the potential negation options listed below!
'(-A -B -C --after-context --before-context --context)'{-C+,--context=}'[specify lines to show before and after each match]:number of lines' if
'(-b --byte-offset)'{-b,--byte-offset}'[print the 0-based byte offset for each matching line]' # (--[imn]* => --ignore*, --messages, --no-*)
'--context-separator=[specify string used to separate non-continuous context lines in output]:separator' [[ $PREFIX$SUFFIX == --[imn]* ]] ||
'(-c --count --count-matches --passthrough --passthru)'{-c,--count}'[only show count of matching lines for each file]' zstyle -t ":complete:$curcontext:*" complete-all
'(--count-matches -c --count --passthrough --passthru)--count-matches[only show count of individual matches for each file]' then
'--debug[show debug messages]' no=
'--dfa-size-limit=[specify upper size limit of generated DFA]:DFA size' fi
'(-E --encoding)'{-E+,--encoding=}'[specify text encoding of files to search]: :_rg_encodings'
'*'{-f+,--file=}'[specify file containing patterns to search for]:file:_files' # We make heavy use of argument groups here to prevent the option specs from
"(1)--files[show each file that would be searched (but don't search)]" # growing unwieldy. These aren't supported in zsh <5.4, though, so we'll strip
'(-l --files-with-matches --files-without-match)'{-l,--files-with-matches}'[only show names of files with matches]' # them out below if necessary. This makes the exclusions inaccurate on those
'(-l --files-with-matches --files-without-match)--files-without-match[only show names of files without matches]' # older versions, but oh well — it's not that big a deal
'(-F --fixed-strings)'{-F,--fixed-strings}'[treat pattern as literal string instead of regular expression]' args=(
'(-L --follow)'{-L,--follow}'[follow symlinks]' + '(exclusive)' # Misc. fully exclusive options
'*'{-g+,--glob=}'[include or exclude files for searching that match the specified glob]:glob' '(: * -)'{-h,--help}'[display help information]'
'(: -)'{-h,--help}'[display help information]' '(: * -)'{-V,--version}'[display version information]'
'(-p --no-heading --pretty --vimgrep)--heading[show matches grouped by file name]'
+ '(case)' # Case-sensitivity options
{-i,--ignore-case}'[search case-insensitively]'
{-s,--case-sensitive}'[search case-sensitively]'
{-S,--smart-case}'[search case-insensitively if pattern is all lowercase]'
+ '(context-a)' # Context (after) options
'(context-c)'{-A+,--after-context=}'[specify lines to show after each match]:number of lines'
+ '(context-b)' # Context (before) options
'(context-c)'{-B+,--before-context=}'[specify lines to show before each match]:number of lines'
+ '(context-c)' # Context (combined) options
'(context-a context-b)'{-C+,--context=}'[specify lines to show before and after each match]:number of lines'
+ '(column)' # Column options
'--column[show column numbers for matches]'
$no"--no-column[don't show column numbers for matches]"
+ '(count)' # Counting options
'(passthru)'{-c,--count}'[only show count of matching lines for each file]'
'(passthru)--count-matches[only show count of individual matches for each file]'
+ file # File-input options
'*'{-f+,--file=}'[specify file containing patterns to search for]: :_files'
+ '(file-match)' # Files with/without match options
'(stats)'{-l,--files-with-matches}'[only show names of files with matches]'
'(stats)--files-without-match[only show names of files without matches]'
+ '(file-name)' # File-name options
{-H,--with-filename}'[show file name for matches]'
"--no-filename[don't show file name for matches]"
+ '(fixed)' # Fixed-string options
{-F,--fixed-strings}'[treat pattern as literal string instead of regular expression]'
$no"--no-fixed-strings[don't treat pattern as literal string]"
+ '(follow)' # Symlink-following options
{-L,--follow}'[follow symlinks]'
$no"--no-follow[don't follow symlinks]"
+ glob # File-glob options
'*'{-g+,--glob=}'[include/exclude files matching specified glob]:glob'
'*--iglob=[include/exclude files matching specified case-insensitive glob]:glob'
+ '(heading)' # Heading options
'(pretty-vimgrep)--heading[show matches grouped by file name]'
"(pretty-vimgrep)--no-heading[don't show matches grouped by file name]"
+ '(hidden)' # Hidden-file options
'--hidden[search hidden files and directories]' '--hidden[search hidden files and directories]'
'*--iglob=[include or exclude files for searching that match the specified case-insensitive glob]:glob' $no"--no-hidden[don't search hidden files and directories]"
'(-i -s -S --case-sensitive --ignore-case --smart-case)'{-i,--ignore-case}'[search case-insensitively]'
'--ignore-file=[specify additional ignore file]:file:_files' + '(ignore)' # Ignore-file options
'(-v --invert-match)'{-v,--invert-match}'[invert matching]' "(--no-ignore-global --no-ignore-parent --no-ignore-vcs)--no-ignore[don't respect ignore files]"
'(-n -N --line-number --no-line-number)'{-n,--line-number}'[show line numbers]' $no'(--ignore-global --ignore-parent --ignore-vcs)--ignore[respect ignore files]'
'(-N --no-line-number)--line-number-width=[specify width of displayed line number]:number of columns'
'(-w -x --line-regexp --word-regexp)'{-x,--line-regexp}'[only show matches surrounded by line boundaries]' + '(ignore-global)' # Global ignore-file options
'(-M --max-columns)'{-M+,--max-columns=}'[specify max length of lines to print]:number of bytes' "--no-ignore-global[don't respect global ignore files]"
'(-m --max-count)'{-m+,--max-count=}'[specify max number of matches per file]:number of matches' $no'--ignore-global[respect global ignore files]'
'--max-filesize=[specify size above which files should be ignored]:file size'
'--maxdepth=[specify max number of directories to descend]:number of directories' + '(ignore-parent)' # Parent ignore-file options
'(--mmap --no-mmap)--mmap[search using memory maps when possible]'
'(-H --with-filename --no-filename)--no-filename[suppress all file names]'
"(-p --heading --pretty --vimgrep)--no-heading[don't group matches by file name]"
"--no-config[don't load configuration files]"
"(--no-ignore-parent)--no-ignore[don't respect ignore files]"
"--no-ignore-parent[don't respect ignore files in parent directories]" "--no-ignore-parent[don't respect ignore files in parent directories]"
$no'--ignore-parent[respect ignore files in parent directories]'
+ '(ignore-vcs)' # VCS ignore-file options
"--no-ignore-vcs[don't respect version control ignore files]" "--no-ignore-vcs[don't respect version control ignore files]"
'(-n -N --line-number --no-line-number)'{-N,--no-line-number}'[suppress line numbers]' $no'--ignore-vcs[respect version control ignore files]'
'--no-messages[suppress all error messages]'
"(--mmap --no-mmap)--no-mmap[don't search using memory maps]" + '(line)' # Line-number options
'(-0 --null)'{-0,--null}'[print NUL byte after file names]' {-n,--line-number}'[show line numbers for matches]'
'(-o -r --only-matching --passthrough --passthru --replace)'{-o,--only-matching}'[show only matching part of each line]' {-N,--no-line-number}"[don't show line numbers for matches]"
'(-c -o -r --count --only-matching --passthrough --replace)--passthru[show both matching and non-matching lines]'
'!(-c -o -r --count --only-matching --passthru --replace)--passthrough' + '(max-depth)' # Directory-depth options
'--path-separator=[specify path separator to use when printing file names]:separator' '--max-depth=[specify max number of directories to descend]:number of directories'
'(-p --heading --no-heading --pretty --vimgrep)'{-p,--pretty}'[alias for --color=always --heading -n]' '!--maxdepth=:number of directories'
'(-q --quiet)'{-q,--quiet}'[suppress normal output]'
'--regex-size-limit=[specify upper size limit of compiled regex]:regex size' + '(messages)' # Error-message options
'(1 -f --file)*'{-e+,--regexp=}'[specify pattern]:pattern' '(--no-ignore-messages)--no-messages[suppress some error messages]'
'(-c -o -r --count --only-matching --passthrough --passthru --replace)'{-r+,--replace=}'[specify string used to replace matches]:replace string' $no"--messages[don't suppress error messages affected by --no-messages]"
'(-i -s -S --ignore-case --case-sensitive --smart-case)'{-S,--smart-case}'[search case-insensitively if the pattern is all lowercase]'
'(-j --threads)--sort-files[sort results by file path (disables parallelism)]' + '(messages-ignore)' # Ignore-error message options
'(-a --text)'{-a,--text}'[search binary files as if they were text]' "--no-ignore-messages[don't show ignore-file parse error messages]"
'(-j --sort-files --threads)'{-j+,--threads=}'[specify approximate number of threads to use]:number of threads' $no'--ignore-messages[show ignore-file parse error messages]'
+ '(mmap)' # mmap options
'--mmap[search using memory maps when possible]'
"--no-mmap[don't search using memory maps]"
+ '(multiline)' # multiline options
'--multiline[permit matching across multiple lines]'
$no"--no-multiline[restrict matches to at most one line each]"
+ '(only)' # Only-match options
'(passthru replace)'{-o,--only-matching}'[show only matching part of each line]'
+ '(passthru)' # Pass-through options
'(--vimgrep count only replace)--passthru[show both matching and non-matching lines]'
'!(--vimgrep count only replace)--passthrough'
+ '(pre)' # Preprocessing options
'(-z --search-zip)--pre=[specify preprocessor utility]:preprocessor utility:_command_names -e'
$no'--no-pre[disable preprocessor utility]'
+ '(pretty-vimgrep)' # Pretty/vimgrep display options
'(heading)'{-p,--pretty}'[alias for --color=always --heading -n]'
'(heading passthru)--vimgrep[show results in vim-compatible format]'
+ regexp # Explicit pattern options
'(1 file)*'{-e+,--regexp=}'[specify pattern]:pattern'
+ '(replace)' # Replacement options
'(count only passthru)'{-r+,--replace=}'[specify string used to replace matches]:replace string'
+ '(sort)' # File-sorting options
'(threads)--sort-files[sort results by file path (disables parallelism)]'
$no"--no-sort-files[don't sort results by file path]"
+ stats # Statistics options
'(--files file-match)--stats[show search statistics]'
+ '(text)' # Binary-search options
{-a,--text}'[search binary files as if they were text]'
$no"--no-text[don't search binary files as if they were text]"
+ '(threads)' # Thread-count options
'(--sort-files)'{-j+,--threads=}'[specify approximate number of threads to use]:number of threads'
+ type # Type options
'*'{-t+,--type=}'[only search files matching specified type]: :_rg_types' '*'{-t+,--type=}'[only search files matching specified type]: :_rg_types'
'*--type-add=[add new glob for file type]: :->typespec' '*--type-add=[add new glob for specified file type]: :->typespec'
'*--type-clear=[clear globs previously defined for specified file type]: :_rg_types' '*--type-clear=[clear globs previously defined for specified file type]: :_rg_types'
# This should actually be exclusive with everything but other type options # This should actually be exclusive with everything but other type options
'(:)--type-list[show all supported file types and their associated globs]' '(: *)--type-list[show all supported file types and their associated globs]'
'*'{-T+,--type-not=}"[don't search files matching specified type]: :_rg_types" '*'{-T+,--type-not=}"[don't search files matching specified file type]: :_rg_types"
+ '(word-line)' # Whole-word/line match options
{-w,--word-regexp}'[only show matches surrounded by word boundaries]'
{-x,--line-regexp}'[only show matches surrounded by line boundaries]'
+ '(zip)' # Compression options
'(--pre)'{-z,--search-zip}'[search in compressed files]'
$no"--no-search-zip[don't search in compressed files]"
+ misc # Other options — no need to separate these at the moment
'(-b --byte-offset)'{-b,--byte-offset}'[show 0-based byte offset for each matching line]'
'--color=[specify when to use colors in output]:when:((
never\:"never use colors"
auto\:"use colors or not based on stdout, TERM, etc."
always\:"always use colors"
ansi\:"always use ANSI colors (even on Windows)"
))'
'*--colors=[specify color and style settings]: :->colorspec'
'--context-separator=[specify string used to separate non-continuous context lines in output]:separator'
'--debug[show debug messages]'
'--dfa-size-limit=[specify upper size limit of generated DFA]:DFA size (bytes)'
'(-E --encoding)'{-E+,--encoding=}'[specify text encoding of files to search]: :_rg_encodings'
"(1 stats)--files[show each file that would be searched (but don't search)]"
'*--ignore-file=[specify additional ignore file]:ignore file:_files'
'(-v --invert-match)'{-v,--invert-match}'[invert matching]'
'(-M --max-columns)'{-M+,--max-columns=}'[specify max length of lines to print]:number of bytes'
'(-m --max-count)'{-m+,--max-count=}'[specify max number of matches per file]:number of matches'
'--max-filesize=[specify size above which files should be ignored]:file size (bytes)'
"--no-config[don't load configuration files]"
'(-0 --null)'{-0,--null}'[print NUL byte after file names]'
'--path-separator=[specify path separator to use when printing file names]:separator'
'(-q --quiet)'{-q,--quiet}'[suppress normal output]'
'--regex-size-limit=[specify upper size limit of compiled regex]:regex size (bytes)'
'*'{-u,--unrestricted}'[reduce level of "smart" searching]' '*'{-u,--unrestricted}'[reduce level of "smart" searching]'
'(: -)'{-V,--version}'[display version information]'
'(-p --heading --no-heading --pretty)--vimgrep[show results in vim-compatible format]' + operand # Operands
'(-H --no-filename --with-filename)'{-H,--with-filename}'[display the file name for matches]' '(--files --type-list file regexp)1: :_guard "^-*" pattern'
'(-w -x --line-regexp --word-regexp)'{-w,--word-regexp}'[only show matches surrounded by word boundaries]' '(--type-list)*: :_files'
'(-e -f --file --files --regexp --type-list)1: :_rg_pattern'
'(--type-list)*:file:_files'
'(-z --search-zip)'{-z,--search-zip}'[search in compressed files]'
"(--stats)--stats[print stats about this search]"
) )
[[ ${_RG_COMPLETE_LIST_ARGS:-} == (1|t*|y*) ]] && { # This is used with test_complete.sh to verify that there are no options
printf '%s\n' "${rg_args[@]}" # listed in the help output that aren't also defined here
[[ $_RG_COMPLETE_LIST_ARGS == (1|t*|y*) ]] && {
print -rl - $args
return 0 return 0
} }
_arguments -s -S : "${rg_args[@]}" && return 0 # Strip out argument groups where unsupported (see above)
[[ $ZSH_VERSION == (4|5.<0-3>)(.*)# ]] &&
args=( ${(@)args:#(#i)(+|[a-z0-9][a-z0-9_-]#|\([a-z0-9][a-z0-9_-]#\))} )
while (( $#state )); do _arguments -C -s -S : $args && ret=0
case "${state[1]}" in
case $state in
colorspec) colorspec)
# @todo I don't like this because it allows you to do weird things like if [[ ${IPREFIX#--*=}$PREFIX == [^:]# ]]; then
# `line:line:bg:`. Also, i would like the `compadd -q` behaviour suf=( -qS: )
[[ -prefix *:none: ]] && return 1 tmp=(
[[ -prefix *:*:*:* ]] && return 1 'column:specify coloring for column numbers'
'line:specify coloring for line numbers'
'match:specify coloring for match text'
'path:specify coloring for file names'
)
descr='color/style type'
elif [[ ${IPREFIX#--*=}$PREFIX == (column|line|match|path):[^:]# ]]; then
suf=( -qS: )
tmp=(
'none:clear color/style for type'
'bg:specify background color'
'fg:specify foreground color'
'style:specify text style'
)
descr='color/style attribute'
elif [[ ${IPREFIX#--*=}$PREFIX == [^:]##:(bg|fg):[^:]# ]]; then
tmp=( black blue green red cyan magenta yellow white )
descr='color name or r,g,b'
elif [[ ${IPREFIX#--*=}$PREFIX == [^:]##:style:[^:]# ]]; then
tmp=( {,no}bold {,no}intense {,no}underline )
descr='style name'
else
_message -e colorspec 'no more arguments'
fi
_values -S ':' 'color/style type' \ (( $#tmp )) && {
'column[specify coloring for column numbers]: :->attribute' \ compset -P '*:'
'line[specify coloring for line numbers]: :->attribute' \ _describe -t colorspec $descr tmp $suf && ret=0
'match[specify coloring for match text]: :->attribute' \ }
'path[specify color for file names]: :->attribute' && return 0
[[ "${state}" == 'attribute' ]] &&
_values -S ':' 'color/style attribute' \
'none[clear color/style for type]' \
'bg[specify background color]: :->color' \
'fg[specify foreground color]: :->color' \
'style[specify text style]: :->style' && return 0
[[ "${state}" == 'color' ]] &&
_values -S ':' 'color value' \
black blue green red cyan magenta yellow white && return 0
[[ "${state}" == 'style' ]] &&
_values -S ':' 'style value' \
bold nobold intense nointense underline nounderline && return 0
;; ;;
typespec) typespec)
if compset -P '[^:]##:include:'; then if compset -P '[^:]##:include:'; then
_sequence -s ',' _rg_types && return 0 _sequence -s , _rg_types && ret=0
# @todo This bit in particular could be better, but it's a little # @todo This bit in particular could be better, but it's a little
# complex, and attempting to solve it seems to run us up against a crash # complex, and attempting to solve it seems to run us up against a crash
# bug — zsh # 40362 # bug — zsh # 40362
elif compset -P '[^:]##:'; then elif compset -P '[^:]##:'; then
_message 'glob or include directive' && return 1 _message 'glob or include directive' && ret=1
elif [[ ! -prefix *:* ]]; then elif [[ ! -prefix *:* ]]; then
_rg_types -qS ':' && return 0 _rg_types -qS : && ret=0
fi fi
;; ;;
esac esac
shift state
done
return 1 return ret
}
# zsh 5.1 refuses to complete options if a 'match-less' operand like our pattern
# could be 'completed' instead. We can use _guard() to avoid this problem, but
# it introduces another one: zsh won't print the message if we try to complete
# the pattern after having passed `--`. To work around *that* problem, we can
# use this function to bypass the _guard() when `--` is on the command line.
# This is inaccurate (it'd get confused by e.g. `rg -e --`), but zsh's handling
# of `--` isn't accurate anyway
_rg_pattern() {
if (( ${words[(I)--]} )); then
_message 'pattern'
else
_guard '^-*' 'pattern'
fi
} }
# Complete encodings # Complete encodings
@@ -198,7 +316,7 @@ _rg_encodings() {
x-user-defined auto x-user-defined auto
) )
_wanted rg-encodings expl 'encoding' compadd -a "${@}" - _encodings _wanted encodings expl encoding compadd -a "$@" - _encodings
} }
# Complete file types # Complete file types
@@ -206,12 +324,12 @@ _rg_types() {
local -a expl local -a expl
local -aU _types local -aU _types
_types=( ${${(f)"$( _call_program rg-types rg --type-list )"}%%:*} ) _types=( ${(@)${(f)"$( _call_program types rg --type-list )"}%%:*} )
_wanted rg-types expl 'file type' compadd -a "${@}" - _types _wanted types expl 'file type' compadd -a "$@" - _types
} }
_rg "${@}" _rg "$@"
# ------------------------------------------------------------------------------ # ------------------------------------------------------------------------------
# Copyright (c) 2011 Github zsh-users - http://github.com/zsh-users # Copyright (c) 2011 Github zsh-users - http://github.com/zsh-users


@@ -18,6 +18,8 @@ Synopsis
 *rg* [_OPTIONS_] *--type-list*
+*command* | *rg* [_OPTIONS_] _PATTERN_
 *rg* [_OPTIONS_] *--help*
 *rg* [_OPTIONS_] *--version*
@@ -98,6 +100,28 @@ would behave identically to the following command
 rg --smart-case foo
+another example is adding types
+--type-add
+web:*.{html,css,js}*
+would behave identically to the following command
+rg --type-add 'web:*.{html,css,js}*' foo
+same with using globs
+--glob=!git/*
+or
+--glob
+!git/*
+would behave identically to the following command
+rg --glob '!git/*' foo
 ripgrep also provides a flag, *--no-config*, that when present will suppress
 any and all support for configuration. This includes any future support
 for auto-loading configuration files from pre-determined paths.


@@ -1,6 +1,6 @@
[package]
name = "globset"
- version = "0.4.0" #:version
+ version = "0.4.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Cross platform single glob and glob set matching. Glob set matching is the
@@ -23,10 +23,10 @@ aho-corasick = "0.6.0"
fnv = "1.0"
log = "0.4"
memchr = "2"
- regex = "0.2.9"
+ regex = "1"
[dev-dependencies]
glob = "0.2"
[features]
- simd-accel = ["regex/simd-accel"]
+ simd-accel = []


@@ -275,6 +275,19 @@ impl Glob {
}
/// Returns the regular expression string for this glob.
///
/// Note that regular expressions for globs are intended to be matched on
/// arbitrary bytes (`&[u8]`) instead of Unicode strings (`&str`). In
/// particular, globs are frequently used on file paths, where there is no
/// general guarantee that file paths are themselves valid UTF-8. As a
/// result, callers will need to ensure that they are using a regex API
/// that can match on arbitrary bytes. For example, the
/// [`regex`](https://crates.io/regex)
/// crate's
/// [`Regex`](https://docs.rs/regex/*/regex/struct.Regex.html)
/// API is not suitable for this since it matches on `&str`, but its
/// [`bytes::Regex`](https://docs.rs/regex/*/regex/bytes/struct.Regex.html)
/// API is suitable for this.
pub fn regex(&self) -> &str {
&self.re
}
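// ---------------------------------------------------------------------------
// Editor's sketch (not part of the original diff): the byte-oriented matching
// that the new doc comment above describes. It assumes the `globset` and
// `regex` crates as dependencies; the path `lib\xFF.rs` is deliberately not
// valid UTF-8.
extern crate globset;
extern crate regex;

use globset::Glob;

fn main() {
    let glob = Glob::new("*.rs").unwrap();
    // `Glob::regex` yields a pattern intended for matching arbitrary bytes,
    // so compile it with `regex::bytes::Regex` rather than `regex::Regex`.
    let re = regex::bytes::Regex::new(glob.regex()).unwrap();
    assert!(re.is_match(b"lib\xFF.rs"));
}
// ---------------------------------------------------------------------------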


@@ -288,6 +288,14 @@ pub struct GlobSet {
}
impl GlobSet {
/// Create an empty `GlobSet`. An empty set matches nothing.
pub fn empty() -> GlobSet {
GlobSet {
len: 0,
strats: vec![],
}
}
/// Returns true if this set is empty, and therefore matches nothing.
pub fn is_empty(&self) -> bool {
self.len == 0

grep-matcher/Cargo.toml Normal file

@@ -0,0 +1,24 @@
[package]
name = "grep-matcher"
version = "0.0.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A trait for regular expressions, with a focus on line oriented search.
"""
documentation = "https://docs.rs/grep-matcher"
homepage = "https://github.com/BurntSushi/ripgrep"
repository = "https://github.com/BurntSushi/ripgrep"
readme = "README.md"
keywords = ["regex", "pattern", "trait"]
license = "Unlicense/MIT"
autotests = false
[dependencies]
memchr = "2"
[dev-dependencies]
regex = "1"
[[test]]
name = "integration"
path = "tests/tests.rs"

grep-matcher/README.md Normal file

@@ -0,0 +1,4 @@
grep
----
This is a *library* that provides grep-style line-by-line regex searching (with
comparable performance to `grep` itself).


@@ -0,0 +1,328 @@
use std::str;
use memchr::memchr;
/// Interpolate capture references in `replacement` and write the interpolation
/// result to `dst`. References in `replacement` take the form of $N or $name,
/// where `N` is a capture group index and `name` is a capture group name. The
/// function provided, `name_to_index`, maps capture group names to indices.
///
/// The `append` function given is responsible for writing the replacement
/// to the `dst` buffer. That is, it is called with the capture group index
/// of a capture group reference and is expected to resolve the index to its
/// corresponding matched text. If no such match exists, then `append` should
/// not write anything to its given buffer.
pub fn interpolate<A, N>(
mut replacement: &[u8],
mut append: A,
mut name_to_index: N,
dst: &mut Vec<u8>,
) where
A: FnMut(usize, &mut Vec<u8>),
N: FnMut(&str) -> Option<usize>
{
while !replacement.is_empty() {
match memchr(b'$', replacement) {
None => break,
Some(i) => {
dst.extend(&replacement[..i]);
replacement = &replacement[i..];
}
}
if replacement.get(1).map_or(false, |&b| b == b'$') {
dst.push(b'$');
replacement = &replacement[2..];
continue;
}
debug_assert!(!replacement.is_empty());
let cap_ref = match find_cap_ref(replacement) {
Some(cap_ref) => cap_ref,
None => {
dst.push(b'$');
replacement = &replacement[1..];
continue;
}
};
replacement = &replacement[cap_ref.end..];
match cap_ref.cap {
Ref::Number(i) => append(i, dst),
Ref::Named(name) => {
if let Some(i) = name_to_index(name) {
append(i, dst);
}
}
}
}
dst.extend(replacement);
}
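// Editor's sketch (not part of the original diff): driving `interpolate`
// with simple closures. Capture 0 is a hypothetical whole match and the
// name `first` maps to capture index 1.
#[cfg(test)]
mod interpolate_example {
    use super::interpolate;

    #[test]
    fn doc_example() {
        let caps = vec!["aa bb", "aa"];
        let mut dst = vec![];
        interpolate(
            b"<$first> and <$0>",
            // Resolve a capture index to its matched text.
            |i, dst| {
                if let Some(s) = caps.get(i) {
                    dst.extend(s.as_bytes());
                }
            },
            // Map a capture name to its index.
            |name| if name == "first" { Some(1) } else { None },
            &mut dst,
        );
        assert_eq!(dst, b"<aa> and <aa bb>".to_vec());
    }
}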
/// `CaptureRef` represents a reference to a capture group inside some text.
/// The reference is either a capture group name or a number.
///
/// It is also tagged with the position in the text immediately following the
/// capture reference.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
struct CaptureRef<'a> {
cap: Ref<'a>,
end: usize,
}
/// A reference to a capture group in some text.
///
/// e.g., `$2`, `$foo`, `${foo}`.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
enum Ref<'a> {
Named(&'a str),
Number(usize),
}
impl<'a> From<&'a str> for Ref<'a> {
fn from(x: &'a str) -> Ref<'a> {
Ref::Named(x)
}
}
impl From<usize> for Ref<'static> {
fn from(x: usize) -> Ref<'static> {
Ref::Number(x)
}
}
/// Parses a possible reference to a capture group name in the given text,
/// starting at the beginning of `replacement`.
///
/// If no such valid reference could be found, None is returned.
fn find_cap_ref(replacement: &[u8]) -> Option<CaptureRef> {
let mut i = 0;
if replacement.len() <= 1 || replacement[0] != b'$' {
return None;
}
let mut brace = false;
i += 1;
if replacement[i] == b'{' {
brace = true;
i += 1;
}
let mut cap_end = i;
while replacement.get(cap_end).map_or(false, is_valid_cap_letter) {
cap_end += 1;
}
if cap_end == i {
return None;
}
// We just verified that the range 0..cap_end is valid ASCII, so it must
// therefore be valid UTF-8. If we really cared, we could avoid this UTF-8
// check with an unchecked conversion or by parsing the number straight
// from &[u8].
let cap = str::from_utf8(&replacement[i..cap_end])
.expect("valid UTF-8 capture name");
if brace {
if !replacement.get(cap_end).map_or(false, |&b| b == b'}') {
return None;
}
cap_end += 1;
}
Some(CaptureRef {
cap: match cap.parse::<u32>() {
Ok(i) => Ref::Number(i as usize),
Err(_) => Ref::Named(cap),
},
end: cap_end,
})
}
/// Returns true if and only if the given byte is allowed in a capture name.
fn is_valid_cap_letter(b: &u8) -> bool {
match *b {
b'0' ... b'9' | b'a' ... b'z' | b'A' ... b'Z' | b'_' => true,
_ => false,
}
}
#[cfg(test)]
mod tests {
use super::{CaptureRef, find_cap_ref, interpolate};
macro_rules! find {
($name:ident, $text:expr) => {
#[test]
fn $name() {
assert_eq!(None, find_cap_ref($text.as_bytes()));
}
};
($name:ident, $text:expr, $capref:expr) => {
#[test]
fn $name() {
assert_eq!(Some($capref), find_cap_ref($text.as_bytes()));
}
};
}
macro_rules! c {
($name_or_number:expr, $pos:expr) => {
CaptureRef { cap: $name_or_number.into(), end: $pos }
};
}
find!(find_cap_ref1, "$foo", c!("foo", 4));
find!(find_cap_ref2, "${foo}", c!("foo", 6));
find!(find_cap_ref3, "$0", c!(0, 2));
find!(find_cap_ref4, "$5", c!(5, 2));
find!(find_cap_ref5, "$10", c!(10, 3));
find!(find_cap_ref6, "$42a", c!("42a", 4));
find!(find_cap_ref7, "${42}a", c!(42, 5));
find!(find_cap_ref8, "${42");
find!(find_cap_ref9, "${42 ");
find!(find_cap_ref10, " $0 ");
find!(find_cap_ref11, "$");
find!(find_cap_ref12, " ");
find!(find_cap_ref13, "");
// A convenience routine for using interpolate's unwieldy but flexible API.
fn interpolate_string(
mut name_to_index: Vec<(&'static str, usize)>,
caps: Vec<&'static str>,
replacement: &str,
) -> String {
name_to_index.sort_by_key(|x| x.0);
let mut dst = vec![];
interpolate(
replacement.as_bytes(),
|i, dst| {
if let Some(&s) = caps.get(i) {
dst.extend(s.as_bytes());
}
},
|name| -> Option<usize> {
name_to_index
.binary_search_by_key(&name, |x| x.0)
.ok()
.map(|i| name_to_index[i].1)
},
&mut dst,
);
String::from_utf8(dst).unwrap()
}
macro_rules! interp {
($name:ident, $map:expr, $caps:expr, $hay:expr, $expected:expr $(,)*) => {
#[test]
fn $name() {
assert_eq!($expected, interpolate_string($map, $caps, $hay));
}
}
}
interp!(
interp1,
vec![("foo", 2)],
vec!["", "", "xxx"],
"test $foo test",
"test xxx test",
);
interp!(
interp2,
vec![("foo", 2)],
vec!["", "", "xxx"],
"test$footest",
"test",
);
interp!(
interp3,
vec![("foo", 2)],
vec!["", "", "xxx"],
"test${foo}test",
"testxxxtest",
);
interp!(
interp4,
vec![("foo", 2)],
vec!["", "", "xxx"],
"test$2test",
"test",
);
interp!(
interp5,
vec![("foo", 2)],
vec!["", "", "xxx"],
"test${2}test",
"testxxxtest",
);
interp!(
interp6,
vec![("foo", 2)],
vec!["", "", "xxx"],
"test $$foo test",
"test $foo test",
);
interp!(
interp7,
vec![("foo", 2)],
vec!["", "", "xxx"],
"test $foo",
"test xxx",
);
interp!(
interp8,
vec![("foo", 2)],
vec!["", "", "xxx"],
"$foo test",
"xxx test",
);
interp!(
interp9,
vec![("bar", 1), ("foo", 2)],
vec!["", "yyy", "xxx"],
"test $bar$foo",
"test yyyxxx",
);
interp!(
interp10,
vec![("bar", 1), ("foo", 2)],
vec!["", "yyy", "xxx"],
"test $ test",
"test $ test",
);
interp!(
interp11,
vec![("bar", 1), ("foo", 2)],
vec!["", "yyy", "xxx"],
"test ${} test",
"test ${} test",
);
interp!(
interp12,
vec![("bar", 1), ("foo", 2)],
vec!["", "yyy", "xxx"],
"test ${ } test",
"test ${ } test",
);
interp!(
interp13,
vec![("bar", 1), ("foo", 2)],
vec!["", "yyy", "xxx"],
"test ${a b} test",
"test ${a b} test",
);
interp!(
interp14,
vec![("bar", 1), ("foo", 2)],
vec!["", "yyy", "xxx"],
"test ${a} test",
"test test",
);
}

grep-matcher/src/lib.rs Normal file

File diff suppressed because it is too large


@@ -0,0 +1,208 @@
use grep_matcher::{Captures, Match, Matcher};
use regex::bytes::Regex;
use util::{RegexMatcher, RegexMatcherNoCaps};
fn matcher(pattern: &str) -> RegexMatcher {
RegexMatcher::new(Regex::new(pattern).unwrap())
}
fn matcher_no_caps(pattern: &str) -> RegexMatcherNoCaps {
RegexMatcherNoCaps(Regex::new(pattern).unwrap())
}
fn m(start: usize, end: usize) -> Match {
Match::new(start, end)
}
#[test]
fn find() {
let matcher = matcher(r"(\w+)\s+(\w+)");
assert_eq!(matcher.find(b" homer simpson ").unwrap(), Some(m(1, 14)));
}
#[test]
fn find_iter() {
let matcher = matcher(r"(\w+)\s+(\w+)");
let mut matches = vec![];
matcher.find_iter(b"aa bb cc dd", |m| {
matches.push(m);
true
}).unwrap();
assert_eq!(matches, vec![m(0, 5), m(6, 11)]);
// Test that find_iter respects short circuiting.
matches.clear();
matcher.find_iter(b"aa bb cc dd", |m| {
matches.push(m);
false
}).unwrap();
assert_eq!(matches, vec![m(0, 5)]);
}
#[test]
fn try_find_iter() {
#[derive(Clone, Debug, Eq, PartialEq)]
struct MyError;
let matcher = matcher(r"(\w+)\s+(\w+)");
let mut matches = vec![];
let err = matcher.try_find_iter(b"aa bb cc dd", |m| {
if matches.is_empty() {
matches.push(m);
Ok(true)
} else {
Err(MyError)
}
}).unwrap().unwrap_err();
assert_eq!(matches, vec![m(0, 5)]);
assert_eq!(err, MyError);
}
#[test]
fn shortest_match() {
let matcher = matcher(r"a+");
// This tests that the default impl isn't doing anything smart, and simply
// defers to `find`.
assert_eq!(matcher.shortest_match(b"aaa").unwrap(), Some(3));
// The actual underlying regex is smarter.
assert_eq!(matcher.re.shortest_match(b"aaa"), Some(1));
}
#[test]
fn captures() {
let matcher = matcher(r"(?P<a>\w+)\s+(?P<b>\w+)");
assert_eq!(matcher.capture_count(), 3);
assert_eq!(matcher.capture_index("a"), Some(1));
assert_eq!(matcher.capture_index("b"), Some(2));
assert_eq!(matcher.capture_index("nada"), None);
let mut caps = matcher.new_captures().unwrap();
assert!(matcher.captures(b" homer simpson ", &mut caps).unwrap());
assert_eq!(caps.get(0), Some(m(1, 14)));
assert_eq!(caps.get(1), Some(m(1, 6)));
assert_eq!(caps.get(2), Some(m(7, 14)));
}
#[test]
fn captures_iter() {
let matcher = matcher(r"(?P<a>\w+)\s+(?P<b>\w+)");
let mut caps = matcher.new_captures().unwrap();
let mut matches = vec![];
matcher.captures_iter(b"aa bb cc dd", &mut caps, |caps| {
matches.push(caps.get(0).unwrap());
matches.push(caps.get(1).unwrap());
matches.push(caps.get(2).unwrap());
true
}).unwrap();
assert_eq!(matches, vec![
m(0, 5), m(0, 2), m(3, 5),
m(6, 11), m(6, 8), m(9, 11),
]);
// Test that captures_iter respects short circuiting.
matches.clear();
matcher.captures_iter(b"aa bb cc dd", &mut caps, |caps| {
matches.push(caps.get(0).unwrap());
matches.push(caps.get(1).unwrap());
matches.push(caps.get(2).unwrap());
false
}).unwrap();
assert_eq!(matches, vec![
m(0, 5), m(0, 2), m(3, 5),
]);
}
#[test]
fn try_captures_iter() {
#[derive(Clone, Debug, Eq, PartialEq)]
struct MyError;
let matcher = matcher(r"(?P<a>\w+)\s+(?P<b>\w+)");
let mut caps = matcher.new_captures().unwrap();
let mut matches = vec![];
let err = matcher.try_captures_iter(b"aa bb cc dd", &mut caps, |caps| {
if matches.is_empty() {
matches.push(caps.get(0).unwrap());
matches.push(caps.get(1).unwrap());
matches.push(caps.get(2).unwrap());
Ok(true)
} else {
Err(MyError)
}
}).unwrap().unwrap_err();
assert_eq!(matches, vec![m(0, 5), m(0, 2), m(3, 5)]);
assert_eq!(err, MyError);
}
// Test that our default impls for capturing are correct. Namely, when
// capturing isn't supported by the underlying matcher, then all of the
// various capturing related APIs fail fast.
#[test]
fn no_captures() {
let matcher = matcher_no_caps(r"(?P<a>\w+)\s+(?P<b>\w+)");
assert_eq!(matcher.capture_count(), 0);
assert_eq!(matcher.capture_index("a"), None);
assert_eq!(matcher.capture_index("b"), None);
assert_eq!(matcher.capture_index("nada"), None);
let mut caps = matcher.new_captures().unwrap();
assert!(!matcher.captures(b"homer simpson", &mut caps).unwrap());
let mut called = false;
matcher.captures_iter(b"homer simpson", &mut caps, |_| {
called = true;
true
}).unwrap();
assert!(!called);
}
#[test]
fn replace() {
let matcher = matcher(r"(\w+)\s+(\w+)");
let mut dst = vec![];
matcher.replace(b"aa bb cc dd", &mut dst, |_, dst| {
dst.push(b'z');
true
}).unwrap();
assert_eq!(dst, b"z z");
// Test that replacements respect short circuiting.
dst.clear();
matcher.replace(b"aa bb cc dd", &mut dst, |_, dst| {
dst.push(b'z');
false
}).unwrap();
assert_eq!(dst, b"z cc dd");
}
#[test]
fn replace_with_captures() {
let matcher = matcher(r"(\w+)\s+(\w+)");
let haystack = b"aa bb cc dd";
let mut caps = matcher.new_captures().unwrap();
let mut dst = vec![];
matcher.replace_with_captures(haystack, &mut caps, &mut dst, |caps, dst| {
caps.interpolate(
|name| matcher.capture_index(name),
haystack,
b"$2 $1",
dst,
);
true
}).unwrap();
assert_eq!(dst, b"bb aa dd cc");
// Test that replacements respect short circuiting.
dst.clear();
matcher.replace_with_captures(haystack, &mut caps, &mut dst, |caps, dst| {
caps.interpolate(
|name| matcher.capture_index(name),
haystack,
b"$2 $1",
dst,
);
false
}).unwrap();
assert_eq!(dst, b"bb aa cc dd");
}


@@ -0,0 +1,6 @@
extern crate grep_matcher;
extern crate regex;
mod util;
mod test_matcher;

grep-matcher/tests/util.rs Normal file

@@ -0,0 +1,104 @@
use std::collections::HashMap;
use std::result;
use grep_matcher::{Captures, Match, Matcher, NoCaptures, NoError};
use regex::bytes::{CaptureLocations, Regex};
#[derive(Debug)]
pub struct RegexMatcher {
pub re: Regex,
pub names: HashMap<String, usize>,
}
impl RegexMatcher {
pub fn new(re: Regex) -> RegexMatcher {
let mut names = HashMap::new();
for (i, optional_name) in re.capture_names().enumerate() {
if let Some(name) = optional_name {
names.insert(name.to_string(), i);
}
}
RegexMatcher {
re: re,
names: names,
}
}
}
type Result<T> = result::Result<T, NoError>;
impl Matcher for RegexMatcher {
type Captures = RegexCaptures;
type Error = NoError;
fn find_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<Match>> {
Ok(self.re
.find_at(haystack, at)
.map(|m| Match::new(m.start(), m.end())))
}
fn new_captures(&self) -> Result<RegexCaptures> {
Ok(RegexCaptures(self.re.capture_locations()))
}
fn captures_at(
&self,
haystack: &[u8],
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool> {
Ok(self.re.captures_read_at(&mut caps.0, haystack, at).is_some())
}
fn capture_count(&self) -> usize {
self.re.captures_len()
}
fn capture_index(&self, name: &str) -> Option<usize> {
self.names.get(name).map(|i| *i)
}
// We purposely don't implement any other methods, so that we test the
// default impls. The "real" Regex impl for Matcher provides a few more
// impls. e.g., Its `find_iter` impl is faster than what we can do here,
// since the regex crate avoids synchronization overhead.
}
#[derive(Debug)]
pub struct RegexMatcherNoCaps(pub Regex);
impl Matcher for RegexMatcherNoCaps {
type Captures = NoCaptures;
type Error = NoError;
fn find_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<Match>> {
Ok(self.0
.find_at(haystack, at)
.map(|m| Match::new(m.start(), m.end())))
}
fn new_captures(&self) -> Result<NoCaptures> {
Ok(NoCaptures::new())
}
}
#[derive(Clone, Debug)]
pub struct RegexCaptures(CaptureLocations);
impl Captures for RegexCaptures {
fn len(&self) -> usize {
self.0.len()
}
fn get(&self, i: usize) -> Option<Match> {
self.0.pos(i).map(|(s, e)| Match::new(s, e))
}
}

grep-printer/Cargo.toml Normal file

@@ -0,0 +1,31 @@
[package]
name = "grep-printer"
version = "0.0.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
An implementation of the grep crate's Sink trait that provides standard
printing of search results, similar to grep itself.
"""
documentation = "https://docs.rs/grep-printer"
homepage = "https://github.com/BurntSushi/ripgrep"
repository = "https://github.com/BurntSushi/ripgrep"
readme = "README.md"
keywords = ["grep", "pattern", "print", "printer", "sink"]
license = "Unlicense/MIT"
[features]
default = ["serde1"]
serde1 = ["base64", "serde", "serde_derive", "serde_json"]
[dependencies]
base64 = { version = "0.9", optional = true }
grep-matcher = { version = "0.0.1", path = "../grep-matcher" }
grep-searcher = { version = "0.0.1", path = "../grep-searcher" }
log = "0.4"
termcolor = "1"
serde = { version = "1", optional = true }
serde_derive = { version = "1", optional = true }
serde_json = { version = "1", optional = true }
[dev-dependencies]
grep-regex = { version = "0.0.1", path = "../grep-regex" }

grep-printer/README.md Normal file

@@ -0,0 +1,4 @@
grep
----
This is a *library* that provides grep-style line-by-line regex searching (with
comparable performance to `grep` itself).

grep-printer/src/color.rs Normal file

@@ -0,0 +1,366 @@
use std::error;
use std::fmt;
use std::str::FromStr;
use termcolor::{Color, ColorSpec, ParseColorError};
/// An error that can occur when parsing color specifications.
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum ColorError {
/// This occurs when an unrecognized output type is used.
UnrecognizedOutType(String),
/// This occurs when an unrecognized spec type is used.
UnrecognizedSpecType(String),
/// This occurs when an unrecognized color name is used.
UnrecognizedColor(String, String),
/// This occurs when an unrecognized style attribute is used.
UnrecognizedStyle(String),
/// This occurs when the format of a color specification is invalid.
InvalidFormat(String),
}
impl error::Error for ColorError {
fn description(&self) -> &str {
match *self {
ColorError::UnrecognizedOutType(_) => "unrecognized output type",
ColorError::UnrecognizedSpecType(_) => "unrecognized spec type",
ColorError::UnrecognizedColor(_, _) => "unrecognized color name",
ColorError::UnrecognizedStyle(_) => "unrecognized style attribute",
ColorError::InvalidFormat(_) => "invalid color spec",
}
}
}
impl ColorError {
fn from_parse_error(err: ParseColorError) -> ColorError {
ColorError::UnrecognizedColor(
err.invalid().to_string(),
err.to_string(),
)
}
}
impl fmt::Display for ColorError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match *self {
ColorError::UnrecognizedOutType(ref name) => {
write!(
f,
"unrecognized output type '{}'. Choose from: \
path, line, column, match.",
name,
)
}
ColorError::UnrecognizedSpecType(ref name) => {
write!(
f,
"unrecognized spec type '{}'. Choose from: \
fg, bg, style, none.",
name,
)
}
ColorError::UnrecognizedColor(_, ref msg) => {
write!(f, "{}", msg)
}
ColorError::UnrecognizedStyle(ref name) => {
write!(
f,
"unrecognized style attribute '{}'. Choose from: \
nobold, bold, nointense, intense, nounderline, \
underline.",
name,
)
}
ColorError::InvalidFormat(ref original) => {
write!(
f,
"invalid color spec format: '{}'. Valid format \
is '(path|line|column|match):(fg|bg|style):(value)'.",
original,
)
}
}
}
}
/// A merged set of color specifications.
///
/// This set of color specifications represents the various color types that
/// are supported by the printers in this crate. A set of color specifications
/// can be created from a sequence of
/// [`UserColorSpec`s](struct.UserColorSpec.html).
#[derive(Clone, Debug, Default, Eq, PartialEq)]
pub struct ColorSpecs {
path: ColorSpec,
line: ColorSpec,
column: ColorSpec,
matched: ColorSpec,
}
/// A single color specification provided by the user.
///
/// ## Format
///
/// The format of a `Spec` is a triple: `{type}:{attribute}:{value}`. Each
/// component is defined as follows:
///
/// * `{type}` can be one of `path`, `line`, `column` or `match`.
/// * `{attribute}` can be one of `fg`, `bg` or `style`. `{attribute}` may also
/// be the special value `none`, in which case, `{value}` can be omitted.
/// * `{value}` is either a color name (for `fg`/`bg`) or a style instruction.
///
/// `{type}` controls which part of the output should be styled.
///
/// When `{attribute}` is `none`, then this should cause any existing style
/// settings to be cleared for the specified `type`.
///
/// `{value}` should be a color when `{attribute}` is `fg` or `bg`, or it
/// should be a style instruction when `{attribute}` is `style`. When
/// `{attribute}` is `none`, `{value}` must be omitted.
///
/// Valid colors are `black`, `blue`, `green`, `red`, `cyan`, `magenta`,
/// `yellow`, `white`. Extended colors can also be specified, and are formatted
/// as `x` (for 256-color palettes) or `x,x,x` (for 24-bit true color), where
/// `x` is a number between 0 and 255 inclusive. `x` may be given as a normal
/// decimal number or a hexadecimal number, where the latter is prefixed by
/// `0x`.
///
/// Valid style instructions are `nobold`, `bold`, `intense`, `nointense`,
/// `underline`, `nounderline`.
///
/// ## Example
///
/// The standard way to build a `UserColorSpec` is to parse it from a string.
/// Once multiple `UserColorSpec`s have been constructed, they can be provided
/// to the standard printer where they will automatically be applied to the
/// output.
///
/// A `UserColorSpec` can also be converted to a `termcolor::ColorSpec`:
///
/// ```rust
/// extern crate grep_printer;
/// extern crate termcolor;
///
/// # fn main() {
/// use termcolor::{Color, ColorSpec};
/// use grep_printer::UserColorSpec;
///
/// let user_spec1: UserColorSpec = "path:fg:blue".parse().unwrap();
/// let user_spec2: UserColorSpec = "match:bg:0xff,0x7f,0x00".parse().unwrap();
///
/// let spec1 = user_spec1.to_color_spec();
/// let spec2 = user_spec2.to_color_spec();
///
/// assert_eq!(spec1.fg(), Some(&Color::Blue));
/// assert_eq!(spec2.bg(), Some(&Color::Rgb(0xFF, 0x7F, 0x00)));
/// # }
/// ```
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct UserColorSpec {
ty: OutType,
value: SpecValue,
}
impl UserColorSpec {
/// Convert this user provided color specification to a specification that
/// can be used with `termcolor`. This drops the type of this specification
/// (where the type indicates where the color is applied in the standard
/// printer, e.g., to the file path or the line numbers, etc.).
pub fn to_color_spec(&self) -> ColorSpec {
let mut spec = ColorSpec::default();
self.value.merge_into(&mut spec);
spec
}
}
/// The actual value given by the specification.
#[derive(Clone, Debug, Eq, PartialEq)]
enum SpecValue {
None,
Fg(Color),
Bg(Color),
Style(Style),
}
/// The set of configurable portions of ripgrep's output.
#[derive(Clone, Debug, Eq, PartialEq)]
enum OutType {
Path,
Line,
Column,
Match,
}
/// The specification type.
#[derive(Clone, Debug, Eq, PartialEq)]
enum SpecType {
Fg,
Bg,
Style,
None,
}
/// The set of available styles for use in the terminal.
#[derive(Clone, Debug, Eq, PartialEq)]
enum Style {
Bold,
NoBold,
Intense,
NoIntense,
Underline,
NoUnderline
}
impl ColorSpecs {
/// Create color specifications from a list of user supplied
/// specifications.
pub fn new(specs: &[UserColorSpec]) -> ColorSpecs {
let mut merged = ColorSpecs::default();
for spec in specs {
match spec.ty {
OutType::Path => spec.merge_into(&mut merged.path),
OutType::Line => spec.merge_into(&mut merged.line),
OutType::Column => spec.merge_into(&mut merged.column),
OutType::Match => spec.merge_into(&mut merged.matched),
}
}
merged
}
/// Return the color specification for coloring file paths.
pub fn path(&self) -> &ColorSpec {
&self.path
}
/// Return the color specification for coloring line numbers.
pub fn line(&self) -> &ColorSpec {
&self.line
}
/// Return the color specification for coloring column numbers.
pub fn column(&self) -> &ColorSpec {
&self.column
}
/// Return the color specification for coloring matched text.
pub fn matched(&self) -> &ColorSpec {
&self.matched
}
}
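// Editor's sketch (not part of the original diff): merging several user
// specs into one `ColorSpecs`, using only the API defined in this file plus
// `termcolor`'s `ColorSpec` accessors.
#[cfg(test)]
mod color_specs_example {
    use super::{ColorSpecs, UserColorSpec};

    #[test]
    fn merge_user_specs() {
        let specs: Vec<UserColorSpec> = vec![
            "path:fg:magenta".parse().unwrap(),
            "line:style:bold".parse().unwrap(),
            "match:bg:yellow".parse().unwrap(),
        ];
        let merged = ColorSpecs::new(&specs);
        // Each spec lands in the slot named by its `{type}` component.
        assert!(merged.line().bold());
        assert!(merged.path().fg().is_some());
        assert!(merged.matched().bg().is_some());
    }
}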
impl UserColorSpec {
/// Merge this spec into the given color specification.
fn merge_into(&self, cspec: &mut ColorSpec) {
self.value.merge_into(cspec);
}
}
impl SpecValue {
/// Merge this spec value into the given color specification.
fn merge_into(&self, cspec: &mut ColorSpec) {
match *self {
SpecValue::None => cspec.clear(),
SpecValue::Fg(ref color) => { cspec.set_fg(Some(color.clone())); }
SpecValue::Bg(ref color) => { cspec.set_bg(Some(color.clone())); }
SpecValue::Style(ref style) => {
match *style {
Style::Bold => { cspec.set_bold(true); }
Style::NoBold => { cspec.set_bold(false); }
Style::Intense => { cspec.set_intense(true); }
Style::NoIntense => { cspec.set_intense(false); }
Style::Underline => { cspec.set_underline(true); }
Style::NoUnderline => { cspec.set_underline(false); }
}
}
}
}
}
impl FromStr for UserColorSpec {
type Err = ColorError;
fn from_str(s: &str) -> Result<UserColorSpec, ColorError> {
let pieces: Vec<&str> = s.split(':').collect();
if pieces.len() <= 1 || pieces.len() > 3 {
return Err(ColorError::InvalidFormat(s.to_string()));
}
let otype: OutType = pieces[0].parse()?;
match pieces[1].parse()? {
SpecType::None => {
Ok(UserColorSpec {
ty: otype,
value: SpecValue::None,
})
}
SpecType::Style => {
if pieces.len() < 3 {
return Err(ColorError::InvalidFormat(s.to_string()));
}
let style: Style = pieces[2].parse()?;
Ok(UserColorSpec { ty: otype, value: SpecValue::Style(style) })
}
SpecType::Fg => {
if pieces.len() < 3 {
return Err(ColorError::InvalidFormat(s.to_string()));
}
let color: Color = pieces[2]
.parse()
.map_err(ColorError::from_parse_error)?;
Ok(UserColorSpec { ty: otype, value: SpecValue::Fg(color) })
}
SpecType::Bg => {
if pieces.len() < 3 {
return Err(ColorError::InvalidFormat(s.to_string()));
}
let color: Color = pieces[2]
.parse()
.map_err(ColorError::from_parse_error)?;
Ok(UserColorSpec { ty: otype, value: SpecValue::Bg(color) })
}
}
}
}
impl FromStr for OutType {
type Err = ColorError;
fn from_str(s: &str) -> Result<OutType, ColorError> {
match &*s.to_lowercase() {
"path" => Ok(OutType::Path),
"line" => Ok(OutType::Line),
"column" => Ok(OutType::Column),
"match" => Ok(OutType::Match),
_ => Err(ColorError::UnrecognizedOutType(s.to_string())),
}
}
}
impl FromStr for SpecType {
type Err = ColorError;
fn from_str(s: &str) -> Result<SpecType, ColorError> {
match &*s.to_lowercase() {
"fg" => Ok(SpecType::Fg),
"bg" => Ok(SpecType::Bg),
"style" => Ok(SpecType::Style),
"none" => Ok(SpecType::None),
_ => Err(ColorError::UnrecognizedSpecType(s.to_string())),
}
}
}
impl FromStr for Style {
type Err = ColorError;
fn from_str(s: &str) -> Result<Style, ColorError> {
match &*s.to_lowercase() {
"bold" => Ok(Style::Bold),
"nobold" => Ok(Style::NoBold),
"intense" => Ok(Style::Intense),
"nointense" => Ok(Style::NoIntense),
"underline" => Ok(Style::Underline),
"nounderline" => Ok(Style::NoUnderline),
_ => Err(ColorError::UnrecognizedStyle(s.to_string())),
}
}
}


@@ -0,0 +1,90 @@
use std::io::{self, Write};
use termcolor::{ColorSpec, WriteColor};
/// A writer that counts the number of bytes that have been successfully
/// written.
#[derive(Clone, Debug)]
pub struct CounterWriter<W> {
wtr: W,
count: u64,
total_count: u64,
}
impl<W: Write> CounterWriter<W> {
pub fn new(wtr: W) -> CounterWriter<W> {
CounterWriter { wtr: wtr, count: 0, total_count: 0 }
}
}
impl<W> CounterWriter<W> {
/// Returns the number of bytes written since construction or since the
/// last time `reset_count` was called.
pub fn count(&self) -> u64 {
self.count
}
/// Returns the total number of bytes written since construction.
pub fn total_count(&self) -> u64 {
self.total_count + self.count
}
/// Resets the number of bytes written to `0`.
pub fn reset_count(&mut self) {
self.total_count += self.count;
self.count = 0;
}
/// Clears all counting-related state for this writer.
///
/// After this call, the total count of bytes written to the underlying
/// writer is erased and reset.
#[allow(dead_code)]
pub fn clear(&mut self) {
self.count = 0;
self.total_count = 0;
}
#[allow(dead_code)]
pub fn get_ref(&self) -> &W {
&self.wtr
}
pub fn get_mut(&mut self) -> &mut W {
&mut self.wtr
}
pub fn into_inner(self) -> W {
self.wtr
}
}
impl<W: Write> Write for CounterWriter<W> {
fn write(&mut self, buf: &[u8]) -> Result<usize, io::Error> {
let n = self.wtr.write(buf)?;
self.count += n as u64;
Ok(n)
}
fn flush(&mut self) -> Result<(), io::Error> {
self.wtr.flush()
}
}
impl<W: WriteColor> WriteColor for CounterWriter<W> {
fn supports_color(&self) -> bool {
self.wtr.supports_color()
}
fn set_color(&mut self, spec: &ColorSpec) -> io::Result<()> {
self.wtr.set_color(spec)
}
fn reset(&mut self) -> io::Result<()> {
self.wtr.reset()
}
fn is_synchronous(&self) -> bool {
self.wtr.is_synchronous()
}
}
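// Editor's sketch (not part of the original diff): the counting behavior of
// the crate-internal `CounterWriter` defined above.
#[cfg(test)]
mod counter_example {
    use std::io::Write;

    use super::CounterWriter;

    #[test]
    fn counts_across_resets() {
        let mut wtr = CounterWriter::new(vec![]);
        wtr.write_all(b"hello").unwrap();
        assert_eq!(wtr.count(), 5);
        // `reset_count` zeroes the per-search count but folds it into the
        // running total.
        wtr.reset_count();
        wtr.write_all(b", world").unwrap();
        assert_eq!(wtr.count(), 7);
        assert_eq!(wtr.total_count(), 12);
    }
}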

grep-printer/src/json.rs Normal file

@@ -0,0 +1,914 @@
use std::io::{self, Write};
use std::path::Path;
use std::time::Instant;
use grep_matcher::{Match, Matcher};
use grep_searcher::{
Searcher,
Sink, SinkError, SinkContext, SinkContextKind, SinkFinish, SinkMatch,
};
use serde_json as json;
use counter::CounterWriter;
use jsont;
use stats::Stats;
/// The configuration for the JSON printer.
///
/// This is manipulated by the JSONBuilder and then referenced by the actual
/// implementation. Once a printer is built, the configuration is frozen and
/// cannot be changed.
#[derive(Debug, Clone)]
struct Config {
pretty: bool,
max_matches: Option<u64>,
always_begin_end: bool,
}
impl Default for Config {
fn default() -> Config {
Config {
pretty: false,
max_matches: None,
always_begin_end: false,
}
}
}
/// A builder for a JSON lines printer.
///
/// The builder permits configuring how the printer behaves. The JSON printer
/// has fewer configuration options than the standard printer because it is
/// a structured format, and the printer always attempts to find the most
/// information possible.
///
/// Some configuration options, such as whether line numbers are included or
/// whether contextual lines are shown, are drawn directly from the
/// `grep_searcher::Searcher`'s configuration.
///
/// Once a `JSON` printer is built, its configuration cannot be changed.
#[derive(Clone, Debug)]
pub struct JSONBuilder {
config: Config,
}
impl JSONBuilder {
/// Return a new builder for configuring the JSON printer.
pub fn new() -> JSONBuilder {
JSONBuilder { config: Config::default() }
}
/// Create a JSON printer that writes results to the given writer.
pub fn build<W: io::Write>(&self, wtr: W) -> JSON<W> {
JSON {
config: self.config.clone(),
wtr: CounterWriter::new(wtr),
matches: vec![],
}
}
/// Print JSON in a pretty printed format.
///
/// Enabling this will no longer produce a "JSON lines" format, in that
/// each JSON object printed may span multiple lines.
///
/// This is disabled by default.
pub fn pretty(&mut self, yes: bool) -> &mut JSONBuilder {
self.config.pretty = yes;
self
}
/// Set the maximum number of matches that are printed.
///
/// If multi line search is enabled and a match spans multiple lines, then
/// that match is counted exactly once for the purposes of enforcing this
/// limit, regardless of how many lines it spans.
pub fn max_matches(&mut self, limit: Option<u64>) -> &mut JSONBuilder {
self.config.max_matches = limit;
self
}
/// When enabled, the `begin` and `end` messages are always emitted, even
/// when no match is found.
///
/// When disabled, the `begin` and `end` messages are only shown if there
/// is at least one `match` or `context` message.
///
/// This is disabled by default.
pub fn always_begin_end(&mut self, yes: bool) -> &mut JSONBuilder {
self.config.always_begin_end = yes;
self
}
}
/// The JSON printer, which emits results in a JSON lines format.
///
/// This type is generic over `W`, which represents any implementation of
/// the standard library `io::Write` trait.
///
/// # Format
///
/// This section describes the JSON format used by this printer.
///
/// To skip the rigamarole, take a look at the
/// [example](#example)
/// at the end.
///
/// ## Overview
///
/// The format of this printer is the [JSON Lines](http://jsonlines.org/)
/// format. Specifically, this printer emits a sequence of messages, where
/// each message is encoded as a single JSON value on a single line. There are
/// four different types of messages (and this number may expand over time):
///
/// * **begin** - A message that indicates a file is being searched.
/// * **end** - A message that indicates a file is done being searched. This
/// message also includes summary statistics about the search.
/// * **match** - A message that indicates a match was found. This includes
/// the text and offsets of the match.
/// * **context** - A message that indicates a contextual line was found.
/// This includes the text of the line, along with any match information if
/// the search was inverted.
///
/// Every message is encoded in the same envelope format, which includes a tag
/// indicating the message type along with an object for the payload:
///
/// ```json
/// {
/// "type": "{begin|end|match|context}",
/// "data": { ... }
/// }
/// ```
///
/// The message itself is encoded in the envelope's `data` key.
///
/// ## Text encoding
///
/// Before describing each message format, we first must briefly discuss text
/// encoding, since it factors into every type of message. In particular, JSON
/// may only be encoded in UTF-8, UTF-16 or UTF-32. For the purposes of this
/// printer, we need only worry about UTF-8. The problem here is that searching
/// is not limited to UTF-8 exclusively, which in turn implies that matches
/// may be reported that contain invalid UTF-8. Moreover, this printer may
/// also print file paths, and the encoding of file paths is itself not
/// guaranteed to be valid UTF-8. Therefore, this printer must deal with the
/// presence of invalid UTF-8 somehow. The printer could silently ignore such
/// things completely, or even lossily transcode invalid UTF-8 to valid UTF-8
/// by replacing all invalid sequences with the Unicode replacement character.
/// However, this would prevent consumers of this format from accessing the
/// original data in a non-lossy way.
///
/// Therefore, this printer will emit valid UTF-8 encoded bytes as normal
/// JSON strings and otherwise base64 encode data that isn't valid UTF-8. To
/// communicate whether this process occurs or not, strings are keyed by the
/// name `text`, whereas arbitrary bytes are keyed by `bytes`.
///
/// For example, when a path is included in a message, it is formatted like so,
/// if and only if the path is valid UTF-8:
///
/// ```json
/// {
/// "path": {
/// "text": "/home/ubuntu/lib.rs"
/// }
/// }
/// ```
///
/// If instead our path was `/home/ubuntu/lib\xFF.rs`, where the `\xFF` byte
/// makes it invalid UTF-8, the path would instead be encoded like so:
///
/// ```json
/// {
/// "path": {
/// "bytes": "L2hvbWUvdWJ1bnR1L2xpYv8ucnM="
/// }
/// }
/// ```
///
/// This same representation is used for reporting matches as well.
///
/// The printer guarantees that the `text` field is used whenever the
/// underlying bytes are valid UTF-8.
///
/// ## Wire format
///
/// This section documents the wire format emitted by this printer, starting
/// with the four types of messages.
///
/// Each message has its own format, and is contained inside an envelope that
/// indicates the type of message. The envelope has these fields:
///
/// * **type** - A string indicating the type of this message. It may be one
/// of four possible strings: `begin`, `end`, `match` or `context`. This
/// list may expand over time.
/// * **data** - The actual message data. The format of this field depends on
/// the value of `type`. The possible message formats are
/// [`begin`](#message-begin),
/// [`end`](#message-end),
/// [`match`](#message-match),
/// [`context`](#message-context).
///
/// #### Message: **begin**
///
/// This message indicates that a search has begun. It has these fields:
///
/// * **path** - An
/// [arbitrary data object](#object-arbitrary-data)
/// representing the file path corresponding to the search, if one is
/// present. If no file path is available, then this field is `null`.
///
/// #### Message: **end**
///
/// This message indicates that a search has finished. It has these fields:
///
/// * **path** - An
/// [arbitrary data object](#object-arbitrary-data)
/// representing the file path corresponding to the search, if one is
/// present. If no file path is available, then this field is `null`.
/// * **binary_offset** - The absolute offset in the data searched
/// corresponding to the place at which binary data was detected. If no
/// binary data was detected (or if binary detection was disabled), then this
/// field is `null`.
/// * **stats** - A [`stats` object](#object-stats) that contains summary
/// statistics for the previous search.
///
/// #### Message: **match**
///
/// This message indicates that a match has been found. A match generally
/// corresponds to a single line of text, although it may correspond to
/// multiple lines if the search can emit matches over multiple lines. It
/// has these fields:
///
/// * **path** - An
/// [arbitrary data object](#object-arbitrary-data)
/// representing the file path corresponding to the search, if one is
/// present. If no file path is available, then this field is `null`.
/// * **lines** - An
/// [arbitrary data object](#object-arbitrary-data)
/// representing one or more lines contained in this match.
/// * **line_number** - If the searcher has been configured to report line
/// numbers, then this corresponds to the line number of the first line
/// in `lines`. If no line numbers are available, then this is `null`.
/// * **absolute_offset** - The absolute byte offset corresponding to the start
/// of `lines` in the data being searched.
/// * **submatches** - An array of [`submatch` objects](#object-submatch)
/// corresponding to matches in `lines`. The offsets included in each
/// `submatch` correspond to byte offsets into `lines`. (If `lines` is base64
/// encoded, then the byte offsets correspond to the data after base64
/// decoding.) The `submatch` objects are guaranteed to be sorted by their
/// starting offsets. Note that it is possible for this array to be empty,
/// for example, when searching reports inverted matches.
///
/// #### Message: **context**
///
/// This message indicates that a contextual line has been found. A contextual
/// line is a line that doesn't contain a match, but is generally adjacent to
/// a line that does contain a match. The precise way in which contextual lines
/// are reported is determined by the searcher. It has these fields, which are
/// exactly the same fields found in a [`match`](#message-match):
///
/// * **path** - An
/// [arbitrary data object](#object-arbitrary-data)
/// representing the file path corresponding to the search, if one is
/// present. If no file path is available, then this field is `null`.
/// * **lines** - An
/// [arbitrary data object](#object-arbitrary-data)
/// representing one or more lines contained in this context. This includes
/// line terminators, if they're present.
/// * **line_number** - If the searcher has been configured to report line
/// numbers, then this corresponds to the line number of the first line
/// in `lines`. If no line numbers are available, then this is `null`.
/// * **absolute_offset** - The absolute byte offset corresponding to the start
/// of `lines` in the data being searched.
/// * **submatches** - An array of [`submatch` objects](#object-submatch)
/// corresponding to matches in `lines`. The offsets included in each
/// `submatch` correspond to byte offsets into `lines`. (If `lines` is base64
/// encoded, then the byte offsets correspond to the data after base64
/// decoding.) The `submatch` objects are guaranteed to be sorted by
/// their starting offsets. Note that it is possible for this array to be
/// non-empty, for example, when searching reports inverted matches such that
/// the original matcher could match things in the contextual lines.
///
/// #### Object: **submatch**
///
/// This object describes submatches found within `match` or `context`
/// messages. The `start` and `end` fields indicate the half-open interval on
/// which the match occurs (`start` is included, but `end` is not). It is
/// guaranteed that `start <= end`. It has these fields:
///
/// * **match** - An
/// [arbitrary data object](#object-arbitrary-data)
/// corresponding to the text in this submatch.
/// * **start** - A byte offset indicating the start of this match. This offset
/// is generally reported in terms of the parent object's data. For example,
/// the `lines` field in the
/// [`match`](#message-match) or [`context`](#message-context)
/// messages.
/// * **end** - A byte offset indicating the end of this match. This offset
/// is generally reported in terms of the parent object's data. For example,
/// the `lines` field in the
/// [`match`](#message-match) or [`context`](#message-context)
/// messages.
///
/// #### Object: **stats**
///
/// This object is included in messages and contains summary statistics about
/// a search. It has these fields:
///
/// * **elapsed** - A [`duration` object](#object-duration) describing the
/// length of time that elapsed while performing the search.
/// * **searches** - The number of searches that have run. For this printer,
/// this value is always `1`. (Implementations may emit additional message
/// types that use this same `stats` object that represents summary
/// statistics over multiple searches.)
/// * **searches_with_match** - The number of searches that have run that have
/// found at least one match. This is never more than `searches`.
/// * **bytes_searched** - The total number of bytes that have been searched.
/// * **bytes_printed** - The total number of bytes that have been printed.
/// This includes everything emitted by this printer.
/// * **matched_lines** - The total number of lines that participated in a
/// match. When matches may contain multiple lines, then this includes every
/// line that is part of every match.
/// * **matches** - The total number of matches. There may be multiple matches
/// per line. When matches may contain multiple lines, each match is counted
/// only once, regardless of how many lines it spans.
///
/// #### Object: **duration**
///
/// This object includes a few fields for describing a duration. Two of its
/// fields, `secs` and `nanos`, can be combined to give nanosecond precision
/// on systems that support it. It has these fields:
///
/// * **secs** - A whole number of seconds indicating the length of this
/// duration.
/// * **nanos** - A fractional part of this duration represent by nanoseconds.
/// If nanosecond precision isn't supported, then this is typically rounded
/// up to the nearest number of nanoseconds.
/// * **human** - A human readable string describing the length of the
/// duration. The format of the string is itself unspecified.
///
/// #### Object: **arbitrary data**
///
/// This object is used whenever arbitrary data needs to be represented as a
/// JSON value. This object contains two fields, where generally only one of
/// the fields is present:
///
/// * **text** - A normal JSON string that is UTF-8 encoded. This field is
/// populated if and only if the underlying data is valid UTF-8.
/// * **bytes** - A normal JSON string that is a base64 encoding of the
/// underlying bytes.
///
/// More information on the motivation for this representation can be seen in
/// the section [text encoding](#text-encoding) above.
///
/// ## Example
///
/// This section shows a small example that includes all message types.
///
/// Here's the file we want to search, located at `/home/andrew/sherlock`:
///
/// ```text
/// For the Doctor Watsons of this world, as opposed to the Sherlock
/// Holmeses, success in the province of detective work must always
/// be, to a very large extent, the result of luck. Sherlock Holmes
/// can extract a clew from a wisp of straw or a flake of cigar ash;
/// but Doctor Watson has to have it taken out for him and dusted,
/// and exhibited clearly, with a label attached.
/// ```
///
/// Searching for `Watson` with a `before_context` of `1` with line numbers
/// enabled shows something like this using the standard printer:
///
/// ```text
/// sherlock:1:For the Doctor Watsons of this world, as opposed to the Sherlock
/// --
/// sherlock-4-can extract a clew from a wisp of straw or a flake of cigar ash;
/// sherlock:5:but Doctor Watson has to have it taken out for him and dusted,
/// ```
///
/// Here's what the same search looks like using the JSON wire format described
/// above, where we show semi-prettified JSON (instead of a strict JSON
/// Lines format), for illustrative purposes:
///
/// ```json
/// {
/// "type": "begin",
/// "data": {
/// "path": {"text": "/home/andrew/sherlock"}}
/// }
/// }
/// {
/// "type": "match",
/// "data": {
/// "path": {"text": "/home/andrew/sherlock"},
/// "lines": {"text": "For the Doctor Watsons of this world, as opposed to the Sherlock\n"},
/// "line_number": 1,
/// "absolute_offset": 0,
/// "submatches": [
/// {"match": {"text": "Watson"}, "start": 15, "end": 21}
/// ]
/// }
/// }
/// {
/// "type": "context",
/// "data": {
/// "path": {"text": "/home/andrew/sherlock"},
/// "lines": {"text": "can extract a clew from a wisp of straw or a flake of cigar ash;\n"},
/// "line_number": 4,
/// "absolute_offset": 193,
/// "submatches": []
/// }
/// }
/// {
/// "type": "match",
/// "data": {
/// "path": {"text": "/home/andrew/sherlock"},
/// "lines": {"text": "but Doctor Watson has to have it taken out for him and dusted,\n"},
/// "line_number": 5,
/// "absolute_offset": 258,
/// "submatches": [
/// {"match": {"text": "Watson"}, "start": 11, "end": 17}
/// ]
/// }
/// }
/// {
/// "type": "end",
/// "data": {
/// "path": {"text": "/home/andrew/sherlock"},
/// "binary_offset": null,
/// "stats": {
/// "elapsed": {"secs": 0, "nanos": 36296, "human": "0.0000s"},
/// "searches": 1,
/// "searches_with_match": 1,
/// "bytes_searched": 367,
/// "bytes_printed": 1151,
/// "matched_lines": 2,
/// "matches": 2
/// }
/// }
/// }
/// ```
#[derive(Debug)]
pub struct JSON<W> {
config: Config,
wtr: CounterWriter<W>,
matches: Vec<Match>,
}
impl<W: io::Write> JSON<W> {
/// Return a JSON lines printer with a default configuration that writes
/// matches to the given writer.
pub fn new(wtr: W) -> JSON<W> {
JSONBuilder::new().build(wtr)
}
/// Return an implementation of `Sink` for the JSON printer.
///
/// This does not associate the printer with a file path, which means this
/// implementation will never print a file path along with the matches.
pub fn sink<'s, M: Matcher>(
&'s mut self,
matcher: M,
) -> JSONSink<'static, 's, M, W> {
JSONSink {
matcher: matcher,
json: self,
path: None,
start_time: Instant::now(),
match_count: 0,
after_context_remaining: 0,
binary_byte_offset: None,
begin_printed: false,
stats: Stats::new(),
}
}
/// Return an implementation of `Sink` associated with a file path.
///
/// When the printer is associated with a path, then it may, depending on
/// its configuration, print the path along with the matches found.
pub fn sink_with_path<'p, 's, M, P>(
&'s mut self,
matcher: M,
path: &'p P,
) -> JSONSink<'p, 's, M, W>
where M: Matcher,
P: ?Sized + AsRef<Path>,
{
JSONSink {
matcher: matcher,
json: self,
path: Some(path.as_ref()),
start_time: Instant::now(),
match_count: 0,
after_context_remaining: 0,
binary_byte_offset: None,
begin_printed: false,
stats: Stats::new(),
}
}
/// Write the given message followed by a new line. The printer always
/// uses `\n` as its line terminator.
fn write_message(&mut self, message: &jsont::Message) -> io::Result<()> {
if self.config.pretty {
json::to_writer_pretty(&mut self.wtr, message)?;
} else {
json::to_writer(&mut self.wtr, message)?;
}
self.wtr.write(&[b'\n'])?;
Ok(())
}
}
impl<W> JSON<W> {
/// Returns true if and only if this printer has written at least one byte
/// to the underlying writer during any of the previous searches.
pub fn has_written(&self) -> bool {
self.wtr.total_count() > 0
}
/// Return a mutable reference to the underlying writer.
pub fn get_mut(&mut self) -> &mut W {
self.wtr.get_mut()
}
/// Consume this printer and return back ownership of the underlying
/// writer.
pub fn into_inner(self) -> W {
self.wtr.into_inner()
}
}
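// Editor's sketch (not part of the original diff): end-to-end use of the
// JSON printer, mirroring the tests at the bottom of this file. It assumes
// the `grep-regex` and `grep-searcher` crates, as those tests do.
#[cfg(test)]
mod usage_example {
    use grep_regex::RegexMatcher;
    use grep_searcher::SearcherBuilder;

    use super::JSON;

    #[test]
    fn emits_begin_match_end() {
        let matcher = RegexMatcher::new(r"Watson").unwrap();
        // Collect the JSON lines output in an in-memory buffer.
        let mut printer = JSON::new(vec![]);
        SearcherBuilder::new()
            .build()
            .search_reader(
                &matcher,
                &b"but Doctor Watson has to go\n"[..],
                printer.sink(&matcher),
            )
            .unwrap();
        let got = String::from_utf8(printer.into_inner()).unwrap();
        // One `begin` message, one `match` message and one `end` message.
        assert_eq!(got.lines().count(), 3);
        assert!(got.contains(r#""type":"match""#));
    }
}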
/// An implementation of `Sink` associated with a matcher and an optional file
/// path for the JSON printer.
///
/// This type is generic over a few type parameters:
///
/// * `'p` refers to the lifetime of the file path, if one is provided. When
/// no file path is given, then this is `'static`.
/// * `'s` refers to the lifetime of the
/// [`JSON`](struct.JSON.html)
/// printer that this type borrows.
/// * `M` refers to the type of matcher used by
/// `grep_searcher::Searcher` that is reporting results to this sink.
/// * `W` refers to the underlying writer that this printer is writing its
/// output to.
#[derive(Debug)]
pub struct JSONSink<'p, 's, M: Matcher, W: 's> {
matcher: M,
json: &'s mut JSON<W>,
path: Option<&'p Path>,
start_time: Instant,
match_count: u64,
after_context_remaining: u64,
binary_byte_offset: Option<u64>,
begin_printed: bool,
stats: Stats,
}
impl<'p, 's, M: Matcher, W: io::Write> JSONSink<'p, 's, M, W> {
/// Returns true if and only if this printer received a match in the
/// previous search.
///
/// This is unaffected by the result of searches before the previous
/// search.
pub fn has_match(&self) -> bool {
self.match_count > 0
}
/// Return the total number of matches reported to this sink.
///
/// This corresponds to the number of times `Sink::matched` is called.
pub fn match_count(&self) -> u64 {
self.match_count
}
/// If binary data was found in the previous search, this returns the
/// offset at which the binary data was first detected.
///
/// The offset returned is an absolute offset relative to the entire
/// set of bytes searched.
///
/// This is unaffected by the result of searches before the previous
/// search. e.g., If the search prior to the previous search found binary
/// data but the previous search found no binary data, then this will
/// return `None`.
pub fn binary_byte_offset(&self) -> Option<u64> {
self.binary_byte_offset
}
/// Return a reference to the stats produced by the printer for all
/// searches executed on this sink.
pub fn stats(&self) -> &Stats {
&self.stats
}
/// Execute the matcher over the given bytes and record the match
/// locations if the current configuration demands match granularity.
fn record_matches(&mut self, bytes: &[u8]) -> io::Result<()> {
self.json.matches.clear();
// If printing requires knowing the location of each individual match,
// then compute and store those right now for use later. While this
// adds an extra copy for storing the matches, we do amortize the
// allocation for it and this greatly simplifies the printing logic to
// the extent that it's easy to ensure that we never do more than
// one search to find the matches.
let matches = &mut self.json.matches;
self.matcher.find_iter(bytes, |m| {
matches.push(m);
true
}).map_err(io::Error::error_message)?;
Ok(())
}
/// Returns true if this printer should quit.
///
/// This implements the logic for handling quitting after seeing a certain
/// amount of matches. In most cases, the logic is simple, but we must
/// permit all "after" contextual lines to print after reaching the limit.
fn should_quit(&self) -> bool {
let limit = match self.json.config.max_matches {
None => return false,
Some(limit) => limit,
};
if self.match_count < limit {
return false;
}
self.after_context_remaining == 0
}
/// Write the "begin" message.
fn write_begin_message(&mut self) -> io::Result<()> {
if self.begin_printed {
return Ok(());
}
let msg = jsont::Message::Begin(jsont::Begin {
path: self.path,
});
self.json.write_message(&msg)?;
self.begin_printed = true;
Ok(())
}
}
impl<'p, 's, M: Matcher, W: io::Write> Sink for JSONSink<'p, 's, M, W> {
type Error = io::Error;
fn matched(
&mut self,
searcher: &Searcher,
mat: &SinkMatch,
) -> Result<bool, io::Error> {
self.write_begin_message()?;
self.match_count += 1;
self.after_context_remaining = searcher.after_context() as u64;
self.record_matches(mat.bytes())?;
self.stats.add_matches(self.json.matches.len() as u64);
self.stats.add_matched_lines(mat.lines().count() as u64);
let submatches = SubMatches::new(mat.bytes(), &self.json.matches);
let msg = jsont::Message::Match(jsont::Match {
path: self.path,
lines: mat.bytes(),
line_number: mat.line_number(),
absolute_offset: mat.absolute_byte_offset(),
submatches: submatches.as_slice(),
});
self.json.write_message(&msg)?;
Ok(!self.should_quit())
}
fn context(
&mut self,
searcher: &Searcher,
ctx: &SinkContext,
) -> Result<bool, io::Error> {
self.write_begin_message()?;
self.json.matches.clear();
if ctx.kind() == &SinkContextKind::After {
self.after_context_remaining =
self.after_context_remaining.saturating_sub(1);
}
let submatches =
if searcher.invert_match() {
self.record_matches(ctx.bytes())?;
SubMatches::new(ctx.bytes(), &self.json.matches)
} else {
SubMatches::empty()
};
let msg = jsont::Message::Context(jsont::Context {
path: self.path,
lines: ctx.bytes(),
line_number: ctx.line_number(),
absolute_offset: ctx.absolute_byte_offset(),
submatches: submatches.as_slice(),
});
self.json.write_message(&msg)?;
Ok(!self.should_quit())
}
fn begin(
&mut self,
_searcher: &Searcher,
) -> Result<bool, io::Error> {
self.json.wtr.reset_count();
self.start_time = Instant::now();
self.match_count = 0;
self.after_context_remaining = 0;
self.binary_byte_offset = None;
if self.json.config.max_matches == Some(0) {
return Ok(false);
}
if !self.json.config.always_begin_end {
return Ok(true);
}
self.write_begin_message()?;
Ok(true)
}
fn finish(
&mut self,
_searcher: &Searcher,
finish: &SinkFinish,
) -> Result<(), io::Error> {
if !self.begin_printed {
return Ok(());
}
self.binary_byte_offset = finish.binary_byte_offset();
self.stats.add_elapsed(self.start_time.elapsed());
self.stats.add_searches(1);
if self.match_count > 0 {
self.stats.add_searches_with_match(1);
}
self.stats.add_bytes_searched(finish.byte_count());
self.stats.add_bytes_printed(self.json.wtr.count());
let msg = jsont::Message::End(jsont::End {
path: self.path,
binary_offset: finish.binary_byte_offset(),
stats: self.stats.clone(),
});
self.json.write_message(&msg)?;
Ok(())
}
}
/// SubMatches represents a set of matches in a contiguous range of bytes.
///
/// A simpler representation for this would simply be `Vec<SubMatch>`,
/// but the common case is exactly one match per range of bytes, which we
/// specialize here using a fixed size array without any allocation.
enum SubMatches<'a> {
Empty,
Small([jsont::SubMatch<'a>; 1]),
Big(Vec<jsont::SubMatch<'a>>),
}
impl<'a> SubMatches<'a> {
/// Create a new set of match ranges from a set of matches and the
/// corresponding bytes that those matches apply to.
fn new(bytes: &'a[u8], matches: &[Match]) -> SubMatches<'a> {
if matches.len() == 1 {
let mat = matches[0];
SubMatches::Small([jsont::SubMatch {
m: &bytes[mat],
start: mat.start(),
end: mat.end(),
}])
} else {
let mut match_ranges = vec![];
for &mat in matches {
match_ranges.push(jsont::SubMatch {
m: &bytes[mat],
start: mat.start(),
end: mat.end(),
});
}
SubMatches::Big(match_ranges)
}
}
/// Create an empty set of match ranges.
fn empty() -> SubMatches<'static> {
SubMatches::Empty
}
/// Return this set of match ranges as a slice.
fn as_slice(&self) -> &[jsont::SubMatch] {
match *self {
SubMatches::Empty => &[],
SubMatches::Small(ref x) => x,
SubMatches::Big(ref x) => x,
}
}
}
#[cfg(test)]
mod tests {
use grep_regex::RegexMatcher;
use grep_searcher::SearcherBuilder;
use super::{JSON, JSONBuilder};
const SHERLOCK: &'static [u8] = b"\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.
";
fn printer_contents(
printer: &mut JSON<Vec<u8>>,
) -> String {
String::from_utf8(printer.get_mut().to_owned()).unwrap()
}
#[test]
fn binary_detection() {
use grep_searcher::BinaryDetection;
const BINARY: &'static [u8] = b"\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew \x00 from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.\
";
let matcher = RegexMatcher::new(
r"Watson"
).unwrap();
let mut printer = JSONBuilder::new()
.build(vec![]);
SearcherBuilder::new()
.binary_detection(BinaryDetection::quit(b'\x00'))
.heap_limit(Some(80))
.build()
.search_reader(&matcher, BINARY, printer.sink(&matcher))
.unwrap();
let got = printer_contents(&mut printer);
assert_eq!(got.lines().count(), 3);
let last = got.lines().last().unwrap();
assert!(last.contains(r#""binary_offset":212,"#));
}
#[test]
fn max_matches() {
let matcher = RegexMatcher::new(
r"Watson"
).unwrap();
let mut printer = JSONBuilder::new()
.max_matches(Some(1))
.build(vec![]);
SearcherBuilder::new()
.build()
.search_reader(&matcher, SHERLOCK, printer.sink(&matcher))
.unwrap();
let got = printer_contents(&mut printer);
assert_eq!(got.lines().count(), 3);
}
#[test]
fn no_match() {
let matcher = RegexMatcher::new(
r"DOES NOT MATCH"
).unwrap();
let mut printer = JSONBuilder::new()
.build(vec![]);
SearcherBuilder::new()
.build()
.search_reader(&matcher, SHERLOCK, printer.sink(&matcher))
.unwrap();
let got = printer_contents(&mut printer);
assert!(got.is_empty());
}
#[test]
fn always_begin_end_no_match() {
let matcher = RegexMatcher::new(
r"DOES NOT MATCH"
).unwrap();
let mut printer = JSONBuilder::new()
.always_begin_end(true)
.build(vec![]);
SearcherBuilder::new()
.build()
.search_reader(&matcher, SHERLOCK, printer.sink(&matcher))
.unwrap();
let got = printer_contents(&mut printer);
assert_eq!(got.lines().count(), 2);
assert!(got.contains("begin") && got.contains("end"));
}
}
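For a concrete picture of the message stream this printer emits, here is a minimal usage sketch (not part of the diff) assembled from the same APIs exercised by the tests above:
```
extern crate grep_printer;
extern crate grep_regex;
extern crate grep_searcher;

use grep_printer::JSONBuilder;
use grep_regex::RegexMatcher;
use grep_searcher::SearcherBuilder;

fn main() {
    let haystack: &[u8] = b"\
For the Doctor Watsons of this world, as opposed to the Sherlock
but Doctor Watson has to have it taken out for him and dusted,
";
    let matcher = RegexMatcher::new(r"Watson").unwrap();
    let mut printer = JSONBuilder::new().build(vec![]);
    SearcherBuilder::new()
        .build()
        .search_slice(&matcher, haystack, printer.sink(&matcher))
        .unwrap();
    let got = String::from_utf8(printer.get_mut().to_owned()).unwrap();
    // One "begin" message, one "match" message per matching line and one
    // "end" message, each on its own line (JSON Lines).
    assert_eq!(4, got.lines().count());
}
```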

213
grep-printer/src/jsont.rs Normal file
View File

@@ -0,0 +1,213 @@
// This module defines the types we use for JSON serialization. We specifically
// omit deserialization, partially because there isn't a clear use case for
// it at this time, but also because deserialization will complicate things.
// Namely, the types below are designed in a way that permits JSON
// serialization with little or no allocation. Allocation is often quite
// convenient for deserialization however, so these types would become a bit
// more complex.
use std::borrow::Cow;
use std::path::Path;
use std::str;
use base64;
use serde::{Serialize, Serializer};
use stats::Stats;
#[derive(Serialize)]
#[serde(tag = "type", content = "data")]
#[serde(rename_all = "snake_case")]
pub enum Message<'a> {
Begin(Begin<'a>),
End(End<'a>),
Match(Match<'a>),
Context(Context<'a>),
}
#[derive(Serialize)]
pub struct Begin<'a> {
#[serde(serialize_with = "ser_path")]
pub path: Option<&'a Path>,
}
#[derive(Serialize)]
pub struct End<'a> {
#[serde(serialize_with = "ser_path")]
pub path: Option<&'a Path>,
pub binary_offset: Option<u64>,
pub stats: Stats,
}
#[derive(Serialize)]
pub struct Match<'a> {
#[serde(serialize_with = "ser_path")]
pub path: Option<&'a Path>,
#[serde(serialize_with = "ser_bytes")]
pub lines: &'a [u8],
pub line_number: Option<u64>,
pub absolute_offset: u64,
pub submatches: &'a [SubMatch<'a>],
}
#[derive(Serialize)]
pub struct Context<'a> {
#[serde(serialize_with = "ser_path")]
pub path: Option<&'a Path>,
#[serde(serialize_with = "ser_bytes")]
pub lines: &'a [u8],
pub line_number: Option<u64>,
pub absolute_offset: u64,
pub submatches: &'a [SubMatch<'a>],
}
#[derive(Serialize)]
pub struct SubMatch<'a> {
#[serde(rename = "match")]
#[serde(serialize_with = "ser_bytes")]
pub m: &'a [u8],
pub start: usize,
pub end: usize,
}
/// Data represents things that look like strings, but may actually not be
/// valid UTF-8. To handle this, `Data` is serialized as an object with one
/// of two keys: `text` (for valid UTF-8) or `bytes` (for invalid UTF-8).
///
/// The happy path is valid UTF-8, which streams right through as-is, since
/// it is natively supported by JSON. When invalid UTF-8 is found, then it is
/// represented as arbitrary bytes and base64 encoded.
#[derive(Clone, Debug, Hash, PartialEq, Eq, Serialize)]
#[serde(untagged)]
enum Data<'a> {
Text { text: Cow<'a, str> },
Bytes {
#[serde(serialize_with = "to_base64")]
bytes: &'a [u8],
},
}
impl<'a> Data<'a> {
fn from_bytes(bytes: &[u8]) -> Data {
match str::from_utf8(bytes) {
Ok(text) => Data::Text { text: Cow::Borrowed(text) },
Err(_) => Data::Bytes { bytes },
}
}
#[cfg(unix)]
fn from_path(path: &Path) -> Data {
use std::os::unix::ffi::OsStrExt;
match path.to_str() {
Some(text) => Data::Text { text: Cow::Borrowed(text) },
None => Data::Bytes { bytes: path.as_os_str().as_bytes() },
}
}
#[cfg(not(unix))]
fn from_path(path: &Path) -> Data {
// Using lossy conversion means some paths won't round trip precisely,
// but it's not clear what we should actually do. Serde rejects
// non-UTF-8 paths, and OsStrs are serialized as a sequence of UTF-16
// code units on Windows. Neither seems appropriate for this use case,
// so we do the easy thing for now.
Data::Text { text: path.to_string_lossy() }
}
// Unused deserialization routines.
/*
fn into_bytes(self) -> Vec<u8> {
match self {
Data::Text { text } => text.into_bytes(),
Data::Bytes { bytes } => bytes,
}
}
#[cfg(unix)]
fn into_path_buf(&self) -> PathBuf {
use std::os::unix::ffi::OsStrExt;
match self {
Data::Text { text } => PathBuf::from(text),
Data::Bytes { bytes } => {
PathBuf::from(OsStr::from_bytes(bytes))
}
}
}
#[cfg(not(unix))]
fn into_path_buf(&self) -> PathBuf {
match self {
Data::Text { text } => PathBuf::from(text),
Data::Bytes { bytes } => {
PathBuf::from(String::from_utf8_lossy(&bytes).into_owned())
}
}
}
*/
}
fn to_base64<T, S>(
bytes: T,
ser: S,
) -> Result<S::Ok, S::Error>
where T: AsRef<[u8]>,
S: Serializer
{
ser.serialize_str(&base64::encode(&bytes))
}
fn ser_bytes<T, S>(
bytes: T,
ser: S,
) -> Result<S::Ok, S::Error>
where T: AsRef<[u8]>,
S: Serializer
{
Data::from_bytes(bytes.as_ref()).serialize(ser)
}
fn ser_path<P, S>(
path: &Option<P>,
ser: S,
) -> Result<S::Ok, S::Error>
where P: AsRef<Path>,
S: Serializer
{
path.as_ref().map(|p| Data::from_path(p.as_ref())).serialize(ser)
}
// The following are some deserialization helpers, in case we decide to support
// deserialization of the above types.
/*
fn from_base64<'de, D>(
de: D,
) -> Result<Vec<u8>, D::Error>
where D: Deserializer<'de>
{
let encoded = String::deserialize(de)?;
let decoded = base64::decode(encoded.as_bytes())
.map_err(D::Error::custom)?;
Ok(decoded)
}
fn deser_bytes<'de, D>(
de: D,
) -> Result<Vec<u8>, D::Error>
where D: Deserializer<'de>
{
Data::deserialize(de).map(|datum| datum.into_bytes())
}
fn deser_path<'de, D>(
de: D,
) -> Result<Option<PathBuf>, D::Error>
where D: Deserializer<'de>
{
Option::<Data>::deserialize(de)
.map(|opt| opt.map(|datum| datum.into_path_buf()))
}
*/
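To make the encoding above concrete, the following standalone sketch (an illustration, not part of the diff) reproduces the untagged text/bytes scheme with the same serde and base64 crates and prints both JSON forms:
```
extern crate base64;
extern crate serde;
#[macro_use]
extern crate serde_derive;
extern crate serde_json;

use std::borrow::Cow;
use std::str;

use serde::Serializer;

#[derive(Serialize)]
#[serde(untagged)]
enum Data<'a> {
    Text { text: Cow<'a, str> },
    Bytes {
        #[serde(serialize_with = "to_base64")]
        bytes: &'a [u8],
    },
}

fn to_base64<T: AsRef<[u8]>, S: Serializer>(
    bytes: T,
    ser: S,
) -> Result<S::Ok, S::Error> {
    ser.serialize_str(&base64::encode(&bytes))
}

fn from_bytes(bytes: &[u8]) -> Data {
    match str::from_utf8(bytes) {
        Ok(text) => Data::Text { text: Cow::Borrowed(text) },
        Err(_) => Data::Bytes { bytes },
    }
}

fn main() {
    // Valid UTF-8 streams through as a JSON string: {"text":"abc"}
    println!("{}", serde_json::to_string(&from_bytes(b"abc")).unwrap());
    // Invalid UTF-8 is base64 encoded instead: {"bytes":"/w=="}
    println!("{}", serde_json::to_string(&from_bytes(b"\xFF")).unwrap());
}
```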

108
grep-printer/src/lib.rs Normal file
View File

@@ -0,0 +1,108 @@
/*!
This crate provides a featureful and fast printer for showing search results
in a human readable way, and another printer for showing results in a machine
readable way.
# Brief overview
The [`Standard`](struct.Standard.html) printer shows results in a human
readable format, and is modeled after the formats used by standard grep-like
tools. Features include, but are not limited to, cross platform terminal
coloring, search & replace, multi-line result handling and reporting summary
statistics.
The [`JSON`](struct.JSON.html) printer shows results in a machine readable
format. To facilitate a stream of search results, the format uses
[JSON Lines](http://jsonlines.org/)
by emitting a series of messages as search results are found.
The [`Summary`](struct.Summary.html) printer shows *aggregate* results for a
single search in a human readable format, and is modeled after similar formats
found in standard grep-like tools. This printer is useful for showing the total
number of matches and/or printing file paths that either contain or don't
contain matches.
# Example
This example shows how to create a "standard" printer and execute a search.
```
extern crate grep_regex;
extern crate grep_printer;
extern crate grep_searcher;
use std::error::Error;
use grep_regex::RegexMatcher;
use grep_printer::Standard;
use grep_searcher::Searcher;
const SHERLOCK: &'static [u8] = b"\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.
";
# fn main() { example().unwrap(); }
fn example() -> Result<(), Box<Error>> {
let matcher = RegexMatcher::new(r"Sherlock")?;
let mut printer = Standard::new_no_color(vec![]);
Searcher::new().search_slice(&matcher, SHERLOCK, printer.sink(&matcher))?;
// into_inner gives us back the underlying writer we provided to
// new_no_color, which is wrapped in a termcolor::NoColor. Thus, a second
// into_inner gives us back the actual buffer.
let output = String::from_utf8(printer.into_inner().into_inner())?;
let expected = "\
1:For the Doctor Watsons of this world, as opposed to the Sherlock
3:be, to a very large extent, the result of luck. Sherlock Holmes
";
assert_eq!(output, expected);
Ok(())
}
```
*/
#![deny(missing_docs)]
#[cfg(feature = "serde1")]
extern crate base64;
extern crate grep_matcher;
#[cfg(test)]
extern crate grep_regex;
extern crate grep_searcher;
#[macro_use]
extern crate log;
#[cfg(feature = "serde1")]
extern crate serde;
#[cfg(feature = "serde1")]
#[macro_use]
extern crate serde_derive;
#[cfg(feature = "serde1")]
extern crate serde_json;
extern crate termcolor;
pub use color::{ColorError, ColorSpecs, UserColorSpec};
#[cfg(feature = "serde1")]
pub use json::{JSON, JSONBuilder, JSONSink};
pub use standard::{Standard, StandardBuilder, StandardSink};
pub use stats::Stats;
pub use summary::{Summary, SummaryBuilder, SummaryKind, SummarySink};
pub use util::PrinterPath;
#[macro_use]
mod macros;
mod color;
mod counter;
#[cfg(feature = "serde1")]
mod json;
#[cfg(feature = "serde1")]
mod jsont;
mod standard;
mod stats;
mod summary;
mod util;
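The overview above mentions the `Summary` printer. Here is a hedged sketch of counting matching lines with it; the `SummaryKind::Count` variant and the `build_no_color` constructor are assumed to mirror the `Standard` printer's API from the example:
```
extern crate grep_printer;
extern crate grep_regex;
extern crate grep_searcher;

use grep_printer::{SummaryBuilder, SummaryKind};
use grep_regex::RegexMatcher;
use grep_searcher::Searcher;

fn main() {
    let matcher = RegexMatcher::new(r"Sherlock").unwrap();
    // Print only the number of matching lines for the search.
    let mut printer = SummaryBuilder::new()
        .kind(SummaryKind::Count)
        .build_no_color(vec![]);
    Searcher::new()
        .search_slice(
            &matcher,
            b"Sherlock Holmes\nJohn Watson\n",
            printer.sink(&matcher),
        )
        .unwrap();
    let output = String::from_utf8(printer.into_inner().into_inner()).unwrap();
    // Expected to print the matching line count, i.e., "1".
    println!("{}", output);
}
```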

23
grep-printer/src/macros.rs Normal file
View File

@@ -0,0 +1,23 @@
#[cfg(test)]
#[macro_export]
macro_rules! assert_eq_printed {
($expected:expr, $got:expr) => {
let expected = &*$expected;
let got = &*$got;
if expected != got {
panic!("
printed outputs differ!
expected:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
{}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
got:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
{}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
", expected, got);
}
}
}

2993
grep-printer/src/standard.rs Normal file

File diff suppressed because it is too large

147
grep-printer/src/stats.rs Normal file
View File

@@ -0,0 +1,147 @@
use std::ops::{Add, AddAssign};
use std::time::Duration;
use util::NiceDuration;
/// Summary statistics produced at the end of a search.
///
/// When statistics are reported by a printer, they correspond to all searches
/// executed with that printer.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
#[cfg_attr(feature = "serde1", derive(Serialize))]
pub struct Stats {
elapsed: NiceDuration,
searches: u64,
searches_with_match: u64,
bytes_searched: u64,
bytes_printed: u64,
matched_lines: u64,
matches: u64,
}
impl Add for Stats {
type Output = Stats;
fn add(self, rhs: Stats) -> Stats {
self + &rhs
}
}
impl<'a> Add<&'a Stats> for Stats {
type Output = Stats;
fn add(self, rhs: &'a Stats) -> Stats {
Stats {
elapsed: NiceDuration(self.elapsed.0 + rhs.elapsed.0),
searches: self.searches + rhs.searches,
searches_with_match:
self.searches_with_match + rhs.searches_with_match,
bytes_searched: self.bytes_searched + rhs.bytes_searched,
bytes_printed: self.bytes_printed + rhs.bytes_printed,
matched_lines: self.matched_lines + rhs.matched_lines,
matches: self.matches + rhs.matches,
}
}
}
impl AddAssign for Stats {
fn add_assign(&mut self, rhs: Stats) {
*self += &rhs;
}
}
impl<'a> AddAssign<&'a Stats> for Stats {
fn add_assign(&mut self, rhs: &'a Stats) {
self.elapsed.0 += rhs.elapsed.0;
self.searches += rhs.searches;
self.searches_with_match += rhs.searches_with_match;
self.bytes_searched += rhs.bytes_searched;
self.bytes_printed += rhs.bytes_printed;
self.matched_lines += rhs.matched_lines;
self.matches += rhs.matches;
}
}
impl Stats {
/// Return a new value for tracking aggregate statistics across searches.
///
/// All statistics are set to `0`.
pub fn new() -> Stats {
Stats::default()
}
/// Return the total amount of time elapsed.
pub fn elapsed(&self) -> Duration {
self.elapsed.0
}
/// Return the total number of searches executed.
pub fn searches(&self) -> u64 {
self.searches
}
/// Return the total number of searches that found at least one match.
pub fn searches_with_match(&self) -> u64 {
self.searches_with_match
}
/// Return the total number of bytes searched.
pub fn bytes_searched(&self) -> u64 {
self.bytes_searched
}
/// Return the total number of bytes printed.
pub fn bytes_printed(&self) -> u64 {
self.bytes_printed
}
/// Return the total number of lines that participated in a match.
///
/// When a match spans multiple lines, this includes every line that is
/// part of that match.
pub fn matched_lines(&self) -> u64 {
self.matched_lines
}
/// Return the total number of matches.
///
/// There may be multiple matches per line.
pub fn matches(&self) -> u64 {
self.matches
}
/// Add to the elapsed time.
pub fn add_elapsed(&mut self, duration: Duration) {
self.elapsed.0 += duration;
}
/// Add to the number of searches executed.
pub fn add_searches(&mut self, n: u64) {
self.searches += n;
}
/// Add to the number of searches that found at least one match.
pub fn add_searches_with_match(&mut self, n: u64) {
self.searches_with_match += n;
}
/// Add to the total number of bytes searched.
pub fn add_bytes_searched(&mut self, n: u64) {
self.bytes_searched += n;
}
/// Add to the total number of bytes printed.
pub fn add_bytes_printed(&mut self, n: u64) {
self.bytes_printed += n;
}
/// Add to the total number of lines that participated in a match.
pub fn add_matched_lines(&mut self, n: u64) {
self.matched_lines += n;
}
/// Add to the total number of matches.
pub fn add_matches(&mut self, n: u64) {
self.matches += n;
}
}
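Because `Stats` and its `add_*` methods are public, per-search or per-worker statistics can be merged using the `Add`/`AddAssign` impls above. A minimal sketch:
```
extern crate grep_printer;

use std::time::Duration;
use grep_printer::Stats;

fn main() {
    let mut total = Stats::new();
    // Pretend this came from one worker's search.
    let mut worker = Stats::new();
    worker.add_searches(1);
    worker.add_searches_with_match(1);
    worker.add_matched_lines(2);
    worker.add_matches(3);
    worker.add_elapsed(Duration::from_millis(5));
    // `AddAssign<&Stats>` merges without consuming the right-hand side.
    total += &worker;
    assert_eq!(1, total.searches());
    assert_eq!(3, total.matches());
}
```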

1066
grep-printer/src/summary.rs Normal file

File diff suppressed because it is too large

366
grep-printer/src/util.rs Normal file
View File

@@ -0,0 +1,366 @@
use std::borrow::Cow;
use std::fmt;
use std::io;
use std::path::Path;
use std::time;
use grep_matcher::{Captures, Match, Matcher};
use grep_searcher::{
LineIter,
SinkError, SinkContext, SinkContextKind, SinkMatch,
};
#[cfg(feature = "serde1")]
use serde::{Serialize, Serializer};
/// A type for handling replacements while amortizing allocation.
pub struct Replacer<M: Matcher> {
space: Option<Space<M>>,
}
struct Space<M: Matcher> {
/// The place to store capture locations.
caps: M::Captures,
/// The place to write a replacement to.
dst: Vec<u8>,
/// The place to store match offsets in terms of `dst`.
matches: Vec<Match>,
}
impl<M: Matcher> fmt::Debug for Replacer<M> {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
let (dst, matches) = self.replacement().unwrap_or((&[], &[]));
f.debug_struct("Replacer")
.field("dst", &dst)
.field("matches", &matches)
.finish()
}
}
impl<M: Matcher> Replacer<M> {
/// Create a new replacer for use with a particular matcher.
///
/// This constructor does not allocate. Instead, space for dealing with
/// replacements is allocated lazily only when needed.
pub fn new() -> Replacer<M> {
Replacer { space: None }
}
/// Executes a replacement on the given subject string by replacing all
/// matches with the given replacement. To access the result of the
/// replacement, use the `replacement` method.
///
/// This can fail if the underlying matcher reports an error.
pub fn replace_all<'a>(
&'a mut self,
matcher: &M,
subject: &[u8],
replacement: &[u8],
) -> io::Result<()> {
{
let &mut Space {
ref mut dst,
ref mut caps,
ref mut matches,
} = self.allocate(matcher)?;
dst.clear();
matches.clear();
matcher.replace_with_captures(
subject,
caps,
dst,
|caps, dst| {
let start = dst.len();
caps.interpolate(
|name| matcher.capture_index(name),
subject,
replacement,
dst,
);
let end = dst.len();
matches.push(Match::new(start, end));
true
},
).map_err(io::Error::error_message)?;
}
Ok(())
}
/// Return the result of the prior replacement and the match offsets for
/// all replacement occurrences within the returned replacement buffer.
///
/// If no replacement has occurred then `None` is returned.
pub fn replacement<'a>(&'a self) -> Option<(&'a [u8], &'a [Match])> {
match self.space {
None => None,
Some(ref space) => {
if space.matches.is_empty() {
None
} else {
Some((&space.dst, &space.matches))
}
}
}
}
/// Clear space used for performing a replacement.
///
/// Subsequent calls to `replacement` after calling `clear` (but before
/// executing another replacement) will always return `None`.
pub fn clear(&mut self) {
if let Some(ref mut space) = self.space {
space.dst.clear();
space.matches.clear();
}
}
/// Allocate space for replacements when used with the given matcher and
/// return a mutable reference to that space.
///
/// This can fail if allocating space for capture locations from the given
/// matcher fails.
fn allocate(&mut self, matcher: &M) -> io::Result<&mut Space<M>> {
if self.space.is_none() {
let caps = matcher
.new_captures()
.map_err(io::Error::error_message)?;
self.space = Some(Space {
caps: caps,
dst: vec![],
matches: vec![],
});
}
Ok(self.space.as_mut().unwrap())
}
}
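`Replacer` is internal to this crate, but its `replace_all` boils down to the `Matcher::replace_with_captures` call shown above. A self-contained sketch of the same pattern against the public `grep-matcher` API (the haystack and replacement strings are illustrative):
```
extern crate grep_matcher;
extern crate grep_regex;

use grep_matcher::{Captures, Matcher};
use grep_regex::RegexMatcher;

fn main() {
    let matcher = RegexMatcher::new(r"Watson").unwrap();
    let mut caps = matcher.new_captures().unwrap();
    let mut dst = vec![];
    matcher.replace_with_captures(
        b"Doctor Watson",
        &mut caps,
        &mut dst,
        |caps, dst| {
            // Interpolate the replacement for each match; a real Replacer
            // also records where each replacement landed in `dst`.
            caps.interpolate(
                |name| matcher.capture_index(name),
                b"Doctor Watson",
                b"Holmes",
                dst,
            );
            true
        },
    ).unwrap();
    assert_eq!(b"Doctor Holmes".to_vec(), dst);
}
```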
/// A simple layer of abstraction over either a match or a contextual line
/// reported by the searcher.
///
/// In particular, this provides an API that unions the `SinkMatch` and
/// `SinkContext` types while also exposing a list of all individual match
/// locations.
///
/// While this serves as a convenient mechanism to abstract over `SinkMatch`
/// and `SinkContext`, this also provides a way to abstract over replacements.
/// Namely, after a replacement, a `Sunk` value can be constructed using the
/// results of the replacement instead of the bytes reported directly by the
/// searcher.
#[derive(Debug)]
pub struct Sunk<'a> {
bytes: &'a [u8],
absolute_byte_offset: u64,
line_number: Option<u64>,
context_kind: Option<&'a SinkContextKind>,
matches: &'a [Match],
original_matches: &'a [Match],
}
impl<'a> Sunk<'a> {
pub fn empty() -> Sunk<'static> {
Sunk {
bytes: &[],
absolute_byte_offset: 0,
line_number: None,
context_kind: None,
matches: &[],
original_matches: &[],
}
}
pub fn from_sink_match(
sunk: &'a SinkMatch<'a>,
original_matches: &'a [Match],
replacement: Option<(&'a [u8], &'a [Match])>,
) -> Sunk<'a> {
let (bytes, matches) = replacement.unwrap_or_else(|| {
(sunk.bytes(), original_matches)
});
Sunk {
bytes: bytes,
absolute_byte_offset: sunk.absolute_byte_offset(),
line_number: sunk.line_number(),
context_kind: None,
matches: matches,
original_matches: original_matches,
}
}
pub fn from_sink_context(
sunk: &'a SinkContext<'a>,
original_matches: &'a [Match],
replacement: Option<(&'a [u8], &'a [Match])>,
) -> Sunk<'a> {
let (bytes, matches) = replacement.unwrap_or_else(|| {
(sunk.bytes(), original_matches)
});
Sunk {
bytes: bytes,
absolute_byte_offset: sunk.absolute_byte_offset(),
line_number: sunk.line_number(),
context_kind: Some(sunk.kind()),
matches: matches,
original_matches: original_matches,
}
}
pub fn context_kind(&self) -> Option<&'a SinkContextKind> {
self.context_kind
}
pub fn bytes(&self) -> &'a [u8] {
self.bytes
}
pub fn matches(&self) -> &'a [Match] {
self.matches
}
pub fn original_matches(&self) -> &'a [Match] {
self.original_matches
}
pub fn lines(&self, line_term: u8) -> LineIter<'a> {
LineIter::new(line_term, self.bytes())
}
pub fn absolute_byte_offset(&self) -> u64 {
self.absolute_byte_offset
}
pub fn line_number(&self) -> Option<u64> {
self.line_number
}
}
/// A simple encapsulation of a file path used by a printer.
///
/// This represents any transforms that we might want to perform on the path,
/// such as converting it to valid UTF-8 and/or replacing its separator with
/// something else. This allows us to amortize work if we are printing the
/// file path for every match.
///
/// In the common case, no transformation is needed, which lets us avoid the
/// allocation. Typically, only Windows requires a transform, since we can't
/// access the raw bytes of a path directly and first need to lossily convert
/// to UTF-8. Windows is also typically where the path separator replacement
/// is used, e.g., in cygwin environments to use `/` instead of `\`.
///
/// Users of this type are expected to construct it from a normal `Path`
/// found in the standard library. It can then be written to any `io::Write`
/// implementation using the `as_bytes` method. This achieves platform
/// portability with a small cost: on Windows, paths that are not valid UTF-16
/// will not roundtrip correctly.
#[derive(Clone, Debug)]
pub struct PrinterPath<'a>(Cow<'a, [u8]>);
impl<'a> PrinterPath<'a> {
/// Create a new path suitable for printing.
pub fn new(path: &'a Path) -> PrinterPath<'a> {
PrinterPath::new_impl(path)
}
#[cfg(unix)]
fn new_impl(path: &'a Path) -> PrinterPath<'a> {
use std::os::unix::ffi::OsStrExt;
PrinterPath(Cow::Borrowed(path.as_os_str().as_bytes()))
}
#[cfg(not(unix))]
fn new_impl(path: &'a Path) -> PrinterPath<'a> {
PrinterPath(match path.to_string_lossy() {
Cow::Owned(path) => Cow::Owned(path.into_bytes()),
Cow::Borrowed(path) => Cow::Borrowed(path.as_bytes()),
})
}
/// Create a new printer path from the given path which can be efficiently
/// written to a writer without allocation.
///
/// If the given separator is present, then any separators in `path` are
/// replaced with it.
pub fn with_separator(path: &'a Path, sep: Option<u8>) -> PrinterPath<'a> {
let mut ppath = PrinterPath::new(path);
if let Some(sep) = sep {
ppath.replace_separator(sep);
}
ppath
}
/// Replace the path separator in this path with the given separator
/// and do it in place. On Windows, both `/` and `\` are treated as
/// path separators that are both replaced by `new_sep`. In all other
/// environments, only `/` is treated as a path separator.
fn replace_separator(&mut self, new_sep: u8) {
let transformed_path: Vec<_> = self.as_bytes().iter().map(|&b| {
if b == b'/' || (cfg!(windows) && b == b'\\') {
new_sep
} else {
b
}
}).collect();
self.0 = Cow::Owned(transformed_path);
}
/// Return the raw bytes for this path.
pub fn as_bytes(&self) -> &[u8] {
&*self.0
}
}
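Since `PrinterPath` is re-exported at the crate root, the separator replacement described above can be shown directly. A small sketch:
```
extern crate grep_printer;

use std::path::Path;
use grep_printer::PrinterPath;

fn main() {
    // On Unix this borrows the path's bytes; a separator replacement (as
    // used for cygwin-style output) forces an owned, transformed copy.
    let ppath = PrinterPath::with_separator(Path::new("src/lib.rs"), Some(b'/'));
    assert_eq!(ppath.as_bytes(), &b"src/lib.rs"[..]);
}
```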
/// A type that provides "nicer" Display and Serialize impls for
/// std::time::Duration. The serialization format should actually be compatible
/// with the Deserialize impl for std::time::Duration, since this type only
/// adds new fields.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct NiceDuration(pub time::Duration);
impl fmt::Display for NiceDuration {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{:0.4}s", self.fractional_seconds())
}
}
impl NiceDuration {
/// Returns the number of seconds in this duration in fraction form.
/// The number to the left of the decimal point is the number of whole
/// seconds, and the number to the right is the fractional part of a second
/// computed from the nanosecond component.
fn fractional_seconds(&self) -> f64 {
let fractional = (self.0.subsec_nanos() as f64) / 1_000_000_000.0;
self.0.as_secs() as f64 + fractional
}
}
#[cfg(feature = "serde1")]
impl Serialize for NiceDuration {
fn serialize<S: Serializer>(&self, ser: S) -> Result<S::Ok, S::Error> {
use serde::ser::SerializeStruct;
let mut state = ser.serialize_struct("Duration", 3)?;
state.serialize_field("secs", &self.0.as_secs())?;
state.serialize_field("nanos", &self.0.subsec_nanos())?;
state.serialize_field("human", &format!("{}", self))?;
state.end()
}
}
/// Trim prefix ASCII spaces from the given slice and return the corresponding
/// range.
pub fn trim_ascii_prefix_range(slice: &[u8], range: Match) -> Match {
fn is_space(b: &&u8) -> bool {
match **b {
b'\t' | b'\n' | b'\x0B' | b'\x0C' | b'\r' | b' ' => true,
_ => false,
}
}
let count = slice[range].iter().take_while(is_space).count();
range.with_start(range.start() + count)
}
/// Trim prefix ASCII spaces from the given slice and return the corresponding
/// sub-slice.
pub fn trim_ascii_prefix(slice: &[u8]) -> &[u8] {
let range = trim_ascii_prefix_range(slice, Match::new(0, slice.len()));
&slice[range]
}
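These trim helpers are crate-internal, so the following sketch mirrors `trim_ascii_prefix_range` verbatim in a standalone program to show the intended behavior:
```
extern crate grep_matcher;

use grep_matcher::Match;

fn trim_ascii_prefix_range(slice: &[u8], range: Match) -> Match {
    fn is_space(b: &&u8) -> bool {
        match **b {
            b'\t' | b'\n' | b'\x0B' | b'\x0C' | b'\r' | b' ' => true,
            _ => false,
        }
    }
    let count = slice[range].iter().take_while(is_space).count();
    range.with_start(range.start() + count)
}

fn main() {
    let line: &[u8] = b"    indented match";
    let range = trim_ascii_prefix_range(line, Match::new(0, line.len()));
    assert_eq!(&line[range], &b"indented match"[..]);
}
```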

21
grep-regex/Cargo.toml Normal file
View File

@@ -0,0 +1,21 @@
[package]
name = "grep-regex"
version = "0.0.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Use Rust's regex library with the 'grep' crate.
"""
documentation = "https://docs.rs/grep-regex"
homepage = "https://github.com/BurntSushi/ripgrep"
repository = "https://github.com/BurntSushi/ripgrep"
readme = "README.md"
keywords = ["regex", "grep", "search", "pattern", "line"]
license = "Unlicense/MIT"
[dependencies]
log = "0.4"
grep-matcher = { version = "0.0.1", path = "../grep-matcher" }
regex = "1"
regex-syntax = "0.6"
thread_local = "0.3.5"
utf8-ranges = "1"

21
grep-regex/LICENSE-MIT Normal file
View File

@@ -0,0 +1,21 @@
The MIT License (MIT)
Copyright (c) 2015 Andrew Gallant
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

4
grep-regex/README.md Normal file
View File

@@ -0,0 +1,4 @@
grep-regex
----------
This is a *library* that provides grep-style line-by-line regex searching (with
comparable performance to `grep` itself).

24
grep-regex/UNLICENSE Normal file
View File

@@ -0,0 +1,24 @@
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to <http://unlicense.org/>

263
grep-regex/src/ast.rs Normal file
View File

@@ -0,0 +1,263 @@
use regex_syntax::ast::{self, Ast};
use regex_syntax::ast::parse::Parser;
/// The results of analyzing the AST of a regular expression (e.g., for
/// supporting smart case).
#[derive(Clone, Debug)]
pub struct AstAnalysis {
/// True if and only if a literal uppercase character occurs in the regex.
any_uppercase: bool,
/// True if and only if the regex contains any literal at all.
any_literal: bool,
/// True if and only if the regex consists entirely of a literal and no
/// other special regex characters.
all_verbatim_literal: bool,
}
impl AstAnalysis {
/// Returns an `AstAnalysis` value by analyzing the AST of `pattern`.
///
/// If `pattern` is not a valid regular expression, then `None` is
/// returned.
#[allow(dead_code)]
pub fn from_pattern(pattern: &str) -> Option<AstAnalysis> {
Parser::new()
.parse(pattern)
.map(|ast| AstAnalysis::from_ast(&ast))
.ok()
}
/// Perform an AST analysis given the AST.
pub fn from_ast(ast: &Ast) -> AstAnalysis {
let mut analysis = AstAnalysis::new();
analysis.from_ast_impl(ast);
analysis
}
/// Returns true if and only if a literal uppercase character occurs in
/// the pattern.
///
/// For example, a pattern like `\pL` contains no uppercase literals,
/// even though `L` is uppercase and the `\pL` class contains uppercase
/// characters.
pub fn any_uppercase(&self) -> bool {
self.any_uppercase
}
/// Returns true if and only if the regex contains any literal at all.
///
/// For example, a pattern like `\pL` reports `false`, but a pattern like
/// `\pLfoo` reports `true`.
pub fn any_literal(&self) -> bool {
self.any_literal
}
/// Returns true if and only if the entire pattern is a verbatim literal
/// with no special meta characters.
///
/// When this is true, then the pattern satisfies the following law:
/// `escape(pattern) == pattern`. Notable examples where this returns
/// `false` include patterns like `a\u0061` even though `\u0061` is just
/// a literal `a`.
///
/// The purpose of this flag is to determine whether the pattern can be
/// given to non-regex substring search algorithms as-is.
#[allow(dead_code)]
pub fn all_verbatim_literal(&self) -> bool {
self.all_verbatim_literal
}
/// Creates a new `AstAnalysis` value with an initial configuration.
fn new() -> AstAnalysis {
AstAnalysis {
any_uppercase: false,
any_literal: false,
all_verbatim_literal: true,
}
}
fn from_ast_impl(&mut self, ast: &Ast) {
if self.done() {
return;
}
match *ast {
Ast::Empty(_) => {}
Ast::Flags(_)
| Ast::Dot(_)
| Ast::Assertion(_)
| Ast::Class(ast::Class::Unicode(_))
| Ast::Class(ast::Class::Perl(_)) => {
self.all_verbatim_literal = false;
}
Ast::Literal(ref x) => {
self.from_ast_literal(x);
}
Ast::Class(ast::Class::Bracketed(ref x)) => {
self.all_verbatim_literal = false;
self.from_ast_class_set(&x.kind);
}
Ast::Repetition(ref x) => {
self.all_verbatim_literal = false;
self.from_ast_impl(&x.ast);
}
Ast::Group(ref x) => {
self.all_verbatim_literal = false;
self.from_ast_impl(&x.ast);
}
Ast::Alternation(ref alt) => {
self.all_verbatim_literal = false;
for x in &alt.asts {
self.from_ast_impl(x);
}
}
Ast::Concat(ref alt) => {
for x in &alt.asts {
self.from_ast_impl(x);
}
}
}
}
fn from_ast_class_set(&mut self, ast: &ast::ClassSet) {
if self.done() {
return;
}
match *ast {
ast::ClassSet::Item(ref item) => {
self.from_ast_class_set_item(item);
}
ast::ClassSet::BinaryOp(ref x) => {
self.from_ast_class_set(&x.lhs);
self.from_ast_class_set(&x.rhs);
}
}
}
fn from_ast_class_set_item(&mut self, ast: &ast::ClassSetItem) {
if self.done() {
return;
}
match *ast {
ast::ClassSetItem::Empty(_)
| ast::ClassSetItem::Ascii(_)
| ast::ClassSetItem::Unicode(_)
| ast::ClassSetItem::Perl(_) => {}
ast::ClassSetItem::Literal(ref x) => {
self.from_ast_literal(x);
}
ast::ClassSetItem::Range(ref x) => {
self.from_ast_literal(&x.start);
self.from_ast_literal(&x.end);
}
ast::ClassSetItem::Bracketed(ref x) => {
self.from_ast_class_set(&x.kind);
}
ast::ClassSetItem::Union(ref union) => {
for x in &union.items {
self.from_ast_class_set_item(x);
}
}
}
}
fn from_ast_literal(&mut self, ast: &ast::Literal) {
if ast.kind != ast::LiteralKind::Verbatim {
self.all_verbatim_literal = false;
}
self.any_literal = true;
self.any_uppercase = self.any_uppercase || ast.c.is_uppercase();
}
/// Returns true if and only if the attributes can never change no matter
/// what other AST it might see.
fn done(&self) -> bool {
self.any_uppercase && self.any_literal && !self.all_verbatim_literal
}
}
#[cfg(test)]
mod tests {
use super::*;
fn analysis(pattern: &str) -> AstAnalysis {
AstAnalysis::from_pattern(pattern).unwrap()
}
#[test]
fn various() {
let x = analysis("");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("foo");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("Foo");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("foO");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis(r"foo\\");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\w");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\S");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\p{Ll}");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[a-z]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[A-Z]");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[\S\t]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\\S");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"\p{Ll}");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"aBc\w");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"a\u0061");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
}
}
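This analysis drives the smart case feature. A sketch of the user-visible effect through `RegexMatcherBuilder::case_smart` (patterns and haystacks are illustrative):
```
extern crate grep_matcher;
extern crate grep_regex;

use grep_matcher::Matcher;
use grep_regex::RegexMatcherBuilder;

fn main() {
    // No uppercase literals: smart case enables case insensitivity.
    let insensitive = RegexMatcherBuilder::new()
        .case_smart(true)
        .build(r"sherlock\w")
        .unwrap();
    assert!(insensitive.is_match(b"SHERLOCKED").unwrap());

    // An uppercase literal turns smart case off.
    let sensitive = RegexMatcherBuilder::new()
        .case_smart(true)
        .build(r"Sherlock\w")
        .unwrap();
    assert!(!sensitive.is_match(b"SHERLOCKED").unwrap());
}
```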

280
grep-regex/src/config.rs Normal file
View File

@@ -0,0 +1,280 @@
use grep_matcher::{ByteSet, LineTerminator};
use regex::bytes::{Regex, RegexBuilder};
use regex_syntax::ast::{self, Ast};
use regex_syntax::hir::Hir;
use ast::AstAnalysis;
use crlf::crlfify;
use error::Error;
use literal::LiteralSets;
use non_matching::non_matching_bytes;
use strip::strip_from_match;
/// Config represents the configuration of a regex matcher in this crate.
/// The configuration is a rough combination of the knobs found in the
/// `regex` crate, along with additional `grep-matcher` specific options.
///
/// The configuration can be used to build a "configured" HIR expression. A
/// configured HIR expression is an HIR expression that is aware of the
/// configuration which generated it, and provides transformation on that HIR
/// such that the configuration is preserved.
#[derive(Clone, Debug)]
pub struct Config {
pub case_insensitive: bool,
pub case_smart: bool,
pub multi_line: bool,
pub dot_matches_new_line: bool,
pub swap_greed: bool,
pub ignore_whitespace: bool,
pub unicode: bool,
pub octal: bool,
pub size_limit: usize,
pub dfa_size_limit: usize,
pub nest_limit: u32,
pub line_terminator: Option<LineTerminator>,
pub crlf: bool,
pub word: bool,
}
impl Default for Config {
fn default() -> Config {
Config {
case_insensitive: false,
case_smart: false,
multi_line: false,
dot_matches_new_line: false,
swap_greed: false,
ignore_whitespace: false,
unicode: true,
octal: false,
// These size limits are much bigger than what's in the regex
// crate.
size_limit: 100 * (1<<20),
dfa_size_limit: 1000 * (1<<20),
nest_limit: 250,
line_terminator: None,
crlf: false,
word: false,
}
}
}
impl Config {
/// Parse the given pattern and return its HIR expression along with
/// the current configuration.
///
/// If there was a problem parsing the given expression then an error
/// is returned.
pub fn hir(&self, pattern: &str) -> Result<ConfiguredHIR, Error> {
let analysis = self.analysis(pattern)?;
let expr = ::regex_syntax::ParserBuilder::new()
.nest_limit(self.nest_limit)
.octal(self.octal)
.allow_invalid_utf8(true)
.ignore_whitespace(self.ignore_whitespace)
.case_insensitive(self.is_case_insensitive(&analysis)?)
.multi_line(self.multi_line)
.dot_matches_new_line(self.dot_matches_new_line)
.swap_greed(self.swap_greed)
.unicode(self.unicode)
.build()
.parse(pattern)
.map_err(Error::regex)?;
let expr = match self.line_terminator {
None => expr,
Some(line_term) => strip_from_match(expr, line_term)?,
};
Ok(ConfiguredHIR {
original: pattern.to_string(),
config: self.clone(),
analysis: analysis,
// If CRLF mode is enabled, replace `$` with `(?:\r??$)`.
expr: if self.crlf { crlfify(expr) } else { expr },
})
}
/// Accounting for the `case_smart` config knob, return true if and only
/// if this pattern should be matched case insensitively.
fn is_case_insensitive(
&self,
analysis: &AstAnalysis,
) -> Result<bool, Error> {
if self.case_insensitive {
return Ok(true);
}
if !self.case_smart {
return Ok(false);
}
Ok(analysis.any_literal() && !analysis.any_uppercase())
}
/// Perform analysis on the AST of this pattern.
///
/// This returns an error if the given pattern failed to parse.
fn analysis(&self, pattern: &str) -> Result<AstAnalysis, Error> {
Ok(AstAnalysis::from_ast(&self.ast(pattern)?))
}
/// Parse the given pattern into its abstract syntax.
///
/// This returns an error if the given pattern failed to parse.
fn ast(&self, pattern: &str) -> Result<Ast, Error> {
ast::parse::ParserBuilder::new()
.nest_limit(self.nest_limit)
.octal(self.octal)
.ignore_whitespace(self.ignore_whitespace)
.build()
.parse(pattern)
.map_err(Error::regex)
}
}
/// A "configured" HIR expression, which is aware of the configuration which
/// produced this HIR.
///
/// Since the configuration is tracked, values with this type can be
/// transformed into other HIR expressions (or regular expressions) in a way
/// that preserves the configuration. For example, the `fast_line_regex`
/// method will apply literal extraction to the inner HIR and use that to build
/// a new regex that matches the extracted literals in a way that is
/// consistent with the configuration that produced this HIR. For example, the
/// size limits set on the configured HIR will be propagated out to any
/// subsequently constructed HIR or regular expression.
#[derive(Clone, Debug)]
pub struct ConfiguredHIR {
original: String,
config: Config,
analysis: AstAnalysis,
expr: Hir,
}
impl ConfiguredHIR {
/// Return the configuration for this HIR expression.
pub fn config(&self) -> &Config {
&self.config
}
/// Compute the set of non-matching bytes for this HIR expression.
pub fn non_matching_bytes(&self) -> ByteSet {
non_matching_bytes(&self.expr)
}
/// Builds a regular expression from this HIR expression.
pub fn regex(&self) -> Result<Regex, Error> {
self.pattern_to_regex(&self.expr.to_string())
}
/// Applies the given function to the concrete syntax of this HIR and then
/// generates a new HIR based on the result of the function in a way that
/// preserves the configuration.
///
/// For example, this can be used to wrap a user provided regular
/// expression with additional semantics, e.g., see the `WordMatcher`.
pub fn with_pattern<F: FnMut(&str) -> String>(
&self,
mut f: F,
) -> Result<ConfiguredHIR, Error>
{
self.pattern_to_hir(&f(&self.expr.to_string()))
}
/// If the current configuration has a line terminator set and if useful
/// literals could be extracted, then a regular expression matching those
/// literals is returned. If no line terminator is set, then `None` is
/// returned.
///
/// If compiling the resulting regular expression failed, then an error
/// is returned.
///
/// This method only returns something when a line terminator is set
/// because matches from this regex are generally candidates that must be
/// confirmed before reporting a match. When performing a line oriented
/// search, confirmation is easy: just extend the candidate match to its
/// respective line boundaries and then re-search that line for a full
/// match. This only works when the line terminator is set because the line
/// terminator setting guarantees that the regex itself can never match
/// through the line terminator byte.
pub fn fast_line_regex(&self) -> Result<Option<Regex>, Error> {
if self.config.line_terminator.is_none() {
return Ok(None);
}
match LiteralSets::new(&self.expr).one_regex() {
None => Ok(None),
/*
if !self.config.crlf {
return Ok(None);
}
// If we're trying to support CRLF, then our "fast" line
// oriented regex needs `$` to be able to match at a `\r\n`
// boundary. The regex engine doesn't support this, so we
// "fake" it by replacing `$` with `(?:\r?$)`. Since the
// fast line regex is only used to detect lines, this never
// infects match offsets. Namely, the regex generated via
// `self.expr` is matched against lines with line terminators
// stripped.
let pattern = crlfify(self.expr.clone()).to_string();
self.pattern_to_regex(&pattern).map(Some)
*/
Some(pattern) => self.pattern_to_regex(&pattern).map(Some),
}
}
/// Create a regex from the given pattern using this HIR's configuration.
fn pattern_to_regex(&self, pattern: &str) -> Result<Regex, Error> {
// The settings we explicitly set here are intentionally a subset
// of the settings we have. The key point here is that our HIR
// expression is computed with the settings in mind, such that setting
// them here could actually lead to unintended behavior. For example,
// consider the pattern `(?U)a+`. This will get folded into the HIR
// as a non-greedy repetition operator which will in turn get printed
// to the concrete syntax as `a+?`, which is correct. But if we
// set the `swap_greed` option again, then we'll wind up with `(?U)a+?`
// which is equal to `a+` which is not the same as what we were given.
//
// We also don't need to apply `case_insensitive` since this gets
// folded into the HIR and would just cause us to do redundant work.
//
// Finally, we don't need to set `ignore_whitespace` since the concrete
// syntax emitted by the HIR printer never needs it.
//
// We set the rest of the options. Some of them are important, such as
// the size limit, and some of them are necessary to preserve the
// intention of the original pattern. For example, the Unicode flag
// will impact how the WordMatcher functions, namely, whether its
// word boundaries are Unicode aware or not.
RegexBuilder::new(&pattern)
.nest_limit(self.config.nest_limit)
.octal(self.config.octal)
.multi_line(self.config.multi_line)
.dot_matches_new_line(self.config.dot_matches_new_line)
.unicode(self.config.unicode)
.size_limit(self.config.size_limit)
.dfa_size_limit(self.config.dfa_size_limit)
.build()
.map_err(Error::regex)
}
/// Create an HIR expression from the given pattern using this HIR's
/// configuration.
fn pattern_to_hir(&self, pattern: &str) -> Result<ConfiguredHIR, Error> {
// See `pattern_to_regex` comment for explanation of why we only set
// a subset of knobs here. e.g., `swap_greed` is explicitly left out.
let expr = ::regex_syntax::ParserBuilder::new()
.nest_limit(self.config.nest_limit)
.octal(self.config.octal)
.allow_invalid_utf8(true)
.multi_line(self.config.multi_line)
.dot_matches_new_line(self.config.dot_matches_new_line)
.unicode(self.config.unicode)
.build()
.parse(pattern)
.map_err(Error::regex)?;
Ok(ConfiguredHIR {
original: self.original.clone(),
config: self.config.clone(),
analysis: self.analysis.clone(),
expr: expr,
})
}
}
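The `(?U)a+` pitfall described in `pattern_to_regex` can be observed directly with regex-syntax's parser and HIR printer. A sketch, with the expected output noted as a comment rather than asserted:
```
extern crate regex_syntax;

use regex_syntax::ParserBuilder;

fn main() {
    let hir = ParserBuilder::new()
        .swap_greed(true)
        .build()
        .parse(r"a+")
        .unwrap();
    // The swapped greediness is already folded into the HIR, so the printed
    // concrete syntax is expected to be `a+?` under regex-syntax 0.6.
    // Applying `swap_greed` again to that pattern would undo it.
    println!("{}", hir);
}
```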

83
grep-regex/src/crlf.rs Normal file
View File

@@ -0,0 +1,83 @@
use regex_syntax::hir::{self, Hir, HirKind};
/// Substitutes all occurrences of multi-line enabled `$` with `(?:\r??$)`.
///
/// This does not preserve the exact semantics of the given expression,
/// however, it does have the useful property that anything that matched the
/// given expression will also match the returned expression. The difference is
/// that the returned expression can match possibly other things as well.
///
/// The principal reason we do this is that the underlying regex engine
/// doesn't support CRLF-aware `$` look-around. A fix is planned at that
/// level, but we perform this kludge in the meantime.
///
/// Note that while the match preserving semantics are nice and neat, the
/// match position semantics are quite a bit messier. Namely, `$` only ever
/// matches the position between characters where as `\r??` can match a
/// character and change the offset. This is regretable, but works out pretty
/// nicely in most cases, especially when a match is limited to a single line.
pub fn crlfify(expr: Hir) -> Hir {
match expr.into_kind() {
HirKind::Anchor(hir::Anchor::EndLine) => {
let concat = Hir::concat(vec![
Hir::repetition(hir::Repetition {
kind: hir::RepetitionKind::ZeroOrOne,
greedy: false,
hir: Box::new(Hir::literal(hir::Literal::Unicode('\r'))),
}),
Hir::anchor(hir::Anchor::EndLine),
]);
Hir::group(hir::Group {
kind: hir::GroupKind::NonCapturing,
hir: Box::new(concat),
})
}
HirKind::Empty => Hir::empty(),
HirKind::Literal(x) => Hir::literal(x),
HirKind::Class(x) => Hir::class(x),
HirKind::Anchor(x) => Hir::anchor(x),
HirKind::WordBoundary(x) => Hir::word_boundary(x),
HirKind::Repetition(mut x) => {
x.hir = Box::new(crlfify(*x.hir));
Hir::repetition(x)
}
HirKind::Group(mut x) => {
x.hir = Box::new(crlfify(*x.hir));
Hir::group(x)
}
HirKind::Concat(xs) => {
Hir::concat(xs.into_iter().map(crlfify).collect())
}
HirKind::Alternation(xs) => {
Hir::alternation(xs.into_iter().map(crlfify).collect())
}
}
}
#[cfg(test)]
mod tests {
use regex_syntax::Parser;
use super::crlfify;
fn roundtrip(pattern: &str) -> String {
let expr1 = Parser::new().parse(pattern).unwrap();
let expr2 = crlfify(expr1);
expr2.to_string()
}
#[test]
fn various() {
assert_eq!(roundtrip(r"(?m)$"), "(?:\r??(?m:$))");
assert_eq!(roundtrip(r"(?m)$$"), "(?:\r??(?m:$))(?:\r??(?m:$))");
assert_eq!(
roundtrip(r"(?m)(?:foo$|bar$)"),
"(?:foo(?:\r??(?m:$))|bar(?:\r??(?m:$)))"
);
assert_eq!(roundtrip(r"(?m)$a"), "(?:\r??(?m:$))a");
// Not a multiline `$`, so no crlfifying occurs.
assert_eq!(roundtrip(r"$"), "\\z");
// It's a literal, derp.
assert_eq!(roundtrip(r"\$"), "\\$");
}
}
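A sketch of what `crlfify` buys in practice; the `crlf` knob on the matcher builder is an assumption here, mirroring the `Config::crlf` field shown in config.rs:
```
extern crate grep_matcher;
extern crate grep_regex;

use grep_matcher::Matcher;
use grep_regex::RegexMatcherBuilder;

fn main() {
    let matcher = RegexMatcherBuilder::new()
        .multi_line(true)
        .crlf(true) // assumed builder knob mirroring `Config::crlf`
        .build(r"foo$")
        .unwrap();
    // With crlfify, the multi-line `$` can match just before `\r\n`.
    assert!(matcher.is_match(b"foo\r\nbar").unwrap());
}
```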

88
grep-regex/src/error.rs Normal file
View File

@@ -0,0 +1,88 @@
use std::error;
use std::fmt;
use util;
/// An error that can occur in this crate.
///
/// Generally, this error corresponds to problems building a regular
/// expression, whether it's in parsing, compilation or a problem with
/// guaranteeing a configured optimization.
#[derive(Clone, Debug)]
pub struct Error {
kind: ErrorKind,
}
impl Error {
pub(crate) fn new(kind: ErrorKind) -> Error {
Error { kind }
}
pub(crate) fn regex<E: error::Error>(err: E) -> Error {
Error { kind: ErrorKind::Regex(err.to_string()) }
}
/// Return the kind of this error.
pub fn kind(&self) -> &ErrorKind {
&self.kind
}
}
/// The kind of an error that can occur.
#[derive(Clone, Debug)]
pub enum ErrorKind {
/// An error that occurred as a result of parsing a regular expression.
/// This can be a syntax error or an error that results from attempting to
/// compile a regular expression that is too big.
///
/// The string here is the underlying error converted to a string.
Regex(String),
/// An error that occurs when building a regex that isn't permitted to
/// match a line terminator. In general, building the regex will do its
/// best to make matching a line terminator impossible (e.g., by removing
/// `\n` from the `\s` character class), but if the regex contains a
/// `\n` literal, then there is no reasonable choice that can be made and
/// therefore an error is reported.
///
/// The string is the literal sequence found in the regex that is not
/// allowed.
NotAllowed(String),
/// This error occurs when a non-ASCII line terminator was provided.
///
/// The invalid byte is included in this error.
InvalidLineTerminator(u8),
/// Hints that destructuring should not be exhaustive.
///
/// This enum may grow additional variants, so this makes sure clients
/// don't count on exhaustive matching. (Otherwise, adding a new variant
/// could break existing code.)
#[doc(hidden)]
__Nonexhaustive,
}
impl error::Error for Error {
fn description(&self) -> &str {
match self.kind {
ErrorKind::Regex(_) => "regex error",
ErrorKind::NotAllowed(_) => "literal not allowed",
ErrorKind::InvalidLineTerminator(_) => "invalid line terminator",
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self.kind {
ErrorKind::Regex(ref s) => write!(f, "{}", s),
ErrorKind::NotAllowed(ref lit) => {
write!(f, "the literal '{:?}' is not allowed in a regex", lit)
}
ErrorKind::InvalidLineTerminator(byte) => {
let x = util::show_bytes(&[byte]);
write!(f, "line terminators must be ASCII, but '{}' is not", x)
}
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}

27
grep-regex/src/lib.rs Normal file
View File

@@ -0,0 +1,27 @@
/*!
An implementation of `grep-matcher`'s `Matcher` trait for Rust's regex engine.
*/
#![deny(missing_docs)]
extern crate grep_matcher;
#[macro_use]
extern crate log;
extern crate regex;
extern crate regex_syntax;
extern crate thread_local;
extern crate utf8_ranges;
pub use error::{Error, ErrorKind};
pub use matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
mod ast;
mod config;
mod crlf;
mod error;
mod literal;
mod matcher;
mod non_matching;
mod strip;
mod util;
mod word;

grep-regex/src/literal.rs
View File

@@ -1,27 +1,37 @@
-/*!
-The literals module is responsible for extracting *inner* literals out of the
-AST of a regular expression. Normally this is the job of the regex engine
-itself, but the regex engine doesn't look for inner literals. Since we're doing
-line based searching, we can use them, so we need to do it ourselves.
+/*
+This module is responsible for extracting *inner* literals out of the AST of a
+regular expression. Normally this is the job of the regex engine itself, but
+the regex engine doesn't look for inner literals. Since we're doing line based
+searching, we can use them, so we need to do it ourselves.
+
+Note that this implementation is incredibly suspicious. We need something more
+principled.
 */
 use std::cmp;
-use regex::bytes::RegexBuilder;
-use syntax::hir::{self, Hir, HirKind};
-use syntax::hir::literal::{Literal, Literals};
+use regex_syntax::hir::{self, Hir, HirKind};
+use regex_syntax::hir::literal::{Literal, Literals};
+use util;
+/// Represents prefix, suffix and inner "required" literals for a regular
+/// expression.
+///
+/// Prefixes and suffixes are detected using regex-syntax. The inner required
+/// literals are detected using something custom (but based on the code in
+/// regex-syntax).
 #[derive(Clone, Debug)]
 pub struct LiteralSets {
+    /// A set of prefix literals.
     prefixes: Literals,
+    /// A set of suffix literals.
     suffixes: Literals,
+    /// A set of literals such that at least one of them must appear in every
+    /// match. A literal in this set may be neither a prefix nor a suffix.
     required: Literals,
 }
 impl LiteralSets {
-    pub fn create(expr: &Hir) -> Self {
+    /// Create a set of literals from the given HIR expression.
+    pub fn new(expr: &Hir) -> LiteralSets {
         let mut required = Literals::empty();
         union_required(expr, &mut required);
         LiteralSets {
@@ -31,10 +41,23 @@ impl LiteralSets {
         }
     }
-    pub fn to_regex_builder(&self) -> Option<RegexBuilder> {
+    /// If it is deemed advantageous to do so (via various suspicious
+    /// heuristics), this will return a single regular expression pattern that
+    /// matches a subset of the language matched by the regular expression that
+    /// generated these literal sets. The idea here is that the pattern
+    /// returned by this method is much cheaper to search for, i.e., it is
+    /// usually a single literal or an alternation of literals.
+    pub fn one_regex(&self) -> Option<String> {
+        // TODO: The logic in this function is basically inscrutable. It grew
+        // organically in the old grep 0.1 crate. Ideally, it would be
+        // re-worked. In fact, the entire inner literal extraction should be
+        // re-worked. Actually, most of regex-syntax's literal extraction
+        // should also be re-worked. Alas... only so much time in the day.
         if self.prefixes.all_complete() && !self.prefixes.is_empty() {
             debug!("literal prefixes detected: {:?}", self.prefixes);
-            // When this is true, the regex engine will do a literal scan.
+            // When this is true, the regex engine will do a literal scan,
+            // so we don't need to return anything.
             return None;
         }
@@ -75,18 +98,17 @@ impl LiteralSets {
         let any_empty = req_lits.iter().any(|lit| lit.is_empty());
         if req.len() > lit.len() && req_lits.len() > 1 && !any_empty {
             debug!("required literals found: {:?}", req_lits);
-            let alts: Vec<String> =
-                req_lits.into_iter().map(|x| bytes_to_regex(x)).collect();
-            let mut builder = RegexBuilder::new(&alts.join("|"));
-            builder.unicode(false);
-            Some(builder)
+            let alts: Vec<String> = req_lits
+                .into_iter()
+                .map(|x| util::bytes_to_regex(x))
+                .collect();
+            // We're matching raw bytes, so disable Unicode mode.
+            Some(format!("(?-u:{})", alts.join("|")))
         } else if lit.is_empty() {
             None
         } else {
-            debug!("required literal found: {:?}", show(lit));
-            let mut builder = RegexBuilder::new(&bytes_to_regex(&lit));
-            builder.unicode(false);
-            Some(builder)
+            debug!("required literal found: {:?}", util::show_bytes(lit));
+            Some(format!("(?-u:{})", util::bytes_to_regex(&lit)))
         }
     }
 }
@@ -238,27 +260,45 @@ fn count_byte_class(cls: &hir::ClassBytes) -> u32 {
     cls.iter().map(|r| 1 + (r.end() as u32 - r.start() as u32)).sum()
 }
-/// Converts an arbitrary sequence of bytes to a literal suitable for building
-/// a regular expression.
-fn bytes_to_regex(bs: &[u8]) -> String {
-    let mut s = String::with_capacity(bs.len());
-    for &b in bs {
-        s.push_str(&format!("\\x{:02x}", b));
-    }
-    s
-}
-/// Converts arbitrary bytes to a nice string.
-fn show(bs: &[u8]) -> String {
-    // Why aren't we using this to feed to the regex? Doesn't really matter
-    // I guess. ---AG
-    use std::ascii::escape_default;
-    use std::str;
-    let mut nice = String::new();
-    for &b in bs {
-        let part: Vec<u8> = escape_default(b).collect();
-        nice.push_str(str::from_utf8(&part).unwrap());
-    }
-    nice
-}
+#[cfg(test)]
+mod tests {
+    use regex_syntax::Parser;
+    use super::LiteralSets;
+    fn sets(pattern: &str) -> LiteralSets {
+        let hir = Parser::new().parse(pattern).unwrap();
+        LiteralSets::new(&hir)
+    }
+    fn one_regex(pattern: &str) -> Option<String> {
+        sets(pattern).one_regex()
+    }
+    // Put a pattern into the same format as the one returned by `one_regex`.
+    fn pat(pattern: &str) -> Option<String> {
+        Some(format!("(?-u:{})", pattern))
+    }
+    #[test]
+    fn various() {
+        // Obviously no literals.
+        assert!(one_regex(r"\w").is_none());
+        assert!(one_regex(r"\pL").is_none());
+        // Tantalizingly close.
+        assert!(one_regex(r"\w|foo").is_none());
+        // There's a literal, but it's better if the regex engine handles it
+        // internally.
+        assert!(one_regex(r"abc").is_none());
+        // Core use cases.
+        assert_eq!(one_regex(r"\wabc\w"), pat("abc"));
+        assert_eq!(one_regex(r"abc\w"), pat("abc"));
+        // TODO: Make these pass. We're missing some potentially big wins
+        // without these.
+        // assert_eq!(one_regex(r"\w(foo|bar|baz)"), pat("foo|bar|baz"));
+        // assert_eq!(one_regex(r"\w(foo|bar|baz)\w"), pat("foo|bar|baz"));
+    }
+}

864
grep-regex/src/matcher.rs Normal file

@@ -0,0 +1,864 @@
use std::collections::HashMap;
use grep_matcher::{
Captures, LineMatchKind, LineTerminator, Match, Matcher, NoError, ByteSet,
};
use regex::bytes::{CaptureLocations, Regex};
use config::{Config, ConfiguredHIR};
use error::Error;
use word::WordMatcher;
/// A builder for constructing a `Matcher` using regular expressions.
///
/// This builder re-exports many of the same options found on the regex crate's
/// builder, in addition to a few other options such as smart case, word
/// matching and the ability to set a line terminator which may enable certain
/// types of optimizations.
///
/// The syntax supported is documented as part of the regex crate:
/// https://docs.rs/regex/*/regex/#syntax
#[derive(Clone, Debug)]
pub struct RegexMatcherBuilder {
config: Config,
}
impl Default for RegexMatcherBuilder {
fn default() -> RegexMatcherBuilder {
RegexMatcherBuilder::new()
}
}
impl RegexMatcherBuilder {
/// Create a new builder for configuring a regex matcher.
pub fn new() -> RegexMatcherBuilder {
RegexMatcherBuilder {
config: Config::default(),
}
}
/// Build a new matcher using the current configuration for the provided
/// pattern.
///
/// The syntax supported is documented as part of the regex crate:
/// https://docs.rs/regex/*/regex/#syntax
pub fn build(&self, pattern: &str) -> Result<RegexMatcher, Error> {
let chir = self.config.hir(pattern)?;
let fast_line_regex = chir.fast_line_regex()?;
let non_matching_bytes = chir.non_matching_bytes();
if let Some(ref re) = fast_line_regex {
trace!("extracted fast line regex: {:?}", re);
}
Ok(RegexMatcher {
config: self.config.clone(),
matcher: RegexMatcherImpl::new(&chir)?,
fast_line_regex: fast_line_regex,
non_matching_bytes: non_matching_bytes,
})
}
/// Set the value for the case insensitive (`i`) flag.
///
/// When enabled, letters in the pattern will match both upper case and
/// lower case variants.
pub fn case_insensitive(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.config.case_insensitive = yes;
self
}
/// Whether to enable "smart case" or not.
///
/// When smart case is enabled, the builder will automatically enable
/// case insensitive matching based on how the pattern is written. Namely,
/// case insensitive mode is enabled when both of the following things
/// are true:
///
/// 1. The pattern contains at least one literal character. For example,
/// `a\w` contains a literal (`a`) but `\w` does not.
/// 2. Of the literals in the pattern, none of them are considered to be
/// uppercase according to Unicode. For example, `foo\pL` has no
/// uppercase literals but `Foo\pL` does.
pub fn case_smart(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.config.case_smart = yes;
self
}
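// A sketch of the smart case rules above (hypothetical usage; `is_match`
// comes from the `Matcher` trait in grep-matcher):
//
//     let m = RegexMatcherBuilder::new().case_smart(true).build("abc")?;
//     assert!(m.is_match(b"ABC")?);  // no uppercase literal => insensitive
//     let m = RegexMatcherBuilder::new().case_smart(true).build("aBc")?;
//     assert!(!m.is_match(b"ABC")?); // uppercase literal => sensitive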
/// Set the value for the multi-line matching (`m`) flag.
///
/// When enabled, `^` matches the beginning of lines and `$` matches the
/// end of lines.
///
/// By default, they match beginning/end of the input.
pub fn multi_line(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.config.multi_line = yes;
self
}
/// Set the value for the any character (`s`) flag, where `.` matches
/// anything when `s` is set, and matches anything except for a new line
/// when it is not set (the default).
///
/// N.B. "matches anything" means "any byte" when Unicode is disabled and
/// means "any valid UTF-8 encoding of any Unicode scalar value" when
/// Unicode is enabled.
pub fn dot_matches_new_line(
&mut self,
yes: bool,
) -> &mut RegexMatcherBuilder {
self.config.dot_matches_new_line = yes;
self
}
/// Set the value for the greedy swap (`U`) flag.
///
/// When enabled, a pattern like `a*` is lazy (tries to find shortest
/// match) and `a*?` is greedy (tries to find longest match).
///
/// By default, `a*` is greedy and `a*?` is lazy.
pub fn swap_greed(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.config.swap_greed = yes;
self
}
/// Set the value for the ignore whitespace (`x`) flag.
///
/// When enabled, whitespace such as new lines and spaces will be ignored
/// between expressions of the pattern, and `#` can be used to start a
/// comment until the next new line.
pub fn ignore_whitespace(
&mut self,
yes: bool,
) -> &mut RegexMatcherBuilder {
self.config.ignore_whitespace = yes;
self
}
/// Set the value for the Unicode (`u`) flag.
///
/// Enabled by default. When disabled, character classes such as `\w` only
/// match ASCII word characters instead of all Unicode word characters.
pub fn unicode(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.config.unicode = yes;
self
}
/// Whether to support octal syntax or not.
///
/// Octal syntax is a little-known way of uttering Unicode codepoints in
/// a regular expression. For example, `a`, `\x61`, `\u0061` and
/// `\141` are all equivalent regular expressions, where the last example
/// shows octal syntax.
///
/// While supporting octal syntax isn't in and of itself a problem, it does
/// make good error messages harder. That is, in PCRE based regex engines,
/// syntax like `\0` invokes a backreference, which is explicitly
/// unsupported in Rust's regex engine. However, many users expect it to
/// be supported. Therefore, when octal support is disabled, the error
/// message will explicitly mention that backreferences aren't supported.
///
/// Octal syntax is disabled by default.
pub fn octal(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.config.octal = yes;
self
}
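// For instance, with octal support enabled, `\141` is just another way of
// writing `a` (a sketch; the second assertion relies on octal syntax being
// rejected when disabled):
//
//     let m = RegexMatcherBuilder::new().octal(true).build(r"\141")?;
//     assert!(m.is_match(b"a")?);
//     assert!(RegexMatcherBuilder::new().octal(false).build(r"\141").is_err());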
/// Set the approximate size limit of the compiled regular expression.
///
/// This roughly corresponds to the number of bytes occupied by a single
/// compiled program. If the program exceeds this number, then a
/// compilation error is returned.
pub fn size_limit(&mut self, bytes: usize) -> &mut RegexMatcherBuilder {
self.config.size_limit = bytes;
self
}
/// Set the approximate size of the cache used by the DFA.
///
/// This roughly corresponds to the number of bytes that the DFA will
/// use while searching.
///
/// Note that this is a *per thread* limit. There is no way to set a global
/// limit. In particular, if a regex is used from multiple threads
/// simultaneously, then each thread may use up to the number of bytes
/// specified here.
pub fn dfa_size_limit(
&mut self,
bytes: usize,
) -> &mut RegexMatcherBuilder {
self.config.dfa_size_limit = bytes;
self
}
/// Set the nesting limit for this parser.
///
/// The nesting limit controls how deep the abstract syntax tree is allowed
/// to be. If the AST exceeds the given limit (e.g., with too many nested
/// groups), then an error is returned by the parser.
///
/// The purpose of this limit is to act as a heuristic to prevent stack
/// overflow for consumers that do structural induction on an `Ast` using
/// explicit recursion. While this crate never does this (instead using
/// constant stack space and moving the call stack to the heap), other
/// crates may.
///
/// This limit is not checked until the entire Ast is parsed. Therefore,
/// if callers want to put a limit on the amount of heap space used, then
/// they should impose a limit on the length, in bytes, of the concrete
/// pattern string. In particular, this is viable since this parser
/// implementation will limit itself to heap space proportional to the
/// length of the pattern string.
///
/// Note that a nest limit of `0` will return a nest limit error for most
/// patterns but not all. For example, a nest limit of `0` permits `a` but
/// not `ab`, since `ab` requires a concatenation, which results in a nest
/// depth of `1`. In general, a nest limit is not something that manifests
/// in an obvious way in the concrete syntax, therefore, it should not be
/// used in a granular way.
pub fn nest_limit(&mut self, limit: u32) -> &mut RegexMatcherBuilder {
self.config.nest_limit = limit;
self
}
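// A sketch of the edge case described above: a nest limit of `0` permits
// `a` but rejects `ab`, since concatenation introduces a nest depth of 1.
//
//     assert!(RegexMatcherBuilder::new().nest_limit(0).build("a").is_ok());
//     assert!(RegexMatcherBuilder::new().nest_limit(0).build("ab").is_err());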
/// Set an ASCII line terminator for the matcher.
///
/// The purpose of setting a line terminator is to enable a certain class
/// of optimizations that can make line oriented searching faster. Namely,
/// when a line terminator is enabled, then the builder will guarantee that
/// the resulting matcher will never be capable of producing a match that
/// contains the line terminator. Because of this guarantee, users of the
/// resulting matcher do not need to slowly execute a search line by line
/// for line oriented search.
///
/// If the aforementioned guarantee about not matching a line terminator
/// cannot be made because of how the pattern was written, then the builder
/// will return an error when attempting to construct the matcher. For
/// example, the pattern `a\sb` will be transformed such that it can never
/// match `a\nb` (when `\n` is the line terminator), but the pattern `a\nb`
/// will result in an error since the `\n` cannot be easily removed without
/// changing the fundamental intent of the pattern.
///
/// If the given line terminator isn't an ASCII byte (`<=127`), then the
/// builder will return an error when constructing the matcher.
pub fn line_terminator(
&mut self,
line_term: Option<u8>,
) -> &mut RegexMatcherBuilder {
self.config.line_terminator = line_term.map(LineTerminator::byte);
self
}
/// Set the line terminator to `\r\n` and enable CRLF matching for `$` in
/// regex patterns.
///
/// This method sets two distinct settings:
///
/// 1. It causes the line terminator for the matcher to be `\r\n`. Namely,
/// this prevents the matcher from ever producing a match that contains
/// a `\r` or `\n`.
/// 2. It translates all instances of `$` in the pattern to `(?:\r??$)`.
/// This works around the fact that the regex engine does not support
/// matching CRLF as a line terminator when using `$`.
///
/// In particular, because of (2), the matches produced by the matcher may
/// be slightly different than what one would expect given the pattern.
/// This is the trade off made: in many cases, `$` will "just work" in the
/// presence of `\r\n` line terminators, but matches may require some
/// trimming to faithfully represent the intended match.
///
/// Note that if you do not wish to set the line terminator but would still
/// like `$` to match `\r\n` line terminators, then it is valid to call
/// `crlf(true)` followed by `line_terminator(None)`. Ordering is
/// important, since `crlf` and `line_terminator` override each other.
pub fn crlf(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
if yes {
self.config.line_terminator = Some(LineTerminator::crlf());
} else {
self.config.line_terminator = None;
}
self.config.crlf = yes;
self
}
/// Require that all matches occur on word boundaries.
///
/// Enabling this option is subtly different than putting `\b` assertions
/// on both sides of your pattern. In particular, a `\b` assertion requires
/// that one side of it match a word character while the other match a
/// non-word character. This option, in contrast, merely requires that
/// one side match a non-word character.
///
/// For example, `\b-2\b` will not match `foo -2 bar` since `-` is not a
/// word character. However, `-2` with this `word` option enabled will
/// match the `-2` in `foo -2 bar`.
pub fn word(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
self.config.word = yes;
self
}
}
/// An implementation of the `Matcher` trait using Rust's standard regex
/// library.
#[derive(Clone, Debug)]
pub struct RegexMatcher {
/// The configuration specified by the caller.
config: Config,
/// The underlying matcher implementation.
matcher: RegexMatcherImpl,
/// A regex that never reports false negatives but may report false
/// positives that is believed to be capable of being matched more quickly
/// than `regex`. Typically, this is a single literal or an alternation
/// of literals.
fast_line_regex: Option<Regex>,
/// A set of bytes that will never appear in a match.
non_matching_bytes: ByteSet,
}
impl RegexMatcher {
/// Create a new matcher from the given pattern using the default
/// configuration.
pub fn new(pattern: &str) -> Result<RegexMatcher, Error> {
RegexMatcherBuilder::new().build(pattern)
}
/// Create a new matcher from the given pattern using the default
/// configuration, but matches lines terminated by `\n`.
///
/// This returns an error if the given pattern contains a literal `\n`.
/// Other uses of `\n` (such as in `\s`) are removed transparently.
pub fn new_line_matcher(pattern: &str) -> Result<RegexMatcher, Error> {
RegexMatcherBuilder::new()
.line_terminator(Some(b'\n'))
.build(pattern)
}
}
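// A sketch of the two constructors (hypothetical usage; `is_match` comes
// from the `Matcher` trait):
//
//     let m = RegexMatcher::new_line_matcher(r"\w+")?;
//     assert!(m.is_match(b"abc")?);
//     // A literal \n cannot be stripped from the pattern, so this errors.
//     assert!(RegexMatcher::new_line_matcher(r"a\nb").is_err());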
/// An encapsulation of the type of matcher we use in `RegexMatcher`.
#[derive(Clone, Debug)]
enum RegexMatcherImpl {
/// The standard matcher used for all regular expressions.
Standard(StandardMatcher),
/// A matcher that only matches at word boundaries. This transforms the
/// regex to `(^|\W)(...)($|\W)` instead of the more intuitive `\b(...)\b`.
/// Because of this, the WordMatcher provides its own implementation of
/// `Matcher` to encapsulate its use of capture groups to make them
/// invisible to the caller.
Word(WordMatcher),
}
impl RegexMatcherImpl {
/// Based on the configuration, create a new implementation of the
/// `Matcher` trait.
fn new(expr: &ConfiguredHIR) -> Result<RegexMatcherImpl, Error> {
if expr.config().word {
Ok(RegexMatcherImpl::Word(WordMatcher::new(expr)?))
} else {
Ok(RegexMatcherImpl::Standard(StandardMatcher::new(expr)?))
}
}
}
// This implementation just dispatches on the internal matcher impl except
// for the line terminator optimization, which is possibly executed via
// `fast_line_regex`.
impl Matcher for RegexMatcher {
type Captures = RegexCaptures;
type Error = NoError;
fn find_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, NoError> {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.find_at(haystack, at),
Word(ref m) => m.find_at(haystack, at),
}
}
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.new_captures(),
Word(ref m) => m.new_captures(),
}
}
fn capture_count(&self) -> usize {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.capture_count(),
Word(ref m) => m.capture_count(),
}
}
fn capture_index(&self, name: &str) -> Option<usize> {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.capture_index(name),
Word(ref m) => m.capture_index(name),
}
}
fn find(&self, haystack: &[u8]) -> Result<Option<Match>, NoError> {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.find(haystack),
Word(ref m) => m.find(haystack),
}
}
fn find_iter<F>(
&self,
haystack: &[u8],
matched: F,
) -> Result<(), NoError>
where F: FnMut(Match) -> bool
{
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.find_iter(haystack, matched),
Word(ref m) => m.find_iter(haystack, matched),
}
}
fn try_find_iter<F, E>(
&self,
haystack: &[u8],
matched: F,
) -> Result<Result<(), E>, NoError>
where F: FnMut(Match) -> Result<bool, E>
{
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.try_find_iter(haystack, matched),
Word(ref m) => m.try_find_iter(haystack, matched),
}
}
fn captures(
&self,
haystack: &[u8],
caps: &mut RegexCaptures,
) -> Result<bool, NoError> {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.captures(haystack, caps),
Word(ref m) => m.captures(haystack, caps),
}
}
fn captures_iter<F>(
&self,
haystack: &[u8],
caps: &mut RegexCaptures,
matched: F,
) -> Result<(), NoError>
where F: FnMut(&RegexCaptures) -> bool
{
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.captures_iter(haystack, caps, matched),
Word(ref m) => m.captures_iter(haystack, caps, matched),
}
}
fn try_captures_iter<F, E>(
&self,
haystack: &[u8],
caps: &mut RegexCaptures,
matched: F,
) -> Result<Result<(), E>, NoError>
where F: FnMut(&RegexCaptures) -> Result<bool, E>
{
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.try_captures_iter(haystack, caps, matched),
Word(ref m) => m.try_captures_iter(haystack, caps, matched),
}
}
fn captures_at(
&self,
haystack: &[u8],
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool, NoError> {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.captures_at(haystack, at, caps),
Word(ref m) => m.captures_at(haystack, at, caps),
}
}
fn replace<F>(
&self,
haystack: &[u8],
dst: &mut Vec<u8>,
append: F,
) -> Result<(), NoError>
where F: FnMut(Match, &mut Vec<u8>) -> bool
{
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.replace(haystack, dst, append),
Word(ref m) => m.replace(haystack, dst, append),
}
}
fn replace_with_captures<F>(
&self,
haystack: &[u8],
caps: &mut RegexCaptures,
dst: &mut Vec<u8>,
append: F,
) -> Result<(), NoError>
where F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool
{
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => {
m.replace_with_captures(haystack, caps, dst, append)
}
Word(ref m) => {
m.replace_with_captures(haystack, caps, dst, append)
}
}
}
fn is_match(&self, haystack: &[u8]) -> Result<bool, NoError> {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.is_match(haystack),
Word(ref m) => m.is_match(haystack),
}
}
fn is_match_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<bool, NoError> {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.is_match_at(haystack, at),
Word(ref m) => m.is_match_at(haystack, at),
}
}
fn shortest_match(
&self,
haystack: &[u8],
) -> Result<Option<usize>, NoError> {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.shortest_match(haystack),
Word(ref m) => m.shortest_match(haystack),
}
}
fn shortest_match_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<usize>, NoError> {
use self::RegexMatcherImpl::*;
match self.matcher {
Standard(ref m) => m.shortest_match_at(haystack, at),
Word(ref m) => m.shortest_match_at(haystack, at),
}
}
fn non_matching_bytes(&self) -> Option<&ByteSet> {
Some(&self.non_matching_bytes)
}
fn line_terminator(&self) -> Option<LineTerminator> {
self.config.line_terminator
}
fn find_candidate_line(
&self,
haystack: &[u8],
) -> Result<Option<LineMatchKind>, NoError> {
Ok(match self.fast_line_regex {
Some(ref regex) => {
regex.shortest_match(haystack).map(LineMatchKind::Candidate)
}
None => {
self.shortest_match(haystack)?.map(LineMatchKind::Confirmed)
}
})
}
}
/// The implementation of the standard regex matcher.
#[derive(Clone, Debug)]
struct StandardMatcher {
/// The regular expression compiled from the pattern provided by the
/// caller.
regex: Regex,
/// A map from capture group name to its corresponding index.
names: HashMap<String, usize>,
}
impl StandardMatcher {
fn new(expr: &ConfiguredHIR) -> Result<StandardMatcher, Error> {
let regex = expr.regex()?;
let mut names = HashMap::new();
for (i, optional_name) in regex.capture_names().enumerate() {
if let Some(name) = optional_name {
names.insert(name.to_string(), i);
}
}
Ok(StandardMatcher { regex, names })
}
}
impl Matcher for StandardMatcher {
type Captures = RegexCaptures;
type Error = NoError;
fn find_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, NoError> {
Ok(self.regex
.find_at(haystack, at)
.map(|m| Match::new(m.start(), m.end())))
}
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
Ok(RegexCaptures::new(self.regex.capture_locations()))
}
fn capture_count(&self) -> usize {
self.regex.captures_len()
}
fn capture_index(&self, name: &str) -> Option<usize> {
self.names.get(name).map(|i| *i)
}
fn try_find_iter<F, E>(
&self,
haystack: &[u8],
mut matched: F,
) -> Result<Result<(), E>, NoError>
where F: FnMut(Match) -> Result<bool, E>
{
for m in self.regex.find_iter(haystack) {
match matched(Match::new(m.start(), m.end())) {
Ok(true) => continue,
Ok(false) => return Ok(Ok(())),
Err(err) => return Ok(Err(err)),
}
}
Ok(Ok(()))
}
fn captures_at(
&self,
haystack: &[u8],
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool, NoError> {
Ok(self.regex.captures_read_at(&mut caps.locs, haystack, at).is_some())
}
fn shortest_match_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<usize>, NoError> {
Ok(self.regex.shortest_match_at(haystack, at))
}
}
/// Represents the match offsets of each capturing group in a match.
///
/// The first, or `0`th capture group, always corresponds to the entire match
/// and is guaranteed to be present when a match occurs. The next capture
/// group, at index `1`, corresponds to the first capturing group in the regex,
/// ordered by the position at which the left opening parenthesis occurs.
///
/// Note that not all capturing groups are guaranteed to be present in a match.
/// For example, in the regex, `(?P<foo>\w)|(?P<bar>\W)`, only one of `foo`
/// or `bar` will ever be set in any given match.
///
/// In order to access a capture group by name, you'll need to first find the
/// index of the group using the corresponding matcher's `capture_index`
/// method, and then use that index with `RegexCaptures::get`.
#[derive(Clone, Debug)]
pub struct RegexCaptures {
/// Where the locations are stored.
locs: CaptureLocations,
/// These captures behave as if the capturing groups begin at the given
/// offset. When set to `0`, this has no effect and capture groups are
/// indexed like normal.
///
/// This is useful when building matchers that wrap arbitrary regular
/// expressions. For example, `WordMatcher` takes an existing regex `re`
/// and creates `(?:^|\W)(re)(?:$|\W)`, but hides the fact that the regex
/// has been wrapped from the caller. In order to do this, the matcher
/// and the capturing groups must behave as if `(re)` is the `0`th capture
/// group.
offset: usize,
}
impl Captures for RegexCaptures {
fn len(&self) -> usize {
self.locs.len().checked_sub(self.offset).unwrap()
}
fn get(&self, i: usize) -> Option<Match> {
let actual = i.checked_add(self.offset).unwrap();
self.locs.pos(actual).map(|(s, e)| Match::new(s, e))
}
}
impl RegexCaptures {
pub(crate) fn new(locs: CaptureLocations) -> RegexCaptures {
RegexCaptures::with_offset(locs, 0)
}
pub(crate) fn with_offset(
locs: CaptureLocations,
offset: usize,
) -> RegexCaptures {
RegexCaptures { locs, offset }
}
pub(crate) fn locations(&mut self) -> &mut CaptureLocations {
&mut self.locs
}
}
#[cfg(test)]
mod tests {
use grep_matcher::{LineMatchKind, Matcher};
use super::*;
// Test that enabling word matches does the right thing and demonstrate
// the difference between it and surrounding the regex in `\b`.
#[test]
fn word() {
let matcher = RegexMatcherBuilder::new()
.word(true)
.build(r"-2")
.unwrap();
assert!(matcher.is_match(b"abc -2 foo").unwrap());
let matcher = RegexMatcherBuilder::new()
.word(false)
.build(r"\b-2\b")
.unwrap();
assert!(!matcher.is_match(b"abc -2 foo").unwrap());
}
// Test that enabling a line terminator prevents it from matching through
// said line terminator.
#[test]
fn line_terminator() {
// This works, because there's no line terminator specified.
let matcher = RegexMatcherBuilder::new()
.build(r"abc\sxyz")
.unwrap();
assert!(matcher.is_match(b"abc\nxyz").unwrap());
// This doesn't.
let matcher = RegexMatcherBuilder::new()
.line_terminator(Some(b'\n'))
.build(r"abc\sxyz")
.unwrap();
assert!(!matcher.is_match(b"abc\nxyz").unwrap());
}
// Ensure that the builder returns an error if a line terminator is set
// and the regex could not be modified to remove a line terminator.
#[test]
fn line_terminator_error() {
assert!(RegexMatcherBuilder::new()
.line_terminator(Some(b'\n'))
.build(r"a\nz")
.is_err())
}
// Test that enabling CRLF permits `$` to match at the end of a line.
#[test]
fn line_terminator_crlf() {
// Test normal use of `$` with a `\n` line terminator.
let matcher = RegexMatcherBuilder::new()
.multi_line(true)
.build(r"abc$")
.unwrap();
assert!(matcher.is_match(b"abc\n").unwrap());
// Test that `$` doesn't match at `\r\n` boundary normally.
let matcher = RegexMatcherBuilder::new()
.multi_line(true)
.build(r"abc$")
.unwrap();
assert!(!matcher.is_match(b"abc\r\n").unwrap());
// Now check the CRLF handling.
let matcher = RegexMatcherBuilder::new()
.multi_line(true)
.crlf(true)
.build(r"abc$")
.unwrap();
assert!(matcher.is_match(b"abc\r\n").unwrap());
}
// Test that smart case works.
#[test]
fn case_smart() {
let matcher = RegexMatcherBuilder::new()
.case_smart(true)
.build(r"abc")
.unwrap();
assert!(matcher.is_match(b"ABC").unwrap());
let matcher = RegexMatcherBuilder::new()
.case_smart(true)
.build(r"aBc")
.unwrap();
assert!(!matcher.is_match(b"ABC").unwrap());
}
// Test that finding candidate lines works as expected.
#[test]
fn candidate_lines() {
fn is_confirmed(m: LineMatchKind) -> bool {
match m {
LineMatchKind::Confirmed(_) => true,
_ => false,
}
}
fn is_candidate(m: LineMatchKind) -> bool {
match m {
LineMatchKind::Candidate(_) => true,
_ => false,
}
}
// With no line terminator set, we can't employ any optimizations,
// so we get a confirmed match.
let matcher = RegexMatcherBuilder::new()
.build(r"\wfoo\s")
.unwrap();
let m = matcher.find_candidate_line(b"afoo ").unwrap().unwrap();
assert!(is_confirmed(m));
// With a line terminator and a regex specially crafted to have an
// easy-to-detect inner literal, we can apply an optimization that
// quickly finds candidate matches.
let matcher = RegexMatcherBuilder::new()
.line_terminator(Some(b'\n'))
.build(r"\wfoo\s")
.unwrap();
let m = matcher.find_candidate_line(b"afoo ").unwrap().unwrap();
assert!(is_candidate(m));
}
}


@@ -0,0 +1,128 @@
use grep_matcher::ByteSet;
use regex_syntax::hir::{self, Hir, HirKind};
use utf8_ranges::Utf8Sequences;
/// Return a confirmed set of non-matching bytes from the given expression.
pub fn non_matching_bytes(expr: &Hir) -> ByteSet {
let mut set = ByteSet::full();
remove_matching_bytes(expr, &mut set);
set
}
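// A sketch of what this computes, mirroring the tests below: for the
// pattern `a`, every byte other than `a` ends up in the set.
//
//     let expr = regex_syntax::Parser::new().parse("a").unwrap();
//     let set = non_matching_bytes(&expr);
//     assert!(set.contains(b'z'));
//     assert!(!set.contains(b'a'));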
/// Remove any bytes from the given set that can occur in a match produced
/// by the given expression.
fn remove_matching_bytes(
expr: &Hir,
set: &mut ByteSet,
) {
match *expr.kind() {
HirKind::Empty
| HirKind::Anchor(_)
| HirKind::WordBoundary(_) => {}
HirKind::Literal(hir::Literal::Unicode(c)) => {
for &b in c.encode_utf8(&mut [0; 4]).as_bytes() {
set.remove(b);
}
}
HirKind::Literal(hir::Literal::Byte(b)) => {
set.remove(b);
}
HirKind::Class(hir::Class::Unicode(ref cls)) => {
for range in cls.iter() {
// This is presumably faster than encoding every codepoint
// to UTF-8 and then removing those bytes from the set.
for seq in Utf8Sequences::new(range.start(), range.end()) {
for byte_range in seq.as_slice() {
set.remove_all(byte_range.start, byte_range.end);
}
}
}
}
HirKind::Class(hir::Class::Bytes(ref cls)) => {
for range in cls.iter() {
set.remove_all(range.start(), range.end());
}
}
HirKind::Repetition(ref x) => {
remove_matching_bytes(&x.hir, set);
}
HirKind::Group(ref x) => {
remove_matching_bytes(&x.hir, set);
}
HirKind::Concat(ref xs) => {
for x in xs {
remove_matching_bytes(x, set);
}
}
HirKind::Alternation(ref xs) => {
for x in xs {
remove_matching_bytes(x, set);
}
}
}
}
#[cfg(test)]
mod tests {
use grep_matcher::ByteSet;
use regex_syntax::ParserBuilder;
use super::non_matching_bytes;
fn extract(pattern: &str) -> ByteSet {
let expr = ParserBuilder::new()
.allow_invalid_utf8(true)
.build()
.parse(pattern)
.unwrap();
non_matching_bytes(&expr)
}
fn sparse(set: &ByteSet) -> Vec<u8> {
let mut sparse_set = vec![];
for b in (0..256).map(|b| b as u8) {
if set.contains(b) {
sparse_set.push(b);
}
}
sparse_set
}
fn sparse_except(except: &[u8]) -> Vec<u8> {
let mut except_set = vec![false; 256];
for &b in except {
except_set[b as usize] = true;
}
let mut set = vec![];
for b in (0..256).map(|b| b as u8) {
if !except_set[b as usize] {
set.push(b);
}
}
set
}
#[test]
fn dot() {
assert_eq!(sparse(&extract(".")), vec![
b'\n',
192, 193, 245, 246, 247, 248, 249,
250, 251, 252, 253, 254, 255,
]);
assert_eq!(sparse(&extract("(?s).")), vec![
192, 193, 245, 246, 247, 248, 249,
250, 251, 252, 253, 254, 255,
]);
assert_eq!(sparse(&extract("(?-u).")), vec![b'\n']);
assert_eq!(sparse(&extract("(?s-u).")), vec![]);
}
#[test]
fn literal() {
assert_eq!(sparse(&extract("a")), sparse_except(&[b'a']));
assert_eq!(sparse(&extract("☃")), sparse_except(&[0xE2, 0x98, 0x83]));
assert_eq!(sparse(&extract(r"\xFF")), sparse_except(&[0xC3, 0xBF]));
assert_eq!(sparse(&extract(r"(?-u)\xFF")), sparse_except(&[0xFF]));
}
}

154
grep-regex/src/strip.rs Normal file

@@ -0,0 +1,154 @@
use grep_matcher::LineTerminator;
use regex_syntax::hir::{self, Hir, HirKind};
use error::{Error, ErrorKind};
/// Return an HIR that is guaranteed to never match the given line terminator,
/// if possible.
///
/// If the transformation isn't possible, then an error is returned.
///
/// In general, if a literal line terminator occurs anywhere in the HIR, then
/// this will return an error. However, if the line terminator occurs within
/// a character class with at least one other character (that isn't also a line
/// terminator), then the line terminator is simply stripped from that class.
///
/// If the given line terminator is not ASCII, then this function returns an
/// error.
pub fn strip_from_match(
expr: Hir,
line_term: LineTerminator,
) -> Result<Hir, Error> {
if line_term.is_crlf() {
let expr1 = strip_from_match_ascii(expr, b'\r')?;
strip_from_match_ascii(expr1, b'\n')
} else {
let b = line_term.as_byte();
if b > 0x7F {
return Err(Error::new(ErrorKind::InvalidLineTerminator(b)));
}
strip_from_match_ascii(expr, b)
}
}
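// A sketch mirroring the tests below: stripping `\n` from a class keeps
// the rest of the class, while a bare literal `\n` cannot be stripped and
// is an error.
//
//     let hir = regex_syntax::Parser::new().parse(r"[a\n]").unwrap();
//     let stripped = strip_from_match(hir, LineTerminator::byte(b'\n'))?;
//     assert_eq!(stripped.to_string(), "[a]");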
/// The implementation of strip_from_match. The given byte must be ASCII. This
/// function panics otherwise.
fn strip_from_match_ascii(
expr: Hir,
byte: u8,
) -> Result<Hir, Error> {
assert!(byte <= 0x7F);
let chr = byte as char;
assert_eq!(chr.len_utf8(), 1);
let invalid = || Err(Error::new(ErrorKind::NotAllowed(chr.to_string())));
Ok(match expr.into_kind() {
HirKind::Empty => Hir::empty(),
HirKind::Literal(hir::Literal::Unicode(c)) => {
if c == chr {
return invalid();
}
Hir::literal(hir::Literal::Unicode(c))
}
HirKind::Literal(hir::Literal::Byte(b)) => {
if b as char == chr {
return invalid();
}
Hir::literal(hir::Literal::Byte(b))
}
HirKind::Class(hir::Class::Unicode(mut cls)) => {
let remove = hir::ClassUnicode::new(Some(
hir::ClassUnicodeRange::new(chr, chr),
));
cls.difference(&remove);
if cls.ranges().is_empty() {
return invalid();
}
Hir::class(hir::Class::Unicode(cls))
}
HirKind::Class(hir::Class::Bytes(mut cls)) => {
let remove = hir::ClassBytes::new(Some(
hir::ClassBytesRange::new(byte, byte),
));
cls.difference(&remove);
if cls.ranges().is_empty() {
return invalid();
}
Hir::class(hir::Class::Bytes(cls))
}
HirKind::Anchor(x) => Hir::anchor(x),
HirKind::WordBoundary(x) => Hir::word_boundary(x),
HirKind::Repetition(mut x) => {
x.hir = Box::new(strip_from_match_ascii(*x.hir, byte)?);
Hir::repetition(x)
}
HirKind::Group(mut x) => {
x.hir = Box::new(strip_from_match_ascii(*x.hir, byte)?);
Hir::group(x)
}
HirKind::Concat(xs) => {
let xs = xs.into_iter()
.map(|e| strip_from_match_ascii(e, byte))
.collect::<Result<Vec<Hir>, Error>>()?;
Hir::concat(xs)
}
HirKind::Alternation(xs) => {
let xs = xs.into_iter()
.map(|e| strip_from_match_ascii(e, byte))
.collect::<Result<Vec<Hir>, Error>>()?;
Hir::alternation(xs)
}
})
}
#[cfg(test)]
mod tests {
use regex_syntax::Parser;
use error::Error;
use super::{LineTerminator, strip_from_match};
fn roundtrip(pattern: &str, byte: u8) -> String {
roundtrip_line_term(pattern, LineTerminator::byte(byte)).unwrap()
}
fn roundtrip_crlf(pattern: &str) -> String {
roundtrip_line_term(pattern, LineTerminator::crlf()).unwrap()
}
fn roundtrip_err(pattern: &str, byte: u8) -> Result<String, Error> {
roundtrip_line_term(pattern, LineTerminator::byte(byte))
}
fn roundtrip_line_term(
pattern: &str,
line_term: LineTerminator,
) -> Result<String, Error> {
let expr1 = Parser::new().parse(pattern).unwrap();
let expr2 = strip_from_match(expr1, line_term)?;
Ok(expr2.to_string())
}
#[test]
fn various() {
assert_eq!(roundtrip(r"[a\n]", b'\n'), "[a]");
assert_eq!(roundtrip(r"[a\n]", b'a'), "[\n]");
assert_eq!(roundtrip_crlf(r"[a\n]"), "[a]");
assert_eq!(roundtrip_crlf(r"[a\r]"), "[a]");
assert_eq!(roundtrip_crlf(r"[a\r\n]"), "[a]");
assert_eq!(roundtrip(r"(?-u)\s", b'a'), r"(?-u:[\x09-\x0D\x20])");
assert_eq!(roundtrip(r"(?-u)\s", b'\n'), r"(?-u:[\x09\x0B-\x0D\x20])");
assert!(roundtrip_err(r"\n", b'\n').is_err());
assert!(roundtrip_err(r"abc\n", b'\n').is_err());
assert!(roundtrip_err(r"\nabc", b'\n').is_err());
assert!(roundtrip_err(r"abc\nxyz", b'\n').is_err());
assert!(roundtrip_err(r"\x0A", b'\n').is_err());
assert!(roundtrip_err(r"\u000A", b'\n').is_err());
assert!(roundtrip_err(r"\U0000000A", b'\n').is_err());
assert!(roundtrip_err(r"\u{A}", b'\n').is_err());
assert!(roundtrip_err("\n", b'\n').is_err());
}
}

29
grep-regex/src/util.rs Normal file

@@ -0,0 +1,29 @@
/// Converts an arbitrary sequence of bytes to a literal suitable for building
/// a regular expression.
pub fn bytes_to_regex(bs: &[u8]) -> String {
use std::fmt::Write;
use regex_syntax::is_meta_character;
let mut s = String::with_capacity(bs.len());
for &b in bs {
if b <= 0x7F && !is_meta_character(b as char) {
write!(s, r"{}", b as char).unwrap();
} else {
write!(s, r"\x{:02x}", b).unwrap();
}
}
s
}
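// For example (a sketch): plain ASCII passes through, while meta
// characters and non-ASCII bytes are hex escaped.
//
//     assert_eq!(bytes_to_regex(b"a*\xFF"), r"a\x2a\xff");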
/// Converts arbitrary bytes to a nice string.
pub fn show_bytes(bs: &[u8]) -> String {
use std::ascii::escape_default;
use std::str;
let mut nice = String::new();
for &b in bs {
let part: Vec<u8> = escape_default(b).collect();
nice.push_str(str::from_utf8(&part).unwrap());
}
nice
}

196
grep-regex/src/word.rs Normal file

@@ -0,0 +1,196 @@
use std::collections::HashMap;
use std::cell::RefCell;
use std::sync::Arc;
use grep_matcher::{Match, Matcher, NoError};
use regex::bytes::{CaptureLocations, Regex};
use thread_local::CachedThreadLocal;
use config::ConfiguredHIR;
use error::Error;
use matcher::RegexCaptures;
/// A matcher for implementing "word match" semantics.
#[derive(Debug)]
pub struct WordMatcher {
/// The regex which is roughly `(?:^|\W)(<original pattern>)(?:$|\W)`.
regex: Regex,
/// A map from capture group name to capture group index.
names: HashMap<String, usize>,
/// A reusable buffer for finding the match location of the inner group.
locs: Arc<CachedThreadLocal<RefCell<CaptureLocations>>>,
}
impl Clone for WordMatcher {
fn clone(&self) -> WordMatcher {
// We implement Clone manually so that we get a fresh CachedThreadLocal
// such that it can set its own thread owner. This permits each thread
// using `locs` to hit the fast path.
WordMatcher {
regex: self.regex.clone(),
names: self.names.clone(),
locs: Arc::new(CachedThreadLocal::new()),
}
}
}
impl WordMatcher {
/// Create a new matcher from the given pattern that only produces matches
/// that are considered "words."
///
/// The given options are used to construct the regular expression
/// internally.
pub fn new(expr: &ConfiguredHIR) -> Result<WordMatcher, Error> {
let word_expr = expr.with_pattern(|pat| {
format!(r"(?:(?m:^)|\W)({})(?:(?m:$)|\W)", pat)
})?;
let regex = word_expr.regex()?;
let locs = Arc::new(CachedThreadLocal::new());
let mut names = HashMap::new();
for (i, optional_name) in regex.capture_names().enumerate() {
if let Some(name) = optional_name {
names.insert(name.to_string(), i.checked_sub(1).unwrap());
}
}
Ok(WordMatcher { regex, names, locs })
}
}
impl Matcher for WordMatcher {
type Captures = RegexCaptures;
type Error = NoError;
fn find_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, NoError> {
// To make this easy to get right, we extract captures here instead of
// calling `find_at`. The actual match is at capture group `1` instead
// of `0`. We *could* use `find_at` here and then trim the match after
// the fact, but that's a bit harder to get right, and it's not clear
// if it's worth it.
let cell = self.locs.get_or(|| {
Box::new(RefCell::new(self.regex.capture_locations()))
});
let mut caps = cell.borrow_mut();
self.regex.captures_read_at(&mut caps, haystack, at);
Ok(caps.get(1).map(|m| Match::new(m.0, m.1)))
}
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
Ok(RegexCaptures::with_offset(self.regex.capture_locations(), 1))
}
fn capture_count(&self) -> usize {
self.regex.captures_len().checked_sub(1).unwrap()
}
fn capture_index(&self, name: &str) -> Option<usize> {
self.names.get(name).map(|i| *i)
}
fn captures_at(
&self,
haystack: &[u8],
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool, NoError> {
let r = self.regex.captures_read_at(caps.locations(), haystack, at);
Ok(r.is_some())
}
// We specifically do not implement other methods like find_iter or
// captures_iter. Namely, the iter methods are guaranteed to be correct
// by virtue of implementing find_at and captures_at above.
}
#[cfg(test)]
mod tests {
use grep_matcher::{Captures, Match, Matcher};
use config::Config;
use super::WordMatcher;
fn matcher(pattern: &str) -> WordMatcher {
let chir = Config::default().hir(pattern).unwrap();
WordMatcher::new(&chir).unwrap()
}
fn find(pattern: &str, haystack: &str) -> Option<(usize, usize)> {
matcher(pattern)
.find(haystack.as_bytes())
.unwrap()
.map(|m| (m.start(), m.end()))
}
fn find_by_caps(pattern: &str, haystack: &str) -> Option<(usize, usize)> {
let m = matcher(pattern);
let mut caps = m.new_captures().unwrap();
if !m.captures(haystack.as_bytes(), &mut caps).unwrap() {
None
} else {
caps.get(0).map(|m| (m.start(), m.end()))
}
}
// Test that the standard `find` API reports offsets correctly.
#[test]
fn various_find() {
assert_eq!(Some((0, 3)), find(r"foo", "foo"));
assert_eq!(Some((0, 3)), find(r"foo", "foo("));
assert_eq!(Some((1, 4)), find(r"foo", "!foo("));
assert_eq!(None, find(r"foo", "!afoo("));
assert_eq!(Some((0, 3)), find(r"foo", "foo☃"));
assert_eq!(None, find(r"foo", "fooб"));
// assert_eq!(Some((0, 3)), find(r"foo", "fooб"));
// See: https://github.com/BurntSushi/ripgrep/issues/389
assert_eq!(Some((0, 2)), find(r"-2", "-2"));
}
// Test that the captures API also reports offsets correctly, just as
// find does. This exercises a different path in the code since captures
// are handled differently.
#[test]
fn various_captures() {
assert_eq!(Some((0, 3)), find_by_caps(r"foo", "foo"));
assert_eq!(Some((0, 3)), find_by_caps(r"foo", "foo("));
assert_eq!(Some((1, 4)), find_by_caps(r"foo", "!foo("));
assert_eq!(None, find_by_caps(r"foo", "!afoo("));
assert_eq!(Some((0, 3)), find_by_caps(r"foo", "foo☃"));
assert_eq!(None, find_by_caps(r"foo", "fooб"));
// assert_eq!(Some((0, 3)), find_by_caps(r"foo", "fooб"));
// See: https://github.com/BurntSushi/ripgrep/issues/389
assert_eq!(Some((0, 2)), find_by_caps(r"-2", "-2"));
}
// Test that the capture reporting methods work as advertised.
#[test]
fn capture_indexing() {
let m = matcher(r"(a)(?P<foo>b)(c)");
assert_eq!(4, m.capture_count());
assert_eq!(Some(2), m.capture_index("foo"));
let mut caps = m.new_captures().unwrap();
assert_eq!(4, caps.len());
assert!(m.captures(b"abc", &mut caps).unwrap());
assert_eq!(caps.get(0), Some(Match::new(0, 3)));
assert_eq!(caps.get(1), Some(Match::new(0, 1)));
assert_eq!(caps.get(2), Some(Match::new(1, 2)));
assert_eq!(caps.get(3), Some(Match::new(2, 3)));
assert_eq!(caps.get(4), None);
assert!(m.captures(b"#abc#", &mut caps).unwrap());
assert_eq!(caps.get(0), Some(Match::new(1, 4)));
assert_eq!(caps.get(1), Some(Match::new(1, 2)));
assert_eq!(caps.get(2), Some(Match::new(2, 3)));
assert_eq!(caps.get(3), Some(Match::new(3, 4)));
assert_eq!(caps.get(4), None);
}
}

35
grep-searcher/Cargo.toml Normal file

@@ -0,0 +1,35 @@
[package]
name = "grep-searcher"
version = "0.0.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Fast line oriented regex searching as a library.
"""
documentation = "https://docs.rs/grep-searcher"
homepage = "https://github.com/BurntSushi/ripgrep"
repository = "https://github.com/BurntSushi/ripgrep"
readme = "README.md"
keywords = ["regex", "grep", "egrep", "search", "pattern"]
license = "Unlicense/MIT"
[dependencies]
bytecount = "0.3.1"
encoding_rs = "0.8"
encoding_rs_io = "0.1"
grep-matcher = { version = "0.0.1", path = "../grep-matcher" }
log = "0.4"
memchr = "2"
memmap = "0.6"
[dev-dependencies]
grep-regex = { version = "0.0.1", path = "../grep-regex" }
regex = "1"
[features]
avx-accel = [
"bytecount/avx-accel",
]
simd-accel = [
"bytecount/simd-accel",
"encoding_rs/simd-accel",
]
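# A usage note (not part of the original manifest): consumers can opt into
# the accelerated paths with, e.g., `cargo build --features simd-accel`;
# the underlying crates may require a nightly compiler for this.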

21
grep-searcher/LICENSE-MIT Normal file

@@ -0,0 +1,21 @@
The MIT License (MIT)
Copyright (c) 2015 Andrew Gallant
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

4
grep-searcher/README.md Normal file

@@ -0,0 +1,4 @@
grep
----
This is a *library* that provides grep-style line-by-line regex searching (with
comparable performance to `grep` itself).

24
grep-searcher/UNLICENSE Normal file

@@ -0,0 +1,24 @@
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.
In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
For more information, please refer to <http://unlicense.org/>


@@ -0,0 +1,33 @@
extern crate grep_regex;
extern crate grep_searcher;
use std::env;
use std::error::Error;
use std::io;
use std::process;
use grep_regex::RegexMatcher;
use grep_searcher::Searcher;
use grep_searcher::sinks::UTF8;
fn main() {
if let Err(err) = example() {
eprintln!("{}", err);
process::exit(1);
}
}
fn example() -> Result<(), Box<Error>> {
let pattern = match env::args().nth(1) {
Some(pattern) => pattern,
None => return Err(From::from(format!(
"Usage: search-stdin <pattern>"
))),
};
let matcher = RegexMatcher::new(&pattern)?;
Searcher::new().search_reader(&matcher, io::stdin(), UTF8(|lnum, line| {
print!("{}:{}", lnum, line);
Ok(true)
}))?;
Ok(())
}
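// A usage sketch (assuming this example is saved as
// examples/search-stdin.rs, as referenced by the crate docs):
//
//     $ echo 'hello world' | cargo run --example search-stdin 'h\w+'
//     1:hello world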

135
grep-searcher/src/lib.rs Normal file

@@ -0,0 +1,135 @@
/*!
This crate provides an implementation of line oriented search, with optional
support for multi-line search.
# Brief overview
The principal type in this crate is a
[`Searcher`](struct.Searcher.html),
which can be configured and built by a
[`SearcherBuilder`](struct.SearcherBuilder.html).
A `Searcher` is responsible for reading bytes from a source (e.g., a file),
executing a search of those bytes using a `Matcher` (e.g., a regex) and then
reporting the results of that search to a
[`Sink`](trait.Sink.html)
(e.g., stdout). The `Searcher` itself is principally responsible for managing
the consumption of bytes from a source and applying a `Matcher` over those
bytes in an efficient way. The `Searcher` is also responsible for inverting
a search, counting lines, reporting contextual lines, detecting binary data
and even deciding whether or not to use memory maps.
A `Matcher` (which is defined in the
[`grep-matcher`](https://crates.io/crates/grep-matcher)
crate) is a trait for describing the lowest levels of pattern search in a
generic way. The interface itself is very similar to the interface of a regular
expression. For example, the
[`grep-regex`](https://crates.io/crates/grep-regex)
crate provides an implementation of the `Matcher` trait using Rust's
[`regex`](https://crates.io/crates/regex)
crate.
Finally, a `Sink` describes how callers receive search results produced by a
`Searcher`. This includes routines that are called at the beginning and end of
a search, in addition to routines that are called when matching or contextual
lines are found by the `Searcher`. Implementations of `Sink` can be trivially
simple, or extraordinarily complex, such as the
`Standard` printer found in the
[`grep-printer`](https://crates.io/crates/grep-printer)
crate, which effectively implements grep-like output.
This crate also provides convenience `Sink` implementations in the
[`sinks`](sinks/index.html)
sub-module for easy searching with closures.
# Example
This example shows how to execute the searcher and read the search results
using the
[`UTF8`](sinks/struct.UTF8.html)
implementation of `Sink`.
```
extern crate grep_matcher;
extern crate grep_regex;
extern crate grep_searcher;
use std::error::Error;
use grep_matcher::Matcher;
use grep_regex::RegexMatcher;
use grep_searcher::Searcher;
use grep_searcher::sinks::UTF8;
const SHERLOCK: &'static [u8] = b"\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.
";
# fn main() { example().unwrap() }
fn example() -> Result<(), Box<Error>> {
let matcher = RegexMatcher::new(r"Doctor \w+")?;
let mut matches: Vec<(u64, String)> = vec![];
Searcher::new().search_slice(&matcher, SHERLOCK, UTF8(|lnum, line| {
// We are guaranteed to find a match, so the unwrap is OK.
eprintln!("LINE: {:?}", line);
let mymatch = matcher.find(line.as_bytes())?.unwrap();
matches.push((lnum, line[mymatch].to_string()));
Ok(true)
}))?;
eprintln!("MATCHES: {:?}", matches);
assert_eq!(matches.len(), 2);
assert_eq!(
matches[0],
(1, "Doctor Watsons".to_string())
);
assert_eq!(
matches[1],
(5, "Doctor Watson".to_string())
);
Ok(())
}
```
See also `examples/search-stdin.rs` from the root of this crate's directory
to see a similar example that accepts a pattern on the command line and
searches stdin.
*/
#![deny(missing_docs)]
extern crate bytecount;
extern crate encoding_rs;
extern crate encoding_rs_io;
extern crate grep_matcher;
#[macro_use]
extern crate log;
extern crate memchr;
extern crate memmap;
#[cfg(test)]
extern crate regex;
pub use lines::{LineIter, LineStep};
pub use searcher::{
BinaryDetection, ConfigError, Encoding, MmapChoice,
Searcher, SearcherBuilder,
};
pub use sink::{
Sink, SinkError,
SinkContext, SinkContextKind, SinkFinish, SinkMatch,
};
pub use sink::sinks;
#[macro_use]
mod macros;
mod line_buffer;
mod lines;
mod searcher;
mod sink;
#[cfg(test)]
mod testutil;


@@ -0,0 +1,950 @@
use std::cmp;
use std::io;
use std::ptr;
use memchr::{memchr, memrchr};
/// The default buffer capacity that we use for the line buffer.
pub(crate) const DEFAULT_BUFFER_CAPACITY: usize = 8 * (1<<10); // 8 KB
/// The behavior of a searcher in the face of long lines and big contexts.
///
/// When searching data incrementally using a fixed size buffer, this controls
/// the amount of *additional* memory to allocate beyond the size of the buffer
/// to accommodate lines (which may include the lines in a context window, when
/// enabled) that do not fit in the buffer.
///
/// The default is to eagerly allocate without a limit.
#[derive(Clone, Copy, Debug)]
pub enum BufferAllocation {
/// Attempt to expand the size of the buffer until either at least the next
/// line fits into memory or until all available memory is exhausted.
///
/// This is the default.
Eager,
/// Limit the amount of additional memory allocated to the given size. If
/// a line is found that requires more memory than is allowed here, then
/// stop reading and return an error.
Error(usize),
}
impl Default for BufferAllocation {
fn default() -> BufferAllocation {
BufferAllocation::Eager
}
}
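// A sketch of choosing a strategy: callers that want a hard ceiling on
// additional allocation can use `Error` (builder defined later in this
// file):
//
//     let mut linebuf = LineBufferBuilder::new()
//         .buffer_alloc(BufferAllocation::Error(10 * (1 << 10)))
//         .build();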
/// Create a new error to be used when a configured allocation limit has been
/// reached.
pub fn alloc_error(limit: usize) -> io::Error {
let msg = format!("configured allocation limit ({}) exceeded", limit);
io::Error::new(io::ErrorKind::Other, msg)
}
/// The behavior of binary detection in the line buffer.
///
/// Binary detection is the process of _heuristically_ identifying whether a
/// given chunk of data is binary or not, and then taking an action based on
/// the result of that heuristic. The motivation behind detecting binary data
/// is that binary data often indicates data that is undesirable to search
/// using textual patterns. Of course, there are many cases in which this isn't
/// true, which is why binary detection is disabled by default.
#[derive(Clone, Copy, Debug)]
pub enum BinaryDetection {
/// No binary detection is performed. Data reported by the line buffer may
/// contain arbitrary bytes.
None,
/// The given byte is searched in all contents read by the line buffer. If
/// it occurs, then the data is considered binary and the line buffer acts
/// as if it reached EOF. The line buffer guarantees that this byte will
/// never be observable by callers.
Quit(u8),
/// The given byte is searched in all contents read by the line buffer. If
/// it occurs, then it is replaced by the line terminator. The line buffer
/// guarantees that this byte will never be observable by callers.
Convert(u8),
}
impl Default for BinaryDetection {
fn default() -> BinaryDetection {
BinaryDetection::None
}
}
impl BinaryDetection {
/// Returns true if and only if the detection heuristic demands that
/// the line buffer stop reading data once binary data is observed.
fn is_quit(&self) -> bool {
match *self {
BinaryDetection::Quit(_) => true,
_ => false,
}
}
}
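// For example (a sketch): treat NUL bytes as binary, but convert them to
// the line terminator so that searching can continue:
//
//     let mut linebuf = LineBufferBuilder::new()
//         .binary_detection(BinaryDetection::Convert(b'\x00'))
//         .build();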
/// The configuration of a buffer. This contains options that are fixed once
/// a buffer has been constructed.
#[derive(Clone, Copy, Debug)]
struct Config {
/// The number of bytes to attempt to read at a time.
capacity: usize,
/// The line terminator.
lineterm: u8,
/// The behavior for handling long lines.
buffer_alloc: BufferAllocation,
/// When set, the presence of the given byte indicates binary content.
binary: BinaryDetection,
}
impl Default for Config {
fn default() -> Config {
Config {
capacity: DEFAULT_BUFFER_CAPACITY,
lineterm: b'\n',
buffer_alloc: BufferAllocation::default(),
binary: BinaryDetection::default(),
}
}
}
/// A builder for constructing line buffers.
#[derive(Clone, Debug, Default)]
pub struct LineBufferBuilder {
config: Config,
}
impl LineBufferBuilder {
/// Create a new builder for a buffer.
pub fn new() -> LineBufferBuilder {
LineBufferBuilder { config: Config::default() }
}
/// Create a new line buffer from this builder's configuration.
pub fn build(&self) -> LineBuffer {
LineBuffer {
config: self.config,
buf: vec![0; self.config.capacity],
pos: 0,
last_lineterm: 0,
end: 0,
absolute_byte_offset: 0,
binary_byte_offset: None,
}
}
/// Set the default capacity to use for a buffer.
///
/// In general, the capacity of a buffer corresponds to the amount of data
/// to hold in memory, and the size of the reads to make to the underlying
/// reader.
///
/// This is set to a reasonable default and probably shouldn't be changed
/// unless there's a specific reason to do so.
pub fn capacity(&mut self, capacity: usize) -> &mut LineBufferBuilder {
self.config.capacity = capacity;
self
}
/// Set the line terminator for the buffer.
///
/// Every buffer has a line terminator, and this line terminator is used
/// to determine how to roll the buffer forward. For example, when a read
/// to the buffer's underlying reader occurs, the end of the data that is
/// read is likely to correspond to an incomplete line. As a line buffer,
/// callers should not access this data since it is incomplete. The line
/// terminator is how the line buffer determines the part of the read that
/// is incomplete.
///
/// By default, this is set to `b'\n'`.
pub fn line_terminator(&mut self, lineterm: u8) -> &mut LineBufferBuilder {
self.config.lineterm = lineterm;
self
}
/// Set the maximum amount of additional memory to allocate for long lines.
///
/// In order to enable line oriented search, a fundamental requirement is
/// that, at a minimum, each line must be able to fit into memory. This
/// setting controls how big that line is allowed to be. By default, this
/// is set to `BufferAllocation::Eager`, which means a line buffer will
/// attempt to allocate as much memory as possible to fit a line, and will
/// only be limited by available memory.
///
/// Note that this setting only applies to the amount of *additional*
/// memory to allocate, beyond the capacity of the buffer. That means that
/// a value of `0` is sensible, and in particular, will guarantee that a
/// line buffer will never allocate additional memory beyond its initial
/// capacity.
pub fn buffer_alloc(
&mut self,
behavior: BufferAllocation,
) -> &mut LineBufferBuilder {
self.config.buffer_alloc = behavior;
self
}
/// Whether to enable binary detection or not. Depending on the setting,
/// this can either cause the line buffer to report EOF early or it can
/// cause the line buffer to clean the data.
///
/// By default, this is disabled. In general, binary detection should be
/// viewed as an imperfect heuristic.
pub fn binary_detection(
&mut self,
detection: BinaryDetection,
) -> &mut LineBufferBuilder {
self.config.binary = detection;
self
}
}
/// A line buffer reader efficiently reads a line oriented buffer from an
/// arbitrary reader.
#[derive(Debug)]
pub struct LineBufferReader<'b, R> {
rdr: R,
line_buffer: &'b mut LineBuffer,
}
impl<'b, R: io::Read> LineBufferReader<'b, R> {
/// Create a new buffered reader that reads from `rdr` and uses the given
/// `line_buffer` as an intermediate buffer.
///
/// This does not change the binary detection behavior of the given line
/// buffer.
pub fn new(
rdr: R,
line_buffer: &'b mut LineBuffer,
) -> LineBufferReader<'b, R> {
line_buffer.clear();
LineBufferReader { rdr, line_buffer }
}
/// The absolute byte offset which corresponds to the starting offsets
/// of the data returned by `buffer` relative to the beginning of the
/// underlying reader's contents. As such, this offset does not generally
/// correspond to an offset in memory. It is typically used for reporting
/// purposes. It can also be used for counting the number of bytes that
/// have been searched.
pub fn absolute_byte_offset(&self) -> u64 {
self.line_buffer.absolute_byte_offset()
}
/// If binary data was detected, then this returns the absolute byte offset
/// at which binary data was initially found.
pub fn binary_byte_offset(&self) -> Option<u64> {
self.line_buffer.binary_byte_offset()
}
/// Fill the contents of this buffer by discarding the part of the buffer
/// that has been consumed. The free space created by discarding the
/// consumed part of the buffer is then filled with new data from the
/// reader.
///
/// If EOF is reached, then `false` is returned. Otherwise, `true` is
/// returned. (Note that if this line buffer's binary detection is set to
/// `Quit`, then the presence of binary data will cause this buffer to
/// behave as if it had seen EOF at the first occurrence of binary data.)
///
/// This forwards any errors returned by the underlying reader, and will
/// also return an error if the buffer must be expanded past its allocation
/// limit, as governed by the buffer allocation strategy.
pub fn fill(&mut self) -> Result<bool, io::Error> {
self.line_buffer.fill(&mut self.rdr)
}
/// Return the contents of this buffer.
pub fn buffer(&self) -> &[u8] {
self.line_buffer.buffer()
}
/// Consume the number of bytes provided. This must be less than or equal
/// to the number of bytes returned by `buffer`.
pub fn consume(&mut self, amt: usize) {
self.line_buffer.consume(amt);
}
/// Consumes the remainder of the buffer. Subsequent calls to `buffer` are
/// guaranteed to return an empty slice until the buffer is refilled.
///
/// This is a convenience function for `consume(buffer.len())`.
#[cfg(test)]
fn consume_all(&mut self) {
self.line_buffer.consume_all();
}
}
/// A line buffer manages a (typically fixed) buffer for holding lines.
///
/// Callers should create line buffers sparingly and reuse them when possible.
/// Line buffers cannot be used directly, but instead must be used via the
/// `LineBufferReader`.
#[derive(Clone, Debug)]
pub struct LineBuffer {
/// The configuration of this buffer.
config: Config,
/// The primary buffer with which to hold data.
buf: Vec<u8>,
/// The current position of this buffer. This is always a valid sliceable
/// index into `buf`, and its maximum value is the length of `buf`.
pos: usize,
/// The end position of searchable content in this buffer. This is either
/// set to just after the final line terminator in the buffer, or to just
/// after the end of the last byte emitted by the reader when the reader
/// has been exhausted.
last_lineterm: usize,
/// The end position of the buffer. This is always greater than or equal
/// to `last_lineterm`. The bytes between `last_lineterm` and `end`, if
/// any, always correspond to a partial line.
end: usize,
/// The absolute byte offset corresponding to `pos`. This is most typically
/// not a valid index into addressable memory, but rather, an offset that
/// is relative to all data that passes through a line buffer (since
/// construction or since the last time `clear` was called).
///
/// When the line buffer reaches EOF, this is set to the position just
/// after the last byte read from the underlying reader. That is, it
/// becomes the total count of bytes that have been read.
absolute_byte_offset: u64,
/// If binary data was found, this records the absolute byte offset at
/// which it was first detected.
binary_byte_offset: Option<u64>,
}
impl LineBuffer {
/// Reset this buffer, such that it can be used with a new reader.
fn clear(&mut self) {
self.pos = 0;
self.last_lineterm = 0;
self.end = 0;
self.absolute_byte_offset = 0;
self.binary_byte_offset = None;
}
/// The absolute byte offset corresponding to the start of the data
/// returned by `buffer`, relative to the beginning of the reader's
/// contents. As such, this offset does not generally correspond to an
/// offset in memory. It is typically used for reporting purposes,
/// particularly in error messages.
///
/// This is reset to `0` when `clear` is called.
fn absolute_byte_offset(&self) -> u64 {
self.absolute_byte_offset
}
/// If binary data was detected, then this returns the absolute byte offset
/// at which binary data was initially found.
fn binary_byte_offset(&self) -> Option<u64> {
self.binary_byte_offset
}
/// Return the contents of this buffer.
fn buffer(&self) -> &[u8] {
&self.buf[self.pos..self.last_lineterm]
}
/// Return the contents of the free space beyond the end of the buffer as
/// a mutable slice.
fn free_buffer(&mut self) -> &mut [u8] {
&mut self.buf[self.end..]
}
/// Consume the number of bytes provided. This must be less than or equal
/// to the number of bytes returned by `buffer`.
fn consume(&mut self, amt: usize) {
assert!(amt <= self.buffer().len());
self.pos += amt;
self.absolute_byte_offset += amt as u64;
}
/// Consumes the remainder of the buffer. Subsequent calls to `buffer` are
/// guaranteed to return an empty slice until the buffer is refilled.
///
/// This is a convenience function for `consume(buffer.len())`.
#[cfg(test)]
fn consume_all(&mut self) {
let amt = self.buffer().len();
self.consume(amt);
}
/// Fill the contents of this buffer by discarding the part of the buffer
/// that has been consumed. The free space created by discarding the
/// consumed part of the buffer is then filled with new data from the given
/// reader.
///
/// Callers should provide the same reader to this line buffer in
/// subsequent calls to fill. A different reader can only be used
/// immediately following a call to `clear`.
///
/// If EOF is reached, then `false` is returned. Otherwise, `true` is
/// returned. (Note that if this line buffer's binary detection is set to
/// `Quit`, then the presence of binary data will cause this buffer to
/// behave as if it had seen EOF.)
///
/// This forwards any errors returned by `rdr`, and will also return an
/// error if the buffer must be expanded past its allocation limit, as
/// governed by the buffer allocation strategy.
fn fill<R: io::Read>(&mut self, mut rdr: R) -> Result<bool, io::Error> {
// If the binary detection heuristic tells us to quit once binary data
// has been observed, then we no longer read new data and reach EOF
// once the current buffer has been consumed.
if self.config.binary.is_quit() && self.binary_byte_offset.is_some() {
return Ok(!self.buffer().is_empty());
}
self.roll();
assert_eq!(self.pos, 0);
loop {
self.ensure_capacity()?;
let readlen = rdr.read(self.free_buffer())?;
if readlen == 0 {
// We're only done reading for good once the caller has
// consumed everything.
self.last_lineterm = self.end;
return Ok(!self.buffer().is_empty());
}
// Get a mutable view into the bytes we've just read. These are
// the bytes that we do binary detection on, and also the bytes we
// search to find the last line terminator. We need a mutable slice
// in the case of binary conversion.
let oldend = self.end;
self.end += readlen;
let newbytes = &mut self.buf[oldend..self.end];
// Binary detection.
match self.config.binary {
BinaryDetection::None => {} // nothing to do
BinaryDetection::Quit(byte) => {
if let Some(i) = memchr(byte, newbytes) {
self.end = oldend + i;
self.last_lineterm = self.end;
self.binary_byte_offset =
Some(self.absolute_byte_offset + self.end as u64);
// If the first byte in our buffer is a binary byte,
// then our buffer is empty and we should report as
// such to the caller.
return Ok(self.pos < self.end);
}
}
BinaryDetection::Convert(byte) => {
if let Some(i) = replace_bytes(
newbytes,
byte,
self.config.lineterm,
) {
// Record only the first binary offset.
if self.binary_byte_offset.is_none() {
self.binary_byte_offset =
Some(self.absolute_byte_offset
+ (oldend + i) as u64);
}
}
}
}
// Update our `last_lineterm` positions if we read one.
if let Some(i) = memrchr(self.config.lineterm, newbytes) {
self.last_lineterm = oldend + i + 1;
return Ok(true);
}
// At this point, if we couldn't find a line terminator, then we
// don't have a complete line. Therefore, we try to read more!
}
}
/// Roll the unconsumed parts of the buffer to the front.
///
/// This operation is idempotent.
///
/// After rolling, `last_lineterm` and `end` point to the same location,
/// and `pos` is always set to `0`.
fn roll(&mut self) {
if self.pos == self.end {
self.pos = 0;
self.last_lineterm = 0;
self.end = 0;
return;
}
assert!(self.pos < self.end && self.end <= self.buf.len());
let roll_len = self.end - self.pos;
unsafe {
// SAFETY: A buffer contains Copy data, so there's no problem
// moving it around. Safety also depends on our indices being
// in bounds, which they should always be, and we enforce with
// an assert above.
//
// TODO: It seems like it should be possible to do this in safe
// code that results in the same codegen.
ptr::copy(
self.buf[self.pos..].as_ptr(),
self.buf.as_mut_ptr(),
roll_len,
);
}
self.pos = 0;
self.last_lineterm = roll_len;
self.end = self.last_lineterm;
}
/// Ensures that the internal buffer has a non-zero amount of free space
/// in which to read more data. If there is no free space, then more is
/// allocated. If the allocation must exceed the configured limit, then
/// this returns an error.
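///
/// For example, with an initial capacity of `1` and
/// `BufferAllocation::Error(5)`, the buffer may grow to at most
/// `1 + 5 = 6` bytes before `fill` reports an error (see the
/// `buffer_limited_capacity*` tests below).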
fn ensure_capacity(&mut self) -> Result<(), io::Error> {
if !self.free_buffer().is_empty() {
return Ok(());
}
// `len` is used for computing the next allocation size. The capacity
// is permitted to start at `0`, so we make sure it's at least `1`.
let len = cmp::max(1, self.buf.len());
let additional = match self.config.buffer_alloc {
BufferAllocation::Eager => len * 2,
BufferAllocation::Error(limit) => {
let used = self.buf.len() - self.config.capacity;
let n = cmp::min(len * 2, limit - used);
if n == 0 {
return Err(alloc_error(self.config.capacity + limit));
}
n
}
};
assert!(additional > 0);
let newlen = self.buf.len() + additional;
self.buf.resize(newlen, 0);
assert!(!self.free_buffer().is_empty());
Ok(())
}
}
/// Replaces every occurrence of `src` in `bytes` with `replacement`,
/// returning the offset of the first replacement, if any.
fn replace_bytes(bytes: &mut [u8], src: u8, replacement: u8) -> Option<usize> {
if src == replacement {
return None;
}
let mut first_pos = None;
let mut pos = 0;
while let Some(i) = memchr(src, &bytes[pos..]).map(|i| pos + i) {
if first_pos.is_none() {
first_pos = Some(i);
}
bytes[i] = replacement;
pos = i + 1;
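// A run of `src` bytes may immediately follow the one we just
// replaced; handle the run here so the outer `memchr` restarts past it.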
while bytes.get(pos) == Some(&src) {
bytes[pos] = replacement;
pos += 1;
}
}
first_pos
}
#[cfg(test)]
mod tests {
use std::str;
use super::*;
const SHERLOCK: &'static str = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.\
";
fn s(slice: &str) -> String {
slice.to_string()
}
fn btos(slice: &[u8]) -> &str {
str::from_utf8(slice).unwrap()
}
fn replace_str(
slice: &str,
src: u8,
replacement: u8,
) -> (String, Option<usize>) {
let mut dst = slice.to_string().into_bytes();
let result = replace_bytes(&mut dst, src, replacement);
(String::from_utf8(dst).unwrap(), result)
}
#[test]
fn replace() {
assert_eq!(replace_str("abc", b'b', b'z'), (s("azc"), Some(1)));
assert_eq!(replace_str("abb", b'b', b'z'), (s("azz"), Some(1)));
assert_eq!(replace_str("aba", b'a', b'z'), (s("zbz"), Some(0)));
assert_eq!(replace_str("bbb", b'b', b'z'), (s("zzz"), Some(0)));
assert_eq!(replace_str("bac", b'b', b'z'), (s("zac"), Some(0)));
}
#[test]
fn buffer_basics1() {
let bytes = "homer\nlisa\nmaggie";
let mut linebuf = LineBufferBuilder::new().build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.buffer().is_empty());
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "homer\nlisa\n");
assert_eq!(rdr.absolute_byte_offset(), 0);
rdr.consume(5);
assert_eq!(rdr.absolute_byte_offset(), 5);
rdr.consume_all();
assert_eq!(rdr.absolute_byte_offset(), 11);
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "maggie");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), None);
}
#[test]
fn buffer_basics2() {
let bytes = "homer\nlisa\nmaggie\n";
let mut linebuf = LineBufferBuilder::new().build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), None);
}
#[test]
fn buffer_basics3() {
let bytes = "\n";
let mut linebuf = LineBufferBuilder::new().build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "\n");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), None);
}
#[test]
fn buffer_basics4() {
let bytes = "\n\n";
let mut linebuf = LineBufferBuilder::new().build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "\n\n");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), None);
}
#[test]
fn buffer_empty() {
let bytes = "";
let mut linebuf = LineBufferBuilder::new().build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), None);
}
#[test]
fn buffer_zero_capacity() {
let bytes = "homer\nlisa\nmaggie";
let mut linebuf = LineBufferBuilder::new().capacity(0).build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
while rdr.fill().unwrap() {
rdr.consume_all();
}
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), None);
}
#[test]
fn buffer_small_capacity() {
let bytes = "homer\nlisa\nmaggie";
let mut linebuf = LineBufferBuilder::new().capacity(1).build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
let mut got = vec![];
while rdr.fill().unwrap() {
got.extend(rdr.buffer());
rdr.consume_all();
}
assert_eq!(bytes, btos(&got));
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), None);
}
#[test]
fn buffer_limited_capacity1() {
let bytes = "homer\nlisa\nmaggie";
let mut linebuf = LineBufferBuilder::new()
.capacity(1)
.buffer_alloc(BufferAllocation::Error(5))
.build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "homer\n");
rdr.consume_all();
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "lisa\n");
rdr.consume_all();
// This returns an error because while we have just enough room to
// store maggie in the buffer, we *don't* have enough room to read one
// more byte, so we don't know whether we're at EOF or not, and
// therefore must give up.
assert!(rdr.fill().is_err());
// We can mush on though!
assert_eq!(btos(rdr.buffer()), "m");
rdr.consume_all();
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "aggie");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
}
#[test]
fn buffer_limited_capacity2() {
let bytes = "homer\nlisa\nmaggie";
let mut linebuf = LineBufferBuilder::new()
.capacity(1)
.buffer_alloc(BufferAllocation::Error(6))
.build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "homer\n");
rdr.consume_all();
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "lisa\n");
rdr.consume_all();
// We have just enough space.
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "maggie");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
}
#[test]
fn buffer_limited_capacity3() {
let bytes = "homer\nlisa\nmaggie";
let mut linebuf = LineBufferBuilder::new()
.capacity(1)
.buffer_alloc(BufferAllocation::Error(0))
.build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.fill().is_err());
assert_eq!(btos(rdr.buffer()), "");
}
#[test]
fn buffer_binary_none() {
let bytes = "homer\nli\x00sa\nmaggie\n";
let mut linebuf = LineBufferBuilder::new().build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.buffer().is_empty());
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "homer\nli\x00sa\nmaggie\n");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), None);
}
#[test]
fn buffer_binary_quit1() {
let bytes = "homer\nli\x00sa\nmaggie\n";
let mut linebuf = LineBufferBuilder::new()
.binary_detection(BinaryDetection::Quit(b'\x00'))
.build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.buffer().is_empty());
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "homer\nli");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), 8);
assert_eq!(rdr.binary_byte_offset(), Some(8));
}
#[test]
fn buffer_binary_quit2() {
let bytes = "\x00homer\nlisa\nmaggie\n";
let mut linebuf = LineBufferBuilder::new()
.binary_detection(BinaryDetection::Quit(b'\x00'))
.build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(!rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "");
assert_eq!(rdr.absolute_byte_offset(), 0);
assert_eq!(rdr.binary_byte_offset(), Some(0));
}
#[test]
fn buffer_binary_quit3() {
let bytes = "homer\nlisa\nmaggie\n\x00";
let mut linebuf = LineBufferBuilder::new()
.binary_detection(BinaryDetection::Quit(b'\x00'))
.build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.buffer().is_empty());
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64 - 1);
assert_eq!(rdr.binary_byte_offset(), Some(bytes.len() as u64 - 1));
}
#[test]
fn buffer_binary_quit4() {
let bytes = "homer\nlisa\nmaggie\x00\n";
let mut linebuf = LineBufferBuilder::new()
.binary_detection(BinaryDetection::Quit(b'\x00'))
.build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.buffer().is_empty());
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64 - 2);
assert_eq!(rdr.binary_byte_offset(), Some(bytes.len() as u64 - 2));
}
#[test]
fn buffer_binary_quit5() {
let mut linebuf = LineBufferBuilder::new()
.binary_detection(BinaryDetection::Quit(b'u'))
.build();
let mut rdr = LineBufferReader::new(SHERLOCK.as_bytes(), &mut linebuf);
assert!(rdr.buffer().is_empty());
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, s\
");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), 76);
assert_eq!(rdr.binary_byte_offset(), Some(76));
assert_eq!(SHERLOCK.as_bytes()[76], b'u');
}
#[test]
fn buffer_binary_convert1() {
let bytes = "homer\nli\x00sa\nmaggie\n";
let mut linebuf = LineBufferBuilder::new()
.binary_detection(BinaryDetection::Convert(b'\x00'))
.build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.buffer().is_empty());
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "homer\nli\nsa\nmaggie\n");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), Some(8));
}
#[test]
fn buffer_binary_convert2() {
let bytes = "\x00homer\nlisa\nmaggie\n";
let mut linebuf = LineBufferBuilder::new()
.binary_detection(BinaryDetection::Convert(b'\x00'))
.build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.buffer().is_empty());
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "\nhomer\nlisa\nmaggie\n");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), Some(0));
}
#[test]
fn buffer_binary_convert3() {
let bytes = "homer\nlisa\nmaggie\n\x00";
let mut linebuf = LineBufferBuilder::new()
.binary_detection(BinaryDetection::Convert(b'\x00'))
.build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.buffer().is_empty());
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n\n");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), Some(bytes.len() as u64 - 1));
}
#[test]
fn buffer_binary_convert4() {
let bytes = "homer\nlisa\nmaggie\x00\n";
let mut linebuf = LineBufferBuilder::new()
.binary_detection(BinaryDetection::Convert(b'\x00'))
.build();
let mut rdr = LineBufferReader::new(bytes.as_bytes(), &mut linebuf);
assert!(rdr.buffer().is_empty());
assert!(rdr.fill().unwrap());
assert_eq!(btos(rdr.buffer()), "homer\nlisa\nmaggie\n\n");
rdr.consume_all();
assert!(!rdr.fill().unwrap());
assert_eq!(rdr.absolute_byte_offset(), bytes.len() as u64);
assert_eq!(rdr.binary_byte_offset(), Some(bytes.len() as u64 - 2));
}
}

grep-searcher/src/lines.rs Normal file

@@ -0,0 +1,453 @@
/*!
A collection of routines for performing operations on lines.
*/
use bytecount;
use memchr::{memchr, memrchr};
use grep_matcher::{LineTerminator, Match};
/// An iterator over lines in a particular slice of bytes.
///
/// Line terminators are considered part of the line they terminate. All lines
/// yielded by the iterator are guaranteed to be non-empty.
///
/// `'b` refers to the lifetime of the underlying bytes.
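///
/// # Example
///
/// An illustrative sketch (grounded in the `line_iter` test below):
///
/// ```ignore
/// let mut it = LineIter::new(b'\n', b"abc\nxyz");
/// assert_eq!(it.next(), Some(&b"abc\n"[..]));
/// assert_eq!(it.next(), Some(&b"xyz"[..]));
/// assert_eq!(it.next(), None);
/// ```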
#[derive(Debug)]
pub struct LineIter<'b> {
bytes: &'b [u8],
stepper: LineStep,
}
impl<'b> LineIter<'b> {
/// Create a new line iterator that yields lines in the given bytes that
/// are terminated by `line_term`.
pub fn new(line_term: u8, bytes: &'b [u8]) -> LineIter<'b> {
LineIter {
bytes: bytes,
stepper: LineStep::new(line_term, 0, bytes.len()),
}
}
}
impl<'b> Iterator for LineIter<'b> {
type Item = &'b [u8];
fn next(&mut self) -> Option<&'b [u8]> {
self.stepper.next_match(self.bytes).map(|m| &self.bytes[m])
}
}
/// An explicit iterator over lines in a particular slice of bytes.
///
/// This iterator avoids borrowing the bytes themselves, and instead requires
/// callers to explicitly provide the bytes when moving through the iterator.
/// While not idiomatic, this provides a simple way of iterating over lines
/// that doesn't require borrowing the slice itself, which can be convenient.
///
/// Line terminators are considered part of the line they terminate. All lines
/// yielded by the iterator are guaranteed to be non-empty.
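///
/// # Example
///
/// A sketch of explicit iteration, where the caller supplies the same
/// slice on every call (this mirrors the `lines` helper in the tests
/// below):
///
/// ```ignore
/// let bytes = b"abc\nxyz\n";
/// let mut step = LineStep::new(b'\n', 0, bytes.len());
/// while let Some((start, end)) = step.next(bytes) {
///     // Each range is non-empty and includes the line terminator.
///     let _line = &bytes[start..end];
/// }
/// ```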
#[derive(Debug)]
pub struct LineStep {
line_term: u8,
pos: usize,
end: usize,
}
impl LineStep {
/// Create a new line iterator over the given range of bytes using the
/// given line terminator.
///
/// Callers should provide the actual bytes for each call to `next`. The
/// same slice must be provided to each call.
///
/// This panics if `start` is greater than `end`.
pub fn new(line_term: u8, start: usize, end: usize) -> LineStep {
LineStep { line_term, pos: start, end: end }
}
/// Return the start and end position of the next line in the given bytes.
///
/// The caller must pass exactly the same slice of bytes for each call to
/// `next`.
///
/// The range returned includes the line terminator. Ranges are always
/// non-empty.
pub fn next(&mut self, mut bytes: &[u8]) -> Option<(usize, usize)> {
bytes = &bytes[..self.end];
match memchr(self.line_term, &bytes[self.pos..]) {
None => {
if self.pos < bytes.len() {
let m = (self.pos, bytes.len());
assert!(m.0 <= m.1);
self.pos = m.1;
Some(m)
} else {
None
}
}
Some(line_end) => {
let m = (self.pos, self.pos + line_end + 1);
assert!(m.0 <= m.1);
self.pos = m.1;
Some(m)
}
}
}
/// Like next, but returns a `Match` instead of a tuple.
pub(crate) fn next_match(&mut self, bytes: &[u8]) -> Option<Match> {
self.next(bytes).map(|(s, e)| Match::new(s, e))
}
}
/// Count the number of occurrences of `line_term` in `bytes`.
pub fn count(bytes: &[u8], line_term: u8) -> u64 {
bytecount::count(bytes, line_term) as u64
}
/// Given a line that possibly ends with a terminator, return that line without
/// the terminator.
pub fn without_terminator(bytes: &[u8], line_term: LineTerminator) -> &[u8] {
let line_term = line_term.as_bytes();
let start = bytes.len().saturating_sub(line_term.len());
if bytes.get(start..) == Some(line_term) {
return &bytes[..start];
}
bytes
}
/// Return the start and end offsets of the lines containing the given range
/// of bytes.
///
/// Line terminators are considered part of the line they terminate.
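///
/// For example (taken from the `line_locate_weird` test below),
/// `locate(b"a\nb\nc", b'\n', Match::new(2, 3))` returns `Match::new(2, 4)`,
/// i.e., the line `b"b\n"`.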
pub fn locate(
bytes: &[u8],
line_term: u8,
range: Match,
) -> Match {
let line_start = memrchr(line_term, &bytes[0..range.start()])
.map_or(0, |i| i + 1);
let line_end =
if range.end() > line_start && bytes[range.end() - 1] == line_term {
range.end()
} else {
memchr(line_term, &bytes[range.end()..])
.map_or(bytes.len(), |i| range.end() + i + 1)
};
Match::new(line_start, line_end)
}
/// Returns the minimal starting offset of the line that occurs `count` lines
/// before the last line in `bytes`.
///
/// Lines are terminated by `line_term`. If `count` is zero, then this returns
/// the starting offset of the last line in `bytes`.
///
/// If `bytes` ends with a line terminator, then the terminator itself is
/// considered part of the last line.
pub fn preceding(bytes: &[u8], line_term: u8, count: usize) -> usize {
preceding_by_pos(bytes, bytes.len(), line_term, count)
}
/// Returns the minimal starting offset of the line that occurs `count` lines
/// before the line containing `pos`. Lines are terminated by `line_term`.
/// If `count` is zero, then this returns the starting offset of the line
/// containing `pos`.
///
/// If `pos` points just past a line terminator, then it is considered part of
/// the line that it terminates. For example, given `bytes = b"abc\nxyz\n"`
/// and `pos = 7`, `preceding_by_pos(bytes, pos, b'\n', 0)` returns `4` (as
/// does `pos = 8`) and `preceding_by_pos(bytes, pos, b'\n', 1)` returns
/// `0`.
fn preceding_by_pos(
bytes: &[u8],
mut pos: usize,
line_term: u8,
mut count: usize,
) -> usize {
if pos == 0 {
return 0;
} else if bytes[pos - 1] == line_term {
pos -= 1;
}
loop {
match memrchr(line_term, &bytes[..pos]) {
None => {
return 0;
}
Some(i) => {
if count == 0 {
return i + 1;
} else if i == 0 {
return 0;
}
count -= 1;
pos = i;
}
}
}
}
#[cfg(test)]
mod tests {
use std::ops::Range;
use std::str;
use grep_matcher::Match;
use super::*;
const SHERLOCK: &'static str = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.\
";
fn m(start: usize, end: usize) -> Match {
Match::new(start, end)
}
fn lines(text: &str) -> Vec<&str> {
let mut results = vec![];
let mut it = LineStep::new(b'\n', 0, text.len());
while let Some(m) = it.next_match(text.as_bytes()) {
results.push(&text[m]);
}
results
}
fn line_ranges(text: &str) -> Vec<Range<usize>> {
let mut results = vec![];
let mut it = LineStep::new(b'\n', 0, text.len());
while let Some(m) = it.next_match(text.as_bytes()) {
results.push(m.start()..m.end());
}
results
}
fn prev(text: &str, pos: usize, count: usize) -> usize {
preceding_by_pos(text.as_bytes(), pos, b'\n', count)
}
fn loc(text: &str, start: usize, end: usize) -> Match {
locate(text.as_bytes(), b'\n', Match::new(start, end))
}
#[test]
fn line_count() {
assert_eq!(0, count(b"", b'\n'));
assert_eq!(1, count(b"\n", b'\n'));
assert_eq!(2, count(b"\n\n", b'\n'));
assert_eq!(2, count(b"a\nb\nc", b'\n'));
}
#[test]
fn line_locate() {
let t = SHERLOCK;
let lines = line_ranges(t);
assert_eq!(
loc(t, lines[0].start, lines[0].end),
m(lines[0].start, lines[0].end));
assert_eq!(
loc(t, lines[0].start + 1, lines[0].end),
m(lines[0].start, lines[0].end));
assert_eq!(
loc(t, lines[0].end - 1, lines[0].end),
m(lines[0].start, lines[0].end));
assert_eq!(
loc(t, lines[0].end, lines[0].end),
m(lines[1].start, lines[1].end));
assert_eq!(
loc(t, lines[5].start, lines[5].end),
m(lines[5].start, lines[5].end));
assert_eq!(
loc(t, lines[5].start + 1, lines[5].end),
m(lines[5].start, lines[5].end));
assert_eq!(
loc(t, lines[5].end - 1, lines[5].end),
m(lines[5].start, lines[5].end));
assert_eq!(
loc(t, lines[5].end, lines[5].end),
m(lines[5].start, lines[5].end));
}
#[test]
fn line_locate_weird() {
assert_eq!(loc("", 0, 0), m(0, 0));
assert_eq!(loc("\n", 0, 1), m(0, 1));
assert_eq!(loc("\n", 1, 1), m(1, 1));
assert_eq!(loc("\n\n", 0, 0), m(0, 1));
assert_eq!(loc("\n\n", 0, 1), m(0, 1));
assert_eq!(loc("\n\n", 1, 1), m(1, 2));
assert_eq!(loc("\n\n", 1, 2), m(1, 2));
assert_eq!(loc("\n\n", 2, 2), m(2, 2));
assert_eq!(loc("a\nb\nc", 0, 1), m(0, 2));
assert_eq!(loc("a\nb\nc", 1, 2), m(0, 2));
assert_eq!(loc("a\nb\nc", 2, 3), m(2, 4));
assert_eq!(loc("a\nb\nc", 3, 4), m(2, 4));
assert_eq!(loc("a\nb\nc", 4, 5), m(4, 5));
assert_eq!(loc("a\nb\nc", 5, 5), m(4, 5));
}
#[test]
fn line_iter() {
assert_eq!(lines("abc"), vec!["abc"]);
assert_eq!(lines("abc\n"), vec!["abc\n"]);
assert_eq!(lines("abc\nxyz"), vec!["abc\n", "xyz"]);
assert_eq!(lines("abc\nxyz\n"), vec!["abc\n", "xyz\n"]);
assert_eq!(lines("abc\n\n"), vec!["abc\n", "\n"]);
assert_eq!(lines("abc\n\n\n"), vec!["abc\n", "\n", "\n"]);
assert_eq!(lines("abc\n\nxyz"), vec!["abc\n", "\n", "xyz"]);
assert_eq!(lines("abc\n\nxyz\n"), vec!["abc\n", "\n", "xyz\n"]);
assert_eq!(lines("abc\nxyz\n\n"), vec!["abc\n", "xyz\n", "\n"]);
assert_eq!(lines("\n"), vec!["\n"]);
assert_eq!(lines(""), Vec::<&str>::new());
}
#[test]
fn line_iter_empty() {
let mut it = LineStep::new(b'\n', 0, 0);
assert_eq!(it.next(b"abc"), None);
}
#[test]
fn preceding_lines_doc() {
// These are the examples mentioned in the documentation of
// `preceding_by_pos`.
let bytes = b"abc\nxyz\n";
assert_eq!(4, preceding_by_pos(bytes, 7, b'\n', 0));
assert_eq!(4, preceding_by_pos(bytes, 8, b'\n', 0));
assert_eq!(0, preceding_by_pos(bytes, 7, b'\n', 1));
assert_eq!(0, preceding_by_pos(bytes, 8, b'\n', 1));
}
#[test]
fn preceding_lines_sherlock() {
let t = SHERLOCK;
let lines = line_ranges(t);
// The following tests check the count == 0 case, i.e., finding the
// beginning of the line containing the given position.
assert_eq!(0, prev(t, 0, 0));
assert_eq!(0, prev(t, 1, 0));
// The line terminator is addressed by `end-1` and terminates the line
// it is part of.
assert_eq!(0, prev(t, lines[0].end - 1, 0));
assert_eq!(lines[0].start, prev(t, lines[0].end, 0));
// The end position of line addresses the byte immediately following a
// line terminator, which puts it on the following line.
assert_eq!(lines[1].start, prev(t, lines[0].end + 1, 0));
// Now tests for count > 0.
assert_eq!(0, prev(t, 0, 1));
assert_eq!(0, prev(t, 0, 2));
assert_eq!(0, prev(t, 1, 1));
assert_eq!(0, prev(t, 1, 2));
assert_eq!(0, prev(t, lines[0].end - 1, 1));
assert_eq!(0, prev(t, lines[0].end - 1, 2));
assert_eq!(0, prev(t, lines[0].end, 1));
assert_eq!(0, prev(t, lines[0].end, 2));
assert_eq!(lines[3].start, prev(t, lines[4].end - 1, 1));
assert_eq!(lines[3].start, prev(t, lines[4].end, 1));
assert_eq!(lines[4].start, prev(t, lines[4].end + 1, 1));
// The last line has no line terminator.
assert_eq!(lines[5].start, prev(t, lines[5].end, 0));
assert_eq!(lines[5].start, prev(t, lines[5].end - 1, 0));
assert_eq!(lines[4].start, prev(t, lines[5].end, 1));
assert_eq!(lines[0].start, prev(t, lines[5].end, 5));
}
#[test]
fn preceding_lines_short() {
let t = "a\nb\nc\nd\ne\nf\n";
let lines = line_ranges(t);
assert_eq!(12, t.len());
assert_eq!(lines[5].start, prev(t, lines[5].end, 0));
assert_eq!(lines[4].start, prev(t, lines[5].end, 1));
assert_eq!(lines[3].start, prev(t, lines[5].end, 2));
assert_eq!(lines[2].start, prev(t, lines[5].end, 3));
assert_eq!(lines[1].start, prev(t, lines[5].end, 4));
assert_eq!(lines[0].start, prev(t, lines[5].end, 5));
assert_eq!(lines[0].start, prev(t, lines[5].end, 6));
assert_eq!(lines[5].start, prev(t, lines[5].end - 1, 0));
assert_eq!(lines[4].start, prev(t, lines[5].end - 1, 1));
assert_eq!(lines[3].start, prev(t, lines[5].end - 1, 2));
assert_eq!(lines[2].start, prev(t, lines[5].end - 1, 3));
assert_eq!(lines[1].start, prev(t, lines[5].end - 1, 4));
assert_eq!(lines[0].start, prev(t, lines[5].end - 1, 5));
assert_eq!(lines[0].start, prev(t, lines[5].end - 1, 6));
assert_eq!(lines[4].start, prev(t, lines[5].start, 0));
assert_eq!(lines[3].start, prev(t, lines[5].start, 1));
assert_eq!(lines[2].start, prev(t, lines[5].start, 2));
assert_eq!(lines[1].start, prev(t, lines[5].start, 3));
assert_eq!(lines[0].start, prev(t, lines[5].start, 4));
assert_eq!(lines[0].start, prev(t, lines[5].start, 5));
assert_eq!(lines[3].start, prev(t, lines[4].end - 1, 1));
assert_eq!(lines[2].start, prev(t, lines[4].start, 1));
assert_eq!(lines[2].start, prev(t, lines[3].end - 1, 1));
assert_eq!(lines[1].start, prev(t, lines[3].start, 1));
assert_eq!(lines[1].start, prev(t, lines[2].end - 1, 1));
assert_eq!(lines[0].start, prev(t, lines[2].start, 1));
assert_eq!(lines[0].start, prev(t, lines[1].end - 1, 1));
assert_eq!(lines[0].start, prev(t, lines[1].start, 1));
assert_eq!(lines[0].start, prev(t, lines[0].end - 1, 1));
assert_eq!(lines[0].start, prev(t, lines[0].start, 1));
}
#[test]
fn preceding_lines_empty1() {
let t = "\n\n\nd\ne\nf\n";
let lines = line_ranges(t);
assert_eq!(9, t.len());
assert_eq!(lines[0].start, prev(t, lines[0].end, 0));
assert_eq!(lines[0].start, prev(t, lines[0].end, 1));
assert_eq!(lines[1].start, prev(t, lines[1].end, 0));
assert_eq!(lines[0].start, prev(t, lines[1].end, 1));
assert_eq!(lines[5].start, prev(t, lines[5].end, 0));
assert_eq!(lines[4].start, prev(t, lines[5].end, 1));
assert_eq!(lines[3].start, prev(t, lines[5].end, 2));
assert_eq!(lines[2].start, prev(t, lines[5].end, 3));
assert_eq!(lines[1].start, prev(t, lines[5].end, 4));
assert_eq!(lines[0].start, prev(t, lines[5].end, 5));
assert_eq!(lines[0].start, prev(t, lines[5].end, 6));
}
#[test]
fn preceding_lines_empty2() {
let t = "a\n\n\nd\ne\nf\n";
let lines = line_ranges(t);
assert_eq!(10, t.len());
assert_eq!(lines[0].start, prev(t, lines[0].end, 0));
assert_eq!(lines[0].start, prev(t, lines[0].end, 1));
assert_eq!(lines[1].start, prev(t, lines[1].end, 0));
assert_eq!(lines[0].start, prev(t, lines[1].end, 1));
assert_eq!(lines[5].start, prev(t, lines[5].end, 0));
assert_eq!(lines[4].start, prev(t, lines[5].end, 1));
assert_eq!(lines[3].start, prev(t, lines[5].end, 2));
assert_eq!(lines[2].start, prev(t, lines[5].end, 3));
assert_eq!(lines[1].start, prev(t, lines[5].end, 4));
assert_eq!(lines[0].start, prev(t, lines[5].end, 5));
assert_eq!(lines[0].start, prev(t, lines[5].end, 6));
}
}


@@ -0,0 +1,24 @@
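// Test-only helper: compares two printed outputs and panics with a labeled,
// readable diff when they differ. An illustrative call:
// `assert_eq_printed!(expected, got, "case: {}", name)`.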
#[cfg(test)]
#[macro_export]
macro_rules! assert_eq_printed {
($expected:expr, $got:expr, $($tt:tt)*) => {
let expected = &*$expected;
let got = &*$got;
let label = format!($($tt)*);
if expected != got {
panic!("
printed outputs differ! (label: {})
expected:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
{}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
got:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
{}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
", label, expected, got);
}
}
}


@@ -0,0 +1,577 @@
use std::cmp;
use memchr::memchr;
use grep_matcher::{LineMatchKind, Matcher};
use lines::{self, LineStep};
use line_buffer::BinaryDetection;
use searcher::{Config, Range, Searcher};
use sink::{
Sink, SinkError,
SinkFinish, SinkContext, SinkContextKind, SinkMatch,
};
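/// The state shared by every line-oriented search strategy: it tracks the
/// current position, line numbers and context bookkeeping while driving a
/// `Matcher` and reporting results to a `Sink`.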
#[derive(Debug)]
pub struct Core<'s, M: 's, S> {
config: &'s Config,
matcher: M,
searcher: &'s Searcher,
sink: S,
binary: bool,
pos: usize,
absolute_byte_offset: u64,
binary_byte_offset: Option<usize>,
line_number: Option<u64>,
last_line_counted: usize,
last_line_visited: usize,
after_context_left: usize,
has_sunk: bool,
}
impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
pub fn new(
searcher: &'s Searcher,
matcher: M,
sink: S,
binary: bool,
) -> Core<'s, M, S> {
let line_number =
if searcher.config.line_number {
Some(1)
} else {
None
};
let core = Core {
config: &searcher.config,
matcher: matcher,
searcher: searcher,
sink: sink,
binary: binary,
pos: 0,
absolute_byte_offset: 0,
binary_byte_offset: None,
line_number: line_number,
last_line_counted: 0,
last_line_visited: 0,
after_context_left: 0,
has_sunk: false,
};
if !core.searcher.multi_line_with_matcher(&core.matcher) {
if core.is_line_by_line_fast() {
trace!("searcher core: will use fast line searcher");
} else {
trace!("searcher core: will use slow line searcher");
}
}
core
}
pub fn pos(&self) -> usize {
self.pos
}
pub fn set_pos(&mut self, pos: usize) {
self.pos = pos;
}
pub fn binary_byte_offset(&self) -> Option<u64> {
self.binary_byte_offset.map(|offset| offset as u64)
}
pub fn matcher(&self) -> &M {
&self.matcher
}
pub fn matched(
&mut self,
buf: &[u8],
range: &Range,
) -> Result<bool, S::Error> {
self.sink_matched(buf, range)
}
pub fn begin(&mut self) -> Result<bool, S::Error> {
self.sink.begin(&self.searcher)
}
pub fn finish(
&mut self,
byte_count: u64,
binary_byte_offset: Option<u64>,
) -> Result<(), S::Error> {
self.sink.finish(
&self.searcher,
&SinkFinish {
byte_count,
binary_byte_offset,
})
}
pub fn match_by_line(&mut self, buf: &[u8]) -> Result<bool, S::Error> {
if self.is_line_by_line_fast() {
self.match_by_line_fast(buf)
} else {
self.match_by_line_slow(buf)
}
}
pub fn roll(&mut self, buf: &[u8]) -> usize {
let consumed =
if self.config.max_context() == 0 {
buf.len()
} else {
// It might seem like all we need to care about here is just
// the "before context," but in order to sink the context
// separator (when before_context==0 and after_context>0), we
// need to know something about the position of the previous
// line visited, even if we're at the beginning of the buffer.
let context_start = lines::preceding(
buf,
self.config.line_term.as_byte(),
self.config.max_context(),
);
let consumed = cmp::max(context_start, self.last_line_visited);
consumed
};
self.count_lines(buf, consumed);
self.absolute_byte_offset += consumed as u64;
self.last_line_counted = 0;
self.last_line_visited = 0;
self.set_pos(buf.len() - consumed);
consumed
}
pub fn detect_binary(&mut self, buf: &[u8], range: &Range) -> bool {
if self.binary_byte_offset.is_some() {
return true;
}
let binary_byte = match self.config.binary.0 {
BinaryDetection::Quit(b) => b,
_ => return false,
};
if let Some(i) = memchr(binary_byte, &buf[*range]) {
self.binary_byte_offset = Some(range.start() + i);
true
} else {
false
}
}
pub fn before_context_by_line(
&mut self,
buf: &[u8],
upto: usize,
) -> Result<bool, S::Error> {
if self.config.before_context == 0 {
return Ok(true);
}
let range = Range::new(self.last_line_visited, upto);
if range.is_empty() {
return Ok(true);
}
let before_context_start = range.start() + lines::preceding(
&buf[range],
self.config.line_term.as_byte(),
self.config.before_context - 1,
);
let range = Range::new(before_context_start, range.end());
let mut stepper = LineStep::new(
self.config.line_term.as_byte(),
range.start(),
range.end(),
);
while let Some(line) = stepper.next_match(buf) {
if !self.sink_break_context(line.start())? {
return Ok(false);
}
if !self.sink_before_context(buf, &line)? {
return Ok(false);
}
}
Ok(true)
}
pub fn after_context_by_line(
&mut self,
buf: &[u8],
upto: usize,
) -> Result<bool, S::Error> {
if self.after_context_left == 0 {
return Ok(true);
}
let range = Range::new(self.last_line_visited, upto);
let mut stepper = LineStep::new(
self.config.line_term.as_byte(),
range.start(),
range.end(),
);
while let Some(line) = stepper.next_match(buf) {
if !self.sink_after_context(buf, &line)? {
return Ok(false);
}
if self.after_context_left == 0 {
break;
}
}
Ok(true)
}
pub fn other_context_by_line(
&mut self,
buf: &[u8],
upto: usize,
) -> Result<bool, S::Error> {
let range = Range::new(self.last_line_visited, upto);
let mut stepper = LineStep::new(
self.config.line_term.as_byte(),
range.start(),
range.end(),
);
while let Some(line) = stepper.next_match(buf) {
if !self.sink_other_context(buf, &line)? {
return Ok(false);
}
}
Ok(true)
}
fn match_by_line_slow(&mut self, buf: &[u8]) -> Result<bool, S::Error> {
debug_assert!(!self.searcher.multi_line_with_matcher(&self.matcher));
let range = Range::new(self.pos(), buf.len());
let mut stepper = LineStep::new(
self.config.line_term.as_byte(),
range.start(),
range.end(),
);
while let Some(line) = stepper.next_match(buf) {
let matched = {
// Stripping the line terminator is necessary to prevent some
// classes of regexes from matching the empty position *after*
// the end of the line. For example, `(?m)^$` will match at
// position (2, 2) in the string `a\n`.
let slice = lines::without_terminator(
&buf[line],
self.config.line_term,
);
match self.matcher.shortest_match(slice) {
Err(err) => return Err(S::Error::error_message(err)),
Ok(result) => result.is_some(),
}
};
self.set_pos(line.end());
if matched != self.config.invert_match {
if !self.before_context_by_line(buf, line.start())? {
return Ok(false);
}
if !self.sink_matched(buf, &line)? {
return Ok(false);
}
} else if self.after_context_left >= 1 {
if !self.sink_after_context(buf, &line)? {
return Ok(false);
}
} else if self.config.passthru {
if !self.sink_other_context(buf, &line)? {
return Ok(false);
}
}
}
Ok(true)
}
fn match_by_line_fast(&mut self, buf: &[u8]) -> Result<bool, S::Error> {
debug_assert!(!self.config.passthru);
while !buf[self.pos()..].is_empty() {
if self.config.invert_match {
if !self.match_by_line_fast_invert(buf)? {
return Ok(false);
}
} else if let Some(line) = self.find_by_line_fast(buf)? {
if !self.after_context_by_line(buf, line.start())? {
return Ok(false);
}
if !self.before_context_by_line(buf, line.start())? {
return Ok(false);
}
self.set_pos(line.end());
if !self.sink_matched(buf, &line)? {
return Ok(false);
}
} else {
break;
}
}
if !self.after_context_by_line(buf, buf.len())? {
return Ok(false);
}
self.set_pos(buf.len());
Ok(true)
}
fn match_by_line_fast_invert(
&mut self,
buf: &[u8],
) -> Result<bool, S::Error> {
assert!(self.config.invert_match);
let invert_match = match self.find_by_line_fast(buf)? {
None => {
let range = Range::new(self.pos(), buf.len());
self.set_pos(range.end());
range
}
Some(line) => {
let range = Range::new(self.pos(), line.start());
self.set_pos(line.end());
range
}
};
if invert_match.is_empty() {
return Ok(true);
}
if !self.after_context_by_line(buf, invert_match.start())? {
return Ok(false);
}
if !self.before_context_by_line(buf, invert_match.start())? {
return Ok(false);
}
let mut stepper = LineStep::new(
self.config.line_term.as_byte(),
invert_match.start(),
invert_match.end(),
);
while let Some(line) = stepper.next_match(buf) {
if !self.sink_matched(buf, &line)? {
return Ok(false);
}
}
Ok(true)
}
fn find_by_line_fast(
&self,
buf: &[u8],
) -> Result<Option<Range>, S::Error> {
debug_assert!(!self.searcher.multi_line_with_matcher(&self.matcher));
debug_assert!(self.is_line_by_line_fast());
let mut pos = self.pos();
while !buf[pos..].is_empty() {
match self.matcher.find_candidate_line(&buf[pos..]) {
Err(err) => return Err(S::Error::error_message(err)),
Ok(None) => return Ok(None),
Ok(Some(LineMatchKind::Confirmed(i))) => {
let line = lines::locate(
buf,
self.config.line_term.as_byte(),
Range::zero(i).offset(pos),
);
// If we matched beyond the end of the buffer, then we
// don't report this as a match.
if line.start() == buf.len() {
pos = buf.len();
continue;
}
return Ok(Some(line));
}
Ok(Some(LineMatchKind::Candidate(i))) => {
let line = lines::locate(
buf,
self.config.line_term.as_byte(),
Range::zero(i).offset(pos),
);
// We need to strip the line terminator here to match the
// semantics of line-by-line searching. Namely, regexes
// like `(?m)^$` can match at the final position beyond a
// line terminator, which is non-sensical in line oriented
// matching.
let slice = lines::without_terminator(
&buf[line],
self.config.line_term,
);
match self.matcher.is_match(slice) {
Err(err) => return Err(S::Error::error_message(err)),
Ok(true) => return Ok(Some(line)),
Ok(false) => {
pos = line.end();
continue;
}
}
}
}
}
Ok(None)
}
fn sink_matched(
&mut self,
buf: &[u8],
range: &Range,
) -> Result<bool, S::Error> {
if self.binary && self.detect_binary(buf, range) {
return Ok(false);
}
if !self.sink_break_context(range.start())? {
return Ok(false);
}
self.count_lines(buf, range.start());
let offset = self.absolute_byte_offset + range.start() as u64;
let keepgoing = self.sink.matched(
&self.searcher,
&SinkMatch {
line_term: self.config.line_term,
bytes: &buf[*range],
absolute_byte_offset: offset,
line_number: self.line_number,
},
)?;
if !keepgoing {
return Ok(false);
}
self.last_line_visited = range.end();
self.after_context_left = self.config.after_context;
self.has_sunk = true;
Ok(true)
}
fn sink_before_context(
&mut self,
buf: &[u8],
range: &Range,
) -> Result<bool, S::Error> {
if self.binary && self.detect_binary(buf, range) {
return Ok(false);
}
self.count_lines(buf, range.start());
let offset = self.absolute_byte_offset + range.start() as u64;
let keepgoing = self.sink.context(
&self.searcher,
&SinkContext {
line_term: self.config.line_term,
bytes: &buf[*range],
kind: SinkContextKind::Before,
absolute_byte_offset: offset,
line_number: self.line_number,
},
)?;
if !keepgoing {
return Ok(false);
}
self.last_line_visited = range.end();
self.has_sunk = true;
Ok(true)
}
fn sink_after_context(
&mut self,
buf: &[u8],
range: &Range,
) -> Result<bool, S::Error> {
assert!(self.after_context_left >= 1);
if self.binary && self.detect_binary(buf, range) {
return Ok(false);
}
self.count_lines(buf, range.start());
let offset = self.absolute_byte_offset + range.start() as u64;
let keepgoing = self.sink.context(
&self.searcher,
&SinkContext {
line_term: self.config.line_term,
bytes: &buf[*range],
kind: SinkContextKind::After,
absolute_byte_offset: offset,
line_number: self.line_number,
},
)?;
if !keepgoing {
return Ok(false);
}
self.last_line_visited = range.end();
self.after_context_left -= 1;
self.has_sunk = true;
Ok(true)
}
fn sink_other_context(
&mut self,
buf: &[u8],
range: &Range,
) -> Result<bool, S::Error> {
if self.binary && self.detect_binary(buf, range) {
return Ok(false);
}
self.count_lines(buf, range.start());
let offset = self.absolute_byte_offset + range.start() as u64;
let keepgoing = self.sink.context(
&self.searcher,
&SinkContext {
line_term: self.config.line_term,
bytes: &buf[*range],
kind: SinkContextKind::Other,
absolute_byte_offset: offset,
line_number: self.line_number,
},
)?;
if !keepgoing {
return Ok(false);
}
self.last_line_visited = range.end();
self.has_sunk = true;
Ok(true)
}
fn sink_break_context(
&mut self,
start_of_line: usize,
) -> Result<bool, S::Error> {
let is_gap = self.last_line_visited < start_of_line;
let any_context =
self.config.before_context > 0
|| self.config.after_context > 0;
if !any_context || !self.has_sunk || !is_gap {
Ok(true)
} else {
self.sink.context_break(&self.searcher)
}
}
fn count_lines(&mut self, buf: &[u8], upto: usize) {
if let Some(ref mut line_number) = self.line_number {
if self.last_line_counted >= upto {
return;
}
let slice = &buf[self.last_line_counted..upto];
let count = lines::count(slice, self.config.line_term.as_byte());
*line_number += count;
self.last_line_counted = upto;
}
}
fn is_line_by_line_fast(&self) -> bool {
debug_assert!(!self.searcher.multi_line_with_matcher(&self.matcher));
if self.config.passthru {
return false;
}
if let Some(line_term) = self.matcher.line_terminator() {
if line_term == self.config.line_term {
return true;
}
}
if let Some(non_matching) = self.matcher.non_matching_bytes() {
// If the line terminator is CRLF, we don't actually need to care
// whether the regex can match `\r` or not. Namely, a `\r` is
// neither necessary nor sufficient to terminate a line. A `\n` is
// always required.
if non_matching.contains(self.config.line_term.as_byte()) {
return true;
}
}
false
}
}

File diff suppressed because it is too large


@@ -0,0 +1,106 @@
use std::fs::File;
use std::path::Path;
use memmap::Mmap;
/// Controls the strategy used for determining when to use memory maps.
///
/// If a searcher is called in circumstances where it is possible to use memory
/// maps, and memory maps are enabled, then it will attempt to do so if it
/// believes it will make the search faster.
///
/// By default, memory maps are disabled.
#[derive(Clone, Debug)]
pub struct MmapChoice(MmapChoiceImpl);
#[derive(Clone, Debug)]
enum MmapChoiceImpl {
Auto,
Never,
}
impl Default for MmapChoice {
fn default() -> MmapChoice {
MmapChoice(MmapChoiceImpl::Never)
}
}
impl MmapChoice {
/// Use memory maps when they are believed to be advantageous.
///
/// The heuristics used to determine whether to use a memory map or not
/// may depend on many things, including but not limited to, file size
/// and platform.
///
/// If memory maps are unavailable or cannot be used for a specific input,
/// then normal OS read calls are used instead.
///
/// # Safety
///
/// This constructor is not safe because there is no obvious way to
/// encapsulate the safety of file backed memory maps on all platforms
/// without simultaneously negating some or all of their benefits.
///
/// The specific contract the caller is required to uphold isn't precise,
/// but it basically amounts to something like, "the caller guarantees that
/// the underlying file won't be mutated." This, of course, isn't feasible
/// in many environments. However, command line tools may still decide to
/// take the risk of, say, a `SIGBUS` occurring while attempting to read a
/// memory map.
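///
/// # Example
///
/// An illustrative sketch of opting in, with the caller accepting the
/// contract described above:
///
/// ```ignore
/// let choice = unsafe { MmapChoice::auto() };
/// ```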
pub unsafe fn auto() -> MmapChoice {
MmapChoice(MmapChoiceImpl::Auto)
}
/// Never use memory maps, no matter what. This is the default.
pub fn never() -> MmapChoice {
MmapChoice(MmapChoiceImpl::Never)
}
/// Return a memory map if memory maps are enabled, if creating a memory
/// map from the given file succeeded and if memory maps are believed to be
/// advantageous for performance.
///
/// If this does attempt to open a memory map and it fails, then `None`
/// is returned and the corresponding error (along with the file path, if
/// present) is logged at the debug level.
pub(crate) fn open(
&self,
file: &File,
path: Option<&Path>,
) -> Option<Mmap> {
if !self.is_enabled() {
return None;
}
if cfg!(target_os = "macos") {
// I guess memory maps on macOS aren't great. Should re-evaluate.
return None;
}
// SAFETY: This is acceptable because the only way `MmapChoiceImpl` can
// be `Auto` is if the caller invoked the `auto` constructor. Thus,
// this is a propagation of the caller's assertion that using memory
// maps is safe.
match unsafe { Mmap::map(file) } {
Ok(mmap) => Some(mmap),
Err(err) => {
if let Some(path) = path {
debug!(
"{}: failed to open memory map: {}",
path.display(),
err
);
} else {
debug!("failed to open memory map: {}", err);
}
None
}
}
}
/// Whether this strategy may employ memory maps or not.
pub(crate) fn is_enabled(&self) -> bool {
match self.0 {
MmapChoiceImpl::Auto => true,
MmapChoiceImpl::Never => false,
}
}
}


@@ -0,0 +1,934 @@
use std::cell::RefCell;
use std::cmp;
use std::fmt;
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;
use encoding_rs;
use encoding_rs_io::DecodeReaderBytesBuilder;
use grep_matcher::{LineTerminator, Match, Matcher};
use line_buffer::{
self, BufferAllocation, LineBuffer, LineBufferBuilder, LineBufferReader,
DEFAULT_BUFFER_CAPACITY, alloc_error,
};
use searcher::glue::{ReadByLine, SliceByLine, MultiLine};
use sink::{Sink, SinkError};
pub use self::mmap::MmapChoice;
mod core;
mod glue;
mod mmap;
/// We use this type alias since we want the ergonomics of a matcher's `Match`
/// type, but in practice, we use it for arbitrary ranges, so give it a more
/// accurate name. This is only used in the searcher's internals.
type Range = Match;
/// The behavior of binary detection while searching.
///
/// Binary detection is the process of _heuristically_ identifying whether a
/// given chunk of data is binary or not, and then taking an action based on
/// the result of that heuristic. The motivation behind detecting binary data
/// is that binary data often indicates data that is undesirable to search
/// using textual patterns. Of course, there are many cases in which this isn't
/// true, which is why binary detection is disabled by default.
///
/// Unfortunately, binary detection works differently depending on the type of
/// search being executed:
///
/// 1. When performing a search using a fixed size buffer, binary detection is
/// applied to the buffer's contents as it is filled. Binary detection must
/// be applied to the buffer directly because binary files may not contain
/// line terminators, which could result in exorbitant memory usage.
/// 2. When performing a search using memory maps or by reading data off the
/// heap, then binary detection is only guaranteed to be applied to the
/// parts corresponding to a match. When `Quit` is enabled, then the first
/// few KB of the data are searched for binary data.
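///
/// For example, a common configuration treats NUL as the binary indicator
/// and stops searching when one is found:
///
/// ```ignore
/// let detection = BinaryDetection::quit(b'\x00');
/// ```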
#[derive(Clone, Debug, Default)]
pub struct BinaryDetection(line_buffer::BinaryDetection);
impl BinaryDetection {
/// No binary detection is performed. Data reported by the searcher may
/// contain arbitrary bytes.
///
/// This is the default.
pub fn none() -> BinaryDetection {
BinaryDetection(line_buffer::BinaryDetection::None)
}
/// Binary detection is performed by looking for the given byte.
///
/// When searching is performed using a fixed size buffer, then the
/// contents of that buffer are always searched for the presence of this
/// byte. If it is found, then the underlying data is considered binary
/// and the search stops as if it reached EOF.
///
/// When searching is performed with the entire contents mapped into
/// memory, then binary detection is more conservative. Namely, only a
/// fixed size region at the beginning of the contents is searched for
/// binary data. As a compromise, any subsequent matching (or context)
/// lines are also searched for binary data. If binary data is detected at
/// any point, then the search stops as if it reached EOF.
pub fn quit(binary_byte: u8) -> BinaryDetection {
BinaryDetection(line_buffer::BinaryDetection::Quit(binary_byte))
}
// TODO(burntsushi): Figure out how to make binary conversion work. This
// permits implementing GNU grep's default behavior, which is to zap NUL
// bytes but still execute a search (if a match is detected, then GNU grep
// stops and reports that a match was found but doesn't print the matching
// line itself).
//
// This behavior is pretty simple to implement using the line buffer (and
// in fact, it is already implemented and tested), since there's a fixed
// size buffer that we can easily write to. The issue arises when searching
// a `&[u8]` (whether on the heap or via a memory map), since this isn't
// something we can easily write to.
/// All contents read by the line buffer are searched for the given byte.
/// If it occurs, then it is replaced by the line terminator. The line
/// buffer guarantees that this byte will never be observable by callers.
#[allow(dead_code)]
fn convert(binary_byte: u8) -> BinaryDetection {
BinaryDetection(line_buffer::BinaryDetection::Convert(binary_byte))
}
}
/// An encoding to use when searching.
///
/// An encoding can be used to configure a
/// [`SearcherBuilder`](struct.SearcherBuilder.html)
/// to transcode source data from an encoding to UTF-8 before searching.
///
/// An `Encoding` will always be cheap to clone.
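///
/// # Example
///
/// An illustrative sketch using a label from the Encoding Standard:
///
/// ```ignore
/// let enc = Encoding::new("utf-16be")?;
/// ```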
#[derive(Clone, Debug)]
pub struct Encoding(&'static encoding_rs::Encoding);
impl Encoding {
/// Create a new encoding for the specified label.
///
/// The encoding label provided is mapped to an encoding via the set of
/// available choices specified in the
/// [Encoding Standard](https://encoding.spec.whatwg.org/#concept-encoding-get).
/// If the given label does not correspond to a valid encoding, then this
/// returns an error.
pub fn new(label: &str) -> Result<Encoding, ConfigError> {
let label = label.as_bytes();
match encoding_rs::Encoding::for_label_no_replacement(label) {
Some(encoding) => Ok(Encoding(encoding)),
None => {
Err(ConfigError::UnknownEncoding { label: label.to_vec() })
}
}
}
}
/// The internal configuration of a searcher. This is shared among several
/// search related types, but is only ever written to by the SearcherBuilder.
#[derive(Clone, Debug)]
pub struct Config {
/// The line terminator to use.
line_term: LineTerminator,
/// Whether to invert matching.
invert_match: bool,
/// The number of lines after a match to include.
after_context: usize,
/// The number of lines before a match to include.
before_context: usize,
/// Whether to enable unbounded context or not.
passthru: bool,
/// Whether to count line numbers.
line_number: bool,
/// The maximum amount of heap memory to use.
///
/// When not given, no explicit limit is enforced. When set to `0`, then
/// only the memory map search strategy is available.
heap_limit: Option<usize>,
/// The memory map strategy.
mmap: MmapChoice,
/// The binary data detection strategy.
binary: BinaryDetection,
/// Whether to enable matching across multiple lines.
multi_line: bool,
/// An encoding that, when present, causes the searcher to transcode all
/// input from the encoding to UTF-8.
encoding: Option<Encoding>,
}
impl Default for Config {
fn default() -> Config {
Config {
line_term: LineTerminator::default(),
invert_match: false,
after_context: 0,
before_context: 0,
passthru: false,
line_number: true,
heap_limit: None,
mmap: MmapChoice::default(),
binary: BinaryDetection::default(),
multi_line: false,
encoding: None,
}
}
}
impl Config {
/// Return the maximum number of lines needed to fulfill this
/// configuration's context.
///
/// If this returns `0`, then no context is ever needed.
fn max_context(&self) -> usize {
cmp::max(self.before_context, self.after_context)
}
/// Build a line buffer from this configuration.
fn line_buffer(&self) -> LineBuffer {
let mut builder = LineBufferBuilder::new();
builder
.line_terminator(self.line_term.as_byte())
.binary_detection(self.binary.0);
if let Some(limit) = self.heap_limit {
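// Split the limit between the buffer's initial capacity and the
// additional growth permitted before erroring: small limits cap the
// initial allocation itself, while larger limits start at the default
// capacity and may grow up to the limit.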
let (capacity, additional) =
if limit <= DEFAULT_BUFFER_CAPACITY {
(limit, 0)
} else {
(DEFAULT_BUFFER_CAPACITY, limit - DEFAULT_BUFFER_CAPACITY)
};
builder
.capacity(capacity)
.buffer_alloc(BufferAllocation::Error(additional));
}
builder.build()
}
}
/// An error that can occur when building a searcher.
///
/// This error occurs when a nonsensical configuration is detected while
/// trying to construct a `Searcher` from a `SearcherBuilder`.
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum ConfigError {
/// Indicates that the heap limit configuration prevents all possible
/// search strategies from being used. For example, if the heap limit is
/// set to 0 and memory map searching is disabled or unavailable.
SearchUnavailable,
/// Occurs when a matcher reports a line terminator that is different than
/// the one configured in the searcher.
MismatchedLineTerminators {
/// The matcher's line terminator.
matcher: LineTerminator,
/// The searcher's line terminator.
searcher: LineTerminator,
},
/// Occurs when no encoding could be found for a particular label.
UnknownEncoding {
/// The provided encoding label that could not be found.
label: Vec<u8>,
},
/// Hints that destructuring should not be exhaustive.
///
/// This enum may grow additional variants, so this makes sure clients
/// don't count on exhaustive matching. (Otherwise, adding a new variant
/// could break existing code.)
#[doc(hidden)]
__Nonexhaustive,
}
impl ::std::error::Error for ConfigError {
fn description(&self) -> &str { "grep-searcher configuration error" }
}
impl fmt::Display for ConfigError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match *self {
ConfigError::SearchUnavailable => {
write!(f, "grep config error: no available searchers")
}
ConfigError::MismatchedLineTerminators { matcher, searcher } => {
write!(
f,
"grep config error: mismatched line terminators, \
matcher has {:?} but searcher has {:?}",
matcher,
searcher
)
}
ConfigError::UnknownEncoding { ref label } => {
write!(
f,
"grep config error: unknown encoding: {}",
String::from_utf8_lossy(label),
)
}
_ => panic!("BUG: unexpected variant found"),
}
}
}
/// A builder for configuring a searcher.
///
/// A search builder permits specifying the configuration of a searcher,
/// including options like whether to invert the search or to enable multi
/// line search.
///
/// Once a searcher has been built, it is beneficial to reuse that searcher
/// for multiple searches, if possible.
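///
/// # Example
///
/// A minimal sketch of configuring and building a searcher (the exact
/// re-export paths are assumptions):
///
/// ```
/// use grep_searcher::SearcherBuilder;
///
/// let searcher = SearcherBuilder::new()
///     .line_number(false)
///     .before_context(1)
///     .after_context(1)
///     .build();
/// ```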
#[derive(Clone, Debug)]
pub struct SearcherBuilder {
config: Config,
}
impl Default for SearcherBuilder {
fn default() -> SearcherBuilder {
SearcherBuilder::new()
}
}
impl SearcherBuilder {
/// Create a new searcher builder with a default configuration.
pub fn new() -> SearcherBuilder {
SearcherBuilder {
config: Config::default(),
}
}
/// Build a searcher from the current configuration.
pub fn build(&self) -> Searcher {
let mut config = self.config.clone();
if config.passthru {
config.before_context = 0;
config.after_context = 0;
}
let mut decode_builder = DecodeReaderBytesBuilder::new();
decode_builder
.encoding(self.config.encoding.as_ref().map(|e| e.0))
.utf8_passthru(true);
Searcher {
config: config,
decode_builder: decode_builder,
decode_buffer: RefCell::new(vec![0; 8 * (1<<10)]),
line_buffer: RefCell::new(self.config.line_buffer()),
multi_line_buffer: RefCell::new(vec![]),
}
}
/// Set the line terminator that is used by the searcher.
///
/// When building a searcher, if the matcher provided has a line terminator
/// set, then it must be the same as this one. If the two differ, then
/// executing a search will return an error.
///
/// By default, this is set to `b'\n'`.
pub fn line_terminator(
&mut self,
line_term: LineTerminator,
) -> &mut SearcherBuilder {
self.config.line_term = line_term;
self
}
/// Whether to invert matching, whereby lines that don't match are reported
/// instead of lines that do.
///
/// By default, this is disabled.
pub fn invert_match(&mut self, yes: bool) -> &mut SearcherBuilder {
self.config.invert_match = yes;
self
}
/// Whether to count and include line numbers with matching lines.
///
/// This is enabled by default. There is a small performance penalty
/// associated with computing line numbers, so this can be disabled when
/// this isn't desirable.
pub fn line_number(&mut self, yes: bool) -> &mut SearcherBuilder {
self.config.line_number = yes;
self
}
/// Whether to enable multi line search or not.
///
/// When multi line search is enabled, matches *may* match across multiple
/// lines. Conversely, when multi line search is disabled, it is impossible
/// for any match to span more than one line.
///
/// **Warning:** multi line search requires having the entire contents to
/// search mapped in memory at once. When searching files, memory maps
/// will be used if possible and if they are enabled, which avoids using
/// your program's heap. However, if memory maps cannot be used (e.g.,
/// for searching streams like `stdin` or if transcoding is necessary),
/// then the entire contents of the stream are read on to the heap before
/// starting the search.
///
/// This is disabled by default.
pub fn multi_line(&mut self, yes: bool) -> &mut SearcherBuilder {
self.config.multi_line = yes;
self
}
/// Whether to include a fixed number of lines after every match.
///
/// When this is set to a non-zero number, then the searcher will report
/// `line_count` contextual lines after every match.
///
/// This is set to `0` by default.
pub fn after_context(
&mut self,
line_count: usize,
) -> &mut SearcherBuilder {
self.config.after_context = line_count;
self
}
/// Whether to include a fixed number of lines before every match.
///
/// When this is set to a non-zero number, then the searcher will report
/// `line_count` contextual lines before every match.
///
/// This is set to `0` by default.
pub fn before_context(
&mut self,
line_count: usize,
) -> &mut SearcherBuilder {
self.config.before_context = line_count;
self
}
/// Whether to enable the "passthru" feature or not.
///
/// When passthru is enabled, it effectively treats all non-matching lines
/// as contextual lines. In other words, enabling this is akin to
/// requesting an unbounded number of before and after contextual lines.
///
/// When passthru mode is enabled, any `before_context` or `after_context`
/// settings are ignored by setting them to `0`.
///
/// This is disabled by default.
pub fn passthru(&mut self, yes: bool) -> &mut SearcherBuilder {
self.config.passthru = yes;
self
}
/// Set an approximate limit on the amount of heap space used by a
/// searcher.
///
/// The heap limit is enforced in two scenarios:
///
/// * When searching using a fixed size buffer, the heap limit controls
/// how big this buffer is allowed to be. Assuming contexts are disabled,
/// the minimum size of this buffer is the length (in bytes) of the
/// largest single line in the contents being searched. If any line
/// exceeds the heap limit, then an error will be returned.
/// * When performing a multi line search, a fixed size buffer cannot be
/// used. Thus, the only choices are to read the entire contents on to
/// the heap, or use memory maps. In the former case, the heap limit set
/// here is enforced.
///
/// If a heap limit is set to `0`, then no heap space is used. If there are
/// no alternative strategies available for searching without heap space
/// (e.g., memory maps are disabled), then the searcher will return an error
/// immediately.
///
/// By default, no limit is set.
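///
/// # Example
///
/// A sketch of capping the searcher's own buffers at roughly 1 MB
/// (paths are assumptions). Any single line longer than this would
/// cause a search to return an error:
///
/// ```
/// use grep_searcher::SearcherBuilder;
///
/// let searcher = SearcherBuilder::new()
///     .heap_limit(Some(1 << 20)) // about 1 MB
///     .build();
/// ```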
pub fn heap_limit(
&mut self,
bytes: Option<usize>,
) -> &mut SearcherBuilder {
self.config.heap_limit = bytes;
self
}
/// Set the strategy for using memory maps.
///
/// Currently, there are only two strategies that can be employed:
///
/// * **Automatic** - A searcher will use heuristics, including but not
/// limited to file size and platform, to determine whether to use memory
/// maps or not.
/// * **Never** - Memory maps will never be used. If multi line search is
/// enabled, then the entire contents will be read on to the heap before
/// searching begins.
///
/// The default behavior is **never**. Generally speaking, command line
/// programs probably want to enable memory maps. The only reason to keep
/// memory maps disabled is if there are concerns about using them. For example,
/// if your process is searching a file backed memory map at the same time
/// that file is truncated, then it's possible for the process to terminate
/// with a bus error.
pub fn memory_map(
&mut self,
strategy: MmapChoice,
) -> &mut SearcherBuilder {
self.config.mmap = strategy;
self
}
/// Set the binary detection strategy.
///
/// The binary detection strategy determines not only how the searcher
/// detects binary data, but how it responds to the presence of binary
/// data. See the [`BinaryDetection`](struct.BinaryDetection.html) type
/// for more information.
///
/// By default, binary detection is disabled.
pub fn binary_detection(
&mut self,
detection: BinaryDetection,
) -> &mut SearcherBuilder {
self.config.binary = detection;
self
}
/// Set the encoding used to read the source data before searching.
///
/// When an encoding is provided, then the source data is _unconditionally_
/// transcoded using the encoding. This will disable BOM sniffing. If the
/// transcoding process encounters an error, then bytes are replaced with
/// the Unicode replacement codepoint.
///
/// When no encoding is specified (the default), then BOM sniffing is used
/// to determine whether the source data is UTF-8 or UTF-16, and
/// transcoding will be performed automatically. If no BOM could be found,
/// then the source data is searched _as if_ it were UTF-8. However, so
/// long as the source data is at least ASCII compatible, it is possible
/// for a search to produce useful results.
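///
/// # Example
///
/// A sketch of forcing a specific source encoding (paths are
/// assumptions; the label must be valid per the Encoding Standard):
///
/// ```
/// use grep_searcher::{Encoding, SearcherBuilder};
///
/// let searcher = SearcherBuilder::new()
///     .encoding(Some(Encoding::new("utf-16be").unwrap()))
///     .build();
/// ```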
pub fn encoding(
&mut self,
encoding: Option<Encoding>,
) -> &mut SearcherBuilder {
self.config.encoding = encoding;
self
}
}
/// A searcher executes searches over a haystack and writes results to a
/// caller-provided sink.
///
/// Matches are detected via implementations of the `Matcher` trait, which must
/// be provided by the caller when executing a search.
///
/// When possible, a searcher should be reused.
#[derive(Clone, Debug)]
pub struct Searcher {
/// The configuration for this searcher.
///
/// We make most of these settings available to users of `Searcher` via
/// public API methods, which can be queried in implementations of `Sink`
/// if necessary.
config: Config,
/// A builder for constructing a streaming reader that transcodes source
/// data according to either an explicitly specified encoding or via an
/// automatically detected encoding via BOM sniffing.
///
/// When no transcoding is needed, then the transcoder built will pass
/// through the underlying bytes with no additional overhead.
decode_builder: DecodeReaderBytesBuilder,
/// A buffer that is used for transcoding scratch space.
decode_buffer: RefCell<Vec<u8>>,
/// A line buffer for use in line oriented searching.
///
/// We wrap it in a RefCell to permit lending out borrows of `Searcher`
/// to sinks. We still require a mutable borrow to execute a search, so
/// we statically prevent callers from causing RefCell to panic at runtime
/// due to a borrowing violation.
line_buffer: RefCell<LineBuffer>,
/// A buffer in which to store the contents of a reader when performing a
/// multi line search. In particular, multi line searches cannot be
/// performed incrementally, and need the entire haystack in memory at
/// once.
multi_line_buffer: RefCell<Vec<u8>>,
}
impl Searcher {
/// Create a new searcher with a default configuration.
///
/// To configure the searcher (e.g., invert matching, enable memory maps,
/// enable contexts, etc.), use the
/// [`SearcherBuilder`](struct.SearcherBuilder.html).
pub fn new() -> Searcher {
SearcherBuilder::new().build()
}
/// Execute a search over the file with the given path and write the
/// results to the given sink.
///
/// If memory maps are enabled and the searcher heuristically believes
/// memory maps will help the search run faster, then this will use
/// memory maps. For this reason, callers should prefer using this method
/// or `search_file` over the more generic `search_reader` when possible.
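///
/// # Example
///
/// A sketch of an end-to-end search, assuming a `Matcher`
/// implementation such as `RegexMatcher` from the companion
/// `grep-regex` crate (its exact API is an assumption here):
///
/// ```no_run
/// use grep_regex::RegexMatcher;
/// use grep_searcher::Searcher;
/// use grep_searcher::sinks::UTF8;
///
/// # fn main() -> Result<(), Box<::std::error::Error>> {
/// let matcher = RegexMatcher::new(r"Watson")?;
/// Searcher::new().search_path(&matcher, "sherlock.txt", UTF8(
///     |lnum, line| {
///         print!("{}:{}", lnum, line);
///         Ok(true)
///     },
/// ))?;
/// # Ok(())
/// # }
/// ```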
pub fn search_path<P, M, S>(
&mut self,
matcher: M,
path: P,
write_to: S,
) -> Result<(), S::Error>
where P: AsRef<Path>,
M: Matcher,
S: Sink,
{
let path = path.as_ref();
let file = File::open(path).map_err(S::Error::error_io)?;
self.search_file_maybe_path(matcher, Some(path), &file, write_to)
}
/// Execute a search over a file and write the results to the given sink.
///
/// If memory maps are enabled and the searcher heuristically believes
/// memory maps will help the search run faster, then this will use
/// memory maps. For this reason, callers should prefer using this method
/// or `search_path` over the more generic `search_reader` when possible.
pub fn search_file<M, S>(
&mut self,
matcher: M,
file: &File,
write_to: S,
) -> Result<(), S::Error>
where M: Matcher,
S: Sink,
{
self.search_file_maybe_path(matcher, None, file, write_to)
}
fn search_file_maybe_path<M, S>(
&mut self,
matcher: M,
path: Option<&Path>,
file: &File,
write_to: S,
) -> Result<(), S::Error>
where M: Matcher,
S: Sink,
{
if let Some(mmap) = self.config.mmap.open(file, path) {
trace!("{:?}: searching via memory map", path);
return self.search_slice(matcher, &mmap, write_to);
}
// Fast path for multi-line searches of files when memory maps are
// not enabled. This pre-allocates a buffer roughly the size of the
// file, which isn't possible when searching an arbitrary io::Read.
if self.multi_line_with_matcher(&matcher) {
trace!("{:?}: reading entire file on to heap for mulitline", path);
self.fill_multi_line_buffer_from_file::<S>(file)?;
trace!("{:?}: searching via multiline strategy", path);
MultiLine::new(
self,
matcher,
&*self.multi_line_buffer.borrow(),
write_to,
).run()
} else {
trace!("{:?}: searching using generic reader", path);
self.search_reader(matcher, file, write_to)
}
}
/// Execute a search over any implementation of `io::Read` and write the
/// results to the given sink.
///
/// When possible, this implementation will search the reader incrementally
/// without reading it into memory. In some cases---for example, if multi
/// line search is enabled---an incremental search isn't possible and the
/// given reader is consumed completely and placed on the heap before
/// searching begins. For this reason, when multi line search is enabled,
/// one should try to use higher level APIs (e.g., searching by file or
/// file path) so that memory maps can be used if they are available and
/// enabled.
pub fn search_reader<M, R, S>(
&mut self,
matcher: M,
read_from: R,
write_to: S,
) -> Result<(), S::Error>
where M: Matcher,
R: io::Read,
S: Sink,
{
self.check_config(&matcher).map_err(S::Error::error_config)?;
let mut decode_buffer = self.decode_buffer.borrow_mut();
let read_from = self.decode_builder
.build_with_buffer(read_from, &mut *decode_buffer)
.map_err(S::Error::error_io)?;
if self.multi_line_with_matcher(&matcher) {
trace!("generic reader: reading everything to heap for multiline");
self.fill_multi_line_buffer_from_reader::<_, S>(read_from)?;
trace!("generic reader: searching via multiline strategy");
MultiLine::new(
self,
matcher,
&*self.multi_line_buffer.borrow(),
write_to,
).run()
} else {
let mut line_buffer = self.line_buffer.borrow_mut();
let rdr = LineBufferReader::new(read_from, &mut *line_buffer);
trace!("generic reader: searching via roll buffer strategy");
ReadByLine::new(self, matcher, rdr, write_to).run()
}
}
/// Execute a search over the given slice and write the results to the
/// given sink.
pub fn search_slice<M, S>(
&mut self,
matcher: M,
slice: &[u8],
write_to: S,
) -> Result<(), S::Error>
where M: Matcher,
S: Sink,
{
self.check_config(&matcher).map_err(S::Error::error_config)?;
// We can search the slice directly, unless we need to do transcoding.
if self.slice_needs_transcoding(slice) {
trace!("slice reader: needs transcoding, using generic reader");
return self.search_reader(matcher, slice, write_to);
}
if self.multi_line_with_matcher(&matcher) {
trace!("slice reader: searching via multiline strategy");
MultiLine::new(self, matcher, slice, write_to).run()
} else {
trace!("slice reader: searching via slice-by-line strategy");
SliceByLine::new(self, matcher, slice, write_to).run()
}
}
/// Check that the searcher's configuration and the matcher are consistent
/// with each other.
fn check_config<M: Matcher>(&self, matcher: M) -> Result<(), ConfigError> {
if self.config.heap_limit == Some(0)
&& !self.config.mmap.is_enabled()
{
return Err(ConfigError::SearchUnavailable);
}
let matcher_line_term = match matcher.line_terminator() {
None => return Ok(()),
Some(line_term) => line_term,
};
if matcher_line_term != self.config.line_term {
return Err(ConfigError::MismatchedLineTerminators {
matcher: matcher_line_term,
searcher: self.config.line_term,
});
}
Ok(())
}
/// Returns true if and only if the given slice needs to be transcoded.
fn slice_needs_transcoding(&self, slice: &[u8]) -> bool {
self.config.encoding.is_some() || slice_has_utf16_bom(slice)
}
}
/// The following methods permit querying the configuration of a searcher.
/// These can be useful in generic implementations of
/// [`Sink`](trait.Sink.html),
/// where the output may be tailored based on how the searcher is configured.
impl Searcher {
/// Returns the line terminator used by this searcher.
pub fn line_terminator(&self) -> LineTerminator {
self.config.line_term
}
/// Returns true if and only if this searcher is configured to invert its
/// search results. That is, matching lines are lines that do **not** match
/// the searcher's matcher.
pub fn invert_match(&self) -> bool {
self.config.invert_match
}
/// Returns true if and only if this searcher is configured to count line
/// numbers.
pub fn line_number(&self) -> bool {
self.config.line_number
}
/// Returns true if and only if this searcher is configured to perform
/// multi line search.
pub fn multi_line(&self) -> bool {
self.config.multi_line
}
/// Returns true if and only if this searcher will choose a multi-line
/// strategy given the provided matcher.
///
/// This may diverge from the result of `multi_line` in cases where the
/// searcher has been configured to execute a search that can report
/// matches over multiple lines, but where the matcher guarantees that it
/// will never produce a match over multiple lines.
pub fn multi_line_with_matcher<M: Matcher>(&self, matcher: M) -> bool {
if !self.multi_line() {
return false;
}
if let Some(line_term) = matcher.line_terminator() {
if line_term == self.line_terminator() {
return false;
}
}
if let Some(non_matching) = matcher.non_matching_bytes() {
// If the line terminator is CRLF, we don't actually need to care
// whether the regex can match `\r` or not. Namely, a `\r` is
// neither necessary nor sufficient to terminate a line. A `\n` is
// always required.
if non_matching.contains(self.line_terminator().as_byte()) {
return false;
}
}
true
}
/// Returns the number of "after" context lines to report. When context
/// reporting is not enabled, this returns `0`.
pub fn after_context(&self) -> usize {
self.config.after_context
}
/// Returns the number of "before" context lines to report. When context
/// reporting is not enabled, this returns `0`.
pub fn before_context(&self) -> usize {
self.config.before_context
}
/// Returns true if and only if the searcher has "passthru" mode enabled.
pub fn passthru(&self) -> bool {
self.config.passthru
}
/// Fill the buffer for use with multi-line searching from the given file.
/// This reads from the file until EOF or until an error occurs. If the
/// contents exceed the configured heap limit, then an error is returned.
fn fill_multi_line_buffer_from_file<S: Sink>(
&self,
file: &File,
) -> Result<(), S::Error> {
assert!(self.config.multi_line);
let mut decode_buffer = self.decode_buffer.borrow_mut();
let mut read_from = self.decode_builder
.build_with_buffer(file, &mut *decode_buffer)
.map_err(S::Error::error_io)?;
// If we don't have a heap limit, then we can defer to std's
// read_to_end implementation. fill_multi_line_buffer_from_reader will
// do this too, but since we have a File, we can be a bit smarter about
// pre-allocating here.
//
// If we're transcoding, then our pre-allocation might not be exact,
// but is probably still better than nothing.
if self.config.heap_limit.is_none() {
let mut buf = self.multi_line_buffer.borrow_mut();
buf.clear();
let cap = file
.metadata()
.map(|m| m.len() as usize + 1)
.unwrap_or(0);
buf.reserve(cap);
read_from.read_to_end(&mut *buf).map_err(S::Error::error_io)?;
return Ok(());
}
self.fill_multi_line_buffer_from_reader::<_, S>(read_from)
}
/// Fill the buffer for use with multi-line searching from the given
/// reader. This reads from the reader until EOF or until an error occurs.
/// If the contents exceed the configured heap limit, then an error is
/// returned.
fn fill_multi_line_buffer_from_reader<R: io::Read, S: Sink>(
&self,
mut read_from: R,
) -> Result<(), S::Error> {
assert!(self.config.multi_line);
let mut buf = self.multi_line_buffer.borrow_mut();
buf.clear();
// If we don't have a heap limit, then we can defer to std's
// read_to_end implementation...
let heap_limit = match self.config.heap_limit {
Some(heap_limit) => heap_limit,
None => {
read_from.read_to_end(&mut *buf).map_err(S::Error::error_io)?;
return Ok(());
}
};
if heap_limit == 0 {
return Err(S::Error::error_io(alloc_error(heap_limit)));
}
// ... otherwise we need to roll our own. This is likely quite a bit
// slower than what is optimal, but we avoid worrying about memory safety
// until there's a compelling reason to speed this up.
buf.resize(cmp::min(DEFAULT_BUFFER_CAPACITY, heap_limit), 0);
let mut pos = 0;
loop {
let nread = match read_from.read(&mut buf[pos..]) {
Ok(nread) => nread,
Err(ref err) if err.kind() == io::ErrorKind::Interrupted => {
continue;
}
Err(err) => return Err(S::Error::error_io(err)),
};
if nread == 0 {
buf.resize(pos, 0);
return Ok(());
}
pos += nread;
if buf[pos..].is_empty() {
let additional = heap_limit - buf.len();
if additional == 0 {
return Err(S::Error::error_io(alloc_error(heap_limit)));
}
let limit = buf.len() + additional;
let doubled = 2 * buf.len();
buf.resize(cmp::min(doubled, limit), 0);
}
}
}
}
/// Returns true if and only if the given slice begins with a UTF-16 BOM.
///
/// This is used by the searcher to determine if a transcoder is necessary.
/// Otherwise, it is advantageous to search the slice directly.
fn slice_has_utf16_bom(slice: &[u8]) -> bool {
let enc = match encoding_rs::Encoding::for_bom(slice) {
None => return false,
Some((enc, _)) => enc,
};
[encoding_rs::UTF_16LE, encoding_rs::UTF_16BE].contains(&enc)
}
#[cfg(test)]
mod tests {
use testutil::{KitchenSink, RegexMatcher};
use super::*;
#[test]
fn config_error_heap_limit() {
let matcher = RegexMatcher::new("");
let sink = KitchenSink::new();
let mut searcher = SearcherBuilder::new()
.heap_limit(Some(0))
.build();
let res = searcher.search_slice(matcher, &[], sink);
assert!(res.is_err());
}
#[test]
fn config_error_line_terminator() {
let mut matcher = RegexMatcher::new("");
matcher.set_line_term(Some(LineTerminator::byte(b'z')));
let sink = KitchenSink::new();
let mut searcher = Searcher::new();
let res = searcher.search_slice(matcher, &[], sink);
assert!(res.is_err());
}
}

559
grep-searcher/src/sink.rs Normal file

@@ -0,0 +1,559 @@
use std::fmt;
use std::io;
use grep_matcher::LineTerminator;
use lines::LineIter;
use searcher::{ConfigError, Searcher};
/// A trait that describes errors that can be reported by searchers and
/// implementations of `Sink`.
///
/// Unless you have a specialized use case, you probably don't need to
/// implement this trait explicitly. It's likely that using `io::Error` (which
/// implements this trait) for your error type is good enough, largely because
/// most errors that occur during search will likely be an `io::Error`.
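///
/// # Example
///
/// A sketch of a custom error type satisfying this trait (all names
/// here are illustrative, and the import path is an assumption):
///
/// ```
/// use std::fmt;
///
/// use grep_searcher::SinkError;
///
/// #[derive(Debug)]
/// struct MyError(String);
///
/// impl SinkError for MyError {
///     fn error_message<T: fmt::Display>(message: T) -> MyError {
///         // `error_io` and `error_config` fall back to this
///         // constructor by default.
///         MyError(message.to_string())
///     }
/// }
/// ```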
pub trait SinkError: Sized {
/// A constructor for converting any value that satisfies the
/// `fmt::Display` trait into an error.
fn error_message<T: fmt::Display>(message: T) -> Self;
/// A constructor for converting I/O errors that occur while searching into
/// an error of this type.
///
/// By default, this is implemented via the `error_message` constructor.
fn error_io(err: io::Error) -> Self {
Self::error_message(err)
}
/// A constructor for converting configuration errors that occur while
/// building a searcher into an error of this type.
///
/// By default, this is implemented via the `error_message` constructor.
fn error_config(err: ConfigError) -> Self {
Self::error_message(err)
}
}
/// An `io::Error` can be used as an error for `Sink` implementations out of
/// the box.
impl SinkError for io::Error {
fn error_message<T: fmt::Display>(message: T) -> io::Error {
io::Error::new(io::ErrorKind::Other, message.to_string())
}
fn error_io(err: io::Error) -> io::Error {
err
}
}
/// A `Box<std::error::Error>` can be used as an error for `Sink`
/// implementations out of the box.
impl SinkError for Box<::std::error::Error> {
fn error_message<T: fmt::Display>(message: T) -> Box<::std::error::Error> {
Box::<::std::error::Error>::from(message.to_string())
}
}
/// A trait that defines how results from searchers are handled.
///
/// In this crate, a searcher follows the "push" model. What that means is that
/// the searcher drives execution, and pushes results back to the caller. This
/// is in contrast to a "pull" model where the caller drives execution and
/// takes results as they need them. These are also known as "internal" and
/// "external" iteration strategies, respectively.
///
/// For a variety of reasons, including the complexity of the searcher
/// implementation, this crate chooses the "push" or "internal" model of
/// execution. Thus, in order to act on search results, callers must provide
/// an implementation of this trait to a searcher, and the searcher is then
/// responsible for calling the methods on this trait.
///
/// This trait defines six behaviors:
///
/// * What to do when a match is found. Callers must provide this.
/// * What to do when an error occurs. Callers must provide this via the
/// [`SinkError`](trait.SinkError.html) trait. Generally, callers can just
/// use `io::Error` for this, which already implements `SinkError`.
/// * What to do when a contextual line is found. By default, these are
/// ignored.
/// * What to do when a gap between contextual lines has been found. By
/// default, this is ignored.
/// * What to do when a search has started. By default, this does nothing.
/// * What to do when a search has finished successfully. By default, this does
/// nothing.
///
/// Callers must, at minimum, specify the behavior when an error occurs and
/// the behavior when a match occurs. The rest is optional. For each behavior,
/// callers may report an error (say, if writing the result to another
/// location failed) or simply return `false` if they want the search to stop
/// (e.g., when implementing a cap on the number of search results to show).
///
/// When errors are reported (whether in the searcher or in the implementation
/// of `Sink`), then searchers quit immediately without calling `finish`.
///
/// For simpler uses of `Sink`, callers may elect to use one of
/// the more convenient but less flexible implementations in the
/// [`sinks`](sinks/index.html) module.
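///
/// # Example
///
/// A sketch of a bare-bones `Sink` that counts matching lines and stops
/// after a cap (all names here are illustrative; import paths are
/// assumptions):
///
/// ```
/// use std::io;
///
/// use grep_searcher::{Searcher, Sink, SinkMatch};
///
/// struct CountWithCap {
///     count: u64,
///     cap: u64,
/// }
///
/// impl Sink for CountWithCap {
///     type Error = io::Error;
///
///     fn matched(
///         &mut self,
///         _searcher: &Searcher,
///         _mat: &SinkMatch,
///     ) -> Result<bool, io::Error> {
///         self.count += 1;
///         // Returning false stops the search early; `finish` is
///         // still called in that case.
///         Ok(self.count < self.cap)
///     }
/// }
/// ```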
pub trait Sink {
/// The type of an error that should be reported by a searcher.
///
/// Errors of this type are not only returned by the methods on this
/// trait, but the constructors defined in `SinkError` are also used in
/// the searcher implementation itself, e.g., when an I/O error occurs while
/// reading data from a file.
type Error: SinkError;
/// This method is called whenever a match is found.
///
/// If multi line is enabled on the searcher, then the match reported here
/// may span multiple lines and it may include multiple matches. When multi
/// line is disabled, then the match is guaranteed to span exactly one
/// non-empty line (where a single line is, at minimum, a line terminator).
///
/// If this returns `true`, then searching continues. If this returns
/// `false`, then searching is stopped immediately and `finish` is called.
///
/// If this returns an error, then searching is stopped immediately,
/// `finish` is not called and the error is bubbled back up to the caller
/// of the searcher.
fn matched(
&mut self,
_searcher: &Searcher,
_mat: &SinkMatch,
) -> Result<bool, Self::Error>;
/// This method is called whenever a context line is found, and is optional
/// to implement. By default, it does nothing and returns `true`.
///
/// In all cases, the context given is guaranteed to span exactly one
/// non-empty line (where a single line is, at minimum, a line terminator).
///
/// If this returns `true`, then searching continues. If this returns
/// `false`, then searching is stopped immediately and `finish` is called.
///
/// If this returns an error, then searching is stopped immediately,
/// `finish` is not called and the error is bubbled back up to the caller
/// of the searcher.
#[inline]
fn context(
&mut self,
_searcher: &Searcher,
_context: &SinkContext,
) -> Result<bool, Self::Error> {
Ok(true)
}
/// This method is called whenever a break in contextual lines is found,
/// and is optional to implement. By default, it does nothing and returns
/// `true`.
///
/// A break can only occur when context reporting is enabled (that is,
/// either or both of `before_context` or `after_context` are greater than
/// `0`). More precisely, a break occurs between non-contiguous groups of
/// lines.
///
/// If this returns `true`, then searching continues. If this returns
/// `false`, then searching is stopped immediately and `finish` is called.
///
/// If this returns an error, then searching is stopped immediately,
/// `finish` is not called and the error is bubbled back up to the caller
/// of the searcher.
#[inline]
fn context_break(
&mut self,
_searcher: &Searcher,
) -> Result<bool, Self::Error> {
Ok(true)
}
/// This method is called when a search has begun, before any search is
/// executed. By default, this does nothing.
///
/// If this returns `true`, then searching continues. If this returns
/// `false`, then searching is stopped immediately and `finish` is called.
///
/// If this returns an error, then searching is stopped immediately,
/// `finish` is not called and the error is bubbled back up to the caller
/// of the searcher.
#[inline]
fn begin(
&mut self,
_searcher: &Searcher,
) -> Result<bool, Self::Error> {
Ok(true)
}
/// This method is called when a search has completed. By default, this
/// does nothing.
///
/// If this returns an error, the error is bubbled back up to the caller of
/// the searcher.
#[inline]
fn finish(
&mut self,
_searcher: &Searcher,
_: &SinkFinish,
) -> Result<(), Self::Error> {
Ok(())
}
}
impl<'a, S: Sink> Sink for &'a mut S {
type Error = S::Error;
#[inline]
fn matched(
&mut self,
searcher: &Searcher,
mat: &SinkMatch,
) -> Result<bool, S::Error> {
(**self).matched(searcher, mat)
}
#[inline]
fn context(
&mut self,
searcher: &Searcher,
context: &SinkContext,
) -> Result<bool, S::Error> {
(**self).context(searcher, context)
}
#[inline]
fn context_break(
&mut self,
searcher: &Searcher,
) -> Result<bool, S::Error> {
(**self).context_break(searcher)
}
#[inline]
fn begin(
&mut self,
searcher: &Searcher,
) -> Result<bool, S::Error> {
(**self).begin(searcher)
}
#[inline]
fn finish(
&mut self,
searcher: &Searcher,
sink_finish: &SinkFinish,
) -> Result<(), S::Error> {
(**self).finish(searcher, sink_finish)
}
}
/// Summary data reported at the end of a search.
///
/// This reports data such as the total number of bytes searched and the
/// absolute offset of the first occurrence of binary data, if any was found.
///
/// A searcher that stops early because of an error does not call `finish`.
/// A searcher that stops early because the `Sink` implementor instructed it
/// to will still call `finish`.
#[derive(Clone, Debug)]
pub struct SinkFinish {
pub(crate) byte_count: u64,
pub(crate) binary_byte_offset: Option<u64>,
}
impl SinkFinish {
/// Return the total number of bytes searched.
#[inline]
pub fn byte_count(&self) -> u64 {
self.byte_count
}
/// If binary detection is enabled and if binary data was found, then this
/// returns the absolute byte offset of the first detected byte of binary
/// data.
///
/// Note that since this is an absolute byte offset, it cannot be relied
/// upon to index into any addressable memory.
#[inline]
pub fn binary_byte_offset(&self) -> Option<u64> {
self.binary_byte_offset
}
}
/// A type that describes a match reported by a searcher.
#[derive(Clone, Debug)]
pub struct SinkMatch<'b> {
pub(crate) line_term: LineTerminator,
pub(crate) bytes: &'b [u8],
pub(crate) absolute_byte_offset: u64,
pub(crate) line_number: Option<u64>,
}
impl<'b> SinkMatch<'b> {
/// Returns the bytes for all matching lines, including the line
/// terminators, if they exist.
#[inline]
pub fn bytes(&self) -> &'b [u8] {
self.bytes
}
/// Return an iterator over the lines in this match.
///
/// If multi line search is enabled, then this may yield more than one
/// line (but always at least one line). If multi line search is disabled,
/// then this always reports exactly one line (but may consist of just
/// the line terminator).
///
/// Lines yielded by this iterator include their terminators.
#[inline]
pub fn lines(&self) -> LineIter<'b> {
LineIter::new(self.line_term.as_byte(), self.bytes)
}
/// Returns the absolute byte offset of the start of this match. This
/// offset is absolute in that it is relative to the very beginning of the
/// input in a search, and can never be relied upon to be a valid index
/// into an in-memory slice.
#[inline]
pub fn absolute_byte_offset(&self) -> u64 {
self.absolute_byte_offset
}
/// Returns the line number of the first line in this match, if available.
///
/// Line numbers are only available when the search builder is instructed
/// to compute them.
#[inline]
pub fn line_number(&self) -> Option<u64> {
self.line_number
}
}
/// The type of context reported by a searcher.
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum SinkContextKind {
/// The line reported occurred before a match.
Before,
/// The line reported occurred after a match.
After,
/// Any other type of context reported, e.g., as a result of a searcher's
/// "passthru" mode.
Other,
}
/// A type that describes a contextual line reported by a searcher.
#[derive(Clone, Debug)]
pub struct SinkContext<'b> {
pub(crate) line_term: LineTerminator,
pub(crate) bytes: &'b [u8],
pub(crate) kind: SinkContextKind,
pub(crate) absolute_byte_offset: u64,
pub(crate) line_number: Option<u64>,
}
impl<'b> SinkContext<'b> {
/// Returns the context bytes, including line terminators.
#[inline]
pub fn bytes(&self) -> &'b [u8] {
self.bytes
}
/// Returns the type of context.
#[inline]
pub fn kind(&self) -> &SinkContextKind {
&self.kind
}
/// Return an iterator over the lines in this context.
///
/// This always yields exactly one line (and that one line may contain just
/// the line terminator).
///
/// Lines yielded by this iterator include their terminators.
#[cfg(test)]
pub(crate) fn lines(&self) -> LineIter<'b> {
LineIter::new(self.line_term.as_byte(), self.bytes)
}
/// Returns the absolute byte offset of the start of this context. This
/// offset is absolute in that it is relative to the very beginning of the
/// input in a search, and can never be relied upon to be a valid index
/// into an in-memory slice.
#[inline]
pub fn absolute_byte_offset(&self) -> u64 {
self.absolute_byte_offset
}
/// Returns the line number of the first line in this context, if
/// available.
///
/// Line numbers are only available when the search builder is instructed
/// to compute them.
#[inline]
pub fn line_number(&self) -> Option<u64> {
self.line_number
}
}
/// A collection of convenience implementations of `Sink`.
///
/// Each implementation in this module makes some kind of sacrifice in the name
/// of making common cases easier to use. Most frequently, each type is a
/// wrapper around a closure specified by the caller that provides limited
/// access to the full suite of information available to implementors of
/// `Sink`.
///
/// For example, the `UTF8` sink makes the following sacrifices:
///
/// * All matches must be UTF-8. An arbitrary `Sink` does not have this
/// restriction and can deal with arbitrary data. If this sink sees invalid
/// UTF-8, then an error is returned and searching stops. (Use the `Lossy`
/// sink instead to suppress this error.)
/// * The searcher must be configured to report line numbers. If it isn't,
/// an error is reported at the first match and searching stops.
/// * Context lines, context breaks and summary data reported at the end of
/// a search are all ignored.
/// * Implementors are forced to use `io::Error` as their error type.
///
/// If you need more flexibility, then you're advised to implement the `Sink`
/// trait directly.
pub mod sinks {
use std::io;
use std::str;
use searcher::Searcher;
use super::{Sink, SinkError, SinkMatch};
/// A sink that provides line numbers and matches as strings while ignoring
/// everything else.
///
/// This implementation will return an error if a match contains invalid
/// UTF-8 or if the searcher was not configured to count lines. Errors
/// on invalid UTF-8 can be suppressed by using the `Lossy` sink instead
/// of this one.
///
/// The closure accepts two parameters: a line number and a UTF-8 string
/// containing the matched data. The closure returns a
/// `Result<bool, io::Error>`. If the `bool` is `false`, then the search
/// stops immediately. Otherwise, searching continues.
///
/// If multi line mode was enabled, the line number refers to the line
/// number of the first line in the match.
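///
/// # Example
///
/// A sketch of collecting matching lines from an in-memory haystack,
/// assuming a `Matcher` from the companion `grep-regex` crate (paths
/// and APIs outside this file are assumptions):
///
/// ```
/// use grep_regex::RegexMatcher;
/// use grep_searcher::Searcher;
/// use grep_searcher::sinks::UTF8;
///
/// # fn main() -> Result<(), Box<::std::error::Error>> {
/// let matcher = RegexMatcher::new("b")?;
/// let mut lines = vec![];
/// Searcher::new().search_slice(&matcher, b"a\nb\nc\n", UTF8(
///     |lnum, line| {
///         lines.push((lnum, line.to_string()));
///         Ok(true)
///     },
/// ))?;
/// assert_eq!(lines, vec![(2, "b\n".to_string())]);
/// # Ok(())
/// # }
/// ```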
#[derive(Clone, Debug)]
pub struct UTF8<F>(pub F)
where F: FnMut(u64, &str) -> Result<bool, io::Error>;
impl<F> Sink for UTF8<F>
where F: FnMut(u64, &str) -> Result<bool, io::Error>
{
type Error = io::Error;
fn matched(
&mut self,
_searcher: &Searcher,
mat: &SinkMatch,
) -> Result<bool, io::Error> {
let matched = match str::from_utf8(mat.bytes()) {
Ok(matched) => matched,
Err(err) => return Err(io::Error::error_message(err)),
};
let line_number = match mat.line_number() {
Some(line_number) => line_number,
None => {
let msg = "line numbers not enabled";
return Err(io::Error::error_message(msg));
}
};
(self.0)(line_number, &matched)
}
}
/// A sink that provides line numbers and matches as (lossily converted)
/// strings while ignoring everything else.
///
/// This is like `UTF8`, except that if a match contains invalid UTF-8,
/// then it will be lossily converted to valid UTF-8 by substituting
/// invalid UTF-8 with Unicode replacement characters.
///
/// This implementation will return an error on the first match if the
/// searcher was not configured to count lines.
///
/// The closure accepts two parameters: a line number and a UTF-8 string
/// containing the matched data. The closure returns a
/// `Result<bool, io::Error>`. If the `bool` is `false`, then the search
/// stops immediately. Otherwise, searching continues.
///
/// If multi line mode was enabled, the line number refers to the line
/// number of the first line in the match.
#[derive(Clone, Debug)]
pub struct Lossy<F>(pub F)
where F: FnMut(u64, &str) -> Result<bool, io::Error>;
impl<F> Sink for Lossy<F>
where F: FnMut(u64, &str) -> Result<bool, io::Error>
{
type Error = io::Error;
fn matched(
&mut self,
_searcher: &Searcher,
mat: &SinkMatch,
) -> Result<bool, io::Error> {
use std::borrow::Cow;
let matched = match str::from_utf8(mat.bytes()) {
Ok(matched) => Cow::Borrowed(matched),
// TODO: In theory, it should be possible to amortize
// allocation here, but `std` doesn't provide such an API.
// Regardless, this only happens on matches with invalid UTF-8,
// which should be pretty rare.
Err(_) => String::from_utf8_lossy(mat.bytes()),
};
let line_number = match mat.line_number() {
Some(line_number) => line_number,
None => {
let msg = "line numbers not enabled";
return Err(io::Error::error_message(msg));
}
};
(self.0)(line_number, &matched)
}
}
/// A sink that provides line numbers and matches as raw bytes while
/// ignoring everything else.
///
/// This implementation will return an error on the first match if the
/// searcher was not configured to count lines.
///
/// The closure accepts two parameters: a line number and a raw byte string
/// containing the matched data. The closure returns a `Result<bool,
/// io::Error>`. If the `bool` is `false`, then the search stops
/// immediately. Otherwise, searching continues.
///
/// If multi line mode was enabled, the line number refers to the line
/// number of the first line in the match.
#[derive(Clone, Debug)]
pub struct Bytes<F>(pub F)
where F: FnMut(u64, &[u8]) -> Result<bool, io::Error>;
impl<F> Sink for Bytes<F>
where F: FnMut(u64, &[u8]) -> Result<bool, io::Error>
{
type Error = io::Error;
fn matched(
&mut self,
_searcher: &Searcher,
mat: &SinkMatch,
) -> Result<bool, io::Error> {
let line_number = match mat.line_number() {
Some(line_number) => line_number,
None => {
let msg = "line numbers not enabled";
return Err(io::Error::error_message(msg));
}
};
(self.0)(line_number, mat.bytes())
}
}
}


@@ -0,0 +1,787 @@
use std::io::{self, Write};
use std::str;
use grep_matcher::{
LineMatchKind, LineTerminator, Match, Matcher, NoCaptures, NoError,
};
use memchr::memchr;
use regex::bytes::{Regex, RegexBuilder};
use searcher::{BinaryDetection, Searcher, SearcherBuilder};
use sink::{Sink, SinkContext, SinkFinish, SinkMatch};
/// A simple regex matcher.
///
/// This supports setting the matcher's line terminator configuration directly,
/// which we use for testing purposes. That is, the caller explicitly
/// determines whether the line terminator optimization is enabled. (In reality
/// this optimization is detected automatically by inspecting and possibly
/// modifying the regex itself.)
#[derive(Clone, Debug)]
pub struct RegexMatcher {
regex: Regex,
line_term: Option<LineTerminator>,
every_line_is_candidate: bool,
}
impl RegexMatcher {
/// Create a new regex matcher.
pub fn new(pattern: &str) -> RegexMatcher {
let regex = RegexBuilder::new(pattern)
.multi_line(true) // permits ^ and $ to match at \n boundaries
.build()
.unwrap();
RegexMatcher {
regex: regex,
line_term: None,
every_line_is_candidate: false,
}
}
/// Forcefully set the line terminator of this matcher.
///
/// By default, this matcher has no line terminator set.
pub fn set_line_term(
&mut self,
line_term: Option<LineTerminator>,
) -> &mut RegexMatcher {
self.line_term = line_term;
self
}
/// Whether to return every line as a candidate or not.
///
/// This forces searchers to handle the case of reporting a false positive.
pub fn every_line_is_candidate(
&mut self,
yes: bool,
) -> &mut RegexMatcher {
self.every_line_is_candidate = yes;
self
}
}
impl Matcher for RegexMatcher {
type Captures = NoCaptures;
type Error = NoError;
fn find_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, NoError> {
Ok(self.regex
.find_at(haystack, at)
.map(|m| Match::new(m.start(), m.end())))
}
fn new_captures(&self) -> Result<NoCaptures, NoError> {
Ok(NoCaptures::new())
}
fn line_terminator(&self) -> Option<LineTerminator> {
self.line_term
}
fn find_candidate_line(
&self,
haystack: &[u8],
) -> Result<Option<LineMatchKind>, NoError> {
if self.every_line_is_candidate {
assert!(self.line_term.is_some());
if haystack.is_empty() {
return Ok(None);
}
// Make it interesting and return the last byte in the current
// line.
let i = memchr(self.line_term.unwrap().as_byte(), haystack)
.unwrap_or(haystack.len() - 1);
Ok(Some(LineMatchKind::Candidate(i)))
} else {
Ok(self.shortest_match(haystack)?.map(LineMatchKind::Confirmed))
}
}
}
/// An implementation of Sink that prints all available information.
///
/// This is useful for tests because it lets us easily confirm whether data
/// is being passed to Sink correctly.
#[derive(Clone, Debug)]
pub struct KitchenSink(Vec<u8>);
impl KitchenSink {
/// Create a new implementation of Sink that includes everything in the
/// kitchen.
pub fn new() -> KitchenSink {
KitchenSink(vec![])
}
/// Return the data written to this sink.
pub fn as_bytes(&self) -> &[u8] {
&self.0
}
}
impl Sink for KitchenSink {
type Error = io::Error;
fn matched(
&mut self,
_searcher: &Searcher,
mat: &SinkMatch,
) -> Result<bool, io::Error> {
assert!(!mat.bytes().is_empty());
assert!(mat.lines().count() >= 1);
let mut line_number = mat.line_number();
let mut byte_offset = mat.absolute_byte_offset();
for line in mat.lines() {
if let Some(ref mut n) = line_number {
write!(self.0, "{}:", n)?;
*n += 1;
}
write!(self.0, "{}:", byte_offset)?;
byte_offset += line.len() as u64;
self.0.write_all(line)?;
}
Ok(true)
}
fn context(
&mut self,
_searcher: &Searcher,
context: &SinkContext,
) -> Result<bool, io::Error> {
assert!(!context.bytes().is_empty());
assert!(context.lines().count() == 1);
if let Some(line_number) = context.line_number() {
write!(self.0, "{}-", line_number)?;
}
write!(self.0, "{}-", context.absolute_byte_offset)?;
self.0.write_all(context.bytes())?;
Ok(true)
}
fn context_break(
&mut self,
_searcher: &Searcher,
) -> Result<bool, io::Error> {
self.0.write_all(b"--\n")?;
Ok(true)
}
fn finish(
&mut self,
_searcher: &Searcher,
sink_finish: &SinkFinish,
) -> Result<(), io::Error> {
writeln!(self.0, "")?;
writeln!(self.0, "byte count:{}", sink_finish.byte_count())?;
if let Some(offset) = sink_finish.binary_byte_offset() {
writeln!(self.0, "binary offset:{}", offset)?;
}
Ok(())
}
}
/// A type for expressing tests on a searcher.
///
/// The searcher code has a lot of different code paths, mostly for the
/// purposes of optimizing a bunch of different use cases. The intent of the
/// searcher is to pick the best code path based on the configuration, which
/// means there is no obviously direct way to ask that a specific code path
/// be exercised. Thus, the purpose of this tester is to explicitly check as
/// many code paths as make sense.
///
/// The tester works by assuming you want to test all pertinent code paths.
/// These can be trimmed down as necessary via the various builder methods.
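///
/// # Example
///
/// A sketch of how the tests below drive this type (the expected
/// strings follow `KitchenSink`'s output format and are illustrative):
///
/// ```ignore
/// SearcherTester::new("a\nb\nc\n", "b")
///     .expected_no_line_number("2:b\n\nbyte count:6\n")
///     .expected_with_line_number("2:2:b\n\nbyte count:6\n")
///     .test();
/// ```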
#[derive(Debug)]
pub struct SearcherTester {
haystack: String,
pattern: String,
filter: Option<::regex::Regex>,
print_labels: bool,
expected_no_line_number: Option<String>,
expected_with_line_number: Option<String>,
expected_slice_no_line_number: Option<String>,
expected_slice_with_line_number: Option<String>,
by_line: bool,
multi_line: bool,
invert_match: bool,
line_number: bool,
binary: BinaryDetection,
auto_heap_limit: bool,
after_context: usize,
before_context: usize,
passthru: bool,
}
impl SearcherTester {
/// Create a new tester for testing searchers.
pub fn new(haystack: &str, pattern: &str) -> SearcherTester {
SearcherTester {
haystack: haystack.to_string(),
pattern: pattern.to_string(),
filter: None,
print_labels: false,
expected_no_line_number: None,
expected_with_line_number: None,
expected_slice_no_line_number: None,
expected_slice_with_line_number: None,
by_line: true,
multi_line: true,
invert_match: false,
line_number: true,
binary: BinaryDetection::none(),
auto_heap_limit: true,
after_context: 0,
before_context: 0,
passthru: false,
}
}
/// Execute the test. If the test succeeds, then this returns successfully.
/// If the test fails, then it panics with an informative message.
pub fn test(&self) {
// Check for configuration errors.
if self.expected_no_line_number.is_none() {
panic!("an 'expected' string with NO line numbers must be given");
}
if self.line_number && self.expected_with_line_number.is_none() {
panic!("an 'expected' string with line numbers must be given, \
or disable testing with line numbers");
}
let configs = self.configs();
if configs.is_empty() {
panic!("test configuration resulted in nothing being tested");
}
if self.print_labels {
for config in &configs {
let labels = vec![
format!("reader-{}", config.label),
format!("slice-{}", config.label),
];
for label in &labels {
if self.include(label) {
println!("{}", label);
} else {
println!("{} (ignored)", label);
}
}
}
}
for config in &configs {
let label = format!("reader-{}", config.label);
if self.include(&label) {
let got = config.search_reader(&self.haystack);
assert_eq_printed!(config.expected_reader, got, "{}", label);
}
let label = format!("slice-{}", config.label);
if self.include(&label) {
let got = config.search_slice(&self.haystack);
assert_eq_printed!(config.expected_slice, got, "{}", label);
}
}
}
/// Set a regex pattern to filter the tests that are run.
///
/// By default, no filter is present. When a filter is set, only test
/// configurations with a label matching the given pattern will be run.
///
/// This is often useful when debugging tests, e.g., when you want to do
/// printf debugging and only want one particular test configuration to
/// execute.
#[allow(dead_code)]
pub fn filter(&mut self, pattern: &str) -> &mut SearcherTester {
self.filter = Some(::regex::Regex::new(pattern).unwrap());
self
}
/// When set, the labels for all test configurations are printed before
/// executing any test.
///
/// Note that in order to see these in tests that aren't failing, you'll
/// want to use `cargo test -- --nocapture`.
#[allow(dead_code)]
pub fn print_labels(&mut self, yes: bool) -> &mut SearcherTester {
self.print_labels = yes;
self
}
/// Set the expected search results, without line numbers.
pub fn expected_no_line_number(
&mut self,
exp: &str,
) -> &mut SearcherTester {
self.expected_no_line_number = Some(exp.to_string());
self
}
/// Set the expected search results, with line numbers.
pub fn expected_with_line_number(
&mut self,
exp: &str,
) -> &mut SearcherTester {
self.expected_with_line_number = Some(exp.to_string());
self
}
/// Set the expected search results, without line numbers, when performing
/// a search on a slice. When not present, `expected_no_line_number` is
/// used instead.
pub fn expected_slice_no_line_number(
&mut self,
exp: &str,
) -> &mut SearcherTester {
self.expected_slice_no_line_number = Some(exp.to_string());
self
}
/// Set the expected search results, with line numbers, when performing a
/// search on a slice. When not present, `expected_with_line_number` is
/// used instead.
#[allow(dead_code)]
pub fn expected_slice_with_line_number(
&mut self,
exp: &str,
) -> &mut SearcherTester {
self.expected_slice_with_line_number = Some(exp.to_string());
self
}
/// Whether to test search with line numbers or not.
///
/// This is enabled by default. When enabled, the string that is expected
/// when line numbers are present must be provided. Otherwise, the expected
/// string isn't required.
pub fn line_number(&mut self, yes: bool) -> &mut SearcherTester {
self.line_number = yes;
self
}
/// Whether to test search using the line-by-line searcher or not.
///
/// By default, this is enabled.
pub fn by_line(&mut self, yes: bool) -> &mut SearcherTester {
self.by_line = yes;
self
}
/// Whether to test search using the multi line searcher or not.
///
/// By default, this is enabled.
#[allow(dead_code)]
pub fn multi_line(&mut self, yes: bool) -> &mut SearcherTester {
self.multi_line = yes;
self
}
/// Whether to perform an inverted search or not.
///
/// By default, this is disabled.
pub fn invert_match(&mut self, yes: bool) -> &mut SearcherTester {
self.invert_match = yes;
self
}
/// Whether to enable binary detection on all searches.
///
/// By default, this is disabled.
pub fn binary_detection(
&mut self,
detection: BinaryDetection,
) -> &mut SearcherTester {
self.binary = detection;
self
}
/// Whether to automatically attempt to test the heap limit setting or not.
///
/// By default, one of the test configurations includes setting the heap
/// limit to its minimal value for normal operation, which checks that
/// everything works even at the extremes. However, in some cases, the heap
/// limit can (expectedly) alter the output slightly. For example, it can
/// impact the number of bytes searched when performing binary detection.
/// For convenience, it can be useful to disable the automatic heap limit
/// test.
pub fn auto_heap_limit(&mut self, yes: bool) -> &mut SearcherTester {
self.auto_heap_limit = yes;
self
}
/// Set the number of lines to include in the "after" context.
///
/// The default is `0`, which is equivalent to not printing any context.
pub fn after_context(&mut self, lines: usize) -> &mut SearcherTester {
self.after_context = lines;
self
}
/// Set the number of lines to include in the "before" context.
///
/// The default is `0`, which is equivalent to not printing any context.
pub fn before_context(&mut self, lines: usize) -> &mut SearcherTester {
self.before_context = lines;
self
}
/// Whether to enable the "passthru" feature or not.
///
/// When passthru is enabled, it effectively treats all non-matching lines
/// as contextual lines. In other words, enabling this is akin to
/// requesting an unbounded number of before and after contextual lines.
///
/// This is disabled by default.
pub fn passthru(&mut self, yes: bool) -> &mut SearcherTester {
self.passthru = yes;
self
}
/// Return the minimum size of a buffer required for a successful search.
///
/// Generally, this corresponds to the maximum length of a line (including
/// its terminator), but if context settings are enabled, then this must
/// include the sum of the longest N lines.
///
/// Note that this must account for whether the test is using multi line
/// search or not, since multi line search requires being able to fit the
/// entire haystack into memory.
fn minimal_heap_limit(&self, multi_line: bool) -> usize {
if multi_line {
1 + self.haystack.len()
} else if self.before_context == 0 && self.after_context == 0 {
1 + self.haystack.lines().map(|s| s.len()).max().unwrap_or(0)
} else {
let mut lens: Vec<usize> =
self.haystack.lines().map(|s| s.len()).collect();
lens.sort();
lens.reverse();
let context_count =
if self.passthru {
self.haystack.lines().count()
} else {
// Why do we add 2 here? Well, we need to add 1 in order to
// have room to search at least one line. We add another
// because the implementation will occasionally include
// an additional line when handling the context. There's
// no particularly good reason, other than keeping the
// implementation simple.
2 + self.before_context + self.after_context
};
// We add 1 to each line since `str::lines` doesn't include the
// line terminator.
lens.into_iter()
.take(context_count)
.map(|len| len + 1)
.sum::<usize>()
}
}
/// Returns true if and only if the given label should be included as part
/// of executing `test`.
///
/// Inclusion is determined by the filter specified. If no filter has been
/// given, then this always returns `true`.
fn include(&self, label: &str) -> bool {
let re = match self.filter {
None => return true,
Some(ref re) => re,
};
re.is_match(label)
}
/// Configs generates a set of all search configurations that should be
/// tested. The configs generated are based on the configuration in this
/// builder.
fn configs(&self) -> Vec<TesterConfig> {
let mut configs = vec![];
let matcher = RegexMatcher::new(&self.pattern);
let mut builder = SearcherBuilder::new();
builder
.line_number(false)
.invert_match(self.invert_match)
.binary_detection(self.binary.clone())
.after_context(self.after_context)
.before_context(self.before_context)
.passthru(self.passthru);
if self.by_line {
let mut matcher = matcher.clone();
let mut builder = builder.clone();
let expected_reader =
self.expected_no_line_number.as_ref().unwrap().to_string();
let expected_slice = match self.expected_slice_no_line_number {
None => expected_reader.clone(),
Some(ref e) => e.to_string(),
};
configs.push(TesterConfig {
label: "byline-noterm-nonumber".to_string(),
expected_reader: expected_reader.clone(),
expected_slice: expected_slice.clone(),
builder: builder.clone(),
matcher: matcher.clone(),
});
if self.auto_heap_limit {
builder.heap_limit(Some(self.minimal_heap_limit(false)));
configs.push(TesterConfig {
label: "byline-noterm-nonumber-heaplimit".to_string(),
expected_reader: expected_reader.clone(),
expected_slice: expected_slice.clone(),
builder: builder.clone(),
matcher: matcher.clone(),
});
builder.heap_limit(None);
}
matcher.set_line_term(Some(LineTerminator::byte(b'\n')));
configs.push(TesterConfig {
label: "byline-term-nonumber".to_string(),
expected_reader: expected_reader.clone(),
expected_slice: expected_slice.clone(),
builder: builder.clone(),
matcher: matcher.clone(),
});
matcher.every_line_is_candidate(true);
configs.push(TesterConfig {
label: "byline-term-nonumber-candidates".to_string(),
expected_reader: expected_reader.clone(),
expected_slice: expected_slice.clone(),
builder: builder.clone(),
matcher: matcher.clone(),
});
}
if self.by_line && self.line_number {
let mut matcher = matcher.clone();
let mut builder = builder.clone();
let expected_reader =
self.expected_with_line_number.as_ref().unwrap().to_string();
let expected_slice = match self.expected_slice_with_line_number {
None => expected_reader.clone(),
Some(ref e) => e.to_string(),
};
builder.line_number(true);
configs.push(TesterConfig {
label: "byline-noterm-number".to_string(),
expected_reader: expected_reader.clone(),
expected_slice: expected_slice.clone(),
builder: builder.clone(),
matcher: matcher.clone(),
});
matcher.set_line_term(Some(LineTerminator::byte(b'\n')));
configs.push(TesterConfig {
label: "byline-term-number".to_string(),
expected_reader: expected_reader.clone(),
expected_slice: expected_slice.clone(),
builder: builder.clone(),
matcher: matcher.clone(),
});
matcher.every_line_is_candidate(true);
configs.push(TesterConfig {
label: "byline-term-number-candidates".to_string(),
expected_reader: expected_reader.clone(),
expected_slice: expected_slice.clone(),
builder: builder.clone(),
matcher: matcher.clone(),
});
}
if self.multi_line {
let mut builder = builder.clone();
let expected_slice = match self.expected_slice_no_line_number {
None => {
self.expected_no_line_number.as_ref().unwrap().to_string()
}
Some(ref e) => e.to_string(),
};
builder.multi_line(true);
configs.push(TesterConfig {
label: "multiline-nonumber".to_string(),
expected_reader: expected_slice.clone(),
expected_slice: expected_slice.clone(),
builder: builder.clone(),
matcher: matcher.clone(),
});
if self.auto_heap_limit {
builder.heap_limit(Some(self.minimal_heap_limit(true)));
configs.push(TesterConfig {
label: "multiline-nonumber-heaplimit".to_string(),
expected_reader: expected_slice.clone(),
expected_slice: expected_slice.clone(),
builder: builder.clone(),
matcher: matcher.clone(),
});
builder.heap_limit(None);
}
}
if self.multi_line && self.line_number {
let mut builder = builder.clone();
let expected_slice = match self.expected_slice_with_line_number {
None => {
self.expected_with_line_number
.as_ref().unwrap().to_string()
}
Some(ref e) => e.to_string(),
};
builder.multi_line(true);
builder.line_number(true);
configs.push(TesterConfig {
label: "multiline-number".to_string(),
expected_reader: expected_slice.clone(),
expected_slice: expected_slice.clone(),
builder: builder.clone(),
matcher: matcher.clone(),
});
builder.heap_limit(Some(self.minimal_heap_limit(true)));
configs.push(TesterConfig {
label: "multiline-number-heaplimit".to_string(),
expected_reader: expected_slice.clone(),
expected_slice: expected_slice.clone(),
builder: builder.clone(),
matcher: matcher.clone(),
});
builder.heap_limit(None);
}
configs
}
}
#[derive(Debug)]
struct TesterConfig {
label: String,
expected_reader: String,
expected_slice: String,
builder: SearcherBuilder,
matcher: RegexMatcher,
}
impl TesterConfig {
/// Execute a search using a reader. This exercises the incremental search
/// strategy, where the entire contents of the corpus aren't necessarily
/// in memory at once.
fn search_reader(&self, haystack: &str) -> String {
let mut sink = KitchenSink::new();
let mut searcher = self.builder.build();
let result = searcher.search_reader(
&self.matcher,
haystack.as_bytes(),
&mut sink,
);
if let Err(err) = result {
let label = format!("reader-{}", self.label);
panic!("error running '{}': {}", label, err);
}
String::from_utf8(sink.as_bytes().to_vec()).unwrap()
}
/// Execute a search using a slice. This exercises the search routines that
/// have the entire contents of the corpus in memory at one time.
fn search_slice(&self, haystack: &str) -> String {
let mut sink = KitchenSink::new();
let mut searcher = self.builder.build();
let result = searcher.search_slice(
&self.matcher,
haystack.as_bytes(),
&mut sink,
);
if let Err(err) = result {
let label = format!("slice-{}", self.label);
panic!("error running '{}': {}", label, err);
}
String::from_utf8(sink.as_bytes().to_vec()).unwrap()
}
}
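// A minimal sketch (not part of the original test harness) of how a
// TesterConfig produced by `configs()` might be exercised: run both search
// strategies over one haystack and compare each against its expected output.
#[cfg(test)]
#[allow(dead_code)]
fn check_config(config: &TesterConfig, haystack: &str) {
    // The incremental reader-based search.
    let got = config.search_reader(haystack);
    assert_eq!(config.expected_reader, got, "reader-{}", config.label);
    // The whole-contents slice-based search.
    let got = config.search_slice(haystack);
    assert_eq!(config.expected_slice, got, "slice-{}", config.label);
}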
#[cfg(test)]
mod tests {
use grep_matcher::{Match, Matcher};
use super::*;
fn m(start: usize, end: usize) -> Match {
Match::new(start, end)
}
#[test]
fn empty_line1() {
let haystack = b"";
let matcher = RegexMatcher::new(r"^$");
assert_eq!(matcher.find_at(haystack, 0), Ok(Some(m(0, 0))));
}
#[test]
fn empty_line2() {
let haystack = b"\n";
let matcher = RegexMatcher::new(r"^$");
assert_eq!(matcher.find_at(haystack, 0), Ok(Some(m(0, 0))));
assert_eq!(matcher.find_at(haystack, 1), Ok(Some(m(1, 1))));
}
#[test]
fn empty_line3() {
let haystack = b"\n\n";
let matcher = RegexMatcher::new(r"^$");
assert_eq!(matcher.find_at(haystack, 0), Ok(Some(m(0, 0))));
assert_eq!(matcher.find_at(haystack, 1), Ok(Some(m(1, 1))));
assert_eq!(matcher.find_at(haystack, 2), Ok(Some(m(2, 2))));
}
#[test]
fn empty_line4() {
let haystack = b"a\n\nb\n";
let matcher = RegexMatcher::new(r"^$");
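// Byte offsets in "a\n\nb\n": 'a' = 0, '\n' = 1, '\n' = 2, 'b' = 3,
// '\n' = 4. The empty line lives at offset 2, and a final `^$` match
// exists at offset 5, just past the trailing line terminator.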
assert_eq!(matcher.find_at(haystack, 0), Ok(Some(m(2, 2))));
assert_eq!(matcher.find_at(haystack, 1), Ok(Some(m(2, 2))));
assert_eq!(matcher.find_at(haystack, 2), Ok(Some(m(2, 2))));
assert_eq!(matcher.find_at(haystack, 3), Ok(Some(m(5, 5))));
assert_eq!(matcher.find_at(haystack, 4), Ok(Some(m(5, 5))));
assert_eq!(matcher.find_at(haystack, 5), Ok(Some(m(5, 5))));
}
#[test]
fn empty_line5() {
let haystack = b"a\n\nb\nc";
let matcher = RegexMatcher::new(r"^$");
assert_eq!(matcher.find_at(haystack, 0), Ok(Some(m(2, 2))));
assert_eq!(matcher.find_at(haystack, 1), Ok(Some(m(2, 2))));
assert_eq!(matcher.find_at(haystack, 2), Ok(Some(m(2, 2))));
assert_eq!(matcher.find_at(haystack, 3), Ok(None));
assert_eq!(matcher.find_at(haystack, 4), Ok(None));
assert_eq!(matcher.find_at(haystack, 5), Ok(None));
assert_eq!(matcher.find_at(haystack, 6), Ok(None));
}
#[test]
fn empty_line6() {
let haystack = b"a\n";
let matcher = RegexMatcher::new(r"^$");
assert_eq!(matcher.find_at(haystack, 0), Ok(Some(m(2, 2))));
assert_eq!(matcher.find_at(haystack, 1), Ok(Some(m(2, 2))));
assert_eq!(matcher.find_at(haystack, 2), Ok(Some(m(2, 2))));
}
}

View File

@@ -1,6 +1,6 @@
 [package]
 name = "grep"
-version = "0.1.8" #:version
+version = "0.2.0" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 Fast line oriented regex searching as a library.
@@ -13,7 +13,11 @@ keywords = ["regex", "grep", "egrep", "search", "pattern"]
 license = "Unlicense/MIT"
 
 [dependencies]
-log = "0.4"
-memchr = "2"
-regex = "0.2.9"
-regex-syntax = "0.5.3"
+grep-matcher = { version = "0.0.1", path = "../grep-matcher" }
+grep-printer = { version = "0.0.1", path = "../grep-printer" }
+grep-regex = { version = "0.0.1", path = "../grep-regex" }
+grep-searcher = { version = "0.0.1", path = "../grep-searcher" }
+
+[features]
+avx-accel = ["grep-searcher/avx-accel"]
+simd-accel = ["grep-searcher/simd-accel"]

File diff suppressed because it is too large

View File

@@ -1,84 +1,10 @@
-#![deny(missing_docs)]
-
 /*!
-A fast line oriented regex searcher.
+TODO.
 */
-#[macro_use]
-extern crate log;
-extern crate memchr;
-extern crate regex;
-extern crate regex_syntax as syntax;
+#![deny(missing_docs)]
 
-use std::error;
-use std::fmt;
-use std::result;
-
-pub use search::{Grep, GrepBuilder, Iter, Match};
+pub extern crate grep_matcher as matcher;
+pub extern crate grep_printer as printer;
+pub extern crate grep_regex as regex;
+pub extern crate grep_searcher as searcher;
-
-mod literals;
-mod nonl;
-mod search;
-mod smart_case;
-mod word_boundary;
-
-/// Result is a convenient type alias that fixes the type of the error to
-/// the `Error` type defined in this crate.
-pub type Result<T> = result::Result<T, Error>;
-
-/// Error enumerates the list of possible error conditions when building or
-/// using a `Grep` line searcher.
-#[derive(Debug)]
-pub enum Error {
-    /// An error from parsing or compiling a regex.
-    Regex(regex::Error),
-    /// This error occurs when an illegal literal was found in the regex
-    /// pattern. For example, if the line terminator is `\n` and the regex
-    /// pattern is `\w+\n\w+`, then the presence of `\n` will cause this
-    /// error.
-    LiteralNotAllowed(char),
-    /// An unused enum variant that indicates this enum may be expanded in
-    /// the future and therefore should not be exhaustively matched.
-    #[doc(hidden)]
-    __Nonexhaustive,
-}
-
-impl error::Error for Error {
-    fn description(&self) -> &str {
-        match *self {
-            Error::Regex(ref err) => err.description(),
-            Error::LiteralNotAllowed(_) => "use of forbidden literal",
-            Error::__Nonexhaustive => unreachable!(),
-        }
-    }
-
-    fn cause(&self) -> Option<&error::Error> {
-        match *self {
-            Error::Regex(ref err) => err.cause(),
-            _ => None,
-        }
-    }
-}
-
-impl fmt::Display for Error {
-    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        match *self {
-            Error::Regex(ref err) => err.fmt(f),
-            Error::LiteralNotAllowed(chr) => {
-                write!(f, "Literal {:?} not allowed.", chr)
-            }
-            Error::__Nonexhaustive => unreachable!(),
-        }
-    }
-}
-
-impl From<regex::Error> for Error {
-    fn from(err: regex::Error) -> Error {
-        Error::Regex(err)
-    }
-}
-
-impl From<syntax::Error> for Error {
-    fn from(err: syntax::Error) -> Error {
-        Error::Regex(regex::Error::Syntax(err.to_string()))
-    }
-}
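A brief sketch (not part of this diff) of what the new facade enables,
assuming the re-exported aliases shown above:

    extern crate grep;

    // Each sub-crate is reachable through its re-exported alias.
    use grep::matcher::Matcher;
    use grep::searcher::SearcherBuilder;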

View File

@@ -1,74 +0,0 @@
use syntax::hir::{self, Hir, HirKind};
use {Error, Result};
/// Returns a new expression that is guaranteed to never match the given
/// ASCII character.
///
/// If the expression contains the literal byte, then an error is returned.
///
/// If `byte` is not an ASCII character (i.e., greater than `0x7F`), then this
/// function panics.
pub fn remove(expr: Hir, byte: u8) -> Result<Hir> {
assert!(byte <= 0x7F);
let chr = byte as char;
assert!(chr.len_utf8() == 1);
Ok(match expr.into_kind() {
HirKind::Empty => Hir::empty(),
HirKind::Literal(hir::Literal::Unicode(c)) => {
if c == chr {
return Err(Error::LiteralNotAllowed(chr));
}
Hir::literal(hir::Literal::Unicode(c))
}
HirKind::Literal(hir::Literal::Byte(b)) => {
if b as char == chr {
return Err(Error::LiteralNotAllowed(chr));
}
Hir::literal(hir::Literal::Byte(b))
}
HirKind::Class(hir::Class::Unicode(mut cls)) => {
let remove = hir::ClassUnicode::new(Some(
hir::ClassUnicodeRange::new(chr, chr),
));
cls.difference(&remove);
if cls.iter().next().is_none() {
return Err(Error::LiteralNotAllowed(chr));
}
Hir::class(hir::Class::Unicode(cls))
}
HirKind::Class(hir::Class::Bytes(mut cls)) => {
let remove = hir::ClassBytes::new(Some(
hir::ClassBytesRange::new(byte, byte),
));
cls.difference(&remove);
if cls.iter().next().is_none() {
return Err(Error::LiteralNotAllowed(chr));
}
Hir::class(hir::Class::Bytes(cls))
}
HirKind::Anchor(x) => Hir::anchor(x),
HirKind::WordBoundary(x) => Hir::word_boundary(x),
HirKind::Repetition(mut x) => {
x.hir = Box::new(remove(*x.hir, byte)?);
Hir::repetition(x)
}
HirKind::Group(mut x) => {
x.hir = Box::new(remove(*x.hir, byte)?);
Hir::group(x)
}
HirKind::Concat(xs) => {
let xs = xs.into_iter()
.map(|e| remove(e, byte))
.collect::<Result<Vec<Hir>>>()?;
Hir::concat(xs)
}
HirKind::Alternation(xs) => {
let xs = xs.into_iter()
.map(|e| remove(e, byte))
.collect::<Result<Vec<Hir>>>()?;
Hir::alternation(xs)
}
})
}
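// A hedged usage sketch (not from the original file): parse a pattern into
// an HIR, then guarantee the result can never match the line terminator.
//
//     let expr = ParserBuilder::new().build().parse(r"\w+")?;
//     let expr = remove(expr, b'\n')?;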

View File

@@ -1,356 +0,0 @@
use memchr::{memchr, memrchr};
use syntax::ParserBuilder;
use syntax::hir::Hir;
use regex::bytes::{Regex, RegexBuilder};
use literals::LiteralSets;
use nonl;
use smart_case::Cased;
use word_boundary::strip_unicode_word_boundaries;
use Result;
/// A matched line.
#[derive(Clone, Debug, Default, Eq, PartialEq)]
pub struct Match {
start: usize,
end: usize,
}
impl Match {
/// Create a new empty match value.
pub fn new() -> Match {
Match::default()
}
/// Return the starting byte offset of the line that matched.
#[inline]
pub fn start(&self) -> usize {
self.start
}
/// Return the ending byte offset of the line that matched.
#[inline]
pub fn end(&self) -> usize {
self.end
}
}
/// A fast line oriented regex searcher.
#[derive(Clone, Debug)]
pub struct Grep {
re: Regex,
required: Option<Regex>,
opts: Options,
}
/// A builder for a grep searcher.
#[derive(Clone, Debug)]
pub struct GrepBuilder {
pattern: String,
opts: Options,
}
#[derive(Clone, Debug)]
struct Options {
case_insensitive: bool,
case_smart: bool,
line_terminator: u8,
size_limit: usize,
dfa_size_limit: usize,
}
impl Default for Options {
fn default() -> Options {
Options {
case_insensitive: false,
case_smart: false,
line_terminator: b'\n',
size_limit: 10 * (1 << 20),
dfa_size_limit: 10 * (1 << 20),
}
}
}
impl GrepBuilder {
/// Create a new builder for line searching.
///
/// The pattern given should be a regular expression. The precise syntax
/// supported is documented on the regex crate.
pub fn new(pattern: &str) -> GrepBuilder {
GrepBuilder {
pattern: pattern.to_string(),
opts: Options::default(),
}
}
/// Set the line terminator.
///
/// The line terminator can be any ASCII character and serves to delineate
/// the match boundaries in the text searched.
///
/// This panics if `ascii_byte` is greater than `0x7F` (i.e., not ASCII).
pub fn line_terminator(mut self, ascii_byte: u8) -> GrepBuilder {
assert!(ascii_byte <= 0x7F);
self.opts.line_terminator = ascii_byte;
self
}
/// Set the case insensitive flag (`i`) on the regex.
pub fn case_insensitive(mut self, yes: bool) -> GrepBuilder {
self.opts.case_insensitive = yes;
self
}
/// Whether to enable smart case search or not (disabled by default).
///
/// Smart case uses case insensitive search if the pattern contains only
/// lowercase characters (ignoring any characters which immediately follow
/// a '\'). Otherwise, a case sensitive search is used instead.
///
/// Enabling the case_insensitive flag overrides this.
pub fn case_smart(mut self, yes: bool) -> GrepBuilder {
self.opts.case_smart = yes;
self
}
/// Set the approximate size limit of the compiled regular expression.
///
/// This roughly corresponds to the number of bytes occupied by a
/// single compiled program. If the program exceeds this number, then a
/// compilation error is returned.
pub fn size_limit(mut self, limit: usize) -> GrepBuilder {
self.opts.size_limit = limit;
self
}
/// Set the approximate size of the cache used by the DFA.
///
/// This roughly corresponds to the number of bytes that the DFA will use
/// while searching.
///
/// Note that this is a per thread limit. There is no way to set a global
/// limit. In particular, if a regex is used from multiple threads
/// simultaneously, then each thread may use up to the number of bytes
/// specified here.
pub fn dfa_size_limit(mut self, limit: usize) -> GrepBuilder {
self.opts.dfa_size_limit = limit;
self
}
/// Create a line searcher.
///
/// If there was a problem parsing or compiling the regex with the given
/// options, then an error is returned.
pub fn build(self) -> Result<Grep> {
let expr = self.parse()?;
let literals = LiteralSets::create(&expr);
let re = self.regex(&expr)?;
let required = match literals.to_regex_builder() {
Some(builder) => Some(self.regex_build(builder)?),
None => {
match strip_unicode_word_boundaries(&expr) {
None => None,
Some(expr) => {
debug!("Stripped Unicode word boundaries. \
New AST:\n{:?}", expr);
self.regex(&expr).ok()
}
}
}
};
Ok(Grep {
re: re,
required: required,
opts: self.opts,
})
}
/// Creates a new regex from the given expression with the current
/// configuration.
fn regex(&self, expr: &Hir) -> Result<Regex> {
let mut builder = RegexBuilder::new(&expr.to_string());
builder.unicode(true);
self.regex_build(builder)
}
/// Builds a new regex from the given builder using the caller's settings.
fn regex_build(&self, mut builder: RegexBuilder) -> Result<Regex> {
builder
.multi_line(true)
.size_limit(self.opts.size_limit)
.dfa_size_limit(self.opts.dfa_size_limit)
.build()
.map_err(From::from)
}
/// Parses the underlying pattern and ensures the pattern can never match
/// the line terminator.
fn parse(&self) -> Result<Hir> {
let expr = ParserBuilder::new()
.allow_invalid_utf8(true)
.case_insensitive(self.is_case_insensitive()?)
.multi_line(true)
.build()
.parse(&self.pattern)?;
debug!("original regex HIR pattern:\n{}", expr);
let expr = nonl::remove(expr, self.opts.line_terminator)?;
debug!("transformed regex HIR pattern:\n{}", expr);
Ok(expr)
}
/// Determines whether the case insensitive flag should be enabled or not.
fn is_case_insensitive(&self) -> Result<bool> {
if self.opts.case_insensitive {
return Ok(true);
}
if !self.opts.case_smart {
return Ok(false);
}
let cased = match Cased::from_pattern(&self.pattern) {
None => return Ok(false),
Some(cased) => cased,
};
Ok(cased.any_literal && !cased.any_uppercase)
}
}
impl Grep {
/// Returns a reference to the underlying regex used by the searcher.
pub fn regex(&self) -> &Regex {
&self.re
}
/// Returns an iterator over all matches in the given buffer.
pub fn iter<'b, 's>(&'s self, buf: &'b [u8]) -> Iter<'b, 's> {
Iter {
searcher: self,
buf: buf,
start: 0,
}
}
/// Fills in the next line that matches in the given buffer starting at
/// the position given.
///
/// If no match could be found, `false` is returned, otherwise, `true` is
/// returned.
pub fn read_match(
&self,
mat: &mut Match,
buf: &[u8],
mut start: usize,
) -> bool {
if start >= buf.len() {
return false;
}
if let Some(ref req) = self.required {
while start < buf.len() {
let e = match req.shortest_match(&buf[start..]) {
None => return false,
Some(e) => start + e,
};
let (prevnl, nextnl) = self.find_line(buf, e, e);
match self.re.shortest_match(&buf[prevnl..nextnl]) {
None => {
start = nextnl;
continue;
}
Some(_) => {
self.fill_match(mat, prevnl, nextnl);
return true;
}
}
}
false
} else {
let e = match self.re.shortest_match(&buf[start..]) {
None => return false,
Some(e) => start + e,
};
let (s, e) = self.find_line(buf, e, e);
self.fill_match(mat, s, e);
true
}
}
fn fill_match(&self, mat: &mut Match, start: usize, end: usize) {
mat.start = start;
mat.end = end;
}
fn find_line(&self, buf: &[u8], s: usize, e: usize) -> (usize, usize) {
(self.find_line_start(buf, s), self.find_line_end(buf, e))
}
fn find_line_start(&self, buf: &[u8], pos: usize) -> usize {
memrchr(self.opts.line_terminator, &buf[0..pos]).map_or(0, |i| i + 1)
}
fn find_line_end(&self, buf: &[u8], pos: usize) -> usize {
memchr(self.opts.line_terminator, &buf[pos..])
.map_or(buf.len(), |i| pos + i + 1)
}
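    // Example (illustrative): in b"foo\nbar\n", a match position inside
    // "bar" expands to line bounds (4, 8): memrchr finds the '\n' at offset
    // 3, so the line starts at 4, and memchr finds the '\n' at offset 7, so
    // the line ends at 8.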
}
/// An iterator over all matches in a particular buffer.
///
/// `'b` refers to the lifetime of the buffer, and `'s` refers to the lifetime
/// of the searcher.
pub struct Iter<'b, 's> {
searcher: &'s Grep,
buf: &'b [u8],
start: usize,
}
impl<'b, 's> Iterator for Iter<'b, 's> {
type Item = Match;
fn next(&mut self) -> Option<Match> {
let mut mat = Match::default();
if !self.searcher.read_match(&mut mat, self.buf, self.start) {
self.start = self.buf.len();
return None;
}
self.start = mat.end;
Some(mat)
}
}
#[cfg(test)]
mod tests {
use memchr::{memchr, memrchr};
use regex::bytes::Regex;
use super::{GrepBuilder, Match};
static SHERLOCK: &'static [u8] = include_bytes!("./data/sherlock.txt");
fn find_lines(pat: &str, haystack: &[u8]) -> Vec<Match> {
let re = Regex::new(pat).unwrap();
let mut lines = vec![];
for m in re.find_iter(haystack) {
let start = memrchr(b'\n', &haystack[..m.start()])
.map_or(0, |i| i + 1);
let end = memchr(b'\n', &haystack[m.end()..])
.map_or(haystack.len(), |i| m.end() + i + 1);
lines.push(Match {
start: start,
end: end,
});
}
lines
}
fn grep_lines(pat: &str, haystack: &[u8]) -> Vec<Match> {
let g = GrepBuilder::new(pat).build().unwrap();
g.iter(haystack).collect()
}
#[test]
fn buffered_literal() {
let expected = find_lines("Sherlock Holmes", SHERLOCK);
let got = grep_lines("Sherlock Holmes", SHERLOCK);
assert_eq!(expected.len(), got.len());
assert_eq!(expected, got);
}
}
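// A hedged usage sketch (not from the original file) of the old API that
// this commit removes:
//
//     let g = GrepBuilder::new("foo.*bar").build()?;
//     for m in g.iter(b"foo bar\nbaz\n") {
//         println!("matched line at bytes {}..{}", m.start(), m.end());
//     }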

View File

@@ -1,191 +0,0 @@
use syntax::ast::{self, Ast};
use syntax::ast::parse::Parser;
/// The results of analyzing a regex for cased literals.
#[derive(Clone, Debug, Default)]
pub struct Cased {
/// True if and only if a literal uppercase character occurs in the regex.
///
/// A regex like `\pL` contains no uppercase literals, even though `L`
/// is uppercase and the `\pL` class contains uppercase characters.
pub any_uppercase: bool,
/// True if and only if the regex contains any literal at all. A regex like
/// `\pL` has this set to false.
pub any_literal: bool,
}
impl Cased {
/// Returns a `Cased` value by doing analysis on the AST of `pattern`.
///
/// If `pattern` is not a valid regular expression, then `None` is
/// returned.
pub fn from_pattern(pattern: &str) -> Option<Cased> {
Parser::new()
.parse(pattern)
.map(|ast| Cased::from_ast(&ast))
.ok()
}
fn from_ast(ast: &Ast) -> Cased {
let mut cased = Cased::default();
cased.from_ast_impl(ast);
cased
}
fn from_ast_impl(&mut self, ast: &Ast) {
if self.done() {
return;
}
match *ast {
Ast::Empty(_)
| Ast::Flags(_)
| Ast::Dot(_)
| Ast::Assertion(_)
| Ast::Class(ast::Class::Unicode(_))
| Ast::Class(ast::Class::Perl(_)) => {}
Ast::Literal(ref x) => {
self.from_ast_literal(x);
}
Ast::Class(ast::Class::Bracketed(ref x)) => {
self.from_ast_class_set(&x.kind);
}
Ast::Repetition(ref x) => {
self.from_ast_impl(&x.ast);
}
Ast::Group(ref x) => {
self.from_ast_impl(&x.ast);
}
Ast::Alternation(ref alt) => {
for x in &alt.asts {
self.from_ast_impl(x);
}
}
Ast::Concat(ref alt) => {
for x in &alt.asts {
self.from_ast_impl(x);
}
}
}
}
fn from_ast_class_set(&mut self, ast: &ast::ClassSet) {
if self.done() {
return;
}
match *ast {
ast::ClassSet::Item(ref item) => {
self.from_ast_class_set_item(item);
}
ast::ClassSet::BinaryOp(ref x) => {
self.from_ast_class_set(&x.lhs);
self.from_ast_class_set(&x.rhs);
}
}
}
fn from_ast_class_set_item(&mut self, ast: &ast::ClassSetItem) {
if self.done() {
return;
}
match *ast {
ast::ClassSetItem::Empty(_)
| ast::ClassSetItem::Ascii(_)
| ast::ClassSetItem::Unicode(_)
| ast::ClassSetItem::Perl(_) => {}
ast::ClassSetItem::Literal(ref x) => {
self.from_ast_literal(x);
}
ast::ClassSetItem::Range(ref x) => {
self.from_ast_literal(&x.start);
self.from_ast_literal(&x.end);
}
ast::ClassSetItem::Bracketed(ref x) => {
self.from_ast_class_set(&x.kind);
}
ast::ClassSetItem::Union(ref union) => {
for x in &union.items {
self.from_ast_class_set_item(x);
}
}
}
}
fn from_ast_literal(&mut self, ast: &ast::Literal) {
self.any_literal = true;
self.any_uppercase = self.any_uppercase || ast.c.is_uppercase();
}
/// Returns true if and only if the attributes can never change no matter
/// what other AST it might see.
fn done(&self) -> bool {
self.any_uppercase && self.any_literal
}
}
#[cfg(test)]
mod tests {
use super::*;
fn cased(pattern: &str) -> Cased {
Cased::from_pattern(pattern).unwrap()
}
#[test]
fn various() {
let x = cased("");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
let x = cased("foo");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased("Foo");
assert!(x.any_uppercase);
assert!(x.any_literal);
let x = cased("foO");
assert!(x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo\\");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo\w");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo\S");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo\p{Ll}");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo[a-z]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo[A-Z]");
assert!(x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo[\S\t]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo\\S");
assert!(x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"\p{Ll}");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
let x = cased(r"aBc\w");
assert!(x.any_uppercase);
assert!(x.any_literal);
}
}

View File

@@ -1,53 +0,0 @@
use syntax::hir::{self, Hir, HirKind};
/// Strips Unicode word boundaries from the given expression.
///
/// The key invariant this maintains is that the expression returned will match
/// *at least* everywhere the expression given will match. Namely, a match of
/// the returned expression can report false positives but it will never report
/// false negatives.
///
/// If no word boundaries could be stripped, then None is returned.
pub fn strip_unicode_word_boundaries(expr: &Hir) -> Option<Hir> {
// The real reason we do this is because Unicode word boundaries are the
// one thing that Rust's regex DFA engine can't handle. When it sees a
// Unicode word boundary among non-ASCII text, it falls back to one of the
// slower engines. We work around this limitation by attempting to use
// a regex to find candidate matches without a Unicode word boundary. We'll
// only then use the full (and slower) regex to confirm a candidate as a
// match or not during search.
//
// It looks like we only check the outer edges for `\b`? I guess this is
// an attempt to optimize for the `-w/--word-regexp` flag? ---AG
match *expr.kind() {
HirKind::Concat(ref es) if !es.is_empty() => {
let first = is_unicode_word_boundary(&es[0]);
let last = is_unicode_word_boundary(es.last().unwrap());
// Be careful not to strip word boundaries if there are no other
// expressions to match.
match (first, last) {
(true, false) if es.len() > 1 => {
Some(Hir::concat(es[1..].to_vec()))
}
(false, true) if es.len() > 1 => {
Some(Hir::concat(es[..es.len() - 1].to_vec()))
}
(true, true) if es.len() > 2 => {
Some(Hir::concat(es[1..es.len() - 1].to_vec()))
}
_ => None,
}
}
_ => None,
}
}
/// Returns true if the given expression is a Unicode word boundary.
fn is_unicode_word_boundary(expr: &Hir) -> bool {
match *expr.kind() {
HirKind::WordBoundary(hir::WordBoundary::Unicode) => true,
HirKind::WordBoundary(hir::WordBoundary::UnicodeNegate) => true,
HirKind::Group(ref x) => is_unicode_word_boundary(&x.hir),
_ => false,
}
}
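// Illustrative example (not from the original file): for the concatenation
// `\bfoo`, the leading boundary is stripped, yielding `foo`. That matches
// everywhere `\bfoo` matches (plus false positives that the full regex
// rejects later), preserving the invariant documented above.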

View File

@@ -1,6 +1,6 @@
 [package]
 name = "ignore"
-version = "0.4.2" #:version
+version = "0.4.3" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 A fast library for efficiently matching ignore files such as `.gitignore`
@@ -23,7 +23,7 @@ globset = { version = "0.4.0", path = "../globset" }
 lazy_static = "1"
 log = "0.4"
 memchr = "2"
-regex = "0.2.9"
+regex = "1"
 same-file = "1"
 thread_local = "0.3.2"
 walkdir = "2"

View File

@@ -20,7 +20,7 @@ Add this to your `Cargo.toml`:
 ```toml
 [dependencies]
-ignore = "0.3"
+ignore = "0.4"
 ```
 
 and this to your crate root:

View File

@@ -65,6 +65,8 @@ struct IgnoreOptions {
     hidden: bool,
     /// Whether to read .ignore files.
     ignore: bool,
+    /// Whether to respect any ignore files in parent directories.
+    parents: bool,
     /// Whether to read git's global gitignore file.
     git_global: bool,
     /// Whether to read .gitignore files.
@@ -151,6 +153,15 @@ impl Ignore {
         &self,
         path: P,
     ) -> (Ignore, Option<Error>) {
+        if !self.0.opts.parents
+            && !self.0.opts.git_ignore
+            && !self.0.opts.git_exclude
+            && !self.0.opts.git_global
+        {
+            // If we never need info from parent directories, then don't do
+            // anything.
+            return (self.clone(), None);
+        }
         if !self.is_root() {
             panic!("Ignore::add_parents called on non-root matcher");
         }
@@ -183,6 +194,7 @@ impl Ignore {
             errs.maybe_push(err);
             igtmp.is_absolute_parent = true;
             igtmp.absolute_base = Some(absolute_base.clone());
+            igtmp.has_git = parent.join(".git").exists();
             ig = Ignore(Arc::new(igtmp));
             compiled.insert(parent.as_os_str().to_os_string(), ig.clone());
         }
@@ -209,7 +221,9 @@ impl Ignore {
     fn add_child_path(&self, dir: &Path) -> (IgnoreInner, Option<Error>) {
         let mut errs = PartialErrorBuilder::default();
         let custom_ig_matcher =
-            {
+            if self.0.custom_ignore_filenames.is_empty() {
+                Gitignore::empty()
+            } else {
                 let (m, err) =
                     create_gitignore(&dir, &self.0.custom_ignore_filenames);
                 errs.maybe_push(err);
@@ -254,7 +268,7 @@ impl Ignore {
             git_global_matcher: self.0.git_global_matcher.clone(),
             git_ignore_matcher: gi_matcher,
             git_exclude_matcher: gi_exclude_matcher,
-            has_git: dir.join(".git").is_dir(),
+            has_git: dir.join(".git").exists(),
             opts: self.0.opts,
         };
         (ig, errs.into_error_option())
@@ -264,9 +278,11 @@ impl Ignore {
     fn has_any_ignore_rules(&self) -> bool {
         let opts = self.0.opts;
         let has_custom_ignore_files =
            !self.0.custom_ignore_filenames.is_empty();
+        let has_explicit_ignores = !self.0.explicit_ignores.is_empty();
 
         opts.ignore || opts.git_global || opts.git_ignore
             || opts.git_exclude || has_custom_ignore_files
+            || has_explicit_ignores
     }
@@ -329,6 +345,7 @@ impl Ignore {
     ) -> Match<IgnoreMatch<'a>> {
         let (mut m_custom_ignore, mut m_ignore, mut m_gi, mut m_gi_exclude, mut m_explicit) =
             (Match::None, Match::None, Match::None, Match::None, Match::None);
+        let any_git = self.parents().any(|ig| ig.0.has_git);
         let mut saw_git = false;
         for ig in self.parents().take_while(|ig| !ig.0.is_absolute_parent) {
             if m_custom_ignore.is_none() {
@@ -341,18 +358,19 @@ impl Ignore {
                     ig.0.ignore_matcher.matched(path, is_dir)
                         .map(IgnoreMatch::gitignore);
             }
-            if !saw_git && m_gi.is_none() {
+            if any_git && !saw_git && m_gi.is_none() {
                 m_gi =
                     ig.0.git_ignore_matcher.matched(path, is_dir)
                         .map(IgnoreMatch::gitignore);
             }
-            if !saw_git && m_gi_exclude.is_none() {
+            if any_git && !saw_git && m_gi_exclude.is_none() {
                 m_gi_exclude =
                     ig.0.git_exclude_matcher.matched(path, is_dir)
                         .map(IgnoreMatch::gitignore);
             }
             saw_git = saw_git || ig.0.has_git;
         }
+        if self.0.opts.parents {
         if let Some(abs_parent_path) = self.absolute_base() {
             let path = abs_parent_path.join(path);
             for ig in self.parents().skip_while(|ig| !ig.0.is_absolute_parent) {
@@ -366,12 +384,12 @@ impl Ignore {
                     ig.0.ignore_matcher.matched(&path, is_dir)
                         .map(IgnoreMatch::gitignore);
                 }
-                if !saw_git && m_gi.is_none() {
+                if any_git && !saw_git && m_gi.is_none() {
                     m_gi =
                         ig.0.git_ignore_matcher.matched(&path, is_dir)
                             .map(IgnoreMatch::gitignore);
                 }
-                if !saw_git && m_gi_exclude.is_none() {
+                if any_git && !saw_git && m_gi_exclude.is_none() {
                     m_gi_exclude =
                         ig.0.git_exclude_matcher.matched(&path, is_dir)
                             .map(IgnoreMatch::gitignore);
@@ -379,14 +397,21 @@ impl Ignore {
                 saw_git = saw_git || ig.0.has_git;
             }
         }
+        }
 
         for gi in self.0.explicit_ignores.iter().rev() {
             if !m_explicit.is_none() {
                 break;
             }
             m_explicit = gi.matched(&path, is_dir).map(IgnoreMatch::gitignore);
         }
-        let m_global = self.0.git_global_matcher.matched(&path, is_dir)
-            .map(IgnoreMatch::gitignore);
+        let m_global =
+            if any_git {
+                self.0.git_global_matcher
+                    .matched(&path, is_dir)
+                    .map(IgnoreMatch::gitignore)
+            } else {
+                Match::None
+            };
 
         m_custom_ignore.or(m_ignore).or(m_gi).or(m_gi_exclude).or(m_global).or(m_explicit)
     }
@@ -454,6 +479,7 @@ impl IgnoreBuilder {
             opts: IgnoreOptions {
                 hidden: true,
                 ignore: true,
+                parents: true,
                 git_global: true,
                 git_ignore: true,
                 git_exclude: true,
@@ -556,6 +582,17 @@ impl IgnoreBuilder {
         self
     }
 
+    /// Enables reading ignore files from parent directories.
+    ///
+    /// If this is enabled, then .gitignore files in parent directories of
+    /// each file path given are respected. Otherwise, they are ignored.
+    ///
+    /// This is enabled by default.
+    pub fn parents(&mut self, yes: bool) -> &mut IgnoreBuilder {
+        self.opts.parents = yes;
+        self
+    }
+
     /// Add a global gitignore matcher.
     ///
     /// Its precedence is lower than both normal `.gitignore` files and
@@ -677,6 +714,7 @@ mod tests {
     #[test]
     fn gitignore() {
         let td = TempDir::new("ignore-test-").unwrap();
+        mkdirp(td.path().join(".git"));
        wfile(td.path().join(".gitignore"), "foo\n!bar");
 
         let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());
@@ -686,6 +724,18 @@ mod tests {
         assert!(ig.matched("baz", false).is_none());
     }
 
+    #[test]
+    fn gitignore_no_git() {
+        let td = TempDir::new("ignore-test-").unwrap();
+        wfile(td.path().join(".gitignore"), "foo\n!bar");
+
+        let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());
+        assert!(err.is_none());
+        assert!(ig.matched("foo", false).is_none());
+        assert!(ig.matched("bar", false).is_none());
+        assert!(ig.matched("baz", false).is_none());
+    }
+
     #[test]
     fn ignore() {
         let td = TempDir::new("ignore-test-").unwrap();
@@ -795,6 +845,7 @@ mod tests {
     #[test]
     fn errored_partial() {
         let td = TempDir::new("ignore-test-").unwrap();
+        mkdirp(td.path().join(".git"));
         wfile(td.path().join(".gitignore"), "f**oo\nbar");
 
         let (ig, err) = IgnoreBuilder::new().build().add_child(td.path());

View File

@@ -83,7 +83,7 @@ pub struct Gitignore {
     globs: Vec<Glob>,
     num_ignores: u64,
     num_whitelists: u64,
-    matches: Arc<ThreadLocal<RefCell<Vec<usize>>>>,
+    matches: Option<Arc<ThreadLocal<RefCell<Vec<usize>>>>>,
 }
 
 impl Gitignore {
@@ -143,7 +143,14 @@ impl Gitignore {
     ///
     /// Its path is empty.
     pub fn empty() -> Gitignore {
-        GitignoreBuilder::new("").build().unwrap()
+        Gitignore {
+            set: GlobSet::empty(),
+            root: PathBuf::from(""),
+            globs: vec![],
+            num_ignores: 0,
+            num_whitelists: 0,
+            matches: None,
+        }
     }
 
     /// Returns the directory containing this gitignore matcher.
@@ -213,6 +220,11 @@ impl Gitignore {
     /// determined by a common suffix of the directory containing this
     /// gitignore) is stripped. If there is no common suffix/prefix overlap,
     /// then `path` is assumed to be relative to this matcher.
+    ///
+    /// # Panics
+    ///
+    /// This method panics if the given file path is not under the root path
+    /// of this matcher.
     pub fn matched_path_or_any_parents<P: AsRef<Path>>(
         &self,
         path: P,
@@ -222,10 +234,8 @@ impl Gitignore {
             return Match::None;
         }
         let mut path = self.strip(path.as_ref());
-        debug_assert!(
-            !path.has_root(),
-            "path is expect to be under the root"
-        );
+        assert!(!path.has_root(), "path is expected to be under the root");
 
         match self.matched_stripped(path, is_dir) {
             Match::None => (), // walk up
             a_match => return a_match,
@@ -249,7 +259,7 @@ impl Gitignore {
             return Match::None;
         }
         let path = path.as_ref();
-        let _matches = self.matches.get_default();
+        let _matches = self.matches.as_ref().unwrap().get_default();
         let mut matches = _matches.borrow_mut();
         let candidate = Candidate::new(path);
         self.set.matches_candidate_into(&candidate, &mut *matches);
@@ -345,7 +355,7 @@ impl GitignoreBuilder {
             globs: self.globs.clone(),
             num_ignores: nignore as u64,
             num_whitelists: nwhite as u64,
-            matches: Arc::new(ThreadLocal::default()),
+            matches: Some(Arc::new(ThreadLocal::default())),
         })
     }
@@ -493,6 +503,8 @@ impl GitignoreBuilder {
     /// Toggle whether the globs should be matched case insensitively or not.
     ///
+    /// When this option is changed, only globs added after the change will
+    /// be affected.
+    ///
     /// This is disabled by default.
     pub fn case_insensitive(
         &mut self, yes: bool
@@ -506,16 +518,27 @@ impl GitignoreBuilder {
 ///
 /// Note that the file path returned may not exist.
 fn gitconfig_excludes_path() -> Option<PathBuf> {
-    gitconfig_contents()
-        .and_then(|data| parse_excludes_file(&data))
-        .or_else(excludes_file_default)
+    // git supports $HOME/.gitconfig and $XDG_CONFIG_DIR/git/config. Notably,
+    // both can be active at the same time, where $HOME/.gitconfig takes
+    // precedent. So if $HOME/.gitconfig defines a `core.excludesFile`, then
+    // we're done.
+    match gitconfig_home_contents().and_then(|x| parse_excludes_file(&x)) {
+        Some(path) => return Some(path),
+        None => {}
+    }
+    match gitconfig_xdg_contents().and_then(|x| parse_excludes_file(&x)) {
+        Some(path) => return Some(path),
+        None => {}
+    }
+    excludes_file_default()
 }
 
-/// Returns the file contents of git's global config file, if one exists.
-fn gitconfig_contents() -> Option<Vec<u8>> {
-    let home = match env::var_os("HOME") {
+/// Returns the file contents of git's global config file, if one exists, in
+/// the user's home directory.
+fn gitconfig_home_contents() -> Option<Vec<u8>> {
+    let home = match home_dir() {
         None => return None,
-        Some(home) => PathBuf::from(home),
+        Some(home) => home,
     };
     let mut file = match File::open(home.join(".gitconfig")) {
         Err(_) => return None,
@@ -525,13 +548,28 @@ fn gitconfig_contents() -> Option<Vec<u8>> {
     file.read_to_end(&mut contents).ok().map(|_| contents)
 }
 
+/// Returns the file contents of git's global config file, if one exists, in
+/// the user's XDG_CONFIG_DIR directory.
+fn gitconfig_xdg_contents() -> Option<Vec<u8>> {
+    let path = env::var_os("XDG_CONFIG_HOME")
+        .and_then(|x| if x.is_empty() { None } else { Some(PathBuf::from(x)) })
+        .or_else(|| home_dir().map(|p| p.join(".config")))
+        .map(|x| x.join("git/config"));
+    let mut file = match path.and_then(|p| File::open(p).ok()) {
+        None => return None,
+        Some(file) => io::BufReader::new(file),
+    };
+    let mut contents = vec![];
+    file.read_to_end(&mut contents).ok().map(|_| contents)
+}
+
 /// Returns the default file path for a global .gitignore file.
 ///
 /// Specifically, this respects XDG_CONFIG_HOME.
 fn excludes_file_default() -> Option<PathBuf> {
     env::var_os("XDG_CONFIG_HOME")
         .and_then(|x| if x.is_empty() { None } else { Some(PathBuf::from(x)) })
-        .or_else(|| env::home_dir().map(|p| p.join(".config")))
+        .or_else(|| home_dir().map(|p| p.join(".config")))
         .map(|x| x.join("git/ignore"))
 }
@@ -543,7 +581,8 @@ fn parse_excludes_file(data: &[u8]) -> Option<PathBuf> {
     // a full INI parser. Yuck.
     lazy_static! {
         static ref RE: Regex = Regex::new(
-            r"(?ium)^\s*excludesfile\s*=\s*(.+)\s*$").unwrap();
+            r"(?im)^\s*excludesfile\s*=\s*(.+)\s*$"
+        ).unwrap();
     };
     let caps = match RE.captures(data) {
         None => return None,
@@ -554,13 +593,22 @@ fn parse_excludes_file(data: &[u8]) -> Option<PathBuf> {
 /// Expands ~ in file paths to the value of $HOME.
 fn expand_tilde(path: &str) -> String {
-    let home = match env::var("HOME") {
-        Err(_) => return path.to_string(),
-        Ok(home) => home,
+    let home = match home_dir() {
+        None => return path.to_string(),
+        Some(home) => home.to_string_lossy().into_owned(),
     };
     path.replace("~", &home)
 }
 
+/// Returns the location of the user's home directory.
+fn home_dir() -> Option<PathBuf> {
+    // We're fine with using env::home_dir for now. Its bugs are, IMO, pretty
+    // minor corner cases. We should still probably eventually migrate to
+    // the `dirs` crate to get a proper implementation.
+    #![allow(deprecated)]
+    env::home_dir()
+}
+
 #[cfg(test)]
 mod tests {
     use std::path::Path;
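A compressed sketch (an assumed equivalent of the new lookup code above, not
part of the diff itself) of the core.excludesFile resolution order:

    // 1. $HOME/.gitconfig wins if it sets core.excludesFile;
    // 2. otherwise $XDG_CONFIG_HOME/git/config (or ~/.config/git/config);
    // 3. otherwise fall back to the default global ignore path.
    fn excludes_path() -> Option<PathBuf> {
        gitconfig_home_contents()
            .and_then(|x| parse_excludes_file(&x))
            .or_else(|| {
                gitconfig_xdg_contents().and_then(|x| parse_excludes_file(&x))
            })
            .or_else(excludes_file_default)
    }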

View File

@@ -140,6 +140,8 @@ impl OverrideBuilder {
     /// Toggle whether the globs should be matched case insensitively or not.
     ///
+    /// When this option is changed, only globs added after the change will
+    /// be affected.
+    ///
     /// This is disabled by default.
     pub fn case_insensitive(
         &mut self, yes: bool

View File

@@ -98,10 +98,13 @@ use {Error, Match};
 const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("agda", &["*.agda", "*.lagda"]),
+    ("aidl", &["*.aidl"]),
+    ("amake", &["*.mk", "*.bp"]),
     ("asciidoc", &["*.adoc", "*.asc", "*.asciidoc"]),
     ("asm", &["*.asm", "*.s", "*.S"]),
     ("avro", &["*.avdl", "*.avpr", "*.avsc"]),
     ("awk", &["*.awk"]),
+    ("bazel", &["*.bzl", "WORKSPACE", "BUILD"]),
     ("bitbake", &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
     ("bzip2", &["*.bz2"]),
     ("c", &["*.c", "*.h", "*.H"]),
@@ -131,6 +134,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("elixir", &["*.ex", "*.eex", "*.exs"]),
     ("elm", &["*.elm"]),
     ("erlang", &["*.erl", "*.hrl"]),
+    ("fidl", &["*.fidl"]),
     ("fish", &["*.fish"]),
     ("fortran", &[
         "*.f", "*.F", "*.f77", "*.F77", "*.pfo",
@@ -144,8 +148,9 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("h", &["*.h", "*.hpp"]),
     ("hbs", &["*.hbs"]),
     ("haskell", &["*.hs", "*.lhs"]),
+    ("hs", &["*.hs", "*.lhs"]),
     ("html", &["*.htm", "*.html", "*.ejs"]),
-    ("java", &["*.java"]),
+    ("java", &["*.java", "*.jsp"]),
     ("jinja", &["*.j2", "*.jinja", "*.jinja2"]),
     ("js", &[
         "*.js", "*.jsx", "*.vue",
@@ -188,6 +193,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("log", &["*.log"]),
     ("lua", &["*.lua"]),
     ("lzma", &["*.lzma"]),
+    ("lz4", &["*.lz4"]),
     ("m4", &["*.ac", "*.m4"]),
     ("make", &[
         "gnumakefile", "Gnumakefile", "GNUmakefile",
@@ -215,6 +221,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("pod", &["*.pod"]),
     ("protobuf", &["*.proto"]),
     ("ps", &["*.cdxml", "*.ps1", "*.ps1xml", "*.psd1", "*.psm1"]),
+    ("puppet", &["*.erb", "*.pp", "*.rb"]),
     ("purs", &["*.purs"]),
     ("py", &["*.py"]),
     ("qmake", &["*.pro", "*.pri", "*.prf"]),
@@ -275,6 +282,7 @@ const DEFAULT_TYPES: &'static [(&'static str, &'static [&'static str])] = &[
     ("twig", &["*.twig"]),
     ("vala", &["*.vala"]),
     ("vb", &["*.vb"]),
+    ("verilog", &["*.v", "*.vh", "*.sv", "*.svh"]),
     ("vhdl", &["*.vhd", "*.vhdl"]),
     ("vim", &["*.vim"]),
     ("vimscript", &["*.vim"]),

View File

@@ -457,7 +457,6 @@ impl DirEntryRaw {
 pub struct WalkBuilder {
     paths: Vec<PathBuf>,
     ig_builder: IgnoreBuilder,
-    parents: bool,
     max_depth: Option<usize>,
     max_filesize: Option<u64>,
     follow_links: bool,
@@ -472,7 +471,6 @@ impl fmt::Debug for WalkBuilder {
         f.debug_struct("WalkBuilder")
             .field("paths", &self.paths)
             .field("ig_builder", &self.ig_builder)
-            .field("parents", &self.parents)
             .field("max_depth", &self.max_depth)
             .field("max_filesize", &self.max_filesize)
             .field("follow_links", &self.follow_links)
@@ -492,7 +490,6 @@ impl WalkBuilder {
         WalkBuilder {
             paths: vec![path.as_ref().to_path_buf()],
             ig_builder: IgnoreBuilder::new(),
-            parents: true,
             max_depth: None,
             max_filesize: None,
             follow_links: false,
@@ -531,7 +528,6 @@ impl WalkBuilder {
             ig_root: ig_root.clone(),
             ig: ig_root.clone(),
             max_filesize: self.max_filesize,
-            parents: self.parents,
         }
     }
@@ -547,7 +543,6 @@ impl WalkBuilder {
             max_depth: self.max_depth,
             max_filesize: self.max_filesize,
             follow_links: self.follow_links,
-            parents: self.parents,
             threads: self.threads,
         }
     }
@@ -679,14 +674,12 @@ impl WalkBuilder {
     /// Enables reading ignore files from parent directories.
     ///
-    /// If this is enabled, then the parent directories of each file path given
-    /// are traversed for ignore files (subject to the ignore settings on
-    /// this builder). Note that file paths are canonicalized with respect to
-    /// the current working directory in order to determine parent directories.
+    /// If this is enabled, then .gitignore files in parent directories of
+    /// each file path given are respected. Otherwise, they are ignored.
     ///
     /// This is enabled by default.
     pub fn parents(&mut self, yes: bool) -> &mut WalkBuilder {
-        self.parents = yes;
+        self.ig_builder.parents(yes);
         self
     }
@@ -765,7 +758,6 @@ pub struct Walk {
     ig_root: Ignore,
     ig: Ignore,
     max_filesize: Option<u64>,
-    parents: bool,
 }
 
 impl Walk {
@@ -812,7 +804,7 @@ impl Iterator for Walk {
             }
             Some((path, Some(it))) => {
                 self.it = Some(it);
-                if self.parents && path_is_dir(&path) {
+                if path_is_dir(&path) {
                     let (ig, err) = self.ig_root.add_parents(path);
                     self.ig = ig;
                     if let Some(err) = err {
@@ -948,7 +940,6 @@ impl WalkState {
 pub struct WalkParallel {
     paths: vec::IntoIter<PathBuf>,
     ig_root: Ignore,
-    parents: bool,
     max_filesize: Option<u64>,
     max_depth: Option<usize>,
     follow_links: bool,
@@ -1010,7 +1001,6 @@ impl WalkParallel {
                 num_waiting: num_waiting.clone(),
                 num_quitting: num_quitting.clone(),
                 threads: threads,
-                parents: self.parents,
                 max_depth: self.max_depth,
                 max_filesize: self.max_filesize,
                 follow_links: self.follow_links,
@@ -1126,9 +1116,6 @@ struct Worker {
     num_quitting: Arc<AtomicUsize>,
     /// The total number of workers.
     threads: usize,
-    /// Whether to create ignore matchers for parents of caller specified
-    /// directories.
-    parents: bool,
     /// The maximum depth of directories to descend. A value of `0` means no
     /// descension at all.
     max_depth: Option<usize>,
@@ -1156,14 +1143,12 @@ impl Worker {
                 }
                 continue;
             }
-            if self.parents {
-                if let Some(err) = work.add_parents() {
-                    if (self.f)(Err(err)).is_quit() {
-                        self.quit_now();
-                        return;
-                    }
+            if let Some(err) = work.add_parents() {
+                if (self.f)(Err(err)).is_quit() {
+                    self.quit_now();
+                    return;
                 }
             }
             let readdir = match work.read_dir() {
                 Ok(readdir) => readdir,
                 Err(err) => {
@@ -1645,6 +1630,7 @@ mod tests {
     #[test]
     fn gitignore() {
         let td = TempDir::new("walk-test-").unwrap();
+        mkdirp(td.path().join(".git"));
         mkdirp(td.path().join("a"));
         wfile(td.path().join(".gitignore"), "foo");
         wfile(td.path().join("foo"), "");
@@ -1673,9 +1659,28 @@ mod tests {
         assert_paths(td.path(), &builder, &["bar", "a", "a/bar"]);
     }
 
+    #[test]
+    fn explicit_ignore_exclusive_use() {
+        let td = TempDir::new("walk-test-").unwrap();
+        let igpath = td.path().join(".not-an-ignore");
+        mkdirp(td.path().join("a"));
+        wfile(&igpath, "foo");
+        wfile(td.path().join("foo"), "");
+        wfile(td.path().join("a/foo"), "");
+        wfile(td.path().join("bar"), "");
+        wfile(td.path().join("a/bar"), "");
+
+        let mut builder = WalkBuilder::new(td.path());
+        builder.standard_filters(false);
+        assert!(builder.add_ignore(&igpath).is_none());
+        assert_paths(td.path(), &builder,
+            &[".not-an-ignore", "bar", "a", "a/bar"]);
+    }
+
     #[test]
     fn gitignore_parent() {
         let td = TempDir::new("walk-test-").unwrap();
+        mkdirp(td.path().join(".git"));
         mkdirp(td.path().join("a"));
         wfile(td.path().join(".gitignore"), "foo");
         wfile(td.path().join("a/foo"), "");
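A hedged usage sketch of the unchanged public knob (the directory path is
hypothetical): disabling `parents` now routes through IgnoreBuilder instead
of a WalkBuilder-local flag, but callers are unaffected.

    let mut builder = ignore::WalkBuilder::new("/some/dir");
    // Don't read ignore files from parent directories.
    builder.parents(false);
    for result in builder.build() {
        if let Ok(entry) = result {
            println!("{}", entry.path().display());
        }
    }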

View File

@@ -1,13 +1,11 @@
extern crate ignore; extern crate ignore;
use std::path::Path; use std::path::Path;
use ignore::gitignore::{Gitignore, GitignoreBuilder}; use ignore::gitignore::{Gitignore, GitignoreBuilder};
const IGNORE_FILE: &'static str =
const IGNORE_FILE: &'static str = "tests/gitignore_matched_path_or_any_parents_tests.gitignore"; "tests/gitignore_matched_path_or_any_parents_tests.gitignore";
fn get_gitignore() -> Gitignore { fn get_gitignore() -> Gitignore {
let mut builder = GitignoreBuilder::new("ROOT"); let mut builder = GitignoreBuilder::new("ROOT");
@@ -16,9 +14,8 @@ fn get_gitignore() -> Gitignore {
builder.build().unwrap() builder.build().unwrap()
} }
#[test] #[test]
#[should_panic(expected = "path is expect to be under the root")] #[should_panic(expected = "path is expected to be under the root")]
fn test_path_should_be_under_root() { fn test_path_should_be_under_root() {
let gitignore = get_gitignore(); let gitignore = get_gitignore();
let path = "/tmp/some_file"; let path = "/tmp/some_file";
@@ -26,11 +23,12 @@ fn test_path_should_be_under_root() {
assert!(false); assert!(false);
} }
#[test] #[test]
fn test_files_in_root() { fn test_files_in_root() {
let gitignore = get_gitignore(); let gitignore = get_gitignore();
let m = |path: &str| gitignore.matched_path_or_any_parents(Path::new(path), false); let m = |path: &str| {
gitignore.matched_path_or_any_parents(Path::new(path), false)
};
// 0x // 0x
assert!(m("ROOT/file_root_00").is_ignore()); assert!(m("ROOT/file_root_00").is_ignore());
@@ -61,7 +59,9 @@ fn test_files_in_root() {
#[test] #[test]
fn test_files_in_deep() { fn test_files_in_deep() {
let gitignore = get_gitignore(); let gitignore = get_gitignore();
let m = |path: &str| gitignore.matched_path_or_any_parents(Path::new(path), false); let m = |path: &str| {
gitignore.matched_path_or_any_parents(Path::new(path), false)
};
// 0x // 0x
assert!(m("ROOT/parent_dir/file_deep_00").is_ignore()); assert!(m("ROOT/parent_dir/file_deep_00").is_ignore());
@@ -92,8 +92,9 @@ fn test_files_in_deep() {
#[test] #[test]
fn test_dirs_in_root() { fn test_dirs_in_root() {
let gitignore = get_gitignore(); let gitignore = get_gitignore();
let m = let m = |path: &str, is_dir: bool| {
|path: &str, is_dir: bool| gitignore.matched_path_or_any_parents(Path::new(path), is_dir); gitignore.matched_path_or_any_parents(Path::new(path), is_dir)
};
// 00 // 00
assert!(m("ROOT/dir_root_00", true).is_ignore()); assert!(m("ROOT/dir_root_00", true).is_ignore());
@@ -196,20 +197,25 @@ fn test_dirs_in_root() {
 #[test]
 fn test_dirs_in_deep() {
     let gitignore = get_gitignore();
-    let m =
-        |path: &str, is_dir: bool| gitignore.matched_path_or_any_parents(Path::new(path), is_dir);
+    let m = |path: &str, is_dir: bool| {
+        gitignore.matched_path_or_any_parents(Path::new(path), is_dir)
+    };

     // 00
     assert!(m("ROOT/parent_dir/dir_deep_00", true).is_ignore());
     assert!(m("ROOT/parent_dir/dir_deep_00/file", false).is_ignore());
     assert!(m("ROOT/parent_dir/dir_deep_00/child_dir", true).is_ignore());
-    assert!(m("ROOT/parent_dir/dir_deep_00/child_dir/file", false).is_ignore());
+    assert!(
+        m("ROOT/parent_dir/dir_deep_00/child_dir/file", false).is_ignore()
+    );
     // 01
     assert!(m("ROOT/parent_dir/dir_deep_01", true).is_ignore());
     assert!(m("ROOT/parent_dir/dir_deep_01/file", false).is_ignore());
     assert!(m("ROOT/parent_dir/dir_deep_01/child_dir", true).is_ignore());
-    assert!(m("ROOT/parent_dir/dir_deep_01/child_dir/file", false).is_ignore());
+    assert!(
+        m("ROOT/parent_dir/dir_deep_01/child_dir/file", false).is_ignore()
+    );
     // 02
     assert!(m("ROOT/parent_dir/dir_deep_02", true).is_none());
@@ -251,47 +257,67 @@ fn test_dirs_in_deep() {
assert!(m("ROOT/parent_dir/dir_deep_20", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_20", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_20/file", false).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_20/file", false).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_20/child_dir", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_20/child_dir", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_20/child_dir/file", false).is_ignore()); assert!(
m("ROOT/parent_dir/dir_deep_20/child_dir/file", false).is_ignore()
);
// 21 // 21
assert!(m("ROOT/parent_dir/dir_deep_21", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_21", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_21/file", false).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_21/file", false).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_21/child_dir", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_21/child_dir", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_21/child_dir/file", false).is_ignore()); assert!(
m("ROOT/parent_dir/dir_deep_21/child_dir/file", false).is_ignore()
);
// 22 // 22
assert!(m("ROOT/parent_dir/dir_deep_22", true).is_none()); // dir itself doesn't match // dir itself doesn't match
assert!(m("ROOT/parent_dir/dir_deep_22", true).is_none());
assert!(m("ROOT/parent_dir/dir_deep_22/file", false).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_22/file", false).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_22/child_dir", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_22/child_dir", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_22/child_dir/file", false).is_ignore()); assert!(
m("ROOT/parent_dir/dir_deep_22/child_dir/file", false).is_ignore()
);
// 23 // 23
assert!(m("ROOT/parent_dir/dir_deep_23", true).is_none()); // dir itself doesn't match // dir itself doesn't match
assert!(m("ROOT/parent_dir/dir_deep_23", true).is_none());
assert!(m("ROOT/parent_dir/dir_deep_23/file", false).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_23/file", false).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_23/child_dir", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_23/child_dir", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_23/child_dir/file", false).is_ignore()); assert!(
m("ROOT/parent_dir/dir_deep_23/child_dir/file", false).is_ignore()
);
// 30 // 30
assert!(m("ROOT/parent_dir/dir_deep_30", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_30", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_30/file", false).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_30/file", false).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_30/child_dir", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_30/child_dir", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_30/child_dir/file", false).is_ignore()); assert!(
m("ROOT/parent_dir/dir_deep_30/child_dir/file", false).is_ignore()
);
// 31 // 31
assert!(m("ROOT/parent_dir/dir_deep_31", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_31", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_31/file", false).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_31/file", false).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_31/child_dir", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_31/child_dir", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_31/child_dir/file", false).is_ignore()); assert!(
m("ROOT/parent_dir/dir_deep_31/child_dir/file", false).is_ignore()
);
// 32 // 32
assert!(m("ROOT/parent_dir/dir_deep_32", true).is_none()); // dir itself doesn't match // dir itself doesn't match
assert!(m("ROOT/parent_dir/dir_deep_32", true).is_none());
assert!(m("ROOT/parent_dir/dir_deep_32/file", false).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_32/file", false).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_32/child_dir", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_32/child_dir", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_32/child_dir/file", false).is_ignore()); assert!(
m("ROOT/parent_dir/dir_deep_32/child_dir/file", false).is_ignore()
);
// 33 // 33
assert!(m("ROOT/parent_dir/dir_deep_33", true).is_none()); // dir itself doesn't match // dir itself doesn't match
assert!(m("ROOT/parent_dir/dir_deep_33", true).is_none());
assert!(m("ROOT/parent_dir/dir_deep_33/file", false).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_33/file", false).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_33/child_dir", true).is_ignore()); assert!(m("ROOT/parent_dir/dir_deep_33/child_dir", true).is_ignore());
assert!(m("ROOT/parent_dir/dir_deep_33/child_dir/file", false).is_ignore()); assert!(
m("ROOT/parent_dir/dir_deep_33/child_dir/file", false).is_ignore()
);
} }
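For readers skimming the diff: a minimal sketch (not part of the change) of driving `matched_path_or_any_parents` directly, assuming the `ignore` crate's `GitignoreBuilder::add_line` API; the pattern and paths below are illustrative, not taken from the fixture file.

```rust
extern crate ignore;

use std::path::Path;

use ignore::gitignore::GitignoreBuilder;

fn main() {
    // Build a matcher rooted at "ROOT", as the tests above do.
    let mut builder = GitignoreBuilder::new("ROOT");
    // add_line(None, ..) adds a single pattern as if read from a gitignore.
    builder.add_line(None, "some_dir/").unwrap();
    let gitignore = builder.build().unwrap();
    // Unlike `matched`, this also checks every parent directory of the
    // queried path, so a file inside an ignored directory still reports
    // is_ignore() even though the file itself matches no pattern.
    let m = gitignore.matched_path_or_any_parents(
        Path::new("ROOT/some_dir/child/file"),
        false, // is_dir
    );
    assert!(m.is_ignore());
}
```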


@@ -1,14 +1,14 @@
 class RipgrepBin < Formula
-  version '0.8.1'
+  version '0.9.0'
   desc "Recursively search directories for a regex pattern."
   homepage "https://github.com/BurntSushi/ripgrep"

   if OS.mac?
     url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-apple-darwin.tar.gz"
-    sha256 "71f8d2907b473e5fc30159b822b0f1de247634ee292d5cc3fa1bb80222e0f613"
+    sha256 "36003ea8b62ad6274dc14140039f448cdf5026827d53cf24dad2d84005557a8c"
   elsif OS.linux?
     url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-unknown-linux-musl.tar.gz"
-    sha256 "08b1aa1440a23a88c94cff41a860340ecf38e9108817aff30ff778c00c63eb76"
+    sha256 "2eb4443e58f95051ff76ea036ed1faf940d5a04af4e7ff5a7dbd74576b907e99"
   end

   conflicts_with "ripgrep"


@@ -11,4 +11,5 @@ parts:
     source: .
 apps:
   rg:
-    command: env PATH=$SNAP/bin:$PATH rg
+    adapter: none
+    command: ./bin/rg


@@ -2,8 +2,8 @@
 // including some light validation.
 //
 // This module is purposely written in a bare-bones way, since it is included
-// in ripgrep's build.rs file as a way to generate completion files for common
-// shells.
+// in ripgrep's build.rs file as a way to generate a man page and completion
+// files for common shells.
 //
 // The only other place that ripgrep deals with clap is in src/args.rs, which
 // is where we read clap's configuration from the end user's arguments and turn
@@ -33,7 +33,8 @@ const USAGE: &str = "
     rg [OPTIONS] PATTERN [PATH ...]
     rg [OPTIONS] [-e PATTERN ...] [-f PATTERNFILE ...] [PATH ...]
     rg [OPTIONS] --files [PATH ...]
-    rg [OPTIONS] --type-list";
+    rg [OPTIONS] --type-list
+    command | rg [OPTIONS] PATTERN";
 const TEMPLATE: &str = "\
 {bin} {version}
@@ -475,26 +476,9 @@ impl RGArg {
         });
         self
     }

-    /// Indicate that any value given to this argument should be a valid
-    /// line number width. A valid line number width cannot start with `0`
-    /// to maintain compatibility with future improvements that add support
-    /// for padding character specifiers.
-    fn line_number_width(mut self) -> RGArg {
-        self.claparg = self.claparg.validator(|val| {
-            if val.starts_with("0") {
-                Err(String::from(
-                    "Custom padding characters are currently not supported. \
-                     Please enter only a numeric value."))
-            } else {
-                val.parse::<usize>().map(|_| ()).map_err(|err| err.to_string())
-            }
-        });
-        self
-    }
 }
-// We add an extra space to long descriptions so that a black line is inserted
+// We add an extra space to long descriptions so that a blank line is inserted
 // between flag descriptions in --help output.
 macro_rules! long {
     ($lit:expr) => { concat!($lit, " ") }
@@ -535,15 +519,17 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
     flag_ignore_file(&mut args);
     flag_invert_match(&mut args);
     flag_line_number(&mut args);
-    flag_line_number_width(&mut args);
     flag_line_regexp(&mut args);
     flag_max_columns(&mut args);
     flag_max_count(&mut args);
+    flag_max_depth(&mut args);
     flag_max_filesize(&mut args);
-    flag_maxdepth(&mut args);
     flag_mmap(&mut args);
+    flag_multiline(&mut args);
     flag_no_config(&mut args);
     flag_no_ignore(&mut args);
+    flag_no_ignore_global(&mut args);
+    flag_no_ignore_messages(&mut args);
     flag_no_ignore_parent(&mut args);
     flag_no_ignore_vcs(&mut args);
     flag_no_messages(&mut args);
@@ -551,6 +537,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
     flag_only_matching(&mut args);
     flag_path_separator(&mut args);
     flag_passthru(&mut args);
+    flag_pre(&mut args);
     flag_pretty(&mut args);
     flag_quiet(&mut args);
     flag_regex_size_limit(&mut args);
@@ -740,9 +727,17 @@ fn flag_column(args: &mut Vec<RGArg>) {
 Show column numbers (1-based). This only shows the column numbers for the first
 match on each line. This does not try to account for Unicode. One byte is equal
 to one column. This implies --line-number.
+
+This flag can be disabled with --no-column.
 ");
     let arg = RGArg::switch("column")
-        .help(SHORT).long_help(LONG);
+        .help(SHORT).long_help(LONG)
+        .overrides("no-column");
+    args.push(arg);
+
+    let arg = RGArg::switch("no-column")
+        .hidden()
+        .overrides("column");
     args.push(arg);
 }
@@ -819,10 +814,23 @@ fn flag_debug(args: &mut Vec<RGArg>) {
const SHORT: &str = "Show debug messages."; const SHORT: &str = "Show debug messages.";
const LONG: &str = long!("\ const LONG: &str = long!("\
Show debug messages. Please use this when filing a bug report. Show debug messages. Please use this when filing a bug report.
The --debug flag is generally useful for figuring out why ripgrep skipped
searching a particular file. The debug messages should mention all files
skipped and why they were skipped.
To get even more debug output, use the --trace flag, which implies --debug
along with additional trace data. With --trace, the output could be quite
large and is generally more useful for development.
"); ");
let arg = RGArg::switch("debug") let arg = RGArg::switch("debug")
.help(SHORT).long_help(LONG); .help(SHORT).long_help(LONG);
args.push(arg); args.push(arg);
let arg = RGArg::switch("trace")
.hidden()
.overrides("debug");
args.push(arg);
} }
 fn flag_dfa_size_limit(args: &mut Vec<RGArg>) {
@@ -918,9 +926,17 @@ fn flag_fixed_strings(args: &mut Vec<RGArg>) {
 Treat the pattern as a literal string instead of a regular expression. When
 this flag is used, special regular expression meta characters such as .(){}*+
 do not need to be escaped.
+
+This flag can be disabled with --no-fixed-strings.
 ");
     let arg = RGArg::switch("fixed-strings").short("F")
-        .help(SHORT).long_help(LONG);
+        .help(SHORT).long_help(LONG)
+        .overrides("no-fixed-strings");
+    args.push(arg);
+
+    let arg = RGArg::switch("no-fixed-strings")
+        .hidden()
+        .overrides("fixed-strings");
     args.push(arg);
 }
@@ -1091,18 +1107,6 @@ terminal.
     args.push(arg);
 }

-fn flag_line_number_width(args: &mut Vec<RGArg>) {
-    const SHORT: &str = "Left pad line numbers up to NUM width.";
-    const LONG: &str = long!("\
-Left pad line numbers up to NUM width. Space is used as the default padding
-character. This has no effect if --no-line-number is enabled.
-");
-    let arg = RGArg::flag("line-number-width", "NUM")
-        .help(SHORT).long_help(LONG)
-        .line_number_width();
-    args.push(arg);
-}
 fn flag_line_regexp(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Only show matches surrounded by line boundaries.";
     const LONG: &str = long!("\
@@ -1143,6 +1147,23 @@ Limit the number of matching lines per file searched to NUM.
     args.push(arg);
 }

+fn flag_max_depth(args: &mut Vec<RGArg>) {
+    const SHORT: &str = "Descend at most NUM directories.";
+    const LONG: &str = long!("\
+Limit the depth of directory traversal to NUM levels beyond the paths given. A
+value of zero only searches the explicitly given paths themselves.
+
+For example, 'rg --max-depth 0 dir/' is a no-op because dir/ will not be
+descended into. 'rg --max-depth 1 dir/' will search only the direct children of
+'dir'.
+");
+    let arg = RGArg::flag("max-depth", "NUM")
+        .help(SHORT).long_help(LONG)
+        .alias("maxdepth")
+        .number();
+    args.push(arg);
+}
 fn flag_max_filesize(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Ignore files larger than NUM in size.";
     const LONG: &str = long!("\
@@ -1159,22 +1180,6 @@ Examples: --max-filesize 50K or --max-filesize 80M
     args.push(arg);
 }

-fn flag_maxdepth(args: &mut Vec<RGArg>) {
-    const SHORT: &str = "Descend at most NUM directories.";
-    const LONG: &str = long!("\
-Limit the depth of directory traversal to NUM levels beyond the paths given. A
-value of zero only searches the explicitly given paths themselves.
-
-For example, 'rg --maxdepth 0 dir/' is a no-op because dir/ will not be
-descended into. 'rg --maxdepth 1 dir/' will search only the direct children of
-'dir'.
-");
-    let arg = RGArg::flag("maxdepth", "NUM")
-        .help(SHORT).long_help(LONG)
-        .number();
-    args.push(arg);
-}
 fn flag_mmap(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Search using memory maps when possible.";
     const LONG: &str = long!("\
@@ -1207,6 +1212,24 @@ This flag overrides --mmap.
     args.push(arg);
 }

+fn flag_multiline(args: &mut Vec<RGArg>) {
+    const SHORT: &str = "Enable matching across multiple lines.";
+    const LONG: &str = long!("\
+Enable matching across multiple lines.
+
+This flag can be disabled with --no-multiline.
+");
+    let arg = RGArg::switch("multiline")
+        .help(SHORT).long_help(LONG)
+        .overrides("no-multiline");
+    args.push(arg);
+
+    let arg = RGArg::switch("no-multiline")
+        .hidden()
+        .overrides("multiline");
+    args.push(arg);
+}
 fn flag_no_config(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Never read configuration files.";
     const LONG: &str = long!("\
@@ -1240,6 +1263,45 @@ This flag can be disabled with the --ignore flag.
     args.push(arg);
 }

+fn flag_no_ignore_global(args: &mut Vec<RGArg>) {
+    const SHORT: &str = "Don't respect global ignore files.";
+    const LONG: &str = long!("\
+Don't respect ignore files that come from \"global\" sources such as git's
+`core.excludesFile` configuration option (which defaults to
+`$HOME/.config/git/ignore`).
+
+This flag can be disabled with the --ignore-global flag.
+");
+    let arg = RGArg::switch("no-ignore-global")
+        .help(SHORT).long_help(LONG)
+        .overrides("ignore-global");
+    args.push(arg);
+
+    let arg = RGArg::switch("ignore-global")
+        .hidden()
+        .overrides("no-ignore-global");
+    args.push(arg);
+}
+
+fn flag_no_ignore_messages(args: &mut Vec<RGArg>) {
+    const SHORT: &str = "Suppress gitignore parse error messages.";
+    const LONG: &str = long!("\
+Suppresses all error messages related to parsing ignore files such as .ignore
+or .gitignore.
+
+This flag can be disabled with the --ignore-messages flag.
+");
+    let arg = RGArg::switch("no-ignore-messages")
+        .help(SHORT).long_help(LONG)
+        .overrides("ignore-messages");
+    args.push(arg);
+
+    let arg = RGArg::switch("ignore-messages")
+        .hidden()
+        .overrides("no-ignore-messages");
+    args.push(arg);
+}
 fn flag_no_ignore_parent(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Don't respect ignore files in parent directories.";
     const LONG: &str = long!("\
@@ -1279,10 +1341,10 @@ This flag can be disabled with the --ignore-vcs flag.
 }

 fn flag_no_messages(args: &mut Vec<RGArg>) {
-    const SHORT: &str = "Suppress all error messages.";
+    const SHORT: &str = "Suppress some error messages.";
     const LONG: &str = long!("\
-Suppress all error messages. This provides the same behavior as redirecting
-stderr to /dev/null on Unix-like systems.
+Suppress all error messages related to opening and reading files. Error
+messages related to the syntax of the pattern given are still shown.

 This flag can be disabled with the --messages flag.
 ");
@@ -1344,13 +1406,10 @@ the empty string. For example, if you are searching using 'rg foo' then using
'rg \"^|foo\"' instead will emit every line in every file searched, but only 'rg \"^|foo\"' instead will emit every line in every file searched, but only
occurrences of 'foo' will be highlighted. This flag enables the same behavior occurrences of 'foo' will be highlighted. This flag enables the same behavior
without needing to modify the pattern. without needing to modify the pattern.
This flag conflicts with the --only-matching and --replace flags.
"); ");
let arg = RGArg::switch("passthru") let arg = RGArg::switch("passthru")
.help(SHORT).long_help(LONG) .help(SHORT).long_help(LONG)
.alias("passthrough") .alias("passthrough");
.conflicts(&["only-matching", "replace"]);
args.push(arg); args.push(arg);
} }
@@ -1372,6 +1431,9 @@ fn flag_quiet(args: &mut Vec<RGArg>) {
 Do not print anything to stdout. If a match is found in a file, then ripgrep
 will stop searching. This is useful when ripgrep is used only for its exit
 code (which will be an error if no matches are found).
+
+When --files is used, then ripgrep will stop finding files after finding the
+first file that matches all ignore rules.
 ");
     let arg = RGArg::switch("quiet").short("q")
         .help(SHORT).long_help(LONG);
@@ -1438,7 +1500,7 @@ This flag can be used with the -o/--only-matching flag.
 fn flag_search_zip(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Search in compressed files.";
     const LONG: &str = long!("\
-Search in compressed files. Currently gz, bz2, xz, and lzma files are
+Search in compressed files. Currently gz, bz2, xz, lzma and lz4 files are
 supported. This option expects the decompression binaries to be available in
 your PATH.
@@ -1446,7 +1508,8 @@ This flag can be disabled with --no-search-zip.
"); ");
let arg = RGArg::switch("search-zip").short("z") let arg = RGArg::switch("search-zip").short("z")
.help(SHORT).long_help(LONG) .help(SHORT).long_help(LONG)
.overrides("no-search-zip"); .overrides("no-search-zip")
.overrides("pre");
args.push(arg); args.push(arg);
let arg = RGArg::switch("no-search-zip") let arg = RGArg::switch("no-search-zip")
@@ -1455,6 +1518,64 @@ This flag can be disabled with --no-search-zip.
     args.push(arg);
 }

+fn flag_pre(args: &mut Vec<RGArg>) {
+    const SHORT: &str = "search outputs of COMMAND FILE for each FILE";
+    const LONG: &str = long!("\
+For each input FILE, search the standard output of COMMAND FILE rather than the
+contents of FILE. This option expects the COMMAND program to either be an
+absolute path or to be available in your PATH. Either an empty string COMMAND
+or the `--no-pre` flag will disable this behavior.
+
+    WARNING: When this flag is set, ripgrep will unconditionally spawn a
+    process for every file that is searched. Therefore, this can incur an
+    unnecessarily large performance penalty if you don't otherwise need the
+    flexibility offered by this flag.
+
+A preprocessor is not run when ripgrep is searching stdin.
+
+When searching over sets of files that may require one of several decoders
+as preprocessors, COMMAND should be a wrapper program or script which first
+classifies FILE based on magic numbers/content or based on the FILE name and
+then dispatches to an appropriate preprocessor. Each COMMAND also has its
+standard input connected to FILE for convenience.
+
+For example, a shell script for COMMAND might look like:
+
+    case \"$1\" in
+    *.pdf)
+        exec pdftotext \"$1\" -
+        ;;
+    *)
+        case $(file \"$1\") in
+        *Zstandard*)
+            exec pzstd -cdq
+            ;;
+        *)
+            exec cat
+            ;;
+        esac
+        ;;
+    esac
+
+The above script uses `pdftotext` to convert a PDF file to plain text. For
+all other files, the script uses the `file` utility to sniff the type of the
+file based on its contents. If it is a compressed file in the Zstandard format,
+then `pzstd` is used to decompress the contents to stdout.
+
+This overrides the -z/--search-zip flag.
+");
+    let arg = RGArg::flag("pre", "COMMAND")
+        .help(SHORT).long_help(LONG)
+        .overrides("no-pre")
+        .overrides("search-zip");
+    args.push(arg);
+
+    let arg = RGArg::switch("no-pre")
+        .hidden()
+        .overrides("pre");
+    args.push(arg);
+}
 fn flag_smart_case(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Smart case search.";
     const LONG: &str = long!("\

File diff suppressed because it is too large.


@@ -12,10 +12,7 @@ use std::path::{Path, PathBuf};
 use Result;

 /// Return a sequence of arguments derived from ripgrep rc configuration files.
-///
-/// If no_messages is false and there was a problem reading a config file,
-/// then errors are printed to stderr.
-pub fn args(no_messages: bool) -> Vec<OsString> {
+pub fn args() -> Vec<OsString> {
     let config_path = match env::var_os("RIPGREP_CONFIG_PATH") {
         None => return vec![],
         Some(config_path) => {
@@ -28,20 +25,20 @@ pub fn args(no_messages: bool) -> Vec<OsString> {
     let (args, errs) = match parse(&config_path) {
         Ok((args, errs)) => (args, errs),
         Err(err) => {
-            if !no_messages {
-                eprintln!("{}", err);
-            }
+            message!("{}", err);
             return vec![];
         }
     };
-    if !no_messages && !errs.is_empty() {
+    if !errs.is_empty() {
         for err in errs {
-            eprintln!("{}:{}", config_path.display(), err);
+            message!("{}:{}", config_path.display(), err);
         }
     }
     debug!(
         "{}: arguments loaded from config file: {:?}",
-        config_path.display(), args);
+        config_path.display(),
+        args
+    );
     args
 }
@@ -59,7 +56,7 @@ fn parse<P: AsRef<Path>>(
     let path = path.as_ref();
     match File::open(&path) {
         Ok(file) => parse_reader(file),
-        Err(err) => errored!("{}: {}", path.display(), err),
+        Err(err) => Err(From::from(format!("{}: {}", path.display(), err))),
     }
 }
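For orientation: the config format that `parse` consumes is one argument per line, with blank lines and `#` comments skipped. A minimal sketch of that parse step under those assumptions (the real `parse_reader` also collects per-line errors, which is where the `errs` above come from, and works on raw bytes rather than UTF-8 strings):

```rust
use std::ffi::OsString;
use std::io::{self, BufRead};

fn parse_reader<R: io::Read>(rdr: R) -> io::Result<Vec<OsString>> {
    let mut args = vec![];
    for line in io::BufReader::new(rdr).lines() {
        let line = line?;
        let trimmed = line.trim();
        // Skip blank lines and comments; everything else is one argument.
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }
        args.push(OsString::from(trimmed));
    }
    Ok(args)
}
```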


@@ -1,456 +0,0 @@
use std::cmp;
use std::io::{self, Read};
use encoding_rs::{Decoder, Encoding, UTF_8};
/// A BOM is at least 2 bytes and at most 3 bytes.
///
/// If fewer than 2 bytes are available to be read at the beginning of a
/// reader, then a BOM is `None`.
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
struct Bom {
bytes: [u8; 3],
len: usize,
}
impl Bom {
fn as_slice(&self) -> &[u8] {
&self.bytes[0..self.len]
}
fn decoder(&self) -> Option<Decoder> {
let bom = self.as_slice();
if bom.len() < 3 {
return None;
}
if let Some((enc, _)) = Encoding::for_bom(bom) {
if enc != UTF_8 {
return Some(enc.new_decoder_with_bom_removal());
}
}
None
}
}
/// `BomPeeker` wraps `R` and satisfies the `io::Read` interface while also
/// providing a peek at the BOM if one exists. Peeking at the BOM does not
/// advance the reader.
struct BomPeeker<R> {
rdr: R,
bom: Option<Bom>,
nread: usize,
}
impl<R: io::Read> BomPeeker<R> {
/// Create a new BomPeeker.
///
/// The first three bytes can be read using the `peek_bom` method, but
/// will not advance the reader.
fn new(rdr: R) -> BomPeeker<R> {
BomPeeker { rdr: rdr, bom: None, nread: 0 }
}
/// Peek at the first three bytes of the underlying reader.
///
/// This does not advance the reader provided by `BomPeeker`.
///
/// If the underlying reader does not have at least two bytes available,
/// then `None` is returned.
fn peek_bom(&mut self) -> io::Result<Bom> {
if let Some(bom) = self.bom {
return Ok(bom);
}
self.bom = Some(Bom { bytes: [0; 3], len: 0 });
let mut buf = [0u8; 3];
let bom_len = read_full(&mut self.rdr, &mut buf)?;
self.bom = Some(Bom { bytes: buf, len: bom_len });
Ok(self.bom.unwrap())
}
}
impl<R: io::Read> io::Read for BomPeeker<R> {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
if self.nread < 3 {
let bom = self.peek_bom()?;
let bom = bom.as_slice();
if self.nread < bom.len() {
let rest = &bom[self.nread..];
let len = cmp::min(buf.len(), rest.len());
buf[..len].copy_from_slice(&rest[..len]);
self.nread += len;
return Ok(len);
}
}
let nread = self.rdr.read(buf)?;
self.nread += nread;
Ok(nread)
}
}
/// Like `io::Read::read_exact`, except it never returns `UnexpectedEof` and
/// instead returns the number of bytes read if EOF is seen before filling
/// `buf`.
fn read_full<R: io::Read>(
mut rdr: R,
mut buf: &mut [u8],
) -> io::Result<usize> {
let mut nread = 0;
while !buf.is_empty() {
match rdr.read(buf) {
Ok(0) => break,
Ok(n) => {
nread += n;
let tmp = buf;
buf = &mut tmp[n..];
}
Err(ref e) if e.kind() == io::ErrorKind::Interrupted => {}
Err(e) => return Err(e),
}
}
Ok(nread)
}
/// A reader that transcodes to UTF-8. The source encoding is determined by
/// inspecting the BOM from the stream read from `R`, if one exists. If a
/// UTF-16 BOM exists, then the source stream is transcoded to UTF-8 with
/// invalid UTF-16 sequences translated to the Unicode replacement character.
/// In all other cases, the underlying reader is passed through unchanged.
///
/// `R` is the type of the underlying reader and `B` is the type of an internal
/// buffer used to store the results of transcoding.
///
/// Note that not all methods on `io::Read` work with this implementation.
/// For example, the `bytes` adapter method attempts to read a single byte at
/// a time, but this implementation requires a buffer of size at least `4`. If
/// a buffer of size less than 4 is given, then an error is returned.
pub struct DecodeReader<R, B> {
/// The underlying reader, wrapped in a peeker for reading a BOM if one
/// exists.
rdr: BomPeeker<R>,
/// The internal buffer to store transcoded bytes before they are read by
/// callers.
buf: B,
/// The current position in `buf`. Subsequent reads start here.
pos: usize,
/// The number of transcoded bytes in `buf`. Subsequent reads end here.
buflen: usize,
/// Whether this is the first read or not (in which we inspect the BOM).
first: bool,
/// Whether a "last" read has occurred. After this point, EOF will always
/// be returned.
last: bool,
/// The underlying text decoder derived from the BOM, if one exists.
decoder: Option<Decoder>,
}
impl<R: io::Read, B: AsMut<[u8]>> DecodeReader<R, B> {
/// Create a new transcoder that converts a source stream to valid UTF-8.
///
/// If an encoding is specified, then it is used to transcode `rdr` to
/// UTF-8. Otherwise, if no encoding is specified, and if a UTF-16 BOM is
/// found, then the corresponding UTF-16 encoding is used to transcode
/// `rdr` to UTF-8. In all other cases, `rdr` is assumed to be at least
/// ASCII-compatible and passed through untouched.
///
/// Errors in the encoding of `rdr` are handled with the Unicode
/// replacement character. If no encoding of `rdr` is specified, then
/// errors are not handled.
pub fn new(
rdr: R,
buf: B,
enc: Option<&'static Encoding>,
) -> DecodeReader<R, B> {
DecodeReader {
rdr: BomPeeker::new(rdr),
buf: buf,
buflen: 0,
pos: 0,
first: enc.is_none(),
last: false,
decoder: enc.map(|enc| enc.new_decoder_with_bom_removal()),
}
}
/// Fill the internal buffer from the underlying reader.
///
/// If there are unread bytes in the internal buffer, then we move them
/// to the beginning of the internal buffer and fill the remainder.
///
/// If the internal buffer is too small to read additional bytes, then an
/// error is returned.
#[inline(always)] // massive perf benefit (???)
fn fill(&mut self) -> io::Result<()> {
if self.pos < self.buflen {
if self.buflen >= self.buf.as_mut().len() {
return Err(io::Error::new(
io::ErrorKind::Other,
"DecodeReader: internal buffer exhausted"));
}
let newlen = self.buflen - self.pos;
let mut tmp = Vec::with_capacity(newlen);
tmp.extend_from_slice(&self.buf.as_mut()[self.pos..self.buflen]);
self.buf.as_mut()[..newlen].copy_from_slice(&tmp);
self.buflen = newlen;
} else {
self.buflen = 0;
}
self.pos = 0;
self.buflen +=
self.rdr.read(&mut self.buf.as_mut()[self.buflen..])?;
Ok(())
}
/// Transcode the inner stream to UTF-8 in `buf`. This assumes that there
/// is a decoder capable of transcoding the inner stream to UTF-8. This
/// returns the number of bytes written to `buf`.
///
/// When this function returns, exactly one of the following things will
/// be true:
///
/// 1. A non-zero number of bytes were written to `buf`.
/// 2. The underlying reader reached EOF.
/// 3. An error is returned: the internal buffer ran out of room.
/// 4. An I/O error occurred.
///
/// Note that `buf` must have at least 4 bytes of space.
fn transcode(&mut self, buf: &mut [u8]) -> io::Result<usize> {
assert!(buf.len() >= 4);
if self.last {
return Ok(0);
}
if self.pos >= self.buflen {
self.fill()?;
}
let mut nwrite = 0;
loop {
let (_, nin, nout, _) =
self.decoder.as_mut().unwrap().decode_to_utf8(
&self.buf.as_mut()[self.pos..self.buflen], buf, false);
self.pos += nin;
nwrite += nout;
// If we've written at least one byte to the caller-provided
// buffer, then our mission is complete.
if nwrite > 0 {
break;
}
// Otherwise, we know that our internal buffer has insufficient
// data to transcode at least one char, so we attempt to refill it.
self.fill()?;
// Quit on EOF.
if self.buflen == 0 {
self.pos = 0;
self.last = true;
let (_, _, nout, _) =
self.decoder.as_mut().unwrap().decode_to_utf8(
&[], buf, true);
return Ok(nout);
}
}
Ok(nwrite)
}
#[inline(never)] // impacts perf...
fn detect(&mut self) -> io::Result<()> {
let bom = self.rdr.peek_bom()?;
self.decoder = bom.decoder();
Ok(())
}
}
impl<R: io::Read, B: AsMut<[u8]>> io::Read for DecodeReader<R, B> {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
if self.first {
self.first = false;
self.detect()?;
}
if self.decoder.is_none() {
return self.rdr.read(buf);
}
// When decoding UTF-8, we need at least 4 bytes of space to guarantee
// that we can decode at least one codepoint. If we don't have it, we
// can either return `0` for the number of bytes read or return an
// error. Since `0` would be interpreted as a possibly premature EOF,
// we opt for an error.
if buf.len() < 4 {
return Err(io::Error::new(
io::ErrorKind::Other,
"DecodeReader: byte buffer must have length at least 4"));
}
self.transcode(buf)
}
}
#[cfg(test)]
mod tests {
use std::io::Read;
use encoding_rs::Encoding;
use super::{Bom, BomPeeker, DecodeReader};
fn read_to_string<R: Read>(mut rdr: R) -> String {
let mut s = String::new();
rdr.read_to_string(&mut s).unwrap();
s
}
#[test]
fn peeker_empty() {
let buf = [];
let mut peeker = BomPeeker::new(&buf[..]);
assert_eq!(Bom { bytes: [0; 3], len: 0}, peeker.peek_bom().unwrap());
let mut tmp = [0; 100];
assert_eq!(0, peeker.read(&mut tmp).unwrap());
}
#[test]
fn peeker_one() {
let buf = [1];
let mut peeker = BomPeeker::new(&buf[..]);
assert_eq!(
Bom { bytes: [1, 0, 0], len: 1},
peeker.peek_bom().unwrap());
let mut tmp = [0; 100];
assert_eq!(1, peeker.read(&mut tmp).unwrap());
assert_eq!(1, tmp[0]);
assert_eq!(0, peeker.read(&mut tmp).unwrap());
}
#[test]
fn peeker_two() {
let buf = [1, 2];
let mut peeker = BomPeeker::new(&buf[..]);
assert_eq!(
Bom { bytes: [1, 2, 0], len: 2},
peeker.peek_bom().unwrap());
let mut tmp = [0; 100];
assert_eq!(2, peeker.read(&mut tmp).unwrap());
assert_eq!(1, tmp[0]);
assert_eq!(2, tmp[1]);
assert_eq!(0, peeker.read(&mut tmp).unwrap());
}
#[test]
fn peeker_three() {
let buf = [1, 2, 3];
let mut peeker = BomPeeker::new(&buf[..]);
assert_eq!(
Bom { bytes: [1, 2, 3], len: 3},
peeker.peek_bom().unwrap());
let mut tmp = [0; 100];
assert_eq!(3, peeker.read(&mut tmp).unwrap());
assert_eq!(1, tmp[0]);
assert_eq!(2, tmp[1]);
assert_eq!(3, tmp[2]);
assert_eq!(0, peeker.read(&mut tmp).unwrap());
}
#[test]
fn peeker_four() {
let buf = [1, 2, 3, 4];
let mut peeker = BomPeeker::new(&buf[..]);
assert_eq!(
Bom { bytes: [1, 2, 3], len: 3},
peeker.peek_bom().unwrap());
let mut tmp = [0; 100];
assert_eq!(3, peeker.read(&mut tmp).unwrap());
assert_eq!(1, tmp[0]);
assert_eq!(2, tmp[1]);
assert_eq!(3, tmp[2]);
assert_eq!(1, peeker.read(&mut tmp).unwrap());
assert_eq!(4, tmp[0]);
assert_eq!(0, peeker.read(&mut tmp).unwrap());
}
#[test]
fn peeker_one_at_a_time() {
let buf = [1, 2, 3, 4];
let mut peeker = BomPeeker::new(&buf[..]);
let mut tmp = [0; 1];
assert_eq!(0, peeker.read(&mut tmp[..0]).unwrap());
assert_eq!(0, tmp[0]);
assert_eq!(1, peeker.read(&mut tmp).unwrap());
assert_eq!(1, tmp[0]);
assert_eq!(1, peeker.read(&mut tmp).unwrap());
assert_eq!(2, tmp[0]);
assert_eq!(1, peeker.read(&mut tmp).unwrap());
assert_eq!(3, tmp[0]);
assert_eq!(1, peeker.read(&mut tmp).unwrap());
assert_eq!(4, tmp[0]);
}
// In cases where all we have is a bom, we expect the bytes to be
// passed through unchanged.
#[test]
fn trans_utf16_bom() {
let srcbuf = vec![0xFF, 0xFE];
let mut dstbuf = vec![0; 8 * (1<<10)];
let mut rdr = DecodeReader::new(&*srcbuf, vec![0; 8 * (1<<10)], None);
let n = rdr.read(&mut dstbuf).unwrap();
assert_eq!(&*srcbuf, &dstbuf[..n]);
let srcbuf = vec![0xFE, 0xFF];
let mut rdr = DecodeReader::new(&*srcbuf, vec![0; 8 * (1<<10)], None);
let n = rdr.read(&mut dstbuf).unwrap();
assert_eq!(&*srcbuf, &dstbuf[..n]);
let srcbuf = vec![0xEF, 0xBB, 0xBF];
let mut rdr = DecodeReader::new(&*srcbuf, vec![0; 8 * (1<<10)], None);
let n = rdr.read(&mut dstbuf).unwrap();
assert_eq!(&*srcbuf, &dstbuf[..n]);
}
// Test basic UTF-16 decoding.
#[test]
fn trans_utf16_basic() {
let srcbuf = vec![0xFF, 0xFE, 0x61, 0x00];
let mut rdr = DecodeReader::new(&*srcbuf, vec![0; 8 * (1<<10)], None);
assert_eq!("a", read_to_string(&mut rdr));
let srcbuf = vec![0xFE, 0xFF, 0x00, 0x61];
let mut rdr = DecodeReader::new(&*srcbuf, vec![0; 8 * (1<<10)], None);
assert_eq!("a", read_to_string(&mut rdr));
}
// Test incomplete UTF-16 decoding. This ensures we see a replacement char
// if the stream ends with an unpaired code unit.
#[test]
fn trans_utf16_incomplete() {
let srcbuf = vec![0xFF, 0xFE, 0x61, 0x00, 0x00];
let mut rdr = DecodeReader::new(&*srcbuf, vec![0; 8 * (1<<10)], None);
assert_eq!("a\u{FFFD}", read_to_string(&mut rdr));
}
macro_rules! test_trans_simple {
($name:ident, $enc:expr, $srcbytes:expr, $dst:expr) => {
#[test]
fn $name() {
let srcbuf = &$srcbytes[..];
let enc = Encoding::for_label($enc.as_bytes());
let mut rdr = DecodeReader::new(
&*srcbuf, vec![0; 8 * (1<<10)], enc);
assert_eq!($dst, read_to_string(&mut rdr));
}
}
}
// This isn't exhaustive obviously, but it lets us test base level support.
test_trans_simple!(trans_simple_auto, "does not exist", b"\xD0\x96", "Ж");
test_trans_simple!(trans_simple_utf8, "utf-8", b"\xD0\x96", "Ж");
test_trans_simple!(trans_simple_utf16le, "utf-16le", b"\x16\x04", "Ж");
test_trans_simple!(trans_simple_utf16be, "utf-16be", b"\x04\x16", "Ж");
test_trans_simple!(trans_simple_chinese, "chinese", b"\xA7\xA8", "Ж");
test_trans_simple!(trans_simple_korean, "korean", b"\xAC\xA8", "Ж");
test_trans_simple!(
trans_simple_big5_hkscs, "big5-hkscs", b"\xC7\xFA", "Ж");
test_trans_simple!(trans_simple_gbk, "gbk", b"\xA7\xA8", "Ж");
test_trans_simple!(trans_simple_sjis, "sjis", b"\x84\x47", "Ж");
test_trans_simple!(trans_simple_eucjp, "euc-jp", b"\xA7\xA8", "Ж");
test_trans_simple!(trans_simple_latin1, "latin1", b"\xA9", "©");
}
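This deleted module's job, BOM sniffing plus transcoding to UTF-8, moves out of the ripgrep binary in this release. A hedged sketch of the replacement, assuming the functionality now comes from the encoding_rs_io crate (updated elsewhere in this commit range) and its `DecodeReaderBytes` type with BOM sniffing enabled by default:

```rust
extern crate encoding_rs_io;

use std::io::Read;

use encoding_rs_io::DecodeReaderBytes;

fn read_as_utf8<R: Read>(rdr: R) -> std::io::Result<String> {
    // DecodeReaderBytes sniffs a BOM and transcodes the stream to UTF-8,
    // passing ASCII-compatible input through unchanged, much like the
    // DecodeReader deleted above.
    let mut dst = String::new();
    DecodeReaderBytes::new(rdr).read_to_string(&mut dst)?;
    Ok(dst)
}
```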


@@ -44,6 +44,7 @@ lazy_static! {
m.insert("gz", DecompressionCommand::new("gzip", ARGS)); m.insert("gz", DecompressionCommand::new("gzip", ARGS));
m.insert("bz2", DecompressionCommand::new("bzip2", ARGS)); m.insert("bz2", DecompressionCommand::new("bzip2", ARGS));
m.insert("xz", DecompressionCommand::new("xz", ARGS)); m.insert("xz", DecompressionCommand::new("xz", ARGS));
m.insert("lz4", DecompressionCommand::new("lz4", ARGS));
const LZMA_ARGS: &[&str] = &["--format=lzma", "-d", "-c"]; const LZMA_ARGS: &[&str] = &["--format=lzma", "-d", "-c"];
m.insert("lzma", DecompressionCommand::new("xz", LZMA_ARGS)); m.insert("lzma", DecompressionCommand::new("xz", LZMA_ARGS));
@@ -55,6 +56,7 @@ lazy_static! {
builder.add(Glob::new("*.gz").unwrap()); builder.add(Glob::new("*.gz").unwrap());
builder.add(Glob::new("*.bz2").unwrap()); builder.add(Glob::new("*.bz2").unwrap());
builder.add(Glob::new("*.xz").unwrap()); builder.add(Glob::new("*.xz").unwrap());
builder.add(Glob::new("*.lz4").unwrap());
builder.add(Glob::new("*.lzma").unwrap()); builder.add(Glob::new("*.lzma").unwrap());
builder.build().unwrap() builder.build().unwrap()
}; };
@@ -63,6 +65,7 @@ lazy_static! {
builder.add(Glob::new("*.tar.gz").unwrap()); builder.add(Glob::new("*.tar.gz").unwrap());
builder.add(Glob::new("*.tar.xz").unwrap()); builder.add(Glob::new("*.tar.xz").unwrap());
builder.add(Glob::new("*.tar.bz2").unwrap()); builder.add(Glob::new("*.tar.bz2").unwrap());
builder.add(Glob::new("*.tar.lz4").unwrap());
builder.add(Glob::new("*.tgz").unwrap()); builder.add(Glob::new("*.tgz").unwrap());
builder.add(Glob::new("*.txz").unwrap()); builder.add(Glob::new("*.txz").unwrap());
builder.add(Glob::new("*.tbz2").unwrap()); builder.add(Glob::new("*.tbz2").unwrap());
@@ -89,10 +92,6 @@ impl DecompressionReader {
     /// If there is any error in spawning the decompression command, then
     /// return `None`, after outputting any necessary debug or error messages.
     pub fn from_path(path: &Path) -> Option<DecompressionReader> {
-        if is_tar_archive(path) {
-            debug!("{}: skipping tar archive", path.display());
-            return None;
-        }
         let extension = match path.extension().and_then(OsStr::to_str) {
             Some(extension) => extension,
             None => {
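For orientation: once the extension is known, the reader boils down to spawning the matching command with its decompress-to-stdout arguments. A minimal sketch for the newly added lz4 entry, assuming `ARGS` is `["-d", "-c"]` as the gzip/bzip2/xz entries above suggest:

```rust
use std::io;
use std::path::Path;
use std::process::{Child, Command, Stdio};

fn spawn_lz4(path: &Path) -> io::Result<Child> {
    // Rough equivalent of DecompressionCommand::new("lz4", ARGS) at spawn
    // time: decompress the file to stdout and let the searcher read the pipe.
    Command::new("lz4")
        .args(&["-d", "-c"])
        .arg(path)
        .stdout(Stdio::piped())
        .spawn()
}
```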


@@ -34,19 +34,30 @@ impl Log for Logger {
         match (record.file(), record.line()) {
             (Some(file), Some(line)) => {
                 eprintln!(
-                    "{}/{}/{}:{}: {}",
-                    record.level(), record.target(),
-                    file, line, record.args());
+                    "{}|{}|{}:{}: {}",
+                    record.level(),
+                    record.target(),
+                    file,
+                    line,
+                    record.args()
+                );
             }
             (Some(file), None) => {
                 eprintln!(
-                    "{}/{}/{}: {}",
-                    record.level(), record.target(), file, record.args());
+                    "{}|{}|{}: {}",
+                    record.level(),
+                    record.target(),
+                    file,
+                    record.args()
+                );
             }
             _ => {
                 eprintln!(
-                    "{}/{}: {}",
-                    record.level(), record.target(), record.args());
+                    "{}|{}: {}",
+                    record.level(),
+                    record.target(),
+                    record.args()
+                );
             }
         }
     }
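For orientation: with the switch from `/` to `|` separators, the three message shapes above render roughly as follows (the module path, file, and line here are illustrative, not taken from a real run):

```
DEBUG|rg::args|src/args.rs:100: message text
DEBUG|rg::args|src/args.rs: message text
DEBUG|rg::args: message text
```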


@@ -1,18 +1,13 @@
 extern crate atty;
-extern crate bytecount;
 #[macro_use]
 extern crate clap;
-extern crate encoding_rs;
 extern crate globset;
 extern crate grep;
 extern crate ignore;
 #[macro_use]
 extern crate lazy_static;
-extern crate libc;
 #[macro_use]
 extern crate log;
-extern crate memchr;
-extern crate memmap;
 extern crate num_cpus;
 extern crate regex;
 extern crate same_file;
@@ -20,421 +15,279 @@ extern crate termcolor;
 #[cfg(windows)]
 extern crate winapi;

-use std::error::Error;
+use std::io;
 use std::process;
-use std::result;
-use std::sync::Arc;
-use std::sync::atomic::{AtomicUsize, Ordering};
-use std::sync::mpsc;
-use std::thread;
-use std::time::{Duration, Instant};
+use std::sync::{Arc, Mutex};
+use std::time::Instant;
+
+use ignore::WalkState;

 use args::Args;
-use worker::Work;
+use subject::Subject;

-macro_rules! errored {
-    ($($tt:tt)*) => {
-        return Err(From::from(format!($($tt)*)));
-    }
-}
+#[macro_use]
+mod messages;

 mod app;
 mod args;
 mod config;
-mod decoder;
 mod decompressor;
+mod preprocessor;
 mod logger;
-mod pathutil;
-mod printer;
-mod search_buffer;
-mod search_stream;
+mod path_printer;
+mod search;
+mod subject;
 mod unescape;
-mod worker;

-pub type Result<T> = result::Result<T, Box<Error>>;
+pub type Result<T> = ::std::result::Result<T, Box<::std::error::Error>>;

-fn main() {
-    reset_sigpipe();
-    match Args::parse().map(Arc::new).and_then(run) {
-        Ok(0) => process::exit(1),
-        Ok(_) => process::exit(0),
+pub fn main() {
+    match Args::parse().and_then(run) {
+        Ok(true) => process::exit(0),
+        Ok(false) => process::exit(1),
         Err(err) => {
             eprintln!("{}", err);
-            process::exit(1);
+            process::exit(2);
         }
     }
 }
fn run(args: Arc<Args>) -> Result<u64> { fn run(args: Args) -> Result<bool> {
if args.never_match() { use args::Command::*;
return Ok(0);
} match args.command()? {
let threads = args.threads(); Search => search(args),
if args.files() { SearchParallel => search_parallel(args),
if threads == 1 || args.is_one_path() { SearchNever => Ok(false),
run_files_one_thread(&args) Files => files(args),
} else { FilesParallel => files_parallel(args),
run_files_parallel(args) Types => types(args),
}
} else if args.type_list() {
run_types(&args)
} else if threads == 1 || args.is_one_path() {
run_one_thread(&args)
} else {
run_parallel(&args)
} }
} }
fn run_parallel(args: &Arc<Args>) -> Result<u64> { /// The top-level entry point for single-threaded search. This recursively
let start_time = Instant::now(); /// steps through the file list (current directory by default) and searches
let bufwtr = Arc::new(args.buffer_writer()); /// each file sequentially.
let quiet_matched = args.quiet_matched(); fn search(args: Args) -> Result<bool> {
let paths_searched = Arc::new(AtomicUsize::new(0)); let started_at = Instant::now();
let match_line_count = Arc::new(AtomicUsize::new(0)); let quit_after_match = args.quit_after_match()?;
let paths_matched = Arc::new(AtomicUsize::new(0)); let subject_builder = args.subject_builder();
let mut stats = args.stats()?;
let mut searcher = args.search_worker(args.stdout())?;
let mut matched = false;
args.walker_parallel().run(|| { for result in args.walker()? {
let args = Arc::clone(args); let subject = match subject_builder.build_from_result(result) {
let quiet_matched = quiet_matched.clone(); Some(subject) => subject,
let paths_searched = paths_searched.clone();
let match_line_count = match_line_count.clone();
let paths_matched = paths_matched.clone();
let bufwtr = Arc::clone(&bufwtr);
let mut buf = bufwtr.buffer();
let mut worker = args.worker();
Box::new(move |result| {
use ignore::WalkState::*;
if quiet_matched.has_match() {
return Quit;
}
let dent = match get_or_log_dir_entry(
result,
args.stdout_handle(),
args.files(),
args.no_messages(),
) {
None => return Continue,
Some(dent) => dent,
};
paths_searched.fetch_add(1, Ordering::SeqCst);
buf.clear();
{
// This block actually executes the search and prints the
// results into outbuf.
let mut printer = args.printer(&mut buf);
let count =
if dent.is_stdin() {
worker.run(&mut printer, Work::Stdin)
} else {
worker.run(&mut printer, Work::DirEntry(dent))
};
match_line_count.fetch_add(count as usize, Ordering::SeqCst);
if quiet_matched.set_match(count > 0) {
return Quit;
}
if args.stats() && count > 0 {
paths_matched.fetch_add(1, Ordering::SeqCst);
}
}
// BUG(burntsushi): We should handle this error instead of ignoring
// it. See: https://github.com/BurntSushi/ripgrep/issues/200
let _ = bufwtr.print(&buf);
Continue
})
});
if !args.paths().is_empty() && paths_searched.load(Ordering::SeqCst) == 0 {
if !args.no_messages() {
eprint_nothing_searched();
}
}
let match_line_count = match_line_count.load(Ordering::SeqCst) as u64;
let paths_searched = paths_searched.load(Ordering::SeqCst) as u64;
let paths_matched = paths_matched.load(Ordering::SeqCst) as u64;
if args.stats() {
print_stats(
match_line_count,
paths_searched,
paths_matched,
start_time.elapsed(),
);
}
Ok(match_line_count)
}
fn run_one_thread(args: &Arc<Args>) -> Result<u64> {
let start_time = Instant::now();
let stdout = args.stdout();
let mut stdout = stdout.lock();
let mut worker = args.worker();
let mut paths_searched: u64 = 0;
let mut match_line_count = 0;
let mut paths_matched: u64 = 0;
for result in args.walker() {
let dent = match get_or_log_dir_entry(
result,
args.stdout_handle(),
args.files(),
args.no_messages(),
) {
None => continue, None => continue,
Some(dent) => dent,
}; };
let mut printer = args.printer(&mut stdout); let search_result = match searcher.search(&subject) {
if match_line_count > 0 { Ok(search_result) => search_result,
if args.quiet() { Err(err) => {
// A broken pipe means graceful termination.
if err.kind() == io::ErrorKind::BrokenPipe {
break; break;
} }
if let Some(sep) = args.file_separator() { message!("{}: {}", subject.path().display(), err);
printer = printer.file_separator(sep); continue;
} }
}
paths_searched += 1;
let count =
if dent.is_stdin() {
worker.run(&mut printer, Work::Stdin)
} else {
worker.run(&mut printer, Work::DirEntry(dent))
}; };
match_line_count += count; matched = matched || search_result.has_match();
if args.stats() && count > 0 { if let Some(ref mut stats) = stats {
paths_matched += 1; *stats += search_result.stats().unwrap();
}
if matched && quit_after_match {
break;
} }
} }
if !args.paths().is_empty() && paths_searched == 0 { if let Some(ref stats) = stats {
if !args.no_messages() { let elapsed = Instant::now().duration_since(started_at);
eprint_nothing_searched(); // We don't care if we couldn't print this successfully.
let _ = searcher.printer().print_stats(elapsed, stats);
} }
} Ok(matched)
if args.stats() {
print_stats(
match_line_count,
paths_searched,
paths_matched,
start_time.elapsed(),
);
}
Ok(match_line_count)
} }
fn run_files_parallel(args: Arc<Args>) -> Result<u64> { /// The top-level entry point for multi-threaded search. The parallelism is
let print_args = Arc::clone(&args); /// itself achieved by the recursive directory traversal. All we need to do is
let (tx, rx) = mpsc::channel::<ignore::DirEntry>(); /// feed it a worker for performing a search on each file.
let print_thread = thread::spawn(move || { fn search_parallel(args: Args) -> Result<bool> {
let stdout = print_args.stdout(); use std::sync::atomic::AtomicBool;
let mut printer = print_args.printer(stdout.lock()); use std::sync::atomic::Ordering::SeqCst;
let mut file_count = 0;
for dent in rx.iter() { let quit_after_match = args.quit_after_match()?;
if !print_args.quiet() { let started_at = Instant::now();
printer.path(dent.path()); let subject_builder = Arc::new(args.subject_builder());
} let bufwtr = Arc::new(args.buffer_writer()?);
file_count += 1; let stats = Arc::new(args.stats()?.map(Mutex::new));
} let matched = Arc::new(AtomicBool::new(false));
file_count let mut searcher_err = None;
args.walker_parallel()?.run(|| {
let args = args.clone();
let bufwtr = Arc::clone(&bufwtr);
let stats = Arc::clone(&stats);
let matched = Arc::clone(&matched);
let subject_builder = Arc::clone(&subject_builder);
let mut searcher = match args.search_worker(bufwtr.buffer()) {
Ok(searcher) => searcher,
Err(err) => {
searcher_err = Some(err);
return Box::new(move |_| {
WalkState::Quit
}); });
args.walker_parallel().run(move || {
let args = Arc::clone(&args);
let tx = tx.clone();
Box::new(move |result| {
if let Some(dent) = get_or_log_dir_entry(
result,
args.stdout_handle(),
args.files(),
args.no_messages(),
) {
tx.send(dent).unwrap();
} }
ignore::WalkState::Continue };
Box::new(move |result| {
let subject = match subject_builder.build_from_result(result) {
Some(subject) => subject,
None => return WalkState::Continue,
};
searcher.printer().get_mut().clear();
let search_result = match searcher.search(&subject) {
Ok(search_result) => search_result,
Err(err) => {
message!("{}: {}", subject.path().display(), err);
return WalkState::Continue;
}
};
if search_result.has_match() {
matched.store(true, SeqCst);
}
if let Some(ref locked_stats) = *stats {
let mut stats = locked_stats.lock().unwrap();
*stats += search_result.stats().unwrap();
}
if let Err(err) = bufwtr.print(searcher.printer().get_mut()) {
// A broken pipe means graceful termination.
if err.kind() == io::ErrorKind::BrokenPipe {
return WalkState::Quit;
}
// Otherwise, we continue on our merry way.
message!("{}: {}", subject.path().display(), err);
}
if matched.load(SeqCst) && quit_after_match {
WalkState::Quit
} else {
WalkState::Continue
}
}) })
}); });
Ok(print_thread.join().unwrap()) if let Some(err) = searcher_err.take() {
return Err(err);
}
if let Some(ref locked_stats) = *stats {
let elapsed = Instant::now().duration_since(started_at);
let stats = locked_stats.lock().unwrap();
let mut searcher = args.search_worker(args.stdout())?;
// We don't care if we couldn't print this successfully.
let _ = searcher.printer().print_stats(elapsed, &stats);
}
Ok(matched.load(SeqCst))
} }
fn run_files_one_thread(args: &Arc<Args>) -> Result<u64> { /// The top-level entry point for listing files without searching them. This
let stdout = args.stdout(); /// recursively steps through the file list (current directory by default) and
let mut printer = args.printer(stdout.lock()); /// prints each path sequentially using a single thread.
let mut file_count = 0; fn files(args: Args) -> Result<bool> {
for result in args.walker() { let quit_after_match = args.quit_after_match()?;
let dent = match get_or_log_dir_entry( let subject_builder = args.subject_builder();
result, let mut matched = false;
args.stdout_handle(), let mut path_printer = args.path_printer(args.stdout())?;
args.files(), for result in args.walker()? {
args.no_messages(), let subject = match subject_builder.build_from_result(result) {
) { Some(subject) => subject,
None => continue, None => continue,
Removed (old implementation):

            Some(dent) => dent,
        };
        if !args.quiet() {
            printer.path(dent.path());
        }
        file_count += 1;
    }
    Ok(file_count)
}

fn run_types(args: &Arc<Args>) -> Result<u64> {
    let stdout = args.stdout();
    let mut printer = args.printer(stdout.lock());
    let mut ty_count = 0;
    for def in args.type_defs() {
        printer.type_def(def);
        ty_count += 1;
    }
    Ok(ty_count)
}

fn get_or_log_dir_entry(
    result: result::Result<ignore::DirEntry, ignore::Error>,
    stdout_handle: Option<&same_file::Handle>,
    files_only: bool,
    no_messages: bool,
) -> Option<ignore::DirEntry> {
    match result {
        Err(err) => {
            if !no_messages {
                eprintln!("{}", err);
            }
            None
        }
        Ok(dent) => {
            if let Some(err) = dent.error() {
                if !no_messages {
                    eprintln!("{}", err);
                }
            }
            if dent.file_type().is_none() {
                return Some(dent); // entry is stdin
            }
            // A depth of 0 means the user gave the path explicitly, so we
            // should always try to search it.
            if dent.depth() == 0 && !ignore_entry_is_dir(&dent) {
                return Some(dent);
            } else if !ignore_entry_is_file(&dent) {
                return None;
            }
            // If we are redirecting stdout to a file, then don't search that
            // file.
            if !files_only && is_stdout_file(&dent, stdout_handle, no_messages) {
                return None;
            }
            Some(dent)
        }
    }
}

/// Returns true if and only if the given `ignore::DirEntry` points to a
/// directory.
///
/// This works around a bug in Rust's standard library:
/// https://github.com/rust-lang/rust/issues/46484
#[cfg(windows)]
fn ignore_entry_is_dir(dent: &ignore::DirEntry) -> bool {
    use std::os::windows::fs::MetadataExt;
    use winapi::um::winnt::FILE_ATTRIBUTE_DIRECTORY;

    dent.metadata().map(|md| {
        md.file_attributes() & FILE_ATTRIBUTE_DIRECTORY != 0
    }).unwrap_or(false)
}

/// Returns true if and only if the given `ignore::DirEntry` points to a
/// directory.
#[cfg(not(windows))]
fn ignore_entry_is_dir(dent: &ignore::DirEntry) -> bool {
    dent.file_type().map_or(false, |ft| ft.is_dir())
}

/// Returns true if and only if the given `ignore::DirEntry` points to a
/// file.
///
/// This works around a bug in Rust's standard library:
/// https://github.com/rust-lang/rust/issues/46484
#[cfg(windows)]
fn ignore_entry_is_file(dent: &ignore::DirEntry) -> bool {
    !ignore_entry_is_dir(dent)
}

/// Returns true if and only if the given `ignore::DirEntry` points to a
/// file.
#[cfg(not(windows))]
fn ignore_entry_is_file(dent: &ignore::DirEntry) -> bool {
    dent.file_type().map_or(false, |ft| ft.is_file())
}

fn is_stdout_file(
    dent: &ignore::DirEntry,
    stdout_handle: Option<&same_file::Handle>,
    no_messages: bool,
) -> bool {
    let stdout_handle = match stdout_handle {
        None => return false,
        Some(stdout_handle) => stdout_handle,
    };
    // If we know for sure that these two things aren't equal, then avoid
    // the costly extra stat call to determine equality.
    if !maybe_dent_eq_handle(dent, stdout_handle) {
        return false;
    }
    match same_file::Handle::from_path(dent.path()) {
        Ok(h) => stdout_handle == &h,
        Err(err) => {
            if !no_messages {
                eprintln!("{}: {}", dent.path().display(), err);
            }
            false
        }
    }
}

#[cfg(unix)]
fn maybe_dent_eq_handle(
    dent: &ignore::DirEntry,
    handle: &same_file::Handle,
) -> bool {
    dent.ino() == Some(handle.ino())
}

#[cfg(not(unix))]
fn maybe_dent_eq_handle(_: &ignore::DirEntry, _: &same_file::Handle) -> bool {
    true
}

fn eprint_nothing_searched() {
    eprintln!("No files were searched, which means ripgrep probably \
               applied a filter you didn't expect. \
               Try running again with --debug.");
}

fn print_stats(
    match_count: u64,
    paths_searched: u64,
    paths_matched: u64,
    time_elapsed: Duration,
) {
    let time_elapsed =
        time_elapsed.as_secs() as f64
        + (time_elapsed.subsec_nanos() as f64 * 1e-9);
    println!("\n{} matched lines\n\
              {} files contained matches\n\
              {} files searched\n\
              {:.3} seconds", match_count, paths_matched,
             paths_searched, time_elapsed);
}

// The Rust standard library suppresses the default SIGPIPE behavior, so that
// writing to a closed pipe doesn't kill the process. The goal is to instead
// handle errors through the normal result mechanism. Ripgrep needs some
// refactoring before it will be able to do that, however, so we re-enable the
// standard SIGPIPE behavior as a workaround. See
// https://github.com/BurntSushi/ripgrep/issues/200.
#[cfg(unix)]
fn reset_sigpipe() {
    unsafe {
        libc::signal(libc::SIGPIPE, libc::SIG_DFL);
    }
}

#[cfg(not(unix))]
fn reset_sigpipe() {
    // no-op
}

Added (new implementation):

        };
        matched = true;
        if quit_after_match {
            break;
        }
        if let Err(err) = path_printer.write_path(subject.path()) {
            // A broken pipe means graceful termination.
            if err.kind() == io::ErrorKind::BrokenPipe {
                break;
            }
            // Otherwise, we have some other error that's preventing us from
            // writing to stdout, so we should bubble it up.
            return Err(err.into());
        }
    }
    Ok(matched)
}

/// The top-level entry point for listing files without searching them. This
/// recursively steps through the file list (current directory by default) and
/// prints each path sequentially using multiple threads.
fn files_parallel(args: Args) -> Result<bool> {
    use std::sync::atomic::AtomicBool;
    use std::sync::atomic::Ordering::SeqCst;
    use std::sync::mpsc;
    use std::thread;

    let quit_after_match = args.quit_after_match()?;
    let subject_builder = Arc::new(args.subject_builder());
    let mut path_printer = args.path_printer(args.stdout())?;
    let matched = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel::<Subject>();

    let print_thread = thread::spawn(move || -> io::Result<()> {
        for subject in rx.iter() {
            path_printer.write_path(subject.path())?;
        }
        Ok(())
    });
    args.walker_parallel()?.run(|| {
        let subject_builder = Arc::clone(&subject_builder);
        let matched = Arc::clone(&matched);
        let tx = tx.clone();

        Box::new(move |result| {
            let subject = match subject_builder.build_from_result(result) {
                Some(subject) => subject,
                None => return WalkState::Continue,
            };
            matched.store(true, SeqCst);
            if quit_after_match {
                WalkState::Quit
            } else {
                match tx.send(subject) {
                    Ok(_) => WalkState::Continue,
                    Err(_) => WalkState::Quit,
                }
            }
        })
    });
    drop(tx);
    if let Err(err) = print_thread.join().unwrap() {
        // A broken pipe means graceful termination, so fall through.
        // Otherwise, something bad happened while writing to stdout, so bubble
        // it up.
        if err.kind() != io::ErrorKind::BrokenPipe {
            return Err(err.into());
        }
    }
    Ok(matched.load(SeqCst))
}

/// The top-level entry point for --type-list.
fn types(args: Args) -> Result<bool> {
    let mut count = 0;
    let mut stdout = args.stdout();
    for def in args.type_defs()? {
        count += 1;
        stdout.write_all(def.name().as_bytes())?;
        stdout.write_all(b": ")?;
        let mut first = true;
        for glob in def.globs() {
            if !first {
                stdout.write_all(b", ")?;
            }
            stdout.write_all(glob.as_bytes())?;
            first = false;
        }
        stdout.write_all(b"\n")?;
    }
    Ok(count > 0)
}
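The printer-thread structure in files_parallel above is worth calling out:
walker threads only discover paths and push them over a channel, while a
single dedicated thread owns stdout, so no locking is needed around writes
and a broken pipe surfaces in exactly one place. Below is a minimal,
self-contained sketch of the same pattern, with the walker, Subject and
WalkState machinery replaced by plain standard-library stand-ins:

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        // Workers send discovered paths to one printer thread, which is the
        // only thing that writes to stdout.
        let (tx, rx) = mpsc::channel::<String>();

        let print_thread = thread::spawn(move || {
            for path in rx.iter() {
                println!("{}", path);
            }
        });

        // Stand-ins for the parallel directory walker's worker threads.
        let workers: Vec<_> = (0..4).map(|i| {
            let tx = tx.clone();
            thread::spawn(move || {
                tx.send(format!("dir{}/file.txt", i)).unwrap();
            })
        }).collect();

        for w in workers {
            w.join().unwrap();
        }
        // Dropping the last sender terminates rx.iter(), letting the printer
        // exit; this is why files_parallel calls drop(tx) before joining.
        drop(tx);
        print_thread.join().unwrap();
    }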

src/messages.rs Normal file

@@ -0,0 +1,50 @@
use std::sync::atomic::{ATOMIC_BOOL_INIT, AtomicBool, Ordering};
static MESSAGES: AtomicBool = ATOMIC_BOOL_INIT;
static IGNORE_MESSAGES: AtomicBool = ATOMIC_BOOL_INIT;
#[macro_export]
macro_rules! message {
($($tt:tt)*) => {
if ::messages::messages() {
eprintln!($($tt)*);
}
}
}
#[macro_export]
macro_rules! ignore_message {
($($tt:tt)*) => {
if ::messages::messages() && ::messages::ignore_messages() {
eprintln!($($tt)*);
}
}
}
/// Returns true if and only if messages should be shown.
pub fn messages() -> bool {
MESSAGES.load(Ordering::SeqCst)
}
/// Set whether messages should be shown or not.
///
/// By default, they are not shown.
pub fn set_messages(yes: bool) {
MESSAGES.store(yes, Ordering::SeqCst)
}
/// Returns true if and only if "ignore" related messages should be shown.
pub fn ignore_messages() -> bool {
IGNORE_MESSAGES.load(Ordering::SeqCst)
}
/// Set whether "ignore" related messages should be shown or not.
///
/// By default, they are not shown.
///
/// Note that this is overridden if `messages` is disabled. Namely, if
/// `messages` is disabled, then "ignore" messages are never shown, regardless
/// of this setting.
pub fn set_ignore_messages(yes: bool) {
IGNORE_MESSAGES.store(yes, Ordering::SeqCst)
}
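A self-contained sketch of how these pieces compose. Two liberties are taken:
the ::messages:: paths in the macro are inlined, and ATOMIC_BOOL_INIT is
replaced by AtomicBool::new, which later became the idiomatic initializer:

    use std::sync::atomic::{AtomicBool, Ordering};

    static MESSAGES: AtomicBool = AtomicBool::new(false);

    macro_rules! message {
        ($($tt:tt)*) => {
            if MESSAGES.load(Ordering::SeqCst) {
                eprintln!($($tt)*);
            }
        }
    }

    fn main() {
        message!("suppressed: messages are off by default");
        MESSAGES.store(true, Ordering::SeqCst); // what set_messages(true) does
        message!("printed to stderr: {}", "messages enabled");
    }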

src/path_printer.rs Normal file

@@ -0,0 +1,101 @@
use std::io;
use std::path::Path;
use grep::printer::{ColorSpecs, PrinterPath};
use termcolor::WriteColor;
/// A configuration for describing how paths should be written.
#[derive(Clone, Debug)]
struct Config {
colors: ColorSpecs,
separator: Option<u8>,
terminator: u8,
}
impl Default for Config {
fn default() -> Config {
Config {
colors: ColorSpecs::default(),
separator: None,
terminator: b'\n',
}
}
}
/// A builder for constructing a path printer.
#[derive(Clone, Debug)]
pub struct PathPrinterBuilder {
config: Config,
}
impl PathPrinterBuilder {
/// Return a new path printer builder with a default configuration.
pub fn new() -> PathPrinterBuilder {
PathPrinterBuilder { config: Config::default() }
}
/// Create a new path printer with the current configuration that writes
/// paths to the given writer.
pub fn build<W: WriteColor>(&self, wtr: W) -> PathPrinter<W> {
PathPrinter {
config: self.config.clone(),
wtr: wtr,
}
}
/// Set the color specification for this printer.
///
/// Currently, only the `path` component of the given specification is
/// used.
pub fn color_specs(
&mut self,
specs: ColorSpecs,
) -> &mut PathPrinterBuilder {
self.config.colors = specs;
self
}
/// A path separator.
///
/// When provided, the path's default separator will be replaced with
/// the given separator.
///
/// This is not set by default, in which case the system's default path
/// separator is used.
pub fn separator(&mut self, sep: Option<u8>) -> &mut PathPrinterBuilder {
self.config.separator = sep;
self
}
/// A path terminator.
///
/// When printing a path, it will be terminated by the given byte.
///
/// This is set to `\n` by default.
pub fn terminator(&mut self, terminator: u8) -> &mut PathPrinterBuilder {
self.config.terminator = terminator;
self
}
}
/// A printer for emitting paths to a writer, with optional color support.
#[derive(Debug)]
pub struct PathPrinter<W> {
config: Config,
wtr: W,
}
impl<W: WriteColor> PathPrinter<W> {
/// Write the given path to the underlying writer.
pub fn write_path(&mut self, path: &Path) -> io::Result<()> {
let ppath = PrinterPath::with_separator(path, self.config.separator);
if !self.wtr.supports_color() {
self.wtr.write_all(ppath.as_bytes())?;
} else {
self.wtr.set_color(self.config.colors.path())?;
self.wtr.write_all(ppath.as_bytes())?;
self.wtr.reset()?;
}
self.wtr.write_all(&[self.config.terminator])
}
}
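Inside ripgrep, using this printer might look like the sketch below. The
StandardStream setup is ordinary termcolor usage; the particular
configuration chosen here is illustrative, not code from this diff:

    use std::io;
    use std::path::Path;

    use termcolor::{ColorChoice, StandardStream};

    fn print_one_path() -> io::Result<()> {
        let stdout = StandardStream::stdout(ColorChoice::Auto);
        let mut printer = PathPrinterBuilder::new()
            .separator(Some(b'/')) // normalize separators in output
            .terminator(b'\n')
            .build(stdout.lock());
        printer.write_path(Path::new("src/main.rs"))
    }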

src/pathutil.rs

@@ -1,42 +0,0 @@
/*!
The pathutil module provides platform specific operations on paths that are
typically faster than the same operations as provided in `std::path`. In
particular, we really want to avoid the costly operation of parsing the path
into its constituent components. We give up on Windows, but on Unix, we deal
with the raw bytes directly.
On large repositories (like chromium), this can have a ~25% performance
improvement on just listing the files to search (!).
*/
use std::path::Path;
/// Strip `prefix` from the `path` and return the remainder.
///
/// If `path` doesn't have a prefix `prefix`, then return `None`.
#[cfg(unix)]
pub fn strip_prefix<'a, P: AsRef<Path> + ?Sized>(
prefix: &'a P,
path: &'a Path,
) -> Option<&'a Path> {
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
let prefix = prefix.as_ref().as_os_str().as_bytes();
let path = path.as_os_str().as_bytes();
if prefix.len() > path.len() || prefix != &path[0..prefix.len()] {
None
} else {
Some(Path::new(OsStr::from_bytes(&path[prefix.len()..])))
}
}
/// Strip `prefix` from the `path` and return the remainder.
///
/// If `path` doesn't have a prefix `prefix`, then return `None`.
#[cfg(not(unix))]
pub fn strip_prefix<'a, P: AsRef<Path> + ?Sized>(
prefix: &'a P,
path: &'a Path,
) -> Option<&'a Path> {
path.strip_prefix(prefix).ok()
}
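The contract of strip_prefix, as a quick sketch; both the Unix byte-level
fast path and the portable fallback agree on these cases:

    use std::path::Path;

    fn demo() {
        // Used by the old printer to drop a leading "./" before display.
        assert_eq!(
            strip_prefix("./", Path::new("./src/main.rs")),
            Some(Path::new("src/main.rs")),
        );
        // A non-matching prefix yields None.
        assert_eq!(strip_prefix("src/", Path::new("lib/foo.rs")), None);
    }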

src/preprocessor.rs Normal file

@@ -0,0 +1,93 @@
use std::fs::File;
use std::io::{self, Read};
use std::path::{Path, PathBuf};
use std::process::{self, Stdio};
/// PreprocessorReader provides an `io::Read` impl that reads the output of
/// a child preprocessor process.
#[derive(Debug)]
pub struct PreprocessorReader {
cmd: PathBuf,
path: PathBuf,
child: process::Child,
done: bool,
}
impl PreprocessorReader {
/// Returns a handle to the stdout of the spawned preprocessor process for
/// `path`, which can be directly searched in the worker. When the returned
/// value is exhausted, the underlying process is reaped. If the underlying
/// process fails, then its stderr is read and converted into a normal
/// io::Error.
///
/// If there is any error in spawning the preprocessor command, then
/// return the corresponding error.
pub fn from_cmd_path(
cmd: PathBuf,
path: &Path,
) -> io::Result<PreprocessorReader> {
let child = process::Command::new(&cmd)
.arg(path)
.stdin(Stdio::from(File::open(path)?))
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
.map_err(|err| {
io::Error::new(
io::ErrorKind::Other,
format!(
"error running preprocessor command '{}': {}",
cmd.display(),
err,
),
)
})?;
Ok(PreprocessorReader {
cmd: cmd,
path: path.to_path_buf(),
child: child,
done: false,
})
}
fn read_error(&mut self) -> io::Result<io::Error> {
let mut errbytes = vec![];
self.child.stderr.as_mut().unwrap().read_to_end(&mut errbytes)?;
let errstr = String::from_utf8_lossy(&errbytes);
let errstr = errstr.trim();
Ok(if errstr.is_empty() {
let msg = format!(
"preprocessor command failed: '{} {}'",
self.cmd.display(),
self.path.display(),
);
io::Error::new(io::ErrorKind::Other, msg)
} else {
let msg = format!(
"preprocessor command failed: '{} {}': {}",
self.cmd.display(),
self.path.display(),
errstr,
);
io::Error::new(io::ErrorKind::Other, msg)
})
}
}
impl io::Read for PreprocessorReader {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
if self.done {
return Ok(0);
}
let nread = self.child.stdout.as_mut().unwrap().read(buf)?;
if nread == 0 {
self.done = true;
// Reap the child now that we're done reading.
// If the command failed, report stderr as an error.
if !self.child.wait()?.success() {
return Err(self.read_error()?);
}
}
Ok(nread)
}
}
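Usage is then the same as any other io::Read: construct the reader with the
preprocessor command and the file path, and read it to the end. A sketch,
where the /usr/local/bin/rg-pre script is hypothetical:

    use std::io::{self, Read};
    use std::path::{Path, PathBuf};

    fn preprocessed_contents(path: &Path) -> io::Result<Vec<u8>> {
        // Hypothetical --pre command; it receives the file path as its
        // first argument and the file's contents on stdin.
        let cmd = PathBuf::from("/usr/local/bin/rg-pre");
        let mut rdr = PreprocessorReader::from_cmd_path(cmd, path)?;
        let mut contents = Vec::new();
        rdr.read_to_end(&mut contents)?;
        Ok(contents)
    }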

src/printer.rs

@@ -1,932 +0,0 @@
use std::error;
use std::fmt;
use std::path::Path;
use std::str::FromStr;
use regex::bytes::{Captures, Match, Regex, Replacer};
use termcolor::{Color, ColorSpec, ParseColorError, WriteColor};
use pathutil::strip_prefix;
use ignore::types::FileTypeDef;
/// Track the start and end of replacements to allow coloring them on output.
#[derive(Debug)]
struct Offset {
start: usize,
end: usize,
}
impl Offset {
fn new(start: usize, end: usize) -> Offset {
Offset { start: start, end: end }
}
}
impl<'m, 'r> From<&'m Match<'r>> for Offset {
fn from(m: &'m Match<'r>) -> Self {
Offset{ start: m.start(), end: m.end() }
}
}
/// `CountingReplacer` implements the Replacer interface for Regex,
/// and counts how often replacement is being performed.
struct CountingReplacer<'r> {
replace: &'r [u8],
count: &'r mut usize,
offsets: &'r mut Vec<Offset>,
}
impl<'r> CountingReplacer<'r> {
fn new(
replace: &'r [u8],
count: &'r mut usize,
offsets: &'r mut Vec<Offset>,
) -> CountingReplacer<'r> {
CountingReplacer { replace: replace, count: count, offsets: offsets, }
}
}
impl<'r> Replacer for CountingReplacer<'r> {
fn replace_append(&mut self, caps: &Captures, dst: &mut Vec<u8>) {
*self.count += 1;
let start = dst.len();
caps.expand(self.replace, dst);
let end = dst.len();
if start != end {
self.offsets.push(Offset::new(start, end));
}
}
}
/// Printer encapsulates all output logic for searching.
///
/// Note that we currently ignore all write errors. It's probably worthwhile
/// to fix this, but printers are only ever used for writes to stdout or
/// writes to memory, neither of which commonly fail.
pub struct Printer<W> {
/// The underlying writer.
wtr: W,
/// Whether anything has been printed to wtr yet.
has_printed: bool,
/// Whether to show column numbers for the first match or not.
column: bool,
/// The string to use to separate non-contiguous runs of context lines.
context_separator: Vec<u8>,
/// The end-of-line terminator used by the printer. In general, eols are
/// printed via the match directly, but occasionally we need to insert them
/// ourselves (for example, to print a context separator).
eol: u8,
/// A file separator to show before any matches are printed.
file_separator: Option<Vec<u8>>,
/// Whether to show file name as a heading or not.
///
/// N.B. If with_filename is false, then this setting has no effect.
heading: bool,
/// Whether to show every match on its own line.
line_per_match: bool,
/// Whether to print NUL bytes after a file path instead of new lines
/// or `:`.
null: bool,
/// Print only the matched (non-empty) parts of a matching line
only_matching: bool,
/// A string to use as a replacement of each match in a matching line.
replace: Option<Vec<u8>>,
/// Whether to prefix each match with the corresponding file name.
with_filename: bool,
/// The color specifications.
colors: ColorSpecs,
/// The separator to use for file paths. If `None`, this is ignored.
path_separator: Option<u8>,
/// Restrict lines to this many columns.
max_columns: Option<usize>,
/// Width of line number displayed. If the number of digits in the
/// line number is less than this, it is left padded with
/// spaces.
line_number_width: Option<usize>
}
impl<W: WriteColor> Printer<W> {
/// Create a new printer that writes to wtr with the given color settings.
pub fn new(wtr: W) -> Printer<W> {
Printer {
wtr: wtr,
has_printed: false,
column: false,
context_separator: "--".to_string().into_bytes(),
eol: b'\n',
file_separator: None,
heading: false,
line_per_match: false,
null: false,
only_matching: false,
replace: None,
with_filename: false,
colors: ColorSpecs::default(),
path_separator: None,
max_columns: None,
line_number_width: None
}
}
/// Set the color specifications.
pub fn colors(mut self, colors: ColorSpecs) -> Printer<W> {
self.colors = colors;
self
}
/// When set, column numbers will be printed for the first match on each
/// line.
pub fn column(mut self, yes: bool) -> Printer<W> {
self.column = yes;
self
}
/// Set the context separator. The default is `--`.
pub fn context_separator(mut self, sep: Vec<u8>) -> Printer<W> {
self.context_separator = sep;
self
}
/// Set the end-of-line terminator. The default is `\n`.
pub fn eol(mut self, eol: u8) -> Printer<W> {
self.eol = eol;
self
}
/// If set, the separator is printed before any matches. By default, no
/// separator is printed.
pub fn file_separator(mut self, sep: Vec<u8>) -> Printer<W> {
self.file_separator = Some(sep);
self
}
/// Whether to show file name as a heading or not.
///
/// N.B. If with_filename is false, then this setting has no effect.
pub fn heading(mut self, yes: bool) -> Printer<W> {
self.heading = yes;
self
}
/// Whether to show every match on its own line.
pub fn line_per_match(mut self, yes: bool) -> Printer<W> {
self.line_per_match = yes;
self
}
/// Whether to cause NUL bytes to follow file paths instead of other
/// visual separators (like `:`, `-` and `\n`).
pub fn null(mut self, yes: bool) -> Printer<W> {
self.null = yes;
self
}
/// Print only the matched (non-empty) parts of a matching line
pub fn only_matching(mut self, yes: bool) -> Printer<W> {
self.only_matching = yes;
self
}
/// A separator to use when printing file paths. When `None`, use the
/// default separator for the current platform (`/` on Unix, `\` on Windows).
pub fn path_separator(mut self, sep: Option<u8>) -> Printer<W> {
self.path_separator = sep;
self
}
/// Replace every match in each matching line with the replacement string
/// given.
pub fn replace(mut self, replacement: Vec<u8>) -> Printer<W> {
self.replace = Some(replacement);
self
}
/// When set, each match is prefixed with the file name that it came from.
pub fn with_filename(mut self, yes: bool) -> Printer<W> {
self.with_filename = yes;
self
}
/// Configure the max. number of columns used for printing matching lines.
pub fn max_columns(mut self, max_columns: Option<usize>) -> Printer<W> {
self.max_columns = max_columns;
self
}
/// Configure the width of the displayed line number
pub fn line_number_width(mut self, line_number_width: Option<usize>) -> Printer<W> {
self.line_number_width = line_number_width;
self
}
/// Returns true if and only if something has been printed.
pub fn has_printed(&self) -> bool {
self.has_printed
}
/// Flushes the underlying writer and returns it.
#[allow(dead_code)]
pub fn into_inner(mut self) -> W {
let _ = self.wtr.flush();
self.wtr
}
/// Prints a type definition.
pub fn type_def(&mut self, def: &FileTypeDef) {
self.write(def.name().as_bytes());
self.write(b": ");
let mut first = true;
for glob in def.globs() {
if !first {
self.write(b", ");
}
self.write(glob.as_bytes());
first = false;
}
self.write_eol();
}
/// Prints the given path.
pub fn path<P: AsRef<Path>>(&mut self, path: P) {
let path = strip_prefix("./", path.as_ref()).unwrap_or(path.as_ref());
self.write_path(path);
self.write_path_eol();
}
/// Prints the given path and a count of the number of matches found.
pub fn path_count<P: AsRef<Path>>(&mut self, path: P, count: u64) {
if self.with_filename {
self.write_path(path);
self.write_path_sep(b':');
}
self.write(count.to_string().as_bytes());
self.write_eol();
}
/// Prints the context separator.
pub fn context_separate(&mut self) {
if self.context_separator.is_empty() {
return;
}
let _ = self.wtr.write_all(&self.context_separator);
self.write_eol();
}
pub fn matched<P: AsRef<Path>>(
&mut self,
re: &Regex,
path: P,
buf: &[u8],
start: usize,
end: usize,
line_number: Option<u64>,
byte_offset: Option<u64>
) {
if !self.line_per_match && !self.only_matching {
let mat = re
.find(&buf[start..end])
.map(|m| (m.start(), m.end()))
.unwrap_or((0, 0));
return self.write_match(
re, path, buf, start, end, line_number,
byte_offset, mat.0, mat.1);
}
for m in re.find_iter(&buf[start..end]) {
self.write_match(
re, path.as_ref(), buf, start, end, line_number,
byte_offset, m.start(), m.end());
}
}
fn write_match<P: AsRef<Path>>(
&mut self,
re: &Regex,
path: P,
buf: &[u8],
start: usize,
end: usize,
line_number: Option<u64>,
byte_offset: Option<u64>,
match_start: usize,
match_end: usize,
) {
if self.heading && self.with_filename && !self.has_printed {
self.write_file_sep();
self.write_path(path);
self.write_path_eol();
} else if !self.heading && self.with_filename {
self.write_path(path);
self.write_path_sep(b':');
}
if let Some(line_number) = line_number {
self.line_number(line_number, b':');
}
if self.column {
self.column_number(match_start as u64 + 1, b':');
}
if let Some(byte_offset) = byte_offset {
if self.only_matching {
self.write_byte_offset(
byte_offset + ((start + match_start) as u64), b':');
} else {
self.write_byte_offset(byte_offset + (start as u64), b':');
}
}
if self.replace.is_some() {
let mut count = 0;
let mut offsets = Vec::new();
let line = {
let replacer = CountingReplacer::new(
self.replace.as_ref().unwrap(), &mut count, &mut offsets);
if self.only_matching {
re.replace_all(
&buf[start + match_start..start + match_end], replacer)
} else {
re.replace_all(&buf[start..end], replacer)
}
};
if self.max_columns.map_or(false, |m| line.len() > m) {
let msg = format!(
"[Omitted long line with {} replacements]", count);
self.write_colored(msg.as_bytes(), |colors| colors.matched());
self.write_eol();
return;
}
self.write_matched_line(offsets, &*line, false);
} else {
let buf = if self.only_matching {
&buf[start + match_start..start + match_end]
} else {
&buf[start..end]
};
if self.max_columns.map_or(false, |m| buf.len() > m) {
let count = re.find_iter(buf).count();
let msg = format!("[Omitted long line with {} matches]", count);
self.write_colored(msg.as_bytes(), |colors| colors.matched());
self.write_eol();
return;
}
let only_match = self.only_matching;
self.write_matched_line(
re.find_iter(buf).map(|x| Offset::from(&x)), buf, only_match);
}
}
fn write_matched_line<I>(&mut self, offsets: I, buf: &[u8], only_match: bool)
where I: IntoIterator<Item=Offset>,
{
if !self.wtr.supports_color() || self.colors.matched().is_none() {
self.write(buf);
} else if only_match {
self.write_colored(buf, |colors| colors.matched());
} else {
let mut last_written = 0;
for o in offsets {
self.write(&buf[last_written..o.start]);
// This conditional checks if the match is both empty *and*
// past the end of the line. In this case, we never want to
// emit an additional color escape.
if o.start != o.end || o.end != buf.len() {
self.write_colored(
&buf[o.start..o.end], |colors| colors.matched());
}
last_written = o.end;
}
self.write(&buf[last_written..]);
}
if buf.last() != Some(&self.eol) {
self.write_eol();
}
}
pub fn context<P: AsRef<Path>>(
&mut self,
path: P,
buf: &[u8],
start: usize,
end: usize,
line_number: Option<u64>,
byte_offset: Option<u64>,
) {
if self.heading && self.with_filename && !self.has_printed {
self.write_file_sep();
self.write_path(path);
self.write_path_eol();
} else if !self.heading && self.with_filename {
self.write_path(path);
self.write_path_sep(b'-');
}
if let Some(line_number) = line_number {
self.line_number(line_number, b'-');
}
if let Some(byte_offset) = byte_offset {
self.write_byte_offset(byte_offset + (start as u64), b'-');
}
if self.max_columns.map_or(false, |m| end - start > m) {
self.write(b"[Omitted long context line]");
self.write_eol();
return;
}
self.write(&buf[start..end]);
if buf[start..end].last() != Some(&self.eol) {
self.write_eol();
}
}
fn separator(&mut self, sep: &[u8]) {
self.write(sep);
}
fn write_path_sep(&mut self, sep: u8) {
if self.null {
self.write(b"\x00");
} else {
self.separator(&[sep]);
}
}
fn write_path_eol(&mut self) {
if self.null {
self.write(b"\x00");
} else {
self.write_eol();
}
}
#[cfg(unix)]
fn write_path<P: AsRef<Path>>(&mut self, path: P) {
use std::os::unix::ffi::OsStrExt;
let path = path.as_ref().as_os_str().as_bytes();
self.write_path_replace_separator(path);
}
#[cfg(not(unix))]
fn write_path<P: AsRef<Path>>(&mut self, path: P) {
let path = path.as_ref().to_string_lossy();
self.write_path_replace_separator(path.as_bytes());
}
fn write_path_replace_separator(&mut self, path: &[u8]) {
match self.path_separator {
None => self.write_colored(path, |colors| colors.path()),
Some(sep) => {
let transformed_path: Vec<_> = path.iter().map(|&b| {
if b == b'/' || (cfg!(windows) && b == b'\\') {
sep
} else {
b
}
}).collect();
self.write_colored(&transformed_path, |colors| colors.path());
}
}
}
fn line_number(&mut self, n: u64, sep: u8) {
let mut line_number = n.to_string();
if let Some(width) = self.line_number_width {
line_number = format!("{:>width$}", line_number, width = width);
}
self.write_colored(line_number.as_bytes(), |colors| colors.line());
self.separator(&[sep]);
}
fn column_number(&mut self, n: u64, sep: u8) {
self.write_colored(n.to_string().as_bytes(), |colors| colors.column());
self.separator(&[sep]);
}
fn write_byte_offset(&mut self, o: u64, sep: u8) {
self.write_colored(o.to_string().as_bytes(), |colors| colors.column());
self.separator(&[sep]);
}
fn write(&mut self, buf: &[u8]) {
self.has_printed = true;
let _ = self.wtr.write_all(buf);
}
fn write_eol(&mut self) {
let eol = self.eol;
self.write(&[eol]);
}
fn write_colored<F>(&mut self, buf: &[u8], get_color: F)
where F: Fn(&ColorSpecs) -> &ColorSpec
{
let _ = self.wtr.set_color(get_color(&self.colors));
self.write(buf);
let _ = self.wtr.reset();
}
fn write_file_sep(&mut self) {
if let Some(ref sep) = self.file_separator {
self.has_printed = true;
let _ = self.wtr.write_all(sep);
let _ = self.wtr.write_all(b"\n");
}
}
}
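Since all of the old printer's settings are builder-style methods that
consume and return self, configuring and draining one looked roughly like
this sketch, which uses termcolor::NoColor to capture output in memory in
the same way as the tests near the end of this diff:

    use termcolor::NoColor;

    fn render_one_path() -> String {
        let mut printer = Printer::new(NoColor::new(vec![]))
            .with_filename(true)
            .heading(true);
        printer.path("src/main.rs");
        String::from_utf8(printer.into_inner().into_inner()).unwrap()
    }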
/// An error that can occur when parsing color specifications.
#[derive(Clone, Debug, Eq, PartialEq)]
pub enum Error {
/// This occurs when an unrecognized output type is used.
UnrecognizedOutType(String),
/// This occurs when an unrecognized spec type is used.
UnrecognizedSpecType(String),
/// This occurs when an unrecognized color name is used.
UnrecognizedColor(String, String),
/// This occurs when an unrecognized style attribute is used.
UnrecognizedStyle(String),
/// This occurs when the format of a color specification is invalid.
InvalidFormat(String),
}
impl error::Error for Error {
fn description(&self) -> &str {
match *self {
Error::UnrecognizedOutType(_) => "unrecognized output type",
Error::UnrecognizedSpecType(_) => "unrecognized spec type",
Error::UnrecognizedColor(_, _) => "unrecognized color name",
Error::UnrecognizedStyle(_) => "unrecognized style attribute",
Error::InvalidFormat(_) => "invalid color spec",
}
}
fn cause(&self) -> Option<&error::Error> {
None
}
}
impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match *self {
Error::UnrecognizedOutType(ref name) => {
write!(f, "Unrecognized output type '{}'. Choose from: \
path, line, column, match.", name)
}
Error::UnrecognizedSpecType(ref name) => {
write!(f, "Unrecognized spec type '{}'. Choose from: \
fg, bg, style, none.", name)
}
Error::UnrecognizedColor(_, ref msg) => {
write!(f, "{}", msg)
}
Error::UnrecognizedStyle(ref name) => {
write!(f, "Unrecognized style attribute '{}'. Choose from: \
nobold, bold, nointense, intense, nounderline, \
underline.", name)
}
Error::InvalidFormat(ref original) => {
write!(
f,
"Invalid color spec format: '{}'. Valid format \
is '(path|line|column|match):(fg|bg|style):(value)'.",
original)
}
}
}
}
impl From<ParseColorError> for Error {
fn from(err: ParseColorError) -> Error {
Error::UnrecognizedColor(err.invalid().to_string(), err.to_string())
}
}
/// A merged set of color specifications.
#[derive(Clone, Debug, Default, Eq, PartialEq)]
pub struct ColorSpecs {
path: ColorSpec,
line: ColorSpec,
column: ColorSpec,
matched: ColorSpec,
}
/// A single color specification provided by the user.
///
/// A `ColorSpecs` can be built by merging a sequence of `Spec`s.
///
/// ## Example
///
/// The only way to build a `Spec` is to parse it from a string. Once multiple
/// `Spec`s have been constructed, they can be merged into a single
/// `ColorSpecs` value.
///
/// ```rust
/// use termcolor::Color;
/// use printer::{ColorSpecs, Spec};
///
/// let spec1: Spec = "path:fg:blue".parse().unwrap();
/// let spec2: Spec = "match:bg:green".parse().unwrap();
/// let specs = ColorSpecs::new(&[spec1, spec2]);
///
/// assert_eq!(specs.path().fg(), Some(Color::Blue));
/// assert_eq!(specs.matched().bg(), Some(Color::Green));
/// ```
///
/// ## Format
///
/// The format of a `Spec` is a triple: `{type}:{attribute}:{value}`. Each
/// component is defined as follows:
///
/// * `{type}` can be one of `path`, `line`, `column` or `match`.
/// * `{attribute}` can be one of `fg`, `bg` or `style`. `{attribute}` may also
/// be the special value `none`, in which case, `{value}` can be omitted.
/// * `{value}` is either a color name (for `fg`/`bg`) or a style instruction.
///
/// `{type}` controls which part of the output should be styled and is
/// application dependent.
///
/// When `{attribute}` is `none`, then this should cause any existing color
/// settings to be cleared.
///
/// `{value}` should be a color when `{attribute}` is `fg` or `bg`, or it
/// should be a style instruction when `{attribute}` is `style`. When
/// `{attribute}` is `none`, `{value}` must be omitted.
///
/// Valid colors are `black`, `blue`, `green`, `red`, `cyan`, `magenta`,
/// `yellow`, `white`.
///
/// Valid style instructions are `nobold`, `bold`, `intense`, `nointense`,
/// `underline`, `nounderline`.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Spec {
ty: OutType,
value: SpecValue,
}
/// The actual value given by the specification.
#[derive(Clone, Debug, Eq, PartialEq)]
enum SpecValue {
None,
Fg(Color),
Bg(Color),
Style(Style),
}
/// The set of configurable portions of ripgrep's output.
#[derive(Clone, Debug, Eq, PartialEq)]
enum OutType {
Path,
Line,
Column,
Match,
}
/// The specification type.
#[derive(Clone, Debug, Eq, PartialEq)]
enum SpecType {
Fg,
Bg,
Style,
None,
}
/// The set of available styles for use in the terminal.
#[derive(Clone, Debug, Eq, PartialEq)]
enum Style {
Bold,
NoBold,
Intense,
NoIntense,
Underline,
NoUnderline
}
impl ColorSpecs {
/// Create color specifications from a list of user supplied
/// specifications.
pub fn new(user_specs: &[Spec]) -> ColorSpecs {
let mut specs = ColorSpecs::default();
for user_spec in user_specs {
match user_spec.ty {
OutType::Path => user_spec.merge_into(&mut specs.path),
OutType::Line => user_spec.merge_into(&mut specs.line),
OutType::Column => user_spec.merge_into(&mut specs.column),
OutType::Match => user_spec.merge_into(&mut specs.matched),
}
}
specs
}
/// Return the color specification for coloring file paths.
fn path(&self) -> &ColorSpec {
&self.path
}
/// Return the color specification for coloring line numbers.
fn line(&self) -> &ColorSpec {
&self.line
}
/// Return the color specification for coloring column numbers.
fn column(&self) -> &ColorSpec {
&self.column
}
/// Return the color specification for coloring matched text.
fn matched(&self) -> &ColorSpec {
&self.matched
}
}
impl Spec {
/// Merge this spec into the given color specification.
fn merge_into(&self, cspec: &mut ColorSpec) {
self.value.merge_into(cspec);
}
}
impl SpecValue {
/// Merge this spec value into the given color specification.
fn merge_into(&self, cspec: &mut ColorSpec) {
match *self {
SpecValue::None => cspec.clear(),
SpecValue::Fg(ref color) => { cspec.set_fg(Some(color.clone())); }
SpecValue::Bg(ref color) => { cspec.set_bg(Some(color.clone())); }
SpecValue::Style(ref style) => {
match *style {
Style::Bold => { cspec.set_bold(true); }
Style::NoBold => { cspec.set_bold(false); }
Style::Intense => { cspec.set_intense(true); }
Style::NoIntense => { cspec.set_intense(false); }
Style::Underline => { cspec.set_underline(true); }
Style::NoUnderline => { cspec.set_underline(false); }
}
}
}
}
}
impl FromStr for Spec {
type Err = Error;
fn from_str(s: &str) -> Result<Spec, Error> {
let pieces: Vec<&str> = s.split(':').collect();
if pieces.len() <= 1 || pieces.len() > 3 {
return Err(Error::InvalidFormat(s.to_string()));
}
let otype: OutType = pieces[0].parse()?;
match pieces[1].parse()? {
SpecType::None => Ok(Spec { ty: otype, value: SpecValue::None }),
SpecType::Style => {
if pieces.len() < 3 {
return Err(Error::InvalidFormat(s.to_string()));
}
let style: Style = pieces[2].parse()?;
Ok(Spec { ty: otype, value: SpecValue::Style(style) })
}
SpecType::Fg => {
if pieces.len() < 3 {
return Err(Error::InvalidFormat(s.to_string()));
}
let color: Color = pieces[2].parse()?;
Ok(Spec { ty: otype, value: SpecValue::Fg(color) })
}
SpecType::Bg => {
if pieces.len() < 3 {
return Err(Error::InvalidFormat(s.to_string()));
}
let color: Color = pieces[2].parse()?;
Ok(Spec { ty: otype, value: SpecValue::Bg(color) })
}
}
}
}
impl FromStr for OutType {
type Err = Error;
fn from_str(s: &str) -> Result<OutType, Error> {
match &*s.to_lowercase() {
"path" => Ok(OutType::Path),
"line" => Ok(OutType::Line),
"column" => Ok(OutType::Column),
"match" => Ok(OutType::Match),
_ => Err(Error::UnrecognizedOutType(s.to_string())),
}
}
}
impl FromStr for SpecType {
type Err = Error;
fn from_str(s: &str) -> Result<SpecType, Error> {
match &*s.to_lowercase() {
"fg" => Ok(SpecType::Fg),
"bg" => Ok(SpecType::Bg),
"style" => Ok(SpecType::Style),
"none" => Ok(SpecType::None),
_ => Err(Error::UnrecognizedSpecType(s.to_string())),
}
}
}
impl FromStr for Style {
type Err = Error;
fn from_str(s: &str) -> Result<Style, Error> {
match &*s.to_lowercase() {
"bold" => Ok(Style::Bold),
"nobold" => Ok(Style::NoBold),
"intense" => Ok(Style::Intense),
"nointense" => Ok(Style::NoIntense),
"underline" => Ok(Style::Underline),
"nounderline" => Ok(Style::NoUnderline),
_ => Err(Error::UnrecognizedStyle(s.to_string())),
}
}
}
#[cfg(test)]
mod tests {
use termcolor::{Color, ColorSpec};
use super::{ColorSpecs, Error, OutType, Spec, SpecValue, Style};
#[test]
fn merge() {
let user_specs: &[Spec] = &[
"match:fg:blue".parse().unwrap(),
"match:none".parse().unwrap(),
"match:style:bold".parse().unwrap(),
];
let mut expect_matched = ColorSpec::new();
expect_matched.set_bold(true);
assert_eq!(ColorSpecs::new(user_specs), ColorSpecs {
path: ColorSpec::default(),
line: ColorSpec::default(),
column: ColorSpec::default(),
matched: expect_matched,
});
}
#[test]
fn specs() {
let spec: Spec = "path:fg:blue".parse().unwrap();
assert_eq!(spec, Spec {
ty: OutType::Path,
value: SpecValue::Fg(Color::Blue),
});
let spec: Spec = "path:bg:red".parse().unwrap();
assert_eq!(spec, Spec {
ty: OutType::Path,
value: SpecValue::Bg(Color::Red),
});
let spec: Spec = "match:style:bold".parse().unwrap();
assert_eq!(spec, Spec {
ty: OutType::Match,
value: SpecValue::Style(Style::Bold),
});
let spec: Spec = "match:style:intense".parse().unwrap();
assert_eq!(spec, Spec {
ty: OutType::Match,
value: SpecValue::Style(Style::Intense),
});
let spec: Spec = "match:style:underline".parse().unwrap();
assert_eq!(spec, Spec {
ty: OutType::Match,
value: SpecValue::Style(Style::Underline),
});
let spec: Spec = "line:none".parse().unwrap();
assert_eq!(spec, Spec {
ty: OutType::Line,
value: SpecValue::None,
});
let spec: Spec = "column:bg:green".parse().unwrap();
assert_eq!(spec, Spec {
ty: OutType::Column,
value: SpecValue::Bg(Color::Green),
});
}
#[test]
fn spec_errors() {
let err = "line:nonee".parse::<Spec>().unwrap_err();
assert_eq!(err, Error::UnrecognizedSpecType("nonee".to_string()));
let err = "".parse::<Spec>().unwrap_err();
assert_eq!(err, Error::InvalidFormat("".to_string()));
let err = "foo".parse::<Spec>().unwrap_err();
assert_eq!(err, Error::InvalidFormat("foo".to_string()));
let err = "line:style:italic".parse::<Spec>().unwrap_err();
assert_eq!(err, Error::UnrecognizedStyle("italic".to_string()));
let err = "line:fg:brown".parse::<Spec>().unwrap_err();
match err {
Error::UnrecognizedColor(name, _) => assert_eq!(name, "brown"),
err => assert!(false, "unexpected error: {:?}", err),
}
let err = "foo:fg:brown".parse::<Spec>().unwrap_err();
assert_eq!(err, Error::UnrecognizedOutType("foo".to_string()));
}
}

src/search.rs Normal file

@@ -0,0 +1,348 @@
use std::io;
use std::path::{Path, PathBuf};
use std::time::Duration;
use grep::matcher::Matcher;
use grep::printer::{JSON, Standard, Summary, Stats};
use grep::regex::RegexMatcher;
use grep::searcher::Searcher;
use termcolor::WriteColor;
use decompressor::{DecompressionReader, is_compressed};
use preprocessor::PreprocessorReader;
use subject::Subject;
/// The configuration for the search worker. Among a few other things, the
/// configuration primarily controls the way we show search results to users
/// at a very high level.
#[derive(Clone, Debug)]
struct Config {
preprocessor: Option<PathBuf>,
search_zip: bool,
}
impl Default for Config {
fn default() -> Config {
Config {
preprocessor: None,
search_zip: false,
}
}
}
/// A builder for configuring and constructing a search worker.
#[derive(Clone, Debug)]
pub struct SearchWorkerBuilder {
config: Config,
}
impl Default for SearchWorkerBuilder {
fn default() -> SearchWorkerBuilder {
SearchWorkerBuilder::new()
}
}
impl SearchWorkerBuilder {
/// Create a new builder for configuring and constructing a search worker.
pub fn new() -> SearchWorkerBuilder {
SearchWorkerBuilder { config: Config::default() }
}
/// Create a new search worker using the given searcher, matcher and
/// printer.
pub fn build<W: WriteColor>(
&self,
matcher: PatternMatcher,
searcher: Searcher,
printer: Printer<W>,
) -> SearchWorker<W> {
let config = self.config.clone();
SearchWorker { config, matcher, searcher, printer }
}
/// Set the path to a preprocessor command.
///
/// When this is set, instead of searching files directly, the given
/// command will be run with the file path as the first argument, and the
/// output of that command will be searched instead.
pub fn preprocessor(
&mut self,
cmd: Option<PathBuf>,
) -> &mut SearchWorkerBuilder {
self.config.preprocessor = cmd;
self
}
/// Enable the decompression and searching of common compressed files.
///
/// When enabled, if a particular file path is recognized as a compressed
/// file, then it is decompressed before searching.
///
/// Note that if a preprocessor command is set, then it overrides this
/// setting.
pub fn search_zip(&mut self, yes: bool) -> &mut SearchWorkerBuilder {
self.config.search_zip = yes;
self
}
}
/// The result of executing a search.
///
/// Generally speaking, the "result" of a search is sent to a printer, which
/// writes results to an underlying writer such as stdout or a file. However,
/// every search also has some aggregate statistics or metadata that may be
/// useful to higher level routines.
#[derive(Clone, Debug, Default)]
pub struct SearchResult {
has_match: bool,
stats: Option<Stats>,
}
impl SearchResult {
/// Whether the search found a match or not.
pub fn has_match(&self) -> bool {
self.has_match
}
/// Return aggregate search statistics for a single search, if available.
///
/// It can be expensive to compute statistics, so these are only present
/// if explicitly enabled in the printer provided by the caller.
pub fn stats(&self) -> Option<&Stats> {
self.stats.as_ref()
}
}
/// The pattern matcher used by a search worker.
#[derive(Clone, Debug)]
pub enum PatternMatcher {
RustRegex(RegexMatcher),
}
/// The printer used by a search worker.
///
/// The `W` type parameter refers to the type of the underlying writer.
#[derive(Debug)]
pub enum Printer<W> {
/// Use the standard printer, which supports the classic grep-like format.
Standard(Standard<W>),
/// Use the summary printer, which supports aggregate displays of search
/// results.
Summary(Summary<W>),
/// A JSON printer, which emits results in the JSON Lines format.
JSON(JSON<W>),
}
impl<W: WriteColor> Printer<W> {
/// Print the given statistics to the underlying writer in a way that is
/// consistent with this printer's format.
///
/// While `Stats` contains a duration itself, this only corresponds to the
/// time spent searching, whereas `total_duration` should roughly
/// approximate the lifespan of the ripgrep process itself.
pub fn print_stats(
&mut self,
total_duration: Duration,
stats: &Stats,
) -> io::Result<()> {
match *self {
Printer::JSON(_) => unimplemented!(),
Printer::Standard(_) | Printer::Summary(_) => {
self.print_stats_human(total_duration, stats)
}
}
}
fn print_stats_human(
&mut self,
total_duration: Duration,
stats: &Stats,
) -> io::Result<()> {
write!(
self.get_mut(),
"
{matches} matches
{lines} matched lines
{searches_with_match} files contained matches
{searches} files searched
{bytes_printed} bytes printed
{bytes_searched} bytes searched
{search_time:.6} seconds spent searching
{process_time:.6} seconds
",
matches = stats.matches(),
lines = stats.matched_lines(),
searches_with_match = stats.searches_with_match(),
searches = stats.searches(),
bytes_printed = stats.bytes_printed(),
bytes_searched = stats.bytes_searched(),
search_time = fractional_seconds(stats.elapsed()),
process_time = fractional_seconds(total_duration)
)
}
/// Return a mutable reference to the underlying printer's writer.
pub fn get_mut(&mut self) -> &mut W {
match *self {
Printer::Standard(ref mut p) => p.get_mut(),
Printer::Summary(ref mut p) => p.get_mut(),
Printer::JSON(ref mut p) => p.get_mut(),
}
}
}
/// A worker for executing searches.
///
/// It is intended for a single worker to execute many searches, and is
/// generally intended to be used from a single thread. When searching using
/// multiple threads, it is better to create a new worker for each thread.
#[derive(Debug)]
pub struct SearchWorker<W> {
config: Config,
matcher: PatternMatcher,
searcher: Searcher,
printer: Printer<W>,
}
impl<W: WriteColor> SearchWorker<W> {
/// Execute a search over the given subject.
pub fn search(&mut self, subject: &Subject) -> io::Result<SearchResult> {
self.search_impl(subject)
}
/// Return a mutable reference to the underlying printer.
pub fn printer(&mut self) -> &mut Printer<W> {
&mut self.printer
}
/// Search the given subject using the appropriate strategy.
fn search_impl(&mut self, subject: &Subject) -> io::Result<SearchResult> {
let path = subject.path();
if subject.is_stdin() {
let stdin = io::stdin();
// A `return` here appeases the borrow checker. NLL will fix this.
return self.search_reader(path, stdin.lock());
} else if self.config.preprocessor.is_some() {
let cmd = self.config.preprocessor.clone().unwrap();
let rdr = PreprocessorReader::from_cmd_path(cmd, path)?;
self.search_reader(path, rdr)
} else if self.config.search_zip && is_compressed(path) {
match DecompressionReader::from_path(path) {
None => Ok(SearchResult::default()),
Some(rdr) => self.search_reader(path, rdr),
}
} else {
self.search_path(path)
}
}
/// Search the contents of the given file path.
fn search_path(&mut self, path: &Path) -> io::Result<SearchResult> {
use self::PatternMatcher::*;
let (searcher, printer) = (&mut self.searcher, &mut self.printer);
match self.matcher {
RustRegex(ref m) => search_path(m, searcher, printer, path),
}
}
/// Executes a search on the given reader, which may or may not correspond
/// directly to the contents of the given file path. Instead, the reader
/// may actually cause something else to be searched (for example, when
/// a preprocessor is set or when decompression is enabled). In those
/// cases, the file path is used for visual purposes only.
///
/// Generally speaking, this method should only be used when there is no
/// other choice. Searching via `search_path` provides more opportunities
/// for optimizations (such as memory maps).
fn search_reader<R: io::Read>(
&mut self,
path: &Path,
rdr: R,
) -> io::Result<SearchResult> {
use self::PatternMatcher::*;
let (searcher, printer) = (&mut self.searcher, &mut self.printer);
match self.matcher {
RustRegex(ref m) => search_reader(m, searcher, printer, path, rdr),
}
}
}
/// Search the contents of the given file path using the given matcher,
/// searcher and printer.
fn search_path<M: Matcher, W: WriteColor>(
matcher: M,
searcher: &mut Searcher,
printer: &mut Printer<W>,
path: &Path,
) -> io::Result<SearchResult> {
match *printer {
Printer::Standard(ref mut p) => {
let mut sink = p.sink_with_path(&matcher, path);
searcher.search_path(&matcher, path, &mut sink)?;
Ok(SearchResult {
has_match: sink.has_match(),
stats: sink.stats().map(|s| s.clone()),
})
}
Printer::Summary(ref mut p) => {
let mut sink = p.sink_with_path(&matcher, path);
searcher.search_path(&matcher, path, &mut sink)?;
Ok(SearchResult {
has_match: sink.has_match(),
stats: sink.stats().map(|s| s.clone()),
})
}
Printer::JSON(ref mut p) => {
let mut sink = p.sink_with_path(&matcher, path);
searcher.search_path(&matcher, path, &mut sink)?;
Ok(SearchResult {
has_match: sink.has_match(),
stats: Some(sink.stats().clone()),
})
}
}
}
/// Search the contents of the given reader using the given matcher, searcher
/// and printer.
fn search_reader<M: Matcher, R: io::Read, W: WriteColor>(
matcher: M,
searcher: &mut Searcher,
printer: &mut Printer<W>,
path: &Path,
rdr: R,
) -> io::Result<SearchResult> {
match *printer {
Printer::Standard(ref mut p) => {
let mut sink = p.sink_with_path(&matcher, path);
searcher.search_reader(&matcher, rdr, &mut sink)?;
Ok(SearchResult {
has_match: sink.has_match(),
stats: sink.stats().map(|s| s.clone()),
})
}
Printer::Summary(ref mut p) => {
let mut sink = p.sink_with_path(&matcher, path);
searcher.search_reader(&matcher, rdr, &mut sink)?;
Ok(SearchResult {
has_match: sink.has_match(),
stats: sink.stats().map(|s| s.clone()),
})
}
Printer::JSON(ref mut p) => {
let mut sink = p.sink_with_path(&matcher, path);
searcher.search_reader(&matcher, rdr, &mut sink)?;
Ok(SearchResult {
has_match: sink.has_match(),
stats: Some(sink.stats().clone()),
})
}
}
}
/// Return the given duration as fractional seconds.
fn fractional_seconds(duration: Duration) -> f64 {
(duration.as_secs() as f64) + (duration.subsec_nanos() as f64 * 1e-9)
}
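Putting the builder to use is a matter of setting the knobs that correspond
to the relevant CLI flags, then pairing a matcher, searcher and printer for
each search thread. A minimal sketch; the preprocessor path is hypothetical:

    use std::path::PathBuf;

    fn worker_builder_from_flags() -> SearchWorkerBuilder {
        let mut builder = SearchWorkerBuilder::new();
        builder
            .preprocessor(Some(PathBuf::from("/usr/local/bin/rg-pre")))
            .search_zip(true);
        builder
        // Each search thread then builds its own worker:
        //   let mut worker = builder.build(matcher, searcher, printer);
        //   let result = worker.search(&subject)?;
    }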

src/search_buffer.rs

@@ -1,424 +0,0 @@
/*!
The `search_buffer` module is responsible for searching a single file all in a
single buffer. Typically, the source of the buffer is a memory map. This can
be useful for when memory maps are faster than streaming search.
Note that this module doesn't quite support everything that `search_stream`
does. Notably, showing contexts.
*/
use std::cmp;
use std::path::Path;
use grep::Grep;
use termcolor::WriteColor;
use printer::Printer;
use search_stream::{IterLines, Options, count_lines, is_binary};
pub struct BufferSearcher<'a, W: 'a> {
opts: Options,
printer: &'a mut Printer<W>,
grep: &'a Grep,
path: &'a Path,
buf: &'a [u8],
match_line_count: u64,
match_count: Option<u64>,
line_count: Option<u64>,
byte_offset: Option<u64>,
last_line: usize,
}
impl<'a, W: WriteColor> BufferSearcher<'a, W> {
pub fn new(
printer: &'a mut Printer<W>,
grep: &'a Grep,
path: &'a Path,
buf: &'a [u8],
) -> BufferSearcher<'a, W> {
BufferSearcher {
opts: Options::default(),
printer: printer,
grep: grep,
path: path,
buf: buf,
match_line_count: 0,
match_count: None,
line_count: None,
byte_offset: None,
last_line: 0,
}
}
/// If enabled, searching will print a 0-based offset of the
/// matching line (or the actual match if -o is specified) before
/// printing the line itself.
///
/// Disabled by default.
pub fn byte_offset(mut self, yes: bool) -> Self {
self.opts.byte_offset = yes;
self
}
/// If enabled, searching will print a count instead of each match.
///
/// Disabled by default.
pub fn count(mut self, yes: bool) -> Self {
self.opts.count = yes;
self
}
/// If enabled, searching will print the count of individual matches
/// instead of each match.
///
/// Disabled by default.
pub fn count_matches(mut self, yes: bool) -> Self {
self.opts.count_matches = yes;
self
}
/// If enabled, searching will print the path instead of each match.
///
/// Disabled by default.
pub fn files_with_matches(mut self, yes: bool) -> Self {
self.opts.files_with_matches = yes;
self
}
/// If enabled, searching will print the path of files that *don't* match
/// the given pattern.
///
/// Disabled by default.
pub fn files_without_matches(mut self, yes: bool) -> Self {
self.opts.files_without_matches = yes;
self
}
/// Set the end-of-line byte used by this searcher.
pub fn eol(mut self, eol: u8) -> Self {
self.opts.eol = eol;
self
}
/// If enabled, matching is inverted so that lines that *don't* match the
/// given pattern are treated as matches.
pub fn invert_match(mut self, yes: bool) -> Self {
self.opts.invert_match = yes;
self
}
/// If enabled, compute line numbers and prefix each line of output with
/// them.
pub fn line_number(mut self, yes: bool) -> Self {
self.opts.line_number = yes;
self
}
/// Limit the number of matches to the given count.
///
/// The default is None, which corresponds to no limit.
pub fn max_count(mut self, count: Option<u64>) -> Self {
self.opts.max_count = count;
self
}
/// If enabled, don't show any output and quit searching after the first
/// match is found.
pub fn quiet(mut self, yes: bool) -> Self {
self.opts.quiet = yes;
self
}
/// If enabled, search binary files as if they were text.
pub fn text(mut self, yes: bool) -> Self {
self.opts.text = yes;
self
}
#[inline(never)]
pub fn run(mut self) -> u64 {
let binary_upto = cmp::min(10_240, self.buf.len());
if !self.opts.text && is_binary(&self.buf[..binary_upto], true) {
return 0;
}
self.match_line_count = 0;
self.line_count = if self.opts.line_number { Some(0) } else { None };
// The memory map searcher uses one contiguous block of bytes, so the
// offsets given the printer are sufficient to compute the byte offset.
self.byte_offset = if self.opts.byte_offset { Some(0) } else { None };
self.match_count = if self.opts.count_matches { Some(0) } else { None };
let mut last_end = 0;
for m in self.grep.iter(self.buf) {
if self.opts.invert_match {
self.print_inverted_matches(last_end, m.start());
} else {
self.print_match(m.start(), m.end());
}
last_end = m.end();
if self.opts.terminate(self.match_line_count) {
break;
}
}
if self.opts.invert_match && !self.opts.terminate(self.match_line_count) {
let upto = self.buf.len();
self.print_inverted_matches(last_end, upto);
}
if self.opts.count && self.match_line_count > 0 {
self.printer.path_count(self.path, self.match_line_count);
} else if self.opts.count_matches
&& self.match_count.map_or(false, |c| c > 0)
{
self.printer.path_count(self.path, self.match_count.unwrap());
}
if self.opts.files_with_matches && self.match_line_count > 0 {
self.printer.path(self.path);
}
if self.opts.files_without_matches && self.match_line_count == 0 {
self.printer.path(self.path);
}
self.match_line_count
}
#[inline(always)]
fn count_individual_matches(&mut self, start: usize, end: usize) {
if let Some(ref mut count) = self.match_count {
for _ in self.grep.regex().find_iter(&self.buf[start..end]) {
*count += 1;
}
}
}
#[inline(always)]
pub fn print_match(&mut self, start: usize, end: usize) {
self.match_line_count += 1;
self.count_individual_matches(start, end);
if self.opts.skip_matches() {
return;
}
self.count_lines(start);
self.add_line(end);
self.printer.matched(
self.grep.regex(), self.path, self.buf,
start, end, self.line_count, self.byte_offset);
}
#[inline(always)]
fn print_inverted_matches(&mut self, start: usize, end: usize) {
debug_assert!(self.opts.invert_match);
let mut it = IterLines::new(self.opts.eol, start);
while let Some((s, e)) = it.next(&self.buf[..end]) {
if self.opts.terminate(self.match_line_count) {
return;
}
self.print_match(s, e);
}
}
#[inline(always)]
fn count_lines(&mut self, upto: usize) {
if let Some(ref mut line_count) = self.line_count {
*line_count += count_lines(
&self.buf[self.last_line..upto], self.opts.eol);
self.last_line = upto;
}
}
#[inline(always)]
fn add_line(&mut self, line_end: usize) {
if let Some(ref mut line_count) = self.line_count {
*line_count += 1;
self.last_line = line_end;
}
}
}
#[cfg(test)]
mod tests {
use std::path::Path;
use grep::GrepBuilder;
use printer::Printer;
use termcolor;
use super::BufferSearcher;
const SHERLOCK: &'static str = "\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.\
";
fn test_path() -> &'static Path {
&Path::new("/baz.rs")
}
type TestSearcher<'a> = BufferSearcher<'a, termcolor::NoColor<Vec<u8>>>;
fn search<F: FnMut(TestSearcher) -> TestSearcher>(
pat: &str,
haystack: &str,
mut map: F,
) -> (u64, String) {
let outbuf = termcolor::NoColor::new(vec![]);
let mut pp = Printer::new(outbuf).with_filename(true);
let grep = GrepBuilder::new(pat).build().unwrap();
let count = {
let searcher = BufferSearcher::new(
&mut pp, &grep, test_path(), haystack.as_bytes());
map(searcher).run()
};
(count, String::from_utf8(pp.into_inner().into_inner()).unwrap())
}
#[test]
fn basic_search() {
let (count, out) = search("Sherlock", SHERLOCK, |s|s);
assert_eq!(2, count);
assert_eq!(out, "\
/baz.rs:For the Doctor Watsons of this world, as opposed to the Sherlock
/baz.rs:be, to a very large extent, the result of luck. Sherlock Holmes
");
}
#[test]
fn binary() {
let text = "Sherlock\n\x00Holmes\n";
let (count, out) = search("Sherlock|Holmes", text, |s|s);
assert_eq!(0, count);
assert_eq!(out, "");
}
#[test]
fn binary_text() {
let text = "Sherlock\n\x00Holmes\n";
let (count, out) = search("Sherlock|Holmes", text, |s| s.text(true));
assert_eq!(2, count);
assert_eq!(out, "/baz.rs:Sherlock\n/baz.rs:\x00Holmes\n");
}
#[test]
fn line_numbers() {
let (count, out) = search(
"Sherlock", SHERLOCK, |s| s.line_number(true));
assert_eq!(2, count);
assert_eq!(out, "\
/baz.rs:1:For the Doctor Watsons of this world, as opposed to the Sherlock
/baz.rs:3:be, to a very large extent, the result of luck. Sherlock Holmes
");
}
#[test]
fn byte_offset() {
let (_, out) = search(
"Sherlock", SHERLOCK, |s| s.byte_offset(true));
assert_eq!(out, "\
/baz.rs:0:For the Doctor Watsons of this world, as opposed to the Sherlock
/baz.rs:129:be, to a very large extent, the result of luck. Sherlock Holmes
");
}
#[test]
fn byte_offset_inverted() {
let (_, out) = search("Sherlock", SHERLOCK, |s| {
s.invert_match(true).byte_offset(true)
});
assert_eq!(out, "\
/baz.rs:65:Holmeses, success in the province of detective work must always
/baz.rs:193:can extract a clew from a wisp of straw or a flake of cigar ash;
/baz.rs:258:but Doctor Watson has to have it taken out for him and dusted,
/baz.rs:321:and exhibited clearly, with a label attached.
");
}
#[test]
fn count() {
let (count, out) = search(
"Sherlock", SHERLOCK, |s| s.count(true));
assert_eq!(2, count);
assert_eq!(out, "/baz.rs:2\n");
}
#[test]
fn count_matches() {
let (_, out) = search(
"the", SHERLOCK, |s| s.count_matches(true));
assert_eq!(out, "/baz.rs:4\n");
}
#[test]
fn files_with_matches() {
let (count, out) = search(
"Sherlock", SHERLOCK, |s| s.files_with_matches(true));
assert_eq!(1, count);
assert_eq!(out, "/baz.rs\n");
}
#[test]
fn files_without_matches() {
let (count, out) = search(
"zzzz", SHERLOCK, |s| s.files_without_matches(true));
assert_eq!(0, count);
assert_eq!(out, "/baz.rs\n");
}
#[test]
fn max_count() {
let (count, out) = search(
"Sherlock", SHERLOCK, |s| s.max_count(Some(1)));
assert_eq!(1, count);
assert_eq!(out, "\
/baz.rs:For the Doctor Watsons of this world, as opposed to the Sherlock
");
}
#[test]
fn invert_match_max_count() {
let (count, out) = search(
"zzzz", SHERLOCK, |s| s.invert_match(true).max_count(Some(1)));
assert_eq!(1, count);
assert_eq!(out, "\
/baz.rs:For the Doctor Watsons of this world, as opposed to the Sherlock
");
}
#[test]
fn invert_match() {
let (count, out) = search(
"Sherlock", SHERLOCK, |s| s.invert_match(true));
assert_eq!(4, count);
assert_eq!(out, "\
/baz.rs:Holmeses, success in the province of detective work must always
/baz.rs:can extract a clew from a wisp of straw or a flake of cigar ash;
/baz.rs:but Doctor Watson has to have it taken out for him and dusted,
/baz.rs:and exhibited clearly, with a label attached.
");
}
#[test]
fn invert_match_line_numbers() {
let (count, out) = search("Sherlock", SHERLOCK, |s| {
s.invert_match(true).line_number(true)
});
assert_eq!(4, count);
assert_eq!(out, "\
/baz.rs:2:Holmeses, success in the province of detective work must always
/baz.rs:4:can extract a clew from a wisp of straw or a flake of cigar ash;
/baz.rs:5:but Doctor Watson has to have it taken out for him and dusted,
/baz.rs:6:and exhibited clearly, with a label attached.
");
}
#[test]
fn invert_match_count() {
let (count, out) = search("Sherlock", SHERLOCK, |s| {
s.invert_match(true).count(true)
});
assert_eq!(4, count);
assert_eq!(out, "/baz.rs:4\n");
}
}

Some files were not shown because too many files have changed in this diff.