Compare commits

..

35 Commits

Author SHA1 Message Date
Andrew Gallant
835600794f termcolor: release 0.3.6 2018-03-26 17:28:21 -04:00
ehuss
07713fb5c5 termcolor: fix bold + intense colors in Win 10
There is an issue with the Windows 10 console where if you issue the bold
escape sequence after one of the extended foreground colors, it overrides the
color.  This happens in termcolor if you have bold, intense, and color set.
The workaround is to issue the bold sequence before the color.

Fixes rust-lang/rust#49322
2018-03-26 16:42:48 -04:00
Dezhi “Andy” Fang
d7c9323a68 deps: update regex
This fixes build failures on latest nightly with SIMD features.
2018-03-17 19:33:34 -04:00
Andrew Gallant
b7d29d126f deps: update clap, atty, libc
Nothing to see here.

Note that we continue to refrain to update tempdir, which means we are
still bringing in rand 0.4 and rand 0.3. Updating tempdir brings in an
old version of remove_dir_all, which in turn brings in winapi 0.2. No
thanks.
2018-03-13 22:55:39 -04:00
Andrew Gallant
42b8132d0a grep: add "perfect" smart case detection
This commit removes the previous smart case detection logic and replaces
it with detection based on the regex AST. This particular AST is a faithful
representation of the concrete syntax, which lets us be very precise in
how we handle it.

Closes #851
2018-03-13 22:55:39 -04:00
Andrew Gallant
cd08707c7c grep: upgrade to regex-syntax 0.5
This update brings with it many bug fixes:

  * Better error messages are printed overall. We also include
    explicit call out for unsupported features like backreferences
    and look-around.
  * Regexes like `\s*{` no longer emit incomprehensible errors.
  * Unicode escape sequences, such as `\u{..}` are now supported.

For the most part, this upgrade was done in a straight-forward way. We
resist the urge to refactor the `grep` crate, in anticipation of it
being rewritten anyway.

Note that we removed the `--fixed-strings` suggestion whenever a regex
syntax error occurs. In practice, I've found that it results in a lot of
false positives, and I believe that its use is not as paramount now that
regex parse errors are much more readable.

Closes #268, Closes #395, Closes #702, Closes #853
2018-03-13 22:55:39 -04:00
Andrew Gallant
c2e97cd858 changelog: update for 0.9.0 2018-03-12 23:21:42 -04:00
Andrew Gallant
1f70e9187c deps: update regex crate
This update brings with it a new feature of the regex crate which will
now use SIMD optimizations automatically at runtime with no necessary
compile time flags. All that's needed is to enable the `unstable` feature.

Other crates, such as bytecount and encoding_rs, are still using the
old-style SIMD support, so we leave the simd-accel and avx-accel features.
However, the binaries we distribute on Github no longer have those
features enabled, which makes them truly portable.

Fixes #135
2018-03-12 23:21:42 -04:00
Markus Staab
7120f32258 globset/doc: update README for 0.3 release 2018-03-12 07:19:55 -04:00
Balaji Sivaraman
00520b30f5 output: add --stats flag
This commit provides basic support for a --stats flag, which will print
various aggregate statistics about a search after all of the results
have been printed. This is mostly intended to support a similar feature
found in the Silver Searcher. Note though that we don't emit the total
bytes searched; this is a first pass at an implementation and we can
improve upon it later.

Closes #411, Closes #799
2018-03-10 10:59:00 -05:00
Andrew Gallant
11a8f0eaf0 args: treat --count --only-matching as --count-matches
Namely, when ripgrep is asked to count things and is also asked to print
every match on its own line, then we should just automatically count the
matches and not the lines. This is a departure from how GNU grep behaves,
but there is a compelling argument to be made that GNU grep's behavior
doesn't make a lot of sense.

Note that since this changes the behavior of combining two existing
flags, this is a breaking change.
2018-03-10 10:38:34 -05:00
Balaji Sivaraman
27fc9f2fd3 search: add a --count-matches flag
This commit introduces a new flag, --count-matches, which will cause
ripgrep to report a total count of all matches instead of a count of
total lines matched.

Closes #566, Closes #814
2018-03-10 10:38:25 -05:00
Balaji Sivaraman
96f73293c0 cleanup: rename match_count to match_line_count 2018-03-10 10:23:38 -05:00
Balaji Sivaraman
b006943c01 search: add -b/--byte-offset flag
This commit adds support for printing 0-based byte offset before each
line. We handle corner cases such as `-o/--only-matching` and
`-C/--context` as well.

Closes #812
2018-03-10 10:15:19 -05:00
Brian Malehorn
91d0756f62 ignore: support backslash escaping
Use the new `Globset::backslash_escape` knob to conform to git behavior:
`\` will escape the following character. For example, the pattern `\*`
will match a file literally named `*`.

Also tweak a test in ripgrep that was relying on this incorrect
behavior.

Closes #526, Closes #811
2018-03-10 09:30:55 -05:00
Andrew Gallant
54256515b4 globset: make ErrorKind enum extensible
This commit makes the ErrorKind enum extensible by adding a
__Nonexhaustive variant. Callers should use this as a hint that
exhaustive case analysis isn't possible in a stable way since new
variants may be added in the future without a semver bump.
2018-03-10 09:30:55 -05:00
Brian Malehorn
e2516ed095 globset: support backslash escaping
From `man 7 glob`:

    One can remove the special meaning of '?', '*' and '[' by preceding
    them by a backslash, or, in case this is part of a shell command
    line, enclosing them in quotes.

Conform to glob / fnmatch / git implementations by making `\` escape the
following character - for example `\?` will match a literal `?`.

However, only enable this by default on Unix platforms. Windows builds
will continue to use `\` as a path separator, but can still get the new
behavior by calling `globset.backslash_escape(true)`.

Adding tests for the `Globset::backslash_escape` option was a bit
involved, since the default value of this option is platform-dependent.

Extend the options framework to hold an `Option<T>` for each
knob, where `None` means "default" and `Some(v)` means "override with
`v`". This way we only have to specify the default values once in
`GlobOptions::default()` rather than replicated in both code and tests.

Finally write a few behavioral tests, and some tests to confirm it
varies by platform.
2018-03-10 09:30:55 -05:00
Alejandro Barreto
c0c80e0209 doc: add Windows Scoop install instructions 2018-03-10 08:15:22 -05:00
Andrew Gallant
dbf6f15625 mmap: handle ENOMEM error
This commit causes a memory map strategy to fall back to a file backed
strategy if the mmap call fails with an `ENOMEM` error.

Fixes #852
2018-03-10 08:13:27 -05:00
Ryan Hayle
9163aaac27 doc: add additional instructions for snap
In the snap store, ripgrep 0.8.1 is only available as a candidate
release, while the default (stable) release is 0.7.1.
2018-03-09 07:09:02 -05:00
Patrick Artounian
9d7448bfc0 doc: shorten Rust nightly brew install command
The burntsushi/ripgrep/ prefix is not needed.
2018-03-07 12:44:06 -05:00
Richard Dodd (dodj)
b98585b429 termcolor/doc: fix typo 2018-03-03 09:20:28 -05:00
Andrew Gallant
f5411b992c doc: use PATTERNFILE for -f/--file flag
Fixes #832
2018-02-23 12:18:15 -05:00
Anthony Wong
492effc7be pkg: update snapcraft.yaml
Update version number and don't declare alias as it has been deprecated.
2018-02-23 06:51:39 -05:00
Andrew Gallant
4889d2d37c doc: clarify licensing story 2018-02-22 19:01:43 -05:00
Andrew Gallant
354996a16f doc: clarify snap installation instructions
Fixes #782
2018-02-22 16:56:22 -05:00
Andrew Gallant
cbebb010a7 doc: fix typos 2018-02-21 15:59:12 -05:00
Andrew Gallant
7098daf6a8 doc: "to rip" means "fast"
Answer the origin story of ripgrep's name.
2018-02-21 15:53:50 -05:00
Andrew Gallant
17d09c0882 pkg: update brew tap to 0.8.1 2018-02-20 21:51:41 -05:00
Andrew Gallant
c8e9f25b85 ci: fix macOS asciidoc installation 2018-02-20 21:05:40 -05:00
Andrew Gallant
9305f89f39 ci: fix man page generation on macOS 2018-02-20 20:55:12 -05:00
Andrew Gallant
9c216ad9a4 release: 0.8.1 2018-02-20 20:19:03 -05:00
Andrew Gallant
6862e07870 changelog: 0.8.1 2018-02-20 20:18:01 -05:00
Andrew Gallant
a6d09b2d42 deps: update to clap 2.30.0 2018-02-20 20:16:57 -05:00
Andrew Gallant
ab1b877c20 termcolor: release 0.3.5 2018-02-20 20:15:08 -05:00
37 changed files with 1253 additions and 373 deletions

View File

@@ -30,7 +30,8 @@ matrix:
env: TARGET=x86_64-unknown-linux-musl
- os: osx
rust: nightly
env: TARGET=x86_64-apple-darwin
# XML_CATALOG_FILES is apparently necessary for asciidoc on macOS.
env: TARGET=x86_64-apple-darwin XML_CATALOG_FILES=/usr/local/etc/xml/catalog
- os: linux
rust: nightly
env: TARGET=arm-unknown-linux-gnueabihf GCC_VERSION=4.8

View File

@@ -1,3 +1,92 @@
0.9.0 (TBD)
===========
This is a new minor version release of ripgrep that mostly contains bug fixes.
Releases provided on Github for `x86` and `x86_64` will now work on all target
CPUs, and will also automatically take advantage of features found on modern
CPUs (such as AVX2) for additional optimizations.
**BREAKING CHANGES**:
* When `--count` and `--only-matching` are provided simultaneously, the
behavior of ripgrep is as if the `--count-matches` flag was given. That is,
the total number of matches is reported, where there may be multiple matches
per line. Previously, the behavior of ripgrep was to report the total number
of matching lines. (Note that this behavior diverges from the behavior of
GNU grep.)
* Octal syntax is no longer supported. ripgrep previously accepted expressions
like `\1` as syntax for matching `U+0001`, but ripgrep will now report an
error instead.
Feature enhancements:
* [FEATURE #411](https://github.com/BurntSushi/ripgrep/issues/411):
Add a `--stats` flag, which emits aggregate statistics after search results.
* [FEATURE #702](https://github.com/BurntSushi/ripgrep/issues/702):
Support `\u{..}` Unicode escape sequences.
* [FEATURE #812](https://github.com/BurntSushi/ripgrep/issues/812):
Add `-b/--byte-offset` flag that reports byte offset of each matching line.
* [FEATURE #814](https://github.com/BurntSushi/ripgrep/issues/814):
Add `--count-matches` flag, which is like `--count`, but for each match.
Bug fixes:
* [BUG #135](https://github.com/BurntSushi/ripgrep/issues/135):
Release portable binaries that conditionally use SSSE3, AVX2, etc., at
runtime.
* [BUG #268](https://github.com/BurntSushi/ripgrep/issues/268):
Print descriptive error message when trying to use look-around or
backreferences.
* [BUG #395](https://github.com/BurntSushi/ripgrep/issues/395):
Show comprehensible error messages for regexes like `\s*{`.
* [BUG #526](https://github.com/BurntSushi/ripgrep/issues/526):
Support backslash escapes in globs.
* [BUG #832](https://github.com/BurntSushi/ripgrep/issues/832):
Clarify usage instructions for `-f/--file` flag.
* [BUG #851](https://github.com/BurntSushi/ripgrep/issues/851):
Fix `-S/--smart-case` detection once and for all.
* [BUG #852](https://github.com/BurntSushi/ripgrep/issues/852):
Be robust with respect to `ENOMEM` errors returned by `mmap`.
* [BUG #853](https://github.com/BurntSushi/ripgrep/issues/853):
Upgrade `grep` crate to `regex-syntax 0.5.0`.
0.8.1 (2018-02-20)
==================
This is a patch release of ripgrep that primarily fixes regressions introduced
in 0.8.0 (#820 and #824) in directory traversal on Windows. These regressions
do not impact non-Windows users.
Feature enhancements:
* Added or improved file type filtering for csv and VHDL.
* [FEATURE #798](https://github.com/BurntSushi/ripgrep/issues/798):
Add `underline` support to `termcolor` and ripgrep. See documentation on the
`--colors` flag for details.
Bug fixes:
* [BUG #684](https://github.com/BurntSushi/ripgrep/issues/684):
Improve documentation for the `--ignore-file` flag.
* [BUG #789](https://github.com/BurntSushi/ripgrep/issues/789):
Don't show `(rev )` if the revision wasn't available during the build.
* [BUG #791](https://github.com/BurntSushi/ripgrep/issues/791):
Add man page to ARM release.
* [BUG #797](https://github.com/BurntSushi/ripgrep/issues/797):
Improve documentation for "intense" setting in `termcolor`.
* [BUG #800](https://github.com/BurntSushi/ripgrep/issues/800):
Fix a bug in the `ignore` crate for custom ignore files. This had no impact
on ripgrep.
* [BUG #807](https://github.com/BurntSushi/ripgrep/issues/807):
Fix a bug where `rg --hidden .` behaved differently from `rg --hidden ./`.
* [BUG #815](https://github.com/BurntSushi/ripgrep/issues/815):
Clarify a common failure mode in user guide.
* [BUG #820](https://github.com/BurntSushi/ripgrep/issues/820):
Fixes a bug on Windows where symlinks were followed even if not requested.
* [BUG #824](https://github.com/BurntSushi/ripgrep/issues/824):
Fix a performance regression in directory traversal on Windows.
0.8.0 (2018-02-11)
==================
This is a new minor version releae of ripgrep that satisfies several popular

79
Cargo.lock generated
View File

@@ -8,15 +8,18 @@ dependencies = [
[[package]]
name = "ansi_term"
version = "0.10.2"
version = "0.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"winapi 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "atty"
version = "0.2.6"
version = "0.2.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.39 (registry+https://github.com/rust-lang/crates.io-index)",
"termion 1.5.1 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -41,11 +44,11 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "clap"
version = "2.29.4"
version = "2.31.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"ansi_term 0.10.2 (registry+https://github.com/rust-lang/crates.io-index)",
"atty 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)",
"ansi_term 0.11.0 (registry+https://github.com/rust-lang/crates.io-index)",
"atty 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)",
"bitflags 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"strsim 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)",
"textwrap 0.9.0 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -99,7 +102,7 @@ dependencies = [
"glob 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.1 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.2.10 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -108,8 +111,8 @@ version = "0.1.8"
dependencies = [
"log 0.4.1 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)",
"regex-syntax 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.2.10 (registry+https://github.com/rust-lang/crates.io-index)",
"regex-syntax 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -121,7 +124,7 @@ dependencies = [
"lazy_static 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.1 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.2.10 (registry+https://github.com/rust-lang/crates.io-index)",
"same-file 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"tempdir 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -136,7 +139,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "libc"
version = "0.2.36"
version = "0.2.39"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
@@ -152,7 +155,7 @@ name = "memchr"
version = "2.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.39 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -160,7 +163,7 @@ name = "memmap"
version = "0.6.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.39 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -169,7 +172,7 @@ name = "num_cpus"
version = "1.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.39 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
@@ -178,7 +181,7 @@ version = "0.3.22"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"fuchsia-zircon 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.39 (registry+https://github.com/rust-lang/crates.io-index)",
"rand 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -188,7 +191,7 @@ version = "0.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"fuchsia-zircon 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.39 (registry+https://github.com/rust-lang/crates.io-index)",
"winapi 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -207,42 +210,44 @@ dependencies = [
[[package]]
name = "regex"
version = "0.2.6"
version = "0.2.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"aho-corasick 0.6.4 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex-syntax 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)",
"simd 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)",
"regex-syntax 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)",
"thread_local 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
"utf8-ranges 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "regex-syntax"
version = "0.4.2"
version = "0.5.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"ucd-util 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "ripgrep"
version = "0.8.0"
version = "0.8.1"
dependencies = [
"atty 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)",
"atty 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)",
"bytecount 0.3.1 (registry+https://github.com/rust-lang/crates.io-index)",
"clap 2.29.4 (registry+https://github.com/rust-lang/crates.io-index)",
"clap 2.31.1 (registry+https://github.com/rust-lang/crates.io-index)",
"encoding_rs 0.7.2 (registry+https://github.com/rust-lang/crates.io-index)",
"globset 0.3.0",
"grep 0.1.8",
"ignore 0.4.1",
"lazy_static 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.39 (registry+https://github.com/rust-lang/crates.io-index)",
"log 0.4.1 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)",
"memmap 0.6.2 (registry+https://github.com/rust-lang/crates.io-index)",
"num_cpus 1.8.0 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)",
"regex 0.2.10 (registry+https://github.com/rust-lang/crates.io-index)",
"same-file 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)",
"termcolor 0.3.4",
"termcolor 0.3.6",
"winapi 0.3.4 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -274,7 +279,7 @@ dependencies = [
[[package]]
name = "termcolor"
version = "0.3.4"
version = "0.3.6"
dependencies = [
"wincolor 0.1.6",
]
@@ -284,7 +289,7 @@ name = "termion"
version = "1.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
"libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.39 (registry+https://github.com/rust-lang/crates.io-index)",
"redox_syscall 0.1.37 (registry+https://github.com/rust-lang/crates.io-index)",
"redox_termios 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)",
]
@@ -306,6 +311,11 @@ dependencies = [
"unreachable 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)",
]
[[package]]
name = "ucd-util"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "unicode-width"
version = "0.1.4"
@@ -366,12 +376,12 @@ dependencies = [
[metadata]
"checksum aho-corasick 0.6.4 (registry+https://github.com/rust-lang/crates.io-index)" = "d6531d44de723825aa81398a6415283229725a00fa30713812ab9323faa82fc4"
"checksum ansi_term 0.10.2 (registry+https://github.com/rust-lang/crates.io-index)" = "6b3568b48b7cefa6b8ce125f9bb4989e52fbcc29ebea88df04cc7c5f12f70455"
"checksum atty 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)" = "8352656fd42c30a0c3c89d26dea01e3b77c0ab2af18230835c15e2e13cd51859"
"checksum ansi_term 0.11.0 (registry+https://github.com/rust-lang/crates.io-index)" = "ee49baf6cb617b853aa8d93bf420db2383fab46d314482ca2803b40d5fde979b"
"checksum atty 0.2.8 (registry+https://github.com/rust-lang/crates.io-index)" = "af80143d6f7608d746df1520709e5d141c96f240b0e62b0aa41bdfb53374d9d4"
"checksum bitflags 1.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "b3c30d3802dfb7281680d6285f2ccdaa8c2d8fee41f93805dba5c4cf50dc23cf"
"checksum bytecount 0.3.1 (registry+https://github.com/rust-lang/crates.io-index)" = "882585cd7ec84e902472df34a5e01891202db3bf62614e1f0afe459c1afcf744"
"checksum cfg-if 0.1.2 (registry+https://github.com/rust-lang/crates.io-index)" = "d4c819a1287eb618df47cc647173c5c4c66ba19d888a6e50d605672aed3140de"
"checksum clap 2.29.4 (registry+https://github.com/rust-lang/crates.io-index)" = "7b8f59bcebcfe4269b09f71dab0da15b355c75916a8f975d3876ce81561893ee"
"checksum clap 2.31.1 (registry+https://github.com/rust-lang/crates.io-index)" = "5dc18f6f4005132120d9711636b32c46a233fad94df6217fa1d81c5e97a9f200"
"checksum crossbeam 0.3.2 (registry+https://github.com/rust-lang/crates.io-index)" = "24ce9782d4d5c53674646a6a4c1863a21a8fc0cb649b3c94dfc16e45071dea19"
"checksum encoding_rs 0.7.2 (registry+https://github.com/rust-lang/crates.io-index)" = "98fd0f24d1fb71a4a6b9330c8ca04cbd4e7cc5d846b54ca74ff376bc7c9f798d"
"checksum fnv 1.0.6 (registry+https://github.com/rust-lang/crates.io-index)" = "2fad85553e09a6f881f739c29f0b00b0f01357c743266d478b68951ce23285f3"
@@ -379,7 +389,7 @@ dependencies = [
"checksum fuchsia-zircon-sys 0.3.3 (registry+https://github.com/rust-lang/crates.io-index)" = "3dcaa9ae7725d12cdb85b3ad99a434db70b468c09ded17e012d86b5c1010f7a7"
"checksum glob 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)" = "8be18de09a56b60ed0edf84bc9df007e30040691af7acd1c41874faac5895bfb"
"checksum lazy_static 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "c8f31047daa365f19be14b47c29df4f7c3b581832407daabe6ae77397619237d"
"checksum libc 0.2.36 (registry+https://github.com/rust-lang/crates.io-index)" = "1e5d97d6708edaa407429faa671b942dc0f2727222fb6b6539bf1db936e4b121"
"checksum libc 0.2.39 (registry+https://github.com/rust-lang/crates.io-index)" = "f54263ad99207254cf58b5f701ecb432c717445ea2ee8af387334bdd1a03fdff"
"checksum log 0.4.1 (registry+https://github.com/rust-lang/crates.io-index)" = "89f010e843f2b1a31dbd316b3b8d443758bc634bed37aabade59c686d644e0a2"
"checksum memchr 2.0.1 (registry+https://github.com/rust-lang/crates.io-index)" = "796fba70e76612589ed2ce7f45282f5af869e0fdd7cc6199fa1aa1f1d591ba9d"
"checksum memmap 0.6.2 (registry+https://github.com/rust-lang/crates.io-index)" = "e2ffa2c986de11a9df78620c01eeaaf27d94d3ff02bf81bfcca953102dd0c6ff"
@@ -388,8 +398,8 @@ dependencies = [
"checksum rand 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)" = "eba5f8cb59cc50ed56be8880a5c7b496bfd9bd26394e176bc67884094145c2c5"
"checksum redox_syscall 0.1.37 (registry+https://github.com/rust-lang/crates.io-index)" = "0d92eecebad22b767915e4d529f89f28ee96dbbf5a4810d2b844373f136417fd"
"checksum redox_termios 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "7e891cfe48e9100a70a3b6eb652fef28920c117d366339687bd5576160db0f76"
"checksum regex 0.2.6 (registry+https://github.com/rust-lang/crates.io-index)" = "5be5347bde0c48cfd8c3fdc0766cdfe9d8a755ef84d620d6794c778c91de8b2b"
"checksum regex-syntax 0.4.2 (registry+https://github.com/rust-lang/crates.io-index)" = "8e931c58b93d86f080c734bfd2bce7dd0079ae2331235818133c8be7f422e20e"
"checksum regex 0.2.10 (registry+https://github.com/rust-lang/crates.io-index)" = "aec3f58d903a7d2a9dc2bf0e41a746f4530e0cab6b615494e058f67a3ef947fb"
"checksum regex-syntax 0.5.3 (registry+https://github.com/rust-lang/crates.io-index)" = "b2550876c31dc914696a6c2e01cbce8afba79a93c8ae979d2fe051c0230b3756"
"checksum same-file 1.0.2 (registry+https://github.com/rust-lang/crates.io-index)" = "cfb6eded0b06a0b512c8ddbcf04089138c9b4362c2f696f3c3d76039d68f3637"
"checksum simd 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)" = "3dd0805c7363ab51a829a1511ad24b6ed0349feaa756c4bc2f977f9f496e6673"
"checksum strsim 0.7.0 (registry+https://github.com/rust-lang/crates.io-index)" = "bb4f380125926a99e52bc279241539c018323fab05ad6368b56f93d9369ff550"
@@ -397,6 +407,7 @@ dependencies = [
"checksum termion 1.5.1 (registry+https://github.com/rust-lang/crates.io-index)" = "689a3bdfaab439fd92bc87df5c4c78417d3cbe537487274e9b0b2dce76e92096"
"checksum textwrap 0.9.0 (registry+https://github.com/rust-lang/crates.io-index)" = "c0b59b6b4b44d867f1370ef1bd91bfb262bf07bf0ae65c202ea2fbc16153b693"
"checksum thread_local 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)" = "279ef31c19ededf577bfd12dfae728040a21f635b06a24cd670ff510edd38963"
"checksum ucd-util 0.1.1 (registry+https://github.com/rust-lang/crates.io-index)" = "fd2be2d6639d0f8fe6cdda291ad456e23629558d466e2789d2c3e9892bda285d"
"checksum unicode-width 0.1.4 (registry+https://github.com/rust-lang/crates.io-index)" = "bf3a113775714a22dcb774d8ea3655c53a32debae63a063acc00a91cc586245f"
"checksum unreachable 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "382810877fe448991dfc7f0dd6e3ae5d58088fd0ea5e35189655f84e6814fa56"
"checksum utf8-ranges 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "662fab6525a98beff2921d7f61a39e7d59e0b425ebc7d0d9e66d316e55124122"

View File

@@ -1,6 +1,6 @@
[package]
name = "ripgrep"
version = "0.8.0" #:version
version = "0.8.1" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Line oriented search tool using Rust's regex library. Combines the raw
@@ -45,7 +45,7 @@ log = "0.4"
memchr = "2"
memmap = "0.6"
num_cpus = "1"
regex = "0.2.4"
regex = "0.2.10"
same-file = "1"
termcolor = { version = "0.3.4", path = "termcolor" }
@@ -67,12 +67,13 @@ default-features = false
features = ["suggestions", "color"]
[features]
avx-accel = ["bytecount/avx-accel"]
avx-accel = ["bytecount/avx-accel", "regex/unstable"]
simd-accel = [
"bytecount/simd-accel",
"regex/simd-accel",
"encoding_rs/simd-accel",
"regex/unstable",
]
unstable = ["regex/unstable"]
[profile.release]
debug = true

67
FAQ.md
View File

@@ -20,7 +20,9 @@
* [How do I create an alias for ripgrep on Windows?](#rg-alias-windows)
* [How do I create a PowerShell profile?](#powershell-profile)
* [How do I pipe non-ASCII content to ripgrep on Windows?](#pipe-non-ascii-windows)
* [How is ripgrep licensed?](#license)
* [Can ripgrep replace grep?](#posix4ever)
* [What does the "rip" in ripgrep mean?](#intentcountsforsomething)
<h3 name="config">
@@ -469,6 +471,34 @@ will also reset when PowerShell is restarted, so you can add that line
to your profile as well if you want to make the setting permanent.
<h3 name="license">
How is ripgrep licensed?
</h3>
ripgrep is dual licensed under the
[Unlicense](https://unlicense.org/)
and MIT licenses. Specifically, you may use ripgrep under the terms of either
license.
The reason why ripgrep is dual licensed this way is two-fold:
1. I, as ripgrep's author, would like to participate in a small bit of
ideological activism by promoting the Unlicense's goal: to disclaim
copyright monopoly interest.
2. I, as ripgrep's author, would like as many people to use rigprep as
possible. Since the Unlicense is not a proven or well known license, ripgrep
is also offered under the MIT license, which is ubiquitous and accepted by
almost everyone.
More specifically, ripgrep and all its dependencies are compatible with this
licensing choice. In particular, ripgrep's dependencies (direct and transitive)
will always be limited to permissive licenses. That is, ripgrep will never
depend on code that is not permissively licensed. This means rejecting any
dependency that uses a copyleft license such as the GPL, LGPL, MPL or any of
the Creative Commons ShareAlike licenses. Whether the license is "weak"
copyleft or not does not matter; ripgrep will **not** depend on it.
<h3 name="posix4ever">
Can ripgrep replace grep?
</h3>
@@ -530,3 +560,40 @@ for the previous section apply.
* Is there a particular feature of grep you rely on that ripgrep either doesn't
have or never will have? If the former, file a bug report, maybe ripgrep can
do it! If the latter, well, then, just use grep.
<h3 name="intentcountsforsomething">
What does the "rip" in ripgrep mean?
</h3>
When I first started writing ripgrep, I called it `rep`, intending it to be a
shorter variant of `grep`. Soon after, I renamed it to `xrep` since `rep`
wasn't obvious enough of a name for my taste. And also because adding `x` to
anything always makes it better, right?
Before ripgrep's first public release, I decided that I didn't like `xrep`. I
thought it was slightly awkward to type, and despite my previous praise of the
letter `x`, I kind of thought it was pretty lame. Being someone who really
likes Rust, I wanted to call it "rustgrep" or maybe "rgrep" for short. But I
thought that was just as lame, and maybe a little too in-your-face. But I
wanted to continue using `r` so I could at least pretend Rust had something to
do with it.
I spent a couple of days trying to think of very short words that began with
the letter `r` that were even somewhat related to the task of searching. I
don't remember how it popped into my head, but "rip" came up as something that
meant "fast," as in, "to rip through your text." The fact that RIP is also
an initialism for "Rest in Peace" (as in, "ripgrep kills grep") never really
dawned on me. Perhaps the coincidence is too striking to believe that, but
I didn't realize it until someone explicitly pointed it out to me after the
initial public release. I admit that I found it mildly amusing, but if I had
realized it myself before the public release, I probably would have pressed on
and chose a different name. Alas, renaming things after a release is hard, so I
decided to mush on.
Given the fact that
[ripgrep never was, is or will be a 100% drop-in replacement for
grep](#posix4ever),
ripgrep is neither actually a "grep killer" nor was it ever intended to be. It
certainly does eat into some of its use cases, but that's nothing that other
tools like ack or The Silver Searcher weren't already doing.

View File

@@ -194,7 +194,7 @@ optimizations) by utilizing a custom tap:
```
$ brew tap burntsushi/ripgrep https://github.com/BurntSushi/ripgrep.git
$ brew install burntsushi/ripgrep/ripgrep-bin
$ brew install ripgrep-bin
```
If you're a **Windows Chocolatey** user, then you can install ripgrep from the [official repo](https://chocolatey.org/packages/ripgrep):
@@ -203,6 +203,12 @@ If you're a **Windows Chocolatey** user, then you can install ripgrep from the [
$ choco install ripgrep
```
If you're a **Windows Scoop** user, then you can install ripgrep from the [official bucket](https://github.com/lukesampson/scoop/blob/master/bucket/ripgrep.json):
```
$ scoop install ripgrep
```
If you're an **Arch Linux** user, then you can install ripgrep from the official repos:
```
@@ -249,18 +255,25 @@ then ripgrep can be installed using a binary `.deb` file provided in each
ripgrep is not in the official Debian or Ubuntu repositories.
```
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/0.8.0/ripgrep_0.8.0_amd64.deb
$ sudo dpkg -i ripgrep_0.8.0_amd64.deb
$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/0.8.1/ripgrep_0.8.1_amd64.deb
$ sudo dpkg -i ripgrep_0.8.1_amd64.deb
```
If you're an **Ubuntu** user, ripgrep can be installed from the `snap` store.
* Note that if you are using `16.04 LTS` or later, snap is already installed.
* For older versions you can install snap using
[this guide](https://docs.snapcraft.io/core/install-ubuntu).
[this guide](https://docs.snapcraft.io/core/install-ubuntu).
* If you get permission errors when running ripgrep after installation, try
uninstalling and then re-installing with the `--classic` flag.
For the latest stable release:
```
$ sudo snap install rg
```
For the most recent release candidate:
```
$ sudo snap install --candidate rg
```
If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
* Note that the minimum supported version of Rust for ripgrep is **1.20**,
@@ -273,6 +286,14 @@ If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
$ cargo install ripgrep
```
If you're using Rust nightly, then use
```
$ cargo install ripgrep --features unstable
```
to get SIMD optimizations.
ripgrep isn't currently in any other package repositories.
[I'd like to change that](https://github.com/BurntSushi/ripgrep/issues/10).
@@ -299,7 +320,8 @@ RUSTFLAGS="-C target-cpu=native" cargo build --release --features 'simd-accel av
```
If your machine doesn't support AVX instructions, then simply remove
`avx-accel` from the features list. Similarly for SIMD.
`avx-accel` from the features list. Similarly for SIMD (which corresponds
roughly to SSE instructions).
### Running tests

View File

@@ -50,8 +50,7 @@ test_script:
before_deploy:
# Generate artifacts for release
# TODO(burntsushi): How can we enable SSSE3 on Windows?
- cargo build --release
- cargo build --release --features unstable
- mkdir staging
- copy target\release\rg.exe staging
- ps: copy target\release\build\ripgrep-*\out\_rg.ps1 staging

View File

@@ -8,12 +8,7 @@ set -ex
# Generate artifacts for release
mk_artifacts() {
if is_ssse3_target; then
RUSTFLAGS="-C target-feature=+ssse3" \
cargo build --target "$TARGET" --release --features simd-accel
else
cargo build --target "$TARGET" --release
fi
cargo build --target "$TARGET" --release --features unstable
}
mk_tarball() {

View File

@@ -27,7 +27,7 @@ install_osx_dependencies() {
return
fi
brew install asciidoc
brew install asciidoc docbook-xsl
}
configure_cargo() {

View File

@@ -25,8 +25,10 @@ _rg() {
'*--colors=[specify color settings and styles]: :->colorspec'
'--column[show column numbers]'
'(-A -B -C --after-context --before-context --context)'{-C+,--context=}'[specify lines to show before and after each match]:number of lines'
'(-b --byte-offset)'{-b,--byte-offset}'[print the 0-based byte offset for each matching line]'
'--context-separator=[specify string used to separate non-continuous context lines in output]:separator'
'(-c --count --passthrough --passthru)'{-c,--count}'[only show count of matches for each file]'
'(-c --count --count-matches --passthrough --passthru)'{-c,--count}'[only show count of matching lines for each file]'
'(--count-matches -c --count --passthrough --passthru)--count-matches[only show count of individual matches for each file]'
'--debug[show debug messages]'
'--dfa-size-limit=[specify upper size limit of generated DFA]:DFA size'
'(-E --encoding)'{-E+,--encoding=}'[specify text encoding of files to search]: :_rg_encodings'
@@ -89,6 +91,7 @@ _rg() {
'(-e -f --file --files --regexp --type-list)1: :_rg_pattern'
'(--type-list)*:file:_files'
'(-z --search-zip)'{-z,--search-zip}'[search in compressed files]'
"(--stats)--stats[print stats about this search]"
)
[[ ${_RG_COMPLETE_LIST_ARGS:-} == (1|t*|y*) ]] && {

View File

@@ -12,7 +12,7 @@ Synopsis
*rg* [_OPTIONS_] *-e* _PATTERN_... [_PATH_...]
*rg* [_OPTIONS_] *-f* _PATH_... [_PATH_...]
*rg* [_OPTIONS_] *-f* _PATTERNFILE_... [_PATH_...]
*rg* [_OPTIONS_] *--files* [_PATH_...]

View File

@@ -23,7 +23,7 @@ aho-corasick = "0.6.0"
fnv = "1.0"
log = "0.4"
memchr = "2"
regex = "0.2.1"
regex = "0.2.9"
[dev-dependencies]
glob = "0.2"

View File

@@ -20,7 +20,7 @@ Add this to your `Cargo.toml`:
```toml
[dependencies]
globset = "0.2"
globset = "0.3"
```
and this to your crate root:

View File

@@ -187,13 +187,26 @@ pub struct GlobBuilder<'a> {
opts: GlobOptions,
}
#[derive(Clone, Copy, Debug, Default, Eq, Hash, PartialEq)]
#[derive(Clone, Copy, Debug, Eq, Hash, PartialEq)]
struct GlobOptions {
/// Whether to match case insensitively.
case_insensitive: bool,
/// Whether to require a literal separator to match a separator in a file
/// path. e.g., when enabled, `*` won't match `/`.
literal_separator: bool,
/// Whether or not to use `\` to escape special characters.
/// e.g., when enabled, `\*` will match a literal `*`.
backslash_escape: bool,
}
impl GlobOptions {
fn default() -> GlobOptions {
GlobOptions {
case_insensitive: false,
literal_separator: false,
backslash_escape: !is_separator('\\'),
}
}
}
#[derive(Clone, Debug, Default, Eq, PartialEq)]
@@ -549,6 +562,7 @@ impl<'a> GlobBuilder<'a> {
chars: self.glob.chars().peekable(),
prev: None,
cur: None,
opts: &self.opts,
};
p.parse()?;
if p.stack.is_empty() {
@@ -585,6 +599,19 @@ impl<'a> GlobBuilder<'a> {
self.opts.literal_separator = yes;
self
}
/// When enabled, a back slash (`\`) may be used to escape
/// special characters in a glob pattern. Additionally, this will
/// prevent `\` from being interpreted as a path separator on all
/// platforms.
///
/// This is enabled by default on platforms where `\` is not a
/// path separator and disabled by default on platforms where `\`
/// is a path separator.
pub fn backslash_escape(&mut self, yes: bool) -> &mut GlobBuilder<'a> {
self.opts.backslash_escape = yes;
self
}
}
impl Tokens {
@@ -710,6 +737,7 @@ struct Parser<'a> {
chars: iter::Peekable<str::Chars<'a>>,
prev: Option<char>,
cur: Option<char>,
opts: &'a GlobOptions,
}
impl<'a> Parser<'a> {
@@ -726,14 +754,8 @@ impl<'a> Parser<'a> {
'{' => self.push_alternate()?,
'}' => self.pop_alternate()?,
',' => self.parse_comma()?,
c => {
if is_separator(c) {
// Normalize all patterns to use / as a separator.
self.push_token(Token::Literal('/'))?
} else {
self.push_token(Token::Literal(c))?
}
}
'\\' => self.parse_backslash()?,
c => self.push_token(Token::Literal(c))?,
}
}
Ok(())
@@ -786,6 +808,20 @@ impl<'a> Parser<'a> {
}
}
fn parse_backslash(&mut self) -> Result<(), Error> {
if self.opts.backslash_escape {
match self.bump() {
None => Err(self.error(ErrorKind::DanglingEscape)),
Some(c) => self.push_token(Token::Literal(c)),
}
} else if is_separator('\\') {
// Normalize all patterns to use / as a separator.
self.push_token(Token::Literal('/'))
} else {
self.push_token(Token::Literal('\\'))
}
}
fn parse_star(&mut self) -> Result<(), Error> {
let prev = self.prev;
if self.chars.peek() != Some(&'*') {
@@ -933,8 +969,9 @@ mod tests {
#[derive(Clone, Copy, Debug, Default)]
struct Options {
casei: bool,
litsep: bool,
casei: Option<bool>,
litsep: Option<bool>,
bsesc: Option<bool>,
}
macro_rules! syntax {
@@ -964,11 +1001,17 @@ mod tests {
($name:ident, $pat:expr, $re:expr, $options:expr) => {
#[test]
fn $name() {
let pat = GlobBuilder::new($pat)
.case_insensitive($options.casei)
.literal_separator($options.litsep)
.build()
.unwrap();
let mut builder = GlobBuilder::new($pat);
if let Some(casei) = $options.casei {
builder.case_insensitive(casei);
}
if let Some(litsep) = $options.litsep {
builder.literal_separator(litsep);
}
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
let pat = builder.build().unwrap();
assert_eq!(format!("(?-u){}", $re), pat.regex());
}
};
@@ -981,11 +1024,17 @@ mod tests {
($name:ident, $pat:expr, $path:expr, $options:expr) => {
#[test]
fn $name() {
let pat = GlobBuilder::new($pat)
.case_insensitive($options.casei)
.literal_separator($options.litsep)
.build()
.unwrap();
let mut builder = GlobBuilder::new($pat);
if let Some(casei) = $options.casei {
builder.case_insensitive(casei);
}
if let Some(litsep) = $options.litsep {
builder.literal_separator(litsep);
}
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
let pat = builder.build().unwrap();
let matcher = pat.compile_matcher();
let strategic = pat.compile_strategic_matcher();
let set = GlobSetBuilder::new().add(pat).build().unwrap();
@@ -1003,11 +1052,17 @@ mod tests {
($name:ident, $pat:expr, $path:expr, $options:expr) => {
#[test]
fn $name() {
let pat = GlobBuilder::new($pat)
.case_insensitive($options.casei)
.literal_separator($options.litsep)
.build()
.unwrap();
let mut builder = GlobBuilder::new($pat);
if let Some(casei) = $options.casei {
builder.case_insensitive(casei);
}
if let Some(litsep) = $options.litsep {
builder.literal_separator(litsep);
}
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
let pat = builder.build().unwrap();
let matcher = pat.compile_matcher();
let strategic = pat.compile_strategic_matcher();
let set = GlobSetBuilder::new().add(pat).build().unwrap();
@@ -1091,12 +1146,24 @@ mod tests {
syntaxerr!(err_range2, "[z--]", ErrorKind::InvalidRange('z', '-'));
const CASEI: Options = Options {
casei: true,
litsep: false,
casei: Some(true),
litsep: None,
bsesc: None,
};
const SLASHLIT: Options = Options {
casei: false,
litsep: true,
casei: None,
litsep: Some(true),
bsesc: None,
};
const NOBSESC: Options = Options {
casei: None,
litsep: None,
bsesc: Some(false),
};
const BSESC: Options = Options {
casei: None,
litsep: None,
bsesc: Some(true),
};
toregex!(re_casei, "a", "(?i)^a$", &CASEI);
@@ -1209,6 +1276,17 @@ mod tests {
#[cfg(not(unix))]
matches!(matchslash5, "abc\\def", "abc/def", SLASHLIT);
matches!(matchbackslash1, "\\[", "[", BSESC);
matches!(matchbackslash2, "\\?", "?", BSESC);
matches!(matchbackslash3, "\\*", "*", BSESC);
matches!(matchbackslash4, "\\[a-z]", "\\a", NOBSESC);
matches!(matchbackslash5, "\\?", "\\a", NOBSESC);
matches!(matchbackslash6, "\\*", "\\\\", NOBSESC);
#[cfg(unix)]
matches!(matchbackslash7, "\\a", "a");
#[cfg(not(unix))]
matches!(matchbackslash8, "\\a", "/a");
nmatches!(matchnot1, "a*b*c", "abcd");
nmatches!(matchnot2, "abc*abc*abc", "abcabcabcabcabcabcabca");
nmatches!(matchnot3, "some/**/needle.txt", "some/other/notthis.txt");
@@ -1253,13 +1331,20 @@ mod tests {
($which:ident, $name:ident, $pat:expr, $expect:expr) => {
extract!($which, $name, $pat, $expect, Options::default());
};
($which:ident, $name:ident, $pat:expr, $expect:expr, $opts:expr) => {
($which:ident, $name:ident, $pat:expr, $expect:expr, $options:expr) => {
#[test]
fn $name() {
let pat = GlobBuilder::new($pat)
.case_insensitive($opts.casei)
.literal_separator($opts.litsep)
.build().unwrap();
let mut builder = GlobBuilder::new($pat);
if let Some(casei) = $options.casei {
builder.case_insensitive(casei);
}
if let Some(litsep) = $options.litsep {
builder.literal_separator(litsep);
}
if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc);
}
let pat = builder.build().unwrap();
assert_eq!($expect, pat.$which());
}
};

View File

@@ -91,6 +91,11 @@ Standard Unix-style glob syntax is supported:
`[!ab]` to match any character except for `a` and `b`.
* Metacharacters such as `*` and `?` can be escaped with character class
notation. e.g., `[*]` matches `*`.
* When backslash escapes are enabled, a backslash (`\`) will escape all meta
characters in a glob. If it precedes a non-meta character, then the slash is
ignored. A `\\` will match a literal `\\`. Note that this mode is only
enabled on Unix platforms by default, but can be enabled on any platform
via the `backslash_escape` setting on `Glob`.
A `GlobBuilder` can be used to prevent wildcards from matching path separators,
or to enable case insensitive matching.
@@ -154,8 +159,17 @@ pub enum ErrorKind {
/// Occurs when an alternating group is nested inside another alternating
/// group, e.g., `{{a,b},{c,d}}`.
NestedAlternates,
/// Occurs when an unescaped '\' is found at the end of a glob.
DanglingEscape,
/// An error associated with parsing or compiling a regex.
Regex(String),
/// Hints that destructuring should not be exhaustive.
///
/// This enum may grow additional variants, so this makes sure clients
/// don't count on exhaustive matching. (Otherwise, adding a new variant
/// could break existing code.)
#[doc(hidden)]
__Nonexhaustive,
}
impl StdError for Error {
@@ -199,7 +213,11 @@ impl ErrorKind {
ErrorKind::NestedAlternates => {
"nested alternate groups are not allowed"
}
ErrorKind::DanglingEscape => {
"dangling '\\'"
}
ErrorKind::Regex(ref err) => err,
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}
@@ -223,12 +241,14 @@ impl fmt::Display for ErrorKind {
| ErrorKind::UnopenedAlternates
| ErrorKind::UnclosedAlternates
| ErrorKind::NestedAlternates
| ErrorKind::DanglingEscape
| ErrorKind::Regex(_) => {
write!(f, "{}", self.description())
}
ErrorKind::InvalidRange(s, e) => {
write!(f, "invalid range; '{}' > '{}'", s, e)
}
ErrorKind::__Nonexhaustive => unreachable!(),
}
}
}

View File

@@ -15,5 +15,5 @@ license = "Unlicense/MIT"
[dependencies]
log = "0.4"
memchr = "2"
regex = "0.2.1"
regex-syntax = "0.4.0"
regex = "0.2.9"
regex-syntax = "0.5.3"

View File

@@ -19,6 +19,7 @@ pub use search::{Grep, GrepBuilder, Iter, Match};
mod literals;
mod nonl;
mod search;
mod smart_case;
mod word_boundary;
/// Result is a convenient type alias that fixes the type of the error to

View File

@@ -10,10 +10,8 @@ principled.
use std::cmp;
use regex::bytes::RegexBuilder;
use syntax::{
Expr, Literals, Lit,
ByteClass, ByteRange, CharClass, ClassRange, Repeater,
};
use syntax::hir::{self, Hir, HirKind};
use syntax::hir::literal::{Literal, Literals};
#[derive(Clone, Debug)]
pub struct LiteralSets {
@@ -23,12 +21,12 @@ pub struct LiteralSets {
}
impl LiteralSets {
pub fn create(expr: &Expr) -> Self {
pub fn create(expr: &Hir) -> Self {
let mut required = Literals::empty();
union_required(expr, &mut required);
LiteralSets {
prefixes: expr.prefixes(),
suffixes: expr.suffixes(),
prefixes: Literals::prefixes(expr),
suffixes: Literals::suffixes(expr),
required: required,
}
}
@@ -93,60 +91,52 @@ impl LiteralSets {
}
}
fn union_required(expr: &Expr, lits: &mut Literals) {
use syntax::Expr::*;
match *expr {
Literal { ref chars, casei: false } => {
let s: String = chars.iter().cloned().collect();
lits.cross_add(s.as_bytes());
fn union_required(expr: &Hir, lits: &mut Literals) {
match *expr.kind() {
HirKind::Literal(hir::Literal::Unicode(c)) => {
let mut buf = [0u8; 4];
lits.cross_add(c.encode_utf8(&mut buf).as_bytes());
}
Literal { ref chars, casei: true } => {
for &c in chars {
let cls = CharClass::new(vec![
ClassRange { start: c, end: c },
]).case_fold();
if !lits.add_char_class(&cls) {
HirKind::Literal(hir::Literal::Byte(b)) => {
lits.cross_add(&[b]);
}
HirKind::Class(hir::Class::Unicode(ref cls)) => {
if count_unicode_class(cls) >= 5 || !lits.add_char_class(cls) {
lits.cut();
}
}
HirKind::Class(hir::Class::Bytes(ref cls)) => {
if count_byte_class(cls) >= 5 || !lits.add_byte_class(cls) {
lits.cut();
}
}
HirKind::Group(hir::Group { ref hir, .. }) => {
union_required(&**hir, lits);
}
HirKind::Repetition(ref x) => {
match x.kind {
hir::RepetitionKind::ZeroOrOne => lits.cut(),
hir::RepetitionKind::ZeroOrMore => lits.cut(),
hir::RepetitionKind::OneOrMore => {
union_required(&x.hir, lits);
lits.cut();
return;
}
hir::RepetitionKind::Range(ref rng) => {
let (min, max) = match *rng {
hir::RepetitionRange::Exactly(m) => (m, Some(m)),
hir::RepetitionRange::AtLeast(m) => (m, None),
hir::RepetitionRange::Bounded(m, n) => (m, Some(n)),
};
repeat_range_literals(
&x.hir, min, max, x.greedy, lits, union_required);
}
}
}
LiteralBytes { ref bytes, casei: false } => {
lits.cross_add(bytes);
HirKind::Concat(ref es) if es.is_empty() => {}
HirKind::Concat(ref es) if es.len() == 1 => {
union_required(&es[0], lits)
}
LiteralBytes { ref bytes, casei: true } => {
for &b in bytes {
let cls = ByteClass::new(vec![
ByteRange { start: b, end: b },
]).case_fold();
if !lits.add_byte_class(&cls) {
lits.cut();
return;
}
}
}
Class(_) => {
lits.cut();
}
ClassBytes(_) => {
lits.cut();
}
Group { ref e, .. } => {
union_required(&**e, lits);
}
Repeat { r: Repeater::ZeroOrOne, .. } => lits.cut(),
Repeat { r: Repeater::ZeroOrMore, .. } => lits.cut(),
Repeat { ref e, r: Repeater::OneOrMore, .. } => {
union_required(&**e, lits);
lits.cut();
}
Repeat { ref e, r: Repeater::Range { min, max }, greedy } => {
repeat_range_literals(
&**e, min, max, greedy, lits, union_required);
}
Concat(ref es) if es.is_empty() => {}
Concat(ref es) if es.len() == 1 => union_required(&es[0], lits),
Concat(ref es) => {
HirKind::Concat(ref es) => {
for e in es {
let mut lits2 = lits.to_empty();
union_required(e, &mut lits2);
@@ -157,7 +147,6 @@ fn union_required(expr: &Expr, lits: &mut Literals) {
if lits2.contains_empty() {
lits.cut();
}
// if !lits.union(lits2) {
if !lits.cross_product(&lits2) {
// If this expression couldn't yield any literal that
// could be extended, then we need to quit. Since we're
@@ -167,15 +156,15 @@ fn union_required(expr: &Expr, lits: &mut Literals) {
}
}
}
Alternate(ref es) => {
HirKind::Alternation(ref es) => {
alternate_literals(es, lits, union_required);
}
_ => lits.cut(),
}
}
fn repeat_range_literals<F: FnMut(&Expr, &mut Literals)>(
e: &Expr,
fn repeat_range_literals<F: FnMut(&Hir, &mut Literals)>(
e: &Hir,
min: u32,
max: Option<u32>,
_greedy: bool,
@@ -204,8 +193,8 @@ fn repeat_range_literals<F: FnMut(&Expr, &mut Literals)>(
}
}
fn alternate_literals<F: FnMut(&Expr, &mut Literals)>(
es: &[Expr],
fn alternate_literals<F: FnMut(&Hir, &mut Literals)>(
es: &[Hir],
lits: &mut Literals,
mut f: F,
) {
@@ -234,11 +223,21 @@ fn alternate_literals<F: FnMut(&Expr, &mut Literals)>(
}
lits.cut();
if !lcs.is_empty() {
lits.add(Lit::empty());
lits.add(Lit::new(lcs.to_vec()));
lits.add(Literal::empty());
lits.add(Literal::new(lcs.to_vec()));
}
}
/// Return the number of characters in the given class.
fn count_unicode_class(cls: &hir::ClassUnicode) -> u32 {
cls.iter().map(|r| 1 + (r.end() as u32 - r.start() as u32)).sum()
}
/// Return the number of bytes in the given class.
fn count_byte_class(cls: &hir::ClassBytes) -> u32 {
cls.iter().map(|r| 1 + (r.end() as u32 - r.start() as u32)).sum()
}
/// Converts an arbitrary sequence of bytes to a literal suitable for building
/// a regular expression.
fn bytes_to_regex(bs: &[u8]) -> String {

View File

@@ -1,4 +1,4 @@
use syntax::Expr;
use syntax::hir::{self, Hir, HirKind};
use {Error, Result};
@@ -9,59 +9,66 @@ use {Error, Result};
///
/// If `byte` is not an ASCII character (i.e., greater than `0x7F`), then this
/// function panics.
pub fn remove(expr: Expr, byte: u8) -> Result<Expr> {
// TODO(burntsushi): There is a bug in this routine where only `\n` is
// handled correctly. Namely, `AnyChar` and `AnyByte` need to be translated
// to proper character classes instead of the special `AnyCharNoNL` and
// `AnyByteNoNL` classes.
use syntax::Expr::*;
pub fn remove(expr: Hir, byte: u8) -> Result<Hir> {
assert!(byte <= 0x7F);
let chr = byte as char;
assert!(chr.len_utf8() == 1);
Ok(match expr {
Literal { chars, casei } => {
if chars.iter().position(|&c| c == chr).is_some() {
Ok(match expr.into_kind() {
HirKind::Empty => Hir::empty(),
HirKind::Literal(hir::Literal::Unicode(c)) => {
if c == chr {
return Err(Error::LiteralNotAllowed(chr));
}
Literal { chars: chars, casei: casei }
Hir::literal(hir::Literal::Unicode(c))
}
LiteralBytes { bytes, casei } => {
if bytes.iter().position(|&b| b == byte).is_some() {
HirKind::Literal(hir::Literal::Byte(b)) => {
if b as char == chr {
return Err(Error::LiteralNotAllowed(chr));
}
LiteralBytes { bytes: bytes, casei: casei }
Hir::literal(hir::Literal::Byte(b))
}
AnyChar => AnyCharNoNL,
AnyByte => AnyByteNoNL,
Class(mut cls) => {
cls.remove(chr);
Class(cls)
}
ClassBytes(mut cls) => {
cls.remove(byte);
ClassBytes(cls)
}
Group { e, i, name } => {
Group {
e: Box::new(remove(*e, byte)?),
i: i,
name: name,
HirKind::Class(hir::Class::Unicode(mut cls)) => {
let remove = hir::ClassUnicode::new(Some(
hir::ClassUnicodeRange::new(chr, chr),
));
cls.difference(&remove);
if cls.iter().next().is_none() {
return Err(Error::LiteralNotAllowed(chr));
}
Hir::class(hir::Class::Unicode(cls))
}
Repeat { e, r, greedy } => {
Repeat {
e: Box::new(remove(*e, byte)?),
r: r,
greedy: greedy,
HirKind::Class(hir::Class::Bytes(mut cls)) => {
let remove = hir::ClassBytes::new(Some(
hir::ClassBytesRange::new(byte, byte),
));
cls.difference(&remove);
if cls.iter().next().is_none() {
return Err(Error::LiteralNotAllowed(chr));
}
Hir::class(hir::Class::Bytes(cls))
}
Concat(exprs) => {
Concat(exprs.into_iter().map(|e| remove(e, byte)).collect::<Result<Vec<Expr>>>()?)
HirKind::Anchor(x) => Hir::anchor(x),
HirKind::WordBoundary(x) => Hir::word_boundary(x),
HirKind::Repetition(mut x) => {
x.hir = Box::new(remove(*x.hir, byte)?);
Hir::repetition(x)
}
Alternate(exprs) => {
Alternate(exprs.into_iter().map(|e| remove(e, byte)).collect::<Result<Vec<Expr>>>()?)
HirKind::Group(mut x) => {
x.hir = Box::new(remove(*x.hir, byte)?);
Hir::group(x)
}
HirKind::Concat(xs) => {
let xs = xs.into_iter()
.map(|e| remove(e, byte))
.collect::<Result<Vec<Hir>>>()?;
Hir::concat(xs)
}
HirKind::Alternation(xs) => {
let xs = xs.into_iter()
.map(|e| remove(e, byte))
.collect::<Result<Vec<Hir>>>()?;
Hir::alternation(xs)
}
e => e,
})
}

View File

@@ -1,10 +1,11 @@
use memchr::{memchr, memrchr};
use syntax::ParserBuilder;
use syntax::hir::Hir;
use regex::bytes::{Regex, RegexBuilder};
use syntax;
use literals::LiteralSets;
use nonl;
use syntax::Expr;
use smart_case::Cased;
use word_boundary::strip_unicode_word_boundaries;
use Result;
@@ -166,7 +167,7 @@ impl GrepBuilder {
/// Creates a new regex from the given expression with the current
/// configuration.
fn regex(&self, expr: &Expr) -> Result<Regex> {
fn regex(&self, expr: &Hir) -> Result<Regex> {
let mut builder = RegexBuilder::new(&expr.to_string());
builder.unicode(true);
self.regex_build(builder)
@@ -184,15 +185,16 @@ impl GrepBuilder {
/// Parses the underlying pattern and ensures the pattern can never match
/// the line terminator.
fn parse(&self) -> Result<syntax::Expr> {
let expr =
syntax::ExprBuilder::new()
.allow_bytes(true)
.unicode(true)
fn parse(&self) -> Result<Hir> {
let expr = ParserBuilder::new()
.allow_invalid_utf8(true)
.case_insensitive(self.is_case_insensitive()?)
.multi_line(true)
.build()
.parse(&self.pattern)?;
debug!("original regex HIR pattern:\n{}", expr);
let expr = nonl::remove(expr, self.opts.line_terminator)?;
debug!("regex ast:\n{:#?}", expr);
debug!("transformed regex HIR pattern:\n{}", expr);
Ok(expr)
}
@@ -204,7 +206,11 @@ impl GrepBuilder {
if !self.opts.case_smart {
return Ok(false);
}
Ok(!has_uppercase_literal(&self.pattern))
let cased = match Cased::from_pattern(&self.pattern) {
None => return Ok(false),
Some(cased) => cased,
};
Ok(cased.any_literal && !cased.any_uppercase)
}
}
@@ -310,44 +316,15 @@ impl<'b, 's> Iterator for Iter<'b, 's> {
}
}
/// Determine whether the pattern contains an uppercase character which should
/// negate the effect of the smart-case option.
///
/// Ideally we would be able to check the AST in order to correctly handle
/// things like '\p{Ll}' and '\p{Lu}' (which should be treated as explicitly
/// cased), but we don't currently have that option. For now, our 'good enough'
/// solution is to simply perform a semi-naïve scan of the input pattern and
/// ignore all characters following a '\'. The ExprBuilder will handle any
/// actual errors, and this at least lets us support the most common cases,
/// like 'foo\w' and 'foo\S', in an intuitive manner.
fn has_uppercase_literal(pattern: &str) -> bool {
let mut chars = pattern.chars();
while let Some(c) = chars.next() {
if c == '\\' {
chars.next();
} else if c.is_uppercase() {
return true;
}
}
false
}
#[cfg(test)]
mod tests {
#![allow(unused_imports)]
use memchr::{memchr, memrchr};
use regex::bytes::Regex;
use super::{GrepBuilder, Match, has_uppercase_literal};
use super::{GrepBuilder, Match};
static SHERLOCK: &'static [u8] = include_bytes!("./data/sherlock.txt");
#[allow(dead_code)]
fn s(bytes: &[u8]) -> String {
String::from_utf8(bytes.to_vec()).unwrap()
}
fn find_lines(pat: &str, haystack: &[u8]) -> Vec<Match> {
let re = Regex::new(pat).unwrap();
let mut lines = vec![];
@@ -376,20 +353,4 @@ mod tests {
assert_eq!(expected.len(), got.len());
assert_eq!(expected, got);
}
#[test]
fn pattern_case() {
assert_eq!(has_uppercase_literal(&"".to_string()), false);
assert_eq!(has_uppercase_literal(&"foo".to_string()), false);
assert_eq!(has_uppercase_literal(&"Foo".to_string()), true);
assert_eq!(has_uppercase_literal(&"foO".to_string()), true);
assert_eq!(has_uppercase_literal(&"foo\\\\".to_string()), false);
assert_eq!(has_uppercase_literal(&"foo\\w".to_string()), false);
assert_eq!(has_uppercase_literal(&"foo\\S".to_string()), false);
assert_eq!(has_uppercase_literal(&"foo\\p{Ll}".to_string()), true);
assert_eq!(has_uppercase_literal(&"foo[a-z]".to_string()), false);
assert_eq!(has_uppercase_literal(&"foo[A-Z]".to_string()), true);
assert_eq!(has_uppercase_literal(&"foo[\\S\\t]".to_string()), false);
assert_eq!(has_uppercase_literal(&"foo\\\\S".to_string()), true);
}
}

191
grep/src/smart_case.rs Normal file
View File

@@ -0,0 +1,191 @@
use syntax::ast::{self, Ast};
use syntax::ast::parse::Parser;
/// The results of analyzing a regex for cased literals.
#[derive(Clone, Debug, Default)]
pub struct Cased {
/// True if and only if a literal uppercase character occurs in the regex.
///
/// A regex like `\pL` contains no uppercase literals, even though `L`
/// is uppercase and the `\pL` class contains uppercase characters.
pub any_uppercase: bool,
/// True if and only if the regex contains any literal at all. A regex like
/// `\pL` has this set to false.
pub any_literal: bool,
}
impl Cased {
/// Returns a `Cased` value by doing analysis on the AST of `pattern`.
///
/// If `pattern` is not a valid regular expression, then `None` is
/// returned.
pub fn from_pattern(pattern: &str) -> Option<Cased> {
Parser::new()
.parse(pattern)
.map(|ast| Cased::from_ast(&ast))
.ok()
}
fn from_ast(ast: &Ast) -> Cased {
let mut cased = Cased::default();
cased.from_ast_impl(ast);
cased
}
fn from_ast_impl(&mut self, ast: &Ast) {
if self.done() {
return;
}
match *ast {
Ast::Empty(_)
| Ast::Flags(_)
| Ast::Dot(_)
| Ast::Assertion(_)
| Ast::Class(ast::Class::Unicode(_))
| Ast::Class(ast::Class::Perl(_)) => {}
Ast::Literal(ref x) => {
self.from_ast_literal(x);
}
Ast::Class(ast::Class::Bracketed(ref x)) => {
self.from_ast_class_set(&x.kind);
}
Ast::Repetition(ref x) => {
self.from_ast_impl(&x.ast);
}
Ast::Group(ref x) => {
self.from_ast_impl(&x.ast);
}
Ast::Alternation(ref alt) => {
for x in &alt.asts {
self.from_ast_impl(x);
}
}
Ast::Concat(ref alt) => {
for x in &alt.asts {
self.from_ast_impl(x);
}
}
}
}
fn from_ast_class_set(&mut self, ast: &ast::ClassSet) {
if self.done() {
return;
}
match *ast {
ast::ClassSet::Item(ref item) => {
self.from_ast_class_set_item(item);
}
ast::ClassSet::BinaryOp(ref x) => {
self.from_ast_class_set(&x.lhs);
self.from_ast_class_set(&x.rhs);
}
}
}
fn from_ast_class_set_item(&mut self, ast: &ast::ClassSetItem) {
if self.done() {
return;
}
match *ast {
ast::ClassSetItem::Empty(_)
| ast::ClassSetItem::Ascii(_)
| ast::ClassSetItem::Unicode(_)
| ast::ClassSetItem::Perl(_) => {}
ast::ClassSetItem::Literal(ref x) => {
self.from_ast_literal(x);
}
ast::ClassSetItem::Range(ref x) => {
self.from_ast_literal(&x.start);
self.from_ast_literal(&x.end);
}
ast::ClassSetItem::Bracketed(ref x) => {
self.from_ast_class_set(&x.kind);
}
ast::ClassSetItem::Union(ref union) => {
for x in &union.items {
self.from_ast_class_set_item(x);
}
}
}
}
fn from_ast_literal(&mut self, ast: &ast::Literal) {
self.any_literal = true;
self.any_uppercase = self.any_uppercase || ast.c.is_uppercase();
}
/// Returns true if and only if the attributes can never change no matter
/// what other AST it might see.
fn done(&self) -> bool {
self.any_uppercase && self.any_literal
}
}
#[cfg(test)]
mod tests {
use super::*;
fn cased(pattern: &str) -> Cased {
Cased::from_pattern(pattern).unwrap()
}
#[test]
fn various() {
let x = cased("");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
let x = cased("foo");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased("Foo");
assert!(x.any_uppercase);
assert!(x.any_literal);
let x = cased("foO");
assert!(x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo\\");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo\w");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo\S");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo\p{Ll}");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo[a-z]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo[A-Z]");
assert!(x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo[\S\t]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"foo\\S");
assert!(x.any_uppercase);
assert!(x.any_literal);
let x = cased(r"\p{Ll}");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
let x = cased(r"aBc\w");
assert!(x.any_uppercase);
assert!(x.any_literal);
}
}

View File

@@ -1,4 +1,4 @@
use syntax::Expr;
use syntax::hir::{self, Hir, HirKind};
/// Strips Unicode word boundaries from the given expression.
///
@@ -8,7 +8,7 @@ use syntax::Expr;
/// false negatives.
///
/// If no word boundaries could be stripped, then None is returned.
pub fn strip_unicode_word_boundaries(expr: &Expr) -> Option<Expr> {
pub fn strip_unicode_word_boundaries(expr: &Hir) -> Option<Hir> {
// The real reason we do this is because Unicode word boundaries are the
// one thing that Rust's regex DFA engine can't handle. When it sees a
// Unicode word boundary among non-ASCII text, it falls back to one of the
@@ -16,23 +16,24 @@ pub fn strip_unicode_word_boundaries(expr: &Expr) -> Option<Expr> {
// a regex to find candidate matches without a Unicode word boundary. We'll
// only then use the full (and slower) regex to confirm a candidate as a
// match or not during search.
use syntax::Expr::*;
match *expr {
Concat(ref es) if !es.is_empty() => {
//
// It looks like we only check the outer edges for `\b`? I guess this is
// an attempt to optimize for the `-w/--word-regexp` flag? ---AG
match *expr.kind() {
HirKind::Concat(ref es) if !es.is_empty() => {
let first = is_unicode_word_boundary(&es[0]);
let last = is_unicode_word_boundary(es.last().unwrap());
// Be careful not to strip word boundaries if there are no other
// expressions to match.
match (first, last) {
(true, false) if es.len() > 1 => {
Some(Concat(es[1..].to_vec()))
Some(Hir::concat(es[1..].to_vec()))
}
(false, true) if es.len() > 1 => {
Some(Concat(es[..es.len() - 1].to_vec()))
Some(Hir::concat(es[..es.len() - 1].to_vec()))
}
(true, true) if es.len() > 2 => {
Some(Concat(es[1..es.len() - 1].to_vec()))
Some(Hir::concat(es[1..es.len() - 1].to_vec()))
}
_ => None,
}
@@ -42,13 +43,11 @@ pub fn strip_unicode_word_boundaries(expr: &Expr) -> Option<Expr> {
}
/// Returns true if the given expression is a Unicode word boundary.
fn is_unicode_word_boundary(expr: &Expr) -> bool {
use syntax::Expr::*;
match *expr {
WordBoundary => true,
NotWordBoundary => true,
Group { ref e, .. } => is_unicode_word_boundary(e),
fn is_unicode_word_boundary(expr: &Hir) -> bool {
match *expr.kind() {
HirKind::WordBoundary(hir::WordBoundary::Unicode) => true,
HirKind::WordBoundary(hir::WordBoundary::UnicodeNegate) => true,
HirKind::Group(ref x) => is_unicode_word_boundary(&x.hir),
_ => false,
}
}

View File

@@ -23,7 +23,7 @@ globset = { version = "0.3.0", path = "../globset" }
lazy_static = "1"
log = "0.4"
memchr = "2"
regex = "0.2.1"
regex = "0.2.9"
same-file = "1"
thread_local = "0.3.2"
walkdir = "2"

View File

@@ -477,6 +477,7 @@ impl GitignoreBuilder {
GlobBuilder::new(&glob.actual)
.literal_separator(literal_separator)
.case_insensitive(self.case_insensitive)
.backslash_escape(true)
.build()
.map_err(|err| {
Error::Glob {
@@ -635,6 +636,10 @@ mod tests {
ignored!(ig35, "./.", ".a/b", ".a/b");
ignored!(ig36, "././", ".a/b", ".a/b");
ignored!(ig37, "././.", ".a/b", ".a/b");
ignored!(ig38, ROOT, "\\[", "[");
ignored!(ig39, ROOT, "\\?", "?");
ignored!(ig40, ROOT, "\\*", "*");
ignored!(ig41, ROOT, "\\a", "a");
not_ignored!(ignot1, ROOT, "amonths", "months");
not_ignored!(ignot2, ROOT, "monthsa", "months");

View File

@@ -1,14 +1,14 @@
class RipgrepBin < Formula
version '0.8.0'
version '0.8.1'
desc "Recursively search directories for a regex pattern."
homepage "https://github.com/BurntSushi/ripgrep"
if OS.mac?
url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-apple-darwin.tar.gz"
sha256 "fb35e92fd57d28a1e68daf964764c3da7f027ad30cca7a07a1848224776f36b2"
sha256 "71f8d2907b473e5fc30159b822b0f1de247634ee292d5cc3fa1bb80222e0f613"
elsif OS.linux?
url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-unknown-linux-musl.tar.gz"
sha256 "621f2f16f65203aa37e7c10ecfb22384c4c01e70ebbd30a13c0d6944ffc6e59e"
sha256 "08b1aa1440a23a88c94cff41a860340ecf38e9108817aff30ff778c00c63eb76"
end
conflicts_with "ripgrep"

View File

@@ -1,5 +1,5 @@
name: ripgrep # you probably want to 'snapcraft register <name>'
version: '0.5.1' # just for humans, typically '1.2+git' or '1.3.2'
version: '0.8.1' # just for humans, typically '1.2+git' or '1.3.2'
summary: Fast file searcher # 79 char long summary
description: |
ripgrep combines the usability of The Silver Searcher with the raw speed of grep.
@@ -12,4 +12,3 @@ parts:
apps:
rg:
command: env PATH=$SNAP/bin:$PATH rg
aliases: [rg]

View File

@@ -31,7 +31,7 @@ Use -h for short descriptions and --help for more details.";
const USAGE: &str = "
rg [OPTIONS] PATTERN [PATH ...]
rg [OPTIONS] [-e PATTERN ...] [-f FILE ...] [PATH ...]
rg [OPTIONS] [-e PATTERN ...] [-f PATTERNFILE ...] [PATH ...]
rg [OPTIONS] --files [PATH ...]
rg [OPTIONS] --type-list";
@@ -509,6 +509,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
// Flags can be defined in any order, but we do it alphabetically.
flag_after_context(&mut args);
flag_before_context(&mut args);
flag_byte_offset(&mut args);
flag_case_sensitive(&mut args);
flag_color(&mut args);
flag_colors(&mut args);
@@ -516,6 +517,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_context(&mut args);
flag_context_separator(&mut args);
flag_count(&mut args);
flag_count_matches(&mut args);
flag_debug(&mut args);
flag_dfa_size_limit(&mut args);
flag_encoding(&mut args);
@@ -557,6 +559,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_search_zip(&mut args);
flag_smart_case(&mut args);
flag_sort_files(&mut args);
flag_stats(&mut args);
flag_text(&mut args);
flag_threads(&mut args);
flag_type(&mut args);
@@ -634,6 +637,19 @@ This overrides the --context flag.
args.push(arg);
}
fn flag_byte_offset(args: &mut Vec<RGArg>) {
const SHORT: &str =
"Print the 0-based byte offset for each matching line.";
const LONG: &str = long!("\
Print the 0-based byte offset within the input file
before each line of output. If -o (--only-matching) is
specified, print the offset of the matching part itself.
");
let arg = RGArg::switch("byte-offset").short("b")
.help(SHORT).long_help(LONG);
args.push(arg);
}
fn flag_case_sensitive(args: &mut Vec<RGArg>) {
const SHORT: &str = "Search case sensitively (default).";
const LONG: &str = long!("\
@@ -758,7 +774,7 @@ sequences like \\x7F or \\t may be used. The default value is --.
}
fn flag_count(args: &mut Vec<RGArg>) {
const SHORT: &str = "Only show the count of matches for each file.";
const SHORT: &str = "Only show the count of matching lines for each file.";
const LONG: &str = long!("\
This flag suppresses normal output and shows the number of lines that match
the given patterns for each file searched. Each file containing a match has its
@@ -768,9 +784,34 @@ that match and not the total number of matches.
If only one file is given to ripgrep, then only the count is printed if there
is a match. The --with-filename flag can be used to force printing the file
path in this case.
This overrides the --count-matches flag. Note that when --count is combined
with --only-matching, then ripgrep behaves as if --count-matches was given.
");
let arg = RGArg::switch("count").short("c")
.help(SHORT).long_help(LONG);
.help(SHORT).long_help(LONG).overrides("count-matches");
args.push(arg);
}
fn flag_count_matches(args: &mut Vec<RGArg>) {
const SHORT: &str =
"Only show the count of individual matches for each file.";
const LONG: &str = long!("\
This flag suppresses normal output and shows the number of individual
matches of the given patterns for each file searched. Each file
containing matches has its path and match count printed on each line.
Note that this reports the total number of individual matches and not
the number of lines that match.
If only one file is given to ripgrep, then only the count is printed if there
is a match. The --with-filename flag can be used to force printing the file
path in this case.
This overrides the --count flag. Note that when --count is combined with
--only-matching, then ripgrep behaves as if --count-matches was given.
");
let arg = RGArg::switch("count-matches")
.help(SHORT).long_help(LONG).overrides("count");
args.push(arg);
}
@@ -823,7 +864,7 @@ input lines, and the newline is not counted as part of the pattern.
A line is printed if and only if it matches at least one of the patterns.
");
let arg = RGArg::flag("file", "PATH").short("f")
let arg = RGArg::flag("file", "PATTERNFILE").short("f")
.help(SHORT).long_help(LONG)
.multiple()
.allow_leading_hyphen();
@@ -1002,10 +1043,10 @@ fn flag_ignore_file(args: &mut Vec<RGArg>) {
const SHORT: &str = "Specify additional ignore files.";
const LONG: &str = long!("\
Specifies a path to one or more .gitignore format rules files. These patterns
are applied after the patterns found in .gitignore and .ignore are applied
and are matched relative to the current working directory. Multiple additional
ignore files can be specified by using the --ignore-file flag several times.
When specifying multiple ignore files, earlier files have lower precedence
are applied after the patterns found in .gitignore and .ignore are applied
and are matched relative to the current working directory. Multiple additional
ignore files can be specified by using the --ignore-file flag several times.
When specifying multiple ignore files, earlier files have lower precedence
than later files.
If you are looking for a way to include or exclude files and directories
@@ -1448,6 +1489,25 @@ This flag can be disabled with --no-sort-files.
args.push(arg);
}
fn flag_stats(args: &mut Vec<RGArg>) {
const SHORT: &str = "Print statistics about this ripgrep search.";
const LONG: &str = long!("\
Print aggregate statistics about this ripgrep search. When this flag is
present, ripgrep will print the following stats to stdout at the end of the
search: number of matched lines, number of files with matches, number of files
searched, and the time taken for the entire search to complete.
This set of aggregate statistics may expand over time.
Note that this flag has no effect if --files, --files-with-matches or
--files-without-match is passed.");
let arg = RGArg::switch("stats")
.help(SHORT).long_help(LONG);
args.push(arg);
}
fn flag_text(args: &mut Vec<RGArg>) {
const SHORT: &str = "Search binary files as if they were text.";
const LONG: &str = long!("\

View File

@@ -9,7 +9,7 @@ use std::sync::atomic::{AtomicBool, Ordering};
use clap;
use encoding_rs::Encoding;
use grep::{Grep, GrepBuilder, Error as GrepError};
use grep::{Grep, GrepBuilder};
use log;
use num_cpus;
use regex;
@@ -35,11 +35,13 @@ pub struct Args {
paths: Vec<PathBuf>,
after_context: usize,
before_context: usize,
byte_offset: bool,
color_choice: termcolor::ColorChoice,
colors: ColorSpecs,
column: bool,
context_separator: Vec<u8>,
count: bool,
count_matches: bool,
encoding: Option<&'static Encoding>,
files_with_matches: bool,
files_without_matches: bool,
@@ -77,7 +79,8 @@ pub struct Args {
type_list: bool,
types: Types,
with_filename: bool,
search_zip_files: bool
search_zip_files: bool,
stats: bool
}
impl Args {
@@ -199,6 +202,7 @@ impl Args {
pub fn file_separator(&self) -> Option<Vec<u8>> {
let contextless =
self.count
|| self.count_matches
|| self.files_with_matches
|| self.files_without_matches;
let use_heading_sep = self.heading && !contextless;
@@ -218,6 +222,12 @@ impl Args {
self.max_count == Some(0)
}
/// Returns whether ripgrep should track stats for this run
pub fn stats(&self) -> bool {
self.stats
}
/// Create a new writer for single-threaded searching with color support.
pub fn stdout(&self) -> termcolor::StandardStream {
termcolor::StandardStream::stdout(self.color_choice)
@@ -259,7 +269,9 @@ impl Args {
WorkerBuilder::new(self.grep())
.after_context(self.after_context)
.before_context(self.before_context)
.byte_offset(self.byte_offset)
.count(self.count)
.count_matches(self.count_matches)
.encoding(self.encoding)
.files_with_matches(self.files_with_matches)
.files_without_matches(self.files_without_matches)
@@ -356,16 +368,19 @@ impl<'a> ArgMatches<'a> {
let mmap = self.mmap(&paths)?;
let with_filename = self.with_filename(&paths);
let (before_context, after_context) = self.contexts()?;
let (count, count_matches) = self.counts();
let quiet = self.is_present("quiet");
let args = Args {
paths: paths,
after_context: after_context,
before_context: before_context,
byte_offset: self.is_present("byte-offset"),
color_choice: self.color_choice(),
colors: self.color_specs()?,
column: self.column(),
context_separator: self.context_separator(),
count: self.is_present("count"),
count: count,
count_matches: count_matches,
encoding: self.encoding()?,
files_with_matches: self.is_present("files-with-matches"),
files_without_matches: self.is_present("files-without-match"),
@@ -403,7 +418,8 @@ impl<'a> ArgMatches<'a> {
type_list: self.is_present("type-list"),
types: self.types()?,
with_filename: with_filename,
search_zip_files: self.is_present("search-zip")
search_zip_files: self.is_present("search-zip"),
stats: self.stats()
};
if args.mmap {
debug!("will try to use memory maps");
@@ -583,7 +599,7 @@ impl<'a> ArgMatches<'a> {
// This would normally just be an empty string, which works on its
// own, but if the patterns are joined in a set of alternations, then
// you wind up with `foo|`, which is invalid.
self.word_pattern("z{0}".to_string())
self.word_pattern("(?:z{0})*".to_string())
}
/// Returns true if and only if file names containing each match should
@@ -729,6 +745,28 @@ impl<'a> ArgMatches<'a> {
})
}
/// Returns whether the -c/--count or the --count-matches flags were
/// passed from the command line.
///
/// If --count-matches and --invert-match were passed in, behave
/// as if --count and --invert-match were passed in (i.e. rg will
/// count inverted matches as per existing behavior).
fn counts(&self) -> (bool, bool) {
let count = self.is_present("count");
let count_matches = self.is_present("count-matches");
let invert_matches = self.is_present("invert-match");
let only_matching = self.is_present("only-matching");
if count_matches && invert_matches {
// Treat `-v --count-matches` as `-v -c`.
(true, false)
} else if count && only_matching {
// Treat `-c --only-matching` as `--count-matches`.
(false, true)
} else {
(count, count_matches)
}
}
/// Returns the user's color choice based on command line parameters and
/// environment.
fn color_choice(&self) -> termcolor::ColorChoice {
@@ -795,6 +833,19 @@ impl<'a> ArgMatches<'a> {
}
}
/// Returns whether status should be tracked for this run of ripgrep
/// This is automatically disabled if we're asked to only list the
/// files that wil be searched, files with matches or files
/// without matches.
fn stats(&self) -> bool {
if self.is_present("files-with-matches") ||
self.is_present("files-without-match") {
return false;
}
self.is_present("stats")
}
/// Returns the approximate number of threads that ripgrep should use.
fn threads(&self) -> Result<usize> {
if self.is_present("sort-files") {
@@ -831,16 +882,7 @@ impl<'a> ArgMatches<'a> {
if let Some(limit) = self.regex_size_limit()? {
gb = gb.size_limit(limit);
}
gb.build().map_err(|err| {
match err {
GrepError::Regex(err) => {
let s = format!("{}\n(Hint: Try the --fixed-strings flag \
to search for a literal string.)", err.to_string());
From::from(s)
},
err => From::from(err)
}
})
Ok(gb.build()?)
}
/// Builds the set of glob overrides from the command line flags.

View File

@@ -27,6 +27,7 @@ use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};
use args::Args;
use worker::Work;
@@ -85,16 +86,19 @@ fn run(args: Arc<Args>) -> Result<u64> {
}
fn run_parallel(args: &Arc<Args>) -> Result<u64> {
let start_time = Instant::now();
let bufwtr = Arc::new(args.buffer_writer());
let quiet_matched = args.quiet_matched();
let paths_searched = Arc::new(AtomicUsize::new(0));
let match_count = Arc::new(AtomicUsize::new(0));
let match_line_count = Arc::new(AtomicUsize::new(0));
let paths_matched = Arc::new(AtomicUsize::new(0));
args.walker_parallel().run(|| {
let args = Arc::clone(args);
let quiet_matched = quiet_matched.clone();
let paths_searched = paths_searched.clone();
let match_count = match_count.clone();
let match_line_count = match_line_count.clone();
let paths_matched = paths_matched.clone();
let bufwtr = Arc::clone(&bufwtr);
let mut buf = bufwtr.buffer();
let mut worker = args.worker();
@@ -125,10 +129,13 @@ fn run_parallel(args: &Arc<Args>) -> Result<u64> {
} else {
worker.run(&mut printer, Work::DirEntry(dent))
};
match_count.fetch_add(count as usize, Ordering::SeqCst);
match_line_count.fetch_add(count as usize, Ordering::SeqCst);
if quiet_matched.set_match(count > 0) {
return Quit;
}
if args.stats() && count > 0 {
paths_matched.fetch_add(1, Ordering::SeqCst);
}
}
// BUG(burntsushi): We should handle this error instead of ignoring
// it. See: https://github.com/BurntSushi/ripgrep/issues/200
@@ -141,15 +148,28 @@ fn run_parallel(args: &Arc<Args>) -> Result<u64> {
eprint_nothing_searched();
}
}
Ok(match_count.load(Ordering::SeqCst) as u64)
let match_line_count = match_line_count.load(Ordering::SeqCst) as u64;
let paths_searched = paths_searched.load(Ordering::SeqCst) as u64;
let paths_matched = paths_matched.load(Ordering::SeqCst) as u64;
if args.stats() {
print_stats(
match_line_count,
paths_searched,
paths_matched,
start_time.elapsed(),
);
}
Ok(match_line_count)
}
fn run_one_thread(args: &Arc<Args>) -> Result<u64> {
let start_time = Instant::now();
let stdout = args.stdout();
let mut stdout = stdout.lock();
let mut worker = args.worker();
let mut paths_searched: u64 = 0;
let mut match_count = 0;
let mut match_line_count = 0;
let mut paths_matched: u64 = 0;
for result in args.walker() {
let dent = match get_or_log_dir_entry(
result,
@@ -161,7 +181,7 @@ fn run_one_thread(args: &Arc<Args>) -> Result<u64> {
Some(dent) => dent,
};
let mut printer = args.printer(&mut stdout);
if match_count > 0 {
if match_line_count > 0 {
if args.quiet() {
break;
}
@@ -170,19 +190,31 @@ fn run_one_thread(args: &Arc<Args>) -> Result<u64> {
}
}
paths_searched += 1;
match_count +=
let count =
if dent.is_stdin() {
worker.run(&mut printer, Work::Stdin)
} else {
worker.run(&mut printer, Work::DirEntry(dent))
};
match_line_count += count;
if args.stats() && count > 0 {
paths_matched += 1;
}
}
if !args.paths().is_empty() && paths_searched == 0 {
if !args.no_messages() {
eprint_nothing_searched();
}
}
Ok(match_count)
if args.stats() {
print_stats(
match_line_count,
paths_searched,
paths_matched,
start_time.elapsed(),
);
}
Ok(match_line_count)
}
fn run_files_parallel(args: Arc<Args>) -> Result<u64> {
@@ -373,6 +405,22 @@ fn eprint_nothing_searched() {
Try running again with --debug.");
}
fn print_stats(
match_count: u64,
paths_searched: u64,
paths_matched: u64,
time_elapsed: Duration,
) {
let time_elapsed =
time_elapsed.as_secs() as f64
+ (time_elapsed.subsec_nanos() as f64 * 1e-9);
println!("\n{} matched lines\n\
{} files contained matches\n\
{} files searched\n\
{:.3} seconds", match_count, paths_matched,
paths_searched, time_elapsed);
}
// The Rust standard library suppresses the default SIGPIPE behavior, so that
// writing to a closed pipe doesn't kill the process. The goal is to instead
// handle errors through the normal result mechanism. Ripgrep needs some

View File

@@ -280,6 +280,7 @@ impl<W: WriteColor> Printer<W> {
start: usize,
end: usize,
line_number: Option<u64>,
byte_offset: Option<u64>
) {
if !self.line_per_match && !self.only_matching {
let mat = re
@@ -287,12 +288,13 @@ impl<W: WriteColor> Printer<W> {
.map(|m| (m.start(), m.end()))
.unwrap_or((0, 0));
return self.write_match(
re, path, buf, start, end, line_number, mat.0, mat.1);
re, path, buf, start, end, line_number,
byte_offset, mat.0, mat.1);
}
for m in re.find_iter(&buf[start..end]) {
self.write_match(
re, path.as_ref(), buf, start, end,
line_number, m.start(), m.end());
re, path.as_ref(), buf, start, end, line_number,
byte_offset, m.start(), m.end());
}
}
@@ -304,6 +306,7 @@ impl<W: WriteColor> Printer<W> {
start: usize,
end: usize,
line_number: Option<u64>,
byte_offset: Option<u64>,
match_start: usize,
match_end: usize,
) {
@@ -321,6 +324,14 @@ impl<W: WriteColor> Printer<W> {
if self.column {
self.column_number(match_start as u64 + 1, b':');
}
if let Some(byte_offset) = byte_offset {
if self.only_matching {
self.write_byte_offset(
byte_offset + ((start + match_start) as u64), b':');
} else {
self.write_byte_offset(byte_offset + (start as u64), b':');
}
}
if self.replace.is_some() {
let mut count = 0;
let mut offsets = Vec::new();
@@ -395,6 +406,7 @@ impl<W: WriteColor> Printer<W> {
start: usize,
end: usize,
line_number: Option<u64>,
byte_offset: Option<u64>,
) {
if self.heading && self.with_filename && !self.has_printed {
self.write_file_sep();
@@ -407,6 +419,9 @@ impl<W: WriteColor> Printer<W> {
if let Some(line_number) = line_number {
self.line_number(line_number, b'-');
}
if let Some(byte_offset) = byte_offset {
self.write_byte_offset(byte_offset + (start as u64), b'-');
}
if self.max_columns.map_or(false, |m| end - start > m) {
self.write(b"[Omitted long context line]");
self.write_eol();
@@ -481,6 +496,11 @@ impl<W: WriteColor> Printer<W> {
self.separator(&[sep]);
}
fn write_byte_offset(&mut self, o: u64, sep: u8) {
self.write_colored(o.to_string().as_bytes(), |colors| colors.column());
self.separator(&[sep]);
}
fn write(&mut self, buf: &[u8]) {
self.has_printed = true;
let _ = self.wtr.write_all(buf);

View File

@@ -21,8 +21,10 @@ pub struct BufferSearcher<'a, W: 'a> {
grep: &'a Grep,
path: &'a Path,
buf: &'a [u8],
match_count: u64,
match_line_count: u64,
match_count: Option<u64>,
line_count: Option<u64>,
byte_offset: Option<u64>,
last_line: usize,
}
@@ -39,12 +41,24 @@ impl<'a, W: WriteColor> BufferSearcher<'a, W> {
grep: grep,
path: path,
buf: buf,
match_count: 0,
match_line_count: 0,
match_count: None,
line_count: None,
byte_offset: None,
last_line: 0,
}
}
/// If enabled, searching will print a 0-based offset of the
/// matching line (or the actual match if -o is specified) before
/// printing the line itself.
///
/// Disabled by default.
pub fn byte_offset(mut self, yes: bool) -> Self {
self.opts.byte_offset = yes;
self
}
/// If enabled, searching will print a count instead of each match.
///
/// Disabled by default.
@@ -53,6 +67,15 @@ impl<'a, W: WriteColor> BufferSearcher<'a, W> {
self
}
/// If enabled, searching will print the count of individual matches
/// instead of each match.
///
/// Disabled by default.
pub fn count_matches(mut self, yes: bool) -> Self {
self.opts.count_matches = yes;
self
}
/// If enabled, searching will print the path instead of each match.
///
/// Disabled by default.
@@ -118,8 +141,12 @@ impl<'a, W: WriteColor> BufferSearcher<'a, W> {
return 0;
}
self.match_count = 0;
self.match_line_count = 0;
self.line_count = if self.opts.line_number { Some(0) } else { None };
// The memory map searcher uses one contiguous block of bytes, so the
// offsets given the printer are sufficient to compute the byte offset.
self.byte_offset = if self.opts.byte_offset { Some(0) } else { None };
self.match_count = if self.opts.count_matches { Some(0) } else { None };
let mut last_end = 0;
for m in self.grep.iter(self.buf) {
if self.opts.invert_match {
@@ -128,29 +155,43 @@ impl<'a, W: WriteColor> BufferSearcher<'a, W> {
self.print_match(m.start(), m.end());
}
last_end = m.end();
if self.opts.terminate(self.match_count) {
if self.opts.terminate(self.match_line_count) {
break;
}
}
if self.opts.invert_match && !self.opts.terminate(self.match_count) {
if self.opts.invert_match && !self.opts.terminate(self.match_line_count) {
let upto = self.buf.len();
self.print_inverted_matches(last_end, upto);
}
if self.opts.count && self.match_count > 0 {
self.printer.path_count(self.path, self.match_count);
if self.opts.count && self.match_line_count > 0 {
self.printer.path_count(self.path, self.match_line_count);
} else if self.opts.count_matches
&& self.match_count.map_or(false, |c| c > 0)
{
self.printer.path_count(self.path, self.match_count.unwrap());
}
if self.opts.files_with_matches && self.match_count > 0 {
if self.opts.files_with_matches && self.match_line_count > 0 {
self.printer.path(self.path);
}
if self.opts.files_without_matches && self.match_count == 0 {
if self.opts.files_without_matches && self.match_line_count == 0 {
self.printer.path(self.path);
}
self.match_count
self.match_line_count
}
#[inline(always)]
fn count_individual_matches(&mut self, start: usize, end: usize) {
if let Some(ref mut count) = self.match_count {
for _ in self.grep.regex().find_iter(&self.buf[start..end]) {
*count += 1;
}
}
}
#[inline(always)]
pub fn print_match(&mut self, start: usize, end: usize) {
self.match_count += 1;
self.match_line_count += 1;
self.count_individual_matches(start, end);
if self.opts.skip_matches() {
return;
}
@@ -158,7 +199,7 @@ impl<'a, W: WriteColor> BufferSearcher<'a, W> {
self.add_line(end);
self.printer.matched(
self.grep.regex(), self.path, self.buf,
start, end, self.line_count);
start, end, self.line_count, self.byte_offset);
}
#[inline(always)]
@@ -166,7 +207,7 @@ impl<'a, W: WriteColor> BufferSearcher<'a, W> {
debug_assert!(self.opts.invert_match);
let mut it = IterLines::new(self.opts.eol, start);
while let Some((s, e)) = it.next(&self.buf[..end]) {
if self.opts.terminate(self.match_count) {
if self.opts.terminate(self.match_line_count) {
return;
}
self.print_match(s, e);
@@ -271,6 +312,29 @@ and exhibited clearly, with a label attached.\
");
}
#[test]
fn byte_offset() {
let (_, out) = search(
"Sherlock", SHERLOCK, |s| s.byte_offset(true));
assert_eq!(out, "\
/baz.rs:0:For the Doctor Watsons of this world, as opposed to the Sherlock
/baz.rs:129:be, to a very large extent, the result of luck. Sherlock Holmes
");
}
#[test]
fn byte_offset_inverted() {
let (_, out) = search("Sherlock", SHERLOCK, |s| {
s.invert_match(true).byte_offset(true)
});
assert_eq!(out, "\
/baz.rs:65:Holmeses, success in the province of detective work must always
/baz.rs:193:can extract a clew from a wisp of straw or a flake of cigar ash;
/baz.rs:258:but Doctor Watson has to have it taken out for him and dusted,
/baz.rs:321:and exhibited clearly, with a label attached.
");
}
#[test]
fn count() {
let (count, out) = search(
@@ -279,6 +343,13 @@ and exhibited clearly, with a label attached.\
assert_eq!(out, "/baz.rs:2\n");
}
#[test]
fn count_matches() {
let (_, out) = search(
"the", SHERLOCK, |s| s.count_matches(true));
assert_eq!(out, "/baz.rs:4\n");
}
#[test]
fn files_with_matches() {
let (count, out) = search(

View File

@@ -67,8 +67,10 @@ pub struct Searcher<'a, R, W: 'a> {
grep: &'a Grep,
path: &'a Path,
haystack: R,
match_count: u64,
match_line_count: u64,
match_count: Option<u64>,
line_count: Option<u64>,
byte_offset: Option<u64>,
last_match: Match,
last_printed: usize,
last_line: usize,
@@ -80,7 +82,9 @@ pub struct Searcher<'a, R, W: 'a> {
pub struct Options {
pub after_context: usize,
pub before_context: usize,
pub byte_offset: bool,
pub count: bool,
pub count_matches: bool,
pub files_with_matches: bool,
pub files_without_matches: bool,
pub eol: u8,
@@ -96,7 +100,9 @@ impl Default for Options {
Options {
after_context: 0,
before_context: 0,
byte_offset: false,
count: false,
count_matches: false,
files_with_matches: false,
files_without_matches: false,
eol: b'\n',
@@ -111,11 +117,11 @@ impl Default for Options {
}
impl Options {
/// Several options (--quiet, --count, --files-with-matches,
/// Several options (--quiet, --count, --count-matches, --files-with-matches,
/// --files-without-match) imply that we shouldn't ever display matches.
pub fn skip_matches(&self) -> bool {
self.count || self.files_with_matches || self.files_without_matches
|| self.quiet
|| self.quiet || self.count_matches
}
/// Some options (--quiet, --files-with-matches, --files-without-match)
@@ -124,12 +130,12 @@ impl Options {
self.files_with_matches || self.files_without_matches || self.quiet
}
/// Returns true if the search should terminate based on the match count.
pub fn terminate(&self, match_count: u64) -> bool {
if match_count > 0 && self.stop_after_first_match() {
/// Returns true if the search should terminate based on the match line count.
pub fn terminate(&self, match_line_count: u64) -> bool {
if match_line_count > 0 && self.stop_after_first_match() {
return true;
}
if self.max_count.map_or(false, |max| match_count >= max) {
if self.max_count.map_or(false, |max| match_line_count >= max) {
return true;
}
false
@@ -163,8 +169,10 @@ impl<'a, R: io::Read, W: WriteColor> Searcher<'a, R, W> {
grep: grep,
path: path,
haystack: haystack,
match_count: 0,
match_line_count: 0,
match_count: None,
line_count: None,
byte_offset: None,
last_match: Match::default(),
last_printed: 0,
last_line: 0,
@@ -186,6 +194,16 @@ impl<'a, R: io::Read, W: WriteColor> Searcher<'a, R, W> {
self
}
/// If enabled, searching will print a 0-based offset of the
/// matching line (or the actual match if -o is specified) before
/// printing the line itself.
///
/// Disabled by default.
pub fn byte_offset(mut self, yes: bool) -> Self {
self.opts.byte_offset = yes;
self
}
/// If enabled, searching will print a count instead of each match.
///
/// Disabled by default.
@@ -194,6 +212,15 @@ impl<'a, R: io::Read, W: WriteColor> Searcher<'a, R, W> {
self
}
/// If enabled, searching will print the count of individual matches
/// instead of each match.
///
/// Disabled by default.
pub fn count_matches(mut self, yes: bool) -> Self {
self.opts.count_matches = yes;
self
}
/// If enabled, searching will print the path instead of each match.
///
/// Disabled by default.
@@ -257,8 +284,10 @@ impl<'a, R: io::Read, W: WriteColor> Searcher<'a, R, W> {
#[inline(never)]
pub fn run(mut self) -> Result<u64, Error> {
self.inp.reset();
self.match_count = 0;
self.match_line_count = 0;
self.line_count = if self.opts.line_number { Some(0) } else { None };
self.byte_offset = if self.opts.byte_offset { Some(0) } else { None };
self.match_count = if self.opts.count_matches { Some(0) } else { None };
self.last_match = Match::default();
self.after_context_remaining = 0;
while !self.terminate() {
@@ -308,36 +337,39 @@ impl<'a, R: io::Read, W: WriteColor> Searcher<'a, R, W> {
self.print_after_context(upto);
}
}
if self.match_count > 0 {
if self.match_line_count > 0 {
if self.opts.count {
self.printer.path_count(self.path, self.match_count);
self.printer.path_count(self.path, self.match_line_count);
} else if self.opts.count_matches {
self.printer.path_count(self.path, self.match_count.unwrap());
} else if self.opts.files_with_matches {
self.printer.path(self.path);
}
} else if self.opts.files_without_matches {
self.printer.path(self.path);
}
Ok(self.match_count)
Ok(self.match_line_count)
}
#[inline(always)]
fn terminate(&self) -> bool {
self.opts.terminate(self.match_count)
self.opts.terminate(self.match_line_count)
}
#[inline(always)]
fn fill(&mut self) -> Result<bool, Error> {
let keep = if self.opts.before_context > 0 || self.opts.after_context > 0 {
let lines = 1 + cmp::max(
self.opts.before_context, self.opts.after_context);
start_of_previous_lines(
self.opts.eol,
&self.inp.buf,
self.inp.lastnl.saturating_sub(1),
lines)
} else {
self.inp.lastnl
};
let keep =
if self.opts.before_context > 0 || self.opts.after_context > 0 {
let lines = 1 + cmp::max(
self.opts.before_context, self.opts.after_context);
start_of_previous_lines(
self.opts.eol,
&self.inp.buf,
self.inp.lastnl.saturating_sub(1),
lines)
} else {
self.inp.lastnl
};
if keep < self.last_printed {
self.last_printed -= keep;
} else {
@@ -349,6 +381,7 @@ impl<'a, R: io::Read, W: WriteColor> Searcher<'a, R, W> {
self.count_lines(keep);
self.last_line = 0;
}
self.count_byte_offset(keep);
let ok = self.inp.fill(&mut self.haystack, keep).map_err(|err| {
Error::from_io(err, &self.path)
})?;
@@ -410,7 +443,8 @@ impl<'a, R: io::Read, W: WriteColor> Searcher<'a, R, W> {
#[inline(always)]
fn print_match(&mut self, start: usize, end: usize) {
self.match_count += 1;
self.match_line_count += 1;
self.count_individual_matches(start, end);
if self.opts.skip_matches() {
return;
}
@@ -419,7 +453,7 @@ impl<'a, R: io::Read, W: WriteColor> Searcher<'a, R, W> {
self.add_line(end);
self.printer.matched(
self.grep.regex(), self.path,
&self.inp.buf, start, end, self.line_count);
&self.inp.buf, start, end, self.line_count, self.byte_offset);
self.last_printed = end;
self.after_context_remaining = self.opts.after_context;
}
@@ -429,7 +463,8 @@ impl<'a, R: io::Read, W: WriteColor> Searcher<'a, R, W> {
self.count_lines(start);
self.add_line(end);
self.printer.context(
&self.path, &self.inp.buf, start, end, self.line_count);
&self.path, &self.inp.buf, start, end,
self.line_count, self.byte_offset);
self.last_printed = end;
}
@@ -447,6 +482,22 @@ impl<'a, R: io::Read, W: WriteColor> Searcher<'a, R, W> {
}
}
#[inline(always)]
fn count_byte_offset(&mut self, buf_last_end: usize) {
if let Some(ref mut byte_offset) = self.byte_offset {
*byte_offset += buf_last_end as u64;
}
}
#[inline(always)]
fn count_individual_matches(&mut self, start: usize, end: usize) {
if let Some(ref mut count) = self.match_count {
for _ in self.grep.regex().find_iter(&self.inp.buf[start..end]) {
*count += 1;
}
}
}
#[inline(always)]
fn count_lines(&mut self, upto: usize) {
if let Some(ref mut line_count) = self.line_count {
@@ -1006,6 +1057,48 @@ fn main() {
assert_eq!(out, "/baz.rs:2\n");
}
#[test]
fn byte_offset() {
let (_, out) = search_smallcap(
"Sherlock", SHERLOCK, |s| s.byte_offset(true));
assert_eq!(out, "\
/baz.rs:0:For the Doctor Watsons of this world, as opposed to the Sherlock
/baz.rs:129:be, to a very large extent, the result of luck. Sherlock Holmes
");
}
#[test]
fn byte_offset_with_before_context() {
let (_, out) = search_smallcap("dusted", SHERLOCK, |s| {
s.line_number(true).byte_offset(true).before_context(2)
});
assert_eq!(out, "\
/baz.rs-3-129-be, to a very large extent, the result of luck. Sherlock Holmes
/baz.rs-4-193-can extract a clew from a wisp of straw or a flake of cigar ash;
/baz.rs:5:258:but Doctor Watson has to have it taken out for him and dusted,
");
}
#[test]
fn byte_offset_inverted() {
let (_, out) = search_smallcap("Sherlock", SHERLOCK, |s| {
s.invert_match(true).byte_offset(true)
});
assert_eq!(out, "\
/baz.rs:65:Holmeses, success in the province of detective work must always
/baz.rs:193:can extract a clew from a wisp of straw or a flake of cigar ash;
/baz.rs:258:but Doctor Watson has to have it taken out for him and dusted,
/baz.rs:321:and exhibited clearly, with a label attached.
");
}
#[test]
fn count_matches() {
let (_, out) = search_smallcap(
"the", SHERLOCK, |s| s.count_matches(true));
assert_eq!(out, "/baz.rs:4\n");
}
#[test]
fn files_with_matches() {
let (count, out) = search_smallcap(

View File

@@ -33,7 +33,9 @@ struct Options {
encoding: Option<&'static Encoding>,
after_context: usize,
before_context: usize,
byte_offset: bool,
count: bool,
count_matches: bool,
files_with_matches: bool,
files_without_matches: bool,
eol: u8,
@@ -53,7 +55,9 @@ impl Default for Options {
encoding: None,
after_context: 0,
before_context: 0,
byte_offset: false,
count: false,
count_matches: false,
files_with_matches: false,
files_without_matches: false,
eol: b'\n',
@@ -106,6 +110,16 @@ impl WorkerBuilder {
self
}
/// If enabled, searching will print a 0-based offset of the
/// matching line (or the actual match if -o is specified) before
/// printing the line itself.
///
/// Disabled by default.
pub fn byte_offset(mut self, yes: bool) -> Self {
self.opts.byte_offset = yes;
self
}
/// If enabled, searching will print a count instead of each match.
///
/// Disabled by default.
@@ -114,6 +128,15 @@ impl WorkerBuilder {
self
}
/// If enabled, searching will print the count of individual matches
/// instead of each match.
///
/// Disabled by default.
pub fn count_matches(mut self, yes: bool) -> Self {
self.opts.count_matches = yes;
self
}
/// Set the encoding to use to read each file.
///
/// If the encoding is `None` (the default), then the encoding is
@@ -283,7 +306,9 @@ impl Worker {
searcher
.after_context(self.opts.after_context)
.before_context(self.opts.before_context)
.byte_offset(self.opts.byte_offset)
.count(self.opts.count)
.count_matches(self.opts.count_matches)
.files_with_matches(self.opts.files_with_matches)
.files_without_matches(self.opts.files_without_matches)
.eol(self.opts.eol)
@@ -322,7 +347,9 @@ impl Worker {
}
let searcher = BufferSearcher::new(printer, &self.grep, path, buf);
Ok(searcher
.byte_offset(self.opts.byte_offset)
.count(self.opts.count)
.count_matches(self.opts.count_matches)
.files_with_matches(self.opts.files_with_matches)
.files_without_matches(self.opts.files_without_matches)
.eol(self.opts.eol)
@@ -341,14 +368,17 @@ impl Worker {
#[cfg(unix)]
fn mmap(&self, file: &File) -> Result<Option<Mmap>> {
use libc::{ENODEV, EOVERFLOW};
use libc::{EOVERFLOW, ENODEV, ENOMEM};
let err = match mmap_readonly(file) {
Ok(mmap) => return Ok(Some(mmap)),
Err(err) => err,
};
let code = err.raw_os_error();
if code == Some(ENODEV) || code == Some(EOVERFLOW) {
if code == Some(EOVERFLOW)
|| code == Some(ENODEV)
|| code == Some(ENOMEM)
{
return Ok(None);
}
Err(From::from(err))

View File

@@ -1,6 +1,6 @@
[package]
name = "termcolor"
version = "0.3.4" #:version
version = "0.3.6" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A simple cross platform library for writing colored text to a terminal.

View File

@@ -70,7 +70,7 @@ writeln!(&mut stdout, "green text!")?;
A `BufferWriter` can create buffers and write buffers to stdout or stderr. It
does *not* implement `io::Write` or `WriteColor` itself. Instead, `Buffer`
implements `io::Write` and `io::WriteColor`.
implements `io::Write` and `termcolor::WriteColor`.
This example shows how to print some green text to stderr.

View File

@@ -971,18 +971,18 @@ impl<W: io::Write> WriteColor for Ansi<W> {
fn set_color(&mut self, spec: &ColorSpec) -> io::Result<()> {
self.reset()?;
if let Some(ref c) = spec.fg_color {
self.write_color(true, c, spec.intense)?;
}
if let Some(ref c) = spec.bg_color {
self.write_color(false, c, spec.intense)?;
}
if spec.bold {
self.write_str("\x1B[1m")?;
}
if spec.underline {
self.write_str("\x1B[4m")?;
}
if let Some(ref c) = spec.fg_color {
self.write_color(true, c, spec.intense)?;
}
if let Some(ref c) = spec.bg_color {
self.write_color(false, c, spec.intense)?;
}
Ok(())
}

View File

@@ -395,6 +395,16 @@ sherlock!(csglob, "Sherlock", ".", |wd: WorkDir, mut cmd: Command| {
assert_eq!(lines, "file2.html:Sherlock\n");
});
sherlock!(byte_offset_only_matching, "Sherlock", ".", |wd: WorkDir, mut cmd: Command| {
cmd.arg("-b").arg("-o");
let lines: String = wd.stdout(&mut cmd);
let expected = "\
sherlock:56:Sherlock
sherlock:177:Sherlock
";
assert_eq!(lines, expected);
});
sherlock!(count, "Sherlock", ".", |wd: WorkDir, mut cmd: Command| {
cmd.arg("--count");
let lines: String = wd.stdout(&mut cmd);
@@ -402,6 +412,27 @@ sherlock!(count, "Sherlock", ".", |wd: WorkDir, mut cmd: Command| {
assert_eq!(lines, expected);
});
sherlock!(count_matches, "the", ".", |wd: WorkDir, mut cmd: Command| {
cmd.arg("--count-matches");
let lines: String = wd.stdout(&mut cmd);
let expected = "sherlock:4\n";
assert_eq!(lines, expected);
});
sherlock!(count_matches_inverted, "Sherlock", ".", |wd: WorkDir, mut cmd: Command| {
cmd.arg("--count-matches").arg("--invert-match");
let lines: String = wd.stdout(&mut cmd);
let expected = "sherlock:4\n";
assert_eq!(lines, expected);
});
sherlock!(count_matches_via_only, "the", ".", |wd: WorkDir, mut cmd: Command| {
cmd.arg("--count").arg("--only-matching");
let lines: String = wd.stdout(&mut cmd);
let expected = "sherlock:4\n";
assert_eq!(lines, expected);
});
sherlock!(files_with_matches, "Sherlock", ".", |wd: WorkDir, mut cmd: Command| {
cmd.arg("--files-with-matches");
let lines: String = wd.stdout(&mut cmd);
@@ -800,11 +831,7 @@ clean!(regression_25, "test", ".", |wd: WorkDir, mut cmd: Command| {
// See: https://github.com/BurntSushi/ripgrep/issues/30
clean!(regression_30, "test", ".", |wd: WorkDir, mut cmd: Command| {
if cfg!(windows) {
wd.create(".gitignore", "vendor/**\n!vendor\\manifest");
} else {
wd.create(".gitignore", "vendor/**\n!vendor/manifest");
}
wd.create(".gitignore", "vendor/**\n!vendor/manifest");
wd.create_dir("vendor");
wd.create("vendor/manifest", "test");
@@ -1660,16 +1687,6 @@ sherlock!(feature_419_zero_as_shortcut_for_null, "Sherlock", ".",
assert_eq!(lines, "sherlock\x002\n");
});
// See: https://github.com/BurntSushi/ripgrep/issues/709
clean!(suggest_fixed_strings_for_invalid_regex, "foo(", ".",
|wd: WorkDir, mut cmd: Command| {
wd.assert_non_empty_stderr(&mut cmd);
let output = cmd.output().unwrap();
let err = String::from_utf8_lossy(&output.stderr);
assert_eq!(err.contains("--fixed-strings"), true);
});
#[test]
fn compressed_gzip() {
if !cmd_exists("gzip") {
@@ -1784,6 +1801,50 @@ be, to a very large extent, the result of luck. Sherlock Holmes
assert_eq!(lines, expected);
});
sherlock!(feature_411_single_threaded_search_stats,
|wd: WorkDir, mut cmd: Command| {
cmd.arg("--stats");
let lines: String = wd.stdout(&mut cmd);
assert_eq!(lines.contains("2 matched lines"), true);
assert_eq!(lines.contains("1 files contained matches"), true);
assert_eq!(lines.contains("1 files searched"), true);
assert_eq!(lines.contains("seconds"), true);
});
#[test]
fn feature_411_parallel_search_stats() {
let wd = WorkDir::new("feature_411");
wd.create("sherlock_1", hay::SHERLOCK);
wd.create("sherlock_2", hay::SHERLOCK);
let mut cmd = wd.command();
cmd.arg("--stats");
cmd.arg("Sherlock");
let lines: String = wd.stdout(&mut cmd);
assert_eq!(lines.contains("4 matched lines"), true);
assert_eq!(lines.contains("2 files contained matches"), true);
assert_eq!(lines.contains("2 files searched"), true);
assert_eq!(lines.contains("seconds"), true);
}
sherlock!(feature_411_ignore_stats_1, |wd: WorkDir, mut cmd: Command| {
cmd.arg("--files-with-matches");
cmd.arg("--stats");
let lines: String = wd.stdout(&mut cmd);
assert_eq!(lines.contains("seconds"), false);
});
sherlock!(feature_411_ignore_stats_2, |wd: WorkDir, mut cmd: Command| {
cmd.arg("--files-without-match");
cmd.arg("--stats");
let lines: String = wd.stdout(&mut cmd);
assert_eq!(lines.contains("seconds"), false);
});
#[test]
fn feature_740_passthru() {
let wd = WorkDir::new("feature_740");