diff --git a/FAQ.md b/FAQ.md
index ab71e7bb..75bbff67 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -635,7 +635,7 @@ real 0m1.714s
 user 0m1.669s
 sys 0m0.044s
 
-[andrew@Cheetah 2016] time rg -P '^\w{42}$' subtitles2016-sample --no-pcre2-unicode
+$ time rg -P '^\w{42}$' subtitles2016-sample --no-pcre2-unicode
 21225780:EverymajordevelopmentinthehistoryofAmerica
 
 real 0m1.997s
diff --git a/README.md b/README.md
index 6322347e..c4c5b2e6 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ Please see the [CHANGELOG](CHANGELOG.md) for a release history.
 * [Installation](#installation)
 * [User Guide](GUIDE.md)
 * [Frequently Asked Questions](FAQ.md)
-* [Regex syntax](https://docs.rs/regex/0.2.5/regex/#syntax)
+* [Regex syntax](https://docs.rs/regex/1/regex/#syntax)
 * [Configuration files](GUIDE.md#configuration-file)
 * [Shell completions](FAQ.md#complete)
 * [Building](#building)
@@ -103,6 +103,10 @@ increases the times to `2.640s` for ripgrep and `10.277s` for GNU grep.
   of search results, searching multiple patterns, highlighting matches with
   color and full Unicode support. Unlike GNU grep, ripgrep stays fast while
   supporting Unicode (which is always on).
+* ripgrep has optional support for switching its regex engine to use PCRE2.
+  Among other things, this makes it possible to use look-around and
+  backreferences in your patterns, which are not supported in ripgrep's default
+  regex engine. PCRE2 support is enabled with `-P`.
 * ripgrep supports searching files in text encodings other than UTF-8, such
   as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for
   automatically detecting UTF-16 is provided. Other text encodings must be
@@ -114,7 +118,7 @@ increases the times to `2.640s` for ripgrep and `10.277s` for GNU grep.
   detection and so on.
 
 In other words, use ripgrep if you like speed, filtering by default, fewer
-bugs, and Unicode support.
+bugs and Unicode support.
 
 ### Why shouldn't I use ripgrep?
 
@@ -159,7 +163,7 @@ Summarizing, ripgrep is fast because:
   latter is better for large directories. ripgrep chooses the best searching
   strategy for you automatically.
 * Applies your ignore patterns in `.gitignore` files using a
-  [`RegexSet`](https://docs.rs/regex/1.0.0/regex/struct.RegexSet.html).
+  [`RegexSet`](https://docs.rs/regex/1/regex/struct.RegexSet.html).
   That means a single file path can be matched against multiple glob patterns
   simultaneously.
 * It uses a lock-free parallel recursive directory iterator, courtesy of
diff --git a/src/app.rs b/src/app.rs
index 19306c2c..059effbd 100644
--- a/src/app.rs
+++ b/src/app.rs
@@ -685,9 +685,15 @@ fn flag_byte_offset(args: &mut Vec<RGArg>) {
     const SHORT: &str =
         "Print the 0-based byte offset for each matching line.";
     const LONG: &str = long!("\
-Print the 0-based byte offset within the input file
-before each line of output. If -o (--only-matching) is
-specified, print the offset of the matching part itself.
+Print the 0-based byte offset within the input file before each line of output.
+If -o (--only-matching) is specified, print the offset of the matching part
+itself.
+
+If ripgrep does transcoding, then the byte offset is in terms of the result
+of transcoding and not the original data. This applies similarly to another
+transformation on the source, such as decompression or a --pre filter. Note
+that when the PCRE2 regex engine is used, then UTF-8 transcoding is done by
+default.
 ");
     let arg = RGArg::switch("byte-offset").short("b")
         .help(SHORT).long_help(LONG);
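
Review note on the new `--byte-offset` text: the sketch below illustrates why an offset measured after transcoding can differ from the offset in the original file. It is not ripgrep's implementation. The helpers `utf16le_bytes` and `find_bytes` are hypothetical names introduced only for this example, and the sample text stands in for a UTF-16LE file (without a BOM) that gets transcoded to UTF-8 before searching.

```rust
/// Encode a string as UTF-16LE bytes, standing in for a UTF-16 file on disk.
/// (Hypothetical helper, not part of ripgrep.)
fn utf16le_bytes(s: &str) -> Vec<u8> {
    s.encode_utf16().flat_map(|u| u.to_le_bytes()).collect()
}

/// Return the byte offset of the first occurrence of `needle` in `haystack`.
/// (Hypothetical helper, not part of ripgrep. Assumes a non-empty needle.)
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

fn main() {
    let text = "héllo\nwörld\n";

    // The "original data": the same text stored as UTF-16LE.
    let original = utf16le_bytes(text);
    // The "result of transcoding": the text as UTF-8.
    let transcoded = text.as_bytes();

    // Offset of the second line in each representation.
    let in_original = find_bytes(&original, &utf16le_bytes("wörld"));
    let in_transcoded = find_bytes(transcoded, "wörld".as_bytes());

    println!("offset in original UTF-16LE bytes: {:?}", in_original);
    println!("offset in transcoded UTF-8 bytes:  {:?}", in_transcoded);
}
```

The first line occupies 12 bytes in UTF-16LE but only 7 bytes in UTF-8, so the same match starts at a different byte offset depending on which representation is measured. Per the help text in this patch, the reported offset is in terms of the transcoded data.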