249 Commits

Author SHA1 Message Date
Collin Styles
a070722ff2 cli: add --include-zero flag
This flag, when used in conjunction with --count or --count-matches,
will print a result for each file searched even if there were zero
matches in that file. This is off by default but can be enabled to make
ripgrep behave more like grep.

This also clarifies some of the defaults for the
grep-printer::SummaryBuilder type.

Closes #1370, Closes #1405
2020-02-17 17:16:28 -05:00
Matěj Cepl
4628d77808 ignore/types: add spec file type
This is for RPM package SPEC files.

Fixes #946, Closes #1449
2020-02-17 17:16:28 -05:00
luh2
040ca45ba0 ignore/types: add xhtml to xml file type
Closes #1426
2020-02-17 17:16:28 -05:00
Andrew Gallant
91470572cd changelog: add notes about new file types 2020-02-17 17:16:28 -05:00
Sven-Hendrik Haase
027adbf485 ignore/types: add 'diff' file type
This includes .patch and .diff files.

Fixes #1418, Closes #1419
2020-02-17 17:16:28 -05:00
Mohammad AlSaleh
e71eedf0eb cli: add --no-context-separator flag
--context-separator='' still adds a new line separator, which could
still potentially be useful. So we add a new `--no-context-separator`
flag that completely disables context separators even when the -A/-B/-C
context flags are used.

Closes #1390
2020-02-17 17:16:28 -05:00
sharkdp
a18cf6ec39 ignore: add existence check for ignore files
This commit adds a simple `.exists()` check for `.gitignore`,
`.ignore`, and other similar files before actually calling
`File::open(…)` in `GitIgnoreBuilder::add`.

The reason is that a simple existence check via `stat` can be faster
than actually trying to `open` the file, see
https://stackoverflow.com/a/12774387/704831. As we typically expect(?)
the number of directories *without* ignore files to be much larger
than the number of directories *with* ignore files, this leads to an
overall speedup.

The performance gain is not huge for `rg`, but can be quite significant
if more `.gitignore`-like files are added via
`add_custom_ignore_filename`. The speedup is *larger* for folders with
*low* files-per-directory ratios.

Note though that we do not do this check on Windows until a specific
analysis there suggests this is beneficial. Namely, Windows generally
has slower file system operations, so it's not clear whether this
speculative check is actually a benefit or not.

Benchmark results
-----------------

`rg --files` in my home folder (200k results, 6.5 files per directory):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 396.4 ± 3.2 | 390.9 | 400.0 | 1.05 |
| `./rg-feature --files` | 376.0 ± 3.6 | 369.3 | 383.5 | 1.00 |

`rg --files --hidden` in my home folder (800k results, 5.4
files per directory)

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files --hidden` | 1.575 ± 0.012 | 1.560 | 1.597 | 1.06 |
| `./rg-feature --files --hidden` | 1.479 ± 0.011 | 1.464 | 1.496 | 1.00 |

`rg --files` in the chromium-79.0.3915.2 source tree (300k results, 12.7 files per
directory)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `~/rg-master --files` | 445.2 ± 5.3 | 435.6 | 453.0 | 1.04 |
| `~/rg-feature --files` | 428.9 ± 7.0 | 418.2 | 440.0 | 1.00 |

`rg --files` in the linux-5.3 source tree (65k results, 15.1
files per directory)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 94.5 ± 1.9 | 89.8 | 98.5 | 1.02 |
| `./rg-feature --files` | 92.6 ± 2.7 | 88.4 | 98.7 | 1.00 |

Closes #1381
2020-02-17 17:16:28 -05:00
Andrew Gallant
5b10328f41
changelog: update with bug fix 2019-08-02 07:37:27 -04:00
Andrew Gallant
931ab35f76
changelog: start work on 11.0.2 release 2019-08-01 17:42:38 -04:00
Andrew Gallant
d1389db2e3
search: better errors for preprocessor commands
If a preprocessor command could not be started, we now show some
additional context with the error message. Previously, it showed
something like this:

  some/file: No such file or directory (os error 2)

Which is itself pretty misleading. Now it shows:

  some/file: preprocessor command could not start: '"nonexist" "some/file"': No such file or directory (os error 2)

Fixes #1302
2019-06-16 19:02:02 -04:00
Andrew Gallant
e7829c05d3
cli: fix bug where last byte was stripped
In an effort to strip line terminators, we assumed their existence. But
a pattern file may not end with a line terminator, so we shouldn't
unconditionally strip them.

We fix this by moving to bstr's line handling, which does this for us
automatically.
2019-04-19 07:11:44 -04:00
Andrew Gallant
5f8805a496
ripgrep: release 11.0.1 2019-04-16 13:10:29 -04:00
Andrew Gallant
d7f57d9aab
ripgrep: release 11.0.0 2019-04-15 18:09:40 -04:00
Andrew Gallant
ef1611b5f5
ripgrep: max-column-preview --> max-columns-preview
Credit to @okdana for catching this. This naming is a bit more
consistent with the existing --max-columns flag.
2019-04-15 06:51:51 -04:00
Andrew Gallant
45d12abbc5
changelog: small fixups 2019-04-14 20:21:55 -04:00
Andrew Gallant
5fde8391f9
changelog: backfill it
I went through every commit since the 0.10.0 release and added anything
that I thought was missing.
2019-04-14 20:04:01 -04:00
Andrew Gallant
967e7ad0de ripgrep: add --auto-hybrid-regex flag
This flag, when set, will automatically dispatch to PCRE2 if the given
regex cannot be compiled by Rust's regex engine. If both engines fail to
compile the regex, then both errors are surfaced.

Closes #1155
2019-04-14 19:29:27 -04:00
Andrew Gallant
8f14cb18a5 ripgrep: increase pcre2's default JIT stack size
The default stack size is 32KB, and this increases it to 10MB. 32KB is
pretty paltry in the environments in which ripgrep runs, and 10MB is
easily afforded as a maximum size. (The size limit we set for Rust's
regex engine is considerably larger.)

This was motivated due to the fack that JIT stack limits have been
observed to be hit in the wild:
https://github.com/Microsoft/vscode/issues/64606
2019-04-14 19:29:27 -04:00
Andrew Gallant
da9d720431 ripgrep: add --pcre2-version flag
This flag will output details about the version of PCRE2 that ripgrep
is using (if any).
2019-04-14 19:29:27 -04:00
Andrew Gallant
5a565354f8 versioning: next version will be ripgrep 11
This sets up the release announcement and briefly describes the
versioning change. The actual version change itself won't happen until
the release.

Closes #1172
2019-04-14 19:29:27 -04:00
Andrew Gallant
2a6532ae71 doc: note cases of exorbitant memory usage
Fixes #1189
2019-04-14 19:29:27 -04:00
Andrew Gallant
ece1f50cfe printer: support previews for long lines
This commit adds support for showing a preview of long lines. While the
default still remains as completely suppressing the entire line, this
new functionality will show the first N graphemes of a matching line,
including the number of matches that are suppressed.

This was unfortunately a fairly invasive change to the printer that
required a bit of refactoring. On the bright side, the single line
and multi-line coloring are now more unified than they were before.

Closes #1078
2019-04-14 19:29:27 -04:00
Andrew Gallant
a7d26c8f14 binary: rejigger ripgrep's handling of binary files
This commit attempts to surface binary filtering in a slightly more
user friendly way. Namely, before, ripgrep would silently stop
searching a file if it detected a NUL byte, even if it had previously
printed a match. This can lead to the user quite reasonably assuming
that there are no more matches, since a partial search is fairly
unintuitive. (ripgrep has this behavior by default because it really
wants to NOT search binary files at all, just like it doesn't search
gitignored or hidden files.)

With this commit, if a match has already been printed and ripgrep detects
a NUL byte, then it will print a warning message indicating that the search
stopped prematurely.

Moreover, this commit adds a new flag, --binary, which causes ripgrep to
stop filtering binary files, but in a way that still avoids dumping
binary data into terminals. That is, the --binary flag makes ripgrep
behave more like grep's default behavior.

For files explicitly specified in a search, e.g., `rg foo some-file`,
then no binary filtering is applied (just like no gitignore and no
hidden file filtering is applied). Instead, ripgrep behaves as if you
gave the --binary flag for all explicitly given files.

This was a fairly invasive change, and potentially increases the UX
complexity of ripgrep around binary files. (Before, there were two
binary modes, where as now there are three.) However, ripgrep is now a
bit louder with warning messages when binary file detection might
otherwise be hiding potential matches, so hopefully this is a net
improvement.

Finally, the `-uuu` convenience now maps to `--no-ignore --hidden
--binary`, since this is closer to the actualy intent of the
`--unrestricted` flag, i.e., to reduce ripgrep's smart filtering. As a
consequence, `rg -uuu foo` should now search roughly the same number of
bytes as `grep -r foo`, and `rg -uuua foo` should search roughly the
same number of bytes as `grep -ra foo`. (The "roughly" weasel word is
used because grep's and ripgrep's binary file detection might differ
somewhat---perhaps based on buffer sizes---which can impact exactly what
is and isn't searched.)

See the numerous tests in tests/binary.rs for intended behavior.

Fixes #306, Fixes #855
2019-04-14 19:29:27 -04:00
Andrew Gallant
09108b7fda regex: make multi-literal searcher faster
This makes the case of searching for a dictionary of a very large number
of literals much much faster. (~10x or so.) In particular, we achieve this
by short-circuiting the construction of a full regex when we know we have
a simple alternation of literals. Building the regex for a large dictionary
(>100,000 literals) turns out to be quite slow, even if it internally will
dispatch to Aho-Corasick.

Even that isn't quite enough. It turns out that even *parsing* such a regex
is quite slow. So when the -F/--fixed-strings flag is set, we short
circuit regex parsing completely and jump straight to Aho-Corasick.

We aren't quite as fast as GNU grep here, but it's much closer (less than
2x slower).

In general, this is somewhat of a hack. In particular, it seems plausible
that this optimization could be implemented entirely in the regex engine.
Unfortunately, the regex engine's internals are just not amenable to this
at all, so it would require a larger refactoring effort. For now, it's
good enough to add this fairly simple hack at a higher level.

Unfortunately, if you don't pass -F/--fixed-strings, then ripgrep will
be slower, because of the aforementioned missing optimization. Moreover,
passing flags like `-i` or `-S` will cause ripgrep to abandon this
optimization and fall back to something potentially much slower. Again,
this fix really needs to happen inside the regex engine, although we
might be able to special case -i when the input literals are pure ASCII
via Aho-Corasick's `ascii_case_insensitive`.

Fixes #497, Fixes #838
2019-04-07 19:11:03 -04:00
Andrew Gallant
de0bc78982
deps: bump encoding_rs to 0.8.16
This brings in an updated `encoding_rs` crate that uses `packed_simd`,
which compiles on the latest nightly. Compilation times do appear to be
impacted significantly though.

Fixes #1175 (again)
2019-02-07 17:05:14 -05:00
Andrew Gallant
386dd2806d
changelog: BUG #916
This was fixed by bumping the MSRV above Rust 1.28.

Fixes #916
2019-01-27 13:15:17 -05:00
Andrew Gallant
5fe9a954e6
changelog: BUG #1154 2019-01-27 13:05:50 -05:00
Andrew Gallant
0df71240ff
search: fix -F and -f interaction bug
This fixes what appears to be a pretty egregious regression where the
`-F/--fixed-strings` flag wasn't be applied to patterns supplied via
the `-f/--file` flag. The same bug existed for the `-x/--line-regexp`
flag as well, which we fix here.

Fixes #1176
2019-01-26 16:01:52 -05:00
Andrew Gallant
f3164f2615
exit: tweak exit status logic
This changes how ripgrep emit exit status codes. In particular, any error
that occurs while searching will now cause ripgrep to emit a `2` exit
code, where as it previously would emit either a `0` or a `1` code based
on whether it matched or not. That is, ripgrep would only emit a `2` exit
code for a catastrophic error.

This tweak includes additional logic that GNU grep adheres to, which seems
like good sense. Namely, if -q/--quiet is given, and an error occurs and
a match occurs, then ripgrep will emit a `0` exit code.

Closes #1159
2019-01-26 15:44:49 -05:00
Andrew Gallant
31d3e24130
args: prevent panicking in 'rg -h | rg'
Previously, we relied on clap to handle printing either an error
message, or --help/--version output, in addition to setting the exit
status code. Unfortunately, for --help/--version output, clap was
panicking if the write failed, which can happen in fairly common
scenarios via a broken pipe error. e.g., `rg -h | head`.

We fix this by using clap's "safe" API and doing the printing ourselves.
We also set the exit code to `2` when an invalid command has been given.

Fixes #1125 and partially addresses #1159
2019-01-26 14:39:40 -05:00
Andrew Gallant
bf842dbc7f
doc: add note about inverted flags
Fixes #1091
2019-01-26 14:13:06 -05:00
Andrew Gallant
6d5dba85bd
doc: clarify automatic encoding detection
Fixes #1103
2019-01-26 13:55:47 -05:00
Andrew Gallant
332dc56372
changelog: BUG #1095 2019-01-26 13:40:59 -05:00
Andrew Gallant
12a6ca45f9
config: add --no-ignore-dot flag
This flag causes ripgrep to ignore `.ignore` files.

Closes #1138
2019-01-26 13:40:12 -05:00
Andrew Gallant
9a9f54d44c
readme: encoding_rs's SIMD support is broken
Add a note about it to the README.

Also, remove mention of the avx-accel feature since it no longer exists.
(bytecount now uses runtime detection to enable SIMD support.)

Fixes #1175
2019-01-24 07:00:53 -05:00
Andrew Gallant
8fd05cacee
changelog: BUG #1121 2019-01-23 20:06:01 -05:00
Andrew Gallant
9c940b45f4
globset: permit ** to appear anywhere
Previously, `man gitignore` specified that `**` was invalid unless it
was used in one of a few specific circumstances, i.e., `**`, `a/**`,
`**/b` or `a/**/b`. That is, `**` always had to be surrounded by either
a path separator or the beginning/end of the pattern.

It turns out that git itself has treated `**` outside the above contexts
as valid for quite a while, so there was an inconsistency between the
spec `man gitignore` and the implementation, and it wasn't clear which
was actually correct.

@okdana filed a bug against git[1] and got this fixed. The spec was wrong,
which has now been fixed [2] and updated[2].

This commit brings ripgrep in line with git and treats `**` outside of
the above contexts as two consecutive `*` patterns. We deprecate the
`InvalidRecursive` error since it is no longer used.

Fixes #373, Fixes #1098

[1] - https://public-inbox.org/git/C16A9F17-0375-42F9-90A9-A92C9F3D8BBA@dana.is
[2] - 627186d020
[3] - https://git-scm.com/docs/gitignore
2019-01-23 19:59:39 -05:00
Andrew Gallant
0a167021c3
changelog: BUG #1174 2019-01-23 19:19:26 -05:00
Andrew Gallant
7048a06c31
changelog: BUG #1173 2019-01-23 18:14:16 -05:00
Andrew Gallant
b48bbf527d
changelog: PR #1093 2019-01-23 17:56:18 -05:00
Mika Dede
a7f2d48234
printer: fix path handling in summarizer
This commit fixes a bug where both of the following commands always
reported an error:

    rg --files-with-matches foo file
    rg --files-without-match foo file

In particular, the printer was erroneously respecting the `path` option
even the the summary kind was `PathWithMatch` or `PathWithoutMatch`. The
documented behavior is that those summary kinds always require a path,
and thus, the `path` option has no effect. We fix this by correcting the
case analysis.

This also fixes a bug where the exit code for `--files-without-match`
was not set correctly. We update the printer's `has_match` method to
report the correct value.

Fixes #1106, Closes #1130
2019-01-22 21:37:23 -05:00
Andrew Gallant
57500ad013
changelog: brotli/zstd addition 2019-01-22 20:57:28 -05:00
David Torosyan
718a00f6f2
ripgrep: add --ignore-file-case-insensitive
The --ignore-file-case-insensitive flag causes all
.gitignore/.rgignore/.ignore files to have their globs matched without
regard for case. Because this introduces a potentially significant
performance regression, this is always disabled by default. Users that
need case insensitive matching can enable it on a case by case basis.

Closes #1164, Closes #1170
2019-01-22 20:03:59 -05:00
Andrew Gallant
ce80d794c0
changelog: add release date 2018-09-07 14:00:23 -04:00
Andrew Gallant
827179250b
changelog: assign feature id 2018-09-04 23:24:22 -04:00
Andrew Gallant
241bc8f8fc ripgrep: add --pre-glob flag
The --pre-glob flag is like the --glob flag, except it applies to filtering
files through the preprocessor instead of for search. This makes it
possible to apply the preprocessor to only a small subset of files, which
can greatly reduce the process overhead of using a preprocessor when
searching large directories.
2018-09-04 23:18:55 -04:00
Andrew Gallant
b6e30124e0 ripgrep: add --line-buffered and --block-buffered
These flags provide granular control over ripgrep's buffering strategy.
The --line-buffered flag can be genuinely useful in certain types of shell
pipelines. The --block-buffered flag has a murkier use case, but we add it
for completeness.
2018-09-04 23:18:55 -04:00
Andrew Gallant
4846d63539 grep-cli: introduce new grep-cli crate
This commit moves a lot of "utility" code from ripgrep core into
grep-cli. Any one of these things might not be worth creating a new
crate, but combining everything together results in a fair number of a
convenience routines that make up a decent sized crate.

There is potentially more we could move into the crate, but much of what
remains in ripgrep core is almost entirely dealing with the number of
flags we support.

In the course of doing moving things to the grep-cli crate, we clean up
a lot of gunk and improve failure modes in a number of cases. In
particular, we've fixed a bug where other processes could deadlock if
they write too much to stderr.

Fixes #990
2018-09-04 23:18:55 -04:00
Andrew Gallant
3edeeca6e9
changelog: fix typo 2018-08-29 18:46:34 -04:00
Andrew Gallant
c41b353009
changelog: update
This brings the changelog up to date with HEAD and rewords a few things.
2018-08-29 18:25:08 -04:00