87 Commits

Author SHA1 Message Date
Andrew Gallant
5ce2d7351d ci: use cross for musl x86_64 builds
This is necessary because jemalloc + musl + Ubuntu 16.04 is apparently
broken.

Moreover, jemalloc doesn't support i686, so we accept the performance
regression there.

See also: https://github.com/gnzlbg/jemallocator/issues/124
2019-04-25 11:12:14 -04:00
Andrew Gallant
03bf37ff4a alloc: use jemalloc when building with musl
It turns out that musl's allocator is slow enough to cause a fairly
noticeable performance regression when ripgrep is built as a static
binary with musl. We fix this by using jemalloc when building with musl.

We continue to use the default system allocator in all other scenarios.
Namely, glibc's allocator doesn't noticeably regress performance compared
to jemalloc. But we could add more targets to this logic if other
system allocators (macOS, Windows) prove to be slow.

This wasn't necessary before because rustc recently stopped using jemalloc
by default.

Fixes #1268
2019-04-24 17:21:38 -04:00
Andrew Gallant
da9d720431 ripgrep: add --pcre2-version flag
This flag will output details about the version of PCRE2 that ripgrep
is using (if any).
2019-04-14 19:29:27 -04:00
Andrew Gallant
f3164f2615 exit: tweak exit status logic
This changes how ripgrep emits exit status codes. In particular, any error
that occurs while searching will now cause ripgrep to emit a `2` exit
code, whereas it previously would emit either a `0` or a `1` code based
on whether it matched or not. That is, ripgrep would only emit a `2` exit
code for a catastrophic error.

This tweak includes additional logic that GNU grep adheres to, which seems
like good sense. Namely, if -q/--quiet is given and a match occurs, then
ripgrep will emit a `0` exit code even if an error also occurred.
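The resulting decision table can be sketched as a small function. `exit_code` is a hypothetical helper written for illustration; ripgrep's actual code is structured differently.

```rust
/// Hypothetical helper (not ripgrep's real API): picks an exit status from
/// whether any match was found, whether any error occurred, and whether
/// -q/--quiet was given.
fn exit_code(matched: bool, errored: bool, quiet: bool) -> i32 {
    if errored && !(quiet && matched) {
        // Any error yields 2, unless --quiet was given and something matched.
        2
    } else if matched {
        0
    } else {
        1
    }
}
```

With this shape, `exit_code(true, true, true)` is `0` (GNU grep's --quiet behavior), while `exit_code(true, true, false)` is `2`.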

Closes #1159
2019-01-26 15:44:49 -05:00
Andrew Gallant
7a6a40bae1 edition: move core ripgrep to Rust 2018 2019-01-19 10:44:30 -05:00
Andrew Gallant
4846d63539 grep-cli: introduce new grep-cli crate
This commit moves a lot of "utility" code from ripgrep core into
grep-cli. Any one of these things might not be worth creating a new
crate, but combining everything together results in a fair number of
convenience routines that make up a decent sized crate.

There is potentially more we could move into the crate, but much of what
remains in ripgrep core is almost entirely dealing with the number of
flags we support.

In the course of moving things to the grep-cli crate, we clean up
a lot of gunk and improve failure modes in a number of cases. In
particular, we've fixed a bug where other processes could deadlock if
they write too much to stderr.

Fixes #990
2018-09-04 23:18:55 -04:00
Andrew Gallant
05a0389555 ripgrep: use winapi-util for stdin_is_readable 2018-08-25 00:30:15 -04:00
Andrew Gallant
7eaaa04c69 ripgrep: small cleanups 2018-08-20 17:34:45 -04:00
Andrew Gallant
bb110c1ebe ripgrep: migrate to libripgrep
This commit does the work to delete the old `grep` crate and effectively
rewrite most of ripgrep core to use the new libripgrep crates. The new
`grep` crate is now a facade that collects the various crates that make
up libripgrep.

The most complex part of ripgrep core is now arguably the translation
between command line parameters and the library options, which is
ultimately where we want to be.
2018-08-20 07:10:19 -04:00
Andrew Gallant
22ac2e056e ripgrep: stop early when --files --quiet is used
This commit tweaks the implementation of the --files flag to stop early
when --quiet is provided.

Fixes #907
2018-07-22 11:05:24 -04:00
Andrew Gallant
209a125ea2 ripgrep: replace decoder with encoding_rs_io
This commit mostly moves the transcoder implementation to its own
crate: https://github.com/BurntSushi/encoding_rs_io

The new crate adds clear documentation and cleans up the implementation
to fully implement the contract of io::Read.
2018-07-21 20:36:32 -04:00
Charles Blake
231456c409 ripgrep: add --pre flag
The preprocessor flag accepts a command and executes it for every input
file that is searched. Instead of searching the file directly, ripgrep
searches the stdout contents of the command.
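A minimal sketch of the preprocessing step, assuming the file path is passed as the command's single argument. `search_with_pre` is a hypothetical name, and this version buffers the whole output in memory and does a plain substring match, whereas ripgrep streams the output through its regex searcher.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Hypothetical sketch: run `pre` on `path` and return the lines of its
/// stdout that contain `needle`. The file itself is never read directly.
fn search_with_pre(pre: &str, path: &Path, needle: &str) -> io::Result<Vec<String>> {
    // `output()` captures stdout and stderr fully, so neither pipe can fill up.
    let output = Command::new(pre).arg(path).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("preprocessor {} failed: {}", pre, output.status),
        ));
    }
    let stdout = String::from_utf8_lossy(&output.stdout);
    Ok(stdout
        .lines()
        .filter(|line| line.contains(needle))
        .map(String::from)
        .collect())
}
```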

Closes #978, Closes #981
2018-07-21 17:25:12 -04:00
Andrew Gallant
cd6c190967 ripgrep: use new BufferedStandardStream from termcolor
Specifically, this will use a buffered writer when not printing to a tty.
This fixes a long standing performance regression where ripgrep would
slow down dramatically if it needed to report a lot of matches.

Fixes #955
2018-06-23 20:49:05 -04:00
Jon Surrell
ca23a170f7 ripgrep: use exit code 2 to indicate error
Exit code 1 was used to indicate both "no results" and "error." Use
status code 2 to indicate errors, similar to grep's behavior.

Fixes #948 

PR #954
2018-06-19 07:41:44 -04:00
Andrew Gallant
0ee0b160b5 logging: add new --no-ignore-messages flag
The new --no-ignore-messages flag permits suppressing errors related to
parsing .gitignore or .ignore files. These error messages can be somewhat
annoying since they can surface from repositories that one has no control
over.

Fixes #646
2018-04-23 18:18:44 -04:00
Balaji Sivaraman
00520b30f5 output: add --stats flag
This commit provides basic support for a --stats flag, which will print
various aggregate statistics about a search after all of the results
have been printed. This is mostly intended to support a similar feature
found in the Silver Searcher. Note though that we don't emit the total
bytes searched; this is a first pass at an implementation and we can
improve upon it later.

Closes #411, Closes #799
2018-03-10 10:59:00 -05:00
Balaji Sivaraman
96f73293c0 cleanup: rename match_count to match_line_count 2018-03-10 10:23:38 -05:00
Andrew Gallant
c57d0fb4e8 config: add persistent configuration
This commit adds support for reading configuration files that change
ripgrep's default behavior. The format of the configuration file is an
"rc" style and is very simple. It is defined by two rules:

  1. Every line is a shell argument, after trimming ASCII whitespace.
  2. Lines starting with '#' (optionally preceded by any amount of
     ASCII whitespace) are ignored.

ripgrep will look for a single configuration file if and only if the
RIPGREP_CONFIG_PATH environment variable is set and is non-empty.
ripgrep will parse shell arguments from this file on startup and will
behave as if the arguments in this file were prepended to any explicit
arguments given to ripgrep on the command line.

For example, if your ripgreprc file contained a single line:

    --smart-case

then the following command

    RIPGREP_CONFIG_PATH=wherever/.ripgreprc rg foo

would behave identically to the following command

    rg --smart-case foo

This commit also adds a new flag, --no-config, that when present will
suppress any and all support for configuration. This includes any future
support for auto-loading configuration files from pre-determined paths
(which this commit does not add).

Conflicts between configuration files and explicit arguments are handled
exactly like conflicts in the same command line invocation. That is,
this command:

    RIPGREP_CONFIG_PATH=wherever/.ripgreprc rg foo --case-sensitive

is exactly equivalent to

    rg --smart-case foo --case-sensitive

in which case, the --case-sensitive flag would override the --smart-case
flag.
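The two parsing rules and the prepend step can be sketched as follows. The function names are hypothetical, and skipping blank lines is an assumption the two rules do not state. Override resolution (the later --case-sensitive winning) is left to the argument parser and is not modeled here.

```rust
/// Parses ripgreprc contents: trim ASCII whitespace from each line and drop
/// lines starting with '#'. Assumption: blank lines are also skipped.
fn parse_rc(contents: &str) -> Vec<String> {
    contents
        .lines()
        .map(|line| line.trim_matches(|c: char| c.is_ascii_whitespace()))
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect()
}

/// Config arguments come first, followed by the explicit command line
/// arguments, so a later explicit flag can override a config flag.
fn effective_args(rc: &str, cli: &[&str]) -> Vec<String> {
    let mut args = parse_rc(rc);
    args.extend(cli.iter().map(|arg| arg.to_string()));
    args
}
```

For a ripgreprc containing `--smart-case`, `effective_args` turns `foo --case-sensitive` into `--smart-case foo --case-sensitive`, matching the equivalence above.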

Closes #196
2018-02-04 10:40:20 -05:00
Andrew Gallant
3535047094 logger: drop env_logger
This commit updates the `log` crate to 0.4 and drops the dependency on
env_logger. In particular, the latest version of env_logger brings in
additional non-optional dependencies such as chrono that I don't think are
worth including in ripgrep.

It turns out ripgrep doesn't need any fancy logging. We just need a concept
of log levels and the ability to print to stderr. Therefore, we just roll
our own super simple logger.

This update is motivated by the persistent configuration task. In
particular, we need the ability to toggle the global log level more than
once, and this doesn't appear to be possible with older versions of the
log crate.
2018-02-04 10:40:20 -05:00
Andrew Gallant
e36b65a11a windows: fix OneDrive traversals
This commit fixes a bug on Windows where directory traversals were
completely broken when attempting to scan OneDrive directories that use
the "file on demand" strategy.

The specific problem was that Rust's standard library treats OneDrive
directories as reparse points instead of directories, which causes
methods like `FileType::is_file` and `FileType::is_dir` to always return
false, even when retrieved via methods like `metadata` that purport to
follow symbolic links.

We fix this by peppering our code with checks on the underlying file
attributes exposed by Windows. We consider an entry a directory if and
only if the directory bit is set on the attributes. We are careful to
make sure that the code remains the same on non-Windows platforms.
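A minimal sketch of the attribute check, assuming only the standard library's `MetadataExt::file_attributes` on Windows. `FILE_ATTRIBUTE_DIRECTORY` (0x10) is the Windows API constant; the helper names are hypothetical, and the real fix lives in ripgrep's and walkdir's traversal code.

```rust
use std::fs::Metadata;

/// FILE_ATTRIBUTE_DIRECTORY from the Windows API.
const FILE_ATTRIBUTE_DIRECTORY: u32 = 0x10;

/// A directory if and only if the directory bit is set, even when the
/// entry is also a reparse point (as OneDrive "files on demand" are).
fn attrs_say_dir(attributes: u32) -> bool {
    attributes & FILE_ATTRIBUTE_DIRECTORY != 0
}

#[cfg(windows)]
fn is_dir(md: &Metadata) -> bool {
    use std::os::windows::fs::MetadataExt;
    attrs_say_dir(md.file_attributes())
}

#[cfg(not(windows))]
fn is_dir(md: &Metadata) -> bool {
    // Non-Windows platforms keep the standard library's answer.
    md.is_dir()
}
```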

Note that we also bump the dependency on `walkdir`, which contains a
similar fix for its traversals.

This bug is recorded upstream:
https://github.com/rust-lang/rust/issues/46484

Upstream also has a pending PR:
https://github.com/rust-lang/rust/pull/47956

Fixes #705
2018-02-01 21:11:02 -05:00
Andrew Gallant
93943793c3 worker: better error handling for memory maps
Previously, we would bail out of using memory maps if we could detect
ahead of time that opening a memory map would fail. The only case we
checked was whether the file size was 0 or not.

This is actually insufficient. The mmap call can return ENODEV errors
when a file doesn't support memory maps. This is the case for new files
exposed by Linux, for example,
/sys/devices/system/cpu/vulnerabilities/meltdown.

We fix this by checking the actual error codes returned by the mmap call.
If ENODEV (or EOVERFLOW) is returned, then we fall back to regular `read`
calls. If any other error occurs, we report it to the user.
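The fallback rule can be sketched with `io::Error::raw_os_error`. Assumptions: the errno values below are the Linux ones (real code would take them from the `libc` crate), and `try_mmap` is a hypothetical stand-in for an actual memory map call.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Linux errno values; other platforms may use different numbers.
const ENODEV: i32 = 19;
const EOVERFLOW: i32 = 75;

/// Errors meaning "this file can't be memory mapped" rather than "this file is broken".
fn should_fall_back(err: &io::Error) -> bool {
    matches!(err.raw_os_error(), Some(ENODEV) | Some(EOVERFLOW))
}

/// `try_mmap` is a hypothetical stand-in for the memory map attempt.
fn load(path: &Path, try_mmap: impl FnOnce(&Path) -> io::Result<Vec<u8>>) -> io::Result<Vec<u8>> {
    match try_mmap(path) {
        Ok(bytes) => Ok(bytes),
        // Fall back to ordinary `read` calls.
        Err(err) if should_fall_back(&err) => fs::read(path),
        // Any other error is reported to the user.
        Err(err) => Err(err),
    }
}
```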

Fixes #760
2018-01-31 19:20:36 -05:00
Andrew Gallant
0fedaa7d28 style: remove eprintln macro
The eprintln! macro was added to Rust's standard library in Rust 1.19.0,
which is below ripgrep's minimum Rust version. Therefore, we can rely on
the standard library variant now.
2018-01-31 19:17:51 -05:00
Balaji Sivaraman
f007f940c5 search: add support for searching compressed files
This commit adds opt-in support for searching compressed files during
recursive search. This behavior is only enabled when the
`-z/--search-zip` flag is passed to ripgrep. When enabled, a limited set
of common compression formats is recognized via file extension, and a
new process is spawned to perform the decompression. ripgrep then
searches the stdout of that spawned process.
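Recognition by extension amounts to a lookup from extension to decompression command. The table below is a hypothetical illustration: this commit message does not list which formats or arguments ripgrep uses. The chosen command would then be spawned with the file path appended and its stdout searched.

```rust
use std::path::Path;

// Hypothetical table: program followed by arguments that decompress to stdout.
const GZIP: &[&str] = &["gzip", "-d", "-c"];
const BZIP2: &[&str] = &["bzip2", "-d", "-c"];
const XZ: &[&str] = &["xz", "-d", "-c"];

/// Returns the decompression command for `path`, or None if its extension
/// is not a recognized compression format.
fn decompression_command(path: &Path) -> Option<&'static [&'static str]> {
    match path.extension()?.to_str()? {
        "gz" | "tgz" => Some(GZIP),
        "bz2" | "tbz2" => Some(BZIP2),
        "xz" | "txz" => Some(XZ),
        _ => None,
    }
}
```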

Closes #539
2018-01-30 09:13:53 -05:00
Matthias Krüger
4d34132365 clippy: main.rs: call Clone() on trait instead of ref-counted pointers and pass Arc<Args> by ref more often. 2017-11-22 10:50:28 -05:00
Christof Marti
1136f8adab Avoid expensive check with --files (fixes #600) 2017-09-18 11:54:48 -04:00
Jack O'Connor
3065a8c9c8 restore the default SIGPIPE behavior as a temporary workaround
See https://github.com/BurntSushi/ripgrep/issues/200.
2017-08-27 15:01:05 -04:00
Vurich
b3a9c34515 Remove unused libc dependency 2017-08-08 07:03:58 -04:00
Marc Tiehuis
229b8e3b33 Make --quiet flag apply when using --files option
Fixes #483.
2017-05-19 20:00:47 -04:00
Andrew Gallant
8bbe58d623 Add support for additional text encodings.
This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and
Shift_JIS. (Courtesy of the `encoding_rs` crate.)

Specifically, this feature enables ripgrep to search files that are
encoded in an encoding other than UTF-8. The list of available encodings
is tied directly to what the `encoding_rs` crate supports, which is in
turn tied to the Encoding Standard. The full list of available encodings
can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get

This pull request also introduces the notion that text encodings can be
automatically detected on a best effort basis. Currently, the only
support for this is checking for a UTF-16 BOM. In all other cases, a
text encoding of `auto` (the default) implies a UTF-8 or ASCII
compatible source encoding. When a text encoding is otherwise specified,
it is unconditionally used for all files searched.

Since ripgrep's regex engine is fundamentally built on top of UTF-8,
this feature works by transcoding the files to be searched from their
source encoding to UTF-8. This transcoding only happens when:

1. `auto` is specified and a non-UTF-8 encoding is detected.
2. A specific encoding is given by end users (including UTF-8).

When transcoding occurs, errors are handled by automatically inserting
the Unicode replacement character. In this case, ripgrep's output is
guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they
are printed).

In all other cases, the source text is searched directly, which implies
an assumption that it is at least ASCII compatible, but where UTF-8 is
most useful. In this scenario, encoding errors are not detected. In this
case, ripgrep's output will match the input exactly, byte-for-byte.
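The `auto` path described above can be sketched with the standard library alone, as a stand-in for `encoding_rs`: transcode to UTF-8 only when a UTF-16 BOM is found, replacing invalid sequences with U+FFFD, and otherwise hand back the original bytes untouched. `auto_transcode` is a hypothetical name.

```rust
use std::borrow::Cow;

/// Transcodes UTF-16 (detected by BOM) to UTF-8; any other input is
/// returned as-is so it can be searched byte-for-byte.
fn auto_transcode(bytes: &[u8]) -> Cow<'_, [u8]> {
    let (little_endian, rest) = match bytes {
        [0xFF, 0xFE, rest @ ..] => (true, rest),
        [0xFE, 0xFF, rest @ ..] => (false, rest),
        // No BOM: assume ASCII-compatible input and do not transcode.
        _ => return Cow::Borrowed(bytes),
    };
    let pairs = rest.chunks_exact(2);
    let odd_trailing_byte = !pairs.remainder().is_empty();
    let units: Vec<u16> = pairs
        .map(|p| {
            if little_endian {
                u16::from_le_bytes([p[0], p[1]])
            } else {
                u16::from_be_bytes([p[0], p[1]])
            }
        })
        .collect();
    // Unpaired surrogates become U+FFFD, so the output is always valid UTF-8.
    let mut text = String::from_utf16_lossy(&units);
    if odd_trailing_byte {
        text.push('\u{FFFD}');
    }
    Cow::Owned(text.into_bytes())
}
```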

This design may not be optimal in all cases, but it has some advantages:

1. The happy path ("UTF-8 everywhere") remains happy. I have not been
   able to witness any performance regressions.
2. In the non-UTF-8 path, implementation complexity is kept relatively
   low. The cost here is transcoding itself. A potentially superior
   implementation might build decoding of any encoding into the regex
   engine itself. In particular, the fundamental problem with
   transcoding everything first is that literal optimizations are nearly
   negated.

Future work should entail improving the user experience. For example, we
might want to auto-detect more text encodings. A more elaborate UX
might permit end users to specify multiple text encodings,
although this seems hard to pull off in an ergonomic way.

Fixes #1
2017-03-12 19:54:48 -04:00
Andrew Gallant
8ac5bc0147 Remove Windows deps from ripgrep proper.
All Windows specific code has been (mostly) pushed out of ripgrep and
into its constituent libraries.
2017-02-18 15:06:20 -05:00
Andrew Gallant
f5a2d022ec Replace internal atty module with atty crate.
This removes all use of explicit unsafe in ripgrep proper except for
one: accessing the contents of a memory map. (Which may never go away.)
2017-01-15 16:32:30 -05:00
Andrew Gallant
461e0c4e33 Don't search stdout redirected file.
When running ripgrep like this:

    rg foo > output

we must be careful not to search `output` since ripgrep is actively writing
to it. Searching it can cause massive blowups where the file grows without
bound.

While this is conceptually easy to fix (check the inode of the redirection
and the inode of the file you're about to search), there are a few problems
with it.

First, inodes are a Unix thing, so we need a Windows specific solution to
this as well. To resolve this concern, I created a new crate, `same-file`,
which provides a cross platform abstraction.

Second, stat'ing every file is costly. This is not avoidable on Windows,
but on Unix, we can get the inode number directly from directory traversal.
However, this information wasn't exposed, but now it is (through both the
ignore and walkdir crates).
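On Unix, the core comparison behind the `same-file` crate can be sketched as a device and inode check. This is a Unix-only illustration with a hypothetical function name; the crate itself covers Windows too and also offers a handle for stdout to compare against.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Unix-only sketch: two paths name the same file when they share both a
/// device number and an inode number.
#[cfg(unix)]
fn same_file(a: &Path, b: &Path) -> io::Result<bool> {
    use std::os::unix::fs::MetadataExt;
    let (ma, mb) = (fs::metadata(a)?, fs::metadata(b)?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}
```

During traversal, ripgrep would skip any entry for which this comparison against the stdout redirection target is true.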

Fixes #286
2017-01-09 16:12:08 -05:00
Andrew Gallant
de5cb7d22e Remove special ^C handling.
This means that ripgrep will no longer try to reset your colors in your
terminal if you kill it while searching. This could result in messing up
the colors in your terminal, and the fix is to simply run some other
command that resets them for you. For example:

    $ echo -ne "\033[0m"

The ^C handling was removed because it is irrevocably broken on Windows
and is impossible to do correctly and efficiently in ANSI terminals.

Fixes #281
2016-12-24 12:53:09 -05:00
Andrew Gallant
084d3f4911 Small code cleanups. 2016-12-24 10:06:37 -05:00
Andrew Gallant
160f04894f Simplify code.
Instead of `Ok(n) if n == 0` we can just write `Ok(0)`.
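For example, in a typical read loop the literal pattern `Ok(0)` matches end-of-file directly. `count_bytes` is a hypothetical function used only to show the pattern.

```rust
use std::io::{self, Read};

/// Counts bytes until EOF, matching EOF with `Ok(0)` rather than `Ok(n) if n == 0`.
fn count_bytes<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut buf = [0u8; 4];
    let mut total = 0;
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(total),
            Ok(n) => total += n,
            // Retry reads interrupted by a signal.
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```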
2016-12-04 12:00:13 -05:00
Andrew Gallant
e8a30cb893 Completely re-work colored output and tty handling.
This commit completely guts all of the color handling code and replaces
most of it with two new crates: wincolor and termcolor. wincolor
provides a simple API to coloring using the Windows console and
termcolor provides a platform independent coloring API tuned for
multithreaded command line programs. This required a lot more
flexibility than what the `term` crate provided, so it was dropped.
We instead switch to writing ANSI escape sequences directly and ignore
the TERMINFO database.

In addition to fixing several bugs, this commit also permits end users
to customize colors to a certain extent. For example, this command will
set the match color to magenta and the line number background to yellow:

    rg --colors 'match:fg:magenta' --colors 'line:bg:yellow' foo
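A spec such as `match:fg:magenta` names an output element, a layer and a color. The hypothetical parser below maps such specs onto the standard 8-color ANSI escape codes (foreground 30-37, background 40-47); ripgrep's real --colors option accepts more forms than this sketch handles.

```rust
/// Hypothetical parser: returns the output element and the ANSI escape
/// sequence for specs shaped like `{element}:{fg|bg}:{color}`.
fn ansi_for(spec: &str) -> Option<(&str, String)> {
    let mut parts = spec.splitn(3, ':');
    let (element, layer, color) = (parts.next()?, parts.next()?, parts.next()?);
    let base = match layer {
        "fg" => 30,
        "bg" => 40,
        _ => return None,
    };
    let offset = match color {
        "black" => 0,
        "red" => 1,
        "green" => 2,
        "yellow" => 3,
        "blue" => 4,
        "magenta" => 5,
        "cyan" => 6,
        "white" => 7,
        _ => return None,
    };
    Some((element, format!("\x1b[{}m", base + offset)))
}
```

Writing the escape before a match and `\x1b[0m` after it resets the terminal, without consulting the TERMINFO database.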

For tty handling, we've adopted a hack from `git` to do tty detection in
MSYS/mintty terminals. As a result, ripgrep should get both color
detection and piping correct on Windows regardless of which terminal you
use.

Finally, we switch to line buffering. Performance doesn't seem to be
impacted and it's an otherwise more user friendly option.

Fixes #37, Fixes #51, Fixes #94, Fixes #117, Fixes #182, Fixes #231
2016-11-20 11:14:52 -05:00
Andrew Gallant
92dc402f7f Switch from Docopt to Clap.
There were two important reasons for the switch:

1. Performance. Docopt does poorly when the argv becomes large, which is
   a reasonably common use case for search tools. (e.g., use with xargs)
2. Better failure modes. Clap knows a lot more about how a particular
   argv might be invalid, and can therefore provide much clearer error
   messages.

While both were important, (1) made it urgent.

Note that since Clap requires at least Rust 1.11, this will in turn
increase the minimum Rust version supported by ripgrep from Rust 1.9 to
Rust 1.11. It is therefore a breaking change, so the soonest release of
ripgrep with Clap will have to be 0.3.

There is also at least one subtle breaking change in real usage.
Prior to this commit, this used to work:

    rg -e -foo

This would cause ripgrep to search for the string `-foo`. Clap
currently has problems supporting this use case
(see: https://github.com/kbknapp/clap-rs/issues/742),
but it can be worked around by using this instead:

    rg -e [-]foo

or even

    rg [-]foo

and this still works:

    rg -- -foo

This commit also adds Bash, Fish and PowerShell completion files to the
release, fixes a bug that prevented ripgrep from working on file
paths containing invalid UTF-8, and shows short descriptions in the
output of `-h` but longer descriptions in the output of `--help`.

Fixes #136, Fixes #189, Fixes #210, Fixes #230
2016-11-17 19:53:41 -05:00
Andrew Gallant
f24873c70b Don't ever search directories. 2016-11-06 19:02:14 -05:00
Andrew Gallant
9fc9f368f5 Always search paths given by user.
This permits doing `rg -a test /dev/sda1` for example, whereas before
/dev/sda1 was skipped because it wasn't a regular file.
2016-11-06 18:23:50 -05:00
Andrew Gallant
77ad7588ae Add --no-messages flag.
This flag is similar to what's found in grep: it will suppress all error
messages, such as those shown when a particular file couldn't be read.

Closes #149
2016-11-06 14:36:08 -05:00
Andrew Gallant
58aca2efb2 Add -m/--max-count flag.
This flag limits the number of matches printed *per file*.

Closes #159
2016-11-06 13:09:53 -05:00
Andrew Gallant
b272be25fa Add parallel recursive directory iterator.
This adds a new walk type in the `ignore` crate, `WalkParallel`, which
provides a way for recursively iterating over a set of paths in parallel
while respecting various ignore rules.

The API is a bit strange, as a closure producing a closure isn't
something one often sees, but it does seem to work well.

This also allowed us to simplify much of the worker logic in ripgrep
proper, where MultiWorker is now gone.
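The "closure producing a closure" shape can be sketched with only the standard library: the caller passes a factory, and each worker thread calls it once to obtain its own visitor, so per-thread state needs no locking. This is a simplified stand-in over a flat work list with a hypothetical `run_parallel` name, not the `ignore` crate's `WalkParallel` API, which walks directories recursively.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `threads` workers over `items`; each worker owns the visitor
/// returned by one call to `make_visitor`.
fn run_parallel<F>(items: Vec<String>, threads: usize, mut make_visitor: F)
where
    F: FnMut() -> Box<dyn FnMut(&str) + Send>,
{
    let queue = Arc::new(Mutex::new(items));
    let mut handles = Vec::new();
    for _ in 0..threads {
        let mut visit = make_visitor();
        let queue = Arc::clone(&queue);
        handles.push(thread::spawn(move || loop {
            // The lock guard is dropped at the end of this statement.
            let next = queue.lock().unwrap().pop();
            match next {
                Some(item) => visit(item.as_str()),
                None => break,
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
}
```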
2016-11-05 21:45:55 -04:00
Brian Campbell
79a8d0ab3f Reset the terminal when Ctrl-C is pressed
If a user hits Ctrl-C to exit out of a search in the middle of printing
a line, we don't want to leave the terminal colors screwed up for them.
Catch Ctrl-C using the ctrlc crate, obtain a stdout lock to ensure that
other threads don't continue writing after we do so, reset the terminal,
and exit the program.

Closes #119
2016-10-29 21:23:05 -04:00
Andrew Gallant
d79add341b Move all gitignore matching to separate crate.
This PR introduces a new sub-crate, `ignore`, which primarily provides a
fast recursive directory iterator that respects ignore files like
gitignore and other configurable filtering rules based on globs or even
file types.

This results in a substantial source of complexity moved out of ripgrep's
core and into a reusable component that others can now (hopefully)
benefit from.

While much of the ignore code carried over from ripgrep's core, a
substantial portion of it was rewritten with the following goals in
mind:

1. Reuse matchers built from gitignore files across directory iteration.
2. Design the matcher data structure to be amenable for parallelizing
   directory iteration. (Indeed, writing the parallel iterator is the
   next step.)

Fixes #9, #44, #45
2016-10-29 20:48:59 -04:00
Andrew Gallant
247a9398f4 Switch to thread_local crate in lieu of thread_local!.
This is to work around a bug where using a thread_local! was causing
a segfault on macOS.

Fixes #164.
2016-10-11 18:23:49 -04:00
Andrew Gallant
fdf24317ac Move glob implementation to new crate.
It is isolated and complex enough that it deserves attention all on its
own. It's also eminently reusable.
2016-09-30 19:42:41 -04:00
Andrew Gallant
46dff8f4be Be better with short circuiting with --quiet.
It didn't make sense for --quiet to be part of the printer, because --quiet
doesn't just mean "don't print," it also means, "stop after the first
match is found." This needs to be wired all the way up through directory
traversal, and it also needs to cause all of the search workers to quit
as well. We do it with an atomic that is only checked when --quiet is
given.
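A minimal sketch of that idea, assuming each "file" is already an in-memory list of lines handled by its own thread (ripgrep's real workers pull paths from the directory traversal). A shared `AtomicBool` is consulted only when --quiet is set, and the first worker to find a match sets it so every other worker stops. `search_all` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Returns true if any line in any file contains `needle`.
fn search_all(files: Vec<Vec<String>>, needle: &str, quiet: bool) -> bool {
    let quit = Arc::new(AtomicBool::new(false));
    let handles: Vec<_> = files
        .into_iter()
        .map(|lines| {
            let quit = Arc::clone(&quit);
            let needle = needle.to_string();
            thread::spawn(move || {
                let mut matched = false;
                for line in &lines {
                    // The shared flag is only checked under --quiet.
                    if quiet && quit.load(Ordering::Relaxed) {
                        break;
                    }
                    if line.contains(needle.as_str()) {
                        matched = true;
                        if quiet {
                            // First match anywhere: tell all workers to stop.
                            quit.store(true, Ordering::Relaxed);
                            break;
                        }
                        // Without --quiet, the line would be printed here.
                    }
                }
                matched
            })
        })
        .collect();
    // Join every worker before combining their results.
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .fold(false, |acc, m| acc || m)
}
```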

Fixes #116.
2016-09-28 20:50:50 -04:00
Andrew Gallant
3e78fce3a3 Don't print empty lines in single threaded mode.
Fixes #99.
2016-09-26 19:57:23 -04:00
Andrew Gallant
104d740f76 Don't quit if opening a file fails.
This was already working correctly in multithreaded mode, but in single
threaded mode, a file failing to open caused search to stop. That's bad.

Fixes #98.
2016-09-26 18:44:19 -04:00
Andrew Gallant
f85822266f Don't use an intermediate buffer when --threads=1.
Fixes #8
2016-09-25 21:27:17 -04:00