Compare commits

...

17 Commits

Author SHA1 Message Date
Andrew Gallant
9ffd4c421f ignore-0.1.5 2016-11-09 18:52:52 -05:00
Andrew Gallant
d862b80afb changelog 0.2.9 2016-11-09 18:52:08 -05:00
Andrew Gallant
5b73dcc8ab Rework parallelism in directory iterator.
Previously, ignore::WalkParallel would invoke the callback for all
*explicitly* given file paths in a single thread, which effectively
meant that `rg pattern foo bar baz ...` didn't actually search foo, bar
and baz in parallel.

The code was structured that way to avoid spinning up workers if no
directory paths were given. The original intention was probably to have
a separate pool of threads responsible for searching, but ripgrep ended
up just reusing the ignore::WalkParallel workers themselves for searching,
and thereby subjected to its sub-par performance in this case.

The code has been restructured so that file paths are sent to the workers,
which brings back parallelism.

Fixes #226
2016-11-09 17:19:40 -05:00
Andrew Gallant
2dce0dc0df Fix a bug with handling --ignore-file.
Namely, passing a directory to --ignore-file caused ripgrep to allocate
memory without bound.

The issue was that I got a bit overzealous with partial error
reporting. Namely, when processing a gitignore file, we should try
to use every pattern even if some patterns are invalid globs (e.g.,
a**b). In the process, I applied the same logic to I/O errors. In this
case, it manifest by attempting to read lines from a directory, which
appears to yield Results forever, where each Result is an error of the
form "you can't read from a directory silly." Since I treated it as a
partial error, ripgrep was just spinning and accruing each error in
memory, which caused the OOM killer to kick in.

Fixes #228
2016-11-09 16:45:23 -05:00
Andrew Gallant
2e5c3c05e8 reword 2016-11-06 19:48:49 -05:00
Andrew Gallant
6884eea2f5 reword 2016-11-06 19:48:17 -05:00
Andrew Gallant
a3a2f0be6a ucg author says it's not a bug per se 2016-11-06 19:45:18 -05:00
Andrew Gallant
f24873c70b Don't ever search directories. 2016-11-06 19:02:14 -05:00
Andrew Gallant
58126ffe15 touchups 2016-11-06 18:51:00 -05:00
Andrew Gallant
17644a76c0 typo 2016-11-06 18:49:07 -05:00
Andrew Gallant
9fc9f368f5 Always search paths given by user.
This permits doing `rg -a test /dev/sda1` for example, where as before
/dev/sda1 was skipped because it wasn't a regular file.
2016-11-06 18:23:50 -05:00
Andrew Gallant
9cab076a72 touchups 2016-11-06 18:04:55 -05:00
Andrew Gallant
7aa9652f3c touchups 2016-11-06 18:02:45 -05:00
Andrew Gallant
7187f61ca8 touchups 2016-11-06 18:01:55 -05:00
Andrew Gallant
f869c58a5a touchups 2016-11-06 17:59:57 -05:00
Andrew Gallant
3538ba3577 Update README with more/updated benchmarks 2016-11-06 17:55:38 -05:00
Andrew Gallant
a454fa75b9 Update brew tap. 2016-11-06 16:51:14 -05:00
10 changed files with 111 additions and 46 deletions

View File

@@ -1,3 +1,15 @@
0.2.9
=====
Bug fixes:
* [BUG #226](https://github.com/BurntSushi/ripgrep/issues/226):
File paths explicitly given on the command line weren't searched in parallel.
(This was a regression in `0.2.7`.)
* [BUG #228](https://github.com/BurntSushi/ripgrep/issues/228):
If a directory was given to `--ignore-file`, ripgrep's memory usage would
grow without bound.
0.2.8
=====
Bug fixes:

View File

@@ -15,11 +15,12 @@ Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
[![A screenshot of a sample search with ripgrep](http://burntsushi.net/stuff/ripgrep1.png)](http://burntsushi.net/stuff/ripgrep1.png)
### Quick example comparing tools
### Quick examples comparing tools
This example searches the entire Linux kernel source tree (after running
`make defconfig && make -j8`) for `[A-Z]+_SUSPEND`, where all matches must be
words. Timings were collected on a system with an Intel i7-6900K 3.2 GHz.
words. Timings were collected on a system with an Intel i7-6900K 3.2 GHz, and
ripgrep was compiled using the `compile` script in this repo.
Please remember that a single benchmark is never enough! See my
[blog post on `ripgrep`](http://blog.burntsushi.net/ripgrep/)
@@ -27,17 +28,41 @@ for a very detailed comparison with more benchmarks and analysis.
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
| ripgrep | `rg -n -w '[A-Z]+_SUSPEND'` | 450 | **0.245s** |
| ripgrep (Unicode) | `rg -n -w '[A-Z]+_SUSPEND'` | 450 | **0.134s** |
| [The Silver Searcher](https://github.com/ggreer/the_silver_searcher) | `ag -w '[A-Z]+_SUSPEND'` | 450 | 0.753s |
| [git grep](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=C git grep -E -n -w '[A-Z]+_SUSPEND'` | 450 | 0.823s |
| [git grep (Unicode)](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=en_US.UTF-8 git grep -E -n -w '[A-Z]+_SUSPEND'` | 450 | 2.880s |
| [sift](https://github.com/svent/sift) | `sift --git -n -w '[A-Z]+_SUSPEND'` | 450 | 3.656s |
| [The Platinum Searcher](https://github.com/monochromegane/the_platinum_searcher) | `pt -w -e '[A-Z]+_SUSPEND'` | 450 | 12.369s |
| [ack](http://beyondgrep.com/) | `ack -w '[A-Z]+_SUSPEND'` | 1878 | 16.952s |
| [ack](https://github.com/petdance/ack2) | `ack -w '[A-Z]+_SUSPEND'` | 1878 | 16.952s |
(Yes, `ack` [has](https://github.com/petdance/ack2/issues/445) a
[bug](https://github.com/petdance/ack2/issues/14).)
Here's another benchmark that disregards gitignore files and searches with a
whitelist instead. The corpus is the same as in the previous benchmark, and the
flags passed to each command ensures that they are doing equivalent work:
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
| ripgrep | `rg -L -u -tc -n -w '[A-Z]+_SUSPEND'` | 404 | **0.108s** |
| [ucg](https://github.com/gvansickle/ucg) | `ucg --type=cc -w '[A-Z]+_SUSPEND'` | 392 | 0.219s |
| [GNU grep](https://www.gnu.org/software/grep/) | `egrep -R -n --include='*.c' --include='*.h' -w '[A-Z]+_SUSPEND'` | 404 | 0.733s |
(`ucg` [has slightly different behavior in the presence of symbolic links](https://github.com/gvansickle/ucg/issues/106).)
And finally, a straight up comparison between ripgrep and GNU grep on a single
large file (~9.3GB,
[`OpenSubtitles2016.raw.en.gz`](http://opus.lingfil.uu.se/OpenSubtitles2016/mono/OpenSubtitles2016.raw.en.gz)):
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
| ripgrep | `rg -w 'Sherlock [A-Z]\w+'` | 5268 | **2.520s** |
| [GNU grep](https://www.gnu.org/software/grep/) | `LC_ALL=C egrep -w 'Sherlock [A-Z]\w+'` | 5268 | 7.143s |
In the above benchmark, passing the `-n` flag (for showing line numbers)
increases the times to `3.081s` for ripgrep and `11.403s` for GNU grep.
### Why should I use `ripgrep`?
* It can replace both The Silver Searcher and GNU grep because it is faster
@@ -82,8 +107,9 @@ Summarizing, `ripgrep` is fast because:
[`RegexSet`](https://doc.rust-lang.org/regex/regex/struct.RegexSet.html).
That means a single file path can be matched against multiple glob patterns
simultaneously.
* Uses a Chase-Lev work-stealing queue for quickly distributing work to
multiple threads.
* It uses a lock-free parallel recursive directory iterator, courtesy of
[`crossbeam`](https://docs.rs/crossbeam) and
[`ignore`](https://docs.rs/ignore).
### Installation
@@ -282,9 +308,12 @@ If you have a Rust nightly compiler, then you can enable optional SIMD
acceleration like so:
```
RUSTFLAGS="-C target-cpu=native" cargo build --release --features simd-accel
RUSTFLAGS="-C target-cpu=native" cargo build --release --features 'simd-accel avx-accel'
```
If your machine doesn't support AVX instructions, then simply remove
`avx-accel` from the features list. Similarly for SIMD.
### Running tests
`ripgrep` is relatively well tested, including both unit tests and integration

View File

@@ -1,6 +1,6 @@
[package]
name = "ignore"
version = "0.1.4" #:version
version = "0.1.5" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A fast library for efficiently matching ignore files such as `.gitignore`

View File

@@ -311,7 +311,7 @@ impl GitignoreBuilder {
Ok(line) => line,
Err(err) => {
errs.push(Error::Io(err).tagged(path, lineno));
continue;
break;
}
};
if let Err(err) = self.add_line(Some(path.to_path_buf()), &line) {

View File

@@ -451,7 +451,7 @@ impl WalkBuilder {
pub fn add_ignore<P: AsRef<Path>>(&mut self, path: P) -> Option<Error> {
let mut builder = GitignoreBuilder::new("");
let mut errs = PartialErrorBuilder::default();
errs.maybe_push_ignore_io(builder.add(path));
errs.maybe_push(builder.add(path));
match builder.build() {
Ok(gi) => { self.ig_builder.add_ignore(gi); }
Err(err) => { errs.push(err); }
@@ -747,40 +747,33 @@ impl WalkParallel {
let mut f = mkf();
let threads = self.threads();
let queue = Arc::new(MsQueue::new());
let mut any_dirs = false;
let mut any_work = false;
// Send the initial set of root paths to the pool of workers.
// Note that we only send directories. For files, we send to them the
// callback directly.
for path in self.paths {
if path == Path::new("-") {
if f(Ok(DirEntry::new_stdin())).is_quit() {
return;
}
continue;
}
let dent = match DirEntryRaw::from_path(0, path) {
Ok(dent) => DirEntry::new_raw(dent, None),
Err(err) => {
if f(Err(err)).is_quit() {
return;
let dent =
if path == Path::new("-") {
DirEntry::new_stdin()
} else {
match DirEntryRaw::from_path(0, path) {
Ok(dent) => DirEntry::new_raw(dent, None),
Err(err) => {
if f(Err(err)).is_quit() {
return;
}
continue;
}
}
continue;
}
};
if !dent.file_type().map_or(false, |t| t.is_dir()) {
if f(Ok(dent)).is_quit() {
return;
}
} else {
any_dirs = true;
queue.push(Message::Work(Work {
dent: dent,
ignore: self.ig_root.clone(),
}));
}
};
queue.push(Message::Work(Work {
dent: dent,
ignore: self.ig_root.clone(),
}));
any_work = true;
}
// ... but there's no need to start workers if we don't need them.
if !any_dirs {
if !any_work {
return;
}
// Create the workers and then wait for them to finish.
@@ -839,6 +832,11 @@ struct Work {
}
impl Work {
/// Returns true if and only if this work item is a directory.
fn is_dir(&self) -> bool {
self.dent.file_type().map_or(false, |t| t.is_dir())
}
/// Adds ignore rules for parent directories.
///
/// Note that this only applies to entries at depth 0. On all other
@@ -921,6 +919,15 @@ impl Worker {
fn run(mut self) {
while let Some(mut work) = self.get_work() {
let depth = work.dent.depth();
// If this is an explicitly given path and is not a directory,
// then execute the caller's callback and move on.
if depth == 0 && !work.is_dir() {
if (self.f)(Ok(work.dent)).is_quit() {
self.quit_now();
return;
}
continue;
}
if self.parents {
if let Some(err) = work.add_parents() {
if (self.f)(Err(err)).is_quit() {

View File

@@ -1,9 +1,9 @@
class RipgrepBin < Formula
version '0.2.5'
version '0.2.8'
desc "Search tool like grep and The Silver Searcher."
homepage "https://github.com/BurntSushi/ripgrep"
url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-apple-darwin.tar.gz"
sha256 "c6775a50c6f769de2ee66892a700961ec60a85219aa414ef6880dfcc17bf2467"
sha256 "349aba7561028e869932bae8fd27cd5ce45a68f47f05d426d6701a50a8474aa0"
conflicts_with "ripgrep"

View File

@@ -1,3 +1,4 @@
use std::cmp;
use std::env;
use std::io;
use std::path::{Path, PathBuf};
@@ -376,7 +377,7 @@ impl RawArgs {
};
let threads =
if self.flag_threads == 0 {
num_cpus::get()
cmp::min(12, num_cpus::get())
} else {
self.flag_threads
};

View File

@@ -271,10 +271,18 @@ fn get_or_log_dir_entry(
eprintln!("{}", err);
}
}
if !dent.file_type().map_or(true, |x| x.is_file()) {
None
} else {
let ft = match dent.file_type() {
None => return Some(dent), // entry is stdin
Some(ft) => ft,
};
// A depth of 0 means the user gave the path explicitly, so we
// should always try to search it.
if dent.depth() == 0 && !ft.is_dir() {
Some(dent)
} else if ft.is_file() {
Some(dent)
} else {
None
}
}
}

View File

@@ -550,9 +550,10 @@ impl InputBuffer {
keep_from: usize,
) -> Result<bool, io::Error> {
// Rollover bytes from buf[keep_from..end] and update our various
// pointers. N.B. This could be done with the unsafe ptr::copy, but
// I haven't been able to produce a benchmark that notices a difference
// in performance. (Invariably, ptr::copy is also clearer IMO.)
// pointers. N.B. This could be done with the ptr::copy, but I haven't
// been able to produce a benchmark that notices a difference in
// performance. (Invariably, ptr::copy is seems clearer IMO, but it is
// not safe.)
self.tmp.clear();
self.tmp.extend_from_slice(&self.buf[keep_from..self.end]);
self.buf[0..self.tmp.len()].copy_from_slice(&self.tmp);

View File

@@ -893,6 +893,13 @@ clean!(regression_206, "test", ".", |wd: WorkDir, mut cmd: Command| {
assert_eq!(lines, format!("{}:test\n", path("foo/bar.txt")));
});
// See: https://github.com/BurntSushi/ripgrep/issues/228
clean!(regression_228, "test", ".", |wd: WorkDir, mut cmd: Command| {
wd.create_dir("foo");
cmd.arg("--ignore-file").arg("foo");
wd.assert_err(&mut cmd);
});
// See: https://github.com/BurntSushi/ripgrep/issues/20
sherlock!(feature_20_no_filename, "Sherlock", ".",
|wd: WorkDir, mut cmd: Command| {