ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-08-01 04:32:01 -07:00

Author	SHA1	Message	Date
Andrew Gallant	f158a42a71	ignore: correctly detect hidden files on Windows This commit fixes a bug where ripgrep only treated files beginning with a `.` as hidden. On Windows, we continue this tradition, but additionally check whether a file has the special Windows "hidden" attribute set. If so, we treat it as a hidden file. In order to make this work without an additional stat call, we had to rearrange some of the plumbing from the directory traverser. Fixes #1154	2019-01-27 12:11:52 -05:00
David Torosyan	718a00f6f2	ripgrep: add --ignore-file-case-insensitive The --ignore-file-case-insensitive flag causes all .gitignore/.rgignore/.ignore files to have their globs matched without regard for case. Because this introduces a potentially significant performance regression, this is always disabled by default. Users that need case insensitive matching can enable it on a case by case basis. Closes #1164, Closes #1170	2019-01-22 20:03:59 -05:00
Andrew Gallant	662a9bc73d	deps: update to crossbeam-channel 0.3 This also requires corresponding updates to both rand and rand_core. Doing an update of rand without doing an update of rand_core results in compilation errors because two distinct versions of rand_core are included in the build, and the traits they expose are distinct and incompatible. We also switch over to using tempfile instead of tempdir, which drops the last remaining thing keeping rand 0.4 in the build. Fixes #1141, Fixes #1142	2018-12-15 08:40:04 -05:00
Andrew Gallant	fd22cd520b	windows: fix unused warnings on Windows	2018-09-04 23:18:55 -04:00
Aaron Power	d18839f3dc	ignore: add into_path for DirEntry (#1031) This commit adds ignore::DirEntry::into_path to match the corresponding method on walkdir::DirEntry.	2018-08-28 18:27:34 -04:00
Andrew Gallant	e5bb750995	ignore: add 'stdout' skipping to the walker This commit adds a new 'skip_stdout' option to the directory walker. When enabled, it will skip yielding any directory entries that are believed to correspond to stdout for the current process. This is useful for filtering out 'results' in a command like 'grep -r foo > results' in order to avoid an unbounded feedback mechanism.	2018-08-27 21:18:53 -04:00
Andrew Gallant	510f15f4da	ignore: add sort_by_file_path builder method This permits callers to sort entries by their full file path, which makes it easy to query for various file statistics. It would have been better to provide a comparator on DirEntry itself, similar to how walkdir does it, but this seems to require quite a bit of work to make the types work out, assuming we want to continue to use walkdir's sorting support (we do).	2018-08-26 18:42:25 -04:00
Andrew Gallant	f9ce7a84a8	ignore: add 'same_file_system' option This commit adds a 'same_file_system' option to the walk builder. For single threaded walking, it defers to the walkdir crate, which has the same option. The bulk of this commit implements this flag for the parallel walker. We add one very feeble test for this. The parallel walker is now officially a complete mess. Closes #321	2018-08-26 18:42:25 -04:00
Andrew Gallant	2f3dbf5fee	ignore: fix false positive in path_is_symlink This commit fixes a bug where the first path always reported itself as a symlink via `path_is_symlink`. Part of this fix includes updating walkdir to 2.2.1, which also includes a corresponding bug fix. Fixes #984	2018-08-21 23:05:52 -04:00
Andrew Gallant	1529ce3341	ripgrep: remove workaround for std bug This commit undoes a work-around for a bug in Rust's standard library that prevented correct file type detection on Windows in OneDrive directories. We remove the work-around because we are moving to a latest-stable Rust version policy, which has included this fix for a while now. ref #705, https://github.com/rust-lang/rust/issues/46484	2018-08-21 23:05:52 -04:00
Andrew Gallant	95a4f15916	ignore: clarify docs for DirEntry::error Fixes #953	2018-08-21 23:05:52 -04:00
Andrew Gallant	0eef05142a	ripgrep: move minimum version to Rust stable This also updates some code to make use of our more liberal versioning requirement, including the use of crossbeam-channel instead of the MsQueue from the older and unmaintained crossbeam 0.3. This does regrettably add a sizable number of dependencies; however, compile times seem mostly unaffected. Closes #1019	2018-08-21 23:05:52 -04:00
Andrew Gallant	e65ca21a6c	ignore: only respect .gitignore in git repos This commit fixes an interesting bug in the `ignore` crate where it would basically respect any `.gitignore` file anywhere (including global gitignores in `~/.config/git/ignore`), regardless of whether we were searching in a git repository or not. This commit rectifies that behavior to only respect gitignore rules when in a git repo. The key change here is to move the logic of whether to traverse parents into the directory matcher rather than putting the onus on the directory traverser. In particular, we now need to traverse parent directories in more cases than we previously did, since we need to determine whether we're in a git repository or not. Fixes #934	2018-07-22 10:33:23 -04:00
phiresky	aa2ce39d14	ignore: fix has_any_ignore_rules for explicit ignores When building an ignore::WalkBuilder by disabling all standard filters and adding a custom global ignore file, the ignore file is not used. Example: let mut walker = ignore::WalkBuilder::new(dir); walker.standard_filters(false); walker.add_ignore(myfile); This makes it impossible to use the ignore crate to walk a directory with only custom ignore files. Very similar to issue #800 (fixed in `b71a110`). PR #988	2018-07-21 13:26:54 -04:00
Andrew Gallant	ebdb7c1d4c	ignore: impl Clone for DirEntry There is a small hiccup here in that a `DirEntry` can embed errors associated with reading an ignore file, which can be accessed and logged by consumers if desired. That error type can contain an io::Error, which isn't cloneable. We therefore implement Clone on our library's error type in a way that re-creates the I/O error as best as possible. Fixes #891	2018-04-21 12:01:11 -04:00
Andrew Gallant	d65966efbc	ignore: fix performance regression on Windows This commit fixes a performance regression in Windows that resulted from fallout from fixing #705. In particular, we introduced an additional stat call for every single directory entry, which can be quite disastrous for performance. There is a corresponding companion PR that fixes the same bug in walkdir: https://github.com/BurntSushi/walkdir/pull/96 Fixes #820	2018-02-20 19:50:52 -05:00
Andrew Gallant	18f549d289	ignore: fix symlink following on Windows This commit fixes a bug where symlinks were always being followed on Windows, even if the user did not request it. This only impacts the parallel iterator. This is a regression from the fallout of fixing #705. Fixes #824	2018-02-19 20:52:37 -05:00
David Peter	b71a110ccf	ignore: fix custom ignore name bug This commit fixes a bug in the handling of custom gitignore file names. Previously, the directory walker would check for whether there were any ignore rules present, but this check didn't incorporate the custom gitignore rules. At a high level, this permits custom gitignore names to be used even if no other source of gitignore rules is used. Fixes #800	2018-02-14 06:53:26 -05:00
Andrew Gallant	e36b65a11a	windows: fix OneDrive traversals This commit fixes a bug on Windows where directory traversals were completely broken when attempting to scan OneDrive directories that use the "file on demand" strategy. The specific problem was that Rust's standard library treats OneDrive directories as reparse points instead of directories, which causes methods like `FileType::is_file` and `FileType::is_dir` to always return false, even when retrieved via methods like `metadata` that purport to follow symbolic links. We fix this by peppering our code with checks on the underlying file attributes exposed by Windows. We consider an entry a directory if and only if the directory bit is set on the attributes. We are careful to make sure that the code remains the same on non-Windows platforms. Note that we also bump the dependency on `walkdir`, which contains a similar fix for its traversals. This bug is recorded upstream: https://github.com/rust-lang/rust/issues/46484 Upstream also has a pending PR: https://github.com/rust-lang/rust/pull/47956 Fixes #705	2018-02-01 21:11:02 -05:00
ptzz	3cb4d1337e	ignore: support custom file names This commit adds support for ignore files with custom names. This allows for application specific ignorefile names, e.g. using `.fdignore` for `fd`. See also: https://github.com/BurntSushi/ripgrep/issues/673 See also: https://github.com/sharkdp/fd/issues/156	2018-01-29 16:06:05 -05:00
Balaji Sivaraman	b6177f0459	cleanup: replace try! with ?	2018-01-01 09:22:35 -05:00
Andrew Gallant	5714dbde09	ignore: partially revert symlink loop check optimization This optimization wasn't tested too carefully, and it seems to result in a massive amount of file handles open simultaneously. This is likely a result of the parallel iterator, where many directories are being traversed simultaneously. Fixes #648	2017-10-22 10:31:34 -04:00
Andrew Gallant	1bf9d29259	ignore: be fastidious with file handles This commit fixes the symlink loop checker in the parallel directory traverser to open fewer handles at the expense of keeping handles held open longer. This roughly matches the corresponding change in walkdir: `5bcc5b87ee` Fixes #633	2017-10-21 22:40:10 -04:00
Andrew Gallant	cd575d99f8	ignore: upgrade to walkdir 2 The uninteresting bits of this commit involve mechanical changes for updates to walkdir 2. The more interesting bits of this commit are the breaking changes, although none of them should require any significant change on users of this library. The breaking changes are as follows: * `DirEntry::path_is_symbolic_link` has been renamed to `DirEntry::path_is_symlink`. This matches the conventions in the standard library, and also the corresponding name change in walkdir. * Removed the `From<walkdir::Error> for ignore::Error` impl. This was intended to only be used internally, but was the only thing that made `walkdir` a public dependency of `ignore`. Therefore, we remove it since it seems unnecessary. * Renamed `WalkBuilder::sort_by` to `WalkBuilder::sort_by_file_name`, and changed the type of the comparator from Fn(&OsString, &OsString) -> cmp::Ordering + 'static to Fn(&OsStr, &OsStr) -> cmp::Ordering + Send + Sync + 'static The corresponding change in `walkdir` retains the `sort_by` name, but gives the comparator a pair of `&DirEntry` values instead of a pair of `&OsStr` values. Ideally, `ignore` would hand off its own pair of `&ignore::DirEntry` values, but this requires more design work. So for now, we retain previous functionality, but leave room to make a proper `sort_by` method. [breaking-change]	2017-10-21 22:40:09 -04:00
Alex Burka	a5f82e8826	ignore: add grouped toggle for standard filters	2017-09-02 12:28:59 -04:00
Alex Burka	82d101907a	ignore: document git_global enabled by default	2017-08-26 14:49:40 -04:00
Jordan Danford	c8a5a7a3f4	Fix minor grammar issues in docs for `ignore::Walk`	2017-07-06 06:58:14 -04:00
Marc Tiehuis	71585f6d47	Reduce unnecessary stat calls for max_filesize	2017-03-08 10:17:18 -05:00
tiehuis	49fd668712	Add file size exclusion to walker A maximum filesize can be specified as an argument to a `WalkBuilder`. If a file exceeds the specified size it will be ignored as part of the resulting file/directory set. The filesize limit never applies to directories.	2017-03-08 10:17:18 -05:00
Andrew Gallant	461e0c4e33	Don't search stdout redirected file. When running ripgrep like this: rg foo > output we must be careful not to search `output` since ripgrep is actively writing to it. Searching it can cause massive blowups where the file grows without bound. While this is conceptually easy to fix (check the inode of the redirection and the inode of the file you're about to search), there are a few problems with it. First, inodes are a Unix thing, so we need a Windows specific solution to this as well. To resolve this concern, I created a new crate, `same-file`, which provides a cross platform abstraction. Second, stat'ing every file is costly. This is not avoidable on Windows, but on Unix, we can get the inode number directly from directory traversal. However, this information wasn't exposed, but now it is (through both the ignore and walkdir crates). Fixes #286	2017-01-09 16:12:08 -05:00
Andrew Gallant	b65a8c353b	Add --sort-files flag. When used, parallelism is disabled but the results are sorted by file path. Closes #263	2017-01-06 22:43:59 -05:00
Andrew Gallant	95cea77625	Tweak the parallel directory iterator. This commit fixes two issues. First, the iterator was executing the callback for every child of a directory in a single thread. Therefore, if the walker was run over a single directory, then no parallelism is used. We tweak the iterator slightly so that we don't fall into this trap. The second issue is a bit more subtle. In particular, we don't use the blocking semantics of MsQueue because we don't know when iteration finishes. This means that if there are a bunch of idle workers because there is no work available to them, then they will spin and burn the CPU. One case where this crops up is if you pipe the output of ripgrep into `less` and the total number of files to search is fewer than the number of threads ripgrep uses. We "fix" this with a very stupid heuristic: when the queue yields no work, we sleep the thread for 1ms. This still pegs the CPU, but not nearly as much as before. If one really wants to avoid this behavior when using ripgrep, then `-j1` can be used to disable parallelism. Fixes #258	2017-01-06 21:43:49 -05:00
Andrew Gallant	bb70f96743	Fix a non-termination bug. This was a very silly bug. Instead of creating a particular atomic once and cloning it, we created a new value for each worker. Fixes #279	2016-12-12 06:55:49 -05:00
Andrew Gallant	7282706b42	Fix bug reading root symlink. When given an explicit file path on the command line like `foo` where `foo` is a symlink, ripgrep should follow it even if `-L` isn't set. This is consistent with the behavior of `foo/`. Fixes #256	2016-12-05 20:05:57 -05:00
Andrew Gallant	5b73dcc8ab	Rework parallelism in directory iterator. Previously, ignore::WalkParallel would invoke the callback for all explicitly given file paths in a single thread, which effectively meant that `rg pattern foo bar baz ...` didn't actually search foo, bar and baz in parallel. The code was structured that way to avoid spinning up workers if no directory paths were given. The original intention was probably to have a separate pool of threads responsible for searching, but ripgrep ended up just reusing the ignore::WalkParallel workers themselves for searching, and thereby subjected to its sub-par performance in this case. The code has been restructured so that file paths are sent to the workers, which brings back parallelism. Fixes #226	2016-11-09 17:19:40 -05:00
Andrew Gallant	2dce0dc0df	Fix a bug with handling --ignore-file. Namely, passing a directory to --ignore-file caused ripgrep to allocate memory without bound. The issue was that I got a bit overzealous with partial error reporting. Namely, when processing a gitignore file, we should try to use every pattern even if some patterns are invalid globs (e.g., a**b). In the process, I applied the same logic to I/O errors. In this case, it manifested as an attempt to read lines from a directory, which appears to yield Results forever, where each Result is an error of the form "you can't read from a directory silly." Since I treated it as a partial error, ripgrep was just spinning and accruing each error in memory, which caused the OOM killer to kick in. Fixes #228	2016-11-09 16:45:23 -05:00
Andrew Gallant	b272be25fa	Add parallel recursive directory iterator. This adds a new walk type in the `ignore` crate, `WalkParallel`, which provides a way for recursively iterating over a set of paths in parallel while respecting various ignore rules. The API is a bit strange, as a closure producing a closure isn't something one often sees, but it does seem to work well. This also allowed us to simplify much of the worker logic in ripgrep proper, where MultiWorker is now gone.	2016-11-05 21:45:55 -04:00
Andrew Gallant	d79add341b	Move all gitignore matching to separate crate. This PR introduces a new sub-crate, `ignore`, which primarily provides a fast recursive directory iterator that respects ignore files like gitignore and other configurable filtering rules based on globs or even file types. This results in a substantial source of complexity moved out of ripgrep's core and into a reusable component that others can now (hopefully) benefit from. While much of the ignore code carried over from ripgrep's core, a substantial portion of it was rewritten with the following goals in mind: 1. Reuse matchers built from gitignore files across directory iteration. 2. Design the matcher data structure to be amenable for parallelizing directory iteration. (Indeed, writing the parallel iterator is the next step.) Fixes #9, #44, #45	2016-10-29 20:48:59 -04:00

38 Commits