This is why I was so intent on clearing the PR queue. This will
effectively invalidate all existing patches, so I wanted to start from a
clean slate.
We do make one little tweak: we put the default type definitions in
their own file and tell rustfmt to keep its grubby mits off of it. We
also sort it lexicographically and hopefully will enforce that from here
on.
Change the meaning of `Quit` message. Now it means terminate. The final
"dance" is unnecessary, because by the time quitting begins, no thread
will ever spawn a new `Work`. The trick was to replace the heuristic
spin-loop with blocking receive.
Closes#1337
Due to how walkdir works if symlinks are not followed, symlinks to
directories are seen as simple files by ripgrep. This caused a panic
in some cases due to receiving a WalkEvent::Exit event without a
corresponding WalkEvent::Dir event.
This is fixed by looking at the metadata of the file in the case of a
symlink to determine if it's a directory. We are careful to only do
this stat check when the depth of the entry is 0, as this bug only
impacts us when 1) we aren't following symlinks generally and 2) the
user provides a symlinked directory that we do follow as a top-level
path to search.
Fixes#1389, Closes#1397
This flag prevents ripgrep from requiring one to search a git repository
in order to respect git-related ignore rules (global, .gitignore and
local excludes). This actually corresponds to behavior ripgrep had long
ago, but #934 changed that. It turns out that users were relying on this
buggy behavior. In most cases, fixing it as simple as converting one's
rules to .ignore or .rgignore files. Unfortunately, there are other use
cases---like Perforce automatically respecting .gitignore files---that
make a strong case for ripgrep to at least support this.
The UX of a flag like this is absolutely atrocious. It's so obscure that
it's really not worth explicitly calling it out anywhere. Moreover, the
error cases that occur when this flag isn't used (but its behavior is
desirable) will not be intuitive, do not seem easily detectable and will
not guide users to this flag. Nevertheless, the motivation for this is
just barely strong enough for me to begrudgingly accept this.
Fixes#1414, Closes#1416
This commit fixes an inconsistency between the serial and the parallel
directory walkers around visiting a directory for which the user holds
insufficient permissions to descend into.
The serial walker does produce a successful entry for a directory that
it cannot descend into due to insufficient permissions. However, before
this change that has not been the case for the parallel walker, which
would produce an `Err` item not only when descending into a directory
that it cannot read from but also for the directory entry itself.
This change brings the behaviour of the parallel variant in line with
that of the serial one.
Fixes#1346, Closes#1365
On top of the parallel-walk's closures, this provides a Visitor API.
This clarifies the role of the two different closures in the `run`
API and allows implementing of `Drop` for post-processing once traversal
is finished.
The closure API is maintained not just for compatibility but also
convinience for simple cases.
Fixes#469, Closes#1430
This makes it so the caller can more easily refactor from
single-threaded to multi-threaded walking. If they want to support both,
this makes it easier to do so with a single initialization code-path. In
particular, it side-steps the need to put everything into an `Arc`.
This is not a breaking change because it strictly increases the number
of allowed inputs to `WalkParallel::run`.
Closes#1410, Closes#1432
Git looks for this file in GIT_COMMON_DIR, which is usually the same
as GIT_DIR (.git). However, when searching inside a linked worktree,
.git is usually a file that contains the path of the actual git dir,
which in turn contains a file "commondir" which references the directory
where info/exclude may reside, alongside other configuration shared across
all worktrees. This directory is usually the git dir of the main worktree.
Unlike git this does *not* read environment variables GIT_DIR and
GIT_COMMON_DIR, because it is not clear how to interpret them when
searching multiple repositories.
Fixes#1445, Closes#1446
This commit adds a simple `.exists()` check for `.gitignore`,
`.ignore`, and other similar files before actually calling
`File::open(…)` in `GitIgnoreBuilder::add`.
The reason is that a simple existence check via `stat` can be faster
than actually trying to `open` the file, see
https://stackoverflow.com/a/12774387/704831. As we typically expect(?)
the number of directories *without* ignore files to be much larger
than the number of directories *with* ignore files, this leads to an
overall speedup.
The performance gain is not huge for `rg`, but can be quite significant
if more `.gitignore`-like files are added via
`add_custom_ignore_filename`. The speedup is *larger* for folders with
*low* files-per-directory ratios.
Note though that we do not do this check on Windows until a specific
analysis there suggests this is beneficial. Namely, Windows generally
has slower file system operations, so it's not clear whether this
speculative check is actually a benefit or not.
Benchmark results
-----------------
`rg --files` in my home folder (200k results, 6.5 files per directory):
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 396.4 ± 3.2 | 390.9 | 400.0 | 1.05 |
| `./rg-feature --files` | 376.0 ± 3.6 | 369.3 | 383.5 | 1.00 |
`rg --files --hidden` in my home folder (800k results, 5.4
files per directory)
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files --hidden` | 1.575 ± 0.012 | 1.560 | 1.597 | 1.06 |
| `./rg-feature --files --hidden` | 1.479 ± 0.011 | 1.464 | 1.496 | 1.00 |
`rg --files` in the chromium-79.0.3915.2 source tree (300k results, 12.7 files per
directory)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `~/rg-master --files` | 445.2 ± 5.3 | 435.6 | 453.0 | 1.04 |
| `~/rg-feature --files` | 428.9 ± 7.0 | 418.2 | 440.0 | 1.00 |
`rg --files` in the linux-5.3 source tree (65k results, 15.1
files per directory)
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 94.5 ± 1.9 | 89.8 | 98.5 | 1.02 |
| `./rg-feature --files` | 92.6 ± 2.7 | 88.4 | 98.7 | 1.00 |
Closes#1381
.org_archive is the default extension for Org archive files, created when
entries from an Org-mode file are archived (see
<https://orgmode.org/org.html#Moving-subtrees>). These files are still in Org
mode format, so it's worth searching them at the same time as non-archive Org
mode files.
PR #1475
We were only using it to create temporary directories for `ignore`
tests, but it pulls in a bunch of dependencies and we don't really need
randomness. So just use our own simple wrapper instead.
Currently the crate assumes that exactly one of `cfg(windows)` or
`cfg(unix)` is true, but this is not actually the case, for instance
when compiling for `wasm32`.
Implement the missing functions so that the crate can compile on other
platforms, even though those functions will always return an error.
PR #1327
This includes:
*.dtd for Document Type Definitions
*.xsl and *.xslt for XSL Transformation descriptions
*.xsd for XML Schema definitions
*.xjb for JAXB bindings
*.rng for Relax NG files
*.sch for Schematron files
PR #1243
This commit fixes a bug where ripgrep only treated files beginning with
a `.` as hidden. On Windows, we continue this tradition, but
additionally check whether a file has the special Windows "hidden"
attribute set. If so, we treat it as a hidden file.
In order to make this work without an additional stat call, we had to
rearrange some of the plumbing from the directory traverser.
Fixes#1154
Previously, `man gitignore` specified that `**` was invalid unless it
was used in one of a few specific circumstances, i.e., `**`, `a/**`,
`**/b` or `a/**/b`. That is, `**` always had to be surrounded by either
a path separator or the beginning/end of the pattern.
It turns out that git itself has treated `**` outside the above contexts
as valid for quite a while, so there was an inconsistency between the
spec `man gitignore` and the implementation, and it wasn't clear which
was actually correct.
@okdana filed a bug against git[1] and got this fixed. The spec was wrong,
which has now been fixed [2] and updated[2].
This commit brings ripgrep in line with git and treats `**` outside of
the above contexts as two consecutive `*` patterns. We deprecate the
`InvalidRecursive` error since it is no longer used.
Fixes#373, Fixes#1098
[1] - https://public-inbox.org/git/C16A9F17-0375-42F9-90A9-A92C9F3D8BBA@dana.is
[2] - 627186d020
[3] - https://git-scm.com/docs/gitignore
This fixes a bug where repeated use of ** didn't behave as it should. In
particular, each use of `**` added a new requirement directory depth
requirement. For example, something like `**/**/b` would match
`foo/bar/b`, but it wouldn't match `foo/b` even though it should. In
particular, `**` semantics demand "infinite" depth, so repeated uses of
`**` should just coalesce as if only one was given.
We do this coalescing in the parser. It's a little tricky because we
treat `**/a`, `a/**` and `a/**/b` as distinct tokens with their own
regex conversions. We also test the crap out of it.
Fixes#1174
When deciding whether to add the `**/` prefix or not, we should choose
not to add it if the pattern is simply a bare `**`. Previously, we were
only not adding it if it was `**/`, which is correct, but we also need
to do it for `**` since `**` can already match anywhere.
There's likely a more principled solution to this, but this works for
now.
Fixes#1173