searcher: work around NUL line terminator bug

As the FIXME comment says, ripgrep does not yet use the new line
terminator option that regex-automata exposes for exactly this purpose.
Because of that, multi-line anchors like `(?m:^)` and `(?m:$)` only
treat `\n` as a line terminator. So when --null-data is combined with
--line-regexp, the anchors that --line-regexp inserts fail to match at
NUL-delimited line boundaries. This only matters on the "fast" path,
which relies on the regex engine itself to handle line terminators
correctly. The slow path strips line terminators regardless of what
they are, so the anchors can still match at the beginning and end of
the haystack.

Fixes #2658
Andrew Gallant
2023-11-27 21:07:23 -05:00
parent 2d518dd1f9
commit 805fa32d18
3 changed files with 20 additions and 0 deletions
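To make the failure concrete, here is a small Rust sketch using only the standard library. It models how multi-line anchors behave when the engine scans the whole haystack, as on the fast path. The helpers `at_line_start`, `at_line_end` and `find_line_regexp` are hypothetical stand-ins for a real regex engine, and the `term` parameter plays the role of the line terminator setting that ripgrep does not yet pass to regex-automata. This is an illustration under those assumptions, not ripgrep's or regex-automata's code.

```rust
/// Hypothetical helper: does a multi-line `^` hold at offset `i`?
/// It holds at the start of the haystack or right after a terminator.
fn at_line_start(hay: &[u8], i: usize, term: u8) -> bool {
    i == 0 || hay[i - 1] == term
}

/// Hypothetical helper: does a multi-line `$` hold at offset `i`?
/// It holds at the end of the haystack or right before a terminator.
fn at_line_end(hay: &[u8], i: usize, term: u8) -> bool {
    i == hay.len() || hay[i] == term
}

/// Hypothetical model of `(?m:^)needle(?m:$)` searched over the whole
/// haystack (the "fast" path). Returns the start offset of the first match.
fn find_line_regexp(hay: &[u8], needle: &[u8], term: u8) -> Option<usize> {
    if needle.len() > hay.len() {
        return None;
    }
    (0..=hay.len() - needle.len()).find(|&start| {
        let end = start + needle.len();
        at_line_start(hay, start, term)
            && &hay[start..end] == needle
            && at_line_end(hay, end, term)
    })
}

fn main() {
    let hay = b"foo\0bar\0quux\0";
    // Anchors that only recognize `\n` never see a line boundary around "bar".
    assert_eq!(find_line_regexp(hay, b"bar", b'\n'), None);
    // Anchors that use NUL as the line terminator match "bar" at offset 4.
    assert_eq!(find_line_regexp(hay, b"bar", b'\0'), Some(4));
    println!("ok");
}
```

With `\n` as the only terminator, `bar` in NUL-delimited data never sits at a line boundary, which is the --null-data plus --line-regexp failure the commit message describes.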

@@ -1210,3 +1210,10 @@ rgtest!(r2574, |dir: Dir, mut cmd: TestCommand| {
         .stdout();
     eqnice!("some.domain.com\nsome.domain.com\n", got);
 });
+
+// See: https://github.com/BurntSushi/ripgrep/issues/2658
+rgtest!(r2658_null_data_line_regexp, |dir: Dir, mut cmd: TestCommand| {
+    dir.create("haystack", "foo\0bar\0quux\0");
+    let got = cmd.args(&["--null-data", "--line-regexp", r"bar"]).stdout();
+    eqnice!("haystack:bar\0", got);
+});
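The regression test expects the output `haystack:bar\0`. The following is a hypothetical Rust sketch of the slow-path behavior the commit message describes, restricted to a literal pattern. It splits the haystack on the NUL terminator and strips that terminator. A line then matches under --line-regexp only if the whole line equals the pattern, so the begin/end anchors trivially hold. The function name `slow_path_line_regexp` and the `path:line` output assembly are illustrative assumptions, not ripgrep's searcher or printer.

```rust
/// Hypothetical model of the "slow" path for a literal `--line-regexp`
/// pattern: split on `term`, drop the terminator, and require the whole
/// line to equal the pattern. Matches are emitted as `path:line` followed
/// by `term`, the same shape as the regression test's expected output.
fn slow_path_line_regexp(path: &str, hay: &[u8], pattern: &[u8], term: u8) -> Vec<u8> {
    // Drop one trailing terminator so it does not produce an empty final line.
    let hay = hay.strip_suffix(&[term]).unwrap_or(hay);
    let mut out = Vec::new();
    for line in hay.split(|&b| b == term) {
        // With the terminator stripped, the line is the whole haystack being
        // matched, so "^" and "$" reduce to its beginning and end.
        if line == pattern {
            out.extend_from_slice(path.as_bytes());
            out.push(b':');
            out.extend_from_slice(line);
            out.push(term);
        }
    }
    out
}

fn main() {
    let out = slow_path_line_regexp("haystack", b"foo\0bar\0quux\0", b"bar", b'\0');
    assert_eq!(out, b"haystack:bar\0".to_vec());
    println!("ok");
}
```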