ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-05-19 01:30:21 -07:00

Author	SHA1	Message	Date
Andrew Gallant	656aa12649	printer: fix multi-line replacement bug This commit fixes a subtle bug in multi-line replacement of line terminators. The problem is that even though ripgrep supports multi-line searches, it is still line oriented. It still needs to print line numbers, for example. For this reason, there are various parts in the printer that iterate over lines in order to format them into the desired output. This turns out to be problematic in some cases. #1311 documents one of those cases (with line numbers enabled to highlight a point later): $ printf "hello\nworld\n" \| rg -n -U "\n" -r "?" 1:hello? 2:world? But the desired output is this: $ printf "hello\nworld\n" \| rg -n -U "\n" -r "?" 1:hello?world? At first I had thought that the main problem was that the printer was taking ownership of writing line terminators, even if the input already had them. But it's more subtle than that. If we fix that issue, we get output like this instead: $ printf "hello\nworld\n" \| rg -n -U "\n" -r "?" 1:hello?2:world? Notice how '2:' is printed before 'world?'. The reason it works this way is because matches are reported to the printer in a line oriented way. That is, the printer gets a block of lines. The searcher guarantees that all matches that start or end in any of those lines also end or start in another line in that same block. As a result, the printer uses this assumption: once it has processed a block of lines, the next match will begin on a new and distinct line. Thus, things like '2:' are printed. This is generally all fine and good, but an impedance mismatch arises when replacements are used. Because now, the replacement can be used to change the "block of lines" approach. Now, in terms of the output, the subsequent match might actually continue the current line since the replacement might get rid of the concept of lines altogether. We can sometimes work around this. For example: $ printf "hello\nworld\n" \| rg -U "\n(.)?" -r '?$1' hello?world? Why does this work? It's because the '(.)' after the '\n' causes the match to overlap between lines. Thus, the searcher guarantees that the block sent to the printer contains every line. And there in lay the solution: all we need to do is tweak the multi-line searcher so that it combines lines with matches that directly adjacent, instead of requiring at least one byte of overlap. Fixing that solves the issue above. It does cause some tests to fail: * The binary3 test in the searcher crate fails because adjacent line matches are now one part of block, and that block is scanned for binary data. To preserve the essence of the test, we insert a couple dummy lines to split up the blocks. * The JSON CRLF test. It was testing that we didn't output any messages with an empty 'submatches' array. That is indeed still the case. The difference is that the messages got combined because of the adjacent line merging behavior. This is a slight change to the output, but is still correct. Fixes #1311	2021-05-31 21:51:18 -04:00
Andrew Gallant	0bc4f0447b	style: rustfmt everything This is why I was so intent on clearing the PR queue. This will effectively invalidate all existing patches, so I wanted to start from a clean slate. We do make one little tweak: we put the default type definitions in their own file and tell rustfmt to keep its grubby mits off of it. We also sort it lexicographically and hopefully will enforce that from here on.	2020-02-17 19:24:53 -05:00
Andrew Gallant	9d703110cf	regex: make CRLF hack more robust This commit improves the CRLF hack to be more robust. In particular, in addition to rewriting `$` as `(?:\r??$)`, we now strip `\r` from the end of a match if and only if the regex has an ending line anchor required for a match. This doesn't quite make the hack 100% correct, but should fix most use cases in practice. An example of a regex that will still be incorrect is `foo\|bar$`, since the analysis isn't quite sophisticated enough to determine that a `\r` can be safely stripped from any match. Even if we fix that, regexes like `foo\r\|bar$` still won't be handled correctly. Alas, more work on this front should really be focused on enabling this in the regex engine itself. The specific cause of this bug was that grep-searcher was sneakily stripping CRLF from matching lines when it really shouldn't have. We remove that code now, and instead rely on better match semantics provided at a lower level. Fixes #1095	2019-01-26 12:34:28 -05:00
Andrew Gallant	7a6a40bae1	edition: move core ripgrep to Rust 2018	2019-01-19 10:44:30 -05:00
Andrew Gallant	5b1ce8bdc2	tests: touch up tests on Windows This fixes warnings and adds an additional invalid UTF-8 test that will run on Windows.	2018-08-21 23:05:52 -04:00
Andrew Gallant	eb184d7711	tests: re-tool integration tests This basically rewrites every integration test. We reduce the amount of magic involved here in terms of which arguments are being passed to ripgrep processes. To make up for the boiler plate saved by the magic, we make the Dir (formerly WorkDir) type a bit nicer to use, along with a new TestCommand that wraps a std::process::Command. In exchange, we get tests that are easier to read and write. We also run every test with the `--pcre2` flag to make sure that works, when PCRE2 is available.	2018-08-20 07:10:19 -04:00

6 Commits