searcher: always strip BOM

This fixes a bug where a BOM prefix was included. While this was somewhat
intentional in order to have a faithful "UTF8 passthru" option, in
practice, this causes problems such as breaking patterns like `^` in a
really non-obvious way.

The actual fix was to add a new API to encoding_rs_io, which this commit
brings in.

Fixes #1163
This commit is contained in:
Andrew Gallant
2019-01-25 17:18:57 -05:00
parent 9a9f54d44c
commit 276e2c9b9a
4 changed files with 14 additions and 4 deletions

View File

@@ -592,6 +592,15 @@ rgtest!(r1130, |dir: Dir, mut cmd: TestCommand| {
);
});
// See: https://github.com/BurntSushi/ripgrep/issues/1163
rgtest!(r1163, |dir: Dir, mut cmd: TestCommand| {
dir.create("bom.txt", "\u{FEFF}test123\ntest123");
eqnice!(
"bom.txt:test123\nbom.txt:test123\n",
cmd.arg("^test123").stdout()
);
});
// See: https://github.com/BurntSushi/ripgrep/issues/1164
rgtest!(r1164, |dir: Dir, mut cmd: TestCommand| {
dir.create_dir(".git");