searcher: always strip BOM

This fixes a bug where a leading BOM was included in the searched text. While this was somewhat intentional in order to have a faithful "UTF8 passthru" option, in practice it causes problems, such as breaking patterns like `^` in a really non-obvious way: a pattern anchored at the start of the first line no longer matches because the BOM precedes the line's content.

The actual fix was to add a new API to encoding_rs_io, which this commit brings in.

Fixes #1163
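
To illustrate why a leading BOM defeats an anchored pattern, here is a minimal std-only sketch. It is not ripgrep's or encoding_rs_io's code: `matches_at_line_start`, `strip_bom`, and `count_matches` are hypothetical helpers, and `str::starts_with` stands in for a regex such as `^test123`. The input mirrors the file contents used in the regression test.

```rust
// Hypothetical helper: true when `line` begins with `prefix`.
// A std-only stand-in for an anchored pattern like `^test123`.
fn matches_at_line_start(line: &str, prefix: &str) -> bool {
    line.starts_with(prefix)
}

// Hypothetical helper: drop one leading BOM (U+FEFF) if present.
fn strip_bom(text: &str) -> &str {
    text.strip_prefix('\u{FEFF}').unwrap_or(text)
}

// Count the lines of `text` that begin with `prefix`.
fn count_matches(text: &str, prefix: &str) -> usize {
    text.lines()
        .filter(|line| matches_at_line_start(line, prefix))
        .count()
}

fn main() {
    // Same contents as bom.txt in the regression test.
    let contents = "\u{FEFF}test123\ntest123";

    // The BOM sits before "test123" on the first line, so only the
    // second line matches.
    println!("with BOM: {}", count_matches(contents, "test123"));

    // Once the BOM is stripped, both lines match.
    println!("stripped: {}", count_matches(strip_bom(contents), "test123"));
}
```

With the BOM left in place only the second line matches; after stripping, both lines do, which is the behavior the regression test asserts.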
2025-07-25 17:21:57 -07:00 · 2019-01-25 17:18:57 -05:00
parent 9a9f54d44c
commit 276e2c9b9a
4 changed files with 14 additions and 4 deletions
--- a/tests/regression.rs
+++ b/tests/regression.rs
@@ -592,6 +592,15 @@ rgtest!(r1130, |dir: Dir, mut cmd: TestCommand| {
    );
 });

+// See: https://github.com/BurntSushi/ripgrep/issues/1163
+rgtest!(r1163, |dir: Dir, mut cmd: TestCommand| {
+    dir.create("bom.txt", "\u{FEFF}test123\ntest123");
+    eqnice!(
+        "bom.txt:test123\nbom.txt:test123\n",
+        cmd.arg("^test123").stdout()
+    );
+});
+
 // See: https://github.com/BurntSushi/ripgrep/issues/1164
 rgtest!(r1164, |dir: Dir, mut cmd: TestCommand| {
    dir.create_dir(".git");