doc: clarify automatic encoding detection

Fixes #1103
2025-05-19 09:40:22 -07:00 · 2019-01-26 13:55:17 -05:00 · 2019-01-26 13:55:17 -05:00 · 6d5dba85bd
commit 6d5dba85bd
parent afb89bcdad
3 changed files with 11 additions and 3 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -27,6 +27,8 @@ Bug fixes:
  `**` is now accepted as valid syntax anywhere in a glob.
 * [BUG #1095](https://github.com/BurntSushi/ripgrep/issues/1095):
  Fix corner cases involving the `--crlf` flag.
+* [BUG #1103](https://github.com/BurntSushi/ripgrep/issues/1103):
+  Clarify what `--encoding auto` does.
 * [BUG #1106](https://github.com/BurntSushi/ripgrep/issues/1106):
  `--files-with-matches` and `--files-without-match` work with one file.
 * [BUG #1093](https://github.com/BurntSushi/ripgrep/pull/1093):
--- a/GUIDE.md
+++ b/GUIDE.md
@@ -609,7 +609,8 @@ topic, but we can try to summarize its relevancy to ripgrep:
  the most popular encodings likely consist of ASCII, latin1 or UTF-8. As
  a special exception, UTF-16 is prevalent in Windows environments
-In light of the above, here is how ripgrep behaves:
+In light of the above, here is how ripgrep behaves when `--encoding auto` is
+given, which is the default:
 * All input is assumed to be ASCII compatible (which means every byte that
  corresponds to an ASCII codepoint actually is an ASCII codepoint). This
--- a/src/app.rs
+++ b/src/app.rs
@@ -982,10 +982,15 @@ fn flag_encoding(args: &mut Vec<RGArg>) {
    const LONG: &str = long!("\
 Specify the text encoding that ripgrep will use on all files searched. The
 default value is 'auto', which will cause ripgrep to do a best effort automatic
-detection of encoding on a per-file basis. Other supported values can be found
+detection of encoding on a per-file basis. Automatic detection in this case
-in the list of labels here:
+only applies to files that begin with a UTF-8 or UTF-16 byte-order mark (BOM).
+No other automatic detection is performed.
+Other supported values can be found in the list of labels here:
 https://encoding.spec.whatwg.org/#concept-encoding-get
 For more details on encoding and how ripgrep deals with it, see GUIDE.md.
 This flag can be disabled with --no-encoding.
 ");
    let arg = RGArg::flag("encoding", "ENCODING").short("E")
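The revised help text narrows what `--encoding auto` means: only a leading UTF-8 or UTF-16 byte-order mark triggers transcoding, and nothing else is guessed. A minimal Rust sketch of that rule follows, assuming the three standard BOM byte sequences (`EF BB BF`, `FF FE`, `FE FF`). The names `BomEncoding`, `sniff_bom` and `decode_with_bom` are hypothetical and for illustration only. This is not ripgrep's actual implementation.

```rust
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};

/// Encodings a byte-order mark can announce (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BomEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Checks only the first bytes of the input for a BOM and returns the
/// encoding plus the BOM's length. No other heuristics are applied.
fn sniff_bom(bytes: &[u8]) -> Option<(BomEncoding, usize)> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some((BomEncoding::Utf8, 3))
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some((BomEncoding::Utf16Le, 2))
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some((BomEncoding::Utf16Be, 2))
    } else {
        None
    }
}

/// Transcodes BOM-prefixed input to a UTF-8 `String`, dropping the BOM.
/// Returns `None` when there is no BOM, meaning the caller would search the
/// raw bytes unchanged. Invalid sequences become U+FFFD, and a trailing odd
/// byte in UTF-16 input is ignored.
fn decode_with_bom(bytes: &[u8]) -> Option<String> {
    let (enc, bom_len) = sniff_bom(bytes)?;
    let body = &bytes[bom_len..];
    match enc {
        BomEncoding::Utf8 => Some(String::from_utf8_lossy(body).into_owned()),
        BomEncoding::Utf16Le | BomEncoding::Utf16Be => {
            // Pair up bytes into 16-bit code units using the BOM's byte order.
            let units = body.chunks_exact(2).map(|chunk| {
                let pair = [chunk[0], chunk[1]];
                if enc == BomEncoding::Utf16Le {
                    u16::from_le_bytes(pair)
                } else {
                    u16::from_be_bytes(pair)
                }
            });
            Some(
                decode_utf16(units)
                    .map(|r| r.unwrap_or(REPLACEMENT_CHARACTER))
                    .collect(),
            )
        }
    }
}

fn main() {
    // UTF-16LE "hi" preceded by the FF FE byte-order mark.
    let utf16le = [0xFF, 0xFE, b'h', 0x00, b'i', 0x00];
    assert_eq!(sniff_bom(&utf16le), Some((BomEncoding::Utf16Le, 2)));
    assert_eq!(decode_with_bom(&utf16le).as_deref(), Some("hi"));

    // UTF-8 with a BOM: the BOM is stripped and the rest passes through.
    let utf8_bom = [0xEF, 0xBB, 0xBF, b'o', b'k'];
    assert_eq!(decode_with_bom(&utf8_bom).as_deref(), Some("ok"));

    // No BOM: no guessing happens, so the bytes would be searched as-is.
    assert_eq!(decode_with_bom(b"plain ascii"), None);
}
```

In this sketch, any input without a BOM falls through to `None`. This mirrors the help text's statement that no other automatic detection is performed. Such input would then be searched under the ASCII-compatible assumption described in GUIDE.md.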