This feature causes nothing but problems and is frequently broken. The
only optimization it was enabling were SIMD optimizations for
transcoding. In particular, for UTF-16 transcoding. This is performed by
the [`encoding_rs`](https://github.com/hsivonen/encoding_rs) crate,
which specifically uses unstable portable SIMD APIs instead of the
stable non-portable SIMD APIs.
SIMD optimizations that apply to search have long been making use of
stable APIs, and are automatically enabled when your target supports
them. This is, IMO, the correct user experience and one that
`encoding_rs` refuses to support. I'm done dealing with it, so
transcoding will only use scalar code until the SIMD optimizations in
`encoding_rs` work on stable. (This doesn't mean that `encoding_rs` has
to change. This could also be fixed by stabilizing `std::simd`.)
Fixes#2748
As of the memchr 2.6 release, its Iterator::count method is specialized
to only count the number of occurrences instead of finding the offset of
each occurrence. This replaces ripgrep's use of the bytecount crate.
While micro-benchmarks suggest that memchr's method has better
throughput than bytecount, it turned out to be an illusion. Namely, on a
~13GB haystack prior to this change:
$ time rg-bytecount 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number
441450441:- You killed my friend, my best friend, my lifelong friend!
real 1.473
user 1.186
sys 0.286
maxmem 12512 MB
faults 0
And then after:
$ time rg 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number
441450441:- You killed my friend, my best friend, my lifelong friend!
real 1.532
user 1.280
sys 0.250
maxmem 12512 MB
faults 0
But perf is just about in the same ballpark. That's good enough for me
at the moment in order to drop the extra dependency.
I did this because the marginal cost of adding the Iterator::count()
specialization to memchr was extremely small.
memmap is unmaintained at this point and it is being flagged as a
RUSTSEC advisory in ripgrep. This doesn't seem like that big of a deal
to me honestly, but memmap2 looks like a fine choice at this point.
Fixes#1785, Closes#1786
The top-level listing was just getting a bit too long for my taste. So
put all of the code in one directory and shrink the large top-level mess
to a small top-level mess.
NOTE: This commit only contains renames. The subsequent commit will
actually make ripgrep build again. We do it this way with the naive hope
that this will make it easier for git history to track the renames.
Sigh.