mirror of
https://github.com/BurntSushi/ripgrep.git
synced 2025-08-10 09:01:59 -07:00
deps: drop bytecount in favor of memchr_iter(..).count()
As of the memchr 2.6 release, its Iterator::count method is specialized to only count the number of occurrences instead of finding the offset of each occurrence. This replaces ripgrep's use of the bytecount crate.

While micro-benchmarks suggest that memchr's method has better throughput than bytecount, it turned out to be an illusion. Namely, on a ~13GB haystack prior to this change:

    $ time rg-bytecount 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number
    441450441:- You killed my friend, my best friend, my lifelong friend!

    real    1.473
    user    1.186
    sys     0.286
    maxmem  12512 MB
    faults  0

And then after:

    $ time rg 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number
    441450441:- You killed my friend, my best friend, my lifelong friend!

    real    1.532
    user    1.280
    sys     0.250
    maxmem  12512 MB
    faults  0

But perf is just about in the same ballpark. That's good enough for me at the moment in order to drop the extra dependency.

I did this because the marginal cost of adding the Iterator::count() specialization to memchr was extremely small.
@@ -3,7 +3,6 @@ A collection of routines for performing operations on lines.
 */
 
 use bstr::ByteSlice;
-use bytecount;
 use grep_matcher::{LineTerminator, Match};
 
 /// An iterator over lines in a particular slice of bytes.
@@ -110,7 +109,7 @@ impl LineStep {
 
 /// Count the number of occurrences of `line_term` in `bytes`.
 pub fn count(bytes: &[u8], line_term: u8) -> u64 {
-    bytecount::count(bytes, line_term) as u64
+    memchr::memchr_iter(line_term, bytes).count() as u64
 }
 
 /// Given a line that possibly ends with a terminator, return that line without
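
For readers who want to see what `count` computes without pulling in either crate, here is a minimal std-only sketch. The helper name `count_naive` and the sample haystack are hypothetical and not part of ripgrep. This byte-by-byte loop only illustrates the semantics; it says nothing about how memchr or bytecount do the counting, or how fast they are.

```rust
/// Naive reference for counting line terminators using only std.
/// Hypothetical helper for illustration; ripgrep itself calls
/// `memchr::memchr_iter(line_term, bytes).count()` as shown in the diff.
fn count_naive(bytes: &[u8], line_term: u8) -> u64 {
    bytes.iter().filter(|&&b| b == line_term).count() as u64
}

fn main() {
    // Hypothetical haystack: three lines, the last one unterminated.
    let haystack = b"first\nsecond\nthird";

    // The number of terminators before a position, plus one, gives the
    // 1-based line number of that position. Here the position is the end
    // of the haystack, which lies on the last line.
    let line_number = count_naive(haystack, b'\n') + 1;
    println!("{}", line_number);
}
```

Both the removed and the added implementation should agree with this reference on any input, which is why the change only affects performance and the dependency list.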