File path filtering works and is pretty fast.

I'm pretty disappointed by the performance of regex sets. They are
apparently spending a lot of their time constructing the DFA, which
probably means that the DFA is just too big.

It turns out that it's actually faster to build an *additional* normal
regex that is the alternation of every glob and to use it as a first-pass
filter over every file path. Only if that regex matches do we try the
more expensive RegexSet.
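A minimal, std-only sketch of the two pieces described above, assuming a deliberately simplified glob syntax: building the single alternation pattern from the globs, and the two-phase control flow. The names `glob_to_regex`, `alternation`, and `matching_globs` are hypothetical, and the glob translation here (only `*` and `?`) is an illustrative stand-in, not the translation this commit actually uses.

```rust
/// Hypothetical, deliberately minimal glob-to-regex translation:
/// `*` matches any run of non-separator characters, `?` matches exactly
/// one non-separator character, and other regex metacharacters are
/// escaped. Real glob features (`**`, `[...]`, `{a,b}`) are NOT handled.
fn glob_to_regex(glob: &str) -> String {
    let mut re = String::from("^");
    for c in glob.chars() {
        match c {
            '*' => re.push_str("[^/]*"),
            '?' => re.push_str("[^/]"),
            '\\' | '.' | '+' | '(' | ')' | '|' | '[' | ']' | '{' | '}' | '^' | '$' => {
                re.push('\\');
                re.push(c);
            }
            _ => re.push(c),
        }
    }
    re.push('$');
    re
}

/// Join every translated glob into one alternation such as
/// `(?:a)|(?:b)`, to be compiled once as the first-pass filter.
fn alternation(globs: &[&str]) -> String {
    globs
        .iter()
        .map(|g| format!("(?:{})", glob_to_regex(g)))
        .collect::<Vec<_>>()
        .join("|")
}

/// Two-phase matching: run the cheap combined check first and only ask
/// the expensive per-glob matcher which globs matched when it succeeds.
fn matching_globs<P, S>(path: &str, prefilter: P, set: S) -> Vec<usize>
where
    P: Fn(&str) -> bool,
    S: Fn(&str) -> Vec<usize>,
{
    if prefilter(path) {
        set(path)
    } else {
        Vec::new()
    }
}
```

With the `regex` crate, the string from `alternation` would plausibly be compiled with `Regex::new` and passed as `|p| re.is_match(p)`, while the individual patterns go to `RegexSet::new` and are passed as `|p| set.matches(p).into_iter().collect()`. Most paths that match no glob then never reach the RegexSet at all.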
Andrew Gallant
2016-08-27 01:01:06 -04:00
parent b55ecf34c7
commit 065c449980
5 changed files with 673 additions and 26 deletions


@@ -29,6 +29,9 @@ regex-syntax = { version = "0.3.1", path = "/home/andrew/rust/regex/regex-syntax
rustc-serialize = "0.3"
walkdir = "0.1"
[features]
simd-accel = ["regex/simd-accel"]
[dev-dependencies]
glob = "0.2"
lazy_static = "0.2"