regex: add new inner literal extractor

This is mostly a copy of the prefix literal extractor in regex-syntax,
but with a tweaked notion of Seq that keeps track of whether it's a
prefix of an expression or not. If it isn't, then we can't cross it as a
suffix to another Seq.
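
A minimal sketch of that idea, not the code from this commit: `TaggedSeq`,
`is_prefix` and `cross` are hypothetical names, and every literal is assumed
to be exact, which the real Seq in regex-syntax tracks per literal.

```rust
/// Hypothetical, simplified stand-in for the tweaked Seq described above.
/// `is_prefix` records whether every literal is anchored at the start of
/// the expression the sequence was extracted from.
#[derive(Clone, Debug, PartialEq, Eq)]
struct TaggedSeq {
    literals: Vec<String>,
    is_prefix: bool,
}

impl TaggedSeq {
    fn new(literals: &[&str], is_prefix: bool) -> TaggedSeq {
        TaggedSeq {
            literals: literals.iter().map(|s| s.to_string()).collect(),
            is_prefix,
        }
    }

    /// Append every literal in `suffix` to every literal in `self`.
    ///
    /// Returns `None` when `suffix` is not a prefix of its own expression:
    /// arbitrary text may sit between the two sequences, so the joined
    /// literals would not be guaranteed to appear in a match.
    fn cross(&self, suffix: &TaggedSeq) -> Option<TaggedSeq> {
        if !suffix.is_prefix {
            return None;
        }
        let mut literals =
            Vec::with_capacity(self.literals.len() * suffix.literals.len());
        for a in &self.literals {
            for b in &suffix.literals {
                literals.push(format!("{}{}", a, b));
            }
        }
        // The result starts where `self` starts, so it inherits its flag.
        Some(TaggedSeq { literals, is_prefix: self.is_prefix })
    }
}
```

Under these assumptions, crossing `["foo", "bar"]` with a prefix sequence
`["baz"]` (as in `(foo|bar)baz`) gives `["foobaz", "barbaz"]`, while crossing
`["foo"]` with a non-prefix `["bar"]` (as in `foo\w+bar`) is refused.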

This new extractor should be a lot more robust than the old one. It
actually keeps going through the regex to try to find the "best"
literals to search for (according to some heuristic).
Andrew Gallant
2023-06-21 20:38:02 -04:00
parent e80c102dee
commit ca740d9ace
2 changed files with 958 additions and 19 deletions


@@ -76,15 +76,7 @@ impl RegexMatcherBuilder {
 // then run the original regex on only that line. (In this case, the
 // regex engine is likely to handle this case for us since it's so
 // simple, but the idea applies.)
-let fast_line_regex = match InnerLiterals::new(chir, re).one_regex() {
-    None => None,
-    Some(pattern) => {
-        Some(chir.config().build_many(&[pattern])?.to_regex()?)
-    }
-};
-if let Some(ref re) = fast_line_regex {
-    log::debug!("extracted fast line regex: {:?}", re);
-}
+let fast_line_regex = InnerLiterals::new(chir, re).one_regex()?;
 // We override the line terminator in case the configured HIR doesn't
 // support it.