mirror of
https://github.com/BurntSushi/ripgrep.git
synced 2025-08-03 13:42:06 -07:00
libripgrep: initial commit introducing libripgrep
libripgrep is not any one library, but rather, a collection of libraries that roughly separate the following key distinct phases in a grep implementation: 1. Pattern matching (e.g., by a regex engine). 2. Searching a file using a pattern matcher. 3. Printing results. Ultimately, both (1) and (3) are defined by de-coupled interfaces, of which there may be multiple implementations. Namely, (1) is satisfied by the `Matcher` trait in the `grep-matcher` crate and (3) is satisfied by the `Sink` trait in the `grep2` crate. The searcher (2) ties everything together and finds results using a matcher and reports those results using a `Sink` implementation. Closes #162
This commit is contained in:
135
grep-searcher/src/lib.rs
Normal file
135
grep-searcher/src/lib.rs
Normal file
@@ -0,0 +1,135 @@
|
||||
/*!
|
||||
This crate provides an implementation of line oriented search, with optional
|
||||
support for multi-line search.
|
||||
|
||||
# Brief overview
|
||||
|
||||
The principle type in this crate is a
|
||||
[`Searcher`](struct.Searcher.html),
|
||||
which can be configured and built by a
|
||||
[`SearcherBuilder`](struct.SearcherBuilder.html).
|
||||
A `Searcher` is responsible for reading bytes from a source (e.g., a file),
|
||||
executing a search of those bytes using a `Matcher` (e.g., a regex) and then
|
||||
reporting the results of that search to a
|
||||
[`Sink`](trait.Sink.html)
|
||||
(e.g., stdout). The `Searcher` itself is principally responsible for managing
|
||||
the consumption of bytes from a source and applying a `Matcher` over those
|
||||
bytes in an efficient way. The `Searcher` is also responsible for inverting
|
||||
a search, counting lines, reporting contextual lines, detecting binary data
|
||||
and even deciding whether or not to use memory maps.
|
||||
|
||||
A `Matcher` (which is defined in the
|
||||
[`grep-matcher`](https://crates.io/crates/grep-matcher)
|
||||
crate) is a trait for describing the lowest levels of pattern search in a
|
||||
generic way. The interface itself is very similar to the interface of a regular
|
||||
expression. For example, the
|
||||
[`grep-regex`](https://crates.io/crates/grep-regex)
|
||||
crate provides an implementation of the `Matcher` trait using Rust's
|
||||
[`regex`](https://crates.io/crates/regex)
|
||||
crate.
|
||||
|
||||
Finally, a `Sink` describes how callers receive search results producer by a
|
||||
`Searcher`. This includes routines that are called at the beginning and end of
|
||||
a search, in addition to routines that are called when matching or contextual
|
||||
lines are found by the `Searcher`. Implementations of `Sink` can be trivially
|
||||
simple, or extraordinarily complex, such as the
|
||||
`Standard` printer found in the
|
||||
[`grep-printer`](https://crates.io/crates/grep-printer)
|
||||
crate, which effectively implements grep-like output.
|
||||
This crate also provides convenience `Sink` implementations in the
|
||||
[`sinks`](sinks/index.html)
|
||||
sub-module for easy searching with closures.
|
||||
|
||||
# Example
|
||||
|
||||
This example shows how to execute the searcher and read the search results
|
||||
using the
|
||||
[`UTF8`](sinks/struct.UTF8.html)
|
||||
implementation of `Sink`.
|
||||
|
||||
```
|
||||
extern crate grep_matcher;
|
||||
extern crate grep_regex;
|
||||
extern crate grep_searcher;
|
||||
|
||||
use std::error::Error;
|
||||
|
||||
use grep_matcher::Matcher;
|
||||
use grep_regex::RegexMatcher;
|
||||
use grep_searcher::Searcher;
|
||||
use grep_searcher::sinks::UTF8;
|
||||
|
||||
const SHERLOCK: &'static [u8] = b"\
|
||||
For the Doctor Watsons of this world, as opposed to the Sherlock
|
||||
Holmeses, success in the province of detective work must always
|
||||
be, to a very large extent, the result of luck. Sherlock Holmes
|
||||
can extract a clew from a wisp of straw or a flake of cigar ash;
|
||||
but Doctor Watson has to have it taken out for him and dusted,
|
||||
and exhibited clearly, with a label attached.
|
||||
";
|
||||
|
||||
# fn main() { example().unwrap() }
|
||||
fn example() -> Result<(), Box<Error>> {
|
||||
let matcher = RegexMatcher::new(r"Doctor \w+")?;
|
||||
let mut matches: Vec<(u64, String)> = vec![];
|
||||
Searcher::new().search_slice(&matcher, SHERLOCK, UTF8(|lnum, line| {
|
||||
// We are guaranteed to find a match, so the unwrap is OK.
|
||||
eprintln!("LINE: {:?}", line);
|
||||
let mymatch = matcher.find(line.as_bytes())?.unwrap();
|
||||
matches.push((lnum, line[mymatch].to_string()));
|
||||
Ok(true)
|
||||
}))?;
|
||||
|
||||
eprintln!("MATCHES: {:?}", matches);
|
||||
|
||||
assert_eq!(matches.len(), 2);
|
||||
assert_eq!(
|
||||
matches[0],
|
||||
(1, "Doctor Watsons".to_string())
|
||||
);
|
||||
assert_eq!(
|
||||
matches[1],
|
||||
(5, "Doctor Watson".to_string())
|
||||
);
|
||||
Ok(())
|
||||
}
|
||||
```
|
||||
|
||||
See also `examples/search-stdin.rs` from the root of this crate's directory
|
||||
to see a similar example that accepts a pattern on the command line and
|
||||
searches stdin.
|
||||
*/
|
||||
|
||||
#![deny(missing_docs)]
|
||||
|
||||
extern crate bytecount;
|
||||
extern crate encoding_rs;
|
||||
extern crate encoding_rs_io;
|
||||
extern crate grep_matcher;
|
||||
#[macro_use]
|
||||
extern crate log;
|
||||
extern crate memchr;
|
||||
extern crate memmap;
|
||||
#[cfg(test)]
|
||||
extern crate regex;
|
||||
|
||||
pub use lines::{LineIter, LineStep};
|
||||
pub use searcher::{
|
||||
BinaryDetection, ConfigError, Encoding, MmapChoice,
|
||||
Searcher, SearcherBuilder,
|
||||
};
|
||||
pub use sink::{
|
||||
Sink, SinkError,
|
||||
SinkContext, SinkContextKind, SinkFinish, SinkMatch,
|
||||
};
|
||||
pub use sink::sinks;
|
||||
|
||||
#[macro_use]
|
||||
mod macros;
|
||||
|
||||
mod line_buffer;
|
||||
mod lines;
|
||||
mod searcher;
|
||||
mod sink;
|
||||
#[cfg(test)]
|
||||
mod testutil;
|
Reference in New Issue
Block a user