Compare commits

14 Commits

Author SHA1 Message Date
Andrew Gallant
525d051172 0.1.12 2016-09-21 20:47:44 -04:00
Andrew Gallant
5a9883d27c Try to use memory maps more aggressively on Windows.
Some brief playing around suggests that it is faster.

However, it's probably slower in a VM. Let's prioritize native users.
2016-09-21 20:47:40 -04:00
Andrew Gallant
f462d092e7 Add Archlinux AUR package. 2016-09-21 20:19:29 -04:00
Andrew Gallant
fe84928c85 0.1.11 2016-09-21 19:37:37 -04:00
Andrew Gallant
f7eaf67fc3 grrr fix appveyor deployment filter 2016-09-21 19:37:34 -04:00
Andrew Gallant
c1c92e4fee 0.1.10 2016-09-21 19:27:16 -04:00
Andrew Gallant
5644bbe43a attempt to fix Windows build 2016-09-21 19:27:12 -04:00
Andrew Gallant
aeb3a5ba0f bump grep to 0.1.2 2016-09-21 19:16:28 -04:00
Andrew Gallant
24e14a0341 grep 0.1.2 2016-09-21 19:14:12 -04:00
Andrew Gallant
2a2b1506d4 Fix a performance bug where using -w could result in very bad performance.
The specific issue is that -w causes the regex to be wrapped in Unicode
word boundaries. Regrettably, Unicode word boundaries are the one thing
our regex engine can't handle well in the presence of non-ASCII text. We
work around its slowness by stripping word boundaries in some
circumstances, and using the resulting expression as a way to produce match
candidates that are then verified by the full original regex.

This doesn't fix all cases, but it should fix all cases where -w is used.
2016-09-21 19:12:07 -04:00
Andrew Gallant
4d6b3c727e Bump regex version. 2016-09-21 19:05:15 -04:00
Andrew Gallant
c2bf9e3d45 fix brew 2016-09-21 17:36:46 -04:00
Andrew Gallant
dad73b92eb Add brew. 2016-09-21 17:28:19 -04:00
Andrew Gallant
b0d8ff6f4a 0.1.9 2016-09-21 16:41:28 -04:00
12 changed files with 198 additions and 37 deletions

6
Cargo.lock generated

@@ -1,13 +1,13 @@
[root]
name = "ripgrep"
version = "0.1.8"
version = "0.1.12"
dependencies = [
"deque 0.3.1 (registry+https://github.com/rust-lang/crates.io-index)",
"docopt 0.6.83 (registry+https://github.com/rust-lang/crates.io-index)",
"env_logger 0.3.5 (registry+https://github.com/rust-lang/crates.io-index)",
"fnv 1.0.5 (registry+https://github.com/rust-lang/crates.io-index)",
"glob 0.2.11 (registry+https://github.com/rust-lang/crates.io-index)",
"grep 0.1.1",
"grep 0.1.2",
"kernel32-sys 0.2.2 (registry+https://github.com/rust-lang/crates.io-index)",
"lazy_static 0.2.1 (registry+https://github.com/rust-lang/crates.io-index)",
"libc 0.2.16 (registry+https://github.com/rust-lang/crates.io-index)",
@@ -80,7 +80,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
[[package]]
name = "grep"
version = "0.1.1"
version = "0.1.2"
dependencies = [
"log 0.3.6 (registry+https://github.com/rust-lang/crates.io-index)",
"memchr 0.1.11 (registry+https://github.com/rust-lang/crates.io-index)",

View File

@@ -1,6 +1,6 @@
[package]
name = "ripgrep"
version = "0.1.8" #:version
version = "0.1.12" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Line oriented search tool using Rust's regex library. Combines the raw
@@ -27,14 +27,14 @@ deque = "0.3"
docopt = "0.6"
env_logger = "0.3"
fnv = "1.0"
grep = { version = "0.1.1", path = "grep" }
grep = { version = "0.1.2", path = "grep" }
lazy_static = "0.2"
libc = "0.2"
log = "0.3"
memchr = "0.1"
memmap = "0.2"
num_cpus = "1"
regex = "0.1.76"
regex = "0.1.77"
rustc-serialize = "0.3"
term = "0.4"
walkdir = "0.1"

View File

@@ -57,6 +57,9 @@ for a very detailed comparison with more benchmarks and analysis.
color and full Unicode support. Unlike GNU grep, `ripgrep` stays fast while
supporting Unicode (which is always on).
In other words, use `ripgrep` if you like speed, sane defaults, fewer bugs and
Unicode.
### Is it really faster than everything else?
Yes. A large number of benchmarks with detailed analysis for each is
@@ -84,9 +87,6 @@ Summarizing, `ripgrep` is fast because:
### Installation
N.B. `ripgrep` is not yet available in any package repositories. I'd like to
fix that in the future.
[Binaries for `ripgrep` are available for Windows, Mac and
Linux.](https://github.com/BurntSushi/ripgrep/releases) Linux binaries are
static executables. Windows binaries are available either as built with MinGW
@@ -96,7 +96,22 @@ but you'll need to have the
Tools](http://landinghub.visualstudio.com/visual-cpp-build-tools)
installed.
If you're a Rust programmer, `ripgrep` can be installed with `cargo`:
If you're a **Homebrew** user, then you can install it with a custom formula
(N.B. `ripgrep` isn't actually in Homebrew yet. This just installs the binary
directly):
```
$ brew install https://raw.githubusercontent.com/BurntSushi/ripgrep/master/pkg/brew/ripgrep.rb
```
If you're an **Archlinux** user, then you can install `ripgrep` from the
[`ripgrep` AUR package](https://aur.archlinux.org/packages/ripgrep/), e.g.,
```
$ yaourt -S ripgrep
```
If you're a **Rust programmer**, `ripgrep` can be installed with `cargo`:
```
$ cargo install ripgrep

View File

@@ -2,27 +2,22 @@ environment:
global:
PROJECT_NAME: ripgrep
matrix:
# Nightly channel
- TARGET: i686-pc-windows-gnu
CHANNEL: nightly
CHANNEL: stable
- TARGET: i686-pc-windows-msvc
CHANNEL: nightly
CHANNEL: stable
- TARGET: x86_64-pc-windows-gnu
CHANNEL: nightly
CHANNEL: stable
- TARGET: x86_64-pc-windows-msvc
CHANNEL: nightly
CHANNEL: stable
# Install Rust and Cargo
# (Based on from https://github.com/rust-lang/libc/blob/master/appveyor.yml)
install:
- ps: Start-FileDownload "https://static.rust-lang.org/dist/channel-rust-stable"
- ps: $env:RUST_VERSION = Get-Content channel-rust-stable | select -first 1 | %{$_.split('-')[1]}
- if NOT "%CHANNEL%" == "stable" set RUST_VERSION=%CHANNEL%
- ps: Start-FileDownload "https://static.rust-lang.org/dist/rust-${env:RUST_VERSION}-${env:TARGET}.exe"
- rust-%RUST_VERSION%-%TARGET%.exe /VERYSILENT /NORESTART /DIR="C:\Program Files (x86)\Rust"
- SET PATH=%PATH%;C:\Program Files (x86)\Rust\bin
- if "%TARGET%" == "i686-pc-windows-gnu" set PATH=%PATH%;C:\msys64\mingw32\bin
- if "%TARGET%" == "x86_64-pc-windows-gnu" set PATH=%PATH%;C:\msys64\mingw64\bin
- curl -sSf -o rustup-init.exe https://win.rustup.rs/
- rustup-init.exe -y --default-host %TARGET%
- set PATH=%PATH%;C:\Users\appveyor\.cargo\bin
- if defined MSYS2_BITS set PATH=%PATH%;C:\msys64\mingw%MSYS2_BITS%\bin
- rustc -V
- cargo -V
@@ -57,7 +52,7 @@ deploy:
# channel to use to produce the release artifacts
# NOTE make sure you only release *once* per target
# TODO you may want to pick a different channel
CHANNEL: nightly
CHANNEL: stable
appveyor_repo_tag: true
branches:

25
ci/sha256.sh Normal file

@@ -0,0 +1,25 @@
#!/bin/sh

set -e

if [ $# != 1 ]; then
  echo "Usage: $(basename $0) version" >&2
  exit 1
fi
version="$1"

# Linux and Darwin builds.
for arch in i686 x86_64; do
  for target in apple-darwin unknown-linux-musl; do
    url="https://github.com/BurntSushi/ripgrep/releases/download/$version/ripgrep-$version-$arch-$target.tar.gz"
    sha=$(curl -sfSL "$url" | sha256sum)
    echo "$version-$arch-$target $sha"
  done
done

# Source.
for ext in zip tar.gz; do
  url="https://github.com/BurntSushi/ripgrep/archive/$version.$ext"
  sha=$(curl -sfSL "$url" | sha256sum)
  echo "source.$ext $sha"
done

View File

@@ -1,6 +1,6 @@
[package]
name = "grep"
version = "0.1.1" #:version
version = "0.1.2" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Fast line oriented regex searching as a library.
@@ -16,5 +16,5 @@ license = "Unlicense/MIT"
log = "0.3"
memchr = "0.1"
memmap = "0.2"
regex = "0.1.76"
regex = "0.1.77"
regex-syntax = "0.3.5"

View File

@@ -19,6 +19,7 @@ pub use search::{Grep, GrepBuilder, Iter, Match};
mod literals;
mod nonl;
mod search;
mod word_boundary;
/// Result is a convenient type alias that fixes the type of the error to
/// the `Error` type defined in this crate.

View File

@@ -4,6 +4,8 @@ use syntax;
use literals::LiteralSets;
use nonl;
use syntax::Expr;
use word_boundary::strip_unicode_word_boundaries;
use Result;
/// A matched line.
@@ -127,22 +129,35 @@ impl GrepBuilder {
pub fn build(self) -> Result<Grep> {
let expr = try!(self.parse());
let literals = LiteralSets::create(&expr);
let re = try!(
RegexBuilder::new(&expr.to_string())
.case_insensitive(self.opts.case_insensitive)
.multi_line(true)
.unicode(true)
.size_limit(self.opts.size_limit)
.dfa_size_limit(self.opts.dfa_size_limit)
.compile()
);
let re = try!(self.regex(&expr));
let required = literals.to_regex().or_else(|| {
let expr = match strip_unicode_word_boundaries(&expr) {
None => return None,
Some(expr) => expr,
};
debug!("Stripped Unicode word boundaries. New AST:\n{:?}", expr);
self.regex(&expr).ok()
});
Ok(Grep {
re: re,
required: literals.to_regex(),
required: required,
opts: self.opts,
})
}
/// Creates a new regex from the given expression with the current
/// configuration.
fn regex(&self, expr: &Expr) -> Result<Regex> {
RegexBuilder::new(&expr.to_string())
.case_insensitive(self.opts.case_insensitive)
.multi_line(true)
.unicode(true)
.size_limit(self.opts.size_limit)
.dfa_size_limit(self.opts.dfa_size_limit)
.compile()
.map_err(From::from)
}
/// Parses the underlying pattern and ensures the pattern can never match
/// the line terminator.
fn parse(&self) -> Result<syntax::Expr> {

54
grep/src/word_boundary.rs Normal file

@@ -0,0 +1,54 @@
use syntax::Expr;

/// Strips Unicode word boundaries from the given expression.
///
/// The key invariant this maintains is that the expression returned will match
/// *at least* every where the expression given will match. Namely, a match of
/// the returned expression can report false positives but it will never report
/// false negatives.
///
/// If no word boundaries could be stripped, then None is returned.
pub fn strip_unicode_word_boundaries(expr: &Expr) -> Option<Expr> {
    // The real reason we do this is because Unicode word boundaries are the
    // one thing that Rust's regex DFA engine can't handle. When it sees a
    // Unicode word boundary among non-ASCII text, it falls back to one of the
    // slower engines. We work around this limitation by attempting to use
    // a regex to find candidate matches without a Unicode word boundary. We'll
    // only then use the full (and slower) regex to confirm a candidate as a
    // match or not during search.
    use syntax::Expr::*;

    match *expr {
        Concat(ref es) if !es.is_empty() => {
            let first = is_unicode_word_boundary(&es[0]);
            let last = is_unicode_word_boundary(es.last().unwrap());
            // Be careful not to strip word boundaries if there are no other
            // expressions to match.
            match (first, last) {
                (true, false) if es.len() > 1 => {
                    Some(Concat(es[1..].to_vec()))
                }
                (false, true) if es.len() > 1 => {
                    Some(Concat(es[..es.len() - 1].to_vec()))
                }
                (true, true) if es.len() > 2 => {
                    Some(Concat(es[1..es.len() - 1].to_vec()))
                }
                _ => None,
            }
        }
        _ => None,
    }
}

/// Returns true if the given expression is a Unicode word boundary.
fn is_unicode_word_boundary(expr: &Expr) -> bool {
    use syntax::Expr::*;

    match *expr {
        WordBoundary => true,
        NotWordBoundary => true,
        Group { ref e, .. } => is_unicode_word_boundary(e),
        _ => false,
    }
}
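
Worked example for the file above. To see what the stripping does to a `-w` style pattern without pulling in the real `syntax` crate, here is a sketch on a toy AST. The `Expr` enum below is a hypothetical stand-in with only a few variants (`NotWordBoundary` is omitted for brevity), and `strip` and `is_word_boundary` are illustrative names that mirror the logic of the diff rather than reproduce the library's types.

```rust
// Toy stand-in for the regex AST (assumption: only the variants needed here).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Literal(String),
    WordBoundary,
    Group(Box<Expr>),
    Concat(Vec<Expr>),
}

/// True for a word boundary, looking through any enclosing groups.
fn is_word_boundary(e: &Expr) -> bool {
    match e {
        Expr::WordBoundary => true,
        Expr::Group(inner) => is_word_boundary(inner),
        _ => false,
    }
}

/// Drops a leading and/or trailing word boundary from a concatenation,
/// but only when something other than boundaries is left to match.
fn strip(expr: &Expr) -> Option<Expr> {
    let es = match expr {
        Expr::Concat(es) if !es.is_empty() => es,
        _ => return None,
    };
    let first = is_word_boundary(&es[0]);
    let last = is_word_boundary(&es[es.len() - 1]);
    match (first, last) {
        (true, false) if es.len() > 1 => Some(Expr::Concat(es[1..].to_vec())),
        (false, true) if es.len() > 1 => {
            Some(Expr::Concat(es[..es.len() - 1].to_vec()))
        }
        (true, true) if es.len() > 2 => {
            Some(Expr::Concat(es[1..es.len() - 1].to_vec()))
        }
        _ => None,
    }
}

fn main() {
    let lit = Expr::Literal("foo".to_string());
    // Roughly the shape of a pattern wrapped in word boundaries.
    let wrapped = Expr::Concat(vec![
        Expr::Group(Box::new(Expr::WordBoundary)),
        lit.clone(),
        Expr::WordBoundary,
    ]);
    println!("{:?}", strip(&wrapped));
    // Nothing but a boundary: stripping would leave an empty expression.
    println!("{:?}", strip(&Expr::Concat(vec![Expr::WordBoundary])));
}
```

The stripped expression matches everywhere the original does and possibly more, which is why it is only safe as a candidate filter and never as a replacement for the full regex.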

35
pkg/archlinux/PKGBUILD Normal file

@@ -0,0 +1,35 @@
# Contributor: Andrew Gallant <jamslam@gmail.com>
# Maintainer: Andrew Gallant

pkgname=ripgrep
pkgver=0.1.11
pkgrel=1
pkgdesc="A search tool that combines the usability of The Silver Searcher with the raw speed of grep."
arch=('i686' 'x86_64')
url="https://github.com/BurntSushi/ripgrep"
license=('UNLICENSE')
makedepends=('cargo')
source=("https://github.com/BurntSushi/$pkgname/archive/$pkgver.tar.gz")
sha256sums=('d29beb1a43a263d75ce4ef23a07253ed6ea306b14ffb5b37bc4972fb5d98238c')

build() {
    cd "$pkgname-$pkgver"
    if command -v rustup > /dev/null 2>&1; then
        RUSTFLAGS="-C target-cpu=native" rustup run nightly \
            cargo build --release --features simd-accel
    elif rustc --version | grep -q nightly; then
        RUSTFLAGS="-C target-cpu=native" \
            cargo build --release --features simd-accel
    else
        cargo build --release
    fi
}

package() {
    cd "$pkgname-$pkgver"
    install -Dm755 "target/release/rg" "$pkgdir/usr/bin/rg"
    install -Dm644 "README-NEW.md" "$pkgdir/usr/share/doc/ripgrep/README.md"
    install -Dm644 "COPYING" "$pkgdir/usr/share/doc/ripgrep/COPYING"
    install -Dm644 "LICENSE-MIT" "$pkgdir/usr/share/doc/ripgrep/LICENSE-MIT"
    install -Dm644 "UNLICENSE" "$pkgdir/usr/share/doc/ripgrep/UNLICENSE"
}

18
pkg/brew/ripgrep.rb Normal file

@@ -0,0 +1,18 @@
require 'formula'

class Ripgrep < Formula
  version '0.1.8'
  desc "Search tool like grep and The Silver Searcher."
  homepage "https://github.com/BurntSushi/ripgrep"

  if Hardware::CPU.is_64_bit?
    url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-x86_64-apple-darwin.tar.gz"
    sha256 "893e0e7fac88ebbef024829466fafef6eae5b1060273bbfca3806090e660b06b"
  else
    url "https://github.com/BurntSushi/ripgrep/releases/download/#{version}/ripgrep-#{version}-i686-apple-darwin.tar.gz"
    sha256 "2296c8081a2bfe28b43dea4326a9e8ce9c2821fd628a1ca366e824aceddc5fad"
  end

  def install
    bin.install "rg"
  end
end

View File

@@ -131,7 +131,7 @@ Less common options:
--mmap
Search using memory maps when possible. This is enabled by default
when ripgrep thinks it will be faster. (Note that mmap searching
doesn't current support the various context related options.)
doesn't currently support the various context related options.)
--no-mmap
Never use memory maps, even when they might be faster.
@@ -273,6 +273,9 @@ impl RawArgs {
false
} else if self.flag_mmap {
true
} else if cfg!(windows) {
// On Windows, memory maps appear faster than read calls. Neat.
true
} else {
// If we're only searching a few paths and all of them are
// files, then memory maps are probably faster.
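
Note on the hunk above: the precedence of the mmap decision can be written out as a standalone function. This is a sketch under assumptions, since the hunk is truncated: `use_mmap` and `MAX_MMAP_PATHS` are hypothetical names, the value 10 for "a few paths" is a guess and not taken from the source, the first branch is assumed to be the `--no-mmap` flag, and the platform is passed in as a parameter (rather than read via `cfg!(windows)`) so the logic can be exercised anywhere.

```rust
use std::path::Path;

/// Hypothetical cap on what counts as "a few paths"; the real threshold
/// is not visible in the hunk.
const MAX_MMAP_PATHS: usize = 10;

/// Decides whether to search with memory maps.
/// Precedence: --no-mmap, then --mmap, then the platform default,
/// then a heuristic based on the paths being searched.
fn use_mmap(no_mmap: bool, mmap: bool, is_windows: bool, paths: &[&Path]) -> bool {
    if no_mmap {
        false
    } else if mmap {
        true
    } else if is_windows {
        // The new branch: on Windows, memory maps appeared faster than reads.
        true
    } else {
        // Only a few paths, and every one of them is a regular file.
        !paths.is_empty()
            && paths.len() <= MAX_MMAP_PATHS
            && paths.iter().all(|p| p.is_file())
    }
}

fn main() {
    let paths = [Path::new("Cargo.toml")];
    println!("{}", use_mmap(false, false, cfg!(windows), &paths));
}
```

Because the explicit flags are checked first, `--no-mmap` still gives Windows users inside a VM a way to opt out of the new default.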