Compare commits


1 Commit

Author SHA1 Message Date
Andrew Gallant
aecc0ea126 cli: fix arbitrary execution of program bug
This fixes a bug only present on Windows that would permit someone to
execute an arbitrary program if they crafted an appropriate directory
tree. Namely, if someone put an executable named 'xz.exe' in the root of
a directory tree and one ran 'rg -z foo' from the root of that tree,
then the 'xz.exe' executable in that tree would execute if there are any
'xz' files anywhere in the tree.

The root cause of this problem is that 'CreateProcess' on Windows will
implicitly look in the current working directory for an executable when
it is given a relative path to a program. Rust's standard library allows
this behavior to occur, so we work around it here. We work around it by
explicitly resolving programs like 'xz' via 'PATH'. That way, we only
ever pass an absolute path to 'CreateProcess', which avoids the implicit
behavior of checking the current working directory.
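
The workaround described above can be sketched in Rust along the following lines. This is an illustrative approximation, not ripgrep's actual implementation (which also has to consider Windows executable extensions such as `.exe` and `.com`): a bare program name is resolved against the directories in `PATH` before any process is spawned, so `CreateProcess` is only ever handed an absolute path.

```rust
use std::env;
use std::path::{Path, PathBuf};

// Illustrative sketch: resolve a bare program name like "xz" to an
// absolute path via the PATH environment variable, so that process
// creation never consults the current working directory.
fn resolve_binary(prog: &Path) -> Option<PathBuf> {
    // A path that already contains a directory component is used as given.
    if prog.components().count() > 1 {
        return Some(prog.to_path_buf());
    }
    let paths = env::var_os("PATH")?;
    // split_paths handles the platform's PATH separator (':' or ';').
    env::split_paths(&paths)
        .map(|dir| dir.join(prog))
        .find(|candidate| candidate.is_file())
}

fn main() {
    // "sh" exists on virtually every Unix PATH; on Windows try "cmd.exe".
    let name = if cfg!(windows) { "cmd.exe" } else { "sh" };
    match resolve_binary(Path::new(name)) {
        Some(abs) => println!("would spawn: {}", abs.display()),
        None => println!("{} not found on PATH", name),
    }
}
```

With this in place, a crafted `xz.exe` sitting in the searched directory tree is never a resolution candidate, because only `PATH` entries are consulted.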

This fix doesn't apply to non-Windows systems as it is believed to only
impact Windows. In theory, the bug could apply on Unix if '.' is in
one's PATH, but at that point, you reap what you sow.

While the extent to which this is a security problem isn't clear, I
think users generally expect to be able to download or clone
repositories from the Internet and run ripgrep on them without fear of
anything too awful happening. Being able to execute an arbitrary program
probably violates that expectation. Therefore, CVE-2021-3013[1] was
created for this issue.

We apply the same logic to the --pre command, since the --pre command is
likely in a user's config file and it would be surprising for something
that the user is searching to modify which preprocessor command is used.

The --pre and -z/--search-zip flags are the only two ways that ripgrep
will invoke external programs, so this should cover any possible
exploitable cases of this bug.

[1] - https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3013
2021-01-11 14:20:54 -05:00
118 changed files with 2839 additions and 7599 deletions


@@ -1,8 +0,0 @@
-# On Windows MSVC, statically link the C runtime so that the resulting EXE does
-# not depend on the vcruntime DLL.
-#
-# See: https://github.com/BurntSushi/ripgrep/pull/1613
-[target.x86_64-pc-windows-msvc]
-rustflags = ["-C", "target-feature=+crt-static"]
-[target.i686-pc-windows-msvc]
-rustflags = ["-C", "target-feature=+crt-static"]


@@ -15,6 +15,3 @@ examples of how ripgrep would be used if your feature request were added.
 If you're not sure what to write here, then try imagining what the ideal
 documentation of your new feature would look like in ripgrep's man page. Then
 try to write it.
-If you're requesting the addition or change of default file types, please open
-a PR. We can discuss it there if necessary.


@@ -42,31 +42,31 @@ jobs:
         - win-gnu
         include:
         - build: pinned
-          os: ubuntu-latest
-          rust: 1.70.0
+          os: ubuntu-18.04
+          rust: 1.41.0
         - build: stable
-          os: ubuntu-latest
+          os: ubuntu-18.04
           rust: stable
         - build: beta
-          os: ubuntu-latest
+          os: ubuntu-18.04
           rust: beta
         - build: nightly
-          os: ubuntu-latest
+          os: ubuntu-18.04
           rust: nightly
         - build: nightly-musl
-          os: ubuntu-latest
+          os: ubuntu-18.04
           rust: nightly
           target: x86_64-unknown-linux-musl
         - build: nightly-32
-          os: ubuntu-latest
+          os: ubuntu-18.04
           rust: nightly
           target: i686-unknown-linux-gnu
         - build: nightly-mips
-          os: ubuntu-latest
+          os: ubuntu-18.04
           rust: nightly
           target: mips64-unknown-linux-gnuabi64
         - build: nightly-arm
-          os: ubuntu-latest
+          os: ubuntu-18.04
           rust: nightly
         # For stripping release binaries:
         # docker run --rm -v $PWD/target:/target:Z \
@@ -78,17 +78,17 @@ jobs:
           os: macos-latest
           rust: nightly
         - build: win-msvc
-          os: windows-2022
+          os: windows-2019
           rust: nightly
         - build: win-gnu
-          os: windows-2022
+          os: windows-2019
           rust: nightly-x86_64-gnu
     steps:
     - name: Checkout repository
-      uses: actions/checkout@v3
+      uses: actions/checkout@v2
     - name: Install packages (Ubuntu)
-      if: matrix.os == 'ubuntu-latest'
+      if: matrix.os == 'ubuntu-18.04'
       run: |
         ci/ubuntu-install-packages
@@ -98,9 +98,11 @@ jobs:
        ci/macos-install-packages
     - name: Install Rust
-      uses: dtolnay/rust-toolchain@master
+      uses: actions-rs/toolchain@v1
       with:
         toolchain: ${{ matrix.rust }}
+        profile: minimal
+        override: true
     - name: Use Cross
       if: matrix.target != ''
@@ -116,10 +118,10 @@ jobs:
         echo "target flag is: ${{ env.TARGET_FLAGS }}"
     - name: Build ripgrep and all crates
-      run: ${{ env.CARGO }} build --verbose --workspace ${{ env.TARGET_FLAGS }}
+      run: ${{ env.CARGO }} build --verbose --all ${{ env.TARGET_FLAGS }}
     - name: Build ripgrep with PCRE2
-      run: ${{ env.CARGO }} build --verbose --workspace --features pcre2 ${{ env.TARGET_FLAGS }}
+      run: ${{ env.CARGO }} build --verbose --all --features pcre2 ${{ env.TARGET_FLAGS }}
     # This is useful for debugging problems when the expected build artifacts
     # (like shell completions and man pages) aren't generated.
@@ -137,7 +139,7 @@ jobs:
     - name: Run tests with PCRE2 (sans cross)
       if: matrix.target == ''
-      run: ${{ env.CARGO }} test --verbose --workspace --features pcre2 ${{ env.TARGET_FLAGS }}
+      run: ${{ env.CARGO }} test --verbose --all --features pcre2 ${{ env.TARGET_FLAGS }}
     - name: Run tests without PCRE2 (with cross)
       # These tests should actually work, but they almost double the runtime.
@@ -145,17 +147,17 @@ jobs:
       # enabled, every integration test is run twice: one with the default
       # regex engine and once with PCRE2.
       if: matrix.target != ''
-      run: ${{ env.CARGO }} test --verbose --workspace ${{ env.TARGET_FLAGS }}
+      run: ${{ env.CARGO }} test --verbose --all ${{ env.TARGET_FLAGS }}
     - name: Test for existence of build artifacts (Windows)
-      if: matrix.os == 'windows-2022'
+      if: matrix.os == 'windows-2019'
       shell: bash
       run: |
         outdir="$(ci/cargo-out-dir "${{ env.TARGET_DIR }}")"
         ls "$outdir/_rg.ps1" && file "$outdir/_rg.ps1"
     - name: Test for existence of build artifacts (Unix)
-      if: matrix.os != 'windows-2022'
+      if: matrix.os != 'windows-2019'
       shell: bash
       run: |
         outdir="$(ci/cargo-out-dir "${{ env.TARGET_DIR }}")"
@@ -172,35 +174,23 @@ jobs:
       # 'rg' binary (done in test-complete) with qemu, which is a pain and
       # doesn't really gain us much. If shell completion works in one place,
       # it probably works everywhere.
-      if: matrix.target == '' && matrix.os != 'windows-2022'
+      if: matrix.target == '' && matrix.os != 'windows-2019'
       shell: bash
       run: ci/test-complete
   rustfmt:
     name: rustfmt
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-18.04
     steps:
     - name: Checkout repository
-      uses: actions/checkout@v3
+      uses: actions/checkout@v2
     - name: Install Rust
-      uses: dtolnay/rust-toolchain@master
+      uses: actions-rs/toolchain@v1
       with:
         toolchain: stable
+        override: true
+        profile: minimal
         components: rustfmt
     - name: Check formatting
-      run: cargo fmt --all --check
+      run: |
+        cargo fmt --all -- --check
-  docs:
-    name: Docs
-    runs-on: ubuntu-latest
-    steps:
-    - name: Checkout repository
-      uses: actions/checkout@v3
-    - name: Install Rust
-      uses: dtolnay/rust-toolchain@master
-      with:
-        toolchain: stable
-    - name: Check documentation
-      env:
-        RUSTDOCFLAGS: -D warnings
-      run: cargo doc --no-deps --document-private-items --workspace


@@ -1,26 +1,23 @@
-# The way this works is the following:
+# The way this works is a little weird. But basically, the create-release job
+# runs purely to initialize the GitHub release itself. Once done, the upload
+# URL of the release is saved as an artifact.
 #
-# The create-release job runs purely to initialize the GitHub release itself
-# and to output upload_url for the following job.
-#
-# The build-release job runs only once create-release is finished. It gets the
-# release upload URL from create-release job outputs, then builds the release
-# executables for each supported platform and attaches them as release assets
-# to the previously created release.
+# The build-release job runs only once create-release is finished. It gets
+# the release upload URL by downloading the corresponding artifact (which was
+# uploaded by create-release). It then builds the release executables for each
+# supported platform and attaches them as release assets to the previously
+# created release.
 #
 # The key here is that we create the release only once.
+#
+# Reference:
+# https://eugene-babichenko.github.io/blog/2020/05/09/github-actions-cross-platform-auto-releases/
 name: release
 on:
   push:
     # Enable when testing release infrastructure on a branch.
     # branches:
-    # - ag/work
+    # - ag/release
     tags:
-    - "[0-9]+.[0-9]+.[0-9]+"
+    - '[0-9]+.[0-9]+.[0-9]+'
 jobs:
   create-release:
     name: create-release
@@ -28,20 +25,39 @@ jobs:
     # env:
       # Set to force version number, e.g., when no tag exists.
       # RG_VERSION: TEST-0.0.0
-    outputs:
-      rg_version: ${{ env.RG_VERSION }}
     steps:
-    - uses: actions/checkout@v3
+    - name: Create artifacts directory
+      run: mkdir artifacts
     - name: Get the release version from the tag
+      shell: bash
       if: env.RG_VERSION == ''
       run: |
-        echo "RG_VERSION=$GITHUB_REF_NAME" >> $GITHUB_ENV
+        # Apparently, this is the right way to get a tag name. Really?
+        #
+        # See: https://github.community/t5/GitHub-Actions/How-to-get-just-the-tag-name/m-p/32167/highlight/true#M1027
+        echo "RG_VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_ENV
         echo "version is: ${{ env.RG_VERSION }}"
     - name: Create GitHub release
+      id: release
+      uses: actions/create-release@v1
       env:
-        GH_TOKEN: ${{ github.token }}
-      run: gh release create ${{ env.RG_VERSION }}
+        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      with:
+        tag_name: ${{ env.RG_VERSION }}
+        release_name: ${{ env.RG_VERSION }}
+    - name: Save release upload URL to artifact
+      run: echo "${{ steps.release.outputs.upload_url }}" > artifacts/release-upload-url
+    - name: Save version number to artifact
+      run: echo "${{ env.RG_VERSION }}" > artifacts/release-version
+    - name: Upload artifacts
+      uses: actions/upload-artifact@v1
+      with:
+        name: artifacts
+        path: artifacts
   build-release:
     name: build-release
@@ -52,7 +68,7 @@ jobs:
       # systems.
       CARGO: cargo
       # When CARGO is set to CROSS, this is set to `--target matrix.target`.
-      TARGET_FLAGS: ""
+      TARGET_FLAGS:
      # When CARGO is set to CROSS, TARGET_DIR includes matrix.target.
       TARGET_DIR: ./target
       # Emit backtraces on panics.
@@ -64,11 +80,11 @@ jobs:
         build: [linux, linux-arm, macos, win-msvc, win-gnu, win32-msvc]
         include:
         - build: linux
-          os: ubuntu-latest
+          os: ubuntu-18.04
           rust: nightly
           target: x86_64-unknown-linux-musl
         - build: linux-arm
-          os: ubuntu-latest
+          os: ubuntu-18.04
           rust: nightly
           target: arm-unknown-linux-gnueabihf
         - build: macos
@@ -76,24 +92,26 @@ jobs:
           rust: nightly
           target: x86_64-apple-darwin
         - build: win-msvc
-          os: windows-latest
+          os: windows-2019
           rust: nightly
           target: x86_64-pc-windows-msvc
         - build: win-gnu
-          os: windows-latest
+          os: windows-2019
           rust: nightly-x86_64-gnu
           target: x86_64-pc-windows-gnu
         - build: win32-msvc
-          os: windows-latest
+          os: windows-2019
           rust: nightly
           target: i686-pc-windows-msvc
     steps:
     - name: Checkout repository
-      uses: actions/checkout@v3
+      uses: actions/checkout@v2
+      with:
+        fetch-depth: 1
     - name: Install packages (Ubuntu)
-      if: matrix.os == 'ubuntu-latest'
+      if: matrix.os == 'ubuntu-18.04'
       run: |
         ci/ubuntu-install-packages
@@ -103,13 +121,15 @@ jobs:
         ci/macos-install-packages
     - name: Install Rust
-      uses: dtolnay/rust-toolchain@master
+      uses: actions-rs/toolchain@v1
       with:
         toolchain: ${{ matrix.rust }}
+        profile: minimal
+        override: true
         target: ${{ matrix.target }}
     - name: Use Cross
-      shell: bash
+      # if: matrix.os != 'windows-2019'
       run: |
         cargo install cross
         echo "CARGO=cross" >> $GITHUB_ENV
@@ -122,11 +142,27 @@ jobs:
         echo "target flag is: ${{ env.TARGET_FLAGS }}"
         echo "target dir is: ${{ env.TARGET_DIR }}"
+    - name: Get release download URL
+      uses: actions/download-artifact@v1
+      with:
+        name: artifacts
+        path: artifacts
+    - name: Set release upload URL and release version
+      shell: bash
+      run: |
+        release_upload_url="$(cat artifacts/release-upload-url)"
+        echo "RELEASE_UPLOAD_URL=$release_upload_url" >> $GITHUB_ENV
+        echo "release upload url: $RELEASE_UPLOAD_URL"
+        release_version="$(cat artifacts/release-version)"
+        echo "RELEASE_VERSION=$release_version" >> $GITHUB_ENV
+        echo "release version: $RELEASE_VERSION"
     - name: Build release binary
       run: ${{ env.CARGO }} build --verbose --release --features pcre2 ${{ env.TARGET_FLAGS }}
-    - name: Strip release binary (linux, macos and macos-arm)
-      if: matrix.build == 'linux' || matrix.os == 'macos'
+    - name: Strip release binary (linux and macos)
+      if: matrix.build == 'linux' || matrix.build == 'macos'
       run: strip "target/${{ matrix.target }}/release/rg"
     - name: Strip release binary (arm)
@@ -142,7 +178,7 @@ jobs:
       shell: bash
       run: |
         outdir="$(ci/cargo-out-dir "${{ env.TARGET_DIR }}")"
-        staging="ripgrep-${{ needs.create-release.outputs.rg_version }}-${{ matrix.target }}"
+        staging="ripgrep-${{ env.RELEASE_VERSION }}-${{ matrix.target }}"
         mkdir -p "$staging"/{complete,doc}
         cp {README.md,COPYING,UNLICENSE,LICENSE-MIT} "$staging/"
@@ -150,23 +186,24 @@ jobs:
         cp "$outdir"/{rg.bash,rg.fish,_rg.ps1} "$staging/complete/"
         cp complete/_rg "$staging/complete/"
-        if [ "${{ matrix.os }}" = "windows-latest" ]; then
+        if [ "${{ matrix.os }}" = "windows-2019" ]; then
           cp "target/${{ matrix.target }}/release/rg.exe" "$staging/"
           7z a "$staging.zip" "$staging"
-          certutil -hashfile "$staging.zip" SHA256 > "$staging.zip.sha256"
           echo "ASSET=$staging.zip" >> $GITHUB_ENV
-          echo "ASSET_SUM=$staging.zip.sha256" >> $GITHUB_ENV
         else
           # The man page is only generated on Unix systems. ¯\_(ツ)_/¯
           cp "$outdir"/rg.1 "$staging/doc/"
           cp "target/${{ matrix.target }}/release/rg" "$staging/"
           tar czf "$staging.tar.gz" "$staging"
-          shasum -a 256 "$staging.tar.gz" > "$staging.tar.gz.sha256"
           echo "ASSET=$staging.tar.gz" >> $GITHUB_ENV
-          echo "ASSET_SUM=$staging.tar.gz.sha256" >> $GITHUB_ENV
         fi
     - name: Upload release archive
+      uses: actions/upload-release-asset@v1.0.1
       env:
-        GH_TOKEN: ${{ github.token }}
-      run: gh release upload ${{ needs.create-release.outputs.rg_version }} ${{ env.ASSET }} ${{ env.ASSET_SUM }}
+        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      with:
+        upload_url: ${{ env.RELEASE_UPLOAD_URL }}
+        asset_path: ${{ env.ASSET }}
+        asset_name: ${{ env.ASSET }}
+        asset_content_type: application/octet-stream

4
.gitignore vendored

@@ -15,7 +15,3 @@ parts
 *.snap
 *.pyc
 ripgrep*_source.tar.bz2
-# Cargo timings
-cargo-timing-*.html
-cargo-timing.html


@@ -1 +0,0 @@
-!/.github/


@@ -2,70 +2,6 @@ TBD
 ===
 Unreleased changes. Release notes have not yet been written.
-**BREAKING CHANGES**
-* `rg -C1 -A2` used to be equivalent to `rg -A2`, but now it is equivalent to
-  `rg -B1 -A2`. That is, `-A` and `-B` no longer completely override `-C`.
-  Instead, they only partially override `-C`.
-Feature enhancements:
-* Added or improved file type filtering for Ada, DITA, Elixir, Fuchsia, Gentoo, GraphQL, Markdown, Raku, TypeScript, USD, V
-* [FEATURE #1790](https://github.com/BurntSushi/ripgrep/issues/1790):
-  Add new `--stop-on-nonmatch` flag.
-* [FEATURE #2195](https://github.com/BurntSushi/ripgrep/issues/2195):
-  When `extra-verbose` mode is enabled in zsh, show extra file type info.
-* [FEATURE #2409](https://github.com/BurntSushi/ripgrep/pull/2409):
-  Added installation instructions for `winget`.
-Bug fixes:
-* [BUG #1891](https://github.com/BurntSushi/ripgrep/issues/1891):
-  Fix bug when using `-w` with a regex that can match the empty string.
-* [BUG #1911](https://github.com/BurntSushi/ripgrep/issues/1911):
-  Disable mmap searching in all non-64-bit environments.
-* [BUG #2108](https://github.com/BurntSushi/ripgrep/issues/2108):
-  Improve docs for `-r/--replace` syntax.
-* [BUG #2198](https://github.com/BurntSushi/ripgrep/issues/2198):
-  Fix bug where `--no-ignore-dot` would not ignore `.rgignore`.
-* [BUG #2288](https://github.com/BurntSushi/ripgrep/issues/2288):
-  `-A` and `-B` now only each partially override `-C`.
-* [BUG #2236](https://github.com/BurntSushi/ripgrep/issues/2236):
-  Fix gitignore parsing bug where a trailing `\/` resulted in an error.
-* [BUG #2243](https://github.com/BurntSushi/ripgrep/issues/2243):
-  Fix `--sort` flag for values other than `path`.
-* [BUG #2480](https://github.com/BurntSushi/ripgrep/issues/2480):
-  Fix bug when using inline regex flags with `-e/--regexp`.
-* [BUG #2523](https://github.com/BurntSushi/ripgrep/issues/2523):
-  Make executable searching take `.com` into account on Windows.
-13.0.0 (2021-06-12)
-===================
-ripgrep 13 is a new major version release of ripgrep that primarily contains
-bug fixes, some performance improvements and a few minor breaking changes.
-There is also a fix for a security vulnerability on Windows
-([CVE-2021-3013](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3013)).
-Some highlights:
-A new short flag, `-.`, has been added. It is an alias for the `--hidden` flag,
-which instructs ripgrep to search hidden files and directories.
-ripgrep is now using a new
-[vectorized implementation of `memmem`](https://github.com/BurntSushi/memchr/pull/82),
-which accelerates many common searches. If you notice any performance
-regressions (or major improvements), I'd love to hear about them through an
-issue report!
-Also, for Windows users targeting MSVC, Cargo will now build fully static
-executables of ripgrep. The release binaries for ripgrep 13 have been compiled
-using this configuration.
-**BREAKING CHANGES**:
-**Binary detection output has changed slightly.**
 In this release, a small tweak has been made to the output format when a binary
 file is detected. Previously, it looked like this:
@@ -79,100 +15,12 @@ Now it looks like this:
 FOO: binary file matches (found "\0" byte around offset XXX)
 ```
-**vimgrep output in multi-line now only prints the first line for each match.**
-See [issue 1866](https://github.com/BurntSushi/ripgrep/issues/1866) for more
-discussion on this. Previously, every line in a match was duplicated, even
-when it spanned multiple lines. There are no changes to vimgrep output when
-multi-line mode is disabled.
-**In multi-line mode, --count is now equivalent to --count-matches.**
-This appears to match how `pcre2grep` implements `--count`. Previously, ripgrep
-would produce outright incorrect counts. Another alternative would be to simply
-count the number of lines---even if it's more than the number of matches---but
-that seems highly unintuitive.
-**FULL LIST OF FIXES AND IMPROVEMENTS:**
-Security fixes:
-* [CVE-2021-3013](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3013):
-  Fixes a security hole on Windows where running ripgrep with either the
-  `-z/--search-zip` or `--pre` flags can result in running arbitrary
-  executables from the current directory.
-* [VULN #1773](https://github.com/BurntSushi/ripgrep/issues/1773):
-  This is the public facing issue tracking CVE-2021-3013. ripgrep's README
-  now contains a section describing how to report a vulnerability.
-Performance improvements:
-* [PERF #1657](https://github.com/BurntSushi/ripgrep/discussions/1657):
-  Check if a file should be ignored first before issuing stat calls.
-* [PERF memchr#82](https://github.com/BurntSushi/memchr/pull/82):
-  ripgrep now uses a new vectorized implementation of `memmem`.
-Feature enhancements:
-* Added or improved file type filtering for ASP, Bazel, dvc, FlatBuffers,
-  Futhark, minified files, Mint, pofiles (from GNU gettext) Racket, Red, Ruby,
-  VCL, Yang.
-* [FEATURE #1404](https://github.com/BurntSushi/ripgrep/pull/1404):
-  ripgrep now prints a warning if nothing is searched.
-* [FEATURE #1613](https://github.com/BurntSushi/ripgrep/pull/1613):
-  Cargo will now produce static executables on Windows when using MSVC.
-* [FEATURE #1680](https://github.com/BurntSushi/ripgrep/pull/1680):
-  Add `-.` as a short flag alias for `--hidden`.
-* [FEATURE #1842](https://github.com/BurntSushi/ripgrep/issues/1842):
-  Add `--field-{context,match}-separator` for customizing field delimiters.
-* [FEATURE #1856](https://github.com/BurntSushi/ripgrep/pull/1856):
-  The README now links to a
-  [Spanish translation](https://github.com/UltiRequiem/traducciones/tree/master/ripgrep).
 Bug fixes:
 * [BUG #1277](https://github.com/BurntSushi/ripgrep/issues/1277):
   Document cygwin path translation behavior in the FAQ.
-* [BUG #1739](https://github.com/BurntSushi/ripgrep/issues/1739):
-  Fix bug where replacements were buggy if the regex matched a line terminator.
-* [BUG #1311](https://github.com/BurntSushi/ripgrep/issues/1311):
-  Fix multi-line bug where a search & replace for `\n` didn't work as expected.
-* [BUG #1401](https://github.com/BurntSushi/ripgrep/issues/1401):
-  Fix buggy interaction between PCRE2 look-around and `-o/--only-matching`.
-* [BUG #1412](https://github.com/BurntSushi/ripgrep/issues/1412):
-  Fix multi-line bug with searches using look-around past matching lines.
-* [BUG #1577](https://github.com/BurntSushi/ripgrep/issues/1577):
-  Fish shell completions will continue to be auto-generated.
-* [BUG #1642](https://github.com/BurntSushi/ripgrep/issues/1642):
-  Fixes a bug where using `-m` and `-A` printed more matches than the limit.
-* [BUG #1703](https://github.com/BurntSushi/ripgrep/issues/1703):
-  Clarify the function of `-u/--unrestricted`.
-* [BUG #1708](https://github.com/BurntSushi/ripgrep/issues/1708):
-  Clarify how `-S/--smart-case` works.
-* [BUG #1730](https://github.com/BurntSushi/ripgrep/issues/1730):
-  Clarify that CLI invocation must always be valid, regardless of config file.
 * [BUG #1741](https://github.com/BurntSushi/ripgrep/issues/1741):
   Fix stdin detection when using PowerShell in UNIX environments.
-* [BUG #1756](https://github.com/BurntSushi/ripgrep/pull/1756):
-  Fix bug where `foo/**` would match `foo`, but it shouldn't.
-* [BUG #1765](https://github.com/BurntSushi/ripgrep/issues/1765):
-  Fix panic when `--crlf` is used in some cases.
-* [BUG #1638](https://github.com/BurntSushi/ripgrep/issues/1638):
-  Correctly sniff UTF-8 and do transcoding, like we do for UTF-16.
-* [BUG #1816](https://github.com/BurntSushi/ripgrep/issues/1816):
-  Add documentation for glob alternate syntax, e.g., `{a,b,..}`.
-* [BUG #1847](https://github.com/BurntSushi/ripgrep/issues/1847):
-  Clarify how the `--hidden` flag works.
-* [BUG #1866](https://github.com/BurntSushi/ripgrep/issues/1866#issuecomment-841635553):
-  Fix bug when computing column numbers in `--vimgrep` mode.
-* [BUG #1868](https://github.com/BurntSushi/ripgrep/issues/1868):
-  Fix bug where `--passthru` and `-A/-B/-C` did not override each other.
-* [BUG #1869](https://github.com/BurntSushi/ripgrep/pull/1869):
-  Clarify docs for `--files-with-matches` and `--files-without-match`.
-* [BUG #1878](https://github.com/BurntSushi/ripgrep/issues/1878):
-  Fix bug where `\A` could produce unanchored matches in multiline search.
-* [BUG 94e4b8e3](https://github.com/BurntSushi/ripgrep/commit/94e4b8e3):
-  Fix column numbers with `--vimgrep` is used with `-U/--multiline`.
 12.1.1 (2020-05-29)

298
Cargo.lock generated

@@ -1,54 +1,81 @@
# This file is automatically @generated by Cargo. # This file is automatically @generated by Cargo.
# It is not intended for manual editing. # It is not intended for manual editing.
version = 3
[[package]] [[package]]
name = "aho-corasick" name = "aho-corasick"
version = "1.0.2" version = "0.7.14"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "43f6cb1bf222025340178f382c426f13757b2960e89779dfcb319c32542a5a41" checksum = "b476ce7103678b0c6d3d395dbbae31d48ff910bd28be979ba5d48c6351131d0d"
dependencies = [ dependencies = [
"memchr", "memchr",
] ]
[[package]] [[package]]
name = "base64" name = "atty"
version = "0.20.0" version = "0.2.14"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0ea22880d78093b0cbe17c89f64a7d457941e65759157ec6cb31a31d652b05e5" checksum = "d9b39be18770d11421cdb1b9947a45dd3f37e93092cbf377614828a319d5fee8"
dependencies = [
"hermit-abi",
"libc",
"winapi",
]
[[package]]
name = "autocfg"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cdb031dd78e28731d87d56cc8ffef4a8f36ca26c38fe2de700543e627f8a464a"
[[package]]
name = "base64"
version = "0.13.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "904dfeac50f3cdaba28fc6f57fdcddb75f49ed61346676a78c4ffe55877802fd"
[[package]] [[package]]
name = "bitflags" name = "bitflags"
version = "1.3.2" version = "1.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a" checksum = "cf1de2fe8c75bc145a2f577add951f8134889b4795d47466a54a5c846d691693"
[[package]] [[package]]
name = "bstr" name = "bstr"
version = "1.6.0" version = "0.2.14"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6798148dccfbff0fae41c7574d2fa8f1ef3492fba0face179de5d8d447d67b05" checksum = "473fc6b38233f9af7baa94fb5852dca389e3d95b8e21c8e3719301462c5d9faf"
dependencies = [ dependencies = [
"lazy_static",
"memchr", "memchr",
"regex-automata", "regex-automata",
"serde",
] ]
[[package]] [[package]]
name = "bytecount" name = "bytecount"
version = "0.6.3" version = "0.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2c676a478f63e9fa2dd5368a42f28bba0d6c560b775f38583c8bbaa7fcd67c9c" checksum = "b0017894339f586ccb943b01b9555de56770c11cda818e7e3d8bd93f4ed7f46e"
[[package]]
name = "byteorder"
version = "1.3.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "08c48aae112d48ed9f069b33538ea9e3e90aa263cfa3d1c24309612b1f7472de"
[[package]] [[package]]
name = "cc" name = "cc"
version = "1.0.79" version = "1.0.61"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "50d30906286121d95be3d479533b458f87493b30a4b5f79a607db8f5d11aa91f" checksum = "ed67cbde08356238e75fc4656be4749481eeffb09e19f320a25237d5221c985d"
dependencies = [ dependencies = [
"jobserver", "jobserver",
] ]
[[package]]
name = "cfg-if"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4785bdd1c96b2a846b2bd7cc02e86b6b3dbf14e7e53446c4f54c92a361040822"
[[package]] [[package]]
name = "cfg-if" name = "cfg-if"
version = "1.0.0" version = "1.0.0"
@@ -57,9 +84,9 @@ checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd"
[[package]] [[package]]
name = "clap" name = "clap"
version = "2.34.0" version = "2.33.3"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a0610544180c38b88101fecf2dd634b174a62eef6946f84dfc6a7127512b381c" checksum = "37e58ac78573c40708d45522f0d80fa2f01cc4f9b4e2bf749807255454312002"
dependencies = [ dependencies = [
"bitflags", "bitflags",
"strsim", "strsim",
@@ -68,31 +95,40 @@ dependencies = [
 ]
 [[package]]
-name = "crossbeam-channel"
+name = "const_fn"
-version = "0.5.8"
+version = "0.4.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "a33c2bf77f2df06183c3aa30d1e96c0695a313d4f9c453cc3762a6db39f99200"
+checksum = "ce90df4c658c62f12d78f7508cf92f9173e5184a539c10bfe54a3107b3ffd0f2"
+[[package]]
+name = "crossbeam-channel"
+version = "0.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "dca26ee1f8d361640700bde38b2c37d8c22b3ce2d360e1fc1c74ea4b0aa7d775"
 dependencies = [
-"cfg-if",
+"cfg-if 1.0.0",
 "crossbeam-utils",
 ]
 [[package]]
 name = "crossbeam-utils"
-version = "0.8.16"
+version = "0.8.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "5a22b2d63d4d1dc0b7f1b6b2747dd0088008a9be28b6ddf0b1e7d335e3037294"
+checksum = "ec91540d98355f690a86367e566ecad2e9e579f230230eb7c21398372be73ea5"
 dependencies = [
-"cfg-if",
+"autocfg",
+"cfg-if 1.0.0",
+"const_fn",
+"lazy_static",
 ]
 [[package]]
 name = "encoding_rs"
-version = "0.8.32"
+version = "0.8.26"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "071a31f4ee85403370b58aca746f01041ede6f0da2730960ad001edc2b71b394"
+checksum = "801bbab217d7f79c0062f4f7205b5d4427c6d1a7bd7aafdd1475f7c59d62b283"
 dependencies = [
-"cfg-if",
+"cfg-if 1.0.0",
 "packed_simd_2",
 ]
@@ -112,14 +148,20 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1"
 [[package]]
-name = "glob"
+name = "fs_extra"
-version = "0.3.1"
+version = "1.2.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "d2fabcfbdc87f4758337ca535fb41a6d701b65693ce38287d856d1674551ec9b"
+checksum = "2022715d62ab30faffd124d40b76f4134a550a87792276512b18d63272333394"
+[[package]]
+name = "glob"
+version = "0.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9b919933a397b79c37e33b77bb2aa3dc8eb6e165ad809e58ff75bc7db2e34574"
 [[package]]
 name = "globset"
-version = "0.4.11"
+version = "0.4.6"
 dependencies = [
 "aho-corasick",
 "bstr",
@@ -134,7 +176,7 @@ dependencies = [
 [[package]]
 name = "grep"
-version = "0.2.12"
+version = "0.2.7"
 dependencies = [
 "grep-cli",
 "grep-matcher",
@@ -148,8 +190,9 @@ dependencies = [
 [[package]]
 name = "grep-cli"
-version = "0.1.9"
+version = "0.1.5"
 dependencies = [
+"atty",
 "bstr",
 "globset",
 "lazy_static",
@@ -162,7 +205,7 @@ dependencies = [
 [[package]]
 name = "grep-matcher"
-version = "0.1.6"
+version = "0.1.4"
 dependencies = [
 "memchr",
 "regex",
@@ -170,16 +213,15 @@ dependencies = [
 [[package]]
 name = "grep-pcre2"
-version = "0.1.6"
+version = "0.1.4"
 dependencies = [
 "grep-matcher",
-"log",
 "pcre2",
 ]
 [[package]]
 name = "grep-printer"
-version = "0.1.7"
+version = "0.1.5"
 dependencies = [
 "base64",
 "bstr",
@@ -187,25 +229,27 @@ dependencies = [
 "grep-regex",
 "grep-searcher",
 "serde",
+"serde_derive",
 "serde_json",
 "termcolor",
 ]
 [[package]]
 name = "grep-regex"
-version = "0.1.11"
+version = "0.1.8"
 dependencies = [
 "aho-corasick",
 "bstr",
 "grep-matcher",
 "log",
-"regex-automata",
+"regex",
 "regex-syntax",
+"thread_local",
 ]
 [[package]]
 name = "grep-searcher"
-version = "0.1.11"
+version = "0.1.7"
 dependencies = [
 "bstr",
 "bytecount",
@@ -214,15 +258,25 @@ dependencies = [
 "grep-matcher",
 "grep-regex",
 "log",
-"memmap2",
+"memmap",
 "regex",
 ]
+[[package]]
+name = "hermit-abi"
+version = "0.1.17"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5aca5565f760fb5b220e499d72710ed156fdb74e631659e99377d9ebfbd13ae8"
+dependencies = [
+"libc",
+]
 [[package]]
 name = "ignore"
-version = "0.4.20"
+version = "0.4.17"
 dependencies = [
 "crossbeam-channel",
+"crossbeam-utils",
 "globset",
 "lazy_static",
 "log",
@@ -236,25 +290,26 @@ dependencies = [
 [[package]]
 name = "itoa"
-version = "1.0.8"
+version = "0.4.6"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "62b02a5381cc465bd3041d84623d0fa3b66738b52b8e2fc3bab8ad63ab032f4a"
+checksum = "dc6f3ad7b9d11a0c00842ff8de1b60ee58661048eb8049ed33c73594f359d7e6"
 [[package]]
 name = "jemalloc-sys"
-version = "0.5.3+5.3.0-patched"
+version = "0.3.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "f9bd5d616ea7ed58b571b2e209a65759664d7fb021a0819d7a790afc67e47ca1"
+checksum = "0d3b9f3f5c9b31aa0f5ed3260385ac205db665baa41d49bb8338008ae94ede45"
 dependencies = [
 "cc",
+"fs_extra",
 "libc",
 ]
 [[package]]
 name = "jemallocator"
-version = "0.5.0"
+version = "0.3.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "16c2514137880c52b0b4822b563fadd38257c1f380858addb74a400889696ea6"
+checksum = "43ae63fcfc45e99ab3d1b29a46782ad679e98436c3169d15a167a1108a724b69"
 dependencies = [
 "jemalloc-sys",
 "libc",
@@ -262,9 +317,9 @@ dependencies = [
 [[package]]
 name = "jobserver"
-version = "0.1.26"
+version = "0.1.21"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "936cfd212a0155903bcbc060e316fb6cc7cbf2e1907329391ebadc1fe0ce77c2"
+checksum = "5c71313ebb9439f74b00d9d2dcec36440beaf57a6aa0623068441dd7cd81a7f2"
 dependencies = [
 "libc",
 ]
@@ -277,9 +332,9 @@ checksum = "e2abad23fbc42b3700f2f279844dc832adb2b2eb069b2df918f455c4e18cc646"
 [[package]]
 name = "libc"
-version = "0.2.147"
+version = "0.2.80"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b4668fb0ea861c1df094127ac5f1da3409a82116a4ba74fca2e58ef927159bb3"
+checksum = "4d58d1b70b004888f764dfbf6a26a3b0342a1632d33968e4a179d8011c760614"
 [[package]]
 name = "libm"
@@ -289,46 +344,54 @@ checksum = "7fc7aa29613bd6a620df431842069224d8bc9011086b1db4c0e0cd47fa03ec9a"
 [[package]]
 name = "log"
-version = "0.4.19"
+version = "0.4.11"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b06a4cde4c0f271a446782e3eff8de789548ce57dbc8eca9292c27f4a42004b4"
+checksum = "4fabed175da42fed1fa0746b0ea71f412aa9d35e76e95e59b192c64b9dc2bf8b"
+dependencies = [
+"cfg-if 0.1.10",
+]
 [[package]]
 name = "memchr"
-version = "2.5.0"
+version = "2.3.4"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "2dffe52ecf27772e601905b7522cb4ef790d2cc203488bbd0e2fe85fcb74566d"
+checksum = "0ee1c47aaa256ecabcaea351eae4a9b01ef39ed810004e298d2511ed284b1525"
 [[package]]
-name = "memmap2"
+name = "memmap"
-version = "0.5.10"
+version = "0.7.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "83faa42c0a078c393f6b29d5db232d8be22776a891f8f56e5284faee4a20b327"
+checksum = "6585fd95e7bb50d6cc31e20d4cf9afb4e2ba16c5846fc76793f11218da9c475b"
 dependencies = [
 "libc",
+"winapi",
+]
+[[package]]
+name = "num_cpus"
+version = "1.13.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "05499f3756671c15885fee9034446956fff3f243d6077b91e5767df161f766b3"
+dependencies = [
+"hermit-abi",
+"libc",
 ]
-[[package]]
-name = "once_cell"
-version = "1.18.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "dd8b5dd2ae5ed71462c540258bedcb51965123ad7e7ccf4b9a8cafaa4a63576d"
 [[package]]
 name = "packed_simd_2"
-version = "0.3.8"
+version = "0.3.4"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "a1914cd452d8fccd6f9db48147b29fd4ae05bea9dc5d9ad578509f72415de282"
+checksum = "3278e0492f961fd4ae70909f56b2723a7e8d01a228427294e19cdfdebda89a17"
 dependencies = [
-"cfg-if",
+"cfg-if 0.1.10",
 "libm",
 ]
 [[package]]
 name = "pcre2"
-version = "0.2.4"
+version = "0.2.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "486aca7e74edb8cab09a48d461177f450a5cca3b55e61d139f7552190e2bbcf5"
+checksum = "85b30f2f69903b439dd9dc9e824119b82a55bf113b29af8d70948a03c1b11ab1"
 dependencies = [
 "libc",
 "log",
@@ -338,9 +401,9 @@ dependencies = [
 [[package]]
 name = "pcre2-sys"
-version = "0.2.6"
+version = "0.2.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "ae234f441970dbd52d4e29bee70f3b56ca83040081cb2b55b7df772b16e0b06e"
+checksum = "dec30e5e9ec37eb8fbf1dea5989bc957fd3df56fbee5061aa7b7a99dbb37b722"
 dependencies = [
 "cc",
 "libc",
@@ -349,60 +412,58 @@ dependencies = [
 [[package]]
 name = "pkg-config"
-version = "0.3.27"
+version = "0.3.19"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "26072860ba924cbfa98ea39c8c19b4dd6a4a25423dbdf219c1eca91aa0cf6964"
+checksum = "3831453b3449ceb48b6d9c7ad7c96d5ea673e9b470a1dc578c2ce6521230884c"
 [[package]]
 name = "proc-macro2"
-version = "1.0.63"
+version = "1.0.24"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "7b368fba921b0dce7e60f5e04ec15e565b3303972b42bcfde1d0713b881959eb"
+checksum = "1e0704ee1a7e00d7bb417d0770ea303c1bccbabf0ef1667dae92b5967f5f8a71"
 dependencies = [
-"unicode-ident",
+"unicode-xid",
 ]
 [[package]]
 name = "quote"
-version = "1.0.29"
+version = "1.0.7"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "573015e8ab27661678357f27dc26460738fd2b6c86e46f386fde94cb5d913105"
+checksum = "aa563d17ecb180e500da1cfd2b028310ac758de548efdd203e18f283af693f37"
 dependencies = [
 "proc-macro2",
 ]
 [[package]]
 name = "regex"
-version = "1.9.0"
+version = "1.4.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "89089e897c013b3deb627116ae56a6955a72b8bed395c9526af31c9fe528b484"
+checksum = "38cf2c13ed4745de91a5eb834e11c00bcc3709e773173b2ce4c56c9fbde04b9c"
 dependencies = [
 "aho-corasick",
 "memchr",
-"regex-automata",
 "regex-syntax",
+"thread_local",
 ]
 [[package]]
 name = "regex-automata"
-version = "0.3.0"
+version = "0.1.9"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "fa250384981ea14565685dea16a9ccc4d1c541a13f82b9c168572264d1df8c56"
+checksum = "ae1ded71d66a4a97f5e961fd0cb25a5f366a42a41570d16a763a69c092c26ae4"
 dependencies = [
-"aho-corasick",
+"byteorder",
-"memchr",
-"regex-syntax",
 ]
 [[package]]
 name = "regex-syntax"
-version = "0.7.3"
+version = "0.6.21"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "2ab07dc67230e4a4718e70fd5c20055a4334b121f1f9db8fe63ef39ce9b8c846"
+checksum = "3b181ba2dcf07aaccad5448e8ead58db5b742cf85dfe035e2227f137a539a189"
 [[package]]
 name = "ripgrep"
-version = "13.0.0"
+version = "12.1.1"
 dependencies = [
 "bstr",
 "clap",
@@ -411,6 +472,8 @@ dependencies = [
 "jemallocator",
 "lazy_static",
 "log",
+"num_cpus",
+"regex",
 "serde",
 "serde_derive",
 "serde_json",
@@ -420,9 +483,9 @@ dependencies = [
 [[package]]
 name = "ryu"
-version = "1.0.14"
+version = "1.0.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "fe232bdf6be8c8de797b22184ee71118d63780ea42ac85b61d1baa6d3b782ae9"
+checksum = "71d301d4193d031abdd79ff7e3dd721168a9572ef3fe51a1517aba235bd8f86e"
 [[package]]
 name = "same-file"
@@ -435,18 +498,15 @@ dependencies = [
 [[package]]
 name = "serde"
-version = "1.0.166"
+version = "1.0.117"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "d01b7404f9d441d3ad40e6a636a7782c377d2abdbe4fa2440e2edcc2f4f10db8"
+checksum = "b88fa983de7720629c9387e9f517353ed404164b1e482c970a90c1a4aaf7dc1a"
-dependencies = [
-"serde_derive",
-]
 [[package]]
 name = "serde_derive"
-version = "1.0.166"
+version = "1.0.117"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "5dd83d6dde2b6b2d466e14d9d1acce8816dedee94f735eac6395808b3483c6d6"
+checksum = "cbd1ae72adb44aab48f325a02444a5fc079349a8d804c1fc922aed3f7454c74e"
 dependencies = [
 "proc-macro2",
 "quote",
@@ -455,9 +515,9 @@ dependencies = [
 [[package]]
 name = "serde_json"
-version = "1.0.100"
+version = "1.0.59"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "0f1e14e89be7aa4c4b78bdbdc9eb5bf8517829a600ae8eaa39a6e1d960b5185c"
+checksum = "dcac07dbffa1c65e7f816ab9eba78eb142c6d44410f4eeba1e26e4f5dfa56b95"
 dependencies = [
 "itoa",
 "ryu",
@@ -472,20 +532,20 @@ checksum = "8ea5119cdb4c55b55d432abb513a0429384878c15dde60cc77b1c99de1a95a6a"
 [[package]]
 name = "syn"
-version = "2.0.23"
+version = "1.0.48"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "59fb7d6d8281a51045d62b8eb3a7d1ce347b76f312af50cd3dc0af39c87c1737"
+checksum = "cc371affeffc477f42a221a1e4297aedcea33d47d19b61455588bd9d8f6b19ac"
 dependencies = [
 "proc-macro2",
 "quote",
-"unicode-ident",
+"unicode-xid",
 ]
 [[package]]
 name = "termcolor"
-version = "1.2.0"
+version = "1.1.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "be55cf8942feac5c765c2c993422806843c9a9a45d4d5c407ad6dd2ea95eb9b6"
+checksum = "bb6bfa289a4d7c5766392812c0a1f4c1ba45afa1ad47803c11e1f407d846d75f"
 dependencies = [
 "winapi-util",
 ]
@@ -501,33 +561,33 @@ dependencies = [
 [[package]]
 name = "thread_local"
-version = "1.1.7"
+version = "1.0.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3fdd6f064ccff2d6567adcb3873ca630700f00b5ad3f060c25b5dcfd9a4ce152"
+checksum = "d40c6d1b69745a6ec6fb1ca717914848da4b44ae29d9b3080cbee91d72a69b14"
 dependencies = [
-"cfg-if",
+"lazy_static",
-"once_cell",
 ]
 [[package]]
-name = "unicode-ident"
+name = "unicode-width"
-version = "1.0.10"
+version = "0.1.8"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "22049a19f4a68748a168c0fc439f9516686aa045927ff767eca0a85101fb6e73"
+checksum = "9337591893a19b88d8d87f2cec1e73fad5cdfd10e5a6f349f498ad6ea2ffb1e3"
 [[package]]
-name = "unicode-width"
+name = "unicode-xid"
-version = "0.1.10"
+version = "0.2.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "c0edd1e5b14653f783770bce4a4dabb4a5108a5370a5f5d8cfe8710c361f6c8b"
+checksum = "f7fe0bb3479651439c9112f72b6c505038574c9fbb575ed1bf3b797fa39dd564"
 [[package]]
 name = "walkdir"
-version = "2.3.3"
+version = "2.3.1"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "36df944cda56c7d8d8b7496af378e6b16de9284591917d307c9b4d313c44e698"
+checksum = "777182bc735b6424e1a57516d35ed72cb8019d85c8c9bf536dccb3445c1a2f7d"
 dependencies = [
 "same-file",
+"winapi",
 "winapi-util",
 ]

Cargo.toml
@@ -1,30 +1,23 @@
 [package]
 name = "ripgrep"
-version = "13.0.0" #:version
+version = "12.1.1" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
-ripgrep is a line-oriented search tool that recursively searches the current
+ripgrep is a line-oriented search tool that recursively searches your current
-directory for a regex pattern while respecting gitignore rules. ripgrep has
+directory for a regex pattern while respecting your gitignore rules. ripgrep
-first class support on Windows, macOS and Linux.
+has first class support on Windows, macOS and Linux.
 """
 documentation = "https://github.com/BurntSushi/ripgrep"
 homepage = "https://github.com/BurntSushi/ripgrep"
 repository = "https://github.com/BurntSushi/ripgrep"
+readme = "README.md"
 keywords = ["regex", "grep", "egrep", "search", "pattern"]
 categories = ["command-line-utilities", "text-processing"]
 license = "Unlicense OR MIT"
-exclude = [
+exclude = ["HomebrewFormula"]
-"HomebrewFormula",
-"/.github/",
-"/ci/",
-"/pkg/",
-"/benchsuite/",
-"/scripts/",
-]
 build = "build.rs"
 autotests = false
 edition = "2018"
-rust-version = "1.70"
 [[bin]]
 bench = false
@@ -49,11 +42,13 @@ members = [
 ]
 [dependencies]
-bstr = "1.6.0"
+bstr = "0.2.12"
-grep = { version = "0.2.12", path = "crates/grep" }
+grep = { version = "0.2.7", path = "crates/grep" }
-ignore = { version = "0.4.19", path = "crates/ignore" }
+ignore = { version = "0.4.16", path = "crates/ignore" }
 lazy_static = "1.1.0"
 log = "0.4.5"
+num_cpus = "1.8.0"
+regex = "1.3.5"
 serde_json = "1.0.23"
 termcolor = "1.1.0"
@@ -63,7 +58,7 @@ default-features = false
 features = ["suggestions"]
 [target.'cfg(all(target_env = "musl", target_pointer_width = "64"))'.dependencies.jemallocator]
-version = "0.5.0"
+version = "0.3.0"
 [build-dependencies]
 lazy_static = "1.1.0"

Cross.toml
@@ -6,7 +6,6 @@ image = "burntsushi/cross:i686-unknown-linux-gnu"
 [target.mips64-unknown-linux-gnuabi64]
 image = "burntsushi/cross:mips64-unknown-linux-gnuabi64"
-build-std = true
 [target.arm-unknown-linux-gnueabihf]
 image = "burntsushi/cross:arm-unknown-linux-gnueabihf"

GUIDE.md
@@ -177,21 +177,16 @@ After recursive search, ripgrep's most important feature is what it *doesn't*
 search. By default, when you search a directory, ripgrep will ignore all of
 the following:
-1. Files and directories that match glob patterns in these three categories:
+1. Files and directories that match the rules in your `.gitignore` glob
-   1. gitignore globs (including global and repo-specific globs).
+   pattern.
-   2. `.ignore` globs, which take precedence over all gitignore globs
-      when there's a conflict.
-   3. `.rgignore` globs, which take precedence over all `.ignore` globs
-      when there's a conflict.
 2. Hidden files and directories.
 3. Binary files. (ripgrep considers any file with a `NUL` byte to be binary.)
 4. Symbolic links aren't followed.
 All of these things can be toggled using various flags provided by ripgrep:
-1. You can disable all ignore-related filtering with the `--no-ignore` flag.
+1. You can disable `.gitignore` handling with the `--no-ignore` flag.
-2. Hidden files and directories can be searched with the `--hidden` (`-.` for
+2. Hidden files and directories can be searched with the `--hidden` flag.
-   short) flag.
 3. Binary files can be searched via the `--text` (`-a` for short) flag.
    Be careful with this flag! Binary files may emit control characters to your
    terminal, which might cause strange behavior.
@@ -567,15 +562,12 @@ $ cat $HOME/.ripgreprc
 --type-add
 web:*.{html,css,js}*
-# Search hidden files / directories (e.g. dotfiles) by default
---hidden
 # Using glob patterns to include/exclude files or folders
---glob=!.git/*
+--glob=!git/*
 # or
 --glob
-!.git/*
+!git/*
 # Set the colors.
 --colors=line:none
@@ -652,9 +644,9 @@ given, which is the default:
 they correspond to a UTF-16 BOM, then ripgrep will transcode the contents of
 the file from UTF-16 to UTF-8, and then execute the search on the transcoded
 version of the file. (This incurs a performance penalty since transcoding
-is needed in addition to regex searching.) If the file contains invalid
+is slower than regex searching.) If the file contains invalid UTF-16, then
-UTF-16, then the Unicode replacement codepoint is substituted in place of
+the Unicode replacement codepoint is substituted in place of invalid code
-invalid code units.
+units.
 * To handle other cases, ripgrep provides a `-E/--encoding` flag, which permits
   you to specify an encoding from the
   [Encoding Standard](https://encoding.spec.whatwg.org/#concept-encoding-get).
@@ -996,8 +988,6 @@ used options that will likely impact how you use ripgrep on a regular basis.
 * `-S/--smart-case`: This is similar to `--ignore-case`, but disables itself
   if the pattern contains any uppercase letters. Usually this flag is put into
   alias or a config file.
-* `-F/--fixed-strings`: Disable regular expression matching and treat the pattern
-  as a literal string.
 * `-w/--word-regexp`: Require that all matches of the pattern be surrounded
   by word boundaries. That is, given `pattern`, the `--word-regexp` flag will
   cause ripgrep to behave as if `pattern` were actually `\b(?:pattern)\b`.

README.md

@@ -1,12 +1,12 @@
 ripgrep (rg)
 ------------
-ripgrep is a line-oriented search tool that recursively searches the current
+ripgrep is a line-oriented search tool that recursively searches your current
-directory for a regex pattern. By default, ripgrep will respect gitignore rules
+directory for a regex pattern. By default, ripgrep will respect your .gitignore
-and automatically skip hidden files/directories and binary files. (To disable
+and automatically skip hidden files/directories and binary files. ripgrep
-all automatic filtering by default, use `rg -uuu`.) ripgrep has first class
+has first class support on Windows, macOS and Linux, with binary downloads
-support on Windows, macOS and Linux, with binary downloads available for [every
+available for [every release](https://github.com/BurntSushi/ripgrep/releases).
-release](https://github.com/BurntSushi/ripgrep/releases). ripgrep is similar to
+ripgrep is similar to other popular search tools like The Silver Searcher, ack
-other popular search tools like The Silver Searcher, ack and grep.
+and grep.
 [![Build status](https://github.com/BurntSushi/ripgrep/workflows/ci/badge.svg)](https://github.com/BurntSushi/ripgrep/actions)
 [![Crates.io](https://img.shields.io/crates/v/ripgrep.svg)](https://crates.io/crates/ripgrep)
@@ -90,16 +90,16 @@ times are unaffected by the presence or absence of `-n`.
 because it contains most of their features and is generally faster. (See
 [the FAQ](FAQ.md#posix4ever) for more details on whether ripgrep can truly
 replace grep.)
-* Like other tools specialized to code search, ripgrep defaults to
+* Like other tools specialized to code search, ripgrep defaults to recursive
-  [recursive search](GUIDE.md#recursive-search) and does [automatic directory
+  search and won't search files ignored by your
-  filtering](GUIDE.md#automatic-filtering). Namely, ripgrep won't search files
+  `.gitignore`/`.ignore`/`.rgignore` files. It also ignores hidden and binary
-  ignored by your `.gitignore`/`.ignore`/`.rgignore` files, it won't search
+  files by default. ripgrep also implements full support for `.gitignore`,
-  hidden files and it won't search binary files. Automatic filtering can be
+  whereas there are many bugs related to that functionality in other code
-  disabled with `rg -uuu`.
+  search tools claiming to provide the same functionality.
-* ripgrep can [search specific types of files](GUIDE.md#manual-filtering-file-types).
+* ripgrep can search specific types of files. For example, `rg -tpy foo`
-  For example, `rg -tpy foo` limits your search to Python files and `rg -Tjs
+  limits your search to Python files and `rg -Tjs foo` excludes JavaScript
-  foo` excludes JavaScript files from your search. ripgrep can be taught about
+  files from your search. ripgrep can be taught about new file types with
-  new file types with custom matching rules.
+  custom matching rules.
 * ripgrep supports many features found in `grep`, such as showing the context
   of search results, searching multiple patterns, highlighting matches with
   color and full Unicode support. Unlike GNU grep, ripgrep stays fast while
@@ -110,20 +110,16 @@ times are unaffected by the presence or absence of `-n`.
   regex engine. PCRE2 support can be enabled with `-P/--pcre2` (use PCRE2
   always) or `--auto-hybrid-regex` (use PCRE2 only if needed). An alternative
   syntax is provided via the `--engine (default|pcre2|auto-hybrid)` option.
-* ripgrep has [rudimentary support for replacements](GUIDE.md#replacements),
+* ripgrep supports searching files in text encodings other than UTF-8, such
-  which permit rewriting output based on what was matched.
+  as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for
-* ripgrep supports [searching files in text encodings](GUIDE.md#file-encoding)
+  automatically detecting UTF-16 is provided. Other text encodings must be
-  other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more.
+  specifically specified with the `-E/--encoding` flag.)
-  (Some support for automatically detecting UTF-16 is provided. Other text
-  encodings must be specifically specified with the `-E/--encoding` flag.)
 * ripgrep supports searching files compressed in a common format (brotli,
   bzip2, gzip, lz4, lzma, xz, or zstandard) with the `-z/--search-zip` flag.
 * ripgrep supports
   [arbitrary input preprocessing filters](GUIDE.md#preprocessor)
   which could be PDF text extraction, less supported decompression, decrypting,
   automatic encoding detection and so on.
-* ripgrep can be configured via a
-  [configuration file](GUIDE.md#configuration-file).
 In other words, use ripgrep if you like speed, filtering by default, fewer
 bugs and Unicode support.
@@ -196,9 +192,15 @@ multiline search and opt-in fancy regex support via PCRE2.
 The binary name for ripgrep is `rg`.
 **[Archives of precompiled binaries for ripgrep are available for Windows,
-macOS and Linux.](https://github.com/BurntSushi/ripgrep/releases)** Linux and
+macOS and Linux.](https://github.com/BurntSushi/ripgrep/releases)** Users of
-Windows binaries are static executables. Users of platforms not explicitly
+platforms not explicitly mentioned below are advised to download one of these
-mentioned below are advised to download one of these archives.
+archives.
+Linux binaries are static executables. Windows binaries are available either as
+built with MinGW (GNU) or with Microsoft Visual C++ (MSVC). When possible,
+prefer MSVC over GNU, but you'll need to have the [Microsoft VC++ 2015
+redistributable](https://www.microsoft.com/en-us/download/details.aspx?id=48145)
+installed.
 If you're a **macOS Homebrew** or a **Linuxbrew** user, then you can install
 ripgrep from homebrew-core:
@@ -228,25 +230,17 @@ If you're a **Windows Scoop** user, then you can install ripgrep from the
 $ scoop install ripgrep
 ```
-If you're a **Windows Winget** user, then you can install ripgrep from the
-[winget-pkgs](https://github.com/microsoft/winget-pkgs/tree/master/manifests/b/BurntSushi/ripgrep)
-repository:
-```
-$ winget install BurntSushi.ripgrep.MSVC
-```
 If you're an **Arch Linux** user, then you can install ripgrep from the official repos:
 ```
-$ sudo pacman -S ripgrep
+$ pacman -S ripgrep
 ```
 If you're a **Gentoo** user, you can install ripgrep from the
 [official repo](https://packages.gentoo.org/packages/sys-apps/ripgrep):
 ```
-$ sudo emerge sys-apps/ripgrep
+$ emerge sys-apps/ripgrep
 ```
 If you're a **Fedora** user, you can install ripgrep from official
@@ -267,7 +261,6 @@ If you're a **RHEL/CentOS 7/8** user, you can install ripgrep from
 [copr](https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/):
 ```
-$ sudo yum install -y yum-utils
 $ sudo yum-config-manager --add-repo=https://copr.fedorainfracloud.org/coprs/carlwgeorge/ripgrep/repo/epel-7/carlwgeorge-ripgrep-epel-7.repo
 $ sudo yum install ripgrep
 ```
@@ -277,13 +270,7 @@ If you're a **Nix** user, you can install ripgrep from
 ```
 $ nix-env --install ripgrep
+$ # (Or using the attribute name, which is also ripgrep.)
 ```
-If you're a **Guix** user, you can install ripgrep from the official
-package collection:
-```
-$ sudo guix install ripgrep
-```
 If you're a **Debian** user (or a user of a Debian derivative like **Ubuntu**),
@@ -291,14 +278,12 @@ then ripgrep can be installed using a binary `.deb` file provided in each
 [ripgrep release](https://github.com/BurntSushi/ripgrep/releases).
 ```
-$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/13.0.0/ripgrep_13.0.0_amd64.deb
-$ sudo dpkg -i ripgrep_13.0.0_amd64.deb
+$ curl -LO https://github.com/BurntSushi/ripgrep/releases/download/12.1.1/ripgrep_12.1.1_amd64.deb
+$ sudo dpkg -i ripgrep_12.1.1_amd64.deb
 ```
-If you run Debian stable, ripgrep is [officially maintained by
-Debian](https://tracker.debian.org/pkg/rust-ripgrep), although its version may
-be older than the `deb` package available in the previous step.
+If you run Debian Buster (currently Debian stable) or Debian sid, ripgrep is
+[officially maintained by Debian](https://tracker.debian.org/pkg/rust-ripgrep).
 ```
 $ sudo apt-get install ripgrep
 ```
@@ -316,18 +301,11 @@ seem to work right and generate a number of very strange bug reports that I
 don't know how to fix and don't have the time to fix. Therefore, it is no
 longer a recommended installation option.)
-If you're an **ALT** user, you can install ripgrep from the
-[official repo](https://packages.altlinux.org/en/search?name=ripgrep):
-```
-$ sudo apt-get install ripgrep
-```
 If you're a **FreeBSD** user, then you can install ripgrep from the
 [official ports](https://www.freshports.org/textproc/ripgrep/):
 ```
-$ sudo pkg install ripgrep
+# pkg install ripgrep
 ```
 If you're an **OpenBSD** user, then you can install ripgrep from the
@@ -341,26 +319,26 @@ If you're a **NetBSD** user, then you can install ripgrep from
 [pkgsrc](https://pkgsrc.se/textproc/ripgrep):
 ```
-$ sudo pkgin install ripgrep
+# pkgin install ripgrep
 ```
 If you're a **Haiku x86_64** user, then you can install ripgrep from the
 [official ports](https://github.com/haikuports/haikuports/tree/master/sys-apps/ripgrep):
 ```
-$ sudo pkgman install ripgrep
+$ pkgman install ripgrep
 ```
 If you're a **Haiku x86_gcc2** user, then you can install ripgrep from the
 same port as Haiku x86_64 using the x86 secondary architecture build:
 ```
-$ sudo pkgman install ripgrep_x86
+$ pkgman install ripgrep_x86
 ```
 If you're a **Rust programmer**, ripgrep can be installed with `cargo`.
-* Note that the minimum supported version of Rust for ripgrep is **1.70.0**,
+* Note that the minimum supported version of Rust for ripgrep is **1.34.0**,
 although ripgrep may work with older versions.
 * Note that the binary may be bigger than expected because it contains debug
 symbols. This is intentional. To remove debug symbols and therefore reduce
@@ -375,7 +353,7 @@ $ cargo install ripgrep
 ripgrep is written in Rust, so you'll need to grab a
 [Rust installation](https://www.rust-lang.org/) in order to compile it.
-ripgrep compiles with Rust 1.70.0 (stable) or newer. In general, ripgrep tracks
+ripgrep compiles with Rust 1.34.0 (stable) or newer. In general, ripgrep tracks
 the latest stable release of the Rust compiler.
 To build ripgrep:
@@ -447,26 +425,9 @@ $ cargo test --all
 from the repository root.
-### Related tools
-* [delta](https://github.com/dandavison/delta) is a syntax highlighting
-pager that supports the `rg --json` output format. So all you need to do to
-make it work is `rg --json pattern | delta`. See [delta's manual section on
-grep](https://dandavison.github.io/delta/grep.html) for more details.
-### Vulnerability reporting
-For reporting a security vulnerability, please
-[contact Andrew Gallant](https://blog.burntsushi.net/about/).
-The contact page has my email address and PGP public key if you wish to send an
-encrypted message.
 ### Translations
 The following is a list of known translations of ripgrep's documentation. These
 are unofficially maintained and may not be up to date.
 * [Chinese](https://github.com/chinanf-boy/ripgrep-zh#%E6%9B%B4%E6%96%B0-)
-* [Spanish](https://github.com/UltiRequiem/traducciones/tree/master/ripgrep)
View File
@@ -1,11 +1,9 @@
 Release Checklist
 -----------------
-* Ensure local `master` is up to date with respect to `origin/master`.
 * Run `cargo update` and review dependency updates. Commit updated
 `Cargo.lock`.
 * Run `cargo outdated` and review semver incompatible updates. Unless there is
-a strong motivation otherwise, review and update every dependency. Also
-run `--aggressive`, but don't update to crates that are still in beta.
+a strong motivation otherwise, review and update every dependency.
 * Review changes for every crate in `crates` since the last ripgrep release.
 If the set of changes is non-empty, issue a new release for that crate. Check
 crates in the following order. After updating a crate, ensure minimal
@@ -26,19 +24,14 @@ Release Checklist
 `cargo update -p ripgrep` so that the `Cargo.lock` is updated. Commit the
 changes and create a new signed tag. Alternatively, use
 `cargo-up --no-push --no-release Cargo.toml {VERSION}` to automate this.
-* Push changes to GitHub, NOT including the tag. (But do not publish new
-version of ripgrep to crates.io yet.)
-* Once CI for `master` finishes successfully, push the version tag. (Trying to
-do this in one step seems to result in GitHub Actions not seeing the tag
-push and thus not running the release workflow.)
 * Wait for CI to finish creating the release. If the release build fails, then
 delete the tag from GitHub, make fixes, re-tag, delete the release and push.
 * Copy the relevant section of the CHANGELOG to the tagged release notes.
 Include this blurb describing what ripgrep is:
 > In case you haven't heard of it before, ripgrep is a line-oriented search
-> tool that recursively searches the current directory for a regex pattern.
-> By default, ripgrep will respect gitignore rules and automatically skip
-> hidden files/directories and binary files.
+> tool that recursively searches your current directory for a regex pattern.
+> By default, ripgrep will respect your gitignore rules and automatically
+> skip hidden files/directories and binary files.
 * Run `ci/build-deb` locally and manually upload the deb package to the
 release.
 * Run `cargo publish`.
View File
@@ -26,13 +26,15 @@ SUBTITLES_DIR = 'subtitles'
 SUBTITLES_EN_NAME = 'en.txt'
 SUBTITLES_EN_NAME_SAMPLE = 'en.sample.txt'
 SUBTITLES_EN_NAME_GZ = '%s.gz' % SUBTITLES_EN_NAME
+# SUBTITLES_EN_URL = 'http://opus.lingfil.uu.se/OpenSubtitles2016/mono/OpenSubtitles2016.raw.en.gz' # noqa
 SUBTITLES_EN_URL = 'https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2016/mono/en.txt.gz' # noqa
 SUBTITLES_RU_NAME = 'ru.txt'
 SUBTITLES_RU_NAME_GZ = '%s.gz' % SUBTITLES_RU_NAME
+# SUBTITLES_RU_URL = 'http://opus.lingfil.uu.se/OpenSubtitles2016/mono/OpenSubtitles2016.raw.ru.gz' # noqa
 SUBTITLES_RU_URL = 'https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2016/mono/ru.txt.gz' # noqa
 LINUX_DIR = 'linux'
-LINUX_CLONE = 'https://github.com/BurntSushi/linux'
+LINUX_CLONE = 'git://github.com/BurntSushi/linux'
 # Grep takes locale settings from the environment. There is a *substantial*
 # performance impact for enabling Unicode, so we need to handle this explicitly
@@ -544,11 +546,7 @@ def bench_subtitles_ru_literal(suite_dir):
 Command('rg (lines)', ['rg', '-n', pat, ru]),
 Command('ag (lines)', ['ag', '-s', pat, ru]),
 Command('grep (lines)', ['grep', '-n', pat, ru], env=GREP_ASCII),
-# ugrep incorrectly identifies this corpus as binary, but it is
-# entirely valid UTF-8. So we tell ugrep to always treat the corpus
-# as text even though this technically gives it an edge over other
-# tools. (It no longer needs to check for binary data.)
-Command('ugrep (lines)', ['ugrep', '-a', '-n', pat, ru])
+Command('ugrep (lines)', ['ugrep', '-n', pat, ru])
 ])
@@ -566,8 +564,7 @@ def bench_subtitles_ru_literal_casei(suite_dir):
 Command('grep (ASCII)', ['grep', '-E', '-i', pat, ru], env=GREP_ASCII),
 Command('rg (lines)', ['rg', '-n', '-i', pat, ru]),
 Command('ag (lines) (ASCII)', ['ag', '-i', pat, ru]),
-# See bench_subtitles_ru_literal for why we use '-a' here.
-Command('ugrep (lines) (ASCII)', ['ugrep', '-a', '-n', '-i', pat, ru])
+Command('ugrep (lines) (ASCII)', ['ugrep', '-n', '-i', pat, ru])
 ])
@@ -591,8 +588,7 @@ def bench_subtitles_ru_literal_word(suite_dir):
 Command('grep (ASCII)', [
 'grep', '-nw', pat, ru,
 ], env=GREP_ASCII),
-# See bench_subtitles_ru_literal for why we use '-a' here.
-Command('ugrep (ASCII)', ['ugrep', '-anw', pat, ru]),
+Command('ugrep (ASCII)', ['ugrep', '-nw', pat, ru]),
 Command('rg', ['rg', '-nw', pat, ru]),
 Command('grep', ['grep', '-nw', pat, ru], env=GREP_UNICODE),
 ])
@@ -616,8 +612,7 @@ def bench_subtitles_ru_alternate(suite_dir):
 Command('rg (lines)', ['rg', '-n', pat, ru]),
 Command('ag (lines)', ['ag', '-s', pat, ru]),
 Command('grep (lines)', ['grep', '-E', '-n', pat, ru], env=GREP_ASCII),
-# See bench_subtitles_ru_literal for why we use '-a' here.
-Command('ugrep (lines)', ['ugrep', '-an', pat, ru]),
+Command('ugrep (lines)', ['ugrep', '-n', pat, ru]),
 Command('rg', ['rg', pat, ru]),
 Command('grep', ['grep', '-E', pat, ru], env=GREP_ASCII),
 ])
@@ -642,8 +637,7 @@ def bench_subtitles_ru_alternate_casei(suite_dir):
 Command('grep (ASCII)', [
 'grep', '-E', '-ni', pat, ru,
 ], env=GREP_ASCII),
-# See bench_subtitles_ru_literal for why we use '-a' here.
-Command('ugrep (ASCII)', ['ugrep', '-ani', pat, ru]),
+Command('ugrep (ASCII)', ['ugrep', '-n', '-i', pat, ru]),
 Command('rg', ['rg', '-n', '-i', pat, ru]),
 Command('grep', ['grep', '-E', '-ni', pat, ru], env=GREP_UNICODE),
 ])
@@ -660,11 +654,10 @@ def bench_subtitles_ru_surrounding_words(suite_dir):
 return Benchmark(pattern=pat, commands=[
 Command('rg', ['rg', '-n', pat, ru]),
 Command('grep', ['grep', '-E', '-n', pat, ru], env=GREP_UNICODE),
-Command('ugrep', ['ugrep', '-an', pat, ru]),
+Command('ugrep', ['ugrep', '-n', pat, ru]),
 Command('ag (ASCII)', ['ag', '-s', pat, ru]),
 Command('grep (ASCII)', ['grep', '-E', '-n', pat, ru], env=GREP_ASCII),
-# See bench_subtitles_ru_literal for why we use '-a' here.
-Command('ugrep (ASCII)', ['ugrep', '-a', '-n', '-U', pat, ru]),
+Command('ugrep (ASCII)', ['ugrep', '-n', '-U', pat, ru]),
 ])
@@ -683,13 +676,11 @@ def bench_subtitles_ru_no_literal(suite_dir):
 return Benchmark(pattern=pat, commands=[
 Command('rg', ['rg', '-n', pat, ru]),
-# See bench_subtitles_ru_literal for why we use '-a' here.
-Command('ugrep', ['ugrep', '-an', pat, ru]),
+Command('ugrep', ['ugrep', '-n', pat, ru]),
 Command('rg (ASCII)', ['rg', '-n', '(?-u)' + pat, ru]),
 Command('ag (ASCII)', ['ag', '-s', pat, ru]),
 Command('grep (ASCII)', ['grep', '-E', '-n', pat, ru], env=GREP_ASCII),
-# See bench_subtitles_ru_literal for why we use '-a' here.
-Command('ugrep (ASCII)', ['ugrep', '-anU', pat, ru])
+Command('ugrep (ASCII)', ['ugrep', '-n', '-U', pat, ru])
 ])
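The `Command` and `Benchmark` objects in the hunks above wrap a labeled argv list plus an optional environment override. The timing loop they imply might look roughly like the following sketch; the field names and method here are assumptions for illustration, not benchsuite's actual code:

```python
import subprocess
import sys
import time

class Command:
    """A labeled argv to benchmark, with an optional environment override."""
    def __init__(self, name, argv, env=None):
        self.name, self.argv, self.env = name, argv, env

    def run(self):
        """Run once; return (duration in seconds, number of output lines)."""
        start = time.monotonic()
        # env=None inherits the parent environment, matching the common case
        # above where no env is given.
        out = subprocess.run(self.argv, env=self.env,
                             capture_output=True).stdout
        return time.monotonic() - start, len(out.splitlines())

cmd = Command('python', [sys.executable, '-c', 'print("hello")'])
duration, lines = cmd.run()
```

Counting output lines, as done here, is what lets a harness sanity-check that competing tools agree on the number of matches before comparing their timings.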
View File
@@ -1,38 +0,0 @@
This directory contains updated benchmarks as of 2022-12-16. They were captured
via the benchsuite script at `benchsuite/benchsuite` from the root of this
repository. The command that was run:
$ ./benchsuite \
--dir /dev/shm/benchsuite \
--raw runs/2022-12-16-archlinux-duff/raw.csv \
| tee runs/2022-12-16-archlinux-duff/summary
The versions of each tool are as follows:
$ rg --version
ripgrep 13.0.0 (rev 87c4a2b4b1)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)
$ grep -V
grep (GNU grep) 3.8
$ ag -V
ag version 2.2.0
Features:
+jit +lzma +zlib
$ git --version
git version 2.39.0
$ ugrep --version
ugrep 3.9.2 x86_64-pc-linux-gnu +avx2 +pcre2jit +zlib +bzip2 +lzma +lz4 +zstd
License BSD-3-Clause: <https://opensource.org/licenses/BSD-3-Clause>
Written by Robert van Engelen and others: <https://github.com/Genivia/ugrep>
The version of ripgrep used was compiled from source on commit 7f23cd63:
$ cargo build --release --features 'pcre2'
This was run on a machine with an Intel i9-12900K with 128GB of memory.
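The raw CSV written via `--raw` has the header `benchmark,warmup_iter,iter,name,command,duration,lines,env`, with one row per timed run. One way to summarize such a file, assuming only that schema, is to keep the fastest observed duration per benchmark/tool pair:

```python
import csv
import io

def best_durations(raw_csv: str):
    """Map (benchmark, tool name) -> fastest observed duration in seconds."""
    best = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        key = (row['benchmark'], row['name'])
        d = float(row['duration'])
        best[key] = min(best.get(key, d), d)
    return best

# Abbreviated sample rows in the same shape as the raw.csv data.
sample = """\
benchmark,warmup_iter,iter,name,command,duration,lines,env
linux_literal_default,1,3,rg,rg PM_RESUME,0.0867,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.0830,39,
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9994,39,LC_ALL=en_US.UTF-8
"""
print(best_durations(sample))
```

Taking the minimum (rather than the mean) is a common choice for this kind of summary, since it filters out one-off scheduling noise across the repeated iterations.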
View File
@@ -1,400 +0,0 @@
benchmark,warmup_iter,iter,name,command,duration,lines,env
linux_literal_default,1,3,rg,rg PM_RESUME,0.08678817749023438,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08307123184204102,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08347964286804199,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2955434322357178,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2954287528991699,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2938194274902344,39,
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.23198556900024414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.22356963157653809,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.2189793586730957,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10710000991821289,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10364222526550293,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.1052248477935791,39,
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9994468688964844,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9939279556274414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9957931041717529,39,LC_ALL=en_US.UTF-8
linux_literal,1,3,rg,rg -n PM_RESUME,0.08603358268737793,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.0837090015411377,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.08435535430908203,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215503692626953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.32426929473876953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215982913970947,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2894856929779053,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2892603874206543,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.29217028617858887,39,
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.206068754196167,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.2218036651611328,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.20590710639953613,39,LC_ALL=C
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18692874908447266,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.19518327713012695,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18577361106872559,39,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08709383010864258,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08861064910888672,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08769798278808594,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.3218965530395508,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.30869364738464355,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.31044936180114746,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.2989068031311035,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.2996039390563965,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.29817700386047363,536,
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.2122786045074463,536,LC_ALL=C
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.20763754844665527,536,LC_ALL=C
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.220794677734375,536,LC_ALL=C
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.17305850982666016,536,
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.1745915412902832,536,
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.17526865005493164,536,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.08527851104736328,2160,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.08487534523010254,2160,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.0848684310913086,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.37945985794067383,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.36303210258483887,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.36359691619873047,2160,
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.9589834213256836,2160,LC_ALL=C
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.9206984043121338,2160,LC_ALL=C
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.8642933368682861,2160,LC_ALL=C
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.40503501892089844,2160,
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.4531714916229248,2160,
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.4397866725921631,2160,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08639907836914062,9,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08583569526672363,9,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08414363861083984,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.2853865623474121,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.2871377468109131,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.28753662109375,9,
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20428204536437988,9,LC_ALL=C
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20490717887878418,9,LC_ALL=C
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20840072631835938,9,LC_ALL=C
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.18790841102600098,9,
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.18659543991088867,9,
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.19104933738708496,9,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.19976496696472168,105,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.20618367195129395,105,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.19702935218811035,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.17758727073669434,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.17793798446655273,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.1872577667236328,105,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.19808244705200195,245,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.1979837417602539,245,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.1984400749206543,245,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.1819148063659668,105,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.17530512809753418,105,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.17999005317687988,105,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08527827262878418,247,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08541679382324219,247,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08553218841552734,247,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08484745025634766,233,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08466482162475586,233,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08487439155578613,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.3061795234680176,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.2993617057800293,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.29722046852111816,233,
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,4.257144451141357,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,3.852163076400757,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,3.8293941020965576,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.647632122039795,233,LC_ALL=C
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.6269629001617432,233,LC_ALL=C
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.5847914218902588,233,LC_ALL=C
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.1802208423614502,247,
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.17564702033996582,247,
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.1746981143951416,247,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.1799161434173584,233,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.18733000755310059,233,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.18859529495239258,233,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.26203155517578125,721,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.2615540027618408,721,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.2730247974395752,721,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.19902300834655762,720,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.20034146308898926,720,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.20192813873291016,720,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8269081115722656,1134,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8393104076385498,1134,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8293666839599609,1134,
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.334395408630371,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.338796854019165,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.36545991897583,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.1588926315307617,720,LC_ALL=C
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.132209062576294,720,LC_ALL=C
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.1407439708709717,720,LC_ALL=C
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.410162925720215,723,
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.405057668685913,723,
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.3945884704589844,723,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.23865604400634766,722,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.23371148109436035,722,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.2343149185180664,722,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08691263198852539,140,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08707070350646973,140,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08713960647583008,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.32947278022766113,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.33203840255737305,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.3292670249938965,140,
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.4576725959777832,140,LC_ALL=C
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.41936421394348145,140,LC_ALL=C
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.3639688491821289,140,LC_ALL=C
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17806458473205566,140,
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.18224716186523438,140,
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17795038223266602,140,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12421393394470215,241,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12235784530639648,241,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12151455879211426,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.529585599899292,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.5305526256561279,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.5311264991760254,241,
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.7589735984802246,241,LC_ALL=C
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.7852108478546143,241,LC_ALL=C
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.8308050632476807,241,LC_ALL=C
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17955923080444336,241,
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.1745290756225586,241,
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.1773686408996582,241,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1213979721069336,830,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1213991641998291,830,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.12620782852172852,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18207263946533203,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17281484603881836,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17368507385253906,830,
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.560560941696167,830,LC_ALL=C
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.563499927520752,830,LC_ALL=C
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.5916609764099121,830,LC_ALL=C
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.19600844383239746,830,
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18436980247497559,830,
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18594050407409668,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.871025562286377,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8636960983276367,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8680994510650635,830,
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9978001117706299,830,LC_ALL=C
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9385361671447754,830,LC_ALL=C
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0036489963531494,830,LC_ALL=C
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18918490409851074,830,
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1769108772277832,830,
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18808293342590332,830,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.21876287460327148,871,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2044692039489746,871,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2184743881225586,871,
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.224027156829834,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.223188877105713,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.223966598510742,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.671149492263794,871,LC_ALL=C
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.6705749034881592,871,LC_ALL=C
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.6700258255004883,871,LC_ALL=C
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2624058723449707,871,
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.25513339042663574,871,
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.26088857650756836,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.9144322872161865,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.866628885269165,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.9098389148712158,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.7860472202301025,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.7858343124389648,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.782252311706543,871,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.18424677848815918,830,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.19610810279846191,830,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.18711471557617188,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8301315307617188,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8689801692962646,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8279321193695068,830,
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0036842823028564,830,LC_ALL=C
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.002833604812622,830,LC_ALL=C
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9236147403717041,830,LC_ALL=C
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17717313766479492,830,
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18994617462158203,830,
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17972850799560547,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18804550170898438,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18867778778076172,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.19913530349731445,830,
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0044364929199219,830,LC_ALL=en_US.UTF-8
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0040032863616943,830,LC_ALL=en_US.UTF-8
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9627983570098877,830,LC_ALL=en_US.UTF-8
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24848055839538574,1094,
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24738383293151855,1094,
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24789118766784668,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.668708562850952,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.57511305809021,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.6714110374450684,1094,
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.0586187839508057,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.0227150917053223,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.075378179550171,1094,LC_ALL=C
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7863781452178955,1094,
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7874250411987305,1094,
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7867889404296875,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.18195557594299316,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.18239641189575195,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.1625690460205078,1094,
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6601614952087402,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6617567539215088,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6584677696228027,1094,LC_ALL=C
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,4.0028722286224365,1136,
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.991217851638794,1136,
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,4.00272274017334,1136,
subtitles_en_alternate_casei,1,3,grep (ASCII),grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.549154758453369,1136,LC_ALL=C
subtitles_en_alternate_casei,1,3,grep (ASCII),grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5468921661376953,1136,LC_ALL=C
subtitles_en_alternate_casei,1,3,grep (ASCII),grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5873491764068604,1136,LC_ALL=C
subtitles_en_alternate_casei,1,3,ugrep (ASCII),ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7872169017791748,1136,
subtitles_en_alternate_casei,1,3,ugrep (ASCII),ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.784674882888794,1136,
subtitles_en_alternate_casei,1,3,ugrep (ASCII),ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7882401943206787,1136,
subtitles_en_alternate_casei,1,3,rg,rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.4785435199737549,1136,
subtitles_en_alternate_casei,1,3,rg,rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.4940922260284424,1136,
subtitles_en_alternate_casei,1,3,rg,rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.4774627685546875,1136,
subtitles_en_alternate_casei,1,3,grep,grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5677175521850586,1136,LC_ALL=en_US.UTF-8
subtitles_en_alternate_casei,1,3,grep,grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.603273391723633,1136,LC_ALL=en_US.UTF-8
subtitles_en_alternate_casei,1,3,grep,grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.5834741592407227,1136,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20238041877746582,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.2031264305114746,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20475172996520996,278,
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0288453102111816,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.044802188873291,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0432109832763672,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,43.00765633583069,278,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,42.832849740982056,278,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,42.915205240249634,278,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.083683967590332,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0841526985168457,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0850934982299805,,
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0116353034973145,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.9868073463439941,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0224814414978027,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8892502784729004,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8910088539123535,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8897674083709717,,
subtitles_en_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.11850643157959,22,
subtitles_en_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.1359670162200928,22,
subtitles_en_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.103114128112793,22,
subtitles_en_no_literal,1,3,ugrep,ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,13.050881385803223,22,
subtitles_en_no_literal,1,3,ugrep,ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,13.050772190093994,22,
subtitles_en_no_literal,1,3,ugrep,ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,13.05719804763794,22,
subtitles_en_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,1.9961926937103271,22,
subtitles_en_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,2.019721508026123,22,
subtitles_en_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,1.9965126514434814,22,
subtitles_en_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,6.849602222442627,302,
subtitles_en_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,6.813834190368652,302,
subtitles_en_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,6.8263633251190186,302,
subtitles_en_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,4.42924165725708,22,LC_ALL=C
subtitles_en_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,4.378557205200195,22,LC_ALL=C
subtitles_en_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,4.376646518707275,22,LC_ALL=C
subtitles_en_no_literal,1,3,ugrep (ASCII),ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,3.5110037326812744,22,
subtitles_en_no_literal,1,3,ugrep (ASCII),ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,3.5137360095977783,22,
subtitles_en_no_literal,1,3,ugrep (ASCII),ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt,3.5051844120025635,22,
subtitles_ru_literal,1,3,rg,rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.13207745552062988,583,
subtitles_ru_literal,1,3,rg,rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.13084721565246582,583,
subtitles_ru_literal,1,3,rg,rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.13469862937927246,583,
subtitles_ru_literal,1,3,rg (no mmap),rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.18022370338439941,583,
subtitles_ru_literal,1,3,rg (no mmap),rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.1801767349243164,583,
subtitles_ru_literal,1,3,rg (no mmap),rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.17995166778564453,583,
subtitles_ru_literal,1,3,grep,grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5151040554046631,583,LC_ALL=C
subtitles_ru_literal,1,3,grep,grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5154542922973633,583,LC_ALL=C
subtitles_ru_literal,1,3,grep,grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.49927639961242676,583,LC_ALL=C
subtitles_ru_literal,1,3,rg (lines),rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.19464492797851562,583,
subtitles_ru_literal,1,3,rg (lines),rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.18920588493347168,583,
subtitles_ru_literal,1,3,rg (lines),rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.19465351104736328,583,
subtitles_ru_literal,1,3,ag (lines),ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,1.9595966339111328,583,
subtitles_ru_literal,1,3,ag (lines),ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,2.0014493465423584,583,
subtitles_ru_literal,1,3,ag (lines),ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,1.9567768573760986,583,
subtitles_ru_literal,1,3,grep (lines),grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8119180202484131,583,LC_ALL=C
subtitles_ru_literal,1,3,grep (lines),grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8111097812652588,583,LC_ALL=C
subtitles_ru_literal,1,3,grep (lines),grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8006868362426758,583,LC_ALL=C
subtitles_ru_literal,1,3,ugrep (lines),ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.70003342628479,583,
subtitles_ru_literal,1,3,ugrep (lines),ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.650275468826294,583,
subtitles_ru_literal,1,3,ugrep (lines),ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.689772367477417,583,
subtitles_ru_literal_casei,1,3,rg,rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.267578125,604,
subtitles_ru_literal_casei,1,3,rg,rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.2665982246398926,604,
subtitles_ru_literal_casei,1,3,rg,rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.26861572265625,604,
subtitles_ru_literal_casei,1,3,grep,grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,4.764627456665039,604,LC_ALL=en_US.UTF-8
subtitles_ru_literal_casei,1,3,grep,grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,4.767015695571899,604,LC_ALL=en_US.UTF-8
subtitles_ru_literal_casei,1,3,grep,grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,4.7688889503479,604,LC_ALL=en_US.UTF-8
subtitles_ru_literal_casei,1,3,grep (ASCII),grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5046737194061279,583,LC_ALL=C
subtitles_ru_literal_casei,1,3,grep (ASCII),grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5139875411987305,583,LC_ALL=C
subtitles_ru_literal_casei,1,3,grep (ASCII),grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.4993159770965576,583,LC_ALL=C
subtitles_ru_literal_casei,1,3,rg (lines),rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.33438658714294434,604,
subtitles_ru_literal_casei,1,3,rg (lines),rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.3398289680480957,604,
subtitles_ru_literal_casei,1,3,rg (lines),rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.3298227787017822,604,
subtitles_ru_literal_casei,1,3,ag (lines) (ASCII),ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.4468214511871338,,
subtitles_ru_literal_casei,1,3,ag (lines) (ASCII),ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.44559574127197266,,
subtitles_ru_literal_casei,1,3,ag (lines) (ASCII),ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.47882938385009766,,
subtitles_ru_literal_casei,1,3,ugrep (lines) (ASCII),ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7039575576782227,583,
subtitles_ru_literal_casei,1,3,ugrep (lines) (ASCII),ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.6490752696990967,583,
subtitles_ru_literal_casei,1,3,ugrep (lines) (ASCII),ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8081104755401611,583,
subtitles_ru_literal_word,1,3,rg (ASCII),rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt,0.20162224769592285,583,
subtitles_ru_literal_word,1,3,rg (ASCII),rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt,0.18215250968933105,583,
subtitles_ru_literal_word,1,3,rg (ASCII),rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt,0.20087671279907227,583,
subtitles_ru_literal_word,1,3,ag (ASCII),ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.48624587059020996,,
subtitles_ru_literal_word,1,3,ag (ASCII),ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.5212516784667969,,
subtitles_ru_literal_word,1,3,ag (ASCII),ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.520557165145874,,
subtitles_ru_literal_word,1,3,grep (ASCII),grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8108196258544922,583,LC_ALL=C
subtitles_ru_literal_word,1,3,grep (ASCII),grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8121066093444824,583,LC_ALL=C
subtitles_ru_literal_word,1,3,grep (ASCII),grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7784581184387207,583,LC_ALL=C
subtitles_ru_literal_word,1,3,ugrep (ASCII),ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7469344139099121,583,
subtitles_ru_literal_word,1,3,ugrep (ASCII),ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.6838233470916748,583,
subtitles_ru_literal_word,1,3,ugrep (ASCII),ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.6921679973602295,583,
subtitles_ru_literal_word,1,3,rg,rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.19918251037597656,579,
subtitles_ru_literal_word,1,3,rg,rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.2046656608581543,579,
subtitles_ru_literal_word,1,3,rg,rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.1984848976135254,579,
subtitles_ru_literal_word,1,3,grep,grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.794173002243042,579,LC_ALL=en_US.UTF-8
subtitles_ru_literal_word,1,3,grep,grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.7715346813201904,579,LC_ALL=en_US.UTF-8
subtitles_ru_literal_word,1,3,grep,grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt,0.8116705417633057,579,LC_ALL=en_US.UTF-8
subtitles_ru_alternate,1,3,rg (lines),rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6730976104736328,691,
subtitles_ru_alternate,1,3,rg (lines),rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.7020411491394043,691,
subtitles_ru_alternate,1,3,rg (lines),rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6693949699401855,691,
subtitles_ru_alternate,1,3,ag (lines),ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7100515365600586,691,
subtitles_ru_alternate,1,3,ag (lines),ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7458419799804688,691,
subtitles_ru_alternate,1,3,ag (lines),ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7115116119384766,691,
subtitles_ru_alternate,1,3,grep (lines),grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.703738451004028,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep (lines),grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.715883731842041,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep (lines),grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.712724924087524,691,LC_ALL=C
subtitles_ru_alternate,1,3,ugrep (lines),ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.276995420455933,691,
subtitles_ru_alternate,1,3,ugrep (lines),ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.304608345031738,691,
subtitles_ru_alternate,1,3,ugrep (lines),ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.322760820388794,691,
subtitles_ru_alternate,1,3,rg,rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6119842529296875,691,
subtitles_ru_alternate,1,3,rg,rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6368775367736816,691,
subtitles_ru_alternate,1,3,rg,rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,0.6258070468902588,691,
subtitles_ru_alternate,1,3,grep,grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.4300291538238525,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep,grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.418199300765991,691,LC_ALL=C
subtitles_ru_alternate,1,3,grep,grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.425868511199951,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,ag (ASCII),ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7216460704803467,691,
subtitles_ru_alternate_casei,1,3,ag (ASCII),ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.7108607292175293,691,
subtitles_ru_alternate_casei,1,3,ag (ASCII),ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,2.747138500213623,691,
subtitles_ru_alternate_casei,1,3,grep (ASCII),grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.711230039596558,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,grep (ASCII),grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.709407329559326,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,grep (ASCII),grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.714034557342529,691,LC_ALL=C
subtitles_ru_alternate_casei,1,3,ugrep (ASCII),ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.305904626846313,691,
subtitles_ru_alternate_casei,1,3,ugrep (ASCII),ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.307406187057495,691,
subtitles_ru_alternate_casei,1,3,ugrep (ASCII),ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,8.288233995437622,691,
subtitles_ru_alternate_casei,1,3,rg,rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,3.673624277114868,735,
subtitles_ru_alternate_casei,1,3,rg,rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,3.6759188175201416,735,
subtitles_ru_alternate_casei,1,3,rg,rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,3.66877818107605,735,
subtitles_ru_alternate_casei,1,3,grep,grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.366282224655151,735,LC_ALL=en_US.UTF-8
subtitles_ru_alternate_casei,1,3,grep,grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.370524883270264,735,LC_ALL=en_US.UTF-8
subtitles_ru_alternate_casei,1,3,grep,grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt,5.342163324356079,735,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20331382751464844,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.2034592628479004,278,
subtitles_ru_surrounding_words,1,3,rg,rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.20407724380493164,278,
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0436389446258545,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0388383865356445,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,grep,grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0446207523345947,278,LC_ALL=en_US.UTF-8
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.29245424270629883,1,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.29168128967285156,1,
subtitles_ru_surrounding_words,1,3,ugrep,ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.29593825340270996,1,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.085604190826416,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.083526372909546,,
subtitles_ru_surrounding_words,1,3,ag (ASCII),ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.1223819255828857,,
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.9905192852020264,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0222513675689697,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,grep (ASCII),grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,1.0216262340545654,,LC_ALL=C
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8875806331634521,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8861405849456787,,
subtitles_ru_surrounding_words,1,3,ugrep (ASCII),ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt,0.8898241519927979,,
subtitles_ru_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.237398147583008,41,
subtitles_ru_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.253706693649292,41,
subtitles_ru_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.2161178588867188,41,
subtitles_ru_no_literal,1,3,ugrep,ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,28.85959553718567,41,
subtitles_ru_no_literal,1,3,ugrep,ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,28.666419982910156,41,
subtitles_ru_no_literal,1,3,ugrep,ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,28.90555214881897,41,
subtitles_ru_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.051813840866089,,
subtitles_ru_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.026675224304199,,
subtitles_ru_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,2.027498245239258,,
subtitles_ru_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0998010635375977,,
subtitles_ru_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0900018215179443,,
subtitles_ru_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0901548862457275,,
subtitles_ru_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0691263675689697,,LC_ALL=C
subtitles_ru_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0875153541564941,,LC_ALL=C
subtitles_ru_no_literal,1,3,grep (ASCII),grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,1.0997354984283447,,LC_ALL=C
subtitles_ru_no_literal,1,3,ugrep (ASCII),ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,0.8329172134399414,,
subtitles_ru_no_literal,1,3,ugrep (ASCII),ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,0.8292679786682129,,
subtitles_ru_no_literal,1,3,ugrep (ASCII),ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt,0.8326950073242188,,
benchmark,warmup_iter,iter,name,command,duration,lines,env
linux_literal_default,1,3,rg,rg PM_RESUME,0.08678817749023438,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08307123184204102,39,
linux_literal_default,1,3,rg,rg PM_RESUME,0.08347964286804199,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2955434322357178,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2954287528991699,39,
linux_literal_default,1,3,ag,ag PM_RESUME,0.2938194274902344,39,
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.23198556900024414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.22356963157653809,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,git grep,git grep PM_RESUME,0.2189793586730957,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10710000991821289,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.10364222526550293,39,
linux_literal_default,1,3,ugrep,ugrep -r PM_RESUME ./,0.1052248477935791,39,
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9994468688964844,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9939279556274414,39,LC_ALL=en_US.UTF-8
linux_literal_default,1,3,grep,grep -r PM_RESUME ./,0.9957931041717529,39,LC_ALL=en_US.UTF-8
linux_literal,1,3,rg,rg -n PM_RESUME,0.08603358268737793,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.0837090015411377,39,
linux_literal,1,3,rg,rg -n PM_RESUME,0.08435535430908203,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215503692626953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.32426929473876953,39,
linux_literal,1,3,rg (mmap),rg -n --mmap PM_RESUME,0.3215982913970947,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2894856929779053,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.2892603874206543,39,
linux_literal,1,3,ag (mmap),ag -s PM_RESUME,0.29217028617858887,39,
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.206068754196167,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.2218036651611328,39,LC_ALL=C
linux_literal,1,3,git grep,git grep -I -n PM_RESUME,0.20590710639953613,39,LC_ALL=C
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18692874908447266,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.19518327713012695,39,
linux_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n PM_RESUME ./,0.18577361106872559,39,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08709383010864258,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08861064910888672,536,
linux_literal_casei,1,3,rg,rg -n -i PM_RESUME,0.08769798278808594,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.3218965530395508,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.30869364738464355,536,
linux_literal_casei,1,3,rg (mmap),rg -n -i --mmap PM_RESUME,0.31044936180114746,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.2989068031311035,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.2996039390563965,536,
linux_literal_casei,1,3,ag (mmap),ag -i PM_RESUME,0.29817700386047363,536,
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.2122786045074463,536,LC_ALL=C
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.20763754844665527,536,LC_ALL=C
linux_literal_casei,1,3,git grep,git grep -I -n -i PM_RESUME,0.220794677734375,536,LC_ALL=C
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.17305850982666016,536,
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.1745915412902832,536,
linux_literal_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i PM_RESUME ./,0.17526865005493164,536,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.08527851104736328,2160,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.08487534523010254,2160,
linux_re_literal_suffix,1,3,rg,rg -n [A-Z]+_RESUME,0.0848684310913086,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.37945985794067383,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.36303210258483887,2160,
linux_re_literal_suffix,1,3,ag,ag -s [A-Z]+_RESUME,0.36359691619873047,2160,
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.9589834213256836,2160,LC_ALL=C
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.9206984043121338,2160,LC_ALL=C
linux_re_literal_suffix,1,3,git grep,git grep -E -I -n [A-Z]+_RESUME,0.8642933368682861,2160,LC_ALL=C
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.40503501892089844,2160,
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.4531714916229248,2160,
linux_re_literal_suffix,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n [A-Z]+_RESUME ./,0.4397866725921631,2160,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08639907836914062,9,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08583569526672363,9,
linux_word,1,3,rg,rg -n -w PM_RESUME,0.08414363861083984,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.2853865623474121,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.2871377468109131,9,
linux_word,1,3,ag,ag -s -w PM_RESUME,0.28753662109375,9,
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20428204536437988,9,LC_ALL=C
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20490717887878418,9,LC_ALL=C
linux_word,1,3,git grep,git grep -E -I -n -w PM_RESUME,0.20840072631835938,9,LC_ALL=C
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.18790841102600098,9,
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.18659543991088867,9,
linux_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -w PM_RESUME ./,0.19104933738708496,9,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.19976496696472168,105,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.20618367195129395,105,
linux_unicode_greek,1,3,rg,rg -n \p{Greek},0.19702935218811035,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.17758727073669434,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.17793798446655273,105,
linux_unicode_greek,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \p{Greek} ./,0.1872577667236328,105,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.19808244705200195,245,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.1979837417602539,245,
linux_unicode_greek_casei,1,3,rg,rg -n -i \p{Greek},0.1984400749206543,245,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.1819148063659668,105,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.17530512809753418,105,
linux_unicode_greek_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i \p{Greek} ./,0.17999005317687988,105,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08527827262878418,247,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08541679382324219,247,
linux_unicode_word,1,3,rg,rg -n \wAh,0.08553218841552734,247,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08484745025634766,233,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08466482162475586,233,
linux_unicode_word,1,3,rg (ASCII),rg -n (?-u)\wAh,0.08487439155578613,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.3061795234680176,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.2993617057800293,233,
linux_unicode_word,1,3,ag (ASCII),ag -s \wAh,0.29722046852111816,233,
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,4.257144451141357,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,3.852163076400757,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep,git grep -E -I -n \wAh,3.8293941020965576,247,LC_ALL=en_US.UTF-8
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.647632122039795,233,LC_ALL=C
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.6269629001617432,233,LC_ALL=C
linux_unicode_word,1,3,git grep (ASCII),git grep -E -I -n \wAh,1.5847914218902588,233,LC_ALL=C
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.1802208423614502,247,
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.17564702033996582,247,
linux_unicode_word,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \wAh ./,0.1746981143951416,247,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.1799161434173584,233,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.18733000755310059,233,
linux_unicode_word,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \wAh ./,0.18859529495239258,233,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.26203155517578125,721,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.2615540027618408,721,
linux_no_literal,1,3,rg,rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.2730247974395752,721,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.19902300834655762,720,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.20034146308898926,720,
linux_no_literal,1,3,rg (ASCII),rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.20192813873291016,720,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8269081115722656,1134,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8393104076385498,1134,
linux_no_literal,1,3,ag (ASCII),ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},0.8293666839599609,1134,
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.334395408630371,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.338796854019165,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep,git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},7.36545991897583,721,LC_ALL=en_US.UTF-8
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.1588926315307617,720,LC_ALL=C
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.132209062576294,720,LC_ALL=C
linux_no_literal,1,3,git grep (ASCII),git grep -E -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5},2.1407439708709717,720,LC_ALL=C
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.410162925720215,723,
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.405057668685913,723,
linux_no_literal,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,3.3945884704589844,723,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.23865604400634766,722,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.23371148109436035,722,
linux_no_literal,1,3,ugrep (ASCII),ugrep -r --ignore-files --no-hidden -I -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} ./,0.2343149185180664,722,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08691263198852539,140,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08707070350646973,140,
linux_alternates,1,3,rg,rg -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.08713960647583008,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.32947278022766113,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.33203840255737305,140,
linux_alternates,1,3,ag,ag -s ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.3292670249938965,140,
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.4576725959777832,140,LC_ALL=C
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.41936421394348145,140,LC_ALL=C
linux_alternates,1,3,git grep,git grep -E -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.3639688491821289,140,LC_ALL=C
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17806458473205566,140,
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.18224716186523438,140,
linux_alternates,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17795038223266602,140,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12421393394470215,241,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12235784530639648,241,
linux_alternates_casei,1,3,rg,rg -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.12151455879211426,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.529585599899292,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.5305526256561279,241,
linux_alternates_casei,1,3,ag,ag -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.5311264991760254,241,
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.7589735984802246,241,LC_ALL=C
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.7852108478546143,241,LC_ALL=C
linux_alternates_casei,1,3,git grep,git grep -E -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT,0.8308050632476807,241,LC_ALL=C
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.17955923080444336,241,
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.1745290756225586,241,
linux_alternates_casei,1,3,ugrep,ugrep -r --ignore-files --no-hidden -I -n -i ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT ./,0.1773686408996582,241,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1213979721069336,830,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1213991641998291,830,
subtitles_en_literal,1,3,rg,rg Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.12620782852172852,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18207263946533203,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17281484603881836,830,
subtitles_en_literal,1,3,rg (no mmap),rg --no-mmap Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17368507385253906,830,
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.560560941696167,830,LC_ALL=C
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.563499927520752,830,LC_ALL=C
subtitles_en_literal,1,3,grep,grep Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.5916609764099121,830,LC_ALL=C
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.19600844383239746,830,
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18436980247497559,830,
subtitles_en_literal,1,3,rg (lines),rg -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18594050407409668,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.871025562286377,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8636960983276367,830,
subtitles_en_literal,1,3,ag (lines),ag -s Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8680994510650635,830,
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9978001117706299,830,LC_ALL=C
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9385361671447754,830,LC_ALL=C
subtitles_en_literal,1,3,grep (lines),grep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0036489963531494,830,LC_ALL=C
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18918490409851074,830,
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.1769108772277832,830,
subtitles_en_literal,1,3,ugrep (lines),ugrep -n Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18808293342590332,830,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.21876287460327148,871,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2044692039489746,871,
subtitles_en_literal_casei,1,3,rg,rg -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2184743881225586,871,
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.224027156829834,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.223188877105713,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep,grep -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,2.223966598510742,871,LC_ALL=en_US.UTF-8
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.671149492263794,871,LC_ALL=C
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.6705749034881592,871,LC_ALL=C
subtitles_en_literal_casei,1,3,grep (ASCII),grep -E -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.6700258255004883,871,LC_ALL=C
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.2624058723449707,871,
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.25513339042663574,871,
subtitles_en_literal_casei,1,3,rg (lines),rg -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.26088857650756836,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.9144322872161865,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.866628885269165,871,
subtitles_en_literal_casei,1,3,ag (lines) (ASCII),ag -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.9098389148712158,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.7860472202301025,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.7858343124389648,871,
subtitles_en_literal_casei,1,3,ugrep (lines),ugrep -n -i Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.782252311706543,871,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.18424677848815918,830,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.19610810279846191,830,
subtitles_en_literal_word,1,3,rg (ASCII),rg -n (?-u:\b)Sherlock Holmes(?-u:\b) /dev/shm/benchsuite/subtitles/en.sample.txt,0.18711471557617188,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8301315307617188,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8689801692962646,830,
subtitles_en_literal_word,1,3,ag (ASCII),ag -sw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.8279321193695068,830,
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0036842823028564,830,LC_ALL=C
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.002833604812622,830,LC_ALL=C
subtitles_en_literal_word,1,3,grep (ASCII),grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9236147403717041,830,LC_ALL=C
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17717313766479492,830,
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18994617462158203,830,
subtitles_en_literal_word,1,3,ugrep (ASCII),ugrep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.17972850799560547,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18804550170898438,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.18867778778076172,830,
subtitles_en_literal_word,1,3,rg,rg -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.19913530349731445,830,
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0044364929199219,830,LC_ALL=en_US.UTF-8
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,1.0040032863616943,830,LC_ALL=en_US.UTF-8
subtitles_en_literal_word,1,3,grep,grep -nw Sherlock Holmes /dev/shm/benchsuite/subtitles/en.sample.txt,0.9627983570098877,830,LC_ALL=en_US.UTF-8
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24848055839538574,1094,
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24738383293151855,1094,
subtitles_en_alternate,1,3,rg (lines),rg -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.24789118766784668,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.668708562850952,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.57511305809021,1094,
subtitles_en_alternate,1,3,ag (lines),ag -s Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.6714110374450684,1094,
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.0586187839508057,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.0227150917053223,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep (lines),grep -E -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,2.075378179550171,1094,LC_ALL=C
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7863781452178955,1094,
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7874250411987305,1094,
subtitles_en_alternate,1,3,ugrep (lines),ugrep -n Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.7867889404296875,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.18195557594299316,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.18239641189575195,1094,
subtitles_en_alternate,1,3,rg,rg Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,0.1625690460205078,1094,
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6601614952087402,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6617567539215088,1094,LC_ALL=C
subtitles_en_alternate,1,3,grep,grep -E Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,1.6584677696228027,1094,LC_ALL=C
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,4.0028722286224365,1136,
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.991217851638794,1136,
subtitles_en_alternate_casei,1,3,ag (ASCII),ag -s -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,4.00272274017334,1136,
subtitles_en_alternate_casei,1,3,grep (ASCII),grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt,3.549154758453369,1136,LC_ALL=C
228 subtitles_en_alternate_casei 1 3 grep (ASCII) grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5468921661376953 1136 LC_ALL=C
229 subtitles_en_alternate_casei 1 3 grep (ASCII) grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5873491764068604 1136 LC_ALL=C
230 subtitles_en_alternate_casei 1 3 ugrep (ASCII) ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7872169017791748 1136
231 subtitles_en_alternate_casei 1 3 ugrep (ASCII) ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.784674882888794 1136
232 subtitles_en_alternate_casei 1 3 ugrep (ASCII) ugrep -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.7882401943206787 1136
233 subtitles_en_alternate_casei 1 3 rg rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.4785435199737549 1136
234 subtitles_en_alternate_casei 1 3 rg rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.4940922260284424 1136
235 subtitles_en_alternate_casei 1 3 rg rg -n -i Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 0.4774627685546875 1136
236 subtitles_en_alternate_casei 1 3 grep grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5677175521850586 1136 LC_ALL=en_US.UTF-8
237 subtitles_en_alternate_casei 1 3 grep grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.603273391723633 1136 LC_ALL=en_US.UTF-8
238 subtitles_en_alternate_casei 1 3 grep grep -E -ni Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty /dev/shm/benchsuite/subtitles/en.sample.txt 3.5834741592407227 1136 LC_ALL=en_US.UTF-8
239 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20238041877746582 278
240 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.2031264305114746 278
241 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20475172996520996 278
242 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0288453102111816 278 LC_ALL=en_US.UTF-8
243 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.044802188873291 278 LC_ALL=en_US.UTF-8
244 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0432109832763672 278 LC_ALL=en_US.UTF-8
245 subtitles_ru_surrounding_words 1 3 ugrep ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 43.00765633583069 278
246 subtitles_ru_surrounding_words 1 3 ugrep ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 42.832849740982056 278
247 subtitles_ru_surrounding_words 1 3 ugrep ugrep -an \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 42.915205240249634 278
248 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.083683967590332
249 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0841526985168457
250 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0850934982299805
251 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0116353034973145 LC_ALL=C
252 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.9868073463439941 LC_ALL=C
253 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0224814414978027 LC_ALL=C
254 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8892502784729004
255 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8910088539123535
256 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8897674083709717
257 subtitles_en_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.11850643157959 22
258 subtitles_en_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.1359670162200928 22
259 subtitles_en_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.103114128112793 22
260 subtitles_en_no_literal 1 3 ugrep ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 13.050881385803223 22
261 subtitles_en_no_literal 1 3 ugrep ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 13.050772190093994 22
262 subtitles_en_no_literal 1 3 ugrep ugrep -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 13.05719804763794 22
263 subtitles_en_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 1.9961926937103271 22
264 subtitles_en_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 2.019721508026123 22
265 subtitles_en_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 1.9965126514434814 22
266 subtitles_en_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 6.849602222442627 302
267 subtitles_en_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 6.813834190368652 302
268 subtitles_en_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 6.8263633251190186 302
269 subtitles_en_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 4.42924165725708 22 LC_ALL=C
270 subtitles_en_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 4.378557205200195 22 LC_ALL=C
271 subtitles_en_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 4.376646518707275 22 LC_ALL=C
272 subtitles_en_no_literal 1 3 ugrep (ASCII) ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 3.5110037326812744 22
273 subtitles_en_no_literal 1 3 ugrep (ASCII) ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 3.5137360095977783 22
274 subtitles_en_no_literal 1 3 ugrep (ASCII) ugrep -n -U \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/en.sample.txt 3.5051844120025635 22
275 subtitles_ru_literal 1 3 rg rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.13207745552062988 583
276 subtitles_ru_literal 1 3 rg rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.13084721565246582 583
277 subtitles_ru_literal 1 3 rg rg Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.13469862937927246 583
278 subtitles_ru_literal 1 3 rg (no mmap) rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.18022370338439941 583
279 subtitles_ru_literal 1 3 rg (no mmap) rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.1801767349243164 583
280 subtitles_ru_literal 1 3 rg (no mmap) rg --no-mmap Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.17995166778564453 583
281 subtitles_ru_literal 1 3 grep grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5151040554046631 583 LC_ALL=C
282 subtitles_ru_literal 1 3 grep grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5154542922973633 583 LC_ALL=C
283 subtitles_ru_literal 1 3 grep grep Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.49927639961242676 583 LC_ALL=C
284 subtitles_ru_literal 1 3 rg (lines) rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.19464492797851562 583
285 subtitles_ru_literal 1 3 rg (lines) rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.18920588493347168 583
286 subtitles_ru_literal 1 3 rg (lines) rg -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.19465351104736328 583
287 subtitles_ru_literal 1 3 ag (lines) ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 1.9595966339111328 583
288 subtitles_ru_literal 1 3 ag (lines) ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 2.0014493465423584 583
289 subtitles_ru_literal 1 3 ag (lines) ag -s Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 1.9567768573760986 583
290 subtitles_ru_literal 1 3 grep (lines) grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8119180202484131 583 LC_ALL=C
291 subtitles_ru_literal 1 3 grep (lines) grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8111097812652588 583 LC_ALL=C
292 subtitles_ru_literal 1 3 grep (lines) grep -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8006868362426758 583 LC_ALL=C
293 subtitles_ru_literal 1 3 ugrep (lines) ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.70003342628479 583
294 subtitles_ru_literal 1 3 ugrep (lines) ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.650275468826294 583
295 subtitles_ru_literal 1 3 ugrep (lines) ugrep -a -n Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.689772367477417 583
296 subtitles_ru_literal_casei 1 3 rg rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.267578125 604
297 subtitles_ru_literal_casei 1 3 rg rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.2665982246398926 604
298 subtitles_ru_literal_casei 1 3 rg rg -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.26861572265625 604
299 subtitles_ru_literal_casei 1 3 grep grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 4.764627456665039 604 LC_ALL=en_US.UTF-8
300 subtitles_ru_literal_casei 1 3 grep grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 4.767015695571899 604 LC_ALL=en_US.UTF-8
301 subtitles_ru_literal_casei 1 3 grep grep -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 4.7688889503479 604 LC_ALL=en_US.UTF-8
302 subtitles_ru_literal_casei 1 3 grep (ASCII) grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5046737194061279 583 LC_ALL=C
303 subtitles_ru_literal_casei 1 3 grep (ASCII) grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5139875411987305 583 LC_ALL=C
304 subtitles_ru_literal_casei 1 3 grep (ASCII) grep -E -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.4993159770965576 583 LC_ALL=C
305 subtitles_ru_literal_casei 1 3 rg (lines) rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.33438658714294434 604
306 subtitles_ru_literal_casei 1 3 rg (lines) rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.3398289680480957 604
307 subtitles_ru_literal_casei 1 3 rg (lines) rg -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.3298227787017822 604
308 subtitles_ru_literal_casei 1 3 ag (lines) (ASCII) ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.4468214511871338
309 subtitles_ru_literal_casei 1 3 ag (lines) (ASCII) ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.44559574127197266
310 subtitles_ru_literal_casei 1 3 ag (lines) (ASCII) ag -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.47882938385009766
311 subtitles_ru_literal_casei 1 3 ugrep (lines) (ASCII) ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7039575576782227 583
312 subtitles_ru_literal_casei 1 3 ugrep (lines) (ASCII) ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.6490752696990967 583
313 subtitles_ru_literal_casei 1 3 ugrep (lines) (ASCII) ugrep -a -n -i Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8081104755401611 583
314 subtitles_ru_literal_word 1 3 rg (ASCII) rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt 0.20162224769592285 583
315 subtitles_ru_literal_word 1 3 rg (ASCII) rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt 0.18215250968933105 583
316 subtitles_ru_literal_word 1 3 rg (ASCII) rg -n (?-u:^|\W)Шерлок Холмс(?-u:$|\W) /dev/shm/benchsuite/subtitles/ru.txt 0.20087671279907227 583
317 subtitles_ru_literal_word 1 3 ag (ASCII) ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.48624587059020996
318 subtitles_ru_literal_word 1 3 ag (ASCII) ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.5212516784667969
319 subtitles_ru_literal_word 1 3 ag (ASCII) ag -sw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.520557165145874
320 subtitles_ru_literal_word 1 3 grep (ASCII) grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8108196258544922 583 LC_ALL=C
321 subtitles_ru_literal_word 1 3 grep (ASCII) grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8121066093444824 583 LC_ALL=C
322 subtitles_ru_literal_word 1 3 grep (ASCII) grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7784581184387207 583 LC_ALL=C
323 subtitles_ru_literal_word 1 3 ugrep (ASCII) ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7469344139099121 583
324 subtitles_ru_literal_word 1 3 ugrep (ASCII) ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.6838233470916748 583
325 subtitles_ru_literal_word 1 3 ugrep (ASCII) ugrep -anw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.6921679973602295 583
326 subtitles_ru_literal_word 1 3 rg rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.19918251037597656 579
327 subtitles_ru_literal_word 1 3 rg rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.2046656608581543 579
328 subtitles_ru_literal_word 1 3 rg rg -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.1984848976135254 579
329 subtitles_ru_literal_word 1 3 grep grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.794173002243042 579 LC_ALL=en_US.UTF-8
330 subtitles_ru_literal_word 1 3 grep grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.7715346813201904 579 LC_ALL=en_US.UTF-8
331 subtitles_ru_literal_word 1 3 grep grep -nw Шерлок Холмс /dev/shm/benchsuite/subtitles/ru.txt 0.8116705417633057 579 LC_ALL=en_US.UTF-8
332 subtitles_ru_alternate 1 3 rg (lines) rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6730976104736328 691
333 subtitles_ru_alternate 1 3 rg (lines) rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.7020411491394043 691
334 subtitles_ru_alternate 1 3 rg (lines) rg -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6693949699401855 691
335 subtitles_ru_alternate 1 3 ag (lines) ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7100515365600586 691
336 subtitles_ru_alternate 1 3 ag (lines) ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7458419799804688 691
337 subtitles_ru_alternate 1 3 ag (lines) ag -s Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7115116119384766 691
338 subtitles_ru_alternate 1 3 grep (lines) grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.703738451004028 691 LC_ALL=C
339 subtitles_ru_alternate 1 3 grep (lines) grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.715883731842041 691 LC_ALL=C
340 subtitles_ru_alternate 1 3 grep (lines) grep -E -n Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.712724924087524 691 LC_ALL=C
341 subtitles_ru_alternate 1 3 ugrep (lines) ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.276995420455933 691
342 subtitles_ru_alternate 1 3 ugrep (lines) ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.304608345031738 691
343 subtitles_ru_alternate 1 3 ugrep (lines) ugrep -an Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.322760820388794 691
344 subtitles_ru_alternate 1 3 rg rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6119842529296875 691
345 subtitles_ru_alternate 1 3 rg rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6368775367736816 691
346 subtitles_ru_alternate 1 3 rg rg Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 0.6258070468902588 691
347 subtitles_ru_alternate 1 3 grep grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.4300291538238525 691 LC_ALL=C
348 subtitles_ru_alternate 1 3 grep grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.418199300765991 691 LC_ALL=C
349 subtitles_ru_alternate 1 3 grep grep -E Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.425868511199951 691 LC_ALL=C
350 subtitles_ru_alternate_casei 1 3 ag (ASCII) ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7216460704803467 691
351 subtitles_ru_alternate_casei 1 3 ag (ASCII) ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.7108607292175293 691
352 subtitles_ru_alternate_casei 1 3 ag (ASCII) ag -s -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 2.747138500213623 691
353 subtitles_ru_alternate_casei 1 3 grep (ASCII) grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.711230039596558 691 LC_ALL=C
354 subtitles_ru_alternate_casei 1 3 grep (ASCII) grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.709407329559326 691 LC_ALL=C
355 subtitles_ru_alternate_casei 1 3 grep (ASCII) grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.714034557342529 691 LC_ALL=C
356 subtitles_ru_alternate_casei 1 3 ugrep (ASCII) ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.305904626846313 691
357 subtitles_ru_alternate_casei 1 3 ugrep (ASCII) ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.307406187057495 691
358 subtitles_ru_alternate_casei 1 3 ugrep (ASCII) ugrep -ani Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 8.288233995437622 691
359 subtitles_ru_alternate_casei 1 3 rg rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 3.673624277114868 735
360 subtitles_ru_alternate_casei 1 3 rg rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 3.6759188175201416 735
361 subtitles_ru_alternate_casei 1 3 rg rg -n -i Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 3.66877818107605 735
362 subtitles_ru_alternate_casei 1 3 grep grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.366282224655151 735 LC_ALL=en_US.UTF-8
363 subtitles_ru_alternate_casei 1 3 grep grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.370524883270264 735 LC_ALL=en_US.UTF-8
364 subtitles_ru_alternate_casei 1 3 grep grep -E -ni Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти /dev/shm/benchsuite/subtitles/ru.txt 5.342163324356079 735 LC_ALL=en_US.UTF-8
365 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20331382751464844 278
366 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.2034592628479004 278
367 subtitles_ru_surrounding_words 1 3 rg rg -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.20407724380493164 278
368 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0436389446258545 278 LC_ALL=en_US.UTF-8
369 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0388383865356445 278 LC_ALL=en_US.UTF-8
370 subtitles_ru_surrounding_words 1 3 grep grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0446207523345947 278 LC_ALL=en_US.UTF-8
371 subtitles_ru_surrounding_words 1 3 ugrep ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.29245424270629883 1
372 subtitles_ru_surrounding_words 1 3 ugrep ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.29168128967285156 1
373 subtitles_ru_surrounding_words 1 3 ugrep ugrep -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.29593825340270996 1
374 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.085604190826416
375 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.083526372909546
376 subtitles_ru_surrounding_words 1 3 ag (ASCII) ag -s \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.1223819255828857
377 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.9905192852020264 LC_ALL=C
378 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0222513675689697 LC_ALL=C
379 subtitles_ru_surrounding_words 1 3 grep (ASCII) grep -E -n \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 1.0216262340545654 LC_ALL=C
380 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8875806331634521
381 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8861405849456787
382 subtitles_ru_surrounding_words 1 3 ugrep (ASCII) ugrep -a -n -U \w+\s+Холмс\s+\w+ /dev/shm/benchsuite/subtitles/ru.txt 0.8898241519927979
383 subtitles_ru_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.237398147583008 41
384 subtitles_ru_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.253706693649292 41
385 subtitles_ru_no_literal 1 3 rg rg -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.2161178588867188 41
386 subtitles_ru_no_literal 1 3 ugrep ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 28.85959553718567 41
387 subtitles_ru_no_literal 1 3 ugrep ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 28.666419982910156 41
388 subtitles_ru_no_literal 1 3 ugrep ugrep -an \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 28.90555214881897 41
389 subtitles_ru_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.051813840866089
390 subtitles_ru_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.026675224304199
391 subtitles_ru_no_literal 1 3 rg (ASCII) rg -n (?-u)\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 2.027498245239258
392 subtitles_ru_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0998010635375977
393 subtitles_ru_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0900018215179443
394 subtitles_ru_no_literal 1 3 ag (ASCII) ag -s \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0901548862457275
395 subtitles_ru_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0691263675689697 LC_ALL=C
396 subtitles_ru_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0875153541564941 LC_ALL=C
397 subtitles_ru_no_literal 1 3 grep (ASCII) grep -E -n \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 1.0997354984283447 LC_ALL=C
398 subtitles_ru_no_literal 1 3 ugrep (ASCII) ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 0.8329172134399414
399 subtitles_ru_no_literal 1 3 ugrep (ASCII) ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 0.8292679786682129
400 subtitles_ru_no_literal 1 3 ugrep (ASCII) ugrep -anU \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5} /dev/shm/benchsuite/subtitles/ru.txt 0.8326950073242188


@@ -1,208 +0,0 @@
linux_literal_default (pattern: PM_RESUME)
------------------------------------------
rg* 0.084 +/- 0.002 (lines: 39)*
ag 0.295 +/- 0.001 (lines: 39)
git grep 0.225 +/- 0.007 (lines: 39)
ugrep 0.105 +/- 0.002 (lines: 39)
grep 0.996 +/- 0.003 (lines: 39)
linux_literal (pattern: PM_RESUME)
----------------------------------
rg* 0.085 +/- 0.001 (lines: 39)*
rg (mmap) 0.322 +/- 0.002 (lines: 39)
ag (mmap) 0.290 +/- 0.002 (lines: 39)
git grep 0.211 +/- 0.009 (lines: 39)
ugrep 0.189 +/- 0.005 (lines: 39)
linux_literal_casei (pattern: PM_RESUME)
----------------------------------------
rg* 0.088 +/- 0.001 (lines: 536)*
rg (mmap) 0.314 +/- 0.007 (lines: 536)
ag (mmap) 0.299 +/- 0.001 (lines: 536)
git grep 0.214 +/- 0.007 (lines: 536)
ugrep 0.174 +/- 0.001 (lines: 536)
linux_re_literal_suffix (pattern: [A-Z]+_RESUME)
------------------------------------------------
rg* 0.085 +/- 0.000 (lines: 2160)*
ag 0.369 +/- 0.009 (lines: 2160)
git grep 0.915 +/- 0.048 (lines: 2160)
ugrep 0.433 +/- 0.025 (lines: 2160)
linux_word (pattern: PM_RESUME)
-------------------------------
rg* 0.085 +/- 0.001 (lines: 9)*
ag 0.287 +/- 0.001 (lines: 9)
git grep 0.206 +/- 0.002 (lines: 9)
ugrep 0.189 +/- 0.002 (lines: 9)
linux_unicode_greek (pattern: \p{Greek})
----------------------------------------
rg 0.201 +/- 0.005 (lines: 105)
ugrep* 0.181 +/- 0.005 (lines: 105)*
linux_unicode_greek_casei (pattern: \p{Greek})
----------------------------------------------
rg 0.198 +/- 0.000 (lines: 245)
ugrep* 0.179 +/- 0.003 (lines: 105)*
linux_unicode_word (pattern: \wAh)
----------------------------------
rg 0.085 +/- 0.000 (lines: 247)
rg (ASCII)* 0.085 +/- 0.000 (lines: 233)*
ag (ASCII) 0.301 +/- 0.005 (lines: 233)
git grep 3.980 +/- 0.241 (lines: 247)
git grep (ASCII) 1.620 +/- 0.032 (lines: 233)
ugrep 0.177 +/- 0.003 (lines: 247)
ugrep (ASCII) 0.185 +/- 0.005 (lines: 233)
linux_no_literal (pattern: \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5})
-----------------------------------------------------------------
rg 0.266 +/- 0.006 (lines: 721)
rg (ASCII)* 0.200 +/- 0.001 (lines: 720)*
ag (ASCII) 0.832 +/- 0.007 (lines: 1134)
git grep 7.346 +/- 0.017 (lines: 721)
git grep (ASCII) 2.144 +/- 0.014 (lines: 720)
ugrep 3.403 +/- 0.008 (lines: 723)
ugrep (ASCII) 0.236 +/- 0.003 (lines: 722)
linux_alternates (pattern: ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT)
-------------------------------------------------------------------------
rg* 0.087 +/- 0.000 (lines: 140)*
ag 0.330 +/- 0.002 (lines: 140)
git grep 0.414 +/- 0.047 (lines: 140)
ugrep 0.179 +/- 0.002 (lines: 140)
linux_alternates_casei (pattern: ERR_SYS|PME_TURN_OFF|LINK_REQ_RST|CFG_BME_EVT)
-------------------------------------------------------------------------------
rg* 0.123 +/- 0.001 (lines: 241)*
ag 0.530 +/- 0.001 (lines: 241)
git grep 0.792 +/- 0.036 (lines: 241)
ugrep 0.177 +/- 0.003 (lines: 241)
subtitles_en_literal (pattern: Sherlock Holmes)
-----------------------------------------------
rg* 0.123 +/- 0.003 (lines: 830)*
rg (no mmap) 0.176 +/- 0.005 (lines: 830)
grep 0.572 +/- 0.017 (lines: 830)
rg (lines) 0.189 +/- 0.006 (lines: 830)
ag (lines) 1.868 +/- 0.004 (lines: 830)
grep (lines) 0.980 +/- 0.036 (lines: 830)
ugrep (lines) 0.185 +/- 0.007 (lines: 830)
subtitles_en_literal_casei (pattern: Sherlock Holmes)
-----------------------------------------------------
rg* 0.214 +/- 0.008 (lines: 871)*
grep 2.224 +/- 0.000 (lines: 871)
grep (ASCII) 0.671 +/- 0.001 (lines: 871)
rg (lines) 0.259 +/- 0.004 (lines: 871)
ag (lines) (ASCII) 1.897 +/- 0.026 (lines: 871)
ugrep (lines) 0.785 +/- 0.002 (lines: 871)
subtitles_en_literal_word (pattern: Sherlock Holmes)
----------------------------------------------------
rg (ASCII) 0.189 +/- 0.006 (lines: 830)
ag (ASCII) 1.842 +/- 0.023 (lines: 830)
grep (ASCII) 0.977 +/- 0.046 (lines: 830)
ugrep (ASCII)* 0.182 +/- 0.007 (lines: 830)*
rg 0.192 +/- 0.006 (lines: 830)
grep 0.990 +/- 0.024 (lines: 830)
subtitles_en_alternate (pattern: Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)
---------------------------------------------------------------------------------------------------------------
rg (lines) 0.248 +/- 0.001 (lines: 1094)
ag (lines) 2.638 +/- 0.055 (lines: 1094)
grep (lines) 2.052 +/- 0.027 (lines: 1094)
ugrep (lines) 0.787 +/- 0.001 (lines: 1094)
rg* 0.176 +/- 0.011 (lines: 1094)*
grep 1.660 +/- 0.002 (lines: 1094)
subtitles_en_alternate_casei (pattern: Sherlock Holmes|John Watson|Irene Adler|Inspector Lestrade|Professor Moriarty)
---------------------------------------------------------------------------------------------------------------------
ag (ASCII) 3.999 +/- 0.007 (lines: 1136)
grep (ASCII) 3.561 +/- 0.023 (lines: 1136)
ugrep (ASCII) 0.787 +/- 0.002 (lines: 1136)
rg* 0.483 +/- 0.009 (lines: 1136)*
grep 3.585 +/- 0.018 (lines: 1136)
subtitles_en_surrounding_words (pattern: \w+\s+Holmes\s+\w+)
------------------------------------------------------------
rg 0.200 +/- 0.001 (lines: 483)
grep 1.303 +/- 0.040 (lines: 483)
ugrep 43.220 +/- 0.047 (lines: 483)
rg (ASCII)* 0.197 +/- 0.000 (lines: 483)*
ag (ASCII) 5.223 +/- 0.056 (lines: 489)
grep (ASCII) 1.316 +/- 0.043 (lines: 483)
ugrep (ASCII) 17.647 +/- 0.219 (lines: 483)
subtitles_en_no_literal (pattern: \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5})
----------------------------------------------------------------------------------------
rg 2.119 +/- 0.016 (lines: 22)
ugrep 13.053 +/- 0.004 (lines: 22)
rg (ASCII)* 2.004 +/- 0.013 (lines: 22)*
ag (ASCII) 6.830 +/- 0.018 (lines: 302)
grep (ASCII) 4.395 +/- 0.030 (lines: 22)
ugrep (ASCII) 3.510 +/- 0.004 (lines: 22)
subtitles_ru_literal (pattern: Шерлок Холмс)
--------------------------------------------
rg* 0.133 +/- 0.002 (lines: 583)*
rg (no mmap) 0.180 +/- 0.000 (lines: 583)
grep 0.510 +/- 0.009 (lines: 583)
rg (lines) 0.193 +/- 0.003 (lines: 583)
ag (lines) 1.973 +/- 0.025 (lines: 583)
grep (lines) 0.808 +/- 0.006 (lines: 583)
ugrep (lines) 0.680 +/- 0.026 (lines: 583)
subtitles_ru_literal_casei (pattern: Шерлок Холмс)
--------------------------------------------------
rg* 0.268 +/- 0.001 (lines: 604)*
grep 4.767 +/- 0.002 (lines: 604)
grep (ASCII) 0.506 +/- 0.007 (lines: 583)
rg (lines) 0.335 +/- 0.005 (lines: 604)
ag (lines) (ASCII) 0.457 +/- 0.019 (lines: 0)
ugrep (lines) (ASCII) 0.720 +/- 0.081 (lines: 583)
subtitles_ru_literal_word (pattern: Шерлок Холмс)
-------------------------------------------------
rg (ASCII)* 0.195 +/- 0.011 (lines: 583)*
ag (ASCII) 0.509 +/- 0.020 (lines: 0)
grep (ASCII) 0.800 +/- 0.019 (lines: 583)
ugrep (ASCII) 0.708 +/- 0.034 (lines: 583)
rg 0.201 +/- 0.003 (lines: 579)
grep 0.792 +/- 0.020 (lines: 579)
subtitles_ru_alternate (pattern: Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти)
-----------------------------------------------------------------------------------------------------------
rg (lines) 0.682 +/- 0.018 (lines: 691)
ag (lines) 2.722 +/- 0.020 (lines: 691)
grep (lines) 5.711 +/- 0.006 (lines: 691)
ugrep (lines) 8.301 +/- 0.023 (lines: 691)
rg* 0.625 +/- 0.012 (lines: 691)*
grep 5.425 +/- 0.006 (lines: 691)
subtitles_ru_alternate_casei (pattern: Шерлок Холмс|Джон Уотсон|Ирен Адлер|инспектор Лестрейд|профессор Мориарти)
-----------------------------------------------------------------------------------------------------------------
ag (ASCII)* 2.727 +/- 0.019 (lines: 691)*
grep (ASCII) 5.712 +/- 0.002 (lines: 691)
ugrep (ASCII) 8.301 +/- 0.011 (lines: 691)
rg 3.673 +/- 0.004 (lines: 735)
grep 5.360 +/- 0.015 (lines: 735)
subtitles_ru_surrounding_words (pattern: \w+\s+Холмс\s+\w+)
-----------------------------------------------------------
rg* 0.203 +/- 0.001 (lines: 278)*
grep 1.039 +/- 0.009 (lines: 278)
ugrep 42.919 +/- 0.087 (lines: 278)
ag (ASCII) 1.084 +/- 0.001 (lines: 0)
grep (ASCII) 1.007 +/- 0.018 (lines: 0)
ugrep (ASCII) 0.890 +/- 0.001 (lines: 0)
subtitles_ru_no_literal (pattern: \w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5}\s+\w{5})
----------------------------------------------------------------------------------------
rg 2.236 +/- 0.019 (lines: 41)
ugrep 28.811 +/- 0.127 (lines: 41)
rg (ASCII) 2.035 +/- 0.014 (lines: 0)
ag (ASCII) 1.093 +/- 0.006 (lines: 0)
grep (ASCII) 1.085 +/- 0.015 (lines: 0)
ugrep (ASCII)* 0.832 +/- 0.002 (lines: 0)*

View File

@@ -48,34 +48,6 @@ fn main() {
if let Some(rev) = git_revision_hash() {
println!("cargo:rustc-env=RIPGREP_BUILD_GIT_HASH={}", rev);
}
// Embed a Windows manifest and set some linker options. The main reason
// for this is to enable long path support on Windows. This still, I
// believe, requires enabling long path support in the registry. But if
// that's enabled, then this will let ripgrep use C:\... style paths that
// are longer than 260 characters.
set_windows_exe_options();
}
fn set_windows_exe_options() {
static MANIFEST: &str = "pkg/windows/Manifest.xml";
let Ok(target_os) = env::var("CARGO_CFG_TARGET_OS") else { return };
let Ok(target_env) = env::var("CARGO_CFG_TARGET_ENV") else { return };
if !(target_os == "windows" && target_env == "msvc") {
return;
}
let Ok(mut manifest) = env::current_dir() else { return };
manifest.push(MANIFEST);
let Some(manifest) = manifest.to_str() else { return };
println!("cargo:rerun-if-changed={}", MANIFEST);
// Embed the Windows application manifest file.
println!("cargo:rustc-link-arg-bin=rg=/MANIFEST:EMBED");
println!("cargo:rustc-link-arg-bin=rg=/MANIFESTINPUT:{manifest}");
// Turn linker warnings into errors. Helps debugging, otherwise the
// warnings get squashed (I believe).
println!("cargo:rustc-link-arg-bin=rg=/WX");
}
fn git_revision_hash() -> Option<String> {

View File

@@ -17,21 +17,16 @@ if ! command -V cargo-deb > /dev/null 2>&1; then
exit 1
fi
if ! command -V asciidoctor > /dev/null 2>&1; then
echo "asciidoctor command missing" >&2
exit 1
fi
# 'cargo deb' does not seem to provide a way to specify an asset that is
# created at build time, such as ripgrep's man page. To work around this,
# we force a debug build, copy out the man page (and shell completions)
# produced from that build, put it into a predictable location and then build
# the deb, which knows where to look.
cargo build
DEPLOY_DIR=deployment/deb
OUT_DIR="$("$D"/cargo-out-dir target/debug/)"
mkdir -p "$DEPLOY_DIR"
cargo build
# Copy man page and shell completions.
cp "$OUT_DIR"/{rg.1,rg.bash,rg.fish} "$DEPLOY_DIR/"
@@ -39,4 +34,4 @@ cp complete/_rg "$DEPLOY_DIR/"
# Since we're distributing the dpkg, we don't know whether the user will have
# PCRE2 installed, so just do a static build.
PCRE2_SYS_STATIC=1 cargo deb --target x86_64-unknown-linux-musl
PCRE2_SYS_STATIC=1 cargo deb

View File

@@ -44,8 +44,8 @@ main() {
# Occasionally we may have to handle some manually, however
help_args=( ${(f)"$(
$rg --help |
$rg -i -- '^\s+--?[a-z0-9.]|--[a-z]' |
$rg -i -- '^\s+--?[a-z0-9]|--[a-z]' |
$rg -ior '$1' -- $'[\t /\"\'`.,](-[a-z0-9.]|--[a-z0-9-]+)(,|\\b)' |
$rg -ior '$1' -- $'[\t /\"\'`.,](-[a-z0-9]|--[a-z0-9-]+)\\b' |
$rg -v -- --print0 | # False positives
sort -u
)"} )

View File

@@ -1,16 +1,6 @@
#!/bin/sh
# This script gets run in weird environments that have been stripped of just
# about every inessential thing. In order to keep this script versatile, we
# just install 'sudo' and use it like normal if it doesn't exist. If it doesn't
# exist, we assume we're root. (Otherwise we ain't doing much of anything
# anyway.)
if ! command -V sudo; then
apt-get update
apt-get install -y --no-install-recommends sudo
fi
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
asciidoctor \
zsh xz-utils liblz4-tool musl-tools \
zsh xz-utils liblz4-tool musl-tools
brotli zstd

View File

@@ -30,7 +30,7 @@ _rg() {
[[ $_RG_COMPLETE_LIST_ARGS == (1|t*|y*) ]] ||
# (--[imnp]* => --ignore*, --messages, --no-*, --pcre2-unicode)
[[ $PREFIX$SUFFIX == --[imnp]* ]] ||
zstyle -t ":completion:${curcontext}:" complete-all
zstyle -t ":complete:$curcontext:*" complete-all
then
no=
fi
@@ -121,7 +121,7 @@ _rg() {
"(pretty-vimgrep)--no-heading[don't show matches grouped by file name]"
+ '(hidden)' # Hidden-file options
{-.,--hidden}'[search hidden files and directories]'
'--hidden[search hidden files and directories]'
$no"--no-hidden[don't search hidden files and directories]"
+ '(hybrid)' # hybrid regex options
@@ -303,8 +303,6 @@ _rg() {
'--context-separator=[specify string used to separate non-continuous context lines in output]:separator'
$no"--no-context-separator[don't print context separators]"
'--debug[show debug messages]'
'--field-context-separator[set string to delimit fields in context lines]'
'--field-match-separator[set string to delimit fields in matching lines]'
'--trace[show more verbose debug messages]'
'--dfa-size-limit=[specify upper size limit of generated DFA]:DFA size (bytes)'
"(1 stats)--files[show each file that would be searched (but don't search)]"
@@ -319,7 +317,6 @@ _rg() {
'(-q --quiet)'{-q,--quiet}'[suppress normal output]'
'--regex-size-limit=[specify upper size limit of compiled regex]:regex size (bytes)'
'*'{-u,--unrestricted}'[reduce level of "smart" searching]'
'--stop-on-nonmatch[stop on first non-matching line after a matching one]'
+ operand # Operands
'(--files --type-list file regexp)1: :_guard "^-*" pattern'
@@ -433,13 +430,9 @@ _rg_types() {
local -a expl
local -aU _types
_types=( ${(@)${(f)"$( _call_program types $words[1] --type-list )"}//:[[:space:]]##/:} )
_types=( ${(@)${(f)"$( _call_program types rg --type-list )"}%%:*} )
if zstyle -t ":completion:${curcontext}:types" extra-verbose; then
_wanted types expl 'file type' compadd -a "$@" - _types
_describe -t types 'file type' _types
else
_wanted types expl 'file type' compadd "$@" - ${(@)_types%%:*}
fi
}
_rg "$@"

View File

@@ -1,6 +1,6 @@
[package]
name = "grep-cli"
version = "0.1.9" #:version
version = "0.1.5" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Utilities for search oriented command line applications.
@@ -10,12 +10,12 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/cli"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/cli"
readme = "README.md"
keywords = ["regex", "grep", "cli", "utility", "util"]
license = "Unlicense OR MIT"
license = "Unlicense/MIT"
edition = "2018"
[dependencies]
bstr = "1.6.0"
atty = "0.2.11"
globset = { version = "0.4.10", path = "../globset" }
bstr = "0.2.0"
globset = { version = "0.4.5", path = "../globset" }
lazy_static = "1.1.0"
log = "0.4.5"
regex = "1.1"

View File

@@ -29,3 +29,9 @@ Add this to your `Cargo.toml`:
[dependencies]
grep-cli = "0.1"
```
and this to your crate root:
```rust
extern crate grep_cli;
```

View File

@@ -6,7 +6,7 @@ use std::process::Command;
use globset::{Glob, GlobSet, GlobSetBuilder};
use crate::process::{CommandError, CommandReader, CommandReaderBuilder};
use process::{CommandError, CommandReader, CommandReaderBuilder};
/// A builder for a matcher that determines which files get decompressed.
#[derive(Clone, Debug)]
@@ -18,7 +18,7 @@ pub struct DecompressionMatcherBuilder {
}
/// A representation of a single command for decompressing data
/// out-of-process.
/// out-of-proccess.
#[derive(Clone, Debug)]
struct DecompressionCommand {
/// The glob that matches this command.
@@ -132,7 +132,7 @@ impl DecompressionMatcherBuilder {
A: AsRef<OsStr>,
{
let glob = glob.to_string();
let bin = try_resolve_binary(Path::new(program.as_ref()))?;
let bin = resolve_binary(Path::new(program.as_ref()))?;
let args =
args.into_iter().map(|a| a.as_ref().to_os_string()).collect();
self.commands.push(DecompressionCommand { glob, bin, args });
@@ -230,7 +230,7 @@ impl DecompressionReaderBuilder {
match self.command_builder.build(&mut cmd) {
Ok(cmd_reader) => Ok(DecompressionReader { rdr: Ok(cmd_reader) }),
Err(err) => {
log::debug!(
debug!(
"{}: error spawning command '{:?}': {} \
(falling back to uncompressed reader)",
path.display(),
@@ -366,30 +366,6 @@ impl DecompressionReader {
let file = File::open(path)?;
Ok(DecompressionReader { rdr: Err(file) })
}
/// Closes this reader, freeing any resources used by its underlying child
/// process, if one was used. If the child process exits with a nonzero
/// exit code, the returned Err value will include its stderr.
///
/// `close` is idempotent, meaning it can be safely called multiple times.
/// The first call closes the CommandReader and any subsequent calls do
/// nothing.
///
/// This method should be called after partially reading a file to prevent
/// resource leakage. However there is no need to call `close` explicitly
/// if your code always calls `read` to EOF, as `read` takes care of
/// calling `close` in this case.
///
/// `close` is also called in `drop` as a last line of defense against
/// resource leakage. Any error from the child process is then printed as a
/// warning to stderr. This can be avoided by explicitly calling `close`
/// before the CommandReader is dropped.
pub fn close(&mut self) -> io::Result<()> {
match self.rdr {
Ok(ref mut rdr) => rdr.close(),
Err(_) => Ok(()),
}
}
}
impl io::Read for DecompressionReader {
@@ -421,34 +397,6 @@ impl io::Read for DecompressionReader {
/// On non-Windows, this is a no-op.
pub fn resolve_binary<P: AsRef<Path>>(
prog: P,
) -> Result<PathBuf, CommandError> {
if !cfg!(windows) {
return Ok(prog.as_ref().to_path_buf());
}
try_resolve_binary(prog)
}
/// Resolves a path to a program to a path by searching for the program in
/// `PATH`.
///
/// If the program could not be resolved, then an error is returned.
///
/// The purpose of doing this instead of passing the path to the program
/// directly to Command::new is that Command::new will hand relative paths
/// to CreateProcess on Windows, which will implicitly search the current
/// working directory for the executable. This could be undesirable for
/// security reasons. e.g., running ripgrep with the -z/--search-zip flag on an
/// untrusted directory tree could result in arbitrary programs executing on
/// Windows.
///
/// Note that this could still return a relative path if PATH contains a
/// relative path. We permit this since it is assumed that the user has set
/// this explicitly, and thus, desires this behavior.
///
/// If `check_exists` is false or the path is already an absolute path this
/// will return immediately.
fn try_resolve_binary<P: AsRef<Path>>(
prog: P,
) -> Result<PathBuf, CommandError> {
use std::env;
@@ -461,7 +409,7 @@ fn try_resolve_binary<P: AsRef<Path>>(
}
let prog = prog.as_ref();
if prog.is_absolute() {
if !cfg!(windows) || prog.is_absolute() {
return Ok(prog.to_path_buf());
}
let syspaths = match env::var_os("PATH") {
@@ -483,11 +431,9 @@ fn try_resolve_binary<P: AsRef<Path>>(
return Ok(abs_prog.to_path_buf());
}
if abs_prog.extension().is_none() {
for extension in ["com", "exe"] {
let abs_prog = abs_prog.with_extension(extension);
if is_exe(&abs_prog) {
return Ok(abs_prog.to_path_buf());
}
}
let abs_prog = abs_prog.with_extension("exe");
if is_exe(&abs_prog) {
return Ok(abs_prog.to_path_buf());
}
}
}
@@ -509,7 +455,7 @@ fn default_decompression_commands() -> Vec<DecompressionCommand> {
let bin = match resolve_binary(Path::new(args[0])) {
Ok(bin) => bin,
Err(err) => {
log::debug!("{}", err);
debug!("{}", err);
return;
}
};
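The long doc comment in this hunk is the core of the CVE-2021-3013 fix: resolve the program name against `PATH` yourself, so that `CreateProcess` is only ever handed an absolute path and never falls back to searching the current working directory. A minimal, Unix-flavored sketch of that `PATH` walk using only the standard library (the `find_in_path` name is illustrative, and it omits the Windows `is_exe`/extension handling in ripgrep's real code):

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Resolve `prog` against PATH so that only an absolute (or explicitly
/// user-chosen) path is ever handed to the OS when spawning. Hypothetical
/// helper; ripgrep's real logic lives in crates/cli/src/decompress.rs.
fn find_in_path(prog: &str) -> Option<PathBuf> {
    let prog = Path::new(prog);
    // An absolute path needs no resolution.
    if prog.is_absolute() {
        return Some(prog.to_path_buf());
    }
    let syspaths = env::var_os("PATH")?;
    for dir in env::split_paths(&syspaths) {
        if dir.as_os_str().is_empty() {
            continue;
        }
        let candidate = dir.join(prog);
        if candidate.is_file() {
            return Some(candidate);
        }
    }
    None
}

fn main() {
    // On most Unix systems this resolves to something like /bin/sh or
    // /usr/bin/sh; a relative PATH entry would still yield a relative path,
    // which the real code deliberately permits.
    println!("{:?}", find_in_path("sh"));
}
```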

View File

@@ -8,7 +8,7 @@ use regex::Regex;
/// An error that occurs when parsing a human readable size description.
///
/// This error provides an end user friendly message describing why the
/// description couldn't be parsed and what the expected format is.
/// description coudln't be parsed and what the expected format is.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct ParseSizeError {
original: String,
@@ -52,7 +52,7 @@ impl error::Error for ParseSizeError {
}
impl fmt::Display for ParseSizeError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
use self::ParseSizeErrorKind::*;
match self.kind {
@@ -88,7 +88,7 @@ impl From<ParseSizeError> for io::Error {
///
/// Additional suffixes may be added over time.
pub fn parse_human_readable_size(size: &str) -> Result<u64, ParseSizeError> {
lazy_static::lazy_static! {
lazy_static! {
// Normally I'd just parse something this simple by hand to avoid the
// regex dep, but we bring regex in any way for glob matching, so might
// as well use it.
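The comment notes that a size string this simple could just be parsed by hand. As an illustration, a hand-rolled sketch of that idea (the `parse_size` helper and its suffix set are assumptions for the sketch, not the crate's actual API):

```rust
/// Parse strings like "10K", "50M" or "1G" into a byte count. A hand-rolled
/// sketch of the shape parse_human_readable_size accepts; the real
/// implementation (and exact suffix set) lives in crates/cli/src/human.rs.
fn parse_size(s: &str) -> Option<u64> {
    let s = s.trim();
    // Split the numeric prefix from an optional single-letter suffix.
    let digits_end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let number: u64 = s[..digits_end].parse().ok()?;
    let multiplier: u64 = match &s[digits_end..] {
        "" => 1,
        "K" | "k" => 1 << 10,
        "M" | "m" => 1 << 20,
        "G" | "g" => 1 << 30,
        _ => return None,
    };
    // Reject overflow instead of silently wrapping.
    number.checked_mul(multiplier)
}

fn main() {
    println!("{:?}", parse_size("10K")); // Some(10240)
}
```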

View File

@@ -158,6 +158,19 @@ error message is crafted that typically tells the user how to fix the problem.
#![deny(missing_docs)]
extern crate atty;
extern crate bstr;
extern crate globset;
#[macro_use]
extern crate lazy_static;
#[macro_use]
extern crate log;
extern crate regex;
extern crate same_file;
extern crate termcolor;
#[cfg(windows)]
extern crate winapi_util;
mod decompress;
mod escape;
mod human;
@@ -165,20 +178,18 @@ mod pattern;
mod process;
mod wtr;
use std::io::IsTerminal;
pub use crate::decompress::{
pub use decompress::{
resolve_binary, DecompressionMatcher, DecompressionMatcherBuilder,
DecompressionReader, DecompressionReaderBuilder,
};
pub use crate::escape::{escape, escape_os, unescape, unescape_os};
pub use escape::{escape, escape_os, unescape, unescape_os};
pub use crate::human::{parse_human_readable_size, ParseSizeError};
pub use human::{parse_human_readable_size, ParseSizeError};
pub use crate::pattern::{
pub use pattern::{
pattern_from_bytes, pattern_from_os, patterns_from_path,
patterns_from_reader, patterns_from_stdin, InvalidPatternError,
};
pub use crate::process::{CommandError, CommandReader, CommandReaderBuilder};
pub use process::{CommandError, CommandReader, CommandReaderBuilder};
pub use crate::wtr::{
pub use wtr::{
stdout, stdout_buffered_block, stdout_buffered_line, StandardStream,
};
@@ -214,13 +225,13 @@ pub fn is_readable_stdin() -> bool {
!is_tty_stdin() && imp()
}
/// Returns true if and only if stdin is believed to be connected to a tty
/// Returns true if and only if stdin is believed to be connectted to a tty
/// or a console.
pub fn is_tty_stdin() -> bool {
std::io::stdin().is_terminal()
atty::is(atty::Stream::Stdin)
}
/// Returns true if and only if stdout is believed to be connected to a tty
/// Returns true if and only if stdout is believed to be connectted to a tty
/// or a console.
///
/// This is useful for when you want your command line program to produce
@@ -229,11 +240,11 @@ pub fn is_tty_stdin() -> bool {
/// implementations of `ls` will often show one item per line when stdout is
/// redirected, but will condensed output when printing to a tty.
pub fn is_tty_stdout() -> bool {
std::io::stdout().is_terminal()
atty::is(atty::Stream::Stdout)
}
/// Returns true if and only if stderr is believed to be connected to a tty
/// Returns true if and only if stderr is believed to be connectted to a tty
/// or a console.
pub fn is_tty_stderr() -> bool {
std::io::stderr().is_terminal()
atty::is(atty::Stream::Stderr)
}
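One side of this hunk replaces the `atty` crate with the standard library's `IsTerminal` trait, which has been stable since Rust 1.70 and answers the same question without an external dependency. A minimal sketch of the std-only check:

```rust
use std::io::IsTerminal;

/// std replacement for an atty-based check: returns true only when stdout
/// is believed to be connected to a tty or console.
fn is_tty_stdout() -> bool {
    std::io::stdout().is_terminal()
}

fn main() {
    // Under a pipe or a CI runner this prints false; in an interactive
    // terminal it prints true.
    println!("{}", is_tty_stdout());
}
```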

View File

@@ -8,7 +8,7 @@ use std::str;
use bstr::io::BufReadExt;
use crate::escape::{escape, escape_os};
use escape::{escape, escape_os};
/// An error that occurs when a pattern could not be converted to valid UTF-8.
///
@@ -35,7 +35,7 @@ impl error::Error for InvalidPatternError {
}
impl fmt::Display for InvalidPatternError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(
f,
"found invalid UTF-8 in pattern at byte offset {}: {} \

View File

@@ -30,14 +30,6 @@ impl CommandError {
pub(crate) fn stderr(bytes: Vec<u8>) -> CommandError {
CommandError { kind: CommandErrorKind::Stderr(bytes) }
}
/// Returns true if and only if this error has empty data from stderr.
pub(crate) fn is_empty(&self) -> bool {
match self.kind {
CommandErrorKind::Stderr(ref bytes) => bytes.is_empty(),
_ => false,
}
}
} }
impl error::Error for CommandError {
@@ -47,7 +39,7 @@
}
impl fmt::Display for CommandError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self.kind {
CommandErrorKind::Io(ref e) => e.fmt(f),
CommandErrorKind::Stderr(ref bytes) => {
@@ -115,12 +107,18 @@ impl CommandReaderBuilder {
.stdout(process::Stdio::piped())
.stderr(process::Stdio::piped())
.spawn()?;
let stdout = child.stdout.take().unwrap();
let stderr = if self.async_stderr {
StderrReader::r#async(child.stderr.take().unwrap())
StderrReader::async(child.stderr.take().unwrap())
} else {
StderrReader::sync(child.stderr.take().unwrap())
};
Ok(CommandReader { child, stderr, eof: false })
Ok(CommandReader {
child: child,
stdout: stdout,
stderr: stderr,
done: false,
})
}
/// When enabled, the reader will asynchronously read the contents of the
@@ -177,11 +175,9 @@ impl CommandReaderBuilder {
#[derive(Debug)]
pub struct CommandReader {
child: process::Child,
stdout: process::ChildStdout,
stderr: StderrReader,
/// This is set to true once 'read' returns zero bytes. When this isn't
/// set and we close the reader, then we anticipate a pipe error when
/// reaping the child process and silence it.
eof: bool,
done: bool,
}
impl CommandReader {
@@ -205,73 +201,23 @@ impl CommandReader {
) -> Result<CommandReader, CommandError> {
CommandReaderBuilder::new().build(cmd)
}
/// Closes the CommandReader, freeing any resources used by its underlying
/// child process. If the child process exits with a nonzero exit code, the
/// returned Err value will include its stderr.
///
/// `close` is idempotent, meaning it can be safely called multiple times.
/// The first call closes the CommandReader and any subsequent calls do
/// nothing.
///
/// This method should be called after partially reading a file to prevent
/// resource leakage. However there is no need to call `close` explicitly
/// if your code always calls `read` to EOF, as `read` takes care of
/// calling `close` in this case.
///
/// `close` is also called in `drop` as a last line of defense against
/// resource leakage. Any error from the child process is then printed as a
/// warning to stderr. This can be avoided by explicitly calling `close`
/// before the CommandReader is dropped.
pub fn close(&mut self) -> io::Result<()> {
// Dropping stdout closes the underlying file descriptor, which should
// cause a well-behaved child process to exit. If child.stdout is None
// we assume that close() has already been called and do nothing.
let stdout = match self.child.stdout.take() {
None => return Ok(()),
Some(stdout) => stdout,
};
drop(stdout);
if self.child.wait()?.success() {
Ok(())
} else {
let err = self.stderr.read_to_end();
// In the specific case where we haven't consumed the full data
// from the child process, then closing stdout above results in
// a pipe signal being thrown in most cases. But I don't think
// there is any reliable and portable way of detecting it. Instead,
// if we know we haven't hit EOF (so we anticipate a broken pipe
// error) and if stderr otherwise doesn't have anything on it, then
// we assume total success.
if !self.eof && err.is_empty() {
return Ok(());
}
Err(io::Error::from(err))
}
}
}
impl Drop for CommandReader {
fn drop(&mut self) {
if let Err(error) = self.close() {
log::warn!("{}", error);
}
}
}
impl io::Read for CommandReader {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
let stdout = match self.child.stdout {
None => return Ok(0),
Some(ref mut stdout) => stdout,
};
let nread = stdout.read(buf)?;
if nread == 0 {
self.eof = true;
self.close().map(|_| 0)
} else {
Ok(nread)
}
}
if self.done {
return Ok(0);
}
let nread = self.stdout.read(buf)?;
if nread == 0 {
self.done = true;
// Reap the child now that we're done reading. If the command
// failed, report stderr as an error.
if !self.child.wait()?.success() {
return Err(io::Error::from(self.stderr.read_to_end()));
}
}
Ok(nread)
}
}
@@ -285,7 +231,7 @@ enum StderrReader {
impl StderrReader {
/// Create a reader for stderr that reads contents asynchronously.
fn r#async(mut stderr: process::ChildStderr) -> StderrReader {
fn async(mut stderr: process::ChildStderr) -> StderrReader {
let handle =
thread::spawn(move || stderr_to_command_error(&mut stderr));
StderrReader::Async(Some(handle))
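The `close`/`read` machinery being diffed in this file boils down to one pattern: stream the child's stdout to EOF, then reap the child and surface stderr if it exited with failure. A simplified sketch of that pattern with plain `std::process` (the `run_capture` helper is hypothetical and omits CommandReader's async-stderr thread, idempotent `close`, and `Drop` handling):

```rust
use std::io::Read;
use std::process::{Command, Stdio};

/// Run a command, read its stdout to EOF, then reap the child, turning a
/// nonzero exit into an error carrying whatever the child wrote to stderr.
fn run_capture(cmd: &mut Command) -> Result<Vec<u8>, String> {
    let mut child = cmd
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(|e| e.to_string())?;
    let mut out = Vec::new();
    child
        .stdout
        .take()
        .unwrap()
        .read_to_end(&mut out)
        .map_err(|e| e.to_string())?;
    // Reap the child now that stdout hit EOF; report stderr on failure.
    let status = child.wait().map_err(|e| e.to_string())?;
    if status.success() {
        Ok(out)
    } else {
        let mut err = Vec::new();
        let _ = child.stderr.take().unwrap().read_to_end(&mut err);
        Err(String::from_utf8_lossy(&err).into_owned())
    }
}

fn main() {
    let out = run_capture(Command::new("echo").arg("hi")).unwrap();
    assert_eq!(out, b"hi\n");
}
```

Dropping the stdout handle before `wait` (as the real `close` does) is what nudges a well-behaved child to exit when the caller stops reading early; this sketch only covers the read-to-EOF case.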

View File

@@ -2,7 +2,7 @@ use std::io;
use termcolor;
use crate::is_tty_stdout;
use is_tty_stdout;
/// A writer that supports coloring with either line or block buffering.
pub struct StandardStream(StandardStreamKind);

View File

@@ -13,8 +13,8 @@ use clap::{self, crate_authors, crate_version, App, AppSettings};
use lazy_static::lazy_static;
const ABOUT: &str = "
ripgrep (rg) recursively searches the current directory for a regex pattern.
ripgrep (rg) recursively searches your current directory for a regex pattern.
By default, ripgrep will respect gitignore rules and automatically skip hidden
By default, ripgrep will respect your .gitignore and automatically skip hidden
files/directories and binary files.
Use -h for short descriptions and --help for more details.
@@ -568,8 +568,6 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_dfa_size_limit(&mut args);
flag_encoding(&mut args);
flag_engine(&mut args);
flag_field_context_separator(&mut args);
flag_field_match_separator(&mut args);
flag_file(&mut args);
flag_files(&mut args);
flag_files_with_matches(&mut args);
@@ -632,7 +630,6 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
flag_sort(&mut args);
flag_sortr(&mut args);
flag_stats(&mut args);
flag_stop_on_nonmatch(&mut args);
flag_text(&mut args);
flag_threads(&mut args);
flag_trim(&mut args);
@@ -699,7 +696,7 @@ fn flag_after_context(args: &mut Vec<RGArg>) {
"\
Show NUM lines after each match.
This overrides the --passthru flag and partially overrides --context.
This overrides the --context flag.
"
);
let arg = RGArg::flag("after-context", "NUM")
@@ -707,7 +704,7 @@ This overrides the --passthru flag and partially overrides --context.
.help(SHORT)
.long_help(LONG)
.number()
.overrides("passthru");
.overrides("context");
args.push(arg);
}
@@ -768,7 +765,7 @@ fn flag_before_context(args: &mut Vec<RGArg>) {
"\ "\
Show NUM lines before each match. Show NUM lines before each match.
This overrides the --passthru flag and partially overrides --context. This overrides the --context flag.
" "
); );
let arg = RGArg::flag("before-context", "NUM") let arg = RGArg::flag("before-context", "NUM")
@@ -776,7 +773,7 @@ This overrides the --passthru flag and partially overrides --context.
.help(SHORT) .help(SHORT)
.long_help(LONG) .long_help(LONG)
.number() .number()
.overrides("passthru"); .overrides("context");
args.push(arg); args.push(arg);
} }
@@ -874,8 +871,8 @@ Print the 0-based byte offset within the input file before each line of output.
If -o (--only-matching) is specified, print the offset of the matching part If -o (--only-matching) is specified, print the offset of the matching part
itself. itself.
If ripgrep does transcoding, then the byte offset is in terms of the result of If ripgrep does transcoding, then the byte offset is in terms of the the result
transcoding and not the original data. This applies similarly to another of transcoding and not the original data. This applies similarly to another
transformation on the source, such as decompression or a --pre filter. Note transformation on the source, such as decompression or a --pre filter. Note
that when the PCRE2 regex engine is used, then UTF-8 transcoding is done by that when the PCRE2 regex engine is used, then UTF-8 transcoding is done by
default. default.
@@ -969,7 +966,7 @@ or, equivalently,
rg --colors 'match:bg:0x0,0x80,0xFF' rg --colors 'match:bg:0x0,0x80,0xFF'
Note that the intense and nointense style flags will have no effect when Note that the the intense and nointense style flags will have no effect when
used alongside these extended color codes. used alongside these extended color codes.
" "
); );
@@ -1008,7 +1005,7 @@ fn flag_context(args: &mut Vec<RGArg>) {
 Show NUM lines before and after each match. This is equivalent to providing
 both the -B/--before-context and -A/--after-context flags with the same value.
-This overrides the --passthru flag.
+This overrides both the -B/--before-context and -A/--after-context flags.
 "
     );
     let arg = RGArg::flag("context", "NUM")
@@ -1016,7 +1013,8 @@ This overrides the --passthru flag.
         .help(SHORT)
         .long_help(LONG)
         .number()
-        .overrides("passthru");
+        .overrides("before-context")
+        .overrides("after-context");
     args.push(arg);
 }
@@ -1053,13 +1051,11 @@ fn flag_count(args: &mut Vec<RGArg>) {
 This flag suppresses normal output and shows the number of lines that match
 the given patterns for each file searched. Each file containing a match has its
 path and count printed on each line. Note that this reports the number of lines
-that match and not the total number of matches, unless -U/--multiline is
-enabled. In multiline mode, --count is equivalent to --count-matches.
+that match and not the total number of matches.
 If only one file is given to ripgrep, then only the count is printed if there
 is a match. The --with-filename flag can be used to force printing the file
-path in this case. If you need a count to be printed regardless of whether
-there is a match, then use --include-zero.
+path in this case.
 This overrides the --count-matches flag. Note that when --count is combined
 with --only-matching, then ripgrep behaves as if --count-matches was given.
@@ -1213,7 +1209,7 @@ between supported regex engines depending on the features used in a pattern on
 a best effort basis.
 Note that the 'pcre2' engine is an optional ripgrep feature. If PCRE2 wasn't
-included in your build of ripgrep, then using this flag will result in ripgrep
+including in your build of ripgrep, then using this flag will result in ripgrep
 printing an error message and exiting.
 This overrides previous uses of --pcre2 and --auto-hybrid-regex flags.
@@ -1231,38 +1227,6 @@ This overrides previous uses of --pcre2 and --auto-hybrid-regex flags.
     args.push(arg);
 }
-fn flag_field_context_separator(args: &mut Vec<RGArg>) {
-    const SHORT: &str = "Set the field context separator.";
-    const LONG: &str = long!(
-        "\
-Set the field context separator, which is used to delimit file paths, line
-numbers, columns and the context itself, when printing contextual lines. The
-separator may be any number of bytes, including zero. Escape sequences like
-\\x7F or \\t may be used. The '-' character is the default value.
-"
-    );
-    let arg = RGArg::flag("field-context-separator", "SEPARATOR")
-        .help(SHORT)
-        .long_help(LONG);
-    args.push(arg);
-}
-fn flag_field_match_separator(args: &mut Vec<RGArg>) {
-    const SHORT: &str = "Set the match separator.";
-    const LONG: &str = long!(
-        "\
-Set the field match separator, which is used to delimit file paths, line
-numbers, columns and the match itself. The separator may be any number of
-bytes, including zero. Escape sequences like \\x7F or \\t may be used. The ':'
-character is the default value.
-"
-    );
-    let arg = RGArg::flag("field-match-separator", "SEPARATOR")
-        .help(SHORT)
-        .long_help(LONG);
-    args.push(arg);
-}
 fn flag_file(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Search for patterns from the given file.";
     const LONG: &str = long!(
@@ -1302,10 +1266,10 @@ This is useful to determine whether a particular file is being searched or not.
 }
 fn flag_files_with_matches(args: &mut Vec<RGArg>) {
-    const SHORT: &str = "Print the paths with at least one match.";
+    const SHORT: &str = "Only print the paths with at least one match.";
     const LONG: &str = long!(
         "\
-Print the paths with at least one match and suppress match contents.
+Only print the paths with at least one match.
 This overrides --files-without-match.
 "
@@ -1319,11 +1283,11 @@ This overrides --files-without-match.
 }
 fn flag_files_without_match(args: &mut Vec<RGArg>) {
-    const SHORT: &str = "Print the paths that contain zero matches.";
+    const SHORT: &str = "Only print the paths that contain zero matches.";
     const LONG: &str = long!(
         "\
-Print the paths that contain zero matches and suppress match contents. This
-inverts/negates the --files-with-matches flag.
+Only print the paths that contain zero matches. This inverts/negates the
+--files-with-matches flag.
 This overrides --files-with-matches.
 "
@@ -1390,13 +1354,6 @@ used. Globbing rules match .gitignore globs. Precede a glob with a ! to exclude
 it. If multiple globs match a file or directory, the glob given later in the
 command line takes precedence.
-As an extension, globs support specifying alternatives: *-g ab{c,d}* is
-equivalent to *-g abc -g abd*. Empty alternatives like *-g ab{,c}* are not
-currently supported. Note that this syntax extension is also currently enabled
-in gitignore files, even though this syntax isn't supported by git itself.
-ripgrep may disable this syntax extension in gitignore files, but it will
-always remain available via the -g/--glob flag.
 When this flag is set, every file and directory is applied to it to test for
 a match. So for example, if you only want to search in a particular directory
 'foo', then *-g foo* is incorrect because 'foo/bar' does not match the glob
@@ -1476,15 +1433,10 @@ Search hidden files and directories. By default, hidden files and directories
 are skipped. Note that if a hidden file or a directory is whitelisted in an
 ignore file, then it will be searched even if this flag isn't provided.
-A file or directory is considered hidden if its base name starts with a dot
-character ('.'). On operating systems which support a `hidden` file attribute,
-like Windows, files with this attribute are also considered hidden.
 This flag can be disabled with --no-hidden.
 "
     );
     let arg = RGArg::switch("hidden")
-        .short(".")
         .help(SHORT)
         .long_help(LONG)
         .overrides("no-hidden");
@@ -1544,7 +1496,7 @@ When specifying multiple ignore files, earlier files have lower precedence
 than later files.
 If you are looking for a way to include or exclude files and directories
-directly on the command line, then use -g instead.
+directly on the command line, then used -g instead.
 "
     );
     let arg = RGArg::flag("ignore-file", "PATH")
@@ -1707,8 +1659,6 @@ fn flag_line_number(args: &mut Vec<RGArg>) {
         "\
 Show line numbers (1-based). This is enabled by default when searching in a
 terminal.
-This flag overrides --no-line-number.
 "
     );
     let arg = RGArg::switch("line-number")
@@ -1723,8 +1673,6 @@ This flag overrides --no-line-number.
         "\
 Suppress line numbers. This is enabled by default when not searching in a
 terminal.
-This flag overrides --line-number.
 "
     );
     let arg = RGArg::switch("no-line-number")
@@ -1927,16 +1875,13 @@ Nevertheless, if you only care about matches spanning at most one line, then it
 is always better to disable multiline mode.
 This flag can be disabled with --no-multiline.
-This overrides the --stop-on-nonmatch flag.
 "
     );
     let arg = RGArg::switch("multiline")
         .short("U")
         .help(SHORT)
         .long_help(LONG)
-        .overrides("no-multiline")
-        .overrides("stop-on-nonmatch");
+        .overrides("no-multiline");
     args.push(arg);
     let arg = RGArg::switch("no-multiline").hidden().overrides("multiline");
@@ -2026,9 +1971,6 @@ fn flag_no_ignore_dot(args: &mut Vec<RGArg>) {
         "\
 Don't respect .ignore files.
-This does *not* affect whether ripgrep will ignore files and directories
-whose names begin with a dot. For that, see the -./--hidden flag.
 This flag can be disabled with the --ignore-dot flag.
 "
     );
@@ -2399,17 +2341,12 @@ the empty string. For example, if you are searching using 'rg foo' then using
 'rg \"^|foo\"' instead will emit every line in every file searched, but only
 occurrences of 'foo' will be highlighted. This flag enables the same behavior
 without needing to modify the pattern.
-This overrides the --context, --after-context and --before-context flags.
 "
     );
     let arg = RGArg::switch("passthru")
         .help(SHORT)
         .long_help(LONG)
-        .alias("passthrough")
-        .overrides("after-context")
-        .overrides("before-context")
-        .overrides("context");
+        .alias("passthrough");
     args.push(arg);
 }
@@ -2586,8 +2523,8 @@ Do not print anything to stdout. If a match is found in a file, then ripgrep
 will stop searching. This is useful when ripgrep is used only for its exit
 code (which will be an error if no matches are found).
-When --files is used, ripgrep will stop finding files after finding the
-first file that does not match any ignore rules.
+When --files is used, then ripgrep will stop finding files after finding the
+first file that matches all ignore rules.
 "
     );
     let arg = RGArg::switch("quiet").short("q").help(SHORT).long_help(LONG);
@@ -2650,17 +2587,6 @@ replacement string. Capture group indices are numbered based on the position of
 the opening parenthesis of the group, where the leftmost such group is $1. The
 special $0 group corresponds to the entire match.
-The name of a group is formed by taking the longest string of letters, numbers
-and underscores (i.e. [_0-9A-Za-z]) after the $. For example, $1a will be
-replaced with the group named '1a', not the group at index 1. If the group's
-name contains characters that aren't letters, numbers or underscores, or you
-want to immediately follow the group with another string, the name should be
-put inside braces. For example, ${1}a will take the content of the group at
-index 1 and append 'a' to the end of it.
-If an index or name does not refer to a valid capture group, it will be
-replaced with an empty string.
 In shells such as Bash and zsh, you should wrap the pattern in single quotes
 instead of double quotes. Otherwise, capture group indices will be replaced by
 expanded shell variables which will most likely be empty.
@@ -2858,25 +2784,6 @@ This flag can be disabled with --no-stats.
     args.push(arg);
 }
-fn flag_stop_on_nonmatch(args: &mut Vec<RGArg>) {
-    const SHORT: &str = "Stop searching after a non-match.";
-    const LONG: &str = long!(
-        "\
-Enabling this option will cause ripgrep to stop reading a file once it
-encounters a non-matching line after it has encountered a matching line.
-This is useful if it is expected that all matches in a given file will be on
-sequential lines, for example due to the lines being sorted.
-This overrides the -U/--multiline flag.
-"
-    );
-    let arg = RGArg::switch("stop-on-nonmatch")
-        .help(SHORT)
-        .long_help(LONG)
-        .overrides("multiline");
-    args.push(arg);
-}
 fn flag_text(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Search binary files as if they were text.";
     const LONG: &str = long!(
@@ -3060,8 +2967,8 @@ fn flag_unrestricted(args: &mut Vec<RGArg>) {
         "\
 Reduce the level of \"smart\" searching. A single -u won't respect .gitignore
 (etc.) files (--no-ignore). Two -u flags will additionally search hidden files
-and directories (-./--hidden). Three -u flags will additionally search binary
-files (--binary).
+and directories (--hidden). Three -u flags will additionally search binary files
+(--binary).
 'rg -uuu' is roughly equivalent to 'grep -r'.
 "

View File

@@ -31,6 +31,8 @@ use ignore::overrides::{Override, OverrideBuilder};
 use ignore::types::{FileTypeDef, Types, TypesBuilder};
 use ignore::{Walk, WalkBuilder, WalkParallel};
 use log;
+use num_cpus;
+use regex;
 use termcolor::{BufferWriter, ColorChoice, WriteColor};
 use crate::app;
@@ -41,7 +43,7 @@ use crate::path_printer::{PathPrinter, PathPrinterBuilder};
 use crate::search::{
     PatternMatcher, Printer, SearchWorker, SearchWorkerBuilder,
 };
-use crate::subject::{Subject, SubjectBuilder};
+use crate::subject::SubjectBuilder;
 use crate::Result;
 /// The command that ripgrep should execute based on the command line
@@ -95,17 +97,14 @@ pub struct Args(Arc<ArgsImp>);
 struct ArgsImp {
     /// Mid-to-low level routines for extracting CLI arguments.
     matches: ArgMatches,
-    /// The command we want to execute.
-    command: Command,
-    /// The number of threads to use. This is based in part on available
-    /// threads, in part on the number of threads requested and in part on the
-    /// command we're running.
-    threads: usize,
+    /// The patterns provided at the command line and/or via the -f/--file
+    /// flag. This may be empty.
+    patterns: Vec<String>,
     /// A matcher built from the patterns.
     ///
     /// It's important that this is only built once, since building this goes
     /// through regex compilation and various types of analyses. That is, if
-    /// you need many of these (one per thread, for example), it is better to
+    /// you need many of theses (one per thread, for example), it is better to
     /// build it once and then clone it.
     matcher: PatternMatcher,
     /// The paths provided at the command line. This is guaranteed to be
@@ -166,6 +165,12 @@ impl Args {
         &self.0.matches
     }
+    /// Return the patterns found in the command line arguments. This includes
+    /// patterns read via the -f/--file flags.
+    fn patterns(&self) -> &[String] {
+        &self.0.patterns
+    }
     /// Return the matcher builder from the patterns.
     fn matcher(&self) -> &PatternMatcher {
         &self.0.matcher
@@ -181,7 +186,7 @@
     /// Returns true if and only if `paths` had to be populated with a default
     /// path, which occurs only when no paths were given as command line
     /// arguments.
-    pub fn using_default_path(&self) -> bool {
+    fn using_default_path(&self) -> bool {
         self.0.using_default_path
     }
@@ -192,7 +197,7 @@
     fn printer<W: WriteColor>(&self, wtr: W) -> Result<Printer<W>> {
         match self.matches().output_kind() {
             OutputKind::Standard => {
-                let separator_search = self.command() == Command::Search;
+                let separator_search = self.command()? == Command::Search;
                 self.matches()
                     .printer_standard(self.paths(), wtr, separator_search)
                     .map(Printer::Standard)
@@ -220,8 +225,28 @@
     }
     /// Return the high-level command that ripgrep should run.
-    pub fn command(&self) -> Command {
-        self.0.command
+    pub fn command(&self) -> Result<Command> {
+        let is_one_search = self.matches().is_one_search(self.paths());
+        let threads = self.matches().threads()?;
+        let one_thread = is_one_search || threads == 1;
+        Ok(if self.matches().is_present("pcre2-version") {
+            Command::PCRE2Version
+        } else if self.matches().is_present("type-list") {
+            Command::Types
+        } else if self.matches().is_present("files") {
+            if one_thread {
+                Command::Files
+            } else {
+                Command::FilesParallel
+            }
+        } else if self.matches().can_never_match(self.patterns()) {
+            Command::SearchNever
+        } else if one_thread {
+            Command::Search
+        } else {
+            Command::SearchParallel
+        })
     }
     /// Builder a path printer that can be used for printing just file paths,
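The `command()` method in the hunk above folds flag presence, pattern satisfiability, and thread count into a single decision. That branch order can be modeled as a pure function (a simplified sketch for illustration; the real method reads these inputs from `ArgMatches`):

```rust
#[derive(Debug, PartialEq)]
enum Command {
    PCRE2Version,
    Types,
    Files,
    FilesParallel,
    SearchNever,
    Search,
    SearchParallel,
}

/// Mirror of the branch order in Args::command(): version/type listing
/// first, then --files, then the "can never match" shortcut, and finally
/// single-threaded vs. parallel search.
fn pick_command(
    pcre2_version: bool,
    type_list: bool,
    files: bool,
    can_never_match: bool,
    one_thread: bool,
) -> Command {
    if pcre2_version {
        Command::PCRE2Version
    } else if type_list {
        Command::Types
    } else if files {
        if one_thread { Command::Files } else { Command::FilesParallel }
    } else if can_never_match {
        Command::SearchNever
    } else if one_thread {
        Command::Search
    } else {
        Command::SearchParallel
    }
}
```

Note that `can_never_match` is only consulted when no listing flag is present, so `rg --files` still works even with an unsatisfiable pattern set.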
@@ -279,7 +304,7 @@ impl Args {
     /// When this returns a `Stats` value, then it is guaranteed that the
     /// search worker will be configured to track statistics as well.
     pub fn stats(&self) -> Result<Option<Stats>> {
-        Ok(if self.command().is_search() && self.matches().stats() {
+        Ok(if self.command()?.is_search() && self.matches().stats() {
             Some(Stats::new())
         } else {
             None
@@ -318,58 +343,12 @@ impl Args {
     /// Return a walker that never uses additional threads.
     pub fn walker(&self) -> Result<Walk> {
-        Ok(self
-            .matches()
-            .walker_builder(self.paths(), self.0.threads)?
-            .build())
-    }
-    /// Returns true if and only if `stat`-related sorting is required
-    pub fn needs_stat_sort(&self) -> bool {
-        return self.matches().sort_by().map_or(
-            false,
-            |sort_by| match sort_by.kind {
-                SortByKind::LastModified
-                | SortByKind::Created
-                | SortByKind::LastAccessed => sort_by.check().is_ok(),
-                _ => false,
-            },
-        );
-    }
-    /// Sort subjects if a sorter is specified, but only if the sort requires
-    /// stat calls. Non-stat related sorts are handled during file traversal
-    ///
-    /// This function assumes that it is known that a stat-related sort is
-    /// required, and does not check for it again.
-    ///
-    /// It is important that that precondition is fulfilled, since this function
-    /// consumes the subjects iterator, and is therefore a blocking function.
-    pub fn sort_by_stat<I>(&self, subjects: I) -> Vec<Subject>
-    where
-        I: Iterator<Item = Subject>,
-    {
-        let sorter = match self.matches().sort_by() {
-            Ok(v) => v,
-            Err(_) => return subjects.collect(),
-        };
-        use SortByKind::*;
-        let mut keyed = match sorter.kind {
-            LastModified => load_timestamps(subjects, |m| m.modified()),
-            LastAccessed => load_timestamps(subjects, |m| m.accessed()),
-            Created => load_timestamps(subjects, |m| m.created()),
-            _ => return subjects.collect(),
-        };
-        keyed.sort_by(|a, b| sort_by_option(&a.0, &b.0, sorter.reverse));
-        keyed.into_iter().map(|v| v.1).collect()
+        Ok(self.matches().walker_builder(self.paths())?.build())
     }
     /// Return a parallel walker that may use additional threads.
     pub fn walker_parallel(&self) -> Result<WalkParallel> {
-        Ok(self
-            .matches()
-            .walker_builder(self.paths(), self.0.threads)?
-            .build_parallel())
+        Ok(self.matches().walker_builder(self.paths())?.build_parallel())
     }
 }
@@ -444,23 +423,44 @@ impl SortBy {
         Ok(())
     }
-    /// Load sorters only if they are applicable at the walk stage.
-    ///
-    /// In particular, sorts that involve `stat` calls are not loaded because
-    /// the walk inherently assumes that parent directories are aware of all its
-    /// decendent properties, but `stat` does not work that way.
-    fn configure_builder_sort(self, builder: &mut WalkBuilder) {
-        use SortByKind::*;
-        match self.kind {
-            Path if self.reverse => {
-                builder.sort_by_file_name(|a, b| a.cmp(b).reverse());
-            }
-            Path => {
-                builder.sort_by_file_name(|a, b| a.cmp(b));
-            }
-            // these use `stat` calls and will be sorted in Args::sort_by_stat()
-            LastModified | LastAccessed | Created | None => {}
-        };
-    }
+    fn configure_walk_builder(self, builder: &mut WalkBuilder) {
+        // This isn't entirely optimal. In particular, we will wind up issuing
+        // a stat for many files redundantly. Aside from having potentially
+        // inconsistent results with respect to sorting, this is also slow.
+        // We could fix this here at the expense of memory by caching stat
+        // calls. A better fix would be to find a way to push this down into
+        // directory traversal itself, but that's a somewhat nasty change.
+        match self.kind {
+            SortByKind::None => {}
+            SortByKind::Path => {
+                if self.reverse {
+                    builder.sort_by_file_name(|a, b| a.cmp(b).reverse());
+                } else {
+                    builder.sort_by_file_name(|a, b| a.cmp(b));
+                }
+            }
+            SortByKind::LastModified => {
+                builder.sort_by_file_path(move |a, b| {
+                    sort_by_metadata_time(a, b, self.reverse, |md| {
+                        md.modified()
+                    })
+                });
+            }
+            SortByKind::LastAccessed => {
+                builder.sort_by_file_path(move |a, b| {
+                    sort_by_metadata_time(a, b, self.reverse, |md| {
+                        md.accessed()
+                    })
+                });
+            }
+            SortByKind::Created => {
+                builder.sort_by_file_path(move |a, b| {
+                    sort_by_metadata_time(a, b, self.reverse, |md| {
+                        md.created()
+                    })
+                });
+            }
+        }
+    }
 }
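`sort_by_metadata_time` is called in the hunk above but defined outside it. Its essential comparison — ordering two optional timestamps, with unreadable metadata sorted consistently and an optional reversal — can be sketched like this (an illustration under assumptions, not the actual helper, which takes paths and a metadata accessor):

```rust
use std::cmp::Ordering;
use std::time::SystemTime;

/// Compare two optional timestamps. Entries whose metadata could not be
/// read (None) sort before readable ones; `reverse` flips the final order.
fn cmp_time(
    a: Option<SystemTime>,
    b: Option<SystemTime>,
    reverse: bool,
) -> Ordering {
    let ord = match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => Ordering::Less,
        (Some(_), None) => Ordering::Greater,
        (Some(x), Some(y)) => x.cmp(&y),
    };
    if reverse { ord.reverse() } else { ord }
}
```

The comment in the hunk explains the trade-off: comparing inside the walk issues a `stat` per comparison, which is redundant but keeps traversal-order sorting simple.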
@@ -490,6 +490,24 @@ enum EncodingMode {
     Disabled,
 }
+impl EncodingMode {
+    /// Checks if an explicit encoding has been set. Returns false for
+    /// automatic BOM sniffing and no sniffing.
+    ///
+    /// This is only used to determine whether PCRE2 needs to have its own
+    /// UTF-8 checking enabled. If we have an explicit encoding set, then
+    /// we're always guaranteed to get UTF-8, so we can disable PCRE2's check.
+    /// Otherwise, we have no such guarantee, and must enable PCRE2' UTF-8
+    /// check.
+    #[cfg(feature = "pcre2")]
+    fn has_explicit_encoding(&self) -> bool {
+        match self {
+            EncodingMode::Some(_) => true,
+            _ => false,
+        }
+    }
+}
 impl ArgMatches {
     /// Create an ArgMatches from clap's parse result.
     fn new(clap_matches: clap::ArgMatches<'static>) -> ArgMatches {
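The `has_explicit_encoding` predicate added above can be exercised standalone. In this sketch the `Some` variant carries a plain string instead of the real `encoding_rs` encoding reference, purely to keep the example self-contained:

```rust
/// Mirror of the EncodingMode enum in the hunk above; only `Some`
/// represents a user-chosen, explicit encoding.
enum EncodingMode {
    Some(String),
    Auto,
    Disabled,
}

impl EncodingMode {
    /// True only when the user picked an explicit encoding. In that case
    /// transcoding always yields valid UTF-8 before the regex engine sees
    /// the data, so PCRE2's own UTF check can be skipped.
    fn has_explicit_encoding(&self) -> bool {
        matches!(self, EncodingMode::Some(_))
    }
}
```

Automatic BOM sniffing (`Auto`) and disabled transcoding give no such guarantee, which is why the later hunk only calls `disable_utf_check` behind this predicate.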
@@ -539,36 +557,9 @@ impl ArgMatches {
         } else {
             false
         };
-        // Now figure out the number of threads we'll use and which
-        // command will run.
-        let is_one_search = self.is_one_search(&paths);
-        let threads = if is_one_search { 1 } else { self.threads()? };
-        if threads == 1 {
-            log::debug!("running in single threaded mode");
-        } else {
-            log::debug!("running with {threads} threads for parallelism");
-        }
-        let command = if self.is_present("pcre2-version") {
-            Command::PCRE2Version
-        } else if self.is_present("type-list") {
-            Command::Types
-        } else if self.is_present("files") {
-            if threads == 1 {
-                Command::Files
-            } else {
-                Command::FilesParallel
-            }
-        } else if self.can_never_match(&patterns) {
-            Command::SearchNever
-        } else if threads == 1 {
-            Command::Search
-        } else {
-            Command::SearchParallel
-        };
         Ok(Args(Arc::new(ArgsImp {
             matches: self,
-            command,
-            threads,
+            patterns,
             matcher,
             paths,
             using_default_path,
@@ -671,8 +662,6 @@ impl ArgMatches {
             .multi_line(true)
             .unicode(self.unicode())
             .octal(false)
-            .fixed_strings(self.is_present("fixed-strings"))
-            .whole_line(self.is_present("line-regexp"))
             .word(self.is_present("word-regexp"));
         if self.is_present("multiline") {
             builder.dot_matches_new_line(self.is_present("multiline-dotall"));
@@ -699,7 +688,12 @@ impl ArgMatches {
         if let Some(limit) = self.dfa_size_limit()? {
             builder.dfa_size_limit(limit);
         }
-        match builder.build_many(patterns) {
+        let res = if self.is_present("fixed-strings") {
+            builder.build_literals(patterns)
+        } else {
+            builder.build(&patterns.join("|"))
+        };
+        match res {
             Ok(m) => Ok(m),
             Err(err) => Err(From::from(suggest_multiline(err.to_string()))),
         }
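On the right-hand side, the non-literal fallback compiles all patterns into a single regex via alternation. The string transformation is a bare join, which relies on each pattern being a well-formed regex on its own (a sketch of just that step; `join_patterns` is an illustrative name, not ripgrep's):

```rust
/// Combine multiple patterns into one alternation, the way the older code
/// path feeds them to the regex builder. Note that a plain join does not
/// group its arms: "ab" and "cd" become "ab|cd".
fn join_patterns(patterns: &[String]) -> String {
    patterns.join("|")
}
```

Because the arms are not parenthesized, a pattern containing a low-precedence construct can in principle interact with its neighbors when joined; the `build_many` call on the left-hand side keeps the patterns separate instead.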
@@ -716,8 +710,6 @@ impl ArgMatches {
             .case_smart(self.case_smart())
             .caseless(self.case_insensitive())
             .multi_line(true)
-            .fixed_strings(self.is_present("fixed-strings"))
-            .whole_line(self.is_present("line-regexp"))
             .word(self.is_present("word-regexp"));
         // For whatever reason, the JIT craps out during regex compilation with
         // a "no more memory" error on 32 bit systems. So don't use it there.
@@ -731,6 +723,14 @@ impl ArgMatches {
         }
         if self.unicode() {
             builder.utf(true).ucp(true);
+            if self.encoding()?.has_explicit_encoding() {
+                // SAFETY: If an encoding was specified, then we're guaranteed
+                // to get valid UTF-8, so we can disable PCRE2's UTF checking.
+                // (Feeding invalid UTF-8 to PCRE2 is undefined behavior.)
+                unsafe {
+                    builder.disable_utf_check();
+                }
+            }
         }
         if self.is_present("multiline") {
             builder.dotall(self.is_present("multiline-dotall"));
@@ -738,7 +738,7 @@ impl ArgMatches {
         if self.is_present("crlf") {
             builder.crlf(true);
         }
-        Ok(builder.build_many(patterns)?)
+        Ok(builder.build(&patterns.join("|"))?)
     }

     /// Build a JSON printer that writes results to the given writer.
@@ -777,7 +777,6 @@ impl ArgMatches {
             .path(self.with_filename(paths))
             .only_matching(self.is_present("only-matching"))
             .per_match(self.is_present("vimgrep"))
-            .per_match_one_line(true)
             .replacement(self.replacement())
             .max_columns(self.max_columns()?)
             .max_columns_preview(self.max_columns_preview())
@@ -787,8 +786,8 @@ impl ArgMatches {
             .trim_ascii(self.is_present("trim"))
             .separator_search(None)
             .separator_context(self.context_separator())
-            .separator_field_match(self.field_match_separator())
-            .separator_field_context(self.field_context_separator())
+            .separator_field_match(b":".to_vec())
+            .separator_field_context(b"-".to_vec())
             .separator_path(self.path_separator()?)
             .path_terminator(self.path_terminator());
         if separator_search {
@@ -840,8 +839,7 @@ impl ArgMatches {
             .before_context(ctx_before)
             .after_context(ctx_after)
             .passthru(self.is_present("passthru"))
-            .memory_map(self.mmap_choice(paths))
-            .stop_on_nonmatch(self.is_present("stop-on-nonmatch"));
+            .memory_map(self.mmap_choice(paths));
         match self.encoding()? {
             EncodingMode::Some(enc) => {
                 builder.encoding(Some(enc));
@@ -859,11 +857,7 @@ impl ArgMatches {
     ///
     /// If there was a problem parsing the CLI arguments necessary for
     /// constructing the builder, then this returns an error.
-    fn walker_builder(
-        &self,
-        paths: &[PathBuf],
-        threads: usize,
-    ) -> Result<WalkBuilder> {
+    fn walker_builder(&self, paths: &[PathBuf]) -> Result<WalkBuilder> {
         let mut builder = WalkBuilder::new(&paths[0]);
         for path in &paths[1..] {
             builder.add(path);
@@ -879,7 +873,7 @@ impl ArgMatches {
             .max_depth(self.usize_of("max-depth")?)
             .follow_links(self.is_present("follow"))
             .max_filesize(self.max_file_size()?)
-            .threads(threads)
+            .threads(self.threads()?)
             .same_file_system(self.is_present("one-file-system"))
             .skip_stdout(!self.is_present("files"))
             .overrides(self.overrides()?)
@@ -892,10 +886,12 @@ impl ArgMatches {
             .git_exclude(!self.no_ignore_vcs() && !self.no_ignore_exclude())
             .require_git(!self.is_present("no-require-git"))
             .ignore_case_insensitive(self.ignore_file_case_insensitive());
-        if !self.no_ignore() && !self.no_ignore_dot() {
+        if !self.no_ignore() {
             builder.add_custom_ignore_filename(".rgignore");
         }
-        self.sort_by()?.configure_builder_sort(&mut builder);
+        let sortby = self.sort_by()?;
+        sortby.check()?;
+        sortby.configure_walk_builder(&mut builder);
         Ok(builder)
     }
 }
@@ -1010,10 +1006,10 @@ impl ArgMatches {
     /// If there was a problem parsing the values from the user as an integer,
     /// then an error is returned.
     fn contexts(&self) -> Result<(usize, usize)> {
+        let after = self.usize_of("after-context")?.unwrap_or(0);
+        let before = self.usize_of("before-context")?.unwrap_or(0);
         let both = self.usize_of("context")?.unwrap_or(0);
-        let after = self.usize_of("after-context")?.unwrap_or(both);
-        let before = self.usize_of("before-context")?.unwrap_or(both);
-        Ok((before, after))
+        Ok(if both > 0 { (both, both) } else { (before, after) })
     }

     /// Returns the unescaped context separator in UTF-8 bytes.
@@ -1070,6 +1066,7 @@ impl ArgMatches {
         }
         let label = match self.value_of_lossy("encoding") {
+            None if self.pcre2_unicode() => "utf-8".to_string(),
             None => return Ok(EncodingMode::Auto),
             Some(label) => label,
         };
@@ -1380,27 +1377,14 @@ impl ArgMatches {
         }
     }

-    /// Returns the unescaped field context separator. If one wasn't specified,
-    /// then '-' is used as the default.
-    fn field_context_separator(&self) -> Vec<u8> {
-        match self.value_of_os("field-context-separator") {
-            None => b"-".to_vec(),
-            Some(sep) => cli::unescape_os(&sep),
-        }
-    }
-
-    /// Returns the unescaped field match separator. If one wasn't specified,
-    /// then ':' is used as the default.
-    fn field_match_separator(&self) -> Vec<u8> {
-        match self.value_of_os("field-match-separator") {
-            None => b":".to_vec(),
-            Some(sep) => cli::unescape_os(&sep),
-        }
-    }
-
     /// Get a sequence of all available patterns from the command line.
     /// This includes reading the -e/--regexp and -f/--file flags.
     ///
+    /// Note that if -F/--fixed-strings is set, then all patterns will be
+    /// escaped. If -x/--line-regexp is set, then all patterns are surrounded
+    /// by `^...$`. Other things, such as --word-regexp, are handled by the
+    /// regex matcher itself.
+    ///
     /// If any pattern is invalid UTF-8, then an error is returned.
     fn patterns(&self) -> Result<Vec<String>> {
         if self.is_present("files") || self.is_present("type-list") {
@@ -1441,6 +1425,16 @@ impl ArgMatches {
         Ok(pats)
     }

+    /// Returns a pattern that is guaranteed to produce an empty regular
+    /// expression that is valid in any position.
+    fn pattern_empty(&self) -> String {
+        // This would normally just be an empty string, which works on its
+        // own, but if the patterns are joined in a set of alternations, then
+        // you wind up with `foo|`, which is currently invalid in Rust's regex
+        // engine.
+        "(?:z{0})*".to_string()
+    }
+
     /// Converts an OsStr pattern to a String pattern. The pattern is escaped
     /// if -F/--fixed-strings is set.
     ///
@@ -1459,12 +1453,30 @@ impl ArgMatches {
     /// Applies additional processing on the given pattern if necessary
     /// (such as escaping meta characters or turning it into a line regex).
     fn pattern_from_string(&self, pat: String) -> String {
+        let pat = self.pattern_line(self.pattern_literal(pat));
         if pat.is_empty() {
-            // This would normally just be an empty string, which works on its
-            // own, but if the patterns are joined in a set of alternations,
-            // then you wind up with `foo|`, which is currently invalid in
-            // Rust's regex engine.
-            "(?:)".to_string()
+            self.pattern_empty()
+        } else {
+            pat
+        }
+    }
+
+    /// Returns the given pattern as a line pattern if the -x/--line-regexp
+    /// flag is set. Otherwise, the pattern is returned unchanged.
+    fn pattern_line(&self, pat: String) -> String {
+        if self.is_present("line-regexp") {
+            format!(r"^(?:{})$", pat)
+        } else {
+            pat
+        }
+    }
+
+    /// Returns the given pattern as a literal pattern if the
+    /// -F/--fixed-strings flag is set. Otherwise, the pattern is returned
+    /// unchanged.
+    fn pattern_literal(&self, pat: String) -> String {
+        if self.is_present("fixed-strings") {
+            regex::escape(&pat)
         } else {
             pat
         }
@@ -1561,9 +1573,7 @@ impl ArgMatches {
             return Ok(1);
         }
         let threads = self.usize_of("threads")?.unwrap_or(0);
-        let available =
-            std::thread::available_parallelism().map_or(1, |n| n.get());
-        Ok(if threads == 0 { cmp::min(12, available) } else { threads })
+        Ok(if threads == 0 { cmp::min(12, num_cpus::get()) } else { threads })
     }

     /// Builds a file type matcher from the command line flags.
@@ -1597,6 +1607,12 @@ impl ArgMatches {
         !(self.is_present("no-unicode") || self.is_present("no-pcre2-unicode"))
     }

+    /// Returns true if and only if PCRE2 is enabled and its Unicode mode is
+    /// enabled.
+    fn pcre2_unicode(&self) -> bool {
+        self.is_present("pcre2") && self.unicode()
+    }
+
     /// Returns true if and only if file names containing each match should
     /// be emitted.
     fn with_filename(&self, paths: &[PathBuf]) -> bool {
@@ -1685,7 +1701,7 @@ impl ArgMatches {
         self.0.value_of_os(name)
     }

-    fn values_of_os(&self, name: &str) -> Option<clap::OsValues<'_>> {
+    fn values_of_os(&self, name: &str) -> Option<clap::OsValues> {
         self.0.values_of_os(name)
     }
 }
@@ -1757,18 +1773,32 @@ fn u64_to_usize(arg_name: &str, value: Option<u64>) -> Result<Option<usize>> {
     }
 }

-/// Sorts by an optional parameter.
-//
-/// If parameter is found to be `None`, both entries compare equal.
-fn sort_by_option<T: Ord>(
-    p1: &Option<T>,
-    p2: &Option<T>,
-    reverse: bool,
-) -> cmp::Ordering {
-    match (p1, p2, reverse) {
-        (Some(p1), Some(p2), true) => p1.cmp(&p2).reverse(),
-        (Some(p1), Some(p2), false) => p1.cmp(&p2),
-        _ => cmp::Ordering::Equal,
+/// Builds a comparator for sorting two files according to a system time
+/// extracted from the file's metadata.
+///
+/// If there was a problem extracting the metadata or if the time is not
+/// available, then both entries compare equal.
+fn sort_by_metadata_time<G>(
+    p1: &Path,
+    p2: &Path,
+    reverse: bool,
+    get_time: G,
+) -> cmp::Ordering
+where
+    G: Fn(&fs::Metadata) -> io::Result<SystemTime>,
+{
+    let t1 = match p1.metadata().and_then(|md| get_time(&md)) {
+        Ok(t) => t,
+        Err(_) => return cmp::Ordering::Equal,
+    };
+    let t2 = match p2.metadata().and_then(|md| get_time(&md)) {
+        Ok(t) => t,
+        Err(_) => return cmp::Ordering::Equal,
+    };
+    if reverse {
+        t1.cmp(&t2).reverse()
+    } else {
+        t1.cmp(&t2)
     }
 }
@@ -1822,17 +1852,3 @@ fn current_dir() -> Result<PathBuf> {
         )
         .into())
 }
-
-/// Tries to assign a timestamp to every `Subject` in the vector to help with
-/// sorting Subjects by time.
-fn load_timestamps<G>(
-    subjects: impl Iterator<Item = Subject>,
-    get_time: G,
-) -> Vec<(Option<SystemTime>, Subject)>
-where
-    G: Fn(&fs::Metadata) -> io::Result<SystemTime>,
-{
-    subjects
-        .map(|s| (s.path().metadata().and_then(|m| get_time(&m)).ok(), s))
-        .collect()
-}
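The `contexts` hunk above changes how `-A`/`-B`/`-C` interact: on one side of the diff a nonzero `-C` overrides any explicit `-A`/`-B`, while on the other side `-C` only supplies a default that explicit flags override. A minimal standalone sketch of the two behaviors (hypothetical function names, not ripgrep's API):

```rust
/// One side of the diff: a nonzero -C/--context wins outright.
fn contexts_release(
    after: Option<usize>,
    before: Option<usize>,
    both: Option<usize>,
) -> (usize, usize) {
    let (after, before) = (after.unwrap_or(0), before.unwrap_or(0));
    let both = both.unwrap_or(0);
    if both > 0 {
        (both, both)
    } else {
        (before, after)
    }
}

/// The other side: -C only provides a default that explicit -A/-B override.
fn contexts_master(
    after: Option<usize>,
    before: Option<usize>,
    both: Option<usize>,
) -> (usize, usize) {
    let both = both.unwrap_or(0);
    (before.unwrap_or(both), after.unwrap_or(both))
}

fn main() {
    // With the equivalent of `-C 5 -A 2`: one behavior ignores the explicit
    // -A, the other honors it.
    println!("{:?}", contexts_release(Some(2), None, Some(5)));
    println!("{:?}", contexts_master(Some(2), None, Some(5)));
}
```

The observable difference only appears when `-C` is combined with an explicit `-A` or `-B`.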


@@ -28,10 +28,7 @@ pub fn args() -> Vec<OsString> {
     let (args, errs) = match parse(&config_path) {
        Ok((args, errs)) => (args, errs),
        Err(err) => {
-            message!(
-                "failed to read the file specified in RIPGREP_CONFIG_PATH: {}",
-                err
-            );
+            message!("{}", err);
            return vec![];
        }
    };
@@ -80,7 +77,7 @@ fn parse<P: AsRef<Path>>(
 fn parse_reader<R: io::Read>(
     rdr: R,
 ) -> Result<(Vec<OsString>, Vec<Box<dyn Error>>)> {
-    let mut bufrdr = io::BufReader::new(rdr);
+    let bufrdr = io::BufReader::new(rdr);
     let (mut args, mut errs) = (vec![], vec![]);
     let mut line_number = 0;
     bufrdr.for_byte_line_with_terminator(|line| {


@@ -24,16 +24,16 @@ impl Logger {
 }

 impl Log for Logger {
-    fn enabled(&self, _: &log::Metadata<'_>) -> bool {
+    fn enabled(&self, _: &log::Metadata) -> bool {
         // We set the log level via log::set_max_level, so we don't need to
         // implement filtering here.
         true
     }

-    fn log(&self, record: &log::Record<'_>) {
+    fn log(&self, record: &log::Record) {
         match (record.file(), record.line()) {
             (Some(file), Some(line)) => {
-                eprintln_locked!(
+                eprintln!(
                     "{}|{}|{}:{}: {}",
                     record.level(),
                     record.target(),
@@ -43,7 +43,7 @@ impl Log for Logger {
                 );
             }
             (Some(file), None) => {
-                eprintln_locked!(
+                eprintln!(
                     "{}|{}|{}: {}",
                     record.level(),
                     record.target(),
@@ -52,7 +52,7 @@ impl Log for Logger {
                 );
             }
             _ => {
-                eprintln_locked!(
+                eprintln!(
                     "{}|{}: {}",
                     record.level(),
                     record.target(),
@@ -63,6 +63,6 @@ impl Log for Logger {
     }

     fn flush(&self) {
-        // We use eprintln_locked! which is flushed on every call.
+        // We use eprintln! which is flushed on every call.
     }
 }


@@ -47,7 +47,7 @@ type Result<T> = ::std::result::Result<T, Box<dyn error::Error>>;

 fn main() {
     if let Err(err) = Args::parse().and_then(try_main) {
-        eprintln_locked!("{}", err);
+        eprintln!("{}", err);
         process::exit(2);
     }
 }
@@ -55,7 +55,7 @@ fn main() {
 fn try_main(args: Args) -> Result<()> {
     use args::Command::*;

-    let matched = match args.command() {
+    let matched = match args.command()? {
         Search => search(&args),
         SearchParallel => search_parallel(&args),
         SearchNever => Ok(false),
@@ -77,70 +77,48 @@ fn try_main(args: Args) -> Result<()> {
 /// steps through the file list (current directory by default) and searches
 /// each file sequentially.
 fn search(args: &Args) -> Result<bool> {
-    /// The meat of the routine is here. This lets us call the same iteration
-    /// code over each file regardless of whether we stream over the files
-    /// as they're produced by the underlying directory traversal or whether
-    /// they've been collected and sorted (for example) first.
-    fn iter(
-        args: &Args,
-        subjects: impl Iterator<Item = Subject>,
-        started_at: std::time::Instant,
-    ) -> Result<bool> {
-        let quit_after_match = args.quit_after_match()?;
-        let mut stats = args.stats()?;
-        let mut searcher = args.search_worker(args.stdout())?;
-        let mut matched = false;
-        let mut searched = false;
-        for subject in subjects {
-            searched = true;
-            let search_result = match searcher.search(&subject) {
-                Ok(search_result) => search_result,
-                // A broken pipe means graceful termination.
-                Err(err) if err.kind() == io::ErrorKind::BrokenPipe => break,
-                Err(err) => {
-                    err_message!("{}: {}", subject.path().display(), err);
-                    continue;
-                }
-            };
-            matched |= search_result.has_match();
-            if let Some(ref mut stats) = stats {
-                *stats += search_result.stats().unwrap();
-            }
-            if matched && quit_after_match {
-                break;
-            }
-        }
-        if args.using_default_path() && !searched {
-            eprint_nothing_searched();
-        }
-        if let Some(ref stats) = stats {
-            let elapsed = Instant::now().duration_since(started_at);
-            // We don't care if we couldn't print this successfully.
-            let _ = searcher.print_stats(elapsed, stats);
-        }
-        Ok(matched)
-    }
-
     let started_at = Instant::now();
+    let quit_after_match = args.quit_after_match()?;
     let subject_builder = args.subject_builder();
-    let subjects = args
-        .walker()?
-        .filter_map(|result| subject_builder.build_from_result(result));
-    if args.needs_stat_sort() {
-        let subjects = args.sort_by_stat(subjects).into_iter();
-        iter(args, subjects, started_at)
-    } else {
-        iter(args, subjects, started_at)
+    let mut stats = args.stats()?;
+    let mut searcher = args.search_worker(args.stdout())?;
+    let mut matched = false;
+
+    for result in args.walker()? {
+        let subject = match subject_builder.build_from_result(result) {
+            Some(subject) => subject,
+            None => continue,
+        };
+        let search_result = match searcher.search(&subject) {
+            Ok(search_result) => search_result,
+            Err(err) => {
+                // A broken pipe means graceful termination.
+                if err.kind() == io::ErrorKind::BrokenPipe {
+                    break;
+                }
+                err_message!("{}: {}", subject.path().display(), err);
+                continue;
+            }
+        };
+        matched = matched || search_result.has_match();
+        if let Some(ref mut stats) = stats {
+            *stats += search_result.stats().unwrap();
+        }
+        if matched && quit_after_match {
+            break;
+        }
     }
+    if let Some(ref stats) = stats {
+        let elapsed = Instant::now().duration_since(started_at);
+        // We don't care if we couldn't print this successfully.
+        let _ = searcher.print_stats(elapsed, stats);
+    }
+    Ok(matched)
 }
 /// The top-level entry point for multi-threaded search. The parallelism is
 /// itself achieved by the recursive directory traversal. All we need to do is
 /// feed it a worker for performing a search on each file.
-///
-/// Requesting a sorted output from ripgrep (such as with `--sort path`) will
-/// automatically disable parallelism and hence sorting is not handled here.
 fn search_parallel(args: &Args) -> Result<bool> {
     use std::sync::atomic::AtomicBool;
     use std::sync::atomic::Ordering::SeqCst;
@@ -151,13 +129,11 @@ fn search_parallel(args: &Args) -> Result<bool> {
     let bufwtr = args.buffer_writer()?;
     let stats = args.stats()?.map(Mutex::new);
     let matched = AtomicBool::new(false);
-    let searched = AtomicBool::new(false);
     let mut searcher_err = None;
     args.walker_parallel()?.run(|| {
         let bufwtr = &bufwtr;
         let stats = &stats;
         let matched = &matched;
-        let searched = &searched;
         let subject_builder = &subject_builder;
         let mut searcher = match args.search_worker(bufwtr.buffer()) {
             Ok(searcher) => searcher,
@@ -172,7 +148,6 @@ fn search_parallel(args: &Args) -> Result<bool> {
             Some(subject) => subject,
             None => return WalkState::Continue,
         };
-        searched.store(true, SeqCst);
         searcher.printer().get_mut().clear();
         let search_result = match searcher.search(&subject) {
             Ok(search_result) => search_result,
@@ -206,9 +181,6 @@ fn search_parallel(args: &Args) -> Result<bool> {
     if let Some(err) = searcher_err.take() {
         return Err(err);
     }
-    if args.using_default_path() && !searched.load(SeqCst) {
-        eprint_nothing_searched();
-    }
     if let Some(ref locked_stats) = stats {
         let elapsed = Instant::now().duration_since(started_at);
         let stats = locked_stats.lock().unwrap();
@@ -219,66 +191,39 @@ fn search_parallel(args: &Args) -> Result<bool> {
     Ok(matched.load(SeqCst))
 }

-fn eprint_nothing_searched() {
-    err_message!(
-        "No files were searched, which means ripgrep probably \
-         applied a filter you didn't expect.\n\
-         Running with --debug will show why files are being skipped."
-    );
-}
-
 /// The top-level entry point for listing files without searching them. This
 /// recursively steps through the file list (current directory by default) and
 /// prints each path sequentially using a single thread.
 fn files(args: &Args) -> Result<bool> {
-    /// The meat of the routine is here. This lets us call the same iteration
-    /// code over each file regardless of whether we stream over the files
-    /// as they're produced by the underlying directory traversal or whether
-    /// they've been collected and sorted (for example) first.
-    fn iter(
-        args: &Args,
-        subjects: impl Iterator<Item = Subject>,
-    ) -> Result<bool> {
-        let quit_after_match = args.quit_after_match()?;
-        let mut matched = false;
-        let mut path_printer = args.path_printer(args.stdout())?;
-        for subject in subjects {
-            matched = true;
-            if quit_after_match {
-                break;
-            }
-            if let Err(err) = path_printer.write_path(subject.path()) {
-                // A broken pipe means graceful termination.
-                if err.kind() == io::ErrorKind::BrokenPipe {
-                    break;
-                }
-                // Otherwise, we have some other error that's preventing us from
-                // writing to stdout, so we should bubble it up.
-                return Err(err.into());
-            }
-        }
-        Ok(matched)
-    }
-
-    let subject_builder = args.subject_builder();
-    let subjects = args
-        .walker()?
-        .filter_map(|result| subject_builder.build_from_result(result));
-    if args.needs_stat_sort() {
-        let subjects = args.sort_by_stat(subjects).into_iter();
-        iter(args, subjects)
-    } else {
-        iter(args, subjects)
+    let quit_after_match = args.quit_after_match()?;
+    let subject_builder = args.subject_builder();
+    let mut matched = false;
+    let mut path_printer = args.path_printer(args.stdout())?;
+    for result in args.walker()? {
+        let subject = match subject_builder.build_from_result(result) {
+            Some(subject) => subject,
+            None => continue,
+        };
+        matched = true;
+        if quit_after_match {
+            break;
+        }
+        if let Err(err) = path_printer.write_path(subject.path()) {
+            // A broken pipe means graceful termination.
+            if err.kind() == io::ErrorKind::BrokenPipe {
+                break;
+            }
+            // Otherwise, we have some other error that's preventing us from
+            // writing to stdout, so we should bubble it up.
+            return Err(err.into());
+        }
     }
+    Ok(matched)
 }

 /// The top-level entry point for listing files without searching them. This
 /// recursively steps through the file list (current directory by default) and
 /// prints each path sequentially using multiple threads.
-///
-/// Requesting a sorted output from ripgrep (such as with `--sort path`) will
-/// automatically disable parallelism and hence sorting is not handled here.
 fn files_parallel(args: &Args) -> Result<bool> {
     use std::sync::atomic::AtomicBool;
     use std::sync::atomic::Ordering::SeqCst;


@@ -4,28 +4,12 @@ static MESSAGES: AtomicBool = AtomicBool::new(false);
 static IGNORE_MESSAGES: AtomicBool = AtomicBool::new(false);
 static ERRORED: AtomicBool = AtomicBool::new(false);

-/// Like eprintln, but locks STDOUT to prevent interleaving lines.
-#[macro_export]
-macro_rules! eprintln_locked {
-    ($($tt:tt)*) => {{
-        {
-            // This is a bit of an abstraction violation because we explicitly
-            // lock STDOUT before printing to STDERR. This avoids interleaving
-            // lines within ripgrep because `search_parallel` uses `termcolor`,
-            // which accesses the same STDOUT lock when writing lines.
-            let stdout = std::io::stdout();
-            let _handle = stdout.lock();
-            eprintln!($($tt)*);
-        }
-    }}
-}
-
 /// Emit a non-fatal error message, unless messages were disabled.
 #[macro_export]
 macro_rules! message {
     ($($tt:tt)*) => {
         if crate::messages::messages() {
-            eprintln_locked!($($tt)*);
+            eprintln!($($tt)*);
         }
     }
 }

@@ -46,7 +30,7 @@ macro_rules! err_message {
 macro_rules! ignore_message {
     ($($tt:tt)*) => {
         if crate::messages::messages() && crate::messages::ignore_messages() {
-            eprintln_locked!($($tt)*);
+            eprintln!($($tt)*);
         }
     }
 }


@@ -330,12 +330,11 @@ impl<W: WriteColor> SearchWorker<W> {
         } else {
             self.config.binary_implicit.clone()
         };
-        let path = subject.path();
-        log::trace!("{}: binary detection: {:?}", path.display(), bin);
         self.searcher.set_binary_detection(bin);
+        let path = subject.path();
         if subject.is_stdin() {
-            self.search_reader(path, &mut io::stdin().lock())
+            self.search_reader(path, io::stdin().lock())
         } else if self.should_preprocess(path) {
             self.search_preprocessor(path)
         } else if self.should_decompress(path) {
@@ -399,7 +398,7 @@ impl<W: WriteColor> SearchWorker<W> {
         let mut cmd = Command::new(bin);
         cmd.arg(path).stdin(Stdio::from(File::open(path)?));

-        let mut rdr = self.command_builder.build(&mut cmd).map_err(|err| {
+        let rdr = self.command_builder.build(&mut cmd).map_err(|err| {
             io::Error::new(
                 io::ErrorKind::Other,
                 format!(
@@ -408,28 +407,20 @@ impl<W: WriteColor> SearchWorker<W> {
                 ),
             )
         })?;
-        let result = self.search_reader(path, &mut rdr).map_err(|err| {
+        self.search_reader(path, rdr).map_err(|err| {
             io::Error::new(
                 io::ErrorKind::Other,
                 format!("preprocessor command failed: '{:?}': {}", cmd, err),
             )
-        });
-        let close_result = rdr.close();
-        let search_result = result?;
-        close_result?;
-        Ok(search_result)
+        })
     }

     /// Attempt to decompress the data at the given file path and search the
     /// result. If the given file path isn't recognized as a compressed file,
     /// then search it without doing any decompression.
     fn search_decompress(&mut self, path: &Path) -> io::Result<SearchResult> {
-        let mut rdr = self.decomp_builder.build(path)?;
-        let result = self.search_reader(path, &mut rdr);
-        let close_result = rdr.close();
-        let search_result = result?;
-        close_result?;
-        Ok(search_result)
+        let rdr = self.decomp_builder.build(path)?;
+        self.search_reader(path, rdr)
     }
     /// Search the contents of the given file path.
@@ -456,7 +447,7 @@ impl<W: WriteColor> SearchWorker<W> {
     fn search_reader<R: io::Read>(
         &mut self,
         path: &Path,
-        rdr: &mut R,
+        rdr: R,
     ) -> io::Result<SearchResult> {
         use self::PatternMatcher::*;
@@ -512,12 +503,12 @@ fn search_reader<M: Matcher, R: io::Read, W: WriteColor>(
     searcher: &mut Searcher,
     printer: &mut Printer<W>,
     path: &Path,
-    mut rdr: R,
+    rdr: R,
 ) -> io::Result<SearchResult> {
     match *printer {
         Printer::Standard(ref mut p) => {
             let mut sink = p.sink_with_path(&matcher, path);
-            searcher.search_reader(&matcher, &mut rdr, &mut sink)?;
+            searcher.search_reader(&matcher, rdr, &mut sink)?;
             Ok(SearchResult {
                 has_match: sink.has_match(),
                 stats: sink.stats().map(|s| s.clone()),
@@ -525,7 +516,7 @@ fn search_reader<M: Matcher, R: io::Read, W: WriteColor>(
         }
         Printer::Summary(ref mut p) => {
             let mut sink = p.sink_with_path(&matcher, path);
-            searcher.search_reader(&matcher, &mut rdr, &mut sink)?;
+            searcher.search_reader(&matcher, rdr, &mut sink)?;
             Ok(SearchResult {
                 has_match: sink.has_match(),
                 stats: sink.stats().map(|s| s.clone()),
@@ -533,7 +524,7 @@ fn search_reader<M: Matcher, R: io::Read, W: WriteColor>(
         }
         Printer::JSON(ref mut p) => {
             let mut sink = p.sink_with_path(&matcher, path);
-            searcher.search_reader(&matcher, &mut rdr, &mut sink)?;
+            searcher.search_reader(&matcher, rdr, &mut sink)?;
             Ok(SearchResult {
                 has_match: sink.has_match(),
                 stats: Some(sink.stats().clone()),
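The hunks above touch ripgrep's preprocessor (`--pre`) and decompression (`-z`) code paths, the only two places it spawns external programs and the subject of the CVE-2021-3013 fix described in the commit message: never hand a bare relative program name to the OS on Windows, because `CreateProcess` would also search the current working directory. A minimal, hypothetical sketch of the PATH-resolution idea (not ripgrep's actual implementation; real code would also consult Windows `PATHEXT` extensions such as `.exe`):

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Resolve a program name against PATH so that only an absolute (or
/// explicitly relative) path is ever handed to the OS when spawning.
fn resolve_program(name: &str) -> Option<PathBuf> {
    let name = Path::new(name);
    // Paths with more than one component (e.g. "/usr/bin/xz" or "./xz")
    // are taken as-is; only bare names like "xz" trigger a PATH search.
    if name.components().count() > 1 {
        return Some(name.to_path_buf());
    }
    let paths = env::var_os("PATH")?;
    for dir in env::split_paths(&paths) {
        let candidate = dir.join(name);
        if candidate.is_file() {
            return Some(candidate);
        }
    }
    None
}

fn main() {
    match resolve_program("sh") {
        Some(p) => println!("resolved: {}", p.display()),
        None => println!("not found"),
    }
}
```

The resolved absolute path would then be passed to `std::process::Command::new`, sidestepping the implicit working-directory lookup.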


@@ -67,7 +67,7 @@ impl SubjectBuilder {
         if subj.is_file() {
             return Some(subj);
         }
-        // We got nothing. Emit a debug message, but only if this isn't a
+        // We got nothin. Emit a debug message, but only if this isn't a
         // directory. Otherwise, emitting messages for directories is just
         // noisy.
         if !subj.is_dir() {


@@ -1,6 +1,6 @@
 [package]
 name = "globset"
-version = "0.4.11" #:version
+version = "0.4.6" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 Cross platform single glob and glob set matching. Glob set matching is the
@@ -12,19 +12,18 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/globset"
 repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/globset"
 readme = "README.md"
 keywords = ["regex", "glob", "multiple", "set", "pattern"]
-license = "Unlicense OR MIT"
-edition = "2018"
+license = "Unlicense/MIT"

 [lib]
 name = "globset"
 bench = false

 [dependencies]
-aho-corasick = "1.0.2"
-bstr = { version = "1.6.0", default-features = false, features = ["std"] }
+aho-corasick = "0.7.3"
+bstr = { version = "0.2.0", default-features = false, features = ["std"] }
 fnv = "1.0.6"
-log = { version = "0.4.5", optional = true }
-regex = { version = "1.8.3", default-features = false, features = ["perf", "std"] }
+log = "0.4.5"
+regex = { version = "1.1.5", default-features = false, features = ["perf", "std"] }
 serde = { version = "1.0.104", optional = true }

 [dev-dependencies]
@@ -33,6 +32,5 @@ lazy_static = "1"
 serde_json = "1.0.45"

 [features]
-default = ["log"]
 simd-accel = []
 serde1 = ["serde"]


@@ -19,7 +19,13 @@ Add this to your `Cargo.toml`:
```toml ```toml
[dependencies] [dependencies]
globset = "0.4" globset = "0.3"
```
and this to your crate root:
```rust
extern crate globset;
``` ```
### Features ### Features
@@ -78,12 +84,12 @@ assert_eq!(set.matches("src/bar/baz/foo.rs"), vec![0, 2]);
This crate implements globs by converting them to regular expressions, and This crate implements globs by converting them to regular expressions, and
executing them with the executing them with the
[`regex`](https://github.com/rust-lang/regex) [`regex`](https://github.com/rust-lang-nursery/regex)
crate. crate.
For single glob matching, performance of this crate should be roughly on par For single glob matching, performance of this crate should be roughly on par
with the performance of the with the performance of the
[`glob`](https://github.com/rust-lang/glob) [`glob`](https://github.com/rust-lang-nursery/glob)
crate. (`*_regex` correspond to benchmarks for this library while `*_glob` crate. (`*_regex` correspond to benchmarks for this library while `*_glob`
correspond to benchmarks for the `glob` library.) correspond to benchmarks for the `glob` library.)
Optimizations in the `regex` crate may propel this library past `glob`, Optimizations in the `regex` crate may propel this library past `glob`,
@@ -108,7 +114,7 @@ test many_short_glob ... bench: 1,063 ns/iter (+/- 47)
test many_short_regex_set ... bench: 186 ns/iter (+/- 11) test many_short_regex_set ... bench: 186 ns/iter (+/- 11)
``` ```
### Comparison with the [`glob`](https://github.com/rust-lang/glob) crate ### Comparison with the [`glob`](https://github.com/rust-lang-nursery/glob) crate
* Supports alternate "or" globs, e.g., `*.{foo,bar}`. * Supports alternate "or" globs, e.g., `*.{foo,bar}`.
* Can match non-UTF-8 file paths correctly. * Can match non-UTF-8 file paths correctly.


@@ -4,6 +4,9 @@ tool itself, see the benchsuite directory.
*/ */
#![feature(test)] #![feature(test)]
extern crate glob;
extern crate globset;
extern crate regex;
extern crate test; extern crate test;
use globset::{Candidate, Glob, GlobMatcher, GlobSet, GlobSetBuilder}; use globset::{Candidate, Glob, GlobMatcher, GlobSet, GlobSetBuilder};


@@ -8,7 +8,7 @@ use std::str;
use regex; use regex;
use regex::bytes::Regex; use regex::bytes::Regex;
use crate::{new_regex, Candidate, Error, ErrorKind}; use {new_regex, Candidate, Error, ErrorKind};
/// Describes a matching strategy for a particular pattern. /// Describes a matching strategy for a particular pattern.
/// ///
@@ -98,7 +98,7 @@ impl hash::Hash for Glob {
} }
impl fmt::Display for Glob { impl fmt::Display for Glob {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
self.glob.fmt(f) self.glob.fmt(f)
} }
} }
@@ -127,7 +127,7 @@ impl GlobMatcher {
} }
/// Tests whether the given path matches this pattern or not. /// Tests whether the given path matches this pattern or not.
pub fn is_match_candidate(&self, path: &Candidate<'_>) -> bool { pub fn is_match_candidate(&self, path: &Candidate) -> bool {
self.re.is_match(&path.path) self.re.is_match(&path.path)
} }
@@ -143,6 +143,8 @@ impl GlobMatcher {
struct GlobStrategic { struct GlobStrategic {
/// The match strategy to use. /// The match strategy to use.
strategy: MatchStrategy, strategy: MatchStrategy,
/// The underlying pattern.
pat: Glob,
/// The pattern, as a compiled regex. /// The pattern, as a compiled regex.
re: Regex, re: Regex,
} }
@@ -155,7 +157,7 @@ impl GlobStrategic {
} }
/// Tests whether the given path matches this pattern or not. /// Tests whether the given path matches this pattern or not.
fn is_match_candidate(&self, candidate: &Candidate<'_>) -> bool { fn is_match_candidate(&self, candidate: &Candidate) -> bool {
let byte_path = &*candidate.path; let byte_path = &*candidate.path;
match self.strategy { match self.strategy {
@@ -208,9 +210,6 @@ struct GlobOptions {
/// Whether or not to use `\` to escape special characters. /// Whether or not to use `\` to escape special characters.
/// e.g., when enabled, `\*` will match a literal `*`. /// e.g., when enabled, `\*` will match a literal `*`.
backslash_escape: bool, backslash_escape: bool,
/// Whether or not an empty case in an alternate will be removed.
/// e.g., when enabled, `{,a}` will match "" and "a".
empty_alternates: bool,
} }
impl GlobOptions { impl GlobOptions {
@@ -219,7 +218,6 @@ impl GlobOptions {
case_insensitive: false, case_insensitive: false,
literal_separator: false, literal_separator: false,
backslash_escape: !is_separator('\\'), backslash_escape: !is_separator('\\'),
empty_alternates: false,
} }
} }
} }
@@ -275,7 +273,7 @@ impl Glob {
let strategy = MatchStrategy::new(self); let strategy = MatchStrategy::new(self);
let re = let re =
new_regex(&self.re).expect("regex compilation shouldn't fail"); new_regex(&self.re).expect("regex compilation shouldn't fail");
GlobStrategic { strategy: strategy, re: re } GlobStrategic { strategy: strategy, pat: self.clone(), re: re }
} }
/// Returns the original glob pattern used to build this pattern. /// Returns the original glob pattern used to build this pattern.
@@ -405,7 +403,7 @@ impl Glob {
if self.opts.case_insensitive { if self.opts.case_insensitive {
return None; return None;
} }
let (end, need_sep) = match self.tokens.last() { let end = match self.tokens.last() {
Some(&Token::ZeroOrMore) => { Some(&Token::ZeroOrMore) => {
if self.opts.literal_separator { if self.opts.literal_separator {
// If a trailing `*` can't match a `/`, then we can't // If a trailing `*` can't match a `/`, then we can't
@@ -416,10 +414,9 @@ impl Glob {
// literal prefix. // literal prefix.
return None; return None;
} }
(self.tokens.len() - 1, false) self.tokens.len() - 1
} }
Some(&Token::RecursiveSuffix) => (self.tokens.len() - 1, true), _ => self.tokens.len(),
_ => (self.tokens.len(), false),
}; };
let mut lit = String::new(); let mut lit = String::new();
for t in &self.tokens[0..end] { for t in &self.tokens[0..end] {
@@ -428,9 +425,6 @@ impl Glob {
_ => return None, _ => return None,
} }
} }
if need_sep {
lit.push('/');
}
if lit.is_empty() { if lit.is_empty() {
None None
} else { } else {
@@ -618,8 +612,6 @@ impl<'a> GlobBuilder<'a> {
} }
/// Toggle whether a literal `/` is required to match a path separator. /// Toggle whether a literal `/` is required to match a path separator.
///
/// By default this is false: `*` and `?` will match `/`.
pub fn literal_separator(&mut self, yes: bool) -> &mut GlobBuilder<'a> { pub fn literal_separator(&mut self, yes: bool) -> &mut GlobBuilder<'a> {
self.opts.literal_separator = yes; self.opts.literal_separator = yes;
self self
@@ -637,16 +629,6 @@ impl<'a> GlobBuilder<'a> {
self.opts.backslash_escape = yes; self.opts.backslash_escape = yes;
self self
} }
/// Toggle whether an empty pattern in a list of alternates is accepted.
///
/// For example, if this is set then the glob `foo{,.txt}` will match both `foo` and `foo.txt`.
///
/// By default this is false.
pub fn empty_alternates(&mut self, yes: bool) -> &mut GlobBuilder<'a> {
self.opts.empty_alternates = yes;
self
}
} }
impl Tokens { impl Tokens {
@@ -701,7 +683,7 @@ impl Tokens {
re.push_str("(?:/?|.*/)"); re.push_str("(?:/?|.*/)");
} }
Token::RecursiveSuffix => { Token::RecursiveSuffix => {
re.push_str("/.*"); re.push_str("(?:/?|/.*)");
} }
Token::RecursiveZeroOrMore => { Token::RecursiveZeroOrMore => {
re.push_str("(?:/|/.*/)"); re.push_str("(?:/|/.*/)");
@@ -728,7 +710,7 @@ impl Tokens {
for pat in patterns { for pat in patterns {
let mut altre = String::new(); let mut altre = String::new();
self.tokens_to_regex(options, &pat, &mut altre); self.tokens_to_regex(options, &pat, &mut altre);
if !altre.is_empty() || options.empty_alternates { if !altre.is_empty() {
parts.push(altre); parts.push(altre);
} }
} }
@@ -1027,14 +1009,13 @@ fn ends_with(needle: &[u8], haystack: &[u8]) -> bool {
mod tests { mod tests {
use super::Token::*; use super::Token::*;
use super::{Glob, GlobBuilder, Token}; use super::{Glob, GlobBuilder, Token};
use crate::{ErrorKind, GlobSetBuilder}; use {ErrorKind, GlobSetBuilder};
#[derive(Clone, Copy, Debug, Default)] #[derive(Clone, Copy, Debug, Default)]
struct Options { struct Options {
casei: Option<bool>, casei: Option<bool>,
litsep: Option<bool>, litsep: Option<bool>,
bsesc: Option<bool>, bsesc: Option<bool>,
ealtre: Option<bool>,
} }
macro_rules! syntax { macro_rules! syntax {
@@ -1074,9 +1055,6 @@ mod tests {
if let Some(bsesc) = $options.bsesc { if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc); builder.backslash_escape(bsesc);
} }
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap(); let pat = builder.build().unwrap();
assert_eq!(format!("(?-u){}", $re), pat.regex()); assert_eq!(format!("(?-u){}", $re), pat.regex());
} }
@@ -1100,9 +1078,6 @@ mod tests {
if let Some(bsesc) = $options.bsesc { if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc); builder.backslash_escape(bsesc);
} }
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap(); let pat = builder.build().unwrap();
let matcher = pat.compile_matcher(); let matcher = pat.compile_matcher();
let strategic = pat.compile_strategic_matcher(); let strategic = pat.compile_strategic_matcher();
@@ -1131,9 +1106,6 @@ mod tests {
if let Some(bsesc) = $options.bsesc { if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc); builder.backslash_escape(bsesc);
} }
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap(); let pat = builder.build().unwrap();
let matcher = pat.compile_matcher(); let matcher = pat.compile_matcher();
let strategic = pat.compile_strategic_matcher(); let strategic = pat.compile_strategic_matcher();
@@ -1219,23 +1191,13 @@ mod tests {
syntaxerr!(err_range2, "[z--]", ErrorKind::InvalidRange('z', '-')); syntaxerr!(err_range2, "[z--]", ErrorKind::InvalidRange('z', '-'));
const CASEI: Options = const CASEI: Options =
Options { casei: Some(true), litsep: None, bsesc: None, ealtre: None }; Options { casei: Some(true), litsep: None, bsesc: None };
const SLASHLIT: Options = const SLASHLIT: Options =
Options { casei: None, litsep: Some(true), bsesc: None, ealtre: None }; Options { casei: None, litsep: Some(true), bsesc: None };
const NOBSESC: Options = Options { const NOBSESC: Options =
casei: None, Options { casei: None, litsep: None, bsesc: Some(false) };
litsep: None,
bsesc: Some(false),
ealtre: None,
};
const BSESC: Options = const BSESC: Options =
Options { casei: None, litsep: None, bsesc: Some(true), ealtre: None }; Options { casei: None, litsep: None, bsesc: Some(true) };
const EALTRE: Options = Options {
casei: None,
litsep: None,
bsesc: Some(true),
ealtre: Some(true),
};
toregex!(re_casei, "a", "(?i)^a$", &CASEI); toregex!(re_casei, "a", "(?i)^a$", &CASEI);
@@ -1260,9 +1222,9 @@ mod tests {
toregex!(re16, "**/**/*", r"^(?:/?|.*/).*$"); toregex!(re16, "**/**/*", r"^(?:/?|.*/).*$");
toregex!(re17, "**/**/**", r"^.*$"); toregex!(re17, "**/**/**", r"^.*$");
toregex!(re18, "**/**/**/*", r"^(?:/?|.*/).*$"); toregex!(re18, "**/**/**/*", r"^(?:/?|.*/).*$");
toregex!(re19, "a/**", r"^a/.*$"); toregex!(re19, "a/**", r"^a(?:/?|/.*)$");
toregex!(re20, "a/**/**", r"^a/.*$"); toregex!(re20, "a/**/**", r"^a(?:/?|/.*)$");
toregex!(re21, "a/**/**/**", r"^a/.*$"); toregex!(re21, "a/**/**/**", r"^a(?:/?|/.*)$");
toregex!(re22, "a/**/b", r"^a(?:/|/.*/)b$"); toregex!(re22, "a/**/b", r"^a(?:/|/.*/)b$");
toregex!(re23, "a/**/**/b", r"^a(?:/|/.*/)b$"); toregex!(re23, "a/**/**/b", r"^a(?:/|/.*/)b$");
toregex!(re24, "a/**/**/**/b", r"^a(?:/|/.*/)b$"); toregex!(re24, "a/**/**/**/b", r"^a(?:/|/.*/)b$");
@@ -1308,12 +1270,11 @@ mod tests {
matches!(matchrec18, "/**/test", "/test"); matches!(matchrec18, "/**/test", "/test");
matches!(matchrec19, "**/.*", ".abc"); matches!(matchrec19, "**/.*", ".abc");
matches!(matchrec20, "**/.*", "abc/.abc"); matches!(matchrec20, "**/.*", "abc/.abc");
matches!(matchrec21, "**/foo/bar", "foo/bar"); matches!(matchrec21, ".*/**", ".abc");
matches!(matchrec22, ".*/**", ".abc/abc"); matches!(matchrec22, ".*/**", ".abc/abc");
matches!(matchrec23, "test/**", "test/"); matches!(matchrec23, "foo/**", "foo");
matches!(matchrec24, "test/**", "test/one"); matches!(matchrec24, "**/foo/bar", "foo/bar");
matches!(matchrec25, "test/**", "test/one/two"); matches!(matchrec25, "some/*/needle.txt", "some/one/needle.txt");
matches!(matchrec26, "some/*/needle.txt", "some/one/needle.txt");
matches!(matchrange1, "a[0-9]b", "a0b"); matches!(matchrange1, "a[0-9]b", "a0b");
matches!(matchrange2, "a[0-9]b", "a9b"); matches!(matchrange2, "a[0-9]b", "a9b");
@@ -1360,9 +1321,6 @@ mod tests {
matches!(matchalt11, "{*.foo,*.bar,*.wat}", "test.foo"); matches!(matchalt11, "{*.foo,*.bar,*.wat}", "test.foo");
matches!(matchalt12, "{*.foo,*.bar,*.wat}", "test.bar"); matches!(matchalt12, "{*.foo,*.bar,*.wat}", "test.bar");
matches!(matchalt13, "{*.foo,*.bar,*.wat}", "test.wat"); matches!(matchalt13, "{*.foo,*.bar,*.wat}", "test.wat");
matches!(matchalt14, "foo{,.txt}", "foo.txt");
nmatches!(matchalt15, "foo{,.txt}", "foo");
matches!(matchalt16, "foo{,.txt}", "foo", EALTRE);
matches!(matchslash1, "abc/def", "abc/def", SLASHLIT); matches!(matchslash1, "abc/def", "abc/def", SLASHLIT);
#[cfg(unix)] #[cfg(unix)]
@@ -1442,8 +1400,6 @@ mod tests {
"some/one/two/three/needle.txt", "some/one/two/three/needle.txt",
SLASHLIT SLASHLIT
); );
nmatches!(matchrec33, ".*/**", ".abc");
nmatches!(matchrec34, "foo/**", "foo");
macro_rules! extract { macro_rules! extract {
($which:ident, $name:ident, $pat:expr, $expect:expr) => { ($which:ident, $name:ident, $pat:expr, $expect:expr) => {
@@ -1462,9 +1418,6 @@ mod tests {
if let Some(bsesc) = $options.bsesc { if let Some(bsesc) = $options.bsesc {
builder.backslash_escape(bsesc); builder.backslash_escape(bsesc);
} }
if let Some(ealtre) = $options.ealtre {
builder.empty_alternates(ealtre);
}
let pat = builder.build().unwrap(); let pat = builder.build().unwrap();
assert_eq!($expect, pat.$which()); assert_eq!($expect, pat.$which());
} }
@@ -1551,7 +1504,7 @@ mod tests {
prefix!(extract_prefix1, "/foo", Some(s("/foo"))); prefix!(extract_prefix1, "/foo", Some(s("/foo")));
prefix!(extract_prefix2, "/foo/*", Some(s("/foo/"))); prefix!(extract_prefix2, "/foo/*", Some(s("/foo/")));
prefix!(extract_prefix3, "**/foo", None); prefix!(extract_prefix3, "**/foo", None);
prefix!(extract_prefix4, "foo/**", Some(s("foo/"))); prefix!(extract_prefix4, "foo/**", None);
suffix!(extract_suffix1, "**/foo/bar", Some((s("/foo/bar"), true))); suffix!(extract_suffix1, "**/foo/bar", Some((s("/foo/bar"), true)));
suffix!(extract_suffix2, "*/foo/bar", Some((s("/foo/bar"), false))); suffix!(extract_suffix2, "*/foo/bar", Some((s("/foo/bar"), false)));


@@ -103,6 +103,16 @@ or to enable case insensitive matching.
#![deny(missing_docs)] #![deny(missing_docs)]
extern crate aho_corasick;
extern crate bstr;
extern crate fnv;
#[macro_use]
extern crate log;
extern crate regex;
#[cfg(feature = "serde1")]
extern crate serde;
use std::borrow::Cow; use std::borrow::Cow;
use std::collections::{BTreeMap, HashMap}; use std::collections::{BTreeMap, HashMap};
use std::error::Error as StdError; use std::error::Error as StdError;
@@ -115,9 +125,9 @@ use aho_corasick::AhoCorasick;
use bstr::{ByteSlice, ByteVec, B}; use bstr::{ByteSlice, ByteVec, B};
use regex::bytes::{Regex, RegexBuilder, RegexSet}; use regex::bytes::{Regex, RegexBuilder, RegexSet};
use crate::glob::MatchStrategy; use glob::MatchStrategy;
pub use crate::glob::{Glob, GlobBuilder, GlobMatcher}; pub use glob::{Glob, GlobBuilder, GlobMatcher};
use crate::pathutil::{file_name, file_name_ext, normalize_path}; use pathutil::{file_name, file_name_ext, normalize_path};
mod glob; mod glob;
mod pathutil; mod pathutil;
@@ -125,16 +135,6 @@ mod pathutil;
#[cfg(feature = "serde1")] #[cfg(feature = "serde1")]
mod serde_impl; mod serde_impl;
#[cfg(feature = "log")]
macro_rules! debug {
($($token:tt)*) => (::log::debug!($($token)*);)
}
#[cfg(not(feature = "log"))]
macro_rules! debug {
($($token:tt)*) => {};
}
/// Represents an error that can occur when parsing a glob pattern. /// Represents an error that can occur when parsing a glob pattern.
#[derive(Clone, Debug, Eq, PartialEq)] #[derive(Clone, Debug, Eq, PartialEq)]
pub struct Error { pub struct Error {
@@ -228,7 +228,7 @@ impl ErrorKind {
} }
impl fmt::Display for Error { impl fmt::Display for Error {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self.glob { match self.glob {
None => self.kind.fmt(f), None => self.kind.fmt(f),
Some(ref glob) => { Some(ref glob) => {
@@ -239,7 +239,7 @@ impl fmt::Display for Error {
} }
impl fmt::Display for ErrorKind { impl fmt::Display for ErrorKind {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match *self { match *self {
ErrorKind::InvalidRecursive ErrorKind::InvalidRecursive
| ErrorKind::UnclosedClass | ErrorKind::UnclosedClass
@@ -317,7 +317,7 @@ impl GlobSet {
/// ///
/// This takes a Candidate as input, which can be used to amortize the /// This takes a Candidate as input, which can be used to amortize the
/// cost of preparing a path for matching. /// cost of preparing a path for matching.
pub fn is_match_candidate(&self, path: &Candidate<'_>) -> bool { pub fn is_match_candidate(&self, path: &Candidate) -> bool {
if self.is_empty() { if self.is_empty() {
return false; return false;
} }
@@ -340,7 +340,7 @@ impl GlobSet {
/// ///
/// This takes a Candidate as input, which can be used to amortize the /// This takes a Candidate as input, which can be used to amortize the
/// cost of preparing a path for matching. /// cost of preparing a path for matching.
pub fn matches_candidate(&self, path: &Candidate<'_>) -> Vec<usize> { pub fn matches_candidate(&self, path: &Candidate) -> Vec<usize> {
let mut into = vec![]; let mut into = vec![];
if self.is_empty() { if self.is_empty() {
return into; return into;
@@ -374,7 +374,7 @@ impl GlobSet {
/// cost of preparing a path for matching. /// cost of preparing a path for matching.
pub fn matches_candidate_into( pub fn matches_candidate_into(
&self, &self,
path: &Candidate<'_>, path: &Candidate,
into: &mut Vec<usize>, into: &mut Vec<usize>,
) { ) {
into.clear(); into.clear();
@@ -456,13 +456,6 @@ impl GlobSet {
} }
} }
impl Default for GlobSet {
/// Create a default empty GlobSet.
fn default() -> Self {
GlobSet::empty()
}
}
/// GlobSetBuilder builds a group of patterns that can be used to /// GlobSetBuilder builds a group of patterns that can be used to
/// simultaneously match a file path. /// simultaneously match a file path.
#[derive(Clone, Debug)] #[derive(Clone, Debug)]
@@ -498,23 +491,13 @@ impl GlobSetBuilder {
/// Constructing candidates has a very small cost associated with it, so /// Constructing candidates has a very small cost associated with it, so
/// callers may find it beneficial to amortize that cost when matching a single /// callers may find it beneficial to amortize that cost when matching a single
/// path against multiple globs or sets of globs. /// path against multiple globs or sets of globs.
#[derive(Clone)] #[derive(Clone, Debug)]
pub struct Candidate<'a> { pub struct Candidate<'a> {
path: Cow<'a, [u8]>, path: Cow<'a, [u8]>,
basename: Cow<'a, [u8]>, basename: Cow<'a, [u8]>,
ext: Cow<'a, [u8]>, ext: Cow<'a, [u8]>,
} }
impl<'a> std::fmt::Debug for Candidate<'a> {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
f.debug_struct("Candidate")
.field("path", &self.path.as_bstr())
.field("basename", &self.basename.as_bstr())
.field("ext", &self.ext.as_bstr())
.finish()
}
}
impl<'a> Candidate<'a> { impl<'a> Candidate<'a> {
/// Create a new candidate for matching from the given path. /// Create a new candidate for matching from the given path.
pub fn new<P: AsRef<Path> + ?Sized>(path: &'a P) -> Candidate<'a> { pub fn new<P: AsRef<Path> + ?Sized>(path: &'a P) -> Candidate<'a> {
@@ -553,7 +536,7 @@ enum GlobSetMatchStrategy {
} }
impl GlobSetMatchStrategy { impl GlobSetMatchStrategy {
fn is_match(&self, candidate: &Candidate<'_>) -> bool { fn is_match(&self, candidate: &Candidate) -> bool {
use self::GlobSetMatchStrategy::*; use self::GlobSetMatchStrategy::*;
match *self { match *self {
Literal(ref s) => s.is_match(candidate), Literal(ref s) => s.is_match(candidate),
@@ -566,11 +549,7 @@ impl GlobSetMatchStrategy {
} }
} }
fn matches_into( fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
use self::GlobSetMatchStrategy::*; use self::GlobSetMatchStrategy::*;
match *self { match *self {
Literal(ref s) => s.matches_into(candidate, matches), Literal(ref s) => s.matches_into(candidate, matches),
@@ -596,16 +575,12 @@ impl LiteralStrategy {
self.0.entry(lit.into_bytes()).or_insert(vec![]).push(global_index); self.0.entry(lit.into_bytes()).or_insert(vec![]).push(global_index);
} }
fn is_match(&self, candidate: &Candidate<'_>) -> bool { fn is_match(&self, candidate: &Candidate) -> bool {
self.0.contains_key(candidate.path.as_bytes()) self.0.contains_key(candidate.path.as_bytes())
} }
#[inline(never)] #[inline(never)]
fn matches_into( fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
if let Some(hits) = self.0.get(candidate.path.as_bytes()) { if let Some(hits) = self.0.get(candidate.path.as_bytes()) {
matches.extend(hits); matches.extend(hits);
} }
@@ -624,7 +599,7 @@ impl BasenameLiteralStrategy {
self.0.entry(lit.into_bytes()).or_insert(vec![]).push(global_index); self.0.entry(lit.into_bytes()).or_insert(vec![]).push(global_index);
} }
fn is_match(&self, candidate: &Candidate<'_>) -> bool { fn is_match(&self, candidate: &Candidate) -> bool {
if candidate.basename.is_empty() { if candidate.basename.is_empty() {
return false; return false;
} }
@@ -632,11 +607,7 @@ impl BasenameLiteralStrategy {
} }
#[inline(never)] #[inline(never)]
fn matches_into( fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
if candidate.basename.is_empty() { if candidate.basename.is_empty() {
return; return;
} }
@@ -658,7 +629,7 @@ impl ExtensionStrategy {
self.0.entry(ext.into_bytes()).or_insert(vec![]).push(global_index); self.0.entry(ext.into_bytes()).or_insert(vec![]).push(global_index);
} }
fn is_match(&self, candidate: &Candidate<'_>) -> bool { fn is_match(&self, candidate: &Candidate) -> bool {
if candidate.ext.is_empty() { if candidate.ext.is_empty() {
return false; return false;
} }
@@ -666,11 +637,7 @@ impl ExtensionStrategy {
} }
#[inline(never)] #[inline(never)]
fn matches_into( fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
if candidate.ext.is_empty() { if candidate.ext.is_empty() {
return; return;
} }
@@ -688,7 +655,7 @@ struct PrefixStrategy {
} }
impl PrefixStrategy { impl PrefixStrategy {
fn is_match(&self, candidate: &Candidate<'_>) -> bool { fn is_match(&self, candidate: &Candidate) -> bool {
let path = candidate.path_prefix(self.longest); let path = candidate.path_prefix(self.longest);
for m in self.matcher.find_overlapping_iter(path) { for m in self.matcher.find_overlapping_iter(path) {
if m.start() == 0 { if m.start() == 0 {
@@ -698,11 +665,7 @@ impl PrefixStrategy {
false false
} }
fn matches_into( fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
let path = candidate.path_prefix(self.longest); let path = candidate.path_prefix(self.longest);
for m in self.matcher.find_overlapping_iter(path) { for m in self.matcher.find_overlapping_iter(path) {
if m.start() == 0 { if m.start() == 0 {
@@ -720,7 +683,7 @@ struct SuffixStrategy {
} }
impl SuffixStrategy { impl SuffixStrategy {
fn is_match(&self, candidate: &Candidate<'_>) -> bool { fn is_match(&self, candidate: &Candidate) -> bool {
let path = candidate.path_suffix(self.longest); let path = candidate.path_suffix(self.longest);
for m in self.matcher.find_overlapping_iter(path) { for m in self.matcher.find_overlapping_iter(path) {
if m.end() == path.len() { if m.end() == path.len() {
@@ -730,11 +693,7 @@ impl SuffixStrategy {
false false
} }
fn matches_into( fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
let path = candidate.path_suffix(self.longest); let path = candidate.path_suffix(self.longest);
for m in self.matcher.find_overlapping_iter(path) { for m in self.matcher.find_overlapping_iter(path) {
if m.end() == path.len() { if m.end() == path.len() {
@@ -748,7 +707,7 @@ impl SuffixStrategy {
struct RequiredExtensionStrategy(HashMap<Vec<u8>, Vec<(usize, Regex)>, Fnv>); struct RequiredExtensionStrategy(HashMap<Vec<u8>, Vec<(usize, Regex)>, Fnv>);
impl RequiredExtensionStrategy { impl RequiredExtensionStrategy {
fn is_match(&self, candidate: &Candidate<'_>) -> bool { fn is_match(&self, candidate: &Candidate) -> bool {
if candidate.ext.is_empty() { if candidate.ext.is_empty() {
return false; return false;
} }
@@ -766,11 +725,7 @@ impl RequiredExtensionStrategy {
} }
#[inline(never)] #[inline(never)]
fn matches_into( fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
if candidate.ext.is_empty() { if candidate.ext.is_empty() {
return; return;
} }
@@ -791,15 +746,11 @@ struct RegexSetStrategy {
} }
impl RegexSetStrategy { impl RegexSetStrategy {
fn is_match(&self, candidate: &Candidate<'_>) -> bool { fn is_match(&self, candidate: &Candidate) -> bool {
self.matcher.is_match(candidate.path.as_bytes()) self.matcher.is_match(candidate.path.as_bytes())
} }
fn matches_into( fn matches_into(&self, candidate: &Candidate, matches: &mut Vec<usize>) {
&self,
candidate: &Candidate<'_>,
matches: &mut Vec<usize>,
) {
for i in self.matcher.matches(candidate.path.as_bytes()) { for i in self.matcher.matches(candidate.path.as_bytes()) {
matches.push(self.map[i]); matches.push(self.map[i]);
} }
@@ -828,7 +779,7 @@ impl MultiStrategyBuilder {
fn prefix(self) -> PrefixStrategy { fn prefix(self) -> PrefixStrategy {
PrefixStrategy { PrefixStrategy {
matcher: AhoCorasick::new(&self.literals).unwrap(), matcher: AhoCorasick::new_auto_configured(&self.literals),
map: self.map, map: self.map,
longest: self.longest, longest: self.longest,
} }
@@ -836,7 +787,7 @@ impl MultiStrategyBuilder {
fn suffix(self) -> SuffixStrategy { fn suffix(self) -> SuffixStrategy {
SuffixStrategy { SuffixStrategy {
matcher: AhoCorasick::new(&self.literals).unwrap(), matcher: AhoCorasick::new_auto_configured(&self.literals),
map: self.map, map: self.map,
longest: self.longest, longest: self.longest,
} }
@@ -880,33 +831,10 @@ impl RequiredExtensionStrategyBuilder {
} }
} }
/// Escape meta-characters within the given glob pattern.
///
/// The escaping works by surrounding meta-characters with brackets. For
/// example, `*` becomes `[*]`.
pub fn escape(s: &str) -> String {
let mut escaped = String::with_capacity(s.len());
for c in s.chars() {
match c {
// note that ! does not need escaping because it is only special
// inside brackets
'?' | '*' | '[' | ']' => {
escaped.push('[');
escaped.push(c);
escaped.push(']');
}
c => {
escaped.push(c);
}
}
}
escaped
}
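The bracket-escaping rule implemented above can be tried standalone; this is a copy of the hunk's body with the expected outputs taken from the test cases further down in this diff:

```rust
/// Escape meta-characters by surrounding them with brackets,
/// e.g. `*` becomes `[*]`. `!` needs no escaping because it is
/// only special inside brackets.
fn escape(s: &str) -> String {
    let mut escaped = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '?' | '*' | '[' | ']' => {
                escaped.push('[');
                escaped.push(c);
                escaped.push(']');
            }
            c => escaped.push(c),
        }
    }
    escaped
}

fn main() {
    assert_eq!(escape("src/**/*.rs"), "src/[*][*]/[*].rs");
    assert_eq!(escape("bar[ab]baz"), "bar[[]ab[]]baz");
}
```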
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::{GlobSet, GlobSetBuilder}; use super::GlobSetBuilder;
use crate::glob::Glob; use glob::Glob;
#[test] #[test]
fn set_works() { fn set_works() {
@@ -935,23 +863,4 @@ mod tests {
assert!(!set.is_match("")); assert!(!set.is_match(""));
assert!(!set.is_match("a")); assert!(!set.is_match("a"));
} }
#[test]
fn default_set_is_empty_works() {
let set: GlobSet = Default::default();
assert!(!set.is_match(""));
assert!(!set.is_match("a"));
}
#[test]
fn escape() {
use super::escape;
assert_eq!("foo", escape("foo"));
assert_eq!("foo[*]", escape("foo*"));
assert_eq!("[[][]]", escape("[]"));
assert_eq!("[*][?]", escape("*?"));
assert_eq!("src/[*][*]/[*].rs", escape("src/**/*.rs"));
assert_eq!("bar[[]ab[]]baz", escape("bar[ab]baz"));
assert_eq!("bar[[]!![]]!baz", escape("bar[!!]!baz"));
}
} }


@@ -27,7 +27,7 @@ pub fn file_name<'a>(path: &Cow<'a, [u8]>) -> Option<Cow<'a, [u8]>> {
/// ///
/// Note that this does NOT match the semantics of std::path::Path::extension. /// Note that this does NOT match the semantics of std::path::Path::extension.
/// Namely, the extension includes the `.` and matching is otherwise more /// Namely, the extension includes the `.` and matching is otherwise more
/// liberal. Specifically, the extension is: /// liberal. Specifically, the extenion is:
/// ///
/// * None, if the file name given is empty; /// * None, if the file name given is empty;
/// * None, if there is no embedded `.`; /// * None, if there is no embedded `.`;
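The doc comment above spells out extension semantics that deliberately differ from `std::path::Path::extension`. A simplified `&str`-based sketch of the documented rules (not the crate's byte-level implementation, and covering only the two bullets visible in this hunk):

```rust
// The extension keeps its leading `.` and is taken from the last dot.
fn file_name_ext(name: &str) -> Option<&str> {
    if name.is_empty() {
        return None; // empty file name
    }
    name.rfind('.').map(|i| &name[i..])
}

fn main() {
    assert_eq!(file_name_ext("foo.rs"), Some(".rs"));
    assert_eq!(file_name_ext("foo"), None); // no embedded `.`
    assert_eq!(file_name_ext(""), None);    // empty name
}
```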
@@ -60,7 +60,7 @@ pub fn file_name_ext<'a>(name: &Cow<'a, [u8]>) -> Option<Cow<'a, [u8]>> {
/// Normalizes a path to use `/` as a separator everywhere, even on platforms /// Normalizes a path to use `/` as a separator everywhere, even on platforms
/// that recognize other characters as separators. /// that recognize other characters as separators.
#[cfg(unix)] #[cfg(unix)]
pub fn normalize_path(path: Cow<'_, [u8]>) -> Cow<'_, [u8]> { pub fn normalize_path(path: Cow<[u8]>) -> Cow<[u8]> {
// UNIX only uses /, so we're good. // UNIX only uses /, so we're good.
path path
} }


@@ -1,9 +1,7 @@
-use serde::{
-    de::{Error, Visitor},
-    {Deserialize, Deserializer, Serialize, Serializer},
-};
-
-use crate::Glob;
+use serde::de::Error;
+use serde::{Deserialize, Deserializer, Serialize, Serializer};
+
+use Glob;

 impl Serialize for Glob {
     fn serialize<S: Serializer>(
@@ -14,66 +12,18 @@ impl Serialize for Glob {
     }
 }

-struct GlobVisitor;
-
-impl<'a> Visitor<'a> for GlobVisitor {
-    type Value = Glob;
-
-    fn expecting(
-        &self,
-        formatter: &mut std::fmt::Formatter,
-    ) -> std::fmt::Result {
-        formatter.write_str("a glob pattern")
-    }
-
-    fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
-    where
-        E: Error,
-    {
-        Glob::new(v).map_err(serde::de::Error::custom)
-    }
-}
-
 impl<'de> Deserialize<'de> for Glob {
     fn deserialize<D: Deserializer<'de>>(
         deserializer: D,
     ) -> Result<Self, D::Error> {
-        deserializer.deserialize_str(GlobVisitor)
+        let glob = <&str as Deserialize>::deserialize(deserializer)?;
+        Glob::new(glob).map_err(D::Error::custom)
     }
 }

 #[cfg(test)]
 mod tests {
-    use std::collections::HashMap;
-
-    use crate::Glob;
-
-    #[test]
-    fn glob_deserialize_borrowed() {
-        let string = r#"{"markdown": "*.md"}"#;
-        let map: HashMap<String, Glob> =
-            serde_json::from_str(&string).unwrap();
-        assert_eq!(map["markdown"], Glob::new("*.md").unwrap());
-    }
-
-    #[test]
-    fn glob_deserialize_owned() {
-        let string = r#"{"markdown": "*.md"}"#;
-        let v: serde_json::Value = serde_json::from_str(&string).unwrap();
-        let map: HashMap<String, Glob> = serde_json::from_value(v).unwrap();
-        assert_eq!(map["markdown"], Glob::new("*.md").unwrap());
-    }
-
-    #[test]
-    fn glob_deserialize_error() {
-        let string = r#"{"error": "["}"#;
-        let map = serde_json::from_str::<HashMap<String, Glob>>(&string);
-        assert!(map.is_err());
-    }
-
+    use Glob;
+
     #[test]
     fn glob_json_works() {


@@ -1,6 +1,6 @@
 [package]
 name = "grep"
-version = "0.2.12" #:version
+version = "0.2.7" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 Fast line oriented regex searching as a library.
@@ -10,16 +10,15 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/grep"
 repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/grep"
 readme = "README.md"
 keywords = ["regex", "grep", "egrep", "search", "pattern"]
-license = "Unlicense OR MIT"
-edition = "2018"
+license = "Unlicense/MIT"

 [dependencies]
-grep-cli = { version = "0.1.7", path = "../cli" }
-grep-matcher = { version = "0.1.6", path = "../matcher" }
-grep-pcre2 = { version = "0.1.6", path = "../pcre2", optional = true }
-grep-printer = { version = "0.1.7", path = "../printer" }
-grep-regex = { version = "0.1.11", path = "../regex" }
-grep-searcher = { version = "0.1.11", path = "../searcher" }
+grep-cli = { version = "0.1.5", path = "../cli" }
+grep-matcher = { version = "0.1.4", path = "../matcher" }
+grep-pcre2 = { version = "0.1.4", path = "../pcre2", optional = true }
+grep-printer = { version = "0.1.5", path = "../printer" }
+grep-regex = { version = "0.1.8", path = "../regex" }
+grep-searcher = { version = "0.1.7", path = "../searcher" }

 [dev-dependencies]
 termcolor = "1.0.4"


@@ -26,6 +26,12 @@ Add this to your `Cargo.toml`:
 grep = "0.2"
 ```

+and this to your crate root:
+
+```rust
+extern crate grep;
+```
+
 ### Features


@@ -1,3 +1,7 @@
+extern crate grep;
+extern crate termcolor;
+extern crate walkdir;
+
 use std::env;
 use std::error::Error;
 use std::ffi::OsString;


@@ -12,6 +12,8 @@ are sparse.
 A cookbook and a guide are planned.
 */

+#![deny(missing_docs)]
+
 pub extern crate grep_cli as cli;
 pub extern crate grep_matcher as matcher;
 #[cfg(feature = "pcre2")]


@@ -1,6 +1,6 @@
 [package]
 name = "ignore"
-version = "0.4.20" #:version
+version = "0.4.17" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 A fast library for efficiently matching ignore files such as `.gitignore`
@@ -11,19 +11,19 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore"
 repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore"
 readme = "README.md"
 keywords = ["glob", "ignore", "gitignore", "pattern", "file"]
-license = "Unlicense OR MIT"
-edition = "2018"
+license = "Unlicense/MIT"

 [lib]
 name = "ignore"
 bench = false

 [dependencies]
-globset = { version = "0.4.10", path = "../globset" }
+crossbeam-utils = "0.8.0"
+globset = { version = "0.4.5", path = "../globset" }
 lazy_static = "1.1"
 log = "0.4.5"
-memchr = "2.5"
-regex = { version = "1.9.0", default-features = false, features = ["perf", "std", "unicode-gencat"] }
+memchr = "2.1"
+regex = "1.1"
 same-file = "1.0.4"
 thread_local = "1"
 walkdir = "2.2.7"


@@ -22,6 +22,12 @@ Add this to your `Cargo.toml`:
 ignore = "0.4"
 ```

+and this to your crate root:
+
+```rust
+extern crate ignore;
+```
+
 ### Example

 This example shows the most basic usage of this crate. This code will


@@ -1,3 +1,7 @@
+extern crate crossbeam_channel as channel;
+extern crate ignore;
+extern crate walkdir;
+
 use std::env;
 use std::io::{self, Write};
 use std::path::Path;
@@ -10,7 +14,7 @@ fn main() {
     let mut path = env::args().nth(1).unwrap();
     let mut parallel = false;
     let mut simple = false;
-    let (tx, rx) = crossbeam_channel::bounded::<DirEntry>(100);
+    let (tx, rx) = channel::bounded::<DirEntry>(100);

     if path == "parallel" {
         path = env::args().nth(2).unwrap();
         parallel = true;


@@ -4,118 +4,96 @@
 /// types to each invocation of ripgrep with the '--type-add' flag.
 ///
 /// If you would like to add or improve this list, please file a PR:
-/// <https://github.com/BurntSushi/ripgrep>.
+/// https://github.com/BurntSushi/ripgrep
 ///
 /// Please try to keep this list sorted lexicographically and wrapped to 79
 /// columns (inclusive).
 #[rustfmt::skip]
-pub const DEFAULT_TYPES: &[(&[&str], &[&str])] = &[
-    (&["ada"], &["*.adb", "*.ads"]),
-    (&["agda"], &["*.agda", "*.lagda"]),
-    (&["aidl"], &["*.aidl"]),
-    (&["alire"], &["alire.toml"]),
-    (&["amake"], &["*.mk", "*.bp"]),
-    (&["asciidoc"], &["*.adoc", "*.asc", "*.asciidoc"]),
-    (&["asm"], &["*.asm", "*.s", "*.S"]),
-    (&["asp"], &[
-        "*.aspx", "*.aspx.cs", "*.aspx.vb", "*.ascx", "*.ascx.cs",
-        "*.ascx.vb", "*.asp"
+pub const DEFAULT_TYPES: &[(&str, &[&str])] = &[
+    ("agda", &["*.agda", "*.lagda"]),
+    ("aidl", &["*.aidl"]),
+    ("amake", &["*.mk", "*.bp"]),
+    ("asciidoc", &["*.adoc", "*.asc", "*.asciidoc"]),
+    ("asm", &["*.asm", "*.s", "*.S"]),
+    ("asp", &[
+        "*.aspx", "*.aspx.cs", "*.aspx.cs", "*.ascx", "*.ascx.cs", "*.ascx.vb",
     ]),
-    (&["ats"], &["*.ats", "*.dats", "*.sats", "*.hats"]),
-    (&["avro"], &["*.avdl", "*.avpr", "*.avsc"]),
-    (&["awk"], &["*.awk"]),
-    (&["bat", "batch"], &["*.bat"]),
-    (&["bazel"], &[
-        "*.bazel", "*.bzl", "*.BUILD", "*.bazelrc", "BUILD", "MODULE.bazel",
-        "WORKSPACE", "WORKSPACE.bazel",
-    ]),
-    (&["bitbake"], &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
-    (&["brotli"], &["*.br"]),
-    (&["buildstream"], &["*.bst"]),
-    (&["bzip2"], &["*.bz2", "*.tbz2"]),
-    (&["c"], &["*.[chH]", "*.[chH].in", "*.cats"]),
-    (&["cabal"], &["*.cabal"]),
-    (&["candid"], &["*.did"]),
-    (&["carp"], &["*.carp"]),
-    (&["cbor"], &["*.cbor"]),
-    (&["ceylon"], &["*.ceylon"]),
-    (&["clojure"], &["*.clj", "*.cljc", "*.cljs", "*.cljx"]),
-    (&["cmake"], &["*.cmake", "CMakeLists.txt"]),
-    (&["cmd"], &["*.bat", "*.cmd"]),
-    (&["cml"], &["*.cml"]),
-    (&["coffeescript"], &["*.coffee"]),
-    (&["config"], &["*.cfg", "*.conf", "*.config", "*.ini"]),
-    (&["coq"], &["*.v"]),
-    (&["cpp"], &[
+    ("ats", &["*.ats", "*.dats", "*.sats", "*.hats"]),
+    ("avro", &["*.avdl", "*.avpr", "*.avsc"]),
+    ("awk", &["*.awk"]),
+    ("bazel", &["*.bazel", "*.bzl", "BUILD", "WORKSPACE"]),
+    ("bitbake", &["*.bb", "*.bbappend", "*.bbclass", "*.conf", "*.inc"]),
+    ("brotli", &["*.br"]),
+    ("buildstream", &["*.bst"]),
+    ("bzip2", &["*.bz2", "*.tbz2"]),
+    ("c", &["*.[chH]", "*.[chH].in", "*.cats"]),
+    ("cabal", &["*.cabal"]),
+    ("cbor", &["*.cbor"]),
+    ("ceylon", &["*.ceylon"]),
+    ("clojure", &["*.clj", "*.cljc", "*.cljs", "*.cljx"]),
+    ("cmake", &["*.cmake", "CMakeLists.txt"]),
+    ("coffeescript", &["*.coffee"]),
+    ("config", &["*.cfg", "*.conf", "*.config", "*.ini"]),
+    ("coq", &["*.v"]),
+    ("cpp", &[
         "*.[ChH]", "*.cc", "*.[ch]pp", "*.[ch]xx", "*.hh", "*.inl",
         "*.[ChH].in", "*.cc.in", "*.[ch]pp.in", "*.[ch]xx.in", "*.hh.in",
     ]),
-    (&["creole"], &["*.creole"]),
-    (&["crystal"], &["Projectfile", "*.cr", "*.ecr", "shard.yml"]),
-    (&["cs"], &["*.cs"]),
-    (&["csharp"], &["*.cs"]),
-    (&["cshtml"], &["*.cshtml"]),
-    (&["css"], &["*.css", "*.scss"]),
-    (&["csv"], &["*.csv"]),
-    (&["cuda"], &["*.cu", "*.cuh"]),
-    (&["cython"], &["*.pyx", "*.pxi", "*.pxd"]),
-    (&["d"], &["*.d"]),
-    (&["dart"], &["*.dart"]),
-    (&["devicetree"], &["*.dts", "*.dtsi"]),
-    (&["dhall"], &["*.dhall"]),
-    (&["diff"], &["*.patch", "*.diff"]),
-    (&["dita"], &["*.dita", "*.ditamap", "*.ditaval"]),
-    (&["docker"], &["*Dockerfile*"]),
-    (&["dockercompose"], &["docker-compose.yml", "docker-compose.*.yml"]),
-    (&["dts"], &["*.dts", "*.dtsi"]),
-    (&["dvc"], &["Dvcfile", "*.dvc"]),
-    (&["ebuild"], &["*.ebuild", "*.eclass"]),
-    (&["edn"], &["*.edn"]),
-    (&["elisp"], &["*.el"]),
-    (&["elixir"], &["*.ex", "*.eex", "*.exs", "*.heex", "*.leex", "*.livemd"]),
-    (&["elm"], &["*.elm"]),
-    (&["erb"], &["*.erb"]),
-    (&["erlang"], &["*.erl", "*.hrl"]),
-    (&["fennel"], &["*.fnl"]),
-    (&["fidl"], &["*.fidl"]),
-    (&["fish"], &["*.fish"]),
-    (&["flatbuffers"], &["*.fbs"]),
-    (&["fortran"], &[
+    ("creole", &["*.creole"]),
+    ("crystal", &["Projectfile", "*.cr"]),
+    ("cs", &["*.cs"]),
+    ("csharp", &["*.cs"]),
+    ("cshtml", &["*.cshtml"]),
+    ("css", &["*.css", "*.scss"]),
+    ("csv", &["*.csv"]),
+    ("cython", &["*.pyx", "*.pxi", "*.pxd"]),
+    ("d", &["*.d"]),
+    ("dart", &["*.dart"]),
+    ("dhall", &["*.dhall"]),
+    ("diff", &["*.patch", "*.diff"]),
+    ("docker", &["*Dockerfile*"]),
+    ("dvc", &["Dvcfile", "*.dvc"]),
+    ("ebuild", &["*.ebuild"]),
+    ("edn", &["*.edn"]),
+    ("elisp", &["*.el"]),
+    ("elixir", &["*.ex", "*.eex", "*.exs"]),
+    ("elm", &["*.elm"]),
+    ("erb", &["*.erb"]),
+    ("erlang", &["*.erl", "*.hrl"]),
+    ("fidl", &["*.fidl"]),
+    ("fish", &["*.fish"]),
+    ("flatbuffers", &["*.fbs"]),
+    ("fortran", &[
         "*.f", "*.F", "*.f77", "*.F77", "*.pfo",
         "*.f90", "*.F90", "*.f95", "*.F95",
     ]),
-    (&["fsharp"], &["*.fs", "*.fsx", "*.fsi"]),
-    (&["fut"], &["*.fut"]),
-    (&["gap"], &["*.g", "*.gap", "*.gi", "*.gd", "*.tst"]),
-    (&["gn"], &["*.gn", "*.gni"]),
-    (&["go"], &["*.go"]),
-    (&["gprbuild"], &["*.gpr"]),
-    (&["gradle"], &["*.gradle"]),
-    (&["graphql"], &["*.graphql", "*.graphqls"]),
-    (&["groovy"], &["*.groovy", "*.gradle"]),
-    (&["gzip"], &["*.gz", "*.tgz"]),
-    (&["h"], &["*.h", "*.hh", "*.hpp"]),
-    (&["haml"], &["*.haml"]),
-    (&["hare"], &["*.ha"]),
-    (&["haskell"], &["*.hs", "*.lhs", "*.cpphs", "*.c2hs", "*.hsc"]),
-    (&["hbs"], &["*.hbs"]),
-    (&["hs"], &["*.hs", "*.lhs"]),
-    (&["html"], &["*.htm", "*.html", "*.ejs"]),
-    (&["hy"], &["*.hy"]),
-    (&["idris"], &["*.idr", "*.lidr"]),
-    (&["janet"], &["*.janet"]),
-    (&["java"], &["*.java", "*.jsp", "*.jspx", "*.properties"]),
-    (&["jinja"], &["*.j2", "*.jinja", "*.jinja2"]),
-    (&["jl"], &["*.jl"]),
-    (&["js"], &["*.js", "*.jsx", "*.vue", "*.cjs", "*.mjs"]),
-    (&["json"], &["*.json", "composer.lock"]),
-    (&["jsonl"], &["*.jsonl"]),
-    (&["julia"], &["*.jl"]),
-    (&["jupyter"], &["*.ipynb", "*.jpynb"]),
-    (&["k"], &["*.k"]),
-    (&["kotlin"], &["*.kt", "*.kts"]),
-    (&["less"], &["*.less"]),
-    (&["license"], &[
+    ("fsharp", &["*.fs", "*.fsx", "*.fsi"]),
+    ("fut", &[".fut"]),
+    ("gap", &["*.g", "*.gap", "*.gi", "*.gd", "*.tst"]),
+    ("gn", &["*.gn", "*.gni"]),
+    ("go", &["*.go"]),
+    ("gradle", &["*.gradle"]),
+    ("groovy", &["*.groovy", "*.gradle"]),
+    ("gzip", &["*.gz", "*.tgz"]),
+    ("h", &["*.h", "*.hpp"]),
+    ("haml", &["*.haml"]),
+    ("haskell", &["*.hs", "*.lhs", "*.cpphs", "*.c2hs", "*.hsc"]),
+    ("hbs", &["*.hbs"]),
+    ("hs", &["*.hs", "*.lhs"]),
+    ("html", &["*.htm", "*.html", "*.ejs"]),
+    ("idris", &["*.idr", "*.lidr"]),
+    ("java", &["*.java", "*.jsp", "*.jspx", "*.properties"]),
+    ("jinja", &["*.j2", "*.jinja", "*.jinja2"]),
+    ("jl", &["*.jl"]),
+    ("js", &["*.js", "*.jsx", "*.vue"]),
+    ("json", &["*.json", "composer.lock"]),
+    ("jsonl", &["*.jsonl"]),
+    ("julia", &["*.jl"]),
+    ("jupyter", &["*.ipynb", "*.jpynb"]),
+    ("k", &["*.k"]),
+    ("kotlin", &["*.kt", "*.kts"]),
+    ("less", &["*.less"]),
+    ("license", &[
         // General
         "COPYING", "COPYING[.-]*",
         "COPYRIGHT", "COPYRIGHT[.-]*",
@@ -142,91 +120,61 @@ pub const DEFAULT_TYPES: &[(&[&str], &[&str])] = &[
         "MPL-*[0-9]*",
         "OFL-*[0-9]*",
     ]),
-    (&["lilypond"], &["*.ly", "*.ily"]),
-    (&["lisp"], &["*.el", "*.jl", "*.lisp", "*.lsp", "*.sc", "*.scm"]),
-    (&["lock"], &["*.lock", "package-lock.json"]),
-    (&["log"], &["*.log"]),
-    (&["lua"], &["*.lua"]),
-    (&["lz4"], &["*.lz4"]),
-    (&["lzma"], &["*.lzma"]),
-    (&["m4"], &["*.ac", "*.m4"]),
-    (&["make"], &[
+    ("lisp", &["*.el", "*.jl", "*.lisp", "*.lsp", "*.sc", "*.scm"]),
+    ("lock", &["*.lock", "package-lock.json"]),
+    ("log", &["*.log"]),
+    ("lua", &["*.lua"]),
+    ("lz4", &["*.lz4"]),
+    ("lzma", &["*.lzma"]),
+    ("m4", &["*.ac", "*.m4"]),
+    ("make", &[
         "[Gg][Nn][Uu]makefile", "[Mm]akefile",
         "[Gg][Nn][Uu]makefile.am", "[Mm]akefile.am",
         "[Gg][Nn][Uu]makefile.in", "[Mm]akefile.in",
         "*.mk", "*.mak"
     ]),
-    (&["mako"], &["*.mako", "*.mao"]),
-    (&["man"], &["*.[0-9lnpx]", "*.[0-9][cEFMmpSx]"]),
-    (&["markdown", "md"], &[
-        "*.markdown",
-        "*.md",
-        "*.mdown",
-        "*.mdwn",
-        "*.mkd",
-        "*.mkdn",
-        "*.mdx",
-    ]),
-    (&["matlab"], &["*.m"]),
-    (&["meson"], &["meson.build", "meson_options.txt"]),
-    (&["minified"], &["*.min.html", "*.min.css", "*.min.js"]),
-    (&["mint"], &["*.mint"]),
-    (&["mk"], &["mkfile"]),
-    (&["ml"], &["*.ml"]),
-    (&["motoko"], &["*.mo"]),
-    (&["msbuild"], &[
+    ("mako", &["*.mako", "*.mao"]),
+    ("man", &["*.[0-9lnpx]", "*.[0-9][cEFMmpSx]"]),
+    ("markdown", &["*.markdown", "*.md", "*.mdown", "*.mkdn"]),
+    ("matlab", &["*.m"]),
+    ("md", &["*.markdown", "*.md", "*.mdown", "*.mkdn"]),
+    ("meson", &["meson.build", "meson_options.txt"]),
+    ("minified", &["*.min.html", "*.min.css", "*.min.js"]),
+    ("mk", &["mkfile"]),
+    ("ml", &["*.ml"]),
+    ("msbuild", &[
         "*.csproj", "*.fsproj", "*.vcxproj", "*.proj", "*.props", "*.targets",
-        "*.sln",
     ]),
-    (&["nim"], &["*.nim", "*.nimf", "*.nimble", "*.nims"]),
-    (&["nix"], &["*.nix"]),
-    (&["objc"], &["*.h", "*.m"]),
-    (&["objcpp"], &["*.h", "*.mm"]),
-    (&["ocaml"], &["*.ml", "*.mli", "*.mll", "*.mly"]),
-    (&["org"], &["*.org", "*.org_archive"]),
-    (&["pants"], &["BUILD"]),
-    (&["pascal"], &["*.pas", "*.dpr", "*.lpr", "*.pp", "*.inc"]),
-    (&["pdf"], &["*.pdf"]),
-    (&["perl"], &["*.perl", "*.pl", "*.PL", "*.plh", "*.plx", "*.pm", "*.t"]),
-    (&["php"], &[
-        // note that PHP 6 doesn't exist
-        // See: https://wiki.php.net/rfc/php6
-        "*.php", "*.php3", "*.php4", "*.php5", "*.php7", "*.php8",
-        "*.pht", "*.phtml"
-    ]),
-    (&["po"], &["*.po"]),
-    (&["pod"], &["*.pod"]),
-    (&["postscript"], &["*.eps", "*.ps"]),
-    (&["protobuf"], &["*.proto"]),
-    (&["ps"], &["*.cdxml", "*.ps1", "*.ps1xml", "*.psd1", "*.psm1"]),
-    (&["puppet"], &["*.epp", "*.erb", "*.pp", "*.rb"]),
-    (&["purs"], &["*.purs"]),
-    (&["py", "python"], &["*.py", "*.pyi"]),
-    (&["qmake"], &["*.pro", "*.pri", "*.prf"]),
-    (&["qml"], &["*.qml"]),
-    (&["r"], &["*.R", "*.r", "*.Rmd", "*.Rnw"]),
-    (&["racket"], &["*.rkt"]),
-    (&["raku"], &[
-        "*.raku", "*.rakumod", "*.rakudoc", "*.rakutest",
-        "*.p6", "*.pl6", "*.pm6"
-    ]),
-    (&["rdoc"], &["*.rdoc"]),
-    (&["readme"], &["README*", "*README"]),
-    (&["reasonml"], &["*.re", "*.rei"]),
-    (&["red"], &["*.r", "*.red", "*.reds"]),
-    (&["rescript"], &["*.res", "*.resi"]),
-    (&["robot"], &["*.robot"]),
-    (&["rst"], &["*.rst"]),
-    (&["ruby"], &[
-        // Idiomatic files
-        "config.ru", "Gemfile", ".irbrc", "Rakefile",
-        // Extensions
-        "*.gemspec", "*.rb", "*.rbw"
-    ]),
-    (&["rust"], &["*.rs"]),
-    (&["sass"], &["*.sass", "*.scss"]),
-    (&["scala"], &["*.scala", "*.sbt"]),
-    (&["sh"], &[
+    ("nim", &["*.nim", "*.nimf", "*.nimble", "*.nims"]),
+    ("nix", &["*.nix"]),
+    ("objc", &["*.h", "*.m"]),
+    ("objcpp", &["*.h", "*.mm"]),
+    ("ocaml", &["*.ml", "*.mli", "*.mll", "*.mly"]),
+    ("org", &["*.org", "*.org_archive"]),
+    ("pascal", &["*.pas", "*.dpr", "*.lpr", "*.pp", "*.inc"]),
+    ("pdf", &["*.pdf"]),
+    ("perl", &["*.perl", "*.pl", "*.PL", "*.plh", "*.plx", "*.pm", "*.t"]),
+    ("php", &["*.php", "*.php3", "*.php4", "*.php5", "*.phtml"]),
+    ("pod", &["*.pod"]),
+    ("postscript", &["*.eps", "*.ps"]),
+    ("protobuf", &["*.proto"]),
+    ("ps", &["*.cdxml", "*.ps1", "*.ps1xml", "*.psd1", "*.psm1"]),
+    ("puppet", &["*.erb", "*.pp", "*.rb"]),
+    ("purs", &["*.purs"]),
+    ("py", &["*.py"]),
+    ("qmake", &["*.pro", "*.pri", "*.prf"]),
+    ("qml", &["*.qml"]),
+    ("r", &["*.R", "*.r", "*.Rmd", "*.Rnw"]),
+    ("racket", &["*.rkt"]),
+    ("rdoc", &["*.rdoc"]),
+    ("readme", &["README*", "*README"]),
+    ("robot", &["*.robot"]),
+    ("rst", &["*.rst"]),
+    ("ruby", &["Gemfile", "*.gemspec", ".irbrc", "Rakefile", "*.rb"]),
+    ("rust", &["*.rs"]),
+    ("sass", &["*.sass", "*.scss"]),
+    ("scala", &["*.scala", "*.sbt"]),
+    ("sh", &[
         // Portable/misc. init files
         ".login", ".logout", ".profile", "profile",
         // bash-specific init files
@@ -249,66 +197,54 @@ pub const DEFAULT_TYPES: &[(&[&str], &[&str])] = &[
         // Extensions
         "*.bash", "*.csh", "*.ksh", "*.sh", "*.tcsh", "*.zsh",
     ]),
-    (&["slim"], &["*.skim", "*.slim", "*.slime"]),
-    (&["smarty"], &["*.tpl"]),
-    (&["sml"], &["*.sml", "*.sig"]),
-    (&["solidity"], &["*.sol"]),
-    (&["soy"], &["*.soy"]),
-    (&["spark"], &["*.spark"]),
-    (&["spec"], &["*.spec"]),
-    (&["sql"], &["*.sql", "*.psql"]),
-    (&["stylus"], &["*.styl"]),
-    (&["sv"], &["*.v", "*.vg", "*.sv", "*.svh", "*.h"]),
-    (&["svg"], &["*.svg"]),
-    (&["swift"], &["*.swift"]),
-    (&["swig"], &["*.def", "*.i"]),
-    (&["systemd"], &[
+    ("slim", &["*.skim", "*.slim", "*.slime"]),
+    ("smarty", &["*.tpl"]),
+    ("sml", &["*.sml", "*.sig"]),
+    ("soy", &["*.soy"]),
+    ("spark", &["*.spark"]),
+    ("spec", &["*.spec"]),
+    ("sql", &["*.sql", "*.psql"]),
+    ("stylus", &["*.styl"]),
+    ("sv", &["*.v", "*.vg", "*.sv", "*.svh", "*.h"]),
+    ("svg", &["*.svg"]),
+    ("swift", &["*.swift"]),
+    ("swig", &["*.def", "*.i"]),
+    ("systemd", &[
         "*.automount", "*.conf", "*.device", "*.link", "*.mount", "*.path",
         "*.scope", "*.service", "*.slice", "*.socket", "*.swap", "*.target",
         "*.timer",
     ]),
-    (&["taskpaper"], &["*.taskpaper"]),
-    (&["tcl"], &["*.tcl"]),
-    (&["tex"], &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib", "*.dtx", "*.ins"]),
-    (&["texinfo"], &["*.texi"]),
-    (&["textile"], &["*.textile"]),
-    (&["tf"], &[
-        "*.tf", "*.auto.tfvars", "terraform.tfvars", "*.tf.json",
-        "*.auto.tfvars.json", "terraform.tfvars.json", "*.terraformrc",
-        "terraform.rc", "*.tfrc", "*.terraform.lock.hcl",
-    ]),
-    (&["thrift"], &["*.thrift"]),
-    (&["toml"], &["*.toml", "Cargo.lock"]),
-    (&["ts", "typescript"], &["*.ts", "*.tsx", "*.cts", "*.mts"]),
-    (&["twig"], &["*.twig"]),
-    (&["txt"], &["*.txt"]),
-    (&["typoscript"], &["*.typoscript", "*.ts"]),
-    (&["usd"], &["*.usd", "*.usda", "*.usdc"]),
-    (&["v"], &["*.v"]),
-    (&["vala"], &["*.vala"]),
-    (&["vb"], &["*.vb"]),
-    (&["vcl"], &["*.vcl"]),
-    (&["verilog"], &["*.v", "*.vh", "*.sv", "*.svh"]),
-    (&["vhdl"], &["*.vhd", "*.vhdl"]),
-    (&["vim"], &[
-        "*.vim", ".vimrc", ".gvimrc", "vimrc", "gvimrc", "_vimrc", "_gvimrc",
-    ]),
-    (&["vimscript"], &[
-        "*.vim", ".vimrc", ".gvimrc", "vimrc", "gvimrc", "_vimrc", "_gvimrc",
-    ]),
-    (&["webidl"], &["*.idl", "*.webidl", "*.widl"]),
-    (&["wiki"], &["*.mediawiki", "*.wiki"]),
-    (&["xml"], &[
+    ("taskpaper", &["*.taskpaper"]),
+    ("tcl", &["*.tcl"]),
+    ("tex", &["*.tex", "*.ltx", "*.cls", "*.sty", "*.bib", "*.dtx", "*.ins"]),
+    ("textile", &["*.textile"]),
+    ("tf", &["*.tf"]),
+    ("thrift", &["*.thrift"]),
+    ("toml", &["*.toml", "Cargo.lock"]),
+    ("ts", &["*.ts", "*.tsx"]),
+    ("twig", &["*.twig"]),
+    ("txt", &["*.txt"]),
+    ("typoscript", &["*.typoscript", "*.ts"]),
+    ("vala", &["*.vala"]),
+    ("vb", &["*.vb"]),
+    ("vcl", &["*.vcl"]),
+    ("verilog", &["*.v", "*.vh", "*.sv", "*.svh"]),
+    ("vhdl", &["*.vhd", "*.vhdl"]),
+    ("vim", &["*.vim"]),
+    ("vimscript", &["*.vim"]),
+    ("webidl", &["*.idl", "*.webidl", "*.widl"]),
+    ("wiki", &["*.mediawiki", "*.wiki"]),
+    ("xml", &[
         "*.xml", "*.xml.dist", "*.dtd", "*.xsl", "*.xslt", "*.xsd", "*.xjb",
         "*.rng", "*.sch", "*.xhtml",
     ]),
-    (&["xz"], &["*.xz", "*.txz"]),
-    (&["yacc"], &["*.y"]),
-    (&["yaml"], &["*.yaml", "*.yml"]),
-    (&["yang"], &["*.yang"]),
-    (&["z"], &["*.Z"]),
-    (&["zig"], &["*.zig"]),
-    (&["zsh"], &[
+    ("xz", &["*.xz", "*.txz"]),
+    ("yacc", &["*.y"]),
+    ("yaml", &["*.yaml", "*.yml"]),
+    ("yang", &["*.yang"]),
+    ("z", &["*.Z"]),
+    ("zig", &["*.zig"]),
+    ("zsh", &[
         ".zshenv", "zshenv",
         ".zlogin", "zlogin",
         ".zlogout", "zlogout",
@@ -316,25 +252,5 @@ pub const DEFAULT_TYPES: &[(&[&str], &[&str])] = &[
         ".zshrc", "zshrc",
         "*.zsh",
     ]),
-    (&["zstd"], &["*.zst", "*.zstd"]),
+    ("zstd", &["*.zst", "*.zstd"]),
 ];
-
-#[cfg(test)]
-mod tests {
-    use super::DEFAULT_TYPES;
-
-    #[test]
-    fn default_types_are_sorted() {
-        let mut names = DEFAULT_TYPES.iter().map(|(aliases, _)| aliases[0]);
-        let Some(mut previous_name) = names.next() else { return; };
-        for name in names {
-            assert!(
-                name > previous_name,
-                r#""{}" should be sorted before "{}" in `DEFAULT_TYPES`"#,
-                name, previous_name,
-            );
-            previous_name = name;
-        }
-    }
-}
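The `default_types_are_sorted` test above amounts to a pairwise ordering check over the first alias of each entry. A self-contained sketch of that check, using a tiny made-up list rather than the real `DEFAULT_TYPES`:

```rust
// Hypothetical miniature type table in the new aliased format.
const TYPES: &[(&[&str], &[&str])] = &[
    (&["ada"], &["*.adb", "*.ads"]),
    (&["agda"], &["*.agda", "*.lagda"]),
    (&["zig"], &["*.zig"]),
];

/// Returns true if every entry's first alias is strictly greater than the
/// previous entry's first alias (i.e. the table is sorted with no duplicates).
fn is_sorted_by_first_alias(types: &[(&[&str], &[&str])]) -> bool {
    types.windows(2).all(|pair| pair[0].0[0] < pair[1].0[0])
}

fn main() {
    assert!(is_sorted_by_first_alias(TYPES));
    // A swapped pair is detected.
    assert!(!is_sorted_by_first_alias(&[
        (&["zig"], &["*.zig"]),
        (&["ada"], &["*.adb"]),
    ]));
}
```

Using strict `<` also rejects duplicate type names, which matches the intent of keeping the list sorted lexicographically.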


@@ -20,12 +20,12 @@ use std::io::{self, BufRead};
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, RwLock};

-use crate::gitignore::{self, Gitignore, GitignoreBuilder};
-use crate::overrides::{self, Override};
-use crate::pathutil::{is_hidden, strip_prefix};
-use crate::types::{self, Types};
-use crate::walk::DirEntry;
-use crate::{Error, Match, PartialErrorBuilder};
+use gitignore::{self, Gitignore, GitignoreBuilder};
+use overrides::{self, Override};
+use pathutil::{is_hidden, strip_prefix};
+use types::{self, Types};
+use walk::DirEntry;
+use {Error, Match, PartialErrorBuilder};

 /// IgnoreMatch represents information about where a match came from when using
 /// the `Ignore` matcher.
@@ -202,12 +202,11 @@ impl Ignore {
                 errs.maybe_push(err);
                 igtmp.is_absolute_parent = true;
                 igtmp.absolute_base = Some(absolute_base.clone());
-                igtmp.has_git =
-                    if self.0.opts.require_git && self.0.opts.git_ignore {
-                        parent.join(".git").exists()
-                    } else {
-                        false
-                    };
+                igtmp.has_git = if self.0.opts.git_ignore {
+                    parent.join(".git").exists()
+                } else {
+                    false
+                };
                 ig = Ignore(Arc::new(igtmp));
                 compiled.insert(parent.as_os_str().to_os_string(), ig.clone());
             }
@@ -232,9 +231,7 @@ impl Ignore {
     /// Like add_child, but takes a full path and returns an IgnoreInner.
     fn add_child_path(&self, dir: &Path) -> (IgnoreInner, Option<Error>) {
-        let git_type = if self.0.opts.require_git
-            && (self.0.opts.git_ignore || self.0.opts.git_exclude)
-        {
+        let git_type = if self.0.opts.git_ignore || self.0.opts.git_exclude {
             dir.join(".git").metadata().ok().map(|md| md.file_type())
         } else {
             None
@@ -498,7 +495,7 @@ impl Ignore {
     }

     /// Returns an iterator over parent ignore matchers, including this one.
-    pub fn parents(&self) -> Parents<'_> {
+    pub fn parents(&self) -> Parents {
         Parents(Some(self))
     }
@@ -584,7 +581,7 @@ impl IgnoreBuilder {
             .unwrap();
         let (gi, err) = builder.build_global();
         if let Some(err) = err {
-            log::debug!("{}", err);
+            debug!("{}", err);
         }
         gi
     };
@@ -843,10 +840,10 @@ mod tests {
     use std::io::Write;
     use std::path::Path;

-    use crate::dir::IgnoreBuilder;
-    use crate::gitignore::Gitignore;
-    use crate::tests::TempDir;
-    use crate::Error;
+    use dir::IgnoreBuilder;
+    use gitignore::Gitignore;
+    use tests::TempDir;
+    use Error;

     fn wfile<P: AsRef<Path>>(path: P, contents: &str) {
         let mut file = File::create(path).unwrap();


@@ -19,8 +19,8 @@ use globset::{Candidate, GlobBuilder, GlobSet, GlobSetBuilder};
 use regex::bytes::Regex;
 use thread_local::ThreadLocal;

-use crate::pathutil::{is_file_name, strip_prefix};
-use crate::{Error, Match, PartialErrorBuilder};
+use pathutil::{is_file_name, strip_prefix};
+use {Error, Match, PartialErrorBuilder};

 /// Glob represents a single glob in a gitignore file.
 ///
@@ -474,13 +474,10 @@ impl GitignoreBuilder {
         }
         // If it ends with a slash, then this should only match directories,
         // but the slash should otherwise not be used while globbing.
-        if line.as_bytes().last() == Some(&b'/') {
-            glob.is_only_dir = true;
-            line = &line[..line.len() - 1];
-            // If the slash was escaped, then remove the escape.
-            // See: https://github.com/BurntSushi/ripgrep/issues/2236
-            if line.as_bytes().last() == Some(&b'\\') {
-                line = &line[..line.len() - 1];
-            }
-        }
+        if let Some((i, c)) = line.char_indices().rev().nth(0) {
+            if c == '/' {
+                glob.is_only_dir = true;
+                line = &line[..i];
+            }
+        }
         glob.actual = line.to_string();
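The byte-oriented trailing-slash handling in this hunk marks a pattern as directory-only and also drops an escaping backslash before the slash (per ripgrep issue 2236). A standalone sketch of that logic as a hypothetical helper:

```rust
/// Strip a trailing `/` (directory-only marker) from a gitignore line.
/// If the slash was escaped as `\/`, the escaping backslash is removed too.
/// A sketch of the idea, not the crate's exact code.
fn strip_dir_suffix(mut line: &str) -> (&str, bool) {
    let mut is_only_dir = false;
    if line.as_bytes().last() == Some(&b'/') {
        is_only_dir = true;
        line = &line[..line.len() - 1];
        // If the slash was escaped, remove the escape as well.
        if line.as_bytes().last() == Some(&b'\\') {
            line = &line[..line.len() - 1];
        }
    }
    (line, is_only_dir)
}

fn main() {
    assert_eq!(strip_dir_suffix("target/"), ("target", true));
    assert_eq!(strip_dir_suffix("foo\\/"), ("foo", true));
    assert_eq!(strip_dir_suffix("plain"), ("plain", false));
}
```

Working on bytes is safe here because `/` and `\` are single-byte in UTF-8, so the slicing can never split a multi-byte character.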
@@ -533,7 +530,7 @@
 /// Return the file path of the current environment's global gitignore file.
 ///
 /// Note that the file path returned may not exist.
-pub fn gitconfig_excludes_path() -> Option<PathBuf> {
+fn gitconfig_excludes_path() -> Option<PathBuf> {
     // git supports $HOME/.gitconfig and $XDG_CONFIG_HOME/git/config. Notably,
     // both can be active at the same time, where $HOME/.gitconfig takes
     // precedent. So if $HOME/.gitconfig defines a `core.excludesFile`, then
@@ -595,14 +592,9 @@ fn parse_excludes_file(data: &[u8]) -> Option<PathBuf> {
     // N.B. This is the lazy approach, and isn't technically correct, but
     // probably works in more circumstances. I guess we would ideally have
     // a full INI parser. Yuck.
-    lazy_static::lazy_static! {
-        static ref RE: Regex = Regex::new(
-            r"(?xim-u)
-            ^[[:space:]]*excludesfile[[:space:]]*
-            =
-            [[:space:]]*(.+)[[:space:]]*$
-            "
-        ).unwrap();
+    lazy_static! {
+        static ref RE: Regex =
+            Regex::new(r"(?im)^\s*excludesfile\s*=\s*(.+)\s*$").unwrap();
     };
     let caps = match RE.captures(data) {
         None => return None,
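Both variants of the `excludesfile` regex above boil down to finding the first case-insensitive `excludesfile = <path>` line in the config data. A std-only sketch of equivalent line parsing, with a hypothetical helper name (this is not the crate's code, which uses a regex):

```rust
/// Find the value of the first `excludesfile = ...` line, case-insensitively.
/// Like the regex in `parse_excludes_file`, this is a lazy approximation of
/// INI parsing, not a correct parser.
fn find_excludes_file(data: &str) -> Option<&str> {
    for raw in data.lines() {
        let line = raw.trim();
        // Case-insensitive match on the key, as the `(?i)`/`i` flag does.
        let lower = line.to_lowercase();
        if let Some(rest) = lower.strip_prefix("excludesfile") {
            if rest.trim_start().starts_with('=') {
                // Take the value from the original (non-lowercased) line.
                let eq = line.find('=')?;
                return Some(line[eq + 1..].trim());
            }
        }
    }
    None
}

fn main() {
    let config = "[core]\n\texcludesFile = ~/.gitignore\n";
    assert_eq!(find_excludes_file(config), Some("~/.gitignore"));
}
```

The trim on the value mirrors the regex's handling of surrounding whitespace around the `=` and at end of line.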


@@ -46,12 +46,25 @@ See the documentation for `WalkBuilder` for many other options.
 #![deny(missing_docs)]

+extern crate globset;
+#[macro_use]
+extern crate lazy_static;
+#[macro_use]
+extern crate log;
+extern crate memchr;
+extern crate regex;
+extern crate same_file;
+extern crate thread_local;
+extern crate walkdir;
+#[cfg(windows)]
+extern crate winapi_util;
+
 use std::error;
 use std::fmt;
 use std::io;
 use std::path::{Path, PathBuf};

-pub use crate::walk::{
+pub use walk::{
     DirEntry, ParallelVisitor, ParallelVisitorBuilder, Walk, WalkBuilder,
     WalkParallel, WalkState,
 };
@@ -321,7 +334,7 @@ impl error::Error for Error {
 }

 impl fmt::Display for Error {
-    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
         match *self {
             Error::Partial(ref errs) => {
                 let msgs: Vec<String> =


@@ -6,8 +6,8 @@ line tools.
 use std::path::Path;

-use crate::gitignore::{self, Gitignore, GitignoreBuilder};
-use crate::{Error, Match};
+use gitignore::{self, Gitignore, GitignoreBuilder};
+use {Error, Match};

 /// Glob represents a single glob in an override matcher.
 ///
@@ -106,7 +106,6 @@ impl Override {
 }

 /// Builds a matcher for a set of glob overrides.
-#[derive(Clone, Debug)]
 pub struct OverrideBuilder {
     builder: GitignoreBuilder,
 }


@@ -1,7 +1,7 @@
 use std::ffi::OsStr;
 use std::path::Path;

-use crate::walk::DirEntry;
+use walk::DirEntry;

 /// Returns true if and only if this entry is considered to be hidden.
 ///


@@ -93,9 +93,9 @@ use globset::{GlobBuilder, GlobSet, GlobSetBuilder};
 use regex::Regex;
 use thread_local::ThreadLocal;

-use crate::default_types::DEFAULT_TYPES;
-use crate::pathutil::file_name;
-use crate::{Error, Match};
+use default_types::DEFAULT_TYPES;
+use pathutil::file_name;
+use {Error, Match};

 /// Glob represents a single glob in a set of file type definitions.
 ///
@@ -122,6 +122,10 @@ enum GlobInner<'a> {
     Matched {
         /// The file type definition which provided the glob.
         def: &'a FileTypeDef,
+        /// The index of the glob that matched inside the file type definition.
+        which: usize,
+        /// Whether the selection was negated or not.
+        negated: bool,
     },
 }
@@ -287,9 +291,13 @@ impl Types {
         self.set.matches_into(name, &mut *matches);
         // The highest precedent match is the last one.
         if let Some(&i) = matches.last() {
-            let (isel, _) = self.glob_to_selection[i];
+            let (isel, iglob) = self.glob_to_selection[i];
             let sel = &self.selections[isel];
-            let glob = Glob(GlobInner::Matched { def: sel.inner() });
+            let glob = Glob(GlobInner::Matched {
+                def: sel.inner(),
+                which: iglob,
+                negated: sel.is_negated(),
+            });
             return if sel.is_negated() {
                 Match::Ignore(glob)
             } else {
@@ -419,7 +427,7 @@ impl TypesBuilder {
     /// If `name` is `all` or otherwise contains any character that is not a
     /// Unicode letter or number, then an error is returned.
     pub fn add(&mut self, name: &str, glob: &str) -> Result<(), Error> {
-        lazy_static::lazy_static! {
+        lazy_static! {
             static ref RE: Regex = Regex::new(r"^[\pL\pN]+$").unwrap();
         };
         if name == "all" || !RE.is_match(name) {
@@ -488,11 +496,9 @@ impl TypesBuilder {
     /// Add a set of default file type definitions.
     pub fn add_defaults(&mut self) -> &mut TypesBuilder {
         static MSG: &'static str = "adding a default type should never fail";
-        for &(names, exts) in DEFAULT_TYPES {
-            for name in names {
-                for ext in exts {
-                    self.add(name, ext).expect(MSG);
-                }
+        for &(name, exts) in DEFAULT_TYPES {
+            for ext in exts {
+                self.add(name, ext).expect(MSG);
             }
         }
         self
@@ -539,8 +545,6 @@ mod tests {
             "html:*.htm",
             "rust:*.rs",
             "js:*.js",
-            "py:*.py",
-            "python:*.py",
             "foo:*.{rs,foo}",
             "combo:include:html,rust",
         ]
@@ -555,8 +559,6 @@ mod tests {
     matched!(match7, types(), vec!["foo"], vec!["rust"], "main.foo");
     matched!(match8, types(), vec!["combo"], vec![], "index.html");
     matched!(match9, types(), vec!["combo"], vec![], "lib.rs");
-    matched!(match10, types(), vec!["py"], vec![], "main.py");
-    matched!(match11, types(), vec!["python"], vec![], "main.py");

     matched!(not, matchnot1, types(), vec!["rust"], vec![], "index.html");
     matched!(not, matchnot2, types(), vec![], vec!["rust"], "main.rs");
@@ -564,8 +566,6 @@ mod tests {
     matched!(not, matchnot4, types(), vec!["rust"], vec!["foo"], "main.rs");
     matched!(not, matchnot5, types(), vec!["rust"], vec!["foo"], "main.foo");
     matched!(not, matchnot6, types(), vec!["combo"], vec![], "leftpad.js");
matched!(not, matchnot7, types(), vec!["py"], vec![], "index.html");
matched!(not, matchnot8, types(), vec!["python"], vec![], "doc.md");
#[test] #[test]
fn test_invalid_defs() { fn test_invalid_defs() {
@@ -577,7 +577,7 @@ mod tests {
let original_defs = btypes.definitions(); let original_defs = btypes.definitions();
let bad_defs = vec![ let bad_defs = vec![
// Reference to type that does not exist // Reference to type that does not exist
"combo:include:html,qwerty", "combo:include:html,python",
// Bad format // Bad format
"combo:foobar:html,rust", "combo:foobar:html,rust",
"", "",
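The `glob_to_selection` change in the hunk above records not only which selection a matching glob came from, but which glob within that selection matched. That bookkeeping can be sketched without the `ignore` crate; all names below are illustrative, not the crate's API:

```rust
// Globs from every selection are flattened into one matcher set, so a
// reverse map is needed to attribute a flat glob index back to its
// originating selection. `glob_to_selection[i]` holds the pair
// (selection index, glob index within that selection).

fn build_map(selections: &[Vec<&str>]) -> (Vec<String>, Vec<(usize, usize)>) {
    let mut globs = Vec::new();
    let mut glob_to_selection = Vec::new();
    for (isel, sel) in selections.iter().enumerate() {
        for (iglob, glob) in sel.iter().enumerate() {
            globs.push(glob.to_string());
            glob_to_selection.push((isel, iglob));
        }
    }
    (globs, glob_to_selection)
}

fn main() {
    // Two "selections": a rust type with one glob, a web type with two.
    let selections = vec![vec!["*.rs"], vec!["*.html", "*.css"]];
    let (globs, map) = build_map(&selections);
    // Flat index 2 ("*.css") maps back to selection 1, glob 1.
    assert_eq!(globs[2], "*.css");
    assert_eq!(map[2], (1, 1));
}
```

Because the highest-precedence match is the last flat index, this map is what lets that index be resolved to a concrete `FileTypeDef` and glob.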


@@ -13,11 +13,11 @@ use std::vec;
 use same_file::Handle;
 use walkdir::{self, WalkDir};

-use crate::dir::{Ignore, IgnoreBuilder};
-use crate::gitignore::GitignoreBuilder;
-use crate::overrides::Override;
-use crate::types::Types;
-use crate::{Error, PartialErrorBuilder};
+use dir::{Ignore, IgnoreBuilder};
+use gitignore::GitignoreBuilder;
+use overrides::Override;
+use types::Types;
+use {Error, PartialErrorBuilder};

 /// A directory entry with a possible error attached.
 ///
@@ -252,7 +252,7 @@ struct DirEntryRaw {
 }

 impl fmt::Debug for DirEntryRaw {
-    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
         // Leaving out FileType because it doesn't have a debug impl
         // in Rust 1.9. We could add it if we really wanted to by manually
         // querying each possibly file type. Meh. ---AG
@@ -504,7 +504,7 @@ enum Sorter {
 struct Filter(Arc<dyn Fn(&DirEntry) -> bool + Send + Sync + 'static>);

 impl fmt::Debug for WalkBuilder {
-    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
         f.debug_struct("WalkBuilder")
             .field("paths", &self.paths)
             .field("ig_builder", &self.ig_builder)
@@ -934,23 +934,15 @@ impl Walk {
         if ent.depth() == 0 {
             return Ok(false);
         }
-        // We ensure that trivial skipping is done before any other potentially
-        // expensive operations (stat, filesystem other) are done. This seems
-        // like an obvious optimization but becomes critical when filesystem
-        // operations even as simple as stat can result in significant
-        // overheads; an example of this was a bespoke filesystem layer in
-        // Windows that hosted files remotely and would download them on-demand
-        // when particular filesystem operations occurred. Users of this system
-        // who ensured correct file-type filters were being used could still
-        // get unnecessary file access resulting in large downloads.
-        if should_skip_entry(&self.ig, ent) {
-            return Ok(true);
-        }
         if let Some(ref stdout) = self.skip {
             if path_equals(ent, stdout)? {
                 return Ok(true);
             }
         }
+        if should_skip_entry(&self.ig, ent) {
+            return Ok(true);
+        }
         if self.max_filesize.is_some() && !ent.is_dir() {
             return Ok(skip_filesize(
                 self.max_filesize.unwrap(),
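The comment removed in this hunk makes a performance argument: pattern-based skip checks are cheap, while anything that stats a file can be very expensive (the cited case is a remote filesystem that downloads files on access). The ordering principle can be illustrated on its own, with a counter standing in for the expensive operation; the predicates here are hypothetical:

```rust
use std::cell::Cell;

// Run the cheap, pattern-only check first so the expensive check
// (a stand-in for stat() or an on-demand download) is consulted only
// for entries that survive the cheap filter.
fn should_skip(name: &str, expensive_calls: &Cell<usize>) -> bool {
    // Cheap check: ignore-pattern match, no filesystem access.
    if name.ends_with(".o") {
        return true;
    }
    // Expensive check: pretend this requires touching the filesystem.
    expensive_calls.set(expensive_calls.get() + 1);
    false
}

fn main() {
    let calls = Cell::new(0);
    let names = ["a.o", "b.o", "c.rs"];
    let kept: Vec<_> =
        names.iter().filter(|n| !should_skip(n, &calls)).collect();
    assert_eq!(kept, vec![&"c.rs"]);
    // Only the entry that was not ignored paid the expensive check.
    assert_eq!(calls.get(), 1);
}
```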
@@ -1226,7 +1218,7 @@ impl WalkParallel {
     /// visitor runs on only one thread, this build-up can be done without
     /// synchronization. Then, once traversal is complete, all of the results
     /// can be merged together into a single data structure.
-    pub fn visit(mut self, builder: &mut dyn ParallelVisitorBuilder<'_>) {
+    pub fn visit(mut self, builder: &mut dyn ParallelVisitorBuilder) {
         let threads = self.threads();
         let stack = Arc::new(Mutex::new(vec![]));
         {
@@ -1282,7 +1274,7 @@ impl WalkParallel {
         let quit_now = Arc::new(AtomicBool::new(false));
         let num_pending =
             Arc::new(AtomicUsize::new(stack.lock().unwrap().len()));
-        std::thread::scope(|s| {
+        crossbeam_utils::thread::scope(|s| {
             let mut handles = vec![];
             for _ in 0..threads {
                 let worker = Worker {
@@ -1296,12 +1288,13 @@ impl WalkParallel {
                     skip: self.skip.clone(),
                     filter: self.filter.clone(),
                 };
-                handles.push(s.spawn(|| worker.run()));
+                handles.push(s.spawn(|_| worker.run()));
             }
             for handle in handles {
                 handle.join().unwrap();
             }
-        });
+        })
+        .unwrap(); // Pass along panics from threads
     }

     fn threads(&self) -> usize {
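The scope swap above is mostly mechanical: `crossbeam_utils::thread::scope` passes a scope argument to each spawned closure (hence `|_|`) and returns a `Result` carrying worker panics (hence the trailing `.unwrap()`), while `std::thread::scope` (stable since Rust 1.63) takes a no-argument closure and propagates panics itself. A self-contained sketch of the std form:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Spawn `n` scoped workers that all borrow `pending` directly; the scope
// guarantees they finish before the borrow ends, so no Arc is needed.
fn run_workers(n: usize) -> usize {
    let pending = AtomicUsize::new(0);
    thread::scope(|s| {
        let mut handles = vec![];
        for _ in 0..n {
            // std's spawn closure takes no scope argument.
            handles.push(s.spawn(|| {
                pending.fetch_add(1, Ordering::SeqCst);
            }));
        }
        for handle in handles {
            handle.join().unwrap();
        }
    });
    pending.load(Ordering::SeqCst)
}

fn main() {
    assert_eq!(run_workers(4), 4);
}
```

Note that `std::thread::scope` also joins any unjoined handles implicitly at the end of the closure; the explicit join loop mirrors the walker code.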
@@ -1556,11 +1549,6 @@ impl<'s> Worker<'s> {
                 }
             }
         }
-        // N.B. See analogous call in the single-threaded implementation about
-        // why it's important for this to come before the checks below.
-        if should_skip_entry(ig, &dent) {
-            return WalkState::Continue;
-        }
         if let Some(ref stdout) = self.skip {
             let is_stdout = match path_equals(&dent, stdout) {
                 Ok(is_stdout) => is_stdout,
@@ -1570,6 +1558,7 @@ impl<'s> Worker<'s> {
                 return WalkState::Continue;
             }
         }
+        let should_skip_path = should_skip_entry(ig, &dent);
         let should_skip_filesize =
             if self.max_filesize.is_some() && !dent.is_dir() {
                 skip_filesize(
@@ -1586,7 +1575,8 @@ impl<'s> Worker<'s> {
             } else {
                 false
             };
-        if !should_skip_filesize && !should_skip_filtered {
+        if !should_skip_path && !should_skip_filesize && !should_skip_filtered
+        {
             self.send(Work { dent, ignore: ig.clone(), root_device });
         }
         WalkState::Continue
@@ -1681,7 +1671,7 @@ impl<'s> Worker<'s> {
             stack.pop()
         }
     }

-    /// Signal that work has been finished.
+    /// Signal that work has been received.
     fn work_done(&self) {
         self.num_pending.fetch_sub(1, Ordering::SeqCst);
     }
@@ -1724,7 +1714,7 @@ fn skip_filesize(
     if let Some(fs) = filesize {
         if fs > max_filesize {
-            log::debug!("ignoring {}: {} bytes", path.display(), fs);
+            debug!("ignoring {}: {} bytes", path.display(), fs);
             true
         } else {
             false
@@ -1737,10 +1727,10 @@ fn skip_filesize(
 fn should_skip_entry(ig: &Ignore, dent: &DirEntry) -> bool {
     let m = ig.matched_dir_entry(dent);
     if m.is_ignore() {
-        log::debug!("ignoring {}: {:?}", dent.path().display(), m);
+        debug!("ignoring {}: {:?}", dent.path().display(), m);
         true
     } else if m.is_whitelist() {
-        log::debug!("whitelisting {}: {:?}", dent.path().display(), m);
+        debug!("whitelisting {}: {:?}", dent.path().display(), m);
         false
     } else {
         false
@@ -1851,7 +1841,7 @@ mod tests {
     use std::sync::{Arc, Mutex};

     use super::{DirEntry, WalkBuilder, WalkState};
-    use crate::tests::TempDir;
+    use tests::TempDir;

     fn wfile<P: AsRef<Path>>(path: P, contents: &str) {
         let mut file = File::create(path).unwrap();


@@ -1,3 +1,5 @@
+extern crate ignore;
+
 use std::path::Path;

 use ignore::gitignore::{Gitignore, GitignoreBuilder};


@@ -1,6 +1,6 @@
 [package]
 name = "grep-matcher"
-version = "0.1.6" #:version
+version = "0.1.4" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 A trait for regular expressions, with a focus on line oriented search.
@@ -10,9 +10,8 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/matcher"
 repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/matcher"
 readme = "README.md"
 keywords = ["regex", "pattern", "trait"]
-license = "Unlicense OR MIT"
+license = "Unlicense/MIT"
 autotests = false
-edition = "2018"

 [dependencies]
 memchr = "2.1"


@@ -27,3 +27,9 @@ Add this to your `Cargo.toml`:
 [dependencies]
 grep-matcher = "0.1"
 ```
+
+and this to your crate root:
+
+```rust
+extern crate grep_matcher;
+```


@@ -92,7 +92,7 @@ impl From<usize> for Ref<'static> {
 /// starting at the beginning of `replacement`.
 ///
 /// If no such valid reference could be found, None is returned.
-fn find_cap_ref(replacement: &[u8]) -> Option<CaptureRef<'_>> {
+fn find_cap_ref(replacement: &[u8]) -> Option<CaptureRef> {
     let mut i = 0;
     if replacement.len() <= 1 || replacement[0] != b'$' {
         return None;
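`find_cap_ref` scans a replacement string for `$`-style capture references. As a rough, illustrative sketch (the real parser also handles `${name}` braces and stricter name rules; the tuple return here is invented for the demo), the scanning loop looks like:

```rust
// Recognize `$name` or `$123` at the start of `replacement`, returning
// the referenced name and the number of bytes consumed, or None if the
// input does not begin with a valid capture reference.
fn find_cap_ref(replacement: &[u8]) -> Option<(String, usize)> {
    if replacement.len() <= 1 || replacement[0] != b'$' {
        return None;
    }
    let mut i = 1;
    while i < replacement.len()
        && (replacement[i].is_ascii_alphanumeric() || replacement[i] == b'_')
    {
        i += 1;
    }
    if i == 1 {
        // A lone `$` followed by a non-name byte is not a reference.
        return None;
    }
    Some((String::from_utf8_lossy(&replacement[1..i]).into_owned(), i))
}

fn main() {
    assert_eq!(find_cap_ref(b"$foo rest"), Some(("foo".to_string(), 4)));
    assert_eq!(find_cap_ref(b"$1"), Some(("1".to_string(), 2)));
    assert_eq!(find_cap_ref(b"no dollar"), None);
}
```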


@@ -38,12 +38,14 @@ implementations.
 #![deny(missing_docs)]

+extern crate memchr;
+
 use std::fmt;
 use std::io;
 use std::ops;
 use std::u64;

-use crate::interpolate::interpolate;
+use interpolate::interpolate;

 mod interpolate;
@@ -116,7 +118,7 @@ impl Match {
     /// This method panics if `start > self.end`.
     #[inline]
     pub fn with_start(&self, start: usize) -> Match {
-        assert!(start <= self.end, "{} is not <= {}", start, self.end);
+        assert!(start <= self.end);
         Match { start, ..*self }
     }
@@ -128,7 +130,7 @@ impl Match {
     /// This method panics if `self.start > end`.
     #[inline]
     pub fn with_end(&self, end: usize) -> Match {
-        assert!(self.start <= end, "{} is not <= {}", self.start, end);
+        assert!(self.start <= end);
         Match { end, ..*self }
     }
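Both sides of this hunk guard the same invariant, `start <= end`; the removed variant just attaches a panic message. A standalone sketch of the invariant on a minimal `Match`-like type (a simplified stand-in, not the crate's type):

```rust
// A half-open byte range [start, end) that must always satisfy
// start <= end, re-checked whenever either endpoint is adjusted.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Match {
    start: usize,
    end: usize,
}

impl Match {
    fn new(start: usize, end: usize) -> Match {
        assert!(start <= end, "{} is not <= {}", start, end);
        Match { start, end }
    }

    fn with_start(&self, start: usize) -> Match {
        assert!(start <= self.end, "{} is not <= {}", start, self.end);
        Match { start, ..*self }
    }

    fn with_end(&self, end: usize) -> Match {
        assert!(self.start <= end, "{} is not <= {}", self.start, end);
        Match { end, ..*self }
    }
}

fn main() {
    let m = Match::new(2, 10);
    assert_eq!(m.with_start(5), Match::new(5, 10));
    assert_eq!(m.with_end(7), Match::new(2, 7));
}
```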
@@ -302,7 +304,7 @@ pub struct ByteSet(BitSet);
 struct BitSet([u64; 4]);

 impl fmt::Debug for BitSet {
-    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
         let mut fmtd = f.debug_set();
         for b in (0..256).map(|b| b as u8) {
             if ByteSet(*self).contains(b) {
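`BitSet([u64; 4])` is a 256-bit set with one bit per possible byte value: bit `b` lives in word `b / 64` at position `b % 64`. A runnable sketch of that arithmetic (`insert`/`contains` here are illustrative; the `Debug` impl above iterates all 256 bits the same way):

```rust
// Four 64-bit words give exactly 256 bits: one flag per byte value.
#[derive(Clone, Copy)]
struct BitSet([u64; 4]);

impl BitSet {
    fn new() -> BitSet {
        BitSet([0; 4])
    }

    fn insert(&mut self, b: u8) {
        // Word index selects the u64; the remainder selects the bit.
        self.0[b as usize / 64] |= 1 << (b as usize % 64);
    }

    fn contains(&self, b: u8) -> bool {
        self.0[b as usize / 64] & (1 << (b as usize % 64)) != 0
    }
}

fn main() {
    let mut set = BitSet::new();
    set.insert(b'\n');
    set.insert(0xFF);
    assert!(set.contains(b'\n'));
    assert!(set.contains(0xFF));
    assert!(!set.contains(b'a'));
}
```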
@@ -492,7 +494,7 @@ impl ::std::error::Error for NoError {
 }

 impl fmt::Display for NoError {
-    fn fmt(&self, _: &mut fmt::Formatter<'_>) -> fmt::Result {
+    fn fmt(&self, _: &mut fmt::Formatter) -> fmt::Result {
         panic!("BUG for NoError: an impossible error occurred")
     }
 }
@@ -616,31 +618,12 @@ pub trait Matcher {
     fn find_iter<F>(
         &self,
         haystack: &[u8],
-        matched: F,
-    ) -> Result<(), Self::Error>
-    where
-        F: FnMut(Match) -> bool,
-    {
-        self.find_iter_at(haystack, 0, matched)
-    }
-
-    /// Executes the given function over successive non-overlapping matches
-    /// in `haystack`. If no match exists, then the given function is never
-    /// called. If the function returns `false`, then iteration stops.
-    ///
-    /// The significance of the starting point is that it takes the surrounding
-    /// context into consideration. For example, the `\A` anchor can only
-    /// match when `at == 0`.
-    fn find_iter_at<F>(
-        &self,
-        haystack: &[u8],
-        at: usize,
         mut matched: F,
     ) -> Result<(), Self::Error>
     where
         F: FnMut(Match) -> bool,
     {
-        self.try_find_iter_at(haystack, at, |m| Ok(matched(m)))
+        self.try_find_iter(haystack, |m| Ok(matched(m)))
             .map(|r: Result<(), ()>| r.unwrap())
     }
@@ -654,35 +637,12 @@ pub trait Matcher {
     fn try_find_iter<F, E>(
         &self,
         haystack: &[u8],
-        matched: F,
-    ) -> Result<Result<(), E>, Self::Error>
-    where
-        F: FnMut(Match) -> Result<bool, E>,
-    {
-        self.try_find_iter_at(haystack, 0, matched)
-    }
-
-    /// Executes the given function over successive non-overlapping matches
-    /// in `haystack`. If no match exists, then the given function is never
-    /// called. If the function returns `false`, then iteration stops.
-    /// Similarly, if the function returns an error then iteration stops and
-    /// the error is yielded. If an error occurs while executing the search,
-    /// then it is converted to `E`.
-    ///
-    /// The significance of the starting point is that it takes the surrounding
-    /// context into consideration. For example, the `\A` anchor can only
-    /// match when `at == 0`.
-    fn try_find_iter_at<F, E>(
-        &self,
-        haystack: &[u8],
-        at: usize,
         mut matched: F,
     ) -> Result<Result<(), E>, Self::Error>
     where
         F: FnMut(Match) -> Result<bool, E>,
     {
-        let mut last_end = at;
+        let mut last_end = 0;
         let mut last_match = None;
         loop {
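The deleted `*_at` variants differ from their plain counterparts only in seeding the cursor with `last_end = at` instead of `0`, so iteration resumes mid-haystack while anchors still see true offsets. The non-overlapping loop can be sketched with a naive literal matcher standing in for a real `Matcher` implementation (all names here are illustrative):

```rust
// Find the first occurrence of `needle` at or after offset `at`,
// returning its half-open span in absolute haystack coordinates.
fn find_at(haystack: &[u8], needle: &[u8], at: usize) -> Option<(usize, usize)> {
    haystack[at..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| (at + i, at + i + needle.len()))
}

// Collect successive non-overlapping matches, driven by `last_end`.
fn find_iter_at(haystack: &[u8], needle: &[u8], at: usize) -> Vec<(usize, usize)> {
    let mut matches = vec![];
    let mut last_end = at;
    while let Some(m) = find_at(haystack, needle, last_end) {
        last_end = m.1; // resume after the match: non-overlapping
        matches.push(m);
    }
    matches
}

fn main() {
    let hay = b"abcabcabc";
    assert_eq!(find_iter_at(hay, b"abc", 0), vec![(0, 3), (3, 6), (6, 9)]);
    // Starting at 1 skips the match at offset 0.
    assert_eq!(find_iter_at(hay, b"abc", 1), vec![(3, 6), (6, 9)]);
}
```

The real trait additionally tracks `last_match` to handle empty matches without looping forever; that detail is omitted here.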
@@ -736,33 +696,12 @@ pub trait Matcher {
         &self,
         haystack: &[u8],
         caps: &mut Self::Captures,
-        matched: F,
-    ) -> Result<(), Self::Error>
-    where
-        F: FnMut(&Self::Captures) -> bool,
-    {
-        self.captures_iter_at(haystack, 0, caps, matched)
-    }
-
-    /// Executes the given function over successive non-overlapping matches
-    /// in `haystack` with capture groups extracted from each match. If no
-    /// match exists, then the given function is never called. If the function
-    /// returns `false`, then iteration stops.
-    ///
-    /// The significance of the starting point is that it takes the surrounding
-    /// context into consideration. For example, the `\A` anchor can only
-    /// match when `at == 0`.
-    fn captures_iter_at<F>(
-        &self,
-        haystack: &[u8],
-        at: usize,
-        caps: &mut Self::Captures,
         mut matched: F,
     ) -> Result<(), Self::Error>
     where
         F: FnMut(&Self::Captures) -> bool,
     {
-        self.try_captures_iter_at(haystack, at, caps, |caps| Ok(matched(caps)))
+        self.try_captures_iter(haystack, caps, |caps| Ok(matched(caps)))
             .map(|r: Result<(), ()>| r.unwrap())
     }
@@ -777,36 +716,12 @@ pub trait Matcher {
         &self,
         haystack: &[u8],
         caps: &mut Self::Captures,
-        matched: F,
-    ) -> Result<Result<(), E>, Self::Error>
-    where
-        F: FnMut(&Self::Captures) -> Result<bool, E>,
-    {
-        self.try_captures_iter_at(haystack, 0, caps, matched)
-    }
-
-    /// Executes the given function over successive non-overlapping matches
-    /// in `haystack` with capture groups extracted from each match. If no
-    /// match exists, then the given function is never called. If the function
-    /// returns `false`, then iteration stops. Similarly, if the function
-    /// returns an error then iteration stops and the error is yielded. If
-    /// an error occurs while executing the search, then it is converted to
-    /// `E`.
-    ///
-    /// The significance of the starting point is that it takes the surrounding
-    /// context into consideration. For example, the `\A` anchor can only
-    /// match when `at == 0`.
-    fn try_captures_iter_at<F, E>(
-        &self,
-        haystack: &[u8],
-        at: usize,
-        caps: &mut Self::Captures,
         mut matched: F,
     ) -> Result<Result<(), E>, Self::Error>
     where
         F: FnMut(&Self::Captures) -> Result<bool, E>,
     {
-        let mut last_end = at;
+        let mut last_end = 0;
         let mut last_match = None;
         loop {
@@ -904,35 +819,13 @@ pub trait Matcher {
         haystack: &[u8],
         caps: &mut Self::Captures,
         dst: &mut Vec<u8>,
-        append: F,
-    ) -> Result<(), Self::Error>
-    where
-        F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool,
-    {
-        self.replace_with_captures_at(haystack, 0, caps, dst, append)
-    }
-
-    /// Replaces every match in the given haystack with the result of calling
-    /// `append` with the matching capture groups.
-    ///
-    /// If the given `append` function returns `false`, then replacement stops.
-    ///
-    /// The significance of the starting point is that it takes the surrounding
-    /// context into consideration. For example, the `\A` anchor can only
-    /// match when `at == 0`.
-    fn replace_with_captures_at<F>(
-        &self,
-        haystack: &[u8],
-        at: usize,
-        caps: &mut Self::Captures,
-        dst: &mut Vec<u8>,
         mut append: F,
     ) -> Result<(), Self::Error>
     where
         F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool,
     {
-        let mut last_match = at;
-        self.captures_iter_at(haystack, at, caps, |caps| {
+        let mut last_match = 0;
+        self.captures_iter(haystack, caps, |caps| {
             let m = caps.get(0).unwrap();
             dst.extend(&haystack[last_match..m.start]);
             last_match = m.end;
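`replace_with_captures` builds its output by alternating the unreplaced gap before each match with the caller-supplied replacement, tracking `last_match` (which the removed `_at` variant seeds with `at`). A simplified sketch using a literal needle and a fixed replacement in place of capture interpolation:

```rust
// Replace every occurrence of `needle` in `haystack` with `with`,
// stitching together gap-before-match + replacement, then the tail.
fn replace_all(haystack: &[u8], needle: &[u8], with: &[u8]) -> Vec<u8> {
    let mut dst = Vec::new();
    let mut last_match = 0;
    while let Some(i) = haystack[last_match..]
        .windows(needle.len())
        .position(|w| w == needle)
        .map(|i| last_match + i)
    {
        dst.extend(&haystack[last_match..i]); // text before the match
        dst.extend(with); // the replacement
        last_match = i + needle.len();
    }
    dst.extend(&haystack[last_match..]); // trailing text after the last match
    dst
}

fn main() {
    assert_eq!(replace_all(b"a-b-c", b"-", b"+"), b"a+b+c".to_vec());
}
```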
@@ -1146,18 +1039,6 @@ impl<'a, M: Matcher> Matcher for &'a M {
         (*self).find_iter(haystack, matched)
     }

-    fn find_iter_at<F>(
-        &self,
-        haystack: &[u8],
-        at: usize,
-        matched: F,
-    ) -> Result<(), Self::Error>
-    where
-        F: FnMut(Match) -> bool,
-    {
-        (*self).find_iter_at(haystack, at, matched)
-    }
-
     fn try_find_iter<F, E>(
         &self,
         haystack: &[u8],
@@ -1169,18 +1050,6 @@ impl<'a, M: Matcher> Matcher for &'a M {
         (*self).try_find_iter(haystack, matched)
     }

-    fn try_find_iter_at<F, E>(
-        &self,
-        haystack: &[u8],
-        at: usize,
-        matched: F,
-    ) -> Result<Result<(), E>, Self::Error>
-    where
-        F: FnMut(Match) -> Result<bool, E>,
-    {
-        (*self).try_find_iter_at(haystack, at, matched)
-    }
-
     fn captures(
         &self,
         haystack: &[u8],
@@ -1201,19 +1070,6 @@ impl<'a, M: Matcher> Matcher for &'a M {
         (*self).captures_iter(haystack, caps, matched)
     }

-    fn captures_iter_at<F>(
-        &self,
-        haystack: &[u8],
-        at: usize,
-        caps: &mut Self::Captures,
-        matched: F,
-    ) -> Result<(), Self::Error>
-    where
-        F: FnMut(&Self::Captures) -> bool,
-    {
-        (*self).captures_iter_at(haystack, at, caps, matched)
-    }
-
     fn try_captures_iter<F, E>(
         &self,
         haystack: &[u8],
@@ -1226,19 +1082,6 @@ impl<'a, M: Matcher> Matcher for &'a M {
         (*self).try_captures_iter(haystack, caps, matched)
     }

-    fn try_captures_iter_at<F, E>(
-        &self,
-        haystack: &[u8],
-        at: usize,
-        caps: &mut Self::Captures,
-        matched: F,
-    ) -> Result<Result<(), E>, Self::Error>
-    where
-        F: FnMut(&Self::Captures) -> Result<bool, E>,
-    {
-        (*self).try_captures_iter_at(haystack, at, caps, matched)
-    }
-
     fn replace<F>(
         &self,
         haystack: &[u8],
@@ -1264,20 +1107,6 @@ impl<'a, M: Matcher> Matcher for &'a M {
         (*self).replace_with_captures(haystack, caps, dst, append)
     }

-    fn replace_with_captures_at<F>(
-        &self,
-        haystack: &[u8],
-        at: usize,
-        caps: &mut Self::Captures,
-        dst: &mut Vec<u8>,
-        append: F,
-    ) -> Result<(), Self::Error>
-    where
-        F: FnMut(&Self::Captures, &mut Vec<u8>) -> bool,
-    {
-        (*self).replace_with_captures_at(haystack, at, caps, dst, append)
-    }
-
     fn is_match(&self, haystack: &[u8]) -> Result<bool, Self::Error> {
         (*self).is_match(haystack)
     }


@@ -1,7 +1,7 @@
 use grep_matcher::{Captures, Match, Matcher};
 use regex::bytes::Regex;

-use crate::util::{RegexMatcher, RegexMatcherNoCaps};
+use util::{RegexMatcher, RegexMatcherNoCaps};

 fn matcher(pattern: &str) -> RegexMatcher {
     RegexMatcher::new(Regex::new(pattern).unwrap())


@@ -1,3 +1,6 @@
+extern crate grep_matcher;
+extern crate regex;
+
 mod util;

 mod test_matcher;


@@ -1,6 +1,6 @@
 [package]
 name = "grep-pcre2"
-version = "0.1.6" #:version
+version = "0.1.4" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 Use PCRE2 with the 'grep' crate.
@@ -10,10 +10,8 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/pcre2"
 repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/pcre2"
 readme = "README.md"
 keywords = ["regex", "grep", "pcre", "backreference", "look"]
-license = "Unlicense OR MIT"
-edition = "2018"
+license = "Unlicense/MIT"

 [dependencies]
-grep-matcher = { version = "0.1.6", path = "../matcher" }
-log = "0.4.19"
-pcre2 = "0.2.4"
+grep-matcher = { version = "0.1.2", path = "../matcher" }
+pcre2 = "0.2.0"


@@ -30,3 +30,9 @@ Add this to your `Cargo.toml`:
 [dependencies]
 grep-pcre2 = "0.1"
 ```
+
+and this to your crate root:
+
+```rust
+extern crate grep_pcre2;
+```


@@ -50,7 +50,7 @@ impl error::Error for Error {
 }

 impl fmt::Display for Error {
-    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
         match self.kind {
             ErrorKind::Regex(ref s) => write!(f, "{}", s),
             ErrorKind::__Nonexhaustive => unreachable!(),


@@ -5,8 +5,11 @@ An implementation of `grep-matcher`'s `Matcher` trait for
 #![deny(missing_docs)]

-pub use crate::error::{Error, ErrorKind};
-pub use crate::matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
+extern crate grep_matcher;
+extern crate pcre2;
+
+pub use error::{Error, ErrorKind};
+pub use matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};
 pub use pcre2::{is_jit_available, version};

 mod error;


@@ -3,7 +3,7 @@ use std::collections::HashMap;
 use grep_matcher::{Captures, Match, Matcher};
 use pcre2::bytes::{CaptureLocations, Regex, RegexBuilder};

-use crate::error::Error;
+use error::Error;

 /// A builder for configuring the compilation of a PCRE2 regex.
 #[derive(Clone, Debug)]
@@ -11,8 +11,6 @@ pub struct RegexMatcherBuilder {
     builder: RegexBuilder,
     case_smart: bool,
     word: bool,
-    fixed_strings: bool,
-    whole_line: bool,
 }

 impl RegexMatcherBuilder {
@@ -22,8 +20,6 @@ impl RegexMatcherBuilder {
             builder: RegexBuilder::new(),
             case_smart: false,
             word: false,
-            fixed_strings: false,
-            whole_line: false,
         }
     }
@@ -33,40 +29,17 @@ impl RegexMatcherBuilder {
     /// If there was a problem compiling the pattern, then an error is
     /// returned.
     pub fn build(&self, pattern: &str) -> Result<RegexMatcher, Error> {
-        self.build_many(&[pattern])
-    }
-
-    /// Compile all of the given patterns into a single regex that matches when
-    /// at least one of the patterns matches.
-    ///
-    /// If there was a problem building the regex, then an error is returned.
-    pub fn build_many<P: AsRef<str>>(
-        &self,
-        patterns: &[P],
-    ) -> Result<RegexMatcher, Error> {
         let mut builder = self.builder.clone();
-        let mut pats = Vec::with_capacity(patterns.len());
-        for p in patterns.iter() {
-            pats.push(if self.fixed_strings {
-                format!("(?:{})", pcre2::escape(p.as_ref()))
-            } else {
-                format!("(?:{})", p.as_ref())
-            });
-        }
-        let mut singlepat = pats.join("|");
-        if self.case_smart && !has_uppercase_literal(&singlepat) {
+        if self.case_smart && !has_uppercase_literal(pattern) {
             builder.caseless(true);
         }
-        if self.whole_line {
-            singlepat = format!(r"(?m:^)(?:{})(?m:$)", singlepat);
-        } else if self.word {
-            // We make this option exclusive with whole_line because when
-            // whole_line is enabled, all matches necessary fall on word
-            // boundaries. So this extra goop is strictly redundant.
-            singlepat = format!(r"(?<!\w)(?:{})(?!\w)", singlepat);
-        }
-        log::trace!("final regex: {:?}", singlepat);
-        builder.build(&singlepat).map_err(Error::regex).map(|regex| {
+        let res = if self.word {
+            let pattern = format!(r"(?<!\w)(?:{})(?!\w)", pattern);
+            builder.build(&pattern)
+        } else {
+            builder.build(pattern)
+        };
+        res.map_err(Error::regex).map(|regex| {
             let mut names = HashMap::new();
             for (i, name) in regex.capture_names().iter().enumerate() {
                 if let Some(ref name) = *name {
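The removed `build_many` path shows how several patterns become one regex: each pattern is wrapped in `(?:...)`, the wrappers are joined with `|`, and the whole alternation is wrapped once more for whole-line or word matching (these appear to correspond to ripgrep's `-x` and `-w` flags). The string composition can be sketched on its own, with no regex compilation involved:

```rust
// Compose one regex pattern string from several. `whole_line` wins over
// `word` because a full-line match already falls on word boundaries.
fn compose(pats: &[&str], whole_line: bool, word: bool) -> String {
    let mut single: String = pats
        .iter()
        .map(|p| format!("(?:{})", p)) // isolate each alternative
        .collect::<Vec<_>>()
        .join("|");
    if whole_line {
        single = format!(r"(?m:^)(?:{})(?m:$)", single);
    } else if word {
        single = format!(r"(?<!\w)(?:{})(?!\w)", single);
    }
    single
}

fn main() {
    assert_eq!(compose(&["foo", "bar"], false, false), "(?:foo)|(?:bar)");
    assert_eq!(compose(&["foo"], true, false), r"(?m:^)(?:(?:foo))(?m:$)");
}
```

The `fixed_strings` option in the removed code does the same thing after escaping each pattern (via `pcre2::escape`), so literal strings survive the composition unchanged.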
@@ -171,21 +144,6 @@ impl RegexMatcherBuilder {
         self
     }

-    /// Whether the patterns should be treated as literal strings or not. When
-    /// this is active, all characters, including ones that would normally be
-    /// special regex meta characters, are matched literally.
-    pub fn fixed_strings(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
-        self.fixed_strings = yes;
-        self
-    }
-
-    /// Whether each pattern should match the entire line or not. This is
-    /// equivalent to surrounding the pattern with `(?m:^)` and `(?m:$)`.
-    pub fn whole_line(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
-        self.whole_line = yes;
-        self
-    }
-
     /// Enable Unicode matching mode.
     ///
     /// When enabled, the following patterns become Unicode aware: `\b`, `\B`,
@@ -220,22 +178,23 @@ impl RegexMatcherBuilder {
         self
     }

-    /// This is now deprecated and is a no-op.
+    /// When UTF matching mode is enabled, this will disable the UTF checking
+    /// that PCRE2 will normally perform automatically. If UTF matching mode
+    /// is not enabled, then this has no effect.
     ///
-    /// Previously, this option permitted disabling PCRE2's UTF-8 validity
-    /// check, which could result in undefined behavior if the haystack was
-    /// not valid UTF-8. But PCRE2 introduced a new option,
-    /// `PCRE2_MATCH_INVALID_UTF`, in 10.34 which this crate always sets.
-    /// When this option is enabled, PCRE2 claims to not have undefined
-    /// behavior when the haystack is invalid UTF-8.
+    /// UTF checking is enabled by default when UTF matching mode is enabled.
+    /// If UTF matching mode is enabled and UTF checking is enabled, then PCRE2
+    /// will return an error if you attempt to search a subject string that is
+    /// not valid UTF-8.
     ///
-    /// Therefore, disabling the UTF-8 check is not something that is exposed
-    /// by this crate.
-    #[deprecated(
-        since = "0.2.4",
-        note = "now a no-op due to new PCRE2 features"
-    )]
-    pub fn disable_utf_check(&mut self) -> &mut RegexMatcherBuilder {
+    /// # Safety
+    ///
+    /// It is undefined behavior to disable the UTF check in UTF matching mode
+    /// and search a subject string that is not valid UTF-8. When the UTF check
+    /// is disabled, callers must guarantee that the subject string is valid
+    /// UTF-8.
+    pub unsafe fn disable_utf_check(&mut self) -> &mut RegexMatcherBuilder {
+        self.builder.disable_utf_check();
         self
     }
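One side of this hunk marks `disable_utf_check` as `unsafe`: skipping validation and then searching invalid UTF-8 is undefined behavior in PCRE2. In safe Rust, the check that PCRE2 would otherwise perform is ordinary UTF-8 validation; a sketch of rejecting an invalid subject up front (the function name is illustrative):

```rust
// With checking enabled, an invalid subject is rejected with an error
// rather than handed to the matcher.
fn checked_subject(subject: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(subject)
}

fn main() {
    assert!(checked_subject(b"valid ascii").is_ok());
    // 0xFF can never appear in well-formed UTF-8.
    assert!(checked_subject(&[0xFF, 0xFE]).is_err());
}
```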


@@ -1,6 +1,6 @@
 [package]
 name = "grep-printer"
-version = "0.1.7" #:version
+version = "0.1.5" #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 An implementation of the grep crate's Sink trait that provides standard
@@ -11,21 +11,21 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/printer"
 repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/printer"
 readme = "README.md"
 keywords = ["grep", "pattern", "print", "printer", "sink"]
-license = "Unlicense OR MIT"
-edition = "2018"
+license = "Unlicense/MIT"

 [features]
 default = ["serde1"]
-serde1 = ["base64", "serde", "serde_json"]
+serde1 = ["base64", "serde", "serde_derive", "serde_json"]

 [dependencies]
-base64 = { version = "0.20.0", optional = true }
-bstr = "1.6.0"
-grep-matcher = { version = "0.1.6", path = "../matcher" }
-grep-searcher = { version = "0.1.11", path = "../searcher" }
+base64 = { version = "0.13.0", optional = true }
+bstr = "0.2.0"
+grep-matcher = { version = "0.1.2", path = "../matcher" }
+grep-searcher = { version = "0.1.4", path = "../searcher" }
 termcolor = "1.0.4"
-serde = { version = "1.0.77", optional = true, features = ["derive"] }
+serde = { version = "1.0.77", optional = true }
+serde_derive = { version = "1.0.77", optional = true }
 serde_json = { version = "1.0.27", optional = true }

 [dev-dependencies]
-grep-regex = { version = "0.1.11", path = "../regex" }
+grep-regex = { version = "0.1.3", path = "../regex" }


@@ -26,3 +26,9 @@ Add this to your `Cargo.toml`:
[dependencies] [dependencies]
grep-printer = "0.1" grep-printer = "0.1"
``` ```
and this to your crate root:
```rust
extern crate grep_printer;
```


@@ -60,7 +60,7 @@ impl ColorError {
} }
impl fmt::Display for ColorError { impl fmt::Display for ColorError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match *self { match *self {
ColorError::UnrecognizedOutType(ref name) => write!( ColorError::UnrecognizedOutType(ref name) => write!(
f, f,
@@ -147,6 +147,9 @@ pub struct ColorSpecs {
/// A `UserColorSpec` can also be converted to a `termcolor::ColorSpec`: /// A `UserColorSpec` can also be converted to a `termcolor::ColorSpec`:
/// ///
/// ```rust /// ```rust
/// extern crate grep_printer;
/// extern crate termcolor;
///
/// # fn main() { /// # fn main() {
/// use termcolor::{Color, ColorSpec}; /// use termcolor::{Color, ColorSpec};
/// use grep_printer::UserColorSpec; /// use grep_printer::UserColorSpec;


@@ -4,14 +4,14 @@ use std::time::Instant;
use grep_matcher::{Match, Matcher}; use grep_matcher::{Match, Matcher};
use grep_searcher::{ use grep_searcher::{
Searcher, Sink, SinkContext, SinkContextKind, SinkFinish, SinkMatch, Searcher, Sink, SinkContext, SinkContextKind, SinkError, SinkFinish,
SinkMatch,
}; };
use serde_json as json; use serde_json as json;
use crate::counter::CounterWriter; use counter::CounterWriter;
use crate::jsont; use jsont;
use crate::stats::Stats; use stats::Stats;
use crate::util::find_iter_at_in_context;
/// The configuration for the JSON printer. /// The configuration for the JSON printer.
/// ///
@@ -147,7 +147,7 @@ impl JSONBuilder {
/// is not limited to UTF-8 exclusively, which in turn implies that matches /// is not limited to UTF-8 exclusively, which in turn implies that matches
/// may be reported that contain invalid UTF-8. Moreover, this printer may /// may be reported that contain invalid UTF-8. Moreover, this printer may
/// also print file paths, and the encoding of file paths is itself not /// also print file paths, and the encoding of file paths is itself not
/// guaranteed to be valid UTF-8. Therefore, this printer must deal with the /// guarnateed to be valid UTF-8. Therefore, this printer must deal with the
/// presence of invalid UTF-8 somehow. The printer could silently ignore such /// presence of invalid UTF-8 somehow. The printer could silently ignore such
/// things completely, or even lossily transcode invalid UTF-8 to valid UTF-8 /// things completely, or even lossily transcode invalid UTF-8 to valid UTF-8
/// by replacing all invalid sequences with the Unicode replacement character. /// by replacing all invalid sequences with the Unicode replacement character.
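The doc comment above describes the JSON printer's policy for invalid UTF-8: pass valid UTF-8 through as text, and report anything else as raw data rather than silently dropping or lossily transcoding it. As an aside for readers of this hunk, here is a minimal std-only sketch of that fallback (the enum name mirrors the crate's `jsont::Data`, but this is a simplified stand-in, not the crate's actual type, and the real printer base64-encodes the byte variant for JSON output):

```rust
// Valid UTF-8 is reported as text; anything else falls back to the raw
// bytes instead of being lossily transcoded with replacement characters.
#[derive(Debug, PartialEq)]
enum Data<'a> {
    Text(&'a str),
    Bytes(&'a [u8]),
}

fn from_bytes(bytes: &[u8]) -> Data<'_> {
    match std::str::from_utf8(bytes) {
        Ok(text) => Data::Text(text),
        Err(_) => Data::Bytes(bytes),
    }
}

fn main() {
    assert_eq!(from_bytes(b"hello"), Data::Text("hello"));
    // 0xFF never appears in valid UTF-8, so this is kept as raw bytes.
    assert_eq!(from_bytes(b"\xFFhello"), Data::Bytes(b"\xFFhello"));
    println!("ok");
}
```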
@@ -507,10 +507,7 @@ impl<W: io::Write> JSON<W> {
/// Write the given message followed by a new line. The new line is /// Write the given message followed by a new line. The new line is
/// determined from the configuration of the given searcher. /// determined from the configuration of the given searcher.
fn write_message( fn write_message(&mut self, message: &jsont::Message) -> io::Result<()> {
&mut self,
message: &jsont::Message<'_>,
) -> io::Result<()> {
if self.config.pretty { if self.config.pretty {
json::to_writer_pretty(&mut self.wtr, message)?; json::to_writer_pretty(&mut self.wtr, message)?;
} else { } else {
@@ -555,7 +552,7 @@ impl<W> JSON<W> {
/// * `W` refers to the underlying writer that this printer is writing its /// * `W` refers to the underlying writer that this printer is writing its
/// output to. /// output to.
#[derive(Debug)] #[derive(Debug)]
pub struct JSONSink<'p, 's, M: Matcher, W> { pub struct JSONSink<'p, 's, M: Matcher, W: 's> {
matcher: M, matcher: M,
json: &'s mut JSON<W>, json: &'s mut JSON<W>,
path: Option<&'p Path>, path: Option<&'p Path>,
@@ -606,12 +603,7 @@ impl<'p, 's, M: Matcher, W: io::Write> JSONSink<'p, 's, M, W> {
/// Execute the matcher over the given bytes and record the match /// Execute the matcher over the given bytes and record the match
/// locations if the current configuration demands match granularity. /// locations if the current configuration demands match granularity.
fn record_matches( fn record_matches(&mut self, bytes: &[u8]) -> io::Result<()> {
&mut self,
searcher: &Searcher,
bytes: &[u8],
range: std::ops::Range<usize>,
) -> io::Result<()> {
self.json.matches.clear(); self.json.matches.clear();
// If printing requires knowing the location of each individual match, // If printing requires knowing the location of each individual match,
// then compute and store those right now for use later. While this // then compute and store those right now for use later. While this
@@ -620,17 +612,12 @@ impl<'p, 's, M: Matcher, W: io::Write> JSONSink<'p, 's, M, W> {
// the extent that it's easy to ensure that we never do more than // the extent that it's easy to ensure that we never do more than
// one search to find the matches. // one search to find the matches.
let matches = &mut self.json.matches; let matches = &mut self.json.matches;
find_iter_at_in_context( self.matcher
searcher, .find_iter(bytes, |m| {
&self.matcher, matches.push(m);
bytes,
range.clone(),
|m| {
let (s, e) = (m.start() - range.start, m.end() - range.start);
matches.push(Match::new(s, e));
true true
}, })
)?; .map_err(io::Error::error_message)?;
// Don't report empty matches appearing at the end of the bytes. // Don't report empty matches appearing at the end of the bytes.
if !matches.is_empty() if !matches.is_empty()
&& matches.last().unwrap().is_empty() && matches.last().unwrap().is_empty()
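The hunk above discards an empty match that lands exactly at the end of the searched bytes. A std-only sketch of that rule, using `(start, end)` byte-offset pairs in place of the crate's `Match` type (the helper name is illustrative only):

```rust
// Drop a trailing empty match sitting at or past the end of the searched
// bytes; interior empty matches are kept.
fn drop_trailing_empty(matches: &mut Vec<(usize, usize)>, len: usize) {
    if let Some(&(start, end)) = matches.last() {
        if start == end && start >= len {
            matches.pop();
        }
    }
}

fn main() {
    // An empty match at offset 5 of a 5-byte haystack is discarded...
    let mut m = vec![(0, 3), (5, 5)];
    drop_trailing_empty(&mut m, 5);
    assert_eq!(m, vec![(0, 3)]);
    // ...but an empty match in the interior is kept.
    let mut m = vec![(2, 2)];
    drop_trailing_empty(&mut m, 5);
    assert_eq!(m, vec![(2, 2)]);
    println!("ok");
}
```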
@@ -657,16 +644,6 @@ impl<'p, 's, M: Matcher, W: io::Write> JSONSink<'p, 's, M, W> {
self.after_context_remaining == 0 self.after_context_remaining == 0
} }
/// Returns whether the current match count exceeds the configured limit.
/// If there is no limit, then this always returns false.
fn match_more_than_limit(&self) -> bool {
let limit = match self.json.config.max_matches {
None => return false,
Some(limit) => limit,
};
self.match_count > limit
}
/// Write the "begin" message. /// Write the "begin" message.
fn write_begin_message(&mut self) -> io::Result<()> { fn write_begin_message(&mut self) -> io::Result<()> {
if self.begin_printed { if self.begin_printed {
@@ -685,30 +662,13 @@ impl<'p, 's, M: Matcher, W: io::Write> Sink for JSONSink<'p, 's, M, W> {
fn matched( fn matched(
&mut self, &mut self,
searcher: &Searcher, searcher: &Searcher,
mat: &SinkMatch<'_>, mat: &SinkMatch,
) -> Result<bool, io::Error> { ) -> Result<bool, io::Error> {
self.write_begin_message()?; self.write_begin_message()?;
self.match_count += 1; self.match_count += 1;
// When we've exceeded our match count, then the remaining context self.after_context_remaining = searcher.after_context() as u64;
// lines should not be reset, but instead, decremented. This avoids a self.record_matches(mat.bytes())?;
// bug where we display more matches than a configured limit. The main
// idea here is that 'matched' might be called again while printing
// an after-context line. In that case, we should treat this as a
// contextual line rather than a matching line for the purposes of
// termination.
if self.match_more_than_limit() {
self.after_context_remaining =
self.after_context_remaining.saturating_sub(1);
} else {
self.after_context_remaining = searcher.after_context() as u64;
}
self.record_matches(
searcher,
mat.buffer(),
mat.bytes_range_in_buffer(),
)?;
self.stats.add_matches(self.json.matches.len() as u64); self.stats.add_matches(self.json.matches.len() as u64);
self.stats.add_matched_lines(mat.lines().count() as u64); self.stats.add_matched_lines(mat.lines().count() as u64);
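The comment added in this hunk explains the interplay between `max_matches` and after-context: once more matches than the limit have been seen, a further matching line decrements the remaining context budget instead of resetting it, so output still terminates at the configured amount. A std-only simulation of just that counter logic (all names here are illustrative, not the crate's API):

```rust
// Simulates the counter bookkeeping from the `matched` callback above.
struct Sink {
    match_count: u64,
    after_context_remaining: u64,
    max_matches: Option<u64>,
    after_context: u64,
}

impl Sink {
    fn matched(&mut self) {
        self.match_count += 1;
        if self.max_matches.map_or(false, |limit| self.match_count > limit) {
            // Over the limit: treat this as a contextual line and count
            // down, rather than resetting the after-context budget.
            self.after_context_remaining =
                self.after_context_remaining.saturating_sub(1);
        } else {
            self.after_context_remaining = self.after_context;
        }
    }
}

fn main() {
    let mut sink = Sink {
        match_count: 0,
        after_context_remaining: 0,
        max_matches: Some(1),
        after_context: 2,
    };
    sink.matched(); // first match: budget resets to 2
    assert_eq!(sink.after_context_remaining, 2);
    sink.matched(); // over the limit: budget decrements instead
    assert_eq!(sink.after_context_remaining, 1);
    sink.matched();
    assert_eq!(sink.after_context_remaining, 0);
    println!("ok");
}
```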
@@ -727,7 +687,7 @@ impl<'p, 's, M: Matcher, W: io::Write> Sink for JSONSink<'p, 's, M, W> {
fn context( fn context(
&mut self, &mut self,
searcher: &Searcher, searcher: &Searcher,
ctx: &SinkContext<'_>, ctx: &SinkContext,
) -> Result<bool, io::Error> { ) -> Result<bool, io::Error> {
self.write_begin_message()?; self.write_begin_message()?;
self.json.matches.clear(); self.json.matches.clear();
@@ -737,7 +697,7 @@ impl<'p, 's, M: Matcher, W: io::Write> Sink for JSONSink<'p, 's, M, W> {
self.after_context_remaining.saturating_sub(1); self.after_context_remaining.saturating_sub(1);
} }
let submatches = if searcher.invert_match() { let submatches = if searcher.invert_match() {
self.record_matches(searcher, ctx.bytes(), 0..ctx.bytes().len())?; self.record_matches(ctx.bytes())?;
SubMatches::new(ctx.bytes(), &self.json.matches) SubMatches::new(ctx.bytes(), &self.json.matches)
} else { } else {
SubMatches::empty() SubMatches::empty()
@@ -839,7 +799,7 @@ impl<'a> SubMatches<'a> {
} }
/// Return this set of match ranges as a slice. /// Return this set of match ranges as a slice.
fn as_slice(&self) -> &[jsont::SubMatch<'_>] { fn as_slice(&self) -> &[jsont::SubMatch] {
match *self { match *self {
SubMatches::Empty => &[], SubMatches::Empty => &[],
SubMatches::Small(ref x) => x, SubMatches::Small(ref x) => x,
@@ -911,38 +871,6 @@ and exhibited clearly, with a label attached.\
assert_eq!(got.lines().count(), 3); assert_eq!(got.lines().count(), 3);
} }
#[test]
fn max_matches_after_context() {
let haystack = "\
a
b
c
d
e
d
e
d
e
d
e
";
let matcher = RegexMatcher::new(r"d").unwrap();
let mut printer =
JSONBuilder::new().max_matches(Some(1)).build(vec![]);
SearcherBuilder::new()
.after_context(2)
.build()
.search_reader(
&matcher,
haystack.as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
assert_eq!(got.lines().count(), 5);
}
#[test] #[test]
fn no_match() { fn no_match() {
let matcher = RegexMatcher::new(r"DOES NOT MATCH").unwrap(); let matcher = RegexMatcher::new(r"DOES NOT MATCH").unwrap();


@@ -13,7 +13,7 @@ use std::str;
use base64; use base64;
use serde::{Serialize, Serializer}; use serde::{Serialize, Serializer};
use crate::stats::Stats; use stats::Stats;
#[derive(Serialize)] #[derive(Serialize)]
#[serde(tag = "type", content = "data")] #[serde(tag = "type", content = "data")]
@@ -90,7 +90,7 @@ enum Data<'a> {
} }
impl<'a> Data<'a> { impl<'a> Data<'a> {
fn from_bytes(bytes: &[u8]) -> Data<'_> { fn from_bytes(bytes: &[u8]) -> Data {
match str::from_utf8(bytes) { match str::from_utf8(bytes) {
Ok(text) => Data::Text { text: Cow::Borrowed(text) }, Ok(text) => Data::Text { text: Cow::Borrowed(text) },
Err(_) => Data::Bytes { bytes }, Err(_) => Data::Bytes { bytes },
@@ -98,7 +98,7 @@ impl<'a> Data<'a> {
} }
#[cfg(unix)] #[cfg(unix)]
fn from_path(path: &Path) -> Data<'_> { fn from_path(path: &Path) -> Data {
use std::os::unix::ffi::OsStrExt; use std::os::unix::ffi::OsStrExt;
match path.to_str() { match path.to_str() {


@@ -27,6 +27,10 @@ contain matches.
This example shows how to create a "standard" printer and execute a search. This example shows how to create a "standard" printer and execute a search.
``` ```
extern crate grep_regex;
extern crate grep_printer;
extern crate grep_searcher;
use std::error::Error; use std::error::Error;
use grep_regex::RegexMatcher; use grep_regex::RegexMatcher;
@@ -64,26 +68,29 @@ fn example() -> Result<(), Box<Error>> {
#![deny(missing_docs)] #![deny(missing_docs)]
pub use crate::color::{
default_color_specs, ColorError, ColorSpecs, UserColorSpec,
};
#[cfg(feature = "serde1")] #[cfg(feature = "serde1")]
pub use crate::json::{JSONBuilder, JSONSink, JSON}; extern crate base64;
pub use crate::standard::{Standard, StandardBuilder, StandardSink}; extern crate bstr;
pub use crate::stats::Stats; extern crate grep_matcher;
pub use crate::summary::{Summary, SummaryBuilder, SummaryKind, SummarySink}; #[cfg(test)]
pub use crate::util::PrinterPath; extern crate grep_regex;
extern crate grep_searcher;
#[cfg(feature = "serde1")]
extern crate serde;
#[cfg(feature = "serde1")]
#[macro_use]
extern crate serde_derive;
#[cfg(feature = "serde1")]
extern crate serde_json;
extern crate termcolor;
// The maximum number of bytes to execute a search to account for look-ahead. pub use color::{default_color_specs, ColorError, ColorSpecs, UserColorSpec};
// #[cfg(feature = "serde1")]
// This is an unfortunate kludge since PCRE2 doesn't provide a way to search pub use json::{JSONBuilder, JSONSink, JSON};
// a substring of some input while accounting for look-ahead. In theory, we pub use standard::{Standard, StandardBuilder, StandardSink};
// could refactor the various 'grep' interfaces to account for it, but it would pub use stats::Stats;
// be a large change. So for now, we just let PCRE2 go looking a bit for a pub use summary::{Summary, SummaryBuilder, SummaryKind, SummarySink};
// match without searching the entire rest of the contents. pub use util::PrinterPath;
//
// Note that this kludge is only active in multi-line mode.
const MAX_LOOK_AHEAD: usize = 128;
#[macro_use] #[macro_use]
mod macros; mod macros;


@@ -8,18 +8,15 @@ use std::time::Instant;
use bstr::ByteSlice; use bstr::ByteSlice;
use grep_matcher::{Match, Matcher}; use grep_matcher::{Match, Matcher};
use grep_searcher::{ use grep_searcher::{
LineStep, Searcher, Sink, SinkContext, SinkContextKind, SinkFinish, LineStep, Searcher, Sink, SinkContext, SinkContextKind, SinkError,
SinkMatch, SinkFinish, SinkMatch,
}; };
use termcolor::{ColorSpec, NoColor, WriteColor}; use termcolor::{ColorSpec, NoColor, WriteColor};
use crate::color::ColorSpecs; use color::ColorSpecs;
use crate::counter::CounterWriter; use counter::CounterWriter;
use crate::stats::Stats; use stats::Stats;
use crate::util::{ use util::{trim_ascii_prefix, PrinterPath, Replacer, Sunk};
find_iter_at_in_context, trim_ascii_prefix, trim_line_terminator,
PrinterPath, Replacer, Sunk,
};
/// The configuration for the standard printer. /// The configuration for the standard printer.
/// ///
@@ -34,7 +31,6 @@ struct Config {
path: bool, path: bool,
only_matching: bool, only_matching: bool,
per_match: bool, per_match: bool,
per_match_one_line: bool,
replacement: Arc<Option<Vec<u8>>>, replacement: Arc<Option<Vec<u8>>>,
max_columns: Option<u64>, max_columns: Option<u64>,
max_columns_preview: bool, max_columns_preview: bool,
@@ -59,7 +55,6 @@ impl Default for Config {
path: true, path: true,
only_matching: false, only_matching: false,
per_match: false, per_match: false,
per_match_one_line: false,
replacement: Arc::new(None), replacement: Arc::new(None),
max_columns: None, max_columns: None,
max_columns_preview: false, max_columns_preview: false,
@@ -224,36 +219,15 @@ impl StandardBuilder {
/// the `column` option, which will show the starting column number for /// the `column` option, which will show the starting column number for
/// every match on every line. /// every match on every line.
/// ///
/// When multi-line mode is enabled, each match is printed, including every /// When multi-line mode is enabled, each match and its accompanying lines
/// line in the match. As with single line matches, if a line contains /// are printed. As with single line matches, if a line contains multiple
/// multiple matches (even if only partially), then that line is printed /// matches (even if only partially), then that line is printed once for
/// once for each match it participates in, assuming it's the first line in /// each match it participates in.
/// that match. In multi-line mode, column numbers only indicate the start
/// of a match. Subsequent lines in a multi-line match always have a column
/// number of `1`.
///
/// When a match contains multiple lines, enabling `per_match_one_line`
/// will cause only the first line each in match to be printed.
pub fn per_match(&mut self, yes: bool) -> &mut StandardBuilder { pub fn per_match(&mut self, yes: bool) -> &mut StandardBuilder {
self.config.per_match = yes; self.config.per_match = yes;
self self
} }
/// Print at most one line per match when `per_match` is enabled.
///
/// By default, every line in each match found is printed when `per_match`
/// is enabled. However, this is sometimes undesirable, e.g., when you
/// only ever want one line per match.
///
/// This is only applicable when multi-line matching is enabled, since
/// otherwise, matches are guaranteed to span one line.
///
/// This is disabled by default.
pub fn per_match_one_line(&mut self, yes: bool) -> &mut StandardBuilder {
self.config.per_match_one_line = yes;
self
}
/// Set the bytes that will be used to replace each occurrence of a match /// Set the bytes that will be used to replace each occurrence of a match
/// found. /// found.
/// ///
@@ -318,6 +292,9 @@ impl StandardBuilder {
/// Column numbers are computed in terms of bytes from the start of the /// Column numbers are computed in terms of bytes from the start of the
/// line being printed. /// line being printed.
/// ///
/// For matches that span multiple lines, the column number for each
/// matching line is in terms of the first matching line.
///
/// This is disabled by default. /// This is disabled by default.
pub fn column(&mut self, yes: bool) -> &mut StandardBuilder { pub fn column(&mut self, yes: bool) -> &mut StandardBuilder {
self.config.column = yes; self.config.column = yes;
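The doc change above states that column numbers are byte offsets from the start of the printed line, and that in multi-line mode subsequent lines of a match report column `1`. A one-function sketch of that computation, matching the `m.start().saturating_sub(line.start()) as u64 + 1` expression that appears later in this diff:

```rust
// 1-based column of a match start relative to the start of the printed
// line; lines that begin after the match started saturate to column 1.
fn column(match_start: usize, line_start: usize) -> u64 {
    match_start.saturating_sub(line_start) as u64 + 1
}

fn main() {
    // Match begins at byte 16 of a line starting at byte 0: column 17.
    assert_eq!(column(16, 0), 17);
    // A later line of the same multi-line match starts after the match
    // itself began, so the subtraction saturates and the column is 1.
    assert_eq!(column(16, 64), 1);
    println!("ok");
}
```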
@@ -625,7 +602,7 @@ impl<W> Standard<W> {
/// * `W` refers to the underlying writer that this printer is writing its /// * `W` refers to the underlying writer that this printer is writing its
/// output to. /// output to.
#[derive(Debug)] #[derive(Debug)]
pub struct StandardSink<'p, 's, M: Matcher, W> { pub struct StandardSink<'p, 's, M: Matcher, W: 's> {
matcher: M, matcher: M,
standard: &'s mut Standard<W>, standard: &'s mut Standard<W>,
replacer: Replacer<M>, replacer: Replacer<M>,
@@ -685,12 +662,7 @@ impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
/// Execute the matcher over the given bytes and record the match /// Execute the matcher over the given bytes and record the match
/// locations if the current configuration demands match granularity. /// locations if the current configuration demands match granularity.
fn record_matches( fn record_matches(&mut self, bytes: &[u8]) -> io::Result<()> {
&mut self,
searcher: &Searcher,
bytes: &[u8],
range: std::ops::Range<usize>,
) -> io::Result<()> {
self.standard.matches.clear(); self.standard.matches.clear();
if !self.needs_match_granularity { if !self.needs_match_granularity {
return Ok(()); return Ok(());
@@ -703,21 +675,16 @@ impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
// one search to find the matches (well, for replacements, we do one // one search to find the matches (well, for replacements, we do one
// additional search to perform the actual replacement). // additional search to perform the actual replacement).
let matches = &mut self.standard.matches; let matches = &mut self.standard.matches;
find_iter_at_in_context( self.matcher
searcher, .find_iter(bytes, |m| {
&self.matcher, matches.push(m);
bytes,
range.clone(),
|m| {
let (s, e) = (m.start() - range.start, m.end() - range.start);
matches.push(Match::new(s, e));
true true
}, })
)?; .map_err(io::Error::error_message)?;
// Don't report empty matches appearing at the end of the bytes. // Don't report empty matches appearing at the end of the bytes.
if !matches.is_empty() if !matches.is_empty()
&& matches.last().unwrap().is_empty() && matches.last().unwrap().is_empty()
&& matches.last().unwrap().start() >= range.end && matches.last().unwrap().start() >= bytes.len()
{ {
matches.pop().unwrap(); matches.pop().unwrap();
} }
@@ -728,25 +695,14 @@ impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
/// replacement, lazily allocating memory if necessary. /// replacement, lazily allocating memory if necessary.
/// ///
/// To access the result of a replacement, use `replacer.replacement()`. /// To access the result of a replacement, use `replacer.replacement()`.
fn replace( fn replace(&mut self, bytes: &[u8]) -> io::Result<()> {
&mut self,
searcher: &Searcher,
bytes: &[u8],
range: std::ops::Range<usize>,
) -> io::Result<()> {
self.replacer.clear(); self.replacer.clear();
if self.standard.config.replacement.is_some() { if self.standard.config.replacement.is_some() {
let replacement = (*self.standard.config.replacement) let replacement = (*self.standard.config.replacement)
.as_ref() .as_ref()
.map(|r| &*r) .map(|r| &*r)
.unwrap(); .unwrap();
self.replacer.replace_all( self.replacer.replace_all(&self.matcher, bytes, replacement)?;
searcher,
&self.matcher,
bytes,
range,
replacement,
)?;
} }
Ok(()) Ok(())
} }
@@ -766,16 +722,6 @@ impl<'p, 's, M: Matcher, W: WriteColor> StandardSink<'p, 's, M, W> {
} }
self.after_context_remaining == 0 self.after_context_remaining == 0
} }
/// Returns whether the current match count exceeds the configured limit.
/// If there is no limit, then this always returns false.
fn match_more_than_limit(&self) -> bool {
let limit = match self.standard.config.max_matches {
None => return false,
Some(limit) => limit,
};
self.match_count > limit
}
} }
impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> { impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
@@ -784,29 +730,13 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
fn matched( fn matched(
&mut self, &mut self,
searcher: &Searcher, searcher: &Searcher,
mat: &SinkMatch<'_>, mat: &SinkMatch,
) -> Result<bool, io::Error> { ) -> Result<bool, io::Error> {
self.match_count += 1; self.match_count += 1;
// When we've exceeded our match count, then the remaining context self.after_context_remaining = searcher.after_context() as u64;
// lines should not be reset, but instead, decremented. This avoids a
// bug where we display more matches than a configured limit. The main
// idea here is that 'matched' might be called again while printing
// an after-context line. In that case, we should treat this as a
// contextual line rather than a matching line for the purposes of
// termination.
if self.match_more_than_limit() {
self.after_context_remaining =
self.after_context_remaining.saturating_sub(1);
} else {
self.after_context_remaining = searcher.after_context() as u64;
}
self.record_matches( self.record_matches(mat.bytes())?;
searcher, self.replace(mat.bytes())?;
mat.buffer(),
mat.bytes_range_in_buffer(),
)?;
self.replace(searcher, mat.buffer(), mat.bytes_range_in_buffer())?;
if let Some(ref mut stats) = self.stats { if let Some(ref mut stats) = self.stats {
stats.add_matches(self.standard.matches.len() as u64); stats.add_matches(self.standard.matches.len() as u64);
@@ -825,7 +755,7 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
fn context( fn context(
&mut self, &mut self,
searcher: &Searcher, searcher: &Searcher,
ctx: &SinkContext<'_>, ctx: &SinkContext,
) -> Result<bool, io::Error> { ) -> Result<bool, io::Error> {
self.standard.matches.clear(); self.standard.matches.clear();
self.replacer.clear(); self.replacer.clear();
@@ -835,8 +765,8 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
self.after_context_remaining.saturating_sub(1); self.after_context_remaining.saturating_sub(1);
} }
if searcher.invert_match() { if searcher.invert_match() {
self.record_matches(searcher, ctx.bytes(), 0..ctx.bytes().len())?; self.record_matches(ctx.bytes())?;
self.replace(searcher, ctx.bytes(), 0..ctx.bytes().len())?; self.replace(ctx.bytes())?;
} }
if searcher.binary_detection().convert_byte().is_some() { if searcher.binary_detection().convert_byte().is_some() {
if self.binary_byte_offset.is_some() { if self.binary_byte_offset.is_some() {
@@ -904,7 +834,7 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
/// A StandardImpl is initialized every time a match or a contextual line is /// A StandardImpl is initialized every time a match or a contextual line is
/// reported. /// reported.
#[derive(Debug)] #[derive(Debug)]
struct StandardImpl<'a, M: Matcher, W> { struct StandardImpl<'a, M: 'a + Matcher, W: 'a> {
searcher: &'a Searcher, searcher: &'a Searcher,
sink: &'a StandardSink<'a, 'a, M, W>, sink: &'a StandardSink<'a, 'a, M, W>,
sunk: Sunk<'a>, sunk: Sunk<'a>,
@@ -916,7 +846,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
/// Bundle self with a searcher and return the core implementation of Sink. /// Bundle self with a searcher and return the core implementation of Sink.
fn new( fn new(
searcher: &'a Searcher, searcher: &'a Searcher,
sink: &'a StandardSink<'_, '_, M, W>, sink: &'a StandardSink<M, W>,
) -> StandardImpl<'a, M, W> { ) -> StandardImpl<'a, M, W> {
StandardImpl { StandardImpl {
searcher: searcher, searcher: searcher,
@@ -930,7 +860,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
/// for use with handling matching lines. /// for use with handling matching lines.
fn from_match( fn from_match(
searcher: &'a Searcher, searcher: &'a Searcher,
sink: &'a StandardSink<'_, '_, M, W>, sink: &'a StandardSink<M, W>,
mat: &'a SinkMatch<'a>, mat: &'a SinkMatch<'a>,
) -> StandardImpl<'a, M, W> { ) -> StandardImpl<'a, M, W> {
let sunk = Sunk::from_sink_match( let sunk = Sunk::from_sink_match(
@@ -945,7 +875,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
/// for use with handling contextual lines. /// for use with handling contextual lines.
fn from_context( fn from_context(
searcher: &'a Searcher, searcher: &'a Searcher,
sink: &'a StandardSink<'_, '_, M, W>, sink: &'a StandardSink<M, W>,
ctx: &'a SinkContext<'a>, ctx: &'a SinkContext<'a>,
) -> StandardImpl<'a, M, W> { ) -> StandardImpl<'a, M, W> {
let sunk = Sunk::from_sink_context( let sunk = Sunk::from_sink_context(
@@ -1160,7 +1090,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
self.write_prelude( self.write_prelude(
self.sunk.absolute_byte_offset() + line.start() as u64, self.sunk.absolute_byte_offset() + line.start() as u64,
self.sunk.line_number().map(|n| n + count), self.sunk.line_number().map(|n| n + count),
Some(m.start().saturating_sub(line.start()) as u64 + 1), Some(m.start() as u64 + 1),
)?; )?;
count += 1; count += 1;
if self.exceeds_max_columns(&bytes[line]) { if self.exceeds_max_columns(&bytes[line]) {
@@ -1185,15 +1115,6 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
} }
} }
self.write_line_term()?; self.write_line_term()?;
// It turns out that vimgrep really only wants one line per
// match, even when a match spans multiple lines. So when
// that option is enabled, we just quit after printing the
// first line.
//
// See: https://github.com/BurntSushi/ripgrep/issues/1866
if self.config().per_match_one_line {
break;
}
} }
} }
Ok(()) Ok(())
@@ -1548,7 +1469,14 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
} }
fn trim_line_terminator(&self, buf: &[u8], line: &mut Match) { fn trim_line_terminator(&self, buf: &[u8], line: &mut Match) {
trim_line_terminator(&self.searcher, buf, line); let lineterm = self.searcher.line_terminator();
if lineterm.is_suffix(&buf[*line]) {
let mut end = line.end() - 1;
if lineterm.is_crlf() && buf[end - 1] == b'\r' {
end -= 1;
}
*line = line.with_end(end);
}
} }
fn has_line_terminator(&self, buf: &[u8]) -> bool { fn has_line_terminator(&self, buf: &[u8]) -> bool {
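The `trim_line_terminator` hunk above shrinks a line's range by one byte when it ends with the configured terminator, and by one more when a CRLF terminator leaves a trailing `\r`. A std-only sketch of that logic, hard-coding `\n`/CRLF in place of the searcher's configurable `LineTerminator` and using a `Range<usize>` in place of the crate's `Match` type:

```rust
use std::ops::Range;

// Trim a trailing '\n' (and a preceding '\r', if any) off the line's
// byte range without copying the buffer.
fn trim_line_terminator(buf: &[u8], line: &mut Range<usize>) {
    if buf[line.clone()].ends_with(b"\n") {
        let mut end = line.end - 1;
        if end > line.start && buf[end - 1] == b'\r' {
            end -= 1;
        }
        line.end = end;
    }
}

fn main() {
    let buf = b"hello\r\n";
    let mut line = 0..buf.len();
    trim_line_terminator(buf, &mut line);
    assert_eq!(&buf[line], b"hello");

    let buf = b"world\n";
    let mut line = 0..buf.len();
    trim_line_terminator(buf, &mut line);
    assert_eq!(&buf[line], b"world");
    println!("ok");
}
```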
@@ -1594,7 +1522,7 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
/// multiple lines. /// multiple lines.
/// ///
/// Note that this doesn't just return whether the searcher is in multi /// Note that this doesn't just return whether the searcher is in multi
/// line mode, but also checks if the matcher can match over multiple lines. /// line mode, but also checks if the mater can match over multiple lines.
/// If it can't, then we don't need multi line handling, even if the /// If it can't, then we don't need multi line handling, even if the
/// searcher has multi line mode enabled. /// searcher has multi line mode enabled.
fn multi_line(&self) -> bool { fn multi_line(&self) -> bool {
@@ -1617,12 +1545,11 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use grep_matcher::LineTerminator; use grep_regex::RegexMatcher;
use grep_regex::{RegexMatcher, RegexMatcherBuilder};
use grep_searcher::SearcherBuilder; use grep_searcher::SearcherBuilder;
use termcolor::{Ansi, NoColor}; use termcolor::NoColor;
use super::{ColorSpecs, Standard, StandardBuilder}; use super::{Standard, StandardBuilder};
const SHERLOCK: &'static str = "\ const SHERLOCK: &'static str = "\
For the Doctor Watsons of this world, as opposed to the Sherlock For the Doctor Watsons of this world, as opposed to the Sherlock
@@ -1647,10 +1574,6 @@ and exhibited clearly, with a label attached.\
String::from_utf8(printer.get_mut().get_ref().to_owned()).unwrap() String::from_utf8(printer.get_mut().get_ref().to_owned()).unwrap()
} }
fn printer_contents_ansi(printer: &mut Standard<Ansi<Vec<u8>>>) -> String {
String::from_utf8(printer.get_mut().get_ref().to_owned()).unwrap()
}
#[test] #[test]
fn reports_match() { fn reports_match() {
let matcher = RegexMatcher::new("Sherlock").unwrap(); let matcher = RegexMatcher::new("Sherlock").unwrap();
@@ -3067,9 +2990,9 @@ Holmeses, success in the province of detective work must always
let got = printer_contents(&mut printer); let got = printer_contents(&mut printer);
let expected = "\ let expected = "\
1:16:For the Doctor Watsons of this world, as opposed to the Sherlock 1:16:For the Doctor Watsons of this world, as opposed to the Sherlock
2:1:Holmeses, success in the province of detective work must always 2:16:Holmeses, success in the province of detective work must always
5:12:but Doctor Watson has to have it taken out for him and dusted, 5:12:but Doctor Watson has to have it taken out for him and dusted,
6:1:and exhibited clearly, with a label attached. 6:12:and exhibited clearly, with a label attached.
"; ";
assert_eq_printed!(expected, got); assert_eq_printed!(expected, got);
} }
@@ -3096,94 +3019,9 @@ Holmeses, success in the province of detective work must always
let got = printer_contents(&mut printer); let got = printer_contents(&mut printer);
let expected = "\ let expected = "\
1:16:For the Doctor Watsons of this world, as opposed to the Sherlock 1:16:For the Doctor Watsons of this world, as opposed to the Sherlock
2:1:Holmeses, success in the province of detective work must always 2:16:Holmeses, success in the province of detective work must always
2:58:Holmeses, success in the province of detective work must always 2:123:Holmeses, success in the province of detective work must always
3:1:be, to a very large extent, the result of luck. Sherlock Holmes 3:123:be, to a very large extent, the result of luck. Sherlock Holmes
";
assert_eq_printed!(expected, got);
}
#[test]
fn per_match_multi_line1_only_first_line() {
let matcher =
RegexMatcher::new(r"(?s:.{0})(Doctor Watsons|Sherlock)").unwrap();
let mut printer = StandardBuilder::new()
.per_match(true)
.per_match_one_line(true)
.column(true)
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.multi_line(true)
.line_number(true)
.build()
.search_reader(
&matcher,
SHERLOCK.as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "\
1:9:For the Doctor Watsons of this world, as opposed to the Sherlock
1:57:For the Doctor Watsons of this world, as opposed to the Sherlock
3:49:be, to a very large extent, the result of luck. Sherlock Holmes
";
assert_eq_printed!(expected, got);
}
#[test]
fn per_match_multi_line2_only_first_line() {
let matcher =
RegexMatcher::new(r"(?s)Watson.+?(Holmeses|clearly)").unwrap();
let mut printer = StandardBuilder::new()
.per_match(true)
.per_match_one_line(true)
.column(true)
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.multi_line(true)
.line_number(true)
.build()
.search_reader(
&matcher,
SHERLOCK.as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "\
1:16:For the Doctor Watsons of this world, as opposed to the Sherlock
5:12:but Doctor Watson has to have it taken out for him and dusted,
";
assert_eq_printed!(expected, got);
}
#[test]
fn per_match_multi_line3_only_first_line() {
let matcher =
RegexMatcher::new(r"(?s)Watson.+?Holmeses|always.+?be").unwrap();
let mut printer = StandardBuilder::new()
.per_match(true)
.per_match_one_line(true)
.column(true)
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.multi_line(true)
.line_number(true)
.build()
.search_reader(
&matcher,
SHERLOCK.as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "\
1:16:For the Doctor Watsons of this world, as opposed to the Sherlock
2:58:Holmeses, success in the province of detective work must always
";
assert_eq_printed!(expected, got);
}
@@ -3242,80 +3080,6 @@ Holmeses, success in the province of detective work must always
assert_eq_printed!(expected, got);
}
// This is a somewhat weird test that checks the behavior of attempting
// to replace a line terminator with something else.
//
// See: https://github.com/BurntSushi/ripgrep/issues/1311
#[test]
fn replacement_multi_line() {
let matcher = RegexMatcher::new(r"\n").unwrap();
let mut printer = StandardBuilder::new()
.replacement(Some(b"?".to_vec()))
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.line_number(true)
.multi_line(true)
.build()
.search_reader(
&matcher,
"hello\nworld\n".as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "1:hello?world?\n";
assert_eq_printed!(expected, got);
}
#[test]
fn replacement_multi_line_diff_line_term() {
let matcher = RegexMatcherBuilder::new()
.line_terminator(Some(b'\x00'))
.build(r"\n")
.unwrap();
let mut printer = StandardBuilder::new()
.replacement(Some(b"?".to_vec()))
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.line_terminator(LineTerminator::byte(b'\x00'))
.line_number(true)
.multi_line(true)
.build()
.search_reader(
&matcher,
"hello\nworld\n".as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "1:hello?world?\x00";
assert_eq_printed!(expected, got);
}
#[test]
fn replacement_multi_line_combine_lines() {
let matcher = RegexMatcher::new(r"\n(.)?").unwrap();
let mut printer = StandardBuilder::new()
.replacement(Some(b"?$1".to_vec()))
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.line_number(true)
.multi_line(true)
.build()
.search_reader(
&matcher,
"hello\nworld\n".as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "1:hello?world?\n";
assert_eq_printed!(expected, got);
}
#[test]
fn replacement_max_columns() {
let matcher = RegexMatcher::new(r"Sherlock|Doctor (\w+)").unwrap();
@@ -3622,57 +3386,4 @@ and xxx clearly, with a label attached.
";
assert_eq_printed!(expected, got);
}
#[test]
fn regression_search_empty_with_crlf() {
let matcher =
RegexMatcherBuilder::new().crlf(true).build(r"x?").unwrap();
let mut printer = StandardBuilder::new()
.color_specs(ColorSpecs::default_with_color())
.build(Ansi::new(vec![]));
SearcherBuilder::new()
.line_terminator(LineTerminator::crlf())
.build()
.search_reader(&matcher, &b"\n"[..], printer.sink(&matcher))
.unwrap();
let got = printer_contents_ansi(&mut printer);
assert!(!got.is_empty());
}
#[test]
fn regression_after_context_with_match() {
let haystack = "\
a
b
c
d
e
d
e
d
e
d
e
";
let matcher = RegexMatcherBuilder::new().build(r"d").unwrap();
let mut printer = StandardBuilder::new()
.max_matches(Some(1))
.build(NoColor::new(vec![]));
SearcherBuilder::new()
.line_number(true)
.after_context(2)
.build()
.search_reader(
&matcher,
haystack.as_bytes(),
printer.sink(&matcher),
)
.unwrap();
let got = printer_contents(&mut printer);
let expected = "4:d\n5-e\n6:d\n";
assert_eq_printed!(expected, got);
}
}


@@ -1,14 +1,14 @@
use std::ops::{Add, AddAssign};
use std::time::Duration;
use crate::util::NiceDuration; use util::NiceDuration;
/// Summary statistics produced at the end of a search.
///
/// When statistics are reported by a printer, they correspond to all searches
/// executed with that printer.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
#[cfg_attr(feature = "serde1", derive(serde::Serialize))] #[cfg_attr(feature = "serde1", derive(Serialize))]
pub struct Stats {
elapsed: NiceDuration,
searches: u64,


@@ -8,10 +8,10 @@ use grep_matcher::Matcher;
use grep_searcher::{Searcher, Sink, SinkError, SinkFinish, SinkMatch};
use termcolor::{ColorSpec, NoColor, WriteColor};
use crate::color::ColorSpecs; use color::ColorSpecs;
use crate::counter::CounterWriter; use counter::CounterWriter;
use crate::stats::Stats; use stats::Stats;
use crate::util::{find_iter_at_in_context, PrinterPath}; use util::PrinterPath;
/// The configuration for the summary printer.
///
@@ -457,7 +457,7 @@ impl<W> Summary<W> {
/// * `W` refers to the underlying writer that this printer is writing its
/// output to.
#[derive(Debug)]
pub struct SummarySink<'p, 's, M: Matcher, W> { pub struct SummarySink<'p, 's, M: Matcher, W: 's> {
matcher: M,
summary: &'s mut Summary<W>,
path: Option<PrinterPath<'p>>,
@@ -504,17 +504,6 @@ impl<'p, 's, M: Matcher, W: WriteColor> SummarySink<'p, 's, M, W> {
self.stats.as_ref()
}
/// Returns true if and only if the searcher may report matches over
/// multiple lines.
///
/// Note that this doesn't just return whether the searcher is in multi
/// line mode, but also checks if the matcher can match over multiple lines.
/// If it can't, then we don't need multi line handling, even if the
/// searcher has multi line mode enabled.
fn multi_line(&self, searcher: &Searcher) -> bool {
searcher.multi_line_with_matcher(&self.matcher)
}
/// Returns true if this printer should quit.
///
/// This implements the logic for handling quitting after seeing a certain
@@ -590,39 +579,32 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for SummarySink<'p, 's, M, W> {
fn matched(
&mut self,
searcher: &Searcher, _searcher: &Searcher,
mat: &SinkMatch<'_>, mat: &SinkMatch,
) -> Result<bool, io::Error> {
let is_multi_line = self.multi_line(searcher); self.match_count += 1;
let sink_match_count = if self.stats.is_none() && !is_multi_line {
1
} else {
// This gives us as many bytes as the searcher can offer. This
// isn't guaranteed to hold the necessary context to get match
// detection correct (because of look-around), but it does in
// practice.
let buf = mat.buffer();
let range = mat.bytes_range_in_buffer();
let mut count = 0;
find_iter_at_in_context(
searcher,
&self.matcher,
buf,
range,
|_| {
count += 1;
true
},
)?;
count
};
if is_multi_line {
self.match_count += sink_match_count;
} else {
self.match_count += 1;
}
if let Some(ref mut stats) = self.stats {
stats.add_matches(sink_match_count); let mut match_count = 0;
self.matcher
.find_iter(mat.bytes(), |_| {
match_count += 1;
true
})
.map_err(io::Error::error_message)?;
if match_count == 0 {
// It is possible for the match count to be zero when
// look-around is used. Since `SinkMatch` won't necessarily
// contain the look-around in its match span, the search here
// could fail to find anything.
//
// It seems likely that setting match_count=1 here is probably
// wrong in some cases, but I don't think we can do any
// better. (Because this printer cannot assume that subsequent
// contents have been loaded into memory, so we have no way of
// increasing the search span here.)
match_count = 1;
}
stats.add_matches(match_count);
stats.add_matched_lines(mat.lines().count() as u64);
} else if self.summary.config.kind.quit_early() {
return Ok(false);


@@ -7,13 +7,11 @@ use std::time;
use bstr::{ByteSlice, ByteVec};
use grep_matcher::{Captures, LineTerminator, Match, Matcher};
use grep_searcher::{
LineIter, Searcher, SinkContext, SinkContextKind, SinkError, SinkMatch, LineIter, SinkContext, SinkContextKind, SinkError, SinkMatch,
};
#[cfg(feature = "serde1")]
use serde::{Serialize, Serializer};
use crate::MAX_LOOK_AHEAD;
/// A type for handling replacements while amortizing allocation.
pub struct Replacer<M: Matcher> {
space: Option<Space<M>>,
@@ -29,7 +27,7 @@ struct Space<M: Matcher> {
}
impl<M: Matcher> fmt::Debug for Replacer<M> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
let (dst, matches) = self.replacement().unwrap_or((&[], &[]));
f.debug_struct("Replacer")
.field("dst", &dst)
@@ -54,41 +52,18 @@ impl<M: Matcher> Replacer<M> {
/// This can fail if the underlying matcher reports an error.
pub fn replace_all<'a>(
&'a mut self,
searcher: &Searcher,
matcher: &M,
mut subject: &[u8], subject: &[u8],
range: std::ops::Range<usize>,
replacement: &[u8],
) -> io::Result<()> {
// See the giant comment in 'find_iter_at_in_context' below for why we
// do this dance.
let is_multi_line = searcher.multi_line_with_matcher(&matcher);
if is_multi_line {
if subject[range.end..].len() >= MAX_LOOK_AHEAD {
subject = &subject[..range.end + MAX_LOOK_AHEAD];
}
} else {
// When searching a single line, we should remove the line
// terminator. Otherwise, it's possible for the regex (via
// look-around) to observe the line terminator and not match
// because of it.
let mut m = Match::new(0, range.end);
trim_line_terminator(searcher, subject, &mut m);
subject = &subject[..m.end()];
}
{
let &mut Space { ref mut dst, ref mut caps, ref mut matches } =
self.allocate(matcher)?;
dst.clear();
matches.clear();
replace_with_captures_in_context( matcher
matcher, .replace_with_captures(subject, caps, dst, |caps, dst| {
subject,
range.clone(),
caps,
dst,
|caps, dst| {
let start = dst.len();
caps.interpolate(
|name| matcher.capture_index(name),
@@ -99,9 +74,8 @@ impl<M: Matcher> Replacer<M> {
let end = dst.len();
matches.push(Match::new(start, end));
true
}, })
) .map_err(io::Error::error_message)?;
.map_err(io::Error::error_message)?;
}
Ok(())
}
@@ -330,7 +304,7 @@ impl<'a> PrinterPath<'a> {
pub struct NiceDuration(pub time::Duration);
impl fmt::Display for NiceDuration {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{:0.6}s", self.fractional_seconds())
}
}
@@ -383,108 +357,3 @@ pub fn trim_ascii_prefix(
.count();
range.with_start(range.start() + count)
}
pub fn find_iter_at_in_context<M, F>(
searcher: &Searcher,
matcher: M,
mut bytes: &[u8],
range: std::ops::Range<usize>,
mut matched: F,
) -> io::Result<()>
where
M: Matcher,
F: FnMut(Match) -> bool,
{
// This strange dance is to account for the possibility of look-ahead in
// the regex. The problem here is that mat.bytes() doesn't include the
    // lines beyond the match boundaries in multi-line mode, which means that
// when we try to rediscover the full set of matches here, the regex may no
// longer match if it required some look-ahead beyond the matching lines.
//
// PCRE2 (and the grep-matcher interfaces) has no way of specifying an end
// bound of the search. So we kludge it and let the regex engine search the
// rest of the buffer... But to avoid things getting too crazy, we cap the
// buffer.
//
// If it weren't for multi-line mode, then none of this would be needed.
// Alternatively, if we refactored the grep interfaces to pass along the
// full set of matches (if available) from the searcher, then that might
// also help here. But that winds up paying an upfront unavoidable cost for
// the case where matches don't need to be counted. So then you'd have to
// introduce a way to pass along matches conditionally, only when needed.
// Yikes.
//
// Maybe the bigger picture thing here is that the searcher should be
// responsible for finding matches when necessary, and the printer
// shouldn't be involved in this business in the first place. Sigh. Live
// and learn. Abstraction boundaries are hard.
let is_multi_line = searcher.multi_line_with_matcher(&matcher);
if is_multi_line {
if bytes[range.end..].len() >= MAX_LOOK_AHEAD {
bytes = &bytes[..range.end + MAX_LOOK_AHEAD];
}
} else {
// When searching a single line, we should remove the line terminator.
// Otherwise, it's possible for the regex (via look-around) to observe
// the line terminator and not match because of it.
let mut m = Match::new(0, range.end);
trim_line_terminator(searcher, bytes, &mut m);
bytes = &bytes[..m.end()];
}
matcher
.find_iter_at(bytes, range.start, |m| {
if m.start() >= range.end {
return false;
}
matched(m)
})
.map_err(io::Error::error_message)
}
/// Given a buf and some bounds, if there is a line terminator at the end of
/// the given bounds in buf, then the bounds are trimmed to remove the line
/// terminator.
pub fn trim_line_terminator(
searcher: &Searcher,
buf: &[u8],
line: &mut Match,
) {
let lineterm = searcher.line_terminator();
if lineterm.is_suffix(&buf[*line]) {
let mut end = line.end() - 1;
if lineterm.is_crlf() && end > 0 && buf.get(end - 1) == Some(&b'\r') {
end -= 1;
}
*line = line.with_end(end);
}
}
/// Like `Matcher::replace_with_captures_at`, but accepts an end bound.
///
/// See also: `find_iter_at_in_context` for why we need this.
fn replace_with_captures_in_context<M, F>(
matcher: M,
bytes: &[u8],
range: std::ops::Range<usize>,
caps: &mut M::Captures,
dst: &mut Vec<u8>,
mut append: F,
) -> Result<(), M::Error>
where
M: Matcher,
F: FnMut(&M::Captures, &mut Vec<u8>) -> bool,
{
let mut last_match = range.start;
matcher.captures_iter_at(bytes, range.start, caps, |caps| {
let m = caps.get(0).unwrap();
if m.start() >= range.end {
return false;
}
dst.extend(&bytes[last_match..m.start()]);
last_match = m.end();
append(caps, dst)
})?;
let end = std::cmp::min(bytes.len(), range.end);
dst.extend(&bytes[last_match..end]);
Ok(())
}


@@ -1,6 +1,6 @@
[package]
name = "grep-regex"
version = "0.1.11" #:version version = "0.1.8" #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
Use Rust's regex library with the 'grep' crate.
@@ -10,13 +10,13 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/regex"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/regex"
readme = "README.md"
keywords = ["regex", "grep", "search", "pattern", "line"]
license = "Unlicense OR MIT" license = "Unlicense/MIT"
edition = "2021"
[dependencies]
aho-corasick = "1.0.2" aho-corasick = "0.7.3"
bstr = "1.6.0" bstr = "0.2.10"
grep-matcher = { version = "0.1.6", path = "../matcher" } grep-matcher = { version = "0.1.2", path = "../matcher" }
log = "0.4.19" log = "0.4.5"
regex-automata = { version = "0.3.0" } regex = "1.1"
regex-syntax = "0.7.2" regex-syntax = "0.6.5"
thread_local = "1"


@@ -26,3 +26,9 @@ Add this to your `Cargo.toml`:
[dependencies]
grep-regex = "0.1"
```
and this to your crate root:
```rust
extern crate grep_regex;
```


@@ -1,13 +1,17 @@
use regex_syntax::ast::parse::Parser;
use regex_syntax::ast::{self, Ast};
/// The results of analyzing AST of a regular expression (e.g., for supporting
/// smart case).
#[derive(Clone, Debug)]
pub(crate) struct AstAnalysis { pub struct AstAnalysis {
/// True if and only if a literal uppercase character occurs in the regex.
any_uppercase: bool,
/// True if and only if the regex contains any literal at all.
any_literal: bool,
/// True if and only if the regex consists entirely of a literal and no
/// other special regex characters.
all_verbatim_literal: bool,
}
impl AstAnalysis {
@@ -15,16 +19,16 @@ impl AstAnalysis {
///
/// If `pattern` is not a valid regular expression, then `None` is
/// returned.
#[cfg(test)] #[allow(dead_code)]
pub(crate) fn from_pattern(pattern: &str) -> Option<AstAnalysis> { pub fn from_pattern(pattern: &str) -> Option<AstAnalysis> {
regex_syntax::ast::parse::Parser::new() Parser::new()
.parse(pattern)
.map(|ast| AstAnalysis::from_ast(&ast))
.ok()
}
/// Perform an AST analysis given the AST.
pub(crate) fn from_ast(ast: &Ast) -> AstAnalysis { pub fn from_ast(ast: &Ast) -> AstAnalysis {
let mut analysis = AstAnalysis::new();
analysis.from_ast_impl(ast);
analysis
@@ -36,7 +40,7 @@ impl AstAnalysis {
/// For example, a pattern like `\pL` contains no uppercase literals,
/// even though `L` is uppercase and the `\pL` class contains uppercase
/// characters.
pub(crate) fn any_uppercase(&self) -> bool { pub fn any_uppercase(&self) -> bool {
self.any_uppercase
}
@@ -44,13 +48,32 @@ impl AstAnalysis {
///
/// For example, a pattern like `\pL` reports `false`, but a pattern like
/// `\pLfoo` reports `true`.
pub(crate) fn any_literal(&self) -> bool { pub fn any_literal(&self) -> bool {
self.any_literal
}
/// Returns true if and only if the entire pattern is a verbatim literal
/// with no special meta characters.
///
/// When this is true, then the pattern satisfies the following law:
/// `escape(pattern) == pattern`. Notable examples where this returns
/// `false` include patterns like `a\u0061` even though `\u0061` is just
/// a literal `a`.
///
/// The purpose of this flag is to determine whether the patterns can be
/// given to non-regex substring search algorithms as-is.
#[allow(dead_code)]
pub fn all_verbatim_literal(&self) -> bool {
self.all_verbatim_literal
}
/// Creates a new `AstAnalysis` value with an initial configuration.
fn new() -> AstAnalysis {
AstAnalysis { any_uppercase: false, any_literal: false } AstAnalysis {
any_uppercase: false,
any_literal: false,
all_verbatim_literal: true,
}
}
fn from_ast_impl(&mut self, ast: &Ast) {
@@ -63,20 +86,26 @@ impl AstAnalysis {
| Ast::Dot(_)
| Ast::Assertion(_)
| Ast::Class(ast::Class::Unicode(_))
| Ast::Class(ast::Class::Perl(_)) => {} | Ast::Class(ast::Class::Perl(_)) => {
self.all_verbatim_literal = false;
}
Ast::Literal(ref x) => {
self.from_ast_literal(x);
}
Ast::Class(ast::Class::Bracketed(ref x)) => {
self.all_verbatim_literal = false;
self.from_ast_class_set(&x.kind);
}
Ast::Repetition(ref x) => {
self.all_verbatim_literal = false;
self.from_ast_impl(&x.ast);
}
Ast::Group(ref x) => {
self.all_verbatim_literal = false;
self.from_ast_impl(&x.ast);
}
Ast::Alternation(ref alt) => {
self.all_verbatim_literal = false;
for x in &alt.asts {
self.from_ast_impl(x);
}
@@ -132,6 +161,9 @@ impl AstAnalysis {
}
fn from_ast_literal(&mut self, ast: &ast::Literal) {
if ast.kind != ast::LiteralKind::Verbatim {
self.all_verbatim_literal = false;
}
self.any_literal = true;
self.any_uppercase = self.any_uppercase || ast.c.is_uppercase();
}
@@ -139,7 +171,7 @@ impl AstAnalysis {
/// Returns true if and only if the attributes can never change no matter
/// what other AST it might see.
fn done(&self) -> bool {
self.any_uppercase && self.any_literal self.any_uppercase && self.any_literal && !self.all_verbatim_literal
}
}
@@ -156,61 +188,76 @@ mod tests {
let x = analysis("");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("foo");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("Foo");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis("foO");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(x.all_verbatim_literal);
let x = analysis(r"foo\\");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\w");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\S");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\p{Ll}");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[a-z]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[A-Z]");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo[\S\t]");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"foo\\S");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"\p{Ll}");
assert!(!x.any_uppercase);
assert!(!x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"aBc\w");
assert!(x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
let x = analysis(r"a\u0061");
assert!(!x.any_uppercase);
assert!(x.any_literal);
assert!(!x.all_verbatim_literal);
}
}


@@ -1,16 +1,15 @@
use { use grep_matcher::{ByteSet, LineTerminator};
grep_matcher::{ByteSet, LineTerminator}, use regex::bytes::{Regex, RegexBuilder};
regex_automata::meta::Regex, use regex_syntax::ast::{self, Ast};
regex_syntax::{ use regex_syntax::hir::{self, Hir};
ast,
hir::{self, Hir, HirKind},
},
};
use crate::{ use ast::AstAnalysis;
ast::AstAnalysis, error::Error, non_matching::non_matching_bytes, use crlf::crlfify;
strip::strip_from_match, use error::Error;
}; use literal::LiteralSets;
use multi::alternation_literals;
use non_matching::non_matching_bytes;
use strip::strip_from_match;
/// Config represents the configuration of a regex matcher in this crate.
/// The configuration is itself a rough combination of the knobs found in
@@ -22,23 +21,21 @@ use crate::{
/// configuration which generated it, and provides transformation on that HIR
/// such that the configuration is preserved.
#[derive(Clone, Debug)]
pub(crate) struct Config { pub struct Config {
pub(crate) case_insensitive: bool, pub case_insensitive: bool,
pub(crate) case_smart: bool, pub case_smart: bool,
pub(crate) multi_line: bool, pub multi_line: bool,
pub(crate) dot_matches_new_line: bool, pub dot_matches_new_line: bool,
pub(crate) swap_greed: bool, pub swap_greed: bool,
pub(crate) ignore_whitespace: bool, pub ignore_whitespace: bool,
pub(crate) unicode: bool, pub unicode: bool,
pub(crate) octal: bool, pub octal: bool,
pub(crate) size_limit: usize, pub size_limit: usize,
pub(crate) dfa_size_limit: usize, pub dfa_size_limit: usize,
pub(crate) nest_limit: u32, pub nest_limit: u32,
pub(crate) line_terminator: Option<LineTerminator>, pub line_terminator: Option<LineTerminator>,
pub(crate) crlf: bool, pub crlf: bool,
pub(crate) word: bool, pub word: bool,
pub(crate) fixed_strings: bool,
pub(crate) whole_line: bool,
}
impl Default for Config {
@@ -53,28 +50,47 @@ impl Default for Config {
unicode: true,
octal: false,
// These size limits are much bigger than what's in the regex
// crate by default. // crate.
size_limit: 100 * (1 << 20),
dfa_size_limit: 1000 * (1 << 20),
nest_limit: 250,
line_terminator: None,
crlf: false,
word: false,
fixed_strings: false,
whole_line: false,
}
}
}
impl Config {
/// Use this configuration to build an HIR from the given patterns. The HIR /// Parse the given pattern and returned its HIR expression along with
/// returned corresponds to a single regex that is an alternation of the /// the current configuration.
/// patterns given. ///
pub(crate) fn build_many<P: AsRef<str>>( /// If there was a problem parsing the given expression then an error
&self, /// is returned.
patterns: &[P], pub fn hir(&self, pattern: &str) -> Result<ConfiguredHIR, Error> {
) -> Result<ConfiguredHIR, Error> { let ast = self.ast(pattern)?;
ConfiguredHIR::new(self.clone(), patterns) let analysis = self.analysis(&ast)?;
let expr = hir::translate::TranslatorBuilder::new()
.allow_invalid_utf8(true)
.case_insensitive(self.is_case_insensitive(&analysis))
.multi_line(self.multi_line)
.dot_matches_new_line(self.dot_matches_new_line)
.swap_greed(self.swap_greed)
.unicode(self.unicode)
.build()
.translate(pattern, &ast)
.map_err(Error::regex)?;
let expr = match self.line_terminator {
None => expr,
Some(line_term) => strip_from_match(expr, line_term)?,
};
Ok(ConfiguredHIR {
original: pattern.to_string(),
config: self.clone(),
analysis,
// If CRLF mode is enabled, replace `$` with `(?:\r?$)`.
expr: if self.crlf { crlfify(expr) } else { expr },
})
} }
/// Accounting for the `smart_case` config knob, return true if and only if /// Accounting for the `smart_case` config knob, return true if and only if
@@ -89,55 +105,35 @@ impl Config {
analysis.any_literal() && !analysis.any_uppercase()
}
    /// Returns true if and only if this config is simple enough such that
    /// if the pattern is a simple alternation of literals, then it can be
    /// constructed via a plain Aho-Corasick automaton.
    ///
    /// Note that it is OK to return true even when settings like `multi_line`
    /// are enabled, since if multi-line can impact the match semantics of a
    /// regex, then it is by definition not a simple alternation of literals.
    pub fn can_plain_aho_corasick(&self) -> bool {
        !self.word && !self.case_insensitive && !self.case_smart
    }

    /// Perform analysis on the AST of this pattern.
    ///
    /// This returns an error if the given pattern failed to parse.
    fn analysis(&self, ast: &Ast) -> Result<AstAnalysis, Error> {
        Ok(AstAnalysis::from_ast(ast))
    }

    /// Parse the given pattern into its abstract syntax.
    ///
    /// This returns an error if the given pattern failed to parse.
    fn ast(&self, pattern: &str) -> Result<Ast, Error> {
        ast::parse::ParserBuilder::new()
            .nest_limit(self.nest_limit)
            .octal(self.octal)
            .ignore_whitespace(self.ignore_whitespace)
            .build()
            .parse(pattern)
            .map_err(Error::regex)
    }
}
@@ -153,268 +149,140 @@ impl Config {
/// size limits set on the configured HIR will be propagated out to any
/// subsequently constructed HIR or regular expression.
#[derive(Clone, Debug)]
pub struct ConfiguredHIR {
    original: String,
    config: Config,
    analysis: AstAnalysis,
    expr: Hir,
}
impl ConfiguredHIR {
    /// Return the configuration for this HIR expression.
    pub fn config(&self) -> &Config {
        &self.config
    }

    /// Compute the set of non-matching bytes for this HIR expression.
    pub fn non_matching_bytes(&self) -> ByteSet {
        non_matching_bytes(&self.expr)
    }

    /// Returns true if and only if this regex needs to have its match offsets
    /// tweaked because of CRLF support. Specifically, this occurs when the
    /// CRLF hack is enabled and the regex is line anchored at the end. In
    /// this case, matches that end with a `\r` have the `\r` stripped.
    pub fn needs_crlf_stripped(&self) -> bool {
        self.config.crlf && self.expr.is_line_anchored_end()
    }

    /// Builds a regular expression from this HIR expression.
    pub fn regex(&self) -> Result<Regex, Error> {
        self.pattern_to_regex(&self.expr.to_string())
    }

    /// If this HIR corresponds to an alternation of literals with no
    /// capturing groups, then this returns those literals.
    pub fn alternation_literals(&self) -> Option<Vec<Vec<u8>>> {
        if !self.config.can_plain_aho_corasick() {
            return None;
        }
        alternation_literals(&self.expr)
    }

    /// Applies the given function to the concrete syntax of this HIR and then
    /// generates a new HIR based on the result of the function in a way that
    /// preserves the configuration.
    ///
    /// For example, this can be used to wrap a user provided regular
    /// expression with additional semantics. e.g., See the `WordMatcher`.
    pub fn with_pattern<F: FnMut(&str) -> String>(
        &self,
        mut f: F,
    ) -> Result<ConfiguredHIR, Error> {
        self.pattern_to_hir(&f(&self.expr.to_string()))
    }

    /// If the current configuration has a line terminator set and if useful
    /// literals could be extracted, then a regular expression matching those
    /// literals is returned. If no line terminator is set, then `None` is
    /// returned.
    ///
    /// If compiling the resulting regular expression failed, then an error
    /// is returned.
    ///
    /// This method only returns something when a line terminator is set
    /// because matches from this regex are generally candidates that must be
    /// confirmed before reporting a match. When performing a line oriented
    /// search, confirmation is easy: just extend the candidate match to its
    /// respective line boundaries and then re-search that line for a full
    /// match. This only works when the line terminator is set because the line
    /// terminator setting guarantees that the regex itself can never match
    /// through the line terminator byte.
    pub fn fast_line_regex(&self) -> Result<Option<Regex>, Error> {
        if self.config.line_terminator.is_none() {
            return Ok(None);
        }
        match LiteralSets::new(&self.expr).one_regex(self.config.word) {
            None => Ok(None),
            Some(pattern) => self.pattern_to_regex(&pattern).map(Some),
        }
    }

    /// Create a regex from the given pattern using this HIR's configuration.
    fn pattern_to_regex(&self, pattern: &str) -> Result<Regex, Error> {
        // The settings we explicitly set here are intentionally a subset
        // of the settings we have. The key point here is that our HIR
        // expression is computed with the settings in mind, such that setting
        // them here could actually lead to unintended behavior. For example,
        // consider the pattern `(?U)a+`. This will get folded into the HIR
        // as a non-greedy repetition operator which will in turn get printed
        // to the concrete syntax as `a+?`, which is correct. But if we
        // set the `swap_greed` option again, then we'll wind up with `(?U)a+?`
        // which is equal to `a+` which is not the same as what we were given.
        //
        // We also don't need to apply `case_insensitive` since this gets
        // folded into the HIR and would just cause us to do redundant work.
        //
        // Finally, we don't need to set `ignore_whitespace` since the concrete
        // syntax emitted by the HIR printer never needs it.
        //
        // We set the rest of the options. Some of them are important, such as
        // the size limit, and some of them are necessary to preserve the
        // intention of the original pattern. For example, the Unicode flag
        // will impact how the WordMatcher functions, namely, whether its
        // word boundaries are Unicode aware or not.
        RegexBuilder::new(&pattern)
            .nest_limit(self.config.nest_limit)
            .octal(self.config.octal)
            .multi_line(self.config.multi_line)
            .dot_matches_new_line(self.config.dot_matches_new_line)
            .unicode(self.config.unicode)
            .size_limit(self.config.size_limit)
            .dfa_size_limit(self.config.dfa_size_limit)
            .build()
            .map_err(Error::regex)
    }

    /// Create an HIR expression from the given pattern using this HIR's
    /// configuration.
    fn pattern_to_hir(&self, pattern: &str) -> Result<ConfiguredHIR, Error> {
        // See `pattern_to_regex` comment for explanation of why we only set
        // a subset of knobs here. e.g., `swap_greed` is explicitly left out.
        let expr = ::regex_syntax::ParserBuilder::new()
            .nest_limit(self.config.nest_limit)
            .octal(self.config.octal)
            .allow_invalid_utf8(true)
            .multi_line(self.config.multi_line)
            .dot_matches_new_line(self.config.dot_matches_new_line)
            .unicode(self.config.unicode)
            .build()
            .parse(pattern)
            .map_err(Error::regex)?;
        Ok(ConfiguredHIR {
            original: self.original.clone(),
            config: self.config.clone(),
            analysis: self.analysis.clone(),
            expr,
        })
    }
}
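The `can_plain_aho_corasick` check above gates whether a set of patterns can skip the regex engine entirely: no case folding or word semantics may be in play, and (as the caller must additionally verify) no pattern may contain a regex meta character. A minimal, self-contained sketch of that decision follows; it is illustrative only, and `is_meta_character` here is a hypothetical stand-in for `regex_syntax::is_meta_character`, covering only a subset of meta characters:

```rust
// Sketch (not the crate's actual code): decide whether a set of patterns
// can be matched as plain literals, e.g. by an Aho-Corasick automaton.
// Case folding or word semantics force patterns through the regex engine.
fn is_meta_character(c: char) -> bool {
    // Illustrative subset of regex meta characters.
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']'
            | '{' | '}' | '^' | '$'
    )
}

fn can_match_as_literals(
    patterns: &[&str],
    case_insensitive: bool,
    word: bool,
) -> bool {
    if case_insensitive || word {
        return false;
    }
    // Every pattern must be free of meta characters to be a plain literal.
    patterns.iter().all(|p| !p.chars().any(is_meta_character))
}

fn main() {
    assert!(can_match_as_literals(&["foo", "bar"], false, false));
    // `+` is a meta character, so this is not a plain literal.
    assert!(!can_match_as_literals(&["fo+o"], false, false));
    // Case insensitivity requires the full translation pipeline.
    assert!(!can_match_as_literals(&["foo"], true, false));
}
```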

crates/regex/src/crlf.rs Normal file

@@ -0,0 +1,189 @@
use std::collections::HashMap;
use grep_matcher::{Match, Matcher, NoError};
use regex::bytes::Regex;
use regex_syntax::hir::{self, Hir, HirKind};
use config::ConfiguredHIR;
use error::Error;
use matcher::RegexCaptures;
/// A matcher that wraps a regex whose `$` has been crlfified, stripping a
/// trailing `\r` from the end of every match.
#[derive(Clone, Debug)]
pub struct CRLFMatcher {
/// The regex.
regex: Regex,
/// A map from capture group name to capture group index.
names: HashMap<String, usize>,
}
impl CRLFMatcher {
/// Create a new matcher from the given pattern that strips `\r` from the
/// end of every match.
///
/// This panics if the given expression doesn't need its CRLF stripped.
pub fn new(expr: &ConfiguredHIR) -> Result<CRLFMatcher, Error> {
assert!(expr.needs_crlf_stripped());
let regex = expr.regex()?;
let mut names = HashMap::new();
for (i, optional_name) in regex.capture_names().enumerate() {
if let Some(name) = optional_name {
names.insert(name.to_string(), i.checked_sub(1).unwrap());
}
}
Ok(CRLFMatcher { regex, names })
}
/// Return the underlying regex used by this matcher.
pub fn regex(&self) -> &Regex {
&self.regex
}
}
impl Matcher for CRLFMatcher {
type Captures = RegexCaptures;
type Error = NoError;
fn find_at(
&self,
haystack: &[u8],
at: usize,
) -> Result<Option<Match>, NoError> {
let m = match self.regex.find_at(haystack, at) {
None => return Ok(None),
Some(m) => Match::new(m.start(), m.end()),
};
Ok(Some(adjust_match(haystack, m)))
}
fn new_captures(&self) -> Result<RegexCaptures, NoError> {
Ok(RegexCaptures::new(self.regex.capture_locations()))
}
fn capture_count(&self) -> usize {
self.regex.captures_len().checked_sub(1).unwrap()
}
fn capture_index(&self, name: &str) -> Option<usize> {
self.names.get(name).map(|i| *i)
}
fn captures_at(
&self,
haystack: &[u8],
at: usize,
caps: &mut RegexCaptures,
) -> Result<bool, NoError> {
caps.strip_crlf(false);
let r =
self.regex.captures_read_at(caps.locations_mut(), haystack, at);
        if r.is_none() {
return Ok(false);
}
// If the end of our match includes a `\r`, then strip it from all
// capture groups ending at the same location.
let end = caps.locations().get(0).unwrap().1;
if end > 0 && haystack.get(end - 1) == Some(&b'\r') {
caps.strip_crlf(true);
}
Ok(true)
}
// We specifically do not implement other methods like find_iter or
// captures_iter. Namely, the iter methods are guaranteed to be correct
// by virtue of implementing find_at and captures_at above.
}
/// If the given match ends with a `\r`, then return a new match that ends
/// immediately before the `\r`.
pub fn adjust_match(haystack: &[u8], m: Match) -> Match {
if m.end() > 0 && haystack.get(m.end() - 1) == Some(&b'\r') {
m.with_end(m.end() - 1)
} else {
m
}
}
/// Substitutes all occurrences of multi-line enabled `$` with `(?:\r??$)`.
///
/// This does not preserve the exact semantics of the given expression,
/// however, it does have the useful property that anything that matched the
/// given expression will also match the returned expression. The difference is
/// that the returned expression can match possibly other things as well.
///
/// The principal reason why we do this is because the underlying regex engine
/// doesn't support CRLF aware `$` look-around. It's planned to fix it at that
/// level, but we perform this kludge in the meantime.
///
/// Note that while the match preserving semantics are nice and neat, the
/// match position semantics are quite a bit messier. Namely, `$` only ever
/// matches the position between characters whereas `\r??` can match a
/// character and change the offset. This is regrettable, but works out pretty
/// nicely in most cases, especially when a match is limited to a single line.
pub fn crlfify(expr: Hir) -> Hir {
match expr.into_kind() {
HirKind::Anchor(hir::Anchor::EndLine) => {
let concat = Hir::concat(vec![
Hir::repetition(hir::Repetition {
kind: hir::RepetitionKind::ZeroOrOne,
greedy: false,
hir: Box::new(Hir::literal(hir::Literal::Unicode('\r'))),
}),
Hir::anchor(hir::Anchor::EndLine),
]);
Hir::group(hir::Group {
kind: hir::GroupKind::NonCapturing,
hir: Box::new(concat),
})
}
HirKind::Empty => Hir::empty(),
HirKind::Literal(x) => Hir::literal(x),
HirKind::Class(x) => Hir::class(x),
HirKind::Anchor(x) => Hir::anchor(x),
HirKind::WordBoundary(x) => Hir::word_boundary(x),
HirKind::Repetition(mut x) => {
x.hir = Box::new(crlfify(*x.hir));
Hir::repetition(x)
}
HirKind::Group(mut x) => {
x.hir = Box::new(crlfify(*x.hir));
Hir::group(x)
}
HirKind::Concat(xs) => {
Hir::concat(xs.into_iter().map(crlfify).collect())
}
HirKind::Alternation(xs) => {
Hir::alternation(xs.into_iter().map(crlfify).collect())
}
}
}
#[cfg(test)]
mod tests {
use super::crlfify;
use regex_syntax::Parser;
fn roundtrip(pattern: &str) -> String {
let expr1 = Parser::new().parse(pattern).unwrap();
let expr2 = crlfify(expr1);
expr2.to_string()
}
#[test]
fn various() {
assert_eq!(roundtrip(r"(?m)$"), "(?:\r??(?m:$))");
assert_eq!(roundtrip(r"(?m)$$"), "(?:\r??(?m:$))(?:\r??(?m:$))");
assert_eq!(
roundtrip(r"(?m)(?:foo$|bar$)"),
"(?:foo(?:\r??(?m:$))|bar(?:\r??(?m:$)))"
);
assert_eq!(roundtrip(r"(?m)$a"), "(?:\r??(?m:$))a");
// Not a multiline `$`, so no crlfifying occurs.
assert_eq!(roundtrip(r"$"), "\\z");
// It's a literal, derp.
assert_eq!(roundtrip(r"\$"), "\\$");
}
}
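The CRLF hack above has two halves: `crlfify` rewrites multi-line `$` into `(?:\r??$)` so the regex can match just before a `\r\n`, and `adjust_match` then trims the `\r` back off any match that consumed it. The trimming half can be sketched with plain byte slices (assumed `Span` type for illustration; the crate uses `grep_matcher::Match`):

```rust
// Sketch of the `\r`-stripping step performed by `adjust_match` above.
#[derive(Debug, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// If the match ends with a `\r`, shrink it by one byte so the reported
/// match excludes the carriage return consumed by the crlfified `$`.
fn strip_trailing_cr(haystack: &[u8], m: Span) -> Span {
    if m.end > 0 && haystack.get(m.end - 1) == Some(&b'\r') {
        Span { start: m.start, end: m.end - 1 }
    } else {
        m
    }
}

fn main() {
    let hay = b"foo\r\nbar";
    // Suppose `foo$` under the CRLF hack matched bytes 0..4 (`foo\r`).
    let m = strip_trailing_cr(hay, Span { start: 0, end: 4 });
    assert_eq!(m, Span { start: 0, end: 3 });
    // A match not ending in `\r` is left untouched.
    let m = strip_trailing_cr(hay, Span { start: 5, end: 8 });
    assert_eq!(m, Span { start: 5, end: 8 });
}
```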


@@ -1,3 +1,8 @@
use std::error;
use std::fmt;
use util;
/// An error that can occur in this crate.
///
/// Generally, this error corresponds to problems building a regular
@@ -13,27 +18,10 @@ impl Error {
        Error { kind }
    }
    pub(crate) fn regex<E: error::Error>(err: E) -> Error {
        Error { kind: ErrorKind::Regex(err.to_string()) }
    }

    /// Return the kind of this error.
    pub fn kind(&self) -> &ErrorKind {
        &self.kind
@@ -42,7 +30,6 @@ impl Error {
/// The kind of an error that can occur.
#[derive(Clone, Debug)]
pub enum ErrorKind {
    /// An error that occurred as a result of parsing a regular expression.
    /// This can be a syntax error or an error that results from attempting to
@@ -64,26 +51,38 @@ pub enum ErrorKind {
    ///
    /// The invalid byte is included in this error.
    InvalidLineTerminator(u8),
    /// Hints that destructuring should not be exhaustive.
    ///
    /// This enum may grow additional variants, so this makes sure clients
    /// don't count on exhaustive matching. (Otherwise, adding a new variant
    /// could break existing code.)
    #[doc(hidden)]
    __Nonexhaustive,
}

impl error::Error for Error {
    fn description(&self) -> &str {
        match self.kind {
            ErrorKind::Regex(_) => "regex error",
            ErrorKind::NotAllowed(_) => "literal not allowed",
            ErrorKind::InvalidLineTerminator(_) => "invalid line terminator",
            ErrorKind::__Nonexhaustive => unreachable!(),
        }
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self.kind {
            ErrorKind::Regex(ref s) => write!(f, "{}", s),
            ErrorKind::NotAllowed(ref lit) => {
                write!(f, "the literal '{:?}' is not allowed in a regex", lit)
            }
            ErrorKind::InvalidLineTerminator(byte) => {
                let x = util::show_bytes(&[byte]);
                write!(f, "line terminators must be ASCII, but '{}' is not", x)
            }
            ErrorKind::__Nonexhaustive => unreachable!(),
        }
    }
}


@@ -1,16 +1,29 @@
/*!
An implementation of `grep-matcher`'s `Matcher` trait for Rust's regex engine.
*/

#![deny(missing_docs)]

extern crate aho_corasick;
extern crate bstr;
extern crate grep_matcher;
#[macro_use]
extern crate log;
extern crate regex;
extern crate regex_syntax;
extern crate thread_local;

pub use error::{Error, ErrorKind};
pub use matcher::{RegexCaptures, RegexMatcher, RegexMatcherBuilder};

mod ast;
mod config;
mod crlf;
mod error;
mod literal;
mod matcher;
mod multi;
mod non_matching;
mod strip;
mod util;
mod word;

File diff suppressed because it is too large.


@@ -1,22 +1,15 @@
use std::collections::HashMap;

use grep_matcher::{
    ByteSet, Captures, LineMatchKind, LineTerminator, Match, Matcher, NoError,
};
use regex::bytes::{CaptureLocations, Regex};

use config::{Config, ConfiguredHIR};
use crlf::CRLFMatcher;
use error::Error;
use multi::MultiLiteralMatcher;
use word::WordMatcher;
/// A builder for constructing a `Matcher` using regular expressions.
///
@@ -26,7 +19,7 @@ use crate::{
/// types of optimizations.
///
/// The syntax supported is documented as part of the regex crate:
/// https://docs.rs/regex/*/regex/#syntax
#[derive(Clone, Debug)]
pub struct RegexMatcherBuilder {
    config: Config,
@@ -48,42 +41,19 @@ impl RegexMatcherBuilder {
    /// pattern.
    ///
    /// The syntax supported is documented as part of the regex crate:
    /// https://docs.rs/regex/*/regex/#syntax
    pub fn build(&self, pattern: &str) -> Result<RegexMatcher, Error> {
        let chir = self.config.hir(pattern)?;
        let fast_line_regex = chir.fast_line_regex()?;
        let non_matching_bytes = chir.non_matching_bytes();
        if let Some(ref re) = fast_line_regex {
            debug!("extracted fast line regex: {:?}", re);
        }
        let matcher = RegexMatcherImpl::new(&chir)?;
        trace!("final regex: {:?}", matcher.regex());
        Ok(RegexMatcher {
            config: self.config.clone(),
            matcher,
            fast_line_regex,
            non_matching_bytes,
@@ -99,7 +69,39 @@ impl RegexMatcherBuilder {
        &self,
        literals: &[B],
    ) -> Result<RegexMatcher, Error> {
        let mut has_escape = false;
        let mut slices = vec![];
        for lit in literals {
            slices.push(lit.as_ref());
            has_escape = has_escape || lit.as_ref().contains('\\');
        }
        // Even when we have a fixed set of literals, we might still want to
        // use the regex engine. Specifically, if any string has an escape
        // in it, then we probably can't feed it to Aho-Corasick without
        // removing the escape. Additionally, if there are any particular
        // special match semantics we need to honor, then Aho-Corasick isn't
        // enough. Finally, the regex engine can do really well with a small
        // number of literals (at time of writing, this is changing soon), so
        // we use it when there's a small set.
        //
        // Yes, this is one giant hack. Ideally, this entirely separate literal
        // matcher that uses Aho-Corasick would be pushed down into the regex
        // engine.
        if has_escape
            || !self.config.can_plain_aho_corasick()
            || literals.len() < 40
        {
            return self.build(&slices.join("|"));
        }
        let matcher = MultiLiteralMatcher::new(&slices)?;
        let imp = RegexMatcherImpl::MultiLiteral(matcher);
        Ok(RegexMatcher {
            config: self.config.clone(),
            matcher: imp,
            fast_line_regex: None,
            non_matching_bytes: ByteSet::empty(),
        })
    }
    /// Set the value for the case insensitive (`i`) flag.
@@ -300,15 +302,20 @@ impl RegexMatcherBuilder {
 /// 1. It causes the line terminator for the matcher to be `\r\n`. Namely,
 /// this prevents the matcher from ever producing a match that contains
 /// a `\r` or `\n`.
-/// 2. It enables CRLF mode for `^` and `$`. This means that line anchors
-/// will treat both `\r` and `\n` as line terminators, but will never
-/// match between a `\r` and `\n`.
-///
-/// Note that if you do not wish to set the line terminator but would
-/// still like `$` to match `\r\n` line terminators, then it is valid to
-/// call `crlf(true)` followed by `line_terminator(None)`. Ordering is
-/// important, since `crlf` sets the line terminator, but `line_terminator`
-/// does not touch the `crlf` setting.
+/// 2. It translates all instances of `$` in the pattern to `(?:\r??$)`.
+/// This works around the fact that the regex engine does not support
+/// matching CRLF as a line terminator when using `$`.
+///
+/// In particular, because of (2), the matches produced by the matcher may
+/// be slightly different than what one would expect given the pattern.
+/// This is the trade off made: in many cases, `$` will "just work" in the
+/// presence of `\r\n` line terminators, but matches may require some
+/// trimming to faithfully represent the intended match.
+///
+/// Note that if you do not wish to set the line terminator but would still
+/// like `$` to match `\r\n` line terminators, then it is valid to call
+/// `crlf(true)` followed by `line_terminator(None)`. Ordering is
+/// important, since `crlf` and `line_terminator` override each other.
 pub fn crlf(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
     if yes {
         self.config.line_terminator = Some(LineTerminator::crlf());
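The trade-off documented above can be sketched with std only: `$` is rewritten so it also tolerates an optional `\r`, and the `\r` is trimmed back off reported matches. Both helpers are hypothetical, and the naive rewrite ignores escaped `\$` and `$` inside character classes:

```rust
// Rewrite every `$` to `(?:\r??$)` so that `$` tolerates a preceding `\r`.
fn rewrite_dollar(pattern: &str) -> String {
    pattern.replace('$', "(?:\\r??$)")
}

// A match that ran up to a `\r\n` terminator includes the `\r`; strip it
// to recover the intended match.
fn trim_crlf(matched: &str) -> &str {
    matched.strip_suffix('\r').unwrap_or(matched)
}

fn main() {
    assert_eq!(rewrite_dollar("foo$"), "foo(?:\\r??$)");
    assert_eq!(trim_crlf("foo\r"), "foo");
    assert_eq!(trim_crlf("foo"), "foo");
}
```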
@@ -334,21 +341,6 @@ impl RegexMatcherBuilder {
 self.config.word = yes;
 self
 }
-/// Whether the patterns should be treated as literal strings or not. When
-/// this is active, all characters, including ones that would normally be
-/// special regex meta characters, are matched literally.
-pub fn fixed_strings(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
-    self.config.fixed_strings = yes;
-    self
-}
-/// Whether each pattern should match the entire line or not. This is
-/// equivalent to surrounding the pattern with `(?m:^)` and `(?m:$)`.
-pub fn whole_line(&mut self, yes: bool) -> &mut RegexMatcherBuilder {
-    self.config.whole_line = yes;
-    self
-}
 }
 /// An implementation of the `Matcher` trait using Rust's standard regex
@@ -378,10 +370,10 @@ impl RegexMatcher {
 /// Create a new matcher from the given pattern using the default
 /// configuration, but matches lines terminated by `\n`.
 ///
-/// This is meant to be a convenience constructor for
-/// using a `RegexMatcherBuilder` and setting its
-/// [`line_terminator`](RegexMatcherBuilder::line_terminator) to
-/// `\n`. The purpose of using this constructor is to permit special
+/// This is meant to be a convenience constructor for using a
+/// `RegexMatcherBuilder` and setting its
+/// [`line_terminator`](struct.RegexMatcherBuilder.html#method.line_terminator)
+/// to `\n`. The purpose of using this constructor is to permit special
 /// optimizations that help speed up line oriented search. These types of
 /// optimizations are only appropriate when matches span no more than one
 /// line. For this reason, this constructor will return an error if the
@@ -397,6 +389,13 @@ impl RegexMatcher {
 enum RegexMatcherImpl {
 /// The standard matcher used for all regular expressions.
 Standard(StandardMatcher),
+/// A matcher for an alternation of plain literals.
+MultiLiteral(MultiLiteralMatcher),
+/// A matcher that strips `\r` from the end of matches.
+///
+/// This is only used when the CRLF hack is enabled and the regex is line
+/// anchored at the end.
+CRLF(CRLFMatcher),
 /// A matcher that only matches at word boundaries. This transforms the
 /// regex to `(^|\W)(...)($|\W)` instead of the more intuitive `\b(...)\b`.
 /// Because of this, the WordMatcher provides its own implementation of
@@ -408,33 +407,29 @@ enum RegexMatcherImpl {
 impl RegexMatcherImpl {
 /// Based on the configuration, create a new implementation of the
 /// `Matcher` trait.
-fn new(mut chir: ConfiguredHIR) -> Result<RegexMatcherImpl, Error> {
-    // When whole_line is set, we don't use a word matcher even if word
-    // matching was requested. Why? Because `(?m:^)(pat)(?m:$)` implies
-    // word matching.
-    Ok(if chir.config().word && !chir.config().whole_line {
-        RegexMatcherImpl::Word(WordMatcher::new(chir)?)
-    } else {
-        if chir.config().whole_line {
-            chir = chir.into_whole_line();
-        }
-        RegexMatcherImpl::Standard(StandardMatcher::new(chir)?)
-    })
-}
-
-/// Return the underlying regex object used.
-fn regex(&self) -> &Regex {
-    match *self {
-        RegexMatcherImpl::Word(ref x) => x.regex(),
-        RegexMatcherImpl::Standard(ref x) => &x.regex,
-    }
-}
-
-/// Return the underlying HIR of the regex used for searching.
-fn chir(&self) -> &ConfiguredHIR {
-    match *self {
-        RegexMatcherImpl::Word(ref x) => x.chir(),
-        RegexMatcherImpl::Standard(ref x) => &x.chir,
-    }
-}
+fn new(expr: &ConfiguredHIR) -> Result<RegexMatcherImpl, Error> {
+    if expr.config().word {
+        Ok(RegexMatcherImpl::Word(WordMatcher::new(expr)?))
+    } else if expr.needs_crlf_stripped() {
+        Ok(RegexMatcherImpl::CRLF(CRLFMatcher::new(expr)?))
+    } else {
+        if let Some(lits) = expr.alternation_literals() {
+            if lits.len() >= 40 {
+                let matcher = MultiLiteralMatcher::new(&lits)?;
+                return Ok(RegexMatcherImpl::MultiLiteral(matcher));
+            }
+        }
+        Ok(RegexMatcherImpl::Standard(StandardMatcher::new(expr)?))
+    }
+}
+
+/// Return the underlying regex object used.
+fn regex(&self) -> String {
+    match *self {
+        RegexMatcherImpl::Word(ref x) => x.regex().to_string(),
+        RegexMatcherImpl::CRLF(ref x) => x.regex().to_string(),
+        RegexMatcherImpl::MultiLiteral(_) => "<N/A>".to_string(),
+        RegexMatcherImpl::Standard(ref x) => x.regex.to_string(),
+    }
+}
 }
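The dispatch order in `RegexMatcherImpl::new` above can be mirrored with std only: word matching first, then CRLF stripping, then the multi-literal fast path (only for alternations of at least 40 literals), then the standard engine. `Engine` and `choose` below are illustrative stand-ins, not grep-regex types:

```rust
// Mirrors the if/else ladder: earlier branches take precedence.
#[derive(Debug, PartialEq)]
enum Engine {
    Word,
    Crlf,
    MultiLiteral,
    Standard,
}

fn choose(word: bool, needs_crlf_strip: bool, literal_alts: Option<usize>) -> Engine {
    if word {
        Engine::Word
    } else if needs_crlf_strip {
        Engine::Crlf
    } else if literal_alts.map_or(false, |n| n >= 40) {
        Engine::MultiLiteral
    } else {
        Engine::Standard
    }
}

fn main() {
    // Word matching takes precedence over everything else.
    assert_eq!(choose(true, true, Some(100)), Engine::Word);
    assert_eq!(choose(false, false, Some(100)), Engine::MultiLiteral);
    // Small alternations stay on the regex engine.
    assert_eq!(choose(false, false, Some(3)), Engine::Standard);
}
```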
@@ -454,6 +449,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.find_at(haystack, at),
+    MultiLiteral(ref m) => m.find_at(haystack, at),
+    CRLF(ref m) => m.find_at(haystack, at),
     Word(ref m) => m.find_at(haystack, at),
 }
 }
@@ -462,6 +459,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.new_captures(),
+    MultiLiteral(ref m) => m.new_captures(),
+    CRLF(ref m) => m.new_captures(),
     Word(ref m) => m.new_captures(),
 }
 }
@@ -470,6 +469,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.capture_count(),
+    MultiLiteral(ref m) => m.capture_count(),
+    CRLF(ref m) => m.capture_count(),
     Word(ref m) => m.capture_count(),
 }
 }
@@ -478,6 +479,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.capture_index(name),
+    MultiLiteral(ref m) => m.capture_index(name),
+    CRLF(ref m) => m.capture_index(name),
     Word(ref m) => m.capture_index(name),
 }
 }
@@ -486,6 +489,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.find(haystack),
+    MultiLiteral(ref m) => m.find(haystack),
+    CRLF(ref m) => m.find(haystack),
     Word(ref m) => m.find(haystack),
 }
 }
@@ -497,6 +502,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.find_iter(haystack, matched),
+    MultiLiteral(ref m) => m.find_iter(haystack, matched),
+    CRLF(ref m) => m.find_iter(haystack, matched),
     Word(ref m) => m.find_iter(haystack, matched),
 }
 }
@@ -512,6 +519,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.try_find_iter(haystack, matched),
+    MultiLiteral(ref m) => m.try_find_iter(haystack, matched),
+    CRLF(ref m) => m.try_find_iter(haystack, matched),
     Word(ref m) => m.try_find_iter(haystack, matched),
 }
 }
@@ -524,6 +533,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.captures(haystack, caps),
+    MultiLiteral(ref m) => m.captures(haystack, caps),
+    CRLF(ref m) => m.captures(haystack, caps),
     Word(ref m) => m.captures(haystack, caps),
 }
 }
@@ -540,6 +551,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.captures_iter(haystack, caps, matched),
+    MultiLiteral(ref m) => m.captures_iter(haystack, caps, matched),
+    CRLF(ref m) => m.captures_iter(haystack, caps, matched),
     Word(ref m) => m.captures_iter(haystack, caps, matched),
 }
 }
@@ -556,6 +569,10 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.try_captures_iter(haystack, caps, matched),
+    MultiLiteral(ref m) => {
+        m.try_captures_iter(haystack, caps, matched)
+    }
+    CRLF(ref m) => m.try_captures_iter(haystack, caps, matched),
     Word(ref m) => m.try_captures_iter(haystack, caps, matched),
 }
 }
@@ -569,6 +586,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.captures_at(haystack, at, caps),
+    MultiLiteral(ref m) => m.captures_at(haystack, at, caps),
+    CRLF(ref m) => m.captures_at(haystack, at, caps),
     Word(ref m) => m.captures_at(haystack, at, caps),
 }
 }
@@ -585,6 +604,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.replace(haystack, dst, append),
+    MultiLiteral(ref m) => m.replace(haystack, dst, append),
+    CRLF(ref m) => m.replace(haystack, dst, append),
     Word(ref m) => m.replace(haystack, dst, append),
 }
 }
@@ -604,6 +625,12 @@ impl Matcher for RegexMatcher {
     Standard(ref m) => {
         m.replace_with_captures(haystack, caps, dst, append)
     }
+    MultiLiteral(ref m) => {
+        m.replace_with_captures(haystack, caps, dst, append)
+    }
+    CRLF(ref m) => {
+        m.replace_with_captures(haystack, caps, dst, append)
+    }
     Word(ref m) => {
         m.replace_with_captures(haystack, caps, dst, append)
     }
@@ -614,6 +641,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.is_match(haystack),
+    MultiLiteral(ref m) => m.is_match(haystack),
+    CRLF(ref m) => m.is_match(haystack),
     Word(ref m) => m.is_match(haystack),
 }
 }
@@ -626,6 +655,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.is_match_at(haystack, at),
+    MultiLiteral(ref m) => m.is_match_at(haystack, at),
+    CRLF(ref m) => m.is_match_at(haystack, at),
     Word(ref m) => m.is_match_at(haystack, at),
 }
 }
@@ -637,6 +668,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.shortest_match(haystack),
+    MultiLiteral(ref m) => m.shortest_match(haystack),
+    CRLF(ref m) => m.shortest_match(haystack),
     Word(ref m) => m.shortest_match(haystack),
 }
 }
@@ -649,6 +682,8 @@ impl Matcher for RegexMatcher {
 use self::RegexMatcherImpl::*;
 match self.matcher {
     Standard(ref m) => m.shortest_match_at(haystack, at),
+    MultiLiteral(ref m) => m.shortest_match_at(haystack, at),
+    CRLF(ref m) => m.shortest_match_at(haystack, at),
     Word(ref m) => m.shortest_match_at(haystack, at),
 }
 }
@@ -667,10 +702,7 @@ impl Matcher for RegexMatcher {
 ) -> Result<Option<LineMatchKind>, NoError> {
 Ok(match self.fast_line_regex {
     Some(ref regex) => {
-        let input = Input::new(haystack);
-        regex
-            .search_half(&input)
-            .map(|hm| LineMatchKind::Candidate(hm.offset()))
+        regex.shortest_match(haystack).map(LineMatchKind::Candidate)
     }
     None => {
         self.shortest_match(haystack)?.map(LineMatchKind::Confirmed)
@@ -685,19 +717,20 @@ struct StandardMatcher {
 /// The regular expression compiled from the pattern provided by the
 /// caller.
 regex: Regex,
-/// The HIR that produced this regex.
-///
-/// We put this in an `Arc` because by the time it gets here, it won't
-/// change. And because cloning and dropping an `Hir` is somewhat expensive
-/// due to its deep recursive representation.
-chir: Arc<ConfiguredHIR>,
+/// A map from capture group name to its corresponding index.
+names: HashMap<String, usize>,
 }

 impl StandardMatcher {
-fn new(chir: ConfiguredHIR) -> Result<StandardMatcher, Error> {
-    let chir = Arc::new(chir);
-    let regex = chir.to_regex()?;
-    Ok(StandardMatcher { regex, chir })
-}
+fn new(expr: &ConfiguredHIR) -> Result<StandardMatcher, Error> {
+    let regex = expr.regex()?;
+    let mut names = HashMap::new();
+    for (i, optional_name) in regex.capture_names().enumerate() {
+        if let Some(name) = optional_name {
+            names.insert(name.to_string(), i);
+        }
+    }
+    Ok(StandardMatcher { regex, names })
+}
 }
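The replacement `StandardMatcher::new` above records a capture-name to index map up front, so `capture_index` becomes a plain lookup. The same shape with std only; the iterator of optional names below stands in for the regex crate's `capture_names`:

```rust
use std::collections::HashMap;

// Build a name -> index map; unnamed groups (including group 0) are skipped.
fn name_map<'a, I>(capture_names: I) -> HashMap<String, usize>
where
    I: Iterator<Item = Option<&'a str>>,
{
    let mut names = HashMap::new();
    for (i, optional_name) in capture_names.enumerate() {
        if let Some(name) = optional_name {
            names.insert(name.to_string(), i);
        }
    }
    names
}

fn main() {
    // Group 0 and unnamed groups have no name.
    let caps = vec![None, Some("year"), None, Some("month")];
    let names = name_map(caps.into_iter());
    assert_eq!(names.get("year"), Some(&1));
    assert_eq!(names.get("month"), Some(&3));
    assert_eq!(names.len(), 2);
}
```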
@@ -710,12 +743,14 @@ impl Matcher for StandardMatcher {
 haystack: &[u8],
 at: usize,
 ) -> Result<Option<Match>, NoError> {
-let input = Input::new(haystack).span(at..haystack.len());
-Ok(self.regex.find(input).map(|m| Match::new(m.start(), m.end())))
+Ok(self
+    .regex
+    .find_at(haystack, at)
+    .map(|m| Match::new(m.start(), m.end())))
 }
 fn new_captures(&self) -> Result<RegexCaptures, NoError> {
-Ok(RegexCaptures::new(self.regex.create_captures()))
+Ok(RegexCaptures::new(self.regex.capture_locations()))
 }
 fn capture_count(&self) -> usize {
@@ -723,7 +758,7 @@ impl Matcher for StandardMatcher {
 }
 fn capture_index(&self, name: &str) -> Option<usize> {
-self.regex.group_info().to_index(PatternID::ZERO, name)
+self.names.get(name).map(|i| *i)
 }
 fn try_find_iter<F, E>(
@@ -750,10 +785,10 @@ impl Matcher for StandardMatcher {
 at: usize,
 caps: &mut RegexCaptures,
 ) -> Result<bool, NoError> {
-let input = Input::new(haystack).span(at..haystack.len());
-let caps = caps.captures_mut();
-self.regex.search_captures(&input, caps);
-Ok(caps.is_match())
+Ok(self
+    .regex
+    .captures_read_at(&mut caps.locations_mut(), haystack, at)
+    .is_some())
 }
 fn shortest_match_at(
@@ -761,8 +796,7 @@ impl Matcher for StandardMatcher {
 haystack: &[u8],
 at: usize,
 ) -> Result<Option<usize>, NoError> {
-let input = Input::new(haystack).span(at..haystack.len());
-Ok(self.regex.search_half(&input).map(|hm| hm.offset()))
+Ok(self.regex.shortest_match_at(haystack, at))
 }
 }
@@ -781,51 +815,137 @@ impl Matcher for StandardMatcher {
 /// index of the group using the corresponding matcher's `capture_index`
 /// method, and then use that index with `RegexCaptures::get`.
 #[derive(Clone, Debug)]
-pub struct RegexCaptures {
-    /// Where the captures are stored.
-    caps: AutomataCaptures,
-    /// These captures behave as if the capturing groups begin at the given
-    /// offset. When set to `0`, this has no affect and capture groups are
-    /// indexed like normal.
-    ///
-    /// This is useful when building matchers that wrap arbitrary regular
-    /// expressions. For example, `WordMatcher` takes an existing regex
-    /// `re` and creates `(?:^|\W)(re)(?:$|\W)`, but hides the fact that
-    /// the regex has been wrapped from the caller. In order to do this,
-    /// the matcher and the capturing groups must behave as if `(re)` is
-    /// the `0`th capture group.
-    offset: usize,
-}
+pub struct RegexCaptures(RegexCapturesImp);
+
+#[derive(Clone, Debug)]
+enum RegexCapturesImp {
+    AhoCorasick {
+        /// The start and end of the match, corresponding to capture group 0.
+        mat: Option<Match>,
+    },
+    Regex {
+        /// Where the locations are stored.
+        locs: CaptureLocations,
+        /// These captures behave as if the capturing groups begin at the given
+        /// offset. When set to `0`, this has no affect and capture groups are
+        /// indexed like normal.
+        ///
+        /// This is useful when building matchers that wrap arbitrary regular
+        /// expressions. For example, `WordMatcher` takes an existing regex
+        /// `re` and creates `(?:^|\W)(re)(?:$|\W)`, but hides the fact that
+        /// the regex has been wrapped from the caller. In order to do this,
+        /// the matcher and the capturing groups must behave as if `(re)` is
+        /// the `0`th capture group.
+        offset: usize,
+        /// When enable, the end of a match has `\r` stripped from it, if one
+        /// exists.
+        strip_crlf: bool,
+    },
+}
 impl Captures for RegexCaptures {
 fn len(&self) -> usize {
-    self.caps
-        .group_info()
-        .all_group_len()
-        .checked_sub(self.offset)
-        .unwrap()
+    match self.0 {
+        RegexCapturesImp::AhoCorasick { .. } => 1,
+        RegexCapturesImp::Regex { ref locs, offset, .. } => {
+            locs.len().checked_sub(offset).unwrap()
+        }
+    }
 }
 fn get(&self, i: usize) -> Option<Match> {
-    let actual = i.checked_add(self.offset).unwrap();
-    self.caps.get_group(actual).map(|sp| Match::new(sp.start, sp.end))
+    match self.0 {
+        RegexCapturesImp::AhoCorasick { mat, .. } => {
+            if i == 0 {
+                mat
+            } else {
+                None
+            }
+        }
+        RegexCapturesImp::Regex { ref locs, offset, strip_crlf } => {
+            if !strip_crlf {
+                let actual = i.checked_add(offset).unwrap();
+                return locs.pos(actual).map(|(s, e)| Match::new(s, e));
+            }
+            // currently don't support capture offsetting with CRLF
+            // stripping
+            assert_eq!(offset, 0);
+            let m = match locs.pos(i).map(|(s, e)| Match::new(s, e)) {
+                None => return None,
+                Some(m) => m,
+            };
+            // If the end position of this match corresponds to the end
+            // position of the overall match, then we apply our CRLF
+            // stripping. Otherwise, we cannot assume stripping is correct.
+            if i == 0 || m.end() == locs.pos(0).unwrap().1 {
+                Some(m.with_end(m.end() - 1))
+            } else {
+                Some(m)
+            }
+        }
+    }
 }
 }
 impl RegexCaptures {
-    pub(crate) fn new(caps: AutomataCaptures) -> RegexCaptures {
-        RegexCaptures::with_offset(caps, 0)
-    }
+    pub(crate) fn simple() -> RegexCaptures {
+        RegexCaptures(RegexCapturesImp::AhoCorasick { mat: None })
+    }
+    pub(crate) fn new(locs: CaptureLocations) -> RegexCaptures {
+        RegexCaptures::with_offset(locs, 0)
+    }
     pub(crate) fn with_offset(
-        caps: AutomataCaptures,
+        locs: CaptureLocations,
         offset: usize,
     ) -> RegexCaptures {
-        RegexCaptures { caps, offset }
+        RegexCaptures(RegexCapturesImp::Regex {
+            locs,
+            offset,
+            strip_crlf: false,
+        })
     }
-    pub(crate) fn captures_mut(&mut self) -> &mut AutomataCaptures {
-        &mut self.caps
-    }
+    pub(crate) fn locations(&self) -> &CaptureLocations {
+        match self.0 {
+            RegexCapturesImp::AhoCorasick { .. } => {
+                panic!("getting locations for simple captures is invalid")
+            }
+            RegexCapturesImp::Regex { ref locs, .. } => locs,
+        }
+    }
+    pub(crate) fn locations_mut(&mut self) -> &mut CaptureLocations {
+        match self.0 {
+            RegexCapturesImp::AhoCorasick { .. } => {
+                panic!("getting locations for simple captures is invalid")
+            }
+            RegexCapturesImp::Regex { ref mut locs, .. } => locs,
+        }
+    }
+    pub(crate) fn strip_crlf(&mut self, yes: bool) {
+        match self.0 {
+            RegexCapturesImp::AhoCorasick { .. } => {
+                panic!("setting strip_crlf for simple captures is invalid")
+            }
+            RegexCapturesImp::Regex { ref mut strip_crlf, .. } => {
+                *strip_crlf = yes;
+            }
+        }
+    }
+    pub(crate) fn set_simple(&mut self, one: Option<Match>) {
+        match self.0 {
+            RegexCapturesImp::AhoCorasick { ref mut mat } => {
+                *mat = one;
+            }
+            RegexCapturesImp::Regex { .. } => {
+                panic!("setting simple captures for regex is invalid")
+            }
+        }
+    }
 }
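The `strip_crlf` adjustment in `RegexCaptures::get` above reduces to a small rule: when CRLF stripping is on and a group ends where the overall match ends, one byte (the `\r`) is trimmed from the reported end offset. A std-only sketch; `adjust_end` is illustrative, not a grep-regex function:

```rust
// Trim the trailing `\r` only when the group's end coincides with the
// overall match end; inner groups keep their reported span.
fn adjust_end(group_end: usize, overall_end: usize, strip_crlf: bool) -> usize {
    if strip_crlf && group_end == overall_end {
        group_end - 1
    } else {
        group_end
    }
}

fn main() {
    // The group reaching the end of "foo\r" gets its `\r` trimmed...
    assert_eq!(adjust_end(4, 4, true), 3);
    // ...but an inner group, or any group without the hack, is untouched.
    assert_eq!(adjust_end(2, 4, true), 2);
    assert_eq!(adjust_end(4, 4, false), 4);
}
```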
@@ -912,9 +1032,7 @@ mod tests {
 }
 // Test that finding candidate lines works as expected.
-// FIXME: Re-enable this test once inner literal extraction works.
 #[test]
-#[ignore]
 fn candidate_lines() {
 fn is_confirmed(m: LineMatchKind) -> bool {
 match m {

View File

@@ -1,9 +1,9 @@
-use aho_corasick::{AhoCorasick, MatchKind};
+use aho_corasick::{AhoCorasick, AhoCorasickBuilder, MatchKind};
 use grep_matcher::{Match, Matcher, NoError};
-use regex_syntax::hir::{Hir, HirKind};
+use regex_syntax::hir::Hir;
-use crate::error::Error;
+use error::Error;
-use crate::matcher::RegexCaptures;
+use matcher::RegexCaptures;
 /// A matcher for an alternation of literals.
 ///
@@ -23,10 +23,11 @@ impl MultiLiteralMatcher {
 pub fn new<B: AsRef<[u8]>>(
 literals: &[B],
 ) -> Result<MultiLiteralMatcher, Error> {
-let ac = AhoCorasick::builder()
-    .match_kind(MatchKind::LeftmostFirst)
-    .build(literals)
-    .map_err(Error::generic)?;
+let ac = AhoCorasickBuilder::new()
+    .match_kind(MatchKind::LeftmostFirst)
+    .auto_configure(literals)
+    .build_with_size::<usize, _, _>(literals)
+    .map_err(Error::regex)?;
 Ok(MultiLiteralMatcher { ac })
 }
 }
@@ -78,11 +79,13 @@ impl Matcher for MultiLiteralMatcher {
 /// Alternation literals checks if the given HIR is a simple alternation of
 /// literals, and if so, returns them. Otherwise, this returns None.
 pub fn alternation_literals(expr: &Hir) -> Option<Vec<Vec<u8>>> {
+use regex_syntax::hir::{HirKind, Literal};
 // This is pretty hacky, but basically, if `is_alternation_literal` is
 // true, then we can make several assumptions about the structure of our
 // HIR. This is what justifies the `unreachable!` statements below.
-if !expr.properties().is_alternation_literal() {
+if !expr.is_alternation_literal() {
 return None;
 }
 let alts = match *expr.kind() {
@@ -90,16 +93,26 @@ pub fn alternation_literals(expr: &Hir) -> Option<Vec<Vec<u8>>> {
 _ => return None, // one literal isn't worth it
 };
+let extendlit = |lit: &Literal, dst: &mut Vec<u8>| match *lit {
+    Literal::Unicode(c) => {
+        let mut buf = [0; 4];
+        dst.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
+    }
+    Literal::Byte(b) => {
+        dst.push(b);
+    }
+};
 let mut lits = vec![];
 for alt in alts {
 let mut lit = vec![];
 match *alt.kind() {
     HirKind::Empty => {}
-    HirKind::Literal(ref x) => lit.extend_from_slice(&x.0),
+    HirKind::Literal(ref x) => extendlit(x, &mut lit),
     HirKind::Concat(ref exprs) => {
         for e in exprs {
             match *e.kind() {
-                HirKind::Literal(ref x) => lit.extend_from_slice(&x.0),
+                HirKind::Literal(ref x) => extendlit(x, &mut lit),
                 _ => unreachable!("expected literal, got {:?}", e),
             }
         }
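The `extendlit` closure above appends a literal's bytes: a unicode scalar is UTF-8 encoded through a 4-byte scratch buffer, while a byte literal is pushed as-is. The encoding half can be shown in plain std:

```rust
// Encode a char into UTF-8 via a stack buffer and append the bytes,
// exactly the pattern `extendlit` uses for `Literal::Unicode`.
fn push_char_utf8(c: char, dst: &mut Vec<u8>) {
    let mut buf = [0; 4];
    dst.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
}

fn main() {
    let mut lit = Vec::new();
    push_char_utf8('a', &mut lit);
    push_char_utf8('é', &mut lit); // U+00E9 encodes as two bytes
    assert_eq!(lit, vec![0x61, 0xC3, 0xA9]);
}
```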

View File

@@ -1,13 +1,9 @@
-use {
-    grep_matcher::ByteSet,
-    regex_syntax::{
-        hir::{self, Hir, HirKind, Look},
-        utf8::Utf8Sequences,
-    },
-};
+use grep_matcher::ByteSet;
+use regex_syntax::hir::{self, Hir, HirKind};
+use regex_syntax::utf8::Utf8Sequences;
 /// Return a confirmed set of non-matching bytes from the given expression.
-pub(crate) fn non_matching_bytes(expr: &Hir) -> ByteSet {
+pub fn non_matching_bytes(expr: &Hir) -> ByteSet {
 let mut set = ByteSet::full();
 remove_matching_bytes(expr, &mut set);
 set
@@ -17,27 +13,15 @@ pub(crate) fn non_matching_bytes(expr: &Hir) -> ByteSet {
 /// the given expression.
 fn remove_matching_bytes(expr: &Hir, set: &mut ByteSet) {
 match *expr.kind() {
-HirKind::Empty
-| HirKind::Look(Look::WordAscii | Look::WordAsciiNegate)
-| HirKind::Look(Look::WordUnicode | Look::WordUnicodeNegate) => {}
-HirKind::Look(Look::Start | Look::End) => {
-    // FIXME: This is wrong, but not doing this leads to incorrect
-    // results because of how anchored searches are implemented in
-    // the 'grep-searcher' crate.
-    set.remove(b'\n');
-}
-HirKind::Look(Look::StartLF | Look::EndLF) => {
-    set.remove(b'\n');
-}
-HirKind::Look(Look::StartCRLF | Look::EndCRLF) => {
-    set.remove(b'\r');
-    set.remove(b'\n');
-}
-HirKind::Literal(hir::Literal(ref lit)) => {
-    for &b in lit.iter() {
-        set.remove(b);
-    }
-}
+HirKind::Empty | HirKind::Anchor(_) | HirKind::WordBoundary(_) => {}
+HirKind::Literal(hir::Literal::Unicode(c)) => {
+    for &b in c.encode_utf8(&mut [0; 4]).as_bytes() {
+        set.remove(b);
+    }
+}
+HirKind::Literal(hir::Literal::Byte(b)) => {
+    set.remove(b);
+}
 HirKind::Class(hir::Class::Unicode(ref cls)) => {
 for range in cls.iter() {
 // This is presumably faster than encoding every codepoint
@@ -55,10 +39,10 @@ fn remove_matching_bytes(expr: &Hir, set: &mut ByteSet) {
 }
 }
 HirKind::Repetition(ref x) => {
-    remove_matching_bytes(&x.sub, set);
+    remove_matching_bytes(&x.hir, set);
 }
-HirKind::Capture(ref x) => {
-    remove_matching_bytes(&x.sub, set);
-}
+HirKind::Group(ref x) => {
+    remove_matching_bytes(&x.hir, set);
+}
 HirKind::Concat(ref xs) => {
 for x in xs {
@@ -75,13 +59,17 @@ fn remove_matching_bytes(expr: &Hir, set: &mut ByteSet) {
 #[cfg(test)]
 mod tests {
-use {grep_matcher::ByteSet, regex_syntax::ParserBuilder};
+use grep_matcher::ByteSet;
+use regex_syntax::ParserBuilder;
 use super::non_matching_bytes;
 fn extract(pattern: &str) -> ByteSet {
-let expr =
-    ParserBuilder::new().utf8(false).build().parse(pattern).unwrap();
+let expr = ParserBuilder::new()
+    .allow_invalid_utf8(true)
+    .build()
+    .parse(pattern)
+    .unwrap();
 non_matching_bytes(&expr)
 }
@@ -137,16 +125,4 @@ mod tests {
 assert_eq!(sparse(&extract(r"\xFF")), sparse_except(&[0xC3, 0xBF]));
 assert_eq!(sparse(&extract(r"(?-u)\xFF")), sparse_except(&[0xFF]));
 }
-#[test]
-fn anchor() {
-    // FIXME: The first four tests below should correspond to a full set
-    // of bytes for the non-matching bytes I think.
-    assert_eq!(sparse(&extract(r"^")), sparse_except(&[b'\n']));
-    assert_eq!(sparse(&extract(r"$")), sparse_except(&[b'\n']));
-    assert_eq!(sparse(&extract(r"\A")), sparse_except(&[b'\n']));
-    assert_eq!(sparse(&extract(r"\z")), sparse_except(&[b'\n']));
-    assert_eq!(sparse(&extract(r"(?m)^")), sparse_except(&[b'\n']));
-    assert_eq!(sparse(&extract(r"(?m)$")), sparse_except(&[b'\n']));
-}
 }
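The idea behind `non_matching_bytes` above can be sketched with std only: start from the full set of 256 bytes and remove every byte the pattern could match; whatever remains is guaranteed not to occur in any match, which a searcher can exploit to skip ahead. This `ByteSet` is a stand-in for the grep-matcher type:

```rust
// A 256-entry membership table: `full` marks every byte present,
// `remove` deletes one, `contains` queries membership.
struct ByteSet([bool; 256]);

impl ByteSet {
    fn full() -> ByteSet {
        ByteSet([true; 256])
    }
    fn remove(&mut self, b: u8) {
        self.0[b as usize] = false;
    }
    fn contains(&self, b: u8) -> bool {
        self.0[b as usize]
    }
}

fn main() {
    let mut set = ByteSet::full();
    // Suppose the pattern is the literal "ab": remove its matching bytes.
    for &b in b"ab" {
        set.remove(b);
    }
    assert!(!set.contains(b'a'));
    assert!(!set.contains(b'b'));
    assert!(set.contains(b'z')); // confirmed non-matching
}
```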

View File

@@ -1,9 +1,7 @@
-use {
-    grep_matcher::LineTerminator,
-    regex_syntax::hir::{self, Hir, HirKind},
-};
+use grep_matcher::LineTerminator;
+use regex_syntax::hir::{self, Hir, HirKind};
-use crate::error::{Error, ErrorKind};
+use error::{Error, ErrorKind};
 /// Return an HIR that is guaranteed to never match the given line terminator,
 /// if possible.
@@ -17,26 +15,7 @@ use crate::error::{Error, ErrorKind};
 ///
 /// If the given line terminator is not ASCII, then this function returns an
 /// error.
-///
-/// Note that as of regex 1.9, this routine could theoretically be implemented
-/// without returning an error. Namely, for example, we could turn
-/// `foo\nbar` into `foo[a&&b]bar`. That is, replace line terminators with a
-/// sub-expression that can never match anything. Thus, ripgrep would accept
-/// such regexes and just silently not match anything. Regex versions prior to
-/// 1.8 don't support such constructs. I ended up deciding to leave the
-/// existing behavior of returning an error instead. For example:
-///
-/// ```text
-/// $ echo -n 'foo\nbar\n' | rg 'foo\nbar'
-/// the literal '"\n"' is not allowed in a regex
-///
-/// Consider enabling multiline mode with the --multiline flag (or -U for
-/// short). When multiline mode is enabled, new line characters can be
-/// matched.
-/// ```
-///
-/// This looks like a good error message to me, and even suggests a flag that
-/// the user can use instead.
-pub(crate) fn strip_from_match(
+pub fn strip_from_match(
 expr: Hir,
 line_term: LineTerminator,
 ) -> Result<Hir, Error> {
@@ -44,34 +23,40 @@ pub(crate) fn strip_from_match(
 let expr1 = strip_from_match_ascii(expr, b'\r')?;
 strip_from_match_ascii(expr1, b'\n')
 } else {
-strip_from_match_ascii(expr, line_term.as_byte())
+let b = line_term.as_byte();
+if b > 0x7F {
+    return Err(Error::new(ErrorKind::InvalidLineTerminator(b)));
+}
+strip_from_match_ascii(expr, b)
 }
 }
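The policy enforced above reduces to: a pattern must not be able to match its own line terminator, and for CRLF both `\r` and `\n` are stripped in turn. A std-only sketch of that check; it is literal-only and illustrative, whereas the real code walks the regex HIR rather than scanning a string:

```rust
// Does the pattern text contain a forbidden line terminator character?
// With CRLF enabled, both `\r` and `\n` are forbidden.
fn contains_line_term(pattern: &str, crlf: bool) -> bool {
    if crlf {
        pattern.contains('\r') || pattern.contains('\n')
    } else {
        pattern.contains('\n')
    }
}

fn main() {
    assert!(contains_line_term("foo\nbar", false));
    assert!(contains_line_term("foo\rbar", true));
    assert!(!contains_line_term("foo\rbar", false));
    assert!(!contains_line_term("foobar", true));
}
```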
/// The implementation of strip_from_match. The given byte must be ASCII. /// The implementation of strip_from_match. The given byte must be ASCII. This
/// This function returns an error otherwise. It also returns an error if /// function panics otherwise.
/// it couldn't remove `\n` from the given regex without leaving an empty
/// character class in its place.
fn strip_from_match_ascii(expr: Hir, byte: u8) -> Result<Hir, Error> { fn strip_from_match_ascii(expr: Hir, byte: u8) -> Result<Hir, Error> {
if !byte.is_ascii() { assert!(byte <= 0x7F);
return Err(Error::new(ErrorKind::InvalidLineTerminator(byte))); let chr = byte as char;
} assert_eq!(chr.len_utf8(), 1);
let ch = char::from(byte);
let invalid = || Err(Error::new(ErrorKind::NotAllowed(ch.to_string()))); let invalid = || Err(Error::new(ErrorKind::NotAllowed(chr.to_string())));
Ok(match expr.into_kind() { Ok(match expr.into_kind() {
HirKind::Empty => Hir::empty(), HirKind::Empty => Hir::empty(),
HirKind::Literal(hir::Literal(lit)) => { HirKind::Literal(hir::Literal::Unicode(c)) => {
if lit.iter().find(|&&b| b == byte).is_some() { if c == chr {
return invalid(); return invalid();
} }
Hir::literal(lit) Hir::literal(hir::Literal::Unicode(c))
}
HirKind::Literal(hir::Literal::Byte(b)) => {
if b as char == chr {
return invalid();
}
Hir::literal(hir::Literal::Byte(b))
} }
         HirKind::Class(hir::Class::Unicode(mut cls)) => {
-            if cls.ranges().is_empty() {
-                return Ok(Hir::class(hir::Class::Unicode(cls)));
-            }
             let remove = hir::ClassUnicode::new(Some(
-                hir::ClassUnicodeRange::new(ch, ch),
+                hir::ClassUnicodeRange::new(chr, chr),
             ));
             cls.difference(&remove);
             if cls.ranges().is_empty() {
@@ -80,9 +65,6 @@ fn strip_from_match_ascii(expr: Hir, byte: u8) -> Result<Hir, Error> {
             Hir::class(hir::Class::Unicode(cls))
         }
         HirKind::Class(hir::Class::Bytes(mut cls)) => {
-            if cls.ranges().is_empty() {
-                return Ok(Hir::class(hir::Class::Bytes(cls)));
-            }
             let remove = hir::ClassBytes::new(Some(
                 hir::ClassBytesRange::new(byte, byte),
             ));
@@ -92,14 +74,15 @@ fn strip_from_match_ascii(expr: Hir, byte: u8) -> Result<Hir, Error> {
             }
             Hir::class(hir::Class::Bytes(cls))
         }
-        HirKind::Look(x) => Hir::look(x),
+        HirKind::Anchor(x) => Hir::anchor(x),
+        HirKind::WordBoundary(x) => Hir::word_boundary(x),
         HirKind::Repetition(mut x) => {
-            x.sub = Box::new(strip_from_match_ascii(*x.sub, byte)?);
+            x.hir = Box::new(strip_from_match_ascii(*x.hir, byte)?);
             Hir::repetition(x)
         }
-        HirKind::Capture(mut x) => {
-            x.sub = Box::new(strip_from_match_ascii(*x.sub, byte)?);
-            Hir::capture(x)
+        HirKind::Group(mut x) => {
+            x.hir = Box::new(strip_from_match_ascii(*x.hir, byte)?);
+            Hir::group(x)
         }
         HirKind::Concat(xs) => {
             let xs = xs
@@ -123,7 +106,7 @@ mod tests {
     use regex_syntax::Parser;
     use super::{strip_from_match, LineTerminator};
-    use crate::error::Error;
+    use error::Error;
     fn roundtrip(pattern: &str, byte: u8) -> String {
         roundtrip_line_term(pattern, LineTerminator::byte(byte)).unwrap()
@@ -148,11 +131,11 @@ mod tests {
     #[test]
     fn various() {
-        assert_eq!(roundtrip(r"[a\n]", b'\n'), "a");
-        assert_eq!(roundtrip(r"[a\n]", b'a'), "\n");
-        assert_eq!(roundtrip_crlf(r"[a\n]"), "a");
-        assert_eq!(roundtrip_crlf(r"[a\r]"), "a");
-        assert_eq!(roundtrip_crlf(r"[a\r\n]"), "a");
+        assert_eq!(roundtrip(r"[a\n]", b'\n'), "[a]");
+        assert_eq!(roundtrip(r"[a\n]", b'a'), "[\n]");
+        assert_eq!(roundtrip_crlf(r"[a\n]"), "[a]");
+        assert_eq!(roundtrip_crlf(r"[a\r]"), "[a]");
+        assert_eq!(roundtrip_crlf(r"[a\r\n]"), "[a]");
         assert_eq!(roundtrip(r"(?-u)\s", b'a'), r"(?-u:[\x09-\x0D\x20])");
         assert_eq!(roundtrip(r"(?-u)\s", b'\n'), r"(?-u:[\x09\x0B-\x0D\x20])");
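The class-stripping tests above (for example, `(?-u)\s` with `\n` stripped becoming `(?-u:[\x09\x0B-\x0D\x20])`) boil down to removing one byte from a set of inclusive byte ranges. Here is a minimal standalone sketch of that operation; `strip_byte` is a hypothetical helper invented for illustration, not part of grep-regex, which does the same thing through `regex_syntax`'s class types and `difference`:

```rust
// Remove a single byte from a set of inclusive byte ranges, splitting any
// range that contains it. This mirrors what strip_from_match_ascii does to
// a character class when stripping the line terminator.
fn strip_byte(ranges: &[(u8, u8)], byte: u8) -> Vec<(u8, u8)> {
    let mut out = Vec::new();
    for &(lo, hi) in ranges {
        if byte < lo || byte > hi {
            // The stripped byte doesn't touch this range; keep it whole.
            out.push((lo, hi));
            continue;
        }
        // Keep the sub-ranges on either side of `byte`, if non-empty.
        if lo < byte {
            out.push((lo, byte - 1));
        }
        if byte < hi {
            out.push((byte + 1, hi));
        }
    }
    out
}

fn main() {
    // (?-u)\s is [\x09-\x0D\x20]; stripping \n (0x0A) splits the first
    // range, matching the expected test output (?-u:[\x09\x0B-\x0D\x20]).
    let ws = [(0x09, 0x0D), (0x20, 0x20)];
    println!("{:?}", strip_byte(&ws, b'\n'));
}
```

If stripping empties the class entirely, the real code reports the same "not allowed" error as for a literal line terminator, since the regex could then never match.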

crates/regex/src/util.rs (new file)
@@ -0,0 +1,29 @@
+/// Converts an arbitrary sequence of bytes to a literal suitable for building
+/// a regular expression.
+pub fn bytes_to_regex(bs: &[u8]) -> String {
+    use regex_syntax::is_meta_character;
+    use std::fmt::Write;
+
+    let mut s = String::with_capacity(bs.len());
+    for &b in bs {
+        if b <= 0x7F && !is_meta_character(b as char) {
+            write!(s, r"{}", b as char).unwrap();
+        } else {
+            write!(s, r"\x{:02x}", b).unwrap();
+        }
+    }
+    s
+}
+
+/// Converts arbitrary bytes to a nice string.
+pub fn show_bytes(bs: &[u8]) -> String {
+    use std::ascii::escape_default;
+    use std::str;
+
+    let mut nice = String::new();
+    for &b in bs {
+        let part: Vec<u8> = escape_default(b).collect();
+        nice.push_str(str::from_utf8(&part).unwrap());
+    }
+    nice
+}


@@ -1,59 +1,39 @@
-use std::{
-    collections::HashMap,
-    panic::{RefUnwindSafe, UnwindSafe},
-    sync::Arc,
-};
+use std::cell::RefCell;
+use std::collections::HashMap;
+use std::sync::Arc;
-use {
-    grep_matcher::{Match, Matcher, NoError},
-    regex_automata::{
-        meta::Regex, util::captures::Captures, util::pool::Pool, Input,
-        PatternID,
-    },
-};
+use grep_matcher::{Match, Matcher, NoError};
+use regex::bytes::{CaptureLocations, Regex};
+use thread_local::CachedThreadLocal;
-use crate::{config::ConfiguredHIR, error::Error, matcher::RegexCaptures};
+use config::ConfiguredHIR;
+use error::Error;
+use matcher::RegexCaptures;
-type PoolFn =
-    Box<dyn Fn() -> Captures + Send + Sync + UnwindSafe + RefUnwindSafe>;
 /// A matcher for implementing "word match" semantics.
 #[derive(Debug)]
-pub(crate) struct WordMatcher {
+pub struct WordMatcher {
     /// The regex which is roughly `(?:^|\W)(<original pattern>)(?:$|\W)`.
     regex: Regex,
-    /// The HIR that produced the regex above. We don't keep the HIR for the
-    /// `original` regex.
-    ///
-    /// We put this in an `Arc` because by the time it gets here, it won't
-    /// change. And because cloning and dropping an `Hir` is somewhat expensive
-    /// due to its deep recursive representation.
-    chir: Arc<ConfiguredHIR>,
     /// The original regex supplied by the user, which we use in a fast path
     /// to try and detect matches before deferring to slower engines.
     original: Regex,
     /// A map from capture group name to capture group index.
     names: HashMap<String, usize>,
-    /// A thread-safe pool of reusable buffers for finding the match offset of
-    /// the inner group.
-    caps: Arc<Pool<Captures, PoolFn>>,
+    /// A reusable buffer for finding the match location of the inner group.
+    locs: Arc<CachedThreadLocal<RefCell<CaptureLocations>>>,
 }
 impl Clone for WordMatcher {
     fn clone(&self) -> WordMatcher {
-        // We implement Clone manually so that we get a fresh Pool such that it
-        // can set its own thread owner. This permits each thread usings `caps`
-        // to hit the fast path.
-        //
-        // Note that cloning a regex is "cheap" since it uses reference
-        // counting internally.
-        let re = self.regex.clone();
+        // We implement Clone manually so that we get a fresh CachedThreadLocal
+        // such that it can set its own thread owner. This permits each thread
+        // usings `locs` to hit the fast path.
         WordMatcher {
             regex: self.regex.clone(),
-            chir: Arc::clone(&self.chir),
             original: self.original.clone(),
             names: self.names.clone(),
-            caps: Arc::new(Pool::new(Box::new(move || re.create_captures()))),
+            locs: Arc::new(CachedThreadLocal::new()),
         }
     }
 }
@@ -64,38 +44,31 @@ impl WordMatcher {
     ///
     /// The given options are used to construct the regular expression
     /// internally.
-    pub(crate) fn new(chir: ConfiguredHIR) -> Result<WordMatcher, Error> {
-        let original = chir.clone().into_anchored().to_regex()?;
-        let chir = Arc::new(chir.into_word()?);
-        let regex = chir.to_regex()?;
-        let caps = Arc::new(Pool::new({
-            let regex = regex.clone();
-            Box::new(move || regex.create_captures()) as PoolFn
-        }));
+    pub fn new(expr: &ConfiguredHIR) -> Result<WordMatcher, Error> {
+        let original =
+            expr.with_pattern(|pat| format!("^(?:{})$", pat))?.regex()?;
+        let word_expr = expr.with_pattern(|pat| {
+            let pat = format!(r"(?:(?-m:^)|\W)({})(?:(?-m:$)|\W)", pat);
+            debug!("word regex: {:?}", pat);
+            pat
+        })?;
+        let regex = word_expr.regex()?;
+        let locs = Arc::new(CachedThreadLocal::new());
         let mut names = HashMap::new();
-        let it = regex.group_info().pattern_names(PatternID::ZERO);
-        for (i, optional_name) in it.enumerate() {
+        for (i, optional_name) in regex.capture_names().enumerate() {
             if let Some(name) = optional_name {
                 names.insert(name.to_string(), i.checked_sub(1).unwrap());
             }
         }
-        Ok(WordMatcher { regex, chir, original, names, caps })
+        Ok(WordMatcher { regex, original, names, locs })
     }
-    /// Return the underlying regex used to match at word boundaries.
-    ///
-    /// The original regex is in the capture group at index 1.
-    pub(crate) fn regex(&self) -> &Regex {
+    /// Return the underlying regex used by this matcher.
+    pub fn regex(&self) -> &Regex {
         &self.regex
     }
-    /// Return the underlying HIR for the regex used to match at word
-    /// boundaries.
-    pub(crate) fn chir(&self) -> &ConfiguredHIR {
-        &self.chir
-    }
     /// Attempt to do a fast confirmation of a word match that covers a subset
     /// (but hopefully a big subset) of most cases. Ok(Some(..)) is returned
     /// when a match is found. Ok(None) is returned when there is definitively
@@ -106,11 +79,12 @@ impl WordMatcher {
         haystack: &[u8],
         at: usize,
     ) -> Result<Option<Match>, ()> {
-        // This is a bit hairy. The whole point here is to avoid running a
-        // slower regex engine to extract capture groups. Remember, our word
-        // regex looks like this:
+        // This is a bit hairy. The whole point here is to avoid running an
+        // NFA simulation in the regex engine. Remember, our word regex looks
+        // like this:
         //
-        //     (^|\W)(<original regex>)(\W|$)
+        //     (^|\W)(<original regex>)($|\W)
+        //     where ^ and $ have multiline mode DISABLED
         //
         // What we want are the match offsets of <original regex>. So in the
         // easy/common case, the original regex will be sandwiched between
@@ -128,8 +102,7 @@ impl WordMatcher {
         // The reason why we cannot handle the ^/$ cases here is because we
         // can't assume anything about the original pattern. (Try commenting
         // out the checks for ^/$ below and run the tests to see examples.)
-        let input = Input::new(haystack).span(at..haystack.len());
-        let mut cand = match self.regex.find(input) {
+        let mut cand = match self.regex.find_at(haystack, at) {
             None => return Ok(None),
             Some(m) => Match::new(m.start(), m.end()),
         };
@@ -138,15 +111,8 @@ impl WordMatcher {
         }
         let (_, slen) = bstr::decode_utf8(&haystack[cand]);
         let (_, elen) = bstr::decode_last_utf8(&haystack[cand]);
-        let new_start = cand.start() + slen;
-        let new_end = cand.end() - elen;
-        // This occurs when the original regex can match the empty string. In
-        // this case, just bail instead of trying to get it right here since
-        // it's likely a pathological case.
-        if new_start > new_end {
-            return Err(());
-        }
-        cand = cand.with_start(new_start).with_end(new_end);
+        cand =
+            cand.with_start(cand.start() + slen).with_end(cand.end() - elen);
         if self.original.is_match(&haystack[cand]) {
             Ok(Some(cand))
         } else {
@@ -172,23 +138,23 @@ impl Matcher for WordMatcher {
         //
         // OK, well, it turns out that it is worth it! But it is quite tricky.
         // See `fast_find` for details. Effectively, this lets us skip running
-        // a slower regex engine to extract capture groups in the vast majority
-        // of cases. However, the slower engine is I believe required for full
-        // correctness.
+        // the NFA simulation in the regex engine in the vast majority of
+        // cases. However, the NFA simulation is required for full correctness.
         match self.fast_find(haystack, at) {
             Ok(Some(m)) => return Ok(Some(m)),
             Ok(None) => return Ok(None),
             Err(()) => {}
         }
-        let input = Input::new(haystack).span(at..haystack.len());
-        let mut caps = self.caps.get();
-        self.regex.search_captures(&input, &mut caps);
-        Ok(caps.get_group(1).map(|sp| Match::new(sp.start, sp.end)))
+        let cell =
+            self.locs.get_or(|| RefCell::new(self.regex.capture_locations()));
+        let mut caps = cell.borrow_mut();
+        self.regex.captures_read_at(&mut caps, haystack, at);
+        Ok(caps.get(1).map(|m| Match::new(m.0, m.1)))
     }
     fn new_captures(&self) -> Result<RegexCaptures, NoError> {
-        Ok(RegexCaptures::with_offset(self.regex.create_captures(), 1))
+        Ok(RegexCaptures::with_offset(self.regex.capture_locations(), 1))
     }
     fn capture_count(&self) -> usize {
@@ -205,10 +171,9 @@ impl Matcher for WordMatcher {
         at: usize,
         caps: &mut RegexCaptures,
     ) -> Result<bool, NoError> {
-        let input = Input::new(haystack).span(at..haystack.len());
-        let caps = caps.captures_mut();
-        self.regex.search_captures(&input, caps);
-        Ok(caps.is_match())
+        let r =
+            self.regex.captures_read_at(caps.locations_mut(), haystack, at);
+        Ok(r.is_some())
     }
     // We specifically do not implement other methods like find_iter or
@@ -219,12 +184,12 @@ impl Matcher for WordMatcher {
 #[cfg(test)]
 mod tests {
     use super::WordMatcher;
-    use crate::config::Config;
+    use config::Config;
     use grep_matcher::{Captures, Match, Matcher};
     fn matcher(pattern: &str) -> WordMatcher {
-        let chir = Config::default().build_many(&[pattern]).unwrap();
-        WordMatcher::new(chir).unwrap()
+        let chir = Config::default().hir(pattern).unwrap();
+        WordMatcher::new(&chir).unwrap()
     }
     fn find(pattern: &str, haystack: &str) -> Option<(usize, usize)> {
@@ -272,8 +237,6 @@ mod tests {
         assert_eq!(Some((2, 5)), find(r"!?foo!?", "a!foo!a"));
         assert_eq!(Some((2, 7)), find(r"!?foo!?", "##!foo!\n"));
-        assert_eq!(Some((3, 8)), find(r"!?foo!?", "##\n!foo!##"));
-        assert_eq!(Some((3, 8)), find(r"!?foo!?", "##\n!foo!\n##"));
         assert_eq!(Some((3, 7)), find(r"f?oo!?", "##\nfoo!##"));
         assert_eq!(Some((2, 5)), find(r"(?-u)foo[^a]*", "#!foo☃aaa"));
     }
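The heart of the word matcher is the pattern rewrite shown in `WordMatcher::new`: the user's regex is wrapped so that every match must be bordered by a non-word character or the start/end of the haystack, with `(?-m:^)`/`(?-m:$)` disabling multiline mode so the anchors mean start/end of the whole text. The rewrite itself is just string formatting and can be shown in isolation:

```rust
// The -w/--word-regexp pattern transformation, reduced to its string form.
// The real matcher also keeps the original (unwrapped) regex around so that
// fast_find can confirm candidate matches without capture-group extraction.
fn wrap_word(pattern: &str) -> String {
    format!(r"(?:(?-m:^)|\W)({})(?:(?-m:$)|\W)", pattern)
}

fn main() {
    println!("{}", wrap_word("foo"));
}
```

Because the original pattern lands in capture group 1, the matcher can report the offsets of the user's regex alone, after trimming the bordering `\W` byte off each end of the full match.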


@@ -1,6 +1,6 @@
 [package]
 name = "grep-searcher"
-version = "0.1.11"  #:version
+version = "0.1.7"  #:version
 authors = ["Andrew Gallant <jamslam@gmail.com>"]
 description = """
 Fast line oriented regex searching as a library.
@@ -10,20 +10,19 @@ homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/searcher"
 repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/searcher"
 readme = "README.md"
 keywords = ["regex", "grep", "egrep", "search", "pattern"]
-license = "Unlicense OR MIT"
-edition = "2018"
+license = "Unlicense/MIT"
 [dependencies]
-bstr = { version = "1.6.0", default-features = false, features = ["std"] }
+bstr = { version = "0.2.0", default-features = false, features = ["std"] }
 bytecount = "0.6"
 encoding_rs = "0.8.14"
 encoding_rs_io = "0.1.6"
-grep-matcher = { version = "0.1.6", path = "../matcher" }
+grep-matcher = { version = "0.1.2", path = "../matcher" }
 log = "0.4.5"
-memmap = { package = "memmap2", version = "0.5.3" }
+memmap = "0.7"
 [dev-dependencies]
-grep-regex = { version = "0.1.11", path = "../regex" }
+grep-regex = { version = "0.1.3", path = "../regex" }
 regex = "1.1"
 [features]


@@ -28,3 +28,9 @@ Add this to your `Cargo.toml`:
 [dependencies]
 grep-searcher = "0.1"
 ```
+
+and this to your crate root:
+
+```rust
+extern crate grep_searcher;
+```


@@ -1,3 +1,6 @@
+extern crate grep_regex;
+extern crate grep_searcher;
+
 use std::env;
 use std::error::Error;
 use std::io;


@@ -48,6 +48,10 @@ using the
 implementation of `Sink`.
 
 ```
+extern crate grep_matcher;
+extern crate grep_regex;
+extern crate grep_searcher;
+
 use std::error::Error;
 
 use grep_matcher::Matcher;
@@ -95,13 +99,24 @@ searches stdin.
 #![deny(missing_docs)]
 
-pub use crate::lines::{LineIter, LineStep};
-pub use crate::searcher::{
+extern crate bstr;
+extern crate bytecount;
+extern crate encoding_rs;
+extern crate encoding_rs_io;
+extern crate grep_matcher;
+#[macro_use]
+extern crate log;
+extern crate memmap;
+#[cfg(test)]
+extern crate regex;
+
+pub use lines::{LineIter, LineStep};
+pub use searcher::{
     BinaryDetection, ConfigError, Encoding, MmapChoice, Searcher,
     SearcherBuilder,
 };
-pub use crate::sink::sinks;
-pub use crate::sink::{
+pub use sink::sinks;
+pub use sink::{
     Sink, SinkContext, SinkContextKind, SinkError, SinkFinish, SinkMatch,
 };

Some files were not shown because too many files have changed in this diff.