It turns out that ripgrep's -w/--word-regexp flag doesn't quite do what it claims to. As a result, its output differs from GNU grep in some cases (and even from ripgrep itself when PCRE2 is enabled via -P):
$ echo '###' | grep -w -o .
#
#
#
$ echo '###' | rg-13.0.0 -w -o .
#
#
$ echo '###' | rg-13.0.0 -P -w -o .
#
#
#
ripgrep 14 will fix this:
$ echo '###' | rg-14.0.0 -w -o .
#
#
#
The actual issue here is that, with its default regex engine, ripgrep implemented -w/--word-regexp via a hacky work-around that just didn't work right in all cases. With PCRE2, the flag was instead implemented via look-around assertions, which always got it right:
ripgrep/crates/pcre2/src/matcher.rs, lines 64 to 69 in 52731cd:
} else if self.word {
    // We make this option exclusive with whole_line because when
    // whole_line is enabled, all matches necessary fall on word
    // boundaries. So this extra goop is strictly redundant.
    singlepat = format!(r"(?<!\w)(?:{})(?!\w)", singlepat);
}
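To make the wrapping concrete, here is a minimal standalone sketch of what that `format!` call produces. The function names `wrap_word` and `wrap_word_ungrouped` are hypothetical and exist only for illustration. The second function is not in ripgrep; it just shows why the non-capturing group matters when the user's pattern contains an alternation.

```rust
/// Wrap `pat` so that it only matches when it is not flanked by word
/// characters. Same format string as the PCRE2 snippet; the function
/// name is hypothetical.
fn wrap_word(pat: &str) -> String {
    format!(r"(?<!\w)(?:{})(?!\w)", pat)
}

/// The same wrapping without the `(?:...)` group, for contrast only.
fn wrap_word_ungrouped(pat: &str) -> String {
    format!(r"(?<!\w){}(?!\w)", pat)
}

fn main() {
    // Grouped: both assertions apply to the whole alternation.
    println!("{}", wrap_word("foo|bar"));
    // Ungrouped: alternation binds loosest, so the look-behind only guards
    // `foo` and the look-ahead only guards `bar`.
    println!("{}", wrap_word_ungrouped("foo|bar"));
}
```

Without the group, `foo|bar` would be parsed as "`foo` not preceded by a word character, or `bar` not followed by one", which is not what -w means.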
The work-around I used basically tried to emulate look-around with capture groups. And while it works in a lot of cases, it doesn't work in all of them. I spent quite a bit of time trying to figure out how to fix the work-around once and for all, but couldn't see a way to make the code obviously correct. Instead, I just added support for \b{start-half} and \b{end-half} word boundary assertions that do exactly what the look-around does in PCRE2. See: rust-lang/regex#469
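As a rough illustration of what the half-boundary assertions mean, the sketch below models them with plain functions and applies them to the `###` example using only the standard library. Everything here is an assumption made for illustration: the function names are hypothetical, `is_word_char` only approximates `\w` (alphanumerics plus underscore), and `wrap_word_half` is a guess at what an equivalent wrapper could look like, not ripgrep's actual code. The plain `\b` model is included for contrast, since `\b` requires a word character on exactly one side and so never holds inside `###`.

```rust
/// Approximation of `\w`: alphanumerics plus underscore.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Model of `\b{start-half}` at byte offset `at`: like `(?<!\w)`, it holds
/// when there is no word character immediately before `at`.
fn start_half(hay: &str, at: usize) -> bool {
    hay[..at].chars().next_back().map_or(true, |c| !is_word_char(c))
}

/// Model of `\b{end-half}` at byte offset `at`: like `(?!\w)`, it holds
/// when there is no word character immediately after `at`.
fn end_half(hay: &str, at: usize) -> bool {
    hay[at..].chars().next().map_or(true, |c| !is_word_char(c))
}

/// Model of plain `\b`: exactly one neighbor is a word character.
fn word_boundary(hay: &str, at: usize) -> bool {
    let before = hay[..at].chars().next_back().map_or(false, is_word_char);
    let after = hay[at..].chars().next().map_or(false, is_word_char);
    before != after
}

/// All single-character matches (the pattern `.`) whose left and right
/// edges satisfy the given assertions.
fn single_char_matches<'h>(
    hay: &'h str,
    left: fn(&str, usize) -> bool,
    right: fn(&str, usize) -> bool,
) -> Vec<&'h str> {
    hay.char_indices()
        .filter(|&(i, c)| c != '\n' && left(hay, i) && right(hay, i + c.len_utf8()))
        .map(|(i, c)| &hay[i..i + c.len_utf8()])
        .collect()
}

/// Hypothetical wrapper using the new assertions. Literal braces must be
/// doubled inside `format!`.
fn wrap_word_half(pat: &str) -> String {
    format!(r"\b{{start-half}}(?:{})\b{{end-half}}", pat)
}

fn main() {
    // Half boundaries accept every `#`, matching the GNU grep output.
    println!("{:?}", single_char_matches("###", start_half, end_half));
    // Plain `\b` on both sides accepts none of them.
    println!("{:?}", single_char_matches("###", word_boundary, word_boundary));
    println!("{}", wrap_word_half("."));
}
```

Because both half assertions are zero-width, they consume nothing from the haystack, so one match cannot swallow a character that a neighboring match needs.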