PLUG - 1/21/2014 - David Oswald - "Regular Expressions" |
"What is backtracking? How do character classes work? What is the "Little Engine that Could(n't)"?"
Notes
Definitions:
- Literal Characters: abcd... ABC... 123...
- Operators: m// (match), s/// (substitute), =~ or !~ (bind)
- Meta-characters: \ | ( ) [ { ^ $ * + ? .
- Meta symbols: \b \D \t \3 etc...
Example:
$string = "hello world"; say "Match" if $string =~ m/hello/;
Syntactic Shortcuts:
$_ = "hello world"; say "match" if /hello/;
DFA / NFA / Hybrid Engines - (Deterministic Finite Automata) DFAs and (Non-deterministic Finite Automata) NFAs have exactly the same capabilities and limitations. The only difference is notation convenience.
- DFA: simpler, text-directed match, no backtracking, more limited, awk/egrep
- NFA: regex-directed match, backtracking, more flexible semantics, grep/less/sed/more/Perl/PHP/Python
Two big NFA rules:
- The match that begins earliest wins.
- The standard quantifiers are greedy.
- rxrx - command-line REPL and wrapper for Regexp::Debugger
Quick rxrx overview:
Install: cpan Regexp::Debugger Usage: rxrx perl -MRegexp::Debugger -E 'Regexp::Debugger::rxrx(@ARGV)' To set the regex: /[SOMETHING]/ To set the string: '[SOMETHING]' To start match engine: 'm' To quit: 'q' Commands: / : Enter a pattern ' : Enter a new literal string " : Enter a new double-quoted string m : Match current string against current pattern q or x : quit debugger and exit h : help Match Commands: - : back step [enter] : forward step c : proceed visually quickly to the end
Example of large backtracking out of control:
# fails in 218 steps /(a*)*[^Bb]$/ 'aaaaaab'
Backtracking can be put under control by using "possessive quantifiers": (*+, ++, ?+, {n,m}+)
# fails in 79 steps /(a*)*+[^Bb]$/
Possessive quantifiers - change the "greedy" quantifiers to "lazy" quantifiers that attempt to match the minimal amount. Greedy quantifiers attempt to match the greatest left most match. Lazy quantifiers attempt to match the least left most match.
/.*foo/ # Greedy ; backtrack backwards. /.*?foo/ # Lazy; backtrack forward.
More NFA rules:
- Matches occur as far left as possible.
- Alternation has left-to-right precedence.
- Alternative matches if every item listed in the alternative matches sequentially.
- Backtracking occurs to try higher-pecking-order assertions.
- Quantifiers must be satisfied within their permissible range.
- Each atom matches according to its designated semantics.
Shorter segments are often easier:
if( /Brian/ && /John/ ) { ... } # vs if( /Brian.*John|John.*Brian/ ) { ... }
Progressive matching - starting where the previous pattern starts off
When to use RegExes:
RegExes are for matching patters, not for things that have a parser, like HTML and JSON. Should also not be used for Email Addresses.
"Regexes optimal for small HTML parsing problems, pessimal for large ones" -- Tom Christiansesn
Appropriate Alternatives:
- Complex grammars - use parsing classes
- Fixed-width fields - use unpack, substr
- Comma Separated Values - use CSV libraries
- Uncomplicated, predictable data - Regular Expressions!
Thanks to this article I can learn more. Expand your knowledge and abilities. Actually the article is very practical.
ReplyDeletehttps://scoreheromod.info/apk/
ReplyDeletehttps://trafficridermod.pro/apk/
https://vectormod.xyz/apk/
Greaat reading your blog post
ReplyDelete