Perl: Optimizing pattern searches with Regexp::Assemble

When you match against a pattern with many “or” (|) clauses, such as /pattern1|pattern2|pattern3/, Perl's regexp engine generally tries each alternative in turn. With a long list of “or” clauses, this can result in poor performance.

To optimize this kind of pattern matching, you can use the Regexp::Assemble module.

The basic purpose of Regexp::Assemble is to take a list of regexps, combine them into a single optimized pattern that is equivalent to the whole list, and match that pattern against a string.
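As a minimal sketch of that workflow, the snippet below assembles a small word list into one pattern and tests two strings against it. The word list and the test strings are made up for illustration; new, add and re are methods documented by Regexp::Assemble, which has to be installed from CPAN since it is not a core module.

```perl
use strict;
use warnings;
use Regexp::Assemble;   # CPAN module, not part of core Perl

# Hypothetical word list, for illustration only.
my @words = qw(foobar foobaz foo bar baz);

my $ra = Regexp::Assemble->new;
$ra->add(@words);       # each element is treated as a regexp

my $re = $ra->re;       # the assembled pattern as a compiled regexp (qr//)
print "Assembled pattern: $re\n";

# Match the single assembled pattern instead of a long alternation.
for my $text ('xx foobaz xx', 'nothing here') {
    printf "%-14s %s\n", $text, ($text =~ $re ? 'match' : 'no match');
}
```

Because add treats its arguments as regexps, literal strings that contain metacharacters (dots in IP addresses, for instance) should be passed through quotemeta first.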

Classic examples include matching a list of IP addresses or FQDNs against log files, or a list of words against text files.

Building the optimized pattern can take time, but it only needs to be done once. Regexp::Assemble therefore makes sense mainly when you have a fairly long list of regexps and a substantial number of matches to perform (e.g. when running the optimized pattern over big files).

Otherwise, the overhead of building the optimized pattern may cancel out the performance gain.
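The "build once, match many times" pattern can be sketched as follows. The addresses and log lines are invented for the example, and an in-memory filehandle stands in for a real, much larger log file. The surrounding \b anchors are an extra precaution added here, not something Regexp::Assemble requires.

```perl
use strict;
use warnings;
use Regexp::Assemble;   # CPAN module, not part of core Perl

# Hypothetical list of addresses; a real list would be much longer.
my @addresses = qw(192.168.0.1 192.168.0.17 10.0.0.1 10.0.12.7);

# Build the pattern once, up front. quotemeta() escapes the dots so they
# match literally, since add() treats its arguments as regexps.
my $ra = Regexp::Assemble->new;
$ra->add(map { quotemeta($_) } @addresses);
my $assembled = $ra->re;

# Word boundaries avoid hits inside longer numbers (e.g. 110.0.0.12).
my $re = qr/\b$assembled\b/;

# Stand-in for a big log file: made-up lines read through an in-memory handle.
my $log = <<'END_LOG';
GET /index.html from 192.168.0.17
GET /about.html from 172.16.4.2
GET /index.html from 10.0.0.1
GET /index.html from 110.0.0.12
END_LOG
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

# Reuse the same compiled pattern for every line.
my @hits;
while (my $line = <$fh>) {
    chomp $line;
    push @hits, $line if $line =~ $re;
}
close $fh;

print "$_\n" for @hits;
```

The assembly cost is paid a single time before the loop, so it is spread over every line read. With only a handful of patterns or lines, as in this toy example, a plain alternation would likely do just as well.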

See Regexp::Assemble on CPAN for all details and methods.