Perl Quick Reference for Integration Services


Regular Expressions


Regular expressions are the standard way to match patterns in the UNIX world.  Although regular expression syntax differs slightly among the UNIX tools, Perl's regular expressions are among the most powerful and complete.


Each character matches itself except for the special characters: +?.*^$()[]{}|\

The special meaning of any of these characters can be removed by preceding it with a backslash (\).
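As a small illustration of escaping, the sketch below matches a literal dot and a literal dollar sign. The sample strings are made up for the example. quotemeta is Perl's built-in function for escaping every special character in a string.

```perl
use strict;
use warnings;

my $price = 'Total: $4.50';           # hypothetical sample data

# Unescaped, "." matches any character, so "4x50" matches too.
my $loose  = ('4x50' =~ /4.50/)  ? 1 : 0;

# Escaped, "\." matches only a literal dot.
my $strict = ('4x50' =~ /4\.50/) ? 1 : 0;

# "\$" matches a literal dollar sign instead of anchoring at the end.
my $has_dollar = ($price =~ /\$4\.50/) ? 1 : 0;

# quotemeta escapes every non-word character for you.
my $literal = quotemeta('$4.50');
my $via_qm  = ($price =~ /$literal/) ? 1 : 0;

print "$loose $strict $has_dollar $via_qm\n";   # prints "1 0 1 1"
```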



.            Matches an arbitrary character (by default, any character except a newline)
(...)        Groups a series of pattern elements into a single element
^            Matches the beginning of the target
$            Matches the end of the line
[...]        Denotes a class of characters to match.  [^...] negates the class
(...|...)    Matches one of the alternatives
(?:regex)    Grouping without back-references.  Used to group without storing results in variables $1..$9
+            Matches the preceding pattern element one or more times
*            Matches the preceding pattern element zero or more times
?            Matches the preceding pattern element zero or one times
{n,m}        Denotes the minimum n and maximum m match count.  {n} means exactly n times; {n,} means at least n times; {n,m} means between n and m times
\w           Matches word characters, i.e. alphanumerics including _.  \W matches non-word characters
\s           Matches whitespace.  \S matches non-whitespace
\d           Matches digits.  \D matches non-digits
\b           Matches word boundaries






\r           Carriage return
\1...\9      Refer to matched subexpressions grouped with (...)
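The pattern elements above combine as in the following sketch, which uses a capturing group, a back-reference, and a non-capturing group. The log line and variable names are invented for the example.

```perl
use strict;
use warnings;

my $line = 'ERROR ERROR: disk sda1 is full';   # hypothetical log line

# ^ anchors at the start, (\w+) captures a word, and \1 requires
# the same word to appear again.
my $doubled = ($line =~ /^(\w+)\s+\1\b/) ? $1 : '';

# (?:...) groups the alternatives without creating a capture,
# so the whole device name is still $1.
my $device = ($line =~ /\b((?:sd|hd)[a-z]\d+)\b/) ? $1 : '';

print "$doubled $device\n";   # prints "ERROR sda1"
```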


Regular expression modifiers:



g            Matches as many times as possible
i            Case-insensitive matching
m            Treats the string as multiple lines (^ and $ match at the start and end of each line)
s            Treats the string as a single line (. also matches a newline)
x            Comments and whitespace can be added to the pattern for readability
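A brief sketch of the modifiers in use follows. The two-line input string is an assumption made for the example.

```perl
use strict;
use warnings;

my $text = "Name: smith\nname: JONES\n";   # hypothetical two-line input

# i: ignore case.  Without it, /SMITH/ would not match "smith".
my $found = ($text =~ /SMITH/i) ? 1 : 0;

# g in list context returns every match; m lets ^ match at each line start.
my @names = ($text =~ /^name:\s*(\w+)/gim);

# s lets "." match newlines, so the pattern can span both lines.
my $spans = ($text =~ /smith.*JONES/s) ? 1 : 0;

# x allows whitespace and comments inside the pattern.
my ($first) = $text =~ /
    ^ name: \s*   # label at the start of the string
    (\w+)         # the value we want
/xi;

print "$found @names $spans $first\n";   # prints "1 smith JONES 1 smith"
```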







Matching, Searching and Replacing, Transliterating:




[ expr =~ ] [ m ] /pattern/ [ modifiers ]

Searches expr (default: $_) for the pattern and returns true or false depending on whether the pattern matched.  If you prepend the m, you can use almost any pair of delimiters instead of the slashes.  This is useful when the pattern contains many slashes, since it avoids having to escape them all.  The most common alternative delimiters are {}, [], and ##.
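To make the delimiter choice concrete, here is a minimal sketch that matches the same path prefix with slashes and with alternative delimiters. The paths are sample values chosen for illustration.

```perl
use strict;
use warnings;

$_ = '/usr/local/bin/perl';    # hypothetical path in $_

# With slashes as delimiters, every / in the pattern must be escaped.
my $with_slashes = /^\/usr\/local\// ? 1 : 0;

# With m{...} the same pattern needs no escaping.
my $with_braces = m{^/usr/local/} ? 1 : 0;

# Binding to an expression other than $_ with =~.
my $path   = '/etc/passwd';
my $in_etc = ($path =~ m#^/etc/#) ? 1 : 0;

print "$with_slashes $with_braces $in_etc\n";   # prints "1 1 1"
```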




[ $var =~ ] s/pattern/newtext/ [ modifiers ]

Searches the string var (default: $_) for a pattern and, if found, replaces that part with the replacement text.  It returns the number of substitutions made.  Almost any delimiter may replace the slashes.  If bracketing delimiters are used, pattern and newtext may each have their own delimiters, e.g., s(foo)[bar]
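The return value and the bracketing delimiters can be seen in this short sketch. The record and path strings are made up for the example.

```perl
use strict;
use warnings;

my $record = 'foo|foo|baz';    # hypothetical record

# Without g, only the first occurrence is replaced.
(my $once = $record) =~ s/foo/bar/;

# With g, every occurrence is replaced and the count is returned.
my $count = ($record =~ s/foo/bar/g);

# Bracketing delimiters: pattern and replacement each get their own pair.
my $path = '/tmp/foo';
$path =~ s(foo)[bar];

print "$once $count $record $path\n";
# prints "bar|foo|baz 2 bar|bar|baz /tmp/bar"
```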




[ $var =~ ] tr/searchlist/replacementlist/ [ d ]

Transliterates all occurrences of the characters found in the search list with the corresponding characters in the replacement list.  The d modifier deletes all characters found in the search list that do not have a corresponding character in the replacement list.
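A small sketch of transliteration, with and without the d modifier, might look like this. The sample value is an assumption.

```perl
use strict;
use warnings;

my $code = 'abc-123';    # hypothetical value

# Map each lowercase letter to its uppercase counterpart.
(my $upper = $code) =~ tr/a-z/A-Z/;

# With d, "-" has no counterpart in the replacement list, so it is
# deleted; the letters a-c are still mapped to A-C.
(my $packed = $code) =~ tr/a-c-/A-C/d;

# tr returns the number of characters it matched.  With an empty
# replacement list and no d, the string itself is left unchanged.
my $digits = ($code =~ tr/0-9//);

print "$upper $packed $digits\n";   # prints "ABC-123 ABC123 3"
```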





Please send me (Jamin) more examples of regexes you come up with, or examples you’d like me to come up with.


In each example the string that contains the data we’re interested in is always in $_. 


Example 1:


A physician name is in the format:




And we’d like to extract each part of the name.


if (/(\w+),\s*(\w+)\s+(\w+)/) {
      $last   = $1;
      $first  = $2;
      $middle = $3;
}


So we’re capturing a word followed by a comma and some optional whitespace, then another word followed by some whitespace, and finally one more word.
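Run against a sample value, the match behaves as in the sketch below. The name and its "Last, First Middle" layout are assumptions inferred from the pattern, not data from the original.

```perl
use strict;
use warnings;

$_ = 'Welby, Marcus James';    # hypothetical sample name

my ($last, $first, $middle) = ('', '', '');
if (/(\w+),\s*(\w+)\s+(\w+)/) {
    ($last, $first, $middle) = ($1, $2, $3);
}

print "last=$last first=$first middle=$middle\n";
# prints "last=Welby first=Marcus middle=James"
```

Note that \w does not match hyphens or apostrophes, so a name such as O'Brien would be captured only in part unless the pattern uses a wider character class.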


Example 2:


Let’s say instead of extracting each name into a variable we’d like to just reformat:








tr/ ,/^/d;


That will transliterate spaces into ^ and will delete commas since we added the d modifier.
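Applied to a sample name, the transliteration gives a ^-delimited string. The input value here is an assumption made for illustration.

```perl
use strict;
use warnings;

$_ = 'Welby, Marcus James';    # hypothetical sample name

# Space maps to ^; the comma has no counterpart, so d deletes it.
tr/ ,/^/d;

print "$_\n";   # prints "Welby^Marcus^James"
```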










Example 3:


A small program to mark up code for posting to the web.  It escapes characters that have special meaning in HTML.


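#!/usr/bin/perl -w

print "<pre><code>\n";
while (<>) {
      # Assumed loop body: escape &, < and > (the & first), then print the line
      s/&/&amp;/g; s/</&lt;/g; s/>/&gt;/g;
      print;
}
print "</code></pre>\n";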
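The escaping step can also be written as a subroutine, which makes it easy to try on a single string. This is only a sketch: escape_html is a hypothetical helper name, and the choice of &, < and > as the characters to escape is an assumption based on the description above.

```perl
use strict;
use warnings;

# escape_html is a hypothetical helper, not part of the original program.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must run first, or it would re-escape &lt; and &gt;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $escaped = escape_html('if ($a < $b && $c > 0) { ... }');
print "$escaped\n";
# prints "if ($a &lt; $b &amp;&amp; $c &gt; 0) { ... }"
```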


Quick Examples:


Keep the first five characters of a string:

$first = substr($_, 0, 5);


Keep the last five characters of a string:

$last = substr($_, -5);


Keep all characters up to the first ^:
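One way to do this, as a sketch with a made-up sample value, is to capture a run of non-^ characters from the start of the string, or to delete everything from the first ^ onwards.

```perl
use strict;
use warnings;

$_ = 'Welby^Marcus^James';    # hypothetical ^-delimited value

# [^^]* reads as "any run of characters that are not ^".
my ($head) = /^([^^]*)/;

# Alternative: delete everything from the first ^ onwards.
(my $trimmed = $_) =~ s/\^.*//s;

print "$head $trimmed\n";   # prints "Welby Welby"
```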



Remove leading zeros:
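A possible substitution for this, shown on assumed sample values, anchors a run of zeros at the start of the string. The second form uses a look-ahead so that an all-zero value keeps its final digit.

```perl
use strict;
use warnings;

my $id = '000123';    # hypothetical value

# Delete one or more zeros at the start of the string.
(my $stripped = $id) =~ s/^0+//;

# Variant: (?=\d) requires a digit to remain, so "0000" becomes "0".
(my $zero = '0000') =~ s/^0+(?=\d)//;

print "$stripped $zero\n";   # prints "123 0"
```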



Remove leading spaces:

s/^ *//;


Remove trailing spaces:

s/ *$//;


Reformat a phone number (123) 456-7890 to 1234567890:

tr/() -//d;


Reformat a phone number 1234567890 to (123) 456-7890:

s/(\d{3})(\d{3})(\d{4})/($1) $2-$3/;


Reformat SSN 123-45-6789 to 123456789:
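A minimal sketch, assuming the only characters to remove are the hyphens, deletes them with tr and the d modifier.

```perl
use strict;
use warnings;

my $ssn = '123-45-6789';    # sample value from the heading above

# "-" has no replacement character, so d deletes every hyphen.
(my $digits = $ssn) =~ tr/-//d;

print "$digits\n";   # prints "123456789"
```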



Reformat SSN 123456789 to 123-45-6789:
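This can be sketched with three capture groups, in the same style as the phone number example. The anchors are an added assumption: they keep the substitution from touching strings that are not exactly nine digits.

```perl
use strict;
use warnings;

my $ssn = '123456789';    # sample value from the heading above

# Capture 3, 2 and 4 digits and rejoin them with hyphens.
(my $dashed = $ssn) =~ s/^(\d{3})(\d{2})(\d{4})$/$1-$2-$3/;

print "$dashed\n";   # prints "123-45-6789"
```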



Keep only alpha chars:
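One way to sketch this uses tr with the c modifier, which is not covered above: c complements the search list, so together with d it deletes everything that is not a letter. The sample string is made up, and only the ASCII letters a-z and A-Z are treated as alpha here.

```perl
use strict;
use warnings;

my $name = "Dr. O'Brien 3rd";    # hypothetical value

# c complements the list (everything except letters); d deletes it.
(my $alpha = $name) =~ tr/a-zA-Z//cd;

print "$alpha\n";   # prints "DrOBrienrd"
```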



Keep only numeric chars:
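The same approach works for digits. In this sketch, which uses an assumed sample value, tr with the c and d modifiers deletes every character that is not in 0-9.

```perl
use strict;
use warnings;

my $phone = '(123) 456-7890';    # hypothetical value

# c complements the list (everything except digits); d deletes it.
(my $numeric = $phone) =~ tr/0-9//cd;

print "$numeric\n";   # prints "1234567890"
```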



Left pad a number with zeros to 10 places:

$_ = sprintf("%010d", $_);


Right pad a string with spaces to 10 places:

$_ = sprintf("%-10s", $_);