Bash Regular Expressions Example

A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions by using various operators to combine smaller expressions.

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash.

1. Regular expression metacharacters

A regular expression may be followed by a repetition operator. The table below lists the repetition operators together with the other common metacharacters:

Operator   Effect
.          Matches any single character.
?          The preceding item is optional and will be matched at most once.
*          The preceding item will be matched zero or more times.
+          The preceding item will be matched one or more times.
{N}        The preceding item is matched exactly N times.
{N,}       The preceding item is matched N or more times.
{N,M}      The preceding item is matched at least N times, but not more than M times.
-          Represents a range if it is not first or last in a list, or the ending point of a range in a list.
^          Matches the empty string at the beginning of a line; also represents the characters not in the range of a list.
$          Matches the empty string at the end of a line.
\b         Matches the empty string at the edge of a word.
\B         Matches the empty string provided it is not at the edge of a word.
\<         Matches the empty string at the beginning of a word.
\>         Matches the empty string at the end of a word.

Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated subexpressions.

Two regular expressions may be joined by the infix operator “|”. The resulting regular expression matches any string matching either subexpression.

Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole subexpression may be enclosed in parentheses to override these precedence rules.
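
As a rough illustration of repetition, alternation and precedence, the sketch below runs grep -E over a made-up word list. The words and the variable name “words” are assumptions for this example only. The -E option selects extended regular expressions, so that “?”, “{N}”, “|” and parentheses can be written without backslashes.

```shell
# Made-up word list, one word per line (not part of the article's Text.txt).
words=$(printf '%s\n' color colour colouur gray grey grab key)

# "?" makes the preceding "u" optional.
printf '%s\n' "$words" | grep -E '^colou?r$'     # color, colour

# "{2}" requires exactly two "u" characters.
printf '%s\n' "$words" | grep -E '^colou{2}r$'   # colouur

# Parentheses confine the alternation to a single letter.
printf '%s\n' "$words" | grep -E '^gr(a|e)y$'    # gray, grey

# Without parentheses, "|" splits the whole pattern:
# "starts with gra" OR "ends with ey".
printf '%s\n' "$words" | grep -E '^gra|ey$'      # gray, grey, grab, key
```

The last two commands differ only in the parentheses, which shows that alternation binds more loosely than concatenation.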

2. Examples using grep

The command grep searches the input files for lines containing a match to a given pattern list. When it finds a match in a line, it copies the line to standard output (by default), or whatever other sort of output you have requested with options.

Though grep expects to do the matching on text, it has no limits on input line length other than available memory, and it can match arbitrary characters within a line. If the final byte of an input file is not a newline, grep silently supplies one. Since newline is also a separator for the list of patterns, there is no way to match newline characters in a text.

The following text file is used in the next examples:

Text.txt

 
We shall not spend a large expense of time
Before we reckon with your several loves,
And make us even with you. My thanes and kinsmen,
Henceforth be earls, the first that ever Scotland
In such an honour named. What's more to do,
Which would be planted newly with the time,
As calling home our exiled friends abroad
That fled the snares of watchful tyranny;
Producing forth the cruel ministers
Of this dead butcher and his fiend-like queen,
Who, as 'tis thought, by self and violent hands
Took off her life; this, and what needful else
That calls upon us, by the grace of Grace,
We will perform in measure, time and place:
So, thanks to all at once and to each one,
Whom we invite to see us crown'd at Scone.
Macbeth, William Shakespeare

The first command displays the lines from Text.txt that contain the string “with”.

The next command displays the same lines, each prefixed with its line number.

[Figure: A simple Regular Expression Example using grep]
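
The two searches described above could look roughly like this. This is a sketch, not necessarily the exact commands from the original screenshot. To keep it self-contained, it recreates only the first four lines of Text.txt in a scratch directory; the variable name “workdir” is an assumption.

```shell
# Recreate the first four lines of Text.txt in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/Text.txt" <<'EOF'
We shall not spend a large expense of time
Before we reckon with your several loves,
And make us even with you. My thanes and kinsmen,
Henceforth be earls, the first that ever Scotland
EOF

# Print every line that contains the string "with".
grep with "$workdir/Text.txt"

# Same search, with each matching line prefixed by its line number.
grep -n with "$workdir/Text.txt"
```

With this excerpt, both commands report the second and third line.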

2.1 Line and word anchors

The following example displays only the lines that start with the string “We”.

In the next example, we search for lines ending in “:”.

[Figure: A Regular Expression Line Anchor Example]
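
A minimal sketch of both anchors, again using a short excerpt of Text.txt written to a scratch directory (the variable name “workdir” is an assumption):

```shell
# Four lines taken from Text.txt.
workdir=$(mktemp -d)
cat > "$workdir/Text.txt" <<'EOF'
We shall not spend a large expense of time
Before we reckon with your several loves,
We will perform in measure, time and place:
So, thanks to all at once and to each one,
EOF

# "^" anchors the match to the beginning of the line.
grep '^We' "$workdir/Text.txt"

# "$" anchors the match to the end of the line.
grep ':$' "$workdir/Text.txt"
```

The pattern is single-quoted so that the shell passes “^” and “$” to grep unchanged.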

2.2 Character classes

A bracket expression is a list of characters enclosed by “[” and “]”. It matches any single character in that list. For example, the regular expression “[0123456789]” matches any single digit.

If the first character of the list is the caret, “^”, then it matches any character NOT in the list.

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale’s collating sequence and character set.

For example, in the default C locale, “[a-d]” is equivalent to “[abcd]”. Many locales sort characters in dictionary order, and in these locales “[a-d]” is typically not equivalent to “[abcd]”; it might be equivalent to “[aBbCcDd]”, for example.

To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value “C”.
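
As a small sketch of this, the command below sets LC_ALL only for the single grep invocation. The two input lines are made up for the illustration.

```shell
# Two sample lines: a lowercase and an uppercase letter.
# In the C locale the range a-d covers only the lowercase letters a, b, c, d.
printf 'b\nB\n' | LC_ALL=C grep '[a-d]'    # prints only: b
```

Without the LC_ALL=C prefix the result depends on the locale that is active in the session.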

In the following example, all the lines containing either a “y” or “c” character are displayed:

[Figure: A Regular Expression Character Class Example]
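
A possible form of this search, shown on four lines from Text.txt that are written to a scratch directory (the variable name “workdir” is an assumption):

```shell
# Four lines taken from Text.txt; two of them contain neither "y" nor "c".
workdir=$(mktemp -d)
cat > "$workdir/Text.txt" <<'EOF'
We shall not spend a large expense of time
Before we reckon with your several loves,
Took off her life; this, and what needful else
Macbeth, William Shakespeare
EOF

# Print the lines that contain at least one "y" or "c".
grep '[yc]' "$workdir/Text.txt"
```

Only the second and the fourth line of the excerpt are printed.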

2.3 Wildcards

Use “.” to match any single character. For example, the following lists all five-character English dictionary words starting with “c” and ending in “s” (handy for solving crosswords):

[Figure: A Regular Expression Wildcards Example]
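
The crossword search might be sketched as follows. Many systems ship a word list such as /usr/share/dict/words, but that path is not guaranteed to exist, so this example uses a small made-up list in a temporary file (the variable name “dict” is an assumption).

```shell
# Made-up stand-in for a dictionary file, one word per line.
dict=$(mktemp)
printf '%s\n' cat cats chaps clips clouds docs > "$dict"

# "c", exactly three arbitrary characters, then "s", anchored to the whole line.
grep '^c...s$' "$dict"     # chaps, clips
```

Because each line holds exactly one word, the line anchors “^” and “$” are enough to limit the match to five-letter words.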

If you want to display lines containing the literal dot character, use the -F option to grep.
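
The difference between a regular-expression dot and a literal dot can be seen in this short sketch, which uses two lines from Text.txt (the variable name “sample” is an assumption):

```shell
# Two lines from Text.txt; only the first one contains a dot.
sample=$(mktemp)
printf '%s\n' 'And make us even with you. My thanes and kinsmen,' \
              'We shall not spend a large expense of time' > "$sample"

# As a regular expression, "." matches any character: both lines are printed.
grep '.' "$sample"

# With -F the pattern is a fixed string: only the line containing a dot is printed.
grep -F '.' "$sample"

# Escaping the dot with a backslash has the same effect.
grep '\.' "$sample"
```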

To match any number of characters, combine the dot with the asterisk (“.*”). This example selects all words starting with “c” and ending in “s” from the system’s dictionary:

[Figure: A Regular Expression Wildcards Example]
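
A sketch of this search on a small made-up word list, used here instead of the system dictionary so that the example runs anywhere (the variable name “dict” is an assumption):

```shell
# Made-up stand-in for a dictionary file, one word per line.
dict=$(mktemp)
printf '%s\n' cat cats chaps clips clouds docs > "$dict"

# "c", any number of arbitrary characters (including none), then "s".
grep '^c.*s$' "$dict"     # cats, chaps, clips, clouds
```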

3. Pattern matching using Bash features

3.1 Character ranges

Apart from grep and regular expressions, there’s a good deal of pattern matching that you can do directly in the shell, without having to use an external program.

As you already know, the asterisk (*) and the question mark (?) match any string or any single character, respectively. Quote these special characters to match them literally:

[Figure: A Regular Expression Character Range Example]

This lists all files in the current directory, starting with “A”, “B” or “C”.

If the first character within the brackets is “!” or “^”, any character not enclosed will be matched. To match the dash (“-”), include it as the first or last character in the set.
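
A minimal sketch of a character range and its negation in a shell pattern. The file names and the variable name “demo_dir” are made up for this illustration.

```shell
# Scratch directory with made-up file names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch Alpha.txt Beta.txt Charlie.txt delta.txt

# Names starting with A, B or C.
ls -d [A-C]*      # Alpha.txt Beta.txt Charlie.txt

# "!" as the first character negates the set: names NOT starting with A, B or C.
ls -d [!A-C]*     # delta.txt
```

Here the shell expands the pattern before ls runs, so ls only receives the matching names.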

The sorting depends on the current locale and on the value of the LC_COLLATE variable, if it is set. Mind that other locales might interpret “[a-cx-z]” as “[aBbCcXxYyZz]” if sorting is done in dictionary order.

If you want to be sure to have the traditional interpretation of ranges, force this behavior by setting LC_COLLATE or LC_ALL to “C”.
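
One way to force the traditional interpretation for a single pattern is sketched below. The file names and the variable name “demo_dir” are assumptions.

```shell
# Scratch directory with made-up file names.
demo_dir=$(mktemp -d)
touch "$demo_dir/apple" "$demo_dir/Banana" "$demo_dir/cherry" "$demo_dir/zebra"

# Set the locale inside a subshell BEFORE the pattern is expanded.
# Prefixing a single command (LC_ALL=C ls [a-c]*) would be too late,
# because the shell expands the pattern before it applies the assignment.
(
  cd "$demo_dir" || exit 1
  LC_ALL=C
  printf '%s\n' [a-c]*    # apple, cherry (Banana is excluded)
)
```

The subshell keeps the locale change from leaking into the rest of the session.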

3.2 Character classes

Character classes can be specified within the square brackets, using the syntax [:CLASS:], where CLASS is one of the following values (most are defined in the POSIX standard; “ascii” and “word” are Bash additions):

  • alnum
  • alpha
  • ascii
  • blank
  • cntrl
  • digit
  • graph
  • lower
  • print
  • punct
  • space
  • upper
  • word
  • xdigit

The following example lists all files whose names begin with an uppercase letter.

[Figure: A Regular Expression Character Class Example]
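
A sketch of such a listing, with made-up file names in a scratch directory (the variable name “demo_dir” is an assumption). Note the double brackets: the outer pair forms the bracket expression and the inner pair belongs to the class name.

```shell
# Scratch directory with made-up file names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch Alpha.txt beta.txt Gamma.txt 2024.log

# Names that begin with an uppercase letter.
ls -d [[:upper:]]*    # Alpha.txt Gamma.txt

# Names that begin with a digit.
ls -d [[:digit:]]*    # 2024.log
```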

When the extglob shell option is enabled (using the shopt built-in), several extended pattern matching operators are recognized.
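
Two of these operators are sketched below. The example is Bash-specific and will not work in a plain POSIX sh. The file names and the variable name “demo_dir” are made up.

```shell
# Scratch directory with made-up file names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch report.txt notes.txt photo.jpg photo.png

# Extended patterns must be enabled before the line that uses them is read.
shopt -s extglob

# @(a|b) matches exactly one of the alternatives.
ls -d *.@(jpg|png)     # photo.jpg photo.png

# !(pattern) matches anything except the given pattern.
ls -d !(*.txt)         # photo.jpg photo.png
```

The other operators follow the same shape: ?(…) for zero or one occurrence, *(…) for zero or more, and +(…) for one or more.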

4. Summary

Regular expressions are powerful tools for selecting particular lines from files or output. Many UNIX tools use regular expressions: vim, perl, the PostgreSQL database and so on.

They can be made available in any language or application using external libraries, and they have even found their way to non-UNIX systems. For instance, regular expressions are used in the Excel spreadsheet that comes with the Microsoft Office suite.

In this article we got a feel for the grep command, which is indispensable in any UNIX environment.

Andreas Pomarolli

Andreas graduated in Computer Science and Bioinformatics from the University of Linz. During his studies he was involved in a large number of research projects ranging from software engineering to data engineering and web engineering. His scientific focus includes software engineering, data engineering, web engineering and project management. He currently works as a software engineer in the IT sector, where he is mainly involved in projects based on Java, databases and web technologies.