A primer on grep and regular expressions in Linux

The grep utility is a very powerful tool for searching and filtering text. This article shows several examples of its use that will help you appreciate its capabilities.
The main use of grep is to search for words or phrases in files and output streams. You run a search by passing a query and the search area (a file) on the command line.
For example, to find the string “needle” in the haystack.txt file, use the following command:

$ grep needle haystack.txt

As a result, grep will display every line of the haystack.txt file that contains needle. It's important to note that in this case grep is looking for a sequence of characters, not a word. For example, lines that include the word “needless” and other words that contain the sequence “needle” will also be displayed.


To tell grep that you are looking for a specific word, use the -w switch, which limits the search to whole words. For grep, a whole-word match is one bounded on both sides by the start or end of the line or by a character other than a letter, digit, or underscore.

$ grep -w needle haystack.txt
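The difference between a substring match and a whole-word match can be seen in a small sketch. The sample file and its three lines are made up for illustration only.

```shell
# Hypothetical sample file, created only for this illustration.
haystack=$(mktemp)
printf '%s\n' 'a needle in the hay' 'a needless remark' 'needle-sharp wit' > "$haystack"

# Substring match: all three lines contain the characters "needle".
substring_hits=$(grep -c needle "$haystack")

# Whole-word match: "needless" is rejected, while "needle-sharp" still
# matches because "-" is not a word character.
word_hits=$(grep -cw needle "$haystack")

echo "substring: $substring_hits, whole word: $word_hits"
rm -f "$haystack"
```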

It is not necessary to limit the search to just one file; grep can search across a group of files, and the search results will indicate the file in which each match was found. The -n switch also adds the number of the line in which the match was found, and the -r switch performs a recursive search. This is very convenient when searching through program source code.

$ grep -rnw function_name /home/www/dev/myprogram/

The file name is listed before each match. To hide file names, use the -h switch; if, on the contrary, you need only the file names, specify the -l switch.
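As a rough sketch of how these switches combine, the following builds a throwaway source tree (the directory layout and file contents are invented for the example) and searches it recursively.

```shell
# Hypothetical source tree, created only for this illustration.
src=$(mktemp -d)
mkdir "$src/lib"
printf '%s\n' 'int main(void) {' '  return function_name();' '}' > "$src/main.c"
printf '%s\n' 'int function_name(void) { return 0; }' \
              'int function_name_v2(void) { return 1; }' > "$src/lib/util.c"

# -r: recurse, -n: line numbers, -w: whole word (function_name_v2 is skipped).
matching_lines=$(grep -rnw function_name "$src" | grep -c '')

# -l: print only the names of files that contain a match.
matching_files=$(grep -rlw function_name "$src" | grep -c '')

# -h: print matching lines without the file name prefix.
bare_line=$(grep -rhw function_name "$src/lib")

echo "$matching_lines lines in $matching_files files"
rm -rf "$src"
```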
In the following example, we will search for URLs in the IRC log file and show the last 10 matches.

$ grep -wo "http://.*" channel.log | tail

The -o option tells grep to print only the part of the line that matches the pattern rather than the entire line. Using a pipe, we redirect the output of grep to the tail command, which by default outputs the last 10 lines.
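Because ".*" runs to the end of the line, a pattern that stops at the first space may be a better fit when a line holds several URLs. The sketch below assumes a made-up log with three messages; the pattern "http://[^ ]*" is an illustrative alternative, not the command used above.

```shell
# Hypothetical IRC log, created only for this illustration.
log=$(mktemp)
printf '%s\n' \
  '<alice> see http://example.com/a for details' \
  '<bob> no links here' \
  '<carol> http://example.org/b and http://example.net/c' > "$log"

# -o prints each match on its own line; [^ ]* stops at the first space.
urls=$(grep -o 'http://[^ ]*' "$log")
url_count=$(printf '%s\n' "$urls" | grep -c '')

# tail -n 2 keeps only the last two matches.
last_two=$(printf '%s\n' "$urls" | tail -n 2)

printf '%s\n' "$last_two"
rm -f "$log"
```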
Now we will count the number of messages sent to the IRC channel by certain users, for example all the messages I sent from home and from work. They differ in nickname: at home I use the nickname user_at_home, and at work user_at_work.

$ grep -c "^user_at_\(home\|work\)" channel.log

With the -c option, grep prints only the number of matching lines, not the matches themselves. The search string is enclosed in quotes because it contains special characters that the shell could interpret as control characters. Note that the quotation marks are not part of the search pattern. The backslash "\" is used to escape special characters.
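A minimal sketch of counting with alternation, assuming GNU grep (the "\|" operator in basic syntax is a GNU extension) and a made-up four-line log. The -E form expresses the same pattern in extended syntax without backslashes.

```shell
# Hypothetical IRC log, created only for this illustration.
log=$(mktemp)
printf '%s\n' 'user_at_home: hi' 'user_at_work: hello' \
              'someone_else: hey' 'user_at_home: bye' > "$log"

# Basic syntax: grouping and alternation need backslashes.
bre_count=$(grep -c '^user_at_\(home\|work\)' "$log")

# Extended syntax (-E): the same pattern without backslashes.
ere_count=$(grep -Ec '^user_at_(home|work)' "$log")

echo "basic: $bre_count, extended: $ere_count"
rm -f "$log"
```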
Let's search for messages from people who like to “shout” in the channel. By “shouting” we mean messages written entirely in CAPITAL letters. To exclude accidental hits on abbreviations, we will search for words of five or more characters:

$ grep -w "[A-Z]\{5,\}" channel.log
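To see which words such a pattern actually picks up, it can be combined with -o. This is a sketch on invented log lines; LC_ALL=C is set so that the range A-Z covers only ASCII capitals.

```shell
# Hypothetical IRC log, created only for this illustration.
log=$(mktemp)
printf '%s\n' 'PLEASE STOP SHOUTING' 'I use HTTP and DNS daily' 'quiet message' > "$log"

# Whole words made of five or more capital letters; STOP, HTTP and DNS are too short.
shouted=$(LC_ALL=C grep -wo '[A-Z]\{5,\}' "$log")

printf '%s\n' "$shouted"
rm -f "$log"
```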

For a more detailed description, you can refer to the grep man page.
A few more examples:

# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

Displays lines from the /etc/passwd file that contain the string root.

# grep -n root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
12:operator:x:11:0:operator:/root:/sbin/nologin

This also displays the line number of each matching line.

# grep -v bash /etc/passwd | grep -v nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
news:x:9:13:news:/var/spool/news:
mailnull:x:47:47::/var/spool/mqueue:/dev/null
xfs:x:43:43:X Font Server:/etc/X11/fs:/bin/false
rpc:x:32:32:Portmapper RPC user:/:/bin/false
nscd:x:28:28:NSCD Daemon:/:/bin/false
named:x:25:25:Named:/var/named:/bin/false
squid:x:23:23::/var/spool/squid:/dev/null
ldap:x:55:55:LDAP User:/var/lib/ldap:/bin/false
apache:x:48:48:Apache:/var/www:/bin/false

Checks which users are not using bash, excluding those user accounts that have nologin specified as their shell.

# grep -c false /etc/passwd
7

Counts the number of accounts that have /bin/false as their shell.
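The inverted match (-v) and the count (-c) can be tried safely on a copy rather than on the real /etc/passwd. The four entries below are a made-up sample.

```shell
# Hypothetical passwd-style file, created only for this illustration.
passwd=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  'sync:x:5:0:sync:/sbin:/bin/sync' \
  'nscd:x:28:28:NSCD Daemon:/:/bin/false' > "$passwd"

# -v keeps the lines that do NOT match; two filters are chained with a pipe.
other_shells=$(grep -v bash "$passwd" | grep -v nologin)

# -c counts matching lines instead of printing them.
false_count=$(grep -c false "$passwd")

printf '%s\n' "$other_shells"
rm -f "$passwd"
```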

# grep -i games ~/.bash* | grep -v history

This command displays lines containing the string “games”, in upper or lower case, from all files in the current user's home directory whose names begin with .bash. The second grep drops lines that contain the string history, which excludes matches found in ~/.bash_history, since that file may contain the same string. The word “games” is only an example; you can substitute any other word.
grep command and regular expressions

Unlike the previous example, we will now display only those lines that begin with the string “root”:

# grep ^root /etc/passwd
root:x:0:0:root:/root:/bin/bash

To see which accounts have no shell assigned at all, we look for lines ending with a ":" character:

# grep :$ /etc/passwd
news:x:9:13:news:/var/spool/news:
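Both anchors can be checked on a small made-up sample. Quoting the patterns is a precaution so that the shell never touches "^" or "$".

```shell
# Hypothetical passwd-style file, created only for this illustration.
passwd=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  'news:x:9:13:news:/var/spool/news:' > "$passwd"

# ^ anchors the match to the start of the line: "operator" also contains
# "root", but not at the beginning.
starts_with_root=$(grep '^root' "$passwd")

# $ anchors the match to the end of the line: an empty shell field.
no_shell=$(grep ':$' "$passwd")

echo "$starts_with_root"
echo "$no_shell"
rm -f "$passwd"
```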

To check whether the PATH variable in your ~/.bashrc file is exported, first select the lines containing "export" and then look for lines containing a word that starts with "PATH"; this way MANPATH and other similar variables are not displayed:

# grep export ~/.bashrc | grep "\<PATH"
export PATH="/bin:/usr/lib/mh:/lib:/usr/bin:/usr/local/bin:/usr/ucb:/usr/dbin:$PATH"

Character classes

A bracket expression is a list of characters enclosed between "[" and "]". It matches any single character in the list; if the first character of the list is "^", it matches any character that is NOT in the list. For example, the regular expression "[0123456789]" matches any single digit.

Within a bracket expression, you can specify a range consisting of two characters separated by a hyphen. The expression then matches any single character that sorts between the two characters, inclusive, according to the collation order and character set of the current locale. For example, in the default C locale the expression "[a-d]" is equivalent to "[abcd]". Many locales sort characters in dictionary order, and in these locales "[a-d]" is generally not equivalent to "[abcd]"; it may, for example, be equivalent to "[aBbCcDd]". To get the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to "C".

Finally, there are predefined named character classes, which are specified inside bracket expressions. For more information about these predefined classes, see the man pages or the grep documentation.
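A quick way to get predictable ranges is to pin the locale for a single command. The sketch below feeds three made-up one-character lines to grep under LC_ALL=C.

```shell
# In the C locale, [a-d] means exactly a, b, c, d; the capital B is not included.
range_hits=$(printf '%s\n' b B 5 | LC_ALL=C grep -c '[a-d]')

# An explicit list of digits matches only the line "5".
digit_hits=$(printf '%s\n' b B 5 | LC_ALL=C grep -c '[0123456789]')

# A leading ^ inside the brackets negates the list: everything except a-d.
negated_hits=$(printf '%s\n' b B 5 | LC_ALL=C grep -c '[^a-d]')

echo "range: $range_hits, digits: $digit_hits, negated: $negated_hits"
```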

# grep "[yf]" /etc/group
sys:x:3:root,bin,adm
tty:x:5:
mail:x:12:mail,postfix
ftp:x:50:
nobody:x:99:
floppy:x:19:
xfs:x:43:
nfsnobody:x:65534:
postfix:x:89:

The example displays all lines that contain either the character "y" or the character "f".
Universal characters (metacharacters)

Use "." to match any single character. Suppose you want a list of all English words in the dictionary that are five characters long, start with "c", and end with "h" (useful for solving crossword puzzles):

# grep "\<c...h\>" /usr/share/dict/words
catch
clash
cloth
coach
couch
cough
crash
crush

To match a period as a literal character, specify the -F option in the grep command. The symbols "\<" and "\>" match the empty string at the beginning and at the end of a word, respectively, so the pattern has to cover a whole word. If you want to find the pattern anywhere in the text, including inside longer words, omit "\<" and "\>"; for a stricter search of whole words only, use the -w switch.

To similarly find words that can have any number of characters between the “c” and “h,” use an asterisk (*). The example below selects all words starting with "c" and ending with "h" from the system dictionary:

# grep "\<c.*h\>" /usr/share/dict/words
caliph
cash
catch
cheesecloth
cheetah
--output omitted--
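The two dictionary searches can be reproduced on a short word list. This sketch assumes GNU grep, where "\<" and "\>" are available, and uses a made-up five-word file instead of the system dictionary.

```shell
# Hypothetical word list, created only for this illustration.
words=$(mktemp)
printf '%s\n' catch coach cheetah cash clutches > "$words"

# Exactly three characters between the leading "c" and the final "h".
five_letter=$(grep '\<c...h\>' "$words")

# Any number of characters between "c" and "h"; "clutches" ends in "s".
any_length=$(grep '\<c.*h\>' "$words")

printf '%s\n' "$five_letter"
rm -f "$words"
```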

If you want to find a literal asterisk character in a file or output stream, enclose it in single quotes. The user in the example below first tries to look for an asterisk in the /etc/profile file without using quotes, which finds nothing. When quotes are used, the matching line is output:

# grep * /etc/profile
# grep '*' /etc/profile
for i in /etc/profile.d/*.sh ; do
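The quoting matters because an unquoted asterisk is expanded by the shell into file names before grep ever sees it. A sketch on a made-up two-line file, also showing -F, which treats the whole pattern as a fixed string:

```shell
# Hypothetical profile-style file, created only for this illustration.
profile=$(mktemp)
printf '%s\n' 'for i in /etc/profile.d/*.sh ; do' 'umask 022' > "$profile"

# Single quotes keep the shell from expanding the asterisk.
quoted=$(grep '*' "$profile")

# -F disables regular expression syntax, so "*" and "." are literal.
fixed=$(grep -F '*.sh' "$profile")

echo "$quoted"
rm -f "$profile"
```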

To process text fully in bash scripts using sed and awk, you need to understand regular expressions. Implementations of this most useful tool can be found literally everywhere, and although all regular expressions are structured in a similar way and are based on the same ideas, working with them in different environments has certain peculiarities. Here we will talk about regular expressions that are suitable for use in Linux command-line scripts.

This material is an introduction to regular expressions for those who may be completely unaware of what they are. So let's start from the very beginning.

What are regular expressions

Many people, when they first see regular expressions, immediately think that they are looking at a meaningless jumble of characters. But this, of course, is far from the case. Take a look at this regex, for example:

^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9_.-]+)\.([a-zA-Z]{2,5})$
In our opinion, even an absolute beginner will immediately understand how it works and why it is needed :) If you don’t quite understand it, just read on and everything will fall into place.
A regular expression is a pattern that programs like sed or awk use to filter text. Patterns use ordinary ASCII characters that represent themselves, and so-called metacharacters that play a special role, for example allowing reference to certain groups of characters.

Types of Regular Expressions

Implementations of regular expressions in different environments, for example, in programming languages like Java, Perl and Python, and in Linux tools like sed, awk and grep, have certain peculiarities. These depend on the so-called regular expression engines, which interpret the patterns.
Linux has two regular expression engines:
  • An engine that supports the POSIX Basic Regular Expression (BRE) standard.
  • An engine that supports the POSIX Extended Regular Expression (ERE) standard.
Most Linux utilities conform to at least the POSIX BRE standard, but some utilities (including sed) understand only a subset of the BRE standard. One of the reasons for this limitation is the desire to make such utilities as fast as possible in text processing.

The POSIX ERE standard is often implemented in programming languages. It offers a larger set of tools for building regular expressions, for example special character sequences for frequently used patterns such as matching individual words or sets of digits in text. Awk supports the ERE standard.

There are many ways to develop regular expressions, depending both on the opinion of the programmer and on the features of the engine for which they are created. It's not easy to write universal regular expressions that any engine can understand. Therefore, we will focus on the most commonly used regular expressions and look at the features of their implementation for sed and awk.

POSIX BRE regular expressions

Perhaps the simplest BRE pattern is a regular expression for searching for the exact occurrence of a sequence of characters in text. This is what searching for a string looks like in sed and awk:

$ echo "This is a test" | sed -n '/test/p'
$ echo "This is a test" | awk '/test/{print $0}'

Finding text by pattern in sed


Finding text by pattern in awk

You may notice that the search for a given pattern does not depend on where exactly the text is located in the line. The number of occurrences does not matter either. Once the regular expression finds the given text anywhere in the line, the line is considered a match and is passed on for further processing.

When working with regular expressions, you need to take into account that they are case sensitive:

$ echo "This is a test" | awk '/Test/{print $0}'
$ echo "This is a test" | awk '/test/{print $0}'

Regular expressions are case sensitive

The first regular expression found no matches because the word “Test”, starting with a capital letter, does not appear in the text. The second, which searches for the word written in lowercase letters, found a matching line in the stream.
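The same checks can be captured in shell variables, which makes the difference easy to see. The last line is an illustrative workaround that is not part of the original examples: lowercasing the input with awk's tolower function before matching.

```shell
# Literal search with sed and awk: both print the matching line.
sed_hit=$(echo "This is a test" | sed -n '/test/p')
lowercase_hit=$(echo "This is a test" | awk '/test/{print $0}')

# Case matters: "Test" does not occur in the input, so nothing is printed.
capitalized_hit=$(echo "This is a test" | awk '/Test/{print $0}')

# One way to ignore case in awk: compare against a lowercased copy of the line.
ignorecase_hit=$(echo "This is a TEST" | awk 'tolower($0) ~ /test/{print $0}')

echo "sed: [$sed_hit] awk: [$lowercase_hit] Test: [$capitalized_hit]"
```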

In regular expressions, you can use not only letters, but also spaces and numbers:

$ echo "This is a test 2 again" | awk '/test 2/{print $0}'

Finding a piece of text containing spaces and numbers

Spaces are treated as regular characters by the regular expression engine.

Special symbols

When using various characters in regular expressions, you need to keep some peculiarities in mind. There are special characters, or metacharacters, that require special handling when used in a pattern. Here they are:

.*[]^${}\+?|()
If you need one of them in a pattern as a literal character, escape it with a backslash: \ .

For example, if you need to find a dollar sign in the text, you need to include it in the template, preceded by an escape character. Let's say there is a file myfile with the following text:

There is 10$ on my pocket
The dollar sign can be detected using this pattern:

$ awk '/\$/{print $0}' myfile

Using a special character in a pattern

In addition, the backslash is itself a special character, so if you need to use it in a pattern, it also has to be escaped. This looks like two backslashes in a row:

$ echo "\ is a special character" | awk '/\\/{print $0}'

Escaping a backslash

Although the forward slash is not included in the list of special characters above, attempting to use it in a regular expression written for sed or awk will result in an error:

$ echo "3 / 2" | awk '///{print $0}'

Incorrect use of forward slash in a pattern

If it is needed, it must also be escaped:

$ echo "3 / 2" | awk '/\//{print $0}'

Escaping a forward slash
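The three escapes discussed above can be verified together. In this sketch the awk programs are in single quotes so that the shell passes "$" and "\" through unchanged, and printf is used instead of echo because echo treats backslashes differently across shells.

```shell
# A literal dollar sign: \$ inside the pattern.
dollar=$(printf '%s\n' 'There is 10$ on my pocket' | awk '/\$/{print $0}')

# A literal backslash: two backslashes inside the pattern.
backslash=$(printf '%s\n' '\ is a special character' | awk '/\\/{print $0}')

# A literal forward slash: escaped so that it does not end the /.../ pattern.
slash=$(printf '%s\n' '3 / 2' | awk '/\//{print $0}')

printf '%s\n' "$dollar" "$backslash" "$slash"
```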

Anchor symbols

There are two special characters for anchoring a pattern to the beginning or the end of a text line. The caret character, ^, lets you describe character sequences that occur at the beginning of a line. If the pattern you are looking for is anywhere else in the line, the regular expression will not match it. Using this character looks like this:

$ echo "welcome to likegeeks website" | awk '/^likegeeks/{print $0}'
$ echo "likegeeks website" | awk '/^likegeeks/{print $0}'

Finding a pattern at the beginning of a string

The ^ character is designed to search for a pattern at the beginning of a line, while the case of characters is also taken into account. Let's see how this affects the processing of a text file:

$ awk '/^this/{print $0}' myfile


Finding a pattern at the beginning of a line in text from a file

When using sed, if you place a caret somewhere other than the beginning of the pattern, it is treated like any other regular character:

$ echo "This ^ is a test" | sed -n '/s ^/p'

Caret not at the beginning of the pattern in sed

In awk, the caret must be escaped in the same pattern:

$ echo "This ^ is a test" | awk '/s \^/{print $0}'

Caret not at the beginning of the pattern in awk

We have figured out the search for text fragments located at the beginning of a line. What if you need to find something located at the end of a line?

The dollar sign - $, which is the anchor character for the end of the line, will help us with this:

$ echo "This is a test" | awk '/test$/{print $0}'

Finding text at the end of a line

You can use both anchor characters in the same pattern. Let's process the file myfile using the following regular expression:

$ awk '/^this is a test$/{print $0}' myfile


A pattern that uses special characters to start and end a line

As you can see, the template responded only to a line that fully corresponded to the given sequence of characters and their location.

Here's how to filter out empty lines using anchor characters:

$ awk '!/^$/{print $0}' myfile
This pattern uses the negation character, an exclamation mark: ! . The pattern /^$/ matches lines that contain nothing between the beginning and the end of the line, and thanks to the exclamation mark only lines that do not match it are printed.
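A minimal sketch of both anchors and the negation, using a made-up file with two empty lines (the contents of the original myfile are not shown, so this sample is an assumption).

```shell
# Hypothetical input file, created only for this illustration.
myfile=$(mktemp)
printf '%s\n' 'this is a test' '' 'this is a test again' '' 'not this one' > "$myfile"

# Both anchors: the whole line must be exactly "this is a test".
exact=$(awk '/^this is a test$/{print $0}' "$myfile")

# Negated empty-line pattern: keeps only the three non-empty lines.
non_empty_count=$(awk '!/^$/{print $0}' "$myfile" | grep -c '')

echo "exact: [$exact], non-empty lines: $non_empty_count"
rm -f "$myfile"
```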

Dot symbol

The period matches any single character except the newline character. Let's apply the following regular expression to the file myfile:

$ awk '/.st/{print $0}' myfile


Using a dot in regular expressions

As the output shows, only the first two lines of the file match the pattern, since they contain the character sequence “st” preceded by another character. The third line contains no suitable sequence, and the fourth does contain “st”, but at the very beginning of the line.

Character classes

A dot matches any single character, but what if you want to be more flexible in limiting the set of characters you're looking for? In this situation, you can use character classes.

Thanks to this approach, you can organize a search for any character from a given set. To describe a character class, square brackets are used:

$ awk '/[oi]th/{print $0}' myfile


Description of a character class in a regular expression

Here we are looking for a sequence of "th" characters preceded by an "o" character or an "i" character.

Classes come in handy when searching for words that can begin with either an uppercase or lowercase letter:

$ echo "this is a test" | awk '/[Tt]his is a test/{print $0}'
$ echo "This is a test" | awk '/[Tt]his is a test/{print $0}'

Search for words that may begin with a lowercase or uppercase letter

Character classes are not limited to letters. Other symbols can be used here. It is impossible to say in advance in what situation classes will be needed - it all depends on the problem being solved.

Negation of character classes

Character classes can also be used to solve the inverse problem described above. Namely, instead of searching for symbols included in a class, you can organize a search for everything that is not included in the class. In order to achieve this regular expression behavior, you need to place a ^ sign in front of the list of class characters. It looks like this:

$ awk '/[^oi]th/{print $0}' myfile


Finding characters not in a class

In this case, sequences of “th” characters will be found that are preceded by neither “o” nor “i”.
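A character class and its negation can be compared side by side. The four input lines are invented for this sketch and are not the contents of the original myfile.

```shell
# Hypothetical input file, created only for this illustration.
myfile=$(mktemp)
printf '%s\n' 'nothing here' 'with a path' 'the best test' 'first things' > "$myfile"

# "th" preceded by "o" or "i": "nothing" and "with".
in_class=$(awk '/[oi]th/{print $0}' "$myfile")

# "th" preceded by any character other than "o" or "i": "path" and " things".
# A "th" at the very start of a line has no preceding character, so it does not count.
not_in_class=$(awk '/[^oi]th/{print $0}' "$myfile")

printf '%s\n' "$in_class"
rm -f "$myfile"
```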

Character ranges

In character classes, you can describe ranges of characters using dashes:

$ awk '/[e-p]st/{print $0}' myfile


Description of a range of characters in a character class

In this example, the regular expression matches the character sequence "st" preceded by any character that falls, in alphabetical order, between "e" and "p".

Ranges can also be created from numbers:

$ echo "123" | awk '/[0-9][0-9][0-9]/'
$ echo "12a" | awk '/[0-9][0-9][0-9]/'

Regular expression to find any three numbers

A character class can include several ranges:

$ awk '/[a-fm-z]st/{print $0}' myfile


A character class consisting of several ranges

This regular expression will find all sequences of “st” preceded by characters from ranges a-f and m-z.
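The range examples can be tried on a few invented lines. LC_ALL=C is set here as a precaution so that the letter ranges follow plain ASCII order regardless of the system locale.

```shell
# Hypothetical input lines, used only for this illustration.
lines=$(printf '%s\n' 'the best test' 'first things' 'a stone')

# "st" preceded by a letter from e to p: only "est" in "best"/"test" qualifies.
single_range=$(printf '%s\n' "$lines" | LC_ALL=C awk '/[e-p]st/{print $0}')

# Two ranges in one class: "est" (a-f) and "rst" (m-z) both qualify.
two_ranges=$(printf '%s\n' "$lines" | LC_ALL=C awk '/[a-fm-z]st/{print $0}')

# Three consecutive digits: "12a" has only two.
three_digits=$(printf '%s\n' 123 12a | awk '/[0-9][0-9][0-9]/')

printf '%s\n' "$two_ranges"
```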

Special character classes

BRE has special character classes that you can use when writing regular expressions:
  • [[:alpha:]] - matches any alphabetic character, written in upper or lower case.
  • [[:alnum:]] - matches any alphanumeric character, namely characters in the ranges 0-9 , A-Z , a-z .
  • [[:blank:]] - matches a space and a tab character.
  • [[:digit:]] - any digit character from 0 to 9.
  • [[:upper:]] - uppercase alphabetic characters - A-Z .
  • [[:lower:]] - lowercase alphabetic characters - a-z .
  • [[:print:]] - matches any printable character.
  • [[:punct:]] - matches punctuation marks.
  • [[:space:]] - whitespace characters, in particular - space, tab, characters NL, FF, VT, CR.
You can use special classes in templates like this:

$ echo "abc" | awk '/[[:alpha:]]/{print $0}'
$ echo "abc" | awk '/[[:digit:]]/{print $0}'
$ echo "abc123" | awk '/[[:digit:]]/{print $0}'


Special character classes in regular expressions
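The named classes work the same way in grep, which is used in this sketch because support for them varies across awk implementations. The three input lines are made up.

```shell
# Lines containing at least one letter: "abc" and "abc123".
with_alpha=$(printf '%s\n' abc 123 abc123 | grep -c '[[:alpha:]]')

# Lines containing at least one digit: "123" and "abc123".
with_digit=$(printf '%s\n' abc 123 abc123 | grep -c '[[:digit:]]')

# Anchors plus a class: lines that consist of digits only.
only_digits=$(printf '%s\n' abc 123 abc123 | grep -E '^[[:digit:]]+$')

echo "alpha: $with_alpha, digit: $with_digit, digits only: $only_digits"
```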

Star symbol

If you place an asterisk after a character in a pattern, this will mean that the regular expression will work if the character appears in the string any number of times - including the situation when the character is absent in the string.

$ echo "test" | awk '/tes*t/{print $0}'
$ echo "tessst" | awk '/tes*t/{print $0}'


Using the * character in regular expressions

This metacharacter is typically used for words that are frequently misspelled, or for words that have several correct spellings:

$ echo "I like green color" | awk '/colou*r/{print $0}'
$ echo "I like green colour" | awk '/colou*r/{print $0}'

Finding a word with different spellings

In this example, the same regular expression responds to both the word "color" and the word "colour". This is so due to the fact that the character “u”, followed by an asterisk, can either be absent or appear several times in a row.

Another useful feature that comes from the asterisk symbol is to combine it with a dot. This combination allows the regular expression to respond to any number of any characters:

$ awk '/this.*test/{print $0}' myfile


A template that responds to any number of any characters

In this case, it doesn’t matter how many and what characters are between the words “this” and “test”.

The asterisk can also be used with character classes:

$ echo "st" | awk '/s[ae]*t/{print $0}'
$ echo "sat" | awk '/s[ae]*t/{print $0}'
$ echo "set" | awk '/s[ae]*t/{print $0}'


Using an asterisk with character classes

In all three examples, the regular expression works because the asterisk after the character class means that if any number of "a" or "e" characters are found, or if none are found, the string will match the given pattern.
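The three uses of the asterisk can be checked on a handful of invented inputs, including some that should not match.

```shell
# Zero or more "u": both spellings match, "colr" does not.
spellings=$(printf '%s\n' 'green color' 'green colour' 'green colr' | awk '/colou*r/{print $0}')

# Dot plus asterisk: any characters between "this" and "test", in that order.
dot_star=$(printf '%s\n' 'this is a test' 'test of this' | awk '/this.*test/{print $0}')

# Asterisk after a class: zero or more "a" or "e" between "s" and "t".
class_star=$(printf '%s\n' st sat set sit | awk '/s[ae]*t/{print $0}')

printf '%s\n' "$class_star"
```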

POSIX ERE regular expressions

The POSIX ERE patterns that some Linux utilities support may contain additional characters. As already mentioned, awk supports this standard, while sed does not by default (GNU sed accepts ERE patterns when run with the -E option).

Here we will look at the most commonly used symbols in ERE patterns, which will be useful to you when creating your own regular expressions.

▍Question mark

A question mark indicates that the preceding character may appear once or not at all in the text. This character is one of the repetition metacharacters. Here are some examples:

$ echo "tet" | awk '/tes?t/{print $0}'
$ echo "test" | awk '/tes?t/{print $0}'
$ echo "tesst" | awk '/tes?t/{print $0}'


Question mark in regular expressions

As you can see, in the third case the letter “s” appears twice, so the regular expression does not match the word “tesst”.

The question mark can also be used with character classes:

$ echo "tst" | awk '/t[ae]?st/{print $0}'
$ echo "test" | awk '/t[ae]?st/{print $0}'
$ echo "tast" | awk '/t[ae]?st/{print $0}'
$ echo "taest" | awk '/t[ae]?st/{print $0}'
$ echo "teest" | awk '/t[ae]?st/{print $0}'


Question mark and character classes

If there are no characters from the class in the line, or one of them occurs once, the regular expression works, but as soon as two characters appear in the word, the system no longer finds a match for the pattern in the text.
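Feeding all the words through one awk call makes the "zero or one" rule easy to see. This is a sketch using the same words as above.

```shell
# "s" is optional: "tet" and "test" match, "tesst" has one "s" too many.
optional_char=$(printf '%s\n' tet test tesst | awk '/tes?t/{print $0}')

# At most one character from the class between "t" and "st".
optional_class=$(printf '%s\n' tst test tast taest teest | awk '/t[ae]?st/{print $0}')

printf '%s\n' "$optional_class"
```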

▍Plus symbol

The plus character in the pattern indicates that the regular expression will match what it is looking for if the preceding character occurs one or more times in the text. However, this construction will not react to the absence of a symbol:

$ echo "test" | awk '/te+st/{print $0}'
$ echo "teest" | awk '/te+st/{print $0}'
$ echo "tst" | awk '/te+st/{print $0}'


The plus symbol in regular expressions

In this example, if there is no “e” character in the word, the regular expression engine will not find matches to the pattern in the text. The plus symbol also works with character classes - in this way it is similar to the asterisk and question mark:

$ echo "tst" | awk '/t[ae]+st/{print $0}'
$ echo "test" | awk '/t[ae]+st/{print $0}'
$ echo "teast" | awk '/t[ae]+st/{print $0}'
$ echo "teeast" | awk '/t[ae]+st/{print $0}'


Plus sign and character classes

In this case, if the line contains any character from the class, the text will be considered to match the pattern.
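The "one or more" rule can be checked the same way, again as a sketch on the words used above.

```shell
# At least one "e" is required: "tst" is rejected.
plus_char=$(printf '%s\n' test teest tst | awk '/te+st/{print $0}')

# At least one character from the class, in any mix: only "tst" is rejected.
plus_class=$(printf '%s\n' tst test teast teeast | awk '/t[ae]+st/{print $0}')

printf '%s\n' "$plus_class"
```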

▍Curly braces

Curly braces, which can be used in ERE patterns, are similar to the symbols discussed above, but they allow you to more precisely specify the required number of occurrences of the symbol preceding them. You can specify a restriction in two formats:
  • n - a number specifying the exact number of searched occurrences
  • n, m are two numbers that are interpreted as follows: “at least n times, but no more than m.”
Here are examples of the first option:

$ echo "tst" | awk '/te{1}st/{print $0}'
$ echo "test" | awk '/te{1}st/{print $0}'

Curly braces in patterns, searching for the exact number of occurrences

In older versions of awk you had to use the --re-interval command line option to make the program recognize intervals in regular expressions, but in newer versions this is not necessary.

$ echo "tst" | awk '/te{1,2}st/{print $0}'
$ echo "test" | awk '/te{1,2}st/{print $0}'
$ echo "teest" | awk '/te{1,2}st/{print $0}'
$ echo "teeest" | awk '/te{1,2}st/{print $0}'

Interval specified in curly braces

In this example, the character “e” must appear 1 or 2 times in the line, then the regular expression will respond to the text.

Curly braces can also be used with character classes. The principles you already know apply here:

$ echo "tst" | awk '/t[ae]{1,2}st/{print $0}'
$ echo "test" | awk '/t[ae]{1,2}st/{print $0}'
$ echo "teest" | awk '/t[ae]{1,2}st/{print $0}'
$ echo "teeast" | awk '/t[ae]{1,2}st/{print $0}'


Curly braces and character classes

The template will react to the text if it contains the character “a” or the character “e” once or twice.
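Interval expressions use the same syntax in grep -E, which is used in this sketch because interval support differs between awk implementations and versions. The input words are the ones from the examples above.

```shell
# Exactly one "e" between "t" and "st".
exact_count=$(printf '%s\n' tst test teest | grep -E 'te{1}st')

# One or two "e": three in a row is too many.
one_or_two=$(printf '%s\n' tst test teest teeest | grep -E 'te{1,2}st')

# One or two characters from the class: "teeast" has three.
class_interval=$(printf '%s\n' tst test teest teeast | grep -E 't[ae]{1,2}st')

printf '%s\n' "$one_or_two"
```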

▍Logical “or” symbol

The vertical bar character, |, means a logical “or” in regular expressions. When processing a regular expression containing several fragments separated by this character, the engine considers the analyzed text a match if it matches any of the fragments. Here's an example:

$ echo "This is a test" | awk '/test|exam/{print $0}'
$ echo "This is an exam" | awk '/test|exam/{print $0}'
$ echo "This is something else" | awk '/test|exam/{print $0}'


Logical "or" in regular expressions

In this example, the regular expression is configured to search the text for the words “test” or “exam”. Please note that between the template fragments and the symbol separating them | there should be no spaces.

Regular expression fragments can be grouped using parentheses. If you group a certain sequence of characters, it will be perceived by the system as an ordinary character. That is, for example, repetition metacharacters can be applied to it. This is what it looks like:

$ echo "Like" | awk '/Like(Geeks)?/{print $0}'
$ echo "LikeGeeks" | awk '/Like(Geeks)?/{print $0}'


Grouping regular expression fragments

In these examples, the word “Geeks” is enclosed in parentheses, followed by a question mark. Recall that a question mark means “0 or 1 repetition,” so the regular expression will respond to both the string “Like” and the string “LikeGeeks.”
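Alternation and grouping can be tried together. In this sketch the grouped pattern is additionally anchored with ^ and $, an addition to the example above, so that the optional group has a visible effect.

```shell
# Either fragment is enough for a match; the third line contains neither.
either=$(printf '%s\n' 'This is a test' 'This is an exam' 'This is something else' \
  | awk '/test|exam/{print $0}')

# The group (Geeks) is optional as a whole; "LikeGe" matches neither form.
grouped=$(printf '%s\n' Like LikeGeeks LikeGe | awk '/^Like(Geeks)?$/{print $0}')

printf '%s\n' "$grouped"
```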

Practical examples

Now that we've covered the basics of regular expressions, it's time to do something useful with them.

▍Counting the number of files

Let's write a bash script that counts the files located in the directories listed in the PATH environment variable. To do this, you first need to generate a list of directory paths. Let's do this using sed, replacing the colons with spaces:

$ echo $PATH | sed "s/:/ /g"
The substitute command accepts regular expressions as search patterns. In this case everything is extremely simple: we are looking for the colon character, but nothing prevents us from using something more elaborate here; it all depends on the specific task.
Now you need to go through the resulting list in a loop and perform the actions necessary to count the number of files. The general outline of the script will be like this:

mypath=$(echo $PATH | sed "s/:/ /g")
for directory in $mypath
do
done
Now let’s write the full text of the script, using the ls command to obtain information about the number of files in each directory:

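#!/bin/bash
# Turn the colon-separated PATH into a space-separated list of directories.
mypath=$(echo $PATH | sed "s/:/ /g")
count=0
for directory in $mypath
do
    # List the directory and count its entries one by one.
    check=$(ls $directory)
    for item in $check
    do
        count=$((count + 1))
    done
    echo "$directory - $count"
    count=0
done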
When running the script, it may turn out that some directories from PATH do not exist, however, this will not prevent it from counting files in existing directories.
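The same idea can be exercised without depending on the real PATH. The sketch below is a variation, not the script above: it builds a made-up colon-separated list (fake_path) from temporary directories, skips entries that do not exist, and counts entries with grep -c instead of an inner loop.

```shell
# Hypothetical directories standing in for PATH entries.
tmp=$(mktemp -d)
mkdir "$tmp/bin1" "$tmp/bin2"
touch "$tmp/bin1/a" "$tmp/bin1/b" "$tmp/bin2/c"
fake_path="$tmp/bin1:$tmp/bin2:$tmp/missing"

report=""
for directory in $(echo "$fake_path" | sed 's/:/ /g'); do
  [ -d "$directory" ] || continue            # skip entries that do not exist
  count=$(ls -A "$directory" | grep -c '')   # ls prints one entry per line
  report="$report${directory##*/}=$count "   # keep only the last path component
done

echo "$report"
rm -rf "$tmp"
```

Like the original script, this splits on spaces, so it assumes that the directory names contain no whitespace.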


File counting

The main value of this example is that using the same approach, you can solve much more complex problems. Which ones exactly depends on your needs.

▍Verifying email addresses

There are websites with huge collections of regular expressions for checking email addresses, phone numbers, and so on. However, it's one thing to take something ready-made, and quite another to create something yourself. So let's write a regular expression to check email addresses. Let's start by analyzing the source data. Here, for example, is a certain address:

[email protected]
The username part, username, can consist of alphanumeric and some other characters, namely the dot, dash, underscore, and plus sign. The username is followed by an @ sign.

Armed with this knowledge, let's start assembling the regular expression from its left side, which is used to check the username. Here's what we got:

^([a-zA-Z0-9_.+-]+)@
This regular expression can be read as follows: “The line must begin with at least one character from those in the group specified in square brackets, followed by an @ sign.”

Now it is the turn of the host name, hostname. The same rules apply here as for the username (without the plus sign), so the pattern for it looks like this:

([a-zA-Z0-9_.-]+)
The top level domain name is subject to special rules. There can only be alphabetic characters, of which there must be at least two (for example, such domains usually contain a country code), and no more than five. All this means that the template for checking the last part of the address will be like this:

\.([a-zA-Z]{2,5})$
You can read it like this: “First there must be a period, then 2 to 5 alphabetic characters, and after that the line ends.”

Having prepared templates for individual parts of the regular expression, let's put them together:

^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9_.-]+)\.([a-zA-Z]{2,5})$
Now all that remains is to test what happened:

$ echo "[email protected]" | awk '/^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9_.-]+)\.([a-zA-Z]{2,5})$/{print $0}'
$ echo "[email protected]" | awk '/^([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9_.-]+)\.([a-zA-Z]{2,5})$/{print $0}'


Validating an email address using regular expressions

The fact that the text passed to awk is displayed on the screen means that the system recognized it as an email address.
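A pattern built from the rules described in the text (letters, digits, dot, dash, underscore and plus in the username; a top-level domain of 2 to 5 letters) can be wrapped in a small helper. The function name is_email and the sample addresses are invented for this sketch, and grep -E is used in place of awk because interval support varies between awk implementations. A pattern this simple accepts only a subset of valid addresses.

```shell
# Assumed pattern, assembled from the rules described in the text.
email_re='^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,5}$'

# is_email (hypothetical helper): succeeds if its argument matches the pattern.
is_email() {
  printf '%s\n' "$1" | grep -Eq "$email_re"
}

for candidate in 'user.name+tag@example.com' 'user@host' 'not an address'; do
  if is_email "$candidate"; then
    echo "$candidate: looks like an email address"
  else
    echo "$candidate: rejected"
  fi
done
```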

Results

If the regular expression for checking email addresses that you came across at the very beginning of the article seemed completely incomprehensible then, we hope that it no longer looks like a meaningless set of characters. If so, this material has fulfilled its purpose. Regular expressions are a topic you can study for a lifetime, but even the little we have covered can already help you write scripts that process text in fairly advanced ways.

In this series of materials we have usually shown very simple examples of bash scripts that consisted of just a few lines. Next time we'll look at something bigger.

Dear readers! Do you use regular expressions when processing text in command line scripts?

One of the most useful and feature-rich commands in the Linux terminal is grep. The name grep stands for “global regular expression print” (that is, “search everywhere for lines matching a regular expression and print them”). This means that grep can be used to check whether input matches specified patterns.

This seemingly trivial program is very powerful when used correctly. Its ability to filter input based on complex rules makes it a popular link in many command chains.

This tutorial looks at some of the grep command's capabilities and then moves on to using regular expressions. All the techniques described in this tutorial can be applied when managing a virtual server.

Basics of use

In its simplest form, grep is used to find matches of literal patterns in a text file. This means that if grep is given a search word, it will print every line in the file that contains that word.

As an example, you can use grep to find lines containing the word "GNU" in version 3 of the GNU General Public License on an Ubuntu system.

cd /usr/share/common-licenses
grep "GNU" GPL-3
GNU GENERAL PUBLIC LICENSE





13. Use with the GNU Affero General Public License.
under version 3 of the GNU Affero General Public License into a single
...
...

The first argument, "GNU", is the pattern to search for, and the second argument, "GPL-3", is the input file to be searched.

As a result, all lines containing the text pattern will be output. In some Linux distributions the pattern you are looking for will be highlighted in the output lines.

General options

By default, the grep command simply searches for strictly specified patterns in the input file and prints the lines it finds. However, grep's behavior can be changed by adding some additional flags.

If you need to ignore the case of the search pattern and match both uppercase and lowercase variations, you can use the "-i" or "--ignore-case" option.

As an example, you can use grep to search the same file for the word "license" written in uppercase, lowercase, or mixed case.

grep -i "license" GPL-3
GNU GENERAL PUBLIC LICENSE
of this license document, but changing it is not allowed.
The GNU General Public License is a free, copyleft license for
The licenses for most software and other practical works are designed
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to


"This License" refers to version 3 of the GNU General Public License.
"The Program" refers to any copyrightable work licensed under this
...
...

As you can see, the output contains "LICENSE", "license", and "License". If there was an instance of "LiCeNsE" in the file, it would also be output.
If you need to find all lines that do not contain the specified pattern, you can use the "-v" or "--invert-match" flags.

As an example, you could use the following command to search the BSD license for all lines that do not contain the word "the":

grep -v "the" BSD
All rights reserved.
Redistribution and use in source and binary forms, with or without
are met:
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...

As you can see, the last two lines were printed even though they contain "THE": the pattern is lowercase, and the "ignore case" option was not used.

It is always useful to know the line numbers where the matches were found. They can be found using the "-n" or "--line-number" flags.

If you apply this flag in the previous example, the following result will be displayed:

grep -vn "the" BSD
2:All rights reserved.
3:
4:Redistribution and use in source and binary forms, with or without
6:are met:
13: may be used to endorse or promote products derived from this software
14: without specific prior written permission.
15:
16:THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
17:ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...

You can now refer to the line number when you need to make changes on each line that does not contain "the".
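
The three options can be tried on a small throwaway file. This is a minimal sketch: the sample lines are made up for the demonstration and do not come from the license files used above.

```shell
# A hypothetical three-line sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'The first line' 'the second line' 'a third line' > "$sample"

case_sensitive=$(grep -n 'the' "$sample")   # only the lowercase "the"
ignore_case=$(grep -in 'the' "$sample")     # "The" and "the"
inverted=$(grep -vn 'the' "$sample")        # lines without lowercase "the"

printf '%s\n' "$case_sensitive" '--' "$ignore_case" '--' "$inverted"
rm -f "$sample"
```

Note that -v without -i still prints the line starting with "The", which is the same effect seen with the BSD license above.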

Regular Expressions

As mentioned in the introduction, grep stands for “global regular expression print”. A regular expression is a text string that describes a specific search pattern.

Different applications and programming languages use regular expressions slightly differently. This tutorial covers only a small subset of the ways to describe patterns for grep.

Literal matches

In the above examples of searching for the words "GNU" and "the", the patterns were very simple regular expressions that exactly matched the character strings "GNU" and "the".

It is more correct to think of them as matches of strings of characters rather than as matches of words. Once you become familiar with more complex patterns, this distinction will become more significant.

Patterns that exactly match given characters are called "literal" patterns because they match the pattern literally, character by character.

All alphabetic and numeric characters (and some other characters) match literally unless they have been modified by other expression mechanisms.

Anchor matches

Anchors are special characters that indicate the location in a string of the desired match.

For example, you can specify that the search only needs lines that contain the word “GNU” at the very beginning. To do this, you need to use the anchor “^” before the letter string.

This example only prints lines that contain the word "GNU" at the beginning.

grep "^GNU" GPL-3
GNU General Public License for most of our software; it applies also to
GNU General Public License, you may choose any version ever published

Likewise, the anchor "$" can be used after a literal string to indicate that the match is only valid if the character string being searched is at the end of the text string.

The following regular expression prints only those lines that contain "and" at the end:

grep "and$" GPL-3
that there is no warranty for this free software. For both users' and
The precise terms and conditions for copying, distribution and


alternative is allowed only occasionally and noncommercially, and
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
provisionally, unless and until the copyright holder explicitly and
receives a license from the original licensors, to run, modify and
make, use, sell, offer for sale, import and otherwise run, modify and
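
Both anchors can be checked on a few lines fed through a pipe. A small sketch, with sample lines invented for the purpose:

```shell
# Hypothetical sample lines, used only for illustration.
lines=$(printf '%s\n' 'GNU tools' 'uses GNU' 'modify and' 'and modify')

# "^GNU" keeps only lines that begin with GNU.
starts=$(printf '%s\n' "$lines" | grep '^GNU')

# "and$" keeps only lines that end with "and".
ends=$(printf '%s\n' "$lines" | grep 'and$')

printf 'start: %s\nend: %s\n' "$starts" "$ends"
```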

Match any character

The period (.) is used in regular expressions to indicate that any character can appear at the specified location.

For example, if you want to find matches that contain two characters and then the sequence "cept", you would use the following pattern:

grep "..cept" GPL-3
use, which is precisely where it is most unacceptable. Therefore, we
infringement under applicable copyright law, except executing it on a
tells the user that there is no warranty for the work (except to the

form of a separately written license, or stated as exceptions;
You may not propagate or modify a covered work except as expressly
9. Acceptance Not Required for Having Copies.
...
...

As you can see, the results include the words “accept” and “except”, as well as variations of these words. The pattern would also match the sequence “z2cept” if it were in the text.

Bracket expressions

By placing a group of characters within square brackets ("[" and "]"), you can indicate that any of the characters in the brackets can appear at that position.

This means that if you need to find lines containing "too" or "two", you can specify these variations concisely using the following pattern:

grep "t[wo]o" GPL-3
your programs, too.

Developers that use the GNU GPL protect your rights with two steps:
a computer network, with no transfer of a copy, is not conveying.

Corresponding Source from a network server at no charge.
...
...

As you can see, both variations were found in the file.

Bracket notation also provides several other useful features. You can indicate that the pattern matches anything except the characters in the brackets by starting the list with the character “^”.

This example is like the ".ode" pattern, except that it must not match the sequence "code".

grep "[^c]ode" GPL-3
1. Source Code.
model, to give anyone who possesses the object code either (1) a
the only significant mode of use of the product.
notice like this when it starts in an interactive mode:

It's worth noting that the second line output contains the word "code". This is not a regex or grep error.

Rather, this line was printed because it also contains the pattern-matching sequence "mode" found in the word "model". That is, the string was printed because it matched the pattern.

Another useful feature of brackets is the ability to specify a range of characters instead of entering each character separately.

This means that if you need to find every line that starts with a capital letter, you can use the following pattern:

grep "^[A-Z]" GPL-3
GNU General Public License for most of our software; it applies also to

License. Each licensee is addressed as "you". "Licenses" and


System Libraries, or general-purpose tools or generally available free
Source.

...
...

Because of some inherent collation problems, it is often more accurate to use POSIX character classes instead of the character range used in the example above.
There are many character classes not covered in this manual; for example, to perform the same search as in the example above, you can use the character class "[:upper:]" inside a bracket expression.

grep "^[[:upper:]]" GPL-3
GNU General Public License for most of our software; it applies also to
States should not allow patents to restrict development and use of
License. Each licensee is addressed as "you". "Licenses" and
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
System Libraries, or general-purpose tools or generally available free
Source.
User Product is transferred to the recipient in perpetuity or for a
...
...
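
The bracket forms discussed in this part (a character list, a negated list, and a POSIX class) can be compared side by side. A minimal sketch on a made-up word list:

```shell
# Hypothetical word list, used only for illustration.
words=$(printf '%s\n' 'too' 'two' 'top' 'code' 'mode' 'Node')

list=$(printf '%s\n' "$words" | grep 't[wo]o')          # "w" or "o" in the middle
negated=$(printf '%s\n' "$words" | grep '[^c]ode')      # "ode" not preceded by "c"
upper=$(printf '%s\n' "$words" | grep '^[[:upper:]]')   # starts with a capital letter

printf '%s\n' "$list" '--' "$negated" '--' "$upper"
```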

Repeat pattern (0 or more times)

One of the most commonly used metacharacters is the "*" symbol, which means "repeat the previous character or expression 0 or more times."

For example, if you want to find every line containing an opening and a closing parenthesis with only letters and single spaces between them, you can use the following expression:

grep "([A-Za-z ]*)" GPL-3

distribution (with or without modification), making available to the
than the work as a whole, that (a) is included in the normal form of
Component, and (b) serves only to enable use of the work with that
(if any) on which the executable work runs, or a compiler used to
(including a physical distribution medium), accompanied by the
(including a physical distribution medium), accompanied by a
place (gratis or for a charge), and offer equivalent access to the
...
...

Escaping metacharacters

Sometimes you may need to look for a literal period or a literal open parenthesis. Because these characters have a specific meaning in regular expressions, you need to "escape" them by telling grep that their special meaning is not needed in this case.

These characters can be escaped by placing a backslash (\) before the character that would normally have a special meaning.

For example, if you need to find a line that starts with a capital letter and ends with a period, you can use the expression below. The backslash before the last dot tells the command to "escape" it, so that the last dot represents a literal dot and has no "any character" meaning:

grep "^[A-Z].*\.$" GPL-3
Source.
License by making exceptions from one or more of its conditions.
License would be to refrain entirely from conveying the Program.
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
SUCH DAMAGES.
Also add information on how to contact you by electronic and paper mail.
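
The repetition and escaping patterns can be tried on invented input. This sketch assumes the C locale (set through LC_ALL) so that the range [A-Z] covers only uppercase ASCII letters; the sample lines are hypothetical.

```shell
# Hypothetical sample lines, used only for illustration.
text=$(printf '%s\n' 'see (appendix a) here' 'see (section 2) here' \
                     'Ends with a period.' 'No period here')

# In basic regular expressions, unescaped parentheses are literal characters.
parens=$(printf '%s\n' "$text" | LC_ALL=C grep '([A-Za-z ]*)')

# "\." is a literal period; ".*" is "any characters".
sentences=$(printf '%s\n' "$text" | LC_ALL=C grep '^[A-Z].*\.$')

printf '%s\n' "$parens" '--' "$sentences"
```

The second line is not selected by the first pattern because the digit inside the parentheses is neither a letter nor a space.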

Extended Regular Expressions

The grep command can also be used with the extended regular expression language by using the -E flag or by calling the egrep command instead of grep.

These commands open up the capabilities of "extended regular expressions". Extended regular expressions include all the basic metacharacters, as well as additional metacharacters to express more complex matches.

Grouping

One of the simplest and most useful features that extended regular expressions provide is the ability to group expressions and use them as a single unit.

Parentheses are used to group expressions. In basic (non-extended) regular expressions, grouping parentheses must be escaped with a backslash:

grep "\(grouping\)" file.txt
grep -E "(grouping)" file.txt
egrep "(grouping)" file.txt

The above expressions are equivalent.
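
A short sketch of that equivalence, using an invented input: the same group is written with escaped parentheses for basic syntax and with plain parentheses for -E, and the last command shows that unescaped parentheses in basic syntax are just literal characters.

```shell
# Hypothetical input lines, used only for illustration.
input=$(printf '%s\n' 'abab' 'ab' 'aba' 'a(b)')

bre=$(printf '%s\n' "$input" | grep '^\(ab\)*$')      # basic syntax: \( \)
ere=$(printf '%s\n' "$input" | grep -E '^(ab)*$')     # extended syntax: ( )
literal=$(printf '%s\n' "$input" | grep '(b)')        # literal parentheses in basic syntax

printf '%s\n' "$bre" '--' "$ere" '--' "$literal"
```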

Alternation

Just as square brackets specify different possible matches for a single character, alternation allows you to specify alternative matches for strings of characters or sets of expressions.

The vertical bar symbol “|” is used to indicate alternation. Alternation is often used in grouping to indicate that one of two or more possible options should be considered a match.

In this example, you need to look for “GPL” or “General Public License”:

grep -E "(GPL|General Public License)" GPL-3
The GNU General Public License is a free, copyleft license for
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
price. Our General Public Licenses are designed to make sure that you
Developers that use the GNU GPL protect your rights with two steps:
For the developers' and authors' protection, the GPL clearly explains
authors' sake, the GPL requires that modified versions be marked as
have designed this version of the GPL to prohibit the practice for those
...
...

Alternation can choose between more than two options: add the remaining options to the selection group, separating each one with the vertical bar symbol “|”.
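
A minimal sketch with three alternatives in one group; the input lines are invented for the demonstration.

```shell
# Hypothetical input lines, used only for illustration.
lines=$(printf '%s\n' 'GNU GPL' 'General Public License' 'BSD license' 'MIT')

# Any one of the three alternatives is enough for a line to be selected.
matched=$(printf '%s\n' "$lines" | grep -E '(GPL|General Public License|BSD)')

printf '%s\n' "$matched"
```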

Quantifiers

In extended regular expressions, there are metacharacters that specify how many times a character is repeated, much like the metacharacter "*" indicates that the previous character or group matches 0 or more times.

To match a character 0 or 1 times, you can use the "?" character. It makes the previous character or group essentially optional.

In this example, placing the sequence “copy” in an optional group matches both “copyright” and “right”:

grep -E "(copy)?right" GPL-3
Copyright (C) 2007 Free Software Foundation, Inc.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
"Copyright" also means copyright-like laws that apply to other kinds of
...
...

The "+" character matches expressions 1 or more times. It works almost like the "*" symbol, but when using "+" the expression must match at least 1 time.

The following expression matches the string "free" plus 1 or more characters that are not whitespace:

grep -E "free[^[:space:]]+" GPL-3
The GNU General Public License is a free, copyleft license for
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
When we speak of free software, we are referring to freedom, not
have the freedom to distribute copies of free software (and charge for

freedoms that you received. You must make sure that they, too, receive
protecting users' freedom to change the software. The systematic
of the GPL, as needed to protect the freedom of users.
patents cannot be used to render the program non-free.
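
The difference between "?" and "+" can be seen on a short word list. A sketch with made-up input:

```shell
# Hypothetical word list, used only for illustration.
words=$(printf '%s\n' 'copyright' 'right' 'free' 'freedom' 'free software')

# "(copy)?" is optional, so both "copyright" and "right" are counted.
optional_count=$(printf '%s\n' "$words" | grep -Ec '(copy)?right')

# "+" needs at least one non-space character right after "free".
plus=$(printf '%s\n' "$words" | grep -E 'free[^[:space:]]+')

printf 'optional: %s\nplus: %s\n' "$optional_count" "$plus"
```

Neither "free" alone nor "free software" is selected by the second pattern, because in both cases nothing other than a space (or the end of the line) follows "free".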

Specifying the number of repetitions

If you need to specify the number of times a match should be repeated, you can use curly braces (“{” and “}”). They are used to indicate an exact number, a range, or upper and lower limits for the number of times an expression can match.

If you need to find all lines that contain three vowels in a row, you can use the following expression:

grep -E "[AEIOUaeiou]{3}" GPL-3
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
receive it, in any medium, provided that you conspicuously and
give under the previous paragraph, plus a right to possession of the
covered work so as to satisfy simultaneously your obligations under this

If you need to find all words consisting of 16-20 characters, use the following expression:

grep -E "[[:alpha:]]{16,20}" GPL-3
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
c) Prohibiting misrepresentation of the origin of that material, or
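
Both interval forms can be checked on individual words. A small sketch; the word list is invented for the demonstration.

```shell
# Hypothetical word list, used only for illustration.
words=$(printf '%s\n' 'queue' 'tree' 'responsibilities' 'misrepresentation' 'cat')

# Exactly three vowels in a row somewhere in the line.
vowels=$(printf '%s\n' "$words" | grep -E '[AEIOUaeiou]{3}')

# A run of 16 to 20 letters.
long_words=$(printf '%s\n' "$words" | grep -E '[[:alpha:]]{16,20}')

printf '%s\n' "$vowels" '--' "$long_words"
```

Because the pattern is not anchored, a word longer than 20 letters would also be selected: its first 20 letters already satisfy the interval.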

Conclusions

In many cases, the grep command is useful for finding patterns within files or across a file system hierarchy. It saves a lot of time, so it's worth familiarizing yourself with its parameters and syntax.

Regular expressions are even more versatile and can be used in many popular programs. For example, many text editors use regular expressions to search and replace text.

Moreover, many programming languages use regular expressions to run procedures on specific pieces of data. Knowing how to work with regular expressions comes in handy when solving common computer-related problems.


Good afternoon, guests!

In today's article I want to touch on a huge topic: regular expressions. I think everyone knows that the topic of regexes (as regular expressions are called in slang) is too vast for a single post. Therefore, I will try to collect my thoughts and convey them to you briefly but as clearly as possible.

Let me start by saying that there are several types of regular expressions:

1. Traditional regular expressions (also known as basic regular expressions, BRE)

  • The syntax of these expressions is defined as obsolete, but it is still widespread and used by many UNIX utilities
  • Basic regular expressions include the following metacharacters (more on their meanings below):
    • \{ \} - initial version of { } (in extended)
    • \( \) - initial version of ( ) (in extended)
    • \n, where n is a number from 1 to 9
  • Features of using these metacharacters:
    • An asterisk must follow an expression that matches a single character. Example: [xyz]*.
    • The expression \(block\)* should be considered invalid. In some cases it matches zero or more repetitions of the string block. In others it matches the string block* .
    • Within a character class, special character meanings are largely ignored. Special cases:
    • To add a ^ character to a set, it must not be placed first.
    • To add a - character to a set, it must be placed first or last. For example:
      • DNS name template, which may include letters, numbers, minus and a dot: [-0-9a-zA-Z.] ;
      • any character except minus and numbers: [^-0-9] .
    • To add a [ or ] character to a set, it must be placed first. For example:
      • [][ab] matches ], [, a or b.

2. Extended regular expressions (ERE)

  • The syntax of these expressions is similar to the basic syntax, with the following exceptions:
    • Backslashes are no longer used for the { } and ( ) metacharacters.
    • A backslash before a metacharacter cancels its special meaning.
    • The theoretically irregular construct \n was dropped.
    • The metacharacters + , ? , | were added.

3. Perl-compatible regular expressions (PCRE)

  • They have a richer and at the same time more predictable syntax than even POSIX ERE, so they are often used by applications.

Regular expressions consist of patterns, or rather, they define a search pattern. A pattern consists of search rules, which are made up of characters and metacharacters.

Search rules are determined by the following operations:

Alternation |

The vertical bar (|) separates valid options; you could call it a logical OR. For example, "gray|grey" matches gray or grey.

Grouping ()

Parentheses are used to define the scope and precedence of operators. For example, "gray|grey" and "gr(a|e)y" are different patterns, but they both describe the set containing gray and grey.

Quantification {} ? * +

A quantifier after a character or group determines how many times the preceding expression may occur.

{m,n} general form: from m to n repetitions inclusive.

{m,} general form: m or more repetitions.

{,n} general form: no more than n repetitions.

{n} exactly n repetitions.

? The question mark means 0 or 1 times, the same as {0,1} . For example, "colou?r" matches both color and colour.

* The star means 0, 1 or any number of times ( {0,} ). For example, "go*gle" matches ggle, gogle, google, etc.

+ The plus means at least 1 time ( {1,} ). For example, "go+gle" matches gogle, google, etc. (but not ggle).

The exact syntax of these regular expressions is implementation dependent (for example, in basic regular expressions the symbols ( and ) are escaped with a backslash).

Metacharacters, in simple terms, are symbols that do not stand for themselves: for example, the symbol . (dot) is not a dot but any single character. Please familiarize yourself with the metacharacters and their meanings:

.  Matches any single character.

[something]  Matches any single character from those enclosed in brackets. Note the following:
  • The “-” character is interpreted literally only if it is located immediately after the opening bracket or before the closing bracket: [abc-] or [-abc]. Otherwise, it denotes a character range.
  • For example, [abc] matches "a", "b" or "c"; [a-z] matches lowercase letters of the Latin alphabet. These notations can be combined: [abcq-z] matches a, b, c, q, r, s, t, u, v, w, x, y, z.
  • To match the characters “[” or “]”, it is enough for the closing bracket to be the first character after the opening one: [][ab] matches "]", "[", "a" or "b".
  • If the list in square brackets is preceded by a ^ character, the expression matches a single character that is not in the brackets. For example, [^abc] matches any character other than "a", "b", or "c"; [^a-z] matches any character except lowercase letters of the Latin alphabet.

^  Matches the beginning of the text (or the beginning of any line in line-by-line mode).

$  Matches the end of the text (or the end of any line in line-by-line mode).

\(\) or ()  Declares a "marked subexpression" (grouped expression) that can be used later (see the next element: \n). A "marked subexpression" is also called a "block". Unlike other operators, this one (in traditional syntax) requires a backslash; in extended and Perl syntax, the \ character is not needed.

\n  Where n is a number from 1 to 9; matches the n-th marked subexpression (for example, in \(abcd\)\1 the characters abcd are subexpression 1). This construct is theoretically irregular and was not adopted in the extended regular expression syntax.

*
  • A star after an expression matching a single character matches zero or more copies of this (preceding) expression. For example, "[xyz]*" matches the empty string, "x", "y", "zx", "zyx", etc.
  • \n*, where n is a digit from 1 to 9, matches zero or more occurrences of the string matched by the n-th marked subexpression. For example, "\(a.\)c\1*" matches "abc", "abcab" and "abcabab"; in "abcac" it matches only the leading "abc".

An expression enclosed in "\(" and "\)" followed by a "*" should be considered invalid. In some cases, it matches zero or more occurrences of the string that was enclosed in parentheses. In others, it matches the expression enclosed in parentheses followed by a literal "*" character.

\{x,y\}  Matches the preceding block occurring at least x and at most y times. For example, "a\{3,5\}" matches "aaa", "aaaa" or "aaaaa". Unlike other operators, this one (in traditional syntax) requires a backslash.

.*  Stands for any number of any characters between two parts of a regular expression.
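
As a small illustration of the \n back-reference described in the list above, here is a sketch using grep in basic mode; the three input lines are invented for the demonstration.

```shell
# Hypothetical input lines, used only for illustration.
# \(a.\) remembers "a" plus one character; \1 requires the same two characters again.
matches=$(printf '%s\n' 'abcab' 'abcac' 'xyz' | grep '\(a.\)c\1')

printf '%s\n' "$matches"
```

Only "abcab" is selected: in "abcac" the group captures "ab", but the text after "c" is "ac", so the back-reference fails.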

Metacharacters let us describe various kinds of matches. But how can we represent a metacharacter as a regular character, for example the symbol [ (square bracket) with the meaning of a square bracket? It is simple:

  • the metacharacter (. * + \ ? ( )) must be preceded (escaped) by a backslash. For example \. or \[

To simplify the definition of some character sets, they were combined into so-called character classes and categories. POSIX has standardized the declaration of certain character classes and categories, as shown in the following table:

POSIX class   Equivalent          Meaning
[:upper:]     [A-Z]               uppercase characters
[:lower:]     [a-z]               lowercase characters
[:alpha:]     [A-Za-z]            upper and lower case characters
[:alnum:]     [A-Za-z0-9]         digits, upper and lower case characters
[:digit:]     [0-9]               digits
[:xdigit:]    [0-9A-Fa-f]         hexadecimal digits
[:punct:]     [.,!?:…]            punctuation marks
[:blank:]     [ \t]               space and TAB
[:space:]     [ \t\n\r\f\v]       whitespace characters
[:cntrl:]                         control characters
[:graph:]     [^ \t\n\r\f\v]      printable characters
[:print:]     [^\t\n\r\f\v]       printable characters and the space

Regular expressions have a property known as:

Regex greediness

I will try to describe it as clearly as possible. Let's say we want to find all HTML tags in some text. Narrowing the problem down, we want to find the values contained between < and >, together with the brackets themselves. But we know that tags have different lengths and that there are at least 50 different tags. Listing them all as alternatives is too time-consuming. But we also know the expression .* (dot asterisk), which stands for any number of any characters in a line. Using the expression <.*>, we will try to find in the text

<p>So, How to create RAID level 10/50 on the LSI MegaRAID controller (also relevant for: Intel SRCU42x, Intel SRCS16):</p>

all values between < and >. As a result, the ENTIRE line will match this expression. Why? Because regex is GREEDY and tries to capture AS MANY characters as possible between < and >, so the entire line, from <p>So... to ...>, falls under this rule!

I hope this example makes it clear what greediness is. There are two ways to get rid of it:

  • exclude the characters that must not appear in the desired match (for example: <[^>]*> for the case above)
  • make the quantifier non-greedy (lazy quantifiers are part of Perl-compatible syntax, not of POSIX BRE/ERE):
    • *? - "non-greedy" ("lazy") equivalent of *
    • +? - "non-greedy" ("lazy") equivalent of +
    • {n,}? - "non-greedy" ("lazy") equivalent of {n,}
    • .*? - "non-greedy" ("lazy") equivalent of .*
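
The difference between the greedy pattern and the negated-class workaround can be seen with grep -o, which prints each match on its own line. This is a minimal sketch; the HTML fragment is invented for the demonstration.

```shell
# Hypothetical HTML fragment, used only for illustration.
html='<p>So, <b>bold</b> text</p>'

# Greedy: .* runs to the last ">" on the line, so the whole line is one match.
greedy=$(printf '%s\n' "$html" | grep -o '<.*>')

# Negated class: each match stops at the first ">", giving one match per tag.
tags=$(printf '%s\n' "$html" | grep -o '<[^>]*>')

printf '%s\n' "$greedy" '--' "$tags"
```

The negated class works in plain basic syntax, so it does not depend on lazy quantifiers being available.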

I would like to add the following about extended regular expression syntax:

Extended regular expressions in POSIX are similar to the traditional Unix syntax, but with some additional metacharacters:

+ The plus indicates that the previous character or group may be repeated one or more times. Unlike the asterisk, at least one repetition is required.

? The question mark makes the previous character or group optional. In other words, it may be absent from the matching string or present exactly once.

| The vertical bar separates alternative variants of a regular expression. One bar gives two alternatives, but there can be more: just use more vertical bars. It is important to remember that this operator takes in as much of the expression as possible on each side. For this reason, the alternation operator is most often used inside parentheses.

The use of backslashes has also been dropped: \{…\} becomes {…} and \(…\) becomes (…).

To conclude the post, I will give some examples of using regex:

$ cat text1
1 apple
2 pear
3 banana
$ grep p text1
1 apple
2 pear
$ grep "pp*" text1
1 apple
2 pear
$ cat text1 | grep "l\|n"
1 apple
3 banana
$ echo -e "find an\n* here" | grep "\*"
* here
$ grep "pl\?.*r" text1 # p, on lines where there is an r
2 pear
$ grep "a.." text1 # lines with an a followed by at least 2 characters
1 apple
3 banana
$ grep "[3p]" text1 # search for lines that contain 3 or p
1 apple
2 pear
3 banana
$ echo -e "find an\n* here\nsomewhere." | grep "[.*]"
* here
somewhere.
$ echo -e "123\n456\n789\n0" | grep "[1-9]"
123
456
789
$ sed -e "/\(a.*a\)\|\(p.*p\)/s/a/A/g" text1 # replace a with A in all lines where after a comes a or after p comes p
1 Apple
2 pear
3 bAnAnA
… *\./ LAST WORD./g"
First. A LAST WORD. This is a LAST WORD.

Best regards, McSim!

grep stands for 'global regular expression printer'. grep extracts from text files the lines you need, namely those that contain user-specified text.

grep can be used in two ways: on its own or in combination with streams (pipes).

grep is very rich in functionality thanks to the large number of options it supports, such as searching with a fixed string pattern, a regular expression pattern, or Perl-based regular expressions.

Because of this varied functionality, the grep tool has many variants, including egrep (Extended GREP), fgrep (Fixed GREP), pgrep (Process GREP), rgrep (Recursive GREP), etc. These variants differ only slightly from the original grep.

grep options

$ grep -V
grep (GNU grep) 2.10
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+

There are modifications of the grep utility: egrep (with extended regular expression processing), fgrep (which treats the symbols $*[^|()\ literally), rgrep (with recursive search enabled).

    egrep is the same as grep -E

    fgrep is the same as grep -F

    rgrep is the same as grep -r
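
The effect of -F can be sketched by running the same pattern as a regular expression and as a fixed string. The three input lines are invented for the demonstration.

```shell
# Hypothetical input lines, used only for illustration.
lines=$(printf '%s\n' 'a.c' 'abc' 'a*c')

# As a regular expression, the dot matches any character: all three lines count.
as_regex=$(printf '%s\n' "$lines" | grep -c 'a.c')

# With -F the dot and the star are ordinary characters.
as_fixed=$(printf '%s\n' "$lines" | grep -Fc 'a.c')
star_fixed=$(printf '%s\n' "$lines" | grep -F 'a*c')

printf 'regex: %s, fixed: %s, star: %s\n' "$as_regex" "$as_fixed" "$star_fixed"
```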

    grep [-b] [-c] [-i] [-l] [-n] [-s] [-v] restricted_regex_BRE [file ...]

The grep command matches lines in the source files against the pattern specified by restricted_regex_BRE. If no files are specified, standard input is used. Typically, each successfully matched line is copied to standard output; if there are several source files, the file name is printed before the matched line. grep uses a compact, non-deterministic algorithm. Restricted regular expressions (expressions that use a limited set of alphanumeric and special characters) are treated as patterns. They have the same meaning as regular expressions in ed.

To protect the characters $, *, [, ^, |, (), and \ from interpretation by the shell, it is easiest to enclose restricted_regex_BRE in single quotes.

Options:

-b Precedes each line with the number of the block in which it was found. This can be useful when searching for blocks by context (blocks are numbered starting from 0).
-c Prints only the number of lines containing the pattern.
-h Prevents the name of the file containing the matched line from being printed before the line itself. Used when searching across multiple files.
-i Ignores case when making comparisons.
-l Prints only the names of the files containing matching lines, one per line. If the pattern is found on several lines of a file, the file name is not repeated.
-n Prints before each line its number in the file (lines are numbered starting from 1).
-s Suppresses messages about non-existent or unreadable files.
-v Prints all lines except those containing the pattern.
-w Searches for the expression as a word, as if it were surrounded by the metacharacters \< and \>.

grep --help

Usage: grep [OPTION]... PATTERN [FILE]...
Searches for PATTERN in each FILE or standard input.
By default, PATTERN is a basic regular expression (BRE).
Example: grep -i "hello world" menu.h main.c

Selecting the type of regular expression and its interpretation:
  -E, --extended-regexp     PATTERN is an extended regular expression (ERE)
  -F, --fixed-strings       PATTERN is a set of fixed strings separated by newlines
  -G, --basic-regexp        PATTERN is a basic regular expression (BRE)
  -P, --perl-regexp         PATTERN is a Perl regular expression
  -e, --regexp=PATTERN      use PATTERN for matching
  -f, --file=FILE           take PATTERN from FILE
  -i, --ignore-case         ignore case differences
  -w, --word-regexp         PATTERN must match only whole words
  -x, --line-regexp         PATTERN must match only whole lines
  -z, --null-data           lines are separated by a null byte rather than a newline

Miscellaneous:
  -s, --no-messages         suppress error messages
  -v, --invert-match        select non-matching lines
  -V, --version             print version information and exit
      --help                show this help and exit
      --mmap                ignored, kept for backward compatibility

Output control:
  -m, --max-count=NUM       stop after NUM matches
  -b, --byte-offset         print the byte offset along with the output lines
  -n, --line-number         print the line number along with the output lines
      --line-buffered       flush the buffer after each line
  -H, --with-filename       print the file name for each match
  -h, --no-filename         do not prefix output with the file name
      --label=LABEL         use LABEL as the file name for standard input
  -o, --only-matching       show only the part of the line matching PATTERN
  -q, --quiet, --silent     suppress all normal output
      --binary-files=TYPE   assume that binary files are of TYPE:
                            binary, text or without-match
  -a, --text                same as --binary-files=text
  -I                        same as --binary-files=without-match
  -d, --directories=ACTION  how to handle directories;
                            ACTION can be read, recurse or skip
  -D, --devices=ACTION      how to handle devices, FIFOs and sockets;
                            ACTION can be read or skip
  -R, -r, --recursive       same as --directories=recurse
      --include=FILE_PATTERN  process only files matching FILE_PATTERN
      --exclude=FILE_PATTERN  skip files and directories matching FILE_PATTERN
      --exclude-from=FILE   skip files matching any file pattern from FILE
      --exclude-dir=PATTERN directories matching PATTERN will be skipped
  -L, --files-without-match print only the names of FILEs without matches
  -l, --files-with-matches  print only the names of FILEs with matches
  -c, --count               print only the number of matching lines per FILE
  -T, --initial-tab         align with tabs (if necessary)
  -Z, --null                print a 0 byte after the FILE name

Context control:
  -B, --before-context=NUM  print NUM lines of preceding context
  -A, --after-context=NUM   print NUM lines of following context
  -C, --context=NUM         print NUM lines of context
  -NUM                      same as --context=NUM
      --color[=WHEN],
      --colour[=WHEN]       use markers to highlight matching strings;
                            WHEN can be always, never or auto
  -U, --binary              do not strip CR characters at the end of the line (MSDOS)
  -u, --unix-byte-offsets   report offsets as if there were no CRs (MSDOS)

"egrep" means "grep -E". "fgrep" means "grep -F".
Running the program directly as "egrep" or "fgrep" is discouraged.
When FILE is not specified, or when FILE is -, standard input is read.
If fewer than two files are specified, -h is assumed.
The exit code is 0 if a match is found and 1 if not.
If an error occurs and the -q option was not given, the exit code is 2.

Report errors to:
Please report errors in translation to:
GNU Grep home page:
Help for working with GNU programs:
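
The exit codes mentioned at the end of the help text are what make grep convenient in scripts. The following sketch captures all three; the file path is hypothetical and assumed not to exist.

```shell
# 0: a match was found (-q prints nothing and only sets the exit status).
found=0;   printf 'hello world\n' | grep -q 'hello'  || found=$?

# 1: no line matched.
missing=0; printf 'hello world\n' | grep -q 'absent' || missing=$?

# 2: an error occurred (hypothetical missing file; -s hides the error message).
failed=0;  grep -s 'x' /nonexistent/hypothetical/file || failed=$?

printf 'found=%s missing=%s failed=%s\n' "$found" "$missing" "$failed"
```

In a script, the usual form is simply `if grep -q PATTERN FILE; then ...; fi`, which branches on the same exit status.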