Build Your Own PCRE Regex Tester With PHP

As promised, in this episode we will build our very own regular expression tester so we can try out various regex patterns. As we covered in the prior post on regular expressions, there are many great online tools for this as well. So why build our own? It’s simple really: it’s fun to do and a great learning exercise too. What we’ll set out to do is build a simple one-page application that accepts a string of data, or subject, as well as a pattern. We’ll then have a submit button that runs the pattern against the subject using the preg_match_all function built into PHP. Let’s check it out.

Create The Form

First up, we need the form to accept the data we want to test.
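As a rough illustration, here is a minimal sketch of what such a form could look like. The field names subject and pattern and the helper name render_form are assumptions made for this sketch, not necessarily what the original file uses. The submitted values are escaped with htmlspecialchars before being echoed back, so characters like < or " in a test string cannot break the markup.

```php
<?php
// Hypothetical helper: returns the form markup, repopulated with earlier input.
// The field names "subject" and "pattern" are assumptions for this sketch.
function render_form($subject, $pattern)
{
    // Escape user input before placing it back into the HTML.
    $safeSubject = htmlspecialchars($subject, ENT_QUOTES, 'UTF-8');
    $safePattern = htmlspecialchars($pattern, ENT_QUOTES, 'UTF-8');

    return '<form method="post" action="preg_match_all.php">' . "\n"
        . '  <textarea name="subject" rows="8" cols="60">' . $safeSubject . '</textarea>' . "\n"
        . '  <input type="text" name="pattern" value="' . $safePattern . '">' . "\n"
        . '  <input type="submit" value="Preg Match All">' . "\n"
        . '</form>';
}

// Fall back to empty strings when nothing has been posted yet.
$subject = isset($_POST['subject']) ? $_POST['subject'] : '';
$pattern = isset($_POST['pattern']) ? $_POST['pattern'] : '';

echo render_form($subject, $pattern);
```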

So in this snippet there are a few things to note. We’re simply going to post the form to preg_match_all.php, which is the name of the one file in this application. We include a bit of code to check for the presence of data that may have been submitted via the form. If it has, we repopulate the form with those values. This way, we don’t have to keep pasting the subject data back in every time we run a test. It also helps with the pattern, since we can make incremental edits to it and keep testing against the same subject. Finally, we include a simple submit button, which we’ve labeled with the text “Preg Match All”. It looks like this.
[Screenshot: regex subject form]

Cool. We can see there is a text area to input some data that we want to check against, a text input for the regular expression pattern, and a simple submit button.

The Form Processing Script

Here is the snippet of PHP that will handle the form processing when it is submitted. This actually goes *above* the HTML for the form so that we can capture form data for repopulation if needed.

Excellent. First, we use the ternary operator to check for data submitted via the form. If no data has been submitted, we simply assign an empty string to the $subject and $pattern variables.
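The ternary check could be sketched roughly like this. The $_POST keys subject and pattern are assumed field names, chosen here to match the variable names.

```php
<?php
// Assumed field names: 'subject' and 'pattern'.
// If the key is present in the POST data, use it; otherwise use an empty string.
$subject = isset($_POST['subject']) ? $_POST['subject'] : '';
$pattern = isset($_POST['pattern']) ? $_POST['pattern'] : '';
```

On PHP 7 and later, the null coalescing operator expresses the same idea more compactly, as in $_POST['subject'] ?? ''.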

Next up, we use a simple if statement to check the length of the $pattern using the strlen function. If the $pattern has a length greater than zero, we proceed to run the preg_match_all function using the data that was submitted from the form. preg_match_all is a fantastic function you can use to find all matches within a subject. The first parameter to this function is the actual pattern to use for the regex. Note that in the code, we add the delimiters ahead of time, so that when we enter the pattern in the form we don’t need to include them. This is just a simple convenience mechanism. The second parameter is the subject that we will test the regular expression against. This is the data that comes in from the textarea in our form. The third parameter is the name of a variable that will hold the matches that result from running the regular expression.

With preg_match_all, the matches actually get stored in a multidimensional array, where index 0 holds the full pattern matches. This is why when we loop over this array, we loop over $matches[0] and not $matches. Finally, we simply echo out each match followed by a line break. The entire script looks like this.
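A minimal sketch of a processing script along these lines is shown below. It assumes the forward slash as the delimiter, the field names subject and pattern, and a hypothetical helper named find_all_matches. One consequence of the delimiter assumption is that a pattern containing an unescaped forward slash would not compile.

```php
<?php
// Assumed field names: 'subject' and 'pattern'.
$subject = isset($_POST['subject']) ? $_POST['subject'] : '';
$pattern = isset($_POST['pattern']) ? $_POST['pattern'] : '';

// Hypothetical helper: wraps the raw pattern in '/' delimiters (an assumption)
// and returns the list of full pattern matches.
function find_all_matches($pattern, $subject)
{
    $matches = array();
    $result = preg_match_all('/' . $pattern . '/', $subject, $matches);

    // preg_match_all returns false when the pattern fails to compile.
    if ($result === false) {
        return array();
    }

    // Index 0 holds the full matches; higher indexes hold capture groups.
    return $matches[0];
}

// Only run the regex when a pattern was actually submitted.
if (strlen($pattern) > 0) {
    foreach (find_all_matches($pattern, $subject) as $match) {
        // Escape each match so it displays safely in the page.
        echo htmlspecialchars($match) . '<br>';
    }
}
```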

This is super simple and basic, but it should make for a nice quick and dirty testing tool for some of our regular expressions. Let’s examine positive and negative lookbehinds. Recall from our prior post that this is when the regular expression looks behind, or before, the pattern in question. This is good for matching something that is only preceded by a particular character or string of characters. Here is a screenshot of our tool completing a successful positive lookbehind match.

[Screenshot: regular expression positive lookbehind]

This is right after the submit button has been clicked. We can see the match of ‘we’ gets output at the very top, and our form is populated with the data we used. This way we can easily edit the data in the form and try another test. For this test, the subject was ‘Do you know what 1 + 1 is equal to? {we will soon find out} [haha].’ and our regex pattern was (?<={)[a-z]{2} which says: match any two characters within the range of lower case a to z, *only* if they are preceded by a left curly brace. Nice!
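The same test can be reproduced outside the form with a few lines of PHP. This is just a sketch that hard-codes the subject and pattern from above and assumes forward slash delimiters.

```php
<?php
$subject = 'Do you know what 1 + 1 is equal to? {we will soon find out} [haha].';
$pattern = '(?<={)[a-z]{2}';

// The '/' delimiters are an assumption for this sketch.
preg_match_all('/' . $pattern . '/', $subject, $matches);

// $matches[0] holds the full matches.
foreach ($matches[0] as $match) {
    echo $match . PHP_EOL;
}
// prints: we
```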

Let's now test the negative lookbehind. We can modify the pattern to (?<!\[)[a-z]{4} which says to match any four consecutive lower case letters as long as there is not a left square bracket immediately before them. Note that the bracket has to be escaped with a backslash, since an unescaped [ would open a character class. We can see here that this works, as the string 'haha' was not matched, but several other runs of four consecutive characters were.
[Screenshot: regular expression negative lookbehind]
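Here is a small sketch of the negative lookbehind run as a standalone script, again assuming forward slash delimiters and using the same subject as the earlier test. The pattern sits in a single-quoted PHP string so the backslash reaches the regex engine untouched.

```php
<?php
$subject = 'Do you know what 1 + 1 is equal to? {we will soon find out} [haha].';
// The left square bracket is escaped so it is treated as a literal character.
$pattern = '(?<!\[)[a-z]{4}';

// The '/' delimiters are an assumption for this sketch.
preg_match_all('/' . $pattern . '/', $subject, $matches);

foreach ($matches[0] as $match) {
    echo $match . PHP_EOL;
}
// prints: know, what, equa, will, soon, find (one per line)
```

Notice that 'equa' shows up because the pattern grabs the first four letters of 'equal', while 'haha' is skipped because the bracket sits directly before it.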


Should you ever find yourself without an internet connection and unable to reach any of the fine resources we covered in the last episode, you now have a really simple way to test your regular expressions locally. It's quick, dirty, and useful - perfect if you ask me 🙂