Monday, June 8, 2015

Beginner’s Guide to Regular Expression (Regex)

A regular expression is a sequence of characters forming a pattern that can be searched for in a string. Regex can be used for validation (for example, checking credit card numbers), for searching with complex text matches, and for replacing matched text with another string. It is also supported across many programming languages: learn it once and you can use it almost everywhere.


I’ve seen a few people take a first look at regex and then ignore it completely. I don’t blame them; regex’s syntax is complex and will make many cringe, just like those command line languages, only worse. But then every new thing is scary and seems impossible to learn at first. So, borrowing Horatius’ words, I’ll say this: Begin, be bold, and venture to be wise.


About Regex


Regex has its roots in neuroscience and mathematics, and was first implemented in software in 1968 by Ken Thompson, who used it for text search in the QED text editor. Now it’s part of many programming languages like Perl, Java, Python, Ruby, and JavaScript.


Let’s look at some examples of how regex works.


I’ll be using JavaScript in my examples. To get past beginner level, you need to learn the characters, classes, quantifiers, modifiers and methods used in regex. The Mozilla Developer Network’s Regular Expressions page has a table containing all of them. You can also refer to the cheatsheet at the end of this post, which lists the most-used characters.


Let’s see a simple example with an explanation. This is a regex:

B[a-zA-Z\d]+

This regex looks in a line for the character "B" followed by at least one character from "a" to "z", "A" to "Z", or the digits 0 to 9.


Here’s a sample line with every possible match marked in square brackets:

[Basket], bulb, [B12] vitamin, [BaSO4], N [BC] company


As written, the above regex will stop the search at "Basket" and return a positive response. That’s because the global modifier "g" has to be specified if you want the regex to find all the possible matches.
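To make the effect of the "g" modifier concrete, here’s a small sketch that runs the same pattern over the sample line with and without it. The variable names are just my own picks for illustration.

```javascript
// The sample line from the text.
const sample = "Basket, bulb, B12 vitamin, BaSO4, N BC company";

// Without "g", match() stops at the first match.
const first = sample.match(/B[a-zA-Z\d]+/);

// With "g", match() collects every match in the line.
const all = sample.match(/B[a-zA-Z\d]+/g);

console.log(first[0]); // "Basket"
console.log(all);      // ["Basket", "B12", "BaSO4", "BC"]
```

Note that "bulb" is skipped because the pattern asks for an uppercase "B".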


Now, let’s see how to use this expression in JavaScript. The test method returns true if a match is found, and false otherwise.

var input = "your test string",
    regex = /B[a-zA-Z\d]+/; // \d matches any digit
if (!regex.test(input)) {
    alert("No match is found");
} else {
    alert("A match is found");
}

Let’s try another method: match returns the matches found in an array, or null if there are none.

var input = "your test string",
    /* I’ve added the global modifier "g" to the regex to get all the matches */
    regex = /B[a-zA-Z\d]+/g,
    ary = input.match(regex);
if (ary === null) {
    alert("No match is found");
} else {
    alert("matches are: " + ary.toString());
}

How about string replace? Let’s try that with regex now.

var input = "your test string",
    regex = /B[a-zA-Z\d]+/g;
// Every match is replaced with "#" because of the "g" modifier
alert(input.replace(regex, "#"));
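If you’d like to see what replace actually produces, here’s a quick sketch using the sample line from earlier instead of the placeholder string. It logs to the console rather than using alert, and the variable names are my own.

```javascript
// The sample line from earlier in the post.
const text = "Basket, bulb, B12 vitamin, BaSO4, N BC company";

// With "g", every match is swapped for "#".
const replacedAll = text.replace(/B[a-zA-Z\d]+/g, "#");

// Without "g", only the first match is swapped.
const replacedFirst = text.replace(/B[a-zA-Z\d]+/, "#");

console.log(replacedAll);   // "#, bulb, # vitamin, #, N # company"
console.log(replacedFirst); // "#, bulb, B12 vitamin, BaSO4, N BC company"
```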





Exercises


For exercises, you can google “regex exercises” and try solving them. Here’s what to expect when attempting these exercises, according to difficulty levels.


Basic


To me, being able to validate a password is enough for starters. So, validate that a password is 8 to 16 characters long and contains only alphanumeric characters plus the special characters of your choice.
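In case you get stuck, here’s one possible sketch of a solution, not the only one. The function name isValidPassword and the allowed special characters (!@#$%) are assumptions I made for the example; swap in whatever set you prefer.

```javascript
// Hypothetical helper: the allowed special characters (!@#$%) are my own pick.
function isValidPassword(password) {
  // ^ and $ anchor the pattern so the whole string must match.
  // {8,16} limits the length to between 8 and 16 characters.
  return /^[a-zA-Z\d!@#$%]{8,16}$/.test(password);
}

console.log(isValidPassword("Passw0rd!"));   // true
console.log(isValidPassword("short1"));      // false (too short)
console.log(isValidPassword("pass word12")); // false (space is not allowed)
```

This only checks length and the allowed character set. It doesn’t require, say, at least one digit; that’s a good place to try lookahead assertions later.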


Intermediate


This is where you should practice with more real-world data and learn a few more regex features, such as lookahead and lookbehind assertions and matching groups:


  • Validate PIN codes, hexadecimals, dates, email addresses, floating-point numbers.

  • Replace trailing zeros, whitespace, a set of matching words

  • Extract different parts of a URL
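As a taste of matching groups, here’s a rough sketch for the URL exercise. The parseUrl function and the example.com address are made up for illustration, and the pattern is deliberately simplified: it only handles plain http and https URLs and isn’t meant to cover everything a real URL can contain.

```javascript
// Hypothetical helper using capturing groups; simplified on purpose.
function parseUrl(url) {
  // Groups: 1 protocol, 2 host, 3 port, 4 path, 5 query, 6 fragment.
  // (?: ... ) groups without capturing; a trailing ? makes the part optional.
  const pattern = /^(https?):\/\/([^\/:?#]+)(?::(\d+))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$/;
  const m = url.match(pattern);
  if (m === null) {
    return null; // not an http(s) URL this sketch understands
  }
  return { protocol: m[1], host: m[2], port: m[3], path: m[4], query: m[5], fragment: m[6] };
}

const parts = parseUrl("https://example.com:8080/docs/regex?lang=js#intro");
console.log(parts.host);  // "example.com"
console.log(parts.port);  // "8080"
console.log(parts.path);  // "/docs/regex"
console.log(parts.query); // "lang=js"
```

Optional groups that don’t take part in the match come back as undefined, so a URL without a port gives port: undefined.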

Advanced


You can optimize your solutions to the above exercises. The most thorough regex for email has thousands of characters in it, so take it as far as you feel comfortable with, and that’s enough. You can also try:


  • Parsing HTML or XML (even though it is discouraged in the real world, because using regular expressions to parse a non-regular language like HTML will never be foolproof. Plus, XML parsing is a difficult task, more suitable for advanced-level users.)

  • Replacing tags

  • Removing comments (except the IE conditional comments)
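For the comment-removal exercise, a negative lookahead is one way to skip the IE conditional comments. The sketch below is only a starting point under a few assumptions: stripComments is a name I made up, conditional comments are assumed to start with <!--[if, and, as warned above, a regex like this isn’t foolproof on real-world HTML.

```javascript
// Hypothetical helper: removes HTML comments but keeps IE conditional ones.
function stripComments(html) {
  // <!--        opening of a comment
  // (?!\[if\b)  negative lookahead: skip comments that start with "[if"
  // [\s\S]*?    any characters, including newlines, as few as possible
  // -->         closing of the comment
  return html.replace(/<!--(?!\[if\b)[\s\S]*?-->/g, "");
}

const page = "<p>Hi</p><!-- note --><!--[if IE]><p>IE only</p><![endif]--><!-- bye -->";
const cleaned = stripComments(page);
console.log(cleaned); // "<p>Hi</p><!--[if IE]><p>IE only</p><![endif]-->"
```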

Tools


Tools to visualize regex are among the coolest things out there for me. If you ever come across a long, complex regex, just copy and paste it into one of those tools and you’ll be able to view the flow clearly. Besides that, there are many tools that you can use to fiddle with regex code. They also offer examples and cheatsheets, along with share features.


  • Debuggex – It draws a regex diagram from your input, and you can quickly share it to StackOverflow right from there.

  • RegExr – You can test your regex with this one. It also has a reference, a cheatsheet and examples to help you out.

  • Refiddle – At the moment, other than JavaScript, you can also fiddle with Ruby and .NET versions of regex in it.

Regex Cheatsheet


Token     Definition
[abc]     Any single character a, b or c
[^abc]    Any character other than a, b or c
[a-z]     Any character from a to z, inclusive
[^a-z]    Any character except a to z
[A-Z]     Any character from A to Z, inclusive
.         Any single character (except a line break)
\s        Any whitespace character
\S        Any non-whitespace character
\d        Any digit 0 to 9
\D        Any non-digit
\w        Any word character (letter, number & underscore)
\W        Any non-word character
(…)       Capture everything enclosed
(a|b)     Match either a or b
a?        Character a is either absent or present one time
a*        Character a is either absent or present one or more times
a+        Character a is present one or more times
a{3}      3 occurrences of character a consecutively
a{3,}     3 or more occurrences of character a consecutively
a{3,6}    3 to 6 occurrences of character a consecutively
^         Start of string
$         End of string
\b        A word boundary: the position before a word’s first character or after its last, i.e. between a word character and a non-word character
\B        Non-word boundary
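To see a few of these tokens in action, here’s a short sketch you can paste into a browser console or Node. The test strings and variable names are made up for illustration.

```javascript
// Quantifiers: {3}, {3,} and {3,6}, anchored with ^ and $.
const exactlyThree = /^a{3}$/.test("aaa");      // true
const atLeastThree = /^a{3,}$/.test("aa");      // false, only 2
const threeToSix   = /^a{3,6}$/.test("aaaaaaa"); // false, 7 is too many

// \d+ grabs each run of digits.
const numbers = "a1 b22 c333".match(/\d+/g);

// \b keeps "cat" from matching inside "concat" or "cats".
const words = "cat concat cats".match(/\bcat\b/g);

// \s splits on any single whitespace character (space, tab, ...).
const pieces = "a b\tc".split(/\s/);

console.log(exactlyThree, atLeastThree, threeToSix); // true false false
console.log(numbers); // ["1", "22", "333"]
console.log(words);   // ["cat"]
console.log(pieces);  // ["a", "b", "c"]
```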

Now Read:
Regular Expressions: 30 Useful Tools and Resources


