The layman’s guide to regex: what is regex exactly?

RegEx stands for regular expression and it is a common tool used by programmers to match patterns of string data. “WAIT, I thought this was the layman’s explanation?!” Let us continue.

A good way to think about regular expressions is as a sifter for string data. Let’s say you have a recipe for an amazing cake but your string flour is too lumpy and stuck together. The regular expression sifter will allow you to collect the string flour of your choosing so that you can finish the recipe. And you can keep adding filters to the sifter to make your flour even finer.

At the most fundamental level, a regular expression allows us to find, manipulate and collect data within strings. A string is one of the most common data types in computer science and it is usually just a sequence of characters between a pair of quotes.


"This is a string"

With regular expressions we can take a string like the one above and match certain patterns that we are looking to find. Say for instance we wanted to change every ‘s’ character to the number ‘5’.

We could use a regular expression to do that instead of retyping the character every time the letter ‘s’ appears. This might seem like a trivial task for the above example but imagine doing this for a research paper you have been working on or maybe even your first novel.
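As a rough sketch of that swap in Ruby (the language used later in this post), the gsub method can replace every match of a pattern. The variable names here are just for illustration:

```ruby
# The example string from above
sentence = "This is a string"

# /s/ is a regular expression that matches a lowercase 's';
# gsub returns a new string with every match replaced by "5"
swapped = sentence.gsub(/s/, "5")

puts swapped  # prints "Thi5 i5 a 5tring"
```

Three characters changed with one line, and the same line would work just as well on a whole novel.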

Regular expressions make the process of matching certain data within strings much easier, and that is why they are so useful. You probably use the same idea every day without knowing it. Every time you open up Google Docs or Microsoft Word and use the “Find and Replace” feature, you are doing the kind of pattern matching that regular expressions are built for!

Using regular expressions

When I first started learning about regular expressions I was very intimidated because the syntax looked so complicated. You might see a regular expression that looks like this:

/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[\W]).{8,}$/

The above is a regular expression for a password that satisfies a strict set of conditions. These are the types of regular expressions that I first saw, and they boggled my mind. Let’s start with something a little more straightforward.

Here we have a string of some sample song lyrics. Let’s switch up the words a little bit:


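# Read the lyrics from a file named 'lyrics' in the current directory
lyrics_test = File.read('lyrics')

# Each pair is [pattern to find, text to put in its place]
replacements = [ [/club/, "pull request"], [/girl/, "labs"], [/the cut/, "my repo"], [/she/, "they"] ]

# gsub! changes lyrics_test in place, one pair at a time
replacements.each { |pattern, replacement| lyrics_test.gsub!(pattern, replacement) }

puts lyrics_test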

In this very simple example we use gsub, one of the most commonly used Ruby string methods for working with regular expressions.

The gsub method takes two arguments: the first is the pattern you want to find and the second is the text you want to replace each match with. The gsub! version used above works the same way but changes the original string in place. In our example we have an array of patterns we want to replace along with the values we want to replace them with.

replacements = [ [/club/, "pull request"]...

In Ruby, a regular expression is written between two forward slashes, and anything you place inside those slashes becomes your pattern: /-ANYTHING-/. In the first iteration our regular expression is /club/ and we replace each match with the string ‘pull request’.
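To see gsub on its own, here is a small sketch using a made-up lyric line instead of the lyrics file (the line and the variable names are invented for illustration). It also shows two things worth knowing: gsub returns a new string while gsub! changes the original, and a pattern like /club/ matches those letters anywhere, even inside a longer word.

```ruby
# Hypothetical lyric line, standing in for the contents of the 'lyrics' file
line = "in the club with all the clubs".dup

# gsub returns a modified copy and leaves the original untouched
copy = line.gsub(/club/, "pull request")
puts copy   # prints "in the pull request with all the pull requests"
puts line   # prints "in the club with all the clubs"

# gsub! performs the same replacement directly on the original string
line.gsub!(/club/, "pull request")
puts line   # prints "in the pull request with all the pull requests"
```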

Let’s take a look at another example:

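# Read the lyrics from a file named 'lyrics' in the current directory
lyrics_test = File.read('lyrics')

# One pair this time: [pattern to find, text to put in its place]
replacements = [ [/ [t][A-Za-z]{3} /, " code "] ]

# gsub! changes lyrics_test in place
replacements.each { |pattern, replacement| lyrics_test.gsub!(pattern, replacement) }

puts lyrics_test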

In this example we want to replace every four-letter word that begins with the letter ‘t’ with the word ‘code’.

Looking more closely at our regular expression inside of /-ANYTHING-/ we can see that it starts with an empty space. This marks the start of a new word. Now we want to specify only words that begin with the letter ‘t’. Using [t] we are able to pinpoint only the words that start with a lowercase ‘t’ (the brackets are optional here; a plain t would match the same thing).

Next we use a handy regular expression trick to specify any alphabetical letter. [A-Za-z] means any single character from capital A-Z or lower case a-z. Finally we specify that we want exactly three of those characters using {3} to complete a four-letter word. The pattern also ends with a space, which marks the end of the word. One side effect of using spaces this way is that a word at the very start or end of the text, or right next to punctuation, will not match.
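Here is a small sketch of that pattern in action on a made-up sentence (the sentence is invented for illustration). Because each match uses up the space on both sides, two qualifying words in a row cannot both match. One common alternative, which is not what the example above uses, is the word-boundary anchor \b, which marks the edge of a word without consuming a character.

```ruby
# Hypothetical sentence, standing in for the lyrics file
sentence = "we took that road to town then left"

# The pattern from the example: space, 't', three letters, space
spaced = sentence.gsub(/ [t][A-Za-z]{3} /, " code ")
puts spaced    # prints "we code that road to code then left"

# Alternative sketch: \b matches a word edge without consuming the space
bounded = sentence.gsub(/\bt[A-Za-z]{3}\b/, "code")
puts bounded   # prints "we code code road to code code left"
```

Notice that "to" is left alone in both versions, since it is only two letters long.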

RegEx has a treasure chest of pattern matchers which allow you to match different sequences across your strings. I would highly recommend playing around with Rubular and RegExr to get more familiar. Once you get the syntax and matchers down I promise it will begin to click!
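As a parting sketch, here is how the intimidating password pattern from earlier reads once it is spread out with comments, using Ruby's x flag (which lets a regular expression span several lines and ignores the whitespace). It is written here with the final check as (?=.*\W), a non-word character anywhere in the password, which is an assumption about the intent. The constant and method names are made up for this example.

```ruby
# Assumption: the final lookahead means "a non-word character anywhere"
PASSWORD_PATTERN = /
  ^             # start of the line
  (?=.*\d)      # at least one digit somewhere ahead
  (?=.*[a-z])   # at least one lowercase letter
  (?=.*[A-Z])   # at least one uppercase letter
  (?=.*\W)      # at least one non-word character, such as punctuation
  .{8,}         # at least eight characters in total
  $             # end of the line
/x

# Hypothetical helper: true when the candidate passes every check
def strong_password?(candidate)
  PASSWORD_PATTERN.match?(candidate)
end

puts strong_password?("Passw0rd!")  # prints true
puts strong_password?("password")   # prints false
```

Each (?=...) piece is a separate condition, so the scary-looking line is really just a checklist.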


Ruby Class Shortcuts

class Person
  attr_accessor :name, :age
end

# same as:

class Person
  def name
    @name
  end

  def name=(name)
    @name = name
  end

  def age
    @age
  end

  def age=(age)
    @age = age
  end
end

Ruby is awesome. I think it has something to do with the ease and beauty of TextMate, but I really look forward to typing out code in the language. Additionally, I’ve been introduced to many shortcuts that make writing code a lot easier.

One recent shortcut which I’ve been trying to wrap my head around is attr_accessor. In the above code you will see that the one line “attr_accessor :name, :age” replaces everything below it.

It basically defines the read and write methods for each attribute on the class. This allows you to call x.name to return the name stored on x, and also allows you to assign it with x.name = “Jahde”.
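A quick sketch of what that looks like in use (the age value is made up for illustration):

```ruby
class Person
  attr_accessor :name, :age
end

x = Person.new

# The generated writer methods: name= and age=
x.name = "Jahde"
x.age = 30        # hypothetical value, just for illustration

# The generated reader methods: name and age
puts x.name       # prints "Jahde"
puts x.age        # prints 30

# Both kinds of method exist even though we never wrote them by hand
puts x.respond_to?(:name)   # prints true
puts x.respond_to?(:name=)  # prints true
```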

I’m sure I’ll be seeing other shortcuts that make writing code easier but this one stood out to me. I think I’m just shocked that 1 line can replace 12 but I’ve come to find this is one of the key attributes of programming itself. Ruby wants to make herself easy to love.