Wednesday, 22 July 2009

Excel: Manipulating text with LEFT, RIGHT, LEN, FIND, IFERROR

This entry solves the problem of having a single field for someone's name in a spreadsheet but then wanting to split it into forename and surname - perhaps because you want to sort rows alphabetically by name - hmm... yes... there's another entry to come!

This one was suggested by a colleague, Patrick Bellis and he did actually provide a solution as well. I've had a fiddle and come up with an enhanced version though that shows how to deal with entries that have two forenames or a middle name or initial. It doesn't deal with more than one middle name or initial although you can do that easily - details later!

So in my simple example, the Full Name field was the previously existing field and I have used formulae to split this into the two new fields, Forename and Surname.

Now I have to say that this isn't my preferred way of doing this. My preferred way would be to use a Visual Basic (or VB) routine in a macro, but whilst these work for me quite easily in MS Access - the database application that comes with Microsoft Office Professional, I've never managed to get VB macros to work in Excel even when my efforts look exactly like the example in the Help pages. Someone else will have to suggest a fix for that perhaps?

The VB routine would allow me to search for the last space within the name field. Using a simple formula, we have to search for the first space and that's why there's a problem with people who have middle names.

To get over this partly - more specifically to deal with people who have a single middle name or initial - I've used a third new column which I've placed over to the right, out of the way where it doesn't need to be seen or printed. It's doing nothing other than an interim calculation or manipulation of the Full Name field.

I've left a blank column between the working bits of the spreadsheet and this interim field but in reality you may want to hide it away far over to the right - say in column Z or even ZZ!

This interim field searches for the first space and will contain all the text that comes after that first space.

But before we look at the formula for that field let's deal with the simplest formula - that for the Forename field. Before we even do that let's look at the common syntax (the use of characters and the order they must be used) that formulas in Excel have.

Any formula starts with an equals sign (=). This tells Excel that we want it to calculate something rather than treat the following characters as a string of text.

Many formulas use brackets (or parentheses to give them their posh name) to contain any parameters the formula needs to work and use commas (,) to separate each parameter. The formula name tells Excel what to do and the parameters tell it what to do it to and how. A bit like:

=DO-THIS(Do it to this, Do it like this)

Formulas can be nested. That means we can use one formula to calculate the parameters for another formula. To find the Forename from the Full Name we need a formula that does this:

RETURN THE CHARACTERS(from the Full Name, until you reach a space)


For the Forename in Row 3 (so in the cell C3) the formula is:

=IFERROR(LEFT(A3,FIND(" ",A3,1)),"")

There are three different functions here and as with all mathematical formulae you should start by understanding the innermost set of brackets first.

That set contains the parameters for the FIND keyword. The FIND formula counts the character that contains the first character of the string we want to find (the example will make this clear). The syntax for FIND is:

FIND(the string of text you want to find, the string of text you want to find it within, the first character within the original string that you want to start searching from)

Whoa! Too Much! Don't panic yet - I'm about to explain step by step.

Step 1. The string of text we want to find is a space. So in the actual formula a few lines up you see ...FIND(" "...

Step 2. The string of text we want to find the space within, is the Full Name field. So instead of actual text we have a pointer to cell A3. So far we have explained ...FIND(" ",A3...

Step 3. I've put a figure one - 1 - because I want to start searching from the first character. Now; you could argue that there shouldn't be a space until at least the second character so I should have used a 2. But whilst there shouldn't that doesn't mean there won't be a space as the first character. Someone might have wanted to put their first initial and then their surname but not pressed the initial key hard enough. So by using a 1 instead of a 2, we do a simple bit of error trapping.

So that should explain ...FIND(" ",A3,1)... The result of that bit of the formula is the number of the character that contains the space as counted from the left. So as cell A3 contains the text "John Burke" the FIND formula should find the space in character 5 and that is the "result" or "return" of the FIND formula. In the next cell down when considering the string "Seamus O'Leary", the space is the 7th character so the FIND formula returns 7.

Thus as we consider the next most inner bit of the formula - which is the LEFT keyword - I can simplify the formula for you.

Instead of considering:

LEFT(A3,FIND(" ",A3,1))

we can consider it as:

LEFT(A3,5)

because Excel calculates the innermost formula first and works outwards, so once the FIND formula has run the LEFT formula works using the result.

The LEFT formula chops and returns the text from the start (left) of the target string depending on how many characters you want. The syntax for LEFT is:

LEFT(a string of text, the number of characters starting at the left of the string that we want to use)

Again the string of text we want to use is contained in cell A3 so we have LEFT(A3... and the number of characters we want to use (as calculated by the FIND formula) is 5. So LEFT(A3,5) is the same as LEFT("John Burke",5) and the result is "John ". Note that there is a space at the end, because we asked for 5 characters. It won't show so it doesn't matter, but if you wanted to be pedantic (anyone having problems with the formula should ignore this as over complicated) we could have used LEFT(A3,FIND(" ",A3,1)-1) which would have subtracted 1 from the value returned by the FIND.

Ahem! But that's just being a clever clogs!

So that just leaves us with the IFERROR bit of the formula and what that does is it stops Excel from coming up with an error message if there is no space at all.

Basically the syntax for IFERROR is:

IFERROR(I want Excel to do this, but if it finds an error then do this instead)

In the last parameter I've used two double quote marks "" - or in other words I've told Excel that if there is an error, instead of reporting that there's been an error it can show the text within the quote marks. Which is nothing. No text at all. The cell would be empty. Excel isn't good at telling you what's wrong; it just tells you there has been an error.

Without the IFERROR formula if there is no space at all then you just get the dreaded #VALUE! error, which tells you absolutely nothing! See Row 7 in the screen grab.

And so to the surname... Unfortunately there is no way of saying 'Now take the rest' after having done a LEFT formula. But there is a RIGHT formula which does the same as the LEFT only from the rear of the text string.

RIGHT(a string of text, the number of characters starting at the right of the string that we want to use)

Our problem now is that just because we know the position of the space, that still doesn't tell us how many characters we want from the right hand end of the Full Name, because we don't know how many characters there are in the surname.

So we need to know how long the entire string of text is and to do that we use the LEN formula:

LEN(a string of text)

The LEN formula only needs a single parameter and returns a number which is the number of characters contained within the string.

LEN("Fred Bloggs") would return 11. (it's counting the space as a character don't forget)

LEN(A3) returns 10 - because cell A3 contains the text "John Burke"

What we need for the Surname is the length of the Full Name minus the length of the Forename.

So including our error trap use of IFERROR in case there is no space character, our formula is:

=IFERROR(RIGHT(A3,LEN(A3)-FIND(" ",A3,1)),A3)

Break it down from the innermost formula(s) in the nest. That's LEN and FIND. Both return numbers and we are taking LEN formula and then subtracting the result of the FIND formula. That gives us the number of characters to be returned by the RIGHT formula and as long as there is a space existing in cell A3 then we have our final result.

The IFERROR formula this time, instead of returning no text at all, returns all of the Full Name field - assuming that any text with no spaces will be a surname. You can change this round if you keep a spreadsheet of celeb names such as Madonna or Prince!

IFERROR(the main formula,A3)

Now I just need to explain the interim field. The nested formula for the Surname as described above would take everything from the first space. So if there is a middle name or initial then you get that plus the Surname. I've created an interim field with the formula just described and then the Surname field has a formula that is exactly the same but which takes its original text string from the interim field instead of the Full Name field.

You could have as many interim fields as you want, each whittling a middle name or initial out of the final surname, and with no errors even if there is no middle name.

12 such interim columns would let you deal with those unfortunate people whose father named them after the long-forgotten football team in the 1960s or 1970s!

11 comments:

Jacob Sommerville said...

This tutorial is great, thanks! Big help.

Charles Williams said...

Yes, I agree. Perfect tutorial and useful!

fayefoora said...

What if I have a cell with text "(ABCD)" and I want to remove the parentheses so I'm left with "ABCD"?

Can I manipulate any or all of the functions explained here in such a way that I'm left with just the text?

John Burke said...

You could extract the text to another cell, but none of these formulae will amend text in the original cell. For that you would need a Visual Basic routine - something I'm ok with in MS Access but have never got to work properly in Excel...

mahasiswa teladan said...

hi..Im student from Informatics engineering, this article is very informative, thanks for sharing :)

Anonymous said...

Very useful information...
Thanks John...

Carina said...

This is a wonderfully specific and descriptive explanation. I get it!
Thank you very much.

Siddhartha Banik said...

Thanks for your post!! It helped me a lot.

Mohan J said...

Hi, its very nice tutorial. thank you.

Jimmy Cham said...

Need help how extract value from column 1 to column 2, 3 & 4 as listed below?

COLUMN 1

CARTON/PACK/UNIT
11111111/22222222/33333333
CARTON/UNIT
11111111/33333333

COLUMN 2 - CARTON
COLUMN 3 - PACK
COLUMN 4 - UNIT

John Burke said...

Probably not enough info to answer Jimmy Cham fully. If (big "if"!) all values have 8 characters which you have implied here then you can do it with LEFT(), RIGHT() and MID() for which the syntax is MID(cell containing string,position of first char to take, number of chars to take). Or, presuming the eventual columns are separated as you show with a "/" then copy that column to a text file and then import it back into excel as a delimited file. Not enough space here to show this in detail...