Tutorial #1: Dealing with duplicates in Excel




A common occurrence in Excel is the presence of duplicate values within a column which makes it difficult to further utilise the data. Depending on the need, following are some of the quick ways to deal with duplicate values.

A. Using conditional formatting

1. Highlight duplicate values
  • Select the range of cells within which you wish to identify the duplicates
  • From the Home tab, choose Conditional Formatting > Highlight Cells Rules > Duplicate Values. Click OK within the Duplicate Values dialog box
The duplicate values in the list will now be highlighted in red.

2. List duplicate values

After the above step, select right click on a cell within the range and choose Filter > Filter by Selected Cell's Color. This will give the list of unique values which can then be further copy-pasted in another sheet.

Note: Make sure you have a column header in place as the filter will be applied on the first row. So, if the first row happens to be a unique value, the filter will be applied on it assuming it to be the header. This may throw you off guard if you are not totally aware of it.

3. List non-repeating values

Just reverse the selection in Step 2 when applying the filter i.e. select a 'No Fill' cell. This will list the values that are not repeated at all.

Note: This doesn't list unique values as the ones which are repeated are omitted.


B. Using built-in Excel functionality

1. List unique values

The simplest method to list unique values is to use the 'Remove Duplicates' option under the 'Data Tools' group of the Data tab. Make sure you properly utilise the 'My data has headers' selection before clicking on OK in the Remove Duplicates dialog box.


C. Using an array formula

A scalable way of doing this with fewer clicks (or none at all) is to use an array formula. Array formulae are a lot more convoluted to understand, so I may try to do a break-up of the formula along with an illustration in the future, but for now copy-paste of the below should suffice. Note that array formulae are entered using Ctrl + Shift + Enter and hence are also called 'CSE' formulae.

Here, I will assume that the values with duplicates are present in Column A and labelled as 'List1' (Ctrl + F3). Also, the formula will be entered in the cell B2.

1. List duplicate values
{=IFERROR(INDEX(List1, MATCH(0, COUNTIF($B$1:$B1,List1)+IF(COUNTIF(List1,List1)>1, 0, 1), 0)),"")}

2. List unique values 
{=IFERROR(INDEX(List1, MATCH(0, COUNTIF($B$1:$B1,List1)+IF(COUNTIF(List1,List1)>=1, 0, 1), 0)),"")}
Note: The only change here is to use ">=" instead of ">" within the IF statement condition check.

All my Excel tips are based on the latest version (Excel 2016 at the time of writing) as I am an Office 365 subscriber. The tips can be extended to the immediate preceding versions like Excel 2013 and 2010 but may differ in earlier versions.

0 comments: