Finding and removing duplicate data in Excel can be a quick process. However, for many people who haven’t learnt how to do this the quick way in Excel, it can be an absolute chore.
One of the many things I love about my job is that I get to work with so many different people from all kinds of organisations. However, it doesn’t matter what industry or region I’m working in the challenges of working with large lists of data are pretty much the same.
Something I see often is the need to be able to quickly locate and remove duplicate rows of data (excellent for VLookups). Most people just sort the data and then manually skim through the excessive lines of information trying to weed out the annoying duplicates. This works, but boy-oh-boy it can be a time waster. Wouldn’t it be easier to have Excel quickly display the duplicates for you? Then all you need to do is deal with them.
In this post I’m going to walk you through how to quickly locate the duplicates and then, once located, super quick ways in which to remove the duplicates.
The steps below cover:
- Locating duplicates in a single column of data
- Locating rows where the content of all cells are duplicates
- Removing the duplicate data.
Locating duplicates in a single column of data
The worksheet below holds data that has been exported from an in-house system. Unfortunately it has a number of duplicate invoice numbers in the ‘Invoice #’ column. Using the ‘Conditional Formatting’ feature I can apply formatting to any cells with duplicate content. All other cells will remain as they are. This is an excellent way to very quickly identify duplicates.
1 – Select the column that holds the duplicate data. In this case it’s the ‘Invoice #’ column.
2 – From the Home tab, in the Styles group click Conditional Formatting. A drop down menu is displayed. Select Highlight Cell Rules from the menu.
3 – Select Duplicate Values to quickly identify duplicate or unique records.
4 – The dialog box will display the condition to be matched. In the example below the condition is ‘Duplicate’. This setting will ensure any duplicate cells within the selected column will have the formatting applied to them. The option box on the right displays the format that will be applied if the condition is met. There are a range of pre-set formats available on the drop-down arrow, or if you would prefer, you can create your own by selecting Custom Format from the drop-down list.
5 – Select your format options and then click OK. All duplicates will now be formatted and easily identified.
Note: You can also use the Quick Analysis tool (Excel 2013 and 2016 only) to quickly apply Conditional Formatting using default formats. Just select the data range and then click the Quick Analysis button which will be displayed beside the selected range. Select ‘Duplicate Values’ option.
Locating rows where the content of all cells is duplicated
In the steps above we located rows where there were only duplicate invoice numbers. But what if you want to find duplicates where ALL of the data in a row is exactly the same as another? For example, if you were trying to locate duplicates where the invoice number, date, customer name, product name and cost price where all exactly the same.
Here’s a quick way to do this.
1 – The first thing to do is go to an empty cell to the right of the existing data. Using the CONCATENATE function I’m going to pull all of the data content into this one cell.
2 – In the Function Arguments dialog box enter each individual cell for the row into the Text boxes. Note, as you enter a cell reference into one text box another will appear. You can have up to 255 references within the function. Once you have added all of the cells on the row click OK. The content of all of the cells will be pulled into the one cell.
3 – Copy the function down the column and then apply to the same range of data conditional formatting to find the duplicates. Only the cells where all cell content is the same will be formatted.
Removing duplicate rows of data
Once you have located the duplicates in most cases you will be wanting to remove them. I see two different methods used again and again. I’ll outline both of them. You find your favourite.
Using Filter. If your conditional format applied a coloured background or changed the colour of the cell font you can use the Filter command to quickly filter and display only the duplicates.
1 – Select any cell in the data list and then from the Data tab click Filter.
2 – Click the filter drop-down arrow on the column that is currently displaying duplicate formatting. From the drop-down list select Filter by color and then select the cell colour or font colour depending on your conditional formats.
3 – Excel will filter out the unique records and only display the duplicates.
4 – Now delete the duplicate rows. Tip: to do this quickly hold down your CTRL key and click each of the duplicate row numbers. Once they are selected from the Home tab click Delete, Delete Sheet Rows. All selected rows will be deleted.
5 – You will have now removed the duplicates and will only have the unique records remaining. All formatting should now be removed as you no longer have duplicate values.
Using the ‘Remove Duplicates’ button you can have Excel quickly strip out any rows of data where duplicates are found.
1 – Select any cell in the data list and then from the Data tab select Remove Duplicates.
2 – Select one or more columns that contain duplicate values.
3 – Click OK. Excel will delete any rows where duplicate values are found in the selected columns. Please be aware that Excel will indicate how many records it has found with duplicates. However you aren’t shown which these are prior to their removal. If you are wanting to see them prior to removal apply conditional formatting first.
There is no need to spend large amounts of time locating duplicates. Using the steps above you should be able to identify and deal with duplicates swiftly.
We cover these and other fantastic functions and tips in our Excel Data Cleaning and Reporting course. If you are struggling with getting your data into a format that is useful or wanting to become a whiz at Pivot Tables, this course is definitely for you.
Sharyn is an expert trainer. She became the first certified Microsoft® MOUS Authorised Instructor in New Zealand.
She is endorsed by Microsoft® as a qualified Microsoft® Office Specialist and has more than 20 years of experience in the training industry, developing and delivering technology training workshops. Her approach to taking the “techie-speak” out of technology training has placed her as a preferred supplier to many of New Zealand’s leading organisations.
If you enjoyed this you may also enjoy...
Managing, analysing and referencing lists of data becomes easier when you know how to use Excel Tables. There are so many advantages to using Tables which I will cover within this post. Once you see how helpful they are I’m pretty sure you will be using Tables a lot. When you insert a Table over a list
Learn how to change a Pivot Table back to the classic display of version 2003 and earlier. If you were familiar with Pivot tables in Excel versions prior to Excel 2007 you will be very used to the classic display of working with the blue lines of the Pivot Table and the ability to drag and
If you want to save yourself a huge amount of time when reconciling data, learn how to use VLOOKUP to compare two lists or columns of data. Years ago the Training and Development manager of a large organisation invited me to spend some time in their Accounts department in order to observe how the Accounts team
There are times when you want to return a match for a VLOOKUP based on more than one value. For example, we may have customers who have multiple branches. Each branch is recorded under the same customer number. So how do we look up info for a specific branch if all of the branches are
To group dates by month in a Pivot Table is easy when you know how. Unfortunately, displaying date values in a Pivot Table can produce unexpected results. For example, in the Pivot Table below I have added a Date field to the Column area of my Pivot Table. Every date in the underlying data table is
PROBLEM: having to retype text when your data is in UPPERCASE, lowercase or a Mix oF bOTH. SOLUTION: quickly change the text back to the proper case without having to retype it.