Finding Duplicate Rows Based On Values In Multiple Columns?
Mar 28, 2014
I have a data set which has 6 columns (and lots of rows). Every row is different but I want to aggregate them based on 4 fields and then find the average of the numerical column for the results. I basically want to Group based on 4 fields and find the average of the 5th field.
My initial approach was to add a column in the Excel file that combines the 4 fields I want to group by (=A2&B2&C2&D2) and then find duplicates of that. I have a solution for this in VBA, but this method is very slow when importing new data sets, so I want to be able to do the whole thing in VBA.
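The grouping logic described above can be sketched outside Excel as follows. This is only an illustration in Python, not the poster's VBA: the column positions, the function name `average_by_key`, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict

def average_by_key(rows, key_cols=(0, 1, 2, 3), value_col=4):
    """Group rows by the values in key_cols and average value_col per group."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for row in rows:
        key = tuple(row[i] for i in key_cols)
        totals[key][0] += row[value_col]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical sample data: four grouping fields, one numeric field, one extra field.
rows = [
    ("North", "A", "x", 2014, 10.0, "note1"),
    ("North", "A", "x", 2014, 20.0, "note2"),
    ("South", "B", "y", 2014, 5.0, "note3"),
]
averages = average_by_key(rows)
```

Keeping the four fields as a tuple rather than joining them into one string avoids accidental collisions: with plain concatenation, "ab" followed by "c" produces the same key as "a" followed by "bc".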
I am not sure if Excel is able to do this but basically I am looking to find out which rows have some duplicate values. I have just read this back and it doesn't make a great deal of sense so I have attached an example spreadsheet.
Basically I am looking to find if E1:G1 duplicates further on down the list, hope this makes a bit more sense with the example attached.
For my job I have to take hundreds of codes and compare them to other codes. For example, in column A I'll have 453 codes, in column B I'll have 352, and in column C, 97. I want to find the codes common to all three columns. Sometimes I'll have just two columns and sometimes more. I have tried a few formulas but nothing works that well. Any formulas or macros?
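As a rough sketch of the comparison being asked for, finding the codes common to any number of columns is a set intersection. The Python below is illustrative only; `common_codes` and the sample codes are made up for the example.

```python
def common_codes(*columns):
    """Return the codes that appear in every column, sorted."""
    if not columns:
        return []
    shared = set(columns[0])
    for column in columns[1:]:
        shared &= set(column)  # keep only codes also present in this column
    return sorted(shared)

# Hypothetical codes standing in for columns A, B and C.
col_a = ["X10", "X11", "X12", "X13"]
col_b = ["X11", "X12", "X99"]
col_c = ["X12", "X11", "X50"]
result = common_codes(col_a, col_b, col_c)
```

Because the function accepts any number of columns, the same call covers the two-column and the multi-column case.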
I'm looking for a Conditional Formatting formula that will check two columns before highlighting the duplicate rows. I need it to be conditional formatting because I know nothing about writing macros or VBA (whatever that is). Data is entered into Columns A, B, and C. I need to check both column A and column C before it highlights the duplicates, based on those two columns. (The built-in "Format only unique or duplicate values" rule checks only one column.) I have attached an example, but it is just an example, as I have hundreds of lines to go through in the original. (In this example, Row 2 and Row 7 are the duplicates I need highlighted.)
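The test a conditional-formatting rule would need here is "does this row's (A, C) pair occur more than once?". In Excel that is the kind of check a COUNTIFS-based rule such as =COUNTIFS($A:$A,$A1,$C:$C,$C1)>1 could express, assuming the data sits in columns A to C as described. The same logic, sketched in Python with made-up data and a hypothetical helper name:

```python
from collections import Counter

def duplicate_flags(rows, key_cols=(0, 2)):
    """Return one True/False per row: True when its key columns repeat elsewhere."""
    keys = [tuple(row[i] for i in key_cols) for row in rows]
    counts = Counter(keys)
    return [counts[key] > 1 for key in keys]

# Hypothetical rows (columns A, B, C); only A and C are compared.
rows = [
    ("Smith", "x", "2014-01"),
    ("Jones", "y", "2014-02"),
    ("Brown", "z", "2014-03"),
    ("Smith", "q", "2014-01"),
]
flags = duplicate_flags(rows)
```

Column B is ignored on purpose, so the first and last rows are flagged even though their B values differ.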
I have a Master sheet where I collect info from sub sheets. All sheets are similarly formatted, i.e. product numbers in column A and headers in row 2. I need to sum values from all sheets based on product number and header. The Master sheet includes all product numbers and some extra headers; the sub sheets include only the needed numbers. Headers on the sub sheets are identical.
Currently I have this solved with the following formula:
The problem with this is that the column is hard coded, so I have to know that the value I am looking for is in column L. That wouldn't be a show-stopping problem on its own, but I have columns all the way to DR, and copying the formula takes a lot of time when I have to manually update each column. Just copying the cell keeps L:L and doesn't change it.
So, in addition to getting values for a specific product number, I need to get values from a specific column based on the column header.
I have a table with column headings of product ID Numbers (eg.1111) and row headings of Store number (Eg.1) with data showing the time each product was last sold at that store, I need something to consolidate for each store which Product ID's were sold prior to 5pm and what time they were sold.
E.g.
Store 1    1111 16:40    2222 13:00
Store 2    1111 15:05    3333 16:50
In the example above I am trying to look up a value from columns C-E. I need to be able to search/index using 2 criteria to figure out which row to match with the given column. For example, if I want to know the invoice qty. for R&D for Jan-2012, the returned value would be 13. I have tried several different combinations of MATCH and INDEX to get this to work but have had no success. Ultimately, what I want is a drop-down for the month and year that our VP can select from, which will give him the values for that month.
I need to remove all rows where COL A value and COL B value are the same. COL C does not need to be considered. However I need to retain one of the Col C values for purposes of formatting.
The end result should look similar to columns F,G and H!
How do I delete duplicate rows in a sheet using a macro? When I say duplicate row, it is not based on a particular column but on all the columns, so it is a true duplicate record.
I have an Excel worksheet which has duplicate values in multiple columns. I want to remove those duplicates and return only the unique values. How can I do that? My Excel sheet looks like the one below....
I have a number of rows that I want to have duplicated X number of times (and altered) where X is found by looking at certain cells within each row.
There are four numbers in each row, and I want to split them up into multiple rows each with three zeros and one one.
I would like to convert data from this:
Name W X Y Z
John 1 0 0 0
Doug 0 0 1 0
Karl 3 0 1 0
Mike 0 1 1 2
etc.
...to this:
Name W X Y Z
John 1 0 0 0
Doug 0 0 1 0
Karl 1 0 0 0
Karl 1 0 0 0
Karl 1 0 0 0
Karl 0 0 1 0
Mike 0 1 0 0
Mike 0 0 1 0
Mike 0 0 0 1
Mike 0 0 0 1
etc.
You can see that the W, X, Y, and Z columns from the four new Mike rows sum to equal the values in the original Mike row (0, 1, 1, 2), but everything has been split so that each row just has a single one in it and three zeros.
Does anyone have an idea of how to do this? Thanks
I thought of another way of putting it that may be easier to understand.
Given an input row of "George, 4, 7, 3, 2", I would like the output to contain 4 rows of "George, 1, 0, 0, 0", 7 rows of "George, 0, 1, 0, 0", 3 rows of "George, 0, 0, 1, 0", and 2 rows of "George, 0, 0, 0, 1".
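The row-splitting rule described above (each count becomes that many rows with a single 1 in the matching column) can be sketched like this. It is a Python illustration of the logic, not an Excel macro, and `expand_row` is a name invented for the example.

```python
def expand_row(name, counts):
    """Turn one row of counts into rows that each hold a single 1."""
    expanded = []
    for position, count in enumerate(counts):
        one_hot = [0] * len(counts)
        one_hot[position] = 1          # a single 1 in the column being expanded
        for _ in range(count):
            expanded.append([name] + one_hot)
    return expanded

mike_rows = expand_row("Mike", [0, 1, 1, 2])
george_rows = expand_row("George", [4, 7, 3, 2])
```

For the "Mike" row this yields the four rows shown in the desired output above, and the "George" row expands to 4 + 7 + 3 + 2 = 16 rows.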
I have a sheet with 45,000 rows. Let's say each row has 4 columns: Create_timestamp, Update_timestamp, email_address, and o_flag
Many rows have duplicate email addresses. I would like to remove all the duplicate rows, EXCEPT for the row with the most recent Update_timestamp.
And actually, if I could just "hide" all those rows, that would be even better, but I'd be happy just figuring out how to delete all the "old" rows, so that I have a list of unique email addresses with their create/update timestamps and o_flag column. It seems like such a basic use case for "Remove Duplicates".
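One way to think about "keep only the newest row per email address" is a single pass that remembers the best row seen so far for each address. The Python sketch below assumes the four-column layout from the question; the sample rows, the function name, and the choice to compare addresses case-insensitively are assumptions for illustration.

```python
from datetime import datetime

def latest_per_email(rows):
    """Keep, for each email address, only the row with the newest update timestamp."""
    latest = {}
    for row in rows:
        update_ts, email = row[1], row[2]
        key = email.strip().lower()  # assumption: addresses match case-insensitively
        if key not in latest or update_ts > latest[key][1]:
            latest[key] = row
    return list(latest.values())

# Hypothetical rows: (Create_timestamp, Update_timestamp, email_address, o_flag)
rows = [
    (datetime(2013, 1, 5), datetime(2013, 6, 1), "ann@example.com", "Y"),
    (datetime(2013, 1, 5), datetime(2014, 2, 9), "ann@example.com", "N"),
    (datetime(2012, 3, 3), datetime(2012, 3, 4), "bob@example.com", "Y"),
    (datetime(2013, 1, 5), datetime(2013, 9, 9), "Ann@example.com", "Y"),
]
unique_rows = latest_per_email(rows)
```

A comparable effect in Excel is often achieved by sorting on Update_timestamp in descending order before removing duplicates on the email column, since the first occurrence is the one that is kept.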
I am trying to find the top two values per group based on multiple criteria. The list I'm working with is not sorted and would be better for it to not have to be sorted as on-the-fly sorts will likely often occur from the raw data and I wouldn't want that to mess up the results I'm looking for here.
As an Example, here's what I'm trying to do:
Make Model Rating
Ford Bronco 64
Chevy Corvette 94
Dodge Intrepid 83
Chevy Chevette 34
Dodge Viper 72
Ford Escape 21
Ford Expedition 53
Chevy Impala 67
Ford Fairmont 11
Dodge Dart 33
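Using the sample list above, the "top two per group without sorting the source" idea can be sketched as below. This is a Python illustration rather than a worksheet formula, and `top_n_per_group` is a hypothetical helper name.

```python
from collections import defaultdict
import heapq

cars = [
    ("Ford", "Bronco", 64), ("Chevy", "Corvette", 94), ("Dodge", "Intrepid", 83),
    ("Chevy", "Chevette", 34), ("Dodge", "Viper", 72), ("Ford", "Escape", 21),
    ("Ford", "Expedition", 53), ("Chevy", "Impala", 67), ("Ford", "Fairmont", 11),
    ("Dodge", "Dart", 33),
]

def top_n_per_group(rows, n=2):
    """Return the n highest-rated (rating, model) pairs for each make."""
    groups = defaultdict(list)
    for make, model, rating in rows:
        groups[make].append((rating, model))
    # nlargest picks the top entries without reordering the input list
    return {make: heapq.nlargest(n, items) for make, items in groups.items()}

top_two = top_n_per_group(cars)
```

The input list is never sorted, so re-sorting the raw data later does not change the result.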
I have a spreadsheet that lists employees and their certifications. If an employee has multiple, then they will show up on as many rows as they have certifications.
The macro I have merges them into one row with a line break, but only the first column's unique value has been merged while the other columns containing their own unique values are duplicated when I want them to show up only once. Example: Jane Doe shows up 2 times on the report. Her name should only show up once on the row, not 2 times with a line break.
Here is the code. I have also attached an example of what I need. Because the attachment is a simpler version of the actual report, is it possible to specify which rows have the unique values and which ones don't?
I have an Excel file which contains duplicate rows. The duplicate rows can be identified based on a few cell/column values. I need a macro to delete the duplicate rows when the condition below is satisfied. Let us consider row 5 and row 6:
If column 7,12,13,16,17,18,19,23,24,27,28,29,30 in row 5 = row 6 then row 6 has to be deleted. This condition has to be followed for all other rows in the excel used range. Have attached the sample workbook.
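The rule above (delete a row when it matches the row before it on a fixed set of columns) can be sketched as follows. This is a Python outline of the comparison, not the requested macro; the column numbers come from the question, while the function name and test rows are invented.

```python
# 1-based column numbers taken from the question
KEY_COLUMNS = (7, 12, 13, 16, 17, 18, 19, 23, 24, 27, 28, 29, 30)

def drop_adjacent_duplicates(rows, key_columns=KEY_COLUMNS):
    """Drop each row whose key columns equal those of the row directly above it."""
    kept = []
    previous_key = None
    for row in rows:
        key = tuple(row[c - 1] for c in key_columns)  # convert to 0-based index
        if kept and key == previous_key:
            continue  # same key as the row above: treat as duplicate
        kept.append(row)
        previous_key = key
    return kept

# Hypothetical 30-column rows
base = ["v%d" % i for i in range(1, 31)]
r1 = list(base)
r2 = list(base); r2[0] = "different"   # column 1 is not a key column
r3 = list(base); r3[6] = "changed"     # column 7 is a key column
result = drop_adjacent_duplicates([r1, r2, r3])
```

Here the second row is dropped because it differs from the first only in a column that is not compared, while the third row is kept. In a worksheet macro the equivalent loop is usually run from the bottom row upward so that deleting a row does not shift the rows still to be checked.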
This formula allows me to find the lowest value in column U where column N contains the text "NO".
{=MIN(IF($N$2:$N$10000="NO",$U$2:$U$10000))}
I want to add another condition so that the formula only returns the lowest value in column U where (i) column N contains the text "NO" and also (ii) column F contains the text "YES".
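In array formulas of this kind, a second condition is commonly added by multiplying the two tests, along the lines of =MIN(IF(($N$2:$N$10000="NO")*($F$2:$F$10000="YES"),$U$2:$U$10000)) entered as an array formula; treat that as a sketch to verify against the actual sheet. The underlying filter-then-minimum logic looks like this in Python, with made-up rows and a hypothetical function name:

```python
def min_where(rows, n_value="NO", f_value="YES"):
    """Lowest U value among rows where column N is n_value and column F is f_value."""
    matches = [
        u for n, f, u in rows
        # upper() mimics Excel's case-insensitive text comparison
        if str(n).upper() == n_value and str(f).upper() == f_value
    ]
    return min(matches) if matches else None

# Hypothetical (N, F, U) triples
rows = [
    ("NO", "YES", 12.5),
    ("NO", "NO", 3.0),
    ("YES", "YES", 1.0),
    ("no", "yes", 9.0),
    ("NO", "YES", 20.0),
]
lowest = min_where(rows)
```

Only rows that satisfy both conditions reach the minimum, so the 3.0 and 1.0 values are ignored.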
I am trying to pick out certain bits of information from the below "example" set of data:
A 1
A 1
B 1
C 1
[Code] .......
My aim is to record the letters that are recorded against both numbers (note: in my data there are more than 2 sets of numbers). For the example above the solution would be:
A 1,2
B 1,2
...because these two letters appear against both 1 and 2.
There are some letters that are duplicated against the same number which is making it hard for me to work out. I don't care if the same letter appears against the same number, I just would like to know instances when a letter appears with a different number, and if possible what that number is.
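A sketch of the check being described: collect the distinct numbers seen for each letter, then keep only the letters with more than one. Since the sample data in the post is truncated, the rows after the first four below are invented for illustration, as is the function name.

```python
from collections import defaultdict

def letters_with_multiple_numbers(pairs):
    """Map each letter to its distinct numbers, keeping letters seen with 2+ numbers."""
    seen = defaultdict(set)
    for letter, number in pairs:
        seen[letter].add(number)  # a set ignores repeats of the same pair
    return {letter: sorted(nums) for letter, nums in seen.items() if len(nums) > 1}

# First four pairs follow the post; the rest are hypothetical.
pairs = [("A", 1), ("A", 1), ("B", 1), ("C", 1), ("A", 2), ("B", 2), ("D", 2)]
result = letters_with_multiple_numbers(pairs)
```

Using a set per letter means a letter repeated against the same number is counted once, which is the behaviour asked for.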
My Excel program (Excel 2010) currently has several columns, and each column looks for and pulls data from a specific file on my computer. Then I need to delete any duplicate data entries, count the number of unique entries and track the changes through a chart. I have everything done except I cannot figure out (or find on the internet) a way to search in multiple columns (more than 2) and delete just the duplicate cells. I want to delete the cells in such a way that one is left. For example, if the code 12gf is duplicated three times, I want to be left with one 12gf (it doesn't matter which column the remaining one is in). Additionally, the column lengths change and the columns are not sorted. I have attempted to attach an image of an example file below.
I'm new to VBA and macros, using Excel 2010, and am trying to figure out how to delete all duplicate rows in a sheet where 2 or less of their values in column A is "1". I'd like to have a script that is flexible enough to change to 3 or less if need be. I also have a header row that needs to be offset in the process.
A---B-
0--123 <-delete
0--123 <-delete
0--123 <-delete
1--123 <-delete based on this the value of column A
0--123 <-delete
0--123 <-delete
1--321
1--321
1--321
1--321
1--321
or
A---B-
0--123 <-delete
0--123 <-delete
1--123 <-delete
1--123 <-delete based on this the value of column A
0--123 <-delete
0--123 <-delete
1--321
1--321
1--321
1--321
1--321
I have some VB code, courtesy of OzGrid and Davc4, that works well to delete duplicate rows based on criteria in Column A of the active worksheet (albeit a bit slow on large files).
How do I modify the code below to evaluate duplicate data in Columns A through D? .....
I have a sheet (see Sheet 1) from a report we run which lists the following information: Personnel Number, Amount, Wage Type. This is generated for 1000's of employees, with each personnel number being repeated several times in column A.
I am trying to pull specific data to another sheet (see Sheet 2), which would ideally generate the sum of "Amount" for a specific wage type for each personnel number. The issue is that there may be duplicates of the wage type for each ID number (which is also repeated).
For example, the total salary amount on sheet 2 for ID#12345678 would be 0, while for #9876543 it would be 1250. Is there a formula I could use on sheet 2 column B that would generate this?
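This is a conditional sum keyed on two fields. In Excel it is the kind of task SUMIFS is meant for, e.g. something like =SUMIFS(Sheet1!$B:$B,Sheet1!$A:$A,$A2,Sheet1!$C:$C,"Salary"), assuming Personnel Number, Amount and Wage Type sit in columns A, B and C of Sheet 1 and the wage type is literally named "Salary" (neither is confirmed by the post). The same logic sketched in Python, with invented rows chosen to reproduce the two totals mentioned above:

```python
from collections import defaultdict

def sum_by_id_and_type(rows):
    """Total the amounts for each (personnel number, wage type) pair."""
    totals = defaultdict(float)
    for personnel_number, amount, wage_type in rows:
        totals[(personnel_number, wage_type)] += amount
    return dict(totals)

# Hypothetical report rows: (Personnel Number, Amount, Wage Type)
rows = [
    ("9876543", 1000.0, "Salary"),
    ("9876543", 250.0, "Salary"),
    ("12345678", 500.0, "Salary"),
    ("12345678", -500.0, "Salary"),
    ("12345678", 75.0, "Bonus"),
]
totals = sum_by_id_and_type(rows)
```

Repeated wage-type lines for the same ID are simply added together, so duplicates in the report do not need to be removed first.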
I have multiple columns / rows of data, some of which are duplicates.
Column S is a concat of columns A:R where this data is stored, and is sorted alphabetically.
I'm looking for a way using VBA to find duplicate concat rows by cycling through this list that is already sorted. I'm interested in moving down this list, 1 by 1, and if current cell = cell above, delete the data in columns A:P of that row, then delete the cell data in column R of the cell above the current cell.
So for example, if I have sorted data in S8:S14, and S9 = S8, then I would like to delete A9:P9, then delete the data in R8.
I deal with leads for a sales room and get sent over leads in bulk, I've created a master scrub list that I can attach to the end of a new lead file and sort by number to show which are duplicates.
When you do Data, Filter, Advanced Filter and select Unique Records, it hides the duplicate, but what I need is for not only the duplicate to be hidden or gone but also the row that it is a duplicate of, i.e. I need BOTH rows to go.
Name-----number
Dave 555-1212
Dave 555-1212
John 536-2343
Smith 423-2312
needs to become
Name-----number
John 536-2343
Smith 423-2312
I would need a formula that figures out that Dave with number 555-1212 is a duplicate and deletes BOTH rows.
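Removing both copies of a duplicate amounts to keeping only the rows whose name/number pair occurs exactly once. A minimal Python sketch of that rule, using the sample list above (the function name is made up):

```python
from collections import Counter

def drop_all_duplicated(rows):
    """Keep only rows that occur exactly once; every copy of a repeated row is removed."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] == 1]

leads = [
    ("Dave", "555-1212"),
    ("Dave", "555-1212"),
    ("John", "536-2343"),
    ("Smith", "423-2312"),
]
cleaned = drop_all_duplicated(leads)
```

This differs from the usual "remove duplicates" behaviour, which would leave one Dave row behind.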
I have two lists and wish to compare them to identify duplicate values. I have used Duplicate Values in Conditional formatting but cannot find a way of ensuring an exact match. For example one list has the value 4150 and the other list has other values like 5641509 and 341508, both of which contain the string 4150 but are clearly not the same value. However, the conditional formatting is picking these up as duplicate values.
Magazine subscription list. How can we highlight customers that are already in the sheet if we enter them again (renewal)? Our list is like so....
ColA   ColB  ColC     ColD  ColE   ColF
First  Last  123 Ave  City  State  Zip
Is there a way to highlight the row if the info in ColA, ColB, ColE, and ColF all match? Sometimes the street info is abbreviated or entered as PO Box instead of P.O. Box, and they wind up on the list a second time.