Excel 2010 :: Delete Row With Duplicate Data In 2 Columns?
Apr 21, 2014
Basically, I have a sheet and I would like to delete the entire row if the data in column G is the same as that in column H. The data is text, if that matters. I've tried to figure out the VBA code for it, but my knowledge is severely limited. The spreadsheet is Excel 2010.
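A minimal sketch of the logic in Python, assuming the sheet has already been read into a list of rows. The function name and the sample rows are hypothetical, not from the post. It keeps only the rows whose text in column G differs from column H.

```python
G, H = 6, 7  # zero-based positions of columns G and H

def drop_rows_where_g_equals_h(rows):
    """Return only the rows whose column G text differs from column H."""
    return [row for row in rows if row[G] != row[H]]

# Hypothetical sample with eight columns (A..H); only G and H matter here.
rows = [
    ["r1", "", "", "", "", "", "apple", "apple"],  # G == H, dropped
    ["r2", "", "", "", "", "", "apple", "pear"],   # G != H, kept
    ["r3", "", "", "", "", "", "kiwi", "kiwi"],    # G == H, dropped
]
kept = drop_rows_where_g_equals_h(rows)
```

If this is ported to a VBA loop that deletes rows in place, looping from the last row up to the first is the usual way to avoid skipping rows as they shift.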
My Excel program (Excel 2010) currently has several columns and each column looks for and pulls data from a specific file on my computer. Then I need to delete any duplicate data entries, count the number of unique entries and track the changes through a chart. I have everything done except I cannot figure out (or find on the internet) a way to search in multiple columns (more than 2) and delete just the duplicate cells. I want to delete the cells in a way where there is one left. For example, if the code 12gf is duplicated three times, I want to be left with one 12gf (it doesn't matter what column the original one is left in). Additionally, column length changes and the columns are not sorted. I have attempted to attach an image of an example file below.
I'm new to VBA and macros, using Excel 2010, and am trying to figure out how to delete all duplicate rows in a sheet where 2 or fewer of their values in column A are "1". I'd like to have a script that is flexible enough to change to 3 or fewer if need be. I also have a header row that needs to be offset in the process.
A---B-
0--123 <-delete
0--123 <-delete
0--123 <-delete
1--123 <-delete based on this the value of column A
0--123 <-delete
0--123 <-delete
1--321
1--321
1--321
1--321
1--321
or
A---B-
0--123 <-delete
0--123 <-delete
1--123 <-delete
1--123 <-delete based on this the value of column A
0--123 <-delete
0--123 <-delete
1--321
1--321
1--321
1--321
1--321
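One way to read the two examples above: rows are grouped by the value in column B, and a whole group is deleted when it contains 2 or fewer "1" values in column A. A minimal Python sketch of that reading, with the header row assumed to be stripped off beforehand. The function name and the `max_ones` parameter are hypothetical.

```python
from collections import Counter

def drop_groups_with_few_ones(rows, max_ones=2):
    """rows: list of (a, b) pairs. Drop every row whose column-B group
    has max_ones or fewer rows with a == 1."""
    ones_per_group = Counter()
    for a, b in rows:
        ones_per_group[b] += 1 if a == 1 else 0
    return [(a, b) for a, b in rows if ones_per_group[b] > max_ones]

# First example from the post: group 123 has a single "1", group 321 has five.
rows = [(0, 123), (0, 123), (0, 123), (1, 123), (0, 123), (0, 123)] + [(1, 321)] * 5
kept = drop_groups_with_few_ones(rows)
```

Changing the threshold to "3 or fewer" is then just `drop_groups_with_few_ones(rows, max_ones=3)`.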
I am trying to come up with VBA code; the sheet is attached. I have some columns that have headers but whose rows are empty. The VBA should delete all of these columns entirely and leave those that have headers and have data in the rows.
Excel 2010
A B C D E F G H I J K L M N O P Q R S T U V W
1 AccountUnitFund CodeDepartment ActivityAnalysisTypecodedeskitemBegin DateQuantityUnit of MeasureAmount CurrencyJob CodeEntry EventParent Budget Entry TypeOptionsLine CodeFunding SourceFacilities and AdministrationCost Sharing
2
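The column cleanup described above can be sketched in Python as follows, assuming the sheet is a rectangular list of rows with the header in the first row. The function name and the generic headers are hypothetical. A column is kept only if at least one cell below the header is non-blank.

```python
def drop_empty_columns(table):
    """Keep only the columns that have at least one non-blank data cell."""
    header, data = table[0], table[1:]
    keep = [
        i for i in range(len(header))
        if any(row[i] is not None and str(row[i]).strip() != "" for row in data)
    ]
    return [[row[i] for i in keep] for row in table]

# Hypothetical sample: the second and third columns have headers but no data.
table = [
    ["H1", "H2", "H3"],
    [100, None, ""],
    [200, None, " "],
]
cleaned = drop_empty_columns(table)
```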
I have a sheet (see Sheet 1) from a report we run which lists the following information: Personnel Number, Amount, Wage Type. This is generated for 1000's of employees, with each personnel number being repeated several times in column A.
I am trying to pull specific data to another sheet (see Sheet 2), which would ideally generate the sum of "Amount" for a specific wage type for each personnel number. The issue is that there may be duplicates of the wage type for each ID number (which is also repeated).
For example, the total salary amount on sheet 2 for ID#12345678 would be 0, while for #9876543 it would be 1250. Is there a formula I could use on sheet 2 column B that would generate this?
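In the worksheet itself, SUMIFS (available in Excel 2010) is the usual tool for a sum with two conditions like this. The underlying logic is sketched below in Python. The function name and the sample records are invented for illustration, chosen only so that they reproduce the two totals quoted above.

```python
from collections import defaultdict

def sum_amounts(records):
    """records: (personnel_no, amount, wage_type) tuples.
    Returns the total amount per (personnel_no, wage_type) pair."""
    totals = defaultdict(float)
    for personnel_no, amount, wage_type in records:
        totals[(personnel_no, wage_type)] += amount
    return dict(totals)

# Invented records: repeated IDs and repeated wage types per ID.
records = [
    ("12345678", 500.0, "Salary"),
    ("12345678", -500.0, "Salary"),
    ("12345678", 75.0, "Bonus"),
    ("9876543", 1000.0, "Salary"),
    ("9876543", 250.0, "Salary"),
]
totals = sum_amounts(records)
```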
I'm trying to create a macro that will look at each worksheet in a workbook and then delete the last line of data on each worksheet. The last row can vary on each worksheet. This is what I have come up with but it is not working. I am on Excel 2010 and Windows 7.
I have 2 worksheets. One has Employees and the devices they have: Last name, First Name, Device, each in its own column. Many have more than 1 device, so they have multiple entries on separate rows.
Another worksheet has Employees and their location: Last name, First Name, Location. Again, all in separate columns.
It would look something like this
Sheet1 Lastname Firstname Device Johnson
[Code]...
So I'm tasked with combining them into 1 sheet with last name, first name, device and location. The issues I'm having are:
1) A team member could have multiple devices
2) A last and/or first name can appear many times, so a simple VLOOKUP against lastname won't work - it has to somehow also compare against both.
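Both issues go away if the lookup key is the pair (last name, first name) rather than the last name alone. A minimal Python sketch, assuming each person has a single location. The function name and the sample names are hypothetical.

```python
def combine(devices, locations):
    """devices: (last, first, device) rows; locations: (last, first, location) rows.
    Returns (last, first, device, location), one row per device."""
    location_by_name = {(last, first): loc for last, first, loc in locations}
    return [
        (last, first, device, location_by_name.get((last, first), ""))
        for last, first, device in devices
    ]

# Hypothetical sample: two people share a last name, one has two devices.
devices = [
    ("Johnson", "Amy", "Laptop"),
    ("Johnson", "Amy", "Phone"),
    ("Johnson", "Bob", "Tablet"),
]
locations = [("Johnson", "Amy", "Denver"), ("Johnson", "Bob", "Austin")]
combined = combine(devices, locations)
```

In a worksheet the same idea is often done with a helper column that concatenates last and first name on both sheets, so that an ordinary lookup can run against the combined key.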
We utilize large data sheets that can be as large as 300K in rows and 10 to 15 columns wide. Because of how we receive the data, we are forced to manipulate things so that all matching data for a record ends up on a single row. (e.g. Record#, Document Type, Husband Name, Wife Name, Wife Maiden Name, Etc.)
Right now here's how the data is received:
a a a b b c c c c d d d e e e e
Using two vba scripts, we first separate the data with row spaces between the unique data as follows:
a a a
b b
c c c c
Then with another script, we transpose the data as follows:
a a a b b c c c c d d d e e e e
When we transpose the data, the end result starts at the top of the page and goes down, eliminating the original blank rows. Not a huge issue, but I would like to be able to maintain the original data format of the rows so that the data matches the original sheet line for line. The end result would give me the data as follows...
a a a b b c c c c d d d e e e e e
where the vertical gaps between the letters match the original rows. Like I said, not a huge issue since we can rejoin the transposed data to the original data fairly easily. But it would be nice if we could end up with the above format for speed's sake.
The two scripts we use, one-to insert the rows and two-to transpose, take a very long time to run with the transpose script taking the longest by far. On a 30K row sheet, it will take on our systems around 30 minutes to transpose and about 15 minutes to insert rows. Because we have several columns that need to be transposed, a 30K row sheet will take at least 2 hours to complete. A 300K row sheet, that will take 10 to 15 hours to complete.
Is there any way to speed up the scripts, either by upgrading to a faster CPU and/or by writing the scripts to perform faster?
My preferred solution would be to write (have) a formula to perform the transposition that gives me the results as noted above, since formulas run so much faster than VBA. Is this possible? I have tried all kinds of formulas and cannot come close, and of course the straight transpose function does not give me the solution I need as noted above.
I have enclosed an Excel 2010 spreadsheet with 10K rows of data along with the scripts I use (nothing sensitive here). The tabs at the bottom show you the data before I transpose, then the data after it has been transposed. To speed up the scripts, I have stripped away all the rest of the data from the original sheet except just what I need to transpose at one time. Once that is completed, we then rejoin the transposed data with the original sheet. The six-digit number you see to the far left of the data is the record ID number from the original data. We use this to rejoin the transposed data with the original data so that we know everything is back where it should be. (Note: The insert rows script is run on the original data and not the data you see on the enclosed spreadsheet. That is the only way we can generate unique rows with matching ID numbers. We arrive at this by taking the original data, concatenating the record ID with the column we want to transpose and adding a # between the two so that we can break things back apart after the transposition using the Text to Columns function with the # as the separator.)
The sheet I have attached is in the 2010 Macro Enabled format...(xlsm format). We use the xlsb (binary) format for the data to reduce the file size as our normal procedure and run the macros from inside that format. Changing from the xlsx to xlsb format did seem to speed up the scripts a bit and greatly improved the file performance as a whole e.g. saving and loading.
One thing I have done to speed up the scripts is to strip all the data away that is not needed for the transposition. That did work but only a marginal amount.
We are using Windows 8.0 with 4 GB of memory and your basic processor speed, e.g. nothing fancy, just your basic stock computer. Nothing else unusual is installed or running on the computer at the time the scripts are running.
For those of you that process large sheets, how much of a performance upgrade will we see in processing our scripts by either upgrading memory to 8 GB (or more, or much more) and/or getting a faster processor? Or have we reached the maximum script speed already? Or is this a limit of Excel?
One other issue to note: As I stated above, on the 30K row sheets, not a super problem with about 2 hours needed to run the scripts on all the data on the sheet. But on the 300K row sheets, it can take 12 or more hours to run and there are times when things 'lock up' running the scripts on sheets this size.
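The aligned layout asked for above (each group written across the row where it starts, with the remaining rows of the group left blank) can be produced in a single pass without inserting rows first. A minimal Python sketch, assuming rows of the same record are adjacent and that the record ID and the value have already been split apart. The function name is hypothetical and this is not the poster's script.

```python
def transpose_in_place(pairs):
    """pairs: (record_id, value) tuples in sheet order, same record adjacent.
    Returns one output row per input row: the first row of each record holds
    all of its values, the other rows of that record are empty."""
    out = []
    current_id = object()  # sentinel that never equals a real record ID
    for record_id, value in pairs:
        if record_id != current_id:
            current_id = record_id
            target = [value]      # start a new group on this row
            out.append(target)
        else:
            target.append(value)  # extend the group's first row
            out.append([])        # keep this row as a blank placeholder
    return out

pairs = [(1, "a"), (1, "a"), (1, "a"), (2, "b"), (2, "b")]
aligned = transpose_in_place(pairs)
```

The work grows linearly with the number of rows. If the existing VBA reads and writes cells one at a time, the same loop run over a Variant array that is read from the range once and written back once is generally far faster, and that usually matters more than extra memory or a faster CPU.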
I am using Excel 2010 and need a macro that can convert data from rows to columns. I have read several posts about this subject but have no experience with macros and don't know how to change the macros to fit my scenario.
There are up to 4 vehicles/locations per account number, and I need 1 account number per row (the dots above are for spacing only and not part of the actual data).
I could do this manually but because I have so many rows of data it could take days or weeks. Is there a macro out there that can do this??
I need something that will take data from columns in one spreadsheet and put it in different cells in a row. I know this could be done by recording a macro, but the number of columns will never be constant.
Below I attached examples of the Spreadsheet
Financials spreadsheet: I need the data in columns B to F put into their respective cells in a row in the Master spreadsheet, so we would have 5 rows.
ColA and ColB contain standard information which is supposed to be my reference. ColC contains my queries, for which I need the corresponding place in ColD.
So I need to match ColC with ColA, so as to retrieve the matched data (between ColC and ColA) from ColB to ColD. Following is the way I expect my result to be..
ColA ColB ColC ColD
niki delhi neha patna vinay mumbai hardik kerala kapil bangalore vinay mumbai neha patna pooja goa hardik kerala
I received an answer in that link
"=INDEX($B$2:$B$6,MATCH($C2,$A$2:$A$6,0))",
When I tried it a few months back, it worked. I am now using Excel 2010. I tried the same formula again, but this time it does not work for me. Is there something else I need to do that has changed in Excel 2010?
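For reference, INDEX with MATCH(..., 0) is an exact-match lookup, which the Python sketch below mimics on the name and city pairs from the example. The function name is hypothetical. The second call is only an illustration of one common reason an exact match fails (a stray trailing space in the lookup text), not a diagnosis of the problem above.

```python
def index_match(lookup_value, keys, values):
    """Return the value at the position of the first exact match, else '#N/A'."""
    try:
        return values[keys.index(lookup_value)]
    except ValueError:
        return "#N/A"

# Reference columns, as far as they can be read from the example.
col_a = ["niki", "vinay", "kapil", "neha", "hardik"]
col_b = ["delhi", "mumbai", "bangalore", "patna", "kerala"]

found = index_match("neha", col_a, col_b)
not_found = index_match("neha ", col_a, col_b)  # trailing space, no exact match
```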
I have a long list of data with many columns and I'd like all the information to be in one column without manually copying and pasting each column and adding to the first column. The data has different amounts of rows and columns as well. An Example is below. I'm using Excel 2010. Is there a formula or something for this? This isn't the data I'm using but just an example since I do this frequently.
and I have to manually reorganize it like this to import into Stata:
country year value
Benin 1991 20
Benin 1992 254
[code].....
Is there a way I can quickly design a macro to do this? The problem is that I generally have a list of about 60 countries, and years from 1991-2011. So it's really time-consuming copying the column of data corresponding to the year, pasting below, repasting the list of countries and the years... then again... then again... then again... I'm using Excel 2010.
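The reshaping described above (one column per year turned into country, year, value rows) is a nested loop over rows and year columns. A minimal Python sketch, assuming the first column holds the country and the header row holds the years. The function name is hypothetical, the Benin values come from the example, and the second country is invented.

```python
def wide_to_long(header, rows):
    """header: ['country', year1, year2, ...]; rows: [country, v1, v2, ...].
    Returns (country, year, value) tuples, one per cell."""
    years = header[1:]
    long_rows = []
    for row in rows:
        country = row[0]
        for year, value in zip(years, row[1:]):
            long_rows.append((country, year, value))
    return long_rows

header = ["country", 1991, 1992]
rows = [
    ["Benin", 20, 254],
    ["Togo", 31, 47],  # invented values for illustration
]
long_rows = wide_to_long(header, rows)
```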
I need to remove all rows where COL A value and COL B value are the same. COL C does not need to be considered. However I need to retain one of the Col C values for purposes of formatting.
The end result should look similar to columns F,G and H!
How do I delete duplicate rows in a sheet using a macro? When I say duplicate row, it is not based on a particular column but on all the columns, so it is a true duplicate record.
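Excel 2010 has a built-in Remove Duplicates command (exposed to VBA as Range.RemoveDuplicates) that can compare all columns. The logic it needs to apply is sketched below in Python: a row is dropped only when every cell matches an earlier row. The function name and the sample rows are hypothetical.

```python
def drop_duplicate_rows(rows):
    """Keep the first occurrence of each row; compare across all columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

rows = [
    ["A1", "x", 10],
    ["A1", "x", 10],  # true duplicate, dropped
    ["A1", "x", 11],  # differs in the last column, kept
]
unique = drop_duplicate_rows(rows)
```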
I'm having issues with Excel 2010's conditional formatting. It seems easy to use, but I'm trying to highlight values based on 2 columns of numerical data. Example:
Column F: 6 6 14
Column L: 3 NA 17
I would like the values in Column L that are greater than Column F highlighted in green. If they are less than Column F, highlight them in red.
Seems I was able to do this with Excel 2003, but I don't understand the 2010 version.
I am a victim of the Index-Match duplication problem in Excel (2010). Basically, I have three columns of data, all daily input for the year.
Column 1 = Date
Column 2 = Actual (Units Sold)
Column 3 = Scheduled (Units Sold)
The Date is filled out through the end of the year, as are the Scheduled values. The Actual values are filled out daily.
I need to generate a summary box that reports Actual, Scheduled, and Variance (Actual - Scheduled) for the time periods Daily, Month to Date, and YTD.
My problem is that when I try to return the Schedule value that corresponds with the date of the last entry, I don't know if I am pulling the correct Schedule value since I do not know if the Actual value (that is pulled from the last value in the Actual column) is unique. So I tried using an Index-Match formula to return the latest value (that is the last record occurrence of the value) to my function in order to retrieve the correct Schedule value, but, sadly, it did not work.
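The ambiguity described above comes from looking up the Actual value itself, which may not be unique. A sketch of the alternative, locating the last filled-in Actual by row position and reading the Scheduled value from that same row, is given below in Python. The function name and the sample data are hypothetical.

```python
def latest_entry(rows):
    """rows: (date, actual, scheduled) in date order; actual is None where
    nothing has been entered yet. Returns (date, actual, scheduled, variance)
    for the last row that has an Actual value, or None if there is none."""
    for date, actual, scheduled in reversed(rows):
        if actual is not None:
            return date, actual, scheduled, actual - scheduled
    return None

# Hypothetical sample: the latest Actual (5) also appears on an earlier date.
rows = [
    ("2014-01-01", 5, 6),
    ("2014-01-02", 7, 7),
    ("2014-01-03", 5, 8),
    ("2014-01-04", None, 9),
]
latest = latest_entry(rows)
```

Because the match is on position rather than on the value, the repeated 5 on the first date does not affect the result.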
I by no means am an Excel expert like many of you, so I may have some questions along the way.
I've attached a sample extraction from my worksheet and included an example of the Summary panel I'm creating.
I am trying to find duplicate numbers in sets but so far I can only highlight the ones that are in exact order. I need to find each set that has the same numbers, in any order. Example..
I will provide an example of sets of 3. But I get 3, 4 usually but sometimes 5 or 6.
I get them from different people.
Person A- 234, 569, 498, 849, 848,343,567,347 etc...
Person B- 432, 596, 677, 566, 565,433, 455 etc..
Now I need to find each set that has the same numbers, in any order. For example, 234 from A and 432 from B would be the same, so I would need to highlight both sets. But I cannot figure out how to do this. For Excel to highlight them, they have to be 234 and 234. It does not recognize the same numbers in a different order.
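One way to make 234 and 432 compare as equal is to sort the digits of each set and compare the sorted text. A minimal Python sketch using the numbers listed above for Person A and Person B. The function names are hypothetical.

```python
def digit_key(number):
    """Digits of the number in sorted order, e.g. 432 -> '234'."""
    return "".join(sorted(str(number)))

def matching_sets(a, b):
    """Return the entries of a and of b whose digits also occur,
    in any order, in the other list."""
    common = {digit_key(n) for n in a} & {digit_key(n) for n in b}
    return (
        [n for n in a if digit_key(n) in common],
        [n for n in b if digit_key(n) in common],
    )

person_a = [234, 569, 498, 849, 848, 343, 567, 347]
person_b = [432, 596, 677, 566, 565, 433, 455]
matches_a, matches_b = matching_sets(person_a, person_b)
```

The same key works for sets of 4, 5 or 6 digits, since it depends only on which digits are present and how often.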
I am looking for a macro to look in Sheet 1 column A and compare the values to Sheet 2 column O. When it finds a duplicate, I want it to delete the entire row in Sheet 1. I don't want to have to manually sort anything, if that's possible.
I have a report with about 7000 rows in it. I need a macro that will find all rows where column A and column B are the same as another row's column A and column B and delete both rows.
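For the 7000-row report, the "delete both rows" rule can be sketched in Python as a two-pass count: first count each (column A, column B) pair, then keep only the rows whose pair occurs exactly once. The function name and the sample rows are hypothetical.

```python
from collections import Counter

def drop_all_repeated(rows):
    """Remove every row whose (column A, column B) pair occurs more than once."""
    counts = Counter((row[0], row[1]) for row in rows)
    return [row for row in rows if counts[(row[0], row[1])] == 1]

rows = [
    ["1001", "INV", "first"],
    ["1002", "INV", "only one"],
    ["1001", "INV", "second"],   # same A and B as the first row
    ["1001", "CRD", "different B"],
]
remaining = drop_all_repeated(rows)
```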
I have 2 workbooks in Excel 2010, each contain just 1 sheet. (see attached) I need to compare on sheet 1, cell D1 and column A:A (this column will be much longer), with the data in columns C:C & A:A on sheet 2, if a corresponding match is found, the data contained in column D on the same row on sheet 2 is written to the cell with the matching data in sheet 1.
I'm a first timer here and I'm having a problem with data. I have 3 sheets in a workbook and I want to transfer all the data to one sheet. I have 3 columns labelled Number, Name, & Sales on each sheet. Some of the numbers and names are the same, and I want to be able to match them up and put the sales from each sheet into a new column, so the final sheet will have 5 columns in total. If the numbers and names don't match, I just want to add those to the bottom of the matched ones.
Let's say that I have an Excel file that contains columns for the account number, the name of the customer, and the payment made.
I want to extract any of the data that is duplicated. The script should treat a row as a duplicate only if the account number, the name of the person and the payment have all been duplicated. If, say, only the account number is duplicated, then it is not considered a duplicate. Refer to the screenshot below:
1 workbook, 2 worksheets (or tabs). On tab 1, I want a formula/alert that tells the user if any duplicate values exist in Column A of tab 2
Tab 2, Column A, has Unique ID's (6 digit numeric values)
The user manually inputs the ID's on new rows in Column A
Row 1 is reserved and in use for something else
Row 2 is my header, so cell A2 says "ID"
Row 3-623 currently contain unique ID's
When the user inputs a new ID into cell A624 and then returns to Tab 1, I want my formula/alert on Tab 1 to tell the user that they have duplicates in Column A of tab 2. I know about Conditional Formatting, but if the user copies in 100 new values, they won't necessarily see the highlighted cells. My tab 1 is my "checks and balances" and the last place the user is supposed to look to ensure that they haven't created any duplicate ID's. If the user sees a warning message that says duplicates exist, then I'll tell them that they need to look at column A (for cells that have been conditionally highlighted).
One issue that I'm running into with the conditional highlighting is that I want cells A3:A1048576 to already have the conditional formatting - this way when the user inserts a value into cell A624, then A625, etc., the conditional formatting is already there. Right now, with data in cells A3:A623, cells A624:A1048576 are all highlighted with the Red/Bold Red Font (which is okay I guess), but ideally it would be nice to not count 2+ empty cells as duplicates, and I'll have to have my formula on Tab 1 not include the blank cells.
I DO NOT want to use the Remove Duplicates feature of Excel 2010. If I remove them I could be removing data in columns B, C, D, etc that belong to the Unique ID. I just need the user to be told in Tab 1 that they DO have duplicates and I'll train the user how to research this and fix it.
The reason I want to look for duplicates in the entire Column A is because the list of Unique ID's will grow over time.
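The check wanted on Tab 1 comes down to counting each non-blank ID in the column and warning if any count exceeds one. A minimal Python sketch of that logic, with blanks skipped so that empty cells never count as duplicates of each other. The function names, the message text and the sample IDs are hypothetical.

```python
from collections import Counter

def duplicate_ids(column):
    """Return the IDs that occur more than once, ignoring blank cells."""
    counts = Counter(
        str(v).strip() for v in column
        if v is not None and str(v).strip() != ""
    )
    return sorted(k for k, c in counts.items() if c > 1)

def warning(column):
    """Message for the summary tab."""
    dups = duplicate_ids(column)
    return "Duplicates exist: " + ", ".join(dups) if dups else "No duplicates"

# Hypothetical column A of tab 2, including blank cells below the data.
column_a = [100001, 100002, "", None, 100001, ""]
message = warning(column_a)
```

Nothing is removed here, so the data in columns B, C, D and so on stays attached to its ID. The sketch only reports which IDs need attention.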
I have a new project that needs macro code. Your help is very much appreciated. We have a spreadsheet with duplicate accounts, meaning two or three rows with the same account but different information. We want to use only one row per account, move the new data from the other rows of the same account onto that row, to the right, and delete the duplicates. Can someone please help me with this? I read so many posts and tried some of them, but they only delete the duplicate row without copying the new data from that row onto the single row. Also, the other code I tried retained only the current or old data. To elaborate, I want to take the new data in each cell of the same account across multiple rows, move it to the right into one row only, and delete the duplicate rows for that account.
I found a lot of information on this but not what I need. I have 8 columns A - H. Column D has some duplicate numbers. I would like to find the duplicate numbers in column D (they are all one right after the other) and delete the entire row leaving only the first. I do not need to sum or anything, just delete the row with a duplicate number. If there are 2 or 3, I just end up with one.
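Since the duplicates in column D are always next to each other, it is enough to compare each row with the last row that was kept. A minimal Python sketch under that assumption. The function name and the sample rows are hypothetical.

```python
D = 3  # zero-based position of column D

def drop_adjacent_duplicates(rows):
    """Keep the first row of each run of equal column D values."""
    kept = []
    for row in rows:
        if kept and kept[-1][D] == row[D]:
            continue  # same number as the row just kept, so drop it
        kept.append(row)
    return kept

# Hypothetical sample with columns A..H; only column D matters here.
rows = [
    ["a", "", "", 501, "", "", "", ""],
    ["b", "", "", 501, "", "", "", ""],
    ["c", "", "", 501, "", "", "", ""],
    ["d", "", "", 502, "", "", "", ""],
]
kept = drop_adjacent_duplicates(rows)
```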