Data Scrubbing - Identifying Duplicates And Assigning One Number
Feb 11, 2014
I am involved in a software conversion that is taking 4 full time folks over 5 weeks to clean up and assign an alpha-numeric sequential number to each vendor, client. Each scrubber is reviewing an excel spreadsheet containing the names, addresses, FID, telephone, etc. of our vendors and customers. This information is being pulled from 2 separate sources. We are assigning a BP # to the main office location and not retiring that one. then we go on to identifying the dups. All dups get a Y to be retired, but if they have a different address then the main one, we place a Y to bring that address over under that BP#.
Ultimately, we end up with 2 systems combined into one dumping all old numbers assigned and giving each vendors, customers, etc. a new BP# that may have muliptple addresses.
How can we assign a alpha numeric number without going through each individual line...over 900,000 of them to do. Key is to identify duplicate addresses and duplicate names. Some names might be RK Electrical or Robert King Electrical but the address will be duplicated usually.
View 9 Replies
ADVERTISEMENT
Feb 9, 2010
I have attached a sample sheet which deals with property sales data, in reference to a two-part question.
1. If the row has two, or more, rows share the same value in column 'E', it needs to be identified with a 'Y' in Column G. In looking through old threads, this seems possible, though I could not find and answer I could 'bend' to work. If this is possible, can the following be included?
2. If two or more rows share the same value in column 'E', list the identifiers (value in Column A) for the others in Column H, separated by commas "," or slashes (preferred) "/".
ie.
"00370600000700 'NICHOLLS JOHN W & CARLA R 11/27/2000 85000 '260647 W Y '00370600000800" and
"00370600000800 'NICHOLLS JOHN W & CARLA R 11/27/2000 85000 '260647 W Y '00370600000700" or
"'00370500000801 'FRAHM FREDERICK/ERIK/KRYSTYNA 06/17/2004 110000 '288904W Y '00370500000802/'00370500000803"
The sample sheet attached includes 26 rows of data with several 'doubles' and one 'triple' 'duplicates'.
Please note that in the 'real' file, it has slightly less than 200,000 rows and I have seen 'dupicates' up to 40 with the same value in Column 'E'
View 14 Replies
View Related
Aug 6, 2009
Here is the deal I have 4 columns. Each line gives you the following information:
The Id is the record number, Code_Name is a code for each Fox in the study, date and area is a sub area in a bigger grid. Basically I have an area divided by squares and every time a marked fox enters in one of my squares a new line in the data is created. What I won’t to know is if a fox when in my area of study will return to the same squares or not.
ID
CODE_NAME
Date
Area
Logical
116
SSS1
02-Jan-09
1A1
273
RRR1
02-Jan-09
2A2
2959
BBB1
02-Jan-09
1B1
2959
What I’m trying to achieve is a 5th column were ill get a logical value of TRUE or FALSE if, for each fox in the next available record a fox went back to the same square or not.
So if you check for the fox RRR1 I have 2 records one in 2 of January in area 1A1 and a second in 7 of January in the same area. For the Fox BBB1 you will see that she was always in different areas and for CCC1 she only came back to one square.
The problem is I have over 400 fox’s and 12000 records and I’m trying to get a way of doing it automatically.
View 9 Replies
View Related
Dec 11, 2008
How do I go about assigning a number to a particular piece of data? To give a simple example of what I'm trying to do, and what I envision, I'll pretend I'm building a sandwich.
1=white
2=wheat
3=ham
4=turkey
5=roast beef
6=lettuce
7=tomato
8=mayo
9=mustard
10=cheese
So, someone could come along and build their sandwich, ham/turkey/tomato/mayo on white bread. Excel would then recognize that 1+3+4+7+8=23
23=The American (the name of the sandwich, which I've already assigned a variable to)
I'm using the sandwich model because its a lot simpler than what I'm attempting to do.
View 10 Replies
View Related
Jul 18, 2012
I have organized account numbers from two systems onto a spreadsheet, with numbers from System 1 arrayed in Column A and numbers from System 2 arrayed in Column B. I need to evaluate the numbers in both columns and isolate the numbers that are NOT DUPLICATES across the two systems (Columns A and B) and return a list of non-duplicate numbers in Column C. Here is what the table would look like:
System 1
System 2
Non-Duplicates
173
215
173
[Code] .......
What is the best approach to use?
View 4 Replies
View Related
Dec 4, 2013
The first column shows the sample data. The second column is what I'm expecting.
How to assign a similar value to the duplicates and the unique ones should have the values in order
605-21501
605-21612
605-21501
605-26123
605-26134
605-21612
605-26195
605-21612
View 10 Replies
View Related
Nov 30, 2009
I would like to identify duplicates in a list using conditional formatting in Excel 2007.
I have tried choosing to identify duplicates using the formula that I have found on many threads throughout the message board:
=COUNTIF($A$1:$A1,$A1)>1.
This function works up to 15 characters in a cell, but Excel seems to be treating all digits after the first 15 as the same, resulting in a "fuzzy match" where I want an exact match. Many of the values in my list are 18 characters long, in text format to prevent rounding.
I've noticed that Excel treats the 18-characters values the same way when sorting; for example, it treats these two values as the same:
'234567891011121314
'234567891011122413
Is there a way to force Excel to examine those last four digits for the purpose of sorting & identifying duplicates?
View 9 Replies
View Related
Nov 25, 2008
I'm using a cells.find command to locate a value in a file. How do I return the current row number that I'm on following the command?
I'm guessing it is something along the lines of:
MyCurrentRow = ActiveCell.RowNumber
but I know that that is an invalid statement.
View 9 Replies
View Related
Sep 11, 2013
I am trying to collect data from a server. The data comes through as .csv (seperated data), and I am able to get all the useless info/columns out of the way - but I would like to keep a record of how many times these "alarms" come in. form a spreadsheet, or tell me how to go abouts using a tool to simplify my process.
View 1 Replies
View Related
Jun 18, 2008
I am using a vlookup and have a problem. I am assigning a category to an item number based on the first two characters of the item number. For example item number 60123 would equal scrap because of the first two characters of 60. But the item number can begin with either a number or letter. Here is the formula I am using that works for item numbers that begin with numbers:
=VLOOKUP(VALUE(LEFT(E2,2)),Sheet3!A:B,2,FALSE)
It works fine until I reach a item number that begins with a letter, then I get the dreaded #Value error. If I take the value out of the formula then it works for the letter based number items but not for the number based item numbers.
View 9 Replies
View Related
May 5, 2009
I've collected some data from a GPS logger regarding the speed of an athlete. I want to calculate how many sprints this particular athlete undertook during a training session.
Sprinting is defined as a speed of > 20 kph.
One sprint would be the attainment of one peak >20 kph before decreasing below 20 kph.
It's easy to identify the 3 peaks and thus sprints from the xy scatterplot in the attached file, but I'm struggling to find a way to calculate this.
View 11 Replies
View Related
Feb 10, 2009
In column H I have a list of numbers seperated by a space, the number of lines can change. In column L I have a list of numbers which can change either expand or retract.
I would like to check each cell in column H and if any numbers are not listed in column L then it/they should be shown in column G.
Example1 H2 shows 6 11, therefore cell G2 should show 11.
Example 2 H6 shows 5 6 9 11 therefore G6 should show 9 11
Sheet1 HIJKL1Container ID26 11 135 8 11 245 7 11 355 7 565 6 9 11 675 6 9 Excel tables to the web >> Excel Jeanie HTML 4
View 9 Replies
View Related
Apr 22, 2009
I am looking for the easiest way to find duplicate Work Order numbers that exist in 2 separate Workbooks. EX. Workbook 1 Sheet one contains the numbers 1-100 in A1:A100
Workbook 2 Sheet one contains X amount of the numbers between 1-100 located.
somewhere in A:A. For arguments sake let's assume those numbers are 3,6,33,87,99.
What would the formula be to return the values that are in both of the workbooks?
View 5 Replies
View Related
Mar 11, 2014
I have small table i would like to create. Now, it can be done manually ( but its be very very time consuming) but im sure of a way using IFs and VLOOKUPs so that the data selection can be done automatically...
so in column 1 i have various valuations from 0 to anything 50mil plus that i need to then separate into 4 different columns based on their size. so column A would have 0 - 250k, column B 251k to 500k, column C 501k to 1million and etc etc...
vals 0-250 251-500 501-1mill
555,000 300,000
150,00 75,000
Please see attached ... testing values 1.xlsx‎
View 2 Replies
View Related
Dec 6, 2006
I am trying to slim down my database results in Excel via MS Query by searching for Part ID's that are numeric (we have parts that also contain letters....I want to weed those out).
In all my searching on the web, I thought the ISNUMERIC() function should be the function for this, but I keep getting an ORA-00904::"ISNUMERIC":invalid identifier....
Is this function supposed to work or is there another function that will do this
This is my SQL statement so far, which works to get parts that are 6 characters long only:
SELECT PART.ID
FROM SYSADM.PART PART
WHERE (LENGTH(PART.ID)=6)
When I change it to this to get parts that are numeric, it gives the error above:
SELECT PART.ID
FROM SYSADM.PART PART
WHERE (LENGTH(PART.ID)=6) AND (ISNUMERIC(PART.ID)=1)
View 9 Replies
View Related
Apr 8, 2014
I'm trying to Count duplicate text on sheet 2 and populate the number of times repeated on sheet 1.
On sheet 1 I have A3:A128 and would like the number of times these respective cost centers are repeated in Sheet 2 to populate Column B of Sheet 1.
File attached.
Not sure which formula to use. I tried CountIF but didn't work with 2 sheets.
View 7 Replies
View Related
Sep 17, 2009
i am trying to work out how to use the rank formula to rank numbers in column B and keep them in unison with Column A.
So Column A has say 5 1's with column B having different scores then continuing under 1 in A is 2 and so on is there a way to continue the ranking formula without manually changing the cell ranges?
so =rank(B1,$B$1:$B$7,1) but can i do that if A =1 and then A=2 etc ?
so if A=1,rank(B1,$B$1:$B$7,1)
View 7 Replies
View Related
Sep 21, 2009
I currently have a database that is similar to the example I've pasted below:
TypeDateTimeA10/01/200910:00B10/01/200910:00C10/01/200910:00D10/01/200912:00E10/01/200912:00
As you can see the Date and Time are repeated for several rows. This is how the data I receive comes through as A,B and C refer to a single transaction and D & E refer to another transaction.
Ideally what I would like to acheive is:
TypeDateTimeTransactionA10/01/200910:001B10/01/200910:00C10/01/200910:00D10/01/200912:002E10/01/200912:00
Where Excel can compare the date and times of each row and look for matching rows above and below it and then fill in a column next to it indicating that x number of rows are linked to a single transaction - preferably labelling them in some order to I can tell how many transactions there are.
View 9 Replies
View Related
Sep 8, 2009
Data Table including-
List of Identifying Code Numbers for customer invoices
Multiple repetitions of individual Identifying Code Numbers in list
Various data in table range including Various Values of invoices from different dates for each repetion of Identifying Code Number.
- Wish to find largest invoice for each Individual Identifying Code Number in the table without using a pivot table.
i have tried combining Max and Large functions with Vlookups etc.
View 9 Replies
View Related
May 4, 2009
Im new to Macros. Im trying to find a string of text assign that to be the top of the data and then find a different string and assign that to be the bottom of the data. Then run a loop whilst inside that data range. Am i going about it the right way? Attached is a sample data file.
View 4 Replies
View Related
Jun 22, 2013
I need an excel file to do this:
i have one column like this:
492
500
773
738
572
492 When I repeat this number I need that the first (492) be formatted with a color and continuous like this
200
572 format the first 572
492 format the next 492 but the actual number stays without formatting!
View 14 Replies
View Related
Jun 19, 2014
I am trying to write a formula that shows the best grade for each each student and the subject in which
that grade was achieved.
In a previous post i was shown how to identify the highest grade
MAX(IF(A$3:A$34=A7,C$3:C$34))
I hit CTRL, SHIFT & ENTER to activate the formula - Result in column E
However, I also want to identify the subject in which they scored their highest grade in column F
I have two problems
1. I don't know how to write a formula that brings in the subject based on the grade for each student
2. I don't know how to write a formula in case there is a tie (see DAVID L)
View 11 Replies
View Related
Dec 8, 2007
I have a list of unique number. They are part numbers. For convenience for the user, they cannot be resorted. A list may look like this:
AAAA
BBBB
AAAA
AAAA
CCCC
(blank cell)
CCCC
What I would like is a formula that i can put in the adjacent cell that will result in the following:
AAAA 1
BBBB 1
AAAA 2
AAAA 3
CCCC 1
(blank cell)
CCCC 2
Essentially, the formula/solution will "know" that the third occurence in the list is actually the 3rd occurence, and so forth... I tried COUNTIF but that just gave me the total number of occurences
AAAA 3 wrong
BBBB 1
AAAA 3 wrong
AAAA 3
CCCC 2 wrong
(blank cell)
CCCC 2
View 2 Replies
View Related
Dec 17, 2008
I am trying to improve my expense report template and need to check on the load if expense report number has been loaded correctly as well as if this report has been previously loaded. Expense report number format looks like this: AAA-BBBBBB-CC
Where:
AAA – Employee ID #
BBBBBB – End of the week date
CC – Weekly expense report number
For instance, 023-122008-01 means: Employee number 023, week ending date 12/20/2008, weekly expense report number 01. I would like to prevent/give warning of loading incorrect expense report number format, check for possible duplicates, and check if trying to load expense report belongs to the right person (by simply matching previously loaded in different cell of the same sheet employee ID and first three digits of just loaded expense report number. I think I know how to do all of these separate, but have no idea how to combine all three checks for one cell.
View 3 Replies
View Related
Jun 7, 2013
My sheet cotains a variable number of duplicates (2 up to 12 duplicates) that can each have different amounts.
Ex: 9879 = 7
9879 = 0
I would like to be able to highlight all the duplicates for that show only 0 as their value (all the duplicates must have a total value of 0). I've tried nested if functions and conditional formating but to no avail.
I've attached a small file that shows the end result.
Example.xls
View 2 Replies
View Related
Jun 7, 2013
My sheet contains a variable number of duplicates (2 up to 14 duplicates) that can each have different amounts.
Ex: 9879 = 7
9879 = 0
I would like to be able to highlight all the duplicates for that show only 0 as their value (all the duplicates must have a total value of 0). I've tried nested if functions and conditional formating but to no avail.
I've attached a small file that shows the end result. Attachment 241407
View 9 Replies
View Related
Feb 9, 2009
I have this macro when run copys and pastes values for a row of cells onto another sheet. However if 2 cells in the same row have a number greater than zero. It duplicates the entry. What I need is some sort of check that says if 2 or more cells in the same row have a number greater than zero just copy that row once only.
View 3 Replies
View Related
Apr 10, 2008
Below is a range of data i am working with, i am trying to create a formula that will count the number of different entries in column A for each different entry in column B.
ie how many differnt values are there for "Packing"
008003PICKING MISTAKE
008042UNFIT FOR PURPOSE
008035PACKING
008035PACKING
008035PACKING
007960CHANGE OF MIND
007986PACKING
007986PACKING
008050UNFIT FOR PURPOSE
008070CHANGE OF MIND
008070CHANGE OF MIND
008074CHANGE OF MIND
008074CHANGE OF MIND
008074CHANGE OF MIND
008074CHANGE OF MIND
008074CHANGE OF MIND
008086PACKING
008085PACKING
008085PACKING
View 6 Replies
View Related
Mar 21, 2014
I need the easiest way to randomize or generate team numbers in a league. Using COL A assign numbers between 1 & 8 without duplicates. Then repeat 5 more times. This has to be done on the spot and has time constraints as the players will be waiting for their team assignments.
EXAMPLE: I have 48 players which will be assigned to 8 teams of 6. I want to randomize the drawing so the same players don't play on the same teams each week. Also to be able to adjust number of teams determined by how many players are present. either 6 , 8, or 10 teams.
View 10 Replies
View Related
Jun 14, 2007
I have a list of invoices numbers in column B, some are duplicated
I want to count the the number of invoices excluding duplicates.
View 9 Replies
View Related