Find Top 10 Most Common Words In Column Of Text Strings?
Apr 1, 2014
I've been racking my brains trying to find a way of doing this. I have a list (column A in Excel) of over 50,000 organisations and I'd like to know which words are used most often in the names. Ideally it would be great if I could produce a top 10 list of the most common words, e.g. Ltd, School or Church, with a count in the next column of how many times each word appears.
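For anyone happy to do the counting outside Excel, here is a minimal sketch in Python. It assumes the names have been exported as a list of strings; the function name `top_words` and the sample names are made up for illustration.

```python
import re
from collections import Counter

def top_words(names, n=10):
    """Return the n most frequent words across all names, ignoring case."""
    counts = Counter()
    for name in names:
        # Split on anything that is not a letter, digit or apostrophe.
        words = re.split(r"[^A-Za-z0-9']+", name.lower())
        counts.update(w for w in words if w)
    return counts.most_common(n)

# Made-up sample data standing in for column A.
names = ["St Mary's Church", "Acme Ltd", "Hill School", "Bolt Ltd", "Grace Church Ltd"]
for word, count in top_words(names, 2):
    print(word, count)
```

The result is a list of (word, count) pairs, which maps directly onto the two-column layout asked for.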
I'm not sure whether this will be really easy or really difficult, but I want to write some code that compares the value of two strings, and any characters that aren't in string 2 will be returned formatted in bold. For example, if String1 was "The Quick Brown Fox" and String2 was "The Quick Yellow Fox jumped over the lazy dog", String2 would then be returned as "The Quick Yellow Fox jumped over the lazy dog".
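The comparison itself can be sketched in Python. This is only one reading of the question: it works word by word rather than character by character, it assumes the goal is to flag the words of String2 that do not occur in String1, and it uses `<b>` tags as a stand-in for bold formatting. The function name `mark_new_words` is hypothetical.

```python
def mark_new_words(s1, s2, open_tag="<b>", close_tag="</b>"):
    """Wrap every word of s2 that does not occur in s1 (case-insensitive)."""
    known = {w.lower() for w in s1.split()}
    marked = []
    for word in s2.split():
        if word.lower() in known:
            marked.append(word)
        else:
            marked.append(f"{open_tag}{word}{close_tag}")
    return " ".join(marked)

result = mark_new_words(
    "The Quick Brown Fox",
    "The Quick Yellow Fox jumped over the lazy dog",
)
print(result)
```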
I have a spreadsheet of part #'s, descriptions, manufacturer names, and manufacturer part #'s. (It's a list of the inventory in my warehouse). Each row contains information for just the item in that row. Row 2 references another part in my warehouse, row 3 yet another, and so on.
Many of the parts have more than one potential manufacturer and part # (meaning that any of those manufacturers' part #'s are basically the same tool, just different brands; at one time we may get a shipment of one, at other times a shipment of another). For example, a screwdriver may be listed like this:
Part # 1234 screwdriver, mfg Snap-On, part # 456, mfg Stanley, part # 789, mfg Mac Tool, part # 439.
Then further down the list, there may be another part listed like this:
Part # 9980 wrench, mfg Stanley, part #741, mfg Snap-On, part # 852, mfg Proto, part # 369.
If you can imagine that data across the cells of a spreadsheet row, notice how the mfg name 'Snap-On' was the first mfg name on the screwdriver, but it was listed as the 2nd mfg name on the wrench.
So, here's my question: I want to be able to group all of the items made by any one manufacturer together in a new list. If all of the manufacturer names were in the same column, I could simply sort the list by that column. But since I've got thousands of rows with the mfg name I'm looking for in different columns on different rows, I thought maybe a macro could search each row for the word I'm looking for and, if it is found, copy the whole row to a new worksheet. The end result would be that, if I wanted to see all items for which Snap-On is an acceptable supplier, I could get a list of all potential Snap-On items grouped together.
I'm sorry this is so long. I may have over-worded this and it may not be too clear. I could email an example of the spreadsheet if anyone needed more info to figure out what I'm looking for and was willing to take a look at it.
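The row search described above can be sketched in Python, assuming each spreadsheet row has been read into a list of cell values. The function name `rows_mentioning` is hypothetical, and the third sample row is invented to show a non-match.

```python
def rows_mentioning(rows, manufacturer):
    """Return every row in which any cell equals the manufacturer name."""
    target = manufacturer.strip().lower()
    return [
        row for row in rows
        if any(str(cell).strip().lower() == target for cell in row)
    ]

rows = [
    ["1234", "screwdriver", "Snap-On", "456", "Stanley", "789", "Mac Tool", "439"],
    ["9980", "wrench", "Stanley", "741", "Snap-On", "852", "Proto", "369"],
    ["5555", "hammer", "Proto", "111"],  # invented row with no Snap-On entry
]

for row in rows_mentioning(rows, "Snap-On"):
    print(row)
```

Because every cell in the row is checked, it does not matter whether the name sits in the first, second or third manufacturer column.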
2. Once the entire list is broken down into its many parts, use the pivot table feature of excel to determine how common each of the parts is within the entire data set.
So, my questions are these:
1. Do you believe this is the best way to solve my problem? If not, what would be the preferred method?
2. If this is the best method, what function or script would I use to accomplish the first step of breaking down the lines into their individual parts?
Mike
It appears I put too many characters in the title of my post. It should read: Common Words - Decomposing Text Phrases
I have a spreadsheet with approx 7000 rows, many of which contain the same item but with flavors and other variations on the end. An example would be:
A      B
10142  6kg of whey bundle With Free protein shaker-Banana
10143  6kg of whey bundle With Free protein shaker-Chocolate
10144  6kg of whey bundle With Free protein shaker-Strawberry
10145  6kg of whey bundle With Free protein shaker-Unflavoured
10010  **Bodybuilding Warehouse Premium Whey Probiotic - 2.2kg
10011  **Bodybuilding Warehouse Premium Whey Probiotic - 2.2kg + FREE Shaker
Would it be possible to create a new column (column C) which would display all common words from column B, like below?
A      B                                                        C
10142  6kg of whey bundle With Free protein shaker-Banana       6kg of whey bundle With Free protein shaker
10143  6kg of whey bundle With Free protein shaker-Chocolate    6kg of whey bundle With Free protein shaker
10144  6kg of whey bundle With Free protein shaker-Strawberry   6kg of whey bundle With Free protein shaker
[Code] ....
I've attached a larger sample of our list to get a better idea of different variations that are on the spreadsheet.
I think what we need is something similar to this thread[URL] .....
In column A I have 50,000 cells, each containing 1 to 10 keywords. For example:
A1 = "jobs"
A2 = "jobs in milton keynes"
A3 = "it jobs in milton keynes"
A4 = "sales jobs in milton keynes"
A5 = "well paying brickie work in spain"
etc.
At first I was trying to find out the most common keywords in column A, and I used the following code to do so
How do I use an Excel formula to find which (if any) of multiple sets of words, each of up to 50 words, occur in a series of rows of a spreadsheet? The first test is whether one or more words from Set A are found in a searched cell.
A positive result will return a specific value in the designated result cell. If none of the words in Set A is found in the searched cell, the formula will repeat the test for the words in Set B, and so on.
After all 50 sets of words have been tested, the formula will move to the next cell in the searched column.
New words will be added to the sets of words continually as required.
Multi-word phrases within sets are enclosed in double quotes. Within each set of words there will be some n-tuples of words (i.e. 2-4 adjacent words) that contain one or more of the words in the set, but for which the formula will be required to return a negative result. Example: Set A = word1, word2, word3, "word1 word2 word3". (The words within a set could also each be entered in separate columns, as opposed to all being included in a single cell.) The single column of text to be searched is about 10,000 rows.
I am wanting to use the above in a spreadsheet that contains data downloaded from a series of bank accounts to automatically allocate items of expenditure to one of 20 or so different categories of expenditure.
The formula will search the description field to find words that are used in the downloaded files from the various accounts to describe each transaction.
If a word describing travel expenditure (e.g. hotel, "holiday inn" but not "holiday travel") is found in the description of an expenditure item - the item cost will be allocated to the TRAVEL EXPENDITURE column, which is one of 20 or so different categories of expenditure.
Happy to consider a different solution if the task can be done better a different way.
Tried using a combination of INDEX/SEARCH/IF in Excel, but was not able to get a correct result. PS I am using Excel 2011 for Mac - which does not allow macros, so the solution needs to be entirely formula based.
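This is not a formula-based answer, but the matching logic described above can be sketched in Python as a way to check the rules before translating them into formulas. It assumes the sets are tried in order, that matching is a case-insensitive substring test, and that the phrases requiring a negative result are blanked out before testing. The function name `categorise` and the sample rules are made up.

```python
def categorise(description, rules):
    """Return the first category whose terms occur in the description.

    rules is a list of (category, include_terms, exclude_phrases) tuples,
    tested in the order given.
    """
    text = description.lower()
    for category, include_terms, exclude_phrases in rules:
        cleaned = text
        # Blank out excluded phrases so their words cannot trigger a match.
        for phrase in exclude_phrases:
            cleaned = cleaned.replace(phrase.lower(), " ")
        if any(term.lower() in cleaned for term in include_terms):
            return category
    return None

# Illustrative rules only; real sets would hold up to 50 words each.
rules = [
    ("TRAVEL EXPENDITURE", ["hotel", "holiday"], ["holiday travel"]),
    ("GROCERIES", ["supermarket"], []),
]

print(categorise("HOLIDAY INN LONDON", rules))
print(categorise("Holiday Travel Insurance", rules))
```

Adding a new word to a set is then just a matter of extending the relevant list.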
I have several cells in a column that look something like this:
Cell A1: abc 1234 def ghi
Cell A2: xxxx aa b 245 qqqqq
Cell A3: abcdefg hij kl mnopqr s
Is there an Excel formula or combination of formulas I can use to identify: (1) whether any given text string (such as those above) includes numbers, and (2) what the first number (which could contain 1-4 digits) in the text string is?
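Outside Excel, both checks come down to one regular-expression search. A minimal Python sketch follows; the function name `first_number` is hypothetical, and the number is returned as text so leading zeros survive.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r"\d+", text)
    return match.group() if match else None

cells = ["abc 1234 def ghi", "xxxx aa b 245 qqqqq", "abcdefg hij kl mnopqr s"]
for cell in cells:
    number = first_number(cell)
    # A None result answers question (1): the string contains no digits.
    print(cell, "->", number)
```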
I have been given a huge membership list. The CITY field also has the two-letter state abbreviation (e.g., "Fremont, CA" instead of just "Fremont"). I want the "CA" or "WA" or "NV" (etc.) from the city field to appear in a new STATE field. I successfully use the statement below to do this with "CA", but I want a statement that will search for multiple strings (the other states). Here is what works now: =IF(FIND(" CA",F2),"CA"). But I want to be able to add other state abbreviations to this.
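Rather than testing for each state in turn, the abbreviation can be taken from the end of the field. Here is a sketch in Python, assuming the state is always two capital letters at the end of the value, preceded by a comma or a space. The function name `state_of` is made up.

```python
import re

def state_of(city_field):
    """Return a trailing two-letter state code, or "" if none is present."""
    match = re.search(r"[,\s]\s*([A-Z]{2})\s*$", city_field)
    return match.group(1) if match else ""

for value in ["Fremont, CA", "Reno NV", "Fremont"]:
    print(repr(value), "->", repr(state_of(value)))
```

This does not check that the two letters are a real state code; a lookup against a list of valid abbreviations could be added if the data is messy.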
I'm trying to write a UDF that takes a RegEx pattern and a cell as arguments and returns only the matching strings. For example, for the string "The quick brown fox jumps over the lazy dog" and the RegEx pattern "\w{4}", the function should return the two words "OVER" and "LAZY". What should I change in my code?
Function GetPattern(myPattern As String, myString As String)
    Dim regEx As RegExp
    Dim Matches As Object
    Set regEx = CreateObject("VBScript.RegExp")

    With regEx
        .Pattern = myPattern
        .IgnoreCase = True
    End With
    GetPattern = regEx.Replace(myString, "$1")
End Function
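The posted function calls Replace, which substitutes text rather than collecting matches. The intended behaviour can be sketched in Python (not VBA). One assumption is that only whole four-letter words are wanted, which needs word boundaries around the pattern. The name `get_pattern` simply mirrors the UDF above.

```python
import re

def get_pattern(pattern, text):
    """Return every match of pattern in text, upper-cased, ignoring case."""
    return [m.upper() for m in re.findall(pattern, text, flags=re.IGNORECASE)]

sentence = "The quick brown fox jumps over the lazy dog"
# \b anchors keep the match from landing inside longer words like "quick".
print(get_pattern(r"\b\w{4}\b", sentence))
```

Without the `\b` anchors, `\w{4}` would also pick up the first four letters of "quick", "brown" and "jumps".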
I would like to sync cells that contain common words for sorting purposes. Is this possible? For instance, I have a [URL] ..... in column A row 1, and in column B row 2 I have the word bellmont. I need the rows to sync so that rows containing common words line up. I have 8,000 rows to sync.
A very simple problem, I think, which can be solved either by built-in functions or by a macro. The situation is that I have a table where column D contains certain words.
I also have a table where cells M1:M10 contain the same words and the adjacent column (N) contains the corresponding values.
So for example , cell D5= "A" and I find that cell M6 is also "A" so I then go at cell N6 which has the value "3.3". So now I want I5 to have the value 3.3 in it.
In summary, I want the value from column N copied into column I. I have plenty of rows in column D, so I would prefer a fast way.
I have a column of text strings on Sheet1, Column A, which I need to check for the presence of keywords listed on Sheet2, Column A
So if any word from the keyword list on Sheet2, Column A is found in, say, cell A2 of Sheet1, the cell to its right (B2) should have a formula to display the count of keywords found in A2. I would also like to see each keyword identified, either through a highlight or a list. I need the formula to NOT be case sensitive, and the match does not have to be for whole words.
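The matching rule (case-insensitive, partial words allowed) can be sketched in Python, assuming both columns have been read into lists. The function name `keywords_found` and the sample data are made up.

```python
def keywords_found(text, keywords):
    """Return the keywords that occur anywhere in text, ignoring case."""
    lowered = text.lower()
    # Substring test, so "market" also matches inside "Marketing".
    return [k for k in keywords if k and k.lower() in lowered]

keywords = ["report", "market", "budget"]  # stands in for Sheet2, Column A
hits = keywords_found("Annual Report for the Marketing team", keywords)
print(len(hits), hits)
```

The length of the returned list gives the count, and the list itself covers the request to see which keywords were found.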
I have a find and replace function that removes + symbols from a column of strings. How can I also remove the first instance of a space (if the string later contains a +, too)?
See attached spreadsheet. In sheet 1 I would like to look up each word in column D to see whether it appears anywhere in column B. Note that if the word "Jill" is in D and "jilly" is in a surname in B, I would like it to get picked up. I have manually highlighted those that would get picked up. Those that do get picked up I would like copied into column C, as per sheet 2 (this is what I would like it to end up like). There is a very long-winded way of doing this using a find function and one column per word, but as the actual sheet I'm using has thousands of different words this isn't really viable!
And I need to see whether any of these appear in cells in a reference column G. If they do, I would like to return 'Used' into column B.
An example of the type of text in each cell in column G is:
"If you have any questions regarding your offer, please contact me. For any questions regarding your benefits, payroll or company policies and programs, please contact HR. Sincerely, {{Advisor_Signature__c}} {{Advisors_Job_Title__c}}"
I don't seem to be able to search for a text string across multiple reference cells.
with creating this macro to identify duplicate text strings in a column, which is great.
But, I'd like to be able to identify them by changing the text of the subsequent duplicates that are found.
For example, if 3 cells in a column are 1111, I'd like to add a string of text to the end of the 2nd & 3rd cell, but not the 1st cell. 1111 dup-1111 dup-1111-dup
This will enable me to sort the column and find the duplicate easier than just visually.
Sub color_dup()
    Dim r As Range, rng As Range, Col As String
    Col = "d"
    Set rng = Range(Col & ":" & Col)
    rng.Interior.ColorIndex = 0
    For Each r In Range(Col & "1", Range(Col & "65536").End(xlUp))
        If Application.CountIf(rng, r) > 1 Then
            r.Interior.ColorIndex = 6
        End If
    Next
End Sub
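The "leave the first, tag the rest" rule can be sketched in Python (not VBA) by remembering which values have already been seen. The suffix "-dup" and the function name `tag_duplicates` are assumptions, since the exact text wanted is not clear from the example.

```python
def tag_duplicates(values, suffix="-dup"):
    """Append suffix to every repeat of a value, leaving the first untouched."""
    seen = set()
    tagged = []
    for value in values:
        if value in seen:
            tagged.append(f"{value}{suffix}")
        else:
            seen.add(value)
            tagged.append(value)
    return tagged

print(tag_duplicates(["1111", "2222", "1111", "1111"]))
```

The same idea carries over to a macro: keep a record of values already met while looping down the column, and only change cells whose value is already in it.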
Looking to find 1 of 2 words in a cell in column B and return the word found in the same row in column E. This seemed easy but I am not having any luck.
The cells in column B have several words in them, but I am looking for 2 specific words, "PLAT" and "ORIG". If neither word is in the cell, column E in the same row should be blank; otherwise whichever of the 2 words was found should appear in column E in that row. A VBA loop would be ideal, but a formula that can do it might work as well.
I have been trying to format the rows on this sheet with a red color scale based on the number of repeated text strings in Column E. Referring to the attached example sheet, '321/312.2/321.3' appears the most times, and the goal is to color the rows it appears in the deepest shade of red, then shade the rows of the next most frequent string a lighter red, and so on in descending order. Our team currently does this manually across multiple sheets every day, and it would be a real time saver if we could get Excel to do this automatically.
I am looking for a way of creating the following conditioned concatenation.
I have two tables, let's call them "summary" and "detailed".
The "detailed" table is something like the following:
ID VOL
001 01
001 05
[code]....
The "summary" table below gets its info from the "detailed" table. The 'ID' is now unique. I'm looking for a formula for the 'VOL (concatenated)' column cells: it should get all rows from the "detailed" table with the same ID and then concatenate the 'VOL' column results, comma separated:
ID (unique) VOL (concatenated)
001 V01, V03, V05
002 V01, V04
003 V06
PS: I have people using this table with office 2003, so compatibility is necessary...
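This does not address the Office 2003 constraint, but the grouping itself can be sketched in Python. The detailed rows below are reconstructed from the summary table, since part of the original listing is elided. The "V" prefix is taken from the summary output, and the function name `concat_by_id` is hypothetical.

```python
def concat_by_id(rows, prefix="V", separator=", "):
    """Group (id, vol) pairs by id and join the vols in their original order."""
    grouped = {}
    for record_id, vol in rows:
        grouped.setdefault(record_id, []).append(f"{prefix}{vol}")
    return {record_id: separator.join(vols) for record_id, vols in grouped.items()}

# Sample "detailed" rows, reconstructed from the summary table above.
detailed = [
    ("001", "01"), ("001", "03"), ("001", "05"),
    ("002", "01"), ("002", "04"),
    ("003", "06"),
]

for record_id, vols in concat_by_id(detailed).items():
    print(record_id, vols)
```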
I have a column of words in Column A and I want to replace all the times that these words appear in the rest of the excel sheet with the words in Column B. If someone has already answered a similar problem link me to the thread because I can't find anything.
I'm looking for a macro to remove all words (in a single word per cell format) in a range (approx 100 columns & 7000 rows), except for a list of 100 words.
I've been using a conventional method to do this and it's time consuming. I would like to total up 2 columns (A multiplied by B, to be exact). Below are some examples:
Table 1 - Before totaling up:
Quantity  Product
5         2 x Button A White
3         4 x Button B Pink
4         5 x Ribbon A Black
2         3 x Thread A White
6         2 x Cloth A Blue
Table 2 - After totaling up:
Quantity  Product
10        Button A White
12        Button B Pink
20        Ribbon A Black
6         Thread A White
12        Cloth A Blue
I need the product of "Quantity" and the number at the start of "Product", or in short A x B. The end result needs to have the number and the "x" sign (e.g. "2 x ") removed while keeping the product names. Note that the format is number, space, "x", space.
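The split-and-multiply step can be sketched in Python, assuming every product cell starts with a number, a space, an "x" and a space as described. Cells that do not fit that shape are returned unchanged. The function name `total_line` is made up.

```python
import re

def total_line(quantity, product):
    """Multiply quantity by the leading "N x " in product and strip that prefix."""
    match = re.match(r"\s*(\d+)\s*x\s+(.*)", product)
    if not match:
        return quantity, product  # no multiplier found, leave the row as is
    return quantity * int(match.group(1)), match.group(2)

table = [(5, "2 x Button A White"), (3, "4 x Button B Pink"), (4, "5 x Ribbon A Black")]
for quantity, product in table:
    print(*total_line(quantity, product))
```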
I am trying to find certain words in a column and delete the word and the characters following it. For example, say I have a column of info as seen below. This is a test of me. I am just experimenting with this stuff. Deleted (6/15/01) Let me know what you think. I am not sure about it all, but I guess I will figure it out. Priviledge1 (01/05/06) Now let's see what happens when I try to test it.
I want to find all the "Priviledge1 (01/05/06)" and replace with nothing. Please note, the date will change with each record, so I need to figure out how to tell Excel to find "Priviledge1", delete it and the date behind it. So I want to delete "Priviledge1" and the next 11 characters including the space.
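Instead of counting 11 characters, the date can be matched by its shape, which also copes with dates such as (6/15/01). Here is a sketch in Python, assuming the date always follows the word in parentheses in m/d/yy form. The function name `strip_marker` is hypothetical.

```python
import re

def strip_marker(text, marker="Priviledge1"):
    """Remove the marker word, the parenthesised date after it, and trailing spaces."""
    pattern = re.escape(marker) + r"\s\(\d{1,2}/\d{1,2}/\d{2,4}\)\s*"
    return re.sub(pattern, "", text)

sample = "I will figure it out. Priviledge1 (01/05/06) Now let's see what happens."
print(strip_marker(sample))
```

The marker is a parameter, so the same function can be pointed at other words that are followed by a date.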
I have a fairly large timecourse dataset and I need to find all common values within all 3 columns. Also, when I find these 'common values' is there a speedy way to retrieve data in the same row that is associated with these values, instead of going back one-by-one and copying and pasting beside the value that the function has returned?
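Both parts can be sketched in Python, assuming each column is a list and the associated row data can be held in a dictionary keyed by value. The function names and the sample numbers are invented for illustration.

```python
def common_values(*columns):
    """Return the set of values present in every column."""
    if not columns:
        return set()
    return set(columns[0]).intersection(*columns[1:])

def rows_for(values, table):
    """Look up the row data for each common value, skipping unknown ones."""
    return {value: table[value] for value in sorted(values) if value in table}

col_a = [1, 2, 3, 4]
col_b = [2, 3, 5]
col_c = [3, 2, 9]
row_data = {2: ("t0", 0.5), 3: ("t1", 0.7), 4: ("t2", 0.9)}  # invented row data

shared = common_values(col_a, col_b, col_c)
print(rows_for(shared, row_data))
```

The dictionary lookup replaces the one-by-one copy and paste: every common value pulls its associated data in a single pass.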