I've been handed a table with about 18,000 rows. Each record describes the location of one customer. The issue is that when the person created the table, they did not add a field for "Company Name", only "Location Name", and one company can have many locations.
For example, here are some records that describe the same customer:
Location Table
ID  Location_Name
1   TownShop#1
2   Town Shop - Loc 2
3   The Town Shop
4   TTS - Someplace
5   Town Shop,the 3
6   Toen Shop4
My goal is to make it look like:
Location Table
ID  Company_ID  Location_Name
1   1           Town Shop#1
2   1           Town Shop - Loc 2
3   1           The Town Shop
4   1           TTS - Someplace
5   1           Town Shop,the 3
6   1           Toen Shop4

Company Table
Company_ID  Company_Name
1           The Town Shop
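To make the target layout concrete, here is a minimal sketch of the schema change using Python's built-in sqlite3 module against an in-memory database. It assumes the company assignments are already known: COMPANY_OF and COMPANIES are hypothetical inputs (for example the output of a grouping step plus hand review), and SQLite only stands in for whatever database actually holds the 18,000 rows.

```python
import sqlite3

# Example rows copied from the question.
LOCATIONS = [
    (1, "TownShop#1"),
    (2, "Town Shop - Loc 2"),
    (3, "The Town Shop"),
    (4, "TTS - Someplace"),
    (5, "Town Shop,the 3"),
    (6, "Toen Shop4"),
]
# Hypothetical inputs: location ID -> company ID, and the company rows.
COMPANY_OF = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1}
COMPANIES = [(1, "The Town Shop")]


def build_schema(conn: sqlite3.Connection) -> None:
    """Create the existing Location table, then add Company and Company_ID."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE Location (ID INTEGER PRIMARY KEY, Location_Name TEXT)")
    cur.executemany("INSERT INTO Location VALUES (?, ?)", LOCATIONS)
    # New parent table: one row per company.
    cur.execute("CREATE TABLE Company (Company_ID INTEGER PRIMARY KEY, Company_Name TEXT)")
    cur.executemany("INSERT INTO Company VALUES (?, ?)", COMPANIES)
    # Add the foreign-key column to the existing table, then fill it in.
    cur.execute(
        "ALTER TABLE Location ADD COLUMN Company_ID INTEGER REFERENCES Company(Company_ID)"
    )
    cur.executemany(
        "UPDATE Location SET Company_ID = ? WHERE ID = ?",
        [(company_id, loc_id) for loc_id, company_id in COMPANY_OF.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_schema(conn)
rows = conn.execute(
    "SELECT l.ID, l.Company_ID, l.Location_Name, c.Company_Name "
    "FROM Location l JOIN Company c ON c.Company_ID = l.Company_ID ORDER BY l.ID"
).fetchall()
for row in rows:
    print(row)
```

The same three steps (create Company, add a Company_ID column, update each location) should carry over to most SQL databases, although the exact ALTER TABLE syntax varies between them.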
There is no "Company" table; I will have to generate the Company Name list from the most descriptive or best Location Name that represents the multiple locations.
Currently I am thinking I need to generate a list of Location Names that are similar, and then go through that list by hand.
Any suggestions on how to approach this are appreciated.
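One way to produce the list of similar names for hand review is to score every pair of names and keep the pairs above a cutoff. The sketch below uses difflib.SequenceMatcher from the Python standard library on lower-cased names; similar_pairs and the 0.6 threshold are assumptions for illustration, not tuned values.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names, threshold=0.6):
    """Return (ratio, a, b) for every pair whose similarity ratio >= threshold.

    The ratio comes from difflib.SequenceMatcher on lower-cased names
    (1.0 means identical). Comparing every pair is O(n^2).
    """
    pairs = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((ratio, a, b))
    # Most similar pairs first, so hand review starts with the easy calls.
    pairs.sort(reverse=True)
    return pairs


names = [
    "TownShop#1",
    "Town Shop - Loc 2",
    "The Town Shop",
    "TTS - Someplace",
    "Town Shop,the 3",
    "Toen Shop4",
]
for ratio, a, b in similar_pairs(names):
    print(f"{ratio:.2f}  {a!r}  ~  {b!r}")
```

Comparing every pair is quadratic, so on roughly 18,000 rows (about 162 million pairs) you would want to split the rows into smaller candidate blocks before scoring them.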
@Neall, thank you for your statement, but unfortunately each location name is distinct: there are no duplicate location names, only similar ones. So in the results from your statement, "repcount" is 1 in every row.
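Because every name is distinct, counting on the exact name will always give 1. A rough alternative is to count on a normalized key instead, so that cosmetic differences (case, spacing, punctuation, store numbers, a stray "the") stop mattering. This is a Python sketch rather than the SQL statement from the thread; normalize, repcount_by_key, and the NOISE_WORDS set are hypothetical and would need tuning against the real data.

```python
import re
from collections import Counter

# Assumption: words that never help identify the company. Extend after
# inspecting the real data.
NOISE_WORDS = {"the", "loc"}


def normalize(name: str) -> str:
    """Reduce a location name to a rough company key.

    Lower-cases, turns every run of non-letters (digits, '#', '-', ',')
    into a space, drops noise words, then removes the spaces so that
    'TownShop' and 'Town Shop' collapse to the same key.
    """
    words = re.sub(r"[^a-z]+", " ", name.lower()).split()
    return "".join(w for w in words if w not in NOISE_WORDS)


def repcount_by_key(names):
    """Like GROUP BY ... COUNT(*), but on the normalized key."""
    return Counter(normalize(n) for n in names)


names = [
    "TownShop#1",
    "Town Shop - Loc 2",
    "The Town Shop",
    "TTS - Someplace",
    "Town Shop,the 3",
    "Toen Shop4",
]
for key, count in repcount_by_key(names).most_common():
    print(key, count)
```

On the six example rows, "TownShop#1", "Town Shop - Loc 2", "The Town Shop" and "Town Shop,the 3" all reduce to the key townshop, while the typo "Toen Shop4" and the abbreviation "TTS - Someplace" still end up on their own.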
@yukondude, your step 4 is the heart of my question.
Please update the question: do you have a list of company names available to you? I ask because you may be able to use the Levenshtein algorithm to find a relationship between your list of company names and location names.
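For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal Python sketch using the standard two-row dynamic-programming table follows; best_match is a hypothetical helper that only applies if a list of known company names exists.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum insertions, deletions and substitutions to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def best_match(location_name, company_names):
    """Hypothetical helper: the known company name closest to a location name."""
    target = location_name.lower()
    return min(company_names, key=lambda c: levenshtein(target, c.lower()))


print(levenshtein("townshop", "toenshop"))  # 1
```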
Update
There is not a list of company names; I will have to generate each company name from the most descriptive or best Location Name that represents the multiple locations.
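Once similar locations are grouped, the company name has to be chosen from the group itself. "Most descriptive" is a judgment call; the sketch below encodes one hypothetical heuristic: prefer names made only of letters and spaces (no store numbers or separators), then names with more words, then longer names. descriptiveness and pick_company_name are illustrative names, and the scoring should be adjusted to whatever "best" means for this data.

```python
import re


def descriptiveness(name: str):
    """Hypothetical score, compared as a tuple (higher is better):
    1. contains only letters and spaces, 2. word count, 3. length."""
    clean = re.fullmatch(r"[A-Za-z ]+", name) is not None
    return (clean, len(name.split()), len(name))


def pick_company_name(location_names):
    """Choose the location name that best represents the whole group."""
    return max(location_names, key=descriptiveness)


group = ["TownShop#1", "Town Shop - Loc 2", "The Town Shop", "Town Shop,the 3"]
print(pick_company_name(group))
```

On the example group this picks "The Town Shop", which matches the desired Company table. Ties resolve to the first name in the list, since Python's max returns the first maximal element.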
Okay... try this:
The whole purpose of the above steps is to automate part of the work and limit the scope of your problem. It's far from perfect, but it will hopefully save you the trouble of going through 18K records by hand.
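As one possible way to automate part of the work, the pipeline below normalizes each name, merges normalized keys that are within a small edit distance of each other using a union-find structure, and numbers the resulting clusters as Company_IDs. Everything here is a hypothetical sketch: NOISE_WORDS, max_distance=1 and assign_company_ids are assumptions, and the pairwise comparison is still quadratic in the number of distinct keys.

```python
import re
from itertools import combinations

NOISE_WORDS = {"the", "loc"}  # assumption: tune after looking at real data


def normalize(name):
    """Lower-case, replace non-letters with spaces, drop noise words, join."""
    words = re.sub(r"[^a-z]+", " ", name.lower()).split()
    return "".join(w for w in words if w not in NOISE_WORDS)


def levenshtein(a, b):
    """Two-row dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def assign_company_ids(rows, max_distance=1):
    """rows: list of (location_id, location_name). Returns {location_id: company_id}.

    1. Normalize every name to a key (identical keys merge for free).
    2. Union any two distinct keys whose edit distance is <= max_distance.
    3. Number the clusters in order of first appearance in rows.
    """
    keys = {loc_id: normalize(name) for loc_id, name in rows}
    parent = {k: k for k in keys.values()}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for a, b in combinations(sorted(parent), 2):
        if levenshtein(a, b) <= max_distance:
            parent[find(a)] = find(b)

    company_of_root = {}
    result = {}
    for loc_id, _ in rows:
        root = find(keys[loc_id])
        company_of_root.setdefault(root, len(company_of_root) + 1)
        result[loc_id] = company_of_root[root]
    return result


rows = [(1, "TownShop#1"), (2, "Town Shop - Loc 2"), (3, "The Town Shop"),
        (4, "TTS - Someplace"), (5, "Town Shop,the 3"), (6, "Toen Shop4")]
print(assign_company_ids(rows))
```

With max_distance=1 the example rows map 1, 2, 3, 5 and 6 to company 1, catching the "Toen" typo, while "TTS - Someplace" becomes company 2. Abbreviations like that one are exactly the cases left for the manual pass.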