Currently I have a list of 110,000 donors in Excel. One of the pieces of information they give to us is their occupation. I would like to condense this list down to say 10 or 20 categories that I define.
Normally I would just chug through this line by line, but since I have to do this for a year's worth of data, I don't have the time to go through 1,000,000+ rows by hand.
Is there any way to define my 10 or 20 categories and then have Python sort it out from there?
Update:
The data is poorly formatted. People fill in the field themselves, either online or on a slip of paper that they mail in to a data processing company. There is a great deal of variance: CEO, Chief Executive, Executive Office, and the list goes on.
I used a SORT UNIQ command and found that my list has ~13,000 different professions.
I assume that the data are noisy, in the sense that people could have written in anything at all. The main difficulty is going to be defining the mapping between your input data and your categories, and that starts with looking through the data.
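As a starting point for looking through the data, here is a minimal sketch that counts how often each occupation appears after light normalisation, so the most frequent spellings can be reviewed first. The function names and the sample list are made up for illustration; in practice the values would come from the occupation column exported from Excel.

```python
from collections import Counter


def normalise(text):
    # Lowercase, trim, and collapse runs of whitespace into single spaces.
    return " ".join(text.lower().split())


def most_common_occupations(occupations, top=50):
    # Count normalised occupations, skipping blank entries.
    counts = Counter(normalise(o) for o in occupations if o and o.strip())
    return counts.most_common(top)


# Hypothetical sample; replace with the real occupation column.
sample = ["CEO", "ceo ", "Chief  Executive", "Teacher", "teacher", "", "Nurse"]

for occupation, count in most_common_occupations(sample):
    print(f"{count:>6}  {occupation}")
```

With ~13,000 distinct values, the top few hundred entries will probably cover most rows, which is where it makes sense to start drawing up categories.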
I suggest that you look at what you have and draw up a list of mappings from input occupations to categories. You can then use pretty much any tool to apply that mapping to each row (if you're already using Excel, stick with Excel). Some rows will not fall into any category. Look at them and figure out whether that is because your mapping is inadequate (e.g. you didn't think of how to deal with veterinarians) or because the data are noisy. If it's noise, you can either deal with the remainder by hand or try some other technique to categorise the data, e.g. regular expressions or some kind of natural language processing library.
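If you do go the Python route, the mapping described above could be sketched roughly like this, using regular expressions as the matching technique. The categories, patterns, and function names are hypothetical placeholders, not a recommended taxonomy; the first matching rule wins, and anything that matches no rule is collected for manual review.

```python
import re
from collections import Counter

# Hypothetical rules: (category, pattern). Order matters, first match wins.
RULES = [
    ("Executive", re.compile(r"\b(ceo|chief executive|executive|president)\b")),
    ("Education", re.compile(r"\b(teacher|professor|lecturer)\b")),
    ("Healthcare", re.compile(r"\b(nurse|doctor|physician|surgeon)\b")),
]


def categorise(occupation):
    # Normalise case and whitespace, then return the first matching category.
    text = " ".join(occupation.lower().split())
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return None  # no rule matched


def apply_mapping(occupations):
    # Return the category for each row plus a tally of unmatched inputs.
    categories = []
    unmatched = Counter()
    for occupation in occupations:
        category = categorise(occupation)
        categories.append(category)
        if category is None:
            unmatched[occupation.strip().lower()] += 1
    return categories, unmatched


# Hypothetical sample rows.
rows = ["CEO", "Chief Executive", "Executive Office", "Veterinarian", "School teacher"]
categories, unmatched = apply_mapping(rows)
print(categories)
print(unmatched.most_common(10))
```

The tally of unmatched values is the list to review: frequent entries usually point to a missing rule, while one-off entries are more likely noise that can be handled by hand.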
Once you have figured out what your problem cases are, come back and ask us about them, with sample data, and the code you have been using.
If you can't even take the first step in figuring out how to run the mapping, do some research, try to write something, then come back with a specific question about that.