Anonymizing customer data for development or testing


I need to take production data with real customer info (names, addresses, phone numbers, etc.) and move it into a dev environment, but I'd like to remove any semblance of real customer info.

Some of the answers to a related question can help me generate new test data, but how do I then replace those columns in my production data while keeping the other relevant columns?

Let's say I had a table with 10,000 fake names. Should I do a cross-join with a SQL update? Or do something like this:

UPDATE [table] SET lastname = (SELECT TOP 1 name FROM samplenames ORDER BY NEWID())
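One caveat with that form: the subquery does not reference the row being updated, so the optimizer is free to evaluate it only once and give every row the same name. The sketch below shows one way around that. It uses Python's built-in sqlite3 module so it runs self-contained, and it correlates the subquery with each customer row by mapping the customer's id onto the sample-name ids. The table and column names (customers, samplenames, id, balance) are hypothetical. The mapping assumes samplenames.id runs from 1 to N without gaps. It also swaps the random pick for a deterministic one, which makes each refresh of the dev data reproducible.

```python
import sqlite3


def anonymize_last_names(conn: sqlite3.Connection) -> None:
    """Replace customers.lastname with a name from samplenames, row by row.

    Assumes (hypothetically) that samplenames.id runs 1..N with no gaps.
    Mapping each customer's id into that range correlates the subquery with
    the row being updated, so it is evaluated per row. Other columns are
    left untouched.
    """
    conn.execute(
        """
        UPDATE customers
        SET lastname = (
            SELECT s.name
            FROM samplenames AS s
            WHERE s.id = (customers.id % (SELECT COUNT(*) FROM samplenames)) + 1
        )
        """
    )
    conn.commit()


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory tables standing in for production and sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, lastname TEXT, balance REAL);
        CREATE TABLE samplenames (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO samplenames VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Garcia');
        INSERT INTO customers VALUES (10, 'RealLastA', 5.0),
                                     (11, 'RealLastB', 7.5),
                                     (12, 'RealLastC', 0.0);
        """
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    anonymize_last_names(db)
    for row in db.execute("SELECT id, lastname, balance FROM customers ORDER BY id"):
        print(row)
```

On the demo data, customers 10, 11 and 12 become Jones, Garcia and Smith, and balance keeps its original values. In T-SQL, the same idea is a correlated subquery or a join on a computed key instead of ORDER BY NEWID().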


asked Nov 03 '08 23:11

BradC

2 Answers

Anonymizing data can be tricky, and if it is not done correctly it can get you into trouble. Consider what happened to AOL when it released supposedly anonymized search logs in 2006: at least one searcher was identified by name from her queries. I would try at all costs to create test data from scratch before converting existing customer data. Even with names and addresses removed, records can sometimes be traced back to their owners through behavioral analysis and other data points you might not consider sensitive. I would rather be safe than sorry.
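Creating test data from scratch can be as simple as a seeded generator, so every developer gets the same fake rows. Below is a minimal Python sketch. The Customer fields and word lists are hypothetical and chosen only for illustration; nothing is drawn from production data.

```python
import random
from dataclasses import dataclass

# Hypothetical, obviously fake vocabularies; extend as needed.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Casey", "Riley", "Morgan"]
LAST_NAMES = ["Testman", "Sample", "Fakerson", "Demo", "Placeholder"]
STREETS = ["Main St", "Oak Ave", "Elm Rd", "Test Blvd"]


@dataclass(frozen=True)
class Customer:
    id: int
    first_name: str
    last_name: str
    address: str
    phone: str


def make_customers(count: int, seed: int = 0) -> list[Customer]:
    """Generate `count` synthetic customers; the same seed yields the same rows."""
    rng = random.Random(seed)  # private generator, independent of global random state
    customers = []
    for customer_id in range(1, count + 1):
        customers.append(
            Customer(
                id=customer_id,
                first_name=rng.choice(FIRST_NAMES),
                last_name=rng.choice(LAST_NAMES),
                address=f"{rng.randint(1, 9999)} {rng.choice(STREETS)}",
                # 555-0100 to 555-0199 are set aside for fictional use in North America.
                phone=f"555-01{rng.randint(0, 99):02d}",
            )
        )
    return customers


if __name__ == "__main__":
    for customer in make_customers(3, seed=42):
        print(customer)
```

Because the generator is seeded, make_customers(100, seed=42) returns the same 100 rows on every run. That keeps bug reports against the test data reproducible.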

answered Sep 21 '22 07:09

John Lemp

This is easier than it sounds if you understand the database. You do need to know every place where personal info is not normalized. For instance, the customer master file will have a name and address, but the order file will also have a name and address that might be different.
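Finding every copy of the personal columns is the tedious part, and a rough first pass can scan the catalog for suspicious column names. This sketch uses SQLite's sqlite_master table and PRAGMA table_info through Python's sqlite3 module. The hint list and the demo schema are hypothetical. Name matching will miss badly named or free-text columns, so treat the output as a starting checklist rather than a complete inventory.

```python
import sqlite3

# Hypothetical substrings that suggest a column may hold personal data.
PII_HINTS = ("name", "addr", "phone", "email", "city", "zip")


def find_pii_columns(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Return {table: [columns whose names contain a PII hint]}."""
    found: dict[str, list[str]] = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Table names come from the catalog itself, not from user input.
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        hits = [col for col in columns if any(h in col.lower() for h in PII_HINTS)]
        if hits:
            found[table] = hits
    return found


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical schema in which orders repeats the customer's name and address."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT,
                                phone TEXT, credit_limit REAL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             ship_name TEXT, ship_address TEXT, total REAL);
        """
    )
    return conn


if __name__ == "__main__":
    print(find_pii_columns(build_demo_db()))
```

For the demo schema, the scan reports name, address and phone in customers, plus ship_name and ship_address in orders. Those last two are the non-normalized copies that updating the master table alone would not reach.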

My basic process:

1. Identify the data (i.e. the columns) and the tables that contain those columns.
2. Identify the "master" tables for those columns, and also the non-normalized instances of those columns.
3. Adjust the master files. Rather than trying to randomize them (or make them phony), derive the values from each record's key. For customer 123, set the name to name123, the address to 123 123rd St, 123town, CA, USA, and the phone to 1231231231. This has the added bonus of making debugging very easy!
4. Change the non-normalized instances either by updating them from the master file or by applying the same kind of de-personalization.
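The last two steps (rewriting the master file from the key, then refreshing the non-normalized copies) can be sketched with Python's sqlite3 module. The schema, with orders carrying its own ship_name and ship_address, is hypothetical, as are the helper names ordinal and key_phone. The two Python helpers are registered as SQL functions so that the UPDATE statements can build the key-derived values. On SQL Server you would write the same expressions in T-SQL instead, for example with + or CONCAT for string concatenation.

```python
import sqlite3


def ordinal(n: int) -> str:
    """Return n with an English ordinal suffix, e.g. 1st, 12th, 123rd."""
    if 11 <= n % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def key_phone(n: int) -> str:
    """Repeat the key until it fills 10 digits: 123 -> '1231231231'."""
    return (str(n) * 10)[:10]


def depersonalize(conn: sqlite3.Connection) -> None:
    conn.create_function("ordinal", 1, ordinal)
    conn.create_function("key_phone", 1, key_phone)
    # Overwrite the master table with values derived from each row's key.
    conn.execute(
        """
        UPDATE customers
        SET name    = 'name' || id,
            address = id || ' ' || ordinal(id) || ' St, ' || id || 'town, CA, USA',
            phone   = key_phone(id)
        """
    )
    # Refresh the denormalized copies from the scrubbed master rows.
    conn.execute(
        """
        UPDATE orders
        SET ship_name    = (SELECT c.name FROM customers AS c WHERE c.id = orders.customer_id),
            ship_address = (SELECT c.address FROM customers AS c WHERE c.id = orders.customer_id)
        """
    )
    conn.commit()


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical data: the order's shipping address differs from the master's."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, address TEXT,
                                phone TEXT, credit_limit REAL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             ship_name TEXT, ship_address TEXT, total REAL);
        INSERT INTO customers VALUES (123, 'Jane Real', '9 Real Rd', '5550001111', 500.0);
        INSERT INTO orders VALUES (1, 123, 'Jane Real', '77 Gift Ln', 42.0);
        """
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    depersonalize(db)
    print(db.execute("SELECT * FROM customers").fetchall())
    print(db.execute("SELECT * FROM orders").fetchall())
```

Customer 123 comes out as name123 / 123 123rd St, 123town, CA, USA / 1231231231, matching the pattern in the steps. Its order picks up the same values. Non-personal columns (credit_limit, total) keep their original values. An order whose customer_id has no matching customer would get NULLs from the subqueries, so it is worth checking for orphaned rows first.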

It doesn't look pretty, but it works.


answered Sep 21 '22 07:09

tomjedrz
