
Obfuscate / Mask / Scramble personal information

I'm looking for a homegrown way to scramble production data for use in development and test. I've built a couple of scripts that generate random social security numbers, shift birth dates, scramble emails, etc., but I've hit a wall trying to scramble customer names. I want to keep real names so we can still use our searches, so random letter generation is out. What I have tried so far is building a temp table of all last names in the table, then updating the customer table with a random selection from the temp table, like this:

DECLARE @Names TABLE (Id int IDENTITY(1,1),[Name] varchar(100))

/* Scramble the last names (randomly pick another last name) */
INSERT @Names SELECT LastName FROM Customer ORDER BY NEWID();
WITH [Customer ORDERED BY ROWID] AS
(SELECT ROW_NUMBER() OVER (ORDER BY NEWID()) AS ROWID, LastName FROM Customer)
UPDATE [Customer ORDERED BY ROWID] SET LastName=(SELECT [Name] FROM @Names WHERE ROWID=Id)

This worked well in test, but it completely bogs down on larger amounts of data (more than 20 minutes for 40K rows).

All of that to ask: how would you scramble customer names while keeping real names and the weight of the production data?

UPDATE: It never fails: you try to put all the information in the post, and you forget something important. This data will also be used in our sales and demo environments, which are publicly available. Some of the answers describe what I am attempting to do, which is to 'switch' the names, but my question is literally how to code that in T-SQL.
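
The 'switch' described above can be sketched in a set-based form. This is a minimal sketch, not a tested solution: it assumes the Customer table has a unique key column, here given the hypothetical name CustomerId, and it swaps the table variable for indexed temp tables. One likely cause of the slowdown in the original script is that @Names has no index on Id, so the correlated subquery may scan it once per customer row; joining two materialized, indexed orderings avoids that.

```sql
-- Assumption: Customer has a unique key column; "CustomerId" is a hypothetical name.

-- Materialize one random ordering of the existing last names.
SELECT ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn, LastName
INTO #ShuffledNames
FROM Customer;
CREATE UNIQUE CLUSTERED INDEX IX_ShuffledNames ON #ShuffledNames (rn);

-- Materialize a second, independent random ordering of the customer keys.
SELECT ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn, CustomerId
INTO #ShuffledKeys
FROM Customer;
CREATE UNIQUE CLUSTERED INDEX IX_ShuffledKeys ON #ShuffledKeys (rn);

-- Pair the n-th shuffled key with the n-th shuffled name in a single pass.
UPDATE c
SET c.LastName = n.LastName
FROM Customer AS c
JOIN #ShuffledKeys AS k ON k.CustomerId = c.CustomerId
JOIN #ShuffledNames AS n ON n.rn = k.rn;

DROP TABLE #ShuffledNames, #ShuffledKeys;
```

Because every name is used exactly once, the frequency of each last name stays the same as in production. The values themselves are still real production last names, only attached to different rows.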

Computer Chip asked Oct 03 '08 21:10



1 Answer

I use generatedata. It is an open source PHP script that can generate all sorts of dummy data.

Peter Hoffmann answered Oct 26 '22 22:10
