
Hashing and encryption technique for a huge data set containing phone numbers

Description of problem: I'm working with a highly sensitive data set that contains people's phone numbers in one of its columns. I need to apply an encryption or hash function to them so that they become encoded values on which I can run my analysis. It can be a one-way hash, i.e., after processing the encoded data we won't be converting it back to the original phone numbers. Essentially, I am looking for an anonymizer that takes phone numbers and converts them to random-looking values that I can process. Please suggest the best way to go about this. Recommendations on the best algorithms to use are welcome.

Update: size of the dataset. My dataset is really huge, on the order of hundreds of GB.

Update: Sensitive. By sensitive, I mean that the phone numbers themselves should not be part of our analysis. So, basically, I need a one-way hashing function without collisions: each phone number should map to a unique value, and two phone numbers should never map to the same value.

Update: Implementation?

Thanks for your answers. I am looking for a more elaborate implementation. I was going through Python's hashlib library for hashing. Does it perform the same steps that you suggested? Here is the link

Can you give me some example code to achieve this, preferably in Python?

Learner asked Apr 08 '13 20:04



1 Answer

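Generate a key for your data set (16 or 32 bytes) and keep it secret. Compute HMAC-SHA1 over each phone number with this key and Base64-encode the result. That gives you a random-looking string per phone number that isn't reversible without the key. The mapping is deterministic, so the same number always produces the same value, and with a 160-bit output a collision between two different numbers is extremely unlikely in practice.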
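Since the question asks about Python's hashlib, here is a minimal sketch of the same three steps using only the standard library (`secrets`, `hmac`, `hashlib`, `base64`). The function names `generate_key` and `anonymize` are illustrative, not part of any library, and the sketch assumes phone numbers arrive as text strings in one consistent format.

```python
import base64
import hashlib
import hmac
import secrets


def generate_key(num_bytes=32):
    """Return a fresh random secret key; store it securely, apart from the data."""
    return secrets.token_bytes(num_bytes)


def anonymize(phone_num, key):
    """Map a phone number string to a Base64-encoded HMAC-SHA1 tag."""
    # The same (key, phone_num) pair always yields the same tag.
    digest = hmac.new(key, phone_num.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

Calling `anonymize("555-0100", key)` twice with the same key returns the same 28-character string, so joins and group-bys on the anonymized column still work. Note that "555-0100" and "5550100" would produce different tags, so any normalization has to happen before hashing.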

Example (HMAC-SHA1 with a 256-bit key) using Keyczar:

Create random secret key:

$> python keyczart.py create --location=path_to_key_set --purpose=sign
$> python keyczart.py addkey --location=path_to_key_set --status=primary

Anonymize phone number:

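from keyczar import keyczar

# Load the key set once and reuse it; re-reading it on every call is slow.
signer = keyczar.Signer.Read("path_to_key_set")

def anonymize(phone_num):
    # Returns Keyczar's encoded signature for the phone number.
    return signer.Sign(phone_num)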
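For a data set in the hundreds of GB, the anonymization has to be applied as a stream rather than by loading everything into memory. The following is a rough sketch under stated assumptions: the data is a CSV file with a header row, and it uses the standard library's `hmac` module instead of Keyczar so that it is self-contained. The function name `anonymize_csv` and the column name passed to it are hypothetical.

```python
import base64
import csv
import hashlib
import hmac


def anonymize_csv(src, dst, key, column):
    """Copy CSV rows from src to dst, replacing one column with its HMAC-SHA1 tag."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:  # only one row is held in memory at a time
        tag = hmac.new(key, row[column].encode("utf-8"), hashlib.sha1).digest()
        row[column] = base64.b64encode(tag).decode("ascii")
        writer.writerow(row)
```

`src` and `dst` are open text-mode file objects (for example, opened with `newline=""` as the `csv` module recommends), and `key` is the secret key as bytes. Because each row is processed independently, the input can also be split into chunks and handled in parallel, as long as every worker uses the same key.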
jbtule answered Oct 09 '22 11:10
