
How to separate a person's identity from his personal data?

I'm writing an app whose main purpose is to keep a list of users' purchases.

I would like to ensure that even I as a developer (or anyone with full access to the database) could not figure out how much money a particular person has spent or what he has bought.

I initially came up with the following scheme:

    --------------+------------+-----------
    user_hash     | item       | price
    --------------+------------+-----------
    a45cd654fe810 | Strip club |     400.00
    a45cd654fe810 | Ferrari    | 1510800.00
    54da2241211c2 | Beer       |       5.00
    54da2241211c2 | iPhone     |     399.00

  • User logs in with username and password.
  • From the password, calculate user_hash (possibly with salting etc.).
  • Use the hash to access the user's data with normal SQL queries.
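
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions of mine: the table name `purchases`, the fixed `APP_SALT`, the use of PBKDF2, and mixing the username into the salt (so that two users with the same password get different hashes) are not part of the scheme as described.

```python
import hashlib
import sqlite3

# Hypothetical application-wide salt (assumption, not from the question).
APP_SALT = b"example-app-salt"

def user_hash(username: str, password: str) -> str:
    """Derive the key for the purchases table from the login credentials."""
    # Assumption: the username is mixed into the salt so identical passwords
    # do not map to the same user_hash.
    salt = APP_SALT + username.encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000).hex()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_hash TEXT, item TEXT, price REAL)")

def add_purchase(username: str, password: str, item: str, price: float) -> None:
    conn.execute(
        "INSERT INTO purchases VALUES (?, ?, ?)",
        (user_hash(username, password), item, price),
    )

def purchases_for(username: str, password: str) -> list:
    rows = conn.execute(
        "SELECT item, price FROM purchases WHERE user_hash = ? ORDER BY rowid",
        (user_hash(username, password),),
    )
    return rows.fetchall()

add_purchase("alice", "correct horse", "Beer", 5.00)
print(purchases_for("alice", "correct horse"))  # [('Beer', 5.0)]
print(purchases_for("alice", "wrong guess"))    # []
```

Note that with this kind of derivation a password change also changes user_hash, so the existing rows would have to be re-keyed while the old password is still known.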

Given enough users, it should be almost impossible to tell how much money a particular user has spent by just knowing his name.

Is this a sensible thing to do, or am I completely foolish?

Rene Saarsoo asked Sep 11 '10 14:09





2 Answers

I'm afraid that if your application can link a person to their data, any developer/admin can too.

The only thing you can do is make the link harder to establish, to slow the developer/admin down. But if you make it harder to link users to data, you make it harder for your server too.


Idea based on @no's idea:

You can have a classic user/password login to your application (hashed password, or whatever), and a special "pass" used to keep your data secure. This "pass" wouldn't be stored in your database.

When your client logs in to your application, they would have to provide user/password/pass. The user/password is checked against the database, and the pass is used to load/write data.

When you need to write data, you make a hash of the "username/pass" pair and store it as the key linking your client to their data.

When you need to load data, you make a hash of the "username/pass" pair and load every row matching this hash.

This way, nothing stored in the database links your data to your user.

On the other hand (as I said in a comment to @no), beware of collisions. Also, if your user types the wrong "pass", you can't detect it.


Update: for the last part, I had another idea. You can store a hash of the "pass/password" pair in your database, so you can check whether the "pass" is correct.
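
A rough sketch of both hashes, assuming PBKDF2 as the hash function and hypothetical helper names (`data_key`, `pass_verifier`, `check_pass`) that I made up for illustration. Each part of a pair is length-prefixed so that ("ab", "c") and ("a", "bc") never hash the same input:

```python
import hashlib
import hmac

def _hash_pair(purpose: str, first: str, second: str) -> str:
    """Hash two strings together; `purpose` keeps the two uses apart."""
    parts = (first.encode("utf-8"), second.encode("utf-8"))
    # Length-prefix each part so the boundary between them is unambiguous.
    material = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hashlib.pbkdf2_hmac("sha256", material, purpose.encode("utf-8"), 100_000).hex()

def data_key(username: str, secret_pass: str) -> str:
    """Key stored next to the data rows; derived from username/pass."""
    return _hash_pair("data-key", username, secret_pass)

def pass_verifier(secret_pass: str, password: str) -> str:
    """Value stored in the user record; derived from pass/password."""
    return _hash_pair("pass-check", secret_pass, password)

def check_pass(stored_verifier: str, secret_pass: str, password: str) -> bool:
    """Detect a mistyped pass before loading or writing any data."""
    return hmac.compare_digest(stored_verifier, pass_verifier(secret_pass, password))

stored = pass_verifier("blue horse", "login-password")
print(check_pass(stored, "blue horse", "login-password"))  # True
print(check_pass(stored, "blue hrose", "login-password"))  # False
```

One trade-off to keep in mind: anyone who has the stored verifier and knows the login password can try guesses for the "pass" offline, so the pass should not be easy to guess.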

Colin Hebert answered Sep 18 '22 23:09



  1. Create a users table with:
    1. user_id: an identity column (auto-generated id)
    2. username
    3. password: make sure it's hashed!
  2. Create a product table like in your example:
    1. user_hash
    2. item
    3. price
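
As a sketch, the two tables might be declared like this (SQLite through Python here; the table name `purchases`, the column types and the index are my assumptions). The point is that the second table has no user_id column and no foreign key back to users:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-generated identity
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL                       -- a password hash, never plaintext
);
CREATE TABLE purchases (
    user_hash TEXT NOT NULL,     -- computed from user_id outside the database
    item      TEXT NOT NULL,
    price     NUMERIC NOT NULL
);
CREATE INDEX purchases_by_user ON purchases (user_hash);
""")
```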

The user_hash will be based off of user_id which never changes. Username and password are free to change as needed. When the user logs in, you compare username/password to get the user_id. You can send the user_hash back to the client for the duration of the session, or an encrypted/indirect version of the hash (could be a session ID, where the server stores the user_hash in the session).

Now you need a way to hash the user_id into user_hash and keep it protected.

  1. If you do it client-side as @no suggested, the client needs to have user_id. Big security hole (especially if it's a web app): the hash can easily be tampered with and the algorithm is freely available to the public.
  2. You could have it as a function in the database. Bad idea, since the database has all the pieces to link the records.
  3. For web sites or client/server apps you could have it on your server-side code. Much better, but then one developer has access to the hashing algorithm and data.
  4. Have another developer write the hashing algorithm (which you don't have access to) and stick it on another server (which you also don't have access to) as a TCP/web service. Your server-side code would then pass the user ID and get a hash back. You wouldn't have the algorithm, but you could send all the user IDs through to get all their hashes back. Not much benefit over #3, though the service could have logging and such to try to minimize the risk.
  5. If it's simply a client-database app, you only have choices #1 and #2. I would strongly suggest adding another [business] layer that is server-side, separate from the database server.
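
For option 3, the server-side function could be as small as the sketch below. One substitution on my part: instead of a plain hash I use a keyed hash (HMAC), because user_id values are small integers and an unkeyed hash of them can be recomputed by anyone who knows the algorithm. The environment variable name `USER_HASH_KEY` and the fallback key are hypothetical.

```python
import hashlib
import hmac
import os

# Assumption: the key exists only in the application server's environment,
# never in the database. The fallback value is for local experiments only.
SERVER_KEY = os.environ.get("USER_HASH_KEY", "dev-only-key").encode("utf-8")

def user_hash(user_id: int) -> str:
    """Map a stable user_id to the opaque key used in the purchases table."""
    message = str(user_id).encode("ascii")
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()

print(user_hash(42) == user_hash(42))  # True: stable across logins
print(user_hash(42) == user_hash(43))  # False
```

Whoever can read the key and the users table can still recompute every user_hash, which is the weakness described in #3.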

Edit: This overlaps some of the previous points. Have 3 servers:

  • Authentication server: Employee A has access. Maintains user table. Has web service (with encrypted communications) that takes user/password combination. Hashes password, looks up user_id in table, generates user_hash. This way you can't simply send all user_ids and get back the hashes. You have to have the password, which isn't stored anywhere and is only available during the authentication process.
  • Main database server: Employee B has access. Only stores user_hash. No user_id, no passwords. You can link the data using the user_hash, but the actual user info is somewhere else.
  • Website server: Employee B has access. Gets login info, passes it to the authentication server, gets the hash back, then discards the login info. Keeps the hash in the session for writing/querying to the database.

So Employee A has user_id, username, password and algorithm. Employee B has user_hash and data. Unless employee B modifies the website to store the raw user/password, he has no way of linking to the real users.
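
To make the split concrete, here is a toy in-process model of the three servers. Everything about it is an assumption for illustration: the class and method names, PBKDF2 for passwords, HMAC over user_id for user_hash, and plain Python objects standing in for what would be separate machines talking over encrypted connections.

```python
import hashlib
import hmac
import os
from typing import Optional


class AuthServer:
    """Employee A: users, password hashes and the user_hash algorithm."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._users = {}  # username -> (user_id, salt, password_hash)

    @staticmethod
    def _password_hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def register(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        user_id = len(self._users) + 1
        self._users[username] = (user_id, salt, self._password_hash(password, salt))

    def login(self, username: str, password: str) -> Optional[str]:
        """Return user_hash for valid credentials, otherwise None."""
        record = self._users.get(username)
        if record is None:
            return None
        user_id, salt, stored = record
        if not hmac.compare_digest(stored, self._password_hash(password, salt)):
            return None
        return hmac.new(self._secret, str(user_id).encode("ascii"), hashlib.sha256).hexdigest()


class DataServer:
    """Employee B: purchases keyed by user_hash only."""

    def __init__(self) -> None:
        self._rows = []  # (user_hash, item, price)

    def add(self, user_hash: str, item: str, price: float) -> None:
        self._rows.append((user_hash, item, price))

    def list_for(self, user_hash: str) -> list:
        return [(item, price) for h, item, price in self._rows if h == user_hash]


class Website:
    """Employee B: forwards credentials, keeps only user_hash in the session."""

    def __init__(self, auth: AuthServer, data: DataServer) -> None:
        self._auth, self._data = auth, data
        self.session = {}

    def login(self, username: str, password: str) -> bool:
        user_hash = self._auth.login(username, password)
        if user_hash is None:
            return False
        self.session["user_hash"] = user_hash  # credentials are not kept
        return True

    def buy(self, item: str, price: float) -> None:
        self._data.add(self.session["user_hash"], item, price)

    def my_purchases(self) -> list:
        return self._data.list_for(self.session["user_hash"])
```

A session would then go `site.login("alice", "pw")`, `site.buy("Beer", 5.0)`, `site.my_purchases()`, with the data server never seeing a username or user_id.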

Using SQL profiling, Employee A would get user_id, username and password hash (since user_hash is generated later in code). Employee B would get user_hash and data.

Nelson Rothermel answered Sep 20 '22 23:09
