
How do I securely manage one-off scripts to fix data? [closed]

Occasionally, things go wrong with software. When things go wrong, one of the worst things that can happen is that the system leaves some data in an inconsistent or invalid state. Of course we try to reduce these cases, but they do happen.

When they happen, we often must take corrective action in the form of some data cleanup. (In addition to hardening the code that allowed the inconsistency.) Some of the techniques I've seen for that cleanup include

  1. editing the production database directly
  2. using a REPL like Rails's rails console to edit the production data through code
  3. writing a script and then running it
  4. writing a Rails migration and then running it

The script and migration versions have a couple of advantages over the direct-access versions:

  1. the team can use pull-requests or other code-review techniques to ensure that the script will operate as expected
  2. the script stays behind as a record of what the team did
  3. the script can be run in non-production environments or in a "dry-run" mode
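
As an illustration of the third point, here is a minimal sketch of a dry-run switch in plain Ruby. The `Token` struct and the `delete_tokens` method are hypothetical stand-ins for real models, not part of any library; a real script would wrap its ActiveRecord calls in the same kind of flag.

```ruby
# Hypothetical stand-in for a database table; a real script would use models.
Token = Struct.new(:id, :user_id)

# Removes the tokens that belong to user_id from the tokens array.
# With dry_run: true (the default) nothing is changed; the method only
# reports which records would be removed. Returns the affected ids.
def delete_tokens(tokens, user_id, dry_run: true)
  doomed = tokens.select { |t| t.user_id == user_id }
  if dry_run
    puts "[dry-run] would delete #{doomed.size} token(s): #{doomed.map(&:id).inspect}"
  else
    tokens.reject! { |t| t.user_id == user_id }
    puts "deleted #{doomed.size} token(s)"
  end
  doomed.map(&:id)
end

tokens = [Token.new(1, 42), Token.new(2, 42), Token.new(3, 7)]
delete_tokens(tokens, 42)                  # reports only; tokens is unchanged
delete_tokens(tokens, 42, dry_run: false)  # actually removes the records
```

Defaulting to the dry run means a reviewer or operator has to opt in explicitly before any data is changed.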

Unfortunately, if checked in, these scripts also have a significant security disadvantage: they leak PII or other sensitive data from the database into source control. For example,

    # 2013-07-05: delete all of Susan Yee's OAuth tokens because she's locked out
    User.find_by_email('[email protected]').oauth_tokens.delete_all

I can think of a few ways to mitigate this risk:

  1. prefer opaque identifiers in the script. I'm for this, but (a) it's not always possible and (b) it also obfuscates exactly what is happening, reducing the value of the audit log.
  2. move all source-control inside your firewall. This is definitely more secure, but means the team can't take advantage of many cloud-based tools (e.g. Code Climate, TravisCI)
  3. separate the scripts from the seed data. Write the scripts in a parameterized fashion even though they will only be used once. This keeps the sensitive data out of source-control, but it means the record of what was done is harder to piece together.
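
One possible way to soften the trade-off between options 1 and 3 is to record a keyed fingerprint of the sensitive value instead of the value itself. The sketch below is an illustration under assumptions, not a proven technique: `fingerprint`, `audit_line`, and the `AUDIT_KEY` environment variable are hypothetical names, and the key would have to be kept outside source control.

```ruby
require "openssl"
require "json"
require "time"

# Returns a keyed (HMAC-SHA256) fingerprint of a sensitive value.
# Without the key, the fingerprint cannot be checked against guessed addresses.
def fingerprint(value, key)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), key, value.downcase)
end

# Builds one audit record (a JSON line) that names the action but not the person.
def audit_line(action, email, key, at: Time.now.utc)
  JSON.generate({ at: at.iso8601, action: action, subject: fingerprint(email, key) })
end

key = ENV.fetch("AUDIT_KEY", "dev-only-key") # hypothetical variable name
puts audit_line("delete_oauth_tokens", "user@example.com", key)
```

Anyone holding the key can later recompute the fingerprint for a given address and match it against the record, while the checked-in or logged line contains no PII by itself.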

Does anyone have a proven technique?

James A. Rosen asked Nov 01 '22 10:11


1 Answer

I'll start by saying that the corrective actions I've had to take in my work have never been frequent or complex enough to require dedicated scripts, so I've always stopped at points 1 and 2 of your list (editing the database directly or using the Rails console).

Anyway, here are a couple of solutions that come to mind:

  1. Make the script general, so that it covers the case you want to fix without hard-coding "ad personam" data. Your example (repeated here for completeness):

    # Delete all of Susan Yee's OAuth tokens because she's locked out
    User.find_by_email('[email protected]').oauth_tokens.delete_all
    

    becomes something like:

    # Delete the OAuth tokens of all locked-out users
    # (assumes a `locked_out` scope is defined on User)
    User.locked_out.find_each { |u| u.oauth_tokens.delete_all }
    

    The advantage is that the script is driven by application state and covers the general case; the (big) disadvantage is that you don't always have a state you can query to find the data you want to fix, and in those "ad personam" cases this solution doesn't apply.

  2. Pass sensitive data via arguments / environment variables / git-ignored files

    This is what I do when I have to store per-app information such as the Rails secret key: I put it in a git-ignored file (something like config/.secret_key).

    But a git-ignored file is for persistent data; since you're dealing with one-off scripts, I think arguments are a good solution:

    # Delete all OAuth tokens of the user whose email is passed as the first argument
    User.find_by_email(ARGV[0]).oauth_tokens.delete_all
    

    Then run history -c to clear your shell history, so the address isn't exposed even if someone manages to get hold of your history (although you probably have bigger problems in that case :P )

    This solution isn't a good fit if you also need to keep a record of the edits you made; I don't think that's your case, though, and in any event the log files should contain information about what you did. If you do have this need, it's probably better to use some database versioning logic.
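
The argument and environment-variable approach from point 2 could be sketched in plain Ruby roughly as follows. The `target_email` helper, the `fix_tokens.rb` script name, and the `TARGET_EMAIL` variable are hypothetical names chosen for illustration.

```ruby
# Returns the e-mail address to operate on, taken from the first
# command-line argument or, failing that, from TARGET_EMAIL
# (a hypothetical environment variable name).
def target_email(argv, env)
  email = (argv.first || env["TARGET_EMAIL"]).to_s.strip
  raise ArgumentError, "usage: fix_tokens.rb EMAIL (or set TARGET_EMAIL)" if email.empty?
  email
end

# In the real one-off script this would be: email = target_email(ARGV, ENV)
puts target_email(["someone@example.com"], {})                # from the argument
puts target_email([], { "TARGET_EMAIL" => "a@example.com" })  # from the environment
```

Failing loudly when no address is supplied keeps the script from calling a lookup with nil, and the checked-in file never contains the address itself.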

mdesantis answered Nov 15 '22 03:11