How do I securely manage one-off scripts to fix data? [closed]

Question

Occasionally, things go wrong with software. When things go wrong, one of the worst things that can happen is that the system leaves some data in an inconsistent or invalid state. Of course we try to reduce these cases, but they do happen.

When they happen, we often must take corrective action in the form of some data cleanup. (In addition to hardening the code that allowed the inconsistency.) Some of the techniques I've seen for that cleanup include

editing the production database directly
using a REPL such as the Rails console (rails console) to edit the production data through code
writing a script and then running it
writing a Rails migration and then running it

The script and migration versions have a couple of advantages over the direct-access versions:

the team can use pull-requests or other code-review techniques to ensure that the script will operate as expected
the script stays behind as a record of what the team did
the script can be run in non-production environments or in a "dry-run" mode
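
To make the "dry-run" point concrete, here is a minimal sketch of a fix script with a dry-run switch. It is plain Ruby so it runs without Rails: TokenStore and fix_tokens are hypothetical stand-ins I made up for illustration, not part of any real app or library.

```ruby
# Hypothetical in-memory stand-in for the real model layer.
TokenStore = Struct.new(:tokens) do
  # Removes every token belonging to user_id and returns how many were removed.
  def delete_for(user_id)
    removed, kept = tokens.partition { |t| t[:user_id] == user_id }
    self.tokens = kept
    removed.size
  end

  # Counts tokens belonging to user_id without changing anything.
  def count_for(user_id)
    tokens.count { |t| t[:user_id] == user_id }
  end
end

# Returns a one-line report; the store is modified only when dry_run is false.
def fix_tokens(store, user_id, dry_run: true)
  if dry_run
    "DRY RUN: would delete #{store.count_for(user_id)} token(s) for user #{user_id}"
  else
    "Deleted #{store.delete_for(user_id)} token(s) for user #{user_id}"
  end
end

store = TokenStore.new([
  { user_id: 42, value: 'a' },
  { user_id: 42, value: 'b' },
  { user_id: 7,  value: 'c' }
])
puts fix_tokens(store, 42)                 # reports only, nothing is deleted
puts fix_tokens(store, 42, dry_run: false) # performs the deletion
```

Defaulting dry_run to true means a reviewer or an accidental run has to opt in explicitly before anything is changed.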

Unfortunately, if checked in, these scripts also have a significant security disadvantage: they leak PII or other sensitive data from the database into source control. For example,

# 2013-07-05: delete all of Susan Yee's OAuth tokens because she's locked out
User.find_by_email('susan.yee@example.com').oauth_tokens.delete_all

I can think of a couple solutions to this risk:

prefer opaque identifiers in the script. I'm for this, but (a) it's not always possible and (b) it also obfuscates exactly what is happening, reducing the value of the audit log.
move all source-control inside your firewall. This is definitely more secure, but means the team can't take advantage of many cloud-based tools (e.g. Code Climate, TravisCI)
separate the scripts from the seed data. Write the scripts in a parameterized fashion even though they will only be used once. This keeps the sensitive data out of source-control, but it means the record of what was done is harder to piece together.
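
One way to soften the trade-off in the first and third options might be to record a keyed digest of the identifier instead of the identifier itself. The sketch below is only an illustration under my own assumptions: audit_ref, audit_line, the AUDIT_KEY variable and the 'dev-only-key' fallback are all hypothetical names, and the key would have to live outside source control for this to help.

```ruby
require 'openssl'

# Returns a keyed digest (HMAC-SHA256) of the normalized email. The same key
# and email always give the same value, so someone holding the key can later
# check which user an entry refers to; the email itself is not stored.
def audit_ref(email, key)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, email.strip.downcase)
end

# Builds a record line that can be committed; it contains only a shortened digest.
def audit_line(action, email, key, on:)
  "#{on} #{action} user_ref=#{audit_ref(email, key)[0, 16]}"
end

# AUDIT_KEY is a hypothetical environment variable; the fallback is for local runs only.
key = ENV.fetch('AUDIT_KEY', 'dev-only-key')
puts audit_line('delete_oauth_tokens', 'susan.yee@example.com', key, on: '2013-07-05')
```

The committed record then shows what was done and when, while matching user_ref back to a person requires both the key and access to the candidate emails.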

Does anyone have a proven technique?

mdesantis · Accepted Answer

I'll start by saying that the corrective actions I've had to take in my work have never been frequent or complex enough to require dedicated scripts, so I've always stopped at the first two techniques you listed (editing the database directly or using the Rails console).

Anyway, here are a couple of solutions that come to mind:

Make the script general, so that it covers the case you want to fix without naming specific ("ad personam") data. Your example (repeated here for completeness):
```
# Delete all of Susan Yee's OAuth tokens because she's locked out
User.find_by_email('susan.yee@example.com').oauth_tokens.delete_all
```
becomes something like:
```
# Delete the OAuth tokens of every locked-out user
# (assumes User defines a locked_out scope)
User.locked_out.each { |u| u.oauth_tokens.delete_all }
```
The advantage is that the script is based on logic and covers the general case. The (big) disadvantage is that you don't always have a state you can query to find the data you want to edit, and for those "ad personam" cases this solution is not applicable.

Pass sensitive data via arguments, environment variables, or git-ignored files

This is what I use when I have to store per-app information such as the Rails secret key: I put it in a git-ignored file (something like config/.secret_key).

But a git-ignored file is meant for persistent data; since you're dealing with one-off scripts, I think arguments are a good solution:
```
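# Delete all OAuth tokens of the user whose email is passed as the first argument
# (ARGV.fetch raises if the argument is missing instead of looking up nil)
User.find_by_email(ARGV.fetch(0)).oauth_tokens.delete_all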
```
and then run history -c to clear your shell history, so you're safe even if someone manages to get hold of it (although you probably have bigger problems in that case :P )
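
For what it's worth, the argument handling can be pulled into a small helper that fails loudly when nothing was passed. This is just a sketch: target_email and the TARGET_EMAIL environment variable are names I invented for the example, and the calls at the bottom use literal arrays and hashes in place of the real ARGV and ENV so the snippet runs on its own.

```ruby
# Resolves the target email from the first CLI argument, falling back to a
# hypothetical TARGET_EMAIL environment variable. Raises ArgumentError when
# neither is present, so the script never continues with a nil lookup key.
def target_email(argv, env)
  email = argv.first || env['TARGET_EMAIL']
  if email.nil? || email.strip.empty?
    raise ArgumentError, 'pass the email as the first argument or set TARGET_EMAIL'
  end
  email.strip
end

# In a real script these would be target_email(ARGV, ENV).
puts target_email(['susan.yee@example.com'], {})
puts target_email([], { 'TARGET_EMAIL' => ' susan.yee@example.com ' })
```

Keep in mind that a value typed on the command line, whether as an argument or as an inline environment variable, still ends up in the shell history, which is why clearing it afterwards matters.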

This solution isn't well suited if you also need to keep a record of the edits you made; I don't think that is your case, though, and in any event the log files should contain information about what you did. If you do have that need, it is probably better to use some form of database versioning.

Tags:

security

ruby

debugging

ruby-on-rails

James A. Rosen

1 Answers

mdesantis
