How would you go about keeping sensitive information out of log files? Yes, you can consciously choose not to log sensitive bits of information in the first place, but there are general cases where you blindly log error messages upon failures, or trace messages while investigating a problem, and end up with sensitive information landing in your log files.
For example, you could be trying to insert an order record that contains the credit card number of a customer into the database. Upon a database failure, you may want to log the SQL statement that was just executed. You would then end up with the credit card number of the customer in a log file.
Is there a design paradigm that can be employed to "tag" certain bits of information as sensitive so that a generic logging pipeline can filter them out?
Redact and mask data. Besides tokenization, combining redaction and masking is another effective method to keep sensitive data out of your logs. Some application services may need partial access to data, like the last four digits of a credit card number or social security number (SSN).
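As a rough illustration of masking, here is a minimal Java sketch that keeps only the last four characters of a value visible. The class name `Masker` and the method `maskAllButLast4` are made up for this example, and the choice to mask short values completely is an assumption, not something the text above prescribes.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of any library.
final class Masker {
    private Masker() {}

    /**
     * Replaces every character except the last four with '*'.
     * Values of four characters or fewer are masked completely,
     * so that short secrets are never shown in full.
     */
    static String maskAllButLast4(String value) {
        if (value == null) {
            return null;
        }
        char[] chars = value.toCharArray();
        int keep = chars.length > 4 ? 4 : 0;
        Arrays.fill(chars, 0, chars.length - keep, '*');
        return new String(chars);
    }
}
```

A call such as `Masker.maskAllButLast4(cardNumber)` would then be used at the point where the value is written to the log, while the unmasked value stays in memory only.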
How can I protect sensitive data? Encryption is one of the most effective ways to protect your data from unauthorized access. Encryption can be defined as transforming the data into an alternative format that can only be read by someone with access to a decryption key.
Accessibility of sensitive data. For example, frequently used sensitive data is best stored on a high-speed medium, such as an HDD or SSD. If the storage media are in a data center, they are much easier to monitor for security and unauthorized access than if the storage media are in a cloud environment.
My current practice for the case in question is to log a hash of such sensitive information. This enables us to identify log records that belong to a specific claim (for example a specific credit-card number) but does not give anybody the power to just grab the logs and use the sensitive information for their evil purposes.
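A minimal sketch of this idea in Java follows. Note one deliberate change from a plain hash: values with a small search space, such as card numbers, can be recovered from an unkeyed hash by brute force, so this sketch uses a keyed hash (HMAC-SHA256) instead. The class name `LogHasher` and the way the secret is supplied are assumptions for illustration only.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

// Hypothetical helper for illustration; not part of any library.
final class LogHasher {
    private final SecretKeySpec key;

    // The secret should come from configuration, not from source code.
    LogHasher(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    /** Returns a stable hex fingerprint that can be logged in place of the value. */
    String fingerprint(String sensitive) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] digest = mac.doFinal(sensitive.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }
}
```

Because the same input always produces the same fingerprint under a given key, you can still search the logs for all records that relate to one card number by computing its fingerprint first.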
Of course, doing this consistently involves good coding practices. I usually choose to log all objects using their toString overrides (in Java or .NET), which serialize the hash of the value for any field marked with a Sensitive attribute.
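One way this could look in Java is sketched below. The `Sensitive` annotation, the `SafeToString` helper and the `Order` class are all hypothetical names written for this example; the answer does not show its actual implementation, and an unkeyed SHA-256 is used here only to keep the sketch short.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical marker annotation; must be retained at runtime for reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Sensitive {}

// Hypothetical helper that builds a log-safe string for any object.
final class SafeToString {
    private SafeToString() {}

    static String describe(Object obj) {
        StringBuilder sb = new StringBuilder(obj.getClass().getSimpleName()).append('{');
        String sep = "";
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip constants and compiler-generated fields
            }
            f.setAccessible(true);
            Object value;
            try {
                value = f.get(obj);
            } catch (IllegalAccessException e) {
                value = "<inaccessible>";
            }
            // Fields tagged @Sensitive are replaced by a hash of their value.
            if (value != null && f.isAnnotationPresent(Sensitive.class)) {
                value = "sha256:" + sha256Hex(String.valueOf(value));
            }
            sb.append(sep).append(f.getName()).append('=').append(value);
            sep = ", ";
        }
        return sb.append('}').toString();
    }

    private static String sha256Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }
}

// Example domain object: only cardNumber is tagged as sensitive.
final class Order {
    private final String orderId;
    @Sensitive private final String cardNumber;

    Order(String orderId, String cardNumber) {
        this.orderId = orderId;
        this.cardNumber = cardNumber;
    }

    @Override
    public String toString() {
        return SafeToString.describe(this);
    }
}
```

With this in place, any generic `log.error("Failed to save " + order)` call prints the order id as-is but only a hash of the card number. The sketch only inspects fields declared on the class itself, not inherited ones.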
Of course, SQL strings are more problematic, but we rely on our ORM for data persistence and log the state of the system at various stages rather than logging SQL queries, so it becomes a non-issue.
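If raw SQL or other free-form text does end up being logged, a last-resort scrubber can be applied to the message before it is written. This is a different technique from the hashing described above, and the sketch below is only a heuristic: the class name `LogScrubber` is made up, and the pattern assumes card numbers appear as 13 to 19 contiguous digits, so it will miss numbers written with spaces or dashes and may also redact other long numeric identifiers.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
final class LogScrubber {
    // Assumption: a card number is a run of 13 to 19 digits with no separators.
    private static final Pattern CARD_LIKE = Pattern.compile("\\b\\d{13,19}\\b");

    private LogScrubber() {}

    /** Replaces anything that looks like a card number with a fixed marker. */
    static String scrub(String message) {
        if (message == null) {
            return null;
        }
        return CARD_LIKE.matcher(message).replaceAll("[REDACTED]");
    }
}
```

A scrubber like this would typically be called from a single place in the logging pipeline, such as a custom formatter or appender, so that individual call sites do not have to remember it.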
I would personally regard the log files themselves as sensitive information and make sure to restrict access to them.
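As one small example of restricting access, the sketch below sets a log file's permissions so that only its owner can read or write it. It assumes a POSIX file system (on Windows this call throws `UnsupportedOperationException`), and the class name `LogFilePermissions` is hypothetical. File permissions are only one layer; access to log aggregation systems and backups needs the same care.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper for illustration; assumes a POSIX file system.
final class LogFilePermissions {
    private LogFilePermissions() {}

    /** Makes the file readable and writable by its owner only (mode 600). */
    static void restrictToOwner(Path logFile) throws IOException {
        Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rw-------");
        Files.setPosixFilePermissions(logFile, ownerOnly);
    }
}
```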