
Where to store metadata associated with files?

Tags:

python

This is a question on storing and loading data, particularly in Python. I'm not entirely sure this is the appropriate forum, so redirect me if not.

I'm handling about 50 CSV files of 1000 rows each, and each file has 10 parameters of associated metadata. What is the best way to store this so that both of the following hold?

(A) All the information is human-readable plain text, and it's easy for a non-programmer to associate data with its metadata.

(B) It's convenient to load the metadata and each column of the CSV into a Python dictionary.

I've considered four possible solutions:

(0) Previously, I've stored smaller amounts of metadata in the filename. This is bad for obvious reasons.

(1) Assign each CSV file an ID number, name each file "ID.csv", and then produce a "metadata.csv" which maps each CSV ID number to its metadata. The shortcoming here is that ID numbers reduce human readability: to learn the contents of a file, a non-programmer must manually look it up in "metadata.csv".

(2) Leave the metadata at the top of the CSV file. The shortcoming is that my program would need to perform two steps: (a) get the metadata from some arbitrary number of lines at the top of the file and (b) tell the CSV reader (pandas.read_csv) to ignore the first few lines.

(3) Convert the CSV to some data serialization format like YAML, where I could then easily include the metadata. The shortcomings are that loading the columns of the CSV into my dictionary is no longer straightforward, and not everyone knows YAML.
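
For reference, the two steps in option (2) might look something like the sketch below. It assumes that metadata lines start with "#" and use a "key: value" layout; both conventions, and the function name, are made up for illustration. It uses only the standard library and leaves every value as a string.

```python
import csv
import io


def read_csv_with_header_metadata(stream, prefix="#"):
    """Split a text stream into (metadata dict, dict of column lists).

    Hypothetical helper: any line starting with `prefix` is treated as
    a "key: value" metadata line; every other non-blank line is CSV.
    """
    metadata = {}
    data_lines = []
    for line in stream:
        if line.startswith(prefix):
            # Metadata line, e.g. "# name: foo"
            key, _, value = line[len(prefix):].partition(":")
            metadata[key.strip()] = value.strip()
        elif line.strip():
            data_lines.append(line)

    # csv.DictReader accepts any iterable of lines, not only file objects.
    reader = csv.DictReader(data_lines)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return metadata, columns


# Small in-memory example standing in for one of the CSV files.
sample = io.StringIO(
    "# name: foo\n"
    "# status: bar\n"
    "x,y\n"
    "1,2\n"
    "3,4\n"
)
metadata, columns = read_csv_with_header_metadata(sample)
```

pandas.read_csv also has a comment parameter that skips lines starting with a given character, which would cover step (b), but it would not collect the metadata for step (a).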

Are there any clever solutions to this problem? Thanks!

user126350 asked Nov 01 '22 04:11


1 Answer

This question is a tad subjective, so it may be closed, but let me suggest Python's built-in json module. JSON strikes a good balance between human readability and portability, since almost any language can read it. You could convert your original data into something like this:

{ 
  "metadata":{"name":"foo", "status":"bar"},
  "data":[[1,2,3],[4,5,6],[....]]
}

where data holds the rows of your original CSV file and metadata is a dictionary containing whatever information you would like to store. It is also simple to "strip" the metadata out and recover the original CSV data from this format - all within the confines of built-in Python modules.
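
A minimal sketch of that round trip, using only the built-in csv and json modules, might look like the following. The function names and the sample data are hypothetical, and every CSV cell is kept as a string rather than converted to a number.

```python
import csv
import io
import json


def csv_to_json_text(csv_text, metadata):
    """Wrap CSV rows and a metadata dict in a single JSON document."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return json.dumps({"metadata": metadata, "data": rows}, indent=2)


def json_text_to_csv(json_text):
    """Strip the metadata back out and rebuild the CSV text."""
    doc = json.loads(json_text)
    out = io.StringIO()
    # lineterminator="\n" keeps the output identical to the sample input.
    csv.writer(out, lineterminator="\n").writerows(doc["data"])
    return doc["metadata"], out.getvalue()


# Illustrative data only: a header row followed by two data rows.
original_csv = "a,b\n1,2\n3,4\n"
json_text = csv_to_json_text(original_csv, {"name": "foo", "status": "bar"})
restored_metadata, restored_csv = json_text_to_csv(json_text)
```

Here the first row of data is the CSV header; storing the columns as a dictionary of lists instead would work just as well if that is the shape you want to load.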

Hooked answered Nov 05 '22 00:11