DataFrame.to_hdf(path_or_buf, key, **kwargs)
The official pandas documentation says that key is the identifier for the group in the store.
But what does that mean? I cannot find sufficient examples of it. I have tried some arbitrary values for the key parameter, but I didn't see any difference between them. The API reference can be quite ambiguous at times. Can anyone offer some examples to help me better understand the key parameter?
We can read data from a text file using read_table() in pandas. This function reads a general delimited file into a DataFrame object. It is essentially the same as read_csv(), except that the default delimiter is a tab ('\t') instead of a comma.
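As a rough illustration of that equivalence, the sketch below parses the same tab-separated text with both functions. The sample text and variable names are made up for this example, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample, kept in memory instead of on disk.
tsv_text = "name\tscore\nann\t1\nbob\t2\n"

# read_table() splits on tabs by default.
from_table = pd.read_table(io.StringIO(tsv_text))

# read_csv() needs the separator spelled out to parse the same text.
from_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(from_table.equals(from_csv))
```

Both calls should yield the same two-column DataFrame.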
pandas mean() key points: by default, mean() ignores NaN values and computes the mean along the index axis (axis=0), which gives one mean per column.
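A small sketch of those defaults, using a made-up DataFrame with one missing value. The column names and numbers are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "x" contains one NaN.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, 6.0]})

# Default: axis=0 (one mean per column) and skipna=True (NaN ignored).
col_means = df.mean()

# With skipna=False, any NaN in a column makes that column's mean NaN.
strict_means = df.mean(skipna=False)

# axis=1 computes one mean per row instead.
row_means = df.mean(axis=1)

print(col_means)
print(strict_means)
print(row_means)
```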
A workaround could be to split the large pandas DataFrame into N smaller ones, each under 2 GB (horizontal partitioning), create a Spark DataFrame from each, and then union them into a single DataFrame to write to HDFS.
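The pandas half of that workaround might look like the sketch below. It splits by row count rather than by byte size, and the helper name split_rows and the max_rows value are hypothetical. The Spark half is left out so the example runs without a cluster; in PySpark each part would typically be passed to SparkSession.createDataFrame and the results combined with DataFrame.union.

```python
import pandas as pd


def split_rows(df, max_rows):
    """Split df into consecutive row chunks of at most max_rows rows each."""
    return [df.iloc[start:start + max_rows] for start in range(0, len(df), max_rows)]


# Hypothetical stand-in for the large DataFrame.
big = pd.DataFrame({"id": range(10), "value": [i * 0.5 for i in range(10)]})

# max_rows is an assumed threshold; choose it so each part stays under the size limit.
parts = split_rows(big, max_rows=4)

print([len(part) for part in parts])
```

Concatenating the parts with pd.concat gives back the original rows, which is a quick way to check that nothing was lost in the split.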
In pandas to_hdf, the 'key' parameter is the name of the object you are storing in the HDF5 file. You can store multiple objects (DataFrames) in a single HDF5 file. For instance, you can store DataFrame 'xyz' and DataFrame 'abc' in the same file; you would pass key='xyz' when writing 'xyz' and key='abc' when writing 'abc', and use the same keys to read each one back.
The 'key' is basically whatever name you want to give the specific object you are storing. It works like a key in a dictionary.
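Here is a minimal sketch of that idea, assuming the optional PyTables package (tables) is installed, since pandas needs it for HDF5 support. The file name store.h5, the temporary directory, and the sample data are made up for this example.

```python
import os
import tempfile

import pandas as pd

# Two hypothetical DataFrames to store in one file.
xyz = pd.DataFrame({"a": [1, 2, 3]})
abc = pd.DataFrame({"b": [10.0, 20.0]})

# Write into a temporary directory so the example leaves no file behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "store.h5")

# mode="w" creates a fresh file; mode="a" appends a second object to it.
xyz.to_hdf(path, key="xyz", mode="w")
abc.to_hdf(path, key="abc", mode="a")

# List the keys in the file; pandas reports them with a leading "/".
with pd.HDFStore(path, mode="r") as store:
    stored_keys = sorted(store.keys())

# Read each object back by the key it was stored under.
xyz_back = pd.read_hdf(path, key="xyz")
abc_back = pd.read_hdf(path, key="abc")

print(stored_keys)
```

Writing again with a key that already exists replaces that object in the default fixed format, which may be why trying different arbitrary keys seems to make no visible difference: each call simply stores the DataFrame under another name in the same file.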