I have a DataFrame with named row and column indexes:
import numpy as np
import pandas as pd
I = pd.Index(["a", "b", "c", "d"], name="rows")
C = pd.Index(["col0", "col1", "col2"], name="cols")
df = pd.DataFrame(data=np.random.rand(4, 3),
                  index=I,
                  columns=C)
I have tried to store it in several formats (Excel, CSV), but when I re-read the file the names are lost (maybe I have missed some options). Msgpack works, but it is marked as experimental, so I would prefer to avoid it for now. I would also prefer to avoid pickle. Is there any way (format and options) to store the names of the two indexes?
EDIT: I know how to write and read CSV with pandas. The problem is saving the names of the column index and of the row index.
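A minimal sketch of the problem, assuming a reasonably recent pandas: a CSV round trip (here through an in-memory io.StringIO buffer instead of a file) keeps the row index name, because to_csv writes it into the first header cell, but a plain CSV has no place for the column index name, so it comes back as None.

```python
import io

import numpy as np
import pandas as pd

I = pd.Index(["a", "b", "c", "d"], name="rows")
C = pd.Index(["col0", "col1", "col2"], name="cols")
df = pd.DataFrame(data=np.random.rand(4, 3), index=I, columns=C)

# Write to an in-memory buffer instead of a file on disk
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)

# Read it back, using the first column as the row index
restored = pd.read_csv(buf, index_col=0)

print(restored.index.name)    # rows  (read back from the header)
print(restored.columns.name)  # None  (not stored in the CSV at all)
```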
You can use HDF5 (DataFrame.to_hdf and pd.read_hdf, which require the PyTables package to be installed).
import numpy as np
import pandas as pd
I = pd.Index(["a", "b", "c", "d"], name="rows")
C = pd.Index(["col0", "col1", "col2"], name="columns")
df = pd.DataFrame(data=np.random.rand(4,3), index=I, columns=C)
print(df)
columns col0 col1 col2
rows
a 0.098497 0.918954 0.642800
b 0.168266 0.678434 0.455059
c 0.434939 0.244027 0.599400
d 0.877356 0.053085 0.182661
df.to_hdf('test.hdf', 'test')
print(pd.read_hdf('test.hdf'))
columns col0 col1 col2
rows
a 0.098497 0.918954 0.642800
b 0.168266 0.678434 0.455059
c 0.434939 0.244027 0.599400
d 0.877356 0.053085 0.182661
You can export the DataFrame to a CSV file using .to_csv() and read it back in using .read_csv(). I extended the code you already had as follows:
#!/usr/bin/env python3
# coding: utf-8
import numpy as np
import pandas as pd
I = pd.Index(["a", "b", "c", "d"], name="rows")
C = pd.Index(["col0", "col1", "col2"], name="cols")
df = pd.DataFrame(data=np.random.rand(4,3), index=I, columns=C)
# export DataFrame to csv
df.to_csv('out.csv')
# set index_col in order to read first column as indices
df_in = pd.read_csv('out.csv', index_col=0)
So the DataFrame df looks like this:
col0 col1 col2
a 0.590016 0.834033 0.535310
b 0.421589 0.897302 0.029500
c 0.373580 0.109005 0.239181
d 0.473872 0.075918 0.751628
The CSV file out.csv looks like this:
,col0,col1,col2
a,0.5900160748408918,0.8340332218911729,0.5353103406507513
b,0.42158899389955884,0.8973015040807538,0.029500416731096046
c,0.37357951184145965,0.10900495955642386,0.2391805787788026
d,0.47387186813644167,0.07591794371425187,0.7516279365972057
Reading the data back in gives the DataFrame df_in as follows:
col0 col1 col2
a 0.590016 0.834033 0.535310
b 0.421589 0.897302 0.029500
c 0.373580 0.109005 0.239181
d 0.473872 0.075918 0.751628
So df_in contains the same data as df, which shows that the export and import work as expected.
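Since a plain CSV file has no slot for the column index name, one simple option, assuming you know that name in advance, is to reattach it after reading with DataFrame.rename_axis. This is only a sketch: the name "cols" is hard-coded here, which is exactly the piece of information the file itself does not carry.

```python
import io

import numpy as np
import pandas as pd

I = pd.Index(["a", "b", "c", "d"], name="rows")
C = pd.Index(["col0", "col1", "col2"], name="cols")
df = pd.DataFrame(data=np.random.rand(4, 3), index=I, columns=C)

buf = io.StringIO()
df.to_csv(buf)  # header row is "rows,col0,col1,col2"
buf.seek(0)

# The row index name comes back from the header; the column index name
# is not in the file, so it is set again by hand (assumed to be known).
df_in = pd.read_csv(buf, index_col=0).rename_axis(columns="cols")
```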
EDIT to export column and index names:
df.to_csv('out.csv', index_label=[df.index.name, df.columns.name])
However, this makes re-importing a bit difficult, since the column index name is added as an additional column header. Normally this is useful for MultiIndex data, but here it leads to an additional empty column.
So I would suggest exporting only the index name:
# export DataFrame to csv
df.to_csv('out.csv', index_label=df.index.name)
# set index_col in order to read first column as indices
df_in = pd.read_csv('out.csv', index_col=0)
which gives df_in as:
col0 col1 col2
rows
a 0.442467 0.959260 0.626502
b 0.639044 0.989795 0.853002
c 0.576137 0.350260 0.532920
d 0.235698 0.095978 0.194151
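If both names really have to survive a plain CSV round trip, one possible workaround (not part of this answer, just a sketch) is to store the data in long format: DataFrame.stack turns the column labels into a second index level, so both axis names end up as index level names in the CSV header, and unstack rebuilds the wide table after reading. Note that unstack may return the row and column labels in sorted order, so the original order is only guaranteed to come back if it was already sorted, as it is here.

```python
import io

import numpy as np
import pandas as pd

I = pd.Index(["a", "b", "c", "d"], name="rows")
C = pd.Index(["col0", "col1", "col2"], name="cols")
df = pd.DataFrame(data=np.random.rand(4, 3), index=I, columns=C)

# Long format: one row per (rows, cols) pair; both axis names become
# the names of the two index levels of the resulting Series.
long = df.stack()

buf = io.StringIO()
long.to_csv(buf)  # the header row holds both level names
buf.seek(0)

# Read both index levels back, take the single value column as a Series,
# and move the "cols" level back into the columns.
df_in = pd.read_csv(buf, index_col=[0, 1]).iloc[:, 0].unstack()

print(df_in.index.name, df_in.columns.name)  # rows cols
```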
I do not know why you need to export the names of both the index and the columns. If you simply want to access the row or column labels, you can get them like this:
column_labels = df.columns.to_numpy()
# array(['col0', 'col1', 'col2'], dtype=object)
index_labels = df.index.to_numpy()
# array(['a', 'b', 'c', 'd'], dtype=object)