How to write/read a Pandas DataFrame with MultiIndex from/to an ASCII file?

Tags:

pandas

I want to create a Pandas DataFrame with a MultiIndex on both the rows and the columns, write it to an ASCII text file, and read it back. My data looks like:

import numpy as np
from pandas import DataFrame, MultiIndex

col_indx = MultiIndex.from_tuples([('A',  'B',  'C'), ('A',  'B',  'C2'), ('A',  'B',  'C3'), 
                                   ('A',  'B2', 'C'), ('A',  'B2', 'C2'), ('A',  'B2', 'C3'), 
                                   ('A',  'B3', 'C'), ('A',  'B3', 'C2'), ('A',  'B3', 'C3'), 
                                   ('A2', 'B',  'C'), ('A2', 'B',  'C2'), ('A2', 'B',  'C3'), 
                                   ('A2', 'B2', 'C'), ('A2', 'B2', 'C2'), ('A2', 'B2', 'C3'), 
                                   ('A2', 'B3', 'C'), ('A2', 'B3', 'C2'), ('A2', 'B3', 'C3')], 
                                   names=['one','two','three']) 
row_indx = MultiIndex.from_tuples([(0,  'North', 'M'), 
                                   (1,  'East',  'F'), 
                                   (2,  'West',  'M'), 
                                   (3,  'South', 'M'), 
                                   (4,  'South', 'F'), 
                                   (5,  'West',  'F'), 
                                   (6,  'North', 'M'), 
                                   (7,  'North', 'M'), 
                                   (8,  'East',  'F'), 
                                   (9,  'South', 'M')], 
                                   names=['n', 'location', 'sex'])
size = (len(row_indx), len(col_indx))  # (number of rows, number of columns)
data = np.random.randint(0, 10, size)
df = DataFrame(data, index=row_indx, columns=col_indx)
print(df)

I've tried df.to_csv() and read_csv() but they don't keep the index.

I was thinking of creating a new format with extra delimiters: for example, a row of ---------------- to mark the end of the column index levels and a | to mark the end of the row index. It would look like this:

one            | A   A   A   A   A   A   A   A   A  A2  A2  A2  A2  A2  A2  A2  A2  A2
two            | B   B   B  B2  B2  B2  B3  B3  B3   B   B   B  B2  B2  B2  B3  B3  B3
three          | C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3
--------------------------------------------------------------------------------------
n location sex :                                                                      
0 North    M   | 2   3   9   1   0   6   5   9   5   9   4   4   0   9   6   2   6   1
1 East     F   | 6   2   9   2   7   0   0   3   7   4   8   1   3   2   1   7   7   5
2 West     M   | 5   8   9   7   6   0   3   0   2   5   0   3   9   6   7   3   4   9
3 South    M   | 6   2   3   6   4   0   4   0   1   9   3   6   2   1   0   6   9   3
4 South    F   | 9   6   0   0   6   1   7   0   8   1   7   6   2   0   8   1   5   3
5 West     F   | 7   9   7   8   2   0   4   3   8   9   0   3   4   9   2   5   1   7
6 North    M   | 3   3   5   7   9   4   2   6   3   2   7   5   5   5   6   4   2   9
7 North    M   | 7   4   8   6   8   4   5   7   9   0   2   9   1   9   7   9   5   6
8 East     F   | 1   6   5   3   6   4   6   9   6   9   2   4   2   9   8   4   2   4
9 South    M   | 9   6   6   1   3   1   3   5   7   4   8   6   7   7   8   9   2   3

Does Pandas have a way to write/read DataFrames to/from ASCII files with MultiIndexes?


asked Jun 14 '12 21:06

dailyglen

2 Answers

Not sure which version of pandas you are using but with 0.7.3 you can export your DataFrame to a TSV file and retain the indices by doing this:

df.to_csv('mydf.tsv', sep='\t')

You need to export to TSV rather than CSV because the column headers contain , characters. This should solve the first part of your question.

The second part is trickier because, as far as I can tell, you need to know in advance what you want your DataFrame to contain. In particular, you need to know:

which columns of your TSV file represent the row MultiIndex, and
that the headers of the remaining columns should also be converted to a MultiIndex.

To illustrate this, let's read the TSV file we saved above back into a new DataFrame:

In [1]: t_df = read_table('mydf.tsv', index_col=[0,1,2])
In [2]: all(t_df.index == df.index)
Out[2]: True

So we managed to read mydf.tsv into a DataFrame that has the same row index as the original df. But:

In [3]: all(t_df.columns == df.columns)
Out[3]: False

The reason is that pandas (as far as I can tell) has no way of parsing the header row correctly into a MultiIndex. As I mentioned above, if you know beforehand that your TSV file header represents a MultiIndex, you can do the following to fix this:

In [4]: from ast import literal_eval
In [5]: t_df.columns = MultiIndex.from_tuples(t_df.columns.map(literal_eval).tolist(), 
                                              names=['one','two','three'])
In [6]: all(t_df.columns == df.columns)
Out[6]: True
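
For what it's worth, more recent pandas versions appear to handle this round trip without the literal_eval step: to_csv writes one header row per column level, and read_csv accepts a list of row numbers for header. A minimal sketch, assuming a reasonably current pandas; the small frame and the in-memory buffer are stand-ins (not from the question) for df and a real file:

```python
import io

import numpy as np
import pandas as pd

# Small stand-in for the question's frame: MultiIndex on rows and columns.
col_indx = pd.MultiIndex.from_product(
    [["A", "A2"], ["B", "B2"], ["C", "C2"]], names=["one", "two", "three"]
)
row_indx = pd.MultiIndex.from_tuples(
    [(0, "North", "M"), (1, "East", "F"), (2, "West", "M")],
    names=["n", "location", "sex"],
)
n_rows, n_cols = len(row_indx), len(col_indx)
data = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)
df = pd.DataFrame(data, index=row_indx, columns=col_indx)

# Write to an in-memory text buffer; a file path works the same way.
buf = io.StringIO()
df.to_csv(buf, sep="\t")
buf.seek(0)

# header lists the rows that hold the column levels,
# index_col lists the columns that hold the row levels.
df2 = pd.read_csv(buf, sep="\t", header=[0, 1, 2], index_col=[0, 1, 2])

# Compare labels and values of the original and the re-read frame.
same_columns = df2.columns.tolist() == df.columns.tolist()
same_index = df2.index.tolist() == df.index.tolist()
same_values = bool((df2.to_numpy() == df.to_numpy()).all())
print(same_columns, same_index, same_values)
```

As in the answer above, you still have to know how many header rows and index columns the file contains; the file itself does not record that.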


answered Sep 17 '22 23:09

diliop

You can change the print options using set_option:

display.multi_sparse: boolean
    Default True, "sparsify" MultiIndex display
    (don't display repeated elements in outer levels within groups)

With that option turned off, the DataFrame is printed as desired:

In [11]: pd.set_option('display.multi_sparse', False)

In [12]: df
Out[12]: 
one             A   A   A   A   A   A   A   A   A  A2  A2  A2  A2  A2  A2  A2  A2  A2
two             B   B   B  B2  B2  B2  B3  B3  B3   B   B   B  B2  B2  B2  B3  B3  B3
three           C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3
n location sex                                                                       
0 North    M    2   1   6   4   6   4   7   1   1   0   4   3   9   2   0   0   6   4
1 East     F    3   5   5   6   4   8   0   3   2   3   9   8   1   6   7   4   7   2
2 West     M    7   9   3   5   0   1   2   8   1   6   0   7   9   9   3   2   2   4
3 South    M    1   0   0   3   5   7   7   0   9   3   0   3   3   6   8   3   6   1
4 South    F    8   0   0   7   3   8   0   8   0   5   5   6   0   0   0   1   8   7
5 West     F    6   5   9   4   7   2   5   6   1   2   9   4   7   5   5   4   3   6
6 North    M    3   3   0   1   1   3   6   3   8   6   4   1   0   5   5   5   4   9
7 North    M    0   4   9   8   5   7   7   0   5   8   4   1   5   7   6   3   6   8
8 East     F    5   6   2   7   0   6   2   7   1   2   0   5   6   1   4   8   0   3
9 South    M    1   2   0   6   9   7   5   3   3   8   7   6   0   5   4   3   5   9

Note: in older pandas versions this was pd.set_printoptions(multi_sparse=False).
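
If the goal is to get that unsparsified layout into a text file rather than onto the screen, one option is to render the frame to a string while the option is temporarily switched off. A hedged sketch, assuming a recent pandas; the small frame is a stand-in (not from the question) for df, and nothing here reads the text back into a DataFrame:

```python
import numpy as np
import pandas as pd

# Small stand-in frame with a MultiIndex on rows and columns.
col_indx = pd.MultiIndex.from_product(
    [["A", "A2"], ["B", "B2"]], names=["one", "two"]
)
row_indx = pd.MultiIndex.from_tuples(
    [(0, "North"), (1, "East"), (2, "West")], names=["n", "location"]
)
df = pd.DataFrame(np.arange(12).reshape(3, 4), index=row_indx, columns=col_indx)

# option_context restores the previous setting when the block exits.
with pd.option_context("display.multi_sparse", False):
    text = df.to_string()

# First header line, split on whitespace: level name, then one label per column.
header_fields = text.splitlines()[0].split()
print(text)
```

The resulting text can be written out with an ordinary file write. It is meant for human readers; for a file that pandas should read back, to_csv is probably the better fit.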

answered Sep 21 '22 23:09

Andy Hayden
