
Pandas replaces NAN with arbitrary number when to_latex

Tags: python, pandas, nan

I have a large DataFrame df with a multi-level index and multi-level columns, which I'm not showing here. I take a slice of it by index like this:

subDf = df.sort_index(level=0).loc[:'e']

This slice then contains NaN in the second level of the index:

>>> subDf.iloc[0:1]
                  change
robustness value        
baseline   NaN     -14.5

The csv generated by to_csv() appears to be correct:

>>> subDf.iloc[0:1].to_csv()
Out[15]: 'robustness,value,change\nbaseline,,-14.5\n'

Similarly, to_html() works as expected. However, when I call to_latex(), the NaN vanishes and a 50.00 appears in its place:

>>> subDf.iloc[0:1].to_latex()
Out[14]: u'\\begin{tabular}{llr}\n\\toprule\n                &       &  change \\\\\nrobustness & value &         \\\\\n\\midrule\nbaseline & 50.00 &   -14.5 \\\\\n\\bottomrule\n\\end{tabular}\n'

The 50.00 is not a completely arbitrary number: it is the last value in the second level of the multi-index in the original data frame:

>>> df.index
Out[18]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'baseline', u'f'], [0.01, 0.04, 0.25, 0.75, 0.86, 0.99, 1.0, 2.0, 4.0, 10.0, 50.0]],
           labels=[[5, 6, 6, 2, 2, 1, 3, 3, 3, 4, 4, 0, 0], [-1, 0, 1, 2, 3, 9, 6, 7, 8, 4, 5, 9, 10]],
           names=[u'robustness', u'value'])
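One possible reading of this output, offered as an assumption rather than something confirmed in this thread: in the labels array, -1 is the marker pandas uses for a missing entry, and the baseline row carries -1 in the second level. If any formatting code looked that label up with plain positional indexing, -1 would select the last element of the level, which is 50.0. The sketch below only reproduces that arithmetic with the values printed above; it does not call pandas.

```python
# Values copied from the second level of df.index shown above.
value_level = [0.01, 0.04, 0.25, 0.75, 0.86, 0.99, 1.0, 2.0, 4.0, 10.0, 50.0]
value_labels = [-1, 0, 1, 2, 3, 9, 6, 7, 8, 4, 5, 9, 10]

# The 'baseline' row is the first entry; its label is the missing marker -1.
first_label = value_labels[0]

# Plain positional indexing treats -1 as "last element", not as "missing".
naive_lookup = value_level[first_label]

print(first_label, naive_lookup)
```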

Two questions arise:

  • why is this happening in the first place?
  • if it is indeed unexpected behavior that I can't affect in the short run, how can I circumvent this and force to_latex() to print out a NaN?
FooBar asked Sep 19 '16 09:09



1 Answer

This is probably much too late to help, but for posterity, I think either of these approaches should work:

  1. Convert the column with the NaNs to string; this will give you nan in the LaTeX.

  2. If you'd rather have NaN than nan, you can either do 1. and then replace or you can just do df.fillna('NaN').

Obviously these approaches modify your dataframe in a way that isn't good for further analysis, but I think this is an easy workaround; just make a copy of your dataframe first.
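A minimal sketch of approach 2 applied to a copy, so the original frame keeps its numeric NaNs. The helper name label_missing and the one-column example data are made up for illustration; the cast to object is an extra step I am assuming is harmless, added so that mixing text and numbers in one column is explicit.

```python
import numpy as np
import pandas as pd


def label_missing(frame, label="NaN"):
    """Return a copy of `frame` whose missing cells hold the text `label`."""
    # astype(object) returns a new frame, so the caller's frame is untouched.
    return frame.astype(object).fillna(label)


# Made-up example data, not the frame from the question.
df = pd.DataFrame({"change": [-14.5, np.nan]})
printable = label_missing(df)

# printable.to_latex() can be called at this point; df still holds a real NaN.
print(printable)
```

Keeping the replacement inside a helper means the frame used for analysis and the frame used for printing never get mixed up.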

(I tested these approaches on a dataframe with just a single-level index, but I can't imagine that it would work any differently for multi-level)
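In the question the NaN sits in an index level rather than in a column, so filling column values may not reach it. One possibility, which is my assumption and not something tested against the original data, is to move the index levels into ordinary columns with reset_index() first and then label the missing entries. The two-row frame below is a hypothetical stand-in; the 3.0 in the second row is invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame in the question (second row is invented).
index = pd.MultiIndex.from_arrays(
    [["baseline", "f"], [np.nan, 0.01]],
    names=["robustness", "value"],
)
df = pd.DataFrame({"change": [-14.5, 3.0]}, index=index)

# Turn the index levels into columns, then replace missing entries with text.
flat = df.reset_index().astype(object).fillna("NaN")

# flat.to_latex(index=False) would then print the former index as plain columns.
print(flat)
```

The cost of this route is that the LaTeX table loses the grouped multi-index layout, since every level is rendered as a regular column.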

Nathan answered Oct 28 '22 08:10
