Pretty printing newlines inside a string in a Pandas DataFrame

I have a Pandas DataFrame in which one of the columns contains strings, and those strings contain newlines that I would like to print as actual line breaks. Instead, they appear as a literal \n in the output.

That is, I want to print this:

  pos     bidder
0   1
1   2
2   3  <- alice
       <- bob
3   4

but this is what I get:

  pos            bidder
0   1
1   2
2   3  <- alice\n<- bob
3   4

How can I accomplish what I want? Can I use a DataFrame, or will I have to revert to manually printing padded columns one row at a time?

Here's what I have so far:

import pandas as pd

n = 4
output = pd.DataFrame({
    'pos': range(1, n+1),
    'bidder': [''] * n
})
bids = {'alice': 3, 'bob': 3}
used_pos = []
for bidder, pos in bids.items():
    row = pos - 1  # the index is 0-based, 'pos' is 1-based
    if pos in used_pos:
        arrow = output.loc[row, 'bidder']
        output.loc[row, 'bidder'] = arrow + "\n<- %s" % bidder
    else:
        output.loc[row, 'bidder'] = "<- %s" % bidder
        used_pos.append(pos)
print(output)
shadowtalker asked Dec 16 '15 21:12


4 Answers

If you're trying to do this in an IPython notebook, you can do:

from IPython.display import display, HTML

def pretty_print(df):
    return display( HTML( df.to_html().replace("\\n","<br>") ) )
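
To see why the replacement targets "\\n" rather than "\n", here is a small sketch that inspects the markup outside a notebook. The sample frame is made up for illustration, and the sketch assumes that to_html() escapes a newline inside a cell as a backslash followed by "n", which is the behaviour the snippet above relies on.

```python
import pandas as pd

# Illustrative data only: one cell holds a two-line string.
df = pd.DataFrame({"pos": [1, 2, 3], "bidder": ["", "", "<- alice\n<- bob"]})

# The newline inside the cell is escaped, so the markup carries the two
# characters backslash + "n" rather than a line break.
html = df.to_html()
has_escaped_newline = "\\n" in html

# Replace only that two-character sequence; the real newlines that
# separate the HTML tags are left alone.
fixed = html.replace("\\n", "<br>")
print(fixed)
```

Replacing "\n" (a real newline) instead would insert a <br> between every pair of tags in the table markup.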
unsorted answered Oct 17 '22 06:10


Using pandas .set_properties() and CSS white-space property

[For use in IPython notebooks]

Another way is to use the pandas Styler.set_properties() method (pandas.io.formats.style.Styler.set_properties()) together with the CSS property "white-space": "pre-wrap":

from IPython.display import display

# Assuming the variable df contains the relevant DataFrame
display(df.style.set_properties(**{
    'white-space': 'pre-wrap',
}))

To keep the text left-aligned, you might want to add 'text-align': 'left' as below:

from IPython.display import display

# Assuming the variable df contains the relevant DataFrame
display(df.style.set_properties(**{
    'text-align': 'left',
    'white-space': 'pre-wrap',
}))

yongjieyongjie answered Oct 17 '22 08:10


Somewhat in line with unsorted's answer:

import pandas as pd

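# Save the original `to_html` function once, so that re-running this cell
# does not wrap the already patched function (which would recurse forever)
if not hasattr(pd.DataFrame, "base_to_html"):
    pd.DataFrame.base_to_html = pd.DataFrame.to_html
# Call it here in a controlled way, replacing the escaped "\n" with a line break
pd.DataFrame.to_html = (
    lambda df, *args, **kwargs:
        (df.base_to_html(*args, **kwargs)
           .replace(r"\n", "<br/>"))
)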

This way, you don't need to call any explicit function in Jupyter notebooks, as to_html is called internally. If you want the original function, call base_to_html (or whatever you named it).

I'm using jupyter 1.0.0, notebook 5.7.6.

Roger d'Amiens answered Oct 17 '22 07:10


From the pandas.DataFrame documentation:

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.

So you can't have a row without an index, and a newline "\n" inside a cell won't start a new row when the DataFrame is printed.

You could overwrite 'pos' with an empty value, and output the next 'bidder' on the next row. But then index and 'pos' would be offset every time you do that. Like:

  pos    bidder
0   1          
1   2          
2   3  <- alice
3        <- bob
4   5   

So if a bidder called 'frank' had 4 as value, it would overwrite 'bob', and the problem gets worse as you add more bidders. It is probably possible to use a DataFrame and write code to work around this issue, but it is probably worth looking into other solutions.

Here is the code to produce the output structure above.

import pandas as pd

n = 5
output = pd.DataFrame({'pos': range(1, n + 1),
                       'bidder': [''] * n},
                      columns=['pos', 'bidder'])
# Use object dtype so that 'pos' can hold both integers and ''
output['pos'] = output['pos'].astype(object)
bids = {'alice': 3, 'bob': 3}
used_pos = []
for bidder, pos in bids.items():
    if pos in used_pos:
        # Position already taken: use the next row and blank its 'pos'
        output.loc[pos, 'bidder'] = "<- %s" % bidder
        output.loc[pos, 'pos'] = ''
    else:
        output.loc[pos - 1, 'bidder'] = "<- %s" % bidder
        used_pos.append(pos)
print(output)
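
A variant of the same idea, as a rough sketch: keep the newline-joined strings in the DataFrame and build the continuation rows only when printing. The helper name explode_newlines is hypothetical, not a pandas function, and the sketch assumes a pandas version that provides DataFrame.explode().

```python
import pandas as pd

def explode_newlines(df, column):
    """Return a printable copy of df in which every line of a multi-line
    string in `column` gets its own row (hypothetical helper)."""
    # Object dtype lets numeric columns hold '' on continuation rows.
    out = df.reset_index(drop=True).astype(object)
    out[column] = out[column].str.split("\n")
    out = out.explode(column)
    # Rows created by explode() repeat the index label of their source row.
    continuation = out.index.duplicated(keep="first")
    out = out.reset_index(drop=True)
    others = [c for c in out.columns if c != column]
    out.loc[continuation, others] = ""
    return out

# Illustrative data only, shaped like the frame in the question.
df = pd.DataFrame({"pos": range(1, 5),
                   "bidder": ["", "", "<- alice\n<- bob", ""]})
printable = explode_newlines(df, "bidder")
print(printable.to_string(index=False))
```

Because the original frame is left untouched, a later bidder at position 4 cannot overwrite 'bob'; the extra rows exist only in the printable copy.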

Edit:

Another option is to restructure the data and output. You could use the pos values as columns and create a new row for each key/person in the data. The code example below prints the DataFrame with NaN values replaced by empty strings.

import pandas as pd

data = {'johnny\nnewline': 2, 'alice': 3, 'bob': 3,
        'frank': 4, 'lisa': 1, 'tom': 8}
n = range(1, max(data.values()) + 1)

# Create DataFrame with columns = pos
output = pd.DataFrame(columns=n, index=[])

# Populate DataFrame with rows
for index, (bidder, pos) in enumerate(data.items()):
    output.loc[index, pos] = bidder

# Print the DataFrame and remove NaN to make it easier to read.
print(output.fillna(''))

# Fetch and print every element in column 2; printing a single string
# value renders its newline as an actual line break
for index in output.index:
    print(output.loc[index, 2])

It depends what you want to do with the data though. Good luck :)

oystein-hr answered Oct 17 '22 06:10