I have a small CSV file of real-world data from tests performed on different days, etc. Not all of the same parameters were tested in each session, so there are a bunch of blank cells sprinkled around the original spreadsheet.
Tuner Location,200,210,220,230,240,250,260,270,280
07/17 #1,,,0.319,0.319,0.233,,0.215,,0.3355
07/21 #1,,0.539,0.482,0.034,0.343,0.478,0.285,0.01,0.538
07/21 #2,,,0.107,0.407,0.559,,0.185,0.439,0.36
07/21 #3,,,0.127,,,,,,
07/22 #1,0.316,0.201,0.646,,,,,,
07/22 #2,,0.098,0.138,0.134,0.194,,,,
07/22 #3,,0.216,0.187,,,,,,
07/27 #1,,0.118,0.065,0.013,1.013,,,,
08/05 #1,,,,,,,0.032,,
08/05 #2,,,,,,,0.128,,
08/05 #3,,,,,,0.235,0.159,0.324,
08/05 #4,,,,,,,0.398,,
08/05 #5,,,,,,0.214,0.121,0.121,
I'm trying to learn to manipulate and display this data in an IPython notebook like I would in a regular spreadsheet program. When I run the following lines inside a notebook:
import pandas as pd
# Set print option so the dataframe will be represented as HTML instead of plain text
pd.set_option('display.notebook_repr_html', True)
# Read in csv file as a pandas dataframe
df = pd.read_csv('tuner-data.csv')
# View the HTML representation
df
I get a very nice looking HTML table of the data... with 'NaN' everywhere there was a blank cell in the original CSV file.
I understand 'why' NaN is necessary for later calculations, but it really makes the table hard for viewers to read (my opinion).
Is there a good/easy/simple way to suppress the display of 'NaN' in the HTML table displayed in ipython notebook?
You can visualize a pandas dataframe in Jupyter notebooks by using the display(<dataframe-name>) function. The display() function is supported only on PySpark kernels. The Qviz framework supports 1000 rows and 100 columns.
If a row has missing data, you can drop the entire row, along with every other value in it, using df.dropna(). axis=0 (the default) drops each row that contains `NaN` values. axis=1 drops each column that contains `NaN` values. Note that this removes data rather than just hiding the `NaN` in the display.
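To see what the two axis settings do on sparse data like the question's, here is a minimal sketch. It assumes a three-row, four-column excerpt of the CSV above, inlined with io.StringIO so it runs on its own; the names rows_kept and cols_kept are just for illustration.

```python
import io

import pandas as pd

# Small excerpt of the CSV from the question, inlined so the sketch is self-contained.
csv_text = """Tuner Location,200,210,220,230
07/21 #1,,0.539,0.482,0.034
07/22 #1,0.316,0.201,0.646,
07/22 #2,,0.098,0.138,0.134
"""
df = pd.read_csv(io.StringIO(csv_text))

# axis=0: drop every row that contains at least one NaN.
rows_kept = df.dropna(axis=0)

# axis=1: drop every column that contains at least one NaN.
cols_kept = df.dropna(axis=1)

print(len(rows_kept))           # 0, because every row in this excerpt has a blank cell
print(list(cols_kept.columns))  # ['Tuner Location', '210', '220']
```

With data this sparse, dropping rows throws everything away, so dropna is probably not what you want if the goal is only a cleaner-looking table.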
This page has some suggestions. For example, you might try:
df.fillna(0)
Or:
df.fillna("")
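One thing worth knowing about both calls: fillna returns a new dataframe and leaves df alone unless you assign the result back. A minimal sketch of using that for a display-only copy, assuming a small excerpt of the question's CSV and a hypothetical name display_df:

```python
import io

import pandas as pd

# Small excerpt of the CSV from the question, inlined so the sketch is self-contained.
csv_text = """Tuner Location,200,210,220,230
07/21 #1,,0.539,0.482,0.034
07/22 #1,0.316,0.201,0.646,
07/22 #2,,0.098,0.138,0.134
"""
df = pd.read_csv(io.StringIO(csv_text))

# Build a separate copy for viewing; blanks become empty strings.
# The numeric columns in this copy now hold a mix of floats and strings,
# so keep using df (not display_df) for any calculations.
display_df = df.fillna("")

print(df.isna().sum().sum())          # 3, the original still has its NaN values
print(display_df.isna().sum().sum())  # 0, the copy has none
```

In a notebook, putting display_df on the last line of a cell shows the table with empty cells where the NaN values were.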
A potential workaround would be to use the styles and then show the styled output rather than the df, as df.style.format has a na_rep parameter:
s = df.style.format(na_rep='')
s
The advantage of using the style option is that you do not change the dataframe and therefore will not cause issues with future computations.