I have started exploring the Quip API.
I have created a spreadsheet in Quip with the below details:
| id | name | 
|---|---|
| 1 | harry | 
| 2 | hermione | 
| 3 | ron | 
And here is how I am trying to read from Quip:
import quip
import pandas as pd
import numpy as np
import html5lib
client = quip.QuipClient(token, base_url = baseurl)
rawdictionary = client.get_thread(thread_id)
dfs=pd.read_html(rawdictionary['html'])
raw_df = dfs[0]
raw_df.drop(raw_df.columns[[0]], axis = 1, inplace = True) 
#raw_df.dropna(axis=0,inplace=True)
print(raw_df.replace(r'^\s+$', np.nan, regex=True))
I tried dropping rows that contain NaN values and also tried replacing blank strings with NaN. However, the empty rows and columns still appear in the dataframe, for example:
         A         B  C  D  E  F  G  H  I  J  K  L  M  N  O  P
0   id      name                            
1    1    harry                            
2    2  hermione                            
3    3  ron                            
4                                         
5                                         
6                                         
7                                         
8                                         
9                                         
10                                        
11                                        
12                                        
13                                        
14                                        
15                                        
16                                        
17
Questions
1. How do I get id and name as the column headers in the pandas dataframe?
2. With raw_df.dropna(axis=0,inplace=True), when I run print(raw_df), I get None. Why?

Quip automatically pulls in a number of extra blank columns and rows containing \u200b (zero-width space) Unicode characters.
This is how I've resolved this:
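import quip
import pandas as pd
import numpy as np
import html5lib

# token, baseurl and thread_id are your own Quip access token, API URL and document id
client = quip.QuipClient(token, base_url=baseurl)
rawdictionary = client.get_thread(thread_id)
dfs = pd.read_html(rawdictionary['html'])
raw_df = dfs[0]
raw_df.columns = raw_df.iloc[0]  # Use the first row as the column header
raw_df = raw_df[1:]  # The header row is now duplicated as data, so drop it
attribs = ['id', 'name']  # Columns to keep (assumed here; not defined in the original snippet)
raw_df = raw_df[attribs]
cleaned_df = raw_df.replace(np.nan, 'N/A')  # Keep genuinely missing values as 'N/A'
cleaned_df = cleaned_df.replace('\u200b', np.nan)  # Turn Quip's zero-width-space cells into NaN
cleaned_df.dropna(axis=0, how='any', inplace=True)  # Drop the rows that were blank in Quip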
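For anyone who wants to try this without a Quip account, here is a minimal sketch on a synthetic dataframe. The frame below is a made-up stand-in for `dfs[0]` (the real one has more columns and rows), and `clean_quip_frame` is a hypothetical helper name, not part of quip or pandas. It also illustrates two things from the question: `'\u200b'` is not treated as whitespace by Python's regex `\s`, so the `^\s+$` replacement leaves those cells alone, and `dropna(inplace=True)` returns `None` because it changes the dataframe in place. Unlike the code above, this variant drops only rows and columns that are entirely blank.

```python
import numpy as np
import pandas as pd

ZWSP = "\u200b"  # zero-width space, found in the blank Quip cells

# Synthetic stand-in for dfs[0]: header row inside the data, padded with ZWSP cells.
raw_df = pd.DataFrame(
    [
        ["id", "name", ZWSP],
        ["1", "harry", ZWSP],
        ["2", "hermione", ZWSP],
        ["3", "ron", ZWSP],
        [ZWSP, ZWSP, ZWSP],
        [ZWSP, ZWSP, ZWSP],
    ],
    columns=["A", "B", "C"],
)

# The whitespace regex from the question does not match ZWSP, so nothing is replaced.
unchanged = raw_df.replace(r"^\s+$", np.nan, regex=True)
still_zwsp = int((unchanged == ZWSP).sum().sum())

# dropna(inplace=True) modifies the frame and returns None.
probe = raw_df.copy()
result = probe.dropna(axis=0, inplace=True)


def clean_quip_frame(df):
    """Promote the first row to headers and drop all-blank rows and columns."""
    out = df.replace(ZWSP, np.nan)  # exact-match replacement, no regex needed
    out = out.dropna(axis=0, how="all").dropna(axis=1, how="all")
    out.columns = out.iloc[0].tolist()  # first remaining row holds the headers
    return out.iloc[1:].reset_index(drop=True)


cleaned_df = clean_quip_frame(raw_df)
print(still_zwsp, result)
print(cleaned_df)
```

So for the second question: call `raw_df.dropna(axis=0, inplace=True)` on its own line and then `print(raw_df)`, or drop `inplace=True` and print the returned dataframe.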