 

Programmatically convert pandas dataframe to markdown table

I have a Pandas Dataframe generated from a database, which has data with mixed encodings. For example:

+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+

As shown, the data is a mix of English and Norwegian (encoded as ISO-8859-1 in the database, I think). I need to output the contents of this Dataframe as a Markdown table without running into encoding problems. I followed this answer (from the question Generate Markdown tables?) and got the following:

import sys, sqlite3
import pandas as pd

db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()

rows = []
for index, row in df.iterrows():
    items = (row['date'],
             row['path'],
             row['language'],
             row['shortest_sentence'],
             row['longest_sentence'],
             row['number_words'],
             row['readability_consensus'])
    rows.append(items)

headings = ['Date',
            'Path',
            'Language',
            'Shortest Sentence',
            'Longest Sentence since',
            'Words',
            'Grade level']

fields = [0, 1, 2, 3, 4, 5, 6]
align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'),
         ('^','^'), ('^','^')]

# table() is the helper function defined in the linked answer
table(sys.stdout, rows, fields, headings, align)

However, this raises UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128). How can I output the Dataframe as a Markdown table? The goal is to store the resulting table in a file for use in writing a Markdown document. I need the output to look like this:

| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
|----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------|
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |
OleVik asked Oct 17 '15 01:10





2 Answers

Improving the answer further, for use in IPython Notebook:

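import pandas as pd

def pandas_df_to_markdown_table(df):
    from IPython.display import Markdown, display
    # One row of '---' cells; it becomes the Markdown header separator.
    fmt = ['---' for i in range(len(df.columns))]
    df_fmt = pd.DataFrame([fmt], columns=df.columns)
    # Place the separator row directly under the column headers.
    df_formatted = pd.concat([df_fmt, df])
    # Pipe-separated text is rendered by the notebook as a Markdown table.
    display(Markdown(df_formatted.to_csv(sep="|", index=False)))

pandas_df_to_markdown_table(infodf)  # infodf: the DataFrame to display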

Or use tabulate:

pip install tabulate 

Examples of use are in the documentation.
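The separator-row idea above does not depend on IPython or pandas. As a minimal sketch (Python 3, standard library only), the hypothetical helper `rows_to_markdown` below builds the same kind of pipe-delimited table from a list of headings and row tuples. The function name and the sample rows are illustrative assumptions, with values borrowed from the table in the question.

```python
def rows_to_markdown(headings, rows):
    """Build a pipe-delimited Markdown table from headings and row tuples."""
    def fmt(cells):
        # Convert every cell to text and escape literal pipes so they
        # cannot be mistaken for column separators.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    # Header line, then one '---' cell per column, then the data rows.
    lines = [fmt(headings), "|" + "|".join("---" for _ in headings) + "|"]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


# Illustrative data only; the values mirror the question's sample table.
headings = ["path", "language", "number_words"]
rows = [
    ("data/Eng/Sagitarius.txt", "Eng", 306),
    ("data/Nor/Høylandet.txt", "Nor", 253),
]
print(rows_to_markdown(headings, rows))
```

With a DataFrame, something like `rows_to_markdown(df.columns, df.itertuples(index=False))` should feed it the same inputs, though that call is untested against the data in the question.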

kpykc answered Sep 21 '22 00:09



I recommend the python-tabulate library for generating ASCII tables. The library supports pandas.DataFrame as well.

Here is how to use it:

from pandas import DataFrame
from tabulate import tabulate

df = DataFrame({
    "weekday": ["monday", "thursday", "wednesday"],
    "temperature": [20, 30, 25],
    "precipitation": [100, 200, 150],
}).set_index("weekday")

print(tabulate(df, tablefmt="pipe", headers="keys"))

Output:

| weekday   |   temperature |   precipitation |
|:----------|--------------:|----------------:|
| monday    |            20 |             100 |
| thursday  |            30 |             200 |
| wednesday |            25 |             150 |
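Once the table exists as a string (`tabulate(...)` returns one), it still has to reach a file without hitting the 'ascii' codec error from the question. That error is consistent with the text being encoded implicitly as ASCII on output, though this is an assumption since the question's `table` helper is not shown. The sketch below writes a table string with an explicit UTF-8 encoding via `io.open`, which accepts the `encoding` argument on both Python 2 and Python 3. The table text and the output path are hypothetical placeholders.

```python
import io
import os
import tempfile

# Hypothetical table text containing non-ASCII characters.
table_text = (
    u"| path | language |\n"
    u"|---|---|\n"
    u"| data/Nor/Høylandet.txt | Nor |\n"
)

# Hypothetical output location; a temporary directory keeps the example tidy.
out_path = os.path.join(tempfile.mkdtemp(), "table.md")

# Passing the encoding explicitly avoids falling back to a default codec
# (such as ASCII) when the text is written.
with io.open(out_path, "w", encoding="utf-8") as handle:
    handle.write(table_text)

# Read the file back with the same encoding to confirm a lossless round trip.
with io.open(out_path, "r", encoding="utf-8") as handle:
    round_tripped = handle.read()
```

Writing to a file opened this way, rather than to sys.stdout, keeps the result independent of the terminal's encoding settings.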
user4157482 answered Sep 21 '22 00:09
