I have a Pandas Dataframe generated from a database, which has data with mixed encodings. For example:
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+ | ID | path | language | date | longest_sentence | shortest_sentence | number_words | readability_consensus | +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+ | 0 | data/Eng/Sagitarius.txt | Eng | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not... | 306 | 11th and 12th grade | +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+ | 31 | data/Nor/Høylandet.txt | Nor | 2015-07-22 | Høgskolen i Østfold er et eksempel... | Som skuespiller har jeg både... | 253 | 15th and 16th grade | +----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
As seen there is a mix of English and Norwegian (encoded as ISO-8859-1 in the database I think). I need to get the contents of this Dataframe output as a Markdown table, but without getting problems with encoding. I followed this answer (from the question Generate Markdown tables?) and got the following:
import sys, sqlite3 db = sqlite3.connect("Applications.db") df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db) db.close() rows = [] for index, row in df.iterrows(): items = (row['date'], row['path'], row['language'], row['shortest_sentence'], row['longest_sentence'], row['number_words'], row['readability_consensus']) rows.append(items) headings = ['Date', 'Path', 'Language', 'Shortest Sentence', 'Longest Sentence since', 'Words', 'Grade level'] fields = [0, 1, 2, 3, 4, 5, 6] align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'), ('^','^'), ('^','^')] table(sys.stdout, rows, fields, headings, align)
However, this yields an UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128)
error. How can I output the Dataframe as a Markdown table? That is, for the purpose of storing this code in a file for use in writing a Markdown document. I need the output to look like this:
| ID | path | language | date | longest_sentence | shortest_sentence | number_words | readability_consensus | |----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------| | 0 | data/Eng/Sagitarius.txt | Eng | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not... | 306 | 11th and 12th grade | | 31 | data/Nor/Høylandet.txt | Nor | 2015-07-22 | Høgskolen i Østfold er et eksempel... | Som skuespiller har jeg både... | 253 | 15th and 16th grade |
Pandas series can be converted to a list using tolist() or type casting method. There can be situations when you want to perform operations on a list instead of a pandas object. In such cases, you can store the DataFrame columns in a list and perform the required operations.
The tolist() function is used to convert a given array to an ordinary list with the same items, elements, or values.
To interface with pandas, PyArrow provides various conversion routines to consume pandas structures and convert back to them. While pandas uses NumPy as a backend, it has enough peculiarities (such as a different type system, and support for null values) that this is a separate topic from NumPy Integration.
Numba can be used in 2 ways with pandas: Specify the engine="numba" keyword in select pandas methods. Define your own Python function decorated with @jit and pass the underlying NumPy array of Series or Dataframe (using to_numpy() ) into the function.
Improving the answer further, for use in IPython Notebook:
def pandas_df_to_markdown_table(df): from IPython.display import Markdown, display fmt = ['---' for i in range(len(df.columns))] df_fmt = pd.DataFrame([fmt], columns=df.columns) df_formatted = pd.concat([df_fmt, df]) display(Markdown(df_formatted.to_csv(sep="|", index=False))) pandas_df_to_markdown_table(infodf)
Or use tabulate:
pip install tabulate
Examples of use are in the documentation.
I recommend python-tabulate library for generating ascii-tables. The library supports pandas.DataFrame
as well.
Here is how to use it:
from pandas import DataFrame from tabulate import tabulate df = DataFrame({ "weekday": ["monday", "thursday", "wednesday"], "temperature": [20, 30, 25], "precipitation": [100, 200, 150], }).set_index("weekday") print(tabulate(df, tablefmt="pipe", headers="keys"))
Output:
| weekday | temperature | precipitation | |:----------|--------------:|----------------:| | monday | 20 | 100 | | thursday | 30 | 200 | | wednesday | 25 | 150 |
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With