python pandas flatten a dataframe to a list

Tags:

python

list

pandas

dataframe

numpy

I have a df like so:

import pandas
a = [['1/2/2014', 'a', '6', 'z1'],
     ['1/2/2014', 'a', '3', 'z1'],
     ['1/3/2014', 'c', '1', 'x3'],
    ]
df = pandas.DataFrame.from_records(a[1:], columns=a[0])

I want to flatten the df so it is one continuous list like so:

['1/2/2014', 'a', '6', 'z1', '1/2/2014', 'a', '3', 'z1','1/3/2014', 'c', '1', 'x3']

I can loop through the rows and extend to a list, but is there a much easier way to do it?
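For comparison, the loop-and-extend approach mentioned above might look roughly like this. It is only a sketch, assuming the DataFrame is built exactly as in the question; `itertuples(index=False, name=None)` is one of several ways to iterate over the rows as plain tuples.

```python
import pandas

a = [['1/2/2014', 'a', '6', 'z1'],
     ['1/2/2014', 'a', '3', 'z1'],
     ['1/3/2014', 'c', '1', 'x3']]
df = pandas.DataFrame.from_records(a[1:], columns=a[0])

# Walk the rows one by one and append each row's values to a single list.
flat_loop = []
for row in df.itertuples(index=False, name=None):
    flat_loop.extend(row)

print(flat_loop)
```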

435

asked Aug 22 '14 05:08

jason

2 Answers

You can use .flatten() on the DataFrame converted to a NumPy array:

df.to_numpy().flatten()

and you can also add .tolist() if you want the result to be a Python list.

Edit

In previous versions of pandas, the values attribute was used instead of the .to_numpy() method, as mentioned in the comments below.
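As a minimal sketch, assuming the DataFrame from the question, the full chain could look like this. The older `.values` spelling is included only to show that it gives the same result here. Because the question's construction uses the first row as column labels, only two rows of data remain to be flattened.

```python
import pandas

a = [['1/2/2014', 'a', '6', 'z1'],
     ['1/2/2014', 'a', '3', 'z1'],
     ['1/3/2014', 'c', '1', 'x3']]
df = pandas.DataFrame.from_records(a[1:], columns=a[0])

# Convert to a 2-D NumPy array, flatten it row by row, then make a Python list.
flat = df.to_numpy().flatten().tolist()

# Equivalent spelling for older pandas versions that lack .to_numpy().
flat_old = df.values.flatten().tolist()

print(flat)
# ['1/2/2014', 'a', '3', 'z1', '1/3/2014', 'c', '1', 'x3']
```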

111

answered Sep 22 '22 08:09

Saullo G. P. Castro

Maybe use stack?

df.stack().values
array(['1/2/2014', 'a', '3', 'z1', '1/3/2014', 'c', '1', 'x3'], dtype=object)

(Edit: Incidentally, the DataFrame in the question uses the first row as column labels, which is why those values are not in the output here.)
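To illustrate the note above, here is a hedged sketch that passes all three rows as data and lets pandas assign default integer column labels (an assumption; the question does not name the columns). With that change, `stack()` yields all twelve values in row order, matching the list the question asks for.

```python
import pandas

a = [['1/2/2014', 'a', '6', 'z1'],
     ['1/2/2014', 'a', '3', 'z1'],
     ['1/3/2014', 'c', '1', 'x3']]

# All three rows are data; columns get the default labels 0, 1, 2, 3.
df_all = pandas.DataFrame(a)

# stack() pivots the columns into the index, giving one long Series in row order.
flat_all = df_all.stack().tolist()

print(flat_all)
# ['1/2/2014', 'a', '6', 'z1', '1/2/2014', 'a', '3', 'z1', '1/3/2014', 'c', '1', 'x3']
```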

answered Sep 19 '22 08:09

meloncholy
