I have the following list:
list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']
How can I convert it into a pandas dataframe?
I can start like this:
df = pd.DataFrame(columns=list_vals[0].split())
Is there a way to populate the rest of the DataFrame?
To convert a list into a pandas DataFrame, use the DataFrame constructor from the pandas package.
A DataFrame can be created from a list of lists: pass a Python list of lists as a parameter to pandas.DataFrame(). The DataFrame then represents the data in tabular form, as rows and columns.
To create a pandas DataFrame from multiple lists, use the columns and index parameters to provide column and row labels respectively. Alternatively, you can also add column names to the DataFrame and set the index using pandas.DataFrame.
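As a rough illustration of the list-of-lists route applied to the list from the question, here is a minimal sketch. The row labels 'r0' and 'r1' and the variable names header and rows are made up for the example; the values are converted with float() by hand, on the assumption that every column is numeric.

```python
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# First string holds the column names, the remaining strings hold the rows.
header = list_vals[0].split()
rows = [[float(v) for v in line.split()] for line in list_vals[1:]]

# columns= labels the columns, index= labels the rows
# ('r0' and 'r1' are hypothetical labels chosen for this sketch).
df = pd.DataFrame(rows, columns=header, index=['r0', 'r1'])
print(df)
```

If no index is passed, pandas falls back to the default 0, 1, ... row labels.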
You could use io.StringIO to feed a string into read_csv:
In [23]: pd.read_csv(io.StringIO('\n'.join(list_vals)), sep=r'\s+')
Out[23]:
col_a col_B col_C
0 12.0 34.0 10.0
1 15.0 111.0 23.0
This has the advantage that it automatically does the type inference that pandas would do if you were reading an ordinary CSV file, so the columns are floats:
In [24]: _.dtypes
Out[24]:
col_a float64
col_B float64
col_C float64
dtype: object
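For anyone who wants to run this outside an IPython session, here is a self-contained sketch of the same idea. It passes sep=r'\s+' as the whitespace separator, which should behave like delim_whitespace=True; as far as I know, recent pandas releases deprecate the latter. The second list, mixed_vals, is a made-up example added only to show the type inference on mixed columns.

```python
import io

import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Join the strings into one newline-separated text and wrap it in a
# file-like object so read_csv can parse it as if it were a file.
buffer = io.StringIO('\n'.join(list_vals))
df = pd.read_csv(buffer, sep=r'\s+')
print(df.dtypes)

# Hypothetical mixed-type input: one text column, one integer column.
mixed_vals = ['name count', 'a 1', 'b 2']
mixed = pd.read_csv(io.StringIO('\n'.join(mixed_vals)), sep=r'\s+')
print(mixed.dtypes)
```

Each column gets its own inferred type, so the text column stays text while the numeric column becomes an integer column.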
While you could just feed your list into the DataFrame constructor directly, everything would stay strings:
In [21]: pd.DataFrame(columns=list_vals[0].split(),
data=[row.split() for row in list_vals[1:]])
Out[21]:
col_a col_B col_C
0 12.0 34.0 10.0
1 15.0 111.0 23
In [22]: _.dtypes
Out[22]:
col_a object
col_B object
col_C object
dtype: object
We could add dtype=float to fix this, of course, but we might have mixed types. The read_csv approach would handle those in the usual way, whereas here we'd have to convert them manually.
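A minimal sketch of what the manual conversion could look like after building the frame from split strings. The names df_float and df_auto are just for illustration; astype(float) assumes every column is numeric, while applying pd.to_numeric column by column lets each column pick its own numeric type.

```python
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Constructor route: every cell is still a string at this point.
df = pd.DataFrame([row.split() for row in list_vals[1:]],
                  columns=list_vals[0].split())

# Option 1: force a single type on the whole frame.
df_float = df.astype(float)

# Option 2: convert each column separately with pd.to_numeric.
df_auto = df.apply(pd.to_numeric)

print(df_float.dtypes)
print(df_auto.dtypes)
```

With genuinely mixed data (say, one text column), pd.to_numeric would raise on the text column by default, so the columns to convert would have to be selected explicitly.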
You can do it by converting your data to a dict, e.g.:
>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))})
col_B col_C col_a
0 34.0 10.0 12.0
1 111.0 23 15.0
Or, to guarantee your original column order (plain dicts only preserve insertion order on Python 3.7+):
>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))},
... columns=list_vals[0].split())
col_a col_B col_C
0 12.0 34.0 10.0
1 15.0 111.0 23
You can read this as a NumPy structured array, then pass it over to pandas. This is useful if you also need to work with NumPy and know the data types before reading (otherwise NumPy is a step back from pandas in convenience).
import numpy as np
import pandas as pd
list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']
# Gather names from first line, assume all column types are 'd' (i.e. float)
list_dtype = np.dtype([(name, 'd') for name in list_vals[0].split()])
# Create a numpy structured array
ar = np.fromiter((tuple(x.split()) for x in list_vals[1:]), dtype=list_dtype)
# Now convert it to a pandas DataFrame
dat = pd.DataFrame(ar)