
Creating pandas dataframe from a list of strings

Tags: python, pandas

I have the following list:

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

How can I convert it into a pandas dataframe?

I can start like this:

df = pd.DataFrame(columns=list_vals[0].split())

Is there a way to populate the rest of the dataframe?

user308827 asked Feb 11 '17 03:02



3 Answers

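You could use io.StringIO (after import io) to feed a string into read_csv. Here sep=r'\s+' splits on whitespace; it replaces delim_whitespace=True, which is deprecated in recent pandas versions:

In [23]: pd.read_csv(io.StringIO('\n'.join(list_vals)), sep=r'\s+')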
Out[23]: 
   col_a  col_B  col_C
0   12.0   34.0   10.0
1   15.0  111.0   23.0

This has the advantage that it automatically does the same type inference pandas would do when reading an ordinary CSV file, so the columns are floats:

In [24]: _.dtypes
Out[24]: 
col_a    float64
col_B    float64
col_C    float64
dtype: object

While you could just feed your list into the DataFrame constructor directly, everything would stay strings:

In [21]: pd.DataFrame(columns=list_vals[0].split(), 
                      data=[row.split() for row in list_vals[1:]])
Out[21]: 
  col_a  col_B col_C
0  12.0   34.0  10.0
1  15.0  111.0    23

In [22]: _.dtypes
Out[22]: 
col_a    object
col_B    object
col_C    object
dtype: object

We could add dtype=float to fix this, of course, but the data might have mixed types: read_csv handles those in the usual way, whereas here we would have to convert each column manually.
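
As a rough sketch of that trade-off: the first part converts the constructor result with an explicit astype(float), and the second part lets read_csv infer types for a mixed input. The mixed_vals list and the names df_manual and df_mixed are made up for illustration and are not part of the question.

```python
import io
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Constructor route: every value starts out as a string, so convert explicitly.
df_manual = pd.DataFrame(
    data=[row.split() for row in list_vals[1:]],
    columns=list_vals[0].split(),
).astype(float)

# Hypothetical mixed-type input (not from the question): one text column,
# one float column, one integer column.
mixed_vals = ['name col_B col_C', 'x 34.0 10', 'y 111.0 23']

# read_csv infers a type per column, so no manual conversion is needed.
df_mixed = pd.read_csv(io.StringIO('\n'.join(mixed_vals)), sep=r'\s+')

print(df_manual.dtypes)
print(df_mixed.dtypes)
```

With the constructor route, a blanket astype(float) would fail on the text column of mixed_vals, so each column would need its own conversion.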

DSM answered Sep 28 '22 02:09


You can do it by converting your data to a dict, e.g.:

>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))})
   col_B col_C col_a
0   34.0  10.0  12.0
1  111.0    23  15.0

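Or, to set your original column order explicitly (on Python 3.7+ with a recent pandas, the dict already preserves insertion order, so the columns come out in this order anyway):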

>>> pd.DataFrame({a: b for a, *b in (zip(*map(str.split, list_vals)))},
...              columns=list_vals[0].split())
  col_a  col_B col_C
0  12.0   34.0  10.0
1  15.0  111.0    23
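
The one-liner above packs several steps together, and its result still holds strings. A step-by-step sketch of the same idea follows, with a pd.to_numeric conversion added at the end. The intermediate names (rows, columns, data) are only for illustration, and the conversion step is an addition, not part of the original answer.

```python
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Split every line into fields: the header row, then the data rows.
rows = [line.split() for line in list_vals]

# Transpose so each tuple holds one column: (name, value, value, ...).
columns = list(zip(*rows))

# First element is the column name, the rest are that column's values.
data = {name: list(values) for name, *values in columns}

# The values are still strings here; convert each column to a numeric dtype.
df = pd.DataFrame(data).apply(pd.to_numeric)
print(df.dtypes)
```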

AChampion answered Sep 28 '22 01:09


You can read this as a numpy structured array, then pass it to pandas. This is useful if you also need to work with numpy and know the data types before reading (otherwise, numpy is less convenient to work with than pandas).

import numpy as np
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Gather names from first line, assume all column types are 'd' (i.e. float)
list_dtype = np.dtype([(name, 'd') for name in list_vals[0].split()])

# Create a numpy structured array
ar = np.fromiter((tuple(x.split()) for x in list_vals[1:]), dtype=list_dtype)

# Now convert it to a pandas DataFrame
dat = pd.DataFrame(ar)
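
If you would rather not rely on numpy parsing the string fields implicitly, a variant is to convert each value with float() first and build the structured array with np.array. This is a sketch under the same assumption that every column is a float; the names records and names are illustrative only.

```python
import numpy as np
import pandas as pd

list_vals = ['col_a col_B col_C', '12.0 34.0 10.0', '15.0 111.0 23']

# Column names from the first line; assume every column is a float ('d').
names = list_vals[0].split()
list_dtype = np.dtype([(name, 'd') for name in names])

# Convert each field to float explicitly, one tuple per row.
records = [tuple(float(v) for v in line.split()) for line in list_vals[1:]]

# Build the structured array from the list of tuples.
ar = np.array(records, dtype=list_dtype)

# The field names become the DataFrame's column names.
dat = pd.DataFrame(ar)
print(dat.dtypes)
```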

Mike T answered Sep 28 '22 01:09