
Python convert comma separated list to pandas dataframe


I am struggling to convert a list of comma-separated strings into a 7-column DataFrame.

print(type(mylist))
<type 'list'>
print(mylist)
['AN,2__AAS000,26,20150826113000,-283.000,20150826120000,-283.000',
 'AN,2__AE000,26,20150826113000,0.000,20150826120000,0.000',.........

The following creates a frame of a single column:

df = pd.DataFrame(mylist) 

I have reviewed the built-in CSV functionality in pandas, but my CSV data is held in a list. How can I simply convert the list into a 7-column DataFrame?

Thanks in advance.

asked Aug 26 '15 10:08 by user636322




1 Answer

You need to split each string in your list:

import pandas as pd
# split each comma-separated string into its 7 fields
df = pd.DataFrame([sub.split(",") for sub in mylist])
print(df)

Output:

    0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
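Note that splitting the strings leaves every column as text. If you want pandas to infer numeric types, one possible alternative (a minimal sketch, not something the original answer tested) is to join the list into one string and pass it to read_csv through io.StringIO. The name mylist comes from the question, and only the two rows visible there are used here.

```python
import io

import pandas as pd

# The two rows visible in the question; the real list is longer
mylist = [
    "AN,2__AAS000,26,20150826113000,-283.000,20150826120000,-283.000",
    "AN,2__AE000,26,20150826113000,0.000,20150826120000,0.000",
]

# Option 1: split each string; every column stays a string
df_split = pd.DataFrame([row.split(",") for row in mylist])

# Option 2: treat the list as an in-memory CSV so read_csv infers dtypes
df_csv = pd.read_csv(io.StringIO("\n".join(mylist)), header=None)

print(df_split.shape)
print(df_csv.dtypes)
```

Both frames have 7 columns; the difference is that the read_csv version parses the numeric fields instead of keeping them as strings.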

If you know how many lines to skip in your csv you can do it all with read_csv using skiprows=lines_of_metadata:

import pandas as pd
# skip the first 3 lines (the metadata) and read the rest as headerless data
df = pd.read_csv("in.csv", skiprows=3, header=None)
print(df)

Or if each line of the metadata starts with a certain character you can use comment:

df = pd.read_csv("in.csv",header=None,comment="#")   
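To try both read_csv options without creating a file, here is a small sketch that uses io.StringIO as a stand-in for in.csv. The file content is hypothetical (three metadata lines beginning with #, then two of the sample rows).

```python
import io

import pandas as pd

# Hypothetical file content: three metadata lines, then two data rows
text = (
    "# various\n"
    "# metadata\n"
    "# lines\n"
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000\n"
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000\n"
)

# Skip a known number of leading lines
df_skip = pd.read_csv(io.StringIO(text), skiprows=3, header=None)

# Or ignore lines that start with '#' (comment accepts a single character)
df_comment = pd.read_csv(io.StringIO(text), header=None, comment="#")

print(df_skip.equals(df_comment))
```

skiprows depends on the metadata having a fixed length, while comment keeps working if the number of metadata lines changes, as long as they all start with the same single character.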

If you need to match more than one character, you can use itertools.dropwhile, which will drop the leading lines that start with xxx:

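import csv
from itertools import dropwhile
import pandas as pd

with open("in.csv") as f:
    # drop the leading lines that start with "#!!"
    lines = dropwhile(lambda x: x.startswith("#!!"), f)
    r = csv.reader(lines)
    # build the frame inside the with block, while the file is still open
    df = pd.DataFrame.from_records(r)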

Using your input data with some added lines starting with #!!:

#!! various
#!! metadata
#!! lines
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
AN,2__AE000,26,20150826113000,-269.000,20150826120000,-269.000
AN,2__AE000,26,20150826113000,-255.000,20150826120000,-255.000
AN,2__AE00,26,20150826113000,-254.000,20150826120000,-254.000

Outputs:

    0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
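One thing to keep in mind with the csv.reader route is that every value comes back as a string. The sketch below runs the same dropwhile idea on an in-memory stand-in for in.csv and then converts the columns explicitly. Treating columns 3 and 5 as YYYYMMDDHHMMSS timestamps is my assumption from how the sample values look; the question does not say what they are.

```python
import csv
import io
from itertools import dropwhile

import pandas as pd

# In-memory stand-in for in.csv (hypothetical content based on the sample)
text = (
    "#!! various\n"
    "#!! metadata\n"
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000\n"
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000\n"
)

# dropwhile only discards the leading run of lines matching the predicate
lines = dropwhile(lambda line: line.startswith("#!!"), io.StringIO(text))
df = pd.DataFrame(list(csv.reader(lines)))

# csv.reader yields strings, so convert the numeric columns explicitly
for col in (2, 4, 6):
    df[col] = pd.to_numeric(df[col])

# Assumption: columns 3 and 5 hold timestamps in YYYYMMDDHHMMSS form
for col in (3, 5):
    df[col] = pd.to_datetime(df[col], format="%Y%m%d%H%M%S")

print(df.dtypes)
```

Because dropwhile stops dropping at the first line that does not match, a #!! line appearing later in the file would still be read as data.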
answered Oct 21 '22 17:10 by Padraic Cunningham