Pandas DataFrame from raw string

I've got a string which looks like:

a1\tb1\tc1\na2\tb2\tc2\na3\tb3\tc3\n...

Is there an efficient and smart way to convert this kind of string into a Pandas DataFrame? StringIO doesn't seem to be the right tool for this.

Thanks in advance!!

asked Dec 19 '22 at 01:12 by P. Solar



1 Answer

io.StringIO works fine here: it wraps the string in a file-like object that pd.read_csv can read directly.

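import io

import pandas as pd

string = 'a1\tb1\tc1\na2\tb2\tc2\na3\tb3\tc3'
# Fields are tab-separated. delim_whitespace=True also works on older
# pandas but is deprecated since 2.2 (use sep=r'\s+' instead).
pd.read_csv(io.StringIO(string), sep='\t', header=None)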

    0   1   2
0  a1  b1  c1
1  a2  b2  c2
2  a3  b3  c3
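If "raw string" means the text literally contains backslash-t and backslash-n pairs (for example, escaped text loaded from a file or an API), the parser sees one long line with no tabs in it. A minimal sketch, assuming that is the case: turn those two escape sequences into real characters first, then parse as above. The input value below is hypothetical.

```python
import io

import pandas as pd

# Hypothetical input: each \t and \n here is two literal characters
# (a backslash followed by a letter), not a real tab or newline.
escaped = r'a1\tb1\tc1\na2\tb2\tc2\na3\tb3\tc3'

# Replace the literal escape sequences with real tab and newline characters.
unescaped = escaped.replace('\\t', '\t').replace('\\n', '\n')

df = pd.read_csv(io.StringIO(unescaped), sep='\t', header=None)
print(df)
```

A targeted str.replace only touches those two sequences, so any other backslashes in the data are left alone.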

You can also use pd.read_table or pd.read_fwf in the same manner:

pd.read_table(io.StringIO(string), header=None)

Or,

pd.read_fwf(io.StringIO(string), header=None)

    0   1   2
0  a1  b1  c1
1  a2  b2  c2
2  a3  b3  c3

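In these last two examples, pd.read_table splits on tabs by default (sep='\t'), while pd.read_fwf infers fixed-width column boundaries and treats tabs and spaces as padding. Either way, the raw string needs a consistent structure: the same number of fields on every line and, for read_fwf, fields lined up at the same positions.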
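To see what "consistent structure" means in practice, here is a small sketch with two hypothetical malformed inputs. With header=None, read_csv takes the column count from the first line: a shorter line is padded with NaN, while a longer one makes the default C parser raise pandas.errors.ParserError.

```python
import io

import pandas as pd
from pandas.errors import ParserError

# Second line has only two fields: the missing third cell becomes NaN.
short_row = 'a1\tb1\tc1\na2\tb2\na3\tb3\tc3'
df = pd.read_csv(io.StringIO(short_row), sep='\t', header=None)
print(df)

# Second line has four fields, but the first line fixed the count at three,
# so parsing fails instead of silently adding a column.
long_row = 'a1\tb1\tc1\na2\tb2\tc2\td2\na3\tb3\tc3'
try:
    pd.read_csv(io.StringIO(long_row), sep='\t', header=None)
    raised = False
except ParserError as exc:
    raised = True
    print('ParserError:', exc)
```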


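Finally, you can skip the parser and split the string yourself, first on newlines and then on whitespace (str.split with no argument splits on any run of whitespace, tabs included, so fields that themselves contain spaces would be broken apart):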

pd.DataFrame(list(map(str.split, string.splitlines())))

    0   1   2
0  a1  b1  c1
1  a2  b2  c2
2  a3  b3  c3
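One practical difference between the approaches: read_csv infers column types, while the splitting approach leaves every cell as a string. The sketch below uses a hypothetical all-numeric input to show this, then converts the split result with pd.to_numeric.

```python
import io

import pandas as pd

numeric = '1\t2\t3\n4\t5\t6'

# read_csv infers column types, so these columns come back as integers.
parsed = pd.read_csv(io.StringIO(numeric), sep='\t', header=None)
print(parsed.dtypes)

# The split approach keeps every cell as a Python str.
split = pd.DataFrame(list(map(str.split, numeric.splitlines())))
print(split.dtypes)

# Convert each column to numbers when numeric values are needed.
converted = split.apply(pd.to_numeric)
print(converted.dtypes)
```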
answered Dec 27 '22 at 02:12 by cs95
