
Create Pandas DataFrame from space separated String

I have a string:

              C1     C2                       DATE     C4     C5         C6      C7
0            0.0    W04  2021-01-08 00:00:00+00:00      E    EUE         C1     157
1            0.0    W04  2021-01-08 00:00:00+00:00      E    AEU         C1     157
2            0.0    W04  2021-01-01 00:00:00+00:00      E   SADA         H1     747
3            0.0    W04  2021-01-04 00:00:00+00:00      E   SSEA         H1     747
4            0.0    W04  2021-01-05 00:00:00+00:00      E   GPEA         H1     747

It looks like a pandas DataFrame because it was printed from one. I need to convert it back into a pandas DataFrame.

I tried the following:

pd.read_csv(StringIO(string_file),sep=r"\s+")

but it misaligns the columns and splits the DATE column into two columns.

asked Feb 02 '21 18:02 by Pedro Cintra




1 Answer

First, recreate the string:

s = """
              C1     C2                       DATE     C4     C5         C6      C7
0            0.0    W04  2021-01-08 00:00:00+00:00      E    EUE         C1     157
1            0.0    W04  2021-01-08 00:00:00+00:00      E    AEU         C1     157
2            0.0    W04  2021-01-01 00:00:00+00:00      E   SADA         H1     747
3            0.0    W04  2021-01-04 00:00:00+00:00      E   SSEA         H1     747
4            0.0    W04  2021-01-05 00:00:00+00:00      E   GPEA         H1     747
"""

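Now you can use pandas.read_csv to read the string through an in-memory buffer. Splitting on two or more whitespace characters keeps the single space inside the DATE values intact:

from io import StringIO
import pandas as pd

# A regex separator is handled by the Python engine; naming it explicitly avoids a ParserWarning
df = pd.read_csv(StringIO(s), sep=r"\s\s+", engine="python")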

From what I can tell, this results in exactly the DataFrame that you are looking for:

Screenshot of resulting DataFrame
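
To see why the two-space separator matters, the sketch below splits one row from the question with both patterns and then parses the full sample. It assumes that columns are always separated by at least two whitespace characters and that no value contains a run of two or more spaces; the variable names are illustrative only.

```python
import re
from io import StringIO

import pandas as pd

# Sample text copied from the question.
s = """\
              C1     C2                       DATE     C4     C5         C6      C7
0            0.0    W04  2021-01-08 00:00:00+00:00      E    EUE         C1     157
1            0.0    W04  2021-01-08 00:00:00+00:00      E    AEU         C1     157
2            0.0    W04  2021-01-01 00:00:00+00:00      E   SADA         H1     747
3            0.0    W04  2021-01-04 00:00:00+00:00      E   SSEA         H1     747
4            0.0    W04  2021-01-05 00:00:00+00:00      E   GPEA         H1     747
"""

first_row = s.splitlines()[1]

# r"\s+" treats the single space inside the timestamp as a separator,
# so the DATE value ends up in two separate fields.
one_or_more = re.split(r"\s+", first_row)

# r"\s\s+" only splits on runs of two or more whitespace characters,
# so the timestamp stays in one field.
two_or_more = re.split(r"\s\s+", first_row)

print(len(one_or_more), len(two_or_more))

# Each data row has one more field than the header, so pandas uses the
# first field of every row as the index.
df = pd.read_csv(StringIO(s), sep=r"\s\s+", engine="python")
print(df.columns.tolist())
```

If two neighbouring columns were ever separated by a single space, this approach would merge them, so it is worth checking the column count after parsing.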

You may want to convert the DATE column to datetime values as well:

df['DATE'] = pd.to_datetime(df['DATE'])
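
As a quick way to check the conversion, here is a minimal sketch on a stand-in Series. The `dates` Series is hypothetical and simply mirrors two DATE values from the question; passing `utc=True` is an assumption that a UTC-normalized, timezone-aware column is what you want.

```python
import pandas as pd

# Hypothetical stand-in for the DATE column produced by read_csv.
dates = pd.Series(
    ["2021-01-08 00:00:00+00:00", "2021-01-01 00:00:00+00:00"],
    name="DATE",
)

# utc=True converts every value to UTC and returns a timezone-aware column.
parsed = pd.to_datetime(dates, utc=True)

# Once parsed, the .dt accessor exposes datetime components.
days = parsed.dt.day.tolist()
print(parsed.dtype, days)
```

Because the strings carry a +00:00 offset, the result is timezone-aware; `parsed.dt.tz_localize(None)` would drop the timezone if naive timestamps are preferred.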
answered Oct 18 '22 01:10 by Arthur D.
