Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delimit a specific column and add them as columns in CSV (Python3, CSV)

I have a csv file that has several columns that I first delimit by colon (;). However, ONE column is delimited by a pipe | and I would like to delimit this column and create new columns.

Input:

  Column 1    Column 2      Column 3
     1           2          3|4|5
     6           7          6|7|8
     10          11         12|13|14

Desired Output:

  Column 1   Column 2      ID    Age  Height
     1          2          3      4    5 
     6          7          6      7    8
     10         11         12     13   14

My code so far delimits the first time by ; and then converts to DF (which is my desired end format)

delimit = list(csv.reader(open('test.csv', 'rt'), delimiter=';'))
df = pd.DataFrame(delimit)
like image 278
user3682157 Avatar asked Nov 06 '15 23:11

user3682157


People also ask

How do you split data in a CSV file in Python?

Step 1 (Using Pandas): Find the number of rows from the files. Step 1 (Using Traditional Python): Find the number of rows from the files. Step 2: User to input the number of lines per file (Range) and generate a random number. In case you want an equal split, provide the same number for max and min.


1 Answers

You didn't show exactly what the data looks like (you say it's delimited by semicolons, but your examples don't have any), but if it looks like

Column 1;Column 2;Column 3
1;2;3|4|5
6;7;6|7|8
10;11;12|13|14

You could do something like

>>> df = pd.read_csv("test.csv", sep="[;|]", engine='python', skiprows=1, 
                     names=["Column 1", "Column 2", "ID", "Age", "Height"])
>>> df
   Column 1  Column 2  ID  Age  Height
0         1         2   3    4       5
1         6         7   6    7       8
2        10        11  12   13      14

This works by using a regex separator meaning "either ; or |" and forcing the column names manually.

Alternatively, you could do it in a few steps:

>>> df = pd.read_csv("test.csv", sep=";")
>>> df
   Column 1  Column 2  Column 3
0         1         2     3|4|5
1         6         7     6|7|8
2        10        11  12|13|14
>>> c3 = df.pop("Column 3").str.split("|", expand=True)
>>> c3.columns = ["ID", "Age", "Height"]
>>> df.join(c3)
   Column 1  Column 2  ID Age Height
0         1         2   3   4      5
1         6         7   6   7      8
2        10        11  12  13     14
like image 132
DSM Avatar answered Sep 29 '22 15:09

DSM