I have a csv file that has several columns that I first delimit by colon (;). However, ONE column is delimited by a pipe | and I would like to delimit this column and create new columns.
Input:
Column 1 Column 2 Column 3
1 2 3|4|5
6 7 6|7|8
10 11 12|13|14
Desired Output:
Column 1 Column 2 ID Age Height
1 2 3 4 5
6 7 6 7 8
10 11 12 13 14
My code so far delimits the first time by ; and then converts to DF (which is my desired end format)
delimit = list(csv.reader(open('test.csv', 'rt'), delimiter=';'))
df = pd.DataFrame(delimit)
Step 1 (Using Pandas): Find the number of rows from the files. Step 1 (Using Traditional Python): Find the number of rows from the files. Step 2: User to input the number of lines per file (Range) and generate a random number. In case you want an equal split, provide the same number for max and min.
You didn't show exactly what the data looks like (you say it's delimited by semicolons, but your examples don't have any), but if it looks like
Column 1;Column 2;Column 3
1;2;3|4|5
6;7;6|7|8
10;11;12|13|14
You could do something like
>>> df = pd.read_csv("test.csv", sep="[;|]", engine='python', skiprows=1,
names=["Column 1", "Column 2", "ID", "Age", "Height"])
>>> df
Column 1 Column 2 ID Age Height
0 1 2 3 4 5
1 6 7 6 7 8
2 10 11 12 13 14
This works by using a regex separator meaning "either ;
or |
" and forcing the column names manually.
Alternatively, you could do it in a few steps:
>>> df = pd.read_csv("test.csv", sep=";")
>>> df
Column 1 Column 2 Column 3
0 1 2 3|4|5
1 6 7 6|7|8
2 10 11 12|13|14
>>> c3 = df.pop("Column 3").str.split("|", expand=True)
>>> c3.columns = ["ID", "Age", "Height"]
>>> df.join(c3)
Column 1 Column 2 ID Age Height
0 1 2 3 4 5
1 6 7 6 7 8
2 10 11 12 13 14
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With