I want to use pandas to rearrange the data from a text file so that it is easy for the user to parse. So far I can import several text files, append the data to a data frame, and add headers. What I want to do is move each value to the correct column, but the issue is that all the data ends up in the same column.
Here is my data:
test2218
math-science-physics
00:00:00:00
00:00:30:00
03-21 04:00:00
28
test2228
math
00:00:00:00
00:00:30:00
03-21 04:00:00
26
test2317
reading-comprehension
00:00:00:00
00:00:30:00
03-21 20:02:00
This is what I want my output to look like:
Test ID   Test Info              Duration_A   Duration_B   Next Use        Participants
test2218  math-science-physics   00:00:00:00  00:00:30:00  03-21 14:00:00  28
test2228  math                   00:00:00:00  00:00:30:00  03-21 14:00:00  26
test2317  reading-comprehension  00:00:00:00  00:00:30:00  04-11 13:30:00  2
I've looked everywhere and can't find a clear answer. Can someone assist?
Here is my code so far:
import os, glob, pandas as pd
d_frame = []
c_names = ['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants']
files_list = glob.glob(os.path.join('C:\\test', '*.txt'))
for file in files_list:
    if os.stat(file).st_size != 0:
        df = pd.read_csv(file, delimiter='\t', header=None, names=c_names)
Any insight on this would be greatly appreciated. Thanks in advance!
Assuming your data is a pandas.DataFrame object and those 6 pieces of information are always present in that specific order, you might try:
df = pd.DataFrame({0: ['test2218', 'math-science-physics', '00:00:00:00', '00:00:30:00', '03-21 04:00:00', '28', 'test2228', 'math', '00:00:00:00', '00:00:30:00', '03-21 04:00:00', '26', 'test2317', 'reading-comprehension', '00:00:00:00', '00:00:30:00', '03-21 20:02:00']})
columns = ['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants']
df_new = pd.DataFrame(df.groupby(df.index // len(columns))[0].apply(list).values.tolist(), columns=columns)
print(df_new)
    Test ID   Test Info              Duration_A   Duration_B   Next Use        Participants
0   test2218  math-science-physics   00:00:00:00  00:00:30:00  03-21 04:00:00  28
1   test2228  math                   00:00:00:00  00:00:30:00  03-21 04:00:00  26
2   test2317  reading-comprehension  00:00:00:00  00:00:30:00  03-21 20:02:00  None
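To see why the groupby version works, it can help to look at the grouping key on its own. The sketch below uses placeholder letters and a hypothetical record width of 3 (both are assumptions for illustration, not the asker's data): integer-dividing the row index by the record width labels each row with the record it belongs to, and a short final group simply yields a shorter list.

```python
import pandas as pd

# Placeholder data: 8 values, one per row (not the asker's data).
df = pd.DataFrame({0: list('abcdefgh')})
width = 3  # hypothetical record size, chosen only for this illustration

# Integer division of the row index assigns a record number to every row.
keys = df.index // width
print(keys.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2]

# Collect the values of each record into a list; the last one is short.
groups = df.groupby(keys)[0].apply(list).tolist()
print(groups)  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h']]
```

When those lists are passed to pd.DataFrame, the short final list is padded, which is where the None in the last row comes from.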
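Or alternatively, provided the number of values is an exact multiple of len(columns) (reshape raises a ValueError otherwise, as it would for the 17 values above):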
df_new = pd.DataFrame(df.values.reshape(-1, len(columns)), columns=columns)
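If the last record may be incomplete, one option is to pad the flat values before reshaping. This is only a sketch: reshape_padded is a hypothetical helper name, and it assumes that missing values can only occur at the very end of the data.

```python
import numpy as np
import pandas as pd

columns = ['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants']

def reshape_padded(values, width):
    """Reshape a flat sequence into rows of `width`, padding the last row with None."""
    values = list(values)
    missing = -len(values) % width  # cells needed to complete the last row
    values.extend([None] * missing)
    return np.array(values, dtype=object).reshape(-1, width)

# The 17 values from the question; the final 'Participants' entry is absent.
flat = ['test2218', 'math-science-physics', '00:00:00:00', '00:00:30:00', '03-21 04:00:00', '28',
        'test2228', 'math', '00:00:00:00', '00:00:30:00', '03-21 04:00:00', '26',
        'test2317', 'reading-comprehension', '00:00:00:00', '00:00:30:00', '03-21 20:02:00']

df_new = pd.DataFrame(reshape_padded(flat, len(columns)), columns=columns)
print(df_new)
```

With a single-column DataFrame like the one above, the same helper could be called as reshape_padded(df[0], len(columns)).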
Here's a simple way to do it with numpy.reshape:
import numpy as np
import pandas as pd
pd.DataFrame(np.reshape(df.values, (len(df) // 6, 6)),
             columns=['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants'])
    Test ID   Test Info              Duration_A   Duration_B   Next Use        Participants
0   test2218  math-science-physics   00:00:00:00  00:00:30:00  03-21 04:00:00  28
1   test2228  math                   00:00:00:00  00:00:30:00  03-21 04:00:00  26
2   test2317  reading-comprehension  00:00:00:00  00:00:30:00  03-21 20:02:00  2
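To tie this back to the file-reading loop in the question, here is a rough end-to-end sketch. The helper names load_records and load_folder are hypothetical, and the code assumes each file holds one value per line, that blank lines can be skipped, and that values are only ever missing at the end of a file (a value missing in the middle would shift every later record).

```python
import glob
import os
import pandas as pd

COLUMNS = ['Test ID', 'Test Info', 'Duration_A', 'Duration_B', 'Next Use', 'Participants']

def load_records(path, columns=COLUMNS):
    """Read one value per line and fold every len(columns) lines into a row."""
    with open(path) as fh:
        values = [line.strip() for line in fh if line.strip()]
    width = len(columns)
    # Cut the flat list into consecutive chunks of `width` values.
    rows = [values[i:i + width] for i in range(0, len(values), width)]
    # Pad a short final chunk explicitly so every row has `width` cells.
    rows = [row + [None] * (width - len(row)) for row in rows]
    return pd.DataFrame(rows, columns=columns)

def load_folder(folder, pattern='*.txt'):
    """Load every non-empty matching file in `folder` into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    frames = [load_records(p) for p in paths if os.stat(p).st_size != 0]
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    return pd.concat(frames, ignore_index=True)
```

With the folder from the question, this would be called as load_folder('C:\\test').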