I have 2 dimensional list/matrix which is dynamic with N rows and M columns.

The data type within a particular column is uniform, example: col1 is text, col2 is integer, column3 is float etc. The order of the columns can be different. Values for certain rows can be also missing.
Expected result should be 2 lists/arrays/dataframes, where:
list1 should be duplicated N times (depending the number of rows) col1_r1, col1_r2, ....., colM_row_n, with appended iterator or the number of rowslist2 should be transposed values of the rows (including the empty ones)What is the best way to achieve this in Python 3.6 using native lists or/and numpy arrays or/and panda dataframes?
output_list1 = [col1_1, col1_2, col1_3, col1_4, col1_5, col2_1, col2_2,
col2_3, col2_4, col2_5, col3_1, col3_2, col3_3, col3_4, col3_5]
-
output_list2 = ["value-row1,col1", "", "value-row3,col1", "value-row4,col1",
",value-row5,col1", "value-row1,col2", "value-row2,col2", "value-row3,col3",
0, "value-row5, col5", "value-row1, col3", 0.0, 0.0, 0.0, "value-row5,col4"]
Thanks in advance for your help.
This should do the trick:
import numpy as np
# create the data in a nested list
data_list = [['Event', 'Waits', 'Total Wait Time (sec)', 'Wait Avg(ms)', '% DB time', 'Wait Class'],
['latch free', '15,625', '311', '19.91', '29.6', 'Other'],
['library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency'],
['DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']]
# transform into numpy object array
data_array = np.array(data_list, dtype=object)
# construct header from first row
header = data_array[0, :]
# only use the data part of the array
data = data_array[1:, :]
list1 = []
list2 = []
for i in range(data.shape[0]):
for j in range(data.shape[1]):
# adjust for the 1 based index of row numbers
# transpose header columns by switching indices i and j
list1.append('{}_{}'.format(header[j], i+1))
# populate flattened data list
list2.append(data[i,j])
print(list1)
print(list2)
Output:
list1 = ['Event_1', 'Waits_1', 'Total Wait Time (sec)_1', 'Wait Avg(ms)_1', '% DB time_1', 'Wait Class_1', 'Event_2', 'Waits_2', 'Total Wait Time (sec)_2', 'Wait Avg(ms)_2', '% DB time_2', 'Wait Class_2', 'Event_3', 'Waits_3', 'Total Wait Time (sec)_3', 'Wait Avg(ms)_3', '% DB time_3', 'Wait Class_3']
list2 = ['latch free', '15,625', '311', '19.91', '29.6', 'Other', 'library cache: mutex X', '90,012', '117,8', '1.31', '11.2', 'Concurrency', 'DB CPU', '\xa0', '87,3', '\xa0', '8.3', '\xa0']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With