I am working on a CNN project, and I need to preprocess the labels first.
Each image file is a spectrogram, and each one has a label of 250 values stored in an array. The label gives the sequence of pitch values present in that spectrogram. For example, one label file looks like this:
[ 0 0 0 0 0 0 0 0 0 0 0 57 57 57 57 57 57 57 57 58 58 57 57 57
0 0 0 0 0 56 57 57 56 56 56 56 56 56 56 56 56 57 57 58 59 61 62 62
63 64 64 63 64 64 64 64 0 0 0 0 64 64 64 64 63 63 63 63 63 64 63 64
64 64 65 66 66 66 66 66 65 65 66 66 66 66 65 0 0 0 0 65 65 65 66 66
66 66 66 65 65 65 0 0 0 0 64 64 64 64 64 64 64 64 64 64 64 64 64 64
63 0 0 0 0 0 0 0 0 0 0 0 0 0 60 60 60 60 61 61 62 62 62 62
62 62 62 61 0 0 0 62 62 62 62 62 62 62 62 62 62 62 62 60 0 62 61 60
61 61 61 61 61 61 61 61 61 60 0 0 0 0 0 61 60 60 60 61 61 61 61 61
61 0 0 0 0 0 0 59 59 59 59 58 58 59 59 59 59 0 0 0 0 0 0 0
59 59 58 58 59 59 59 59 59 59 0 0 0 0 58 57 57 57 57 57 57 57 57 57
57 57 58 57 0 0 0 0 0 0]
After going through all the label files, I found 51 unique values across them and stored those values in an array.
unique values in y_train (y_test also contains these values):
[ 0 30 31 32 33 34 35 36 37 38
39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58
59 60 61 62 63 64 65 66 67 68
69 70 71 72 73 74 76 77 81 83
85]
I need to run the to_categorical method to one-hot encode the labels over my classes (in my case, 51) before I can do the CNN computation (see the to_categorical docs).
I have done that, but the resulting number of classes is 86, not 51. I assume this is because my labels are already integers, so the method thinks I have 86 classes covering every integer from 0 to 85, while in reality I have only 51 unique values, which lie between 0 and 85 but with gaps (see y_train).
# convert to array first. y_train and y_test are labels for an image X_train and X_test.
y_train = np.array(y_train) # labels for X_train images
y_test = np.array(y_test) # labels for X_test images
# do to_categorical
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
# shape result
y_train: (638, 250, 86) # 638 = total data, 250 = 1 data length, 86 = num_class
y_test: (161, 250, 86) # 161 = total data, 250 = 1 data length, 86 = num_class
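The 86 follows from how to_categorical picks the class count: when num_classes is not given, Keras infers it as the largest label plus one. Here is a minimal NumPy sketch of that rule. The helper one_hot_like_keras and the small label array are made up for illustration and are not from the original post.

```python
import numpy as np

def one_hot_like_keras(y, num_classes=None):
    """Hypothetical stand-in for to_categorical: one-hot encode integer labels."""
    y = np.asarray(y, dtype=int)
    if num_classes is None:
        # Same rule Keras documents: class count inferred as max(y) + 1.
        num_classes = int(y.max()) + 1
    # Indexing the identity matrix appends a one-hot axis of length num_classes.
    return np.eye(num_classes, dtype="float32")[y]

# Made-up labels: 2 samples x 4 time steps, with gaps between the values used.
labels = np.array([[0, 57, 57, 85],
                   [0, 30, 64, 64]])

encoded = one_hot_like_keras(labels)
print(encoded.shape)           # (2, 4, 86): last axis is max label + 1
print(len(np.unique(labels)))  # 5: distinct labels actually present
```

So the gaps in the label values, not the labels being integers as such, are what inflate the last axis.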
Then I came up with the idea of mapping each unique value to a new integer, so that the to_categorical method sees only 51 classes, for example:
0 -> 0
30 -> 1
31 -> 2
32 -> 3
...
85 -> 50
Is there a way in Python to build that kind of mapping from the y_train array? And if there is, can I map the values back to their originals when the computation is finished? Thank you.
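Yes, you can build a dictionary that maps each new class index to its original label value, like below:
import numpy as np
map_dict = {}
# np.unique returns the sorted distinct label values (51 of them here)
for i, value in enumerate(np.unique(y_train)):
    map_dict[i] = value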
Your new categories are the keys of map_dict, which you can get like this:
list(map_dict.keys())
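To convert the label arrays themselves into those new categories, NumPy's np.unique can return the sorted unique values and, with return_inverse=True, each label's index into them. The following is a sketch under assumptions: the arrays are small made-up stand-ins for y_train and y_test, and the names classes, y_train_idx and y_test_idx are illustrative. The mapping is fitted on both arrays together so that train and test share the same indices.

```python
import numpy as np

# Made-up stand-ins for the real label arrays (samples x time steps).
y_train = np.array([[0, 57, 58, 57], [0, 30, 64, 85]])
y_test = np.array([[0, 64, 64, 30]])

# Fit the mapping on all labels so both splits use the same class indices.
all_labels = np.concatenate([y_train.ravel(), y_test.ravel()])
classes, inverse = np.unique(all_labels, return_inverse=True)

# Split the flat index array back into the original shapes.
y_train_idx = inverse[:y_train.size].reshape(y_train.shape)
y_test_idx = inverse[y_train.size:].reshape(y_test.shape)

print(classes)      # sorted unique original values
print(y_train_idx)  # same shape as y_train, values in 0..len(classes)-1
```

The index arrays can then be passed to to_categorical with num_classes=len(classes), which for the 51 values in the question should give a last axis of length 51 instead of 86.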
Later on, whenever you need to get back to an original value, look up its class index k in map_dict:
map_dict[k]
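For whole arrays, such as model output, the same lookup can be vectorized. The sketch below assumes the model emits class scores per time step with shape (samples, time steps, classes). The classes array and the scores are made up for illustration.

```python
import numpy as np

# Sorted original label values; position in this array is the class index.
classes = np.array([0, 30, 57, 58, 64, 85])

# Made-up scores for 1 sample x 3 time steps x 6 classes.
scores = np.array([[[0.9, 0.0, 0.1, 0.0, 0.0, 0.0],
                    [0.1, 0.0, 0.2, 0.0, 0.7, 0.0],
                    [0.0, 0.1, 0.0, 0.0, 0.1, 0.8]]])

# argmax over the class axis gives class indices;
# indexing classes with them restores the original pitch values.
pred_idx = scores.argmax(axis=-1)
pred_pitch = classes[pred_idx]
print(pred_pitch)  # pitches 0, 64, 85 for the three time steps
```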
To print both the keys and the values in the dictionary, do the following:
for key, value in map_dict.items():
    print(key, ' --->', value)