I have a table with two columns:
+---------+--------+
| keyword | color |
+---------+--------+
| foo | red |
| bar | yellow |
| fobar | red |
| baz | blue |
| bazbaz | green |
+---------+--------+
I need to do some kind of one-hot encoding and transform table in PostgreSQL to:
+---------+-----+--------+-------+------+
| keyword | red | yellow | green | blue |
+---------+-----+--------+-------+------+
| foo | 1 | 0 | 0 | 0 |
| bar | 0 | 1 | 0 | 0 |
| fobar | 1 | 0 | 0 | 0 |
| baz | 0 | 0 | 0 | 1 |
| bazbaz | 0 | 0 | 1 | 0 |
+---------+-----+--------+-------+------+
Is it possible to do with SQL only? Any tips on how to get started?
In this technique, the categorical parameters will prepare separate columns for both Male and Female labels. So, wherever there is Male, the value will be 1 in Male column and 0 in Female column, and vice-versa.
One-hot encoding. Create as many columns as there are unique values in a variable. Put a 1 in a cell if the column and row represent the same value, otherwise put a zero in the cell. Use these new columns to create ML models.
In these cases, one-hot encoding comes in help because it transforms categorical data into numerical; in other words: it transforms strings into numbers so that we can apply our Machine Learning algorithms without any problems.
If I correctly understand, you need conditional aggregation:
select keyword,
count(case when color = 'red' then 1 end) as red,
count(case when color = 'yellow' then 1 end) as yellow
-- another colors here
from t
group by keyword
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With