I'm looking to 'flatten' my dataset in order to facilitate data mining. Each categorical column should be changed to multiple Boolean columns. I have a column with categorical values, e.g.:
ID col1
1 A
2 B
3 A
I'm looking for a way to pivot this table, with an aggregate function telling me whether each ID has value A or B:
Result:
ID col1A col1B
1 1 0
2 0 1
3 1 0
I tried using PIVOT but have no idea which aggregate function to use inside it.
I also looked for answers in SF but couldn't find any.
I'm using MS-SQL 2012.
Any help would be appreciated! Omri
EDIT:
The number of categories in col1 is unknown, therefore the solution must be dynamic. Thanks :)
You can also create a dynamic pivot query, which builds the column list for the pivot at run time, so you do not need to hard-code the column names you want to display. A dynamic pivot query reads the distinct values from the table and generates the column name list for the pivot from them.
You can use the PIVOT and UNPIVOT relational operators to change a table-valued expression into another table. PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output.
UNPIVOT does the reverse: it rotates a table by transforming columns into rows. UNPIVOT is a relational operator that accepts two columns (from a table or subquery), along with a list of columns, and generates a row for each column specified in the list. In a query, it is specified in the FROM clause after the table name or subquery.
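To make the UNPIVOT description concrete, here is a minimal sketch that goes the other way: it starts from the wide result in the question and turns the Boolean columns back into rows. The table variable @wide and the output column names category and flag are made up for illustration.

```sql
-- Hypothetical wide table matching the desired result in the question.
DECLARE @wide TABLE (ID INT, col1A INT, col1B INT);
INSERT INTO @wide (ID, col1A, col1B)
VALUES (1, 1, 0), (2, 0, 1), (3, 1, 0);

-- One output row per (ID, listed column): the source column name goes into
-- "category" and its value into "flag". Both names are illustrative.
SELECT ID, category, flag
FROM @wide
UNPIVOT (flag FOR category IN (col1A, col1B)) AS u
ORDER BY ID, category;
```

Note that the columns listed in the IN clause must share the same data type, and UNPIVOT skips NULL values.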
In SQL Server you can use the PIVOT function to transform the data from rows to columns:
select Firstname, Amount, PostalCode, LastName, AccountNumber
from
(
  select value, columnname
  from yourtable
) d
pivot
(
  max(value)
  for columnname in (Firstname, Amount, PostalCode, LastName, AccountNumber)
) piv;
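Applied to the data in the question, a static PIVOT could look like the sketch below. It assumes the categories (A and B) are known up front. The table variable @t and the helper column hit are made up for illustration; hit just gives the aggregate something to count.

```sql
-- Sample data from the question, held in a table variable (illustrative name).
DECLARE @t TABLE (ID INT, col1 CHAR(1));
INSERT INTO @t (ID, col1) VALUES (1, 'A'), (2, 'B'), (3, 'A');

SELECT ID, [A] AS col1A, [B] AS col1B
FROM (
    -- "hit" is a helper column: one countable value per source row.
    SELECT ID, col1, 1 AS hit
    FROM @t
) AS src
PIVOT (
    COUNT(hit) FOR col1 IN ([A], [B])
) AS p
ORDER BY ID;
```

COUNT is a reasonable choice here because it yields 0 rather than NULL for a missing category. If an ID can have the same category more than once, the count can exceed 1, so this only gives strict 0/1 flags when each (ID, col1) pair is unique.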
try this:
select ID,
col1A=(case when col1='A' then 1 else 0 end),
col1B=(case when col1='B' then 1 else 0 end)
from <table>
If one ID can have both A and B and you want each ID to appear only once in the output, you could aggregate:
select ID,
col1A=max(case when col1='A' then 1 else 0 end),
col1B=max(case when col1='B' then 1 else 0 end)
from <table>
group by id
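Here is the grouped version as a self-contained sketch you can paste and run. The table variable @t is made up, and the extra row (3, 'B') is an assumption added to show an ID that has both categories.

```sql
-- Sample data from the question plus one assumed extra row (3, 'B').
DECLARE @t TABLE (ID INT, col1 CHAR(1));
INSERT INTO @t (ID, col1)
VALUES (1, 'A'), (2, 'B'), (3, 'A'), (3, 'B');

-- MAX collapses the per-row 0/1 flags into a single flag per ID.
SELECT ID,
       MAX(CASE WHEN col1 = 'A' THEN 1 ELSE 0 END) AS col1A,
       MAX(CASE WHEN col1 = 'B' THEN 1 ELSE 0 END) AS col1B
FROM @t
GROUP BY ID
ORDER BY ID;
```

With this data, ID 3 comes back as a single row with both col1A and col1B set to 1.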
EDIT
As per your comment, if you do not know the number of distinct values in col1, you can use a dynamic PIVOT:
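DECLARE @cols  AS NVARCHAR(MAX),
        @query AS NVARCHAR(MAX)

-- Build a comma-separated, bracket-quoted list of the distinct col1 values, e.g. [A],[B]
select @cols = STUFF((SELECT distinct ',' + QUOTENAME(col1)
                      from <table>
                      FOR XML PATH(''), TYPE
                     ).value('.', 'NVARCHAR(MAX)')
                     ,1,1,'')

-- The output columns are named after the category values themselves (A, B, ...)
set @query = 'SELECT id, ' + @cols + ' from <table>
              pivot
              (
                  count([col1])
                  for col1 in (' + @cols + ')
              ) p '

print(@query)    -- show the generated statement for debugging
execute(@query)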
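If you want the output columns prefixed as in the question (col1A, col1B) and strict 0/1 values, one possible variant is to generate the MAX(CASE ...) expressions dynamically instead of a PIVOT. This is only a sketch: the temp table #t and its sample rows are made up, and it uses a temp table because dynamic SQL cannot see table variables.

```sql
-- Illustrative temp table holding the sample data from the question.
IF OBJECT_ID('tempdb..#t') IS NOT NULL DROP TABLE #t;
CREATE TABLE #t (ID INT, col1 NVARCHAR(50));
INSERT INTO #t (ID, col1) VALUES (1, N'A'), (2, N'B'), (3, N'A');

DECLARE @cols  NVARCHAR(MAX),
        @query NVARCHAR(MAX);

-- Build one "MAX(CASE ...) AS [col1<value>]" expression per distinct category.
-- QUOTENAME(col1, '''') quotes the value as a string literal,
-- QUOTENAME('col1' + col1) quotes the generated column name.
SELECT @cols = STUFF((
    SELECT ', MAX(CASE WHEN col1 = N' + QUOTENAME(col1, '''')
         + ' THEN 1 ELSE 0 END) AS ' + QUOTENAME('col1' + col1)
    FROM (SELECT DISTINCT col1 FROM #t WHERE col1 IS NOT NULL) AS c
    ORDER BY col1
    FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)'), 1, 2, '');

SET @query = N'SELECT ID, ' + @cols + N' FROM #t GROUP BY ID ORDER BY ID;';

PRINT @query;               -- inspect the generated statement
EXEC sp_executesql @query;  -- run it
```

Keep in mind that QUOTENAME returns NULL for inputs longer than 128 characters, so very long category values would need different handling.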