I want to do some pivot
ops to some data. Just like following.
>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
... 'two'],
... 'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
... 'baz': [1, 2, 3, 4, 5, 6],
... 'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
foo bar baz zoo
0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar A B C
foo
one 1 2 3
two 4 5 6
I know DolphinDB can do pivot
in sql.
dateValue=2007.08.01
num=500
syms = (exec count(*) from taq
where
date = dateValue,
time between 09:30:00 : 15:59:59,
0<bid, bid<ofr, ofr<bid*1.2
group by symbol order by count desc).symbol[0:num]
priceMatrix = exec avg(bid + ofr)/2.0 as price from taq
where
date = dateValue, Symbol in syms,
0<bid, bid<ofr, ofr<bid*1.2,
time between 09:30:00 : 15:59:59
pivot by time.minute() as minute, Symbol
but how to do pivot
in clickhouse? Should I use client API to get data ? But there are too many rows, it's too difficult to deal with many rows. And if I can't use pandas
, how to implement pivot
operation easily?
It is the preliminary implementation that can help you make a start.
Remarks:
'holes' in rows is not supported (each column should contain value)
the types of all column casted to common type (String)
introduced the field orderNum. It is the order number of source column in result (for example, 'bar'-column be 2nd)
the result represented as rows with one column with Array-type. The order of array items is defined by orderNum.
Prepare test data:
CREATE TABLE test.pivot_test
(
orderNum Int,
s String,
values Array(String)
) ENGINE = Memory;
INSERT INTO test.pivot_test
VALUES
(1, 'foo', ['one', 'one', 'one', 'two', 'two', 'two']),
(3, 'baz', ['1', '2', '3', '4', '5', '6']),
(4, 'zoo', ['x', 'y', 'z', 'q', 'w', 't']),
(2, 'bar', ['A', 'B', 'C', 'A', 'B', 'C']);
/*
The content of table test.pivot_test:
┌─orderNum─┬─s───┬─values────────────────────────────────┐
│ 1 │ foo │ ['one','one','one','two','two','two'] │
│ 3 │ baz │ ['1','2','3','4','5','6'] │
│ 4 │ zoo │ ['x','y','z','q','w','t'] │
│ 2 │ bar │ ['A','B','C','A','B','C'] │
└──────────┴─────┴───────────────────────────────────────┘
*/
Pivot-emulation:
SELECT arrayMap(x -> x.1, arraySort(x -> x.2, groupArray(value_ordernum))) as row
FROM
(
SELECT
(value, orderNum) AS value_ordernum,
value_index
FROM test.pivot_test
ARRAY JOIN
values AS value,
arrayEnumerate(values) AS value_index
/*
The result of execution the nested query:
┌─value_ordernum─┬─value_index─┐
│ ('one',1) │ 1 │
│ ('one',1) │ 2 │
│ ('one',1) │ 3 │
│ ('two',1) │ 4 │
│ ('two',1) │ 5 │
│ ('two',1) │ 6 │
│ ('1',3) │ 1 │
│ ('2',3) │ 2 │
│ ('3',3) │ 3 │
│ ('4',3) │ 4 │
│ ('5',3) │ 5 │
│ ('6',3) │ 6 │
│ ('x',4) │ 1 │
│ ('y',4) │ 2 │
│ ('z',4) │ 3 │
│ ('q',4) │ 4 │
│ ('w',4) │ 5 │
│ ('t',4) │ 6 │
│ ('A',2) │ 1 │
│ ('B',2) │ 2 │
│ ('C',2) │ 3 │
│ ('A',2) │ 4 │
│ ('B',2) │ 5 │
│ ('C',2) │ 6 │
└────────────────┴─────────────┘
*/
)
GROUP BY value_index;
/*
The final result:
┌─row─────────────────┐
│ ['two','A','4','q'] │
│ ['one','C','3','z'] │
│ ['one','B','2','y'] │
│ ['two','B','5','w'] │
│ ['one','A','1','x'] │
│ ['two','C','6','t'] │
└─────────────────────┘
*/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With