Someone asked me for some SQL to aggregate up some data from a home energy monitor. In this particular case it's a plug controlling a microwave oven and measuring the power that it uses. The input data looks something like the following:
Dev Meas Timestamp Power
Plug power 1449660961833 109.6
Plug power 1449660963225 209.2
Plug power 1449660964291 509.9
Plug power 1449660965296 1199.2
Plug power 1449660966530 1197
Plug power 1449660967766 1195.6
Plug power 1449660969023 1196.1
Plug power 1449660970216 1199
Plug power 1449660972727 1198.1
Plug power 1449660973929 1197.1
Plug power 1449660975130 1193.9
Plug power 1449660976376 667.3
Plug power 1449660977589 174.8
Plug power 1449660978800 76.3
Plug power 1449660980033 51.8
Plug power 1449660986193 11.7
Plug power 1449660987398 3.7
Plug power 1449660988799 1.9
Plug power 1449661288372 211.8
Plug power 1449661289734 478
Plug power 1449661290753 1216.7
Plug power 1449661292024 1204.9
Plug power 1449661293171 1202
Plug power 1449661298122 1201.5
Plug power 1449661302994 670.6
Plug power 1449661304224 175.7
Plug power 1449661305498 76.8
The objective is to translate this into
Start ts End ts On/Off Total Power
1449660961833 1449660986193 1 11386.6
1449660987398 1449660988799 0 5.6
1449661288372 1449661312834 1 6534.3
1449661314041 1449666826229 0 24.0
1449667136265 1449667196427 1 39542.4
1449667197658 1449667197658 0 1.7
1449667598249 1449667647221 1 24923.4
1449667648462 1449670435807 0 12.3
1449670609718 1449670646433 1 19924.9
1449670647639 1449670650052 0 7.5
1449671163432 1449671200244 1 18427.8
1449671201449 1449674183396 0 16.0
1449674184637 1449674184637 1 12.6
1449674190220 1449674212614 0 1.4
1449674245143 1449674246497 1 21.4
1449674251344 1449674251344 0 0
This kind of thing is easy to do in SQL using OLAP functions assuming you have a unique grouping key for each data partition you want to group. The trick here is to generate that key in order to uniquely identify each on and off period for the microwave. This is what I ended up with. It seems long winded so I want to see if I can simplify it.
-- mark each row as plug on (1) or plug off (0)
with state_data as (
select ts,
event,
case
when cast(event as integer) > 5 then 1 -- set the on/off threshold value here
else 0
end as plug_state
from events
where device = 'Plug' and
name = 'power'
order by ts asc
),
-- add some information from the previous row to each row
lag_data as (
select lag(plug_state, 1,0) over ( order by ts) as last_period_plug_state,
lag(ts, 1,0) over ( order by ts) as last_period_end_ts,
ts as period_start_ts,
event as period_event,
plug_state as period_plug_state
from state_data
order by ts asc
),
-- select just the rows where the plug changes from on to off or from off to on
transition_data as (
select last_period_end_ts, period_start_ts, period_plug_state from lag_data
where period_plug_state = 0 and last_period_plug_state = 1 or
period_plug_state = 1 and last_period_plug_state = 0
),
-- create a set of rows that describe the start and end of each on or off period
period_data as (
select lag(period_start_ts, 1,0) over ( order by period_start_ts) as period_start_ts,
last_period_end_ts as period_end_ts
from transition_data
),
-- annotate each row in the full table with the time of the period to which it belongs
annotated_state_data as (
select *, p.period_start_ts as period_id
from state_data s, period_data p
where s.ts >= p.period_start_ts and
s.ts <= p.period_end_ts
)
-- finally aggregate up each period
select avg(period_start_ts), avg(period_end_ts), avg(plug_state), sum(event)
from annotated_state_data
group by period_id
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment