Monday, January 11, 2016

Aggregrating time series events using DB2 SQL

Someone asked me for some SQL to aggregate up some data from a home energy monitor. In this particular case it's a plug controlling a microwave oven and measuring the power that it uses. The input data looks something like the following:

Dev    Meas      Timestamp              Power
Plug    power    1449660961833    109.6
Plug    power    1449660963225    209.2
Plug    power    1449660964291    509.9
Plug    power    1449660965296    1199.2
Plug    power    1449660966530    1197
Plug    power    1449660967766    1195.6
Plug    power    1449660969023    1196.1
Plug    power    1449660970216    1199
Plug    power    1449660972727    1198.1
Plug    power    1449660973929    1197.1
Plug    power    1449660975130    1193.9
Plug    power    1449660976376    667.3
Plug    power    1449660977589    174.8
Plug    power    1449660978800    76.3
Plug    power    1449660980033    51.8
Plug    power    1449660986193    11.7
Plug    power    1449660987398    3.7
Plug    power    1449660988799    1.9
Plug    power    1449661288372    211.8
Plug    power    1449661289734    478
Plug    power    1449661290753    1216.7
Plug    power    1449661292024    1204.9
Plug    power    1449661293171    1202
Plug    power    1449661298122    1201.5
Plug    power    1449661302994    670.6
Plug    power    1449661304224    175.7
Plug    power    1449661305498    76.8


The objective is to translate this into

Start ts                    End ts                  On/Off    Total Power
1449660961833    1449660986193    1    11386.6
1449660987398    1449660988799    0    5.6
1449661288372    1449661312834    1    6534.3
1449661314041    1449666826229    0    24.0
1449667136265    1449667196427    1    39542.4
1449667197658    1449667197658    0    1.7
1449667598249    1449667647221    1    24923.4
1449667648462    1449670435807    0    12.3
1449670609718    1449670646433    1    19924.9
1449670647639    1449670650052    0    7.5
1449671163432    1449671200244    1    18427.8
1449671201449    1449674183396    0    16.0
1449674184637    1449674184637    1    12.6
1449674190220    1449674212614    0    1.4
1449674245143    1449674246497    1    21.4
1449674251344    1449674251344    0    0

This kind of thing is easy to do in SQL using OLAP functions assuming you have a unique grouping key for each data partition you want to group. The trick here is to generate that key in order to uniquely identify each on and off period for the microwave. This is what I ended up with. It seems long winded so I want to see if I can simplify it.

-- mark each row as plug on (1) or plug off (0)
with state_data as (
  select ts,
         event,
         case
           when cast(event as integer) > 5 then 1  -- set the on/off threshold value here
           else 0
         end as plug_state
  from events
  where device = 'Plug' and
        name = 'power'
  order by ts asc
),
-- add some information from the previous row to each row
lag_data as (
  select lag(plug_state, 1,0) over ( order by ts) as last_period_plug_state,
         lag(ts, 1,0) over ( order by ts) as last_period_end_ts,
         ts as period_start_ts,
         event as period_event,
         plug_state as period_plug_state
  from state_data
  order by ts asc
),
-- select just the rows where the plug changes from on to off or from off to on
transition_data as (
  select last_period_end_ts, period_start_ts, period_plug_state from lag_data
  where period_plug_state = 0 and last_period_plug_state = 1 or
        period_plug_state = 1 and last_period_plug_state = 0
),
-- create a set of rows that describe the start and end of each on or off period
period_data as (
  select lag(period_start_ts, 1,0) over ( order by period_start_ts) as period_start_ts,
        last_period_end_ts as period_end_ts
  from transition_data
),
-- annotate each row in the full table with the time of the period to which it belongs
annotated_state_data as (
  select *, p.period_start_ts as period_id
  from state_data s, period_data p
  where s.ts >= p.period_start_ts and
        s.ts <= p.period_end_ts
)
-- finally aggregate up each period
select avg(period_start_ts), avg(period_end_ts), avg(plug_state), sum(event)
  from annotated_state_data
  group by period_id