SQL to select events that happened at nearly the same time

I have a database table of events that happened:

(timestamp, other data...)

I want to group these by things that happened at 'pretty much the same time'. That is, when the events are ordered by timestamp, every event in a group is within X seconds (e.g., X=3) of some other event in that group, and more than X seconds from every event in any other group.

Is there a way to do this even somewhat efficiently in SQL, or should I just ORDER BY timestamp, pull data into my app, and do it there?

#0

Sometimes I want to do something like this with some access logs we have. My data looks like:

EventID | UserID | When                | What
--------|--------|---------------------|--------
  7477  |   33   | 20090614:140517.131 | ...
  7478  |   33   | 20090614:140518.992 | ...
  7479  |   33   | 20090614:140522.020 | ...
  7480  |   33   | 20090614:142719.001 | ...
  7481  |   33   | 20090614:142720.668 | ...

Then I want to identify a "session" by userid and whether the times "lump", which is how I'm reading your statement. So, from the above:

 UserId | SessionStart       | Stuff
--------|--------------------|---------
   33   | 6/14/2009 14:05:17 | ...
   33   | 6/14/2009 14:27:19 | ...

I do this in SQL, using SQL Server. My strategy in this case is:

  1. Group by user
  2. For each row, compute the time delta from the previous record.
  3. Make an IsNewSession column with 1 if the delta exceeds my threshold, else 0. A record flagged 1 marks the date/time at which a new session starts.
  4. Make a SessionNumber column which is the running total of IsNewSession. You can then use this number to identify the records in the session, group on them, etc.

In SQL Server, using a temp table, it's pretty quick. Using a single SQL statement, it quickly gets very slow. In both cases, it's really ugly. Oracle, on the other hand, has a nice set of analytic functions to handle the delta and the running total which makes the code both cleaner and (usually) quicker.
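To illustrate what the analytic-function approach looks like, here is a minimal sketch that runs the delta and the running total as window functions (LAG and SUM ... OVER). It is not the Oracle or SQL Server code referred to above: it uses Python's sqlite3 module purely so the example is self-contained, and it assumes the bundled SQLite is 3.25 or newer. The table name events, its columns, the sample rows, and the helper assign_sessions are all made up for the example.

```python
import sqlite3

# Hypothetical sample data: (event_id, user_id, seconds since an arbitrary start).
ROWS = [
    (7477, 33, 0.0),
    (7478, 33, 1.9),
    (7479, 33, 4.9),
    (7480, 33, 1322.0),
    (7481, 33, 1323.7),
]

QUERY = """
WITH deltas AS (
    SELECT
        event_id,
        user_id,
        ts,
        -- Gap to the previous event of the same user (NULL for the first one).
        ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) AS gap
    FROM events
),
flags AS (
    SELECT
        *,
        -- The first event, or any event after a large gap, starts a session.
        CASE WHEN gap IS NULL OR gap > :threshold THEN 1 ELSE 0 END AS is_new_session
    FROM deltas
)
SELECT
    event_id,
    user_id,
    ts,
    -- Running total of the flags gives a per-user session number.
    SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY ts) AS session_num
FROM flags
ORDER BY user_id, ts
"""


def assign_sessions(rows, threshold):
    """Return (event_id, user_id, ts, session_num) tuples for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER, ts REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"threshold": threshold}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in assign_sessions(ROWS, threshold=600):
        print(row)
```

With a threshold of 600 seconds, the first three sample rows share one session number and the last two share the next. The same LAG / running SUM shape should carry over to other engines that support window functions, though the syntax for timestamp differences varies by engine.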

If MySQL doesn't have any such magic, and if your team isn't particularly enamored with SQL, I'd recommend you just consider doing it in your app for maintainable production code.
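For comparison, the app-side version can be quite short. The following is a rough sketch in Python, assuming the timestamps have already been fetched (for example with ORDER BY timestamp); the function name group_events and the sample values are invented for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps, modeled on the sample rows shown earlier.
EVENTS = [
    datetime(2009, 6, 14, 14, 5, 17, 131000),
    datetime(2009, 6, 14, 14, 5, 18, 992000),
    datetime(2009, 6, 14, 14, 5, 22, 20000),
    datetime(2009, 6, 14, 14, 27, 19, 1000),
    datetime(2009, 6, 14, 14, 27, 20, 668000),
]


def group_events(timestamps, max_gap):
    """Split timestamps into groups whose consecutive events are at most max_gap apart."""
    groups = []
    for ts in sorted(timestamps):
        # Compare with the last event of the current group; a larger gap opens a new group.
        if groups and ts - groups[-1][-1] <= max_gap:
            groups[-1].append(ts)
        else:
            groups.append([ts])
    return groups


if __name__ == "__main__":
    for group in group_events(EVENTS, timedelta(minutes=10)):
        print(group[0], len(group))
```

Because the list is sorted first, each event only needs to be compared with its immediate predecessor, so after the sort the grouping is a single pass.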

Below is the sanitized version of what I'm using. If you want the "single SQL statement" version, let me know. Apologies for giving you SQL Server code instead of MySQL. :)

-- Set up work table
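-- Drop the work table only if a previous run left it behind.
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
    DROP TABLE #temp;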
CREATE TABLE #temp
(
    ID INT PRIMARY KEY,
    EventDate DATETIME,
    RecordRank INT,
    IsNewSession INT,
    SessionNum INT
);

DECLARE
    @NumSecondsBetweenSessions INT,
    @StartDate DATETIME,
    @EndDate DATETIME
;

SELECT
    @NumSecondsBetweenSessions = 600,
    @StartDate = '20000101',
    @EndDate = '20201231'
;

-- Set up what will be our "Current" records in the "Current vs
-- Previous" comparison.
INSERT INTO #temp
(
    ID,
    EventDate,
    RecordRank,
    IsNewSession,
    SessionNum
)
SELECT
    SL.ID,
    SL.Created_DateTime,
    ROW_NUMBER() OVER (ORDER BY SL.Created_DateTime ASC) AS RecordRank,
    0,
    0
FROM
    SystemLog SL
WHERE
    SL.Created_DateTime BETWEEN @StartDate and @EndDate
;

-- Checking the time delta between the Current and Previous
-- records to see if we have a new session.
UPDATE #temp
SET
    IsNewSession = 
        CASE
            WHEN PrevT.EventDate IS NULL THEN 1
            WHEN DATEDIFF(s, PrevT.EventDate, #temp.EventDate) > @NumSecondsBetweenSessions THEN 1
            ELSE 0
        END
FROM
    #temp
    LEFT OUTER JOIN #temp PrevT
    ON #temp.RecordRank = (PrevT.RecordRank + 1)
;

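-- This is performing a "running total" on IsNewSession to assign
-- records to a specific Session.
-- Note: this variable-assignment UPDATE relies on rows being processed
-- in clustered-index (ID) order, which SQL Server does not guarantee and
-- which matches time order only if IDs increase with Created_DateTime.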
DECLARE @SessionNum INT;
SET @SessionNum = 0;
UPDATE #temp
SET
    @SessionNum = @SessionNum + IsNewSession,
    SessionNum = @SessionNum
;

-- The results.
SELECT
    T.*,
    SL.*
FROM
    #temp T
    JOIN SystemLog SL
    ON SL.ID = T.ID
ORDER BY
    RecordRank ASC
;

#1

You could use UNIX_TIMESTAMP and DIV to calculate values that would be the same for events that happened "at the same time".

The following counts the number of events in each 10-second interval:

SELECT UNIX_TIMESTAMP(timestamp) DIV 10, COUNT(*)
FROM events
GROUP BY 1;
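One thing to keep in mind is that fixed buckets are not the same as the gap-based grouping described in the question: two events one second apart can fall on either side of a bucket boundary. A small Python sketch of the same integer division makes this visible; the timestamps are made-up values and the helper bucket is only an illustration of what DIV does for non-negative whole seconds.

```python
def bucket(ts_seconds, width=10):
    """Mimic UNIX_TIMESTAMP(timestamp) DIV width for non-negative whole seconds."""
    return ts_seconds // width


# Hypothetical event times, in seconds.
a, b, c = 1009, 1010, 1019

if __name__ == "__main__":
    # a and b are 1 second apart but land in different buckets;
    # b and c are 9 seconds apart and share one.
    print(bucket(a), bucket(b), bucket(c))
```

If the exact "within X seconds of a neighbor" rule matters, comparing each event with its predecessor is the safer route; the bucket query is fine when approximate time slots are enough.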
