I have a database table of events that happened:
(timestamp, other data...)
I want to group these by things that happened at 'pretty much the same time'. That is, when ordered by timestamp, such that all events in each group are within X seconds (e.g., X=3) of some other event in that group, and more than X seconds from all events in other groups.
Is there a way to do this even somewhat efficiently in SQL, or should I just ORDER BY timestamp, pull data into my app, and do it there?
Sometimes I want to do something like this with some access logs we have. My data looks like:
EventID | UserID | When | What --------|--------|---------------------|-------- 7477 | 33 | 20090614:140517.131 | ... 7478 | 33 | 20090614:140518.992 | ... 7479 | 33 | 20090614:140522.020 | ... 7480 | 33 | 20090614:142719.001 | ... 7481 | 33 | 20090614:142720.668 | ...
Then I want to identify a "session" by userid and whether the times "lump", which is how I'm reading your statement. So, from the above:
UserId | SessionStart | Stuff --------|--------------------|--------- 33 | 6/14/2009 14:05:17 | ... 33 | 6/14/2009 14:27:19 | ...
I do this in SQL, using SQL Server. My strategy in this case is:
In SQL Server, using a temp table, it's pretty quick. Using a single SQL statement, it quickly gets very slow. In both cases, it's really ugly. Oracle, on the other hand, has a nice set of analytic functions to handle the delta and the running total which makes the code both cleaner and (usually) quicker.
If mysql doesn't have any such magic, and if your team isn't particularly enamored with SQL, I'd recommend you just consider doing it in your app for maintainable production code.
Below is the sanitized version of what I'm using. If you want the "single SQL statement" version, let me know. Apologies for giving you SQL Server code instead of mysql. :)
-- Set up work table DROP TABLE #temp CREATE TABLE #temp ( ID INT PRIMARY KEY, EventDate DATETIME, RecordRank INT, IsNewSession INT, SessionNum INT ); DECLARE @NumSecondsBetweenSessions INT, @StartDate DATETIME, @EndDate DATETIME ; SELECT @NumSecondsBetweenSessions = 600, @StartDate = '20000101', @EndDate = '20201231' ; -- Set up what will be our "Current" records in the "Current vs -- Previous" comparision. INSERT INTO #temp ( ID, EventDate, RecordRank, IsNewSession, SessionNum ) SELECT SL.ID, SL.Created_DateTime, ROW_NUMBER() OVER (ORDER BY SL.Created_DateTime ASC) AS RecordRank, 0, 0 FROM SystemLog SL WHERE SL.Created_DateTime BETWEEN @StartDate and @EndDate ; -- Checking the time delta between the Current and Previous -- records to see if we have a new session. UPDATE #temp SET IsNewSession = CASE WHEN PrevT.EventDate IS NULL THEN 1 WHEN DATEDIFF(s, PrevT.EventDate, #temp.EventDate) > @NumSecondsBetweenSessions THEN 1 ELSE 0 END FROM #temp LEFT OUTER JOIN #temp PrevT ON #temp.RecordRank = (PrevT.RecordRank + 1) ; -- This is performing a "running total" on IsNewSession to assign -- records to a specific Session. DECLARE @SessionNum INT; SET @SessionNum = 0; UPDATE #temp SET @SessionNum = @SessionNum + IsNewSession, SessionNum = @SessionNum ; -- The results. SELECT T.*, SL.* FROM #temp T JOIN SystemLog SL ON SL.ID = T.ID ORDER BY RecordRank ASC ;
You could use
DIV to calculate values that would be the same for events that happened "at the same time".
The following counts the number of events in 10 second intervals:
SELECT UNIX_TIMESTAMP(timestamp) DIV 10, COUNT(*) FROM events GROUP BY 1;
安装Google Go和App Engine SDK的正确方法是什么？