I like little challenges. The ones that don’t take all day to figure out, but are enough to capture your interest for a little while. Yesterday I had such a problem to solve.
I had some data in a table that was basically in ‘pairs’ of rows. My real table was different, but the example we’ll use here is a ‘Message’ table containing requests and replies that are linked through a shared identifier.
Our simple example looks like this (my actual table had more fields).
CREATE TABLE Message
(
    MessageID     INT NOT NULL IDENTITY,
    MessageType   CHAR(1) NOT NULL,
    TransactionID INT NOT NULL,
    MessageBody   VARCHAR(30),
    CreatedDate   DATETIME DEFAULT GetDate()
)
We’ll add a bit of sample data (the script was generated by my insert generator stored proc).
SET IDENTITY_INSERT Message ON
INSERT Message(MessageID,MessageType,TransactionID,MessageBody,CreatedDate) VALUES('1','Q','1','Request Message 1',convert(datetime,'2012-08-30 13:55:07.213',121))
INSERT Message(MessageID,MessageType,TransactionID,MessageBody,CreatedDate) VALUES('2','R','1','Reply Message 1',convert(datetime,'2012-08-30 13:55:37.680',121))
INSERT Message(MessageID,MessageType,TransactionID,MessageBody,CreatedDate) VALUES('3','Q','2','Request Message 2',convert(datetime,'2012-08-30 13:55:51.183',121))
INSERT Message(MessageID,MessageType,TransactionID,MessageBody,CreatedDate) VALUES('4','R','2','Reply Message 2',convert(datetime,'2012-08-30 13:56:04.020',121))
SET IDENTITY_INSERT Message OFF

SELECT * FROM Message

MessageID   MessageType TransactionID MessageBody                    CreatedDate
----------- ----------- ------------- ------------------------------ -----------------------
1           Q           1             Request Message 1              2012-08-30 13:55:07.213
2           R           1             Reply Message 1                2012-08-30 13:55:37.680
3           Q           2             Request Message 2              2012-08-30 13:55:51.183
4           R           2             Reply Message 2                2012-08-30 13:56:04.020
We can see that some of the fields are consistent from row to row (in pairs), and some of the fields are unique to each row. My challenge was to represent a pair of messages in one row.
On the face of it, this seems like it would be simple – just grouping by the TransactionID (the field that links the two rows). The problem is that you won’t be able to get the unique information from both rows without some assumptions (that may not be solid).
For example, this will happily give you the MessageIDs of both sides of the transaction (assuming that the request comes before the reply, and that there are exactly two messages in a transaction) …
SELECT TransactionID,
       MIN(MessageID) AS RequestID,
       MAX(MessageID) AS ReplyID
FROM [Message]
GROUP BY TransactionID
HAVING COUNT(*) = 2

TransactionID RequestID   ReplyID
------------- ----------- -----------
1             1           2
2             3           4
But it doesn’t give you the unique data related to each ID. For that you’d need to correlate the MessageBody to the right MessageID, and MIN(MessageBody) won’t necessarily relate to the ‘Request’.
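To make that concrete, here is a small sketch against the sample table above. The LowestBody alias is just an illustrative name, not something from my real query.

```sql
-- MIN(MessageBody) is evaluated independently of MIN(MessageID),
-- so the two values can come from different rows of the pair.
SELECT TransactionID,
       MIN(MessageID)   AS RequestID,
       MIN(MessageBody) AS LowestBody   -- lowest string in the group, not the request's body
FROM [Message]
GROUP BY TransactionID
HAVING COUNT(*) = 2;
```

With the sample data (and assuming an ordinary collation), 'Reply Message 1' sorts before 'Request Message 1', so RequestID 1 ends up sitting next to the reply's body rather than the request's.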
So… we can think about how to correlate the data to get the result we want. There are a few options…
1. Use a temporary table, and get the result in two steps (reusing the query above).
-- 1 - Two Step Process
SELECT TransactionID,
       MIN(MessageID) AS RequestID,
       MAX(MessageID) AS ReplyID
INTO #MessagePair
FROM [Message]
GROUP BY TransactionID
HAVING COUNT(*) = 2

SELECT REQ.MessageID   AS RequestMessageID,
       REQ.TransactionId,
       REQ.MessageBody AS RequestBody,
       REQ.CreatedDate AS RequestDate,
       RPY.MessageID   AS ReplyMessageID,
       RPY.MessageBody AS ReplyBody,
       RPY.CreatedDate AS ReplyDate
FROM #MessagePair MP
INNER JOIN [Message] REQ ON REQ.MessageID = MP.RequestID
INNER JOIN [Message] RPY ON RPY.MessageID = MP.ReplyID

RequestMessageID TransactionId RequestBody                    RequestDate             ReplyMessageID ReplyBody                      ReplyDate
---------------- ------------- ------------------------------ ----------------------- -------------- ------------------------------ -----------------------
1                1             Request Message 1              2012-08-30 13:55:07.213 2              Reply Message 1                2012-08-30 13:55:37.680
3                2             Request Message 2              2012-08-30 13:55:51.183 4              Reply Message 2                2012-08-30 13:56:04.020
2. Nasty correlated subquery and joins (not even going there)
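For the curious, a rough sketch of what that option might look like is below. This is an illustration only, not something I ran against my real table, and the alias M is made up for the example.

```sql
-- Option 2 sketch: find the lowest and highest MessageID per transaction
-- with correlated subqueries instead of a temporary table.
SELECT REQ.MessageID   AS RequestMessageID,
       REQ.TransactionID,
       REQ.MessageBody AS RequestBody,
       REQ.CreatedDate AS RequestDate,
       RPY.MessageID   AS ReplyMessageID,
       RPY.MessageBody AS ReplyBody,
       RPY.CreatedDate AS ReplyDate
FROM [Message] REQ
INNER JOIN [Message] RPY
    ON RPY.TransactionID = REQ.TransactionID
WHERE REQ.MessageID = (SELECT MIN(M.MessageID)
                       FROM [Message] M
                       WHERE M.TransactionID = REQ.TransactionID)
  AND RPY.MessageID = (SELECT MAX(M.MessageID)
                       FROM [Message] M
                       WHERE M.TransactionID = REQ.TransactionID)
  AND REQ.MessageID <> RPY.MessageID;   -- skip transactions with only one message
```

Note that, unlike the HAVING COUNT(*) = 2 version, this sketch does not exclude transactions with three or more messages. It simply pairs the first message with the last.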
3. Single query that makes use of the assumption that a request happens before a reply (meaning the request’s MessageID will be the lower value)
SELECT REQ.MessageID   AS RequestMessageID,
       REQ.TransactionId,
       REQ.MessageBody AS RequestBody,
       REQ.CreatedDate AS RequestDate,
       RPY.MessageID   AS ReplyMessageID,
       RPY.MessageBody AS ReplyBody,
       RPY.CreatedDate AS ReplyDate
FROM [Message] REQ
INNER JOIN [Message] RPY
    ON REQ.TransactionID = RPY.TransactionID
   AND REQ.MessageID < RPY.MessageID
This produces the same result as above, and is what I ended up going with. I reckon there are probably a few more viable solutions, so I’d be interested to see anyone’s alternatives.
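One possible alternative, sketched here under a couple of assumptions: that MessageType 'Q' always marks a request and 'R' a reply (which is what the sample data suggests, but isn't stated anywhere), and that each transaction has at most one of each. It swaps the "request comes first" assumption for one about MessageType.

```sql
-- Alternative sketch: pair rows on MessageType rather than on MessageID order.
-- Assumes 'Q' = request and 'R' = reply, with at most one of each per transaction.
SELECT REQ.MessageID   AS RequestMessageID,
       REQ.TransactionID,
       REQ.MessageBody AS RequestBody,
       REQ.CreatedDate AS RequestDate,
       RPY.MessageID   AS ReplyMessageID,
       RPY.MessageBody AS ReplyBody,
       RPY.CreatedDate AS ReplyDate
FROM [Message] REQ
INNER JOIN [Message] RPY
    ON RPY.TransactionID = REQ.TransactionID
   AND RPY.MessageType = 'R'
WHERE REQ.MessageType = 'Q';
```

If a transaction could contain several requests or replies, this would return every request/reply combination, so it is only a fit when the pairing really is one-to-one.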