How to remove overlapping rows based on date and keep most recent in sql?

Multi tool use
Multi tool use


How to remove overlapping rows based on date and keep most recent in sql?



I need to find all episode_ids for every patient. However, when an overlapping episode arises within 90 days of the previous, then I just want to keep the most recent episode.



For example, patient_num 3242 below has 3 episodes: the second episode overlaps with the first episode within 90 days, and the third episode overlaps with the second episode within 90 days, in this situation I need to just keep the 3rd episode.


patient_num 3242


CREATE TABLE table1 (episode_id nvarchar(max), patient_num nvarchar(max), admit_date date, discharge_date date)

INSERT INTO table1 (episode_id, patient_num , admit_date , discharge_date ) VALUES
('1','5743','1/1/2016','1/5/2016'),
('2','5743','4/26/2016','4/29/2016'),
('3','5743','5/26/2016','5/28/2016'),
('4','5743','9/21/2016','9/28/2016'),
('5','8859','4/27/2016','5/5/2016'),
('6','3242','4/28/2016','4/29/2016'),
('7','3242','11/21/2016','11/23/2016'),
('8','3242','11/24/2016','11/29/2016'),
('9','3242','12/12/2016','12/29/2016')



Initial Table (table1)


episode_id patient_num admit_date discharge_date
1 5743 2016-01-01 2016-01-05
2 5743 2016-04-26 2016-04-29
3 5743 2016-05-26 2016-05-28
4 5743 2016-09-21 2016-09-28
5 8859 2016-04-27 2016-05-05
6 3242 2016-04-28 2016-04-29
7 3242 2016-11-21 2016-11-23
8 3242 2016-11-24 2016-11-29
9 3242 2016-12-12 2016-12-29



expected result


episode_id patient_num admit_date discharge_date
1 5743 2016-01-01 2016-01-05
3 5743 2016-05-26 2016-05-28
4 5743 2016-09-21 2016-09-28
5 8859 2016-04-27 2016-05-05
6 3242 2016-04-28 2016-04-29
9 3242 2016-12-12 2016-12-29



My attempt:


SELECT *
FROM table1 AS a
WHERE EXISTS
(
SELECT *
FROM table1 AS b
WHERE a.episode_id != b.episode_id
AND a.patient_num= b.patient_num
AND a.admit_date BETWEEN b.discharge_date AND DATEADD(DAY, 90, b.discharge_date ))



There error in my script is that for patient num 3242, I am getting both episode id 8 and 9 where I only want episode 9. I am assuming the reason for this error is that I am comparing each row individually instead of as a group but I am having trouble grouping. Additionally, this script is not showing instances where there is not an overlap, such as the episode_id 1, 4, 5, 6. Any advice on this approach?


3242





Give us the Script to generate the tables as you have them above
– Afshin Amiri
Jul 2 at 0:10





I added a script to generate the tables
– MartyB
Jul 2 at 0:48





As I see from data this is not correct:he second episode overlaps with the first episode within 90 days
– Afshin Amiri
Jul 2 at 1:31






The first and second episode are 112 days apart.
– MartyB
Jul 2 at 1:35





IMHO, we need to select all NON_Overlapped rows. Then from the others create OverlapRanges and from within each OverlapRange select the TOP 1. e.g. please clone episodes 7,8,9 and change their years to 2017. The part I havent solved is creating the OverlapRange. I am working with Celko-s definition of Overlap in SQL for Smarties which is slightly different from BETWEEN which includes the end points, whereas Overlap does not.
– donPablo
Jul 2 at 3:02





2 Answers
2



I removed the cursor solution here since it has low perforamnce
a solution without using Cursor is:


WITH ExcludedIds AS (
SELECT DISTINCT T2.episode_id
FROM table1 AS T
INNER JOIN table1 AS T2 ON T.episode_id != T2.episode_id
AND T.patient_num = T2.patient_num
AND T2.discharge_date BETWEEN DATEADD(DAY, -90, T.admit_date ) AND T.discharge_date)

SELECT T.episode_id, T.patient_num, T.admit_date, T.discharge_date
FROM table1 AS T
WHERE T.episode_id NOT IN (SELECT ExcludedIds.episode_id FROM ExcludedIds)



Thought understanding this solution is a bit hard.





This can be solved using set-based logic. A cursor is definitely not the right way to go.
– Gordon Linoff
Jul 2 at 1:43





How do you know that cursor is not efficient. Have you ever compared subquery performance against fast_forward readonly cursor. There is always a trade-off. It also depends on how many records the table has.
– Afshin Amiri
Jul 2 at 1:46






. . I have been working with relational databases since 1989. And yes, at times in the past, I have compared performance. I also understand how they work.
– Gordon Linoff
Jul 2 at 1:51





This is fantastic! I understand CTE better than cursors so plus 1 for that! Also, your CTE answer takes about 10 min while the cursor took about 2 hours.
– MartyB
Jul 2 at 16:22





I am Happy that it helped. I think @GordonLinoff deserves up-vote for his comment not his solution.
– Afshin Amiri
Jul 2 at 20:21



I think not exists does what you want:


not exists


SELECT a.*
FROM table1 a
WHERE NOT EXISTS (SELECT 1
FROM table1 b
WHERE a.episode_id <> b.episode_id AND
a.patient_num = b.patient_num AND
b.admin_date < a.discharge_date AND
b.discharge_date >= DATEADD(DAY, -90, a.discharge_date)
);





your query returns all the 9 records. First thing in writing a good query is that it must be correct. efficiency comes next.
– Afshin Amiri
Jul 2 at 1:49





I did down-vote your code not you and it is because your answer is wrong. Why do you think it is rude? Please do not consider it personal. I think down-vote shows others to not concentrate on an answer.
– Afshin Amiri
Jul 2 at 2:09







By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

FuIeUXvWNtXdSDF8WQAF5 yCIwU,I0sq29HtHVWIdnMNW 1,HCNtHQbdc,W,1O,1,itP4AjXrblg8GfM
gqaK3 MkrS0O0YF3rZ 8Rjdup,epUJIpRCSz,NhJON,USVGNRaO7DQXO8HeA 46YGbjGr G,en

Popular posts from this blog

Boo (programming language)

Rothschild family