처음 날짜 대신 다른 날짜를 기준으로 코호트 정의
이전에는 각 id별 첫 번째 임기 시작 날짜를 기준으로 코호트를 정의했다. 이번에는 처음 날짜 대신 다른 날짜를 기준으로 코흐트를 정의한다.
- term_start가 2000-12-31보다 작고
- term_end가 2000-01-01보다 큰
id, type별 min_start
- term_start가 12000-12-31보다 작고, term_end가 2000-01-01보다 큰 날짜만을 조회한다.
- id, type, date별로 그루핑하여 제일 작은 term_start 날짜를 조회한다.
1 2 3 4 5 6 7 8
SELECT DISTINCT(id_bioguide), term_type, DATE('2000-01-01') AS first_term, MIN(term_start) AS min_start FROM legislators_terms WHERE term_start <= '2000-12-31' AND term_end >= '2000-01-01' GROUP BY 1, 2, 3
type, period별 코호트 숫자
- 위 쿼리를 서브쿼리로 사용한다.
- C 테이블의 year 컬럼이 2000년도 이상인 것을 JOIN 조건에 추가한다.
- type, period별 코호트 숫자를 계산한다.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
SELECT A.term_type, COALESCE(DATE_PART('year', AGE(C.date, A.first_term)), 0) AS period, COUNT(DISTINCT(A.id_bioguide)) AS cohort_retained FROM ( SELECT DISTINCT(id_bioguide), term_type, DATE('2000-01-01') AS first_term, MIN(term_start) AS min_start FROM legislators_terms WHERE term_start <= '2000-12-31' AND term_end >= '2000-01-01' GROUP BY 1, 2, 3 ) AS A JOIN legislators_terms AS B ON A.id_bioguide = B.id_bioguide LEFT JOIN date_dim AS C ON C.date BETWEEN B.term_start AND B.term_end AND C.month_name = 'December' AND C.day_of_month = 31 AND C.year >= 2000 GROUP BY 1, 2
type, period별 코호트 숫자와 비율
- type별로 period를 오름차순했을 때 첫 번째 코호트 숫자를 기준으로 비율을 계산한다.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
SELECT term_type, period, cohort_retained, FIRST_VALUE(cohort_retained) OVER(PARTITION BY term_type ORDER BY period) AS cohort_size, ROUND(cohort_retained * 1.0 / FIRST_VALUE(cohort_retained) OVER(PARTITION BY term_type ORDER BY period), 2) AS pct_retained FROM ( SELECT A.term_type, COALESCE(DATE_PART('year', AGE(C.date, A.first_term)), 0) AS period, COUNT(DISTINCT(A.id_bioguide)) AS cohort_retained FROM ( SELECT DISTINCT(id_bioguide), term_type, DATE('2000-01-01') AS first_term, MIN(term_start) AS min_start FROM legislators_terms WHERE term_start <= '2000-12-31' AND term_end >= '2000-01-01' GROUP BY 1, 2, 3 ) AS A JOIN legislators_terms AS B ON A.id_bioguide = B.id_bioguide LEFT JOIN date_dim AS C ON C.date BETWEEN B.term_start AND B.term_end AND C.month_name = 'December' AND C.day_of_month = 31 AND C.year >= 2000 GROUP BY 1, 2 ) AS result ORDER BY 2, 1