Home 관련 코호트 분석 - 생존자
Post
Cancel

관련 코호트 분석 - 생존자

id별 첫 임기 시작날짜와 마지막 임기 시작 날짜

1
2
3
4
5
6
SELECT
	id_bioguide,
	MIN(term_start) AS first_term,
	MAX(term_start) AS last_term
FROM legislators_terms
GROUP BY 1


tenure 계산

1
2
3
4
5
6
7
SELECT
	id_bioguide,
	MIN(term_start) AS first_term,
	MAX(term_start) AS last_term,
	DATE_PART('year', AGE(MAX(term_start), MIN(term_start))) AS tenure
FROM legislators_terms
GROUP BY 1


century별 코호트 숫자와 임기가 10년 이상인 경우의 비율

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT
	first_century,
	COUNT(DISTINCT id_bioguide) AS cohort_size,
	COUNT(DISTINCT CASE WHEN tenure >= 10 THEN id_bioguide END) AS survived_10,
	ROUND(COUNT(DISTINCT CASE WHEN tenure >= 10 THEN id_bioguide END)*1.0/COUNT(DISTINCT id_bioguide), 2) AS pct_survived_10
FROM
(
SELECT
	id_bioguide,
	DATE_PART('century', MIN(term_start)) AS first_century,
	MIN(term_start) AS first_term,
	MAX(term_start) AS last_term,
	DATE_PART('year', AGE(MAX(term_start), MIN(term_start))) AS tenure
FROM legislators_terms
GROUP BY 1
) AS sub
GROUP BY 1


century별 코호트 숫자와 임기가 5년 이상인 경우의 비율

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SELECT
	first_century,
	COUNT(DISTINCT id_bioguide) AS cohort_size,
	COUNT(DISTINCT CASE WHEN total_terms >= 5 THEN id_bioguide END) AS survived_5,
	ROUND(COUNT(DISTINCT CASE WHEN total_terms >= 5 THEN id_bioguide END)*1.0/COUNT(DISTINCT id_bioguide), 2) AS pct_survived_5
FROM
(
SELECT
	id_bioguide,
	DATE_PART('century', MIN(term_start)) AS first_century,
	COUNT(term_start) AS total_terms
FROM legislators_terms
GROUP BY 1
) AS sub
GROUP BY 1


century, terms별 코호트 숫자와 비율

  • 1년부터 20년까지 terms을 생성하고 JOIN한다.
  • 각 세기별 고유한 id의 개수를 센다.
  • 각 terms에 해당하는 임기에 대해 고유한 id의 개수를 센다.
  • 비율을 계산한다.
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    
    SELECT
      A.first_century,
      B.terms,
      COUNT(DISTINCT A.id_bioguide) AS cohort,
      COUNT(DISTINCT CASE WHEN A.total_terms >= B.terms THEN A.id_bioguide END) AS cohort_survived,
      ROUND(COUNT(DISTINCT CASE WHEN A.total_terms >= B.terms THEN A.id_bioguide END)*1.0/COUNT(DISTINCT id_bioguide), 2) AS pct_cohort_survived
    FROM
    (
    SELECT
      id_bioguide,
      DATE_PART('century', MIN(term_start)) AS first_century,
      COUNT(term_start) AS total_terms
    FROM legislators_terms
    GROUP BY 1
    ) AS A
      JOIN (SELECT generate_series AS terms FROM generate_series(1, 20, 1)) AS B
          ON 1 = 1
    GROUP BY 1, 2
    

This post is licensed under CC BY 4.0 by the author.