Forecasted College Enrollment

Forecasting college enrollment in the WF-TDM region through the horizon year.

This notebook forecasts college enrollment for Utah’s major universities using a Weighted Participation Rate method. The approach estimates each university’s share of its surrounding college-age population, fits a linear trend to that share, applies a regional penetration ceiling as a control total, and multiplies by future population projections from the Kem C. Gardner Policy Institute. Targeted adjustments are made for capacity-constrained flagship institutions.

Author

Pukar Bhandari


1 Methodology Overview

We forecast future college enrollment for travel demand modeling using a hybrid demographic approach. The core methodology follows these steps:

  1. Catchment Weights (Addressable Markets): We determine what percentage of each university’s student body comes from specific counties to define its “Addressable Market.”
  2. Weighted Populations: We calculate the catchment population by multiplying county shares by historical county-level college-age populations (18-24).
  3. Base Participation Rates: We calculate the historical ratio of enrolled students to this weighted population and forecast it forward using Linear Regression.
  4. Regional Control Totals: We apply a ceiling to the overall regional penetration rate to prevent unrealistically high demographic capture.
  5. Flagship Override: Because large flagship universities (BYU, UofU) draw heavily from outside the local region and are capacity-constrained, their forecasts are detached from local population trends and modeled directly against time.
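The five steps above reduce to simple arithmetic. A minimal sketch with invented weights, populations, and enrollment (none of these numbers come from the notebook’s data):

```python
# Hypothetical illustration of the weighted participation-rate method.
# Steps 1-2: catchment weights x county college-age population -> weighted population
weights = {"County A": 0.7, "County B": 0.3}        # a school's addressable market
pop_2023 = {"County A": 50_000, "County B": 20_000}
weighted_pop = sum(weights[c] * pop_2023[c] for c in weights)  # 41,000

# Step 3: participation rate = enrollment / weighted population
enrollment_2023 = 12_300
rate = enrollment_2023 / weighted_pop               # 0.30

# Simplified projection: hold the rate fixed and apply future populations
pop_2040 = {"County A": 60_000, "County B": 25_000}
weighted_pop_2040 = sum(weights[c] * pop_2040[c] for c in weights)  # 49,500
forecast_2040 = rate * weighted_pop_2040
print(round(forecast_2040))  # 14850
```

In the notebook, the rate is not held fixed but trended forward with linear regression, and the control total caps the regional sum.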

1.1 Data Sources


2 Configuration & Setup

2.1 File Paths & Settings

Show the code
from pathlib import Path
import pandas as pd
import numpy as np
import geopandas as gpd
from pygris.data import get_census
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_percentage_error
from IPython.display import display, Markdown
import warnings

warnings.filterwarnings("ignore")

# Centralized path configuration for all data inputs and outputs
PATHS = {
    # Cache files (intermediate data)
    "census_cache": Path(
        "intermediate/Utah_Census_All_Counties_All_Ages_2000_2023.csv"
    ),
    # Input data files
    "enrollment_ipeds": Path("intermediate/Fall_Enrollment_1997_2023.csv"),
    "gpi_data": Path("data/GPI/SchColAgeFS-Dec2022.xlsx"),
    "gpi_projections": Path(
        "data/GPI/Gardner-Policy-Institute-State-and-County-Projections-2025-2065-Data.xlsx"
    ),
    "zip_code": Path("data/UGRC ZIP/UtahZipCodeAreas_-1977770497264301863.gpkg"),
    "enroll_by_zip": Path("intermediate/TotEnrolByZipByCollege.csv"),
    "ushe_2025": Path("data/USHE 2025 Enrollment/WFRC Data Request_Final.xlsx"),
    # NHGIS data
    "nhgis_dir": Path("data/NHGIS"),
    # Output paths
    "model_diagnostics": Path("intermediate/model_diagnostics.csv"),
    "forecast_output": Path("intermediate/enrollment_forecast_2024_2060.csv"),
    "campus_forecast_output": Path("intermediate/enrollment_forecast_by_campus_2000_2060.csv"),
}

# WF-TDM region counties
TARGET_COUNTIES = [
    "Box Elder County",
    "Weber County",
    "Davis County",
    "Salt Lake County",
    "Utah County",
]

# Age range for college-age population (18-24) — matches GPI methodology
COLLEGE_AGE_RANGES = ["18 and 19 years", "20 years", "21 years", "22 to 24 years"]

# Train/test split: hold out 2020-2023 for model validation
TEST_START_YEAR = 2020

# Global Toggles
CALIB_SHIFT = True  # Toggle whether to shift model curves to meet 2023 actuals

3 Load Data

3.1 Enrollment Data

Utah System of Higher Education & IPEDS

Sources: USHE and Integrated Postsecondary Education Data System (2000-2023)

Show the code
# Load IPEDS enrollment data (2000-2023)
df_enroll_ipeds = pd.read_csv(PATHS['enrollment_ipeds'])

# Historical enrollment data from 2019 RTP (1990-1999)
df_enroll_rtp19 = pd.DataFrame(
    {
        "Year": [1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999],
        "UofU": [24311, 25581, 25868, 25982, 25645, 26810, 26079, 25895, 25018, 25781],
        "WSU":  [13449, 14495, 14993, 14447, 14230, 13996, 13906, 14613, 13900, 14984],
        "UVU":  [7885, 8777, 9623, 10512, 13293, 14040, 14756, 15994, 18174, 20062],
        "SLCC": [13344, 15972, 17766, 18311, 19440, 20834, 22394, 24307, 19754, 21273],
        "BYU":  [30465, 31101, 31108, 30547, 30413, 30465, 30563, 31249, 32202, 32731],
    }
)

3.2 College-Age Population (Historical 18-24)

US Census Bureau

We define a crosswalk to extract historical 18-24 population cohorts from Census Summary Files and the American Community Survey.

Show the code
# --- 1. Define Variable Crosswalk (Metadata) ---
var_crosswalk = pd.DataFrame(
    [
        ("Male", "Total",             "P012002", "P12_002N", "B01001_002E"),
        ("Male", "Under 5 years",     "P012003", "P12_003N", "B01001_003E"),
        ("Male", "5 to 9 years",      "P012004", "P12_004N", "B01001_004E"),
        ("Male", "10 to 14 years",    "P012005", "P12_005N", "B01001_005E"),
        ("Male", "15 to 17 years",    "P012006", "P12_006N", "B01001_006E"),
        ("Male", "18 and 19 years",   "P012007", "P12_007N", "B01001_007E"),
        ("Male", "20 years",          "P012008", "P12_008N", "B01001_008E"),
        ("Male", "21 years",          "P012009", "P12_009N", "B01001_009E"),
        ("Male", "22 to 24 years",    "P012010", "P12_010N", "B01001_010E"),
        ("Male", "25 to 29 years",    "P012011", "P12_011N", "B01001_011E"),
        ("Male", "30 to 34 years",    "P012012", "P12_012N", "B01001_012E"),
        ("Male", "35 to 39 years",    "P012013", "P12_013N", "B01001_013E"),
        ("Male", "40 to 44 years",    "P012014", "P12_014N", "B01001_014E"),
        ("Male", "45 to 49 years",    "P012015", "P12_015N", "B01001_015E"),
        ("Male", "50 to 54 years",    "P012016", "P12_016N", "B01001_016E"),
        ("Male", "55 to 59 years",    "P012017", "P12_017N", "B01001_017E"),
        ("Male", "60 and 61 years",   "P012018", "P12_018N", "B01001_018E"),
        ("Male", "62 to 64 years",    "P012019", "P12_019N", "B01001_019E"),
        ("Male", "65 and 66 years",   "P012020", "P12_020N", "B01001_020E"),
        ("Male", "67 to 69 years",    "P012021", "P12_021N", "B01001_021E"),
        ("Male", "70 to 74 years",    "P012022", "P12_022N", "B01001_022E"),
        ("Male", "75 to 79 years",    "P012023", "P12_023N", "B01001_023E"),
        ("Male", "80 to 84 years",    "P012024", "P12_024N", "B01001_024E"),
        ("Male", "85 years and over", "P012025", "P12_025N", "B01001_025E"),
        ("Female", "Total",             "P012026", "P12_026N", "B01001_026E"),
        ("Female", "Under 5 years",     "P012027", "P12_027N", "B01001_027E"),
        ("Female", "5 to 9 years",      "P012028", "P12_028N", "B01001_028E"),
        ("Female", "10 to 14 years",    "P012029", "P12_029N", "B01001_029E"),
        ("Female", "15 to 17 years",    "P012030", "P12_030N", "B01001_030E"),
        ("Female", "18 and 19 years",   "P012031", "P12_031N", "B01001_031E"),
        ("Female", "20 years",          "P012032", "P12_032N", "B01001_032E"),
        ("Female", "21 years",          "P012033", "P12_033N", "B01001_033E"),
        ("Female", "22 to 24 years",    "P012034", "P12_034N", "B01001_034E"),
        ("Female", "25 to 29 years",    "P012035", "P12_035N", "B01001_035E"),
        ("Female", "30 to 34 years",    "P012036", "P12_036N", "B01001_036E"),
        ("Female", "35 to 39 years",    "P012037", "P12_037N", "B01001_037E"),
        ("Female", "40 to 44 years",    "P012038", "P12_038N", "B01001_038E"),
        ("Female", "45 to 49 years",    "P012039", "P12_039N", "B01001_039E"),
        ("Female", "50 to 54 years",    "P012040", "P12_040N", "B01001_040E"),
        ("Female", "55 to 59 years",    "P012041", "P12_041N", "B01001_041E"),
        ("Female", "60 and 61 years",   "P012042", "P12_042N", "B01001_042E"),
        ("Female", "62 to 64 years",    "P012043", "P12_043N", "B01001_043E"),
        ("Female", "65 and 66 years",   "P012044", "P12_044N", "B01001_044E"),
        ("Female", "67 to 69 years",    "P012045", "P12_045N", "B01001_045E"),
        ("Female", "70 to 74 years",    "P012046", "P12_046N", "B01001_046E"),
        ("Female", "75 to 79 years",    "P012047", "P12_047N", "B01001_047E"),
        ("Female", "80 to 84 years",    "P012048", "P12_048N", "B01001_048E"),
        ("Female", "85 years and over", "P012049", "P12_049N", "B01001_049E"),
    ],
    columns=["Gender", "Age", "SF1", "DHC", "ACS"],
)
var_crosswalk["Label"] = var_crosswalk["Gender"] + ": " + var_crosswalk["Age"]
Show the code
# --- 2. Fetch & Cache Census Data ---
if not PATHS["census_cache"].exists():
    print(f"Fetching Census data to {PATHS['census_cache']}...")
    decennial_years = [2000, 2010, 2020]
    acs_years = [y for y in range(2009, 2024) if y not in decennial_years]
    data_frames = []
    for year in sorted(decennial_years + acs_years):
        try:
            if year in decennial_years:
                dataset = "dec/dhc" if year == 2020 else "dec/sf1"
                code_col = "DHC" if year == 2020 else "SF1"
            else:
                dataset, code_col = "acs/acs5", "ACS"
            year_codes = var_crosswalk[code_col].tolist()
            rename_map = dict(zip(year_codes, var_crosswalk["Label"]))
            df = get_census(
                dataset=dataset,
                variables=["NAME"] + year_codes,
                year=year,
                params={"for": "county:*", "in": "state:49"},
                return_geoid=True,
            )
            df = df.rename(columns={c: "NAME" for c in df.columns if c.upper() == "NAME"})
            df = df.rename(columns=rename_map)
            valid_cols = ["NAME"] + var_crosswalk["Label"].tolist()
            df = df[df.columns.intersection(valid_cols)].copy()
            df["Year"], df["Dataset"] = year, dataset
            data_frames.append(df)
            print(f"Fetched {year}")
        except Exception as e:
            print(f"Skipping {year}: {e}")
    if data_frames:
        PATHS["census_cache"].parent.mkdir(parents=True, exist_ok=True)
        pd.concat(data_frames, ignore_index=True).to_csv(PATHS["census_cache"], index=False)
        print("Census data cached successfully")
else:
    print(f"Using cached Census data from {PATHS['census_cache']}")
Using cached Census data from intermediate\Utah_Census_All_Counties_All_Ages_2000_2023.csv
Show the code
# Toggle ON (eval: true) to use Census API via pygris
# Toggle OFF (eval: false) if using NHGIS data instead

df_universe = pd.read_csv(PATHS['census_cache'])
df_universe["Geography"] = df_universe["NAME"].str.split(",").str[0]
df_wfrc = df_universe[df_universe["Geography"].isin(TARGET_COUNTIES)].copy()
target_cols = var_crosswalk.loc[var_crosswalk["Age"].isin(COLLEGE_AGE_RANGES), "Label"].tolist()
df_wfrc["Population_18_24"] = (
    df_wfrc[target_cols].apply(pd.to_numeric, errors="coerce").fillna(0).sum(axis=1)
)
df_college_age_hist = (
    df_wfrc.pivot_table(
        index="Year", columns="Geography", values="Population_18_24", aggfunc="sum"
    )
    .reset_index()
    .rename_axis(None, axis=1)
)

Time-Series Data from NHGIS

Show the code
# Toggle ON (eval: true) to fetch from IPUMS NHGIS API
# Requires IPUMS_API_KEY environment variable in .env

import os, zipfile
from dotenv import load_dotenv
from ipumspy import IpumsApiClient, AggregateDataExtract, TimeSeriesTable

load_dotenv()
IPUMS_API_KEY = os.getenv("IPUMS_API_KEY")

if IPUMS_API_KEY:
    client = IpumsApiClient(IPUMS_API_KEY)
    ts_table = TimeSeriesTable(
        name="B57",
        geog_levels=["county"],
        years=["1970","1980","1990","2000","2010","2020",
               "2006-2010","2007-2011","2008-2012","2009-2013","2010-2014",
               "2011-2015","2012-2016","2013-2017","2014-2018","2015-2019",
               "2016-2020","2017-2021","2018-2022","2019-2023","2020-2024"],
    )
    extract = AggregateDataExtract(
        collection="nhgis",
        description="Utah County College Age Time Series",
        time_series_tables=[ts_table],
        geographic_extents=["490"],
        tst_layout="time_by_column_layout",
    )
    print("Submitting extract to IPUMS NHGIS...")
    client.submit_extract(extract)
    print(f"Extract submitted. ID: {extract.extract_id}")
    client.wait_for_extract(extract)
    download_dir = PATHS['nhgis_dir']
    download_dir.mkdir(parents=True, exist_ok=True)
    client.download_extract(extract, download_dir=download_dir)
    zip_path = download_dir / f"nhgis{extract.extract_id:04d}_csv.zip"
    with zipfile.ZipFile(zip_path, "r") as zf:
        zf.extractall(download_dir / f"nhgis{extract.extract_id:04d}_csv")
    print("NHGIS Fetch Complete!")
else:
    print("Error: IPUMS_API_KEY not found in .env file.")
Show the code
# Load and process NHGIS time-series data
# Works with both manually downloaded and API-fetched NHGIS extracts

# 1. Locate latest NHGIS file
nhgis_files = sorted(PATHS['nhgis_dir'].rglob("*_ts_nominal_county.csv"))
nhgis_path = (
    nhgis_files[-1] if nhgis_files
    else PATHS['nhgis_dir'] / "nhgis0002_csv/nhgis0002_ts_nominal_county.csv"
)

df_nhgis = pd.read_csv(nhgis_path)
df_ut = df_nhgis[df_nhgis["STATE"] == "Utah"].copy()

# 2. Select 18-24 age columns (B57AE–B57AH = 18–19, 20, 21, 22–24)
target_prefixes = ["B57AE", "B57AF", "B57AG", "B57AH"]
value_cols = [
    c for c in df_ut.columns
    if any(c.startswith(p) for p in target_prefixes) and not c.endswith("M")
]
for col in value_cols:
    df_ut[col] = pd.to_numeric(df_ut[col], errors="coerce")

# 3. Melt to long format and parse year
df_long = df_ut.melt(
    id_vars=["COUNTY"], value_vars=value_cols,
    var_name="Variable", value_name="Population"
)
df_long["Suffix"] = df_long["Variable"].str[5:]
df_long["Year"]   = df_long["Suffix"].apply(
    lambda x: int(x) if len(x) == 4 else 2000 + int(x[:2])
)
df_long["Type"] = df_long["Suffix"].apply(
    lambda x: "Decennial" if len(x) == 4 else "ACS"
)

# 4. Prefer Decennial over ACS for 2010 and 2020
mask_drop = (
    ((df_long["Year"] == 2010) & (df_long["Type"] == "ACS")) |
    ((df_long["Year"] == 2020) & (df_long["Type"] == "ACS"))
)
df_clean = df_long[~mask_drop].copy()

# 5. Aggregate by county and year
df_agg = df_clean.groupby(["COUNTY", "Year"])["Population"].sum().reset_index()
df_agg["Geography"] = df_agg["COUNTY"].apply(
    lambda x: x if "County" in str(x) else str(x) + " County"
)

# 6. Pivot to wide format (counties as columns)
df_college_age_hist = (
    df_agg[df_agg["Geography"].isin(TARGET_COUNTIES)]
    .pivot(index="Year", columns="Geography", values="Population")
    .sort_index()
)
df_college_age_hist.columns.name = None

Forecast: Kem C. Gardner Policy Institute (2025-2060)

The Kem C. Gardner Policy Institute (GPI) provides the official projected demographic counts for Utah counties.

Show the code
# Primary forecast data: county-level GPI projections (2025–2065)
df_gpi_projections = (
    pd.read_excel(
        PATHS['gpi_projections'],
        sheet_name="Demographic Detail",
        header=3,
        nrows=1230,
    )
    .query("Geography in @TARGET_COUNTIES")[["Geography", "Year", "Population Ages 18-24"]]
)

df_gpi_wide = (
    df_gpi_projections
    .pivot(index="Year", columns="Geography", values="Population Ages 18-24")
    .rename_axis(None, axis=1)
    .sort_index()
)

# Backcast 2023–2024 using the average annual delta over the first 3 GPI years
avg_annual_change = df_gpi_wide.loc[2025:2027].diff().mean()
gpi_2024 = df_gpi_wide.loc[2025] - avg_annual_change
gpi_2023 = df_gpi_wide.loc[2025] - 2 * avg_annual_change

df_gpi_ext = pd.concat([
    pd.DataFrame(gpi_2023).T.rename(index={0: 2023}),
    pd.DataFrame(gpi_2024).T.rename(index={0: 2024}),
    df_gpi_wide,
]).sort_index()

4 Data Exploration

4.1 College Age (18–24) Population Trend

Show the code
# Melt both sources to long for plotting
df_hist_long = (
    df_college_age_hist.reset_index()
    .melt(id_vars=["Year"], var_name="County", value_name="Population")
)
df_hist_long["Type"] = "Historical"

df_proj_long = df_gpi_projections.rename(
    columns={"Geography": "County", "Population Ages 18-24": "Population"}
).copy()
df_proj_long["Type"] = "Projected"

df_plot_combined = pd.concat([df_hist_long, df_proj_long], ignore_index=True)
ojs_define(data_pop_trend=df_plot_combined.to_dict(orient="records"))
Show the code
Plot.plot({
  marks: [
    Plot.ruleY([0]),
    Plot.line(data_pop_trend, {
      filter: d => d.Type === "Historical",
      x: "Year", y: "Population", z: "County",
      stroke: "County", strokeWidth: 3
    }),
    Plot.line(data_pop_trend, {
      filter: d => d.Type === "Projected",
      x: "Year", y: "Population", z: "County",
      stroke: "County", strokeWidth: 3, strokeDasharray: "5,5"
    }),
    Plot.text(data_pop_trend, Plot.selectLast({
      x: "Year", y: "Population", z: "County",
      text: "County", textAnchor: "start", dx: 8, fill: "var(--bs-body-color)"
    })),
    Plot.tip(data_pop_trend, Plot.pointerX({
      x: "Year", y: "Population",
      title: d => `${d.County}\nYear: ${d.Year}\nPop: ${d.Population.toLocaleString()}\n(${d.Type})`,
      fill: "var(--bs-body-bg)", stroke: "var(--bs-body-color)"
    }))
  ],
  width: 780, marginLeft: 50, marginRight: 110,
  color: { legend: true },
  y: { grid: true, label: "Population (18-24)" },
  x: { tickFormat: "d" },
  style: {
    backgroundColor: "transparent", color: "var(--bs-body-color)",
    fontFamily: "sans-serif", fontSize: "12px"
  }
})
Figure 1: Historical and Projected College-Age Population (18-24) by County

5 Forecast Enrollment

5.1 Enrollment by College and County (Dynamic Catchment Generation)

We calculate dynamic base percentages from localized USHE zip-code enrollment patterns for the public institutions (UofU, UVU, WSU, SLCC). For private institutions absent from the USHE dataset, we apply assumed market footprints.

5.2 Step 1 — Define Addressable Markets (Catchment Weights)

We programmatically extract the USHE-derived weights for the public institutions and combine them with the explicitly defined private-institution footprints.

Show the code
TARGET_COLLEGES = ["UOFU", "BYU", "UVU", "SLCC", "WSU", "ENSIGN", "WESTMIN"]
ANALYSIS_COUNTIES = ["Box Elder", "Weber", "Davis", "Salt Lake", "Utah"]

# Utah Counties Mapping
utah_counties = sorted(
    [
        "Beaver",
        "Box Elder",
        "Cache",
        "Carbon",
        "Daggett",
        "Davis",
        "Duchesne",
        "Emery",
        "Garfield",
        "Grand",
        "Iron",
        "Juab",
        "Kane",
        "Millard",
        "Morgan",
        "Piute",
        "Rich",
        "Salt Lake",
        "San Juan",
        "Sanpete",
        "Sevier",
        "Summit",
        "Tooele",
        "Uintah",
        "Utah",
        "Wasatch",
        "Washington",
        "Wayne",
        "Weber",
    ]
)
county_map = {i + 1: name for i, name in enumerate(utah_counties)}

# Load Zip Data
df_enroll = pd.read_csv(PATHS["enroll_by_zip"])
gdf_zip = gpd.read_file(PATHS["zip_code"])

# Process
df_enroll["College_Clean"] = df_enroll["COLLEGE"].str.split("_").str[0]
df_enroll = df_enroll[df_enroll["College_Clean"].isin(TARGET_COLLEGES)].copy()
df_enroll["ZIP5"] = df_enroll["ZIPCODE"].astype(str).str.zfill(5)
gdf_zip["ZIP5"] = gdf_zip["ZIP5"].astype(str).str.zfill(5)

df_merged = df_enroll.merge(gdf_zip[["ZIP5", "COUNTYNBR"]], on="ZIP5", how="left")
df_merged["County"] = pd.to_numeric(df_merged["COUNTYNBR"], errors="coerce").map(
    county_map
)
df_merged = df_merged[df_merged["County"].isin(ANALYSIS_COUNTIES)]

# Summary & Percentages
df_summary = (
    df_merged.groupby(["County", "College_Clean"])["TotEnrol"]
    .sum()
    .unstack(fill_value=0)
    .reindex(columns=TARGET_COLLEGES, fill_value=0)
)

df_percent = df_summary.div(df_summary.sum(axis=0), axis=1)

CATCHMENT_WEIGHTS = {
    # Private institutions missing from USHE zip dataset
    "BYU": {
        "Box Elder County": 0.000,
        "Davis County": 0.025,
        "Weber County": 0.025,
        "Salt Lake County": 0.20,
        "Utah County": 0.75,
    },
    "ENSIGN": {
        "Box Elder County": 0.000,
        "Weber County": 0.000,
        "Davis County": 0.15,
        "Salt Lake County": 0.70,
        "Utah County": 0.15,
    },
    "WESTMIN": {
        "Box Elder County": 0.000,
        "Weber County": 0.000,
        "Davis County": 0.065,
        "Salt Lake County": 0.90,
        "Utah County": 0.035,
    },
}

# Pull Dynamic Weights for Public Universities
zip_to_ipeds = {"UOFU": "UofU", "UVU": "UVU", "WSU": "WSU", "SLCC": "SLCC"}

for zip_key, ipeds_key in zip_to_ipeds.items():
    if zip_key in df_percent.columns:
        weights = df_percent[zip_key].to_dict()
        CATCHMENT_WEIGHTS[ipeds_key] = {f"{k} County": v for k, v in weights.items()}

DISPLAY_NAMES = {
    "UofU": "University of Utah",
    "UVU": "Utah Valley University",
    "WSU": "Weber State University",
    "SLCC": "Salt Lake Community College",
    "BYU": "Brigham Young University",
    "ENSIGN": "Ensign College",
    "WESTMIN": "Westminster College",
}
SCHOOL_KEYS = list(CATCHMENT_WEIGHTS.keys())

display(Markdown("**Calculated Catchment Weights (Addressable Market Shares):**"))
display(pd.DataFrame(CATCHMENT_WEIGHTS).fillna(0).style.format("{:.1%}"))

Calculated Catchment Weights (Addressable Market Shares):

| County | BYU | ENSIGN | WESTMIN | UofU | UVU | WSU | SLCC |
|---|---|---|---|---|---|---|---|
| Box Elder County | 0.0% | 0.0% | 0.0% | 0.2% | 0.3% | 3.3% | 0.3% |
| Davis County | 2.5% | 15.0% | 6.5% | 8.3% | 4.4% | 40.8% | 7.0% |
| Weber County | 2.5% | 0.0% | 0.0% | 2.3% | 0.6% | 45.8% | 1.4% |
| Salt Lake County | 20.0% | 70.0% | 90.0% | 81.0% | 20.8% | 7.9% | 83.4% |
| Utah County | 75.0% | 15.0% | 3.5% | 8.1% | 73.9% | 2.2% | 7.9% |

5.3 Step 2 — Weighted Populations

We multiply the 18-24 population of a county by the catchment weight to get the Weighted Population.

Show the code
# Interpolate Census data to fill 2001–2009 gap
df_census_full = df_college_age_hist.reindex(range(2000, 2024)).interpolate(
    method="linear"
)

# Compute Historical & Projected Weighted Populations
df_weighted_hist = pd.DataFrame(index=range(2000, 2024))
df_weighted_proj = pd.DataFrame(index=df_gpi_ext.index)

for key, weights in CATCHMENT_WEIGHTS.items():
    df_weighted_hist[key] = sum(
        df_census_full[county] * weight
        for county, weight in weights.items()
        if county in df_census_full.columns
    )
    df_weighted_proj[key] = sum(
        df_gpi_ext[county] * weight
        for county, weight in weights.items()
        if county in df_gpi_ext.columns
    )

display(Markdown("**Historical Weighted Populations (Recent Years):**"))
display(df_weighted_hist.tail().astype(int))

display(Markdown("**Projected Weighted Populations (First 5 Years):**"))
display(df_weighted_proj.head().astype(int))

Historical Weighted Populations (Recent Years):

| Year | BYU | ENSIGN | WESTMIN | UofU | UVU | WSU | SLCC |
|---|---|---|---|---|---|---|---|
| 2019 | 99032 | 95346 | 102494 | 98641 | 98918 | 34989 | 100410 |
| 2020 | 103470 | 109483 | 120166 | 114739 | 103500 | 39620 | 116897 |
| 2021 | 104285 | 98959 | 106016 | 102159 | 104146 | 36301 | 103979 |
| 2022 | 112114 | 105294 | 112639 | 108611 | 111936 | 38276 | 110550 |
| 2023 | 110841 | 103480 | 110519 | 106631 | 110658 | 37733 | 108526 |

Projected Weighted Populations (First 5 Years):

| Year | BYU | ENSIGN | WESTMIN | UofU | UVU | WSU | SLCC |
|---|---|---|---|---|---|---|---|
| 2023 | 121186 | 112463 | 119901 | 115743 | 120991 | 40846 | 117794 |
| 2024 | 124780 | 114287 | 121364 | 117331 | 124563 | 42012 | 119380 |
| 2025 | 128375 | 116111 | 122826 | 118920 | 128135 | 43179 | 120966 |
| 2026 | 131555 | 118195 | 124680 | 120837 | 131310 | 44541 | 122887 |
| 2027 | 135564 | 119760 | 125751 | 122096 | 135279 | 45512 | 124138 |

5.4 Step 3 — Historic Participation Rates

We define the Participation Rate as:

$$
\text{Rate}_{year} = \frac{\text{Actual Enrollment}_{year}}{\text{Weighted Population}_{year}}
$$
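A worked example of this definition with hypothetical numbers (the real inputs are the IPEDS enrollment and weighted populations computed above):

```python
# Hypothetical participation-rate computation (invented figures)
actual_enrollment_2023 = 34_000      # enrolled students
weighted_population_2023 = 106_000   # catchment-weighted 18-24 population
rate_2023 = actual_enrollment_2023 / weighted_population_2023
print(f"{rate_2023:.2%}")  # 32.08%
```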

Show the code
# Compute historic participation rates
df_ipeds = df_enroll_ipeds.set_index("Year").sort_index()
df_hist_rates = pd.DataFrame(index=range(2000, 2024))
for key in SCHOOL_KEYS:
    df_hist_rates[key] = (
        df_ipeds[key].reindex(df_hist_rates.index) / df_weighted_hist[key]
    )

display(Markdown("**Historical Participation Rates (Recent Years):**"))
display(df_hist_rates.tail().style.format("{:.2%}"))

Historical Participation Rates (Recent Years):

| Year | BYU | ENSIGN | WESTMIN | UofU | UVU | WSU | SLCC |
|---|---|---|---|---|---|---|---|
| 2019 | 34.65% | 2.02% | 2.16% | 33.30% | 42.18% | 84.72% | 29.40% |
| 2020 | 35.24% | 1.67% | 1.54% | 28.83% | 39.55% | 74.70% | 23.35% |
| 2021 | 33.38% | 2.69% | 1.45% | 33.74% | 39.62% | 82.02% | 26.18% |
| 2022 | 30.74% | 3.85% | 1.14% | 31.98% | 38.50% | 78.15% | 23.79% |
| 2023 | 31.64% | 5.77% | 1.10% | 33.07% | 40.35% | 80.93% | 24.66% |

5.5 Step 4 — Linear Regression Model

We fit a linear trend to each university’s historical participation rate:

$$
\text{Rate}(t) = \alpha + \beta \cdot t, \quad t = \text{Year} - 1999
$$

The model is trained on 2000–2019 and evaluated on 2020–2023.
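A compact sketch of the same fit on synthetic rates, using the t = Year − 1999 parameterization (the series below is invented, not the notebook’s data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic participation rates declining 0.2 percentage points per year
years = np.arange(2000, 2020)
t = (years - 1999).reshape(-1, 1)          # t = Year - 1999
rates = 0.35 - 0.002 * t.ravel()

model = LinearRegression().fit(t, rates)
alpha, beta = float(model.intercept_), float(model.coef_[0])

# Extrapolate the fitted trend to 2040 (t = 41)
rate_2040 = float(model.predict(np.array([[2040 - 1999]]))[0])
print(round(alpha, 3), round(beta, 4), round(rate_2040, 3))  # 0.35 -0.002 0.268
```

Because the synthetic data are exactly linear, the fit recovers the generating slope and intercept; on real rates the residuals motivate the train/test diagnostics below.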

Show the code
models_dict      = {}
model_metrics    = []
residual_records = []
t_2023 = 2023 - 1999

for key in SCHOOL_KEYS:
    full_name    = DISPLAY_NAMES[key]
    rates_series = df_hist_rates[key].dropna()
    years_arr    = rates_series.index.values.astype(float)
    rates_arr    = rates_series.values
    t_arr        = years_arr - 1999
    X            = t_arr.reshape(-1, 1)

    train_mask  = years_arr < TEST_START_YEAR
    test_mask   = ~train_mask
    years_train = years_arr[train_mask].astype(int)
    years_test  = years_arr[test_mask].astype(int)
    rates_train = rates_arr[train_mask]
    rates_test  = rates_arr[test_mask]
    X_train     = X[train_mask]
    X_test      = X[test_mask]

    model = LinearRegression().fit(X_train, rates_train)
    models_dict[key] = model
    alpha      = float(model.intercept_)
    beta       = float(model.coef_[0])
    pred_train = model.predict(X_train)
    pred_test  = model.predict(X_test) if test_mask.any() else None

    # Train Metrics
    pop_train         = df_weighted_hist.loc[years_train, key].values
    enroll_act_train  = df_ipeds.loc[years_train, key].values
    enroll_pred_train = pred_train * pop_train
    r2_train          = r2_score(rates_train, pred_train)
    mape_train        = mean_absolute_percentage_error(enroll_act_train, enroll_pred_train)

    model_metrics.append({
        "University": full_name, "Dataset": "Train", "Method": "Base Linear",
        "Alpha": round(alpha, 4), "Beta": round(beta, 4), "R2": round(r2_train, 4), "MAPE": round(mape_train, 4),
    })

    # Collect Residuals (Train)
    for yr, act, pred in zip(years_train, enroll_act_train, enroll_pred_train):
        residual_records.append({
            "Year": int(yr), "University": full_name, "Actual": float(act), "Predicted": float(pred),
            "Residual": float(act - pred), "APE": float(abs(act - pred)/act) if act != 0 else 0,
            "Method": "Base Linear", "Dataset": "Train"
        })

    # Test Metrics
    if test_mask.any():
        pop_test          = df_weighted_hist.loc[years_test, key].values
        enroll_act_test   = df_ipeds.loc[years_test, key].values
        enroll_pred_test  = pred_test * pop_test
        r2_test           = r2_score(rates_test, pred_test)
        mape_test         = mean_absolute_percentage_error(enroll_act_test, enroll_pred_test)

        model_metrics.append({
            "University": full_name, "Dataset": "Test", "Method": "Base Linear",
            "Alpha": round(alpha, 4), "Beta": round(beta, 4), "R2": round(r2_test, 4), "MAPE": round(mape_test, 4),
        })

        # Collect Residuals (Test)
        for yr, act, pred in zip(years_test, enroll_act_test, enroll_pred_test):
            residual_records.append({
                "Year": int(yr), "University": full_name, "Actual": float(act), "Predicted": float(pred),
                "Residual": float(act - pred), "APE": float(abs(act - pred)/act) if act != 0 else 0,
                "Method": "Base Linear", "Dataset": "Test"
            })

5.6 Step 5 — Calibration Shift Calculation

We compute a per-school additive shift (in rate units) so that, when CALIB_SHIFT = True, each model passes exactly through the 2023 actual enrollment.
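The shift is simply the gap between the 2023 rate implied by the GPI-based population and the model’s 2023 prediction. With hypothetical inputs:

```python
# Hypothetical calibration-shift computation (all inputs invented)
model_rate_2023 = 0.335          # rate the fitted trend predicts for 2023
actual_enroll_2023 = 34_000      # observed 2023 enrollment
gpi_pop_2023 = 115_000           # GPI-based weighted population for 2023

implied_rate_2023 = actual_enroll_2023 / gpi_pop_2023   # ~0.2957
shift = implied_rate_2023 - model_rate_2023             # ~-0.0393

# Applying the shift makes the model reproduce 2023 exactly:
calibrated = (model_rate_2023 + shift) * gpi_pop_2023
print(round(calibrated))  # 34000
```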

Show the code
shift_factors = {}
for key in SCHOOL_KEYS:
    model_rate_2023 = float(models_dict[key].predict(np.array([[t_2023]]))[0])
    actual_enroll_2023 = float(df_ipeds.loc[2023, key])
    gpi_pop_2023 = float(df_weighted_proj.loc[2023, key])
    implied_gpi_rate_2023 = actual_enroll_2023 / gpi_pop_2023

    shift = (implied_gpi_rate_2023 - model_rate_2023) if CALIB_SHIFT else 0.0
    shift_factors[key] = shift

display(Markdown("**Calibration Shifts (Base Linear Models):**"))
display(
    pd.DataFrame.from_dict(
        shift_factors, orient="index", columns=["Shift"]
    ).style.format("{:.4f}")
)

Calibration Shifts (Base Linear Models):

| School | Shift |
|---|---|
| BYU | -0.0553 |
| ENSIGN | 0.0257 |
| WESTMIN | -0.0193 |
| UofU | -0.0567 |
| UVU | -0.0759 |
| WSU | -0.2028 |
| SLCC | -0.1233 |

5.7 Step 6 & 7 — Final Forecasts & Regional Control Totals

Why a Control Total? Left unchecked, straight-line rate growth applied to growing populations could imply that an impossibly high share of the region attends college. We set a PENETRATION_CEILING based on the historical peak and proportionally scale down the projections whenever total regional enrollment would exceed it.
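A minimal sketch of the proportional scaling, with the ceiling rate, population, and raw forecasts all invented for illustration:

```python
# Hypothetical control-total check: scale all schools down proportionally
# when the summed raw forecast exceeds the regional penetration ceiling.
PENETRATION_CEILING = 0.55       # assumed ceiling rate, for illustration only
total_gpi_pop = 400_000
ceiling_enroll = PENETRATION_CEILING * total_gpi_pop   # 220,000

raw = {"School A": 150_000, "School B": 90_000}        # 240,000 total -> too high
total_raw = sum(raw.values())
scale = min(1.0, ceiling_enroll / total_raw)           # ~0.9167
capped = {k: round(v * scale) for k, v in raw.items()}
print(capped)  # {'School A': 137500, 'School B': 82500}
```

Each school keeps its relative share; only the regional total is pulled back to the ceiling.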

Show the code
total_enroll_hist   = df_ipeds[SCHOOL_KEYS].sum(axis=1).loc[2000:2023]
total_pop_hist_5cty = df_census_full[TARGET_COUNTIES].sum(axis=1).loc[2000:2023]
regional_rate_hist  = (total_enroll_hist / total_pop_hist_5cty).dropna()

HIST_MAX_PENETRATION = float(regional_rate_hist.max())
PENETRATION_CEILING  = HIST_MAX_PENETRATION * 1.02   # 2% buffer above historical peak

FORECAST_YEARS   = list(range(2024, 2061))
df_total_gpi_pop = df_gpi_ext[TARGET_COUNTIES].sum(axis=1)

forecast_records   = []
regional_rate_proj = []

for year in FORECAST_YEARS:
    t = year - 1999
    total_gpi_pop  = float(df_total_gpi_pop.loc[year])
    ceiling_enroll = PENETRATION_CEILING * total_gpi_pop

    raw = {}
    for key in SCHOOL_KEYS:
        model_rate = float(models_dict[key].predict(np.array([[t]]))[0])

        # Decay calibration shift to zero over 10 years for WESTMIN only
        if key == "WESTMIN":
            decay_factor = max(0.0, 1.0 - (year - 2023) / 10.0) if CALIB_SHIFT else 0.0
            active_shift = shift_factors[key] * decay_factor
        else:
            active_shift = shift_factors[key]

        gpi_pop  = float(df_weighted_proj.loc[year, key])
        raw[key] = max(0.0, (model_rate + active_shift) * gpi_pop)

    total_raw = sum(raw.values())
    proj_rate = total_raw / total_gpi_pop

    if total_raw > ceiling_enroll:
        scale_factor = ceiling_enroll / total_raw
        capped = True
    else:
        scale_factor = 1.0
        capped = False

    regional_rate_proj.append({
        "Year": year, "Rate": proj_rate * scale_factor,
        "Type": "Projected", "Capped": capped,
    })
    for key, raw_enroll in raw.items():
        forecast_records.append({
            "Year": year, "Univ_Code": key,
            "University": DISPLAY_NAMES[key],
            "Enrollment": int(round(raw_enroll * scale_factor)),
            "Type": "Projected",
        })

df_forecast_base = pd.DataFrame(forecast_records)

6 Flagship Override Adjustment

BYU and the University of Utah draw students nationally and internationally, making their enrollment largely independent of local Wasatch Front demographics. Tying their forecast to the regional college-age population via participation rates produces unrealistic trajectories as local cohort sizes fluctuate.

Instead, we model each flagship’s enrollment directly as a function of time:

$$
\text{Enrollment}(t) = \alpha + \beta \cdot t, \quad t = \text{Year} - 1999
$$

This treats their historical enrollment trend as the best predictor of future enrollment, bypassing catchment weights, weighted populations, and participation-rate mechanics entirely. The resulting forecasts are then substituted back into the base linear forecast for BYU and UofU only; all other universities retain their original participation-rate projections.

Show the code
FLAGSHIP_KEYS = ["BYU", "UofU"]
flagship_models = {}
flagship_metrics = []
flagship_shifts = {}

for key in FLAGSHIP_KEYS:
    full_name = DISPLAY_NAMES[key]
    enroll_series = df_ipeds[key].dropna()
    years_arr = enroll_series.index.values.astype(float)
    enroll_arr = enroll_series.values
    t_arr = years_arr - 1999
    X = t_arr.reshape(-1, 1)

    train_mask = years_arr < TEST_START_YEAR
    test_mask = ~train_mask
    years_train = years_arr[train_mask].astype(int)
    years_test = years_arr[test_mask].astype(int)
    enroll_train = enroll_arr[train_mask]
    enroll_test = enroll_arr[test_mask]
    X_train = X[train_mask]
    X_test = X[test_mask]

    model = LinearRegression().fit(X_train, enroll_train)
    flagship_models[key] = model

    alpha = float(model.intercept_)
    beta = float(model.coef_[0])
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test) if test_mask.any() else None

    # Flagship Train Metrics
    r2_train = r2_score(enroll_train, pred_train)
    mape_train = mean_absolute_percentage_error(enroll_train, pred_train)
    flagship_metrics.append(
        {
            "University": full_name,
            "Dataset": "Train",
            "Method": "Direct Flagship Override",
            "Alpha": round(alpha, 1),
            "Beta": round(beta, 1),
            "R2": round(r2_train, 4),
            "MAPE": round(mape_train, 4),
        }
    )

    # Collect Residuals (Train)
    for yr, act, pred in zip(years_train, enroll_train, pred_train):
        residual_records.append(
            {
                "Year": int(yr),
                "University": full_name,
                "Actual": float(act),
                "Predicted": float(pred),
                "Residual": float(act - pred),
                "APE": float(abs(act - pred) / act) if act != 0 else 0,
                "Method": "Direct Flagship Override",
                "Dataset": "Train",
            }
        )

    # Flagship Test Metrics
    if test_mask.any():
        r2_test = r2_score(enroll_test, pred_test)
        mape_test = mean_absolute_percentage_error(enroll_test, pred_test)
        flagship_metrics.append(
            {
                "University": full_name,
                "Dataset": "Test",
                "Method": "Direct Flagship Override",
                "Alpha": round(alpha, 1),
                "Beta": round(beta, 1),
                "R2": round(r2_test, 4),
                "MAPE": round(mape_test, 4),
            }
        )

        # Collect Residuals (Test)
        for yr, act, pred in zip(years_test, enroll_test, pred_test):
            residual_records.append(
                {
                    "Year": int(yr),
                    "University": full_name,
                    "Actual": float(act),
                    "Predicted": float(pred),
                    "Residual": float(act - pred),
                    "APE": float(abs(act - pred) / act) if act != 0 else 0,
                    "Method": "Direct Flagship Override",
                    "Dataset": "Test",
                }
            )

    # Calibration Shift for Override (in enrollment units)
    model_enroll_2023 = float(model.predict(np.array([[t_2023]]))[0])
    actual_enroll_2023 = float(df_ipeds.loc[2023, key])
    flagship_shifts[key] = (
        (actual_enroll_2023 - model_enroll_2023) if CALIB_SHIFT else 0.0
    )

display(Markdown("**Calibration Shifts (Flagship Direct Models - Students):**"))
display(
    pd.DataFrame.from_dict(
        flagship_shifts, orient="index", columns=["Enrollment Shift"]
    ).astype(int)
)

# Generate Override Forecast
flagship_override_records = []
for year in FORECAST_YEARS:
    t = year - 1999
    for key in FLAGSHIP_KEYS:
        raw_enroll = float(flagship_models[key].predict(np.array([[t]]))[0])
        final_enroll = max(0, int(round(raw_enroll + flagship_shifts[key])))
        flagship_override_records.append(
            {
                "Year": year,
                "Univ_Code": key,
                "University": DISPLAY_NAMES[key],
                "Enrollment": final_enroll,
                "Type": "Projected",
            }
        )

df_flagship_override = pd.DataFrame(flagship_override_records)

# Drop original BYU/UofU projected rows and substitute flagship overrides
df_forecast_adjusted = pd.concat(
    [
        df_forecast_base[~df_forecast_base["Univ_Code"].isin(FLAGSHIP_KEYS)],
        df_flagship_override,
    ],
    ignore_index=True,
).sort_values(["Univ_Code", "Year"])

Calibration Shifts (Flagship Direct Models - Students):

Enrollment Shift
BYU 1082
UofU 571

7 Policy Adjustments

7.1 BYU Medical School (2027-2030)

Background: Brigham Young University plans to open a medical school in 2027, expected to enroll 100 new students annually.

Enrollment Impact:

  • 2027: +100 students (first-year class)
  • 2028: +100 students (second-year class added; total +200)
  • 2029: +100 students (third-year class added; total +300)
  • 2030: +100 students (fourth-year class added; total +400)
  • 2031 onwards: steady state at +400 students (the first cohort graduates as each new cohort enters)

This adjustment is applied to BYU’s flagship override forecast to reflect the planned expansion.
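The cohort ramp-up arithmetic above can be sanity-checked with a small standalone helper (separate from the notebook’s forecast loop):

```python
def byu_med_school_adjustment(year, start=2027, annual=100, program_length=4):
    """Cumulative medical-school enrollment added in a given year."""
    if year < start:
        return 0
    cohorts = min(year - start + 1, program_length)  # cap at steady state
    return cohorts * annual

# Matches the schedule above: +100 in 2027, ramping to a steady +400 from 2030 on
print([byu_med_school_adjustment(y) for y in (2026, 2027, 2028, 2030, 2040)])
# → [0, 100, 200, 400, 400]
```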

Show the code
# BYU Medical School Enrollment Adjustment
MEDICAL_SCHOOL_START_YEAR = 2027
MEDICAL_SCHOOL_ANNUAL_ENROLLMENT = 100
MEDICAL_SCHOOL_PROGRAM_LENGTH = 4  # MD program is 4 years

# Apply cumulative enrollment adjustment
for year in FORECAST_YEARS:
    if year >= MEDICAL_SCHOOL_START_YEAR:
        # Calculate cumulative enrollment (capped at steady-state)
        years_since_start = year - MEDICAL_SCHOOL_START_YEAR
        cohorts_enrolled = min(years_since_start + 1, MEDICAL_SCHOOL_PROGRAM_LENGTH)
        medical_enrollment = cohorts_enrolled * MEDICAL_SCHOOL_ANNUAL_ENROLLMENT

        # Apply adjustment to BYU forecast
        mask = (df_forecast_adjusted['Univ_Code'] == 'BYU') & \
               (df_forecast_adjusted['Year'] == year)

        df_forecast_adjusted.loc[mask, 'Enrollment'] += medical_enrollment

# Log the adjustment
print("Applied BYU Medical School adjustment:")
print(f"  - Annual enrollment: {MEDICAL_SCHOOL_ANNUAL_ENROLLMENT} students/year")
print(f"  - Start year: {MEDICAL_SCHOOL_START_YEAR}")
print(f"  - Ramp-up period: {MEDICAL_SCHOOL_PROGRAM_LENGTH} years (2027-2030)")
print(f"  - Steady-state enrollment: {MEDICAL_SCHOOL_PROGRAM_LENGTH * MEDICAL_SCHOOL_ANNUAL_ENROLLMENT} students (2031+)")
Applied BYU Medical School adjustment:
  - Annual enrollment: 100 students/year
  - Start year: 2027
  - Ramp-up period: 4 years (2027-2030)
  - Steady-state enrollment: 400 students (2031+)

8 Campus-Level Disaggregation (2000-2060)

For Home Based College (HBC) travel modeling, we disaggregate institution-level forecasts to specific campus locations across the full time series (2000-2060).

Methodology:

  • Non-UVU institutions: apply the 2025 USHE campus splits uniformly to all years (2000-2060)
  • UVU campuses: apply time-varying proportions from historical spatial planning data
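A condensed sketch of the two split modes on toy numbers (campus names mirror the notebook; all values here are illustrative, not the actual data):

```python
import pandas as pd

totals = pd.DataFrame({"Year": [2025, 2026], "WSU": [24000, 24600], "UVU": [41000, 42000]})

# Static split (non-UVU): one fixed 2025 share applied to every year
wsu_splits = {"WSU_Main": 0.833, "WSU_Davis": 0.167}
for campus, share in wsu_splits.items():
    totals[campus] = (totals["WSU"] * share).round().astype(int)

# Time-varying split (UVU): year-specific shares merged in, then applied
uvu_props = pd.DataFrame({"Year": [2025, 2026],
                          "prop_Orem": [0.96, 0.955],
                          "prop_Lehi": [0.04, 0.045]})
totals = totals.merge(uvu_props, on="Year")
totals["UVU_Main"] = (totals["UVU"] * totals["prop_Orem"]).round().astype(int)
totals["UVU_Lehi"] = (totals["UVU"] * totals["prop_Lehi"]).round().astype(int)

print(totals[["Year", "WSU_Main", "WSU_Davis", "UVU_Main", "UVU_Lehi"]])
```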

8.1 Step 1: Load USHE 2025 Campus Enrollment Data

Show the code
# Load USHE 2025 enrollment data
df_ushe_2025 = pd.read_excel(
    PATHS["ushe_2025"],
    sheet_name="Sheet1"
)

print(f"Total USHE records loaded: {len(df_ushe_2025)}")
print(f"\nUnique terms: {sorted(df_ushe_2025['Term'].unique())}")
print(f"Unique years: {sorted(df_ushe_2025['Academic Year'].unique())}")

# Filter to Academic Year 2025, Term 2 (Fall semester)
df_ushe_fall_2025 = df_ushe_2025[
    (df_ushe_2025['Academic Year'] == 2025) &
    (df_ushe_2025['Term'] == 2)
].copy()

print(f"\nFiltered to Fall 2025: {len(df_ushe_fall_2025)} records")

# Aggregate headcount by Institution and Site (sum across all ZIP codes)
df_campus_enrollment = df_ushe_fall_2025.groupby(['Institution', 'Site'])['Headcount'].sum().reset_index()

print(f"Unique campuses: {len(df_campus_enrollment)}")

# Show totals by institution
print("\n=== Total Enrollment by Institution (Fall 2025) ===")
inst_totals = df_campus_enrollment.groupby('Institution')['Headcount'].sum().sort_values(ascending=False)
for inst, count in inst_totals.items():
    print(f"{inst}: {count:,}")
Total USHE records loaded: 6014

Unique terms: [2]
Unique years: [2025]

Filtered to Fall 2025: 6014 records
Unique campuses: 57

=== Total Enrollment by Institution (Fall 2025) ===
University Of Utah: 44,633
Utah Valley University: 41,257
Utah State University: 32,782
Weber State University: 24,442
Salt Lake Community College: 23,802

8.2 Step 2: Define Campus Mapping (Non-UVU Institutions)

Show the code
# Map USHE Site names to campus codes for non-UVU institutions
CAMPUS_MAPPING = {
    'Salt Lake Community College': {
        'Main Campus': 'SLCC_MAIN',
        'SLCC - Jordan Campus': 'SLCC_JD',
        'SLCC - Larry H. Miller Campus': 'SLCC_ML',
        'SLCC - South City Center': 'SLCC_SC',
    },
    'University Of Utah': {
        'Main Campus': 'UOFU_Main',
    },
    'Weber State University': {
        'Main Campus': 'WSU_Main',
        'WSU - Davis Center': 'WSU_Davis',
    }
}

# Single-campus institutions (no disaggregation needed)
SINGLE_CAMPUS_CODES = ['BYU', 'ENSIGN', 'WESTMIN']

# Map institution codes to USHE names
INSTITUTION_NAME_MAP = {
    'SLCC': 'Salt Lake Community College',
    'UofU': 'University Of Utah',
    'WSU': 'Weber State University',
}

8.3 Step 3: Calculate Campus Percentages (Non-UVU)

Show the code
# Calculate percentage splits for non-UVU institutions
campus_percentages = {}

print("\n=== Calculating Campus Percentages (Non-UVU) ===")

for institution, campus_map in CAMPUS_MAPPING.items():
    print(f"\n{institution}:")

    inst_data = df_campus_enrollment[df_campus_enrollment['Institution'] == institution]

    if len(inst_data) == 0:
        print(f"  WARNING: No data found")
        continue

    # Get enrollment for mapped campuses only
    mapped_campuses = list(campus_map.keys())
    mapped_data = inst_data[inst_data['Site'].isin(mapped_campuses)]

    total_mapped = mapped_data['Headcount'].sum()
    print(f"  Total mapped enrollment: {total_mapped:,}")

    if total_mapped > 0:
        for site, campus_code in campus_map.items():
            site_enrollment = mapped_data[mapped_data['Site'] == site]['Headcount'].sum()
            percentage = site_enrollment / total_mapped if total_mapped > 0 else 0
            campus_percentages[campus_code] = percentage
            print(f"    {campus_code}: {site_enrollment:,} students ({percentage:.1%})")

print("\n" + "="*60)
print("Final Non-UVU Campus Split Percentages:")
for campus, pct in sorted(campus_percentages.items()):
    print(f"  {campus}: {pct:.1%}")

=== Calculating Campus Percentages (Non-UVU) ===

Salt Lake Community College:
  Total mapped enrollment: 21,355
    SLCC_MAIN: 15,618 students (73.1%)
    SLCC_JD: 3,045 students (14.3%)
    SLCC_ML: 369 students (1.7%)
    SLCC_SC: 2,323 students (10.9%)

University Of Utah:
  Total mapped enrollment: 27,488
    UOFU_Main: 27,488 students (100.0%)

Weber State University:
  Total mapped enrollment: 12,446
    WSU_Main: 10,366 students (83.3%)
    WSU_Davis: 2,080 students (16.7%)

============================================================
Final Non-UVU Campus Split Percentages:
  SLCC_JD: 14.3%
  SLCC_MAIN: 73.1%
  SLCC_ML: 1.7%
  SLCC_SC: 10.9%
  UOFU_Main: 100.0%
  WSU_Davis: 16.7%
  WSU_Main: 83.3%

8.4 Step 4: Hardcode UVU Campus Historical Data

Show the code
# Hardcoded UVU campus enrollment from previous RTP spatial planning
# Source: Historical records and approved campus expansion plans
# Note: UVU-Heber excluded (outside WF-TDM region)

uvu_campus_hardcoded = pd.DataFrame({
    'Year': list(range(2000, 2061)),
    'UVU_Orem': [20946, 22609, 23609, 23803, 24149, 24487, 23305, 23840, 27052, 29045,
                 31739, 32734, 31810, 30880, 31589, 33565, 35126, 37785, 39931, 40278,
                 39438, 39715, 41504, 43105, 44720, 46420, 47173, 48078, 48855, 48850,
                 49762, 50673, 51583, 52494, 53405, 54317, 55227, 56138, 57049, 57960,
                 58871, 59781, 60693, 61604, 62515, 63426, 64338, 65248, 66159, 67069,
                 67981, 67981, 67981, 67981, 67981, 67981, 67981, 67981, 67981, 67981, 67981],
    'UVU_Lehi': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                 0, 0, 0, 0, 0, 0, 0, 0, 0, 1300,
                 1342, 1384, 1426, 1468, 1510, 1552, 1594, 1635, 1677, 1719,
                 1761, 1803, 1845, 1887, 1929, 1971, 2013, 2055, 2097, 2139,
                 2181, 2223, 2265, 2306, 2348, 2390, 2432, 2474, 2516, 2558,
                 2600, 2600, 2600, 2600, 2600, 2600, 2600, 2600, 2600, 2600, 2600],
    'UVU_Payson': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                   0, 0, 0, 0, 0, 0, 0, 0, 0, 650,
                   762, 874, 986, 1098, 1210, 1321, 1433, 1545, 1657, 1769,
                   1881, 1993, 2105, 2217, 2329, 2440, 2552, 2664, 2776, 2888,
                   3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000],
    'UVU_Vineyard1': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                      0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                      0, 0, 0, 0, 0, 0, 570, 725, 879, 1034,
                      1187, 1342, 1496, 1651, 1805, 1960, 2114, 2269, 2422, 2577,
                      2731, 2886, 3040, 3195, 3349, 3504, 3657, 3812, 3966, 4121,
                      4275, 4275, 4275, 4275, 4275, 4275, 4275, 4275, 4275, 4275, 4275],
    'UVU_Vineyard2': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                      0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                      0, 0, 0, 0, 0, 0, 30, 38, 46, 54,
                      63, 71, 79, 87, 95, 103, 111, 119, 128, 136,
                      144, 152, 160, 168, 176, 184, 193, 201, 209, 217,
                      225, 225, 225, 225, 225, 225, 225, 225, 225, 225, 225],
})

# Calculate total UVU enrollment from hardcoded data
uvu_campus_hardcoded['UVU_Total_Hardcoded'] = (
    uvu_campus_hardcoded['UVU_Orem'] +
    uvu_campus_hardcoded['UVU_Lehi'] +
    uvu_campus_hardcoded['UVU_Payson'] +
    uvu_campus_hardcoded['UVU_Vineyard1'] +
    uvu_campus_hardcoded['UVU_Vineyard2']
)

print("=== UVU Campus Hardcoded Data ===")
print(f"Years covered: {uvu_campus_hardcoded['Year'].min()} - {uvu_campus_hardcoded['Year'].max()}")
print("\nSample (2019-2026 - satellite campus openings):")
print(uvu_campus_hardcoded[uvu_campus_hardcoded['Year'].between(2019, 2026)].to_string(index=False))
=== UVU Campus Hardcoded Data ===
Years covered: 2000 - 2060

Sample (2019-2026 - satellite campus openings):
 Year  UVU_Orem  UVU_Lehi  UVU_Payson  UVU_Vineyard1  UVU_Vineyard2  UVU_Total_Hardcoded
 2019     40278      1300           0              0              0                41578
 2020     39438      1342           0              0              0                40780
 2021     39715      1384           0              0              0                41099
 2022     41504      1426           0              0              0                42930
 2023     43105      1468           0              0              0                44573
 2024     44720      1510           0              0              0                46230
 2025     46420      1552           0              0              0                47972
 2026     47173      1594           0            570             30                49367

8.5 Step 5: Calculate UVU Time-Varying Campus Proportions

Show the code
# Calculate proportion of total UVU enrollment at each campus for each year
# These proportions will be applied to our forecasted UVU total

uvu_campus_proportions = uvu_campus_hardcoded.copy()

for campus in ['UVU_Orem', 'UVU_Lehi', 'UVU_Payson', 'UVU_Vineyard1', 'UVU_Vineyard2']:
    prop_col = campus.replace('UVU_', 'prop_')
    uvu_campus_proportions[prop_col] = (
        uvu_campus_proportions[campus] / uvu_campus_proportions['UVU_Total_Hardcoded']
    )

# Keep proportions for merging
uvu_props_clean = uvu_campus_proportions[['Year', 'prop_Orem', 'prop_Lehi', 'prop_Payson', 'prop_Vineyard1', 'prop_Vineyard2']].copy()

print("\n=== UVU Campus Proportions (Sample Years) ===")
print(uvu_props_clean[uvu_props_clean['Year'].isin([2000, 2019, 2026, 2029, 2050, 2060])].to_string(index=False))

=== UVU Campus Proportions (Sample Years) ===
 Year  prop_Orem  prop_Lehi  prop_Payson  prop_Vineyard1  prop_Vineyard2
 2000   1.000000   0.000000     0.000000        0.000000        0.000000
 2019   0.968733   0.031267     0.000000        0.000000        0.000000
 2026   0.955557   0.032289     0.000000        0.011546        0.000608
 2029   0.933909   0.032864     0.012427        0.019768        0.001032
 2050   0.870647   0.033299     0.038422        0.054751        0.002882
 2060   0.870647   0.033299     0.038422        0.054751        0.002882

8.6 Step 6: Create Combined Historical + Forecast Dataset

Show the code
# Combine historical enrollment (2000-2023) with forecast (2024-2060)
# Historical: df_enroll_ipeds
# Forecast: df_forecast_adjusted

# Prepare historical data
df_hist_inst = df_enroll_ipeds.copy()
df_hist_inst = df_hist_inst[df_hist_inst['Year'].between(2000, 2023)].copy()

# Melt to long format with proper column names
df_hist_long = df_hist_inst.melt(id_vars=['Year'], var_name='Univ_Code', value_name='Enrollment')
df_hist_long = df_hist_long.dropna()
df_hist_long['Type'] = 'Historical'

# Prepare forecast data
df_forecast_long = df_forecast_adjusted[['Year', 'Univ_Code', 'Enrollment']].copy()
df_forecast_long['Type'] = 'Projected'

# Combine
df_combined_inst = pd.concat([df_hist_long, df_forecast_long], ignore_index=True)
df_combined_inst = df_combined_inst.sort_values(['Univ_Code', 'Year']).reset_index(drop=True)

print(f"\n=== Combined Institution-Level Dataset ===")
print(f"Years: {df_combined_inst['Year'].min()} - {df_combined_inst['Year'].max()}")
print(f"Total records: {len(df_combined_inst)}")
print(f"Historical: {len(df_combined_inst[df_combined_inst['Type']=='Historical'])}")
print(f"Projected: {len(df_combined_inst[df_combined_inst['Type']=='Projected'])}")

=== Combined Institution-Level Dataset ===
Years: 2000 - 2060
Total records: 427
Historical: 168
Projected: 259

8.7 Step 7: Disaggregate to Campus Level (2000-2060)

Show the code
# Create year-indexed output dataframe
all_years = sorted(df_combined_inst['Year'].unique())
df_campus_output = pd.DataFrame({'YEAR': all_years})

print(f"\n=== Disaggregating to Campus Level ===")
print(f"Processing {len(all_years)} years from {min(all_years)} to {max(all_years)}")

# Process each institution
for univ_code in df_combined_inst['Univ_Code'].unique():
    print(f"\n{univ_code}:")

    # Get enrollment time series
    inst_data = df_combined_inst[df_combined_inst['Univ_Code'] == univ_code][['Year', 'Enrollment']].copy()
    inst_data = inst_data.set_index('Year').sort_index()

    # === SINGLE-CAMPUS INSTITUTIONS ===
    if univ_code in SINGLE_CAMPUS_CODES:
        df_campus_output[univ_code] = df_campus_output['YEAR'].map(inst_data['Enrollment'])
        print(f"  Single campus - assigned directly")

    # === UVU - SPECIAL TIME-VARYING PROPORTIONS ===
    elif univ_code == 'UVU':
        print(f"  Applying time-varying campus proportions")

        # Merge our UVU forecast with proportions
        uvu_merged = inst_data.reset_index().merge(
            uvu_props_clean,
            on='Year',
            how='left'
        )

        # Apply proportions to forecasted enrollment
        for campus_name, prop_name in [
            ('UVU_Main', 'prop_Orem'),
            ('UVU_Lehi', 'prop_Lehi'),
            ('UVU_Payson', 'prop_Payson'),
            ('UVU_Vineyard1', 'prop_Vineyard1'),
            ('UVU_Vineyard2', 'prop_Vineyard2')
        ]:
            uvu_merged[campus_name] = (uvu_merged['Enrollment'] * uvu_merged[prop_name]).round().astype(int)

        # Map to output dataframe
        uvu_merged_indexed = uvu_merged.set_index('Year')
        df_campus_output['UVU_Main'] = df_campus_output['YEAR'].map(uvu_merged_indexed['UVU_Main'])
        df_campus_output['UVU_Lehi'] = df_campus_output['YEAR'].map(uvu_merged_indexed['UVU_Lehi'])
        df_campus_output['UVU_Payson'] = df_campus_output['YEAR'].map(uvu_merged_indexed['UVU_Payson'])
        df_campus_output['UVU_Vineyard1'] = df_campus_output['YEAR'].map(uvu_merged_indexed['UVU_Vineyard1'])
        df_campus_output['UVU_Vineyard2'] = df_campus_output['YEAR'].map(uvu_merged_indexed['UVU_Vineyard2'])

        # Combined Vineyard column
        df_campus_output['UVU_Vineyard'] = (
            df_campus_output['UVU_Vineyard1'].fillna(0) + df_campus_output['UVU_Vineyard2'].fillna(0)
        )

        print(f"  Created: UVU_Main, UVU_Lehi, UVU_Payson, UVU_Vineyard1, UVU_Vineyard2, UVU_Vineyard")

    # === MULTI-CAMPUS (NON-UVU) - APPLY 2025 USHE SPLITS ===
    elif univ_code in INSTITUTION_NAME_MAP:
        institution = INSTITUTION_NAME_MAP[univ_code]
        campus_map = CAMPUS_MAPPING.get(institution, {})

        print(f"  Applying 2025 USHE campus splits")

        for site, campus_code in campus_map.items():
            percentage = campus_percentages.get(campus_code, 0)

            if percentage > 0:
                df_campus_output[campus_code] = df_campus_output['YEAR'].map(
                    lambda y: int(round(inst_data.loc[y, 'Enrollment'] * percentage))
                    if y in inst_data.index else 0
                )
                print(f"    {campus_code}: {percentage:.1%}")
            else:
                df_campus_output[campus_code] = 0
                print(f"    {campus_code}: WARNING - no split data")

print(f"\n{'='*60}")
print(f"Generated {len(df_campus_output.columns)-1} campus columns")

=== Disaggregating to Campus Level ===
Processing 61 years from 2000 to 2060

BYU:
  Single campus - assigned directly

ENSIGN:
  Single campus - assigned directly

SLCC:
  Applying 2025 USHE campus splits
    SLCC_MAIN: 73.1%
    SLCC_JD: 14.3%
    SLCC_ML: 1.7%
    SLCC_SC: 10.9%

UVU:
  Applying time-varying campus proportions
  Created: UVU_Main, UVU_Lehi, UVU_Payson, UVU_Vineyard1, UVU_Vineyard2, UVU_Vineyard

UofU:
  Applying 2025 USHE campus splits
    UOFU_Main: 100.0%

WESTMIN:
  Single campus - assigned directly

WSU:
  Applying 2025 USHE campus splits
    WSU_Main: 83.3%
    WSU_Davis: 16.7%

============================================================
Generated 16 campus columns

8.8 Step 8: Finalize Column Order and Export

Show the code
# Define desired column order
COLUMN_ORDER = [
    'YEAR', 'BYU', 'ENSIGN',
    'SLCC_MAIN', 'SLCC_JD', 'SLCC_ML', 'SLCC_SC',
    'UOFU_Main',
    'UVU_Main', 'UVU_Lehi', 'UVU_Payson', 'UVU_Vineyard1', 'UVU_Vineyard2', 'UVU_Vineyard',
    'WSU_Main', 'WSU_Davis',
    'WESTMIN'
]

# Ensure all columns exist
for col in COLUMN_ORDER:
    if col not in df_campus_output.columns:
        print(f"WARNING: Column {col} missing, adding as zeros")
        df_campus_output[col] = 0

# Reorder and clean
df_campus_final = df_campus_output[COLUMN_ORDER].copy()
df_campus_final = df_campus_final.sort_values('YEAR').reset_index(drop=True)

# Convert to integers
for col in COLUMN_ORDER:
    if col != 'YEAR':
        df_campus_final[col] = df_campus_final[col].fillna(0).astype(int)

# Display summary
print("\n=== Campus-Level Enrollment Forecast (2000-2060) ===")
print("\nFirst 5 years:")
print(df_campus_final.head())

print("\nSatellite campus opening years (2019-2030):")
print(df_campus_final[df_campus_final['YEAR'].between(2019, 2030)])

print("\nLast 5 years:")
print(df_campus_final.tail())

# Verification: Check UVU total
df_campus_final_check = df_campus_final.copy()
df_campus_final_check['UVU_Total_Check'] = (
    df_campus_final_check['UVU_Main'] + df_campus_final_check['UVU_Lehi'] +
    df_campus_final_check['UVU_Payson'] + df_campus_final_check['UVU_Vineyard1'] +
    df_campus_final_check['UVU_Vineyard2']
)

print("\n=== UVU Total Verification (Sample Years) ===")
verification_years = [2000, 2019, 2026, 2029, 2050, 2060]
print(df_campus_final_check[df_campus_final_check['YEAR'].isin(verification_years)][
    ['YEAR', 'UVU_Main', 'UVU_Lehi', 'UVU_Payson', 'UVU_Vineyard1', 'UVU_Vineyard2', 'UVU_Total_Check']
].to_string(index=False))

=== Campus-Level Enrollment Forecast (2000-2060) ===

First 5 years:
   YEAR    BYU  ENSIGN  SLCC_MAIN  SLCC_JD  SLCC_ML  SLCC_SC  UOFU_Main  \
0  2000  32554     824      15794     3079      373     2349      24948   
1  2001  32771    1025      17334     3380      410     2578      27668   
2  2002  32408    1124      17075     3329      403     2540      28369   
3  2003  33008    1258      17593     3430      416     2617      28436   
4  2004  34347    1248      18083     3526      427     2690      28933   

   UVU_Main  UVU_Lehi  UVU_Payson  UVU_Vineyard1  UVU_Vineyard2  UVU_Vineyard  \
0     20946         0           0              0              0             0   
1     22609         0           0              0              0             0   
2     23609         0           0              0              0             0   
3     23803         0           0              0              0             0   
4     24149         0           0              0              0             0   

   WSU_Main  WSU_Davis  WESTMIN  
0     13368       2682     2403  
1     14053       2820     2474  
2     15041       3018     2353  
3     15676       3145     2471  
4     15407       3091     2417  

Satellite campus opening years (2019-2030):
    YEAR    BYU  ENSIGN  SLCC_MAIN  SLCC_JD  SLCC_ML  SLCC_SC  UOFU_Main  \
19  2019  34318    1929      21587     4209      510     3211      32852   
20  2020  36461    1829      19961     3892      472     2969      33081   
21  2021  34811    2665      19911     3882      470     2962      34464   
22  2022  34464    4058      19235     3750      454     2861      34734   
23  2023  35074    5973      19574     3816      462     2911      35260   
24  2024  35108    6155      20299     3958      480     3019      35587   
25  2025  35142    6340      21035     4101      497     3129      35914   
26  2026  35175    6541      21844     4259      516     3249      36241   
27  2027  35309    6717      22545     4396      533     3353      36567   
28  2028  35443    6813      22986     4481      543     3419      36894   
29  2029  35577    6856      23255     4534      549     3459      37221   
30  2030  35710    6877      23454     4573      554     3489      37548   

    UVU_Main  UVU_Lehi  UVU_Payson  UVU_Vineyard1  UVU_Vineyard2  \
19     40423      1305           0              0              0   
20     39589      1347           0              0              0   
21     39873      1389           0              0              0   
22     41667      1432           0              0              0   
23     43180      1471           0              0              0   
24     45449      1535           0              0              0   
25     47778      1597           0              0              0   
26     49372      1668           0            597             31   
27     51751      1760           0            780             41   
28     53560      1839           0            964             50   
29     54078      1903         720           1145             60   
30     54878      1942         840           1309             69   

    UVU_Vineyard  WSU_Main  WSU_Davis  WESTMIN  
19             0     24690       4954     2215  
20             0     24650       4946     1849  
21             0     24798       4976     1535  
22             0     24915       4999     1280  
23             0     25433       5103     1214  
24             0     26892       5396     1492  
25             0     28391       5697     1776  
26           628     30064       6033     2073  
27           821     31513       6323     2364  
28          1014     32321       6486     2628  
29          1205     32929       6608     2870  
30          1378     33517       6725     3100  

Last 5 years:
    YEAR    BYU  ENSIGN  SLCC_MAIN  SLCC_JD  SLCC_ML  SLCC_SC  UOFU_Main  \
56  2056  36588    6699      25315     4936      598     3765      46046   
57  2057  36622    6783      25711     5013      607     3824      46373   
58  2058  36656    6847      26038     5076      615     3873      46700   
59  2059  36690    6890      26294     5126      621     3911      47027   
60  2060  36724    6917      26488     5164      626     3940      47353   

    UVU_Main  UVU_Lehi  UVU_Payson  UVU_Vineyard1  UVU_Vineyard2  \
56     66356      2538        2928           4173            220   
57     67428      2579        2976           4240            223   
58     68190      2608        3009           4288            226   
59     68662      2626        3030           4318            227   
60     68899      2635        3040           4333            228   

    UVU_Vineyard  WSU_Main  WSU_Davis  WESTMIN  
56          4393     40356       8098     3194  
57          4463     41222       8272     3222  
58          4514     41990       8426     3242  
59          4545     42655       8559     3254  
60          4561     43217       8672     3258  

=== UVU Total Verification (Sample Years) ===
 YEAR  UVU_Main  UVU_Lehi  UVU_Payson  UVU_Vineyard1  UVU_Vineyard2  UVU_Total_Check
 2000     20946         0           0              0              0            20946
 2019     40423      1305           0              0              0            41728
 2026     49372      1668           0            597             31            51668
 2029     54078      1903         720           1145             60            57906
 2050     56257      2152        2483           3538            186            64616
 2060     68899      2635        3040           4333            228            79135

8.9 Export Campus-Level Forecast

Show the code
# Export disaggregated campus-level forecast (2000-2060)
df_campus_final.to_csv(PATHS['campus_forecast_output'], index=False)
print(f"\n{'='*60}")
print(f"Campus-level forecast (2000-2060) saved to:")
print(f"  {PATHS['campus_forecast_output']}")
print(f"\nTotal years: {len(df_campus_final)}")
print(f"Total campuses: {len(COLUMN_ORDER) - 1}")

============================================================
Campus-level forecast (2000-2060) saved to:
  intermediate\enrollment_forecast_by_campus_2000_2060.csv

Total years: 61
Total campuses: 16

9 OJS Data Preparation & Exports

Show the code
# ── Export Only the Final Adjusted Output CSV ─────────────────────────────────
df_forecast_adjusted.to_csv(PATHS["forecast_output"], index=False)

# ── Metrics & Residuals Export ────────────────────────────────────────────────
df_all_metrics = pd.concat([pd.DataFrame(model_metrics), pd.DataFrame(flagship_metrics)], ignore_index=True)
df_residuals = pd.DataFrame(residual_records)

# ── Prepare Forecast Data for OJS ─────────────────────────────────────────────
df_history_long = df_enroll_ipeds.melt(id_vars=["Year"], var_name="Univ_Code", value_name="Enrollment").dropna()
df_history_long["University"] = df_history_long["Univ_Code"].map(DISPLAY_NAMES)
df_history_long["Type"] = "Historical"
df_history_long["Scenario"] = "Both"

df_proj_base = df_forecast_base.copy()
df_proj_base["Scenario"] = "Off"

df_proj_adj = df_forecast_adjusted.copy()
df_proj_adj["Scenario"] = "On"

# Combine everything for the UI Toggle
df_final_trend = pd.concat(
    [
        df_history_long[["Year", "University", "Enrollment", "Type", "Scenario"]],
        df_proj_base[["Year", "University", "Enrollment", "Type", "Scenario"]],
        df_proj_adj[["Year", "University", "Enrollment", "Type", "Scenario"]],
    ],
    ignore_index=True,
).sort_values(["University", "Year"])

# Regional rate data (Historical & Base Model only)
df_regional_rate_long = regional_rate_hist.reset_index()
df_regional_rate_long.columns = ["Year", "Rate"]
df_regional_rate_long["Type"] = "Historical"

ojs_define(
    data_final_forecast=df_final_trend.to_dict(orient="records"),
    data_regional_rate_hist=df_regional_rate_long.to_dict(orient="records"),
    data_regional_rate_proj=pd.DataFrame(regional_rate_proj).to_dict(orient="records"),
    control_ceiling=float(PENETRATION_CEILING),
    data_model_metrics=df_all_metrics.to_dict(orient="records"),
    data_residuals=df_residuals.to_dict(orient="records")
)

10 Results & Diagnostics

10.1 Model Performance

The following table and charts report model accuracy (R² and Mean Absolute Percentage Error) for both the Base Linear Model (participation rate) and the Direct Flagship Override method. The Test dataset measures performance against the held-out 2020–2023 actuals.

How to interpret these metrics:

  • R² Score: Measures how much of the historical enrollment trend the model explains; values close to 1.0 are best. Compare the "Train" and "Test" scores: a large drop on the Test set suggests the model is overfitting to past data and struggling with recent years.
  • MAPE (Mean Absolute Percentage Error): The average percentage difference between the model's prediction and actual enrollment; lower is better. A MAPE of 2% means the model is typically off by about 2% of the total student body.
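As an illustration of how such Train/Test metrics can be computed, the sketch below fits a linear trend to a synthetic enrollment series and evaluates it against a held-out 2020–2023 window. All data and names (`years`, `enroll`) are hypothetical; only the metric functions match the notebook's imports.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_percentage_error

rng = np.random.default_rng(0)
years = np.arange(2000, 2024).reshape(-1, 1)  # 2000-2023 history
enroll = 20000 + 150 * (years.ravel() - 2000) + rng.normal(0, 300, len(years))

# Hold out 2020-2023 as the Test set, mirroring the split described above
train = years.ravel() < 2020
model = LinearRegression().fit(years[train], enroll[train])

metrics = {}
for name, mask in [("Train", train), ("Test", ~train)]:
    pred = model.predict(years[mask])
    metrics[name] = {
        "R2": r2_score(enroll[mask], pred),
        "MAPE": mean_absolute_percentage_error(enroll[mask], pred),
    }
```

A healthy model keeps the Test scores close to the Train scores; a large gap is the overfitting signal described above.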
Show the code
Inputs.table(data_model_metrics, {
  sort: "Dataset", reverse: true,
  columns: ["University", "Dataset", "Method", "R2", "MAPE"],
  format: {
    R2: d => d.toFixed(3),
    MAPE: d => (d * 100).toFixed(2) + "%"
  }
})
Show the code
Plot.plot({
  marks: [
    Plot.barY(data_model_metrics, {
      x: "University", y: "R2", fill: "Dataset",
      title: d => `${d.University} (${d.Dataset})\nR²: ${d.R2.toFixed(3)}`,
      sort: { x: "y", reverse: true }
    }),
    Plot.ruleY([0])
  ],
  facet: { data: data_model_metrics, x: "Dataset" },
  x: { tickRotate: -45, label: null },
  y: { grid: true, label: "R² Score" },
  color: { legend: true, range: ["#4e79a7", "#f28e2c"] },
  width: 800, height: 400, marginLeft: 50, marginBottom: 120,
  style: { backgroundColor: "transparent", color: "var(--bs-body-color)" }
})
Figure 2: R² Accuracy Score by University (Test Dataset vs Train Dataset)

Residual Analysis

These charts illustrate how well the linear trends predict historical data.

What to look for in the Residuals plot:

  • Random Scatter: Points should be randomly scattered above and below the zero line.
  • Avoid Patterns: If you see a distinct “U” shape or a consistent diagonal slope, it indicates the university’s growth might be non-linear, and a simple linear regression is consistently over- or under-predicting during certain decades.
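As a minimal sketch of the records behind this plot, a residual is simply actual minus predicted enrollment; the values below are made up for illustration.

```python
import numpy as np
import pandas as pd

actual = np.array([30500, 31200, 30900, 31800])     # hypothetical actual enrollment
predicted = np.array([30000, 31000, 31500, 31700])  # hypothetical model fit

df_resid = pd.DataFrame({
    "Year": [2020, 2021, 2022, 2023],
    "Residual": actual - predicted,  # positive = model under-predicted
})
```

A well-behaved model produces residuals that hover around zero with no sustained drift in either direction.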
Show the code
Plot.plot({
  marks: [
    Plot.ruleY([0], { stroke: "gray", strokeDasharray: "4,4" }),
    Plot.dot(data_residuals, {
      x: "Year", y: "Residual", fill: "University",
      title: d => `${d.University}\nYear: ${d.Year}\nResidual: ${d.Residual.toFixed(0)}\nMethod: ${d.Method}`,
      r: 4
    })
  ],
  facet: { data: data_residuals, x: "Method" },
  color: { legend: true },
  y: { grid: true, label: "Residual (Students)" },
  x: { tickFormat: "d" },
  width: 860,
  style: { backgroundColor: "transparent", color: "var(--bs-body-color)", fontFamily: "sans-serif", fontSize: "12px" }
})
Figure 3: Residuals (Actual - Predicted) over Time

What to look for in the Absolute Percentage Error (APE) plot:

  • Because universities vary wildly in size, raw student residuals can be misleading. Missing by 500 students is a massive error for Westminster but barely noticeable for BYU.
  • This chart normalizes the error to the size of the school. Look for sudden spikes in APE, which indicate anomalous years where the model completely failed to capture enrollment reality.
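The small-school-vs-flagship point can be made concrete: the same 500-student miss produces very different APE values depending on school size. The numbers below are illustrative only.

```python
import numpy as np

actual = np.array([2500.0, 33000.0])     # a small college vs. a large flagship
predicted = np.array([2000.0, 32500.0])  # both forecasts miss by 500 students

# APE = |Actual - Predicted| / Actual
ape = np.abs(actual - predicted) / actual
# ape[0] = 0.20 (a 20% miss); ape[1] ≈ 0.015 (about a 1.5% miss)
```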
Show the code
Plot.plot({
  marks: [
    Plot.line(data_residuals, {
      x: "Year", y: "APE", stroke: "University", strokeWidth: 1.5
    }),
    Plot.dot(data_residuals, {
      x: "Year", y: "APE", fill: "University",
      title: d => `${d.University}\nYear: ${d.Year}\nAPE: ${(d.APE * 100).toFixed(1)}%\nMethod: ${d.Method}`,
      r: 3
    })
  ],
  facet: { data: data_residuals, x: "Method" },
  color: { legend: true },
  y: { grid: true, label: "Absolute Percentage Error", tickFormat: "%" },
  x: { tickFormat: "d" },
  width: 860,
  style: { backgroundColor: "transparent", color: "var(--bs-body-color)", fontFamily: "sans-serif", fontSize: "12px" }
})
Figure 4: Absolute Percentage Error over Time

10.2 Regional Control Total

This chart plots the historical and projected regional penetration rate before any flagship adjustments, confirming that the population-based projections stay within realistic demographic bounds.
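The capping logic amounts to clipping the projected rate at the ceiling and flagging the years where the control total binds. The sketch below echoes the notebook's `PENETRATION_CEILING` name, but the values are placeholders, not the actual settings.

```python
import pandas as pd

PENETRATION_CEILING = 0.35  # illustrative value only, not the notebook's setting

# Hypothetical projected regional penetration rates by year
rate_proj = pd.Series([0.32, 0.34, 0.36, 0.38], index=[2030, 2040, 2050, 2060])

rate_capped = rate_proj.clip(upper=PENETRATION_CEILING)  # apply the control total
capped_flag = rate_proj > PENETRATION_CEILING            # years where the ceiling binds
```

The capped years are the ones drawn as red diamonds in the chart below the ceiling rule.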

Show the code
{
  const allRates = [...data_regional_rate_hist, ...data_regional_rate_proj]

  return Plot.plot({
    marks: [
      Plot.ruleY([0]),
      Plot.ruleY([control_ceiling], {
        stroke: "var(--bs-danger)", strokeWidth: 2, strokeDasharray: "6,3"
      }),
      Plot.line(allRates, {
        filter: d => d.Type === "Historical",
        x: "Year", y: "Rate",
        stroke: "var(--bs-primary)", strokeWidth: 3
      }),
      Plot.line(allRates, {
        filter: d => d.Type === "Projected",
        x: "Year", y: "Rate",
        stroke: "var(--bs-secondary)", strokeWidth: 2.5, strokeDasharray: "5,4"
      }),
      Plot.dot(allRates, {
        filter: d => d.Type === "Projected" && d.Capped,
        x: "Year", y: "Rate",
        fill: "var(--bs-danger)", r: 4, symbol: "diamond"
      }),
      Plot.text([{year: 2053, rate: control_ceiling}], {
        x: "year", y: "rate",
        text: () => `Ceiling: ${(control_ceiling * 100).toFixed(1)}%`,
        dy: -10, fill: "var(--bs-danger)", fontSize: 11
      }),
      Plot.tip(allRates, Plot.pointerX({
        x: "Year", y: "Rate",
        title: d => `Year: ${d.Year}\nRegional Rate: ${(d.Rate * 100).toFixed(1)}%\n${d.Type}${d.Capped ? "\n⚠ Control total applied" : ""}`,
        fill: "var(--bs-body-bg)", stroke: "var(--bs-body-color)"
      }))
    ],
    width: 780, marginLeft: 60, marginRight: 40,
    y: {
      grid: true, label: "Regional Penetration Rate",
      tickFormat: d => (d * 100).toFixed(0) + "%"
    },
    x: { tickFormat: "d" },
    style: {
      backgroundColor: "transparent", color: "var(--bs-body-color)",
      fontFamily: "sans-serif", fontSize: "12px"
    }
  })
}
Figure 5: Regional Penetration Rate: Historical and Projected vs Control Total Ceiling

10.3 Enrollment Forecast by University

Note: Solid lines represent historical IPEDS data (2000–2023). Dashed lines represent the forecast (2024–2060). Toggle the Flagship Adjustment “On” to see the direct-trend override applied to BYU and UofU, bypassing local demographic constraints.

Show the code
viewof adjustment_toggle = Inputs.radio(
  ["On", "Off"],
  { label: "Flagship Adjustment:", value: "On" }
)
Show the code
Plot.plot({
  marks: [
    Plot.ruleY([0]),
    Plot.ruleX([2024], { stroke: "gray", strokeDasharray: "2,2", strokeOpacity: 0.5 }),
    Plot.line(data_final_forecast, {
      filter: d => d.Type === "Historical",
      x: "Year", y: "Enrollment", z: "University",
      stroke: "University", strokeWidth: 2.5
    }),
    Plot.line(data_final_forecast, {
      filter: d => d.Type === "Projected" && d.Scenario === adjustment_toggle,
      x: "Year", y: "Enrollment", z: "University",
      stroke: "University",
      strokeWidth: 2.5,
      strokeDasharray: "5,5"
    }),
    Plot.text(data_final_forecast, Plot.selectLast({
      filter: d => d.Type === "Historical" || (d.Type === "Projected" && d.Scenario === adjustment_toggle),
      x: "Year", y: "Enrollment", z: "University",
      text: "University", textAnchor: "start", dx: 8, fill: "var(--bs-body-color)"
    }), { stroke: "var(--bs-body-bg)", strokeWidth: 3 }),
    Plot.tip(data_final_forecast, Plot.pointerX({
      filter: d => d.Type === "Historical" || (d.Type === "Projected" && d.Scenario === adjustment_toggle),
      x: "Year", y: "Enrollment",
      title: d => `${d.University}\nYear: ${d.Year}\nStudents: ${d.Enrollment.toLocaleString()}\n(${d.Type})`,
      fill: "var(--bs-body-bg)", stroke: "var(--bs-body-color)"
    }))
  ],
  width: 860, marginLeft: 60, marginRight: 185,
  color: { legend: true },
  y: { grid: true, label: "Fall Enrollment", tickFormat: "s" },
  x: { tickFormat: "d" },
  style: {
    backgroundColor: "transparent", color: "var(--bs-body-color)",
    fontFamily: "sans-serif", fontSize: "12px"
  }
})
Figure 6: College Enrollment Forecast by University (2000–2060)