Analytical Tools & Software Used
Methodology: Data Preparation & Preprocessing
1) Dataset grain: entity-year (t)
FAC files are naturally organized around submission/report records, so the first step was building a clean panel at the right grain: one row per entity per audit year.
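As a sketch (with illustrative column names, not the actual FAC schema), collapsing raw submission records to this grain in pandas might look like:

```python
import pandas as pd

# Hypothetical raw submission records: the same entity-year can appear more than once.
raw = pd.DataFrame({
    "ein": ["11-111", "11-111", "22-222", "22-222"],
    "audit_year": [2019, 2019, 2019, 2020],
    "total_expended": [1_000_000, 1_000_000, 250_000, 300_000],
})

# One row per entity per audit year.
panel = (
    raw.drop_duplicates(subset=["ein", "audit_year"])
       .sort_values(["ein", "audit_year"])
       .reset_index(drop=True)
)
```

Here the duplicate (11-111, 2019) record collapses away, leaving three clean panel rows.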
2) Entity identifier strategy (EIN vs UEI)
Older years (2019-2021) contain many placeholder UEIs (e.g., GSA_MIGRATION), making UEI unreliable for identifying unique entities in those years. To keep continuity across 2019-2022:
- We used EIN as the stable entity key for modeling across years.
- UEI was treated as supplemental metadata (and crosswalked where clean in 2022).
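A minimal sketch of this keying strategy, treating the GSA_MIGRATION placeholder as missing (column names here are hypothetical):

```python
import pandas as pd

# Hypothetical entity records; GSA_MIGRATION is the placeholder seen in older years.
entities = pd.DataFrame({
    "ein": ["11-111", "22-222", "33-333"],
    "uei": ["GSA_MIGRATION", "ABC123DEF456", "GSA_MIGRATION"],
})

# EIN stays the modeling key; placeholder UEIs become missing supplemental metadata.
entities["entity_key"] = entities["ein"]
entities["uei_clean"] = entities["uei"].where(entities["uei"] != "GSA_MIGRATION")
```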
3) Outcome creation: findings at t+1
We created:
- current-year outcomes: has_finding_t, finding_count_t
- next-year labels: y_has_finding_t1, built by shifting outcomes forward one year within each entity
Only years with observable next-year outcomes were kept for supervised training:
- Predictors from 2019-2021
- Labels from 2020-2022
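The shift-forward step above can be sketched with a pandas groupby (a hypothetical panel with illustrative column names; note this simple version assumes consecutive audit years per entity):

```python
import pandas as pd

# Hypothetical entity-year panel with current-year outcomes.
panel = pd.DataFrame({
    "ein": ["A", "A", "A", "B", "B"],
    "audit_year": [2019, 2020, 2021, 2020, 2021],
    "has_finding_t": [0, 1, 1, 0, 1],
})

panel = panel.sort_values(["ein", "audit_year"])
# Next-year label: pull each entity's following-year outcome onto the current row.
# shift(-1) assumes consecutive years; gaps would need explicit handling.
panel["y_has_finding_t1"] = panel.groupby("ein")["has_finding_t"].shift(-1)

# Keep only rows whose next-year outcome is observable (drops each entity's last year).
train = panel.dropna(subset=["y_has_finding_t1"])
```

Each entity's final observed year falls out of the training set, which is exactly why predictors stop at 2021 while labels run through 2022.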
4) Feature engineering (award-based predictors)
From the federal_awards data, we aggregated program activity into entity-year features such as:
- total amount expended (total_expended)
- number of award lines (award_lines)
- breadth across agencies and programs (distinct_agencies, distinct_programs)
- structural indicators (direct awards, major programs, pass-through, loans)
- concentration signals (e.g., max_program_total)
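A minimal aggregation sketch in pandas (column names such as agency_prefix and is_direct are hypothetical stand-ins for the actual award-line fields):

```python
import pandas as pd

# Hypothetical award-line records: one row per award line.
awards = pd.DataFrame({
    "ein": ["A", "A", "A", "B"],
    "audit_year": [2020, 2020, 2020, 2020],
    "agency_prefix": ["10", "10", "93", "93"],
    "program_number": ["10.001", "10.002", "93.778", "93.778"],
    "amount_expended": [100_000, 50_000, 200_000, 75_000],
    "is_direct": [1, 0, 1, 1],
})

# Roll award lines up to the entity-year grain.
features = awards.groupby(["ein", "audit_year"]).agg(
    total_expended=("amount_expended", "sum"),
    award_lines=("amount_expended", "size"),
    distinct_agencies=("agency_prefix", "nunique"),
    distinct_programs=("program_number", "nunique"),
    direct_award_lines=("is_direct", "sum"),
).reset_index()

# Concentration signal: largest single program total per entity-year.
program_totals = awards.groupby(["ein", "audit_year", "program_number"])["amount_expended"].sum()
max_prog = program_totals.groupby(["ein", "audit_year"]).max().rename("max_program_total")
features = features.merge(max_prog.reset_index(), on=["ein", "audit_year"])
```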
Model Choice and Performance
Chosen model: HistGradientBoosting (HGB)
We selected a HistGradientBoostingClassifier, an effective supervised learning model from the scikit-learn Python library, because it excels at detecting complex patterns in large tabular datasets (for example, how award complexity and prior findings combine to elevate risk).
Performance metrics
We evaluated using two standard classification ranking metrics: ROC-AUC and PR-AUC.
ROC-AUC interpretation: if you randomly choose one entity-year that will have findings next year and one that won't, the model ranks the "will have findings" case higher about 77% of the time.
PR-AUC is especially useful when the positive outcome is relatively uncommon (audit findings do not occur for every entity). A PR-AUC of 0.5439 indicates the model does a solid job of concentrating true positives toward the top of the ranking, which is exactly what prioritization needs.
For comparison, here's a logistic regression model run on the same data as a baseline:
- ROC-AUC = 0.7575
- PR-AUC = 0.5044
The HGB model provided a meaningful lift, especially on PR-AUC.
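Both metrics come straight from scikit-learn; a sketch on synthetic labels and scores (average_precision_score is the usual PR-AUC implementation):

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
# Noisy scores correlated with the label, standing in for model output.
scores = y_true * 0.6 + rng.random(500) * 0.8

roc_auc = roc_auc_score(y_true, scores)
pr_auc = average_precision_score(y_true, scores)  # "PR-AUC" here = average precision
```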
Interpreting What the Model Learned
We also want the model to be explainable rather than a black box, so we use permutation feature importance.
Permutation importance measures how much performance drops when a feature is shuffled. The top signals included:
- prior findings indicators (whether findings occurred, and how many)
- award structure/complexity (e.g., direct award lines, major award lines)
- breadth across programs/agencies (distinct programs, agencies)
- program concentration (max_program_total)
How This Output Can Be Used
This model supports more targeted oversight while keeping final decisions with program staff, auditors, and policymakers.
Example applications:
- Funding review triggers: When risk is elevated, require additional documentation, consistency checks, or program review before moving forward.
- Repeat findings workflow step: For entities with recurring issues, make a risk-model check a standard part of reviewing new awards.
- Targeted outreach/support: Prioritize technical assistance or compliance support for high-risk entities.
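Operationally, any of these applications reduces to a triage rule over model scores; a minimal sketch (the quartile threshold is an illustrative choice, not a recommendation):

```python
import pandas as pd

# Hypothetical model scores for four entities.
scored = pd.DataFrame({
    "ein": ["A", "B", "C", "D"],
    "risk_score": [0.82, 0.15, 0.64, 0.40],
})

# Simple triage rule: flag the top quartile of scores for enhanced review.
threshold = scored["risk_score"].quantile(0.75)
scored["flag_for_review"] = scored["risk_score"] >= threshold
priority = scored.sort_values("risk_score", ascending=False)
```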
Recommendations for Improvement
This short case study is just a taste of how powerful analytical tools can transform data into valuable insights. Some ways this model could be made even more effective include:
- Expanding the time window: Incorporate more audit years to increase training data, improve stability, and enable stronger "train on past → test on future" validation across multiple years
- Refining the outcome: Move beyond "any findings" by predicting more specific outcomes where possible - such as repeat findings, higher-volume findings, or finding categories (e.g., going concern issues, material weaknesses) - to make the risk score more actionable for oversight workflows