PII Data Detection
Overview
Detecting personally identifiable information (PII) in student essays using named entity recognition. The challenge required identifying and classifying PII tokens across thousands of documents with high precision and recall.
Approach
- Synthetic data generation to augment limited training data with realistic PII patterns
- DeBERTa-based NER models fine-tuned for token-level PII classification
- ONNX optimization for inference speed without sacrificing accuracy
- Ensemble strategy combining multiple model checkpoints
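As a rough illustration of the ensembling step, the sketch below averages per-token class probabilities from several checkpoints and then decodes labels. The label set, the simple mean, and the `o_threshold` rule (prefer a PII label whenever the averaged probability of "O" falls below the threshold) are assumptions chosen for illustration, not the competition code.

```python
from typing import List

# Hypothetical label set; "O" marks a non-PII token.
LABELS = ["O", "B-NAME_STUDENT", "I-NAME_STUDENT", "B-EMAIL"]


def ensemble_token_probs(per_model_probs: List[List[List[float]]]) -> List[List[float]]:
    """Average class probabilities across checkpoints.

    per_model_probs[m][t][c] is the probability model m assigns class c to token t.
    """
    n_models = len(per_model_probs)
    n_tokens = len(per_model_probs[0])
    n_classes = len(per_model_probs[0][0])
    return [
        [sum(per_model_probs[m][t][c] for m in range(n_models)) / n_models
         for c in range(n_classes)]
        for t in range(n_tokens)
    ]


def decode(avg_probs: List[List[float]], o_threshold: float = 0.5) -> List[str]:
    """Label each token; fall back to the best PII class when P("O") is low."""
    o_idx = LABELS.index("O")
    pii_classes = [c for c in range(len(LABELS)) if c != o_idx]
    labels = []
    for probs in avg_probs:
        if probs[o_idx] >= o_threshold:
            labels.append("O")
        else:
            best = max(pii_classes, key=lambda c: probs[c])
            labels.append(LABELS[best])
    return labels
```

Raising `o_threshold` in a sketch like this trades precision for recall, which matters when missing a PII token is costlier than a false alarm.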
Result
121/2048 🥉
- Public leaderboard: 0.970
- Private leaderboard: 0.956
Links
LLM Science Exam
Overview
Answering difficult science questions using large language models. The competition required selecting the correct answer to multiple-choice science questions spanning physics, chemistry, biology, and other domains.
Result
194/2664 🥉
Links
Amex Default Prediction
Overview
Predicting credit card default probability for American Express customers using anonymized transaction and account features. A large-scale tabular competition with heavy feature engineering requirements.
Approach
- Extensive feature engineering over time-series transaction histories
- Gradient boosting models (LightGBM, XGBoost, CatBoost)
- Aggregation features: rolling statistics, lag features, trend indicators
- Careful handling of missing values and categorical encodings
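The aggregation idea can be sketched as follows: one customer's time-ordered values for a single feature are collapsed into a fixed set of statistics. The function name, the window size, the choice of statistics, and the decision to drop missing values (`None`) before aggregating are all illustrative assumptions rather than the features actually used.

```python
from statistics import mean
from typing import Dict, List, Optional


def aggregate_history(values: List[Optional[float]], window: int = 3) -> Dict[str, float]:
    """Collapse one customer's time-ordered feature values into aggregates."""
    # Assumption: missing observations are simply dropped before aggregating.
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("history must contain at least one observed value")
    last = observed[-1]
    prev = observed[-2] if len(observed) > 1 else last
    return {
        "mean": mean(observed),
        "min": min(observed),
        "max": max(observed),
        "last": last,
        "rolling_mean": mean(observed[-window:]),  # statistic over the most recent rows
        "lag1_diff": last - prev,                  # change versus the previous row
        "trend": last - observed[0],               # crude trend: last minus first
    }
```

Applying a function like this to every numeric column and concatenating the results yields one row per customer, which is the shape gradient boosting libraries expect.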
Result
287/4874 🥉
Links
Jane Street Market Prediction
Overview
Predicting profitable trading opportunities from anonymized financial market data. The competition required building models that could identify actionable signals in noisy, high-dimensional market features.
Approach
- Feature selection and denoising on anonymized market signals
- Gradient boosting and neural network ensembles
- Custom utility-based optimization aligned with the competition metric
- Time-aware validation to avoid lookahead bias
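One way to realize time-aware validation is a forward-chaining split with an embargo gap, sketched below. It assumes each row carries an integer day index; the function name, the fold layout, and the `gap` parameter are hypothetical and stand in for whatever scheme was actually used.

```python
from typing import Iterator, List, Sequence, Tuple


def forward_splits(
    dates: Sequence[int], n_folds: int = 3, gap: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx) pairs that never train on the future.

    Every training row predates the first validation day by more than `gap` days.
    """
    unique_days = sorted(set(dates))
    fold_size = len(unique_days) // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough distinct days for the requested folds")
    for k in range(1, n_folds + 1):
        valid_days = unique_days[k * fold_size:(k + 1) * fold_size]
        valid_start = valid_days[0]
        valid_set = set(valid_days)
        # Rows inside the gap just before the validation block are discarded.
        train_idx = [i for i, d in enumerate(dates) if d < valid_start - gap]
        valid_idx = [i for i, d in enumerate(dates) if d in valid_set]
        yield train_idx, valid_idx
```

The gap guards against leakage from features that look back over recent days, such as rolling averages that would otherwise straddle the train/validation boundary.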
Links
Ventilator Pressure Prediction
Overview
Predicting airway pressure in mechanically ventilated patients. The challenge simulated a ventilator connected to a sedated patient’s lung, requiring models to predict pressure time series given control inputs.
Result
185/2605 🥉
Links
Google Research Football
Overview
Building AI agents that play 11v11 simulated football. Agents receive game observations (player positions, ball state, game mode) and return actions, competing head-to-head on Kaggle’s evaluation servers under an Elo-style rating system.
Approach
- Rule-based tactical foundation: a “marauding wingers” formation in which wide players sprint down the flanks and deliver crosses into the box
- Zone-based decision architecture: the field is divided into zones (defensive third, wing corridors, crossing range, shooting range), each with its own behavior
- Opponent-aware mechanics: proximity detection drives context-sensitive decisions (sprint in open space, dribble under pressure, pass when crowded)
- Goalkeeper exploitation: dedicated logic detects when the opposing keeper is out of position and triggers long-range shots
- Sprint/dribble state machine: manages the action mode based on field position and opponent proximity
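The zone and proximity rules above could be sketched roughly as follows. Everything here is a hypothetical simplification: the coordinate convention (x from -1 to 1 attacking toward +x, y from -0.42 to 0.42), the thresholds, and the `Zone` and `Action` enums are assumptions for illustration and do not mirror the environment's real action set or the agent's actual tuning.

```python
from enum import Enum
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical thresholds, chosen only to make the example concrete.
PRESSURE_RADIUS = 0.1    # an opponent closer than this counts as pressure
KEEPER_OFF_LINE = 0.2    # keeper farther than this from goal is out of position
OPPONENT_GOAL: Point = (1.0, 0.0)


class Zone(Enum):
    DEFENSIVE_THIRD = 1
    WING_CORRIDOR = 2
    CROSSING_RANGE = 3
    SHOOTING_RANGE = 4
    MIDFIELD = 5


class Action(Enum):
    SPRINT = 1
    DRIBBLE = 2
    PASS = 3
    CROSS = 4
    SHOT = 5


def classify_zone(ball: Point) -> Zone:
    """Map the ball position to a coarse tactical zone."""
    x, y = ball
    if x < -1 / 3:
        return Zone.DEFENSIVE_THIRD
    if x > 0.6:
        return Zone.SHOOTING_RANGE if abs(y) < 0.2 else Zone.CROSSING_RANGE
    return Zone.WING_CORRIDOR if abs(y) > 0.25 else Zone.MIDFIELD


def choose_action(ball: Point, opponents: List[Point], keeper: Point) -> Action:
    """Pick an action from the zone, nearby opponents, and keeper position."""
    zone = classify_zone(ball)
    nearby = sum(
        1 for ox, oy in opponents
        if hypot(ox - ball[0], oy - ball[1]) < PRESSURE_RADIUS
    )
    keeper_out = hypot(OPPONENT_GOAL[0] - keeper[0],
                       OPPONENT_GOAL[1] - keeper[1]) > KEEPER_OFF_LINE

    if zone is Zone.SHOOTING_RANGE:
        return Action.SHOT
    if zone is Zone.CROSSING_RANGE:
        return Action.CROSS
    if keeper_out and ball[0] > 0.3:
        return Action.SHOT            # long-range attempt at an exposed goal
    if zone is Zone.DEFENSIVE_THIRD and nearby >= 1:
        return Action.PASS            # play safe near our own goal
    if nearby >= 2:
        return Action.PASS            # crowded: release the ball
    if nearby == 1:
        return Action.DRIBBLE         # under pressure: protect the ball
    return Action.SPRINT              # open space: run
```

Keeping zone classification separate from action selection means thresholds can be tuned per zone without touching the proximity logic.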
Result
61/1138 🥉