Projects

PII Data Detection

Overview

Detecting personally identifiable information (PII) in student essays using named entity recognition. The challenge required identifying and classifying PII tokens across thousands of documents with high precision and recall.

Approach

Synthetic data generation to augment limited training data with realistic PII patterns
DeBERTa-based NER models fine-tuned for token-level PII classification
ONNX optimization for inference speed without sacrificing accuracy
Ensemble strategy combining multiple model checkpoints
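
The checkpoint ensemble can be illustrated with a minimal sketch that averages per-token class probabilities and then takes the most likely label. This is not the competition code: the label subset, the function names, and the example probabilities are all hypothetical, and plain averaging is assumed because the combination rule is not stated above.

```python
from typing import List

# Hypothetical label subset, for illustration only.
LABELS = ["O", "B-NAME_STUDENT", "I-NAME_STUDENT", "B-EMAIL"]

# probs[token][label] for one checkpoint
TokenProbs = List[List[float]]


def average_probs(checkpoint_probs: List[TokenProbs]) -> TokenProbs:
    """Average per-token class probabilities across checkpoints."""
    n_ckpt = len(checkpoint_probs)
    n_tokens = len(checkpoint_probs[0])
    n_labels = len(checkpoint_probs[0][0])
    return [
        [sum(ckpt[t][c] for ckpt in checkpoint_probs) / n_ckpt
         for c in range(n_labels)]
        for t in range(n_tokens)
    ]


def decode(avg_probs: TokenProbs) -> List[str]:
    """Assign each token the label with the highest averaged probability."""
    return [LABELS[max(range(len(p)), key=lambda c: p[c])] for p in avg_probs]


# Two made-up checkpoints scoring the same two tokens.
ckpt_a = [[0.95, 0.03, 0.01, 0.01], [0.55, 0.40, 0.03, 0.02]]
ckpt_b = [[0.97, 0.01, 0.01, 0.01], [0.20, 0.75, 0.03, 0.02]]

predicted = decode(average_probs([ckpt_a, ckpt_b]))
```

In this toy input the first checkpoint alone would label the second token as non-PII, while the averaged probabilities favor a PII label. That is the kind of disagreement an ensemble is meant to resolve.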

Result

121/2048 🥉

Public leaderboard: 0.970
Private leaderboard: 0.956

LLM Science Exam

Overview

Answering difficult science questions using large language models. The competition required selecting correct answers from multiple-choice science questions spanning physics, chemistry, biology, and other domains.

Result

194/2664 🥉

Amex Default Prediction

Overview

Predicting credit card default probability for American Express customers using anonymized transaction and account features. A large-scale tabular competition with heavy feature engineering requirements.

Approach

Extensive feature engineering over time-series transaction histories
Gradient boosting models (LightGBM, XGBoost, CatBoost)
Aggregation features: rolling statistics, lag features, trend indicators
Careful handling of missing values and categorical encodings
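
As a rough illustration of the aggregation features listed above, the sketch below collapses each customer's statement history into one row of summary statistics, a lag difference, a simple trend, and a missing-value count. The column names (`customer_id`, `statement_idx`, `balance`) and the feature definitions are assumptions made for the example, not the actual competition schema or feature set.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_features(rows, feature):
    """Aggregate one numeric column per customer; None marks a missing value."""
    by_customer = defaultdict(list)
    # Sort so each customer's values are in statement order.
    for row in sorted(rows, key=lambda r: (r["customer_id"], r["statement_idx"])):
        by_customer[row["customer_id"]].append(row[feature])

    out = {}
    for cid, values in by_customer.items():
        observed = [v for v in values if v is not None]
        feats = {f"{feature}_n_missing": len(values) - len(observed)}
        if observed:
            feats[f"{feature}_mean"] = mean(observed)
            feats[f"{feature}_std"] = pstdev(observed)
            feats[f"{feature}_last"] = observed[-1]
            # Lag feature: change between the last two observed values.
            feats[f"{feature}_lag1_diff"] = (
                observed[-1] - observed[-2] if len(observed) > 1 else 0.0
            )
            # Trend indicator: last observed value minus first observed value.
            feats[f"{feature}_trend"] = observed[-1] - observed[0]
        out[cid] = feats
    return out


# Made-up statement history: customer "a" has one missing balance.
rows = [
    {"customer_id": "a", "statement_idx": 0, "balance": 100.0},
    {"customer_id": "a", "statement_idx": 1, "balance": None},
    {"customer_id": "a", "statement_idx": 2, "balance": 130.0},
    {"customer_id": "b", "statement_idx": 0, "balance": 50.0},
]
features = build_features(rows, "balance")
```

Keeping the missing count as its own feature, rather than imputing silently, lets a gradient boosting model learn from the missingness pattern itself.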

Result

287/4874 🥉

Jane Street Market Prediction

Overview

Predicting profitable trading opportunities from anonymized financial market data. The competition required building models that could identify actionable signals in noisy, high-dimensional market features.

Approach

Feature selection and denoising on anonymized market signals
Gradient boosting and neural network ensembles
Custom utility-based optimization aligned with the competition metric
Time-aware validation to avoid lookahead bias
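
One way to set up time-aware validation is a walk-forward split with a gap between training and validation days. The sketch below is an assumption about how such a split might look, not the scheme actually used: `walk_forward_splits`, the integer day index, and the `gap` parameter are illustrative.

```python
def walk_forward_splits(dates, n_folds=3, gap=1):
    """Yield (train_idx, valid_idx) pairs where training strictly precedes validation.

    dates: one integer day index per row.
    gap:   number of days immediately before the validation window that are
           dropped from training, to limit leakage across the boundary.
    """
    unique_days = sorted(set(dates))
    fold_size = len(unique_days) // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough distinct days for the requested folds")

    for k in range(1, n_folds + 1):
        valid_days = set(unique_days[k * fold_size:(k + 1) * fold_size])
        first_valid = min(valid_days)
        train_idx = [i for i, d in enumerate(dates) if d < first_valid - gap]
        valid_idx = [i for i, d in enumerate(dates) if d in valid_days]
        yield train_idx, valid_idx


# Eight made-up trading days with two rows each.
dates = [day for day in range(8) for _ in range(2)]
splits = list(walk_forward_splits(dates, n_folds=3, gap=1))
```

Each fold trains only on days earlier than its validation window, so no future information reaches the model during evaluation.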

Ventilator Pressure Prediction

Overview

Predicting airway pressure in mechanically ventilated patients. The challenge simulated a ventilator connected to a sedated patient’s lung, requiring models to predict pressure time series given control inputs.

Result

185/2605 🥉

Google Research Football

Overview

Building AI agents that play 11v11 simulated football. Agents receive game observations (player positions, ball state, game mode) and return actions, competing head-to-head on Kaggle’s evaluation servers and ranked by an Elo-style rating.

Approach

Rule-based tactical foundation — “marauding wingers” formation: wide players sprint down flanks and deliver crosses into the box
Zone-based decision architecture — field divided into zones (defensive third, wing corridors, crossing range, shooting range) with different behaviors per zone
Opponent-aware mechanics — proximity detection for context-sensitive decisions: sprint in open space, dribble under pressure, pass when crowded
Goalkeeper exploitation — specific logic to detect when the opposing keeper is out of position and trigger long-range shots
Sprint/dribble state machine — manages action mode based on field position and opponent proximity
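
The zone-based and opponent-aware logic above can be sketched as a small decision function. Everything concrete here is an assumption for illustration: the coordinate convention (x from -1 at the agent's own goal to 1 at the opponent's, y roughly within ±0.42), the zone boundaries, the distance thresholds, and the action names are hypothetical and are not taken from the actual agent or the environment's API.

```python
from dataclasses import dataclass, field
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Obs:
    ball: Point                      # position of the ball carrier
    keeper: Point                    # opposing goalkeeper
    opponents: List[Point] = field(default_factory=list)


def zone(x: float, y: float) -> str:
    """Map a pitch position to a named zone (boundaries are illustrative)."""
    if x < -0.33:
        return "defensive_third"
    if x > 0.6 and abs(y) < 0.2:
        return "shooting_range"
    if x > 0.5 and abs(y) >= 0.2:
        return "crossing_range"
    if abs(y) >= 0.2:
        return "wing_corridor"
    return "midfield"


def nearby_opponents(obs: Obs, radius: float = 0.1) -> int:
    """Count opponents within `radius` of the ball carrier."""
    x, y = obs.ball
    return sum(1 for ox, oy in obs.opponents if hypot(ox - x, oy - y) <= radius)


def decide(obs: Obs) -> str:
    x, y = obs.ball
    z = zone(x, y)
    pressure = nearby_opponents(obs)
    # Keeper counts as out of position when far from the goal centre at (1, 0).
    keeper_out = hypot(obs.keeper[0] - 1.0, obs.keeper[1]) > 0.2

    if keeper_out and x > 0.3:
        return "long_shot"
    if z == "shooting_range":
        return "shot"
    if z == "crossing_range":
        return "cross"
    if z == "defensive_third" and pressure > 0:
        return "clear"
    if pressure >= 2:
        return "short_pass"   # crowded: move the ball on
    if pressure == 1:
        return "dribble"      # under pressure: protect the ball
    return "sprint"           # open space: run


# A winger in open space on the flank keeps sprinting.
winger_action = decide(Obs(ball=(0.0, 0.3), keeper=(0.95, 0.0), opponents=[(0.5, 0.0)]))
```

Checking the goalkeeper condition before the zone rules gives the rare long-range opportunity priority over the default behavior for the current zone.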

Result

61/1138 🥉
