<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Kaggle on Galliard7</title><link>https://galliard7.github.io/tags/kaggle/</link><description>Recent content in Kaggle on Galliard7</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Mon, 15 Jan 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://galliard7.github.io/tags/kaggle/index.xml" rel="self" type="application/rss+xml"/><item><title>PII Data Detection</title><link>https://galliard7.github.io/projects/pii-data-detection/</link><pubDate>Mon, 15 Jan 2024 00:00:00 +0000</pubDate><guid>https://galliard7.github.io/projects/pii-data-detection/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Detecting personally identifiable information (PII) in student essays using named entity recognition. The challenge required identifying and classifying PII tokens across thousands of documents with high precision and recall.&lt;/p&gt;
&lt;h2 id="approach"&gt;Approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Synthetic data generation&lt;/strong&gt; to augment limited training data with realistic PII patterns&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DeBERTa-based NER models&lt;/strong&gt; fine-tuned for token-level PII classification&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ONNX optimization&lt;/strong&gt; for inference speed without sacrificing accuracy&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ensemble strategy&lt;/strong&gt; combining multiple model checkpoints&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="result"&gt;Result&lt;/h2&gt;
&lt;p&gt;121/2048 🥉&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Public leaderboard:&lt;/strong&gt; 0.970&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Private leaderboard:&lt;/strong&gt; 0.956&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="links"&gt;Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Galliard7/pii-data-detection"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>LLM Science Exam</title><link>https://galliard7.github.io/projects/llm-science-exam/</link><pubDate>Sun, 15 Oct 2023 00:00:00 +0000</pubDate><guid>https://galliard7.github.io/projects/llm-science-exam/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Answering difficult science questions using large language models. The competition required selecting the correct answer to multiple-choice science questions spanning physics, chemistry, biology, and other domains.&lt;/p&gt;
&lt;h2 id="result"&gt;Result&lt;/h2&gt;
&lt;p&gt;194/2664 🥉&lt;/p&gt;
&lt;h2 id="links"&gt;Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Galliard7/llm-science-exam"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Amex Default Prediction</title><link>https://galliard7.github.io/projects/amex-default-prediction/</link><pubDate>Thu, 15 Jun 2023 00:00:00 +0000</pubDate><guid>https://galliard7.github.io/projects/amex-default-prediction/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Predicting credit card default probability for American Express customers using anonymized transaction and account features. A large-scale tabular competition with heavy feature engineering requirements.&lt;/p&gt;
&lt;h2 id="approach"&gt;Approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Extensive feature engineering over time-series transaction histories&lt;/li&gt;
&lt;li&gt;Gradient boosting models (LightGBM, XGBoost, CatBoost)&lt;/li&gt;
&lt;li&gt;Aggregation features: rolling statistics, lag features, trend indicators&lt;/li&gt;
&lt;li&gt;Careful handling of missing values and categorical encodings&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="result"&gt;Result&lt;/h2&gt;
&lt;p&gt;287/4874 🥉&lt;/p&gt;
&lt;h2 id="links"&gt;Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Galliard7/amex-default-prediction"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Jane Street Market Prediction</title><link>https://galliard7.github.io/projects/jane-street-market-prediction/</link><pubDate>Wed, 15 Mar 2023 00:00:00 +0000</pubDate><guid>https://galliard7.github.io/projects/jane-street-market-prediction/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Predicting profitable trading opportunities from anonymized financial market data. The competition required building models that could identify actionable signals in noisy, high-dimensional market features.&lt;/p&gt;
&lt;h2 id="approach"&gt;Approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Feature selection and denoising on anonymized market signals&lt;/li&gt;
&lt;li&gt;Gradient boosting and neural network ensembles&lt;/li&gt;
&lt;li&gt;Custom utility-based optimization aligned with the competition metric&lt;/li&gt;
&lt;li&gt;Time-aware validation to avoid lookahead bias&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="links"&gt;Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Galliard7/jane-street-market-prediction"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Ventilator Pressure Prediction</title><link>https://galliard7.github.io/projects/ventilator-pressure-prediction/</link><pubDate>Mon, 15 Nov 2021 00:00:00 +0000</pubDate><guid>https://galliard7.github.io/projects/ventilator-pressure-prediction/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Predicting airway pressure in mechanically ventilated patients. The challenge simulated a ventilator connected to a sedated patient&amp;rsquo;s lung, requiring models to predict pressure time series given control inputs.&lt;/p&gt;
&lt;h2 id="result"&gt;Result&lt;/h2&gt;
&lt;p&gt;185/2605 🥉&lt;/p&gt;
&lt;h2 id="links"&gt;Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Galliard7/ventilator-pressure-prediction"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>Google Research Football</title><link>https://galliard7.github.io/projects/google-research-football/</link><pubDate>Tue, 15 Sep 2020 00:00:00 +0000</pubDate><guid>https://galliard7.github.io/projects/google-research-football/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Building AI agents that play 11v11 simulated football. Agents receive game observations (player positions, ball state, game mode) and return actions, competing head-to-head on Kaggle&amp;rsquo;s evaluation servers under an Elo-style rating system.&lt;/p&gt;
&lt;h2 id="approach"&gt;Approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Rule-based tactical foundation&lt;/strong&gt;: a &amp;ldquo;marauding wingers&amp;rdquo; formation in which wide players sprint down the flanks and deliver crosses into the box&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Zone-based decision architecture&lt;/strong&gt;: field divided into zones (defensive third, wing corridors, crossing range, shooting range), each with its own behavior&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Opponent-aware mechanics&lt;/strong&gt;: proximity detection drives context-sensitive decisions (sprint in open space, dribble under pressure, pass when crowded)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Goalkeeper exploitation&lt;/strong&gt;: specific logic to detect when the opposing keeper is out of position and trigger long-range shots&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sprint/dribble state machine&lt;/strong&gt;: manages the action mode based on field position and opponent proximity&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="result"&gt;Result&lt;/h2&gt;
&lt;p&gt;61/1138 🥉&lt;/p&gt;</description></item></channel></rss>