<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>NLP on Galliard7</title><link>https://galliard7.github.io/tags/nlp/</link><description>Recent content in NLP on Galliard7</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Mon, 15 Jan 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://galliard7.github.io/tags/nlp/index.xml" rel="self" type="application/rss+xml"/><item><title>PII Data Detection</title><link>https://galliard7.github.io/projects/pii-data-detection/</link><pubDate>Mon, 15 Jan 2024 00:00:00 +0000</pubDate><guid>https://galliard7.github.io/projects/pii-data-detection/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Detecting personally identifiable information (PII) in student essays using named entity recognition. The challenge required identifying and classifying PII tokens across thousands of documents with high precision and recall.&lt;/p&gt;
&lt;h2 id="approach"&gt;Approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Synthetic data generation&lt;/strong&gt; to augment limited training data with realistic PII patterns&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DeBERTa-based NER models&lt;/strong&gt; fine-tuned for token-level PII classification&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ONNX optimization&lt;/strong&gt; for faster inference without sacrificing accuracy&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ensemble strategy&lt;/strong&gt; combining multiple model checkpoints&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="result"&gt;Result&lt;/h2&gt;
&lt;p&gt;121/2048 🥉&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Public leaderboard:&lt;/strong&gt; 0.970&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Private leaderboard:&lt;/strong&gt; 0.956&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="links"&gt;Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Galliard7/pii-data-detection"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>LLM Science Exam</title><link>https://galliard7.github.io/projects/llm-science-exam/</link><pubDate>Sun, 15 Oct 2023 00:00:00 +0000</pubDate><guid>https://galliard7.github.io/projects/llm-science-exam/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Answering difficult science questions using large language models. The competition required selecting the correct answer to multiple-choice science questions spanning physics, chemistry, biology, and other domains.&lt;/p&gt;
&lt;h2 id="result"&gt;Result&lt;/h2&gt;
&lt;p&gt;194/2664 🥉&lt;/p&gt;
&lt;h2 id="links"&gt;Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/Galliard7/llm-science-exam"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>