Achieving 10,000x training data reduction with high-fidelity labels

Experiments

We wanted to understand which models and tasks would benefit most from our curation process. As baselines for our experiments, we fine-tuned two LLMs of different sizes (Gemini Nano-1 with 1.8B parameters and Nano-2 with 3.25B parameters) on two tasks of different complexity (lower and higher, based on expert alignment) using crowdsourced labels. Each crowdsourced data set has ~100K annotations and a strong class imbalance, with around 95% benign labels on average.

We compared each of these four baseline conditions against the corresponding curated condition, in which each model (Nano-1 and Nano-2) is fine-tuned over multiple rounds using the curation process described above. At each iteration, we selected a curated set of examples and used them for model evaluation and fine-tuning. All models plateaued before reaching parity with the experts’ internal alignment, so we stopped at 6 iterations (~400 fine-tuning and ~250 evaluation samples) for the lower complexity task and 5 iterations (~250 fine-tuning and ~150 evaluation samples) for the higher complexity task. (Note that the lower complexity task had a larger variety of examples, which may account for the longer time needed to converge.) Both data sets had a final class balance of ~40% positive examples.

The table below provides an overview of the scale and quality of the data used in each condition. Through the curation process, experts reached an average pairwise Cohen’s Kappa of 0.81 on the lower complexity task and 0.78 on the higher complexity task. We consider these values the ceiling for model performance. To assess the quality of our crowdsourced data, we calculated the Kappa alignment between crowdsourced annotations and expert labels on our full curated set, which was 0.59 (lower complexity) and 0.41 (higher complexity).
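For readers unfamiliar with the metric, here is a minimal sketch of how an average pairwise Cohen's Kappa can be computed from raw labels. The post does not say how the authors computed it, so the function names (`cohen_kappa`, `mean_pairwise_kappa`) and the toy binary labels are illustrative assumptions, not the original pipeline.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences (hypothetical helper)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is formally
        # undefined here, and this sketch treats it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(ratings):
    """Average kappa over all rater pairs; `ratings` maps rater name -> labels."""
    pairs = list(combinations(ratings.values(), 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy data (made up for illustration): 1 = positive, 0 = benign.
expert_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
expert_2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
kappa = cohen_kappa(expert_1, expert_2)
```

Because kappa subtracts the agreement expected by chance, it is a more informative measure than raw agreement on imbalanced data such as the ~95% benign crowdsourced sets described earlier, where two raters who label almost everything benign would agree often without agreeing meaningfully.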
