Methodology
How we measured the 83% match rate.
We claim Clypt matches 83% of expert editorial decisions on VC podcasts. Here’s exactly how we tested that, what the number means, and where the model falls short.
The question we were trying to answer
When a professional podcast editor reviews a 45-minute VC episode and selects the 5-8 best moments to clip for social media, can Clypt independently identify the same moments?
This is a harder problem than it sounds. The editor isn’t just finding “interesting” moments. They’re making judgment calls about what will resonate with a specific audience (VCs, founders, LPs), on specific platforms (LinkedIn, X), in the context of the guest’s reputation and the show’s brand. We needed to know if Clypt could replicate those judgment calls.
The dataset: 870+ real editorial decisions
We used editorial data from 20VC (The Twenty Minute VC), the world’s largest VC podcast hosted by Harry Stebbings. Specifically:
- Source episodes: A set of recent 20VC episodes spanning multiple guest types (GPs, founders, operators, LPs)
- Ground truth: The actual clips that the 20VC editorial team selected and posted to social media (primarily Twitter/X)
- Total decisions: 870+ individual clip selections across these episodes
We chose 20VC because it has the most sophisticated editorial operation in VC podcasting. Their team reviews every episode and makes deliberate clip selections — this isn’t random or automated. If we could match their decisions, we could likely match any VC podcast editor’s decisions.
The blind test design
The test was structured to prevent any form of data leakage:
- Clypt analyzed each episode independently — given only the transcript and basic metadata (episode title, guest name, guest title), with no information about which clips the editorial team had selected.
- Clypt produced a ranked list of recommended clips — typically 5-8 per episode, each with a timestamp range, archetype classification, and editorial rationale.
- We compared against the ground truth — did Clypt identify the same moments the human editors chose?
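For concreteness, the setup above can be sketched as a small evaluation harness. This is an illustrative sketch in Python with hypothetical names (EpisodeInput, ClipRecommendation, run_blind_test), not Clypt's actual internals; the point is that the model call receives only the transcript and basic metadata, and the ground-truth selections never enter the picture.

```python
from dataclasses import dataclass

@dataclass
class EpisodeInput:
    """Everything the model is allowed to see: no editorial data is passed in."""
    transcript: str
    episode_title: str
    guest_name: str
    guest_title: str

@dataclass
class ClipRecommendation:
    """One recommended clip, as described in step 2 above."""
    start_s: float   # recommended start timestamp (seconds)
    end_s: float     # recommended end timestamp (seconds)
    archetype: str   # e.g. "Counterintuitive Take", "Bold Claim"
    rationale: str   # editorial reasoning for the pick
    rank: int        # position in the ranked list (1 = strongest)

def run_blind_test(episode: EpisodeInput, recommend) -> list[ClipRecommendation]:
    # `recommend` stands in for the model call; the ground-truth clip
    # selections are never passed in, so there is nothing to leak.
    clips = recommend(episode)
    return sorted(clips, key=lambda c: c.rank)[:8]  # typically 5-8 per episode
```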
How we defined a “match”
A match is counted when a clip Clypt recommends overlaps with a clip the editorial team selected by a meaningful margin. In practice, this means:
- Timestamp overlap of 60%+ — the core moment Clypt identified covers at least 60% of the moment the editor selected (or vice versa). This accounts for reasonable differences in where exactly to start and end a clip.
- Same semantic moment — even if timestamp boundaries differ slightly, both selections capture the same conversational moment (the same story, take, or insight).
We did not count partial overlaps where Clypt caught part of a broader segment but missed the key moment. The standard was: would a human editor look at both clips and agree they selected the same moment?
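As a rough illustration of the overlap rule, here is a minimal sketch of the timestamp check, assuming clips are represented as (start, end) pairs in seconds. The function name and threshold handling are hypothetical, and the semantic "same moment" judgment still required a human reviewer.

```python
def is_match(clypt: tuple[float, float],
             editor: tuple[float, float],
             threshold: float = 0.60) -> bool:
    """True if one clip covers at least `threshold` of the other in time."""
    c_start, c_end = clypt
    e_start, e_end = editor
    overlap = max(0.0, min(c_end, e_end) - max(c_start, e_start))
    if overlap == 0.0:
        return False
    # "covers at least 60% of the moment the editor selected (or vice versa)"
    return (overlap / (e_end - e_start) >= threshold
            or overlap / (c_end - c_start) >= threshold)

# Example: editor clipped 12:00-12:45, Clypt recommended 12:05-12:50
print(is_match((725.0, 770.0), (720.0, 765.0)))  # True: ~89% overlap both ways
```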
The results
- 83%: Clip selection match rate
- 870+: Editorial decisions tested
- 5-8: Clips identified per episode
83% of the clips the 20VC editorial team selected were also identified by Clypt. Of the remaining 17%, most were clips selected for reasons specific to the show’s audience strategy (e.g., promoting an upcoming series, highlighting a returning guest relationship) rather than the clip’s standalone quality.
What about false positives?
Clypt also identified clips that the editorial team did not select. This is expected and, in many cases, a feature rather than a bug. No editorial team clips every good moment — they’re constrained by posting cadence, platform strategy, and audience fatigue. Many of Clypt’s “extra” clips are genuinely strong moments that were left on the table.
That said, some false positives are genuine misses — moments that look clippable in isolation but wouldn’t land well on social. The model is strongest on the Counterintuitive Take and Bold Claim archetypes and weakest on Vulnerable Moment clips, where the line between powerful and too-personal requires human judgment.
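In evaluation terms, the 83% figure is a recall-style number: the share of editor-selected clips that Clypt also found. The "extra" clips discussed here would feed a precision-style number, which this test did not report. Below is a hedged sketch of how both tallies could be produced per episode, with the overlap check from the earlier sketch passed in as a parameter; names and structure are illustrative, not the actual scoring code.

```python
def score_episode(clypt_clips, editor_clips, is_match):
    """Tally matches the way the match rate above is framed.

    Returns (matched, extras): `matched` holds editor-selected clips that
    Clypt also found (drives the recall-style match rate); `extras` holds
    Clypt clips with no editorial counterpart, which may be genuinely strong
    moments left on the table or genuine misses.
    """
    matched = [e for e in editor_clips
               if any(is_match(c, e) for c in clypt_clips)]
    extras = [c for c in clypt_clips
              if not any(is_match(c, e) for e in editor_clips)]
    return matched, extras

# Match rate across all episodes = total matched / total editor-selected clips
```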
Known limitations
We want to be transparent about what this test does and does not prove:
- Single show. The test was conducted against 20VC editorial decisions. While 20VC is the gold standard, different shows may have different editorial styles. We are expanding the dataset to include more podcasts.
- English-only. All episodes tested were in English. We have not yet tested against non-English VC podcasts.
- Timestamp precision. Clypt identifies the right moment but sometimes recommends slightly different start/end points than a human editor would. The “same moment” standard is somewhat subjective.
- Context-dependent clips. Clips that depend on external context (current events, previous episodes, inside jokes) are harder for the model to evaluate.
Why this matters
The 83% number isn’t meant to suggest Clypt replaces human editorial judgment. It means that when you get your clips back from Clypt, the vast majority will be the same moments an experienced editor would choose. The remaining clips are either edge cases or genuinely good moments the editor didn’t have bandwidth to post.
For a VC podcast host posting 1-2 clips per episode, Clypt surfaces the 5-6 additional moments they’re currently missing. The 83% match rate gives confidence that those moments are the right ones.
Next steps
We’re actively expanding the dataset:
- Adding editorial data from additional VC podcasts beyond 20VC
- Building clip-to-engagement mapping (do the clips Clypt selects actually perform well on social?)
- Testing across different VC podcast sub-genres: fund mechanics, founder stories, market analysis
We’ll update this page as new data comes in. If you run a VC podcast and want to participate in our testing, reach out.
See it for yourself.
Send us 2 episodes. We’ll clip them for free so you can compare Clypt’s selections against your own instincts.