Two layers, one number. Layer 1 measures contrast, faces, text coverage, WCAG readability, and vibrancy with real computer vision. Layer 2 hands the image to Claude vision and asks whether it can stand out against the top 3 thumbnails actually winning in your keyword + format + size bracket. Free creators get one full run per cycle.
Free plan unlocks one full analysis · ~25 seconds per run · re-upload revised versions for side-by-side history
Layer 1 contributes up to 60 deterministic points (contrast, face, text, readability, vibrancy, dimensions, file size). Layer 2 contributes up to 40 vision points (emotion, text psychology, color psychology, composition, title-thumbnail fit, feed distinctiveness). The scale is the same every time, so an 81 next month is genuinely better than tonight’s 78.
1280×720 · listicle · micro channel
Layer 1 · algorithm
OpenCV · OCR · WCAG
Layer 2 · vision AI
Sonnet 4.6 vs niche
Seven Layer 1 components run pixel-level computer vision (60 points). Six Layer 2 dimensions run Claude Sonnet 4.6 vision against your niche feed (40 points). Each one returns a score, a one-sentence verdict referencing exact visual elements, and a concrete fix when below 8.
Dimensions
5 pts · Full credit at 1280×720 (YouTube’s recommended resolution). Partial credit at any other 16:9 size. Zero for off-ratio images, which get cropped or letterboxed in feeds and tank CTR.
File size
5 pts · Full credit under 2 MB. Partial under 4 MB. Zero above 4 MB (loads slowly on mobile data, hurts the first-frame impression).
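The two file-level checks above can be sketched as a pair of small functions. This is a minimal illustration, not the production scorer: the function names are hypothetical, and the page only says "partial", so the half-credit value (`PARTIAL_SHARE`) and the use of binary megabytes are assumptions.

```python
from fractions import Fraction

# Assumption: "partial" credit is modelled as half the component's points.
PARTIAL_SHARE = 0.5


def dimension_points(width: int, height: int, max_points: int = 5) -> float:
    """Full credit at 1280x720, partial at any other 16:9 size, zero off-ratio."""
    if (width, height) == (1280, 720):
        return float(max_points)
    if height > 0 and Fraction(width, height) == Fraction(16, 9):
        return max_points * PARTIAL_SHARE
    return 0.0


def file_size_points(size_bytes: int, max_points: int = 5) -> float:
    """Full credit under 2 MB, partial under 4 MB, zero otherwise."""
    size_mb = size_bytes / (1024 * 1024)  # assumption: binary megabytes
    if size_mb < 2:
        return float(max_points)
    if size_mb < 4:
        return max_points * PARTIAL_SHARE
    return 0.0
```

For example, a 1920×1080 upload keeps the 16:9 ratio but is not the recommended size, so under these assumptions it earns partial credit.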
Contrast (stddev)
15 pts · Greyscale standard deviation across the whole image. >80 wins full credit, because high contrast separates the thumbnail from its feed neighbours. <30 means the thumbnail looks washed out.
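A rough sketch of the contrast measure, using only the standard library so it runs anywhere. It takes a flat sequence of 0–255 greyscale values (however the image was decoded). The page states only the >80 and <30 thresholds, so the zero score below 30 and the linear ramp in between are assumptions.

```python
from statistics import pstdev
from typing import Sequence


def contrast_points(grey_pixels: Sequence[int], max_points: int = 15) -> float:
    """Score the population standard deviation of greyscale pixel values."""
    spread = pstdev(grey_pixels)
    if spread > 80:
        return float(max_points)
    if spread < 30:
        return 0.0  # assumption: "washed out" earns no points
    # Assumption: linear ramp between the two published thresholds.
    return max_points * (spread - 30) / (80 - 30)
```

A half-black, half-white image has a standard deviation of 127.5 and earns full credit. A flat grey image has a standard deviation of 0 and earns none.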
Face presence
10 pts · Haar cascade face detection. >20% of image area = full credit (faces drive CTR). 10–20% = partial. Detected at all = base credit. Zero faces is OK if vision Layer 2 says the scene compensates.
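The face tiers above can be sketched as a pure function over detection boxes. The `(x, y, w, h)` layout matches what OpenCV's `CascadeClassifier.detectMultiScale` returns, so its output could be passed straight in. Scoring the largest face (rather than the sum of all faces) and the exact partial and base values are assumptions; the page does not specify them.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in pixels


def face_points(boxes: Iterable[Box], image_w: int, image_h: int,
                max_points: int = 10) -> float:
    """Tiered credit based on the share of the frame the largest face covers."""
    areas = [w * h for _, _, w, h in boxes]
    if not areas:
        return 0.0  # no face detected
    share = max(areas) / (image_w * image_h)  # assumption: largest face counts
    if share > 0.20:
        return float(max_points)
    if share >= 0.10:
        return max_points * 0.5   # assumption: partial = half credit
    return max_points * 0.25      # assumption: base = quarter credit
```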
Text presence
10 pts · 10–30% of image covered by text wins full credit (the sweet spot: readable on mobile, doesn’t crowd the visual). >30% feels cluttered and gets capped.
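A sketch of the text-coverage rule, assuming coverage is the summed area of OCR word boxes divided by the image area. The boxes use the `(left, top, width, height)` fields that `pytesseract.image_to_data` reports. The cap value above 30% and the linear ramp below 10% are assumptions, since the page gives neither.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def text_points(boxes: Iterable[Box], image_w: int, image_h: int,
                max_points: int = 10) -> float:
    """Full credit in the 10-30% band, capped above it, ramped below it."""
    covered = sum(w * h for _, _, w, h in boxes)  # overlap between boxes is ignored
    share = covered / (image_w * image_h)
    if share > 0.30:
        return max_points * 0.5  # assumption: "capped" means half credit
    if share >= 0.10:
        return float(max_points)
    return max_points * share / 0.10  # assumption: linear ramp up to 10%
```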
Text readability (WCAG)
10 pts · Real WCAG luminance-contrast ratio between text and its background. >7:1 wins full credit (AAA). >4.5:1 is partial. Below 3:1 reads as a smear at 200px.
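The WCAG contrast ratio itself is a published formula, so it can be shown exactly: linearize each sRGB channel, weight the channels into a relative luminance, then divide the lighter luminance by the darker with a 0.05 offset on each. The point mapping in `readability_points` is a sketch. The half credit for "partial" and the quarter credit between 3:1 and 4.5:1 are assumptions.

```python
from typing import Tuple

RGB = Tuple[int, int, int]  # 8-bit sRGB channels


def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def readability_points(ratio: float, max_points: int = 10) -> float:
    if ratio > 7:
        return float(max_points)       # AAA
    if ratio > 4.5:
        return max_points * 0.5        # assumption: partial = half credit
    if ratio >= 3:
        return max_points * 0.25       # assumption: not specified on the page
    return 0.0
```

White text on a black background gives the maximum ratio of 21:1. A mid grey such as `(118, 118, 118)` on white lands just above the 4.5:1 threshold.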
Color vibrancy
5 pts · Mean HSV saturation. >120 wins full credit (vivid). K-means dominant color extraction also runs here so Layer 2 can compare against the niche palette.
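A standard-library sketch of the vibrancy measure. It assumes the >120 threshold is on the 0–255 saturation scale that OpenCV uses for 8-bit HSV images, so `colorsys` saturation (0–1) is multiplied by 255. The linear ramp below the threshold is also an assumption.

```python
import colorsys
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]  # 8-bit channels


def mean_saturation_255(pixels: Iterable[RGB]) -> float:
    """Mean HSV saturation, rescaled to 0-255 (assumed OpenCV-style scale)."""
    total, count = 0.0, 0
    for r, g, b in pixels:
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total += saturation * 255
        count += 1
    return total / count if count else 0.0


def vibrancy_points(mean_saturation: float, max_points: int = 5) -> float:
    if mean_saturation > 120:
        return float(max_points)
    return max_points * mean_saturation / 120  # assumption: linear ramp
```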
Facial emotion
10 pts · What specific emotion is expressed? Is it readable at 200px (mobile feed size)? Does it match the video’s promise? If no face, does the scene create equivalent emotional pull?
Text psychology
10 pts · Does the text create curiosity tension without revealing the answer? Does it complement or contradict the image? Is it bold enough for mobile? If no text, scored against whether the visual is strong enough alone.
Color psychology
10 pts · Are colors emotionally congruent with the topic? Is there a single dominant color that separates this in the feed? Compared directly against the benchmark color palette.
Composition & visual hierarchy
10 pts · Where does the eye go first? Is there visual tension? Is the most important element in a rule-of-thirds power zone? Mobile-first read, since most YouTube viewing is mobile.
Title-thumbnail relationship
10 pts · Do the title and thumbnail tell DIFFERENT parts of the same story (the gold standard), or is the thumbnail just illustrating the title? Scored zero if no title was provided.
Feed distinctiveness
10 pts · Compared against the actual top 3 benchmark thumbnails for your niche. Would this stand out, blend in, or disappear? Names the single most distinctive element, or explains exactly why it blends.
Niche benchmark · listicle · micro
10 top performers · Yours
Contrast 87 · benchmark avg 64 → +36%
Face yes (24%) · 80% of top performers also have a face
Top 3 by velocity
12.4K/d
9.1K/d
6.7K/d
Niche signature
Common color palette
For every analysis we build a benchmark pool: top 50 niche videos → above-median velocity → last 12 months → >10K views → format match → size-bracket match. The top 10 by velocity become the comparison set. Layer 1 runs on each of their thumbnails, the metrics are averaged, and your face %, text %, contrast, and vibrancy are compared head-to-head. The pool is cached per niche for 30 days and shared across users, so most runs hit a warm cache.
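The filter chain above can be sketched as follows. The `Video` record, its field names, and the function names are hypothetical. The sketch assumes the caller already supplies the niche's top 50 videos, and that "last 12 months" means 365 days. Velocity is views per day since publish.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import median
from typing import List


@dataclass
class Video:  # hypothetical record, for illustration only
    video_id: str
    views: int
    published: date
    fmt: str       # e.g. "listicle", "tutorial"
    bracket: str   # e.g. "nano", "micro", "mid", "macro"


def velocity(video: Video, today: date) -> float:
    """Views per day since publish (at least one day, to avoid dividing by zero)."""
    return video.views / max((today - video.published).days, 1)


def build_pool(top_50: List[Video], fmt: str, bracket: str,
               today: date, size: int = 10) -> List[Video]:
    """Apply the filters in order, then keep the fastest `size` videos."""
    if not top_50:
        return []
    cutoff = median(velocity(v, today) for v in top_50)
    pool = [
        v for v in top_50
        if velocity(v, today) > cutoff                   # above-median velocity
        and today - v.published <= timedelta(days=365)   # last 12 months
        and v.views > 10_000                             # >10K views
        and v.fmt == fmt                                 # format match
        and v.bracket == bracket                         # size-bracket match
    ]
    return sorted(pool, key=lambda v: velocity(v, today), reverse=True)[:size]


def relative_delta(yours: float, benchmark_avg: float) -> float:
    """Head-to-head difference as a percentage of the benchmark average."""
    return (yours - benchmark_avg) / benchmark_avg * 100
```

With the page's own example numbers, a contrast of 87 against a benchmark average of 64 gives `relative_delta(87, 64)` ≈ 35.9, which rounds to the +36% shown.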
Format-aware
Tutorial / listicle / story / comparison / revelation. Pulled separately.
Size-bracketed
Nano / micro / mid / macro. Your peers, not MrBeast.
Velocity-ranked
Views per day since publish. Recent winners, not stale viral hits.
Cached & shared
30-day pool TTL across users. Most runs hit a warm pool.
Five stages. Re-upload a revised version anytime. The version-history panel tracks the score across iterations so you can see exactly what moved the needle.
Upload + context
Drop the image, paste the draft title, pick the keyword you’re targeting. Or pull the title and keyword from a video idea you generated in Competitor Analysis.
Layer 1 measures
OpenCV detects faces, pytesseract reads any text, WCAG luminance ratio scores readability, k-means extracts dominant colors, HSV measures vibrancy. 60 points.
Niche pool built
Top 10 thumbnails for your keyword + format + size bracket are fetched and scored. Pool cached 30 days, shared across users. Most runs hit a warm pool.
Layer 2 vision call
Claude Sonnet 4.6 sees your thumbnail alongside the top 3 benchmark thumbnails and scores 6 psychological dimensions in context. 40 points.
Combined result
Score 0–100, per-dimension verdict + fix, biggest win, biggest fix, emotion label, feed-position tag, percentile vs peers, version saved to history.
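How the two layers might add up to one number can be sketched like this. The Layer 1 caps come straight from the component list (15 + 10 + 10 + 10 + 5 + 5 + 5 = 60). The six Layer 2 dimensions are each listed at 10 points, which sums to 60 rather than 40, so this sketch assumes the raw Layer 2 total is rescaled linearly to 40. That rescaling, the dictionary keys, and the function name are all assumptions rather than documented behaviour.

```python
from typing import Dict

LAYER1_CAPS = {
    "contrast": 15, "face": 10, "text": 10, "readability": 10,
    "vibrancy": 5, "dimensions": 5, "file_size": 5,
}
LAYER2_DIMENSIONS = (
    "emotion", "text_psychology", "color_psychology",
    "composition", "title_fit", "feed_distinctiveness",
)
LAYER2_BUDGET = 40  # points Layer 2 may contribute to the final score


def combined_score(layer1: Dict[str, float], layer2: Dict[str, float]) -> int:
    """Clamp each component to its cap, rescale Layer 2, and return 0-100."""
    l1 = sum(min(max(layer1.get(name, 0), 0), cap)
             for name, cap in LAYER1_CAPS.items())
    raw_l2 = sum(min(max(layer2.get(name, 0), 0), 10)
                 for name in LAYER2_DIMENSIONS)
    # Assumption: six 0-10 scores (raw max 60) map linearly onto 40 points.
    l2 = raw_l2 * LAYER2_BUDGET / (10 * len(LAYER2_DIMENSIONS))
    return round(l1 + l2)
```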
The studio doesn’t hand you a number and a vague verdict. Each block renders separately so you can scan, iterate, and re-upload, and the history panel keeps every version side by side.
Combined score 0–100
Layer 1 contributes up to 60 (deterministic CV). Layer 2 contributes up to 40 (Claude vision). Same scale every time so you can track improvement across versions.
Niche-aware vision read
Claude scores against the actual top 3 benchmark thumbnails for your keyword + format + size bracket. Not generic best practices. Feed distinctiveness is real.
Per-dimension verdict + fix
Each of the 13 dimensions returns a one-sentence verdict referencing exact visual elements, plus one concrete fix when the score is below 8. Names colors, words, positions.
Biggest win + biggest fix
The single strongest element to keep. The single highest-impact change to make. Plus an emotion label, feed-position tag, and click-through prediction vs niche average.
Niche benchmark comparison
Your face %, text %, vibrancy, and contrast are each plotted against the niche average. Same metrics from the same algorithm. Apples to apples.
Percentile vs peers
Where your thumbnail ranks among every other Thumbnail IQ analysis run for this exact keyword + format + size bracket. So you know whether 78/100 is good for THIS niche.
Version history
Re-upload a revised version and the score is tracked side by side. The history panel shows which iteration moved which dimension and how close you are to the niche top.
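The "percentile vs peers" figure described above can be illustrated with a simple percentile rank: the share of peer scores in the same keyword + format + size bracket that fall strictly below yours. This is one common definition, chosen here as an assumption. The page does not say which one is used, and the function name is hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def percentile_rank(score: float, peer_scores: Sequence[float]) -> Optional[float]:
    """Percentage of peer scores strictly below `score`, or None with no peers."""
    if not peer_scores:
        return None
    ordered = sorted(peer_scores)
    below = bisect_left(ordered, score)  # count of scores < score
    return 100 * below / len(ordered)
```

Under this definition, a 78 among peers scoring 60, 70, 75, 80, and 90 beats three of five, which puts it at the 60th percentile.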
Layer 1 runs entirely on our infrastructure. No third-party scoring API, no per-image fees. Layer 2 calls Claude Sonnet 4.6 with your thumbnail and the top 3 benchmark images. Benchmark thumbnails come from the official YouTube Data API: the same public images anyone visiting those channels can see. Each analysis spends one credit on paid plans; the free tier gets one full analysis per cycle.
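To make the 4-image Layer 2 input concrete, here is a sketch that only assembles the message content and does not call any API. The image blocks follow the base64 image format of Anthropic's Messages API. The function names, the label text, and the prompt wording are hypothetical, not the product's actual prompt.

```python
import base64
from typing import Dict, List


def image_block(data: bytes, media_type: str = "image/jpeg") -> Dict:
    """Wrap raw image bytes as a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


def build_vision_content(thumbnail: bytes, benchmarks: List[bytes],
                         title: str) -> List[Dict]:
    """Interleave labels and images: the candidate first, then 3 benchmarks."""
    if len(benchmarks) != 3:
        raise ValueError("expected exactly 3 benchmark thumbnails")
    content = [{"type": "text", "text": "Candidate thumbnail:"},
               image_block(thumbnail)]
    for index, benchmark in enumerate(benchmarks, start=1):
        content.append({"type": "text", "text": f"Benchmark {index}:"})
        content.append(image_block(benchmark))
    # Hypothetical instruction text; the real prompt is not published.
    content.append({"type": "text",
                    "text": f"Draft title: {title}\n"
                            "Score the candidate against the benchmarks."})
    return content
```

Labelling each image with a short text block before it is a design choice that lets the model refer to "Benchmark 2" unambiguously in its verdicts.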
Face detection
OpenCV Haar cascade · frontal-face classifier
Text OCR
pytesseract · sparse-text page mode (psm 11)
Color extraction
OpenCV k-means · k=3 dominant + HSV saturation
Readability ratio
WCAG 2.2 luminance contrast · sampled per text box
Niche benchmark
YouTube Data API · top 10 by view velocity, 30-day cache
Vision model
Claude Sonnet 4.6 · 4-image input · ~12s on warm cache
Free creators get one full two-layer analysis per cycle, enough to try the engine on a real thumbnail. Paid plans charge one credit per run on the same engine, with no feature differences. Each re-uploaded version is a fresh analysis and a fresh credit.
Free
1
analysis
per cycle
One thumbnail score per cycle. Full two-layer analysis.
Solo
20
analyses
included per month
Score every iteration · 3 channels
Growth
50
analyses
included per month
Same engine, higher monthly allowance · 5 channels
Agency
150
analyses
included per month
Pooled across 10 channels · per-version history
Same two-layer engine across every plan, including free.
See full pricing →
Real answers from how the product behaves. The two layers, the niche pool, the size brackets, version history, and what won’t work.
Still have questions? Email us →