the case for comparative candidate evaluation, and why Bayesian posterior estimation is the natural fit for how hiring decisions actually work.
tl;dr
every screening tool scores candidates against a fixed rubric and produces a number. but hiring decisions are comparative. you are not asking "is this person good?" but "is this person better than the others for this role?" a 7/10 is below average in a strong pool and exceptional in a weak one. same number, opposite meaning. Bayesian posterior estimation fixes this by scoring candidates against each other, expressing uncertainty honestly, and updating rankings as the pool grows.
every hiring tool on the market works the same way. a candidate takes an assessment, completes a screen, or sits through an interview, and out comes a number. 82/100. 7.4/10. "strong hire."
the number goes into a spreadsheet. it gets compared against a threshold. it informs a decision worth tens of thousands of dollars. and everyone treats it as ground truth.
but the number has no context. 82 out of 100, compared to what? with what confidence? against which pool of candidates? nobody knows. the score is a point estimate floating in a vacuum.
the frame itself is wrong
the issue is not that the measurement was bad. it is that fixed rubric scoring, grading each candidate against a static checklist, is structurally incapable of answering the question hiring managers actually ask.
the question is never "is this candidate good?" it is always "is this candidate better than the others we have seen for this role?" no fixed rubric system can answer this.
imagine two applicant pools for the same role. say, a senior backend engineer. this is the clearest way to see why fixed scores are broken.
interactive: toggle between pools
same candidate. same score. different pool.
in a strong pool, a 7.0 is below average. a fixed rubric system would still call this candidate "good." it cannot see the context.
this is not a minor calibration issue. it is a structural flaw. when a score has no relationship to the distribution it was drawn from, the score carries no information about relative standing.
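a minimal sketch of the pool effect, in python. the scores and pool sizes below are made up for illustration, not real data.

```python
# same raw score, two different pools. all numbers are made up for illustration.
from statistics import mean

strong_pool = [8.4, 7.9, 7.6, 7.3, 7.1, 6.8]   # hypothetical strong pool for the role
weak_pool = [6.1, 5.8, 5.5, 5.2, 4.9, 4.6]     # hypothetical weak pool for the same role
candidate_score = 7.0

def beats(score, pool):
    """fraction of the pool this score is above."""
    return sum(s < score for s in pool) / len(pool)

print(f"strong pool (mean {mean(strong_pool):.1f}): 7.0 beats {beats(candidate_score, strong_pool):.0%} of the pool")
print(f"weak pool (mean {mean(weak_pool):.1f}): 7.0 beats {beats(candidate_score, weak_pool):.0%} of the pool")
```

the same 7.0 beats roughly a sixth of the strong pool and all of the weak one. a fixed rubric reports the 7.0 either way.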
where else this shows up
psychologists call this a rank order judgment. it shows up in any context where selection happens under constraint. not just hiring.
the same pattern everywhere
hiring
"who among these 50 people should we interview next?" you hold the full set in mind and compare.
academic admissions
admissions committees do not ask if a student is "good enough." they ask if this student is stronger than the others competing for the same spots.
sports drafts
you do not draft a player because they score above some threshold. you draft them because they are the best available option relative to your needs.
yet every ATS, every screening tool, every assessment platform evaluates candidates independently, against a fixed rubric, and produces a context free number. the human doing the final evaluation has to manually reconstruct the comparative picture that the tool threw away.
hiring decisions are not absolute. they are comparative. the tools should be too.
"Bayesian" sounds intimidating. the idea is not. here is the whole thing in four steps.
start with a reasonable guess
before seeing any candidates, you have a rough sense of what 'typical' looks like for this role. that is your prior. a starting belief, not a fixed rubric.
observe each candidate
each person completes their evaluation. the model measures their performance and notes how confident it is in that measurement.
update the picture
the model combines what it just observed with what it already knows about the pool. the result is a posterior. an updated belief about where this candidate stands.
rank within the pool
every candidate's posterior is compared against every other candidate's. the output is not a score. it is a ranking with confidence intervals.
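here is a rough sketch of that loop in python, under a simple normal-normal assumption. the variable names, the prior, and every number below are illustrative assumptions, not the production model.

```python
import math

# 1. start with a reasonable guess: a prior for "typical" performance in this role
prior_mean, prior_var = 6.0, 1.5 ** 2

# 2. observe each candidate: a raw score plus the standard error of that measurement
observations = {"candidate a": (7.8, 0.4), "candidate b": (7.8, 1.2), "candidate c": (5.9, 0.5)}

# 3. update the picture: precision-weighted blend of the prior and the observation
def posterior(score, se, mu, var):
    precision = 1 / var + 1 / se ** 2
    return (mu / var + score / se ** 2) / precision, math.sqrt(1 / precision)

posteriors = {name: posterior(score, se, prior_mean, prior_var)
              for name, (score, se) in observations.items()}

# 4. rank within the pool, keeping the uncertainty attached to every estimate
for name, (m, sd) in sorted(posteriors.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: posterior {m:.2f} ± {1.28 * sd:.2f} (80% interval)")
```

note that candidates a and b have the same raw 7.8, but b's noisier measurement lands at a lower, wider posterior.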
the key difference
fixed rubric
scores each candidate in isolation. produces a number with no context. does not update. cannot express uncertainty.
bayesian comparative
scores each candidate against the pool. produces a ranking with confidence. updates as more data arrives. says "i am not sure" when it should.
you do not need to understand the math to understand why this matters. the model does what every good recruiter already does intuitively: hold the pool in mind, compare candidates against each other, and update the assessment as you see more people. the math just makes it precise and scalable.
for those who want the full picture. this is the actual mathematical model that powers comparative scoring. five steps, from prior to pool ranking.
lambda core · the three equations that matter
posterior · where the candidate actually stands
θ̂_i = (μ_pool / σ² + s_i / SE_i²) / (1 / σ² + 1 / SE_i²)
blends what the pool looks like (mean μ_pool, spread σ) with what this candidate showed (raw score s_i, standard error SE_i). noisy signal → pulled toward pool mean. clear signal → stays where it is.
composite · single number across all dimensions
C_i = Σ_d w_d · θ̂_i,d ± 1.28 · √Var(C_i)
weighted sum of all six dimensions, with an 80% credible interval baked in.
ranking · who is actually better
P(θ_i > θ_j) for all j ∈ pool
principled probability that candidate i outperforms candidate j. no thresholds. no forced curves. just the posterior.
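a compact sketch of how those three equations could be wired together in python. the weights, scores, standard errors, and pool prior are placeholders, and the normal approximation behind p_better is an assumption rather than a documented implementation detail.

```python
import math

def posterior(score, se, pool_mean, pool_var):
    """precision-weighted blend of the pool prior and the observed score -> (mean, variance)."""
    precision = 1 / pool_var + 1 / se ** 2
    return (pool_mean / pool_var + score / se ** 2) / precision, 1 / precision

def composite(dim_posteriors, weights):
    """weighted sum across dimensions -> (mean, variance)."""
    mean = sum(w * m for w, (m, _) in zip(weights, dim_posteriors))
    var = sum(w ** 2 * v for w, (_, v) in zip(weights, dim_posteriors))
    return mean, var

def p_better(ci, cj):
    """P(theta_i > theta_j) assuming independent normal posteriors."""
    (mi, vi), (mj, vj) = ci, cj
    return 0.5 * (1 + math.erf((mi - mj) / math.sqrt(2 * (vi + vj))))

pool_mean, pool_var = 6.0, 1.5 ** 2            # placeholder pool prior
weights = [1 / 6] * 6                          # placeholder equal dimension weights

# hypothetical per-dimension (score, standard error) for two candidates
candidate_i = [(7.8, 0.4), (8.3, 0.3), (6.9, 0.6), (7.2, 0.5), (8.0, 0.4), (7.5, 0.7)]
candidate_j = [(7.1, 0.5), (7.4, 0.4), (6.6, 0.5), (7.0, 0.6), (7.2, 0.5), (6.9, 0.6)]

ci = composite([posterior(s, se, pool_mean, pool_var) for s, se in candidate_i], weights)
cj = composite([posterior(s, se, pool_mean, pool_var) for s, se in candidate_j], weights)

print(f"candidate i: {ci[0]:.2f} ± {1.28 * math.sqrt(ci[1]):.2f} (80% interval)")
print(f"candidate j: {cj[0]:.2f} ± {1.28 * math.sqrt(cj[1]):.2f} (80% interval)")
print(f"P(i beats j) ≈ {p_better(ci, cj):.0%}")
```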
and here is what shrinkage looks like in practice. when the model is uncertain about a candidate, it pulls the score toward the pool mean. when it is confident, the score stays where it is.
shrinkage in action: noisy scores get pulled toward the pool mean
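a quick worked example with made-up numbers: take a pool with mean 6.0 and spread σ = 1.5. a candidate measured at 7.8 with SE 0.4 keeps a posterior near 7.7. the same 7.8 measured with SE 1.2 gets pulled down to roughly 7.1. same raw score; the noisier measurement leans harder on the pool mean.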
instead of scoring candidates on a flat scale, a comparative system evaluates across six behavioral dimensions. each with its own score, confidence interval, and pool relative position.
what a candidate profile looks like
candidate profile
sarah chen · senior backend engineer
the score tells you where they landed
an 8.3 on cognitive reasoning means strong structured thinking. but the number alone is not the insight.
the confidence interval tells you how certain the model is
a score of 8.3 with a tight interval [8.0, 8.6] is a confident measurement. a score of 8.3 with [6.5, 9.8] means the model is not sure, and it tells you so.
the pool rank tells you how they compare
'top 8%' means that out of everyone who interviewed for this role, this candidate is stronger than 92% of the pool on this composite. that is the information you actually need.
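one plausible way a "top 8%" style figure could fall out of the posteriors is to average the pairwise win probability against everyone else in the pool. the averaging rule and the numbers below are assumptions for illustration, not aperture's published method.

```python
import math

def p_better(post_i, post_j):
    """P(theta_i > theta_j) assuming independent normal posteriors."""
    (mi, vi), (mj, vj) = post_i, post_j
    return 0.5 * (1 + math.erf((mi - mj) / math.sqrt(2 * (vi + vj))))

def pool_standing(i, posteriors):
    """expected fraction of the pool candidate i outperforms."""
    wins = [p_better(posteriors[i], p) for k, p in enumerate(posteriors) if k != i]
    return sum(wins) / len(wins)

# hypothetical composite posteriors for the pool: (mean, variance)
pool = [(8.1, 0.10), (7.4, 0.20), (6.9, 0.15), (6.2, 0.30), (5.8, 0.25)]
print(f"candidate 0 is stronger than ~{pool_standing(0, pool):.0%} of the pool")
```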
this is the part that changes everything. the ranking is not frozen after the first day. as more candidates enter the pool, every earlier candidate is automatically re-evaluated against the new data.
live ranking evolution: watch confidence tighten over time
12 candidates interviewed. rankings are preliminary. confidence intervals are wide. the model is honest about what it does not know yet.
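a sketch of what that re-evaluation could look like: re-estimate the pool prior from everything observed so far, then recompute every posterior. this empirical-bayes style refresh, the refresh function, and the numbers below are assumptions for illustration, not the engine's actual update rule.

```python
import math
import statistics

def refresh(observations, min_var=0.25):
    """observations: list of (raw score, standard error). re-estimate the pool prior
    from everything seen so far, then recompute every posterior -> list of (mean, sd)."""
    scores = [s for s, _ in observations]
    pool_mean = statistics.fmean(scores)
    pool_var = max(statistics.pvariance(scores), min_var)   # guard against tiny pools
    posteriors = []
    for score, se in observations:
        precision = 1 / pool_var + 1 / se ** 2
        mean = (pool_mean / pool_var + score / se ** 2) / precision
        posteriors.append((round(mean, 2), round(math.sqrt(1 / precision), 2)))
    return posteriors

pool = [(7.1, 0.9), (6.4, 0.7), (5.8, 0.8)]            # early days: few candidates, wide errors
print(refresh(pool))
pool += [(6.0, 0.6), (7.6, 0.5), (5.2, 0.7)]           # more candidates arrive
print(refresh(pool))                                    # earlier candidates' posteriors shift too
```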
when your scoring system is comparative and Bayesian, several things that are currently broken start working.
calibration becomes automatic
a '7' in a strong pool and a '7' in a weak pool produce different rankings. you do not need to manually recalibrate your rubric for every job posting. the model does it because it is conditioning on the observed data.
small and large pools are handled gracefully
with five candidates, the model expresses high uncertainty. wide intervals, cautious rankings. with two hundred candidates, intervals tighten and rankings stabilize. the output honestly reflects the amount of data you have.
late applicants get a fair shot
in a fixed rubric system, every candidate gets the same static rubric. in a Bayesian system, the model has seen seventy people before candidate seventy one arrives. the posterior is richer. late candidates get a more precise assessment, not a worse one.
the shortlist earns itself
you do not decide in advance that you want the top five. you look at where the natural breaks in the posterior fall. maybe three candidates are clearly separated. maybe eight are statistically tied. the data tells you the shape of the decision.
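one simple rule that would let those breaks emerge from the data: walk the ranked posteriors and cut wherever the gap to the next candidate is decisive. the 0.9 cutoff and the clustering rule below are illustrative assumptions, not a prescribed method.

```python
import math

def p_better(a, b):
    """P(theta_a > theta_b) assuming independent normal posteriors."""
    (ma, va), (mb, vb) = a, b
    return 0.5 * (1 + math.erf((ma - mb) / math.sqrt(2 * (va + vb))))

def clusters(posteriors, cutoff=0.9):
    """split the ranked pool wherever one candidate decisively beats the next."""
    ranked = sorted(posteriors, key=lambda p: -p[0])
    groups, current = [], [ranked[0]]
    for prev, nxt in zip(ranked, ranked[1:]):
        if p_better(prev, nxt) >= cutoff:
            groups.append(current)
            current = []
        current.append(nxt)
    groups.append(current)
    return groups

# hypothetical composite posteriors: (mean, variance)
pool = [(8.2, 0.05), (8.0, 0.06), (7.8, 0.05), (6.9, 0.20), (6.8, 0.25), (6.7, 0.22)]
for i, group in enumerate(clusters(pool), 1):
    print(f"cluster {i}: {[round(m, 1) for m, _ in group]}")
```

on these made-up numbers, three candidates separate cleanly and the rest land in a statistical tie.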
the core shift
a fixed rubric score says "this person scored 82." a Bayesian comparative score says "this person is in the top 12% of this pool, we are 85% confident of that, and they separate clearly from the next cluster on communication and cognitive reasoning." one is a number. the other is a decision.
per wrong interview: $800 (recruiter time, coordination, prep)
per panel round: $3k (5 engineers × 1 hour each)
per bad hire: 3× annual salary. gone.
per missed candidate: ∞ (they already took another offer)
across a quarter's open roles, that is $70k to $112k burned on interviews that should never have happened.
when the score has no context, the shortlist has no signal. everyone pays.
the hiring industry has spent two decades producing numbers and calling them insights. most of those numbers are context free, rubric locked, and update blind. they cannot express uncertainty. they cannot adapt to the pool. they cannot tell you the one thing you actually need to know:
given everyone who applied for this role, who should i talk to first?
comparative evaluation with Bayesian scoring is not exotic. it is the natural formalization of what every good recruiter already does intuitively. the math just makes it precise, scalable, and honest about uncertainty.
we built this
this approach is the foundation of lambda CORE, the scoring engine inside aperture. it runs adaptive behavioral interviews, scores candidates across six dimensions with confidence intervals, and produces pool relative rankings that update as the pool grows.
explore lambda CORE
the technical details behind the scoring engine
want to talk about this?
reach out at harsh@aperturehq.org. always up for a conversation about scoring systems, Bayesian methods, or hiring.