    Starting to think about AI Fairness

    By big tee tech hub · January 22, 2026 · 16 Mins Read

    If you use deep learning for unsupervised part-of-speech tagging of
    Sanskrit, or knowledge discovery in physics, you probably
    don’t need to worry about model fairness. If you’re a data scientist
    working at a place where decisions are made about people, however, or
    an academic researching models that will be used to such ends, chances
    are that you’ve already been thinking about this topic, or feeling that
    you should. And thinking about this is hard.

    It is hard for several reasons. In this text, I will go into just one.

    The forest for the trees

    Nowadays, it is hard to find a modeling framework that does not
    include functionality to assess fairness. (Or is at least planning to.)
    And the terminology sounds familiar, too: “calibration,”
    “predictive parity,” “equal true [false] positive rate”… It almost
    seems as though we could just take the metrics we make use of anyway
    (recall or precision, say), test for equality across groups, and that’s
    it. Let’s assume, for a second, it really was that simple. Then the
    question still is: Which metrics, exactly, do we choose?
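
    To make that idea concrete, here is a minimal sketch of the naïve approach: compute a standard metric per group and look at the gaps. The data frame and its columns (y_true, y_pred, group) are hypothetical stand-ins, not part of any particular fairness framework.

```python
# Naive "test for equality across groups": compute recall and precision
# per group and look at the spread. Column names are hypothetical.
import pandas as pd

def metrics_by_group(df: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for g, sub in df.groupby("group"):
        tp = ((sub.y_true == 1) & (sub.y_pred == 1)).sum()
        fp = ((sub.y_true == 0) & (sub.y_pred == 1)).sum()
        fn = ((sub.y_true == 1) & (sub.y_pred == 0)).sum()
        recall = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
        precision = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
        rows.append({"group": g, "recall": recall, "precision": precision})
    return pd.DataFrame(rows).set_index("group")

# The "fairness test" would then simply ask: are the per-group values
# (nearly) equal? For instance:
# metrics_by_group(df).agg(lambda col: col.max() - col.min())
```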

    In reality, things are not that simple. And it gets worse. For very good
    reasons, there is a close connection in the ML fairness literature to
    concepts that are primarily treated in other disciplines, such as the
    legal sciences: discrimination and disparate impact (both not far
    from yet another statistical concept, statistical parity).
    Statistical parity means that if we have a classifier, say to decide
    whom to hire, it should result in the same proportion of applicants from
    the disadvantaged group (e.g., Black people) being hired as from the
    advantaged one(s). But that is quite a different requirement from, say,
    equal true/false positive rates!

    So despite all that abundance of software, guides, and even decision
    trees: This is not a simple, technical decision. It is, in fact, a
    technical decision only to a small degree.

    Common sense, not math

    Let me start this section with a disclaimer: Most of the sources
    referenced in this text appear on, or are implied by, the “Guidance”
    page of IBM’s framework
    AI Fairness 360. If you read that page, and everything that’s said and
    not said there appears clear from the outset, then you may not need this
    more verbose exposition. If not, I invite you to read on.

    Papers on fairness in machine learning, as is common in fields like
    computer science, abound with formulae. Even the papers referenced here,
    though selected not for their theorems and proofs but for the ideas they
    harbor, are no exception. But to start thinking about fairness as it
    might apply to an ML process at hand, common language – and common
    sense – will do just fine. If, after analyzing your use case, you judge
    that the more technical results are relevant to the process in
    question, you will find that their verbal characterizations will often
    suffice. It is only when you doubt their correctness that you will need
    to work through the proofs.

    At this point, you may be wondering what it is I am contrasting those
    “more technical results” with. This is the topic of the next section,
    where I’ll try to give a bird’s-eye characterization of fairness criteria
    and what they imply.

    Situating fairness criteria

    Think back to the example of a hiring algorithm. What does it mean for
    this algorithm to be fair? We approach this question under two – mostly
    incompatible – assumptions:

    1. The algorithm is fair if it behaves the same way independent of
      which demographic group it is applied to. Here demographic group
      could be defined by ethnicity, gender, abledness, or in fact any
      categorization suggested by the context.

    2. The algorithm is fair if it does not discriminate against any
      demographic group.

    I’ll call these the technical and societal views, respectively.

    Fairness, viewed the technical way

    What does it mean for an algorithm to “behave the same way” regardless
    of which group it is applied to?

    In a classification setting, we can view the relationship between
    prediction (\(\hat{Y}\)) and target (\(Y\)) as a doubly directed path. In
    one direction: Given true target \(Y\), how accurate is prediction
    \(\hat{Y}\)? In the other: Given \(\hat{Y}\), how well does it predict the
    true class \(Y\)?

    Based on the direction they operate in, metrics popular in machine
    learning overall can be split into two categories. In the first,
    starting from the true target, we have recall, together with “the
    rates”: true positive, true negative, false positive, false negative.
    In the second, we have precision, together with positive (negative,
    resp.) predictive value.
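
    As a small illustration of the two directions, here is a sketch that computes both families of metrics from a single confusion matrix; the cell counts below are made up for the example.

```python
# The two "directions" of confusion-matrix metrics for one group.
# tp, fp, fn, tn are the usual confusion-matrix cells (hypothetical counts).

def rates_given_truth(tp, fp, fn, tn):
    # Conditioning on the true class Y: recall and "the rates"
    return {
        "tpr (recall)": tp / (tp + fn),
        "fnr": fn / (tp + fn),
        "fpr": fp / (fp + tn),
        "tnr": tn / (fp + tn),
    }

def values_given_prediction(tp, fp, fn, tn):
    # Conditioning on the prediction Y_hat: precision and the predictive values
    return {
        "ppv (precision)": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

print(rates_given_truth(tp=40, fp=10, fn=20, tn=30))
print(values_given_prediction(tp=40, fp=10, fn=20, tn=30))
```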

    If now we demand that these metrics be the same across groups, we arrive
    at corresponding fairness criteria: equal false positive rate, equal
    positive predictive value, etc. In the inter-group setting, the two
    types of metrics may be arranged under headings “equality of
    opportunity” and “predictive parity.” You’ll encounter these as actual
    headers in the summary table at the end of this text.

    Said table organizes concepts from different areas into a three-category
    format. The overall narrative builds up towards that “map” in a
    bottom-up way – meaning, most entries will not make sense at this
    point.

    While overall, the terminology around metrics can be confusing (to me it
    is), these headings have some mnemonic value. Equality of opportunity
    suggests that people similar in real life (\(Y\)) get classified similarly
    (\(\hat{Y}\)). Predictive parity suggests that people classified
    similarly (\(\hat{Y}\)) are, in fact, similar (\(Y\)).

    The two criteria can concisely be characterized using the language of
    statistical independence. Following Barocas, Hardt, and Narayanan (2019), these are:

    • Separation: Given true target \(Y\), prediction \(\hat{Y}\) is
      independent of group membership (\(\hat{Y} \perp A | Y\)).

    • Sufficiency: Given prediction \(\hat{Y}\), target \(Y\) is independent
      of group membership (\(Y \perp A | \hat{Y}\)).
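
    As a rough, finite-sample illustration (not a formal independence test), one can tabulate the conditional probabilities behind these two criteria; the column names are, again, hypothetical.

```python
# Empirical look at separation and sufficiency on a labeled, scored dataset.
# Assumes a pandas DataFrame with hypothetical binary columns y_true, y_pred
# and a categorical column group. In real data the gaps are never exactly
# zero; what matters is their size.
import pandas as pd

def separation_table(df: pd.DataFrame) -> pd.DataFrame:
    # P(Y_hat = 1 | Y = y, A = a): under separation, rows do not vary with a
    return df.groupby(["y_true", "group"])["y_pred"].mean().unstack("group")

def sufficiency_table(df: pd.DataFrame) -> pd.DataFrame:
    # P(Y = 1 | Y_hat = y_hat, A = a): under sufficiency, rows do not vary with a
    return df.groupby(["y_pred", "group"])["y_true"].mean().unstack("group")
```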

    Given those two fairness criteria – and two sets of corresponding
    metrics – the natural question arises: Can we satisfy both? Above, I
    mentioned precision and recall on purpose: to maybe “prime” you to
    think in the direction of the precision-recall trade-off. And really,
    these two categories reflect different preferences; usually, it is
    impossible to optimize for both. Probably the most famous result is
    due to Chouldechova (2016): It says that predictive parity (testing
    for sufficiency) is incompatible with error rate balance (separation)
    when prevalence differs across groups. This is a theorem (yes, we’re in
    the realm of theorems and proofs here) that may not be surprising, in
    light of Bayes’ theorem, but is of great practical importance
    nonetheless: Unequal prevalence usually is the norm, not the exception.
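
    To see why, it helps to write out the identity at the core of that result (restated here in my own notation, so treat the exact form as a sketch rather than a quotation). For a group with prevalence \(p\), the confusion-matrix definitions force

    \[
    \mathrm{FPR} \;=\; \frac{p}{1 - p} \cdot \frac{1 - \mathrm{PPV}}{\mathrm{PPV}} \cdot (1 - \mathrm{FNR}).
    \]

    So if two groups share the same PPV (a sufficiency-style requirement) and the same FNR, but differ in prevalence \(p\), their false positive rates cannot be equal; some separation-style criterion has to give.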

    This necessarily means we have to make a choice. And this is where the
    theorems and proofs do matter. For example, Yeom and Tschantz (2018) show that
    in this framework – the strictly technical approach to fairness –
    separation should be preferred over sufficiency, because the latter
    allows for arbitrary disparity amplification. Thus, in this framework,
    we may have to work through the theorems.

    What is the alternative?

    Fairness, viewed as a social construct

    Starting with what I just wrote: Hardly anyone will challenge the claim
    that fairness is a social construct. But what does that entail?

    Let me start with a biographical reminiscence. In undergraduate
    psychology (a long time ago), probably the most hammered-in distinction
    relevant to experiment planning was that between a hypothesis and its
    operationalization. The hypothesis is what you want to substantiate,
    conceptually; the operationalization is what you measure. There
    necessarily can’t be a one-to-one correspondence; we’re just striving to
    implement the best operationalization possible.

    In the world of datasets and algorithms, all we have are measurements.
    And often, these are treated as though they were the concepts. This
    will get more concrete with an example, and we’ll stay with the hiring
    software scenario.

    Assume the dataset used for training, assembled from scoring previous
    employees, contains a set of predictors (among which, high-school
    grades) and a target variable, say, an indicator of whether an employee
    “survived” probation. There is a concept-measurement mismatch on both
    sides.

    For one, say the grades are intended to reflect ability to learn and
    motivation to learn. But depending on the circumstances, there
    are other factors of much higher impact: socioeconomic status,
    constantly having to struggle with prejudice, overt discrimination, and
    more.

    And then, the target variable. If the thing it’s supposed to measure
    is “was hired because they seemed like a good fit, and was retained
    because they were a good fit,” then all is good. But normally, HR departments are aiming for
    more than just a strategy of “keep doing what we’ve always been doing.”

    Unfortunately, that concept-measurement mismatch is even more fatal,
    and even less talked about, when it’s about the target and not the
    predictors. (Not accidentally, we also call the target the “ground
    truth.”) An infamous example is recidivism prediction, where what we
    really want to measure – whether someone did, in fact, commit a crime
    – is replaced, for measurability reasons, by whether they were
    convicted. These are not the same: Conviction depends on more
    than what someone has done – for instance, on whether they’ve been under
    intense scrutiny from the outset.

    Fortunately, though, the mismatch is clearly articulated in the AI
    fairness literature. Friedler, Scheidegger, and Venkatasubramanian (2016) distinguish between the construct
    and observed spaces; depending on whether a near-perfect mapping is
    assumed between these, they talk about two “worldviews”: “We’re all
    equal” (WAE) vs. “What you see is what you get” (WYSIWYG). If we’re all
    equal, membership in a societally disadvantaged group should not – in
    fact, may not – affect classification. In the hiring scenario, any
    algorithm employed thus has to result in the same proportion of
    applicants being hired, regardless of which demographic group they
    belong to. If “What you see is what you get,” we don’t question that the
    “ground truth” is the truth.

    This talk of worldviews may seem unnecessarily philosophical, but the
    authors go on to clarify: All that matters, in the end, is whether the
    data is seen as reflecting reality in a naïve, take-at-face-value way.

    For example, we might be ready to concede that there could be small,
    albeit uninteresting effect-size-wise, statistical differences between
    men and women as to spatial vs. linguistic abilities, respectively. We
    know for sure, though, that there are much greater effects of
    socialization, starting in the core family and reinforced,
    progressively, as adolescents go through the education system. We
    therefore apply WAE, trying to (partly) compensate for historical
    injustice. This way, we’re effectively applying affirmative action,
    defined as

    A set of procedures designed to eliminate unlawful discrimination
    among applicants, remedy the results of such prior discrimination, and
    prevent such discrimination in the future.

    In the already-mentioned summary table, you’ll find the WYSIWYG
    principle mapped to both equal opportunity and predictive parity
    metrics. WAE maps to the third category, one we haven’t dwelled upon
    yet: demographic parity, also known as statistical parity. In line
    with what was said before, the requirement here is for each group to be
    present in the positive-outcome class in proportion to its
    representation in the input sample. For example, if thirty percent of
    applicants are Black, then at least thirty percent of people selected
    should be Black, as well. A term commonly used for cases where this does
    not happen is disparate impact: The algorithm affects different
    groups in different ways.
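
    In code, a sketch of this check could look as follows; the disparate-impact ratio below is one common way to summarize the comparison (column names and group labels are hypothetical).

```python
# Demographic (statistical) parity: compare selection rates per group.
# Column names (y_pred, group) and the group labels are hypothetical.
import pandas as pd

def selection_rates(df: pd.DataFrame) -> pd.Series:
    # P(Y_hat = 1 | A = a) for each group a
    return df.groupby("group")["y_pred"].mean()

def disparate_impact_ratio(df: pd.DataFrame, disadvantaged: str, advantaged: str) -> float:
    # Ratio of selection rates; 1.0 means demographic parity holds exactly
    rates = selection_rates(df)
    return rates[disadvantaged] / rates[advantaged]
```

    With the thirty-percent example above, demographic parity asks that roughly thirty percent of those selected be Black as well, which amounts to the groups’ selection rates, and hence the ratio above, being (approximately) equal.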

    Similar in spirit to demographic parity, but possibly leading to
    different outcomes in practice, is conditional demographic parity.
    Here we additionally take into account other predictors in the dataset;
    to be precise: all other predictors. The desideratum now is that for
    any choice of attributes, outcome proportions should be equal, given the
    protected attribute and the other attributes in question. I’ll come
    back to why this may sound better in theory than it works in practice in the
    next section.
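
    A sketch of the conditional variant simply adds the other predictors to the grouping (columns are, again, hypothetical); note that with many or fine-grained attributes, the strata quickly become tiny.

```python
# Conditional demographic parity: compare selection rates per group within
# strata defined by the remaining predictors. Column names are hypothetical.
import pandas as pd

def conditional_selection_rates(df: pd.DataFrame, other_predictors: list) -> pd.DataFrame:
    # P(Y_hat = 1 | A = a, X = x): under conditional demographic parity,
    # each row (one stratum x) should be roughly constant across groups.
    # Note: with many or fine-grained predictors, most strata get very small.
    return (
        df.groupby(other_predictors + ["group"])["y_pred"]
          .mean()
          .unstack("group")
    )
```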

    Summing up, we’ve seen commonly used fairness metrics organized into
    three groups, two of which share a common assumption: that the data used
    for training can be taken at face value. The other starts from the
    outside, contemplating what historical events, and what political and
    societal factors have made the given data look as they do.

    Before we conclude, I’d like to try a quick glance at other disciplines,
    beyond machine learning and computer science, domains where fairness
    figures among the central topics. This section is necessarily limited in
    every respect; it should be seen as a flashlight, an invitation to read
    and reflect rather than an orderly exposition. The short section will
    end with a word of caution: Since drawing analogies can feel highly
    enlightening (and is intellectually satisfying, for sure), it is easy to
    abstract away practical realities. But I’m getting ahead of myself.

    A quick glance at neighboring fields: law and political philosophy

    In jurisprudence, fairness and discrimination constitute an important
    subject. A recent paper that caught my attention is Wachter, Mittelstadt, and Russell (2020a). From a
    machine learning perspective, the interesting point is the
    classification of metrics into bias-preserving and bias-transforming.
    The terms speak for themselves: Metrics in the first group reflect
    biases in the dataset used for training; ones in the second do not. In
    that way, the distinction parallels Friedler, Scheidegger, and Venkatasubramanian (2016)’s confrontation of
    two “worldviews.” But the exact words used also hint at how guidance by
    metrics feeds back into society: Seen as strategies, one preserves
    existing biases; the other, with consequences unknown a priori, changes
    the world.

    To the ML practitioner, this framing is of great help in evaluating what
    criteria to apply in a project. Helpful, too, is the systematic mapping
    provided of metrics to the two groups; it is here that, as alluded to
    above, we encounter conditional demographic parity among the
    bias-transforming ones. I agree that in spirit, this metric can be seen
    as bias-transforming; if we take two sets of people who, per all
    available criteria, are equally qualified for a job, and then find the
    whites favored over the Blacks, fairness is clearly violated. But the
    problem here is “available”: per all available criteria. What if we
    have reason to assume that, in a dataset, all predictors are biased?
    Then it will be very hard to prove that discrimination has occurred.

    A similar problem, I think, surfaces when we look at the field of
    political philosophy, and consult theories on distributive justice for
    guidance. Heidari et al. (2018) have written a paper comparing the three
    criteria – demographic parity, equality of opportunity, and predictive
    parity – to egalitarianism, equality of opportunity (EOP) in the
    Rawlsian sense, and EOP seen through the lens of luck egalitarianism,
    respectively. While the analogy is fascinating, it too assumes that we
    may take what is in the data at face value. In likening predictive
    parity to luck egalitarianism, they have to go to especially great
    lengths, assuming that the predicted class reflects effort exerted. In
    the table below, I therefore take the liberty to disagree, and map a
    libertarian view of distributive justice to both equality of
    opportunity and predictive parity metrics.

    In summary, we end up with two highly controversial categories of
    fairness criteria, one bias-preserving, “what you see is what you
    get”-assuming, and libertarian, the other bias-transforming, “we’re all
    equal”-thinking, and egalitarian. Here, then, is that often-announced
    table.

    Demographic parity
      • A.K.A. / subsumes / related concepts: statistical parity, group
        fairness, disparate impact, conditional demographic parity
      • Statistical independence criterion: independence (\(\hat{Y} \perp A\))
      • Individual / group: group
      • Distributive justice: egalitarian
      • Effect on bias: transforming
      • Policy / “worldview”: We’re all equal (WAE)

    Equality of opportunity
      • A.K.A. / subsumes / related concepts: equalized odds, equal false
        positive / negative rates
      • Statistical independence criterion: separation (\(\hat{Y} \perp A | Y\))
      • Individual / group: group (most) or individual (fairness through
        awareness)
      • Distributive justice: libertarian (contra Heidari et al., see above)
      • Effect on bias: preserving
      • Policy / “worldview”: What you see is what you get (WYSIWYG)

    Predictive parity
      • A.K.A. / subsumes / related concepts: equal positive / negative
        predictive values, calibration by group
      • Statistical independence criterion: sufficiency (\(Y \perp A | \hat{Y}\))
      • Individual / group: group
      • Distributive justice: libertarian (contra Heidari et al., see above)
      • Effect on bias: preserving
      • Policy / “worldview”: What you see is what you get (WYSIWYG)
    Conclusion

    In line with its original goal – to provide some help in starting to
    think about AI fairness metrics – this article does not end with
    recommendations. It does, however, end with an observation. As the last
    section has shown, amidst all theorems and theories, all proofs and
    memes, it makes sense to not lose sight of the concrete: the data trained
    on, and the ML process as a whole. Fairness is not something to be
    evaluated post hoc; the feasibility of fairness is to be reflected on
    right from the beginning.

    In that regard, assessing impact on fairness is not that different from
    that essential, but often toilsome and unloved, stage that precedes the
    modeling itself: exploratory data analysis.

    Thanks for reading!

    Photo by Anders Jildén on Unsplash

    Barocas, Solon, Moritz Hardt, and Arvind Narayanan. 2019. Fairness and Machine Learning. fairmlbook.org.

    Chouldechova, Alexandra. 2016. “Fair prediction with disparate impact: A study of bias in recidivism prediction instruments.” arXiv e-Prints, October, arXiv:1610.07524. https://arxiv.org/abs/1610.07524.
    Cranmer, Miles D., Alvaro Sanchez-Gonzalez, Peter W. Battaglia, Rui Xu, Kyle Cranmer, David N. Spergel, and Shirley Ho. 2020. “Discovering Symbolic Models from Deep Learning with Inductive Biases.” CoRR abs/2006.11287. https://arxiv.org/abs/2006.11287.
    Friedler, Sorelle A., Carlos Scheidegger, and Suresh Venkatasubramanian. 2016. “On the (Im)possibility of Fairness.” CoRR abs/1609.07236. http://arxiv.org/abs/1609.07236.
    Heidari, Hoda, Michele Loi, Krishna P. Gummadi, and Andreas Krause. 2018. “A Moral Framework for Understanding of Fair ML Through Economic Models of Equality of Opportunity.” CoRR abs/1809.03400. http://arxiv.org/abs/1809.03400.
    Srivastava, Prakhar, Kushal Chauhan, Deepanshu Aggarwal, Anupam Shukla, Joydip Dhar, and Vrashabh Prasad Jain. 2018. “Deep Learning Based Unsupervised POS Tagging for Sanskrit.” In Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence. ACAI 2018. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3302425.3302487.
    Wachter, Sandra, Brent D. Mittelstadt, and Chris Russell. 2020a. “Bias Preservation in Machine Learning: The Legality of Fairness Metrics Under EU Non-Discrimination Law.” West Virginia Law Review, Forthcoming abs/2005.05906. https://ssrn.com/abstract=3792772.
    ———. 2020b. “Why Fairness Cannot Be Automated: Bridging the Gap Between EU Non-Discrimination Law and AI.” CoRR abs/2005.05906. https://arxiv.org/abs/2005.05906.
    Yeom, Samuel, and Michael Carl Tschantz. 2018. “Discriminative but Not Discriminatory: A Comparison of Fairness Definitions Under Different Worldviews.” CoRR abs/1808.08619. http://arxiv.org/abs/1808.08619.
