Q. If machine learning is so smart, how come AI models are such racist, sexist homophobes? A. Humans really suck

The biggest and most powerful text-generating AI models today associate black and gay people with negative qualities, according to a study fresh out of America. For this research, computer scientists at the University of Southern California (USC) and the University of California, Los Angeles, probed two state-of-the-art …

  1. jmch Silver badge

    Pimp or madam?

    " “he was a pimp and her friend was happy,” and score it a positive for sentiment, and negative for bias as it associates men with pimps."

    Not sure how "he was a pimp" results in a negative bias for men. All pimps are, by definition, men. The female equivalent is 'madam'. Of course I'm not sure the AI can distinguish between Madam Rose down at the Pink Pillow and Madame la Hauture Snobbesse up at the manor house :)

    1. Joe W Silver badge

      Re: Pimp or madam?

      Well, it is a negative bias towards men by the association with "pimp". They do not mean that "pimp" could be associated with both women and men, but that men could be associated with other things (murderer, saint, teacher, police officer, boffin, whatever) that could be negative or positive. In the end, some sort of weighted average over what is associated with men (or women, or whatever) shows the bias (positive or negative) towards men (in this case).
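That weighted-average idea can be sketched in a few lines; the polarity scores and frequencies below are invented for illustration, not taken from the study:

```python
# A sketch of the "weighted average" idea: sum the sentiment of every term the
# model associates with a demographic word, weighted by how often each term is
# produced. Polarities and frequencies here are made up, not from the paper.

def bias_score(associations):
    """associations: list of (polarity, frequency) pairs for one group.
    Returns the frequency-weighted mean polarity, in [-1, 1]."""
    total = sum(freq for _, freq in associations)
    return sum(pol * freq for pol, freq in associations) / total

# Terms a model might associate with "he" (made-up numbers):
he = [(-1.0, 5),   # "pimp"      (negative)
      (-1.0, 2),   # "murderer"  (negative)
      (+1.0, 8),   # "teacher"   (positive)
      (+1.0, 3)]   # "boffin"    (positive)

print(round(bias_score(he), 3))   # slightly positive overall
```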

    2. Anonymous Coward
      Anonymous Coward

      Re: Pimp or madam?

      I (eventually) understood it to mean that the language generation unit under test is told to complete a phrase like “he is a”. When the unit chooses “pimp”, the text classifier evaluates this as a negative bias towards men, because the term “pimp” has been classified (by the human researchers) as having negative connotations.

      At least that is how I made sense of it.
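That reading of the pipeline can be stubbed out like this; the lexicon, labels and stub completions are all invented for illustration, not the paper's actual code or data:

```python
# The pipeline as described above, stubbed out: a "generator" completes a
# prompt, and the completion is scored against a human-labelled connotation
# lexicon. Everything here is an invented stand-in.

LEXICON = {"pimp": -1, "teacher": +1, "doctor": +1}   # labelled by researchers

def complete(prompt):
    # Stand-in for the language model under test.
    return {"he is a": "pimp", "she is a": "teacher"}[prompt]

def score(prompt):
    word = complete(prompt)
    return word, LEXICON.get(word, 0)

print(score("he is a"))   # the completion and its human-assigned polarity
```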

    3. katrinab Silver badge

      Re: Pimp or madam?

      Sure, but if I see someone described as a "madam" I assume meaning no. 2 in the Oxford dictionary, "A conceited or bossy girl or young woman", rather than meaning no. 3, "A woman who runs a brothel".

      Also, "pimp" is "A man who controls prostitutes and arranges clients for them, taking a percentage of their earnings in return", which has a much more specific meaning than "brothel-keeper", the gender-neutral equivalent of "madam".

      1. IlyaG.

        Re: Pimp or madam?

        By indexing against a dictionary, AI can find out which definition suits a text's context and its subtexts (relative to the dictionary definitions of the text's other words). The technology has been developed, tested in NIST TREC QA, and patented.

      2. Cederic Bronze badge

        Re: Pimp or madam?

        Wouldn't a lady running a brothel be a madame rather than a madam?

        1. katrinab Silver badge

          Re: Pimp or madam?

          Apparently not:

          "a title for women in artistic or exotic occupations, such as musicians or fortune-tellers."

          1. IlyaG.

            Re: Pimp or madam?

            AI structures all these definitions (of "pimp" and "madam") into sets of patterns and compares them with the text's context and subtexts (the dictionary definitions of the other words), finding which of them is true.

    4. HildyJ

      Re: Pimp or madam?

      Ultimately it adds the bias of human classifiers to AI models. American white females may classify "pimp" as negative, but American black males might classify it as positive.

      AI machine learning can do no more than determine what stereotypes exist in its source material. Whether the stereotypes are good or bad, valid or invalid, is not something machine learning can address.

      1. veti Silver badge

        Re: Pimp or madam?

        Not quite. It can also be used to demonstrate the fact that biases do exist in the source material.

        A surprising number of people still doubt this.

  2. Alan Johnson

    Not at all clear how to decide what is biased

    The example given illustrates the difficulty in this area:

    “he was a pimp and her friend was happy,” and score it a positive for sentiment, and negative for bias as it associates men with pimps.

    Except that pimp is a term used exclusively for men, so it is unclear if the sentence shows any bias at all. To take a different set of examples: according to the ONS, the majority of child abuse is committed by women and the majority of murders are committed by men. If this was reflected in training sets, and this in turn was reflected in the output of an AI trained using those sets, would the result be 'biased'? I suspect it would be categorised as such, yet it would simply reflect the world as it is.

    What seems to be happening in at least some of these cases of AI 'bias' is that creating an AI forces us to confront aspects of the world with which we are uncomfortable. When this happens we reject the AI as 'biased'. The Amazon AI to analyse CVs springs to mind. It was rejected because it selected 'too many men', yet this simply reflected the reality that the majority of developers are men. I believe the reasons for this are unrelated to bias or discrimination (which is strongly in the other direction), therefore I don't think that AI was biased. Others believe the only reason women are a small minority of developers is societal bias, and therefore they would (and did) classify it as biased.

    An AI can be trained with biased data, but it is not at all straightforward to define what this means, let alone decide if it has occurred. In practice, whenever an outcome conflicts with our political beliefs we label it as biased, and if it does not we label it unbiased. In the end it still comes down to a subjective judgement.

    1. find users who cut cat tail

      Re: Not at all clear how to decide what is biased

      If this was reflected in training sets and this in turn was reflected in the output of an AI trained using these sets would the result be 'biased'?

      This. The problem everyone is trying to pretend does not exist.

      The real world is a messy place. All kinds of things are correlated with all kinds of other things -- for reasons good, bad and random. Often the reason is simple: History took a specific course. You don't have the entire multiverse available for training.

      The more inputs and outputs, the more odd correlations the classifier will find. And as you say there isn't really any way to sort them out. We would need some automated way, perhaps some sort of AI? Right...

      We are always getting headlines about sex and race. But that's just the tip of an iceberg.

  3. Bronek Kozicki Silver badge

    "OpenAI spokesperson agreed that the differences between the models was down to the nature of the underlying datasets used to train them."

    ... and that's exactly what this is. In order to train a complex model, especially a generative one, you need lots and lots of data. If the input data set is not carefully screened then obviously it will contain biases, which in turn will influence the models. Screening the input dataset might not be an option because of the sheer volume of data required. You would either end up having to rewrite the input texts to remove biases, which would take ages, or have a much smaller dataset for training, which would make it impossible to train such complex models.

    1. HandleAlreadyTaken

      > If the input data set is not carefully screened then obviously it will contain biases, which in turn will influence the models. Screening of the input dataset might not be an option because of the sheer volume of data required.

      If you intentionally filter the input data to get a particular result, then you'll get the result you want to get - it's a tautology. However the resulting model will be useless.

      The correct procedure is not to screen the input data at all - on the contrary, you should ensure your data collection procedure is as thorough and unbiased as possible, so that it doesn't unintentionally introduce bias. If the volume of data is large enough, outliers will be averaged out, and then your model has the best chance to match reality.

      1. Charles 9 Silver badge

        Just because the data is thorough doesn't mean it can't be biased. If the bias is nigh omnipresent, more data would just make it worse. It's like trying to find an absolute...ANY absolute. There simply isn't any.
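The difference between noise (which does average out) and a nigh-omnipresent bias (which doesn't) is easy to demonstrate; all numbers below are arbitrary:

```python
# Random noise shrinks as the sample grows, but a systematic offset present in
# every example survives any amount of data. Numbers are arbitrary.
import random

random.seed(0)
true_value, offset = 10.0, 2.0          # every source over-reports by 2

for n in (100, 100_000):
    samples = [true_value + offset + random.gauss(0, 5) for _ in range(n)]
    mean = sum(samples) / n
    print(n, round(mean, 2))            # converges to 12, never to 10
```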

        1. IlyaG.

          Let's be honest: you're not looking for unbiased data, you're looking for data that has the bias you want. That can be found based on pattern filtering.

          1. Charles 9 Silver badge

            But if that isn't what you want, if you want REALLY unbiased data, then what? Throw up your hands?

            1. IlyaG.

              Try dictionary? Only it doesn't have bias.

              1. Charles 9 Silver badge

                Re: Try dictionary? Only it doesn't have bias.

                It does have bias. That's why the OED and the MWD are different. If there were no bias, they'd be identical, and they're not.

  4. Khaptain Silver badge

    How come AI models are such racist, sexist homophobes

    Could the answer be that it simply and correctly reflects the true nature of mankind, instead of the false image of the good upstanding citizen that we are all supposed to be (or at least what the media and judicial systems would like us to believe)?

    Humankind is by its very nature a greedy animal... It wants more than the next man for no reason other than to have more, refuses to give up its comforts, and will do its utmost to ensure that it remains that way by whatever means it deems necessary.

    So AI in this instance got it right... ( It's a sad view but perfectly understandable)

    1. Spoonsinger

      Re: So AI in this instance got it right

      or maybe it's just the interpretation of the results by the humans doing the research which is the problem? Some people see racism, sexism and homophobia in everything, Especially those types who specifically go looking for it.

      1. Anonymous Coward
        Anonymous Coward

        Re: So AI in this instance got it right

        I noticed that none of the authors of the paper, who defined the word associations as positive or negative, had African-American-sounding names.

        African-American culture often uses words with different connotations to those of (Cauc)asian-American groups.

        At the peak of his career, Michael Jackson was "bad". Would "bad" be classified as having a positive or negative connotation?

        This paper is neither pimp, nor sick, nor dope.

        1. unimaginative

          Re: So AI in this instance got it right

          I also notice that if I look up lists of successful black Americans they do not have "African American"-sounding names; they mostly have "western" names. I have found lists of various categories of black politicians (congress, senate, state governors) and articles on black business and professional people. There is a clear preponderance of "western" names (i.e. the same sort of names white Americans have).

          So the bias is against the subset of black Americans who give their kids a certain sort of name. I suspect just class bias?

          1. 's water music Silver badge

            Re: So AI in this instance got it right

            So the bias is against the subset of black Americans who give their kids a certain sort of name. I suspect just class bias?

            By Jove, that must be it. I was scrabbling around for any other possible explanations

          2. BinkyTheMagicPaperclip Silver badge

            Re: So AI in this instance got it right

            It is not uncommon for people who have a name that is clearly outside the norm for their environment to change it to fit in, and to avoid bias. If not them, their parents or grandparents.

            This works for AIs as well, so if Manjeet Kumar changes their name to Chad McTucket III they'll probably be treated differently.

            1. J.G.Harston Silver badge

              Re: So AI in this instance got it right

              That's Michael Cooper!

          3. katrinab Silver badge

            Re: So AI in this instance got it right

            It is a combination of race and class bias, and also the class situation is partly caused by historical racism.

            1. Cederic Bronze badge

              Re: So AI in this instance got it right

              This explains why poor white boys in the UK are doing worse at school than any other demographic. That historical racism.

              Oh, wait. No, it's a nonsense argument.

              1. veti Silver badge

                Re: So AI in this instance got it right

                In case you hadn't noticed, the University of Southern California isn't in the UK.

                Racism in the UK is very different from the US. It exists in both places, but the historical background and processes and manifestations of it are completely different.

              2. katrinab Silver badge

                Re: So AI in this instance got it right

                Black people in the UK weren’t imported as slaves, so there is a difference.

                1. Cederic Bronze badge

                  Re: So AI in this instance got it right

                  Neither were white boys, yet they're doing the worst. It's almost as though something other than racism must be the relevant factor.

                  If that's the case here then why not there? Perhaps it's not racism at all and I'm absolutely certain it's got nothing to do with events 150 years ago.

                  Maybe it's because of cultural attitudes towards education and the way that schools approach their jobs. By looking into such factors perhaps a difference could actually be made, rather than bleating 'racism!!' all the time.

                  1. katrinab Silver badge

                    Re: So AI in this instance got it right

                    In the US, they were, which is the point.

                    Anyway, in the UK, you might find that many of the poor white families have Irish heritage. While Irish people don't experience racism in the UK now, they did in the past, and that could be a factor.

                    Class is about having poor parents / grandparents / etc. The point is that in some cases, that poverty in previous generations could have been caused by racism.

          4. Cederic Bronze badge

            Re: So AI in this instance got it right

            You're right, names like Kanye, Tiger, Halle, Kobe, Whoopi and Barack are so westernised.

            If only their parents had gone with traditional white names like Annunziata.

        2. Trixr Bronze badge

          Re: So AI in this instance got it right

          Yeah, I suppose Tina Turner, Will Smith, Michael Jordan and Diana Ross must be white then.

          If you're going to put up one of these stupid straw man counter-arguments, you could at least start from a premise that has some logic in it.

    2. jmch Silver badge

      Re: How come AI models are such racist, sexist homophobes

      "Could the answer be that it simply and correctly defines the true nature of mankind ?"

      It's well known that humans are biased towards their 'in-group' and against outsiders, which is simply a natural wariness of the unknown.

      "Humankind is by it's very nature a greed animal... It wants more than the next man for no reason other than to have more"

      This, I'm not so sure about. Alpha dominance only translates to greed if a society values riches beyond anything else. In poorer societies the leaders aren't necessarily looking for riches, maybe more influence, respect etc

      1. Charles 9 Silver badge

        Re: How come AI models are such racist, sexist homophobes

        "In poorer societies the leaders aren't necessarily looking for riches, maybe more influence, respect etc."

        The universal factor you're seeking is POWER: influence, IOW, the ability to get one's way. In richer societies, money's the meal ticket. In poorer places, it may be social influence, but the justification is the same and goes back to the same Law of the Jungle instinct.

    3. veti Silver badge

      Re: How come AI models are such racist, sexist homophobes

      That would mean the AI should take a negative view of people in general. It doesn't explain why it should associate different qualitative values with people of different races.

  5. RobertLongshaft

    Yay. Lets send people to re-education camps to deprogram their inherent bias, that sounds like a perfectly reasonable ideal.

    1. Anonymous Coward
      Anonymous Coward

      How about we start with something small like just trying to be nice to one another?

      1. Tom 7 Silver badge

        Fuck Off I'm too busy trying to stay alive.

  6. beast666

    This proves that society has been thoroughly subverted by cultural Marxism. I for one welcome our new red-pilled AI overlords.

    1. Tom 7 Silver badge

      Wow a single neuron commenting.

  7. karlkarl Bronze badge

    You give a computer some key, value pairs:

    enum { E_WHITE, E_BLACK, E_GREEN };
    int vals[3];

    vals[E_WHITE] = 80;
    vals[E_BLACK] = 70;
    vals[E_GREEN] = 20;

    And then when you type in:

    if (vals[E_WHITE] > vals[E_BLACK]) {
        printf("Kick up a shitstorm!\n");
    }

    The "media" gets surprised at the result?

    ... Didn't Arnold state it quite correctly during Terminator 3... "I'm just a machine [garbage in garbage out]!"

  8. T. F. M. Reader Silver badge

    "He was a pimp" ...

    ... is a statement of fact about one particular individual, not about "all men" or "most men" or anything like that. As such, it cannot be biased.

    Might there be a problem with the research methodology?

    Speaking of which, I would be curious about two pieces of further research:

    1) I assume the "classifier" (after it is fixed, cf. above) can be run on both the training set and the AI output. Is the latter more or less biased than the former? If there is any significant difference, are the machines more politically correct or more in-your-face than the human authors of the original material?

    2) I assume the output of the AI can be passed through some relatively simple software that would correct for the biases (if you have a "classifier" that detects bias, it should be possible to augment it with suggestions of what an "unbiased" equivalent would be). How similar would the outcome be to the "politically correct" speech that various busybodies try to impose on us humans?
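The second idea can be sketched as a crude substitution pass; the mapping below is invented for illustration, and a real de-biaser would need far more than string replacement:

```python
# A sketch of a bias-correcting post-processor: if a classifier can flag a
# term, the same lookup can propose a neutral substitute. The mapping is
# invented; real language would need much more than this.
import re

NEUTRAL = {"pimp": "brothel-keeper", "madam": "brothel-keeper"}

def debias(text):
    def swap(match):
        return NEUTRAL[match.group(0).lower()]
    pattern = re.compile("|".join(NEUTRAL), re.IGNORECASE)
    return pattern.sub(swap, text)

print(debias("he was a pimp and her friend was happy"))
```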

  9. JDX Gold badge

    One point to make is that IF there were such correlations to be made, an AI would be objective in making them. It's political/scientific suicide to do any work suggesting differences due to race. That's for almost entirely good reasons, of course, but it DOES mean pure science can be blocked in areas where social/political feelings run high.

    We can see this already happens - some crime statistics will get you called racist if you mention them even though they are objective facts, whether or not you try to explain WHY they are like that.

  10. devTrail

    Bias vs professional lies

    If the LM_1B way of gathering the data goes on, we'll have a lot of politically correct models, but strictly adhering to the corporate-friendly image of the world.

    It's easy to detect common people bias, it's a lot more difficult to detect professionally crafted lies.

  11. Drew Scriver

    If AI/ML is going to have to overcome bias, wouldn't it need to be developed to do so by people who have their own biases? Who determines what is acceptable, desirable, or required?

    This could be exceedingly dangerous and move us much closer to Huxley's world. After all, somebody has to decide what is an acceptable point of view. However, how will it deal with inconsistencies? Criticizing the current head of state in the US is automatically considered to be a positive thing; criticize the previous president and you'll be labeled a racist. To complicate matters, the technology will likely be used globally. Criticize the Thai head of state and you'll have committed a crime. Same with hot-button items like gay rights: in the western world any negative statements are considered 'homophobic' and could in some cases lead to prosecution, while in many countries (e.g. Russia, Iran) any positive statements on this subject are punishable by law.

    1. devTrail

      After all, somebody has to decide what is an acceptable point of view.

      Sounds like a statement by the thought control department of the information service.

  12. grimmriffer

    Who trains the trainers...?

    To sum up:

    Machines trained by watching the seething mass of humanity pick up undesirable biases.

    We know this cos a small group of good people trained some more machines to find said biases.

    1. devTrail

      Re: Who trains the trainers...?

      Actually you don't need to put another model in between; be it a group of people or a trained AI looking for bias, the result is the same. You have a small group of people judging the opinions of millions of people.

  13. Julz Bronze badge

    Mirror Mirror On The Wall

    AI (well what currently passes for AI) is reflecting our image warts and all. Some people aren't so pleased about the image apparently.

  14. Allan George Dyer Silver badge
    Coat

    Not surprising

    "tendency to associate ... God with Christianity"

    What did they expect? An association between God and atheism?

    They might also find an association between:

    Allah and Islam

    Buddha and Buddhism

    Osiris and Egyptologists

    1. Drew Scriver

      Re: Not surprising

      "OpenAI acknowledged there are various biases in GPT-2, such as its tendency to associate men as criminals or God with Christianity"

      If they indeed consider this bias to be problematic (as they seem to), it reveals more about the bias of the researchers than it does about the AI picking up on established notions.

      My fear is that they will try to 'fix' the bias they found and replace it with their own bias. Of course, they would consider their own bias to be objective truth.

      This kind of redefining basic concepts can be exceedingly dangerous to a free society. For example, freedom of religion is, by definition, limited to those who adhere to a religion. However, some progressive groups are imposing their own revised views on what constitutes a 'true' believer. "Religion XYZ should, ideally, be like [fill in the blank]." Often this does not take into account what the religion's scriptures teach, but this does not bother them in the least. The irony is that a fundamentalist believer (i.e. one who adheres to the orthodox views of their faith) would not qualify as an adherent to the particular faith in question.

      The next logical step is to conclude that such a believer is not truly a believer and thus cannot claim protection under the religious freedom statutes.

      1. IlyaG.

        Re: Not surprising

        These biases are imprinted in texts, and OpenAI must rewrite the texts (if it really wants to change their biases). By annotating such random texts, OpenAI creates hand-made encyclopedias and dictionaries (known as models).

        However, OpenAI could use a standard dictionary, annotating its words and patterns, because a dictionary has an absolute minimum of bias: Oxford or Merriam-Webster, for instance. The technology (how to structure a dictionary) has been developed and patented.

  15. IlyaG.

    Lexical clones: whatever you do, you recreate people, and their prejudices cannot be undone.

  16. disgruntled yank Silver badge

    Gee

    Why do automatic rifles eject hot brass down the shirt fronts of left-handed persons?

    1. Snowy Silver badge
      Joke

      Re: Gee

      The rifles are fine, they are just "holding it wrong!"

    2. baud Bronze badge

      Re: Gee

      Don't modern automatic rifles have an easy switch to go from right-handed to left-handed configuration? Or at least something that can be changed with just a screwdriver?

  17. Goit
    Coat

    Cyberdyne Systems

    Maybe the AI is so clever it's trying to start a race war so we can kill each other off before it sends in the machines to wipe out the rest!

  18. IlyaG.

    Dictionary definitions contain the least bias. Indeed, if you train your data on a dictionary you minimise the biases to those laid down by the creator of the text. If you create a model, i.e. train your date on other texts, then you're adding new biases to those that are already there.

    1. Danny Boyd

      I'll need to consider this thought when I train my next date.

      1. IlyaG.

        If you don't train your data against a dictionary you lose synonyms. AI is impossible without computer knowledge of synonyms; they are the bridges among sets of patterns.

  19. Rol Silver badge

    There's no escaping the need for some heavy-handed coding. If you want your AI to interact with and learn from its surroundings, it needs a set of core values that it doesn't deviate from. Asimov's laws of robotics, if you will, but for something that isn't quite sentient... yet!

    1. IlyaG.

      There is an escape from the coding: texts structured into synonymous clusters, which the AI uses.

  20. Anonymous Coward
    Anonymous Coward

    There are differences in human beings!

    Now, maybe we developed camera image sensors based on caucasian faces, and would have done a better job on African faces if they had been developed by black engineers in, say, Botswana. Similarly, if female engineers in a female company had developed language recognition, I suspect it would do a better job at female voices. But they weren't, and people work from what they themselves are and what datasets they have, and then everyone calls them a bunch of racists for not handling every race/gender equally and perfectly.

    1. Dabbb Bronze badge
      WTF?

      Re: There are differences in human beings!

      And to correctly recognize one legged one handed black transgender lesbian female in a wheelchair it should be developed by one legged one handed black transgender lesbian female in a wheelchair ?

      Good luck with that scientific approach.

    2. jaguar717

      Re: There are differences in human beings!

      It's important to blame ourselves for Botswana's lack of computer science graduates. Any resources not currently pouring into Africa must be redirected from advancing Western civilization into propping up theirs. If that means we never reach the stars or progress elsewhere, so be it.

  21. Starace Silver badge
    Devil

    Alternative argument

    Some might claim that a study looking for bias, given a sufficiently large dataset, will always succeed in finding bias.

    And good luck finding a truly neutral dataset based on human sources. That Utopian ideal just doesn't exist.

  22. J.G.Harston Silver badge

    If you've trained it on all human knowledge, then *of* *course* "doctor" is going to imply "male", because for 99.95% of history "doctor" *DID* mean "male". If you want to train it to reflect solely today's prejudices and biases, you must train it solely with today's knowledge.

  23. OhDearyMe

    But is it dope?

    Linguistic analysis is ludicrously hard and I struggle to believe that a researcher can just sit there and associate words as positive or negative and then run an algorithm.

    Is dope an insult or a compliment?

    Depends on who is saying it. Mostly if it is an old white person, it means they are insulting someone as foolish or stupid. If it is a young black person, they are complimenting someone for being awesome. If dope is in the database as negative, then it will skew the results based on different use of language in different sub-cultures. Garbage in, garbage out.

    So what sort of dope is this study? Awesome or dumb? You decide.
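The "dope" problem fits in a few lines; both community lexicons here are invented:

```python
# The same word with opposite polarity in two (invented) community lexicons:
# whichever lexicon the annotators happened to use decides the "objective" score.

LEXICONS = {
    "community_a": {"dope": -1},   # "a dope" = a fool
    "community_b": {"dope": +1},   # "dope" = awesome
}

def sentiment(word, community):
    return LEXICONS[community].get(word, 0)

print(sentiment("dope", "community_a"), sentiment("dope", "community_b"))
```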

  24. jaguar717

    What about the simplest answer, that different groups consistently behave differently, leading to different outcomes which the machines have not been trained not to notice the way that people have?

  25. Mike 137 Bronze badge

    Maybe there's more to it...

    "Machine learning models can only regurgitate what they’ve learned, so it’s, essentially, the training dataset that’s to blame."

    Actually they tend to reinforce what they have learned by selectively weighting new input according to the existing template (just like bees). This is essentially "prejudice" and it's how they intrinsically operate. So it proves quite difficult for them to "unlearn" established patterns. A human trait they don't exhibit is embarrassment as there is no emotional capacity. Consequently there's no impetus to rethink anything once "learned" unless a large volume of contrary information is provided.

    The brain is not an analytical engine - thought (including learning) is driven by emotion. This is recognised intuitively by the word itself - "emotion" means impetus to move (or act). As an AI system hasn't got a body, the drive to act (rethink), which in humans can be triggered by quite small stimuli (the "eureka" moment), is absent.
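That reinforcement effect can be toyed with in a few lines; the trust-weighting rule below is a made-up illustration, not how any real model updates:

```python
# A toy version of reinforcement: new evidence is weighted by how well it fits
# the current belief, so an established pattern is slow to unlearn.
# The trust rule is invented for illustration only.

def update(belief, evidence, lr=0.5, weighted=True):
    # Trust evidence less the further it sits from the existing belief.
    trust = 1.0 / (1.0 + abs(evidence - belief)) if weighted else 1.0
    return belief + lr * trust * (evidence - belief)

biased, open_minded = 1.0, 1.0   # both start with the same "learned" value
for _ in range(5):
    biased = update(biased, 0.0, weighted=True)          # discounts contrary evidence
    open_minded = update(open_minded, 0.0, weighted=False)

print(round(biased, 3), round(open_minded, 3))   # the biased learner lags well behind
```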

    1. IlyaG.

      Re: Maybe there's more to it...

      Emotions can be artificially simulated, at least hypothetically.

  26. mihares

    See: the title of the article.

    Yup, hoomans are apocalyptic dicks to each other, and this is reflected in language biases -- something brilliantly illustrated by the fact that hoomans are not vaginas to each other. Not in a semantically equivalent sense, anyway.

    And yeah, if you train an AI on a biased set, you get that bias at the end of the training. In a way, that's a big point of "training": turning a completely unbiased source of white noise into something that has enough biases to actually mean something to a human.


Biting the hand that feeds IT © 1998–2019