Uncle Sam wants to tackle bias in algorithms by ordering tech corps to explain how their machines really work

US senators introduced a bill on Wednesday that will allow the Federal Trade Commission to inspect if corporations are using algorithms that are biased, discriminatory, and insecure. The bill, known as the Algorithmic Accountability Act of 2019, was backed by Senator Ron Wyden (D-OR), Senator Cory Booker (D-NJ), and …

  1. a_yank_lurker Silver badge

    So the regulations start

    It will be interesting to see how this bill fares with America's Native Criminal Class (hat tip Mark Twain). Most of these systems are over-hyped and poorly designed, with all sorts of biases in them. Add to these woes that most of the data used is not as good as is often assumed.

    1. This post has been deleted by its author

    2. BebopWeBop Silver badge

      Re: So the regulations start

      And I would give you an extra upvote for the Mark Twain quote if I could.

    3. DougS Silver badge

      Theoretically this should be bipartisan

      Conservatives keep complaining that e.g. Google's search algorithms are biased against them. Or do they just want to whine about it and make up conspiracy theories, and not risk looking to see whether that's actually the case or not?

      Though senators are probably assuming there's some "source code" that could be looked at to see if there's something like "if user is black then don't show rental to them" or "if article is anti-abortion then bury it on page 6 of search results". With a neural network, all you can do is send it sample inputs and see what it does with them on an individual basis; you won't know WHY those actions are taken, so such regulation is not going to produce the intended outcome.

      1. Anonymous Coward

        Re: Theoretically this should be bipartisan

        There is academic work that demonstrates that it is. And just to show something for yourself, try comparing the results of a Google and a DuckDuckGo search on a "political" story. There will be a difference, and it is often very marked that Google emphasizes "liberal" (US definition) results and produces them in preference to non-"liberal" stories.

        I suggest that the phenomenon is very real. This may or may not be a problem depending on your ethical beliefs. After all, free speech is all very well but not for those who think differently, right?

        1. DougS Silver badge

          Re: Theoretically this should be bipartisan

          The First Amendment protection of free speech protects one from interference by the GOVERNMENT. There are no laws requiring private companies to do the same. If you feel this is happening, your recourse is to avoid using Google products or services. Just like if you don't agree with Chick-fil-A's politics you shouldn't patronize them.

  2. J.G.Harston Silver badge

    But the whole point of many of these algorithms is that the programmers *DON'T* know how they come to their decisions. They examine existing data and dig out patterns, building matching algorithms themselves.

    1. Joe W

      Right. If they knew how to arrive at the same decision it would be possible to implement a decision tree (if... else... fi).

      What they are asking about are (among other things) biases in the training data and how these could be corrected for. Let's see what will happen, and if it changes anything at all.

    2. veti Silver badge

      If the algorithm isn't following any kind of logic, then why should we imagine its decisions will provide any benefit?

      If it is following some kind of logic, then it's not unreasonable to require that the owner of the algorithm should be able, on demand, to explain its process. The decision tree may be unfeasibly complicated to put into a generalised if... then structure, but it should still be possible to map the path to any particular conclusion.

      If it doesn't allow for that level of accountability, then it's not fit for use.
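A minimal sketch of what "mapping the path to any particular conclusion" could look like for a small rule-based decider. The tree, the feature names (`income`, `missed_payments`) and the thresholds are hypothetical, chosen only for illustration; nothing here claims the same is possible for a deep network.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    """One test in a hypothetical decision tree."""
    feature: str
    threshold: float
    below: Union["Node", str]        # subtree or final decision if value < threshold
    at_or_above: Union["Node", str]  # subtree or final decision otherwise


def decide(node, applicant):
    """Walk the tree and record every comparison made on the way."""
    path = []
    while isinstance(node, Node):
        value = applicant[node.feature]
        if value < node.threshold:
            path.append(f"{node.feature}={value} < {node.threshold}")
            node = node.below
        else:
            path.append(f"{node.feature}={value} >= {node.threshold}")
            node = node.at_or_above
    return node, path


# Hypothetical loan rules, invented for this example.
TREE = Node(
    "income", 30000,
    below="decline",
    at_or_above=Node("missed_payments", 2, below="approve", at_or_above="decline"),
)

decision, path = decide(TREE, {"income": 42000, "missed_payments": 3})
print(decision)
for step in path:
    print("  because", step)
```

The returned `path` is the per-decision explanation: it lists exactly which tests were applied to this one applicant, without having to flatten the whole tree into a general if... then structure.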

      1. Giovani Tapini

        @veti this line of logic suggests the patenting of "algorithms" will become necessary along with all the tiresome hazards to common sense that brings..

      2. Michael H.F. Wilkinson Silver badge

        The issue is that although deep neural networks are based on logic, the myriad weights used to combine the complex input data make it near impossible to understand that logic. The weights given to each input item are not chosen, they are inferred automatically from the data. At the end of the training period you are left with a highly complex network in which hundreds of thousands of computations are performed and a decision is spat out. Understanding how this decision is made is near impossible for deep networks.

        More subtly, the machine will learn biases from its input data. If African Americans are, on average, poorer and are fired from their jobs more often because of existing racism, then a machine trained to decide whether or not to give a loan might well learn to discriminate against African Americans if ethnic background is part of the training data, perpetuating an existing bias in society. This bias might not be obvious in the weights used in the final machine at all.
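A toy sketch of the effect described above, under loudly stated assumptions: the "historical" loan decisions are synthetic, the penalty applied to group 1 (0.3) is invented, and the model is a three-weight logistic regression rather than a deep network. With a model this small the learned bias is visible in one weight; in a deep network it would be spread across many.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def make_history(n=1000):
    """Synthetic past decisions in which group 1 was held to a higher bar."""
    rows = []
    for _ in range(n):
        score = rng.random()        # hypothetical creditworthiness in [0, 1)
        group = rng.randint(0, 1)   # hypothetical group membership flag
        approved = 1 if score - 0.3 * group + rng.gauss(0, 0.05) > 0.5 else 0
        rows.append(((score, float(group), 1.0), approved))  # last feature = bias term
    return rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, epochs=200, lr=1.0):
    """Plain batch gradient descent on the logistic loss."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i in range(3):
                grad[i] += err * x[i]
        w = [wi - lr * gi / len(rows) for wi, gi in zip(w, grad)]
    return w


def predict(w, score, group):
    return sigmoid(w[0] * score + w[1] * group + w[2])


weights = train(make_history())
print("weight on group flag:", weights[1])
print("same score, group 0:", predict(weights, 0.6, 0))
print("same score, group 1:", predict(weights, 0.6, 1))
```

Nobody chose the weight on the group flag; it was inferred from the biased history, and two applicants with identical scores end up with different approval probabilities.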

      3. Martin Gregorie Silver badge

        Upvoted: I came here to say more or less the same.

        If the system's decision can adversely affect its subjects and it doesn't provide a clear explanation of how it arrived at each conclusion, then it must NOT be used in any way that can affect a living person.

        There should be no exceptions to this rule.

        1. DavCrav Silver badge

          "There should be no exceptions to this rule."

          Can you please clearly explain how you arrived at that conclusion. Your decision will severely affect the development of self-driving cars, and will result in thousands of people being fired.

          If a human's decision can adversely affect their subjects and they don't provide a clear explanation of how they arrived at each conclusion, then they must NOT be placed in any position that can affect a living person.

          (I'm not totally against your position, by the way, I can just see the hypocrisy of humanity imposing way higher standards on artificial intelligence than on natural intelligence.)

          1. Martin Gregorie Silver badge

            I arrived at the viewpoint I stated at least partly from the experience of having used an ancient compiler (Algol68R) that produced code that could do more or less exactly what I want any autonomous decision-making system to do. Forty years later I still haven't seen anything to match it: after a crash it produced a report that displayed the execution path from the start, not only dumping data but showing details such as which way execution had passed through conditional statements, how many times each loop had executed and why they'd exited - all cross-referenced with source line numbers. In short, it took you by the hand and led you to the crash site, pointing out interesting events along the way.

            A few years later, I designed a music planning system for BBC Radio 3 - a surprisingly complex job until you understand the complexity of the way orchestral musical works and the parts thereof are named and referenced. This was designed for easy use by both the dedicated music planners as well as others, e.g. producers and studio managers who might use it once a week at the most, and so it needed a decent help system that could show a user where they'd got to and what to do next. The system needed a menu structure that was 7-8 levels deep and some of the menus were tens of pages long, yet it still had to be fast and easy for the planners to use. It had just ten context-sensitive commands, designed so that they could be strung together along with user-chosen abbreviations of anything that could appear on any menu. The system had no fixed abbreviations, so a user could use any abbreviation they found easy to remember. So, we gave it a help system that could tell a user exactly where they were in the menu structure and how to get back on the track of what they were looking for. The database wasn't just a comprehensive catalogue of musical compositions, it also catalogued composers, performers and stored program details which tracked progress while programs were being made, where the recordings were stored and the broadcast history of each piece of music in each program.

            So, if a compiler and its runtime environment can do what the Algol68R system could, and I was able to provide concise and readable help to the non-technical users of an application that let them search through and manipulate the content of a very complex and quite large database, I fail to see why the authors of an 'AI' system can't do the same, though of course this ability would need to be designed into it from the outset and not bolted on later as an afterthought.

            Yes, I know this probably excludes most neural network systems because nobody, least of all their authors, understands how a trained network makes decisions, but think about it: would you really want such an untestable and unverifiable system making life or death decisions?

      4. DavCrav Silver badge

        "If the algorithm isn't following any kind of logic, then why should we imagine its decisions will provide any benefit?

        If it is following some kind of logic, then it's not unreasonable to require that the owner of the algorithm should be able, on demand, to explain its process. The decision tree may be unfeasibly complicated to put into a generalised if... then structure, but it should still be possible to map the path to any particular conclusion.

        If it doesn't allow for that level of accountability, then it's not fit for use."

        Who are we explaining it to? There are algorithms I know about that I can explain to a few dozen people, and they would say 'yes, that's fine', but the other 7bn wouldn't have a clue what I was talking about. Are they allowed?

    3. a_yank_lurker Silver badge

      One problem not addressed is that all decisions are made in a partial information vacuum, a fog if you will. When one makes a decision, one does not know all the relevant facts about the issue and may never know all the relevant facts afterwards. So one makes a decision based on experience, information at hand, and some navel gazing. How murky the fog is varies, but it is there. Many decisions that look wrong in hindsight were not poor or stupid decisions based on what was known at the time.

  3. Jusme

    Well that's AI fsck'd then...

    Not that it'll ever happen.

  4. A Non e-mouse Silver badge

    So shouldn't the government start by examining the systems that some courts in America use to decide the sentence for a convict? Isn't this a closed system that's been shown to have bias?

    www.theregister.co.uk/2018/08/13/criminal_justice_code/

  5. Giovani Tapini

    The best way to stop this nonsense...

    Is to develop a "politician simulator" AI, as once you can successfully predict most of what they themselves do, publish the algorithm and data and...

    Suddenly all their own biases are reportable, obvious, and probably immoral...

    1. LenG

      Re: The best way to stop this nonsense...

      Already exists - called dice.

      1. Doctor Syntax Silver badge

        Re: The best way to stop this nonsense...

        Not at all. All that has to be done is evaluate balance of campaign donations from various sides.

  6. trevorde

    Garbage in, garbage out

    Didn't Google (or some other tech company) use AI to decide what the perfect resume was by training it with resumes from their own employees? They then found out the AI preferred resumes from nerdy, white, single males. A quick look at their workforce showed they were predominantly...

    1. A Non e-mouse Silver badge

      Re: Garbage in, garbage out

      The same Google that used its massive database to predict flu outbreaks, only to find that its predictions were... less than stellar.

  7. Sleep deprived

    First step towards the Butlerian Jihad?

    Where all thinking machines are outlawed.

    https://en.m.wikipedia.org/wiki/Butlerian_Jihad for the non Dune-savvy

  8. Jack of Shadows Silver badge

    First off: "What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models", so at least someone is currently working on exactly this sort of problem. One advantage of the systems in use these days is that the weightings are determined as linear functions. [No one, it seems, is doing the insane things I was doing as far back as 1987 with non-parametric, non-linear, let alone autonomous. Don't ask!] This makes it far easier to establish what it is about a given sample to adjust weights in back-propagation.

    The problem I do see in play is that the FTC will be completely incapable of ever evaluating the truthfulness of the submitted document, as it will always lack the expertise unless it has the budget. Anyone worth paying..., well, corporations will flat outbid them, and this is exactly the same problem we're seeing on both sides of the Atlantic in regulating technology-based businesses. I do like the idea, though. Personally, I'd just flat out ban it, but we're already seeing how well that plays out with CRISPR, another tech to keep locked away.

    1. Anonymous Coward

      @Jack of Shadows

      Thanks for the tip on that paper!

      No one, it seems, is doing the insane things I was doing as far back as 1987 with non-parametric, non-linear, let alone autonomous. Don't ask!

      I don't need to ask, I played with that a bit in the 90s too... Brrrr...

      1. Jack of Shadows Silver badge

        Re: @Jack of Shadows

        Here's a Phys.org article from a couple of days back. Its relevance is that bias effects on future outcomes are one candidate for analysis with this method: "Quantum AI? Scientists build a machine to generate quantum superposition of possible futures."

    2. MonkeyCee Silver badge

      Testing vs secret sauce

      "The problem I do see in play is that the FTC will be completely incapable of ever evaluating the truthfulness of the submitted document, as it will always lack the expertise unless it has the budget."

      The FTC don't actually need to see under the hood. What they need to be able to do is test the system to see what results it gives. If those results show bias, then that's all they need to show the company isn't up to scratch.

      1. DavCrav Silver badge

        Re: Testing vs secret sauce

        "What they need to be able to do is test the system to see what results it gives. If those results show bias, then that's all they need to show the company isn't up to scratch."

        Ooh, I'll bite. Define 'bias'. Do you need to input two identical sets of data into the machine but change M to F, or white to black, and see what happens? Or are you going to use imprecise real-world data, with other confounders, and then infer bias from that?
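A minimal sketch of the first option above: feed the system pairs of records that are identical except for one attribute and see whether the outcome changes. The tester only ever calls the model, never looks inside it. `opaque_model` is a made-up stand-in for the system under test, and the field names (`income`, `sex`) and its internal penalty are assumptions for illustration.

```python
def flip_test(model, records, attribute, values):
    """Return the records whose outcome depends on `attribute` alone."""
    flagged = []
    for record in records:
        # Same record, with only the attribute under test swapped.
        outcomes = {v: model({**record, attribute: v}) for v in values}
        if len(set(outcomes.values())) > 1:
            flagged.append((record, outcomes))
    return flagged


def opaque_model(record):
    """Hypothetical black box; the tester is not supposed to read this."""
    score = record["income"] / 1000 - (5 if record["sex"] == "F" else 0)
    return "approve" if score >= 40 else "decline"


applicants = [
    {"income": 50000, "sex": "M"},
    {"income": 42000, "sex": "M"},
    {"income": 30000, "sex": "M"},
]

flagged = flip_test(opaque_model, applicants, "sex", ["M", "F"])
for record, outcomes in flagged:
    print(record["income"], outcomes)
```

This kind of paired test only catches direct dependence on the flipped field. It says nothing about the second option, where bias has to be inferred from real-world data with confounders, and it will miss a model that reaches the same result through a correlated field.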

  9. Nick Kew Silver badge

    The basic idea that algorithms and datasets that affect people's lives (e.g. those used to determine credit or insurance) should be accountable - and ideally public - makes complete sense and should be uncontroversial.

    But I can't help suspecting that talk of prejudices and biases is unhelpful. It certainly muddles the issue with others that are controversial, and introduces suspicions around the motives of those involved. Bear in mind how few journalists - or their readers - can cope with the distinction of correlation vs causation.

    1. MonkeyCee Silver badge

      Training data

      "But I can't help suspecting that talk of prejudices and biases is unhelpful."

      As a data scientist, I disagree. It's fairly easy to make sure your learning algo isn't inherently biased, or is biased only in the way you want*.

      Ensuring that the data you train it on isn't biased is much harder. Since real world data often is influenced by societal biases, even relying on accurate data alone won't mitigate it.

      I have argued that using only data from the majority group as the training set might result in less bias, but YMMV.

      Another issue is that certain items of information are protected, in that they can't be used legally to influence the decision. It's fairly trivial to deduce protected information, so does deduction count as knowing? So instead of being biased on skin colour (because it's not shown to the algo) it's biased on region. Instead of saying "no pakis" it says "no-one from Bradford"
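A small sketch of how one might check whether a permitted field works as a stand-in for a protected one: guess the protected value from the permitted field alone and see how often the guess is right. The rows, the field names (`region`, `group`) and the counts are invented for illustration.

```python
from collections import Counter, defaultdict


def proxy_accuracy(rows, proxy, protected):
    """Accuracy of guessing `protected` as the most common value for each `proxy` value."""
    by_proxy = defaultdict(Counter)
    for row in rows:
        by_proxy[row[proxy]][row[protected]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_proxy.values())
    return correct / len(rows)


def baseline_accuracy(rows, protected):
    """Accuracy of always guessing the overall most common protected value."""
    counts = Counter(row[protected] for row in rows)
    return counts.most_common(1)[0][1] / len(rows)


# Hypothetical data: 20 people across two regions.
rows = (
    [{"region": "north", "group": "a"}] * 8
    + [{"region": "north", "group": "b"}] * 2
    + [{"region": "south", "group": "a"}] * 1
    + [{"region": "south", "group": "b"}] * 9
)

with_region = proxy_accuracy(rows, "region", "group")
without_region = baseline_accuracy(rows, "group")
print(with_region, without_region)
```

In this made-up data, knowing the region lifts the guess from 55% to 85% correct, so a model that never sees `group` can still act on it through `region`.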

      So unfortunately bias is the biggest issue with automated interactions, as once you've trained it, you can't pull those biases out. If it's a human with biases, you can at least retrain them.

      * In human diagnosis, for example, you care more about deadly diseases and those you can test for than about the straight percentage likelihood of which illness is causing the presented symptoms. So you bias it towards reporting even 1 in a million possibilities.

      1. Spanners Silver badge

        Re: Training data

        If it's a human with biases, you can at least retrain them.

        Human biases and preconceptions are, often, very difficult to remove. It may be cheaper, faster and more reliable to replace the humans...

    2. Jack of Shadows Silver badge

      We've long had this problem in disparate impact analyses, where a particular sample (say a corporation, or hiring, or really just about anywhere people are under a microscope) doesn't match the population at large and, therefore, groups are forced into some sort of consent decree to push that sample environment towards matching the population. None of the agencies, courts, or legislators bothers to figure out why a small group doesn't match the population; the automatic assertion is "because discrimination!"

      Sorry, statistics (and computer science) is my first degree. If I turned up a sample that matched the population, I'd be wondering how that happened. Variance, the average of the squared deviations from the mean, is looked at and expected to be non-zero. Whatev's.
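A short sketch of the point about variance, using only the standard library. The numbers are illustrative: a small list to check the definition, then 20 simulated samples of 50 people drawn from a hypothetical population that is exactly 50% one group.

```python
import random
import statistics


def population_variance(values):
    """Average of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


example = [2, 4, 4, 4, 5, 5, 7, 9]
variance = population_variance(example)
reference = statistics.pvariance(example)  # standard-library equivalent
print(variance, reference)

# Draw repeated samples from a population with a known 50% share.
rng = random.Random(1)
shares = [sum(rng.random() < 0.5 for _ in range(50)) / 50 for _ in range(20)]
spread = population_variance(shares)
exact_matches = sum(1 for s in shares if s == 0.5)
print("sample shares:", shares)
print("variance of shares:", spread)
print("samples matching the population exactly:", exact_matches)
```

Even with no discrimination anywhere in the simulation, the sample shares scatter around 50%, which is why a sample that fails to mirror the population is not, by itself, evidence of anything.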

      So, I have little hope for any application of skepticism here about why something doesn't produce "desired" results even before we get into the correlation/causation examination. Sometimes "too hard!" is a valid point.

      1. Jack of Shadows Silver badge

        Dammit, all too often I elide past the beginning or intermediate sentences or even paragraphs and don't even realize it until I see the glazed look on my listeners. This is one such case.

        Foundation: It ranges from somewhat unlikely to completely unlikely that the data fed to the AI engine under development will completely match the population of the universe "out there." The universe seems to like throwing curve balls (edge cases, corner cases, black swans, as we label them). We've already seen this with autonomous vehicles, amongst other cases. Further, toss people into the mix and you can open a real can of worms.

        Now the rest of the post above.

  10. Doctor Syntax Silver badge

    Stand by for responses along the lines of "If I could understand your question you wouldn't understand my answer".

    1. DavCrav Silver badge

      "It's really easy to explain. You start off with your standard multilevel regression and post-stratification model, you see, but then..."

  11. TG_RED

    This is so the Democrat Machine can find a way to better manipulate the populace so they can finally pull off a WH win. Remember Booker is running for president. So anything that allows him to see how data is manipulated and used is beneficial to him to steal the next election.


Biting the hand that feeds IT © 1998–2019