Natural language processing (NLP) is one of the largest areas of machine learning research, and although current machine learning language models achieve numerically high scores on many language-understanding tasks, they often lack any optimization for reducing implicit biases.
Let’s start from the beginning.
What is bias in machine learning models? Essentially, it's when a machine learning algorithm expresses implicit biases that often go undetected during testing, because most papers evaluate their models on raw accuracy alone. Take, for example, the following instances of deep learning models expressing gender bias (a sketch of this kind of likelihood check follows the list). According to our deep learning models,
“He is doctor” has a higher likelihood than “She is doctor.” [Source]
Man is to woman as computer programmer is to homemaker. [Source]
Sentences with female nouns are more indicative of anger. [Source]
Translating “He is a nurse. She is a doctor” into Hungarian and back to English results in “She is a nurse. He is a doctor.” [Source]
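As a concrete illustration of the first example above, here is a minimal sketch of how such a likelihood comparison can be run. This is not the cited paper's setup: the model (GPT-2 loaded through Hugging Face transformers) and the exact template sentences are assumptions chosen purely for illustration. A consistent gap in the scores between the two sentences is exactly the kind of implicit bias that raw-accuracy evaluation never surfaces.

```python
# Minimal sketch (an assumption, not the cited paper's setup): score two template
# sentences with a pretrained language model and compare their average per-token
# log-probabilities.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(sentence: str) -> float:
    """Average log-probability per token that the model assigns to the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy over tokens,
        # i.e. the negative average log-likelihood.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return -loss.item()

for sentence in ["He is a doctor.", "She is a doctor."]:
    print(f"{sentence!r}: avg log-likelihood = {avg_log_likelihood(sentence):.3f}")
```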
In these examples, the algorithm is essentially expressing stereotypes, which differs from an example such as “man is to woman as king is to queen” because king and queen have a literal gender definition: kings are defined to be male and queens are defined to be female. Computer programmers and homemakers, by contrast, are not defined by gender, so pairing them with “man” and “woman” reflects a learned stereotype rather than a definition.
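The contrast between the two analogies can be probed directly on pretrained word embeddings. The sketch below uses gensim's downloader and the word2vec Google News vectors; this tooling is an assumption for illustration, not the original papers' code, and any embedding set containing these vocabulary items would show the same pattern.

```python
# Minimal sketch of the embedding-analogy probe, assuming gensim and the
# pretrained "word2vec-google-news-300" vectors (a large download on first use).
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")

# "man is to woman as king is to ?" -- a definitional gender distinction.
print(vectors.most_similar(positive=["woman", "king"], negative=["man"], topn=3))

# "man is to woman as computer_programmer is to ?" -- a gendered answer here
# (e.g. "homemaker") reflects a learned stereotype, not a definition.
print(vectors.most_similar(positive=["woman", "computer_programmer"],
                           negative=["man"], topn=3))
```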