NLP vs. Computer Vision
Table of Contents
- Introduction
- Data Science
- Natural Language Processing
- Computer Vision
- Summary
Introduction
When applying for a Data Scientist position, you may notice that the required skills vary from one job description to the next. Scroll down and you may find that even the required education differs between postings. Most importantly, the overview that summarizes the role can vary considerably, even when the job title is the same. This variation reflects the different types of Data Science positions available. I have also noticed that, as companies better understand their own Data Science specializations, these roles are taking on new names. Two popular branches of Data Science are Natural Language Processing (NLP) and Computer Vision. Depending on the company you eventually work for, or currently work for, some positions will still be titled Data Science but focus on NLP or Computer Vision, while others will cover Data Science as a whole. I will highlight both NLP and Computer Vision so that you can learn more about what each role involves, the salary you can expect for each, and which one may ultimately be the better specialization for you.
Data Science
Data Science is an extremely broad term whose meaning is often debated, especially within the technology industry. Practicing Data Scientists may define the field by what they experienced at their first job, only to realize later that Data Science is really a blanket term for several disciplines. These disciplines include, or overlap with, Natural Language Processing, Computer Vision, Machine Learning, Statistics, Mathematics, Programming, Data Analytics, Product Management, and Business Intelligence. It is up to you and the company you work for to decide which specific path to take, or whether to be a generalist across all of these areas. One benefit of specializing in NLP or Computer Vision is that you know what you are getting into and can focus on learning and improving the specific skills that each position requires.
Natural Language Processing
A Data Scientist who specializes in NLP is sometimes also called an NLP Engineer. This specialization focuses on the natural language of humans and on how computers can digest this unstructured input and output structured, useful meaning. While there are countless definitions and examples of this type of Data Science, I want to share my own professional experience with NLP. I have worked primarily on three types of NLP projects:
- Sentiment Analysis
- Topic Modeling
- Text Categorization
These projects share core concepts, and those concepts apply to other forms of NLP as well. They also rely on similar tools and code to produce useful outputs. Most of my NLP work has been in the Python programming language.
Sentiment Analysis: this form of NLP focuses on the mood (sentiment), polarity, and subjectivity of a given text. A typical sentiment analysis workflow is to gather your data, preprocess it, and then tokenize it. At this point, each word you are analyzing has been cleaned and stripped so that it can be tagged. The next step is commonly called POS, or Part-of-Speech, tagging. Once you know which types of words you have, such as adjectives, nouns, and verbs, you can apply a library function that assigns a polarity score to each text. Two popular NLP libraries for sentiment are TextBlob and vaderSentiment. I will not go too in-depth here, but if you would like an article on the specifics of NLP and these two popular libraries, I would be happy to write one (please comment below). Most businesses can put sentiment analysis to use. Here are some examples of where it can be applied:
- customer reviews
- customer segmentation
- anomaly detection
- product improvement
Here is a summary of the sentiment analysis process:
gather data → preprocess → tokenize → POS tag → scoring
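To make these five steps concrete, here is a minimal, standard-library-only sketch of the pipeline. It is a toy, not how TextBlob or vaderSentiment work internally: the two lexicons (`POLARITY`, `POS_LOOKUP`) and every function name are hypothetical, the POS "tagger" is a plain dictionary lookup, and the score is simply the average polarity of the adjectives found.

```python
import re

# Hypothetical toy lexicons for illustration only. TextBlob and vaderSentiment
# ship far larger curated lexicons and use more elaborate scoring rules.
POLARITY = {"great": 1.0, "good": 0.5, "slow": -0.5, "terrible": -1.0}
POS_LOOKUP = {
    "great": "ADJ", "good": "ADJ", "slow": "ADJ", "terrible": "ADJ",
    "food": "NOUN", "service": "NOUN", "was": "VERB", "is": "VERB",
}


def preprocess(text: str) -> str:
    """Lowercase the text and replace anything that is not a letter with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text: str) -> list[str]:
    """Split on whitespace."""
    return text.split()


def pos_tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Toy dictionary tagger: unknown words get the placeholder tag 'X'."""
    return [(tok, POS_LOOKUP.get(tok, "X")) for tok in tokens]


def score(tagged: list[tuple[str, str]]) -> float:
    """Average polarity of the tagged adjectives, in [-1, 1]; 0.0 if none are found."""
    values = [POLARITY[w] for w, tag in tagged if tag == "ADJ" and w in POLARITY]
    return sum(values) / len(values) if values else 0.0


def sentiment(text: str) -> float:
    return score(pos_tag(tokenize(preprocess(text))))


# "Gather data": here, two hard-coded customer reviews.
reviews = ["The food was great!", "Service is slow, food is terrible."]
for review in reviews:
    print(f"{sentiment(review):+.2f}  {review}")  # prints +1.00, then -0.75
```

In practice you would swap the toy tagger for a real one (for example `nltk.pos_tag`) and the scoring step for a library call such as TextBlob's `TextBlob(text).sentiment.polarity` or vaderSentiment's `SentimentIntensityAnalyzer().polarity_scores(text)`.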
Topic Modeling: this form of NLP belongs to unsupervised learning and helps you find the topics of documents composed of text. One of the most popular ways to find topics in documents is LDA, or Latent Dirichlet Allocation. LDA outputs a set of topics, each described by the words most strongly associated with it, which together summarize the popular and important themes in your text. Here are some examples of where topic modeling can be applied:
- coming up with new topics from the text
- using those topics to assign new supervised learning labels
- surfacing insights that are too difficult to find through manual searching
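For a feel of what LDA does under the hood, here is a compact sketch that fits it with collapsed Gibbs sampling, one standard inference method for LDA. Libraries such as gensim (`LdaModel`) and scikit-learn (`LatentDirichletAllocation`) use variational inference instead, so treat this as an illustration rather than a replacement. The function name, the hyperparameter defaults (`alpha`, `beta`, `iterations`), and the toy corpus are assumptions chosen for the example.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0, top_n=3):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns, for each topic, up to `top_n` of its most frequently assigned words.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (document, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts...
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...resample its topic from p(z = t | all other assignments)...
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # ...and put it back under the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return [
        [w for w, c in topic_word[k].most_common(top_n) if c > 0]
        for k in range(num_topics)
    ]


docs = [
    "cat dog pet dog cat".split(),
    "dog pet cat pet".split(),
    "stock market trade stock".split(),
    "market trade stock trade".split(),
]
topics = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(topics):
    print(f"topic {k}: {words}")
```

On a corpus this tiny the exact split can depend on the random seed; with real data you would preprocess the text first and use many more documents.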
Text Categorization: this form of NLP is a supervised learning technique that classifies new instances of data, and those instances do not need to contain only text; they can contain numeric values as well. Broader than the two forms above, text categorization can be thought of as a typical classification algorithm in which the label is a text category and some of the features are text as well. You will use the same techniques described above to preprocess, clean, and extract meaning from the text. Here are some examples of where text categorization can be applied:
- categorizing animal species
- categorizing fake news
- categorizing bank transactions
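Since the classifier is not named here, the sketch below swaps in multinomial naive Bayes, a common baseline for text categorization, written with the standard library only. It uses the bank-transaction example and shows one hypothetical way to mix in a numeric column: the transaction amount is binned into a token (`amt:small`, `amt:medium`, `amt:large`) that sits alongside the words. The bins, labels, training rows, and the `features` and `NaiveBayes` names are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def features(description: str, amount: float) -> list[str]:
    """Words from the description plus one hypothetical amount-bucket token."""
    words = re.findall(r"[a-z]+", description.lower())
    bucket = "amt:small" if amount < 20 else "amt:medium" if amount < 200 else "amt:large"
    return words + [bucket]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(rows, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens in rows for t in tokens}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of smoothed log P(token | label)
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    ("STARBUCKS COFFEE #123", 5.40, "food"),
    ("CHIPOTLE ONLINE ORDER", 12.75, "food"),
    ("SHELL OIL GAS STATION", 45.00, "transport"),
    ("UBER TRIP", 18.20, "transport"),
    ("APARTMENT RENT PAYMENT", 1500.00, "housing"),
    ("RENT PORTAL PAYMENT", 1450.00, "housing"),
]
model = NaiveBayes().fit(
    [features(desc, amt) for desc, amt, _ in train],
    [label for _, _, label in train],
)
print(model.predict(features("STARBUCKS STORE", 4.10)))  # food
print(model.predict(features("MONTHLY RENT", 1400.00)))  # housing
```

For real workloads, scikit-learn's `MultinomialNB`, usually paired with a `CountVectorizer` or `TfidfVectorizer`, implements the same model.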
One of the most popular Python packages for NLP is nltk [2], which stands for Natural Language Toolkit. It contains modules for tasks such as tokenization, part-of-speech tagging, and stemming, which are essential when you solve problems with NLP techniques.
How much does an NLP Engineer make?
According to Glassdoor [3], the average salary of an NLP Engineer in the United States is $114,121 / yr.
Computer Vision
I believe this field of Data Science is even more specialized than NLP. Computer Vision focuses on image and video data rather than numeric or text data. To me, Computer Vision carries a bigger risk because it is used in more industries that depend less on insights and more on having security and safety measures in place. Consider how NLP and sentiment analysis gauge how happy someone's review is: that insight is useful and powerful, but an error there is not as impactful or harmful as an error in Computer Vision can be. I will highlight some types of Computer Vision below.
Facial Recognition: when you pick up your phone, it most likely has a security feature that analyzes your face to confirm that it is really you trying to access it. A popular Python library for facial recognition projects is aptly named face_recognition. Each face in the images you work with is encoded as a set of features. Based on these facial features, you can match (or fail to match) an individual face against the same or different faces in order to ultimately "recognize" it.
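The matching step described above can be sketched as a nearest-neighbor comparison of encodings. In the face_recognition library, `face_recognition.face_encodings()` turns each face in an image into 128 numbers, and `face_recognition.compare_faces()` reports a match when the Euclidean distance is at or below a tolerance of 0.6 by default. The sketch below reimplements only that comparison with the standard library; the 4-number encodings and the names `KNOWN`, `compare_faces`, and `identify` are hypothetical stand-ins, so no images or face model are involved.

```python
import math

# Default tolerance of face_recognition.compare_faces; lower is stricter.
TOLERANCE = 0.6

# Hypothetical 4-number "encodings"; face_recognition produces 128-number
# encodings from real images via face_recognition.face_encodings().
KNOWN = {
    "alice": [0.10, 0.20, 0.30, 0.40],
    "bob": [0.90, 0.80, 0.10, 0.00],
}


def compare_faces(known_encodings, candidate, tolerance=TOLERANCE):
    """True for each known encoding within `tolerance` (Euclidean distance) of the candidate."""
    return [math.dist(enc, candidate) <= tolerance for enc in known_encodings]


def identify(known, candidate, tolerance=TOLERANCE):
    """Name of the closest known face, or None if even the closest is too far away."""
    name, distance = min(
        ((n, math.dist(enc, candidate)) for n, enc in known.items()),
        key=lambda pair: pair[1],
    )
    return name if distance <= tolerance else None


candidate = [0.12, 0.25, 0.28, 0.41]
print(compare_faces(list(KNOWN.values()), candidate))  # [True, False]
print(identify(KNOWN, candidate))                      # alice
print(identify(KNOWN, [0.50, 0.50, 0.90, 0.90]))       # None
```

Lowering the tolerance trades more missed matches for fewer false matches, which matters for security features like phone unlocking.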
Object Detection: using visual information about objects, this form of Computer Vision locates and identifies objects within an image. OpenCV is a popular tool among programmers and Data Scientists who want to focus on object detection.
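No specific detection method is named here, so as one simple, illustrative instance, the sketch below implements template matching: slide a small pattern over an image and keep the position with the smallest sum of squared differences. OpenCV offers this as `cv2.matchTemplate` with the `cv2.TM_SQDIFF` method, where the best match is the minimum that `cv2.minMaxLoc` returns. The pure-Python `match_template` function and its toy 5x5 "image" are assumptions for illustration.

```python
def match_template(image, template):
    """Slide `template` over `image` (both 2-D lists of grayscale values) and return
    ((row, col), score) for the top-left corner with the smallest sum of squared
    differences; a score of 0 means an exact match."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_score, best_pos = float("inf"), None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if ssd < best_score:
                best_score, best_pos = ssd, (r, c)
    return best_pos, best_score


# A 5x5 "image" with a small diagonal pattern whose top-left corner is at row 2, col 1.
image = [[0] * 5 for _ in range(5)]
image[2][1] = 9
image[3][2] = 9
template = [[9, 0], [0, 9]]
print(match_template(image, template))  # ((2, 1), 0)
```

Template matching only finds near-exact copies at a fixed size and orientation; practical detectors, such as the cascade classifiers and deep-learning models that OpenCV can run, are built to cope with changes in scale, rotation, and lighting.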
You can expect to find examples of Computer Vision in:
- image detection
- iPhone Face ID
- Facebook photo tagging
- Tesla pedestrian and car detection
How much does a Computer Vision Engineer make?
According to Glassdoor [4], the average salary of a Computer Vision Engineer in the United States is $99,619 / yr.
While both of these salaries are high, I have personally seen job postings that offer both Computer Vision Engineers and NLP Engineers more than these reported averages. I believe this is because both roles in Data Science are becoming increasingly specialized, and that specialization is why you can expect a higher salary.
Summary
Most Data Scientists have probably studied some form of NLP or Computer Vision, whether at a university or through an online tutorial. Both of these specialized roles in Data Science are highly respected and can benefit countless industries. The answer to "Would you rather be an NLP Engineer or a Computer Vision Engineer?" ultimately depends on your preferences and career goals. Think about which types of projects you would like to work on, which industry you would like to work in, and which company you would like to be associated with. Both positions can produce high-impact results, so either one can give you a motivating experience.
I hope that you found this article interesting and useful. Feel free to share your experience as a general Data Scientist, NLP Engineer, or Computer Vision Engineer in the comments below.
Thank you for reading!