当前位置:   article > 正文

Python-处理非结构化数据

python 表头剔除 非结构

Python-处理非结构化数据 (Python - Processing Unstructured Data)

The data that is already present in a row and column format or which can be easily converted to rows and columns so that later it can fit nicely into a database is known as structured data. Examples are CSV, TXT, XLS files etc. These files have a delimiter and either fixed or variable width where the missing values are represented as blanks in between the delimiters. But sometimes we get data where the lines are not fixed width, or they are just HTML, image or pdf files. Such data is known as unstructured data. While the HTML file can be handled by processing the HTML tags, a feed from twitter or a plain text document from a news feed can without having a delimiter does not have tags to handle. In such scenario we use different in-built functions from various python libraries to process the file.

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/2023面试高手/article/detail/233245
推荐阅读
相关标签
  

闽ICP备14008679号