Data that is already arranged in rows and columns, or that can easily be converted to rows and columns so that it fits neatly into a database, is known as structured data. Examples include CSV, TXT, and XLS files. These files use a delimiter and have either fixed or variable width, with missing values represented as blanks between the delimiters. Sometimes, however, we get data whose lines are not fixed width, or that arrives as HTML, image, or PDF files. Such data is known as unstructured data. An HTML file can be handled by processing its tags, but a Twitter feed or a plain text document from a news feed has neither delimiters nor tags to work with. In such scenarios we use built-in functions from various Python libraries to process the file.
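As a rough illustration of processing plain text that has no delimiters or tags, the sketch below uses only the Python standard library (`re` and `collections.Counter`) to count lines and words. The sample text and the `summarize_text` helper are hypothetical, made up for this example; a real feed would be read from a file or an API instead.

```python
import re
from collections import Counter

# Hypothetical sample standing in for a plain text news-feed document.
SAMPLE_TEXT = """Python is a widely used language.
It is popular for processing text.

Plain text has no delimiters and no tags."""


def summarize_text(text):
    """Return simple statistics for a block of free-form text."""
    # Keep only lines that contain something other than whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Lowercase the text and treat runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "line_count": len(lines),
        "word_count": len(words),
        "top_words": Counter(words).most_common(3),
    }


summary = summarize_text(SAMPLE_TEXT)
print(summary)
```

The regular expression here is a deliberately simple tokenizer; text with digits, hyphenated words, or non-English characters would likely need a different pattern or a dedicated text-processing library.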