- # 16. The pandas.DataFrame.to_json function
- DataFrame.to_json(path_or_buf=None, *, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=None, indent=None, storage_options=None, mode='w')
- Convert the object to a JSON string.
- Note NaN’s and None will be converted to null and datetime objects will be converted to UNIX timestamps.
- Parameters:
- path_or_buf : str, path object, file-like object, or None, default None
- String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string.
- orient : str
- Indication of expected JSON string format.
- Series:
- default is ‘index’
- allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}.
- DataFrame:
- default is ‘columns’
- allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’}.
- The format of the JSON string:
- ‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}
- ‘records’ : list like [{column -> value}, … , {column -> value}]
- ‘index’ : dict like {index -> {column -> value}}
- ‘columns’ : dict like {column -> {index -> value}}
- ‘values’ : just the values array
- ‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}}
- Describing the data, where data component is like orient='records'.
- date_format : {None, ‘epoch’, ‘iso’}
- Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.
- double_precision : int, default 10
- The number of decimal places to use when encoding floating point values. The possible maximal value is 15. Passing double_precision greater than 15 will raise a ValueError.
- force_ascii : bool, default True
- Force encoded string to be ASCII.
- date_unit : str, default ‘ms’ (milliseconds)
- The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.
- default_handler : callable, default None
- Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.
- lines : bool, default False
- If ‘orient’ is ‘records’, write out line-delimited JSON (one record per line). Raises a ValueError for any other ‘orient’, since the others are not list-like.
- compression : str or dict, default ‘infer’
- For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.
- New in version 1.5.0: Added support for .tar files.
- Changed in version 1.4.0: Zstandard support.
- index : bool or None, default None
- The index is only used when ‘orient’ is ‘split’, ‘index’, ‘columns’, or ‘table’. Of these, ‘index’ and ‘columns’ do not support index=False.
- indent : int, optional
- Length of whitespace used to indent each record.
- storage_options : dict, optional
- Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://” or “gcs://”) the key-value pairs are forwarded to fsspec.open. See the fsspec and urllib documentation for more details; the pandas I/O user guide has more examples of storage options.
- mode : str, default ‘w’ (writing)
- Specify the IO mode for output when supplying a path_or_buf. Accepted args are ‘w’ (writing) and ‘a’ (append) only. mode=’a’ is only supported when lines is True and orient is ‘records’.
- Returns:
- None or str
- If path_or_buf is None, returns the resulting json format as a string. Otherwise returns None.
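The orient layouts listed above are easier to compare side by side. The following is a minimal sketch, assuming a small two-row frame with string index labels (the frame and the `layouts` name are illustrative, not from the original); it serializes the same data with each orient and parses the JSON back with the standard `json` module.

```python
import json

import pandas as pd

# Illustrative frame: two columns, string index labels "x" and "y".
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# Serialize the same frame with each orient and parse it back for inspection.
layouts = {
    orient: json.loads(df.to_json(orient=orient))
    for orient in ("split", "records", "index", "columns", "values")
}

for orient, parsed in layouts.items():
    print(f"{orient:>8}: {parsed}")
```

Note that 'records' and 'values' drop the index labels entirely, so they are a reasonable choice only when the index carries no information.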
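16-2-2-1. 'split': dict like {index -> [index], columns -> [columns], data -> [values]}.
16-2-2-2. 'records': list like [{column -> value}, ... , {column -> value}].
16-2-2-3. 'index': dict like {index -> {column -> value}}, where the index labels are the keys of the JSON object.
16-2-2-4. 'columns': dict like {column -> {index -> value}}.
16-2-2-5. 'values': just the array of values.
16-2-3. date_format (optional, default None): string, the format for datetime values, either 'epoch' (epoch milliseconds) or 'iso' (ISO 8601). With None, the default is 'iso' for orient='table' and 'epoch' for all other orients.
16-2-6. date_unit (optional, default 'ms'): string, the time unit for timestamps; 's', 'ms', 'us' and 'ns' stand for seconds, milliseconds, microseconds and nanoseconds.
16-2-9. compression (optional, default 'infer'): string, dict or None, the compression to use when writing a file. 'infer' (the default) picks the compression from the file extension (for example .gz).
16-2-12. storage_options (optional, default None): dict, extra options for the storage connection, such as AWS S3 access keys.
16-2-13. mode (optional, default 'w'): string; 'w' is write mode (an existing file is overwritten) and 'a' is append mode, which is only supported when lines is True and orient is 'records'.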
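To make the date_format and date_unit notes concrete, here is a small sketch, assuming a one-row frame with a single timestamp column (the column names `when` and `value` are made up for illustration). It writes the same timestamp once as epoch seconds and once as an ISO 8601 string. Newer pandas releases may emit a deprecation warning for the epoch format.

```python
import json

import pandas as pd

# Illustrative frame: one timestamp (midnight, 1 January 2024) and one float.
df = pd.DataFrame({"when": pd.to_datetime(["2024-01-01"]), "value": [1.5]})

# Epoch output, with date_unit switching the unit from milliseconds to seconds.
epoch_s = json.loads(df.to_json(orient="records", date_format="epoch", date_unit="s"))

# ISO 8601 output; date_unit would control the sub-second precision here.
iso = json.loads(df.to_json(orient="records", date_format="iso"))

print(epoch_s[0]["when"])
print(iso[0]["when"])
```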
Converts a pandas DataFrame to JSON, either writing it to a file or returning it as a string.
- # 16. The pandas.DataFrame.to_json function
- # 16-1. Print the result directly
- import pandas as pd
- df = pd.DataFrame({
- 'A': [1, 2, 3],
- 'B': [4, 5, 6]
- })
- json_str = df.to_json(orient='records')
- print(json_str) # Output: [{"A":1,"B":4},{"A":2,"B":5},{"A":3,"B":6}]
- # 16-2. Write to a file
- import pandas as pd
- df = pd.DataFrame({
- 'A': [1, 2, 3],
- 'B': [4, 5, 6]
- })
- df.to_json('data.json', orient='records', lines=True)
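- # Creates data.json in the current working directory, containing one JSON record per line
- # 16. The pandas.DataFrame.to_json function
- # 16-1. Print the result directly
- # [{"A":1,"B":4},{"A":2,"B":5},{"A":3,"B":6}]
- # 16-2. Write to a file
- # data.json is created in the current working directory and contains the JSON data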
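The lines=True variant used in 16-2 produces line-delimited JSON, one record per line. The sketch below, which is only an illustration, returns that text as a string instead of writing a file and reads it back with `pd.read_json`. `read_json` is not covered in this section and is used here only to show that the format round-trips.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# With path_or_buf left as None the line-delimited JSON is returned as a string.
jsonl = df.to_json(orient="records", lines=True)
print(jsonl)

# Read the same text back; lines=True tells the reader to expect one record per line.
restored = pd.read_json(StringIO(jsonl), orient="records", lines=True)
print(restored)
```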
- # 17. The pandas.read_html function
- pandas.read_html(io, *, match='.+', flavor=None, header=None, index_col=None, skiprows=None, attrs=None, parse_dates=False, thousands=',', encoding=None, decimal='.', converters=None, na_values=None, keep_default_na=True, displayed_only=True, extract_links=None, dtype_backend=_NoDefault.no_default, storage_options=None)
- Read HTML tables into a list of DataFrame objects.
- Parameters:
- io : str, path object, or file-like object
- String, path object (implementing os.PathLike[str]), or file-like object implementing a string read() function. The string can represent a URL or the HTML itself. Note that lxml only accepts the http, ftp and file url protocols. If you have a URL that starts with 'https' you might try removing the 's'.
- Deprecated since version 2.1.0: Passing html literal strings is deprecated. Wrap literal string/bytes input in io.StringIO/io.BytesIO instead.
- match : str or compiled regular expression, optional
- The set of tables containing text matching this regex or string will be returned. Unless the HTML is extremely simple you will probably need to pass a non-empty string here. Defaults to ‘.+’ (match any non-empty string). The default value will return all tables contained on a page. This value is converted to a regular expression so that there is consistent behavior between Beautiful Soup and lxml.
- flavor : {“lxml”, “html5lib”, “bs4”} or list-like, optional
- The parsing engine (or list of parsing engines) to use. ‘bs4’ and ‘html5lib’ are synonymous with each other, they are both there for backwards compatibility. The default of None tries to use lxml to parse and if that fails it falls back on bs4 + html5lib.
- header : int or list-like, optional
- The row (or list of rows for a MultiIndex) to use to make the columns headers.
- index_col : int or list-like, optional
- The column (or list of columns) to use to create the index.
- skiprows : int, list-like or slice, optional
- Number of rows to skip after parsing the column integer. 0-based. If a sequence of integers or a slice is given, will skip the rows indexed by that sequence. Note that a single element sequence means ‘skip the nth row’ whereas an integer means ‘skip n rows’.
- attrs : dict, optional
- This is a dictionary of attributes that you can pass to identify the table in the HTML. These are not checked for validity before being passed to lxml or Beautiful Soup. However, these attributes must be valid HTML table attributes to work correctly. For example,
- attrs = {'id': 'table'}
- is a valid attribute dictionary because the ‘id’ attribute is valid for any HTML tag.
- attrs = {'asdf': 'table'}
- is not a valid attribute dictionary because ‘asdf’ is not a valid HTML attribute, even if it is a valid XML attribute. The valid table attributes are listed in the HTML 4.01 specification; the HTML 5 specification has the latest information on table attributes for the modern web.
- parse_dates : bool, optional
- See read_csv() for more details.
- thousands : str, optional
- Separator to use to parse thousands. Defaults to ','.
- encoding : str, optional
- The encoding used to decode the web page. Defaults to None. None preserves the previous encoding behavior, which depends on the underlying parser library (e.g., the parser library will try to use the encoding provided by the document).
- decimal : str, default ‘.’
- Character to recognize as decimal point (e.g. use ‘,’ for European data).
- converters : dict, default None
- Dict of functions for converting values in certain columns. Keys can either be integers or column labels, values are functions that take one input argument, the cell (not column) content, and return the transformed content.
- na_values : iterable, default None
- Custom NA values.
- keep_default_na : bool, default True
- If na_values are specified and keep_default_na is False the default NaN values are overridden, otherwise they’re appended to.
- displayed_only : bool, default True
- Whether elements with “display: none” should be parsed.
- extract_links : {None, “all”, “header”, “body”, “footer”}
- Table elements in the specified section(s) with <a> tags will have their href extracted.
- New in version 1.5.0.
- dtype_backend : {‘numpy_nullable’, ‘pyarrow’}, default ‘numpy_nullable’
- Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:
- "numpy_nullable": returns nullable-dtype-backed DataFrame (default).
- "pyarrow": returns pyarrow-backed nullable ArrowDtype DataFrame.
- New in version 2.0.
- storage_options : dict, optional
- Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://” or “gcs://”) the key-value pairs are forwarded to fsspec.open. See the fsspec and urllib documentation for more details; the pandas I/O user guide has more examples of storage options.
- New in version 2.1.0.
- Returns:
- dfs
- A list of DataFrames.
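The match parameter is easiest to understand with a page that holds more than one table. The sketch below is an assumption-laden illustration: `SAMPLE_HTML` and the helper `read_matching_tables` are hypothetical names, not part of pandas, and running it requires one of the parsers named above (lxml, or bs4 with html5lib) to be installed. The HTML is wrapped in StringIO because literal HTML strings are deprecated as input since pandas 2.1.0.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the first mentions "Population".
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30000</td></tr>
</table>
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.5</td></tr>
</table>
"""


def read_matching_tables(html, pattern=".+"):
    """Return the tables in `html` whose text matches `pattern`."""
    # StringIO avoids the deprecated literal-string input path.
    return pd.read_html(StringIO(html), match=pattern)
```

Calling `read_matching_tables(SAMPLE_HTML, "Population")` should return a one-element list holding the city table, while the default pattern returns both tables.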
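17-2-2. match (optional, default '.+'): string or compiled regular expression; only tables containing text that matches it are returned. The default '.+' matches any non-empty string, so every table is returned.
17-2-3. flavor (optional, default None): string or list-like, the parsing engine to use: 'lxml', or 'bs4'/'html5lib' (Beautiful Soup 4 with html5lib). With None, pandas tries lxml first and falls back to bs4 + html5lib if that fails.
17-2-15. displayed_only (optional, default True): boolean; if True, only displayed elements are parsed and elements styled with "display: none" are skipped.
Reads the tables in an HTML document and parses them into a list of pandas DataFrames, one per table.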
- # 17. The pandas.read_html function
- from io import StringIO
- import pandas as pd
- # Sample HTML content containing one simple table
- html_content = """
- <html>
- <head><title>示例表格</title></head>
- <body>
- <table border="1">
- <tr>
- <th>姓名</th>
- <th>年龄</th>
- <th>职业</th>
- </tr>
- <tr>
- <td>张三</td>
- <td>30</td>
- <td>工程师</td>
- </tr>
- <tr>
- <td>李四</td>
- <td>25</td>
- <td>设计师</td>
- </tr>
- </table>
- </body>
- </html>
- """
- # Read the tables in the HTML with pandas.read_html.
- # Literal HTML strings are deprecated as input since pandas 2.1.0, so wrap the string in StringIO.
- # Note: read_html returns a list of DataFrames, because an HTML document can contain several tables
- dfs = pd.read_html(StringIO(html_content))
- # This HTML contains a single table, so take the first DataFrame
- df = dfs[0]
- # Show the DataFrame
- print(df)
- # 17. The pandas.read_html function
- # 姓名 年龄 职业
- # 0 张三 30 工程师
- # 1 李四 25 设计师
- # 18. The pandas.DataFrame.to_html function
- DataFrame.to_html(buf=None, *, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', bold_rows=True, classes=None, escape=True, notebook=False, border=None, table_id=None, render_links=False, encoding=None)
- Render a DataFrame as an HTML table.
- Parameters:
- buf
- str, Path or StringIO-like, optional, default None
- Buffer to write to. If None, the output is returned as a string.
- columns
- array-like, optional, default None
- The subset of columns to write. Writes all columns by default.
- col_space
- str or int, list or dict of int or str, optional
- The minimum width of each column in CSS length units. An int is assumed to be px units.
- header
- bool, optional
- Whether to print column labels, default True.
- index
- bool, optional, default True
- Whether to print index (row) labels.
- na_rep
- str, optional, default ‘NaN’
- String representation of NaN to use.
- formatters
- list, tuple or dict of one-param. functions, optional
- Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.
- float_format
- one-parameter function, optional, default None
- Formatter function to apply to columns’ elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.
- sparsify
- bool, optional, default True
- Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.
- index_names
- bool, optional, default True
- Prints the names of the indexes.
- justify
- str, default None
- How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box. Valid values are
- left
- right
- center
- justify
- justify-all
- start
- end
- inherit
- match-parent
- initial
- unset.
- max_rows
- int, optional
- Maximum number of rows to display in the console.
- max_cols
- int, optional
- Maximum number of columns to display in the console.
- show_dimensions
- bool, default False
- Display DataFrame dimensions (number of rows by number of columns).
- decimal
- str, default ‘.’
- Character recognized as decimal separator, e.g. ‘,’ in Europe.
- bold_rows
- bool, default True
- Make the row labels bold in the output.
- classes
- str or list or tuple, default None
- CSS class(es) to apply to the resulting html table.
- escape
- bool, default True
- Convert the characters <, >, and & to HTML-safe sequences.
- notebook
- {True, False}, default False
- Whether the generated HTML is for IPython Notebook.
- border
- int
- A border=border attribute is included in the opening <table> tag. Default pd.options.display.html.border.
- table_id
- str, optional
- A css id is included in the opening <table> tag if specified.
- render_links
- bool, default False
- Convert URLs to HTML links.
- encoding
- str, default “utf-8”
- Set character encoding.
- Returns:
- str or None
- If buf is None, returns the result as a string. Otherwise returns None.
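Several of the formatting parameters above interact, so a combined example may help. The following is a minimal sketch, assuming a small illustrative frame (the column names `name` and `score`, the CSS class `report` and the id `scores` are made up); it shows index, na_rep, float_format, classes, table_id and the default escaping in one call.

```python
import pandas as pd

# Illustrative frame: one value contains markup, one score is missing.
df = pd.DataFrame({"name": ["<b>Ann</b>", "Bob"], "score": [1.5, float("nan")]})

html = df.to_html(
    index=False,                        # omit the row labels
    na_rep="-",                         # text shown for missing values
    float_format=lambda x: f"{x:.2f}",  # applied to non-NaN floats only
    classes="report",                   # added after the default "dataframe" class
    table_id="scores",                  # id attribute on the <table> tag
)
print(html)
```

Because escape defaults to True, the markup in the first name is rendered as text rather than as bold formatting; pass escape=False only for content you trust.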
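18-2-1. buf (optional, default None): a path, file object (such as a file handle) or StringIO-like object to write the generated HTML to. If None (the default), the HTML is returned as a string.
18-2-11. justify (optional, default None): how to justify the column labels in the HTML table, for example 'left', 'right' or 'center'. If None, the display.colheader_justify option (set through set_option) is used, which is 'right' out of the box.
18-2-18. notebook (optional, default False): whether the generated HTML is intended for a Jupyter (IPython) Notebook, in which case notebook-specific styling is applied. This is normally handled automatically, but it can be set by hand when needed.
Renders a pandas DataFrame as an HTML table, returned as a string or written to buf.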
- # 18. The pandas.DataFrame.to_html function
- # 18-1. Return a string
- import pandas as pd
- # Create a simple DataFrame
- df = pd.DataFrame({
- 'A': [1, 2, 3],
- 'B': [4, 5, 6],
- 'C': [7, 8, 9]
- })
- # Convert the DataFrame to an HTML string
- html_str = df.to_html()
- # Print the HTML string
- print(html_str)
- # 18-2. Write to a file
- import pandas as pd
- # Create a simple DataFrame
- df = pd.DataFrame({
- 'A': [1, 2, 3],
- 'B': [4, 5, 6],
- 'C': [7, 8, 9]
- })
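- # Open a file for writing
- with open('example.html', 'w', encoding='utf-8') as f:
-     # Write the DataFrame to the file
-     df.to_html(buf=f)
- # At this point the HTML table has been written to 'example.html'
- # 18. The pandas.DataFrame.to_html function
- # 18-1. Return a string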
- # <table border="1" class="dataframe">
- # <thead>
- # <tr style="text-align: right;">
- # <th></th>
- # <th>A</th>
- # <th>B</th>
- # <th>C</th>
- # </tr>
- # </thead>
- # <tbody>
- # <tr>
- # <th>0</th>
- # <td>1</td>
- # <td>4</td>
- # <td>7</td>
- # </tr>
- # <tr>
- # <th>1</th>
- # <td>2</td>
- # <td>5</td>
- # <td>8</td>
- # </tr>
- # <tr>
- # <th>2</th>
- # <td>3</td>
- # <td>6</td>
- # <td>9</td>
- # </tr>
- # </tbody>
- # </table>
- # 18-2. Write to a file
- # The HTML table has been written to 'example.html' in the current working directory
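Since buf also accepts a path, the file does not have to be opened by hand. The sketch below is one possible variant, assuming a temporary directory so that nothing is left behind (the names `tmp_dir`, `out_path` and `written` are illustrative). It passes the path together with encoding="utf-8", which matters when the table holds non-ASCII text.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"姓名": ["张三", "李四"], "年龄": [30, 25]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "example.html"
    # With a path as buf, to_html writes the file itself and returns None.
    result = df.to_html(buf=out_path, index=False, encoding="utf-8")
    # Read the file back with the same encoding to check what was written.
    written = out_path.read_text(encoding="utf-8")

print(written)
```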