- # 236. pandas.Series.explode method
- pandas.Series.explode(ignore_index=False)
- Transform each element of a list-like to a row.
-
- Parameters:
- ignore_index
- bool, default False
- If True, the resulting index will be labeled 0, 1, …, n - 1.
-
- Returns:
- Series
- Exploded lists to rows; index will be duplicated for these rows.
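236-2-1. ignore_index (optional, default False): a boolean. If False, the exploded Series keeps the original index labels, so each source label is repeated for every row it produced; if True, the original index is discarded and the result is labeled 0, 1, …, n - 1.
Expands a Series whose elements are lists, tuples, or other list-like objects so that every item gets its own row. Put simply, it turns a Series of lists into a flat Series with one list item per row.
Returns a new Series whose index is either the original one (ignore_index=False) or a freshly generated integer index (ignore_index=True). Each item of a list-like element becomes a new row; elements that are not list-like are left unchanged.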
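The effect of ignore_index, and the handling of elements that are not list-like, can be seen in a small sketch. The sample values below are made up for illustration; the empty list is included because pandas fills such rows with NaN.

```python
import pandas as pd

# Hypothetical sample data: a list, a plain string, an empty list, and a tuple.
mixed = pd.Series([[1, 2], "abc", [], (3, 4)])

# Default (ignore_index=False): each new row keeps the label of its source element.
kept = mixed.explode()

# ignore_index=True: the result is relabeled 0, 1, ..., n - 1.
relabeled = mixed.explode(ignore_index=True)

print(kept)
print(relabeled)
```

The string stays as one row because strings are treated as scalars, while the empty list produces a single NaN row rather than disappearing.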
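Use cases:
236-5-1. Handling nested list data: data imported from JSON, databases, or other sources often has a list nested inside a single cell. explode() expands these nested lists into separate rows for further analysis. Example: e-commerce order data where each order contains several items.
236-5-2. Data cleaning and preprocessing: during cleaning, multiple values held in one cell often need to be split across rows before they can be processed. Example: user tag data where each user may have several tags.
236-5-3. Text analysis: in natural language processing and text analysis, text is usually split into words or phrases that are then analyzed individually. explode() expands the tokenized lists into one row per token. Example: tokenized text data.
236-5-4. Time series processing: a single timestamp may correspond to several events or values. explode() expands such multi-valued timestamps into one row per event for further analysis. Example: multiple events at one point in time.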
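As a rough sketch of the text-analysis use case, the snippet below splits sentences on whitespace, explodes the resulting lists, and counts words. The sentences are invented for illustration, and whitespace splitting stands in for a real tokenizer.

```python
import pandas as pd

# Hypothetical input sentences.
sentences = pd.Series(["the cat sat", "the dog sat down"])

# str.split() with no arguments splits on whitespace and yields a Series of lists.
tokens = sentences.str.split()

# One row per word; the index still records which sentence each word came from.
words = tokens.explode()

# Frequency of each word across all sentences.
counts = words.value_counts()
print(counts)
```

Because the original index is kept, a later groupby on the index (for example `words.groupby(level=0).size()`) recovers per-sentence statistics.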
- # 236. pandas.Series.explode method
- # 236-1. Handling nested list data
- import pandas as pd
- # Sample data: each order holds a list of items
- orders = pd.Series([['item1', 'item2'], ['item3'], ['item4', 'item5', 'item6']])
- # Expand the item lists with explode
- exploded_orders = orders.explode()
- print(exploded_orders, end='\n\n')
-
- # 236-2. Data cleaning and preprocessing
- import pandas as pd
- # Sample data: each user holds a list of tags
- user_tags = pd.Series([['tag1', 'tag2'], ['tag3'], ['tag4', 'tag5', 'tag6']])
- # Expand the tag lists with explode
- exploded_tags = user_tags.explode()
- print(exploded_tags, end='\n\n')
-
- # 236-3. Text analysis
- import pandas as pd
- # Sample data: each text has already been tokenized into a list of words
- texts = pd.Series([['word1', 'word2', 'word3'], ['word4'], ['word5', 'word6']])
- # Expand the tokenized lists with explode
- exploded_texts = texts.explode()
- print(exploded_texts, end='\n\n')
-
- # 236-4. Time series processing
- import pandas as pd
- # Sample data: each time point holds a list of events
- time_series = pd.Series([['event1', 'event2'], ['event3'], ['event4', 'event5', 'event6']])
- # Expand the per-time-point event lists with explode
- exploded_time_series = time_series.explode()
- print(exploded_time_series)

- # 236. pandas.Series.explode method
- # 236-1. Handling nested list data
- # 0 item1
- # 0 item2
- # 1 item3
- # 2 item4
- # 2 item5
- # 2 item6
- # dtype: object
-
- # 236-2. Data cleaning and preprocessing
- # 0 tag1
- # 0 tag2
- # 1 tag3
- # 2 tag4
- # 2 tag5
- # 2 tag6
- # dtype: object
-
- # 236-3. Text analysis
- # 0 word1
- # 0 word2
- # 0 word3
- # 1 word4
- # 2 word5
- # 2 word6
- # dtype: object
-
- # 236-4. Time series processing
- # 0 event1
- # 0 event2
- # 1 event3
- # 2 event4
- # 2 event5
- # 2 event6
- # dtype: object

- # 237. pandas.Series.searchsorted method
- pandas.Series.searchsorted(value, side='left', sorter=None)
- Find indices where elements should be inserted to maintain order.
-
- Find the indices into a sorted Series self such that, if the corresponding elements in value were inserted before the indices, the order of self would be preserved.
-
- Note
-
- The Series must be monotonically sorted, otherwise wrong locations will likely be returned. Pandas does not check this for you.
-
- Parameters:
- value
- array-like or scalar
- Values to insert into self.
-
- side
- {‘left’, ‘right’}, optional
- If ‘left’, the index of the first suitable location found is given. If ‘right’, return the last such index. If there is no suitable index, return either 0 or N (where N is the length of self).
-
- sorter
- 1-D array-like, optional
- Optional array of integer indices that sort self into ascending order. They are typically the result of np.argsort.
-
- Returns:
- int or array of int
- A scalar or array of insertion points with the same shape as value.

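237-2-1. value (required): a scalar or array-like holding the value(s) to look up.
237-2-2. side (optional, default 'left'): one of {'left', 'right'}. Controls where the insertion point falls when elements equal to value already exist: 'left' returns the position before the first equal element, 'right' returns the position after the last one.
237-2-3. sorter (optional, default None): an optional array-like of integer indices that sort the Series into ascending order, typically the result of np.argsort.
Finds the position(s) at which a value or a set of values should be inserted into a sorted Series so that the order is preserved. The method is useful for binary search, ordered insertion, and positional lookups.
Returns an integer, or an array of integers, giving the insertion position(s).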
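When value is array-like, one insertion point is returned per element. The sketch below uses made-up thresholds and values to show how the two settings of side differ only where a value already exists in the Series.

```python
import pandas as pd

# A sorted Series of hypothetical thresholds.
thresholds = pd.Series([10, 20, 30, 40])

# Values to locate: below the range, equal to an element, in between, above the range.
values = [5, 20, 35, 50]

# side='left' (the default): insert before any equal element.
left = thresholds.searchsorted(values)

# side='right': insert after any equal element.
right = thresholds.searchsorted(values, side="right")

print(left)   # positions 0, 1, 3, 4
print(right)  # positions 0, 2, 3, 4
```

Only the result for 20 changes between the two calls, since it is the only value that is already present.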
- # 237. pandas.Series.searchsorted method
- # 237-1. Basic usage
- import pandas as pd
- # Create a sorted Series
- s = pd.Series([1, 2, 3, 4, 5])
- # Find the insertion position for the value
- index = s.searchsorted(3)
- print(index, end='\n\n')
-
- # 237-2. Using the 'side' parameter
- import pandas as pd
- # Create a sorted Series containing a duplicate value
- s = pd.Series([1, 2, 3, 3, 4, 5])
- # Insertion position to the left of the equal elements
- index_left = s.searchsorted(3, side='left')
- print(index_left)
- # Insertion position to the right of the equal elements
- index_right = s.searchsorted(3, side='right')
- print(index_right, end='\n\n')
-
- # 237-3. Handling an unsorted Series
- import pandas as pd
- # Create an unsorted Series
- s = pd.Series([5, 1, 4, 2, 3])
- # Get the integer indices that would sort the Series
- sorter = s.argsort()
- # Find the insertion position; it refers to the sorted order, not the original one
- index = s.searchsorted(3, sorter=sorter)
- print(index)

- # 237. pandas.Series.searchsorted method
- # 237-1. Basic usage
- # 2
-
- # 237-2. Using the 'side' parameter
- # 2
- # 4
-
- # 237-3. Handling an unsorted Series
- # 2
- # 238. pandas.Series.ravel method
- pandas.Series.ravel(order='C')
- Return the flattened underlying data as an ndarray or ExtensionArray.
-
- Deprecated since version 2.2.0: Series.ravel is deprecated. The underlying array is already 1D, so ravel is not necessary. Use to_numpy() for conversion to a numpy array instead.
-
- Returns:
- numpy.ndarray or ExtensionArray
- Flattened data of the Series.
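238-2-1. order (optional, default 'C'): a string specifying the memory order in which the elements are read. Because a Series is already one-dimensional, the choice does not change the result.
Flattens the data of a Series into a one-dimensional array.
Returns a one-dimensional NumPy array (or an ExtensionArray for extension dtypes) containing all the data of the original Series.
This method is deprecated since pandas 2.2.0; the pandas.Series.to_numpy method is the recommended replacement.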
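Since the deprecation note points to to_numpy, a minimal sketch of the replacement call may help. The data mirrors the ravel example in this section; the dtype conversion at the end is just one optional use of the method.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5])

# to_numpy() returns the underlying values as a NumPy array, which is already 1-D.
arr = data.to_numpy()

# The dtype argument converts the values while building the array.
as_float = data.to_numpy(dtype=float)

print(type(arr), arr.ndim, arr)
print(as_float)
```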
- # 238. pandas.Series.ravel method
- import pandas as pd
- # Create a pandas Series
- data = pd.Series([1, 2, 3, 4, 5])
- # Call ravel() with both orders (deprecated since pandas 2.2.0, so a warning may be shown)
- flattened_data_C = data.ravel(order='C')
- flattened_data_F = data.ravel(order='F')
- print("Flattened data (C order):", flattened_data_C)
- print("Flattened data (F order):", flattened_data_F)
- # 238. pandas.Series.ravel method
- # Flattened data (C order): [1 2 3 4 5]
- # Flattened data (F order): [1 2 3 4 5]
- # 239. pandas.Series.repeat method
- pandas.Series.repeat(repeats, axis=None)
- Repeat elements of a Series.
-
- Returns a new Series where each element of the current Series is repeated consecutively a given number of times.
-
- Parameters:
- repeats
- int or array of ints
- The number of repetitions for each element. This should be a non-negative integer. Repeating 0 times will return an empty Series.
-
- axis
- None
- Unused. Parameter needed for compatibility with DataFrame.
-
- Returns:
- Series
- Newly created Series with repeated elements.

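239-2-1. repeats (required): an integer or an array of integers. With a single integer, every element of the Series is repeated that many times; with an integer array of the same length as the Series, each element is repeated according to the integer at the matching position.
239-2-2. axis (optional, default None): unused for a Series, which is one-dimensional; the parameter exists only for compatibility with DataFrame.
Repeats each element of a Series a given number of times, which is useful for expanding data or increasing the number of rows.
Returns a new pandas Series in which each element has been repeated the specified number of times.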
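Two details of repeat are easy to miss: a count of 0 drops the element, and the repeated rows keep their original index labels. The sketch below, using invented data, shows both and then renumbers the result with reset_index.

```python
import pandas as pd

items = pd.Series(["a", "b", "c"])

# Per-element counts: "a" twice, "b" not at all, "c" once.
expanded = items.repeat([2, 0, 1])

# Repeated rows share their source label, so the index is now 0, 0, 2.
print(expanded)

# reset_index(drop=True) replaces the duplicated labels with 0, 1, 2.
renumbered = expanded.reset_index(drop=True)
print(renumbered)
```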
- # 239. pandas.Series.repeat method
- import pandas as pd
- # Create a pandas Series
- data = pd.Series([1, 2, 3])
- # Repeat every element 3 times
- repeated_data_1 = data.repeat(3)
- # Repeat each element according to the matching entry of the array
- repeated_data_2 = data.repeat([1, 2, 3])
- print("Repeated data (3 times):")
- print(repeated_data_1)
- print("\nRepeated data (1, 2, 3 times respectively):")
- print(repeated_data_2)
- # 239. pandas.Series.repeat method
- # Repeated data (3 times):
- # 0 1
- # 0 1
- # 0 1
- # 1 2
- # 1 2
- # 1 2
- # 2 3
- # 2 3
- # 2 3
- # dtype: int64
- #
- # Repeated data (1, 2, 3 times respectively):
- # 0 1
- # 1 2
- # 1 2
- # 2 3
- # 2 3
- # 2 3
- # dtype: int64

- # 240. pandas.Series.squeeze method
- pandas.Series.squeeze(axis=None)
- Squeeze 1 dimensional axis objects into scalars.
-
- Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.
-
- This method is most useful when you don’t know if your object is a Series or DataFrame, but you do know it has just a single column. In that case you can safely call squeeze to ensure you have a Series.
-
- Parameters:
- axis
- {0 or ‘index’, 1 or ‘columns’, None}, default None
- A specific axis to squeeze. By default, all length-1 axes are squeezed. For Series this parameter is unused and defaults to None.
-
- Returns:
- DataFrame, Series, or scalar
- The projection after squeezing axis or all the axes.

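240-2-1. axis (optional, default None): one of {None, 0 or 'index', 1 or 'columns'}, naming the axis to squeeze. For a Series this parameter is unused.
Removes length-1 axes from an object: a single-element Series becomes a scalar. It is commonly applied to single-column or single-row results extracted from a DataFrame to get a simpler object back.
Returns the object with its length-1 axes removed; if no axis has length 1, the original object is returned unchanged.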
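The rules above can be checked on the smallest possible inputs. This is only an illustrative sketch with made-up values; it also shows the axis argument on a 1x1 DataFrame, where squeezing a single axis leaves a Series instead of a scalar.

```python
import pandas as pd

# A single-element Series squeezes to a scalar.
one = pd.Series([42])
scalar = one.squeeze()

# A Series with more than one element is returned unchanged.
many = pd.Series([1, 2])
unchanged = many.squeeze()

# A 1x1 DataFrame squeezes to a scalar by default...
cell = pd.DataFrame({"a": [1]})
as_scalar = cell.squeeze()

# ...but squeezing only the columns axis leaves a length-1 Series named "a".
as_series = cell.squeeze(axis="columns")

print(scalar, as_scalar)
print(unchanged)
print(as_series)
```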
- # 240. pandas.Series.squeeze method
- # 240-1. Extracting a single row or column from a DataFrame
- import pandas as pd
- # Create a DataFrame
- df = pd.DataFrame({
-     'A': [10, 20, 30],
-     'B': [15, 25, 35]
- })
- # Extract a single column (still a DataFrame) and squeeze it to a Series
- single_column = df[['A']]
- squeezed_column = single_column.squeeze()
- # Extract a single row (still a DataFrame) and squeeze it to a Series
- single_row = df.iloc[[0]]
- squeezed_row = single_row.squeeze()
- print("Original single column DataFrame:")
- print(single_column)
- print("Squeezed Series from single column:")
- print(squeezed_column)
- print("Original single row DataFrame:")
- print(single_row)
- print("Squeezed Series from single row:")
- print(squeezed_row, end='\n\n')
-
- # 240-2. Working with grouped data
- import pandas as pd
- # Create a DataFrame
- df = pd.DataFrame({
-     'Category': ['A', 'A', 'B'],
-     'Value': [10, 20, 30]
- })
- # Group by 'Category' and compute the mean
- grouped = df.groupby('Category').mean()
- # Select one category (a 1x1 DataFrame) and squeeze it to a scalar
- single_category_mean = grouped.loc[['A']]
- squeezed_category_mean = single_category_mean.squeeze()
- print("Grouped mean DataFrame:")
- print(single_category_mean)
- print("Squeezed mean for single category:")
- print(squeezed_category_mean, end='\n\n')
-
- # 240-3. Improving memory efficiency and performance
- import pandas as pd
- # Create a large DataFrame
- large_df = pd.DataFrame({'Value': range(1000000)})
- # Extract the single column and squeeze it to a Series
- squeezed_series = large_df[['Value']].squeeze()
- # Compare memory usage (squeeze holds the same data, so the two figures match)
- print("Memory usage of original DataFrame:", large_df.memory_usage(deep=True).sum())
- print("Memory usage of squeezed Series:", squeezed_series.memory_usage(deep=True), end='\n\n')
-
- # 240-4. Interacting with functions
- import matplotlib.pyplot as plt
- # Define a plotting function that only accepts a Series
- def plot_series(series):
-     series.plot(kind='line', title='Series Plot')
-     plt.show()
- # Extract the data (df is the DataFrame from 240-2) and pass it to the function
- data = df[['Value']].iloc[0:3]  # single-column DataFrame
- plot_series(data.squeeze())
-
- # 240-5. Simplifying output
- # Compute the mean and squeeze the one-element result to a scalar
- processed_result = df[['Value']].mean().squeeze()
- def display_result(result):
-     print(f"Processed Result: {result}")
- # Display the squeezed scalar
- display_result(processed_result)
-
- # 240-6. Data cleaning and transformation
- import pandas as pd
- # Create a DataFrame with a redundant dimension (one-element lists)
- redundant_df = pd.DataFrame({'Value': [[10], [20], [30]]})
- # Clean the data with apply and squeeze
- cleaned_series = redundant_df['Value'].apply(lambda x: pd.Series(x).squeeze())
- print("Original DataFrame with redundant dimension:")
- print(redundant_df)
- print("Cleaned Series:")
- print(cleaned_series, end='\n\n')
-
- # 240-7. Mathematical and statistical computation
- import pandas as pd
- # Create a DataFrame
- df = pd.DataFrame({'Value': [10, 20, 30]})
- # Compute the sum and squeeze the one-element result to a scalar
- total_sum = df[['Value']].sum().squeeze()
- print("Total sum of values:", total_sum)

- # 240. pandas.Series.squeeze method
- # 240-1. Extracting a single row or column from a DataFrame
- # Original single column DataFrame:
- # A
- # 0 10
- # 1 20
- # 2 30
- # Squeezed Series from single column:
- # 0 10
- # 1 20
- # 2 30
- # Name: A, dtype: int64
- # Original single row DataFrame:
- # A B
- # 0 10 15
- # Squeezed Series from single row:
- # A 10
- # B 15
- # Name: 0, dtype: int64
-
- # 240-2. Working with grouped data
- # Grouped mean DataFrame:
- # Value
- # Category
- # A 15.0
- # Squeezed mean for single category:
- # 15.0
-
- # 240-3. Improving memory efficiency and performance
- # Memory usage of original DataFrame: 8000132
- # Memory usage of squeezed Series: 8000132
-
- # 240-4. Interacting with functions
- # See Figure 1
-
- # 240-5. Simplifying output
- # Processed Result: 20.0
-
- # 240-6. Data cleaning and transformation
- # Original DataFrame with redundant dimension:
- # Value
- # 0 [10]
- # 1 [20]
- # 2 [30]
- # Cleaned Series:
- # 0 10
- # 1 20
- # 2 30
- # Name: Value, dtype: int64
-
- # 240-7. Mathematical and statistical computation
- # Total sum of values: 60

Figure 1: line plot from example 240-4.