DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)
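When subset is given several column names, a row is dropped only if its values in all of those columns, taken together, match those of an earlier row.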
In [48]: df
Out[48]:
    a   b   c   d  e  f  g
0  49  75  49  50  1  1  1
1  89  87  27  69  2  1  1
2  41   1  75  99  3  2  1
3   8  19  71   6  4  3  1
4   0  59  92  39  4  4  1

In [49]: dff = df.drop_duplicates(subset=['f', 'g'])

In [50]: dff
Out[50]:
    a   b   c   d  e  f  g
0  49  75  49  50  1  1  1
2  41   1  75  99  3  2  1
3   8  19  71   6  4  3  1
4   0  59  92  39  4  4  1

In [51]: dff2 = df.drop_duplicates(subset=['e', 'f', 'g'])

In [52]: dff2
Out[52]:
    a   b   c   d  e  f  g
0  49  75  49  50  1  1  1
1  89  87  27  69  2  1  1
2  41   1  75  99  3  2  1
3   8  19  71   6  4  3  1
4   0  59  92  39  4  4  1
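The session above does not show how df was built, so here is a minimal reproducible sketch using the same values. It relies on DataFrame.duplicated, which returns the boolean mask that marks the rows drop_duplicates would remove; the variable names mask_fg and mask_efg are only illustrative.

```python
import pandas as pd

# Same values as the session above, written out so the result is reproducible.
df = pd.DataFrame({
    "a": [49, 89, 41, 8, 0],
    "b": [75, 87, 1, 19, 59],
    "c": [49, 27, 75, 71, 92],
    "d": [50, 69, 99, 6, 39],
    "e": [1, 2, 3, 4, 4],
    "f": [1, 1, 2, 3, 4],
    "g": [1, 1, 1, 1, 1],
})

# True marks a row whose combination of subset values already
# appeared in an earlier row (default keep='first').
mask_fg = df.duplicated(subset=["f", "g"])
mask_efg = df.duplicated(subset=["e", "f", "g"])

# Filtering with the inverted mask gives the same rows as drop_duplicates.
dff = df[~mask_fg]
dff2 = df[~mask_efg]

print(mask_fg.tolist())   # [False, True, False, False, False]
print(list(dff.index))    # [0, 2, 3, 4]
print(list(dff2.index))   # [0, 1, 2, 3, 4]
```

Row 1 is removed for ['f', 'g'] because its pair (1, 1) repeats row 0. Once 'e' is added to the subset, no two rows share the same (e, f, g) triple, so nothing is removed.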
In [26]: df = pd.DataFrame(np.random.randint(0, 100, size=(5, 5)), index=list(range(5)), columns=list('abcde'))

In [27]: df['f'] = [1, 1, 2, 3, 4]

In [28]: df
Out[28]:
    a   b   c   d   e  f
0  42  55  55  39  61  1
1  27  51  26  26  64  1
2  87  11  23   2  77  2
3  82  98  61  15  88  3
4  25  21  47  79   4  4

In [29]: dff = df.drop_duplicates(subset=['f'], keep='first')

In [30]: dff
Out[30]:
    a   b   c   d   e  f
0  42  55  55  39  61  1
2  87  11  23   2  77  2
3  82  98  61  15  88  3
4  25  21  47  79   4  4

In [31]: df
Out[31]:
    a   b   c   d   e  f
0  42  55  55  39  61  1
1  27  51  26  26  64  1
2  87  11  23   2  77  2
3  82  98  61  15  88  3
4  25  21  47  79   4  4

In [34]: new = df.drop_duplicates(subset=['f'], keep='first', inplace=True)

In [35]: new

In [36]: df
Out[36]:
    a   b   c   d   e  f
0  42  55  55  39  61  1
2  87  11  23   2  77  2
3  82  98  61  15  88  3
4  25  21  47  79   4  4
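The session above only uses keep='first'. As a rough sketch, the following compares the three values that keep accepts ('first', 'last', False) and shows again that inplace=True returns None. The two-column frame is a cut-down, hard-coded stand-in for the random data above, and the names first, last, none and work are only illustrative.

```python
import pandas as pd

# Cut-down stand-in for the random frame above: rows 0 and 1 share f == 1.
df = pd.DataFrame({"a": [42, 27, 87, 82, 25], "f": [1, 1, 2, 3, 4]})

first = df.drop_duplicates(subset=["f"], keep="first")  # keep the first of each group
last = df.drop_duplicates(subset=["f"], keep="last")    # keep the last of each group
none = df.drop_duplicates(subset=["f"], keep=False)     # drop every duplicated row

print(list(first.index))  # [0, 2, 3, 4]
print(list(last.index))   # [1, 2, 3, 4]
print(list(none.index))   # [2, 3, 4]

# inplace=True changes the frame it is called on and returns None,
# so assigning the result to a variable gives nothing useful.
work = df.copy()
result = work.drop_duplicates(subset=["f"], inplace=True)

print(result)             # None
print(list(work.index))   # [0, 2, 3, 4]
print(len(df))            # 5, the original frame is untouched
```

Note that the surviving rows keep their original index labels. Calling reset_index(drop=True) on the result renumbers them from 0 if that is needed.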