1.15 Grouping Records Together Based on a Field

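Problem

You have a sequence of dictionaries or instances, and you want to iterate over the data in groups based on the value of a particular field.

Solution

The itertools.groupby() function is particularly useful for grouping data like this: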

from itertools import groupby
from operator import itemgetter

rows = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/02/2012'},
    {'address': '5148 N CLARK', 'date': '07/02/2012'},
    {'address': '2122 N CLARK', 'date': '07/03/2012'},
    {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'},
    {'address': '1060 W ADDISON', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
    {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}
]

# First, see what happens when groupby() is applied to rows directly.
# With no key function, each record itself serves as the group key.
for record, items in groupby(rows):
    print(record)
    for i in items:
        print(' ', i)
"""Output:
{'address': '5412 N CLARK', 'date': '07/01/2012'}
  {'address': '5412 N CLARK', 'date': '07/01/2012'}
{'address': '5148 N CLARK', 'date': '07/02/2012'}
  {'address': '5148 N CLARK', 'date': '07/02/2012'}
  {'address': '5148 N CLARK', 'date': '07/02/2012'}
{'address': '2122 N CLARK', 'date': '07/03/2012'}
  {'address': '2122 N CLARK', 'date': '07/03/2012'}
{'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
  {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
{'address': '1060 W ADDISON', 'date': '07/02/2012'}
  {'address': '1060 W ADDISON', 'date': '07/02/2012'}
{'address': '4801 N BROADWAY', 'date': '07/01/2012'}
  {'address': '4801 N BROADWAY', 'date': '07/01/2012'}
{'address': '1039 W GRANVILLE', 'date': '07/04/2012'}
  {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}
"""

# Pass a key function so that records are grouped by their date field.
# rows is still unsorted, so the same date can appear in more than one group.
for date, item in groupby(rows, key=itemgetter('date')):
    print(date)
    for i in item:
        print(' ', i)
"""Output:
07/01/2012
  {'address': '5412 N CLARK', 'date': '07/01/2012'}
07/02/2012
  {'address': '5148 N CLARK', 'date': '07/02/2012'}
  {'address': '5148 N CLARK', 'date': '07/02/2012'}
07/03/2012
  {'address': '2122 N CLARK', 'date': '07/03/2012'}
07/02/2012
  {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
  {'address': '1060 W ADDISON', 'date': '07/02/2012'}
07/01/2012
  {'address': '4801 N BROADWAY', 'date': '07/01/2012'}
07/04/2012
  {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}
"""

# Sort rows by date first, then group with the same key function.
rows.sort(key=itemgetter('date'))
for date, item in groupby(rows, key=itemgetter('date')):
    print(date)
    for i in item:
        print(' ', i)
"""Output:
07/01/2012
  {'address': '5412 N CLARK', 'date': '07/01/2012'}
  {'address': '4801 N BROADWAY', 'date': '07/01/2012'}
07/02/2012
  {'address': '5148 N CLARK', 'date': '07/02/2012'}
  {'address': '5148 N CLARK', 'date': '07/02/2012'}
  {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}
  {'address': '1060 W ADDISON', 'date': '07/02/2012'}
07/03/2012
  {'address': '2122 N CLARK', 'date': '07/03/2012'}
07/04/2012
  {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}
"""

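Discussion

The groupby() function scans a sequence and finds consecutive runs of identical values (or of identical values returned by the given key function). On each iteration, it returns the value together with an iterator that produces all of the items in the group with that value. Because groupby() only examines consecutive items, the records must first be sorted by the field of interest. Otherwise the same value can show up in several separate groups, as the second example in the solution shows for 07/02/2012 and 07/01/2012.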
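The consecutive-run behavior is easiest to see on a plain list. The following is a minimal sketch that is not part of the original recipe; the sample list `values` is invented for illustration. It also converts each group to a list right away, since the group iterator is lazy and shares its underlying data with groupby() itself.

```python
from itertools import groupby

# Hypothetical sample data: 'a' appears in two separate runs.
values = ['a', 'a', 'b', 'a', 'c', 'c']

# Without sorting, each consecutive run becomes its own group,
# so 'a' shows up twice.
unsorted_groups = [(key, list(group)) for key, group in groupby(values)]
print(unsorted_groups)

# After sorting, equal values are adjacent, so each key appears once.
sorted_groups = [(key, list(group)) for key, group in groupby(sorted(values))]
print(sorted_groups)
```

In the unsorted case the key 'a' is reported for two different groups, which mirrors what happens to the dates in the solution before rows is sorted.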

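If you simply want to group the data by the date field into one large data structure that allows random access, you can use defaultdict() to build a multidict (multidicts were introduced in Section 1.6). For example: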

from collections import defaultdict

# Map each date to the list of records that share it.
rows_by_date = defaultdict(list)
for row in rows:
    rows_by_date[row['date']].append(row)

# Look up all records for a given date directly.
for r in rows_by_date['07/01/2012']:
    print(r)
"""Output:
{'address': '5412 N CLARK', 'date': '07/01/2012'}
{'address': '4801 N BROADWAY', 'date': '07/01/2012'}
"""

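Unlike groupby(), the defaultdict approach does not depend on the order of the records. Here is a minimal sketch, assuming a small made-up list named `records` that is deliberately left unsorted; the names `records`, `records_by_date`, and `addresses` are illustrative and do not come from the original recipe.

```python
from collections import defaultdict

# Hypothetical records, deliberately not sorted by date.
records = [
    {'address': '5412 N CLARK', 'date': '07/01/2012'},
    {'address': '5148 N CLARK', 'date': '07/02/2012'},
    {'address': '4801 N BROADWAY', 'date': '07/01/2012'},
]

# Append each record to the list stored under its date.
records_by_date = defaultdict(list)
for record in records:
    records_by_date[record['date']].append(record)

# Both 07/01/2012 records land in one list even though they are not adjacent.
addresses = [r['address'] for r in records_by_date['07/01/2012']]
print(addresses)
```

The trade-off is that every group is held in memory at the same time, which is what makes lookup by date possible.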
Source: https://www.cnblogs.com/L999C/p/15746839.html