Flattening doubly nested JSON

I want to flatten a JSON file that looks like this:

    {
      "teams": [
        {
          "teamname": "1",
          "members": [
            {"firstname": "John", "lastname": "Doe", "orgname": "Anon",
             "phone": "916-555-1234", "mobile": "", "email": "john.doe@wildlife.net"},
            {"firstname": "Jane", "lastname": "Doe", "orgname": "Anon",
             "phone": "916-555-4321", "mobile": "916-555-7890", "email": "jane.doe@wildlife.net"}
          ]
        },
        {
          "teamname": "2",
          "members": [
            {"firstname": "Mickey", "lastname": "Moose", "orgname": "Moosers",
             "phone": "916-555-0000", "mobile": "916-555-1111", "email": "mickey.moose@wildlife.net"},
            {"firstname": "Minny", "lastname": "Moose", "orgname": "Moosers",
             "phone": "916-555-2222", "mobile": "", "email": "minny.moose@wildlife.net"}
          ]
        }
      ]
    }

I would like to export it to an Excel sheet. My current code looks like this:

    from pandas.io.json import json_normalize
    import json
    import pandas as pd

    inputFile = 'E:\\teams.json'
    outputFile = 'E:\\teams.xlsx'

    # Load the JSON file into a Python dict
    f = open(inputFile)
    data = json.load(f)
    f.close()

    df = pd.DataFrame(data)

    # Flatten only the outer "teams" list
    result1 = json_normalize(data, 'teams')
    print(result1)

This results in the following output:

                                                  members teamname
    0     [{u'firstname': u'John', u'phone': u'916-555-...        1
    1   [{u'firstname': u'Mickey', u'phone': u'916-555-...        2

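Each row still has the data for two members nested inside it. I would like an output table that shows the data for all 4 members, plus their associated team name.

Here is one way to do it. It should give you some ideas.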

    df = pd.concat(
        [
            pd.concat([pd.Series(m) for m in t['members']], axis=1)
            for t in data['teams']
        ],
        keys=[t['teamname'] for t in data['teams']]
    )

                                         0                         1
    1 email          john.doe@wildlife.net     jane.doe@wildlife.net
      firstname                       John                      Jane
      lastname                         Doe                       Doe
      mobile                                            916-555-7890
      orgname                         Anon                      Anon
      phone                   916-555-1234              916-555-4321
    2 email      mickey.moose@wildlife.net  minny.moose@wildlife.net
      firstname                     Mickey                     Minny
      lastname                       Moose                     Moose
      mobile                  916-555-1111
      orgname                      Moosers                   Moosers
      phone                   916-555-0000              916-555-2222

To get a nice table with team name and member as rows and all the attributes in columns:

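    # Newer pandas versions do not allow assigning to df.index.levels[0].name,
    # so name the outer index level through set_names instead.
    df.index = df.index.set_names('teamname', level=0)
    df.columns.name = 'member'

    df.T.stack(0).swaplevel(0, 1).sort_index()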


To get team name and member as actual columns, just reset the index.

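    # Newer pandas versions do not allow assigning to df.index.levels[0].name,
    # so name the outer index level through set_names instead.
    df.index = df.index.set_names('teamname', level=0)
    df.columns.name = 'member'

    df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()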


The whole thing:

    import json
    import pandas as pd

    json_text = """{
      "teams": [
        {
          "teamname": "1",
          "members": [
            {"firstname": "John", "lastname": "Doe", "orgname": "Anon",
             "phone": "916-555-1234", "mobile": "", "email": "john.doe@wildlife.net"},
            {"firstname": "Jane", "lastname": "Doe", "orgname": "Anon",
             "phone": "916-555-4321", "mobile": "916-555-7890", "email": "jane.doe@wildlife.net"}
          ]
        },
        {
          "teamname": "2",
          "members": [
            {"firstname": "Mickey", "lastname": "Moose", "orgname": "Moosers",
             "phone": "916-555-0000", "mobile": "916-555-1111", "email": "mickey.moose@wildlife.net"},
            {"firstname": "Minny", "lastname": "Moose", "orgname": "Moosers",
             "phone": "916-555-2222", "mobile": "", "email": "minny.moose@wildlife.net"}
          ]
        }
      ]
    }"""

    data = json.loads(json_text)

    # One column per member within each team, keyed by team name
    df = pd.concat(
        [
            pd.concat([pd.Series(m) for m in t['members']], axis=1)
            for t in data['teams']
        ],
        keys=[t['teamname'] for t in data['teams']]
    )

    # Newer pandas versions do not allow assigning to df.index.levels[0].name,
    # so name the outer index level through set_names instead.
    df.index = df.index.set_names('teamname', level=0)
    df.columns.name = 'member'

    df.T.stack(0).swaplevel(0, 1).sort_index().reset_index()

This should be what you are after.

    json_normalize(data, record_path=['teams', 'members'], meta=[['teams', 'teamname']])

Output:

                           email firstname lastname        mobile  orgname         phone teams.teamname
    0      john.doe@wildlife.net      John      Doe                   Anon  916-555-1234              1
    1      jane.doe@wildlife.net      Jane      Doe  916-555-7890     Anon  916-555-4321              1
    2  mickey.moose@wildlife.net    Mickey    Moose  916-555-1111  Moosers  916-555-0000              2
    3   minny.moose@wildlife.net     Minny    Moose                Moosers  916-555-2222              2
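For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes pandas 1.0 or later, where the function is exposed as `pd.json_normalize`. The shortened sample data and the final column rename are my own illustrative choices, not part of the call above.

```python
import json

import pandas as pd

# Shortened copy of the question's data (two of the six member fields).
json_text = """
{"teams": [
  {"teamname": "1", "members": [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "Jane", "lastname": "Doe"}]},
  {"teamname": "2", "members": [
    {"firstname": "Mickey", "lastname": "Moose"},
    {"firstname": "Minny", "lastname": "Moose"}]}
]}
"""
data = json.loads(json_text)

# One row per member; the team name is carried along through meta.
flat = pd.json_normalize(
    data,
    record_path=["teams", "members"],
    meta=[["teams", "teamname"]],
)

# Optional: give the meta column a shorter name.
flat = flat.rename(columns={"teams.teamname": "teamname"})
print(flat)
```

Since `flat` is an ordinary DataFrame, something like `flat.to_excel('teams.xlsx', index=False)` should then cover the Excel export, provided an Excel writer engine such as openpyxl is installed.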

Explanation

    # In pandas 1.0 and later the same function is also available as pd.json_normalize
    from pandas.io.json import json_normalize
    import pandas as pd

I only recently learned how to use the json_normalize function, so my explanation may not be entirely correct.

Start with what I am calling "layer 0":

    json_normalize(data)

Output:

                                                   teams
    0  [{'teamname': '1', 'members': [{'firstname': '...

There is 1 column and 1 row. Everything is inside the "teams" column.

Use record_path= to look inside what I am calling "layer 1":

    json_normalize(data, record_path='teams')

Output:

                                                 members teamname
    0  [{'firstname': 'John', 'lastname': 'Doe', 'org...        1
    1  [{'firstname': 'Mickey', 'lastname': 'Moose', ...        2

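At layer 1 we have flattened "teamname", but there is still more nested inside "members".

Use record_path= again to look inside layer 2. The notation is unintuitive at first. I now remember it as: ['layer', 'deeperlayer'] results in layer.deeperlayer.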

    json_normalize(data, record_path=['teams', 'members'])

Output:

                           email firstname lastname        mobile  orgname         phone
    0      john.doe@wildlife.net      John      Doe                   Anon  916-555-1234
    1      jane.doe@wildlife.net      Jane      Doe  916-555-7890     Anon  916-555-4321
    2  mickey.moose@wildlife.net    Mickey    Moose  916-555-1111  Moosers  916-555-0000
    3   minny.moose@wildlife.net     Minny    Moose                Moosers  916-555-2222

Pardon my output, I do not know how to make tables in a response.

Finally, we add the layer 1 columns using meta=:

    json_normalize(data, record_path=['teams', 'members'], meta=[['teams', 'teamname']])

Output:

                           email firstname lastname        mobile  orgname         phone teams.teamname
    0      john.doe@wildlife.net      John      Doe                   Anon  916-555-1234              1
    1      jane.doe@wildlife.net      Jane      Doe  916-555-7890     Anon  916-555-4321              1
    2  mickey.moose@wildlife.net    Mickey    Moose  916-555-1111  Moosers  916-555-0000              2
    3   minny.moose@wildlife.net     Minny    Moose                Moosers  916-555-2222              2
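One way to picture what record_path and meta are doing here is as two nested loops. The plain-Python sketch below is only an analogy, not how pandas implements it, and the helper name `flatten_teams` is made up for illustration.

```python
def flatten_teams(data):
    """Return one flat dict per member, tagged with its team name."""
    rows = []
    for team in data["teams"]:            # layer 1: first entry of record_path
        for member in team["members"]:    # layer 2: second entry of record_path
            row = dict(member)            # copy the member's own fields
            row["teams.teamname"] = team["teamname"]  # the meta column
            rows.append(row)
    return rows


# Tiny sample with the same shape as the question's data.
sample = {
    "teams": [
        {"teamname": "1", "members": [{"firstname": "John"}, {"firstname": "Jane"}]},
        {"teamname": "2", "members": [{"firstname": "Mickey"}]},
    ]
}

rows = flatten_teams(sample)
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame` should give a table shaped like the one above, which may help when checking your understanding of a more complicated record_path.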

Note that we needed a list of lists, meta=[[]], to reference layer 1. If we wanted columns from both layer 0 and layer 1, we could do this:

    json_normalize(data, record_path=['layer1', 'layer2'], meta=['layer0', ['layer0', 'layer1']])

The result of json_normalize is a pandas DataFrame.
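To make the "layer 0 plus layer 1" case concrete, here is a small sketch. The top-level `league` key is invented for this example (the question's data has nothing at layer 0 besides `teams`), and `pd.json_normalize` assumes pandas 1.0 or later.

```python
import pandas as pd

data = {
    "league": "Wildlife",  # hypothetical layer 0 field, not in the question's data
    "teams": [
        {"teamname": "1", "members": [{"firstname": "John"}, {"firstname": "Jane"}]},
        {"teamname": "2", "members": [{"firstname": "Mickey"}]},
    ],
}

# A plain string in meta refers to a layer 0 key; a list is a path into layer 1.
result = pd.json_normalize(
    data,
    record_path=["teams", "members"],
    meta=["league", ["teams", "teamname"]],
)

print(type(result).__name__)
print(result.columns.tolist())
```

Each member row should carry both the league and its team name, so the usual DataFrame methods (sorting, renaming columns, exporting) apply to the result.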