Parsing XML with a varying number of tags into lists of equal length (openpyxl and BeautifulSoup)

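I have an XML file containing book records, with tags for the author, publication date, label and so on. I want to parse this file into three lists: one with the book titles, one with the authors, and one with the labels. Later I will write these lists to Excel columns using openpyxl. The problem is that some book records have no label tag. Parsing with the usual Beautiful Soup techniques gives the first two lists the same length, but the label list comes out shorter.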

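I have three questions:

1- How can I create all three lists with the same length (with an empty entry for books that have no label tag)?

2- The label list looks like this: ['Energy; Green Buildings; High Performance Buildings', 'Computing', 'Computing; Design; Green Buildings', …….]. I have created another 15 column headers named after the tags, e.g. "Computing" and "Design". Is there any way to use openpyxl to put an X mark or a colored cell for each book/tag combination? For example, if the book in row 5, titled "Architecture", has the "Design" tag, I want an X mark or a colored cell at (row '5', col 'Design').

3- Is there an easier way to accomplish this task (parsing the XML file and writing it to Excel efficiently)?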

Below is a snapshot of the XML file and the code I wrote (the XML file and the Python file can also be downloaded here: http://www.ranialabib.com/#!python/icfwa):

    <?xml version="1.0" encoding="UTF-8"?>
    <xml>
      <records>
        <record>
          <database name="My Collection.enl" path="My Collection.enl">My Collection.enl</database>
          <ref-type name="Book">1</ref-type>
          <contributors>
            <authors>
              <author>AIA Research Corporation</author>
            </authors>
          </contributors>
          <titles>
            <title>Regional guidelines for building passive energy conserving homes</title>
          </titles>
          <periodical/>
          <keywords/>
          <dates>
            <year>1978</year>
          </dates>
          <publisher>Dept. of Housing and Urban Development, Office of Policy Development and Research : for sale by the Supt. of Docs., US Govt. Print. Off.</publisher>
          <urls/>
          <label>Energy;Green Buildings;High Performance Buildings</label>
        </record>
        <record>
          <database name="My Collection.enl" path="My Collection.enl">My Collection.enl</database>
          <ref-type name="Book">1</ref-type>
          <contributors>
            <authors>
              <author>Akinci, Burcu</author>
              <author>Ph, D</author>
            </authors>
          </contributors>
          <titles>
            <title>Computing in Civil Engineering</title>
          </titles>
          <periodical/>
          <pages>692-699</pages>
          <keywords/>
          <dates>
            <year>2007</year>
          </dates>
          <publisher>American Society of Civil Engineers</publisher>
          <isbn>9780784409374</isbn>
          <electronic-resource-num>ISBN 978-0-7844-1302-9</electronic-resource-num>
          <urls>
            <web-urls>
              <url>http://books.google.com/books?id=QigBgc-qgdoC</url>
            </web-urls>
          </urls>
        </record>

    import xml.etree.ElementTree as ET
    fhand = open('My_Collection.xml')
    data = fhand.read()

    Title=list()
    Year=list()
    Label=list()

    tree = ET.fromstring(data)
    titles = tree.findall('.//title')
    years = tree.findall('.//year')
    labels = tree.findall('.//label')

    for t in titles :
        Title.append(str(t.text))
    print 'Titles: ', len(Title)
    print Title

    for y in years :
        Year.append(str(y.text))
    print 'years: ', len(Year)
    print Year

    for l in labels :
        Label.append(str(l.text))
    print 'Labels: ', len(Label)
    print Label

    from openpyxl import Workbook
    wb = Workbook()
    ws = wb.active
    for row in zip(Title, Year, Label):
        ws.append(row)
    wb.save("Test2.xlsx")

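Here is the code I wrote based on Charlie's suggestion; it did not work. I get an error message saying: "TypeError: 'NoneType' object is not iterable". I don't know what the problem is. How can I get the text of all 3 tags (title, year, label) for each record into one list, and how easy is it to write such a large number of lists (200 lists for 200 books) to Excel using openpyxl?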

    import xml.etree.ElementTree as ET
    fhand = open('My_Collection.xml')
    data = fhand.read()
    Label_lst=list()
    for record in tree.find("records/record") :
        label = record.find("label")
        for l in label:
            if label is not None:
                label = label_lst.append(label.text)
            else:
                label = label_lst.append(' ')
    print label_lst
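A likely cause of the TypeError, sketched under the assumption that `tree` is the root `<xml>` element returned by `ET.fromstring`: `find` returns only the first matching `<record>`, so the outer loop walks that record's child elements rather than the records themselves. None of those children contains a `<label>`, so `record.find("label")` is `None`, and `for l in label` then tries to iterate over `None`. The sample XML below is a cut-down, hypothetical stand-in for the posted file, and the snippet is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-record sample; the second record has no <label>
SAMPLE = """<xml><records>
<record><titles><title>Book A</title></titles><label>Energy;Computing</label></record>
<record><titles><title>Book B</title></titles></record>
</records></xml>"""

root = ET.fromstring(SAMPLE)

# find() returns only the FIRST <record>; iterating it yields its children
first_record_children = [child.tag for child in root.find("records/record")]

# findall() returns every <record>, which is what a per-record loop needs
labels = []
for record in root.findall("records/record"):
    label = record.find("label")  # None when the record has no <label>
    labels.append(label.text if label is not None else "")

print(first_record_children)
print(labels)
```

The `is not None` check is applied to the element itself, before anything tries to read or iterate it.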

If you want to preserve the record structure, you should parse record by record rather than just building lists of attributes. You can loop over the records and extract the relevant fields, for example:

    for record in parsed_xml.findall("records/record"):
        label = record.find("label")
        if label is not None:
            label = label.text

You can then write the rows straight to Excel without having to zip the columns.
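The record-by-record approach can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the helper names `text_or_blank`, `extract_rows` and `write_rows` are hypothetical, the sample XML is a cut-down stand-in for the posted file, and the code targets Python 3. `findall` is used because `find` returns only the first matching record.

```python
import xml.etree.ElementTree as ET

def text_or_blank(record, path):
    """Return the text of the first element matching path, or '' if absent."""
    node = record.find(path)
    return node.text if node is not None and node.text else ""

def extract_rows(xml_text):
    """Build one (title, year, label) tuple per <record>."""
    root = ET.fromstring(xml_text)
    return [
        (
            text_or_blank(rec, ".//title"),
            text_or_blank(rec, ".//year"),
            text_or_blank(rec, "label"),
        )
        for rec in root.findall("records/record")
    ]

def write_rows(rows, filename):
    """Write a header plus one worksheet row per record (needs openpyxl)."""
    from openpyxl import Workbook  # imported lazily so parsing works without it
    wb = Workbook()
    ws = wb.active
    ws.append(("Title", "Year", "Label"))
    for row in rows:
        ws.append(row)
    wb.save(filename)

# Hypothetical sample: the second record has no <label>
SAMPLE = """<xml><records>
<record><titles><title>Book A</title></titles><dates><year>1978</year></dates>
<label>Energy;Computing</label></record>
<record><titles><title>Book B</title></titles><dates><year>2007</year></dates></record>
</records></xml>"""

rows = extract_rows(SAMPLE)
print(rows)
```

Because every record yields exactly one tuple, a missing tag becomes an empty cell instead of shortening a list. Calling `write_rows(rows, "Test2.xlsx")` would then write one worksheet row per book.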

I just figured it out. I am still using columns.

    from openpyxl import Workbook
    import xml.etree.ElementTree as ET

    # Parse the file once; root is the top-level <xml> element
    tree = ET.parse('My_Collection.xml')
    root = tree.getroot()

    # Each list starts with its column header
    title_list = ['Title']
    year_list = ['Year']
    author_list = ['Author']
    label_list = ['Label']

    def text_or_n(record, path):
        # Text of the first matching tag, or 'N' if the tag is missing
        node = record.find(path)
        return 'N' if node is None else node.text

    # root -> <records> -> <record>
    for records in root:
        for record in records:
            title_list.append(text_or_n(record, './/title'))
            year_list.append(text_or_n(record, './/year'))
            author_list.append(text_or_n(record, './/author'))
            label_list.append(text_or_n(record, 'label'))

    # All four lists should now have the same length
    print(len(title_list))
    print(len(year_list))
    print(len(author_list))
    print(len(label_list))

    # Write the lists to Excel as columns, one row per book
    wb = Workbook()
    ws = wb.active
    for row in zip(title_list, year_list, author_list, label_list):
        ws.append(row)
    wb.save("Test3.xlsx")
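For the second question (an X mark or a colored cell per book/tag combination), one possible approach is to split each label string on ";" and compare the pieces against the tag headers. The sketch below is an assumption-laden illustration, not part of the original thread: `split_labels`, `tag_matrix` and `write_matrix` are hypothetical names, the fill color is arbitrary, matching is exact and case-sensitive, and the code targets Python 3.

```python
def split_labels(label_text):
    """Split 'A; B;C' into ['A', 'B', 'C'], dropping blanks and stray spaces."""
    return [part.strip() for part in label_text.split(";") if part.strip()]

def tag_matrix(labels, tag_headers):
    """Return one row per book with 'X' where the book carries that tag."""
    matrix = []
    for label_text in labels:
        tags = set(split_labels(label_text or ""))
        matrix.append(["X" if header in tags else "" for header in tag_headers])
    return matrix

def write_matrix(titles, labels, tag_headers, filename):
    """Write titles plus an X/colored cell per tag column (needs openpyxl)."""
    from openpyxl import Workbook
    from openpyxl.styles import PatternFill
    # Arbitrary light-green fill chosen for illustration
    fill = PatternFill(start_color="FFC6EFCE", end_color="FFC6EFCE", fill_type="solid")
    wb = Workbook()
    ws = wb.active
    ws.append(["Title"] + list(tag_headers))
    for title, marks in zip(titles, tag_matrix(labels, tag_headers)):
        ws.append([title] + marks)
        # Tag columns start at column 2, after the title column
        for col, mark in enumerate(marks, start=2):
            if mark:
                ws.cell(row=ws.max_row, column=col).fill = fill
    wb.save(filename)

# Hypothetical headers and label strings in the format shown in the question
headers = ["Computing", "Design", "Energy"]
labels = ["Energy;Green Buildings;High Performance Buildings", "Computing; Design", ""]
matrix = tag_matrix(labels, headers)
print(matrix)
```

A placeholder such as 'N' for a missing label simply matches no header, so that book's row stays empty.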
