Tag: beautifulsoup

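Parsing XML with a varying number of tags into lists of equal length, using openpyxl and BeautifulSoup: I have an XML file of book records with tags for author, publication date, tags, and so on. I want to parse this file into three lists: one of book titles, one of authors, and one of tags. Later I will write those lists to Excel columns with openpyxl. The problem is that some book records have no tags element. Parsing with the usual BeautifulSoup techniques yields the first two lists at the same length, but the tags list comes out shorter. I have three questions: 1- How can I create all three lists with the same length (with an empty entry for books that have no tags)? 2- The tags list looks like this: ['Energy; Green Buildings; High Performance Buildings', 'Computing', 'Computing; Design; Green Buildings', …….] I have created another 15 column headers named after my tags, e.g. "Computing" and "Design". Is there any way to use openpyxl to put an X mark or a colored cell at each book/tag combination? For example, if the book titled "Architecture" in row 5 has the "Design" tag, I want an X mark or a colored cell at (row '5', col 'Design'). 3- Is there a simpler way to accomplish this task (parsing the XML file and writing it to Excel efficiently)? Below are a snapshot of the XML file and the code I wrote (the XML file and the Python file can also be downloaded here: http://www.ranialabib.com/#!python/icfwa) <?xml version="1.0" encoding="UTF-8"?> <xml> <records> <record> <database name="My Collection.enl" path="My Collection.enl">My Collection.enl</database> <ref-type name="Book">1</ref-type> <contributors> <authors> <author>AIA Research Corporation</author> </authors> </contributors> <titles> <title>Regional guidelines for building passive energy conserving homes</title> </titles> <periodical/> <keywords/> <dates> <year>1978</year> </dates> <publisher>Dept. […]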
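
One way to keep the three lists aligned is to loop record by record and append exactly one entry per list for every record, falling back to an empty string when a field is missing. The sketch below uses the standard library's xml.etree.ElementTree instead of BeautifulSoup, purely to stay dependency-free. The sample XML is hypothetical, and the assumption that tags live in `<keyword>` children of `<keywords>` is a guess, since the excerpt only shows an empty `<keywords/>`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the excerpt above. The <keyword> child
# element is an assumption: the excerpt only shows an empty <keywords/>.
SAMPLE_XML = """<xml><records>
<record>
  <contributors><authors><author>Author A</author></authors></contributors>
  <titles><title>Architecture</title></titles>
  <keywords><keyword>Computing; Design</keyword></keywords>
</record>
<record>
  <contributors><authors><author>AIA Research Corporation</author></authors></contributors>
  <titles><title>Regional guidelines for building passive energy conserving homes</title></titles>
  <keywords/>
</record>
</records></xml>"""


def parse_records(xml_text):
    """Return (titles, authors, tags) with exactly one entry per <record>."""
    titles, authors, tags = [], [], []
    root = ET.fromstring(xml_text)
    for record in root.iter("record"):
        titles.append(record.findtext("titles/title", default=""))
        authors.append("; ".join(a.text or "" for a in record.iter("author")))
        # A record without keywords contributes "" so the lists stay aligned.
        tags.append("; ".join((k.text or "").strip() for k in record.iter("keyword")))
    return titles, authors, tags


def tag_matrix(tags, headers):
    """Build one row per book with "X" under every header the book is tagged with."""
    rows = []
    for tag_string in tags:
        present = {t.strip() for t in tag_string.split(";") if t.strip()}
        rows.append(["X" if header in present else "" for header in headers])
    return rows


titles, authors, tags = parse_records(SAMPLE_XML)
marks = tag_matrix(tags, ["Computing", "Design", "Energy"])
```

Each row of `marks` could then be written next to the title and author columns, for example with openpyxl's `worksheet.cell(row=..., column=..., value=...)`.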

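Excel: parsing addresses: I scraped data on some restaurants from a Thai website. I currently have a problem with the address field: wherever the address on the site wraps to the next line, the scraper joined the lines without leaving a space. For example: 22/F, Dusit Thani Bangkok946 Rama 4 RoadBangkokThailand 1/F, Oakwood Residence113 Thonglor Soi 13BangkokThailand G/F, Ocean Tower IISukhumvit Soi 21WattanaBangkokThailand In the first entry I want a space between the k and the 9 and between the d and the B, and likewise for the other entries. I am currently using BeautifulSoup to scrape the data. If anyone can help me fix this, or suggest a better way to scrape the HTML, I am all ears. I would rather not change 280+ address entries by hand.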
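
If the already-scraped strings have to be repaired after the fact, a regular expression can insert a space at the boundaries where lines were fused. The sketch below is a heuristic, not a general address parser: the three boundary rules are assumptions derived only from the three example addresses, and they would also split legitimate tokens such as "4B" or "McDonald".

```python
import re

# Zero-width boundaries where a missing space is assumed:
#   1. lowercase letter followed by an uppercase letter or digit ("RoadBangkok", "Bangkok946")
#   2. digit followed by a letter ("13Bangkok")
#   3. uppercase letter followed by a capitalised word ("IISukhumvit")
FUSED_BOUNDARY = re.compile(
    r"(?<=[a-z])(?=[A-Z0-9])"
    r"|(?<=[0-9])(?=[A-Za-z])"
    r"|(?<=[A-Z])(?=[A-Z][a-z])"
)


def unfuse(address):
    """Insert a single space at every fused boundary."""
    return FUSED_BOUNDARY.sub(" ", address)


addresses = [
    "22/F, Dusit Thani Bangkok946 Rama 4 RoadBangkokThailand",
    "1/F, Oakwood Residence113 Thonglor Soi 13BangkokThailand",
    "G/F, Ocean Tower IISukhumvit Soi 21WattanaBangkokThailand",
]
fixed = [unfuse(a) for a in addresses]
```

If the page can be scraped again, it is probably cleaner to avoid the problem at the source: BeautifulSoup's `get_text(separator=" ", strip=True)` puts a separator between text fragments, so fragments divided by line-break elements no longer run together.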

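Python 3.5 | Splitting a list and exporting it to Excel or CSV: I scraped a website with Python 3.5 (BeautifulSoup) and the result is a list. The values are stored in a variable named "project_titles". The values look like this: project_titles = ["I'm Back. Raspberry Pi unique Case for your Analog Cameras", 'CitizenSpring – App to crowdsource & map safe drinking water', 'Shoka Bell: The Ultimate City Cycling Tool'] I want to split the values at the commas and export them to an Excel file or a CSV. I need the values in Excel like this: cell A1: I'm Back. Raspberry Pi unique Case for your Analog Cameras; cell B1: CitizenSpring – App to crowdsource & map safe drinking water; cell C1: Shoka Bell: The Ultimate City Cycling Tool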
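
Since `project_titles` is already a Python list, no splitting at commas should be needed: writing the list as a single CSV row places each title in its own cell (A1, B1, C1). A minimal sketch with the standard csv module follows; it writes to an in-memory buffer, and the helper name `titles_to_csv_row` is made up for illustration.

```python
import csv
import io

project_titles = [
    "I'm Back. Raspberry Pi unique Case for your Analog Cameras",
    "CitizenSpring – App to crowdsource & map safe drinking water",
    "Shoka Bell: The Ultimate City Cycling Tool",
]


def titles_to_csv_row(titles):
    """Serialise the list as one CSV row, so each title lands in its own column."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(titles)
    return buffer.getvalue()


csv_text = titles_to_csv_row(project_titles)

# To write a real file instead, something like this should work:
# with open("project_titles.csv", "w", newline="", encoding="utf-8-sig") as f:
#     csv.writer(f).writerow(project_titles)
```

The `utf-8-sig` encoding in the commented variant is a common way to make Excel detect UTF-8, which matters here because one title contains an en dash.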

BeautifulSoup + xlwt: putting the contents of an HTML table into Excel: I am trying (with a small Python script) to put the contents of an HTML table from an online web page into an Excel worksheet. Everything works fine except the "Excel stuff". #!/usr/bin/python # -*- coding:UTF-8 -*- import xlwt from urllib2 import urlopen import sys import re from bs4 import BeautifulSoup as soup import urllib def BULATS_IA(name_excel): """ Function for fetching the BULATS AGENTS GLOBAL LIST""" ws = wb.add_sheet("BULATS_IA") # I add a sheet in my excel file Countries_List = ['United Kingdom','Albania','Andorra'] Longueur = len(Countries_List) […]

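How to use BeautifulSoup and Python to get the content under a specific column of a Wikipedia table: I need to get the href links that the contents under a specific column of a Wikipedia table point to. The page is "http://en.wikipedia.org/wiki/List_of_Telugu_films_of_2015". On this page there are several tables of class "wikitable". For each row, I need the links that the contents under the column Title point to. I would like to copy them into an Excel sheet. I do not know the exact code for searching under a specific column, but I got this far and I am getting a "'NoneType' object is not callable" error. I am using bs4. I tried to extract at least part of the table so that I could work out how to narrow it down to the href links under the Title column, but I end up with this error. The code is as follows: from urllib.request import urlopen from bs4 import BeautifulSoup soup = BeautifulSoup(urlopen('http://en.wikipedia.org/wiki/List_of_Telugu_films_of_2015').read()) for row in soup('table', {'class': 'wikitable'})[1].tbody('tr'): tds = row('td') print (tds[0].string, tds[0].string) A little guidance would be appreciated. Does anyone know?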
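
A likely cause of the error is that the parsed table has no `<tbody>` child: in that case `table.tbody` is `None`, and calling `None('tr')` raises "'NoneType' object is not callable". Whether a `<tbody>` appears depends on the page source and on the parser in use. The sketch below sidesteps the issue by searching for `tr` rows directly and picking the column by its header text. The inline HTML is a hypothetical stand-in for the Wikipedia page, `links_in_column` is an illustrative helper, and the code ignores rowspan/colspan, which real Wikipedia tables often use.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one "wikitable"; note there is no <tbody> in the markup.
HTML = """
<table class="wikitable">
<tr><th>Title</th><th>Director</th></tr>
<tr><td><a href="/wiki/Film_A">Film A</a></td><td><a href="/wiki/Director_A">Director A</a></td></tr>
<tr><td>Film B</td><td><a href="/wiki/Director_B">Director B</a></td></tr>
</table>
"""


def links_in_column(table, header_text):
    """Return the first href in the given column for each data row (None if absent)."""
    rows = table.find_all("tr")
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    column = headers.index(header_text)
    links = []
    for row in rows[1:]:
        cells = row.find_all("td")
        if column >= len(cells):
            continue  # short row, e.g. because of a rowspan above it
        anchor = cells[column].find("a", href=True)
        links.append(anchor["href"] if anchor else None)
    return links


soup = BeautifulSoup(HTML, "html.parser")
table = soup.find_all("table", class_="wikitable")[0]
title_links = links_in_column(table, "Title")
director_links = links_in_column(table, "Director")
```

Passing an explicit parser name such as "html.parser" also keeps the behaviour the same across machines, because BeautifulSoup otherwise picks whichever parser happens to be installed.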

VBA to Python conversion using BeautifulSoup: For Each hdiv In doc.getElementsByClassName("offset1 transport-plan location-detail well well-white margin20right") For Each child In hdiv.Children If child.tagName = "H4" Then location = child.innerText ElseIf child.tagName = "TABLE" Then If row.tagName = "TBODY" Then For Each row1 In row.Children do something If row1.tagName = "TR" Then For Each row2 In row1.Children If row2.innerText <> "" Then […]

Scraping box scores with BeautifulSoup and exporting to Excel with pandas: I have been trying to figure out how to scrape baseball box scores from Fangraphs using Python 3.6 with the BeautifulSoup and pandas modules. My end goal is to save different sections of the web page to different sheets in Excel. To do that, I think I have to pull each table separately by its id tag. Here is the code for the four tables (below the graph on the page) that will make up the first Excel sheet. Running the code results in this error: Traceback (most recent call last): File "Fangraphs Box Score Scraper.py", line 14, in <module> df1 = pd.read_html(soup,attrs={'id': ['WinsBox1_dghb','WinsBox1_dghp','WinsBox1_dgab','WinsBox1_dgap']}) File "C:\Python36\lib\site-packages\pandas\io\html.py", line 906, in read_html keep_default_na=keep_default_na) File "C:\Python36\lib\site-packages\pandas\io\html.py", line 743, in _parse raise_with_traceback(retained) File "C:\Python36\lib\site-packages\pandas\compat\__init__.py", line 344, in raise_with_traceback raise exc.with_traceback(traceback) TypeError: 'NoneType' object is not callable import requests from […]

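Python – Formatting data into an Excel spreadsheet using pandas: I want two columns of data, team name and line. However, all of my input is just being placed in cell B1. (Note the commented-out code at the bottom of my snippet.) I think I need to loop over my lists with a for loop to get all the teams down column A and the lines down column B, but I just can't wrap my head around it using pandas. Any help would be greatly appreciated! Thanks team = [] line = [] # Each row in table find all rows with class name team for tr in table.find_all("tr", class_="team"): # Place all text with identifier 'name' in list named team for td in tr.find_all("td", ["name"]): team.append(td.text.strip()) for tr in table.find_all("tr", class_="team"): for td in tr.find_all("td", […]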
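
With pandas, an explicit loop over the cells is usually unnecessary: two equal-length lists can be passed as the columns of a DataFrame, which then writes one team per row. The sketch below uses made-up values for `team` and `line`, since the scraped data is not shown in the excerpt, and the column names "Team" and "Line" are likewise illustrative.

```python
import pandas as pd

# Hypothetical scraped values; the real lists come from the table loop in the question.
team = ["Team A", "Team B", "Team C"]
line = ["-3.5", "+7", "-1"]

# Each dict key becomes a column, so teams run down the first column and
# lines down the second. The lists must have the same length.
df = pd.DataFrame({"Team": team, "Line": line})

# Writing to Excel would then be a single call (requires an Excel engine
# such as openpyxl to be installed):
# df.to_excel("lines.xlsx", index=False)
```

If the two lists can differ in length, for example when a row lacks a line, the DataFrame constructor raises an error, so it may be safer to collect both values per row in a single loop.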

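Writing to an Excel file by specifying row and column numbers – openpyxl: I have code written with the XLWT library, and I am now switching to openpyxl because it supports XLSX files, which allow more rows than XLWT does. XLWT lets me write a cell by specifying its row number and column number: worksheet.write(1, 2, "City") Now I would like to know how to do this in openpyxl. I tried worksheet.cell(1, 1).value = "test" but I get an error: AttributeError: 'int' object has no attribute 'replace'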
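
The error suggests an openpyxl version whose `cell()` method took a coordinate string as its first positional argument, so the integer `1` was treated as a string. Passing `row` and `column` as keyword arguments avoids that ambiguity. The sketch below builds the workbook in memory. Keep in mind that openpyxl counts rows and columns from 1 while xlwt counts from 0, so xlwt's `write(1, 2, "City")` corresponds to row 2, column 3 (cell C2).

```python
from openpyxl import Workbook

workbook = Workbook()
worksheet = workbook.active

# Equivalent of xlwt's worksheet.write(1, 2, "City"): xlwt indices are
# 0-based, openpyxl indices are 1-based, so this targets cell C2.
worksheet.cell(row=2, column=3, value="City")

# The assignment form from the question, with keyword arguments.
worksheet.cell(row=1, column=1).value = "test"

# workbook.save("output.xlsx")  # uncomment to write the file to disk
```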