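How to extract PNG attachments from an Excel XLS file?

I often need to extract images that were copy-pasted into Excel files. Unfortunately, these files arrive in the nasty XLS format. Since the simple unzip trick does not work on them, I decided to try writing a small Python script to do it myself.

(Extracting the images by hand is painful, because I have to copy-paste each one into Paint to save it. There is no "Save as…" or "Export" button.)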

If you look at the PNG reference (or already know it), you will see that a PNG basically starts with a \x89PNG signature and ends with an IEND chunk.
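To make the signature/chunk layout concrete, here is a minimal sketch of a chunk walker. It assumes the standard PNG layout (8-byte signature, then chunks of 4-byte big-endian length, 4-byte type, payload, and a CRC-32 over type plus payload). The names `iter_png_chunks` and `make_chunk` are made up for this illustration and do not come from any library.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(data):
    """Yield (chunk_type, payload, crc_ok) for each chunk, stopping after IEND."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        payload = data[pos + 8:pos + 8 + length]
        crc_bytes = data[pos + 8 + length:pos + 12 + length]
        if len(payload) < length or len(crc_bytes) < 4:
            raise ValueError("truncated chunk at offset {}".format(pos))
        (stored_crc,) = struct.unpack(">I", crc_bytes)
        # The CRC covers the chunk type and the payload, not the length field.
        crc_ok = (zlib.crc32(ctype + payload) & 0xFFFFFFFF) == stored_crc
        yield ctype, payload, crc_ok
        pos += 12 + length
        if ctype == b"IEND":
            break


def make_chunk(ctype, payload):
    """Hypothetical helper: build one well-formed chunk for the demo below."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


# Demo on a synthetic two-chunk file (not a displayable image).
demo = PNG_SIGNATURE + make_chunk(b"IHDR", b"\x00" * 13) + make_chunk(b"IEND", b"")
for ctype, payload, crc_ok in iter_png_chunks(demo):
    print(ctype.decode("ascii"), len(payload), "CRC ok" if crc_ok else "CRC mismatch")
```

Walking the chunks this way reports the first chunk whose CRC does not match, which is roughly the check that pngcheck performs.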

So I tried the following code:

    import sys


    def info(s):
        print("[i] " + s)


    def error(s):
        print("[!] " + s)


    info("Opening file: " + sys.argv[1])
    with open(sys.argv[1], 'rb') as f:
        buf = f.read()
    info("File read")

    # Look for the 8-byte PNG signature.
    offset_s = buf.find(b'\x89PNG\x0D\x0A\x1A\x0A')
    if offset_s == -1:
        error("PNG not found")
        sys.exit(1)
    else:
        info("PNG start found at offset: {}".format(offset_s))

    # Look for the IEND chunk type that follows the signature.
    offset_e = buf.find(b'IEND', offset_s)
    if offset_e == -1:
        error("PNG not found")
        sys.exit(1)
    else:
        offset_e += 8  # 4 bytes of chunk type + 4 bytes of CRC
        info("PNG end found at offset: {}".format(offset_e))

    with open("out.png", "wb") as f:
        f.write(buf[offset_s:offset_e])
    info("Written to out.png")

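So it does extract data. But the PNG data is corrupted (in the IDAT chunk), so the image cannot be displayed properly. Here is the result of a pngcheck run: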

    File: out.png (221879 bytes)
      chunk IHDR at offset 0x0000c, length 0
        1366 x 768 image, 24-bit RGB, non-interlaced
      chunk sRGB at offset 0x00025, length 0
        rendering intent = perceptual
      chunk pHYs at offset 0x00032, length 0: 3780x3780 pixels/meter (96 dpi)
      chunk IDAT at offset 0x00047, length 0
        zlib: deflated, 32K window, fast compression
      CRC error in chunk IDAT (actual 632dd60d, should be 5985ed29)
      Chunk name fffffffb 02 ffffff8a 5e doesn't conform to naming rules.
      chunk ?? at offset 0x10008, length 0

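Do you think (or know for a fact? I could not find this information when I looked) that Excel stores PNG files with a specific (or even proprietary) filter/compression algorithm?

Any idea how I can get this to work?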

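Edit, research follow-up: I have pushed the analysis further. I took a larger image, put it in a blank Excel file, and saved it as XLS.

Then I extracted it with my previous tool and wrote a new one that shows the 4-byte items added by Excel. The code is as follows: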

    import sys


    def info(s):
        print("[i] " + s)


    def die(s):
        print("[!] " + s)
        sys.exit(-1)


    info("Opening original file: " + sys.argv[1])
    i = 0  # current offset in the changed file
    with open(sys.argv[1], 'rb') as original:
        info("Opening changed file: " + sys.argv[2])
        with open(sys.argv[2], 'rb') as changed:
            o_byte = original.read(1)
            c_byte = changed.read(1)
            while o_byte != b"":
                if c_byte == b"":
                    die("Error reading from changed file.")
                # On a mismatch, assume 4 bytes were inserted: print and skip them.
                while c_byte != o_byte:
                    info("{:08X} - Found diff: 0x{:02X} 0x{:02X} 0x{:02X} 0x{:02X}".format(
                        i, ord(c_byte), ord(changed.read(1)),
                        ord(changed.read(1)), ord(changed.read(1))))
                    i += 4
                    c_byte = changed.read(1)
                o_byte = original.read(1)
                c_byte = changed.read(1)
                i += 1

Running it against my original PNG and the PNG extracted from the XLS, I get the following output:

    [i] Opening original file: test1.PNG
    [i] Opening changed file: out.png
    [i] 00001FAB - Found diff: 0xEB 0x00 0x20 0x20
    [i] 00003FCF - Found diff: 0x3C 0x00 0x20 0x20
    [i] 00005FF3 - Found diff: 0x3C 0x00 0x20 0x20
    [i] 00008017 - Found diff: 0x3C 0x00 0x20 0x20
    [i] 000090BE - Found diff: 0x81 0x00 0x00 0x00
    [i] 000090C2 - Found diff: 0x82 0x00 0x00 0x00
    [i] 000090C6 - Found diff: 0x83 0x00 0x00 0x00
    [i] 000090CA - Found diff: 0x84 0x00 0x00 0x00
    [i] 000090CE - Found diff: 0x85 0x00 0x00 0x00
    [i] 000090D2 - Found diff: 0x86 0x00 0x00 0x00
    [i] 000090D6 - Found diff: 0x87 0x00 0x00 0x00
    [i] 000090DA - Found diff: 0x88 0x00 0x00 0x00
    [i] 000090DE - Found diff: 0x89 0x00 0x00 0x00
    [i] 000090E2 - Found diff: 0x8A 0x00 0x00 0x00
    [i] 000090E6 - Found diff: 0x8B 0x00 0x00 0x00
    [i] 000090EA - Found diff: 0x8C 0x00 0x00 0x00
    [i] 000090EE - Found diff: 0x8D 0x00 0x00 0x00
    [i] 000090F2 - Found diff: 0x8E 0x00 0x00 0x00
    [i] 000090F6 - Found diff: 0x8F 0x00 0x00 0x00
    [i] 000090FA - Found diff: 0x90 0x00 0x00 0x00
    [i] 000090FE - Found diff: 0x91 0x00 0x00 0x00
    [i] 00009102 - Found diff: 0x92 0x00 0x00 0x00
    [i] 00009106 - Found diff: 0x93 0x00 0x00 0x00
    [i] 0000910A - Found diff: 0x94 0x00 0x00 0x00
    [i] 0000910E - Found diff: 0x95 0x00 0x00 0x00
    [i] 00009112 - Found diff: 0x96 0x00 0x00 0x00
    [i] 00009116 - Found diff: 0x97 0x00 0x00 0x00
    [i] 0000911A - Found diff: 0x98 0x00 0x00 0x00
    [i] 0000911E - Found diff: 0x99 0x00 0x00 0x00
    [i] 00009122 - Found diff: 0x9A 0x00 0x00 0x00
    [i] 00009126 - Found diff: 0x9B 0x00 0x00 0x00
    [i] 0000912A - Found diff: 0x9C 0x00 0x00 0x00
    [i] 0000912E - Found diff: 0x9D 0x00 0x00 0x00
    [i] 00009132 - Found diff: 0x9E 0x00 0x00 0x00
    [i] 00009136 - Found diff: 0x9F 0x00 0x00 0x00
    [i] 0000913A - Found diff: 0xA0 0x00 0x00 0x00
    [i] 0000913E - Found diff: 0xA1 0x00 0x00 0x00
    [i] 00009142 - Found diff: 0xA2 0x00 0x00 0x00
    [i] 00009146 - Found diff: 0xA3 0x00 0x00 0x00
    [i] 0000914A - Found diff: 0xA4 0x00 0x00 0x00
    [i] 0000914E - Found diff: 0xA5 0x00 0x00 0x00
    [i] 00009152 - Found diff: 0xA6 0x00 0x00 0x00
    [i] 00009156 - Found diff: 0xA7 0x00 0x00 0x00
    [i] 0000915A - Found diff: 0xA8 0x00 0x00 0x00
    [i] 0000915E - Found diff: 0xA9 0x00 0x00 0x00
    [i] 00009162 - Found diff: 0xAA 0x00 0x00 0x00
    [i] 00009166 - Found diff: 0xAB 0x00 0x00 0x00
    [i] 0000916A - Found diff: 0xAC 0x00 0x00 0x00
    [i] 0000916E - Found diff: 0xAD 0x00 0x00 0x00
    [i] 00009172 - Found diff: 0xAE 0x00 0x00 0x00
    [i] 00009176 - Found diff: 0xAF 0x00 0x00 0x00
    [i] 0000917A - Found diff: 0xB0 0x00 0x00 0x00
    [i] 0000917E - Found diff: 0xB1 0x00 0x00 0x00
    [i] 00009182 - Found diff: 0xB2 0x00 0x00 0x00
    [i] 00009186 - Found diff: 0xB3 0x00 0x00 0x00
    [i] 0000918A - Found diff: 0xB4 0x00 0x00 0x00
    [i] 0000918E - Found diff: 0xB5 0x00 0x00 0x00
    [i] 00009192 - Found diff: 0xB6 0x00 0x00 0x00
    [i] 00009196 - Found diff: 0xB7 0x00 0x00 0x00
    [i] 0000919A - Found diff: 0xB8 0x00 0x00 0x00
    [i] 0000919E - Found diff: 0xB9 0x00 0x00 0x00
    [i] 000091A2 - Found diff: 0xBA 0x00 0x00 0x00
    [i] 000091A6 - Found diff: 0xBB 0x00 0x00 0x00
    [i] 000091AA - Found diff: 0xBC 0x00 0x00 0x00
    [i] 000091AE - Found diff: 0xBD 0x00 0x00 0x00
    [i] 000091B2 - Found diff: 0xBE 0x00 0x00 0x00
    [i] 000091B6 - Found diff: 0xBF 0x00 0x00 0x00
    [i] 000091BA - Found diff: 0xC0 0x00 0x00 0x00
    [i] 000091BE - Found diff: 0xC1 0x00 0x00 0x00
    [i] 000091C2 - Found diff: 0xC2 0x00 0x00 0x00
    [i] 000091C6 - Found diff: 0xC3 0x00 0x00 0x00
    [i] 000091CA - Found diff: 0xC4 0x00 0x00 0x00
    [i] 000091CE - Found diff: 0xC5 0x00 0x00 0x00
    [i] 000091D2 - Found diff: 0xC6 0x00 0x00 0x00
    [i] 000091D6 - Found diff: 0xC7 0x00 0x00 0x00
    [i] 000091DA - Found diff: 0xC8 0x00 0x00 0x00
    [i] 000091DE - Found diff: 0xC9 0x00 0x00 0x00
    [i] 000091E2 - Found diff: 0xCA 0x00 0x00 0x00
    [i] 000091E6 - Found diff: 0xCB 0x00 0x00 0x00
    [i] 000091EA - Found diff: 0xCC 0x00 0x00 0x00
    [i] 000091EE - Found diff: 0xCD 0x00 0x00 0x00
    [i] 000091F2 - Found diff: 0xCE 0x00 0x00 0x00
    [i] 000091F6 - Found diff: 0xCF 0x00 0x00 0x00
    [i] 000091FA - Found diff: 0xD0 0x00 0x00 0x00
    [i] 000091FE - Found diff: 0xD1 0x00 0x00 0x00
    [i] 00009202 - Found diff: 0xD2 0x00 0x00 0x00
    [i] 00009206 - Found diff: 0xD3 0x00 0x00 0x00
    [i] 0000920A - Found diff: 0xD4 0x00 0x00 0x00
    [i] 0000920E - Found diff: 0xD5 0x00 0x00 0x00
    [i] 00009212 - Found diff: 0xD6 0x00 0x00 0x00
    [i] 00009216 - Found diff: 0xD7 0x00 0x00 0x00
    [i] 0000921A - Found diff: 0xD8 0x00 0x00 0x00
    [i] 0000921E - Found diff: 0xD9 0x00 0x00 0x00
    [i] 00009222 - Found diff: 0xDA 0x00 0x00 0x00
    [i] 00009226 - Found diff: 0xDB 0x00 0x00 0x00
    [i] 0000922A - Found diff: 0xDC 0x00 0x00 0x00
    [i] 0000922E - Found diff: 0xDD 0x00 0x00 0x00
    [i] 00009232 - Found diff: 0xDE 0x00 0x00 0x00
    [i] 00009236 - Found diff: 0xDF 0x00 0x00 0x00
    [i] 0000923A - Found diff: 0xE0 0x00 0x00 0x00
    [i] 0000923E - Found diff: 0xE1 0x00 0x00 0x00
    [i] 00009242 - Found diff: 0xE2 0x00 0x00 0x00
    [i] 00009246 - Found diff: 0xE3 0x00 0x00 0x00
    [i] 0000924A - Found diff: 0xE4 0x00 0x00 0x00
    [i] 0000924E - Found diff: 0xE5 0x00 0x00 0x00
    [i] 00009252 - Found diff: 0xE6 0x00 0x00 0x00
    [i] 00009256 - Found diff: 0xE7 0x00 0x00 0x00
    [i] 0000925A - Found diff: 0xE8 0x00 0x00 0x00
    [i] 0000925E - Found diff: 0xE9 0x00 0x00 0x00
    [i] 00009262 - Found diff: 0xEA 0x00 0x00 0x00
    [i] 00009266 - Found diff: 0xEB 0x00 0x00 0x00
    [i] 0000926A - Found diff: 0xEC 0x00 0x00 0x00
    [i] 0000926E - Found diff: 0xED 0x00 0x00 0x00
    [i] 00009272 - Found diff: 0xEE 0x00 0x00 0x00
    [i] 00009276 - Found diff: 0xEF 0x00 0x00 0x00
    [i] 0000927A - Found diff: 0xF0 0x00 0x00 0x00
    [i] 0000927E - Found diff: 0xF1 0x00 0x00 0x00
    [i] 00009282 - Found diff: 0xF2 0x00 0x00 0x00
    [i] 00009286 - Found diff: 0xF3 0x00 0x00 0x00
    [i] 0000928A - Found diff: 0xFE 0xFF 0xFF 0xFF
    [i] 0000928E - Found diff: 0xFE 0xFF 0xFF 0xFF
    [i] 00009292 - Found diff: 0xF6 0x00 0x00 0x00
    [i] 00009296 - Found diff: 0xFE 0xFF 0xFF 0xFF
    [i] 0000929A - Found diff: 0xFE 0xFF 0xFF 0xFF
    [i] 0000929E - Found diff: 0xFF 0xFF 0xFF 0xFF
    [i] 000092A2 - Found diff: 0xFF 0xFF 0xFF 0xFF
    [i] 000092A6 - Found diff: 0xFF 0xFF 0xFF 0xFF
    [i] 000092AA - Found diff: 0xFF 0xFF 0xFF 0xFF
    [i] 000092AE - Found diff: 0xFF 0xFF 0xFF 0xFF
    [i] 000092B2 - Found diff: 0xFF 0xFF 0xFF 0xFF
    [i] 000092B6 - Found diff: 0xFF 0xFF 0xFF 0xFF
    [i] 000092BA - Found diff: 0xFF 0xFF 0xFF 0xFF
    [i] 0000A23B - Found diff: 0x3C 0x00 0x20 0x20
    [i] 0000C25F - Found diff: 0x3C 0x00 0x20 0x20
    [i] 0000E283 - Found diff: 0x3C 0x00 0x20 0x20
    [i] 000102A7 - Found diff: 0x3C 0x00 0x20 0x20
    [i] 000122CB - Found diff: 0x3C 0x00 0x20 0x20
    [i] 000142EF - Found diff: 0x3C 0x00 0x20 0x20
    [i] 00016313 - Found diff: 0x3C 0x00 0x20 0x20
    [i] 00018337 - Found diff: 0x3C 0x00 0x20 0x20
    [i] 0001A35B - Found diff: 0x3C 0x00 0x0D 0x0B

Who is this 0x3C guy? And why does Excel start counting at some point? (0x83 …)

Edit, additional pointer: it seems that 0x003C is the identifier of the CONTINUE record in the Excel file format, as described in https://www.openoffice.org/sc/excelfileformat.pdf
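If the inserted 4-byte items are record headers (a 2-byte little-endian record identifier followed by a 2-byte little-endian payload size, which matches `0x3C 0x00 0x20 0x20` = id 0x003C, size 0x2020), then the image bytes could be recovered by dropping the headers and concatenating the payloads. The sketch below makes several assumptions: `stream` holds contiguous record bytes (the raw .xls file may not be contiguous), `start` points at a record header, and the identifiers to join are simply the two seen in the diff output, 0x00EB and 0x003C. The name `join_payloads` is hypothetical.

```python
import struct


def join_payloads(stream, start, ids=(0x00EB, 0x003C)):
    """Concatenate payloads of consecutive records whose id is in `ids`.

    Assumes each record is: 2-byte id, 2-byte size (both little-endian),
    then `size` payload bytes. Stops at the first record with another id.
    """
    out = bytearray()
    pos = start
    while pos + 4 <= len(stream):
        rec_id, size = struct.unpack_from("<HH", stream, pos)
        if rec_id not in ids:
            break
        out += stream[pos + 4:pos + 4 + size]
        pos += 4 + size
    return bytes(out)


# Tiny synthetic demo: "HELLO" split over a 0x00EB record and a CONTINUE record.
demo = (struct.pack("<HH", 0x00EB, 3) + b"HEL"
        + struct.pack("<HH", 0x003C, 2) + b"LO"
        + struct.pack("<HH", 0x000A, 0))  # some other record ends the run
print(join_payloads(demo, 0))
```

On real data, the joined payload would still need to be searched for the PNG signature, since the first record may carry other data before the image.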

And the counting might be the SSAT table of the compound file, but I am not sure.
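One way to test the allocation-table guess is to decode the 4-byte items as little-endian signed 32-bit integers. In the compound file format, an allocation table entry holds the id of the next sector in a chain, with -2 (0xFFFFFFFE) marking the end of a chain and -1 (0xFFFFFFFF) marking a free sector. The 128 items from 0x90BE through 0x92BA also add up to 512 bytes, a common sector size. This is only an interpretation of the diff output, and the sketch cannot tell whether the table is the SAT or the SSAT. The function names are made up for illustration.

```python
import struct

END_OF_CHAIN = -2  # 0xFFFFFFFE
FREE_SECTOR = -1   # 0xFFFFFFFF


def decode_sector_ids(raw):
    """Interpret raw bytes as little-endian signed 32-bit sector ids."""
    count = len(raw) // 4
    return list(struct.unpack("<{}i".format(count), raw[:count * 4]))


def describe(sector_id):
    if sector_id == END_OF_CHAIN:
        return "end of chain"
    if sector_id == FREE_SECTOR:
        return "free sector"
    return "next sector is {}".format(sector_id)


# A few items copied from the diff output around offset 0x9282.
sample = bytes.fromhex(
    "f2000000" "f3000000" "feffffff" "feffffff" "f6000000" "feffffff" "ffffffff"
)
for value in decode_sector_ids(sample):
    print(describe(value))
```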

But I still have no idea about 0xEB.

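If you are running on Windows, or if using a virtual machine is acceptable for this task, you may want to use the COM interface to do this; you can even drive that interface from Python with pywin32. See this question, for example: Export Charts from Excel as images using Python.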