Perl (or Python) and Excel: is there a way to determine the font types used within multi-line text in a cell?

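My data arrives as an Excel file in which some cells contain strings that include earlier versions of the data, marked with strikethrough characters. I know how to parse and manipulate Excel files using Perl and OLE, but as far as I can see, text formatting is only accessible at the cell level. Is there a way to access formatting on a per-character basis? The goal is to find and remove all text that is formatted as strikethrough.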

Here is a VBA solution, since I don't have Python installed on my machine. Hopefully it shows how to access the formatting of individual characters.

The test text is in Range("A1").

    Option Explicit

    Sub test()
        Dim wb As Workbook
        Dim ws As Worksheet
        Dim sentence As Range
        Dim i As Long

        Set wb = ThisWorkbook
        Set ws = wb.ActiveSheet
        Set sentence = ws.Range("A1")

        With sentence
            ' Check the font of each character in the cell, one at a time
            For i = 1 To .Characters.Count
                If .Characters(i, 1).Font.Strikethrough Then
                    Debug.Print "strikethrough at character " & i
                End If
            Next i
        End With
    End Sub

This gives the output:

    strikethrough at character 17
    strikethrough at character 18
    strikethrough at character 19
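Once the struck positions are known, removing them is a simple filter. Here is a minimal sketch in Python (which the question also allows) that works on a plain string plus a list of 1-based positions like the ones printed by the loop above. It does not talk to Excel at all; the function name `drop_positions` and the sample sentence are made up for illustration.

```python
def drop_positions(text, struck_positions):
    """Return text without the characters at the given 1-based positions."""
    struck = set(struck_positions)
    # enumerate from 1 so indices match the VBA-style character numbering
    return "".join(ch for i, ch in enumerate(text, start=1) if i not in struck)


if __name__ == "__main__":
    sample = "The quick brown fox jumps"  # hypothetical cell content
    print(drop_positions(sample, [17, 18, 19]))
```

With this sample, characters 17 to 19 spell "fox", so the result keeps both surrounding spaces; whether to collapse them afterwards depends on the data.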

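Spreadsheet::ParseExcel can access both simple cells and complex cells that contain multiple formats. Complex cells use rich text formatting, which you can access with the $cell->get_rich_text() method. Below is an example that looks for strikeout formatting, whether it applies to a whole cell or to part of a multi-format cell. It is adapted from the synopsis in perldoc Spreadsheet::ParseExcel.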

lazy_dog.png

parse_lazy_dog.pl

    #!/usr/bin/env perl
    use warnings;
    use strict;

    use Spreadsheet::ParseExcel;

    my $file     = 'lazy_dog.xls';
    my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->parse($file);

    if ( !defined $workbook ) {
        die $parser->error(), ".\n";
    }

    for my $worksheet ( $workbook->worksheets() ) {
        my ( $row_min, $row_max ) = $worksheet->row_range();
        my ( $col_min, $col_max ) = $worksheet->col_range();

        for my $row ( $row_min .. $row_max ) {
            for my $col ( $col_min .. $col_max ) {
                my $cell = $worksheet->get_cell( $row, $col );
                next unless $cell;

                print "Row, Col = ($row, $col)\n";
                print "Value = ", $cell->value(), "\n";
                print "Unformatted Value = ", $cell->unformatted(), "\n";

                if ( my $rich = $cell->get_rich_text() ) {
                    # Multiple formats inside one cell.
                    # Each entry is [ start position, font ] for one run of text.
                    print " STRIKEOUT -> ";
                    my $pos = 0;
                    for my $rich_elem (@$rich) {
                        my ( $char_pos, $font ) = @$rich_elem;
                        if ( $font->{Strikeout} ) {
                            while ( $pos++ < $char_pos ) { print " "; }
                        }
                        else {
                            while ( $pos++ <= $char_pos ) { print "^"; }
                        }
                    }
                    print "\n";
                }
                else {
                    # Entire cell has same format
                    my $format       = $cell->get_format();
                    my $is_strikeout = $format->{Font}->{Strikeout};
                    if ($is_strikeout) {
                        print " STRIKEOUT -> ";
                        print "^" x ( length( $cell->unformatted() ) );
                        print "\n";
                    }
                    print "\n";
                }
            }
        }
    }

Output

    Row, Col = (0, 0)
    Value = The
    Unformatted Value = The

    Row, Col = (0, 1)
    Value = quick
    Unformatted Value = quick

    Row, Col = (0, 2)
    Value = brown
    Unformatted Value = brown

    Row, Col = (0, 3)
    Value = fox
    Unformatted Value = fox

    Row, Col = (0, 4)
    Value = jumped
    Unformatted Value = jumped

    Row, Col = (0, 5)
    Value = under
    Unformatted Value = under
     STRIKEOUT -> ^^^^^

    Row, Col = (0, 6)
    Value = over
    Unformatted Value = over

    Row, Col = (0, 7)
    Value = the
    Unformatted Value = the

    Row, Col = (0, 8)
    Value = lazy
    Unformatted Value = lazy

    Row, Col = (0, 9)
    Value = dog.
    Unformatted Value = dog.

    Row, Col = (1, 0)
    Value = THE QUICK BROWN FOX JUMPED UNDER OVER THE LAZY DOG.
    Unformatted Value = THE QUICK BROWN FOX JUMPED UNDER OVER THE LAZY DOG.
     STRIKEOUT -> ^^^^^
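The script above only marks the struck text, while the stated goal is to remove it. As a rough sketch of that last step, the Python below (the question allows Python) treats a cell as plain text plus a list of formatting runs. The `(start, is_struck)` tuples are a simplified stand-in for the [start position, font] pairs that get_rich_text() returns, and the names `split_runs`, `remove_struck` and `marker_line` are hypothetical. It assumes 0-based start positions sorted in ascending order, and that text before the first run uses the cell-level format.

```python
def split_runs(text, runs, base_strike=False):
    """Yield (substring, is_struck) for each formatting run.

    runs: list of (start, is_struck) pairs, 0-based and sorted by start
    (an assumed, simplified model of rich-text run data).
    Text before the first run uses base_strike, the cell-level setting.
    """
    pos, struck = 0, base_strike
    for start, run_struck in runs:
        if start > pos:
            yield text[pos:start], struck
        pos, struck = max(pos, start), run_struck
    if pos < len(text):
        yield text[pos:], struck  # the last run extends to the end of the text


def remove_struck(text, runs, base_strike=False):
    """Return text with every struck-through run removed."""
    return "".join(part for part, struck in split_runs(text, runs, base_strike)
                   if not struck)


def marker_line(text, runs, base_strike=False):
    """Return a line with '^' under struck characters and ' ' elsewhere."""
    return "".join(("^" if struck else " ") * len(part)
                   for part, struck in split_runs(text, runs, base_strike))


if __name__ == "__main__":
    text = "THE QUICK BROWN FOX JUMPED UNDER OVER THE LAZY DOG."
    start = text.index("UNDER")
    runs = [(start, True), (start + 5, False)]  # hypothetical run data
    print(text)
    print(marker_line(text, runs))
    print(remove_struck(text, runs))
```

Tracking the current run's state explicitly, rather than assuming struck and plain runs alternate, also handles a struck run that extends to the end of the cell.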