Getting OOM when processing an .xls file with PHPExcel

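I am aware of "How to read large worksheets from large Excel files (27MB+) with PHPExcel?" and I have tried to implement the chunked reading discussed in that question, but I am still getting OOM errors. The file itself is only 5 MB, with 9000+ rows (yes, it's over 9000!) and columns ranging from A to V.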

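I do not want the user to have to edit this file in any way before uploading and processing it, because this is currently a purely manual process and I would like to replace it entirely with an automated one. The file is in xls format, which PHPExcel identifies as Excel5.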

My PHP memory limit is currently set to 128M, and the script runs on an Ubuntu server.

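No matter what chunk size I set, I eventually end up with an OOM. It actually runs better with the chunk size set to 1 than with it set to 200 (with 1, I can manage to get to around row 7000). So I believe that "something" is being stored, or loaded into memory, on each iteration of the chunk read and is not discarded afterwards, which eventually leads to the OOM, but I cannot see where this is happening.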
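One way to narrow down where the memory goes is to measure how much each iteration leaves allocated. The following is only a sketch: `measureRetainedBytes()` is a hypothetical helper name, and it relies on PHP's built-in `memory_get_usage()` and `gc_collect_cycles()`. The two closures at the end are toy workloads, not PHPExcel calls.

```php
<?php
/**
 * Hypothetical helper: run one unit of work and report how many bytes
 * are still allocated afterwards compared with before.
 */
function measureRetainedBytes($work)
{
    gc_collect_cycles();              // clear collectable cycles left by earlier work
    $before = memory_get_usage();
    $work();
    gc_collect_cycles();              // give the cycle collector a chance before measuring
    return memory_get_usage() - $before;
}

// Toy workload that keeps its data alive after returning.
$kept = array();
$leaky = measureRetainedBytes(function () use (&$kept) {
    $kept[] = str_repeat('x', 1000000);
});

// Toy workload whose data is freed when the closure returns.
$clean = measureRetainedBytes(function () {
    $tmp = str_repeat('x', 1000000);
});

echo "leaky: $leaky bytes, clean: $clean bytes\n";
```

In the chunk loop, the closure would wrap one full iteration (load, process, disconnect, unset), so a figure that stays large on every pass points at memory that is not being released between chunks.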

I am very much an amateur programmer; this is just something I do in my managed services role to try to make our lives easier.

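The whole point of this code is to read the Excel file, filter out the "junk", and then save the result as a CSV (for now I am just dumping it to the screen rather than to a CSV). At this rate, I am considering calling excel2csv from the PHP script and then cleaning up the CSV instead... but that feels like giving up when I may be close to a solution.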

    <?php
    error_reporting(E_ALL);
    set_time_limit(0);
    date_default_timezone_set('Europe/London');
    require_once 'Classes/PHPExcel/IOFactory.php';

    class chunkReadFilter implements PHPExcel_Reader_IReadFilter
    {
        private $_startRow = 0;
        private $_endRow = 0;
        private $_columns = array();

        /** Set the list of rows that we want to read */
        public function setRows($startRow, $chunkSize, $columns) {
            $this->_startRow = $startRow;
            $this->_endRow = $startRow + $chunkSize;
            $this->_columns = $columns;
        }

        public function readCell($column, $row, $worksheetName = '') {
            // Only read the heading row, and the rows that are configured in $this->_startRow$
            if ($row >= $this->_startRow && $row < $this->_endRow) {
                if(in_array($column,$this->_columns)) {
                    return true;
                }
            }
            return false;
        }
    }

    $target_dir = "uploads/";
    $file_name = $_POST["file_name"];
    $full_path = $target_dir . $file_name;
    echo "Processing ". $file_name . '; <br>';
    ob_flush();
    flush();

    /**
    /** As files maybe large in memory, use a temp file to handle them
    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
    $cacheSettings = array( 'memoryCacheSize' => '8MB');
    PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
    **/

    $inputFileName = $full_path;
    echo 'Excel reader started<br/>';

    /** First we should get the type of file **/
    $filetype = PHPExcel_IOFactory::identify($inputFileName);
    echo 'File of type: ' . $filetype . ' found<br/>';

    /** Load $inputFileName to a PHPExcel Object - https://github.com/PHPOffice/PHPExcel/blob/develop/$
    /** Define how many rows we want to read for each "chunk" **/
    $chunkSize = 1;

    /** Create a new Instance of our Read Filter **/
    $chunkFilter = new chunkReadFilter();
    $objReader = PHPExcel_IOFactory::createReader($filetype);

    /** Tell the Reader that we want to use the Read Filter that we've Instantiated **/
    $objReader->setReadFilter($chunkFilter);

    /** Loop to read our worksheet in "chunk size" blocks **/
    for ($startRow = 2; $startRow <= 65000; $startRow += $chunkSize) {
        $endRow = $startRow+$chunkSize-1;
        echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startR$
        /** Tell the Read Filter, the limits on which rows we want to read this iteration **/
        $chunkFilter->setRows($startRow,$chunkSize,range('A','T'));
        /** Load only the rows that match our filter from $inputFileName to a PHPExcel Object **/
        $objPHPExcel = $objReader->load($inputFileName);

        // Do some processing here
        // $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
        $sheetData = $objPHPExcel->getActiveSheet()->rangeToArray("A$startRow:T$endRow");
        var_dump($sheetData);

        // Clear the variable to not go over memory!
        $objPHPExcel->disconnectWorksheets();
        unset ($sheetData);
        unset ($objPHPExcel);
        ob_flush();
        flush();
        echo '<br /><br />';
    }

    /** This loads the entire file, crashing with OOM
    try {
        $objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
        echo 'loaded sheet into memory<br>';
    } catch(PHPExcel_Reader_Exception $e) {
        die('Error loading file: '.$e->getMessage());
    }

    $objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'CSV');
    echo 'Saving sheet as CSV<br>';
    $objWriter->setSheetIndex(0);
    $objWriter->save('./uploads/'.$file_name.'.csv');
    echo 'Processed 1 sheet';
    ob_flush();
    flush();
    **/

    echo "<body><table>\n\n";

    /**
    $f = fopen($file_name, "r");
    while (($line = fgetcsv($f)) !== false) {
        echo "<tr>";
        foreach ($line as $cell) {
            echo "<td>" . htmlspecialchars($cell) . "</td>";
        }
        echo "</tr>\n";
    }
    fclose($f);
    **/

    echo "\n</table></body></html>";
    ?>
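For the "Do some processing here" step, each chunk's rows could be appended straight to a CSV file with PHP's built-in `fputcsv()` instead of being dumped with `var_dump()`. This is a minimal sketch, not the original code: `appendRowsToCsv()` is a hypothetical helper, and treating a row as junk when every cell is empty is only an assumption, since the actual filtering rules are not stated here.

```php
<?php
/**
 * Hypothetical helper: append the non-empty rows of one chunk to an open CSV handle.
 * $rows is an array of rows, each row an array of cell values.
 * Returns the number of rows written.
 */
function appendRowsToCsv($handle, array $rows)
{
    $written = 0;
    foreach ($rows as $row) {
        // Assumption: a row is junk when every cell is null or an empty string.
        $hasData = false;
        foreach ($row as $cell) {
            if ($cell !== null && $cell !== '') {
                $hasData = true;
                break;
            }
        }
        if ($hasData) {
            // Delimiter, enclosure and escape are passed explicitly (the usual defaults).
            fputcsv($handle, $row, ',', '"', '\\');
            $written++;
        }
    }
    return $written;
}

// Usage with an in-memory stream standing in for the real output file.
$out = fopen('php://memory', 'w+');
$count = appendRowsToCsv($out, array(
    array('A2', 'B2'),
    array(null, ''),      // skipped: every cell is empty
    array('A4', null),
));
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
```

Opening the output file once before the loop and passing the same handle on every iteration keeps only the current chunk's rows in memory.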

The error shown in the Apache log is:

 [Fri Mar 31 15:35:27.982697 2017] [:error] [pid 1059] [client 10.0.2.2:53866] PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 45056 bytes) in /var/www/html/Classes/PHPExcel/Shared/OLERead.php on line 93, referer: http://localhost:8080/upload.php 

 unset ($objPHPExcel); 

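If you check the PHPExcel documentation, you will see that this does not cleanly unset $objPHPExcel, because of the circular references between the spreadsheet, its worksheets and their cells, and it will therefore leak memory. The recommendation is to break those circular references first: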

 $objPHPExcel->disconnectWorksheets(); unset($objPHPExcel); 

There will still be some memory leakage, but this should allow more memory to be freed between chunks.
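To illustrate why a plain unset() is not enough when objects refer to each other, here is a small sketch. DemoBook and DemoSheet are hypothetical stand-ins, not PHPExcel classes, and disconnectSheets() only mimics the idea behind disconnectWorksheets(). The sketch counts destructor calls to show when an object is really released.

```php
<?php
// Hypothetical stand-in for a worksheet that points back at its workbook.
class DemoSheet
{
    public $parent;

    public function __construct($parent)
    {
        $this->parent = $parent;
    }
}

// Hypothetical stand-in for a workbook that owns its sheets.
class DemoBook
{
    public static $destroyed = 0;
    public $sheets = array();

    public function addSheet()
    {
        $this->sheets[] = new DemoSheet($this);   // book -> sheet -> book cycle
    }

    public function disconnectSheets()
    {
        foreach ($this->sheets as $sheet) {
            $sheet->parent = null;                // break the back-reference
        }
        $this->sheets = array();
    }

    public function __destruct()
    {
        self::$destroyed++;
    }
}

// Case 1: plain unset(). The sheet still references the book, so it is not released yet.
$book = new DemoBook();
$book->addSheet();
unset($book);
$afterPlainUnset = DemoBook::$destroyed;

// Case 2: break the cycle first. unset() now releases the book immediately.
$book = new DemoBook();
$book->addSheet();
$book->disconnectSheets();
unset($book);
$afterDisconnect = DemoBook::$destroyed;

// PHP's cycle collector can reclaim the object left over from case 1.
gc_collect_cycles();
$afterGc = DemoBook::$destroyed;
```

Whether an explicit gc_collect_cycles() call after each chunk recovers additional memory with PHPExcel is not something this sketch demonstrates; it only shows the general mechanism.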