POI: handling RecordInputStream$LeftoverDataException when importing Excel files (a troubleshooting record)

Author: 羊村懒王 | 2024-02-16 00:35:51


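Project context:

On-site operations reported that the customer (our "client daddy") hit a "system error" when importing an Excel form. Not every user was affected; only the files imported by a few users failed. Because of network isolation and the way production logs are managed, we could never get hold of the error log, so the on-site team could only ask the home office to help locate the problem.

At the home office, however, no Excel file we imported would reproduce the failure, so we suspected the file itself. We asked the on-site staff to copy the business data into a newly created Excel file and try again. That import succeeded, which meant the problem was only worked around for the time being.

After many requests and several rounds of approval, we finally obtained the original file that failed to import and could start analyzing the problem properly.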

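Problem description

The file: opening the original file in Office showed "Compatibility Mode" in the title bar.

This indicated that the file was most likely exported by a third-party application rather than authored in Excel itself.

The import error: importing this file in a test environment made the error obvious: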


org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x31(FontRecord) left 4 bytes remaining still to be read.
	at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:188)
	at org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:234)
	at org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:488)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:343)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:306)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:258)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:241)
    ... ...

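Searching for the exception text turned up many posts whose fix was simply to re-save the file with "Save As", without addressing the underlying cause. Preferring not to change code unless necessary, we had the PM talk to the customer. The answer: why should we have to "Save As"? The downloaded data is core data; if "Save As" changes anything, who takes responsibility? You must fix it. Fine, everyone is passing the buck.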

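Root cause analysis:

Following the location in the stack trace, I opened the decompiled code in IntelliJ IDEA and found the original logic at the point of failure: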


public boolean hasNextRecord() throws LeftoverDataException {
    if (this._currentDataLength != -1 && this._currentDataLength != this._currentDataOffset) {
        throw new LeftoverDataException(this._currentSid, this.remaining());
    }
    if (this._currentDataLength != DATA_LEN_NEEDS_TO_BE_READ) {
        this._nextSid = this.readNextSid();
    }
    return this._nextSid != INVALID_SID_VALUE;
}

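The exception is thrown here because this._currentDataLength != this._currentDataOffset: the record declares more data than the parser consumed. In the stack trace above, the FontRecord (0x31) still had 4 unread bytes.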
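To make the check concrete, here is a minimal stand-alone sketch of the same idea. It is not POI code: the class `StrictRecordReader`, its methods, and the payload sizes (20 declared, 16 consumed) are made up for illustration. The only things taken from real files are the BIFF record framing (a 2-byte sid and a 2-byte length, little-endian, followed by the payload) and the sid 0x31 from the stack trace.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical miniature of a record reader; NOT the POI implementation.
class StrictRecordReader {
    private final ByteBuffer buf;
    private int currentSid = -1;
    private int currentDataLength = -1; // -1 means no record has been opened yet
    private int currentDataOffset = 0;

    StrictRecordReader(byte[] stream) {
        this.buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Opens the next record by reading its 4-byte header (sid, length). */
    void nextRecord() {
        currentSid = buf.getShort() & 0xFFFF;
        currentDataLength = buf.getShort() & 0xFFFF;
        currentDataOffset = 0;
    }

    /** Consumes n payload bytes, as a record parser would. */
    void read(int n) {
        buf.position(buf.position() + n);
        currentDataOffset += n;
    }

    int remaining() {
        return currentDataLength - currentDataOffset;
    }

    /** Same idea as the check above: unread payload is treated as an error. */
    boolean hasNextRecord() {
        if (currentDataLength != -1 && currentDataLength != currentDataOffset) {
            throw new IllegalStateException(String.format(
                    "Initialisation of record 0x%X left %d bytes remaining still to be read.",
                    currentSid, remaining()));
        }
        return buf.remaining() >= 4;
    }

    /** Builds a one-record stream with the given sid and a zero-filled payload. */
    static byte[] record(int sid, int payloadLength) {
        ByteBuffer b = ByteBuffer.allocate(4 + payloadLength).order(ByteOrder.LITTLE_ENDIAN);
        b.putShort((short) sid).putShort((short) payloadLength);
        return b.array();
    }

    public static void main(String[] args) {
        // The record declares 20 bytes, but the "parser" only understands 16 of them.
        StrictRecordReader in = new StrictRecordReader(record(0x31, 20));
        in.nextRecord();
        in.read(16);
        try {
            in.hasNextRecord();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the reader itself is not corrupted; the writer simply stored a longer record than the parser expects, and the strict check refuses to continue.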

I then followed the approach described in this post: NPOI.HSSF.Record.LeftoverDataException: Initialisation of record 0x31 left 4 bytes remaining still t_xue251248603的博客-CSDN博客

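Solution:

Many posts say to patch the POI source and rebuild the jar. That is not an option in our project: the contract requires third-party libraries to be used unmodified, privately repackaged third-party jars are not allowed, and every jar must come from the Maven repository. The only remaining option is to override classes from the jar. A class with the same fully qualified name is placed in the project's src directory, and because the project's compiled output normally comes before the dependency jars on the classpath, the JVM loads the project's copy instead of the one in the jar.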
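Since this override depends entirely on classpath order, it can be worth checking at runtime which copy of a class actually got loaded. The sketch below is an assumption about how one might do that with the standard `ProtectionDomain` / `CodeSource` API; the helper class `ClassOrigin` is hypothetical and not part of the original fix.

```java
import java.security.CodeSource;

// Hypothetical helper: reports where a class was loaded from.
class ClassOrigin {
    /** Returns the code source location of the class, or a marker if none is available. */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // JDK classes loaded by the bootstrap loader have no code source.
            return "<bootstrap or unknown>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass e.g. org.apache.poi.hssf.record.RecordInputStream when POI is on the classpath.
        String name = args.length > 0 ? args[0] : ClassOrigin.class.getName();
        System.out.println(name + " <- " + describe(Class.forName(name)));
    }
}
```

If the printed location is the project's classes directory rather than the POI jar, the override is in effect. If it points at the jar, the classpath order in that environment differs from what the override assumes.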

First, create the package org.apache.poi.hssf.record in the project and add a class RecordInputStream that is identical to the one in the POI jar, then modify the code:


public boolean hasNextRecord() throws LeftoverDataException {
    if (this._currentDataLength != -1 && this._currentDataLength != this._currentDataOffset) {
        // Original behaviour: reject the record if any of its data is left unread.
        // throw new LeftoverDataException(this._currentSid, this.remaining());
        // Patched behaviour: skip the unread bytes so the next record header can be read.
        readToEndOfRecord();
    }
    if (this._currentDataLength != DATA_LEN_NEEDS_TO_BE_READ) {
        this._nextSid = this.readNextSid();
    }
    return this._nextSid != INVALID_SID_VALUE;
}

// Consumes the remaining bytes of the current record one at a time.
private void readToEndOfRecord() {
    while (this._currentDataOffset < this._currentDataLength) {
        readByte();
    }
}
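Why draining the leftover bytes is enough can be shown with a small stand-alone sketch. Again, this is not POI code: `LenientRecordScan`, `scanSids`, `stream`, and the record sizes are hypothetical, and the "parser" is reduced to something that understands only the first `knownBytes` of every payload.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the lenient strategy; NOT the POI implementation.
class LenientRecordScan {
    /**
     * Walks a stream of [sid:2][length:2][payload] records (little-endian).
     * Only the first knownBytes of each payload are "parsed"; any leftover
     * bytes are drained so the next header is read at the right position.
     */
    static List<Integer> scanSids(byte[] stream, int knownBytes) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        List<Integer> sids = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int sid = buf.getShort() & 0xFFFF;
            int length = buf.getShort() & 0xFFFF;
            int offset = Math.min(knownBytes, length);
            buf.position(buf.position() + offset); // bytes the parser consumed
            while (offset < length) {              // drain the rest, byte by byte
                buf.get();
                offset++;
            }
            sids.add(sid);
        }
        return sids;
    }

    /** Builds a stream of records with the given sids and zero-filled payloads. */
    static byte[] stream(int[] sids, int[] lengths) {
        int size = 0;
        for (int len : lengths) {
            size += 4 + len;
        }
        ByteBuffer b = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < sids.length; i++) {
            b.putShort((short) sids[i]).putShort((short) lengths[i]);
            b.position(b.position() + lengths[i]); // payload stays zero-filled
        }
        return b.array();
    }

    public static void main(String[] args) {
        // A 20-byte record 0x31 of which only 16 bytes are understood, then an empty record 0x0A.
        byte[] data = stream(new int[] {0x31, 0x0A}, new int[] {20, 0});
        System.out.println(scanSids(data, 16)); // prints [49, 10]
    }
}
```

Without the drain loop, the next header would be read from the 4 leftover payload bytes and every following record would be misaligned. The trade-off is that whatever those extra bytes meant to the producing application is silently discarded.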

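(The hasNextRecord change above follows the approach found online.) The post referenced earlier also describes how to avoid the "Found EOFRecord before WindowTwoRecord was encountered" exception, which requires a second override: a new RecordOrderer class in the package org.apache.poi.hssf.model that shadows the one in poi.jar. The modified code: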


public static boolean isEndOfRowBlock(int sid) {
    switch (sid) {
        case EOFRecord.sid:
            // Patched to accept files written by non-standard Excel producers:
            // the throw is disabled, so EOFRecord falls through and returns true.
            // throw new RuntimeException("Found EOFRecord before WindowTwoRecord was encountered");
        case DrawingRecord.sid:
        case DrawingSelectionRecord.sid:
        case ObjRecord.sid:
        case TextObjectRecord.sid:
        case ColumnInfoRecord.sid: // See Bugzilla 53984
        case GutsRecord.sid:       // see Bugzilla 50426
        case WindowOneRecord.sid:
            // should really be part of workbook stream, but some apps seem to put this before WINDOW2
        case WindowTwoRecord.sid:
            return true;
        case DVALRecord.sid:
            return true;
        default:
            return PageSettingsBlock.isComponentRecord(sid);
    }
}

In other words, the exception for EOFRecord.sid is simply commented out so that it is no longer thrown.
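Commenting out the throw does more than suppress the exception: because Java `switch` cases fall through, the EOF case now shares the `return true` of the cases below it. The following reduced sketch shows the difference. The class `FallThroughSketch`, the two constants, and the `default` branch returning `false` are simplifications for illustration, not the POI code.

```java
// Hypothetical reduction of the switch above to two cases.
class FallThroughSketch {
    static final int EOF = 0x0A;         // illustrative sid for the end-of-stream record
    static final int WINDOW_TWO = 0x23E; // illustrative sid for the WINDOW2 record

    /** Before the patch: an EOF record at this point is treated as a fatal error. */
    static boolean strict(int sid) {
        switch (sid) {
            case EOF:
                throw new IllegalStateException(
                        "Found EOFRecord before WindowTwoRecord was encountered");
            case WINDOW_TWO:
                return true;
            default:
                return false;
        }
    }

    /** After the patch: the empty EOF case falls through to the next case's return. */
    static boolean patched(int sid) {
        switch (sid) {
            case EOF: // no statement left, so control falls through
            case WINDOW_TWO:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(patched(EOF));        // true
        System.out.println(patched(WINDOW_TWO)); // true
        System.out.println(patched(0x99));       // false
    }
}
```

So after the patch an early EOF record is treated as the end of the row block, the same as WINDOW2.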

I assumed everything would go smoothly from there, but debugging produced the following exception:


java.lang.RuntimeException: Unexpected record type (org.apache.poi.hssf.record.PaneRecord)
	at org.apache.poi.hssf.record.aggregates.RowRecordsAggregate.<init>(RowRecordsAggregate.java:97)
	at org.apache.poi.hssf.model.InternalSheet.<init>(InternalSheet.java:183)
	at org.apache.poi.hssf.model.InternalSheet.createSheet(InternalSheet.java:122)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:354)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:306)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:258)
	at org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:241)

Fine, on to the RowRecordsAggregate class, handled with the same override approach.

The modified code:


public RowRecordsAggregate(RecordStream rs, SharedValueManager svm) {
    this(svm);

    while (rs.hasNext()) {
        Record rec = rs.getNext();
        switch (rec.getSid()) {
            case DConRefRecord.sid:
                this.addUnknownRecord(rec);
                break;
            case DBCellRecord.sid:
                // ignored
                break;
            case RowRecord.sid:
                this.insertRow((RowRecord) rec);
                break;
            default:
                if (rec instanceof UnknownRecord) {
                    this.addUnknownRecord(rec);
                    // 60 (0x3C) is the sid of ContinueRecord.
                    while (rs.peekNextSid() == 60) {
                        this.addUnknownRecord(rs.getNext());
                    }
                } else if (rec instanceof MulBlankRecord) {
                    this._valuesAgg.addMultipleBlanks((MulBlankRecord) rec);
                } else {
                    // Patched to accept non-standard Excel files. The original code was:
                    //   if (!(rec instanceof CellValueRecordInterface)) {
                    //       throw new RuntimeException("Unexpected record type (" + rec.getClass().getName() + ")");
                    //   }
                    //   this._valuesAgg.construct((CellValueRecordInterface) rec, rs, svm);
                    // Now any record that is not a cell value (e.g. PaneRecord) is skipped.
                    if (rec instanceof CellValueRecordInterface) {
                        this._valuesAgg.construct((CellValueRecordInterface) rec, rs, svm);
                    }
                }
        }
    }
}

With that, importing the original file finally works without errors.
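One side effect of the RowRecordsAggregate change is that unexpected records are dropped without a trace. A possible refinement, sketched here with stand-in types only, is to count what gets skipped so it can be logged. `TolerantDispatchSketch`, `CellValue`, `NumberCell`, and `Pane` are all hypothetical; they only mimic the roles of CellValueRecordInterface and PaneRecord and are not part of the fix described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: accept known records, count the ones that are skipped.
class TolerantDispatchSketch {
    interface CellValue {}                          // stand-in for CellValueRecordInterface
    static class NumberCell implements CellValue {} // stand-in for a normal cell record
    static class Pane {}                            // stand-in for an unexpected record type

    final List<CellValue> cells = new ArrayList<>();
    final Map<String, Integer> skipped = new TreeMap<>();

    /** Keeps cell values; any other record is skipped, but its type is remembered. */
    void accept(Object rec) {
        if (rec instanceof CellValue) {
            cells.add((CellValue) rec);
        } else {
            skipped.merge(rec.getClass().getSimpleName(), 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        TolerantDispatchSketch agg = new TolerantDispatchSketch();
        agg.accept(new NumberCell());
        agg.accept(new Pane());
        agg.accept(new Pane());
        System.out.println("cells=" + agg.cells.size() + " skipped=" + agg.skipped);
    }
}
```

Logging the skipped types after an import would make it easier to tell later whether a file lost something that mattered, such as formatting or pane settings, rather than only cell data.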

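PS: someone on the GitHub page of Alibaba's easyExcel ran into the same problem, but since it is an underlying POI issue, the advice there was also to modify the poi.jar source. The POI version used here is 4.1.0. On every future POI upgrade, these three classes have to be checked for changes, and if they changed, the overrides must be updated accordingly (o(╥﹏╥)o).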
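The upgrade check could be partly automated by fingerprinting the three class files in the old and new POI jars. The sketch below is one possible approach using only the JDK (`ZipFile` and `MessageDigest`). The class `ClassFingerprint` and its command-line usage are assumptions, and the entry paths are simply derived from the package names mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: compares class files between two jars by SHA-256.
class ClassFingerprint {
    /** Returns the SHA-256 digest of the data as lowercase hex. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every JVM", e);
        }
    }

    /** Hashes one entry (for example a .class file) inside a jar. */
    static String fingerprint(String jarPath, String entryName) throws IOException {
        try (ZipFile jar = new ZipFile(jarPath)) {
            ZipEntry entry = jar.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No entry " + entryName + " in " + jarPath);
            }
            try (InputStream in = jar.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
                return sha256Hex(out.toByteArray());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java ClassFingerprint <old-poi.jar> <new-poi.jar>");
            return;
        }
        String[] entries = {
            "org/apache/poi/hssf/record/RecordInputStream.class",
            "org/apache/poi/hssf/model/RecordOrderer.class",
            "org/apache/poi/hssf/record/aggregates/RowRecordsAggregate.class"
        };
        for (String entry : entries) {
            boolean same = fingerprint(args[0], entry).equals(fingerprint(args[1], entry));
            System.out.println((same ? "unchanged " : "CHANGED   ") + entry);
        }
    }
}
```

A changed hash does not prove the source changed (a different compiler can produce different bytes), so it is only a trigger for comparing the source by hand and re-applying the three overrides.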

