
How to parse a PDF file: an exception is thrown when the iterator reaches page 11


Hello, I am parsing a PDF, and an exception is thrown when the iterator reaches page 11.

Any ideas? Thanks.

Here is my code:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStreamWriter;

    import com.lowagie.text.pdf.PdfReader;
    import com.lowagie.text.pdf.parser.PdfTextExtractor;

    public class PdfParser {

        public static void main(String[] args) {
            // Counts the pages attempted so far; printed if extraction fails.
            int index = 0;
            try {
                PdfReader readerN = new PdfReader("C:\\Documents and Settings\\stefan.stere\\hibernateWorkspace\\PdfParser\\src\\monitor3.pdf");
                OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(new File("C:\\Documents and Settings\\stefan.stere\\hibernateWorkspace\\PdfParser\\src\\pdf2txt.rtf")), "Cp1252");
                PdfTextExtractor parse = new PdfTextExtractor(readerN);
                int nrPages = readerN.getNumberOfPages();
                // Page numbers are 1-based, so the last page is nrPages.
                for (int i = 1; i <= nrPages; i++) {
                    index++;
                    String page = parse.getTextFromPage(i);
                    if (page != null) {
                        page = page.replace("null", "??");
                        // In the patterns below, "." matches any single character.
                        page = page.replaceAll("Comercial.", "Comerciala");
                        page = page.replaceAll("ACT ADI..IONAL", "ACT ADITIONAL");
                        page = page.replaceAll("HOT.R..E", "HOTARARE");
                        page = page.replaceAll("HOT.R..EA", "HOTARAREA");
                        page = page.replaceAll("HOT.R..I", "HOTARARI");
                        page = page.replaceAll("..cheiat.", "incheiata");
                        page = page.replaceAll("ANUN..", "ANUNT");
                        out.write(page);
                        System.out.println(page);
                    }
                }
                out.close();
                readerN.close();
            } catch (Exception e) {
                e.printStackTrace();
                System.out.println(index);
            }
        }
    }

And the exception stack trace:

    java.lang.ArrayIndexOutOfBoundsException: Invalid index: 62
        at com.lowagie.text.pdf.CMapAwareDocumentFont.decodeSingleCID(Unknown Source)
        at com.lowagie.text.pdf.CMapAwareDocumentFont.decode(Unknown Source)
        at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.decode(Unknown Source)
        at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.displayPdfString(Unknown Source)
        at com.lowagie.text.pdf.parser.PdfContentStreamProcessor$ShowTextArray.invoke(Unknown Source)
        at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.invokeOperator(Unknown Source)
        at com.lowagie.text.pdf.parser.PdfContentStreamProcessor.processContent(Unknown Source)
        at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(Unknown Source)
        at PdfParser.main(PdfParser.java:32)
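
The trace shows the exception being raised inside the library (CMapAwareDocumentFont.decodeSingleCID, reached from getTextFromPage), not in the replacement code. The following is only a sketch of one way to narrow the problem down, not a fix for the decoding error itself: extract each page inside its own try/catch so that a single bad page is recorded and the remaining pages are still processed. To keep the example runnable without the PDF library, `PageSource`, `Result`, and `extractAll` are hypothetical names introduced here; `PageSource` stands in for a call such as `parse.getTextFromPage(i)`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: extract pages one at a time so that a failure on one page is
 * recorded instead of aborting the whole run. All names in this class are
 * hypothetical and not part of the PDF library used in the question.
 */
final class PageByPageExtraction {

    /** Hypothetical stand-in for "return the text of page n" (1-based). */
    interface PageSource {
        String getTextFromPage(int page) throws Exception;
    }

    /** Text of the readable pages plus the numbers of the pages that failed. */
    static final class Result {
        final StringBuilder text = new StringBuilder();
        final List<Integer> failedPages = new ArrayList<>();
    }

    static Result extractAll(PageSource source, int nrPages) {
        Result result = new Result();
        for (int i = 1; i <= nrPages; i++) {
            try {
                String page = source.getTextFromPage(i);
                if (page != null) {
                    result.text.append(page);
                }
            } catch (Exception e) {
                // Also catches runtime exceptions such as
                // ArrayIndexOutOfBoundsException; note the page and move on.
                result.failedPages.add(i);
                System.err.println("Page " + i + " failed: " + e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fake source that fails on page 11, mimicking the reported behaviour.
        PageSource fake = page -> {
            if (page == 11) {
                throw new ArrayIndexOutOfBoundsException("Invalid index: 62");
            }
            return "page " + page + "\n";
        };
        Result r = extractAll(fake, 12);
        System.out.println("Failed pages: " + r.failedPages); // prints "Failed pages: [11]"
    }
}
```

With the code from the question, the source could presumably be supplied as `i -> parse.getTextFromPage(i)`. The list of failed pages then shows whether page 11 is the only page affected, which helps when inspecting that page's fonts or reporting the problem.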
