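Most of this article is excerpted from IBM developerWorks (mainly the theory); see the three articles below. I excerpted it mainly to deepen my own understanding. It serves only as notes and as a reference for the next time I need it. The excerpts are not complete, and the original articles are far richer, so see the originals for details.
Reference articles:
Parsing XML with StAX, Part 1: An introduction to the Streaming API for XML (StAX): http://www.ibm.com/developerworks/cn/xml/x-stax1.html
Parsing XML with StAX, Part 2: Pull parsing and events: http://www.ibm.com/developerworks/cn/xml/x-stax2.html
Parsing XML with StAX, Part 3: Using custom events and writing XML: http://www.ibm.com/developerworks/cn/xml/x-stax3.html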
————————————————
Original link: https://blog.csdn.net/zhyh1986/article/details/8528649
Broadly speaking, there are four ways to parse XML.
Their pros and cons compare as follows:
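1. SAX parsing (Simple API for XML)
How SAX parses: it scans the document sequentially from start to end, parsing as it scans and reporting events to a handler along the way. Unlike DOM, SAX lets you stop parsing at any point, and it is the faster and more efficient of the two.
Pros: the whole document does not have to be loaded up front, so it uses few resources. Parsing can start immediately, it is fast, and it puts little pressure on memory.
Cons: nodes cannot be modified.
Best for: reading XML files.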
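As a rough illustration of the event-driven style described above, here is a minimal sketch that counts elements with the JDK's built-in SAX parser. The class name `SaxElementCounter`, the method `count`, and the tiny in-memory document are assumptions made for this example only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original article.
class SaxElementCounter {

    /** Counts the elements named {@code name} in the given XML text. */
    static int count(String xml, String name) {
        final int[] counter = {0};
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    // Called once per start tag; no document tree is built.
                    if (name.equals(qName)) {
                        counter[0]++;
                    }
                }
            };
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        String xml = "<entities><entity id=\"1\"/><entity id=\"2\"/></entities>";
        System.out.println(count(xml, "entity"));
    }
}
```

The handler only ever sees one event at a time, which is why SAX suits read-only passes and cannot edit nodes in place.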
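2. DOM parsing (Document Object Model)
How DOM parses: DOM defines a set of interfaces for working with a parsed XML document. The parser reads the entire document and builds a tree structure in memory, which you then manipulate through the DOM interfaces.
Pros: the whole document tree is in memory, so it is easy to work with; deleting, modifying, and rearranging nodes are all supported.
Cons: for large files, memory comes under pressure and parsing takes longer. The whole document is loaded into memory (including nodes you do not need), which wastes time and space.
Best for: modifying XML data.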
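To make the "tree in memory" idea concrete, the following sketch parses a small document with the JDK's DOM API, changes one node, and deletes another. The class name `DomEditSketch`, the method `edit`, and the sample document are hypothetical, and the sketch assumes the input contains at least two entity elements, each with a name child.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original article.
class DomEditSketch {

    /** Renames the first entity and removes the second one (assumes both exist). */
    static Document edit(String xml, String newName) {
        try {
            // The whole document is read and turned into a tree here.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList entities = doc.getElementsByTagName("entity");

            // Modify: replace the text of the first entity's <name> child.
            Element first = (Element) entities.item(0);
            first.getElementsByTagName("name").item(0).setTextContent(newName);

            // Delete: detach the second entity from its parent.
            Node second = entities.item(1);
            second.getParentNode().removeChild(second);
            return doc;
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<entities>"
                + "<entity><name>A</name></entity>"
                + "<entity><name>B</name></entity>"
                + "</entities>";
        Document doc = edit(xml, "C");
        System.out.println(doc.getElementsByTagName("entity").getLength());
        System.out.println(doc.getElementsByTagName("name").item(0).getTextContent());
    }
}
```

Edits like these are easy because every node is addressable, but the same property is what makes DOM expensive for very large inputs.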
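3. JDOM
JDOM is a pure Java API for processing XML. It uses concrete classes rather than interfaces. JDOM offers tree traversal and also follows SAX-style Java conventions. JDOM differs from DOM in two main ways.
First, JDOM uses only concrete classes, not interfaces. This simplifies the API in some respects but also limits flexibility.
Second, the API makes heavy use of the Collections classes, which makes it easier for Java developers who already know those classes.
JDOM does not include a parser of its own. It usually uses a SAX2 parser to parse and validate the input XML document (although it can also take a previously built DOM representation as input). It includes converters that output the JDOM representation as a SAX2 event stream, a DOM model, or an XML text document.
Pros: 1. It is a tree-based Java API for processing XML that loads the tree into memory.
2. It has no backward-compatibility constraints, so it is simpler than DOM.
3. It is fast.
4. It follows SAX-style Java conventions.
Cons: 1. It cannot process documents larger than memory.
2. JDOM represents the logical model of an XML document and does not guarantee a byte-for-byte round trip.
3. It provides no actual model of the DTD or schema for an instance document.
4. It does not support the corresponding traversal package found in DOM.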
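4. DOM4J
DOM4J has a more complex API, which gives it more flexibility than JDOM. DOM4J has the best performance of the four; even Sun's JAXM uses DOM4J. Many open source projects rely heavily on DOM4J. The well-known Hibernate, for example, uses DOM4J to read its XML configuration files. If portability is not a concern, use DOM4J.
Pros: the most flexible, easy to use, feature-rich, and excellent performance.
Cons: complex API, poor portability.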
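I have tried essentially all four of these approaches on this parsing requirement.
The first one I used was DOM, but it can only handle fairly small XML files. A file that is too large causes an out-of-memory error, because DOM loads the entire document at once.
After that I tried DOM4J and SAX, but because of the machine's limited memory, both still failed with JVM out-of-memory errors.
With no other option, I eventually found that StAX can also parse large XML files.
Here is an excerpt of the XML file to be parsed: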
- <?xml version='1.0' encoding='UTF-8'?>
- <gwl>
- <version>20230417084108</version>
- <entities>
- <entity id="1123831" version="20230414163503">
- <name>ALMOND, LINCOLN CARTER</name>
- <listId>1021</listId>
- <listCode>USP</listCode>
- <entityType>03</entityType>
- <createdDate>09/02/2004</createdDate>
- <lastUpdateDate>04/14/2023</lastUpdateDate>
- <source>USP</source>
- <OriginalSource>PEP</OriginalSource>
- <dobs>
- <dob Y="1936">06/16/1936</dob>
- </dobs>
- <pobs>
- <pob>Pawtucket, Rhode Island, United States</pob>
- </pobs>
- <titles>
- <title>FORMER GOVERNOR OF RHODE ISLAND (JANUARY 3, 1995 - JANUARY 7, 2003). DECEASED JANUARY 02, 2023.</title>
- </titles>
- <sdfs>
- <sdf name="OtherInformation">Career: Governor of Rhode Island (January 03, 1995 - January 07, 2003); United State Attorney for the District of Rhode Island (October 09, 1981 - January 20, 1993); United State Attorney for the District of Rhode Island (1969 - 1978).</sdf>
- <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=d14d930f-7943-4363-b4d0-aa2c59437e1b</sdf>
- <sdf name="EffectiveDate">1981</sdf>
- <sdf name="EntityLevel">State</sdf>
- <sdf name="ExpirationDate">1993</sdf>
- <sdf name="Gender">MALE</sdf>
- <sdf name="NameSource">Website</sdf>
- <sdf name="Org_PID">1706394</sdf>
- <sdf name="OriginalID">7031</sdf>
- <sdf name="Relationship">Father</sdf>
- <sdf name="SubCategory">Former PEP</sdf>
- </sdfs>
- <addresses>
- <address>
- <country>US</country>
- <countryName>UNITED STATES</countryName>
- </address>
- </addresses>
- </entity>
- <entity id="1124766" version="20230414163503">
- <name>BAUCUS, MAX SIEBEN</name>
- <listId>1021</listId>
- <listCode>USP</listCode>
- <entityType>03</entityType>
- <createdDate>09/02/2004</createdDate>
- <lastUpdateDate>04/14/2023</lastUpdateDate>
- <source>USP</source>
- <OriginalSource>PEP</OriginalSource>
- <dobs>
- <dob Y="1941">12/11/1941</dob>
- </dobs>
- <pobs>
- <pob>Helena, Montana, United States</pob>
- </pobs>
- <aliases>
- <alias type="Alias">ENKE, MAX SIEBEN</alias>
- </aliases>
- <titles>
- <title>FORMER AMBASSADOR OF THE UNITED STATES TO CHINA (MARCH 20, 2014 - JANUARY 16, 2017).</title>
- </titles>
- <sdfs>
- <sdf name="OtherInformation">Political Party: Democratic. Career: Ambassador Extraordinary and Plenipotentiary of the United States to China, (March 20, 2014 - January 16, 2017); Member of the United States Congress, Senate from Montana (December 15, 1978 - February 06, 2014);</sdf>
- <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=945fd382-f5b7-42c4-ad1f-a40c4bf0e285</sdf>
- <sdf name="EffectiveDate">1978</sdf>
- <sdf name="EntityLevel">National</sdf>
- <sdf name="ExpirationDate">2014</sdf>
- <sdf name="Gender">MALE</sdf>
- <sdf name="NameSource">Website</sdf>
- <sdf name="Org_PID">548118</sdf>
- <sdf name="OriginalID">7542</sdf>
- <sdf name="Relationship">Brother</sdf>
- <sdf name="SubCategory">Former PEP</sdf>
- </sdfs>
- <addresses>
- <address>
- <country>US</country>
- <countryName>UNITED STATES</countryName>
- <province>WASHINGTON, DC</province>
- <postalCode>20515</postalCode>
- </address>
- <address>
- <country>US</country>
- <countryName>UNITED STATES</countryName>
- <province>WASHINGTON, D.C.</province>
- <postalCode>20510</postalCode>
- </address>
- <address>
- <address1>55 ANJIALOU RD</address1>
- <city>BEIJING</city>
- <country>CN</country>
- <countryName>CHINA</countryName>
- <postalCode>100600</postalCode>
- </address>
- </addresses>
- </entity>
- <entity id="1124842" version="20230414163503">
- <name>THOMAS, CRAIG LYLE</name>
- <listId>1021</listId>
- <listCode>USP</listCode>
- <entityType>03</entityType>
- <createdDate>09/02/2004</createdDate>
- <lastUpdateDate>04/14/2023</lastUpdateDate>
- <source>USP</source>
- <OriginalSource>PEP</OriginalSource>
- <dobs>
- <dob Y="1933">02/17/1933</dob>
- </dobs>
- <pobs>
- <pob>Cody, Wyoming, United States</pob>
- </pobs>
- <titles>
- <title>FORMER MEMBER OF THE UNITED STATES CONGRESS (JANUARY 03, 1995 - JUNE 04, 2007). DECEASED JUNE 04, 2007.</title>
- </titles>
- <sdfs>
- <sdf name="OtherInformation">Political Party: Republican. Career: Member of the United States Congress, Senate, Class I (January 03, 1995 - June 04, 2007); Member of the United States Congress, House of Representatives, At-Large (April 27, 1989 - January 03, 1995). Member of the</sdf>
- <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=4e7b1050-36b5-4b1c-9037-c2349c519d40</sdf>
- <sdf name="EffectiveDate">1989</sdf>
- <sdf name="EntityLevel">National</sdf>
- <sdf name="ExpirationDate">1995</sdf>
- <sdf name="Gender">MALE</sdf>
- <sdf name="NameSource">Website</sdf>
- <sdf name="Org_PID">1817490</sdf>
- <sdf name="OriginalID">7629</sdf>
- <sdf name="Relationship">Father</sdf>
- <sdf name="SubCategory">Former PEP</sdf>
- </sdfs>
- <addresses>
- <address>
- <country>US</country>
- <countryName>UNITED STATES</countryName>
- <province>WASHINGTON D.C.</province>
- <postalCode>20510</postalCode>
- </address>
- <address>
- <address1>200 WEST 24TH STREET</address1>
- <city>CHEYENNE</city>
- <state>WY</state>
- <stateName>WYOMING</stateName>
- <country>US</country>
- <countryName>UNITED STATES</countryName>
- <postalCode>82002</postalCode>
- </address>
- </addresses>
- </entity>
- <entity id="1125230" version="20230414163051">
- <name>PATRIAT, FRANCOIS</name>
- <listId>1020</listId>
- <listCode>PEP</listCode>
- <entityType>03</entityType>
- <createdDate>09/02/2004</createdDate>
- <lastUpdateDate>04/14/2023</lastUpdateDate>
- <source>PEP</source>
- <OriginalSource>PEP</OriginalSource>
- <dobs>
- <dob Y="1943">03/21/1943</dob>
- </dobs>
- <pobs>
- <pob>Semur-en-Auxois, , France</pob>
- </pobs>
- <titles>
- <title>MEMBER OF THE FRENCH PARLIAMENT (OCTOBER 01, 2008 - 2026).</title>
- </titles>
- <sdfs>
- <sdf name="OtherInformation">Political party: La Republique en marche (LREM) (currently known as Renaissance). Career: Member of the Executive Bureau of La Republique en Marche (LREM), The Republic on the Move (currently known as Renaissance), effective from November 18, 2017;</sdf>
- <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=a4ffd4f3-5c75-440b-aeca-4e3a7d2ef642</sdf>
- <sdf name="EffectiveDate">2008</sdf>
- <sdf name="EntityLevel">National</sdf>
- <sdf name="ExpirationDate">2026</sdf>
- <sdf name="Gender">MALE</sdf>
- <sdf name="NameSource">Website</sdf>
- <sdf name="Org_PID">3759009</sdf>
- <sdf name="OriginalID">8117</sdf>
- <sdf name="Relationship">Associate</sdf>
- <sdf name="SubCategory">Govt Branch Member</sdf>
- </sdfs>
- <addresses>
- <address>
- <address1>15, RUE DE VAUGIRARD</address1>
- <city>PARIS</city>
- <country>FR</country>
- <countryName>FRANCE</countryName>
- <postalCode>75291</postalCode>
- </address>
- </addresses>
- </entity>
- <entity id="1125282" version="20230414163052">
- <name>BENOUTIQ, ABDELKRIM</name>
- <listId>1020</listId>
- <listCode>PEP</listCode>
- <entityType>03</entityType>
- <createdDate>09/02/2004</createdDate>
- <lastUpdateDate>04/14/2023</lastUpdateDate>
- <source>PEP</source>
- <OriginalSource>PEP</OriginalSource>
- <dobs>
- <dob Y="1959">08/19/1959</dob>
- </dobs>
- <pobs>
- <pob>Rabat, Rabat-Sale-Kenitra Region, Morocco</pob>
- </pobs>
- <aliases>
- <alias type="Alias">BEN ATIQ, ABDELKRIM</alias>
- <alias type="Alias">BENATIQ, ABDELKRIM</alias>
- </aliases>
- <nativeCharNames>
- <nativeCharName charSet="" latinCharName="BEN ATIQ, ABDELKRIM" type="Alias">??? ?????? ?? ????</nativeCharName>
- <nativeCharName charSet="" latinCharName="BENATIQ, ABDELKRIM" type="Alias">??? ?????? ??????</nativeCharName>
- <nativeCharName charSet="" latinCharName="BENOUTIQ, ABDELKRIM" type="Primary">??? ?????? ??????</nativeCharName>
- </nativeCharNames>
- <titles>
- <title>FORMER MEMBER OF THE POLITICAL BUREAU OF SOCIALIST UNION OF POPULAR FORCES PARTY, MOROCCO, ELECTED JUNE 10, 2017, EFFECTIVE UNTIL APRIL 24, 2022.</title>
- </titles>
- <sdfs>
- <sdf name="OtherInformation">Political Party: Union Socialiste Des Forces Populaires (USFP) Career: Member of the Political Bureau of Union Socialiste Des Forces Populaires (USFP), Socialist Union of Popular Forces Party, elected June 10, 2017, effective until April 24, 2022;</sdf>
- <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=35f8bcea-6169-4a8f-9715-81de730d1c17</sdf>
- <sdf name="EffectiveDate">2000</sdf>
- <sdf name="EntityLevel">National</sdf>
- <sdf name="ExpirationDate">2001</sdf>
- <sdf name="Gender">MALE</sdf>
- <sdf name="NameSource">Website</sdf>
- <sdf name="OriginalID">8181</sdf>
- <sdf name="SubCategory">Former PEP</sdf>
- </sdfs>
- <addresses>
- <address>
- <address1>9, AVENUE AL ARAAR</address1>
- <city>RABAT</city>
- <country>MA</country>
- <countryName>MOROCCO</countryName>
- <province>RABAT-SALE-KENITRA REGION</province>
- </address>
- <address>
- <address1>AVENUE F.ROOSEVELT</address1>
- <city>RABAT</city>
- <country>MA</country>
- <countryName>MOROCCO</countryName>
- <province>RABAT-SALE-KENITRA REGION</province>
- </address>
- <address>
- <address1>NO. 9 ARAR STREET</address1>
- <city>RABAT</city>
- <country>MA</country>
- <countryName>MOROCCO</countryName>
- <province>RABAT-SALE-KENITRA REGION</province>
- </address>
- </addresses>
- </entity>
- <entity id="1125443" version="20230414163053">
- <name>OLLING, SVEND</name>
- <listId>1020</listId>
- <listCode>PEP</listCode>
- <entityType>03</entityType>
- <createdDate>09/02/2004</createdDate>
- <lastUpdateDate>04/14/2023</lastUpdateDate>
- <source>PEP</source>
- <OriginalSource>PEP</OriginalSource>
- <dobs>
- <dob Y="1967">11/09/1967</dob>
- </dobs>
- <pobs>
- <pob>Glostrup, , Denmark</pob>
- </pobs>
- <titles>
- <title>AMBASSADOR OF DENMARK TO SOUTH KOREA, AS OF MARCH 30, 2023.</title>
- </titles>
- <sdfs>
- <sdf name="OtherInformation">Career: Ambassador of Denmark to South Korea, as of March 30, 2023; Ambassador of Denmark to Egypt, as of May 28, 2020, expiration reported March 20, 2023; Non-Resident Ambassador of Denmark to Azerbaijan, effective from March 26, 2017, expiration</sdf>
- <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=ef160921-f06b-4942-9527-0ee7565467c0</sdf>
- <sdf name="EffectiveDate">2023</sdf>
- <sdf name="EntityLevel">International</sdf>
- <sdf name="Gender">MALE</sdf>
- <sdf name="NameSource">Website</sdf>
- <sdf name="Org_PID">8698914</sdf>
- <sdf name="OriginalID">8384</sdf>
- <sdf name="Relationship">Father</sdf>
- <sdf name="SubCategory">Diplomat</sdf>
- </sdfs>
- <addresses>
- <address>
- <address1>416, HANGANG-DAERO, JUNG-GU</address1>
- <city>SEOUL</city>
- <country>KR</country>
- <countryName>KOREA, REPUBLIC OF</countryName>
- <postalCode>04637</postalCode>
- </address>
- <address>
- <address1>TURAN GUENES BULVARI 106</address1>
- <city>ANKARA</city>
- <country>TR</country>
- <countryName>TURKEY</countryName>
- <postalCode>06550</postalCode>
- </address>
- <address>
- <address1>ASIATISK PLADS 2</address1>
- <city>COPENHAGEN</city>
- <country>DK</country>
- <countryName>DENMARK</countryName>
- <postalCode>1448</postalCode>
- </address>
- <address>
- <address1>NORTH AVENUE</address1>
- <city>DHAKA</city>
- <country>BD</country>
- <countryName>BANGLADESH</countryName>
- <postalCode>1212</postalCode>
- </address>
- <address>
- <city>CAIRO</city>
- <country>EG</country>
- <countryName>EGYPT</countryName>
- </address>
- </addresses>
- </entity>
- <entity id="1125610" version="20230414163054">
- <name>TAKAHASHI, KOICHI</name>
- <listId>1020</listId>
- <listCode>PEP</listCode>
- <entityType>03</entityType>
- <createdDate>09/02/2004</createdDate>
- <lastUpdateDate>04/14/2023</lastUpdateDate>
- <source>PEP</source>
- <OriginalSource>PEP</OriginalSource>
- <dobs>
- <dob Y="1944">1944</dob>
- </dobs>
- <nativeCharNames>
- <nativeCharName charSet="" latinCharName="TAKAHASHI, KOICHI" type="Primary">たかはし こういち</nativeCharName>
- <nativeCharName charSet="" latinCharName="TAKAHASHI, KOICHI" type="Primary">高橋 恒一</nativeCharName>
- </nativeCharNames>
- <titles>
- <title>FORMER AMBASSADOR OF JAPAN TO THE CZECH REPUBLIC (FEBRUARY 03, 2003 - 2005).</title>
- </titles>
- <sdfs>
- <sdf name="OtherInformation">Career: Ambassador of Japan to the Czech Republic (February 03, 2003 - 2005); Deputy Vice-Minister in charge of Immigration Bureau, Ministry of Justice (1999 - 2001); Consul-General of Japan to Berlin City, Germany (1995 - 1997); Minister of Japan to</sdf>
- <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=9b2a063e-8d55-4806-b2f2-f2c79d815a33</sdf>
- <sdf name="EffectiveDate">1999</sdf>
- <sdf name="EntityLevel">National</sdf>
- <sdf name="ExpirationDate">2001</sdf>
- <sdf name="Gender">MALE</sdf>
- <sdf name="NameSource">Website</sdf>
- <sdf name="OriginalID">8483</sdf>
- <sdf name="SubCategory">Former PEP</sdf>
- </sdfs>
- <addresses>
- <address>
- <country>JP</country>
- <countryName>JAPAN</countryName>
- </address>
- </addresses>
- </entity>
- <entity id="1125925" version="20230414163054">
- <name>PINTER, SANDOR</name>
- <listId>1020</listId>
- <listCode>PEP</listCode>
- <entityType>03</entityType>
- <createdDate>09/02/2004</createdDate>
- <lastUpdateDate>04/14/2023</lastUpdateDate>
- <source>PEP</source>
- <OriginalSource>PEP</OriginalSource>
- <dobs>
- <dob Y="1948">07/03/1948</dob>
- </dobs>
- <pobs>
- <pob>Budapest, , Hungary</pob>
- </pobs>
- <titles>
- <title>DEPUTY PRIME MINISTER OF HUNGARY, EFFECTIVE FROM MAY 04, 2018.</title>
- </titles>
- <sdfs>
- <sdf name="OtherInformation">Career: Deputy Prime Minister, effective from May 04, 2018; Minister of Interior, effective from May 29, 2010; Minister of Interior (July 08, 1998 - May 27, 2002); Chief of the Hungarian National Police (September 18, 1991 - 1996).</sdf>
- <sdf name="DirectID">https://accuity.worldcompliance.com/signin.aspx?ent=cd135a22-6242-4999-bc6f-5aae5b0f92e2</sdf>
- <sdf name="EffectiveDate">2018</sdf>
- <sdf name="EntityLevel">National</sdf>
- <sdf name="Gender">MALE</sdf>
- <sdf name="NameSource">Website</sdf>
- <sdf name="Org_PID">2544374</sdf>
- <sdf name="OriginalID">11549</sdf>
- <sdf name="Relationship">Father</sdf>
- <sdf name="SubCategory">Govt Branch Member</sdf>
- </sdfs>
- <addresses>
- <address>
- <address1>TEVE U. 4-6.</address1>
- <city>BUDAPEST</city>
- <country>HU</country>
- <countryName>HUNGARY</countryName>
- <postalCode>1139</postalCode>
- </address>
- <address>
- <address1>JOZSEF ATTILA U. 2-4.</address1>
- <city>BUDAPEST</city>
- <country>HU</country>
- <countryName>HUNGARY</countryName>
- <postalCode>1051</postalCode>
- </address>
- </addresses>
- </entity>
- </entities>
- </gwl>
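Before turning to the full program, the core of StAX pull parsing can be sketched on a small in-memory document shaped like the excerpt above. The application asks the reader for the next event and decides what to do with it. The class name `StaxPullSketch` and the method `entityIds` are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original article.
class StaxPullSketch {

    /** Collects the id attribute of every entity element, in document order. */
    static List<String> entityIds(String xml) {
        List<String> ids = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // Pull loop: the caller drives the parser one event at a time.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "entity".equals(reader.getLocalName())) {
                    // A null namespace means the attribute is matched by local name only.
                    ids.add(reader.getAttributeValue(null, "id"));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<gwl><entities>"
                + "<entity id=\"1123831\"><name>ALMOND, LINCOLN CARTER</name></entity>"
                + "<entity id=\"1124766\"><name>BAUCUS, MAX SIEBEN</name></entity>"
                + "</entities></gwl>";
        System.out.println(entityIds(xml));
    }
}
```

Only the current event is held by the reader, so memory use stays roughly constant no matter how many entity elements the input contains.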

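The following program uses StAX to extract every entity element from the XML file above and distributes them evenly across 7 new XML files, each written in the same custom, fixed layout: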
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class StAXParserTest {
    public static void main(String[] args) {
        String inputFile = "D:\\Desktop\\PEP\\ENTITY.XML"; // path of the input XML file
        String outputPrefix = "D:\\Desktop\\PEP\\"; // prefix of the output XML file paths
        int numFiles = 7; // number of new files

        try {
            // Create the XML input factory
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            // Create the input stream
            InputStream inputStream = new FileInputStream(inputFile);
            // Use the input factory to create the XMLStreamReader
            XMLStreamReader reader = inputFactory.createXMLStreamReader(inputStream);

            // Create the XML output factory
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            // One output stream per new file
            OutputStream[] outputStreams = new OutputStream[numFiles];
            // One XMLStreamWriter per new file
            XMLStreamWriter[] writers = new XMLStreamWriter[numFiles];

            for (int i = 0; i < numFiles; i++) {
                String outputFileName = outputPrefix + (i + 1) + ".xml";
                outputStreams[i] = new FileOutputStream(outputFileName);
                // Pass the encoding explicitly so the bytes written match the XML declaration
                writers[i] = outputFactory.createXMLStreamWriter(outputStreams[i], "UTF-8");
                // Write the XML declaration, e.g. <?xml version='1.0' encoding='UTF-8'?>
                writers[i].writeStartDocument("UTF-8", "1.0");
                // Add a line break
                writers[i].writeCharacters("\n");
                // Open the <gwl> element
                writers[i].writeStartElement("gwl");
                writers[i].writeCharacters("\n");
                // Open the <version> element and write its value
                writers[i].writeStartElement("version");
                writers[i].writeCharacters("20230417084108");
                // Close it with </version>
                writers[i].writeEndElement();
                writers[i].writeCharacters("\n");
                writers[i].writeStartElement("entities");
            }

            // Parse the input and write the entities to the new files
            int currentFileIndex = 0;
            int entityCount = 0;

            while (reader.hasNext()) {
                int event = reader.next();

                switch (event) {
                    case XMLStreamConstants.START_ELEMENT:
                        String elementName = reader.getLocalName();
                        if ("entity".equals(elementName)) {
                            // Copy the entity element and everything inside it
                            writeEntityElement(reader, writers[currentFileIndex]);
                            entityCount++;

                            // Switch to the next file (round robin)
                            currentFileIndex = (currentFileIndex + 1) % numFiles;
                        }
                        break;
                }
            }

            // Finish each document, then close the writers and output streams
            for (int i = 0; i < numFiles; i++) {
                writers[i].writeCharacters("\n");
                writers[i].writeEndElement(); // </entities>
                writers[i].writeCharacters("\n");
                writers[i].writeEndElement(); // </gwl>
                writers[i].writeCharacters("\n");
                writers[i].writeEndDocument();
                writers[i].flush();
                writers[i].close();
                outputStreams[i].close();
            }

            // Close the reader and the input stream
            reader.close();
            inputStream.close();

            System.out.println("Total entities: " + entityCount);
            System.out.println("Entities per file: " + (entityCount / numFiles));

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void writeEntityElement(XMLStreamReader reader, XMLStreamWriter writer) throws XMLStreamException {
        writer.writeCharacters("\n");
        // Open the <entity> element and copy its attributes (id and version)
        writer.writeStartElement("entity");
        copyAttributes(reader, writer);

        // Copy the children of entity. depth counts the elements currently open,
        // starting at 1 for entity itself.
        int depth = 1;
        while (reader.hasNext()) {
            int event = reader.next();
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    // Open an element with the same name as the one just read
                    writer.writeStartElement(reader.getLocalName());
                    // Child elements carry attributes too, e.g. <dob Y="1936"> and <sdf name="Gender">
                    copyAttributes(reader, writer);
                    depth++;
                    break;

                case XMLStreamConstants.END_ELEMENT:
                    // Close the most recently opened element
                    writer.writeEndElement();
                    depth--;
                    if (depth == 0) {
                        // </entity> has been written; this entity is complete
                        return;
                    }
                    break;

                case XMLStreamConstants.CHARACTERS:
                    String text = reader.getText();
                    writer.writeCharacters(text);
                    break;
            }
        }
    }

    // Copies every attribute of the element the reader is positioned on
    private static void copyAttributes(XMLStreamReader reader, XMLStreamWriter writer) throws XMLStreamException {
        int attributeCount = reader.getAttributeCount();
        for (int i = 0; i < attributeCount; i++) {
            String attributeName = reader.getAttributeLocalName(i);
            String attributeValue = reader.getAttributeValue(i);
            writer.writeAttribute(attributeName, attributeValue);
        }
    }
}

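The XML excerpt above contains 8 entity elements in total. After parsing, each of the 7 XML files first receives one entity, and the 1 left over is assigned in the same order, so the first XML file holds 2 entities and the other 6 hold one each.
I parsed the complete 4 GB Entity.xml file this way without any out-of-memory errors, and parsing was fast.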
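The distribution follows from the round-robin index update `currentFileIndex = (currentFileIndex + 1) % numFiles` in the program. As a small check of the arithmetic, this sketch replays the same update without any XML involved. The class name `RoundRobinSketch` and the method `distribute` are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original article.
class RoundRobinSketch {

    /** Returns how many entities each file receives under round-robin assignment. */
    static int[] distribute(int entityCount, int numFiles) {
        int[] perFile = new int[numFiles];
        int currentFileIndex = 0;
        for (int n = 0; n < entityCount; n++) {
            perFile[currentFileIndex]++;
            // Same update as in the program: wrap back to file 0 after the last file.
            currentFileIndex = (currentFileIndex + 1) % numFiles;
        }
        return perFile;
    }

    public static void main(String[] args) {
        // 8 entities over 7 files: prints [2, 1, 1, 1, 1, 1, 1]
        System.out.println(Arrays.toString(distribute(8, 7)));
    }
}
```

In general the first `entityCount % numFiles` files end up with one more entity than the rest, so the file sizes differ by at most one entity.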