
Converting Office documents to HTML or PDF with a Python script and OpenOffice


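Preparation:

I. Software environment:

jdk-7u9-linux-i586.tar.gz    # OpenOffice needs a JDK; the exact version is your choice

Apache_OpenOffice_4.1.1_Linux_x86_install-rpm_en-US.tar.gz    # download OpenOffice from the official site; 4.1.1 was the latest release when this was written

II. Installation and deployment

1. Check whether a JDK is already installed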


rpm -qa | grep java


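If anything is listed, a Java package is already installed.


Remove it, substituting the exact package name printed above if it is not simply java:


rpm -e java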
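
The check in step 1 can also be scripted. The following is a minimal Python sketch rather than part of the original procedure: the helper names `java_packages` and `installed_java_packages` are made up for illustration, and the filter simply mimics what `grep java` does to the output of `rpm -qa`.

```python
import subprocess


def java_packages(rpm_qa_output):
    """Return the lines of `rpm -qa` output that contain 'java'."""
    return [line.strip() for line in rpm_qa_output.splitlines()
            if "java" in line]


def installed_java_packages():
    """Run `rpm -qa` and filter its output (needs an RPM-based system)."""
    output = subprocess.check_output(["rpm", "-qa"])
    return java_packages(output.decode("utf-8", "replace"))
```

Calling `installed_java_packages()` on the server returns the names that could then be passed to `rpm -e`.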


2. Move jdk-7u9-linux-i586.tar.gz to the web directory on the database server

mv  jdk-7u9-linux-i586.tar.gz /home/wwwroot/

Extract it from inside that directory


tar -zxvf jdk-7u9-linux-i586.tar.gz


3. Set the environment variables


vim /etc/profile


Add the following lines, pointing JAVA_HOME at the directory where the JDK was actually extracted:


export JAVA_HOME=/www/web/jdk1.7.0_09
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin


Save and exit.


4. Reload the environment variables


source /etc/profile
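
After reloading the profile, the variables from step 3 can be sanity-checked from Python. This is only a sketch under the assumption that `JAVA_HOME` and `PATH` are set as shown above; `check_java_env` is a hypothetical helper, not something shipped with the JDK or OpenOffice.

```python
import os


def check_java_env(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
        return problems
    # $JAVA_HOME/bin must be one of the PATH entries for `java` to resolve.
    bin_dir = os.path.join(java_home, "bin")
    path_entries = env.get("PATH", "").split(os.pathsep)
    if bin_dir not in path_entries:
        problems.append("%s is not on PATH" % bin_dir)
    return problems
```

Running `check_java_env(os.environ)` in a fresh shell should return an empty list when the profile was loaded correctly.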


5. Verify the installation


java -version

If the following three lines appear, the JDK is installed correctly:
java version "1.7.0_09"
Java(TM) SE Runtime Environment (build 1.7.0_09-b05)
Java HotSpot(TM) Client VM (build 23.5-b02, mixed mode)

Alternatively, create Test.java:


class Test 
{
        public static void main(String[] args) 
        {
               System.out.println("Hello World!");
        }
}
Save the file, then compile and run it:


javac Test.java
java Test

The output is:

Hello World!

This confirms that the JDK works.
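
The version check in step 5 can be automated as well. The sketch below is an illustration, with `parse_java_version` and `installed_java_version` as hypothetical helper names; it relies on the fact that `java -version` prints its banner on stderr, so stderr is merged into stdout before parsing.

```python
import re
import subprocess


def parse_java_version(text):
    """Extract the quoted version string from `java -version` output."""
    match = re.search(r'version "([^"]+)"', text)
    return match.group(1) if match else None


def installed_java_version():
    """Run `java -version` and return the version string, or None."""
    proc = subprocess.Popen(["java", "-version"],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    return parse_java_version(output.decode("utf-8", "replace"))
```

With the JDK from this article, `installed_java_version()` would be expected to return the version string shown in the banner above.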


6. Install OpenOffice

tar -zxvf Apache_OpenOffice_4.1.1_Linux_x86_install-rpm_en-US.tar.gz

Change into the RPMS directory inside the extracted tree

List its contents with ls

desktop-integration                           openoffice-en-US-calc-4.1.1-9775.i586.rpm
openoffice-4.1.1-9775.i586.rpm                openoffice-en-US-draw-4.1.1-9775.i586.rpm
openoffice-base-4.1.1-9775.i586.rpm           openoffice-en-US-help-4.1.1-9775.i586.rpm
openoffice-brand-base-4.1.1-9775.i586.rpm     openoffice-en-US-impress-4.1.1-9775.i586.rpm
openoffice-brand-calc-4.1.1-9775.i586.rpm     openoffice-en-US-math-4.1.1-9775.i586.rpm
openoffice-brand-draw-4.1.1-9775.i586.rpm     openoffice-en-US-res-4.1.1-9775.i586.rpm
openoffice-brand-en-US-4.1.1-9775.i586.rpm    openoffice-en-US-writer-4.1.1-9775.i586.rpm
openoffice-brand-impress-4.1.1-9775.i586.rpm  openoffice-gnome-integration-4.1.1-9775.i586.rpm
openoffice-brand-math-4.1.1-9775.i586.rpm     openoffice-graphicfilter-4.1.1-9775.i586.rpm
openoffice-brand-writer-4.1.1-9775.i586.rpm   openoffice-images-4.1.1-9775.i586.rpm
openoffice-calc-4.1.1-9775.i586.rpm           openoffice-impress-4.1.1-9775.i586.rpm
openoffice-core01-4.1.1-9775.i586.rpm         openoffice-javafilter-4.1.1-9775.i586.rpm
openoffice-core02-4.1.1-9775.i586.rpm         openoffice-math-4.1.1-9775.i586.rpm
openoffice-core03-4.1.1-9775.i586.rpm         openoffice-ogltrans-4.1.1-9775.i586.rpm
openoffice-core04-4.1.1-9775.i586.rpm         openoffice-onlineupdate-4.1.1-9775.i586.rpm
openoffice-core05-4.1.1-9775.i586.rpm         openoffice-ooofonts-4.1.1-9775.i586.rpm
openoffice-core06-4.1.1-9775.i586.rpm         openoffice-ooolinguistic-4.1.1-9775.i586.rpm
openoffice-core07-4.1.1-9775.i586.rpm         openoffice-pyuno-4.1.1-9775.i586.rpm
openoffice-draw-4.1.1-9775.i586.rpm           openoffice-ure-4.1.1-9775.i586.rpm
openoffice-en-US-4.1.1-9775.i586.rpm          openoffice-writer-4.1.1-9775.i586.rpm
openoffice-en-US-base-4.1.1-9775.i586.rpm     openoffice-xsltfilter-4.1.1-9775.i586.rpm

Then install every package with rpm -ivh *.rpm

The installation finishes quickly.

7. Start OpenOffice

cd /opt/openoffice4/program/

I installed OpenOffice 4, so the directory is openoffice4; check the path on your own system. Start it with this command:

./soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &

If a job number and PID like the following are printed, the process has been started in the background:
[1] 1784
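
The start command above can also be issued from Python, which avoids shell quoting of the `-accept` string. This is a minimal sketch that reuses the flags and install path from this article; the function names `build_soffice_command` and `start_soffice` are assumptions made for illustration.

```python
import subprocess

# Install path used in this article; adjust it for your system.
SOFFICE = "/opt/openoffice4/program/soffice"


def build_soffice_command(host="127.0.0.1", port=8100, binary=SOFFICE):
    """Build the argument list for a headless soffice that listens on a socket."""
    accept = "socket,host=%s,port=%d;urp;" % (host, port)
    return [binary, "-headless", "-accept=" + accept, "-nofirststartwizard"]


def start_soffice(**kwargs):
    """Start soffice in the background and return the Popen handle."""
    return subprocess.Popen(build_soffice_command(**kwargs))
```

Keeping the returned `Popen` handle makes it possible to call `terminate()` on the process later instead of looking up its PID.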

Then check the listening ports:

[root@www program]# netstat -tnlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name   
tcp        0      0 0.0.0.0:21                  0.0.0.0:*                   LISTEN      1560/pure-ftpd (SER 
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      1083/sshd           
tcp        0      0 127.0.0.1:631               0.0.0.0:*                   LISTEN      1027/cupsd          
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      1543/sendmail       
tcp        0      0 127.0.0.1:8100              0.0.0.0:*                   LISTEN      1814/soffice.bin    
tcp        0      0 0.0.0.0:3306                0.0.0.0:*                   LISTEN      1494/mysqld         
tcp        0      0 127.0.0.1:11211             0.0.0.0:*                   LISTEN      1576/memcached      
tcp        0      0 127.0.0.1:6379              0.0.0.0:*                   LISTEN      1523/redis-server 1 
tcp        0      0 0.0.0.0:80                  0.0.0.0:*                   LISTEN      1067/nginx          
tcp        0      0 :::21                       :::*                        LISTEN      1560/pure-ftpd (SER 
tcp        0      0 :::22                       :::*                        LISTEN      1083/sshd           
tcp        0      0 ::1:631                     :::*                        LISTEN      1027/cupsd  

Port 8100 is now in the LISTEN state, owned by soffice.bin.
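
The same check can be done without reading netstat output by trying to open a TCP connection to the port. The following is a small sketch using only the standard library; `is_port_open` is a hypothetical helper name, and the defaults match the host and port used in the start command.

```python
import socket


def is_port_open(host="127.0.0.1", port=8100, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return True
    except (socket.error, socket.timeout):
        # Connection refused or timed out: nothing is listening there.
        return False
    finally:
        sock.close()
```

A script that starts soffice can poll `is_port_open()` in a loop before attempting the first conversion, since the office process needs a moment to begin listening.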

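III. Testing document conversion

This requires a Python script.

Note that I created it directly under /opt/openoffice4/program.

I named it DocumentConvert.py.

The code is PyODConverter v1.1 by Mirko Nasato, found online and used unchanged (it is written in Python 2 syntax):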

#
# PyODConverter (Python OpenDocument Converter) v1.1 - 2009-11-14
#
# This script converts a document from one office format to another by
# connecting to an OpenOffice.org instance via Python-UNO bridge.
#
# Copyright (C) 2008-2009 Mirko Nasato <mirko@artofsolving.com>
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl-2.1.html
# - or any later version.
#
DEFAULT_OPENOFFICE_PORT = 8100

import uno
from os.path import abspath, isfile, splitext
from com.sun.star.beans import PropertyValue
from com.sun.star.task import ErrorCodeIOException
from com.sun.star.connection import NoConnectException

FAMILY_TEXT = "Text"
FAMILY_WEB = "Web"
FAMILY_SPREADSHEET = "Spreadsheet"
FAMILY_PRESENTATION = "Presentation"
FAMILY_DRAWING = "Drawing"

#---------------------#
# Configuration Start #
#---------------------#

# see http://wiki.services.openoffice.org/wiki/Framework/Article/Filter

# most formats are auto-detected; only those requiring options are defined here
IMPORT_FILTER_MAP = {
    "txt": {
        "FilterName": "Text (encoded)",
        "FilterOptions": "utf8"
    },
    "csv": {
        "FilterName": "Text - txt - csv (StarCalc)",
        "FilterOptions": "44,34,0"
    }
}

EXPORT_FILTER_MAP = {
    "pdf": {
        FAMILY_TEXT: { "FilterName": "writer_pdf_Export" },
        FAMILY_WEB: { "FilterName": "writer_web_pdf_Export" },
        FAMILY_SPREADSHEET: { "FilterName": "calc_pdf_Export" },
        FAMILY_PRESENTATION: { "FilterName": "impress_pdf_Export" },
        FAMILY_DRAWING: { "FilterName": "draw_pdf_Export" }
    },
    "html": {
        FAMILY_TEXT: { "FilterName": "HTML (StarWriter)" },
        FAMILY_SPREADSHEET: { "FilterName": "HTML (StarCalc)" },
        FAMILY_PRESENTATION: { "FilterName": "impress_html_Export" }
    },
    "odt": {
        FAMILY_TEXT: { "FilterName": "writer8" },
        FAMILY_WEB: { "FilterName": "writerweb8_writer" }
    },
    "doc": {
        FAMILY_TEXT: { "FilterName": "MS Word 97" }
    },
    "rtf": {
        FAMILY_TEXT: { "FilterName": "Rich Text Format" }
    },
    "txt": {
        FAMILY_TEXT: {
            "FilterName": "Text",
            "FilterOptions": "utf8"
        }
    },
    "ods": {
        FAMILY_SPREADSHEET: { "FilterName": "calc8" }
    },
    "xls": {
        FAMILY_SPREADSHEET: { "FilterName": "MS Excel 97" }
    },
    "csv": {
        FAMILY_SPREADSHEET: {
            "FilterName": "Text - txt - csv (StarCalc)",
            "FilterOptions": "44,34,0"
        }
    },
    "odp": {
        FAMILY_PRESENTATION: { "FilterName": "impress8" }
    },
    "ppt": {
        FAMILY_PRESENTATION: { "FilterName": "MS PowerPoint 97" }
    },
    "swf": {
        FAMILY_DRAWING: { "FilterName": "draw_flash_Export" },
        FAMILY_PRESENTATION: { "FilterName": "impress_flash_Export" }
    }
}

PAGE_STYLE_OVERRIDE_PROPERTIES = {
    FAMILY_SPREADSHEET: {
        #--- Scale options: uncomment 1 of the 3 ---
        # a) 'Reduce / enlarge printout': 'Scaling factor'
        "PageScale": 100,
        # b) 'Fit print range(s) to width / height': 'Width in pages' and 'Height in pages'
        #"ScaleToPagesX": 1, "ScaleToPagesY": 1000,
        # c) 'Fit print range(s) on number of pages': 'Fit print range(s) on number of pages'
        #"ScaleToPages": 1,
        "PrintGrid": False
    }
}

#-------------------#
# Configuration End #
#-------------------#

class DocumentConversionException(Exception):

    def __init__(self, message):
        self.message = message

    def __str__(self):
        return self.message


class DocumentConverter:

    def __init__(self, port=DEFAULT_OPENOFFICE_PORT):
        localContext = uno.getComponentContext()
        resolver = localContext.ServiceManager.createInstanceWithContext("com.sun.star.bridge.UnoUrlResolver", localContext)
        try:
            context = resolver.resolve("uno:socket,host=localhost,port=%s;urp;StarOffice.ComponentContext" % port)
        except NoConnectException:
            raise DocumentConversionException, "failed to connect to OpenOffice.org on port %s" % port
        self.desktop = context.ServiceManager.createInstanceWithContext("com.sun.star.frame.Desktop", context)

    def convert(self, inputFile, outputFile):
        inputUrl = self._toFileUrl(inputFile)
        outputUrl = self._toFileUrl(outputFile)

        loadProperties = { "Hidden": True }
        inputExt = self._getFileExt(inputFile)
        if IMPORT_FILTER_MAP.has_key(inputExt):
            loadProperties.update(IMPORT_FILTER_MAP[inputExt])

        document = self.desktop.loadComponentFromURL(inputUrl, "_blank", 0, self._toProperties(loadProperties))
        try:
            document.refresh()
        except AttributeError:
            pass

        family = self._detectFamily(document)
        self._overridePageStyleProperties(document, family)

        outputExt = self._getFileExt(outputFile)
        storeProperties = self._getStoreProperties(document, outputExt)

        try:
            document.storeToURL(outputUrl, self._toProperties(storeProperties))
        finally:
            document.close(True)

    def _overridePageStyleProperties(self, document, family):
        if PAGE_STYLE_OVERRIDE_PROPERTIES.has_key(family):
            properties = PAGE_STYLE_OVERRIDE_PROPERTIES[family]
            pageStyles = document.getStyleFamilies().getByName('PageStyles')
            for styleName in pageStyles.getElementNames():
                pageStyle = pageStyles.getByName(styleName)
                for name, value in properties.items():
                    pageStyle.setPropertyValue(name, value)

    def _getStoreProperties(self, document, outputExt):
        family = self._detectFamily(document)
        try:
            propertiesByFamily = EXPORT_FILTER_MAP[outputExt]
        except KeyError:
            raise DocumentConversionException, "unknown output format: '%s'" % outputExt
        try:
            return propertiesByFamily[family]
        except KeyError:
            raise DocumentConversionException, "unsupported conversion: from '%s' to '%s'" % (family, outputExt)

    def _detectFamily(self, document):
        if document.supportsService("com.sun.star.text.WebDocument"):
            return FAMILY_WEB
        if document.supportsService("com.sun.star.text.GenericTextDocument"):
            # must be TextDocument or GlobalDocument
            return FAMILY_TEXT
        if document.supportsService("com.sun.star.sheet.SpreadsheetDocument"):
            return FAMILY_SPREADSHEET
        if document.supportsService("com.sun.star.presentation.PresentationDocument"):
            return FAMILY_PRESENTATION
        if document.supportsService("com.sun.star.drawing.DrawingDocument"):
            return FAMILY_DRAWING
        raise DocumentConversionException, "unknown document family: %s" % document

    def _getFileExt(self, path):
        ext = splitext(path)[1]
        if ext is not None:
            return ext[1:].lower()

    def _toFileUrl(self, path):
        return uno.systemPathToFileUrl(abspath(path))

    def _toProperties(self, dict):
        props = []
        for key in dict:
            prop = PropertyValue()
            prop.Name = key
            prop.Value = dict[key]
            props.append(prop)
        return tuple(props)


if __name__ == "__main__":
    from sys import argv, exit

    if len(argv) < 3:
        print "USAGE: python %s <input-file> <output-file>" % argv[0]
        exit(255)
    if not isfile(argv[1]):
        print "no such input file: %s" % argv[1]
        exit(1)

    try:
        converter = DocumentConverter()
        converter.convert(argv[1], argv[2])
    except DocumentConversionException, exception:
        print "ERROR! " + str(exception)
        exit(1)
    except ErrorCodeIOException, exception:
        print "ERROR! ErrorCodeIOException %d" % exception.ErrCode
        exit(1)
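
The script picks its export filter in two steps: first by the output file's extension, then by the family of the loaded document. The standalone sketch below illustrates that lookup without needing a running OpenOffice instance. It copies only four filter names from `EXPORT_FILTER_MAP`; the names `EXPORT_FILTERS` and `pick_export_filter` are made up for this illustration, and it raises `ValueError` where the script raises `DocumentConversionException`.

```python
from os.path import splitext

# A few entries copied from EXPORT_FILTER_MAP in the script.
EXPORT_FILTERS = {
    "pdf": {"Text": "writer_pdf_Export", "Spreadsheet": "calc_pdf_Export"},
    "html": {"Text": "HTML (StarWriter)", "Spreadsheet": "HTML (StarCalc)"},
}


def pick_export_filter(output_file, family):
    """Look up the filter by extension, then by document family."""
    ext = splitext(output_file)[1][1:].lower()
    try:
        by_family = EXPORT_FILTERS[ext]
    except KeyError:
        raise ValueError("unknown output format: '%s'" % ext)
    try:
        return by_family[family]
    except KeyError:
        raise ValueError("unsupported conversion: from '%s' to '%s'"
                         % (family, ext))
```

This is why the output file name alone decides the target format: naming the output `1.html` or `1.pdf` is enough to select the matching filter for a text document.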


OK, let's drop in a .doc file and try converting it to HTML:

[root@www program]# python DocumentConvert.py 1.doc 1.html
[root@www program]# 

No errors.

Now try converting to PDF:

[root@www program]# python DocumentConvert.py 1.doc 1.pdf
[root@www program]# 


No errors. Everything works.
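
For more than a handful of files, the two commands above can be wrapped in a small driver. The sketch below is an assumption-laden illustration, not part of PyODConverter: `plan_conversions` and `convert_all` are hypothetical helpers, and `python` is assumed to be an interpreter that can import `uno`, as in the runs above. It relies on the script exiting with a non-zero status when a conversion fails.

```python
import os
import subprocess

SCRIPT = "DocumentConvert.py"  # the script created in section III


def plan_conversions(input_files, target_ext):
    """Pair each input path with an output path that has the new extension."""
    plan = []
    for path in input_files:
        base, _ = os.path.splitext(path)
        plan.append((path, base + "." + target_ext))
    return plan


def convert_all(input_files, target_ext, python="python"):
    """Run the converter once per file and return the inputs that failed."""
    failed = []
    for src, dst in plan_conversions(input_files, target_ext):
        # A non-zero exit status means the script reported an error.
        if subprocess.call([python, SCRIPT, src, dst]) != 0:
            failed.append(src)
    return failed
```

Run from /opt/openoffice4/program with soffice listening on port 8100, `convert_all(["1.doc"], "pdf")` would perform the same conversion as the second command above.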

