Sunday 28 January 2018

OpenXML in word processing – Introduction and how to use it in ABAP

In the first part of this blog I give an introduction to OpenXML in word processing. In the second part I provide ABAP code that reads and modifies Word files.

Starting with Microsoft Word 2007, when you create a new document in Word and save it, a new file is created with the extension “*.docx”. This file is a zip archive of XML files that together describe the whole Word document: text, tables, font sizes, colors, comments, margin settings, section settings and everything else the user placed and maintained in the document. The XML files are linked to one another through relationships in a specific structure and zipped into a single file.

To explore this structure, create a test document with some content and save it. Change the extension from “*.docx” to “*.zip” and unzip the file. After unzipping you see all the XML files in their folder structure. If you need to look at these XML files often, I recommend a more convenient way: install OOXML Tool, an add-on for the Chrome browser. With simple drag and drop you can see the whole Word document.

For example, I created Test.docx with the text “Hello World”. Note that the file has a size of 0 bytes until you enter some content in Word. I dragged the Word file into Chrome using the add-on mentioned above to see the XML structure of the Word document, and looked for /word/document.xml to find the text tag (<w:t>) that holds the value “Hello world”.
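If you prefer to stay inside the SAP system, the same look into the archive can be sketched in ABAP with the standard class CL_ABAP_ZIP. This is only a minimal sketch: the report name ZDOCX_LIST_PARTS and the path C:\Test.docx are my assumptions, and error handling is kept to the bare minimum.

```abap
REPORT zdocx_list_parts.

DATA: lv_length   TYPE i,
      lt_data_tab TYPE STANDARD TABLE OF x255,
      lv_docx     TYPE xstring,
      lo_zip      TYPE REF TO cl_abap_zip,
      ls_file     TYPE cl_abap_zip=>t_file.

* Upload the Word file as binary (the path is an assumption)
cl_gui_frontend_services=>gui_upload(
  EXPORTING
    filename   = 'C:\Test.docx'
    filetype   = 'BIN'
  IMPORTING
    filelength = lv_length
  CHANGING
    data_tab   = lt_data_tab
  EXCEPTIONS
    OTHERS     = 1 ).
IF sy-subrc <> 0.
  WRITE / 'Upload failed.'.
  RETURN.
ENDIF.

* Get XSTRING format from BIN table
CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
  EXPORTING
    input_length = lv_length
  IMPORTING
    buffer       = lv_docx
  TABLES
    binary_tab   = lt_data_tab
  EXCEPTIONS
    OTHERS       = 1.
IF sy-subrc <> 0.
  WRITE / 'Conversion to XSTRING failed.'.
  RETURN.
ENDIF.

* A .docx file is a zip archive, so CL_ABAP_ZIP can open it
CREATE OBJECT lo_zip.
lo_zip->load(
  EXPORTING
    zip    = lv_docx
  EXCEPTIONS
    OTHERS = 1 ).
IF sy-subrc <> 0.
  WRITE / 'File is not a valid zip archive.'.
  RETURN.
ENDIF.

* List the name and size of every entry in the archive
LOOP AT lo_zip->files INTO ls_file.
  WRITE: / ls_file-name, ls_file-size.
ENDLOOP.
```

For a document like the one above, the list should include an entry named word/document.xml, which is the part that holds the text.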


Each XML file describes properties of a document part or the relations between parts. For example:

◈ [Content_Types].xml describes the type of content used in each part of the whole document (package)
◈ the _rels part describes the relations between parts
◈ the docProps part describes general properties of the document in the app.xml and core.xml files (application, author, version…)
◈ the customXml part can hold customer-specific data; this will be described in more detail in another blog
◈ the content of the document is in the /word/document.xml file
◈ fontTable.xml contains information about the fonts used
◈ styles.xml describes the styles used
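To look at the content of any one of these files from ABAP, a small helper along the following lines could be used. It is a sketch under assumptions: the class name lcl_docx_zip and the method read_entry are hypothetical names of mine, and I assume the XML parts are encoded in UTF-8, which is what Word normally writes.

```abap
CLASS lcl_docx_zip DEFINITION.
  PUBLIC SECTION.
    " Returns one archive entry decoded as UTF-8 text.
    " The result stays empty if the archive or the entry cannot be read.
    CLASS-METHODS read_entry
      IMPORTING iv_docx        TYPE xstring
                iv_name        TYPE string
      RETURNING VALUE(rv_text) TYPE string.
ENDCLASS.

CLASS lcl_docx_zip IMPLEMENTATION.
  METHOD read_entry.
    DATA: lo_zip     TYPE REF TO cl_abap_zip,
          lv_content TYPE xstring.

    " Open the .docx file as a zip archive
    CREATE OBJECT lo_zip.
    lo_zip->load(
      EXPORTING
        zip    = iv_docx
      EXCEPTIONS
        OTHERS = 1 ).
    IF sy-subrc <> 0.
      RETURN.
    ENDIF.

    " Read the requested entry, e.g. word/styles.xml
    lo_zip->get(
      EXPORTING
        name    = iv_name
      IMPORTING
        content = lv_content
      EXCEPTIONS
        OTHERS  = 1 ).
    IF sy-subrc <> 0.
      RETURN.
    ENDIF.

    " Assumption: the part is UTF-8 encoded XML
    rv_text = cl_abap_codepage=>convert_from( source   = lv_content
                                              codepage = `UTF-8` ).
  ENDMETHOD.
ENDCLASS.
```

Given an XSTRING lv_docx that holds the uploaded file, a call such as lv_text = lcl_docx_zip=>read_entry( iv_docx = lv_docx iv_name = `[Content_Types].xml` ) would return the content type definitions as a string.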

SAP provides the class CL_DOCX_DOCUMENT, which helps us read and modify a Word document and navigate its structure. Here is simple code that does the job.

*&---------------------------------------------------------------------*
*& Report  ZDOCX_DOCUMENT
*&
*&---------------------------------------------------------------------*
*& Report demonstrates using CL_DOCX_DOCUMENT class to read and maintain
*& word document.
*& Pavol Olejar 23.4.2017
*&---------------------------------------------------------------------*
REPORT zdocx_document.

DATA: lv_length   TYPE i,
      lt_data_tab TYPE STANDARD TABLE OF x255,
      lv_docx     TYPE xstring,
      lv_string   TYPE string,
      lv_xml      TYPE xstring,
      lr_docx     TYPE REF TO cl_docx_document,
      lr_main     TYPE REF TO cl_docx_maindocumentpart.
* Upload file
CALL METHOD cl_gui_frontend_services=>gui_upload
  EXPORTING
    filename   = 'C:\Test.docx'
    filetype   = 'BIN'
  IMPORTING
    filelength = lv_length
  CHANGING
    data_tab   = lt_data_tab.
* Get XSTRING format from BIN table
CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
  EXPORTING
    input_length = lv_length
  IMPORTING
    buffer       = lv_docx
  TABLES
    binary_tab   = lt_data_tab.
* Instantiate word document in ABAP class CL_DOCX_DOCUMENT
CALL METHOD cl_docx_document=>load_document
  EXPORTING
    iv_data = lv_docx
  RECEIVING
    rr_doc  = lr_docx.
* Get main part where content of word document is stored
lr_main = lr_docx->get_maindocumentpart( ).
* Get data (XSTRING) of main part
lv_xml = lr_main->get_data( ).
* Convert to string for simple maintaining
CALL FUNCTION 'CRM_IC_XML_XSTRING2STRING'
  EXPORTING
    inxstring = lv_xml
  IMPORTING
    outstring = lv_string.
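* Change text (the search is case-sensitive, so the literal must match
* the text exactly as it appears in document.xml)
REPLACE FIRST OCCURRENCE OF 'Hello world.' IN lv_string
WITH 'Hello world. This is my Test_new.docx document.'.
* Convert back to XSTRING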
CALL FUNCTION 'SCMS_STRING_TO_XSTRING'
  EXPORTING
    text   = lv_string
  IMPORTING
    buffer = lv_xml.
* Replace main part with new data and save it
lr_main->feed_data( iv_data = lv_xml ).
lv_docx = lr_docx->get_package_data( ).
* Save new word document locally
lv_length  = xstrlen( lv_docx ).

CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
  EXPORTING
    buffer     = lv_docx
  TABLES
    binary_tab = lt_data_tab.

CALL METHOD cl_gui_frontend_services=>gui_download
  EXPORTING
    bin_filesize      = lv_length
    filename          = 'C:\Test_new.docx'
    filetype          = 'BIN'
    confirm_overwrite = 'X'
  CHANGING
    data_tab          = lt_data_tab.

The get*part methods of the class return the different parts of the document. Here we were interested in the main part.
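As an illustration of the get*part pattern, the following sketch reads two further parts and prints their sizes. Please treat the method names get_stylespart and get_fonttablepart and the base class CL_OPENXML_PART as my assumptions and check them in SE24 on your release; the local class lcl_docx_parts is a hypothetical helper.

```abap
CLASS lcl_docx_parts DEFINITION.
  PUBLIC SECTION.
    CLASS-METHODS write_part_sizes
      IMPORTING ir_docx TYPE REF TO cl_docx_document.
ENDCLASS.

CLASS lcl_docx_parts IMPLEMENTATION.
  METHOD write_part_sizes.
    DATA: lr_main TYPE REF TO cl_docx_maindocumentpart,
          lr_part TYPE REF TO cl_openxml_part, " assumed base class of all parts
          lv_xml  TYPE xstring,
          lv_size TYPE i.

    TRY.
        " Main part: /word/document.xml
        lr_main = ir_docx->get_maindocumentpart( ).
        lv_xml  = lr_main->get_data( ).
        lv_size = xstrlen( lv_xml ).
        WRITE: / 'Main part, bytes:', lv_size.

        " Styles part (method name is an assumption)
        lr_part = lr_main->get_stylespart( ).
        lv_xml  = lr_part->get_data( ).
        lv_size = xstrlen( lv_xml ).
        WRITE: / 'Styles part, bytes:', lv_size.

        " Font table part (method name is an assumption)
        lr_part = lr_main->get_fonttablepart( ).
        lv_xml  = lr_part->get_data( ).
        lv_size = xstrlen( lv_xml ).
        WRITE: / 'Font table part, bytes:', lv_size.

      CATCH cx_root.
        " Deliberately broad for this sketch; narrow it to the
        " exception classes the methods declare on your system.
        WRITE: / 'A part could not be read.'.
    ENDTRY.
  ENDMETHOD.
ENDCLASS.
```

With the report above, the call would be lcl_docx_parts=>write_part_sizes( lr_docx ) after load_document.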

The method get_data( ) returns the XML of a part, and the method feed_data( ) stores XML back into that part. Every class that represents a part of the document has these methods; in our case that class is CL_DOCX_MAINDOCUMENTPART.
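The get_data( ) and feed_data( ) round trip can also be wrapped in a small helper. The sketch below uses CL_ABAP_CODEPAGE for the conversion between XSTRING and string, which may be handy if the function module CRM_IC_XML_XSTRING2STRING is not available on your system. The class lcl_docx_text and its method are hypothetical names, and I assume the main part is UTF-8 encoded.

```abap
CLASS lcl_docx_text DEFINITION.
  PUBLIC SECTION.
    CLASS-METHODS replace_in_main_part
      IMPORTING ir_main TYPE REF TO cl_docx_maindocumentpart
                iv_old  TYPE string
                iv_new  TYPE string.
ENDCLASS.

CLASS lcl_docx_text IMPLEMENTATION.
  METHOD replace_in_main_part.
    DATA: lv_xml    TYPE xstring,
          lv_string TYPE string.

    " An empty search pattern makes no sense for REPLACE
    IF iv_old IS INITIAL.
      RETURN.
    ENDIF.

    " Read the XML of the part and decode it (assumption: UTF-8)
    lv_xml    = ir_main->get_data( ).
    lv_string = cl_abap_codepage=>convert_from( source   = lv_xml
                                                codepage = `UTF-8` ).

    " Case-sensitive replacement. iv_new is inserted as it is, so it
    " must not contain unescaped XML characters such as < or &.
    REPLACE ALL OCCURRENCES OF iv_old IN lv_string WITH iv_new.

    " Encode again and store the XML back into the part
    lv_xml = cl_abap_codepage=>convert_to( source   = lv_string
                                           codepage = `UTF-8` ).
    ir_main->feed_data( iv_data = lv_xml ).
  ENDMETHOD.
ENDCLASS.
```

Keep in mind that Word can split one visible sentence across several text tags, so a plain string replacement like this only works when the searched text sits inside a single tag.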

The method get_package_data( ) of class CL_DOCX_DOCUMENT saves all current parts and packs them into a zip file.

You can check this in the debugger by looking at the variables lv_xml and lv_docx in the XML browser view. For the variable lv_xml you see the XML of the main part.


For lv_docx you are prompted with a pop-up asking whether you want to save the zip file that is the result of the get_package_data( ) method.
