
Friday, 3 December 2021

Extract Document details using Document Information Extraction service with ABAP

In this blog I want to show how to extract information from a document using the AI and OCR capabilities of the BTP service Document Information Extraction, calling its API from an ABAP program.

The objective of this blog is not to explain how the API works, as there are already good blogs covering that (Developer Mission), but to show how you can automate the API calls using only an ABAP program. I say "only ABAP" because other integration scenarios already exist, in CPI (Document Information Extraction Integration with Email Server) or with iRPA; here we will see a simple solution.

Here is the architecture:

[Image: architecture diagram]

The API calls you need to perform to send a file and receive the results are:

1. Authenticate
2. Send File
3. Get job status, if job is still processing the document, wait until it’s done
4. Get JSON with fields extracted
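Step 3 is a polling loop. In the program's send_file method, the loop runs until the status is DONE or FAILED, with no upper bound, so a job that never finishes keeps the report waiting. The following is a minimal sketch of a bounded variant, assuming a cap of 20 attempts (an arbitrary value, not from the original). So that it runs on its own, it uses a hypothetical local class lcl_fake_job in place of the real get_status_job call.

```abap
REPORT zdemo_bounded_polling.

* Hypothetical stand-in for ZCL_DIE->GET_STATUS_JOB:
* reports PENDING on the first two calls, then DONE.
CLASS lcl_fake_job DEFINITION FINAL.
  PUBLIC SECTION.
    METHODS get_status RETURNING VALUE(rv_status) TYPE string.
  PRIVATE SECTION.
    DATA mv_calls TYPE i.
ENDCLASS.

CLASS lcl_fake_job IMPLEMENTATION.
  METHOD get_status.
    mv_calls = mv_calls + 1.
    rv_status = COND #( WHEN mv_calls < 3 THEN `PENDING` ELSE `DONE` ).
  ENDMETHOD.
ENDCLASS.

CONSTANTS c_max_attempts TYPE i VALUE 20.  " assumed cap: 20 checks, 3 s apart

DATA lv_status   TYPE string.
DATA lv_attempts TYPE i.

START-OF-SELECTION.
  DATA(lo_job) = NEW lcl_fake_job( ).

  DO c_max_attempts TIMES.
    lv_attempts = sy-index.
    lv_status = lo_job->get_status( ).
    " Stop as soon as the job reaches a final state
    IF lv_status = `DONE` OR lv_status = `FAILED`.
      EXIT.
    ENDIF.
    WAIT UP TO 3 SECONDS.
  ENDDO.

  IF lv_status <> `DONE` AND lv_status <> `FAILED`.
    WRITE: / 'Job still running after', c_max_attempts, 'attempts'.
  ELSE.
    WRITE: / 'Final status:', lv_status, 'after', lv_attempts, 'attempts'.
  ENDIF.
```

With the fake job, the loop stops on the third check. In the real program you would replace lo_job->get_status( ) with get_status_job( iv_job = l_job ).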

You need to create a destination in SM59 for the authentication (the program below expects it to be named ZBTP_DOC_INF_EXT_OAUTH2):

[Image: SM59 destination settings]

Host: d7d51f5atrial.authentication.us10.hana.ondemand.com

Port: 443

User: <clientid from instance service key>

Pass: <clientsecret from instance service key>
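With these credentials the destination calls /oauth/token, which answers with a JSON document containing the access token. The program extracts the token with two SPLIT statements. As an alternative, here is a sketch that deserializes the response with /ui2/cl_json, which the program already uses elsewhere. The sample body and the fields token_type and expires_in are assumptions based on a typical OAuth2 client-credentials response, not copied from the service.

```abap
REPORT zdemo_parse_token.

* Assumed shape of the /oauth/token response body
TYPES: BEGIN OF ty_token,
         access_token TYPE string,
         token_type   TYPE string,
         expires_in   TYPE i,
       END OF ty_token.

DATA ls_token TYPE ty_token.

* Illustrative response body; the token value is a placeholder
DATA(lv_json) = `{"access_token":"dummy-token","token_type":"bearer","expires_in":43199}`.

* low_case maps ACCESS_TOKEN <-> access_token, and so on
/ui2/cl_json=>deserialize(
  EXPORTING json        = lv_json
            pretty_name = /ui2/cl_json=>pretty_mode-low_case
  CHANGING  data        = ls_token ).

WRITE: / ls_token-token_type, ls_token-expires_in.
```

In the authenticate method, m_oauth would then simply be ls_token-access_token. The typed approach also gives you expires_in if you want to reuse a token across several documents.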

And here is the program. It asks for a PDF file, sends it to the service requesting the header fields documentNumber, purchaseOrderNumber and grossAmount, and waits for the response. Once it gets the JSON, it writes out the values read by the service.

*&---------------------------------------------------------------------*
*& Report ZTEST_DOCUMENT_INFORMATION_EXT
*&---------------------------------------------------------------------*
*& PoC - Sends a file to Document Information Extraction BTP Service
*& Reads the file from the desktop and sends it through the API
*&---------------------------------------------------------------------*
REPORT ztest_document_information_ext.

CLASS zcl_die DEFINITION DEFERRED.

TYPES: BEGIN OF ty_filetab,
         value TYPE x LENGTH 255,   " 255-byte rows instead of 1-byte rows
       END OF ty_filetab.

DATA lr_die          TYPE REF TO zcl_die.
DATA: lv_file_name    TYPE string,
      lv_rc           TYPE i,
      lt_file         TYPE STANDARD TABLE OF ty_filetab,
      lv_file_content TYPE xstring,
      lt_filetable    TYPE filetable.

PARAMETERS: p_fname TYPE rlgrap-filename.

AT SELECTION-SCREEN ON VALUE-REQUEST FOR p_fname.

  CALL METHOD cl_gui_frontend_services=>file_open_dialog
    EXPORTING
      window_title = 'Choose a file'
      file_filter = 'PDF files (*.pdf)|*.pdf|'
    CHANGING
      file_table   = lt_filetable
      rc           = lv_rc.

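* Only take the file name if the user actually selected a file (avoids a dump on cancel)
  IF lt_filetable IS NOT INITIAL.
    p_fname = lt_filetable[ 1 ]-filename.
  ENDIF.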


**********************************************************************
* Document Information Extraction class definition
CLASS  zcl_die DEFINITION FINAL.

  PUBLIC SECTION.
    CONSTANTS: c_api_url  TYPE string VALUE 'https://aiservices-trial-dox.cfapps.us10.hana.ondemand.com',
               c_api_path TYPE string VALUE '/document-information-extraction/v1'.

    DATA:
      m_oauth           TYPE string,
      m_content_clients TYPE string.

    METHODS authenticate RETURNING VALUE(rv_authenticated) TYPE abap_bool.
    METHODS post_document IMPORTING iv_file_content           TYPE xstring
                          RETURNING VALUE(rv_job) TYPE string.
    METHODS send_file IMPORTING iv_file_content           TYPE xstring.
    METHODS get_status_job IMPORTING iv_job               TYPE string
                           RETURNING VALUE(rv_status_job) TYPE string.

ENDCLASS.

**********************************************************************
* Document Information Extraction class implementation
CLASS  zcl_die IMPLEMENTATION.

  METHOD authenticate.

    DATA lr_client         TYPE REF TO if_http_client.

    CALL METHOD cl_http_client=>create_by_destination
      EXPORTING
        destination              = 'ZBTP_DOC_INF_EXT_OAUTH2'
      IMPORTING
        client                   = lr_client
      EXCEPTIONS
        argument_not_found       = 1
        destination_not_found    = 2
        destination_no_authority = 3
        plugin_not_active        = 4
        internal_error           = 5
        OTHERS                   = 6.
    IF sy-subrc = 0.

*     If class CL_OAUTH2_CLIENT is available in your system, see SAP Note 3041322; otherwise use the following calls
      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_method  value =  'POST' ).
      lr_client->request->set_header_field( name  = 'grant_type'  value =  'client_credentials' ).
      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_uri  value =  '/oauth/token?grant_type=client_credentials' ).
      lr_client->send( ).
      lr_client->receive( ).

      lr_client->response->get_status(
        IMPORTING
          code   = DATA(lv_code) ).

      IF lv_code = '200'.

        DATA: rest  TYPE string.

        DATA(l_content) = lr_client->response->get_cdata( ).
        SPLIT l_content AT '"access_token":"' INTO rest l_content.
        SPLIT l_content AT '"' INTO m_oauth rest.

        rv_authenticated = abap_true.

      ELSE.
        rv_authenticated = abap_false.
      ENDIF.

      lr_client->close(  ).

    ENDIF.

  ENDMETHOD.

  METHOD post_document.

    DATA lr_client         TYPE REF TO if_http_client.
    DATA lo_request_part         TYPE REF TO if_http_entity.
    DATA lo_request_part2         TYPE REF TO if_http_entity.
    DATA lv_content_disposition TYPE string.
    DATA len           TYPE i.
    DATA lv_options TYPE string.

    DATA: BEGIN OF ls_create_job_response,
            id            TYPE string,
            status        TYPE string,
            processedtime TYPE string,
          END OF ls_create_job_response.

    CLEAR rv_job.

    CALL METHOD cl_http_client=>create_by_url
      EXPORTING
        url                = c_api_url
      IMPORTING
        client             = lr_client
      EXCEPTIONS
        argument_not_found = 1
        plugin_not_active  = 2
        internal_error     = 3
        OTHERS             = 4.

    IF sy-subrc = 0.

      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_method  value =  if_http_request=>co_request_method_post ).
      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_uri  value =  |{ c_api_path }/document/jobs| ).
      lr_client->request->set_header_field( name  =  'Authorization'  value =  |Bearer { m_oauth }| ).
      lr_client->request->set_content_type( if_rest_media_type=>gc_multipart_form_data ).
      lr_client->request->if_http_entity~set_formfield_encoding( formfield_encoding = cl_http_request=>if_http_entity~co_encoding_raw ).

      lr_client->request->set_header_field( name = 'Accept' value = if_rest_media_type=>gc_appl_json ).

      lo_request_part2 = lr_client->request->add_multipart( ).

      lv_options = '{ "extraction": { "headerFields": [ "documentNumber", "purchaseOrderNumber", "grossAmount" ], "lineItemFields": [ "netAmount" ] },' &&
                   '"clientId": "default", "documentType": "invoice", "receivedDate": "2020-02-17", "enrichment": { "sender": { "top": 5, "type": ' &&
                   '"businessEntity", "subtype": "supplier" }, "employee": { "type": "employee" } }}'.
      lo_request_part2->set_header_field( name = `Content-Disposition` "#EC NOTEXT
                                         value = |form-data; name="options"; type=application/json| ).
      lo_request_part2->set_cdata(
        EXPORTING
          data   =  lv_options  ).

      lo_request_part = lr_client->request->add_multipart( ).
      lv_content_disposition = |form-data; name="file"; filename=sample-invoice.pdf |.
      lo_request_part->set_header_field( name = `Content-Disposition` "#EC NOTEXT
                                         value = lv_content_disposition ).
      lo_request_part->set_content_type( if_rest_media_type=>gc_appl_pdf ).

      len = xstrlen( iv_file_content ).

      lo_request_part->set_data( data = iv_file_content offset = 0 length = len ).

      lr_client->send( ).
      lr_client->receive( ).

      DATA(l_content_clients) = lr_client->response->get_cdata( ).
      /ui2/cl_json=>deserialize( EXPORTING json = l_content_clients pretty_name = /ui2/cl_json=>pretty_mode-camel_case CHANGING data = ls_create_job_response ).

      lr_client->response->get_status(
        IMPORTING
          code   = DATA(lv_code) ).

      IF lv_code = '201'.
        rv_job = ls_create_job_response-id.
      ENDIF.

      lr_client->close(  ).

    ENDIF.

  ENDMETHOD.

  METHOD get_status_job.

    DATA lr_client         TYPE REF TO if_http_client.
    DATA lv_status_job TYPE string.
    DATA l_json_response TYPE string.
    DATA: lr_data          TYPE REF TO data.

    CLEAR rv_status_job.

    CALL METHOD cl_http_client=>create_by_url
      EXPORTING
        url                = c_api_url
      IMPORTING
        client             = lr_client
      EXCEPTIONS
        argument_not_found = 1
        plugin_not_active  = 2
        internal_error     = 3
        OTHERS             = 4.

    IF sy-subrc = 0.

      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_method  value =  if_http_request=>co_request_method_get ).
      lr_client->request->set_header_field( name  =  if_http_header_fields_sap=>request_uri  value =  |{ c_api_path }/document/jobs/{ iv_job }| ).
      lr_client->request->set_header_field( name  =  'Authorization'  value =  |Bearer { m_oauth }| ).

      lr_client->send( ).
      lr_client->receive( ).

      l_json_response = lr_client->response->get_cdata( ).
      /ui2/cl_json=>deserialize( EXPORTING json = l_json_response pretty_name = /ui2/cl_json=>pretty_mode-camel_case CHANGING data = lr_data ).

      lr_client->response->get_status(
        IMPORTING
          code   = DATA(lv_code) ).

      IF lv_code = '200'.

        /ui2/cl_data_access=>create( ir_data = lr_data iv_component = `STATUS` )->value( IMPORTING ev_data = lv_status_job ).

        IF lv_status_job = 'DONE'.

          DATA: l_field_name      TYPE string,
                l_value           TYPE string,
                i TYPE i.
          i = 1.
          WHILE i < 4.

            /ui2/cl_data_access=>create( ir_data = lr_data iv_component = |EXTRACTION-HEADER_FIELDS[{ i }]-NAME| )->value( IMPORTING ev_data = l_field_name ).

            /ui2/cl_data_access=>create( ir_data = lr_data iv_component = |EXTRACTION-HEADER_FIELDS[{ i }]-VALUE| )->value( IMPORTING ev_data = l_value ).

            WRITE:/ l_field_name, l_value.

            i = i + 1.

          ENDWHILE.

        ENDIF.

*       Return every status (e.g. PENDING, DONE, FAILED) so the polling loop in send_file can stop
        rv_status_job = lv_status_job.

      ELSE.
        rv_status_job = 'FAILED'.
      ENDIF.

      lr_client->close(  ).

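    ELSE.
*     The HTTP client could not be created; report FAILED so polling stops
      rv_status_job = 'FAILED'.
    ENDIF.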

  ENDMETHOD.

  METHOD send_file.

    DATA: l_job        TYPE string,
          l_status_job TYPE string.

    l_job = me->post_document( iv_file_content ).
    IF l_job IS NOT INITIAL.

      l_status_job = me->get_status_job( iv_job = l_job ).

      WHILE l_status_job <> 'DONE' AND l_status_job <> 'FAILED'.
        WAIT UP TO 3 SECONDS.
        l_status_job = me->get_status_job( iv_job = l_job ).
      ENDWHILE.

    ENDIF.

  ENDMETHOD.

ENDCLASS.

START-OF-SELECTION.

  IF p_fname IS NOT INITIAL.

*   Upload the file in binary format
    CALL METHOD cl_gui_frontend_services=>gui_upload
      EXPORTING
        filename   = CONV #( p_fname )
        filetype   = 'BIN'
      IMPORTING
        filelength = DATA(lv_input_len)
      CHANGING
        data_tab   = lt_file.

*   convert file to XSTRING
    CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
      EXPORTING
        input_length = lv_input_len
      IMPORTING
        buffer       = lv_file_content
      TABLES
        binary_tab   = lt_file.

    lr_die = NEW zcl_die( ).

    IF lr_die->authenticate( ) = abap_true.

      lr_die->send_file( lv_file_content ).

    ENDIF.

  ENDIF.
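In post_document, the options part of the multipart request is assembled by concatenating string literals, which is easy to break when you change the requested fields. The following sketch assumes you prefer typed data: it builds the same header and line item field lists as ABAP structures and lets /ui2/cl_json serialize them in camelCase. The type names are hypothetical, and only a subset of the original options is shown (no receivedDate or enrichment).

```abap
REPORT zdemo_build_options.

* Hypothetical types mirroring part of the "options" JSON sent to /document/jobs
TYPES: BEGIN OF ty_extraction,
         header_fields    TYPE string_table,
         line_item_fields TYPE string_table,
       END OF ty_extraction,
       BEGIN OF ty_options,
         extraction    TYPE ty_extraction,
         client_id     TYPE string,
         document_type TYPE string,
       END OF ty_options.

DATA(ls_options) = VALUE ty_options(
  extraction    = VALUE #( header_fields    = VALUE #( ( `documentNumber` )
                                                        ( `purchaseOrderNumber` )
                                                        ( `grossAmount` ) )
                           line_item_fields = VALUE #( ( `netAmount` ) ) )
  client_id     = `default`
  document_type = `invoice` ).

* camel_case turns HEADER_FIELDS into headerFields, CLIENT_ID into clientId, etc.
DATA(lv_options) = /ui2/cl_json=>serialize(
                     data        = ls_options
                     pretty_name = /ui2/cl_json=>pretty_mode-camel_case ).

WRITE: / lv_options.
```

The resulting string could be passed to lo_request_part2->set_cdata( ) in place of the concatenated literal. Adding or removing a requested field then becomes a one-line change to the VALUE expression.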
 
For testing, we can use the sample invoice from the missions. If we run the program with that PDF, after a few seconds we get the following output:

[Image: program output]

We can verify in the Document Information Extraction UI that the extracted data is correct.

[Image: extracted values in the Document Information Extraction UI]
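The get_status_job method reads the header fields by position, which assumes the service returns exactly the three requested fields in that order. The following is a sketch of a more tolerant alternative that deserializes the job response into typed structures and loops over however many header fields come back. The JSON below is a hypothetical, trimmed response, and the field names name, value and confidence are assumptions about the response shape. Real responses carry more attributes per field and may return amounts as JSON numbers, so check that the target types suit your data.

```abap
REPORT zdemo_parse_extraction.

* Hypothetical types for the part of the job response we need
TYPES: BEGIN OF ty_field,
         name       TYPE string,
         value      TYPE string,
         confidence TYPE decfloat34,
       END OF ty_field,
       ty_fields TYPE STANDARD TABLE OF ty_field WITH EMPTY KEY,
       BEGIN OF ty_extraction,
         header_fields TYPE ty_fields,
       END OF ty_extraction,
       BEGIN OF ty_job,
         status     TYPE string,
         extraction TYPE ty_extraction,
       END OF ty_job.

DATA ls_job TYPE ty_job.

* Illustrative, trimmed response body (not real service output)
DATA(lv_json) = `{"status":"DONE","extraction":{"headerFields":[` &&
                `{"name":"documentNumber","value":"INV-1001","confidence":0.98},` &&
                `{"name":"grossAmount","value":"150.00","confidence":0.95}]}}`.

* camel_case maps headerFields <-> HEADER_FIELDS, as in the original program
/ui2/cl_json=>deserialize(
  EXPORTING json        = lv_json
            pretty_name = /ui2/cl_json=>pretty_mode-camel_case
  CHANGING  data        = ls_job ).

* Print every header field returned, whatever its position
IF ls_job-status = `DONE`.
  LOOP AT ls_job-extraction-header_fields INTO DATA(ls_field).
    WRITE: / ls_field-name, ls_field-value.
  ENDLOOP.
ENDIF.
```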

With this you can automate the processing of scanned documents such as invoices. For example, you can check whether an invoice carries a purchase order number so it can be matched with that purchase order. Many other options are possible, all from a plain ABAP program.

Source: sap.com
